COP 4610 Coding Standards

COP4610: Operating Systems & Concurrent Programming

Coding Standards & Practices

Spring 2015

These notes expand on the coding standards and practices outlined in the Study Guide. Please adhere to these for any code you write in this course, unless given specific instructions to the contrary.

Indentation & Other Formatting Conventions

Code must be indented, and corresponding syntactic elements aligned, according to a consistent set of rules that reflect the nesting of syntactic structures and promote readability. You should also follow a consistent convention regarding the uses of upper- vs. lower-case letters, and underscores, in identifiers for macros, functions, types, parameters, variables, etc. If you are updating a provided file, you should maintain the conventions established by the author of the file. For new files, choose an appropriate convention. For example, you may follow the conventions of the Linux kernel (https://www.kernel.org/doc/Documentation/CodingStyle), or Dr. R.C. Lacher's coding style used in prerequisite courses (http://www.cs.fsu.edu/~lacher/courses/DOCS/codestandards.html).

Internal Documentation (Comments)

Comments should be used to enhance understanding, by providing information that cannot be easily extracted from the code alone. Specifically, the following forms of comments are required:

Every source file should have a block comment at the beginning, containing at least the name of the file, date created, date last updated, author(s), and a brief description of the file contents (including how they relate to the larger application or system to which they belong). A copyright and licensing statement may be used as well, typically the last item in the header documentation.
For each global data structure, at the point where the corresponding struct or typedef first appears, an explanation of the abstraction it implements (e.g., a linear null-terminated linked list, a circular doubly linked list, a hash table with re-hashing, etc.). This often includes "invariant" properties of the data structure, such as null-termination, which must be preserved by every piece of code that operates on it. For concurrent programs, this includes the mechanism or conventions that are used to ensure mutual exclusion and prevent deadlock. Write these before you write the functions that implement algorithms on the structure.
For each function, at the point where the function prototype first appears, a comment that includes:
1. A short explanation of what the function does, and how the parameters affect that, if it is not obvious from their names (and what is obvious to you might not be so obvious to others, or to yourself a few months or years later).
2. Any assumptions the function makes about the values of its parameters, beyond that conveyed by the types and modes of the parameters, and about global variables or files upon which it depends for effect. These are often called the "preconditions" for calling the function.
3. Guarantees it provides about the value returned from the function, and changes it makes to global variables and files, if the assumptions above are satisfied. These are often called the "postconditions" for the function.
4. If the function can fail, the convention on how failure is reported.
Write these comments before you write the function implementation, in the header file, and update them as necessary after you have completed the implementation.
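As a rough illustration of the data-structure and function comments described above, the following sketch documents a hypothetical list type and a hypothetical function list_find(). The names, and the choice to show the prototype and the implementation together rather than in separate .h and .c files, are assumptions made for the example only.

```c
#include <stddef.h>

/*
 * struct node: one element of a linear, null-terminated, singly linked
 * list of integers.
 * Invariant: following the next pointers from the head reaches NULL
 *   after finitely many steps (the list contains no cycles).
 * Concurrency: this sketch assumes a single thread; no locking is done.
 */
struct node {
    int value;
    struct node *next;
};

/*
 * list_find - locate the first node that holds a given value.
 *
 * Preconditions: head is NULL (an empty list) or points to a list that
 *   satisfies the invariant above.
 * Postconditions: the list is unchanged; no global variables or files
 *   are read or modified.
 * Failure: returns NULL if no node holds the value; otherwise returns
 *   a pointer to the first such node.
 */
struct node *list_find(struct node *head, int value);

struct node *list_find(struct node *head, int value)
{
    struct node *p;

    for (p = head; p != NULL; p = p->next) {
        if (p->value == value)
            return p;
    }
    return NULL;
}
```

A caller that honors the preconditions can rely on the failure convention alone, for example by testing list_find(head, 7) != NULL, without reading the implementation.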

Do not clutter your code with line-by-line comments that simply restate in English what the code already expresses. Reserve local comments for situations where the code is doing something that is not obvious.

Do write the comments as you go, and keep your comments up-to-date. Misleading out-of-date comments are worse than no comments.

Output & Messages

Debugging, trace, and error message output is a necessity, but it should never be mixed into the same output file or stream as the normal correct output of a program. In particular, a program that fails should not corrupt any file as a side-effect of error messages, nor should debugging/trace output change the effect of a program on the files that it normally is expected to produce (thereby causing tests to fail).

Error messages should normally be sent to the standard error stream stderr, or to a special log file (e.g., see the syslog() facility in Linux), not to stdout.
Debugging, trace, or other forms of logging output should be controllable, as to the level of verbosity (or total silence), via environment variable and/or command-line parameter.
Debugging code should generally be designed into a program, and retained for maintenance. (Removing debugging code for delivery is a frequent cause of other errors, and re-inserting debugging code during bug-fixing is another source of errors, as well as a waste of time.) If overhead is of concern, conditional compilation directives (#ifdef DEBUG ...) should be used. Comments should never be used to disable any code, debugging or otherwise.
Use standard error-reporting and logging mechanisms, like perror(), strerror(), and syslog() where appropriate.
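One possible way to make trace output controllable is sketched below. It is only an illustration: the environment variable name APP_TRACE_LEVEL and the functions trace_level() and trace() are hypothetical, and the range 0 to 9 for verbosity is an arbitrary assumption. All output goes to stderr, so the normal output on stdout is never affected.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name; unset, empty, or invalid means silence. */
#define TRACE_ENV_VAR "APP_TRACE_LEVEL"

/*
 * trace_level - read the verbosity level from the environment.
 * Returns: a value from 0 (silent) to 9; any value that is not a
 *   decimal number in that range is treated as 0.
 */
int trace_level(void)
{
    const char *s = getenv(TRACE_ENV_VAR);
    char *end;
    long level;

    if (s == NULL || *s == '\0')
        return 0;
    level = strtol(s, &end, 10);
    if (*end != '\0' || level < 0 || level > 9)
        return 0;
    return (int)level;
}

/*
 * trace - write a printf-style message to stderr, but only if the
 *   current verbosity is at least the given level.
 * Returns: 0 if the message was suppressed, otherwise the value
 *   returned by vfprintf().
 */
int trace(int level, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (level > trace_level())
        return 0;
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

With this sketch, a call such as trace(2, "opened %s\n", path) prints only when the program is run with APP_TRACE_LEVEL set to 2 or more. A real program would probably read the level once at start-up rather than on every call.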

File Formats

For this course, program files must be in a form that can be compiled, read, and printed under the Unix operating system.

The single character LF (Ctrl-J) alone is used to indicate the end of a line, and the file should end with such a new-line character.
The code should not contain any tabs, nulls, or other nonprintable (formatting) characters, or any blanks at the ends of lines.
The character encoding should be ASCII or UTF-8. Avoid 16-bit encodings, such as UTF-16.
No line should contain more than 80 characters.
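The rules above are mechanical enough to check automatically. The following is a minimal sketch of such a check for a single line; the function check_line() and the FMT_* flag names are hypothetical, and the sketch counts bytes rather than characters, so a line containing multi-byte UTF-8 sequences may be reported as too long even when it has 80 or fewer characters.

```c
#include <stddef.h>

enum {
    FMT_OK       = 0,
    FMT_TOO_LONG = 1 << 0,   /* more than 80 bytes */
    FMT_CONTROL  = 1 << 1,   /* tab, CR, NUL, or other control character */
    FMT_TRAILING = 1 << 2    /* blank at the end of the line */
};

/*
 * check_line - test one line of text against the file format rules.
 * Preconditions: line points to len bytes, not including the final LF.
 * Returns: FMT_OK, or the bitwise OR of the FMT_* flags that apply.
 * Bytes of 0x80 and above are accepted, on the assumption that they
 *   belong to UTF-8 sequences.
 */
int check_line(const char *line, size_t len)
{
    int flags = FMT_OK;
    size_t i;

    if (len > 80)
        flags |= FMT_TOO_LONG;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)line[i];

        if (c < 0x20 || c == 0x7f)
            flags |= FMT_CONTROL;
    }
    if (len > 0 && line[len - 1] == ' ')
        flags |= FMT_TRAILING;
    return flags;
}
```

A line that ends in a carriage return, as produced by a Windows/DOS editor, is reported with FMT_CONTROL, since CR is a control character.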

Take care that you do not use a Windows/DOS editor to edit program files. Windows/DOS uses two characters (^M^J) to indicate an end of line. The extra character (^M) will prevent your program from compiling under Unix. Take care not to process code with a word processing editor or e-mail tool that inserts blanks, tabs, or other "whitespace" characters at the ends of lines. Do not try to send source code in e-mail using a Windows-based mail agent; they are known to insert line breaks in long lines. In C-language macro definitions, adding extra whitespace at the end of a line can cause compilation errors. Likewise, breaking a line can cause syntax errors. The instructor has no recent analogous experience with Macintosh systems, but common sense dictates that there are likely to be similar pitfalls. To avoid such problems, you should do all of your editing of program code for this course on a Unix/Linux system, using either the emacs or vi editor. You may upload and download C/C++ source files to your personal system for backup, but you should probably not try to modify them there unless you are very savvy about avoiding the above kinds of problems.

File Naming Conventions

C source code files should be divided into two types:

Header files, whose name ends with the suffix ".h". These may include the following:
- documentation for the file
- #include, macro/symbol definitions, and conditional compilation directives
- constant, type, and structure definitions
- function prototypes
Always protect header files from multiple inclusion using the convention
```
   #ifndef _FILENAME_H
   #define _FILENAME_H
     ...
   #endif
```
Always use angle brackets for include files:
```
   #include <myfile.h> // OK - location of file is unspecified
   #include "myfile.h" // NOT OK - location of file is hard coded (relative)
   #include "/directory/myfile.h" // NOT OK - location of file is hard coded
                                  //   (absolute)
```
The reason: angle brackets allow the included file to be moved without editing the files in which it is included. Quotes force an edit of the #include directive whenever the relative locations of the includee and includor are changed or the absolute path of the includee is changed. It is much better to resolve these issues in the build record (makefile).
Implementation files, whose name ends with the suffix ".c". These may include the following, in this order:
- file header documentation
- #include directives
- constants
- locally used function prototypes (if needed)
- function implementations

Regardless of how an assignment is submitted, your instructor will specify a file naming convention that will allow your submitted work to be easily identified, among different assignments that you and other students submit for the course. It is essential that you follow the file naming convention for the assignment, or else your work may not be graded. For example, if the assignment says you are to name a file "prog1.c" and you name it "program1.c" it may not be graded.

Robust Coding

The following are some rules that I have found lead to more robust code. This is not exactly a matter of style, but more a matter of sound programming practice. Read about additional rules in the notes on secure coding.

Always check the results of all functions that can fail and return an error code, and handle the failure case in a safe way. Examples include malloc(), which returns NULL upon failure, and fork(), which returns the value -1 upon failure.
Explicitly initialize all variables, including all components of structures.
Always check for possible array/buffer overflows, and handle violations in a safe way.
Make no assumptions about the length and syntactic structure of inputs.
Make no assumptions about the validity of command-line arguments to programs.
Beware of dependencies on environment variables, including system calls whose effect can be modified by environment variables, which are implicit parameters to the program. For example, avoid calls to system(), and whenever using execve() verify both the security of the executable file and the environment variable values that are passed to it.
Make no assumptions about the length (in bytes) of any data type. Use strlen() and sizeof() where appropriate, but with care not to confuse pointers with objects pointed to.
Take care to avoid the possibility of free() being called more than once on the same object.
Take extreme care with pointer type conversions, including uses of void * (which is required by many operating system API calls), that the pointer actually points to a valid value of the target type.
Compile with warnings turned on, and pay attention to the warnings. In general, enable the gcc warning options including "-Wall -Wextra -pedantic". There should be no warnings, with the exception of some specific cases allowed by the assignment (e.g., use of gcc-specific extensions for uses of macros from the Linux kernel header list.h).
Whenever a function makes assumptions about its parameters, document them, especially where the function does not (or cannot) check its parameters for validity.
Beware of the potential effects of signals, which can be generated for and delivered to a program from outside at any time.
Write error checking and recovery code in a layered, systematic way, checking for errors "outside in", and recovering "inside out" (unwinding initializations and recovering resources). You may use goto (only) to implement a set of nested error recovery actions, similar to exception handlers in other languages, as practiced in the Linux kernel code.
1. Program command-line parameter and environment variable errors should be caught at start-up
2. Failures in module initialization code should be caught within the module, and generally cause program termination
3. Error recovery code should ensure that any resources not local to the process (e.g., objects in the filesystem namespace) are recovered and restored to a valid state
No error condition or failure should be entirely ignored. I have found it helpful to recognize three classes of errors, which need to be treated differently:
1. Fatal errors, from which no safe recovery is possible. These require termination of the program. After cleaning up any persistent objects (e.g., files) to a valid state, call exit() with an appropriate exit status value (positive) that indicates failure to the parent process. Depending on the nature of the failure, it may also be appropriate to issue a message to a system log file or the standard error stream, e.g., through a call to perror().
2. Failure of a function for which there is a convention regarding return values that covers failure cases. In this case, the return value of the function should be the appropriate failure code. The model is analogous to C-library and system calls, which generally return 0 upon success, and some other value if they fail.
3. Errors from which local recovery is possible in a way that allows correct continuation of the rest of the program. The error should still be logged, but execution may proceed, at least up to some predetermined point where further progress becomes impossible. An example of such errors would be error messages produced by a compiler for syntax errors. One would expect the compiler to continue execution through the end of the parsing phase, but not produce executable code. Another example would be an HTTP Web server, which aborts service of a request if the URL is ill-formed, logs the failure, and returns to a state where it is ready for the next request.
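The layered "outside in" checking and "inside out" recovery described above might look like the following sketch. The function copy_file(), its 0 / -1 return convention, and the 4096-byte buffer size are assumptions chosen for illustration, not a prescribed design. Each resource is acquired in order, and each label releases exactly the resources acquired before the corresponding failure point.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * copy_file - copy the contents of src_path to dst_path.
 * Preconditions: both paths are non-NULL, NUL-terminated strings that
 *   name different files.
 * Returns: 0 on success, -1 on failure. On failure after the
 *   destination was opened, the partial destination file is removed.
 */
int copy_file(const char *src_path, const char *dst_path)
{
    int status = -1;            /* failure until every step succeeds */
    FILE *src = NULL;
    FILE *dst = NULL;
    char *buf = NULL;
    size_t n = 0;
    const size_t bufsize = 4096;

    src = fopen(src_path, "rb");
    if (src == NULL)
        goto out;
    dst = fopen(dst_path, "wb");
    if (dst == NULL)
        goto close_src;
    buf = malloc(bufsize);
    if (buf == NULL)
        goto close_dst;

    while ((n = fread(buf, 1, bufsize, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n)
            goto free_buf;
    }
    if (ferror(src))
        goto free_buf;
    status = 0;

free_buf:
    free(buf);
close_dst:
    if (fclose(dst) != 0)       /* buffered data may fail to flush */
        status = -1;
    if (status != 0)
        (void)remove(dst_path); /* best effort; failure already reported */
close_src:
    (void)fclose(src);          /* read-only stream; nothing to lose */
out:
    return status;
}
```

The caller decides which class of error a failure belongs to: a program whose only job is the copy might treat -1 as fatal and call exit() with a positive status, while a server might log the failure and continue with the next request.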

Portability Issues

As explained in the Study Guide, learning to write portable code is one of the objectives of this course. Portability is generally achieved through adherence to widely supported standards, and avoiding dependence on implementation-specific features of the execution platform, compiler, libraries, and operating system. Several specific rules are given in the Study Guide for this course. In addition, please consider the following principles whenever you code.

Be conservative in your choice of standards. Many people are using old versions of operating systems, and old compilers, that probably are not completely up-to-date with the most recent standards. Even the most recent release of gcc (at the time of this writing) did not completely support the most recent C language standard (C99), and many Linux systems are running older versions of gcc. For example, at the time of this writing, the version of gcc on the program servers was behind the version on the linprog servers. So, a person concerned with portability, even across Linux systems, may want to avoid writing code that depends on new features introduced by C99. You can generally control which version of the language a compiler checks for; for example, the gcc option -std=c90 specifies the C90 standard.
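As a small illustration of what avoiding C99 features means in practice, the hypothetical function sum_ints() below is written to stay within C90: it uses block comments only, declares all variables at the start of the block, and uses no variable-length arrays.

```c
#include <stddef.h>

/*
 * sum_ints - add up the first n elements of the array a.
 * Preconditions: a points to at least n readable elements, or n is 0.
 * Returns: the sum; overflow of long is not checked in this sketch.
 */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    size_t i;               /* declared here, not inside the for loop */

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Assuming the code is saved in a file named sum.c (a hypothetical name), compiling with gcc -std=c90 -pedantic -Wall -Wextra -c sum.c should accept it, whereas a C99-style loop header such as for (size_t i = 0; ...) is expected to be rejected under those options.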

The same applies to libraries. The Unix/POSIX operating system service library function interfaces are even implemented by Microsoft's Windows operating systems. When compiling, pay attention to correct use of appropriate feature-test macro definitions (e.g., #define _XOPEN_SOURCE) to ensure that standard-compliant versions of header files are used. Beware that the POSIX and Open Group standards, like programming language standards, go through revisions, and a given implementation may not support the latest standard. Be careful about man-pages. They are generally specific to one OS version, and may mislead you with respect to what behavior is supported by POSIX. The Open Group has harmonized its Unix Standards to be consistent with the POSIX standard, and you can obtain access to the official Unix/POSIX man-pages from The Open Group's website for free, by signing up. Generally avoid usage that is specified as having "implementation-defined" behavior.
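A feature-test macro only has an effect if it is defined before the first header is included. The sketch below assumes the value 700 for _XOPEN_SOURCE, which requests the interfaces of the 2008 edition of the standard; the value appropriate for an assignment may differ. The wrapper function bounded_length() is hypothetical and exists only to show a call to strnlen(), which is specified by POSIX but not by the C90 or C99 language standards.

```c
/* Must come before any #include so that it affects every header. */
#define _XOPEN_SOURCE 700

#include <stddef.h>
#include <string.h>

/*
 * bounded_length - length of the string s, examining at most max bytes.
 * Preconditions: s points to at least max readable bytes, or to a
 *   NUL-terminated string shorter than that.
 * Returns: the number of bytes before the first NUL, or max if no NUL
 *   is found within the first max bytes.
 */
size_t bounded_length(const char *s, size_t max)
{
    return strnlen(s, max);
}
```

When compiling in a strict mode such as -std=c99, leaving out the macro definition typically leaves strnlen() undeclared, which shows up as a warning or error rather than as silent reliance on a platform-specific extension.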

For shell script portability, stick to the syntax of the standard sh shell, which is a subset of that supported by the bash shell, and begin the file with an indication of which shell should execute it, i.e., #!/bin/sh.

For makefiles and scripts used by other utilities such as awk, stick to the portable POSIX syntax, or at least verify that they work on both Linux and SunOS.

T. P. Baker.