C++ Program Structure

A C++ program must adhere to certain structural constraints.

A C++ program consists of a sequence of statements.
Every program has exactly one function called main.
Programs are built from one or more files, usually together with predefined libraries.
Statements are the smallest complete executable units of a program. Three basic kinds are:
1. Declaration statements
2. Expression statements
3. Compound statements -- sets of statements enclosed in curly braces { } (often called a block)
C and C++ programs usually consist of multiple files and at least one library.
A library is precompiled code available to the programmer to perform common tasks; the location of the standard library code is known to the compiler.
The library is described to the program via an interface, or header file.
The #include <> directive makes a library's header file part of your program; the compiled library code itself is linked in when the program is built.
Other files may also be made part of the program using #include <> directives. The locations of non-library files are not known to the compiler and thus must be given to the compiler explicitly, for example with the -I compile option.
The other form of the directive, #include " ", uses quotes instead of angle brackets. With most compilers, including g++, it searches first in the directory containing the file that has the #include line, and then in the same places as the <> form. This form is not recommended.

The form #include " " is not recommended because it builds an assumption about the location of files into the source code file itself. Using <> keeps the source files independent of where the project files are located, leaving location issues to the project build, where they are handled with the -I compile option. It is very common for project files to be developed in one directory and then moved into various directories after development. With <>, the code files are unaffected by such changes in file location; with " ", the files may have to be edited when their locations change.

Native Data Types

Native (aka "built in" or "atomic") data types are the types defined by the C++ language.

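Signed integer types: signed char, short, int, long (plain char is a distinct integer type that may be signed or unsigned, depending on the implementation)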
Unsigned integer types: unsigned char, unsigned short, unsigned int, unsigned long
Floating point types: float, double, long double
Special type: bool (has values true and false)

The sizes for the various types are implementation dependent, with some constraints. The size of char is always 1 byte by definition (where a byte has at least 8 bits), and the sizes must be non-decreasing as you read from left to right in these lists. To see what a particular installation uses, run this program:

#include <iostream>
int main()
{
   std::cout << "Size of bool           = " << sizeof(bool)   << " bytes\n\n";

   std::cout << "Size of char           = " << sizeof(char)   << " bytes\n";
   std::cout << "Size of short          = " << sizeof(short)  << " bytes\n";
   std::cout << "Size of int            = " << sizeof(int)    << " bytes\n";
   std::cout << "Size of long           = " << sizeof(long)   << " bytes\n\n";

   std::cout << "Size of unsigned char  = " << sizeof(unsigned char)   << " bytes\n";
   std::cout << "Size of unsigned short = " << sizeof(unsigned short)  << " bytes\n";
   std::cout << "Size of unsigned int   = " << sizeof(unsigned int)    << " bytes\n";
   std::cout << "Size of unsigned long  = " << sizeof(unsigned long)   << " bytes\n\n";

   std::cout << "Size of float          = " << sizeof(float)  << " bytes\n";
   std::cout << "Size of double         = " << sizeof(double) << " bytes\n";
   std::cout << "Size of long double    = " << sizeof(long double) << " bytes\n";
   return 0;
}

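The sizeof operator (despite the parentheses, it is an operator, not a function) may be applied to user-defined types as well as native types. It also applies to variables and other expressions. The result is in units of bytes, that is, multiples of sizeof(char).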

Declared Variable Attributes

Every declared variable has the following attributes:

A name, chosen by the programmer, aka identifier
A type, specified in the declaration of the variable
A size, determined by the type
A value, the data stored in the variable's memory location
An address, the location in memory where the value is stored
The storage class, determining how long the variable exists in memory
The scope, determining where in the source code the variable is "visible"
The linkage, determining whether the name can be shared among the files of a multifile program

The declaration of variables is discussed later in this chapter.

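Declared variables are static in a broad sense (not to be confused with the static storage class described later in this chapter), meaning that (1) they have names that are fixed and determined at the time and place they are declared and (2) the amount and placement of memory bound to the variable are determined when the program is compiled. We say that these variables are bound to memory at compile time. (For a local variable, the compiler fixes its size and its position within the function's stack frame; the memory itself is set aside each time the function is called.) When a program is compiled, a symbol table is created mapping each such variable name to its type, size, address, and other attributes. Another kind of variable is bound at run time and called dynamic. These are discussed in another chapter.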

Naming Variables

The names of variables (and other identifiers chosen by the programmer, such as names of constants, classes, and types) are subject to constraints:

Identifiers may consist of letters, digits, and underscores
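An identifier must start with a letter or an underscore, not a digit; leading underscores are best avoided, since names that begin with an underscore followed by an uppercase letter, or that contain a double underscore anywhere, are reserved for the language implementation and its libraries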
C++ is case sensitive
Reserved words may not be used as identifiers

Here is a list of the reserved keywords in C++ as of the 1998 standard:

asm          auto        bool              break
case         catch       char              class
const        const_cast  continue          default
delete       do          double            dynamic_cast
else         enum        explicit          export
extern       false       float             for
friend       goto        if                inline
int          long        mutable           namespace
new          operator    private           protected
public       register    reinterpret_cast  return
short        signed      sizeof            static
static_cast  struct      switch            template
this         throw       true              try
typedef      typeid      typename          union
unsigned     using       virtual           void
volatile     wchar_t     while

The alternative operator spellings and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, and xor_eq are also reserved. Later standards (C++11 and beyond) add further keywords, such as constexpr, decltype, noexcept, nullptr, and static_assert.

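Note that other names are defined in the standard libraries. They are not keywords, but reusing them as identifiers invites confusion and should be avoided. Examples include size_t (defined in <cstddef> and several other headers) and the library classes bad_cast, bad_typeid, and type_info. By contrast, wchar_t is a built-in type and a keyword, not a library name.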

Declaring Variables

The basic format for declaration is:

typeName variableName;

Note that the left-hand item is a type, and must be understood as such by the compiler. The right-hand item is the identifier chosen by the programmer. The following are some examples of declarations:

int x;
float y;
int a, b, c;        // can also list several variables in one declaration
                    //  statement, separated by commas
int d=0, e=0, f;    // can also initialize variables in
                    //  declaration statements
int g(0), h(0), i;  // alternate syntax for initialization

A variable can be declared constant by using the keyword const; a constant must be initialized in the same statement, as in the following example:

const double PI = 3.14159;

Here are more examples:

int main()
{
   int x;
   float average;
   char letter;
   int y = 0;
   int bugs, daffy = 3, sam;
   double larry = 2.4, moe = 6, curly;
   char c, ch = 't', option;
   long a;
   long double ld;
   unsigned int u = 12;
}

Literals

integer literal -- an actual integer number written in code (4, -10, 18)
floating-point literal -- an actual decimal number written in code (4.5, -12.9, 5.0) -- Note: these have type double; add the suffix f (as in 4.5f) for type float
character literal -- a character in single quotes: ('F', 'a', '\n')
string literal -- a string in double quotes: ("Hello", "Bye", "Wow!\n")

Comments

A comment in a program is a portion that is ignored by the compiler. The very important purpose of comments is to serve the humans who create, read, and maintain the code. The C block-style technique of commenting carries over to C++, i.e., comments can be enclosed in delimiters /* (comment here) */. This form of comment is useful for multiple lines of commentary.

/* This is a comment.
   It can stretch over
   several lines.  */

C++ allows an additional comment style, designed for shorter remarks embedded in code but suitable for only one line of comment:

int x;    // This is a comment that ends at the end of this line
x = 3;    // This is a comment that ends at the end of this line

Everything from the double slash // to the end of the line is a comment.

Operators

Operators can be thought of as functions with special evaluation syntax. It's a good idea to keep this in mind, because function syntax, with names such as operator+, must be used when redefining (overloading) operators, a topic we will study in this course.

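Here is an example of the familiar addition operator used with operator syntax and operator function syntax. The function form works only where the operator is defined as a function, as it is for the library type std::string; for native types such as int, operator+ is built into the language and cannot be called by name.

std::string x = "Hello, ", y = "world", z;   // three strings (requires #include <string>)
z = x + y;            // operator syntax
z = operator+(x,y);   // operator function syntax

The last two lines of this code have identical behavior: z is assigned the concatenation of x and y, namely "Hello, world".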

A unary operator is an operator function that has one operand. A binary operator is an operator function that has two operands. A ternary operator is an operator function that has three operands. With the exception of the "conditional expression" operator expr ? expr : expr inherited from C, all C++ operators are either unary or binary.

C++ has a very rich set of operators. (There are 67 C++ operators listed on pp. 120-121 of [Stroustrup], with 18 levels of precedence.) We will discuss only a few here. Others will be introduced as they are needed.

Arithmetic Operators

The basic arithmetic operators are defined as native for integer types as follows:

Name        Symbol   Type     Usage
Add         +        binary   x + y
Subtract    -        binary   x - y
Multiply    *        binary   x * y
Divide      /        binary   x / y
Modulo      %        binary   x % y
Minus       -        unary    -x

These operators perform arithmetic as expected for integers. Note that integer division x/y produces the quotient, with any fractional part discarded, and modulo x%y produces the remainder when x is divided by y. All but modulo are also defined (overloaded) for floating-point types and have the expected meaning in that context; % applies only to integer types. Here is an illustration:

int   p = 23, q = 5, r;
float x = 23, y = 5, z;
r = p / q; // r has the value 4
r = p % q; // r has the value 3
z = x / y; // z has the value 4.6

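When doing an arithmetic operation on two operands of the same type, the result is also that type, with one exception: operands of types smaller than int, such as char and short, are first promoted to int. What about operations on mixed types? Suppose we have the declarations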

int x = 5; 
float y = 3.6;

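If we do (x + y), what type will the result have? In this case, a float. The rule is: when an arithmetic operation has operands of two different types, the operand of the "smaller" type is converted to the "larger" type (floating-point types rank above all integer types), and the result has that type. The result of (x + y) will be 8.6 (approximately, since 8.6 has no exact binary representation).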

Please keep in mind that we are glossing over the issue of internal representation. A computer uses binary representation for internal storage of numbers. We are using decimal representation in this discussion. Decimal notation is only an external representation for human use. This is what we would see if we output the values to screen.

Operator Precedence

As noted above, there are 18 levels of operator precedence in C++, far more than this review can cover. (We will encounter many of these operators during this course, but not all.) The full set is rich but hard to remember. The best rule is: When in doubt, use parentheses to force the order of operator evaluation.

For the more familiar and oft-used operators, however, it is easy to remember their syntax, precedence, and associativity. Here is a table of most of the C++ operators:

Common C++ Operators, Grouped by Precedence (High to Low)

Operators within the same group have the same precedence.

Name                              Usage

binary scope resolution           class_name :: member
binary scope resolution           namespace_name :: member
unary (global) scope resolution   :: name

value construction                type ( expr )
run-time checked conversion       dynamic_cast<type> ( expr )
compile-time checked conversion   static_cast<type> ( expr )
unchecked conversion              reinterpret_cast<type> ( expr )
const conversion                  const_cast<type> ( expr )
post increment                    lvalue++
post decrement                    lvalue--
member selection                  object.member
member selection                  pointer->member
bracket operator                  pointer [ expression ]
function call                     function ( parameter list )

size of object                    sizeof expression
size of type                      sizeof ( type )
pre increment                     ++lvalue
pre decrement                     --lvalue
not                               ! expression
unary minus                       - expression
address of                        & lvalue
dereference                       * pointer
create                            new type
destroy                           delete pointer

member selection                  object.*pointer-to-member
member selection                  pointer->*pointer-to-member

multiply                          expr * expr
divide                            expr / expr
modulo                            expr % expr

add                               expr + expr
subtract                          expr - expr

shift left                        expr << expr
shift right                       expr >> expr

less than                         expr < expr
less than or equal                expr <= expr
greater than                      expr > expr
greater than or equal             expr >= expr

equal                             expr == expr
not equal                         expr != expr

bitwise AND                       expr & expr

bitwise XOR                       expr ^ expr

bitwise OR                        expr | expr

logical AND                       expr && expr

logical OR                        expr || expr

conditional expression            expr ? expr : expr

assignment                        lvalue = expr
multiply and assign               lvalue *= expr
divide and assign                 lvalue /= expr
modulo and assign                 lvalue %= expr
add and assign                    lvalue += expr
subtract and assign               lvalue -= expr
shift left and assign             lvalue <<= expr
shift right and assign            lvalue >>= expr
bitwise AND and assign            lvalue &= expr
bitwise OR and assign             lvalue |= expr
bitwise XOR and assign            lvalue ^= expr

throw exception                   throw expr

comma (sequencing)                expr , expr

Note that the arithmetic operators have relative precedence in the language that follows normal mathematical usage. Note also that assignment (and its compound forms, such as +=) has very low precedence, so that in a statement such as

x = a + b * c;

the evaluation is as you would hope and expect, namely x is assigned the value a + ( b * c ). There are surprises lurking in all this complexity, however, so remember the "when in doubt" rule.

Increment and Decrement

C++ has a number of unary operators. Among the most used are the four increment/decrement operators:

int x,y;
...
++x;   // prefix increment   same as x = x + 1; returns reference to (new) x
x++;   // postfix increment  same as x = x + 1; returns value of old x
--x;   // prefix decrement   same as x = x - 1; returns reference to (new) x
x--;   // postfix decrement  same as x = x - 1; returns value of old x

Note the distinction between the pre- and post- versions. The prefix returns a reference to the (newly updated) variable. The postfix returns the value of the variable before updating it. The behaviors are illustrated in this code example:

x = 2;
y = ++x;    // x and y have the value 3
y = x++;    // y has the value 3 and x has the value 4

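Because the postfix versions of increment/decrement must save a copy of the old value to return, they can be less efficient than the prefix versions. For native types, compilers usually generate the same code for both; the difference matters mainly for class types, such as iterators. It is therefore good practice to use the prefix versions unless there is a specific need for postfix.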

Operator Associativity

Each operator has a default associativity used when an otherwise ambiguous expression is formed. For example, the statement

sum = x + y + z;

is technically ambiguous, because operator+( , ) requires exactly two arguments, and there are three in the statement. The default associativity takes over in such situations to provide consistent meaning:

sum = x + y + z;    
sum = (x + y) + z;  // statement meaning identical to first

That is, first x and y are added, then z is added to the result. For this operator, default associativity is left-to-right, or LR. Most binary operators have LR default associativity. A notable exception is the assignment operator = which associates right-to-left (RL):

a = b = c;    // valid statement
a = (b = c);  // identical meaning

That is, first c is assigned to b, then b is assigned to a. The result is that all three variables a, b, and c have the same value.

Assignment and Equality Operators

The symbol = has ambiguous meaning in algebra, sometimes asserting that two things have the same value (as in "let x = y"), and other times asking the question whether two things have the same value (as in "solve x = y"). These two usages must be separated in a programming language. The first is assignment and is done in C/C++ with operator =. The second is equality and is done in C/C++ with operator ==. These two operators are very different in meaning and usage. Unfortunately, they are very similar in appearance, which can cause problems debugging programs when they are inadvertently interchanged.

Assignment is an operator that first, as a side effect, makes its Lvalue (the operand on its left) equal to its Rvalue (the operand on its right), and second returns a reference to the (new) Lvalue. Here are some example usages:

x = 5;       // x is assigned the value 5
y = 10;      // y is assigned the value 10
z = x + y;   // z is assigned the value of the expression x + y,
             //  which is evaluated first, obtaining 15, which is assigned to z

Equality is an operator that returns true or false (a bool value in C++; in C, an int with value 1 or 0), depending on whether the operands are in fact equal. Equality is commonly used to test for conditional branching or loop termination:

if ( x == y )         // conditionally execute one of two statements
  z = 1;
else
  z = 2;

do 
{
  whatever();
}
while (x == 100);   // conditionally terminate loop

There is an entire family of assignment operator derivatives, such as operator += and operator &=. There is another family of equality/inequality operators such as operator < and operator >=.

Implicit Type Conversion

Whenever a variable of an unexpected type is used in an expression, either the ambiguities must be resolved by implicit, or "automatic" type conversion, or an error will occur. The general rule is that when there is a known rule for converting the unexpected type to the expected type, that rule will be invoked and computation can proceed.

For example, when the types on the left and right sides of an assignment statement do not match, that is, the Rvalue and Lvalue have different types, the assignment statement is allowed to proceed if and only if there is a way provided to convert from the Rvalue type to the Lvalue type. For native types, conversions from smaller types to larger types are always available and safe. Conversions in the reverse direction (narrowing conversions) are also allowed, but they can silently lose information, and a compiler may or may not warn about them. (Care must also be taken when converting between signed and unsigned types of the same size.) Similarly, when mixed types appear in an arithmetic expression, the operands of each operator are converted to the larger of their two types:

char          a, b;
int           m, n;
float         x, y;
unsigned int  u, v;
m = a;  // OK
a = m;  // allowed, but dangerous - the value may not fit in a char
x = m;  // OK
u = m;  // dangerous - possibly no warning
x = a + n; // result of (a + n) is type int; converted to type float for assignment

Type conversion is another place where a "when in doubt" rule should be used: When in doubt, make type conversions explicit.

Explicit Type Conversion: Casting

It is excellent practice to always explicitly convert types rather than rely on the compiler, which may not always know the programmer's intent. Most experienced programmers use implicit type conversion only within one of these two families of native types:

Signed Family = {char, short, int, long, float, double, long double}
Unsigned Family = {unsigned char, unsigned short, unsigned int, unsigned long}

and otherwise use explicit type conversion in the form of cast operators. The C cast operator is invoked like this:

c = (char)y;   // cast a copy of the value of y as a char, and assign to c
x = (int)b;    // cast a copy of the value of b as an int, and assign to x

Standard C++ still accepts the C cast operator, but C++ also has a richer set of casting operators that give the programmer better control of how and when the type conversion occurs. For ordinary value conversions, the analog of C casting is the operator static_cast<type_name>(expr), which converts the value of expr to type type_name. This new style cast is invoked like this:

c = static_cast<char>(y); 
x = static_cast<int>(b);

There are three other C++ cast operators: dynamic_cast<type_name>(expr), const_cast<type_name>(expr), and reinterpret_cast<type_name>(expr). The angle brackets in these operators use the same notation as template arguments, a topic we will cover later in the course.

Scope

The scope of a variable is the portion of the source code where the variable is valid, or "visible", to the computational context. Scope is determined implicitly by program structure.

A variable that is declared outside of any compound blocks and is usable anywhere in the file from its point of declaration is called a global variable and is said to have global scope. A variable declared within a block (i.e. a compound statement) has scope only within that block.

C++ allows the declaration of variables anywhere within a program, subject to the declare-before-use rule. The original C standard (C89) requires variable declarations at the beginning of a block; C99 and later relax this. Here is code illustrating the scope of four variables:

#include <iostream>
                                //   scopes:
                                //  x  i  j  k
float x;                        //  |
int main()                      //  |
{                               //  |
  int i;                        //  |  |
  for (int j = 0; j < 100; ++j) //  |  |  |
  {                             //  |  |  |
    std::cin >> i;              //  |  |  |
    int k = i;                  //  |  |  |  |
    // more code                //  |  |  |  |
  }                             //  |  |
  return 0;                     //  |  |
}                               //  |

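Note that x is global, while i, j, and k are local: j is visible only within the for statement, and k only from its declaration to the end of the loop body. The following code is more subtle.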

#include <iostream>
int main()
{
  std::cout << "\nStarting Program\n";
  {
    int x = 5;                  // declare new variable
    std::cout << "x = " << x << '\n';
    {
      int x = 8;
      std::cout << "x = " << x << '\n';
    }
    std::cout << "x = " << x << '\n';
    {
      x = 3;
    }
    std::cout << "x = " << x << '\n';
  }
}

You can run the program to be sure you understand how scope rules affect the values of the variables.

Namespaces

Namespaces are also used to limit the scope of identifiers. A namespace is created using the namespace keyword as follows:

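// filename: mystuff.h
namespace mystuff
{
  // declare, define things here, such as
  inline int myfunction (int x)   // inline: a function defined in a header should be inline
  {
    // code here
    return x;                     // placeholder so the function returns a value
  }
} // end namespace mystuff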

Any declarations and definitions made within the namespace block will not be in the global namespace, but will be in the namespace mystuff. Thus statements using items from mystuff must "resolve" the scope with a scope resolution operator ::, as in this code:

#include <mystuff.h>
int main()
{
  int x, y = 1;
  x = myfunction(y);          // error - unrecognized identifier
  x = mystuff::myfunction(y); // OK - namespace resolved
}

Implicit namespace resolution may be used by invoking a using directive, as follows:

#include <mystuff.h>
using namespace mystuff;      // makes the names in mystuff visible without qualification
int main()
{
  int x, y = 1;
  x = myfunction(y);          // OK
  x = mystuff::myfunction(y); // OK
}

A namespace may be opened and added to in several places, which enhances the convenience of their use. For example, several different files could add items to the mystuff namespace. Namespaces are extremely useful when more than one person is working on code for a single project, a situation quite common in the professional world. Using namespaces can prevent name conflicts and ambiguities created when two programmers (or one programmer at two different times) happen to use the same identifier for different purposes.

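In general, global variables should be avoided. A using directive has a similar drawback: it makes every name in the namespace visible without qualification, much as if it were declared globally, which reopens the door to name conflicts. For this reason, the using directive should also be avoided in this course.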

Storage Class and Linkage

The storage class of a variable determines the period during which it exists in memory. (Note that this period must be at least as long as the variable is in scope, but it may exceed that time.) There are two storage classes: automatic and static. Confusingly, C++ provides five storage class specifiers, which determine not just the storage class but also other things, such as linkage and how the variable is treated by various components of a running program.

The storage class specifiers are as follows:

Storage class automatic: The variable is ensured to be in memory only when it is in scope.
1. Storage class specifier: auto
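  This is the default specifier for all local variables, hence it is rarely seen used explicitly. It states the variable has storage class automatic. (In C++11 and later, auto no longer has this meaning; it instead tells the compiler to deduce the variable's type from its initializer.)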
2. Storage class specifier: register
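  This specifies storage class automatic and also advises the compiler to place the variable in a CPU register instead of normal memory. This is only an optimization suggestion and has no effect on the access logic for the variable. (Modern compilers ignore the hint; register was deprecated in C++11 and removed as a storage class specifier in C++17.)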
Storage class static: The variable is ensured to be in memory during the entire execution of the program, regardless of whether the variable is in or out of scope.
1. Storage class specifier: extern
  This is the default for all global variables, including ordinary variables and function names. Extern variables are visible even across multiple files of source code. However, when an extern variable is defined in one file and used by another file, it must be declared in the second file using the extern specifier. (One exception: in C++ a global const variable has internal linkage by default, as if it were declared static.)
2. Storage class specifier: static
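  The same as extern except that the variable has internal linkage: it can be used only in the file in which it is defined. A local variable may also be specified as static, changing its storage class from the default automatic to static. Its scope is still limited to the enclosing block, but it now exists for the entire run of the program and keeps its value between function calls.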
3. Storage class specifier: mutable
  This specifier is used exclusively for classes and will be discussed later in that context.

These specifiers thus affect the storage class, the linkage, and in some cases the scope of the variable.

C++ Standard I/O

In C, I/O (Input/Output) is handled with functions found in stdio.h: printf and scanf. These are examples of "formatted" I/O statements and assume a certain file-oriented format.

In C++, I/O is handled through objects called streams. This is a very useful, flexible, and programmer-modifiable system; it is also somewhat complex. We will begin using two streams cin and cout that are pre-defined in the library iostream. cin is an object of type istream and cout is an object of type ostream. These stream objects reside in the namespace std. cin is typically bound to the keyboard and cout is typically bound to the screen, although files can be substituted using re-direction.

To use these stream objects:

1. Include the library where they are defined: #include <iostream>
2. Resolve the namespace where they are defined: std::cin and std::cout
3. Typically, work with the input and output operators >> and <<, respectively

/* example use of cout with output operator */
#include <iostream>            // includes library
std::cout << "Hello World";    // sends string "Hello World" to screen
std::cout << 'a';              // sends character 'a' to screen
std::cout << x << y << z;      // sends values of x, y, and z to screen, in that order

/* example use of cin with input operator */
#include <iostream>            // includes library
std::cin >> x;                 // read entered value into x
std::cin >> a >> b >> c;       // read three entered values to a, b, c (in that order)

Note that literals cannot be used on the right side of the input operator. The input operator reads data IN to a variable location: the right side of the operator must specify an Lvalue. The input operator is sometimes called the extraction operator - it "extracts" data from the input stream object and puts it into the variable. Similarly, the output operator is sometimes called the insertion operator - it "inserts" a copy of the data into the output stream object. Some people find this terminology somewhat convoluted and prefer "input/output" to "extraction/insertion".

Notes on Archaic Code

C++ was officially standardized in 1998, so some older code will not reflect some of the newer features. A few worth noting:

"int main()" vs "void main()"
The standard requires int as the return type of the main function; "void main()" was never standard, although some older compilers accepted it. Returning int allows one program to receive a message of success or failure (with a given error code) from another calling program. This is commonly used, for example, in operating system processes. Note also that the version with "int main()" has a return statement at the end, returning an integer value (in standard C++, reaching the end of main without one is equivalent to returning 0). In this context, 0 is typically used to report success, and nonzero values are used to report errors.

iostream vs. iostream.h
The older, pre-standard naming convention for library header files uses the .h extension, as in <iostream.h>. The standard headers drop the extension, as in <iostream>, and the same scheme extends to the older C libraries, whose headers gain a c prefix: <stdio.h> becomes <cstdio>, and <stdlib.h> becomes <cstdlib>.

Some older compilers (like Borland C++ 5.0) do not recognize or understand "using" statements and namespaces, as they pre-date the 1998 standardization. The compilers in the FSU CS student computing environment are versions of the GNU g++ compiler. They are in fairly good compliance with the C++ standard. The standards committee has since adopted revised standards for C++ (C++11 and later), each presenting yet another target for textbook and compiler writers as this rich language evolves.

References

The C++ Programming Language (3rd ed.), Bjarne Stroustrup, Addison-Wesley, 1997.
C++ Primer (4th ed.), Stanley B. Lippman, Josée Lajoie, Barbara E. Moo, Addison-Wesley, 2005.
