An Introduction to C++ Programming - Part 1/13

Written by Björn Fahller

Introduction
C++ is a programming language substantially different from C. Many see C++ as "a better C than C", or as C with some add-ons. I believe that to be wrong, and I intend to teach C++ in a way that makes use of what the language can offer. C++ shares the same low level constructs as C, however, and I will assume some knowledge of C in this course. You might want to have a look at the C introduction course to get up to speed on that language.

Basic I/O
All intro courses in programming begin with a "Hello World" program [except those that don't - Ed], and so does this one. Line 1 includes the header "iostream.h", which is needed for the input/output operations. In C++, writing something on standard output is done by: cout << whatever; Here "whatever" can be anything that is printable: a string literal such as "Hello EDM/2", a variable, or an expression. If you want to print several things, you can do so at the same time with: cout << expr1 << expr2 << expr3 << ...; Again, expr1, expr2 and expr3 represent things that are printable.

In the "Hello EDM/2" program above, the last expression printed is "endl", a special expression called a manipulator, which moves the cursor to the next line. (Manipulators, and details about I/O, will be covered in a complete part sometime this fall.) It's legal to cascade more expressions to print after "endl", and doing so means those values will be printed on the next line.

As usual, string escapes can be used when printing: cout << "\"\t-\\\"\t" << endl; gives the result: "      -\"      That is, a quote, a tab, a dash, a backslash, a quote and a final tab; the gaps in the printed result are tabs (\t).

Reading values from standard input is done with: cin >> lvalue; Here "lvalue" must be something that can be assigned a value, for example a variable. Just as with printing, several things can be read at the same time by cascading the reads: cin >> v1 >> v2 >> v3 >> ...;

Loop variable scope and defining variables
Now let's add some things to this program. For example, let it print the parameters it was started with. The result of compiling and running it is probably not surprising, but if you know C, this does look interesting, does it not? The loop variable used for the "for" loop is defined in the loop itself. One of the minor, yet very useful, differences from C is that you can leave defining a variable until you need it. In this case, the variable "count" is not needed until the loop, thus we don't define it until the loop.

The loop is in an extra block (a { ... } pair is called a block) for compatibility between older and newer C++ compilers. For older C++ compilers, such as VisualAge C++, the variable "count" is defined from the "for" loop until the "}" before "return". For newer C++ compilers, the variable is defined within the loop only.

Here's a "for" loop convention that's useful when you want your programs to work both with new and old C++ compilers:

If you define a loop variable, and want it valid within the loop only, enclose the loop in an extra block. If you define a loop variable and want it valid after the loop body, define the variable outside the loop.

The former guarantees that the loop variable will not be defined after the last "}", be it on a new or old compiler. The latter guarantees that the loop variable *will* be defined after the loop for both new and old compilers.

Usually you don't want to use the loop variable after the loop, so the latter construction is not used frequently.

Let's have a closer look at the rules for when a variable is available and when it is not. First we can see two new things. C++ has two kinds of comments: the C style "/*", "*/" pair, and the one line "//" comment. For the latter, everything following "//" on a line is a comment [to the end of line - Ed].

The other new thing is the parameter list of "main". The second parameter is nameless. C++ allows you to skip the name of a parameter you are not going to use. Normally of course you do not have a parameter in the list that you're not going to use, but the "main" function is supposed to have these parameters (or void) so there's not much choice here. By not giving the parameter a name we save ourselves from a compiler warning like 'Parameter "argv" of function "main" declared but never referenced'.

So then, let's look at the variables.

At **1** a global variable called "i" is defined, and initialised with the value 5.

At **2** this global variable is printed. Yes, there is a variable called "i" in "main" but it has not yet been defined, thus it is the global one that is printed. [Note in the example how the variable "i" in "main" is defined the line after the output at **2**; not just given a value there, but actually declared as a variable there - Ed]

At **3** the auto variable "i" of "main" is defined. From this point on, trickery is needed (see **9**) to reach the global "i" from within "main," since this "i" hides the name.

This is why, at **4**, the variable defined at **3** is printed, and not the global alternative. The same thing happens at **5**.

At **6** yet another variable named "i" is defined, and this one hides the variable defined at **3** and the global alternative from **1**. The global "i" can still be reached (see **9**), but the one from **3** is now unreachable.

As expected, the variable printed at **7** is the one defined at **6**.

Between **7** and **8** however, the "i" defined at **6** goes out of scope. It ceases to exist, it dies. Thus, at **8** we can again reach the "i" defined at **3**.

At **9** a cheat is used to reach the global "i" from within main. The "::" operator, called the scope operator, tells C++ to look for the name in the global scope, which makes it reach the variable from **1**.

[Note: a good C++ compiler will warn you if you declare a variable with the same name as one already accessible to your function (eg like at **6**), to alert you to the possibility that you are referring to the wrong variable. This is often described as one variable "shadowing" the other one - Ed]

Function overloading
C++ allows you to define several different functions with the same name, provided their parameter lists differ. A small example is a pair of functions called "print", one taking an "int" and one taking a "const char*". Handled right, this can greatly reduce the number of names you have to remember. To do something similar in C, you'd have to give the functions different names, like "print_int" and "print_string". Of course, handled wrong, it can cause a mess. Only overload functions when they actually do the same thing; in this case, printing their parameters. Had the functions been doing different things, you would soon forget which of them did what. Function overloading is powerful, and it will be used a lot throughout this course, but everything with power is dangerous, so be careful.

To differentiate between overloaded functions the parameter list is often included when mentioning their name. In the example above, I'd say we have the two functions "print(int)" and "print(const char*)", and not just two functions called "print". [Compilers will generally do something similar when reporting error messages as well, so if you have used function overloading in your program look closely at which of the functions the compiler is concerned about - Ed]

Error handling
Unlike C, C++ has a built-in mechanism for dealing with the unexpected, and it's called exception handling. (Note: if you're experienced in OS/2 programming in other languages, you might have used OS/2 exception handlers; this is not the same thing. This is a language construct, not an operating system construct.) Exceptions are a function's means of telling its caller that it cannot do what it's supposed to do. The classic C way of doing this is using error codes as return values, but return values can easily be ignored, whereas exceptions cannot. Exceptions also allow us to write pure functions, that either succeed in doing the job, thus returning a valid value, or fail and terminate through an exception.

This last sentence is paramount to any kind of error handling. For a function there are only two alternatives: it succeeds in doing its job, or it does not. There's no "sort of succeeded." When a function succeeds, it returns as usual, and when it fails, it terminates through an exception. The C++ lingo for this termination is to "throw an exception." You can see this as an incredibly proud and loyal servant, that does what you tell it to, or commits suicide. When committing suicide, however, it always leaves a note telling why. In C++, the note is an instance of some kind of data, any kind of data, and being the animated type, the function "throws" the data towards its caller, not just leaves it neatly behind.

Let's look at an example of exception handling, here throwing a character string. The mini program shows the mechanics of C++ exception handling. The function prototype for "divide" at //**1 adds an exception specification "throw (const char*)". Exceptions are typed, and a function may throw exceptions of different types. The exception specification is a comma separated list, showing what types of exceptions the function may throw. In this case, the only thing this function can throw is character strings, specified by "const char*".

Any attempt to do something, when you want to find out if it succeeded or not, must be enclosed in a "try/catch" block. At //**2 we see the "try" block. A try block is *always* followed by one or several "catch" blocks (//**3). If something inside the "try" block (in this case, a call to "divide") throws an exception, execution immediately leaves the "try" block and enters the "catch" block with the same type as the exception thrown. Here, when "divide" is called with a dividend of 0, a "const char*" is thrown, the "try" block is left and the "catch" block entered. If no "catch" block matches the type of exception thrown, execution leaves the function, and a matching "catch" block (if any) of its caller is entered. If no matching "catch" block is found when "main" is reached, the program terminates.

When a function finds that it cannot do whatever it is asked to do, it throws an exception, as shown at //**4. If the exception thrown does not match the exception specification of the function, the program terminates.

Compile and test the program: (If you're using VisualAge C++, you don't need any special compiler flags to enable exception handling, and for Watcom C++, use /xs.)

This exception handling can be improved, though. As I mentioned above, exceptions are typed. This is a fact that can, and should, be exploited. In the program above, we have little information, other than that something's gone wrong, and that we can see exactly what by reading the string. The program, however, cannot do much about the error other than printing a message, and the message itself is not very informative either, since we don't know where the error originated from anyway. If we instead create a struct type, holding more information, we can do much better. If we create different struct types for different kinds of errors, we can catch the different types (separate "catch" blocks) and take corresponding action. Here's an attempt at improving the situation:

Compile and run it. What do you think of this? Do you think the code is messy? How would you have implemented the same functionality without exception handling? The code is a bit messy, but part of the mess will be removed as you learn more about C++, and the other part is due to handling the error situations. It's frightening how frequent lack of error handling is, but as I read in a calendar "Unpleasant facts [errors] don't cease to exist just because you chose to ignore them." Also, errors are easier to handle if you take them into account in the beginning, instead of, as I've seen far too often, adding error handling afterwards. There are problems with the code above, as will be mentioned soon, but one definite advantage gained by using exceptions is that the code for error handling is reasonably separated from the parts that do the actual job. This separation will become even clearer as you learn more C++.

Programming by contract
While I haven't mentioned it, I've touched upon one of the most important ideas for improving software quality. As you may have noticed in the function prototypes, I did not only add an exception specifier, there was also a comment mentioning a precondition. That precondition is a contract between the caller of the function and the function itself. For example, the precondition for "divide" is "dividend != 0". This is a contract saying, "I'll do my work if, and only if, you promise not to give me a zero dividend." This is important, because it clarifies who is responsible for what. In this case, it means that it's the caller's job to ensure that the dividend isn't 0. If the dividend is 0, the "divide" function is free to do whatever it wants. Throwing an exception, however, is a good thing to do, because it gives the caller a chance.

Another very important part of programming by contract, that I have not mentioned, even briefly, is something called a post condition. While the precondition states what must be true when entering a function, the post condition states what must be true when leaving the function (unless left by throwing an exception). The functions used above do not use post conditions, which is very bad. Post conditions check if the function has done its job properly, and if used right, also serve as documentation for what the function does. Take for example the "divide" function. A post condition should say something about the result of it, in a way that explains what the function does. It could, for example, say:

This states two things: It *is* a division function, not something else just happening to have that name, and the result is rounded and will stay within a promised interval.

To begin with, scrupulously using pre- and post-conditions forces you to think about how your functions should behave, and that alone makes errors less likely to slip past. They make your testing much easier, since you have stated clearly what every function is supposed to do. (If you haven't stated what a function is supposed to do, how can you be sure it's doing the right thing?) Enforcing the pre- and post-conditions makes errors that do slip by anyway easier to find.

When using "Programming by Contract", exceptions are a safety measure, pretty much like the safety belt in a car. The contract is similar to the traffic regulations. If everybody always follows the rules, and when in doubt, uses common sense and behaves carefully, no traffic accidents will ever happen. As we know, however, people do break the rules, both knowingly, and by mistake, and they don't always use common sense either. When accidents do happen, the safety belt can save your life, just as exceptions can. Note that this means that exceptions are *not* a control flow mechanism to be actively used by your program, just as much as the safety belt isn't (someone who makes active use of the safety belt as a control mechanism of the car would by most people be considered pretty dangerous, don't you think?) They're there to save you when things go seriously wrong.

OK, so, in the light of the above, have you found the flaw in my test program above yet? There's something in it that poorly matches what I've just mentioned above. Have a look again, you have the time I'm sure.

Found it? Where in the program do I check that I send the functions valid parameters? I don't. The whole program trusts the exception handling to do the job. Bad idea. Don't do that, not for pre- and post-conditions anyway. Pre- and post-conditions are *never* to be violated, ever. It's inexcusable to violate them. That's actually what they're for, right? When having a precondition, always make sure you're conforming to it, don't trust the error handling mechanism. The error handling mechanism is there to protect you when you make mistakes, but you must always try to do your best yourself.

struct
You've seen one subtle difference between structs in C and structs in C++. There are quite a few very visible differences too. Let's have a look at one of them: constructors. A constructor is a well defined way of initialising an object, in this case an instance of a struct. When you create an instance of a struct in C, you define your variable, and then give the components their values. Not so in C++. You declare your struct with constructors, and define your struct instance with well known initial values.

Here's how you can do it. "BoundsError" is a struct with no data, and it's used entirely for error checking. For this example, no data is needed for it. The struct "Range" has two components, "lower" and "upper" (usually referred to as member variables), and a constructor. You recognise a constructor as something that looks like a function prototype, declared inside the struct, and with the same name as the struct. Since C++ allows function overloading on parameter types, it is possible to specify multiple different constructors.

Here you see yet something new: default parameter values. C++ allows functions to have default parameter values, and these values are used if you don't provide any when calling the function. In this case, it appears as if three constructors were declared: one with no parameters, initialising both "upper" and "lower" to 0, one with one parameter, initialising "lower" to 0, and one with two parameters. The restriction on default parameter values is that you can only add them from the right on the parameter list.

So far we have just said that the constructor exists, not what it does. Its definition packs quite a handful of new things into very few lines, so let's break it apart in three pieces.

First is the constructor declarator. The first "Range" says we're dealing with the struct called "Range". The "::" is the scope operator (the same as at //**9** in the example for variable scope at the beginning of this article), and it means we're defining something that belongs to the struct. The rest of this line is the same as you saw in the declaration, except that the default parameter values are not listed here; they're an interface issue only. This might seem like redundant information, but it is not. If you forget "::", what you have is a function called "Range(int, int)" that returns a "Range". If you forget either "Range" you have a syntax error.

Now to the second piece, the one with //*1 comments.

In a constructor you can add what's called an initialiser list between the declarator and the function body. The initialiser list is a comma separated list of member variables that you give an initial value. This can of course be done in the function body as well, but if you can give the member variables a value in the initialiser list, you should. The reason is that the member variables will be initialised whether you specify them in the list or not, and if you initialise them in the function body only, they will in fact be initialised twice. One thing that is important to remember with initialiser lists is that the order of initialisation is the order the member variables appear in the struct declaration. Do not try to change the order by reorganising the initialiser list because it will not work. [Some compilers will just rearrange the order of the list internally to be the right one and tell you they've done so; but don't rely on this -- Ed]

Last is the function body. This looks just like a normal function body. Member variables that we for some reason have been unable to initialise in the initialiser list can be taken care of here. In this case all were initialised before entering the function body, so nothing of the kind is needed. Instead we check that the range is valid, and that the components were initialised as intended, and throw an exception if not. "BoundsError" at //*2 means a nameless instance of struct "BoundsError". Note that the function body is still needed, even when a constructor has nothing to do in it and the body is empty.

When used, constructors look like this:

Dynamic memory allocation
The way dynamic memory allocation and deallocation is done in C++ differs from C in a way that on the surface may seem rather unimportant, but in fact the difference is enormous and very valuable.

Here's a small demo of dynamic memory allocation and deallocation (ignoring error handling). Dynamic memory is allocated with the "new" operator. It's a built-in operator that guarantees that a large enough block of memory is allocated, and that any constructor is run on it, and it returns a pointer of the requested type. (Compare with "malloc", where it's your job to tell how large the block should be and what you get is raw memory.) At //1 you see an "int" being allocated on the heap, and the pointer variable "pint" being initialised with its address. At //2 and //3 you see another interesting thing about the "new" operator; it understands constructors. At //2 an "int" is allocated on the heap, and the "int" is initialised with the value 2. At //3 a "Range" struct is allocated on the heap, and initialised by a call to the constructor defined in the previous example. As you can see at //4 dynamic memory is deallocated with the built-in "delete" operator.

Building a stack
OK, now we have enough small pieces of C++ as a better C, to do something real. Let's build a stack of integers, using constructors and dynamic memory allocation. At //**1** you see something unexpected. The pointer is initialised to 0, and not "NULL". C++ cannot use the usual C definition of "NULL", so the number 0 is what we have, and use. The reason is, odd as it may seem, that C++ is much pickier than C about type correctness. Typically in C, "NULL" is defined as "(void*)0". C++ never allows implicit conversion of a "void*" to any other pointer type, so this definition is not good enough. The integer 0, however, is implicitly convertible to any pointer type.

//**2** is an unpleasant little thing. Depending on how new your compiler is, the "new" operator will either return 0 (for old compilers) or throw an instance of "bad_alloc" (for new compilers) if the memory is exhausted.

So, what do you think of this small stack example? Is it good? Better than the C alternative with several different function names to remember, being careful to allocate objects of the correct size and initialise the struct members correctly? I think it is better. We don't have to overload our memory with many names, we don't have to worry about the size of the object to allocate, and we don't need to cast an incompatible type either (compare with "malloc") and initialisation of the member variables is localised to something that belongs to the struct; its constructor. We check our preconditions (no post conditions are used here) with C++ exceptions.

Exercises
 * 1) Write a struct called "UserTypedInt" whose constructor initialises the integer member variable by reading a value from standard in. Can you do this with an empty constructor body?
 * 2) Improve the "math" example by using constructors for the exception classes.
 * 3) Think about the things in the stack example that could improve, then e-mail me (bjorn@algonet.se) your thoughts. I'll bring up some of the ideas next month.

Recap
This was a lot, wasn't it? In short:
 * You've seen how printing on standard out, and reading from standard in is done with "cout" and "cin".
 * Variables can, in C++, be defined when needed, and their lifetime depends on where they are defined.
 * There are a lot of differences between new and old C++ compilers (the difference is unfortunate, but the changes are for the better.)
 * Functions can be overloaded, provided their parameter lists differ.
 * You've learned about error handling with C++ exceptions.
 * Standard preprocessor macros __FILE__ and __LINE__ are handy to include in exception information, as their values are the C++ file and line number.
 * Programming by contract can be used to clarify responsibilities (you'll see a lot of this later on).
 * A struct can be empty, and can have constructors.
 * Dynamic memory is handled, in a type safe way, with the "new" and "delete" operators. The "new" operator understands constructors.
 * You've seen that functions are allowed to have default values for their parameters.

Next
After having digested this for a month, I'll present to you the encapsulation mechanism of C++: classes, how you declare them, what methods are, and the reverse of the constructor, called the destructor. You'll also get to know the C++ references better.

Please, by the way, I need your constant feedback. Write me! I want opinions and questions. If you think I'm wrong about things, going too fast, too slow, teaching the wrong things, whatever; tell me, ask me.