An Introduction to C++ Programming - Part 3/13

Written by Björn Fahller

References, Philosophy of Encapsulation and the Orthodox Canonical Form
[NOTE: Here is a link to a zip of the introcpp3.zip source code for this article. Ed.]

So far we've covered error handling through exceptions, and encapsulation with classes. The feedback from part 2 tells me I forgot a rather fundamental thing: what exactly is encapsulation, and what should we make classes of? What's the meaning of a class? I will get to this, but first let's finally have a look at the promised references.

What's a class?
Now for the theoretical biggie. What, exactly, is the meaning of a class? When should you write a class, what should the class allow you to do, and what's a good name for a class? What's the relation between classes and objects?

When you write programs in Object Oriented Programming Languages, be it C++, Objective-C, Smalltalk, Eiffel, Modula-3, Java or whatever, you write classes. A class is, as I mentioned in part 2, a method of encapsulation, but more importantly, a class is a type. When you define a class, you add a new type to the language. C++ comes with a set of built in types like "int", "unsigned" and "double". In the previous lesson, when we wrote the class "intstack", we introduced a new type to the language, which programs could use: the stack of integers. The member functions of the class describe the semantics of the type. With the built in integral types, we have operations like adding two instances of the type, yielding a third instance, whose value happens to be the sum of the values of the other two. We can increment the value of instances of the type with operations like ++, and so on. With the "intstack", we had the operations "pop", "top", "push" and "nrOfElements", in addition to well defined construction and destruction of instances.

So, how can you know what classes to make? Classes are, as a rule of thumb, descriptions of ideas. "Bicycle," for example, is a classic example of a class. The idea "Bicycle" that is, not my particular bicycle. My bicycle is a physical entity that is currently getting wet in the rain. The idea of a bicycle is a very good candidate for a class. My own bicycle, on the other hand, is a good candidate for an instance of the class "Bicycle." So, when thinking of the problem you want to solve, you might have a good candidate for a class X, if you can say "The X ...", "An X...", or "A kind of X...".

The objects are the instances of types (yes, an instance of type "float" is also an object; objects need not be instances of classes.) A class represents the idea, and its member functions represent the semantics. Usually instances of a class have state (for example, the state of a stack is the elements in it, and their order.) Having state means that the same member function can give different results depending on what has been done to the object before calling the member function (again, with a stack, the value returned by "top" or "nrOfElements" depends on the history of "push" and "pop" calls.) The class has member data to represent state.

There are, however, exceptions to this rule of thumb. For example, is a mathematical function a class that you'd like to have instances of to toy with in your program? According to the rule, it is not, since a mathematical function is stateless. In most situations, the answer would, as expected, be no, but if you design a program for use by electronics engineers when designing their gadgets, you'd better have amplifiers (multiplication,) adders, subtracters and so on, or they won't use your program.

Note that objects don't exist when you write your program. Objects are run-time entities. When you write your program, what exists are types, descriptions of how instances of types can be used, and descriptions of semantics and state representation. When your program executes, the identifiers (like "pc" in the reference example above) are replaced by bit-patterns representing objects.

So, then, what member functions should a class have? This is even harder to say, because there are so many ways to solve every problem. However, the things that you need to do to instances of types like "Bicycle" or "intstack" when solving your problem must, in one way or another, be expressible through the classes. If I need to ride my bicycle, it can be that the class "Bicycle" should have the member function "beRiddenBy" accepting an instance of class "Human", but it might also be that class "Human" should have the member function "ride" accepting an instance of class "Bicycle" as its parameter. If the starting point or destination are important, they probably should be parameters to the member functions. If the road itself is important, you probably need a class "Road", an instance of which you pass to either "Bicycle::beRiddenBy" or "Human::ride".

Given this, you might start to feel like someone's been fooling you. This Object Oriented Programming thing is a hoax! What it's all about, is class oriented programming. The objects are, after all, just the run time instances of the classes.

The Orthodox Canonical Form
The basic operations you should, in general, be able to perform on objects of any class are construction from scratch, construction by copying another instance, assignment and destruction. This places a slightly heavier burden on us, compared to the work with the "intstack." The "intstack" guaranteed that no matter what happened, an instance was always destructible. The Orthodox Canonical Form poses the additional requirement that an instance must always be copyable. Normally this extra burden is light, but there are tricky cases. Construction from scratch we've seen. Construction by copying is done by the copy constructor.

Given a class named C, the copy constructor is declared as "C(const C& c);". The job of the copy constructor is to create an object that is identical to another object. It is your job to make sure it does this. This does not mean that every member variable of the newly constructed object must have values identical to the ones in the original. On the contrary, they often differ. What's important, though, is that the objects are semantically identical (i.e. given the same input to member functions, they give the same response.) The "intstack", for example, must make its own copy of the stack representation in the copy constructor. This means that the base pointer will differ, but as far as you can see through the "push", "pop" and "top" member functions, there is no difference between the copy and the original.

Next in line is copy assignment. Again, given a class C, the copy assignment operator is declared as "C& operator=(const C& c);". Writing the copy assignment operator is more difficult than writing the copy constructor. Not only does the copy assignment operator need to make the object equal to its parameter, it also needs to cleanly get rid of whatever resources the object might have held when the operator was called. (The copy constructor does not have this problem, since it creates a new object that cannot have held any data before.) The return value of an assignment operator is (by tradition, not by necessity) a reference to the object just assigned to. When inside a member function (the assignment operator as declared above is a member function) the object can be reached through a pointer named "this," which is a pointer to the class type. For the class C above, the type of "this" is "C* const". This means that it's a pointer to type C, and the pointer itself is a constant (i.e. you cannot make "this" point to anything other than the current instance.) The reference to the object is obtained by dereferencing the "this" pointer, so the last statement of an assignment operator is almost always "return *this;"
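To make the four operations concrete, here is a minimal sketch of a class in Orthodox Canonical Form. The class name "C" is taken from the text; the member "value" and the function "demo" are assumptions made up for the illustration, and the class is deliberately too simple to need any resource handling.

```cpp
#include <cassert>

// Sketch only: "value" is a hypothetical member, chosen to keep things small.
class C
{
public:
  C() : value(0) {}                   // construction from scratch
  C(const C& c) : value(c.value) {}   // construction by copying
  C& operator=(const C& c)            // copy assignment
  {
    value = c.value;
    return *this;                     // reference to the object just assigned to
  }
  ~C() {}                             // destruction
  int value;
};

void demo()
{
  C a;
  a.value = 7;
  C b(a);            // copy constructor
  C c;
  C d;
  d = c = a;         // works because operator= returns a reference
  assert(b.value == 7 && c.value == 7 && d.value == 7);
}
```

Returning "*this" by reference is what allows the chained assignment in "demo", just as with the built in types.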

The difficulty of writing a good copy constructor and copy assignment operator is best shown through a classical error: a class, here called "bad," that owns a pointer "pi" to memory it has allocated, but whose copy operations merely copy the pointer. The original and the copy then share the same memory, and both destructors will try to deallocate it.

So it is pretty clear that it's more work than this. The copy constructor should allocate its own memory, and initialise that memory with the same value as that pointed to by the original. This goes for the copy assignment operator too, but it also needs to discard the pointer it already had. By doing so, we guarantee that the pointers owned by the objects are truly theirs, and their destructors can safely deallocate them.

We do, however, have yet another problem to deal with, that of self assignment. If an object is assigned to itself, a copy assignment operator written as just described first discards its own pointer, and then tries to copy the value it just threw away. OK, so assigning an object to itself is perhaps not the most frequently done operation in a program, but that doesn't mean it's allowed to crash, right?

So, how can we make the copy assignment operator safe from self assignment? There are two alternatives. Common to both is that they check if the right hand side (parameter b) is the same object. If it is, the assignment is simply not done. The first alternative does this by comparing the "pi" pointers. The second does it by comparing the pointers to the objects themselves ("this != &b".) The latter perhaps feels a bit harder to understand, but it's actually the one most frequently seen, because normally classes have more than one member variable to check for. Note that if your class only has member variables of types for which copying the values does not lead to problems, the tests above are not necessary.
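As a rough sketch of where the fixes lead, the class below owns its pointer properly and guards against self assignment with the second alternative. The member name "pi" and the parameter name "b" come from the text; the class name "good" and the functions "value" and "demo" are assumptions for the illustration. The sketch also assumes that "new" does not fail between the "delete" and the allocation; that complication is dealt with for "intstack" further down.

```cpp
#include <cassert>

// Sketch only: "good", "value" and "demo" are hypothetical names.
class good
{
public:
  good(int i) : pi(new int(i)) {}
  good(const good& b) : pi(new int(*b.pi)) {}  // own memory, same value
  good& operator=(const good& b)
  {
    if (this != &b)          // same object? then do nothing at all
    {
      delete pi;             // discard the pointer we already had
      pi = new int(*b.pi);   // assumed not to throw in this sketch
    }
    return *this;
  }
  ~good() { delete pi; }     // safe: "pi" is truly ours
  int value() const { return *pi; }
private:
  int* pi;
};

void demo()
{
  good a(1);
  good b(a);       // copy construction
  good c(2);
  c = a;           // copy assignment
  good& r = a;
  a = r;           // self assignment, caught by the guard
  assert(a.value() == 1 && b.value() == 1 && c.value() == 1);
}
```

Without the guard, the self assignment in "demo" would delete "pi" and then read through the deleted pointer.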

With these changes done, the class deserves a name change. It is no longer bad.

In the previous lesson, the copy constructor and copy assignment operator were declared private, to prevent copying and assignment. The reason is that a C++ compiler automatically generates a copy constructor and copy assignment operator for you if you don't declare them. The auto-generated copy constructor and assignment operator, however, will just copy/assign the member variables, one by one. In some cases this is perfectly OK. The "Range" class from the previous lesson, for example, does fine with the auto-generated copy constructor and copy assignment operator. The "intstack," on the other hand, does not, since then both the original and the copy would share the same representation (and have exactly the same problem as described in the above "bad" example!)

If you decide that for your class, the auto generated copy constructor and/or copy assignment operator is OK, leave a comment in the class declaration saying so, so that readers of the source code know what you're thinking. Otherwise they might easily think you've simply forgotten to write them.
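The two situations can be sketched side by side. Both classes below are hypothetical stand-ins: "Range" here only borrows the name from the previous lesson and holds two plain integers, and "nocopy" is an invented class that owns heap memory. The member and function names are assumptions for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in: only plain values as members.
class Range
{
public:
  Range(int lo, int hi) : low(lo), high(hi) {}
  int lower() const { return low; }
  int upper() const { return high; }
  // The auto-generated copy constructor and copy assignment operator
  // are OK: copying the two ints one by one is exactly what we want.
private:
  int low;
  int high;
};

// Hypothetical class that owns heap memory and must not be copied.
class nocopy
{
public:
  nocopy() : p(new int(0)) {}
  ~nocopy() { delete p; }
private:
  nocopy(const nocopy&);             // private and never defined
  nocopy& operator=(const nocopy&);  // private and never defined
  int* p;
};

void demo()
{
  Range a(1, 5);
  Range b(a);         // auto-generated copy constructor
  Range c(0, 0);
  c = a;              // auto-generated copy assignment operator
  assert(b.lower() == 1 && b.upper() == 5);
  assert(c.lower() == 1 && c.upper() == 5);

  nocopy n;
  // nocopy m(n);     // would not compile: the copy constructor is private
  (void)n;            // silence "unused variable" warnings
}
```

The comment inside "Range" is the kind of note the paragraph above asks for: it tells the reader that the missing copy operations are a decision, not an oversight.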

One last thing before wrapping up...

Const Correctness
When talking about passing parameters to functions by reference, I mentioned the const reference as a way to ensure that the parameter won't get modified, since the const reference treats whatever it refers to as a constant and thus won't allow you to do things that would modify it. The question is, how does the compiler know if something you do to an object will modify it? Does "pop" modify the "intstack?" Yes, it does. It removes the top element. Does "top" modify the stack? No. So, it should be allowed to call "top" for a constant stack, right? The problem is that the compiler doesn't know which member functions modify the objects, and which don't (and assumes they do, just to be on the safe side) unless you tell it differently. Since, by default, a member function is assumed to alter the object, you are, by default, not allowed to do anything at all to a constant object. This is of course rather harsh.

Fortunately we can tell the compiler differently. We can change "top" to be declared as "int top() const;". It's the word "const" after the parameter list that tells the compiler that this member function will not modify the object and can safely be called for constant objects. As a matter of fact, now that we know about references, we can do even better by writing two member functions "top", one "const" and one not, with the non-const version returning a non-const reference to the element instead. This has a tremendous advantage: for constant stack objects, we can get the value of the top element; for non-constant stack objects, we can also alter the value of the top element, by assigning to what "top" returns.

There is no magic involved in this. Just as I mentioned in part one, functions can be overloaded if their parameter lists differ. Member functions can also be overloaded on "constness." The "const" member function is called for constant objects (or through const references or pointers, since they both treat the object referred to as if it was a constant.) The non-const member function is called for non-constant objects. Note that it is only member functions you can do this "const" overloading on. You cannot declare non-member functions "const." Our overloaded "top" member functions can be declared as "int top() const;" and "int& top();".

This is getting too much without concrete examples, so let's turn to a version of "intstack" with copy constructor, copy assignment operator, const versions of "top" and "nrOfElements", and a non-const version of "top" (just as above.) Only the new and changed functions are covered here. You'll find a zip file with the complete sources at the top.

Since copying elements of a stack is the same when doing copy assignment and copy construction, I have a private helper function that does the job. This is not necessary by any means, but it means I won't have identical code in two places, and that is usually desirable. After all, if you ever need to change the code, you can bet you'll forget to update one of the copies otherwise, and you'll have a subtle bug that may be hard to find. With only one place to update, that mistake is hard to make. The same goes for deallocation of the stack. It is needed both in copy assignment and in the destructor. Since these helper functions "copy" and "destroyAll" are purely intended as an aid when implementing copy assignment, copy constructor and destructor, they're declared private. Just as a private member variable can only be accessed from the member functions of a class, and not by anyone else, member functions declared private can only be called from member functions of the same class. They have nothing whatsoever to do with how the stack works, just how it's implemented.
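Overloading on constness can be tried out in isolation with a much smaller class than the real "intstack". The sketch below is an assumption-laden stand-in: "tinystack" is a hypothetical fixed-capacity stack with no overflow or underflow checks, and "peek" and "demo" are invented helper names. Only the declarations of "top" and "nrOfElements" mirror what the text describes.

```cpp
#include <cassert>

// Hypothetical fixed-capacity stack; no error checks, for illustration only.
class tinystack
{
public:
  tinystack() : count(0) {}
  void push(int v) { data[count++] = v; }
  void pop() { --count; }
  int top() const { return data[count - 1]; }  // constant objects get a value
  int& top() { return data[count - 1]; }       // others get the element itself
  unsigned nrOfElements() const { return count; }
private:
  int data[16];
  unsigned count;
};

// A const reference only allows calls to "const" member functions.
int peek(const tinystack& s)
{
  return s.top();        // picks "int top() const"
}

void demo()
{
  tinystack s;
  s.push(1);
  s.top() = 42;          // picks "int& top()" and alters the top element
  assert(peek(s) == 42);
  assert(s.nrOfElements() == 1);
}
```

Trying to write "s.top() = 42;" inside "peek" would be rejected by the compiler, since the const version returns a value and not a reference to the element.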

Next comes the new implementation of "nrOfElements." What's different from the previous lesson? Nothing at all, other than that it's declared to be "const." Had we, in this member function (or any other member function declared as "const") attempted to modify any member variable, the compiler would give an error, saying that we're attempting to break our promise not to modify the object. "const" member functions are thus also good as a way of preventing you from making mistakes. Note that declaring a member function "const" does not mean it's only for constant objects, it just means that it's callable on constant objects too. Whenever you have a member function that does not modify any member variable, declare it "const" so that errors can be caught by the compiler. It saves you debug time, in addition to making those member functions callable for constant objects (or constant references or pointers.)

Next in turn is "top", or rather the two versions of "top". Not much differs between the two variants. The implementation is in fact identical for both, but the first one returns a value and is declared const, while the other one is not declared const and returns a reference. So why do we have two identical implementations here, when I earlier mentioned that this is usually undesirable? The reason is simply that although the implementation is identical, neither can be expressed in terms of the other. The non-const version cannot be implemented with the aid of the const version, since we'd then return a reference to a local value. This is always bad, does not have the desired effect, and is quite likely to cause unpredictable run-time behaviour. The "const" version could be implemented in terms of the non-const version, if it wasn't for the fact that the non-const version is not declared "const." The implementation of a const member function is not allowed to alter the object, and is, as a consequence of this, not allowed to call non-const member functions for the same object.

Remember that a reference really isn't an object on its own? You cannot distinguish it in any way from the object it refers to. In this case it means that what's returned from the non-const version of "top" is the top element itself, not a local copy of it. Since it is the element itself, it can be modified. Note that there is a danger in this too. Suppose a reference "i" is bound to the element returned by "top", and that element later ceases to exist, for example because it is popped. What happens when you then assign to "i"? What happens when you read a value from it? The answer to both questions is that "i" refers to a variable that no longer exists, and that when assigning to it, or getting a value from it, anything can happen. If you're lucky, your program will crash right away. If you're out of luck, it'll start behaving erratically!

Now for the copy constructor. With the help of the "copy" member function, it's really simple! The "pTop" member of the instance being created is initialized with the value from "i.copy". The "copy" helper function creates a new copy of i's representation ("pTop" and whatever it points to) on the heap and returns the pointer to its base. If "i" is an empty stack, "copy" returns 0. If we run out of memory when "copy" is working, whatever was allocated will be deallocated, and "bad_alloc" thrown. In this case, it means that "bad_alloc" will be thrown before "pTop" is initialized, and thus the new object will never be constructed.

The copy assignment operator is a little bit trickier, but not that bad. It is seemingly simple, and yet both efficient and exception safe. The difficulty lies in being careful with the order in which to do things. A temporary pointer "pTmp" is first set to refer to the copy of i's representation. This is very important from an exception handling point of view. Suppose we first destroyed the contents and then tried to get a copy, but the copying threw "bad_alloc." Since we're not catching "bad_alloc", it flows out of the function as intended, but our own "pTop" would point to something illegal, and thus our promise to always stay destructible, and copyable whenever resources allow, would be broken.

Instead, first getting the copy is essential. If the copying fails, the member variables are not altered, and the object remains unchanged (whenever possible, try to leave objects in an unaltered state in the presence of exceptions, and always leave them destructible and copyable.) Again, since "bad_alloc" is not caught in the function, it'll flow off to the caller if thrown. If copying is successful, we can safely destroy whatever we have and then change the "pTop" member variable. Since we've promised that "destroyAll" won't throw anything (a promise we could make, since we've promised that our destructors won't throw) the rest is guaranteed to work.

Also, since we first get a local copy of the object we're assigning from, and only after that destroy our own representation, the self assignment guard ("if (this != &i)") is not necessary for correctness. It's a pure performance boost: it makes sure we do nothing at all, instead of duplicating the representation just to destroy the original.
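The ordering argument can be shown on a smaller class than the stack. The sketch below uses a hypothetical class "buffer" that owns a heap array; the names "buffer", "at", "size" and "demo" are assumptions for the illustration. Only "pTmp" and the private helper "copy" echo names from the text, and this "copy" is a simplified stand-in, not the one from "intstack".

```cpp
#include <cassert>

// Hypothetical class owning a heap array; shows the order of operations only.
class buffer
{
public:
  explicit buffer(unsigned n) : size(n), p(new int[n])
  {
    for (unsigned i = 0; i < n; ++i) p[i] = 0;
  }
  buffer(const buffer& b) : size(b.size), p(b.copy()) {}
  buffer& operator=(const buffer& b)
  {
    if (this != &b)           // not needed for correctness, saves work
    {
      int* pTmp = b.copy();   // 1. may throw; this object is still untouched
      delete[] p;             // 2. cannot throw
      p = pTmp;               // 3. take over the copy
      size = b.size;
    }
    return *this;
  }
  ~buffer() { delete[] p; }
  int& at(unsigned i) { return p[i]; }         // no range check in this sketch
  int at(unsigned i) const { return p[i]; }
private:
  int* copy() const           // returns a freshly allocated duplicate
  {
    int* q = new int[size];
    for (unsigned i = 0; i < size; ++i) q[i] = p[i];
    return q;
  }
  unsigned size;
  int* p;
};

void demo()
{
  buffer a(3);
  a.at(0) = 7;
  buffer b(1);
  b = a;                      // copy first, then destroy the old array
  a.at(0) = 9;                // does not affect b
  assert(b.at(0) == 7);
}
```

If the allocation in step 1 throws, steps 2 and 3 never run, so the object keeps its old array and remains both destructible and copyable.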

With the aid of the "destroyAll" helper function, the destructor becomes trivial: all it does is call "destroyAll". So, how is this magic "destroyAll" helper function implemented? It's actually identical to the old version of the destructor. Now the only thing yet untold is how the helper function "copy" is implemented. It's by far the trickiest function of them all. To begin with, the return type is "intstack::stack_element*". The type "stack_element" is only known within "intstack," so whenever used outside of "intstack" it must be explicitly stated that it is the "stack_element" type that is defined in "intstack." As long as we're "in the header" of a member function, nested types must be explicitly qualified. Once inside the function body, this is no longer needed, since it is then known what class the type belongs to.

The whole copying is in a "try" block, so we can deallocate things if something goes wrong. The local variable "pFirst", used to point to the first element of the copy, is defined outside of the "try" block, so it can be used inside the "catch" block. If it were defined inside the "try" block, the "catch" block would have no way of finding the memory to deallocate.

If "pTop" is non-zero, the whole structure that "pTop" refers to is copied.

There are two details worth mentioning here.


 * The "if" statements marked //**1 are only needed for older compilers. New compilers automatically throw "bad_alloc" when they're out of memory. Old compilers, however, return 0.
 * The "while" statement marked //**2 might look odd. What happens is that the variable "p" is given the value of "p->pNext", and that value is compared against zero. Remember that an assignment is an expression, and that the value of the expression can be used, for example, in comparisons. The assignment "p=p->pNext" must be enclosed in parentheses for this to work. The precedence rules are such that assignment has lower precedence than comparison, so if we left out the parentheses, the effect would be to assign "p" the value of "p->pNext" compared to 0, which would not be what we intended.

At the places where a "stack_element" is allocated, it is important that the "pNext" member variable is given the value 0, since it is always put at the end of the stack. If it was not set to 0, it would not be possible to know that it was the last element, and our program would behave erratically. It's not until we have successfully created another element to append to the stack, that the "pNext" member variable is given a value other than 0.
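Putting the pieces together, the stack described above might look roughly like the sketch below. It is a reconstruction from the prose, not the listing from the zip file: the names "intstack", "stack_element", "pNext", "pTop", "pFirst", "pTmp", "copy" and "destroyAll" come from the text, while the member "value", the local "pLast", the "stack_element" constructor, the counting loop in "nrOfElements" and the function "demo" are assumptions. Error checks on empty stacks are left out, and the //**1 tests are omitted on the assumption that "new" throws "bad_alloc".

```cpp
#include <cassert>

class intstack
{
public:
  intstack() : pTop(0) {}
  intstack(const intstack& i) : pTop(i.copy()) {}
  intstack& operator=(const intstack& i)
  {
    if (this != &i)
    {
      stack_element* pTmp = i.copy();  // may throw; we're still untouched
      destroyAll();
      pTop = pTmp;
    }
    return *this;
  }
  ~intstack() { destroyAll(); }
  void push(int v) { pTop = new stack_element(v, pTop); }
  void pop() { stack_element* p = pTop; pTop = p->pNext; delete p; }
  int top() const { return pTop->value; }
  int& top() { return pTop->value; }
  unsigned nrOfElements() const
  {
    unsigned n = 0;
    for (const stack_element* p = pTop; p != 0; p = p->pNext) ++n;
    return n;
  }
private:
  struct stack_element
  {
    stack_element(int v, stack_element* n) : value(v), pNext(n) {}
    int value;
    stack_element* pNext;
  };
  stack_element* copy() const;
  void destroyAll();
  stack_element* pTop;
};

void intstack::destroyAll()
{
  while (pTop != 0)
  {
    stack_element* p = pTop->pNext;
    delete pTop;
    pTop = p;
  }
}

// Outside the class, the nested return type must be qualified.
intstack::stack_element* intstack::copy() const
{
  stack_element* pFirst = 0;   // outside "try", so "catch" can reach it
  try
  {
    if (pTop != 0)
    {
      pFirst = new stack_element(pTop->value, 0);  // pNext starts as 0
      stack_element* pLast = pFirst;
      const stack_element* p = pTop;
      while ((p = p->pNext) != 0)                  // assign, then compare
      {
        pLast->pNext = new stack_element(p->value, 0);
        pLast = pLast->pNext;
      }
    }
  }
  catch (...)
  {
    while (pFirst != 0)        // deallocate the partial copy
    {
      stack_element* p = pFirst->pNext;
      delete pFirst;
      pFirst = p;
    }
    throw;                     // pass the exception on to the caller
  }
  return pFirst;
}

void demo()
{
  intstack a;
  a.push(1);
  a.push(2);
  intstack b(a);               // copy construction
  b.top() = 5;                 // alters only the copy
  assert(a.top() == 2 && b.top() == 5);
  assert(b.nrOfElements() == 2);
}
```

Because every new element is created with "pNext" set to 0 and only linked in after its allocation has succeeded, the partial copy is always a well formed list that the "catch" block can walk and deallocate.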

Now, it's up to you to toy with the "intstack". Whenever you have a need for a stack of integers, here you have one.

Exercises

 * When is guarding against self assignment necessary? When is it desirable?
 * How can you disallow assignment for instances of a class?
 * The non-const version of "top" returns a reference to data internal to the class. Mail me your reasons for why this can be a bad idea (it can, and usually even is!) Can it be bad in this case?
 * When can returning references be dangerous? When is it not?
 * Mail me an exhaustive list of reasons when assignment or construction can be allowed to fail under the Orthodox Canonical Form.
 * When is it OK to use the auto-generated copy constructor and copy assignment operator?

Recap
This month, you, as up-and-coming C++ programmers, have been introduced to yet more new things.


 * You have seen how C++ references work.
 * You have learned about "const", and how it works for objects.
 * You have seen how you can make member functions callable for "const" objects by declaring them as "const", and seen that member functions declared "const" are callable for non-const objects as well.
 * You have found out how you can overload member functions on "constness" to get different behaviour for const objects and non-const objects.
 * You have learned about the "Orthodox Canonical Form", which always gives you construction from nothing, construction by copying, assignment and destruction.
 * You have learned that your objects should always be in a destructible and copyable state, no matter what happens.
 * You have seen how you can implement common behaviour in private member functions. These member functions are then only callable from within member functions of that class.

Coming up
Next month I hope to introduce you to components of the C++ standard library. Most compilers available today do not have this library, but fortunately it is available and downloadable for free from a number of sources. Knowing this library, and how to use it, will be very beneficial for you, partly because it is standard, and partly because it's remarkably powerful.

As usual, you are the ones who can make this course the ultimate C++ course for you. Send me e-mail at once, stating your opinions, desires, questions and (of course) answers to this month's exercises!