An Introduction to C++ Programming - Part 13/13Run Time Type IdentificationWritten by Björn Fahller |
IntroductionLucky number thirteen will conclude the long C++ series. Last topic is Run Time Type Identification, or RTTI for short. It's a new addition to C++. I do not have access to any compiler under OS/2 that supports RTTI, so the example programs are written for, and tested with, the egcs compiler (a fast-moving bleeding edge gcc,) under Linux. RTTI is a way of finding out about the type of objects at run-time. There are two distinct ways this can be done. One is finding a type identification for an object, giving a unique identifier for each unique type. Another is a clever cast, which allows casting of pointers and references only if the type matches. Towards the end of the article there is also a discussion about various aspects of efficiency in C++, especially in comparison with C. Type Safe (down)CastingMany class libraries, most notably GUI libraries like IBM's Open Class
Library, suffer from the problem that callback functions will be called
with pointers to objects of a base class, but you know the
objects really are of some class inheriting from it. For example, concider
a button push event, from which you can get a pointer to the button
itself. Suppose a hierarchy like this:
It must be stressed that this use of
``dynamic_cast'' can always be avoided (I
cannot prove it, but I have never seen a counter proof either, and many
have tried.) The need arises from a poor design (the solution is to use
dynamic binding instead, by turning the problem ``inside-out''.) Sometimes
we have to live with poor designs, however, and then this can make our
life a lot easier. It can also be used during a transition between
different libraries. Here is an example mirroring a problem from my
previous job:
In which way is this worse from the previous version that did not have the problem; where everything was always ``ostrstream''? Well, in no way. With RTTI we can live with both worlds at the same time, and we have added an error check that should have been there earlier but was not because the language did not support it (what if the function is called with an ``fstream'' object?) Use RTTI ``dynamic_cast'' when being forced to adapt to poorly written libraries, as an error check, and during transitions between libraries. Identifying typesMuch more interesting is using explicit information about the type of an object. This solves a problem that cannot reliably be worked around with clever designs. There is a standard class called ``type_info'',
which purpose is to carry information about types. It is defined in the
header named <typeinfo> (note, no ``.h'')
and looks as follows:
The ``before'' member function gives you a sort order for types. Do not try to get a meaning from the sort order, there may not be one. It does not need to have any meaning, other than that it makes it possible to keep sorted collections of type information objects. You get a ``type_info'' object for a value
through the built-in operator ``typeid(x)''. Note
that it is the run-time type of ``x'' that is
accessed, not the static type. This requires a little bit of care. Say you
have this situation:
The answer is ``parent*'' followed by ``child''. This came as a surprise to me, but it makes sense. The run-time type of ``p'' is ``parent*''. It is what it points to that may differ, and that is exactly what is mirrored by the output. Is this useful?Suppose you need to store and retrieve objects from a typeless media, such as a file, or a socket. Storing them is easy, but when reading, you need to know what type of object to create. If you are designing a complete system from scratch, you can add some type identifier to all classes, but this is error prone, since it is easy to forget changing it when creating a new class inheriting from another one, and it does not work at all with existing classes. Chances are you will be extending existing code, or decide to use third party class libraries. Here is an outline for how to do this (in a non-portable, and slightly restricted way. Generality and portability requires more work.) First we design a class for I/O objects, let us call it
``persistent''. It may look like this:
Next we need something that does the work of calling the member
functions and creating the objects when read. Let us call this class
``persistent_store''. It may be defined as
follows:
The use of template functions that do not have any parameters of the
template type is a recent addition to C++ that few compilers support. The
syntax is a bit ugly, but it is not too bad. Here is how to use the
persistent store:
Now there are a number of problems that must be solved:
template <class T> void register_class() { persistent* p = (T*)0; ... };Any attempt to call ``register_class'' with a type not publicly inheriting from ``persistent'' will result in a compilation error. Not too bad. The second problem, deciding the type from a name is easy to understand conceptually, but requires a bit of trickery to implement. The solution lies in using a map from strings to a creation function. A
map is a data structure acting like an array, but allowing any kind of
indexing type. In this case the indexing type will be a string. Every
string in the map corresponds to a function accepting no parameters and
returning a ``persistent*''. That is the easy
part. So ... the implementation of
``register_class'' may look like this:
Here the map type from the C++ standard library is used. The tricky
part is to find out what ``???'' is. Perhaps it comes as no surprise that
the solution is yet another template, this time a function template.
Now for how to store and retrieve the type name. My suggestion is
simple; store the length of the name followed by the name itself. That
way, when reading, the length can be read, a character buffer allocated
for the correct size and exactly that many characters read. When storing
the string is checked for in the map to make sure no unregistered types
are stored. When reading, the creator function is looked up and called.
Here is how it is all implemented:
As can be guessed, the line ``if (iter == creator_map.end())'' checks if the lookup was successful. If true the string was not represented in the map, and thus the type was not registered. That is a mini persistency library for you. Enjoy. C++ and EfficiencyI have once in a while heard things like ``C++ programs are slow and large compared to a C program with the same functionality.'' In a way that statement is often true. Does it need to be that way? The answer is no, it does not. Here is one very good argument for why. If you have a lean C design, the same design can be used with C++. If you do, but use new and delete instead of malloc/free, use inline functions instead of macros and add constructors to your structs, you are still better off than in plain C, because of the added type safety, but the program will not be slower and larger. As I see it, there are mainly two reasons for C++ bloat. One is a cultural problem, one is technical. Culture related inefficiencyLet us first look at the cultural problem, because that is the one that is hardest to solve, and contributes most to the bloat. Throughout this course, I have been touting generality and extensibility over and over and over. However, not all problems need general and extensible solutions. Generality and extensibility have a price. Are you prepared to pay the price? It depends, right? If I can make use of the generality and extensibility at some other time, development-time efficiency has been gained, possibly at the cost of program size and run-time efficiency. Another cultural problem leading to inefficient C++ programs is the hunt for efficiency! Yes, it is true. Here are some examples: Virtual functionsI have heard people say things like ``As a performance optimization, I've removed all virtual functions and replaced the virtual function calls with switch statements and ordinary member function calls.'' That is a major performance killer. If you need the functionality that a virtual function call gives you, there is just no way to do it faster than through a virtual function call. The switch switch/call construct is guaranteed to require at least as many instructions (probably more) and is an instruction pipeline killer for the CPU, something that the virtual function call interestingly is not. Inline functionsWhile well chosen inline functions may speed up your program, ill chosen ones will lead to bloated executables, and are likely to reduce the number of cache hits since active code areas will increase in size. Remember that an inline function is actually not a function, strictly speaking. If inlined, the instructions for the function will be inserted at the place of the call. If the function is large (more than a very few instructions) the size of your program will grow considerably if the function is called often. More interesting is what happens if the compiler for one reason or another fails to inline it. It will then have to make an out-of-line function for it, which will be called like normal functions. There is one difference however (getting into the area of technical problems here, I will return to this one shortly.) Where will the out-of-line function reside? With one exception, all compilers I know of will add one copy, as a static function, for every translation unit (read .cpp file) where the function inlining failed. In a large program, this might mean *many* copies, and since every copy is in its own memory area, you will not get the boost from high performance hits due to good locality either. Fear of template code bloatSince it is ``a well known fact'' that templates leads to code bloat, many programmers avoid them. This means rolling their own algorithms and collections instead of using standard ones. Unless, however, the programmer is a top-notch algorithm and data-structure guru, the result is just about guaranteed to be slower, and quite possibly larger, than those in the standard library. After all, the contents of the standard library are state of the art. Have a look at the SGI STL, and read the documentation and code commens, and you will notice that many of the data structures and algorithms used were not even known a year ago. How many programmers have a working knowledge of the latest and greatest of algorithms and data structures? Technical problemsThe largest technical reason for slow C++ programs is compilers with very poor code optimizers. Most C++ compilers use exactly the same optimization techniques as C and Fortran compilers do, despite the fact that the way a C++ program works and accesses memory etc. differs a lot. I know of one C++ compiler that has a code optimizer made especially for the requirements of C++. It has been shown to equal and even beat Fortran for performance, and that is when using good encapsulation and templates. When feeding it a program written like a C program, performance will be poorer. If you are curious, look up KAI C++ (Note. I am not associated with them in any way. I have used their compiler, that is all.) Unfortunately it is not available for OS/2, and I doubt it ever will be (bug them about it, though, and it might happen.) I mentioned earlier the problem of outlined inline functions. A good compiler will not leave you several copies as static functions, but unfortunately many do. A worse technical problem is exceptions. Some compiler vendors, and many proponents of exceptions, say that exception handling can be implemented such that there is no performance loss at all, unless an exception is actually thrown. It may seem like it, but it is not true. No matter how good the implementation is, exceptions add a number of program flows that would otherwise not be there, and that makes life harder for the code optimiser. Recommended readingI want to finish by recommending some books on C++. ``Effective C++, 2nd ed'', Scott Meyers. A must read for any C++ programmer. Contains 50 tips, with examples and discussions, for how to improve your programs. Meyers' writing style is easy to follow and entertaining too. ``More effective C++'', Scott Meyers, Another 35 tips, this time introducing cleverer optimisation techniques like copy-on-write. ``Advanced C++, Programming Styles and Idioms'', James O. Coplien. Without doubt this book from 1992 is getting old, but it contains many useful tecniques. It is a mind opener in many ways. ``Scientific and Engineering C++'', John J. Barton and Lee R. Nackman. Often simply referred to as B&N. This is a modern ``Advanced C++''. What they have done is to break almost every rule of thumb, and as a result they have extremely extensible, slick, type-safe and fast designs. Beware, however, this book is not for beginners. ``The Design and Evolution of C++'', Bjarne Stroustrup. This book answers all the why's. Why does C++ have multiple inheritance? Why does it provide assignment, destruction, copy-constructor and default constructor for you? Why is RTTI there? Why no garbage collection? Why not dynamic types like in Smalltalk? ``Ruminations on C++'', Andrew Koening and Barbara Moo. Now, this is a pleasant book to read. Koenig is the only author I know who can explain a problem to solve, which one immediately realizes will require a large program, present an easy to understand one page solution just to say it was unnecessarily clumsy and reduce it to half a page without sacrificing extensibility and generality. Entertaining and enlightening. Also, do follow comp.lang.c++.moderated. /Björn. |