An Introduction to C++ Programming - Part 10/13

From EDM2
Jump to: navigation, search
An Introduction to C++ Programming / Part
1 2 3 4 5 6 7 8 9 10 11 12 13

Written by Björn Fahller

An improved file array

[Note: the source code for this month is introcpp10.zip here. Ed.]

Last month a file based array template for truly huge amounts of data was introduced. While good, it was nowhere near our goals. Error handling was missing completely, making it dangerous to use in real life. There was no way to say how a user defined data type should be represented on disk, yet they weren't disallowed, which is a dangerous combination. It was also lacking iterators, something that is handy, and is an absolute requirement for generic programming with algorithms that are independent of the source of the data. On top of that, we'd really like the ability to store several different arrays in the same file, and also have an anonymous array which creates a temporary file and removes it when the array is destroyed. All of these will be dealt with this month, yet very little will be new. Instead it's time to make use of all the things learned so far in the course.

The data representation problem

In the file array as implemented last month, data was always stored in a raw binary format, exactly mirroring the bits as they lay in memory. This works fine for integers and such, but can be disastrous in other situations. Imagine a file array of strings (where string is a ``char*''). With the implementation from last month, the pointer value would be stored, not the data pointed to. When reading, a pointer value is read, and when dereferenced, whatever happens to be at the memory location pointed to (if anything) will be used (which is more than likely to result in a rather quick crash.) Anything with pointers is dangerous when stored in a raw binary format, yet we must somehow allow pointers in the array, and preferably so without causing problems for those using the array with built-in arithmetic types. How can this be done?

In part 4, when templates were introduced, a clever little construct called ``traits classes'' was shown. I then gave this rather terse description: ``A traits class is never instantiated, and doesn't contain any data. It just tells things about other classes, that is its sole purpose.'' Doesn't that smell like something we can use here? A traits class that tells how the data types should be represented on disk?

What do we need from such a traits class? Obviously, we need to know how much disk space each element will take, so a ``size'' member will definitely be necessary, otherwise we cannot know much disk space will be required. We also need to know how to store the data, and how to read it. The easiest way is probably to have member functions ``writeTo'' and ``readFrom'' in the traits class. Thus we can have something looking like this:


  template <class T> class FileArrayElementAccess
  {
  public:
    static const size_t size;
    static void writeTo(T value, ostream& os);
    static T readFrom(istream& is);
  };

The array is then rewritten to use this when dealing with the data. The change is extremely minor. ``storeElement'' needs to be rewritten as:


  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& element)
  {
    // what if index >= array_size?
    typedef FileArrayElementAccess<T> traits;
    (*pstream).seekp(traits::size*index
                      +sizeof(array_size), ios::beg);
    // what if seek fails?
    traits::writeTo(element,*pstream);
    // what if write failed?
    // what if too much data was written?
  }

The change for ``readElement'' is of course analogous. However, as indicated by the last comment, a new error possibility has shown up. What if the ``writeTo'' and ``readFrom'' members of the traits class are buggy and write or read more data to disk than they're allowed to? Since it's the user of the array that must write the traits class (at least for their own data types) we cannot solve the problem, but we can give the user a chance to discover that something went wrong. Unfortunately for writing, the error is extremely severe; it means that the next entry in the array will have its data destroyed...

In the traits class, by the way, the constant ``size'', used for telling how many bytes in the stream each ``T'' will occupy, poses a problem with most C++ compilers today (modern ones mostly makes life so much easier.) The problem is that a static variable, and also a static constant, in a class, needs to reside somewhere in memory, and the class declaration is not enough for that. This problem is two-fold. To begin with, where should it be stored? It's very much up to whoever writes the class, but somewhere in the code, there must be something like:


  const size_t ArrayFileElementAccess<X>::size = ...;

where ``X'' is the name of the class dealt with by the particular traits specialisation.

The second problem is that this is totally unnecessary. What we want is a value that can be used by the compiler at compile time, not a memory location to read a value from. As I mentioned, a modern compiler does make this much easier. In standard C++ it is allowed to write:


  template<> class ArrayFileElementAccess<X>
  {
  public:
    const size_t size = ...;
  ...
  };

Note that for some reason that I do not know, this construct is only legal if the type is a constant of an integral or enumeration type. ``size_t'' is such a type, it's some unsigned integral type, probably ``unsigned int'', but possibly ``unsigned long''. The expression denoted ``...'' must be possible to evaluate at compile time. Unless code is written that explicitly takes the address of ``size'', we need not give the constant any space to reside in. The odd construct ``template <>'' is also new C++ syntax, and means that what follows is a specialisation of a previously declared template.

For old compilers, however, there's a work-around for integral values, no larger than the largest ``int'' value. We cheat and use an enum instead of a ``size_t''. This makes the declaration:


  class ArrayFileElementAccess<X>
  {
  public:
    enum { size= ... };
  ...
  };

This is a bit ugly, but it is perfectly harmless.

The advantage gained by adding the traits class is flexibility and safety. If someone wants to use a file array for their own class, they're free to do so. However, they must first write a ``FileArrayElementAccess'' specialisation. Failure to do so will result in a compilation error. This early error detection is beneficial. The sloppy solution from last month would not yield any error until run-time, which means a (usually long) debugging session.

Several arrays in a file

What is needed in order to host several arrays in the same file? One way or the other, there must be a mechanism for finding out where one array begins and another ends. I think the simplest solution, is to let go of the file names, and instead make the constructors accept an ``fstream&''. We can then require that the put and get pointer of the stream must be where the array can begin, and we can in turn promise that the put and get pointer will be positioned at the byte after the array end. Of course, in addition to having a reference to the ``fstream'' in our class, we also need the ``home'' position, to seek relative to, when indexing the array. This becomes easy to write for us, it becomes easy to use as well. For someone requiring only one array in a file, there'll be slightly more code, an ``fstream'' object must be explicitly initialised somewhere, and passed to the constructor of the array, instead of just giving it a name. I think the functionality increase/code expansion exchange is favorable.

In order to improve the likelihood of finding errors, we can waste a few bytes of disk space by writing a well known header and trailer pattern at the beginning and end of the array (before the first element, and after the last one.) If someone wants to allocate an array using an existing file, we can find out if the get pointer is in place for an array start.

The constructor creating a file should, however, first try to read from the file to see if it exists. If it does, it should be created from the file, just like the constructor accepting a stream only does. If the read fails, however, we can safely assume that the file doesn't exist and should instead be created.

The change in the class definition, and constructor implementation is relatively straight forward, if long:


  template <class T>
  class FileArray
  {
  public:
    FileArray(fstream& fs, size_t elements);
    // create a new file.

    FileArray(fstream& fs);
    // use an existing file and get size from there
  ...
  private:
    void initFromFile(const char*);

    fstream& stream;
    size_t array_size; // in elements
    streampos home;
  };

 template <class T>
  FileArray<T>::FileArray(fstream& fs, size_t elements)
    : stream(fs),
      array_size(elements)
  {
    // what if the file could not be opened?
    // first try to read and see if there's a begin
    // pattern. Either there is one, or we should
    // get an eof.

    char pattern[6];
    stream.read(pattern,6);
    if (stream.eof()) {
      stream.clear(); // clear error state
                      // and initialise.

      // begin of array pattern.
      stream.write("ABegin",6);
      // must store size of elements, as last month
      const size_t elem_size
        =FileArrayElementAccess<T>::size;
      stream.write((const char*)&elem_size,
                   sizeof(elem_size));
      // and of course the number of elements
      stream.write((const char*)&array_size,
                   sizeof(array_size));
      // Now that we've written the maintenance
      // stuff, we know what the home position is.

      home = stream.tellp();

      // Then we must go the the end and write
      // the end pattern.

      stream.seekp(home+elem_size*array_size);
      stream.write("AEnd",4);

      // set put and get pointer to past the end pos.
      stream.seekg(stream.tellp());
      return;
    }

    initFromFile(pattern); // shared with other
                           // stream constructor
    if (array_size != elements) {
      // Uh oh. The data read from the stream,
      // and the size given in the constructor
      // mismatches! What now?
      stream.clear(ios::failbit);
    }

    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

  template <class T>
  FileArray<T>::FileArray(fstream& fs)
    : stream(fs)
  {
    // First read the head pattern to see if
    // it's right.
    char pattern[6];
    stream.read(pattern,6);
    initFromFile(pattern);
    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

  template <class T>
  void FileArray<T>::initFromFile(const char* p)
  {
    // Check if the read pattern is correct
    if (strncmp(p,"ABegin",6)) {
      // What to do? It was all wrong!
      stream.clear(ios::failbit);
      // for lack of better,
      // set the fail flag.
      return;
    }
    // OK, we have a valid array, now let's see if
    // it's of the right kind.
    size_t elem_size;
    stream.read((char*)&elem_size,sizeof(elem_size));
    if (elem_size != FileArrayElementAccess<T>::size)
    {
      // wrong kind of array, the element sizes
      // mismatch. Again, what to do? Let's set
      // the fail flag for now.
      stream.clear(ios::failbit);
      // stupid name for the
      // member function, right?
      return;
    }
    // Get the size of the array. Can't do much with
    // the size here, though.
    stream.read((char*)&array_size,sizeof(array_size));
    // Now we're past the header, so we know where the
    // data begins and can set the home position.

    home = stream.tellg();

    stream.seekg(home+elem_size*array_size);

    // Now positioned immediately after the last
    // element.

    char epattern[4];
    stream.read(epattern,4);
    if (strncmp(epattern,"AEnd",4)) {
      // Whoops, corrupt file!
      stream.clear(ios::failbit);
      return;
    }
    // Seems like we have a valid array!
  }

Other than the above, the only change needed for the array is that seeking will be done relative to ``home'' rather than the beginning of the file (plus the size of the header entries.) The new versions of ``storeElement'' and ``readElement'' become:


  template <class T>
  T FileArray<T>::readElement(size_t index) const
  { // what if index >= max_elements?
    typedef FileArrayElementAccess<T> traits;
    stream.seekg(home+index*traits::size);
    // what if seek fails?

    return traits::readFrom(stream);
    // what if read fails?
    // What if too much data is read?
  }

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& element)
  { // what if index >= array_size?
    typedef FileArrayElementAccess<T> traits;
    stream.seekp(home+traits::size*index);
    // what if seek fails?
    traits::writeTo(element,stream);
    // what if write failed?
    // what if too much data was written?
  }

Temporary file array

Making use of a temporary file to store a file array that's not to be persistent between runs of the application isn't that tricky. The implementation so far makes use of a stream and known data about the beginning of the stream, number of elements and size of the elements. This can be used for the temporary file as well. The only thing we need to do is to create the temporary file first, open it with an fstream object, and tie the stream reference to that object, and remember to delete the file in the destructor.

What's the best way of creating something and making sure we remember to undo it later? Well, of course, creating a new helper class which creates the file in its constructor and removes it in its destructor. Piece of cake. The only problem is that we shouldn't always create a temporary file, and when we do, we can handle it a bit different from what we do with a ``global'' file that can be shared. For example, we know that we have exclusive rights to the file, and that it won't be reused, so there's no need for the extra information in the beginning and end. So, how's a temporary file created? The C++ standard doesn't say, and neither is there any support for it in the old de-facto standard. I don't think C does either. There are, however, two functions ``tmpnam'' and ``tempnam'' defined as commonly supported extensions to C. They can be found in <stdio.h>. I have in this implementation chosen to use ``tempnam'' as it's more flexible. ``tempnam'' works like this: it accepts two string parameters named ``dir'' and ``prefix''. It first attempts to create a temporary file in the directory pointed to by the environment variable ``TMPDIR''. If that fails, it attempts to create it in the directory indicated by the ``dir'' parameter, unless it's 0, in which case a hard-coded default is attempted. It returns a ``char*'' indicating a name to use. The memory area pointed to is allocated with the C function ``malloc'', and thus must be deallocated with ``free'' and not delete[].

Over to the implementation details:

We add a class called temporaryfile, which does the above mentioned work. We also add a member variable ``pfile'' which is of type ``ptr<temporaryfile>''. Remember the ``ptr'' template from last month? It's a smart pointer that deallocates whatever it points to in its destructor. It's important that the member variable ``pfile'' is listed before the ``stream'' member, since initialisation is done in the order listed, and the ``stream'' member must be initialised from the file object owned by ``pfile''. We also add a constructor with the number of elements as its sole parameter, which makes use of the temporary file.


  class temporaryfile
  {
  public:
    temporaryfile();
    ~temporaryfile();
    iostream& stream();
  private:
    char* name;
    fstream fs;
  };

  temporaryfile::temporaryfile()
    : name(::tempnam(".","array")),
      fs(name, ios::in|ios::out|ios::binary)
  {
    // what if tmpnam fails and name is 0
    // what if fs is bad?
  }

  temporaryfile::~temporaryfile()
  {
    fs.close();
    ::remove(name);
    // what if remove fails?
    ::free(name);
  }

In the above code, ``tempnam'', ``remove'' and ``free'' are prefixed with ``::``, to make sure that it's the names in global scope that are meant, just in case someone enhances the class with a few more member functions whose name might clash.

For the sake of syntactical convenience, I have added yet another operator to the ``ptr'' class template:


  template <class T> class ptr
  {
  public:
    ptr(T* tp=0) : p(tp) {};
    ~ptr() { delete p; };
    T* operator->(void) const { return p; };
    T& operator*(void) const { return *p;};
  private:
    ptr(const ptr<T>&);
    ptr<T>& operator=(const ptr<T>&);
    T* p;
  };

It's the ``operator->'' that's new, which allows us to write things like ``p->x,'' where p is a ``ptr<X>'', and the type ``X'' contains some member named ``x''. The return type for ``operator->'' must be something that ``operator->'' can be applied to. The explanation sounds recursive, but it makes sense if you look at the above code. ``ptr<X>::operator->()'' returns an ``X*''. ``X*'' is something you can apply the built in ``operator->'' to (which gives you access to the elements.)


  template <class T>
  FileArray<T>::FileArray(size_t elements)
    : pfile(new temporaryfile),
      stream(pfile->stream()),
      array_size(elements),
      home(stream.tellg())
  {
    const size_t elem_size=
      FileArrayElementAccess<T>::size;
    // put a char just after the end to make
    // sure there's enough free disk space.
    stream.seekp(home+array_size*elem_size);
    char c;
    stream.write(&c,1);
    // what to do if write fails?
    // set put and get pointer to past the end pos
    stream.seekg(stream.tellp());
  }

That's it! The rest of the array works exactly as before. No need to rewrite anything else.

Code reuse

If you're an experienced C programmer, especially experienced with programming embedded systems where memory constraints are tough and you also have a good memory, you might get a feeling that something's wrong here.

What I'm talking about is something I mentioned the first time templates were introduced: ``Templates aren't source code. The source code is generated by the compiler when needed.'' This means that if we in a program uses FileArray<int>, FileArray<double>, FileArray<X> and FileArray<Y> (where ``X'' and ``Y'' are some classes,) there will be code for all four types. Now, have a close look at the member functions and see in what way ``FileArray<int>::FileArray(iostream& fs, size_t elements)'' differs from ``FileArray<char>::FileArray(iostream& fs, size_t elements)''. Please do compare them.

What did you find? The only difference at all is in the handling of the member ``elem_size'', yet the same code is generated several times with that as the only difference. This is what is often referred to as the template code bloat of C++. We don't want code bloat. We want fast, tight, and slick applications.

Since the only thing that differs is the size of the elements, we can move the rest to something that isn't templatised, and use that common base everywhere. I've already shown how code reuse can be done by creating a separate class and have a member variable of that type. In this article I want to show an alternative way of reusing code, and that is through inheritance. Note very carefully that I did not say public inheritance. Public inheritance models ``is-A'' relationships only. We don't want an ``is-A'' relationship here. All we want is to reuse code to reduce code bloat. This is done through private inheritance. Private inheritance is used far less than it should be. Here's all there is to it. Create a class with the desired implementation to reuse and inherit privately from it. Nothing more, nothing less. To a user of your class, it matters not at all if you chose not to reuse code at all, reuse through encapsulation of a member variable, or reuse through private inheritance. It's not possible to refer to the descendant class through a pointer to the private base class, private inheritance is an implementation detail only, and not an interface issue.

To the point. What can, and what can not be isolated and put in a private base class? Let's first look at the data. The ``stream'' reference member can definitely be moved to the base, and so can the ``pfile'' member for temporary files. The ``array_size'' member can safely be there too and also the ``home'' member for marking the beginning of the array on the stream. By doing that alone we have saved just about nothing at all, but if we add as a data member in the base class the size (on disk) for the elements, and we can initialise that member through the ``FileArrayElementAccess::size'' traits member, all seeking in the file, including the initial seeking when creating the file array, can be moved to the base class. Now a lot has been gained. Left will be very little. Let's look at the new improved implementation:

Now for the declaration of the base class.


  class FileArrayBase
  {
  public:
  protected:
    FileArrayBase(iostream& io,
                  size_t elements,
                  size_t elem_size);
    FileArrayBase(iostream& io);
    FileArrayBase(size_t elements, size_t elem_size);
    iostream& seekp(size_t index) const;
    iostream& seekg(size_t index) const;
    size_t size() const; // number of elements
    size_t element_size() const;
  private:
    class temporaryfile
    {
    public:
      temporaryfile();
      ~temporaryfile();
      iostream& stream();
    private:
      char* name;
      fstream fs;
    };
    void initFromFile(const char* p);
    ptr<temporaryfile> pfile;
    iostream& stream;
    size_t array_size;
    size_t e_size;
    streampos home;
  };

The only surprise here should be the nesting of the class ``temporaryfile.'' Yes, it's possible to define a class within a class. Since the ``temporaryfile'' class is defined in the private section of ``FileArrayBase'', it's inaccessible from anywhere other than the ``FileArrayBase'' implementation. It's actually possible to nest classes in class templates as well, but few compilers today support that. When implementing the member functions of the nested class, it looks a bit ugly, since the surrounding scope must be used.


  FileArrayBase::temporaryfile::temporaryfile()
    : name(::tempnam(".","array")),
      fs(name,ios::in|ios::out|ios::binary)
  {
    // what if tmpnam fails and name is 0
    // what if fs is bad?
  }

  FileArrayBase::temporaryfile::~temporaryfile()
  {
    fs.close();
    ::remove(name);
    // What if remove fails?
    ::free(name);
   }

  iostream& FileArrayBase::temporaryfile::stream()
  {
    return fs;
  }

The implementation of ``FileArrayBase'' is very similar to the ``FileArray'' earlier. The only difference is that we use a parameter for the element size, instead of the traits class.


  FileArrayBase::FileArrayBase(iostream& io,
                               size_t elements,
                               size_t elem_size)
    : stream(io),
      array_size(elements),
      e_size(elem_size)
  {
    char pattern[sizeof(ArrayBegin)];
    stream.read(pattern,sizeof(pattern));
    if (stream.eof()) {
      stream.clear(); // clear error state
                      // and initialize.
      // begin of array pattern.
      stream.write(ArrayBegin,sizeof(ArrayBegin));

      // must store size of elements
      stream.write((const char*)&elem_size,
                   sizeof(elem_size));

      // and of course the number of elements
      stream.write((const char*)&array_size,
                   sizeof(array_size));

      // Now that we've written the maintenance
      // stuff, we know what the home position is.
      home = stream.tellp();

      // Then we must go the the end and write
      // the end pattern.

      stream.seekp(home+elem_size*array_size);
      stream.write(ArrayEnd,sizeof(ArrayEnd));

      // set put and get pointer to past the end pos.
      stream.seekg(stream.tellp());
      return;
    }
    initFromFile(pattern); // shared with other
                           // stream constructor

    if (array_size != elements) {
      // Uh oh. The data read from the stream,
      // and the size given in the constructor
      // mismatches! What now?

      stream.clear(ios::failbit);
    }
    if (e_size != elem_size) {
      stream.clear(ios::failbit);
    }
    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

To make life a little bit easier, I've assumed two arrays of char named ``ArrayBegin'' and ``ArrayEnd'', which hold the patterns to be used for marking the beginning and end of an array on disk.


  FileArrayBase::FileArrayBase(iostream& io)
    : stream(io)
  {
    char pattern[sizeof(ArrayBegin)];
    stream.read(pattern,sizeof(pattern));
    initFromFile(pattern);

    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

  FileArrayBase::FileArrayBase(size_t elements,
                               size_t elem_size)
    : pfile(new temporaryfile),
      stream(pfile->stream()),
      array_size(elements),
      e_size(elem_size),
      home(stream.tellg())
  {
    stream.seekp(home+array_size*e_size);
    char c;
    stream.write(&c,1);
    // set put and get pointer to past the end pos.
    stream.seekg(stream.tellp());
  }

  void FileArrayBase::initFromFile(const char* p)
  {
    // Check if the read pattern is correct
    if (strncmp(p,ArrayBegin,sizeof(ArrayBegin))) {
      // What to do? It was all wrong!
      stream.clear(ios::failbit); // for lack of better,
                                  // set the fail flag.
      return;
    }
    // OK, we have a valid array, now let's see if
    // it's of the right kind.
    stream.read((char*)&e_size,sizeof(e_size));

    // Get the size of the array. Can't do much with
    // the size here, though.
    stream.read((char*)&array_size,sizeof(array_size));

    // Now we're past the header, so we know where the
    // data begins and can set the home position.
    home = stream.tellg();
    stream.seekg(home+e_size*array_size);
    // Now positioned immediately after the last
    // element.
    char epattern[sizeof(ArrayEnd)];
    stream.read(epattern,sizeof(epattern));
    if (strncmp(epattern,ArrayEnd,sizeof(ArrayEnd)))
    {
      // Whoops, corrupt file!
      stream.clear(ios::failbit);
      return;
    }
    // Seems like we have a valid array!
  }

  iostream& FileArrayBase::seekg(size_t index) const
  {
    // what if index is out of bounds?
    stream.seekg(home+index*e_size);
    // what if seek failed?
    return stream;
  }

  iostream& FileArrayBase::seekp(size_t index) const
  {
    // What if index is out of bounds?
    stream.seekp(home+index*e_size);
    // What if seek failed?
    return stream;
  }

  size_t FileArrayBase::size() const
  {
    return array_size;
  }

  size_t FileArrayBase::element_size() const
  {
    return e_size;
  }

Apart from the tricky questions, it's all pretty straight forward. The really good news, however, is how easy this makes the implementation of the class template ``FileArray''.


  template <class T>
  class FileArray : private FileArrayBase
  {
  public:
    FileArray(iostream& io, size_t size);// create one.
    FileArray(iostream& io); // use existing array
    FileArray(size_t elements);  // create temporary
    T operator[](size_t index) const;
    FileArrayProxy<T> operator[](size_t index);
    size_t size() { return FileArrayBase::size(); };
  private:
    FileArray(const FileArray<T>&); // illegal
    FileArray<T>& operator=(const FileArray<T>&);
    // illegal

    T readElement(size_t index) const;
    void storeElement(size_t index, const T& elem);
    friend class FileArrayProxy<T>;
  };

Now watch this!


  template <class T>
  FileArray<T>::FileArray(iostream& io, size_t size)
    : FileArrayBase(io,
                    elements,
                    FileArrayElementAccess<T>::size)
  {
  }

  template <class T>
  FileArray<T>::FileArray(iostream& io)
    : FileArrayBase(io)
  {
    // what if element_size is wrong?
  }

  template <class T>
  FileArray<T>::FileArray(size_t elements)
    : FileArrayBase(elements,
                    FileArrayElementAccess<T>::size)
  {
  }

  template <class T>
  T FileArray<T>::operator[](size_t index) const
  {
    // what if index>= size()?
    return readElement(index);
  }

  template <class T>
  FileArrayProxy<T>
  FileArray<T>::operator[](size_t index)
  {
    // what if index>= size()?
    return FileArrayProxy<T>(*this, index);
  }

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    // what if index>= size()?
    iostream& s = seekg(index); // parent seekg
    return FileArrayElementAccess<T>::readFrom(s);
    // what if read failed?
    // What if too much data was read?
    return t;
  }

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& element)
  { // what if index>= size()?
    iostream& s = seekp(index); // parent seekp
    // what if seek fails?
    FileArrayElementAccess<T>::writeTo(element,s);
    // what if write failed?
    // What if too much data was written?
  }

How much easier can it get? This reduced code bloat, and also makes the source code easier to understand, extend and maintain.

What can go wrong?

Already in the very beginning of this article series, part 1, I introduced exceptions; the C++ error handling mechanism. Of course exceptions should be used to handle the error situations that can occur in our array class. When I introduced exceptions, I didn't tell the whole truth about them. There was one thing I didn't tell, because at that time it wouldn't have made much sense. That one thing is that when exceptions are caught, dynamic binding works, or to use wording slightly more English-like, we can create exception class hierarchies with public inheritance, and we can choose what level to catch. Here's a mini example showing the idea:


  class A {};
  class B : public A {};
  class C : public A {};
  class B1 : public B{};

  void f() (throw A); // may throw any of the above

  void x()
  {
    try {
      f();
    }
    catch (B& b) {
      // **1
    }
    catch (C& c) {
      // **2
    }
    catch (A& a) {
      // **3
    }
  }

At ``**1'' above, objects of class ``B'' and class ``B1'' are caught if thrown from ``f''. In ``**2'' objects of class ``C'' (and descendants of C, if any are declared elsewhere) are caught. At ``**3'' all others from the ``A'' hierarchy are caught.

This may seem like a curious detail of purely academic worth, but it's extremely useful. We can use abstraction levels for errors. For example, we can have a root class ``FileArrayException'', from which all other exceptions regarding the file array inherits. We can see that there are clearly two kinds of errors that can occur in the file array; abuse and environmental issues outside the control of the programmer. For abuse I mean things like indexing outside the valid bounds, and with environmental issues I mean faulty or full disks (Since there are several programs running, a check if there's enough disk space is still taking a chance. Even if there was enough free space when the check was made, that space may be occupied when the next statement in the program is executed.)

A reasonable start for the exception hierarchy then becomes:


  class FileArrayException {};
  class FileArrayLogicError
    : public FileArrayException {};
  class FileArrayRuntimeError
    : public FileArray Exception {};

Here ``FileArrayLogicError'' are for clear violations of the not too clearly stated preconditions, and ``FileArrayRuntimeError'' for things that the programmer may not have a chance to do something about. In a perfectly debugged program, the only exceptions ever thrown from file arrays will be of the ``FileArrayRuntimeError'' kind.

We can divide those further into:


  class FileArrayCreateError
    : public FileArrayRuntimeError {};

For whenever the creation of the array fails, regardless of why (it's not very easy to find out if it's a faulty disk or lack of disk space, for example.)


  class FileArrayStreamError
    : public FileArrayRuntimeError {};

If after creation, something goes wrong with a stream; for example if seeking or reading/writing fails.


  class FileArrayDataCorruptionError
    : public FileArrayRuntimeError {};

If an array is created from an old existing file, and we note that the header or trailer doesn't match the expected.


  class FileArrayBoundsError
    : public FileArrayLogicError {};

Addressing outside the legal bounds.


  class FileArrayElementSizeError
    : public FileArrayLogicError {};

If the read/write members of the element access traits class are faulty and either write too much (thus overwriting the data for the next element) or reads too much (in which case the last few bytes read will be garbage picked from the next element.)

It's of course possible to take this even further. I think this is quite enough, though.

Now we have a reasonably fine level of error reporting, yet an application that wishes a coarse level of error handling can choose to catch the higher levels of the hierarchy only.

As an exercise, I invite you to add the throws to the code. Beware, however; it's not a good idea to add exception specifications to the member functions making use of the T's (since you cannot know which operations on T's that may throw, and what they do throw.) You can increase the code size and eligibility gain from the private inheritance of the implementation in the base by putting quite a lot of the error handling there.

Iterators

An iterator into a file array is something whose behavior is analogous to that of pointers into arrays. We want to be able to create an iterator from the array (in which case the iterator refers to the first element of the array.) We want to access that element by dereferencing the iterator (unary operator *,) and we want iterator arithmetic with integers.

An easy way of getting there is to let an iterator contain a pointer to a file array, and an index. Whenever the iterator is dereferenced, we return (*array)[index]. That way we even have error handling for iterator arithmetic that lead us outside the valid range for the array given for free from the array itself. The iterator arithmetics becomes simple too, since it's just ordinary arithmetics on the index type. The implementation thus seems easy; all that's needed is to define the operations needed for the iterators, and the actions we want. Here's my idea:

  • creation from array yields iterator referring to first element
  • copy construction and assignment are of course well behaved.
  • moving forwards and backwards with operator++ and operator--.
  • addition of array and ``long int'' value ``n'' yields iterator referring to n:th element of array.
  • iterator+=n (where n is of type long int) adds n to the value of the index in the iterator. This addition is never an error; it's dereferencing the iterator that's an error if the index is out of range. Operator -= is analogous.
  • iterator+n yields a new iterator referring to the iterator.index+n:th element of the array, and analogous for operator-.
  • iterator1-iterator2 yields a long int which is the difference between the indices of the iterators. If iterator1 and iterator2 refer to different arrays, it's an error and we throw an exception.
  • iterator1==iterator2 returns non-zero if the arrays and indices of iterator1 and iterator2 are equal.
  • iterator1!=iterator2 returns !(iterator1==iterator2)
  • *iterator returns whatever (*array)[index] returns, i.e a
  • leArrayProxy. * iterator[n] returns (*array)[index+n].
  • iterator1<iterator2 returns true if the iterators refer to the same array and iterator1.index < iterator2.index. If the iterators refer to different arrays, it's an error and we throw an exception. Likewise for operator>.
  • iterator1>=iterator2 returns !(iterator1<iterator2). Likewise for operator<=.

I think the above is an exhaustive list. Neither of the above is difficult. It's just a lot of code to write, and thus a good chance of making errors. With a little thought, however, quite a lot of code can be reused over and over, thus reducing the amount to write and also the risk for errors. As an example, a rule of thumb when writing a class for which an object ``o'' and some other value ``v'' the operations ``o+=v'', ``o+v'' and ``v+o'' are well defined and behaves like they do for the built in types (which they really ought to, unless you want to give the class users some rather unhealthy surprises) is to define ``operator+='' as a member of the class, and two versions of operator+ that are implemented with ``operator+=''. Here's how it's done in the iterator example:


  template <class T>
  class FileArrayIterator
  {
  public:
    FileArrayIterator(FileArray<T>& f);
    FileArrayIterator<T>& operator+=(long n);
    FileArrayProxy<T> operator*();
    FileArrayProxy<T> operator[](long n);
    ...
  private:
    FileArray<T>* array;
    unsigned long index;
  };

  template <class T> FileArrayIterator<T>
  operator+(const FileArrayIterator<T>& i, long n);

  template <class T> FileArrayIterator<T>
  operator+(long n, const FileArrayIterator<T>& i);

  template <class T>
  FileArrayIterator<T>::FileArrayIterator(
    const FileArray<T>& a
  )
    : array(&a),
      index(0)
  {
  }

  template <class T>
  FileArrayIterator<T>::FileArrayIterator(
    const FileArrayIterator<T>& i
  )
    : array(i.array),
      index(i.index)
  {
  }

  template <class T>
  FileArrayIterator<T>&
  FileArrayIterator<T>::operator+=(long n)
  {
    index+=n;
    return *this;
  }

  template <class T> FileArrayIterator<T>
  operator+(const FileArrayIterator<T>& i, long n)
  {
    FileArrayIterator<T> it(i);
    return it+=n;
  }

  template <class T> FileArrayIterator<T>
  operator+(long n, const FileArrayIterator<T>& i)
  {
    FileArrayIterator<T> it(i);
    return it+=n;
  }

  template <class T>
  FileArrayProxy<T> FileArrayIterator<T>::operator*()
  {
    return (*array)[index];
  }

  template <class T>
  FileArrayProxy<T>
  FileArrayIterator<T>::operator[](long n)
  {
    return (*array)[index+n];
  }

Surely, the code for the two versions of ``operator+'' must be written, but since its behaviour is defined in terms of ``operator+='' it means that if we have an error, there's only one place to correct it.

There's no need to display all the code here in the article, you can study it in the sources. The above shows how it all works, though, and as you can see, it's fairly simple.

Recap

This month the news in short was:

  • You can increase flexibility for your templates without sacrificing ease of use or safety by using traits classes.
  • Enumerations in classes can be used to have class-scope constants of integral type.
  • Modern compilers do not need the above hack. Defining a class-scope static constant of an integral type in the class declaration is cleaner and more type safe.
  • Standard C++ and even C, does not have any support for the notion of temporary files. Fortunately there are commonly supported extensions to the languages that do.
  • Private inheritance can be used for code reuse.
  • Private inheritance is very different from public inheritance. Public inheritance models ``is-A'' relationships, while private inheritance models ``is-implemented-in-terms-of'' relationships.
  • A user of a class that has privately inherited from something else cannot take advantage of this fact. To a user the private inheritance doesn't make any difference.
  • Private inheritance is in real-life used far less than it should be. In many situations where public inheritance is used, private inheritance should've been used.
  • Exception catching is polymorphic (i.e. dynamic binding works when catching.)
  • The polymorphism of exception catching allows us to create an arbitrarily fine-grained error reporting mechanism while still allowing users who want a coarse error reporting mechanism to use one (they'll just catch classes near the root of the exception class inheritance tree.)
  • Always implement binary operator+, operator-, operator* and operator/ as functions outside the classes, and always implement them in terms of the operator+=, operator-=, operator*= and operator/= members of the classes.

Exercises

  • Alter the file array such that it's possible to instantiate two (or more) kinds of FileArray<X> in the same program, where the alternatives store the data in different formats. (hint, the alternatives will all need different traits class specialisations.)
  • What's the difference between using private inheritance of a base class, and using a member variable of that same class, for reusing code?
  • In which situations is it crucial which alternative you choose?

Coming up

Next month we'll have a look at smart pointers. I'm beginning to dry up on topics now, however, so please write and give me suggestions for future topics to cover.