EDM/2

An Introduction to C++ Programming - Part 9/13

File I/O and Binary Streams

Written by Björn Fahller


In parts 5 and 6, the basics of I/O were introduced, with formatted reading and writing from standard input and output. We'll now have a look at I/O for files. In a sense, it's better to stop using the term I/O here, and instead use streams and streaming, since the ideas expressed here and in parts 5 and 6 can be used for other things than I/O, for example in-memory formatting of data (we'll see that at the very end of this article.)

Files

In what way is writing ``Hello world'' on standard output different from writing it to a file? The question is worth some thought, since in many programming languages there is a distinct difference. Is the message different? Is the format (as seen from the program) different? I cannot see any difference in those aspects. The only thing that truly differs is the medium where the formatted message ends up. In the former case, it's on your screen, but for file I/O it's in a file somewhere on your hard disk. In other words, there is very little difference, or at least, there's very much in common.

As we've seen so far, commonality is expressed either through inheritance or templates, depending on what's common and what's not. To refresh your memory, templates are used when we want the same kind of behaviour, independent of data, for example a stack of some data type. Inheritance is used when you want similar, but in some important aspects different, behaviour at runtime for the same kind of data. We saw this for the staff hierarchy and mailing addresses in parts 7 and 8. In this case it's inheritance that's the correct solution, since the data will be the same, but where it will end up (and most notably, how it does end up there) differs. (Incidentally, there's a good case for using templates too, regarding the type of characters used. The C++ standard does indeed have templatized streams, just for distinguishing between character types. Few compilers today support this, however. See the ``Standards Update'' towards the end of the article for more information.)

The inheritance tree for stream types looks like this:

                ios
               /   \
        istream     ostream
        /     \     /     \
  ifstream   iostream   ofstream
        \                 /
         +--- fstream ---+

The way to read this is that there's a base class named ``ios'', from which the classes ``istream'' and ``ostream'' inherit. The classes ``ifstream'' and ``ofstream'' in their turn inherit from ``istream'' and ``ostream'' respectively. The ``f'' in the names implies that they're file streams. Then there are the odd ones, ``iostream'', which inherits from both ``istream'' and ``ostream'', and ``fstream'' which inherits from both ``ifstream'' and ``ofstream.'' Inheriting from two bases is called multiple inheritance, and is by many seen as evil. Many programming languages have banned it: Objective-C, Java, Smalltalk to mention a few, while other programming languages, like Eiffel, go to the other extreme and allow you to inherit the same base several times. Personally I think multiple inheritance is very useful if used right, but it can cause severe problems. Here is a situation where it's used in the right way. Anyway, this means that ``fstream'' is a file stream for both reading and writing, while ``iostream'' is an abstract stream for both reading and writing. More often than you might think, you don't want to use the ``iostream'' or ``fstream'' classes.

This inheritance, however, means that all the stream insertion and extraction functions (the ``operator>>'' and ``operator<<'') you've written will work just as well with file streams. Now, wasn't that neat? In other words, the only things you need to learn for file based I/O are the details that are specific to files.
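To make this concrete, here is a minimal sketch. It assumes a current compiler with the standard ``<fstream>'' and ``<sstream>'' headers (so the names live in ``std''), rather than the ``<fstream.h>'' used elsewhere in this article. ``Point'', ``showPoint'' and ``savePoint'' are made-up names for the illustration. The point is that one ``operator<<'', written against ``ostream'', serves an in-memory stream and a file stream alike.

```cpp
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type, used only for this illustration.
struct Point
{
  int x;
  int y;
};

// Written once, against the base class ostream.
std::ostream& operator<<(std::ostream& os, const Point& p)
{
  return os << '(' << p.x << ',' << p.y << ')';
}

// Format into memory; an ostringstream is an ostream.
std::string showPoint(const Point& p)
{
  std::ostringstream os;
  os << p;
  return os.str();
}

// Format into a file; an ofstream is an ostream too.
bool savePoint(const char* name, const Point& p)
{
  std::ofstream of(name);
  if (!of) return false;
  of << p << '\n';
  return of.good();
}
```

Neither ``showPoint'' nor ``savePoint'' needed any change to ``operator<<''; only the type of the stream object differs.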

File Streams

The first thing you need to know before you can use file streams is how to create them. The parts of interest look like this:


  class ifstream : public istream
  {
  public:
    ifstream();
    ifstream(const char* name,
             int mode=ios::in);
    void open(const char* name,
              int mode=ios::in);
    ...
  };

  class ofstream : public ostream
  {
  public:
    ofstream();
    ofstream(const char* name,
             int mode=ios::out);
    void open(const char* name,
              int mode=ios::out);
    ...
  };

  class fstream : public ofstream, public ifstream
  {
  public:
    fstream();
    fstream(const char* name,
            int mode);
    void open(const char* name,
              int mode);
    ...
  };
You get access to the classes by #including <fstream.h>. The empty constructors always create a file stream object that is not tied to any file. To tie such an object to a file, a call to ``open'' must be made. ``open'' and the constructors with parameters behave identically. ``name'' is of course the name of the file. Since you normally use either ``ifstream'' or ``ofstream'' and rarely ``fstream'', this is normally the only parameter you need to supply. Sometimes, however, you need to use the ``mode'' parameter. It's a bit field, in which you use bitwise or (``operator|'') to combine any of the values ``ios::in'', ``ios::out'', ``ios::ate'', ``ios::app'', ``ios::trunc'', and finally ``ios::binary.'' Some implementations also provide ``ios::nocreate'' and ``ios::noreplace,'' but those are extensions. Some implementations do not have ``ios::binary,'' while others call it ``ios::bin.'' These variations of course make it difficult to write portable C++ today. Fortunately, the six listed first are required by the standard (although they belong to class ``ios_base,'' rather than ``ios.'') The meanings of these are:

  ios::in        open for reading

  ios::out       open for writing

  ios::ate       open with the get and put pointers at the end
                 of the file (see Seeking for info).

  ios::app       open for append, that is, any write you make
                 to the file will be appended to the file.

  ios::trunc     scrap all data in the file if it already exists.

  ios::binary    open in binary mode, that is, do not do the brain
                 damaged LF<->CR/LF conversions that OS/2,
                 DOS, CP/M (RIP), Windows, and probably other
                 operating systems, so often insist on. The reason
                 some implementations do not have ios::binary
                 is that many operating systems do not have this
                 conversion, so there's no need for it.

  ios::noreplace cause the open to fail if the file already exists.

  ios::nocreate  cause the open to fail if the file doesn't exist.
Of course combinations like ``ios::noreplace | ios::nocreate'' don't make sense -- the failure is guaranteed.
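As a rough sketch of how the mode bits combine, the following contrasts ``ios::trunc'' with ``ios::app''. It assumes a current compiler with the standard ``<fstream>'' header and the ``std'' namespace; ``startLog'', ``appendLog'' and ``countLines'' are names invented for this example.

```cpp
#include <fstream>
#include <string>

// Replace the file's contents: ios::trunc scraps what was there.
bool startLog(const char* name, const std::string& text)
{
  std::ofstream of(name, std::ios::out | std::ios::trunc);
  of << text << '\n';
  return of.good();
}

// Keep the contents: with ios::app every write goes to the end.
bool appendLog(const char* name, const std::string& text)
{
  std::ofstream of(name, std::ios::out | std::ios::app);
  of << text << '\n';
  return of.good();
}

// Count the lines, to observe the effect of the two modes.
int countLines(const char* name)
{
  std::ifstream in(name);
  std::string line;
  int n = 0;
  while (std::getline(in, line)) ++n;
  return n;
}
```

Calling ``startLog'' and then ``appendLog'' on the same file leaves two lines in it; calling ``startLog'' again leaves just one.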

On many implementations today there's also a third parameter for the constructors and ``open;'' a protection parameter. How this parameter behaves is very operating system dependent.

Now for some simple usage:


  #include <fstream.h>

  int main(int argc, char* argv[])
  {
    if (argc != 2) {
      cout << "Usage: " << argv[0] << " filename" << endl;
      return 1; // error code
    }

    ofstream of(argv[1]); // create the ofstream object
                          // and open the file.

    if (!of) { // something went wrong
      cout << "Error, cannot open " << argv[1] << endl;
      return 2;
    }

    // Now the file stream object is created. Write to it!
    of << "Hello file!" << endl;
    return 0;
  }
As you can see, once the stream object is created, its usage is analogous to that of ``cout'' that you're already familiar with. Of course reading with ``ifstream'' is done the same way, just use the object as you've used ``cin'' earlier.
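For the reading direction, a small sketch may help. It assumes the standard ``<fstream>'' header of a current compiler, and ``sumFile'' is a name made up for the example. The extraction loop is exactly what you would write for ``cin''.

```cpp
#include <fstream>

// Sums the whitespace separated integers found in the named file.
// Returns false if the file cannot be opened.
bool sumFile(const char* name, long& sum)
{
  std::ifstream in(name);
  if (!in) return false;   // same test as for the ofstream above

  sum = 0;
  long value;
  while (in >> value)      // formatted extraction, just like cin
    sum += value;
  return true;
}
```

The loop ends when the extraction fails, which happens at end of file or when something that isn't a number is found.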

The file stream classes also have a member function ``close'', that by force closes the file and unties the stream object from it. Few are the situations when you need to call this member function, since the destructors do close the file.

Actually this is all there is that's specific to files.

Binary streaming

So far we've dealt with formatted streaming only, that is, the process of translating raw data into a human readable form, or translating human readable data into the computer's internal representation. Sometimes you want to stream raw data as raw data, for example to save space in a file. If you look at a file produced by, for example, a word processor, it's most likely not in a human readable form. Note that binary streaming does not necessarily mean using the ``ios::binary'' mode when opening a file (although that is indeed often the case.) They're two different concepts. Binary streaming is what you use your stream for, raw data that is, and opening a file with the ``ios::binary'' mode means turning the brain damaged LF<->CR/LF translation off.
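To see the difference in size, here's a small sketch comparing the two forms of the same ``int''. It assumes the standard ``<sstream>'' header of a current compiler and uses an in-memory stream so that no file is needed; ``formattedSize'' and ``rawSize'' are names invented for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Number of characters the formatted (human readable) form takes.
std::size_t formattedSize(int value)
{
  std::ostringstream os;
  os << value;
  return os.str().size();
}

// Number of bytes the raw form takes: always sizeof(int).
std::size_t rawSize(int value)
{
  std::ostringstream os;
  os.write(reinterpret_cast<const char*>(&value), sizeof(value));
  return os.str().size();
}
```

The formatted size depends on the value (ten characters for 1234567890, one for 7), while the raw size is ``sizeof(int)'' regardless of the value.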

Binary streaming is done through the stream member functions:


  class ostream ...
  {
  public:
    ostream& write(const char* s, streamsize n);
    ostream& put(char c);
    ostream& flush();
  ...
  };

  class istream ...
  {
  public:
    istream& read(char* s, streamsize n);
    int get();
    istream& get(char& c);
    istream& get(char* s, streamsize n, char delim='\n');
    istream& getline(char* s, streamsize n,
                     char delim='\n');
    istream& ignore(streamsize n=1, int delim=EOF);
  };
The writing interface is extremely simple and straightforward, while the reading interface includes a number of small but important differences. Note that these member functions are implemented in classes ``istream'' and ``ostream,'' so they're not specific to files, although files are where you're most likely to use them. Let's have a look at them, one by one:

  ostream& ostream::write(const char* s, streamsize n);
Write ``n'' characters to the stream, from the array pointed to by ``s.'' ``streamsize'' is a signed integral data type. Despite ``streamsize'' being signed, you're of course not allowed to pass a negative size here (what would that mean?) Exactly the characters found in ``s'' will be written to the stream, no more, no less.

  ostream& ostream::put(char c);
Inserts the character into the stream.

  ostream& ostream::flush();
Force the data in the stream to be written (file streams are usually buffered.)

  istream& istream::read(char* s, streamsize n);
Read ``n'' characters into the array pointed to by ``s.'' Here you had better make sure that the array is large enough, or unpleasant things will happen. Note that only the characters read from the stream are inserted into the array. It will not be zero terminated, unless the last character read from the stream indeed is '\0'.

  int istream::get();
Read one character from the stream, and return it. The value is an ``int'' instead of ``char'' since the return value might be ``EOF'' (which is not uniquely representable as a ``char.'')

  istream& istream::get(char& c);
Same as above, but read the character into ``c'' instead. Here a ``char'' is used instead of an ``int,'' since you can check for end of file by calling ``.eof()'' on the stream reference returned.

  istream& istream::get(char* s, streamsize n,
                        char delim='\n');
This one's similar to ``read'' above, but with the difference that it reads at most ``n'' characters. It stops if the delimiter character is found. Note that when the delimiter is found, it is not read from the stream.

  istream& istream::getline(char* s, streamsize n,
                            char delim='\n');
The only difference between this one and ``get'' above, is that this one does read the delimiter from the stream. Note, however, that the delimiter is not stored in the array.

  istream& istream::ignore(streamsize n=1,
                           int delim=EOF);
Reads at most ``n'' characters from the stream, but doesn't store them anywhere. If the delimiter character is read, it stops there. Of course, if the delimiter is ``EOF'' (as is the default) it does not read past ``EOF,'' that's physically impossible.
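The difference between ``get'' and ``getline'' is easy to forget, so here's a small sketch showing it. It assumes the standard ``<sstream>'' header of a current compiler and reads from an in-memory stream; ``nextAfterGet'' and ``nextAfterGetline'' are names made up for the example.

```cpp
#include <sstream>
#include <string>

// Returns the character left waiting in the stream after
// get() has read up to the delimiter.
char nextAfterGet(const std::string& text)
{
  std::istringstream is(text);
  char buf[16];
  is.get(buf, sizeof(buf));      // stops before '\n'
  return static_cast<char>(is.peek());
}

// The same, but using getline() instead.
char nextAfterGetline(const std::string& text)
{
  std::istringstream is(text);
  char buf[16];
  is.getline(buf, sizeof(buf));  // reads '\n', but does not store it
  return static_cast<char>(is.peek());
}
```

For the input "ab\ncd", the first function finds the newline still waiting in the stream, while the second finds the ``c'' that begins the next line.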

Array on file

An example: Say we want to store an array of integers in a file, and we want to do this in raw binary format. Naturally we want to be able to read the array as well. A reasonable way is to first store a size (in elements) followed by the data. Both the size and the data will be in raw format.


  #include <fstream.h>

  void storeArray(ostream& os, const int* p, size_t elems)
  {
    os.write((const char*)&elems,sizeof(elems));
    os.write((const char*)p, elems*sizeof(*p));
  }
The above code does a lot of ugly type casting, but that's normal for binary streaming. What's done here is to use brute force to see the address of ``elems'' as a ``const char*'' (since that's what ``write'' expects) and then say that only the ``sizeof(elems)'' bytes from that pointer are to be read. What this actually does is to write out the raw memory that ``elems'' resides in to the stream. After this, it does the same kind of thing for the array. Note that ``sizeof(*p)'' reports the size of the type that ``p'' points to. I could as well have written ``sizeof(int),'' but that is a dangerous duplication of facts. It's enough that I've said that ``p'' is a pointer to ``int.'' Repeating ``int'' again just means I'll forget to update one of them when I change the type to something else.

To read such an array into memory requires a little more work:


  #include <fstream.h>

  size_t readArray(istream& is, int*& p)
  {
    size_t elems;
    is.read((char*)&elems, sizeof(elems));
    p = new int[elems];
    is.read((char*)p, elems*sizeof(*p));
    return elems;
  }
It's not particularly hard to follow; first read the number of elements, then allocate an array of that size, and read the data into it.
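Here is a sketch of the two functions used together. It assumes a current compiler with the standard headers, and it uses a ``stringstream'' instead of a file just to keep the example self-contained; ``roundTrip'' is a name invented for the illustration. Since ``readArray'' allocates with ``new[]'', the caller has to ``delete[]'' the array.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>

void storeArray(std::ostream& os, const int* p, std::size_t elems)
{
  os.write(reinterpret_cast<const char*>(&elems), sizeof(elems));
  os.write(reinterpret_cast<const char*>(p), elems * sizeof(*p));
}

// The caller owns the returned array and must delete[] it.
std::size_t readArray(std::istream& is, int*& p)
{
  std::size_t elems = 0;
  is.read(reinterpret_cast<char*>(&elems), sizeof(elems));
  p = new int[elems];
  is.read(reinterpret_cast<char*>(p), elems * sizeof(*p));
  return elems;
}

// Store an array into an in-memory stream and read it back.
bool roundTrip(const int* data, std::size_t elems)
{
  std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
  storeArray(ss, data, elems);

  int* copy = 0;
  std::size_t n = readArray(ss, copy);
  bool same = (n == elems);
  for (std::size_t i = 0; same && i < n; ++i)
    same = (copy[i] == data[i]);
  delete[] copy;
  return same;
}
```

Note that data stored this way can only be expected to read back correctly on the same kind of machine and compiler, since the raw representation of ``int'' and ``size_t'' differs between platforms.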

Seeking

Up until now we have seen streams as, what it sounds like, continuous streams of data. Sometimes however, there's a need to move around, both backward and forward. Streams like standard input and standard output are truly continuous streams, within which you cannot move around. Files, in contrast, are true random access data stores. Random access streams have something called position pointers. They're not to be confused with pointers in the normal C++ sense, but it's something referring to where in the file you currently are. There's the put pointer, which refers to the next position to write data to, if you attempt to write anything, and the get pointer, which refers to the next position to read data from. An ostream of course only has the put pointer, and an istream only the get pointer. There's a total of 6 new member functions that deal with random access in a stream:


  streampos istream::tellg();

  istream& istream::seekg(streampos);

  istream& istream::seekg(streamoff, ios::seek_dir);

  streampos ostream::tellp();

  ostream& ostream::seekp(streampos);

  ostream& ostream::seekp(streamoff, ios::seek_dir);
``streampos'', which you get from ``tellg'' and ``tellp'' is an absolute position in a stream. You cannot use the values for anything other than ``seekg'' and ``seekp''. You especially cannot examine a value and hope to find something useful there (i.e. you can, but what you find out might hold only for the current release of your specific compiler, other compilers, or other releases of the same compiler, might show different characteristics for ``streampos.'') Well, there are two other things you can do with ``streampos'' values. You can subtract two values, and get a ``streamoff'' value, and you can add a ``streamoff'' value to a ``streampos'' value. ``streamoff,'' by the way, is some signed integral type, probably a ``long.''

By using the value returned from ``tellg'' or ``tellp,'' you have a way of finding your way back, or of doing relative seeks by adding/subtracting ``streamoff'' values.

The ``seekg'' and ``seekp'' overloads that accept a ``streamoff'' value and a direction work in a slightly different way. You seek to a position relative to the beginning of the stream, the end of the stream, or the current position. The selection is done through the ``ios::seek_dir'' enum, which has these three values: ``ios::beg'', ``ios::end'' and ``ios::cur.'' To make the next write occur on the very first byte of the stream, call ``os.seekp(0,ios::beg),'' where ``os'' is some random access ``ostream.''
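A small sketch of both kinds of seeking follows. It assumes a current compiler with the standard ``<fstream>'' header; ``fileSize'' and ``patchFirstByte'' are names made up for the example. The first function subtracts two ``streampos'' values to get a ``streamoff'', the second seeks relative to the beginning before writing.

```cpp
#include <fstream>

// Size of a file in bytes, computed as the difference between
// two stream positions. Returns -1 if the file cannot be opened.
long fileSize(const char* name)
{
  std::ifstream in(name, std::ios::in | std::ios::binary);
  if (!in) return -1;

  std::streampos begin = in.tellg();  // where we are now: the start
  in.seekg(0, std::ios::end);         // zero bytes from the end
  std::streampos end = in.tellg();
  std::streamoff size = end - begin;  // subtraction yields a streamoff
  return static_cast<long>(size);
}

// Overwrite the very first byte of an existing file.
bool patchFirstByte(const char* name, char c)
{
  std::fstream f(name, std::ios::in | std::ios::out | std::ios::binary);
  if (!f) return false;

  f.seekp(0, std::ios::beg);          // next write hits the first byte
  f.put(c);
  return f.good();
}
```

Patching the first byte does not change the size of the file, since the write lands on a byte that already exists.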

In any reasonable implementation, the seek member functions use lazy evaluation. That is, when you call any of the seek member functions, the only thing that happens is that some member variable in the stream object changes value. It's not until you actually read or write that something truly happens on disk (or wherever the stream data resides.)

A stream array, for really huge amounts of data

Suppose we have a need to access enormous amounts of simple data, say 10 million floating point numbers. It's not a very good idea to just allocate that much memory, at least not on my machine with a measly 64 MB of RAM. It'll not just make this application crawl, but probably the whole system, due to excessive paging. Instead, let's use a file to access the data. This makes for slow access, for sure, but nothing else will suffer.

Here's the idea. It must be possible to use the array with any data type, including user defined classes. Its usage must resemble that of real arrays as much as possible, but extra functionality that arrays do not have, such as asking for the number of elements in it, is OK. There must be a type, resembling pointers to arrays, that can be used for traversing it. We do not want the size of the array to be part of its type (if you've programmed in Pascal, you know why.) Beyond what arrays offer, we want some measures of safety from stupid mistakes, such as addressing beyond the range of the array, and also from errors that arrays cannot have (disk full, cannot create file, disk corruption, etc.) We also want to say that an array is just a part of a file and not necessarily an entire file. This would allow the user to create several arrays within the same file. To prevent this article from growing way too long, quite a few of the above listed features will be left for next month. The things to cover this month are: An array of built-in fundamental types only, which lacks pointers and is limited to one file per array. We'll also skip error handling for now (you can add it as an exercise, I'll raise some interesting questions along the way,) and add that too next month.

First of all, the array must be a template, so it can be used to store arbitrary types. Since we do not want the size to be part of the type signature, the size is not a template parameter, but a parameter for the constructor. Of course, we cannot have the entire array duplicated in memory (then all the benefits will be lost,) instead we will search for the data on file every time it's needed.

Here's the outline for the class.


  template <class T>
  class FileArray
  {
  public:
    FileArray(const char* name, size_t elements);
    // Create a new array and set the size.

    FileArray(const char* name);
    // Create an array from an existing file, get the
    // size from the file.

    // use compiler defined destructor.

    T operator[](size_t index) const;
    ??? operator[](size_t index);

    size_t size() const;
  private:
    // don't want these to be used.
    FileArray(const FileArray&);
    FileArray& operator=(const FileArray&);
    ...
  };
As can be expected, ``operator[]'' can be overloaded, which is handy for providing a familiar syntax. However, already here we see a problem. What's the non-const ``operator[]'' to return? To see why this is a problem, ask yourself what you want ``operator[]'' to do. I want ``operator[]'' to do two things, depending on where it's used; like this:

  FileArray<int> x;
  ...
  x[5] = 4;
  int y = x[3];
When ``operator[]'' is on the left hand side of an assignment, I want to write data to the file, and if it's on the right hand side of an assignment, I want to read data from the file. Ouch.

Warning: I've often seen it suggested that the solution is to have the const version read and return a value, and the non-const version write a value. As slick as it would be, it's wrong and it won't work. The const version is called for const array objects, the non-const version for non-const array objects.
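If you're not convinced, here's a tiny experiment. ``Probe'' is a hypothetical class, invented for this illustration, that does nothing but count which overload gets called; the counters are plain global variables to keep the sketch short.

```cpp
#include <cstddef>

int constCalls = 0;     // calls to the const operator[]
int nonConstCalls = 0;  // calls to the non-const operator[]

// Hypothetical class that only records which overload was chosen.
class Probe
{
public:
  Probe() : value(0) {}

  int operator[](std::size_t) const { ++constCalls; return value; }
  int& operator[](std::size_t) { ++nonConstCalls; return value; }
private:
  int value;
};
```

With a non-const ``Probe p;'', both ``p[5] = 4;'' and ``int y = p[3];'' increase ``nonConstCalls.'' The const version is selected only when the access goes through a const object or a const reference, whichever side of the assignment it appears on.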

Instead what we have to do is to pull a little trick. The trick is, as so often in computer science, to add another level of indirection. This is done by not taking care of the problem in ``operator[],'' but rather letting it return a type which does the job. We create a class template, looking like this:


  template <class T>
  class FileArrayProxy
  {
  public:
    FileArrayProxy<T>& operator=(const T&); // write value
    operator T() const; // read a value

    // compiler generated destructor

    FileArrayProxy<T>&
    operator=(const FileArrayProxy<T>& p);

    FileArrayProxy(const FileArrayProxy<T>&);
  private:
    ... all other constructors.
    FileArray<T>& array;
    const size_t index;
  };
We have to make sure, of course, that there are member functions in ``FileArray<T>'' that can read and write (and of course, those functions are not the ``operator[],'' since then we'd have an infinite recursion.) All constructors, except for the copy constructor, are made private to prevent users from creating objects of the class whenever they want to. After all, this class is a helper for the array only, and is not intended to ever even be seen. This, however, poses a problem; with the constructors being private, how can ``FileArray<T>::operator[]()'' create and return one?

Enter another C++ feature: friends. Friends are a way of breaking encapsulation. What?!?! Yes, what you read is right. Friends break encapsulation, and (this is the real shock) that's a good thing! Friends break encapsulation in a controlled way. We can, in ``FileArrayProxy<T>,'' declare ``FileArray<T>'' to be a friend. This means that ``FileArray<T>'' can access everything in ``FileArrayProxy<T>,'' including things that are declared private. Paradoxically, violating encapsulation with friendship strengthens encapsulation when done right. The only alternative here to using friendship is to make the constructors public, but then anyone can create objects of this class, and that's what we wanted to prevent. Friends are useful for strong encapsulation, but it's important to use them only in situations where two (or more) classes are so tightly bound to one another that they're meaningless on their own. This is the case with ``FileArrayProxy<T>.'' It's meaningless without ``FileArray<T>,'' thus ``FileArray<T>'' is declared a friend of ``FileArrayProxy<T>.'' The declaration then becomes:


  template <class T>
  class FileArrayProxy
  {
  public:
    FileArrayProxy& operator=(const T&); // write a value
    operator T() const; // read a value
    // compiler generated destructor

    FileArrayProxy<T>& // read from p and then write
    operator=(const FileArrayProxy<T>& p);

    // compiler generated copy constructor
  private:
    FileArrayProxy(FileArray<T>& fa, size_t n);
    // for use by FileArray<T> only.

    FileArray<T>& array;
    const size_t index;

    friend class FileArray<T>;
  };
We can now start implementing the array. Some problems still lie ahead, but I'll mention them as we go.

  // farray.hpp
  #ifndef FARRAY_HPP
  #define FARRAY_HPP

  #include <fstream.h>
  #include <stdlib.h> // size_t

  template <class T> class FileArrayProxy;
  // Forward declaration necessary, since FileArray<T>
  // returns the type.

  template <class T> class FileArray
  {
  public:
    FileArray(const char* name, size_t size); // create
    FileArray(const char* name); // use existing array
    T operator[](size_t size) const;
    FileArrayProxy<T> operator[](size_t size);
    size_t size() const;
  private:
    FileArray(const FileArray<T>&); // illegal
    FileArray<T>& operator=(const FileArray<T>&);

    // for use by FileArrayProxy<T>
    T readElement(size_t index) const;
    void storeElement(size_t index, const T&);

    fstream stream;
    size_t max_size;

    friend class FileArrayProxy<T>;
  };
The functions for reading and writing are made private members of the array, since they're not for anyone to use. Again, we need to make use of friendship to grant ``FileArrayProxy<T>'' the right to access them. Let's define them right away:

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    T t;
    stream.seekg(sizeof(max_size)+index*sizeof(T));
    // what if seek fails?

    stream.read((char*)&t, sizeof(t));
    // what if read fails?

    return t;
  }
All of a sudden, we face an unexpected problem. The above code won't compile. The member function is declared ``const'', and as such, all member variables are ``const'', and neither ``seekg'' nor ``read'' is allowed on constant streams. The problem is one of distinguishing between logical constness and bitwise constness. This member function is logically ``const'', as it does not alter the array in any way. However, it is not bitwise const; the stream member changes. C++ cannot understand logical constness, only bitwise constness. If you have a modern compiler, the solution is very simple; you declare ``stream'' to be ``mutable fstream stream;'' in the class definition. I, however, have a very old compiler, so I have to find a different solution. This solution is, yet again, one of adding another level of indirection. I can have a pointer to an ``fstream.'' When in a ``const'' member function, the pointer is also ``const'', but not what it points to (there's a difference between a constant pointer, and a pointer to a constant.) The only reasonable way to achieve this is to store the stream object on the heap, and in doing this I introduce a possible danger; what if I forget to delete the pointer? Sure, I'll delete it in the destructor, but what if an exception is thrown already in the constructor? Then the destructor will never execute (since no object has been created that must be destroyed.)
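For those of you with a compiler that does support ``mutable'', here's a minimal sketch of what that solution looks like. ``CharSource'' is a made-up class for the illustration, and an ``istringstream'' from the standard ``<sstream>'' header stands in for the file stream to keep the example self-contained.

```cpp
#include <sstream>
#include <string>

// Hypothetical class: random access to the characters of a stream.
class CharSource
{
public:
  explicit CharSource(const std::string& data) : stream(data) {}

  // Logically const: the data is not altered, even though
  // seeking and reading change the state of the stream member.
  char at(std::streamoff index) const
  {
    stream.clear();
    stream.seekg(index, std::ios::beg);
    return static_cast<char>(stream.get());
  }
private:
  mutable std::istringstream stream; // may change in const members
};
```

Without the ``mutable'' keyword, the calls to ``clear'', ``seekg'' and ``get'' inside ``at'' would be rejected by the compiler, for the same reason as in ``readElement'' above.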

Do you remember the ``thing to think of until this month?'' The clues were, destructor, pointer and delete. Thought of anything? What about this extremely simple class template?


  template <class T>
  class ptr
  {
  public:
    ptr(T* pt);
    ~ptr();

    T& operator*() const;
  private:
    ptr(const ptr<T>&); // we don't want copying
    ptr<T>& operator=(const ptr<T>&); // nor assignment

    T* p;
  };

  template <class T>
  ptr<T>::ptr(T* pt)
    : p(pt)
  {
  }

  template <class T>
  ptr<T>::~ptr()
  {
    delete p;
  }

  template <class T>
  T& ptr<T>::operator*() const
  {
    return *p;
  }
This is probably the simplest possible member of the family known as ``smart pointers.'' I'll probably devote a whole article exclusively to these some time. Whenever an object of this type is destroyed, whatever it points to is deleted. The only thing we have to keep in mind when using it is to make sure that whatever we feed it is allocated on the heap (and is not an array) so it can be deleted with operator delete.
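A short sketch shows the clean-up at work. The ``ptr'' template is repeated in compact form so the example stands on its own; ``Tracker'' and ``aliveInsideScope'' are names invented for the illustration, and ``Tracker'' simply counts how many of its objects exist.

```cpp
template <class T>
class ptr
{
public:
  ptr(T* pt) : p(pt) {}
  ~ptr() { delete p; }
  T& operator*() const { return *p; }
private:
  ptr(const ptr<T>&);               // we don't want copying
  ptr<T>& operator=(const ptr<T>&); // nor assignment
  T* p;
};

// Hypothetical helper type that counts its live objects.
struct Tracker
{
  static int alive;
  Tracker() : value(0) { ++alive; }
  ~Tracker() { --alive; }
  int value;
};
int Tracker::alive = 0;

// Returns the number of live Trackers while one is held by a ptr.
int aliveInsideScope()
{
  const ptr<Tracker> t(new Tracker);
  (*t).value = 7;        // fine: operator* returns T&, not const T&
  return Tracker::alive; // the Tracker is deleted when t is destroyed
}
```

Inside the function one ``Tracker'' is alive; after the function returns, the count is back to zero, without any explicit ``delete'' in sight.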

This solves our problem nicely. When this thing is a constant, the thing pointed to still isn't a constant (look at the return type for ``operator*,'' it's a ``T&,'' not a ``const T&.'') So, instead of using an ``fstream'' member variable called ``stream,'' let's use a ``ptr<fstream>'' member named ``pstream.'' With this change, ``readElement'' must be slightly rewritten:


  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    (*pstream).seekg(sizeof(max_size)+index*sizeof(T));
    // what if seek fails?

    T t;
    (*pstream).read((char*)&t, sizeof(t));
    // what if read fails?

    return t;
  }
I bet the change wasn't too horrifying.

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& elem)
  {
    (*pstream).seekp(sizeof(max_size)+index*sizeof(T),
                     ios::beg);
    // what if seek fails?

    (*pstream).write((const char*)&elem, sizeof(elem));
    // what if write failed?
  }
Now for the constructors:

  template <class T>
  FileArray<T>::FileArray(const char* name, size_t size)
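    // ios::trunc is included because, on a standard library,
    // ios::in|ios::out alone does not create a missing file.
    : pstream(new fstream(name, ios::in|ios::out|ios::trunc|ios::binary)),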
      max_size(size)
  {
    // what if the file could not be opened?

    // store the size on file.
    (*pstream).write((const char*)&max_size,
                     sizeof(max_size));
    // what if write failed?

    // We want to write a value (any value) at the end
    // to make sure there is enough space on disk.

    T t;
    storeElement(max_size-1,t);
    // What if this fails?
  }

  template <class T>
  FileArray<T>::FileArray(const char* name)
    : pstream(new fstream(name, ios::in|ios::out|ios::binary)),
      max_size(0)
  {
    // get the size from file.
    (*pstream).read((char*)&max_size,
                    sizeof(max_size));
    // what if read fails or max_size == 0?
    // How do we know the file is even an array?
  }
The access members:

  template <class T>
  T FileArray<T>::operator[](size_t size) const
  {
    // what if size >= max_size?
    return readElement(size);
    // What if read failed because of a disk error?
  }

  template <class T>
  FileArrayProxy<T> FileArray<T>::operator[](size_t size)
  {
    // what if size >= max_size?
    return FileArrayProxy<T>(*this , size);
  }
Well, this wasn't too much work, but then, as can be seen by the comments, there's absolutely no error handling here. I've left out the ``size'' member function, since its implementation is trivial.

Next in line is ``FileArrayProxy<T>.''


  template <class T>
  class FileArrayProxy
  {
  public:
    // copy constructor generated by compiler
    operator T() const;
    FileArrayProxy<T>& operator=(const T& t);
    FileArrayProxy<T>&
      operator=(const FileArrayProxy<T>& p);
    // read from one array and write to the other.
  private:
    FileArrayProxy(FileArray<T>& f, size_t i);

    size_t index;
    FileArray<T>& fa;

    friend class FileArray<T>;
  };
The copy constructor is needed, since the return value must be copied (return from ``FileArray<T>::operator[],'') and it must be public for this to succeed. The one that the compiler generates for us, which just copies all member variables, will do just fine. The compiler doesn't generate a default constructor (one which accepts no parameters,) since we have explicitly defined a constructor. The assignment operator is necessary, however. Sure, the compiler will try to generate one for us if we don't, but it will fail, since references (``fa'') can't be rebound. Note, however, that if we instead of a reference had used a pointer, it would succeed, but the result would *NOT* be what we want. What it would do is to copy the member variables, but what we want to do is to read data from one array and write it to another.

Now for the implementation:


  template <class T>
  FileArrayProxy<T>::FileArrayProxy(FileArray<T>& f,
                                    size_t i)
    : index(i),
      fa(f)
  {
  }

  template <class T>
  FileArrayProxy<T>::operator T() const
  {
    return fa.readElement(index);
  }

  template <class T>
  FileArrayProxy<T>&
  FileArrayProxy<T>::operator=(const T& t)
  {
    fa.storeElement(index,t);
    return *this;
  }

  template <class T>
  FileArrayProxy<T>& FileArrayProxy<T>::operator=(
    const FileArrayProxy<T>& p
  )
  {
    fa.storeElement(index,p);
    return *this;
  }

  #endif // FARRAY_HPP
That was it. Can you see what happens with the proxy? Let's analyze a small code snippet:

  1 FileArray<int> arr("file",10);
  2 arr[2]=0;
  3 int x=arr[2];
  4 arr[0]=arr[2];
On line 2, ``arr.operator[](2)'' is called, which creates a ``FileArrayProxy<int>'' from ``arr'' with the index 2. The object, which is a temporary and does not have a name, has as its member ``fa'' a reference to ``arr'', and as its member ``index'' the value 2. On this temporary object, ``operator=(int)'' is executed. This operator in turn calls ``fa.storeElement(index, t),'' where ``index'' is still 2 and the value of ``t'' is 0. Thus, ``arr[2]=0'' ends up as ``arr.storeElement(2,0)''. On line 3, a similar proxy is created through the call to ``operator[](2).'' This time, however, the ``operator int() const'' is called. This member function in turn calls ``fa.readElement(2)'' and returns its value, thus ``int x=arr[2]'' translates to ``int x=arr.readElement(2).'' On line 4, finally, ``arr[0]=arr[2]'' creates two temporary proxies, one referring to index 0, and one to index 2. The assignment operator is called, which in turn calls ``fa.storeElement(0,p)'', where p is the temporary proxy referring to element 2. Since ``storeElement'' wants an ``int,'' ``p.operator int() const'' is called, which calls ``arr.readElement(2).'' In other words ``arr[0] = arr[2]'' generates the code ``arr.storeElement(0, arr.readElement(2)).''
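If you want to watch the translation happen, here is a stripped-down sketch of the same proxy technique. It is not the file array itself: ``CountingArray'' and ``CountingProxy'' are hypothetical classes, made up for this illustration, which keep ``int'' values in a ``std::vector'' (assuming a current compiler with the standard headers) and count the calls to ``readElement'' and ``storeElement.''

```cpp
#include <cstddef>
#include <vector>

class CountingArray;

class CountingProxy
{
public:
  CountingProxy(const CountingProxy& p) : arr(p.arr), index(p.index) {}
  operator int() const;                             // read a value
  CountingProxy& operator=(int v);                  // write a value
  CountingProxy& operator=(const CountingProxy& p); // read p, then write
private:
  CountingProxy(CountingArray& a, std::size_t i) : arr(a), index(i) {}
  CountingArray& arr;
  std::size_t index;
  friend class CountingArray;
};

class CountingArray
{
public:
  explicit CountingArray(std::size_t n) : reads(0), writes(0), data(n, 0) {}
  CountingProxy operator[](std::size_t i) { return CountingProxy(*this, i); }
  int reads;   // number of calls to readElement
  int writes;  // number of calls to storeElement
private:
  int readElement(std::size_t i) { ++reads; return data[i]; }
  void storeElement(std::size_t i, int v) { ++writes; data[i] = v; }
  std::vector<int> data;
  friend class CountingProxy;
};

CountingProxy::operator int() const
{
  return arr.readElement(index);
}

CountingProxy& CountingProxy::operator=(int v)
{
  arr.storeElement(index, v);
  return *this;
}

CountingProxy& CountingProxy::operator=(const CountingProxy& p)
{
  arr.storeElement(index, p); // p is converted through operator int()
  return *this;
}
```

After ``arr[2]=7'', ``int x=arr[2]'' and ``arr[0]=arr[2]'' on a ``CountingArray'', the counters show two writes and two reads, which matches the analysis above: one write for the plain assignment, one read for the conversion, and one read plus one write for the proxy to proxy assignment.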

As you can see, the proxies don't add any new functionality; they're just syntactic sugar, albeit very useful. With them we can treat our file arrays very much like any kind of array. There's one thing we cannot do, though:


  int* p = &arr[2];
  int& x = arr[3];
  *p=2;
  x=5;
With ordinary arrays, the above would be legal and have well-defined semantics, assigning arr[2] the value 2 and arr[3] the value 5. With our file array we cannot do this, but unfortunately the compiler does not prevent it (a decent compiler will warn that we're binding a reference or pointer to a temporary.) We'll mend that hole next month (think about how) and also add iterators, which will allow us to use the file arrays almost exactly like real ones.

In memory data formatting

A frequently faced problem is that of converting strings representing some data to that data, or vice versa. With the aid of ``istrstream'', ``ostrstream'' and ``strstream'', this is easy. For example, say we have a string containing digits, and want those digits as an integer. The thing to do is to create an ``istrstream'' object from the string. An example will explain:


  const char* s = "23542";
  istrstream is(s);
  int x;
  is >> x;
After executing this snippet, ``x'' will have the value 23542. ``istrstream'' isn't much more exciting than that. ``ostrstream'', on the other hand, has more to offer. There are two alternative uses for ``ostrstream'': one where you have an array you want to store data in, and one where you want the ``ostrstream'' to create it for you, as needed (usually because you have no idea what size the buffer must have.) The former usage is like this:

  char buffer[24];
  ostrstream os(buffer, sizeof(buffer));
  double x=23.34;
  os << "x=" << x << ends;
The variable ``buffer'' will contain the string ``x=23.34'' after this snippet. The stream manipulator ``ends'' zero terminates the buffer. Zero termination is not done by default, since the stream cannot know where to put it, and besides you might not always want it.

The other variant, where you don't know how large a buffer you will need, is generally more useful (I think.)


  ostrstream os;
  double x=23.34, y=34.45;
  os << x << '*' << y << '=' << x*y << ends;
  const char* p = os.str();
  const size_t length=os.pcount();

  // work with p and length.
  os.freeze(0); // unfreeze, so that the stream can release the memory.
I think the example pretty much shows what this kind of usage does. The member function ``str'' returns a pointer to the internal buffer (which is then frozen, that is, the stream guarantees that it will neither deallocate the buffer nor overwrite it. Attempts to alter the stream while frozen will fail.) ``pcount'' returns the number of characters stored in the buffer. Lastly, ``freeze'' can either freeze the buffer or ``unfreeze'' it. The latter is done by giving it a parameter with the value 0. I find this interface to be unfortunate. It's so easy to forget to release the buffer (by simply forgetting to call ``os.freeze(0)'') and that leads to a memory leak.

``strstream'', finally, is, just like ``fstream'', the combined read/write stream.

The string streams can be found in the header <strstream.h> (or for some compilers <strstrea.h>.)

Standards update

With the C++ standard, a lot of things have changed regarding streams. As I already mentioned last month, the headers are actually <iostream> and <fstream>, and the names are std::istream, std::ostream, etc. The streams are templatized too, which makes life easier in some ways and harder in others. The underlying type for std::ostream is:


  // in namespace std
  template <class charT,
            class traits = char_traits<charT> >
  class basic_ostream;
``charT'' is the basic character type for the stream. For ``ostream'' this is ``char'' (ostream is actually a typedef.) There's another typedef, ``std::wostream'', where the underlying type is ``wchar_t'', whose size is implementation defined (commonly 16 or 32 bits.) The class template ``char_traits'' is a traits class which holds the type used for EOF, the value of EOF, and some other housekeeping things.
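One practical consequence of the templatized streams is that a function can be written once against ``std::basic_ostream'' and then be used with both ``char'' and ``wchar_t'' streams. The following is a minimal sketch of that idea; ``writeBracketed'' and ``demoBracketed'' are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes `value` between a pair of brackets on any
// standard output stream, whatever its character type is.
template <class charT, class traits, class T>
std::basic_ostream<charT, traits>&
writeBracketed(std::basic_ostream<charT, traits>& os, const T& value)
{
  // widen() converts a plain char to the stream's own character type.
  return os << os.widen('[') << value << os.widen(']');
}

// Usage: the same template serves both narrow and wide streams.
void demoBracketed()
{
  std::ostringstream narrow;  // a stream of char
  writeBracketed(narrow, 42);
  assert(narrow.str() == "[42]");

  std::wostringstream wide;   // a stream of wchar_t
  writeBracketed(wide, 42);
  assert(wide.str() == L"[42]");
}
```

The character type and the traits class are deduced from the stream argument, so the caller never has to spell them out.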

Why the standard has removed the file stream open modes ios::create and ios::nocreate is beyond me, as they're extremely useful.

Casting is ugly, and it's hard to see in large code blocks. The standard has four new cast operators that are highly visible. They are (in approximate order of increasing danger) dynamic_cast<T>, static_cast<T>, const_cast<T> and reinterpret_cast<T>. In the binary streaming seen in this article, reinterpret_cast<T> would be used, as a way of saying, ``Yeah, I know I'm violating type safety, but hey, I know what I'm doing, OK?'' The good thing about it is that it's so visible that anyone doubting it can easily spot the dangerous lines and have a careful look. The syntax is:

  os.write(reinterpret_cast<const char*>(&variable), sizeof(variable));
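To show the cast in context, here is a small sketch of binary writing and reading with ``reinterpret_cast''. The helper names ``writeRaw'', ``readRaw'' and ``demoRaw'' are hypothetical, and a ``std::stringstream'' is used instead of a file only so that the example does not depend on the file system. As with the binary streaming earlier in the article, this is only meaningful for types that can safely be copied byte by byte.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: writes the raw bytes of `value` to the stream.
template <class T>
void writeRaw(std::ostream& os, const T& value)
{
  os.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Hypothetical helper: reads sizeof(T) raw bytes into `value`.
// Returns true only if all the bytes could be read.
template <class T>
bool readRaw(std::istream& is, T& value)
{
  is.read(reinterpret_cast<char*>(&value), sizeof(value));
  return is.gcount() == static_cast<std::streamsize>(sizeof(value));
}

// Usage: write two values, then read them back in the same order.
void demoRaw()
{
  std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
  writeRaw(ss, 23.34);
  writeRaw(ss, 42);

  double d = 0;
  int i = 0;
  assert(readRaw(ss, d) && d == 23.34);
  assert(readRaw(ss, i) && i == 42);
}
```

Keeping the casts inside two small functions means that the dangerous lines are collected in one place, which is in the spirit of making them easy to spot.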

Finally, the generally useful strstreams have been replaced by ``std::istringstream'', ``std::ostringstream'' and ``std::stringstream'' (plus wide variants, std::wistringstream, etc.) defined in the header <sstream>. They do not operate on ``char*'', but on strings (there is a string class, or again, rather a string class template, where the most important template parameter is the underlying character type.) ``std::ostringstream'' does not suffer from the freeze problem that ``ostrstream'' does.
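As a rough illustration, the in-memory formatting examples from earlier in the article could look like this with the standard string streams. The function names ``toInt'', ``formatProduct'' and ``demoStringStreams'' are hypothetical and only serve the sketch.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reads an int from the beginning of `text`.
// Returns false if the text does not start with a number.
bool toInt(const std::string& text, int& result)
{
  std::istringstream is(text);
  return static_cast<bool>(is >> result);
}

// Hypothetical helper: formats "x*y=product" into a std::string.
std::string formatProduct(double x, double y)
{
  std::ostringstream os;
  os << x << '*' << y << '=' << x * y;
  return os.str(); // a copy of the contents; nothing to freeze or release
}

// Usage of the two helpers.
void demoStringStreams()
{
  int x = 0;
  assert(toInt("23542", x) && x == 23542);
  assert(!toInt("abc", x));
  assert(formatProduct(1.5, 2) == "1.5*2=3");
}
```

Since ``str'' returns a ``std::string'' by value, there is no ``ends'' to remember and no buffer whose ownership has to be handed back to the stream.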

Recap

The news this month was:

  • streams dealing with files, or in-memory formatting, are used just the same way as the familiar ``cout'' and ``cin,'' which saves both learning and coding (an ``operator<<'' or ``operator>>'' that is already written can be used for all kinds of streams.)
  • streams can be used for binary, unformatted I/O too. This normally doesn't make sense for ``cout'' and ``cin'' or in-memory formatting (as the name implies,) but it's often useful when dealing with files.
  • It is possible to move around in streams, at least file streams and in-memory formatting streams. It's generally not possible to move around in ``cin'' and ``cout.''
  • proxy classes can be used to differentiate read and write operations for ``operator[]'' (the construction can of course be used elsewhere too, but it's most useful in this case.)
  • friends break encapsulation in a way that, when done right, strengthens encapsulation.
  • there's a difference between logical const and bitwise const, but the C++ compiler doesn't know and always assumes bitwise const.
  • truly simple smart pointers can save some memory management housekeeping, and also be used as a workaround for compilers lacking ``mutable'' (i.e. the way of declaring a variable as non-const for const members, in other words, how to differentiate between logical and bitwise const.)
  • streams can also be used for in-memory formatting of data.

Exercises

  • Improve the file array such that it accepts a ``stream&'' instead of a file name, and allows for several arrays in the same file.
  • Improve the proxy such that ``int& x=arr[2]'' and ``int* p=&arr[1]'' become illegal.
  • Add a constructor to the array that accepts only a ``size_t'' describing the size of the array, which creates a temporary file and removes it in its destructor.
  • What happens if we instantiate ``FileArray'' with a user defined type? Is it always desirable? If not, what is desirable? If you cannot define what's desirable, how can instantiation with user defined types be banned?
  • How can you, using the stream interface, calculate the size of a file?

Coming up

Next month will be devoted to improving the ``FileArray.'' We'll have iterators, allow arbitrary types, add error handling and more. I assume I won't need to tell you that it'll be possible to use the ``FileArray'' just like ordinary arrays with generic programming, i.e. we can have the exact same source code for dealing with both!

/Björn

 
