An Introduction to C++ Programming - Part 7/13

From EDM2
Jump to: navigation, search
An Introduction to C++ Programming / Part
1 2 3 4 5 6 7 8 9 10 11 12 13

Written by Björn Fahller

Iterators

Introduction

I admit it. I've had very little inspiration for writing this month. Maybe it's the winter darkness, or the to be switch of jobs. However, instead of leaving you in the dark for a month, here's some food for thought, even though it's short and not the planned details on inheritance.

Example:

Study this function template:


  template <class IN, class OUT>
  OUT copy(IN begin, IN end, OUT dest)
  {
    while (begin != end) {
      *dest = *begin;
      ++begin;
      ++dest;
    }
    return dest;
  }

What does it mean? Let us first have a look at the requirements on the types for IN and OUT.

IN: Must be copy-able (since the parameters are passed by value, not by reference.) Must be comparable with operator !=. Must have an operator*, whose return value must be assignable. Operator ++ (prefix) must be allowed.

OUT: Must be copy-able, both for the parameter and the return value. Must have operator*, whose return value must be something which can be assigned to the result of operator* for IN. Operator++ (prefix) must be allowed.

One match for "IN" and "OUT" is obvious: a pointer in an array. Let's say that the types are "T1*" and "T2*", then this is legal for all types "T1" that can be assigned (or implicitly converted to, from) a type "T2".

For example:


  int iarr[]={ 12,23,34,45};
  size_t isize=sizeof(iarr)/sizeof(iarr[0]);
  double darr[isize];
  copy(iarr,iarr+isize,darr);

So, what does the above do? The name no doubt gives a hint, but let's analyze it. At the call-site, the function template is expanded to a template function, as follows:


  double copy<int*,double*>(int* begin, int* end, double* dest)
  {
    while (begin != end) {
      *dest = *begin;
      ++begin;
      ++dest;
    }
    return dest;
  }

What this function does is to assign the value pointed to by "dest" the value of the dereferenced pointer "begin", as long as "begin" does not equal "end", and then increment "begin" and "dest."

This puts a requirement on "begin" and "end" by using operator++ (prefix only) on "begin", it must be possible to reach a state where "begin" does not compare unequal to "end." For example, "begin" might be the beginning of an array, and "end" the end of one. Note, however, that when "begin" and "end" are equal, no copying is done. That is, "copy(a,a,b)" does nothing at all (except return "b"). This means that to copy an entire array, like in the example above, "end" must point one past the last element of the array, like this:

                 +----+-----+-----+-----+----+
  primes_lt_10 = | 2  |  3  |  5  |  7  |XXXX|
                 +----+-----+-----+-----+----+
                   ^                      ^
                   |                      |
                 begin                   end (points to non-existing
                                              element one past the last
                                              one.)

Fortunate as it is, this is legal in C and C++. It illegal to dereference the "one-past-the-end" pointer, but the value is legal, and decrementing it will make it point to the last element (as opposed to making it point n, where n>1, elements past the end, which yields "undefined behaviour", i.e. all bets are off.)

This is useful, we can now copy arrays (or parts of arrays) of any type to arrays (or parts there of) of any type, if the source type can be implicitly converted to the destination type, by using the copy function template. Very useful.

However, the fun has only begun. The real joy begins when we realize we can write our own types that behaves the same way.

Other uses

Let's assume we want to print the contents of an array. How can we do this? Of course we can, as usual, use a loop over all elements and print them, but I can assure you, we can use the copy function template to do it. How?

Printing an array (actually, the values in an array) means copying the values from the array to the screen, through some conversion (the output formatting.) The problem thus becomes, how do we make a type that does the necessary conversion, prints on the screen, and conforms to the requirements stated for the template parameter "OUT?"

The secrets seems to lie in the lines "*dest = *begin" and "++dest". As I see it, there are two alternatives.

Either "*dest = *begin" prints the value of "*begin" on the screen, and "++dest" does nothing at all, or "*dest = *begin" makes our variable "dest" remember the value to print and on "++dest" it does the printing. To show you how this can be done, I will do the former. You can do the latter as an exercise.

Of course, it is now necessary to see how some more operators can be overloaded; operator* and operator++. Of these, operator* is very straight- forward, while operator++ requires some thought since there are two operator++, one postfix and one prefix (i.e. there's a difference between "dest++" and "++dest.")

Assuming a class C, operator ++ is (usually) modeled as follows (and always with these member function signatures:)


  class C
  {
  public:
    ... // misc
    C& operator++(void); // prefix, i.e. ++c;
    C operator++(int);   // postfix, i.e. c++
    ... // other misc.
  };


  C& C::operator++(void)
  {
    ... // whatever's needed to "increment" it.
    return *this;
  }

  C C::operator++(int) // throw away int, it's not used, other than to
  {                    // distinguish between pre- and post-fix ++
    C old_val(*this);  // remember the old value
    ::operator++();    // let the prefix version do the job
    return old_val;    // and return the old value
  }

Needless to say, decrementing is analogous.

Let's make our simple "int_writer" class, which writes integers on standard output (i.e. normally the screen.)


  class int_writer
  {
  public:
    // trust the compiler to generate necessary constructors and
  destructor
    int_writer& operator++();
    int_writer& operator*();
    int_writer& operator=(int i); // does the real writing.
  };

Operator++ we implement to do nothing at all, since it's not used for anything, but operator* and operator= are interesting. If you look at a pointer to T, dereferencing it yields a T. In this case, however, I say that dereferencing an int_writer yields an int_writer. Weird? Well, yes. It's weird, but it makes perfect sense anyway. What do I want to use the result of operator* for? Only for assigning to, and I want the assignment to write something on standard output, right? If I make operator* return the very object for which operator* was called on, we can use operator= for that class to do the writing. If we made operator* return some other type, we need to create two types, one int_writer, and one class whose only job in life is to be assignable by int, and which very assignment actually means writing. Perhaps the latter is purer, but the former is so much less work. Here's what the implementation looks like:


  int_writer& int_writer::operator++()
  {
    return *this; // do nothing
  }

  int_writer& int_writer::operator*()
  {
    return *this; // do nothing
  }

  int_writer& int_writer::operator=(int i)
  {
    cout << i << endl;
    return *this;
  }

This means that if "dest" is of type int_writer, the line "*dest = *begin", can be expanded to:


  dest.operator*().operator=(*begin);

Since the return value of "dest.operator*()" is a reference to "dest" itself, the following "operator=(*begin)" means "dest.operator=(*begin)", and if the type of "*begin" can be implicitly converted to "int", we're writing something. Cool eh? Here's all it takes to write the contents of the prime number array:


  copy(iarr,iarr+isize,int_writer());

Of course, the name "int_writer" is a dead giveaway for a class template, isn't it? Why limit it to integers only?


  template <class T>
  class writer
  {
  public:
    // trust the compiler to generate necessary constructors
    // and destructor
    writer<T>& operator++();
    writer<T>& operator*();
    writer<T>& operator=(const T&); // does the real writing.
  };

  template <class T>
  writer<T>& writer<T>::operator++()
  {
    return *this;
  }

  template <class T>
  writer<T>& writer<T>::operator*()
  {
    return *this;
  }

  template <class T>
  writer<T>& writer<T>::operator=(const T& t)
  {
    cout << t << endl;
  }

I've changed the signature for "operator=" to accept a const reference instead of a value, since T might be a type for which copying is expensive.

The prime number copying now becomes:


  copy(iarr,iarr+isize,writer<int>());

With this template, yet another requirement surfaced; for writer<T>, T must be writable through operator<<. Of course, that's no surprise, how else would you write it?

As a last example, let's have a look at the source side, the types for "IN". Can we create a type matching the requirements for "IN", such that a copy would read values from standard input (normally the keyboard?)

The requirements for "IN" are a little bit more complicated than those for "OUT." It must be not-equal comparable, it must be possible to reach one value from another, through operator++, such that operator!= yields true, and operator* must return a value.

This requires some thought, especially on the reachability issue. To make this example simple, I propose that we can create a "reader<T>" with a number, and the number is the amount of T's to read from standard input. For every operator++, a new T is read, and the number of reads remaining is decremented. It must also be possible to create an "end" reader<T>, and we can use the parameter-less constructor for that. Here's how reader<T> might look like:


  template <class T>
  class reader
  {
  public:
    reader(unsigned count=0);
    reader<T>& operator++();
    const T& operator*() const;
    int operator!=(const reader<T>& r) const;
  private:
    unsigned remaining;
    T t;
  };

  template <class T>
  reader<T>::reader(unsigned count)
   : remaining(count) // the number of remaining reads, 0 for
  {                   // the parameter-less constructor.
  }

  template <class T>
  reader<T>& reader<T>::operator++()
  {
    if (remaining > 0 )
      cin >> t; // read a new value only if
                // there are values to read.
    return *this;
  }

  template <class T>
  const T& reader<T>::operator*() const
  {
    return t; // return the last read value.
  }

  template <class T>
  int reader<T>::operator!=(const reader<T>& r) const
  {
    return r.remaining != 0 || remaining != 0;
  }

The last one's perhaps debatable; I've decided that operator != is really only useful for comparing with the end, i.e. it will only return false if both sides have reached the end (i.e. no remaining reads) state.

However, this is mighty neat:


  const unsigned size=5;
  float array[size];

  // read 5 integers from standard input and
  // store in our float array.
  copy(reader<int>(size),reader<int>(),array);

  // print the read values as unsigned long's
  copy(array,array+size,writer<unsigned long>());

Conclusion

What you've seen here is, most probably your first ever encounter with, generic programming. The template parameters "IN" and "OUT" (from "copy") are called "iterators," or "input iterators" and "output iterators" to be more specific. Anything that behaves like an input iterator, can be used as one, and likewise for output iterators. We can write input iterators for data base access, enumerating files in a directory, the series of prime numbers or whatever you want to get values from. We can write output iterators to store values in a data base, to enter values at the end of a linked list, to send audio data to our sound card, and whatever you need. The function template "copy" will be useful for any combination of the above, as long as the types are convertable from *IN to *OUT. This is *VERY* useful. If you write an algorithm in terms of generic iterators, you can use it with any kind of data source/sink which have iterators that follows your convention.

To make matters even better, this is all part of the now final draft C++ standard. The draft contains a function template "copy", which behaves identically to what I used in this article, and iterators called "input_iterator" and "output_iterator" which behaves very similarly to the "reader" and "writer" class templates.

The standard documents 5 iterator categories, input iterator, output iterator, forward iterator (sort of the combination, allows both read and write,) bidirectional iterator (like forward, but allows moving backwards too,) and lastly random access iterators (iterators which can be incremented/decremented by more than one.) Pointers in arrays are typical bidirectional iterators. If you write your iterators to comply with the requirements of one of these categories, your iterators can be used with any of the algorithms that requires such iterators. Every algorithm you write can be used with any iterators of the type your algorithm requires. That is a major time/code/debug saver.