Encapsulating Extended Attributes - Part 2/2

From EDM2

Written by Björn Fahller

Introduction

In the previous issue, I showed you how extended attributes can be read and written with the OS/2 API. You probably remember that the code was anything but clean, and also that I mentioned that a C++ frame work would make the job easier.

This article, the second and last in the series, is about the design of a C++ frame work called "Your Extended Attribute Helper" (YEA.H for short). It is a free class library that should be available from the usual FTP sites by the time you read this. There is little in here that is specific to extended attributes; the article is more about how to use clever C++ techniques to build good, usable, reusable and extensible class libraries that are simple to maintain. The ideas expressed can be used to solve many problems, not just handling extended attributes.

Recap of Extended Attributes

The key observation, regarding extended attributes from a frame work point of view, is that all attributes, regardless of type, are equal. Reading an extended attribute from a file is exactly the same operation no matter what type the attribute has, and once you have an in-memory representation of an extended attribute writing it to a file is the same operation, regardless of type. This is important, because it means it is possible to have one common base for all extended attributes, and this common base will not only work for the extended attributes coded into the frame work now, but will remain valid for all new kinds of attributes one might like to add in.

Things that are common to all extended attributes are that they are read/written with a name, that they have a type, and that they have a flag (fEA) associated with them.

The type of an extended attribute is not known until the attribute is read into memory. From a user's perspective, this means that it must be possible to ask an attribute what type it has, and then to reinterpret a pointer to it as the correct type. From an implementor's point of view, this means that extended attributes must be allocated on the heap, and that it must be possible to create objects of a class that depends on the type information.

Extended attributes can be read from and written to a file represented by a file name and a file handle.

Goals for the Framework

  • It must be a frame work to which it is easy to add new extended attribute types, and new classes representing the standard extended attributes. Otherwise I will have to implement every single extended attribute type I know of and recompile everything after an addition. It would also become very inflexible. For example, someone implementing a PM extended attribute editor, might find it very convenient to represent the string (EAT_ASCII) attribute with an edit control, instead of an ordinary string class, and I believe a good frame work should allow that. If it was not possible to add new types to the frame work, it would be impossible to use with custom extended attributes.
  • Enjoying strict compile time type checking, I believe it must be possible to allocate an extended attribute object of a known type on the stack, and then to read its contents from a file and a name, accepting an error if the type is not the expected one. For example, the .SUBJECT attribute is supposed to be an ASCII string. If I am going to read the .SUBJECT attribute, there is little use in getting an anonymous extended attribute pointer, which I must check for the type and if proper, cast to the string attribute class. It is better to allocate an object on the stack, read it, and have an error handler called if it wasn't a string.
  • It must be simple to use.
  • It must be simple to extend. Ideally, the only programming needed to add a new type (or a new representation of a standard type) should be to write the methods for reading the data (and only the data, not the name, size, flags and such), and writing the data.
  • It should be usable together with the C++ fstream classes.
  • It must be compile-time type-safe.

Design Ideas

A problem which makes the programming much harder when reading and writing extended attributes is the possibility of reading and writing several extended attributes at the same time. It is, of course, nice to do, but what do you do if the 3rd attribute in a series of 5 is in error? Restricting the handling of extended attributes to only one at a time is a constraint, but really not that bad a constraint. You can prevent the attribute set from being altered while reading/writing by locking the file you are operating on.

Handling the reading and writing, filling in the EAOP2 structure, and passing the data to the specialized attribute implementation in a type-safe way is simple. Just put the code in a generic base class, and pass the data as a strstream.

The problem comes in with creating new objects of a class depending on the type identifier, without the implementor having to write that code, and without needing to repeat the type identifier over and over in user code. Is this possible to do? The answer is, as expected, yes. It is possible with the help of a bizarre, but fully legal C++ template construct.

class base
{
public:
  base(unsigned short i) : id(i) {};
  unsigned short identifier(void) const
  {
    return id;
  };
protected:
private:
  unsigned short id;
};

template <class T, unsigned short ID>
class tbase : public base
{
public:
  tbase(void) : base(ID) {};
  // will be able to refer to descendants id!!
  enum { typeId = ID };

  // "safe" casting.
  static T* cast(base*);
protected:
private:
  // Create instances of descendant!!
  static base* creator(void);
};

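// The parameter list must match the declaration of tbase
// (unsigned short, not unsigned).
template <class T, unsigned short ID>
base* tbase<T,ID>::creator(void)
{
  return new T;
}

template <class T, unsigned short ID>
T* tbase<T,ID>::cast(base* pBase)
{
  return (T*)(pBase->identifier() == ID ? pBase : 0);
}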

class derived : public tbase<derived, 0x5a5a>
{
};

Figure 1: Recursive class definition?

The truly bizarre part is that derived inherits tbase instantiated with derived! Don't try to understand this, it works, and it is legal C++. Bjarne Stroustrup even mentions this construct in "The Design and Evolution of C++." The good thing about this is that the identifier only needs to be explicitly written once, making the risk of inconsistencies due to typos practically nil, and there will be a uniform way of accessing the type identifier for all extended attribute classes, now and in the future (for example, type comparisons can be written as pEA->identifier() == derived::typeId). It also means that the implementor of derived need not write the creator() method (which would be identical for all classes anyway), thus reducing the amount of programming and, as a direct consequence, the risk of errors.
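To make the construct a little more concrete, here is a minimal, self-contained sketch of how such a hierarchy might be used. The names Base, TBase, Text and Blob are hypothetical stand-ins for the classes in figure 1, the second identifier (0xfffe) is only there to have two distinct types, and creator() is made public here purely so the sketch can call it directly.

```cpp
// Hypothetical stand-in for "base" in figure 1.
class Base {
public:
  explicit Base(unsigned short i) : id_(i) {}
  virtual ~Base() {}
  unsigned short identifier() const { return id_; }
private:
  unsigned short id_;
};

// Hypothetical stand-in for "tbase": T is the class deriving from it.
template <class T, unsigned short ID>
class TBase : public Base {
public:
  TBase() : Base(ID) {}
  enum { typeId = ID };

  // Returns 0 when the object does not carry T's identifier.
  static T* cast(Base* p) {
    return (p != 0 && p->identifier() == ID) ? static_cast<T*>(p) : 0;
  }

  // Written once here, yet it creates an object of the derived class.
  static Base* creator() { return new T; }
};

// The identifier is spelled out exactly once per class.
class Text : public TBase<Text, 0xfffd> {};
class Blob : public TBase<Blob, 0xfffe> {};

// Creates a Text through the generic creator and checks both casts.
bool castsBehave() {
  Base* p = Text::creator();
  bool ok = Text::cast(p) != 0
         && Blob::cast(p) == 0
         && p->identifier() == Text::typeId;
  delete p;
  return ok;
}
```

Calling castsBehave() exercises the whole round trip: the object is created without Text containing a single line of code, and only the cast that matches its identifier succeeds.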

By using this construct, the frame work can check if the types are valid. By having a type identifier, a dictionary of types and creator functions can be looked up, to create a new extended attribute of the correct type on the heap, without the implementor having to write a single line of code to achieve it, and then pass that object the data part.
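The dictionary idea can be sketched as follows. This is not the YEA.H implementation: it uses std::map instead of the ICLUI containers, the creator takes no arguments, and the names Attr, TAttr, AsciiAttr, registerIn, unregisterFrom and createById are all hypothetical.

```cpp
#include <map>

// Hypothetical generic base carrying the type identifier.
class Attr {
public:
  explicit Attr(unsigned short i) : id_(i) {}
  virtual ~Attr() {}
  unsigned short identifier() const { return id_; }
private:
  unsigned short id_;
};

typedef Attr* (*Creator)();
typedef std::map<unsigned short, Creator> CreatorMap;

template <class T, unsigned short ID>
class TAttr : public Attr {
public:
  TAttr() : Attr(ID) {}

  // Enter or remove this class's creator in a dictionary.
  static void registerIn(CreatorMap& m) { m[ID] = &TAttr::creator; }
  static void unregisterFrom(CreatorMap& m) { m.erase(ID); }
private:
  static Attr* creator() { return new T; }
};

class AsciiAttr : public TAttr<AsciiAttr, 0xfffd> {};

// Looks the identifier up and creates an object of the matching class.
// Returns 0 when no creator is registered for the identifier.
Attr* createById(const CreatorMap& m, unsigned short id) {
  CreatorMap::const_iterator it = m.find(id);
  return it == m.end() ? 0 : (it->second)();
}
```

The caller only ever handles an identifier read from the file; which class ends up on the heap is decided entirely by what has been registered in the map.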

Of course, once Run Time Type Identification (RTTI) is available on most compilers, the unbelievably ugly, and not quite safe cast method (it fails in multiple inheritance situations, unless tbase<T,ID> is first in the child list, and always fails in case of virtual multiple inheritance) can be discarded.
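For comparison, this is roughly what the cast could look like on a compiler with RTTI. It is only a sketch: Attr, Printable, TextAttr, BlobAttr and attr_cast are hypothetical names, and the attribute base is deliberately placed second in the inheritance list to show that the order no longer matters.

```cpp
// Hypothetical polymorphic attribute base.
class Attr {
public:
  virtual ~Attr() {}
};

// Some unrelated polymorphic base class.
class Printable {
public:
  virtual ~Printable() {}
};

// Attr is not the first base class here.
class TextAttr : public Printable, public Attr {};
class BlobAttr : public Attr {};

// dynamic_cast adjusts the pointer correctly and yields 0 on a mismatch.
template <class T>
T* attr_cast(Attr* p) {
  return dynamic_cast<T*>(p);
}
```

With a TextAttr object t and an Attr* pointing to it, attr_cast<TextAttr> gives back the address of t, while attr_cast<BlobAttr> gives 0.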

Implementation

Enough talk, let's see some source code!

class EA
{
public:
  typedef unsigned short Identifier;
  typedef EA* (*Creator)(istrstream&);
  struct CreatorIdPair {
    Creator c;
    Identifier id;
    unsigned count;
  };
  typedef IMap<CreatorIdPair, Identifier> CreatorMap;
  static CreatorMap defaultCreatorMap;

  typedef unsigned char Flagset;
  enum { needed = 0x80 };
  struct Name {
    IString name;
    Flagset flags;
  };
  typedef IKeySortedSet<Name, IString> NameSet;

  typedef void (*ErrorHandler)(Error, unsigned long);
  static ErrorHandler errorHandler;

  static NameSet namesIn(const IString& file);
  static NameSet namesIn(fstreambase& file);

  virtual ~EA(void);

  Identifier attributeId(void) const;
  Flagset getFlags(void) const;
  void setFlags(Flagset f);

  static EA* newFrom(const IString& file,
                     const IString& name,
                     const CreatorMap& =
                        defaultCreatorMap);
  static EA* newFrom(fstreambase& file,
                     const IString& name,
                     const CreatorMap& =
                        defaultCreatorMap);

  static void remove(const IString& file,
                     const IString& name);
  static void remove(fstreambase& file,
                     const IString& name);

  void getFrom(const IString& file,
               const IString& name);
  void getFrom(fstreambase& file,
               const IString& name);

  void storeTo(const IString& file,
               const IString& name);
  void storeTo(fstreambase& file,
               const IString& name);

  virtual EA* clone(void) const = 0;
protected:
  virtual istrstream& readFrom(istrstream&) = 0;
  virtual ostrstream& writeTo(ostrstream&) = 0;
};

template <class T, EA::Identifier eaid>
class TEA : public EA
{
public:
  enum { id = eaid };

  static T* cast(EA*);
  static const T* cast(const EA*);
  virtual ~TEA(void) {};

  static void allowDynamic(EA::CreatorMap* pCM =
                              &EA::defaultCreatorMap);
  static void disallowDynamic(EA::CreatorMap* pCM =
                                 &EA::defaultCreatorMap);
};

Figure 2: The base classes of YEA.H: EA, the generic base class, and TEA, the template base class to derive specialisations from.

What can be done with extended attributes in general, that is, what functionality do EA and TEA offer the user of extended attributes?

All operations (reading, writing and removing an extended attribute) can be done by file name and by an fstream (ifstream and ofstream both virtually inherit from fstreambase). Reading the value into an already allocated extended attribute is done with the getFrom() methods. Removing is done with remove(), writing with storeTo().

Reading an extended attribute, regardless of type, is done with the EA::newFrom() methods. It uses a creator map to map the type id to a creator function (specified in TEA). This means that an object of the correct class (if any) will be created. There is a default creator map, but if you have a specific set of classes in mind for this particular read, you can provide your own creator map.

If an error occurs, the function pointed to by EA::errorHandler is called. The default function throws exceptions, but you can provide your own handler if that is not good enough.
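A replaceable handler of this kind can be sketched as below. Figure 2 does not show the Error type, so the enumerators here are invented, as are the names throwingHandler, recordingHandler and reportError. The meaning of the unsigned long argument (taken here to be some numeric error code) is also an assumption.

```cpp
#include <stdexcept>

// Hypothetical error codes; the real Error type is not shown in figure 2.
enum Error { typeMismatch, notFound };

typedef void (*ErrorHandler)(Error, unsigned long);

// Default behaviour: turn every error into an exception.
void throwingHandler(Error, unsigned long)
{
  throw std::runtime_error("extended attribute error");
}

// Alternative: remember the last error and carry on.
Error lastError = typeMismatch;
unsigned long lastCode = 0;

void recordingHandler(Error e, unsigned long code)
{
  lastError = e;
  lastCode = code;
}

// Stand-in for the static member EA::errorHandler.
ErrorHandler errorHandler = &throwingHandler;

// What library code would do on failure: call through the pointer.
void reportError(Error e, unsigned long code)
{
  errorHandler(e, code);
}
```

Assigning &recordingHandler to errorHandler switches every later reportError() call from throwing to recording, without touching the code that detects the error.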

You can get the set of EA names from a file with EA::namesIn(), and you can ask an attribute what id it has by calling attributeId().

TEA offers the two static cast methods (I dislike this construction, but until RTTI is available, it's the only one I can see that works) and the methods allowDynamic() and disallowDynamic() for entering and removing creators from the creator/identity dictionary passed.

Now, what does this mean to the implementor of specific extended attributes?

// 0xfffd == EAT_ASCII
class strea : public TEA<strea, 0xfffd>,
              public IString
{
public:
  strea() {}; // construct empty string ea.
  strea(const strea& s) : TEA<strea,
                          strea::id>(s),
                          IString(s) {};

  // want to be able to read
  // a value upon construction.

  // create and read by name.
  strea(const IString& fname, const IString& eaname)
  {
     getFrom(fname, eaname);
  };

  // create and read via fstream
  // (i.e. by file descriptor.)
  strea(fstreambase& file, const IString& eaname)
  {
    getFrom(file, eaname);
  };
  virtual ~strea(void) {};

  // create an exact (as far as possible)
  // copy of self for
  // deep copying container EA classes.
  virtual EA* clone(void) const
  {
    return new strea(*this);
  }
protected:

  // override from EA to do the reading and writing.

  virtual istrstream& readFrom(istrstream& is);
  virtual ostrstream& writeTo(ostrstream& os);
};

Figure 3: Example of how a string extended attribute can be defined.

The definition for a simple string extended attribute is indeed simple, yet it does everything that is needed. Add a constructor from IString and an assignment operator, and it can be used exactly as an IString can, in addition (of course,) to being used as an extended attribute.

I still haven't shown you how to read and write with this, though.

istrstream& strea::readFrom(istrstream& is)
{
  unsigned short length;
  is.read((char*)(&length), sizeof(length));
        // get the length of the  string.
        // stored as binary, hence is.read
        // and not is >> length.


  // create a buffer,
  char* p = new char[length];
  // read the value into it
  is.read(p, length);
  // and assign the string its value.
  IString::operator=(IString(p, length));
  delete[] p;
  return is;
}

ostrstream& strea::writeTo(ostrstream& os)
{
  unsigned short length =
    (unsigned short)IString::length();
  // store the length (binary!!)
  os.write((char*)&length, sizeof(length));
  // and the string content.
  os << *this;
  return os;
}

Figure 4: The source code for reading and writing string extended attributes.

The source code you see in figure 4 is not only lean, easy to understand and safe; it is in fact not an example at all. It is the real code used in the string EA of YEA.H. (In YEA.H it is called StringEA, though. I decided to give this example a slightly different name because of its much smaller public interface. The real StringEA has all the constructors and assignment operators that IString has.)

Simple enough?
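If you don't have strstream and IString at hand, the same length-prefixed format can be tried out with the standard string streams. The following is only a sketch of the technique in figure 4, not YEA.H code. The function names writeText, readText and roundTrip are hypothetical, and the length is stored in host byte order, which is assumed to match what OS/2 expects on Intel hardware.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Writes a 16-bit length (binary, host byte order) followed by the text.
// Strings longer than 65535 characters are truncated to what the length
// field can describe.
void writeText(std::ostream& os, const std::string& s)
{
  unsigned short length = static_cast<unsigned short>(
      s.size() > 0xffff ? 0xffff : s.size());
  os.write(reinterpret_cast<const char*>(&length), sizeof(length));
  os.write(s.data(), length);
}

// Reads the length and then exactly that many characters.
// Returns false if the stream runs out of data.
bool readText(std::istream& is, std::string& out)
{
  unsigned short length = 0;
  is.read(reinterpret_cast<char*>(&length), sizeof(length));
  if (is.fail()) return false;

  std::vector<char> buffer(length);
  if (length > 0) {
    is.read(&buffer[0], length);
    if (is.fail()) return false;
  }
  out.assign(buffer.begin(), buffer.end());
  return true;
}

// Writes a string to an in-memory stream and reads it back.
std::string roundTrip(const std::string& s)
{
  std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
  writeText(ss, s);
  std::string result;
  return readText(ss, result) ? result : std::string("<error>");
}
```

Unlike figure 4, this version checks the stream state after each read, so truncated data is reported instead of producing a string of garbage.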

Usage

After all this talk about how it was done, it is finally time to see how it is used.

#include <iostream.h>
#include <YEA.H>

int main(int argc, char* argv[])
{
   if (argc < 4) {
      cerr << "Usage: "
           << argv[0]
           << ":\t g filename eaname"
           << endl;
      cerr << "       "
           << argv[0]
           << ":\t s filename eaname value"
           << endl;
      return -1;
   } /* endif */
  try{
    switch (*argv[1]) {
    case 'g':
       {
         StringEA strea(argv[2], argv[3]);
         cout << "\""
              << strea
              << "\""
              << endl;
       }
       return 0;
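    case 's':
       {
         // 's' needs a value argument as well.
         if (argc < 5) {
            cerr << "Missing value" << endl;
            return -1;
         } /* endif */
         StringEA strea(argv[4]);
         strea.storeTo(argv[2], argv[3]);
       }
       return 0;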
    default:
       cerr << "Unknown switch '"
            << *argv[1]
            << "'"
            << endl;
       return -2;
    } /* endswitch */
  } /* endtry */
  catch (...) {
     cerr << "An error occurred" << endl;
     return -3;
  } /* endcatch */
}

Figure 5: getset.cpp, a small program to read/write string EAs

It is exactly as difficult as the source code above implies to use YEA.H for reading and writing extended attributes. I don't think there is much more to say about it.

Drawbacks

We aren't living in a perfect world, and there are ugly spots on everything, including the sun. YEA.H isn't free from them either.

I have mentioned a couple of drawbacks: the cast mechanism used is not safe, and only one attribute at a time can be dealt with (there is no problem with multi-value sequence attributes, though).

The cast mechanism is by far the worst drawback - it is unsafe. If TEA isn't the first in the list of ancestors when using multiple inheritance, the cast will point to something invalid, and the only way you can notice, is by weird run-time errors. It is also rather ungraceful when dealing with deeper hierarchies. Unfortunately, without a compiler supporting run time type identification, it is the best I can do.

Another problem I see is that YEA.H is tied to IBM's ICLUI class library and VisualAge C++ (and other compilers supporting the class library). For containers, it would perhaps be better to use the Standard Template Library (STL), but which implementation then? Many are around, but none covers the proposed standard (because the compilers don't support all the constructs that are needed), and they all solve the problems in different ways, leaving them incompatible with each other.

While the frame work is usable as is, and is very easy to extend, it is a problem that it is rather incomplete. The only types of extended attributes supported so far are EAT_ASCII (strings), EAT_MVST (typed sequences), and EAT_MVMT (untyped sequences). It may seem odd to support EAT_MVMT when the only attribute implemented that contains any information is EAT_ASCII, but a surprising number of extended attributes that are, in reality, sequences of strings, are stored as MVMT. .KEYPHRASES and .TYPE are examples of such.
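To illustrate what "a sequence of strings stored as MVMT" amounts to, here is a rough sketch that packs and unpacks such a value in memory. The layout used (type word, code page word, entry count, then a type word, length word and characters per entry) and the value 0xffdf for EAT_MVMT are stated from memory of the OS/2 conventions and should be checked against the toolkit documentation. All function names are hypothetical, and the code has nothing to do with how YEA.H implements its sequence classes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

const unsigned short kAscii = 0xfffd;  // EAT_ASCII, as in figure 3
const unsigned short kMvmt  = 0xffdf;  // EAT_MVMT (assumed value)

// 16-bit words are stored in host byte order (little-endian on Intel).
void put16(std::ostream& os, unsigned short v)
{
  os.write(reinterpret_cast<const char*>(&v), sizeof(v));
}

bool get16(std::istream& is, unsigned short& v)
{
  is.read(reinterpret_cast<char*>(&v), sizeof(v));
  return !is.fail();
}

// Builds an MVMT value whose entries are all ASCII strings.
std::string packStrings(const std::vector<std::string>& items)
{
  std::ostringstream os(std::ios::out | std::ios::binary);
  put16(os, kMvmt);
  put16(os, 0);  // code page 0, assumed to mean "default"
  put16(os, static_cast<unsigned short>(items.size()));
  for (std::size_t i = 0; i < items.size(); ++i) {
    put16(os, kAscii);
    put16(os, static_cast<unsigned short>(items[i].size()));
    os.write(items[i].data(),
             static_cast<std::streamsize>(items[i].size()));
  }
  return os.str();
}

// Reads the entries back; fails on anything but an MVMT of ASCII strings.
bool unpackStrings(const std::string& raw, std::vector<std::string>& out)
{
  std::istringstream is(raw, std::ios::in | std::ios::binary);
  unsigned short type = 0, codepage = 0, count = 0;
  if (!get16(is, type) || type != kMvmt) return false;
  if (!get16(is, codepage) || !get16(is, count)) return false;
  for (unsigned short i = 0; i < count; ++i) {
    unsigned short entryType = 0, length = 0;
    if (!get16(is, entryType) || entryType != kAscii) return false;
    if (!get16(is, length)) return false;
    std::string s(length, '\0');
    if (length > 0) {
      is.read(&s[0], length);
      if (is.fail()) return false;
    }
    out.push_back(s);
  }
  return true;
}
```

Each entry carries its own type word, which is what makes the sequence "untyped" as a whole, even when every entry happens to be a string.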

Having the above classes supported by default in the library is also a problem. Either all classes used must be instantiated at compile time, whether you use them or not (to allow dynamic creation,) or it is left for you, the user, to do that job (which is the route I decided to go, despite the fact that this means more work instead of less, and can be error prone.)

Conclusion

By examining examples of handling extended attributes through the OS/2 API in the previous issue, we have seen why so few programs support them. Programming for them is really a mess.

In the introduction to part 1, I wrote this about the frame work:

"The frame work must be simple to use and extend, otherwise the effort is wasted, even if it means that the implementation of the frame work will be hairy."

Have I succeeded in that? Is it simple to use, and extend? Seeing the implementation of the string extended attribute, and the example of getset.cpp, I'm prepared to say "yes, I believe I have succeeded." It wasn't easy, though - I spent many hours, fearing my brain would boil dry, trying to figure out how the class hierarchy should look to achieve this. This goes especially for the implementation of TSequenceEA (the EAT_MVST extended attribute,) which I decided not to talk about in this article.

If you want to use extended attributes in your software, feel free to use YEA.H.