Encapsulating Extended Attributes - Part 1/2

From EDM2

Written by Björn Fahller

Introduction

A number of times I have wanted to use extended attributes in my programs, and just as many times I have decided to do without them after reading the API documentation. This time was going to be different, though, because this time there was no way out: I needed extended attributes. After some experimentation, I decided never again to dig in the dirt of extended attribute programming, not by avoiding extended attributes forever, but by writing a powerful C++ framework that does the job for me. For that to be useful, the following conditions must be met:

  • I need to understand how to read and write extended attributes.
  • The framework must be simple to use and to extend, otherwise the effort is wasted, even if that means the implementation of the framework will be hairy.

This article, part one of two, is about the first condition: how to program extended attributes with the OS/2 API.

What are Extended Attributes?

Extended attributes are data that is attached to a file but is not part of the file itself. Extended attributes are used, for example, for icons, WPS long names (if you use the FAT file system), file types, and the association table of a program. Basically, there is no limit to what you can find in extended attributes. Well, yes, there is one limit: the total size of all extended attributes attached to a file cannot be larger than 64 KB.

Extended attributes have a type (string, icon, metafile and so on), and are stored and accessed through a name (and, of course, the file they are attached to).

Finding the names

As mentioned above, extended attributes are stored with a name. If you know the name of an extended attribute, you can read it from a file, but given a file, how do we know what extended attributes there are to read?

The function to call is DosEnumAttribute. To read the names with DosEnumAttribute, you need the file (by name, or as a file handle,) and an area to store the names. Simple enough? No. The problem lies in the format of the area.

The names are stored in a data structure called DENA2, and there came the first problem. In my (admittedly old) reference manual, DENA2 is not documented. A little header file reading showed that it is a synonym for FEA2, though. The FEA2 data structure looks as follows:

typedef struct _FEA2         /* fea2 */
{
   ULONG   oNextEntryOffset;    /* new field */
   BYTE    fEA;
   BYTE    cbName;
   USHORT  cbValue;
   CHAR    szName[1];           /* new field */
} FEA2;
typedef FEA2 *PFEA2;

Figure 1: The FEA2 structure, as defined in BSEDOS.H.

This structure cannot be used as is, since it doesn't leave enough room for the name (only one character), and even if the buffer is oversized, there will still only be room for one name. In other words, a little flexibility is needed when interpreting this. What is needed is a memory area of, unfortunately, unknown size, which can be seen as a linked list of FEA2 structures. The first 4 bytes (oNextEntryOffset) are the offset from the start of this structure to the next structure in the list (0 means it is the last one). fEA is an odd one. It can only have the values 0 or 0x80. 0x80 means that the attribute is mandatory for the file, and that stripping it is an error. cbName is the length of the name, and cbValue the length of the still unknown data identified by this name.

OK, now we can read the names of all extended attributes attached to a file:

const unsigned size = 2000; // let's hope 2000 bytes is enough.
void* pBuffer = (void*)(new char[size]);
ULONG count = -1; // means, fill as many names as there is room for
APIRET rc = DosEnumAttribute(ENUMEA_REFTYPE_PATH,
                             PVOID(name), // Yuk, I hate this API!
                                      // name of the file.

                             1, // ordinal index of first EA to read,
                             pBuffer, // place to store values in
                             size,    // and its size.
                             &count,  // nr of names
                             ENUMEA_LEVEL_NO_VALUE); // only legal value!!

Figure 2: Reading the names with DosEnumAttribute.

A few things are interesting to note here:

  • If size is greater than 65364 and the file is not found, a large portion of your memory, starting at the address of the buffer, will be zeroed, invalidating most of your dynamically allocated data (very forgiving error control here, isn't it? Especially since the size ceiling is not documented anywhere).
  • Before the call, count tells how many names we are prepared to accept. An often used method is to set count to 1 and iterate over many calls to DosEnumAttribute. Since this is a multitasking operating system, the extended attribute set of the file may change from one call to the next, so that might not be a good idea, even if it makes the programming simpler. By setting count to -1, we tell DosEnumAttribute to fill the buffer with as many names as there is room for. On return, count tells how many names were actually read.
  • The last parameter, ENUMEA_LEVEL_NO_VALUE, is interesting. This is the only legal value. One wonders what it is for in the first place.

Now to display the result:

PDENA2 pDena = (PDENA2)pBuffer; // buffer with names.
ULONG offset = 0;
if (count != 0) {
  do {
     pDena = PDENA2((CHAR*)(pDena) + offset); // point to next entry,
                                              // first time offset == 0

     cout << "oNextEntryOffset:\t" << pDena->oNextEntryOffset << endl;
     cout << "fEA:\t\t" << int(pDena->fEA) << endl;
     cout << "cbName:\t\t" << int(pDena->cbName) << endl;
     cout << "cbValue:\t\t" << pDena->cbValue << endl;
     cout << "szName:\t\t" << pDena->szName << endl;
     cout << endl;
     offset = pDena->oNextEntryOffset;
  } while ( pDena->oNextEntryOffset != 0 ); /* enddo */
} /* endif */

Figure 3: Displaying the content of a FEA2 structure.

Of course there are many ways to traverse the FEA2 list, but unfortunately they all include a lot of ugly type casting.
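One way to keep the casting in a single place is to copy the entries out into ordinary C++ objects. The following is only a portable sketch of that idea, not code from the framework: EaName and listNames are names made up for this illustration, the buffer is a std::vector rather than the memory DosEnumAttribute fills, and the fields are read with memcpy on the assumption that the host has the same byte order as the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record for one enumerated name; not part of the OS/2 API.
struct EaName {
    std::string   name;       // szName, cbName characters long
    std::uint16_t valueSize;  // cbValue
    bool          critical;   // true when fEA == 0x80
};

// Walks a buffer laid out like the one in Figure 2 and returns one
// EaName per entry. Layout per entry: oNextEntryOffset (4 bytes),
// fEA (1), cbName (1), cbValue (2), then the 0 terminated name.
std::vector<EaName> listNames(const std::vector<unsigned char>& buffer,
                              unsigned long count)
{
    const std::size_t headerSize = 8;
    std::vector<EaName> names;
    std::size_t pos = 0;
    for (unsigned long i = 0; i < count; ++i) {
        // Stop rather than read past the end of a short buffer.
        if (pos > buffer.size() || buffer.size() - pos < headerSize) break;

        std::uint32_t next = 0;
        std::uint16_t cbValue = 0;
        std::memcpy(&next, &buffer[pos], sizeof next);
        const std::uint8_t fEA    = buffer[pos + 4];
        const std::uint8_t cbName = buffer[pos + 5];
        std::memcpy(&cbValue, &buffer[pos + 6], sizeof cbValue);
        if (buffer.size() - pos - headerSize < cbName) break;

        EaName entry;
        entry.name.assign(buffer.begin() + pos + headerSize,
                          buffer.begin() + pos + headerSize + cbName);
        entry.valueSize = cbValue;
        entry.critical  = (fEA == 0x80);
        names.push_back(entry);

        if (next == 0) break;   // last entry in the list
        pos += next;            // offsets are relative to the current entry
    }
    return names;
}
```

On OS/2 the vector would be filled from pBuffer after the call in Figure 2, and count would be the value DosEnumAttribute returned.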

Reading Extended Attributes

Knowing what attributes there are on a file is good, but in most cases not very satisfying. One way or another, the value of a known EA must be copied from the file storage to memory, where it can be inspected and manipulated. To my knowledge, this can be done with the API calls DosQueryFileInfo and DosQueryPathInfo. Both are used the same way.

Getting the extended attributes

When reading and writing extended attributes, a structure called EAOP2 is used. EAOP2 looks as follows:

typedef struct _FEA2LIST     /* fea2l */
{
   ULONG   cbList;
   FEA2    list[1];
} FEA2LIST;
typedef FEA2LIST *PFEA2LIST;

typedef struct _GEA2          /* gea2 */
{
   ULONG   oNextEntryOffset;     /* new field */
   BYTE    cbName;
   CHAR    szName[1];            /* new field */
} GEA2;
typedef GEA2 *PGEA2;

typedef struct _GEA2LIST      /* gea2l */
{
   ULONG   cbList;
   GEA2    list[1];
} GEA2LIST;
typedef GEA2LIST *PGEA2LIST;

typedef struct _EAOP2         /* eaop2 */
{
   PGEA2LIST   fpGEA2List;       /* GEA set */
   PFEA2LIST   fpFEA2List;       /* FEA set */
   ULONG       oError;           /* offset of FEA error */
} EAOP2;
typedef EAOP2 *PEAOP2;

Figure 4: EAOP2, and containing types, as defined in BSEDOS.H.

OK. EAOP2 contains a pointer to an ugly list of FEA2 and... wait a minute. Wasn't FEA2 a list when used in DosEnumAttribute? Yes, it was, or rather, DosEnumAttribute was actually passed a FEA2LIST, except for the size part... Argg... Um, yes, FEA2LIST.list was passed. No wonder all arguments are passed as void*. OK. EAOP2 contains a pointer to an ugly list of FEA2, with a size (but the size is the size in bytes of the whole list, including the size field itself, and not the number of elements in the list). EAOP2 also contains an ugly list, with size, of GEA2. To confuse poor developers, the GEA2 list is where the EA names to be read are passed, and the FEA2 list, which until now contained names, will contain the values read. To make matters worse, the GEA2 structures must be aligned on double word boundaries.
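The double word alignment is easy to get wrong, so here is a small sketch of the arithmetic on its own. The helper names alignDword and gea2EntrySize are made up for this illustration, and counting one byte for the 0 terminator of the name is an assumption based on szName being an ordinary C string.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of 4 (one double word).
inline std::size_t alignDword(std::size_t n)
{
    return (n + 3) & ~static_cast<std::size_t>(3);
}

// Bytes occupied by one GEA2 entry in the list:
// oNextEntryOffset (4) + cbName (1) + the name + its 0 terminator (1),
// padded so that the next entry starts on a double word boundary.
inline std::size_t gea2EntrySize(std::size_t nameLength)
{
    assert(nameLength <= 255);   // cbName is a single byte
    return alignDword(4 + 1 + nameLength + 1);
}
```

For a 9 character name such as .LONGNAME, the unpadded size is 15 bytes, so the entry occupies 16 and the next one starts at offset 16.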

Given the name list from DosEnumAttribute, the values, all of them, can be read this way:

PDENA2 pDena = (PDENA2)pBuffer; // buffer from DosEnumAttribute

const ULONG eaop2size = 65000;
PEAOP2 pEAOP2 = PEAOP2(new char[eaop2size]);

// Let the fpGEA2List part of pEAOP2 point to the memory location
// just after the EAOP2 structure itself.

pEAOP2->fpGEA2List = PGEA2LIST((CHAR*)pEAOP2 + sizeof(EAOP2));

// A walker is needed for traversing the list

PGEA2 pGea2 = (&(pEAOP2->fpGEA2List->list[0]));

ULONG offset = 0;
if (count != 0) {
  do {
     pDena = PDENA2((CHAR*)(pDena) + offset); // next entry,
                                              // first time offset == 0

     // get the name and size of the name

     strcpy(pGea2->szName, pDena->szName);
     pGea2->cbName = pDena->cbName;

     // calculate the length, and align with double word boundary.
     // The 0 terminator of the name takes one byte too.

     ULONG length = pGea2->cbName + 1 + sizeof(pGea2->cbName) +
                       sizeof(pGea2->oNextEntryOffset);
     if (length % 4) {
        length+= 4-(length%4);
     } /* endif */

     // set oNextEntryOffset to 0, indicating end of list, if
     // oNextEntryOffset for the DENA list is 0.

     offset = pDena->oNextEntryOffset;
     pGea2->oNextEntryOffset = offset ? length : 0;

     // set pGea2 to point to the next location regardless of
     // whether we're at the end of the list. It will be used
     // for the size calculation.

     pGea2 = PGEA2((CHAR*)pGea2 + length);
  } while ( pDena->oNextEntryOffset != 0 ); /* enddo */

  pEAOP2->fpGEA2List->cbList = ((char*)pGea2 -
                                        (char*)(pEAOP2->fpGEA2List));

  // set fpFEA2List to point to the area immediately after the
  // GEA2List

  pEAOP2->fpFEA2List = PFEA2LIST((CHAR*)pEAOP2->fpGEA2List +
                             pEAOP2->fpGEA2List->cbList);
  pEAOP2->fpFEA2List->cbList = eaop2size -
                                        ((CHAR*)pEAOP2->fpFEA2List -
                                        (CHAR*)pEAOP2);

  rc = DosQueryPathInfo(name,
                        FIL_QUERYEASFROMLIST,
                        PVOID(pEAOP2),
                        sizeof(EAOP2));
} /* endif */

Figure 5: Filling the EAOP2 structure for reading.

Figure 5 explains a lot of why I decided to write a class library. This kind of pointer arithmetic is not fun, and rather error prone. I lost count of the number of access violations I got when writing this little EDM/2 code snippet.

If you are the kind that carefully dissects code examples, you have noticed a few oddities, despite the jungle of type casts. The most glaring one, to me, is that the size passed to DosQueryPathInfo is not the size of the buffer holding all the data, but simply the size of an EAOP2 structure. Pass it any other size, and the nice error handling mechanisms of the EA API give you a trap c0000005 (Access Violation) in DOSMERGE.DLL (the very forgiving error handling strikes again, and this time too, the legal size range is thoroughly undocumented). This, however, implies that the above arithmetic can be simplified a little bit: allocate an EAOP2 structure on the stack, and the FEA2LIST and GEA2LIST separately on the heap. As a matter of fact, that is how I have done it in the C++ framework.

Interpreting the extended attributes

Now the extended attributes are in memory, but we still can't get to them. All that is known is that somehow they are hiding in the FEA2LIST, but as far as the documentation goes, there is no place for data in the FEA2LIST. Time for yet another liberal interpretation of the meaning of structs. The first byte of the data is the byte after the 0 terminator of the name string in the individual FEA2 structures. The length of the data field is, surprisingly, exactly the number of bytes given by cbValue of the FEA2 structure.
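To make the layout concrete, here is a portable sketch that digs the raw value of a named attribute out of a FEA2LIST image. It is not code from the framework: findValue is a name made up for this illustration, the list is held in a std::vector instead of the memory DosQueryPathInfo fills, and the fields are read with memcpy on the assumption that the host byte order matches the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Returns the raw value bytes (type field included) of the attribute
// called name, or an empty vector if it is not in the list.
// Layout: cbList (4 bytes), then entries of oNextEntryOffset (4),
// fEA (1), cbName (1), cbValue (2), name, 0 terminator, value.
std::vector<unsigned char> findValue(const std::vector<unsigned char>& list,
                                     const std::string& name)
{
    const std::size_t headerSize = 8;
    if (list.size() < 4) return std::vector<unsigned char>();

    std::uint32_t cbList = 0;
    std::memcpy(&cbList, &list[0], sizeof cbList);
    const std::size_t end = cbList < list.size() ? cbList : list.size();

    std::size_t pos = 4;                       // first entry follows cbList
    while (pos <= end && end - pos >= headerSize) {
        std::uint32_t next = 0;
        std::uint16_t cbValue = 0;
        std::memcpy(&next, &list[pos], sizeof next);
        const std::uint8_t cbName = list[pos + 5];
        std::memcpy(&cbValue, &list[pos + 6], sizeof cbValue);

        const std::size_t nameAt  = pos + headerSize;
        const std::size_t valueAt = nameAt + cbName + 1;  // skip the 0 terminator
        if (valueAt + cbValue > end) break;               // damaged list

        if (name.size() == cbName &&
            std::memcmp(&list[nameAt], name.data(), cbName) == 0) {
            return std::vector<unsigned char>(list.begin() + valueAt,
                                              list.begin() + valueAt + cbValue);
        }
        if (next == 0) break;                  // last entry
        pos += next;
    }
    return std::vector<unsigned char>();
}
```

The bytes returned start with the two byte type field described next, so the caller still has to interpret them.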

The first 2 bytes of the data are always the type. How the rest of the data is formatted depends on the type. An ASCII string attribute has the type EAT_ASCII (0xfffd), and is stored as follows:

0xfffd      0xyyyy    sString

Here 0xyyyy is the length of the string in bytes, and sString is the ASCII string itself. Note that the string is not 0 terminated. For example, the string "hello" is stored as:

0xfffd 0x0005 'h' 'e' 'l' 'l' 'o'
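As a sketch of how that format can be produced and checked, the two functions below encode a string as an EAT_ASCII value and decode it again. The names kEatAscii, encodeAscii and decodeAscii are made up for this illustration, and the 16 bit fields are written low byte first, which is my assumption about how they sit in memory on the Intel machines OS/2 runs on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const std::uint16_t kEatAscii = 0xFFFD;   // value of EAT_ASCII quoted in the text

// Appends a 16 bit value, low byte first.
static void putU16(std::vector<unsigned char>& out, std::uint16_t v)
{
    out.push_back(static_cast<unsigned char>(v & 0xFF));
    out.push_back(static_cast<unsigned char>(v >> 8));
}

// Reads a 16 bit value stored low byte first at position at.
static std::uint16_t getU16(const std::vector<unsigned char>& in, std::size_t at)
{
    return static_cast<std::uint16_t>(in[at] | (in[at + 1] << 8));
}

// Builds the value bytes: type, length, then the string without terminator.
std::vector<unsigned char> encodeAscii(const std::string& text)
{
    assert(text.size() <= 0xFFFF);        // the length field is 16 bits
    std::vector<unsigned char> out;
    putU16(out, kEatAscii);
    putU16(out, static_cast<std::uint16_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
    return out;
}

// Extracts the string again; returns false if value is not a well
// formed EAT_ASCII value.
bool decodeAscii(const std::vector<unsigned char>& value, std::string& text)
{
    if (value.size() < 4) return false;
    const std::uint16_t type   = getU16(value, 0);
    const std::uint16_t length = getU16(value, 2);
    if (type != kEatAscii || value.size() - 4 < length) return false;
    text.assign(value.begin() + 4, value.begin() + 4 + length);
    return true;
}
```

Encoding "hello" this way gives 9 bytes, which is also what cbValue would have to say for that attribute.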

Other standard types are binary data (EAT_BINARY), icon (EAT_ICON), bitmap (EAT_BITMAP), metafile (EAT_METAFILE), pointer to another attribute (EAT_EA), multi value multi type attribute sequence (EAT_MVMT), multi value single type attribute sequence (EAT_MVST), and ASN.1 (EAT_ASN1).

Of these, the multi value sequences are the most interesting, and the ones that caused the most headache when writing the C++ framework. Ironically, from the standpoint of someone who likes strong compile time type checking, it was the single type sequence (an ordinary array, in other words) that was by far the worst.

Writing Extended Attributes

Being able to read extended attributes is good. In many cases it is all you need. Often, though, you cannot do without writing.

Extended attributes are written with DosSetPathInfo, DosSetFileInfo or DosOpen.

Oddly enough, writing extended attributes is simpler than reading them, so the worst part is done. When writing, our dear friend EAOP2 is used again, but this time fpGEA2List is ignored, and can safely be set to 0. If we want to copy the extended attribute set of one file to another, all the information that is needed, is already at hand. The fpFEA2List already contains the correct information from the read, so a call to DosSetPathInfo with the name of the other file will do the job. As with anything else, though, it is more interesting and challenging to create than to copy.

void writeEAs(const char* filename, const char* longname, const char* subject)
{
   const unsigned fea2listsize = 6000;
   const char LONGNAME[] = ".LONGNAME";
   const char SUBJECT[] = ".SUBJECT";
   EAOP2 eaop2;
   eaop2.fpGEA2List = 0;
   eaop2.fpFEA2List = PFEA2LIST(new char[fea2listsize]);
   PFEA2 pFEA2 = &eaop2.fpFEA2List->list[0];

   // create .LONGNAME EA
   pFEA2->fEA = 0; // .LONGNAME is not needed
   pFEA2->cbName = sizeof(LONGNAME)-1; // skip \0 terminator

   pFEA2->cbValue = strlen(longname)+2*sizeof(USHORT);
   //                                      ^
   //                           space for the type and length field.
   //

   strcpy(pFEA2->szName, LONGNAME);
   char* pData = pFEA2->szName+pFEA2->cbName+1; // data begins at
                                                // first byte after
                                                // the name
   *(USHORT*)pData = EAT_ASCII;             // type
   *((USHORT*)pData+1) = strlen(longname);  // length
   strcpy(pData+2*sizeof(USHORT), longname);// content

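   // Offset to the next entry, rounded up to a double word boundary
   // in the same way as the GEA2 entries were when reading.
   pFEA2->oNextEntryOffset = (sizeof(FEA2)+pFEA2->cbName+
                                   1+pFEA2->cbValue+3) & ~3UL;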

   // point to next EA, the .SUBJECT EA
   pFEA2 = PFEA2(PCHAR(pFEA2)+pFEA2->oNextEntryOffset);
   pFEA2->fEA = 0; // .SUBJECT is not needed
   pFEA2->cbName = sizeof(SUBJECT)-1; // skip \0 terminator

   pFEA2->cbValue = strlen(subject)+2*sizeof(USHORT);
   //                                      ^
   //                           space for the type and length field.
   //

   strcpy(pFEA2->szName, SUBJECT);
   pData = pFEA2->szName+pFEA2->cbName+1; // data begins at
                                                // first byte after
                                                // the name
   *(USHORT*)pData = EAT_ASCII;            // type
   *((USHORT*)pData+1) = strlen(subject);  // length
   strcpy(pData+2*sizeof(USHORT), subject);// content

   pFEA2->oNextEntryOffset = 0; // no more EAs to write.

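   // cbList covers everything from the start of the list to the end
   // of the last value; pData points at the start of that value.
   eaop2.fpFEA2List->cbList = PCHAR(pData+pFEA2->cbValue)-
                                    PCHAR(eaop2.fpFEA2List);
   APIRET rc = DosSetPathInfo(filename,
                              FIL_QUERYEASIZE, // level 2, sets the EAs
                              &eaop2,
                              sizeof(eaop2),
                              0);
   delete[] (char*)eaop2.fpFEA2List; // allocated as char[] above
   if (rc) {
      cerr << "DosSetPathInfo => " << rc << endl;
   } /* endif */
}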

Figure 6: Setting up EAOP2 for writing .SUBJECT and .LONGNAME.
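The bookkeeping in Figure 6 repeats for every attribute, so it is a natural thing to hide in a small helper. The class below is only a portable sketch of that idea and not the framework from part two: Fea2ListBuilder is a name made up for this illustration, it builds the FEA2LIST image in a std::vector, and it pads each entry to a double word boundary, which is an assumption carried over from the alignment rule for GEA2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Builds a FEA2LIST image: cbList (4 bytes) followed by entries of
// oNextEntryOffset (4), fEA (1), cbName (1), cbValue (2), name,
// 0 terminator, value.
class Fea2ListBuilder {
public:
    Fea2ListBuilder() : bytes_(4, 0), last_(0) {}

    // Appends one attribute; value must already contain the type field.
    void add(const std::string& name, const std::vector<unsigned char>& value)
    {
        assert(name.size() <= 255 && value.size() <= 0xFFFF);

        // Start the new entry on a double word boundary (assumption).
        while (bytes_.size() % 4 != 0) bytes_.push_back(0);
        const std::size_t start = bytes_.size();

        // Link the previous entry, if there is one, to this entry.
        if (last_ != 0) {
            const std::uint32_t offset =
                static_cast<std::uint32_t>(start - last_);
            std::memcpy(&bytes_[last_], &offset, sizeof offset);
        }

        // Header with oNextEntryOffset = 0 (end of list) and fEA = 0.
        bytes_.resize(start + 8, 0);
        bytes_[start + 5] = static_cast<unsigned char>(name.size());
        const std::uint16_t cbValue = static_cast<std::uint16_t>(value.size());
        std::memcpy(&bytes_[start + 6], &cbValue, sizeof cbValue);

        bytes_.insert(bytes_.end(), name.begin(), name.end());
        bytes_.push_back(0);                   // 0 terminator of the name
        bytes_.insert(bytes_.end(), value.begin(), value.end());
        last_ = start;

        // cbList is the size in bytes of the whole list, itself included.
        const std::uint32_t cbList = static_cast<std::uint32_t>(bytes_.size());
        std::memcpy(&bytes_[0], &cbList, sizeof cbList);
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t last_;                         // start of previous entry, 0 if none
};
```

On OS/2, fpFEA2List would presumably be pointed at the first byte of bytes() before the call to DosSetPathInfo; I have only checked the layout this sketch produces, not that call.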

Simple enough (compared to reading, that is). There are still some ugly type casts and pointer arithmetic, but it is not too bad. Writing really is simpler than reading.

Compile and run writer, just to test it. After it has run, open the settings view (WPS speak here) of the file you wrote to, and look at the title (the same as the longname you passed) and the Subject field under the File tab.

Conclusions

Hairy as the API is, it is possible to read and write extended attributes, but it isn't a joy, and the risk of errors is enormous. Hiding the dirty work behind a curtain of C++ classes seems like a good idea, and since the dirty work has now been done, it appears possible. Of course, since I already mentioned in the introduction that I have done it, it is possible, and in the next issue you will see how it was done.