Extended Attributes - what are they and how can you use them?

By Roger Orr

Introduction
One of the major features of OS/2 (first introduced in version 1.2) is the installable file system. This provides a standard way to support different file systems under OS/2; the most obvious example of such a file system is the "High Performance File System" (HPFS) which is supplied with OS/2. The most visible advantage of HPFS over the traditional File Allocation Table (FAT) file system (as used by DOS) is that long file names are allowed, thus breaking the restrictive "8.3" format of DOS (and OS/2 1.1) with which we are all familiar.

However another feature contained in the installable file system interface is that of the attributes of a file. The original FAT file system design did allow a few, predefined, binary attributes such as 'read only' or 'system', but with OS/2 1.20 this idea was extended to a more general set of file attributes, which are therefore given the name of "Extended Attributes".

Extended attributes are a property of directories as well as of files.

Not content with supporting these attributes only for the new installable file systems, OS/2 1.20 also enhanced the original FAT structure to allow extended attributes on disks using the FAT format as well as the newer HPFS format.

This was achieved by making use of previously reserved fields in the directory entry for each file, and putting the EA data itself into a hidden file on the root directory - named "EA_DATA.SF". This is a nice feature except that, since it is an enhancement to the traditional FAT structure, if the same disk is accessed under DOS it is terribly easy to destroy or corrupt the EAs. (This is particularly true of DOS-based backup programs which typically cannot cope at all either with the reserved fields in the directory or with the peculiarly named data file, and usually fail to back up either of these!)

The basic intention for extended attributes is to provide a mechanism to attach named data items (of variable structure) to a file. In order to allow maximum flexibility to the use of EAs certain item names were reserved for standard purposes - an example is the ".TYPE" extended attribute which defines the file type, such as "Plain Text". IBM recommended that other programs using EAs should include some unique designator, such as the company name and product, in the item name to avoid conflicts.

Use of EAs by OS/2
Extended attributes were not heavily used under OS/2 1.x - the system editor kept asking annoying questions when saving text files and you had to decide whether your file was 'Plain Text', 'OS/2 Command File' or 'DOS Command File', but for most users most of the time it didn't matter much. If extended attributes were sometimes lost by, for example, using non-EA aware programs then it was usually not even noticed.

However extended attributes are rather more heavily used under OS/2 2.0 - especially for the desktop. If you look at the root directory of your boot disk you will see a directory such as "OS!2_21.0_D" (if it is a FAT disk) or "OS/2 2.0 Desktop" (if it is an HPFS disk), under which are subdirectories with names like "TEMPLATE" or "TOOLKIT" corresponding to folders on the desktop. When you look at these directories you may be puzzled by the lack of files - for example on my machine I have 24 subdirectories of C:\OS!2_21.0_D but a total of only 11 files in them!

The reason for this is that the desktop information is held in the extended attributes for the directories themselves, and so most of the folders and their contents can be described without requiring any additional files. So beware - if you attempt to back up your desktop configuration by using a program like XCOPY you must ensure that you copy even empty subdirectories. [So that's why the /e option is there in XCOPY!]

Provided you stick with programs written for OS/2 1.2 and above you are likely to have few problems with extended attributes. However, since one of the strengths of OS/2 2.0 is its ability to run DOS programs, there are likely to be problems to overcome when DOS programs access files with extended attributes.

The simplest solution for single files is to use the OS/2 utility program EAUtil, shipped with OS/2, which allows you to split the extended attributes out from the data file (or directory) into a file and to recombine them later.

So for example if you wanted to send a file with EAs via a bulletin board using one of the many non-EA aware archive programs you could do the following:

    C:>EAUTIL /s /p myfile myfile.ea     ; create copy of EAs in file
    C:>pkzip myfile myfile myfile.ea     ; ZIP both data file and EA file

and the recipient can then recombine the files to reconstruct the original data file complete with EAs as follows:

    C:>pkunzip myfile                    ; extract myfile and myfile.ea
    C:>EAUTIL /j myfile myfile.ea        ; combine together into single file

The easiest way to see if a file on a FAT disk has EAs is to use the /n option on DIR, which forces output to the 'new' HPFS output format. The last but one field is the size of the EAs. For example:

    C:>DIR /n c:\os!2_21.0_d

    Directory of C:\os!2_21.0_d

    16-08-92  10:54p                0  .
    16-08-92  10:54p                0  ..
    16-08-92  10:54p              867  TEMPLATE
    16-08-92  10:54p             3913  TOOLKIT
      .
      .
      .

Note however that OS/2 does NOT provide a standard display tool for extended attributes so it is not that easy to find out what the actual items are in the extended attributes for a file.

Overview of the APIs used for EAs
Extended attributes appear in the file system API from the very start - when a file is created or replaced using DosOpen extended attributes can be specified. (In much the same way a file can be created read only or hidden.)

Once a file is created extended attributes can be queried and set using DosQueryFileInfo/PathInfo and DosSetFileInfo/PathInfo. The 'File' functions are used to access a file using a file handle and the 'Path' functions are used to access a file without opening it first, or a directory (since directories cannot be opened using DosOpen the 'File' functions cannot be used on them.)

These APIs require a (rather complicated) structure containing a list of EA names and values, and are used to access explicit EAs with known names.

For more general requests for specific item names on a set of files the DosFindFirst and DosFindNext APIs (with info level of 2 or 3) can be used to enumerate matching files and extract named EAs. This is roughly equivalent to a combination of the simple (viz info level 1) DosFindFirst/DosFindNext (to get the file names), together with DosQueryPathInfo (to get the EA information), but in a single call. For this reason it will not be discussed further in this article.

Finally for general information about extended attributes the DosEnumAttribute call can be used to enumerate the entire set of extended attributes for a file.

The API functions themselves seem relatively sensible - they allow creation of a file with extended attributes, querying and setting attributes for a file or directory and enumerating the complete list of extended attributes.

There are a few problems - the main one being that there is no fail-safe way of obtaining the size of a single EA (which you might like to do in order to allocate a buffer of the correct size into which to read it!).

This is because, unlike most other OS/2 API calls, if the buffer provided on a DosQueryXXX call is too small the buffer length is set to the size of the ENTIRE EA SET FOR THE FILE rather than the (rather more useful) size of the actual EAs you require!

The only API which will return the size of each EA is DosEnumAttribute, but in order to guarantee consistent results (since theoretically other programs could alter the file's attributes between calls to DosEnumAttribute) the programmer's reference manual itself recommends first opening the file in deny-write mode. Unfortunately (a) this is not always desirable and (b) this is no use at all for directories - which cannot be opened!

There are two common ways of resolving this problem: method 1 is to use DosEnumAttribute and hope the results are consistent, method 2 is to allocate a really big buffer so the EA being read is 'bound to fit'. Neither way strikes me as desirable in a professional operating system!

However the real problems with using EAs come with the data types which have been defined - both for accessing EAs and the format of the data itself.

Overview of the data types used for EA access
In my opinion they are a mess.

In fact I think EA actually stands for 'extremely awkward' based on the problems experienced when you try using them. This article itself was sparked off by discovering sample code for accessing EAs which could create extended attributes which the same code was unable to read - if it's that hard to write a sample program what hope do we have in using EAs in real programs?!

First the access data types, as used in DosQueryFileInfo for example.

It all starts with an EAOP2 structure, which basically contains nothing but pointers to two further structures: a GEA2LIST and a FEA2LIST.

Both structures are used for query type of operations: the GEA2LIST contains a list of the names of the EAs required, and the FEA2LIST points to a buffer which is to contain the actual EA data.

Only the FEA2LIST is used for set type of operations: the GEA2LIST is ignored.

The GEA2LIST and FEA2LIST both consist of a header (a total buffer length) followed by an 'array' of variable sized data structures. Each data structure in turn contains an 'offset to next' field, the length of the EA name and the name itself. The FEA2 structure also contains a flag byte, the length of the data item and then (finally!) the actual EA data itself.
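As a rough illustration of the layout just described, here is a minimal sketch in portable C that lays out a list of EA names in the shape of a GEA2LIST. It deliberately does not use the toolkit headers: build_gea2_list is a hypothetical helper, and the field widths (4 byte list length and offset, 1 byte name length) and the padding of each entry to a doubleword boundary are assumptions that should be checked against your own toolkit before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value low byte first (the Intel order OS/2 uses). */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xffu);
    p[1] = (unsigned char)((v >> 8) & 0xffu);
    p[2] = (unsigned char)((v >> 16) & 0xffu);
    p[3] = (unsigned char)((v >> 24) & 0xffu);
}

/* Round n up to the next multiple of four. */
static size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * build_gea2_list (hypothetical helper, not a toolkit function).
 * Assumed layout: 4-byte list length, then for each name a 4-byte
 * offset to the next entry (0 for the last one), a 1-byte name
 * length, the name plus NUL, and padding up to a doubleword boundary.
 * Returns the number of bytes used, or 0 if a name is longer than
 * 255 characters or the buffer is too small.
 */
size_t build_gea2_list(unsigned char *buf, size_t bufsize,
                       const char *const names[], size_t count)
{
    size_t pos = 4;                 /* leave room for the list length */
    size_t i;

    assert(buf != NULL && names != NULL);
    if (bufsize < 4)
        return 0;

    for (i = 0; i < count; i++) {
        int last = (i + 1 == count);
        size_t len = strlen(names[i]);
        size_t entry = 4 + 1 + len + 1;          /* offset, length, name, NUL */
        size_t padded = last ? entry : align4(entry);

        if (len > 255 || padded > bufsize - pos)
            return 0;
        memset(buf + pos, 0, padded);
        put_u32(buf + pos, last ? 0u : (uint32_t)padded);
        buf[pos + 4] = (unsigned char)len;
        memcpy(buf + pos + 5, names[i], len + 1);
        pos += padded;
    }
    put_u32(buf, (uint32_t)pos);                 /* total length of the list */
    return pos;
}
```

Calling it with the two names ".TYPE" and ".LONGNAME" gives a 12 byte first entry (11 bytes plus one of padding) followed by a 15 byte second entry. Working through raw bytes rather than C structures sidesteps any questions of structure packing, at the cost of some readability.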

All clear so far? To make it a bit easier here is a schematic diagram of an EAOP2 request buffer after requesting two EAs:

+--------------------+---------------------+---------------------+
| GEA2LIST pointer   | FEA2LIST pointer    | 0 (no error offset) |
+--------------------+---------------------+---------------------+
      |                     |
      |                     |
      V    first GEA2       |                  second GEA2
+--------+---------+----------+----------+---+---------+---------+------------+
| length | offset  |  EA name | EA name  |pad|   0 (no | EA name | EA name    |
| of list| to next |  length  | + NUL    |   |   next) | length  | + NUL      |
+--------+---------+----------+----------+---+---------+---------+------------+
                            |
      +---------------------+
      |
      V    first FEA2
+--------+---------+------+----------+----------+----------+----------- - -
| length | offset  | flag |  EA name | data item| EA name  | data item
| of list| to next | byte |  length  |  length  |          | itself
+--------+---------+------+----------+----------+----------+----------- - -

             second FEA2
- - --+---+--------+------+---------+----------+-----------+-----------+
      |pad| 0 (no  | flag | EA name | data item| EA name   | data item |
      |   | next)  | byte | length  |  length  |           | itself    |
- - --+---+--------+------+---------+----------+-----------+-----------+

(I hope the picture is worth a thousand words in showing the relationship between the various structures and fields)

Since both the EA names AND the data item are of variable length, this sort of structure is hard to manipulate using C - and that's without touching the actual format of the EA data item.
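To give a feel for what 'hard to manipulate' means in practice, the following is a minimal sketch of stepping through the entries of an FEA2LIST-style buffer in portable C. The names fea2_view and fea2_next are hypothetical, and the byte layout (4 byte offset to next, flag byte, 1 byte name length, 2 byte data length, then name plus NUL and the data) is assumed from the description above rather than taken from the toolkit headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values from a byte buffer. */
static unsigned get_u16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static uint32_t get_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One decoded entry; the pointers refer into the caller's buffer. */
struct fea2_view {
    unsigned char flags;
    const char *name;               /* NUL terminated */
    size_t name_len;
    const unsigned char *value;     /* NOT NUL terminated */
    size_t value_len;
};

/*
 * fea2_next (hypothetical helper): decode the entry at *pos and move
 * *pos on to the following entry. Start with *pos = 4 (just past the
 * list length). Returns 1 if an entry was decoded, or 0 at the end of
 * the list or if the buffer looks malformed.
 */
int fea2_next(const unsigned char *list, size_t list_size,
              size_t *pos, struct fea2_view *out)
{
    size_t total, start, need;
    uint32_t next;

    assert(list != NULL && pos != NULL && out != NULL);
    if (list_size < 4)
        return 0;
    total = get_u32(list);
    if (total > list_size)
        total = list_size;          /* never trust the stored length alone */

    start = *pos;
    if (start < 4 || start > total || total - start < 8)
        return 0;

    out->flags = list[start + 4];
    out->name_len = list[start + 5];
    out->value_len = get_u16(list + start + 6);
    need = 8 + out->name_len + 1 + out->value_len;
    if (total - start < need)
        return 0;
    out->name = (const char *)(list + start + 8);
    out->value = list + start + 8 + out->name_len + 1;

    next = get_u32(list + start);
    if (next != 0 && next < need)
        return 0;                   /* entries would overlap */
    *pos = (next != 0) ? start + next : total;
    return 1;
}
```

A caller sets a position variable to 4 and loops while fea2_next returns 1. Note that every entry has to be found by following the previous entry's offset, so there is no way to index directly to 'the second EA'.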

Overview of the data formats of EA data
OS/2 recommends but does not impose a standard format scheme for EA data.

Firstly the names of EAs starting with '.' are reserved for system EAs, of which .TYPE (the file type) and .CLASSINFO (SOM class information) are examples.

Secondly there are a number of standard formats each consisting of a 2 byte EA type followed by type specific data.

(1) Simple data types - which all begin with a 2 byte length then the data:
 * EAT_BINARY (binary data), EAT_ASCII (ASCII text),
 * EAT_BITMAP (bitmap), EAT_METAFILE (OS/2 metafile),
 * EAT_ICON (icon)
 * A special case of these is EAT_EA which contains the name of another EA containing further data. This provides, among other things, a way of generating EA data of more than 64K, which is the limit for a single EA data item.

(2) Headers for more complicated data types:
 * EAT_MVMT which defines a multi-valued, multi-typed field such as is used for the .COMMENT EA (there may be multiple comments of different data types for a single file)
 * EAT_MVST which defines a multi-valued single-type field (as a simplification of the MVMT type when all items have the same type)
 * EAT_ASN1 which defines an ASN.1 ISO standard multi-valued data stream (I have never seen an example of this one 'in the wild' but I expect someone uses it!)
 * Just to make life REALLY interesting a multi-valued field can include multi-valued subfields as well as simple data types.

(3) In addition the values 0 to 0x7fff are reserved for user-defined types.

This flexibility makes it impossible to write general EA display programs since:
 * (a) user defined EAs follow no rules at all
 * (b) even the 'standard' EAs are interpreted differently by different programmers
 * (c) OS/2 does no checking of the format of EA data items when writing them to disk.

However despite this extended attributes can be useful, but please bear the above problems in mind when coding - especially if you ever write code to process multi-valued EAs!
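For the simple data types the layout is at least easy to produce. The sketch below packs a C string into the form described above for EAT_ASCII: the EA type, a 2 byte length, then the text. The function name ea_encode_ascii is hypothetical, and the value 0xFFFD for EAT_ASCII and the 2 byte width of the type field are assumptions quoted from memory of the toolkit header, so verify them against your own copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the toolkit's EAT_ASCII; check your own headers. */
#define SKETCH_EAT_ASCII 0xFFFDu

/* Store a 16-bit value low byte first. */
static void put_u16(unsigned char *p, unsigned v)
{
    p[0] = (unsigned char)(v & 0xffu);
    p[1] = (unsigned char)((v >> 8) & 0xffu);
}

/*
 * ea_encode_ascii (hypothetical helper): write 'text' as an EAT_ASCII
 * data item, that is EA type, 2-byte length, then the characters.
 * The trailing NUL is NOT stored. Returns the number of bytes
 * written, or 0 if the text will not fit.
 */
size_t ea_encode_ascii(unsigned char *out, size_t outsize, const char *text)
{
    size_t len;

    assert(out != NULL && text != NULL);
    len = strlen(text);
    /* A whole data item must stay below 64K, header included. */
    if (len > 0xFFFFu - 4u || outsize < len + 4)
        return 0;

    put_u16(out, SKETCH_EAT_ASCII);
    put_u16(out + 2, (unsigned)len);
    memcpy(out + 4, text, len);
    return len + 4;
}
```

The result is what would go into the 'data item itself' part of an FEA2 entry. Since OS/2 does no checking of the format, nothing stops a program writing something quite different under the same EA name.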

Description of the sample program
Given the problems described in the overviews above I thought that a nice simple example program would perhaps encourage more OS/2 programmers to venture into the area.

The example I have used is restricted to the simple single-valued ASCII data type, such as is used for the .LONGNAME or .VERSION standard EAs.

This data type can be used for your own files - for example to attach a quick textual note to a file such as a README attribute describing the file, or a note of when it was last backed up!

Since it is hard to manipulate the data structures used for EA access I decided to write a couple of access functions:
 * EAQueryString - to read a NUL terminated string EA
 * EASetString - to write a NUL terminated string EA

Obviously this method could be extended to cover the standard data types, and provide a more 'programmer-friendly' interface.

The program itself merely calls the appropriate function to read or write the named EA.

Note that opinion among OS/2 programs appears divided over the question of whether the ASCII data item includes the trailing NUL character - I prefer removing it since the string length is defined by the 2 byte length following the EAT_ASCII type, but other programs leave the NUL in place.

It is a good idea to process either format, whichever one your programs will actually generate!
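One way of accepting either format when reading is sketched below. This is not the code from EAString.c: ea_decode_ascii is a hypothetical function, and it assumes that the data item starts with a 2 byte EA type (taken here to be 0xFFFD for EAT_ASCII) followed by the 2 byte length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the toolkit's EAT_ASCII; check your own headers. */
#define SKETCH_EAT_ASCII 0xFFFDu

/* Read a 16-bit value stored low byte first. */
static unsigned get_u16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/*
 * ea_decode_ascii (hypothetical helper): copy an EAT_ASCII data item
 * into 'out' as a NUL terminated C string. A single trailing NUL
 * inside the stored data is dropped, so items written with or
 * without the NUL give the same result.
 * Returns 0 on success, -1 if the item is not well-formed EAT_ASCII,
 * or -2 if 'out' is too small.
 */
int ea_decode_ascii(const unsigned char *value, size_t value_len,
                    char *out, size_t out_size)
{
    size_t len;

    assert(value != NULL && out != NULL);
    if (value_len < 4 || get_u16(value) != SKETCH_EAT_ASCII)
        return -1;
    len = get_u16(value + 2);
    if (len > value_len - 4)
        return -1;                  /* stored length runs off the end */

    if (len > 0 && value[4 + len - 1] == '\0')
        len--;                      /* writer included the NUL */
    if (out_size < len + 1)
        return -2;

    memcpy(out, value + 4, len);
    out[len] = '\0';
    return 0;
}
```

Stripping only one trailing NUL is a deliberate choice: it copes with the two conventions mentioned above without silently discarding data from an item that is padded in some other way.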

EADemo expects two or three arguments. The first argument is the file (or directory) name and the second is the name of the EA item required. If there is a third argument it is the value to set the EA item to; if there is no third item the program merely displays the current value of the EA item.

Note that since OS/2 does not provide an explicit API to delete an extended attribute EADemo takes a zero-length string to imply deletion.
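In terms of the structures, deletion amounts to a set request whose data length is zero. The following sketch builds a one-entry FEA2LIST-style buffer in portable C to show the idea. The name build_fea2_single is hypothetical, the byte layout is assumed from the structure description earlier in the article, and the interpretation of a zero data length as 'delete' is the convention described above rather than something this code can itself demonstrate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_u16(unsigned char *p, unsigned v)
{
    p[0] = (unsigned char)(v & 0xffu);
    p[1] = (unsigned char)((v >> 8) & 0xffu);
}

static void put_u32(unsigned char *p, uint32_t v)
{
    put_u16(p, (unsigned)(v & 0xffffu));
    put_u16(p + 2, (unsigned)(v >> 16));
}

/*
 * build_fea2_single (hypothetical helper): lay out one EA for a 'set'
 * request as 4-byte list length, 4-byte offset to next (0), flag
 * byte (0), 1-byte name length, 2-byte data length, name + NUL, data.
 * Passing value_len == 0 produces an entry with no data, which is the
 * form taken to mean 'delete this EA'.
 * Returns the number of bytes used, or 0 on bad arguments.
 */
size_t build_fea2_single(unsigned char *buf, size_t bufsize,
                         const char *name,
                         const unsigned char *value, size_t value_len)
{
    size_t name_len, total;

    assert(buf != NULL && name != NULL);
    assert(value != NULL || value_len == 0);

    name_len = strlen(name);
    if (name_len == 0 || name_len > 255 || value_len > 0xFFFFu)
        return 0;
    total = 4 + 8 + name_len + 1 + value_len;
    if (total > bufsize)
        return 0;

    put_u32(buf, (uint32_t)total);
    put_u32(buf + 4, 0);                        /* no next entry */
    buf[8] = 0;                                 /* flag byte */
    buf[9] = (unsigned char)name_len;
    put_u16(buf + 10, (unsigned)value_len);
    memcpy(buf + 12, name, name_len + 1);
    if (value_len > 0)
        memcpy(buf + 12 + name_len + 1, value, value_len);
    return total;
}
```

For the EA name read.me and no data this gives a 20 byte buffer whose data length field is zero; with data the same call simply appends the data item after the name.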

The programs are compiled as follows. (I am using IBM C Set/2)

    icc /c EAString.c
    icc EADemo.c EAString.obj

Then for example:

    C:>echo. > sample

    C:>EADemo sample read.me "A simple test of the program"
    Value of EA item read.me set to: "A simple test of the program"

    C:>EADemo sample read.me
    Value of EA item read.me is: "A simple test of the program"

    C:>EADemo sample read.me ""
    EA item read.me deleted

Comments on the program
It is a little longer than I usually hope for in articles of this type - partly reflecting the difficulties referred to above in the way the API has been implemented. I have liberally commented the code, rather than writing a large amount of separate description, in the hope of providing a more useful working example.

EAString.c is a general purpose piece of code to read and write ASCII EAs. It does not make efficient use of memory since every request malloc's and free's a buffer. In addition it relies on being told how big a string to read data into, but it suffices for simple use.

The EAOP2 structure is only used inside the EAQueryData and EASetData functions. I do not find it a useful structure when programming as it adds so little information to the underlying GEA2LIST and FEA2LIST structures.

Note that OS/2 2.0 will round the size of the buffer up to the NEXT DOUBLEWORD BOUNDARY so make sure that you pick a buffer length divisible by four (or allocate 4 bytes more than the length you said) - see the comment in EAQueryString.
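The arithmetic for picking a safe allocation size is trivial but easy to forget. A minimal sketch, using a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * ea_round_to_dword (hypothetical helper): round a requested buffer
 * length up to the next multiple of four, so that the amount
 * allocated is never less than the doubleword-rounded length.
 */
size_t ea_round_to_dword(size_t requested)
{
    assert(requested <= (size_t)-1 - 3u);   /* guard against wrap-around */
    return (requested + 3u) & ~(size_t)3u;
}
```

So a request for 61 bytes should be backed by an allocation of 64, while a length that is already divisible by four is left alone.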

The EADemo program attempts to display a text message on any error by loading the appropriate error message from OSO001.MSG. Note however that error 111 (ERROR_BUFFER_OVERFLOW), which is generated by OS/2 and EAString when the buffer is too small to hold the EA data requested, is interpreted in the message file as "SYS0111: the file name is too long" rather than a more relevant message referring to buffer sizes! You may prefer to 'lie' and map error 111 to another error code such as error 122 (ERROR_INSUFFICIENT_BUFFER) which has a more meaningful text associated with it.
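The 'lie' can be as small as the sketch below. The function name and the SKETCH_ constants are hypothetical; the numeric values are the two error codes quoted above.

```c
/* Error codes as quoted in the text; the toolkit defines the real names. */
#define SKETCH_ERROR_BUFFER_OVERFLOW      111UL
#define SKETCH_ERROR_INSUFFICIENT_BUFFER  122UL

/*
 * ea_message_code (hypothetical helper): choose which error number to
 * look up in the message file. Only the buffer overflow code is
 * changed; every other return code is passed through untouched.
 */
unsigned long ea_message_code(unsigned long rc)
{
    if (rc == SKETCH_ERROR_BUFFER_OVERFLOW)
        return SKETCH_ERROR_INSUFFICIENT_BUFFER;
    return rc;
}
```

The mapping is best applied only at the point where the message is displayed, so that callers testing the return code still see the value OS/2 actually returned.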

Conclusion
Extended attributes are a nice idea but I believe they are spoilt by the poor interface. Hopefully over time IBM themselves will address this and provide a more usable interface - in the meantime writing simple functions to perform one task (as this article demonstrates) can make it considerably easier to add basic EA functionality to your programs by hiding the complexity of the interface inside various access functions.

Roger Orr 04-May-1993

 * EAString.h
 * EAString.c
 * EADemo.c