The Fundamentals of Extended Attributes

From EDM2

By Michael Norton

As most programmers can attest, from a technician's perspective, information about a file is often more important than the contents themselves. For example, the archive attribute allows backup software to copy only those files which have been modified since the last backup. Comparing file dates and sizes is a routine activity. And there isn't a technician out there who hasn't developed an intensely personal and zealously enforced naming convention for files. Nevertheless, the standard file stamps and attributes established under DOS aren't very flexible and provide only primitive information. OS/2 provides a facility for attaching much more and varied information to a file or directory: extended attributes, or EAs.

The capabilities that EAs provide often remain unexploited. Many users encounter EAs only when something goes wrong and the system displays a message such as the dreaded 'cross-linked extended attributes' or 'corrupted extended attributes' error. As a result, many users regard EAs with some ambivalence.

The truth is that OS/2, or more accurately, the OS/2 Workplace Shell (WPS), could not run without them. Extended attributes are the 'glue' that holds the WPS together. They contain the information the system needs to associate files with programs, icons with objects, long file names with files copied from HPFS to FAT, and a variety of other system functions. Anyone who has ever made the mistake of using a DOS defragger which does not support extended attributes on their OS/2 boot volume can certainly attest to how critical EAs are to the operation of OS/2.

Amazingly, the only native OS/2 utility for managing EAs is EAUTIL.EXE. EAUTIL allows you to split EAs from a file or directory into a separate file, which may subsequently be rejoined to the original file or merged into a different file. The command seems to have been designed originally to allow operating systems that do not recognize extended attributes (for example, DOS) to operate on files from an OS/2 system. In addition, EAUTIL provides support for file transfer operations, such as transferring a file via a modem, which would otherwise strip a file of its extended attributes. The documentation also suggests another use: to allow operating systems which do not support EAs to operate on the EA data itself. The separate 'hold' file containing the extended attributes can be operated on independently of the original file by any process, then rejoined to the original file. For example, the .SUBJECT EA could be altered by a DOS utility to maintain file comments and still remain fully compatible with OS/2 once the EA had been rejoined.

This approach, to my knowledge, is not used, probably because the procedure is so cumbersome. EAUTIL is primarily used to back up EAs, which is awkward enough in itself, since the EAUTIL command does not accept wildcard characters in its file specification parameters. Thus, EAUTIL is often called from a REXX program for mass file operations. In the REXX example accompanying this article, all EAs on a drive are backed up.
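Because EAUTIL takes one file at a time, a driver program has to expand the file list itself and issue one command per file. The following Python fragment is only a sketch of that loop, not the REXX example referred to above. It builds the command lines without running them. The `/S` (split) and `/P` (preserve the EAs on the original file) switches are given as I recall them and should be checked against `HELP EAUTIL`; the function name, the hold directory and the numbered hold-file naming scheme are hypothetical.

```python
def eautil_split_commands(paths, hold_dir="C:\\EABACKUP"):
    """Return one EAUTIL split command per file.

    Assumptions: /S splits the EAs into the hold file and /P leaves
    them attached to the original as well. The hold directory and the
    EAnnnnnn.EA naming scheme are invented for this sketch; the names
    are kept within 8.3 limits so they also work on FAT.
    """
    commands = []
    for index, path in enumerate(paths, start=1):
        hold_file = f"{hold_dir}\\EA{index:06d}.EA"
        commands.append(f'EAUTIL "{path}" "{hold_file}" /S /P')
    return commands


if __name__ == "__main__":
    for line in eautil_split_commands(["C:\\CONFIG.SYS", "C:\\OS2\\OS2.INI"]):
        print(line)
```

A real backup would also need to record which hold file belongs to which original, which is the purpose of the 'SOURCE' EA described in the next section.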

REXX

REXX provides two EA manipulation functions as part of its standard REXXUTIL utility functions. SysGetEA will read an EA for a specified file. SysPutEA will write an EA to a specified file. The latter is utilized in the REXX example. The REXX example writes the file specification to the new 'SOURCE' EA of the backup file; this EA may be retrieved and used by a REXX program to reattach the EA to the correct file, basically reversing the procedure. The example also demonstrates that you must know something about the structure of EAs to manipulate them with REXX: the SysPutEA function requires a hex string parameter, concatenated on the previous line into the 'subject' variable, to function properly.

The Structure Of EAs

The hex string represents the EA type. There are nine standard EA types, each identified to the operating system by a one-word (two-byte) identifier. The EA types may be grouped into two categories: Single-Valued, which contain only one data item per EA, and Multi-Valued, which may contain multiple items. In the following list, each type is shown by its C #define name, followed by its hex identifier.

Single-Valued:
  EAT_BINARY    FFFE  Binary data
  EAT_ASCII     FFFD  Text
  EAT_BITMAP    FFFB  Bitmap
  EAT_METAFILE  FFFA  Metafile
  EAT_ICON      FFF9  Icon
  EAT_EA        FFEE  Another associated EA
Multi-Valued:
  EAT_MVMT      FFDF  Multi-Valued, Multi-Typed
  EAT_MVST      FFDE  Multi-Valued, Single-Typed
  EAT_ASN1      FFDD  ASN.1 field data

A programmer may define his own EA type by simply specifying a value between 0x0000 and 0x7FFF for the hex identifier. Values above this range are reserved.
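The ranges can be expressed as a small lookup. This Python sketch is for illustration only and is not part of any OS/2 API. The function name is hypothetical, and the value 0xFFEE for EAT_EA is the one I recall from the OS/2 toolkit headers.

```python
# Standard EA type identifiers (16-bit words).
# 0xFFEE for EAT_EA is taken from memory of the toolkit headers.
EA_TYPES = {
    0xFFFE: "EAT_BINARY",
    0xFFFD: "EAT_ASCII",
    0xFFFB: "EAT_BITMAP",
    0xFFFA: "EAT_METAFILE",
    0xFFF9: "EAT_ICON",
    0xFFEE: "EAT_EA",
    0xFFDF: "EAT_MVMT",
    0xFFDE: "EAT_MVST",
    0xFFDD: "EAT_ASN1",
}


def classify_ea_type(identifier: int) -> str:
    """Name a standard type, or report user-defined / reserved."""
    if not 0 <= identifier <= 0xFFFF:
        raise ValueError("EA type identifier must fit in one 16-bit word")
    if identifier in EA_TYPES:
        return EA_TYPES[identifier]
    if identifier <= 0x7FFF:
        return "user-defined"   # 0x0000..0x7FFF is open to programmers
    return "reserved"           # 0x8000 and above, unless a standard type
```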

For all Single-Valued EAs, the first word following the hex type identifier indicates the length of the data which follows. For Multi-Valued EAs, the first word following the hex type identifier indicates the code page to be used. The code page is followed by a word specifying the number of entries in the EA.
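Reading those leading words back out of a raw EA value might look like the following Python sketch. It assumes the words are stored low byte first, as is usual on Intel hardware, and it treats only EAT_MVMT and EAT_MVST as multi-valued; the ASN.1 layout is left out because I am not certain of it. The function name is hypothetical.

```python
import struct

EAT_MVMT = 0xFFDF
EAT_MVST = 0xFFDE


def read_header(value: bytes) -> dict:
    """Decode the words at the start of a raw EA value.

    Assumes little-endian 16-bit words (an assumption of this sketch).
    """
    (ea_type,) = struct.unpack_from("<H", value, 0)
    if ea_type in (EAT_MVMT, EAT_MVST):
        # Multi-valued: type, code page, number of entries
        codepage, count = struct.unpack_from("<HH", value, 2)
        return {"type": ea_type, "codepage": codepage, "entries": count}
    # Single-valued: type, length of the data that follows
    (length,) = struct.unpack_from("<H", value, 2)
    return {"type": ea_type, "length": length}
```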

For Multi-Valued, Single-Typed EAs, the next word indicates the type of the entries contained in the EA. Although this sounds slightly confusing, it is actually quite simple. The first type specification denotes that the EA contains multiple values, or entries; the second type specification indicates the type of those entries. This second type specification is followed by a series of 'records', each consisting of two bytes indicating the length of the data, followed by the data itself.
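As an illustration of that layout, the following Python sketch assembles a Multi-Valued, Single-Typed value from a list of byte strings. The function name is hypothetical, words are assumed to be little-endian, and a code page of 0 is assumed to mean the system default.

```python
import struct

EAT_MVST = 0xFFDE
EAT_ASCII = 0xFFFD


def pack_mvst(entries, entry_type=EAT_ASCII, codepage=0):
    """Build an EAT_MVST value.

    Layout: EAT_MVST, code page, entry count, entry type,
    then one (length, data) record per entry.
    Code page 0 is assumed here to mean "use the default".
    """
    out = struct.pack("<HHHH", EAT_MVST, codepage, len(entries), entry_type)
    for data in entries:
        out += struct.pack("<H", len(data)) + data
    return out


if __name__ == "__main__":
    print(pack_mvst([b"ab", b"c"]).hex())
```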

Since Multi-Typed EAs contain, as their name implies, various EA types, each value or entry must define its type; thus the 'record' includes two bytes indicating the type, followed by the length and data as in the Single-Typed Multi-Valued EA.
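The Multi-Typed layout differs only in that each record carries its own type word. A minimal Python sketch of writing and reading such a value, under the same assumptions as before (little-endian words, hypothetical function names, code page 0 taken as the default):

```python
import struct

EAT_MVMT = 0xFFDF


def pack_mvmt(entries, codepage=0):
    """Build an EAT_MVMT value from (type, data) pairs."""
    out = struct.pack("<HHH", EAT_MVMT, codepage, len(entries))
    for ea_type, data in entries:
        # Each record: type word, length word, then the data
        out += struct.pack("<HH", ea_type, len(data)) + data
    return out


def unpack_mvmt(value):
    """Return the (type, data) pairs stored in an EAT_MVMT value."""
    ea_type, _codepage, count = struct.unpack_from("<HHH", value, 0)
    if ea_type != EAT_MVMT:
        raise ValueError("not an EAT_MVMT value")
    offset, entries = 6, []
    for _ in range(count):
        entry_type, length = struct.unpack_from("<HH", value, offset)
        offset += 4
        entries.append((entry_type, value[offset:offset + length]))
        offset += length
    return entries
```

Packing a text entry and a binary entry and then unpacking the result should return the same two pairs, which is a convenient check when experimenting with a layout before writing it to a real file.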

The SOURCE EA being written in the sample REXX program is an EAT_ASCII type extended attribute (FFFD). The next two concatenation elements in the string (d2c(length(files.i)) and '00'x) form the length, and the EA data follows.
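The same byte string can be sketched in Python to make the layout explicit. Note that in REXX, d2c() returns as many bytes as the number needs, so d2c(length) followed by '00'x forms a correct two-byte length only for data shorter than 256 bytes. Packing the length as a 16-bit little-endian word, as below, avoids that limit. The function name is hypothetical, and the code only builds the value; it does not write it to a file.

```python
import struct

EAT_ASCII = 0xFFFD


def pack_ascii(text: str) -> bytes:
    """Build an EAT_ASCII value: type word, length word, then the text."""
    data = text.encode("ascii")
    if len(data) > 0xFFFF:
        raise ValueError("data too long for a one-word length field")
    return struct.pack("<HH", EAT_ASCII, len(data)) + data


if __name__ == "__main__":
    # A 13-character path: FD FF, then 0D 00, then the text itself
    print(pack_ascii("C:\\CONFIG.SYS").hex())
```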

Viewing and Editing Extended Attributes

Since EAs contain binary data, they cannot be viewed or edited with an ordinary text editor. Utilities such as UniMaint or the EA editor in the GammaTech Power Pack can be used instead. When writing code to exploit EAs, an editor is essential, not only for debugging problems but also for experimenting with more effective data constructs. UniMaint also offers a host of features designed to make life simpler when programming with EAs, including splitting, rejoining and merging EAs. Not only are these facilities useful for the actual development work, they also provide a convenient method of protecting yourself from EA corruption during the development process. Remember, since the operating system relies so heavily on extended attributes, OS/2 is sensitive to EA corruption. Forewarned is forearmed; be sure to create a recovery strategy before experimenting with EAs. Backups and tools such as UniMaint are a solid foundation for such a strategy.

EA Storage

Two common problems with extended attributes have nothing to do with the EAs themselves, but rather with the file system. HPFS rarely has problems with EAs because they are stored in a file's F-node, or header. Most EA problems therefore occur on FAT volumes. The FAT file system uses a pointer in the directory structure (highlighted in the graphic) to an entry in the EA DATA. SF file, which is really a logical construct used to account for extended attributes in the file allocation table. Extended attributes are assigned sequential pointers at creation. If two files reference the same pointer, the EAs are cross-linked and trouble results. Even CHKDSK behaves erratically in this case: it has been known to simply delete the cross-linked files, but on other occasions it has simply ignored the pointer. The standard procedure for cross-linked EAs is to copy the affected files to another volume, delete the originals, then copy the files back to the initial volume. Unfortunately, this is a kludge, as the first file copied will take the EA with it. With a little patience and a sector editor you can often resolve this problem by correcting the entry in the directory structure.
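The cross-link condition itself is easy to state in code: two or more directory entries carrying the same EA pointer. The following Python sketch is a toy model only. It works on a dictionary of file names and pointer values that you would have to obtain some other way (for example, by hand with a sector editor); it does not read a FAT directory. The function name is hypothetical, and a pointer of 0 is assumed to mean that the file has no EAs.

```python
from collections import defaultdict


def find_cross_links(ea_pointers):
    """Return {pointer: [file names]} for pointers shared by several files.

    ea_pointers maps file name -> EA pointer taken from its directory
    entry. A pointer of 0 is assumed to mean "no EAs" and is skipped.
    """
    owners = defaultdict(list)
    for name, pointer in ea_pointers.items():
        if pointer:
            owners[pointer].append(name)
    return {p: sorted(names) for p, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    sample = {"A.TXT": 3, "B.TXT": 3, "C.TXT": 4, "D.TXT": 0}
    print(find_cross_links(sample))
```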

The other common error is also often a result of the inadequacies of the FAT file system in handling EAs. The 'Corrupted Extended Attributes' error message is most often caused by an allocation error in the EA DATA. SF file. Again, CHKDSK cannot be depended on to resolve this situation. The only sound solution is to restore from a reliable backup.

Summary

Don't let these problems discourage you from exploiting the potential of extended attributes in your development endeavors. Extended attributes are one facility that will distinguish your application as a distinctively OS/2 application, apart from the derogatory DOS/Windows paradigm. Knowing the obstacles to using EAs will help you avoid the pitfalls.

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation