OS/2 High Performance File System

From EDM2

By Les Bell

License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Author's Note: This article originally appeared in PC Support Advisor, some time in 1990. The tests described in it were performed on OS/2 1.21, an early release supporting HPFS (which first appeared in OS/2 1.2). However, the basic principles still apply. At some point, I'll find time to repeat the tests on later versions of OS/2 (Warp, Merlin). I've performed minimal editing to add some comments where updates were necessary.

During the development of OS/2 in 1986-87, it became obvious that the project was running late and that some components would have to be delayed if the operating system was to meet its release date of late 1987. The major component to be delayed was OS/2's file system: the first two versions of the operating system shipped with a protected-mode version of the DOS file system. Performance was less than spectacular, at the very best; no attempt had been made to optimize the system for multitasking operation, and the best that could be said for it was that it offered 100% compatibility with existing DOS systems.

It was always assumed that there was little point in tuning the FAT file system, since it would be replaced by OS/2's own system. This finally took place with OS/2 1.2, which introduced the OS/2 High Performance File System. HPFS is, in fact, implemented through OS/2's more general Installable File System (IFS) mechanism.

File System Principles

Installable File Systems

In developing OS/2's file systems, Microsoft became aware that they were shooting at a moving target. For example, they could build support for CD-ROM drives into the design of the operating system, but then new media such as smart-cards and optical drives would be difficult to support. Accordingly, the OS/2 kernel is designed to make as few assumptions as possible about the format of supported media. All format-specific information and code is contained in a special form of dynamic-link library called an Installable File System driver, which is loaded by a statement of the form

IFS = filename.IFS [options]

in the CONFIG.SYS file.

This will allow the operating system to support media of any format or type, including the native formats of other operating systems such as Mac System 6/7, UNIX or VMS, as well as different media such as very high-density floppy disks, CD-ROM and even tape.

HPFS

The first installable file system to ship for OS/2 is the High Performance File System, which is supplied as part of OS/2 1.2 and later. If HPFS is selected during the installation process, the installation program places the lines

IFS=C:\OS2\HPFS.IFS -C:512 /autocheck:e
RUN=C:\OS2\CACHE.EXE

or similar into the CONFIG.SYS file.

LAN Manager 2.1 and LAN Server 1.3 or later

The other major IFS currently shipping is LAN Manager 2.1, which installs as a file system. This makes sense, since all network resources appear in the file system namespace using the Universal Naming Convention (UNC), with names of the form \\server\share. A LAN Manager 2.1 workstation effectively 'mounts' remote file systems and devices in a similar fashion to UNIX.

Another benefit is that, since the workstation needs to know nothing about the format of the remote file system, it can equally easily mount an OS/2 server's file system or a UNIX or Mac file system.

CD-ROM

OS/2 2.1 ships with CDFS.IFS, an installable file system driver for High Sierra and ISO-9660 format CD-ROMs. With suitable device driver support, such as that provided by the SCSI device drivers, this can be used to attach IBM, Toshiba and other CD-ROM drives to the system. A matching VCDROM.SYS device driver provides access to the CD-ROM drive from DOS sessions.

HPFS Features

Higher Performance

As its name suggests, the High Performance File System does work faster than the old way of doing things, but it does require a little effort to get the best out of it. Simply accepting the OS/2 installation program's defaults gives a less than optimal result.

My initial investigation of HPFS performance involved writing a small benchmark program (Appendix B) to measure sequential write and random read performance. I wanted to compare the FAT and HPFS file systems, and to see how cache size affects the performance of each.

The initial results were somewhat disappointing. Table 1 shows the performance of the FAT file system, and Table 2 that of the HPFS.

Table 1. Effect of cache size on FAT file system performance (elapsed time per stage).
Cache size   DISKCACHE=256   DISKCACHE=512   DISKCACHE=1024
Stage 1      3.73            3.75            3.72
Stage 2      2.77            2.79            2.73
Total        6.50            6.54            6.45

Table 2. Effect of cache size on HPFS performance (elapsed time per stage).
Cache size   512K   1024K   2048K
Stage 1      5.79   5.84    5.82
Stage 2      1.14   1.12    1.08
Total        6.93   6.96    6.90

At first sight, this is not very encouraging - the old FAT file system appears to be outperforming the HPFS! However, in both cases, the HPFSTEST.EXE program was run with the file size set at 1,000 records, each 100 bytes in size, and therefore well below the cache size for both systems. What we are primarily seeing is the effect of the cache buffering on both systems. In addition, this is simply the initial default setup for HPFS, and there are some things we can do to improve performance.

The first of these is to enable lazy writes. Normally, when the operating system writes the contents of a buffer to the file system, the file system writes it to disk immediately, before returning to the calling routine in the OS. With lazy writes enabled, the file system returns to the OS immediately, and writes the data to disk when it gets a chance (or when a time limit expires, whichever comes first). Of course, if the system should crash in the meantime, or the power fail, data is lost. However, that is a fairly unlikely circumstance: OS/2 1.21 is remarkably stable (it has never crashed on me) and power failures in city locations are also pretty rare.

Lazy writes are enabled by adding the /LAZY:ON parameter to the CACHE.EXE line in the CONFIG.SYS file, which should come immediately after the IFS= line that loads HPFS.IFS:

RUN=C:\OS2\CACHE.EXE /LAZY:ON

If lazy writes are enabled, the system must be shut down using the Desktop Manager Shutdown command, which causes the cache program to flush its buffers. Alternatively, another program could be written which calls the DosShutdown function.

The picture changes quite dramatically when lazy writes are enabled:

Table 3. Effect of lazy writes on HPFS performance (elapsed time per stage).
Cache size   512K   1024K   2048K
Stage 1      0.43   0.45    0.42
Stage 2      1.23   1.24    1.24
Total        1.66   1.69    1.66

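The effect on the first stage of the benchmark, the file creation and writing, is dramatic. Its time falls from around 5.8 to around 0.43, better than a tenfold improvement, while the random-read second stage is slightly slower.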

The next step was to investigate the effects on a larger file (something not possible on the initial test machine due to a severe shortage of disk space). My network server, however, had plenty of free space on its all-HPFS drive, so the benchmark was run again, this time writing 100,000 records (a file of roughly 10 MB). The results were as follows:

Table 4. Effect of cache size on HPFS386 performance (HPFSTEST.EXE, 100,000 records; elapsed time per stage).
Cache size   256K      512K      1024K     4096K
Stage 1      51.72     48.67     48.25     43.73
Stage 2      1980.79   1515.32   1031.48   139.62
Total        2032.51   1563.99   1079.73   183.35

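The most noticeable result here is the dramatic improvement in performance as the cache grows into the megabytes. Going from a 1024K to a 4096K cache cut Stage 2 from 1031.48 to 139.62, more than a sevenfold improvement. The effect of the cache is further shown by reports from the cache program itself: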

With a 512K cache:
Cache Statistics

Read Requests:               594677     Disk Reads:        436960
Cache Hit Rate (Reads):          26%    Cache Reads:       157717

Write Requests:                6196     Disk Writes:          116
Cache Hit Rate (Writes):         98%    Lazy Writes:         6080

Hot Fixes:           0

With a 4096K cache:
Cache Statistics

Read Requests:               576801     Disk Reads:         19604
Cache Hit Rate (Reads):          96%    Cache Reads:       557197

Write Requests:               10887     Disk Writes:            0
Cache Hit Rate (Writes):        100%    Lazy Writes:        10887

Hot Fixes:           0

These reports indicate that the cache hit rate improves dramatically as the cache grows, and my own observations confirm this. With small cache sizes, the system was driving the disk really hard, while on the last test it hardly referred to it at all, and for a couple of minutes I was wondering whether the program had crashed!

My next action after this was to order up another 4 MB of memory for my server: you very rarely see such a terrific performance improvement for so little money.

However, please note that the major benefit of HPFS386 is the way that it directly couples to NETBIOS on servers, using its own internal SMB server, and thereby bypasses the OS/2 kernel and the accompanying ring transitions.

The Super-FAT file system code introduced in OS/2 2.0 uses some of the same caching and lazy write techniques introduced in HPFS while preserving the existing FAT file system format and consequently compatibility with DOS. It achieves a significant performance improvement over FAT, but does not catch up to HPFS on any but the smallest drives. In other words, for typical drives HPFS is still faster, especially when there are lots of files in a directory.

Major conclusions:

  • Don't assume that the installation program default allocations of cache memory are optimum. They are very much a compromise between main memory for programs (to minimise swapping) and disk performance. On most systems, some increase in cache size could be justified.
  • Increases in cache size do not produce much improvement initially, but once HPFS has a couple of MB to chew on, it runs like a scalded cat.
  • Turn lazy writes on!

Long File Names

HPFS supports filenames up to 254 characters in length, including spaces and other punctuation symbols. This allows the use of meaningful filenames, such as "Sales Figures for First Quarter of '91", rather than SLS1Q91.WK3. Because DOS 8.3 filenames are a subset of the HPFS convention, DOS applications can access files on an HPFS drive, either from the DOS compatibility box of an OS/2 workstation, or from a DOS workstation to an OS/2 LAN Manager server.

Filenames can comprise multiple components separated by periods (.); unless explicitly defined by a file system, there is no limit on the number of components in a filename. Pathname components are separated by slashes or backslashes (/ or \) as for UNIX and DOS. Foreign character sets are supported: any character in the current codepage is allowed, including characters with values above 127 (0x7F), although a program may need to switch codepages to access files with such names.

Of course, DOS applications accessing HPFS drives, either locally (from the DOS compatibility box of an OS/2 machine) or across a network (a DOS workstation accessing a LAN Manager 2.1 server), cannot access files or subdirectories whose names do not fit the DOS 8.3 convention. Some care should therefore be taken when creating subdirectory trees from an OS/2 session. This can also be an advantage: I have a subdirectory on my server containing DOS virus code for examination, but since the directory is called "Virus Stuff", DOS workstations and sessions cannot be infected by anything in it!

Case Preservation and Insensitivity

File names can contain both upper and lower case characters. Case is preserved in directory entries but, unlike UNIX, case is not significant in file searches and accesses.

The use of lower case characters can make directory listings much easier to read. An interesting side benefit of HPFS's use of B+ trees for its directory structures is that directory listings are always in alphabetic sequence.

Extended Attributes

DOS supports six single-bit attributes, four of which (Archive, System, Hidden and Read-only) are settable by applications and users. DOS knows nothing about the different types of files, other than that .COM and .EXE files are executable code and .BAT files are also runnable.

HPFS (and other IFS's), on the other hand, supports Extended Attributes, which can be thought of as a list of facts attached to a file or directory. Each extended attribute consists of two parts: a name and a value. The name is always a null-terminated string, while the value can be text, a bitmap (such as an icon) or any binary data. A file can have any number of extended attributes, each of which can be up to 64 Kbytes long; however, under OS/2 1.2 and 1.3, all EA's for a file must total less than 64K.

Standard extended attributes include

  • .TYPE (Plain Text, Executable, Metafile, Bitmap, Icon, Dynamic Link Library, C Code, Pascal Code, Microsoft Excel Chart and others)
  • .KEYPHRASES
  • .SUBJECT
  • .COMMENTS
  • .HISTORY
  • .VERSION (version number of file format)
  • .ICON
  • .ASSOCTABLE (list of filetypes, extensions and icons for data files used by an application)
  • .LONGNAME (long file name where the file system does not support a long name).

The extended attributes are stored in the file system, not in the files themselves, and some applications may be unaware of the existence of attributes. Under FAT file systems, the extended attributes are stored in a hidden file in the root directory called EA DATA. SF. Do not tamper with or delete this file.

Possible applications for extended attributes include searching for files by date, topic, keyword, or author and generally doing the work of typical DOS shells, as well as storing network access histories (date/time of each access along with user and workstation ID's).

Of course, LAN Manager 2.1's HPFS386 stores security permissions on files in their extended attributes, thereby providing a much higher level of security than FAT file systems can. In addition, HPFS386 implements local security, requiring even a user sitting at the server to have an account on that machine and to log on before doing anything. This closes a major security hole for LAN Manager.

While the FAT file system in OS/2 2.0 and later also provides support for EA's, it does so in a fashion that is somewhat fragile. It stores EA's in clusters pointed to by unused bytes in the file's directory entry, and then covers this up, so that DOS CHKDSK will work normally, by placing these clusters in a hidden file called EA DATA. SF. Use of DOS defragmentation utilities, directory sort utilities and file system repair utilities can easily 'break' this mechanism. In addition, DOS backup and restore utilities will not deal with either the EA's themselves or the correct restoration of this file, and should not be used. For all these reasons, use of EA's on the FAT file system (and that includes any use of OS/2, due to the workplace shell's use of EA's) is a risky business. HPFS is far more reliable.

Internal Operation

HPFS provides significantly better performance than the FAT file system through several design features.

First, the FAT system falls down in a number of areas: its root directory and file allocation table are located on the outermost cylinders of a drive while the most frequently accessed files are generally located near the innermost cylinders, forcing massive - and slow - head movements. Almost any file access forces the heads to seek back out to the FAT and then back in to the file. Its directories are simply tables which must be searched linearly - a slow process, even when assisted by the FASTOPEN command.

By contrast, HPFS splits the disk surface into bands, each 16 MB in size. At the center of the band is its directory and bitmap area, with the file area extending for 8 MB either side. This means that directory and file allocation information is located near the files it controls, and not at the opposite end of the disk. In addition, the directories are B+ trees (think of them as being like dBASE .NDX files), which means that HPFS can locate directories very much faster than the FAT file system (this is also why directories always list alphabetically under HPFS).

The use of caching, of course, further enhances performance, as does the use of multiple threads to write asynchronously within the file system.

The FAT file system allocates space in terms of clusters, whose size depends on the drive size and is typically 2K or more on hard disks. A drive which contains lots of small files, for example an email file server, will therefore waste a lot of space at the ends of partly filled clusters. HPFS, on the other hand, allocates space in individual 512-byte sectors, in order to minimise wasted space.

Local and Remote File Systems

Installable file systems provide the means by which OS/2 is able to access remote file systems, since the DosFSAttach() function call allows programs to mount remote file systems and access them through a locally assigned drive letter.

Supported Function Calls

OS/2 1.2 adds a number of new functions for access to installable file systems. These include:

  • DosCopy - Copies a file or subdirectory
  • DosEditName - Transforms a source filename string using an editing string
  • DosFileIO - Performs file I/O (lock, unlock, seek, read and write)
  • DosFindFirst2 - Finds the first file that matches a specified file name and attributes
  • DosFSAttach - Attaches or detaches a drive or pseudo character device from a remote file system
  • DosFSCtl - Calls non-standard file system functions
  • DosGetResource2 - Retrieves a resource for a module
  • DosMkDir2 - Creates a directory
  • DosOpen2 - Opens or creates a file with extended attributes
  • DosQFSAttach - Queries information about an attached file system
  • DosSetPathInfo - Sets information for a file or directory
  • DosShutdown - Shuts down the file system

Command-Line Operation

Use of quotes

Because filenames can now comprise multiple words, a difficulty arises, in that OS/2's command line interpreter will, by default, interpret each word as a separate filename. For example, the command

RD Data Subdirectory

will attempt to remove the two subdirectories Data and Subdirectory. In order to have CMD.EXE treat a multi-word filename properly, the filename should be enclosed in quotes:

RD "Data Subdirectory"

You can even give a command like

md "This is a very long subdirectory name, much longer than you could create under DOS"

and OS/2 will process it correctly. Follow this with Up-Arrow, Home, c, Enter (using OS/2's command-line recall and editing) and you will have changed to this new directory - the command prompt becomes quite spectacular.

Because many characters have special meanings to the command line processor, you can also 'quote' single characters or 'escape' their special meanings by prefixing them with a caret (^) symbol. Thus, the RD example above could have been typed as

RD Data^ Subdirectory

Similarly, you could create a file with parentheses in its name with a command like

copy con ^(Parenthetical.File^)

Wildcard Operation

HPFS (and IFS's generally) supports a more sophisticated variant of the wildcard facilities found in DOS. As before, a question mark (?) matches any character except a period (.). The asterisk (*) matches any sequence of characters, including blanks.

However, under HPFS, multiple asterisks can be used in any component of a filename (though not a path). For example, the command

 DIR *Forecast*

will locate all files which have the word Forecast somewhere in their names. Take this subdirectory, for example:

[E:\HPFS Test Subdirectory]dir
 
 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory
 
31-12-90   8:04                      0 . 
31-12-90   8:04                      0 .. 
 2-01-91  10:25          0           0  +test.cap
 2-01-91  10:17         57           0  Acme Project.Cash Flow Forecast.Final
31-12-90  10:23         20           0  file.1.dat
31-12-90  11:18         19           0  Forecast for 1992
31-12-90  11:17         29           0  Quarterly Forecast for 1991
31-12-90   8:07         22           0  Test Copy 1
31-12-90   8:08         21           0  Test File
31-12-90   8:05         21           0  Test File 1
31-12-90   8:05         21           0  Test File 2
        11 File(s)   406528 bytes free

[E:\HPFS Test Subdirectory]dir  *Forecast*
            
 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory
            
 2-01-91  10:17         57           0  Acme Project.Cash Flow Forecast.Final
31-12-90  11:18         19           0  Forecast for 1992
31-12-90  11:17         29           0  Quarterly Forecast for 1991
        3 File(s)   406528 bytes free
   
[E:\HPFS Test Subdirectory]dir *.*Forecast*.*
            
 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory
            
 2-01-91  10:17         57           0  Acme Project.Cash Flow Forecast.Final
         1 File(s)   406528 bytes free
[E:\HPFS Test Subdirectory]dir *.Final
            
 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory
            
 2-01-91  10:17         57           0  Acme Project.Cash Flow Forecast.Final
         1 File(s)   406528 bytes free

Use of Extended Attributes

Opening the Settings Notebook view of an object leads to a dialog window which allows display and editing of extended attributes. This displays the file name and path, as well as the Subject and Icon EA's. Next, the file dates are displayed (Created, Last Modification and Last Access), along with the standard file attributes (archive, hidden and read-only). One can change the icon to any icon file on the system, including overriding the default icon if required. Next comes the Default Type field; again, one can add additional types and change the default (note that a file can have multiple types). Pressing the '>' button gives access to fields for Comments, Key Phrases and History.

File Searching

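With a little planning (for example, consistently recording Subject, Key Phrases and Comments attributes), and appropriate support in OS/2 applications for searching on them, one could totally eliminate any need for Magellan and similar file viewing/management utilities.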

The Future

HPFS has rapidly become the format of choice for drives on LAN Manager servers and OS/2 workstations generally. OS/2 2.X continues to support HPFS, as well as other file systems.

The Workplace OS, due for introduction in 1994, will support HPFS, as well as the Journaling File System found on AIX systems.

Summary

The High Performance File System introduced in OS/2 1.2 offers something for everyone:

  • Higher performance - we can all use that!
  • Greater storage efficiency - important on file servers.
  • Meaningful, long file names - useful in multi-user environments or for people with short memories.
  • Extended attributes - great for getting organised and seamlessly integrating applications.
  • HPFS386 - even higher performance and security for network systems.

Appendices

Appendix A: Test System Configurations

For Tables 1, 2 & 3:

Compaq Portable 386, 20 MHz clock, 10 MB memory, Conner Peripherals 110 MB drive, OS/2 1.21 with LAN Manager 2.1 workstation service running

For Table 4:

33 MHz 386, 8 MB memory, Conner Peripherals 209 MB drive, OS/2 1.21 with LAN Manager 2.0 HPFS386 installed with local security, server and workstation services shut down.

Appendix B: HPFSTEST.C source code

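/* HPFS Benchmark Test
   Stage 1: write NUMRECS fixed-size records sequentially.
   Stage 2: read NUMRECS records from random positions. */

#include <os2.h>
#include <stdio.h>
#include <stdlib.h>     /* rand(), srand(), exit() */
#include <conio.h>      /* getch(), putch() */
#include <time.h>       /* time() */

/* Macro to get a random integer within a specified range.
   rand() never returns more than RAND_MAX (32767 with many compilers),
   so if NUMRECS exceeds RAND_MAX + 1, only the first RAND_MAX + 1
   records can ever be selected. */
#define getrandom( min, max ) ((rand() % (long)(((max)+1) - (min))) + (min))

#define NUMRECS 1000L   /* 100000L for the Table 4 runs */

int main(void)
{
    FILE *datafile;
    long i, fileptr;

    /* 100-byte record; a short key keeps the size the same whether
       int is 16 or 32 bits. */
    struct {
        short key;
        char buffer[98];
    } datarec;

    printf("Processing %ld records\n", NUMRECS);
    puts("Strike any key to start");
    getch();

    /* Binary mode: text-mode newline translation would change the
       record sizes and invalidate the computed seek offsets. */
    if ((datafile = fopen("TEST.DAT", "w+b")) == NULL) {
        fprintf(stderr, "\nError opening test datafile\n");
        exit(1);
    }

    /* Stage 1: sequential write (buffer contents are irrelevant) */
    for (i = 0; i < NUMRECS; i++) {
        datarec.key = (short)i;         /* filler value only */
        fwrite(&datarec, sizeof(datarec), 1, datafile);
/*      putch('.'); */
    }
    printf("\a\nStage 1 - complete\n");

    /* Seed the random number generator with current time. */
    srand((unsigned) time(NULL));

    /* Stage 2: randomly read NUMRECS records */
    for (i = 0; i < NUMRECS; i++) {
        fileptr = (long)sizeof(datarec) * getrandom(0L, NUMRECS - 1);
        if (fseek(datafile, fileptr, SEEK_SET))
            putch('!');                 /* seek failed */
/*      else
            putch('*'); */
        fread(&datarec, sizeof(datarec), 1, datafile);
    }
    printf("\a\nStage 2 - complete\n");

    fclose(datafile);
    return 0;
}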