OS/2 High Performance File System
By Les Bell
License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Author's Note: This article originally appeared in PC Support Advisor, some time in 1990. The tests described in it were performed on OS/2 1.21, the first version to support HPFS. However, the basic principles still apply. Some time, I'll find time to repeat the tests on later versions of OS/2 (Warp, Merlin). I've performed minimal editing to add some comments where updates were necessary.
During the development of OS/2 in 1986-87, it became obvious that the project was running late and that some components would have to be delayed if the operating system was to meet its release date of late 1987. The major component to be delayed was OS/2's file system; the first two versions of the operating system shipped with a protected-mode version of the DOS file system. Performance was less than spectacular, at the very best; no attempt had been made to optimize the system for multitasking operation, and the best that could be said for it was that it offered 100% compatibility with existing DOS systems.
It was always understood that there was little point in tuning the FAT file system, since it would be replaced by OS/2's own system. This finally took place with OS/2 1.2, which introduced the OS/2 High Performance File System. In fact, HPFS is one implementation of OS/2's more general Installable File System (IFS) mechanism.
File System Principles
Installable File Systems
In developing OS/2's file systems, Microsoft became aware that they were shooting at a moving target. For example, they could build support for CD-ROM drives into the design of the operating system, but then new media such as smart-cards and optical drives would be difficult to support. Accordingly, the OS/2 kernel is designed to make as few assumptions about the format of supported media as possible. All format-specific information and code is encoded in a special form of dynamic-link library called an Installable File System driver, and this is invoked with the statement
IFS = filename.IFS [options]
in the CONFIG.SYS file.
This will allow the operating system to support media of any format or type, including the native formats of other operating systems such as Mac System 6/7, UNIX or VMS, as well as different media such as very high-density floppy disks, CD-ROM and even tape.
HPFS
The first installable file system to ship for OS/2 is the High Performance File System, which is built into the operating system in OS/2 1.2 and later. If HPFS is selected during the installation process, then the installation program places the lines
IFS=C:\OS2\HPFS.IFS -C:512 /autocheck:e
RUN=C:\OS2\CACHE.EXE
or similar into the CONFIG.SYS file.
LAN Manager 2.1 and LAN Server 1.3 or later
The other major IFS currently shipping is LAN Manager 2.1, which installs as a file system. This makes sense, since all network resources appear in the file system namespace using Uniform Naming Convention. A LAN Manager 2.1 workstation effectively 'mounts' remote file systems and devices in a similar fashion to UNIX.
Another benefit is that, since the workstation needs to know nothing about the format of the remote file system, it can just as easily mount an OS/2 server's file system as a UNIX or Mac file system.
CD-ROM
OS/2 2.1 ships with CDFS.IFS, an installable file system driver for High Sierra and ISO-9660 format CD-ROMs. With suitable device driver support, such as that provided by the SCSI device drivers, this can be used to attach IBM, Toshiba and other CD-ROM drives to the system. A matching VCDROM.SYS device driver provides access to the CD-ROM drive from DOS sessions.
HPFS Features
Higher Performance
As its name suggests, the High Performance File System does work faster than the old way of doing things, but it does require a little effort to get the best out of it. Simply accepting the OS/2 installation program's defaults gives a less than optimal result.
My initial investigation of HPFS performance required the writing of a small benchmark program (Appendix B) to investigate the sequential writing performance and the random read performance of HPFS. I wanted to investigate the relative performance of FAT and HPFS file systems and the effect of cache size on performance of both.
The initial results were somewhat disappointing. Table 1 shows the performance of the FAT file system, and Table 2 that of the HPFS.
Table 1: FAT file system
Cache Size | DISKCACHE=256 | DISKCACHE=512 | DISKCACHE=1024 |
---|---|---|---|
Stage 1 | 3.73 | 3.75 | 3.72 |
Stage 2 | 2.77 | 2.79 | 2.73 |
Total | 6.50 | 6.54 | 6.45 |

Table 2: HPFS
Cache Size | 512K | 1024K | 2048K |
---|---|---|---|
Stage 1 | 5.79 | 5.84 | 5.82 |
Stage 2 | 1.14 | 1.12 | 1.08 |
Total | 6.93 | 6.96 | 6.90 |
At first sight, this is not very encouraging - the old FAT file system appears to be outperforming the HPFS! However, in both cases, the HPFSTEST.EXE program was run with the file size set at 1,000 records, each 100 bytes in size, and therefore well below the cache size for both systems. What we are primarily seeing is the effect of the cache buffering on both systems. In addition, this is simply the initial default setup for HPFS, and there are some things we can do to improve performance.
The first of these is to enable lazy writes. Normally, when the operating system writes the contents of a buffer to the file system, the FS writes it to disk immediately before returning to the calling routine in the OS. With lazy writes enabled, the FS returns to the OS immediately, and will write the data to disk when it gets a chance (or when a time limit expires, whichever comes first). Of course, if the system should crash in the meantime, or the power fail, data is lost. However, that is a fairly unlikely circumstance: OS/2 1.21 is remarkably stable (it has never crashed on me) and power failures in city locations are also pretty rare.
Lazy writes are enabled by adding the line
RUN=C:\OS2\CACHE.EXE /LAZY:ON
to the CONFIG.SYS file, immediately after the IFS=HPFS.IFS line.
If lazy writes are enabled, the system must be shut down using the Desktop Manager Shutdown command, which causes the cache program to flush its buffers. Alternatively, another program could be written which calls the DosShutdown function.
The picture changes quite dramatically when lazy writes are enabled:
Table 3: HPFS with lazy writes enabled
Cache Size | 512K | 1024K | 2048K |
---|---|---|---|
Stage 1 | 0.43 | 0.45 | 0.42 |
Stage 2 | 1.23 | 1.24 | 1.24 |
Total | 1.66 | 1.69 | 1.66 |
The effect on the first stage of the benchmark, the file creation and writing, is quite noticeable.
The next step was to investigate the effects on a larger file (something not possible on the initial test machine due to a severe shortage of disk space). My network server, however, had plenty of free space on its all-HPFS drive, so the benchmark was run again, this time writing 100,000 records. The results were as follows:
Table 4: HPFS (HPFSTEST.EXE, 100,000 records)
Cache Size | 256K | 512K | 1024K | 4096K |
---|---|---|---|---|
Stage 1 | 51.72 | 48.67 | 48.25 | 43.73 |
Stage 2 | 1980.79 | 1515.32 | 1031.48 | 139.62 |
Total | 2032.51 | 1563.99 | 1079.73 | 183.35 |
The most noticeable result here is the dramatic improvement in performance once the cache size gets above 2 MB. The effect of the cache is further shown by reports from the cache program itself:
With a 512K cache:
Cache Statistics
Read Requests: 594677
Disk Reads: 436960
Cache Hit Rate (Reads): 26%
Cache Reads: 157717
Write Requests: 6196
Disk Writes: 116
Cache Hit Rate (Writes): 98%
Lazy Writes: 6080
Hot Fixes: 0

With a 4096K cache:
Cache Statistics
Read Requests: 576801
Disk Reads: 19604
Cache Hit Rate (Reads): 96%
Cache Reads: 557197
Write Requests: 10887
Disk Writes: 0
Cache Hit Rate (Writes): 100%
Lazy Writes: 10887
Hot Fixes: 0
These reports indicate that the hit rate on the cache dramatically improves as it increases, and my own observations confirm this: with small cache sizes, the system was driving the disk really hard, while on the last test, it hardly referred to it at all and I was wondering, for a couple of minutes, whether the program might not have crashed!
My next action after this was to order up another 4 MB of memory for my server: you very rarely see such a terrific performance improvement for so little money.
However, please note that the major benefit of HPFS386 is the way that it directly couples to NETBIOS on servers, using its own internal SMB server, and thereby bypasses the OS/2 kernel and the accompanying ring transitions.
The Super-FAT file system code introduced in OS/2 2.0 uses some of the same caching and lazy write techniques introduced in HPFS while preserving the existing FAT file system format and consequently compatibility with DOS. It achieves a significant performance improvement over FAT, but does not catch up to HPFS on any but the smallest drives. In other words, for typical drives HPFS is still faster, especially when there are lots of files in a directory.
Major conclusions:
- Don't assume that the installation program default allocations of cache memory are optimum. They are very much a compromise between main memory for programs (to minimise swapping) and disk performance. On most systems, some increase in cache size could be justified.
- Increases in cache size do not produce much improvement initially, but once HPFS has a couple of MB to chew on, it runs like a scalded cat.
- Turn lazy writes on!
Long File Names
HPFS supports filenames up to 254 characters in length, including spaces and other punctuation symbols. This allows the use of meaningful filenames, such as "Sales Figures for First Quarter of '91.", rather than SLS1Q91.WK3. Because DOS 8.3 filenames are a subset of the HPFS convention, DOS applications can access files on an HPFS drive, either from the DOS compatibility box of an OS/2 workstation, or from a DOS workstation to an OS/2 LAN Manager server.
Filenames can comprise multiple components separated by periods (.); unless explicitly defined by a file system, there is no limit on the number of components in a filename. Pathname components are separated by slashes or backslashes (/ or \) as for UNIX and DOS. Foreign character sets are supported: any character in the current codepage is allowed, including characters with values above 0x7f (127), although a program may need to switch codepages to access files with such names.
Of course, DOS applications accessing HPFS drives, either locally (from the DOS compatibility box of an OS/2 machine) or across a network (DOS workstation accessing LAN Manager 2.1 server), cannot access files or subdirectories which have long names. Some care should therefore be taken in creating subdirectory trees from an OS/2 session. This can also be an advantage: I have a subdirectory on my server containing DOS virus code for examination, but since the directory is called "Virus Stuff", DOS workstations and sessions cannot be infected by anything in it!
Case Preservation and Insensitivity
File names can contain both upper and lower case characters. Case is preserved in directory entries, but case matching is not required in file searches and accesses, unlike UNIX.
The use of lower case characters can make directory listings much easier to read. An interesting side benefit of HPFS's use of B+ trees for its directory structures is that directory listings are always in alphabetic sequence.
Extended Attributes
DOS supports six single-bit attributes, four of which (Archive, System, Hidden and Read-only) are settable by applications and users. DOS knows nothing about the different types of files, other than that .COM and .EXE files are executable code and .BAT files are also runnable.
HPFS (and other IFS's), on the other hand, supports Extended Attributes, which can be thought of as a list of facts attached to a file or directory. Each extended attribute consists of two parts: a name and a value. The name is always a null terminated string, while the value can be text, a bitmap (such as an icon) or any binary data. A file can have any number of extended attributes, each of which can be up to 64 Kbytes long; however, under OS/2 1.2 and 1.3, all EA's for a file must total less than 64K.
Standard extended attributes include
- .TYPE (Plain Text, Executable, Metafile, Bitmap, Icon, Dynamic Link Library, C Code, Pascal Code, Microsoft Excel Chart and others)
- .KEYPHRASES
- .SUBJECT
- .COMMENTS
- .HISTORY
- .VERSION (version number of file format)
- .ICON
- .ASSOCTABLE (list of filetypes, extensions and icons for data files used by an application)
- .LONGNAME (long file name where the file system does not support a long name).
The extended attributes are stored in the file system, not in the files themselves, and some applications may be unaware of the existence of attributes. Under FAT file systems, the extended attributes are stored in a hidden file in the root directory called EA DATA. SF. Do not tamper with or delete this file.
Possible applications for extended attributes include searching for files by date, topic, keyword, or author and generally doing the work of typical DOS shells, as well as storing network access histories (date/time of each access along with user and workstation ID's).
Of course, LAN Manager 2.1's HPFS386 stores security permissions on files in their extended attributes, thereby providing a much higher level of security than FAT file systems can. In addition, HPFS386 implements local security, requiring even a user sitting at the server to have an account on that machine and to log on before doing anything. This closes a major security hole for LAN Manager.
While the FAT file system in OS/2 2.0 and later also provides support for EA's, it does so in a fashion that is somewhat fragile: storing EA's in clusters pointed to by unused bytes in the file's directory entry, and then covering this up so that DOS CHKDSK will work normally by placing these clusters in a hidden file called EA DATA. SF. Use of DOS defragmentation utilities, directory sort utilities and file system repair utilities can easily 'break' this mechanism. In addition, DOS backup and restore utilities will not deal either with the EA's themselves or the correct restoration of this file, and should not be used. For all these reasons, use of EA's on the FAT file system (and that includes any use of OS/2, due to the workplace shell's use of EA's) is a risky business. HPFS is far more reliable.
Internal Operation
HPFS provides significantly better performance than the FAT file system through several design features.
First, the FAT system falls down in a number of areas: its root directory and file allocation table are located on the outermost cylinders of a drive while the most frequently accessed files are generally located near the innermost cylinders, forcing massive - and slow - head movements. Almost any file access forces the heads to seek back out to the FAT and then back in to the file. Its directories are simply tables which must be searched linearly - a slow process, even when assisted by the FASTOPEN command.
By contrast, HPFS splits the disk surface into bands, each 16 MB in size. At the center of the band is its directory and bitmap area, with the file area extending for 8 MB either side. This means that directory and file allocation information is located near the files it controls, and not at the opposite end of the disk. In addition, the directories are B+ trees (think of them as being like dBASE .NDX files), which means that HPFS can locate directories very much faster than the FAT file system (this is also why directories always list alphabetically under HPFS).
The use of caching, of course, further enhances performance, as does the use of multiple threads to write asynchronously within the file system.
The FAT file system allocates space in terms of clusters, the size of which varies according to the drive size, but which is always at least 2K. A drive which contains lots of small files, for example an email file server, will have a lot of wasted space. HPFS, on the other hand, allocates sectors, in order to minimise wasted space.
Local and Remote File Systems
Installable file systems provide the means by which OS/2 is able to access remote file systems, since the DosFSAttach() function call allows programs to mount remote file systems and access them through a locally assigned drive letter.
Supported Function Calls
OS/2 1.2(1) adds a number of new functions for access to installable file systems. These include:
- DosCopy: Copies a file or subdirectory
- DosEditName: Transforms a source filename string using an editing string
- DosFileIO: Performs file I/O (lock, unlock, seek, read and write)
- DosFindFirst2: Finds the first file that matches a specified file name and attributes
- DosFSAttach: Attaches or detaches a drive or pseudo character device from a remote file system
- DosFSCtl: Calls non-standard file system functions
- DosGetResource2: Retrieves a resource for a module
- DosMkDir2: Creates a directory
- DosOpen2: Opens or creates a file with extended attributes
- DosQFSAttach: Queries information about an attached file system
- DosSetPathInfo: Sets information for a file or directory
- DosShutdown: Shuts down the file system
Command-Line Operation
Use of quotes
Because filenames can now comprise multiple words, a difficulty arises, in that OS/2's command line interpreter will, by default, interpret each word as a separate filename. For example, the command
RD Data Subdirectory
will attempt to remove the two subdirectories Data and Subdirectory. In order to have CMD.EXE treat a multi-word filename properly, the filename should be enclosed in quotes:
RD "Data Subdirectory"
You can even give a command like
md "This is a very long subdirectory name, much longer than you could create under DOS"
and OS/2 will process it correctly. Follow this with Up-Arrow, Home, c, Enter (using OS/2's command-line recall and editing) and you will have changed to this new directory - the command prompt becomes quite spectacular.
Because many characters have special meanings to the command line processor, you can also 'quote' single characters or 'escape' their special meanings by prefixing them with a caret (^) symbol. Thus, the RD example above could have been typed as
RD Data^ Subdirectory
Alternatively, you could create a file with parentheses in its name with a command like
copy con ^(Parenthetical.File^)
Wildcard Operation
HPFS (and IFS's generally) support a more sophisticated variant of the wildcard facilities found in DOS. As before, a question mark (?) matches any character except a period (.). The asterisk (*) matches any sequence of characters, including blanks.
However, under HPFS, multiple asterisks can be used in any component of a filename (though not a path). For example, the command
DIR *Forecast*
will locate all files which have the word Forecast in them somewhere. Take this subdirectory, for example:
[E:\HPFS Test Subdirectory]dir

 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory

31-12-90   8:04   <DIR>     0  .
31-12-90   8:04   <DIR>     0  ..
 2-01-91  10:25       0     0  +test.cap
 2-01-91  10:17      57     0  Acme Project.Cash Flow Forecast.Final
31-12-90  10:23      20     0  file.1.dat
31-12-90  11:18      19     0  Forecast for 1992
31-12-90  11:17      29     0  Quarterly Forecast for 1991
31-12-90   8:07      22     0  Test Copy 1
31-12-90   8:08      21     0  Test File
31-12-90   8:05      21     0  Test File 1
31-12-90   8:05      21     0  Test File 2
       11 File(s)    406528 bytes free

[E:\HPFS Test Subdirectory]dir *Forecast*

 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory

 2-01-91  10:17      57     0  Acme Project.Cash Flow Forecast.Final
31-12-90  11:18      19     0  Forecast for 1992
31-12-90  11:17      29     0  Quarterly Forecast for 1991
        3 File(s)    406528 bytes free

[E:\HPFS Test Subdirectory]dir *.*Forecast*.*

 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory

 2-01-91  10:17      57     0  Acme Project.Cash Flow Forecast.Final
        1 File(s)    406528 bytes free

[E:\HPFS Test Subdirectory]dir *.Final

 The volume label in drive E is OS2 APPS.
 The Volume Serial Number is 259D:6C15
 Directory of E:\HPFS Test Subdirectory

 2-01-91  10:17      57     0  Acme Project.Cash Flow Forecast.Final
        1 File(s)    406528 bytes free
Use of Extended Attributes
By selecting the Settings Notebook view of an object, one is led to a dialog window which allows display and editing of extended attributes. This displays the file name and path, as well as the Subject and Icon EA's. Next, the file dates are displayed: Created, Last Modification and Last Access, along with the standard file attributes (archive, hidden and read-only). One can change the icon to any icon file on the system, including over-riding the default icon type if required. Next comes the Default Type field; again one can add additional types and change the default (notice, a file can have multiple types). Pressing on the '>' button gives access to fields for comments, Key Phrases and History.
File Searching
With a little planning, and appropriate support in OS/2 applications, one could totally eliminate any need for Magellan and similar file viewing/management utilities.
The Future
HPFS has rapidly become the format of choice for drives on LAN Manager servers and OS/2 workstations generally. OS/2 2.X continues to support HPFS, as well as other file systems.
The Workplace OS, due for introduction in 1994, will support HPFS, as well as the Journaling File System found on AIX systems.
Summary
The High Performance File System introduced in OS/2 1.2 offers something for everyone:
- Higher performance - we can all use that!
- Greater storage efficiency - important on file servers.
- Meaningful, long file names - useful in multi-user environments or for people with short memories.
- Extended attributes - great for getting organised and seamlessly integrating applications.
- HPFS386 - even higher performance and security for network systems.
Appendices
Appendix A: Test System Configurations
For Tables 1, 2 & 3:
Compaq Portable 386, 20 MHz clock, 10 MB memory, Conner Peripherals 110 MB drive, OS/2 1.21 with LAN Manager 2.1 workstation service running
For Table 4:
33 MHz 386, 8 MB memory, Conner Peripherals 209 MB drive, OS/2 1.21 with LAN Manager 2.0 HPFS386 installed with local security, server and workstation services shut down.
Appendix B: HPFSTEST.C source code
/* HPFS Benchmark Test */
#include <os2.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <conio.h>

/* Macro to get a random integer within a specified range */
/* Note: rand() never exceeds RAND_MAX, so record numbers above RAND_MAX are never chosen */
#define getrandom( min, max ) ((rand() % (int)(((max)+1) - (min))) + (min))

#define NUMRECS 1000L

int main(void)
{
    FILE *datafile;
    long i, fileptr;
    struct {
        int key;
        char buffer[98];
    } datarec;    /* 100 bytes per record when int is 16 bits */

    printf("Processing %ld records\n", NUMRECS);
    puts("Strike any key to start");
    getch();

    /* The file is opened in text mode, as in the original test;
       "w+b" would avoid newline translation of the record bytes */
    if ((datafile = fopen("TEST.DAT", "w+")) == NULL) {
        fprintf(stderr, "\nError opening test datafile");
        exit(1);
    }

    /* Stage 1 - write NUMRECS records sequentially */
    for (i = 0; i < NUMRECS; i++) {
        datarec.key = (int)i;
        fwrite(&datarec, sizeof(datarec), 1, datafile);
        /* putch('.'); */
    }
    printf("\a\nStage 1 - complete\n");

    /* Seed the random number generator with current time. */
    srand((unsigned) time(NULL));

    /* Stage 2 - randomly read NUMRECS records */
    for (i = 0; i < NUMRECS; i++) {
        fileptr = (long)sizeof(datarec) * (long)getrandom(0, (NUMRECS - 1));
        if (fseek(datafile, fileptr, SEEK_SET))
            putch('!');
        /* else putch('*'); */
        fread(&datarec, sizeof(datarec), 1, datafile);
    }
    printf("\a\nStage 2 - complete");

    fclose(datafile);
    return 0;
}