Jump to content

HPFS

From EDM2
Revision as of 14:51, 17 June 2006 by Martini (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

High Performance File System

Original Work by Hartmut Frommert

Design Goals and Implementation of the High Performance File System

The High Performance File System (hereafter HPFS), which is making its first appearance in the OS/2 operating system Version 1.2, had its genesis in the network division of Microsoft and was designed by Gordon Letwin, the chief architect of the OS/2 operating system. The HPFS has been designed to meet the demands of increasingly powerful PC's, fixed disks, and networks for many years to come and to serve as a suitable platform for object-oriented languages, applications, and user interfaces. The HPFS is a complex topic because it incorporates three distinct yet interrelated file system issues. First, the HPFS is a way of organizing data on a random access block storage device. Second, it is a software module that translates file-oriented requests from an application program into more primitive requests that a device driver can understand, using a variety of creative techniques to maximize performance. Third, the HPFS is a practical illustration of an important new OS/2 feature known as Installable File Systems. This article introduces the three aspects of the HPFS. But first, it puts the HPFS in perspective by reviewing some of the problems that led to the system's existence.

FAT File System

The so-called FAT file system ,which is the file system used in all versions of the MS-DOS operating system to date and in the first two releases of OS/2' (Versions 1.0 and 1.1), has a dual heritage in Microsoft's earliest programming language products and the Digital Research CP/M operating system--software originally written for 8080-based and Z-80-based microcomputers. It inherited characteristics from both ancestors that have progressively turned into handicaps in this new era of multitasking, protected mode, virtual memory, and huge fixed disks.

The FAT file system revolves around the File Allocation Table for which it is named. Each logical volume has its own FAT, which serves two important functions: it contains the allocation information for each file on the volume in the fonn of linked lists of allocation units (clusters, which are power-of-2 multiples of sectors) and it indicates which allocation units are free for assignment to a file that is being created or extended.

The FAT was invented by Bill Gates and Marc McDonald in 1977 as a method of managing disk space in the NCR version of standalone Microsoft's Disk BASIC. Tim Paterson, at that time an employee of Seattle Computer Products (SCP), was introduced to the FAT concept when his company shared a booth with Microsoft at the National Computer Conference in 1979. Paterson subsequently incorporated FATs into the file system of 86-DOS, an operating system for SCP s S-100 bus 8086 CPU boards. 86-DOS was eventually purchased by Micro-soft and became the starting point for MS-DOS Version 1.0, which was released for the original lBM PC in August 1981.

When the FAT was conceived, it was an excellent solution to disk management, mainly because the floppy disks on which it was used were rarely larger than 1 Mb. On such disks, the FAT was small enough to be held in memory at all times, allowing very fast random access to any part of any file. This proved far superior to the CP/M method of tracking disk space, in which the information about the sectors assigned to a file might be spread across many directory entries, which were in turn scattered randomly throughout the disk directory. When applied to fixed disks, however, the FAT began to look more like a bug than a feature. it became too large to be held entirely resident and had to be paged into memory in pieces: this paging resulted in many superfluous disk head movements as a program was reading through a file and degraded system throughput. in addition, because the information about free disk space was dispersed across many sectors of FAT, it was impractical to allocate file space contiguously, and file fragmentation became another obstacle to good performance. Moreover, the use of relatively large clusters on fixed disks resulted in a lot of dead space, since an average of one- half cluster was wasted for each file. (Some network servers use clusters as large as 64Kb.)

The FAT file system 's restrictions on naming files and directories are inherited from CP/M. When Paterson was writing 86-DOS one of his primary objectives was to make programs easy to port from CP/M to his new operating system. He therefore adopted CP/M's limits on filenames and extensions so the critical fields of 86-DOS File Control Blocks (FCBs) would look almost exactly like those of CP/M. The sizes of the FCB filename and extension fields were also propagated into the structure of disk directory entries. In due time 86-DOS became MS- DOS and application programs for MS-DOS proliferated beyond anyone's wildest dreams. Since most of the early programs depended on the structure of FCBs the 8.3 format for filenames became irrevocably locked into the system.

During the last couple of years Microsoft and IBM have made valiant attempts to prolong the useful life of the FAT file system by lifting the restrictions on volume sizes improving allocation strategies caching path names and moving tables and buffers into expanded memory. But these can only be regarded as temporizing measures because the fundamental data structures used by the FAT file system are simply not well suited to large random access devices. The HPFS solves the FAT file system problems mentioned here and many others but it is not derived in any way from the FAT file system. The architect of the HPFS started with a clean sheet of paper and designed a file system that can take full advantage of a multitasking environment and that will be able to cope with any sort of disk device likely to arrive on microcomputers during the next decade.

HPFS Volume Structure

HPFS volumes are a new partition type--type 7--and can exist on a fixed disk alongside of the several previously defined FAT partition types. IBM-compatible HPFS volumes use a sector size of 512 bytes and have a maximum size of 2199Gb (232 sectors). Although there is no particular reason why floppy disks can't be formatted as HPFS volumes Microsoft plans to stick with FAT file systems on floppy disks for the foreseeable future. (This ensures that users will be able to transport files easily between MS-DOS and OS/2 systems.) An HPFS volume has very few fixed structures (Figure 1). Sectors 0-15 of a volume (8Kb) are the Bootblock and contain a volume name, 32-bit volume ID, and a disk bootstrap program. The bootstrap is relatively sophisticated (by MS-DOS standards) and can use the HPFS in a restricted mode to locate and read the operating system files wherever they might be found. Sectors 16 and 17 are known as the Super Block and the Spare Block respectively. The Super Block is only modified by disk maintenance utilities. It contains pointers to the free space bitmaps the bad block list the directory block band and the root directory. It also contains the date that the volume was last checked out and repaired with CHKDSK /F. The Spare Block contains various flags and pointers that will be discussed later it is modified although infrequently as the system executes. The remainder of the disk is divided into 8Mb bands. Each band has its own free space bitmap in which a bit represents each sector. A bit is 0 if the sector is in use and 1 if the sector is available. The bitmaps are located at the head or tail of a band so that two bitmaps are adjacent between alternate bands. This allows the maximum contiguous free space that can be allocated to a file to be 16Mb. One band located at or toward the seek center of the disk is called the directory block band and receives special treatment (more about this later). Note that the band size is a characteristic of the current implementation and may be changed in later versions of the file system.

[Fig. 1]

FIGURE 1. This figure shows the overall structure of an HPFS volume. The most important fixed objects in such a volume are the Bootblock the Super Block, and the Spare Block. The remainder of the volume is divided into 8Mb bands. There is a freespace bitmap for each band and the bitmaps are located between alternate bands consequently, the maximum contiguous space which can be allocated to a file is 16Mb.

HPFS Volume Structure

HPFS volumes are a new partition type--type 7--and can exist on a fixed disk alongside of the several previously defined FAT partition types. IBM-compatible HPFS volumes use a sector size of 512 bytes and have a maximum size of 2199Gb (232 sectors). Although there is no particular reason why floppy disks can't be formatted as HPFS volumes Microsoft plans to stick with FAT file systems on floppy disks for the foreseeable future. (This ensures that users will be able to transport files easily between MS-DOS and OS/2 systems.) An HPFS volume has very few fixed structures (Figure 1). Sectors 0-15 of a volume (8Kb) are the Bootblock and contain a volume name, 32-bit volume ID, and a disk bootstrap program. The bootstrap is relatively sophisticated (by MS-DOS standards) and can use the HPFS in a restricted mode to locate and read the operating system files wherever they might be found. Sectors 16 and 17 are known as the Super Block and the Spare Block respectively. The Super Block is only modified by disk maintenance utilities. It contains pointers to the free space bitmaps the bad block list the directory block band and the root directory. It also contains the date that the volume was last checked out and repaired with CHKDSK /F. The Spare Block contains various flags and pointers that will be discussed later it is modified although infrequently as the system executes. The remainder of the disk is divided into 8Mb bands. Each band has its own free space bitmap in which a bit represents each sector. A bit is 0 if the sector is in use and 1 if the sector is available. The bitmaps are located at the head or tail of a band so that two bitmaps are adjacent between alternate bands. This allows the maximum contiguous free space that can be allocated to a file to be 16Mb. One band located at or toward the seek center of the disk is called the directory block band and receives special treatment (more about this later). Note that the band size is a characteristic of the current implementation and may be changed in later versions of the file system.

[Fig. 1]

FIGURE 1. This figure shows the overall structure of an HPFS volume. The most important fixed objects in such a volume are the Bootblock the Super Block, and the Spare Block. The remainder of the volume is divided into 8Mb bands. There is a freespace bitmap for each band and the bitmaps are located between alternate bands consequently, the maximum contiguous space which can be allocated to a file is 16Mb.

Files and FNodes

Every file or directory on an HPFS volume is anchored on a fundamental file system object called an Fnode (pronounced "eff node"). Each Fnode occupies a single sector and contains control and access history information used internally by the file system extended attributes and access control lists (more about this later) the length and the first 15 characters of the name of the associated file or directory and an allocation structure (Figure 2).

An Fnode is always stored near the file or directory that it represents. The allocation structure in the Fnode can take several forms depending on the size and degree of contiguity of the file or directory. The HPFS views a file as a collection of one or more runs or extents of one or more contiguous sectors. Each run is symbolized by a pair of double-words--a 32-bit starting sector number and a 32-bit length in sectors (this is referred to as run length encoding). From an application program's point of view the extents are invisible; the file appears as a seamless stream of bytes. The space reserved for allocation information in an Fnode can hold pointers to as many aseight runs of sectors of up to 16Mb each . (This maximum run size is a result of the band size and free space bitmap placement only; it is not an inherent limitation of the file system.) Reasonably small files or highly contiguous files can therefore be described completely within the Fnode (Figure 3).

HPFS uses a new method to represent the location of files that are too large or too frag-mented for the Fnode and consist of more than eight runs. The Fnode's allocation structure becomes the root for a B+ Tree of allocation sectors which in turn contain the actual pointers to the file's sector runs (see Figure 4 and the sidebar, "B-Trees and B+ Trees"). The Fnode's root has room for 12 elements. Each allocation sector can contain, in addition to various control information, as many as 40 pointers to sector runs. Therefore a two-level allocation B+ Tree can describe a file of 480 (12x40) sector runs with a theoretical maximum size of 7.68Gb (12x40x16Mb) in the current implementation (although the 32-bit signed offset parameter for DosChgFilePtr effectively limits the sizes to 2Gb). In the unlikely event that a two-level B+ Tree is not sufficient to describe the highly fragmented file the file system will introduce additional levels in the tree as needed. Allocation sectors in the intermediate levels can hold as many as 60 intemal (nonterminal) B+ Tree nodes which means that the descriptive ability of this structure rapidly grows to numbers that are nearly beyond comprehension. For example a three level allocation B+ tree can describe a file with as many as 28 800 (12x60x40) sector runs. Run-length encoding and B+ Trees of allocation sectors are a memory-efficient way to specify a file's size and location but they have other s significant advantages. Translating a logical file offset into a sector number is extremely fast: the file system just needs to traverse the list (or B+ Tree of lists) of run pointers until it finds the correct range. It can then identify the sector within the run with a simple calculation. Run-length encoding also makes it trivial to extend the file logically if the newly assigned sector iscontiguous with the file's previous last sector the file system merely needs to increment the size double word of the file's last run pointer and clear the sector's bit in the appropriate freespace bitmap.

[Fig. 2]

FIGURE 2: This figure shows the overall structure of an Fnode. The Fnode is the fundamental object in an HPFS volume and is the first sector allocated to a file or directory. it contains control and access history information used by the file system, cached EAs and ACLs or pointers to same, a truncated copy of the file or directory name (to aid disk repair programs, and an allocation structure which defines the size and location of the file's storage.

[Fig. 3]

FIGURE 3: The simplest form of tracking for the sectors owned by a file is shown. The Fnode s allocation structure points directly to as many as eight sector runs. Each run pointer consists of a pair of 32-bit doublewords: a starting sector number and a length !n sectors.

[Fig. 4]

FIGURE 4: This figure demonstrates the technique used to track the sectors owned by a file with 9-480 sector runs. The allocation structure in the Fnode holds the roots for a B+ Tree of allocation sectors. Each allocation sector can describe as many as 40 sector runs. If the file requires more than 480 sector runs, additional intermediate levels are added to the B+ Tree, which increases the number of possible sector runs by a factor of sixty for each new level.

Files and FNodes

Every file or directory on an HPFS volume is anchored on a fundamental file system object called an Fnode (pronounced "eff node"). Each Fnode occupies a single sector and contains control and access history information used internally by the file system extended attributes and access control lists (more about this later) the length and the first 15 characters of the name of the associated file or directory and an allocation structure (Figure 2).

An Fnode is always stored near the file or directory that it represents. The allocation structure in the Fnode can take several forms depending on the size and degree of contiguity of the file or directory. The HPFS views a file as a collection of one or more runs or extents of one or more contiguous sectors. Each run is symbolized by a pair of double-words--a 32-bit starting sector number and a 32-bit length in sectors (this is referred to as run length encoding). From an application program's point of view the extents are invisible; the file appears as a seamless stream of bytes. The space reserved for allocation information in an Fnode can hold pointers to as many aseight runs of sectors of up to 16Mb each . (This maximum run size is a result of the band size and free space bitmap placement only; it is not an inherent limitation of the file system.) Reasonably small files or highly contiguous files can therefore be described completely within the Fnode (Figure 3).

HPFS uses a new method to represent the location of files that are too large or too frag-mented for the Fnode and consist of more than eight runs. The Fnode's allocation structure becomes the root for a B+ Tree of allocation sectors which in turn contain the actual pointers to the file's sector runs (see Figure 4 and the sidebar, "B-Trees and B+ Trees"). The Fnode's root has room for 12 elements. Each allocation sector can contain, in addition to various control information, as many as 40 pointers to sector runs. Therefore a two-level allocation B+ Tree can describe a file of 480 (12x40) sector runs with a theoretical maximum size of 7.68Gb (12x40x16Mb) in the current implementation (although the 32-bit signed offset parameter for DosChgFilePtr effectively limits the sizes to 2Gb). In the unlikely event that a two-level B+ Tree is not sufficient to describe the highly fragmented file the file system will introduce additional levels in the tree as needed. Allocation sectors in the intermediate levels can hold as many as 60 intemal (nonterminal) B+ Tree nodes which means that the descriptive ability of this structure rapidly grows to numbers that are nearly beyond comprehension. For example a three level allocation B+ tree can describe a file with as many as 28 800 (12x60x40) sector runs. Run-length encoding and B+ Trees of allocation sectors are a memory-efficient way to specify a file's size and location but they have other s significant advantages. Translating a logical file offset into a sector number is extremely fast: the file system just needs to traverse the list (or B+ Tree of lists) of run pointers until it finds the correct range. It can then identify the sector within the run with a simple calculation. Run-length encoding also makes it trivial to extend the file logically if the newly assigned sector iscontiguous with the file's previous last sector the file system merely needs to increment the size double word of the file's last run pointer and clear the sector's bit in the appropriate freespace bitmap.

[Fig. 2]

FIGURE 2: This figure shows the overall structure of an Fnode. The Fnode is the fundamental object in an HPFS volume and is the first sector allocated to a file or directory. it contains control and access history information used by the file system, cached EAs and ACLs or pointers to same, a truncated copy of the file or directory name (to aid disk repair programs, and an allocation structure which defines the size and location of the file's storage.

[Fig. 3]

FIGURE 3: The simplest form of tracking for the sectors owned by a file is shown. The Fnode s allocation structure points directly to as many as eight sector runs. Each run pointer consists of a pair of 32-bit doublewords: a starting sector number and a length !n sectors.

[Fig. 4]

FIGURE 4: This figure demonstrates the technique used to track the sectors owned by a file with 9-480 sector runs. The allocation structure in the Fnode holds the roots for a B+ Tree of allocation sectors. Each allocation sector can describe as many as 40 sector runs. If the file requires more than 480 sector runs, additional intermediate levels are added to the B+ Tree, which increases the number of possible sector runs by a factor of sixty for each new level.