Inside the High Performance File System - Part 4

By Dan Bridges

Part 4/6: Fragmentation, Diskspace Bitmaps and Code Pages

Introduction

This article originally appeared in the May 1996 issue of Significant Bits, the monthly magazine of the Brisbug PC User Group Inc.

This month we look at how HPFS knows which sectors are occupied and which ones are free. We examine the amount of file fragmentation on five HPFS volumes and also check out the fragmentation of free space. A program will be presented to show free runs and some other details. Finally, we'll briefly discuss Code Pages and look at a program to display their contents.

How Sectors are Mapped on a HPFS Volume

The sector usage on a HPFS partition is mapped in data band bitmap blocks. These blocks are 2 KB in size (four sectors) and are usually situated at either the beginning or end of a data band. A data band is almost 8 MB. (Actually 8,190 KB since 2 KB is needed for its bitmap.) See Figure 1. The state of each bit in the block indicates whether or not a sector (HPFS' allocation unit) is occupied. If a bit is set (1) then its corresponding sector is free. If the bit is not set (0) then the sector is occupied. Structures situated within the confines of a data band such as Code Page Info & Data sectors, Hotfix sectors, the Root Directory DirBlk etc. are all marked as fully occupied within that band's usage bitmap.

HPFS-4-fig1.png

Figure 1: The basic data layout of a HPFS volume.

Since each bit maps a sector, a byte maps eight sectors and the complete 2 KB block maps the 16,384 sectors (including the bitmap block itself) in an 8 MB band. And since the data areas of two adjacent bands can face each other (with their bitmap blocks at the outer ends), we arrive at the maximum possible extent (fragment) size of 2 x 8,190 KB = 16,380 KB. Examine Figure 2 now to see examples of file and freespace mapping.

HPFS-4-fig2.png

Figure 2: The correspondence of the first five bytes in a data band's usage bitmap to the first 40 sectors in the band.

In this example we see 23 occupied sectors ("u") and 4 unoccupied areas (".") which we will refer to as "freeruns" [of sectors]. At one extreme, the 23 sectors might belong to one file (here in four extents) while at the other extreme we might have the FNODEs of 23 "zero-length" files. (Every file and directory entry on a HPFS volume must have a FNODE sector.)
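
To make the bit-to-sector correspondence concrete, here is a minimal REXX sketch that expands a few bitmap bytes into the same "u"/"." notation. The five byte values are invented for illustration (they are not the ones in Figure 2), and the sketch assumes, as ShowFreeruns.cmd later in this article does, that the least significant bit of each byte maps the lowest-numbered sector.

```rexx
/* BitmapDemo.cmd: expand usage-bitmap bytes into a sector map   */
/* Hypothetical sample: the first 5 bytes of a band's bitmap     */
bmp = '00 F0 FF 03 00'x

map = ''
DO byteNum = 1 TO Length(bmp)
   /* X2B(C2X()) gives the 8 bits, most significant bit first    */
   bits = X2B(C2X(Substr(bmp, byteNum, 1)))
   /* Reverse so bit 0 (the lowest-numbered sector) comes first  */
   bits = Reverse(bits)
   /* 1 = free (.), 0 = occupied (u)                             */
   map = map || Translate(bits, '.u', '10')
END byteNum

/* Count the occupied secs by blanking out the dots              */
used = Length(Space(Translate(map, ' ', '.'), 0))
free = Length(map) - used

SAY "Sectors 0-39:" map
SAY "Occupied:" used "Free:" free
```

With these sample bytes the map shows 12 occupied sectors, one freerun of 14 sectors (sectors 12-25) and then 14 more occupied sectors, so the last line reads "Occupied: 26 Free: 14".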

The advantages of the bitmap approach are twofold. First, the small allocation unit size on a HPFS volume means greatly reduced allocation unit wastage compared to large FAT partitions. Second, the compact mapping structure makes it feasible for HPFS to quickly search a data band for enough free space to slot in a file of known size, in one piece if possible. For example, as just mentioned HPFS can map 32,760 allocation units with just 4 KB of bitmaps whereas a 16-bit FAT structure requires 64 KB (per FAT copy) to map 32,768 allocation units.
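
The figures quoted in this section follow from simple arithmetic, which the short REXX sketch below reproduces. It assumes 512-byte sectors and 2-byte FAT entries; the variable names are only illustrative.

```rexx
/* MapCost.cmd: arithmetic behind the bitmap figures quoted above */
secsPerBand = 2048 * 8                  /* 2 KB bitmap, 8 secs per byte */
bandKB      = secsPerBand * 512 % 1024  /* assumes 512-byte sectors     */
dataKB      = bandKB - 2                /* less the 2 KB bitmap itself  */

SAY "Sectors mapped per bitmap block:" secsPerBand
SAY "Data area per band (KB):        " dataKB
SAY "Largest possible extent (KB):   " 2 * dataKB

units = 2 * (secsPerBand - 4)           /* 2 bands less 4 bitmap secs each */
SAY "HPFS: " units "units mapped by" 2 * 2048 % 1024 "KB of bitmaps"
SAY "FAT16:" 32768 "units mapped by" 32768 * 2 % 1024 "KB per FAT copy"
```

The sketch reports 16384 sectors per bitmap block, 8190 KB of data per band and a largest extent of 16380 KB, followed by the 4 KB versus 64 KB comparison.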

A Fragmentation Analysis

In this section we'll examine the level of fragmentation on the five HPFS partitions of my first HD. Look at Figure 3. Notes:

  1. A time-since-last-defrag figure of "Never" means that I've never run a defragger across this partition since upgrading to OS/2 Warp 118 days ago. This value is stored in the SuperBlock (LSN 16) and was determined by using the ShowSuperSpare REXX program featured in Part 2.
  2. The fragmentation levels were reported by the wondrous FST (freeware) with "FST -n check -f C:" while the names of the fragmented files and their sizes came from the GammaTech Utilities (commercial) "HPFSOPT C: -u -d -o1 -l logfile". You can also use the Graham Utilities (commercial) "HPFS-EXT C: -s".
  3. The high number of files with 0 data extents on C: is due to the presence of the WPS folders on this drive. Each of these has "zero" bytes in the main file but they usually have bytes in EAs.
  4. Files with 0 or 1 extents are considered to be fully contiguous, so I've placed them in one grouping.
  5. Files with 2-8 extents are considered to be "nearly contiguous" since the fragments will usually be placed close together on the disk and also because a list of the location and length of up to 8 extents can be kept in a file's FNODE sector. This list will be kept memory resident while the file is open. Note 1: the extents themselves cannot be kept memory resident since, theoretically, they could be up to 8*16,380 KB in size. But no non-data disk reads, after the initial read of the FNODE, would be required to work with the file. Note 2: under some circumstances, the 8 extents, if small enough, could be kept memory resident in the sense that they could be held in HPFS' cache. We will consider FNODEs in detail in a later instalment.
  6. Files with more than 8 extents have too many fragments to be listed in their FNODEs. Instead a B+tree allocation sector structure (an ALSEC) is used to map the extents. The sector mappings are small enough to keep memory resident while the file is open. ALSECs will be covered in a later instalment.
  7. EAs are usually not fragmented since, in the current implementation of OS/2, the total EA size associated with any one file is limited to 64 KB. If a file has EAs in 0 extents then the EA information is stored completely within the FNODE sector. (There is space in the FNODE for up to 145 bytes of "internal" EAs.) In all other cases on my system they are currently stored in single, external runs of sectors. EAs will be covered in later instalments.
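
The grouping used in notes 4 to 6 can be summarised in a few lines of REXX. This is only a sketch of the classification described in the notes; the routine name Classify and the sample extent counts are made up for the example.

```rexx
/* ExtentClass.cmd: classify files by their number of data extents */
samples = "0 1 2 8 9 30"          /* hypothetical extent counts */
DO i = 1 TO Words(samples)
   n = Word(samples, i)
   SAY Right(n, 3) "extent(s):" Classify(n)
END i
EXIT

Classify: PROCEDURE
ARG extents
SELECT
   WHEN extents <= 1 THEN RETURN "contiguous"
   WHEN extents <= 8 THEN RETURN "nearly contiguous (list fits in the FNODE)"
   OTHERWISE RETURN "fragmented (list held in an ALSEC B+tree)"
END
```

Counts of 0 and 1 fall in the first group, 2 and 8 in the second, and 9 and 30 in the third.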

HPFS-4-fig3.png

Figure 3: Fragmentation analysis of five HPFS partitions.

We now turn to the topic of what circumstances are leading to file fragmentation on these partitions.

C: - The OS/2 system partition. I've run out of space on this drive on occasions. Activity here occurs through the running of Fixpacks (FP 16 and then FP 17 were run), INI maintenance utilities and driver upgrades. There is really nothing of concern here. Most HPFS defraggers suggest not trying to defrag files that have fewer than 2 or 3 extents since you run the risk of fragmenting the free space. We will return to this topic shortly.

D: - My main work area and the location of communications files. I use the DOS comms package TELEMATE because I've always liked its features (although OS/2 has to work hard to handle its modem access during a file transfer - OS/2 comms programs, in general, are much less demanding of the CPU's attention). The other major comms package I use is OS/2 BinkleyTerm v2.60 feeding OS/2 Squish message databases. The fragmented files consist mainly of files downloaded by TELEMATE (DOS comms programs do not inform HPFS, ahead of time, of how much space the downloaded file will occupy) and Squish databases (*.SQD). The drive was defragged 53 days ago at which time no special effort was made to reduce file fragmentation below 2-3 extents, accounting for the presence of 245 files with two extents. This really is an insignificant amount regardless of what the 4% figure may lead you to believe.

The most fragmented file on this partition is a 150 KB BinkleyTerm logfile with 30 extents. The main reason I can see for fragmentation in this case is that the file is frequently being updated with information while file transfers are in progress. The Squish databases are also prone to fragmentation. Out of a total of 25 database files there were 8, averaging 500 KB each, with an average of 15 extents.

E: - The fragmentation here was insignificant apart from a single 2.8 MB executable Windows program that has had a DOS patch program run over it, resulting in 38 fragments. The 2-extent files were mainly data files that are produced by this same Windows package (being run under WIN-OS2).

F: - Almost no fragmentation since this partition is reserved for DOS programs and I don't use them much.

G: - My second major work partition. Fragmentation is low and unlikely to go much lower since 2 extents is considered below the point of defragger involvement.

The conclusion to be drawn from the above is that, if you don't get too hot under the collar about some files having 2 or 3 extents, there will generally be little need to worry about fragmentation under HPFS. Only certain types of files (some comms/DOS/Windows) will be candidates. And keeping partitions less than 80% full should help reduce general fragmentation as well.

Defragmenting Files

Since fragmentation is a relatively minor concern under HPFS there is not much of an argument for purchasing OS/2 utilities based mainly on their ability to defragment HPFS drives, especially since it's not hard to defragment files yourself. You see, providing there is enough contiguous free space on a volume, the mere act of copying the files to a temporary directory, deleting the originals and then moving the files back will usually eliminate, or at least reduce, fragmentation since HPFS, knowing the original filesize, will look for a suitably sized free space. The success of this technique is demonstrated in Figure 4 where 25 Squish database files (*.SQD) totalling 5.7 MB were shuffled about on D:. Note: don't use the MOVE command to initially transfer the files to the temp directory since this will just alter the directory entry rather than actually rewriting the files.

HPFS-4-fig4.png

Figure 4: Number of extents in 25 SQD files before and after the defrag process described in the text.

I've used the GU's HPFS-EXT to report these figures. This is freely available in the GULITE demo package. Note: the fully functional HPFSDFRG is also in this package but I wanted to show that it's not that hard to do this by hand. HPFSDFRG does much the same as I did except that you can specify the optimisation threshold (minimum number of extents before a file becomes a candidate) and it will retry the copying operation up to ten times if there are more extents after the operation than before it (due to heavily fragmented free space).
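
The manual copy, delete and move-back procedure can be written down as a short command list. The REXX sketch below only prints the commands rather than running them, so it is safe to try anywhere. The directory names and file mask are hypothetical, and you should check the exact command syntax on your own system before executing anything like this on real data.

```rexx
/* CopyDefrag.cmd: print (not run) the commands for a copy-based defrag */
/* All paths and the file mask below are hypothetical examples.         */
srcDir = "D:\SQUISH"
tmpDir = "D:\TMPDEFRG"
mask   = "*.SQD"

cmd.1 = "MD"   tmpDir
cmd.2 = "COPY" srcDir"\"mask tmpDir   /* COPY rewrites the file data     */
cmd.3 = "DEL"  srcDir"\"mask          /* frees the fragmented originals  */
cmd.4 = "MOVE" tmpDir"\"mask srcDir   /* only the directory entry moves  */
cmd.5 = "RD"   tmpDir
cmd.0 = 5

DO i = 1 TO cmd.0
   SAY cmd.i      /* replace SAY with ADDRESS CMD to actually execute */
END i
```

COPY is used for the first transfer because it forces the data to be rewritten. The final MOVE only changes the directory entry, which is all that is needed by then.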

The Fragmentation of Freespace

Another significant aspect of HPFS' fragmentation resistance is how well the FS keeps disk free space in big, contiguous chunks. If the current files on a partition are relatively fragmentation free but the remaining free space is arranged in lots of small chunks then there is a good chance that new files will be fragmented. You can check this with "FST -n info -f C:". This produces a table that counts the number of free space extents that are 1, 2-3, 4-7, 8-15, ... 16384-32767 sectors long. In my opinion, though, it is more important to consider the product of each extent size and its frequency, since the presence of numerous 1-sector spaces is not important if there are still a number of large spaces available.
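
The idea of weighting each size class by the space it actually holds can be sketched in REXX as follows. The list of freerun sizes is invented for the example (it is not taken from any of the partitions discussed here); the size classes follow the power-of-two ranges used in the FST table.

```rexx
/* FreeWeight.cmd: weight freerun size classes by the space they hold */
/* Hypothetical freerun sizes in sectors                              */
runs = "1 1 1 2 3 6 12 500 9000 32760"

count. = 0
secs.  = 0
DO i = 1 TO Words(runs)
   size = Word(runs, i)
   /* Class k holds runs of 2**k to 2**(k+1)-1 sectors */
   k = 0
   DO WHILE 2**(k+1) <= size
      k = k + 1
   END
   count.k = count.k + 1
   secs.k  = secs.k + size
END i

SAY "Run size (secs)  Runs  Total secs"
DO k = 0 TO 14
   IF count.k = 0 THEN ITERATE k
   range = 2**k || "-" || (2**(k+1) - 1)
   SAY Left(range, 16) Right(count.k, 4) Right(secs.k, 11)
END k
```

In this made-up sample the six runs shorter than 8 sectors hold only 14 sectors between them, while the two largest runs hold 41,760 sectors, which is why the raw count of small runs says little on its own.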

Figure 5 shows the output of the REXX program ShowFreeruns.cmd. The partition of 100 MB is almost empty. The display shows the location of the 2 KB block that holds the list of the starting LSNs of each bitmap block (this figure comes from the dword at offset 18h in the SuperBlock), the location of each bitmap block on the left and the location and size (in sectors) of each free space run on the right. As you see, this partition has 13 data bands, 12 of which form six facing pairs. A version of ShowFreeruns.cmd that only outputs the run size was used to generate a list of figures. This list was loaded into a spreadsheet, sorted and a frequency distribution performed. See Figure 6. You can see that C: has no large areas remaining, D: has the majority of its free space in the 4 MB < 8 MB range and that E:, F: and G: have kept large majorities of their free space in very big runs. Overall, this is quite good performance.

Inspecting drive O:

 List of Bmp Sectors: 0x00018FF0 (102384)

Space-Usage Bitmap Blocks:
                                      Freespace Runs:

  0x00000014-00000017 (20-23)
  0x00007FFC-00007FFF (32764-32767)
                                      130-32763     (#1:32634)

  0x00008000-00008003 (32768-32771)
  0x0000FFFC-0000FFFF (65532-65535)
                                      32772-65531   (#2:32760)

  0x00010000-00010003 (65536-65539)
  0x00017FFC-00017FFF (98300-98303)
                                      65540-81919   (#3:16380)
                                      81926-98291   (#4:16366)

  0x00018000-00018003 (98304-98307)
  0x0001FFFC-0001FFFF (131068-131071)
                                      100369-102383 (#5:2015)
                                      102400-131067 (#6:28668)

  0x00020000-00020003 (131072-131075)
  0x00027FFC-00027FFF (163836-163839)
                                      131076-163835 (#7:32760)

  0x00028000-00028003 (163840-163843)
  0x0002FFFC-0002FFFF (196604-196607)
                                      163844-196603 (#8:32760)

  0x00030000-00030003 (196608-196611)
                                      196612-204767 (#9:8156)

Figure 5: Output from the ShowFreeruns.cmd REXX program.
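
The band layout visible in Figure 5 can be reproduced with a little arithmetic. The REXX sketch below assumes the partition has 204,768 sectors (one more than the last free LSN shown in the figure) and simply follows the pattern seen in the listing: even-numbered bands carry their bitmap at the start, odd-numbered bands at the end, and band 0 has its bitmap at LSN 20. It is an illustration of that pattern, not a statement of how HPFS decides the placement.

```rexx
/* BandMap.cmd: data bands and bitmap positions for a small volume */
totalSecs   = 204768   /* assumption: one past the last LSN in Figure 5 */
secsPerBand = 16384    /* sectors mapped by one 2 KB bitmap block       */

bands = (totalSecs + secsPerBand - 1) % secsPerBand    /* round up */
SAY "Data bands:" bands "("bands % 2 "facing pairs," bands // 2 "solo)"

DO b = 0 TO bands - 1
   first = b * secsPerBand
   last  = Min(first + secsPerBand, totalSecs) - 1
   IF b // 2 = 0 THEN bmp = first       /* even band: bitmap at start */
   ELSE bmp = last - 3                  /* odd band: bitmap at end    */
   IF b = 0 THEN bmp = 20               /* band 0: as seen in Figure 5 */
   SAY "Band" Right(b, 2)":" first"-"last "bitmap at" bmp"-"bmp + 3
END b
```

The first line of output is "Data bands: 13 (6 facing pairs, 1 solo)", and the bitmap positions listed for each band match the left-hand column of Figure 5.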

HPFS-4-fig6.png

Figure 6: Free space analysis on five HPFS partitions.

The ShowFreeruns Program

Like other programs in this series, ShowFreeruns.cmd (see Figure 7) uses SECTOR.DLL to read a sector off a logical drive. I was motivated to design this program after seeing the output of the GU's "HPFSINFO C: -F". On a one-third full 1.2 GB partition, the program presented here takes 17 secs compared to HPFSINFO's time of 26 secs. HPFSINFO also shows the CHS (Cyl/Hd/Sec) coordinates of each run. I was not interested in these but instead display the freerun's size. HPFSINFO also displays the meaning of what's in the SuperBlock and the SpareBlock. If you want to do this, you can include the code from ShowSuperSpare.cmd from Part 2 and it will only add an extra 0.5 secs to the time. The performance then, for an interpreted program (REXX), is quite good and was achieved primarily through a speed-up technique to be discussed shortly. Moreover, HPFSINFO consistently overstates the end of each freerun by 1 and it sometimes does not show the last run (e.g. on C: it states that there are 366 freeruns but only shows 365 of them). This last bug appears to be caused by the last freerun continuing to the end of the partition. My design accounts for this situation.


 /* Shows bitmap locations and free space runs */
 ARG drive .  /* First parm should always be drive */
 
 IF drive = '' THEN CALL HELP
 parmList = "? /? /H HELP A: B:"
 IF WordPos(drive, parmList) \= 0 THEN CALL Help
 
 /* Register external DLL functions */
 CALL RxFuncAdd 'ReadSect','Sector','ReadSect'
 CALL RxFuncAdd 'RxDate','RexxDate','RxDate'
 
 /* Initialise Lookup Table*/
 DO exponent = 0 TO 7
    bitValue.exponent = D2C(2**exponent)
 END exponent
 
 secString = ReadSect(drive, 16)   /*Read Superblk sec*/
 freespaceBmpList = C2D(Reverse(Substr(secString,25,4)))
 totalsecs = C2D(Reverse(Substr(secString,17,4)))
 
 '@cls'
 SAY
 SAY "Inspecting drive" drive
 SAY
 /* Offset 25 in the SuperBlock = LSN of list of bitmap blocks */
 CALL ShowDword " List of Bitmap secs",25
 
 startOfListBlk = 0
     startOfBlk = 0
     bmpListBlk = ""
         bmpBlk = ""
 getFacingBands = 0
      runNumber = 0
     byteOffset = 0
 /* Read in 4 secs of the list of sec-usage bmp blks */
 DO secWithinBlk = freespaceBmpList TO freespaceBmpList+3
    temp = StartOfListBlk + secWithinBlk
    bmpListBlk = bmpListBlk||ReadSect(drive, temp)
 END secWithinBlk
 
 SAY
 SAY "Space-Usage Bitmap Blocks:"
 SAY "                                  Freespace Runs:"
 
 /* Use dword pointers to bmps to read in 2KB bmp blks */
 DO listOffset = 1 TO 2048 BY 4
    startDecStr = C2D(Reverse(Substr(bmpListBlk,ListOffset,4)))
    IF startDecStr = 0 THEN    /* No more bmps listed */
       DO
       IF getFacingBands = 1 THEN
          DO    /* Last data band had no facing data band */
          bmpSize = 2048
          CALL DetermineFreeruns
          LEAVE
          END
 
       LEAVE
       END
 
    /*Display a blank line when a new facing band occurs*/
    IF (listOffset+7)//8 = 0 THEN SAY
 
    CALL ShowBmpBlk listOffset
    DO secWithinBlk = 0 TO 3
       temp = StartOfBlk + secWithinBlk
       bmpBlk = bmpBlk||ReadSect(drive, temp)
    END secWithinBlk
 
    getFacingBands = getFacingBands + 1
    IF getFacingBands = 2 THEN /* Wait until you get both */
       DO                    /* bmps for the facing data*/
       bmpSize = 4096       /* bands since maximum extent*/
       CALL DetermineFreeruns    /* length is 16,380 KB */
       byteOffset = byteOffset+4096
       getFacingBands = 0
       bmpBlk = ""
       END
 END listOffset
 
 EXIT  /**************EXECUTION ENDS HERE**************/
 
 
 FourBytes2Hex:  /* Given offset, return dword */
 ARG  startPos
 rearranged = Reverse(Substr(secString,startPos,4))
 RETURN C2X(rearranged)
 
 
 ShowDword:     /* Display dword and dec equivalent */
 PARSE ARG label, offset
 hexStr = FourBytes2Hex(offset)
 SAY label": 0x"hexStr "("X2D(hexStr)")"
 RETURN
 
 
 ShowBmpBlk:
 /* Show start-end LSNs of a bitmap block in hex & dec */
 PARSE ARG  offset
 endDecStr = C2D(Reverse(Substr(bmpListBlk,offset,4)))+3
 SAY "   0x"D2X(startDecStr,8)"-"D2X(endDecStr,8)" ("startDecStr"-"endDecStr")"
 startOfBlk = startDecStr
 RETURN
 
 
 DetermineFreeruns:
 runStatus = 0
 oldchar = ''
 /* Check 128 secs at a time to speed up operation */
 DO para = 1 to bmpSize BY 16
    /* 16 bytes*8 secs/byte = 128 secs per para scanned */
    char = Substr(bmpBlk,para,16)
    IF char = 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'x &,
               runstatus = 1 THEN ITERATE para
    IF char = '00000000000000000000000000000000'x &,
               runstatus = 0 THEN ITERATE para
    /* Part of paragraph has run start/end
       so check a byte (8 secs) at a time. */
    DO byte = para TO para + 15
       char = Substr(bmpBlk,byte,1)
       IF char > '0'x THEN        /* 1 or more free secs */
          DO
          IF char = 'FF'x THEN       /* 8 unoccupied secs */
             IF runStatus = 1 THEN  /* Run is in progress */
                NOP
             ELSE         /* Run starts on 8 sec boundary */
                DO
                startByte = byte + byteOffset
                startBitPos = 0
                runStatus = 1   /* Start run determination */
                END
          ELSE
             CALL DetermineBit /* Partial usage of 8 secs */
          END
       ELSE
          DO                    /* All 8 secs are used */
          IF runStatus = 1 THEN
             DO
             endByte = byte + byteOffset
             endBitPos = -1   /* Run ends with prior sec */
             CALL ShowRun
             END
          END
    END byte
 END para
 
 IF runStatus = 1 THEN   /* Freespace at end of part. */
    DO
    endByte = 9999999999   /* Larger than # of secs in   */
    endBitPos = 0          /* max. possible part.(512GB) */
    CALL ShowRun           /* so ShowRun will set runEnd */
                        /* to last LSN in this part.  */
    END
 RETURN
 
 
 DetermineBit:  /* Free/occupied usage within 8 sec blk */
 DO bitPos = 0 TO 7
    IF runStatus = 0 THEN
       DO                /* No run currently in progress */
       IF BitAnd(char, bitValue.bitPos) > '0'x THEN
          DO              /* sec is free */
          startByte = byte + byteOffset
          startBitPos = bitPos
          runStatus = 1
          END
       END
    ELSE
       DO
       IF BitAnd(char, bitValue.bitPos) = '0'x THEN
          DO              /* sec is used */
          endByte = byte + byteOffset
          /* When a run ends, the sec before the first
             used one is the last sec in the freerun. */
          endBitPos = bitPos - 1
          CALL ShowRun
          END
       END
 END bitPos
 RETURN
 
 
 ShowRun:
 /* Display freerun start-end secs & reset run status */
 runNumber = runNumber + 1
 runStart = (startByte - 1) * 8 + startBitPos
 runEnd = (endByte - 1) * 8 + endBitPos
 
 IF runEnd > totalSecs THEN runEnd = TotalSecs - 1
 IF runStart \= runEnd THEN  /* More than 1 sec is free */
    DO
    run = runStart"-"runEnd
    run = Left(run||Copies(" ",14),15)
    SAY Copies(" ",40) run "(#"runNumber":"runEnd-RunStart+1")"
    END
 ELSE
    DO
    run = Left(runStart||Copies(" ",14),15)
    SAY Copies(" ",40) run "(#"runNumber":1)"
    END
 
 runStatus = 0
 RETURN
 
 
 Help:
 SAY
 SAY "Purpose:"
 SAY "  ShowFreeruns displays the location of the sec-usage bitmap blocks"
 SAY "   and the location and extent of free space runs."
 SAY
 SAY "Example:"
 SAY "  ShowFreeruns C:"
 SAY
 EXIT
 

Figure 7: The ShowFreeruns.cmd REXX program. Requires SECTOR.DLL.

Since a sector is mapped by a bit, the program often needs to check the status of a bit within a bitmap's byte. This is done using the BITAND(string1, string2) inbuilt function. In this design string 1 holds the byte to be examined and string 2 holds a character that only has the corresponding bit set. Rather than having to work out the character for string 2 each time BITAND() is used, we instead precalculate the eight characters and then store them in the BitValue. compound variable for later use.
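
The lookup-table technique can be tried out in isolation with the short sketch below. The sample byte is hypothetical. The sketch uses the strict comparison operator (==) so that the one-byte result of BITAND() is compared exactly as it stands.

```rexx
/* BitTest.cmd: test individual bits of a bitmap byte with BITAND() */
DO exponent = 0 TO 7
   bitValue.exponent = D2C(2**exponent)   /* one mask char per bit */
END exponent

byte = 'F0'x        /* hypothetical bitmap byte: bits 4-7 are set */
DO bitPos = 0 TO 7
   IF BitAnd(byte, bitValue.bitPos) == '00'x THEN state = "used"
   ELSE state = "free"
   SAY "Sector" bitPos "of this group is" state
END bitPos
```

For this byte, sectors 0 to 3 are reported as used and sectors 4 to 7 as free.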

The next step is to read in the SuperBlock and from it get the location of the list of bitmap sectors and the total number of sectors. The latter value is required so we know when we've reached the end of the partition.

We then read in the four sectors of the block holding the list of bitmaps. The list consists of dwords that store the starting LSN of each bitmap block. 128 dwords can fit in each sector of the list so the four sectors of the list can hold 512 bitmap block LSNs. Now a bitmap block maps 8 MB of diskspace so this 'lite' version is only good when dealing with a partition of less than 4 GB. (Earlier works refer to the maximum partition size as 512 GB but in the recent "Just Add OS/2 Warp" package, in its technical section, it is stated that the maximum partition size is 64 GB.) I won't be able to check this aspect of the design until I get a HD bigger than 4 GB and succumb to the mad urge to partition it as one volume.

The end of the list is indicated by the first occurrence of a zero dword (00000000h). The list of the 100 MB partition shown in Figure 5 contains only 13 dwords since it has 13 data bands so, in a typical case, you should not expect to find much data stored in this block.
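
Here is a minimal sketch of how such a list is decoded. The 16 sample bytes are built by hand from the first three bitmap LSNs in Figure 5 followed by a zero terminator; a real program would read the block from disk instead. Each dword is stored low byte first, hence the Reverse() before C2D().

```rexx
/* BmpList.cmd: decode a list of little-endian dword LSNs   */
/* Hand-built sample: LSNs 20, 32764 and 32768 (Figure 5),  */
/* followed by the zero dword that ends the list.           */
list = '14000000 FC7F0000 00800000 00000000'x

DO offset = 1 TO Length(list) BY 4
   lsn = C2D(Reverse(Substr(list, offset, 4)))
   IF lsn = 0 THEN LEAVE            /* zero dword ends the list */
   SAY "Bitmap block at LSN" lsn "(0x"D2X(lsn, 8)")"
END offset

entries = 4 * (512 % 4)             /* 4 sectors of 128 dwords each */
SAY entries "entries x 8 MB =" entries * 8 % 1024 "GB"
```

The last line confirms the limit mentioned above: 512 entries of 8 MB each cover 4 GB.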

A freerun can be bigger than a data band since pairs of bands face each other, so we consider two bands at a time, unless we reach the end of the partition without a facing band. Once we have a data region we call the DetermineFreeruns procedure. Here we examine the two, combined data bitmaps (unless it's a solo band at the end). In the initial design I looked at each byte in the 4 KB bitmap combination to see if it was either 00h (all eight sectors used) or FFh (all eight sectors free). Typically, you will find lots of occupied or free sectors together, so checking eight at a time speeds up the search. Only when the byte was neither of these was a bit-level search required.

However, the speed of this version was poor, with the search through each byte of the 322 KB of bitmaps for the 161 databands in the 1.2 GB partition taking a total of 104 secs. The obvious solution was to extend the optimisation method to a second, higher level by checking more bytes first to see if they were all set or clear. I settled on 16 bytes which covers 128 sectors (64 KB) of diskspace at a time and this resulted in the final time of 17 secs. Further experiments with larger (64 byte) groups and also with third-level optimisation did not show much improvement with my mix of partitions but your situation may warrant further experimentation.
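
The effect of the 16-byte test can be seen in the simplified, self-contained sketch below. It is not the code from Figure 7: it works on a small, hand-built bitmap instead of reading the disk, and it drops straight from the 16-byte test to a bit-by-bit scan, leaving out the intermediate byte-level check. It assumes the bitmap length is a multiple of 16 bytes, as the real 2 KB and 4 KB bitmaps are.

```rexx
/* FastScan.cmd: coarse 16-byte test before a bit-level scan        */
/* Hypothetical 256-byte bitmap (2,048 sectors): sectors 0-199 are  */
/* used, 200-1799 are free and 1800-2047 are used.                  */
bmp = Copies('00'x, 25) || Copies('FF'x, 200) || Copies('00'x, 31)

allFree = Copies('FF'x, 16)
allUsed = Copies('00'x, 16)
inRun   = 0
skipped = 0

DO para = 1 TO Length(bmp) BY 16
   chunk = Substr(bmp, para, 16)
   /* Level 1: skip 128 sectors at once if the run state can't change */
   IF (chunk == allFree & inRun) | (chunk == allUsed & \inRun) THEN
      DO
      skipped = skipped + 1
      ITERATE para
      END
   /* Level 2: a run starts or ends here, so look at every bit */
   DO byte = para TO para + 15
      bits = Reverse(X2B(C2X(Substr(bmp, byte, 1))))  /* bit 0 first */
      DO bitPos = 1 TO 8
         sector = (byte - 1) * 8 + bitPos - 1
         free = Substr(bits, bitPos, 1)               /* 1 = free    */
         IF free & \inRun THEN
            DO
            runStart = sector
            inRun = 1
            END
         ELSE IF \free & inRun THEN
            DO
            runEnd = sector - 1
            SAY "Free run:" runStart"-"runEnd "("runEnd - runStart + 1 "secs)"
            inRun = 0
            END
      END bitPos
   END byte
END para

IF inRun THEN SAY "Free run:" runStart "to end of bitmap"
SAY skipped "of" Length(bmp) % 16 "16-byte groups skipped"
```

On this sample the sketch reports the single free run 200-1799 (1600 secs) and shows that 14 of the 16 groups never needed a closer look, which is where the time saving comes from.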

Code Pages

Different languages have different character sets. Code Pages (CPs) are used to map an ASCII character to the actual character. CP tables reside in COUNTRY.SYS. They are also present on a HPFS volume and every directory entry (DIRENT) includes a CP index value.

CPs are used to map character case (i.e. in a foreign character set the relationship between lower and upper-case characters) and for collating sequences used when sorting. As mentioned in Part 1, HPFS directories use a B-tree structure which, as part of its operation, always stores file/directory names in sorted order. Remember that HPFS is not case-sensitive (including when sorting) but it preserves case.

The European-style languages (including English) have relatively straightforward Single-Byte Character Sets (SBCS) i.e. one character is represented by one byte. Asian character sets typically have many characters so they require two bytes per character (DBCS).

The first 128 characters in all ASCII CPs are the same so the CP tables on the disk only map ASCII 128-255.

The SpareBlock holds the LSN of the first CP Info sector. There is a header followed by up to 31 16-byte CP Info Entries. There is provision for more than one CP Info sector which could hold CP Info Entries 31-61 (counting from 0). Why so many different CPs are catered for I have no idea since I've been unable to have more than two loaded at a time. In Australia we typically use CP437 (standard PC) - Country 061 and CP850 (multilingual Latin-1) - Country 000. The layout of a CP Info sector is shown in Figure 8.

HPFS-4-fig8.png

Figure 8: The layout of a Code Page Information Sector.

The CP Info Entry contains the LSN where this entry's CP mapping table is stored. This sector is a CP Data Sector. As well as a header there is enough space for up to three 128-byte CP maps per sector. Figure 9 shows the layout of a CP Data Sector.

HPFS-4-fig9.png

Figure 9: The layout of a Code Page Data Sector.

The CP.cmd Program

Figure 10 shows the display produced by the REXX CP.cmd program (Figure 11). I've stopped it before it reached ASCII 255. Normally, the output will scroll off the screen, so either pause it or send it to the printer. If the mapped character has the same value as its ASCII value the word "same" is displayed instead to reduce clutter.
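
The "same" test amounts to comparing each entry of a 128-byte map with its own position. The sketch below illustrates this on a hand-built map rather than one read from disk: it starts from an identity map for ASCII 128-255 and alters two entries, with values chosen only for illustration.

```rexx
/* CpMapDemo.cmd: list the entries of a 128-byte map that differ */
/* Identity map for ASCII 128-255 with two hypothetical changes  */
map = XRange('80'x, 'FF'x)
map = Overlay('9A'x, map, 129 - 127)   /* entry for 129 becomes 154 */
map = Overlay('90'x, map, 130 - 127)   /* entry for 130 becomes 144 */

DO y = 128 TO 255
   mapped = C2D(Substr(map, y - 127, 1))
   IF mapped \= y THEN SAY y "maps to" mapped
END y
SAY "All other entries would be shown as 'same'"
```

Only the two altered entries (129 and 130) are listed; every other position maps to itself.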

HPFS-4-fig10.png

Figure 10: Partial output from the CP.cmd program. List continues on to ASCII 255.

 /* Decodes CP info & CP data sectors on a HPFS volume */
 ARG drive .  /* First parm should always be drive */
 
 IF drive = '' | drive = "?" | drive = "HELP",
    | drive = "A:" | drive = "B:" THEN CALL Help
 
 CALL RxFuncAdd 'ReadSect','Sector','ReadSect' /* In SECTOR.DLL */
 secString = ReadSect(drive,17)  /* SpareBlock is LSN 17 */
 
 '@cls'
 SAY
 SAY "Inspecting drive" drive
 SAY
 /* Offset 33 in Spareblock contains dword of CP info LSN */
 cpInfoSec = C2D(Reverse(Substr(secString,33,4)))
 secString = ReadSect(drive,cpInfoSec) /* Load CP info sec */
 numOfCodePages = C2D(Reverse(Substr(secString,5,2)))
 prevDataSec = ''
 
 SAY "CODE PAGE INFORMATION (sector" cpInfoSec"):"
 SAY "Signature Dword: 0x"FourChars2Hex(1)
 SAY " CP# Ctry Code Code Page CP Data Sec Offset"
 DO x = 0 TO numOfCodePages-1
    hexCountry = TwoChars2Hex((16*x)+17)
    decCountry = Right('00'X2D(hexCountry),3)
    cp = TwoChars2Hex((16*x)+19)
    country.x = X2D(cp)
    hexSec = FourChars2Hex((16*x)+25)
    decSec = X2D(hexSec)
    cpDataSec = decSec
    /* Since up to 3 CP tables can fit in 1 CP data sec,
       only read in a new data sec when the need arises. */
    IF cpDataSec \= prevDataSec THEN
       DO
       dataSecString = ReadSect(drive,cpDataSec)
       prevDataSec = cpDataSec
       END
 
    offset = C2D(Reverse(Substr(dataSecString,(2*x)+21,2)))
    start = offset + 1
    SAY " " x " 0x"hexCountry "("decCountry") 0x"cp "("X2D(cp)") 0x",
        hexSec "("decSec") 0x"D2X(offset) "("offset")"
 
    /* Store table contents of each CP in an array.
       The 128-byte map follows a 6-byte entry header. */
    DO y = 128 TO 255
       char = Substr(dataSecString,start+6+y-128,1)
       IF C2D(char) \= y THEN
          array.x.y = Format(C2D(char),4) "("char")"
       ELSE
          array.x.y = " same "
    END y
 END x
 
 /* Work out title line based on number of CPs */
 titleLine = " ASCII "
 DO x = 0 TO numOfCodePages-1
    titleLine = titleLine " CP" country.x
 END x
 
 SAY
 SAY titleLine
 
 /* Display each table entry based on number of CPs */
 DO y = 128 TO 255
    dispLine = ''
    DO x = 0 TO numOfCodePages-1
       dispLine = dispLine" "array.x.y
    END x
    SAY "" y "("D2C(y)"):" dispLine
 END y
 
 EXIT  /****************EXECUTION ENDS HERE****************/
 
 
 FourChars2Hex:
 ARG offset
 RETURN C2X(Reverse(Substr(secString,offset,4)))
 
 
 TwoChars2Hex:
 ARG offset
 RETURN C2X(Reverse(Substr(secString,offset,2)))
 
 
 Help:
 SAY "Purpose:"
 SAY " CP decodes the CodePage Directory sector &"
 SAY " the CodePage sector on a HPFS volume"
 SAY
 SAY "Example:"
 SAY " CP C:"
 EXIT
 

Figure 11: The CP.cmd REXX program. Requires SECTOR.DLL.

While REXX does not support arrays it does have compound variables and I've used a CV called "array" to store the contents of each CP's mapping table. The design only deals with the first 31 CP Info entries (that should be more than enough anyway) and accommodates additional CPs by adding new columns to the display.

Armed with this printout you can experiment with different collating sequences when switching CPs. You can check out your current CP by typing "CHCP" and then switch to a different CP by issuing, say, "CHCP 850". I used "REM > File[Alt-nnn]" to create zero-length files, with one or more high-order ASCII characters in their filenames, as test fodder.

Conclusion

In this instalment you've learned how to decode the contents of the data band usage bitmaps and how to display the contents of the Code Page mapping tables. Next time we'll examine B-trees, DIRBLKs and DIRENTs.