Inside INF
Latest revision as of 01:59, 12 February 2023
by Peter Childs
Introduction
The Problem
As I sat debating the approach I would adopt for this article it became distressingly obvious that there would be only a small audience of fanatics that would share my enthusiasm for the intricate details of the INF file format.
I also had to painfully admit that others would not even know that INF/HLP files are the backbone of OS/2's online help system, and that most of those that did would be quite happy to consider the issue finished at that point.
The Solution
In this article I will attempt to present an overview of the OS/2 INF/HLP file format in such a manner that a cursory read will leave the reader with a general idea of the file format. I will also explain some of the compression ideas used in the INF files and provide additional information for those wishing to investigate deeper.
As my talents as a programmer are fairly limited I will not include large chunks of source. I am, however, working on developing some C++ classes to allow easy access to the information in INF files. If there is sufficient demand I may do an article showing the possible use of these classes.
What does it mean?
Some Basic Terms
In this document we will be discussing the INF/HLP file format.
INF files and HLP files are basically identical with the exception that INF files are generally designed to be viewed with the view.exe program, like this magazine, and HLP files are developed to provide online help for applications. To the best of my knowledge the file format is identical except for a single flag bit. From here on, when I refer to INF files I mean both INF and HLP files.
INF files are compiled with the IPFC (Information Presentation Facility Compiler) available with most OS/2 compilers, and the OS/2 Toolkits. The source used is a form of fairly simple markup, with the power to do just about anything you could want.
The IPF Online Reference (1st Ed 1994) describes IPF:
The Information Presentation Facility (IPF) is a tool that enables you to create online information, to specify how it will appear on the screen, to connect various parts of the information, and to provide help information that can be requested by the user.
It is important to realize the difference between IPF source markup, and INF files. The INF files are the compiled versions, and the difference is as marked as the difference between C source and an executable. The IPF source markup is well documented, whereas the INF file format is not officially documented.
In the beginning
The Header
The header is described here as a structure. When I began playing with INF files I just defined this structure and then read 155 bytes into it starting at offset 0. Although this worked fine for Borland C++ I had to muck with things with gcc to force the structure to be packed.
Most compilers offer a method of packing structures, but if your code has to be portable then you will also have to consider the big-endian/little-endian issue (i.e. the bytes are stored differently in memory on SPARCs than on PCs). Although probably obvious to most programmers, this had me stumped!
Starting at file offset 0 the following structure resembles the header (packed):
struct os2infheader
{
int_16 ID; // ID magic word (5348h = "HS")
int_8 unknown1; // unknown purpose, could be third letter of ID
int_8 flags; // probably a flag word...
// bit 0: set if INF style file
// bit 4: set if HLP style file
// patching this byte allows reading HLP files
// using the VIEW command, while help files
// seem to work with INF settings here as well.
int_16 hdrsize; // total size of header
int_16 unknown2; // unknown purpose
int_16 ntoc; // 16 bit number of entries in the tocarray
int_32 tocstrtablestart; // 32 bit file offset of the start of the
// strings for the table-of-contents
int_32 tocstrlen; // number of bytes in file occupied by the
// table-of-contents strings
int_32 tocstart; // 32 bit file offset of the start of tocarray
int_16 nres; // number of panels with resource numbers
int_32 resstart; // 32 bit file offset of resource number table
int_16 nname; // number of panels with textual name
int_32 namestart; // 32 bit file offset to panel name table
int_16 nindex; // number of index entries
int_32 indexstart; // 32 bit file offset to index table
int_32 indexlen; // size of index table
int_8 unknown3[10]; // unknown purpose
int_32 searchstart; // 32 bit file offset of full text search table
int_32 searchlen; // size of full text search table
int_16 nslots; // number of "slots"
int_32 slotsstart; // file offset of the slots array
int_32 dictlen; // number of bytes occupied by the
// "dictionary"
int_16 ndict; // number of entries in the dictionary
int_32 dictstart; // file offset of the start of the dictionary
int_32 imgstart; // file offset of image data
int_8 unknown4; // unknown purpose
int_32 nlsstart; // 32 bit file offset of NLS table
int_32 nlslen; // size of NLS table
int_32 extstart; // 32 bit file offset of extended data block
int_8 unknown5[12]; // unknown purpose
char8 title[48]; // ASCII title of database
}
Figure 1) INF header structure
Most of these values come in handy.
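As an alternative to overlaying a packed structure, the 155 header bytes can be read into a plain buffer and the fields picked out by offset. The following is only a sketch under stated assumptions: the file is little-endian, the field order and sizes in Figure 1 are accurate (the offsets below are simply the running sum of those sizes), and the title is NUL-padded. The names Bytes, rd16, rd32, InfHeaderLite and parseHeader are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Assemble little-endian values byte by byte, so the result does not
// depend on the host's byte order or on structure packing.
inline std::uint16_t rd16(const Bytes& b, std::size_t off)
{
    assert(off + 2 <= b.size());
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

inline std::uint32_t rd32(const Bytes& b, std::size_t off)
{
    assert(off + 4 <= b.size());
    return static_cast<std::uint32_t>(b[off])
         | (static_cast<std::uint32_t>(b[off + 1]) << 8)
         | (static_cast<std::uint32_t>(b[off + 2]) << 16)
         | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Hypothetical subset of the header; each offset is the sum of the sizes
// of the fields that precede it in Figure 1.
struct InfHeaderLite
{
    std::uint16_t id;         // offset 0
    std::uint8_t  flags;      // offset 3
    std::uint16_t ntoc;       // offset 8
    std::uint32_t tocstart;   // offset 18
    std::uint16_t nslots;     // offset 62
    std::uint32_t slotsstart; // offset 64
    std::uint32_t dictlen;    // offset 68
    std::uint16_t ndict;      // offset 72
    std::uint32_t dictstart;  // offset 74
    std::uint32_t imgstart;   // offset 78
    std::string   title;      // offset 107, 48 bytes
};

// h must hold at least the first 155 bytes of the file.
InfHeaderLite parseHeader(const Bytes& h)
{
    assert(h.size() >= 155);
    InfHeaderLite r;
    r.id         = rd16(h, 0);
    r.flags      = h[3];
    r.ntoc       = rd16(h, 8);
    r.tocstart   = rd32(h, 18);
    r.nslots     = rd16(h, 62);
    r.slotsstart = rd32(h, 64);
    r.dictlen    = rd32(h, 68);
    r.ndict      = rd16(h, 72);
    r.dictstart  = rd32(h, 74);
    r.imgstart   = rd32(h, 78);
    // Assumption: the title is NUL-padded; stop at the first NUL or at 48 bytes.
    std::size_t n = 0;
    while (n < 48 && h[107 + n] != 0) ++n;
    r.title.assign(reinterpret_cast<const char*>(h.data() + 107), n);
    return r;
}
```

Because every field is rebuilt from individual bytes, the same code should behave identically on big-endian and little-endian hosts, and no compiler-specific packing directive is needed.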
Our Sample File
Below is a simple IPF (source) file which I have compiled into an INF for use in examples.
:userdoc.
:title. Sample INF file...
:h1.Header One
:p.This is a test. Os/2, lies, and Windows 95.
:p.1234.5
:artwork name='in_inf.bmp'.
Hello
:p.
:artwork name='tocarray.bmp'.
:i1. This is an index entry
:euserdoc.
Figure 2) Sample IPF file
Master Dictionary
Each INF file has a master dictionary which holds all of the words and symbols used in the articles that make up the INF file. The dictionary starts at offset dictstart, has a length of dictlen bytes, and comprises ndict words.
Figure 3) Master dictionary layout (in the file)
In the example case above the dictionary is like this:
[0] : (,)
[1] : (.)
[2] : (/)
[3] : (1234)
[4] : (2)
[5] : (5)
[6] : (95)
[7] : (a)
[8] : (and)
[9] : (Hello)
[10] : (is)
[11] : (lies)
[12] : (Os)
[13] : (test)
[14] : (This)
[15] : (Windows)
Figure 4) Master dictionary for the IPF sample
Some things to note are that the source contains the word Os/2, whereas the dictionary contains the words '/', '2', and 'Os'.
One way of loading the dictionary is detailed below (C++ code snippet).
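// assumes 'is' is an ifstream on the INF file, opened in binary mode
// read the raw dictionary; one spare byte terminates the last word
char *dictstore = new char[ infHeader.dictlen + 1 ];
is.seekg( infHeader.dictstart );
is.read( dictstore, infHeader.dictlen );
dictstore[ infHeader.dictlen ] = '\0';
dict = new char*[ infHeader.ndict ]; // our array of pointers
// change all length bytes to '\0' and set pointers
// to start of each word
int i = 0, j = 0;
while( i < infHeader.dictlen && j < infHeader.ndict )
{
int add = (unsigned char) dictstore[i]; // length byte, counts itself
if( add == 0 ) break; // malformed entry, avoid looping forever
dict[j++] = &( dictstore[i+1] );
dictstore[i] = '\0';
i += add;
}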
Figure 5) C++ code for loading the dictionary
The method used is irrelevant but you need some way of mapping i to the i'th element of the dictionary. Also don't forget to delete the allocated memory if you use the above sample with:
delete[] dictstore; delete[] dict;
Figure 6) C++ to delete the dictionary
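For readers who would rather avoid manual memory management, the same mapping can be sketched with standard containers. This is an illustration, not the code shipped with the article: loadDictionary is a made-up name, raw is assumed to hold the dictlen bytes read from offset dictstart, and each entry is assumed to be a length byte (counting itself) followed by the characters of the word, which is what the "i += add" step in Figure 5 implies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// raw:   the dictlen bytes read from file offset dictstart
// ndict: number of words announced by the header
std::vector<std::string> loadDictionary(const std::vector<unsigned char>& raw,
                                        std::size_t ndict)
{
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < raw.size() && words.size() < ndict) {
        std::size_t len = raw[i];          // length byte, counts itself
        if (len == 0 || i + len > raw.size())
            break;                         // malformed entry: stop quietly
        words.emplace_back(
            reinterpret_cast<const char*>(raw.data() + i + 1), len - 1);
        i += len;
    }
    return words;
}
```

With this version words[i] is the i'th dictionary entry, and nothing has to be deleted by hand afterwards.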
Articles
Each article in an INF file consists of one or more slots. There are several structures that deal with slots. One is an array of offsets mapping i to the i'th slot's position in the file. Another is the structure of the slot itself. Each slot also has a local dictionary that maps items in the slot to words in the master dictionary.
The Slots Array
Beginning at file offset slotsstart (from the header) there is an array of nslots int_32 values. Element i of the array is the offset in the INF file at which the i'th slot can be found.
int_32 slots[nslots]
Figure 7) Slots area declaration
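A table of 32-bit offsets like this one can be read with a small helper. The sketch below assumes the whole file has already been loaded into memory and that the offsets are little-endian; readOffsetTable is a hypothetical name. The table-of-contents offsets (ntoc entries starting at tocstart) have the same shape, so the same helper should apply there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// file:  the complete INF file image
// start: file offset of the table (for example slotsstart)
// count: number of entries (for example nslots)
std::vector<std::uint32_t> readOffsetTable(const std::vector<unsigned char>& file,
                                           std::size_t start, std::size_t count)
{
    if (start > file.size() || count > (file.size() - start) / 4)
        throw std::out_of_range("offset table runs past end of file");

    std::vector<std::uint32_t> out;
    out.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        const unsigned char* p = file.data() + start + 4 * k;
        // little-endian: least significant byte first
        out.push_back(static_cast<std::uint32_t>(p[0])
                    | (static_cast<std::uint32_t>(p[1]) << 8)
                    | (static_cast<std::uint32_t>(p[2]) << 16)
                    | (static_cast<std::uint32_t>(p[3]) << 24));
    }
    return out;
}
```

A call such as readOffsetTable(file, slotsstart, nslots) would then give slots[i] for every slot.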
The Slots Themselves
Beginning at the file offset slots[i] the following structure can overlay the file:
{
int_8 stuff; // ?? [always seen 0]
int_32 localdictpos; // file offset of the local dictionary
int_8 nlocaldict; // number of entries in the local dictionary
int_16 ntext; // number of bytes in the text
int_8 text[ntext]; // encoded text of the article
}
Figure 8) Slots structure
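The slot structure can be pulled apart without relying on structure packing. In this sketch the field offsets (1, 5, 6 and 8) are just the running sum of the field sizes in Figure 8, the values are assumed to be little-endian, and Slot and readSlot are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Slot
{
    std::uint32_t localdictpos;        // file offset of the local dictionary
    std::uint8_t  nlocaldict;          // entries in the local dictionary
    std::vector<unsigned char> text;   // encoded text, kept as unsigned bytes
};

// file: the complete INF file image; pos: slots[i] from the slots array
Slot readSlot(const std::vector<unsigned char>& file, std::size_t pos)
{
    if (pos > file.size() || file.size() - pos < 8)
        throw std::out_of_range("slot header truncated");
    const unsigned char* p = file.data() + pos;

    Slot s;
    // p[0] is the 'stuff' byte, which is skipped here
    s.localdictpos = static_cast<std::uint32_t>(p[1])
                   | (static_cast<std::uint32_t>(p[2]) << 8)
                   | (static_cast<std::uint32_t>(p[3]) << 16)
                   | (static_cast<std::uint32_t>(p[4]) << 24);
    s.nlocaldict = p[5];
    std::size_t ntext = static_cast<std::size_t>(p[6] | (p[7] << 8));
    if (file.size() - pos - 8 < ntext)
        throw std::out_of_range("slot text truncated");
    s.text.assign(p + 8, p + 8 + ntext);
    return s;
}
```

Keeping the text as unsigned bytes matters later on, because the special codes 0xfa to 0xff would compare as negative numbers if the bytes were stored in a signed type.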
The Local Dictionary
The local dictionary is used to map items in the encoded text of the slot to words in the master dictionary. Take note that the nlocaldict variable in a slot's structure is a byte in size, hence a single slot can only have a maximum of 255 (really 250 - we'll discuss that later) different words from the master dictionary in it.
Beginning at file offset localdictpos (for each slot) there is an array:
int_16 localdict[nlocaldict]
Figure 9) Local dictionary declaration
The Text Itself
The encoded text is decoded somewhat like the following:
bool space = true;
// text[] must be treated as unsigned bytes for the cases to match
for( int i = 0; i < ntext; i++ )
switch( text[i] )
{
case 0xfa: // end of paragraph, sets space to true
break;
case 0xfb: // [unknown]
break;
case 0xfc: // space = !space
break;
case 0xfd: // line break, set space to true if not in a
// monospaced example
break;
case 0xfe: // space
break;
case 0xff: // escape code
break;
default: // output dict[localdict[text[i]]] and, if
// space==true, a space.
break;
}
Figure 10) Sample code for decoding text
It is pretty obvious that this doesn't leave a lot of space for formatting commands. This is where the escape codes come in. The general format for an escape code is:
{
int_8 FF; // always equals 0xFF
int_8 esclen; // length of sequence
// (including esclen, excluding 0xFF)
int_8 escCode; // which escape code
}
Figure 11) Escape codes structure
These escape codes define things like setting margins, inter-document links, and the like. We will ignore them here for the moment. They are described in inf03.txt.
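Even when the escape codes are ignored, a decoder has to step over them correctly. The sketch below splits one escape sequence into its code and argument bytes using only the length rule from Figure 11 (esclen counts itself but not the 0xFF byte). Escape and readEscape are hypothetical names, and no meaning is attached to the individual codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Escape
{
    std::uint8_t code;                 // escCode
    std::vector<unsigned char> args;   // bytes following escCode
    std::size_t next;                  // index of the first byte after the sequence
};

// text: encoded slot text; pos: index of the 0xFF byte
Escape readEscape(const std::vector<unsigned char>& text, std::size_t pos)
{
    if (pos + 2 >= text.size() || text[pos] != 0xFF)
        throw std::invalid_argument("not at an escape sequence");
    std::size_t esclen = text[pos + 1];
    // esclen covers itself and escCode, so anything below 2 is malformed
    if (esclen < 2 || pos + 1 + esclen > text.size())
        throw std::out_of_range("escape sequence truncated");

    Escape e;
    e.code = text[pos + 2];
    e.args.assign(text.data() + pos + 3, text.data() + pos + 1 + esclen);
    e.next = pos + 1 + esclen;
    return e;
}
```

For the sequence 0xff 0x07 0x0E 0x01 0x00 0x00 0x00 0x00 quoted in the Bitmaps section, this yields code 0x0E, five argument bytes, and a next index of 8.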
So Show Me Something That Works!
Ok, here's a snippet that shows the basic idea behind decoding slots. For the sake of simplicity we ignore most of the nasty stuff. You should also note that although each slot contains some text, the way these slots fit together is described next, in Table of Contents.
Included with this issue of EDM/2 is the source for a small program, called exttext.cc, that extracts all the textual information from an INF file. Also included is a simple INF header class.
A quick note about decoding articles that isn't mentioned in inf02a.doc: when you decode multi-slot articles, the state of space (i.e. true or false) is retained between one slot and the next. Basically, although the local dictionary changes, pretend that each slot is merged onto the next with regard to settings like the left and right margins, fonts, colours, font styles, and space.
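The decoding loop from Figure 10 can be fleshed out into something that produces plain text. What follows is a rough sketch, not the exttext.cc program mentioned above: decodeSlot and DecodeState are invented names, escape sequences are skipped rather than interpreted, the monospaced special case for 0xfd is ignored, and emitting a newline for paragraph ends and line breaks is purely a choice made for this example. The space flag lives in a state object so that it carries over from one slot to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct DecodeState
{
    bool space = true;   // carried over between the slots of an article
};

std::string decodeSlot(const std::vector<unsigned char>& text,
                       const std::vector<std::uint16_t>& localdict,
                       const std::vector<std::string>& dict,
                       DecodeState& st)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char c = text[i];
        switch (c) {
        case 0xfa: out += '\n'; st.space = true; break; // end of paragraph
        case 0xfb: break;                               // unknown, ignored
        case 0xfc: st.space = !st.space; break;         // toggle spacing
        case 0xfd: out += '\n'; st.space = true; break; // line break (simplified)
        case 0xfe: out += ' '; break;                   // explicit space
        case 0xff: {
            // skip the escape: esclen counts itself but not the 0xFF byte
            if (i + 1 >= text.size()) return out;
            std::size_t esclen = text[i + 1];
            i += (esclen ? esclen : 1);  // the loop's ++i steps past the last byte
            break;
        }
        default:
            // local index -> master dictionary index -> word
            if (c < localdict.size() && localdict[c] < dict.size()) {
                out += dict[localdict[c]];
                if (st.space) out += ' ';
            }
            break;
        }
    }
    return out;
}
```

To decode a multi-slot article, one DecodeState would be created per article and passed to decodeSlot for each of its slots in turn, with only the local dictionary changing between calls.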
Table of Contents
The table of contents is created by loading in an array of [ntoc] 32-bit offsets, starting at file offset tocstart.
At each offset ( tocentrystart[i] ) a toc entry structure is located that contains information including the title, the item's level in the table of contents, whether it is hidden or not, and, most importantly, how many and which slots make up the item. There is also a 'has_children' flag which, if true, means the following entry has a higher level.
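Loading the offset array might look something like the sketch below. It works on a copy of the file held in memory and assumes the offsets are stored little-endian (low byte first), as you would expect from a file written on Intel hardware; readLE32 and readTocOffsets are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads a 32-bit little-endian value from buf at pos.
std::uint32_t readLE32(const std::vector<std::uint8_t>& buf, std::size_t pos)
{
    if (pos + 4 > buf.size())
        throw std::out_of_range("readLE32: past end of buffer");
    return  static_cast<std::uint32_t>(buf[pos])
         | (static_cast<std::uint32_t>(buf[pos + 1]) << 8)
         | (static_cast<std::uint32_t>(buf[pos + 2]) << 16)
         | (static_cast<std::uint32_t>(buf[pos + 3]) << 24);
}

// Returns the ntoc file offsets that start at tocstart.
// Each returned value is where one toc entry structure begins.
std::vector<std::uint32_t> readTocOffsets(const std::vector<std::uint8_t>& file,
                                          std::size_t tocstart,
                                          std::size_t ntoc)
{
    std::vector<std::uint32_t> tocentrystart;
    for (std::size_t i = 0; i < ntoc; ++i)
        tocentrystart.push_back(readLE32(file, tocstart + 4 * i));
    return tocentrystart;
}
```

Assembling the value byte by byte, rather than casting a pointer, means the same code gives the same answer on big-endian and little-endian machines.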

Index
The index is pretty simple and relies on the table of contents a fair bit. Beginning at file offset indexstart there are nindex structures like the following:
{
int_8 nword; // size of name
int_8 level; // indent level
int_8 stuff;
int_16 toc; // toc entry number of panel
char8 word[nword]; // index word [not zero-terminated]
}
Figure 13) Index structure
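Because the word is not zero-terminated and its length varies, the index has to be walked one entry at a time. The sketch below does that over an in-memory copy of the file. It assumes the entries are packed back to back and that toc is stored little-endian; IndexEntry and readIndex are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// In-memory form of one index entry from Figure 13.
struct IndexEntry {
    std::uint8_t  level;    // indent level
    std::uint8_t  stuff;    // meaning not covered here
    std::uint16_t toc;      // toc entry number of panel
    std::string   word;     // index word
};

// Reads nindex entries starting at indexstart.
std::vector<IndexEntry> readIndex(const std::vector<std::uint8_t>& file,
                                  std::size_t indexstart, std::size_t nindex)
{
    std::vector<IndexEntry> entries;
    std::size_t pos = indexstart;
    for (std::size_t i = 0; i < nindex; ++i) {
        if (pos + 5 > file.size())      // 5 = fixed part of the structure
            throw std::out_of_range("index header past end of file");
        const std::size_t nword = file[pos];
        IndexEntry e;
        e.level = file[pos + 1];
        e.stuff = file[pos + 2];
        e.toc   = static_cast<std::uint16_t>(file[pos + 3] |
                                             (file[pos + 4] << 8));
        pos += 5;
        if (pos + nword > file.size())
            throw std::out_of_range("index word past end of file");
        e.word.assign(file.begin() + pos, file.begin() + pos + nword);
        pos += nword;                   // next entry follows immediately
        entries.push_back(e);
    }
    return entries;
}
```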
Bitmaps
I am only going to cover this at a superficial level, and I am only going to describe the compression used in newer INF files. The older (i.e. v1.3 etc.) INF files use a proprietary compression scheme.
The newer INF files use an LZW-based compression scheme. This scheme is basically the same as the one covered in 'LZW Revisited' (Dr. Dobb's, June 1990). You must alter the decompression code to use MAX_BITS 12, or you will spend a long time figuring out why, after input byte 512, your output is all wrong [grin].
Getting Started
There seems to be no array of bitmap offsets in the file anywhere but there is a general start for the image information - imgstart.
Here's the basic rundown on decompression:
When, during the decompression of a slot, you come across a sequence something like 0xff + 0x07 + 0x0E + 0x01 + 0x00 + 0x00 + 0x00 + 0x00
you know it is an escape code (0xff) of length 7 bytes (0x07), and that the code is for a bitmap/metafile (0x0E). The next byte is the flags byte and it breaks down like this:
if( items[i] & 0x01 ) printf ("Left "); // 00000001
if( items[i] & 0x02 ) printf ("Right "); // 00000010
if( items[i] & 0x04 ) printf ("Center "); // 00000100
if( items[i] & 0x08 ) printf ("Fit "); // 00001000
if( items[i] & 0x10 ) printf ("Runin "); // 00010000
Figure 14) Alignment for bitmaps
The next four bytes are a 32-bit offset from imgstart to the bitmap/metafile (i.e. you do an is.seekg( imgstart + offset )). Also remember here that if you are writing for cross-platform support you'll have to deal with the big-endian/little-endian issue.
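Putting the flags byte and the offset together, a sketch of decoding the whole eight-byte sequence could look like this. BitmapRef, parseBitmapEscape and alignmentNames are hypothetical names, the length check follows the example sequence above, and the offset is assumed to be stored little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// What the bitmap escape code tells us.
struct BitmapRef {
    std::uint8_t  flags;    // alignment bits, see Figure 14
    std::uint32_t offset;   // offset from imgstart to the image
};

// seq points at the 0xFF byte; len is the number of bytes available.
bool parseBitmapEscape(const std::uint8_t* seq, std::size_t len, BitmapRef& ref)
{
    if (len < 8 || seq[0] != 0xFF || seq[1] != 0x07 || seq[2] != 0x0E)
        return false;
    ref.flags  = seq[3];
    // Build the offset byte by byte so host endianness does not matter.
    ref.offset =  static_cast<std::uint32_t>(seq[4])
               | (static_cast<std::uint32_t>(seq[5]) << 8)
               | (static_cast<std::uint32_t>(seq[6]) << 16)
               | (static_cast<std::uint32_t>(seq[7]) << 24);
    return true;
}

// Same bit tests as Figure 14, collected into a string.
std::string alignmentNames(std::uint8_t flags)
{
    std::string s;
    if (flags & 0x01) s += "Left ";
    if (flags & 0x02) s += "Right ";
    if (flags & 0x04) s += "Center ";
    if (flags & 0x08) s += "Fit ";
    if (flags & 0x10) s += "Runin ";
    return s;
}
```

The image itself would then be found at file position imgstart + ref.offset.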
The Bitmap Header and Colour table
If you do the seek and read in the next two bytes you'll be able to know what sort of image comes next. mf means a metafile (I think - I haven't seen one!). BM means the old bitmap compression, and bM is the one that we are happy to see.
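The two-byte check is trivial, but since the three tags differ only in case it is easy to get wrong. A small sketch, with ImageKind and classifyImage being names invented here:

```cpp
// The kinds of image described in the text, plus a catch-all.
enum class ImageKind {
    Metafile,       // "mf" (not verified, see the caveat in the text)
    OldBitmap,      // "BM", the old bitmap compression
    LzwBitmap,      // "bM", the LZW-based scheme covered here
    Unknown
};

// a and b are the first two bytes read at imgstart + offset.
ImageKind classifyImage(char a, char b)
{
    if (a == 'm' && b == 'f') return ImageKind::Metafile;
    if (a == 'B' && b == 'M') return ImageKind::OldBitmap;
    if (a == 'b' && b == 'M') return ImageKind::LzwBitmap;
    return ImageKind::Unknown;
}
```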
So, if everything is OK (ie 'bM') then read in a basic OS2BITMAP_FILEHEADER and OS2BITMAP_INFOHEADER. Something like this:
{ // BITMAP FILE HEADER
char8 usType[2]; // = 'bM';
int_32 cbSize;
int_16 xHotSpot;
int_16 yHotSpot;
int_32 offBits; // =size(hdr)+size(colortbl)
// BITMAP INFO HEADER
int_32 cbFix; // =size(info_hdr) (usually = 12?)
int_16 cx; // x size
int_16 cy; // y size
int_16 cPlanes; // color planes
int_16 cBitCount;
}
Figure 15) Bitmap information structure
A quick note: if you are going to use this structure "as is" to dump to a bitmap file, then you'll have to change offBits like this:
offBits = 14 + 12 + ( 3 * ( 1 << cBitCount ) );
Figure 16) Code change to use structure in figure 15
Next up after the header comes the colour table, which is basically an array of ( 1 << cBitCount ) RGB entries (well, actually 1 byte Blue, 1 byte Green, 1 byte Red).
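To make the arithmetic concrete, here is a sketch that computes the corrected offBits and reads the colour table from an in-memory buffer. The names Rgb, colourCount, fixedOffBits and readColourTable are made up for this example, and it assumes a palettised bitmap (a cBitCount of 1, 4 or 8), since the formula only makes sense when a colour table is present.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Rgb { std::uint8_t red, green, blue; };

// Number of colour table entries for a palettised bitmap.
std::size_t colourCount(unsigned cBitCount)
{
    return static_cast<std::size_t>(1) << cBitCount;
}

// 14 bytes of file header + 12 bytes of info header + 3 bytes per colour.
std::uint32_t fixedOffBits(unsigned cBitCount)
{
    return static_cast<std::uint32_t>(14 + 12 + 3 * colourCount(cBitCount));
}

// Reads the colour table starting at buf[pos]; the file order is B, G, R.
std::vector<Rgb> readColourTable(const std::vector<std::uint8_t>& buf,
                                 std::size_t pos, unsigned cBitCount)
{
    const std::size_t n = colourCount(cBitCount);
    if (pos + 3 * n > buf.size())
        throw std::out_of_range("colour table past end of buffer");
    std::vector<Rgb> table(n);
    for (std::size_t i = 0; i < n; ++i) {
        table[i].blue  = buf[pos + 3 * i];
        table[i].green = buf[pos + 3 * i + 1];
        table[i].red   = buf[pos + 3 * i + 2];
    }
    return table;
}
```

For a 16-colour image (cBitCount of 4) this gives 16 entries and an offBits of 14 + 12 + 48 = 74.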
Data Blocks
Next up (after all that) comes the Master Data Block, and one (or more) minor data blocks - each with their own compression type.
{ // Master Data Block
int_32 num_to_follow; // total number of bytes to follow
int_16 uncompressed_bytes; // uncompressed bytes in each block
}
{ // Minor Data Block
int_16 x_bytes_to_follow; // number of bytes in this block (to follow)
int_8 comp_type; // compression type 0=uncompressed, 2=lzw-based
}
Figure 17) Data block structures
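The two headers in Figure 17 are 6 and 3 bytes long respectively, so reading them is mostly a matter of getting the byte order right. The sketch below parses them from an in-memory buffer, assuming little-endian storage; the struct and function names are illustrative, and it deliberately makes no claim about exactly which bytes the two length fields count.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct MasterBlock {
    std::uint32_t num_to_follow;        // total number of bytes to follow
    std::uint16_t uncompressed_bytes;   // uncompressed bytes in each block
};

struct MinorBlock {
    std::uint16_t x_bytes_to_follow;    // bytes in this block (to follow)
    std::uint8_t  comp_type;            // 0 = uncompressed, 2 = LZW-based
};

// Both readers return false if the header would run past the buffer.
bool readMaster(const std::vector<std::uint8_t>& buf, std::size_t pos,
                MasterBlock& m)
{
    if (pos + 6 > buf.size()) return false;
    m.num_to_follow =  static_cast<std::uint32_t>(buf[pos])
                    | (static_cast<std::uint32_t>(buf[pos + 1]) << 8)
                    | (static_cast<std::uint32_t>(buf[pos + 2]) << 16)
                    | (static_cast<std::uint32_t>(buf[pos + 3]) << 24);
    m.uncompressed_bytes =
        static_cast<std::uint16_t>(buf[pos + 4] | (buf[pos + 5] << 8));
    return true;
}

bool readMinor(const std::vector<std::uint8_t>& buf, std::size_t pos,
               MinorBlock& m)
{
    if (pos + 3 > buf.size()) return false;
    m.x_bytes_to_follow =
        static_cast<std::uint16_t>(buf[pos] | (buf[pos + 1] << 8));
    m.comp_type = buf[pos + 2];
    return true;
}
```

A reader would call readMaster once, then readMinor for each block, dispatching on comp_type to either copy the data or hand it to the LZW routine.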
To help with your understanding here is the output from a program I wrote while working out the decompression routine.
[Editor's note - my word processor completely destroyed the alignment that was present in the output, so I feel forced to delete it. My apologies.]
Parting words
All my knowledge of the INF file format stemmed from the work of others. I hopefully have added some small bits of useful information and my motivation for writing this article is to make that information available to others.
The document that encouraged me to investigate the INF file format is available at OS/2 2.0 Information Presentation Facility (IPF) Data Format and was authored by Carl Hauser, and updated by Marcus Groeber. I lifted lots of stuff out of it for this article and have included inf03.txt as a slightly updated version.