Jump to content

OS/2 2.0 Information Presentation Facility (IPF) Data Format: Difference between revisions

From EDM2
No edit summary
Ak120 (talk | contribs)
mNo edit summary
 
(13 intermediate revisions by 3 users not shown)
Line 1: Line 1:
by [[Carl Hauser]] and [[Marcus Groeber]]
''by [[Carl Hauser]], [[Marcus Groeber]] and [[User:graemeg|Graeme Geldenhuys]]''


==Carl Hauser's Introduction==
==Carl Hauser's Introduction==
 
Having become extremely frustrated by VIEW.EXE's penchant for windows that come and go, without even opening large enough to see everything in them, I thought I'd try to turn .INF files into something more conventional. While I don't have code to offer, I can tell you what I learned about .INF format it was enough to produce more-or-less readable more-or-less plaintext from .INFs.
Having become extremely frustrated by VIEW.EXE's penchant for windows that come and go, without even opening large enough to see everything in them, I thought I'd try to turn .INF files into something more conventional. While I don't have code to offer, I can tell you what I learned about .INF format--it was enough to produce more-or-less readable more-or-less plaintext from .INFs.


I offer this in the hope that somebody will give the community a really nice, tasteful, convenient, doesn't-use-too-much-screen-real-estate .INF browser to replace VIEW.EXE.
I offer this in the hope that somebody will give the community a really nice, tasteful, convenient, doesn't-use-too-much-screen-real-estate .INF browser to replace VIEW.EXE.


All of this was developed by looking at .INF files without any documentation
All of this was developed by looking at .INF files without any documentation of the format except what VIEW.EXE showed for a particular feature.
of the format except what VIEW.EXE showed for a particular feature.
 
I don't have a lot of personal interest in refining this document with additional escape sequences, etc., but I would be happy to correspond with someone who wanted to fill in the details, or to clarify anything that may be confusing.  If someone could point us to an official document describing the format that would be most helpful.


I don't have a lot of personal interest in refining this document with additional escape sequences, etc., but I would be happy to correspond with someone who wanted to fill in the details, or to clarify anything that may be confusing. If someone could point us to an official document describing the format that would be most helpful.


==Marcus Groeber's Introduction==
==Marcus Groeber's Introduction==
The original document contained most of the real tricky stuff in the file format (especially the compression algorithm) so going on from there was mainly a task of creating lots of help files using the IPFC and the decompiling them again to see what came out.
The original document contained most of the real tricky stuff in the file format (especially the compression algorithm) so going on from there was mainly a task of creating lots of help files using the IPFC and the decompiling them again to see what came out.


Line 22: Line 18:


Further research should go into the way multiple windows are handled (I didn't work on that because I have never required multiple window displays in my help files and therefore am not familiar with the concepts). Font usage and graphics linking could also use some more fiddling around.
Further research should go into the way multiple windows are handled (I didn't work on that because I have never required multiple window displays in my help files and therefore am not familiar with the concepts). Font usage and graphics linking could also use some more fiddling around.


==Types==
==Types==
All numeric quantities are least-significant-byte first in the file(little-endian).
All numeric quantities are least-significant-byte first in the file(little-endian).


Line 36: Line 30:


==The File Header==
==The File Header==
Starting at file offset 0 the following structure can overlay the file to provide some starting values:
Starting at file offset 0 the following structure can overlay the file to provide some starting values:
         {
         {
Line 80: Line 73:


==The table of contents array==
==The table of contents array==
Beginning at file offset tocstart, this structure can overlay the file:
Beginning at file offset tocstart, this structure can overlay the file:
         {
         {
Line 88: Line 80:


==The table of contents entries==
==The table of contents entries==
Beginning at each file offset, tocentrystart[i]:
Beginning at each file offset, tocentrystart[i]:
         {
         {
Line 113: Line 104:
         }
         }


if extended is 1 there are intervening bytes that (I think) describe
if extended is 1 there are intervening bytes that (I think) describe the kind, size and position of the window in which to display the article. I haven't decoded these bytes, though in most cases the following tells how many there are. Overlay the following on the next
the kind, size and position of the window in which to display the
two bytes:
article. I haven't decoded these bytes, though in most cases the
{
following tells how many there are. Overlay the following on the next
    int8 w1;
two bytes
    int8 w2;
        {
}
          int8 w1;
Here's a C code fragment for computing the number of bytes to skip:
          int8 w2;
int bytestoskip = 0;
        }
if (w1 & 0x8) { bytestoskip += 2 };
Here's a C code fragment for computing the number of bytes to skip
if (w1 & 0x1) { bytestoskip += 5 };
        int bytestoskip = 0;
if (w1 & 0x2) { bytestoskip += 5 };
        if (w1 & 0x8) { bytestoskip += 2 };
if (w2 & 0x4) { bytestoskip += 2 };
        if (w1 & 0x1) { bytestoskip += 5 };
        if (w1 & 0x2) { bytestoskip += 5 };
        if (w2 & 0x4) { bytestoskip += 2 };


skip over bytestoskip bytes (after w2) and find the tocslots and title
skip over bytestoskip bytes (after w2) and find the tocslots and title
Line 133: Line 121:


==The Slots array==
==The Slots array==
Beginning at file offset slotsstart (provided by the file header) find
Beginning at file offset slotsstart (provided by the file header) find
        {
{
          int32 slots[nslots];      // file offset of the article
    int32 slots[nslots];      // file offset of the article
                                      // corresponding to this slot
                              // corresponding to this slot
        }
}


==The Dictionary==
==The Dictionary==
 
Beginning at file offset dictstart (provided by the file header) and continuing until ndict entries have been read (and dictlen bytes have been consumed from the file) find a sequence of length-preceeded strings. Note that the length includes the length byte (not Pascal
Beginning at file offset dictstart (provided by the file header) and
compatible!). Build a table mapping i to the ith string.
continuing until ndict entries have been read (and dictlen bytes have
{
been consumed from the file) find a sequence of length-preceeded
    char8*    strings[ndict];
strings. Note that the length includes the length byte (not Pascal
}
compatible!). Build a table mapping i to the ith string.
        {
          char8*    strings[ndict];
        }


==The Article entries==
==The Article entries==
 
Beginning at file offset slots[i] the following structure can overlay the file:
Beginning at file offset slots[i] the following structure can overlay
the file:
         {
         {
           int8      stuff;          // ?? [always seen 0]
           int8      stuff;          // ?? [always seen 0]
Line 164: Line 145:


==The Local dictionary==
==The Local dictionary==
 
Beginning at file position localdictpos (for each article) there is an array:
Beginning at file position localdictpos (for each article) there is an
{
array:
      int16      localwords[nlocaldict];
        {
}
          int16      localwords[nlocaldict];
        }


==The Text==
==The Text==
 
The text for an article then consists of words obtained by referencing strings[localwords[text[i]]] for i in (0..ntext), with the following exceptions. If text[i] is greater than nlocaldict it means
The text for an article then consists of words obtained by referencing
strings[localwords[text[i]]] for i in (0..ntext), with the following
exceptions. If text[i] is greater than nlocaldict it means
 
       0xfa => end-of-paragraph, sets spacing to TRUE if
       0xfa => end-of-paragraph, sets spacing to TRUE if
                 not in a monospace example)
                 not in a monospace example)
Line 187: Line 162:
       0xff => escape sequence                        // see below
       0xff => escape sequence                        // see below


When spacing is true, each word needs a space put after it. When false,
When spacing is true, each word needs a space put after it. When false, the words are abutted and spaces are supplied using 0xfe or the
the words are abutted and spaces are supplied using 0xfe or the
dictionary. Examples are entered and left with 0xff escape sequences. The variable "spacing" is initially (start of every article slot) TRUE.
dictionary. Examples are entered and left with 0xff escape sequences. The
variable "spacing" is initially (start of every article slot) TRUE.


==0xff escape sequences==
==0xff escape sequences==
 
These are used to change fonts, make cross references, enter and leave examples, etc. The general format is:
These are used to change fonts, make cross references, enter and leave examples, etc. The general format is
<pre>
        {
{
          int8      FF;            // always equals 0xff
    int8      FF;            // always equals 0xff
          int8      esclen;        // length of the sequence (including
    int8      esclen;        // length of the sequence (including
                                      // esclen but excluding FF)
                              // esclen but excluding FF)
          int8      escCode;        // which escape function
    int8      escCode;        // which escape function
        }
}
 
</pre>
escCodes I have partially deciphered are
escCodes I have partially deciphered are
{|class="wikitable"
|-
|0x01||unknown
|-
|0x02 or 0x11 or 0x12||(esclen==3) set left margin. 0x11 always starts a new line. Arguments
{
    int8  margin;  // in spaces, 0=no margin
}
note: in an IPF source, you must code
:lm margin=256. to reset the left margin.
|-
|0x03||(esclen==3) set right margin. Arguments
{
    int8  margin;  // in spaces, 1=no margin
}
|-
|0x04||(esclen==3) change style. Arguments
{
    int8  style;    // 1,2,3: same as :hp#.
                    // 4,5,6: same as :hp5,6,7.
                    // 0 returns to plain text
}
|-
|0x05||(esclen varies) beginning of cross reference. The next two bytes of the escape sequence are an int16 index of the tocentrystart array. The remaining bytes describe the size, position and characteristics of the window created when the cross-reference is followed by VIEW.


        0x01 =>              unknown
I have not decoded this.
 
|-
        0x02 or 0x11 =>      (esclen==3) set left margin.
|0x06||unknown
            or 0x12          0x11 always starts a new line. Arguments
|-
                                {
|0x07||(esclen==4) footnote (:fn. tag). Arguments:
                                  int8  margin;  // in spaces, 0=no margin
{
                                }
    int16  toc;  // toc entry number of text
                              note: in an IPF source, you must code
}
                              :lm margin=256. to reset the left margin.
|-
 
|0x08||(escLen==2) end of cross reference introduced by escape code 0x05 or 0x07
        0x03 =>              (esclen==3) set right margin. Arguments
|-
                                {
|0x09||unknown
                                  int8  margin;  // in spaces, 1=no margin
|-
                                }
|0x0A||unknown
 
|-
        0x04 =>              (esclen==3) change style. Arguments
|0x0B||(escLen==2) begin monosp. example. set spacing to FALSE
                                {
|-
                                  int8  style;    // 1,2,3: same as :hp#.
|0x0C||(escLen==2) end monosp. example. set spacing to TRUE
                                                  // 4,5,6: same as :hp5,6,7.
|-
                                                  // 0 returns to plain text
|0x0D||(escLen==2) special text colors. Arguments:
                                }
{
 
  int8  color;  // 1,2,3: same as :hp4,8,9.
        0x05 =>              (esclen varies) beginning of cross
                  // 0: default color
                              reference.  The next two bytes of the
}
                              escape sequence are an int16 index of
|-
                              the tocentrystart array.  The
|0x0E||[bitmap]
                              remaining bytes describe the size,
|-
                              position and characteristics of the
|0x0F||if esclen==5 an inlined cross reference: the title of the referenced article becomes part of the text.<br/>
                              window created when the
This is probably the case even if esclen is not 5, but I don't know the decoding. In the case that esclen is 5, I don't know the purpose of the byte following the escCode, but the two bytes after that are an int16 index of the tocentrystart array.
                              cross-reference is followed by VIEW.
|-
                              I have not decoded this.
|0x10||[special link, reftype=launch]
 
|-
        0x06 =>              unknown
||0x13 or 0x14||(esclen==2) Set foreground (0x13) and background (0x14) color.  Arguments:
 
{
        0x07 =>              (esclen==4) footnote (:fn. tag). Arguments:
    int8  color;
                                {
}
                                  int16  toc;  // toc entry number of text
|-
                                }
|0x15||unknown
 
|-
        0x08 =>              (escLen==2) end of cross reference
|0x16||[special link, reftype=inform]
                              introduced by escape code 0x05 or 0x07
|-
 
|0x17||hide text (:hide. tag). Arguments:
        0x09 =>              unknown
{
 
    char8 key[];  // key required to show text
        0x0A =>              unknown
}
 
|-
        0x0B =>              (escLen==2) begin monosp. example. set
|0x18||end of hidden text (:ehide.)
                              spacing to FALSE
|-
 
|0x19||(esclen==3) change font? I haven't checked VIEW's decoding of the next byte. I used the same decoding as for 0x04
        0x0C =>              (escLen==2) end monosp. example. set
|-
                              spacing to TRUE
|0x1A||(escLen==3) begin :lines. sequence.  set spacing to FALSE.  Arguments
 
{
        0x0D =>              (escLen==2) special text colors. Arguments:
    int8  alignment; // 1,2,4=left,right,center
                                {
}
                                  int8  color;  // 1,2,3: same as :hp4,8,9.
|-
                                                  // 0: default color
|0x1B||(escLen==2) end :lines. sequence. set spacing to TRUE
                                }
|-
 
|0x1C||(escLen==2) Set left margin to current position. Margin is reset at end of paragraph.
        0x0E =>              [bitmap]
|-
 
|0x1F||[special link, reftype=hd database=...]
        0x0F =>              if esclen==5 an inlined cross
|-
                              reference: the title of the referenced
|0x20||(esclen==4) :ddf. tag. Arguments:
                              article becomes part of the text.
{
                              This is probably the case even if
    int16  res;  // value of res attribute
                              esclen is not 5, but I don't know the
}
                              decoding. In the case that esclen is
|}
                              5, I don't know the purpose of the
The font used in the text is the normal IBM extended character set, including line graphics and some of the characters below 32.
                              byte following the escCode, but the
                              two bytes after that are an int16
                              index of the tocentrystart array.
 
        0x10 =>              [special link, reftype=launch]
 
        0x13 or 0x14 =>      (esclen==2) Set foreground (0x13)
                              and background (0x14) color.  Arguments:
                                {
                                  int8  color;
                                }
 
        0x15 =>              unknown
 
        0x16 =>              [special link, reftype=inform]
 
        0x17 =>              hide text (:hide. tag). Arguments:
                                {
                                  char8 key[];  // key required to show text
                                }
 
        0x18 =>              end of hidden text (:ehide.)
 
        0x19 =>              (esclen==3) change font? I haven't
                              checked VIEW's decoding of the next
                              byte. I used the same decoding as for
                              0x04
 
        0x1A =>              (escLen==3) begin :lines. sequence.  set
                              spacing to FALSE.  Arguments
                                {
                                  int8  alignment; // 1,2,4=left,right,center
                                }
 
        0x1B =>              (escLen==2) end :lines. sequence. set
                              spacing to TRUE
 
        0x1C =>              (escLen==2) Set left margin to current
                              position. Margin is reset at end of
                              paragraph.
 
        0x1F =>              [special link, reftype=hd database=...]
 
        0x20 =>              (esclen==4) :ddf. tag. Arguments:
                                {
                                  int16  res;  // value of res attribute
                                }
 
The font used in the text is the normal IBM extended character set,
including line graphics and some of the characters below 32.


==The ressource number array==
==The ressource number array==
Beginning at file offset resstart, this structure can overlay the file:
Beginning at file offset resstart, this structure can overlay the file:
         {
         {
Line 331: Line 277:


==The text name array==
==The text name array==
 
Beginning at file offset namestart, this structure can overlay the file:
Beginning at file offset namestart, this structure can overlay the
file:
         {
         {
             int16  name[nres];        // index to panel name in dictionary
             int16  name[nres];        // index to panel name in dictionary
Line 340: Line 284:


==The index table==
==The index table==
 
Beginning at file offset indexstart, a structure like the following is stored for each of the nindex words (in alphabetical order).
Beginning at file offset indexstart, a structure like the following
is stored for each of the nindex words (in alphabetical order).
         {
         {
             int8  nword;              // size of name
             int8  nword;              // size of name
Line 353: Line 295:


==The extended data block==
==The extended data block==
 
Not yet decoded. This block has a size of 64 bytes and contains various pointers to font names, names of externel databases etc.
Not yet decoded. This block has a size of 64 bytes and contains various
pointers to font names, names of externel databases etc.


==The full text search table==
==The full text search table==
 
Not yet decoded. This table is supressed when "/S" is specified on the IPFC command line.
Not yet decoded. This table is supressed when "/S" is specified on
the IPFC command line.


==Image data==
==Image data==
Beginning at file offset '''imgstart''', this data is a series of compressed OS/2 bitmaps.
<pre>
      Each starts with a BITMAPFILEHEADER:
        {
          int16 usType;    // 'bM' for bitmap
          int32 cbSize;    // total bitmap size including header
                            // BEFORE compression: not correct in this context
          int16 xHotspot;  // only for icons/pointers, not relevant here?
          int16 yHotspot;
          int32 offBits;    // offset to the actual bitmap data bits
          BITMAPINFOHEADER bmp; // further bitmap data:
            int32 cbFix;    // length of bitmapinfo header structure (12)
                            // (including this field)
            int16 cx;      // bitmap width
            int16 cy;      // bitmap height
            int16 cPlanes;  // num bitplanes - always 1 AFAIK
            int16 bitCount; // bits per pixel e.g. 4 = 16 colors


Not yet decoded. This area contains data for graphics contained in the
          RGB  palette[ N ]; // 2 ^ bitCount * 3 bytes
text.


==NLS table==
          bitmapData;      // in a special IPF format:
              int32 totalLength; // not including this field, but including the next
              int16 bitmapSize; // total size of memory required
                                // for uncompressed bitmap i.e.
                                // bytes per line rounded up to longword (4byte)
                                // x rows
                                // (This info is redundant)


Not yet decoded. This table contains informations specific to the language
          Followed by a series of blocks each up to 64k uncompressed.
and codepage the document was prepared in. It seems to contain some
          Blocks:
bitfields as well that might be used for character classification.
              int16 dataLength; // length of data following (including data type field)


              int8  dataType;  // 0 = uncompressed
                                  2 = compressed
              data...
              Compression is LZW (Lempel Ziv Welch)


==Appendix 1: Some useful translations from IBM Extended ASCII to normal ASCII==
        }
</pre>


One other transformation I had to make was of the character box characters of the IBM extended ASCII set.  These characters appear in strings in the dictionary. They are given here in octal together with their translation.
==NLS table==
Not yet decoded. This table contains information specific to the language and codepage the document was prepared in. It seems to contain some bit fields as well that might be used for character classification.


        020, 021      => blank seems satisfactory
==See Also==
        037          => solid down arrow: used to give direction to
* The [http://openwatcom.org/ Open Watcom] project (a cross-platform compiler project) contains an open source IPF Compiler called '''wipfc'''. It is fully working under Windows, Linux and FreeBSD.
                              a line in the syntax diagrams
* The [http://fpgui.sourceforge.net fpGUI Toolkit] open-source project contains a cross-platform INF viewer called [http://fpgui.sourceforge.net/screenshots_apps.shtml DocView]. This is still actively being maintained and developed.
        0263          => vertical bar
* Netlabs is hosting an open source INF viewer, [http://trac.netlabs.org/newview/wiki/NewView NewView].
        0264          => left connector: vertical bar with short
                              horizontal bar extending left from the
                              center
        0277, 0300    => top right or bottom left corner; one is
                              one, the other is the other and I
                              can't tell which from my translation
        0301          => up connector: horizontal line with vertical
                              line extending up from the center
        0302          => down connector: horizontal line with
                              vertical line extending down from the
                              center
        0303          => right connector: vertical bar with short
                              horizontal bar extending right from
                              the center
        0304          => horizontal bar
        0305          => cross connector, i.e. looks like + only
                              slightly larger to connect with
                              adjacent chars
        0331, 0332    => top left or bottom right corner; one is
                              one, the other is the other and I
                              can't tell which from my translation


==Appendix 1: Some useful translations from IBM Extended ASCII to normal ASCII==
One other transformation I had to make was of the character box characters of the IBM extended ASCII set. These characters appear in strings in the dictionary. They are given here in octal together with their translation.
{|class="wikitable"
|-
|020, 021|| ||blank seems satisfactory
|-
|037||▼||solid down arrow: used to give direction to a line in the syntax diagrams
|-
|0263||│||vertical bar
|-
|0264||┤||left connector: vertical bar with short horizontal bar extending left from the center
|-
|0277, 0300||┐ └||top right or bottom left corner; one is one, the other is the other and I can't tell which from my translation
|-
|0301||┴||up connector: horizontal line with vertical line extending up from the center
|-
|0302||┬||down connector: horizontal line with vertical line extending down from the center
|-
|0303||├||right connector: vertical bar with short horizontal bar extending right from the center
|-
|0304||─||horizontal bar
|-
|0305||┼||cross connector, i.e. looks like + only slightly larger to connect with adjacent chars
|-
|0331, 0332||┘ ┌||top left or bottom right corner; one is one, the other is the other and I can't tell which from my translation
|}


==Appendix 2: Style codes for escCode 0x04 and 0x0D==
==Appendix 2: Style codes for escCode 0x04 and 0x0D==
 
escCode 0x04 implements font changes associated with the :hp# IPF source tag.
escCode 0x04 implements font changes associated with the :hp# IPF source tag.
:hp1 is italic font
:hp2 is bold font
:hp3 is bold italic font
:hp5 is normal underlined font
:hp6 is italic underlined font
:hp7 is bold underlined font
Tags :hp4, :hp8, and :hp9 introduce different colored text which is encoded in the .inf or .hlp file using escCode 0x0D. On my monitor normal text is dark blue.
:hp4 text is light blue
:hp8 text is red
:hp9 text is magenta


        :hp1 is italic font
[[Category:File formats]]
        :hp2 is bold font
        :hp3 is bold italic font
        :hp5 is normal underlined font
        :hp6 is italic underlined font
        :hp7 is bold underlined font
 
tags :hp4, :hp8, and :hp9 introduce different colored text which is encoded in the .inf or .hlp file using escCode 0x0D.  On my monitor normal text is dark blue.
 
        :hp4 text is light blue
        :hp8 text is red
        :hp9 text is magenta
 
 
[[Category:Tools Articles]]

Latest revision as of 03:14, 27 December 2022

by Carl Hauser, Marcus Groeber and Graeme Geldenhuys

Carl Hauser's Introduction

Having become extremely frustrated by VIEW.EXE's penchant for windows that come and go, without even opening large enough to see everything in them, I thought I'd try to turn .INF files into something more conventional. While I don't have code to offer, I can tell you what I learned about .INF format – it was enough to produce more-or-less readable more-or-less plaintext from .INFs.

I offer this in the hope that somebody will give the community a really nice, tasteful, convenient, doesn't-use-too-much-screen-real-estate .INF browser to replace VIEW.EXE.

All of this was developed by looking at .INF files without any documentation of the format except what VIEW.EXE showed for a particular feature.

I don't have a lot of personal interest in refining this document with additional escape sequences, etc., but I would be happy to correspond with someone who wanted to fill in the details, or to clarify anything that may be confusing. If someone could point us to an official document describing the format that would be most helpful.

Marcus Groeber's Introduction

The original document contained most of the real tricky stuff in the file format (especially the compression algorithm) so going on from there was mainly a task of creating lots of help files using the IPFC and the decompiling them again to see what came out.

I fixed a few minor bugs in the description of the header which was extended to describe the entire structure I believe to be the header (because variable data starts afterwards).

A number of escape codes have also been added and the descriptions of others have been refined. There are still a lot of question marks about the format, but this description already allows disassembling the text into ASCII form in a fairly true-to-life format (including indentations etc.).

Further research should go into the way multiple windows are handled (I didn't work on that because I have never required multiple window displays in my help files and therefore am not familiar with the concepts). Font usage and graphics linking could also use some more fiddling around.

Types

All numeric quantities are least-significant-byte first in the file(little-endian).

  bit1       1 bit boolean           \  used only for explaining
  int4       4 bit unsigned integer  /  packed structures
  char8      8 bit character (ASCII more-or-less)
  int8       8 bit unsigned integer
  int16      16 bit unsigned integer
  int32      32 bit unsigned integer

The File Header

Starting at file offset 0 the following structure can overlay the file to provide some starting values:

       {
         int16 ID;           // ID magic word (5348h = "HS")
         int8  unknown1;     // unknown purpose, could be third letter of ID
         int8  flags;        // probably a flag word...
                             //  bit 0: set if INF style file
                             //  bit 4: set if HLP style file
                             // patching this byte allows reading HLP files
                             // using the VIEW command, while help files
                             // seem to work with INF settings here as well.
         int16 hdrsize;      // total size of header
         int16 unknown2;     // unknown purpose
         int16 ntoc;         // 16 bit number of entries in the tocarray
         int32 tocstrtablestart;  // 32 bit file offset of the start of the
                                  // strings for the table-of-contents
         int32 tocstrlen;    // number of bytes in file occupied by the
                             // table-of-contents strings
         int32 tocstart;     // 32 bit file offset of the start of tocarray
         int16 nres;         // number of panels with ressource numbers
         int32 resstart;     // 32 bit file offset of ressource number table
         int16 nname;        // number of panels with textual name
         int32 namestart;    // 32 bit file offset to panel name table
         int16 nindex;       // number of index entries
         int32 indexstart;   // 32 bit file offset to index table
         int32 indexlen;     // size of index table
         int8  unknown3[10]; // unknown purpose
         int32 searchstart;  // 32 bit file offset of full text search table
         int32 searchlen;    // size of full text search table
         int16 nslots;       // number of "slots"
         int32 slotsstart;   // file offset of the slots array
         int32 dictlen;      // number of bytes occupied by the "dictionary"
         int16 ndict;        // number of entries in the dictionary
         int32 dictstart;    // file offset of the start of the dictionary
         int32 imgstart;     // file offset of image data
         int8  unknown4;     // unknown purpose
         int32 nlsstart;     // 32 bit file offset of NLS table
         int32 nlslen;       // size of NLS table
         int32 extstart;     // 32 bit file offset of extended data block
         int8  unknown5[12]; // unknown purpose
         char8 title[48];    // ASCII title of database
       }

The table of contents array

Beginning at file offset tocstart, this structure can overlay the file:

       {
          int32  tocentrystart[ntoc];       // array of file offsets of
                                            // tocentries
       }

The table of contents entries

Beginning at each file offset, tocentrystart[i]:

       {
          int8 len;          // length of the entry including this byte
          int8 flags;        // flag byte, description folows (MSB first)
          // bit1 haschildren;  // following nodes are a higher level
          // bit1 hidden;       // this entry doesn't appear in VIEW.EXE's
                                // presentation of the toc
          // bit1 extended;     // extended entry format
          // bit1 stuff;        // ??
          // int4 level;        // nesting level
          int8 ntocslots;    // number of "slots" occupied by the text for
                             // this toc entry
       }

if the "extended" bit is not 1, this is immediately followed by

       {
          int16 tocslots[ntocslots]; // indices of the slots that make up
                                     // the article for this entry
          char8 title[];             // the remainder of the tocentry
                                     // until len bytes have been used [not
                                     // zero terminated]
       }

if extended is 1 there are intervening bytes that (I think) describe the kind, size and position of the window in which to display the article. I haven't decoded these bytes, though in most cases the following tells how many there are. Overlay the following on the next two bytes:

{
   int8 w1;
   int8 w2;
}

Here's a C code fragment for computing the number of bytes to skip:

int bytestoskip = 0;
if (w1 & 0x8) { bytestoskip += 2 };
if (w1 & 0x1) { bytestoskip += 5 };
if (w1 & 0x2) { bytestoskip += 5 };
if (w2 & 0x4) { bytestoskip += 2 };

skip over bytestoskip bytes (after w2) and find the tocslots and title as in the non-extended case.

The Slots array

Beginning at file offset slotsstart (provided by the file header) find

{
   int32 slots[nslots];       // file offset of the article
                              // corresponding to this slot
}

The Dictionary

Beginning at file offset dictstart (provided by the file header) and continuing until ndict entries have been read (and dictlen bytes have been consumed from the file) find a sequence of length-preceeded strings. Note that the length includes the length byte (not Pascal compatible!). Build a table mapping i to the ith string.

{
   char8*     strings[ndict];
}

The Article entries

Beginning at file offset slots[i] the following structure can overlay the file:

       {
          int8       stuff;          // ?? [always seen 0]
          int32      localdictpos;   // file offset of the local dictionary
          int8       nlocaldict;     // number of entries in the local dict
          int16      ntext;          // number of bytes in the text
          int8       text[ntext];    // encoded text of the article
       }

The Local dictionary

Beginning at file position localdictpos (for each article) there is an array:

{
     int16      localwords[nlocaldict];
}

The Text

The text for an article then consists of words obtained by referencing strings[localwords[text[i]]] for i in (0..ntext), with the following exceptions. If text[i] is greater than nlocaldict it means

      0xfa => end-of-paragraph, sets spacing to TRUE if
                not in a monospace example)
      0xfb => [unknown]
      0xfc => spacing = !spacing
      0xfd => line break (outside an example: ".br",
                sets spacing to TRUE if not in a
                monospace example)
      0xfe => space
      0xff => escape sequence                        // see below

When spacing is true, each word needs a space put after it. When false, the words are abutted and spaces are supplied using 0xfe or the dictionary. Examples are entered and left with 0xff escape sequences. The variable "spacing" is initially (start of every article slot) TRUE.

0xff escape sequences

These are used to change fonts, make cross references, enter and leave examples, etc. The general format is:

 {
    int8       FF;             // always equals 0xff
    int8       esclen;         // length of the sequence (including
                               // esclen but excluding FF)
    int8       escCode;        // which escape function
 }

escCodes I have partially deciphered are

0x01 unknown
0x02 or 0x11 or 0x12 (esclen==3) set left margin. 0x11 always starts a new line. Arguments
{
   int8  margin;   // in spaces, 0=no margin
}

note: in an IPF source, you must code

:lm margin=256. to reset the left margin.
0x03 (esclen==3) set right margin. Arguments
{
   int8  margin;   // in spaces, 1=no margin
}
0x04 (esclen==3) change style. Arguments
{
   int8  style;    // 1,2,3: same as :hp#.
                   // 4,5,6: same as :hp5,6,7.
                   // 0 returns to plain text
}
0x05 (esclen varies) beginning of cross reference. The next two bytes of the escape sequence are an int16 index of the tocentrystart array. The remaining bytes describe the size, position and characteristics of the window created when the cross-reference is followed by VIEW.

I have not decoded this.

0x06 unknown
0x07 (esclen==4) footnote (:fn. tag). Arguments:
{
   int16  toc;  // toc entry number of text
}
0x08 (escLen==2) end of cross reference introduced by escape code 0x05 or 0x07
0x09 unknown
0x0A unknown
0x0B (escLen==2) begin monosp. example. set spacing to FALSE
0x0C (escLen==2) end monosp. example. set spacing to TRUE
0x0D (escLen==2) special text colors. Arguments:
{
  int8  color;   // 1,2,3: same as :hp4,8,9.
                 // 0: default color
}
0x0E [bitmap]
0x0F if esclen==5 an inlined cross reference: the title of the referenced article becomes part of the text.

This is probably the case even if esclen is not 5, but I don't know the decoding. In the case that esclen is 5, I don't know the purpose of the byte following the escCode, but the two bytes after that are an int16 index of the tocentrystart array.

0x10 [special link, reftype=launch]
0x13 or 0x14 (esclen==2) Set foreground (0x13) and background (0x14) color. Arguments:
{
   int8  color;
}
0x15 unknown
0x16 [special link, reftype=inform]
0x17 hide text (:hide. tag). Arguments:
{
   char8 key[];  // key required to show text
}
0x18 end of hidden text (:ehide.)
0x19 (esclen==3) change font? I haven't checked VIEW's decoding of the next byte. I used the same decoding as for 0x04
0x1A (escLen==3) begin :lines. sequence. set spacing to FALSE. Arguments
{
   int8  alignment; // 1,2,4=left,right,center
}
0x1B (escLen==2) end :lines. sequence. set spacing to TRUE
0x1C (escLen==2) Set left margin to current position. Margin is reset at end of paragraph.
0x1F [special link, reftype=hd database=...]
0x20 (esclen==4) :ddf. tag. Arguments:
{
   int16  res;  // value of res attribute
}

The font used in the text is the normal IBM extended character set, including line graphics and some of the characters below 32.

The ressource number array

Beginning at file offset resstart, this structure can overlay the file:

       {
           int16  res[nres];         // ressource number of panels
           int16  toc[nres];         // toc entry number of panel
       }

The text name array

Beginning at file offset namestart, this structure can overlay the file:

       {
           int16  name[nres];        // index to panel name in dictionary
           int16  toc[nres];         // toc entry number of panel
       }

The index table

Beginning at file offset indexstart, a structure like the following is stored for each of the nindex words (in alphabetical order).

       {
           int8  nword;              // size of name
           int8  level;              // indent level
                                     // bit 6 set: global entry
           int8  stuff;
           int16 toc;                // toc entry number of panel
           char8 word[nword];        // index word [not zero-terminated]
       }

The extended data block

Not yet decoded. This block has a size of 64 bytes and contains various pointers to font names, names of externel databases etc.

The full text search table

Not yet decoded. This table is supressed when "/S" is specified on the IPFC command line.

Image data

Beginning at file offset imgstart, this data is a series of compressed OS/2 bitmaps.

      Each starts with a BITMAPFILEHEADER:
        {
           int16 usType;     // 'bM' for bitmap
           int32 cbSize;     // total bitmap size including header
                             // BEFORE compression: not correct in this context
           int16 xHotspot;   // only for icons/pointers, not relevant here?
           int16 yHotspot;
           int32 offBits;    // offset to the actual bitmap data bits
           BITMAPINFOHEADER bmp; // further bitmap data:
             int32 cbFix;    // length of bitmapinfo header structure (12)
                             // (including this field)
             int16 cx;       // bitmap width
             int16 cy;       // bitmap height
             int16 cPlanes;  // num bitplanes - always 1 AFAIK
             int16 bitCount; // bits per pixel e.g. 4 = 16 colors

           RGB  palette[ N ]; // 2 ^ bitCount * 3 bytes

           bitmapData;       // in a special IPF format:
              int32 totalLength; // not including this field, but including the next
              int16 bitmapSize; // total size of memory required
                                // for uncompressed bitmap i.e.
                                // bytes per line rounded up to longword (4byte)
                                // x rows
                                // (This info is redundant)

           Followed by a series of blocks each up to 64k uncompressed.
           Blocks:
              int16 dataLength; // length of data following (including data type field)

              int8  dataType;   // 0 = uncompressed
                                   2 = compressed
              data...
              Compression is LZW (Lempel Ziv Welch)

        }

NLS table

Not yet decoded. This table contains information specific to the language and codepage the document was prepared in. It seems to contain some bit fields as well that might be used for character classification.

See Also

  • The Open Watcom project (a cross-platform compiler project) contains an open source IPF Compiler called wipfc. It is fully working under Windows, Linux and FreeBSD.
  • The fpGUI Toolkit open-source project contains a cross-platform INF viewer called DocView. This is still actively being maintained and developed.
  • Netlabs is hosting an open source INF viewer, NewView.

Appendix 1: Some useful translations from IBM Extended ASCII to normal ASCII

One other transformation I had to make was of the character box characters of the IBM extended ASCII set. These characters appear in strings in the dictionary. They are given here in octal together with their translation.

020, 021 blank seems satisfactory
037 solid down arrow: used to give direction to a line in the syntax diagrams
0263 vertical bar
0264 left connector: vertical bar with short horizontal bar extending left from the center
0277, 0300 ┐ └ top right or bottom left corner; one is one, the other is the other and I can't tell which from my translation
0301 up connector: horizontal line with vertical line extending up from the center
0302 down connector: horizontal line with vertical line extending down from the center
0303 right connector: vertical bar with short horizontal bar extending right from the center
0304 horizontal bar
0305 cross connector, i.e. looks like + only slightly larger to connect with adjacent chars
0331, 0332 ┘ ┌ top left or bottom right corner; one is one, the other is the other and I can't tell which from my translation

Appendix 2: Style codes for escCode 0x04 and 0x0D

escCode 0x04 implements font changes associated with the :hp# IPF source tag.

:hp1 is italic font
:hp2 is bold font
:hp3 is bold italic font
:hp5 is normal underlined font
:hp6 is italic underlined font
:hp7 is bold underlined font

Tags :hp4, :hp8, and :hp9 introduce different colored text which is encoded in the .inf or .hlp file using escCode 0x0D. On my monitor normal text is dark blue.

:hp4 text is light blue
:hp8 text is red
:hp9 text is magenta