
MMProgRef - Resource Interchange File Format

From EDM2
Revision as of 04:20, 26 November 2025

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation


RIFF (Resource Interchange File Format) is the **tagged file structure** developed for multimedia resource files. The structure of a RIFF file is similar to the structure of an Electronic Arts Interchange File Format file (EA IFF). RIFF is not actually a file format itself (because it does not represent a specific kind of information), but its name contains the words "interchange file format" in recognition of its roots in IFF.

RIFF has a counterpart, **RIFX**, that is used to define RIFF file formats that use the **Motorola** integer byte-ordering format rather than the **Intel** format. A RIFX file is the same as a RIFF file, except that the first four bytes are 'RIFX' instead of 'RIFF', and integer byte ordering is represented in Motorola format.
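The byte-ordering difference can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and the code assumes that "Intel format" means least significant byte first and "Motorola format" means most significant byte first. It reads the 32-bit size field that follows the four-byte identifier, as described under Chunks below.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit unsigned value stored low byte first (Intel order). */
static uint32_t read_u32_intel(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit unsigned value stored high byte first (Motorola order). */
static uint32_t read_u32_motorola(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Inspect the first eight bytes of a file. Returns 1 and stores the
   size field in *size for 'RIFF' or 'RIFX'; returns 0 otherwise. */
static int read_form_size(const unsigned char *file, uint32_t *size)
{
    if (memcmp(file, "RIFF", 4) == 0) {
        *size = read_u32_intel(file + 4);
        return 1;
    }
    if (memcmp(file, "RIFX", 4) == 0) {
        *size = read_u32_motorola(file + 4);
        return 1;
    }
    return 0;
}
```

Reading the bytes one at a time, rather than casting the buffer to an integer pointer, keeps the result independent of the byte order of the machine running the code.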

Chunks

The basic building block of a RIFF file is called a chunk. Using C syntax, a chunk can be defined as follows:

typedef unsigned long ULONG;
typedef unsigned char BYTE;

typedef ULONG FOURCC;       /* Four-character code              */

typedef FOURCC CKID;        /* Four-character-code chunk identifier */
typedef ULONG CKSIZE;       /* 32-bit unsigned size value           */

typedef struct {            /* Chunk structure                      */
      CKID             ckID;        /* Chunk type identifier      */
      CKSIZE           ckSize;      /* Chunk size field (size of ckData) */
      BYTE             ckData[ckSize]; /* Chunk data           */
} CK;

A **FOURCC** is represented as a sequence of one to four ASCII alphanumeric characters, padded on the right with blank characters (ASCII character value 32) as required, with no embedded blanks.

For example, the four-character code 'FOO' is stored as a sequence of four bytes: 'F', 'O', 'O', ' ' in ascending addresses. For quick comparisons, a four-character code may also be treated as a 32-bit number.
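A minimal sketch of building such a code follows. The helper name make_fourcc is hypothetical, and uint32_t is used in place of ULONG so that the value is 32 bits wide on any compiler. The packing assumes Intel byte order, so the first character ends up at the lowest address when the value is stored.

```c
#include <stdint.h>
#include <string.h>

/* Pack up to four characters into a 32-bit four-character code,
   padding on the right with blanks (ASCII 32). The first character
   occupies the least significant byte. */
static uint32_t make_fourcc(const char *code)
{
    unsigned char b[4] = { ' ', ' ', ' ', ' ' };
    size_t n = strlen(code);

    if (n > 4)
        n = 4;                      /* ignore anything past four characters */
    memcpy(b, code, n);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With this helper, make_fourcc("FOO") and make_fourcc("FOO ") produce the same 32-bit value, so two codes can be compared with a single integer comparison.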

The three parts of the chunk are described in the following table:

Part Description
ckID A four-character code that identifies the representation of the chunk data. A program reading a RIFF file can skip over any chunk whose chunk ID it doesn't recognize; it simply skips the number of bytes specified by ckSize plus the pad byte, if present.
ckSize A 32-bit unsigned value identifying the size of ckData. This size value does not include the size of the ckID or ckSize fields or the pad byte at the end of ckData.
ckData Binary data of fixed or variable size. The start of ckData is word-aligned with respect to the start of the RIFF file. If the chunk size is an odd number of bytes, a pad byte with value zero is written after ckData. Word aligning improves access speed (for chunks resident in memory) and maintains compatibility with EA IFF. The ckSize value does not include the pad byte.
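The skipping rule can be sketched as below for a file image held in memory. The function names are hypothetical, and the sketch assumes Intel byte order (a 'RIFF' file rather than 'RIFX'). It tolerates a missing pad byte at the very end of the buffer, which is a design choice of this example and not something the text above requires.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a 32-bit unsigned value in Intel (low byte first) order. */
static uint32_t ck_read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Given the offset of a chunk header in buf (len bytes long), return
   the offset of the chunk that follows it: 8 header bytes, ckSize
   data bytes, and one pad byte when ckSize is odd. Returns 0 if the
   header or the data does not fit in the buffer. */
static size_t next_chunk_offset(const unsigned char *buf, size_t len,
                                size_t off)
{
    uint32_t size;
    size_t avail, total;

    if (off > len || len - off < 8)
        return 0;                       /* no room for ckID and ckSize */
    size = ck_read_u32(buf + off + 4);
    avail = len - off - 8;
    if (size > avail)
        return 0;                       /* ckData is truncated */
    total = (size_t)size + (size & 1u); /* add the pad byte if odd */
    if (total > avail)
        total = avail;                  /* final pad byte is missing */
    return off + 8 + total;
}
```

Because a valid result is always at least 8, the value 0 can safely double as the error indicator.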

We can represent a chunk with the following notation (in this example, the ckSize and pad byte are implicit):

<ckID> ( <ckData> )

Two types of chunks, the 'LIST' and 'RIFF' chunks, may contain **nested chunks**, or subchunks. These special chunk types are discussed later in this document. All other chunk types store a single element of binary data in <ckData>.

RIFF Forms

A **RIFF form** is a chunk with a 'RIFF' chunk ID. The term also refers to a file format that follows the RIFF framework. The following is the current list of registered RIFF forms. Each is described in Multimedia File Formats.

Form Type Description
WAVE Waveform Audio Format

Using the notation for representing a chunk, a RIFF form looks like the following:

RIFF ( <formType> <ck>... )

The first four bytes of a RIFF form make up a chunk ID with values 'R', 'I', 'F', 'F'. The ckSize field is required, but for simplicity it is omitted from the notation.

The first ULONG of chunk data in the 'RIFF' chunk (shown above as <formType>) is a four-character code value identifying the data representation, or form type, of the file. Following the form-type code is a series of subchunks. Which subchunks are present depends on the form type. The definition of a particular RIFF form typically includes the following:

  • A unique four-character code identifying the form type
  • A list of mandatory chunks
  • A list of optional chunks
  • Possibly, a required order for the chunks
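As an illustration of the RIFF ( <formType> <ck>... ) layout, the following sketch searches the top-level subchunks of a RIFF form held in memory. The names find_subchunk and form_read_u32 are hypothetical, Intel byte order is assumed, and both form_type and id must be passed as exactly four characters (padded with blanks by the caller).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Read a 32-bit unsigned value in Intel (low byte first) order. */
static uint32_t form_read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Look for the subchunk with the given ID directly inside a RIFF form
   of the expected form type. On success, returns a pointer to the
   subchunk's ckData and stores its ckSize in *size. Returns NULL if
   buf is not such a form, is malformed, or lacks the chunk. */
static const unsigned char *find_subchunk(const unsigned char *buf,
                                          size_t len,
                                          const char *form_type,
                                          const char *id,
                                          uint32_t *size)
{
    uint32_t riff_size, ck_size;
    size_t end, pos;

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, form_type, 4) != 0)
        return NULL;

    /* The form ends where its ckSize says, but never past the buffer. */
    riff_size = form_read_u32(buf + 4);
    end = len;
    if (riff_size < len - 8)
        end = 8 + (size_t)riff_size;
    if (end < 12)
        return NULL;

    pos = 12;                            /* first subchunk follows formType */
    while (end - pos >= 8) {
        ck_size = form_read_u32(buf + pos + 4);
        if (ck_size > end - pos - 8)
            return NULL;                 /* subchunk runs past the form */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size = ck_size;
            return buf + pos + 8;
        }
        pos += 8 + (size_t)ck_size + (ck_size & 1u);
        if (pos > end)
            break;                       /* final pad byte is missing */
    }
    return NULL;
}
```

Chunks with unrecognized IDs are simply stepped over, which is what allows a reader to ignore chunks that a form definition does not mention.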

Defining and Registering RIFF Forms

The form-type code for a RIFF form must be unique. To guarantee this uniqueness, you must register any new form types before release. See Registering Multimedia Formats for information on registering RIFF forms.

Like RIFF forms, RIFX forms must also be registered. Registering a RIFF form does not automatically register the RIFX counterpart. No RIFX form types are currently defined.

Registered Form and Chunk Types

By convention, the form-type code for registered form types contains only digits and uppercase letters. Form-type codes that are all uppercase denote a registered, unique form type. Use lowercase letters for temporary or prototype chunk types.

Certain chunk types are also globally unique and must also be registered before use. These registered chunk types are not specific to a certain form type; they can be used in any form. If a registered chunk type can be used to store your data, you should use the registered chunk type rather than define your own chunk type containing the same type of information.

For example, a chunk with chunk ID 'INAM' always contains the name or title of a file. Also, within all RIFF files, file names or titles are contained within chunks with ID 'INAM' and have a standard data format.

Unregistered (Form-Specific) Chunk Types

Chunk types that are used only in a certain form type use a lowercase chunk ID. A lowercase chunk ID has specific meaning only within the context of a specific form type. After a form designer is allocated a registered form type, the designer can choose lowercase chunk types to use within that form. See Registering Multimedia Formats for information on registering form types.

For example, a chunk with ID 'scln' inside one form type might contain the "number of scan lines." Inside some other form type, a chunk with ID 'scln' might mean "secondary lambda number."

Notation for Representing Sample RIFF Files

RIFF is a binary format, but a RIFF file is easier to comprehend when it is shown in an ASCII representation. This section defines a standard notation used to present samples of various types of RIFF files. If you define a RIFF form, we urge you to use this notation in any file format samples you provide in your documentation.

Basic Notation for Representing RIFF Files

The following information summarizes the elements of the RIFF notation required for representing sample RIFF files:

Notation Description
<ckID> (<ckData>) The chunk with ID <ckID> and data <ckData>. As previously described, <ckID> is a four-character code which may be enclosed by single quotes for emphasis.

For example, the following notation describes a 'RIFF' chunk with a form type of 'QRST'. The data portion of this chunk contains a 'FOO' subchunk.

RIFF('QRST' FOO(17 23))

The following example describes an 'ICOP' chunk containing the string "Copyright Encyclopedia International.":

'ICOP' ("Copyright Encyclopedia International."Z)
Notation Description
<number>[<modifier>] A number in Intel format, where <number> is an optional sign (+ or -) followed by one or more digits and modified by the optional <modifier>.

Valid <modifier> values follow:

Modifier Meaning
None 16-bit number in decimal format
H 16-bit number in hexadecimal format
C 8-bit number in decimal format
CH 8-bit number in hexadecimal format
L 32-bit number in decimal format
LH 32-bit number in hexadecimal format

Several examples follow:

0
65535
-1
0L
4a3c89LH
-1C
21CH

Note that -1 and 65535 represent the same value. The application reading this file must know whether to interpret the number as signed or unsigned.
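The point can be checked with a small sketch. The helper names are hypothetical; the code stores a number with no modifier as two bytes in Intel order and then reads the same bytes back under both interpretations.

```c
#include <stdint.h>

/* Store the low 16 bits of value in Intel order (low byte first).
   Both -1 and 65535 reduce to 0xFFFF here. */
static void put_u16_intel(long value, unsigned char out[2])
{
    uint16_t v = (uint16_t)value;

    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)(v >> 8);
}

/* Interpret two bytes in Intel order as an unsigned 16-bit number. */
static unsigned long get_u16_intel(const unsigned char in[2])
{
    return (unsigned long)in[0] | ((unsigned long)in[1] << 8);
}

/* Interpret the same two bytes as a signed 16-bit number. */
static long get_s16_intel(const unsigned char in[2])
{
    long v = (long)get_u16_intel(in);

    return v >= 0x8000L ? v - 0x10000L : v;
}
```

Writing -1 and writing 65535 produce identical bytes; only the choice between get_u16_intel and get_s16_intel decides which value the reader sees.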

Notation Description
'<chars>' A four-character code (32-bit quantity) consisting of a sequence of zero to four ASCII characters <chars> in the given order. If <chars> is less than four characters long, it is implicitly padded on the right with blanks. Two single quotes are equivalent to four blanks.

Examples follow:

'RIFF'
'xyz'
''

<chars> can include escape sequences, which are combinations of characters introduced by a backslash (\) and used to represent other characters. Escape sequences are listed in the following section.

Notation Description
"<string>"[<modifier>] The sequence of ASCII characters contained in <string> and modified by the optional modifier <modifier>. The quoted text can include any of the escape sequences listed in the following section.

Valid <modifier> values follow:

Modifier Meaning
none No NULL terminator or size prefix.
Z String is NULL-terminated
B String has an 8-bit (byte) size prefix
US String has a 16-bit (ushort) size prefix
BZ String has a byte-size prefix and is NULL-terminated
WZ String has a word-size prefix and is NULL-terminated

NULL-terminated means that the string is followed by a character with ASCII value 0. A size prefix is an unsigned integer, stored as a byte or a word in Intel format preceding the string characters, that specifies the length of the string. In the case of strings with BZ or WZ modifiers, the size prefix specifies the size of the string without the terminating NULL.
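As a rough sketch of the byte-prefixed and NULL-terminated layouts, the function below writes a string with an optional 8-bit size prefix and an optional terminator. The function name and its flag parameters are hypothetical; 16-bit prefixes are left out to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Encode text into out (capacity cap bytes).
   byte_prefix != 0 writes an 8-bit size prefix (modifier B or BZ);
   null_term   != 0 appends a NULL terminator (modifier Z or BZ).
   Returns the number of bytes written, or 0 if the result does not
   fit or the text is too long for a byte prefix. An empty string
   with no modifiers also yields 0 bytes. */
static size_t encode_riff_string(const char *text, int byte_prefix,
                                 int null_term, unsigned char *out,
                                 size_t cap)
{
    size_t n = strlen(text);
    size_t need = n + (byte_prefix ? 1 : 0) + (null_term ? 1 : 0);
    size_t pos = 0;

    if (need > cap || (byte_prefix && n > 255))
        return 0;
    if (byte_prefix)
        out[pos++] = (unsigned char)n;  /* size excludes the terminator */
    memcpy(out + pos, text, n);
    pos += n;
    if (null_term)
        out[pos++] = 0;
    return pos;
}
```

For the two-character text "Hi" with both flags set (the BZ form), the output is the four bytes 2, 'H', 'i', 0.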

The string formats referred to above are discussed in "Storing Strings in RIFF Chunks," later in this section.

Examples follow:

"No prefix, no NULL terminator"
"No prefix, NULL terminator"Z
"Byte prefix, NULL terminator"BZ

Escape Sequences for Four-Character Codes and String Chunks

The following escape sequences can be used in four-character codes and string chunks:

Escape Sequence ASCII Value Description
\n 10 Newline character
\t 9 Horizontal tab character
\b 8 Backspace character
\r 13 Carriage return character
\f 12 Form feed character
\\ 92 Backslash
\' 39 Single quote
\" 34 Double quote
\ddd Octal ddd Arbitrary character
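A tool that reads this notation needs to turn the escape sequences back into characters. The sketch below is one way to do that; the function name is hypothetical, and it accepts one to three octal digits after the backslash, which is an assumption of this example (the table above shows only the three-digit form).

```c
#include <stddef.h>

/* Decode the escape sequences in src into dst (capacity cap bytes).
   Returns the number of bytes produced, or (size_t)-1 if an escape is
   unknown, incomplete, out of range, or the output does not fit. */
static size_t decode_escapes(const char *src, unsigned char *dst,
                             size_t cap)
{
    size_t n = 0;

    while (*src != '\0') {
        unsigned c = (unsigned char)*src++;

        if (c == '\\') {
            char e = *src;
            int i;

            if (e == '\0')
                return (size_t)-1;      /* backslash at end of input */
            src++;
            switch (e) {
            case 'n':  c = 10; break;   /* newline */
            case 't':  c = 9;  break;   /* horizontal tab */
            case 'b':  c = 8;  break;   /* backspace */
            case 'r':  c = 13; break;   /* carriage return */
            case 'f':  c = 12; break;   /* form feed */
            case '\\': c = 92; break;
            case '\'': c = 39; break;
            case '"':  c = 34; break;
            default:
                if (e < '0' || e > '7')
                    return (size_t)-1;  /* not a listed escape */
                c = (unsigned)(e - '0');
                for (i = 0; i < 2 && *src >= '0' && *src <= '7'; i++)
                    c = c * 8 + (unsigned)(*src++ - '0');
                if (c > 255)
                    return (size_t)-1;  /* does not fit in one byte */
                break;
            }
        }
        if (n >= cap)
            return (size_t)-1;
        dst[n++] = (unsigned char)c;
    }
    return n;
}
```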

Extended Notation for Representing RIFF Form Definitions

To unambiguously define the structure of new RIFF forms, document the RIFF form using the basic notation along with the following extended notation:

Notation Description
<name> A label that refers to some element of the file, where <name> is the name of the label.

Examples follow:

<NAME-ck>
<GOBL-form>
<bitmap-bits>
<foo>

Conventionally, a label that refers to a chunk is named <ckID-ck>, where 'ckID' is the chunk ID. Similarly, a label that refers to a RIFF form is named <formType-form>, where "formType" is the name of the form's type.

Notation Description
<name> ::= elements The actual data represented by <name> is defined as elements. This states that <name> is an abbreviation for elements, where elements is a sequence of other labels and literal data.

An example follows:

<GOBL-form> ::= RIFF ( 'GOBL' <form-data> )

This example defines label <GOBL-form> as representing a RIFF form with form type 'GOBL' and data equal to <form-data>, where <form-data> is a label that would be defined in another rule. Note that a label may represent any data, not just a RIFF chunk or form.

Notation Description
<name:type> This is the same as <name>, but it also defines <name> to be equivalent to <type>. This notation obviates the following rule:
<name> ::= <type>

This allows you to give a symbolic name to an element of a file format and to specify the element data type.

An example follows:

<xyz-coordinate> ::= <x:INT> <y:INT> <z:INT>

This defines <xyz-coordinate> to consist of three parts concatenated together: <x>, <y>, and <z>. The definition also specifies that <x>, <y>, and <z> are integers. This notation is equivalent to the following:

<xyz-coordinate> ::= <x> <y> <z>
<x> ::= <INT>
<y> ::= <INT>
<z> ::= <INT>
Notation Description
[elements] An optional sequence of labels and literal data. Surrounded by square brackets, it may be considered an element itself.

An example follows:

<FOO-form> ::= RIFF('FOO' [<header-ck>] <data-ck>)

This example defines form "FOO" with an optional header chunk followed by a mandatory data chunk.

Notation Description
el1 | el2 | ... | elN Exactly one of the listed elements must be present.

An example follows:

<hdr-ck> ::= hdr(<hdr-x> | <hdr-y> | <hdr-z>)

This example defines the 'hdr' chunk's data as containing one of <hdr-x>, <hdr-y>, or <hdr-z>.

Notation Description
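element... One or more occurrences of element may be present. An ellipsis has this meaning only when it follows an element; in a construct such as "el1 | el2 | ... | elN," the ellipsis has its ordinary English meaning.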

An example follows:

<data-ck> ::= data(<count:INT> <item:INT>...)

This example defines the data of the 'data' chunk to contain an integer <count>, followed by one or more occurrences of the integer <item>.

Notation Description
[element]... Zero or more occurrences of element may be present.

An example follows:

<data-ck> ::= data(<count:INT> [<item:INT>]...)

This example defines the data of the 'data' chunk to contain an integer <count> followed by zero or more occurrences of an integer <item>.

Notation Description
{elements} The group of elements within the braces should be considered a single element.

An example follows:

<blorg> ::= <this> | {<that> | <other>}...

This example defines <blorg> to be either <this> or one or more occurrences of <that> or <other>, intermixed in any way. Contrast this with the following example:

<blorg> ::= <this> | <that> | <other>...

This example defines <blorg> to be either <this> or <that> or one or more occurrences of <other>.

Notation Description
struct { ... } name A structure defined using C syntax. This can be used instead of a sequence of labels if a C header (include) file is available that defines the structure. The label used to refer to the structure should be the same as the structure's typedef name.

An example follows:

<3D_POINT> ::= struct {
                  INT x;      /* X-coordinate */
                  INT y;      /* Y-coordinate */
                  INT z;      /* Z-coordinate */
                } 3D_POINT

Where possible, the structure should use types such as INT, as in this example, because these types are more portable than C types such as int. The structure fields are assumed to be present in the file in the order given, with no padding or forced alignment.

Unless the RIFF chunk ID is 'RIFX', integer byte ordering is assumed to be in **Intel format**.
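Because the file layout has no padding while a C compiler is free to pad its own structures, one cautious approach is to decode each field from the byte stream instead of reading the structure as a block. The sketch below does this for <3D_POINT>. It assumes that INT is a 16-bit signed integer, which this section does not state; Point3D, pt_read_s16, and read_point3d are hypothetical names.

```c
/* Hypothetical in-memory form of <3D_POINT>; not the file layout. */
typedef struct {
    int x, y, z;
} Point3D;

/* Decode a signed 16-bit value from two bytes. motorola != 0 selects
   high byte first (RIFX); otherwise low byte first (RIFF). */
static int pt_read_s16(const unsigned char *p, int motorola)
{
    unsigned v = motorola ? (((unsigned)p[0] << 8) | p[1])
                          : (((unsigned)p[1] << 8) | p[0]);

    return v >= 0x8000u ? (int)((long)v - 0x10000L) : (int)v;
}

/* Decode one packed <3D_POINT>: six bytes, fields in the order
   x, y, z with no padding between them (16-bit INT assumed). */
static Point3D read_point3d(const unsigned char *p, int motorola)
{
    Point3D pt;

    pt.x = pt_read_s16(p, motorola);
    pt.y = pt_read_s16(p + 2, motorola);
    pt.z = pt_read_s16(p + 4, motorola);
    return pt;
}
```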

Notation Description
/* comment */ An explanatory comment to a rule.

An example follows:

<weekend> ::= 'Sat'|'Sun'       /* Four-character code
                                   for day */

A Sample RIFF Form Definition and RIFF Form

The following example defines <GOBL-form>, the hypothetical RIFF form of type 'GOBL'. To fully document a new RIFF form definition, a developer would also provide detailed descriptions of each file element, including the semantics of each chunk and sample files documented using the standard notation.

<GOBL-form> ::=  RIFF ( 'GOBL'          /* RIFF form header  */
                       [<org-ck>]      /* Origin chunk      */
                                       /* (default (0,0,0)) */
                        <obj-list>)     /* Series of graphical
                                         objects           */

<org-ck> ::=    org(   <origin:3D_POINT> )
                                       /* Object-list origin  */

                                       /* An object is a:     */
<obj-list> ::=  LIST(  'obj'   {   <sqr-ck>  |           /* square,  */
                                 <circ-ck> |           /* circle,  */
                                 <poly-ck>  }... )    /* or polygon */

<sqr-ck> ::=    sqr(   <pt1:3D_POINT>    /* one vertex */
                       <pt2:3D_POINT>    /* another vertex */
                       <pt3:3D_POINT> )  /* a third vertex */

<circ-ck> ::=   circ(  <center:3D_POINT>     /* Center of circle */
                       <circumPt:3D_POINT> ) /* Point on circumference */

<poly-ck> ::=   poly( <pt:3D_POINT>... )   /* List of points in a polygon */

<3D_POINT> ::=  struct                 /* Defined in "gobl.h" */
                {   INT x;                     /* X-coordinate */
                    INT y;                     /* Y-coordinate */
                    INT z;                     /* Z-coordinate */
                } 3D_POINT

Sample RIFF Form

The following sample RIFF form adheres to the form definition for form type GOBL. The file contains three subchunks:

  • An 'INFO' list
  • An 'org' chunk
  • An 'obj' chunk

The 'INFO' list and 'obj' list each have two subchunks. The 'INFO' list is a registered global chunk that can be used within any RIFF file. The 'INFO' list is described in INFO List Chunk.

Since the definition of the GOBL form does not refer to the INFO chunk, software that expects only 'org' and 'obj' chunks in a GOBL form would ignore the unknown 'INFO' chunk.

RIFF( 'GOBL'
      LIST('INFO'     /* INFO list containing file name and copyright */
            INAM("A House"Z)
            ICOP("(C) Copyright Encyclopedia International 1991"Z)
           )
      org(2, 0, 0)         /* Origin of object list          */
      LIST('obj'           /* Object list containing two polygons */
            poly(0,0,0  2,0,0  2,2,0  1,3,0  0,2,0)
            poly(0,0,5  2,0,5  2,2,5  1,3,5  0,2,5)
          )
    )            /* End of form                    */

Storing Strings in RIFF Chunks

This section lists methods for storing text strings in RIFF chunks. While these guidelines may not make sense for all applications, you should follow these conventions if you must make an arbitrary decision regarding string storage.

NULL-Terminated String (ZSTR) Format

A **NULL-terminated string (ZSTR)** consists of a series of characters followed by a terminating NULL character. The ZSTR is better than a simple character sequence (STR) because many programs are easier to write if strings are NULL-terminated. ZSTR is preferred to a string with a size prefix (BSTR or WSTR) because the size of the string is already available as the <ckSize> value, minus one for the terminating NULL character.

String Table Format

In a **string table**, all strings used in a structure are stored at the end of the structure in packed format. The structure includes fields that specify the offsets from the beginning of the string table to the individual strings. An example follows:

typedef struct
{
    INT     iWidgetNumber;      /* the widget number */
    USHORT  offszWidgetName;    /* an offset to a string
                                   in <rgchStrTab>*/
    USHORT  offszWidgetDesc;    /* an offset to a string
                                   in <rgchStrTab> */
    INT     iQuantity;          /* how many widgets */
    CHAR    rgchStrTab[1];      /* string table (allocate
                                   as large as needed) */
}       WIDGET;

If multiple chunks within the file need to reference variable-length strings, you can store the strings in a single chunk that acts as a string table. The chunks that refer to the strings contain offsets relative to the beginning of the data part of the string table chunk.
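Since the offsets come from the file, a reader may want to validate them before use. The following sketch, with the hypothetical name strtab_get, resolves an offset such as offszWidgetName against a string table and rejects offsets that fall outside the table or strings that are not terminated inside it.

```c
#include <stddef.h>
#include <string.h>

/* Return the NULL-terminated string that starts at offset off in a
   string table of tab_len bytes. Returns NULL if the offset is out of
   range or no terminator is found before the end of the table. */
static const char *strtab_get(const char *tab, size_t tab_len,
                              unsigned off)
{
    if (off >= tab_len)
        return NULL;
    if (memchr(tab + off, '\0', tab_len - off) == NULL)
        return NULL;
    return tab + off;
}
```

For a table holding "Sprocket" followed by "A toothed wheel", offset 0 yields the first string and offset 9 yields the second.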

NULL-Terminated, Byte Size Prefix String (BZSTR) Series

In a **BZSTR series**, a series of strings is stored in packed format. Each string is a BZSTR, with a byte size prefix and a NULL terminator. This format retains the ease-of-use characteristics of the ZSTR while providing the string size, allowing the application to quickly skip unneeded strings.
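The skipping behavior can be sketched as follows. The function name bzstr_nth is hypothetical; it steps over whole strings using only the size prefix, without scanning their characters.

```c
#include <stddef.h>

/* Return a pointer to the characters of the string at position index
   (counting from 0) in a BZSTR series of len bytes. Each entry is one
   size byte, that many characters, and a NULL terminator. Returns
   NULL if the series ends first or an entry is truncated. */
static const char *bzstr_nth(const unsigned char *data, size_t len,
                             size_t index)
{
    size_t pos = 0;

    while (pos < len) {
        size_t n = data[pos];           /* size without the terminator */

        if (len - pos < n + 2)
            return NULL;                /* entry runs past the data */
        if (index == 0)
            return (const char *)(data + pos + 1);
        index--;
        pos += n + 2;                   /* prefix + characters + NULL */
    }
    return NULL;
}
```

Because each string is also NULL-terminated, the returned pointer can be passed directly to ordinary C string functions.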

Multiline String Format

When storing multiline strings, separate lines with a **carriage return/line feed pair** (ASCII 13/ASCII 10 pair). Although applications vary in their requirements for new line symbols (carriage return only, line feed only, or both), it is generally easier to strip out extra characters than to insert extra ones. Inserting characters might require reallocating memory blocks or pre-scanning the chunk before allocating memory for it.
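The stripping case is simple enough to do in place, as this sketch shows for an application that wants line feeds only. The function name is hypothetical. No reallocation is needed because the result is never longer than the input.

```c
#include <stddef.h>

/* Remove every carriage return (ASCII 13) from a NULL-terminated
   string in place, leaving the line feeds. Returns the new length. */
static size_t strip_carriage_returns(char *s)
{
    size_t r, w = 0;

    for (r = 0; s[r] != '\0'; r++) {
        if (s[r] != '\r')
            s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```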

Choosing a Storage Method

The following lists guidelines for deciding which storage method is appropriate for your application.

Usage Recommended Format
Chunk data contains nothing except a string ZSTR (NULL-terminated string) format.
Chunk data contains a number of fields, some of which are variable-length strings String-table format
Multiple chunks within the file need to reference variable-length strings String-table format
Chunk data stores a sequence of strings, some of which the application may want to skip BZSTR (NULL-terminated string with byte size prefix) series
Chunk data contains multiline strings Multiline string format

LIST Chunk

A **LIST chunk** contains a list, or ordered sequence, of subchunks. A LIST chunk is defined as follows:

LIST( <list-type> [<chunk>]... )

The <list-type> is a four-character code that identifies the contents of the list.

If an application recognizes the list type, it should know how to interpret the sequence of subchunks. However, since a LIST chunk may contain only subchunks (after the list type), an application that does not know about a specific list type can still walk through the sequence of subchunks.
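A sketch of such a walk follows. It counts the subchunks in the data of a LIST chunk without interpreting the list type. The function names are hypothetical and Intel byte order is assumed.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a 32-bit unsigned value in Intel (low byte first) order. */
static uint32_t list_read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count the subchunks in the ckData of a LIST chunk. data points at
   the four-byte list type and size is the LIST chunk's ckSize.
   Returns -1 if the data is too short or a subchunk is truncated. */
static long list_count_subchunks(const unsigned char *data, size_t size)
{
    size_t pos = 4;                     /* skip the list type */
    long count = 0;

    if (size < 4)
        return -1;
    while (size - pos >= 8) {
        uint32_t ck_size = list_read_u32(data + pos + 4);

        if (ck_size > size - pos - 8)
            return -1;                  /* subchunk runs past the list */
        count++;
        pos += 8 + (size_t)ck_size + (ck_size & 1u);
        if (pos > size)
            break;                      /* final pad byte is missing */
    }
    return count;
}
```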

Like chunk IDs, list types must be registered, and an all-lowercase list type has meaning relative to the form that contains it. See Registering Multimedia Formats for information on registering list types.

INFO List Chunk

The **'INFO' list** is a registered global list type that can store information that helps identify the contents of the chunk. This information is useful but does not affect the way a program interprets the file; examples are copyright information and comments. An 'INFO' list is a 'LIST' chunk with list type 'INFO'. The following shows a sample 'INFO' list chunk:

LIST('INFO'       INAM("Two Trees"Z)
                   ICMT("A picture for the opening screen"Z) )

An 'INFO' list should contain only the following chunks. New chunks may be defined, but an application should ignore any chunk it doesn't understand. The chunks listed below may only appear in an 'INFO' list. Each chunk contains a ZSTR, or null-terminated text string.

Chunk ID Description
IARL Archival Location. Indicates where the subject of the file is archived.
IART Artist. Lists the artist of the original subject of the file. For example, "Michelangelo."
ICMS Commissioned. Lists the name of the person or organization that commissioned the subject of the file. For example, "Pope Julius II."
ICMT Comments. Provides general comments about the file or the subject of the file. If the comment is several sentences long, end each sentence with a period. Do not include newline characters.
ICOP Copyright. Records the copyright information for the file. For example, "Copyright Encyclopedia International 1991." If there are multiple copyrights, separate them by a semicolon followed by a space.
ICRD Creation date. Specifies the date the subject of the file was created. List dates in year-month-day format, padding one-digit months and days with a zero on the left. For example, "1553-05-03" for May 3, 1553.
ICRP Cropped. Describes whether an image has been cropped and, if so, how it was cropped. For example, "lower right corner."
IDIM Dimensions. Specifies the size of the original subject of the file. For example, "8.5 in h, 11 in w."
IDPI Dots Per Inch. Stores dots per inch setting of the digitizer used to produce the file, such as "300."
IENG Engineer. Stores the name of the engineer who worked on the file. If there are multiple engineers, separate the names by a semicolon and a blank. For example, "Smith, John; Adams, Joe."
IGNR Genre. Describes the original work, such as, "landscape," "portrait," "still life," etc.
IKEY Keywords. Provides a list of keywords that refer to the file or subject of the file. Separate multiple keywords with a semicolon and a blank. For example, "Seattle; aerial view; scenery."
ILGT Lightness. Describes the changes in lightness settings on the digitizer required to produce the file. Note that the format of this information depends on hardware used.
IMED Medium. Describes the original subject of the file, such as, "computer image," "drawing," "lithograph," and so forth.
INAM Name. Stores the title of the subject of the file, such as, "Seattle From Above."
IPLT Palette Setting. Specifies the number of colors requested when digitizing an image, such as "256."
IPRD Product. Specifies the name of the title the file was originally intended for, such as "Encyclopedia of Pacific Northwest Geography."
ISBJ Subject. Describes the contents of the file, such as "Aerial view of Seattle."
ISFT Software. Identifies the name of the software package used to create the file, such as "Microsoft WaveEdit."
ISHP Sharpness. Identifies the changes in sharpness for the digitizer required to produce the file (the format depends on the hardware used).
ISRC Source. Identifies the name of the person or organization who supplied the original subject of the file. For example, "Trey Research."
ISRF Source Form. Identifies the original form of the material that was digitized, such as "slide," "paper," "map," and so forth. This is not necessarily the same as IMED.
ITCH Technician. Identifies the technician who digitized the subject file. For example, "Smith, John."
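The year-month-day format described for the ICRD chunk can be produced with a short helper such as the one sketched below. The function name and its range checks are assumptions of this example; the text above specifies only the field order and the zero padding of months and days.

```c
#include <stdio.h>

/* Format a date as year-month-day, padding one-digit months and days
   with a zero on the left. Returns 1 on success, or 0 if the values
   are out of range or the result does not fit in out (cap bytes). */
static int format_icrd(int year, int month, int day, char *out,
                       size_t cap)
{
    int n;

    if (year < 0 || year > 9999 || month < 1 || month > 12 ||
        day < 1 || day > 31)
        return 0;
    n = snprintf(out, cap, "%d-%02d-%02d", year, month, day);
    return n > 0 && (size_t)n < cap;
}
```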

CSET (Character Set) Chunk

To define **character-set** and **language** information for a RIFF file, use the **CSET chunk**. The CSET chunk defines the code page and country, language, and dialect codes for the file. These values can be overridden for specific file elements; see Usage Codes for Extra Header and Extra Entry Fields for information on specifying character set information in a compound file.

The CSET chunk is defined as follows:

<CSET chunk> → CSET(<usCodePage:USHORT>
                    <usCountryCode:USHORT>
                    <usLanguageCode:USHORT>
                    <usDialect:USHORT>)

The fields are as follows:

  • **usCodePage**: Specifies the **code page** used for file elements. If the CSET chunk is not present, or if this field has a value of zero, assume standard **ISO 8859/1** code page (identical to code page 1004 without code points defined in hex columns 0, 1, 8, and 9).
  • **usCountryCode**: Specifies the **country code** used for file elements. See Country Codes for a list of currently defined country codes. If the CSET chunk is not present, or if this field has a value of zero, assume **USA** (country code 001).
  • **usLanguage**, **usDialect**: Specify the **language** and **dialect** used for file elements. See Language and Dialect Codes for a list of language and dialect codes. If the CSET chunk is not present, or if these fields have a value of zero, assume **US English** (language code 9, dialect code 1).
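The defaulting rules in this list can be sketched as below. The type CsetInfo and the function names are hypothetical. The sketch assumes that USHORT is a 16-bit unsigned value stored in Intel order, and it applies the language and dialect defaults whenever the language code is zero, which is one reading of the rule above.

```c
#include <stddef.h>

/* Hypothetical in-memory result of reading a CSET chunk. */
typedef struct {
    unsigned code_page;   /* 0 means the default, ISO 8859/1 */
    unsigned country;
    unsigned language;
    unsigned dialect;
} CsetInfo;

/* Read a 16-bit unsigned value in Intel (low byte first) order. */
static unsigned cset_read_u16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Decode CSET chunk data and apply the documented defaults. Pass
   data == NULL (or size < 8) when the file has no usable CSET chunk. */
static CsetInfo cset_parse(const unsigned char *data, size_t size)
{
    CsetInfo c = { 0, 0, 0, 0 };

    if (data != NULL && size >= 8) {
        c.code_page = cset_read_u16(data);
        c.country   = cset_read_u16(data + 2);
        c.language  = cset_read_u16(data + 4);
        c.dialect   = cset_read_u16(data + 6);
    }
    if (c.country == 0)
        c.country = 1;                  /* USA */
    if (c.language == 0) {
        c.language = 9;                 /* US English */
        c.dialect = 1;
    }
    return c;
}
```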

Country Codes

Use one of the following country codes in the **usCountryCode** field:

Country Code Country
000 None (ignore this field)
001 USA
002 Canada
003 Latin America
030 Greece
031 Netherlands
032 Belgium
033 France
034 Spain
039 Italy
041 Switzerland
043 Austria
044 United Kingdom
045 Denmark
046 Sweden
047 Norway
049 West Germany
052 Mexico
055 Brazil
061 Australia
064 New Zealand
081 Japan
082 Korea
086 People's Republic of China
088 Taiwan
090 Turkey
351 Portugal
352 Luxembourg
354 Iceland
358 Finland

Language and Dialect Codes

Specify one of the following pairs of language-code and dialect-code values in the **usLanguage** and **usDialect** fields:

Language Code Dialect Code Language
0 0 None (ignore these fields)
1 1 Arabic
2 1 Bulgarian
3 1 Catalan
4 1 Traditional Chinese
4 2 Simplified Chinese
5 1 Czech
6 1 Danish
7 1 German
7 2 Swiss German
8 1 Greek
9 1 US English
9 2 UK English
10 1 Spanish
10 2 Spanish Mexican
11 1 Finnish
12 1 French
12 2 Belgian French
12 3 Canadian French
12 4 Swiss French
13 1 Hebrew
14 1 Hungarian
15 1 Icelandic
16 1 Italian
16 2 Swiss Italian
17 1 Japanese
18 1 Korean
19 1 Dutch
19 2 Belgian Dutch
20 1 Norwegian - Bokmal
20 2 Norwegian - Nynorsk
21 1 Polish
22 1 Brazilian Portuguese
22 2 Portuguese
23 1 Rhaeto-Romanic
24 1 Romanian
25 1 Russian
26 1 Serbo-Croatian (Latin)
26 2 Serbo-Croatian (Cyrillic)
27 1 Slovak
28 1 Albanian
29 1 Swedish
30 1 Thai
31 1 Turkish
32 1 Urdu
33 1 Bahasa

JUNK (Filler) Chunk

A **JUNK chunk** represents **padding, filler** or outdated information. It contains no relevant data; it is a space filler of arbitrary size.

The JUNK chunk is defined as follows:

<JUNK chunk> → JUNK( <filler> )

where **<filler>** contains random data.

Compound File Structure

The **compound file structure** is a **RIFF-based structure** upon which multimedia file formats can be defined. The compound file structure is a parameterized structure that provides for the following:

  • Storage of multimedia data elements
  • Direct access to multimedia data elements (as opposed to sequential searching)

The goals of the compound file structure are to maximize **flexibility** and **extensibility** while minimizing implementation costs. Using the compound file structure, developers of multimedia data formats can define both simple and complex file formats.

The structure is flexible enough to be used for many purposes, but it can be simplified for use with simple file formats. Designers of new multimedia file formats can restrict the use of standard header fields, requiring some and removing others.

For example, a developer might define a compound file format that stores a series of bit maps in a single file, thus reducing compact disc seek times. Another developer might define a compound file format that contains a special type of audio resource, using the compound file header information to identify the attributes of the audio data stored within.

Structural Overview

Files based upon the compound file structure contain the following two RIFF chunks at their topmost level:

  • **Compound File Table of Contents (CTOC) chunk**
  • **Compound File Element Group (CGRP) chunk**

The **CTOC chunk indexes the CGRP chunk**, which contains the actual multimedia data elements. Defined using the standard chunk notation, a compound file is represented as follows:

<compound file> → RIFF('type' <CTOC> <CGRP>)

where **'type'** is a FOURCC indicating the file type.

This section describes the CTOC and CGRP chunks in detail.
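The top-level layout can be illustrated with a small sketch that searches a RIFF file image held in memory for a named subchunk, such as CTOC or CGRP. It assumes the standard RIFF layout (the `RIFF` identifier, a little-endian size, the four-character file type, then word-aligned subchunks). The helper names `read_le32` and `find_riff_subchunk` are illustrative and are not part of the MMIO API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value, as RIFF stores chunk sizes. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: find a top-level subchunk (for example "CTOC" or
 * "CGRP") in a RIFF file image. On success returns 1 and stores the offset
 * and size of the chunk data; returns 0 if the chunk is absent or the
 * image is malformed. */
static int find_riff_subchunk(const unsigned char *file, size_t len,
                              const char *id,
                              size_t *data_off, uint32_t *data_size)
{
    size_t pos, end;

    assert(file != NULL && id != NULL && data_off != NULL && data_size != NULL);
    if (len < 12 || memcmp(file, "RIFF", 4) != 0)
        return 0;

    /* The RIFF size covers the file type and all subchunks. */
    end = 8 + (size_t)read_le32(file + 4);
    if (end > len)
        end = len;

    pos = 12;                               /* skip "RIFF", size, file type */
    while (pos + 8 <= end) {
        uint32_t size = read_le32(file + pos + 4);
        if (size > end - (pos + 8))
            return 0;                       /* truncated or corrupt chunk */
        if (memcmp(file + pos, id, 4) == 0) {
            *data_off = pos + 8;
            *data_size = size;
            return 1;
        }
        pos += 8 + (size_t)size + (size & 1u);  /* chunks are word-aligned */
    }
    return 0;
}
```

Because the search walks the subchunks in order, it works whether the CTOC chunk precedes or follows the CGRP chunk.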

Compound File Table of Contents (CTOC) Chunk

The **CTOC chunk** functions mainly as an **index**, allowing **direct access** to elements within a compound file. The CTOC chunk also contains information about the attributes of the entire file and of each media element within the file.

To provide the maximum flexibility for defining compound file formats, the CTOC chunk can be customized at several levels. The CTOC chunk contains fields whose length and usage are defined by other CTOC fields. This parameterization adds complexity, but it provides flexibility to file format designers and allows applications to correctly read data without necessarily knowing the specific file format definition.

Structural Overview

The CTOC chunk defines the contents of the CGRP chunk. The CTOC chunk has the following components:

  • **Header information** defining the size of the CTOC chunk, the number of entries in the CGRP chunk, the size of the CGRP chunk, and general information about the entire compound file.
  • A **parameter table definition** defining the size and contents of the header parameter table and CTOC table entries.
  • A **header parameter table** defining attributes that apply to the entire compound file.
  • **CTOC table entries** defining the location, size, name, and attributes of the compound file elements contained in the CGRP chunk.

These components appear sequentially in the CTOC chunk. The individual fields are described under MMCFINFO (header fields) and MMCTOCENTRY (table entry fields).

The fields in each area are listed below.

Header Information

The header information section defines general information about the CTOC header and about the entire compound file. It contains the following fields:

  • **ulHeaderSize**
  • **ulEntriesTotal**
  • **ulEntriesDeleted**
  • **ulEntriesUnused**
  • **ulBytesTotal**
  • **ulBytesDeleted**
  • **ulHeaderFlags**

Parameter Table Definition

The parameter table definition defines the size and contents of the header parameter table and CTOC table. It contains the following fields:

  • **usEntrySize**
  • **usNameSize**
  • **usExHdrFields**
  • **usExEntFields**
  • **aulExHdrFldUsage**
  • **aulExEntFldUsage**

Valid usage codes for the elements of the **aulExHdrFldUsage** and **aulExEntFldUsage** arrays are listed in Usage Codes for Extra Header and Extra Entry Fields.

Header Parameter Table

The header parameter table is an optional component generally used to define attributes of the entire compound file.

  • **aulExHdrField**

CTOC Table Entries

The **CTOC table entries** define the location, size, name, and other information about the individual compound file elements contained in the CGRP chunk. The number of CTOC table entries is determined by the **ulEntriesTotal** field in the header information of the CTOC chunk.

Each CTOC table entry is a structure containing the following fields:

  • **ulOffset**
  • **ulSize**
  • **ulMedType**
  • **ulMedUsage**
  • **ulCompressTech**
  • **ulUncompressBytes**
  • **aulExEntField**
  • **pszElementName**

Usage Codes for Extra Header and Extra Entry Fields

The following are valid usage codes for elements in the **aulExHdrFldUsage** and **aulExEntFldUsage** arrays, both of which are fields of the CTOC header. These arrays define the meaning of data stored in the **aulExHdrField** and **aulExEntField** "extra fields." All usage codes apply to both header fields and entry fields, unless explicitly stated otherwise.

Values marked in the extra **header field arrays** generally apply to all elements in the CGRP chunk, while values marked in the extra **entry field arrays** generally apply only to the element referenced by the corresponding CTOC table entry.

  • **CTOC\_EFU\_UNUSED (0x00)**
   * The field is unused. This usage code may be used to logically delete a header field.
  • **CTOC\_EFU\_LASTMODTIME (0x01)**
   * When used to describe an extra header field, the field contains the time that any portion of the CTOC or CGRP was last modified.
   * When used to describe an extra entry field, the field contains the time that the corresponding CTOC table entry, or the compound file element it refers to, was last modified.
   * The field is interpreted as a **ULONG** containing the number of seconds that have elapsed since 00:00:00 Greenwich Mean Time (GMT), January 1, 1970.
  • **CTOC\_EFU\_CODEPAGE**
   * The field contains the code page and country code for the **achName** field. These values override any values specified in a CSET chunk.
   * When used to describe an extra header field, the field contains code-page and country-code information for all CTOC table entries. When used to describe an extra entry field, the field contains information for that specific CTOC table entry.
   * The low-order word of the field contains one of the following code page values:
       * **Zero**: Use standard ISO 8859/1 code page. This is identical to code page 1004 without code points defined in hex columns 0, 1, 8, and 9.
       * **CTOC\_CHARSET\_CODEPAGE (0x0000*nnnn*)**: Use code page 0x*nnnn*, where 0x*nnnn* is the 16-bit code page number. For example, 0x00000352 for OS/2 code page 850, or 0x000004E4 for Windows 3.1 code page 1252.
   * The high-order word contains one of the following country codes:
       * **Zero**: Ignore this field.
       * **Country code**: See Country Codes for a list of currently defined country codes.
  • **CTOC\_EFU\_LANGUAGE**
   * The field contains language and dialect information for the **achName** field. These values override any values specified in a CSET chunk.
   * When used to describe an extra header field, the field contains language information for all CTOC table entries. When used to describe an extra entry field, the field contains information for that specific CTOC table entry.
   * The low-order word of the field contains one of the following language codes:
       * **Zero**: Ignore this field.
       * **Language code**: See Language and Dialect Codes for a list of currently defined language codes.
   * The high-order word of the field contains one of the following dialect codes:
       * **Zero**: Ignore this field.
       * **Dialect code**: See Language and Dialect Codes for a list of currently defined dialect codes.
  • **CTOC\_EFU\_COMPRESSPARAM0 (0x05) through CTOC\_EFU\_COMPRESSPARAM9 (0x0E)**
   * Specifies a compression parameter. See Compression of Compound File Elements.
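The word-level encoding described for the code-page and language fields, and the time encoding described for CTOC\_EFU\_LASTMODTIME, can be sketched as follows. The helper names are hypothetical, and the time conversion assumes that the C library's `time_t` also counts seconds from January 1, 1970, which holds on most current platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Split a 32-bit extra field into its low-order and high-order words.
 * For CTOC_EFU_CODEPAGE: low word = code page, high word = country code.
 * For CTOC_EFU_LANGUAGE: low word = language, high word = dialect. */
static uint16_t low_word(uint32_t field)  { return (uint16_t)(field & 0xFFFFu); }
static uint16_t high_word(uint32_t field) { return (uint16_t)(field >> 16); }

/* Format a CTOC_EFU_LASTMODTIME value (seconds since 00:00:00 GMT,
 * January 1, 1970) as "YYYY-MM-DD HH:MM:SS".
 * Returns the number of characters written, or 0 on failure. */
static size_t format_lastmodtime(uint32_t seconds, char *out, size_t cap)
{
    time_t t = (time_t)seconds;     /* assumes time_t uses the 1970 epoch */
    struct tm *gmt;

    assert(out != NULL);
    gmt = gmtime(&t);
    if (gmt == NULL)
        return 0;
    return strftime(out, cap, "%Y-%m-%d %H:%M:%S", gmt);
}
```

For instance, applying `low_word` to the value 0x000004E4 yields 1252, the Windows 3.1 code page cited above, and `high_word` yields zero, meaning that the country code is ignored.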

Compression of Compound File Elements

Compound file elements can be compressed. The **ulCompressTech** field of a CTOC table entry contains a **FOURCC** compression technique identifier for the corresponding compound file element. If the field is zero, the compound file element is not compressed.

The definition of a specific compression technique may specify that either the entire compound file element is compressed, or that some specific subset, for example one or more RIFF chunks, is compressed.

The **ulUncompressBytes** field contains the number of bytes that the compound file element will occupy in memory after decompression. If the compound file element is not compressed, this field contains the same value as the **ulSize** field, which identifies the file size of the compound file element.

Compression techniques may demand extra header fields or extra entry fields for decompression parameters. Compression technique identifiers, and any new entry fields corresponding to decompression technique parameters, must be unique. See Registering Multimedia Formats for registration information.
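The rules above can be sketched in a few lines. The structure below is an illustrative subset of a CTOC table entry, not the actual MMCTOCENTRY layout. It uses **ulUncompressBytes**, the name given in the list of CTOC table entry fields, for the uncompressed-size field. The function names are hypothetical, and any identifier passed to `make_fourcc` in this sketch is an invented example rather than a registered compression technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of a CTOC table entry (not the real MMCTOCENTRY). */
typedef struct {
    uint32_t ulSize;             /* size of the element in the file */
    uint32_t ulCompressTech;     /* FOURCC of the technique, 0 if none */
    uint32_t ulUncompressBytes;  /* size in memory after decompression */
} CtocEntrySketch;

/* Pack four characters into a FOURCC, first character in the low-order byte. */
static uint32_t make_fourcc(char a, char b, char c, char d)
{
    return (uint32_t)(unsigned char)a |
           ((uint32_t)(unsigned char)b << 8) |
           ((uint32_t)(unsigned char)c << 16) |
           ((uint32_t)(unsigned char)d << 24);
}

/* An element is compressed when its technique identifier is nonzero. */
static int element_is_compressed(const CtocEntrySketch *e)
{
    assert(e != NULL);
    return e->ulCompressTech != 0;
}

/* Number of bytes to allocate in order to hold the element in memory. */
static uint32_t element_memory_bytes(const CtocEntrySketch *e)
{
    return element_is_compressed(e) ? e->ulUncompressBytes : e->ulSize;
}
```

An application would typically dispatch on **ulCompressTech** to select a decompressor and allocate `element_memory_bytes` bytes for the output.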

Compound File Element Group (CGRP) Chunk

The actual elements of data referenced by the CTOC chunk are stored in a **Compound File Element Group (CGRP) chunk**. The CGRP chunk contains all the compound file elements, concatenated together into one contiguous block of data. An element in the CGRP chunk might be unused if it was marked for deletion, or if it was altered and stored elsewhere within the CGRP chunk.

Elements within the CGRP chunk are of arbitrary size and can appear in a specific or arbitrary order, depending upon the file format definition. Each element is identified by a corresponding CTOC table entry.
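Direct access to an element through its CTOC table entry might look like the following sketch. It assumes that **ulOffset** is measured from the start of the CGRP chunk data, which should be confirmed against the MMCTOCENTRY description. The function name `element_data` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to an element inside the CGRP
 * chunk data, or NULL if the entry's offset and size fall outside it.
 * Assumes ulOffset is relative to the start of the CGRP chunk data. */
static const unsigned char *element_data(const unsigned char *cgrp,
                                         uint32_t cgrp_size,
                                         uint32_t ulOffset,
                                         uint32_t ulSize)
{
    assert(cgrp != NULL);
    /* Written to avoid overflow in ulOffset + ulSize. */
    if (ulOffset > cgrp_size || ulSize > cgrp_size - ulOffset)
        return NULL;
    return cgrp + ulOffset;
}
```

The bounds check guards against a corrupt or stale table entry that points past the end of the CGRP chunk.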

Using the standard RIFF notation, the CGRP chunk is defined as follows:

<CGRP-chunk> $\rightarrow$ CGRP([<compound file element>]...)

Placement of the CTOC and CGRP Chunks

The specific file format definition can specify which of the two chunks appears first in the data file.

  • Generally, the **CTOC chunk** is placed at the **front of the file** to reduce the seek and read times required to access it.
  • During authoring time, an application might place the **CTOC chunk at the end of the file**, so it can be expanded as elements are added to the CGRP chunk.