MMProgRef - Resource Interchange File Format

From EDM2

RIFF (Resource Interchange File Format) is the **tagged file structure** developed for multimedia resource files. The structure of a RIFF file is similar to the structure of an Electronic Arts Interchange File Format file (EA IFF). RIFF is not actually a file format itself (because it does not represent a specific kind of information), but its name contains the words "interchange file format" in recognition of its roots in IFF.

RIFF has a counterpart, **RIFX**, that is used to define RIFF file formats that use the **Motorola** integer byte-ordering format rather than the **Intel** format. A RIFX file is the same as a RIFF file, except that the first four bytes are 'RIFX' instead of 'RIFF', and integer byte ordering is represented in Motorola format.
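The difference between the two byte orderings can be sketched in C as follows. The function names are hypothetical, and the fixed-width types from <stdint.h> are an assumption made so that the sketch behaves the same regardless of the compiler's size for long.

```c
#include <stdint.h>

/* Read a 32-bit unsigned value stored in Intel (little-endian) order,
   as used by 'RIFF' files: least significant byte first. */
uint32_t riff_read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit unsigned value stored in Motorola (big-endian) order,
   as used by 'RIFX' files: most significant byte first. */
uint32_t riff_read_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

A reader would typically pick one of the two functions after inspecting the first four bytes of the file.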

Chunks

The basic building block of a RIFF file is called a chunk. Using C syntax, a chunk can be defined as follows:

typedef unsigned long ULONG;
typedef unsigned char BYTE;

typedef ULONG FOURCC;       /* Four-character code              */

typedef FOURCC CKID;        /* Four-character-code chunk identifier */
typedef ULONG CKSIZE;       /* 32-bit unsigned size value           */

typedef struct {            /* Chunk structure                      */
      CKID             ckID;        /* Chunk type identifier      */
      CKSIZE           ckSize;      /* Chunk size field (size of ckData) */
      BYTE             ckData[ckSize]; /* Chunk data           */
} CK;

A **FOURCC** is represented as a sequence of one to four ASCII alphanumeric characters, padded on the right with blank characters (ASCII character value 32) as required, with no embedded blanks.

For example, the four-character code 'FOO' is stored as a sequence of four bytes: 'F', 'O', 'O', ' ' in ascending addresses. For quick comparisons, a four-character code may also be treated as a 32-bit number.
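The padding rule can be illustrated with a small C sketch. The helper names are hypothetical, and truncating input longer than four characters is an assumption of this sketch, not something the format defines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the code s in out[0..3] in ascending
   addresses, padding on the right with blanks (ASCII 32).
   Assumption: input longer than four characters is truncated. */
void fourcc_from_string(const char *s, uint8_t out[4])
{
    size_t n = strlen(s);
    size_t i;

    if (n > 4)
        n = 4;
    for (i = 0; i < 4; i++)
        out[i] = (i < n) ? (uint8_t)s[i] : (uint8_t)' ';
}

/* Compare two four-character codes byte by byte. */
int fourcc_equal(const uint8_t a[4], const uint8_t b[4])
{
    return memcmp(a, b, 4) == 0;
}
```

Comparing the four bytes directly avoids any dependence on the byte order of the host when codes are treated as 32-bit numbers.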

The three parts of the chunk are described in the following table:

Part Description
ckID A four-character code that identifies the representation of the chunk data. A program reading a RIFF file can skip over any chunk whose chunk ID it doesn't recognize; it simply skips the number of bytes specified by ckSize plus the pad byte, if present.
ckSize A 32-bit unsigned value identifying the size of ckData. This size value does not include the size of the ckID or ckSize fields or the pad byte at the end of ckData.
ckData Binary data of fixed or variable size. The start of ckData is word-aligned with respect to the start of the RIFF file. If the chunk size is an odd number of bytes, a pad byte with value zero is written after ckData. Word aligning improves access speed (for chunks resident in memory) and maintains compatibility with EA IFF. The ckSize value does not include the pad byte.
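As a rough illustration of the skipping rule, the number of bytes a chunk occupies in the file can be computed from ckSize alone. The function name is hypothetical; the result is returned as a 64-bit value so that the sum cannot overflow.

```c
#include <stdint.h>

/* Bytes occupied by one chunk in the file: 4 bytes of ckID, 4 bytes of
   ckSize, ckSize bytes of ckData, and one pad byte when ckSize is odd. */
uint64_t riff_chunk_span(uint32_t ckSize)
{
    return 8u + (uint64_t)ckSize + (ckSize & 1u);
}
```

Adding this value to the offset of a chunk gives the offset of the chunk that follows it.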

We can represent a chunk with the following notation (in this example, the ckSize and pad byte are implicit):

<ckID> ( <ckData> )

Two types of chunks, the 'LIST' and 'RIFF' chunks, may contain **nested chunks**, or subchunks. These special chunk types are discussed later in this document. All other chunk types store a single element of binary data in <ckData>.

RIFF Forms

A **RIFF form** is a chunk with a 'RIFF' chunk ID. The term also refers to a file format that follows the RIFF framework. The following is the current list of registered RIFF forms. Each is described in Multimedia File Formats.

Form Type Description
WAVE Waveform Audio Format

Using the notation for representing a chunk, a RIFF form looks like the following:

RIFF ( <formType> <ck>... )

The first four bytes of a RIFF form make up a chunk ID with values 'R', 'I', 'F', 'F'. The ckSize field is required, but for simplicity it is omitted from the notation.

The first ULONG of chunk data in the 'RIFF' chunk (shown above as <formType>) is a four-character code value identifying the data representation, or form type, of the file. Following the form-type code is a series of subchunks. Which subchunks are present depends on the form type. The definition of a particular RIFF form typically includes the following:

  • A unique four-character code identifying the form type
  • A list of mandatory chunks
  • A list of optional chunks
  • Possibly, a required order for the chunks
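The layout of the start of a RIFF form (chunk ID, ckSize, then the form type) can be checked with a sketch such as the following. The function name and the in-memory buffer interface are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 and copies the form type into formType[0..3] if buf begins
   with a 'RIFF' chunk header followed by a form type; returns 0 otherwise.
   The ckSize field at bytes 4..7 is not interpreted here. */
int riff_form_type(const uint8_t *buf, size_t len, uint8_t formType[4])
{
    if (len < 12)                       /* ckID + ckSize + form type */
        return 0;
    if (memcmp(buf, "RIFF", 4) != 0)
        return 0;
    memcpy(formType, buf + 8, 4);
    return 1;
}
```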

Defining and Registering RIFF Forms

The form-type code for a RIFF form must be unique. To guarantee this uniqueness, you must register any new form types before release. See Registering Multimedia Formats for information on registering RIFF forms.

Like RIFF forms, RIFX forms must also be registered. Registering a RIFF form does not automatically register the RIFX counterpart. No RIFX form types are currently defined.

Registered Form and Chunk Types

By convention, the form-type code for registered form types contains only digits and uppercase letters. Form-type codes that are all uppercase denote a registered, unique form type. Use lowercase letters for temporary or prototype chunk types.
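A program could test whether a code follows the convention for registered types with a check like the one below. This is only a sketch: the function name is hypothetical, and allowing trailing blank padding is an assumption based on the FOURCC padding rule described earlier.

```c
#include <stdint.h>

/* Returns 1 if the code consists only of digits and uppercase ASCII
   letters, optionally followed by blank padding; returns 0 otherwise.
   Assumption: trailing blanks are accepted, embedded blanks are not. */
int fourcc_is_registered_style(const uint8_t code[4])
{
    int seenBlank = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint8_t c = code[i];
        if (c == ' ') {
            seenBlank = 1;
            continue;
        }
        if (seenBlank)
            return 0;                   /* character after a blank */
        if (!((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')))
            return 0;
    }
    return code[0] != ' ';              /* reject an all-blank code */
}
```

Passing this check does not mean a code is actually registered; it only shows that the code is written in the registered style.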

Certain chunk types are also globally unique and must also be registered before use. These registered chunk types are not specific to a certain form type; they can be used in any form. If a registered chunk type can be used to store your data, you should use the registered chunk type rather than define your own chunk type containing the same type of information.

For example, a chunk with chunk ID 'INAM' always contains the name or title of a file. Also, within all RIFF files, file names or titles are contained within chunks with ID 'INAM' and have a standard data format.

Unregistered (Form-Specific) Chunk Types

Chunk types that are used only in a certain form type use a lowercase chunk ID. A lowercase chunk ID has specific meaning only within the context of a specific form type. After a form designer is allocated a registered form type, the designer can choose lowercase chunk types to use within that form. See Registering Multimedia Formats for information on registering form types.

For example, a chunk with ID 'scln' inside one form type might contain the "number of scan lines." Inside some other form type, a chunk with ID 'scln' might mean "secondary lambda number."

Notation for Representing Sample RIFF Files

RIFF is a binary format, but a RIFF file is easier to comprehend when it is presented in an ASCII representation. This section defines a standard notation used to present samples of various types of RIFF files. If you define a RIFF form, we urge you to use this notation in any file format samples you provide in your documentation.

Basic Notation for Representing RIFF Files

The following information summarizes the elements of the RIFF notation required for representing sample RIFF files:

Notation Description
<ckID> (<ckData>) The chunk with ID <ckID> and data <ckData>. As previously described, <ckID> is a four-character code which may be enclosed by single quotes for emphasis.

For example, the following notation describes a 'RIFF' chunk with a form type of 'QRST'. The data portion of this chunk contains a 'FOO' subchunk.

RIFF('QRST' FOO(17 23))

The following example describes an 'ICOP' chunk containing the string "Copyright Encyclopedia International.":

'ICOP' ("Copyright Encyclopedia International."Z)

Notation Description
<number>[<modifier>] A number in Intel format, where <number> is an optional sign (+ or -) followed by one or more digits and modified by the optional <modifier>.

Valid <modifier> values follow:

Modifier Meaning
None 16-bit number in decimal format
H 16-bit number in hexadecimal format
C 8-bit number in decimal format
CH 8-bit number in hexadecimal format
L 32-bit number in decimal format
LH 32-bit number in hexadecimal format

Several examples follow:

0
65535
-1
0L
4a3c89LH
-1C
21CH

Note that -1 and 65535 represent the same value. The application reading this file must know whether to interpret the number as signed or unsigned.
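The following sketch shows why the two values are indistinguishable in the file: both produce the same two bytes when stored as a 16-bit number in Intel format. The function name is hypothetical.

```c
#include <stdint.h>

/* Store a 16-bit value in Intel (little-endian) order. */
void riff_write_u16_le(uint16_t v, uint8_t out[2])
{
    out[0] = (uint8_t)(v & 0xFFu);      /* low byte first  */
    out[1] = (uint8_t)(v >> 8);         /* high byte second */
}
```

Calling the function with (uint16_t)-1 and with 65535 writes the bytes FF FF in both cases.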

Notation Description
'<chars>' A four-character code (32-bit quantity) consisting of a sequence of zero to four ASCII characters <chars> in the given order. If <chars> is less than four characters long, it is implicitly padded on the right with blanks. Two single quotes is equivalent to four blanks.

Examples follow:

'RIFF'
'xyz'
''

<chars> can include escape sequences, which are combinations of characters introduced by a backslash (\) and used to represent other characters. Escape sequences are listed in the following section.

Notation Description
"<string>"[<modifier>] The sequence of ASCII characters contained in <string> and modified by the optional modifier <modifier>. The quoted text can include any of the escape sequences listed in the following section.

Valid <modifier> values follow:

Modifier Meaning
none No NULL terminator or size prefix.
Z String is NULL-terminated
B String has an 8-bit (byte) size prefix
US String has a 16-bit (ushort) size prefix
BZ String has a byte-size prefix and is NULL-terminated
WZ String has a word-size prefix and is NULL-terminated

NULL-terminated means that the string is followed by a character with ASCII value 0. A size prefix is an unsigned integer, stored as a byte or a word in Intel format preceding the string characters, that specifies the length of the string. In the case of strings with BZ or WZ modifiers, the size prefix specifies the size of the string without the terminating NULL.
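As an illustration, a string with the BZ modifier could be written to a buffer as shown below. The function name and the buffer interface are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode s as a BZ string: one size byte, the characters, then a
   terminating NULL. The size byte does not count the terminator.
   Returns the number of bytes written, or 0 if s is longer than 255
   characters or does not fit in cap bytes. */
size_t riff_encode_bzstr(const char *s, uint8_t *out, size_t cap)
{
    size_t n = strlen(s);

    if (n > 255 || cap < n + 2)
        return 0;
    out[0] = (uint8_t)n;                /* size prefix */
    memcpy(out + 1, s, n);              /* string characters */
    out[1 + n] = 0;                     /* NULL terminator */
    return n + 2;
}
```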

The various string formats referred to above are discussed in "Storing Strings in RIFF Chunks," following later in this section.

Examples follow:

"No prefix, no NULL terminator"
"No prefix, NULL terminator"Z
"Byte prefix, NULL terminator"BZ

Escape Sequences for Four-Character Codes and String Chunks

The following escape sequences can be used in four-character codes and string chunks:

Escape Sequence ASCII Value Description
\n 10 Newline character
\t 9 Horizontal tab character
\b 8 Backspace character
\r 13 Carriage return character
\f 12 Form feed character
\\ 92 Backslash
\' 39 Single quote
\" 34 Double quote
\ddd Octal ddd Arbitrary character
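A tool that reads this notation might expand the escape sequences along the following lines. The function name is hypothetical, and the sketch assumes that \ddd is always written with exactly three octal digits and that any other escape is an error.

```c
#include <stddef.h>

/* Expand the escape sequences in the notation text in and store the
   resulting bytes in out. Returns the number of bytes stored, or
   (size_t)-1 on a malformed escape or if out is too small.
   Assumption: \ddd uses exactly three octal digits. */
size_t riff_unescape(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*in != '\0') {
        unsigned c = (unsigned char)*in++;
        if (c == '\\') {
            char e = *in;
            if (e == '\0')
                return (size_t)-1;      /* dangling backslash */
            in++;
            switch (e) {
            case 'n':  c = 10; break;
            case 't':  c = 9;  break;
            case 'b':  c = 8;  break;
            case 'r':  c = 13; break;
            case 'f':  c = 12; break;
            case '\\': c = 92; break;
            case '\'': c = 39; break;
            case '"':  c = 34; break;
            default:
                if (e < '0' || e > '7' ||
                    in[0] < '0' || in[0] > '7' ||
                    in[1] < '0' || in[1] > '7')
                    return (size_t)-1;  /* not a known escape */
                c = (unsigned)((e - '0') * 64 + (in[0] - '0') * 8
                               + (in[1] - '0'));
                if (c > 255)
                    return (size_t)-1;  /* does not fit in a byte */
                in += 2;
                break;
            }
        }
        if (n >= cap)
            return (size_t)-1;
        out[n++] = (unsigned char)c;
    }
    return n;
}
```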

Extended Notation for Representing RIFF Form Definitions

To unambiguously define the structure of new RIFF forms, document the RIFF form using the basic notation along with the following extended notation:

Notation Description
<name> A label that refers to some element of the file, where <name> is the name of the label.

Examples follow:

<NAME-ck>
<GOBL-form>
<bitmap-bits>
<foo>

Conventionally, a label that refers to a chunk is named <ckID-ck>, where 'ckID' is the chunk ID. Similarly, a label that refers to a RIFF form is named <formType-form>, where "formType" is the name of the form's type.

Notation Description
<name> ::= elements The actual data represented by <name> is defined as elements. This states that <name> is an abbreviation for elements, where elements is a sequence of other labels and literal data.

An example follows:

<GOBL-form> ::= RIFF ( 'GOBL' <form-data> )

This example defines label <GOBL-form> as representing a RIFF form with form type 'GOBL' and data equal to <form-data>, where <form-data> is a label that would be defined in another rule. Note that a label may represent any data, not just a RIFF chunk or form.

Notation Description
<name:type> This is the same as <name>, but it also defines <name> to be equivalent to <type>. This notation obviates the following rule:
<name> ::= <type>

This allows you to give a symbolic name to an element of a file format and to specify the element data type.

An example follows:

<xyz-coordinate> ::= <x:INT> <y:INT> <z:INT>

This defines <xyz-coordinate> to consist of three parts concatenated together: <x>, <y>, and <z>. The definition also specifies that <x>, <y>, and <z> are integers. This notation is equivalent to the following:

<xyz-coordinate> ::= <x> <y> <z>
<x> ::= <INT>
<y> ::= <INT>
<z> ::= <INT>

Notation Description
[elements] An optional sequence of labels and literal data. Surrounded by square brackets, it may be considered an element itself.

An example follows:

<FOO-form> ::= RIFF('FOO' [<header-ck>] <data-ck>)

This example defines form "FOO" with an optional header chunk followed by a mandatory data chunk.

Notation Description
el1 | el2 | ... | elN Exactly one of the listed elements must be present.

An example follows:

<hdr-ck> ::= hdr(<hdr-x> | <hdr-y> | <hdr-z>)

This example defines the 'hdr' chunk's data as containing one of <hdr-x>, <hdr-y>, or <hdr-z>.

Notation Description
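element... One or more occurrences of element may be present. The ellipsis has this meaning only when it follows an element; in cases such as "el1 | el2 | ... | elN," the ellipsis has its ordinary English meaning.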

An example follows:

<data-ck> ::= data(<count:INT> <item:INT>...)

This example defines the data of the 'data' chunk to contain an integer <count>, followed by one or more occurrences of the integer <item>.

Notation Description
[element]... Zero or more occurrences of element may be present.

An example follows:

<data-ck> ::= data(<count:INT> [<item:INT>]...)

This example defines the data of the 'data' chunk to contain an integer <count> followed by zero or more occurrences of an integer <item>.

Notation Description
{elements} The group of elements within the braces should be considered a single element.

An example follows:

<blorg> ::= <this> | {<that> | <other>}...

This example defines <blorg> to be either <this> or one or more occurrences of <that> or <other>, intermixed in any way. Contrast this with the following example:

<blorg> ::= <this> | <that> | <other>...

This example defines <blorg> to be either <this> or <that> or one or more occurrences of <other>.

Notation Description
struct { ... } name A structure defined using C syntax. This can be used instead of a sequence of labels if a C header (include) file is available that defines the structure. The label used to refer to the structure should be the same as the structure's typedef name.

An example follows:

<3D_POINT> ::= struct {
                  INT x;      /* X-coordinate */
                  INT y;      /* Y-coordinate */
                  INT z;      /* Z-coordinate */
                } 3D_POINT

Wherever possible, declare the structure fields with the types used in this notation (such as INT) because these types are more portable than C types such as int. The structure fields are assumed to be present in the file in the order given, with no padding or forced alignment.

Unless the RIFF chunk ID is 'RIFX', integer byte ordering is assumed to be in **Intel format**.
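Because a C compiler may insert padding or use a different byte order, a reader can decode such a structure field by field rather than copying the file bytes directly into a C struct. The sketch below makes two assumptions that are not stated in this section: INT is taken to be a 16-bit signed integer, and the type is named POINT3D because a C identifier cannot begin with a digit.

```c
#include <stdint.h>

/* In-memory form of the point. Assumption: INT is 16 bits, signed. */
typedef struct {
    int16_t x, y, z;
} POINT3D;

/* Decode one 16-bit signed value stored in Intel order. */
static int16_t read_i16_le(const uint8_t *p)
{
    uint16_t u = (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
    int32_t v = (u >= 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
    return (int16_t)v;
}

/* Decode three consecutive fields with no padding between them. */
void read_point3d(const uint8_t *p, POINT3D *pt)
{
    pt->x = read_i16_le(p);
    pt->y = read_i16_le(p + 2);
    pt->z = read_i16_le(p + 4);
}
```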

Notation Description
/* comment */ An explanatory comment to a rule.

An example follows:

<weekend> ::= 'Sat'|'Sun'       /* Four-character code */
                                /* for day */

A Sample RIFF Form Definition and RIFF Form

The following example defines <GOBL-form>, the hypothetical RIFF form of type 'GOBL'. To fully document a new RIFF form definition, a developer would also provide detailed descriptions of each file element, including the semantics of each chunk and sample files documented using the standard notation.

<GOBL-form> ::=   RIFF ( 'GOBL'          /* RIFF form header  */
                       [<org-ck>]      /* Origin chunk      */
                                       /* (default (0,0,0)) */
                        <obj-list>)     /* Series of graphical
                                         objects           */

<org-ck> ::=    org(   <origin:3D_POINT> )
                                       /* Object-list origin  */

                                       /* An object is a:     */
<obj-list> ::=  LIST(  'obj'   {   <sqr-ck>  |           /* square,  */
                                 <circ-ck> |           /* circle,  */
                                 <poly-ck>  }... )    /* or polygon */

<sqr-ck> ::=    sqr(   <pt1:3D_POINT>    /* one vertex */
                       <pt2:3D_POINT>    /* another vertex */
                       <pt3:3D_POINT> )  /* a third vertex */

<circ-ck> ::=   circ(  <center:3D_POINT>     /* Center of circle */
                       <circumPt:3D_POINT> ) /* Point on circumference */

<poly-ck> ::=   poly( <pt:3D_POINT>... )   /* List of points in a polygon */

<3D_POINT> ::=  struct                 /* Defined in "gobl.h" */
                {   INT x;                     /* X-coordinate */
                    INT y;                     /* Y-coordinate */
                    INT z;                     /* Z-coordinate */
                } 3D_POINT

Sample RIFF Form

The following sample RIFF form adheres to the form definition for form type GOBL. The file contains three subchunks:

  • An 'INFO' list
  • An 'org' chunk
  • An 'obj' list

The 'INFO' list and 'obj' list each have two subchunks. The 'INFO' list is a registered global chunk that can be used within any RIFF file. The 'INFO' list is described in INFO List Chunk.

Since the definition of the GOBL form does not refer to the INFO chunk, software that expects only 'org' and 'obj' chunks in a GOBL form would ignore the unknown 'INFO' chunk.

RIFF( 'GOBL'
      LIST('INFO'     /* INFO list containing file name and copyright */
            INAM("A House"Z)
            ICOP("(C) Copyright Encyclopedia International 1991"Z)
           )
      org(2, 0, 0)         /* Origin of object list          */
      LIST('obj'           /* Object list containing two polygons */
            poly(0,0,0  2,0,0  2,2,0  1,3,0  0,2,0)
            poly(0,0,5  2,0,5  2,2,5  1,3,5  0,2,5)
          )
    )            /* End of form                    */

Storing Strings in RIFF Chunks

This section lists methods for storing text strings in RIFF chunks. While these guidelines may not make sense for all applications, you should follow these conventions if you must make an arbitrary decision regarding string storage.

NULL-Terminated String (ZSTR) Format

A **NULL-terminated string (ZSTR)** consists of a series of characters followed by a terminating NULL character. The ZSTR is better than a simple character sequence (STR) because many programs are easier to write if strings are NULL-terminated. ZSTR is preferred to a string with a size prefix (BSTR or WSTR) because the size of the string is already available as the <ckSize> value, minus one for the terminating NULL character.

String Table Format

In a **string table**, all strings used in a structure are stored at the end of the structure in packed format. The structure includes fields that specify the offsets from the beginning of the string table to the individual strings. An example follows:

typedef struct
{
    INT     iWidgetNumber;      /* the widget number */
    USHORT  offszWidgetName;    /* an offset to a string
                                   in <rgchStrTab>*/
    USHORT  offszWidgetDesc;    /* an offset to a string
                                   in <rgchStrTab> */
    INT     iQuantity;          /* how many widgets */
    CHAR    rgchStrTab[1];      /* string table (allocate
                                   as large as needed) */
}       WIDGET;

If multiple chunks within the file need to reference variable-length strings, you can store the strings in a single chunk that acts as a string table. The chunks that refer to the strings contain offsets relative to the beginning of the data part of the string table chunk.
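Looking up a string by its offset can be sketched as follows. The function name is hypothetical; the bounds checks are an addition of this sketch so that a damaged offset does not lead outside the table.

```c
#include <stddef.h>
#include <string.h>

/* Return the NULL-terminated string that starts at offset off within a
   string table of tabLen bytes, or NULL if the offset is out of range
   or no terminator is found before the end of the table. */
const char *strtab_get(const char *tab, size_t tabLen, size_t off)
{
    if (off >= tabLen)
        return NULL;
    if (memchr(tab + off, '\0', tabLen - off) == NULL)
        return NULL;
    return tab + off;
}
```

With the WIDGET structure above, a reader would pass rgchStrTab as the table and offszWidgetName or offszWidgetDesc as the offset.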

NULL-Terminated, Byte Size Prefix String (BZSTR) Series

In a **BZSTR series**, a series of strings is stored in packed format. Each string is a BZSTR, with a byte size prefix and a NULL terminator. This format retains the ease-of-use characteristics of the ZSTR while providing the string size, allowing the application to quickly skip unneeded strings.
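The following sketch shows how the size prefix lets a reader step over strings it does not need. The function name and interface are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the characters of the index-th string (counting
   from zero) in a BZSTR series of len bytes, or NULL if the series ends
   first or an entry is truncated. */
const uint8_t *bzstr_series_get(const uint8_t *data, size_t len, size_t index)
{
    size_t pos = 0;

    while (pos < len) {
        size_t n = data[pos];           /* size prefix, excludes the NULL */
        if (len - pos < n + 2)
            return NULL;                /* entry runs past the end */
        if (index == 0)
            return data + pos + 1;      /* characters follow the prefix */
        index--;
        pos += n + 2;                   /* prefix + characters + NULL */
    }
    return NULL;
}
```

Because each entry also ends with a NULL, the returned pointer can be used directly as a C string.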

Multiline String Format

When storing multiline strings, separate lines with a **carriage return/line feed pair** (ASCII 13/ASCII 10 pair). Although applications vary in their requirements for new line symbols (carriage return only, line feed only, or both), it is generally easier to strip out extra characters than to insert extra ones. Inserting characters might require reallocating memory blocks or pre-scanning the chunk before allocating memory for it.
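An application that wants line feeds only can strip the carriage returns in place, which never requires a larger buffer. The function below is a minimal sketch with a hypothetical name.

```c
#include <stddef.h>

/* Remove every carriage return (ASCII 13) from the NULL-terminated
   string s in place. Returns the new length of the string. */
size_t strip_carriage_returns(char *s)
{
    char *w = s;
    const char *r;

    for (r = s; *r != '\0'; r++)
        if (*r != '\r')
            *w++ = *r;
    *w = '\0';
    return (size_t)(w - s);
}
```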

Choosing a Storage Method

The following lists guidelines for deciding which storage method is appropriate for your application.

Usage Recommended Format
Chunk data contains nothing except a string ZSTR (NULL-terminated string) format
Chunk data contains a number of fields, some of which are variable-length strings String-table format
Multiple chunks within the file need to reference variable-length strings String-table format
Chunk data stores a sequence of strings, some of which the application may want to skip BZSTR (NULL-terminated string with byte size prefix) series
Chunk data contains multiline strings Multiline string format

LIST Chunk

A **LIST chunk** contains a list, or ordered sequence, of subchunks. A LIST chunk is defined as follows:

LIST( <list-type> [<chunk>]... )

The <list-type> is a four-character code that identifies the contents of the list.

If an application recognizes the list type, it should know how to interpret the sequence of subchunks. However, since a LIST chunk may contain only subchunks (after the list type), an application that does not know about a specific list type can still walk through the sequence of subchunks.
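Walking the subchunks of a list without knowing the list type might look like the sketch below. The function name and interface are hypothetical, the data is assumed to be in Intel byte order (a 'RIFF' file rather than 'RIFX'), and body is assumed to point just past the list type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Search the subchunks in body (bodyLen bytes, starting after the list
   type) for the first chunk whose ID equals the four bytes at id.
   On success, stores ckSize in *sizeOut and returns a pointer to ckData.
   Returns NULL if no such chunk exists or the data is truncated. */
const uint8_t *riff_find_subchunk(const uint8_t *body, size_t bodyLen,
                                  const char *id, uint32_t *sizeOut)
{
    size_t pos = 0;

    while (bodyLen - pos >= 8) {
        const uint8_t *ck = body + pos;
        uint32_t size = (uint32_t)ck[4]
                      | ((uint32_t)ck[5] << 8)
                      | ((uint32_t)ck[6] << 16)
                      | ((uint32_t)ck[7] << 24);
        size_t avail = bodyLen - pos - 8;   /* bytes after the header */
        size_t span;

        if (size > avail)
            return NULL;                    /* ckData is truncated */
        if (memcmp(ck, id, 4) == 0) {
            *sizeOut = size;
            return ck + 8;
        }
        span = (size_t)size + (size & 1u);  /* ckData plus pad byte */
        if (span > avail)
            return NULL;                    /* nothing follows */
        pos += 8 + span;
    }
    return NULL;
}
```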

Like chunk IDs, list types must be registered, and an all-lowercase list type has meaning relative to the form that contains it. See Registering Multimedia Formats for information on registering list types.

INFO List Chunk

The **'INFO' list** is a registered global list type that can store information that helps identify the contents of the file. This information is useful but does not affect the way a program interprets the file; examples are copyright information and comments. An 'INFO' list is a 'LIST' chunk with list type 'INFO'. The following shows a sample 'INFO' list chunk:

LIST('INFO'       INAM("Two Trees"Z)
                   ICMT("A picture for the opening screen"Z) )

An 'INFO' list should contain only the following chunks. New chunks may be defined, but an application should ignore any chunk it doesn't understand. The chunks listed below may only appear in an 'INFO' list. Each chunk contains a ZSTR, or null-terminated text string.

Chunk ID Description
IARL Archival Location. Indicates where the subject of the file is archived.
IART Artist. Lists the artist of the original subject of the file. For example, "Michelangelo."
ICMS Commissioned. Lists the name of the person or organization that commissioned the subject of the file. For example, "Pope Julius II."
ICMT Comments. Provides general comments about the file or the subject of the file. If the comment is several sentences long, end each sentence with a period. Do not include newline characters.
ICOP Copyright. Records the copyright information for the file. For example, "Copyright Encyclopedia International 1991." If there are multiple copyrights, separate them by a semicolon followed by a space.
ICRD Creation date. Specifies the date the subject of the file was created. List dates in year-month-day format, padding one-digit months and days with a zero on the left. For example, "1553-05-03" for May 3, 1553.
ICRP Cropped. Describes whether an image has been cropped and, if so, how it was cropped. For example, "lower right corner."
IDIM Dimensions. Specifies the size of the original subject of the file. For example, "8.5 in h, 11 in w."
IDPI Dots Per Inch. Stores dots per inch setting of the digitizer used to produce the file, such as "300."
IENG Engineer. Stores the name of the engineer who worked on the file. If there are multiple engineers, separate the names by a semicolon and a blank. For example, "Smith, John; Adams, Joe."
IGNR Genre. Describes the original work, such as, "landscape," "portrait," "still life," etc.
IKEY Keywords. Provides a list of keywords that refer to the file or subject of the file. Separate multiple keywords with a semicolon and a blank. For example, "Seattle; aerial view; scenery."
ILGT Lightness. Describes the changes in lightness settings on the digitizer required to produce the file. Note that the format of this information depends on hardware used.
IMED Medium. Describes the original subject of the file, such as, "computer image," "drawing," "lithograph," and so forth.
INAM Name. Stores the title of the subject of the file, such as, "Seattle From Above."
IPLT Palette Setting. Specifies the number of colors requested when digitizing an image, such as "256."
IPRD Product. Specifies the name of the title the file was originally intended for, such as "Encyclopedia of Pacific Northwest Geography."
ISBJ Subject. Describes the contents of the file, such as "Aerial view of Seattle."
ISFT Software. Identifies the name of the software package used to create the file, such as "Microsoft WaveEdit."
ISHP Sharpness. Identifies the changes in sharpness for the digitizer required to produce the file (the format depends on the hardware used).
ISRC Source. Identifies the name of the person or organization who supplied the original subject of the file. For example, "Trey Research."
ISRF Source Form. Identifies the original form of the material that was digitized, such as "slide," "paper," "map," and so forth. This is not necessarily the same as IMED.
ITCH Technician. Identifies the technician who digitized the subject file. For example, "Smith, John."
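The year-month-day format described for the ICRD chunk can be produced with a standard formatted-output call. The function below is a sketch with a hypothetical name; it does not check that the month and day are valid calendar values.

```c
#include <stdio.h>

/* Format a date as year-month-day, padding one-digit months and days
   with a zero on the left. Returns 1 on success, or 0 if the result
   does not fit in cap bytes (including the terminating NULL). */
int format_icrd(char *out, size_t cap, int year, int month, int day)
{
    int n = snprintf(out, cap, "%04d-%02d-%02d", year, month, day);
    return n > 0 && (size_t)n < cap;
}
```

The resulting string, followed by its terminating NULL, would form the ZSTR stored as the chunk data.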

CSET (Character Set) Chunk