MMProgRef - Multimedia File Formats

From EDM2

This appendix describes the following multimedia file formats:

  • Audio/Video Interleaved (AVI)
  • Bundle (BND)
  • Device-independent bitmap (DIB)
  • RIFF DIB (RDIB)
  • Musical Instrument Digital Interface (MIDI)
  • RIFF MIDI (RMID)
  • Palette (PAL)
  • Rich Text Format (RTF)
  • Waveform audio (WAVE)

Most of these file formats are based on the resource interchange file format (RIFF), described in the **OS/2 Multimedia Application Programming Guide**.

Audio/Video Interleaved (AVI) Format

The Microsoft Audio/Video Interleaved (AVI) file format is a RIFF file specification used with applications that capture, edit, and play back audio/video sequences. In general, AVI files can contain multiple streams of different types of data. Most AVI sequences use a single audio stream and a single video stream. A simple variation uses a single video stream and no audio stream.

This section describes the types of AVI files supported by OS/2 multimedia. Refer to the Microsoft documentation for a complete description of the AVI file format.

AVI RIFF Form

AVI files use the AVI RIFF form, which is identified by the four-character code 'AVI' as its RIFF form type. All AVI files include two mandatory LIST chunks, which define the format of the streams and the stream data. AVI files generally also include an optional index chunk, which specifies the location of data chunks within the file. An AVI file with these components has the following form:

RIFF ('AVI'
      LIST ('hdrl'
            ...
            ...
            ...
          )
      LIST ('movi'
            ...
            ...
            ...
          )
      ['idx1'<AVI Index>]
      )

The LIST chunks and the index chunk are subchunks of the RIFF 'AVI' chunk. The 'AVI' chunk identifies the file as an AVI RIFF file. The LIST 'hdrl' chunk defines the format of the data and is the first required list chunk. The LIST 'movi' chunk contains the data for the AVI sequence and is the second required list chunk. The 'idx1' chunk is the optional index chunk. AVI files must keep these three components in the proper sequence.
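
The ordering rule above can be checked mechanically. The following is a minimal sketch, not part of OS/2 multimedia: it walks the top-level subchunks of a RIFF form held in memory and verifies that LIST 'hdrl' comes first, LIST 'movi' second, and 'idx1' (if present) last. The function names `riff_le32` and `avi_layout_ok` are hypothetical. The sketch relies on the general RIFF conventions that each chunk starts with a four-character ID and a 32-bit little-endian size, and that chunk data is padded to an even length. It does not check the form type itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value, as RIFF stores chunk sizes. */
static uint32_t riff_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 if the buffer holds a RIFF form whose
 * top-level chunks appear as LIST 'hdrl', then LIST 'movi', then an
 * optional 'idx1'. Unrecognized chunks (for example 'JUNK') are skipped.
 */
int avi_layout_ok(const unsigned char *buf, size_t len)
{
    size_t pos = 12;  /* skip 'RIFF', the form size, and the form type */
    int stage = 0;    /* 0: want hdrl, 1: want movi, 2: movi seen, 3: idx1 seen */

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0)
        return 0;

    while (pos + 8 <= len) {
        const unsigned char *ck = buf + pos;
        uint32_t size = riff_le32(ck + 4);

        if (size > len - pos - 8)
            return 0;  /* chunk runs past the end of the buffer */

        if (memcmp(ck, "LIST", 4) == 0 && size >= 4) {
            if (memcmp(ck + 8, "hdrl", 4) == 0) {
                if (stage != 0) return 0;
                stage = 1;
            } else if (memcmp(ck + 8, "movi", 4) == 0) {
                if (stage != 1) return 0;
                stage = 2;
            }
        } else if (memcmp(ck, "idx1", 4) == 0) {
            if (stage != 2) return 0;
            stage = 3;
        }
        pos += 8 + (size_t)size + (size & 1);  /* data is padded to even length */
    }
    return stage >= 2;
}
```

A file that places 'movi' before 'hdrl', or 'idx1' before 'movi', is rejected by this check.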

The LIST 'hdrl' and LIST 'movi' chunks use subchunks for their data. The following example shows the AVI RIFF form expanded with the chunks needed to complete the LIST 'hdrl' and LIST 'movi' chunks:

RIFF ('AVI'
      LIST ('hdrl'
            'avih'(<Main AVI Header>)
            LIST ('strl'
                  'strh'(<Stream header>)
                  'strf'(<Stream format>)
                  'strd'(additional header data)
                  ...
                  ...
                  ...
                 )
            ...
            ...
            ...
           )
      LIST ('movi'
            {SubChunk | LIST ('rec'
                              SubChunk1
                              SubChunk2
                              ...
                              ...
                              ...
                             )
             ...
             ...
             ...
            }
            ...
            ...
            ...
           )
      ['idx1'<AVIIndex>]
     )

The following sections describe the chunks contained in the LIST 'hdrl' and LIST 'movi' chunks as well as the 'idx1' chunk.

Data Structures for AVI Files

Data structures used in the RIFF chunks are defined in the AVIFMT.H header file. The reference section which follows describes the data structures that are used for the main AVI header, stream headers and AVI index chunks.

The Main AVI Header LIST

The file begins with the main header. In the AVI file, this header is identified with an 'avih' four-character code. The header contains general information about the file, such as the number of streams within the file and the width and height of the AVI sequence. The main header has the following data structure defined for it:

typedef struct {
   ULONG ulMicroSecPerFrame;
   ULONG ulMaxBytesPerSec;
   ULONG ulReserved1;
   ULONG ulFlags;
   ULONG ulTotalFrames;
   ULONG ulInitialFrames;
   ULONG ulStreams;
   ULONG ulSuggestedBufferSize;
   ULONG ulWidth;
   ULONG ulHeight;
   ULONG ulReserved[4];
} MainAVIHeader;

The **ulMicroSecPerFrame** field specifies the period between video frames. This value indicates the overall timing for the file.

The **ulMaxBytesPerSec** field specifies the approximate maximum data rate of the file. This value indicates the number of bytes per second the system must handle to present an AVI sequence as specified by the other parameters contained in the main header and stream header chunks.

The **ulFlags** field contains any flags for the file. The AVIF_HASINDEX flag indicates that the file contains an index chunk. The AVIF_ISINTERLEAVED flag indicates that the AVI file has been interleaved. The system can stream interleaved data more efficiently than non-interleaved data.

The **ulTotalFrames** field of the main header specifies the total number of frames of data in the file.

The **ulInitialFrames** field is used for some interleaved files. If you are creating interleaved files with audio skewing, specify the number of audio frames in the file prior to the initial video frame of the AVI sequence in this field.

The **ulStreams** field specifies the number of streams in the file. For example, a file with audio and video has two (2) streams.

The **ulSuggestedBufferSize** field specifies the suggested buffer size for reading the file. Generally, this size should be large enough to contain the largest chunk in the file. For an interleaved file, the buffer size should be large enough to read an entire record and not just a chunk.

The **ulWidth** and **ulHeight** fields specify the width and height of the AVI file in pixels.
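
As a hedged illustration of how the timing fields relate, the sketch below derives a frame rate and a playing time from **ulMicroSecPerFrame** and **ulTotalFrames**. The helper names are hypothetical, and `ULONG` is declared locally as a 32-bit stand-in for the OS/2 type so that the fragment compiles on its own. Real code would take the structure from AVIFMT.H.

```c
#include <stdint.h>

typedef uint32_t ULONG;  /* local stand-in for the OS/2 ULONG type (assumption) */

typedef struct {
    ULONG ulMicroSecPerFrame;
    ULONG ulMaxBytesPerSec;
    ULONG ulReserved1;
    ULONG ulFlags;
    ULONG ulTotalFrames;
    ULONG ulInitialFrames;
    ULONG ulStreams;
    ULONG ulSuggestedBufferSize;
    ULONG ulWidth;
    ULONG ulHeight;
    ULONG ulReserved[4];
} MainAVIHeader;

/* Frames per second implied by the frame period; 0.0 if the period is unset. */
double avi_frames_per_second(const MainAVIHeader *h)
{
    if (h->ulMicroSecPerFrame == 0)
        return 0.0;
    return 1000000.0 / (double)h->ulMicroSecPerFrame;
}

/* Nominal playing time in seconds: frame count times frame period. */
double avi_duration_seconds(const MainAVIHeader *h)
{
    return (double)h->ulTotalFrames * (double)h->ulMicroSecPerFrame / 1000000.0;
}
```

For example, a header with a frame period of 100000 microseconds and 50 frames describes a 10 frame per second sequence that plays for 5 seconds.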

The Stream Header 'strl' Chunks

The main header is followed by one or more 'strl' chunks. A 'strl' chunk is required for each data stream. These chunks contain information about the streams in the file. Each 'strl' chunk must contain a stream header and stream format chunk. Stream header chunks are identified by the 'strh' four-character code and stream format chunks are identified by the 'strf' four-character code. In addition to the stream header and stream format chunks, the 'strl' chunk might also contain a stream data chunk. Stream data chunks are identified with the four-character code 'strd'.

The stream header has the following data structure defined for it:

typedef struct {
   FOURCC fccType;
   FOURCC fccHandler;
   ULONG  ulFlags;
   ULONG  ulReserved1;
   ULONG  ulInitialFrames;
   ULONG  ulScale;
   ULONG  ulRate;
   ULONG  ulStart;
   ULONG  ulLength;
   ULONG  ulSuggestedBufferSize;
   ULONG  ulQuality;
   ULONG  ulSampleSize;
   ULONG  ulReserved[2];
} AVIStreamHeader;

The stream header specifies the type of data the stream contains, such as audio or video, by means of a four-character code. The **fccType** field is set to 'vids' if the stream it specifies contains video data. It is set to 'auds' if it contains audio data.

The **fccHandler** field contains a four-character code describing the installable decompressor used with the data.

The **ulFlags** field contains any flags for the data stream. The AVISF_DISABLED flag indicates that the stream data should be rendered only when explicitly enabled by the user.

The **ulInitialFrames** field is used for interleaved files. If you are creating interleaved files with audio skewing, specify the number of audio frames in the file prior to the initial video frame of the AVI sequence in this field.

The remaining fields describe the playback characteristics of the stream. These factors include the playback rate (**ulScale** and **ulRate**), the starting time of the sequence (**ulStart**), the length of the sequence (**ulLength**), the size of the playback buffer (**ulSuggestedBufferSize**), an indicator of the data quality (**ulQuality**), and sample size (**ulSampleSize**). See the reference section for more information on these fields.

Some of the fields in the stream header structure are also present in the main header structure. The data in the main header structure applies to the whole file while the data in the stream header structure applies only to a stream.

A stream format 'strf' chunk must follow a stream header 'strh' chunk. The stream format chunk describes the format of the data in the stream. For video streams, the information in this chunk is a Windows BITMAPINFO structure (including palette information if appropriate). For audio streams, the information in this chunk is a Windows WAVEFORMATEX or PCMWAVEFORMAT structure.

The 'strl' chunk might also contain a stream data 'strd' chunk. If used, this chunk follows the stream format chunk. The format and content of this chunk are defined by installable decompressors. Typically, decompressors use this information for configuration. Applications that read and write RIFF files do not need to decode this information. They transfer this data to and from a decompressor as a memory block.

An AVI player associates the stream headers in the LIST 'hdrl' chunk with the stream data in the LIST 'movi' chunk by using the order of the 'strl' chunks. The first 'strl' chunk applies to stream 0, the second applies to stream 1, and so forth. For example, if the first 'strl' chunk describes the video data, the video data is contained in stream 0. Similarly, if the second 'strl' chunk describes audio data, then the audio data is contained in stream 1.

The LIST 'movi' Chunk

Following the header information is a LIST 'movi' chunk that contains chunks of the actual data in the streams; that is, the pictures and sounds themselves. The data chunks are grouped into 'rec' chunks.

Like any RIFF chunk, the data chunks contain a four-character code to identify the chunk type. The four-character code that identifies each chunk consists of the stream number and a two-character code that defines the type of information encapsulated in the chunk. For example, a waveform chunk is identified by a 'wb' two-character code. If a waveform chunk corresponded to the second LIST 'hdrl' stream description, it would have a '01wb' four-character code.
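
The naming scheme can be sketched as a small helper. This is an illustration only: the function name `avi_data_chunk_id` is hypothetical, and the sketch assumes the stream number is written as two decimal digits, which matches the '01wb' example but is not stated explicitly in the text.

```c
#include <string.h>

/*
 * Hypothetical helper: builds the four-character code for a data chunk.
 * stream is the zero-based stream number; type is a two-character code
 * such as "wb", "db", or "dc". The result is written to out as a
 * NUL-terminated string. Returns 0 on success, -1 on invalid input.
 */
int avi_data_chunk_id(unsigned stream, const char *type, char out[5])
{
    if (stream > 99 || type == NULL || strlen(type) != 2)
        return -1;  /* assumes two decimal digits for the stream number */

    out[0] = (char)('0' + stream / 10);
    out[1] = (char)('0' + stream % 10);
    out[2] = type[0];
    out[3] = type[1];
    out[4] = '\0';
    return 0;
}
```

Calling `avi_data_chunk_id(1, "wb", id)` fills `id` with "01wb", the code used in the example for a waveform chunk of the second stream.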

Since all the format information is in the header, the audio data contained in these data chunks does not contain any information about its format. An audio data chunk has the following format (the ## in the format represents the stream identifier):

WAVE Bytes  '##wb'
      BYTE  abBytes[];

Video data can be compressed or uncompressed DIBs. An uncompressed DIB has BI_RGB specified for the **biCompression** field in its associated BITMAPINFO structure. A compressed DIB has a value other than BI_RGB specified in the **biCompression** field.

A data chunk for an uncompressed DIB contains RGB video data. These chunks are identified with a two-character code of 'db' (db is an abbreviation for DIB bits). Data chunks for a compressed DIB are identified with a two-character code of 'dc' (dc is an abbreviation for DIB compressed). Neither data chunk will contain any header information about the DIBs. The data chunk for an uncompressed DIB has the following form:

DIB Bits  '##db'
    BYTE  abBits[];

The data chunk for a compressed DIB has the following form:

Compressed DIB  '##dc'
    BYTE        abBits[];

The 'idx1' Chunk

AVI files can have an index chunk after the LIST 'movi' chunk. The index chunk essentially contains a list of the data chunks and their location in the file. This provides efficient random access to the data within the file, because an application can locate a particular sound sequence or video image in an AVI file without having to scan it.

Index chunks use the 'idx1' four-character code. The following data structure is defined for index entries:

typedef struct {
    ULONG  ckid;
    ULONG  ulFlags;
    ULONG  ulChunkOffset;
    ULONG  ulChunkLength;
} AVIINDEXENTRY;

The **ckid**, **ulFlags**, **ulChunkOffset**, and **ulChunkLength** entries are repeated in the AVI file for each data chunk indexed. The index will have entries for each 'rec' chunk. The 'rec' entries should have the AVIIF_LIST flag set and the list type in the **ckid** field.

The **ckid** field identifies the data chunk. This field uses four-character codes for the identifying chunk.

The **ulFlags** field specifies any flags for the data. The AVIIF_KEYFRAME flag indicates key frames in the video sequence. Key frames do not need previous video information to be decompressed. The AVIIF_NOTIME flag indicates a chunk does not affect the timing of a video stream. The AVIIF_LIST flag indicates the current chunk is a LIST chunk. Use the **ckid** field to identify the type of LIST chunk.

The **ulChunkOffset** and **ulChunkLength** fields specify the position of the chunk and the length of the chunk. The **ulChunkOffset** field specifies the position of the chunk in the file relative to the 'movi' list. The **ulChunkLength** field specifies the length of the chunk excluding the eight bytes for the RIFF header.

If you include an index in the RIFF file, set the AVIF_HASINDEX flag in the **ulFlags** field of the AVI header. (This header is identified by the 'avih' chunk ID.) This flag indicates that the file has an index.
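
One typical use of the index is seeking: to start decoding at an arbitrary frame, a player first needs the nearest preceding key frame of the same stream. The following is a minimal sketch under stated assumptions. The function name `avi_find_key_frame` is hypothetical, `ULONG` is declared locally as a 32-bit stand-in, and the numeric value given for AVIIF_KEYFRAME is an assumption. Real code should take the structure and the flag from AVIFMT.H.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ULONG;  /* local stand-in for the OS/2 ULONG type (assumption) */

typedef struct {
    ULONG ckid;
    ULONG ulFlags;
    ULONG ulChunkOffset;
    ULONG ulChunkLength;
} AVIINDEXENTRY;

/* Assumed value for illustration; use the definition from AVIFMT.H. */
#ifndef AVIIF_KEYFRAME
#define AVIIF_KEYFRAME 0x00000010UL
#endif

/*
 * Hypothetical helper: searches backward from entry 'from' for the
 * nearest entry with the given chunk ID that is flagged as a key frame.
 * Returns the entry number, or -1 if there is none.
 */
long avi_find_key_frame(const AVIINDEXENTRY *idx, size_t count,
                        ULONG ckid, size_t from)
{
    size_t i;

    if (count == 0)
        return -1;
    if (from >= count)
        from = count - 1;

    /* i runs from 'from' down to 0 without unsigned underflow */
    for (i = from + 1; i-- > 0; ) {
        if (idx[i].ckid == ckid && (idx[i].ulFlags & AVIIF_KEYFRAME))
            return (long)i;
    }
    return -1;
}
```

The returned entry's **ulChunkOffset** and **ulChunkLength** fields then tell the player where to begin reading.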

Other Data Chunks

If you need to align data in your AVI file, you can add a 'JUNK' chunk. (This chunk is a standard RIFF type.) Applications reading these chunks will ignore their contents. Files played from CD-ROM can use these chunks to align data so that it can be read more efficiently. You might want to use this chunk to align your data on the 2 kilobyte CD-ROM boundaries. The 'JUNK' chunk has the following form:

    AVI Padding     'JUNK'
        BYTE        data[];
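
As a sketch of the alignment arithmetic, the helper below computes how many data bytes a 'JUNK' chunk needs so that whatever follows it starts on a 2 kilobyte boundary. The function name `junk_data_size` and the two constants are hypothetical. The calculation assumes the usual eight-byte RIFF chunk header (a four-character code plus a 32-bit size).

```c
#include <stdint.h>

#define CDROM_BOUNDARY 2048u  /* 2 kilobyte CD-ROM boundary */
#define RIFF_HEADER    8u     /* four-character code plus 32-bit size */

/*
 * Hypothetical helper: given the file offset at which a 'JUNK' chunk
 * would start, returns the number of data bytes the chunk must carry so
 * that the next chunk begins on a 2 KB boundary.
 */
uint32_t junk_data_size(uint32_t pos)
{
    uint32_t after_header = (pos + RIFF_HEADER) % CDROM_BOUNDARY;
    return (CDROM_BOUNDARY - after_header) % CDROM_BOUNDARY;
}
```

For a chunk starting at offset 100, the helper returns 1940, since 100 + 8 + 1940 = 2048.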

As with any other RIFF file, applications that read AVI files should ignore any non-AVI chunks that they do not recognize. Applications that read and write AVI files should preserve the non-AVI chunks when they save files they have loaded.

Interleaved Files

All AVI files produced by OS/2 multimedia are interleaved. The audio stream is divided into single frame pieces. The video and audio data for each frame are grouped into 'rec' chunks.

Platforms other than OS/2 multimedia have additional requirements for playing back from CD-ROM devices. When OS/2 multimedia creates a file with the moderate frame sizes and frame rates that these other platforms support, it performs two additional steps to accommodate the needs of these platforms:

  • Audio skewing
  • Padding

The audio data is skewed ahead of the video data by approximately 0.75 seconds. For example, a 15 frame per second movie would have 12 audio frames skewed ahead of the first video frame. The **ulInitialFrames** fields in the main and video stream headers are set to the number of skewed frames.
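
The skew arithmetic can be sketched as follows. The helper name is hypothetical, and rounding up to a whole frame is an assumption chosen because it reproduces the 12-frame figure in the example above (0.75 x 15 = 11.25). The text itself only says the skew is approximately 0.75 seconds.

```c
/*
 * Hypothetical helper: number of audio frames to skew ahead of the first
 * video frame for a 0.75 second skew, rounded up to a whole frame.
 * Computes ceil(3 * fps / 4) in integer arithmetic.
 */
unsigned audio_skew_frames(unsigned frames_per_second)
{
    return (3u * frames_per_second + 3u) / 4u;
}
```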

The 'rec' chunks are padded so that their size is a multiple of 2 kilobytes and so that the beginning of the actual data in the LIST chunk lies on a 2 kilobyte boundary.

OS/2 multimedia does not require either padding or skewing to play back from CD-ROM. However, to maintain compatibility with other platforms, skewing and padding are performed on files with moderate data rates. Movie files with data rates less than or equal to the nominal data rate of an uncompressed 15 frame per second movie with a frame size of 160 by 120 pels are skewed and padded. This allows other platforms to play back these files.

OS/2 multimedia also supports movies with nominal frame sizes of 320 by 240 pels at 15 frames per second. CD-ROM data rates do not permit the wasted bandwidth required by padding at this larger frame size. Files with these large data rate requirements are not padded.

AVI RIFF File Reference

This section lists data structures used to support AVI RIFF files. These structures are defined in AVIFMT.H. The data structures are presented in alphabetical order. The structure definition is given, followed by a description of each field.

AVIINDEXENTRY

The AVI file index consists of an array of **AVIINDEXENTRY** structures contained within an 'idx1' chunk at the end of an AVI file. This chunk follows the main LIST 'movi' chunk which contains the actual data.

typedef struct {
    ULONG   ckid;
    ULONG   ulFlags;
    ULONG   ulChunkOffset;
    ULONG   ulChunkLength;
} AVIINDEXENTRY;

The **AVIINDEXENTRY** structure has the following fields:

  • **ckid** Specifies a four-character code corresponding to the chunk ID of a data chunk in the file.
  • **ulFlags** Specifies any applicable flags. The flags in the low-order word are reserved for AVI, while those in the high-order word can be used for stream- and compressor/decompressor-specific information. The following values are currently defined:
   * AVIIF_LIST Indicates the specified chunk is a 'LIST' chunk, and the **ckid** field contains the list type of the chunk.
   * AVIIF_KEYFRAME Indicates this chunk is a key frame. Key frames do not require additional preceding chunks to be properly decoded.
   * AVIIF_FIRSTPART Indicates this chunk needs the frames following it to be used; it cannot stand alone.
   * AVIIF_LASTPART Indicates this chunk needs the frames preceding it to be used; it cannot stand alone.
   * AVIIF_NOTIME Indicates this chunk should have no effect on timing or calculating time values based on the number of chunks. For example, palette change chunks in a video stream should have this flag set, so that they are not counted as taking up a frame's worth of time.
  • **ulChunkOffset** Specifies the position in the file of the specified chunk. The position value includes the eight byte RIFF header.
  • **ulChunkLength** Specifies the length of the specified chunk. The length value does not include the eight byte RIFF header.

AVIStreamHeader

The **AVIStreamHeader** structure contains header information for a single stream of a file. It is contained within an 'strh' chunk within a LIST 'strl' chunk that is itself contained within the LIST 'hdrl' chunk at the beginning of an AVI RIFF file.

typedef struct {
    FOURCC  fccType;
    FOURCC  fccHandler;
    ULONG   ulFlags;
    ULONG   ulReserved1;
    ULONG   ulInitialFrames;
    ULONG   ulScale;
    ULONG   ulRate;
    ULONG   ulStart;
    ULONG   ulLength;
    ULONG   ulSuggestedBufferSize;
    ULONG   ulQuality;
    ULONG   ulSampleSize;
    ULONG   ulReserved[2];
} AVIStreamHeader;

The **AVIStreamHeader** structure has the following fields:

  • **fccType** Contains a four-character code which specifies the type of data contained in the stream. The following values are currently defined for AVI data:
   * 'vids' Indicates the stream contains video data. The stream format chunk contains a **BITMAPINFO** structure which can include palette information.
   * 'auds' Indicates the stream contains audio data. The stream format chunk contains a **WAVEFORMAT** or **PCMWAVEFORMAT** structure.
   * Other four-character codes can identify non-AVI data.
  • **fccHandler** Optionally, contains a four-character code that identifies a specific data handler. The data handler is the preferred handler for the stream.
  • **ulFlags** Specifies any applicable flags. The bits in the high-order word of these flags are specific to the type of data contained in the stream. The following flags are currently defined:
   * AVISF_DISABLED Indicates this stream should not be enabled by default.
   * AVISF_VIDEO_PALCHANGES Indicates this video stream contains palette changes. This flag warns the playback software that it will need to animate the palette.
  • **ulReserved1** Reserved. (Should be set to 0.)
  • **ulInitialFrames** Specifies how far audio data is skewed ahead of the video frames in interleaved files. Typically, this is about 0.75 seconds.
  • **ulScale** This field is used together with **ulRate** to specify the time scale that this stream will use. Dividing **ulRate** by **ulScale** gives the number of samples per second. For video streams, this rate should be the frame rate. For audio streams, this rate should correspond to the time needed for **nBlockAlign** bytes of audio, which for PCM audio simply reduces to the sample rate.
  • **ulRate** See **ulScale**.
  • **ulStart** This field is currently reserved and should be set to zero.
  • **ulLength** Specifies the length of this stream. The units are defined by the **ulRate** and **ulScale** fields of the stream's header.
  • **ulSuggestedBufferSize** Suggests how large a buffer should be used to read this stream. Typically, this contains a value corresponding to the largest chunk presented in the stream. Using the correct buffer size makes playback more efficient. Use zero if you do not know the correct buffer size.
  • **ulQuality** Specifies an indicator of the quality of the data in the stream. Quality is represented as a number between 0 and 10000. For compressed data, this typically represents the value of the quality parameter passed to the compression software. If set to -1, drivers use the default quality value.
  • **ulSampleSize** Specifies the size of a single sample of data. This is set to zero if the samples can vary in size. If this number is non-zero, then multiple samples of data can be grouped into a single chunk within the file. If it is zero, each sample of data (such as a video frame) must be in a separate chunk. For video streams, this number is typically zero, although it can be non-zero if all video frames are the same size. For audio streams, this number should be the same as the **nBlockAlign** field of the **WAVEFORMAT** structure describing the audio.
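
The relationship between **ulRate**, **ulScale**, and **ulLength** can be illustrated with a short sketch. The helper names are hypothetical, and `ULONG` is declared locally as a 32-bit stand-in for the OS/2 type. The sketch assumes that a stream's playing time is its length divided by its samples-per-second rate, which follows from the field descriptions above.

```c
#include <stdint.h>

typedef uint32_t ULONG;  /* local stand-in for the OS/2 ULONG type (assumption) */

/* Samples (or frames) per second: ulRate divided by ulScale. */
double avi_stream_rate(ULONG ulRate, ULONG ulScale)
{
    if (ulScale == 0)
        return 0.0;
    return (double)ulRate / (double)ulScale;
}

/* Playing time in seconds for a stream of ulLength samples. */
double avi_stream_seconds(ULONG ulLength, ULONG ulRate, ULONG ulScale)
{
    if (ulRate == 0)
        return 0.0;
    return (double)ulLength * (double)ulScale / (double)ulRate;
}
```

For example, a video stream with **ulRate** 15, **ulScale** 1, and **ulLength** 150 plays at 15 frames per second for 10 seconds.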

MainAVIHeader

The **MainAVIHeader** structure contains global information for the entire AVI file. It is contained within an 'avih' chunk within the LIST 'hdrl' chunk at the beginning of an AVI RIFF file.

typedef struct {
    ULONG   ulMicroSecPerFrame;
    ULONG   ulMaxBytesPerSec;
    ULONG   ulReserved1;
    ULONG   ulFlags;
    ULONG   ulTotalFrames;
    ULONG   ulInitialFrames;
    ULONG   ulStreams;
    ULONG   ulSuggestedBufferSize;
    ULONG   ulWidth;
    ULONG   ulHeight;
    ULONG   ulReserved[4];
} MainAVIHeader;

The **MainAVIHeader** structure has the following fields:

  • **ulMicroSecPerFrame** Specifies the number of microseconds between frames.
  • **ulMaxBytesPerSec** Specifies the approximate maximum data rate of the file.
  • **ulReserved1** Reserved. (This field should be set to 0.)
  • **ulFlags** Specifies any applicable flags. The following flags are defined:
   * AVIF_HASINDEX Indicates the AVI file has an 'idx1' chunk containing an index at the end of the file. For good performance, all AVI files should contain an index.
   * AVIF_ISINTERLEAVED Indicates the AVI file is interleaved.
  • **ulTotalFrames** Specifies the number of frames of data in the file.
  • **ulInitialFrames** Specifies the initial frame for interleaved files. Non-interleaved files should specify zero.
  • **ulStreams** Specifies the number of streams in the file. For example, a file with audio and video has two (2) streams.
  • **ulSuggestedBufferSize** Specifies the suggested buffer size for reading the file. Generally, this size should be large enough to contain the largest chunk in the file. If set to zero, or if it is too small, the playback software will have to reallocate memory during playback which will reduce performance. For an interleaved file, this buffer size should be large enough to read an entire record and not just a chunk.
  • **ulWidth** Specifies the width of the AVI file in pixels.
  • **ulHeight** Specifies the height of the AVI file in pixels.
  • **ulReserved** Reserved. These four double words should be set to zero.

Bundle File Format

The bundle (BND) format contains a series of RIFF chunks or other multimedia files. The BND file is defined as follows:

<BND-file> → RIFF('BND' <CTOC-chunk> <CGRP-chunk> )

The **<CTOC-chunk>** and **<CGRP-chunk>** formats are defined in the **OS/2 Multimedia Application Programming Guide**.

Each compound file element must be capable of standing alone as an independent file. An element cannot be a random chunk (except the RIFF chunk, indicating a RIFF file) or random binary data (unless the binary data is to be treated as a file).

Device-Independent Bitmap File Format

The device-independent bitmap (DIB) format represents bitmap images in a device-independent manner. Bitmaps can be represented at 1, 4, and 8 bits per pixel, with a palette containing colors represented in 24 bits. Bitmaps can also be represented at 24 bits per pixel without a palette, and in a run-length encoded format.
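
The relationship between bit depth and palette size can be sketched as follows. The function name is hypothetical, and the sketch covers only the bit depths named above. It returns the largest number of palette entries a bitmap of that depth can index.

```c
/*
 * Hypothetical helper: maximum number of palette entries for a DIB of
 * the given bit depth. Palettized depths (1, 4, 8) can index 2^bits
 * colors; 24-bit bitmaps carry no palette. Returns -1 for depths not
 * covered by this description.
 */
int dib_palette_entries(unsigned bits_per_pixel)
{
    switch (bits_per_pixel) {
    case 1:
    case 4:
    case 8:
        return 1 << bits_per_pixel;
    case 24:
        return 0;
    default:
        return -1;
    }
}
```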

This documentation describes three types of DIB files:

  • Windows version 3.0 device-independent bitmap files
  • OS/2 Presentation Manager version 1.2 device-independent bitmap files
  • RIFF device-independent bitmap files

The Windows 3.0 and Presentation Manager 1.2 DIBs are similar, so they are discussed together.

Overview of DIB Structure

Windows 3.0 and Presentation Manager 1.2 DIB files consist of the following sequence of data structures:

  • A file header
  • Bitmap information header
  • Color table
  • Array of bytes that defines the bitmap bits

The following sections describe each of these structures.

Bitmap File Header

The bitmap file header contains information about the type, size, and layout of a device-independent bitmap (DIB) file. In both the Windows 3.0 and Presentation Manager 1.2 DIBs, it is defined as the following BITMAPFILEHEADER data structure:

typedef struct tagBITMAPFILEHEADER {
      USHORT   bfType;
      ULONG    bfSize;
      USHORT   bfReserved1;
      USHORT   bfReserved2;
      ULONG    bfOffBits;
} BITMAPFILEHEADER;

The BITMAPFILEHEADER data structure contains the following fields.

  • **bfType** Specifies the file type. It must consist of the character sequence BM (USHORT value 0x4D42).
  • **bfSize** Specifies the file size in bytes.
  • **bfReserved1** Reserved. Must be set to 0.
  • **bfReserved2** Reserved. Must be set to 0.
  • **bfOffBits** Specifies the byte offset from the BITMAPFILEHEADER structure to the actual bitmap data in the file.
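
A minimal sketch of reading these fields from the first bytes of a file follows. It is an illustration under stated assumptions, not the documented API: the names `BmpFileHeader` and `bmp_read_file_header` are hypothetical, and the sketch assumes the on-disk header is 14 bytes with no padding and little-endian fields. It decodes the bytes one field at a time because a C compiler may insert padding after **bfType** in an in-memory structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the header fields. */
typedef struct {
    uint16_t bfType;
    uint32_t bfSize;
    uint16_t bfReserved1;
    uint16_t bfReserved2;
    uint32_t bfOffBits;
} BmpFileHeader;

#define BMP_FILE_HEADER_SIZE 14u   /* assumed packed on-disk size */
#define BMP_TYPE_BM          0x4D42u

static uint16_t bmp_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t bmp_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decodes the file header from buf. Returns 0 on success, or -1 if the
 * buffer is too short or the type field is not the character sequence BM.
 */
int bmp_read_file_header(const unsigned char *buf, size_t len,
                         BmpFileHeader *out)
{
    if (len < BMP_FILE_HEADER_SIZE)
        return -1;

    out->bfType      = bmp_le16(buf);
    out->bfSize      = bmp_le32(buf + 2);
    out->bfReserved1 = bmp_le16(buf + 6);
    out->bfReserved2 = bmp_le16(buf + 8);
    out->bfOffBits   = bmp_le32(buf + 10);

    return out->bfType == BMP_TYPE_BM ? 0 : -1;
}
```

The characters B (0x42) and M (0x4D), stored in that order, read back as the little-endian USHORT value 0x4D42 given above.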

Bitmap Information Header

The BITMAPINFO and BITMAPCOREINFO data structures define the dimensions and color information for Windows 3.0 and Presentation Manager 1.2 DIBs, respectively. These structures are defined as follows:

  • **Windows 3.0 DIB**

typedef struct tagBITMAPINFO {
     BITMAPINFOHEADER   bmiHeader;
     RGBQUAD            bmiColors[1];
} BITMAPINFO;

  • **Presentation Manager 1.2 DIB**

typedef struct BITMAPCOREINFO {
     BITMAPCOREHEADER   bmciHeader;
     RGBTRIPLE          bmciColors[1];
} BITMAPCOREINFO;

These structures are essentially alike, so this section describes both together. Each field name for the Windows BITMAPINFO structure is followed, in parentheses, by the corresponding field name for the Presentation Manager 1.2 BITMAPCOREINFO structure. The following table describes these fields.

| Windows (PM) Field | Description |
|---|---|
| **bmiHeader** (**bmciHeader**) | Specifies information about the dimensions and color format of the DIB. The BITMAPINFOHEADER and BITMAPCOREHEADER data structures are described in the next section. |
| **bmiColors** (**bmciColors**) | Specifies the DIB color table. The RGBQUAD and RGBTRIPLE data structures are described in Bitmap Color Table. |