SGML and OpenDoc - Bento

From EDM2

By Matt Timmermans

As requested, I've appended the Bento design overview here:

Other official Bento (and OpenDoc) information can be found at ftp.cil.org:

  • /pub/cilabs/tech/bento/Design-Overview.txt - [Bento Design Overview]
  • /pub/cilabs/tech/bento/Bento-brief.txt - brief description of Bento [Bento Brief]
  • /pub/cilabs/tech/bento/Bento-Spec/postscript/*.ps - complete Bento specification

Since Bento is only a small part of OpenDoc, and since OpenDoc as a whole is not meant to solve the same problem as SGML, all of the Bento documentation is long on technicalities and short on philosophy. It is enough to remember that on the technical (rather than political) side of things, Bento and SGML have effectively identical mandates.

Also, you won't find the rigour here that you'd get from ISO.

Bento Design Overview

This file provides an overview of the Bento design. It describes the design more from the API perspective than the format perspective. However, most of the concepts also apply to the format level. In some respects the format is simpler than the API, but it is difficult to understand without first understanding the API functionality it is intended to support.

Bento Entities

The easiest way to begin understanding the Bento design is probably to review the entities that the API manipulates.

Primary Entities

The most important entities in the Bento design are containers, objects, properties, values, and types.

Every object is in some container. An object consists of a set of properties. The properties are not in any particular order. Each property consists of a sequence of values, indexed from 1 to n. Every object must have at least one property, and that property must have at least one value. Each value has a type; several values of the same property may have the same type. The type of a value is unrelated to its index. Each value consists of a variable length sequence of bytes.
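
The relationships above can be sketched as a small data model. This is only an illustration, not the Bento API: the class names, the integer IDs, and the is_valid check are all hypothetical, chosen to mirror the rules stated in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Value:
    type_id: int   # reference to a type description (hypothetical integer ID)
    data: bytes    # variable-length sequence of bytes


@dataclass
class BentoObject:
    object_id: int
    # property ID -> sequence of values; the properties themselves are unordered
    properties: Dict[int, List[Value]] = field(default_factory=dict)

    def add_value(self, property_id: int, value: Value) -> int:
        """Append a value to a property and return its 1-based index."""
        values = self.properties.setdefault(property_id, [])
        values.append(value)
        return len(values)

    def get_value(self, property_id: int, index: int) -> Value:
        """Fetch a value by its 1-based index within a property."""
        if index < 1:
            raise IndexError("value indexes start at 1")
        return self.properties[property_id][index - 1]

    def is_valid(self) -> bool:
        """At least one property, and every property has at least one value."""
        return bool(self.properties) and all(self.properties.values())


@dataclass
class Container:
    # every object lives in some container
    objects: Dict[int, BentoObject] = field(default_factory=dict)


# Hypothetical IDs for a "name" property and an "ASCII string" type.
NAME, ASCII_STRING = 100, 200

container = Container()
obj = BentoObject(object_id=1)
container.objects[obj.object_id] = obj

# Two values of the same property may share a type.
first = obj.add_value(NAME, Value(ASCII_STRING, b"report"))
second = obj.add_value(NAME, Value(ASCII_STRING, b"Report 1994"))
```

Note that the type is carried by each value, not by the property, so the index of a value says nothing about its type.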

Now let us look at these primary entities in more detail.

Containers

All Bento objects are stored in containers. Bento knows very little about a container beyond the objects in it. However, the container itself is an object, and can have properties, so applications can specify further information about the container if they wish.

Containers are often files, but they can also be many other forms of storage. For example, we are already planning to support the following types of containers: blocks of memory, the clipboard, network messages, and Bento values. Undoubtedly other types of containers will be useful as well.

Objects

Each Bento object has a persistent ID which is unique within its container. Other than that, objects don't really exist independent of their properties. An object contains no information beyond what is stored in its properties.

Properties

A property defines a role for a value. Properties are like field names in a record, except they can be added freely to an object, and their names are globally unique, so that applications can understand them. Properties are distinct from types.

For example, a string might be used for the name of an object, the author of the object, a comment, etc. These different uses would be indicated by different properties.

Conversely, the string might be in ASCII, Unicode, or some other international string representation. These different formats would not be indicated by the property, but by the type (see below).

Values

Values are where the data is actually stored. The data for a value can be stored anywhere in a container. In fact, it can be broken up into any number of separate pieces, and the pieces can be stored anywhere. (See the discussion of continued values below.)

Values may range in size from 0 bytes to 2^32 bytes (if you have that much storage). Bento is optimized for 'large' values, such as streams of formatted text, graphics metafiles, etc.

Types

The type of a value describes the format of that value. Types record the structure of a value, whether it is compressed, what its byte ordering is, etc.

To continue the example above, the type of a string value would indicate the alphabet, whether it was null terminated, and possibly other information (such as the intended language). It might also indicate that the string was stored in a compressed form, and would indicate the compression technique, and the dictionary if one was required. If the string used multi-byte characters, and the byte-ordering was not defined by the alphabet, the type would indicate the byte-ordering within the characters.
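
To make the split between property and type concrete, here is a minimal sketch in which the property says what a string is for and the type says how its bytes are to be read. The StringType record and its fields are assumptions made up for this example; real Bento type descriptions are objects in the container and are not laid out like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Hypothetical type description for string values."""
    encoding: str                  # "ascii" or "utf-16" in this sketch
    null_terminated: bool = False
    byte_order: str = ""           # "le" or "be"; only used for multi-byte encodings


def decode(raw: bytes, string_type: StringType) -> str:
    """Interpret the raw bytes of a value according to its type."""
    codec = string_type.encoding
    if codec == "utf-16" and string_type.byte_order:
        codec = "utf-16-" + string_type.byte_order
    text = raw.decode(codec)
    if string_type.null_terminated:
        text = text.split("\0", 1)[0]
    return text


ASCII_Z = StringType("ascii", null_terminated=True)
UTF16_BE = StringType("utf-16", byte_order="be")

# The same role (say, an "author" property) stored in two different formats.
author_ascii = decode(b"Matt\x00", ASCII_Z)
author_utf16 = decode("Matt".encode("utf-16-be"), UTF16_BE)
```

Both calls yield the same string; only the type, not the property, had to change.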

Secondary Entities

There are several additional entities that play supporting roles in the Bento design. These entities are important to fully understand how Bento works, but they do not significantly change the picture given above.

Type and property descriptions

The property associated with a value is a reference to a property description. Similarly, the type is a reference to a type description. These type and property descriptions are objects, and their IDs are drawn from the same name-space as other object IDs.

Many type and property descriptions will simply consist of the globally unique name of the type or property. To continue the example above further, the type of a string of 7-bit ASCII, not compressed or otherwise transformed, would simply be described by a globally unique name. This would allow applications to recognize the type.

References to type and property descriptions are distinct from references to ordinary objects in the API to allow language type checking to catch errors in the manipulation of type and property references. However, type and property references can still be passed to the object and value operations, so that value manipulation can be done on types and properties as well as normal objects.

Globally unique names

Globally unique names are simply strings that follow certain conventions. They begin with a registered naming authority, and have additional segments, each of which is unique in the context of the previous segments.
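
As a rough illustration of this structure, the sketch below splits a name into its naming authority and the segments that follow, and refuses to register the same full name twice. The ":" separator, the example names, and the NameRegistry class are assumptions made for this sketch; the actual naming conventions are defined in the Bento specification.

```python
from typing import List, Set, Tuple

SEPARATOR = ":"  # assumption for this sketch only


def split_name(name: str) -> Tuple[str, List[str]]:
    """Return (naming authority, remaining segments) for a unique name."""
    segments = name.split(SEPARATOR)
    if len(segments) < 2 or any(not s for s in segments):
        raise ValueError("expected an authority followed by non-empty segments")
    return segments[0], segments[1:]


class NameRegistry:
    """Hypothetical registry that rejects duplicate names."""

    def __init__(self) -> None:
        self._names: Set[str] = set()

    def register(self, name: str) -> None:
        split_name(name)  # validate the shape first
        if name in self._names:
            raise ValueError("name already registered: " + name)
        self._names.add(name)


authority, rest = split_name("ExampleCorp:Types:AsciiString")

registry = NameRegistry()
registry.register("ExampleCorp:Types:AsciiString")
registry.register("ExampleCorp:Properties:Author")
```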

The most common globally unique names will be generated by system vendors or commercial application developers, and may be registered. However, many names will be generated by local developers to record their local types and properties. To meet this need, the naming rules allow for local creation of unregistered unique names.

IDs and accessors

Each object is assigned a persistent ID that is unique within the container in which the object is created. These IDs are never reused once they have been assigned, so even if an object is deleted, its ID will never be reassigned.
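
One simple way to get this behaviour is a counter that only ever moves forward. The sketch below is an assumption about how such an allocator could work, not a description of the Bento implementation; in a real container the counter would also have to be stored persistently.

```python
from typing import Set


class IdAllocator:
    """Hypothetical per-container allocator whose IDs are never reused."""

    def __init__(self, first_id: int = 1) -> None:
        self._next_id = first_id
        self._live: Set[int] = set()

    def allocate(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        self._live.add(new_id)
        return new_id

    def delete(self, object_id: int) -> None:
        # The ID leaves the live set, but the counter is not rolled back.
        self._live.remove(object_id)

    def is_live(self, object_id: int) -> bool:
        return object_id in self._live


ids = IdAllocator()
a = ids.allocate()
b = ids.allocate()
ids.delete(b)
c = ids.allocate()  # gets a fresh ID rather than b's old one
```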

In the API, types, properties, and objects can be referred to using their IDs, but for convenience, they are usually referred to using accessors provided by the API. Since IDs are only unique within a container, they must always be used with an explicit container, while the accessors include an implicit container reference.

Accessors are also used to refer to containers and values. Accessors are only unique within a given session, so they cannot be stored in values as references to other values. IDs must always be used for persistent references.

Dynamic values

Bento needs to support external references from one container to another, or to other entities such as files, etc. It does this through dynamic values. These are values whose types indicate that they contain a description of the real value, rather than the actual data.

Except for the indirection indicated by their types, dynamic values are created and stored exactly like normal values. However, when they are accessed, a handler is called to resolve the description to an actual value.
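
The resolution step might look roughly like the following sketch. Everything here is hypothetical: the type name, the handler table, and the external_sources dictionary (which stands in for other containers or files) are illustrative only and do not reflect the real Bento handler interface.

```python
from typing import Callable, Dict

# Stand-in for storage outside the container (other containers, files, ...).
external_sources: Dict[str, bytes] = {
    "other-container/object-7": b"the real bytes",
}


def resolve_external(description: bytes) -> bytes:
    """Hypothetical handler: the stored bytes name where the real data lives."""
    return external_sources[description.decode("ascii")]


# Types whose values hold a description rather than the data itself.
dynamic_type_handlers: Dict[str, Callable[[bytes], bytes]] = {
    "example:ExternalReference": resolve_external,
}


def read_value(type_name: str, stored: bytes) -> bytes:
    """Return the stored bytes, resolving them first if the type asks for it."""
    handler = dynamic_type_handlers.get(type_name)
    if handler is None:
        return stored          # ordinary value: the bytes are the data
    return handler(stored)     # the handler turns the description into data


plain = read_value("example:AsciiString", b"inline data")
resolved = read_value("example:ExternalReference", b"other-container/object-7")
```

The caller uses the same read operation in both cases; only the type decides whether a handler gets involved.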

Value segments

To support interleaving and other uses that require breaking a value up into pieces, Bento allows a value to consist of multiple segments stored at different locations in the container. These segments are not visible at the API, which glues them together to create a single stream of bytes.
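
The gluing can be pictured as follows. The Segment record and read_value function are hypothetical names for this sketch, and the container is reduced to a flat byte string; the real format records segments differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    offset: int   # where this piece starts in the container
    length: int   # how many bytes it covers


def read_value(container: bytes, segments: List[Segment]) -> bytes:
    """Concatenate the segments, in order, into one stream of bytes."""
    return b"".join(container[s.offset:s.offset + s.length] for s in segments)


# The second half of the value is stored first; "...." is unrelated data.
container_bytes = b"WORLD....HELLO, "
value_segments = [Segment(offset=9, length=7), Segment(offset=0, length=5)]

value = read_value(container_bytes, value_segments)
```

The caller sees only the joined bytes and never learns where the pieces were stored.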

Handlers

Handlers are pieces of code called by the Bento library to do specific jobs, but not part of the Bento library as such. Functions are put into handlers rather than the library to make the library more portable, and also to provide a standard way to extend the library.

Handlers come in two main forms: container handlers and value handlers. In addition, the API uses special handlers for reporting errors and allocating and deallocating memory.

Actual I/O to containers is always done using container handlers, to provide platform independence. Container handlers provide stream I/O, plus a few special interfaces for reading and writing specific parts of the container format.

The many different types of containers mentioned in the first section are not actually implemented in the Bento library. Instead, the library simply calls different types of handlers, all of which provide the same interface. These handlers map I/O to the underlying storage in a way that depends on the container type.
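
A minimal sketch of the idea: two handlers with the same interface, one backed by memory and one by a file, and a library-side function that works with either. The class names, the read/write/close methods, and store_and_reload are assumptions for illustration; the real container handler interface is defined by the Bento specification.

```python
import io
import os


class MemoryHandler:
    """Hypothetical handler for a container held in a block of memory."""

    def __init__(self, initial: bytes = b"") -> None:
        self._stream = io.BytesIO(initial)

    def read(self, offset: int, length: int) -> bytes:
        self._stream.seek(offset)
        return self._stream.read(length)

    def write(self, offset: int, data: bytes) -> None:
        self._stream.seek(offset)
        self._stream.write(data)

    def close(self) -> None:
        self._stream.close()


class FileHandler:
    """Hypothetical handler for a container stored in a file."""

    def __init__(self, path: str) -> None:
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._stream = open(path, mode)

    def read(self, offset: int, length: int) -> bytes:
        self._stream.seek(offset)
        return self._stream.read(length)

    def write(self, offset: int, data: bytes) -> None:
        self._stream.seek(offset)
        self._stream.write(data)

    def close(self) -> None:
        self._stream.close()


def store_and_reload(handler, offset: int, data: bytes) -> bytes:
    """Library-side code: it only calls the handler, whatever the storage is."""
    handler.write(offset, data)
    return handler.read(offset, len(data))


memory = MemoryHandler()
echoed = store_and_reload(memory, 4, b"bento")
```

Supporting a new kind of container then means writing another handler with the same methods, not changing store_and_reload.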

Value handlers are only required for values that require special support for access. For example, a value that is compressed on writing and decompressed on reading would need a special handler. Value handlers have the option of providing specialized operations to manipulate the value, either instead of or in addition to the standard value operations.
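
The compression example can be sketched like this, using zlib purely as a stand-in for whatever compression technique a type might name. The CompressedValueHandler class and its on_write/on_read methods are hypothetical and are not part of the Bento API.

```python
import zlib


class CompressedValueHandler:
    """Hypothetical value handler: compress on write, decompress on read."""

    def on_write(self, data: bytes) -> bytes:
        # Bytes as they will be stored in the container.
        return zlib.compress(data)

    def on_read(self, stored: bytes) -> bytes:
        # Bytes as the application sees them.
        return zlib.decompress(stored)


handler = CompressedValueHandler()

original = b"formatted text " * 1000
stored = handler.on_write(original)
restored = handler.on_read(stored)
```

The application reads and writes the uncompressed bytes, while the container holds only the compressed form.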