Enabling Industrial-Strength OO Applications with SOM and CORBAservices

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation

This article discusses the central role that IBM's System Object Model (SOM) and Object Management Group's (OMG's) CORBAservices specifications play in developing robust, reusable, distributed applications.

The first part of this article focuses on a straw man process, detailing a distributed development environment's roles, responsibilities, and contracts with each other. The article then discusses the need for a standard set of object services, such as those OMG provides. The article concludes with an architectural overview showing the primary interfaces that the various roles use, implement, install, and configure. After reading this article, you should have a good idea of what you can do today to prepare for tomorrow's distributed development environments.

Object technology has yet to fully deliver on its promises of increased productivity, large-scale reuse, and easy maintenance in developing industrial-strength software applications. The delay is not due to a limitation of the technology itself, but rather to how mainstream practitioners use it. In short, few of us exploit the additional encapsulation afforded by objects. Luckily, the rising popularity of three-tier software architectures, with their dependency upon distributed object services, has forced us to start moving in the right direction.

Why Distributed Objects?

The notion of a distributed application goes beyond simple client/server separation to encompass the idea of distributed development: separating the concerns of the individuals involved in software development and their various roles, so that each can work in parallel (or incrementally/iteratively over time). The practical implication of either interpretation is the same: two things are required, a development process and a mechanism that allows separately developed (i.e., binary) objects to be integrated at runtime. IBM's System Object Model (SOM) provides the mechanism, while the Object Management Group's (OMG's) CORBAservices specifications provide a standard platform of object services around which a development process can be built.

Defining the Contracts

Figure 1 shows a straw man development process for distributed applications, showing at a high level the roles, their primary areas of expertise (i.e., concerns), and the work products that they are responsible for developing.

Figure 1. Distributed Development Roles, Responsibilities, and Work Products

Roles and Responsibilities

In Figure 1, the end-user (EU) role is included for completeness. End users are primarily concerned with their own job responsibilities, and they use distributed objects to help manage job-related tasks.

The three primary roles taken on by those responsible for software development include:

Object Providers (OPs) build new classes of objects, capture the business logic of a domain (e.g., insurance, banking, or travel) and use a standard set of object services (to be described in detail later in this article).
Service Providers (SPs) build classes of objects that implement these standard services on a particular set of platforms and technologies (such as DB2 or Lotus Notes).
Application Assemblers (AAs) create distributed application objects by installing and configuring business objects with object services to meet the operational requirements of the customer environment.

The most important idea here is that a set of standard object services serves as the basic contract between the roles. Although the straw man process proposed here is abstract enough to allow for any object services standards to be used, I will focus on the object services defined by the Object Management Group (OMG) as part of its CORBAservices architecture.¹

Object Services

For this discussion, it is useful to divide object services into the following three categories: explicit services, implicit services, and enabling services.

Explicit services are used by an OP in developing business objects, implemented by an SP, and configured by an AA. This category includes services already adopted by OMG such as naming, query, events, and properties, as well as those not yet adopted such as collections. These explicit services form the basis of the contracts between the three roles.

Implicit services are those that the OP can assume to exist when developing business objects, such as persistence, transactions, concurrency, and security. (Security has yet to be formally adopted by OMG, but was recently recommended for adoption by the OMG Technical Committee.) Although these services are only indirectly used by an OP during development, they are implemented by the SP and configured by the AA.

Figure 2. The Distributed Object Iceberg

Enabling services are a class of explicit services implemented by an OP, configured by an AA, and used by an SP to tie implicit object services to business objects in a domain-independent manner. These services, such as lifecycle and externalization, are the only ones requiring the OP to implement specific interfaces.

The Distributed Object Iceberg

An interesting way to visualize the relationship between these three categories of services is to imagine a distributed object as an iceberg (see Figure 2). A distributed object combines one or more business objects (implemented through services in the tip of the iceberg) with one or more object services (implemented through the massive, invisible part underneath the surface). Enabling services (which serve as the waterline and are visible to and implemented in part by all) are used when required to tie both domains together.

Distributed Application Framework Architecture

The previous discussion provided a conceptual background to understand the CORBAservices at a high level; however, you will need specific details (such as a class hierarchy and relevant interfaces) to really begin developing distributed applications. Figure 3 is a straw man for the rest of this article. The objects in Figure 3 show the more important CORBA-compliant interfaces they use and/or expose (for others, see the CORBAservices specifications), as well as some fundamental relationships among these objects. I'll briefly describe each.

Figure 3. CORBA-Centric Distributed Application Framework

Business Object

In general, Business Objects (BOs) will be created by the OP; however, as the framework shows, a Distributed Object (DO) is a subclass serving as the first "real" manifestation of a BO in a distributed application, using the various types of Object Services. The primary reason for this arrangement is to allow the AA to "insert" implicit services (such as identity, reference counting, persistence, and/or transactions) into a BO, transparent to the OP. In practice, whenever an OP thinks that he or she is working with a BO, he or she is actually dealing with a DO. (I'll discuss DOs and their relationship to BOs and Object Services in more detail in the next section.) The rest of this section describes the services that are associated with each BO.

duplicate() is used by "good-citizen" BOs (i.e., those that are coded by the OP according to good programming practices) when handing out references to other BOs, so that the Base Collection can reference-count the associated DO if desired.
release() is used by good-citizen BOs when they are done with a reference to other BOs, so that the Base Collection can garbage-collect the DO if desired. This should not be confused with the remove() method.
move() is used to transfer a BO's "ownership" (and identity) and associated DO from one Base Collection to another.
copy() is used to create a new BO and DO initialized from the BO called. Note: The new object's identity depends upon the Base Collection in which it is created.
remove() is used at the end of an object's logical lifecycle to completely destroy the BO and associated DO.
externalize_to_stream() is used to export the essential internal state of a BO so that it can be made persistent, passed by value, cached, moved, copied, or, in general, used by any services needing access to the BO's state.
internalize_from_stream() is used to import the essential internal state of a BO as a consequence of restoring its persistent state or otherwise initializing it from a move, copy, cache, etc.

You will notice that these are enabling services - that is, OPs will implement and use them to help move the various BOs of their domain through a stylized lifecycle (see Figure 4).
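To make the two streamable services more concrete, here is a minimal C++ sketch of a Business Object that externalizes and internalizes its essential state, with copy() expressed in terms of the pair. The StreamIO class, the Employee_BO fields, and the copy_via_stream() helper are all hypothetical stand-ins invented for illustration; the real signatures are defined by the CORBAservices specifications.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a StreamIO. Each item is stored as
// text for simplicity; a real stream would use a typed, portable encoding.
class StreamIO {
public:
    void write_string(const std::string& s) { items_.push_back(s); }
    void write_long(long v) { items_.push_back(std::to_string(v)); }
    std::string read_string() { return items_.at(pos_++); }
    long read_long() { return std::stol(items_.at(pos_++)); }

private:
    std::vector<std::string> items_;
    std::size_t pos_ = 0;
};

// Hypothetical Business Object exposing the two streamable enabling services.
class Employee_BO {
public:
    std::string name;
    long serial = 0;

    // Export the essential state in a fixed order.
    void externalize_to_stream(StreamIO& out) const {
        out.write_string(name);
        out.write_long(serial);
    }

    // Import the essential state; the reads mirror the writes exactly.
    void internalize_from_stream(StreamIO& in) {
        name = in.read_string();
        serial = in.read_long();
    }
};

// copy() sketched as an externalize followed by an internalize on a new object.
Employee_BO copy_via_stream(const Employee_BO& source) {
    StreamIO stream;
    source.externalize_to_stream(stream);
    Employee_BO fresh;
    fresh.internalize_from_stream(stream);
    return fresh;
}
```

In this sketch the same two methods would also serve move(), persistence, and caching, which is the reason the article treats them as enabling services rather than domain logic.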

Figure 4. Business Object Lifecycle

A given Distributed Object (described next) extends most of these methods to add implementation-specific behaviors that are used with various Object Services.

Distributed Object

An AA configures Distributed Objects by mixing together a Business Object provided by an OP and a Distributed Object implementation provided by an SP. For example, an OP may provide an Employee Business Object, while an SP may provide a DB2 Distributed Object. The AA may configure them into a DB2 Employee Distributed Object.

As mentioned before, a DO is used to transparently add services to a BO, especially identity, which the SP provides with a Base Collection (a kind of Object Service described in the "Base Collection" section further in this article). The additional services associated with a DO to be described here are usually implicit and not often used directly by an OP, although there are exceptions. Instead, an SP uses them in implementing Object Services, such as Reference and Named Collections (described later in the "Reference Collection" and "Named Collection" sections).

A good way to visualize this relationship is by imagining a Base Collection as a circuit board, with the DO as an intelligent socket (in that it holds identity and other states) and the BO as the chip that plugs into the socket (see Figure 5).

Figure 5. Base Collection Circuit Board

Messages first must pass through the system backplane into the circuit board and through the socket before they are propagated to the chip itself; therefore, any number of additional implementation-dependent relationships and behaviors, such as the following, can be inserted without involving the chip designer (the OP):

is_identical() is used to determine if two references refer to the same logical DO (even though they may be physically replicated).
constant_random_id is a read-only attribute used to quickly determine if references are different, because this ID can be cached with local replicas (or proxies as they are commonly known); however, having two identical values is no guarantee that the references are to the same logical DO (use is_identical() to be sure that two references are equal when two constant_random_ids are equal).
externalize_key() is used to write an external form of the key into a stream, usually during a write_object call on a StreamIO or Cursor².
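The relationship between constant_random_id and is_identical() can be sketched as a two-step comparison: a cheap check on the cached ID, followed by the authoritative check only when the IDs match. The DORef structure, its key field, and the same_object() helper are assumptions made for this illustration and are not part of any specification.

```cpp
#include <string>

// Hypothetical reference (proxy) to a Distributed Object.
struct DORef {
    unsigned long constant_random_id;  // cheap to compare, cached with each replica
    std::string key;                   // stand-in for the authoritative identity

    // Authoritative comparison; in a real system this may need a remote call.
    bool is_identical(const DORef& other) const { return key == other.key; }
};

// Different IDs prove the references differ. Equal IDs prove nothing,
// so the sketch falls back to is_identical() in that case.
bool same_object(const DORef& a, const DORef& b) {
    if (a.constant_random_id != b.constant_random_id) {
        return false;
    }
    return a.is_identical(b);
}
```

The design choice is the usual one for hash-style identifiers: the ID filters out most unequal pairs locally, and the more expensive call is reserved for the rare candidates that survive the filter.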

Object Service

The SP creates Object Services, whose main purpose in this level of the hierarchy is to serve as a common base class for the concrete types (such as base collection, reference collection, and named collection) in order to take advantage of any implementation details, yet allow for federation. In general, an Object Service's purpose is to provide access to one or more associated Distributed Objects via the following interface:

evaluate() takes a query string (of various syntaxes³) and returns a CORBA "any" type.

However, for purposes of this straw man architecture, I recommend that a Cursor (described next) be returned.

Cursor

Object Services and Cursors are meant to be developed in pairs (to take advantage of any underlying implementation for query evaluation), just as a Distributed Object is meant to be developed in conjunction with a Base Collection (so that the key can be implemented efficiently).

Regardless of what you call it, you need an object to be returned that is flexible enough to handle the wide variety of information that can be "selected" in an SQL-like query string:

Object references only (e.g., select & to make an associated Reference Collection)
A subset of data from the matching entries (e.g., select name phone for a report or view)
A copy of the matching entries (e.g., select * for an archival snapshot)

The following methods on Cursor are designed to support this flexibility:

next() is used to position the cursor at the next entry of selected data
reset() is used to position the cursor back to the first entry
more() is used to check the cursor for more entries that can be processed
read_xxx() is used to read the next data item from the entry, where xxx is an IDL data type or an object reference (the latter is read via read_object())
write_xxx() is used, when a cursor supports update or insert, to write the next data item to the entry

The StreamIO functions (read_xxx() and write_xxx()) eliminate the need for "any" processing, since the type is explicit in the function name.⁴ They also allow use of internalize_from_stream() or externalize_to_stream() with the cursor as the source of the data in the copy scenario described.

In this manner, a Cursor can be viewed as a "window" over the result data set of a query, with the Iterator functions providing sequential access to entire entries, and the StreamIO functions providing sequential access to the individual data items within the entries.
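The "window" view of a Cursor might look like the following C++ sketch, which keeps the result set in memory and supports only read_string(). The class layout and the collect_names() helper are hypothetical. One assumption to note: in this sketch reset() positions the cursor just before the first entry, so that a next() call moves onto it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory Cursor over entries made of string items.
class Cursor {
public:
    explicit Cursor(std::vector<std::vector<std::string>> rows)
        : rows_(std::move(rows)) {}

    // Iterator functions: sequential access to whole entries.
    bool more() const { return next_row_ < rows_.size(); }
    void next() { current_ = next_row_++; item_ = 0; }
    void reset() { next_row_ = 0; current_ = 0; item_ = 0; }

    // StreamIO function: sequential access to items within the current entry.
    std::string read_string() { return rows_.at(current_).at(item_++); }

private:
    std::vector<std::vector<std::string>> rows_;
    std::size_t next_row_ = 0;  // index of the entry that next() will select
    std::size_t current_ = 0;   // index of the entry being read
    std::size_t item_ = 0;      // index of the next item within the entry
};

// Walks the whole result set and reads the first item of each entry.
std::vector<std::string> collect_names(Cursor& cursor) {
    std::vector<std::string> names;
    cursor.reset();
    while (cursor.more()) {
        cursor.next();
        names.push_back(cursor.read_string());
    }
    return names;
}
```

A production cursor would add the remaining read_xxx() and write_xxx() variants and would typically fetch entries lazily from the underlying Object Service instead of holding them all in memory.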

Base Collection

An SP provides a Base Collection primarily for identity; however, it is also the means by which other implicit services (such as persistence, transactions, concurrency, and security) are added to Distributed Objects. For example, an SP who is an expert in DB2 databases may develop a DB2 Base Collection that not only provides identity (through the database table name and key fields) but also transactions, persistence, and security.

In general, an OP will select a Base Collection wherever a BO needs a "by-value" collection. For example, a Trip object in a travel reservation system needs a collection of Reservations and uses create_object() to create (or activate) an entry directly in the Base Collection using a set of name-value pairs to initialize the state and/or specify the key.

Reference Collection

The SP provides a Reference Collection when a BO needs only a collection of references to other BOs. For example, a Travel Service might only need a list of references to Reservations. The following methods are needed:

add_element() is used to add a reference to an existing object to the collection of references.
remove_element_at() is used with a Cursor to take elements out of the collection.

The ramification is that the objects maintained in a Reference Collection must already exist in a Base Collection (already described), although they might have been located through a Named Collection (described next). It also means that a given object can appear in any number of Reference Collections, or even multiple times within the same Reference Collection (which is why the remove_element_at() method needs to use a Cursor to point to the specific entry). In any event, the object references become part of the essential state of the Reference Collection itself (see the streamable functions in the previous BO description).
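The point about duplicates can be illustrated with a small C++ sketch in which the same reference is added twice and only one occurrence is removed. Everything here is a simplification for illustration: ObjectRef is a plain string standing in for an object reference, and a numeric position plays the role of the Cursor that remove_element_at() would actually receive.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using ObjectRef = std::string;  // hypothetical stand-in for an object reference

// Hypothetical Reference Collection holding references, not the objects themselves.
class ReferenceCollection {
public:
    // Adds a reference to an existing object; duplicates are allowed.
    void add_element(const ObjectRef& ref) { refs_.push_back(ref); }

    // Removes exactly one entry, identified by position, leaving duplicates alone.
    void remove_element_at(std::size_t position) {
        if (position >= refs_.size()) {
            throw std::out_of_range("no entry at this position");
        }
        refs_.erase(refs_.begin() + static_cast<std::ptrdiff_t>(position));
    }

    std::size_t size() const { return refs_.size(); }
    const ObjectRef& at(std::size_t i) const { return refs_.at(i); }

private:
    std::vector<ObjectRef> refs_;
};
```

Removing by value would be ambiguous once the same object appears more than once, which is why a positional handle is needed.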

Named Collection

The SP provides a Named Collection when a BO needs to have a uniquely qualified (named) list of references to other BOs. For example, a Trip might only support one Reservation of each type, such as Car Rentals, Airlines, Hotels, etc. Named Collections support the following methods:

bind() is used to associate a unique (within this collection), human-understandable name to an existing object and, in effect, add to the collection of named references.
resolve() is used to look up a name and return the associated object reference (if the name is found within this context).
unbind() is used to remove the name and associated object reference, leaving the object itself unaffected.
rebind() is used to associate a different object reference to a name that already exists within the context, effectively "unbinding" and "binding" in one step.

Like Reference Collections, a given object reference can appear in many Named Collections and even appear multiple times in the same one (if different names are used to bind it). Further, both the external form of the name and the externalized key of the bound objects are part of the Named Collection's essential state.
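The four naming operations map naturally onto an associative container. The following C++ sketch is an assumption-laden illustration, not the CORBAservices Naming interface itself: names are plain strings, ObjectRef is a string standing in for an object reference, and errors are reported with std::runtime_error.

```cpp
#include <map>
#include <stdexcept>
#include <string>

using ObjectRef = std::string;  // hypothetical stand-in for an object reference

// Hypothetical Named Collection: unique names mapped to object references.
class NamedCollection {
public:
    // bind: the name must not already exist within this collection.
    void bind(const std::string& name, const ObjectRef& ref) {
        if (!bindings_.insert({name, ref}).second) {
            throw std::runtime_error("already bound: " + name);
        }
    }

    // resolve: returns the bound reference, or throws if the name is unknown.
    const ObjectRef& resolve(const std::string& name) const {
        auto it = bindings_.find(name);
        if (it == bindings_.end()) {
            throw std::runtime_error("not found: " + name);
        }
        return it->second;
    }

    // unbind: drops the name and its reference; the object itself is untouched.
    void unbind(const std::string& name) { bindings_.erase(name); }

    // rebind: replaces the reference bound to a name in one step.
    void rebind(const std::string& name, const ObjectRef& ref) {
        bindings_[name] = ref;
    }

private:
    std::map<std::string, ObjectRef> bindings_;
};
```

Note that the same reference can be bound under several names, as the text describes, because uniqueness is enforced on the name and not on the reference.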

The Development Process

Now that we have defined an implementable architecture, the next step is to understand the dynamics of how to use it to develop a distributed application. Space does not permit me to take an example through the entire process; however, I will outline the major paradigm shifts.

The Role of SOM and Binary Reuse

SOM is an excellent mechanism to develop objects in the language and compiler of your choice (currently C and C++) and integrate them at runtime. If your shop does all development with a single compatible set of compilers, then this feature may not seem useful to you; however, over time and with future releases, even compilers within the same family can become incompatible. Using SOM now can help keep you from having to "recompile the world" later. Also, if your shop cannot afford (or lacks the expertise) to develop some of the more complicated object services (such as transactions and concurrency), then you will appreciate the ability to use the objects you develop with binary objects developed outside of your control.

To use SOM with this architecture, simply make SOMObject the "top" of your inheritance hierarchy by adding it to the list of Business Object "parent classes." This subtle movement away from language-centric programming represents the first paradigm shift.

The Need for IDL

The second paradigm shift you must make when developing distributed applications is to start with the interfaces to Business Objects, that is, start with CORBA Interface Definition Language (IDL). Of course, you can use a tool to help you graphically draw your Business Objects and their relationships, methods, and attributes, then automatically generate the IDL.

IDL is necessary for your local compiler to link, load, and execute the binary objects supporting the interfaces described. Usually IDL is run through an "emitter" (i.e., a code generator) to generate headers in your development language, so that your Distributed Objects can "naturally" invoke methods on objects developed by others (down the hall, across the world, or even by you from an earlier point in time). Thus, even though a Business Object or an Object Service may be developed in C, your C++ Distributed Objects can treat them as if they were implemented in C++.

Coding Enabling Services in All BOs

The third major paradigm shift you'll make concerns how you develop your Business Objects (even if they are Distributed Objects, Object Services, or Cursors). You should override the enabling services associated with BOs in the previous section:

duplicate() is usually not overridden in a BO, while DOs that reference-count will often increment the count prior to calling the parent BO.
release() is implemented in the BO by propagating a release() to all other BOs to which it maintains references. A DO that reference-counts will often decrement the count and only propagate the call to the parent BO when the count reaches zero.
move() is often implemented by default in the BO to externalize itself to a stream, use the stream to initialize a newly created DO in the Base Collection, then remove itself. Some DOs will keep a forwarding reference to the new entry for a period of time.
copy() is similar to move, except that the DO need not worry about forwarding, because the state of the object does not change.
remove() is implemented in the BO by propagating a remove() to all "sub-objects" (i.e., those owned by this BO), and a release() to all others. A DO will often set a state indicating that the object has been deleted, so that the key does not get reused.
externalize_to_stream() is implemented in a BO by writing to the stream passed in all the non-derivable internal states in a form that is not likely to change.

Primitive attributes are written using their corresponding IDL type (float, long, string); object references are written using write_object(); sub-objects are written recursively using externalize_to_stream(); and sequences are written sequentially with a write_end() after the last entry (although, for maximum configurability by an AA, a reference to one of the collections described above should be written instead). DOs need to externalize, according to these rules, only the state that isn't already maintained in the Base Collection's underlying datastore (the externalize_key() method follows these rules as well).

internalize_from_stream() is implemented the same way as externalize_to_stream(), except that the reads must exactly match the writes (and a read_end() should be checked prior to reading data entries in the sequence).
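The division of labor between a BO and a reference-counting DO for duplicate() and release() can be sketched in C++ as a subclass that intercepts both calls. The class names, the starting count of one (representing the creator's reference), and the releases_seen counter used to observe propagation are assumptions made for this illustration.

```cpp
// Hypothetical Business Object base with the two reference-management services.
class BusinessObject {
public:
    virtual ~BusinessObject() = default;

    // Usually not overridden in a BO.
    virtual void duplicate() {}

    // A real BO would propagate release() to the BOs it references;
    // this sketch only records that the call arrived.
    virtual void release() { ++releases_seen; }

    int releases_seen = 0;
};

// Hypothetical Distributed Object that adds reference counting in front of the BO.
class CountingDO : public BusinessObject {
public:
    // Increment the count, then call the parent BO.
    void duplicate() override {
        ++count_;
        BusinessObject::duplicate();
    }

    // Decrement the count; propagate to the parent BO only when it reaches zero.
    void release() override {
        if (count_ > 0 && --count_ == 0) {
            BusinessObject::release();
        }
    }

    int count() const { return count_; }

private:
    int count_ = 1;  // assumption: the creator holds the first reference
};
```

Because the counting lives entirely in the DO, the OP's BO code stays unchanged whether or not the AA chooses a reference-counting configuration.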

Providing an Essential Data Schema

The fourth paradigm shift is that you should provide an attribute-only "map" (or schema) of the data written during externalization. This map, which is called the essential data schema (EDS), enables the AA to transform the stream data into other forms such as existing database schemas. If this data is not provided, then the AA is limited in the following ways:

Persistence mechanisms must support "opaque" fields (uninterpreted strings of binary data).
All queries are required to activate each object during evaluation.
No indexing on given fields can be supported.
User-interface panels must get their data from active objects.

Unfortunately, IDL does not support a "schema" section, so a naming convention is required. I recommend that the IDL for a given BO have _BO appended, with the associated essential data schema having _EDS appended. For example, the two IDL interface definitions provided for an Employee class should be Employee_BO and Employee_EDS.

For a DO, a convention such as _DO and _KDS (for key data schema) should be followed so that the AA can use the schema of the key to map to underlying databases and user interfaces.

OPs and SPs who want to maintain some proprietary data while exposing other, more open fields can lump all the proprietary data together into one attribute (such as other_data), or simply give the individual fields meaningless attribute names that are not explained in the user documentation. The former allows for more opacity (since the types are not known), and the latter for more efficiency (because there is no need to first buffer the fields, which can still be indexed, queried, and so on).

Coding Domain-Specific Methods Using a Factory-Centric Programming Model

The basic paradigm shift here is that, by using the explicit interfaces of object services and enabling interfaces of other business objects, your object will not only be a good citizen in a distributed environment, but it will also be far more configurable than if it were to directly implement dependencies upon specific implicit services.

Beyond using these services, you should no longer use the create operation of your favorite development language to create new business objects. Instead, you must adopt a factory-centric programming model using the following steps:

Find a "factory" (Base Collection) that can create/activate the type of BO you require.
Send this factory the create_object() method as described in the Base Collection section.
Use and/or store this reference as part of your BO's essential data.

Specifically, each server process will have access to a Named Collection called the Distinguished Name Context (DNC). It will have various Base Collections bound to it by the AA. Therefore, the steps above can be as simple as writing the few lines of code shown in Figure 6.

empF = (BaseCollection *)DNC->resolve("Employee");  // find the factory bound by the AA
emp = (Employee_DO *)empF->create_object();         // create (or activate) an entry
emp->name = "John Jones";                           // set state through the DO

Figure 6. Factory-Centric Sample⁵

By using this factory-centric programming model, the AA can insert the implicit services that you as a BO programmer need not worry about. This does not mean that you cannot use a native create operator to manage sub-objects that you do not want the AA to know about; however, you would then be responsible for maintaining their integrity (usually by simply externalizing their essential state as part of your own).
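For readers who want something they can compile, the three factory-centric steps can be expanded into the following self-contained C++ sketch. All of the types here (Employee_DO, BaseCollection, NameContext) and the hire() helper are hypothetical simplifications, and the key format "emp:N" is invented; in a SOM environment the corresponding classes would come from emitted IDL headers.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Distributed Object: identity (key) plus business state.
struct Employee_DO {
    std::string key;
    std::string name;
};

// Hypothetical Base Collection acting as factory and owner of its entries.
class BaseCollection {
public:
    // Creates an entry initialized from name-value pairs and assigns its key.
    Employee_DO* create_object(const std::map<std::string, std::string>& init) {
        std::unique_ptr<Employee_DO> entry(new Employee_DO());
        entry->key = "emp:" + std::to_string(entries_.size() + 1);
        auto it = init.find("name");
        if (it != init.end()) {
            entry->name = it->second;
        }
        entries_.push_back(std::move(entry));
        return entries_.back().get();
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<std::unique_ptr<Employee_DO>> entries_;
};

// Hypothetical Distinguished Name Context: factories bound to names by the AA.
class NameContext {
public:
    void bind(const std::string& name, BaseCollection* factory) {
        bindings_[name] = factory;
    }
    BaseCollection* resolve(const std::string& name) const {
        auto it = bindings_.find(name);
        return it == bindings_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, BaseCollection*> bindings_;
};

// OP code: find the factory, ask it to create the object, keep the reference.
Employee_DO* hire(const NameContext& dnc, const std::string& name) {
    BaseCollection* empF = dnc.resolve("Employee");
    if (empF == nullptr) {
        return nullptr;  // the AA has not bound an Employee factory
    }
    std::map<std::string, std::string> init;
    init["name"] = name;
    return empF->create_object(init);
}
```

The OP-side code in hire() never names a concrete storage technology, so the AA remains free to bind "Employee" to any Base Collection implementation.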

Once your BO references another BO (with associated DO), its programming model described in the architecture section applies as-is. It is not repeated here.

Indicating Dependencies

The final paradigm shift is the way that BO dependencies are communicated to the AA in the absence of a formal IDL section to show dependencies between classes. As in the previous subsection, the OP determines the name and the expected class managed by the Base Collection that is to be bound into the DNC by the AA.

Again, I recommend using a naming convention and a special dependencies file (.dep). This file contains a list of names and classes and, if the class is one of the Object Services types such as Base Collection, Reference Collection, or Named Collection, it contains the class of entry maintained, referred to, or bound (see Figure 7 for an example).

TRS::Member::planned_trips BaseCollection TRS::Trip
TRS::Member::completed_trips BaseCollection TRS::Trip
TRS::Member::profiles ReferenceCollection TRS::Profile
TRS::Trip::reservations NamedCollection TRS::Reservation
TRS::Reservation::trip TRS::Trip

Figure 7. Example of Contents of the TRS.dep File Indicating Dependencies
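An AA-side tool could read a file like the one in Figure 7 with a few lines of C++. The sketch below assumes that fields are separated by whitespace, that a three-field line names a collection class followed by its entry class, and that a two-field line names a plain object reference. The article does not define the file grammar, so the Dependency structure and parse_dependencies() are illustrative guesses.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One line of a hypothetical .dep file.
struct Dependency {
    std::string name;         // e.g. TRS::Member::planned_trips
    std::string service;      // collection class; empty for a plain reference
    std::string entry_class;  // class maintained, referred to, or bound
};

// Parses lines of the form "name service entry_class" or "name class".
std::vector<Dependency> parse_dependencies(const std::string& text) {
    std::vector<Dependency> deps;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first, second, third;
        if (!(fields >> first >> second)) {
            continue;  // skip blank or malformed lines
        }
        Dependency dep;
        dep.name = first;
        if (fields >> third) {
            dep.service = second;
            dep.entry_class = third;
        } else {
            dep.entry_class = second;
        }
        deps.push_back(dep);
    }
    return deps;
}
```

With the parsed list in hand, an AA could check that every name an OP depends on has been bound into the DNC to a collection whose entry class matches.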

You should note that many BOs will maintain references to entire Base Collections, Reference Collections, or Named Collections. Not only does this reduce the amount of code that the OP has to write (to query, insert, delete, and externalize/internalize), but it also permits the AA a high degree of configurability, since an AA can bind the same collections into the DNC under many names (as long as the returned class is the same).

For example, the AA could bind the factory for both planned and completed trips to the same Base Collection, or two different ones with radically different qualities of service (one stored in flat files, another in a DB2 database) without change to the Member or Base Collection class.

More About Services

Hopefully, this overview helps to illustrate the kind of flexibility a distributed application development environment offers. Following are some important points to remember about the services discussed:

The word "implicit" in object services refers only to their use by the OP when developing business objects, not to whether the service is visible to an SP or AA. For this reason, services such as user interface and persistence are considered to be implicit.
Object services are considered a subclass of Business Object, to allow the AA to flexibly manage them without architecting how their keys should be implemented. In general, all Business Objects should be written to be independent of identity so that they can be used in many different contexts. This philosophy extends to Object Services and Cursors (although the latter are often implemented without identity).
There are many other implementation choices, even within a single service such as security (e.g., two- versus three-party authentication, public versus private key encryption, capability versus ACL-based authorization). Therefore, object services should be thought of as relatively open-ended categories.
The list of services, especially the implicit ones, will continue to grow as technology evolves. Therefore, you need to consider such non-standard services as Trace and Debug, which are sure to become widely available as part of a distributed application development framework.
There is no absolute requirement that the SP implement these services using object technology, nor that these services must expose "standard" interfaces beyond those described. This point is made not only to reinforce the idea that objects are encapsulated, but also to urge you not to wait for an outside company to provide these object services for you.

What Can You Do Now?

Without waiting for another company to provide you with a set of object services, what can you do today to enable robust, reusable distributed applications?

Follow the cookbook described here to separate the concerns of OPs, SPs, and AAs for code you develop.
Expect those who deliver code to you to recognize these separate concerns.
Look for code in your shop that provides function similar to the object services described above, and "wrapper" it (i.e., build object layers around it) using SOM to support the interfaces defined.

In the end, you will find that having a well-defined, distributed architecture will save you development time and effort, even if you must code everything yourself, because the work of defining the interfaces has already been done for you by OMG, and the mechanism for making binary objects "distributable" is already provided by IBM's SOM technology.

For more details about the exact signatures and other methods associated with the CORBAservices specification, so that you can extend this straw man architecture or develop your own, see Object Management Group's CORBAservices: Common Object Services Specification.

Acknowledgments

I want to especially thank the following people for their efforts during the past year to help me understand the role of object services in a robust, reusable distributed application: George Copeland, Randy Fox, Jim Sides, Rob High, Don Ferguson, Jim Rayfield, Eric Herness, Charlie Redlin, Frank Malin, Rimas Rekasius, and Tony Wells.