OS/2 Routes - Shared Nothing
Written by Tom Brown
Welcome to OS/2 Routes. The importance of networking continues to increase in the information technology sector, to the point that now even word processors incorporate special network features. This entry- and intermediate-level column is dedicated to network-related issues, but networking experts are unlikely to find any revelations on these pages. I will try to make the content and scope clear in the Topic section of each column so that knowledgeable readers' time is not wasted.
This column discusses IBM's Shared Nothing multi-processing architecture. Shared Nothing is a technology that IBM is counting on to take us into the future. I will share what I know about this very compelling technology, as well as engage in some outright speculation regarding the future implementation of this form of parallel processing.
Shared Nothing technology allows multi-processing across multiple separate machines using an external bus (the network). This is more than just a frivolous idea in an engineer's head. DB2 Parallel Edition has made use of this technology for roughly a year now, but a generic implementation of Shared Nothing multi-processing is still on the horizon. I have followed this technology with great interest and in this column, I will share what I have learned with you.
We're all familiar with Symmetric Multi-Processing (SMP). Multiple CPUs, which are typically identical and at the very least have identical external clock speeds, share memory and peripherals in a single box. The efficiency of this system decreases as the number of CPUs is increased. As I understand it, Intel's APIC architecture SMP boxes require that other processing cease during the processing of an interrupt. Since an increased number of CPUs is very likely to cause increased I/O activity to feed those CPUs, this architecture will hit a capacity wall where adding CPUs no longer increases performance. Even if multi-processing could continue during interrupt service, memory subsystem saturation will result in diminishing performance returns as the number of CPUs increases. The appeal of this system is that it is quickly dropping in cost, and it provides a cost-effective way to increase the cycles available to applications which are able to take advantage of them.
Shared peripheral multi-processing architecture provides each CPU with its own block of memory. Each CPU/Memory module shares a common bus which connects the peripherals. This architecture reduces the problem of memory subsystem saturation, but not the peripheral level bottleneck. The best application for this technology is on systems with very high cycle demands that also have very low I/O demands.
The Shared Nothing environment shares an external bus, that being the network, but does not require specialized hardware. Communication through the bus takes place using a networking protocol. Using duplicate network cards in each machine, all single points of failure can be eliminated. The capacity and availability improvements using this type of architecture will be tremendous. Not every system will lend itself to performing well with this architecture, however. It will be up to each application designer to provide improved availability, improved capacity, or both.
So far, this technology is not mature. The DB2 implementation is application specific and parallel processing is controlled by DB2 itself. Apparently, IBM is adding Shared Nothing multi-processing extensions to the O/S subsystems for an upcoming WARP Server PE product. At this point, there is little more information than reports of a very early design of WARP Server PE being demonstrated at COMDEX. Expect this system to be designed more for availability than capacity.
Communication between parallel nodes will likely require, or at the very least perform better with, a networking subsystem that provides expedited transport. This leaves APPN and/or ATM as the likely options. These protocols will add further value to the architecture through IBM's High Performance Routing (HPR) extensions: APPN-HPR and ATM-HPR can route around failures as well as prioritize bandwidth allocation. A good guess would be that IBM will use their AnyNet technology to allow inter-process communication to run over either APPN or ATM, perhaps with at least some form of IP support. DB2 currently makes use of APPN.
The use of a more sophisticated protocol for high performance communication between machines does not mean that all applications will have to be rewritten for these new protocols in order to take advantage of this new architecture. It is likely that current protocols will continue to flourish. Cross-machine IPC will simply travel with a higher priority than NetBIOS or IP traffic, which will likely continue to be the protocols of choice. In other words, if you open a cross-machine semaphore, the system will create a high priority network packet (invisible to you), which can be routed around connection failures or lower priority packet congestion. The application could well be a high availability NNTP or NetBIOS server.
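To make the idea concrete, here is a rough sketch in Python of how such a cross-machine semaphore might look from the application's side. Everything here is invented for illustration (the class name, the priority values, and the queue standing in for the network); a real implementation would carry these packets over APPN or ATM with HPR rather than a local priority queue.

```python
import threading
import queue

# Hypothetical sketch: a cross-machine semaphore whose operations ride
# the shared network bus as high-priority packets.  The "network" is
# modeled here as a local priority queue; lower number = expedited.
HIGH_PRIORITY = 0    # cross-machine IPC rides here
LOW_PRIORITY = 10    # ordinary NetBIOS/IP traffic rides here

network_bus = queue.PriorityQueue()

class CrossMachineSemaphore:
    """Illustrative only: the caller sees a normal semaphore API,
    while each operation also emits a prioritized packet."""

    def __init__(self, name):
        self.name = name
        self._local = threading.Semaphore(1)

    def acquire(self):
        # The packet the system would create, invisible to the caller.
        network_bus.put((HIGH_PRIORITY, ("acquire", self.name)))
        self._local.acquire()

    def release(self):
        network_bus.put((HIGH_PRIORITY, ("release", self.name)))
        self._local.release()

# Ordinary application traffic shares the same bus at lower priority.
network_bus.put((LOW_PRIORITY, ("netbios", "file-read")))

sem = CrossMachineSemaphore("db-lock")
sem.acquire()
sem.release()

# Expedited semaphore packets drain ahead of the NetBIOS traffic.
first_packet = network_bus.get()[1]
```

The point of the sketch is the last line: even though the NetBIOS packet was queued first, the semaphore traffic is serviced ahead of it, which is exactly the behavior expedited transport would provide.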
DB2 PE is able to take advantage of all three multi-processing architectures described above. I don't know if the new multi-processing extensions will include support for all three architectures in a generic fashion. It is likely that IBM doesn't know at this time, either.
Does This Mean That a PC Can Replace a Mainframe?
The answer to this question remains the same. No. Could a group of PCs replace a mainframe? Perhaps for some tasks. Parallel processing could provide the capacity and availability required for inexpensive systems to take over some tasks which are currently done best by a single, highly available platform. With a well designed system, availability could actually be higher using parallel processing than using a mainframe. Of course, there are some things which are done extremely well and cost effectively on a mainframe, such as galaxy-scale transaction processing. These applications are not likely candidates for early porting to a closet full of PCs. Still, the capacity and availability of Shared Nothing multi-processing will likely make it a technology that will be increasingly difficult to ignore.
Application Design Speculation
Let's assume that an API is developed which will allow a remote thread to be spawned. We can take an educated guess as to some of the characteristics of this remote thread. The speed of the spawned thread will not be negatively impacted by the fact that it is running on a different machine; it will simply run at the speed of the remote machine. Inter-process communication with the thread, however, will carry the added processing overhead of a network transaction. A network transaction means a context switch into the networking process, which negates much of the charm of using multiple threads rather than multiple processes. All remote IPC will take on the latency associated with the network.
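Continuing the speculation, such an API might look something like the sketch below. The names (spawn_remote_thread, RemoteHandle) and the node argument are entirely my invention; no such API exists, so here the "remote node" is just a local thread, and the IPC overhead is modeled as a round-trip counter so the cost of interacting with the remote thread stays visible.

```python
import threading

# Speculative sketch of a remote-thread API.  All names are hypothetical.
class RemoteHandle:
    """Handle to a thread notionally running on another machine."""

    def __init__(self, thread, result_box):
        self._thread = thread
        self._result = result_box
        self.ipc_round_trips = 0   # every interaction pays network latency

    def join(self):
        self.ipc_round_trips += 1  # collecting the result is one IPC
        self._thread.join()
        return self._result[0]

def spawn_remote_thread(node, fn, *args):
    """Pretend to start fn(*args) on `node`; really a local thread."""
    box = [None]

    def runner():
        box[0] = fn(*args)

    t = threading.Thread(target=runner)
    t.start()
    return RemoteHandle(t, box)

handle = spawn_remote_thread("node-b", sum, [1, 2, 3])
result = handle.join()
```

Note that the thread itself runs at full speed; only the join (and any other interaction through the handle) would pay the network toll, which is exactly the trade-off described above.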
From this we can speculate that the applications which will perform best in a Shared Nothing multi-processing environment will be those which make heavy use of general purpose worker threads. These threads will need to be fairly autonomous and make little or no use of semaphores or shared memory, if shared memory is implemented at all. Basically, each thread would be an identical transaction processor. Writing an application with general purpose worker threads is probably a good idea anyway.
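The "identical transaction processor" pattern can be sketched in a few lines of Python. The transaction itself is a toy (doubling a number), but the shape is the point: each worker is autonomous, pulls work from a queue, and shares nothing beyond that queue, so the same code could in principle fan out to threads on separate machines.

```python
import queue
import threading

# Each worker is an identical, autonomous transaction processor: it
# pulls a transaction, processes it, and posts the result.  No shared
# memory, no semaphores beyond the queues themselves.
def worker(inbox, outbox):
    while True:
        txn = inbox.get()
        if txn is None:          # sentinel: shut down cleanly
            break
        outbox.put(txn * 2)      # stand-in for real transaction work

inbox, outbox = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(inbox, outbox))
           for _ in range(4)]
for t in threads:
    t.start()

for txn in range(10):            # submit ten transactions
    inbox.put(txn)
for _ in threads:                # one sentinel per worker
    inbox.put(None)
for t in threads:
    t.join()

results = sorted(outbox.get() for _ in range(10))
```

Because the workers never coordinate with each other, adding more of them (or moving them to other nodes) changes capacity without changing the program's logic.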
Increased capacity does not always mean increased speed. Operations which are inherently atomic will not see an increase in speed, even during periods of light load. Operations that can be broken down into smaller parts and spread across multiple machines will be more likely to see increases in speed. Atomic operations will still benefit from multi-processing, however. While the speed of an individual transaction will remain roughly the same, the ability to process multiple transactions concurrently will improve the system's ability to handle load. This will make performance more predictable.
A good candidate for Shared Nothing multi-processing might be ray tracing. Once a scene is parsed and a virtual space is created, the tracing of light rays through the space could be divided among a large number of worker threads on many machines. This type of application is likely to see near-linear performance improvement as machines are added. Other candidates might be existing transaction processing such as CICS or MQueue applications. These applications already have a conducive architecture.
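A rough Python sketch of the ray tracing split follows. The trace_row function is a placeholder (it just computes a checkerboard rather than firing rays), but the structure shows why the problem divides so well: each scanline depends only on the read-only scene, so rows can be handed to any number of workers — in a real Shared Nothing system, workers on other machines — with no coordination between them.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy image dimensions; a real render would be far larger.
WIDTH, HEIGHT = 8, 8

def trace_row(y):
    # Placeholder "shading": real code would fire a ray per pixel
    # through the parsed scene.  Each row is fully independent.
    return [(x + y) % 2 for x in range(WIDTH)]

def render(rows, workers=4):
    # Fan the independent rows out across a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace_row, rows))

image = render(range(HEIGHT))
```

(Local Python threads won't show the near-linear speedup for CPU-bound work; the sketch is about the division of labor, which is what would map onto separate machines.)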
Availability and capacity problems can be solved today using technologies like DCE, and by building intelligence into applications. What is interesting about this development is that it looks like it will provide generic services to applications, such as inter-machine semaphores and messaging. These services will make it easier to write high capacity, highly available applications. It is unknown at this time how generic this API will be and how easy it will be for developers to enable their applications for this type of parallel processing. Whether it will be available to external developers at all remains in question. One thing is clear though: the phrase "The network is the computer" is becoming increasingly accurate with the passage of time.