High Resolution Timing under OS/2
Written by Timur Tabi
Contrary to popular belief, high resolution timing is quite possible under OS/2. There are services to query the current time with microsecond accuracy and to receive events with millisecond accuracy. This article will discuss all of the timing mechanisms available for OS/2.
Background: Ring 3 and Ring 0 - User vs. Supervisor mode
OS/2 is a protected operating system. One of the features of a protected OS is that a given program lives in a particular level of protection, or privilege. The microprocessor defines the number of privilege levels and the differences between them, and they are ranked from least privileged to most privileged.
Early processors, like the 8086, only had one level. Most modern processors allow two levels, typically called supervisor mode and user mode. The 80x86 line, however, allows four levels, or "rings" as Intel calls them. Ring 0 is the most privileged and Ring 3 the least. OS/2 assigns Ring 0 to supervisor mode and Ring 3 to user mode. Ring 2 is used as a sort of "privileged user mode," but for the purpose of this article, it can be treated like Ring 3. Ring 1 is not used by OS/2.
Therefore, a particular piece of OS/2 code runs in either Ring 0 or Ring 3. For example, most device drivers and the OS/2 kernel run in Ring 0, and all applications run in Ring 3. The rules for program execution vary greatly between the two rings.
The primary difference between the two rings is in the type of instructions that code in a particular ring is allowed to execute. This is why these rings are also called "privilege levels" - code running in Ring 0 is more privileged than code in Ring 3, and can therefore execute more types of instructions. For example, Ring 0 code, which usually exists in the form of a physical device driver (PDD), can perform I/O instructions, can receive hardware interrupts, and can get access to all of the system memory. However, PDDs are more difficult to write and debug, so the idea is to avoid them if possible.
The second difference between Ring 0 and Ring 3 is the scheduling mechanism. Scheduling is the process of transferring control from one program to another, and the scheduling mechanism determines how and when such a transfer, a.k.a. task switch, occurs. As you know, OS/2 is a pre-emptive multithreaded operating system, but this only applies to Ring 3. Ring 0 has a completely different scheduling mechanism.
Processes: What are they, really?
The word "program" is a very nebulous and abstract term. We all understand what a program is, but it's a difficult concept to describe in words, so I am not going to try. "Process" is sometimes used synonymously with "program", but a process is really more specific, in that it describes one aspect of a program. In short, a process refers to the fact that when a program runs in OS/2, the memory (RAM) it occupies is protected from other processes. That is, without special provisions, one process cannot affect another process, whether by accessing its memory, modifying its threads, or any other means. A program, in its most vague definition, can actually consist of multiple processes, but typically it's a one-to-one ratio. The key point is that whatever program or part of a program is located in a process, that code is separate from other processes.
Threads: Not quite virtual CPUs
A thread is defined as "a separate and almost simultaneous section of executing code." This definition, unfortunately, doesn't describe what a thread really is. Threads are an abstraction - they simulate having an infinite number of microprocessors working in parallel. In a sense, each thread is a "virtual CPU". The idea is to take a portion of your program that does one specific task and create a thread for it. Then one virtual CPU will perform this task while the other virtual CPUs work on other tasks. With an infinite number of CPUs, each thread can run at 100% all the time.
Unfortunately, this model does not fit reality. In practice, most computers have only one CPU, and those that have more still have far fewer CPUs than threads. In situations when there are more threads than CPUs (i.e. all the time), the threads must share the CPUs. This sharing is performed by having the CPU switch from thread to thread. After a thread is given a chance to run, typically for a few milliseconds, it is "put to sleep," and then the next thread, which was previously asleep, is given a chance to run also for a few milliseconds. This duration of time when a thread runs is called a timeslice, and the switching from one thread to another is called a task switch.
The fact that the CPU must switch from thread to thread alters the model of an infinite number of virtual CPUs. If each thread really did have its own CPU, then a thread would most likely enter an idle loop when it needs to wait for an event. If all threads had to share the same CPU, then this thread would eat up its entire timeslice doing nothing. In other words, although threads attempt to provide the concept of an infinite number of virtual CPUs, in reality the programmer must design his threads knowing how many real CPUs there are. And the rules change depending on whether you have one CPU or more than one.
Only one CPU
With only one CPU, there is no true parallelism among the threads. That is, you will never have two threads executing at the same time, so it makes no sense to divide a computationally intensive task into multiple threads, because it will not complete any faster. In fact, it will take longer because you now incur the overhead of task switches. The purpose of threads now becomes to make more efficient use of the CPU. The less CPU your application needs, the faster it runs and the more CPU remains for other applications. For this case, the use of threads falls into two categories:
More than one CPU
It's difficult to find software that can take advantage of multiple CPUs, let alone software that is actually optimized for them. If your app is heavily multithreaded, it will usually run just as well, and maybe better, on multiple CPUs. More often, however, the other CPUs will be used by other programs or by the operating system itself. Although your app may not run better, the system as a whole runs better. For example, DOS sessions can be configured to use two threads, and it has been reported that on Warp Server SMP with two CPUs, Windows MIDI applications run flawlessly.
A computer which has more than one CPU, where all CPUs are identical and operate in parallel, is called a Symmetric Multi Processing (SMP) machine. Techniques for optimizing your code for SMP are beyond the scope of this article. For more information, check out EDM/2 issues 5-7 and 5-9.
Thread states and classes
A typical system has dozens, maybe hundreds of threads at any given time. Since there is only one CPU for all these threads, something must keep track of threads so the CPU knows where to go during a task switch. This something is called the task scheduler, or just scheduler. In order to determine which threads are next in line, the scheduler applies two attributes to each thread: state and priority.
Threads have three states: running, ready, and blocked. A running thread is one that is currently running, so obviously the number of running threads in the system is less than or equal to the number of CPUs. A ready thread is one that is waiting to run, i.e. the ready state occurs right before the running state. When a task switch occurs, the scheduler looks only at the list of ready threads to determine which one is next. A blocked thread is one that has been put to sleep, either because it or some other thread forced it to sleep, or because it is waiting for an I/O operation to complete. Once the thread is "woken up", it is moved to the ready state.
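The three states can be pictured as a small state machine. The following is only a sketch for illustration, not code from OS/2 itself: the event names (EV_DISPATCH, EV_PREEMPT, EV_BLOCK, EV_WAKE) and the function thread_transition are invented here, and the rule that a pre-empted thread goes straight back to the ready state is my reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three thread states named in the text. */
typedef enum { THREAD_RUNNING, THREAD_READY, THREAD_BLOCKED } ThreadState;

/* Hypothetical events, invented for this sketch. */
typedef enum {
    EV_DISPATCH,  /* the scheduler picks this ready thread to run        */
    EV_PREEMPT,   /* the running thread loses the CPU but can still run  */
    EV_BLOCK,     /* the thread is put to sleep or starts waiting on I/O */
    EV_WAKE       /* the sleep ends or the I/O operation completes       */
} ThreadEvent;

/* Applies an event to *state. Returns true and updates *state when the
   transition is legal; returns false and leaves *state alone otherwise. */
bool thread_transition(ThreadState *state, ThreadEvent ev)
{
    assert(state != NULL);

    switch (ev) {
    case EV_DISPATCH:               /* only ready threads are candidates */
        if (*state != THREAD_READY) return false;
        *state = THREAD_RUNNING;
        return true;
    case EV_PREEMPT:
        if (*state != THREAD_RUNNING) return false;
        *state = THREAD_READY;
        return true;
    case EV_BLOCK:                  /* by itself, or by another thread   */
        if (*state == THREAD_BLOCKED) return false;
        *state = THREAD_BLOCKED;
        return true;
    case EV_WAKE:                   /* a woken thread becomes ready      */
        if (*state != THREAD_BLOCKED) return false;
        *state = THREAD_READY;
        return true;
    }
    return false;
}
```

Note that a blocked thread cannot be dispatched directly; it must pass through the ready state first, which is why the scheduler only ever needs to look at the ready list.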
A thread has any one of four priority classes: idle (I), regular (R), foreground server or fixed high (S), and time-critical (TC), and within each class there is a level or delta, which ranges from 0 to 31. For example, a time-critical thread with a +5 delta would be written as TC+5. Deltas are used to specify a relative difference between two threads in a particular class. For instance, if you have two time-critical threads A and B, and if you want to ensure that A always runs first whenever both A and B are in the ready state, you would make A's priority TC+1 and B's priority TC+0.
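To make the class-plus-delta notation concrete, here is a minimal sketch of how two priorities compare. The types and function names (PrioClass, Priority, priority_compare, priority_format) are made up for this example, and the ranking of the classes from idle (lowest) to time-critical (highest) is an assumption of the sketch. It is portable C and does not call any OS/2 API.

```c
#include <assert.h>
#include <stdio.h>

/* Priority classes from the text, listed from lowest to highest rank.
   The ranking order is an assumption of this sketch. */
typedef enum {
    CLASS_IDLE,           /* I  */
    CLASS_REGULAR,        /* R  */
    CLASS_SERVER,         /* S  */
    CLASS_TIME_CRITICAL   /* TC */
} PrioClass;

typedef struct {
    PrioClass cls;
    int       delta;      /* level within the class, 0 to 31 */
} Priority;

/* Returns >0 if a outranks b, <0 if b outranks a, and 0 if they are equal.
   The class is compared first; the delta only matters within a class. */
int priority_compare(Priority a, Priority b)
{
    if (a.cls != b.cls)
        return (int)a.cls - (int)b.cls;
    return a.delta - b.delta;
}

/* Writes the notation used in this article (for example "TC+5") into buf.
   Returns the value returned by snprintf. */
int priority_format(Priority p, char *buf, size_t size)
{
    static const char *const names[] = { "I", "R", "S", "TC" };

    assert(buf != NULL && p.delta >= 0 && p.delta <= 31);
    return snprintf(buf, size, "%s+%d", names[p.cls], p.delta);
}
```

With this model, thread A at TC+1 compares greater than thread B at TC+0, and any time-critical thread compares greater than a regular thread, even one at R+31.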
The delta is also used by OS/2 to dynamically modify a thread's priority. A thread is never boosted to the next class, and only threads in the regular class are modified by the system. Reasons why a thread would be boosted include:
The idle and foreground server priority classes are not important in the context of this article. The time-critical class, however, is very important. It behaves radically differently from the other classes, and these differences are the basis for some of the techniques discussed in this article. Time-critical threads are covered in detail below.
A process has at least one thread. Any process or program which has more than one thread is said to be multithreaded.
Thread Scheduling in Ring 3
The OS/2 scheduler, which controls the thread switching, has three characteristics:
Consider three threads, T1, T2, and T3. T1's priority is R+3, and both T2 and T3 are R+0. When the scheduler is activated, it must look at all of the threads in the ready state and decide which one is to be run. Let's consider all possible combinations of T1, T2, and T3 being in the ready state, assuming there are no other threads in the system. Let's also ignore the case where none of them are in the "ready" state.

    T1 ready?   T2 ready?   T3 ready?   Next to run?
    N           N           Y           T3
    N           Y           N           T2
    N           Y           Y           T2 or T3
    Y           N           N           T1
    Y           N           Y           T1
    Y           Y           N           T1
    Y           Y           Y           T1

As you can see, since T1 is of higher priority, it always runs if it's ready. When T2 and T3 (but not T1) are ready, T2 will run if T3 was the last R+0 thread to run, and T3 will run if T2 was the last R+0 thread to run. In other words, all threads of the same priority are grouped together, in sequential order, and after a thread runs, it is moved to the end of the sequence.
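The selection rule in the T1/T2/T3 example (highest priority first, round-robin among equals) can be simulated in a few lines. This is only a sketch under simplifying assumptions: the names SimThread, SimScheduler and sched_pick are invented, a priority is reduced to a single integer, and "moved to the end of the sequence" is modelled by remembering when each thread last ran. The real OS/2 scheduler is certainly more involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

typedef struct {
    int           priority;  /* larger value = higher priority (R+3 -> 3) */
    int           ready;     /* nonzero if the thread is in the ready state */
    unsigned long last_run;  /* logical time it last ran; 0 = never        */
} SimThread;

typedef struct {
    SimThread     t[MAX_THREADS];
    int           count;     /* number of threads in use                   */
    unsigned long clock;     /* logical clock, advanced on every dispatch  */
} SimScheduler;

/* Picks the next thread to run: the ready thread with the highest priority,
   and among equal priorities the one that ran least recently. Returns its
   index, or -1 if no thread is ready. */
int sched_pick(SimScheduler *s)
{
    int i, best = -1;

    assert(s != NULL && s->count <= MAX_THREADS);

    for (i = 0; i < s->count; i++) {
        if (!s->t[i].ready)
            continue;
        if (best < 0 ||
            s->t[i].priority > s->t[best].priority ||
            (s->t[i].priority == s->t[best].priority &&
             s->t[i].last_run < s->t[best].last_run))
            best = i;
    }

    /* Stamping the winner puts it behind its equal-priority peers. */
    if (best >= 0)
        s->t[best].last_run = ++s->clock;
    return best;
}
```

Setting up index 0 as T1 (priority 3) and indexes 1 and 2 as T2 and T3 (priority 0) reproduces the table: with only T2 and T3 ready, successive calls alternate between them, and as soon as T1 becomes ready it wins every time.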
Methods for synchronizing threads (e.g. by using semaphores) and communicating between threads (e.g. via pipes or shared memory) are beyond the scope of this article.
Regular threads - 32ms time slices
As you probably know, normal OS/2 threads run in 32ms time slices. But what does this mean, exactly? For starters, it means that a normal thread will run for no more than 32ms before it gets pre-empted. However, few threads actually run this long. After all, a 90MHz Pentium can easily execute two million instructions in 32 milliseconds. A thread will usually block for some other reason first, such as calling an OS/2 API.
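As a rough check on that figure, the number of clock cycles in one timeslice is simply the clock rate multiplied by the slice length. The helper below is a sketch with an invented name (cycles_per_timeslice); it counts cycles, not instructions, so how many instructions actually fit depends on the instruction mix.

```c
#include <assert.h>
#include <limits.h>

/* Returns the number of CPU clock cycles in one timeslice.
   1 MHz is 1000 cycles per millisecond. */
unsigned long cycles_per_timeslice(unsigned long clock_mhz,
                                   unsigned long slice_ms)
{
    /* Guard against overflow of the multiplication below. */
    assert(slice_ms == 0 || clock_mhz <= ULONG_MAX / 1000UL / slice_ms);

    return clock_mhz * 1000UL * slice_ms;
}
```

For a 90MHz CPU and a 32ms slice this gives 90 x 1000 x 32 = 2,880,000 cycles, which is consistent with the estimate of two million instructions.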
Regular threads, as their name implies, are not very interesting. Because they don't have any real-time capability, few of the advanced timing functions work correctly with them.
Time critical threads - 8ms time slices
Time critical threads are more than just threads that run at a higher priority. Their entire scheduling model is different in two ways:
OS/2 API Services for Ring 3 Timing
Scheduling in Ring 0
Whereas the scheduler in Ring 3 is pre-emptive, the Ring 0 scheduler is co-operative. Actually, the Ring 0 and Ring 3 schedulers are the same scheduler; the difference is that when a thread is running in Ring 0, the only thing that can pre-empt it is a hardware interrupt. And when the interrupt handler is finished, control returns to the Ring 0 thread. There are no timeslices in Ring 0.
Threads in Ring 0 also have priorities and states, although thread priorities don't play as significant a role as they do in Ring 3, because usually there are far fewer threads in Ring 0 and they run infrequently and for only a short time. However, Ring 0 threads do have an additional attribute: the context, or mode. The thread context limits what APIs are available to the thread. The context is determined by how the thread was created, and therefore a thread cannot change its context. There are three contexts:
Device Driver Interrupt Threads
An interrupt thread is a device driver thread that runs in interrupt mode. In other words, it is the thread that runs an interrupt handler. Interrupt threads are the highest-priority threads in OS/2.
Interrupt handlers are functions that handle hardware interrupts - signals that come from devices, such as a hard drive or a modem, which indicate that the device needs attention from software. Typically this means that the device has data that it needs to send to a program, but it could also mean that the device is ready to accept data or that it has important status information.
The PC architecture supports 16 different hardware interrupt levels, or IRQs, and these levels constitute different priorities. This means that when an interrupt at a certain level is being serviced (i.e. the interrupt handler for that IRQ is currently running), it can be pre-empted by an interrupt of a higher priority. Interrupts at a lower priority are automatically delayed until the current interrupt handler exits. However, it is possible for an interrupt handler to disable all interrupts with the cli assembly instruction, although SMP-aware drivers should use the DevHelp_AcquireSpinLock API instead. In fact, a device driver can disable interrupts via cli or DevHelp_AcquireSpinLock from any context, not just from an interrupt handler.
Device Driver Context Hooks
Context hooks allow an interrupt handler to be processed in kernel mode. Before the interrupt handler is called, a context hook is allocated via DevHelp_AllocateCtxHook. A context hook is basically a device driver thread that runs in kernel mode. When the interrupt handler is called, it performs the bare minimum processing, such as fetching the data from the device and placing it into a queue, and then it arms the context hook via DevHelp_ArmCtxHook. Immediately after the interrupt handler exits, provided there isn't another interrupt pending, the context hook runs.
Context hooks offer three advantages. First, you can call more DevHelp services than you can in an interrupt handler. Second, moving time-consuming code from the interrupt handler to a context hook is more system-friendly, since interrupts are not disabled. Third, since the interrupt handler is much shorter, it might be possible to disable interrupts throughout the entire handler, removing the need for nested-interrupt logic in the code, thereby simplifying the programming.
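The division of labor between the interrupt handler and the context hook can be sketched as a simulation. Nothing below calls a real DevHelp service: SimDriver, sim_interrupt_handler and sim_context_hook are invented names, the hook_armed flag merely stands in for arming the hook via DevHelp_ArmCtxHook, and summing the bytes stands in for whatever time-consuming processing a real driver would do.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 16u   /* must be a power of two for the index math below */

typedef struct {
    unsigned char data[QUEUE_SIZE];
    unsigned      head;        /* next slot to write (interrupt side)     */
    unsigned      tail;        /* next slot to read (context hook side)   */
    int           hook_armed;  /* stands in for DevHelp_ArmCtxHook        */
    unsigned long checksum;    /* result of the "time-consuming" work     */
} SimDriver;

/* Interrupt-time part: do the bare minimum. Queues one byte fetched from
   the device and arms the hook. Returns 0, or -1 if the queue is full. */
int sim_interrupt_handler(SimDriver *d, unsigned char byte_from_device)
{
    assert(d != NULL);

    if (d->head - d->tail == QUEUE_SIZE)
        return -1;
    d->data[d->head % QUEUE_SIZE] = byte_from_device;
    d->head++;
    d->hook_armed = 1;
    return 0;
}

/* Deferred part: runs later, outside the interrupt handler, and drains
   the queue. Returns the number of bytes processed. */
size_t sim_context_hook(SimDriver *d)
{
    size_t n = 0;

    assert(d != NULL);

    if (!d->hook_armed)
        return 0;
    d->hook_armed = 0;
    while (d->tail != d->head) {
        d->checksum += d->data[d->tail % QUEUE_SIZE];
        d->tail++;
        n++;
    }
    return n;
}
```

A zero-initialized SimDriver is a valid empty driver state. Two simulated interrupts followed by one hook invocation process both bytes in a single pass, which is the point of the technique: the handler stays short no matter how long the deferred work takes.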
DevHelp_Yield and DevHelp_TCYield
    USHORT usrc;                    /* Return Code. */

    usrc = DevHelp_Yield();

DevHelp_Yield will yield the CPU to any higher priority thread. Every 3 milliseconds (as recommended by the online documentation), a driver should call DevHelp_GetDOSVar subfunction DHGETDOSV_YIELDFLAG to see if there is a kernel thread in the ready state. If so, DevHelp_Yield should be called. DevHelp_TCYield is a special case of DevHelp_Yield, in that it will only yield the CPU to a time-critical thread. DevHelp_Yield is a superset of DevHelp_TCYield, so there is no need to call both.
DevHelp_SetTimer and DevHelp_TickCount - 32ms callbacks
    USHORT usrc;                    /* Return Code. */
    void NEAR TimerHandler(void);   /* Callback function */

    usrc = DevHelp_SetTimer((NPFN) TimerHandler);

and

    USHORT usrc;                    /* Return Code. */
    USHORT TickCount;               /* Number of ticks per call */
    void NEAR TimerHandler(void);   /* Callback function */

    usrc = DevHelp_TickCount((NPFN) TimerHandler, TickCount);

DevHelp_SetTimer is used to provide callbacks to the driver. A pointer to a function is passed, and every timer tick (32ms) thereafter, the kernel will call that function on an interrupt thread. DevHelp_TickCount does the same thing, but you have the option of specifying that the callback occur only every n timer ticks.
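Because the callback granularity is one timer tick, a desired interval has to be rounded to a whole number of ticks before it is passed as TickCount. The helpers below are a sketch of that arithmetic only. The names ticks_for_interval and callbacks_delivered are invented, the 32ms tick length is taken from the text above, and rounding up (rather than to the nearest tick) is a design choice of this example.

```c
#include <assert.h>

#define TICK_MS 32u   /* length of one timer tick, as stated in the text */

/* Returns the smallest tick count whose period is at least interval_ms.
   A count of zero makes no sense for a callback, so the minimum is 1. */
unsigned ticks_for_interval(unsigned interval_ms)
{
    unsigned ticks = (interval_ms + TICK_MS - 1u) / TICK_MS;

    return ticks == 0 ? 1u : ticks;
}

/* Returns how many callbacks occur during elapsed_ticks timer ticks when
   the callback fires once every tick_count ticks. */
unsigned callbacks_delivered(unsigned elapsed_ticks, unsigned tick_count)
{
    assert(tick_count > 0);

    return elapsed_ticks / tick_count;
}
```

For example, a driver that wants a callback roughly every 100ms would ask for 4 ticks, which actually yields a 128ms period. This rounding error is one reason the 32ms callbacks are unsuitable for fine-grained timing.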
DevHelp_GetDOSVar, subfunction DHGETDOSV_SYSINFOSEG
    USHORT usrc;                    /* Return Code. */
    USHORT TickCount;               /* Number of ticks per call */
    void NEAR TimerHandler(void);   /* Callback function */
    PGINFOSEG pGlobalInfoSeg;       /* Pointer to Global Info Seg */

    usrc = DevHelp_GetDOSVar(DHGETDOSV_SYSINFOSEG, 0, &pGlobalInfoSeg);

This function returns a pointer to a GINFOSEG structure (see bsedos16.h) which contains, among other things, several fields of time and date information, much like the DosGetDateTime API.
TIMER0, The high resolution timer driver - the last resort
Again, it must be stressed that use of TIMER0 should be considered as a last resort.
Besides the IOCtl interface, TIMER0 also includes a device driver interface via Inter-Device Communication, or IDC. Two services are provided:
Given the large selection of timing services and the improved performance over Windows 95 and Windows NT, OS/2 Warp 4 makes a great platform for timing-sensitive applications, including those that need near real-time responsiveness.