SMPV211 - Application Considerations
Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation
The following sections discuss application considerations of OS/2 for SMP V2.11.
Application Compatibility Requirements
- An application or associated subsystem must not use the 'INC' instruction as a semaphore without a 'LOCK' prefix. On a uniprocessor (UP) system, this instruction can serve as a high-performance semaphore that avoids calling any OS service when the semaphore is free at request time, or when it is released and no threads are waiting for it. Because the INC instruction cannot be interrupted once started, and because its result is recorded in the flags register, which is maintained per thread, it can be used safely as a semaphore on a UP system.
- In an OS/2 for SMP V2.11 environment this technique does not work, because two or more threads running on different processors can execute the same 'INC' instruction at the same time, each see the same result in its own flags register, and each conclude that it owns the semaphore.
- Similarly, an instruction available on the 486 or later, such as CMPXCHG, has the same problem if it is not preceded by a 'LOCK' prefix.
- An application or associated subsystem that relies on priorities to guarantee the order in which the threads within a process execute will not work in OS/2 for SMP V2.11. For example, an application may have a time-critical thread and an idle thread, and may assume that the idle thread gets no execution time while the time-critical thread is executing, unless the time-critical thread explicitly yields the CPU. In an OS/2 for SMP V2.11 environment, both the time-critical and idle threads can execute simultaneously on different processors.
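The requirements in the list above can be illustrated with a minimal sketch in portable C11, where the <stdatomic.h> operations stand in for the LOCK-prefixed INC and CMPXCHG instructions (x86 compilers typically emit LOCK-prefixed instructions for them). The names fast_sem and cas_lock, and the convention that -1 means "free", are assumptions made for illustration and are not taken from OS/2. The path that blocks a waiter in the operating system is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fast semaphore: -1 means free, 0 means owned with no
 * waiters, and each further increment records one waiting thread. */
typedef struct {
    atomic_int count;
} fast_sem;

void fast_sem_init(fast_sem *s)
{
    assert(s != NULL);
    atomic_init(&s->count, -1);
}

/* Atomic increment (the role LOCK INC plays on x86). Returns true if the
 * caller now owns the semaphore. A false result means the caller has been
 * counted as a waiter and must block in the OS (not shown). */
bool fast_sem_enter(fast_sem *s)
{
    assert(s != NULL);
    return atomic_fetch_add(&s->count, 1) == -1;
}

/* Atomic decrement. Returns true if at least one waiter was counted and
 * must be woken through the OS (not shown). */
bool fast_sem_leave(fast_sem *s)
{
    assert(s != NULL);
    return atomic_fetch_sub(&s->count, 1) > 0;
}

/* Hypothetical try-lock built on compare-and-exchange (the role
 * LOCK CMPXCHG plays on x86): 0 means free, 1 means held. */
typedef struct {
    atomic_int held;
} cas_lock;

void cas_lock_init(cas_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->held, 0);
}

/* Succeeds only for the one caller that changes held from 0 to 1. */
bool cas_lock_try(cas_lock *l)
{
    int expected = 0;
    assert(l != NULL);
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

void cas_lock_release(cas_lock *l)
{
    assert(l != NULL);
    atomic_store(&l->held, 0);
}
```

Because each read-modify-write is atomic across processors, only one caller can observe the "free" value. The same kind of explicit lock can also replace an assumption that a lower-priority thread never runs while a time-critical thread is active.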
The above compatibility requirements apply only to multithreaded applications, and therefore do not apply to DOS and WINOS2 applications. However, you are strongly encouraged to write 32-bit multithreaded applications for better performance and portability on OS/2 for SMP V2.11.
Because some applications may use one of these techniques, OS/2 for SMP V2.11 provides a mechanism that lets such multithreaded applications execute in UP mode. Only one thread of that process is allowed to execute at any given time, although that thread can execute on any one of the processors. A utility is used to mark the EXE file as uniprocessor only. When the loader detects that the EXE file has been marked as uniprocessor only, OS/2 forces the process to run in uniprocessor mode. See "The Single Processor Utility Program" section.
Application Exploitation
There are some very attractive benefits of OS/2 for SMP V2.11 beyond the increased raw CPU power. Caching is a technique that is employed in both hardware and software to increase performance, and SMP systems increase the effectiveness of the various caches dramatically. An application that can divide its work into separately executing units, such as threads, will see performance increases across the hardware and software.
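As a minimal sketch of dividing work into separately executing units, the following C fragment splits an array into contiguous slices that could each be handed to its own thread. The names work_slice, make_slice and run_slice are hypothetical, and thread creation itself is left out so that the fragment stays portable. Each slice writes only its own result field, so the workers need no shared lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one worker's share of an array. */
typedef struct {
    const int *data;
    size_t begin;   /* first index, inclusive */
    size_t end;     /* last index, exclusive */
    long sum;       /* result, written only by the owning worker */
} work_slice;

/* Split [0, total) into `parts` contiguous ranges whose sizes differ by
 * at most one element, and describe range number `index`. */
void make_slice(work_slice *out, const int *data,
                size_t total, size_t parts, size_t index)
{
    assert(out != NULL && parts > 0 && index < parts);
    size_t base = total / parts;
    size_t extra = total % parts;   /* the first `extra` slices get one more */
    out->data = data;
    out->begin = index * base + (index < extra ? index : extra);
    out->end = out->begin + base + (index < extra ? 1 : 0);
    out->sum = 0;
}

/* Body that each thread would run on its own slice. */
void run_slice(work_slice *w)
{
    assert(w != NULL);
    long sum = 0;
    for (size_t i = w->begin; i < w->end; i++)
        sum += w->data[i];
    w->sum = sum;
}
```

After all workers finish, the caller adds up the per-slice sums. A reasonable starting point is one slice per available processor.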
Each x86 processor (assuming 386 or higher) has a translation lookaside buffer (TLB) that keeps the most recent page translations in a cache, so that the processor does not have to access the Page Directory and Page Table, which reside in much slower memory, every time it translates a linear address into a physical address. This cache is very limited in size: the more distinct pages the code touches, the less effective it is. A single-threaded application uses only one TLB and probably causes thrashing within the TLB because of branching. With multiple processors, however, a multithreaded application makes use of up to N TLBs (where N is bounded by the number of threads and the number of processors available). Thus the performance increase is more than just raw CPU power.
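A rough worked example may make the size limit concrete. It assumes a TLB of E = 32 entries (the figure usually quoted for the 80386, used here only for illustration) and the 4 KB page size P of x86 paging. The N = 4 case is likewise an arbitrary example.

```latex
\begin{align*}
\text{reach of one TLB} &= E \times P = 32 \times 4\,\text{KB} = 128\,\text{KB} \\
\text{aggregate reach of } N \text{ TLBs} &= N \times E \times P \\
N = 4:\quad 4 \times 128\,\text{KB} &= 512\,\text{KB}
\end{align*}
```

Under these assumptions, a single thread whose code and data span more than 128 KB cannot keep all of its translations cached, while four threads on four processors can together keep up to 512 KB of address space mapped in their TLBs.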
Beyond the TLB cache, these processors also contain Level 1 (L1) caches and OEMs will sometimes add Level 2 (L2) caches to their systems. The same advantages are applicable here but to a further degree.
There are advantages for software caches as well. Consider a file system cache, whose effectiveness is largely determined by its hit ratio. If the cache receives a large number of hits compared to misses, it is effective. The best way to achieve this is to keep the Most Recently Used (MRU) data in the cache, which in turn means referencing the same data repeatedly. A multithreaded application running on OS/2 for SMP V2.11 tends to produce this behavior because the same application accesses the file system cache within a shorter period of time. A single-threaded application, with longer periods between accesses, could allow its data to be flushed from the cache.
Similarly, an important aspect of a demand-paged OS is its ability to keep the right set of pages in memory at the right times. With OS/2 for SMP V2.11 and a multithreaded application, the Page Manager can make a better decision because pages for this application are being accessed more frequently than before.
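The hit-ratio argument can be tried out with a toy simulation. The sketch below is a deliberately tiny cache that evicts the least recently used block. It is not the OS/2 file system cache, and the names lru_cache, lru_access and lru_hit_ratio, as well as the two-slot capacity, are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 2   /* illustrative capacity, far smaller than a real cache */

typedef struct {
    int block[CACHE_SLOTS];   /* block[0] is the most recently used entry */
    size_t used;
    unsigned hits;
    unsigned misses;
} lru_cache;

void lru_init(lru_cache *c)
{
    assert(c != NULL);
    c->used = 0;
    c->hits = 0;
    c->misses = 0;
}

/* Reference one block, counting a hit or a miss and moving the block to
 * the most-recently-used position. */
void lru_access(lru_cache *c, int block)
{
    assert(c != NULL);
    size_t pos = c->used;   /* value meaning "not found" */
    for (size_t i = 0; i < c->used; i++) {
        if (c->block[i] == block) {
            pos = i;
            break;
        }
    }

    if (pos < c->used) {
        c->hits++;
    } else {
        c->misses++;
        if (c->used < CACHE_SLOTS)
            c->used++;
        pos = c->used - 1;   /* a fresh slot, or the oldest entry to evict */
    }

    /* Shift the more recent entries down by one and put the block in front. */
    for (size_t i = pos; i > 0; i--)
        c->block[i] = c->block[i - 1];
    c->block[0] = block;
}

double lru_hit_ratio(const lru_cache *c)
{
    assert(c != NULL);
    unsigned total = c->hits + c->misses;
    return total == 0 ? 0.0 : (double)c->hits / (double)total;
}
```

Referencing blocks 1, 2, 1, 2 in quick succession gives two hits out of four accesses. If unrelated blocks (say 90 and 91) are referenced between the first and second pass, the sequence 1, 2, 90, 91, 1, 2 produces no hits at all, because the application's data has been evicted before it is touched again.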
New OS/2 for SMP V2.11 APIs
The new OS/2 for SMP V2.11 APIs are described in the following text.
This section defines the new spinlock APIs that have been added for multiprocessor support.