EDM/2 - Inside the OS/2 Kernel


I. Introduction

In this article, I aim to take OS/2 users and developers on a figurative "journey to the center of the earth"-- an expedition into the little-seen but fundamental workings of the system kernel.
The main tool used in researching this article was the Kernel Debugger (KDB), helpfully provided by IBM Corporation, and the accompanying 4-volume "OS/2 Debugging Handbook". My previous article, "Adventures in Kernel Debugging" (EDM/2 Nov. 1996, http://www.edm2.com/0410/kdb.html), gave an introduction to the setup and use of KDB. To get the most out of the present article, you should have an understanding of that material as well as a working knowledge of the Intel 80x86 architecture and memory management features.
For a detailed look at the kernel, it is of course necessary to settle on a specific version of OS/2, since the kernel code has been modified and upgraded along with the other system components in the various versions and fixpack levels. I have chosen OS/2 2.1 for Windows, build 6.514, as the least common denominator of the systems that are likely to be still running. Since most users will be running newer versions, the memory locations, file sizes and so forth will be different from those shown here. But the major data structures and code components have not changed appreciably.
II. Components of the kernel

The main bulk of the kernel code is in the file OS2KRNL, which on the example system is about 730K (retail) or 1Mb (debug version). The supporting cast consists of several smaller files: OS2LDR (28K retail, 37K debug), and the base device drivers SCREEN01.SYS, PRINT01.SYS, KBD01.SYS, and CLOCK01.SYS (3-30K each). Since the source code for the base device drivers is provided on the IBM DDK Developer Connection CD-Roms, we will not delve into them here.
The system DLLs DOSCALL1.DLL, KBDCALLS.DLL, QUECALLS.DLL, SESMGR.DLL, OS2CHAR.DLL and so forth are not part of the Presentation Manager, but they are not strictly part of the kernel either. Much of this "code" consists simply of forwarder entries to the Dos* kernel APIs. The SESMGR code does deserve some comment, but I will postpone it to a future article in order to keep this one to a manageable length.
The file OS2KRNL is a standard 32-bit "LX" (linear executable) format file, and as such can be examined with an EXE-file formatting utility. But it is easier and more informative to use the KDB "lg" command, which reads the symbol files supplied by IBM. This reveals the following segments:
##lg os2krnl
os2krnl:
0400:00000000 DOSGROUP
1100:00000000 DOSCODE
0120:00000000 DBGCODE
0128:00000000 DBGDATA
0030:00000000 TASKAREA
0138:00000000 DOSGDTDATA
0140:00000000 DOSINITDATA
0148:00000000 DOSINITR3CODE
%00110000 DOSMVDMINSTDATA
%00120000 DOSSWAPINSTDATA
%ffeff000 DGROUP
0150:00000000 DOSHIGH2CODE
0158:00000000 DOSHIGH3CODE
0160:00000000 DOSHIGH4CODE
%fff3f000 DOSHIGH32CODE
##
Of these segments, the 16-bit code segments are: DOSINITR3CODE, DOSCODE, DOSHIGH2CODE, DOSHIGH3CODE, DOSHIGH4CODE, and DBGCODE. The 16-bit data segments are DOSGROUP, DOSINITDATA and DBGDATA. (The DBGCODE and DBGDATA segments exist only to support KDB and are not found in the retail kernel.) The 32-bit code segment is DOSHIGH32CODE and the 32-bit data segment is DGROUP.
The "DOS" segments pertain, naturally, not to DOS but to the Control Program, the protected-mode successor to DOS. The "HIGH" segments are those which will be loaded high, that is, in physical memory above the 1Mb line.
Four segments require special comment. TASKAREA is an expand-down data segment which simply maps the current PTDA (Per-Task Data Area), as detailed in the Debugging Handbook. DOSGDTDATA maps the system GDT, which contains call gates for kernel API calls and entries for segments used by some of the 16-bit kernel code. The symbol file has names for most of these segments, which give clues as to their functions (type "ls dosgdtdata" at the KDB prompt). DOSMVDMINSTDATA and DOSSWAPINSTDATA, as their names imply, support Multiple VDMs (Virtual DOS Machines), discussed in section VII.
The following diagrams give an idea of the relative sizes of these segments. (The 16- and 32-bit segments are not drawn to the same scale, however.) The shaded areas represent pieces that are discarded after system initialization is complete.

The 16-bit code segments appear to be hand-coded in assembler, whereas the 32-bit code segment is largely compiled C code. Here is a synopsis of the major source modules making up each code segment.

III. The OS/2 start-up sequence

In this section we will take a close look at the kernel's bootstrap mechanism. This type of analysis is often useful in diagnosing hardware failures or file corruption which can cause OS/2 to fail to boot properly. For more details on this phase of the system, see the section "Remote IPL/ Bootable IFS" in the "Installable File Systems" (IFS.INF) file on Hobbes.
As every English schoolchild knows, when an IBM-compatible computer is turned on or rebooted, the appropriate boot sector is read into memory at location 07c0:0. Execution begins in real mode at this address, which is just below the 32K line. (This location was chosen so that the early IBM PCs could theoretically operate, though of course not run OS/2, with as little as 32K of RAM.)
The boot sector begins with a 3-byte JMP instruction, followed by a data structure of drive parameters which is common to both DOS and OS/2. This data structure is documented in a number of books on DOS and PC internals and will not be further discussed here. (See, for example, Ray Duncan, Advanced MS-DOS Programming, 2d. ed. 1988, Microsoft Press, p. 180.) The boot sector loads a small (1K) file called OS2BOOT at 0800:0, and OS2BOOT in turn loads OS2LDR at 2000:0. Of course, this file loading is done with real mode BIOS calls, since no file system is available yet.
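Real-mode addresses like these are segment:offset pairs, and the processor forms the physical address as segment × 16 + offset. The short C sketch below applies that rule to the three load addresses mentioned so far. The function and constant names are my own, for illustration only.

```c
#include <stdint.h>

/* Load segments quoted in the text (names are illustrative). */
#define BOOT_SECTOR_SEG 0x07c0   /* boot sector at 07c0:0 */
#define OS2BOOT_SEG     0x0800   /* OS2BOOT at 0800:0     */
#define OS2LDR_SEG      0x2000   /* OS2LDR at 2000:0      */

/* Real-mode translation: linear address = segment * 16 + offset.
   An 8086 would wrap sums above FFFFFh at 1Mb; this sketch does not. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With these definitions the boot sector lands at 7C00h (31K, exactly 1K below the 32K line), OS2BOOT at 8000h (32K), and OS2LDR at 20000h (128K).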
The OS2LDR file is one of the most obscure and least documented parts of the kernel. There is no symbol file for this code, nor can one step through it in the Kernel Debugger, since the Kernel Debugger resides in the debug version of OS2KRNL, which has not been loaded yet. However, the debug version of OS2LDR generates a wealth of data on the debug terminal, and by analyzing this output along with code disassemblies, one can get a good idea of this module's activities.
We start with chip tests and basic system initialization-- querying available memory and installed drives, testing the CPU clock speed, and storing available video modes. Along the way, we get some cryptic progress indications on the debug terminal:
IODel 000a
The I/O delay-- the time to wait between IN and OUT instructions-- has been set to 10 based on the speed of this processor.
Int12 st 00000000 end 0009f7ff Int1588 st 00100000 end 03ffffff
The BIOS reports (through Int 12h and Int 15h function 88h, respectively) about 640K of real-mode memory, ending at 9f7ff hex, and extended memory from the 1Mb line up to 64Mb, ending at 03ffffff hex.
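Because OS2LDR prints inclusive start and end addresses, the sizes follow by simple subtraction. A small C helper, with names of my own choosing, makes the arithmetic explicit:

```c
#include <stdint.h>

/* Size in bytes of an inclusive range such as "st 00100000 end 03ffffff". */
static uint32_t range_bytes(uint32_t start, uint32_t end_inclusive)
{
    return end_inclusive - start + 1;
}

/* The same size in Kb (1024-byte units), rounded down. */
static uint32_t range_kb(uint32_t start, uint32_t end_inclusive)
{
    return range_bytes(start, end_inclusive) / 1024;
}
```

range_kb(0x00000000, 0x0009f7ff) comes to 638, i.e. 2K short of a full 640K; the missing 2K coincides with the "romdata" entry at 9f800 in the physical memory map that OS2LDR prints at the end of its run. range_kb(0x00100000, 0x03ffffff) comes to 64512, which is 63Mb of extended memory above the 1Mb line, topping out at 64Mb.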
CPUUsable = 00000001 CPUWeAre = 00000001
The CPU is an 80486 (0 = 386, 1 = 486, and so on).
SLFrm len a342
The length of the OS2LDR segment, including stack at the end, is 0a342 hex.
cgvi
We are calling the "get video modes" routine.
cldr
And now we come to the activity for which the loader is named: loading OS2KRNL, or, if this file is not found, OS2KRNLI, the install kernel.
The loader routine first gives us an inventory of the OS2KRNL segments:
ob flags      oi-flags paddr/sel   glp  laddr/fladdr      msz/vsz       Object name
01 rw--sfTLaA 00005063 004000/0400 0001 ffe00000/ffe00000 009000/00c5b3 DOSGROUP
02 r-x-sfTLa- 00001065 011000/1100 000a ffe0d000/ffe0d000 00c000/00bfb0 DOSCODE
03 r-x-sf-LaA 00005025 01d000/0120 0016 ffe19000/ffe19000 00b000/00aeea DBGCODE
04 rw--sf-LaA 00005023 028000/0128 0021 ffe24000/ffe24000 009000/0085c0 DBGDATA
05 rw--sN-LaA 0000d0a3 031000/0130 002a ffe2d000/ffe2d000 010000/010000 stack
06 rw--sN-LaA 0000d023 041000/0138 003a ffe3d000/ffe3d000 002000/001e50 DOSGDTDATA
07 rw--sf-LaA 00005023 043000/0140 003c ffe3f000/ffe3f000 002000/004b4e DOSINITDATA
08 r-x-sf-LaA 00005025 048000/0148 003e ffe44000/ffe44000 002000/001fe8 DOSINITR3CODE
09 rw-BPf-h-- 00002213 100000/0000 0040 ffefc000/00110000 001000/0001ac DOSMVDMINSTDATA
0a rw-BPf-h-- 00002013 101000/0000 0041 ffefd000/00120000 002000/001948 DOSSWAPINSTDATA
0b rw-Bsf-h-A 00006033 103000/0000 0043 ffeff000/ffeff000 012000/015326 DGROUP
0c r-x-sf-ha- 00001035 119000/0150 0055 fff15000/fff15000 010000/00fdcc DOSHIGH2CODE
0d r-x-sf-ha- 00001035 129000/0158 0065 fff25000/fff25000 00a000/009a08 DOSHIGH3CODE
0e r-x-sf-ha- 00001035 133000/0160 006f fff2f000/fff2f000 010000/00f304 DOSHIGH4CODE
0f r-xBsf-h-- 00002035 143000/0000 007f fff3f000/fff3f000 081000/080628 DOSHIGH32CODE
The rightmost column in the above table contains my own annotations, correlating the objects listed with the segments discussed in the previous section.
As an aside, the term "object" here means the equivalent in a 32-bit executable module of a "segment" in a 16-bit module. An object is more complex than a segment, since it consists of pages which can be read in and swapped to disk independently of one another, and this may be why a new term was felt to be necessary. But for most purposes, an object is simply a segment which is not limited to 64k in size. I will use the two terms more or less interchangeably.
At any rate, the table above gives attributes, physical and linear addresses and selectors, and sizes for each segment in OS2KRNL. The actual reading in of the file proceeds by first loading the "high" segments into low memory, and then moving them to their proper places using the BIOS Int 15h/87h "Move extended memory block" function. (Remember, we're still running in real mode here!) On the second pass, the low memory segments are loaded.
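Int 15h function 87h expects ES:SI to point at a table of six 8-byte descriptors, with CX holding the number of 16-bit words to copy (at most 8000h, i.e. 64K). The caller fills in only the source (third) and destination (fourth) descriptors; the remaining entries are left zero for the BIOS to use while it briefly switches to protected mode to do the copy. The C sketch below builds such a table in a byte buffer. It follows the commonly documented AT BIOS layout and is not taken from OS2LDR, whose source is unavailable; the function names and the example addresses in the test are my own.

```c
#include <stdint.h>
#include <string.h>

#define MOVE_TABLE_SIZE 48  /* six 8-byte descriptors */
#define DESC_SOURCE      2  /* descriptor index filled in by the caller */
#define DESC_DEST        3  /* descriptor index filled in by the caller */

/* Write one data-segment descriptor: 16-bit limit, base address,
   access byte 93h (present, DPL 0, read/write data). */
static void put_descriptor(uint8_t *d, uint32_t base, uint16_t limit)
{
    d[0] = (uint8_t)(limit & 0xff);
    d[1] = (uint8_t)(limit >> 8);
    d[2] = (uint8_t)(base & 0xff);
    d[3] = (uint8_t)((base >> 8) & 0xff);
    d[4] = (uint8_t)((base >> 16) & 0xff);
    d[5] = 0x93;
    d[6] = 0;                               /* reserved, must be zero on a 286 */
    d[7] = (uint8_t)((base >> 24) & 0xff);  /* base bits 24-31; zero below 16Mb */
}

/* Prepare a move of `bytes` bytes (even, 2..65536) from physical `src`
   to physical `dst`. Returns the word count for CX; the caller would then
   issue Int 15h with AH=87h and ES:SI pointing at `table`. */
static uint16_t build_move_table(uint8_t table[MOVE_TABLE_SIZE],
                                 uint32_t src, uint32_t dst, uint32_t bytes)
{
    memset(table, 0, MOVE_TABLE_SIZE);
    put_descriptor(table + 8 * DESC_SOURCE, src, (uint16_t)(bytes - 1));
    put_descriptor(table + 8 * DESC_DEST, dst, (uint16_t)(bytes - 1));
    return (uint16_t)(bytes / 2);
}
```

Copying a full 64K block, for example, yields limits of FFFFh and a word count of 8000h.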
Wrapping up, OS2LDR gives us a map of physical memory:
pa=00000000 sz=00001000 va=00000000 sel=0000 fl=2000 of=00000003 ow=0000 Real mode IVT
pa=00001000 sz=00002300 va=ffef9000 sel=0100 fl=2014 of=00001004 ow=ff6d OS2LDR 32-bit int dispatch
pa=00004000 sz=0000c5b3 va=ffe00000 sel=0400 fl=2144 of=00005063 ow=ffaa DOSGROUP
pa=00011000 sz=0000bfb0 va=ffe0d000 sel=1100 fl=2244 of=00001065 ow=ffaa DOSCODE
pa=0001d000 sz=0000aeea va=ffe19000 sel=0120 fl=2344 of=00005025 ow=ffaa DBGCODE
pa=00028000 sz=000085c0 va=ffe24000 sel=0128 fl=2444 of=00005023 ow=ffaa DBGDATA
pa=00031000 sz=00010000 va=ffe2d000 sel=0130 fl=2544 of=0000d0a3 ow=ffaa stack
pa=00041000 sz=00001e50 va=ffe3d000 sel=0138 fl=2644 of=0000d023 ow=ffaa DOSGDTDATA
pa=00043000 sz=00004b4e va=ffe3f000 sel=0140 fl=2744 of=00005023 ow=ffaa DOSINITDATA
pa=00048000 sz=00001fe8 va=ffe44000 sel=0148 fl=2844 of=00005025 ow=ffaa DOSINITR3CODE
pa=0004a000 sz=00000ac8 va=00000000 sel=4a00 fl=2001 of=00000000 ow=0000 OS2DUMP
pa=0004b000 sz=00049000 va=00000000 sel=0000 fl=2002 of=00000000 ow=0000 unused
pa=00094000 sz=0000a762 va=ffeee000 sel=0000 fl=2054 of=00001003 ow=ffab OS2LDR (relocated)
pa=0009f000 sz=00000800 va=00000000 sel=0000 fl=2002 of=00000000 ow=0000 unused
pa=0009f800 sz=00000800 va=ffeed800 sel=0000 fl=2004 of=00000000 ow=ff37 romdata
pa=000a0000 sz=00060000 va=00000000 sel=0000 fl=0001 of=00000000 ow=0000 video/BIOS area
pa=00100000 sz=000001ac va=ffefc000 sel=0000 fl=0944 of=00002213 ow=ffaa DOSMVDMINSTDATA
pa=00101000 sz=00001948 va=ffefd000 sel=0000 fl=0a44 of=00002013 ow=ffaa DOSSWAPINSTDATA
pa=00103000 sz=00015326 va=ffeff000 sel=0000 fl=0b44 of=00006033 ow=ffaa DGROUP
pa=00119000 sz=0000fdcc va=fff15000 sel=0150 fl=0c44 of=00001035 ow=ffaa DOSHIGH2CODE
pa=00129000 sz=00009a08 va=fff25000 sel=0158 fl=0d44 of=00001035 ow=ffaa DOSHIGH3CODE
pa=00133000 sz=0000f304 va=fff2f000 sel=0160 fl=0e44 of=00001035 ow=ffaa DOSHIGH4CODE
pa=00143000 sz=00080628 va=fff3f000 sel=0000 fl=0f44 of=00002035 ow=ffaa DOSHIGH32CODE
pa=001c4000 sz=00e3c000 va=00000000 sel=0000 fl=0002 of=00000000 ow=0000 unused
pa=01000000 sz=00000000 va=00000000 sel=0000 fl=0001 of=00000000 ow=0000 unused
pa=01000000 sz=03000000 va=00000000 sel=0000 fl=0002 of=00000000 ow=0000 unused
pa=04000000 sz=00000000 va=00000000 sel=0000 fl=4000 of=00000000 ow=0000 limit of physical memory
Here I have again added annotations in the rightmost column. The second-to-last column (ow=) is the "System Object Id", for which a key can be found in section 4.6 of the Debugging Handbook, Volume IV.
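Reading the pa and sz columns together suggests a simple placement rule: each object starts at the first 4K page boundary after the end of the previous one. For example, DGROUP at 103000h with size 15326h ends at 118325h, and DOSHIGH2CODE begins at 119000h. The C sketch below checks that observation against the objects loaded above the 1Mb line, using pa/sz values copied from the map; the helper names are mine, and the rule is inferred from this one map rather than from any loader documentation.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 0x1000u   /* 4K page */

/* Round a byte count up to a whole number of 4K pages. */
static uint32_t page_round_up(uint32_t bytes)
{
    return (bytes + PAGE_BYTES - 1) & ~(PAGE_BYTES - 1);
}

struct map_entry {
    const char *name;
    uint32_t pa;   /* physical address, from the pa= column */
    uint32_t sz;   /* size in bytes, from the sz= column */
};

/* High-memory objects, copied from the OS2LDR memory map. */
static const struct map_entry high_objects[] = {
    { "DOSMVDMINSTDATA", 0x00100000, 0x000001ac },
    { "DOSSWAPINSTDATA", 0x00101000, 0x00001948 },
    { "DGROUP",          0x00103000, 0x00015326 },
    { "DOSHIGH2CODE",    0x00119000, 0x0000fdcc },
    { "DOSHIGH3CODE",    0x00129000, 0x00009a08 },
    { "DOSHIGH4CODE",    0x00133000, 0x0000f304 },
    { "DOSHIGH32CODE",   0x00143000, 0x00080628 },
};

/* Returns the index of the first entry that does not start where the
   page-rounding rule predicts, or n if every entry fits the rule. */
static size_t check_packing(const struct map_entry *e, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t expected = e[i - 1].pa + page_round_up(e[i - 1].sz);
        if (e[i].pa != expected)
            return i;
    }
    return n;
}
```

All seven entries fit, and the end of DOSHIGH32CODE rounds up to 1c4000h, exactly where the following "unused" region begins. The low-memory objects from DOSGROUP through DOSINITR3CODE follow the same pattern.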
We set the 8259 PIC chips so that IRQs 0 through 7 map to interrupts 50h-57h, and IRQs 8 through 0fh map to interrupts 70h-77h:
rPIC
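Remapping is done by sending each 8259 its four initialization command words (ICW1 to ICW4). The sketch below shows the textbook sequence for the vector bases quoted above (50h for the master, 70h for the slave). It is not OS2LDR's actual code: port output is replaced by a hypothetical `record_out` function that logs writes to an array, so the sequence can be inspected without privileged instructions. Note that the master's power-on mapping, 08h to 0Fh, overlaps the vectors Intel reserves for processor exceptions, so a protected-mode system has to move it somewhere.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1

struct port_write { uint16_t port; uint8_t value; };

/* Hypothetical stand-in for an OUT instruction: records the write. */
static struct port_write io_log[16];
static size_t io_count;

static void record_out(uint16_t port, uint8_t value)
{
    if (io_count < sizeof io_log / sizeof io_log[0])
        io_log[io_count++] = (struct port_write){ port, value };
}

/* Reprogram the cascaded master/slave pair. Vector bases must be multiples
   of 8, because the low 3 bits of the vector carry the IRQ number. */
static void pic_remap(uint8_t master_base, uint8_t slave_base)
{
    record_out(PIC1_CMD, 0x11);          /* ICW1: edge triggered, cascade, ICW4 follows */
    record_out(PIC2_CMD, 0x11);
    record_out(PIC1_DATA, master_base);  /* ICW2: vector base */
    record_out(PIC2_DATA, slave_base);
    record_out(PIC1_DATA, 0x04);         /* ICW3: slave attached to master IRQ2 */
    record_out(PIC2_DATA, 0x02);         /* ICW3: slave's cascade identity is 2 */
    record_out(PIC1_DATA, 0x01);         /* ICW4: 8086 mode */
    record_out(PIC2_DATA, 0x01);
}

/* Vector on which a given IRQ (0-15) arrives after remapping. */
static uint8_t irq_to_vector(unsigned irq, uint8_t master_base, uint8_t slave_base)
{
    return (uint8_t)((irq < 8 ? master_base : slave_base) + (irq & 7));
}
```

Real code would also insert the I/O delay described earlier between successive writes.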
And we jump to syiInitializeOS2 in the DOSCODE segment of OS2KRNL:
j syi
Now that we are in the kernel proper, we can step through most of the rest of the initialization code with the Kernel Debugger. A special KDB facility enables us to press and hold "R", "P", or <space> at the debug terminal and interrupt the startup code either before the switch to protected mode, after the switch to protected mode but before loader and pager initialization, or after loader and pager initialization, respectively. (Be sure to set the keyboard repeat rate on the debug terminal to a high value or else the keystroke may be missed by the COM port polling routines.)
The bulk of the system init part of DOSCODE consists of logic to parse CONFIG.SYS. There is also a component called the "system init file system" (abbreviated sifs), used to read in CONFIG.SYS as well as various BASEDEVs and other files needed before a full-fledged file system can be set up. Each major piece of the kernel (the loader, pager, scheduler, and so on) has an initialization routine which is called at this point, and we "return" to the syiProcess routine in the DOSINITR3CODE segment.
syiProcess then loads and initializes the regular (non-base) installable device drivers, loads the system DLLs, and starts the shell. This is the first place where we have available the ADD drivers used by the fully initialized system to access the hard disk. Since one of the first routines called by syiProcess is in the inicp.asm module (initialize codepage), it is here that we can get the infamous "Cannot find COUNTRY.SYS" error message, even when there is no problem with the COUNTRY.SYS file, if the base device drivers have not installed properly.
IV. The file access code

I want to take a brief look at the sequence of calls connecting a Dos* file I/O call with the hardware access code. For more details, please consult the "Storage Device Driver Reference" found in the DDK and the IFS.INF file on Hobbes.
In the first scenario, we call DOS32READ with the handle of a file on a FAT partition of a SCSI hard drive. This is an entry point in DOSCALL1 which soon passes through a call gate to the 32-bit kernel and invokes FS32IREAD. For FAT access, FS32IREAD then calls the 16-bit routine h_DOS_Read in DOSHIGH4CODE, which, after ascertaining that the requested data is not in a previously read buffer, formulates a "request list" and sends it to the OS2DASD.DMD device driver. The request list is constructed in the _BufReadExecute and _ReqListExecute routines of DOSHIGH32CODE, and consists of a single request with the extended command code 1Eh for Read.
The request specifies the start block, number of blocks to read, and addresses of buffers to hold the data. OS2DASD.DMD then calls the appropriate *.ADD device driver-- for the example system, AHA152X.ADD-- to access the physical hardware.
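As a rough illustration of the arithmetic involved, a byte-oriented read has to be turned into a starting block and a block count before it can go into such a request. The structure below is a simplified, hypothetical stand-in for a request-list entry; the real layout is defined in the Storage Device Driver Reference, not here. The sketch also assumes 512-byte sectors and a single contiguous buffer.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u
#define CMD_READ    0x1E   /* command code for Read, as given in the text */

/* Simplified, hypothetical request entry: not the DDK's actual layout. */
struct read_request {
    uint8_t  command;      /* 1Eh = Read */
    uint32_t start_block;  /* first sector to transfer */
    uint32_t block_count;  /* number of sectors */
    void    *buffer;       /* must hold block_count * SECTOR_SIZE bytes */
};

/* Build a request covering bytes [offset, offset + length) of the device.
   Returns 0 on success, -1 for a zero-length read. */
static int make_read_request(struct read_request *rq, uint64_t offset,
                             uint32_t length, void *buffer)
{
    if (length == 0)
        return -1;
    uint64_t first = offset / SECTOR_SIZE;
    uint64_t last  = (offset + length - 1) / SECTOR_SIZE;
    rq->command     = CMD_READ;
    rq->start_block = (uint32_t)first;
    rq->block_count = (uint32_t)(last - first + 1);
    rq->buffer      = buffer;
    return 0;
}
```

A 2000-byte read starting at byte 1000, for instance, touches sectors 1 through 5, so the request asks for five blocks and the file system copies the wanted bytes out of the sector buffer afterwards.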
The second scenario is the same except that the file we are reading resides on an HPFS partition. In this case, FS32IREAD bypasses the legacy 16-bit FAT routines in DOSHIGH4CODE, and calls instead the FS_READ entry point of HPFS.IFS. The HPFS file system then takes care of buffering data and interfacing to the OS2DASD.DMD module.
The third scenario involves reading a file on a floppy disk. This is the same as the first scenario as far as the kernel is concerned; however, the OS2DASD.DMD code will pass the request down to IBM1FLPY.ADD (or IBM2FLPY.ADD for a Micro Channel machine), rather than to AHA152X.ADD.
Finally, to read data from a SCSI CD-Rom, FS32IREAD calls FS_READ on the CD-Rom file system driver CDFS.IFS. CDFS will again perform buffering services, then send a request list to OS2CDROM.DMD, which will invoke the appropriate BASEDEV-- in the example system, LMS206.ADD for a Philips CD-Rom drive.
V. The context-switching mechanism

A context switch generally takes place at the trailing edge of a kernel API call. Before returning from kernel mode to user mode, the system will call a special routine called KMExitKmodeEvents. This routine examines the global variable Resched, which indicates whether other threads of sufficient priority are ready to run. If Resched is non-zero, the next stop is the _tkSchedNext routine.
_tkSchedNext invokes the scheduler apparatus (_SCHGetNextRunner) to decide which of the ready threads will be the next to receive a timeslice. Some aspects of the scheduler, with its thicket of states and transitions, priority queues, sleep queues, and so forth, are documented in the Debugging Handbook. For now we simply note that _SCHGetNextRunner returns, in the EAX register, a pointer to the TCB of the new, or incoming, thread. This pointer then becomes the single argument to the _PGSwitchContext routine.
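The exit path just described can be modeled in a few lines of C. This is a simplified sketch with my own names (`exit_kernel_mode`, `get_next_runner`, `switch_context`) standing in for KMExitKmodeEvents, _SCHGetNextRunner and _PGSwitchContext. The real scheduler's priority classes, queues and state transitions are not modeled; the stand-in simply picks the highest-priority ready thread.

```c
#include <stddef.h>

struct tcb {
    int id;
    int priority;   /* larger value = more urgent, in this sketch */
    int ready;      /* nonzero if the thread can run */
};

static int resched;           /* analogue of the kernel's Resched flag */
static struct tcb *current;   /* analogue of _pTCBCur */

/* Stand-in for _SCHGetNextRunner: highest-priority ready thread. */
static struct tcb *get_next_runner(struct tcb *threads, size_t n)
{
    struct tcb *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (threads[i].ready && (best == NULL || threads[i].priority > best->priority))
            best = &threads[i];
    return best;
}

/* Stand-in for _PGSwitchContext: here it only changes the pointer. */
static void switch_context(struct tcb *incoming)
{
    current = incoming;
}

/* Stand-in for the check made on the way back to user mode. */
static void exit_kernel_mode(struct tcb *threads, size_t n)
{
    if (!resched)
        return;               /* nothing more urgent: resume the caller */
    resched = 0;
    struct tcb *next = get_next_runner(threads, n);
    if (next != NULL && next != current)
        switch_context(next);
}
```

The point of the design is that the scheduling decision costs only a flag test on the common path; the queue search happens only when something has set Resched.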
The _PGSwitchContext code occupies 559 bytes in DOSHIGH32CODE, and it is worth close study. We cannot step through this code in the Kernel Debugger, since the page tables and system structures are in a transitional state which the debugger cannot make sense of. But by examining the disassembly we can understand its operation and gain significant insight into the OS/2 process architecture.
The path we take depends to some extent on whether we are switching to a different process or merely to a different thread within the same process. If it is a process switch, we must rewrite the portion of the page tables corresponding to user memory (typically up to about 256 Mb) to show the new physical addresses. A process switch also requires pointing the LDTR to a new value, since the LDT tiling can be different for different processes.
For either a process or thread switch, we must remap the TASKAREA segment (selector 30), since this selector addresses the current TCB and TSD as well as the PTDA. We must also update various system global variables: _pPTDACur, _TKSSBase, _TKTCBBias, _pTCBCur, _pTSDCur, and the ring 0 and ring 2 stack pointers.
For more details on the context switching mechanism, see page 339 of the Debugging Handbook, Volume I.
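The central decision in _PGSwitchContext, process switch versus thread switch, can be outlined as follows. This is a hypothetical C model, not a transcription of the 559-byte routine: the structure fields, the globals, and the counters recording which work was done are my own, chosen to mirror the steps listed above.

```c
struct ptda { int pid; int ldt_selector; };           /* per-process data (simplified) */
struct tcb  { struct ptda *process; int ring0_stack; };

struct switch_stats {
    int page_table_rewrites;   /* user-memory page tables reloaded */
    int ldtr_loads;            /* LDTR pointed at the new LDT */
    int taskarea_remaps;       /* TASKAREA (selector 30) remapped */
};

static struct tcb  *cur_tcb;    /* analogue of _pTCBCur */
static struct ptda *cur_ptda;   /* analogue of _pPTDACur */
static int cur_ldtr;
static int cur_ring0_stack;

static void switch_to(struct tcb *incoming, struct switch_stats *st)
{
    if (incoming->process != cur_ptda) {
        /* Process switch: new user address space and new LDT. */
        st->page_table_rewrites++;
        cur_ldtr = incoming->process->ldt_selector;
        st->ldtr_loads++;
        cur_ptda = incoming->process;
    }
    /* Process or thread switch: per-thread state always changes. */
    st->taskarea_remaps++;
    cur_ring0_stack = incoming->ring0_stack;
    cur_tcb = incoming;
}
```

A switch between two threads of the same process skips the expensive page-table and LDTR work entirely, which is why thread switches are cheaper than process switches.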
Of course, it is possible for an application not to make any kernel calls for a long period of time. Perhaps the program is solving a differential equation, doing a complex string search, or otherwise minding its own business without needing to do any I/O or use any kernel services. Will the KMExitKmodeEvents routine then be bypassed, and must all other threads bide their time while waiting for such a program to finish?
The answer, as we would expect, is no, thanks to the 8254 PIT, or Programmable Interval Timer, chip on the PC motherboard. At boot-up, counter 0 of the 8254 chip is set to operate in mode 2 (rate generator mode) to cause an interrupt on IRQ0 approximately 18.2 times per second. Like other IRQs, this one is handled by intIRQRouter in DOSHIGH32CODE, and upon receiving IRQ0, intIRQRouter calls KMExitKmodeEvents, as above. This forces the application to undergo the same scheduling scrutiny that it would if it made a kernel call directly.
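The figure of 18.2 interrupts per second is simply the 8254's input clock, 1,193,182 Hz, divided by the largest count a 16-bit counter can hold, 65536 (which is programmed as 0). The C sketch below computes the rate for a given divisor and assembles the control word that selects counter 0, mode 2, with the count loaded low byte then high byte. The helper names are my own.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182.0   /* 8254 input clock on PC motherboards */

/* Interrupt rate produced by a 16-bit divisor; 0 means 65536. */
static double pit_rate_hz(uint16_t divisor)
{
    uint32_t d = divisor ? divisor : 65536u;
    return PIT_INPUT_HZ / d;
}

/* 8254 control word (written to port 43h):
   bits 7-6 counter select, bits 5-4 access mode (3 = low byte then high byte),
   bits 3-1 operating mode, bit 0 BCD (0 = binary). */
static uint8_t pit_control_word(unsigned counter, unsigned mode)
{
    return (uint8_t)(((counter & 3u) << 6) | (3u << 4) | ((mode & 7u) << 1));
}
```

For counter 0 in mode 2 the control word is 34h; the divisor itself then goes to port 40h, low byte first.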
VI. Memory management

When an application calls DosAllocMem, the system creates a "memory object" by reserving a contiguous range of the process's private virtual address space. The system allocates page table entries for the object, and since each page table entry controls 4Kb of memory, the object's actual size is the requested size rounded up to the next multiple of 4Kb. The object begins on a 64Kb boundary so that it can be addressed by an LDT selector, which means each call to DosAllocMem consumes at least 64Kb of virtual address space.
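The size and alignment rules in the paragraph above reduce to two round-ups. The sketch below computes both, plus the LDT selector that would address a 64Kb-aligned object. The selector formula, (address >> 16) × 8 with the TI bit set for the LDT and RPL 3, is the commonly described OS/2 "tiling" rule and is an assumption here; the article itself only says the 64Kb alignment exists so that an LDT selector can address the object. Function names are mine.

```c
#include <stdint.h>

#define PAGE_BYTES    0x1000u    /* 4Kb, one page table entry */
#define GRANULE_BYTES 0x10000u   /* 64Kb alignment of DosAllocMem objects */

static uint32_t round_up(uint32_t n, uint32_t unit)   /* unit: power of two */
{
    return (n + unit - 1) & ~(unit - 1);
}

/* Bytes covered by page table entries for a request. */
static uint32_t object_span(uint32_t requested)
{
    return round_up(requested, PAGE_BYTES);
}

/* Virtual address space consumed, since each object starts on a 64Kb boundary. */
static uint32_t address_space_used(uint32_t requested)
{
    return round_up(requested, GRANULE_BYTES);
}

/* Assumed tiling rule: LDT index = linear >> 16, TI = 1 (LDT), RPL = 3. */
static uint16_t tiled_selector(uint32_t linear)
{
    return (uint16_t)(((linear >> 16) << 3) | 0x7u);
}
```

A 100-byte request thus occupies one 4Kb page but a full 64Kb of address space, while the 128Kb request in the experiment that follows needs no rounding at all.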
However, no physical memory or disk swap space is allocated by the DosAllocMem code. The mechanism used is called "lazy commit": when an attempt is made to read or write to the area of virtual memory in question, a page fault will be generated (trap 0eh), and the handling routines in DOSHIGH32CODE will then allocate physical memory and set the "present" bit in the corresponding page table entry.
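The lazy-commit idea can be simulated in ordinary user-mode C. In the hypothetical model below, a page table is an array of 32-bit entries in the x86 layout (bit 0 = present, bits 12-31 = physical frame number). "Allocating" the object only fills in non-present entries, and the first touch of each page runs a fault handler that hands out a frame from a toy counter. None of the names correspond to real kernel routines.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT     0x001u
#define PTE_WRITE       0x002u
#define PTE_USER        0x004u
#define PAGE_SHIFT_BITS 12
#define NPAGES          32        /* a 128Kb object */

static uint32_t page_table[NPAGES];
static uint32_t next_free_frame = 0x2f00;   /* toy physical-frame allocator */
static unsigned fault_count;

/* "DosAllocMem": set up entries, but assign no frames yet. */
static void alloc_object(void)
{
    for (size_t i = 0; i < NPAGES; i++)
        page_table[i] = PTE_WRITE | PTE_USER;          /* not present */
}

/* "Page fault handler": give the page a frame and mark it present. */
static void handle_page_fault(size_t page)
{
    fault_count++;
    page_table[page] = (next_free_frame++ << PAGE_SHIFT_BITS)
                     | PTE_PRESENT | PTE_WRITE | PTE_USER;
}

/* Simulated access at a byte offset within the object. */
static void touch(uint32_t offset)
{
    size_t page = offset >> PAGE_SHIFT_BITS;
    if (page >= NPAGES)
        return;                                        /* outside the object: ignored here */
    if (!(page_table[page] & PTE_PRESENT))
        handle_page_fault(page);                       /* the trap 0Eh path */
}

static unsigned resident_pages(void)
{
    unsigned n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += (page_table[i] & PTE_PRESENT) != 0;
    return n;
}
```

Touching offsets 0, 10 and 1000h produces only two faults, because the first two accesses fall in the same page; untouched pages never consume physical memory.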
A simple experiment shows the "before" and "after" state of the page table resulting from a call to DosAllocMem. Here is a program about to make the call with a request to allocate 00020000h, or 128 Kb:
eax=0006eb03 ebx=000a0000 ecx=0006eb88 edx=0006ebb0 esi=00000000 edi=00019010
eip=000120f7 esp=0006eb7c ebp=0006ebd4 iopl=2 rf -- -- nv up ei pl nz na pe nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=00093ffe cr3=001f6000
005b:000120f7 e8385a011a call DOS32ALLOCMEM (1a027b34)
##d ss:esp l 20
0053:0006eb7c 88 eb 06 00 00 00 02 00-13 00 00 00 03 00 00 00 .k..............
0053:0006eb8c 74 34 01 00 d0 eb 06 00-08 00 00 00 00 00 00 00 t4..Pk..........
##
And here are the page table entries for %120000 and %130000:
##dp %120000
linaddr frame pteframe state res Dc Au CD WT Us rW Pn state
%00120000* 02ec1 frame=02ec1 2 0 D A U W P resident
##dp %130000
linaddr frame pteframe state res Dc Au CD WT Us rW Pn state
%00130000* 02ec1 frame=02ec1 2 0 D A U W P resident
##
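The flag columns of the dp display appear to correspond one-for-one to bits of the i386/i486 page table entry: Pn is bit 0 (present), rW bit 1 (writable), Us bit 2 (user), WT bit 3 (write-through), CD bit 4 (cache disable), Au bit 5 (accessed) and Dc bit 6 (dirty), with the frame number in bits 12-31. That mapping is my reading of the column headings checked against the Intel entry format, not something KDB documents here. The C sketch below decodes and re-encodes an entry; the entry shown above (frame 02ec1, flags D A U W P) packs into 02ec1067h.

```c
#include <stdbool.h>
#include <stdint.h>

struct pte_fields {
    uint32_t frame;      /* physical frame number (address >> 12) */
    bool present;        /* Pn column, bit 0 */
    bool writable;       /* rW column, bit 1 */
    bool user;           /* Us column, bit 2 */
    bool write_through;  /* WT column, bit 3 */
    bool cache_disable;  /* CD column, bit 4 */
    bool accessed;       /* Au column, bit 5 */
    bool dirty;          /* Dc column, bit 6 */
};

static struct pte_fields decode_pte(uint32_t pte)
{
    struct pte_fields f;
    f.frame         = pte >> 12;
    f.present       = pte & 0x01;
    f.writable      = pte & 0x02;
    f.user          = pte & 0x04;
    f.write_through = pte & 0x08;
    f.cache_disable = pte & 0x10;
    f.accessed      = pte & 0x20;
    f.dirty         = pte & 0x40;
    return f;
}

static uint32_t encode_pte(struct pte_fields f)
{
    return (f.frame << 12)
         | (f.present ? 0x01u : 0)       | (f.writable ? 0x02u : 0)
         | (f.user ? 0x04u : 0)          | (f.write_through ? 0x08u : 0)
         | (f.cache_disable ? 0x10u : 0) | (f.accessed ? 0x20u : 0)
         | (f.dirty ? 0x40u : 0);
}
```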
We type "p", and then examine the stack and page tables again:
##p
eax=00000000 ebx=000a0000 ecx=0006eb88 edx=0006ebb0 esi=00000000 edi=00019010
eip=000120fc esp=0006eb7c ebp=0006ebd4 iopl=2 -- -- -- nv up ei pl nz na pe nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=00093ffe cr3=001f6000
005b:000120fc 83c40c add esp,+0c
##d %6eb88 l 20
%0006eb88 00 00 12 00 74 34 01 00-d0 eb 06 00 08 00 00 00 ....t4..Pk......
%0006eb98 00 00 00 00 00 00 00 00-00 00 00 00 e0 5d 01 00 ............`]..
##dp %120000
linaddr frame pteframe state res Dc Au CD WT Us rW Pn state
%00120000* 02ec1 frame=02ec1 2 0 D A U W P resident
%00120000 vp id=01608 0 0 c u U W n pageable
%00121000 vp id=01609 0 0 c u U W n pageable
%00122000 vp id=0160a 0 0 c u U W n pageable
%00123000 vp id=0160b 0 0 c u U W n pageable
%00124000 vp id=0160c 0 0 c u U W n pageable
%00125000 vp id=0160d 0 0 c u U W n pageable
%00126000 vp id=01635 0 0 c u U W n pageable
%00127000 vp id=01636 0 0 c u U W n pageable
%00128000 vp id=01637 0 0 c u U W n pageable
%00129000 vp id=01638 0 0 c u U W n pageable
%0012a000 vp id=01639 0 0 c u U W n pageable
%0012b000 vp id=0163a 0 0 c u U W n pageable
%0012c000 vp id=0163b 0 0 c u U W n pageable
%0012d000 vp id=0163c 0 0 c u U W n pageable
%0012e000 vp id=0163d 0 0 c u U W n pageable
%0012f000 vp id=0163e 0 0 c u U W n pageable
##dp %130000
linaddr frame pteframe state res Dc Au CD WT Us rW Pn state
%00130000* 02ec1 frame=02ec1 2 0 D A U W P resident
%00130000 vp id=0163f 0 0 c u U W n pageable
%00131000 vp id=01640 0 0 c u U W n pageable
%00132000 vp id=01641 0 0 c u U W n pageable
%00133000 vp id=01642 0 0 c u U W n pageable
%00134000 vp id=01643 0 0 c u U W n pageable
%00135000 vp id=01644 0 0 c u U W n pageable
%00136000 vp id=01645 0 0 c u U W n pageable
%00137000 vp id=01646 0 0 c u U W n pageable
%00138000 vp id=01647 0 0 c u U W n pageable
%00139000 vp id=01648 0 0 c u U W n pageable
%0013a000 vp id=01649 0 0 c u U W n pageable
%0013b000 vp id=0164a 0 0 c u U W n pageable
%0013c000 vp id=0164b 0 0 c u U W n pageable
%0013d000 vp id=0164c 0 0 c u U W n pageable
%0013e000 vp id=0164d 0 0 c u U W n pageable
%0013f000 vp id=0164e 0 0 c u U W n pageable
##
The kernel has allocated 128Kb worth of page table entries beginning at linear address %120000: 32 entries of 4Kb each, from %120000 through %13f000, all marked not present ("n") and pageable.
The main worker routine in the kernel which does this is _VMAllocMem, which calls the routines _VMReserve, _PGAlloc, and _SELAlloc.
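The two stack dumps can also be read as DOS32ALLOCMEM's arguments and result. Taking the bytes at ss:esp before the call as little-endian doublewords gives the three parameters: a pointer to the variable that receives the address (0006eb88), the size (00020000) and the flags (00000013). In the OS/2 toolkit headers PAG_READ is 1, PAG_WRITE is 2 and PAG_COMMIT is 10h, so 13h asks for committed read/write memory; the page table afterwards nonetheless shows every page not present, which is the lazy allocation described above. After the call, the doubleword at 0006eb88 holds 00120000, the base address of the new object. A small C decoder for those bytes (the helper names are mine):

```c
#include <stdint.h>

#define PAG_READ    0x00000001u   /* values as in the OS/2 toolkit headers */
#define PAG_WRITE   0x00000002u
#define PAG_EXECUTE 0x00000004u
#define PAG_GUARD   0x00000008u
#define PAG_COMMIT  0x00000010u

/* Read a little-endian doubleword, as x86 stores it in memory. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct allocmem_args { uint32_t ppb, cb, flag; };

/* Arguments are pushed right to left, so ppb sits at the lowest address. */
static struct allocmem_args decode_args(const uint8_t *stack)
{
    struct allocmem_args a = { le32(stack), le32(stack + 4), le32(stack + 8) };
    return a;
}

/* Bytes copied from the KDB dumps above. */
static const uint8_t before_call[16] = {
    0x88, 0xeb, 0x06, 0x00, 0x00, 0x00, 0x02, 0x00,
    0x13, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00 };
static const uint8_t after_call_at_ppb[4] = { 0x00, 0x00, 0x12, 0x00 };
```

The "add esp,+0c" that follows the call is the caller removing these three doublewords (12 bytes) from the stack.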
We may also want to see what happens when the program actually tries to access the memory. The KDB command "vsp e" will intercept page faults before they are processed, and this can be used in conjunction with the "zs" (change default command) facility to collect statistics on the page fault mechanism and its effect on system performance. For tracing purposes, it is easier just to put a breakpoint at the start of the lengthy _PGPageFault routine which handles this exception.
VII. The DOS emulation kernel

The DOS emulation component of the system is not mentioned at all in the Debugging Handbook and tends to be ignored by developers because it exists only for compatibility with older programs. However, it occupies over 25% of the code in OS2KRNL and is worth examining if only as an illustration of the versatility of the x86 architecture.
There are essentially three parts to DOS emulation in OS/2: the MVDM manager, the DOS emulator proper and the x86 emulator. A fourth part, the virtual device drivers necessary to run many DOS programs, exists outside the kernel but makes use of the Virtual DevHelp API calls implemented in the MVDM manager.
To get an idea of the issues involved in tracing a DOS application in the Kernel Debugger, let's look at a simple "Hello, world" program written in assembly. We open a DOS Window, whereupon the kernel gives us a VDM with a copy of the "stub virtual DOS kernel"-- the file C:\OS2\MDOS\DOSKRNL-- loaded into low memory to provide int 21h services. We then start the program HELLO.EXE. Here is the complete disassembly:
--u ac2:0 l 7
0ac2:00000000 b8c30a   mov ax,0ac3
0ac2:00000003 8ed8     mov ds,ax
0ac2:00000005 b409     mov ah,09    ; output a string at ds:dx
0ac2:00000007 ba0000   mov dx,0000
0ac2:0000000a cd21     int 21
0ac2:0000000c b44c     mov ah,4c
0ac2:0000000e cd21     int 21
--d ac3:0 l 10
0ac3:00000000 48 65 6c 6c 6f 2c 20 77-6f 72 6c 64 0d 0a 24 00 Hello, world..$.
Notice that KDB uses a double-dash prompt ("--") instead of the usual "##" to indicate that we are running in V86 mode. We were able to break into this code by patching a CCh opcode (the int 3 breakpoint instruction) in at the entry point of HELLO.EXE, running the program, and then editing memory to restore the original opcode.
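For reference, what the program asks DOS to do is simple: function 09h of int 21h writes the characters at DS:DX up to, but not including, a terminating '$'. A minimal C model of that service follows; it operates on a byte array standing in for the VDM's first megabyte, and the array and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t vdm_memory[0x100000];   /* stand-in for the VDM's low 1Mb */

/* Model of int 21h, AH=09h: copy the '$'-terminated string at ds:dx into
   out (cap >= 1; the result is always NUL-terminated). Returns the length. */
static size_t dos_print_string(uint16_t ds, uint16_t dx, char *out, size_t cap)
{
    uint32_t addr = ((uint32_t)ds << 4) + dx;
    size_t n = 0;
    while (addr < sizeof vdm_memory && vdm_memory[addr] != '$' && n + 1 < cap)
        out[n++] = (char)vdm_memory[addr++];
    out[n] = '\0';
    return n;
}
```

Loading the 15 bytes shown at 0ac3:0000 into the array and calling dos_print_string(0x0ac3, 0x0000, ...) yields "Hello, world" followed by CR LF, 14 characters in all.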
After typing "t" a few times, we arrive at a DOS system call:
--t
eax=000009c3 ebx=00000000 ecx=000000ff edx=00000000 esi=00000000 edi=00000100
eip=0000000a esp=00000100 ebp=0000091c iopl=3 -- vm -- nv up ei pl zr na pe nc
cs=0ac2 ss=0ac4 ds=0ac3 es=0ab2 fs=0000 gs=0000 cr2=01390000 cr3=001f6000
0ac2:0000000a cd21 int 21
--
But we will not be able to "t" into this call, because the instruction is about to cause a General Protection Exception (Trap 0D). Since IOPL is 3, a V86-mode INT instruction is dispatched through the protected-mode IDT rather than faulting immediately; the exception arises apparently because the IDT entry for int 21h is invalid, containing a null selector:
--di 21
0021 TrapG Sel:Off=0000:00000000 DPL=3 P
So we put a breakpoint at trap0d in the 32-bit kernel, and continue:
--br e trap0d
--g
Debug register hit
eax=000009c3 ebx=00000000 ecx=000000ff edx=00000000 esi=00000000 edi=00000100
eip=fff491bc esp=00006708 ebp=0000091c iopl=3 rf -- -- nv up ei pl zr na pe nc
cs=0170 ss=0030 ds=0000 es=0000 fs=0000 gs=0000 cr2=01390000 cr3=001f6000
os2krnl:DOSHIGH32CODE:trap0d:
0170:fff491bc 6a0d push +0d ;br0
##
This code will call em86opINTnn to simulate the software interrupt, and we will soon "iretd" back into V86 mode. The call will then be handled by DOSKRNL in low memory.
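The article does not show em86opINTnn itself, but the general technique a V86 monitor uses to reflect a software interrupt back into V86 mode is well established: push the V86 FLAGS, CS and the IP of the following instruction onto the V86 stack, clear IF and TF, and load CS:IP from the real-mode interrupt vector table at vector × 4. The C sketch below models that on a byte array. It illustrates the technique under those assumptions and is not a reconstruction of the OS/2 routine; all names, and the handler address used in the test, are my own.

```c
#include <stdint.h>

#define FLAG_TF 0x0100u
#define FLAG_IF 0x0200u

struct v86_regs { uint16_t cs, ip, ss, sp, flags; };

/* Stand-in for the VDM's address space; the extra 64Kb absorbs seg:off
   sums above 1Mb instead of wrapping them. */
static uint8_t v86_mem[0x110000];

static uint32_t lin(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

static uint16_t read16(uint32_t a)
{
    return (uint16_t)(v86_mem[a] | (v86_mem[a + 1] << 8));
}

static void push16(struct v86_regs *r, uint16_t v)
{
    r->sp = (uint16_t)(r->sp - 2);
    uint32_t a = lin(r->ss, r->sp);
    v86_mem[a]     = (uint8_t)(v & 0xff);
    v86_mem[a + 1] = (uint8_t)(v >> 8);
}

/* Reflect "INT vec" (an instruction insn_len bytes long) into V86 mode. */
static void reflect_interrupt(struct v86_regs *r, uint8_t vec, uint16_t insn_len)
{
    push16(r, r->flags);
    push16(r, r->cs);
    push16(r, (uint16_t)(r->ip + insn_len));   /* resume after the INT */
    r->flags &= (uint16_t)~(FLAG_IF | FLAG_TF);
    r->ip = read16((uint32_t)vec * 4);
    r->cs = read16((uint32_t)vec * 4 + 2);
}
```

Applied to the registers in the trace above (cs:ip = 0ac2:000a, ss:sp = 0ac4:0100, a two-byte "cd21"), the V86 stack pointer drops to 00fa and the return address pushed is 0ac2:000c, so the handler's eventual IRET lands on the "mov ah,4c" that follows.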
We will leave for another day the rest of the saga, for DOSKRNL must still make BIOS calls, which will again cause GP exceptions and be routed to VDDs such as VBIOS.SYS and VVGA.SYS. These will cooperate with SESMGR and the PDDs to finally display the greeting on the screen.
Some additional clues about the workings of OS/2 DOS emulation can be found in H. M. Deitel and M. S. Kogan, The Design of OS/2 (Addison-Wesley, © 1992), pp. 290-300. These are only clues, however, since the descriptions there do not correspond exactly to what is observable with KDB.
VIII. The shut-down routines

Since all good things must come to an end, the Control Program API includes the DosShutdown routine. The worker code is at the symbol w_Shutdown in DOSHIGH4CODE.
This function disables the installed file system drivers by overwriting all their entry points with the address of the ShutdownBlock routine in DOSHIGH2CODE. Any thread thereafter attempting to call an FSD will be blocked. A few routines, however, remain intact for use by the shutdown code: FS_COMMIT, FS_DOPAGEIO, FS_FSCTL, FS_FLUSHBUF, and FS_SHUTDOWN. Also, for file system drivers used by the swapper, certain key entry points are first preserved at the locations FS_SDCHGFILEPTR, FS_SDFSINFO, FS_SDREAD, and FS_SDWRITE. This enables the paging routines to continue to perform shutdown chores, while all other threads are locked out.
We then iterate through all installed device drivers, sending "shutdown" (command code 1Ch) request packets to each. Each driver is called twice, with parameters of 0 and 1 for begin shutdown and end shutdown, respectively. Likewise, for each installed file system driver, the function FS_SHUTDOWN is called twice with start and end shutdown flags. In between these calls, the routines shutdown$FlushAllSFTs and h_FSD_FlushBuf stabilize RAM-cached portions of the file systems.
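The entry-point trick and the bracketing FS_SHUTDOWN calls can be modeled in C. In this hypothetical sketch an FSD is a table of function pointers. Shutdown saves the entries the paging code needs (only two are modeled, in the spirit of FS_SDREAD and FS_SDWRITE; the kernel also keeps FS_SDCHGFILEPTR and FS_SDFSINFO), points every entry except the listed survivors at a blocking stub, and then places a flush between the begin (0) and end (1) FS_SHUTDOWN calls. The slot numbering, the ordering across several FSDs, the flush stand-in and the toy driver are all assumptions for illustration, not the layout used by w_Shutdown.

```c
#include <stddef.h>

/* Hypothetical slot numbering for an FSD's entry-point table. */
enum fs_entry {
    FS_READ, FS_WRITE, FS_CHGFILEPTR, FS_FSINFO,                 /* blocked */
    FS_COMMIT, FS_DOPAGEIO, FS_FSCTL, FS_FLUSHBUF, FS_SHUTDOWN,  /* left intact */
    FS_ENTRY_COUNT
};

typedef int (*fs_fn)(int arg);

struct fsd {
    fs_fn entry[FS_ENTRY_COUNT];
    fs_fn sd_read;    /* preserved copy, like FS_SDREAD */
    fs_fn sd_write;   /* preserved copy, like FS_SDWRITE */
};

static int blocked_calls;

/* Stand-in for ShutdownBlock: the kernel would block the caller;
   here the attempt is counted and failed. */
static int shutdown_block(int arg)
{
    (void)arg;
    blocked_calls++;
    return -1;
}

static int survives_shutdown(int e)
{
    return e == FS_COMMIT || e == FS_DOPAGEIO || e == FS_FSCTL ||
           e == FS_FLUSHBUF || e == FS_SHUTDOWN;
}

static void shutdown_fsds(struct fsd *fsds, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        fsds[i].sd_read  = fsds[i].entry[FS_READ];    /* save before overwriting */
        fsds[i].sd_write = fsds[i].entry[FS_WRITE];
        for (int e = 0; e < FS_ENTRY_COUNT; e++)
            if (!survives_shutdown(e))
                fsds[i].entry[e] = shutdown_block;
    }
    for (size_t i = 0; i < n; i++)
        fsds[i].entry[FS_SHUTDOWN](0);                /* begin shutdown */
    for (size_t i = 0; i < n; i++)
        fsds[i].entry[FS_FLUSHBUF](0);                /* stand-in for the flush routines */
    for (size_t i = 0; i < n; i++)
        fsds[i].entry[FS_SHUTDOWN](1);                /* end shutdown */
}

/* A toy FSD to exercise the sketch: records the order of calls. */
static int toy_log[8];
static size_t toy_len;
static void toy_record(int v) { if (toy_len < 8) toy_log[toy_len++] = v; }
static int toy_ok(int arg)       { (void)arg; return 0; }
static int toy_flush(int arg)    { (void)arg; toy_record('F'); return 0; }
static int toy_shutdown(int arg) { toy_record(arg); return 0; }
```

After shutdown_fsds runs on a toy driver, a call through its FS_READ slot hits the blocking stub, FS_COMMIT still works, the saved read entry still reaches the driver, and the driver sees begin, flush, end in that order.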
IX. Conclusion

At a time when IBM's support for OS/2 seems to grow less enthusiastic every day, it becomes increasingly important for users and developers to understand the internals of the system on their own. This knowledge can help in developing drivers and applications, building independent help desks, and even in coding patches to the system if necessary. With active support efforts from the outside community, OS/2 can and will continue to thrive. I hope that the present article has contributed in some measure to understanding the foundation of this imposing edifice.
eax=00000000 ebx=7b22002b ecx=80010013 edx=7ba0fc0c esi=7ba093c0 edi=ffffffff
eip=fff46572 esp=00006688 ebp=00006688 iopl=2 -- -- -- nv up ei pl zr na pe nc
cs=0170 ss=0030 ds=0168 es=0168 fs=0000 gs=0000 cr2=111f3302 cr3=001f6000
0170:fff46572 833d7c1ff1ff00 cmp dword ptr [_ptcbPriQReady (fff11f7c)],+00 ds:fff11f7c=00000000
##
Copyright © 1998, David C. Zimmerli. All rights reserved.