The OS/2 Debug Kernel

by Charlie Schmitt and Monte Copeland

The OS/2 debug kernel is a special version of the OS/2 kernel with built-in debugging support. Traditionally, device driver developers have used it to debug ring 0 drivers, but today any developer can benefit from it. This article describes the debug kernel and shows several techniques for using it effectively.

The debug kernel is a superset of the ship-level (retail) kernel. For OS/2 2.1, the retail kernel is 734,366 bytes and the debug kernel is 1,002,054 bytes. The extra code in the debug kernel performs the usual debugger functions, such as memory dumps, program tracing, breakpoints, and hot patching, and it also performs stricter error checking.

Some unusual things about the debug kernel are:
 * It takes two PCs to run it. The "debuggee" PC runs OS/2 with the debug kernel. The "debugger" PC runs an asynchronous terminal emulator. Connect the two with a null-modem cable.
 * It does not perform source-level debugging. It is limited to assembler language and symbolic labels.

Installing the Debug Kernel
The debug kernel and installation program are on The Developer Connection for OS/2 CD-ROM. Actually, the CD-ROM contains many debug kernels, but only one is right for your version of OS/2. The installation program picks the right debug kernel for your machine. See \DEVTOOLS\OS2TK21\DEBUG\DBUGINST.DOC.

The installation program automates the following simple but tedious process:
 * 1) Change the attributes of the kernel file, \OS2KRNL, and make it read-write, user, and visible. Make a backup.
 * 2) Copy the debug kernel into \OS2KRNL.
 * 3) Copy symbol files (*.SYM) into their appropriate subdirectories.

When you restart your system, OS/2 loads the debug kernel and transfers control to it. The debug kernel sends its output to COM2 if the debuggee machine has one; otherwise, it uses COM1. At this point in system initialization, most COM hardware operates at 9600 baud, eight data bits, and no parity.

On the other side of the null-modem cable, configure the debug terminal for full duplex, TTY, 9600 baud, no parity, and eight data bits. There you will see messages from the debug kernel, such as "symbols linked." Press Ctrl-C at the debug terminal. The debug kernel should respond with a register dump and one of its command prompts: ## for protected mode or - for real mode. Resume execution by entering g, the go command.
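The serial settings also determine how fast output reaches the debug terminal. Assuming one stop bit (the usual 8N1 setting; the article does not state the stop-bit count), each character costs a start bit, eight data bits, and a stop bit, or ten bit times. The helper names below are illustrative only; this is plain arithmetic, not part of any OS/2 API.

```c
#include <assert.h>

/* Bit times needed to send one asynchronous character: a start bit, the
   data bits, an optional parity bit, and the stop bit(s). */
static unsigned bits_per_char(unsigned data_bits, int has_parity,
                              unsigned stop_bits)
{
    assert(data_bits >= 5 && data_bits <= 8);
    assert(stop_bits == 1 || stop_bits == 2);
    return 1u + data_bits + (has_parity ? 1u : 0u) + stop_bits;
}

/* Whole characters per second the serial link can carry. */
static unsigned chars_per_second(unsigned baud, unsigned data_bits,
                                 int has_parity, unsigned stop_bits)
{
    return baud / bits_per_char(data_bits, has_parity, stop_bits);
}
```

At 9600 baud this gives 960 characters per second, so a full 25-line, 80-column screen (2,000 characters) takes a little over two seconds to arrive; at 2400 baud the rate drops to 240 characters per second.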

Using the Debug Kernel
You command the debugger from the debug terminal. The commands are short to reduce typing, and some work the same as in DEBUG.COM, SYMDEB.EXE, and CV.EXE. For a complete list of commands, see DEBUG.DOC on the CD-ROM.

The debug kernel is a symbolic debugger, so you must have symbol files that match your program, DLL, or driver. Link with the /MAP switch, then run the toolkit program MAPSYM.EXE to generate a symbol file from the map file. On the debuggee, install each .SYM file in the same directory as its .EXE, .DLL, .DRV, or .SYS file.

Installing a new .EXE and .SYM usually does not require a reboot: end the program and copy the new .EXE and .SYM over the old ones. Installing a new .DLL, .DRV, or .SYS, however, means rebooting, because DLLs and drivers often stay loaded until the next re-IPL.

At runtime, the debug kernel has a symbol for the address of every C function, and you can use these symbols to set breakpoints. It also has symbols for global variables declared in C. For assembler code, however, it knows only the locations of symbols declared PUBLIC.

Breakpoints
A breakpoint can be hard-coded directly in the program. The breakpoint instruction, an assembler INT3, causes the debug kernel to stop. All other breakpoints are set at the debug terminal using some variation of the breakpoint commands. Refer to DEBUG.DOC on your CD-ROM.

It's good to have an INT3 breakpoint at program initialization. Device driver writers hard-code an INT3 in their strategy routine, DLL programmers put one in the function _DLL_InitTerm, and EXE programmers code one at the start of main.

When the debugger stops on any breakpoint, the debug terminal shows the registers, the current instruction, and the command prompt. You can dump memory, trace execution, and set additional breakpoints elsewhere in the program. The debug kernel remains stopped until commanded to trace or go.

One way to code INT3s in C is to call an assembler subroutine. On your Developer Connection CD-ROM, you will find INT3386.ASM for 32-bit C compilers and INT3286.ASM for 16-bit compilers. They have been assembled and placed in the static link libraries INT3386.LIB and INT3286.LIB.

The C programs that follow include INT3386.H, which defines the macro INT3. The programs were linked with INT3386.LIB.

Get Your Hands On It
The best way to evaluate or learn the debug kernel is to get your hands on it. The rest of this article provides hands-on examples. We have provided some programs to run on the debug kernel. As you run these programs, follow the output from our debug sessions and read the notes.

Print a copy of DEBUG.DOC! Plug in your debug terminal! Get your baud rates matched! Get your wires crossed! Copy PROG1.EXE and PROG1.SYM off the CD-ROM and run it on the debug kernel.

Program 1. A correct program to illustrate the INT3 C macro and the setting of soft breakpoints.



Notes about session 1:
On line 6, the program stops on the INT3 breakpoint. On line 7, the .p# command shows thread-slot data for the current thread. On line 10, set a breakpoint at function Add3Numbers. On line 11, run (g for go) the program.

The breakpoint hits on line 17, where we dump the memory pointed to by ESP, the stack pointer (line 18), and then go to the caller's return address (line 20). Upon return, EAX holds the result. On line 25, the list near (ln) command shows the thread has returned to main. On line 28, clear all breakpoints. On line 29, run the program to normal termination.

Extra credit: Using the debugger, determine the line number of the INT3 macro in PROG1.C. Refer to INT3386.H.

Deadly Embrace
Program 2 shows a deadly embrace. When an application appears to hang, use the debugger to find where its threads are blocked. PROG2.EXE is a multithreaded application that uses two mutex semaphores to protect some imaginary resources. Run PROG2.EXE. It prints one message, then hangs.

    // prog2.c
    #define INCL_DOS
    #include <os2.h>
    #include <stdio.h>
    #include <stdlib.h>   // _beginthread

    int main( void );
    void _System thread2( void *pv );

    HMTX hmtx1, hmtx2;

    //
    int main( void )
    {
       // create two named, unowned mutex semaphores
       DosCreateMutexSem( "\\SEM32\\MUTEX1", &hmtx1, 0, 0 );
       DosCreateMutexSem( "\\SEM32\\MUTEX2", &hmtx2, 0, 0 );
       // start the second thread
       _beginthread( thread2, NULL, 32768, NULL );
       while ( 1 ) {
          DosRequestMutexSem( hmtx1, -1 );   // -1: wait indefinitely
          DosRequestMutexSem( hmtx2, -1 );
          printf( "Thread 1 owns the two semaphores\n" );
          DosReleaseMutexSem( hmtx2 );
          DosReleaseMutexSem( hmtx1 );
       }
       return 0;
    }
    //
    void _System thread2( void *pv )
    {
       // same two semaphores, requested in the opposite order
       while ( 1 ) {
          DosRequestMutexSem( hmtx2, -1 );
          DosRequestMutexSem( hmtx1, -1 );
          printf( "Thread 2 owns the two semaphores\n" );
          DosReleaseMutexSem( hmtx1 );
          DosReleaseMutexSem( hmtx2 );
       }
    }

Program 2. A defective program that deadlocks itself



Notes about session 2:
Run PROG2.EXE, then press Ctrl-C at the debug terminal. Where the debugger stops is a matter of chance, so the register dump is not significant; most likely, some thread is in the kernel's idle loop.

The .p command identifies the two thread slots that make up PROG2. (The output of the .p command has been pruned.) On line 8, the state (Sta) of each thread is blocked (blk). The ordinal (Ord) field identifies each thread.

Slot 26 holds thread 1. (Slot numbers will vary from machine to machine.) Switch the debugger context to slot 26 (line 11), then use the .pb# command (line 12) to see what the thread is blocked on. Thread 1 is blocked on hmtx2.

To find out who owns semaphore hmtx2, dump the semaphore structure with the .d command (line 15). Semaphore hmtx2 is owned by the thread in slot 27, which is thread 2 of PROG2.EXE. Working from the SEM32 dump, the da command displays the ASCII name of the semaphore (line 26).

The .pu# and .r commands display useful ring 3 information for the current slot: EIP is the ring 3 return address, and ESP points to the ring 3 stack. Although not shown here, you can use EIP to find the nearest symbol (ln) or to unassemble code, and ESP to dump the parameters on the stack.

PROG2 is deadlocked. Thread 1 is blocked on MUTEX2, which is owned by thread 2. Apply the same commands to thread 2 (starting at line 36) and find that thread 2 is blocked on MUTEX1, which is owned by thread 1.

Extra credit: Correct the program so that both threads print messages.

The Interactive Ring 3 Trap Handler - VSF *
Program 3 shows the debug kernel's interactive trap handler. With VSF * set, the debugger stops, as if at a breakpoint, on the exact instruction that causes a fatal, hardware-generated ring 3 exception. You can then issue debugger commands to determine the cause of the exception.

     1: // prog3.c
     2: #define INCL_DOS
     3: #include <os2.h>
     4: #include <stdio.h>
     5: #include "int3386.h"
     6:
     7: int main( void );
     8: VOID _System LoadBuffer( PSZ pBuf );
     9: PSZ _System my_strcpy( PSZ pDest, PSZ pSrc );
    10: //
    11: int main( void )
    12: {
    13:    PSZ psz;
    14:
    15:    INT3;
    16:    DosAllocMem( (PPVOID)&psz, 4096, PAG_COMMIT | PAG_READ | PAG_WRITE );
    17:    LoadBuffer( psz + 4090 );
    18:    return 0;
    19: }
    20: //
    21: VOID _System LoadBuffer( PSZ pDest )
    22: {
    23:    my_strcpy( pDest, "Hello World" );
    24: }
    25: //
    26: PSZ _System my_strcpy( PSZ pDest, PSZ pSrc )
    27: {
    28:    PSZ pRet;
    29:
    30:    pRet = pDest;
    31:    while ( *pDest++ = *pSrc++ );
    32:    return pRet;
    33: }

Program 3. A defective program that causes a ring 3 trap



Notes about session 3:
When the INT3 breakpoint hits, issue the vsf * command, clear all breakpoints (bc*), and resume execution (g). The debugger stops at the trap (line 14) and indicates the cause of the trap: address D1000 is invalid.

Several commands help determine the cause of the trap. On line 15, the .m command (memory arena display) queries the module that owns the current EIP; line 20 shows that it is A:PROG3.EXE, although in other cases it could be a DLL used by PROG3.EXE. On line 21, the .p# command (print process status) shows that the trapping thread is thread 1 of PROG3.EXE.

On line 24, the ln command (list nearest symbol) reports that the current EIP lies between my_strcpy and _beginthread, which means the thread is executing code in my_strcpy. On line 27, the k command (stack trace) shows the call stack for the trapping thread. This command works only if every function in the call chain has set up its stack frame in the C fashion, because only then does an EBP chain exist for the k command to trace. Lines 32-41 show a manual stack walk as far as main.
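"Between my_strcpy and _beginthread" means "in my_strcpy" because ln reports the nearest symbol at or below the address. The sketch below shows that lookup, assuming the symbols have already been loaded into an array sorted by offset. The struct symbol type and nearest_symbol function are hypothetical; they do not describe the .SYM file format or the debug kernel's internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory symbol table entry, sorted by ascending offset. */
struct symbol {
    const char   *name;
    unsigned long offset;
};

/* Return the symbol with the largest offset <= addr, or NULL if addr lies
   below the first symbol. Binary search over the sorted table. */
static const struct symbol *nearest_symbol(const struct symbol *tab,
                                           size_t count, unsigned long addr)
{
    size_t lo = 0, hi = count;          /* search window is [lo, hi) */

    assert(tab != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].offset <= addr)
            lo = mid + 1;               /* candidate; look further right */
        else
            hi = mid;
    }
    /* lo now counts the entries whose offset is <= addr */
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

With made-up offsets of 0x40 for my_strcpy and 0x80 for _beginthread, an EIP of 0x55 resolves to my_strcpy, which matches the reading of the ln output.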

In C, each function executes a prolog that pushes EBP and then copies ESP into EBP. Finally, it subtracts the number of bytes required for local, automatic variables from ESP. As a result, passed parameters are addressed at positive offsets from EBP, and local, automatic variables at negative offsets from EBP.

Line 33 is a dump of the four dwords at EBP. The first is the caller's saved EBP, followed by the return address into the caller, then parameter 1 (pDest) and parameter 2 (pSrc).
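The k command's trace relies on this layout: [EBP] holds the caller's EBP and [EBP+4] holds the return address, so each saved EBP leads to the next frame up. The sketch below walks such a chain over a simulated array of stack dwords instead of live memory. The stack_image type, the helper functions, and the addresses in the example are invented for illustration. The walk stops when the chain leaves the image or when a saved EBP does not point higher up the stack, a simple guard against the broken chains described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated stack memory: words[0] lives at linear address 'base'. */
struct stack_image {
    uint32_t        base;
    const uint32_t *words;
    size_t          nwords;
};

/* Fetch the dword at 'addr'. Returns 0 if it is unaligned or outside the
   image, standing in for an invalid address in a real dump. */
static int read_dword(const struct stack_image *img, uint32_t addr,
                      uint32_t *out)
{
    size_t i;

    if (addr < img->base || (addr - img->base) % 4 != 0)
        return 0;
    i = (addr - img->base) / 4;
    if (i >= img->nwords)
        return 0;
    *out = img->words[i];
    return 1;
}

/* Walk the EBP chain: [EBP] = caller's EBP, [EBP+4] = return address.
   (Parameters, if wanted, start at [EBP+8].) Stores up to 'max' return
   addresses, innermost first, and returns how many were found. */
static size_t walk_ebp_chain(const struct stack_image *img, uint32_t ebp,
                             uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;

    assert(img != NULL && (ret_addrs != NULL || max == 0));
    while (n < max && ebp != 0) {
        uint32_t saved_ebp, ret;

        if (!read_dword(img, ebp, &saved_ebp) ||
            !read_dword(img, ebp + 4, &ret))
            break;                  /* chain points outside the stack */
        ret_addrs[n++] = ret;
        if (saved_ebp <= ebp)
            break;                  /* callers' frames must lie at higher
                                       addresses; anything else ends the walk */
        ebp = saved_ebp;
    }
    return n;
}
```

In the example image, the frame at 0x1000 plays my_strcpy (saved EBP, return address, pDest, pSrc), the frame at 0x1010 plays LoadBuffer, and the frame at 0x1020 plays main, whose saved EBP of 0 ends the chain after three return addresses.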

On line 42, we tried to dump the memory that pDest points to, but pDest is invalid, as reported on line 14. (pDest is one higher than the address on line 14 because the post-increment in *pDest++ on line 31 of Program 3 has already been applied.) On line 44, use gt (go trap) to pass the trap on and allow the system to take its default action.
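A little arithmetic confirms the faulting address. DosAllocMem commits one 4,096-byte page; if it returned psz = D0000h (an assumption, but consistent with the D1000 reported on line 14), LoadBuffer copies into psz + 4090. my_strcpy copies "Hello World" plus its NUL terminator, 12 bytes in all, so only the first six bytes ("Hello ") fit. The seventh byte, 'W', is stored at psz + 4096 = D0000h + 1000h = D1000h, the first byte past the page. The helper below is a hypothetical illustration of that calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first byte of 'src' (counting its NUL terminator) that a
   strcpy-style copy stores beyond a region of 'region' bytes, when the copy
   starts at offset 'start' in that region. Returns -1 if the copy fits. */
static long first_faulting_index(size_t start, size_t region, const char *src)
{
    size_t nbytes;

    assert(src != NULL);
    nbytes = strlen(src) + 1;           /* the copy loop stores the NUL too */
    if (start >= region)
        return 0;                       /* the very first store is outside */
    if (start + nbytes <= region)
        return -1;                      /* the whole copy fits */
    return (long)(region - start);
}
```

For Program 3, first_faulting_index(4090, 4096, "Hello World") is 6, the index of 'W'.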

Extra credit: given vsf * is set, would the debugger still stop on the trap if you have an exception handler registered?

KDB.INI
At boot time, the OS/2 2.x debug kernel looks for a file named KDB.INI in the root directory. COPY CON is the simplest way to create this plain-text file, and most plain-text editors also work; do not use the OS/2 System Editor for this task.

KDB.INI contains a series of commands, one per line, which the debug kernel processes early in system initialization. Think of it as an AUTOEXEC.BAT for the debugger.

There are two principal uses for KDB.INI. First and foremost, it is used to issue the vsf * command. A KDB.INI that does this follows:

    vsf *
    g

The g (go, run) command should be the last command in KDB.INI, followed by a carriage return; otherwise, the system remains stopped.
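Because a missing g or a missing final line terminator leaves the system stopped, it can be worth checking a KDB.INI before rebooting. The checker below is a hypothetical sketch, not a tool from the CD-ROM: it accepts the text only if it ends with a CR or LF and its last non-blank line is exactly g. Tolerating blank lines after the g is an assumption about what the debug kernel accepts, and the other commands are not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical check: does the KDB.INI text end with a line that is exactly
   the g command, followed by a line terminator (CR and/or LF)? */
static int kdbini_ends_with_go(const char *text)
{
    size_t len;

    assert(text != NULL);
    len = strlen(text);

    /* The last character must end a line. */
    if (len == 0 || (text[len - 1] != '\r' && text[len - 1] != '\n'))
        return 0;

    /* Skip trailing line terminators, blank lines, and trailing blanks. */
    while (len > 0 && isspace((unsigned char)text[len - 1]))
        len--;

    /* What remains must end with a line holding only "g" (either case). */
    if (len == 0 || tolower((unsigned char)text[len - 1]) != 'g')
        return 0;
    return len == 1 || text[len - 2] == '\r' || text[len - 2] == '\n';
}
```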

Second, you can use KDB.INI to change the default baud rate and COM port. By default, the debug kernel uses COM2 at 9600,N,8, and the right commands in KDB.INI can change these defaults. This KDB.INI selects COM1 at 2400 baud:

    .b 2400T 1
    g

The T after 2400 indicates that 2400 is a base-10 number. The 1 selects COM1; a 2 selects COM2. For COM ports other than 1 or 2, give the base port address instead of the port number.
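The T suffix implies that the debugger otherwise reads numbers as hexadecimal. Assuming that reading is right, 2400T and 960 (hex) name the same value, as do 9600T and 2580. The helper below is a hypothetical sketch of that interpretation, not part of the debug kernel; it only models how the digits are read.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parse a number the way this sketch assumes the debugger reads it:
   hexadecimal by default, decimal when a trailing T (or t) is present.
   Returns 1 and stores the value on success, 0 on malformed input.
   Overflow is not checked. */
static int parse_kdb_number(const char *s, unsigned long *out)
{
    unsigned long base = 16, value = 0;
    size_t len, i;

    assert(s != NULL && out != NULL);
    len = strlen(s);
    if (len > 1 && (s[len - 1] == 'T' || s[len - 1] == 't')) {
        base = 10;      /* decimal suffix */
        len--;          /* the T is not a digit */
    }
    if (len == 0)
        return 0;
    for (i = 0; i < len; i++) {
        int c = tolower((unsigned char)s[i]);
        unsigned long digit;

        if (isdigit(c))
            digit = (unsigned long)(c - '0');
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned long)(c - 'a' + 10);
        else
            return 0;   /* not a hex digit at all */
        if (digit >= base)
            return 0;   /* e.g. an A-F digit in a decimal number */
        value = value * base + digit;
    }
    *out = value;
    return 1;
}
```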

Printing Text to the Debug Terminal
The printf function alone has debugged many small programs, so its usefulness in large programs should not be overlooked. PMWIN.DLL and PMDD.SYS provide an interface for threads operating at ring 3 to print text to the debug terminal.

A DBPRINTF macro and worker routine are provided on The Developer Connection for OS/2 CD-ROM. See DBPRINTF.H and DBPRINTF.C. DBPRINTF is a conditional macro that appears only in debug builds.
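A conditional debug macro usually follows the pattern sketched below, although the CD-ROM's DBPRINTF.H and DBPRINTF.C may differ in detail. The names dbprintf_worker, dbprintf_last, and the DEBUG switch are hypothetical; output goes to stderr as a stand-in for the debug terminal, and the variadic macro uses C99 syntax that compilers of this era did not support.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: the most recent message, kept so it can be inspected. */
static char dbprintf_last[256];

/* Hypothetical worker: format like printf, then send the text out.
   stderr stands in for the debug terminal in this sketch. Returns the
   length vsnprintf reports (which may exceed the buffer if truncated). */
static int dbprintf_worker(const char *fmt, ...)
{
    va_list args;
    int n;

    assert(fmt != NULL);
    va_start(args, fmt);
    n = vsnprintf(dbprintf_last, sizeof dbprintf_last, fmt, args);
    va_end(args);
    fputs(dbprintf_last, stderr);
    return n;
}

/* Debug builds call the worker; other builds compile the call away. */
#ifdef DEBUG
#define DBPRINTF(...) dbprintf_worker(__VA_ARGS__)
#else
#define DBPRINTF(...) ((void)0)
#endif
```

Compiled with DEBUG defined, DBPRINTF( "count=%d\n", n ) formats and sends the message; compiled without it, the statement expands to ((void)0) and its arguments are never evaluated.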

Specialized Debug Terminal Software
The Developer Connection for OS/2 CD-ROM also includes Debugo, an OS/2 PM terminal program designed for the debug kernel. Because the debug kernel is command driven, a program such as Debugo saves typing. See the help panels in Debugo for more information.

Summary
The sample programs use the _System calling convention instead of the _Optlink convention to ensure that parameters are passed on the stack, which makes them easier to debug.

This debugger exists for current and future non-MACH versions of OS/2 on Intel processors. In Boca, there is a Hang and Trap class offered by OS/2 Service; contact Dennis Sposato at IBM Boca Raton at 407-443-0509.

Send your answers to the Extra Credit to us using either the address in the Newsletter or on CompuServe (Section 12 of OS2DF2), the DEVCON CFORUM (IBM Talklink), or the Internet (devcon@vnet.ibm.com).

Extra credit: Figure out why those unpopped kernels didn't pop. Put them on the debug kernel!