Using the OS/2 debugging interface to monitor the system

By Roger Orr

Introduction
One of the components of OS/2 which nearly all OS/2 developers have used, but which very few have programmed themselves, is the debugging interface.

Both CodeView and Multiscope use this interface to provide you with the ability to debug your programs. It is this interface which enables you to single-step your program, to set breakpoints and to examine or modify data.

This article is a simple introduction to using this interface, and provides an example program which may be of some use in its own right as well.

Overview of DosPTrace
OS/2 is a protected mode operating system, which normally PREVENTS one process from interfering with another. Unfortunately, interfering with another process is exactly what a debugger is required to do, and so a standardised debugging interface was included in OS/2 to support debugging.

The debugging interface allows one process (the debugger) to start another process in debug mode. OS/2 provides the mechanism for the debugger to control execution of the child process, set/clear breakpoints, read/write registers and memory, etc.

The mechanism is that the debugger starts the program to be debugged using the appropriate debugging option on DosExecPgm or DosStartSession to inform OS/2 that this program will be debugged.

The debugger then controls this process using the DosPTrace function. This function takes one parameter - a pointer to a debug buffer. Some of the fields in this buffer are set up before DosPTrace is called, and others are filled in by the operating system upon return. (See source listing below for a definition of the PTRACEBUF structure)

The two most important fields in the buffer are the 'pid' and 'cmd' fields. The 'cmd' field identifies the action required of DosPTrace, for example read memory, stop process, etc; and the 'pid' field identifies WHICH process the command is to refer to.

On return the 'cmd' field is replaced with the return code.

The first command issued for a new process should be the PT_STOP command, which initialises the DosPTrace buffer. This will return with PT_LOADED, which indicates that the main program module has been loaded. The command should then be reissued, and will return PT_LOADED for each of the DLLs which the program requires. Once all the DLLs are loaded the PT_STOP command will return PT_SUCCESS.
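The start-up handshake described above can be sketched in portable C. This is only an illustration under stated assumptions: mock_ptrace stands in for the real DosPTrace call, TraceBuf is a cut-down stand-in for PTRACEBUF, and the CMD_/NOTE_ names and their numeric values are placeholders rather than the values from the OS/2 headers. The 'mte' field is a hypothetical slot for the handle of the module just loaded.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder values only: the real command and notification codes
   come from the OS/2 toolkit headers, not from this sketch. */
enum { CMD_STOP = 100 };
enum { NOTE_SUCCESS = 0, NOTE_LOADED = 1 };

/* Cut-down stand-in for the debug buffer. */
typedef struct {
    unsigned short pid;   /* which process the command refers to   */
    unsigned short cmd;   /* command on entry, return code on exit */
    unsigned short mte;   /* hypothetical: handle of loaded module */
} TraceBuf;

/* Mock of the OS call: reports three module loads, then success. */
static int modules_left = 3;

static void mock_ptrace(TraceBuf *buf)
{
    assert(buf->cmd == CMD_STOP);   /* the only command the mock knows */
    if (modules_left > 0) {
        buf->mte = (unsigned short)(0x100 + modules_left);
        modules_left--;
        buf->cmd = NOTE_LOADED;
    } else {
        buf->cmd = NOTE_SUCCESS;
    }
}

/* Reissue the stop command until every module has been reported.
   Returns the number of modules seen, or -1 on an unexpected reply. */
int wait_for_load(unsigned short pid)
{
    TraceBuf buf;
    int loaded = 0;

    for (;;) {
        buf.pid = pid;
        buf.cmd = CMD_STOP;   /* reset each time: the call overwrote it */
        buf.mte = 0;
        mock_ptrace(&buf);

        if (buf.cmd == NOTE_SUCCESS)
            return loaded;
        if (buf.cmd != NOTE_LOADED)
            return -1;

        loaded++;
        printf("module %d loaded (handle %#x)\n", loaded, (unsigned)buf.mte);
    }
}
```

The point to notice is that 'cmd' has to be set again before every call, because the previous call replaced it with the return code.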

At this point the debugger can issue commands to read/write memory or registers, set/clear breakpoints, and to run the program.

Control will be returned by OS/2 when the debugged program hits a breakpoint, causes a processor exception, or terminates. OS/2 will fill in the buffer with the current register contents and an indication of which of the above caused the return.

In addition to this functionality, a second debug option for both DosExecPgm and DosStartSession is provided to allow the debugger to control both the initial process and also ALL processes begun by it, whether by DosExecPgm or DosStartSession. The mechanism is that DosPTrace returns with a status of PT_CHILDPID, providing the PID of the descendant process. The debugger must then create a second thread, or another process, to manage the debugging of this process.
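One way to picture the bookkeeping this implies is a fixed-size table with one slot per descendant being debugged. The sketch below is an assumption-laden illustration, not the interface itself: Watcher, watch_child and unwatch_child are hypothetical names, and the thread or process creation is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

/* One slot per descendant process under debug control. */
typedef struct {
    unsigned short pid;   /* descendant being watched               */
    int in_use;           /* non-zero while a handler owns the slot */
} Watcher;

/* Claim a slot for a newly reported descendant.
   Returns the slot index, or -1 when every slot is taken. */
int watch_child(Watcher *table, size_t slots, unsigned short pid)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < slots; i++) {
        if (!table[i].in_use) {
            table[i].pid = pid;
            table[i].in_use = 1;
            /* A real debugger would start a thread (or a process)
               here to drive the debug interface for this pid. */
            return (int)i;
        }
    }
    return -1;
}

/* Release the slot once the descendant has terminated. */
void unwatch_child(Watcher *table, size_t slots, unsigned short pid)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < slots; i++) {
        if (table[i].in_use && table[i].pid == pid) {
            table[i].in_use = 0;
            return;
        }
    }
}
```

Returning -1 when the table is full gives the caller a chance to report that a descendant cannot be tracked, rather than silently losing it.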

Background to the example program
Unfortunately, DosPTrace is not particularly well described in the OS/2 manuals which I have seen. In particular, the documentation of the child process debug option described in the previous paragraph is incomplete - the PT_CHILDPID return is generally not mentioned at all!

Space does not permit me to provide a complete description of DosPTrace - the range of commands which can be used, and the relevant fields for each command. The best advice I can offer is to refer to as many OS/2 manuals as you can get hold of - and not to believe any of them... I have found the only way to find out how DosPTrace works is to write programs which use it.

For this reason I thought that a simple, working example of a framework for using DosPTrace would help to make up for the inadequacies of the documentation, and might well be of interest, or even of use, to the readers of Pointers.

While I was deciding what was the best way of going about this, it occurred to me that the main OS/2 shell program is, after all, just a program - it is PMSHELL.EXE. After OS/2 has initialised it is in charge of starting other sessions, and hence programs.

I decided my example program would use PMSHELL itself as the program to debug. By using the 'child process' debug mode, all processes started after OS/2 has initialised will be run under control of the OS/2 debug interface and hence of this example program.

I have called the example program 'BIGBRO', as in the 'big brother' of George Orwell's "1984", because it watches all the programs running without them being aware of it.

Functionality of the example program
The program is installed in CONFIG.SYS (as described below), and the system rebooted. After initialisation has completed all the processes which are started by PMSHELL or its descendants are run under the control of 'BIGBRO', which does the following:
 * logs all process starts, with PID and full program name
 * logs all process terminations, with return code
 * replaces the 'trap 0D' hard error popup with a simple-minded log of the registers, and a beep.

Obviously further enhancements would be easy to add, but space restricts them in this article! Data about active processes could be maintained, a full walkback could be performed for process exceptions, remote network access to BIGBRO could be added, etc.

There are a few points which must be borne in mind when using BIGBRO:
 * Owing to the way DosPTrace works, processes which fail with a trap 0D do not return the same error code to the parent process.
 * Each active process requires a dedicated thread in BIGBRO, which consumes memory, and also limits the number of active processes to the maximum number of threads in a single process.
 * Program loading is slower because each DLL loaded requires action by BIGBRO to continue.
 * The simple-minded logging creates a file which grows without limit.
 * Trap 0D errors (general protection violation) are handled by BIGBRO, rather than the default HARDERR handling of a full-screen popup.
 * It is very tricky to use a debugger while BIGBRO is active - OS/2 only allows ONE process to be in control. Both the debugger and BIGBRO are potential candidates, but only one will succeed!

Notes on the program itself
I am using Microsoft C 6.00 and OS/2 1.20 and 1.30. The compilation is:

    cl /AL /Gs /MT /W4 /Zp bigbro.c

For other compilers you need to find out how to create multi-threaded programs; for example, for C 5.1 you need to use the "mt\xxx" include files and the multithreaded C runtime library.


 * 1) Each process is handled by a dedicated thread, which issues the PT_STOP command until it gets a successful completion. The PT_LOADED completion indicates a module has been loaded, and the first module is the program itself.
 * 2) PT_FAULT return indicates a protection fault (trap 0D). The registers are written to the logfile and the program is terminated.
 * 3) PT_CHILDPID return indicates that this process has started a child process, so a fresh thread is created to deal with this. NOTE this simple program doesn't do any checking on the return from _beginthread!
 * 4) PT_DYING indicates the process is terminating, and so the return code is saved for displaying at the end.
 * 5) PT_SIGNAL and PT_ENDTHREAD are ignored - they are merely informative.
 * 6) 'Unexpected' return codes cause the program to be aborted as a failsafe - if you have programs which cause some of the other errors, such as floating point errors, you may wish to treat these errors differently.
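The handling in points 2 to 6 amounts to a switch on the code left in the 'cmd' field. The following is a minimal sketch of that shape in portable C, not the BIGBRO source: the NOTE_ names and values are placeholders for the real PT_ codes, the Action enum is hypothetical, and the 'value', 'ip' and 'cs' fields are assumed stand-ins for whatever the real buffer provides.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder codes: the real PT_ values come from the toolkit headers. */
enum Note {
    NOTE_FAULT = 1, NOTE_CHILDPID, NOTE_DYING, NOTE_SIGNAL, NOTE_ENDTHREAD
};

/* Hypothetical: what the per-process loop should do next. */
enum Action { ACT_CONTINUE, ACT_TERMINATE, ACT_WATCH_CHILD, ACT_ABORT };

typedef struct {
    unsigned short pid;
    unsigned short cmd;     /* notification code on return          */
    unsigned short value;   /* assumed: child PID or return code    */
    unsigned short ip, cs;  /* assumed subset of the register image */
} TraceBuf;

/* Decide what to do with one notification, logging as we go. */
enum Action handle_note(const TraceBuf *buf, FILE *log, int *exit_code)
{
    assert(buf != NULL && log != NULL && exit_code != NULL);

    switch (buf->cmd) {
    case NOTE_FAULT:        /* trap 0D: log registers, end the program */
        fprintf(log, "pid %u: trap 0D at %04X:%04X\n",
                (unsigned)buf->pid, (unsigned)buf->cs, (unsigned)buf->ip);
        return ACT_TERMINATE;
    case NOTE_CHILDPID:     /* a descendant needs its own handler */
        fprintf(log, "pid %u: started child %u\n",
                (unsigned)buf->pid, (unsigned)buf->value);
        return ACT_WATCH_CHILD;
    case NOTE_DYING:        /* remember the code for the final log line */
        *exit_code = buf->value;
        return ACT_CONTINUE;
    case NOTE_SIGNAL:
    case NOTE_ENDTHREAD:    /* merely informative */
        return ACT_CONTINUE;
    default:                /* failsafe for anything unexpected */
        fprintf(log, "pid %u: unexpected code %u\n",
                (unsigned)buf->pid, (unsigned)buf->cmd);
        return ACT_ABORT;
    }
}
```

Keeping the decision separate from the action makes it easy to treat other codes, such as floating point errors, differently by adding a case rather than touching the loop.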

Testing and installation
The program can be tested by invoking it as:

    bigbro c:\os2\cmd.exe

This starts a command prompt at which you can execute a few programs, and then type EXIT to quit. Afterwards, type out c:\shell.log and check that the file contains data about processes starting and stopping.

The program is then installed by editing CONFIG.SYS. The line:

    PROTSHELL=C:\OS2\PMSHELL.EXE ....

is changed by inserting BIGBRO after the '=' as follows:

    PROTSHELL=C:\DEBUG\BIGBRO.EXE C:\OS2\PMSHELL.EXE ....

where, in this example, I placed BIGBRO.EXE in the C:\DEBUG directory.

As always when changing CONFIG.SYS for any reason, KEEP A BACKUP in case of finger trouble.

Now reboot your machine. Start an OS/2 command prompt, and type out c:\shell.log for a list of which processes have started and terminated.

Program source BIGBRO.C
Roger Orr - 10-Aug-1991