Using the OS/2 debugging interface to monitor the system

From EDM2

By Roger Orr

Introduction

One of the components of OS/2 which nearly all OS/2 developers have used, but which very few have programmed themselves, is the debugging interface.

Both CodeView and Multiscope use this interface to provide you with the ability to debug your programs. It is this interface which enables you to single-step your program, to set breakpoints and to examine or modify data.

This article is a simple introduction to using this interface, and provides an example program which may be of some use in its own right as well.

Overview of DosPTrace

OS/2 is a protect mode operating system, which normally PREVENTS one process interfering with another one. Unfortunately this is exactly what a debugger is required to do, and so in order to support debugging a standardised debugging interface was included in OS/2.

The debugging interface allows one process (the debugger) to start another process in debug mode. OS/2 provides the mechanism for the debugger to control execution of the child process, set/clear breakpoints, read/write registers and memory, etc.

The mechanism is that the debugger starts the program to be debugged using the appropriate debugging option on DosExecPgm or DosStartSession to inform OS/2 that this program will be debugged.
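As a rough illustration of what has to be prepared before that call, the sketch below builds the argument block which DosExecPgm expects: the program name, a NUL, the arguments, and a double NUL. The init() function in the listing does the same thing with strcpy and strcat. The helper name build_arg_block and the bounds checking are assumptions added for this sketch and are not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build a DosExecPgm-style argument block in 'out':
 *   program name, NUL, arguments joined by single spaces, NUL, NUL
 * or just: program name, NUL, NUL when there are no arguments.
 * 'argv' is a NULL-terminated vector; argv[0] is the program name.
 * Returns the number of bytes used, or 0 if the block might not fit.
 * build_arg_block is a hypothetical helper, not an OS/2 API. */
size_t build_arg_block(char *out, size_t outsize, char *const *argv)
{
    size_t pos = 0;
    int i;

    assert(out != NULL && argv != NULL && argv[0] != NULL);

    for (i = 0; argv[i] != NULL; i++) {
        size_t n = strlen(argv[i]);

        /* conservative check: separator + text + NUL + final NUL */
        if (pos + 1 + n + 2 > outsize)
            return 0;
        if (i >= 2)
            out[pos++] = ' ';       /* space between arguments */
        memcpy(out + pos, argv[i], n);
        pos += n;
        if (i == 0)
            out[pos++] = '\0';      /* NUL ends the program name */
    }
    if (i > 1)
        out[pos++] = '\0';          /* NUL ends the argument string */
    out[pos++] = '\0';              /* extra NUL ends the whole block */
    return pos;
}
```

Given the vector { "prog", "a", "b", NULL } this produces the ten bytes "prog", NUL, "a b", NUL, NUL. The size check is deliberately pessimistic, so it may refuse a block that would fit with a byte to spare.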

The debugger then controls this process using the DosPTrace function. This function takes one parameter - a pointer to a debug buffer. Some of the fields in this buffer are set up before DosPTrace is called, and others are filled in by the operating system upon return. (See the source listing below for a definition of the PTRACEBUF structure.)

The two most important fields in the buffer are the 'pid' and 'cmd' fields. The 'cmd' field identifies the action required of DosPTrace, for example read memory, stop process, etc.; and the 'pid' field identifies WHICH process the command is to refer to.

On return the 'cmd' field is replaced with the return code.

The first command issued for a new process should be the PT_STOP command, which initialises the DosPTrace buffer. This will return with PT_LOADED, which indicates that the main program module has been loaded. The command should then be reissued, and will return PT_LOADED for each of the DLLs which the program requires. Once all the DLLs are loaded the PT_STOP command will return PT_SUCCESS.
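The start-up sequence just described can be sketched in portable C as follows. This is only an illustration: the structure is a cut-down stand-in for PTRACEBUF, and fake_trace is a hypothetical simulation of the operating system's side, so that the loop can be compiled and tried away from OS/2. Only the PT_ values are taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the article's listing */
#define PT_STOP     0x000A
#define PT_SUCCESS  0x0000
#define PT_ERROR    0xFFFF
#define PT_LOADED   0xFFF8

/* Cut-down stand-in for PTRACEBUF: only the fields this sketch needs */
struct trace_buf {
    unsigned short pid;
    unsigned short cmd;   /* command on entry, result on return */
    unsigned short mte;   /* module handle reported with PT_LOADED */
};

/* Signature of whatever performs the trace request (hypothetical) */
typedef unsigned short (*trace_fn)(struct trace_buf *);

/* Simulated OS: reports this many module loads, then success */
static int fake_modules_left = 3;

unsigned short fake_trace(struct trace_buf *b)
{
    if (b->cmd != PT_STOP) {
        b->cmd = PT_ERROR;
        return 0;
    }
    if (fake_modules_left > 0) {
        b->mte = (unsigned short)(100 + fake_modules_left);
        fake_modules_left--;
        b->cmd = PT_LOADED;
    } else {
        b->cmd = PT_SUCCESS;
    }
    return 0;
}

/* Reissue PT_STOP until it stops answering PT_LOADED.
 * Returns the number of modules reported, or -1 on any other answer. */
int wait_until_loaded(trace_fn trace, struct trace_buf *b)
{
    int modules = 0;

    assert(trace != NULL && b != NULL);
    for (;;) {
        b->cmd = PT_STOP;         /* cmd is overwritten on every return */
        if (trace(b) != 0)
            return -1;
        if (b->cmd == PT_SUCCESS)
            return modules;
        if (b->cmd != PT_LOADED)
            return -1;
        modules++;                /* the first one is the program itself */
    }
}
```

Note that 'cmd' has to be set again before every call, because the previous call replaced it with the return code. The real program handles several other return codes inside the same loop rather than giving up as this sketch does.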

At this point the debugger can issue commands to read/write memory or registers, set/clear breakpoints, and to run the program.

Control will be returned by OS/2 when the debugged program hits a breakpoint, causes a processor exception, or terminates. OS/2 will fill in the buffer with the current register contents and an indication of which of the above caused the return.

In addition to this functionality, a second debug option for both DosExecPgm and DosStartSession is provided to allow the debugger to control both the initial process and also ALL processes begun by it, whether by DosExecPgm or DosStartSession. The mechanism is that DosPTrace returns with a status of PT_CHILDPID, providing the PID of the descendant process. The debugger must then create a second thread, or another process, to manage the debugging of this process.

Background to the example program

Unfortunately, DosPTrace is not particularly well described in the OS/2 manuals which I have seen. In particular, the description of the child process debug option covered at the end of the previous section is incomplete - the PT_CHILDPID return is generally not mentioned at all!

Space does not permit me to provide a complete description of DosPTrace - the range of commands which can be used, and the relevant fields for each command. The best source I can suggest is to refer to as many OS/2 manuals as you can get hold of - and not to believe any of them... I have found the only way to find out how DosPTrace works is to write programs which use it.

For this reason I thought that a simple, working example of a framework for using DosPTrace would help to make up for the inadequacies of the documentation, and might well be of interest, or even of use, to the readers of Pointers.

While I was deciding what was the best way of going about this, it occurred to me that the main OS/2 shell program is, after all, just a program - it is PMSHELL.EXE. After OS/2 has initialised it is in charge of starting other sessions, and hence programs.

I decided my example program would use PMSHELL itself as the program to debug. By using the 'child process' debug mode, all processes started after OS/2 has initialised will be run under control of the OS/2 debug interface and hence of this example program.

I have called the example program 'BIGBRO', as in the 'big brother' of George Orwell's "1984", because it watches all the programs running without them being aware of it.

Functionality of the example program

The program is installed in CONFIG.SYS (as described below), and the system rebooted. After initialisation has completed all the processes which are started by PMSHELL or its descendants are run under the control of 'BIGBRO', which does the following:

  • logs all process starts, with PID and full program name
  • logs all process terminations, with return code
  • replaces the 'trap 0D' hard error popup with a simple-minded log of the registers, and a beep.

Obviously further enhancements would be easy to add, but space restricts them in this article! Data about active processes could be maintained, a full walkback could be performed for process exceptions, remote network access to BIGBRO could be added, etc.

There are a few points which must be borne in mind when using BIGBRO:

  • Each active process requires a dedicated thread in BIGBRO, which consumes memory and also limits the number of active processes to the maximum number of threads in a single process.
  • Program loading is slower because each DLL loaded requires action by BIGBRO to continue.
  • The simple-minded logging creates a file which grows without limit.
  • Trap 0D errors (general protection violation) are handled by BIGBRO, rather than by the default HARDERR handling of a full-screen popup. Owing to the way DosPTrace works, processes which fail with a trap 0D do not return the same error code to the parent process.
  • It is very tricky to use a debugger while BIGBRO is active - OS/2 only allows ONE process to be in control. Both the debugger and BIGBRO are potential candidates, but only one will succeed!

Notes on the program itself

I am using Microsoft C6.00 and OS/2 1.20 and 1.30. The compilation is:

cl /AL /Gs /MT /W4 /Zp bigbro.c

For other compilers you need to find out how to create multi-threaded programs. For example, for C5.1 you need to use the "mt\xxx" include files and the multithreaded C runtime library.

(1) Each process is handled by a dedicated thread, which issues the PT_STOP command until it gets a successful completion. The PT_LOADED completion indicates a module has been loaded, and the first module is the program itself.

(2) PT_FAULT return indicates a protection fault (trap 0D). The registers are written to the logfile and the program is terminated.

(3) PT_CHILDPID return indicates that this process has started a child process, so a fresh thread is created to deal with this. NOTE this simple program doesn't do any checking on the return from _beginthread!

(4) PT_DYING indicates the process is terminating, and so the return code is saved for displaying at the end.

(5) PT_SIGNAL and PT_ENDTHREAD are ignored - they are merely informative.

(6) 'Unexpected' return codes cause the program to be aborted as a failsafe - if you have programs which cause some of the other errors, such as floating point errors, you may wish to treat these errors differently.
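The six notes above amount to a small decision table, which can be written as a pure function and tried out without OS/2. In this sketch the PT_ values are copied from the listing, while the action names and the classify function are hypothetical and exist only for illustration. The listing itself does all of this inside one switch statement.

```c
#include <assert.h>

/* Command and return values copied from the article's listing */
#define PT_GO        0x0007
#define PT_TERM      0x0008
#define PT_STOP      0x000A
#define PT_SUCCESS   0x0000
#define PT_ERROR     0xFFFF
#define PT_SIGNAL    0xFFFE
#define PT_DYING     0xFFFA
#define PT_FAULT     0xFFF9
#define PT_LOADED    0xFFF8
#define PT_ENDTHREAD 0xFFF6
#define PT_CHILDPID  0xFFF4

/* Hypothetical action names, keyed to the numbered notes */
enum action {
    ACT_CONTINUE,       /* (5) nothing to do, reissue the command    */
    ACT_SWITCH_TO_GO,   /* (1) loading finished, set the program off */
    ACT_NOTE_MODULE,    /* (1) a module has been loaded              */
    ACT_LOG_FAULT,      /* (2) log the registers, then terminate     */
    ACT_WATCH_CHILD,    /* (3) start a thread for the new PID        */
    ACT_SAVE_EXIT_CODE, /* (4) remember the return code              */
    ACT_FINISHED,       /* PT_ERROR: the listing leaves its loop     */
    ACT_ABORT           /* (6) unexpected code, terminate            */
};

/* Decide what to do with 'result', given the command that was sent */
enum action classify(unsigned short sent, unsigned short result)
{
    /* the listing's main loop only ever sends these three commands */
    assert(sent == PT_STOP || sent == PT_GO || sent == PT_TERM);

    switch (result) {
    case PT_SUCCESS:
        return sent == PT_STOP ? ACT_SWITCH_TO_GO : ACT_CONTINUE;
    case PT_LOADED:    return ACT_NOTE_MODULE;
    case PT_FAULT:     return ACT_LOG_FAULT;
    case PT_CHILDPID:  return ACT_WATCH_CHILD;
    case PT_DYING:     return ACT_SAVE_EXIT_CODE;
    case PT_SIGNAL:
    case PT_ENDTHREAD: return ACT_CONTINUE;
    case PT_ERROR:     return ACT_FINISHED;
    default:           return ACT_ABORT;
    }
}
```

Separating the decision from the action in this way would make it easier to treat, say, floating point errors differently, as suggested in note (6): only the table needs another entry.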

Testing and installation

The program can be tested by invoking it as:

bigbro c:\os2\cmd.exe

This starts a command prompt at which you can execute a few programs; type EXIT to quit. Then type out c:\shell.log and check that the file contains data about processes starting and stopping.
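Each record in the log starts with the time and the PID in hexadecimal, for example "09:05:07 - 001a: Started C:\OS2\CMD.EXE". If you would rather check the file with a program than by eye, something along the following lines could be used. It is a sketch only: the record layout is inferred from the fprintf format strings in the listing, and the names parse_log_line, log_record and log_event are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum log_event { EV_UNKNOWN, EV_STARTED, EV_TERMINATED, EV_TRAP };

struct log_record {
    int hour, minutes, seconds;
    unsigned int pid;
    enum log_event event;
};

/* Parse one line of the log.
 * Returns 1 and fills *rec if the line starts with "hh:mm:ss - pid: ",
 * or 0 for anything else, such as the register dump after a trap. */
int parse_log_line(const char *line, struct log_record *rec)
{
    int used = -1;            /* set by %n once the prefix has matched */
    const char *rest;

    assert(line != NULL && rec != NULL);

    if (sscanf(line, "%d:%d:%d - %x: %n",
               &rec->hour, &rec->minutes, &rec->seconds,
               &rec->pid, &used) != 4 || used < 0)
        return 0;

    rest = line + used;       /* text following the "pid: " prefix */
    if (strncmp(rest, "Started ", 8) == 0)
        rec->event = EV_STARTED;
    else if (strncmp(rest, "Terminated ", 11) == 0)
        rec->event = EV_TERMINATED;
    else if (strstr(rest, "  TRAP at ") != NULL)
        rec->event = EV_TRAP;
    else
        rec->event = EV_UNKNOWN;
    return 1;
}
```

Matching up the Started and Terminated records for each PID then shows which processes were still running when the log was read.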

The program is then installed by editing CONFIG.SYS. The line:

'PROTSHELL=C:\OS2\PMSHELL.EXE ....'

is changed by inserting BIGBRO after the '=' as follows:

'PROTSHELL=C:\DEBUG\BIGBRO.EXE C:\OS2\PMSHELL.EXE ....'

where in this example I placed BIGBRO.EXE in the C:\DEBUG directory.
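If you prefer to make this change with a program rather than an editor, the transformation of the line could be sketched as below. The function name wrap_protshell is hypothetical, and the sketch assumes the keyword appears in upper case at the very start of the line, exactly as shown above. Whatever follows the '=' is carried over unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Insert 'wrapper' and a space after the '=' of a PROTSHELL line.
 * Returns 1 on success, or 0 if 'line' is not a PROTSHELL line, is
 * already wrapped, or the result does not fit in 'out'.
 * wrap_protshell is a hypothetical helper for illustration only. */
int wrap_protshell(const char *line, const char *wrapper,
                   char *out, size_t outsize)
{
    static const char key[] = "PROTSHELL=";
    const size_t klen = sizeof key - 1;
    const char *rest;
    int n;

    assert(line != NULL && wrapper != NULL && out != NULL);

    if (strncmp(line, key, klen) != 0)
        return 0;                       /* some other CONFIG.SYS line */

    rest = line + klen;                 /* original shell and arguments */
    if (strncmp(rest, wrapper, strlen(wrapper)) == 0)
        return 0;                       /* wrapper is already installed */

    n = snprintf(out, outsize, "%s%s %s", key, wrapper, rest);
    return n >= 0 && (size_t)n < outsize;
}
```

Refusing to wrap a line a second time avoids ending up with BIGBRO trying to debug another copy of itself if the change is applied twice.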

As always when changing CONFIG.SYS for any reason, KEEP A BACKUP in case of finger trouble.

Now reboot your machine. Start an OS/2 command prompt, and type out c:\shell.log for a list of which processes have started and terminated.

Program source BIGBRO.C

/*****************************************************************************/
/* Include files                                                             */
/*****************************************************************************/

#define         INCL_DOSPROCESS
#define         INCL_DOSINFOSEG
#include        <os2.h> 

#include        <stdio.h>
#include        <stdlib.h>
#include        <string.h>
#include        <process.h>

/*****************************************************************************/
/* DosPTrace definitions (these are in some but not all versions of BSEDOS.H)*/
/*****************************************************************************/

USHORT APIENTRY DosPTrace(PBYTE pPtraceBuf); 

/* structure of DosPtrace buffer */
typedef struct _PTRACEBUF {
       PID    pid;               /* Process ID.                             */
       TID    tid;               /* Thread ID, or zero.                     */
       USHORT cmd;               /* command/return code.                    */
       USHORT value;             /* supplementary info.                     */
       USHORT offv;              /* offset value.                           */
       USHORT segv;              /* segment value.                          */
       USHORT mte;               /* library module handle.                  */
       USHORT rAX, rBX, rCX, rDX, rSI, rDI, rBP; /* register contents.      */
       USHORT rDS, rES, rIP, rCS, rF, rSP, rSS;
} PTRACEBUF;
typedef PTRACEBUF FAR *PPTRACEBUF;

/* selected command values to DosPTrace() */
#define PT_GO                0x0007    /* Go                                 */
#define PT_TERM              0x0008    /* Terminate child process.           */
#define PT_STOP              0x000A    /* Stop child process.                */
#define PT_GET_MOD           0x0010    /* Get library-module name.           */

/* selected return codes from DosPTrace() */
#define PT_SUCCESS           0x0000    /* Success return code.               */
#define PT_ERROR             0xFFFF    /* Error.                             */
#define PT_SIGNAL            0xFFFE    /* About to receive signal.           */
#define PT_DYING             0xFFFA    /* Process dying.                     */
#define PT_FAULT             0xFFF9    /* General protection fault occurred. */
#define PT_LOADED            0xFFF8    /* Library module has been loaded.    */
#define PT_ENDTHREAD         0xFFF6    /* thread terminated                  */
#define PT_CHILDPID          0xFFF4    /* child process starting             */

/*****************************************************************************/
/* local variables and definitions                                           */
/*****************************************************************************/

#define STACKSIZE       2048      /* size of stack (OS/2 recommended value)  */

static FILE *logfile = NULL;      /* file to write logging information to    */ 

static PGINFOSEG ginfo = NULL;    /* used for obtaining time-of-day          */


/*****************************************************************************/
/* fault: process fault message                                              */
/*****************************************************************************/ 

void fault(PTRACEBUF *bufp, char *program)
  {
  /* Log information about the trap */
  fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: %s  TRAP at %4.4x:%4.4x\n",
       ginfo->hour, ginfo->minutes, ginfo->seconds,
       bufp->pid, program, bufp->segv, bufp->offv);

  fprintf(logfile, "  Registers: AX: %4.4x  BX: %4.4x  CX: %4.4x  DX: %4.4x\n",
               bufp->rAX, bufp->rBX, bufp->rCX, bufp->rDX);

  fprintf(logfile, "             SI: %4.4x  DI: %4.4x  BP: %4.4x  SP: %4.4x\n",
               bufp->rSI, bufp->rDI, bufp->rBP, bufp->rSP);

  fprintf(logfile, "             DS: %4.4x  ES: %4.4x  SS: %4.4x  Fl: %4.4x\n",
               bufp->rDS, bufp->rES, bufp->rSS, bufp->rF);

  /* tell the user something died! */
  DosBeep(300,  250);      DosBeep(200,  250);
  DosBeep(300,  200);      DosBeep(200,  200);
  DosBeep(300,  150);      DosBeep(200,  150);
  DosBeep(300,  100);      DosBeep(200,  100);
  DosBeep(2000, 1000);
  }


/*****************************************************************************/
/* processthread: thread to handle each process                              */
/*****************************************************************************/

void far processthread(PVOID pvoid)
  {
  PTRACEBUF buf;                 /* buffer for DosPTrace                    */
  USHORT rc;                     /* return code from DosPTrace              */
  USHORT cmd = PT_STOP;          /* current DosPTrace command               */
  char modbuf[256];              /* module name buffer                      */
  PSZ progname = (PSZ)modbuf;    /* pointer to program name                 */
  USHORT done = FALSE;           /* TRUE when process finished              */
  USHORT normal_exit = FALSE;    /* TRUE when process exits normally        */
  USHORT return_code = 0;        /* process return code on normal exit      */


  /* clear module name and set up the DosPTrace buffer */
  *progname = '\0';
  buf.pid = (PID) ((ULONG)pvoid);
  buf.tid = 0;

  for (; !done; )
     {
     buf.cmd = cmd;
     if ( (rc = DosPTrace((PBYTE)&buf)) != 0)
        {
        fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: DosPTrace error %u\n",
                ginfo->hour, ginfo->minutes, ginfo->seconds,
                buf.pid, rc);
        break;
        }

     switch (buf.cmd)
        {
        case PT_SUCCESS:
           if (cmd == PT_STOP)
              {
              /* PT_STOP returns SUCCESS once all DLLs loaded */
              cmd = PT_GO;
              }
           break;

        case PT_ERROR:
           done = TRUE;
           break;

        case PT_LOADED:
           if (*progname == '\0')
              {
              /* first module is the program - get its name */
              buf.cmd = PT_GET_MOD;
              buf.value = buf.mte;
              buf.offv = OFFSETOF(progname);
              buf.segv = SELECTOROF(progname);
              DosPTrace((PBYTE)&buf);

              /* print out start message */
              fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: Started %s\n",
                  ginfo->hour, ginfo->minutes, ginfo->seconds,
                  buf.pid, progname);
              }
           break;

        case PT_CHILDPID:
           /* start another thread for the new process */
           _beginthread(processthread, NULL, STACKSIZE, (PVOID)buf.value );
           break;

        case PT_ENDTHREAD:
        case PT_SIGNAL:
           break;

        case PT_DYING:
           normal_exit = TRUE;
           return_code = buf.value;
           break;

        case PT_FAULT:
           fault(&buf, progname);
           cmd = PT_TERM;
           break;

        default:
           fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: Aborted - error %i\n",
                     ginfo->hour, ginfo->minutes, ginfo->seconds,
                     buf.pid, buf.cmd);
           cmd = PT_TERM;
           break;
        }
     }

  fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: Terminated %s",
     ginfo->hour, ginfo->minutes, ginfo->seconds,
     buf.pid, progname);

  if (normal_exit)
     fprintf(logfile, " returning %u", return_code);

  fprintf(logfile, "\n");
  }


/*****************************************************************************/
/* init: initialise the program                                              */
/*****************************************************************************/

short init(char **argv, PID *pidptr)
  {
  char program[256];             /* program name + arguments                */
  SEL gsel, lsel;
  RESULTCODES res;
  USHORT rc;
  USHORT len;                    /* length of program name                  */

  /* extract GINFOSEG pointer for time of day */
  DosGetInfoSeg(&gsel, &lsel);
  ginfo = MAKEPGINFOSEG(gsel);

  /* set up program name */
  strcpy(program, *argv++);
  len = strlen(program);

  /* append command line arguments, if any */
  for (; *argv != NULL; argv++)
     {
     strcat(program, " ");
     strcat(program, *argv);
     }

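  /* DosExecPgm expects: program name, NUL, arguments, NUL, NUL */
  program[strlen(program) + 1] = '\0';   /* second NUL ends the block       */
  program[len] = '\0';                   /* NUL between name and arguments  */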

  rc = DosExecPgm(NULL, 0,
             EXEC_TRACE * 2,  /* magic number for CHILD process debug!   */
             program, NULL, &res, program);


  if (rc != 0)
     {
     fprintf(logfile, "Error %u executing %s\n", rc, program);
     }
  else
     {
     /* save PID */
     *pidptr = res.codeTerminate;
     }

  return (rc);
  }


/*****************************************************************************/
/* M A I N  P R O G R A M                                                    */
/*****************************************************************************/

int main(int argc, char **argv)
  {
  int rc;                        /* return code */
  PID mainpid;                   /* PID of PMSHELL */


  printf("Big Brother started...\n");

  argv++, argc--;                /* skip our own name */

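  /* open log file and make sure PMSHELL does not get it too */
  logfile = fopen( "C:\\SHELL.LOG", "w" );
  if (logfile == NULL)
     {
     /* every later fprintf() would fail, so give up now */
     printf("Cannot open C:\\SHELL.LOG\n");
     return 1;
     }
  setbuf(logfile, NULL);
  DosSetFHandState(fileno(logfile), OPEN_FLAGS_NOINHERIT);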

  if ((rc = init(argv, &mainpid)) == 0)
     processthread( (PVOID)mainpid );

  return rc;
  }

Roger Orr - 10-Aug-1991