Exception Management - or coping with bugs

From EDM2
Roger Orr, 27-Aug-1993

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Revision as of 14:50, 15 January 2012

Roger Orr


Introduction

One of the hardest parts of writing robust programs is dealing with the unexpected. There is nothing worse than a mission critical program suddenly halting because of a division by zero error in some obscure statistics gathering routine, and leaving your users fuming!

The first line of defence is of course to add parameter checking - ensure numbers are 'realistic' and pointers are 'plausible' before attempting to use them. This will deal with most problems, but unfortunately it is extremely expensive (in programmer time, in testing, and in execution time) to exhaustively check every possible cause of error before it occurs. In addition bugs may well occur even in the checking code itself!

However even if this technique is used there is an additional class of unexpected events - the user (or another program) killing the program via Ctrl+C or DosKillProcess. In a multi-programming environment it is nice to have some control over program exit to ensure shared resources are left in a consistent state.

What further protection against these events is possible? One answer is to make use of the OS/2 exception management subsystem. This provides a structured way to add special handling for various possible unexpected errors, such as divide by zero, Ctrl+C or attempting to access an invalid address.

Those familiar with OS/2 version 1 will recall that the first of the above could be handled by code using DosSetVec, the second by making use of DosSetSigHandler - and the third couldn't be handled. [Note: programming DosSetSigHandler was discussed in an earlier article - Pointers issue 12 (Jan/Feb 91)]

So what's new in version 2?

One of the goals of OS/2 version 2 was to provide an environment which was less specifically targeted to the Intel x86 processor. The most obvious part of the base OS/2 which has been affected is the memory model - no more 64Kb segments, just 4Gb of linearly addressable memory! However one of the other areas to be reworked in version 2 is the handling of exceptions and signals, to provide a more generalised mechanism.

Under OS/2 version 2 each thread in a process can have one or more procedures registered as "exception handlers", which means that they are called when an exception occurs and can take corrective action.

The procedures are called in turn starting with the most recently registered. Each procedure has three possible actions: it can return 0 to continue the search down the chain; it can return 0xffffffff to resume execution of the program (after having removed the cause of the fault), or it can jump somewhere else - for example by using the 'C' language longjmp procedure. If every handler in the chain returns 0 then OS/2 takes some default action - typically to abort the process.
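
The search order described above can be modelled in portable C. This is only a sketch of the dispatch rule, not the OS/2 implementation: the names chain_link, chain_push, chain_dispatch and the two sample handlers are made up for illustration, and the handlers take just an exception number rather than the four real parameters.

```c
#include <stddef.h>

/* Return values as described in the text; the macro names are illustrative. */
#define CONTINUE_SEARCH    0x00000000UL
#define CONTINUE_EXECUTION 0xffffffffUL

typedef unsigned long (*handler_fn)(unsigned long exception_num);

/* One link in a per-thread chain; the newest registration is the head. */
struct chain_link {
    struct chain_link *prev;   /* handler registered before this one */
    handler_fn handler;
};

/* Register a handler by pushing it on the front of the chain. */
static void chain_push(struct chain_link **head, struct chain_link *link)
{
    link->prev = *head;
    *head = link;
}

/* Walk from the most recent handler to the oldest.  Returns 1 if some
   handler asked to resume execution, 0 if every handler declined (the
   point at which the system would take its default action). */
static int chain_dispatch(const struct chain_link *head, unsigned long num)
{
    const struct chain_link *link;

    for (link = head; link != NULL; link = link->prev) {
        if (link->handler(num) == CONTINUE_EXECUTION)
            return 1;
    }
    return 0;
}

/* Two sample handlers, each dealing with one (arbitrary) exception number. */
static unsigned long handle_seven(unsigned long num)
{
    return num == 7UL ? CONTINUE_EXECUTION : CONTINUE_SEARCH;
}

static unsigned long handle_nine(unsigned long num)
{
    return num == 9UL ? CONTINUE_EXECUTION : CONTINUE_SEARCH;
}
```

With handle_seven pushed first and handle_nine second, a dispatch of exception 7 visits handle_nine (which declines) before reaching handle_seven, and an exception that neither recognises falls off the end of the chain.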

So-called 'signals' such as Ctrl+C or another program issuing DosKillProcess generate a specific class of exception which is handled by the exception handler chain for thread 1 of the program.

The functions provided by OS/2

Exception handlers are registered and de-registered by DosSetExceptionHandler and DosUnsetExceptionHandler. These take the address of an exception registration record - containing a reserved word (used to maintain the chain of exception handlers) and the address of the exception handling function.

DosUnwindException can be used to call and remove a number of exception handlers from the chain. This function takes an address of the location where execution will resume once all procedures have unwound - it would not usually be called from a high level language directly, because of the problems of adjusting the programming environment, but via some language supported construct such as the 'C' longjmp function.

Specific exceptions can be generated by DosRaiseException - this includes user-defined exceptions as well as system ones.

DosError with parameter FERR_DISABLEEXCEPTION can be used to suppress the default exception popup message if a fatal exception causes program exit.

For signal exceptions the process uses DosSetSignalExceptionFocus to tell OS/2 that it is prepared to receive Ctrl+C or Ctrl+Break. Once it has processed a signal exception it calls DosAcknowledgeSignalException to tell OS/2 it is ready for another one - this prevents a process being overrun with signal exceptions if the user 'leans' on the Ctrl+C key! Another process can call DosSendSignalException to send a signal explicitly to another process, or DosKillProcess to do so a little more indirectly.

Finally critical pieces of code can be protected from signal exceptions by use of DosEnterMustComplete and DosExitMustComplete - any signals occurring between these two calls are delayed until the 'must complete' region is exited.


Overview of a general exception handler

An exception handler ought to have the following properties:

1) It should not impact other exception handlers
2) It must be reliable

Point 1 is necessary to ensure that, for example, the exception caused by reaching the end of the currently allocated stack is allowed to pass on along the chain to the exception handler which will allocate additional stack space and resume execution. It is quite easy to enforce this behaviour by simply ensuring that 0 (or XCPT_CONTINUE_SEARCH) is returned for all exceptions other than the one or ones being explicitly processed.

Point 2 is necessary to ensure your program can be terminated successfully! The exception chain is called on program termination and the program WILL NOT TERMINATE until each exception handler has returned. One other point is that nested exceptions are quite easy to generate (i.e. your exception handler causes an exception!) and will crash or hang your program unless guarded against. Fortunately there is a flag passed to the exception handler when a nested exception is being processed, and usually if this flag is set you will pass the exception on up the chain of handlers until it gets out of the nested region.
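
The nested-exception check can be sketched as follows. The types and constants here are stand-ins so that the fragment compiles anywhere: sketch_report, SKETCH_NESTED_CALL, SKETCH_MY_EXCEPTION and careful_handler are hypothetical names, and the flag value is an assumption. In a real handler the flag would be tested in the report record passed as the first parameter (the toolkit headers appear to call it EH_NESTED_CALL in the fHandlerFlags field, but check your own headers).

```c
/* Stand-ins for the OS/2 definitions; the real ones come from <os2.h>. */
#define SKETCH_CONTINUE_SEARCH    0x00000000UL
#define SKETCH_CONTINUE_EXECUTION 0xffffffffUL
#define SKETCH_NESTED_CALL        0x10UL   /* assumed "nested" flag bit */
#define SKETCH_MY_EXCEPTION       1UL      /* arbitrary exception number */

struct sketch_report {
    unsigned long exception_num;   /* which exception occurred */
    unsigned long handler_flags;   /* flag bits, including "nested" */
};

static unsigned long careful_handler(const struct sketch_report *report)
{
    /* The fault happened while a handler was already running: do no
       work at all, just pass it along the chain. */
    if (report->handler_flags & SKETCH_NESTED_CALL)
        return SKETCH_CONTINUE_SEARCH;

    /* The one exception this handler is responsible for.  Real recovery
       work would go here before asking for execution to resume. */
    if (report->exception_num == SKETCH_MY_EXCEPTION)
        return SKETCH_CONTINUE_EXECUTION;

    /* Everything else belongs to some other handler. */
    return SKETCH_CONTINUE_SEARCH;
}
```

Testing the nested flag before anything else keeps the handler from touching whatever state caused the first fault, which covers both properties listed above: other handlers still see exceptions this one does not own, and a fault inside the handler cannot loop.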

An example framework for a 'do nothing' exception handler in C is:

ULONG APIENTRY handler(
  EXCEPTIONREPORTRECORD *pReport, /* details of this exception */
  EXCEPTIONREGISTRATIONRECORD *pRegRecord, /* registration record for handler */
  CONTEXTRECORD *pContext,       /* machine context at time of fault */
  void *ptr )                    /* dispatcher context (exception specific) */
  {
  /* Exception handling goes in here... */
  return XCPT_CONTINUE_SEARCH;
  }

It might be used like this:

  {
  EXCEPTIONREGISTRATIONRECORD reg_rec = { 0, handler };
  APIRET rc = 0;

  rc = DosSetExceptionHandler( &reg_rec );

  /* Code to be protected goes in here... */

  rc = DosUnsetExceptionHandler( &reg_rec );
  }


Note that OS/2 uses the ADDRESS of the exception record to perform sorting and chain searching. The exception record must be allocated from the stack and MUST be unregistered before the procedure exits. If this is not done the exception record will be overwritten and your program will probably crash or fail to exit since the exception record chain will be corrupted.

The single commonest cause of this error is registering an exception handler at the beginning of a procedure and either not deregistering it at the end, or deregistering it at the end but returning earlier from the middle of the procedure (for example on some error condition). The program may well continue to work fine - until it ends and OS/2 attempts to pass the program termination exception down the chain of handlers; disaster occurs because the data structure for the first handler has been overwritten and the program locks up.

This is particularly true of exception handlers which are registered at the beginning of the 'main' procedure - you must EITHER deregister them OR ensure the program is ended by calling the exit function rather than by returning from the main procedure.
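
One way to avoid the early-return trap is to give the procedure a single exit point. The sketch below models the chain with a plain linked list so that it compiles anywhere; model_set and model_unset are hypothetical stand-ins for DosSetExceptionHandler and DosUnsetExceptionHandler, and protected_work is an invented example procedure.

```c
#include <stddef.h>

/* Stand-in for the registration record: just the chain link. */
struct reg_record {
    struct reg_record *prev;
};

static struct reg_record *chain_head = NULL;   /* models one thread's chain */

/* Stand-in for registering a handler: push the record on the chain. */
static void model_set(struct reg_record *rec)
{
    rec->prev = chain_head;
    chain_head = rec;
}

/* Stand-in for deregistering: pop the record if it is the newest one. */
static void model_unset(struct reg_record *rec)
{
    if (chain_head == rec)
        chain_head = rec->prev;
}

/* Returns 0 on success and -1 on error.  Whichever path is taken, the
   record is removed from the chain before the stack frame disappears. */
static int protected_work(int input)
{
    struct reg_record rec = { NULL };   /* lives in this stack frame */
    int result = 0;

    model_set(&rec);

    if (input < 0) {
        result = -1;
        goto done;   /* NOT a bare 'return': rec is still registered */
    }

    /* ... code to be protected goes in here ... */

done:
    model_unset(&rec);
    return result;
}
```

After either call the chain no longer refers to the dead stack frame. A bare return inside the error branch would have left chain_head pointing at memory that the next procedure call overwrites.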

The first parameter of the exception handling function explains the actual exception being processed - the most important field being the ExceptionNum which describes the exception, for example XCPT_INTEGER_DIVIDE_BY_ZERO or XCPT_ACCESS_VIOLATION. Many exception handlers consist of a switch on this value with the 'default' statement returning 0. Additional fields give more information, which may be exception specific such as the address causing the access violation for the second example above; others more general such as a flag indicating whether an exception is nested.

The second parameter points to the registration record used to register this instance of the exception handler. Typically the exception handler will need some additional parameters to enable proper action to be taken. One option is of course to use some global or module variable, but this can get a little tricky with multiple threads or recursive procedures; a better method is to embed the exception registration record in a larger structure. Since the address of this record is passed to the exception handler, it can then be used to access the extra information. I use this method in the sample program below.
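
The embedding trick relies on a C guarantee: a pointer to a structure, suitably converted, points to its first member and vice versa. A minimal portable sketch follows, using the stand-in types reg_record and extended_record and an invented recording_handler rather than the real OS/2 declarations.

```c
#include <stddef.h>

/* Stand-in for the registration record type. */
struct reg_record {
    struct reg_record *prev;
    unsigned long (*handler)(struct reg_record *rec);
};

/* Extended record: the registration record MUST be the first member so
   that a pointer to it is also a pointer to the whole structure. */
struct extended_record {
    struct reg_record reg;
    int error_code;        /* extra data the handler fills in */
};

/* The handler is handed only the registration record, but can recover
   the enclosing structure by converting the pointer back. */
static unsigned long recording_handler(struct reg_record *rec)
{
    struct extended_record *ext = (struct extended_record *)(void *)rec;

    ext->error_code = 42;  /* illustrative value only */
    return 0UL;            /* continue the search */
}
```

Because each registration carries its own extra data, two threads or two recursive invocations of the same procedure each get a private copy, which is exactly what a global variable cannot offer.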

The third parameter points to the context record describing the machine state when the exception occurred - you can for example read (and modify) the registers. This is of most use to assembler programmers who might for example single step over a failing instruction by modifying the instruction pointer in this record - users of a high level language usually do not have enough control over the machine code generated to do much with this information.

Finally the fourth parameter is used to pass additional information for one or two specific exceptions. See the full OS/2 documentation for more details.


Description of the sample program

Well that's enough (too much?) of these technical details - here is a simple example of how you might use an exception handler in 'C' to check pointers for validity. This is the 'lazy validation' method - invalid pointers are hopefully a rare error so just access the area pointed to and 99% of the time it will work. The exception handler will catch the bad 1% and enable you (in a fully fledged program) to return an error code or take some other avoiding action.

Under earlier versions of OS/2 you couldn't do this - one bad pointer access and your program was unceremoniously aborted.

So how does it work? The code is listed below and consists of a simple user test harness in the 'main' procedure which asks for an address and then calls the verify procedure to check and display the byte at that address. If unsuccessful the routine prints 'Bad address!' instead.


Comments on the program

The first point to note is that the exception handler is localised, i.e. it is registered and deregistered within the verify procedure, as close as possible to where it is required. This means that I can safely assume that I can process ALL access violations without needing to check any further. It is in general a good idea to keep exception handlers close to the code they are protecting.

The second point is the use of an 'extended' registration record (MYREC) to enable me to pass the jmp_buf structure into the exception handler.

The third point is that setjmp/longjmp makes use of the OS/2 exception manager. In particular, longjmp calls DosUnwindException to remove all exception handlers added since the environment was saved by setjmp. In the code I register VerifyHandler AFTER setjmp has been called. If the pointer access is invalid, the exception handler is called, longjmp is executed and the code to deregister the exception handler is never reached. This is not a problem: if you try to deregister VerifyHandler in the 'failing' branch of the setjmp call you will find that the call returns an error, because VerifyHandler was already removed from the chain when longjmp called DosUnwindException.
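
The control flow itself, without any OS/2 calls, can be seen in the following portable sketch. Here demo_fault is a hypothetical stand-in for the exception handler being invoked, and guarded is an invented name. Note that plain ISO C longjmp unwinds no handler chain; that extra step belongs to the OS/2 library as described above and is not reproduced here.

```c
#include <setjmp.h>

static jmp_buf recover_point;

/* Stand-in for the exception handler: control never returns to the
   code that failed. */
static void demo_fault(void)
{
    longjmp(recover_point, 1);
}

/* Returns 0 if the protected code ran to completion (including its
   tidy-up step), or 1 if it was abandoned and the tidy-up was skipped. */
int guarded(int should_fail)
{
    /* volatile because it is changed between setjmp and longjmp */
    volatile int tidy_up_ran = 0;

    if (setjmp(recover_point) == 0) {
        if (should_fail)
            demo_fault();       /* jumps straight to the else branch */
        tidy_up_ran = 1;        /* only reached on the normal path */
        return 0;
    }

    /* 'Failing' branch: the statements after the fault never executed. */
    return tidy_up_ran ? 2 : 1;
}
```

Calling guarded with a non-zero argument takes the second branch with tidy_up_ran still 0, which mirrors the way the deregistration call in verify is skipped when the pointer is bad.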


A general point to bear in mind is that the exception handler ONLY AFFECTS THE THREAD IN WHICH IT WAS REGISTERED so it's no use to register a single exception handler once in your main program and then create lots of threads - each one will need to have the handler separately registered. The only case where one handler is enough is for signal exceptions since they are ALWAYS passed to thread 1.

The final point is to make sure you have a lot of stack if you define your own exception handlers - they seem to need it, especially when you get nested exceptions. Failure to have enough stack may cause your program to exit or hang since OS/2 will be unable to dispatch the program termination exception properly. In this example I use 0x4000 bytes which for such a small program may be a bit excessive, but under OS/2 v2 lazy stack allocation means that the memory for this stack should only be actually obtained when I need it!

It is another reason why exception handlers should be short and simple (if possible!) since this reduces the stack requirement as well as the likelihood of a nested exception.

Conclusion

With version 2, OS/2 now provides fairly robust exception management. The main problem is that it is still too easy to prevent a program exiting by a careless line or two of code in an exception handler. In addition, stack corruption, or failure to deregister an exception handler on exiting from the procedure it was defined in, can cause problems which are not easy to find later on.

I have noticed that a number of IBM's own OS/2 programs (including the command shell itself!) will on occasion fail to exit because of problems with exception handlers, and the program is then unkillable and remains hanging about until you reboot.

On the positive side however it is a nice luxury to be able to protect important programs from unexpected events, especially bad pointers, and without needing to resort to assembler to do so.

I hope more OS/2 programmers will be encouraged to have a go at putting in exception handlers where they are appropriate to make their programs more reliable or to simplify error handling.

Compilation command

[I am using IBM C/C++]

icc /wall+ /b/stack:0x4000 VerAddr.c
------------------------- VerAddr.c --------------------------

#define INCL_DOS

#include <os2.h>
#include <ctype.h>
#include <stdio.h>
#include <setjmp.h>

typedef struct _myrec             /* 'extended' registration record */
  {
  EXCEPTIONREGISTRATIONRECORD RegRecord; /* MUST BE FIRST! */
  ULONG ErrorCode;
  jmp_buf jmpbuf;
  } MYREC, *PMYREC;

ULONG APIENTRY VerifyHandler(
  EXCEPTIONREPORTRECORD *pReport,
  EXCEPTIONREGISTRATIONRECORD *pRegRecord,
  CONTEXTRECORD *pContext,
  void *ptr)

  {
  PMYREC pMyRec = (PMYREC)(PVOID)pRegRecord; /* Get extended structure */

  /* Reference unwanted parameters */
  ptr = ptr;
  pContext = pContext;

  if ( pReport->ExceptionNum  == XCPT_ACCESS_VIOLATION )
     {
     pMyRec->ErrorCode = pReport->ExceptionInfo[ 0 ];
     longjmp( pMyRec->jmpbuf, 1 );
     }

  return XCPT_CONTINUE_SEARCH;
  }

void verify( MPARAM pAddr )
  {
  MYREC except = { {0, VerifyHandler }, 0 };
  APIRET rc = 0;
  PSZ p = (PSZ) pAddr;
  unsigned char ch = '\0';


  if ( setjmp( except.jmpbuf ) == 0 )
     {
     /* Register AFTER setjmp so that longjmp unwinds this handler */
     rc = DosSetExceptionHandler(&except.RegRecord);
     if ( rc != 0 )
        printf( "DosSetExceptionHandler: error - %lu\n", (unsigned long) rc );

     ch = *p;

     rc = DosUnsetExceptionHandler( &except.RegRecord );
     if ( rc != 0 )
        printf( "DosUnsetExceptionHandler: error - %lu\n", (unsigned long) rc );

     if ( isprint( ch ) )
        printf("Value: '%c'\n", ch );
     else
        printf("Value: '%2.2x'\n", (unsigned char)ch );
     }
  else
     printf( "Bad address - violation error %lu\n", (unsigned long) except.ErrorCode );


  return;
  }

int main( int argc, char **argv )
  {
  LONG lAddr = 0;
  char buffer[ BUFSIZ ] = { '\0' };

  /* Reference unwanted parameters */
  argc = argc;
  argv = argv;

  while( !feof( stdin ) )
     {
     printf( "Enter address, or press Ctrl+Z: " ); fflush( stdout );
     if ( fgets( buffer, sizeof buffer, stdin ) != NULL )
        {
        if ( sscanf( buffer, "%li", &lAddr ) != 1 )
           printf( "Bad argument - integer expected\n" );
        else
           verify( MPFROMLONG( lAddr ) );
        }
     }
  return 0;
  }

Roger Orr 27-Aug-1993