SMP Considerations for OS/2 Device Drivers

From EDM2

Written by Scott E. Garfinkle

[Here is a driver.asm file to accompany the article. Ed.]

Introduction

The purpose of this article is to assist developers of OS/2 device drivers to ensure that their device drivers perform as expected in a Symmetric Multi-Processing (SMP) environment. Some of this material is also covered in the smp.inf online documentation. This document is available on the Warp Server with SMP install CDROM. I will also make sure that EDM/2 has a copy.

Although most device drivers will work fine without modification, there are four main areas of concern to bear in mind. Specifically, these are:

  1. Local Infoseg access
  2. Port I/O operations and IRQ masking
  3. Serialization concerns
  4. High Memory access

1.- Local Infoseg access: if you use the Local Infoseg (LIS) to get information about the currently running thread, you can get into trouble if you are also using a 32-bit device driver. Specifically, the selector returned by DevHelp_GetDOSVars is remapped for each CPU to point to the correct information. However, the linear address underneath is not (at least for now) remapped to the correct physical page. That means that if you take this selector and then call DevHelp_VirtToLin to get the linear address, the pointer you get back is valid only on the CPU you happened to be running on when you called VirtToLin. The moral is that, even if you are using a 32-bit DD, you HAVE to use a 16:16 access method to get to the LIS.

2.- Port I/O and IRQ masking: there are two different concerns here. First, many device drivers, particularly those modeled after some old DDK samples, may issue an End-of-Interrupt (EOI) to the 8259 chip by directly doing an "out" to port 20h. DON'T DO IT! This will fail catastrophically on systems running in "advanced interrupt" mode (see the Intel "Multiprocessor Specification", Intel order number 242016-004). Use DevHlp_EOI, instead. That is:

ifndef SMP   ; old way
    mov     al,20h
    out     20h,al
else      ; new way
    mov     al,irq_number_in_service
    mov     dl,DevHlp_EOI         ; 31h
    call    DWORD PTR [device_help]
endif

    ; as a (somewhat useless) example, here is the old EOI method rewritten
    ; to use the new port_io devhelp. Taken from smp.inf.
    port_io_s         STRUC
    port_io_port        DD   ?
    port_io_data        DD   ?
    port_io_flags       DD   ?
    port_io_s         ENDS

    IO_READ_BYTE      EQU  0000H
    IO_READ_WORD      EQU  0001H
    IO_READ_DWORD     EQU  0002H
    IO_WRITE_BYTE     EQU  0003H
    IO_WRITE_WORD     EQU  0004H
    IO_WRITE_DWORD    EQU  0005H
    IO_FLAGMASK       EQU  0007H

    PORT_IO      port_io_s <20h,20H,IO_WRITE_BYTE>

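    ; ES:SI must point at the port_io_s structure itself. This assumes
    ; PORT_IO lives in the driver's data segment, addressed through DS.
    PUSH    DS
    POP     ES
    MOV     SI,OFFSET PORT_IO
    MOV     DL,dh_Port_IO
    CALL    DevHlp
    JC      Error
    ;       EXIT:   port_io_data filled in if I/O read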

A related function, VDHPortIO(), is provided for virtual device drivers. Similarly, getting and setting IRQ masks may need to be mediated by the platform-specific driver (PSD). I will omit the code sample here -- see the INF file.

3.- Serialization concerns: with only one CPU, it suffices for a device driver to use CLI/STI (or pushf/CLI/popf) to ensure that interrupt processing will not disrupt a critical code region. With more than one CPU, this can fail, because CLI masks interrupts only on the CPU that executes it. Suppose you are already processing an interrupt, but the system running your device driver is in "advanced interrupt mode". What do you do? Basically, you need to use a mutual exclusion ("mutex") semaphore, just as you would if you were writing ring 3 apps. Warp SMP provides a suitable package called "SpinLocks", and I will also provide another one below. SpinLocks are accessible from your device driver via the following calls to DevHelp:

   name                     DL value     ax:bx value
   DevHlp_CreateSpinLock    79h         &hlock
   DevHlp_FreeSpinLock      7Ah          hlock
   DevHlp_AcquireSpinLock   71h          hlock
   DevHlp_ReleaseSpinLock   72h          hlock

Interestingly, and currently undocumented, you can use the same interface at ring 3 via the following DOSCALL1 exports:

   DOSCREATESPINLOCK           @449
   DOSACQUIRESPINLOCK          @450
   DOSRELEASESPINLOCK          @451
   DOSFREESPINLOCK             @452
   DOS32CREATESPINLOCK         @557
   DOS32ACQUIRESPINLOCK        @558
   DOS32RELEASESPINLOCK        @559
   DOS32FREESPINLOCK           @560

The parameters are pretty much the same as above (though passed on the stack, of course). The 32 bit calls are simply thunked to their 16 bit counterparts for you. The nice thing here is that if you have some data that is manipulated by a ring 3 daemon for one reason or another, you can use a single lock handle to serialize access. DO NOT BLOCK WHILE HOLDING A SPINLOCK! This also implies that you should not call any APIs of any sort while holding a spinlock. You have been warned...
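
As an illustration of the acquire/release discipline described above, here is a minimal sketch in portable C11. It is not the Warp SpinLock package and it does not call DevHelp or the DOSCALL1 exports; the names spin_lock, spin_try_acquire, spin_acquire, spin_release and counter_increment are hypothetical stand-ins. The point is the shape of the critical section: a handful of instructions, with no blocking and no API calls between acquire and release.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock handle; NOT the OS/2 implementation. */
typedef struct {
    atomic_flag held;
} spin_lock;

#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the caller now owns the lock. */
static bool spin_try_acquire(spin_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is free. The caller never sleeps here. */
static void spin_acquire(spin_lock *l)
{
    while (!spin_try_acquire(l)) {
        /* spin */
    }
}

static void spin_release(spin_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Example of data shared between two contexts, guarded by one lock. */
static spin_lock counter_lock = SPIN_LOCK_INIT;
static unsigned long shared_counter;

static unsigned long counter_increment(void)
{
    unsigned long value;

    spin_acquire(&counter_lock);
    value = ++shared_counter;      /* short, non-blocking, no API calls */
    spin_release(&counter_lock);
    return value;
}
```

Because a waiter burns CPU time for as long as the lock is held, anything that could block or take a long time belongs outside the acquire/release pair.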

If you want, you can always "roll your own" locks. The supplied driver.asm is one possible way of doing this (written with the Visual Age optlink parameter-passing convention in mind, where parm1 is EAX and parm2 is EDX). This may be a better choice where the resource in contention might be tied up for long periods of time. In this case, the requesting thread will get put to sleep until the resource is available. Also note that I store the pid/tid of the owner in the ULONG. This helps to detect deadlocks and also will make your driver more easily debuggable.
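
The idea of keeping the owner's pid/tid in the lock word can be sketched in portable C11 as follows. This is not the code in driver.asm: the layout (pid in the high word, tid in the low word, 0 meaning "free") and all of the names are assumptions made for illustration, and the step that puts a waiting thread to sleep is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Lock word: 0 means free, otherwise it holds the owner's tag (assumed layout). */
typedef _Atomic uint32_t owner_lock;

typedef enum { LOCK_ACQUIRED, LOCK_BUSY, LOCK_DEADLOCK } lock_result;

/* Pack pid and tid into one 32-bit tag. Assumes the tag is never 0. */
static uint32_t owner_tag(uint16_t pid, uint16_t tid)
{
    return ((uint32_t)pid << 16) | tid;
}

/* One atomic attempt to take the lock on behalf of `self`. */
static lock_result lock_try_acquire(owner_lock *lock, uint32_t self)
{
    uint32_t expected = 0u;

    assert(self != 0u);            /* 0 is reserved for "free" */
    if (atomic_compare_exchange_strong(lock, &expected, self))
        return LOCK_ACQUIRED;

    /* On failure `expected` holds the current owner's tag. */
    return (expected == self) ? LOCK_DEADLOCK : LOCK_BUSY;
}

/* Release succeeds only if `self` really is the owner. */
static bool lock_release(owner_lock *lock, uint32_t self)
{
    uint32_t expected = self;

    return atomic_compare_exchange_strong(lock, &expected, 0u);
}
```

On LOCK_BUSY a real driver would block the requesting thread and retry when woken. LOCK_DEADLOCK flags a thread that asks for a lock it already holds, and the owner tag left in the lock word shows who holds it when you inspect the lock in a debugger.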

There is *probably* no danger of being interrupted by an external interrupt on CPU 0 if you do a CLI on CPU 1. In theory, what will happen is this: before calling your DD, os2krnl will have acquired the R0SubsysSpinlock. Before the interrupt manager calls your interrupt handler, it will try to save and grab that spinlock. If you've done a CLI, your CPU will not respond to the interprocessor communication request, and so everything should work (though not on 2.11 SMP). This does not affect synchronization of shared data, however, between ring 2/3 and ring 0 or other issues if you *don't* do the CLI.

4.- High Memory Area (HMA) considerations: in Warp Server SMP (and the forthcoming Warp Server for e-business), apps can now allocate private or shared memory above the 512 MB virtual address line. Because of its address, this memory is not "thunkable" using the usual shift and add algorithm. Be aware that ring 3 apps might pass pointers like this (i.e. addr > 0x1fffffff). In this case, you can either access the pointer directly if you have a 32-bit DD or create a GDT alias.
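
To see why such pointers cannot be thunked, here is a sketch of the shift and add conversion in portable C. It assumes the usual OS/2 tiled LDT layout, in which each selector covers 64 KB of the flat address space; the type and function names are hypothetical and are not taken from any OS/2 header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 16:16 far pointer split into its two halves. */
typedef struct {
    uint16_t selector;
    uint16_t offset;
} far_ptr_1616;

/* Highest flat address a tiled selector can reach: 512 MB - 1. */
#define TILED_LIMIT 0x1FFFFFFFu

/* True if a flat address can be converted by shift and add. */
static bool is_thunkable(uint32_t flat)
{
    return flat <= TILED_LIMIT;
}

/* Flat -> 16:16. Check is_thunkable() first. */
static far_ptr_1616 flat_to_1616(uint32_t flat)
{
    far_ptr_1616 p;

    assert(is_thunkable(flat));
    /* Bits 16..28 become the LDT index; OR in the LDT bit and RPL 3. */
    p.selector = (uint16_t)(((flat >> 16) << 3) | 7u);
    p.offset   = (uint16_t)(flat & 0xFFFFu);
    return p;
}

/* 16:16 -> flat: drop the three low selector bits, shift, add the offset. */
static uint32_t far_1616_to_flat(far_ptr_1616 p)
{
    return ((uint32_t)(p.selector >> 3) << 16) + p.offset;
}
```

A selector has only 13 index bits, so the largest index is 0x1FFF and the last reachable byte is 0x1fffffff. An address such as 0x20000000 would need index 0x2000, which does not fit, so a driver handed such a pointer has to use it as a flat address or map it through a GDT alias.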

There are some other new features introduced in Warp Server SMP that are not specifically SMP issues, such as Raw File System access and access to PerfSysTrace data. Perhaps I'll cover these another time.