From Hello World to Real World - Part 5/6

From EDM2

Written by Alger Pike

Making I/O Calls

Introduction

Making an I/O call under OS/2 is very different from making one under DOS. In DOS, all you do is include conio.h and start calling inp() and family. A protected-mode operating system like OS/2, however, does not allow I/O instructions from ring 3 (application) code, so an OS/2 application that needs to do I/O usually requires a device driver. There are several ways to do I/O from OS/2; some require a device driver and some do not. Each method has distinct advantages and disadvantages; I will describe several, from slowest to fastest.

One method uses TESTCFG$, a device driver that ships as part of OS/2. You use it like any other device driver: by making the right IOCTL calls you can do I/O to ports above 0x100. The advantage is that the driver has already been written for you; you set up the proper structures, make the IOCTL call, and that's it. Its main disadvantage is that it is very slow. Not only is there thunking to get from ring 3 to ring 0 code, but many error-checking routines are also called once you get into the IOCTL itself. If you ever have a chance to look at the TESTCFG$ code, you will see what I mean. All this extra overhead slows things down considerably.

A second method uses I/O privilege level (IOPL) ring 2 segments. These are 16-bit segments that run at ring 2, which means they have I/O privileges. This method is slow because you need thunking and stack copying to go from one ring to the other. The third method is to execute an in or out instruction from an IOCTL in your own device driver. I expect it to be about as fast as IOPL, perhaps slightly faster, but I never actually tested this. Like TESTCFG$, it needs to thunk the pointers so the driver can use them. It seems silly to do all this for a single in or out instruction; more time is spent on the overhead than on the actual I/O. However, both of these methods have an advantage over TESTCFG$: they skip all the error-checking code, which makes them both faster.

The fastest way I know of to do I/O in OS/2 is the technique that Holger Veit describes in the August 1995 issue of EDM/2. In his article, Holger describes his FASTIO$ driver and why this method is so fast. To review, the method uses a device driver to configure a ring 3 -> ring 0 call gate. We adjust the offset of this call gate to point directly to our ring 0 I/O code. In this way we can call ring 0 code directly from ring 3: there is no thunking routine, and we configure the call gate so that no parameters are copied from the ring 3 stack to the ring 0 stack. This eliminates much of the overhead of an IOCTL call, so we can make our I/O call and return to our application very quickly.

How much faster is the FASTIO$ method than TESTCFG$? This is the question I set out to answer. If FASTIO$ proved to be fastest, I would then port the code into my device driver. To answer this question I set up two versions of the same program. Each was set up to do one million I/O operations to the same I/O port. One used TESTCFG$ to do the I/O and the other used FASTIO$. The results are below:

Processor        TESTCFG$ (s)    FASTIO$ (s)    Times Faster
P100             267             3.00           89
P133             164             2.21           74

Table I. Comparison of I/O methods (time for one million I/O operations to the same port).

Table I shows clearly that FASTIO$ is much faster than making IOCTL calls through TESTCFG$.
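The figures in Table I are easier to compare when reduced to a cost per I/O operation. The following C sketch does that arithmetic on the table's values; the function names are illustrative and are not part of either driver.

```c
#include <stdio.h>

/* Each run in Table I performed one million I/O operations. */
#define IO_OPERATIONS 1000000.0

/* How many times faster the fast method is than the slow one. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Average cost of a single I/O operation, in microseconds. */
double microseconds_per_op(double total_seconds)
{
    return total_seconds * 1.0e6 / IO_OPERATIONS;
}

/* Print one row of Table I with per-operation costs added. */
void print_row(const char *cpu, double testcfg_s, double fastio_s)
{
    printf("%-5s TESTCFG$ %7.2f us/op   FASTIO$ %5.2f us/op   speedup %.0fx\n",
           cpu,
           microseconds_per_op(testcfg_s),
           microseconds_per_op(fastio_s),
           speedup(testcfg_s, fastio_s));
}
```

Calling print_row("P100", 267.0, 3.00) and print_row("P133", 164.0, 2.21) reports roughly 267 and 164 microseconds per operation through TESTCFG$, against about 3 and 2.2 microseconds through FASTIO$.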

With that established, I could now port the FASTIO$ code and make it part of my AD3110$ driver. The original, complete version of this code can be found with the article that Holger Veit wrote for EDM/2 in August 1995. The driver acquires the global descriptor table (GDT) selector in the open function of the device driver. The code required to do this is as follows:

acquire_gdt_ proc near
    pusha                                ;note: this saves only the 16-bit
                                         ;regs, but 32-bit regs are used below
    mov   ax, word ptr [_io_gdt32]
    or    ax, ax
    jnz   aexit                          ;already have a selector: return it
    xor   ax, ax
    mov word ptr [_io_gdt32], ax         ;clear gdt save
    mov word ptr [gdthelper], ax         ;clear gdthelper
    push  ds
    pop   es
    mov   di, offset _io_gdt32           ;ES:DI = addr of _io_gdt32
    mov   cx, 2                          ;two selectors
    mov   dl, DevHlp_AllocGDTSelector

    call  [_Device_Help]                 ;get selectors
    jc aexit                             ;exit on error

    sgdt fword ptr [gdtsave]             ;get GDT ptr ; access gdt pointer
    mov   ebx, dword ptr [gdtsave+2]     ;get lin addr of GDT
    movzx eax, word ptr [_io_gdt32]      ;build offset into table
    and   eax, 0fffffff8h                ;mask away RPL and TI bits
    add   ebx, eax                       ;build address

    mov   ax,  word ptr [gdthelper]      ;sel to map to
    mov   ecx, 08h                       ;only one entry
    mov   dl, DevHlp_LinToGDTSelector
    call  [_Device_Help]
    jc aexit0                            ; exit if failed

    mov   ax, word ptr [gdthelper]
    mov   es, ax                         ;Build address to GDT
    xor   bx, bx

    mov word ptr es:[bx], offset io_call_ ;fix addr offset
    mov word ptr es:[bx+2], cs           ;fix address sel
    mov word ptr es:[bx+4], 0ec00h       ;a R0 386 call gate
    mov word ptr es:[bx+6], 0000h        ;high offset

    mov dl, DevHlp_FreeGDTSelector       ;free gdthelper
    call [_Device_Help]
    jnc short aexit

aexit0:
    xor ax, ax                           ;clear selector
    mov word ptr [_io_gdt32], ax
aexit:
    popa                                 ;restore all registers
    mov ax, word ptr [_io_gdt32]
    ret
acquire_gdt_ endp

Figure 1. Code required to set up the call gate. This code fragment appears in this article with permission from Holger Veit.
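To see what the four word stores near the end of Figure 1 produce, here is a minimal C sketch that assembles the same four words and decodes the flags word 0ec00h. It assumes the standard 386 call gate layout. The type and function names (call_gate_words, make_gate_flags, build_call_gate, and the decoders) are hypothetical and do not appear in the driver.

```c
#include <stdint.h>

/* The four 16-bit words Figure 1 stores at es:[bx] .. es:[bx+6]. */
typedef struct {
    uint16_t offset_low;   /* es:[bx]   bits 15..0 of the entry point       */
    uint16_t selector;     /* es:[bx+2] code segment holding the entry point */
    uint16_t flags;        /* es:[bx+4] parameter count and access byte      */
    uint16_t offset_high;  /* es:[bx+6] bits 31..16 of the entry point       */
} call_gate_words;

#define GATE_TYPE_CALL386 0x0Cu   /* descriptor type of a 386 call gate */

/* Pack the flags word: present bit, DPL, system type, parameter count. */
uint16_t make_gate_flags(unsigned present, unsigned dpl,
                         unsigned type, unsigned param_count)
{
    return (uint16_t)(((present & 1u) << 15) |
                      ((dpl & 3u) << 13) |
                      ((type & 0x0Fu) << 8) |
                      (param_count & 0x1Fu));
}

/* Decoders for the individual fields of the flags word. */
unsigned gate_present(uint16_t flags)     { return (flags >> 15) & 1u; }
unsigned gate_dpl(uint16_t flags)         { return (flags >> 13) & 3u; }
unsigned gate_type(uint16_t flags)        { return (flags >> 8) & 0x0Fu; }
unsigned gate_param_count(uint16_t flags) { return flags & 0x1Fu; }

/* Build a gate that ring 3 may call and that copies no stack parameters. */
call_gate_words build_call_gate(uint16_t code_selector, uint32_t entry_offset)
{
    call_gate_words g;
    g.offset_low  = (uint16_t)(entry_offset & 0xFFFFu);
    g.selector    = code_selector;
    g.flags       = make_gate_flags(1, 3, GATE_TYPE_CALL386, 0);
    g.offset_high = (uint16_t)(entry_offset >> 16);
    return g;
}
```

make_gate_flags(1, 3, GATE_TYPE_CALL386, 0) yields 0xEC00: present, DPL 3 so that ring 3 code may call through the gate, type 0Ch, and a parameter count of zero so that nothing is copied between stacks. Because the entry point lives in a 16-bit driver segment, the high offset word is zero, as in Figure 1.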

The assembler I use is WASM, not MASM, so some changes were required. The two most notable changes I made to this code were: 1) the changes needed to call the Watcom DevHelp library, and 2) sgdt qword ptr [gdtsave] in the original code needed to be changed to sgdt fword ptr [gdtsave]. Try it with qword for fun; nice crash, Argh!
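The fword operand matches what SGDT actually stores: six bytes, made up of a 16-bit limit followed by a 32-bit linear base. The sketch below, in portable C with hypothetical names, mirrors how Figure 1 reads the base from gdtsave+2 and turns a selector into a byte offset within the table. It only illustrates the arithmetic and is not driver code.

```c
#include <stdint.h>

#define GDTR_IMAGE_SIZE 6   /* 2-byte limit + 4-byte linear base */

/* 16-bit table limit: bytes 0..1 of the stored image (little-endian). */
uint16_t gdtr_limit(const uint8_t image[GDTR_IMAGE_SIZE])
{
    return (uint16_t)(image[0] | (image[1] << 8));
}

/* 32-bit linear base: bytes 2..5, like "mov ebx, dword ptr [gdtsave+2]". */
uint32_t gdtr_base(const uint8_t image[GDTR_IMAGE_SIZE])
{
    return (uint32_t)image[2] |
           ((uint32_t)image[3] << 8) |
           ((uint32_t)image[4] << 16) |
           ((uint32_t)image[5] << 24);
}

/* Clear the RPL (bits 0..1) and TI (bit 2) fields of a selector. What is
   left is the byte offset of its 8-byte descriptor within the table. */
uint32_t selector_to_offset(uint16_t selector)
{
    return (uint32_t)selector & 0xFFFFFFF8u;
}

/* Linear address of the descriptor that a selector refers to. */
uint32_t descriptor_linear_address(const uint8_t image[GDTR_IMAGE_SIZE],
                                   uint16_t selector)
{
    return gdtr_base(image) + selector_to_offset(selector);
}
```

With a made-up base of 0x80001000 and a made-up selector of 0x015B, the offset is 0x0158 and the descriptor sits at linear address 0x80001158. This is the address that Figure 1 passes to DevHlp_LinToGDTSelector in EBX.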

To complete the changes to the driver code, I also needed to add an IOCTL call. This call returns the call gate selector to the application so it can call the ring 0 code (ioctl.c).

case 0x60: 
    *((unsigned short FAR*)rp->DataPacket) = io_gdt32;
    break;

Figure 2. IOCTL which returns call gate to the 32-bit application.
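The contract between Figure 2 and the application can be modelled in portable C. This is only a stand-in, not driver code: the ioctl_request structure and the handle_ioctl function are hypothetical simplifications of the real request packet, and only the 0x60 case and the io_gdt32 variable come from the article.

```c
#include <stddef.h>

/* Hypothetical, simplified request packet: just the IOCTL function code
   and a pointer to the caller's data buffer. */
typedef struct {
    unsigned char function;
    void *DataPacket;
} ioctl_request;

/* Call gate selector; acquire_gdt_ in Figure 1 leaves it at 0 on failure. */
unsigned short io_gdt32 = 0;

/* Returns 0 on success, -1 for an unknown function or a missing buffer. */
int handle_ioctl(ioctl_request *rp)
{
    switch (rp->function) {
    case 0x60:
        if (rp->DataPacket == NULL)
            return -1;
        /* Same store as Figure 2, without the 16-bit FAR qualifier. */
        *((unsigned short *)rp->DataPacket) = io_gdt32;
        return 0;
    default:
        return -1;
    }
}

/* The application should check that a gate was actually set up. */
int selector_is_usable(unsigned short selector)
{
    return selector != 0;
}
```

An application that receives a selector of zero should not attempt the far call, since zero means the driver never managed to build the call gate.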

This code is pretty simple, but without it the driver will not work: the application needs to know where to go to make the desired I/O call. In order to use the GDT selector, we need wrapper functions that call through it. These functions set up the parameters in registers and do the I/O work for us. Once again, the wrapper functions I wrote are a direct port of Holger Veit's, with changes made so that they work with WASM and the Watcom C compiler. See iolib.h and wiolib.asm for the required pragma statements and name changes. (You can obtain these files from the author's site, http://avenger.mri.psu.edu/os2page.html, if you need them.) Note also that for the wrapper functions to work properly, you must set your compiler to use stack calling conventions (page 9 of Watcom compiler options).

At this point, we now have a driver which has most of the ring 0 code that it needs to be functional. We have initialized the device and made it ready for use, and we have taken the necessary steps for the driver to perform fast input and output from the application level. If you are willing to settle for polling when doing timing operations, your driver is complete. If not, read on to learn how to do interrupt processing in part 6.