From Hello World to Real World - Part 5/6
Making I/O Calls
Written by Alger Pike
Making an I/O call under OS/2 is very different from making an I/O call in DOS. In DOS, all you do is include conio.h and begin calling inp() and family. But a protected mode operating system like OS/2 does not allow I/O calls from ring 3 (application) code. If an OS/2 application needs to do I/O, it usually needs a device driver to do it. There are, however, several ways to do I/O from OS/2; some require a device driver and some do not. Each method has distinct advantages and disadvantages; I will list several from slowest to fastest.
One method uses a device driver which is a part of OS/2 called TESTCFG$. You use this device like any other device driver; by making the right IOCTL calls you can do I/O to ports above 0x100. The advantage of this is that the driver has already been written for you. You set up the proper structures, make the IOCTL call and that's it. Its main disadvantage is that it is very slow. Not only is there thunking to go from ring 3 to ring 0 code, but there are also many error checking routines that are called once you get into the IOCTL itself. If you ever have a chance to look at the TESTCFG$ code you will see what I mean. All this extra overhead slows things down considerably.
A second method uses I/O privilege level (IOPL) ring 2 segments. These are 16-bit segments that run at ring 2, which means that they have I/O privileges. This method is slow because you need thunking and stack copying to go from one ring to the other. The third method, which is about the same speed as IOPL but maybe slightly faster (I never actually tested it, so I don't know), is to execute an in or out instruction from an IOCTL in a device driver. This, like using TESTCFG$, needs to thunk the pointers so the driver can use them. It seems silly to do all this for a single in or out instruction; more time is spent on the overhead than on the actual I/O. However, both of these methods have an advantage over TESTCFG$: they do not go through all the error checking code. This makes them both faster than TESTCFG$.
The fastest way I know of to do I/O in OS/2 is to use the technique that Holger Veit describes in the August 1995 issue of EDM/2. In his article, Holger describes his FASTIO$ driver and why this method is very fast. Just to review, this method uses a device driver to configure a ring 3 -> ring 0 call gate. We adjust the offset of this call gate to point directly to our ring 0 I/O code. In this way we can call ring 0 code directly from ring 3. There is no thunking routine, and we configure the call gate so that no parameters are copied from the ring 3 stack to the ring 0 stack. This eliminates much of the overhead of an IOCTL call, so we can make our I/O call and return to our application very quickly.
How much faster is the FASTIO$ method than TESTCFG$? This is the question I set out to answer. If FASTIO$ proved to be fastest, I would then port the code into my device driver. To answer this question I set up two versions of the same program. Each was set up to do one million I/O operations to the same I/O port. One used TESTCFG$ to do the I/O and the other used FASTIO$. The results are below:
Processor   TESTCFG$ (s)   FASTIO$ (s)   Times Faster
P100        267            3.00          89
P133        164            2.21          74

Table I. Comparison of I/O methods.
Table I shows clearly that FASTIO$ is much faster than making IOCTL calls using TESTCFG$: roughly 74 to 89 times faster on the two machines tested.
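The figures in Table I can be turned into a per-call cost with a little arithmetic. The following is a minimal sketch, not code from the original test programs; the function names are made up for illustration. It assumes only what the text states: each run performed one million I/O operations.

```c
/* Average cost of a single I/O call in microseconds, given the total
 * run time in seconds and the number of calls made in that run. */
double per_call_us(double total_seconds, double calls)
{
    return total_seconds * 1.0e6 / calls;
}

/* How many times faster the fast method is than the slow one. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}
```

With the P100 numbers, per_call_us(267.0, 1.0e6) works out to 267 microseconds per TESTCFG$ call against 3 microseconds per FASTIO$ call, and speedup(267.0, 3.00) reproduces the factor of 89 in the table.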
With that established, I could now work on porting the FASTIO$ code and making it part of my AD3110$ driver. The original complete version of this code can be found with the article that Holger Veit wrote for EDM/2 in August 1995. The driver acquires the global descriptor table (GDT) selector in the open function of the device driver. The code required to do this is as follows:
acquire_gdt_ proc near
        pusha                                   ;remark we push only
                                                ;16-bit regs but use 32-bit regs
        mov     ax, word ptr [_io_gdt32]
        or      ax, ax
        jnz     aexit                           ;if we don't have one make it
        xor     ax, ax
        mov     word ptr [_io_gdt32], ax        ;clear gdt save
        mov     word ptr [gdthelper], ax        ;clear gdthelper
        push    ds
        pop     es
        mov     di, offset _io_gdt32            ;ES:DI = addr of _io_gdt32
        mov     cx, 2                           ;two selectors
        mov     dl, DevHlp_AllocGDTSelector
        call    [_Device_Help]                  ;get selectors
        jc      aexit                           ;exit on error
        sgdt    fword ptr [gdtsave]             ;get GDT ptr
                                                ;access gdt pointer
        mov     ebx, dword ptr [gdtsave+2]      ;get lin addr of GDT
        movzx   eax, word ptr [_io_gdt32]       ;build offset into table
        and     eax, 0fffffff8h                 ;mask away DPL
        add     ebx, eax                        ;build address
        mov     ax, word ptr [gdthelper]        ;sel to map to
        mov     ecx, 08h                        ;only one entry
        mov     dl, DevHlp_LinToGDTSelector
        call    [_Device_Help]
        jc      aexit0                          ;exit if failed
        mov     ax, word ptr [gdthelper]
        mov     es, ax                          ;Build address to GDT
        xor     bx, bx
        mov     word ptr es:[bx], offset io_call_ ;fix addr offset
        mov     word ptr es:[bx+2], cs          ;fix address sel
        mov     word ptr es:[bx+4], 0ec00h      ;a R0 386 call gate
        mov     word ptr es:[bx+6], 0000h       ;high offset
        mov     dl, DevHlp_FreeGDTSelector      ;free gdthelper
        call    [_Device_Help]
        jnc     short aexit
aexit0:
        xor     ax, ax                          ;clear selector
        mov     word ptr [_io_gdt32], ax
aexit:
        popa                                    ;restore all registers
        mov     ax, word ptr [_io_gdt32]
        ret
acquire_gdt_ endp

Figure 1. Code required to set up the call gate. This code fragment appears in this article with permission from Holger Veit.
The assembler I use is WASM, not MASM, so some changes were required. The two most notable changes I made to this code were: 1) the changes needed to call the Watcom devhelp library, and 2) the instruction sgdt qword ptr [gdtsave] in the original code had to be changed to sgdt fword ptr [gdtsave] (SGDT stores a 6-byte operand, which is what fword denotes). Try it with qword for fun; nice crash, Argh!
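The four words that Figure 1 writes through ES:BX follow the 80386 call gate descriptor layout, and it can help to see where the constant 0ec00h comes from. The sketch below is an illustration in portable C, not part of the driver; the type and function names are hypothetical. It packs a gate the same way: the high byte of the third word holds the present bit, the descriptor privilege level and the gate type, and the low byte holds the parameter count.

```c
#include <stdint.h>

/* The four 16-bit words of an 80386 call gate descriptor, in memory order.
 * (Hypothetical type name, used only for this illustration.) */
typedef struct {
    uint16_t offset_low;   /* bits 15..0 of the target offset       */
    uint16_t selector;     /* code segment selector of the target   */
    uint16_t access;       /* high byte: P, DPL, type; low byte: parameter count */
    uint16_t offset_high;  /* bits 31..16 of the target offset      */
} call_gate;

#define GATE_PRESENT  0x80u  /* P bit: descriptor is valid          */
#define GATE_TYPE_386 0x0Cu  /* system descriptor type: 386 call gate */

/* Pack a call gate. dpl is the lowest privilege ring allowed to call
 * through the gate; param_count is the number of stack parameters the
 * CPU copies from the caller's stack to the inner-ring stack. */
call_gate make_call_gate(uint32_t offset, uint16_t selector,
                         unsigned dpl, unsigned param_count)
{
    call_gate g;
    unsigned access = GATE_PRESENT | ((dpl & 3u) << 5) | GATE_TYPE_386;

    g.offset_low  = (uint16_t)(offset & 0xFFFFu);
    g.selector    = selector;
    g.access      = (uint16_t)((access << 8) | (param_count & 0x1Fu));
    g.offset_high = (uint16_t)(offset >> 16);
    return g;
}

/* Byte offset of a selector's descriptor inside its descriptor table:
 * the low three bits of a selector (requested privilege level and table
 * indicator) are cleared, as the AND with 0fffffff8h does in Figure 1. */
uint32_t selector_to_table_offset(uint16_t selector)
{
    return (uint32_t)selector & 0xFFFFFFF8u;
}
```

Calling make_call_gate with dpl 3 and param_count 0 yields an access word of 0xEC00: present, callable from ring 3, 386 call gate, and no parameters copied between stacks, which is the configuration the article relies on for speed.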
To complete the changes needed to the driver code I also needed to add an IOCTL call. This call returns the value of the call gate to the application so it can call the ring 0 code (ioctl.c).
    case 0x60:
        *((unsigned short FAR*)rp->DataPacket) = io_gdt32;
        break;

Figure 2. IOCTL which returns call gate to the 32-bit application.
This code is pretty simple, but without it the driver will not work: the application needs to know where to go to make the desired I/O call. In order to use the GDT selector, we need to write wrapper functions that allow us to call through it. These functions set up parameters in the registers and do the I/O work for us. Once again, the wrapper functions I wrote are a direct port of Holger Veit's, with changes made so that they work with WASM and the Watcom C compiler. See iolib.h and wiolib.asm for the required pragma statements and name changes. (You can obtain these files from the author's site if you need them: http://avenger.mri.psu.edu/os2page.html) Note also that, for the wrapper functions to work properly, you must set your compiler to use stack calling conventions (page 9 of Watcom compiler options).
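To see the Figure 2 case in context, here is a small stand-alone model of an IOCTL dispatcher that hands the selector back to its caller. It is only a sketch under stated assumptions: the request structure, the function names and the length check are hypothetical and are not the real OS/2 request packet or the code in ioctl.c. Only the function code 0x60 and the variable name io_gdt32 come from the article.

```c
#include <stdint.h>
#include <string.h>

#define IOCTL_GET_CALL_GATE 0x60  /* function code used in Figure 2 */

/* Hypothetical, simplified stand-in for the driver's request packet. */
typedef struct {
    uint8_t  function;     /* IOCTL function code             */
    void    *data_packet;  /* caller's data buffer            */
    uint16_t data_length;  /* size of that buffer in bytes    */
} mock_ioctl_request;

/* Selector that the open routine (Figure 1) would have stored. */
uint16_t io_gdt32 = 0;

/* Returns 0 on success, -1 for an unknown function or a bad buffer.
 * The buffer check is an addition for this sketch; Figure 2 has none. */
int mock_ioctl(mock_ioctl_request *rp)
{
    switch (rp->function) {
    case IOCTL_GET_CALL_GATE:
        if (rp->data_packet == NULL || rp->data_length < sizeof io_gdt32)
            return -1;
        /* Copy the call gate selector into the caller's buffer. */
        memcpy(rp->data_packet, &io_gdt32, sizeof io_gdt32);
        return 0;
    default:
        return -1;
    }
}
```

The application side then needs nothing more than the returned 16-bit selector; the wrapper functions use it as the target of their far calls.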
At this point, we now have a driver which has most of the ring 0 code that it needs to be functional. We have initialized the device and made it ready for use, and we have taken the necessary steps for the driver to perform fast input and output from the application level. If you are willing to settle for polling when doing timing operations, your driver is complete. If not, read on to learn how to do interrupt processing in part 6.