Doing Port I/O: The Front Doors

By Alger Pike

Device drivers
No doubt, as was already stated, a device driver is the real solution. Keep critical tasks away from the user is the rule of the game. The path to this is quite stony, however.

As often criticized, device drivers in OS/2 are still 16 bit code. This has several disadvantages.


 * You need to code entirely in assembler, or find a C compiler that still produces 16 bit code. Most of the standard drivers that come with OS/2 were built with either MASM 5.1 or MSC 6.0. It is not only the problem that IBM relies on products of a competitor here, but simply that these products are no longer available in stores, and the successors no longer support OS/2. Fortunately, the recent, widely available Watcom 10.0 compiler is usable for device driver development.
 * Coding in 16 bit throws the developer back into the stone age where he has to fight against 64K segments, far and near calls, moving around selectors and grouping segments in a certain order.

Nevertheless, one could think about writing a device driver that you can use to read and write I/O ports by issuing special DosDevIOCtl instructions. Surprisingly, this is an unnecessary enterprise! Not many OS/2 users know that every stock OS/2 system comes with a device driver that comes with such a functionality: TESTCFG.SYS.

Besides some other functions that are beyond the scope of this article, TESTCFG.SYS offers two ioctls, one for reading I/O ports and another one for writing. See table 1 for the description of the functions and the simple program fragment in figure 1 for an example. Table 1: Ioctl API of TESTCFG.SYS for doing direct I/O Figure 1: Sample code to read a port through TESTCFG.SYS

There is one drawback, and we will hear this argument again real soon: it is slow. Why?

Now see, we are calling this function from a 32 bit user program. The DosDevIOCtl will enter the kernel through a call gate which is some kind of a protected door. The kernel will then first check the validity of the parameter and data packets, identify the target driver to perform this function, and then call the appropriate driver entry point. Note the driver is 16 bit code, so the kernel must convert the addresses of the parameter and data packets from 0:32 bit user space addresses to 16:16 bit device driver addresses. Finally, the driver itself must decode the command and dispatch it to its routines.

I once tried to trace such an ioctl call with the kernel debugger, and eventually gave up counting after following some hundred instructions without seeing any driver code. Compare this with a single IN or OUT instruction. That's bureaucracy!

IOPL Segments
The second method is actually a leftover from earlier OS/2 1.X versions, hence it is a 16 bit technique as well.

Let me elaborate here a bit on the method used to prevent I/O access by user programs. The Intel 286 and later (386, 486, Pentium) processors can execute code at four different privilege levels. Because they are nested and usually drawn as concentric circles, these levels are frequently referred to as privilege rings (or protection rings). Ring 0 is the level with the highest privilege, and ring 3 has the lowest privilege (see figure 2).



Figure 2: The privilege rings and their use in OS/2

If a process wants to run with a higher privilege than the one it currently has, it must go through a special gate; one might also compare a gate with a tunnel or "wormhole". There are several types of gates such as interrupt, trap, or task gates. The only interesting type for us is the call gate. A call gate allows a one-way transfer of execution from a segment with some privilege to another one with same or higher privilege. The other direction, that is from a "trustworthy" high-privileged code segment to a less trusted lower-privileged segment, is not possible. See figure 2.



Figure 3: Allowed and forbidden transactions with call gates

Two bits in the processor status register (the IOPL field) determine the level that is necessary to execute I/O CPU instructions. Any code with less than this privilege level will trigger an exception at the first attempt to execute such an instruction. Table 2 lists the affected instructions. Certain instructions will even cause an exception if the process has the privilege to I/O. These instructions require ring 0 privilege. Table 2 also lists these instructions (386 processor).
 * Note 1: INT 3 (opcode 0xcc) and INTO are not affected
 * Note 2: I/O instructions are enabled or disabled by the I/O permission map in the 386 task state segment

Table 2: Privileged Instructions

In OS/2, the required privilege level for I/O is ring 2 or better, and tough luck, any user process only runs in ring 3 (figure 2).

In order to get a controlled way to do I/O, the OS/2 developers provided a method to execute 16 bit code at ring 2 level. When the linker produces an executable from several object files, it accepts a special attribute for code segments under certain circumstances. This attribute is named IOPL and is specified in the segment declaration section of a linker definition file (Consult appropriate linker documentation). The linker then annotates the code in a way that every call of a routine in this IOPL segment will be directed through a call gate, rather than a simple call. When such a program is loaded into memory for execution, the loader code in the kernel will generate a R3->R2 call gate for each target called in an IOPL segment (see call gate X in figure 3).

Each time such a call gate is entered, the processor will gain ring 2 privilege and lose it again when leaving by a normal return instruction.

Apparently, this looked like a feature which could be abused, so the IBM developers restricted it in a way that only segments in a DLL can get the IOPL attribute. This appears to be a built-in feature of the program loader, not just the linker, as patching the appropriate tables in the executable will not work.

This restriction is not a bad idea, as it is now no longer possible to make an executable disguising as a normal program, but doing I/O inside. There must be an accompanying DLL, to arouse suspicion - or at least should do so.

This could have been an almost ideal way for moderate I/O - if IBM had provided a similar method for 32 bit applications as well. There is no restriction in the processor itself concerning 32 bit I/O, as one might suspect; it is an intentional limitation. Since IBM will not support 16 bit software any longer in OS/2 for the PowerPC, those unsecure interfaces will disappear in the future.

Nevertheless, you can call routines in such a 16 bit IOPL DLL from a 32 bit executable, and there are several example files floating around in various FTP archives. The key item here is thunking. The main problem with calling code of another size gender is that the program counter as well as the stack pointer needs to be adjusted to the corresponding other size. If address parameters are passed through the stack, these addresses need to be converted as well. This is what a thunking routine does.

Usually the compiler generates such routines automatically when a 16 bit routine is declared, and this is why many high- level programmers do not encounter them at all. However, even if they seem to be invisible, they nevertheless contribute a considerable share to performance degradation if an I/O routine in the IOPL DLL is called from a 32 bit application.