OS/2 Installable File Systems - Part 2/3
Written by Andre Asselin
Last time I went over some of the background information needed to write an IFS. In this article, I'll continue on and examine a framework to write a split ring 0/ring 3 IFS. This month I'm going to limit myself to just the code that does initialization and communication between ring 0 and ring 3; it's complicated enough to warrant a full article of its own. The things we will cover are:
The source code for the project is divided up into two directories, RING0 and RING3 (included as IFSR0.ZIP and IFSR3.ZIP - Editor); RING0 holds the ring 0 source, and RING3 holds the ring 3 source. As I mentioned last time, all IFS's must be 16-bit code, so for the source in the RING0 directory, I'm using Borland C++ 3.1. The ring 3 side is 32-bit code, however, so for it I'm using Borland C++ for OS/2. I haven't tried this code on any other compilers, but it should be easily portable. One thing to note is that I'm compiling the code in C++ mode and using the C++ extensions for one-line comments and anonymous unions (see for example R0R3SHAR.H). I also use one Borland #pragma for forcing enumerations to be 16 bits. With a few modifications, this source should work with any ANSI C compiler.
The contents of the RING0 directory are:
C0.ASM        Stripped down Borland C++ startup code
FSD.H         FS_* call prototypes and data structures
FSH.H         FSH_* call prototypes and data structures
FSHELPER.LIB  Import library for file system helpers
OS216.H       Header file for 16-bit OS/2 Dos APIs
R0.CFG        Borland C++ configuration file
R0.DEF        Definition file for the linker
R0.MAK        Make file for the IFS
R0COMM.C      Routines to communicate with the control program
R0DEVHLP.ASM  DevHlp interface routines
R0DEVHLP.H    Header file for DevHlp interface routines
R0GLOBAL.C    Global variable definitions
R0GLOBAL.H    Global variable declarations
R0INC.H       Main include file
R0R3SHAR.H    Shared data structures between the IFS and control program
R0STRUCT.H    IFS private structures
R0STUBS.C     FS_* stub routines
The contents of the RING3 directory are:
FSATT.C       Sample attach program
FSD.H         FS_* call prototypes and data structures
R0R3SHAR.H    Shared data structures between the IFS and control program
R3.CFG        Borland C++ configuration file
R3.MAK        Make file for the control program
R3COMM.C      Routines to communicate with the IFS
R3GLOBAL.H    Global variable declarations
R3INC.H       Main include file
R3STUBS.C     FS_* stub routines
The two directories are laid out pretty similarly. Some notes on the files:
Communicating Between Ring 0 and Ring 3
The easiest and fastest way for ring 0 and ring 3 code to communicate is through shared memory. The way I implemented it is to have the control program allocate two buffers when it initializes: one buffer is used to hold all the parameters for a given operation, and the other serves as a data buffer to hold data for operations like FS_WRITE. After allocating the buffers, it makes a special call to the IFS, which sets up a GDT alias for itself (we need to use GDT selectors because the IFS can be called in the context of any process). In more detail, what we do is:
When the IFS loads
Call the AllocGDTSelector DevHlp to allocate two GDT selectors. These will be the selectors used by the IFS to get access to the control program's two buffers. We allocate them now because GDT selectors can only be allocated at initialization time.
When the control program loads
A Shared Memory Protocol
Once the buffers are allocated and accessible to both the ring 0 and ring 3 code, we need to set up some kind of protocol for their use. The control program needs to know when a valid operation is in the buffers and ready to be performed. The IFS needs to know when the buffers are in use, and when the buffers contain the results of a completed operation. Again, there are several ways to implement this. The method I chose involves using semaphores and captive threads.
After the control program allocates the buffers and does any other initialization, it calls the IFS through DosFSCtl(). The IFS sets up the ring 0 GDT aliases for the buffers, and then suspends the control program's thread by making it wait on a semaphore (thus capturing it). To the control program, it just looks like it made a system call that is taking a very long time.
When a request comes in to the IFS on another thread, it places the parameters and data into the two buffers and releases the semaphore that the control program's thread is blocked on. When that thread starts running again, the IFS returns from the DosFSCtl() call to the control program, where it executes the operation and places the results back into the buffer. It then calls the IFS again, which blocks the control program on the semaphore and starts the whole process over again.
The advantage of this approach is that whenever the control program is running, it is guaranteed to have a valid operation in the buffer waiting to be executed. Thus you never have to worry about semaphores in the control program. This is especially nice because 16-bit and 32-bit semaphores are incompatible.
Even though the control program doesn't have to worry about semaphores, the IFS certainly does, and in a big way. It has to worry about serializing all the requests it gets, and handling things like the control program unexpectedly terminating. To do this, we employ four semaphores:
CPAttached is used to indicate whether the control program is currently attached to the IFS. A value of -1 indicates that it has never attached to the IFS, 0 means it currently is not attached, but has been in the past, and 1 means it currently is attached. This semaphore is unique in that it is not a system semaphore, but an int that is being used as a semaphore. The reason we need to implement it this way will become clear when we start discussing the code.
BufLock is used to serialize requests to the IFS. Whenever the IFS gets a request, the request thread blocks on this semaphore until it's clear, at which time it knows that it's OK to use the shared buffers to initiate the next operation.
CmdReady is the semaphore used to tell the control program that a request is in the shared buffers and is ready to execute. The control program thread blocks on this semaphore; a request thread clears this semaphore when a request is ready.
CmdComplete indicates to the request thread that the command it initiated is complete and that the results are in the shared buffers. It is cleared by the control program thread when it calls back into the IFS after it completes an operation.
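As a small illustration of the CPAttached flag, here is a minimal sketch of its three states and the two questions the IFS asks of it later in this article. This is not the code from R0COMM.C; the enumerator and function names are invented for the example, and only the values -1, 0 and 1 come from the description above.

```c
#include <stdbool.h>

/* The three values the article assigns to CPAttached. The enumerator
   names are invented for this sketch. */
enum {
    CP_NEVER_ATTACHED = -1,
    CP_DETACHED       = 0,
    CP_ATTACHED       = 1
};

/* A control program may attach only if none is attached right now. */
bool cp_may_attach(int cpAttached)
{
    return cpAttached != CP_ATTACHED;
}

/* BufLock is first initialized when a control program attaches, so it
   may only be waited on once CPAttached has left the "never" state. */
bool buflock_is_initialized(int cpAttached)
{
    return cpAttached != CP_NEVER_ATTACHED;
}
```

Keeping "never attached" distinct from "detached" is what lets the second predicate exist at all: with only two states, the IFS could not tell whether BufLock had ever been set up.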
When OS/2 is booting and finds an IFS= line in the CONFIG.SYS, it will check that the file specified is a valid DLL and that it exports all of the required entry points for IFS's. If it is not a valid IFS, OS/2 will put up a message and refuse to load it. If the IFS is valid, OS/2 will load it into global system memory and then initialize it by calling FS_INIT (note that if the IFS has a LibInit routine, it will be ignored).
RING0\R0COMM.C contains the code for the FS_INIT routine. Just like device drivers, IFS's get initialized in ring 3. Because of the special state of the system, an IFS can make calls to a limited set of Dos APIs (see table 1 for a list of which ones are allowed). It can also call any of the DevHlp routines that are valid at initialization time, but it cannot call any of the file system helpers.
FS_INIT gets passed a pointer to the parameters on the IFS= line and a pointer to the DevHlp entry point. The third parameter is used to communicate between the IFS and the system's mini-IFS; we can safely ignore it.
The first thing our IFS does is call DosPutMessage() to put up a sign-on message (it's a good idea to put up a message like this while you are still debugging the IFS, but you should take it out in release versions). After the sign-on banner is printed, we call a special routine to initialize the C runtime environment. This is a stripped down version of the startup code that comes with Borland C++; all it does is zero the BSS area and call any #pragma startup routines. Strictly speaking, it is probably not necessary.
Next we save any parameters that were on the IFS= line in a global buffer and save the address of the DevHlp entry point. Note that contrary to what the IFS reference says, we have to check the szParm pointer before using it because it will be NULL if there are no parameters. The reference leads you to believe that it will point to an empty string, but that isn't true.
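The NULL check can be sketched in portable C as follows. This is an illustration only, not the FS_INIT code itself: the buffer name and size are made up, and in the real 16-bit IFS szParm would be a far pointer.

```c
#include <stddef.h>
#include <string.h>

#define PARM_BUF_SIZE 256            /* hypothetical size for the sketch */

char g_szInitParms[PARM_BUF_SIZE];   /* hypothetical global buffer */

/* Save the IFS= parameters, treating a NULL pointer as "no parameters". */
void save_init_parms(const char *szParm)
{
    if (szParm == NULL) {
        g_szInitParms[0] = '\0';
        return;
    }
    /* Copy at most PARM_BUF_SIZE - 1 characters and always terminate. */
    strncpy(g_szInitParms, szParm, PARM_BUF_SIZE - 1);
    g_szInitParms[PARM_BUF_SIZE - 1] = '\0';
}
```

Normalizing NULL to an empty string at this one point means the rest of the code can treat the saved parameters as an ordinary string.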
Next we allocate a small block of memory in the system portion of the linear address space with the VMAlloc DevHlp (the system portion is global to all processes, just like GDT selectors). This memory will be used to hold the two lock handles that are created by the VMLock DevHlp when we lock down the memory that is shared between the control program and the IFS. We have to allocate the lock handles in the linear address range because VMLock can only put its lock handles there. Since our code is 16-bit, the compiler doesn't know what a linear address is. We deal with them by creating a new typedef, LINADDR, which is just an unsigned long.
Next we also allocate two GDT selectors to alias the shared memory on the ring 0 side. This is done here because according to the PDD reference, you can only allocate GDT selectors at initialization time (in fact, if you do it after initialization, it still works, but why take the chance, right?). We then create pointers out of the GDT selectors and assign them to the two global variables used to access the shared buffers. Note that at this point, no memory is allocated! We have our pointers set up, but if we were to try to access them, we'd get a TRAP D. We must wait for the control program to start and call the IFS before we can put memory behind those GDT selectors.
After that's done, we set CPAttached to -1, which says that the control program has never attached to the IFS. We'll see later why it's important to distinguish between when it has never attached, and when it has attached but then detached.
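"Creating a pointer out of a selector" just means pairing the selector with an offset of zero. The sketch below models a 16:16 far pointer as a 32-bit integer so that it compiles anywhere; in real 16-bit OS/2 code the MAKEP macro from the toolkit headers does the equivalent job and yields an actual far pointer. The type and function names here are invented for the example.

```c
#include <stdint.h>

typedef uint16_t SEL;      /* a 16-bit selector */
typedef uint32_t FARPTR;   /* models a 16:16 far pointer: selector:offset */

/* Selector goes in the high word, offset in the low word. */
FARPTR make_far_ptr(SEL sel, uint16_t offset)
{
    return ((FARPTR)sel << 16) | offset;
}

SEL far_ptr_selector(FARPTR p)
{
    return (SEL)(p >> 16);
}

uint16_t far_ptr_offset(FARPTR p)
{
    return (uint16_t)(p & 0xFFFFu);
}
```

The pointer value can be computed as soon as the selector is known, which is why the globals can be set up in FS_INIT even though nothing is mapped behind the selectors yet.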
Control Program Flow
RING3\R3COMM.C contains the code to start up the control program. It first prints a banner, just like the IFS, and then allocates and commits memory for the two buffers. Once that is done, it puts the pointers to the two blocks of memory in the structure that is passed to the IFS for initialization. Before we call the IFS, though, we make a copy of the file system name in a temporary buffer. The DosFSCtl() call can use three different methods to figure out which IFS to call; we want to use the method where we specify the IFS's name. To do that we have to make a temporary copy of the IFS name because DosFSCtl could modify the buffer that contains the IFS name.
Once all the preparations are made, the control program calls the IFS to initialize. To the control program it's really no big deal - just one DosFSCtl() call. When the DosFSCtl() returns, it will either be because there was an initialization error, or there was an operation waiting in the shared buffers to be executed. If an error occurred, we just terminate the control program (perhaps a more user friendly error message should be printed, but after all, this is just a framework). If it returned because an operation is ready, we enter the dispatch loop.
The dispatch loop figures out what operation was requested, and calls that routine to execute it. Right now we only support the attach routine (which is actually just a stub that returns NO_ERROR). If it gets a request for an operation it doesn't understand, it prints an error message and returns ERROR_NOT_SUPPORTED to the IFS.
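A dispatch routine of this shape might look like the sketch below. The operation code and the parameter structure are placeholders invented for the example (the real layout is in R0R3SHAR.H); the two return codes use the standard OS/2 values.

```c
#include <stdio.h>

#define NO_ERROR             0
#define ERROR_NOT_SUPPORTED 50

enum { OP_ATTACH = 1 };       /* hypothetical operation code */

struct OpParms {              /* hypothetical stand-in for the shared */
    int op;                   /* parameter buffer: requested operation */
    int rc;                   /* and the result handed back to the IFS */
};

/* Attach stub: does nothing yet and reports success. */
static int do_attach(struct OpParms *p)
{
    (void)p;
    return NO_ERROR;
}

/* Execute one request and leave the result in the parameter block. */
void dispatch(struct OpParms *p)
{
    switch (p->op) {
    case OP_ATTACH:
        p->rc = do_attach(p);
        break;
    default:
        fprintf(stderr, "Unsupported operation %d\n", p->op);
        p->rc = ERROR_NOT_SUPPORTED;
        break;
    }
}
```

Adding support for another FS_* call is then a matter of adding one case and one handler.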
Once the operation has been executed, we again copy the IFS name into a temporary buffer and make a DosFSCtl() call to indicate that this operation is complete, the results are in the shared buffer, and we're ready for the next request. When that DosFSCtl() returns, another operation will be waiting in the shared buffer.
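The overall shape of the control program's loop can be sketched like this. DosFSCtl() is replaced by a stand-in so the example runs anywhere: the stand-in deliberately clobbers the name buffer (the worst case the copy guards against) and pretends that three requests arrive before the program is interrupted. The IFS name and the function codes are invented for the example.

```c
#include <string.h>

#define FS_NAME          "EXAMPLFS"  /* hypothetical IFS name */
#define FUNC_INIT         1          /* hypothetical function codes */
#define FUNC_NEXT         2
#define ERROR_INTERRUPT  95

static int calls_seen;               /* bookkeeping for the stand-in */

/* Stand-in for DosFSCtl(): overwrites the name buffer and returns 0
   for the first three calls, then an error. */
static int fake_fsctl(char *name, int func)
{
    (void)func;
    name[0] = '\0';
    return (++calls_seen <= 3) ? 0 : ERROR_INTERRUPT;
}

/* Returns the number of operations serviced before the loop ended. */
int control_program_loop(void)
{
    char name[sizeof FS_NAME];
    int  serviced = 0;
    int  func = FUNC_INIT;

    for (;;) {
        strcpy(name, FS_NAME);       /* fresh copy before every call */
        if (fake_fsctl(name, func) != 0)
            break;                   /* init error or forced detach */
        serviced++;                  /* the dispatch call would go here */
        func = FUNC_NEXT;            /* later calls report completion */
    }
    return serviced;
}
```

Note that the same call both reports the previous result and waits for the next request, so the loop body never needs a semaphore of its own.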
Ring 0 Side of Control Program Initialization
As mentioned above, the ring 3 side of the control program initialization is very simple. The ring 0 side is a little more complicated, though. FS_FSCTL in RING0\R0COMM.C contains the code for the initialization. FS_FSCTL is used to provide an architected way to add IFS specific calls (sort of like the IOCTL interface for devices). There are three standard calls, which we just ignore for now. To those we add two new calls, FSCTL_FUNC_INIT and FSCTL_FUNC_NEXT. FSCTL_FUNC_INIT is called by the control program when it initializes. FSCTL_FUNC_NEXT is called when the control program has completed an operation and it's ready for the next one.
When FSCTL_FUNC_INIT is called, the first thing we do is check to see if the control program is already attached. If it is, we return an error code (this scenario could happen if the user tries to start a second copy of the control program). If the control program isn't already running, we wait until the BufLock semaphore is cleared. We do this because theoretically, we could run into the following situation: a request comes into the IFS and it starts servicing it. The control program is then detached, and then a new copy is run and tries to attach. The IFS is still in the middle of trying to service that request, however, and hasn't yet noticed the control program detached in the first place. It could be really bad if that ever did happen because the shared buffers would be corrupted, so we explicitly wait until the BufLock semaphore is clear, meaning that there are no threads using the shared buffers. We have to surround this with a check to see if the control program has ever been attached, because if it hasn't, the BufLock semaphore will not be initialized.
Next we verify that the buffer that was passed to us is the proper size and that it is addressable. We have to check addressability on everything that is passed in from a ring 3 program because if it is not addressable and we touch it anyway, we bring down the entire system.
Once addressability has been verified, we lock down the operation parameter area, and put the returned lock into the memory we allocated at FS_INIT time. Once that is done, we map the memory to the GDT selector that we allocated at FS_INIT time. We then do the same for the data buffer. Once these operations are complete, the memory can be shared between the IFS and the control program.
Once that is complete, we clear the BufLock semaphore to initialize the semaphore that indicates that the shared buffer is not being used by anyone. We then get the process ID of the control program. This is used by the FS_EXIT routine. FS_EXIT is called whenever any process terminates. We have it check the process ID of the process that is terminating against the process ID of the control program, so that if the control program unexpectedly terminates, we detach it properly.
After all that initialization is completed, CPAttached is set to 1 to indicate that the control program is attached. We then fall through to FSCTL_FUNC_NEXT. Since this function will be called every time an operation is completed, we first ensure that the control program is attached. If it's not, we return an error code. If it is attached, we first set the CmdReady semaphore to indicate that a command is no longer in the shared buffers (instead, results are in the buffers). We then clear CmdComplete to unblock the requesting thread (letting it know that its results are waiting). We then wait on the CmdReady semaphore, which will be cleared when a new operation is put into the shared buffers.
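The fall-through from FSCTL_FUNC_INIT into FSCTL_FUNC_NEXT can be sketched as a switch statement. This is a simplified model, not the FS_FSCTL code: the semaphores are plain flags, nothing blocks, and the numeric function and return codes are placeholders invented for the example.

```c
#include <stdbool.h>

enum { FSCTL_FUNC_INIT = 1, FSCTL_FUNC_NEXT = 2 };   /* placeholder values */
enum { RC_OK = 0, RC_ALREADY_ATTACHED, RC_NOT_ATTACHED, RC_BAD_FUNC };

struct State {
    int  CPAttached;      /* -1 never, 0 detached, 1 attached */
    bool cmdReadySet;     /* true = no command waiting in the buffers */
    bool cmdCompleteSet;  /* true = requester still waiting for results */
};

int fsctl_sketch(struct State *s, int func)
{
    switch (func) {
    case FSCTL_FUNC_INIT:
        if (s->CPAttached == 1)
            return RC_ALREADY_ATTACHED;  /* second copy of the program */
        /* ...validate, lock and map the shared buffers here... */
        s->CPAttached = 1;
        /* fall through */
    case FSCTL_FUNC_NEXT:
        if (s->CPAttached != 1)
            return RC_NOT_ATTACHED;
        s->cmdReadySet = true;       /* buffers hold results, not a command */
        s->cmdCompleteSet = false;   /* unblock the requesting thread */
        /* the real code now blocks until CmdReady is cleared again */
        return RC_OK;
    default:
        return RC_BAD_FUNC;
    }
}
```

Because initialization ends in the same wait as every completed operation, the control program sees no difference between its first call and its later ones.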
At any time, any of the semaphore calls could return ERROR_INTERRUPT if the user is trying to kill the control program. If that occurs, we detach the control program before returning the error code.
To detach the control program, we have to first set CPAttached to 0. We have to do it first to avoid possible deadlocks. We then unlock the shared memory buffers; if we don't do this, the control program will appear to die, but you will never be able to get rid of its window. Finally, we clear the CmdComplete semaphore so that if there is a request in progress, the requesting thread will unblock.
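The detach sequence, together with the process ID check that FS_EXIT performs, can be modelled as below. The structure and function names are invented for the example, and the unlock and semaphore operations are reduced to flag assignments; only the ordering is taken from the text.

```c
struct CpState {
    int      CPAttached;      /* -1 never, 0 detached, 1 attached */
    int      buffersLocked;   /* nonzero while the shared buffers are locked */
    int      cmdCompleteSet;  /* nonzero while a requester may be blocked */
    unsigned cpPid;           /* process ID saved at FSCTL_FUNC_INIT time */
};

void detach_control_program(struct CpState *s)
{
    s->CPAttached     = 0;    /* must come first, to avoid deadlocks */
    s->buffersLocked  = 0;    /* real code: unlock both shared buffers */
    s->cmdCompleteSet = 0;    /* wake any thread waiting for results */
}

/* Called for every terminating process, as FS_EXIT is. */
void on_process_exit(struct CpState *s, unsigned pid)
{
    if (s->CPAttached == 1 && pid == s->cpPid)
        detach_control_program(s);
}
```

A requesting thread that wakes up because CmdComplete was cleared here will then find CPAttached equal to 0 and know not to touch the buffers.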
An Example Call: Attaching a Drive
Before you can use a drive managed by your IFS, you have to attach it. This creates an association between a drive letter and the IFS. RING3\FSATT.C contains an example program that attaches a drive. It is basically a front end to the DosFSAttach() and DosQueryFSAttach() calls. With a little help from the Control Program Programming Reference, you should be able to figure it out easily.
The part that needs more explaining is the ring 0 side of the interface. When you issue a DosFSAttach() or DosQueryFSAttach(), the file system router calls the IFS's FS_ATTACH entry point (this can be found in RING0\R0STUBS.C). This code is basically a prototype for all of the FS_* calls that the IFS handles. It serializes access to the control program, does some preliminary validation of the parameters, sets up the argument block and passes it to the control program, waits until the control program executes the operation, and then returns the results of the operation. Once the details of this call are understood, all the others can be written pretty easily.
The first thing FS_ATTACH does is check to see if the control program is attached; if it isn't, it immediately returns an error code. If the control program is attached, it waits until it can get access to the shared buffers. It is possible to time out waiting for this access; if we do, we return an ERROR_NOT_READY return code to the caller.
Once we have access to the shared buffers, we wait until the control program completes the last operation it started. We have to do this because it is possible for a thread to give the control program a request to service, and then time out waiting for it to complete it. We could then have another thread come along and try to start a new request, but if the control program hasn't finished the last one yet, the shared buffers will get trashed because the IFS will be trying to put a new operation in them, and the control program will be trying to put the results of the last operation in them. Therefore we must wait until the control program has finished the last operation.
Once those verifications are completed, we check to make sure we can access the buffer that was passed in. For an attach or detach request, all we have to do is check for readability, but for the query attach request, we have to check writability.
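The choice between the two kinds of probe can be written as a one-line helper. The constant values below are given as I recall them from the IFS documentation and should be checked against FSD.H and FSH.H; the helper function itself is invented for the example and only picks the operation that would be passed to FSH_PROBEBUF.

```c
/* FS_ATTACH flag values and FSH_PROBEBUF operations (verify against
   FSD.H and FSH.H before relying on them). */
#define FSA_ATTACH       0
#define FSA_DETACH       1
#define FSA_ATTACH_INFO  2

#define PB_OPREAD        0
#define PB_OPWRITE       1

/* Query-attach writes results into the caller's buffer, so it needs a
   write probe; attach and detach only read from it. */
int probe_op_for_attach(unsigned flag)
{
    return (flag == FSA_ATTACH_INFO) ? PB_OPWRITE : PB_OPREAD;
}
```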
We then check that the control program is still attached. This check is crucial because during any of those semaphore or FSH_PROBEBUF calls we could've blocked, and the control program could've terminated. If it did, the shared buffers are no longer valid, and if we try to access them we will trap. It's for this reason that the CPAttached semaphore is an int and not a system semaphore - the semaphore calls don't guarantee that they won't block (i.e. they could block). To make absolutely sure, the only thing we can rely on is a semaphore implemented as an int (it's probably worthwhile to refresh your memory here that ring 0 code will never be multitasked - you have to explicitly give up the CPU).
Once we have verified that the control program is still attached, and thus our shared buffers are still valid, we set up the shared buffers with the operation's parameters. You can refer to R0R3SHAR.H (in either RING0 or RING3) for the data structure used. After that's complete, we clear the CmdReady semaphore to unblock the control program and indicate to it that a request is ready to be executed. We then block on CmdComplete waiting for the control program to execute our request. We specify a time-out to the wait so that we never get hung up on a faulty control program (if you never want to time out, you can change the value of MAXCPRESWAIT to -1). If we should time out, we release our hold on the shared buffer by clearing BufLock, and then return ERROR_NOT_READY.
After the wait returns and we check for a time out, we also check to make sure the control program is still attached. It is possible that while the control program was executing our request that it terminated (maybe we had a bug that caused it to trap). If so, the shared buffers are no longer accessible, so we return an error code to the caller. If all went well, we copy the results out of the result buffers. Note that while we are doing this, we can't do anything that could cause us to yield because the control program could terminate during that time.
After we copy the results out, we free up our hold on the shared buffers by clearing the BufLock semaphore and then return the error code that the control program told us to return.
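The whole request path can be condensed into a single-threaded model. This is a sketch under several assumptions, not the FS_ATTACH code: semaphores are plain flags, the blocking wait on CmdComplete is a function pointer so that different control program behaviours can be plugged in, ERR_CP_GONE is an invented error code, and releasing BufLock when the control program has died is a choice made for the sketch.

```c
#define NO_ERROR          0
#define ERROR_NOT_READY  21   /* OS/2 error code named in the article */
#define ERR_CP_GONE      -1   /* invented for this sketch */

struct Sim {
    int CPAttached;                     /* -1 never, 0 detached, 1 attached */
    int bufLockHeld;                    /* models BufLock */
    int op;                             /* stand-in for the parameter buffer */
    int rc;                             /* result from the control program */
    int (*wait_complete)(struct Sim *); /* models the CmdComplete wait;
                                           nonzero means it timed out */
};

/* Three possible behaviours of the control program during the wait. */
int cp_succeeds(struct Sim *s)  { s->rc = NO_ERROR; return 0; }
int cp_times_out(struct Sim *s) { (void)s; return 1; }
int cp_dies(struct Sim *s)      { s->CPAttached = 0; return 0; }

int request(struct Sim *s, int op)
{
    int rc;

    if (s->CPAttached != 1)
        return ERR_CP_GONE;
    if (s->bufLockHeld)             /* stands in for a timed-out BufLock wait */
        return ERROR_NOT_READY;
    s->bufLockHeld = 1;

    /* (wait for any previous operation to finish; probe caller buffers) */

    if (s->CPAttached != 1) {       /* re-check after anything that can block */
        s->bufLockHeld = 0;
        return ERR_CP_GONE;
    }
    s->op = op;                     /* fill the shared parameter buffer */

    /* clear CmdReady, then wait on CmdComplete with a time-out */
    if (s->wait_complete(s)) {
        s->bufLockHeld = 0;
        return ERROR_NOT_READY;
    }
    if (s->CPAttached != 1) {       /* control program died mid-request */
        s->bufLockHeld = 0;         /* assumption: the sketch releases here */
        return ERR_CP_GONE;
    }
    rc = s->rc;                     /* copy results out without yielding */
    s->bufLockHeld = 0;
    return rc;
}
```

Every exit path after the lock is taken releases it again, which is the property to preserve when this pattern is copied into the other FS_* routines.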
And That's About It
That about covers the communications between the ring 0 and ring 3 sides of an IFS. If you're daring, you now have all the basics to forge ahead and begin implementing this type of IFS. If this still seems a little scary, don't worry - in the next article I'll fill in all the rest of the routines to give you a true skeleton to work with, and start discussing how to implement the FS_* calls. I will also provide a state diagram that shows all of the various states the system can be in, along with the states of the semaphores, to show that no deadlocks will occur in the IFS no matter what happens (this is actually very important because a deadlock is extremely difficult to track down, so you're better off investing time up front making sure they will never occur than beating your head against a wall later trying to track one down).
I'd like to thank everyone who has written to encourage me to continue the series or with ideas for topics you'd like me to cover. Since the only pay I receive is your feedback, I hope you'll continue to write.
Dos APIs Callable at Initialization Time
The following Dos APIs are callable by the IFS at initialization time: