EDM/2 - OS/2 Installable File Systems

OS/2 Installable File Systems - Part 3/3


Introduction

Whew! Between my regular job (I don't call it my day job anymore, because it seems like I've been spending more and more nights there too), this month's code not working, Larry breathing down my neck for an article, and the joys of Christmas shopping, keeping sane recently has not been easy. I'll tell you, Murphy was right - if it wasn't for the last minute, a lot of things wouldn't get done.
Last time, I said that I'd present a state diagram that showed how the IFS/control program couldn't get deadlocked. I'm afraid that'll have to wait until next time. The code was very uncooperative this month, and I ended up not having enough time to finish it. Instead, I concentrated on getting the code into a real skeleton that you could just insert your own guts into, and also started inserting some guts for our versioning file system. In doing that, I of course ran into some problems...
But before I start my detective story, I'd like to make official what has been happening for the past few months: this series is officially bi-monthly. As much as I'd like to do a monthly series, I'm afraid I don't have the time. As always, though, if you have specific questions that you need answered before this series gets to them, feel free to write, and I'll try my best to explain.
Also, for those that don't know, I did an IPF version of the IFS document. It is available on Compuserve in the DAP section of the OS2DF2 forum (section 14), and is called IFSDOC.ZIP. You'll have to become a member of IBM's World Wide DAP to get access to that section, but the good news is joining is free and easy. Just GO OS2DAP on Compuserve and answer the questions. In about a week you should receive a welcome letter and have access to that section.
It used to work...

The code I presented last month was the beginnings of a skeleton for a ring 0/ring 3 IFS. The design and concept were all hashed out, but the actual skeleton code for all the ring 0 stubs and for the ring 3 router wasn't complete. That was the first thing I worked on - filling in all the missing skeleton code. The first thing I had to do was expand the union in R0R3SHAR.H that defines all the data passed between the IFS and control program to include structures for all the FS_ calls. Once that was done, the next thing to do was cut and paste the code for the ring 0 FS_ATTACH into all the other FS_ functions so that they would route their calls to the control program. Finally, the control program's router had to have entries added for all the other FS_ calls. Talk about work that's hard to get psyched for! Cutting and pasting is not my idea of a stimulating programming session.
Once that was done, the next thing to do was to actually start filling in some of those stubs. The first one to do is FS_ATTACH, since that is the first call this type of IFS will receive. After that, a good second one is FS_CHDIR, since directories are very basic to any IFS. The code I filled in isn't all that elaborate - just something to start testing with. At that point, though, I decided it was time to give it a whirl and see how it ran. After fixing a few syntax errors, it was time for the big moment...and it didn't work anymore!
OS/2 booted fine and loaded the IFS. The banner came up, and the system didn't crash. Things were looking good until I ran the ring 3 control program, when it failed trying to attach to the IFS. Funny, that worked before. Looking up the error code revealed it to be ERROR_INVALID_PARAMETER. I figured it was probably just a typo or something.
A week later I still hadn't a clue what was causing the problem. I narrowed it down to a call to VMLock in the IFS, but all its parameters looked good. I thought it might be that OS/2 had gone flaky on my system, so I tried it on another system at work. Same thing. I thought that maybe my assembly DevHlp functions were messing up, so I wrote direct calls to the DevHlps in the code. Same thing. I thought maybe the lock handle VMLock uses couldn't be in swappable memory, so I put it in movable, and then fixed memory. Same thing. Nothing seemed to want to cure this problem.
At that point I got the brilliant idea of taking the code from the last article, compiling and running it, and seeing what changes I'd made. Well, I found out much to my dismay, that the code from last month had the same problem. I considered how lucky I was that the flood of angry email letters everyone sent me decrying my foul code must have somehow gotten filtered, and never made it to my electronic mail box. So after counting my blessings, I decided to get a beer. Not that that would help me find the problem, but it would help me think up an excuse to tell Larry why my article would be late.
The C preprocessor strikes again

Funny thing about problems where you have no clue what the cause is - you kind of want to hibernate for a month or two and hope somebody else fixes it. Unfortunately, among the list of things I have, a co-author is not one of them, so I would have to solve this one myself.
I won't pretend that what I did next had any kind of methodical systematic approach behind it. My whole focus at that point was to just make the call work somehow, so I inserted a return right after the call and started playing with the parameters. The first break came when I could make the call work by doing two VMAlloc's before the call, one for the data block to be locked down, and the other for the memory to put the lock handle into. A little more playing narrowed it down to just the VMAlloc for the lock handle. If I kept the call to VMLock the same as it was originally, but inserted that VMAlloc before it, it worked fine. Okay, so that pointed to a problem with the VMAlloc for the lock handle, which was done in the initialization routine. I thought maybe its DevHlp routine was messed up and took a look at it. It looked fine. I decided to take a long look at the flags again to see if there was something I was missing, and while looking at the PDD reference, I decided to see if the values for the flags in the .H file matched those in the book. Lo and behold - VMAF_GLOBALSPACE was defined the opposite of what it should have been. The initialization routine was allocating memory for the lock handles in memory local to the process that calls the IFS at initialization time, instead of globally. VMLock had been complaining about not being able to address the lock handle! The whole problem was that I was looking at the code for the VMAlloc which clearly included the flag to allocate global memory, but the flag value was wrong. Bitten again by the C preprocessor.
Up and running, almost

Once I got that bug figured out, I was finally back to where I was before. The control program could load and attach to the IFS, and I could issue a DosFSAttach() call to associate a drive letter with the IFS. Now was the big chance to try and see if the new FS_CHDIR code would work. I typed CD T:\1 and got back an error code of ERROR_PATH_NOT_FOUND. Hmmm. That's strange. When I did the attach, I said for it to attach drive T to drive C, and I knew that directory existed on drive C. Time to add debugging code. At least since the problem was in the ring 3 piece, I didn't have to reboot after every recompile. Waiting for OS/2 to shut down and restart every ten minutes was getting to be tiring.
After the debugging code was added, I found that the FS_CHDIR call was failing because DosQueryPathInfo() said that the directory I was passing in wasn't a directory. Double hmmm. I decided to have the control program print out what attributes DosQueryPathInfo() thought my directory had. It seems DosQueryPathInfo() was being very generous and saying my directory had lots of attributes, even some of the reserved ones, but not the directory attribute. Running it with Turbo Debugger and looking at the structure showed the same thing. This was too weird. I decided to pull the DosQueryPathInfo() call out into a little test program and add some code to dump the raw data that DosQueryPathInfo() returned. The code looked like this:
    static FILESTATUS3 buf;
    int i;
    :
    for (i=0; i<sizeof(buf); i++) {
        printf("%u ", ((unsigned char *)&buf)[i]);
    }

Now something else weird was happening - it was printing out too many bytes. So I played with it a little, trying some variations before I just decided to look at the data.
Well, to understand the data, you have to know the structure, so I looked it up in the Control Program reference. A FILESTATUS3 has 3 FDATE and FTIME structures at the head of the structure, followed by two fields related to the file size, and finally the file attributes. Looking up an FDATE in the book said it consists of 3 USHORT's, one each for day, month, and year. Okay, fine, but now the dump code wasn't printing enough data. Time to look at the header files.
Looking at the BSEDOS.H header file in the 2.1 Toolkit showed the following definition for an FDATE (FTIME was similar):
    #ifdef __IBMC__
    typedef struct _FDATE    /* fdate */
    {
        UINT day   : 5;
        UINT month : 4;
        UINT year  : 7;
    } FDATE;
    typedef FDATE *PFDATE;
    #else
    typedef struct _FDATE    /* fdate */
    {
        USHORT day   : 5;
        USHORT month : 4;
        USHORT year  : 7;
    } FDATE;
    typedef FDATE *PFDATE;
    #endif

Now I see where the Control Program reference was wrong - it left off those important little ': 5' things on the end. An FDATE wasn't 3 USHORT's, it was one! After that discovery, I got to wondering what the definition was in the header files Borland ships with its compiler. So I pulled up BSEDOS.H in the bcos2\include directory, and found this:
    #if defined(__IBMC__) || defined(__BORLANDC__)
    typedef struct _FDATE    /* fdate */
    {
        UINT day   : 5;
        UINT month : 4;
        UINT year  : 7;
    } FDATE;
    typedef FDATE *PFDATE;
    #else
    typedef struct _FDATE    /* fdate */
    {
        USHORT day   : 5;
        USHORT month : 4;
        USHORT year  : 7;
    } FDATE;
    typedef FDATE *PFDATE;
    #endif

Well, guess what, a UINT is not the same size as a USHORT in 32-bit code! It seems someone at Borland forgot that. As a result, every structure that used an FDATE or FTIME was messed up (including FILESTATUS3, which was what I was trying to use). Getting rid of the '||' and everything after it did the trick - now DosQueryPathInfo() was working like it should. For your reference, lines 399 and 418 in the BSEDOS.H that Borland ships need to have this fix applied.
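One way to stay out of this kind of trouble is to not depend on how a compiler lays out bit-fields at all. The following is a minimal sketch, not code from the IFS: it unpacks a raw 16-bit date word with shifts and masks, and adds a compile-time check that a FILESTATUS3-shaped structure really is 24 bytes. The sketch_ names are made up for illustration, the "years since 1980" bias and the low-bit-first field order are assumptions based on the usual FAT-style date layout, and the field names follow the toolkit's FILESTATUS3 as I understand it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked form of a date, used only by this sketch. */
typedef struct {
    unsigned day;    /* 1..31 */
    unsigned month;  /* 1..12 */
    unsigned year;   /* full year, e.g. 1993 */
} sketch_date;

/* Unpack a 16-bit date word without bit-fields.
   Assumed layout: day in bits 0-4, month in bits 5-8,
   years since 1980 in bits 9-15. */
sketch_date sketch_unpack_fdate(uint16_t raw)
{
    sketch_date d;
    d.day   = raw & 0x1Fu;
    d.month = (raw >> 5) & 0x0Fu;
    d.year  = 1980u + ((raw >> 9) & 0x7Fu);
    return d;
}

/* A FILESTATUS3-shaped record with every date and time held as one
   16-bit word: six words (12 bytes) plus three 32-bit fields. */
typedef struct {
    uint16_t fdateCreation,   ftimeCreation;
    uint16_t fdateLastAccess, ftimeLastAccess;
    uint16_t fdateLastWrite,  ftimeLastWrite;
    uint32_t cbFile;
    uint32_t cbFileAlloc;
    uint32_t attrFile;
} sketch_filestatus3;

/* Old-style compile-time check: the array size goes negative, and the
   build fails, if the structure is not the expected 24 bytes. */
typedef char sketch_size_check[(sizeof(sketch_filestatus3) == 24) ? 1 : -1];
```

A check like the last line, applied to the real FILESTATUS3, would have flagged the Borland header at compile time instead of at run time.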
Debugging tools

There were some other problems after those two, but they weren't nearly as interesting. One thing I learned that I did find very interesting though, is that FS_EXIT in the IFS is not called if the ring 3 control program traps. That turns out to be unfortunate because OS/2 won't put up the trap info, or let the program die, until the memory that the IFS VMLock'ed has been VMUnlock'ed, which is the whole purpose of the code for FS_EXIT in the IFS. Because of that limitation, I'm considering modifying the design so that the IFS allocates the shared memory instead of the control program, but that will have to wait until next time. In the meantime, I added another operation to the IFS's FS_FSCTL entry point, FSCTL_FUNC_KILL, which will unlock the shared memory and clear the semaphores. FSKILL.C is a utility program that will make that call. This is extremely handy when debugging because you're bound to get things hosed up, and it's nice to have a way to unhose them besides rebooting.
I also added another new program, FSDET.C, which will detach a drive from the IFS. After running FSATT a few times and associating a bunch of drive letters with the IFS, it's nice to have a way to disassociate those drive letters, too.
FS_ATTACH processing

Finally, this month, I'd like to talk about how FS_ATTACH and FS_CHDIR should be implemented. Unfortunately, the IFS reference isn't real clear on how they should work.
FS_ATTACH is the function that is called whenever OS/2 needs to associate or disassociate a drive letter with your IFS. It's also called when OS/2 is querying information about a particular attach. FS_ATTACH supports the DosFSAttach() and DosQueryFSAttach() calls.
When OS/2 wants to attach a drive, FS_ATTACH is called with flag set to FSA_ATTACH. pDev will contain the drive letter in the form 'R:', and pParm and pLen will correspond to the pDataBuffer and ulDataBufferLen parameters on DosFSAttach(). For our IFS, this area will contain a directory that the drive letter should be associated with.
The first thing the IFS has to do is verify that pDev contains a drive letter, and not a device name. IFS's can also be used to write character mode device drivers, and for that type of IFS, FS_ATTACH is used to associate the IFS with a device driver name. All you have to do to check is see if the first character of pDev is a '\' - if it is, somebody is trying to attach a device driver name to the IFS, so you should just return with ERROR_NOT_SUPPORTED.
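As a rough sketch of that first check (not the actual code from this month's source), the test on pDev can be as small as the function below. The function name is hypothetical, and the error value is defined locally so the fragment stands alone; in a real build it would come from the toolkit's error header, where I believe ERROR_NOT_SUPPORTED is 50.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins so the sketch compiles on its own; a real IFS would
   take these from the toolkit headers instead. */
#define SKETCH_NO_ERROR             0
#define SKETCH_ERROR_NOT_SUPPORTED 50

/* Decide whether an attach request names a drive ("R:") or a device
   (anything starting with a backslash). Only drives are accepted. */
int sketch_check_attach_target(const char *pDev)
{
    assert(pDev != NULL);

    if (pDev[0] == '\\') {
        /* Somebody is trying to attach a device name to the IFS. */
        return SKETCH_ERROR_NOT_SUPPORTED;
    }
    return SKETCH_NO_ERROR;
}
```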
The next thing you would do if you were writing a pure ring 0 IFS would be to verify that you can access the pParm data area, since OS/2 does not do that automatically for you. In our split ring 0/ring 3 IFS, the ring 0 side handles that automatically, so the ring 3 side doesn't have to worry about it.
Next, the IFS can proceed with the attach processing. For our IFS, we do a DosQueryPathInfo() to check that the directory the user passed in the pParm area exists and is really a directory. If it is, we allocate enough memory to hold the directory and copy it in there. We then use part of the CDFSD to hold the pointer to that memory.
So what's a CDFSD? An IFS has four types of structures it deals with, one each for volume parameters, current directories, open files, and file searches. Each one of those structures has a file system independent part, and a file system dependent part. The file system independent part is the same across every IFS, while each IFS can use the file system dependent part however it wants. So to answer the question, a CDFSD is a file system dependent current directory, and we use it to store a pointer to the text of the directory the user asked us to attach to. Descriptions of these structures can be found in the IFS reference, although some of the fields are a bit difficult to understand. As we need them, though, I'll explain what each one does.
At attach time, an IFS should also fill in the VPFSD, which is the file system dependent volume parameters. For our IFS, all we need is the same pointer that's in the CDFSD. This will be used on the FS_FSINFO and FS_FLUSHBUF calls when we implement those.
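The bookkeeping described above might look something like the sketch below. It is only an illustration: the two structures are hypothetical overlays for the file system dependent areas (the real ones are opaque blocks whose layout each IFS chooses), plain malloc() stands in for whatever allocator the control program uses, and the directory is assumed to have been validated already.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts for the file system dependent areas. */
typedef struct { char *Dir; } sketch_cdfsd;   /* current directory part */
typedef struct { char *Dir; } sketch_vpfsd;   /* volume parameter part  */

#define SKETCH_NO_ERROR                0
#define SKETCH_ERROR_NOT_ENOUGH_MEMORY 8   /* local stand-in for the toolkit value */

/* Remember the directory a drive letter was attached to. One private
   copy of the text is made; both FSD areas point at that same copy. */
int sketch_attach(sketch_vpfsd *vp, sketch_cdfsd *cd, const char *dir)
{
    size_t len;
    char *copy;

    assert(vp != NULL && cd != NULL && dir != NULL);

    len = strlen(dir) + 1;              /* include the terminating NUL */
    copy = malloc(len);
    if (copy == NULL) {
        return SKETCH_ERROR_NOT_ENOUGH_MEMORY;
    }
    memcpy(copy, dir, len);

    cd->Dir = copy;                     /* used by FS_CHDIR later       */
    vp->Dir = copy;                     /* used by volume-level calls   */
    return SKETCH_NO_ERROR;
}
```

Because both areas share one allocation, whichever code eventually frees it has to free it exactly once.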
For a detach, flag will be set to FSA_DETACH. The IFS should flush any remaining buffers and deallocate any memory associated with the drive letter. Currently, our IFS doesn't do a free on the memory we allocated at attach time because it could get into trouble. Consider this scenario: a user runs the control program and attaches a drive (the control program allocates memory to hold the directory). The user then kills the control program without detaching the drive, and then re-runs it. At this point, OS/2 still considers the attach the user did to be valid, even though the control program wouldn't. Suppose now the user does a detach. If the control program tried to free the memory in the CDFSD, it would trap because it never allocated that memory - the previous instance of the control program did. There is a way around that problem, but for now we will just not free the memory.
For a query, FS_ATTACH will be called with flag set to FSA_ATTACH_INFO. pParm will point to a structure that looks like this:
    struct attach_info {
        unsigned short cbFSAData;
        char rgFSAData[1];
    };

The IFS should fill in the rgFSAData with whatever information it wants to return, and cbFSAData with the length of the information. Our IFS just returns the directory that the user originally specified when he attached. The information a particular IFS returns is completely up to the IFS, but generally it is some sort of ASCII text. To see what other IFS's return, you can use FSATT with no parameters - it'll go through all the drive letters, do a DosQueryFSAttach() on each, and print out what it returns.
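Filling in that structure could be sketched as follows. This is an illustration under a few assumptions that are mine rather than the IFS reference's: that the returned length counts the terminating NUL, that a too-small buffer is reported with ERROR_BUFFER_OVERFLOW (111, defined locally here), and that *pLen is updated with the size needed or used. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct attach_info {
    unsigned short cbFSAData;
    char rgFSAData[1];
};

#define SKETCH_NO_ERROR               0
#define SKETCH_ERROR_BUFFER_OVERFLOW 111   /* local stand-in for the toolkit value */

/* Return the attached directory through the caller's buffer.
   pParm is the buffer, *pLen its size in bytes on entry. */
int sketch_fill_attach_info(void *pParm, unsigned short *pLen, const char *dir)
{
    size_t text, need;
    unsigned short cb;

    assert(pParm != NULL && pLen != NULL && dir != NULL);

    text = strlen(dir) + 1;                                /* text plus NUL   */
    need = offsetof(struct attach_info, rgFSAData) + text; /* header plus text */

    if (need > 0xFFFFu) {
        return SKETCH_ERROR_BUFFER_OVERFLOW;   /* cannot be described in 16 bits */
    }
    if (*pLen < need) {
        *pLen = (unsigned short)need;          /* tell the caller what it takes */
        return SKETCH_ERROR_BUFFER_OVERFLOW;
    }

    /* Copy field by field so the caller's buffer needs no special alignment. */
    cb = (unsigned short)text;
    memcpy((char *)pParm + offsetof(struct attach_info, cbFSAData), &cb, sizeof cb);
    memcpy((char *)pParm + offsetof(struct attach_info, rgFSAData), dir, text);

    *pLen = (unsigned short)need;
    return SKETCH_NO_ERROR;
}
```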
FS_CHDIR

OS/2 handles current directories extremely intelligently. It treats a current directory not as just some piece of text, but as something that has to be allocated and freed every time a user switches directories. If two different processes are in the same current directory, OS/2 can share a single current directory structure between them and will free it only when both processes no longer are using it. It also only keeps one current directory structure for the root directory (the one that we filled in at FS_ATTACH time). This makes FS_CHDIR a little more challenging to implement, but not too terribly complicated.
When a user issues a CD command, OS/2 will call FS_CHDIR with flag set to CD_EXPLICIT. This indicates to the IFS that it should mutate the CDFSD that it receives into the CDFSD appropriate for the directory being switched to. What OS/2 actually does is it copies the old CDFSD into new memory and calls the IFS pointing to the new memory. The IFS should do whatever verification it needs to do, and then change the CDFSD appropriately, unless we are switching to the root directory. If we are switching to the root directory, OS/2 will throw away whatever the IFS returns in CDFSD, and use the CDFSD that the IFS set up at FS_ATTACH time. The IFS should not free any resources kept with the old CDFSD.
The text of the new current directory is passed in the pDir pointer, and to help optimize processing, iCurDirEnd points to the end of the old current directory in pDir. For example, if the old current directory was "F:\DIRECT1", and we're switching to "F:\DIRECT1\DIRECT2", iCurDirEnd will point to the second '\'. If iCurDirEnd is -1, it means that the user CD'd up the tree, such as from "F:\LEVEL1\LEVEL2" to "F:\LEVEL1", or to "F:\LEVEL1\LEVEL2A", and this optimization isn't possible.
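To make the iCurDirEnd optimization concrete, here is a small sketch of how an IFS might pick out the part of pDir that still needs checking. The function name is hypothetical, iCurDirEnd is taken as a plain int with -1 meaning "no shortcut" (the real parameter type may differ), and an out-of-range offset is treated the same as -1 out of caution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the portion of pDir that was not part of the old current
   directory. If no shortcut is available, the whole path is returned
   and everything has to be verified from the top. */
const char *sketch_unverified_tail(const char *pDir, int iCurDirEnd)
{
    assert(pDir != NULL);

    if (iCurDirEnd < 0 || (size_t)iCurDirEnd > strlen(pDir)) {
        return pDir;                 /* CD'd up or sideways: check it all */
    }
    return pDir + iCurDirEnd;        /* only the new trailing part        */
}
```

With the example from the text, an offset of 10 into "F:\DIRECT1\DIRECT2" lands on the second backslash, so only "\DIRECT2" is left to look at.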
What our IFS does for this call is simply verify that the directory the user is trying to switch to exists and really is a directory. We don't actually keep the text of the directory around, since we can recreate it whenever we need to by concatenating the Dir field of the CDFSD and the cdi_curdir field of the CDFSI (cdi_curdir is the text of the current directory).
When OS/2 wants to free a current directory structure, it calls FS_CHDIR with flag set to CD_FREE. The IFS should release any resources associated with the current directory passed in. For us, we never allocated anything, so we don't need to free anything.
Finally, if OS/2 wants to verify a current directory, it calls FS_CHDIR with flag set to CD_VERIFY. The IFS should verify that the current directory is a valid directory name, and that it exists. If it isn't a valid directory, it should release any resources associated with the current directory (i.e. it should pretend it got an FS_CHDIR/CD_FREE call). Our IFS simply calls DosQueryPathInfo() to verify that the directory still exists and that in fact it is a directory.
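The concatenation mentioned above could be sketched like this. It assumes, and this is my assumption rather than something the IFS reference spells out here, that cdi_curdir carries the attached drive letter in front (as in "T:\1"), so the drive prefix is skipped before the remainder is appended to the attached directory. The function name and return convention are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rebuild the real path from the directory stored at attach time and
   the current directory text on the attached drive.
   Returns 0 on success, -1 if the result does not fit in out. */
int sketch_build_path(char *out, size_t outlen,
                      const char *attachedDir, const char *curdir)
{
    const char *rest = curdir;
    size_t baselen;
    int n;

    assert(out != NULL && attachedDir != NULL && curdir != NULL);

    /* Skip a leading drive specifier such as "T:". */
    if (rest[0] != '\0' && rest[1] == ':') {
        rest += 2;
    }

    /* Avoid a doubled backslash when the attached directory is a root. */
    baselen = strlen(attachedDir);
    if (baselen > 0 && attachedDir[baselen - 1] == '\\' && rest[0] == '\\') {
        rest++;
    }

    n = snprintf(out, outlen, "%s%s", attachedDir, rest);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```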
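Pulling the three cases together, the ring 3 side of FS_CHDIR for an IFS like ours could be shaped roughly as below. This is a sketch under loud assumptions: the flag names and enum are local to the example rather than the toolkit's CD_ constants, the error values are local stand-ins, and POSIX stat() takes the place of DosQueryPathInfo() so the fragment can be tried outside OS/2.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Local stand-ins for the FS_CHDIR flag values and error codes. */
enum sketch_cd_flag { SKETCH_CD_EXPLICIT, SKETCH_CD_VERIFY, SKETCH_CD_FREE };

#define SKETCH_NO_ERROR                 0
#define SKETCH_ERROR_PATH_NOT_FOUND     3
#define SKETCH_ERROR_INVALID_PARAMETER 87

/* Stand-in for the DosQueryPathInfo() check: nonzero if path exists
   and is a directory. */
int sketch_is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Handle one FS_CHDIR request. path is the already rebuilt real path;
   it is not looked at for SKETCH_CD_FREE. */
int sketch_chdir(enum sketch_cd_flag flag, const char *path)
{
    switch (flag) {
    case SKETCH_CD_EXPLICIT:   /* user typed CD: check the target       */
    case SKETCH_CD_VERIFY:     /* OS/2 asks if the directory still exists */
        assert(path != NULL);
        return sketch_is_directory(path) ? SKETCH_NO_ERROR
                                         : SKETCH_ERROR_PATH_NOT_FOUND;
    case SKETCH_CD_FREE:       /* nothing was allocated, nothing to free */
        return SKETCH_NO_ERROR;
    }
    return SKETCH_ERROR_INVALID_PARAMETER;   /* unknown flag value */
}
```

An IFS that did keep per-directory resources would release them in the CD_FREE case, and also in the CD_VERIFY case when the check fails.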
Next Time

Well, that's it for now. Next time I'll go over that state diagram, and also start talking about the FS_FIND* routines. At that point we'll have an IFS where you'll be able to switch directories and do DIR's. Good stuff!
[Note: Unfortunately this series never picked up again after this installment, so this is the last in the series. Ed.]