Memory-mapped files in OS/2

From EDM2

by Sergey I. Yevtushenko

Foreword

This article describes a small library that implements memory-mapped files in OS/2.

Overview of Memory-Mapped Files

Memory-mapped files are a technique for accessing files as if they were ordinary RAM. This method is frequently cited as one of the advantages of the APIs of some flavours of Unix or of Windows NT, but OS/2 is capable of it as well, although the API does not offer it directly.

Accessing a file like RAM can be done in two ways: load the whole file into memory and work on the copy, or use the virtual memory engine to load parts of the file as the application touches them. The first method is well known, but it requires large amounts of memory for large files, is slow, and of course does not map memory directly onto the file. The second method is much more interesting. It does not require the whole file to be loaded, and the mapping between a position in the file and a memory address is direct. This technique is called a "Memory Mapped File", or MMF.

Because accessing files in this way requires the support of the virtual memory engine, you might expect that it has to be implemented in the OS/2 kernel, but this is not so. The OS/2 API provides all the lower-level calls needed to implement it at the application level.

Let's look at this technique. To implement it we need three things: the ability to allocate address space, the ability to change the status of parts of that address space (map them to real memory, i.e. "commit" them), and a notification when we try to access the address space at a point that is not yet committed. For the first two the OS/2 API provides the DosAllocMem and DosSetMem functions. The third is provided by the exception handling API.

Step-by-step Implementation of Memory Mapped Files

0. Install exception handler
1. Open file
2. Determine file size
3. Allocate address space big enough to hold the file.
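
Step 3 works in whole pages, because memory can only be committed and protected page by page. The following is a minimal sketch of the size calculation, assuming the 4K page size mentioned later in this article. The names `MMF_PAGE_SIZE`, `mmf_page_count` and `mmf_region_size` are hypothetical and are not part of the MMF library.

```c
#include <stddef.h>

/* Assumption: 4 KB pages. Hypothetical name, not taken from mmf.h. */
#define MMF_PAGE_SIZE 4096u

/* Number of pages needed to hold file_size bytes (0 for an empty file). */
static size_t mmf_page_count(size_t file_size)
{
    return file_size / MMF_PAGE_SIZE + (file_size % MMF_PAGE_SIZE != 0);
}

/* Size of the address range to reserve: file size rounded up to a page. */
static size_t mmf_region_size(size_t file_size)
{
    return mmf_page_count(file_size) * MMF_PAGE_SIZE;
}
```

For a 5000-byte file this gives two pages and an 8192-byte region. The tail of the last page has no file data behind it.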

The exception handler is installed by the MMF library function DosInitMMF. This function must be called before any other call to the MMF library. Because it requires a pointer to a structure on the application stack (to be exact, this is a requirement of DosSetExceptionHandler), it cannot be called before main() by any technique that is portable across compilers (for example, C++ static objects). Therefore the beginning of main() should look like this:

{
    MMFINIT RegRec = {0,0};
    ....

    DosInitMMF(&RegRec);
    ....

The remaining steps are performed inside the DosAllocMMF library function. It returns a pointer to the allocated memory region, and you can access this memory directly. Moreover, any library function that accepts a pointer to memory will work without knowing where the memory really comes from. You may ask, "Where is the trick?" Well, there is no trick at all: the pointer returned by DosAllocMMF simply needs additional processing before an access through it can succeed without causing a trap or reading garbage. This processing is deferred until the memory is actually accessed, and it is done in the exception handler installed by DosInitMMF. Because this is the most interesting part of the library, we will look inside it in more detail:

ULONG APIENTRY PageFaultHandler(PEXCEPTIONREPORTRECORD p1,
                                PEXCEPTIONREGISTRATIONRECORD p2,
                                PCONTEXTRECORD p3,
                                PVOID pv)
{
    if( p1->ExceptionNum == XCPT_ACCESS_VIOLATION
      && ( p1->ExceptionInfo[0] == XCPT_WRITE_ACCESS
        || p1->ExceptionInfo[0] == XCPT_READ_ACCESS))
    {
        PMMF   pMMF   = 0;
        PVOID  pPage  = 0;
        APIRET rc     = NO_ERROR;
        ULONG  ulFlag = 0;
        ULONG  ulSize = PAG_SIZE;

        pMMF = Locate((void *)p1->ExceptionInfo[1]);

        if(!pMMF)
            return XCPT_CONTINUE_SEARCH;

        pPage = (PVOID)(p1->ExceptionInfo[1] & PAG_MASK);

/* Query affected page flags */

        rc = DosQueryMem(pPage, &ulSize, &ulFlag);

        if(rc)
            return XCPT_CONTINUE_SEARCH;

/*
** There can be three cases:
**
**  1. We try to read page              - always OK, commit it
**  2. We try to write committed page   - OK if READ/WRITE mode
**  3. We try to write uncommitted page - OK if READ/WRITE mode
**                                        but we need to commit it.
*/

/* reject writes (cases 2 and 3) if the file is not mapped in READ/WRITE mode */

        if(p1->ExceptionInfo[0] == XCPT_WRITE_ACCESS
          && !(pMMF->ulFlags & MMF_READWRITE))
        {
            return XCPT_CONTINUE_SEARCH;
        }

/* if page not committed, commit it, load it from the file and mark as readonly */

        if(!(ulFlag & PAG_COMMIT))
        {
            ULONG ulTemp = 0;

            rc = DosSetMem(pPage, PAG_SIZE, PAG_COMMIT | PAG_READ | PAG_WRITE);

            if(rc)
                return XCPT_CONTINUE_SEARCH;

            /* set position */

            rc = DosSetFilePtr(pMMF->hFile,
                               (ULONG)pPage - (ULONG)pMMF->pData,
                               FILE_BEGIN,
                               &ulTemp);

            /* read page from disk */

            if(!rc)  /* Actually ignore errors here */
            {
                rc = DosRead(pMMF->hFile,
                             pPage,
                             PAG_SIZE,
                             &ulTemp);
            }

            rc = DosSetMem(pPage, PAG_SIZE, PAG_READ);

            if(rc)
                return XCPT_CONTINUE_SEARCH;

        }

/* if page is accessed for writing - mark it writable */

        if(p1->ExceptionInfo[0] == XCPT_WRITE_ACCESS)
        {
            rc = DosSetMem(pPage, PAG_SIZE, PAG_READ | PAG_WRITE);

            /* do not retry the instruction if the page could not be made writable */
            if(rc)
                return XCPT_CONTINUE_SEARCH;
        }
        return XCPT_CONTINUE_EXECUTION;
    }
    return XCPT_CONTINUE_SEARCH;
}

Comments make this piece of code almost self-explanatory, but two things should be noted. First, exceptions that are not related to MMF are filtered out and can therefore be handled by another exception handler. Second, opening files in read-only mode can be useful for accessing large amounts of static data. As you can see, this handler does most of the work of reading parts of the file on application demand. It does not, however, discard pages that have not been accessed for a long time. You can improve this implementation by adding this feature. A rudimentary part of this idea is already in place: pages accessed only for reading are distinguished from pages accessed for writing.
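
The decisions the handler makes can be separated from the OS/2 calls and looked at on their own. The following sketch models them as pure functions. It is an illustration, not code from the library: all names are hypothetical, the page size is assumed to be 4K, and the mapping base is assumed to be page-aligned.

```c
#include <stdint.h>

#define MMF_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

typedef enum { ACCESS_READ, ACCESS_WRITE } mmf_access;

typedef enum {
    ACTION_REJECT,        /* pass the exception on (XCPT_CONTINUE_SEARCH)    */
    ACTION_NONE,          /* page already usable: just retry the instruction */
    ACTION_LOAD_READONLY, /* commit, read from file, leave read-only         */
    ACTION_LOAD_WRITABLE, /* commit, read from file, then make writable      */
    ACTION_MAKE_WRITABLE  /* already loaded: only add write permission       */
} mmf_action;

/* Start address of the page that contains addr. */
static uintptr_t mmf_page_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(MMF_PAGE_SIZE - 1);
}

/* File position backing the page that contains addr, for a mapping
   that starts at the page-aligned address base. */
static uintptr_t mmf_file_offset(uintptr_t base, uintptr_t addr)
{
    return mmf_page_base(addr) - base;
}

/* Mirrors the order of the checks in the handler: reject forbidden
   writes first, then load missing pages, then upgrade to writable. */
static mmf_action mmf_decide(mmf_access access, int committed, int readwrite)
{
    if (access == ACCESS_WRITE && !readwrite)
        return ACTION_REJECT;
    if (!committed)
        return access == ACCESS_WRITE ? ACTION_LOAD_WRITABLE
                                      : ACTION_LOAD_READONLY;
    return access == ACCESS_WRITE ? ACTION_MAKE_WRITABLE : ACTION_NONE;
}
```

In this model a page that is still read-only has never been written, so only pages that reached a writable state can differ from the file.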

Example Program

/*
** Module   :MMF_TEST.C
** Abstract :Memory mapped file API example
**
** Copyright (C) Sergey I. Yevtushenko
** Log: Wed  03/09/97   Created
*/

#include <mmf.h>
#include <stdio.h>
#include <string.h>

int main(VOID)
{
    MMFINIT RegRec = {0,0};
    char * data;
    char * data2;
    int rc;

    DosInitMMF(&RegRec);

    rc = DosAllocMMF("datafile", (void **)&data, MMF_READWRITE);
    printf("rc = %d\n", rc);

/* Try to uncomment these lines in any order and compare program output
   and 'datafile'
*/
    printf("Data: %s\n", data);

/*    data[4096] = '\0'; */

/*    printf("Data: %s\n", data); */

/* ATTENTION !!! Uncommenting following line will update file 'datafile' */
/*    DosUpdateMMF(data);                                                */

    DosFreeMMF(data);

    return 0;
}

This example opens "datafile" for reading and writing and prints the contents of the file using printf. To become more familiar with MMF, you can experiment with this code by uncommenting some of the commented-out lines. Before trying the example, don't forget to create a file named "datafile" filled with, for example, text, and be sure to make it long enough (5K or so) that offset 4096, which lies in the second page, falls inside the file.

General Considerations for Using Memory Mapped Files

First of all, note that this technique is not recommended for bulk access to files. It reads files in 4K pages, and in some cases this size is too small to be efficient. It is much better suited to large files of which only a small portion is actually read into memory. An example of appropriate use of MMF is the file viewer built into the ZTreeBold file manager written by Kim Henkel.
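
To get a feeling for the cost, it helps to count how many pages an access pattern touches, since each newly touched page costs one exception plus one seek and one read in the handler shown earlier. This is a rough sketch, assuming 4K pages; `mmf_pages_touched` is a hypothetical helper, not a library function.

```c
#include <stddef.h>

#define MMF_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* Number of distinct pages covered by length bytes starting at offset. */
static size_t mmf_pages_touched(size_t offset, size_t length)
{
    size_t first;
    size_t last;

    if (length == 0)
        return 0;

    first = offset / MMF_PAGE_SIZE;                /* page of first byte */
    last  = (offset + length - 1) / MMF_PAGE_SIZE; /* page of last byte  */
    return last - first + 1;
}
```

Scanning a 1 MB file from start to end touches 256 pages, while a viewer that shows 2000 bytes at offset 1000000 touches a single page regardless of the file size.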

Final Words

The implementation of memory-mapped files in OS/2 has some advantages and some disadvantages. One advantage is the complete flexibility of the implementation; it is much more flexible than, for example, the NT one. If you wish, you can limit the amount of real memory used by MMF, on a "per file" or a "per application" basis as you choose. One disadvantage of the implementation of MMF in OS/2 is more serious: the address space of each application is limited by the 512 MB memory barrier (to be exact, slightly less, around 400 MB). This has been changed in Warp Server SMP, however, and will probably go away with the release of Aurora and any future versions of the client.
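
One possible way to limit the real memory used per file is to remember which pages are committed and to give up the oldest one whenever the limit is reached. The sketch below shows only this bookkeeping, under several assumptions: the names and the limit of four pages are hypothetical, the function is called once each time the handler commits a page, and plain first-in, first-out order stands in for a real "not accessed for a long time" policy. In an actual handler the returned page would still have to be written back if it was modified, and then decommitted.

```c
#include <stddef.h>

#define MMF_MAX_RESIDENT 4            /* hypothetical per-file limit, in pages */
#define MMF_NO_PAGE ((size_t)-1)      /* "nothing to evict" marker */

typedef struct {
    size_t pages[MMF_MAX_RESIDENT];   /* ring buffer of committed page indexes */
    size_t head;                      /* slot that holds the oldest page */
    size_t count;                     /* number of pages currently recorded */
} mmf_resident;

/* Record that page has just been committed. Returns the index of the
   page to evict, or MMF_NO_PAGE while the limit has not been reached. */
static size_t mmf_track_commit(mmf_resident *r, size_t page)
{
    size_t victim = MMF_NO_PAGE;

    if (r->count == MMF_MAX_RESIDENT) {
        victim = r->pages[r->head];   /* oldest page leaves */
        r->pages[r->head] = page;     /* new page takes its slot */
        r->head = (r->head + 1) % MMF_MAX_RESIDENT;
    } else {
        r->pages[(r->head + r->count) % MMF_MAX_RESIDENT] = page;
        r->count++;
    }
    return victim;
}
```

A "per application" limit would use one such structure shared by all mapped files instead of one per file.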

In any case, if you find the MMF library useful for your purposes, you can freely use it without any limitations. The usual disclaimer applies: use it at your own risk, there are no warranties at all, and so on.

Credits

Many thanks to Dmitry Niqiforoff (dniq@hippo.ru). This implementation of MMF is the result of a discussion with him.