Memory Mapped File I/O for OS/2: An Alternative Solution


by Christopher Matthews

Currently, OS/2 developers cannot take advantage of memory mapped file I/O due to limitations in the file system. This article suggests an alternative solution that gives you the performance of memory mapped file I/O until the support is available natively. Let's start with the basics: what is memory mapped file I/O, and what does it buy me?

Memory mapped file I/O is the ability to access a file's contents directly in memory by referencing a section of the file through a pointer, instead of using the read() and write() routines that copy data through a buffer. Memory mapped file I/O routines generally return a pointer to a virtual address at which the file's contents appear. Several operating systems already support memory mapped file I/O, for instance AIX and other UNIX systems.
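On those systems this is typically done with the POSIX mmap() call. Here is a minimal sketch for comparison; the file name mydata.dat is made up for illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;
        char *p;
        int fd;

        fd = open("mydata.dat", O_RDONLY);    /* hypothetical data file */
        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;

        /* Map the whole file R/O; pages are read in on first access. */
        p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        /* The file's bytes are now addressable through the pointer. */
        printf("first byte: 0x%02x\n", (unsigned char)p[0]);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }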

The benefit of using memory mapped file I/O is that it eliminates the overhead of going through the file system APIs after the initial setup. Accesses to the file are caught as page fault exceptions, and the corresponding pages are mapped directly into memory at the location specified by the initial file mapping. This usually increases the performance of your reads and writes. Now that you understand the basics of memory mapped file I/O, let's continue with "Poor Man's Memory Mapped File I/O."

Since OS/2 currently does not provide support for memory mapped file I/O in the base operating system, this article discusses an alternative solution that should work on all OS/2 platforms today. The solution is to have the OS/2 system loader provide pseudo memory mapped I/O support. This solution supports Read Only (R/O) memory mapped file I/O, but not Read/Write (R/W) memory mapped file I/O. With the R/O support, you get the performance boost when you are working with R/O databases, large R/O data files, and the like. This should increase performance and improve memory utilization.

Here's how it works:

  1. Take the R/O data file and convert it into a .obj (linkable object file). This file is then linked with the system linker or VisualAge ilink into a DLL (shared library), along with stubs for DLL_InitTerm() and a stub marking the start of the code segment.
  2. In your application, to access the R/O data, either call DosLoadModule() on the new DLL to load it into memory or link to it via imports.
  3. Then use DosQueryProcAddr() to query the code segment stub and find the virtual address at which your R/O data was loaded. This stub returns a pointer to the beginning of your R/O data within the code segment. You can use this pointer directly in your routines to access the data as if it were a pointer into a file (see the sketch just after this list).
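
Here is a minimal sketch of the application side. It assumes the DLL is named MY.DLL and exports a stub function DataOffset() that returns the data pointer; both names come from the build example later in this article, but the exact shape of the export is an assumption:

    #define INCL_DOSMODULEMGR
    #include <os2.h>
    #include <stdio.h>

    /* Assumed shape of the exported stub: a function that returns a
       pointer to the start of the R/O data in the code segment. */
    typedef void * (APIENTRY *PFNDATAOFFSET)(void);

    int main(void)
    {
        HMODULE hmod;
        CHAR    szFail[260];
        PFN     pfn;
        void   *pData;
        APIRET  rc;

        /* Load the R/O data DLL (searched for along LIBPATH). */
        rc = DosLoadModule(szFail, sizeof(szFail), "MY", &hmod);
        if (rc != 0)
        {
            printf("DosLoadModule failed, rc = %lu\n", rc);
            return 1;
        }

        /* Find the exported code segment stub by name. */
        rc = DosQueryProcAddr(hmod, 0, "DataOffset", &pfn);
        if (rc != 0)
        {
            printf("DosQueryProcAddr failed, rc = %lu\n", rc);
            return 1;
        }

        /* Call the stub to get the data pointer, then use it as if
           it pointed into the file. */
        pData = ((PFNDATAOFFSET)pfn)();
        printf("data begins at %p\n", pData);

        DosFreeModule(hmod);
        return 0;
    }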

The benefit of doing this is that your R/O data is treated as code, so it falls under the semantics of the loader. This means that the pages are discarded rather than swapped, reducing memory requirements. The loader simply discards pages that are no longer needed and rereads them from disk if they are needed again. The system loader's page fault mechanism in the kernel is a lot faster than the file system's API path. You also get the added features that the linker provides, e.g. compression of the code segment, basing of code, etc. Compressed code usually runs faster than non-compressed code: it is faster to read 2K of compressed code and decompress it into memory than it is to read the full 4K of non-compressed code from disk. Currently, systems are disk bound because processors run much faster than disk access.

The following code snippets show how to build your R/O DLL module. These code snippets also come on the IBM Developer Connection along with the tool to convert your R/O data files into linkable object files.

  1. First, convert your R/O data file into a linkable object file. The following program, found on the Developer Connection in rod2obj/tools, will convert an R/O file to an object.
    rod2obj MyData.Dat MyObject.obj
  2. Once the object file is created, build the stub files in order to complete the final link step. The stub files can be generated by going to the rod2obj/src tree and running the nmake command.
    Note: You will probably need to change some of the starting environment variables of the makefile to point to the correct source location, compiler, etc. These variables are found toward the top of the file.
    Here is a list of the stub files that are used:
      src/stub.h - Header file for the declaration of the dummy function.
      src/stub.c - Dummy function to mark the beginning of the code segment.
      src/initterm.c - DLL_InitTerm function that also calculates, during DLL initialization, the offset at which the real R/O data starts (a rough sketch of stub.c and initterm.c appears after step 4 below).
    Other files included are:
      src/makefile - Makefile for the stub routines.
      src/object.def - Sample definitions file needed for making the final DLL.
      tools/linkit.cmd - Command file for running the link step to create the R/O data DLL.
      tools/rod2obj.exe - Conversion utility to convert the R/O data files into linkable object files.
  3. Once the build is complete, go to the rod2obj/rel path and modify the linkit.cmd file to place your data object file (.obj) in the link step. The linkit.cmd file is shown below.
    ilink /NOFREE /MAP /EXEPACK:2 /NOE stub.obj+
    myobject.obj+initterm.obj,my.dll,object.map,,object.def
    Note: If you do not have ilink, you can use link386 in its place. Also, the ordering of these objects is important because of the layout of the code in memory. If you have additional code to add to the DLL, it should be added at the end. If you are concerned about where the code was placed, you can look at the .map file to see the layout of the memory objects. The stub.obj and your R/O data object file should always be the first two objects in the segment.
  4. Now run the linkit.cmd file to generate a .DLL file. This file is loadable via imports or DosLoadModule(). You can query the exported symbol DataOffset to get a pointer to the starting location of the R/O data in virtual memory. The maximum size of data file that can be supported depends on the system's loader for the given release.
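
For reference, here is a rough sketch of the shape stub.c and initterm.c might take. The real sources ship on the Developer Connection; in particular, the STUB_SIZE constant and the assumption that the data object is laid out immediately after the marker function are hypothetical, since the actual offset calculation depends on how rod2obj builds the object file:

    /* Sketch combining stub.c and initterm.c in one listing. */
    #define STUB_SIZE 16   /* hypothetical size of the marker function */

    /* stub.c: dummy function marking the beginning of the code
       segment (stub.obj is linked first, so this sits at the start). */
    void _System MarkerStub(void)
    {
    }

    /* initterm.c: DLL_InitTerm records where the R/O data begins.
       ASSUMPTION: myobject.obj is linked directly after stub.obj, so
       the data starts STUB_SIZE bytes past the marker. */
    static void *pData = 0;

    unsigned long _System _DLL_InitTerm(unsigned long hModule,
                                        unsigned long ulFlag)
    {
        if (ulFlag == 0)                      /* 0 = initialization */
            pData = (char *)MarkerStub + STUB_SIZE;
        return 1;                             /* non-zero = success */
    }

    /* Exported entry queried by the application via DosQueryProcAddr. */
    void * _System DataOffset(void)
    {
        return pData;
    }

The object.def file would then list DataOffset under EXPORTS so that DosQueryProcAddr() can find it by name.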

Summary

This is not a system level memory mapped file solution, but an opportunity to have some memory mapping support until the operating system provides it natively. The current solution can offer support for individuals using large data files, catalogs, images, or small databases that are R/O. The performance may vary depending on your implementation. We do not know what the current plans are for memory mapped file I/O in the operating system, but we hope this helps in the interim.