Building Smaller, Faster Applications Using OS/2 Warp Version 3

by Allen Wynn

Running applications faster was one of the primary goals of the OS/2 Warp, Version 3 development team. We also had to find a way to rebuild existing applications so that they could load and execute faster than under previous versions of OS/2. To do this, we added new options and defaults to the linker (LINK386) and resource compiler (RC) for OS/2 Warp, Version 3. Applications built with the new options are typically 25% to 30% smaller and run faster, especially in low-memory situations.

Adding the new options and defaults required a major change to LINK386 and RC, so we changed the version numbers. Versions 2.00.xxx and 2.01.xxx of LINK386 and RC (where xxx can be any digits) do not contain the changes; versions 2.02.xxx do. At any OS/2 command prompt, type RC (with no options) or LINK386 /? to find out which versions you have.

A Word of Caution
Two of the new options produce executables that run only under OS/2 Warp, Version 3. The new LINK386 /EXEPACK:2 option and RC -X2 option produce executables with compressed pages. Previous versions of OS/2 do not recognize these pages. If you try to run such an executable under a previous version of OS/2, OS/2 returns error code 193 (ERROR_BAD_EXE_FORMAT) and an appropriate message. The other new options do not affect compatibility with previous versions of OS/2.

This is a tradeoff that you must evaluate in light of how you expect your program to be used. If your program will only be run on OS/2 Warp, Version 3 (and subsequent releases), use the new options. If, however, the program will be run on previous versions of OS/2, do not use the new flags.

What is Sector Alignment?
Previous versions of LINK386 start pages of code and data on boundaries that are a multiple of the alignment factor (specified with the /ALIGNMENT switch). If you use /ALIGN:4, for example, LINK386 creates an executable (or dynamic link library, physical device driver, or virtual device driver) with pages that start at an offset that is a multiple of 4 bytes from the beginning of the file. The offset specifies where the page of code or data starts in the executable file on disk; it does not affect the way the executable looks in memory.

LINK386 Version 2.02.xxx will, by default, start pages of code (but not necessarily pages of data) on sector boundaries, regardless of the value used on the /ALIGN switch. This is possible because the alignment factor must be a power of 2 and the sector size, 512 bytes, is also a power of 2.

If LINK386 finds that a page of code can start before a sector boundary, LINK386 simply rounds the starting offset up to a sector boundary. Basically, LINK386 skips alignment factors, or adds padding, to make sure that pages of code will start on a sector boundary. This usually reduces the number of sectors that a page occupies, which, in turn, reduces the amount of time to load the page.

If you don't specify /EXEPACK:2, pages of code will be 4096 bytes in length (except for the last page in an object). This means that any page following a sector-aligned page will start 4096 bytes after a sector boundary, which is also a sector boundary. No padding is needed. Therefore, the first page in a code object might have extra padding in front of it, but no other pages in that code object will need to be adjusted (unless you specify /EXEPACK:2).
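The rounding described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not LINK386's actual implementation; the function names and the sample offset of 1300 are hypothetical.

```python
SECTOR_SIZE = 512   # bytes per disk sector, as stated in the article
PAGE_SIZE = 4096    # bytes per uncompressed page


def round_up_to_sector(offset, sector_size=SECTOR_SIZE):
    """Return the smallest multiple of sector_size that is >= offset."""
    return (offset + sector_size - 1) // sector_size * sector_size


def page_offsets(first_free_offset, page_count):
    """File offsets of the uncompressed pages of one code object.

    Only the first page may need padding; every later page starts
    PAGE_SIZE bytes after a sector boundary, which is again a boundary.
    """
    start = round_up_to_sector(first_free_offset)
    return [start + i * PAGE_SIZE for i in range(page_count)]


# Hypothetical example: the next free byte in the file is at offset 1300.
offsets = page_offsets(1300, 3)
padding = offsets[0] - 1300
print(offsets, padding)
```

In this example only the first page is moved (by 236 bytes of padding); the following pages land on sector boundaries without any further adjustment.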

If you want to override this default, specify the /NOSECTORALIGNCODE (or /NOS) option. For physical and virtual device drivers, the /NOSECTORALIGNCODE option actually produces smaller files that load faster, because OS/2 preloads the entire image. In addition, some utilities make the bold assumption that a page of code will start at the next available alignment boundary. If your build process includes running such a utility against the application after the link step, you should use the /NOSECTORALIGNCODE option.

Why Is Sector Alignment Faster?
To realize the significance of sector aligning code, you must also understand demand paging, as well as how OS/2 accesses code and data.

When OS/2 loads an executable, it does not actually read any code or data into memory. OS/2 reads the Linear Executable (LX) header and the Loader Section of the executable file. (See the LX Specification, LXSPEC.INF, on your accompanying Developer Connection for OS/2 CD-ROM.) The Loader Section contains page tables that tell OS/2 where to get each page of code or data from the executable on disk. OS/2 reserves the virtual memory for the code and data. When a page is accessed, OS/2 generates a trap 14 (page not present) and retrieves that page from the executable.

In a memory-constrained system, OS/2 does not have enough physical memory to keep all of the code and data that has been loaded. In this case, OS/2 must free some memory. OS/2 views memory as follows:
 * Resident pages must remain in physical memory and are used only by the OS/2 kernel and device drivers.
 * Swappable pages might have been modified. They must be saved before being freed. Swappable pages are saved in the swap file (SWAPPER.DAT), which is maintained by the OS/2 kernel.
 * Discardable pages are usually code and read-only data. Because these pages cannot be modified, they can always be reloaded into memory, if needed, from the executable on disk. Discardable pages can simply be discarded to make room for new pages.

The code and data segments of executables, including DLLs, contain swappable or discardable pages.

In a memory-constrained environment, pages are constantly swapped out and discarded to make room for other pages. At some later time, the discarded (or swapped out) pages might need to be reloaded. Swapped out pages can be read from the SWAPPER.DAT file. Because pages are 4096 bytes, the SWAPPER.DAT file will always contain pages that are sector aligned.

For a discarded page, or on the initial load of swappable pages, the OS/2 loader reads the page from the executable file on disk. Remember that OS/2 cannot read part of a sector; it must read complete sectors.

If a page does not start on a sector boundary, the first and last (partial) sectors are read into a cache block, and the needed bytes are copied to their final destination. The intermediate (full) sectors are read directly to the final destination. This results in a read of 9 sectors, even though all OS/2 needed was 4096 bytes, which could be contained in 8 sectors.

If a page starts on a sector boundary, OS/2 simply reads the entire page to its final destination. Because the page starts on a sector boundary and is 4096 bytes long, only 8 sectors must be read. This eliminates the need for a cache block and extra copying.
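The 8-versus-9 sector count follows directly from the arithmetic. The sketch below is a hypothetical helper (not part of OS/2) that counts the 512-byte sectors touched by a read of a given length at a given file offset.

```python
SECTOR_SIZE = 512   # bytes per disk sector
PAGE_SIZE = 4096    # bytes per uncompressed page


def sectors_read(offset, length=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Number of whole sectors that must be read to fetch
    `length` bytes starting at file offset `offset`."""
    first = offset // sector_size                 # sector holding the first byte
    last = (offset + length - 1) // sector_size   # sector holding the last byte
    return last - first + 1


print(sectors_read(1536))  # page starts on a sector boundary
print(sectors_read(1300))  # page starts in the middle of a sector
```

The offsets 1536 and 1300 are arbitrary examples; any offset that is not a multiple of 512 forces the extra sector for a full 4096-byte page.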

What Does Better Compression Buy Me?
We found that if a compression algorithm can compress a page small enough and a decompression algorithm can decompress it fast enough, the time saved in reading fewer sectors is greater than the time required for the extra processing (decompressing). On most machines, eliminating two sector reads through compression will more than offset the time necessary to decompress a page. Saving only one sector will produce no appreciable time difference on 80386 processors, but it will save time on 80486 and Pentium processors. Because emerging processor technology is by far outpacing DASD improvements, this trend is likely to continue.

In addition to faster load and execution, using the new compression algorithm also reduces the file size of the executable. Typical results with /EXEPACK:2 show a savings of 25% to 30% in file size. Resource-only DLLs built with -X2 showed savings of up to 75%. These are savings of file size only, not memory.

So, How Do I Get Better Compression?
In previous versions, the /EXEPACK option and -X option compressed only pages of data, not pages of code. The compression algorithm is basically run-length encoding, which makes it unsuitable for code and for data that does not have strings of repeating characters.
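To see why run-length encoding only helps with repetitive data, consider the minimal byte-level sketch below. It is an assumption-laden illustration: the (count, value) pair format and the function names are hypothetical and are not the on-disk format that /EXEPACK actually writes.

```python
def rle_encode(data):
    """Encode bytes as (count, value) pairs, with count in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte repeats (capped at 255).
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded):
    """Inverse of rle_encode."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)


zero_page = bytes(4096)              # e.g. a zero-filled data page
mixed_page = bytes(range(256)) * 16  # no two adjacent bytes are equal

print(len(rle_encode(zero_page)))
print(len(rle_encode(mixed_page)))
```

The zero-filled page shrinks to a few dozen bytes, while the page without repeated bytes doubles in size under this scheme, which is the situation typical machine code tends to resemble.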

New options were added to LINK386 and RC to include a new compression algorithm that is suitable both for code and data. LINK386 Version 2.02.xxx accepts three variations of the /EXEPACK option:
 * /EXEPACK (old compression method only)
 * /EXEPACK:1 (old compression method only)
 * /EXEPACK:2 (try both old and new compression for each page)

RC Version 2.02.xxx accepts three variations of the -x parameter:
 * -x (old compression method only)
 * -x1 (old compression method only)
 * -x2 (try both old and new compression for each page)

If you specify /EXEPACK:2 or -X2, LINK386 or RC applies both the old and new compression methods to each code and data page and keeps the smaller result.
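The "try both, keep the smaller" selection can be sketched as follows. Both compressors here are stand-ins chosen for illustration: `old_method` is a simple run-length encoder and `new_method` uses Python's zlib, since the article does not name the new algorithm. Neither reproduces what LINK386 or RC actually emits.

```python
import zlib


def old_method(page):
    """Stand-in for the old method: (count, value) run-length pairs."""
    out = bytearray()
    i = 0
    while i < len(page):
        run = 1
        while i + run < len(page) and page[i + run] == page[i] and run < 255:
            run += 1
        out += bytes((run, page[i]))
        i += run
    return bytes(out)


def new_method(page):
    """Stand-in for the new method (assumption: zlib, for illustration only)."""
    return zlib.compress(page, 9)


def pack_page(page):
    """Compress one page both ways and return (method_name, packed_bytes)
    for whichever result is smaller."""
    candidates = [("old", old_method(page)), ("new", new_method(page))]
    return min(candidates, key=lambda c: len(c[1]))


page = bytes(range(256)) * 16   # 4096 bytes with no adjacent repeats
name, packed = pack_page(page)
print(name, len(packed))
```

Choosing per page means a page that suits the old method is never made larger by the new one, and vice versa.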

Code and data pages that are compressed with the new algorithm are marked as ITERDATA2, which is defined in the EXE386.H include file. This file is in The Developer's Toolkit for OS/2 Warp, Version 3 on your accompanying Developer Connection for OS/2 CD-ROM. Some utilities might not recognize the new ITERDATA2 pages, so use caution if the link step or resource compile step is not the last step in your build process.

Recommendations
Which new options are right for you? The answer is, as usual, "It depends." If you are writing applications that do not need to be run under previous versions of OS/2, use /EXEPACK:2 and -X2. Otherwise, use /EXEPACK:1 and -X1. Some general guidelines follow:

Use the RC -X2 option whenever possible.

Use the following LINK386 options for physical and virtual device drivers:
 /FAR /ALIGN:2 /EXEPACK:2 /NOS /PACKC /PACKD

Use the following LINK386 options for dynamic link libraries:
 /FAR /ALIGN:4 /EXEPACK:2 /PACKC /PACKD

Use the following LINK386 options for executables:
 /FAR /ALIGN:4 /EXEPACK:2 /PACKC /PACKD /BASE:0x10000
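A build script could encode these recommendations in a small lookup, as in the hedged Python sketch below. The function names, the `kind` keys, and the `warp3_only` flag are hypothetical; the option strings themselves are taken from the recommendations above, with /EXEPACK:1 and -X1 substituted when the program must also run on previous versions of OS/2.

```python
# Recommended LINK386 options per target type, as listed in the article.
RECOMMENDED = {
    "driver": ["/FAR", "/ALIGN:2", "/EXEPACK:2", "/NOS", "/PACKC", "/PACKD"],
    "dll": ["/FAR", "/ALIGN:4", "/EXEPACK:2", "/PACKC", "/PACKD"],
    "exe": ["/FAR", "/ALIGN:4", "/EXEPACK:2", "/PACKC", "/PACKD",
            "/BASE:0x10000"],
}


def link386_options(kind, warp3_only=True):
    """Return the recommended LINK386 options for `kind`.

    If the result must also run on earlier OS/2 versions, fall back
    to the old compression method (/EXEPACK:1).
    """
    opts = list(RECOMMENDED[kind])
    if not warp3_only:
        opts = ["/EXEPACK:1" if o == "/EXEPACK:2" else o for o in opts]
    return opts


def rc_option(warp3_only=True):
    """Return the recommended RC compression option."""
    return "-X2" if warp3_only else "-X1"


print(" ".join(link386_options("exe")))
print(" ".join(link386_options("dll", warp3_only=False)), rc_option(False))
```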

Conclusion
The new LINK386 and RC options played a large role in improving the performance of OS/2 Warp, Version 3. The OS2KRNL file is 25% smaller and loads faster thanks to the /EXEPACK:2 option. Compression of fonts with -X2 reduced the size of font DLLs by up to 70%. Most developers should see similar results, with the bottom line being smaller and faster applications.