Adding Native Compression To Your Application

From EDM2

Written by Alger Pike

Using the Info-ZIP Zip21 and Unzip52 DLLs

Introduction

As computers get faster, they can generate more and more data in the same amount of time. This is particularly true for data acquisition applications. For instance, if I were to take a scan at the maximum resolution over the entire range of my instrument, the resulting data set would take 4GB. That is not a misprint; that is four gigabytes. Needless to say, we don't take data sets that large, but the data sets we do take are very often 20 or 30MB files. With files this large, one can very quickly fill up a hard disk, so we need to compress our data. In this way, we can almost triple the number of data files that fit on the disk, and likewise the number that fit on a writable CD-ROM. This translates into money because we do not have to buy as many discs. [Exactly how much space is saved will depend on your data, and with very repetitive data you could have space for more than triple the number of files; with very random data you may save less space. Ed.] To meet this need I have converted the Info-ZIP compression code into C-callable DLLs. These DLL functions allow the developer to zip and unzip files from the application level.

The first step in using this compression package is to include the header file. There is only one header file, named "ziplib.h". It contains the function prototypes for both the zip and unzip functions, so you only have to include one header. Also, remember to change your makefile so that the zip21.lib and unzip52.lib files are linked into your application. A note to Borland users: you will need to make your own .lib files from the DLLs, because the naming conventions used by IBM and Borland in their respective binaries differ.

The next step is to set up the arguments that you want to use for your compression. This is done in an array of character pointers defined as follows:

static char*  zipargs[6] =
   {"zip21", "-q", "-9", "temp.zip", "temp.tmp", 0};

Notice the similarity to the syntax of the command line version:

zip -q -9 temp.zip temp.tmp

I did not change any of the internal workings of the core Info-ZIP code itself. This means that special care must be taken when setting up the arguments; setting them up properly ensures that the code does not crash when you run it. The first thing to notice is that argument 0 (zipargs[0] above) is the name of the executable. Also, there needs to be a zero argument at the end of the list, because in a couple of places the zip code cycles through the arguments with a while(args != 0) statement. As for which arguments to use, I believe all the arguments that Info-ZIP currently supports should work. The only exception is that anything an argument writes only to stdout will not be visible, i.e. the output is lost. It's possible the pmprintf utility might pick this output up, though. As you can see above, I have used the -q and -9 arguments with success, but I have not tested them all.
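To make the role of the trailing zero concrete, here is a minimal sketch of how code can walk such a list. The helper count_args is hypothetical and not part of ziplib.h; it only mimics the kind of loop described above. It stops at the zero entry, so a list without one would be read past its end.

```c
#include <stddef.h>

/* Hypothetical helper, not part of ziplib.h: count the entries in a
   zero-terminated argument list. The terminator itself is not counted. */
static int count_args(char *const args[])
{
    int n = 0;

    while (args[n] != NULL)   /* stops only when it reaches the zero entry */
        n++;
    return n;
}

/* The argument list from the article, including the trailing zero. */
static char *zipargs[6] =
   {"zip21", "-q", "-9", "temp.zip", "temp.tmp", 0};
```

With this list, count_args(zipargs) yields the number of entries before the terminator, which can be handy if the list changes and you would rather not keep a hard-coded count in step with it.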

Now that you have the arguments set up, you are ready to compress the file or files. I will assume that these files have already been created. If you are changing an existing application to use compression, this means that in your File...Save routine you insert the compression code after your application file has been written. Put the name of the file to be compressed and the name of the zip file in the arguments, as I have done above, and you are ready to call the zip function. The zip function takes two arguments and returns an error code:

rc = Zip(5, zipargs);
if(rc != 0)
{
  WinMessageBox(HWND_DESKTOP, hwnd,"Could Not Zip File",
                "File Write Error", 0, MB_OK);
  return;
}

The first parameter is the number of arguments you are passing to the function; the second is the argument array itself. The trailing zero is not an argument. It is just a placeholder that tells the code the argument list is done, so it is not included in the count. At this point your files are in the zip file, so you can delete the uncompressed form:

DosDelete(szFileDefault);

The string szFileDefault contains the name of the uncompressed application data file. In my own application code I usually give the zip file the same name as the file the user selected in the dialog box. So after I delete the uncompressed file, I rename the compressed file to the name held in szFileDefault.

At this point you should have a seamless compression routine. When a user saves a file, the following steps occur: 1) save the file as you have been all along, 2) set up the arguments to compress the file, 3) compress the file, 4) delete the uncompressed file, and 5) rename the compressed file to the user-selected name.
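The save steps can be sketched as one routine. This is only an illustration under several assumptions: the names save_compressed, zip_fn and stub_zip are hypothetical, the zip function is assumed to have the shape int f(int, char **), and the standard C remove() and rename() stand in for the OS/2 calls so the sketch runs anywhere. stub_zip merely creates a placeholder archive file so the flow can be tried without the DLL; in the real application you would pass the DLL's Zip function instead.

```c
#include <stdio.h>

/* Assumed shape of the zip entry point: argument count, argument list. */
typedef int (*zip_fn)(int argc, char **argv);

/* Stand-in for the DLL call, for demonstration only: writes a tiny
   placeholder file where the archive would go. It does NOT compress. */
static int stub_zip(int argc, char **argv)
{
    FILE *out;

    if (argc != 5)
        return 1;                 /* any nonzero value means failure */
    out = fopen(argv[3], "wb");   /* argv[3] is the zip file name */
    if (out == NULL)
        return 1;
    fputs("PK", out);
    fclose(out);
    return 0;
}

/* Steps 2 to 5 of the save sequence; step 1 (writing data_file) is
   assumed to have been done already by the application. */
static int save_compressed(const char *data_file, const char *temp_zip,
                           zip_fn zip)
{
    char *args[6];

    /* 2) set up the arguments, ending with the zero placeholder */
    args[0] = "zip21";
    args[1] = "-q";
    args[2] = "-9";
    args[3] = (char *)temp_zip;
    args[4] = (char *)data_file;
    args[5] = 0;

    /* 3) compress; the trailing zero is not counted */
    if (zip(5, args) != 0)
        return -1;

    /* 4) delete the uncompressed file */
    if (remove(data_file) != 0)
        return -2;

    /* 5) give the archive the user-selected name */
    if (rename(temp_zip, data_file) != 0)
        return -3;

    return 0;
}
```

Returning a distinct code for each failing step is a design choice of this sketch; it lets the caller tell the user whether the data file is still intact (compression failed) or whether only the final rename went wrong.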

You will also want to add the corresponding decompression to the File...Open routine so that your application can read the data back in. The procedure for calling the unzip function is exactly the same as for zip. Set up the arguments following the same rules outlined above for the zip function, then call the unzip function in the same way you called zip:

rc = Unzip(4, unzipargs);

The first parameter is again the number of arguments, excluding the trailing zero.
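The article does not list the contents of unzipargs, so the following is only a guess at what a four-argument list could look like. The -q (quiet) and -o (overwrite without prompting) switches are standard UnZip command line options, but whether the DLL accepts them is an assumption I have not verified. The UNZIP_ARGC constant is hypothetical; it derives the count from the array size so that the trailing zero is excluded automatically.

```c
#include <stddef.h>

/* Hypothetical unzip argument list: program name, two switches,
   the archive name, and the zero placeholder. */
static char *unzipargs[] =
   {"unzip52", "-q", "-o", "temp.zip", 0};

/* Number of real arguments: array length minus the trailing zero. */
enum { UNZIP_ARGC = sizeof(unzipargs) / sizeof(unzipargs[0]) - 1 };
```

The call would then read rc = Unzip(UNZIP_ARGC, unzipargs); and the count stays correct if a switch is added or removed later.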

So to add seamless decompression to your application, simply do the same steps as for zip but in reverse: 1) rename the compressed file to the name of the zip file, 2) create the uncompressed file by calling the unzip routine, and 3) open the file with your existing application code.

Conclusion

Hopefully this article has given you a taste of what is involved in putting a native compression routine into your own application. By using a well-established compression routine, the data files you create automatically become compatible with the hundreds of applets that use the Info-ZIP compression algorithm. The only trick has been to convert this command line utility into a form that is easily accessible to the developer from within C code.