REXX inside and out - File I/O

From EDM2

Written by Joe Wyatt

Introduction

When Rexx was originally written for IBM's Virtual Machine (VM) operating system, it was given no native file handling capabilities. VM has its own file handling command, EXECIO, which has proven sufficient in that environment. There is a downside to everything, and the problem with EXECIO is that it is part of VM rather than an architected element of the Rexx language. When portability became an issue, Mike Cowlishaw, the author of Rexx, went back to the drawing board. He designed enhancements to the language that let Rexx programs read and write files independently of the operating platform. This article looks at each of Rexx's file handling functions and how to use them for the basics of file manipulation.

"So what, Uncle Bob, are the basics?", you ask. Well, boys and girls, quit calling me Uncle Bob and we'll get on with it. The rudimentary file operations are opening a file, writing to it, reading from it, and closing it. There is more than one way to do each of these. We will discuss all of the ones I know about, with their good points and their bad. Before we get into the nitty-gritty, we need to get one definition out of the way.

file pointer
A file pointer is a placeholder the operating system uses to keep track of where the program is reading or writing in a file. A small file makes a good illustration. Imagine the file as 512 characters typed onto a page. Point your finger at the first character on the page, read the character under your finger, and move your finger to the next character. Your finger on the page is like the file pointer OS/2 uses. When you tell OS/2 to read a line or a character from a file, OS/2 moves its "finger" to the next line or character to be read. Although OS/2 doesn't really have any fingers, it does have a better imagination than you do.

Opening a file

While Rexx does not contain a specific "open file" function, there are at least five methods of opening a file on purpose (explicitly as opposed to implicitly). The first four are easy to explain away.

Any attempt to write to an unopened file using the lineout or charout functions will implicitly attempt to open the file for read/write processing. These functions can also be used to explicitly open a file. The following statements:

/* open file and place file pointer on line 1 */
rc = lineout( myFile, , 1 );
rc = charout( myFile, , 1);

Figure 1: Implicitly opening a file for read/write access.

will attempt to open the file whose name is contained in the variable myFile. If the open process is successful the value of rc will be "0" and the file pointer will be positioned at the beginning of the existing file. If, for some reason, the file cannot be opened rc's value will be "1". While this is functional, it generally leads to such descriptive error messages as, "Hey, Hoser! Your file just ain't going to make it this time." This isn't a lot of debug information to go on. The input functions are similar in that the first use of them on an unopened file will implicitly attempt to open the file. They also may be used for an explicit open as follows:

/* open file and read zero lines from line 1 */
rc = linein( myFile, 1, 0 );

/* open file and read zero chars from char 1 */
rc = charin( myFile, 1, 0);

Figure 2: Implicitly opening a file for read-only access.

When either of these functions, used in this manner, is successful, the file pointer is placed at the first character of the file. As far as diagnostic information is concerned, these input functions are worse than their output siblings. They return data, not a numeric code. If everything went well there is no return data, because of the "0" in the parameter list. If the file could not be opened, there is also no return data. This leads to error messages similar to, "Hey, Hoser. If you got this message then there were no syntax errors in the open call, but we don't know if your file opened successfully."
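To see the problem for yourself, here is a small sketch that assumes OS/2 Rexx or a compatible interpreter. It "opens" two files with linein and a count of 0. One file exists and the other cannot be opened; both file names are hypothetical. The two results come back indistinguishable.

```rexx
/* Sketch: an input-function "open" tells you nothing either way.
   Both file names are hypothetical. */
goodFile = 'input_demo.txt'
badFile  = 'no_such_dir\missing.txt'

call charout goodFile, 'x', 1       /* make sure the good file exists */
call charout goodFile               /* and close it again */

okLine  = linein(goodFile, 1, 0)    /* file opens: returns '' */
badLine = linein(badFile, 1, 0)     /* file cannot open: also returns '' */
call stream goodFile, 'c', 'close'

say 'good: [' || okLine || ']  bad: [' || badLine || ']'
```

Both brackets print empty. That is the whole point: you cannot tell from the result whether the open worked.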

"Alas!", you cry. "Are users of my Rexx code destined to be called 'Hosers' for the rest of their days?"

Maybe. But you can at least give them some decent debugging information as well by using the stream function to open your files.

The stream function is multipurpose and can handle the opening of files very nicely. Stream has other uses, but this article will not delve into them. We will limit ourselves to using stream to open and close files. The following invocation of stream:

rc = stream( myFile, "c", "open" );

Figure 3: Explicitly opening a file.

will attempt to open the file for reading and writing and place something a bit more usable in the return area if the attempt fails.

If the file is opened successfully, the string "READY:" is returned. If there are problems, however, a string similar to "NOTREADY:nn" is returned instead. The "nn" is a return code that indicates why the attempt to open the file failed. This code can be used with OS/2's help facility to give the user some valuable information. See example code Rexx_a.cmd for an illustration.
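A minimal sketch of that approach follows, assuming OS/2 Rexx or a compatible interpreter. It is not Rexx_a.cmd, and the file name is hypothetical. It simply pulls the code out of the "NOTREADY:nn" string so it can be shown to the user.

```rexx
/* Sketch: open with stream() and report the reason for a failure.
   The file name is hypothetical; a bare OPEN asks for read/write. */
myFile = 'stream_demo.txt'
status = stream(myFile, 'c', 'open')
if status = 'READY:' then
   say myFile 'is open'
else do
   parse var status 'NOTREADY:' osError    /* keep only the code */
   say 'Could not open' myFile', OS/2 error code' osError
   say '(look the code up with the OS/2 help facility)'
end
call stream myFile, 'c', 'close'
```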

The stream function can also be used to open a file for a specific intended use, which lets you set the file pointer for the processing you want. With a bare "open", as in Figure 3, the file is opened for both reading and writing, and reading starts at the first byte of the file. Changing the third parameter from "open" to "open write" opens the file for writing and places the file pointer at the end of the file, so new data is appended. Using "open read" as the third parameter sets the file pointer to the top of the file and limits access to read-only.
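The sketch below appends an entry to a log and then reads the log from the top. The file name is hypothetical, and the sketch assumes the interpreter accepts the "open write" and "open read" commands as described above.

```rexx
/* Sketch of the two explicit open modes. The file name is hypothetical. */
logFile = 'mode_demo.log'

wStatus = stream(logFile, 'c', 'open write')   /* pointer at end of file */
call lineout logFile, 'entry written' date() time()
call stream logFile, 'c', 'close'

rStatus = stream(logFile, 'c', 'open read')    /* pointer at first byte */
firstLine = linein(logFile)                    /* oldest entry in the log */
call stream logFile, 'c', 'close'

say wStatus rStatus firstLine
```

Each run adds one more line, because "open write" never starts over at the top of the file.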

Most of the Rexx programs that I have seen, though, do not use any of these explicit methods for opening a file for processing. If the two terms "quick and dirty" and "Rexx programming" seem synonymous to you then you may already be aware that the first use of any of the input/output functions will implicitly attempt to open the file. You still, of course, are limited to the eloquent "1" or "0" diagnostics with this method.

Reading Data From a File

Unlike open, you cannot read (or write) data by accident. Two built-in methods for reading data exist in the Rexx language. The "C" programmer in me was fairly comfortable dealing with the "one character at a time" functionality of the charin function, but the rich string handling functions in Rexx have caused me to all but abandon charin.

Charin

Charin reads one or more characters from a specified location and returns them as its result. It is quite handy when the file has no records delimited by CR LF, or when a specific block length is needed. On the PC platform, though, that is usually a contrived situation. To test the performance, I created a file with 1024 records of 1024 bytes each and read it using both charin and linein. Detailed results are in a table after the linein discussion. The syntax of charin is:

text = charin(myFile, start, count);

Figure 4: Syntax of charin.

To make things interesting, none of the parameters is required. If myFile is not specified, the data is taken from the standard input device (usually the keyboard). The second parameter, start, positions the file pointer before the read takes place. If start is omitted, reading continues from the current position. The final parameter, count, specifies the number of characters to read and defaults to 1. So to read eight characters starting at the 50th character, the function would be coded as follows:

text = charin(myFile, 50, 8);

Figure 5: Using charin to read a specific set of characters in a file.

That's about all there is to it. An example of the use of charin is included as Rexx_b.cmd. This program does a hex dump of itself. Not very useful, I agree, but you are welcome to customize the program for your own uses.
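For readers who want the flavor of such a program, here is a hedged sketch of a charin-based hex dump. It is not Rexx_b.cmd. It assumes parse source returns the program's file name as its third word, as it does for OS/2 .cmd files, and it reads 16 characters at a time.

```rexx
/* Sketch of a charin-based hex dump of this program's own source. */
parse source . . me                     /* this program's file name */
total  = chars(me)                      /* bytes in the whole file */
offset = 0
do while offset < total
   chunk = charin(me, offset + 1, 16)   /* character positions start at 1 */
   if chunk == '' then leave            /* guard against an endless loop */
   say right(d2x(offset), 8, '0') c2x(chunk)
   offset = offset + length(chunk)
end
call stream me, 'c', 'close'
```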

Linein

The Rexx language is a string manipulator's dream language (relative to C, PL/1, FORTRAN, Pascal, or COBOL). Because of this it is quite nice to consider an entire line at one time. This is achieved with the linein function. Linein's syntax:

record = linein(myFile, start, count);

Figure 6: Syntax of linein.

It would follow that the parameters act just like those of charin. They don't. MyFile still contains the name of the file to be read, and defaults to the standard input if not specified. Start specifies the line you wish to place the file pointer on before the read is done. OS/2 only allows a start value of 1, the beginning of the file. Count is still the number of lines to read, but the only valid values are "1" and "0".
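The sketch below assumes the interpreter supports the start value of 1 described above. It reads the first two lines of the program's own source file, then uses a count of 0 to move the file pointer back to line 1 without reading anything.

```rexx
/* Sketch: rewinding with linein(name, 1, 0). */
parse source . . me            /* this program's own file name */
first  = linein(me)            /* line 1; pointer moves to line 2 */
second = linein(me)            /* line 2 */
call linein me, 1, 0           /* back to line 1, read nothing */
again  = linein(me)            /* line 1 once more */
call stream me, 'c', 'close'
say 'Line 1 read twice:' (first == again)
```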

So which function is best for reading? It depends. If you are performing random reads of a file with a constant record size, you might want to use charin because it can position the file pointer at any character. If the file contains variable-length records and the program processes one record at a time, linein makes more sense. There is usually a design consideration that makes one of these functions better than the other. A good knowledge of how each one works is what lets the programmer make that decision.
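Random access with charin comes down to arithmetic. Record n of length recLen starts at character (n - 1) * recLen + 1. The sketch below builds a small file of three fixed-length records and reads the second record directly. The file name and record length are hypothetical.

```rexx
/* Sketch: random access to fixed-length records with charin. */
recFile = 'fixed_demo.dat'     /* hypothetical file name */
recLen  = 10                   /* assumed fixed record length */

/* Three fixed-length records with no CR LF between them. */
data = left('rec1', recLen) || left('rec2', recLen) || left('rec3', recLen)
call charout recFile, data, 1  /* write starting at byte 1 */
call charout recFile           /* close */

n = 2                                  /* record wanted */
start = (n - 1) * recLen + 1           /* byte where record n begins */
record = charin(recFile, start, recLen)
call stream recFile, 'c', 'close'
say 'Record' n 'is [' || record || ']'
```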

Characters per read      linein       charin
1 byte                   n/a          705.94 secs
1K (1,024) bytes         3.7 secs     1.65 secs
1M (1,048,576) bytes     n/a          2.6 secs

Table 1: Time taken by linein and charin to read a file of 1024 records of 1024 bytes each, by the number of characters handled per read. The cache was cleared between tests.

Writing to a File

Each of the input functions described above, charin and linein, has a twin for writing data to a file. The output functions are fairly consistent with their input siblings, but they behave differently when they open the file implicitly. If charout or lineout opens the file, the file pointer is positioned at the end of the file and the data is appended to it.

Charout

Charout will place a string into a specified file starting at any location within the file that the programmer desires. No carriage return/line feed combination is inserted after the data.

count = charout(myFile, text, start);

Figure 7: Syntax of charout.

This call looks much like charin, with text, start, and count shifted around. Text is the string to be written to the file. It can contain non-printable characters, including carriage returns and line feeds. Start, again, positions the file pointer, this time before the write. The desired value of count is 0.

Count does not indicate the number of bytes successfully written to the file, but indicates the number of bytes not written to the file because of some type of error.
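The following sketch writes one record in two pieces with charout, supplying the carriage return/line feed pair itself. It then adds up the two return values to check that nothing was left unwritten. The file name is hypothetical.

```rexx
/* Sketch: charout returns the number of characters NOT written. */
outFile = 'charout_demo.txt'        /* hypothetical file name */
crlf = '0d0a'x                      /* OS/2 end-of-line pair */

notWritten = charout(outFile, 'Hello, ', 1)                 /* start at byte 1 */
notWritten = notWritten + charout(outFile, 'world' || crlf) /* continue at pointer */
call charout outFile                                        /* close */

if notWritten \= 0 then say notWritten 'characters were not written'
else say 'All characters written'
```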

Lineout

While charout could be used several times to output a single record, lineout always ends the text with an end-of-record marker (a carriage return/line feed pair on OS/2). This requires the programmer to build all of the data for the record in a program variable first. Lineout looks quite similar to its cousin, charout.

count = lineout(myFile, text, start);

Figure 8: Syntax of lineout.

The parameters and return values are the same as charout except that they deal with lines rather than characters. The only valid value of start is "1". This will set the file pointer to the beginning of the file and existing text will be overwritten. Omit the parameter to continue writing where the file pointer currently resides.
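Here is a sketch of building each record in a variable before writing it. It writes a few comma-separated records and totals the lineout return values. The file name and data are hypothetical. Because lineout opens the file implicitly, running the sketch again appends the same lines.

```rexx
/* Sketch: build each record first, then write it with lineout. */
csvFile = 'lineout_demo.csv'       /* hypothetical file name */
names = 'Ann Bob Cy'
failed = 0
do i = 1 to words(names)
   rec = word(names, i) || ',' || i * 10     /* e.g. Ann,10 */
   failed = failed + lineout(csvFile, rec)   /* 0 means the line was written */
end
call lineout csvFile                         /* close */
say failed 'lines failed'
```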

Closing the File

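Either of the output functions may be used to close the file by supplying only the file name as an argument. Invoke them with call, or assign the result to a variable. A function call standing alone as a clause would have its result, "0", passed to OS/2 as a command.

call lineout myFile
call charout myFile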

Figure 9: Implicitly closing a file.

Either of the two functions will suffice. However, and there is always a however, if the close fails there is no indication of that fact or of the reason for it. For programs other than personal utilities, use the stream function to close the file as well as to open it.

rc = stream(myFile, "c", "close");

Figure 10: Explicitly closing a file.

The above invocation should look familiar. The diagnostics are not as descriptive for close as they are for open, but at least there is an indication of the success of the call. Should the close command fail, the stream function will return a null string. The literal "READY:" is returned after a successful close.
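Putting the explicit pieces together, here is a hedged sketch that opens a file for writing, writes a line, and closes it, checking the result of every step. The file name is hypothetical. Testing for anything other than "READY:" also catches the null string returned by a failed close.

```rexx
/* Sketch: explicit open, write, and close with a check at each step. */
tmpFile = 'close_demo.txt'                      /* hypothetical file name */
openStatus  = stream(tmpFile, 'c', 'open write')
notWritten  = lineout(tmpFile, 'closing demo')  /* 0 = written */
closeStatus = stream(tmpFile, 'c', 'close')     /* '' would mean failure */

select
   when openStatus \= 'READY:'  then say 'Open failed:' openStatus
   when notWritten \= 0         then say 'Write failed'
   when closeStatus \= 'READY:' then say 'Close failed:' closeStatus
   otherwise say tmpFile 'written and closed'
end
```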

Other File Functions of Note

Reading data from a file is fine and good, but it would be nice to know when to stop reading. I'll quickly outline two functions that exist for this purpose.

Chars

Chars is a function that will return the number of characters from the current location of a file pointer to the end of the file. This function can be used with the character based input function, charin, to calculate the number of times a loop needs to be iterated to read the entire file. It is suggested that chars be called once before the loop to calculate a constant value for the loop comparison rather than calling the function in the loop. Chars only accepts one argument.

count = chars( myFile );

Figure 11: Syntax of chars.

The above invocation will return the number of bytes from the current location of the file pointer to the end of myFile. Executed against any defined OS/2 device, such as LPT1:, this function always returns "1". The cousin of chars is lines.
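Here is a sketch of that advice, assuming parse source supplies the program's own file name. Chars is called once before the loop. The number of reads is then computed with ceiling division, and charin reads the file 512 characters at a time (an arbitrary block size).

```rexx
/* Sketch: call chars once, then loop a fixed number of times. */
parse source . . me                             /* this program's file name */
blockSize = 512                                 /* arbitrary block size */
remaining = chars(me)                           /* computed once, before the loop */
loops = (remaining + blockSize - 1) % blockSize /* ceiling division */
bytes = 0
do loops
   block = charin(me, , blockSize)              /* read from the current position */
   bytes = bytes + length(block)
end
call stream me, 'c', 'close'
say 'Read' bytes 'of' remaining 'bytes in' loops 'reads'
```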

Lines

You might think that the lines function would return the number of lines from the file pointer to the end of the file. You might also have caught on by now that OS/2 has no idea how many lines remain between the current file pointer and the end of the file. Since an OS/2 file has no constant record structure, the lines function simply returns "1" if there are lines remaining to be read and "0" if there are not. This means you cannot use lines to calculate a constant number of loop iterations. You must call it in the loop condition instead.

do while( lines( myFile ) )
   rec = linein( myFile )
end

Figure 12: Using lines in a loop.

This snippet of useless code will read all of the data in myFile and stop at the end of file. The body of the loop should be customized to actually perform some useful work.
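A slightly less useless version is sketched below, under the same assumption that parse source supplies the program's own file name. It counts the lines in the file and closes it with stream. Comparing lines(...) with 0, rather than using the result directly as the loop condition, keeps the loop working with interpreters whose lines function may return an actual count.

```rexx
/* Sketch: count the lines in a file, then close it explicitly. */
parse source . . me            /* this program's file name */
count = 0
do while lines(me) > 0         /* nonzero while data remains */
   rec = linein(me)
   count = count + 1
end
call stream me, 'c', 'close'
say me 'contains' count 'lines'
```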

That's the Basics

You can't get much more basic than open, close, read, and write. There is more than one way to do each of these. The correct way depends on the context of your program and its intended audience. The important point is that your understanding of the available functions is key to your choosing the best method to fit the program's circumstances.

More (and probably better) information can be obtained from OS/2's online Rexx manual. If you would like to comment on or contest any of this information please feel free to do so.