Advanced REXX Programming Topics

From EDM2

By Charles Daney

Introduction

REXX is one of the great under-appreciated treasures of OS/2. Although REXX as a programming language is now 15 years old and REXX has been implemented on just about every major computer platform, many OS/2 users will be encountering it for the first time. And when it is encountered for the first time in OS/2, users are often introduced to REXX as just a modern replacement for the primitive Microsoft-designed "batch" language.

In fact, REXX is a very good "batch" or "procedure" language, in that it can be used to automate repetitive sequences of operating system commands that need to be used together in a "batch". But REXX brings all of the features of a full-fledged programming language to this task - variables, arithmetic, input/output, and control structures. These features make it possible to do things easily in REXX which could be done in the old batch language (if they could be done at all) only with very arcane and convoluted techniques. Many computer columnists and book authors at one time wrote endless streams of words describing yet another clever trick to make the batch language perform tasks it simply wasn't designed to do - tasks that are almost trivial in REXX.

Many OS/2 users are introduced to REXX for the first time when they come across a simple REXX procedure to manipulate features of the Workplace Shell - such as creating folders, adding background bitmaps, modifying OS/2 configuration parameters, or changing system fonts. Other users encounter REXX for the first time as a tool for writing installation programs for various applications. These are all excellent examples of REXX used as a "batch" language.

Unfortunately, this usage of REXX may have obscured the fact that, since REXX is a complete programming language, it can actually do much more. Recently, several tools for visual programming have become very popular in OS/2 - VX-REXX from Watcom, VisPro/REXX from Hockware, and GpfRexx from Gpf Systems. As the product names imply, each of these products uses REXX as its underlying development language. So OS/2 users who first encountered REXX as a better batch language also discovered that it is the foundation of powerful tools for visual programming.

It is possible, using tools like these, for people with little detailed knowledge of GUI programming (or perhaps of any kind of programming) to quickly learn to develop professional-looking applications that use the full toolbox of OS/2 Presentation Manager elements, such as dialogs, push buttons, list boxes, notebooks, and containers. The user interface builders do most of the work of automatically generating REXX code to create the desired interface. It is necessary for the developer to write only a relatively small amount of REXX code to create a customized application. This makes it natural to use REXX for personal programming or application prototyping in a GUI environment.

But there are even more surprises for OS/2 users as far as REXX is concerned. In fact, OS/2 users are just now discovering that many OS/2 applications have adopted REXX as their primary or alternate "scripting" language. The list of such "REXX-enabled" applications now includes:

Text editors
  • KEDIT (Mansfield Software)
  • SPF/2 (Command Technology)
  • Tritus SPF (Tritus)
  • EPM (IBM)
  • SourceLink (One Up Corporation)
Communication software
  • REXXTERM (Quercus Systems)
  • PMCOMM (Multinet)
  • TE/2 (Oberon Software)
  • Extra! (Attachmate)
  • Communications Manager (IBM)
  • TCP/IP (IBM)
Word processors
  • Ami Pro (Lotus)
  • DeScribe (DeScribe)
Database tools
  • DB2/2 (IBM)
  • dbfREXX (dSoft Development)
  • REXXBASE (American Coders)
  • QELIB (Q+E Software)
  • XDB-QMT (XDB Systems)
Other
  • 1-2-3 (Lotus)
  • MMPM/2 (IBM)
  • Deskman/2 (Development Technologies)
  • Fax/PM (Microformatic)
  • Chron (Hilbert Computing)

Clearly something is going on here - REXX is being adopted as a universal macro language by sophisticated OS/2 applications. The most obvious benefit of this is that users no longer need to learn a new language to control each application - one common language can do it all. In the past, the very term "macro language" has been daunting to many users because such languages have tended to be obscure and difficult to use, having been designed with only a limited purpose in mind. On top of that, they were different for each application. REXX makes a big improvement here. The learning curve for REXX is not insignificant, but the language is fairly natural and intuitive. Most importantly, once it has been mastered, one doesn't have to learn it all over again for a new application.

There is a second major advantage to having REXX as a universal macro language. This is the fact that it puts REXX in the position of being able to communicate with any REXX-enabled application. And hence, every such application can communicate with any other one through REXX. So REXX becomes a sophisticated, programmable, interprocess communication tool.

The bottom line of all this is that REXX is not just a better batch language, or a tool for visual programming and rapid application development, or a universal macro language. It is all of these things at the same time, so perhaps the best way to think of REXX is as a gateway to most of the facilities of OS/2 and as a bridge among all REXX-enabled applications. REXX gives programmers access to most OS/2 services, like the Workplace Shell, multimedia, interprocess communication, and the file system. But because REXX is also accessible from applications, it makes all these services available to the applications, and it makes the services of each application available to others. A REXX-aware application does not (necessarily) need to provide its own support for multimedia, serial communications, or database services, because it can utilize REXX scripts to get at these capabilities. REXX can be thought of as "application glue".

With all this as prologue, it is not difficult to understand why so many OS/2 users have already taken the time to become at least passingly familiar with REXX (and many more will in the near future). Now, REXX was designed to be easy to use, but there's no point in pretending it is effortless, since it isn't. Prospective users of REXX should be prepared for a learning curve and should allow some time to get reasonably proficient with it - anywhere from a few days to a few weeks depending on the individual and his or her background. The good news is that, once proficiency is gained, the investment can be reused again and again because REXX is so versatile.

In the rest of this paper, we will assume that the reader has already attained some level of comfort in understanding and using REXX, because the purpose here is to provide an introduction to some advanced topics in REXX. Just as with the initial learning of the language, one will be repaid over and over by the mastery of some of these advanced techniques, because they are (potentially) equally useful in a complex VX-REXX based application, an Ami Pro macro, or a personal utility batch program.

Just in case you don't feel you already have a good grounding in the fundamentals of REXX, we'll mention a few resources you can turn to for review. (See the Bibliography for full details.) First, online documentation for REXX comes with OS/2. But hard copy, including a tutorial that isn't online, can be purchased from IBM. _The REXX Language_ by Mike Cowlishaw (who invented the language) is a very good language definition and complete reference on the rules. (A lot of IBM's own documentation is straight out of this book.) The present author's book, _Programming in REXX_, is recommended for readers who want a more detailed explanation of REXX concepts and general programming techniques. These two books deal with REXX in general and do not refer to OS/2 specifically.

There are already several books that do cover REXX explicitly from an OS/2 perspective. Two of the best are the _OS/2 2.1 REXX Handbook_ by Hal German and _Application Development Using OS/2 REXX_ by Tony Rudd. Finally, every even halfway serious REXX programmer should have a copy of the _REXX Reference Summary Handbook_ by Dick Goran. This last is not a textbook but rather a handy reference summary filled with the essential information about REXX, several REXX function libraries, and details of parameters used to control the Workplace Shell.

Data Structure, Program Structure

It's possible to become proficient with REXX simply by learning the relatively simple syntax rules of the language, studying a few good sample programs, and proceeding to actually use REXX in creating prototypes or full applications that are useful to yourself personally or to your company. Any one of the visual REXX programming tools mentioned before can be highly recommended to assist in the learning process. They do a lot of the work for you and teach you many of the standard REXX rules and techniques as you go. They also include their own debuggers, which can show you exactly how REXX programs operate.

However, because REXX is a full and complete programming language, becoming really good at it requires that you learn a few key concepts as well. Some of these concepts are shared by other programming languages, but there are others that are more or less unique to REXX itself.

The concepts we want to focus on have to do with data structure and program structure. In both cases, there are significant differences in how these structures are used in REXX as compared to other languages.

Data structures have to do with how program data is organized. Like other languages, REXX has variables. But REXX is rather different in how it organizes collections of data. Unlike most languages, REXX does not have arrays in the usual sense, but it does have a much more powerful structure called "compound" variables. Although such compound variables can be a little more awkward to work with than ordinary arrays, they can also do a lot more. A compound variable is something like an array that may have non-numeric subscripts. Sometimes this is called an "associative" array, because data items can be retrieved by their "association" with other data items. Another way to think of this is that it's like looking up a record in a database on the basis of some key value, e. g. a person's name or a book's title.

Another thing that is definitely lacking in REXX is the notion of a "structure" in the narrow sense of a C structure or Cobol record. This can be a major stumbling block for would-be advanced users of REXX. C structures are often used for one of two things. They may represent "records", which are collections of related data, such as information related to a particular employee. There are relatively simple (though somewhat clumsy) ways to simulate records in REXX using compound variables.

C structures can also be employed to build up more complex data objects using techniques involving linked lists, trees, and so forth. Such intermediate level data structures can in turn be used to represent fairly high-level data abstractions like sets, collections, tables, and the like. It becomes rather awkward to use REXX compound variables to simulate linked lists directly, and even harder to use these to implement the high-level abstractions.

But fortunately, it is often possible to succeed by rethinking the whole problem in terms of native REXX facilities. Many high-level abstractions can be handled in REXX without working with lower-level representations at all. A lot of the discussion here will explain how to do this.

When we come to discuss program structure here we will not say very much about traditional concepts of "structured" programming - loops, subroutines and functions, handling of alternative cases. The REXX facilities in this respect are largely similar to what exists in other languages like C or PL/I. Instead, we will look at how REXX applications are organized in one or more files and how these semi-independent pieces can communicate and share data. This is an issue that often does not even arise with other languages, where programs are traditionally all linked into a single executable file.

REXX applications, in contrast, often consist of separate program units that are never explicitly linked together at all. Sometimes the relation between such units is the simple one of caller and callee, in much the same way that subroutines and functions can be used within a single source file. But sometimes the relation can be more complicated, particularly in OS/2 with its sophisticated multithreading abilities. The relationship of different program parts can become even more complex when visual REXX programming tools are used, since there may be one or more pieces of REXX code associated with each event that can occur for a specific interface element - such as a button press.

Keeping persistent data associated with a single REXX program, and sharing data between independent REXX programs, can become a very tricky problem, but a solvable one. And this is the other issue we plan to address here.

There is, moreover, a significant interaction between data structure and program structure. Specifically, you have to consider how to represent data in REXX not only in light of how it will be used internally, but also in terms of how it may be shared between independent REXX programs.

Compound variables

Arrays are the most commonly used data structure in programming. Although REXX does not have arrays as such, compound variables can usually be used like an array, although with some occasional syntactical awkwardness.

To review, a compound variable is one whose name is derived from a compound symbol, that is, a symbol which begins with a legal symbol character other than a digit or a period, contains at least one period, and has at least one character following the last period. The following are all legal compound symbols:

array.i
restaurant..address
a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z

The part of a compound symbol up to and including the first period is called the "stem". It is always used literally. All other simple symbols occurring in a compound symbol, i. e. the symbols delimited by the periods, are replaced with their current values in forming the name of a compound variable. In this respect they are analogous to the subscripts of arrays in other languages. Indeed, it is possible to think of a symbol like A.i.j.k as equivalent to an array element, which would be expressed as A[i][j][k] in C.

There are, however, significant differences between such "arrays" in REXX and arrays in other languages. Some of these differences represent advantages of REXX, but others are disadvantages. On the positive side, because of REXX's dynamic memory management, it is never necessary to declare in advance how large an array will be. It simply grows as needed, and (usually) does not consume storage for "unused" elements of the array.
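
As a minimal sketch of this behaviour (the stem name "reading." and the values are made up purely for illustration), elements can be assigned in any order with no prior declaration:

   /* A sparse "array": no declaration and no fixed size */
   reading.1 = 'first'
   reading.500000 = 'far away'   /* usually nothing is stored for 2..499999 */
   say reading.1                 /* first */
   say reading.500000            /* far away */
   say reading.2                 /* never assigned, so displays READING.2 */

An element that has never been assigned simply has its own derived name as its value, as the last line shows.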

In fact, a REXX array does not even have a specific "dimension" like an array in other languages. This is because the periods in a compound symbol have syntactic meaning only in the symbol. Once the name has been derived by substituting all values of simple symbols, there are really only two parts to it: the stem and the "tail". For instance if we have

i = 3
j = 4

then the symbol A.i.j actually consists of just the stem, which is "A." and the tail, which is "3.4". At this point, the fact that the tail still contains a period is irrelevant. So, if we also have

x = 34
y = 10
z = x/y

then the symbol A.z refers to exactly the same piece of data, because z has the value "3.4". Useful programs can actually be written that take advantage of this ambiguity of the "dimension" of a REXX array.
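
Putting the two fragments together gives a short sketch that can be run to confirm the point (the stored string is just a placeholder):

   /* Two different symbols, one and the same variable */
   i = 3
   j = 4
   A.i.j = 'same item'     /* derived name: stem A. plus tail 3.4 */
   x = 34
   y = 10
   z = x/y                 /* z has the value 3.4 */
   say A.z                 /* same item */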

The tail of a REXX variable can consist of completely arbitrary data, including blanks and unprintable ASCII characters. In contrast to the usual situation in REXX, blanks are significant in a variable tail. Thus if

x = ""
y = " "
z = "  "

then A.x, A.y, and A.z refer to three completely different data items, even though x, y, and z are "equal" when compared with the normal comparison operators. This is another respect in which REXX "arrays" differ from those in other languages.
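
A small sketch makes the distinction visible. The values assigned here are placeholders, and the strict comparison operator == is used only to show that the three strings really do differ:

   /* Blanks in a tail are significant */
   x = ""
   y = " "
   z = "  "
   A.x = 'null tail'
   A.y = 'one blank'
   A.z = 'two blanks'
   say (x = y) (y = z)     /* 1 1  (normal comparison ignores the blanks) */
   say (x == y)            /* 0    (strict comparison does not)           */
   say A.x                 /* null tail  */
   say A.y                 /* one blank  */
   say A.z                 /* two blanks */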

In some ways, however, the REXX array notation is not as powerful, or at least as convenient, as the notation of other languages. In particular, it is not possible to have expressions in a REXX "subscript". For instance, A.i+j is the sum of A.i and j, instead of an array element with the subscript i+j. Even parentheses cannot be used to circumvent this problem, since A.(i+j) is actually a function call to a function named "A.".

This notation is usually the most inconvenient when you simply want to use another compound symbol as a subscript. So if i.j is a value you wish to use as a "subscript", you cannot just refer to A.i.j. You must assign i.j to a simple variable first:

x = i.j
say A.x
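
The same technique handles a computed subscript. In this sketch (the values are arbitrary) the first SAY shows what A.i+j really means, and the second shows the workaround using a temporary variable, here called k:

   /* An expression cannot appear in a "subscript" */
   i = 3
   j = 4
   A.3 = 10
   A.7 = 'seventh'
   say A.i+j               /* 14: the value of A.3 plus 4 */
   k = i + j
   say A.k                 /* seventh */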

Despite these syntactical inconveniences, the great power of REXX's notation lies in the fact that "subscripts" can be non-numeric. This allows you to build data structures which easily associate data values with data names. Suppose, for instance, that you want to work with a database of books. In REXX you can do this by having a number of arrays, each of which is subscripted by the name of the book. The names of these arrays might be "author", "date", "publisher", "ISBN", and so forth. Then if the name of a particular book is stored in the variable "title", you can retrieve all of the other information directly by referring to author.title, publisher.title, etc. Because of this direct association from a name to a value, such data structures are sometimes called "associative arrays".

As far as the language user is concerned, there is no search process at all involved in looking up the author of a given book. In reality, of course, REXX does need to do a search to find each piece of data. The advantage is that this search process is all built-in and transparent to the user.

To continue with the example, the collection of variables indexed by the book name is, in effect, a data structure much like a table. The rows of the table are labelled by book names, and the columns of the table have labels like "author", "publisher", etc. This is very typical of a more complex data structure in REXX: while it's conceptually a single object, it is actually composed of a number of REXX compound variables that have been "subscripted" in the same way.

Concretely, you would select appropriate compound variable stem names for each "column" of the table. For example:

book_author.
book_publisher.
book_date.
book_isbn.

Perhaps, if you do some programming in C where case is significant, you might prefer to use:

BookAuthor.
BookPublisher.
BookDate.
BookISBN.

Just remember that case is ignored in REXX stem names, so that "bookauthor." (for example) is not a different name.
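
Before worrying about where the data comes from, here is a rough sketch of one row of such a table being filled in by hand and then retrieved by title. The variable name "wanted" is an assumption made for the example; any variable holding the same title string would do:

   /* One table row, stored and retrieved by book title */
   title = 'Programming in REXX'
   book_author.title    = 'Charles Daney'
   book_publisher.title = 'McGraw-Hill'
   book_date.title      = 1992

   wanted = 'Programming in REXX'
   say book_author.wanted 'published by' book_publisher.wanted 'in' book_date.wanted
   /* displays: Charles Daney published by McGraw-Hill in 1992 */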

In practice, you will have data on a number of books, and before you can use it in a program, it has to be loaded from somewhere. The data may normally be kept in a flat file, for instance. Unless you are importing the data from another source that has already determined a file format, you are relatively free to structure the data file any way you want. Let's say you decide to identify each data element with a tag so that your file contains lines like this:

Title:          Programming in REXX
Author:         Charles Daney
Publisher:      McGraw-Hill
Date:           1992
ISBN:           0-07-015305-1

You may put some sort of delimiter between the lines corresponding to a single book. Or you may just assume that every time you find a line beginning with "Title:" it starts the data for a new book. Then you could use this code to read in the data:

   do while lines(input) \= 0
       parse value linein(input) with label ':' data
       data = strip(data)
       label = translate(label)
       select
           when label = 'TITLE' then
               title = data
           when label = 'AUTHOR' then
               book_author.title = data
           when label = 'PUBLISHER' then
               book_publisher.title = data
           when label = 'DATE' then
               book_date.title = data
           when label = 'ISBN' then
               book_isbn.title = data
           otherwise nop
           end
       end
   call lineout input

This example is fairly straightforward, but we will come to some ways to simplify it later. As an aside, note the use of the STRIP function to remove leading and trailing blanks. We did not assume that the "data" part of the record was a single word, since we may very well want to have embedded blanks. And we used the TRANSLATE function to make sure we were working with the label in upper case. Also note that this example (as with most of the rest) has minimal error checking. We don't check that the labels are valid, for instance, except to provide an OTHERWISE case that does nothing in the SELECT statement.

Since a table is a typical sort of two dimensional array, you may be wondering whether it could be represented by using a single compound variable with two "subscripts". You might think of using a stem "book.", where one subscript is one of the column labels such as "author" while the other is the book name. In other words, a table entry would be referred to as "book.field.book_name", where "field" would be a variable with a value like "AUTHOR".

There are several pitfalls with this approach, which is why we didn't suggest it to begin with. The first is the fact that you will frequently want to refer to the elements of a particular column by putting the column name in explicitly:

   say book.author.title

in order to display the author of a book if you were given a title. This would actually work, provided there is no variable named "author" that has been assigned a value, since REXX then uses the default value "AUTHOR" (note the upper case). However, if some time previously you had

   author = "William Shakespeare"

then you would be attempting to reference a compound variable whose tail begins with "William Shakespeare" instead of "AUTHOR".
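
The effect is easy to reproduce. In this sketch the title 'Hamlet' is made up for the illustration, and the second SAY displays the derived name of a variable that was never assigned:

   /* The same symbol names two different variables */
   title = 'Hamlet'
   book.author.title = 'William Shakespeare'  /* tail is AUTHOR.Hamlet */
   say book.author.title                      /* William Shakespeare */
   author = 'William Shakespeare'             /* AUTHOR now has a value */
   say book.author.title                      /* BOOK.William Shakespeare.Hamlet */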

There are various ways around this problem. For instance, you can always use an extra variable in the symbol:

   field = 'AUTHOR'
   say book.field.title

But this is cumbersome, and you also have to be very careful about alphabetic case, which is significant in the tail of a compound variable. I. e.

   field = 'Author'
   say book.field.title

wouldn't work unless you always use 'Author' instead of 'AUTHOR'.
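
A short sketch of the case problem, again with a made-up title, shows the failed lookup and one possible remedy: forcing the field name to upper case with TRANSLATE before it is used.

   /* Case is significant in the tail */
   title = 'Hamlet'
   field = 'AUTHOR'
   book.field.title = 'William Shakespeare'
   field = 'Author'
   say book.field.title        /* BOOK.Author.Hamlet (no such variable) */
   field = translate(field)    /* back to AUTHOR */
   say book.field.title        /* William Shakespeare */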

Note that one thing you can't do is to use a literal in the compound variable name:

   say book.'AUTHOR'.title

doesn't work, since the expression following SAY is the concatenation of (the value of) book., the literal 'AUTHOR', and ".TITLE".

Even if you are careful never to assign anything to "author", there is a performance penalty with using it, because REXX will try to look up a value anyway. One thing you could do to get around that is to pick somewhat odd names for the columns, like '0AUTHOR'. This is actually legal, and will work, and will not try to perform unwanted substitution:

   say book.0author.title

but it certainly isn't elegant.

There is perhaps just one thing that can be said for the path we have been exploring where we use just one stem name for this collection of data. That is, you can refer to the whole collection more compactly when you want to pass its name to a subroutine, or use it with EXPOSE or DROP. E. g.

   drop book.

makes the whole collection undefined, whereas otherwise you would need to be much more verbose:

   drop book_author. book_publisher. book_date. book_isbn.

and it could be a lot harder to maintain code that uses names in this form, since you must make a lot of explicit references to all of the columns of the table.

However, there is one thing that can be done to ameliorate this problem a little. REXX allows you to use a whole list of names with DROP and EXPOSE if you assign the list to another variable:

   book_stuff = 'book_author. book_publisher. book_date. book_isbn.'
   drop (book_stuff)

But it's clumsy even so. Still, there is hope. It is possible to avoid having to refer explicitly to every "column" of the table in some cases. Earlier we said there was a more compact way to write the code for reading in the book data. It is based on the fact that the stem part of a compound variable name can be a variable or computed string if we use the VALUE function. Here is how the code to read in book data might look with this approach:

   do while lines(input) \= 0
       parse value linein(input) with label ':' data
       if label = '' then
           iterate
       data = strip(data)
       label = translate(label)
       if label = 'TITLE' then
           title = data
       else
           call value 'book_'label'.title', data
       end
   call lineout input

This is a lot more compact, and doesn't need to be changed if arbitrary new types of book information (i. e. columns) are added. VALUE is a tricky function to understand, but it's very convenient once you get the hang of it. If it isn't clear, try running the code using TRACE R to get a feel for what is happening here. (Note that we tested for a null input line, which might occur at the end of file and could be elsewhere.)
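
To see the VALUE call in isolation, consider this small sketch, which stores a single item and reads it back two ways. The label and data are fixed by hand here rather than read from a file:

   /* Building the stem name at run time with VALUE */
   title = 'Programming in REXX'
   label = 'AUTHOR'
   call value 'book_'label'.title', 'Charles Daney'
   say book_author.title              /* Charles Daney */
   say value('book_'label'.title')    /* Charles Daney */

The first argument evaluates to the string "book_AUTHOR.title". VALUE treats that string as a symbol, so the usual substitution of "title" in the tail takes place, and the assignment lands in the same variable that book_author.title refers to.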

There is another, entirely different, difficulty with this table structure as we have outlined it so far. As we have presented it, the code and data structure are well-designed for taking a complete book title and retrieving information about it, such as the book's author, publisher, etc. All you have to do is reference the data using the book title as an associative key.

You can even tell easily whether a new book to be added to the database is already included:

   if symbol('book_author.title') = 'VAR' then
       say 'Already in database:' title

But it is an altogether different matter if you want to use the data some other way. Perhaps you want to list all books in the database that have a certain author or a certain character string in their title. Here we see a major disadvantage of non-numeric subscripts in REXX: there is no simple way (as there is with numbers) to iterate through "all values" of the subscript.

This is a problem that arises repeatedly when using non-numeric subscripts. There just isn't any way to find, in standard REXX, all the variables having a given stem. (Ironically, programs written in other languages that interface to REXX can do this through the shared variable interface.) But there are ways around the problem. One way to accomplish this objective is to make a compromise with the conceptual simplicity of pure associative arrays and re-introduce numerically subscripted arrays.

In our example, what we do is use the stem book_title. to store book titles. At the same time we build the rest of the table, we also set book_title.i to the title of book number i. Then, in any circumstance where we have to search through the whole list of books, we iterate through values of book_title.i for i=1 to whatever the largest book number is. Using the name stored in book_title.i we can then retrieve any of the other information, since it is "subscripted" with that name.

Here's how we might write the code to read in the data with this technique:

   count = 0
   do while lines(input) \= 0
       parse value linein(input) with label ':' data
       if label = '' then
           iterate
       data = strip(data)
       label = translate(label)
       if label = 'TITLE' then do
           count = count + 1
           book_title.count = data
           title = data
           end
       else
           call value 'book_'label'.title', data
       end
   call lineout input
   book_title.0 = count

Notice that we set the .0 element of the compound variable to the number of elements in the "array". This is a very common convention one finds in REXX, though it isn't an official part of the language.

Now we have a means of looking up book information either directly through the book name or by processing the whole table of information sequentially. But there is yet another problem to consider. It depends on the details of how REXX compound variables are actually implemented. Most implementations store the data in binary trees. For processing efficiency, each node of the tree contains the string value of the "subscript". If the same string is used as a subscript on many variables, it may therefore appear many times in storage. With names that are long (such as the names of books), this can waste a lot of storage.

One last variation of the sample code fixes this:

   count = 0
   do while lines(input) \= 0
       parse value linein(input) with label ':' data
       if label = '' then
           iterate
       data = strip(data)
       label = translate(label)
       if label = 'TITLE' then do
           count = count + 1
           index.data = count
           end
       call value 'book_'label'.count', data
       end
   call lineout input
   book_title.0 = count

Again the solution is to make another compromise and use numeric subscripts on most arrays. Since each book is associated with a unique index (the index in the "book_title" array), we might just as well use this numeric subscript to index all of the individual columns in the table. Then, to retain the ability to do associative lookups we introduce one more array, which will be the only one actually subscripted with the book title. We might call this array "index", and provide that index.title is the common numeric subscript for all the other arrays (the table's columns) which hold particular kinds of data about books. This data now includes the book's title, and each row is instead labeled by a number.

Then it will be true that if i = index.title, book_title.i = title. And all the other information is referenced as book_author.i, book_publisher.i, etc. So it is still possible to do associative retrieval. Suppose we want to display all information about a book in response to a query. We might use the code fragment:

   i = index.title
   say 'Data for' title':' 'Author='book_author.i',',
       'Publisher='book_publisher.i',' 'Pub. date='book_date.i

But now it is also very easy to produce a report on all books in the database:

   do i = 1 to book_title.0
       say "Book:" book_title.i', Author:' book_author.i
       end

Or you could insert any selection logic you want into the loop if you need to limit the search based on date or publisher or whatever. For instance, if you want only books that contain a certain phrase in the title:

   do i = 1 to book_title.0
       if pos(phrase, book_title.i) = 0 then
           iterate
       say "Book:" book_title.i', Author:' book_author.i
       end

That's not a bad solution. We now have the means to do both associative retrieval and sequential search in our "database". Notice that our database is kept entirely in memory in REXX variables, using natural REXX data structures. Unless the database is very large (perhaps a megabyte or more), this isn't a problem in OS/2.

Even so, we might wonder whether it isn't possible to do better. It seems at least a little inelegant to have to do a sequential scan of the database every time we want to find something that has not been explicitly indexed by keeping a separate array, particularly since complex conditions may run somewhat slowly. But it's not really possible to do much better with standard OS/2 REXX.

There are, however, third-party extensions to REXX that are available for adding all kinds of new capabilities to the language. One of these, Quercus Systems' REXXLIB, has a number of functions for working with REXX arrays and compound variables.

One of the functions, called CVTAILS, is capable of scanning all the tails of a given compound variable searching for matches on a particular string. Using CVTAILS, the above loop would reduce to:

   call cvtails 'index.', 'list.', phrase
   do n=1 to list.0
       title = list.n
       i = index.title
       say "Book:" book_title.i', Author:' book_author.i
       end

What CVTAILS does is to construct another array (in the 'list.' variable) whose values are the selected titles. Although we still need a loop, it runs only over the list of actual "hits" we found in order to display them.

In order to understand this example, recall that the index. compound variable was "subscripted" by actual book titles, and the value of each item was the numeric index in a "conventional" array.

CVTAILS could have been used without a search phrase to actually generate the list of all tails of a compound variable without having had to set this up explicitly. (In the present case, this corresponds to the "book_title." array.)

Let's consider a slightly different problem. What if we wanted to find all the book titles by a particular author? We could do:

   do i = 1 to book_title.0
       if book_author.i \= name then
           iterate
       say "Book:" book_title.i', Author:' book_author.i
       end

All we've changed is to search a different column of the table. Can this be done with CVTAILS? No, because the author name was not kept in an index variable. But there is another function, CVSEARCH, in REXXLIB that will do what we want:

   call cvsearch 'book_author.', 'list.', name
   do n=1 to list.0
       i = list.n
       say "Book:" book_title.i', Author:' book_author.i
       end

Suppose we wanted to get fancier and produce a report of books sorted alphabetically on the title. This could be a challenging exercise, since REXX does not have a built-in sort routine. But let's suppose for a minute that it did, called ARRAYSORT. This takes a REXX array (i.e. a compound variable indexed from 1 to whatever) and sorts it based on the value of each item. Or more generally, on one or more subfields within the value. One could write such a function in REXX itself, though it is not a trivial exercise, and the performance would probably be a little slow.
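
As a rough illustration of what such a routine involves, here is a minimal insertion sort written in plain REXX. The routine name sort_items and the stem items. are made up for this sketch. It is not REXXLIB's ARRAYSORT: it sorts only on the whole value of each item, the comparison is case-sensitive, and it will be slow for large arrays:

   /* hypothetical helper: sorts items.1 ... items.n in place,  */
   /* where n is kept in items.0; uses strict string comparison */
   sort_items: procedure expose items.
   do i = 2 to items.0
       item = items.i
       /* shift larger values up one slot to open a gap for item */
       do j = i - 1 to 1 by -1 while items.j >> item
           k = j + 1
           items.k = items.j
           end
       k = j + 1
       items.k = item
       end
   return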

We still have to adapt such a routine to be used with the way we have stored the data. Suppose you just did:

   call arraysort 'book_title.'

in order to rearrange the 'book_title.' array. The problem is that this sorts only one column of the table. All other columns of the table would be unaffected, so all connection between titles and the other information would be lost. The relevance of this to the issue of exactly how we choose to store the data is that if we had continued to subscript all columns by the actual title there would not have been a problem. To reiterate, you have to consider how you will use the data at the time you decide how it will be stored.

But there are relatively simple expedients that can be employed if you decide you want to continue storing the data in numerically-subscripted arrays. You could, for instance, make a copy of the array of titles, and sort the copy:

   do i = 0 to book_title.0
       sorted_book_title.i = book_title.i
       end
   call arraysort 'sorted_book_title.'

REXXLIB contains two functions, ARRAYCOPY and CVCOPY, either of which could be used to make the copy without using a loop:

   call arraycopy 'book_title.', 'sorted_book_title.'
   call arraysort 'sorted_book_title.'

Then you could iterate through 'sorted_book_title.' and retrieve the proper subscript for each column using the 'index.' array:

   do n = 1 to sorted_book_title.0
       title = sorted_book_title.n
       i = index.title
       say "Book:" title', Author:' book_author.i
       end

But there are additional alternatives. For instance, you could construct a new array that contains both the title and its original numeric index:

   do i = 1 to book_title.0
       sorted_list.i = right(i, 5) || book_title.i
       end
   sorted_list.0 = book_title.0
   call arraysort 'sorted_list.',,, 6

The extra argument to ARRAYSORT is the position in the string at which sorting is to begin. It skips over the first 5 positions which contain the index number.

Then to use this array:

   do n = 1 to sorted_list.0
       /* "i ." takes the index as a blank-stripped word, usable as a tail */
       parse var sorted_list.n i . 6 title
       say "Book:" title', Author:' book_author.i
       end

There are many other techniques that could be used as well, and the choice of which to use depends on the nature of the problem, or your own personal taste. It can be hard to predict ahead of time which methods will perform best. So if this matters, the best advice is to actually benchmark different approaches.

Incidentally, ARRAYSORT is another function that is included in REXXLIB.

We're going to move on soon to the other important "structure" issue in REXX programming, namely overall program structure. But first let's look at another data structure issue that turns out to be relevant to program structure.

Up until now we have been supposing that our book database is stored in a flat ASCII file. It could just as well have been stored in a more conventional database file maintained by DB2/2 or some other database manager. If we used a conventional database manager, then many of the issues of data retrieval and sorting might be handled by the database manager itself. This would be convenient, since we wouldn't have to program the operations in REXX.

However, there are drawbacks to using a conventional database manager. For instance, a DBMS can be difficult to set up and use. Installation alone takes time, and then there is the problem of defining the files to be used and different record structures for each application. And, if you want to distribute your application to others, you need some assurance that others have the database software - perhaps you will have to supply it to them (which can get expensive).

If you are using one of the REXX GUI application builders, this job may be easier, since they all have various tools for using a number of OS/2 database management systems.

However, we think that in fact a very large number of "database" applications can be programmed solely in REXX without an extra DBMS, simply by using the techniques presented here. Any time you have a set of data - whether it is about books, or people, or your multimedia CD-ROM collection - you have a database. This collection of data can be kept entirely in memory while you are processing it (if it isn't too large) by representing it in REXX variables. In other words, the aggregate of all of the variable values used in a REXX program (or suite of programs) constitutes a database.

Of course, you need some way to make this data persistent, so that it endures beyond the execution of any associated program, even though it may (and often will) be modified in part by the program. And this is certainly one thing that a conventional DBMS would do for you. But suppose we don't want to use a DBMS. Are there alternatives in REXX itself to storing everything in flat files?

Of course there are. For instance, we could store the data right inside the program. Returning to our book database example, let's first consider how we would do it in another language - say C. In the first place, C has real "structures". You might have this to represent a single book record (bear with us if you don't know C):

   struct book_record {
       char *title;
       char *author; };

(We'll omit other parts of the record for brevity.) You could then define all your data as a table of such structures:

   struct book_record book_table[] = {
       { "Star Maker", "Olaf Stapledon" },
       { "Brightness Falls from the Air", "James Tiptree" } };

How does one do a similar thing in REXX? Well, right away we run into the fact that REXX doesn't have data structure declarations analogous to what is in C or in many other languages. Consequently, it isn't possible to define data statically. If you're going to keep the data in the program, you pretty much have to do it with a series of assignments that are performed at run time:

   book_title.1 = "Star Maker"
   book_author.1 = "Olaf Stapledon"
   book_title.2 = "Brightness Falls from the Air"
   book_author.2 = "James Tiptree"

And don't forget to set the total size of each array:

   book_title.0 = 2
   book_author.0 = 2

And create the index array too:

   do i = 1 to book_title.0
       title = book_title.i
       index.title = i
       end

That looks like a lot more work than you have to do in C, and it is also more work than you have to do in order to read the data from a flat file. So what advantage could there be to keeping the data in the program itself? Well, if we could find a better way to do this in REXX, it might actually be less work to type in. Recall that we used labels in our ASCII file in order to identify different elements of a record. There's a lot of extra typing just for all those labels.

Here's an alternative. Create a function that can be called with arguments that identify each record element. Name the function "make_record" (for instance). Then you could put a series of calls in your program:

   call make_record "Star Maker", "Olaf Stapledon"
   call make_record "Brightness Falls from the Air", "James Tiptree"

That's not really much more trouble than it is in C, since you can probably use features of your favorite text editor to insert the first part of each line. Here's what make_record looks like:

   make_record: procedure expose (book_stuff)
   n = book_title.0 + 1
   book_title.0 = n
   book_author.0 = n
   title = arg(1)
   book_title.n = title
   book_author.n = arg(2)
   index.title = n
   return
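
Note that make_record relies on some setup that has to happen before the first call. A minimal sketch follows, assuming that book_stuff is simply a variable holding the names of the stems to be exposed (its exact contents here are our assumption) and that the arrays start out empty:

   /* names made visible by "procedure expose (book_stuff)" */
   book_stuff = 'book_title. book_author. index.'
   /* start with empty arrays so that book_title.0 + 1 is valid */
   book_title.0 = 0
   book_author.0 = 0

   call make_record "Star Maker", "Olaf Stapledon"
   call make_record "Brightness Falls from the Air", "James Tiptree"
   say book_title.0 'books loaded'
   exit    /* keep control from falling into make_record */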

Of course, there will be even more run-time overhead with using this method of initializing your REXX data structures than there is with the series of assignments. But not much. I will mention later one way to mostly eliminate this overhead.

I wouldn't necessarily recommend you always use this approach instead of entering your data in a flat file. To some extent this is another matter of taste. Especially for relatively small collections of data (100 records? 1000 records?) it can be very convenient to keep it inside the program that needs it, instead of in a separate file. You might well have some data that warrants this treatment because it is tightly coupled to the program and not very meaningful outside of it - tabulated numerical data, for instance. Also, if you are using a REXX GUI tool that builds a .EXE file, your data will automatically be incorporated in this file, and it may even be encrypted (if that is important to you).

Perhaps you want to write a data-entry program, using one of the REXX GUI tools, to help you actually enter the data you have (if it must be done manually). You will probably wind up having to write a function like make_record anyhow.

Further, this suggests an interesting possibility. How about having functions in your program to access data items as well as store them? If you frequently need to fetch the author of a particular title, it would probably be nice to have a function like this:

   author: procedure expose (book_stuff)
   parse arg title
   n = index.title
   return book_author.n

Though this is only a few lines of code, it can be cumbersome to have to rewrite it every time you need it. This is an alternative that allows associative retrieval of authors given a title, without having to construct an author index.

Or you could get even more general, and write a function that would retrieve any field of the book record:

   book_data: procedure expose (book_stuff)
   parse arg field_name, title
   return value('book_'field_name'.'index.title)

Which is used like so:

   field = 'author'
   Say 'The' field 'of "'title'" is' book_data(field, title)

Note how this looks syntactically like a 2-dimensional array reference.
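
One thing the access functions above do not handle is a title that isn't in the database, since an uninitialized index.title simply yields its own name. A possible refinement is sketched here. It assumes the stem was given a default value with index. = 0 before any records were added, which is an assumption of this sketch rather than something done earlier:

   /* variant of book_data that returns '' for an unknown title */
   book_data: procedure expose (book_stuff)
   parse arg field_name, title
   n = index.title
   if n = 0 then
       return ''
   return value('book_'field_name'.'n)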

One of the big advantages of this as a technique is that it encapsulates the details of the data structure used inside the access functions. So you are free to change these structures as you see fit, without having to rewrite a lot of code. This is a big advantage because you will probably want to experiment with different data structures in an application as your understanding of the problem evolves. You might even choose at some point to move the data into a DBMS, yet most of the program wouldn't need to be aware of this.

We noted earlier that there might be a lot of overhead upon startup if a REXX program has to initialize each data record with a procedure call - certainly this is so in comparison with a language like C where the data can be stored statically. There may be just as much overhead, or more, if the program has to read the data from a flat file, since each line of the file has to be parsed for content. It would be good if there were some way of storing the data in a file that could be loaded into REXX variables with a minimum of overhead.

There is such a way. One of the facilities available in some REXX function libraries is the ability to store a group of variables in an external file with a single call and to reload the variables with another call. In REXXLIB, for instance, there are actually a couple of ways to do this. VARWRITE is the function that writes the data, and VARREAD reads it. These functions can deal with either selected named variables (or stems), or all of the variables in a given file or program.

So, if you have initialized the book database as suggested above,

   call varwrite filename, 'i', 'book_title.', 'book_author.',,
       'book_publisher.', 'book_isbn.', 'book_date.', 'index.'

would be enough to save all of the information. (The second argument, 'i', means that the remaining arguments are a list of the variables to be included in the operation.) Reloading it would be even easier:

   call varread filename

Given this, you might consider putting all of the data initialization calls into a separate REXX program that stores the data in a file with VARWRITE. The main program can then load this data any time it is needed with VARREAD, and the start-up overhead is minimized. If the data doesn't change very often, you wouldn't need to write a special data-entry program for it, since you could just edit the file creation program.

Multiple-file program structure

We have just indicated a special case where the separation of a single REXX application into more than one source code file makes sense, namely when you have one program to initialize some data, and a different one that uses the data. There are many more cases when an application might be divided into two or more source files. Certainly, if there is a lot of code in the application, it is much easier to maintain the code (especially if more than one programmer is involved) if multiple files are used. Or there may be a lot of code in the form of subroutines that needs to be used in different places. It is obviously desirable to keep only one copy of the subroutine code, which can be invoked as needed.

Another reason that it is sometimes necessary, or at least desirable, to keep REXX code in separate source files has to do with the REXX GUI tools. Each one has different requirements, but in general different application windows may most easily be created and maintained when their associated REXX code is kept in separate files.

Whatever the reason, using REXX code that resides in more than one source file is a fact of life with applications of any appreciable size. It turns out that this presents several problems. So the rest of this paper will deal with various problems and solutions for working with REXX code in multiple files. This is what we mean by "program structure" (rather than issues having to do with how code might be structured within a single file).

The main problem that arises with multiple REXX source files is the difficulty of sharing data among them. There is no feature in the REXX language itself that provides globally shared data in a more or less transparent manner. In most other languages it is at least possible to have static data that can be accessed by different source files that have been linked together. With REXX, on the other hand, sharing data between source files is always like sharing data between separate .EXE files.

There is a second problem we will touch on later - the sharing of code (subroutines) among the different files in a large application.

In a nutshell, here are some of the ways that data can be shared or passed around among a number of separate REXX programs:

   1. data files on disk (or virtual disk)
   2. OS/2 .INI files
   3. OS/2 "environment variables"
   4. data sharing mechanisms provided by the REXX GUI tools and other
      third-party REXX add-ons
   5. REXX external data queues
   6. other interprocess communication facilities such as named pipes

Possibly the most straightforward data sharing technique simply uses data files. The files might be managed by a DBMS, which provides the greatest amount of functionality, as well as safeguards to guarantee the integrity of data in case of concurrent access by multiple programs. Or the files could be flat files or files created and accessed by functions like VARREAD and VARWRITE, such as we have already discussed. In this case, the REXX program may have to assume responsibility for properly handling concurrent access if there is a possibility that multiple threads might need to access the data (while at least one thread might be updating it). This can be done with OS/2 semaphores. There is no support for semaphores in OS/2 REXX as delivered. But it is available in some of the REXX GUI packages and some of the third-party REXX libraries like REXXLIB.

Semaphores are easy to use, given a package that supports them. The type of semaphore that has to be used is called a "mutual exclusion" semaphore. Only one thread at a time can have "ownership" of a semaphore. A thread gains ownership of a semaphore simply by requesting it. However, if another thread already owns it, the second thread has to wait until the semaphore is released.

Before a semaphore can be used at all it has to be created. Every semaphore has a name, which resembles a file name that begins with "\SEM32". Once it has been created, other threads simply refer to this semaphore by its name. Using REXXLIB functions, the call to create a semaphore looks like this:

   call mutexsem_create "\sem32\my_semaphore"

Suppose we want to have exclusive access to a file while it is being updated. Although the file system itself provides some protection against interference between different threads accessing a file, a program that is merely trying to read the file may not be able to determine that the reason it is unable to access the file at a given moment is due to file system serialization. It may be easier to use semaphores explicitly. To protect a file this way you might have:

   /* wait until resource is free */
   call mutexsem_request "\sem32\my_semaphore"
   do ...
       /* do something with the resource */
       end
   call mutexsem_release "\sem32\my_semaphore"

The resource in question here doesn't need to be a file. It could be anything that might conceivably be shared between threads, such as an external data queue or a shared variable. (We'll discuss these shortly.)

OS/2 .INI files present a special case of data files, because there is a special access function provided in OS/2 REXX as delivered: SysIni. This interface is at a fairly high level, and provides for several logical levels of data. Separate .INI files can be used for different applications. Even within the same file, data can be organized in a two-level hierarchy of major categories and individual "keys". The categories are called "applications", but in practice you would probably want to use at least one separate .INI file for each application.

REXX programs can use the system .INI file (OS2.INI), but this should usually be avoided for performance reasons, as well as to avoid possible name-space conflicts. Another problem with using OS2.INI is that it is vulnerable to corruption since it is used by so many other applications as well as by OS/2 itself. OS2.INI is sometimes difficult to back up, and it can be completely lost when system problems occur or OS/2 is reinstalled.

.INI files can be used simply for communication among any number of independent processes in the system, but they are best used when data needs to be persistent. You should definitely avoid using OS2.INI if you have a lot of data or if you are not concerned about data persistence.

OS/2 guarantees that access to the .INI files themselves is properly protected with respect to concurrent updates. That is, calls to SysIni (for the same file) are atomic. However, if you have to make a series of calls to read or write a number of different data elements, you should use semaphores (or an equivalent technique) to ensure consistency of the data.

One of the nice things about .INI files is that the SysIni function allows you to retrieve more than one piece of data at a time. The highest level of data organization in a .INI file is called the "application", and you can retrieve the names of all applications in the file like this:

   call sysini filename, 'ALL:', 'applist.'

which puts the names into the array 'applist.', with the number of items in applist.0.

For any specific application name, the second level of information is called a "key". You can retrieve the names of all keys for an application like this:

   call sysini filename, appname, 'ALL:', 'keylist.'

That yields only the names of the keys. You then have to make separate calls to SysIni to retrieve the value associated with each key.
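
Putting the two steps together, a sketch like the following lists every key and value for one application. SysIni is part of the REXXUTIL library, so it has to be registered before use. The variables filename and appname are assumed to have been set already:

   /* register the REXXUTIL functions (including SysIni) */
   call rxfuncadd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
   call sysloadfuncs

   call sysini filename, appname, 'ALL:', 'keylist.'
   if result = 'ERROR:' then
       keylist.0 = 0          /* treat a failed query as "no keys" */
   do k = 1 to keylist.0
       key = keylist.k
       say key '=' sysini(filename, appname, key)
       end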

This framework is not especially well-suited for dealing with arrays, such as repeated records of a database. But it can be done. For instance, you could store the book database in one .INI file by making each title a separate "application", since the title is the unique "key" that identifies each record. (But watch out for possible duplication of titles!) Then for each title there would be separate keys for "author", "publisher", "date", and "isbn". (We'll assume, as we have all along, that there is only one author per book - or else we keep all author names somehow in the same data item.)

Given that assumption, then this code could write a .INI file with our book database:

   do i = 1 to book_title.0
       call sysini filename, book_title.i, 'author', book_author.i
       call sysini filename, book_title.i, 'publisher', book_publisher.i
       call sysini filename, book_title.i, 'date', book_date.i
       call sysini filename, book_title.i, 'isbn', book_isbn.i
       end

And this code could read all the information back into variables:

   call sysini filename, 'ALL:', 'book_title.'
   do i = 1 to book_title.0
       call sysini filename, book_title.i, 'ALL:', 'list.'
       do j = 1 to list.0
           call value 'book_'list.j'.'i,,
               sysini(filename, book_title.i, list.j)
           end
       end

Note that in this example we have allowed that there might be additional "keys" for any book besides the standard ones we have been using for illustration. (This example doesn't set the .0 element of arrays except for book_title., and it doesn't build the index. compound variable.)
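
If the rest of the program expects those pieces, they can be filled in after the load. A short sketch, assuming that only the four standard keys are of interest:

   /* give each column the same count as book_title. */
   book_author.0    = book_title.0
   book_publisher.0 = book_title.0
   book_date.0      = book_title.0
   book_isbn.0      = book_title.0
   /* rebuild the title-to-subscript index */
   do i = 1 to book_title.0
       title = book_title.i
       index.title = i
       end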

Although this code is quite a bit more complex than the equivalent using VARWRITE and VARREAD shown above, it does use only facilities delivered with OS/2. The code is also a little simpler than the equivalent for writing and reading the data in a flat file.

There are other ways to store our book database in a .INI file, such as including the numeric subscripts in individual keys. Whether you would want to use a different approach, of course, depends ultimately on how you most often need to use the data.

Another nice thing about .INI files is that they can be used by programs in any language that can access OS/2 API functions, as well as from REXX. So they offer one way to communicate between programs written in REXX and those in other languages.

OS/2 environment variables present one of the easiest techniques for sharing data between programs. REXX programs can read or write environment variables by using the VALUE function:

   call value 'dirname', 'c:\myfiles', 'os2environment'

sets the environment variable called DIRNAME, and

   x = value('dirname', , 'os2environment')

retrieves it.

OS/2 environment variables are specific to one particular process within the system. They can therefore be used to share data among REXX programs that have a calling relationship or are otherwise part of the same process. But they can't be used to exchange data across processes.

Another problem with environment variables is that there isn't any support at all for easy use of arrays. You could create separate environment variables with names like

   book_title.1
   book_title.2

and so forth, but you would have to make a separate call to VALUE to read or write each data item.
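
For example, loops along these lines would copy the titles into environment variables and read them back. This is only a sketch: it assumes book_title.0 holds the count, that the data is small enough to fit comfortably in the environment, and the stem name copy. is invented for the illustration:

   /* write: one environment variable per element, plus the count */
   do i = 0 to book_title.0
       call value 'book_title.'i, book_title.i, 'os2environment'
       end

   /* read back into a different stem */
   n = value('book_title.0', , 'os2environment')
   if n = '' then
       n = 0                  /* variable was never set */
   do i = 1 to n
       copy.i = value('book_title.'i, , 'os2environment')
       end
   copy.0 = n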

Finally, as with most other data sharing techniques, you have to develop your own access control mechanisms using semaphores to handle concurrent update problems.

Each of the REXX GUI tools provides its own method of sharing REXX variables between separate code files. VisPro/REXX, for instance, allows you to define variables associated with each "form" (roughly speaking, a window). These variables are accessible to all event procedures for the form, and to all subforms.

In the first release of VisPro this technique was necessary, since REXX variables were not global to an application by default. In release 2.0 of VisPro ordinary REXX variables did become global by default, and this extra mechanism can instead be used as a way of keeping private data associated with a form.

Variables to be handled this way are defined by entering their names in one page of the settings notebook for the form. The nice thing about this mechanism is that you can easily share all elements of a compound variable simply by entering the stem name in the settings notebook. The variables are accessed within the program just by referring to them in the normal REXX way.

The downside to the way that VisPro handles this is that there is some runtime overhead associated with making private copies of such global variables when forms are opened or closed.

In VX-REXX there are methods called PutVar and GetVar that can be used to set and retrieve global variable values. This is similar to how the VALUE function is used for working with environment variables, and it is less convenient than the way things are done in VisPro, since these variables cannot be accessed in the normal REXX way: GetVar and PutVar must be used. There is the further restriction that only compound variables that are "arrays" (i.e. having positive integral tails with the number of such elements in the .0 element) can be used this way. The advantage of this approach is in reduced overhead in opening and closing windows.

GpfRexx has a set of functions (QueryStem, QueryStemElement, RemoveGlobal, RemoveStem, RemoveStemElement, SetGlobal, SetStem, and SetStemElement) for managing global variables. This is basically like the VX-REXX facilities, except that arbitrary compound variables can be handled.

In all three cases, the "global" variable facilities actually apply only to a single program (.EXE file). So they can be used to share data among the different parts of one application, but not at all between applications. Furthermore, the data is not persistent beyond the lifetime of each invocation of the application.

Other third party tools can overcome some of these restrictions. Quercus Systems' Personal REXX has a utility command called GLOBALV that is capable of creating and maintaining true system-wide global variables. Such global variables can even be made persistent for as long as OS/2 is running, or (optionally) even indefinitely by keeping the data in special disk files.

GLOBALV also supports a two-level hierarchy for structuring data. The first level is called a "table", and within each table can be any number of individual variables. There is not, however, an exact and automatic mapping onto REXX compound variables.

GLOBALV is patterned closely on a command of the same name available in VM/CMS. This makes it possible for programs that use it to be portable among systems that implement GLOBALV (which means, currently, DOS, OS/2, VM/CMS, and Windows).

As with most other data sharing techniques, GLOBALV protects its own internal data structures with respect to concurrent update. But consistency of related data items remains the programmer's responsibility.

Just as with files, GLOBALV makes it possible to share data among any processes in OS/2. This may be important, because the REXX language does not contain any multi-tasking capabilities of its own. Although there are multi-threading functions provided in the REXX GUI tools and certain third-party add-on function packages, the only way to achieve concurrency in OS/2 REXX as it is delivered is to use multiple processes. One might well choose to implement an application as multiple processes running REXX code in order to use the multitasking capabilities of OS/2. In this case, some means for communication between the processes becomes necessary.

There are, of course, alternatives to GLOBALV for interprocess communication. The two we will discuss here are REXX external data queues and named pipes. There are other IPC mechanisms (such as a native OS/2 "queue" mechanism, which is distinct from a REXX queue). But the various techniques differ among themselves mostly by their syntax and their performance.

There are two kinds of REXX external data queues: the unnamed "session" queue and named queues. The "session" queue is unique to a particular REXX session. This is broader in scope than a single OS/2 process, since it will usually include all descendant processes as well. For instance, one invocation of the CMD.EXE command shell, together with any commands it may run, represents one session. This means that one REXX program can save data in the session queue, and it will still be available to later REXX programs until it has been fully consumed. This is true even if the data is placed in the queue by a REXX-aware application (.EXE file) or a .EXE file built by one of the GUI tools. The session queue is not automatically destroyed when the .EXE file terminates. Instead, it survives, and its data may persist, until the original copy of CMD.EXE is closed.

Programs running in other sessions, however, have their own private session queue and cannot access a different session queue. The privacy of the session queue is an advantage, in that truly independent REXX programs cannot interfere unintentionally with each other through the session queue. But it also means that they can't use the session queue to communicate with each other.

The other type of external data queue is a "named" queue. This name is truly system-wide, and such queues can be used for communication between "unrelated" programs, as long as they all use the same name for the queue. It is the programmer's responsibility to ensure that unique names are created when appropriate, and that every application which needs to use a particular named queue has a means of finding out the correct name.

It is safest to use the unnamed session queue only for communication among different REXX files that have some calling relation to each other or that are part of the same application which has been created by one of the GUI tools. While data in the session queue will normally persist between different invocations of a macro by a REXX-enabled application, it may not be a good idea to rely on this, since the data will be lost if the application (or the system) unexpectedly terminates. On the other hand, this is appropriate behavior for transient, temporary data. So one good use of the session queue is as a large "scratch pad" data area that can hold data for use by several related REXX programs.

In particular, the session queue is a good way to pass relatively large amounts of data to a subprocedure. One of the most frequently asked questions about REXX is how to pass an array (especially a large one) to an external subroutine and how to return an array. The answer is that it just isn't possible directly. (For internal subroutines, of course, it is possible to share compound variables by the use of the EXPOSE keyword on a PROCEDURE statement.)

To pass an array to a subroutine through the queue you just QUEUE or PUSH each element. The data queue is maintained as a double-ended list. Data may be removed only from the front (or top) of the list, with the PULL instruction, but it can be added to the list either at the end (bottom) or the front of the list, depending on whether you use QUEUE or PUSH, respectively. These two possibilities are referred to as "first-in first-out" (FIFO) and "last-in first-out" (LIFO).
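
As a minimal sketch of this technique, the caller below queues an element count followed by each element of a compound variable, and an external routine rebuilds the array on its side. The stem names list. and item., the file name SHOWLIST.CMD, and the count-first convention are all assumptions made for illustration; any agreed protocol would do. The session queue is assumed to be empty beforehand.

   /* caller */
   list.1 = 'alpha'; list.2 = 'beta'; list.3 = 'gamma'
   list.0 = 3
   queue list.0                    /* element count goes first */
   do i = 1 to list.0
       queue list.i                /* FIFO: elements arrive in order */
       end
   call showlist                   /* external routine in SHOWLIST.CMD */

   /* SHOWLIST.CMD (hypothetical external routine) */
   parse pull item.0               /* read the count queued by the caller */
   do i = 1 to item.0
       parse pull item.i           /* PARSE PULL preserves case */
       say i':' item.i
       end
   return

PARSE PULL is used here rather than plain PULL because PULL translates the data to uppercase. An array can be returned the same way, with the callee queuing the results and the caller pulling them after the call.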

The FIFO case is normally the easiest to conceptualize, so it is the one you would probably choose most frequently. The main reason to choose LIFO is to avoid problems when the same queue is used in a nested fashion by more than one level of subroutine. That is, if any given routine adds data only to the front of the queue before calling a subroutine, and if the callee only removes as much as it needs, then it is possible to nest such calls, even recursively, without disturbing data placed on the queue for a different purpose.
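
The difference in ordering can be seen in a short sketch. It assumes the default queue is empty when it starts; the strings are arbitrary.

   queue 'first'                   /* added at the end of the queue */
   queue 'second'                  /* added at the end, after 'first' */
   push 'urgent'                   /* added at the front of the queue */
   do while queued() > 0
       parse pull line
       say line                    /* urgent, then first, then second */
       end

Because PUSH places data ahead of everything already queued, a routine that pushes its own items and whose callee pulls exactly that many leaves the older entries untouched.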

If the queue needs to be used for several, or many, unrelated purposes, it is probably safer to simply use multiple named queues, even if only one session is actually involved. But one problem with using a named queue this way is that you have to be careful that you construct a name that will be unique, so that if the same application is invoked more than once simultaneously then there is no interference between the multiple invocations. You also need to have a means for the intended user of the queue to find out what the actual name to be used is.

One other problem with named queues is simply that they are a little inconvenient to work with. You first have to create the queue with a call to the RXQUEUE built-in function. Then you have to check that the name was not already in use (as indicated by RXQUEUE returning a name other than the one you asked for). Finally, you have to make the new queue the default queue by another call to RXQUEUE:

   qname = rxqueue('c', proposed_name)
   /* compare in uppercase, in case RXQUEUE returns the name uppercased */
   if translate(proposed_name) \= translate(qname) then  /* this queue must already exist */
       call rxqueue 'd', qname     /* destroy the unwanted new queue */
   call rxqueue 's', proposed_name

The last call here makes the named queue become the default. This is a necessary step, since there is no way to indicate on the QUEUE or PUSH instruction what queue is meant - it is always the "default" queue. Names that are valid for data queues are just like names that are valid for REXX variables - they must begin with a letter (or certain characters like "!", "_") and not be longer than 250 characters.

This procedure can be simplified slightly if you let REXX choose a name for you when creating the queue. This is probably the best approach to use when you want to have a queue that is used only for temporary scratch space. You might do this for communication between separate REXX programs, or you might do it within a single program (even a single source file) just to avoid any possibility of conflicting use of the queue. To have REXX assign the name, just omit any name when you create the queue:

   qname = rxqueue('c')
   call rxqueue 's', qname

Any time you do create a queue, you are responsible for eventually destroying it, particularly if you are only using it for scratch space. By the very nature of a named queue, it is persistent (as long as OS/2 is running), so it isn't automatically destroyed when the program (or even the program's session) terminates. The 'd' option passed to RXQUEUE destroys a queue:

   call rxqueue 'd', qname

Incidentally, a queue is not destroyed when all data is removed from it. A queue can be empty, as it will be when it is initially created.
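
Putting these steps together, the life cycle of a scratch queue might look like the following sketch. The variable names are arbitrary; the sketch relies on the 's' option of RXQUEUE returning the name of the previous default queue, so that it can be restored afterwards.

   qname = rxqueue('c')            /* REXX generates a unique name */
   oldq = rxqueue('s', qname)      /* make it the default; remember the old one */
   queue 'some temporary data'
   parse pull item                 /* item now holds the queued string */
   call rxqueue 's', oldq          /* restore the previous default queue */
   call rxqueue 'd', qname         /* destroy the scratch queue */

Restoring the old default before destroying the scratch queue keeps later QUEUE, PUSH, and PULL instructions from referring to a queue that no longer exists.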

You can tell how many data items are in a queue with the QUEUED built-in function. This function takes no arguments - it works only on the current default queue. The fastest way to remove all items from a queue is just to PULL until QUEUED becomes 0:

   do while queued() \= 0
       pull
       end

Note that it is legal to use PULL with nothing else on the line.

Named queues are often used with applications that are structured as "client-server". This can be done only so long as the server and all clients are on the same computer, since queues are not supported across a network. (Named pipes, to be discussed shortly, are good for network use.) In this case, there will probably be at least one queue whose name is known "publicly", through which clients can contact the server.

When a REXX queue is used in a "public" way like this, it's a very good idea to use semaphores to control access to it, just as with other forms of interprocess communication. This is particularly true if several separate data items have to be placed on the queue, because it would probably not work to have messages from different clients intermixed.

In a client-server situation, it is necessary for the server to have some way to wait for data to appear on a particular queue, and for the clients to wait for data to be returned (in the same queue or a different one). The PULL instruction, which is ordinarily used to read from a queue, is not appropriate for this situation, because it is defined to read from the keyboard if no data is available in the default queue. One could continually "poll" for data in the queue with the QUEUED function, but this will eat up CPU cycles.

The solution is to use a poorly-documented feature of the REXX I/O system. This is the fact that you can use a stream name of "QUEUE:" with the LINEIN and LINEOUT functions. On output (LINEOUT), the effect is the same as the QUEUE instruction. But on input,

   data = linein('QUEUE:')

has the effect of suspending the REXX program until data becomes available in the queue, without wasting CPU time.
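
A simple server loop built on this feature might be sketched as follows. The queue name REQUESTS and the convention that the string "end" shuts the server down are assumptions made for the example, not part of REXX.

   /* server: waits on a publicly known queue */
   qname = rxqueue('c', 'REQUESTS')
   if translate(qname) \= 'REQUESTS' then do   /* name was already in use */
       call rxqueue 'd', qname     /* discard the substitute queue */
       say 'Queue REQUESTS already exists'
       exit 1
       end
   call rxqueue 's', 'REQUESTS'
   do forever
       request = linein('QUEUE:')  /* suspends until a client queues a line */
       if request = 'end' then
           leave
       say 'Request received:' request
       end
   call rxqueue 'd', 'REQUESTS'

   /* client, run from another session */
   oldq = rxqueue('s', 'REQUESTS') /* switch to the server's queue */
   queue 'hello from client'
   queue 'end'
   call rxqueue 's', oldq          /* switch back to the previous queue */

A real server would normally send replies through a second queue, whose name the client could pass along as part of its request.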

Just about any form of interprocess communication that can be done with a REXX external data queue can also be done with a named pipe. In fact, a lot more is possible, because named pipes work across a network, and also because programs that do not even have special support for pipes can use them simply by treating them as a file. This makes it possible for an OS/2 REXX program to communicate with a program running in a DOS session, for instance.

The problem with named pipes is that REXX as delivered with OS/2 has very little explicit support for named pipes, although a REXX program, like any other, can treat an already existing pipe as if it were a file. When used as a file, a pipe always has a name of the form "\PIPE\xxx" (when the pipe has been created by a process on the same computer) or "\\server\PIPE\xxx" when the pipe has been created by a process on a network-connected computer called "server".

There is exactly one function in standard OS/2 REXX for working with pipes: SysWaitNamedPipe. You have to use this in case a pipe is "busy", i.e. already in use, which is indicated by an error in the STREAM function used to open the pipe:

   parse value stream(pipename, 'c', 'open') with state ':' retc
   if retc = 231 then
       call syswaitnamedpipe pipename, -1
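
In practice the open has to be retried after the wait, so a slightly fuller version might look like the sketch below. The pipe name is only an example, and the sketch assumes that a successful open reports a state of READY and that the RexxUtil functions are registered through SysLoadFuncs.

   pipename = '\pipe\echo'
   call rxfuncadd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
   call sysloadfuncs
   do forever
       parse value stream(pipename, 'c', 'open') with state ':' retc
       if state = 'READY' then
           leave                   /* the pipe was opened successfully */
       if retc \= 231 then do      /* some error other than "pipe busy" */
           say 'Cannot open' pipename '-' state':'retc
           exit 1
           end
       call syswaitnamedpipe pipename, -1   /* wait, then try the open again */
       end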

Various function packages available from third parties provide more complete support for named pipes. REXXLIB, in particular, has such support. The functions provided there make it possible to do just about anything that is possible with named pipes, except that only one instance of a pipe can be open at a time, because of the single-threaded nature of REXX.

As an example of the use of named pipes, consider the problem of debugging a complex REXX program, especially one that is constructed using one of the GUI tools. Although these tools include debuggers, sometimes the nature of the problem is such that the easiest technique to use is for the program to send messages to a message log when certain abnormal conditions are detected. The GUI tools provide for this through a "console I/O window" which receives the output of SAY instructions. However, if you aren't using a GUI tool, you have to have another way to handle this. It may not be reasonable to mix SAY debugging output with normal output of the program.

The solution is for the program being debugged to send output through a named pipe to a simple server running in another process. The server can display the information in its own window, or possibly even analyze it for the occurrence of specific events. Here is how the server code might look, using named pipe functions provided by REXXLIB:

   pipe = '\pipe\echo'
   call nmpipe_create pipe, 'm', 'm', 'w'
   do i=1
       call nmpipe_connect pipe
       say 'Connect RC =' result
       if result \= 0 then
           exit
       do forever
           message = nmpipe_read(pipe)
           if message = 'end' | message == '' then do
               call nmpipe_disconnect pipe
               iterate i
               end
           say 'Message received: "'message'"'
           end
       end

The pipe is created with NMPIPE_CREATE. NMPIPE_CONNECT is called so that the server can wait for a client to open its side of the pipe. After this occurs, a series of calls to NMPIPE_READ retrieve all data sent from the client, until either a null string or the string "end" is received. This is taken to mean that the client is done (perhaps died) or has closed its end of the pipe.

The client side of this is even simpler. The client might simply open the pipe implicitly by calling LINEOUT with some data. Subsequent calls to LINEOUT send additional messages, and when the client is all done, it just calls LINEOUT with only the pipe name in order to close it.
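
A client along these lines might be sketched as follows. The message texts are invented for the example. Note that LINEOUT appends line-end characters to each message, so depending on how the pipe is read the server may need to strip them before comparing; that detail is not handled here.

   pipe = '\pipe\echo'
   call lineout pipe, 'entering main loop'   /* first write opens the pipe */
   call lineout pipe, 'unexpected value found in record 42'
   call lineout pipe                         /* no data: closes the pipe */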

The last topic we're going to look at in this paper is the question of structuring a large application as a number of separate REXX source files. What we've just covered is a variety of techniques for sharing data among such files, which is a problem since "global" variable values of a calling program are not "inherited" by external subprocedures. But there are other issues that arise too.

For one thing, state information of a running REXX program is not inherited, either, in a call to an external routine: trace settings, NUMERIC and ADDRESS settings, condition handlers, and timers. However, this state information is inherited on internal procedure calls. This can cause problems for you if you break a large program up into a main routine and several external procedures, since these procedures may no longer have the same state settings as before.
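
One way to cope with this, sketched below, is to pass the relevant settings explicitly and re-establish them at the top of the external routine. The routine name TOTAL.CMD and its argument layout are hypothetical; DIGITS() is the built-in function that returns the current NUMERIC DIGITS setting.

   /* caller */
   numeric digits 20
   say total(digits(), 1234567890123, 9876543210987)

   /* TOTAL.CMD (hypothetical external routine) */
   parse arg d, a, b
   numeric digits d                /* re-establish the caller's precision */
   return a + b

Without the NUMERIC DIGITS instruction in TOTAL.CMD, the addition would be carried out with the default precision of nine digits and the result would be rounded.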

In spite of this, there are some features of a calling REXX program that are inherited by a callee. You should be aware of this, since it is not only unexpected (in light of the fact that most things aren't inherited), but also quite undocumented as well. Most importantly, open file information is inherited. This means, in particular, that a subroutine inherits position information about all open files, and the file can be left in a different position when the subprocedure returns.
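
The following sketch illustrates the effect described here. The file name data.txt and the routine SKIPLINE.CMD are hypothetical, and the comments describe the behavior expected if open file state is shared in the way just stated.

   /* caller */
   file = 'data.txt'
   first = linein(file)            /* opens the file and reads line 1 */
   call skipline file              /* external routine reads one more line */
   next = linein(file)             /* resumes where the callee left off */

   /* SKIPLINE.CMD (hypothetical external routine) */
   parse arg file
   call linein file                /* advances the shared read position */
   return

If the position were not shared, the callee would start again at the first line of the file and the caller's next read would return line 2.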

Other information of somewhat lesser importance that is always inherited includes the name of the current default external data queue. Also, if you use any external function libraries that keep their own state information, this will most likely be maintained independently of external procedure calls.

Apart from the question of how various kinds of environmental information behave with respect to external procedure calls, the main issue you face in using external procedures is simply the extra overhead of finding and loading them. Although it doesn't take much time to find and load an external REXX procedure once, this can be a big factor if it has to be done 1000 times in a loop. Many REXX applications run surprisingly slowly for just this reason.

Yet there are good reasons for wanting to break up a large program - the standard concerns of modularity, modifiability, sharing of code, and so forth. Fortunately, there is an alternative that allows keeping code in separate files, without the overhead of searching for them on disk and loading them when required.

As supplied with OS/2, REXX supports a feature called "macro spaces". Basically this is a capability for loading program files into shared memory and keeping them resident as long as desired so that the search and load overhead can be bypassed. All or parts of this shared code space can be saved in disk files (in "tokenized" form) in order to create (in effect) REXX subroutine libraries. Such libraries can be saved or distributed and reloaded as a whole when appropriate.

Among other things, macro space libraries provide an answer to a question that often arises with programmers who have to distribute REXX code. For a variety of reasons (protection against modification, hiding of confidential information, etc.) it is often desirable not to distribute source code. Since the REXX code in a macro space is saved in the "tokenized" form, one gets an immediate solution to this problem. (Note that IBM does not guarantee tokenized code will continue to operate in future releases of OS/2. If you do this, be prepared to redistribute your code if an incompatible change does occur.)

So macro spaces can help manage two distinct problems: the performance of calls to external routines and the need for a way to avoid exposing source code. Unfortunately, there is one hitch: REXX as distributed doesn't provide REXX-callable functions for working with a macro space. Third party libraries, again, provide the solution.

REXXLIB has a set of functions for managing macro spaces. The first step is to load all of the source files you want into the macro space. If you don't want to create a permanent macro library, that is all you have to do. Otherwise, you make one more call that causes the library to be written to disk.

For instance, suppose that the compound variable names. contains the list of names of procedures required. Suppose the extension of all files is ".CMD". Then you would use:

   do i = 1 to names.0
       call macroadd names.i, names.i".cmd", 'B'
       end
   call macrosave 'mymacros.mac', 'names.'

Although the file names used in this example are the same as the routine names (except for the extension), this does not need to be the case. You could associate any procedure name you want with any file.
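
For example, using the same MACROADD argument order as above, a routine could be registered under a name that has nothing to do with its file name. Both the routine name and the path here are invented for illustration.

   call macroadd 'fmtdate', 'c:\tools\dateutil.cmd', 'B'
   call fmtdate                    /* runs the code loaded from dateutil.cmd */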

The third argument of MACROADD is either 'B' ("before") or 'A' ("after"), which indicates whether REXX will search for that particular external procedure name in the macro space either before or after it searches on disk. There would be no performance advantage of macro spaces if you didn't use 'B', so it is the default. But you might want to use 'A' to create a kind of "default" procedure which would be executed only if the name wasn't found on disk. (This is a rather risky thing to do, since your application will probably fail if an unrelated REXX procedure with this name just happens to exist on disk. But it also makes it possible to supply overriding procedures if that is desirable.)

When a procedure has been loaded into the macro space with the 'B' option, REXX finds it before it searches the disk, and before it finds functions registered in .DLL and .EXE files. This makes it possible for you to override function names that you would otherwise not have control over. However, the macro code is still executed in exactly the same way any external procedure is. Consequently, you can't pass compound variable stems to a macro space routine any more than you can any other external REXX code. And all the other considerations listed previously for calling external routines still apply. Also, you can't override the names of built-in functions, since these are considered to be internal.

In order to use a macro space library you have created at an earlier time, it takes only one call to load it:

   call macroload 'mymacros.mac'

If the macro library is one you use regularly, you should probably just load this in a STARTUP.CMD file. Unless you have a really large library, the memory usage isn't too significant, and it will be swapped out pretty soon anyhow if it's not actually used.

Although the macro space facility is a nice, little-known feature of OS/2 REXX, it has a few problems (apart from the fact there's no REXX-callable interface supplied by IBM). For one thing, the macro space is global to the whole OS/2 system. Macros loaded by one process immediately become visible to all other running processes. The same is true of any other change made to the macro space, such as deletion of routines. Of course, this isn't so different from the fact that a new REXX program added to your disk immediately becomes available to all processes as well. You just have to be careful, and you should probably establish naming conventions to avoid unintended name collisions.

Also, you can't get a list of macros that have already been loaded into the macro space. But you can query whether any particular name has been loaded. If you are especially security conscious, you might want to think about doing this to be sure a "trojan horse" REXX routine hasn't been loaded into the macro space to replace a routine you rely upon. (MACROQUERY is the REXXLIB function that provides this service.)
