Common REXX Pitfalls

By Charles Daney

REXX is billed as a user-friendly language that is as easy as possible to read and write. Part of the foundation for this claim is that REXX syntax was carefully designed to "do the right thing" in the most common usage situations, following the "principle of least surprise". Unfortunately, this objective isn't achieved 100%: sometimes surprising things happen in REXX, particularly from the point of view of people who have used other programming languages. Sometimes these surprises are actually the by-product of decisions that were intended to make REXX easier in some way or other.

Whatever the reason, we list here some of the language pitfalls that have been revealed by the experience of thousands of users over the years - things that show up again and again in customer support calls and messages in forums where REXX is discussed.

The NOP instruction
Although the NOP instruction does nothing, it has several uses.

First, it is required in the syntax of IF and SELECT statements. For instance, when the first branch of an IF statement is null, you must use NOP:

    if some_condition then nop
    else say "Condition was false."      /* this is correct */

    if some_condition then;
    else say "Condition was false."      /* this is NOT correct */

Of course, this is not a natural example, since you would ordinarily make the only branch of an IF immediately follow THEN. However, when you have nested IF statements, you sometimes need to have an ELSE clause which does nothing, just so that each remaining ELSE is matched with the intended IF.

Another case in which NOP is useful is in tracing. The specification of REXX says that during interactive tracing REXX will stop only after a statement is executed. You can use NOP to pause just before a statement:

    nop                  /* about to invoke erase command */
    'erase' name_of_something

Also, REXX will not pause at all during interactive tracing for certain types of statements, such as CALL and SIGNAL. Placing a NOP just before such a statement forces a pause.
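A minimal sketch of both uses, assuming hypothetical variables A and B and a hypothetical CLEANUP routine. In the nested IF, the ELSE NOP gives the inner IF its own ELSE, so the last ELSE pairs with the outer IF. The NOP before the CALL gives interactive tracing a clause after which it can stop before the CALL runs.

```rexx
/* Sketch: two uses of NOP (A, B, and CLEANUP are hypothetical) */
a = 1
b = 0
if a then
  if b then say 'A and B are both true'
  else nop                    /* pairs with "if b"; does nothing       */
else say 'A is false'         /* pairs with "if a", as intended        */

/* trace ?r */                /* uncomment to trace interactively      */
nop                           /* tracing pauses here, before the CALL  */
call cleanup
exit

cleanup:
  say 'Cleaning up'
  return
```

Without the ELSE NOP line, the ELSE SAY 'A is false' would pair with IF B instead.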

Uninitialized variables, quoting of literals
By far the most frequent mistake that both beginning and experienced REXX users make is neglecting to quote all literal strings. It is tempting to leave off quotes around system commands, but this easily leads to mysterious syntax errors, because characters such as * and / in an unquoted command are treated as REXX operators. Even worse, REXX will perform substitution on unquoted strings that are valid symbols. Depending on the names you use for variables you can produce some very unintended effects:

    copy = 'erase'
    ...
    copy "*.*"

There is a more general problem with not quoting literal strings even when the direct results are harmless. One of the most frequent types of programming mistakes that both beginners and experienced REXX users make is the use of uninitialized variables. Such errors can go completely undetected except for the fact that the program does not behave as expected. Since REXX provides a "default" value for an uninitialized variable, which is the (uppercase) name of the variable, a test such as if w = 42 is simply false when W has never been assigned ("W" does not equal 42). There is a very easy way to prevent errors of this kind: always put the statement

    signal on novalue

at the start of any REXX program. Then every attempted use of an uninitialized variable will cause an error (which will be "label not found" unless you actually include a handler for the NOVALUE condition). Of course, this also means that even one "innocuous" use of an unquoted literal will now be reported as an error.

So the best recommendation is: Never use unquoted literals or uninitialized variables, and always start a program with SIGNAL ON NOVALUE in order to catch uninitialized variable errors.
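A minimal sketch of a NOVALUE handler. The misspelled variable CUONT is a hypothetical typo. Without SIGNAL ON NOVALUE the IF would quietly compare the string 'CUONT' with 42 and do nothing. With the trap, CONDITION('D') reports the name of the offending variable and SIGL gives the line on which it was used.

```rexx
/* Sketch: trapping uninitialized variables with SIGNAL ON NOVALUE */
signal on novalue
count = 42
if cuont = 42 then say 'count is 42'   /* typo: CUONT was never assigned */
say 'not reached'
exit

novalue:
  say 'Uninitialized variable' condition('D') 'used at line' sigl
  exit
```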

Variable scoping
Scoping problems with variables tend to cause many subtle errors in REXX programs. "Scope" refers to the part of a program in which a given variable is "visible". Normally, a variable is visible from the point it is created until entering a subroutine that begins with PROCEDURE. Even then the variable can be added to the scope of the subroutine by naming it after EXPOSE.

If you have a program that has many nested subroutines, the typical problem is that a variable used in one of the higher routines must be exposed in all intermediate routines before it can be used in low-level routines. For example, suppose the main program sets X and calls a subroutine FIRST, which in turn calls SECOND. If FIRST begins with PROCEDURE but does not expose X, then X is not available (or rather, is not initialized) in SECOND, even if SECOND names X after EXPOSE.
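A short sketch of the situation just described, using the routine names FIRST and SECOND. Because FIRST does not expose X, the EXPOSE in SECOND reaches only FIRST's own X, which was never set.

```rexx
/* Sketch: X must be exposed at every level between its owner and its user */
x = 'set in main'
call first
exit

first: procedure            /* X is NOT exposed here ...                */
  call second
  return

second: procedure expose x  /* ... so this exposes FIRST's unset X      */
  say 'In SECOND, x =' x    /* displays "In SECOND, x = X"              */
  return
```

Changing the first label line to first: procedure expose x makes SECOND display "set in main".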

This problem most often arises when certain data variables need to be available globally throughout the whole program. However, the use of PROCEDURE statements is also a good thing, since it promotes "encapsulation" and prevents subroutines from having unintended side effects.

There are several techniques that can be used to provide "global" data easily. One is to place all such data into a single compound variable:

    glbl.screen_height = 25
    glbl.screen_width = 80
    glbl.attributes = 31

Then you need only be sure the stem GLBL. is exposed everywhere. An alternative is to list the names of all required variables in a string variable and then use a special form of EXPOSE that names that variable in parentheses. Placing the variable name in parentheses after EXPOSE tells REXX that the variable itself, and all variable names contained in its value, should be exposed.
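A sketch of both techniques side by side. The variable name GLOBALS and the two routine names are illustrative assumptions. In a deeper call chain, every intermediate routine would still need the same EXPOSE (either the stem or the parenthesized list).

```rexx
/* Sketch: two ways to share "global" data with PROCEDURE routines */
glbl.screen_height = 25
glbl.screen_width  = 80
call show_stem

globals = 'screen_height screen_width'   /* hypothetical list of names */
screen_height = 25
screen_width  = 80
call show_list
exit

show_stem: procedure expose glbl.        /* exposes every GLBL. variable */
  say 'Stem:' glbl.screen_width 'x' glbl.screen_height
  return

show_list: procedure expose (globals)    /* exposes GLOBALS and each name in it */
  say 'List:' screen_width 'x' screen_height
  return
```

Both routines display 80 x 25.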

The PARSE instruction
The PARSE instruction is a very powerful REXX feature, but it can take quite a bit of experience to use it effectively.

It's also very easy to create subtle errors which can be very hard to find if you don't know some of the details of how PARSE operates.

1. Incorrect use of WITH
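The keyword WITH is used only with PARSE VALUE. In all other forms of PARSE, WITH is not a keyword. Writing it anyway will not raise any error condition, since it is simply interpreted as a variable name in the template, and the other variables then receive different values than you intended.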
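A minimal sketch of the mistake, assuming a hypothetical input string with single blanks between words. The misplaced WITH quietly becomes the first variable of the template.

```rexx
/* Sketch: WITH is only a keyword in PARSE VALUE */
line = 'alpha beta gamma'
parse var line with first rest        /* mistake: WITH is a variable here */
say 'with='with 'first='first 'rest='rest
/* displays: with=alpha first=beta rest=gamma */

parse var line first rest             /* intended form */
say 'first='first 'rest='rest
/* displays: first=alpha rest=beta gamma */
```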

2. Unexpected blanks in parsed results
The rules of PARSE provide that the last variable to be assigned just before a literal subpattern or before the end of the whole pattern will contain all remaining characters. Frequently this includes some leading or trailing blanks. This can present subtle problems, since REXX ignores such blanks in ordinary comparisons and in numbers, but not in other contexts, such as strict comparisons, concatenation, or the LENGTH function. The most general way to deal with this is to use the STRIP function wherever extraneous leading or trailing blanks could be a problem, since STRIP removes them.

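In some cases you can use the special "." notation in a PARSE pattern to send blanks to the bit-bucket: a period placed after the last variable acts as a placeholder that receives the remainder, so the variable before it gets a single blank-delimited word with no surrounding blanks.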
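A short sketch of both remedies, assuming a hypothetical "last, first" string with a stray trailing blank. The part after the comma keeps its blanks. Ordinary = hides them, while == and LENGTH do not.

```rexx
/* Sketch: stray blanks left in the last variable of a template */
parse value 'Trout, Kilgore ' with last ',' first
say '['first']'                       /* [ Kilgore ]: blanks kept           */
say (first = 'Kilgore') (first == 'Kilgore') length(first)    /* 1 0 9      */

first = strip(first)                  /* remove leading and trailing blanks */
say '['first']'                       /* [Kilgore]                          */

parse value 'Trout, Kilgore ' with last ',' first .
say '['first']'                       /* [Kilgore]: "." absorbs the rest    */
```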

3. Mismatched arguments and PARSE pattern
One of the most common uses of PARSE is in a PARSE ARG statement that is used to access the arguments of a subroutine. Normally, arguments are passed to a subroutine as a list of values separated by commas. It is easy to forget that the commas must also be used in the PARSE ARG template. A template such as parse arg first last, written without a comma, does not work when the caller passes two arguments: only the first argument is accessed, since there are no commas separating subpatterns of the template.

In general, it would be a good rule of thumb to always use the same number of commas in the PARSE ARG statement as are used in the corresponding procedure invocation.
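A minimal sketch contrasting the two templates; the routine name SHOW_NAME is hypothetical. The comma in the template is what moves parsing on to the second argument.

```rexx
/* Sketch: PARSE ARG templates need the same commas as the call */
call show_name 'Kilgore', 'Trout'
exit

show_name: procedure
  parse arg first last                  /* wrong: reads only argument 1 */
  say 'first=['first'] last=['last']'   /* first=[Kilgore] last=[]       */
  parse arg first, last                 /* right: comma moves to arg 2  */
  say 'first=['first'] last=['last']'   /* first=[Kilgore] last=[Trout]  */
  return
```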

4. Problems accessing command-line arguments
A very common problem that is the inverse of the one just discussed occurs in receiving command line arguments. When CMD.EXE invokes a REXX program, it places the entire string following the program name into a single argument. This is true even if the string contains embedded commas.

For instance, if you enter parrot hello, world on the command line, then the parrot.cmd program should be something like this:

    /* a general parrot program */
    parse arg my_args
    say 'Polly says "'my_args'."'
    return

The output of this will be

    Polly says "hello, world."

There's one other thing to be careful of when receiving command line arguments: it's quite possible that a user will include excess blanks before or after the argument string (intentionally or otherwise). As discussed above, this can cause excess blanks to be included in the parsed variables, which may create some very subtle program errors.
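If a program does want comma-separated values from the command line, it has to split the single argument string itself. A sketch of a hypothetical parrot2.cmd that splits at the first comma with a literal pattern and strips stray blanks from each piece:

```rexx
/* parrot2.cmd (hypothetical): split the one command-line argument at a comma */
parse arg first ',' second
first  = strip(first)            /* drop stray leading/trailing blanks */
second = strip(second)
say 'first=['first'] second=['second']'
return
```

Entered as parrot2 hello, world, this should display first=[hello] second=[world].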

CALL statement syntax
Beginning REXX users frequently have trouble with the CALL instruction. The problem is that parentheses should not be used around the list of arguments. It is easy to forget this, because parentheses are required when the procedure is invoked as a function. The problem is further compounded by the fact that the following will actually work correctly:

    call subroutine ('only one argument')

The reason that this works is that 'only one argument' is an expression consisting of a literal string, and one may always place parentheses around a REXX expression. (This would also work even if there were no space between 'subroutine' and '('.)

However, this does not work:

    call subroutine ('first argument', 'second argument')

The reason it doesn't work is that two literals separated by a comma do not constitute a valid expression, and an error message will be issued to this effect.
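A sketch of the correct forms, using a hypothetical GREET routine. With CALL, the arguments follow the routine name without parentheses and any returned value lands in the special variable RESULT. As a function, the parentheses are required.

```rexx
/* Sketch: the same routine invoked with CALL and as a function */
call greet 'Hello', 'world'      /* CALL: no parentheses; value goes to RESULT */
say result                       /* Hello, world                               */
say greet('Hello', 'world')      /* function call: parentheses required        */
/* call greet ('Hello', 'world')    fails, as described above                  */
exit

greet: procedure
  parse arg word1, word2
  return word1',' word2
```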

Nested comments
REXX allows nesting of comments. This is a very convenient feature, which is not present in some similar languages, such as C. The main reason it is useful is that it allows you to easily "remove" code temporarily from your program by enclosing it in comment delimiters, even if that code already contains comments of its own. Comments can be nested to any arbitrary depth. There is a subtle danger in this, however. While REXX is scanning for comments within comments it looks only for "*/" and "/*". In particular, it ignores the possibility that these character sequences may occur in quoted strings. If the code you enclose contains either /* or */ inside a literal string, then a mismatched comment error will probably result:

    /* This isn't going to work...
    say 'Watch out for the use of "/*".'
    */

What will happen here is that REXX will assume that a second level of comment nesting has occurred, and probably the entire remainder of the program will be treated as a comment. Surprise!
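One defensive habit, offered here as a suggestion rather than a standard idiom: build comment delimiters from two pieces, so the source never contains the two-character sequence and the code can later be commented out safely. The variable name OPEN_COMMENT is hypothetical.

```rexx
/* Sketch: keep comment delimiters out of string literals, so the code
   can later be commented out safely. */
open_comment = '/' || '*'
say 'Watch out for the use of "'open_comment'".'

/*
say 'Watch out for the use of "'open_comment'".'
*/
```

The first SAY displays Watch out for the use of "/*". and the commented-out copy no longer confuses the comment scanner.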

Compound variables
There are certain subtleties in the use of the REXX compound variables. These lead to a variety of common problems.

1. Case sensitivity in compound variable tails
Whenever you refer to a compound variable in a program, REXX automatically interprets the symbol as if it were written in upper case. Therefore,

    Country.Tuesday = 'Belgium'

actually assigns a variable whose name is COUNTRY.TUESDAY, provided TUESDAY is not itself the name of a variable. What's actually happening is that the stem, COUNTRY., is automatically taken as upper case, and the tail contains just one part. REXX looks for a simple variable called TUESDAY (also upper case), and if none has been initialized, the default initial value, which is TUESDAY, is substituted.

There is, however, an important distinction between the name of a compound variable, and the symbol which is used to refer to it. This distinction often causes problems, particularly related to case. For instance, if you had the following:

    day = 'Tuesday'
    say "If it's" day", this must be" Country.day"."

then, assuming the preceding assignment, what would be displayed is

    If it's Tuesday, this must be COUNTRY.Tuesday.

That is because the variables COUNTRY.TUESDAY and COUNTRY.Tuesday are distinct (though the symbols are not, as far as REXX is concerned).
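A minimal sketch of one fix: normalize the case of the value before using it in a tail. TRANSLATE with a single argument returns its argument in upper case, so the derived name matches the one created by the original assignment.

```rexx
/* Sketch: the tail's value, not the symbol, decides which variable is used */
Country.Tuesday = 'Belgium'      /* stored as COUNTRY.TUESDAY             */
day = 'Tuesday'
say Country.day                  /* COUNTRY.Tuesday: a different variable */
day = translate(day)             /* upper-case the value: 'TUESDAY'       */
say Country.day                  /* Belgium                               */
```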

2. Inability to have 'constant' values in tails
One would often like to create record-like data structures using compound variables, in order to obtain the same effect as one has with structures in C, PL/I, and other languages. For instance, it would be nice to represent personnel records using variables like

    person.age.name
    person.ssn.name
    person.salary.name

Unfortunately, all parts of the tail of the compound variable are subject to substitution. For instance, consider

    name = 'Kilgore Trout'
    say "SSN of" name "is" person.ssn.name

This will work OK as long as SSN is not used as a REXX variable, because REXX will use the uninitialized value, which is "SSN". However, if SSN is ever used as a variable, intentionally or otherwise, this will probably break. Even if this doesn't happen, this usage has an adverse performance impact, since REXX has to do a full variable look-up in order to discover that SSN is uninitialized.
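A sketch of how the breakage shows up. The stored value is a placeholder, and the later assignment to SSN stands in for any unrelated use of that name elsewhere in a program.

```rexx
/* Sketch: a "constant" tail part stops being constant once SSN is assigned */
name = 'Kilgore Trout'
person.ssn.name = '000-00-0000'   /* placeholder; tail is SSN.Kilgore Trout   */
say person.ssn.name               /* 000-00-0000                              */
ssn = 1                           /* SSN used as a variable somewhere else    */
say person.ssn.name               /* PERSON.1.Kilgore Trout: another variable */
```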

There aren't any fully satisfactory solutions to this problem, either. You cannot simply write person.'ssn'.name since that evaluates to the concatenation of the stem value person., the literal 'ssn', and the symbol .name.

One thing you can do is to be very careful not to use SSN as a variable in your program, but it's obviously very easy to forget about this restriction. Another thing you could do is to use a scratch variable to contain the exact value that you need:

    x = 'SSN'
    say "SSN of" name "is" person.x.name

Obviously, this is a lot of extra work, and you also have to be careful to use the same case in the literal every time:

    x1 = 'ssn'
    x2 = 'SSN'
    say person.x1.name "is not necessarily the same as",
        person.x2.name

because case is significant in the evaluated form of a tail.

Another possibility is to use a symbol which can't possibly be evaluated as a variable:

    name = 'Kilgore Trout'
    say "SSN of" name "is" person.0ssn.name

This works because REXX will always take 0ssn as a constant (with the value 0SSN) and not try to evaluate it, since variable names can't start with digits. But it is obviously not very aesthetically pleasing.

One remaining possibility is to not try to use "multidimensional" compound variables, and instead adopt a naming convention like this:

    person_age.name
    person_ssn.name
    person_salary.name

There's a lot to be said for this approach in terms of readability, performance, and relative immunity to the problems we've been discussing. But it does make it more difficult to deal with the compound variable as a whole, for instance if you need to DROP the whole data structure. In this case, you would have to drop each distinct stem (PERSON_AGE., PERSON_SSN., and so on), instead of just the single stem PERSON.

3. Inability to have expressions in tails
It is very tempting to think of REXX compound variables as if they were just like arrays in other languages. Unfortunately, this is not quite possible. One reason is that REXX does not allow arbitrary expressions in "array" subscripts. For instance,

    i = 10
    j = 20
    say "Value =" array.(i+j)

does not display the value of array.30. Instead, it tries to call a function called array., which will probably fail because the function does not exist. The reason is that in REXX, a symbol (which array. is) immediately followed by a left parenthesis is considered to be a function reference.

The only way you can use a "computed subscript" is to assign the value to a temporary variable:

    x = i + j
    say "Value =" array.x

A similar problem arises when you want to use one compound variable as a "subscript" in another. Suppose, for instance, that you use the stem book. to contain the index (subscript) of a data item related to books. Book. itself will be indexed by the name of a book. This is known as an associative array, since data can be retrieved by "associations". It is commonly used in advanced REXX programming, and it is one of the most powerful features of the language.

For performance reasons, it is desirable to store a multi-column table of book-related information in a number of compound variables that have a numeric index instead of being indexed by a string. (This is faster and requires less storage space, since the index string doesn't need to be stored multiple times internally.) In a typical record of book information, book. maps the exact title to a row number, and each other column, such as author., is a separate stem indexed by that row number. The reason that the actual book title has been used to index the book. array is that (we assume) there is some need to retrieve book information by the exact title. However, we avoid the overhead of using long strings as indices in every column of the table by keeping a row number and using that as the index.
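A sketch of how one such row might be stored. The stems NAME. and PUBLISHED. are illustrative assumptions; only book. and author. come from the discussion here. The title appears as an index exactly once, in book.; every other column uses the short row number.

```rexx
/* Sketch: one row of book information (stem names are illustrative) */
row   = 1
title = 'Lolita'
book.title    = row                  /* BOOK.Lolita = 1: row for exact title */
name.row      = title                /* NAME.1: the title, for listings      */
author.row    = 'Vladimir Nabokov'   /* AUTHOR.1                             */
published.row = 1955                 /* PUBLISHED.1                          */

index = book.title
say author.index 'wrote' name.index 'in' published.index'.'
```

This should display Vladimir Nabokov wrote Lolita in 1955.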

When the time comes to retrieve some information, such as the author of a given book, we do it like this:

    title = "Lolita"
    index = book.title
    say "The author of" title "is" author.index"."

It would not have worked to do this:

    title = "Lolita"
    say "The author of" title "is" author.book.title"."

The reason is that REXX tries to substitute values for book and title separately and independently. Unless book has been assigned some value, it will be evaluated as BOOK and the resulting tail will be BOOK.Lolita.

4. Inability to deal with cross sections of a compound variable
Consider a possible database for a garden club, in which each member's information is kept in compound variables whose tails begin with the member's name. If an individual leaves the club, there should be a way to remove all related information. One is tempted to do this:

    name = "Susy Flor"
    drop address.name. specialty.name.

But this won't work as intended. While it will not cause a REXX error, all it will do is try to drop REXX variables called ADDRESS.Susy Flor. and SPECIALTY.Susy Flor.. This is because address.name. and specialty.name. are not valid REXX stems: a stem has a single period, at its end, so these are simply compound symbols whose last tail part is empty.

There isn't any simple way to do the intended thing in REXX. All you can do is write loops to drop each variable individually.
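A sketch of such a loop. The record layout is a hypothetical assumption: a count in specialty.name.0, numbered items after it, and one address field. The program has to know every tail it created, because REXX offers no way to enumerate them.

```rexx
/* Sketch: remove one member by dropping each variable explicitly.
   The layout (address.name.city, specialty.name.0 = count) is hypothetical. */
name = 'Susy Flor'
address.name.city = 'Springfield'
specialty.name.0 = 2
specialty.name.1 = 'roses'
specialty.name.2 = 'irises'

do i = 1 to specialty.name.0       /* drop each numbered item */
  drop specialty.name.i
end
drop specialty.name.0 address.name.city

say specialty.name.1               /* SPECIALTY.Susy Flor.1: no longer set */
```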

Strict vs. non-strict comparison
Comparison in REXX with the = operator is "non-strict". This means that REXX will attempt to determine whether both operands are numeric values, in which case a numeric comparison will be done. In addition, even if a character-string comparison is done, leading or trailing blanks on both operands are ignored. This is frequently helpful when dealing with user input, since extra blanks may well be present. (See The PARSE instruction.) Such "non-strict" comparison rules in fact apply with any comparison operator, such as <, >, <=, \=, etc.

There is another type of comparison operation which is "strict". That is, the operation treats the data only as character strings rather than possibly as numbers. Because of this, strict comparisons can be a little faster. Furthermore, leading and trailing blanks are not ignored in a strict comparison. Strict comparison operators are written as ==, \==, <<, >>, <<=, and so forth.

The choice of the proper types of comparison to use in any given case can be somewhat confusing. The non-strict comparison operations are used most commonly, e.g.:

    a = '3'
    b = '3.0'
    c = '3e0'
    d = ' 3 '
    say (a = b)',' (a = c)',' (a = d)    /* displays "1, 1, 1" */

Here, all comparisons yield a value of "1" (true), because all of the strings are equivalent forms of the number 3. This is probably the intended result. If instead strict comparison (==) were used in the SAY statement, then the result would be "0, 0, 0", because all the strings are distinct as character strings.

However, there are pitfalls hidden in the convenience of the non-strict comparison operators. One of the most insidious is a result of the fact that numbers can be represented in exponential notation with an embedded "e", for instance 3e0. But sometimes non-numeric data may contain such strings naturally, such as with hexadecimal values. Consider:

    say '3e0' < '300'    /* gives "1", since 3 < 300 */

If the data in this example were intended to represent either character strings or hex numbers, then the program would fail, because the correct result in that case should be "0" ('3e0' comes after '300' in the ASCII collating sequence). Strict comparison (<<) should probably have been used here.

There is yet another problem when you are working with character strings which you want to treat as strings but which may be interpreted as numbers. Perhaps they are database keys. If these strings are longer than the current NUMERIC DIGITS setting (normally 9 digits), then you can get very surprising results:

    say '1234567890' = '1234567891'    /* gives "1" */

This gives what is probably the wrong answer for most purposes, because the strings are interpreted as numbers, and by the definition of REXX arithmetic comparison, they are equal, since only 9 significant digits are considered in the comparison.
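A sketch of the two remedies discussed in this section. Strict operators compare the data purely as character strings. Raising NUMERIC DIGITS is an alternative when the values really are numbers. The variable names are illustrative.

```rexx
/* Sketch: choosing strict comparison when the data are strings, not numbers */
a = '3e0'
b = '300'
say (a < b) (a << b)          /* 1 0: numeric compare, then character compare */

k1 = '1234567890'
k2 = '1234567891'
say (k1 == k2)                /* 0: strict comparison sees different strings  */
numeric digits 20             /* alternative: enough digits for exact compare */
say (k1 = k2)                 /* 0                                            */
```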

Line continuation
When a REXX clause is not complete on one line, it is necessary to indicate this with a comma, which is the continuation character. Normally this is a convenience, since most statements do not need to be continued, and therefore the end of the line can be taken as the end of the statement, which makes it unnecessary to include a semicolon after each statement.

However, commas are also used to separate arguments in a procedure call and sub-templates in a parse pattern. It's easy to forget to add an extra comma when one of these statements is continued:

    say max(3, 4, 5,
            6, 7, 8, 9)

This statement will produce an error 40 (incorrect call to routine), since the ending comma on the first line is taken to be a continuation character. The result is the same as if you had coded

    say max(3, 4, 5 6, 7, 8, 9)

since the comma is replaced with one blank when the second line is concatenated to the first, and "5 6" isn't a valid number because of the embedded blank. This example should have been written

    say max(3, 4, 5,,
            6, 7, 8, 9)

Condition handling
A REXX program can trap certain exceptional conditions by using a SIGNAL ON or CALL ON statement. For instance, SIGNAL ON HALT lets you trap the user's pressing of the Ctrl-Break (or Ctrl-C) key, which would otherwise interrupt the program. However, there can be problems with this in a more complex program. Perhaps one of the choices you want to offer the user is the option of leaving the current operation or computation and returning to an initial prompt in the program. What you may want to do is have the HALT handler SIGNAL back to a label at that prompt. The problem here is that this most likely will not work right if the HALT condition occurs while a subprocedure is being executed, because as far as REXX is concerned, the program is still executing the subprocedure. The only way to get out of a subprocedure is a RETURN (or EXIT) statement; SIGNAL doesn't do it. The program might appear to work correctly, but it will be subject to various kinds of errors. For instance, important variables might not be exposed if the subprocedure started with a PROCEDURE statement. Unfortunately, there is no way to solve this kind of problem as REXX is currently defined.

There is another sort of problem which can occur, but which is very easy to fix. As soon as the HALT condition is raised by the user pressing Ctrl-Break, further trapping of this condition is disabled. You will need to explicitly execute SIGNAL ON again in order to re-enable the handler. This can be done by putting another signal on halt instruction immediately after the halt: label.
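A sketch of the pattern described in this section, limited to a main program with no subroutines (the only case where it behaves reliably). The label PROMPT and the "double a number" task are hypothetical. The handler re-arms the trap before jumping back.

```rexx
/* Sketch: returning to a prompt after Ctrl-Break. This is only safe when the
   HALT condition is raised in the main program, not inside a subroutine. */
signal on halt
prompt:
  do forever
    say 'Enter a number (or QUIT):'
    parse pull reply
    if reply = '' | translate(reply) = 'QUIT' then exit
    if datatype(reply, 'N') then say 'Twice that is' reply * 2
  end

halt:
  signal on halt                /* re-arm: the trap was disabled when it fired */
  say 'Interrupted; back to the prompt.'
  signal prompt
```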

When to use explicit concatenation
The concatenation operation in REXX is normally implicit, but it can be requested explicitly with the || operator.

There are times when || needs to be used explicitly. Normally REXX code is fairly free-format; that is, blanks (or their absence) are not important. However, there are a few important exceptions:
 * If one or more blanks occur between symbols, literals, or parenthesized expressions, then "blank concatenation" is implied, rather than "abuttal concatenation".
 * If a symbol or a literal is followed immediately (without intervening blanks) by a left parenthesis, then REXX treats it as a function reference.
 * If a literal is followed immediately by the letter 'X' (upper or lower case), then REXX treats it as a hex string (even if the string is not a valid hex string, in which case an error results).
 * If a literal is followed immediately by the letter 'B' (upper or lower case), then REXX treats it as a bit string (even if the string is not a valid bit string, in which case an error results).

You can wind up with unintended function call references if you forget to write a concatenation operator in certain expressions, e.g.:

    say "Number of observations--"(alpha + beta)

In REXX any quoted string is a "token", which can refer to a function to be called if it is followed immediately (without intervening blanks) by a left parenthesis. That is what would happen here, even though the string in question doesn't look at all like a function name. (And the corresponding function almost certainly will not be found.) This should have been written

    say "Number of observations--"||(alpha + beta)

An even more insidious problem can occur if you use variables called X or B. When the letters X or B occur by themselves immediately following a quoted string, they cause the string to be interpreted as a hex or binary literal, even if its syntax is not proper for such a literal. So you will get error 15 (invalid hexadecimal or binary string) instead of what you wanted from

    x = 100
    y = 50
    say "Sum="x + y

This should have been written as either of the following:

    say "Sum="||x + y
    say "Sum=" x + y

Or perhaps it would be better just to avoid the use of variables named X or B altogether.

Uppercasing by ARG and PULL
The ARG instruction is provided as an abbreviation of PARSE UPPER ARG. Likewise PULL is an abbreviation for PARSE UPPER PULL. Although both of these instructions are frequently required in REXX programs, the specification of UPPER, which causes automatic uppercase conversion of strings, is somewhat unfortunate.

The reason this is done is to make string handling somewhat case insensitive. That is, if all strings can be treated as upper case, it is not necessary to deal separately with equivalent lower or mixed case strings. This can be a great convenience. But there are several drawbacks as well. In the first place, most PC users are more accustomed to dealing with strings in lower case (less frequent need to use the shift key).

But more significantly, the automatic upper casing by ARG and PULL can be unexpected and a source of subtle bugs. ARG in particular may be used heavily, so it's more often a problem. You may simply forget that it does upper casing and try to do comparisons using lower or mixed case strings against variables set by ARG.

An even more serious problem may occur any time you pass binary data to a subroutine. Binary data may be read from a file, for example, or be produced by the D2C or X2C functions, or be encoded using hex string literals. Any characters in that data that happen accidentally to be lower case alphabetic letters will be converted unexpectedly by ARG!
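A minimal sketch of the difference, assuming an ASCII system (where '6162'x is the two bytes "ab"). The routine name SHOW is hypothetical. PARSE ARG leaves both the text and the binary bytes untouched, while ARG alters them.

```rexx
/* Sketch: ARG upper-cases everything, including binary data (ASCII assumed) */
call show 'abc', '6162'x       /* '6162'x is the two bytes "ab" in ASCII */
exit

show: procedure
  arg text, bytes              /* same as PARSE UPPER ARG          */
  say text c2x(bytes)          /* ABC 4142: both were upper-cased  */
  parse arg text, bytes        /* case and data preserved          */
  say text c2x(bytes)          /* abc 6162                         */
  return
```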

Case sensitivity of labels
Normally a REXX program itself is mostly case insensitive. That is, you can usually write keywords, variable names, and labels in upper, lower, or mixed case, and it doesn't matter. REXX generally treats such things as if they were always upper case.

The SIGNAL instruction can be used to implement a kind of computed GOTO, though there are problems with this usage, since SIGNAL is also defined to terminate any open DO groups. Nevertheless, it can be a useful technique. It can, for example, be much more efficient than a SELECT statement that contains a large number of WHEN clauses: a program can have labels such as CASE1, CASE2, and so on, and jump to one of them with SIGNAL VALUE. But watch out for alphabetic case sensitivity! The value of the expression in the SIGNAL instruction must match the alphabetic case of the labels in the program exactly, and REXX always takes the labels themselves to be upper case. If such a program used signal value 'case' || x then an error 16 (label not found) would occur, since REXX would look for labels like case1, case2, etc., even though the actual labels in the program are CASE1, CASE2, etc. (in spite of how they are written!).
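A minimal sketch of a computed SIGNAL. The selector is named N rather than X, because 'CASE'x written without the || operator would be read as a hex string (see When to use explicit concatenation). The target is built in upper case to match the labels.

```rexx
/* Sketch: a computed SIGNAL; the target must match the upper-case labels */
n = 2
signal value 'CASE' || n       /* jumps to CASE2; 'case' || n would fail with error 16 */
case1: say 'one';   exit
case2: say 'two';   exit
case3: say 'three'; exit
```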

Null strings vs. omitted strings
In calls to REXX routines, including the main program, arguments can always be omitted as far as REXX is concerned. That is, the omission of an argument will not cause a REXX error (except for built-in functions, which often have required arguments).

There is an official technique using the ARG built-in function for determining in a subroutine whether an argument has been omitted: ARG(n, 'O') returns 1 if the nth argument was omitted, and ARG(n, 'E') returns 1 if it exists. (Note that there isn't any way to raise error 40 (incorrect call to routine) automatically as is done for built-in functions.) Most REXX programmers, however, tend to use the short-cut of testing an argument for a null string, since REXX supplies a null string whenever one attempts to refer to a missing argument. This is not really a very good approach. In the first place, such a test usually uses ordinary comparison (=), so if a string consisting of multiple blanks had been passed (which might be meaningful), it would still be treated as if it had been omitted. (See Strict vs. non-strict comparison.) But even if a strict comparison to the null string had been done, it would still complain if an explicit null string had been passed. Yet there might be valid reasons to accept a null string as an argument, but not the complete omission of the argument.
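A sketch comparing the three tests on an omitted argument, an explicit null string, and a string of two blanks. The routine name CHECK is hypothetical. Only ARG(2, 'E') tells the first two cases apart.

```rexx
/* Sketch: ARG(n, 'E') versus testing for a null string */
call check 'a'                 /* omitted:      exists: 0   =: 1   ==: 1 */
call check 'a', ''             /* null string:  exists: 1   =: 1   ==: 1 */
call check 'a', '  '           /* two blanks:   exists: 1   =: 1   ==: 0 */
exit

check: procedure
  parse arg first, second
  say 'exists:' arg(2, 'E') '  =:' (second = '') '  ==:' (second == '')
  return
```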

A different form of this problem can occur if you just blindly pass an argument to a built-in function, for instance in a routine that uses parse arg first, second and then computes max(first, second). Such a routine will fail with error 40 in the call to MAX in case either argument has been omitted. This is because PARSE ARG will set first or second to a null string if the corresponding argument is omitted. But a null string is not the same thing as an omitted argument, and MAX will fail if you pass it a null string, though not if you simply omit the second argument.
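A sketch of a guarded version, using a hypothetical function BIGGER. It checks ARG(2, 'O') before calling MAX, so an omitted second argument never reaches MAX as a null string. Handling an omitted first argument is left out for brevity.

```rexx
/* Sketch: don't pass a possibly omitted argument straight to MAX */
say bigger(3, 7)               /* 7                          */
say bigger(3)                  /* 3: second argument omitted */
exit

bigger: procedure
  parse arg first, second
  /* max(first, second) would raise error 40 when SECOND is a null string */
  if arg(2, 'O') then return first
  return max(first, second)
```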

Distinction of commands and functions
The term "command" when used in a discussion of REXX is somewhat ambiguous. There are about 25 native "commands" in REXX, but the preferred term is "instruction" or "keyword instruction". Among these are ADDRESS, CALL, DO, EXIT, IF, PARSE, PULL, PUSH, QUEUE, RETURN, SAY, and SELECT. When such keywords are used in a REXX program, they are never quoted.

There is another class of REXX statements that consists of commands to an external command processor. Usually in OS/2 this is CMD.EXE, or else a REXX-enabled application. Such commands usually start with a keyword and this may be followed by one or more parameters. In order to avoid certain common problems (see Uninitialized variables, quoting of literals) it is advisable to always enclose command keywords (and perhaps the entire command) in single or double quotes:

    'copy config.sys config.bak'    /* entire command quoted */

    parse arg from to
    'copy' from to                  /* only command name quoted */

Certain system commands handled by CMD.EXE, such as CALL, IF, and EXIT, have the same names as REXX keyword instructions. These must appear inside quotes if they are not to be treated as REXX instructions.

There is another kind of REXX service that is provided by "built-in functions". There are about 65 such functions, which include SUBSTR, POS, LINEIN, LINEOUT, MAX, VALUE, etc. All of these are technically functions in the REXX sense, which means that they return a value. However, some of them are frequently used for the "side effects" they produce rather than for their value - LINEOUT, CHAROUT, STREAM, and VALUE, for instance. Many other useful services are available through functions in external function packages such as REXXUTIL.

A very common error is to invoke such a function by writing it as a function call on a line by itself, as one does in a language like C. What actually happens then is that the value returned by the function (often "0" or "1") is passed as a command to the default command environment, which will try to execute it. Usually the result will be a SYS1041 error message stating that the command is not a recognized internal or external command, operable program, or batch file.

The proper way to invoke such a function is either by using it on the right hand side of an assignment statement, or else invoking it with a REXX CALL instruction. Note that using CALL means parentheses around the argument list should be omitted. (See CALL statement syntax.)

    value('path', newpath, 'os2environment')       /* wrong */
    x = value('path', newpath, 'os2environment')   /* right */
    call value 'path', newpath, 'os2environment'   /* right */