Arrays - sequential & associative

By Richard K. Goran

REXX has a powerful and unique array capability. Arrays are usually groups of variables that contain related table or list data which may be referenced with a common variable name plus a subscript or index to indicate which element of the list is being referred to. I prefer the term subscript over index since, to me, subscript implies a base of one and index implies a base of zero.

An example of a simple, or single-dimensional, array would be a list in which the name of each month is assigned to the same variable name modified by a subscript in the range of 1 to 12. If "March" is assigned to month.3, then after assigning the value 3 to sub, the value of month.sub would be "March". All of the elements in a stem may be initialized by assigning a value to the stem name. For example, month. = '' would assign a null value to all elements of the stem. However, a value may not be assigned to a group of elements below the stem level.
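A minimal sketch of the month example follows. Only three elements are filled in, and the use of element 0 to hold the count is a common REXX convention assumed here for illustration, not something the language requires.

```rexx
/* Sketch: a sequential array of month names (illustrative only) */
month. = ''                 /* every element of the stem starts as a null string */
month.1 = 'January'
month.2 = 'February'
month.3 = 'March'
month.0 = 3                 /* assumed convention: element 0 holds the count */

sub = 3
say month.sub               /* displays March */
say '[' || month.7 || ']'   /* displays [] because month.7 was never assigned */

do sub = 1 to month.0       /* walk the elements in subscript order */
   say sub month.sub
end
```

Assigning to the stem first means that a reference to an element that was never set, such as month.7, returns the null string rather than the name of the variable.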

Arrays need not be declared and are simply referenced with compound or stem variables. A compound variable uses periods (.) to separate its component parts: the stem (all characters up to, and including, the first period) and the tail (all characters following the first period). The tail consists of one or more symbols separated by periods. The stem portion of the name is translated to upper case like any simple variable name. The values substituted for the symbols in the tail, however, are used exactly as they are, with their case preserved.

Prior to the compound variable name (stem and tail) being used, each individual symbol used in the variable name is replaced with its respective value and the resulting "derived" name is used as the name of the variable. There is no limit to the number of symbols (variables and constants) which may be used in a compound variable name other than the overall limit of 250 characters in the derived variable name.
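The substitution described above can be sketched as follows. The stem cell. and the variables row, col, r and c are hypothetical names chosen for this illustration.

```rexx
/* Sketch: how a compound name is derived (all names are illustrative) */
row = 2
col = 'b'
cell.row.col = 'second row, column b'   /* derived name is CELL.2.b */
cell.2.col   = cell.row.col || '!'      /* same element, using the constant 2 */

r = 2
c = 'b'
say cell.r.c                            /* displays second row, column b! */
```

Because the derived name is built from the values of the tail symbols, cell.row.col, cell.2.col and cell.r.c all refer to the same element here.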

Some confusion may arise, particularly for REXX programmers from the mainframe world who have been accustomed to using all uppercase variable names, when using mixed case variable names within a compound variable. The confusion occurs because OS/2 REXX translates a symbol to upper case when it is used as a variable name, but uses the value substituted for a symbol in the tail of a compound variable exactly as it is, without any case translation.

Figure 1 contains an example of what Charles Daney of Quercus Systems (Personal REXX) likes to call a "REXX gotcha".

Figure 1 - REXX Gotcha

    /* 9412FG01.CMD - Illustrate tail variables */
    /* Another classic REXX gotcha. */
    x = 'blue'
    bg. = ''
    bg.x = 16
    /* Then */
    say bg.x       /* 16 */
    say bg.blue    /* null string */
    say bg.'blue'  /* 'blue' */
    /* is correct. */

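In bg.blue, REXX is looking for the variable BG.BLUE: blue is an uninitialized symbol, so its value is its own name in upper case. The tail is case sensitive, which makes BG.BLUE different from the element assigned through bg.x, whose tail is the lowercase string blue. There is a distinction between the name of the variable and the symbol that refers to it. bg.'blue' is the concatenation of the value of the stem bg. (the null string) and the literal 'blue'.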

Associative Arrays
Unlike many other programming languages, REXX does not require the values assigned to the variables used in the tail of a compound variable to be numeric. In fact, these variables may contain any value, without exception. This lends itself to the concept of associative storage or arrays.

Using the example of a month table, we could build an associative array that would look like Figure 2.

Figure 2 - Associative array contents

    Name=MONTH.1,  Value='Jan'    Name=INDEX.Jan, Value='1'
    Name=MONTH.2,  Value='Feb'    Name=INDEX.Feb, Value='2'
    Name=MONTH.3,  Value='Mar'    Name=INDEX.Mar, Value='3'
    Name=MONTH.4,  Value='Apr'    Name=INDEX.Apr, Value='4'
    Name=MONTH.5,  Value='May'    Name=INDEX.May, Value='5'
    Name=MONTH.6,  Value='Jun'    Name=INDEX.Jun, Value='6'
    Name=MONTH.7,  Value='Jul'    Name=INDEX.Jul, Value='7'
    Name=MONTH.8,  Value='Aug'    Name=INDEX.Aug, Value='8'
    Name=MONTH.9,  Value='Sep'    Name=INDEX.Sep, Value='9'
    Name=MONTH.10, Value='Oct'    Name=INDEX.Oct, Value='10'
    Name=MONTH.11, Value='Nov'    Name=INDEX.Nov, Value='11'
    Name=MONTH.12, Value='Dec'    Name=INDEX.Dec, Value='12'

As you can see, we can reference the sequential array to obtain the month abbreviation or we can reference the associative array using the value of the sequential array to obtain the subscript of the sequential array.
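One way the two arrays in Figure 2 might be built is sketched below. The variables names, abbr and key are hypothetical, and initializing index. to zero is an assumption added here so that an unknown key returns a recognizable value.

```rexx
/* Sketch: build the two arrays shown in Figure 2 */
names = 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'
index. = 0                        /* assumed default: 0 means "not a month" */
do i = 1 to words(names)
   abbr = word(names, i)
   month.i = abbr                 /* sequential:  MONTH.3   = 'Mar' */
   index.abbr = i                 /* associative: INDEX.Mar = 3     */
end
month.0 = words(names)            /* assumed convention: element 0 holds the count */

say month.3                       /* displays Mar */
key = 'Mar'
say index.key                     /* displays 3 */
key = 'MAR'
say index.key                     /* displays 0 because tails are case sensitive */
```

Note that the lookup goes through the variable key. Writing index.Mar directly in the program would look for the tail MAR, which is the gotcha shown in Figure 1.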

When an array element is referenced with a non-sequential tail, REXX finds the appropriate string with a tightly written search function. This results in much better performance than could be obtained at the REXX source level.

How many times have you wanted to compare two lists and process the like or unlike entries? For example, have you ever had a need to compare the contents of two directories? More than likely you have either written merge logic in your program or read one directory's entries into an array of some sort and then processed the second directory against the array created from the first directory. If either of the directories or lists is unsorted, this can lead to a long-running program because of the search time that results from the repetitive scanning of the array built from the first list.

If we initialize all of the elements of the associative array with a recognizable value that would not conflict with the sequential number of the list entries, such as zero, then we can test for the presence of a file name in the array just by using its name as the subscript. Listing 1, available via anonymous FTP (9412ls01.cmd), is a working example that will compare two directories and display the names of the non-matching entries.
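Before turning to the full listing, the membership test can be sketched on two small word lists. The lists and the stem seen. are made up for this illustration and stand in for the directory contents.

```rexx
/* Sketch: test list membership with an associative array */
first  = 'config.sys autoexec.bat startup.cmd'
second = 'startup.cmd readme.txt config.sys'

seen. = 0                          /* 0 marks "not in the first list" */
do i = 1 to words(first)
   name = word(first, i)
   seen.name = i                   /* remember the position in the first list */
end

do i = 1 to words(second)
   name = word(second, i)
   if seen.name = 0 then           /* never stored, so still the default */
      say name 'is only in the second list'
end
```

With these two lists the only line displayed is the one for readme.txt. Each name in the second list is checked with a single reference to seen.name, so the first list is never scanned.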

Listing 1

    /*------------------------------------------------------------------------*\
    |                                                                          |
    | 9412LS01.CMD (rxls02) - Compare two directories using associative arrays |
    |                                                                          |
    \*------------------------------------------------------------------------*/
    call RxFuncAdd 'SysLoadFuncs', 'REXXUTIL', 'SysLoadFuncs'        /* 0006 */
    call SysLoadFuncs                                                /* 0007 */
                                                                     /* 0008 */
    directory_1 = 'd:\*.*'                                           /* 0009 */
    directory_2 = 'e:\*.*'                                           /* 0010 */
                                                                     /* 0011 */
    /*-----------------------*\                                      /* 0012 */
    |  Process 1st directory  |                                      /* 0013 */
    \*-----------------------*/                                      /* 0014 */
    index_1. = 0                       /* initialize all elements */ /* 0015 */
    call SysFileTree directory_1, 'tree_1', 'FO'                     /* 0016 */
    do t1 = 1 to tree_1.0                                            /* 0017 */
       subscript = FILESPEC( 'P', tree_1.t1 ) ||,                    /* 0018 */
                   FILESPEC( 'N', tree_1.t1 )                        /* 0019 */
       index_1.subscript = t1                                        /* 0020 */
    end                                                              /* 0021 */
                                                                     /* 0022 */
    /*-------------------------------*\                              /* 0023 */
    |  Process 2nd directory vs. 1st  |                              /* 0024 */
    \*-------------------------------*/                              /* 0025 */
    index_2. = 0                       /* initialize all elements */ /* 0026 */
    call SysFileTree directory_2, 'tree_2', 'FO'                     /* 0027 */
    do t2 = 1 to tree_2.0                                            /* 0028 */
       subscript = FILESPEC( 'P', tree_2.t2 ) ||,                    /* 0029 */
                   FILESPEC( 'N', tree_2.t2 )                        /* 0030 */
       index_2.subscript = t2                                        /* 0031 */
       if index_1.subscript > 0 then                                 /* 0032 */
          do                                                         /* 0033 */
             index_1.subscript = -1    /* indicate matching entry */ /* 0034 */
             iterate                                                 /* 0035 */
          end                                                        /* 0036 */
       say directory_1         ||,                                   /* 0037 */
           ' does not contain ' ||,                                  /* 0038 */
           subscript                                                 /* 0039 */
    end                                                              /* 0040 */
                                                                     /* 0041 */
    /*--------------------------------------------*\                 /* 0042 */
    |  Process unmatched entries in 1st directory  |                 /* 0043 */
    \*--------------------------------------------*/                 /* 0044 */
    do t1 = 1 to tree_1.0                                            /* 0045 */
       subscript = FILESPEC( 'P', tree_1.t1 ) ||,                    /* 0046 */
                   FILESPEC( 'N', tree_1.t1 )                        /* 0047 */
       if index_1.subscript > 0 then                                 /* 0048 */
          do                                                         /* 0049 */
             say directory_2         ||,                             /* 0050 */
                 ' does not contain ' ||,                            /* 0051 */
                 subscript                                           /* 0052 */
          end                                                        /* 0053 */
    end                                                              /* 0054 */
                                                                     /* 0055 */
    exit                                                             /* 0056 */

Lines 15 - 21 initialize the associative array for the first directory, index_1, to zero and then build both the sequential array and the associative array for the directory.

Lines 26 - 40 do the same for the second directory and also write a message to the screen for each file that does not occur in the first directory (the corresponding index entry does not have a value greater than zero). Also, if the directory_2 entry is found to exist in directory_1, the index for the directory_1 entry is set to minus one to distinguish it from the positive value it had when the index was built.

Finally, lines 45 - 54 list any entries in the first directory that do not exist in the second directory (the index value is neither 0 nor -1).

As you can see in Listing 1, only three loops, each a single pass over one list, are necessary to compare both directories against each other, regardless of the number of items in each directory. This provides significant savings over techniques that rescan an array for every entry when you are dealing with unsorted lists.