Parse this!

From EDM2
Revision as of 22:46, 7 March 2018 by Ak120 (Talk | contribs)

By Richard K. Goran

Variations of the Parse Instruction

Probably the single most powerful feature of REXX is its string parsing ability. Parsing means breaking a string down into its component parts. That can be as simple as separating a sentence into individual words or as complex as extracting parts of a string in varying order. The PARSE instruction comes with many different options, but its most impressive part, and the most difficult for many users to grasp, is the use of templates. Table 1 contains the seven variations of the PARSE instruction. Table 2 contains a copy of the symbols that may be used in defining templates for the PARSE instruction. The template definition was taken from the REXX Reference Summary Handbook.

The simplest form of a parsing template consists of just variable names. The string being parsed is split up into words and each successive word in the source string is assigned to a variable in sequence, from left to right. A word is any group of non-space characters delimited by one or more spaces, or by the start or end of the string. The last variable in the template is handled specially in that it receives any remaining data in the source string and may therefore contain multiple words. For example:

parse value 'The quick red fox jumped.' with,
      Var1 Var2 Var3

Var1 would be assigned the value "The", Var2 would be assigned the value "quick" and Var3 would be assigned the value "red fox jumped." Leading and trailing space characters are removed from each word before it is assigned to the variable; however, since the last variable receives all of the remaining data, it may contain both leading and trailing spaces.
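
The blank-handling rule can be seen in a minimal sketch like the following. The source string is invented for illustration, and the period in the second instruction is the placeholder symbol listed in Table 2; putting it last means w3 is no longer the final item in the template, so w3 is treated as an ordinary word.

```rexx
/* Word parsing: only the last variable keeps extra blanks. */
/* The source string is invented for illustration.          */
source = 'Mercury Venus  Earth   Mars  '

parse var source w1 w2 rest
say '[' || w1   || ']'        /* [Mercury]         */
say '[' || w2   || ']'        /* [Venus]           */
say '[' || rest || ']'        /* [ Earth   Mars  ] */

/* A trailing period soaks up the leftover data, so w3 is */
/* parsed as a single word and is stripped of blanks.     */
parse var source w1 w2 w3 .
say '[' || w3 || ']'          /* [Earth]           */
```

Only one of the two blanks after "Venus" is consumed as the delimiter, which is why rest begins with a blank.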

Because of the unique handling of the last variable in the template, special care must be used in parsing strings like that returned by the SysFileTree() function. SysFileTree() returns the date, time, size, and attributes assigned to a file followed by the full file system name of the file. However, the file system name is preceded by two spaces. Since file system names on an HPFS volume may contain spaces, the word parsing capability of the PARSE instruction should not be used to extract the file system name itself. Rather, the file system name should be considered to be the remaining data. Using this technique requires that the spaces surrounding the file system name be removed. The following technique will handle all file system names returned by the SysFileTree() function:

parse value stem.n with,
   file_date,
   file_time,
   file_size,
   file_attr,
   file_path_and_name
file_path_and_name = STRIP( file_path_and_name )
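
To try the technique without calling SysFileTree(), the sketch below parses a made-up line that imitates the layout described above. The line contents, including the path, are assumptions for illustration only; real code would take the line from the stem filled in by SysFileTree().

```rexx
/* Hypothetical line in the layout described above: date, time, */
/* size, attributes, then two spaces and the file system name.  */
line = ' 3/07/18  10:46p      12345  A----  D:\My Documents\read me.txt'

parse value line with,
   file_date,
   file_time,
   file_size,
   file_attr,
   file_path_and_name

/* One of the two spaces is consumed as the delimiter; one remains. */
say '[' || file_path_and_name || ']'

file_path_and_name = strip( file_path_and_name )
say '[' || file_path_and_name || ']'   /* [D:\My Documents\read me.txt] */
```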

All variables specified in a template are assigned a new value, whether there are more or fewer words than variables. A variable that does not receive a word, because there are more variables than words in the source string, is assigned the null string ("").
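
A short sketch of this rule, using invented values: the variable c already holds a value, but the PARSE instruction still replaces it because c appears in the template.

```rexx
/* Every variable in the template is assigned, even when the */
/* source string runs out of words.                          */
c = 'old value'
d = 'another old value'

parse value 'one two' with a b c d

say '[' || a || ']'                /* [one] */
say '[' || b || ']'                /* [two] */
say 'length of c:' length( c )     /* length of c: 0 */
say 'length of d:' length( d )     /* length of d: 0 */
```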

A literal string may be used in a template to identify a delimiter rather than the implied delimiter of a space. For example:

parse value 'Tues., Sept. 29, 1942' with,
  Var1 ',' Var2 Var3

would result in:

Var1 = "Tues."
Var2 = "Sept."
Var3 = "29, 1942"

Note that only the delimiter (the comma) is removed from the string when it is split between Var1 and the remaining variables. However, since the default delimiter of a space is used to separate the values assigned to Var2 and Var3, we revert to the rules for parsing words. Therefore, the space preceding "Sept." is removed. Any combination of delimiters may be used in a single PARSE instruction.
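
As an illustration of combining delimiters, the following sketch splits an invented string on two different literal patterns in one instruction. The string and the variable names are assumptions chosen for the example.

```rexx
/* Two different literal delimiters in one template. */
/* The sample string is invented for illustration.   */
entry = 'PATH=C:\OS2;C:\OS2\SYSTEM'

parse value entry with name '=' first ';' second

say name      /* PATH          */
say first     /* C:\OS2        */
say second    /* C:\OS2\SYSTEM */
```

Because each variable here is bounded by a literal pattern rather than by a space, no blanks are stripped from the values.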

A pattern may be specified in a variable and is indicated to the PARSE instruction by enclosing the variable name in parentheses. Thus the above example could be written:

x = ','
parse value 'Tues., Sept. 29, 1942' with,
   Var1 (x) Var2 Var3

and the same result would occur.
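
Holding the pattern in a variable is useful when the delimiter is only known at run time. The loop below is one possible sketch, with an invented list, that peels one item at a time off the front of a delimited string.

```rexx
/* Split a list on a delimiter held in a variable. */
/* The list contents are invented for illustration. */
delim = ';'
list  = 'red;green;blue'
count = 0

do while list \== ''
   /* item receives the text before the delimiter; list keeps the rest. */
   parse var list item (delim) list
   count = count + 1
   say count item
end
```

When the delimiter is no longer found, the pattern matches the end of the string, so the final item receives everything that is left and list becomes null, which ends the loop.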

The next variation of the parsing template is the numeric positional pattern. This is similar to the literal string shown above except that the numeric positional pattern indicates the position at which the next token from the source string begins. For example:

parse value 'Tues., Sept. 29, 1942' with,
   Var1 5 Var2 12 Var3

results in:

Var1 = "Tues"
Var2 = "., Sept"
Var3 = ". 29, 1942"

A positional number may also be relative to the previous position, which is indicated by a leading plus or minus sign. For example:

parse value 'Tues., Sept. 29, 1942' with,
   Var1 5 Var2 +7 Var3

yields the same result as the previous example.
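
Positional patterns are a natural fit for fixed-column records. The sketch below builds an invented record (two 10-character name fields followed by an 8-digit date) and takes it apart with a mix of absolute and relative positions; the record layout and field names are assumptions for illustration.

```rexx
/* Build an invented fixed-column record: two 10-character */
/* name fields followed by an 8-digit date (yyyymmdd).     */
record = left( 'SMITH', 10 ) || left( 'JOHN', 10 ) || '19960715'

parse var record last_name 11 first_name 21 year +4 month +2 day

/* Positional patterns do not strip blanks, so the padding remains. */
say '[' || last_name || ']'                    /* [SMITH     ] */
say strip( first_name ) strip( last_name )     /* JOHN SMITH   */
say year || '-' || month || '-' || day         /* 1996-07-15   */
```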

As with literal string patterns, the positional patterns can be specified as a variable by inserting the variable name in the template enclosed in parentheses, in place of the number. An absolute position number is indicated by the use of an equal sign ('='). The position indicator (+, - or =) must precede the left parenthesis. Therefore the above example could also be written as:

begin  = 5
length = 7
string = 'Tues., Sept. 29, 1942'
parse value string with,
   Var1 =(begin) Var2 +(length) Var3
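
Variable positions become more useful when they are computed rather than fixed. In this sketch the built-in POS() and LENGTH() functions supply the start and width of a field; the variable names start, width and month are chosen for the example.

```rexx
/* Compute the positions at run time, then use them in the template. */
string = 'Tues., Sept. 29, 1942'
start  = pos( 'Sept.', string )      /* 8 */
width  = length( 'Sept.' )           /* 5 */

/* Move to the absolute position, then take width characters. */
parse var string =(start) month +(width)

say '[' || month || ']'              /* [Sept.] */
```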

Positional patterns can move through the source string in both directions. Fragment 1 contains an example of the PARSE instruction being used with a template that uses both literal and positional patterns. First, it finds the string " UTC", then it moves its pointer backward by a number of positions, and then it moves its pointer forward. The string used in the example is typical of that returned by calling the Naval Observatory to get the current time.

Fragment 1

/* Bi-directional positional PARSE template */
response = '1200*49798 081 054505 UTC'
parse value response with ' UTC',
   -06 UTC_hours     +02,
       UTC_minutes   +02,
       UTC_seconds   +02,
   -10 UTC_ddd       +03,
   -09 UTC_day_count +05

say 'UTC_hours ="'     || UTC_hours     || '"'
say 'UTC_minutes ="'   || UTC_minutes   || '"'
say 'UTC_seconds ="'   || UTC_seconds   || '"'
say 'UTC_ddd ="'       || UTC_ddd       || '"'
say 'UTC_day_count ="' || UTC_day_count || '"'

There is one "gotcha" you must watch for when mixing literal and positional patterns in a template. A relative positional pattern that follows a literal pattern is counted from the first character of the matched literal, not from the character after it. To continue from the end of the literal, you must therefore skip over it by specifying the length of the literal as a relative position. In Fragment 2 the explicit length of +3 must be used following the literal pattern of 'def' in order to assign the character g to the variable char1.

Fragment 2

/* literal followed by positional */
string = 'abcdefghijklmnopqrstuvwxyz'
parse value string with,
   'def' +3,
   char1 +1,
   char2 +1,
   rest
say 'char1 = "' || char1 || '"'
say 'char2 = "' || char2 || '"'
say 'rest  = "' || rest  || '"'
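
For contrast, here is a sketch of the same template with the +3 left out. Under the standard REXX rule that a relative position following a literal is measured from the start of the matched literal, char1 should receive the "d" of 'def' rather than "g".

```rexx
/* Fragment 2 without the +3: the literal is not skipped. */
string = 'abcdefghijklmnopqrstuvwxyz'

parse value string with,
   'def',
   char1 +1,
   char2 +1,
   rest

say 'char1 = "' || char1 || '"'    /* char1 = "d" */
say 'char2 = "' || char2 || '"'    /* char2 = "e" */
say 'rest  = "' || rest  || '"'    /* rest  = "fghijklmnopqrstuvwxyz" */
```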

Table 1 - PARSE Formats

PARSE [UPPER] ARG [template]
    Parses the arguments from a function or subroutine call according to template, optionally first translating them to uppercase.

PARSE [UPPER] LINEIN [template]
    Parses the input from the default character input stream according to template, optionally first translating it to uppercase.

PARSE [UPPER] PULL [template]
    Parses the next line in the REXX data queue according to template, optionally first translating it to uppercase. If the queue is empty, lines will be read from the standard input stream (normally the keyboard).

PARSE [UPPER] SOURCE [template]
    Parses the program's source information (3 tokens) according to template, optionally first translating it to uppercase.

    Example:

    OS/2  COMMAND      C:\OS2\REXXTRY.CMD
    OS/2  SUBROUTINE   D:\OS2\rexxtry.CMD

    Note: If issued within a subroutine, the information reflects the parent.

PARSE [UPPER] VALUE [expression] WITH [template]
    Parses the value of expression according to template, optionally first translating it to uppercase.

PARSE [UPPER] VAR name [template]
    Parses the value of name according to template, optionally first translating it to uppercase.

PARSE [UPPER] VERSION [template]
    Parses the information describing the language processor and level followed by its date, according to template, optionally first translating it to uppercase.

    Example:

    REXXSAA          4.00  08  Jul  1992
    REXXSAA          4.00  10  Feb  1994    (V3)
    REXXSAA          4.00  24  Aug  1996    (V4)
    OBJREXX          6.00  12  Jul  1996   (OBJ)
    REXX/Personal    4.00  12  Oct  1994
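
A few of the formats in Table 1 can be tried with a sketch like the one below. The variable names are chosen for the example, and the values produced by PARSE SOURCE and PARSE VERSION depend on the interpreter and on how the program is started, so no output is shown for them.

```rexx
/* PARSE VAR, with and without UPPER (string borrowed from the text). */
phrase = 'The quick red fox jumped.'
parse var phrase w1 w2 .
parse upper var phrase u1 u2 .
say w1 w2        /* The quick */
say u1 u2        /* THE QUICK */

/* PARSE SOURCE: three tokens; the values depend on how and */
/* where the program is run.                                */
parse source system invocation program_file
say 'system     =' system
say 'invocation =' invocation
say 'program    =' strip( program_file )

/* PARSE VERSION: language processor, level, then the date. */
parse version language level release_date
say 'language   =' language
say 'level      =' level
say 'date       =' strip( release_date )
```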

Table 2 - PARSE templates

A template is a list of symbols, separated by blanks or patterns, which may include:

variable name
    the name of a variable to be assigned a value
literal
    a literal string used to match within the input string
(variable name)
    a variable whose value is used to match the input string
. (period)
    a placeholder that receives part of the input string, except that no assignment is actually performed
integer
    absolute character position in the input string
=integer
    same as integer (absolute character position)
+integer
    relative position in the input string, moving forward
-integer
    relative position in the input string, moving backward
=(variable name)
    variable whose value specifies an absolute character position
+(variable name)
    variable whose value specifies a relative character position, moving forward
-(variable name)
    variable whose value specifies a relative character position, moving backward

In addition, a comma can be used in the template for PARSE ARG to indicate that the next argument becomes the input string for the following portion of the template.
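
The comma form might be used as in the following sketch. The function name describe and its arguments are invented for illustration; the first argument string is parsed into words, and the comma in the template then moves parsing on to the second argument.

```rexx
/* Call an internal function with two argument strings. */
say describe( 'Sept. 29', 1942 )       /* 29 Sept. 1942 */
exit

/* describe is a hypothetical routine written for this example. */
describe: procedure
   /* month and day come from the first argument; the comma */
   /* switches to the second argument for year.             */
   parse arg month day, year
   return day month year
```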