Drawing a directory tree with REXX

From EDM2
Revision as of 15:04, 12 November 2016 by Ak120 (talk | contribs) (formatting - to make it readable for users that don't use >2000px wide screens)

By Gordon Snider

I wanted to get a bird's eye view of my directory structure and disk space usage. But I wanted to use the command line, without a lot of CD commands to switch into directories. While the OS/2 TREE command produces lots of output, it's not very useful, and certainly is not very compact. So I decided to see if REXX could help me out. It turns out it can.

The name

This utility, which I named TRE.CMD, draws a diagram of a directory tree starting from the current directory and working its way out to the leaf directories. (Enhancements planned for the future will also show disk space usage in each directory.) I chose the name TRE.CMD because the command TREE is already in use. Since I like short names, I dropped the last E.

TRE may produce a lot of output. If the output is too long, pipe it into the MORE filter:

TRE | MORE

Or redirect it to a file:

TRE > BRANCH.TXT

The file can be read with a text editor such as E.EXE or EPM.EXE.

Details

The problems to be solved are:

to discover all the subdirectories of the starting directory, and
to present the output on the screen in a way that makes sense.

TRE.CMD uses the SysFileTree() function from RexxUtil to discover the subdirectories. (If you aren't loading RexxUtil at boot time, see Loading RexxUtil at boot time to learn how.)
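If RexxUtil is not loaded at boot time, a command file can register it itself before calling SysFileTree(). The following is a minimal sketch, not part of TRE.CMD. It uses the standard RxFuncAdd and SysLoadFuncs calls; the names here, status and sub. are chosen only for illustration.

```rexx
/* Sketch: load RexxUtil on demand, then list the subdirectories */
/* of the current directory.                                     */
IF RxFuncQuery('SysLoadFuncs') THEN DO    /* 1 = not yet registered */
  CALL RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
  CALL SysLoadFuncs
END

here = STRIP(DIRECTORY(), 'T', '\')       /* drop a trailing backslash */

/* 'D' = directories only, 'O' = fully qualified names only */
status = SysFileTree(here || '\*', 'sub.', 'DO')
IF status <> 0 THEN DO
  SAY 'SysFileTree failed with return code' status
  EXIT status
END

DO i = 1 TO sub.0                         /* sub.0 holds the count */
  SAY sub.i
END i
EXIT 0
```

With this in place the script does not depend on how the system was booted.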

To make sense, the output needs a codepage that provides line drawing characters. The characters used by TRE are the vertical bar │ (at decimal 179), the right angle └ (at decimal 192), and the sideways T ├ (at decimal 195). Codepage 437, the default codepage for US English, is one codepage that provides line drawing characters.
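To check whether the active codepage draws these characters as expected, a few lines of REXX can print them. This is a small sketch for illustration only; the variable names and the sample directory names are made up.

```rexx
/* Sketch: show the three line-drawing characters TRE relies on. */
bar = D2C(179)      /* vertical bar: more entries follow at an outer level */
ell = D2C(192)      /* right angle: last entry in a list                   */
tee = D2C(195)      /* sideways T: more entries follow in this list        */

SAY 'SAMPLE'                              /* made-up names below */
SAY ' ' || tee || 'FIRST'
SAY ' ' || bar || ' ' || ell || 'CHILD'
SAY ' ' || ell || 'LAST'
EXIT 0
```

If the lines come out as accented letters or other symbols, the session is probably using a codepage that has different characters at those positions.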

TRE uses recursion to walk a directory tree. In OS/2, REXX supports 100 levels of nested control structures. This gives the program a practical directory nesting limit of about 50 levels, from the current directory to the farthest leaf. If your directories are nested deeper than this, the command fails with the message REX0011 Control Stack Full. Nothing is harmed, but TRE won't draw the full tree. The workaround is to run TRE from a directory closer to the leaf directories and collect the output in stages.
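One way to find out in advance whether a tree approaches this limit is to measure its depth without recursion. The sketch below is not part of TRE.CMD. It assumes RexxUtil is already loaded, uses the 'S' option of SysFileTree() to scan all subdirectories in one call, and counts backslashes with a helper called depth, a hypothetical name chosen here.

```rexx
/* Sketch: report how deep the tree below the current directory goes. */
here = STRIP(DIRECTORY(), 'T', '\')

/* 'D' = directories only, 'O' = names only, 'S' = scan subdirectories */
CALL SysFileTree here || '\*', 'all.', 'DOS'

deepest = 0
DO i = 1 TO all.0
  levels = depth(all.i) - depth(here)     /* levels below the start */
  IF levels > deepest THEN deepest = levels
END i
SAY 'Deepest nesting below' here 'is' deepest 'level(s).'
EXIT 0

/* depth: hypothetical helper, counts the backslashes in a path */
depth: PROCEDURE
PARSE ARG path
count = 0
at = POS('\', path)
DO WHILE at > 0
  count = count + 1
  at = POS('\', path, at + 1)
END
RETURN count
```

If the reported depth is near 50, running TRE from a directory further down the tree is the safer choice.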

The program uses recursion inside a loop, almost the same as RDD.CMD (which deletes a whole branch of a directory tree). [See the extended attributes article Pruning a directory branch for more on RDD.] Using recursion inside a loop simplifies the code.

Kernel

Here is the kernel of the TRE command. I added line numbers to make it easier to explain the logic - they aren't part of the code.

I nominate this as the REXX command that produces this output with the fewest clauses.

 0.  /* TRE.CMD     REXX CMD  by  Gord Snider  v1.0   */
     /* PURPOSE:  To draw a directory tree from the
           current directory.
        SYNTAX:  TRE
     */
 1.  current = STRIP( DIRECTORY(), 'T', '\')
 2.  SAY SUBSTR(current, LASTPOS('\', current) + 1)
 3.  CALL next current
 4.  EXIT 0
 5.  next: PROCEDURE
 6.  PARSE ARG nextdir,spacer
 7.  CALL SysFileTree nextdir || '\*', 'dir.', 'DO'
 8.  DO dir = 1 TO dir.0
 9.    IF dir < dir.0 THEN leader = D2C(195)
10.                   ELSE leader = D2C(192)
11.    SAY spacer || ' ' || leader || SUBSTR( dir.dir, LASTPOS('\', dir.dir) + 1)
12.    IF leader = D2C(195) THEN leader = D2C(179)
13.                         ELSE leader = D2C(32)
14.    CALL next dir.dir, spacer || ' ' || leader
15.  END dir
16.  RETURN

Explanation

line 1: Save the full path of the current directory, with any trailing backslash removed.
line 2: Put the unqualified current directory name on the screen.
line 3: Call the NEXT procedure passing the current directory name.
line 5: Hide old set of variables, enable new set.
line 6: Variable spacer is each line's indenting characters.
line 7: Are there subfolders here? If so, list them in the stem dir. (the count goes in dir.0).
line 8: If there are any in the list begin to process them.
line 9: If this is not the last subfolder then leader gets ├.
line 10: If it is the last subfolder in this list then leader gets └.
line 11: Output a line to the screen containing indenting characters and a directory name.
line 12: Modify the leader character so it is suitable for the next line.
line 13: If this is the last directory in this list close off the line with a space.
line 14: Pivot point of the recursion. Everything before this line runs on the way down the tree; everything after it runs on the way back up.
line 15: Back to the top of the dir loop. This line is reached only after the recursive call has executed its RETURN.

Recursion inside a loop

Books on programming talk about recursion and loops as alternate ways to execute code more than once. However, recursion has the benefit of also hiding variables (with the PROCEDURE keyword). This allows the SysFileTree() function to be called repeatedly to expose the name of every subdirectory below the current directory.

Using recursion inside a loop was interesting for several reasons, but mainly because it made the code shorter. The loop's index variable, dir, counts up one for each subdirectory in the folder. Each time the subroutine is called recursively, the focus moves deeper in the directory tree by one nesting level. Recursion hides the current set of variables, so the leader and spacer variables, the fully qualified name of the current directory, and any remaining entries in the list dir. are preserved by the system. This saves having to write code to preserve them.

No matter what directory is being looked at, the question is the same: Are any subdirectories here? If there are, then SysFileTree returns a list of them, and the code starts a loop to process each one in turn. If not, then a leaf directory was reached, so that level of recursion ends and control returns to the previous level. This re-activates the prior values of the spacer, leader, dir., and dir variables.
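The same pattern can be tried without touching the disk. The sketch below is an illustration, not part of TRE.CMD: it walks a made-up in-memory tree held in the stem kids., and the routine name walk and the stem out. are hypothetical. PROCEDURE hides list and i at each level, so the loop resumes where it left off after every recursive call.

```rexx
/* Sketch: recursion inside a loop over a made-up in-memory tree. */
kids.     = ''              /* default: a node has no children    */
kids.ROOT = 'A B'
kids.A    = 'A1 A2'
out.0     = 0               /* collected output lines             */

CALL walk 'ROOT', ''
DO n = 1 TO out.0
  SAY out.n
END n
EXIT 0

walk: PROCEDURE EXPOSE kids. out.
PARSE ARG node, indent
list = kids.node                        /* children of this node   */
DO i = 1 TO WORDS(list)
  child = WORD(list, i)
  n = out.0 + 1                         /* append one output line  */
  out.n = indent || child
  out.0 = n
  CALL walk child, indent || '  '       /* go one level deeper     */
END i                                   /* list and i are intact   */
RETURN
```

Only the two stems are shared through EXPOSE; everything else is private to each level, which is what lets the loop continue correctly after a RETURN.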

The spacer variable holds enough space characters to maintain the indenting. This aligns the text to show the relationship of each directory to the others.
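To see how the indent grows, the sketch below rebuilds a spacer outside the recursion. It is an illustration only: makeSpacer is a hypothetical helper, and its argument is an invented notation with one digit per ancestor level, 1 if that ancestor has more siblings below it and 0 if it was the last in its list.

```rexx
/* Sketch: build the indent for a directory two levels down whose */
/* first ancestor has more siblings and whose second was the last. */
SAY makeSpacer('10') || ' ' || D2C(192) || 'LEAF'
EXIT 0

/* makeSpacer: hypothetical helper, mirrors the way TRE extends   */
/* spacer by a blank plus a vertical bar or a blank at each level. */
makeSpacer: PROCEDURE
PARSE ARG flags
spacer = ''
DO i = 1 TO LENGTH(flags)
  IF SUBSTR(flags, i, 1) = 1 THEN spacer = spacer || ' ' || D2C(179)
                             ELSE spacer = spacer || ' ' || D2C(32)
END i
RETURN spacer
```

Each level adds exactly two characters, so names at the same depth always line up in the same column.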

In each SAY clause, the SUBSTR() function, together with LASTPOS(), extracts the unqualified name of a directory from its full path name.
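The expression can be tried on its own. In this sketch the helper name leaf and the sample paths are made up for illustration.

```rexx
/* Sketch: extract the last component of a path. */
SAY leaf('C:\OS2\DLL')
SAY leaf('C:')
EXIT 0

/* leaf: hypothetical helper. LASTPOS() finds the final backslash */
/* and SUBSTR() returns everything after it. With no backslash,   */
/* LASTPOS() returns 0 and the whole string comes back.           */
leaf: PROCEDURE
PARSE ARG path
RETURN SUBSTR(path, LASTPOS('\', path) + 1)
```

A path that ends in a backslash would yield an empty name, which is why the kernel strips a trailing backslash before using the expression.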

Each time the code reaches the CALL at line 14, control passes to line 5, and at line 7 the next level of directory nesting is read, if there is one. Because line 5 hides the caller's variables, the variables belonging to each level of nesting are kept separate.

Installation

Copy the file into your REXX command directory. (The REXX command directory should be in your CONFIG.SYS PATH statement.) Load RexxUtil, if it's not loaded at boot time.

This gives a bird's eye view of the directory structure as I wanted.