NetRexx: An Alternative for Writing Java Classes

From EDM2

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation

by Mike Cowlishaw

NetRexx, a new human-oriented programming language, is designed as an alternative to the Java language. The reference implementation of NetRexx compiles for the Java Virtual Machine (JVM) and allows programmers to create programs and applets for the Java environment faster and more easily than by programming in Java.

The constraints of safety, efficiency, and environment meant that this language had to differ from REXX in some details of syntax and semantics; it could not be a fully upward-compatible extension. The need for changes, however, offered the opportunity to simplify and enhance the language, both to improve its safety and to strengthen certain other features of the original REXX design, and to incorporate some additions from ANSI and Object REXX.

Equally, the concepts and philosophy of the REXX design can be applied to avoid many of the minor irregularities that characterize the C and Java language families by providing suitable simplifications in the programming model. For example, the NetRexx looping construct has only one form, rather than three, and exception handling can be applied to all blocks, rather than requiring an extra construct. Similarly, as in REXX, all NetRexx storage allocation and de-allocation is implicit - an explicit new operator is not required.

NetRexx classes and Java classes are equivalent; NetRexx can use any Java class and vice versa. Using existing Java classes is especially easy in NetRexx, as the different types of numbers and strings that Java expects are handled automatically by the language.

The end result of adding Java typing capabilities to the REXX language is a new language that has the REXX strengths for scripting and for writing macros and the Java strengths for good efficiency, portability and security at the environment level.

Design Goals: Why Use REXX Syntax?

The objective for NetRexx is to be as efficient as languages such as Java, while preserving the low threshold to learning and ease-of-use of the original REXX language. To meet this objective, it must be possible to use the language for writing the simplest of programs and macros as easily as in REXX, as well as for writing applets and applications that can be executed as efficiently as those written in Java.

A further requirement for a language designed to be suitable for execution from source across an internet is that it can be extended safely over time. Languages invariably need to evolve over time as the needs and expectations of their users change.

There are many approaches to designing a language with these attributes. Those considered were:

  • Start with a C- or Java-like syntax and simplify the language to make it easier to use.
  • Start with a BASIC-like syntax, simplify it to make it easier to use, and extend it to deal efficiently with the semantics of the JVM and its object model.
  • Start with a REXX-like syntax and extend it to deal efficiently with the semantics of the JVM and its object model.
  • Design a whole new language.

The third option seemed the most attractive, especially as it offered the possibility of further streamlining and simplifying the original design with the benefit of hindsight. In NetRexx, the great strengths of REXX (uncluttered syntax, comprehensive string handling, and human-oriented arbitrary precision decimal arithmetic) have been preserved. Keyword safety, an important feature of REXX that allows the language to be extended substantially, has been enhanced. Because of the many similarities, the following sections describe where NetRexx differs from REXX and illustrate the feel of the language with examples.

Syntax and Structure

As in REXX, the primary data type in NetRexx is the symbolic (character) string, to which both string and arithmetic operations can be applied. In practice, this is implemented as a class called REXX. Subclasses are used under the covers to improve the granularity of the design.

To a new programmer who only needs to deal with symbolic data, these data can be represented by the REXX class and the concept of types and explicit typing need not be introduced.

Also as in REXX, there is an emphasis on simplicity in NetRexx; indeed, the Hello World program looks exactly as it does in REXX:

/* This is a sample program */
say 'Hello World!'

NetRexx syntax follows REXX very closely, particularly in the concept of bounded small syntactic units called clauses, which allow accurate and localized error reporting. The tokenization of NetRexx programs into clauses is the same as in REXX (blanks adjacent to special characters disappear, strings can be delimited with single or double quotes, /* comments nest, etc.), with only minor differences. The most significant of these are:

  • Prefix operators cannot repeat. This has freed the sequence '--' for introducing a comment that is terminated by end-of-line, for example:
    Say "Hello" -- displays a message
  • A number in exponential notation must have a sign after the E, hence avoiding confusion with hexadecimal symbols such as 1E0.
    The following are valid:
    12E+2 12e-3 12.7E+0 1E-1
    A symbol that starts with a digit must have the syntax of a number.
  • Square brackets, rather than a period, are used for array and indexed string notation, as in Object REXX and Java.

Clauses in NetRexx are the basis for instructions; these may be simple assignments or method calls, formed from a single clause, or they may be more complex instructions such as if ... then ... else. The flow-of-control constructs are the same as in REXX, except that the do instruction (which acts in REXX as both a grouping and a looping construct) has been split into do (which just groups) and loop (which usually repeats) for clarity. For example:

loop i=1 to 10 while j<3
    say 'Hi'
    if j=k then leave i
    say 'Lo'
end i

As in REXX, end and leave may optionally name the control variable for improved readability and error checking. In NetRexx, this has been extended to all of the flow-of-control constructs.
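For readers coming from Java, the loop above corresponds roughly to a labeled for statement. The following is a hand-written comparison, not translator output; the class name LoopSketch, the run helper, the label name, and the choice to collect the lines in a list rather than display them are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LoopSketch {
    // Hypothetical helper: returns the lines the loop would display for given j and k.
    static List<String> run(int j, int k) {
        List<String> out = new ArrayList<>();
        // "loop i=1 to 10 while j<3": both conditions are tested before each iteration
        counted:
        for (int i = 1; i <= 10 && j < 3; i++) {
            out.add("Hi");
            if (j == k) break counted;   // plays the role of "leave i"
            out.add("Lo");
        }
        return out;
    }

    public static void main(String[] args) {
        // j equals k, so the loop is left during the first iteration
        run(1, 1).forEach(System.out::println);   // prints "Hi"
    }
}
```

In Java the label is a separate name that must be declared on the statement, whereas the NetRexx form reuses the control variable that the loop already has.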

Note that in REXX and NetRexx, notation (such as parentheses and separator semicolons) can be kept to a minimum. For example, semicolons are only needed if more than one instruction is to be placed on a line. A hyphen (instead of the comma in REXX) is used to indicate continuation of a clause to a following line.

Keywords are not case-sensitive, as many people prefer mixed-case or uppercase keywords. In addition, a symbol will never be recognized as a keyword if it has already been used as the name of a variable in the current scope. This important rule, together with limited static scoping that defines which symbols are variable names at a given point, provides the keyword safety that is a feature of NetRexx. It also aids learning the language, as the programmer need not have learned all the keywords used by the language before choosing a variable name.

The use of keywords instead of braces for block constructs makes the language easier to read for many people (and it is less easy to overlook or omit the beginning or end of a block).

This use of keywords also permits a very natural extension to the exception handling style used in Java. In NetRexx, all three control constructs (do, loop, and select) are closed by an end clause; in all three cases catch and finally clauses (semantically equivalent to those in Java) may be included prior to the end. This avoids introducing a special try construct into the language, and in practice, often avoids extra levels of nesting in programs:

if a=b then do
    /* something that might raise an Exception */
catch Exception -- unassigned
    say 'Something happened'
finally
    say 'All done'
end -- ends the DO group
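The extra nesting that the text mentions can be seen in a rough Java counterpart of this fragment, where a try block has to be placed inside the if block. This is a sketch for comparison only; the class name CatchSketch, the run helper, and its two boolean parameters (standing in for the a=b test and for "something that might raise an Exception") are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CatchSketch {
    // Hypothetical helper: 'equal' stands in for a=b, 'fail' forces an exception.
    static List<String> run(boolean equal, boolean fail) {
        List<String> out = new ArrayList<>();
        if (equal) {
            try {                                   // extra construct and nesting level
                if (fail) throw new Exception("simulated failure");
            } catch (Exception e) {
                out.add("Something happened");
            } finally {
                out.add("All done");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(true, true));    // prints [Something happened, All done]
    }
}
```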

In an object-oriented environment such as that supported by the JVM, a comprehensive library of classes, including a variety of input and output possibilities, can be assumed. This allows several simplifications to REXX; in particular, the concept of an External Data Queue has been removed.

For simple stand-alone applications, NetRexx source code can be included in an input stream (such as a file) with no preamble. As shown in the above examples, the language processor can provide an appropriate default method and class. For more complicated applications, multiple methods and classes can be explicitly defined with the method and class instructions, as in the following partial example.

   class DemoApp extends AppletLike
       name=""
       value=5
   method init
       Name=getParameter("text")
       if Name=null then Name="Rexx"
   method show(number=int)
       say 'The number is ' number
       say 'The name is ' name
   method pause(time=value*1000)
       say 'Pausing for ' time'ms'
       Thread.sleep(time)

Selected defaults are applied throughout; the first class is assumed to be public, properties (instance variables) are assumed to be inheritable (visible only to the current class and subclasses), and so on.

In this example, name and value are properties of the object; all properties must be introduced before the first method. Each method is ended by the start of a new method or class, or by the end of the source.

NetRexx is case-independent and case-preserving as far as is practical. New names that are externally visible or refer to external identifiers are used as first written (as the environment may be case-sensitive), but internal names (such as variable names) and references to existing external names are not case sensitive, except to resolve ambiguity.

  • The first method (init) in the above example shows references to the name variable in a different case mixture to the original use.
  • The second method (show) illustrates the definition of a parameter; its type is set by assignment, as elsewhere in the language.
  • The argument to the third method (pause) shows the syntax for optional arguments; if the caller provides an argument then it will be used, otherwise the default expression will be evaluated at run-time and used for that argument. The JVM constrains optional arguments to be from-the-right only (an argument may not be omitted if an argument is provided to the right of it).
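Java has no optional arguments, so a Java programmer would typically express the pause method as a pair of overloads, which also shows why omission only works from the right. The following is one plausible hand-written mapping, not a description of what the NetRexx translator generates; the class name DemoAppSketch is hypothetical, and the method returns its message instead of sleeping so that the sketch runs instantly.

```java
class DemoAppSketch {
    int value = 5;   // same initial value as the property in the example

    // Caller omitted the argument: the default expression is evaluated at call time.
    String pause() {
        return pause(value * 1000);
    }

    // Caller supplied the argument.
    String pause(int time) {
        // The original example calls Thread.sleep(time) here; this sketch only
        // reports the delay (an assumption made to keep it fast and testable).
        return "Pausing for " + time + "ms";
    }

    public static void main(String[] args) {
        DemoAppSketch app = new DemoAppSketch();
        System.out.println(app.pause());      // prints "Pausing for 5000ms"
        app.value = 2;
        System.out.println(app.pause());      // prints "Pausing for 2000ms"
        System.out.println(app.pause(250));   // prints "Pausing for 250ms"
    }
}
```

Because the default is computed inside the no-argument overload, a later change to value is picked up by the next call, which matches the "evaluated at run-time" behavior described above.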

The current definition also allows environment-dependent instructions before the first class instruction. For the Java environment, these include the package and import instructions. The package instruction essentially is the same as in Java; import has slightly different syntax and semantics--importing a hierarchy is permitted.

Expressions and Operators

NetRexx expressions follow the rules of REXX. All REXX operators are valid, with REXX precedence rules during evaluation. Parentheses can be used, as usual, to alter operator precedence.

Operators act on either one or two terms. Terms (data descriptors) may be any of the following:

  • Literal strings and numbers
  • Sub-expressions (in parentheses)
  • References, which can be a simple symbol, an array variable or method argument, a property of the current class, or a type (primitive or defined by a .class file - see below).
  • Compound references, which comprise the simpler forms of reference combined using the . connector. Syntactically, any combination is valid, though some may be invalid semantically. The following are examples of syntactically valid compound references:
myaddress.street
list[3].word(n).left(1)
is.openstream
'fade'.x2c

The semantics of a compound reference follow the rules of Java. A compound reference must start with either a reference or a type, where the type may be qualified. Each piece after the next connector can further refine the term, returning a reference or (optionally, if it is the final part of a term) a value. The end result of a term is therefore either a type, a typed reference, or a typed value.
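Since the semantics follow Java, the three possible outcomes of a term can be illustrated with ordinary Java expressions. This is only a sketch: the class name TermSketch and its helpers are hypothetical, and the String methods used here are Java's own rather than the word, left, or x2c methods of the REXX class.

```java
import java.io.PrintStream;

class TermSketch {
    // Array element, then method, then method: each step refines the term.
    static String firstLetter(String[] list, int index) {
        return list[index].trim().substring(0, 1);
    }

    // A literal can start a term; the final step returns a typed value.
    static int literalLength() {
        return "fade".length();
    }

    public static void main(String[] args) {
        // Type, then field: the term yields a typed reference.
        PrintStream stream = System.out;
        String[] list = {"alpha", "beta", "gamma", "delta epsilon"};
        stream.println(firstLetter(list, 3) + " " + literalLength());   // prints "d 4"
    }
}
```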

Types, Variables and Names

As indicated above, NetRexx includes the concept of types (classes), although many programs and applets can be written without explicitly involving type declarations or conversions.

The type of a NetRexx variable is determined statically: it is the type of the result of the expression that is first assigned to it, where "first" is itself determined statically rather than dynamically. An assignment to a variable is allowed, in general, if the value can be assigned without significant loss of information. If the value of the expression is simply a type, the variable is, in effect, just declared and assigned an initial value appropriate for the type.

Variables in NetRexx refer to objects (possibly with only local scope). By default, the type (class) of literal strings and numbers is REXX, and variables used for symbolic manipulation and calculation tend to be of the REXX type. This, together with the automatic conversions between well-known types, allows the language to remain simple and easy to learn, as the different types of numbers and strings that Java expects are handled automatically by the language. Binary representations of literals and binary operations, which are potentially more efficient (though of course less human-oriented), can optionally be selected as the default.

If required, the type of a value may be overridden by explicit conversion operations (called casts, in C). The syntax for this is the familiar REXX blank operator: when the term on the left-hand side of a blank operator results in a type then the operation converts the result of the term on the right-hand side of the operator to that type, if allowed. For example:

int 7.0
int 7.3**5
Exception e
Complex vector

The priority (precedence) of the conversion operator is the same as when it means concatenate with blank; it is lower than arithmetic operators but higher than logical operators.
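This precedence differs from that of the Java cast, which binds more tightly than arithmetic, so a Java programmer must add parentheses to convert the result of a whole expression. The sketch below shows only the Java side of the comparison and makes no claim about the exact value NetRexx produces for the examples above; the class name CastSketch is hypothetical, and Math.pow stands in for the ** operator, which Java lacks.

```java
class CastSketch {
    // The cast applies to 7.3 alone, then the int result is multiplied.
    static int castThenMultiply() {
        return (int) 7.3 * 5;               // 7 * 5
    }

    // Parentheses (here, the method call) make the cast apply to the whole result.
    static int powerThenCast() {
        return (int) Math.pow(7.3, 5);      // 7.3 to the fifth power is about 20730.7
    }

    public static void main(String[] args) {
        System.out.println(castThenMultiply());   // prints 35
        System.out.println(powerThenCast());      // prints 20730
    }
}
```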

Certain types, such as int, may be defined as primitive by the underlying environment, and this may affect their semantics. For example, objects of a primitive type may be passed by value rather than by reference. However, NetRexx makes no syntactic distinction for primitive types, and provides constructors for them for completeness.

An implementation of NetRexx can determine from knowledge of the characteristics of types (and other factors) when an object is being constructed, so no new operator is needed. As in REXX, storage allocation and de-allocation are both implicit.

Arrays

NetRexx arrays (ordered references to objects of the same type, indexed by integers) are essentially the same as those in Object REXX and Java. However, NetRexx extends these to arrays of dynamic size with arbitrary indices, for the REXX type.

Arrays are indicated in NetRexx syntax by the use of square brackets, [], and fixed-size arrays are constructed using a type followed by brackets:

i=int[3]         -- makes an array

Similarly, brackets are used for referring to a member of an array:

i[2]=3
j=i[2]

Regular multidimensional arrays can be constructed and referred to by using multiple expressions within the brackets, separated by commas:

i=int[2,3]
i[1,2]=3         -- indices start at 0, so [1,2] is the last element
j=i[1,2]         -- j is now 3

The type of a variable that refers to an array can be set (declared) by assignment of the type with array notation that indicates the dimension of an array without an initial size or sizes:

k=int[]          -- one-dimensional 
m=float[,,]      -- three-dimensional

The same syntax also is used when describing an array type in the arguments of a method instruction or when converting types.
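For comparison, the same constructions in Java need the new operator and one pair of brackets per dimension. This is a hand-written sketch with hypothetical names (ArraySketch and its helpers); Java indices are zero-based, so an array created with sizes 2 and 3 has valid indices 0 to 1 and 0 to 2.

```java
class ArraySketch {
    // One-dimensional array, as in i=int[3].
    static int oneDimensional() {
        int[] i = new int[3];
        i[2] = 3;
        return i[2];
    }

    // Regular two-dimensional array, as in i=int[2,3].
    static int twoDimensional() {
        int[][] i = new int[2][3];
        i[1][2] = 3;                 // last element of the last row
        return i[1][2];
    }

    // Declarations without sizes, as in k=int[] and m=float[,,].
    static boolean declaredOnly() {
        int[] k = null;              // one-dimensional, refers to no array yet
        float[][][] m = null;        // three-dimensional, refers to no array yet
        return k == null && m == null;
    }

    public static void main(String[] args) {
        System.out.println(oneDimensional());   // prints 3
        System.out.println(twoDimensional());   // prints 3
        System.out.println(declaredOnly());     // prints true
    }
}
```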

Tracing

Like REXX, NetRexx defines run-time tracing as part of the language. Run-time tracing complements interactive debuggers and is useful for rapid analysis of failures, especially in a program that is run remotely.

As the methods in a program run, the flow of execution may be traced, and this trace either can be viewed as it occurs or captured in a file. The trace can show each clause as it is executed, and optionally shows the values of expressions, method arguments, etc. For example, the program:

trace results
number=1/7
parse number before '.' after
say after'.'before

would result in the trace:

2 *=* number=1/7
    >v> number "0.142857143"
3 *=* parse number before '.' after
    >v> before "0"
    >v> after "142857143"
4 *=* say after'.'before
    >>> "142857143.0"

where the lines marked with *=* are the instructions in the program, lines with >v> show results assigned to local variables, and lines with >>> show results of unnamed expressions.
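The values in this trace can be reproduced in plain Java, which also shows how much the REXX type does implicitly. The sketch below is an illustration under stated assumptions, not the code the translator emits: BigDecimal with a nine-digit MathContext imitates the default REXX precision of nine significant digits, indexOf and substring stand in for the parse instruction, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class TraceSketch {
    // Imitates number=1/7 with nine significant digits.
    static String oneSeventh() {
        return BigDecimal.ONE.divide(new BigDecimal(7), new MathContext(9)).toString();
    }

    // Imitates "parse number before '.' after" followed by "say after'.'before".
    static String swapAroundPoint(String number) {
        int dot = number.indexOf('.');
        String before = number.substring(0, dot);
        String after = number.substring(dot + 1);
        return after + "." + before;
    }

    public static void main(String[] args) {
        String number = oneSeventh();
        System.out.println(number);                    // prints 0.142857143
        System.out.println(swapAroundPoint(number));   // prints 142857143.0
    }
}
```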

Development and Current Status

The current implementation of NetRexx is a translator that translates NetRexx source code into Java source code. This approach ensures that valid and compatible Java class files are produced.

The initial implementation of the NetRexx translator was written in REXX, running on OS/2. Once sufficient language had been implemented, along with the REXX class, this translator was ported to NetRexx. This port took three weeks. Currently, the language processor runs on eight platforms. It is interesting to note that the REXX language took nine years to port to eight platforms.

One of the first programs written in NetRexx was a re-implementation of an early Java applet called NervousText. The NetRexx version is available from the NetRexx World Wide Web site at http://www2.hursley.ibm.com/netrexx. It follows the algorithm and variable naming of the Java version, except that instance variables that are not used have been removed, and those used only in one method have been moved to that method.

Comparing the NetRexx program with the similarly corrected Java original shows that the NetRexx program is simpler and smaller than the Java version. The Java version has 36.9 percent more lexical tokens, and requires 20.3 percent more significant keystrokes (that is, excluding comments and syntactically insignificant white space and line feeds). The size of the resulting class file was effectively unchanged from the original (six bytes larger). Similar results were obtained with other applets that have been re-written in NetRexx.

The size of the differences in tokens and keystrokes was larger than expected; C is often considered to be a terse language, and Java is syntactically close to C. Analysis shows that about half of the difference is in extra punctuation (parentheses, braces, and semicolons), and a quarter is due to the need for explicit importation of standard Java types (NetRexx imports the entire java.* tree by default). The remainder is miscellaneous syntactic differences (Java has slightly more complicated looping constructs, adds the try statement, and so on).

Given these encouraging results, work continued to refine and define the language. The first complete beta was released in December 1996. In January 1997, NetRexx 1.00 was released by IBM, and the language definition (The NetRexx Language, ISBN 0-13-806332-X) was published by Prentice Hall.

Summary

NetRexx combines the strengths of two very different programming languages, REXX and Java. The result is a language which is tuned for both scripting and application development, and is therefore genuinely general purpose. NetRexx has the REXX strengths for scripting and for writing macros and the Java strengths for good efficiency, portability, and security at the environment level.

Mike Cowlishaw, IBM Fellow, is the creator of the REXX and NetRexx languages. He has long been interested in the human aspects of computing, working on the design and implementation of languages, editors, displays, image processing systems, and text formatters. His current technical interests (in addition, of course, to REXX) include user interfaces, the World Wide Web, the Java environment, lightweight computers, and neural networks. You can reach Mike at the IBM UK Laboratories, Hursley Park, Winchester, SO21 2JN, UK or via e-mail at mfc@vnet.ibm.com
