EDM/2

Software Conversion, Translation, and Migration

Automating the Construction of Application Generators

Written by Jules Gilbert of SymbTech


An Introduction to SymbTech's Apprentice Technology

This technical note describes a powerful new kind of software utility that can reinforce the efforts of programmers and software developers by providing an unparalleled level of programming automation, previously unavailable in other software development and maintenance tools.

The basis of this capability is a symbolic programming system augmented by an expert system. Its rules base focuses on compiler-writing issues and includes descriptions of several popular programming languages, as well as the computer science topics needed to construct these kinds of programs.

The product is an interactive platform. Although it does not use Windows or other special tools, it gives users a simple basis for performing many kinds of complex tasks common to computer science and general programming.

A working knowledge of C is all that is required to use the program.

When a user first presents a project, common practice is to enhance the system with all of the specific knowledge known to be relevant.

For instance, if one is building a business tax processor, one should expect to supply the elements of the tax code.

But for compiler and translator projects, the program has already been enhanced appropriately.

The name of this software utility is aiTRAN.

The developer hopes that users, once they have used aiTRAN, will agree that this software technology provides unparalleled automation.

aiTRAN has been successfully employed over a wide class of software engineering projects and it is described here with respect to projects involving:

  • conversion, translation, and migration of software, and
  • the construction of application generators.
For specific programming projects, it is not unrealistic to expect a productivity gain of two orders of magnitude compared to conventional solutions.

Of course, this result takes time to realize. Without specialized training, it is usually necessary to practice for a week or so before a substantial gain in personal programming productivity can be demonstrated.

A Simple Example

The following simple example builds source code for Unix lex.

Make no mistake: the author of aiTRAN does not normally use Lex or Yacc. These tools, while excellent in many ways, have the disadvantage of generating tables of constants (rather than maintainable C source), so their value is somewhat limited. But because Lex and Yacc are so well known, this first example demonstrates what is possible within a Lex/Yacc context.

This example is a simple tokenizer for the C language, constructed with Unix (f)lex, that supports C numeric constants.

The file "lexdemo.wrk" describes the C language numeric constants and (f)lex issues are described in the file "lex.wrk".

Running this instructional demo is very simple.

First, three MS-DOS 'set' commands ensure adequate virtual RAM and also 'point' aiTRAN to the library directory.

The commands follow:


set AITMEM=32M
set AITSTK=50K
set AITLIB=C:\AITLIB
Invoke aiTRAN:

ait386

An Instructional Example

aiTRAN v1E061,26288K:C/AP All Rights Reserved by SymbTech. Copyright 1991.


[1]     load lexdemo ;  /* Remaining text on this page was
                           produced by running demo file.

                           Try this example!

                           See the file "values.lex". */

#define Clex_oct 1001
#define Clex_dec 1002
#define Clex_hex 1003
#define Clex_flt 1004

%%


([0])((([0-7])+)(([Ll])?))                    {return Clex_oct;}

(([0-9])+)(([Ll])?)                           {return Clex_dec;}

([0])(([Xx])((([0-9A-Fa-f])+)(([Ll])?)))      {return Clex_hex;}

((([0-9])+)([Ee]))\
|(((([0-9])+)(([Ee])(([0-9])+)))\
|(((([0-9])+)\
(([Ee])(([\053\055])(([0-9])+))))\
|(((([0-9])+)([\056]))\
|(((([0-9])+)(([\056])(([0-9])+)))\
|(((([0-9])+)(([\056])((([0-9])+)([Ee]))))\
|(((([0-9])+)\
(([\056])((([0-9])+)(([Ee])(([0-9])+)))))\
|(((([0-9])+)\
(([\056])\
((([0-9])+)\
(([Ee])(([\053\055])(([0-9])+))))))\
|(((([0-9])+)(([\056])([Ee])))\
|(((([0-9])+)(([\056])(([Ee])(([0-9])+))))\
|(((([0-9])+)\
(([\056])(([Ee])(([\053\055])(([0-9])+)))))\
|((([\056])(([0-9])+))\
|((([\056])((([0-9])+)([Ee])))\
|((([\056])((([0-9])+)(([Ee])(([0-9])+))))\
|(([\056])\
((([0-9])+)\
(([Ee])(([\053\055])(([0-9])+)))))))))))))))))) {return Clex_flt;}
The above screen text can be passed directly to a Unix-based (f)lex. (Minor editing was done for typesetting purposes.)
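For comparison with the generated lex rules, here is a hand-written C sketch that classifies a complete token using the same four token codes. It is an illustration only and is not aiTRAN output. It tests the rules in the order they appear, which is how lex breaks ties between equal-length matches. It also collapses the fifteen float alternatives into one mantissa-plus-exponent check. The function name classify is hypothetical.

```c
#include <ctype.h>

/* Token codes from the generated lex file. */
#define Clex_oct 1001
#define Clex_dec 1002
#define Clex_hex 1003
#define Clex_flt 1004

/* Advance *p past decimal digits; return how many were consumed. */
static int digits(const char **p)
{
    int n = 0;
    while (isdigit((unsigned char)**p)) { (*p)++; n++; }
    return n;
}

/* True if p holds an optional 'L'/'l' suffix and nothing else. */
static int opt_long_then_end(const char *p)
{
    if (*p == 'L' || *p == 'l') p++;
    return *p == '\0';
}

/* Classify a whole token with the four rules, in lex rule order.
   Returns 0 if no rule matches the entire token. */
int classify(const char *s)
{
    const char *p;
    int n;

    /* 1. Octal: 0, one or more of 0-7, optional L. */
    if (s[0] == '0') {
        for (p = s + 1, n = 0; *p >= '0' && *p <= '7'; p++) n++;
        if (n > 0 && opt_long_then_end(p)) return Clex_oct;
    }

    /* 2. Decimal: one or more digits, optional L. */
    p = s;
    if (digits(&p) > 0 && opt_long_then_end(p)) return Clex_dec;

    /* 3. Hex: 0x or 0X, one or more hex digits, optional L. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        for (p = s + 2, n = 0; isxdigit((unsigned char)*p); p++) n++;
        if (n > 0 && opt_long_then_end(p)) return Clex_hex;
    }

    /* 4. Float: mantissa D+ | D+ . D* | . D+, then an optional
          exponent E | E D+ | E [+-] D+. A mantissa with no '.'
          needs the exponent, otherwise rule 2 would apply. */
    p = s;
    int lead = digits(&p), has_dot = 0, trail = 0, has_exp = 0;
    if (*p == '.') { has_dot = 1; p++; trail = digits(&p); }
    if (lead == 0 && trail == 0) return 0;
    if (*p == 'E' || *p == 'e') {
        has_exp = 1;
        p++;
        if (*p == '+' || *p == '-') {
            p++;
            if (digits(&p) == 0) return 0;   /* a sign needs digits */
        } else {
            digits(&p);                      /* exponent digits optional */
        }
    }
    if (!has_dot && !has_exp) return 0;
    return *p == '\0' ? Clex_flt : 0;
}
```

For example, classify("0x1FL") returns Clex_hex, and classify("08") returns Clex_dec. The octal rule only accepts the digits 0 to 7 after the leading zero, so it does not match "08".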

Starting the Program

aiTRAN v1E061,10288K:C/AP All Rights Reserved by SymbTech. Copyright 1991.


/* stack: 1714K bytes */
/* heap: 8573K bytes */
/* parser: 250K cells */

/* tuple on; */
/* infix on; */
/* args on; */
/* hybrid off; */
/* cshow on; */
/* exam off; */
/* echo on; */
/* compile on; */

/* tuple;   */   /* Show parentheses at the top-level.         */
/* infix;   */   /* Show parentheses as supplied to operators. */
/* args;    */   /* Show parentheses at function invocation.   */
/* hybrid;  */   /* Many character class tokens, like ".AND.". */
/* cshow;   */   /* Show character's in C escapes or literally.*/
/* exam;    */   /* Show various kinds of debug information.   */
/* echo;    */   /* Echo aiTRAN program text.                  */
/* compile; */   /* Emit C source code from an aiTRAN program. */

echo off ;


[1]     exit ;
The above screen text was captured from an invocation of aiTRAN.

Source code translation, conversion, and migration

aiTRAN is a fully symbolic programming system developed by SymbTech.

Special application-oriented versions of aiSOFT and aiTRAN are available. They support development and maintenance efforts by programmers working with several assembler dialects, as well as Pascal, BASIC, C, PL/1, FORTRAN, COBOL, and Ada.

Also, this technology facilitates "mixing and matching" of programs so that source code translators, migrations tools, and conversion programs can be prepared quickly and inexpensively.

So, Apprentice technology makes it cheap and easy to originate, maintain, convert, migrate and translate source code materials in any of these base languages.

Various other projects that use or interface with these languages can also be significantly easier to complete.

Depending on the level of performance required and the size and scope of the project, virtual memory requirements vary from 12 MB to 250 MB. Perhaps 5 to 25% of that total needs to be actual RAM. (These figures assume the 'compile' flag is off.)

Virtual memory is depleted by three major consumers:

  • The parser stack,
  • an internal stack,
  • and a heap area.
The first two spaces perform well in a virtual memory environment. Only the heap space is sometimes troublesome.

A Programmers' Introduction

This is an annotated introductory session using the famous "Hello, world" program.

A PORTION OF THE 'capini.wrk' SCRIPT

This script contains the definition of the 'Csrc' function (see above at prompt #4) used to reconstitute source code.

Notice the reference to 'paras' in capini? It is defined in 'misc.wrk'. It is a trivial utility: it merely encloses its argument in parentheses and returns the result.

And it takes a single line!
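The text does not show the aiTRAN definition of 'paras'. A rough C analogue follows. It assumes the argument and result are plain strings, and the name paras is reused here only for illustration. In C, the whole utility amounts to allocating two extra characters and copying.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative C counterpart of 'paras': returns a newly allocated
   copy of s enclosed in parentheses, or NULL if allocation fails.
   The caller frees the result. */
char *paras(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 3);   /* '(' + text + ')' + '\0' */
    if (out == NULL) return NULL;
    out[0] = '(';
    memcpy(out + 1, s, n);
    out[n + 1] = ')';
    out[n + 2] = '\0';
    return out;
}
```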

'misc.wrk' gets loaded by the 'load' statement which begins the 'capini' script.

Notice the 'Blok' function. It accepts one of three non-parameterized constructors and returns a function. By declaration, that function is known to take one input argument, a string, and to return one output argument, a string.

See how 'Blok' is used in the 'Csrc' function.
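The C sketch below makes the shape of 'Blok' concrete. It maps a constructor to a function pointer whose type is string in, string out. The three constructor names (BLOK_BRACE, BLOK_PAREN, BLOK_BRACKET) and what each one renders are hypothetical. The text says only that there are three constructors and that they take no parameters.

```c
#include <stdlib.h>
#include <string.h>

/* Signature shared by every function Blok can return: string in, string out. */
typedef char *(*str_fn)(const char *);

/* Hypothetical stand-ins for the three non-parameterized constructors. */
enum blok_kind { BLOK_BRACE, BLOK_PAREN, BLOK_BRACKET };

/* Enclose s between open and close; the result is malloc'd, caller frees. */
static char *wrap(const char *s, char open, char close)
{
    size_t n = strlen(s);
    char *out = malloc(n + 3);
    if (out == NULL) return NULL;
    out[0] = open;
    memcpy(out + 1, s, n);
    out[n + 1] = close;
    out[n + 2] = '\0';
    return out;
}

static char *brace(const char *s)   { return wrap(s, '{', '}'); }
static char *paren(const char *s)   { return wrap(s, '(', ')'); }
static char *bracket(const char *s) { return wrap(s, '[', ']'); }

/* Map a constructor to the string-to-string function that renders it. */
str_fn Blok(enum blok_kind k)
{
    switch (k) {
    case BLOK_BRACE:   return brace;
    case BLOK_PAREN:   return paren;
    case BLOK_BRACKET: return bracket;
    }
    return NULL;
}
```

Calling Blok(BLOK_BRACE)("x = 1;") yields a newly allocated "{x = 1;}", which the caller frees.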

So Csrc is simply a means of rebuilding source code after C source has been manipulated. As you see more functions written in aiTRAN (the underpinning of Apprentice technology), their purpose and application should become obvious on inspection.

Basic Data Types

aiTRAN supports two basic data types. These are 'char' and 'num'. Everything else is built on these.

'char' will match any ASCII character and 'num' matches any number, whether (in C nomenclature) int or float.

These datatypes were originally based on the definition of a bit.

Datatypes may be produced in three ways:

  • aiTRAN supported primitives, like 'num' and 'char'.
  • composite types may be produced by the 'tuple' command.
  • derived types may be produced by the 'form' command.
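One way to picture these datatypes in C is shown below. It is purely an analogy and is not aiTRAN's internal representation. A tagged union stands for the two primitives, and a length-and-array pair stands in for a 'tuple' composite. Derived types built with 'form' are not modelled. The names prim, tuple, is_num, is_char and count_nums are hypothetical.

```c
#include <stddef.h>

/* Tags for the two primitives; 'num' covers both int and float. */
enum prim_tag { T_CHAR, T_NUM_INT, T_NUM_FLT };

struct prim {
    enum prim_tag tag;
    union {
        unsigned char c;  /* T_CHAR: one ASCII character */
        long i;           /* T_NUM_INT */
        double f;         /* T_NUM_FLT */
    } u;
};

/* Stand-in for a 'tuple' composite: a fixed-length run of primitives. */
struct tuple {
    size_t len;
    struct prim *items;
};

int is_char(const struct prim *p) { return p->tag == T_CHAR; }

/* 'num' matches any number, whether integer or floating point. */
int is_num(const struct prim *p)
{
    return p->tag == T_NUM_INT || p->tag == T_NUM_FLT;
}

/* Count how many members of a tuple would match 'num'. */
size_t count_nums(const struct tuple *t)
{
    size_t n = 0;
    for (size_t i = 0; i < t->len; i++)
        if (is_num(&t->items[i])) n++;
    return n;
}
```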

Mechanical Construction of lex and yacc materials

SymbTech sells a compiler for aiSOFT, a small language built around a textual pattern-matching run-time. The output of the aiSOFT compiler is C source code.

aiSOFT is designed for writing front-end or 'reader' programs. With a small amount of application coding, a day or so, a program can be written to read (i.e., lex and parse) any computer programming language.

Unlike all other compiler-writing tools, aiSOFT does not require its user to be a skilled compiler writer in order to build such tools.

Merely supplying aiTRAN with existing 'client' source code examples is enough to build high quality front-end tools which not only lex and parse, but also apply selected actions to 'seen' syntactic instances. aiTRAN is unique here.

With the first version of aiSOFT (1986), a first-rate Fortran-to-C translator was written in just under three weeks, in fewer than 3000 lines, on an MS-DOS machine with no extended memory features.

(It changed the industry as it quickly dominated the Fortran-to-C marketplace.)

Originally, aiSOFT and aiTRAN were hand-coded. Later, aiTRAN was used to prepare a description of these languages and their run-time environment. The present versions of these products were derived from that description.

Here is the layout of some of the important details:

Several language models have to be described:

lex, yacc, aiSOFT, and aiTRAN all had to be described. These descriptions covered the lexical and syntactic details of these products. They also described the semantics of each language, using the 'say' statement.

The next page documents some portions of the abstract syntax and operational semantics for a recent version of aiSOFT. Only highlights are shown here. The complete description of aiSOFT (written in aiTRAN) was less than 15 KB. The aiTRAN description of aiTRAN was about 45 KB.

The user should understand that this technology makes it possible to build high quality C-based software with no hand-coding.

An aiSOFT Filter to Process yacc Source

The aiSOFT source code listing which follows was used to read the Ada grammar and re-emit a replacement yacc source file. The grammar is supplied as a yacc file by the Army to anyone who asks politely.

The purpose here was to build a front end for Ada. The yacc file output by this aiSOFT program contained yacc actions not present in the original input file.

No other technology can mechanically supply actions without hand coding.

This example is interesting because it demonstrates that users do not need to be computer scientists to make good use of this technology.

That is, with the toolset provided as part of the extended aiTRAN bundle, 'compiling' aiSOFT programs is trivial. Thus programmers who may never learn the details of symbolic programming can still obtain its advantages, namely high-quality source code in C or other target languages.

And the coding time for this kind of high-level source is perhaps 1% of what it would be if the program were written in C or Ada.

And by the way, the author found this program difficult to conceptualize. Without aiSOFT acting as an intellectual accelerant, it would likely never have been written at all.

An aiTRAN based description of this aiSOFT source file is shown, beginning on the next page.

Manipulating aiSOFT Source Files

Just as C source files can be processed into an aiTRAN compatible format, the "module" command can also be used to prepare aiSOFT files for subsequent manipulation.

The source code shown in the above example is compiled here into C source.

Use the same command,


module "srcfile.snp" ;
The 'module' command can produce several different types of output, including C source.

The following script is a faithful reproduction of the preceding aiSOFT source file, converted into an aiTRAN compatible format.

Note that aiSOFT files expect to make use of the 'sapini.wrk' script package.

Build usable source code

Usually, the basic underlying objective with respect to processing aiSOFT source code is to convert the input program, now represented as a data structure, into appropriate lex, yacc and C action structures for output. First, various transformations must be performed.

Here are some simple examples:


LEN ( 4 ) ;
is really just shorthand for,

ANY ( all ) <&> ANY ( all ) <&> ANY ( all ) <&> ANY ( all ) ;
where 'all' is the set of characters that LEN(4) can match in the example above.
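The expansion of LEN(4) into four ANY(all) steps can be sketched in C as follows. The sketch assumes SNOBOL-style semantics, which the text implies but does not spell out. ANY(set) consumes exactly one character belonging to the set and fails at end of input. LEN(n) succeeds only if n such steps succeed. The charset type and the function names are hypothetical. An explicit length is passed because 'all' includes chr(0), which a NUL-terminated scan would miss.

```c
#include <stddef.h>

/* A character set as a 256-entry membership table. */
typedef struct { unsigned char in[256]; } charset;

/* ANY(set): consume one character of set at s[*pos].
   Returns 1 and advances *pos on success, 0 on failure. */
int match_any(const charset *set, const char *s, size_t len, size_t *pos)
{
    if (*pos >= len || !set->in[(unsigned char)s[*pos]]) return 0;
    (*pos)++;
    return 1;
}

/* LEN(n), written out as n concatenated ANY(all) steps.
   On failure *pos is left unchanged. */
int match_len(int n, const char *s, size_t len, size_t *pos)
{
    charset all;
    for (int c = 0; c < 256; c++) all.in[c] = 1;   /* chr(0) --- chr(255) */

    size_t p = *pos;
    for (int k = 0; k < n; k++)
        if (!match_any(&all, s, len, &p)) return 0;
    *pos = p;
    return 1;
}
```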

'all' is also defined algebraically. Remember the '---' (range) operator described in the introductory tour. 'all' is just the following:


dcl all : list ( char ) ;

say all <== chr(0) --- chr(255) ;
It's also nice to say:

dcl lower_case_letter , upper_case_letter : bnf ;

say lower_case_letter <== ANY ('a' --- 'z') ;
say upper_case_letter <== ANY ('A' --- 'Z') ;
and then, to build a 'letter' by 'say'ing something like:

dcl letter : bnf ;

say letter <== lower_case_letter <|> upper_case_letter ;
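ANY over a range and the alternation of two single-character patterns both reduce to character sets. So the 'say' definitions above can be mirrored in C with a 256-entry membership table. This sketch assumes that '---' denotes an inclusive range of character codes. It also assumes that <|> between two single-character patterns behaves like set union. The names charset, range and set_union are hypothetical.

```c
#include <string.h>

typedef struct { unsigned char in[256]; } charset;

/* lo --- hi: every character code from lo to hi inclusive. */
charset range(unsigned char lo, unsigned char hi)
{
    charset s;
    memset(&s, 0, sizeof s);
    for (int c = lo; c <= hi; c++) s.in[c] = 1;
    return s;
}

/* a <|> b for single-character patterns: a character matches
   if it is in either set. */
charset set_union(charset a, charset b)
{
    charset r;
    for (int c = 0; c < 256; c++) r.in[c] = a.in[c] | b.in[c];
    return r;
}
```

With these, letter corresponds to set_union(range('a', 'z'), range('A', 'Z')), and all corresponds to range(0, 255).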
Another interesting simplification:

dcl nbr : bnf ;

say nbr <== SPN ( '0' --- '9' ) ;
This is a useful pattern which will match an integer or any contiguous span of digits.
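In C, the standard strspn function does the same job as SPN over a digit set: it returns the length of the initial run of characters drawn from a given set. The sketch below assumes that an empty run counts as a failed match, as SPAN does in SNOBOL4. It uses that behavior to count the digit runs in a line. The names match_nbr and count_nbrs are hypothetical.

```c
#include <string.h>

static const char DIGITS[] = "0123456789";

/* nbr: length of the digit run at the start of s; 0 means no match. */
size_t match_nbr(const char *s)
{
    return strspn(s, DIGITS);
}

/* Count the maximal runs of digits anywhere in s. */
size_t count_nbrs(const char *s)
{
    size_t count = 0;
    while (*s) {
        s += strcspn(s, DIGITS);   /* skip to the next digit or the end */
        size_t n = match_nbr(s);
        if (n > 0) { count++; s += n; }
    }
    return count;
}
```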

Optimizations and conversion to C

Remember, whenever a pattern is 'interpreted', run-time efficiency is lost. So the goal is to produce tight, directly executable C code rather than code that must be threaded ('interpreted').

And, within the context of aiSOFT, the goal is to generate the tightest C obtainable, or at least, the tightest derivable source based on the supplied information. Various optimizations are necessary.

Assume the existence of a function 'cvt1' which converts a 'bnf' structure into another, intermediate 'bnf' construction.

It is this intermediate 'bnf' construction which is subsequently converted to a yacc representation by the 'cvt2' function. Finally, the 'cvt3' function converts that yacc ('yac') representation to source text.

(In this case, we are using the PCCTS compiler tool.)

Here it is:


dcl cvt1 : bnf --> bnf ;
dcl cvt2 : bnf --> yac ;
dcl cvt3 : yac --> str ;
A few of the actual intermediate simplifications follow. The last two are useful in finding and processing partial results which have been processed by other rules.

say cvt1 ( NLX <&> b ) <== cvt1 ( b ) ;
say cvt1 ( a <&> NLX ) <== cvt1 ( a ) ;
say cvt1 ( a <|> NLX ) <== NLX <|> cvt1 ( a ) ;

say cvt1 ( a <&> b ) <== cvt1 ( a ) <&> cvt1 ( b ) ;
say cvt1 ( a <|> b ) <== cvt1 ( a ) <|> cvt1 ( b ) ;
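The five cvt1 rules can be read as a recursive rewrite over a tree of <&> and <|> nodes, tried in the order listed. The C sketch below is one way to express that. It rests on two assumptions the text does not confirm: NLX is treated as an opaque null-pattern node, and leaves pass through unchanged. The node layout, the mk constructor and the show printer are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Node kinds: CAT stands for <&>, ALT for <|>. Treating NLX as a
   plain node kind is an assumption; the text does not define it. */
enum kind { LEAF, NLX, CAT, ALT };

struct bnf {
    enum kind k;
    char name;            /* LEAF only: a one-letter symbol */
    struct bnf *l, *r;    /* CAT and ALT only */
};

/* Allocate a node; exits on allocation failure to keep the sketch short. */
static struct bnf *mk(enum kind k, char name, struct bnf *l, struct bnf *r)
{
    struct bnf *n = malloc(sizeof *n);
    if (n == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    n->k = k; n->name = name; n->l = l; n->r = r;
    return n;
}

/* The cvt1 rules, tried in the order they are written; subtrees are
   shared rather than copied. */
struct bnf *cvt1(struct bnf *t)
{
    if (t->k == CAT && t->l->k == NLX) return cvt1(t->r);        /* NLX <&> b */
    if (t->k == CAT && t->r->k == NLX) return cvt1(t->l);        /* a <&> NLX */
    if (t->k == ALT && t->r->k == NLX)                           /* a <|> NLX */
        return mk(ALT, 0, t->r, cvt1(t->l));
    if (t->k == CAT) return mk(CAT, 0, cvt1(t->l), cvt1(t->r));  /* a <&> b */
    if (t->k == ALT) return mk(ALT, 0, cvt1(t->l), cvt1(t->r));  /* a <|> b */
    return t;                                                    /* leaves, NLX */
}

/* Print a tree in the document's infix notation into buf. */
static void show(const struct bnf *t, char *buf, size_t cap)
{
    char l[256], r[256];
    switch (t->k) {
    case LEAF: snprintf(buf, cap, "%c", t->name); break;
    case NLX:  snprintf(buf, cap, "NLX"); break;
    default:
        show(t->l, l, sizeof l);
        show(t->r, r, sizeof r);
        snprintf(buf, cap, "(%s %s %s)", l, t->k == CAT ? "<&>" : "<|>", r);
        break;
    }
}
```

Consider NLX <&> (a <|> NLX). The first rule drops the leading NLX. The third rule then moves the remaining NLX to the front, giving NLX <|> a.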

Practical aiSOFT

The following example was taken from a real application written in aiSOFT.

By the way, the first stage of aiSOFT is an ordinary C pre-processor. It's just convenient!

What follows is a cute little function which was originally written as part of a major Fortran conversion project. It converts Fortran Hollerith constants to strings. That's all it does.

Why is it significant?

Because doing this in C with the lex and yacc tools is a disaster.

In fact, about 25% of the lex source to a famous Fortran-77 compiler is just complications from trying to handle Holleriths and other sticky Fortran tokens.


#define   nbr  spn("1234567890")

function UnHol(s)
 local siz, tmp
  while s ? nbr.siz & ('H'|'h') do
        s ? nbr     & ('H'|'h') & *len(*siz).tmp <- tix2(tmp)
 return s
end

/* tix2 is actually a macro, but if it were not,
   this would be its definition */


#define tic2 '"'


function tix2(s)
 return tic2 || s || tic2
end
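For comparison, here is roughly what UnHol does, written directly in C without lex or yacc. It is a sketch, not the code from the conversion project, and it differs from the aiSOFT version in two ways. First, it makes one left-to-right pass and does not rescan the quoted text it produces; the aiSOFT loop rescans the whole line after each replacement. Second, as an added assumption, it ignores digit runs that continue an identifier, such as X2H. Like tix2, it puts a double quote on each side. The name unhol is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Rewrite each Hollerith constant nHccc... as "ccc...".
   Returns a malloc'd string (caller frees) or NULL on allocation failure. */
char *unhol(const char *s)
{
    size_t len = strlen(s);
    /* A Hollerith of n characters takes at least n + 2 input characters
       and produces exactly n + 2, so the output never outgrows the input. */
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    size_t i = 0, o = 0;
    while (i < len) {
        int starts_token = i == 0 ||
            !(isalnum((unsigned char)s[i - 1]) || s[i - 1] == '_');
        if (starts_token && isdigit((unsigned char)s[i])) {
            size_t j = i, n = 0;
            for (; j < len && isdigit((unsigned char)s[j]); j++)
                if (n <= len) n = n * 10 + (size_t)(s[j] - '0');  /* no overflow */
            if (j < len && (s[j] == 'H' || s[j] == 'h') &&
                n > 0 && n <= len - j - 1) {
                out[o++] = '"';
                memcpy(out + o, s + j + 1, n);   /* the n Hollerith characters */
                o += n;
                out[o++] = '"';
                i = j + 1 + n;
                continue;
            }
            memcpy(out + o, s + i, j - i);       /* ordinary number: copy as-is */
            o += j - i;
            i = j;
            continue;
        }
        out[o++] = s[i++];
    }
    out[o] = '\0';
    return out;
}
```

For example, unhol("PRINT 10, 5HHELLO") returns PRINT 10, "HELLO". The caller frees the result.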
UnHol is used in the following code:

function module (fileid,eid)
 local i, j, text, form

   MSGO := 'Fortran Module: ' || eid || crlf_

   kount := 0
   count := 0
   lastc := 0

   p_cmt := lm_ & '      ' & ('c'|'C') & rem.cmtext
   p_lbl := lm_ & len(6).numb & rem.ftntxt
   p_fmt := lm_ & 'FORMAT' & wx_ & '(' & arb.fmtext & ')' & wx_ & rm_

   while line := DSNI do { count +:= 1

      if line ? p_cmt then { tabl('CMT',lastc,temp); lastc := count }
                      else { line ? p_lbl
                             numb := UnZro(numb)
                             temp := upper_(UnBln(
                                       UnStr(UnHol(temp))))
/* split up line if ';' are found */
         while text := Semi() do { if text === '' then exit
                                   last := text
            if text ? p_fmt then tabl('FMT',numb,format_(fmtext))
            else { if text ? pgmdcls then { info := table()
                                            tabl('STM',count,text) }
               else { loops()
                      labgen()
                      tabl('STM',count,text)
                      endloop(numb) }}}}}
end
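The p_lbl and p_fmt patterns can be approximated in plain C as shown below. The sketch makes three assumptions. The label field is the first six columns, as len(6) suggests. Blanks are the only whitespace wx_ must skip. The FORMAT text ends at the last closing parenthesis on the line, which is where arb must stop for ')' & wx_ & rm_ to match. The names split_label and match_format are hypothetical.

```c
#include <string.h>

/* p_lbl: columns 1-6 form the label field; the rest is statement text.
   A line shorter than six columns is padded with blanks in label. */
void split_label(const char *line, char label[7], const char **text)
{
    size_t n = strlen(line);
    size_t k = n < 6 ? n : 6;
    memcpy(label, line, k);
    memset(label + k, ' ', 6 - k);
    label[6] = '\0';
    *text = line + k;
}

/* p_fmt: "FORMAT", optional blanks, '(' fmtext ')', optional blanks,
   end of text. On a match, copies fmtext into fmt (truncated to
   cap - 1 characters; cap must be at least 1) and returns 1. */
int match_format(const char *text, char *fmt, size_t cap)
{
    if (strncmp(text, "FORMAT", 6) != 0) return 0;
    const char *p = text + 6;
    while (*p == ' ') p++;
    if (*p != '(') return 0;

    const char *open = p + 1;                      /* first char of fmtext */
    const char *end = text + strlen(text);
    while (end > open && end[-1] == ' ') end--;    /* drop trailing blanks */
    if (end == open || end[-1] != ')') return 0;

    size_t n = (size_t)(end - 1 - open);           /* length of fmtext */
    if (n >= cap) n = cap - 1;
    memcpy(fmt, open, n);
    fmt[n] = '\0';
    return 1;
}
```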
The C source code to a Fortran front-end is here.

Information

Jules Gilbert of SymbTech
Phone:(508) 695-3783

All materials are Copyright 1988 - 1992

'Apprentice technology' is proprietary to SymbTech, and is closely held as an unpublished trade secret. Certain information, not disclosed herein, is made available exclusively to licensees.

 
