SMARTsort: The Choice for Handling Workstation Data

by Tony Gilbert

SMARTsort is a fast and flexible data manipulation tool for workstations. You can use SMARTsort to Sort, Merge, Copy, and Check data. You can call SMARTsort either from your application or from the command line, and use it to process data that is either byte-oriented or record-oriented.

Note: Volume 10 of The Developer Connection for OS/2 CD-ROM contains a Beta version of SMARTsort for OS/2. For complete information about SMARTsort, see the online SMARTsort Guide and Reference for Workstations, which is installed on your workstation when you install SMARTsort for OS/2. The guide contains additional examples of command line syntax, API and user-exit coding, error message text, and explanations.

SMARTsort is not related to the Source Migration Analysis Reporting Toolset (SMART) product from One Up Corporation.

What SMARTsort Can Do
The SMARTsort command line syntax was developed to conform to the X/Open Portability Guide Issue 4 (XPG4) specifications for commands and utilities. SMARTsort is a superset of the XPG4 SORT command, with additional functions and features not defined in the XPG4 specifications.

With SMARTsort, you can:
 * Sort Data - Sort records in either ascending or descending order. You can either direct the sorted output to a separate file, or specify the same file name for both input and output, which replaces the contents of the input file with sorted data. If you do not specify a key-field, SMARTsort treats the entire record as the key-field.
 * Merge Data - You can merge two or more previously sorted files to form a single file of sorted records. All input files must be sorted and in the same order (ascending or descending); otherwise, the output file will not be sorted.
 * Copy Data - You can copy one or more files into a new file. No sorting or merging takes place, and the input records don't need to be in a particular sequence.
 * Check Data - The Check function verifies that the input files are sorted in either ascending or descending sequence.
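The Merge precondition and the Check function can be illustrated with a minimal C sketch. This is not SMARTsort's implementation; it works on in-memory integer keys rather than files, and the names merge_sorted and is_sorted_ascending are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a[0..n-1] is in ascending order, 0 otherwise.
   This is the kind of test a Check operation performs. */
int is_sorted_ascending(const int *a, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (a[i - 1] > a[i])
            return 0;
    }
    return 1;
}

/* Merges two ascending arrays into out, which must hold na + nb items.
   If either input is not ascending, the output is not guaranteed to be
   sorted, which mirrors the Merge restriction described above. */
void merge_sorted(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    assert(out != NULL);
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
}
```

Merging {1, 4, 9} with {2, 3, 10} yields a sorted result, while merging the unsorted {9, 1, 4} with {2, 3, 10} yields {2, 3, 9, 1, 4, 10}, which fails the check.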

Using various options of the Sort, Merge, and Copy commands, you can also:
 * Restructure Data - Rearrange input record fields to different locations in the output records. You can also add data constants to all the output records.
 * Filter Data - Exclude or include input records based upon specified criteria.
 * Select Record Positions - You can specify where to begin and end processing of input files. The SKIPREC ExtendedOption indicates the number of input records to skip before processing begins. The STOPAFT ExtendedOption specifies the total number of input records that SMARTsort is to process.

Cultural Processing and National Language Support
SMARTsort also provides culture-sensitive national language support (NLS) through the use of environment variables and Locale facilities. A Locale identifies how character data of a given language is interpreted, the code page to use, and the collation rules for sorting and merging. SMARTsort uses a default Locale for collation rules unless you specify a Locale with the -x locale ExtendedOption.
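For readers unfamiliar with locale-driven collation, the following C sketch shows the same idea using only the standard C library (setlocale, strcoll, and qsort). It does not call SMARTsort, and the function name sort_lines_in_locale is an illustrative assumption.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: orders strings by the current LC_COLLATE rules. */
static int compare_collated(const void *lhs, const void *rhs)
{
    const char *a = *(const char *const *)lhs;
    const char *b = *(const char *const *)rhs;
    return strcoll(a, b);
}

/* Sorts an array of strings under the named locale's collation rules.
   Returns 0 on success, or -1 if the locale is not available.
   Note that this changes the process-wide LC_COLLATE setting. */
int sort_lines_in_locale(const char **lines, size_t count,
                         const char *locale_name)
{
    assert(lines != NULL && locale_name != NULL);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(lines, count, sizeof lines[0], compare_collated);
    return 0;
}
```

With the "C" locale the result is plain byte order. Other locale names are platform-specific, so the order they produce depends on what the system provides.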

Using SMARTsort from the Command Line
SMARTsort lets you create powerful sets of commands and save them for reuse. By placing SMARTsort commands in a parameter file, you simply reference the parameter file name instead of manually entering long command-line sequences.

Using Parameter Files
Specify a parameter file name with the -p PrimaryOption. Parameter files reduce both the number of keystrokes needed to execute a complex SMARTsort command and the chance of keying errors. For example, with the parameter files shown in Figure 1, the command:

    smrtsort -p main.tpl

is equivalent to:

    smrtsort -x 'format int int char 2 double' -k 1,1r -k 2,2 -x 'select 3,3 eq " #" and 1,1 eq 2,2'

FILE                CONTENTS

main.tpl:           # This is my main parameter file
                    #
                    -p myformat.tpl      # format
                    -p mykeys.tpl        # key-specs
                    -p myfilter.tpl      # incl/omit
                    #
                    # End of main parameter

myformat.tpl:       -x 'format
                    int                  # first field
                    int                  # second field
                    char 2               # third field
                    double'              # fourth field

mykeys.tpl:         -k 1,1r              # descending first field
                    -k 2,2               # ascending second field

myfilter.tpl:       # notice the use of
                    -x 'select           # comment...
                    3,3 eq " \#"         # comment...
                    and 1,1 eq 2,2'

''Figure 1. SMARTsort parameter file example''
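Figure 1 suggests that a # starts a comment in a parameter file and that \# stands for a literal # character. Assuming that reading is correct (the article does not spell out the rule), a minimal C sketch of comment stripping for one line might look like the following. The function name strip_comment is hypothetical and is not part of SMARTsort.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Truncates line at the first '#' that is not preceded by a backslash,
   then trims trailing whitespace. Modifies line in place and returns it.
   Assumption: "\#" is an escaped, literal '#', as Figure 1 suggests. */
char *strip_comment(char *line)
{
    size_t i, len;
    assert(line != NULL);
    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] == '#' && (i == 0 || line[i - 1] != '\\')) {
            line[i] = '\0';
            break;
        }
    }
    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';
    return line;
}
```

Applied to the mykeys.tpl line "-k 2,2   # ascending second field", the sketch leaves "-k 2,2". Applied to the myfilter.tpl line containing " \#", it keeps the escaped character and drops only the trailing comment.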

Defining Keys and Field Ranges
A key is usually a portion of an input line or record. You identify the key to SMARTsort using -k, optionally followed by a field number, column number, and bit number. If you don't specify a field, column, and bit number, SMARTsort assumes the entire record or line is the key. A key can be a single field or a field range. A simplified syntax diagram for specifying the start and end of a key field looks like this:

    -k [Fstart][.Cstart][.Bstart][Modifier][,[Fend][.Cend][.Bend][Modifier]]

Where:
 * -k means the following field or field range specification is a key.
 * Fstart is the starting field number.
 * Cstart is the column number within the starting field.
 * Bstart is the bit position within the starting field.
 * Modifier is a modifier (b, d, e, f, i, n, r) associated with the starting field.
 * Fend is the ending field number within a field range.
 * Cend is the column number within the ending field.
 * Bend is the bit position within the ending field.
 * Modifier is a modifier (b, d, e, f, i, n, r) associated with the ending field.

Keys are defined with the -k PrimaryOption, for example:

    -k 3.2

Here, 3.2 means the key starts with the third (3) field and the second (.2) column. The key is assumed to extend to the end of the record or line. For example:

    Before Sorting        After Sorting
    KEN JONES 99953       DAVID YU 99934
    DAVID YU 99934        HARRY GEORGE 99943
    MARY ANNE 99962       KEN JONES 99953
    STAN MANN 99987       MARY ANNE 99962
    HARRY GEORGE 99943    STAN MANN 99987

The following example uses modifiers:

    -k 3.2b,3r

Here, 2b means the key starts with the second non-blank column. The ,3r indicates that the third field is also the ending field and that the output is sorted in reverse (descending) order. For example:

    Before Sorting        After Sorting
    KEN JONES 99953       STAN MANN 99987
    DAVID YU 99934        MARY ANNE 99962
    MARY ANNE 99962       KEN JONES 99953
    STAN MANN 99987       HARRY GEORGE 99943
    HARRY GEORGE 99943    DAVID YU 99934
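To show what a key such as -k 3.2 selects, here is a minimal C sketch that locates the key inside a blank-separated line and sorts the sample names with it. It is a simplification rather than SMARTsort's algorithm: it assumes blank or tab separators, compares in plain byte order, and ignores modifiers and bit positions. The names key_start and sort_by_k_3_2 are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a pointer to column `col` (1-based) of field `field` (1-based)
   in a blank-separated line, or to the terminating '\0' if the line is
   too short. */
const char *key_start(const char *line, int field, int col)
{
    const char *p = line;
    int f;
    assert(line != NULL && field >= 1 && col >= 1);
    p += strspn(p, " \t");              /* skip leading separators */
    for (f = 1; f < field && *p != '\0'; f++) {
        p += strcspn(p, " \t");         /* skip the current field */
        p += strspn(p, " \t");          /* skip the separators after it */
    }
    while (--col > 0 && *p != '\0')     /* advance to the requested column */
        p++;
    return p;
}

/* qsort comparator corresponding to "-k 3.2": compare from field 3,
   column 2 to the end of the line, ascending. */
static int compare_k_3_2(const void *lhs, const void *rhs)
{
    const char *a = *(const char *const *)lhs;
    const char *b = *(const char *const *)rhs;
    return strcmp(key_start(a, 3, 2), key_start(b, 3, 2));
}

/* Sorts an array of lines in place using the key above. */
void sort_by_k_3_2(const char **lines, size_t count)
{
    qsort(lines, count, sizeof lines[0], compare_k_3_2);
}
```

For "KEN JONES 99953" the key is "9953". Sorting the five sample lines by this key reproduces the first After Sorting column, and reversing the comparison gives the order of the descending example.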

Defining the Structure of Your Input Data
If your data consists of only text data, SMARTsort expects a record-separator character between each line or record of line-sequential files. Text-only files also must have a field separator character between fields within the individual records or lines. A field separator is a blank or tab character by default. If your data has a different field separator character, use the -t PrimaryOption to define it to SMARTsort.

For virtual storage access method (VSAM) files, the records must be fixed or variable length. For text-only files, there is no need to explicitly define the structure of your input data to SMARTsort.

However, for binary files you must have an explicit definition for the entire input record. Field separators are not used by SMARTsort with binary files. You must use the SMARTsort format ExtendedOption to define the format of the input lines or records. For example, to define a binary file with fixed-length records and the following structure:
 * A signed integer field (int),
 * A 10-character field (char 10), and
 * A double precision floating point field (double)

You would enter the following:

    smrtsort -x 'format int char 10 double'
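From the C side, a record described by 'format int char 10 double' could be pictured as in the sketch below. It assumes the three fields are stored back to back with no padding and in the machine's native int and double representations. The article does not state the on-disk layout, so treat this as an assumption. The names DemoRecord, DEMO_RECORD_BYTES, and decode_record, along with the member names, are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* In-memory view of one record: int, char 10, double. */
typedef struct {
    int    id;          /* int field (hypothetical name) */
    char   name[10];    /* char 10 field, not NUL-terminated */
    double amount;      /* double field (hypothetical name) */
} DemoRecord;

/* Bytes per record, assuming the fields are stored back to back. */
#define DEMO_RECORD_BYTES (sizeof(int) + 10 + sizeof(double))

/* Copies the three fields out of a packed record image. memcpy is used
   because the double is not naturally aligned in the packed layout. */
void decode_record(const unsigned char *raw, DemoRecord *out)
{
    assert(raw != NULL && out != NULL);
    memcpy(&out->id, raw, sizeof out->id);
    memcpy(out->name, raw + sizeof(int), sizeof out->name);
    memcpy(&out->amount, raw + sizeof(int) + 10, sizeof out->amount);
}
```

Because field separators are not used with binary data, the field offsets come entirely from the declared types, which is why the format definition has to cover the whole record.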

Filtering Data
SMARTsort has two synonymous ExtendedOption operators, filter and select, that let you include or exclude data using relational operators as well as the logical and, or, and not operators. You can use these operators to create a powerful and concise SMARTsort command to selectively process your data. In this example, only input records containing identical data in the sixth and eighth fields are eligible to be placed in the output:

    smrtsort -x 'filter (6,6 eq 8,8)'

Where:
 * -x is the ExtendedOption identifier.
 * 'filter...' is the filtering operator (select could have been used in place of filter).
 * 6,6 identifies field-6 as the left operand of the EQ operator.
 * eq is the equal comparison operator.
 * 8,8 identifies field-8 as the right operand of the EQ operator.
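The effect of 'filter (6,6 eq 8,8)' on text records can be sketched in C as a predicate that decides whether a line is kept. The sketch assumes blank or tab field separators and a byte-for-byte comparison. How SMARTsort treats records with fewer than eight fields is not covered in this article, so rejecting them here is an arbitrary choice. The names find_field and keep_record are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locates field `n` (1-based) in a blank- or tab-separated line.
   Stores its length in *len and returns its start, or NULL if the
   line has fewer than n fields. */
static const char *find_field(const char *line, int n, size_t *len)
{
    const char *p = line + strspn(line, " \t");
    while (*p != '\0') {
        size_t width = strcspn(p, " \t");
        if (--n == 0) {
            *len = width;
            return p;
        }
        p += width;
        p += strspn(p, " \t");
    }
    return NULL;
}

/* Returns 1 if field 6 and field 8 hold identical data, else 0. */
int keep_record(const char *line)
{
    size_t len6 = 0, len8 = 0;
    const char *f6, *f8;
    assert(line != NULL);
    f6 = find_field(line, 6, &len6);
    f8 = find_field(line, 8, &len8);
    if (f6 == NULL || f8 == NULL)
        return 0;
    return len6 == len8 && memcmp(f6, f8, len6) == 0;
}
```

A line such as "a b c d e SAME g SAME" is kept, while "a b c d e ONE g TWO" is dropped.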

Defining the Structure of Your Output Data
Defining how your output data is to be structured is another powerful feature of SMARTsort. By default, SMARTsort uses the input structure to format the output structure.

If you want to change the content and size of your output data structure from a Sort, Merge, or Copy operation, you must use the reformat ExtendedOption. Using input field-range references, you simply specify the order in which the input fields are to be placed in the output. For example, you can:
 * Delete a field or field range.
 * Reorder the position of a field or field range.
 * Insert or add constants (blanks, zeros, special characters) to your output data.

For more details about using the reformat ExtendedOption, see the SMARTsort Guide and Reference for Workstations.

Calling SMARTsort from Your Program
An application programming interface (API) allows programs written in C, C++, COBOL, FORTRAN, or PL/1 to dynamically invoke the SMARTsort functions.

This article concentrates on using SMARTsort from a C-language application in OS/2. Refer to the SMARTsort Guide and Reference for Workstations for details about using other languages and operating systems.

Support for user-written input/output Exit routines, for both Sort and Merge operations, provides another level of control for sophisticated users to interact with SMARTsort at the record level. Exit routines are specified as parameters on the program call to SMARTsort. The actual routines can be written in any language that supports a "call interface," as shown in the examples in the following sections.

Application Program Interface
When SMARTsort is called from a program, it does not use stdin (standard input) as the default input file, nor is stdout (standard output) used as the default output file. Although standard input and standard output can still be used, a program calling SMARTsort must explicitly code the stdin and/or stdout ExtendedOption parameters.

By default, when SMARTsort is invoked from a program, no input origin is assumed. Input can come from one or more sources:
 * One or more user in-memory source buffers.
 * One or more input files as specified in the flags parameter.
 * A user input exit, which must be specified when calling SMARTsort.

If no input source is specified, SMARTsort terminates with an error.

As with input, SMARTsort also makes no assumptions about the destination of output. Output must be directed at only one of the following destinations:
 * One or more user in-memory target buffers.
 * One output file specified with the -o PrimaryOption when calling SMARTsort.
 * To stdout, which requires coding the stdout ExtendedOption.
 * A user-written output exit, which can be specified with any one of the first three destinations.

For the Sort, Merge, and Copy functions, if no output destination is specified, or more than one output destination is specified (with the exception of the user output exit), SMARTsort terminates with an error. Because no output is generated with the Check function, no output destination is needed with it.

The SMARTsort interface
The following structure is used to pass buffers back and forth between SMARTsort and user-written programs:

    typedef struct SMARTSORT_BUFFER {
        char* buffer;       /* pointer to buffer */
        long  buffer_size;  /* size of buffer */
        long  nbytes_used;  /* number of bytes used */
    } SMARTSORT_BUFFER;

''Figure 2. SMARTsort_BUFFER structure''

The following example shows SMARTsort being called from a C-language program:

    rc = SMARTsort(cmd, source, target, in_exit, out_exit, altseq, io_error);

''Figure 3. SMARTsort program call interface''

Where:
 * rc is a SMARTsort return code.
 * cmd is an optional SMARTsort command line.
 * source and target are optional pointers to arrays of SMARTSORT_BUFFER structures, which in turn point to source and target data, respectively.
 * in_exit and out_exit are optional pointers to user-written exit routines for input and output processing, respectively.
 * altseq is an optional pointer to an array of 256 integer collation weights for each value of a byte (0 to 255).
 * io_error is an optional pointer to a user-written I/O error handler.

If you do not use an optional parameter, set it to NULL. Refer to the SMARTsort Guide and Reference for Workstations for details about using each of these parameters.

Calling SMARTsort from a C Program
You can call SMARTsort from a C or C++ program. The header file, SMRTSORT.H, contains the SMARTsort function prototype, the SMARTSORT_BUFFER structure definition, and the user-exit return code definitions.

OS/2 Considerations
To call SMARTsort, your program must have a sufficiently large stack segment (16K or larger).

Also pay attention to the linkage convention used for function calling (the way parameters are passed to a function). SMARTsort resides in a dynamic link library (.DLL) that is called by the system loader, so you must select system-type linkage when using compilers that offer more than one type of linkage. For example, if you are using IBM C Set++ or IBM VisualAge C++, include the linkage keyword _System on the SMARTsort function prototype to establish system-type linkage. Check the documentation for your compiler to see if any other special linkage considerations exist.

Figure 6 demonstrates one technique for calling SMARTsort from an OS/2 application. Do not use this technique if your program has any special error recovery requirements: if for some reason the system cannot find the SMARTsort .DLL, a system error message is issued and your program is terminated.

    #include "smrtsort.h"

    int iSortRC;                     /* return code from sort call */
    char* szSortFlags = "-s sort -x 'format rlength 2 int float char 20' -k 3,3";
    SMARTSORT_BUFFER ssbSource[2];   /* 1 source buf + NULL entry */
    SMARTSORT_BUFFER ssbTarget[2];   /* 1 target buf + NULL entry */

    iSortRC = SMARTsort(szSortFlags, ssbSource, ssbTarget, NULL, NULL, NULL, NULL);

''Figure 6. Typical C-language program call for OS/2''
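Figure 6 declares the source and target buffer arrays but does not show how they are filled in. The following C sketch shows one plausible way to do so. It uses a stand-in type, DemoSortBuffer, with the same members as Figure 2 so that it compiles without SMRTSORT.H. The comments in Figure 6 mention a "NULL entry" at the end of each array; representing that entry as an element whose buffer pointer is NULL is an assumption, and the function name describe_buffer is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with the same members as Figure 2. A real program would use
   SMARTSORT_BUFFER from SMRTSORT.H instead. */
typedef struct {
    char *buffer;       /* pointer to buffer */
    long  buffer_size;  /* size of buffer */
    long  nbytes_used;  /* number of bytes used */
} DemoSortBuffer;

/* Describes one data area in list[0] and marks list[1] as the end of
   the array. Using a NULL buffer as the end marker is an assumption. */
void describe_buffer(DemoSortBuffer list[2], char *data, long size, long used)
{
    assert(list != NULL && data != NULL && used <= size);
    list[0].buffer = data;
    list[0].buffer_size = size;
    list[0].nbytes_used = used;
    memset(&list[1], 0, sizeof list[1]);
    list[1].buffer = NULL;
}
```

A caller would set up the source array with the number of bytes of input already in the buffer, and the target array with nbytes_used set to zero and buffer_size set to the capacity available for output.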

There are two different ways to link your calling program with the SMARTsort DLL. One way is to use the SMARTsort import library, SMRTSORT.LIB, as shown below:

    icc /ST:16384 smrtcllr.obj, smrtcllr.exe, smrtcllr.map, smrtsort.lib

''Figure 7. Linking calling program with SMRTSORT.LIB for OS/2''

The other way is to use a .DEF file to specify the SMARTsort function as an imported function, as displayed in Figure 8:

    SMRTCLLR.DEF:
        IMPORTS smrtsort.SMARTsort
        STACKSIZE 16384

    linker invocation:
        icc smrtcllr.obj, smrtcllr.exe, smrtcllr.map,, smrtcllr.def

''Figure 8. Linking calling program with a .DEF file for OS/2''

Why Use SMARTsort?
Because it offers very useful data manipulation functions and features found in high-priced workstation products and it is very fast! Independent tests have shown SMARTsort to be at least 10 times faster than the default OS/2 and AIX SORT commands. So try it... you'll like it!