Solving the Mysterious Software Failure

by Curt Finch

Software on the average AIX workstation comes from a variety of places, and it all behaves in different ways. It often has poor documentation or poor error messages, and source code is not available. When something goes wrong, it can take a developer or system administrator hours or days to discover that the problem was in a cryptic configuration file buried deep on the hard disk, or that the misbehaving program was trying to communicate with another machine on the network and that machine had been moved. SCTrace, the AIX system call tracer, addresses these problems.

The SCTrace utility logs all system calls, and optionally all function calls, invoked by a given process as it executes. It records all parameters sent to the system calls (or functions) invoked by that process and all return values. SCTrace also gathers the data referenced by pointer-type parameters to system calls. Additionally, SCTrace logs any signals received by the traced process, as well as system calls invoked from within the corresponding signal handlers.

The prpt utility reads the output from SCTrace and generates a report. The prpt utility understands the parameter formats for all the common system calls and is able to format its output accordingly. The exact command-line options and parameter formats for the sctrace and prpt commands are explained in detail in the man page (online documentation) that is shipped with the SCTrace package. This article explores the various options of SCTrace and demonstrates its usefulness in day-to-day programming and systems administration activities.

Specifying Command-Line Arguments

To start a new command and trace its execution, you should specify the command to execute, and all of its arguments, on the SCTrace command line. For example, to trace an ls command, type:

% sctrace ls

Any command-line arguments may be specified after the command name. To pass the -l argument to the ls command, type:

% sctrace ls -l

Note that in this case, the -l argument is sent to the ls command and is not interpreted as an argument to SCTrace.

Any arguments intended for SCTrace should be specified before the name of the command to execute, for example:

% sctrace -o output ls -l

This command traces ls -l and writes the trace data to a file named output, which must be viewed with the prpt command.
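The ordering rule above (tracer options first, then the command and its own arguments) can be illustrated with a short Python sketch. This is not SCTrace's actual parser. The function name split_tracer_args is hypothetical, and the assumption that only -o and -a take a value is inferred from the examples in this article.

```python
# Hypothetical illustration of the argument-ordering rule; this is not
# SCTrace's real parser. Assumption: -o and -a each take one value.
OPTIONS_WITH_VALUE = {"-o", "-a"}

def split_tracer_args(argv):
    """Split argv into (tracer options, traced command and its arguments)."""
    i = 0
    while i < len(argv) and argv[i].startswith("-"):
        # Skip the option itself, plus its value if it takes one.
        i += 2 if argv[i] in OPTIONS_WITH_VALUE else 1
    return argv[:i], argv[i:]

tracer_opts, command = split_tracer_args(["-o", "output", "ls", "-l"])
print(tracer_opts)  # ['-o', 'output']
print(command)      # ['ls', '-l']
```

Everything after the first word that is not an option is left untouched, which is why the -l in this example reaches ls rather than the tracer.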

Usage Samples

Now that you are familiar with the command structure, here are some examples of using SCTrace.

Example 1

# sctrace -p ping servhost

The -p option adds timestamps to SCTrace output. In this example, the timestamps can show you why ping is taking so long to resolve hostnames.

Example 2

# sctrace -a 12345 -o /tmp/sct.out

The -a option attaches to an existing process, in this case the one with PID 12345. The -o option is required when attaching and specifies an output file for the SCTrace data. Once SCTrace has finished, this output file can be viewed with the prpt command. You can detach by pressing Ctrl-C.

Example 3

You can also trace function calls with SCTrace. The output can be voluminous, but text indentations give you information about the calling hierarchy in the traced program.

# sctrace -f ls

This command generates the following output:

SCTrace 2.0b6 (c)1995 The Kernel Group, Inc
Licensed to Developer Connection CDROM
Command sctrace -f ls
Date Thu Jul 11 04:07:35 1996

main(0x1, 0x2ff21620)
  setlocale(0xffffffff, 0x2000078c)
    getenv(0xf01beb68) -> 0
    getenv(0xf01beb74) -> 804395693 (0x2ff21aad)
    getenv(0xf01bebe8) -> 804398517 (0x2ff225b5)
    load_all_locales(0xf01beb70, 0x2000078c, 0x2ff225b5)
      locale_name(0, 0x2000078c, 0xf01beb70)
        getenv(0xf01bec08) -> 0
        getenv(0xf01bec4c) -> 804395693 (0x2ff21aad)
< much data deleted >

In this example, the routine main() calls setlocale(), which calls getenv() three times and then calls load_all_locales(), which calls locale_name(), which calls getenv() twice. These locale routines (setlocale(), load_all_locales(), and locale_name()) are AIX library routines that deal with the natural language (for example, English, French, or German) used by the machine. The getenv() routine reads environment variables to determine that language. If getenv() returns 0 (as in the first getenv() call in this example), the requested variable was not found.
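The way indentation encodes the calling hierarchy can be sketched in Python. The snippet below rebuilds caller and callee pairs from an indented trace by keeping a stack of the calls that are currently open. It assumes two spaces of indentation per nesting level, which may not match the real prpt layout, and call_pairs is a hypothetical helper name.

```python
# Sketch: recover caller/callee pairs from an indented call trace.
# Assumption: each nesting level is indented by two spaces; the real
# prpt layout may differ.
TRACE = """\
main(0x1, 0x2ff21620)
  setlocale(0xffffffff, 0x2000078c)
    getenv(0xf01beb68) -> 0
    getenv(0xf01beb74) -> 804395693 (0x2ff21aad)
    load_all_locales(0xf01beb70, 0x2000078c, 0x2ff225b5)
      locale_name(0, 0x2000078c, 0xf01beb70)
        getenv(0xf01bec08) -> 0
"""

def call_pairs(trace, indent=2):
    """Return (caller, callee) name pairs in the order the calls appear."""
    stack = []   # names of the calls currently open, outermost first
    pairs = []
    for line in trace.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        name = line.strip().split("(", 1)[0]
        del stack[depth:]          # drop calls that have already returned
        if stack:
            pairs.append((stack[-1], name))
        stack.append(name)
    return pairs

for caller, callee in call_pairs(TRACE):
    print(f"{caller} -> {callee}")
```

A stack is enough here because a line at a given depth can only have been called by the most recent line one level shallower.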

Sample Output

The following is sample output from the sctrace -s ls command in a directory with only one file:

SCTrace 2.0b6 (c)1995 The Kernel Group, Inc
Licensed to Developer Connection CDROM
Command sctrace -s ls
Date Fri Jun 28 15:43:34 1996

getuidx(ID_SAVED) -> 229 (0xe5)
getuidx(ID_REAL) -> 229 (0xe5)
getuidx(ID_EFFECTIVE) -> 229 (0xe5)
getgidx(ID_SAVED) -> 0
getgidx(ID_REAL) -> 0
getgidx(ID_EFFECTIVE) -> 0
_load( /usr/lib/nls/loc/En_US , 0, /lib /usr/lib ) -> 0
getuidx(ID_REAL) -> 229 (0xe5)
kioctl(1, TXISATTY, 0x00000000, 0) -> 0
kioctl(1, TIOCGWINSZ, 0x2ff21378, 0) -> 0
sbrk(0x10000) -> 536874328 (0x20000d58)
brk(0x20000d60) -> 0
sbrk(0x10000) -> 536874336 (0x20000d60)
sbrk(0x40000) -> 536939872 (0x20010d60)
statx( , 0x2ff20228, 76, 0x1) -> 0
statx( , 0x2ff21148, 76, 0) -> 0
open( , O_RDONLY, 0) -> 3
fstatfs(3, 0x2ff211c0) -> 0
getdirent(3, 0x2004cdb8, 4096) -> 48 (0x30)
lseek(3, 0, SEEK_SET) -> 0
kfcntl(3, F_GETFD, 0) -> 0
kfcntl(3, F_SETFD, 1) -> 0
getdirent(3, 0x2004cdb8, 4096) -> 48 (0x30)
getdirent(3, 0x2004cdb8, 4096) -> 0
close(3) -> 0
kioctl(1, TXISATTY, 0x00000000, 0) -> 0
kwritev(1, 0x2ff210e8, 1, 0) -> 2
kfcntl(1, F_GETFL, -266421032) -> 2
close(1) -> 0
kfcntl(0, F_GETFL, 0) -> 2
kfcntl(2, F_GETFL, -266421064) -> 2
_exit(0)
--EOF--

You can see the ls command opening the current directory on file descriptor 3 and reading its contents by way of getdirent.
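When a trace is long, it can help to filter it by file descriptor instead of reading it line by line. The following Python sketch parses lines shaped like the sample output above and lists the calls made on a given descriptor. The regular expression is an assumption fitted to these few lines, not a full parser for prpt output, and the helper names parse and calls_on_fd are hypothetical.

```python
import re

# Sketch: pull the call name, raw argument text, and return value out of
# lines shaped like "name(args) -> value". The pattern is an assumption
# fitted to the sample lines, not a complete grammar for prpt output.
LINE = re.compile(r"^(\w+)\(([^)]*)\)(?:\s*->\s*(-?\d+))?")

SAMPLE = """\
open( , O_RDONLY, 0) -> 3
fstatfs(3, 0x2ff211c0) -> 0
getdirent(3, 0x2004cdb8, 4096) -> 48 (0x30)
getdirent(3, 0x2004cdb8, 4096) -> 0
close(3) -> 0
_exit(0)
"""

def parse(trace):
    """Return a list of (name, [args], return value or None)."""
    calls = []
    for line in trace.splitlines():
        m = LINE.match(line.strip())
        if m:
            name, args, ret = m.groups()
            arg_list = [a.strip() for a in args.split(",")] if args else []
            calls.append((name, arg_list, int(ret) if ret is not None else None))
    return calls

def calls_on_fd(calls, fd):
    """Names of calls whose first argument is the given file descriptor."""
    return [name for name, args, _ in calls if args and args[0] == str(fd)]

calls = parse(SAMPLE)
print([name for name, _, ret in calls if ret == 3])  # call that returned 3
print(calls_on_fd(calls, 3))                         # calls that then used it
```

Matching on the first argument is a simplification: it works for calls such as getdirent and close, which take the descriptor first, but it would also match any unrelated call whose first argument happens to be the same number.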

Problem-Solving Examples

Although the previous examples were very basic, SCTrace for AIX can solve complex problems, such as the following.

Problem: A function in your program is being passed bogus data, causing some data corruption. Later, the corruption causes a core dump. With dbx, you can see where your program crashes, but you don't know where or when the corruption started. The function in question is called from 14 places. You don't know which place passed the invalid parameter, and inserting (and then removing) 14 printf statements just doesn't sound fun.

Solution: The -f option to sctrace records every function call made inside your program. With this information, you can determine not only which function was called with which arguments, but also which function made the call. You run:

% sctrace -f -o /tmp/trace program

% prpt /tmp/trace | more

After a bit of searching, you find where your function is first called with an invalid parameter. You then look up a couple of lines, and you know which function did the calling. You can also find out which function called that function, and so on all the way back to main(), all with full arguments. The problem was found in 10 minutes.
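The search described above can also be scripted. The Python sketch below finds the first call to a suspect function with a bad first argument, then walks upward through less-indented lines to list its callers. The trace text, the function names (read_config, parse_input, validate, store_record), and the two-space indentation are all hypothetical and are not taken from real SCTrace output.

```python
# Sketch of the manual search: locate the first bad call, then scan
# upward for each enclosing caller. The trace below is made up for
# illustration, and two-space indentation per level is an assumption.
TRACE = """\
main(0x1, 0x2ff21620)
  read_config(0x20001000)
    store_record(0x20002000, 0x10)
  parse_input(0x20003000)
    validate(0x20003000)
    store_record(0x0, 0x10)
""".splitlines()

def indent_of(line):
    """Number of leading spaces on the line."""
    return len(line) - len(line.lstrip(" "))

def first_bad_call(lines, func, bad_arg):
    """Index of the first call to func whose first argument is bad_arg."""
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{func}({bad_arg},"):
            return i
    return None

def callers(lines, index):
    """Walk upward from lines[index], collecting each enclosing caller."""
    chain = []
    level = indent_of(lines[index])
    for line in reversed(lines[:index]):
        if indent_of(line) < level:
            chain.append(line.strip())
            level = indent_of(line)
    return chain

hit = first_bad_call(TRACE, "store_record", "0x0")
print(TRACE[hit].strip())
for caller in callers(TRACE, hit):
    print("  called from", caller)
```

In this made-up trace the bad call comes from parse_input(), not read_config(), which is the kind of answer that would otherwise take one printf per call site to find.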

Problem: The Elm mail-reading program fails on exit with the following error:

Write to temp file failed, exiting leaving mailbox intact!

The permissions on /tmp and /var/tmp look OK.

Solution: Use SCTrace to attach to Elm's process ID (PID) right before exiting the program. One of the writes looks like this:

kwritev(5, 0x2ff20028, 1, 0)

0x2ff20028 struct iovec

base 20056a10 len 4096

-> -1 ENOSPC(28) (No space left on device)

Sure enough, /tmp is low on space, as the df command confirms. Running the skulker command frees some space on /tmp, and the problem goes away.
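The two steps in this diagnosis, decoding the error number from the trace and then checking free space, can be sketched with Python's standard library. The helper names describe_errno and is_low_on_space are hypothetical, and the 5 percent threshold is an arbitrary assumption chosen for illustration.

```python
import errno
import os
import shutil
import tempfile

# Sketch: decode an errno value like the one in the trace, then check
# free space roughly the way df reports it. The 5 percent threshold is
# an arbitrary assumption.
def describe_errno(code):
    """Return the symbolic name and the system message for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

def is_low_on_space(path, min_free_fraction=0.05):
    """True if the filesystem holding path is below the free-space threshold."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return True  # treat a filesystem reporting no capacity as full
    return usage.free / usage.total < min_free_fraction

print(describe_errno(errno.ENOSPC))
print(is_low_on_space(tempfile.gettempdir()))
```

The message text returned by os.strerror() comes from the host system's C library, so its exact wording can vary between platforms.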

Summary

SCTrace for AIX is useful for both system administrators and programmers. System administrators use it to isolate and debug erratic system and command behavior, such as failures resulting from missing or damaged configuration files or network administration problems. Programmers use it to analyze program behavior and look for clues to unexplained core dumps and other erratic actions taken by programs. Because of its ease of use and speed, SCTrace can greatly shorten problem resolution on today's AIX networks.

IBM Austin has found SCTrace so useful for their system administrators and programmers that they obtained a site license. The CD-ROM for Developer Connection for AIX Volume 11 includes a 30-day timed-out version of the SCTrace evaluation package for you to try. If you have any questions about or problems with the product, please direct them to:

The Kernel Group, Inc.
1250 Capital of Texas Highway South
Building Three, Suite 601
Austin, TX 78746
Tel. No.: (512) 433-3333
Fax: (512) 433-3200
E-mail: info@tkg.com

Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation