TCP/IP Socket Programming in REXX
Written by Patrick Mueller
Remember when your mother used to tell you "Don't stick things into the wall outlets - sockets are dangerous!" Well, you've grown up, and guess what: sockets are fun to play with! At least TCP/IP sockets are. TCP/IP sockets are the programming interface used by all your favorite TCP/IP programs, including FTP, TELNET, news readers, IRC, and so forth.
Most implementations of TCP/IP provide a programming toolkit that includes a C language library for socket functions. I've created a REXX interface to the TCP/IP socket functions as provided by the IBM TCP/IP product, allowing programs that use sockets to be written in the REXX language as well as the C language.
In this article, I will discuss how to write programs in REXX that access TCP/IP sockets. First, I give an overview of sockets, followed by a description of the socket programming interface available for REXX. Then, I explain the sample programs shipped with the article (TMSG.CMD and TMSGD.CMD). Finally, I list a set of references that provide more information on programming with sockets, and what you can use sockets for.
After reading the article, you should have enough information to start implementing clients and servers for your favorite TCP/IP protocols, or implementing new protocols and applications in REXX that run over TCP/IP.
What are sockets?
The TCP/IP programming interface provides a way to allow multiple computers to work together, assuming that they are connected somehow. Computers that are connected form a network, and there are a number of different ways computers can be connected to one another: Token Ring LANs, Ethernet, telephone connections, and so on. Although it is possible to write programs with a specific type of connection in mind, it is often not practical to do so. Instead, there are abstraction layers for network programming that provide common programming interfaces and handle the underlying physical network connections for you.
TCP/IP is one such abstraction layer. There are TCP/IP implementations available for nearly all popular networks. It is also widely available on various hardware platforms, from lowly 8088 PCs to supercomputers and everything in between. So, programming to the TCP/IP layer not only gives you network independence, but also some level of platform independence.
The lowest level of programming to the TCP/IP layer is done with sockets. Sockets are in many ways similar to file handles in traditional programming languages. Within a program, you open a socket, read from the socket, write to the socket, and close the socket. The main difference is that instead of a file handle being associated with a file on a disk drive, a socket is associated with another program that is also reading and writing data on the socket. In this manner, sockets are similar in behavior to pipes on OS/2 and Unix.
Programs that use sockets often use a client/server model. In this model, a server program and a client program both open sockets and connect them together. The client sends some data to the server; the server reads the data, does some processing, and sends data back to the client. The client then reads that data back from the server. This data exchange can continue back and forth for some time. Eventually, both the client and server program close their sockets and end the communication.
As an example, consider news readers and news servers. When you read Usenet news, you start up your news reader (client). The client connects to an already-running news server (server). The client requests information from the server, for instance, "list all the subject lines for articles in the comp.os.os2.programmer.misc news group". The client requests the information by sending this command (in a more standardized form than the English example given above) to the server. The server consults its database of Usenet news and sends back the list of subject lines. This sort of command/response flow is typical of the client/server model.
Socket functions for sending and receiving data
Below is a description of the basic socket functions that TCP/IP socket programs can use. There are functions to open and close a socket and functions to read from and write to a socket. The functions described are the REXX functions available in the rxSock function package.
- SockSocket() creates a new socket and returns it to the caller. The socket is an integer value, and it is passed as the first parameter to all the other socket functions.
  The parameters passed to SockSocket() determine which flavor of socket is created. Most of the time, the values AF_INET, SOCK_STREAM, and IPPROTO_TCP are used as the parameters, respectively. Other values are used only for special purposes.
- SockRecv() reads data from the socket. The buffer parameter is the name of the variable that receives the data read from the socket. The length parameter is the maximum number of bytes to receive. The flags parameter is generally unused and is optional.
  The number of bytes actually read into the buffer variable is returned from the function.
- SockSend() writes data into the socket. The string parameter is the data sent into the socket. The flags parameter is generally unused and is optional.
  The number of bytes actually written is returned from the function.
- SockClose() closes a socket and makes it unusable on both sides of the connection.
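As a minimal sketch of the open and close calls, the example below uses Python's standard socket module as a stand-in, since it exposes the same C-style calls that rxSock wraps. It is not rxSock code, and the variable names are only illustrative.

```python
import socket

# The same three values the text names for an ordinary TCP socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

# As with the integer that identifies a socket in the text, the underlying
# handle here is a small non-negative integer.
handle = sock.fileno()
print("socket handle:", handle)

# Closing releases the handle; Python reports -1 for a closed socket.
sock.close()
closed_handle = sock.fileno()
print("after close:", closed_handle)
```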
Sockets have the following properties:
- The data sent and received on a socket is not translated in any way. If your programs must deal with EBCDIC and ASCII character encoding, or byte ordering of integers, you need to provide this in your program.
- Sockets are full duplex, meaning that a program can both read and write to the same socket.
- Data transmitted via SockSend() can be broken up by the underlying software into smaller units. For example, if a SockSend() of 8 bytes occurs, the data may be received by one call to SockRecv() returning 3 bytes, and the next call to SockRecv() returning 5 bytes.
- If a program calls SockRecv() and the other program has not sent any data via SockSend(), the SockRecv() call blocks until some data has been sent, or the socket is closed on the SockSend() side.
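The splitting and end-of-data behavior described above can be observed in a small experiment. This sketch uses Python's standard socket module as a stand-in for the equivalent rxSock calls, and socketpair() to get two connected sockets in one process. Here the split is forced by the receiver asking for fewer bytes than were sent; on a real network the underlying software may split the data at any point.

```python
import socket

# socketpair() returns two already-connected stream sockets, standing in
# for the two programs at either end of a connection.
a, b = socket.socketpair()

a.sendall(b"ABCDEFGH")   # one send of 8 bytes

first = b.recv(3)        # ask for at most 3 bytes
second = b.recv(5)       # ask for at most 5 bytes

a.close()                # the sending side closes its socket
at_close = b.recv(5)     # returns an empty result instead of blocking
b.close()

print(first, second, at_close)
```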
The properties described above imply that the client and server programs have to know something about the data being sent over the network. For instance, if you plan on sending binary data as 2- or 4-byte integers between a client and server, you'll have to decide before writing the programs what the order of the bytes in the integer will be. On most PC workstations, integers are stored internally with their bytes reversed. For instance, the number 0x12AB is stored internally as AB 12. Many of the higher-powered, non-PC workstations store integers without the bytes reversed. Choose either the reversed or non-reversed format, and then check whether the hardware your program is running on uses that format. If it doesn't, reverse the bytes before sending them (or after receiving them, as the case may be).
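To make the byte-order choice concrete, here is a short sketch in Python (used as a neutral illustration language, not as rxSock code). The struct module lets a program state the byte order explicitly, so the result does not depend on the hardware it runs on.

```python
import struct

value = 0x12AB

# "<H" packs a 2-byte integer with the bytes reversed (as on most PCs);
# ">H" packs it without reversal.
reversed_form = struct.pack("<H", value)
plain_form = struct.pack(">H", value)

print(reversed_form.hex())  # ab12
print(plain_form.hex())     # 12ab

# The receiver must unpack with the same format the sender agreed to use.
round_trip = struct.unpack(">H", plain_form)[0]
```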
Also, because text sent with SockSend() can be broken into multiple packets, you will have to determine how the program doing the SockRecv() will decide when it has received enough data. It can't read until it gets everything, since it will eventually block. Instead, you will have to rely on one of the following techniques:
- Have the sender and receiver pass messages of a predetermined, fixed length.
- Have the sender prefix the data being sent with its length, which the receiver can read first to determine the length of the data to be received.
- Have the sender add a special character or characters to the end of the data to signify the end of the data. Many client/server programs use this method, particularly if they are sending textual data across the network. The terminating characters are often a carriage return, followed by a line feed.
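The techniques above can be sketched as follows, again using Python's standard socket module as a stand-in for the equivalent rxSock calls. The helper names (recv_exact, send_prefixed, recv_prefixed, recv_line) and the 4-byte length format are assumptions made for this illustration, not part of any library.

```python
import socket
import struct

def recv_exact(sock, count):
    """Read exactly count bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_prefixed(sock, payload):
    """Length-prefix technique: send a 4-byte length, then the data."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_prefixed(sock):
    """Read the 4-byte length first, then exactly that many bytes."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def recv_line(sock, terminator=b"\r\n"):
    """Terminator technique: read until carriage return plus line feed."""
    data = b""
    while not data.endswith(terminator):
        chunk = sock.recv(1)  # one byte at a time: simple, but not fast
        if not chunk:
            raise ConnectionError("socket closed before the terminator arrived")
        data += chunk
    return data[:-len(terminator)]

# Demonstration over a connected pair of sockets in one process.
a, b = socket.socketpair()
send_prefixed(a, b"hello")
a.sendall(b"world\r\n")
prefixed_message = recv_prefixed(b)
line_message = recv_line(b)
a.close()
b.close()
print(prefixed_message, line_message)
```

The fixed-length technique is simply recv_exact() with a length both sides agreed on in advance.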
Socket functions for connecting clients and servers
The functions SockRecv() and SockSend() described above can be used only once a socket is connected to another machine. But how does the connection between two programs take place? In order for a client program to talk to a server program, the client has to know on what machine the server is running. Every machine on the TCP/IP network is uniquely identified by a 32-bit number, called its Internet address, or IP address. Because 32-bit numbers are unwieldy for humans, they are often displayed as four 8-bit numbers, converted to decimal numbers with dots between them, called dot decimal addresses. For example, the address 9.67.225.165 corresponds to the 32-bit number 9 * 256^3 + 67 * 256^2 + 225 * 256 + 165 = 155,443,621. Even dot decimal addresses are unwieldy, so there's usually a human-readable name also associated with each IP address, called its hostname.
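The arithmetic above can be checked with a few lines of Python (used here only as a calculator, not as rxSock code). The sketch takes the four byte values from the calculation (9, 67, 225, 165), converts the dotted form to a single 32-bit number, and converts it back.

```python
import socket

dotted = "9.67.225.165"  # the four values used in the arithmetic above

packed = socket.inet_aton(dotted)        # the address as 4 raw bytes
as_int = int.from_bytes(packed, "big")   # the same 4 bytes as one number

by_hand = 9 * 256**3 + 67 * 256**2 + 225 * 256 + 165

# And back again, from the 32-bit number to the dotted form.
back = socket.inet_ntoa(as_int.to_bytes(4, "big"))

print(as_int, by_hand, back)
```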
The socket functions generally deal with IP addresses. There is a function - SockGetHostByName() - that can be used to convert a hostname to an IP address, so many programs accept either an IP address or hostname as a machine address. When a hostname is passed in, it is converted to an IP address, and that address is used in the remainder of the program.
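A sketch of the "accept either form" pattern, using Python's gethostbyname() as a stand-in for SockGetHostByName(). The Python call returns an IPv4 address unchanged when it is given one, so a single call covers both cases. The function name to_ip_address is hypothetical.

```python
import socket

def to_ip_address(host):
    """Return a dot decimal address for a hostname or a dot decimal address."""
    # Raises socket.gaierror if the name cannot be resolved.
    return socket.gethostbyname(host)

# An address passed in is returned as-is; a hostname would be looked up.
already_numeric = to_ip_address("127.0.0.1")
print(already_numeric)
```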
Because there is often more than one server program running on a machine, there must be a way to distinguish between them. This is done with a port.
A port is just an integer that is associated with a particular server. Most of the programs provided with TCP/IP, such as FTP, TELNET, and SMTP (mail), have "well-known" ports associated with them. The well-known port for FTP is 21, and the well-known port for TELNET is 23. In other words, any program that wants to interact with an FTP server uses 21 as the port. Two servers on the same machine cannot use the same port.
If you are writing your own client and server program with sockets, you'll need to have the client and server agree on a port. This can be done by hard-coding it in your program or by allowing it to be passed in as a parameter to the program.
The combination of an address and a port is enough to distinguish any particular server running on any machine in the network. These two numbers are set in the address stem variable, which is used in functions to connect clients and servers. These functions are described below. See the rxsock.doc file shipped with the rxSock function package for more information on the address stem variable.
- SockBind() is used by servers to reserve a port. The caller fills in the port field of the address stem variable with the port the server will be using.
- SockAccept() is used by a server to wait for a client to try to make a connection. The socket must have been bound with the SockBind() function above before calling SockAccept(). Once the connection is made, the function returns a new socket and fills in the address stem variable with the IP address of the client. The new socket returned is used for all other communication with the client. The original socket remains open, waiting for new connections.
- SockConnect() is used by a client when it tries to connect to a server. The caller fills in the address stem variable with the IP address and port of the server.
The basic flow of the reading, writing, and connecting functions is shown below.
Note that the loop for SockSend() and SockRecv() is dependent on the application: some clients will just send, receive, or send and receive data once. The loop enclosing SockAccept() on the server handles each client; each time through the loop corresponds to one client session. The SockClose() call in the loop on the server closes the socket obtained by SockAccept(), not the socket obtained by SockSocket().
    client              server
    ------              ------
                        SockSocket()
                        SockBind()
    SockSocket()        loop ...
    SockConnect()         SockAccept()
    loop ...              loop ...
      SockSend()            SockRecv()
      SockRecv()            SockSend()
    SockClose()           SockClose()
                        SockClose()
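The flow above can be tried out on a single machine. The sketch below uses Python's standard socket module, whose calls mirror the ones in the diagram; it is an illustration under stated assumptions, not rxSock code. The server runs in a thread and handles one pass through its accept loop. The "processing" (upper-casing the request) and the name serve_once are made up for the example, and a single recv() on the server is adequate only because the demo message is tiny.

```python
import socket
import threading

def serve_once(listener):
    """One pass through the server's accept loop."""
    conn, client_addr = listener.accept()   # new socket for this client
    request = conn.recv(1024)
    conn.sendall(request.upper())           # stand-in for "some processing"
    conn.close()                            # closes the accepted socket only

# Server side: create, bind, and listen. Port 0 asks the system for any
# free port; Python requires listen() before accept().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client side: create, connect, send, receive, close.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:            # the server closed its side
        break
    reply += chunk
client.close()

worker.join()
listener.close()             # the original socket is closed last
print(reply)
```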
Up until now I've only discussed the socket functions themselves. The socket functions are not included in OS/2 REXX itself, but implemented in a function package. The function package is implemented in an OS/2 Dynamic Link Library (DLL), available from the IBM Employee Written Software (EWS) program. EWS files are available from a number of places, including CompuServe, and the ftp.cdrom.com and software.watson.ibm.com anonymous FTP sites. Look for a subdirectory called EWS. On ftp.cdrom.com, rxsock is located in the /pub/os2/ibm/ews subdirectory. The rxSock function package is in a file called rxsock.zip. Unpack the .ZIP file, and read the rxsock.doc file for installation instructions and reference material on the functions in rxSock.
The REXX functions implemented in rxSock closely match the socket functions available to C programs. If you already know C, once you learn the REXX socket functions, you won't have any problem learning the C socket functions. In this respect, rxSock provides a way to prototype socket programs in REXX, and then implement them in C for speed.
The rxSock function package is supported with the OS/2 TCP/IP product, versions 1.2.1 and 2.0. At this time, I know of no non-IBM TCP/IP implementations that support rxSock.
Example programs - TMSG and TMSGD
Now it's time to put all the pieces together and write an application. This example is an application to send a textual message from one OS/2 machine to another.
Two programs are used to implement the application: TMSG is the client, and TMSGD is the server. The client program is used to send a message to the server. The server receives the message and displays it on the console.
To start the server, just run the TMSGD program. It will display a startup message and then wait for clients to send messages. The program will not terminate unless you press Ctrl-C or Ctrl-Break.
To use the client, run the TMSG program, passing it the hostname of the machine to display the message on, followed by the text of the message.
To test the programs on your own system,
- start the server
- run the client, passing your hostname as the first parameter
Both programs assign a port number (the same number) at the top of the program, and make sure the rxSock package is loaded.
The TMSG program does the following:
- Converts the hostname to an IP address
- Creates a socket
- Connects the socket to the server
- Sends the message to the server - the message is followed by a carriage return and then a linefeed character
- Closes the socket
The TMSGD program does the following:
- Creates a socket
- Binds the socket to a port
- Loops through the following processing (once per client connection)
  - Accepts a connection from the client
  - Converts the client's IP address to a hostname
  - Reads the message from the client - it reads from the socket until it receives a carriage return and then a linefeed character
  - Closes the socket returned from the SockAccept() call
  - Displays the message
A few notes on some of the other processing the programs do:
- The client program does a SockRecv() before closing the socket. The server never sends any data back, but the SockRecv() call will return having read 0 bytes when the server closes the socket. The call to SockRecv() does nothing more than wait for the server to close its side of the socket before the client closes its side. If the client were to close its socket before the server read all the data, the server might not receive all the data.
- In this example, the client program never receives anything back from the server. Quite often, the client receives data back from the server after sending it some data, but not always.
- The SockListen() call by the server is used to set the backlog of connections that TCP/IP will handle. If the server gets a connection request while it's processing another client, this connection will be queued up until the server can handle it. The number passed to SockListen() determines the maximum number of connection requests that will be queued up by the server.
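The steps and notes above can be sketched in Python, whose standard socket module mirrors the calls involved. This is not the TMSG.CMD or TMSGD.CMD source; the function names tmsg and tmsgd_once are hypothetical, the port is chosen by the system rather than hard-coded, and the conversion of the client's IP address to a hostname is left out.

```python
import socket
import threading

def tmsgd_once(listener, received):
    """Server: accept one client, read up to CR LF, close, record the message."""
    conn, (client_ip, _client_port) = listener.accept()
    data = b""
    while not data.endswith(b"\r\n"):
        chunk = conn.recv(1024)
        if not chunk:          # client closed early
            break
        data += chunk
    conn.close()               # closes the accepted socket, not the listener
    if data.endswith(b"\r\n"):
        data = data[:-2]
    received.append((client_ip, data.decode("ascii")))

def tmsg(host, port, message):
    """Client: resolve, connect, send the message plus CR LF, wait, close."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((socket.gethostbyname(host), port))
    sock.sendall(message.encode("ascii") + b"\r\n")
    leftover = sock.recv(1)    # returns empty once the server closes its side
    sock.close()
    return leftover

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # assumption: any free port, loopback only
listener.listen(5)                # backlog of queued connection requests
port = listener.getsockname()[1]

received = []
server = threading.Thread(target=tmsgd_once, args=(listener, received))
server.start()

leftover = tmsg("127.0.0.1", port, "lunch at noon?")
server.join()
listener.close()

for client_ip, text in received:
    print("message from", client_ip + ":", text)
```

A real server would wrap the body of tmsgd_once() in a loop so that it keeps accepting clients until interrupted.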
Two good books on socket programming are:
- "Internetworking with TCP/IP" by Douglas Comer, ISBN 0-13-468505-9
- "Unix Network Programming" by Richard Stevens, ISBN 0-13-949876-1
The Comer book focuses more on TCP/IP, especially on the internals of TCP/IP, and the Stevens book focuses more on different types of interprocess communication on Unix systems. Unless you are really interested in the internals of TCP/IP, the Stevens book is probably the better source of information.
Many of the TCP/IP protocols used by programs are documented in RFC files (Request For Comments). These include NNTP (news), TELNET, FTP, and hundreds more. These files may be obtained via the Internet from venera.isi.edu in the in-notes directory. The rfc-index.txt file contains an index of all the RFCs currently available.
There are a few programs publicly available that make use of the rxSock function package. These programs are available via anonymous ftp to ftp.cdrom.com, in the /pub/os2/2_x/network subdirectory. The programs are:
- a finger server
- a Network Time Protocol (NTP) client
- an NNTP news reader
Besides being useful programs in their own right, the programs can also be used as further examples of socket programming with the rxSock function package.
Legal Glorp
Copyright IBM Corp., 1994. All Rights Reserved. While the information provided herein is believed to be accurate, IBM does not warrant the information, and the information is provided "as is". OS/2 is a trademark of International Business Machines Corp. Unix is a trademark of X/Open Company Ltd.