Get Usenet News Articles Using REXX, Network News Transfer Protocol and Sockets

From EDM2

Written by Dave Briccetti

This tip demonstrates retrieving selected news articles from an NNTP server using REXX, NNTP, and TCP Sockets.

If you are a REXX, NNTP, or Sockets expert and you see any errors or possible improvements to this tip, please share your knowledge with me.

As a software developer and contract programmer, I like to keep up with available contracts by searching Usenet newsgroups, especially ba.jobs.contract. I got tired of starting up my newsreader, opening the group, searching for "OS/2", and then weeding out those postings which say "W2ONLY". I wanted to click on an object and have it all done for me automatically. So I wrote this REXX program.

Newsreaders usually get articles from a program known as a news server, which runs on a host somewhere on the network. Your Internet service provider or system administrator should give you the name of the server to use. Newsreaders communicate with the news server using the Network News Transfer Protocol, or NNTP (RFC 977), over a TCP connection, normally to port 119.

To use this program, which is called GetNews, type the following from the command line (or set up program objects to supply your frequently used options):

GetNews NewsGroup SearchString [ExcludeString]

NewsGroup is the name of the newsgroup you want to search, such as ba.jobs.contract.

SearchString is the string you are searching for, such as OS/2. All articles containing your search string in the Subject: line will be considered for retrieval.

ExcludeString is an optional string, which, if found in the body of the article, will cause the article to be skipped.

Here is an example:

GetNews ba.jobs.contract OS/2 W2ONLY

This will retrieve all articles from ba.jobs.contract with subjects containing "OS/2", skipping any article whose body contains the string "W2ONLY".
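GetNews reads its options with PARSE ARG. The sketch below uses PARSE VAR on a sample argument string, a stand-in for what the command line would supply, to show how the words are assigned: each variable except the last gets one blank-delimited word, and the last one gets the rest of the line. Assuming single blanks between words, as here, this means SearchString must be a single word, while ExcludeString may contain blanks.

```rexx
/* Sketch only: PARSE VAR stands in for PARSE ARG so the example
   runs without command-line arguments. */
ArgLine = 'ba.jobs.contract OS/2 W2ONLY'
parse var ArgLine NewsGroup SearchString ExcludeString
say 'NewsGroup    =' NewsGroup
say 'SearchString =' SearchString
say 'ExcludeString=' ExcludeString

/* The last variable takes everything that is left, blanks included */
parse value 'ba.jobs.contract OS/2 W2 ONLY' with . . Exclude2
say 'Exclude2     =' Exclude2
```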

Here's part of the conversation between GetNews and the news server when the program is run as in the above example, with commentary interspersed. GROUP, XPAT, BODY and QUIT are the NNTP commands we send; the other lines come from the server.


200 shellx.best.com InterNetNews NNRP server INN 1.4 22-Dec-93 ready (posting ok).

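This is what the news server says when we connect to it. The 200 code means the server is ready and will accept postings (201 would mean reading only). GetNews only checks the first digit: any reply beginning with 2 means all is well and we can continue.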
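GetNews itself only tests the first digit of a reply (its REPLYTYPE_OK check). This sketch, using the greeting above as sample input, goes one step further and tells the two RFC 977 greeting codes apart; the Status wording is made up for illustration.

```rexx
/* Sketch: classify a reply by its code.  The greeting text is
   copied from the session above. */
Greeting = '200 shellx.best.com InterNetNews NNRP server INN 1.4 22-Dec-93 ready (posting ok).'
parse var Greeting Code Text

select
   when Code = 200 then Status = 'ready, posting allowed'
   when Code = 201 then Status = 'ready, reading only'
   when left(Code, 1) == '2' then Status = 'success'
   otherwise Status = 'not a success reply'
end
say Code '->' Status
```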


GROUP ba.jobs.contract

Here we select the newsgroup.

211 3614 72340 75980 ba.jobs.contract

This response (code 211) tells us that the server accepted the GROUP command. The fields that follow are the estimated number of articles (3614), the first and last article numbers (72340 and 75980), and the group name.
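The 211 reply can be split with one PARSE instruction, much as GetPostings does with `parse var CmdReply code num first last group`. This sketch works on the reply string shown above rather than on a live socket. Because the article count is only an estimate, it need not match the width of the article-number range.

```rexx
/* Sketch: split a 211 reply to GROUP into its fields.
   The reply text is copied from the session above. */
Reply = '211 3614 72340 75980 ba.jobs.contract'
parse var Reply Code Count First Last Group

if Code = 211 then do
   Span = Last - First + 1    /* how many numbers the range covers */
   say Group 'has about' Count 'articles, numbered' First 'to' Last
   say 'The range spans' Span 'numbers; expired articles leave gaps.'
end
```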

XPAT SUBJECT 1- *OS/2*

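We request a list of all articles, from article 1 onward ("1-"), whose subject matches the pattern *OS/2*, that is, contains the string "OS/2". XPAT is not one of the basic RFC 977 commands; it comes from the Common NNTP Extensions draft cited in the program.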
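This sketch builds the XPAT command text the same way GetPostings does, from a header name and a search string, so you can see exactly what is sent. The values are the ones from the example run.

```rexx
/* Sketch: build an XPAT command as GetPostings does.
   "1-" asks for article 1 through the end of the group, and the
   asterisks are wildcards, so the pattern matches any subject
   that contains SearchString. */
SearchField  = 'subject'
SearchString = 'OS/2'
Cmd = 'XPAT' SearchField '1- *' || SearchString || '*'
say Cmd
```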

221 subject matches follow.

72392 US-CA-San Fran-MGR-OS/2 WARP-Recruiter
72440 US-CA-San Fran-OS/2 Testing Engineer-RecruiterChen & McGinley, Inc.
72491 US-CA-San Fran-QA-OS/2, Testing-Recruiter
72563 US-CA-Oakland-LAN-LAN WAN TCP/IP OS/2 NT-Recruiter
72567 US-CA-Oakland-LAN-LAN WAN TCP/IP OS/2 NT-Recruiter

The search results appear, one line per matching article: the article number followed by its subject. A line containing only a period marks the end of the data.
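Reading a multi-line reply comes down to taking one CRLF-terminated line at a time until the lone period arrives. The sketch below does that on a hand-built reply containing two of the result lines above, collecting the article numbers that GetPostings would then fetch with BODY. It splits on CRLF with PARSE instead of calling the program's GetFirstLine and StripFirstLine helpers; that is a stylistic choice for the sketch, not GetNews's own code.

```rexx
/* Sketch: pull the article numbers out of a multi-line XPAT reply.
   The reply is built by hand from two of the lines above. */
CRLF  = '0d0a'x
Reply = '221 subject matches follow.' || CRLF ,
     || '72392 US-CA-San Fran-MGR-OS/2 WARP-Recruiter' || CRLF ,
     || '72491 US-CA-San Fran-QA-OS/2, Testing-Recruiter' || CRLF ,
     || '.' || CRLF

parse var Reply Status (CRLF) Reply     /* drop the status line */
Articles = ''
do while Reply \== ''
   parse var Reply Line (CRLF) Reply
   if Line == '.' then leave            /* end-of-data marker */
   parse var Line Num .                 /* first word is the number */
   Articles = Articles Num
end
Articles = strip(Articles)
say 'Matching articles:' Articles
```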

BODY 72392

Now we ask for the body of the first message.

 222 72392 <80652395268020@dice.com> body
 SEARCH KEYS: TYPE:MGR   TERM:CON   W2ONLY   STATE:CA    AREA:415
 
 POSITION ID: ARCSF.011
 DATE POSTED: 12/01/95
 
 POSITION TITLE     : PROJECT MANAGER
 SKILLS REQUIREMENTS: OS/2 WARP
 
 LOCATION  : SAN MATEO
 START DATE: 12/20/95
 PAY RATE  : NEGOTIABLE  (+ benefits
 LENGTH    : 1 year
 
 COMMENTS: Manage large-scale, multi-site rollout of high powered
           superstation desktops.  Very interesting, high profile
           position.

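And here it is, again ending with the period. Note that this body contains "W2ONLY", so GetNews would leave this article out of the results file.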
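The keep-or-skip decision, plus some tidying of the body, can be sketched without a connection. The sample body is the status line and two text lines from the article above, plus one invented line that starts with a period. RFC 977 has the server double a leading period on any text line, so the sketch removes one again; GetNews itself writes the body out as received. The ExcludeString test is the same pos() check GetPostings uses.

```rexx
/* Sketch: keep-or-skip test plus cleanup of a BODY reply.
   The "..hypothetical" line is invented to show dot-stuffing. */
CRLF = '0d0a'x
Body = '222 72392 <80652395268020@dice.com> body' || CRLF ,
    || 'SEARCH KEYS: TYPE:MGR   TERM:CON   W2ONLY   STATE:CA    AREA:415' || CRLF ,
    || 'POSITION TITLE     : PROJECT MANAGER' || CRLF ,
    || '..hypothetical line that began with a period' || CRLF ,
    || '.' || CRLF
ExcludeString = 'W2ONLY'

/* Same test as GetPostings: skip if ExcludeString is in the body */
Keep = (ExcludeString = '' | pos(ExcludeString, Body) = 0)
say 'Keep article:' Keep

/* Drop the 222 status line, stop at the lone period, and undo
   dot-stuffing by removing one leading period from ".." lines */
parse var Body . (CRLF) Rest
Text = ''
do while Rest \== ''
   parse var Rest Line (CRLF) Rest
   if Line == '.' then leave
   if left(Line, 2) == '..' then Line = substr(Line, 2)
   Text = Text || Line || CRLF
end
say Text
```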

QUIT

We indicate we're finished.

205

The server accepts the command and ends the session.

REXX Program

The complete REXX program follows. You can also get it in zipped form.

 /* ============================================================================ 
 Get Usenet News Articles Matching Search Criteria, Using Network News Transfer Protocol
 RFC977 (http://www.cis.ohio-state.edu/htbin/rfc/rfc977.html) and the Draft Common NNTP
 Extensions (ftp://ftp.internic.net/internet-draft/draft-barber-nntp-imp-01.txt)
 
 Written by a novice REXX programmer
 
 Dave Briccetti, December 1995
 daveb@davebsoft.com, http://www.davebsoft.com 
 
 May be used for any purpose
 
 ============================================================================ */
 
 parse arg NewsGroup SearchString ExcludeString
 
 if NewsGroup = '' | SearchString = '' then
 do
    say 'usage: GetNews NewsGroup SearchString [ExcludeString]'
    say 'example: GetNews ba.jobs.contract OS/2 W2ONLY'
    say '  shows all postings in ba.jobs.contract with OS/2 in the'
    say '  subject line and without the string "W2ONLY" in the body'
    exit
 end
 
 OutFile = 'results.' || NewsGroup  /* Change this if you don't have long file names */
 /*OutFile = 'search.txt'*/      
 NewsServer    = 'your.news.server' /* News server */
 SearchField   = 'subject'          /* Article header field to search */
 
 TRUE                    = 1
 FALSE                   = 0
 REPLYTYPE_OK            = '2'   /* NNTP reply code first byte */
 
 /* Load the REXX Socket interface */
 call RxFuncAdd 'SockLoadFuncs', 'rxSock', 'SockLoadFuncs'
 call SockLoadFuncs
 
 '@if exist' OutFile 'del' OutFile
 
 if EstablishProtocol() = FALSE then
    exit
 
 /* Get the postings */
 call GetPostings socket, NewsGroup, SearchField, ,
    SearchString, ExcludeString, OutFile
 
 /* End the protocol with QUIT */
 CmdReply = TransactCommand(socket, 'QUIT', 1, '0d0a'x)
 
 /* Close the socket */
 call SockSoClose socket
 
 exit
 
 
 /* ========================================================================= */
 EstablishProtocol:
 /* ========================================================================= */
 
 socket = ConnectToNewsServer(NewsServer)   
 if socket <= 0 then
 do
    say 'Could not connect to news server'
    return FALSE
 end
 
 CmdReply = GetCmdReply(socket, '0d0a'x)
 say CmdReply
 
 if left(CmdReply, 1) \= REPLYTYPE_OK then
 do
    say 'Could not establish protocol'
    return FALSE
 end
 
 return TRUE
  
 
 /* ========================================================================= */
 GetPostings: procedure
 /* ========================================================================= */
 
 parse arg socket, NewsGroup, SearchField, SearchString, ExcludeString, OutFile
 CRLF = '0d0a'x
 Dot = CRLF || '.' || CRLF
 REPLYTYPE_OK            = '2'   /* NNTP reply code first byte */
 
 CmdReply = TransactCommand(socket, 'GROUP' NewsGroup, 1, CRLF)
 
 parse var CmdReply code num first last group
 
 if left(code, 1) = REPLYTYPE_OK then
 do
    CmdReply = TransactCommand(socket, ,
        'XPAT' SearchField '1- *' || SearchString || '*', 0, Dot)
    if left(CmdReply, 1) \= REPLYTYPE_OK then
    do
        say 'XPAT failed:' CmdReply
        return
    end
    CmdReply = StripFirstLine(CmdReply)
    call lineout OutFile, CmdReply
 
    do while length(CmdReply) > 5
        line = GetFirstLine(CmdReply)
        if line = '' then
            CmdReply = ''
        else
        do
            CmdReply = StripFirstLine(CmdReply)
            parse var line num rest
            body = TransactCommand(socket, 'BODY' num, 0, Dot)
            if left(body, 1) \= REPLYTYPE_OK then iterate   /* article not available */
            if ExcludeString = '' | (pos(ExcludeString, body) = 0) then
            do
                From = HeaderLine(socket, 'from')
                call lineout OutFile, From
                Subject = HeaderLine(socket, 'subject')
                call lineout OutFile, Subject
                BodyStripped = StripFirstLine(body)
                call lineout OutFile, BodyStripped
            end
        end
    end
 end
 
 return
  
 
 /* ========================================================================= */
 ConnectToNewsServer: procedure
 /* ========================================================================= */
 
 parse arg NewsServer
 socket = 0
 
 /* Open a socket to the news server.  (The Sock* functions are
   documented in the REXX Socket book in the Information folder
   in the OS/2 System folder */
 
 call SockInit
 if SockGetHostByName(NewsServer, 'host.!') = 0 then
    say 'Could not get host by name' errno h_errno
 else
 do
    socket = SockSocket('AF_INET','SOCK_STREAM',0)
    address.!family = 'AF_INET'
    address.!port = 119          /* the standard NNTP port */
    address.!addr = host.!addr
    if SockConnect(socket, 'address.!') = -1 then
    do
        say 'Could not connect socket' errno h_errno
        call SockSoClose socket    /* don't hand back an unconnected socket */
        socket = -1
    end
 end
 return socket
 
 
 /* ========================================================================= */
 GetCmdReply: procedure
 /* ========================================================================= */
 
 parse arg socket, EndString
 
 /* Receive the response to the command into a variable.  Use
   more than one socket read if necessary to collect the whole
   response. */
  
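 if SockRecv(socket, 'CmdReply', 1000) <= 0 then do
    say 'Error reading from socket' errno h_errno
    exit
 end
 
 /* Only a 2xx reply can be followed by multi-line data; an error
   reply is a single line, so stop at its CRLF instead of waiting
   for a terminating period that will never come. */
 if left(CmdReply, 1) \== '2' then
    EndString = '0d0a'x
 
 /* Keep reading until the end marker arrives.  There is no fixed
   read limit, so long article bodies are not cut off. */
 do while right(CmdReply, length(EndString)) \== EndString
    if SockRecv(socket, 'CmdReplyExtra', 1000) <= 0 then do
        say 'Error reading from socket' errno h_errno
        exit
    end
    CmdReply = CmdReply || CmdReplyExtra
 end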
 
 return CmdReply
 
 
 /* ========================================================================= */
 TransactCommand:
 /* ========================================================================= */
 
 parse arg socket, Cmd, SayCmd, EndString
 
 /* Send a command to the NNTP server, echoing it to the display
   if requested */
 
 if SayCmd then
    say Cmd
   
 rc = SockSend(socket, Cmd || '0d0a'x)
 reply = GetCmdReply(socket, EndString)
 if SayCmd then
     say reply
 return reply
 
 
 /* ========================================================================= */
 GetFirstLine: procedure
 /* ========================================================================= */
 
 parse arg TextBlock
 p = pos('0a'x, TextBlock)
 if p > 0 then
    line = left(TextBlock,p)
 else
    line = ''
    
 return line
 
           
 /* ========================================================================= */
 StripFirstLine: procedure
 /* ========================================================================= */
 
 parse arg TextBlock
 p = pos('0a'x, TextBlock)
 if p > 0 then
     StrippedTextBlock = right(TextBlock,length(TextBlock)-p)
 else
     StrippedTextBlock = ''
    
 return StrippedTextBlock
 
           
 /* ========================================================================= */
 HeaderLine: procedure           
 /* ========================================================================= */
 
 parse arg socket, linetype
 
 CRLF = '0d0a'x
 Dot = CRLF || '.' || CRLF
 
 XhdrResponse = TransactCommand(socket, 'xhdr' linetype, 0, Dot)
 
 hl = StripFirstLine(XhdrResponse)   /* Strip off the first line */
 hl = GetFirstLine(hl)               /* Take the first line of what remains */
 hl = delword(hl,1,1)                /* Delete the article number */
 return hl