Get Usenet News Articles Using REXX, Network News Transfer Protocol and Sockets
Written by Dave Briccetti
This tip demonstrates retrieving selected news articles from an NNTP server using REXX, NNTP, and TCP Sockets.
If you are a REXX, NNTP, or Sockets expert and you see any errors or possible improvements to this tip, please share your knowledge with me.
As a software developer and contract programmer, I like to keep up with available contracts by searching Usenet newsgroups, especially ba.jobs.contract. I got tired of starting up my newsreader, opening the group, searching for "OS/2", and then weeding out those postings which say "W2ONLY". I wanted to click on an object and have it all done for me automatically. So I wrote this REXX program.
Newsreaders often get the news articles from a program known as a News Server, which runs on a host somewhere. Your Internet service provider or system administrator should give you the name of the server to use. Newsreaders communicate with the news server using the Network News Transfer Protocol, or NNTP.
To use this program, which is called GetNews, type the following from the command line (or set up program objects to supply your frequently used options):
GetNews NewsGroup SearchString [ExcludeString]
NewsGroup is the name of the newsgroup you want to search, such as ba.jobs.contract.
SearchString is the string you are searching for, such as OS/2. All articles containing your search string in the Subject: line will be considered for retrieval.
ExcludeString is an optional string, which, if found in the body of the article, will cause the article to be skipped.
Here is an example:
GetNews ba.jobs.contract OS/2 W2ONLY
This will retrieve all articles from ba.jobs.contract whose subjects contain "OS/2" and whose bodies do not contain the string "W2ONLY".
Here's part of the result of running the program as in the above example, with commentary interspersed. The lines that begin with a command word (GROUP, XPAT, BODY, QUIT) are the NNTP commands we are sending; the lines that begin with a three-digit code are the server's replies.
200 shellx.best.com InterNetNews NNRP server INN 1.4 22-Dec-93 ready (posting ok).
This is what the news server says when we connect to it. The 200 code tells us that all is well and we can continue.
GROUP ba.jobs.contract
Here we select the newsgroup.
211 3614 72340 75980 ba.jobs.contract
This response (code 211) tells us that the server accepted the GROUP command, and gives the number of articles, the starting article number, and the ending article number of the group.
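As a minimal sketch of how such a reply can be taken apart in REXX, the fragment below splits the sample 211 line into its fields with parse var. The variable names (reply, code, count, first, last, group) are chosen for illustration; only the first digit of the code is tested, in the same way the full program checks for a reply in the 2xx range.

```rexx
/* Sketch: split a GROUP reply into its fields (sample reply from above) */
reply = '211 3614 72340 75980 ba.jobs.contract'

/* Each variable takes one blank-delimited word; group takes the rest */
parse var reply code count first last group

if left(code, 1) = '2' then
    say 'Group' group 'has about' count 'articles, numbered' first 'to' last
else
    say 'GROUP failed:' reply
```

RFC 977 describes the article count as an estimate, which is why the message says "about".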
XPAT SUBJECT 1- *OS/2*
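We request a list of all articles whose subject contains the string "OS/2". The range 1- means "from article 1 to the end of the group," and the asterisks are wildcards, so the pattern *OS/2* matches "OS/2" anywhere in the subject.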
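The XPAT command line is plain text, so it can be assembled with ordinary REXX concatenation. The sketch below builds the command the same way the full program does, using the sample search string; it only constructs and displays the string and sends nothing.

```rexx
/* Sketch: build the XPAT command for a given header field and search string */
SearchField  = 'subject'
SearchString = 'OS/2'

/* A blank between terms inserts one space; || joins with no space */
Cmd = 'XPAT' SearchField '1- *' || SearchString || '*'

say Cmd
```

With these values the command is XPAT subject 1- *OS/2*. A search string that itself contains wildcard characters such as * or ? would be interpreted as part of the pattern.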
221 subject matches follow.
72392 US-CA-San Fran-MGR-OS/2 WARP-Recruiter
72440 US-CA-San Fran-OS/2 Testing Engineer-RecruiterChen & McGinley, Inc.
72491 US-CA-San Fran-QA-OS/2, Testing-Recruiter
72563 US-CA-Oakland-LAN-LAN WAN TCP/IP OS/2 NT-Recruiter
72567 US-CA-Oakland-LAN-LAN WAN TCP/IP OS/2 NT-Recruiter
.
The search results appear, followed by a line containing only a period, which identifies the end of the data.
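The following is a rough sketch of how a multi-line reply like this one can be walked line by line until the terminating period. It works on a hard-coded sample string rather than a live socket, and the variable names are illustrative. It does not handle the NNTP convention of doubling a period at the start of a data line, which matters for article bodies more than for subject lists.

```rexx
/* Sketch: walk a multi-line NNTP reply until the lone period */
CRLF = '0d0a'x
reply = '221 subject matches follow.' || CRLF ||,
        '72392 US-CA-San Fran-MGR-OS/2 WARP-Recruiter' || CRLF ||,
        '72491 US-CA-San Fran-QA-OS/2, Testing-Recruiter' || CRLF ||,
        '.' || CRLF

/* The first line is the status line; the data lines follow it */
parse var reply status (CRLF) rest

count = 0
do while rest \== ''
    parse var rest line (CRLF) rest
    if line == '.' then leave          /* a lone period ends the data */
    count = count + 1
    parse var line num subject         /* article number, then the subject */
    say 'article' num':' subject
end
say count 'matches'
```

The full program below takes a different route: it keeps reading from the socket until the data ends with CRLF, period, CRLF, and then peels lines off the front of the reply.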
BODY 72392
Now we ask for the body of the first message.
222 72392 <80652395268020@dice.com> body
SEARCH KEYS: TYPE:MGR TERM:CON W2ONLY STATE:CA AREA:415
POSITION ID: ARCSF.011
DATE POSTED: 12/01/95
POSITION TITLE : PROJECT MANAGER
SKILLS REQUIREMENTS: OS/2 WARP
LOCATION : SAN MATEO
START DATE: 12/20/95
PAY RATE : NEGOTIABLE (+ benefits
LENGTH : 1 year
COMMENTS: Manage large-scale, multi-site rollout of high powered superstation desktops. Very interesting, high profile position.
.
And here it is, again ending with the period.
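Note that this particular body contains "W2ONLY", so with the example command line the program would fetch it and then leave it out of the results. The exclusion test amounts to a single pos() call, sketched here against one line of the sample body:

```rexx
/* Sketch: decide whether to keep an article, given its body text */
ExcludeString = 'W2ONLY'
body = 'SEARCH KEYS: TYPE:MGR TERM:CON W2ONLY STATE:CA AREA:415'

/* pos() returns 0 when the string is not found; the match is case-sensitive */
if ExcludeString = '' | pos(ExcludeString, body) = 0 then
    say 'keep article'
else
    say 'skip article'
```

Because pos() is case-sensitive, a posting that says "w2only" would not be excluded by this test.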
QUIT
We indicate we're finished.
205
The server accepts the command and ends the session.
REXX Program
The complete REXX program follows. You can also get it in zipped form.
/* ============================================================================
   Get Usenet News Articles Matching Search Criteria, Using
   Network News Transfer Protocol RFC977
   (http://www.cis.ohio-state.edu/htbin/rfc/rfc977.html)
   and the Draft Common NNTP Extensions
   (ftp://ftp.internic.net/internet-draft/draft-barber-nntp-imp-01.txt)

   Written by a novice REXX programmer
   Dave Briccetti, December 1995
   daveb@davebsoft.com, http://www.davebsoft.com

   May be used for any purpose
   ============================================================================ */

parse arg NewsGroup SearchString ExcludeString

if NewsGroup = '' | SearchString = '' then
do
    say 'usage:   GetNews NewsGroup SearchString [ExcludeString]'
    say 'example: GetNews ba.jobs.contract OS/2 W2ONLY'
    say '  shows all postings in ba.jobs.contract with OS/2 in the'
    say "  subject line and without the string 'W2ONLY' in the body"
    exit
end

OutFile = 'results.' || NewsGroup  /* Change this if you don't have long file names */
/*OutFile = 'search.txt'*/
NewsServer = 'your.news.server'    /* News server */
SearchField = 'subject'            /* Article header field to search */
TRUE = 1
FALSE = 0
REPLYTYPE_OK = '2'                 /* NNTP reply code first byte */

/* Load the REXX Socket interface */
call RxFuncAdd 'SockLoadFuncs', 'rxSock', 'SockLoadFuncs'
call SockLoadFuncs

/* Delete the output file left over from a previous run */
'@if exist' OutFile 'del' OutFile

if EstablishProtocol() = FALSE then
    exit

/* Get the postings */
call GetPostings socket, NewsGroup, SearchField, ,
    SearchString, ExcludeString, OutFile

/* End the protocol with QUIT */
CmdReply = TransactCommand(socket, 'QUIT', 1, '0d0a'x)

/* Close the socket */
call SockSoClose socket
exit

/* ========================================================================= */
EstablishProtocol:
/* ========================================================================= */
    /* Not a procedure: sets the global variable socket for the main program */
    socket = ConnectToNewsServer(NewsServer)
    if socket <= 0 then
    do
        say 'Could not connect to news server'
        return FALSE
    end

    /* Read and display the server greeting */
    CmdReply = GetCmdReply(socket, '0d0a'x)
    say CmdReply
    if left(CmdReply, 1) \= REPLYTYPE_OK then
    do
        say 'Could not establish protocol'
        return FALSE
    end
    return TRUE

/* ========================================================================= */
GetPostings: procedure
/* ========================================================================= */
    parse arg socket, NewsGroup, SearchField, SearchString, ExcludeString, OutFile

    CRLF = '0d0a'x
    Dot = CRLF || '.' || CRLF      /* Terminator of a multi-line reply */
    REPLYTYPE_OK = '2'             /* NNTP reply code first byte */

    CmdReply = TransactCommand(socket, 'GROUP' NewsGroup, 1, CRLF)
    parse var CmdReply code num first last group
    if left(code, 1) = REPLYTYPE_OK then
    do
        CmdReply = TransactCommand(socket, ,
            'XPAT' SearchField '1- *' || SearchString || '*', 0, Dot)
        if left(CmdReply, 1) \= REPLYTYPE_OK then
        do
            say 'xpat failed'
            return 0               /* FALSE is not visible inside this procedure */
        end

        /* Drop the status line and write the list of matches */
        CmdReply = StripFirstLine(CmdReply)
        call lineout OutFile, CmdReply

        /* Process one match per pass; only the final period line remains at the end */
        do while length(CmdReply) > 5
            line = GetFirstLine(CmdReply)
            if line = '' then
                CmdReply = ''
            else
            do
                CmdReply = StripFirstLine(CmdReply)
                parse var line num rest
                body = TransactCommand(socket, 'BODY' num, 0, Dot)
                if ExcludeString = '' | (pos(ExcludeString, body) = 0) then
                do
                    From = HeaderLine(socket, 'from')
                    call lineout OutFile, From
                    Subject = HeaderLine(socket, 'subject')
                    call lineout OutFile, Subject
                    BodyStripped = StripFirstLine(body)
                    call lineout OutFile, BodyStripped
                end
            end
        end
    end
    return

/* ========================================================================= */
ConnectToNewsServer: procedure
/* ========================================================================= */
    parse arg NewsServer
    socket = 0

    /* Open a socket to the news server. (The Sock* functions are
       documented in the REXX Socket book in the Information folder
       in the OS/2 System folder.) */

    call SockInit
    if SockGetHostByName(NewsServer, 'host.!') = 0 then
        say 'Could not get host by name' errno h_errno
    else
    do
        socket = SockSocket('AF_INET','SOCK_STREAM',0)
        address.!family = 'AF_INET'
        address.!port = 119        /* the standard NNTP port */
        address.!addr = host.!addr
        if SockConnect(socket, 'address.!') = -1 then
            say 'Could not connect socket' errno h_errno
    end
    return socket

/* ========================================================================= */
GetCmdReply: procedure
/* ========================================================================= */
    parse arg socket, EndString

    /* Receive the response to the command into a variable. Use
       more than one socket read if necessary to collect the whole
       response. */

    if SockRecv(socket, 'CmdReply', 1000) < 0 then
    do
        say 'Error reading from socket' errno h_errno
        exit
    end

    /* Keep reading until the reply ends with EndString or MaxParts is reached */
    ReadCount = 1
    MaxParts = 10
    do while ReadCount < MaxParts & right(CmdReply, length(EndString)) \= EndString
        if SockRecv(socket, 'CmdReplyExtra', 1000) < 0 then
        do
            say 'Error reading from socket'
            exit
        end
        CmdReply = CmdReply || CmdReplyExtra
        ReadCount = ReadCount + 1
    end
    return CmdReply

/* ========================================================================= */
TransactCommand:
/* ========================================================================= */
    parse arg socket, Cmd, SayCmd, EndString

    /* Send a command to the NNTP server, echoing it to the display
       if requested */

    if SayCmd then
        say Cmd
    rc = SockSend(socket, Cmd || '0d0a'x)
    reply = GetCmdReply(socket, EndString)
    if SayCmd then
        say reply
    return reply

/* ========================================================================= */
GetFirstLine: procedure
/* ========================================================================= */
    parse arg TextBlock

    /* Return everything up to and including the first line feed */
    p = pos('0a'x, TextBlock)
    if p > 0 then
        line = left(TextBlock,p)
    else
        line = ''
    return line

/* ========================================================================= */
StripFirstLine: procedure
/* ========================================================================= */
    parse arg TextBlock

    /* Return everything after the first line feed */
    p = pos('0a'x, TextBlock)
    if p > 0 then
        StrippedTextBlock = right(TextBlock,length(TextBlock)-p)
    else
        StrippedTextBlock = ''
    return StrippedTextBlock

/* ========================================================================= */
HeaderLine: procedure
/* ========================================================================= */
    parse arg socket, linetype

    CRLF = '0d0a'x
    Dot = CRLF || '.' || CRLF

    /* XHDR with no article number applies to the current article,
       which the preceding BODY command selected */
    XhdrResponse = TransactCommand(socket, 'xhdr' linetype, 0, Dot)
    hl = StripFirstLine(XhdrResponse)  /* Strip off the first line */
    hl = GetFirstLine(hl)              /* Take the first line of what remains */
    hl = delword(hl,1,1)               /* Delete the article number */
    return hl