Into Java - Part XIV: Difference between revisions

From EDM2
{{IntoJava}}
''By [[Simon Grönlund]]''


==Into Java, Part 14==
Streams are important data structures to know of. Not long after the first computers were powered on, directly controlled by switches, terminals cabled to the machines were introduced. Over time, keyboards of many different kinds surfaced and needed to be plugged into the computer, and devices like video screens also needed connections. Today we use files on disks or other media, networks, radio links and quite a few more techniques to receive or send information.


Every time we use some kind of data stream, it is simply a sequence of bytes or characters. Basically any kind of stream is handled the same way; there is no big difference between streaming audio data and a text file stream, except that the text file is locally stored and the audio may spring from a live concert somewhere.


The four basic stream classes in Java effectively shield you from the exact mechanism of how a stream is read or written, mainly because the actual functions differ for source and target types. The most important thing to us is that we can read from or write to a stream.


That hiding of the low level implementation is not only convenient to us, it also gives the low level guys and girls the freedom to change their implementation, as long as the interface, the API, is untouched. Also, new technology can be offered without any need to change our application.


We will however look at what a basic stream is and what we can do with it. Then we will add the convenience classes of Java to the stream and make use of a few such classes. Both input and output streams in Java have a veritable zoo of different add-on classes to pick from.


Further we will use some data structures. Unlike the many data structures added to Java 2, streams have been integral since Java version 1. Finally we will learn another term, <tt>abstract</tt>, and what that implies and means.


A stream is a sequence of bytes or characters. Input streams are read from and output streams are written to. Historically 8-bit bytes are used, but today the 16-bit Unicode characters are becoming more popular and hence Java offers special classes for handling these characters transparently to the programmer.


Consider a pipe providing one byte, either from time to time, as from a keyboard that is used infrequently, or at high speed, as when reading a locally stored file. In both cases only one byte is provided at a time, and you have to remove it to make room for the next one. That is what the most basic stream looks like, which leads us to the <tt>InputStream</tt>. The <tt>OutputStream</tt> is the exact opposite, a pipe that can take one byte at a time, but we will come to that class in a moment.
===InputStream and Reader===
As both the byte streams (<tt>InputStream</tt>) and the character streams (<tt>Reader</tt>) provide almost the same functionality, I discuss them together. Later we will see how the two groups differ in usage. A basic input stream provides these methods:
  void close()                  // abstract in Reader
  void mark(int readlimit)
  void reset()
  long skip(long n)
A method may be left unimplemented in a class, in which case it is declared abstract, as in
public abstract int read()
That makes the entire class abstract. In fact such a class is more of a framework, and in some aspects it is close to an ''interface''. But where interfaces only declare the methods to be used, an abstract class may have one or several methods that in fact '''are''' implemented, and some that are not.


Abstract classes cannot be instantiated, thus they are always subclassed and the subclass(es) then implement the missing method bodies as they want to. This is quite close to overriding methods, only this time implementation is most certainly needed. Such classes may also act as a label on certain available streams so that we know what we can expect from them.
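As a minimal sketch of the idea, we can subclass <tt>InputStream</tt> ourselves and supply only the missing <tt>read()</tt> body. The class names <tt>CountdownStream</tt> and <tt>AbstractDemo</tt> are made up for this illustration, and the code assumes a current Java version (Java 8 or later) rather than the one this series was written for.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical example class, not part of java.io.
class CountdownStream extends InputStream {
    private int next;

    CountdownStream(int start) {
        next = start;
    }

    // The one abstract method of InputStream: hand out the next byte, or -1 at the end.
    @Override
    public int read() {
        return next > 0 ? next-- : -1;
    }
}

class AbstractDemo {
    // Uses the inherited read(byte[]), which InputStream builds on top of read().
    static int fill(byte[] buf) {
        try (InputStream in = new CountdownStream(3)) {
            return in.read(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        int n = fill(buf);
        System.out.println(n + " bytes: " + buf[0] + ", " + buf[1] + ", " + buf[2]);
    }
}
```

Note that <tt>new InputStream()</tt> would not compile, while the subclass gets the array-filling <tt>read</tt> method for free once the single-byte <tt>read()</tt> exists.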


Why is this? It is obvious when looking at the differences between data streams and text streams: one uses <tt>byte</tt> and the other <tt>char</tt>, which leads to two different implementations.


Still, any non-abstract <tt>java.io</tt> subclass of <tt>InputStream</tt> or <tt>Reader</tt> '''has''' the methods mentioned!


In addition to these there are a few methods that differ between the classes. The most important methods are the <tt>read</tt> methods, the basic one returning an <tt>int</tt> (I presume an <tt>int</tt> is chosen since <tt>Reader</tt> may return a value in the range of 0 to 65535 (<tt>0x0000-0xffff</tt>), or -1 if the end of the stream is found), and the other ones filling a provided <tt>byte</tt> or <tt>char</tt> array.
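The usual way to use the basic <tt>read()</tt> is a loop that stops at -1. The sketch below assumes a <tt>StringReader</tt> as a stand-in source and a made-up class name, <tt>ReadLoopDemo</tt>; any other <tt>Reader</tt> would be handled the same way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadLoopDemo {
    // Reads one character at a time until read() signals the end with -1.
    static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c); // safe cast: c is in the range 0 to 65535 here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader("stream")));
    }
}
```

The value is kept in an <tt>int</tt> until it has been compared with -1, and only then cast to <tt>char</tt>.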


<tt>reset()</tt> is used to start over from the place <tt>mark()</tt> was used to put kind of a "book-mark", if the stream provides "book-marks". We may also <tt>skip(long n)</tt> bytes or characters if we want to. When finished reading the stream we <tt>close()</tt> it so the system resources will be returned.
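A small sketch of the "book-mark" methods, assuming a <tt>StringReader</tt> (which does support marks) and a hypothetical class name <tt>MarkResetDemo</tt>:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class MarkResetDemo {
    static String demo() {
        try (Reader in = new StringReader("abcdef")) {
            StringBuilder sb = new StringBuilder();
            sb.append((char) in.read());   // reads 'a'
            if (in.markSupported()) {
                in.mark(16);               // book-mark placed just after 'a'
            }
            sb.append((char) in.read());   // reads 'b'
            sb.append((char) in.read());   // reads 'c'
            in.reset();                    // back to the book-mark
            in.skip(2);                    // jump over 'b' and 'c' this time
            sb.append((char) in.read());   // reads 'd'
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Checking <tt>markSupported()</tt> first is the cautious route, since not every stream provides "book-marks". The try-with-resources statement calls <tt>close()</tt> for us.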


These methods differ in <tt>InputStream</tt> and <tt>Reader</tt> only in the former processing bytes and the latter characters. The characters are represented internally within your Java application in Unicode format, but externally the encoding depends on the underlying system and the actual stream processed.


Now an observation: <tt>read</tt> is a rather dumb method. When no data is available it will sit and wait for more, until data arrives, the end of the stream is detected, or an <tt>IOException</tt> is thrown. Hence, if you read from a stream, your application will freeze if the stream stops for a while, as a stream over the Internet may do. Later we will see how to take care of that.
===OutputStream and Writer===
Just as the input streams have their common methods, so do the output streams, where x is <tt>byte</tt> or <tt>char</tt> respectively:
  void close()                      // abstract in Writer
  void flush()                      // abstract in Writer
  void write(String str)                  // Writer only
  void write(String str, int off, int len) // Writer only
The <tt>write(...)</tt> methods are self-explanatory, as is <tt>close()</tt>. <tt>flush()</tt> tells the system to write the data right away, if buffering is used.
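The effect of <tt>flush()</tt> can be seen in a small sketch, assuming a <tt>BufferedWriter</tt> wrapped around a <tt>StringWriter</tt> so that we can peek at what has actually reached the target. The class name <tt>FlushDemo</tt> is made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class FlushDemo {
    // Returns what the target holds before and after flush().
    static String[] demo() {
        StringWriter target = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(target)) {
            out.write("hello");
            String before = target.toString(); // still empty, the text sits in the buffer
            out.flush();
            String after = target.toString();  // now the text has been passed on
            return new String[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("before flush: [" + r[0] + "], after flush: [" + r[1] + "]");
    }
}
```

<tt>close()</tt> flushes as well, so an explicit <tt>flush()</tt> matters mostly when the stream stays open and the receiver should see the data right away.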


Three pipes from the <tt>System</tt> class are always available: <tt>System.in</tt>, <tt>System.out</tt> and <tt>System.err</tt>. The first two we have used, especially <tt>System.out</tt>. <tt>System.in</tt> is an <tt>InputStream</tt> while the other two are of the <tt>OutputStream</tt> type (more precisely <tt>PrintStream</tt>, a subclass).


However, not all of the methods mentioned so far are implemented, see the abstract box. All these methods will be implemented by the classes we wrap these basic streams up in. Hence we may consider these basic stream classes a plain pipe, and we have to wrap it up in convenience classes if we do not want to do a lot of tiresome coding ourselves.


===Convenience classes===
Below is a table of most of the IO-classes. Remember that a subclass also has the properties of its superclass(es). That is for example, <tt>BufferedInputStream</tt> ''is a'' <tt>FilterInputStream</tt>, that ''is an'' <tt>InputStream</tt>, that ''is an'' <tt>Object</tt>. Hence you may for example use an <tt>InputStream</tt> reference from somewhere. And from that abstract object we may make ourselves an <tt>InputStreamReader</tt> that is a <tt>Reader</tt> so we can make a <tt>BufferedReader</tt>.

I have coloured the base IO classes so it can be easily seen where they may be used in constructors.
{| width="100%" cellpadding="3" bgcolor="White"
| nowrap="NOWRAP" valign="TOP" |
  Class names
   
  '''Object'''
| nowrap="NOWRAP" valign="TOP" |
  Constructor
|-
| nowrap="NOWRAP" valign="TOP" |
  <font color="Red">'''    InputStream'''</font>
           ByteArrayInputStream
           FileInputStream
           FilterInputStream
           ObjectInputStream
           PipedInputStream
| nowrap="NOWRAP" valign="TOP" |
  '''abstract'''
  (byte[] buf)
  (<font color="Fuchsia">File</font>), (String)
  (<font color="Red">InputStream</font>)
  (<font color="Red">InputStream</font>)
  (<font color="Red">InputStream</font>)
  (<font color="Red">InputStream</font>)
  (<font color="Red">InputStream</font>)
  (), (PipedOutputStream)
|-
| nowrap="NOWRAP" valign="TOP" |
  <font color="Blue">'''    OutputStream'''</font>
           ByteArrayOutputStream
           FileOutputStream
           FilterOutputStream
           ObjectOutputStream
           PipedOutputStream
| nowrap="NOWRAP" valign="TOP" |
  '''abstract'''
  (), (int size)
  (<font color="Fuchsia">File</font>), (String)
  (<font color="Blue">OutputStream</font>)
  (<font color="Blue">OutputStream</font>)
  (<font color="Blue">OutputStream</font>)
  (<font color="Blue">OutputStream</font>)
  (), (<font color="Blue">OutputStream</font>)
  (), (PipedInputStream)
|-
| nowrap="NOWRAP" valign="TOP" |
  <font color="Olive">'''    Reader'''</font>
           BufferedReader
               LineNumberReader
           CharArrayReader
           PipedReader
           StringReader
| nowrap="NOWRAP" valign="TOP" |
  '''abstract'''
  (<font color="Olive">Reader</font>)
  (<font color="Olive">Reader</font>)
  (char[] buf)
  abstract (Reader)
  (<font color="Olive">Reader</font>)
  (<font color="Red">InputStream</font>)
  (<font color="Fuchsia">File</font>), (String)
  (), (PipedWriter)
  (String)
|-
| nowrap="NOWRAP" valign="TOP" |
  <font color="#00C11F">'''    Writer'''</font>
           BufferedWriter
           CharArrayWriter
           FilterWriter
           PrintWriter
           StringWriter
| nowrap="NOWRAP" valign="TOP" |
  '''abstract'''
  (<font color="#00C11F">Writer</font>)
  (), (int initialSize)
  abstract (<font color="#00C11F">Writer</font>)
  (<font color="Blue">OutputStream</font>)
  (<font color="Fuchsia">File</font>), (String)
  (), (PipedReader)
  (<font color="Blue">OutputStream</font>), (<font color="#00C11F">Writer</font>)
  (), (int initialSize)
|-
| nowrap="NOWRAP" valign="TOP" |
  <font color="Fuchsia">    File</font>
| nowrap="NOWRAP" valign="TOP" |
  (<font color="Fuchsia">File</font>), (String)
|}
Let us start with <tt>File</tt>, a handy object that represents either a file path and file name, or only a path to a directory. Its implementation differs from one operating system to another, as the Unix path separator / is represented with \ on an OS/2 machine. To play safe, we use the <tt>File.separator</tt> static variable when writing paths. <tt>File</tt> is constructed with only a file name, a complete path, or a path as one string and the file name as another string. The instantiated object can now answer many questions, such as <tt>canRead(), canWrite(), exists(), isDirectory(),</tt> etc. Hence, if you have to create a file yourself, it may be a good idea to instantiate a <tt>File</tt> object.

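A short sketch of a <tt>File</tt> object answering such questions. It assumes a temporary file created only for the demonstration; the prefix <tt>intojava</tt> and the class name <tt>FileDemo</tt> are made up.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class FileDemo {
    // Returns { exists, canRead, isDirectory } for a file addressed as path + separator + name.
    static boolean[] probe() {
        try {
            File tmp = File.createTempFile("intojava", ".txt"); // hypothetical prefix
            tmp.deleteOnExit();
            // Build the path by hand with the platform's own separator.
            String path = tmp.getParent() + File.separator + tmp.getName();
            File same = new File(path);
            return new boolean[] { same.exists(), same.canRead(), same.isDirectory() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("exists=" + r[0] + " canRead=" + r[1] + " isDirectory=" + r[2]);
    }
}
```

Because the path is glued together with <tt>File.separator</tt>, the same code should work whether the separator is / or \.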
Since <tt>read()</tt> can only read one data item at a time, we would have to sit in a loop until, say, the end-of-line (EOL) character arrives, so it is convenient to use classes that have methods like <tt>readLine()</tt>. <tt>BufferedReader</tt> and <tt>LineNumberReader</tt> have such a method, thus when reading text files, one of these two classes is most often used, mainly the <tt>BufferedReader</tt>. We used such a reader in [[Into Java - Part IV|Into Java 4]] and [[Into Java - Part V|No 5]] and we will use <tt>BufferedReader</tt> today too.
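The wrapping chain from a plain <tt>InputStream</tt> to a <tt>BufferedReader</tt> might be sketched as follows. A <tt>ByteArrayInputStream</tt> stands in for the real source, the UTF-8 encoding is an assumption of the example, and the class name <tt>WrapDemo</tt> is made up.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WrapDemo {
    // InputStream (bytes) -> InputStreamReader (characters) -> BufferedReader (lines).
    static List<String> lines(InputStream raw) {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) { // null marks the end of the stream
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        for (String line : lines(new ByteArrayInputStream(data))) {
            System.out.println(line);
        }
    }
}
```

Swapping the <tt>ByteArrayInputStream</tt> for a <tt>FileInputStream</tt> or <tt>System.in</tt> would leave the rest of the code untouched, which is the point of the wrapping.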


===Word frequency counter===
We will make ourselves an application that counts how many times each word of a provided text file is used. Hence we need to read a text file. We will use a data structure that can help us find equal words, and we will write the result to a file.


<font size="-1" face="Helv,Helvetica,Arial">Let us start with</font><font size="-1"><code> File</code></font><font size="-1" face="Helv,Helvetica,Arial">, a handy object that represents either a file path and file name, or only a path to a directory. Its implementation differs from one operating system to another, as the Unix path separator / is represented with \ on an OS/2 machine. To play safe, we use the</font><font size="-1"><code> File.separator </code></font><font size="-1" face="Helv,Helvetica,Arial">static variable when writing paths.</font><font size="-1"><code> File </code></font><font size="-1" face="Helv,Helvetica,Arial">is constructed with only a file name, a complete path, or a path as one string and the file name as another string. The instantiated object can now answer many questions, such as</font><font size="-1"><code> canRead(), canWrite(), exists(), isDirectory(), </code></font><font size="-1" face="Helv,Helvetica,Arial">etc. Hence, if you have to create a file yourself, it may be a good idea to instantiate a</font><font size="-1"><code> File </code></font><font size="-1" face="Helv,Helvetica,Arial">object.<br /></font>
A speedy data structure that can tell objects apart is <tt>Hashtable</tt>: two equal words produce the same hash code, and we may use <tt>containsKey(Object key)</tt> to find out whether a word has been seen before. But where do we store the hits, since a <tt>String</tt> cannot hold a counter? It looks like we have to make ourselves a helper class that holds one word and a counter. Let us start with that one.


Since that helper class is to be used in a <tt>Hashtable</tt> it must implement <tt>hashCode</tt> and <tt>equals</tt>. Fortunately, since we work with <tt>String</tt> objects, each of these methods can simply delegate to the corresponding method of the <tt>String</tt> class. (If we use Java 2 and would like to get a sorted output, we must implement the <tt>java.lang.Comparable</tt> interface, which has only one method to implement, <tt>compareTo(Object other)</tt>. That is because the static <tt>Collections.sort()</tt> method demands that the objects to be sorted support that interface.)


[[Image:java14a.gif]]
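A helper class along the lines described above could be sketched as follows. This is only a minimal sketch: the names <tt>increment</tt>, <tt>getCount</tt> and <tt>getWord</tt> are assumptions made for illustration, and it uses the generic <tt>Comparable&lt;Word&gt;</tt> form of later Java versions rather than <tt>compareTo(Object other)</tt>.

```java
// Hypothetical helper class: one word plus a counter.
// Method names are illustrative assumptions, not taken from the listing.
class Word implements Comparable<Word> {
    private final String word;
    private int count = 1;          // a Word is created the first time it is seen

    Word(String word) {
        this.word = word;
    }

    void increment() {
        count++;
    }

    int getCount() {
        return count;
    }

    String getWord() {
        return word;
    }

    // hashCode and equals simply delegate to the String they wrap.
    @Override
    public int hashCode() {
        return word.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Word && word.equals(((Word) other).word);
    }

    // Needed only for sorted output, e.g. with Collections.sort().
    @Override
    public int compareTo(Word other) {
        return word.compareTo(other.word);
    }

    @Override
    public String toString() {
        return word + ": " + count;
    }

    public static void main(String[] args) {
        Word first = new Word("stream");
        Word second = new Word("stream");
        first.increment();
        System.out.println(first + " equals " + second + "? " + first.equals(second));
    }
}
```

Note that the counter is deliberately left out of <tt>equals</tt> and <tt>hashCode</tt>; two <tt>Word</tt> objects are considered equal when their text is equal, whatever their counts.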


Having this helper class we may continue with the WordCounter class. We will settle for a tiny terminal window version, although it could be extended into a GUI application, using this class as an invisible engine. The class we are making shall have a <tt>Hashtable</tt>, hence we must import the <tt>java.util</tt> package. We must also import <tt>java.io</tt> to get the file readers.


[[Image:java14b.gif]]


As I think I have mentioned, some characters are system dependent, and the line separator (<tt>line.separator</tt>) is one of them. A way to support system independence in Java is to use the variables available through the <tt>System.getProperty</tt> method. There is a list of such variables near that method in the Java API. The first line of <tt>main</tt> makes it available.
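As a small illustration of the system property, the following sketch joins a few lines with whatever separator the platform uses. The class and method names are made up for this example.

```java
class LineSeparatorDemo {
    // Joins the lines using the platform's own line separator,
    // fetched once through System.getProperty.
    static String joinLines(String[] lines) {
        String eol = System.getProperty("line.separator");
        StringBuilder result = new StringBuilder();
        for (String line : lines) {
            result.append(line).append(eol);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(new String[] {"first line", "second line"}));
    }
}
```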


The next task is to get the filename from the input argument. We must assure ourselves that there is a parameter to read; otherwise we notify the user and quit. Once we have a valid input, we instantiate an object of the <tt>WordCount</tt> class type, passing the input argument as a parameter to the constructor. So far we know that we have an argument from the user, but wait, didn't we use <tt>throws IOException</tt> the last time we worked with file reading?


This time we instantiate a handle to the file we want to access even though we are not yet certain if there is a file to read from. Fortunately <tt>File</tt> is an abstract handle to a file and does not need the file to exist, hence we may create ourselves a file handle to use. The next step is to ask the handle if the file <tt>exists</tt>, and if not, tell the user and exit.
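The argument check and the questions put to the <tt>File</tt> handle might look like the sketch below. The class name, the method <tt>describe</tt> and the message texts are assumptions for illustration only.

```java
import java.io.File;

class FileCheckDemo {
    // Asks a File handle about a path; no file on disk is needed to create the handle.
    static String describe(String name) {
        File file = new File(name);
        if (!file.exists()) {
            return name + " does not exist";
        }
        if (file.isDirectory()) {
            return name + " is a directory";
        }
        return name + (file.canRead() ? " is readable" : " is not readable");
    }

    public static void main(String[] args) {
        // Without an argument there is nothing to look up: tell the user and quit.
        if (args.length < 1) {
            System.err.println("Usage: java FileCheckDemo filename");
            return;
        }
        System.out.println(describe(args[0]));
    }
}
```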


A small note: last time I mentioned briefly that using a prime as the initial capacity of a <tt>Hashtable</tt> gives better results. I will not argue further on that, but please note that I [http://www.utm.edu/research/primes/ looked one up] and am using that prime, 2671, which will do for a while. Naturally, if you would like to count a huge file, you will need to increase this to a much bigger prime.


So far you may compile without error and try the error messages. Nothing else will work since we have not done anything to the <tt>count</tt> method.


[[Image:java14c.gif]]


We chose to make the <tt>count</tt> method public, so it may be used by GUI apps. (To do this, you would instantiate an object with the file name as parameter and call <tt>count</tt>. Unfortunately I will not make it quite that easy; you will have to change some lines to redirect the output to a text area.) Now we need a <tt>try/catch</tt> block since <tt>readLine</tt> may fail with an <tt>IOException</tt>, and if so, an error message needs to be printed. This time we use <tt>System.err</tt>, the standard stream to print error messages to. By default it ends up in the same place as <tt>System.out</tt>, but it might very well be redirected to a log file or any other stream.


We use the <tt>File</tt> handle to make a <tt>FileReader</tt>, which is used to make a <tt>BufferedReader</tt>, a convenience class that has some useful methods like <tt>readLine</tt>, which are preferred to the low level methods of <tt>FileReader</tt>. Next, we start reading the file line by line as long as there are more lines to read. Please, note the parentheses in the <tt>while</tt> clause.


Every line read is sent to <tt>processLine</tt>, which simply wraps a <tt>StringTokenizer</tt> around the line. This useful piece of code is located in <tt>java.util</tt> and gives you tokens delimited by blanks, or any other whitespace, if you do not specify your own delimiters. As long as the line has more tokens (words) the <tt>while</tt> loop continues. Finally we <tt>close</tt> the file. This time it is not strictly needed since we are only playing with it, but in the future you might be working on networks where you should be more polite.


Recall that a <tt>Hashtable</tt> needs two things, a <tt>key</tt> to map from, and the <tt>value</tt> to store. They need not be the same thing, and in this case they are not: we use the words as keys to the table, but we store the <tt>Word</tt> instances made out of the actual tokens.


We use the token as a <tt>key</tt> and want to know if it is already stored in the table; if so, we increment the word count. Otherwise we make a new <tt>Word</tt> instance, use the token as <tt>key</tt> and <tt>put</tt> the object as <tt>value</tt> in the table. In the end, the file will be read and all the words <tt>put</tt> into the table and counted.
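The reading and counting steps described above can be sketched roughly as follows. This is a hedged sketch and not the listing itself: the nested class <tt>Entry</tt> is a hypothetical stand-in for the <tt>Word</tt> helper, and the method takes any <tt>Reader</tt> (an assumption made so the example runs without a file on disk) where the application would pass a <tt>FileReader</tt>.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Hashtable;
import java.util.StringTokenizer;

class CountSketch {
    // Hypothetical stand-in for the Word helper: a word and its number of hits.
    static class Entry {
        final String word;
        int hits = 1;

        Entry(String word) {
            this.word = word;
        }
    }

    static Hashtable<String, Entry> count(Reader source) throws IOException {
        // 2671 is the prime the article uses as initial capacity.
        Hashtable<String, Entry> table = new Hashtable<String, Entry>(2671);
        BufferedReader in = new BufferedReader(source);
        try {
            String line;
            // The inner parentheses make the assignment happen before the test for null.
            while ((line = in.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace
                while (tokens.hasMoreTokens()) {
                    String token = tokens.nextToken();
                    if (table.containsKey(token)) {
                        table.get(token).hits++;            // seen before: count it
                    } else {
                        table.put(token, new Entry(token)); // first occurrence
                    }
                }
            }
        } finally {
            in.close();
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        Hashtable<String, Entry> table =
                count(new StringReader("to be or not to be\nthat is the question"));
        System.out.println("to: " + table.get("to").hits);
        System.out.println("The table size is: " + table.size());
    }
}
```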


Now we will continue with the "more code to come" part. What to do with the output? I have made two versions available, one for Java version 2 (that is Java 1.2 and above) and the one actually used here for the prior versions. If you use a later flavour of Java, please remove the appropriate lines and make a few changes to the code as explained, both in this class and in the <tt>Word</tt> class where <tt>implements Comparable</tt> must be visible.


This time we get ourselves an <tt>Enumeration</tt>, that is an abstraction of any data structure that is enumerable. This interface has two methods, <tt>public boolean hasMoreElements()</tt> and <tt>public Object nextElement()</tt> that operate on the underlying structure.


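Using such interfaces hides the actual data structure and you may conveniently change from a <tt>Vector</tt> to an <tt>ArrayList</tt> (Java 1.2) without too much work. For example, there is not a single line to be changed in the <tt>while</tt> loop, <tt>Enumeration</tt> still works; only the line that obtains the <tt>Enumeration</tt> changes, since an <tt>ArrayList</tt> hands one out through <tt>Collections.enumeration</tt> rather than <tt>elements()</tt>.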
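To illustrate how the loop stays the same whatever structure is behind it, here is a small sketch written for later Java versions with generics. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.List;
import java.util.Vector;

class EnumerationDemo {
    // Adds up the elements; the loop neither knows nor cares where they are stored.
    static int sum(Enumeration<Integer> items) {
        int total = 0;
        while (items.hasMoreElements()) {
            total += items.nextElement();
        }
        return total;
    }

    public static void main(String[] args) {
        Vector<Integer> vector = new Vector<Integer>();
        List<Integer> list = new ArrayList<Integer>();
        Hashtable<String, Integer> table = new Hashtable<String, Integer>();
        for (int i = 1; i <= 3; i++) {
            vector.addElement(i);
            list.add(i);
            table.put("key" + i, i);
        }
        // Three different structures, one and the same loop.
        System.out.println(sum(vector.elements()));
        System.out.println(sum(table.elements()));
        System.out.println(sum(Collections.enumeration(list)));
    }
}
```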


[[Image:java14d.gif]]


The very next thing is to get ourselves an output filename. We use a <tt>StringBuffer</tt>, which avoids the overhead of creating a new object for every change (as concatenating <tt>String</tt>s does) and has many useful methods. Since we do not know if we can lengthen the filename (maybe you use 8.3 FAT) we alter the first two characters. Another option would be to change the file extension, if we know there is one.


Another <tt>try/catch</tt> block encompasses a new <tt>FileWriter</tt> that is set not to append to an existing file, if there is one, but to overwrite it. If you instead have, for example, a log file that is added to once in a while, you instantiate the <tt>FileWriter</tt> object with the boolean <tt>true</tt> as second argument, and it will append to the end of the existing log file.
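The difference between overwriting and appending can be tried out with a sketch like the one below. It writes to a temporary file, and the helper names <tt>write</tt> and <tt>readAll</tt> are assumptions for this example only.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class AppendDemo {
    // Writes text to the file; append == true adds to the end, false starts over.
    static void write(File target, String text, boolean append) throws IOException {
        FileWriter out = new FileWriter(target, append);
        try {
            out.write(text);
        } finally {
            out.close();
        }
    }

    // Reads the whole file back, one character at a time.
    static String readAll(File source) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(source));
        StringBuilder content = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                content.append((char) c);
            }
        } finally {
            in.close();
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        File log = File.createTempFile("append-demo", ".txt");
        log.deleteOnExit();
        write(log, "first", false);
        write(log, " second", true);      // appended after "first"
        System.out.println(readAll(log));
        write(log, "fresh", false);       // previous content is replaced
        System.out.println(readAll(log));
    }
}
```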


The while loop is mainly self-explanatory due to the narrative method names. Please note how convenient the <tt>toString</tt> method may be from time to time. <tt>toString</tt> is not only a good debugging method, it can serve our purposes this time as well. It does not provide splendid output but it is speedily implemented &lt;grin&gt;.


Finally the output file is closed and the <tt>count</tt> method is finished. Compile and go for it. Optionally you may add a
 System.out.println("The table size is: " + table.size());
to the method and you will see how close to the prime we got.


===Summary===
This time we used a speedy data structure that is good for quick input and lookup. The object was to count words, and we expected to find a lot of duplicates and in fact, we were not interested in a sorted structure. At least not from the beginning. Hashtable is very good for such tasks. Java 2 offers a method to sort our data structures as well.


The <tt>put</tt> and lookup (<tt>containsKey</tt>) operations do not depend on the size of the table; on average the time stays constant as the table grows.


Further we found that it was not hard to get a list of the contents: we used an <tt>Enumeration</tt> and found a nice interface, having only two methods.


We have touched on the useful <tt>File</tt> class, a handle to a file, existing or not. Instances from <tt>File</tt> may be used with some other classes, as we did.


<tt>BufferedReader</tt> encapsulating a <tt>FileReader</tt> showed itself to contain many useful methods. Reading a stream always follows the same pattern:
*open the stream
*read some data
*process the data
*repeat until no more data is available
*close the stream


Finally we used <tt>FileWriter</tt> to print the results, and as with input streams, the output streams follow the same procedure:
*open the stream
*write some data
*repeat until finished
*close the stream
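Both patterns can be seen together in a sketch like the following, which reads lines from one stream and writes them in upper case to another. It assumes Java 7 or later for the <tt>try</tt>-with-resources form, and the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class CopyLines {
    // Copies every line from source to target in upper case; returns the line count.
    static int copy(Reader source, Writer target) throws IOException {
        int lines = 0;
        // Open both streams; they are closed automatically when the block is left.
        try (BufferedReader in = new BufferedReader(source);
             BufferedWriter out = new BufferedWriter(target)) {
            String line;
            while ((line = in.readLine()) != null) {   // read, repeat until no more data
                out.write(line.toUpperCase());         // process, then write
                out.newLine();                         // platform line separator
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        int lines = copy(new StringReader("open\nread\nclose"), result);
        System.out.print(result);
        System.out.println(lines + " lines copied");
    }
}
```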


Next time we will continue with streams and look into Exceptions. We have used the <tt>try/catch</tt> mechanism before but there is more to say about that. Exception handling is one of the bigger strengths of Java, and using it in a good, sensible way makes your apps reliable and robust.


I must also announce that I have to make this column much shorter in the future, mostly because my time is limited and I do not have unlimited strength. Still I will try to do my best, and I hope you enjoy future instalments.


[[Category:Languages Articles]]
[[Category:Java Articles]]

Latest revision as of 14:56, 21 March 2018


By Simon Grönlund

Into Java, Part 14

Streams are important data structures to know of. Not long after the first computers were powered on, directly controlled by switches, terminals cabled to the machines were introduced. Over time, keyboards of many different kinds surfaced and needed to be plugged into the computer, and devices like video screens also needed connections. Today we use files on disks or other media, networks, radio links and quite a few more techniques to receive or send information.

Every time we use some kind of data stream, it is simply a sequence of bytes or characters. Basically any kind of stream is handled the same way; there is no big difference between streaming audio data and a text file stream, except that the text file is locally stored and the audio may spring from a live concert somewhere.

The four basic stream classes in Java effectively shield you from the exact mechanism of how a stream is read or written, mainly because the actual functions differ for source and target types. The most important thing to us is that we can read or write to a stream.

That hiding of the low level implementation is not only convenient to us, it also gives the low level guys and girls the freedom to change their implementation, as long as the interface, the API, is untouched. Also, new technology can be offered without any need to change our application.

We will however look at what a basic stream is and what we can do with it. Then we will add the convenience classes of Java to the stream and make use of a few such classes. Both input and output streams in Java have a veritable zoo of different add-on classes to pick from.

Further we will use some data structures. Unlike the many data structures added to Java 2, streams have been integral since Java version 1. Finally we will learn another term, abstract, and what that implies and means.

A stream is a sequence of bytes or characters. Input streams are read from and output streams are written to. Historically 8-bit bytes are used, but today the 16-bit Unicode characters are becoming more popular and hence Java offers special classes for handling these characters transparently to the programmer.

Consider a pipe providing one byte, either from time to time as from a keyboard that is used infrequently, or at a high speed from reading a locally stored file. In both cases, there is only one byte at a time provided, you have to remove it to give room for the next one. That is what the most basic stream looks like, which leads us to the InputStream. The OutputStream is the exact opposite, a pipe that can take one byte at a time, but we will come to that class in a moment.

InputStream and Reader

As both the byte streams (InputStream) and the character streams (Reader) provide almost the same functionality, I discuss them together. Later we will see how the two groups differ in usage. A basic input stream provides these methods:

void close()                  // abstract in Reader
void mark(int readLimit)
int read()                    // abstract in InputStream
int read(char[] buf)          // byte[] buf in InputStream
int read(char[] buf, int offset, int length) // abstract in Reader
void reset()
long skip(long n)             // returns the number actually skipped

A method that is not implemented in a specific class is declared abstract there, as in

public abstract int read()

That makes the entire class abstract. In fact such a class is more of a framework, and in some aspects it is close to an interface. But where interfaces only declare the methods to be used, an abstract class may have one or several methods that in fact are implemented, and some that are not.

Abstract classes cannot be instantiated, thus they are always subclassed and the subclass(es) then implement the missing method bodies as they want to. This is quite close to overriding methods, only this time implementation is most certainly needed. Such classes may also act as a label on certain available streams so that we know what we can expect from them.
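
To make this concrete, here is a minimal sketch of a subclass that supplies the one missing method body of InputStream. The class name AlphabetInputStream is made up for this illustration and is not part of java.io.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class: a stream that delivers the letters 'a' to 'z' once.
class AlphabetInputStream extends InputStream {
    private int next = 'a';

    // read() is the only abstract method of InputStream, so we must give it a body.
    @Override
    public int read() throws IOException {
        if (next > 'z') {
            return -1;          // end of stream
        }
        return next++;          // one byte, returned as an int
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new AlphabetInputStream();
        byte[] buf = new byte[5];
        // read(byte[]) is inherited: InputStream implements it on top of read().
        int n = in.read(buf);
        System.out.println(n + " bytes: " + new String(buf, 0, n, "US-ASCII"));
        in.close();
    }
}
```

Note that we only wrote read(). The array versions, skip() and close() come for free from the abstract superclass.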

Why is this? It is obvious when looking at the difference between data streams and text streams: one uses byte and the other char, which leads to two different implementations.

Still, any non-abstract java.io subclass of InputStream or Reader has the methods mentioned!

In addition to these there are a few methods that differ between the classes. The most important methods are the read methods. The basic one returns an int. I presume an int is chosen because Reader must be able to return any value in the range 0 to 65535 (0x0000-0xffff), plus -1 when the end of the stream is reached. The other ones fill a provided array, or a part of it, and return the number of bytes or characters actually read, or -1 at the end of the stream.
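
A small sketch of both styles of reading may help. The class and method names (ReadDemo, countBytes, readAll) are made up for the example, and in-memory streams are used so that it runs without any file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

class ReadDemo {
    // Reads one byte at a time until read() reports the end of the stream with -1.
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        int b;
        while ((b = in.read()) != -1) {
            count++;
        }
        return count;
    }

    // Fills a small char array again and again; read(char[]) returns
    // how many characters it stored, or -1 at the end of the stream.
    static String readAll(Reader in) throws IOException {
        StringBuffer text = new StringBuffer();
        char[] buf = new char[4];
        int n;
        while ((n = in.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(new ByteArrayInputStream(new byte[] {1, 2, 3})));
        System.out.println(readAll(new StringReader("Into Java")));
    }
}
```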

reset() returns to the place where mark() put a kind of "book-mark", provided the stream supports "book-marks" (markSupported() will tell you). We may also skip(long n) bytes or characters if we want to. When finished reading the stream we close() it so the system resources will be returned.
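
The following sketch shows the book-mark methods at work on a StringReader, which is one of the streams that supports them. MarkResetDemo and demo are names invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class MarkResetDemo {
    static String demo() throws IOException {
        Reader in = new StringReader("abcdef");
        StringBuffer seen = new StringBuffer();

        seen.append((char) in.read());   // 'a'
        if (in.markSupported()) {
            in.mark(10);                 // book-mark placed just before 'b'
        }
        seen.append((char) in.read());   // 'b'
        in.skip(2);                      // skips 'c' and 'd'
        seen.append((char) in.read());   // 'e'
        in.reset();                      // back to the book-mark
        seen.append((char) in.read());   // 'b' again
        in.close();                      // hand the resources back
        return seen.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());      // prints "abeb"
    }
}
```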

These methods differ in InputStream and Reader only in the former processing bytes and the latter characters. The characters are represented internally within your Java application in Unicode format, but externally the encoding depends on the underlying system and the actual stream processed.

Now an observation: read is a rather dumb method. When no data is available it simply sits and waits until more data arrives, the end of the stream is detected, or an IOException is thrown. Hence, if you read from a stream, your application will freeze if the stream stops for a while, as a stream over the Internet may do. Later we will see how to take care of that.

OutputStream and Writer

Just as the input streams have their common methods, the output streams do as well. Here x is byte or char respectively:

void close()                       // abstract in Writer
void flush()                       // abstract in Writer
void write(int x)                  // abstract in OutputStream
void write(x[] buf)
void write(x[] buf, int off, int len)    // abstract in Writer
void write(String str)                   // Writer only
void write(String str, int off, int len) // Writer only

The write(...) methods are self-explanatory, as is close(). flush() tells the system to write the data right away, if buffering is used.
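
What flush() does is easiest to see with a buffered stream. In this sketch (FlushDemo and lengthsAroundFlush are names made up for the illustration) a BufferedWriter sits on top of a StringWriter, and we look at how much has reached the StringWriter before and after the flush.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class FlushDemo {
    // Returns the number of characters that reached the target
    // before and after flush() was called.
    static int[] lengthsAroundFlush(String text) throws IOException {
        StringWriter target = new StringWriter();
        BufferedWriter out = new BufferedWriter(target);

        out.write(text);                           // lands in the buffer only
        int before = target.toString().length();
        out.flush();                               // now the buffer is passed on
        int after = target.toString().length();
        out.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] lengths = lengthsAroundFlush("hello");
        System.out.println("before flush: " + lengths[0] + ", after flush: " + lengths[1]);
    }
}
```

With a short text like this one the buffer is far from full, so nothing is passed on until we ask for it. close() flushes as well, which is why forgetting flush() rarely hurts as long as the stream is closed properly.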

Three pipes from the System class are always available: System.in, System.out and System.err. The first two we have used, especially System.out. System.in is an InputStream, while the other two are PrintStream objects, and a PrintStream is a kind of OutputStream.

However, not all of the methods mentioned so far are implemented in the basic classes; see the notes on abstract above. All these methods will be implemented by the classes we wrap these basic streams up in. Hence we may consider these basic stream classes a plain pipe, and we have to wrap it up in convenience classes if we do not want to do a lot of tiresome coding ourselves.

Convenience classes

Below is a table of most of the IO classes. Remember that a subclass also has the properties of its superclass(es). For example, a BufferedInputStream is a FilterInputStream, which is an InputStream, which is an Object. Hence you may, for example, be handed an InputStream reference from somewhere. From that abstract object we may make ourselves an InputStreamReader, which is a Reader, so we can make a BufferedReader.
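
The wrapping just described can be sketched like this. WrapDemo and firstLine are invented names, and a ByteArrayInputStream stands in for the InputStream we were handed from somewhere.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class WrapDemo {
    // Takes any InputStream and wraps it until readLine() becomes available.
    static String firstLine(InputStream raw) throws IOException {
        InputStreamReader chars = new InputStreamReader(raw);   // bytes -> characters
        BufferedReader in = new BufferedReader(chars);          // adds readLine()
        try {
            return in.readLine();
        } finally {
            in.close();     // closing the outermost wrapper closes the whole chain
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first line\nsecond line\n".getBytes();
        System.out.println(firstLine(new ByteArrayInputStream(data)));
    }
}
```

Since firstLine only asks for an InputStream, the same method works unchanged on a FileInputStream or on System.in.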

I have coloured the base IO classes so it can be easily seen where they may be used in constructors.

Class names                          Constructor

Object
    InputStream                      abstract
        ByteArrayInputStream         (byte[] buf)
        FileInputStream              (File), (String)
        FilterInputStream            (InputStream)
            BufferedInputStream      (InputStream)
            DataInputStream          (InputStream)
            PushbackInputStream      (InputStream)
        ObjectInputStream            (InputStream)
        PipedInputStream             (), (PipedOutputStream)
    OutputStream                     abstract
        ByteArrayOutputStream        (), (int size)
        FileOutputStream             (File), (String)
        FilterOutputStream           (OutputStream)
            BufferedOutputStream     (OutputStream)
            DataOutputStream         (OutputStream)
            PrintStream              (OutputStream)
        ObjectOutputStream           (), (OutputStream)
        PipedOutputStream            (), (PipedInputStream)
    Reader                           abstract
        BufferedReader               (Reader)
            LineNumberReader         (Reader)
        CharArrayReader              (char[] buf)
        FilterReader                 abstract (Reader)
            PushbackReader           (Reader)
        InputStreamReader            (InputStream)
            FileReader               (File), (String)
        PipedReader                  (), (PipedWriter)
        StringReader                 (String)
    Writer                           abstract
        BufferedWriter               (Writer)
        CharArrayWriter              (), (int initialSize)
        FilterWriter                 abstract (Writer)
        OutputStreamWriter           (OutputStream)
            FileWriter               (File), (String)
        PipedWriter                  (), (PipedReader)
        PrintWriter                  (OutputStream), (Writer)
        StringWriter                 (), (int initialSize)
    File                             (String), (String, String), (File, String)

Let us start with File, a handy object that represents either a file path and file name, or only a path to a directory. Its implementation differs from one operating system to another, as the Unix path separator / is represented with \ on an OS/2 machine. To play safe, we use the File.separator static variable when writing paths. File is constructed with only a file name, a complete path, or a path as one string and the file name as another string. The instantiated object can now answer many questions, such as canRead(), canWrite(), exists(), isDirectory(), etc. Hence, if you have to create a file yourself, it may be a good idea to instantiate a File object.
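
Here is a small sketch of a File object answering such questions. FileDemo and describe are names made up for the example, and the "docs" directory and file name are purely hypothetical.

```java
import java.io.File;

class FileDemo {
    // Lets the File object answer a few questions about itself.
    static String describe(File f) {
        if (!f.exists()) {
            return f.getPath() + " does not exist";
        }
        if (f.isDirectory()) {
            return f.getPath() + " is a directory";
        }
        return f.getPath() + " is a file, readable: " + f.canRead()
                + ", writable: " + f.canWrite();
    }

    public static void main(String[] args) {
        // A path written with File.separator works on any operating system.
        File byPath = new File("docs" + File.separator + "no-such-file.txt");
        // The same file, given as a path and a file name in two strings.
        File byParts = new File("docs", "no-such-file.txt");

        System.out.println(describe(byPath));
        System.out.println(byPath.getPath().equals(byParts.getPath()));
        System.out.println(describe(new File(System.getProperty("user.dir"))));
    }
}
```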

Since read() can only read one byte or character at a time, sitting in a loop until maybe the end-of-line (EOL) character arrives, it seems convenient to use classes that have methods like readLine(). BufferedReader and LineNumberReader have such methods, thus when reading text files one of these two classes is most often used, mainly the BufferedReader. We used such a reader in Into Java, Parts 4 and 5, and we will use BufferedReader today too.

Word frequency counter

We will make ourselves an application that counts how many times each word in a given text file is used. Hence we need to read a text file. We will use a data structure that can help us find equal words and we will write the result to a file.

A speedy data structure that provides a way to tell the difference between objects is Hashtable. Two equal words will produce the same hash code, and we may use containsKey(Object key) to find out if a word has been seen before. But where do we store the hits, since a String cannot hold hits? It looks like we have to make ourselves a helper class that holds one word and a counter. Let us start with that one.

Since that helper class is to be used in a Hashtable it must implement hashCode and equals. On the other hand, since we work with String objects, we may use the methods of that class and we will just do a call-back to the String class' methods on each of the methods mentioned. (If we use Java 2 and would like to get a sorted output, we must implement the java.lang.Comparable interface, and that has only one method to implement, compareTo(Object other). That is because the static Collections.sort() method demands that the objects to be sorted support that interface.)
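
A helper class along these lines might look like the sketch below. Only the class name Word is taken from the text; the method names increment and getCount are assumptions, and the generic Comparable<Word> is a convenience of later Java versions (with Java 1.2 the method would be compareTo(Object other) with a cast inside).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Holds one word and the number of times it has been seen.
class Word implements Comparable<Word> {
    private final String word;
    private int count = 1;      // a Word is created the first time it is seen

    Word(String word) {
        this.word = word;
    }

    void increment() {          // assumed name, not confirmed by the article
        count++;
    }

    int getCount() {            // assumed name, not confirmed by the article
        return count;
    }

    // hashCode, equals and compareTo simply hand the question on to String.
    @Override
    public int hashCode() {
        return word.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Word && word.equals(((Word) other).word);
    }

    @Override
    public int compareTo(Word other) {
        return word.compareTo(other.word);
    }

    @Override
    public String toString() {
        return word + ": " + count;
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<Word>();
        words.add(new Word("stream"));
        words.add(new Word("byte"));
        words.get(0).increment();
        Collections.sort(words);        // works because Word is Comparable
        System.out.println(words);      // prints "[byte: 1, stream: 2]"
    }
}
```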

Having this helper class we may continue with the WordCounter class. We will settle for a tiny terminal window version, although it could be increased into a GUI application, using this class as an invisible engine. The class we are making shall have a Hashtable, hence we must import the java.util package. We must also import java.io to get the file readers.

I think I have mentioned that there are system dependent characters and line.separator is one. A way to support system independence in Java is to use the variables available through the System.getProperty method. There is a list of such variables near that method in the Java API. The first line of main makes it valid.
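
As a small illustration, this is how the line separator can be fetched once and reused. SeparatorDemo, EOL and twoLines are names chosen for this sketch only.

```java
class SeparatorDemo {
    // Ask the system once which character(s) end a line on this machine.
    static final String EOL = System.getProperty("line.separator");

    // Builds two lines of text without hard-coding "\n" or "\r\n".
    static String twoLines(String first, String second) {
        return first + EOL + second + EOL;
    }

    public static void main(String[] args) {
        System.out.print(twoLines("first line", "second line"));
        System.out.println("The separator is " + EOL.length() + " character(s) long");
    }
}
```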

The next task is to get the filename from the input argument. We must assure ourselves that there is a parameter to read, else we notify the user and quit. Once we have a valid input, we instantiate an object of the WordCounter class type, passing the input argument as a parameter to the constructor. So far we know that we have a valid argument from the user, but wait, didn't we use throws IOException the last time we worked with file reading?

This time we instantiate a handle to the file we want to access even though we are not yet certain if there is a file to read from. Fortunately File is an abstract handle to a file and does not need a file, hence we may create ourselves a file handle to use. The next step is to ask the handle if the file exists, and if not, tell the user and exit.
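
The two checks can be sketched as follows. This is not the listing of the WordCounter class itself: the class InputCheck, the method validate and the wording of the messages are assumptions made for the illustration.

```java
import java.io.File;

class InputCheck {
    // Returns an error message, or null when the argument names an existing file.
    static String validate(String[] args) {
        if (args.length < 1) {
            return "Usage: java WordCounter <text file>";
        }
        File file = new File(args[0]);      // only a handle, nothing is opened yet
        if (!file.exists()) {
            return "File not found: " + args[0];
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);    // tell the user and quit
            System.exit(1);
        }
        System.out.println("Ready to count the words of " + args[0]);
    }
}
```

No IOException can occur here, since creating a File object and asking whether the file exists does not touch the contents of the file.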

A small note, last time I mentioned briefly that using primes as the starting value for Hashtable gives better results. I will not argue further on that but please note that I looked one up and am using that prime, 2671, and that will be used for a while. Naturally, if you would like to count a huge file, you will need to increase this to a much bigger prime.

So far you may compile without error and try the error messages. Nothing else will work since we have not done anything to the count method.

We chose to have the count method public, thus it may be used by GUI apps. (To do this, you would instantiate an object with the file name as parameter and call count. Unfortunately I will not make it quite that easy, you will have to change some lines to redirect the output to a text area.) Now we need a try/catch block since readLine may go weird, and if so, an error message needs to be printed. This time we use the System.err that is the standard pipe to print error messages to. At this point it is still equal to System.out, but it might very well be redirected to a log file or any other stream.

We use the File handle to make a FileReader, which is used to make a BufferedReader, a convenience class that has some useful methods like readLine, which are preferred to the low level methods of FileReader. Next, we start reading the file line by line as long as there are more lines to read. Please, note the parentheses in the while clause.
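
A sketch of that reading loop follows. To keep it short it only counts the lines, and main first writes a small temporary file to read from; LineReaderDemo and countLines are invented names.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class LineReaderDemo {
    static int countLines(File file) {
        int lines = 0;
        try {
            BufferedReader in = new BufferedReader(new FileReader(file));
            String line;
            // The inner parentheses make the assignment happen first;
            // only then is the result compared with null (end of file).
            while ((line = in.readLine()) != null) {
                lines++;
            }
            in.close();
        } catch (IOException e) {
            System.err.println("Could not read " + file + ": " + e.getMessage());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("lines", ".txt");
        tmp.deleteOnExit();
        FileWriter out = new FileWriter(tmp);
        out.write("one\ntwo\nthree\n");
        out.close();
        System.out.println(countLines(tmp));    // prints 3
    }
}
```

Without the inner parentheses the comparison would be done first, and the compiler would refuse to assign its boolean result to a String.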

Every line read is sent to processLine which simply wraps a StringTokenizer around the line. This useful piece of code is located in java.util and gives you tokens delimited by blanks, or any other whitespace, if you do not specify your own delimiters. As long as the line has more tokens (words) the while loop continues. Finally we close the file. This time it is not strictly needed since we are only playing with it, but in the future you might be working on networks where you should be more polite.
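
The tokenizer on its own can be tried out with a sketch like this one, where TokenDemo and tokens are made-up names.

```java
import java.util.StringTokenizer;
import java.util.Vector;

class TokenDemo {
    // Splits a line at whitespace and collects the tokens.
    static Vector<String> tokens(String line) {
        Vector<String> result = new Vector<String>();
        // No delimiters given: space, tab, newline, carriage return and form feed are used.
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("to be  or\tnot to be"));
        // prints "[to, be, or, not, to, be]"
    }
}
```

Several delimiters in a row do not produce empty tokens. Punctuation, on the other hand, stays glued to the words unless you pass your own delimiter string as a second argument to the constructor.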

Recall that a Hashtable needs two things, a key to map from, and the value to store. They need not be the same thing, as in this case. We use the words as keys to the table, but we store the Word instances made out of the actual tokens.

We use the token as a key and want to know if it is already stored in the table, if so we increment the word count. Else we make a new Word instance, use the token as key and put the object as value in the table. In the end, the file will be read and all the words put into the table and counted.
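
In code, the logic of the last two paragraphs could look like the sketch below. It carries its own stripped-down Word holder so that it can be compiled on its own; TableDemo, add, countOf and size are names invented for the illustration.

```java
import java.util.Hashtable;

class TableDemo {
    // Stripped-down holder for one word and its counter, for this sketch only.
    static class Word {
        final String text;
        int count = 1;

        Word(String text) {
            this.text = text;
        }
    }

    // The key is the token itself, the value is the Word made out of it.
    // 2671 is the prime mentioned above as the starting size of the table.
    private final Hashtable<String, Word> table = new Hashtable<String, Word>(2671);

    void add(String token) {
        if (table.containsKey(token)) {
            table.get(token).count++;            // seen before: one more hit
        } else {
            table.put(token, new Word(token));   // first time: store a new Word
        }
    }

    int countOf(String token) {
        Word w = table.get(token);
        return w == null ? 0 : w.count;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        TableDemo demo = new TableDemo();
        String[] tokens = {"to", "be", "or", "not", "to", "be"};
        for (int i = 0; i < tokens.length; i++) {
            demo.add(tokens[i]);
        }
        System.out.println("to: " + demo.countOf("to") + ", or: " + demo.countOf("or")
                + ", different words: " + demo.size());
    }
}
```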

Now we will continue with the "more code to come" part. What to do with the output? I have made two versions available, one for Java version 2 (that is Java 1.2 and above) and the one actually used here for the prior versions. If you use a later flavour of Java, please remove the appropriate lines and make a few changes to the code as explained, both in this class and in the Word class where implements Comparable must be visible.

This time we get ourselves an Enumeration, that is an abstraction of any data structure that is enumerable. This interface has two methods, public boolean hasMoreElements() and public Object nextElement() that operate on the underlying structure.

Using such interfaces hides the actual data structure and you may conveniently change from a Vector to an ArrayList (Java 1.2) without too much work. For example, there is not a single line to be changed in the while loop, Enumeration still works.
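
The sketch below shows one loop that is fed first from a Hashtable and then from a Vector, both of which hand out an Enumeration through their elements() method. EnumerationDemo and sum are made-up names, and the generics and auto-unboxing belong to later Java versions than the ones discussed here.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

class EnumerationDemo {
    // The loop neither knows nor cares which data structure is behind the Enumeration.
    static int sum(Enumeration<Integer> e) {
        int total = 0;
        while (e.hasMoreElements()) {
            total += e.nextElement();
        }
        return total;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<String, Integer>();
        table.put("to", 2);
        table.put("be", 2);
        table.put("or", 1);

        Vector<Integer> vector = new Vector<Integer>();
        vector.add(2);
        vector.add(2);
        vector.add(1);

        System.out.println(sum(table.elements()));   // prints 5
        System.out.println(sum(vector.elements()));  // prints 5
    }
}
```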

The very next thing is to get ourselves an output filename. We use a StringBuffer, which avoids the overhead of the intermediate String objects that repeated concatenation creates, and it has many useful methods. Since we do not know if we can lengthen the filename (maybe you use 8+3 FAT) we alter the first two characters. Another option would be to change the file extension, if we know there is one.
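
One way to alter the first two characters is sketched here. The replacement characters "wc" and the names OutputNameDemo and outputName are arbitrary choices for the example, and the method assumes it is given a bare file name without any directory part.

```java
class OutputNameDemo {
    // Replaces the first two characters so that the length of the name stays the same.
    static String outputName(String inputName) {
        StringBuffer name = new StringBuffer(inputName);
        if (name.length() >= 2) {
            name.setCharAt(0, 'w');
            name.setCharAt(1, 'c');
        } else {
            name.insert(0, "wc");   // too short to alter, so we lengthen it instead
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(outputName("story.txt"));    // prints "wcory.txt"
    }
}
```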

Another try/catch block encompasses a new FileWriter that is set to not append to an existing file if there is one. For example, if you have a log file that is added to once in a while, you instantiate a FileWriter object with boolean true, and it will append to the end of the existing log file.
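
The difference between the two modes shows in this sketch, which writes to a temporary file three times. AppendDemo, write and contents are names invented for the example. The contents helper joins the lines with a | so the result fits on one line.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class AppendDemo {
    // append == true adds to the end of the file, false starts the file afresh.
    static void write(File file, String text, boolean append) throws IOException {
        FileWriter out = new FileWriter(file, append);
        out.write(text);
        out.close();
    }

    // Reads the file back, marking the end of every line with a '|'.
    static String contents(File file) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        StringBuffer text = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            text.append(line).append('|');
        }
        in.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        File log = File.createTempFile("append", ".log");
        log.deleteOnExit();

        write(log, "first run\n", false);
        write(log, "second run\n", true);       // added to the end
        System.out.println(contents(log));      // prints "first run|second run|"

        write(log, "fresh start\n", false);     // the old contents are gone
        System.out.println(contents(log));      // prints "fresh start|"
    }
}
```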

The while loop is mainly self explanatory due to the narrative method names. Please note how convenient the toString method may be from time to time. toString is not only a good debugging method, it can serve our purposes this time as well. It does not provide splendid output but it is speedily implemented <grin>.

Finally the output file is closed and the count method is finished. Compile and go for it. Optionally you may add a

System.out.println("The table size is: " + table.size());

to the method and you will see how close to the prime we got.

Summary

This time we used a speedy data structure that is good for quick input and lookup. The object was to count words, and we expected to find a lot of duplicates and in fact, we were not interested in a sorted structure. At least not from the beginning. Hashtable is very good for such tasks. Java 2 offers a method to sort our data structures as well.

The put and lookup (containsKey) operations do not depend on the size of the table. On average the time stays constant as the table grows.

Further we found that it was not hard to get a list of the contents. We used an Enumeration and found a nice interface, having only two methods.

We have touched on the useful File class, a handle to a file, existing or not. Instances from File may be used with some other classes, as we did.

BufferedReader encapsulating a FileReader showed itself to contain many useful methods. Reading a stream always follows the same pattern:

  • open the stream
  • read some data
  • process the data
  • repeat until no more data is available
  • close the stream

Finally we used FileWriter to print the results, and as with input streams, the output streams follow the same procedure:

  • open the stream
  • write some data
  • repeat until finished
  • close the stream

Next time we will continue with streams and look into Exceptions. We have used the try/catch mechanism before but there is more to say about that. Exception handling is one of the bigger strengths of Java, and using it in a good, sensible way may make your apps reliable and robust.

I must also announce that I have to make this column much shorter in the future, mostly because my time is limited and I do not have unlimited strength. Still I will try to do my best, and I hope you enjoy future instalments.