Into Java - Part XIII

By Simon Grönlund

How amazing: this instalment is number thirteen, and the columns now span three years and two millennia. Now even the purists agree that on December 31 last year the second millennium ended, and the third one began. Time flies faster than one can imagine, but the computer business seems to keep pace.

This time we will have a discussion on data structures. Fred Brooks once wrote: "Show me your {code} and conceal your [...]

Note that put needs two objects: the key and the value to be stored. Often the key and the value are the very same object, that is, the object maps to itself. That is not required, of course. As the Integer example shows, the only way to get to the integer object is to give the key "one". Likewise, the key to "James Bond" may be "Agent 007", and no other key will ever reveal the name of that spy.
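To make the key-value idea concrete, here is a minimal sketch using java.util.Hashtable. It assumes a current JDK (generics, Integer.valueOf), whereas the Java 2 API of the time used raw types. The class name KeyValueSketch and its build method are made up for illustration; they only mirror the "one" and "Agent 007" cases mentioned above.

```java
import java.util.Hashtable;

// Hypothetical example class: keys and values may or may not be the same object
public class KeyValueSketch {
    static Hashtable<String, Object> build() {
        Hashtable<String, Object> table = new Hashtable<>();
        // Key and value differ: only the key "one" leads to this Integer
        table.put("one", Integer.valueOf(1));
        // Only "Agent 007" reveals the name of the spy
        table.put("Agent 007", "James Bond");
        // Key and value may also be the very same object
        String alpha = "Alpha";
        table.put(alpha, alpha);
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, Object> table = build();
        System.out.println(table.get("one"));        // 1
        System.out.println(table.get("Agent 007"));  // James Bond
        System.out.println(table.get("James Bond")); // null, since it is a value, not a key
    }
}
```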

For String, equals compares the strings character by character and pays no attention to where the String objects are actually stored in memory. Thus you can retrieve a value using, for example, another "Alpha" that is in fact a new String object, since it gives the same hash value and is equal to the original key. If, on the other hand, the Integer object were put with itself as the key, we would need to have it on hand to get it out later, and that is not what we wanted. Hence a good practice for your own classes is to provide a good equals method (together with a matching hashCode, since the table first uses the hash value to find the slot), or a good toString that produces a known key you can recreate later, without needing to keep the original object on hand.
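A sketch of the point about equals, assuming a current JDK: a brand-new String with the same characters finds the stored value, while a key class that keeps Object's identity-based equals does not. PlainKey and NamedKey are hypothetical classes for illustration. NamedKey overrides hashCode together with equals, because Hashtable looks in the slot chosen by the hash value before it calls equals.

```java
import java.util.Hashtable;

public class EqualsKeySketch {
    // Hypothetical key class WITHOUT equals/hashCode: compared by identity only
    static class PlainKey {
        final String name;
        PlainKey(String name) { this.name = name; }
    }

    // Hypothetical key class WITH equals/hashCode based on its content
    static final class NamedKey {
        final String name;
        NamedKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof NamedKey && ((NamedKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode(); // equal keys must give equal hash values
        }
    }

    static Hashtable<Object, String> buildTable() {
        Hashtable<Object, String> table = new Hashtable<>();
        table.put("Alpha", "first letter");
        table.put(new PlainKey("Beta"), "second letter");
        table.put(new NamedKey("Gamma"), "third letter");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<Object, String> table = buildTable();
        // A new String object with the same characters finds the value
        System.out.println(table.get(new String("Alpha")));   // first letter
        // A new PlainKey is a different object, so the lookup fails
        System.out.println(table.get(new PlainKey("Beta")));  // null
        // A new NamedKey equals the stored key and has the same hash code
        System.out.println(table.get(new NamedKey("Gamma"))); // third letter
    }
}
```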

A very, very simple test case.



At least on my computer, I found that some of the later strings produce exactly the same hash value, perhaps a flaw in the hashCode implementation for very long strings, which these later strings had become. Otherwise, we see that each string produces a hash value that put later uses to insert the string at its index in the hash table array.
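The step from hash value to array index can be sketched as follows, assuming a current JDK. The expression (hash & 0x7FFFFFFF) % length is the one OpenJDK's Hashtable uses internally, but indexFor here is a hypothetical helper, not part of the public API. The pair "Aa" and "BB" shows that even two short strings can share a hash value with today's String.hashCode, so collisions as such are expected; Hashtable handles them by chaining entries that land in the same slot.

```java
public class IndexSketch {
    // Hypothetical helper: clear the sign bit, then take the remainder
    // by the table length to get a slot in the array
    static int indexFor(Object key, int tableLength) {
        return (key.hashCode() & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        String[] words = {"Alpha", "Beta", "Gamma", "Aa", "BB"};
        for (String w : words) {
            // "Aa" and "BB" both print hash=2112 index=0
            System.out.println(w + " hash=" + w.hashCode() + " index=" + indexFor(w, 11));
        }
    }
}
```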

We cannot get the exact length of that array in the end, but I guess it might be 47, since the default size is 11 (primes give better results than other sizes) and the default load factor is 0.75. Hence rehash has to run twice, each time growing the array to size * 2 + 1, that is, from 11 to 23 and then to 47. This time we waste more than 50% of the memory footprint, since only two or three elements were added after the latest rehash, but there is room for 15 more elements until the next rehash.
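The arithmetic above can be checked with a small simulation of the growth rule as described: start at 11 slots and, before each insertion, rehash to size * 2 + 1 once the element count has reached capacity * 0.75. This is a hypothetical re-implementation of that rule, not the library's own code. With these numbers the thresholds become 8, 17 and 35, so anywhere from 18 to 35 keys the table has 47 slots.

```java
// Hypothetical simulation of the Hashtable growth rule described in the text
public class RehashSketch {
    static final int DEFAULT_CAPACITY = 11;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** Returns the table capacity after inserting n distinct keys. */
    static int capacityAfter(int n) {
        int capacity = DEFAULT_CAPACITY;
        int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR); // 8
        for (int count = 0; count < n; count++) {
            // Check before adding the new element, then grow to size * 2 + 1
            if (count >= threshold) {
                capacity = capacity * 2 + 1;
                threshold = (int) (capacity * DEFAULT_LOAD_FACTOR);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 9, 17, 18, 20, 35, 36}) {
            System.out.println(n + " keys -> " + capacityAfter(n) + " slots");
        }
    }
}
```

With about 20 keys the sketch reports 47 slots, and 35 - 20 = 15 more insertions fit before the next rehash.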

Summary
We have made a deep and largely theoretical dive into the field of data structures. The outcome is that we have to think twice when we choose a structure. Is the look-up time important, or is sortability more important? Are duplicate elements to be expected, or should they not be allowed? How limited is the working memory?

The main difference between ordinary arrays and lists on the one hand, and maps on the other, is that the former use an index to get to an element, while the latter use the key-value concept.
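A small sketch of that difference, assuming a current JDK with generics; the class IndexVersusKey and its two helpers are hypothetical names for illustration. Arrays and lists are asked for a position, while a map is asked for a key and the position of the entry does not matter to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexVersusKey {
    // Arrays and lists: an element is reached by its position
    static String byIndex(List<String> list, int index) {
        return list.get(index);
    }

    // Maps: an element is reached by its key
    static String byKey(Map<String, String> map, String key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        String[] array = {"Alpha", "Beta", "Gamma"};
        List<String> list = new ArrayList<>(Arrays.asList(array));
        Map<String, String> map = new HashMap<>();
        map.put("Agent 007", "James Bond");

        System.out.println(array[1]);                // Beta
        System.out.println(byIndex(list, 2));        // Gamma
        System.out.println(byKey(map, "Agent 007")); // James Bond
    }
}
```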

We will use many of the discussed structures in the future, and next time we will begin a slow walk through streams, reading and writing, while continuing this crash course in data structures, but only one or two types at a time.

Thanks for this time and CU around.

Note: This column covers some classes found only in Java 2 or later, while others are also found in Java 1.1.8. The theory holds for all of them.

Also note that Collections (with a trailing s) is the name of the framework, while Collection is an interface found in the java.util package.