EDM/2

An Introduction to C Programming - Part 8

Written by Björn Fahller


Introduction

Last month we saw how we could encapsulate the implementation of a word reading file in a separate module. The encapsulation was limited, though, in that only one word file could be open at a time, and the definition of what a word is was hard coded. This article introduces a different kind of encapsulation, usually referred to as an abstract data type (often abbreviated "ADT"), which allows us to use as many word files as we please. The new word file will also be extended so that the user can change the definition of what a word is.

Definition

If you look at the implementation of last month's word file, you notice that there was a variable called "file" of type "FILE*," that was used when reading data. The variable either had the value NULL, or a value given from a call to the standard function "fopen." What I did not mention last month, was that it is possible to have several files open at the same time. All you have to do is to have several variables of type "FILE*," and assign each of them the value of different calls to "fopen." Which variable you use determines which file you operate on. That is exactly how a word file should work. We need a data type called "WORDFILE*" (or something like that) and variables of that type get their values from "wordfile_open." "FILE*" is what is called an abstract data type. Abstract because we don't know, and don't care, about its values (other than to compare against NULL.) We just pass values of that kind on to functions that do understand them, like "fgets", "fgetc" and "fclose."
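
To make the idea of several open files concrete, here is a minimal sketch that works on two "FILE*" values at once. The function name "copy_stream" is made up for this illustration and is not part of the word file; it assumes that the caller has already opened both files.

```c
#include <stdio.h>   /* FILE, fgetc, fputc, EOF */
#include <assert.h>  /* assert                  */

/* Hypothetical helper: copy every character from "in" to "out".    */
/* Both files are open at the same time; the FILE* passed to each   */
/* call decides which file is operated on.                          */
/* Returns the number of characters copied.                         */
long copy_stream(FILE* in, FILE* out)
{
  long count = 0;
  int c;

  assert(in != NULL);
  assert(out != NULL);

  while ((c = fgetc(in)) != EOF)
  {
    if (fputc(c, out) == EOF)
      break;               /* writing failed, give up */
    ++count;
  }
  return count;
}
```

Which file each call touches is decided purely by the "FILE*" handed to "fgetc" or "fputc", and that is the behaviour we want from "WORDFILE*" too.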

Just like last month, let's first have a look at the desired semantics of our new word file.

Semantics


  WORDFILE* wordfile_open(const char* filename);
  • Failure to open a word file is reported by returning NULL, success by returning a non-NULL value.
  • Passing the NULL pointer as the name is a programming error.
  • If no file with the passed name exists, open should fail.
  • If the return value is non-NULL, the word file is opened.

  int wordfile_close(WORDFILE* wordfile);
Like last month, this closes the word file. "wordfile" tells which. If closing was successful, the value returned is 1, otherwise 0.
  • "wordfile" must be a value returned by a previous call to "wordfile_open," for which "wordfile_close" has not been called before.
  • If "wordfile" is NULL, "wordfile_close" does nothing and returns 1.
  • A return value of 1 indicates successful closing of the file, and 0 indicates a failure.

  size_t wordfile_nextword(WORDFILE* wordfile,
                           char* buffer,
                           size_t buffersize);
Here "wordfile" determines which word file we want the next word from. "buffer", "buffersize" and the return value have the same meaning as last month.
  • "wordfile" must be a value returned by a call to "wordfile_open", for which "wordfile_close" has not been called.
  • It is a programming error if "wordfile" is the NULL pointer.
  • "buffer" must not be the NULL pointer.
  • "buffersize" must be at least 2, to hold a minimum of one character and the null-termination.
  • Return the length of the word copied into buffer. If 0 is returned, no word was found before end of file. If the number returned equals "buffersize," there was not room for the entire word in "buffer." In this case, the buffer will contain only the first "buffersize-1" characters of the word; the remaining characters are discarded.
  • If end of file is reached when reading a word, the end of the word is also reached, so the word read is copied into buffer, and the length of it returned. The next call will return 0, indicating that the last word has been read.
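
The return value convention for "wordfile_nextword" is easier to see in a small experiment. The sketch below is not the word file itself: "copy_first_word" is a hypothetical helper that reads from a string instead of a file, but it follows the same rules for "buffer", "buffersize" and the return value as listed above.

```c
#include <stddef.h>  /* size_t  */
#include <ctype.h>   /* isalnum */
#include <assert.h>  /* assert  */

/* Hypothetical helper: copy the first word found in "text" into   */
/* "buffer", with the return convention of wordfile_nextword.      */
size_t copy_first_word(const char* text, char* buffer, size_t buffersize)
{
  size_t length = 0;

  assert(text != NULL);
  assert(buffer != NULL);
  assert(buffersize >= 2);

  /* skip everything that is not part of a word */
  while (*text != 0 && !isalnum((unsigned char)*text))
    ++text;

  /* store at most "buffersize" characters, but consume the whole word */
  while (isalnum((unsigned char)*text))
  {
    if (length < buffersize)
      buffer[length++] = *text;
    ++text;
  }

  if (length < buffersize)
    buffer[length] = 0;         /* the word fit, null terminate        */
  else
    buffer[buffersize-1] = 0;   /* too long, keep buffersize-1 chars   */

  return length;
}
```

With a buffer of 4 characters, the text "ab" gives the return value 2, while "abcdef" gives 4 (equal to "buffersize") and leaves "abc" in the buffer. Note that a word of exactly 4 letters is also reported as truncated, since there is no room left for the null-termination.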
A header file for our new improved word file can look something like this:

  /* Usage:                  */
  /* #include <stdio.h>      */
  /* #include "wordfile.h"   */

  struct wordfile_struct;                  /* Forward declaration of */
  typedef struct wordfile_struct WORDFILE; /* abstract data type.    */
                                           /* Explained below.       */

  WORDFILE* wordfile_open(const char* filename);

  /* Open a word file                                             */
  /*                                                              */
  /* Return values: NULL, failure to open the file                */
  /*                non-NULL, a value to use in wordfile_nextword */
  /*                and wordfile_close.                           */
  /*                                                              */
  /* Preconditions:                                               */
  /*   filename != NULL                                           */
  /*                                                              */
  /* Postconditions:                                              */
  /*  If success, the file is open.                               */


  int wordfile_close(WORDFILE* wordfile);

  /* Close the open wordfile                                      */
  /*                                                              */
  /* Return values: 0 failure to close the file.                  */
  /*                1 succeeded in closing the file.              */
  /*                                                              */
  /* Postconditions:                                              */
  /*   If success, the wordfile is closed.                        */


  size_t wordfile_nextword(WORDFILE* wordfile,
                           char* buffer,
                           size_t buffersize);

  /* Read the next word in the open word file into the buffer     */
  /*                                                              */
  /* Return values: The length of the word copied into the        */
  /*                buffer. A value of "buffersize" indicates     */
  /*                that the word has been truncated and that     */
  /*                only the first buffersize-1 characters of     */
  /*                the word are available.                       */
  /*                                                              */
  /* Preconditions:                                               */
  /*   wordfile != NULL.                                          */
  /*   buffersize >= 2                                            */
  /*   buffer != NULL                                             */

The forward declaration says that there is a struct data type with the name "struct wordfile_struct", which we call "WORDFILE". We don't say anything, however, about what the guts of the struct are, or will be. This incomplete data type cannot be instantiated (i.e., we cannot have variables of type "WORDFILE".) We can, however, have pointers to an incomplete datatype. The good thing about this, is that for a user, WORDFILE is a secret. It's something they can use, through calls to our functions, but they cannot (easily) tamper with the data used by the word file.

Thoughts on implementation

Before getting down to writing the code, there are a few things that need to be thought out. The data type "WORDFILE", or rather, the "struct wordfile_struct" cannot remain a secret much longer. To the user of our improved word file, it should of course remain a secret, but to us, as implementors, the time has come to define it. What information is needed for every word file? If we look at last month's solution, the only data used was the variable "file" of type "FILE*." Is a "FILE*" enough? Actually, yes, for now. One way of defining our data type is:


  struct wordfile_struct {
    FILE* file;
  };
Since the typedef makes "WORDFILE" an alias for "struct wordfile_struct", we can hereafter refer to the data type as "WORDFILE."

Then comes the next problem; that of returning values of type "WORDFILE*." We saw in part 6 how a pointer to a variable can be obtained with the unary operator "&". That doesn't sound like a very good solution for us now, though. How many variables would we need? Would 2 be enough, or 10? Using an array instead of discrete variables doesn't help either, since we still have the problem of deciding how large the array should be. Fortunately there is a solution available in the ANSI/ISO C library. The solution is a function pair named "malloc" and "free", declared in <stdlib.h>. Their prototypes are:


  void* malloc(size_t size);
  void free(void* ptr);
What is "void*"? Earlier I've said that "void" is a pseudo type used to denote "nothing at all", for example as the return type of a function not returning any value, or as the parameter type of a function not requiring any parameters. In the case of "void*", "void" should be interpreted as "anything," thus "void*" becomes a pointer to anything. Since this type can be used to point to any data, it is not possible to do any arithmetic on it. It can be compared to the NULL pointer, and it can be cast to other pointer types.

Enough about "void*" for now. What is it that "malloc" and "free" actually do? "malloc" allocates a block of memory, as large as its parameter says it should be, and returns a pointer to it. If you remember part 6, on pointers and arrays, you may also remember the "sizeof" operator. "sizeof" is very frequently used together with "malloc," since "malloc(sizeof(X))" allocates a block of memory exactly large enough for an object of type X, and returns a pointer to that block. If the memory cannot be allocated, "malloc" returns NULL.

"free" undoes what "malloc" did, that is, it deallocates the block of memory that "malloc" allocated. It is very important to remember to always "free" objects created with "malloc" when they are no longer needed, otherwise you get what is called a memory leak.

So, for our word file, we can create our "WORDFILE" with "malloc" in "wordfile_open," and deallocate it with "free" in "wordfile_close." In both cases, what we use is the pointer to the "WORDFILE" created by "malloc."
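
Before doing this for the word file, the pattern can be tried out on something smaller. The following is a minimal sketch, not the word file: the type "TALLY" and the functions "tally_create", "tally_bump" and "tally_destroy" are invented for this illustration. In a real program, the first two declarations would live in a header file and the struct definition in the implementation file.

```c
#include <stdlib.h>  /* malloc and free */
#include <assert.h>  /* assert          */

/* What a header would show: the type is declared, not defined. */
struct tally_struct;
typedef struct tally_struct TALLY;

/* What the implementation file would add: the actual contents. */
struct tally_struct {
  long count;
};

/* Create a tally starting at zero. Returns NULL if malloc fails. */
TALLY* tally_create(void)
{
  TALLY* tally = (TALLY*)malloc(sizeof(TALLY));
  if (tally != NULL)
    tally->count = 0;
  return tally;
}

/* Add one to the tally and return the new count. */
long tally_bump(TALLY* tally)
{
  assert(tally != NULL);
  return ++tally->count;
}

/* Release the tally. Passing NULL is allowed and does nothing, */
/* since free(NULL) is defined to do nothing.                   */
void tally_destroy(TALLY* tally)
{
  free(tally);
}
```

Every call to "tally_create" gives a new, independent object, so a program can have as many tallies as it likes, and each one must be handed to "tally_destroy" exactly once to avoid a memory leak.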

Implementation

First, the forward declaration and the typedef of "WORDFILE" go into "wordfile.h", as shown above. The definition of the struct itself stays hidden in "wordfile.c", which can look as follows:


  #include <stdio.h>        /* size_t and file stuff */
  #include "wordfile.h"
  #include <assert.h>       /* assert                */
  #include <stdlib.h>       /* malloc and free       */
  #include <ctype.h>        /* is***                 */

  struct wordfile_struct {  /* our ADT */
    FILE* file;
  };

  WORDFILE* wordfile_open(const char* filename)
  {
    WORDFILE* wordfile = NULL;

    /* Preconditions:                                        */
    /*   filename != NULL                                    */
    assert(filename != NULL);


    /* First create the wordfile to use */
    wordfile = (WORDFILE*)malloc(sizeof(WORDFILE));

    /* check that malloc succeeded, otherwise return NULL for failure */
    if (wordfile == NULL)
    {
      return NULL;
    }

    /* Now open the file, and assign the "file" component    */
    /* of our newly created wordfile the value returned by   */
    /* fopen                                                 */
    wordfile->file = fopen(filename, "r");

    if (wordfile->file == NULL)
    {
      free(wordfile);  /* if fopen failed, free the allocated memory */
      wordfile = NULL; /* and set the return value to NULL           */
    }

    return wordfile;
  }


  int wordfile_close(WORDFILE* wordfile)
  {
    int retval = 1;
    if (wordfile != NULL)
    {
      retval = (fclose(wordfile->file) == 0); /* 1 */
      free(wordfile);
    }
    return retval;
  }

  size_t wordfile_nextword(WORDFILE* wordfile,
                           char* buffer,
                           size_t buffersize)
  {
    size_t retval = 0;
    int c = 0;

    /* Preconditions:                                               */
    /*   wordfile != NULL.                                          */
    assert(wordfile != NULL);

    /*   buffersize >= 2                                            */
    assert(buffersize >= 2);

    /*   buffer != NULL                                             */
    assert(buffer != NULL);

    /* internal check. If wordfile->file == NULL, something is      */
    /* seriously wrong.                                             */
    assert(wordfile->file != NULL);

    while ((c = fgetc(wordfile->file)) != EOF && !isalnum(c))
      ;/* loop until we find an alphanumeric character or EOF */


    while (isalnum(c))
    {
      if (retval < buffersize)
        buffer[retval++] = (char)c;
      c = fgetc(wordfile->file);
    }

    if (retval < buffersize)
      buffer[retval] = 0;/* null terminate */
    else
      buffer[buffersize-1] = 0; /* force null-termination
                                   of too long word */

    return retval; /* return the length of the copied word */
  }
The implementation of "wordfile_nextword" is the same as last month, with "file" replaced with "wordfile->file."

At /* 1 */, "fclose" returns 0 on success, so if the closing is successful, "retval" is assigned the value 1, as agreed upon in the interface specification.

Usage of this wordfile is slightly different from usage of the one from last month. Here's last month's test program rewritten for this version.


  #include <stdio.h>
  #include "wordfile.h"

  int main(int argc, char* argv[])
  {
    char word[64]; /* should be large enough, I hope */
    size_t length; /* same type as wordfile_nextword returns */
    WORDFILE* file = NULL;

    if (argc != 2)
    {
      printf("Usage: %s filename\n", argv[0]);
      return 1;
    }

    file = wordfile_open(argv[1]);
    if (file == NULL)
    {
      printf("The file %s could not be opened as a wordfile\n",
             argv[1]);
      return 2;
    }

    for (;;)
    {
      length = wordfile_nextword(file, word, sizeof(word));
      if (length == 0)
        break;

      if (length == sizeof(word))
        printf("** long word, truncated ** %s\n", word);
      else
        printf("%s\n", word);
    }
    wordfile_close(file);
    return 0;
  }

What is a word

Now that we can have many word files open simultaneously, there's still the problem that a user must trust our judgement for what a word is. What's used so far is that any sequence of characters in the English alphabet and the digits, surrounded by anything else, is a word. In many cases, this is not good enough. Just as an example, suppose I want all identifiers used in the program itself to pass as words. "wordfile_close" will not pass, since "_" fails "isalnum".

What we have to do, is to let the user define what a word is. An easy way to do it, is to let the user pass a string containing all valid characters for a word. That's simple to understand and to implement, so I think it should be done. It's not good enough, though. Suppose I want to split constructions like "ThisWordIsConcatenated" to the word sequence "This", "Word", "Is", "Concatenated". Here both upper case and lower case letters are allowed in words, but a capital letter is always the beginning of a new word. A way to allow this, and much more, without us worrying too much about how to do it, is to let the user tell which function to use when distinguishing words.

There is a data type available in C, that I have so far not mentioned, that can be used for this. The data type is called a pointer to function. The good thing about pointers to functions is that they're flexible yet type safe. The bad thing is that their syntax is terrible.

When defining a pointer to a function, the things to think of are the return type and the parameter list, of the kind of function it is supposed to be pointing to.

This tiny example will show you. Please take your time and study the syntax carefully.


  #include <stdio.h>

  void f1(int i)
  {
    printf("f1 called with %d\n", i);
  }

  void f2(int i)
  {
    printf("f2 called with %d\n", i);
  }

  int add(int i, int n)
  {
    printf("add(%d,%d)\n", i, n);
    return i+n;
  }

  int sub(int i, int n)
  {
    printf("sub(%d,%d)\n", i, n);
    return i-n;
  }

  int main(void)
  {
    int result;
    void (*p1)(int); /* p1 is a pointer to a function taking an int and */
                     /* returning nothing */

    int (*p2)(int, int); /* p2 is a pointer to a function taking two */
                         /* int's and returning an int               */

    p1 = &f1; /* let p1 point to function f1 */
    p1(2);
    p1 = &f2; /* let p1 point to function f2 */
    p1(2);

    p2 = &add; /* let p2 point to function add */
    result = p2(5,6);
    printf("%d\n", result);
    p2 = &sub; /* let p2 point to function sub */
    result = p2(5,6);
    printf("%d\n", result);
    return 0;
  }
The unary "&" operator before the function names is not needed when you want the pointer to the function. It's purely stylistic. I always use it, to clearly show other readers of my code that I do indeed intend to use the pointer to the function, and not call the function (and accidentally forgot the parentheses.)

The beauty of pointers to functions is not in the syntax for sure, but in its usefulness. Let's put this in perspective of our word file. The user of it can specify any function they like, as long as it conforms to an interface (i.e. the return type and parameter list) that we specify, and we can call that function through a pointer. Thus our code is not made much harder than it is today, yet its flexibility for the user has grown tremendously.
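
As a minimal sketch of that idea, here is a function that leaves the decision about each character to whatever function the caller supplies. All names ("charpredicate", "count_accepted", "is_vowel" and "is_identifier_char") are invented for this illustration; the interface the word file will actually use is worked out further down.

```c
#include <stddef.h>  /* size_t  */
#include <ctype.h>   /* isalnum */
#include <assert.h>  /* assert  */

/* The interface: any function taking an int and returning an int. */
typedef int (*charpredicate)(int);

/* Count the characters in "text" that "accept" says yes to. */
size_t count_accepted(const char* text, charpredicate accept)
{
  size_t count = 0;

  assert(text != NULL);
  assert(accept != NULL);

  while (*text != 0)
  {
    if (accept((unsigned char)*text)) /* call through the pointer */
      ++count;
    ++text;
  }
  return count;
}

/* Two functions a user could supply. */
int is_vowel(int c)
{
  return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}

int is_identifier_char(int c)
{
  return c == '_' || isalnum(c);
}
```

"count_accepted" never needs to change, no matter what the caller wants counted. For the text "wordfile_close", passing "&is_vowel" gives 5 and passing "&is_identifier_char" gives 14.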

So how, then, should the function used for the word file look and behave? If we look again at the example where both upper and lower case letters are allowed in words, but the transition from lower case to upper case denotes the beginning of a new word, it is clear that the function needs the previous letter. There are three ways to allow the function to do this.

The first is to keep the previous letter in our implementation, and send both the previous and the current letter to the function. The problems with that are what to send as the previous character when the first character is read from the file, and what to do if the function requires a longer history than one character.

The second is to make that the user's problem altogether. The problem with that is that no matter how the user implements it, they will be restricted to only one word file with that function at a time, because the history data will be shared.

The third approach is to make it the user's problem, with our help; that is, the user must specify what data the function needs as its history, and the user must initialise it to something reasonable, but we help by passing that data to the function, for every word file. The only problem with that is that we cannot, in our implementation, know what data the user will need. The workaround is to revisit our new friend, the "pointer to anything" type, "void*". The user can instantiate data of whatever kind is needed and pass its address as a "void*", and we can pass that pointer to the function, which in turn casts it back to the type it knows it is.

Here's an example of how this can work, just to show you. This example does not use previous characters as its history, but instead a counter of how many times it has been called.


  #include <stdio.h> /* printf  */
  #include <ctype.h> /* isalnum */
  #include <stdlib.h> /* malloc */

  typedef int (*userfunction)(char, void*); /* our functions must accept */
                                            /* a char and a void* and    */
                                            /* return an int.            */

  /* here comes our abstract data type */

  typedef struct {                    /* the abstract data type holding the */
    userfunction f;                   /* necessary stuff. Here it's not     */
    void* pdata;                      /* really abstract, though, but       */
  } abstract;                         /* rather transparent.                */


  abstract* create(userfunction aFunction, void* user)
  {
    abstract* p = (abstract*)malloc(sizeof(abstract)); /* create, init,     */
    p->f = aFunction;                                  /* and return the    */
    p->pdata = user;                                   /* instance of our   */
    return p;                                          /* abstract datatype */
  }

  int call(abstract* pabstract, char c) /* call our function through the */
  {                                     /* abstract datatype instance    */
    return (pabstract->f)(c, pabstract->pdata);
  }

  void destroy(abstract* pabstract)     /* free the memory used */
  {
    free(pabstract);
  }


  typedef int userdata; /* the datatype used to store the history */
                        /* for our function                       */


  int function(char c, void* p)     /* an isalnum counting calls */
  {
    userdata* puser = (userdata*)p; /* since we know that the void* is   */
                                    /* really a pointer to userdata, we  */
                                    /* can safely cast it.               */

    (*puser)++;                     /* increment userdata for every call */

    return isalnum((unsigned char)c); /* do what "isalnum" does. */
  }


  int main(void)
  {
    userdata counter = 0;
    const char* string = "letters in a word";

    /* create the abstract datatype, with our function, and pass */
    /* the address of "counter" as a void*.                      */

    abstract* pabstract = create(function, (void*)&counter);

    /* keep calling our function, through the abstract datatype, for the */
    /* characters in the string, as long as our function does not        */
    /* return 0.                                                         */

    while (call(pabstract, *string++)) /* 1 */
      ;

    printf("function called %d times\n", counter);

    /* release memory used */
    destroy(pabstract);
    return 0;
  }
/* 1 */ Compare "*string++" with "(*puser)++" in "function". "*string++" dereferences "string" and then increments the pointer itself, while "(*puser)++" increments the value that "puser" points to. The operator precedence rules make "*string++" identical to "*(string++)".
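
The difference between the two expressions is easy to get wrong, so here is a small sketch that isolates each of them. The function names "string_length" and "increment_target" are made up for this illustration.

```c
#include <stddef.h>  /* size_t */
#include <assert.h>  /* assert */

/* "*string++": read the character that "string" points to, then  */
/* move the pointer one step forward. The characters themselves   */
/* are never changed.                                             */
size_t string_length(const char* string)
{
  size_t length = 0;

  assert(string != NULL);

  while (*string++)
    ++length;
  return length;
}

/* "(*p)++": leave the pointer alone, and increment the value it  */
/* points to.                                                     */
void increment_target(int* p)
{
  assert(p != NULL);
  (*p)++;
}
```

Calling "string_length("abc")" returns 3, and calling "increment_target" with the address of an int holding 4 leaves 5 in that int.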

With a construction like the above, we leave control to the user. How does this fit in with our word file then?

Semantics again

Yet again, it's time to think about how we want the word file to behave. The way I think is preferable, although perhaps unnecessarily constraining, is to allow the user to change word definition only once, and only between opening the word file, and reading the first word. The default behaviour must be exactly the same as in the previous implementation, because only then can a user upgrade without changing any currently written programs.

In other words, the two new functions (for defining what a word is) can look something like this:


  typedef enum { notInWord, firstInWord, inWord } charkind;
  typedef charkind (*wordfunction)(unsigned char, void*);

  /* Function type used to identify words. The function must return: */
  /*   notInWord for characters not in a word,                       */
  /*   inWord or firstInWord for characters in a word.                 */
  /*                                                                   */
  /* If the return value is firstInWord, the word will be considered   */
  /* as ended by the character in the previous call, and the current   */
  /* character will be used as the first in the next word.             */
  /* The "void*" will be the same as the pointer passed to             */
  /* wordfile_wordfunction for every call.                             */
  /*                                                                   */
  /* The function will be called exactly once for every character.     */


  void wordfile_wordfunction(WORDFILE* wordfile,
                             wordfunction function,
                             void* userdata);

  /* Makes the wordfile use the function "function" when distinguishing */
  /* words. "userdata" is passed to "function" in every call.           */

  /* Preconditions:                                      */
  /*   wordfile != NULL                                  */
  /*   No words are read from the word file              */
  /*   Current word definition is the default definition */

  void wordfile_wordchars(WORDFILE* wordfile, const char* string);

  /* Makes the wordfile accept the characters in the string as */
  /* characters allowed in a word.                             */

  /* Preconditions:                                      */
  /*   wordfile != NULL                                  */
  /*   No words are read from the word file              */
  /*   Current word definition is the default definition */
Of course, our "WORDFILE" datatype now needs to hold more information than just a "FILE*". It must hold either a function to call and userdata, or a character string, and the first char of a new word, if "firstInWord" is returned (in order to store it as the first character in the string on the next call to "wordfile_nextword",) and a flag indicating if reading has begun or not. We can compress that to only a function, userdata, the char and the flag, by having a special string function, looking exactly like the "wordfunction", where the userdata is the string. The default "isalnum" behaviour can also be implemented by a function which just calls "isalnum" just as in the call counting example. When storing the left over character, if "firstInWord" is used, there is a problem in how to tell that no character was left over. Either a flag can be used, or an illegal value. To use an illegal value for "char", though, the datatype needs to be something else. I've chosen "int", as the datatype, and the constant "EOF" to represent no character left over.
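
The "illegal value" trick can be sketched in isolation. The struct "leftover" and its three functions below are hypothetical names used only here; the real word file simply keeps an "int" inside its own struct. The point is that "EOF" is negative, so it can never be confused with a character stored as an "unsigned char" value.

```c
#include <stdio.h>   /* EOF    */
#include <assert.h>  /* assert */

/* Hypothetical one character store. "c" is an int so that it can */
/* hold every character value and, in addition, EOF for "empty".  */
typedef struct {
  int c;
} leftover;

void leftover_init(leftover* store)
{
  assert(store != NULL);
  store->c = EOF;            /* nothing left over yet */
}

void leftover_store(leftover* store, unsigned char c)
{
  assert(store != NULL);
  store->c = c;              /* always a non-negative value */
}

/* Return the stored character and empty the store, or return EOF */
/* if nothing was stored.                                         */
int leftover_take(leftover* store)
{
  int c;

  assert(store != NULL);
  c = store->c;
  store->c = EOF;
  return c;
}
```

No separate flag is needed: the value itself tells whether there is a character waiting or not.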

Implementation again


  #include <stdio.h>        /* size_t and file stuff */
  #include "wordfile.h"
  #include <assert.h>       /* assert                */
  #include <stdlib.h>       /* malloc and free       */
  #include <ctype.h>        /* is***                 */
  #include <string.h>       /* strchr                */

  struct wordfile_struct { /* our ADT */
    FILE* file;
    wordfunction isWord;
    void* userdata;
    int firstCharInWord;
    int wordReadFlag;
  };

  /* the default wordfunction, which uses "isalnum" */

  static charkind wordfile_isword(unsigned char c, void* userdata)
  {                                          /* static to avoid polluting */
    return isalnum(c) ? inWord : notInWord;  /* the global namespace      */
   /* return "inWord" if "isalnum" returns non-0, and notInWord otherwise */
  }

  /* the string testing function */
  static charkind wordfile_string(unsigned char c, void* userdata)
  {
    return (c != 0 && strchr((const char*)userdata, c) != NULL)    /* 1 */
            ? inWord : notInWord;         /* same construction as above */
  }


  WORDFILE* wordfile_open(const char* filename)
  {
    /* First create the wordfile to use */
    WORDFILE* wordfile = (WORDFILE*)malloc(sizeof(WORDFILE));

    /* check that malloc succeeded */
    if (wordfile == NULL)
    {
      return NULL;
    }

    /* Preconditions:                                        */
    /*   filename != NULL                                    */
    assert(filename != NULL);

    /* Now open the file, and assign the "file" component    */
    /* of our newly created wordfile the value returned by   */
    /* fopen                                                 */
    wordfile->file = fopen(filename, "r");

    if (wordfile->file == NULL)
    {
      free(wordfile);  /* if fopen failed, free the allocated memory */
      wordfile = NULL; /* and set the return value to NULL           */
    }
    else
    {
      wordfile->wordReadFlag = 0;          /* Indicate no word read yet */
      wordfile->firstCharInWord = EOF;     /* No first char to use,     */
      wordfile->userdata = NULL;           /* and no user data set      */
      wordfile->isWord = &wordfile_isword; /* More initializations, we  */
                                           /* want "isalnum" default    */
                                           /* behaviour.                */
    }
    return wordfile;
  }


  /* close is exactly the same as before */

  int wordfile_close(WORDFILE* wordfile)
  {
    int retval = 1;
    if (wordfile != NULL)
    {
      retval = (fclose(wordfile->file) == 0);
      free(wordfile);
    }
    return retval;
  }


  size_t wordfile_nextword(WORDFILE* wordfile,
                           char* buffer,
                           size_t buffersize)
  {
    size_t retval = 0;
    int c = EOF;
    charkind kind; /* This variable is used to control the logic      */
                   /* when determining whether to store the character */
                   /* for the next iteration. `kind' should only be   */
                   /* accessed if `c' is not EOF. There is a fair bit */
                   /* of logic involved to guarantee exactly one call */
                   /* of "wordfile->isWord" for every character.   */


    /* Preconditions:                                               */
    /*   wordfile != NULL.                                          */
    assert(wordfile != NULL);

    /*   buffersize >= 2                                            */
    assert(buffersize >= 2);

    /*   buffer != NULL                                             */
    assert(buffer != NULL);

    /* internal check. If wordfile->file == NULL, something is      */
    /* seriously wrong.                                             */
    assert(wordfile->file != NULL);

    wordfile->wordReadFlag = 1; /* We've started reading, so now it's */
                                /* too late to "change the rules"     */

    /* Obtain the first character of the word. If we have a "leftover" */
    /* first character from the last call, pretend that we've just     */
    /* read it from the file                                           */

    if (wordfile->firstCharInWord != EOF)
    {
      /* Restore state from end of previous call */
      c = wordfile->firstCharInWord;
      kind = firstInWord;
      wordfile->firstCharInWord = EOF; /* Don't handle again in next */
                                       /* call, unless specifically  */
                                       /* set otherwise below        */
    }
    else /* Otherwise we must take care of non-word characters */
    {
      do
      {
        c = fgetc(wordfile->file);
        if (c != EOF)
          kind = wordfile->isWord((unsigned char)c, wordfile->userdata);  /* 2 */
      } while (c != EOF && kind == notInWord);
      /* Loop until we find EOF or something in a word */
    }

    /* No matter which way we took above, "c" is now either EOF or a */
    /* character that belongs to a word. The following loop is not   */
    /* terminated by a firstInWord in the first iteration.           */

    while (c != EOF && (kind == inWord
                        || (kind == firstInWord && retval == 0)))
    {
      if (retval < buffersize)
        buffer[retval++] = (char)c;
      /* Read the next character and keep `kind' up to date */
      c = fgetc(wordfile->file);
      if (c != EOF)
        kind = wordfile->isWord((unsigned char)c, wordfile->userdata);
    }

    /* Now we've either hit EOF, a character of type notInWord, or  */
    /* a character of type firstInWord. If the character that broke */
    /* the loop above was firstInWord, it must be stored for the    */
    /* next call.                                                   */

    if (c != EOF && kind == firstInWord)
      wordfile->firstCharInWord = c; /* Store it for use in next call */

    if (retval < buffersize)
      buffer[retval] = 0;/* null terminate */
    else
      buffer[buffersize-1] = 0; /* Force null-termination of too long word */

    return retval; /* Return the length of the copied word */
  }

  void wordfile_wordfunction(WORDFILE* wordfile,
                             wordfunction function,
                             void* userdata)
  {
  /* Preconditions:     */
  /*   wordfile != NULL */

    assert(wordfile != NULL);

  /*   Current word definition is the default definition */

    assert(wordfile->isWord == &wordfile_isword);

  /*   No words are read from the word file */

    assert(wordfile->wordReadFlag == 0);

    wordfile->isWord = function;
    wordfile->userdata = userdata;
  }


  void wordfile_wordchars(WORDFILE* wordfile, const char* string)
  {
  /* Preconditions:     */
  /*   wordfile != NULL */

    assert(wordfile != NULL);

  /*   Current word definition is the default definition */

    assert(wordfile->isWord == &wordfile_isword);

  /*   No words are read from the word file */

    assert(wordfile->wordReadFlag == 0);

    wordfile->isWord = &wordfile_string;
    wordfile->userdata = (void*)string;
  }
/* 1 */ "strchr" searches for a character in a string. If the character is found, it returns a pointer to its position in the string; otherwise it returns NULL. In this case, we want to know whether "c" is in the string, which it is if the return value is not NULL. Before this test, we check whether "c" is 0, because every string contains the character (char)0 (the null-termination character), so "strchr" would always find it.
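
The zero check is easy to overlook, so here is a minimal sketch of the idea in isolation. The function name "in_set" is made up for this illustration and is not part of the word file module; only "strchr" itself is standard.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the word file module: returns 1 */
/* if "c" is one of the characters in "set", otherwise 0.           */
int in_set(unsigned char c, const char* set)
{
  assert(set != NULL);

  /* strchr treats the terminating 0 as part of the string, so    */
  /* strchr(set, 0) is never NULL. Rule out 0 before searching.   */
  if (c == 0)
    return 0;

  return strchr(set, c) != NULL;
}
```

Without the first test, "in_set(0, "abc")" would report that 0 is in the set, since "strchr" finds the terminator.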

/* 2 */ Instead of calling "isalnum" as before, we now call the function pointed to by "wordfile->isWord" with the user data supplied, and loop as long as the character is deemed not to be in a word. We must also explicitly check for EOF before calling "wordfile->isWord", since it accepts a character, and not an int, and hence cannot safely react to "EOF".
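
To see why the order of those two steps matters, consider this small sketch. The names "is_word_char" and "classify" are assumptions made for the example, standing in for a character-only classifier and the code that calls it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical classifier that takes a character, not an int, so */
/* it can never be told that the end of the file was reached.     */
int is_word_char(unsigned char c)
{
  return isalnum(c) != 0;
}

/* Takes a value as returned by fgetc. Returns -1 for EOF,       */
/* otherwise 1 if the character is alphanumeric, and 0 if not.   */
int classify(int c)
{
  if (c == EOF)  /* test for EOF while "c" is still an int ... */
    return -1;

  return is_word_char((unsigned char)c);  /* ... then narrow it */
}
```

Once EOF has been converted to "unsigned char", it is indistinguishable from an ordinary character value, which is why the test must come first.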

We can now recompile the previous test program with this implementation. Its behaviour will be identical, as promised. The fun begins when we make use of the extras. Try adding this to the program:


  #include <ctype.h>

  charkind toUpperTransition(unsigned char c, void* p)
  {
    unsigned char* pc = (unsigned char*)p;
    charkind retval = notInWord;
    if (isalnum(c))
    {
      if (isupper(c) && !isupper(*pc))
      {
        retval = firstInWord;
      }
      else
      {
        retval = inWord;
      }
    }
    *pc = c;
    return retval;
  }
To the variables of "main" add

  unsigned char lastchar = 0;
and just prior to the reading loop, add:

  wordfile_wordfunction(file, toUpperTransition, (void*)&lastchar);
Recompile, and run the program against a file containing, for example, "ThisSentenceIsBuiltUpOfManyConcatenatedWordsWithTheirFirstLetterCapitalized".

You can probably already guess what it does.
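
If you want to check your guess without the word file module at hand, the following sketch applies the same classification logic to a string in memory. The "charkind" definition here is only a stand-in for the one in the word file header, and "upper_transition" and "count_words" are names invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the article's charkind type; the real definition */
/* belongs to the word file header.                              */
typedef enum { notInWord, inWord, firstInWord } charkind;

/* Same logic as the classifier above: "p" points to the */
/* previously seen character.                            */
charkind upper_transition(unsigned char c, void* p)
{
  unsigned char* pc = (unsigned char*)p;
  charkind retval = notInWord;
  if (isalnum(c))
    retval = (isupper(c) && !isupper(*pc)) ? firstInWord : inWord;
  *pc = c;
  return retval;
}

/* Hypothetical helper: counts the words in "text" under this */
/* word definition.                                           */
size_t count_words(const char* text)
{
  unsigned char lastchar = 0;
  size_t words = 0;
  int inside = 0;  /* non-zero while we are inside a word */

  assert(text != NULL);

  for (; *text != 0; ++text)
  {
    charkind kind = upper_transition((unsigned char)*text, &lastchar);
    if (kind == notInWord)
      inside = 0;
    else if (kind == firstInWord || !inside)
    {
      ++words;  /* a new word starts here */
      inside = 1;
    }
  }
  return words;
}
```

With this definition, "ThisSentenceIsBuiltUp" counts as five words, while a run of capitals such as "ABc" stays together as one.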

Not too bad, eh?

I'll try to keep next month's part a bit shorter. Promise!

Recap

  • malloc and free can be used to create/destroy objects as needed. This is more or less essential when using abstract data types.
  • The type "void*" is a generic pointer which can point to anything, but it is impossible to do any arithmetic on it. In general, one should be careful with "void*" pointers, since they can indeed point to anything, and can be cast to any other pointer type. If you're not careful, it's easy to cast one to the wrong type.
  • Pointers to functions are useful when we want to leave it to the user of an abstract data type to define the behaviour. Such a function is generally referred to as a "callback function," since the part we've written calls back to a function supplied by the user.
  • The encapsulation makes it possible for us to make changes to the internals, without affecting a user of the abstract data type.
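
As a compact reminder of how these points fit together, here is a minimal sketch of an abstract data type built with "malloc" and "free". The "COUNTER" type and its functions are invented for this illustration; they only mirror the shape of the word file, not its contents.

```c
#include <assert.h>
#include <stdlib.h>

/* In a real module only this declaration would go in the header, */
/* so users cannot see or depend on the members.                  */
typedef struct counter COUNTER;

/* The definition would be hidden in the .c file */
struct counter
{
  long value;
};

COUNTER* counter_create(void)
{
  COUNTER* counter = malloc(sizeof *counter);
  if (counter != NULL)
    counter->value = 0;
  return counter;  /* NULL reports failure, as with wordfile_open */
}

void counter_increment(COUNTER* counter)
{
  assert(counter != NULL);
  ++counter->value;
}

long counter_value(const COUNTER* counter)
{
  assert(counter != NULL);
  return counter->value;
}

void counter_destroy(COUNTER* counter)
{
  free(counter);  /* free(NULL) is allowed and does nothing */
}
```

Since users only ever hold a "COUNTER*", the members of the struct can change later without any user code having to change.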

Coming up

Next month I will show you how to write dynamic data structures of arbitrary size (unlike arrays or structs, for which we need to know the number of elements.) This problem must be solved before writing the word frequency histogram, since it is not known until the entire file is read how many unique words there are in the file. To solve this, I will make much more use of "malloc" and "free", and you will also see a function call construction called "recursion."

Please don't hesitate to e-mail me if you have questions, wishes for details to cover or want me to clarify things.

 
