Introduction to C Programming - Part 6

From EDM2

by Björn Fahller

As promised last month, we're now going to take a look at pointers and arrays. If you search for books in a library, you come into contact with real-world pointers and arrays in many ways. When searching, you wade through a collection of pointers to books, be they stored on CD-ROM, microfilm or card files. The information you find is not the book itself, of course, but a pointer to the book. The pointer says where you can find it. When you know where to find it, you go to the bookshelf where it's supposed to be. The bookshelf is an array of books, i.e. one entity holding several items of the same kind. Pointers and arrays conceptually have one thing in common: their type is defined by what they refer to. In the library we have pointers to books and arrays of books.

In this month's lesson, I will first explain the basics of pointers, and then arrays. After this, I will show you how they're both tightly coupled in C.

Pointers

As mentioned in the short introduction, the type of a pointer is defined by the type it points to. An example will show you a few declarations of pointers:

int* pint;         /* pointer to int */
book *bookPointer; /* pointer to book */

It is the "*" that makes it a pointer declaration.

Note the slight difference in the declarations above with respect to the placement of the "*". This makes no difference at all to the C compiler, but is a matter of taste and programming style. I will not enter a taste-war here, but there are good arguments for and against both styles. The ones I've heard of are:

  1. The "*" is what makes it a pointer type. Hence it's a part of the type declaration, so obviously it should be written like the first alternative.
  2. OK, the "*" is what makes it a pointer type, but the way C interprets variable declarations makes the above style dangerous if you declare several variables at the same time.

To explain point 2, have a look at this:

int* p1, p2, p3;

What do you think this declares? It declares "p1" as a pointer to "int", and "p2" and "p3" as plain "int". Most probably, this is not what was intended.

If you decide that style 1 is your preference, be very careful to always declare one variable at a time, and not several in the same declaration.

Pointer Initialization and Dereferencing

A pointer is not of much use if it doesn't point to anything. How, then, do we get a pointer to point to something? We can get a pointer to any variable by prefixing its name with "&", which in this context is called the "address of" operator. For example:

int  anInt;
int* pint;

pint = &anInt; /* now pint will point to anInt */

A pointer is still of little use if we cannot reach what it is pointing to. Doing this is called "dereferencing" and is done by prefixing the pointer's name with "*", the "dereference" operator. Now we can actually do something:

#include <stdio.h>

int main(void)
{
  int anInt = 2;      /* initialize with 2 */
  int* pint = &anInt; /* point to anInt */

  printf("*pint = %d, anInt = %d\n", *pint, anInt);

  *pint = 5; /* now what happens with anInt? */

  printf("*pint = %d, anInt = %d\n", *pint, anInt);
  return 0;
}

The output of the above example is:

*pint = 2, anInt = 2
*pint = 5, anInt = 5

The first line tells us that we can get the value of "anInt" by dereferencing "pint", since "pint" points to "anInt." The second line tells us that we can not only get the value of "anInt" through "pint", but also change its value.

Pointer Idioms in Function Calls

While it may not seem like it, this is useful. A fairly common idiom in C is to declare functions like this:

int readDouble(double* pValue);

Here the return value of the function is an error indicator, and the double that was read is stored in the variable pointed to by the parameter. It can be used like this:

  int main(void)
  {
    double value;
    printf("Enter a floating point number: ");
    while (!readDouble(&value))
    {
      printf("Please, enter a floating point number: ");
    }
    printf("The number entered was %f\n", value);
    return 0;
  }

Here the while loop goes on, begging for a floating point number as long as "readDouble" returns 0.

This technique is used very often in the OS/2 API, where almost all functions return an error code.

Pointer to Struct

A pointer can point to any type, including struct and enum types, and even pointer types. For various reasons, pointers to struct types are very common. Here is an example of how you can deal with them:

  typedef struct {
    int a;
    int b;
  } demostruct;

  demostruct aStruct;
  demostruct* structPointer = &aStruct;

  (*structPointer).a = 5;

The last line does a number of things at once. First, within the parentheses, the pointer is dereferenced. The dereferenced value is the struct, from which we pick the "a" component and assign it the value 5. Because "." binds more tightly than unary "*", the parentheses are unfortunately necessary. However, accessing components of struct types through pointers is so common that there is a special operator "->" for it. The last line of the above example can be rewritten as:

structPointer->a = 5;

Now, here follows a small example program that makes use of everything you have learned about pointers so far. This program includes a "readDouble" as mentioned above, which does not alter the value of the pointed-to variable if the read fails. It also includes a struct type called "Coordinate", and of course a "readCoordinate" function.

  #include <stdio.h>
  #include <stdlib.h> /* for exit() */

  int readDouble(double* pValue)
  {
    int converted;
    double local; /* We need a local variable to toy with, so */
                  /* the original is not changed, should the  */
                  /* read fail                                */

    char buf[100]; /* This declaration will be explained a  */
                   /* bit further down, please keep reading */

    /* The functions in this section will be explained in  */
    /* another lesson. What this does is make sure that    */
    /* everything sent to earlier "printf" calls really is */
    /* written, then read the data entered and try to      */
    /* convert it to a floating point value, which will be */
    /* stored in "local". If the read fails, the program   */
    /* will terminate. If the read succeeds, "converted"   */
    /* will be assigned the number of successful           */
    /* conversions (in this case we want 1 successful      */
    /* conversion; anything else is an error).             */

    fflush(stdout);
    if (fgets (buf, sizeof (buf), stdin) == NULL)
      exit(0);
    converted = sscanf (buf, "%lf", &local);

    if (converted == 1)
      *pValue = local; /* read succeeded, store result */
    return converted == 1;
  }

  typedef struct {
    double latitude;
    double longitude;
  } Coordinate;

  int readCoordinate(Coordinate* pCoordinate)
  {
    printf("Enter the latitude:");

    /* Read latitude by passing address of latitude       */
    /* component in the passed coordinate. The precedence */
    /* rules for "&" and "->" make sure we get the right  */
    /* one (otherwise the compiler would complain)        */

    /* A lot of things are happening on the following two    */
    /* lines. Find the "latitude" component of the variable  */
    /* pointed to by "pCoordinate", using the "->" operator. */
    /* Then take its address with the unary "&" operator,    */
    /* and pass it to "readDouble". The value read, if the   */
    /* read is successful, will be stored there. If the      */
    /* value returned by "readDouble" is 0, return 0,        */
    /* otherwise go on.                                      */

    if (!readDouble(&pCoordinate->latitude))
      return 0;

    printf("Enter the longitude:");

    /* When in doubt, an extra parenthesis won't hurt. Which */
    /* do you think is clearer about the intention, the      */
    /* readDouble above, or the one below?                   */

    if (!readDouble(&(pCoordinate->longitude)))
      return 0;

    return 1;
  }

  int main(void)
  {
    Coordinate coord;
    if (!readCoordinate(&coord))
      printf("Reading coordinate failed\n");
    else
      printf("The coordinates entered were: %fN, %fW\n",
             coord.latitude,
             coord.longitude);
    return 0;
  }

A test run might look as follows:

  D:> coordtst
  Enter the latitude:60.3
  Enter the longitude:-17.6
  The coordinates entered were: 60.300000N, -17.600000W
  D:> coordtst
  Enter the latitude:a
  Reading coordinate failed

This will be enough about pointers for a while. Off to...

Arrays

Arrays are a way of creating a multi-part type, just like structs are, but with arrays the elements are indexed by an int instead of having individual names, and all elements have the same type. Here's an example of declaring an array of 20 integers:

int intarray[20];

Here "intarray" is a variable storing 20 elements of type int. In ANSI C, the number of elements must be a constant known at compile time (variable-length arrays were only added later, in C99), so it is not possible to do this:

int main(void)
{
  int x=25;
  int intarray[x]; /* ERROR! x is not a constant known at */
                   /*        compile time                 */
  return 0;
}

Array Indexing

The elements of an array are reached with the subscript operator "[x]", where x is an "int". The indexes start at 0, so the first element of an array is always "arrayname[0]". The index of the last element is always one less than the number of elements. So, for "intarray" above, valid indexes are 0 to 19 inclusive. An example printing the first 10 elements of the Fibonacci series in reverse order shows how the indexing is done:

  #include <stdio.h>

  int main(void)
  {
    int fib[10]; /* array to store the numbers in */
    int index;

    fib[0] = 1; /* starting values for the fibonacci series */
    fib[1] = 1;

    /* create the rest of the values in the series */
    for (index = 2; index < 10; ++index)
    {
      fib[index] = fib[index-1] + fib[index-2];
    }

    /* print the series in reversed order. */
    for (index = 9; index > 0; --index)
    {
       printf("%d, ", fib[index]);
    }

    /* index 0 handled separately for pretty printing */

    printf("%d\n", fib[0]);
    return 0;
  }

It is very important to be careful with the index values. C compilers normally do not check whether the index values are valid. The error shown below will not be caught by the compiler, and will probably result in odd run-time behaviour that is likely to give you many days of debugging work.

double darray[10];
darray[10] = 3.141592;

Unfortunately this error, indexing one past the end, is very common, because it's so easy to make unless you're careful. In this case, where the index is a numeric constant, it's fairly easy to spot, but when the index is a variable coming from user input, a return value from a function, or something similar, it's not quite as easy to see.

Array Initialization

Variables can be initialized when created, as you've seen in, among other places, the pointer examples above:

int anInt = 2; /* initialize anInt to have the value 2 */
int* pint = &anInt; /* initialize pint to point to anInt */

Arrays too can be initialized when created, and you do that as follows:

int anarray[5] = { 1, 3, 5, 7, 11 };

This causes "anarray[0]" to be initialized with the value 1, "anarray[1]" to be initialized with the value 3, and so on until "anarray[4]" which is initialized with the value 11.

If, like in the Fibonacci example, you only know what to initialize the first few elements with, you can initialize just them, like this:

int fib[10] = { 1, 1 };

Here "fib[0]" and "fib[1]" are both initialized with the value 1, and the other 8 elements are initialized to 0. Whenever an initializer list is shorter than the array, the remaining elements are set to zero.

One annoying thing about initializing a whole array is that you must be careful to make the array the same size as the number of elements you initialize it with. It's always a problem when you store the same information, in this case the number of elements, in two or more places, because sooner or later you will make a change to one of them and forget the other. With array initialization there is a short-cut:

int anarray[] = { 1, 3, 5, 7, 11 };

Now the number of elements will be defined by the number of values entered in the initializer list.

The problem with this is that you cannot easily see how many elements the array holds. If you, in several places in your program, use 5 as the number of elements, you have the same problem anyway. There is a compile-time operator called "sizeof", probably not really intended for this problem, that solves it rather nicely. "sizeof" looks like a function, to which you pass either a variable or a type. When compiling, the "sizeof" operator, together with its argument, is replaced by the number of bytes the type or variable occupies. Since all elements of an array are of the same type, and hence occupy the same number of bytes, the compiler can calculate the number of elements for us:

int anarray[] = { 1, 3, 5, 7, 11, 13 };
unsigned arrayElements = sizeof(anarray)/sizeof(anarray[0]);

Here "arrayElements" will be 6, and if "anarray" is extended or shrunk, or if the type is changed to, say, array of short, "arrayElements" will still correctly tell the number of elements in "anarray."

Strings

A character string in C is an array of char, with the special characteristic that the last element has the value 0. As you have seen numerous times, a string is initialized with the characters between quotation marks. The terminating 0 is not visible, but it is there none the less. The two examples below are identical, except, of course, for the names of the variables:

char string1[] = "string";
char string2[] = {'s', 't', 'r', 'i', 'n', 'g', 0};

It is important to remember the terminating 0, and the space needed for it. Just about all string handling functions rely on strings ending with 0.

Note that if you declare the string with a size, it will have that size, even if that means the terminating 0 will be stripped.

char hello[5] = "hello"; /* string without terminating 0. */

If you ever find a reason to make a declaration like the above, I think you really should comment that the stripping of the terminating 0 is intentional, and why. Other developers reading your code will otherwise very probably think of it as an error. Note that if you use a C++ compiler, this construction is an error.

Arrays and Functions

As with any other type, it is of course possible to pass arrays to functions. Unfortunately, in doing so the size of the array is lost, so the size must be passed separately unless you have some other way of finding it out. Below is the Fibonacci example again, this time making use of initialization, "sizeof" and a function printing the reversed series:

  #include <stdio.h>

  void printReversedFib(int fib[], unsigned elements)
  {
    unsigned index;
    for (index = elements-1; index > 0; --index)
    {
      printf("%d, ", fib[index]);
    }
    printf("%d\n", fib[0]);
  }

  int main(void)
  {
    int array[10] = {1, 1};
    unsigned size=sizeof(array)/sizeof(array[0]);
    unsigned index;
    for (index = 2; index < size; ++index)
    {
      array[index] = array[index-1] + array[index-2];
    }
    printReversedFib(array, size);
    return 0;
  }

Apart from the size information being lost when passing an array to a function, there is something else that is a bit special. This example will show you what:

  #include <stdio.h>

  typedef struct {
    int a;
    int b;
  } AB;

  void function(int i, int* pi, AB ab, int ai[])
  {
    i = 5;
    *pi = 35;
    ab.a=90;
    ai[0] = 105;
  }

  int main(void)
  {
    int integer1 = 1;
    int integer2 = 2;
    AB  anAB;
    int array[5] = {3}; /* init. only the first elem. */

    /* In ANSI C all declarations must come before the first */
    /* statement of a block, so the assignments go here.     */
    anAB.a = 10;
    anAB.b = 0;

    function(integer1, &integer2, anAB, array);
    /*         ^           ^       ^     ^              */
    /*         |           |       |     Pass an array  */
    /*         |           |       +---- Pass a struct  */
    /*         |           +------------ Pass a pointer */
    /*         +------------------------ Pass a value   */

    printf("int %d, int* %d, struct %d, int[] %d\n",
           integer1, integer2, anAB.a, array[0]);
    return 0;
  }

Running this gives the following output:

D:> ftest
int 1, int* 35, struct 10, int[] 105

As expected, "integer1" was not changed, since only the value that "integer1" had was passed in the function call, not the variable itself. "integer2" was given the value 35, since a pointer to it was passed to the function, which changed whatever the pointer pointed to. The struct component "a" didn't change, because a struct is also passed by value, but "array[0]" did. It appears that arrays behave just like pointers do. In fact, when passing an array in a function call, what is passed is always a pointer to the first element of the array.

Arrays and Pointers

So, arrays are passed to functions as a pointer to the first element of the array. How is that possible? We can, after all, use the subscript operator "[x]" on the array when in the function.

The answer is that there is a lot more to pointers than what I've told you so far. A pointer type is not restricted to just pointing at things; it is also possible to do arithmetic on it. You can add an integer value to a pointer, and get another pointer value from that. That is exactly what the subscript operator does. Take a look at this:

  #include <stdio.h>

  int array[] = { 2, 3, 5, 7, 11 };

  int main(void)
  {
    int* pint = array; /* here pint will point to the first  */
                       /* element of array, exactly the same */
                       /* way as when passing an array to    */
                       /* a function                         */

    printf("%d, %d, %d, %d, %d\n",
           *pint, *(pint+1), *(pint+2), *(pint+3), *(pint+4));
    return 0;
  }

The output of this program is:

2, 3, 5, 7, 11

What happens is that "pint" points to the first element, and dereferencing it will give the value of the first element. "pint+1" will point to the second element, and thus "*(pint+1)" will give the value of the second element, and so on.

Pointers can be manipulated with operators "++", "--", "+=", "-=", and can be compared. When reading C code, you will often find constructions like:

  #include <stdio.h>

  int array[] = {2, 3, 5, 7, 11, 13, 17 };
  int elements = sizeof(array)/sizeof(array[0]);

  int main(void)
  {
    int* pint = &array[2]; /* point to third element */
    printf("Pointer indexing: %d\n", pint[3]);
    for (pint = array; pint < array+elements; ++pint)
    {
      printf("%d\n", *pint);
    }
    return 0;
  }

The output of this program is:

  Pointer indexing: 13
  2
  3
  5
  7
  11
  13
  17

How does this work? "pint" is first set to point to the third element of "array", and the 4th element counting from there, i.e. the 6th element of "array", which is 13, is printed. In the for loop, "pint" is first set to point to the first element of "array". It is then compared with "array+elements", which points one past the end of "array"; since "pint" is smaller, the loop body runs and, because "pint" is dereferenced in the printout, the first value of the array is printed. "pint" is then incremented to point to the second element, and the loop goes on until the last element of the array has been printed.

Here I've only used pointers to int, but the behaviour is exactly the same regardless of the type pointed to.

I think this will be enough for this month.

Recap

  • With the aid of pointers, we can reach other variables, to read and alter their values.
  • We get a pointer to a variable with the unary operator "&".
  • Pointers can be dereferenced with unary "*", or "->" if they point to a struct.
  • Pointers can be used to let a function both leave a result and return an error code.
  • Arrays are a way of storing several components of the same type.
  • The elements in an array are indexed by an integer using the subscript operator "[x]".
  • Pointers and arrays are tightly connected, and when passing an array in a function call, a pointer to the first element is what really is passed.
  • If a pointer points to an element in an array, the subscript operator can be used on it, as if it were an array.
  • It is possible to do arithmetic on pointers, and to compare their values.

Coming Up

While it may not seem like it, most of the C language has now been covered. What's lacking are a few details, and of course an explanation of at least some of the numerous functions in the ANSI C library, and the OS/2 API. To prove that this really is true, I'm going to spend the next few articles on writing a real program. As it develops, I'll explain the functions, and the few yet uncovered C constructions used. From this you will learn, not just C, but also some good software engineering practices.

Please don't hesitate to e-mail me if you have questions, wishes for details to cover or want me to clarify things.