Introduction to C Programming - Part 5

From EDM2
Revision as of 10:24, 8 September 2018 by Ak120 (Talk | contribs)


by Björn Fahller

This month's topic is something that has been used in all parts so far, with little to no words on what it's about: data types. Data types are a wide topic, and with C, where much is left hardware dependent, compiler dependent or, worst of all, undefined, the topic quickly grows out of hand. I'll cut the topic down severely, though, to give you a start and an idea, from which you can go on with further studies.

Definition

The data type of a variable describes what values the variable can have, what operations are allowed on the variable, and the effect each operation has. The first data type we learn to use, when we start counting on our fingers and toes, is a subset of the natural numbers, the numbers from 1 through 20 (0 is a number most of us learn about at a much later stage). An attempt to represent anything but numbers from 1 through 20 using your fingers and toes is something most children consider impossible, and they won't even try it. The operations we learn are addition and subtraction. Soon we learn how to handle larger numbers. Later in life, we learn about negative numbers. Something that can have both a positive and a negative value (integer) is another data type. In C, unsigned integers and signed integers are different types. Yet a bit later, we learn about division, and through that rational numbers, and later on irrational numbers. These are in C approximated by the types float, double or long double. This is more or less how far C stretches when it comes to types to do arithmetic with (I'm not going into pointers in this lesson).

Reason

C is what is often referred to as a compile time typed language, or statically typed language. This means that once a variable is given a type, it will have that type throughout its lifetime, and the type must be known and defined at compile time. Some languages allow variables to have any type, change type, and have unknown type at compile time, and may even create new types at run time. A look back in the mirror of computer science history shows that the original reason for compile time typing was performance optimization. If the type is already known at compile time, type checks need not be done at run time, so performance is gained. Later on, static typing has become a science of its own, and is seen by many as a way for the compiler to find errors, instead of waiting for odd behaviour at run time.

Available C types

There are two kinds of arithmetic types in C: integral types and floating point types. A floating point type approximates real numbers. The name comes from having a decimal point with a floating position indicated by an exponent.

The integral types are grouped by the size of their value space (how many unique numbers they can represent), and if the range includes negative numbers or not.

The smallest integral type in C is named "char", with the variants "signed char" and "unsigned char." "char" differs from the other integral types in that it is not defined whether "char" is signed or unsigned. To be on the safe side you must specify which you mean. If all you intend to do is to store values, though, it matters little if it is signed or not. However, conversions to other types (explained later) require some precaution. If you intend to do "char" arithmetic, you should always specify if you mean "signed char" or "unsigned char."

Information about all integral types can be found in macros defined in the header <limits.h>. The maximum value for a char is CHAR_MAX, and the minimum CHAR_MIN, for signed char SCHAR_MAX and SCHAR_MIN and the maximum unsigned char is UCHAR_MAX (the minimum unsigned anything is always 0).

It is worth noting that the value space of a char (or any integral type for that matter,) is not rigorously defined. You can find out how many bits a char is represented with by checking CHAR_BIT, which on OS/2 is always 8.

The next integral types are "short int" (often just referred to as "short"), "int" and "long int" (often just "long"). All of them have "unsigned" versions. ANSI C requires that the value space of "short" is a subset of that of "int", which is a subset of that of "long", that "short" and "int" are at least 16 bits and "long" at least 32 bits, and suggests that "int" is the type most natural for the processor. For 32-bit OS/2 this means that "int" is 32 bits. All compilers I've seen use 16 bits for "short" and 32 bits for "long", but there might be exceptions [The Alpha uses 64 bit longs. Ed]. You can find the limits for these types too in <limits.h>.
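To see what a particular compiler has chosen, the macros from <limits.h> can simply be printed. The following is only a small sketch; the function name "print_integral_limits" is made up for this illustration, and the numbers printed will differ between compilers and platforms.

```c
#include <limits.h>
#include <stdio.h>

/* Prints the ranges this particular compiler uses for some integral types.
   The function name is hypothetical; only the macros come from <limits.h>. */
void print_integral_limits(void)
{
    printf("bits in a char: %d\n", CHAR_BIT);
    printf("char:           %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("short:          %d to %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:            %d to %d\n", INT_MIN, INT_MAX);
    printf("long:           %ld to %ld\n", LONG_MIN, LONG_MAX);
    printf("unsigned int:   0 to %u\n", UINT_MAX);
    printf("unsigned long:  0 to %lu\n", ULONG_MAX);
}
```

Calling "print_integral_limits()" from "main" shows, among other things, whether "char" is signed (CHAR_MIN is negative) or unsigned (CHAR_MIN is 0) with the compiler at hand.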

Signed integral types are usually not quite symmetric due to the representation used (although they actually can be, this depends on the processor.) Usually the negative range is one number larger than the positive range. For example the maximum signed 16-bit number is 32767 and the minimum signed 16-bit number is -32768 for Intel processors.

After all this on integral types, a few words should be said on floating point types.

There are three floating point types in C, "float", "double" and "long double." The floating point types are always signed.

The smallest floating point type "float" is, just like the smallest integral type "char", something of a special case. What's special with them both is that they are mainly a way of storing data with a certain precision, while arithmetic may be done with the precision of a larger type. For "char" the operands are always promoted to at least "int" precision before the operation. For "float", older compilers always did the arithmetic at "double" precision; ANSI C allows a compiler to do so, but no longer requires it.
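The promotion of "char" operands can be seen in a small sketch like the one below. The function names are invented for the illustration, and the value mentioned in the second comment assumes an 8-bit "char" (CHAR_BIT is 8).

```c
/* Both operands are promoted to int before the addition, so the sum
   is not limited to the range of unsigned char. */
int add_as_int(unsigned char a, unsigned char b)
{
    return a + b;
}

/* The same sum converted back to unsigned char is reduced modulo
   UCHAR_MAX + 1, which is 256 when a char has 8 bits. */
unsigned char add_as_uchar(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

With these, "add_as_int(200, 100)" gives 300, while "add_as_uchar(200, 100)" gives 44 on a machine with 8-bit chars, since 300 does not fit and 300 - 256 is 44.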

Here are some examples of using floating point types (DBL_MAX is defined in the header <float.h>):

 double pi=3.14159265358979323846;
 double zero=0.;
 double large=.32e205; /* 0.32*10^205 */
 double small=543e-210; /* 543*10^-210 */
 double max=DBL_MAX;

If you have operations with mixed types, the type with the largest value space decides the precision the operation will be done with.
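A tiny sketch of what this means in practice, with function names made up for the purpose: dividing two "int" values is done as an "int" operation, while mixing in a "double" makes the whole operation a "double" operation.

```c
/* int divided by int: the operation is done with int precision,
   so the fractional part is discarded. */
int int_quotient(int a, int b)
{
    return a / b;
}

/* int divided by double: the int operand is converted to double first,
   and the operation is done with double precision. */
double mixed_quotient(int a, double b)
{
    return a / b;
}
```

Here "int_quotient(7, 2)" is 3, but "mixed_quotient(7, 2.0)" is 3.5.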

Type conversions

C differs from many other compile time typed languages in that it allows implicit type conversions. For example, the following is legal C, but illegal in many other languages (for example Pascal.)

long l;
short s;
...
s = l;

Many compilers, however, issue a warning about the above. The reason for the warning is that you lose precision. What happens if the value held by "l" is outside the value range for "short"? If you know what you're doing, you can shut up the compiler by explicitly doing the conversion, like this:

long l2;
short s2;
...
s2 = (short)l2;

This looks ugly, but I think it should, because it is. This way the compiler won't complain, but it's fairly clear from reading the code that something ugly is done.

The construction "(short)l2" above, is what is known as a type cast. In short, what it means is: make a short out of the value in l2.
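If you are not sure that the value fits, one possible approach is to check the range before casting. The sketch below is just one way of doing it; the function "long_to_short" and its return convention are my own assumptions, not anything standard.

```c
#include <limits.h>

/* Stores value in *out and returns 1 if it fits in a short.
   Returns 0 and leaves *out untouched if it does not fit. */
int long_to_short(long value, short *out)
{
    if (value < SHRT_MIN || value > SHRT_MAX)
        return 0;
    *out = (short)value;   /* safe: the range was checked above */
    return 1;
}
```

The cast is still there, and still as ugly, but now it is clear from the code why it is safe.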

Now is also the time to go back and explain why it might be important to know whether "char" is signed or not when converting it to another type. One of the reasons (there are others, but I will not go into those at this point), is that conversions are, as far as possible, sign preserving. Let's look at this example:

  #include <stdio.h>

  int main(void)
  {
    unsigned char uc=255;
    signed char sc = 255;
    char c = 255;

    int iuc = uc;
    int isc = sc;
    int ic  = c;

    printf("unsigned => %d, signed => %d, unknown => %d\n",
           iuc, isc, ic);
    return 0;
  }

With VisualAge C++, which I use, where "char" is unsigned by default, the output becomes:

unsigned => 255, signed => -1, unknown => 255

Had "char" been signed, though, the output would have been:

unsigned => 255, signed => -1, unknown => -1
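When you need the same result regardless of what the compiler has chosen for "char", a common trick is to go through "unsigned char" before converting to "int". This is a sketch only, with function names of my own choosing.

```c
#include <limits.h>

/* Returns 1 if plain char is signed with this compiler, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Converts a char to an int in the range 0 to UCHAR_MAX,
   whether plain char is signed or not. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

With this, the "unknown" case in the example above would give 255 on both kinds of compilers if "ic" were set to "byte_value(c)" instead.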

Making our own types

A first step towards making our own types is to set our own name to types. The keyword "typedef" does this for us. With what little I've presented on types so far, not that much can be done, but here is something:

typedef short int16;            /* 16-bit integer          */
typedef unsigned short uint16;  /* 16-bit unsigned integer */
typedef int int32;              /* 32-bit integer          */
typedef unsigned int uint32;    /* 32-bit unsigned integer */

The above means that an alias "int16" is defined for the type "short," and an alias "uint16" is defined for the type "unsigned short," and so on for the "int" types.

The above is not a necessary step for defining our types, but as will be seen later on, things will become clearer with the aid of "typedef."

An enumeration type

The easiest type to define is an enumeration. Enumerations are a way of setting names to integer values. A classic example of an enumeration type is the days of the week:

enum DayNames { Sunday,
                Monday,
                Tuesday,
                Wednesday,
                Thursday,
                Friday,
                Saturday };
enum DayNames dayOfWeek;

The above example first defines an enumeration type called "enum DayNames", which includes the names of the days of the week. Note that the name of the type is "enum DayNames", where "DayNames" is called the tag for the enumeration type. The values for the names begin with 0 for "Sunday", 1 for "Monday" and so on through 6 for "Saturday." Then a variable "dayOfWeek" of the type "enum DayNames" is declared.

Contrary to what most beginners (and some long-time users too, for that matter) of C believe, this means that "dayOfWeek" is an integer. It also means that the names "Sunday" through "Saturday" are defined integral values, where "Sunday" is 0, "Monday" is 1 and so on. Note that it's perfectly legal to do:

dayOfWeek = 85;

later on in the program. The compiler might warn about it, to hint that this might not be what you wanted, but it is legal.
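Since the compiler will not stop such values, you may want to check them yourself. A minimal sketch, where the function "is_valid_day" is a name I've made up for the illustration:

```c
enum DayNames { Sunday,
                Monday,
                Tuesday,
                Wednesday,
                Thursday,
                Friday,
                Saturday };

/* Returns 1 if value is one of the named days, 0 otherwise. */
int is_valid_day(int value)
{
    return value >= Sunday && value <= Saturday;
}
```

"is_valid_day(Monday)" is 1, while "is_valid_day(85)" is 0. Note that the check relies on "Sunday" being the first name and "Saturday" the last.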

The fact that "dayOfWeek" is an "int" can also cause some other problems. Take a look at this loop:

void todaysDuty(enum DayNames);

enum DayNames dayOfWeek = Monday;
for (;;) {   /* for ever and ever */
  todaysDuty(dayOfWeek);
  dayOfWeek++;           /* increment dayOfWeek */
}

Remembering last month's lesson, we see that there is a function called "todaysDuty" which needs an "enum DayNames" to do its work. The return type "void" has not been described yet. "void" is a pseudo type meaning "nothing at all." In this case it means that "todaysDuty" does not return any value. The first time "todaysDuty" is called, "dayOfWeek" is "Monday", and then "dayOfWeek" is incremented to "Tuesday". The problem comes after working on "Saturday." Seeing that "dayOfWeek" is really an "int" and that "Saturday" is really a name for the value 6, "dayOfWeek++" will increment "dayOfWeek" to 7, which was not intended.

This can be avoided by a guard:

  for (;;) {   /* for ever and ever */
    todaysDuty(dayOfWeek);
    if (dayOfWeek == Saturday)
      dayOfWeek = Sunday;
    else
      dayOfWeek++;
  }

This is not that smooth, though. If you, like me, live in a country where days of the week are enumerated from Monday through Sunday, you might easily make the mistake of checking for "Sunday", which would not work very well at all in this case, since the loop will go on for a good while before "dayOfWeek" wraps around and becomes zero again.

A way around that is to use (or maybe abuse, depending on how you see it) the knowledge that "dayOfWeek" is an "int" and make use of what integer arithmetic we have at hand. One handy operation is the remainder operator "%", which returns the remainder after division. A small example of "%":

int a = 14%5;   /* a = 4, since 14=2*5+4    */
int b = 8%7;    /* b = 1, since 8=1*7+1     */
int c = 254%12; /* c = 2, since 254=21*12+2 */

Now we can use this when looping through our enumeration:

for (;;) {   /* for ever and ever */
  todaysDuty(dayOfWeek);
  dayOfWeek = (dayOfWeek+1)%7;
}

This will work regardless of which day of the week the enumeration is started with.
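The same idea can be wrapped in a small function. The sketch below makes one addition that is not in the enumeration above: an extra last name, "DaysInWeek", which automatically gets the value 7 and saves us from writing the number by hand. Both that name and the function name "next_day" are my own assumptions for the illustration.

```c
enum DayNames { Sunday,
                Monday,
                Tuesday,
                Wednesday,
                Thursday,
                Friday,
                Saturday,
                DaysInWeek };   /* hypothetical extra name, value 7 */

/* Returns the day after the given one, wrapping around after
   the last day in the enumeration. */
enum DayNames next_day(enum DayNames day)
{
    return (enum DayNames)((day + 1) % DaysInWeek);
}
```

"next_day(Monday)" is "Tuesday", and "next_day(Saturday)" is "Sunday". The price is that "DaysInWeek" is now a legal value for a variable of the type, even though it is not a day.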

Did you notice that the name of the type is "enum DayNames?" The "enum" part of the name feels like unnecessary baggage. With the help of "typedef" we can free ourselves from the need to specify "enum" every time we refer to the type.

typedef enum DayNames Weekdays;

Now we can refer to the type by the alias "Weekdays" instead. This typedef can be specified immediately when declaring the enum:

typedef enum DayNames { Monday,
                        Tuesday,
                        Wednesday,
                        Thursday,
                        Friday,
                        Saturday,
                        Sunday} Weekdays;
Weekdays dayOfWeek;

Feels a bit shorter and lighter. It can actually be shortened down even further, because it is possible to create an enum without tag, and give it a name with "typedef," like this:

typedef enum { Monday,
               Tuesday,
               Wednesday,
               Thursday,
               Friday,
               Saturday,
               Sunday } Weekdays;

An interesting aspect of "enum" is that you can set the values of the names. A small traffic light simulation will show you how, and also why it might be a good thing:

typedef enum {
   red, yellow, green
} LightForCars;

typedef enum {
   dont_walk = red, walk = green
} LightForPedestrians;

Now the value of "dont_walk" is the same as the value for "red" (which it would be anyway, since enumerations by default start at 0), and the value for "walk" is the same as the value for "green". Since "walk" has the same meaning for pedestrians as "green" has for cars, it makes sense to give them the same value.
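A short sketch of how the shared values might be used. The function "same_signal" is invented for the illustration; it simply compares the two lights as integers.

```c
typedef enum {
   red, yellow, green
} LightForCars;

typedef enum {
   dont_walk = red, walk = green
} LightForPedestrians;

/* Returns 1 when the car light and the pedestrian light
   carry the same signal, 0 otherwise. */
int same_signal(LightForCars car, LightForPedestrians pedestrian)
{
    return (int)car == (int)pedestrian;
}
```

Here "same_signal(green, walk)" is 1, since both have the value 2, and "same_signal(yellow, dont_walk)" is 0.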

This also shows another less good aspect of "enum." In Sweden, the lights for pedestrians are either a red standing man, or a green walking man. One might be tempted to make this declaration:

typedef enum {
   red,
   yellow,
   green
} LightForCars;

typedef enum {
   red,     /* ERROR!! red already defined for LightForCars */
   green    /* ERROR!! green already defined as above */
} LightForPedestrians;

Once a name has been used in an enumeration, it's used, and cannot be reused in any other enumeration.
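One common way around this, shown here only as a sketch, is to give the names in each enumeration a prefix of their own. The prefixes "car_" and "ped_" and the function "pedestrian_may_cross" are my own choices for the illustration.

```c
typedef enum {
   car_red,
   car_yellow,
   car_green
} LightForCars;

typedef enum {
   ped_red   = car_red,    /* same value as for cars, different name */
   ped_green = car_green
} LightForPedestrians;

/* Returns 1 if the pedestrian light shows the green walking man. */
int pedestrian_may_cross(LightForPedestrians light)
{
    return light == ped_green;
}
```

The names are a bit longer, but there are no collisions, and the values can still be shared between the two types.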

Structure types

New types can be created by using already defined types. One such way is to define what is known as structs.

A typical declaration of a struct is like this:

struct Complex {
  double real;
  double imaginary;
} comp;

Here we have declared a type called "struct Complex" which contains two parts, the doubles called "real" and "imaginary," and we have declared a variable "comp" of that type. "comp" is now a two component variable with one name. The different components can be referred to as "comp.real" and "comp.imaginary." This is much more useful than it at first might seem. Often you find a need for several variables to stick together. If you do complex mathematics, and need distinct variables for the real parts and the imaginary parts, you will sooner or later miss something, or confuse one variable's value for another. The reason is that when doing complex mathematics, you don't normally think about distinct real and imaginary values, you think about complex values.

One hassle with the above construction is that the type is always referred to as "struct Complex", and actually it is both a type definition and a variable declaration at the same time. Separating the type definition from the variable declaration, and using a typedef will make things clearer.

typedef struct ComplexTag {
  double real;
  double imaginary;
} Complex;

Complex comp;

The above looks rather odd, but there is little to do about it. It means that there is now a type defined called "struct ComplexTag", which through the typedef is now available with the shorthand name "Complex". The reason the struct is called "struct ComplexTag" instead of "struct Complex" is just to make it work with C++ compilers as well, since C++ handles the names of structs differently from C. Actually any name other than "Complex" would do to satisfy a C++ compiler, but having a name resembling the one in the typedef makes sense. A tag such as "_Complex" is best avoided, though: names beginning with an underscore followed by a capital letter are reserved for the compiler and its libraries, and "_Complex" itself has been a keyword since C99.

It is possible to create a "struct" without a tag, just as it is possible to create an "enum" without a tag, and it is done the same way:

typedef struct {
  double real;
  double imaginary;
} Complex;

Which you choose is a matter of taste.
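With the type in place, functions can take and return whole complex values instead of separate real and imaginary parts. The following is a minimal sketch; the function names "complex_add" and "complex_multiply" are assumptions of mine for the illustration.

```c
typedef struct {
  double real;
  double imaginary;
} Complex;

/* (a + bi) + (c + di) = (a + c) + (b + d)i */
Complex complex_add(Complex a, Complex b)
{
    Complex sum;
    sum.real = a.real + b.real;
    sum.imaginary = a.imaginary + b.imaginary;
    return sum;
}

/* (a + bi) * (c + di) = (ac - bd) + (ad + bc)i */
Complex complex_multiply(Complex a, Complex b)
{
    Complex product;
    product.real = a.real * b.real - a.imaginary * b.imaginary;
    product.imaginary = a.real * b.imaginary + a.imaginary * b.real;
    return product;
}
```

For example, adding 1+2i and 3+4i gives 4+6i, and multiplying them gives -5+10i.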

Nested struct

The components in a "struct" type are not limited to the predefined types of C. They can be of "enum" type, or of other "struct" types. If you're writing a database of map coordinates, you're likely to define a type of this kind:

typedef struct {
  double latitude;
  double longitude;
} Coordinate;

In such a database, you're more than likely to have the coordinate represent something, the name of a city, or a lake, for example. So you might declare another struct like this:

typedef struct {
  Coordinate location;
  LocationKey key;
} Place;

Here I've assumed that there is a type "LocationKey" defined somewhere, which can be used to look up information about the place in a location database. The components of "Place" can be reached as follows:

  void showInfoAbout(LocationKey);

  void tellAbout(Place aPlace)
  {
    showInfoAbout(aPlace.key);
    printf("Location: %f north, %f west\n",
           aPlace.location.latitude,
           aPlace.location.longitude);
  }

Again, I've assumed that somewhere we have a function "showInfoAbout" that retrieves the information about the place in the location database through the key, and prints it.
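Filling in a nested struct works the same way, one component at a time. In the sketch below, "LocationKey" is assumed to be a plain "int", purely to make the example complete, and "make_place" is a function name I've made up.

```c
typedef int LocationKey;   /* assumption: the real key type is not shown */

typedef struct {
  double latitude;
  double longitude;
} Coordinate;

typedef struct {
  Coordinate location;
  LocationKey key;
} Place;

/* Builds a Place from its parts, reaching the nested components
   through the "location" component. */
Place make_place(double latitude, double longitude, LocationKey key)
{
    Place place;
    place.location.latitude = latitude;
    place.location.longitude = longitude;
    place.key = key;
    return place;
}
```

After "Place p = make_place(59.5, 18.25, 42);" the value of "p.location.latitude" is 59.5 and "p.key" is 42.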

More on struct and enum

I should mention another thing about names for "struct" and "enum" types.

The name space used for "struct" and "enum" tags is shared, but different from the global name space. Two examples show this better than the above sentence:

typedef double atype;
enum atype { avalue } enumvar;
atype doublevar;

This is not a name collision. The variable "doublevar" is of type "atype", which is an alias for the type "double." The variable "enumvar" is of type "enum atype".

This example, of course, shows unbelievably poor judgement in naming, but I think you understand what I mean. The fact that the name space for "struct" and "enum" tags is different from the global name space is not usually a problem. The detail that the name space for "struct" and "enum" tags is shared, however, can indeed be a problem. Look at this:

struct RGB {
  int red;
  int green;
  int blue;
};
enum RGB { red, green, blue }; /* ERROR!!! */
typedef enum { red, green, blue } RGB;

The error is that the tag "RGB" is already occupied by the struct, since the name space for "struct" and "enum" tags is shared. The last line of the example, however, is not an error, since the identifier "RGB" that is used by the struct, is in the name space for "struct" and "enum" tags, and the one in the last line is in the global namespace (because of the "typedef".)
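With the erroneous line removed, the rest compiles as C, and the names can be used side by side. The sketch below is only meant to show that; the function "channel" is my own invention. Note that the components of the struct live in a name space of their own too, so "colour.red" and the enumeration value "red" do not collide. A C++ compiler, which treats struct names differently, is likely to reject this.

```c
struct RGB {
  int red;
  int green;
  int blue;
};

typedef enum { red, green, blue } RGB;

/* Picks one component out of a colour. "struct RGB" is found through
   the tag, "RGB" through the typedef, and the case labels are the
   enumeration values, not the struct components. */
int channel(struct RGB colour, RGB which)
{
    switch (which) {
    case red:   return colour.red;
    case green: return colour.green;
    default:    return colour.blue;
    }
}
```

Legal it may be, but the naming is no better judgement than in the previous example, so don't take it as a recommendation.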

Recap

Data types of variables and function parameters must be known at compile time.

Data types are referred to by the kind of data they represent (integral, floating point, and so on,) and grouped by their size and range.

The integral types are:

char, signed char, unsigned char
short int, unsigned short int
int, unsigned int
long, unsigned long
enum types

The floating point types are:

float, double, long double.

The size and range of the types can differ between implementations, but we can find them in <limits.h> and <float.h>.

Type conversions are done implicitly, often with a compiler warning, but the conversions can be done explicitly through type casts.

Enumerations can be defined to name values, but an enum variable is really an integer.

Different enumerations cannot share names for their values.

When several variables logically belong to one another, one struct variable can be declared instead. A struct can contain components of any type, including enum and other struct types.

Next

I have in this lesson completely avoided character strings and arrays. The reason is that they are so tightly coupled to pointers. Arrays and pointers are alone a topic large enough for an article, so I think next month's topic will be arrays and pointers. Please don't hesitate to e-mail me suggestions if you have ideas for other topics.