Introduction to C Programming - Part 2

From EDM2
Revision as of 17:27, 26 June 2016 by Ak120 (talk | contribs)

Written by Carsten Whimster


Greetings Again

I hear that we may have with us Randy Slemko's nephew, who is 11 years old. For this reason, I will take a little extra care making sure that my explanations are good. Welcome to OS/2 programming, or just C programming for now, actually. The type of C programming I have been teaching so far can be used on DOS, OS/2, UNIX, and some less well-known platforms. This is possible because C is, to a large extent, a platform-independent language.

In this installment, I will often leave out the #include line and the main declaration for brevity. Just insert the code into the skeleton I gave you last time, and you can compile these snippets.

More about Declarations

One thing I omitted to mention last time is that when you declare a variable, you can also initialize it at the same time. This is a very good habit to get into, since if you don't initialize a variable declared inside a function such as main, it will have whatever value happens to be in the piece of memory that C gives it. That could be just about anything. Unfortunately, you can actually use this value, even though you never assigned anything to your variable. This causes all kinds of unintentional bugs and errors, and is one of C's weak points, by modern standards. If you initialize your variables right away, you will never get errors caused by the value of an uninitialized variable being different from one run to the next. Other non-deterministic stuff can happen, but not this.

Here is how you initialize the variable when you declare it:

int   a = 0;
float b = 0.0;
char  c = 'X';

Figure 1: Initializing variables in the declaration

You will notice that I lined up the variable names. That makes the code easier to read, and is a good habit mostly. When is it not? When the type of the variable is really long on some lines, and really short on others. That can't happen too easily with C types, but it can happen very easily with user-defined types, which we will see next time.

Another thing you can do with variable declarations is to bunch up several of the same type on the same line:

int  a, b, c;
char ch, ch2;

Figure 2: Multiple variables in one declaration

I would not suggest mixing the above two methods, as it can get quite confusing. In fact, since I recommended that you initialize all your variables, you probably don't need the multiple declarations at all.

More about Statements

Statements in C are a bit different than in most other languages. In most languages, a statement has no value. Expressions always have values; an assignment, for instance, needs the expression on its right-hand side to have a value in order to work at all. But statements don't have values in Pascal, BASIC, and most other languages. What does this mean? Here is an example:

int a = 0;
int b = 10;
int c = 0;
int d = 0;

a = b + 1;
d = c = a + b + 1;
a = b + (c = d);

Figure 3: Statements have values in C

Whoa, what is going on here? The declarations are ok, and so is the first line, which just assigns 11 to a. But the last two lines are weird, and you will only see this in C and a very few other languages. What happens here is that an assignment has a value (strictly speaking, C treats an assignment as an expression, which is why it can sit inside a larger statement), and unlike most expressions, which are generally evaluated from left to right, assignments are evaluated from right to left. What does *that* mean? Well, the d= line is evaluated as follows: a + b + 1 is calculated (and by the way, C doesn't tell us if it chooses to evaluate a or b first, which is sometimes important), and gives 22, which is assigned to c. So far so good. Now, the assignment c = a + b + 1 has the same value as its left-hand side has afterwards, i.e. 22. This means that d is also 22.

Ok, but what about the last one? Same thing: c = d has the value 22, so a = b + 22, i.e. 32.

This is possibly a misfeature of C. It does allow certain constructs to be expressed very compactly, but in general it is quite confusing. I would recommend only using it in two cases. The first is like this:

a = b = c = 0;

Figure 4: One place it is ok to use multiple assignments in one statement

This is ok only because it is easy to see what is happening. All the variables get the same value, ie. 0. The second case we won't see in detail until later, but here it is, without explanation:

int ch;  /* int rather than char, so that EOF can be told apart from real characters */

while ((ch = getchar()) != EOF) {
  /* some code that uses the character ch */
}

Figure 5: Another ok way to use values of statements

This one is only ok because it is so common and convenient, and because you know that it works, even though it is confusing. I still have to look this one up every time I use it, just to make sure that I got it right. I will explain what it does another time.

Confessions

Well, I lied a little last time, and now I have to fess up. When you declare main, you cannot declare it like this:

void main(void)

Figure 6: The wrong way to declare main

You have to declare it differently, due to a restriction in the language. The above does work on most compilers, but it is not correct, and shouldn't be used. My excuse is that I wanted to shield you a bit from something in C that I didn't cover last time, namely functions. This time I will show you the right way to declare main. In our next installment, I will show you what the real declaration of main means, and how to create your own functions, but for now, here it is:

int main(void)

Figure 7: A right way to declare main

The int out front means that main returns an integer. How does it do that? With a return statement, of course. Here is how:

int main(void) {
  return 0;
}

Figure 8: How to return an integer

Here, zero is our token return value. Normally, you would return a number other than zero if you encountered an error, and zero if you didn't. Normally, you would also return a variable with the return statement, not just a number. This variable would have to be of type int, of course. I will show you some examples of this later. For now, I'll just go ahead and use this return statement. In fact, main can have yet another signature besides this one, but I will cover that next time.

A Quick Look at Branching

Branching basically means using an if or switch statement in C. You can branch with a goto, but this is such a bad idea that I will leave it for much later, just so that you don't get used to using it. The if statement looks like this:

if (conditional)
  statement1;
else
  statement2;

Figure 9: The if statement

Most of this should look familiar; the new parts are the if itself and the conditional. To explain the conditional, I have to explain boolean expressions first.

Boolean Expressions

A boolean expression has the value true or false. Not anything else. C doesn't actually have boolean variables, but it fakes it passably with other types. When C evaluates an expression, the value zero means false, and any non-zero value means true. This is both convenient, and unfortunate, as with so many of C's features. The unfortunate part is that you can do things like:

if (c)
  printf("Error: C is not zero\n");

Figure 10: One way not to use the conditional

This looks weird, but unfortunately it works, and it saves a little typing. Compare it with the explicit version:

if (c != 0)
  printf("Error: C is not zero\n");

Figure 11: A better way to do the above

Figure 10 tests whether the value of c counts as true, i.e. whether it is non-zero. Figure 11 asks explicitly whether c is not equal to zero. How are those different? They are not different in result, but they are very different in the way they look. The first one says "if c" but the second one says "if c is not equal to zero". *Very* different! Same action, but for the first one, we invariably end up thinking "if c what?". Once you are quite experienced with C, you can use the first method in certain places without confusion, but I am never going to encourage it. That type of programming invites errors. All decent compilers generate the same code for both anyhow, so they are equally efficient.

Well, so we can compare equality, but what else? Here are all the comparison operators:

< <= == != >= > !

Figure 12: The comparison operators

The weird-looking == just means equal, and != means not equal, as we saw. The ! on its own is not really a comparison; it means not, and turns true into false and false into true. Now we can do things like:

if ((a + b) >= (c - d))
  e = 10;

Figure 13: An example of a comparison operator

Notice the liberal use of brackets. Because some operators work before others, it is always dangerous to presume that you know what order C is going to evaluate things in. Without the inner brackets above, this could mean several things, given C's propensity to evaluate anything non-zero as true and anything zero-valued as false. For example, does (a + b >= c - d) mean the same as ((a + b) >= (c - d)) or the same as (a + (b >= c) - d)? The middle part of the second expression will evaluate to zero or one in C. Hence both of these expressions are possible. It is actually possible to figure out what order C will evaluate things in, given an operator precedence chart, but just to encourage good programming practices, I will not list it here, so you will have to use brackets when you are not sure.

Back to Branching

You will notice that I call it the if statement. This is because when you have these four components, the whole thing is also considered a statement. That also means that statement1 (or statement2) in figure 9 could be an if statement itself. I will show an example of this later, and a better way of handling some of those types of setups.

Anyway, let's dissect figure 9. The conditional is just a boolean expression.

Remember from part one that a statement can also be anything in braces. That means that the if statement could look like this:

if (conditional) {
  statement1;
  statement2;
  statement3;
  /* and so on */
} else {
  statement4;
  statement5;
  /* and so on */
}

Figure 14: Another if statement

As an aside, where the braces are is not important. I prefer the above, because it takes fewer lines to write than other ways. Also, the indentation in the if statement helps the eye to realize where the various statements belong. The following figure shows other popular and valid ways of doing the same:

/* method 2 */
if (conditional)
{
  statement1;
  /* some if code here */
}
else
{
  /* some else code here */
}

/* method 3 */
if (conditional)
  {
  /* some if code here */
  }
else
  {
  /* some else code here */
  }

Figure 15: Yet more if statements

All of these are fine, since they all indent nicely, and all help the eye realize where the various lines of code belong. Choose one, and stick with it, but be prepared to switch if you get a job as a programmer. Many companies have standards that you have to fit in with. Here is a method (which I just made up) which you shouldn't use, for obvious reasons:

/* don't do this */
if (conditional) { statement1; statement2;
/* some more if code here */ }
else { statement1;
statement2;
/* some more else code here */ }

Figure 16: Badly formatted if statement

Looping

It is near impossible to write a program which does something useful without some way of repeating the same code for a bit, while changing only a few variables. One example of when this is useful is if you have to print out the Fibonacci (Fibonacci is pronounced fi-bo-na-chee) sequence or something similar to that. For those unfamiliar with the Fibonacci sequence, it goes like this:

F(0) = 1
F(1) = 1
F(n) = F(n-1) + F(n-2), for n > 1

i.e. the sequence starts like this: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

If someone wants you to write a program that does this, you can do it two ways. The first way is to ask them when it should end, and then declare that many variables, and do that many calculations. Here is an example, if your boss says that it needs to print out the first ten:

#include <stdio.h>

int main(void) {
  int f0, f1, f2, f3, f4, f5, f6, f7, f8, f9;

  f0 = 1;
  f1 = 1;
  f2 = f0 + f1;
  f3 = f1 + f2;
  f4 = f2 + f3;
  f5 = f3 + f4;
  f6 = f4 + f5;
  f7 = f5 + f6;
  f8 = f6 + f7;
  f9 = f7 + f8;

  printf("%d %d %d %d %d %d %d %d %d %d\n",
          f0, f1, f2, f3, f4, f5, f6, f7, f8, f9);

  return 0;
}

Figure 17: The fast way of calculating the first ten values of the Fibonacci sequence

You will notice that the caption says that this is the *fast* way of calculating the first 10 Fibonacci numbers. I will get back to this after explaining loops.

This took me only about 2 minutes to write, but suppose that your boss comes back the next day and asks you to write a program that prints the first 1,000,000 numbers in this sequence. This would get real old, real quick. This is why C has looping constructs. We can tell C to do the same thing x times, or 1,000,000 times in our case.

There are several types of loops in C. The one that is most frequently used is the for loop, because in a for loop you specify how many times the loop should run. Here is a neat sample program which uses much of what we have learned so far:

#include <stdio.h>

int main(void) {
  int n    = 0;
  int Fn   = 0;
  int Fnm1 = 1;
  int Fnm2 = 1;

  /* print out the first two numbers */
  printf("F(%d) = %d\n", 0, 1);
  printf("F(%d) = %d\n", 1, 1);

  /* print out the next 48 numbers */
  for (n = 2; n < 50; n++) {
    /* calculate the next number and print it */
    Fn = Fnm1 + Fnm2;
    printf("F(%d) = %d\n", n, Fn);

    /* update the old two numbers for next */
    /* time through the loop               */
    Fnm2 = Fnm1;
    Fnm1 = Fn;
  }

  /* no error */
  return 0;
}

Figure 18: A cool Fibonacci number printer

Note that C is sensitive to the case of variable names, so that Fn and fn would be two different variables. I am not sure that this is a good feature, but I am used to it by now. I think I would prefer it if it would remember the case, but not consider them different.

If you type this in and run it, you will get the expected output, except that right near the end, you will get this:

F(43) = 701408733
F(44) = 1134903170
F(45) = 1836311903
F(46) = -1323752223
F(47) = 512559680
F(48) = -811192543
F(49) = -298632863

Figure 19: Output from figure 18

Whassup wi'dat? Well, F(43) is correct, as are F(44) and F(45). F(46) should have been 2,971,215,073 but it is -1,323,752,223 instead. You may remember from part 1 of this series that the largest positive value we can store in an integer is 2,147,483,647. So, what happened is that C ran out of bits to store the number in, and instead of giving an error, it just dropped some bits off the top, wrapped around, and started counting from the negative numbers and up towards zero. Strictly speaking, the C standard does not even promise this wrap-around for ordinary (signed) integers; it leaves the result of such an overflow undefined, and wrapping is simply what most compilers and CPUs happen to do. This is yet another misfeature of C. You can probably understand why everyone is so excited about Java now, since Java fixes many of the misfeatures of C I have mentioned so far (though Java integers still wrap around silently), while looking enough like C that it should be easy to learn for C programmers. Anyhow...

So, how do we fix that? Well, we could use a long integer instead of an integer. If we substitute long for int in the declarations of Fn, Fnm1, and Fnm2 in the program above (and change the %d that prints Fn to %ld), we get the same output! This is because C uses 32 bits for an integer in OS/2, and 32 bits for a long as well. With a compiler that uses 64 bits for a long, as many compilers for 64-bit CPUs do, this fix would have worked. For now, we are stuck. Oh, and you probably understand now why I didn't let the loop go to 1,000,000 :) These numbers get big quick! Your boss may want to wait around for that, but we sure don't.

Anyway, let's dissect the for loop. The three parts say n=2, n<50 and n++ respectively, and have to be separated by semicolons. The first one is always an initialization statement, ie. it is run before the loop is started, and never again. The second one is what stops the loop, when it becomes false. So n<50 is true until n is large enough to make it false, ie. 50. The third one gets executed each time we finish the loop, so after the Fnm1 = Fn statement. In fact, it gets executed just before the test is done, so after the loop, n actually has the value 50, even though it was never 50 going through the loop.

Now I can explain what I meant when I said that the first Fibonacci program was the fast one. Because the counter has to be incremented, and the test has to be performed each time through the loop, extra code is executed for a loop. This means that loops actually slow down code slightly. This is usually insignificant, but can become important with speed-critical code. Sometimes it is better to calculate the next five values inside the loop, rather than just the next one. That way the increment and the test are only performed once per five calculations. Some compilers actually do optimizations like this for you. This is called loop unrolling. Of course, it doesn't have to be five, it could be anything. Normally, though, you don't need to do this.

Well, I have only covered one kind of loop, but this article is getting long, so I will cover other types next time.

Conclusion

Well, that just about covers it for this time. Next time, we will take a look at some more of the C programming language, perhaps including the switch statement. If you have any questions or comments, please don't hesitate to mail me at editor@edm2.com.