Introduction to C Programming - Part 1
Written by Carsten Whimster
Introduction
C++ programming is all the rage these days, and many companies are looking for university graduates with 3+ years of experience! How they get this experience when they were just studying is unknown. Still, C++ is an extension of C, and to learn C++ you can take two routes:
- learn C, and then learn C++
- skip C, and learn C++
There are advocates of both routes, but I have taken the former. In this article, I will introduce C programming. Of course, even if you do not intend to move on to C++ later, C is still a hot language, and well worth learning. And if you learn C or C++ here in EDM/2, and go on to use it for the next three years, with some kind of proof, then you are very employable. Being a shareware author is one way of proving that you know your stuff to a company.
The "main" Function
The first difficulty in C comes when you have to write a complete program that compiles, as opposed to just looking at snippets. Some tutorials leave this until later, but I will start off with it so that you can start writing programs right away. Here is a hello world C program:
 #include <stdio.h>
 
 int main(void) {
    printf("Hello World!\n");
 }
 
Figure 1: A Hello World C Program
This is a fair amount of stuff for a minimum program, yet it is all necessary. For now, just use this program as a template to program in C. You can replace the printf line with any other code you want to try out. Later on, I will return and explain the rest of this program, but for now, let's learn some more stuff we can use instead of the printf line.
If you have VAC++, you can compile this program by putting this code in a file called hw.c, and then typing:
[E:\]icc hw.c
Figure 2: Compiling hw.c
This will produce some files, one of which is hw.exe, which you can run. What is compiling? Well, although a REXX program will just run from the command line, a C program will not. There are several reasons for this, the most important of which is speed. A REXX program is always in source code form, and to run it, the REXX interpreter has to read it, figure out what it means, and then carry out what the program asks for, every time it runs. A C program starts out as source code too, but when you compile it, a new file is created which contains machine code. Machine code is the native language of your computer, so no translation is needed to run it; it just runs, and FAST!
Another difference is this: suppose that you have been given the task of writing a REXX interpreter. This can be done both in REXX and in C, but there is one small difference. When you are done, you need the OS/2 REXX interpreter to run the REXX interpreter written in REXX, but you don't need it to run the one written in C. This demonstrates the independence of C programs. Once a program has been compiled, there is no easy way to tell how it was created. It could have been written in assembly language, in C, in C++, in Pascal, and so on. It is native code, not C any more.
Statements in C
A simple C program can consist of a main function with one or more statements in it. What is a statement? Well, a statement is one of two things:
- a piece of valid code ending in the character ";", usually written on one line
- a group of statements between an opening brace: { and a closing brace: }
That means that in a sense, the body of a function is itself one statement, and this close parallel is behind good function design, but I digress. In the program above, we can see that the printf line is one statement. The main function looks a bit like a statement, but it isn't quite: only the part in braces is.
The second type of statement means that we can put anything we want inside braces, and it will be a statement. What goes inside the braces? More statements, of course. This is useful to group several things into one statement, which we will need later. Keep this in mind.
Variables in C
To do anything useful in C, we must be able to add things, subtract things, assign things, and so on. This is accomplished with C's many operators. But what do we add and subtract and assign? Let me introduce variables. To use a variable in C, you must first declare it. What does that mean? Well, in languages like BASIC and REXX, if you want to use a variable, you just go ahead and use it. In C, if that variable is not declared first, you will not be able to compile your program. Here is how to declare a variable in C:
int num;
Figure 3: An integer named num
This declares a variable called num which can hold integer values. Why do you have to tell C what type of values you want it to hold? Well, it boils down to speed again. REXX variables do not have a type. REXX manages this by storing everything in the same format, and then translating depending on how you use the value. When the type is already known, no translation is needed, which is much faster. It also means that an integer variable cannot hold a fractional value: if you assign a floating point number to one, the fractional part is thrown away. Thankfully, storing fractions in integer variables is not usually desirable in real life anyway. Well, what types of variables can we declare in C then? Here is a list:
char
short or short int
int
long or long int
float
double
Figure 4: The Basic C Types
In addition, there are many other ways of modifying variable declarations, but these are the basic types. What are they all, and why aren't there more? Integer variables (short, int, and long) can hold only whole number values, whereas floating point variables (float and double) can hold numbers with decimal places. The first type above is char, which is a bit of an oddball type, but for now, it basically holds characters, like 'a'. By the way, characters in C have to be surrounded by plain single quotes, not any other kind of single or double quotes, i.e. 'a', not `a` or "a".
There aren't any more because the designers of C felt that this was enough. That means that strings have to be faked with chars, and there are other ramifications as well. You can actually define your own variable types, but that is an advanced topic for later.
Why several types of integer and floating point numbers? Well, the intention originally was that a short would hold smaller numbers than an int, which would hold smaller numbers than a long, and similarly for float and double (double precision). At the time that C was created, there were many different types of computers out there with some real oddball architectures. The intention was that whoever wrote a C compiler could choose how big these types should be, with the main restriction that an int is at least as large as a short, a long is at least as large as an int, and a double is at least as large as a float. (ANSI C also sets minimums: at least 16 bits for a short and an int, and at least 32 bits for a long.) In DOS, a short is 16 bits, an int is 16 bits, and a long is 32 bits, as far as I know. In 32-bit operating systems like OS/2 and Linux, a short is 16 bits, an int is 32 bits, and a long is 32 bits. On the Alpha machine, a short is 16 bits, an int is 32 bits, and a long is 64 bits. The upshot of this is that it can be quite tedious to port code from one machine to another, since the size of the variables is not even the same. In hindsight, this was probably a mistake.
But what does it mean for a short to be 16 bits, and so on? Well, a 16-bit number can take on 2^16 or 65,536 different values, so if it only has to hold positive numbers, it can store 0 through 65,535. A 32-bit number can take on 2^32 or 4,294,967,296 different values, 0 through 4,294,967,295. These counts correspond to 64K and 4G, which are probably numbers you have heard of before. K means times 1024, M means times 1024 squared, and G means times 1024 cubed. DOS, for example, uses 16-bit (unsigned short) offsets in its addresses, and this is why DOS programmers have to work with 64KB chunks of memory. Lucky us OS/2 programmers! But that only gives us positive numbers. An ordinary short or int is signed, so in actuality a 16-bit short can store from -32,768 to +32,767 and a 32-bit int can store from -2,147,483,648 to +2,147,483,647. You will notice that the most negative number has a magnitude one larger than the largest positive number. This is due to the encoding of the numbers, called two's complement, in which 0 takes up one of the slots on the positive side. More on this another time.
OK, let's put this to use:
int main(void) {
   int a; int b;
   short c;
float longVariableName;
       char ch;
   double longVariableName2;
   long whatever;
}
Figure 5: Some Variable Declarations
There are a few things to notice here. First of all, the funny looking #include line is gone. That line imports functions which all have to do with input and output, such as printf. Since we are not doing any input or output, we don't need it. The second thing is that the first two variables are declared on the same line. C is not white-space sensitive, meaning that you can put extra spaces, tabs, and newlines anywhere you like, as long as they don't break up words. This also explains why the oddly indented float and char lines don't cause problems. It looks poor though, and is hard to read, so normally you indent code one tab stop inside braces, and put at most one statement on each line. This also makes the code easier for other people to understand and to maintain.
The names of the variables also demand some explanation. A C compiler has to treat at least the first 31 characters of a variable name as significant, but in practice, most compilers recognize far more. Look in your documentation for the length of variable names your compiler will recognize. In practice, names of between 10 and 20 characters are the best. Not too curt, but not too wordy either. Of course, there will always be exceptions. Counters are frequently called i and j, for example, without loss of comprehension.
Well, our little program above did nothing except declare some variables. This means that we are now free to use these variables, but in fact we didn't. Before I show you how to manipulate them, let me just introduce the comment to you. The C comment starts with the two characters /* and ends with the two characters */. A comment can span as many lines as you like, and only ends when the */ characters are seen. Since it is quite possible to forget the end, you probably want to adopt a convention for multiline comments.
Here are some examples of common multiline comment conventions:
int main(void) {
   /* Here is example number        */
   /* one. One full comment         */
   /* on each line. Notice how the  */
   /* beginnings and ends line up.  */
   /*
    * Here is example number two.
    * I personally don't like this one,
    * but it is quite valid.
    */
   /* Here is one to avoid.
      This one just goes on and on,
      and never really seems to end,
      until suddenly, there it is */
   /* Notice that this comment has no end.
   /* That means that this is still the
   /* SAME comment until the end here */
}
Figure 6: Comment Styles
You can adopt whichever you like best, but I recommend not using the last two. It is easy to forget the end. Of course, if your comment fits on one line, great!
Operators in C
Before I wrap up for this time, let me give you some operators you can play with, and elaborate a bit on the printf function. Here is an example which introduces most of the common operators:
 #include <stdio.h>
 
 int main(void) {
    /* declare some variables */
    int a, b;
    float c, d;
    char e, f;
 
    /* manipulate the integers */
    a = 2;
    b = 1 + 4*3 + 5/a;
    printf("a = %d and b = %d\n", a, b);
    printf("a = %d\n", ++a);
    printf("a = %d\n", a);
    printf("a = %d\n", a++);
    printf("a = %d\n", a);
    printf("%d\n", 5%2);
 
    /* manipulate the floating point variables */
    c = 5.0;
    d = 12 / c;
    printf("c = %f and d = %f\n", c, d);
 
    /* manipulate the char variables */
    e = '1';
    f = e + 20;
    printf("e = %c and f = %c\n", e, f);
 }
 
Figure 7: Basic C Operators
Which produces the output:
a = 2 and b = 15
a = 3
a = 3
a = 3
a = 4
1
c = 5.000000 and d = 2.400000
e = 1 and f = E
Figure 8: Output from Figure 7
Well, have we got a lot to talk about! Let's cover printf last. #include is back, because we need printf to output stuff. You will notice that several variables of the same type can be declared on one line, with commas between the names. Assignments are obviously done with the = operator. +, -, *, and / are as expected, with one quirk: / operates as integer division on integers, and as regular division on floating point types. This means that remainders are thrown away in integer division, since integers cannot store them. That explains b: 5/a is 5/2, which is 2, so b is 1 + 12 + 2, or 15. Integers in floating point expressions are converted automatically to floating point numbers by the C compiler. That explains the value of d: 12 becomes 12.0, and 12.0 divided by 5.0 is 2.4.
The ++ operator (and -- as well) is odd, and you will find it in C and in languages that borrowed from it, such as C++ and Java. It means "increment", or add one to the value of the variable. The two forms are slightly different. ++a means increment a, and then use the new value in the rest of the expression, whereas a++ means use the old value of a in the rest of the expression, and increment a afterwards, before the next statement begins. This explains why those printf statements turn out the way they did.
The % operator is a modulo operator. That means that it gives you the remainder after a division, so that explains why 5%2 is 1.
The characters are odd though. Why can you add to a character, and why is '1' + 20 equal to 'E'? Well, characters are implemented in C as small integers (8 bits on nearly every machine), and C allows math to happen on chars. That explains the first item, but why can you get a letter from a number character? You have probably heard of the ASCII characters. You can get an ASCII table almost anywhere. It basically assigns a character to each of the numbers from 0 to 127. The character '1' is number 49 in the table, 49 + 20 is 69, and number 69 is 'E'. Adding to or subtracting from a character merely moves you down or up the ASCII table, respectively. This can be quite useful under certain circumstances, but for the most part it is not good. More about this another time.
Finally, the printf we were used to has changed again. We see the \n symbol again, but it still doesn't print. That is because anything that starts with a \ in printf strings is a special code. This one (\n) means newline. If you need an actual \ sign, you just double it up, i.e. \\. That way C knows that you really wanted a \ in the first place.
Now printf takes two or three arguments, rather than just one. This function uses one of C's less fortunate features, namely that of a function with a variable number of arguments. For now, just accept that. The basic format of printf is this:
- a string (in double quotes) which shows the basic thing you want printed
- an argument for each symbol in the string
Each of these symbols mentioned above starts with a % sign. Now you should be able to verify that each printf statement has the right number of arguments, i.e. a string with some % signs in, and the same number of arguments after the string as there were % signs. But what do you put for the % sign types and the rest of the arguments? Each basic C type has its own code. Integers are %d, floats (and doubles) are %f, and characters are %c. So for each %d in the above, there is an integer argument after the string, and so on. The sequence must be the same. If you need to print an actual % sign, just double it up, as with the \ sign, i.e. %%. You will notice that the argument can be an expression, not just a variable name. The value of the expression is calculated before calling the printf function.
Conclusion
Well, that is it for now. You can now write rudimentary C programs, compile them, and run them. Go home and practice for next month. There is still a fair amount more to learn before you can do anything useful, however. Next time we will cover branching and looping. If you have any questions or comments, please don't hesitate to mail me at editor@edm2.com.