The Anon CVS Bazaar - Part I

Written by Henry Sobotka

Until recently, most open-source code was released in "tarballs", usually *.zip or *.tar.gz files named after the version they contain. The trend nowadays is to set up a public or anonymous CVS server and make it the "recommended" method of downloading the project's source code.

The main advantage of CVS is that it gives you "live" code: the source tree as it stands right now, with all the latest updates. Alternatively, various flags let you specify a particular version, branch or date to copy to your system. Its update command also keeps your copy in sync with the current version, so you no longer have to download another tarball, or one or more sets of patches which you then have to apply.

As always, there's a downside: version mergers during an update sometimes result in "conflicts" that have to be cleaned up manually before the file can be compiled; or, while checking out or updating your copy, you might cross paths with someone checking in changes, and end up with a broken build because you pull the latest bar.c, which now calls newfoo defined in foo.c, which only gets checked in seconds after you draw the old foo.c. But these fall mostly into the category of inconveniences and are far outweighed by the system's benefits.

A Peek into the Attic
CVS first appeared in December 1986 in the comp.sources.unix newsgroup (vol. 6, issues 40 and 41). The introduction reads:

    Subject: CVS, an RCS front-end (cvs)

    This is CVS, Concurrent Versions System, a front end for RCS. It supports the concurrent and independent use of an RCS directory by several people. We have been using it for half a year now, on various projects. It uses the RCS programs rcs, ci, co, rcsmerge and rlog in such a way that you can do a multi-file commit, etc. It is all shell scripts.

    Dick Grune
    Vrije Universiteit
    Amsterdam, the Netherlands

Below the snip line is a two-part shar archive containing a READ_ME, Makefile, Install, cvs.1 manual, 10 shell-command scripts for /bin, and 15 "auxiliaries" for /lib. The files are all timestamped mid-June 1986.

RCS, the Revision Control System, had been developed a few years earlier by Walter F. Tichy at Purdue University, Indiana. Tichy's "Design, Implementation, and Evaluation of a Revision Control System" was among the papers presented at the 6th International Conference on Software Engineering in September 1982. The oldest RCS files are dated the following month, and the first release (version 3.1) appeared in 1983. Tichy provides a concise description of his program in "RCS - A System for Version Control", a 1995 paper distributed with RCS: "RCS manages revisions of text documents, in particular source programs, documentation, and test data. It automates the storing, retrieval, logging and identification of revisions, and it provides selection mechanisms for composing configurations." He also points out that "RCS was originally intended for programs".

Hello Magic!
For a firsthand look at how RCS works, bash out a really ugly little hello.c:

    main{ printf("Hello Warphead!");}

Then create a subdirectory named "RCS" for your repository and, with the RCS binaries in your PATH and rcslib.dll in LIBPATH, check in the file:

    ci hello.c

If all's well, you'll get a prompt requesting a description. Type a few words (or lines), and end the prompt by entering a line containing only a period ("."). Your copy of hello.c will now be deleted but stored for retrieval in RCS/hello.c. To retrieve it, you have to check it out of the repository:

    co -l hello.c

The -l flag locks the repository file and a working copy is made for you. If you omit the lock flag, your copy will be read-only. Now edit hello.c a little, check it in and out, and repeat the process a couple more times until version 1.4 looks like this:

    #include <stdio.h>

    int main(void)
    {
        char *msg = "Hello Warphead!\n";

        printf(msg);
        return 0;
    }

You can then go:

    co -l -r1.1 hello.c

to retrieve the original version from the repository, or you can specify any other one of your intermediate versions with the -r flag. But if you look into the RCS subdirectory, you'll find only one hello.c, not four.

    head	1.4;
    access;
    symbols;
    locks; strict;
    comment	@ * @;


    1.4
    date	99.04.04.03.17.28;	author sobotka;	state Exp;
    branches;
    next	1.3;

    1.3
    date	99.04.04.03.15.31;	author sobotka;	state Exp;
    branches;
    next	1.2;

    1.2
    date	99.04.04.03.13.11;	author sobotka;	state Exp;
    branches;
    next	1.1;

    1.1
    date	99.04.04.03.11.29;	author sobotka;	state Exp;
    branches;
    next	;


    desc
    @Hello Warphead program
    @


    1.4
    log
    @Made message a variable.
    @
    text
    @#include <stdio.h>

    int main(void)
    {
        char *msg = "Hello Warphead!\n";

        printf(msg);
        return 0;
    }
    @


    1.3
    log
    @Changed main to return int.
    @
    text
    @d5 3
    a7 1
        printf("Hello Warphead!");
    @


    1.2
    log
    @Included <stdio.h> and formatted.
    @
    text
    @d3 2
    a4 1
    void main(void) {
    d6 2
    @


    1.1
    log
    @Initial revision
    @
    text
    @d1 5
    a5 1
    main{ printf("Hello Warphead!");}
    @

RCS/hello.c
The repository hello.c is divided into four sections: header, revision list, description and revisions.

The first line of the header records the latest version, here 1.4. The "access" field holds the login names of the people allowed to update the file. When empty, as in our example, normal UNIX permissions apply. Just below that, "symbols" would record the symbolic names associated with a specific revision or branch. For instance, if you decided to name version 1.3 "GONZO", the RCS/hello.c header would read:

    symbols GONZO:1.3;

With this, you can pull version 1.3 with:

    co -l -rGONZO hello.c

The next item, "locks", records the login name of anyone who has a lock on the file along with the version they locked, e.g. "fuzzhead:2.7;". Finally, "comment" provides space for adding a note to RCS/hello.c in place of the asterisk. "@" is the string delimiter in RCS files, so to include that character in a log entry (for example, if adding the reviser's email address), you have to use "@@".
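The doubling rule is easy to get wrong by hand. As a minimal sketch in Python (not part of RCS; the helper names rcs_quote and rcs_unquote and the example.org address are made up for illustration), quoting and unquoting a log entry might look like this:

```python
def rcs_quote(text: str) -> str:
    """Wrap text as an RCS string: double every '@', then delimit with '@'."""
    return "@" + text.replace("@", "@@") + "@"


def rcs_unquote(quoted: str) -> str:
    """Reverse rcs_quote: strip the delimiters and collapse '@@' to '@'."""
    if len(quoted) < 2 or not (quoted.startswith("@") and quoted.endswith("@")):
        raise ValueError("not an @-delimited string")
    return quoted[1:-1].replace("@@", "@")


# Hypothetical log entry containing an email address.
log_entry = "Fix from fuzzhead@example.org\n"
stored = rcs_quote(log_entry)
print(stored)
```

Reading the entry back with rcs_unquote(stored) returns the original text, single "@" and all.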

The revision list beneath the header is in reverse chronological order, with the latest version at the top. First comes the revision number, followed by the date. The timestamp is in year.month.day.hours.minutes.seconds format, with the fields separated by periods. This value is always stored as UTC (Coordinated Universal Time), also known as GMT (Greenwich Mean Time), in the RCS file. A "-z" option to both "ci" and "co", however, can be used to specify a format for keyword substitution, or the timezone when using the -d option either to control the check-in date and time, or to check out a file by date instead of revision number.
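To make the timestamp layout concrete, here is a small Python sketch that turns one of the dates from the listing into a UTC time value. The function name parse_rcs_date is hypothetical, and the sketch assumes that a two-digit year means 19YY, which fits the 1999 files shown here.

```python
from datetime import datetime, timezone


def parse_rcs_date(stamp: str) -> datetime:
    """Turn 'YY.MM.DD.hh.mm.ss' into a timezone-aware UTC datetime."""
    year, month, day, hour, minute, second = (int(part) for part in stamp.split("."))
    if year < 100:
        # Assumption: a two-digit year means 19YY, as in the 1999 listing.
        year += 1900
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


# Date of revision 1.4 in RCS/hello.c.
checked_in = parse_rcs_date("99.04.04.03.17.28")
print(checked_in.isoformat())
```

Because the result carries an explicit UTC zone, converting it to local time for display is a separate step and cannot be confused with the stored value.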

Next on the same line is a record of the login name of the author of the revision, derived from the LOGNAME environment variable. Lastly, "state" is here set to the default value of "Exp", or "experimental". All other states are user-defined by using the -s option of "ci". For instance:

    ci -sbeta hello.c

sets the value of "state" for that version of hello.c to "beta". The following line lists any branches stemming from that revision, and the last line in each entry specifies the "next" (lower) version.

The "desc" section simply contains the program description you entered when you first checked in hello.c.

The final block of RCS/hello.c stores the contents of the successive versions of hello.c, again in reverse order. First comes the latest in its entirety, followed by the previous ones expressed as deltas, or differences, each accompanied by your log entry. Here you can see the "@" characters marking the beginning and end of the comments you entered and of the content of the revisions.

The differences are relative to the next higher version, and use what is known as "RCS diff format". Each command consists of a letter followed immediately by a starting line number, then a space and a number of lines. Thus "d5 3" means "delete 3 lines starting at line 5", and "a7 1" means "add 1 line after line 7"; the line numbers always refer to the first file, the one being edited. Differences are stored by line rather than by character, so changing a single letter will cause the entire line to appear among the deltas.
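The following Python sketch shows how such an edit script could be applied. It is an illustration, not RCS's own code: the function name apply_rcs_delta is invented, and the line layout and indentation of version 1.4 are assumed. It applies the stored delta for 1.3 to the full text of 1.4.

```python
import re


def apply_rcs_delta(lines, delta):
    """Apply an RCS-format edit script (a list of lines) to `lines`.

    Line numbers in the script refer to the file being edited, so the
    result is built in a fresh list rather than patched in place.
    """
    result = []
    pos = 0  # number of original lines already copied or dropped
    i = 0
    while i < len(delta):
        match = re.fullmatch(r"([ad])(\d+) (\d+)", delta[i])
        if match is None:
            raise ValueError(f"bad command: {delta[i]!r}")
        command = match.group(1)
        start, count = int(match.group(2)), int(match.group(3))
        i += 1
        if command == "d":
            # Copy untouched lines up to the deletion, then skip `count` lines.
            result.extend(lines[pos:start - 1])
            pos = start - 1 + count
        else:
            # Copy through line `start`, then take `count` lines from the script.
            result.extend(lines[pos:start])
            pos = max(pos, start)
            result.extend(delta[i:i + count])
            i += count
    result.extend(lines[pos:])
    return result


# Version 1.4 of hello.c (line layout assumed).
rev_1_4 = [
    "#include <stdio.h>",
    "",
    "int main(void)",
    "{",
    '    char *msg = "Hello Warphead!\\n";',
    "",
    "    printf(msg);",
    "    return 0;",
    "}",
]
# The delta stored under 1.3 in RCS/hello.c.
delta_to_1_3 = ["d5 3", "a7 1", '    printf("Hello Warphead!");']

rev_1_3 = apply_rcs_delta(rev_1_4, delta_to_1_3)
print("\n".join(rev_1_3))
```

Building a new list keeps every line number pointing into the original text, which is why "a7 1" can still refer to line 7 after lines 5 to 7 have been deleted.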

Thus to retrieve a specific revision, RCS reads the latest version and applies diffs until it reaches the one you want. Because of the reverse order, any version can quickly be pulled with a single pass through the RCS file.

Going Out on a Limb
In terms of the tree metaphor commonly used in working with RCS and CVS, the first version of hello.c is the root, and 1.4 is the tip. Now let's create a branch. Suppose your friend Pierre wants a French version of hello.c to greet him. Easy enough:

    char *msg = "Bonjour, Pierre!\n";

To store it separately from your original, you can create a branch:

    ci -r1.4.1 hello.c

Now if you reopen RCS/hello.c, you'll see 1.4.1.1 listed under "branches" for version 1.4, with its diff stored between 1.4 and 1.3. But meanwhile Pierre decides he would rather be greeted by "Salut, Warpie!", so you pull his branch with

    co -l -r1.4.1 hello.c

and change the message. When checking it back in, you don't have to specify the branch because RCS records what version you checked out. Looking at RCS/hello.c, you'll find you've now created 1.4.1.2.

Notice, however, that the two branch versions are stored in sequential rather than reverse order. Here again, this is for one-pass retrieval. To reconstruct the current version of the branch, RCS starts at the tip of the main trunk, diffs its way back to the branch-off point, then forward through the branch revisions.
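The route RCS takes through the file can be sketched in Python. This is an illustrative model only: the dictionary below is a hand-made stand-in for the revision list as it stands at this point (trunk 1.1 to 1.4 plus Pierre's two branch revisions), and revision_path is a hypothetical helper that expects a full revision number such as 1.4.1.2.

```python
def revision_path(revisions, head, target):
    """List the revisions visited, in order, to rebuild `target` from `head`."""
    parts = target.split(".")
    trunk_rev = ".".join(parts[:2])

    # Walk down the trunk (reverse deltas) to the branch-off point.
    current = head
    path = [current]
    while current != trunk_rev:
        current = revisions[current]["next"]
        if current is None:
            raise KeyError(f"{trunk_rev} is not on the trunk")
        path.append(current)

    # Then follow each branch level forward (forward deltas).
    for depth in range(2, len(parts), 2):
        branch_prefix = ".".join(parts[:depth + 1]) + "."
        wanted = ".".join(parts[:depth + 2])
        starts = [b for b in revisions[current]["branches"]
                  if b.startswith(branch_prefix)]
        if not starts:
            raise KeyError(f"no branch {branch_prefix[:-1]} at {current}")
        current = starts[0]
        path.append(current)
        while current != wanted:
            current = revisions[current]["next"]
            if current is None:
                raise KeyError(f"{wanted} not found")
            path.append(current)
    return path


# Stand-in for the "next" and "branches" fields of each revision entry.
revisions = {
    "1.4": {"next": "1.3", "branches": ["1.4.1.1"]},
    "1.3": {"next": "1.2", "branches": []},
    "1.2": {"next": "1.1", "branches": []},
    "1.1": {"next": None, "branches": []},
    "1.4.1.1": {"next": "1.4.1.2", "branches": []},
    "1.4.1.2": {"next": None, "branches": []},
}

print(revision_path(revisions, "1.4", "1.4.1.2"))
print(revision_path(revisions, "1.4", "1.2"))
```

On the trunk, "next" points to an older revision, while on a branch it points to a newer one, so the same field serves both directions of travel.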

But accommodating Pierre gives you the idea of customizing the message, so you revamp hello.c to read:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *greeting = "Hello, Warpie";
        char *name = getenv("USERNAME");

        printf("%s %s!\n", greeting, name);
        exit(0);
    }

This, you decide, should become a separate version by going:

    ci -r2 hello.c

The check-in creates version 2.1 in RCS/hello.c where, as before, only the latest revision appears in its entirety and even the final 1.4 is stored as a diff. This version numbering, it should be noted, is RCS-specific; you can use a completely different set of digits for releases. You can also go back and work from any point on the tree. For instance, if Klaus and Gino come along and want the same thing as Pierre in their native languages, you could just pull 1.4 and create branches 1.4.2 and 1.4.3.

What happens if you simply pull 1.3, change "Warphead" to "Warpie", and check it back in? You effectively create yet another branch (1.3.1).

Now suppose you want to update Pierre's branch to the new version. You can do this by merging the two files. First, check out 1.4.1 and then enter:

    rcsmerge -r1.4.1 -r2 hello.c

This overwrites hello.c with a new merged version which in our example turns out to be identical to 2.1. If you now check it in without specifying a revision (-r flag), you create 1.4.1.3. Or you could translate the greeting and check it in as -r2.1.1, which creates a new branch off 2.1.

Moreover, if you don't want rcsmerge to overwrite hello.c, you can use the -p flag and direct the output to another file. For example:

    rcsmerge -p -r1.4.1 -r2 hello.c > hello.c.merged

These mergers sometimes give rise to conflicts that only you can resolve. For instance, suppose the variable "foo" is declared as an "int" in version 1. You change it to "long", yet meanwhile someone else working on the file makes it a "long long" and checks it in as 1.2. The conflict will turn up in your merged version as:

    <<<<<<< foo.c
     long foo;
    =======
     long long foo;
    >>>>>>> 1.2

Here you have to decide which data type you want, and then delete the conflicting declaration along with the pair of lines marking the start and end of the conflict block, as well as the line of "=" signs separating the two versions.
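Cleaning up by hand is the safe route, but the structure of a conflict block is regular enough to process mechanically. The Python sketch below is a hypothetical helper, not an RCS tool; it assumes the marker layout shown above and simply keeps one side of every conflict. The trailing "int bar;" line is invented to show that text outside the block passes through untouched.

```python
def resolve_conflicts(merged_text, keep):
    """Return merged_text with every conflict block replaced by one side.

    keep="mine" keeps the lines above the '=======' separator,
    keep="theirs" keeps the lines below it.
    """
    if keep not in ("mine", "theirs"):
        raise ValueError("keep must be 'mine' or 'theirs'")
    output = []
    side = None  # None outside a conflict, else "mine" or "theirs"
    for line in merged_text.splitlines():
        if line.startswith("<<<<<<< "):
            side = "mine"
        elif line == "=======" and side == "mine":
            side = "theirs"
        elif line.startswith(">>>>>>> ") and side == "theirs":
            side = None
        elif side is None or side == keep:
            output.append(line)
    return "\n".join(output) + "\n"


merged = (
    "<<<<<<< foo.c\n"
    " long foo;\n"
    "=======\n"
    " long long foo;\n"
    ">>>>>>> 1.2\n"
    "int bar;\n"  # invented line outside the conflict
)

print(resolve_conflicts(merged, "mine"), end="")
```

Picking a whole side blindly is only sensible when you already know which change should win; otherwise the markers are there to make you read both versions.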

In our example, there are no conflicts and, after the merged version has been checked in, RCS/hello.c will look something like this:

    head	2.1;
    access;
    symbols;
    locks sobotka:1.4.1.3; strict;
    comment	@ * @;


    2.1
    date	99.04.04.07.48.54;	author sobotka;	state Exp;
    branches 2.1.1.1;
    next	1.4;

    1.4
    date	99.04.04.03.17.28;	author sobotka;	state Exp;
    branches 1.4.1.1;
    next	1.3;

    1.3
    date	99.04.04.03.15.31;	author sobotka;	state Exp;
    branches 1.3.1.1;
    next	1.2;

    1.2
    date	99.04.04.03.13.11;	author sobotka;	state Exp;
    branches;
    next	1.1;

    1.1
    date	99.04.04.03.11.29;	author sobotka;	state Exp;
    branches;
    next	;

    1.3.1.1
    date	99.04.04.08.14.28;	author sobotka;	state Exp;
    branches;
    next	;

    1.4.1.1
    date	99.04.04.06.59.22;	author sobotka;	state Exp;
    branches;
    next	1.4.1.2;

    1.4.1.2
    date	99.04.04.07.15.23;	author sobotka;	state Exp;
    branches;
    next	1.4.1.3;

    1.4.1.3
    date	99.04.04.08.52.54;	author sobotka;	state Exp;
    branches;
    next	;

    2.1.1.1
    date	99.04.04.08.58.35;	author sobotka;	state Exp;
    branches;
    next	;


    desc
    @Hello Warphead program
    @


    2.1
    log
    @Customized message
    @
    text
    @#include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *greeting = "Hello, Warpie";
        char *name = getenv("USERNAME");

        printf("%s %s!\n", greeting, name);
        exit(0);
    }
    @


    2.1.1.1
    log
    @Translated greeting
    @
    text
    @d6 1
    a6 1
        char *greeting = "Bonjour, Warpie";
    @


    1.4
    log
    @Made message a variable.
    @
    text
    @d2 1
    d6 2
    a7 1
        char *msg = "Hello Warphead!\n";
    d9 2
    a10 2
        printf(msg);
        return 0;
    @


    1.4.1.1
    log
    @First French version.
    @
    text
    @d5 1
    a5 1
        char *msg = "Bonjour, Pierre!\n";
    @


    1.4.1.2
    log
    @Changed French message.
    @
    text
    @d5 1
    a5 1
        char *msg = "Salut, Warpie!\n";
    @


    1.4.1.3
    log
    @Merged with 2.1
    @
    text
    @a1 1
    #include <stdlib.h>
    d5 1
    a5 2
        char *greeting = "Hello, Warpie";
        char *name = getenv("USERNAME");
    d7 2
    a8 2
        printf("%s %s!\n", greeting, name);
        exit(0);
    @


    1.3
    log
    @Changed main to return int.
    @
    text
    @d5 3
    a7 1
        printf("Hello Warphead!");
    @


    1.3.1.1
    log
    @Changed message.
    @
    text
    @d5 1
    a5 1
        printf("Hello Warpie!");
    @


    1.2
    log
    @Included <stdio.h> and formatted.
    @
    text
    @d3 2
    a4 1
    void main(void) {
    d6 2
    @


    1.1
    log
    @Initial revision
    @
    text
    @d1 5
    a5 1
    main{ printf("Hello Warphead!");}
    @

Again, notice how the revisions are ordered so that RCS can pull any version in one pass through the file.

Next month in Part II we'll describe the two basic administrative utilities, rcs and rlog, as well as take a closer look at locks, the "controversial" feature that gave rise to CVS.