The Anon CVS Bazaar - Part I
Written by Henry Sobotka
Until recently, most open-source code was released in "tarballs", usually *.zip or *.tar.gz files named after the version they contain. The trend nowadays is to set up a public or anonymous CVS server and make it the "recommended" method of downloading the project's source code.
The main advantage of CVS is that it gives you "live" code, meaning the source tree as it is right now with all the latest updates; or the option of using various flags to specify a particular version, branch or date to copy to your system. Its update command also enables you to keep your copy in sync with the current version instead of downloading another tarball or one or more sets of patches which you then have to apply.
As always, there's a downside: version mergers during an update sometimes result in "conflicts" that have to be cleaned up manually before the file can be compiled; or, while checking out or updating your copy, you might cross paths with someone checking in changes, and end up with a broken build because you pull the latest bar.c, which now calls newfoo() defined in foo.c, which only gets checked in seconds after you draw the old foo.c. But these fall mostly into the category of inconveniences and are far outweighed by the system's benefits.
A Peek into the Attic
Below the snip line is a two-part shar archive containing a READ_ME, Makefile, Install, cvs.1 manual, 10 shell-command scripts for /bin, and 15 "auxiliaries" for /lib. The files are all timestamped mid-June 1986.
RCS, the Revision Control System, had been developed a few years earlier by Walter F. Tichy at Purdue University, Indiana. Tichy's Design, Implementation, and Evaluation of a Revision Control System was among the papers presented at the 6th International Conference on Software Engineering in September 1982. The oldest RCS files are dated the following month, and the first release (version 3.1) appeared in 1983. Tichy provides a concise description of his program in RCS -- A System for Version Control, a 1995 paper distributed with RCS: "RCS manages revisions of text documents, in particular source programs, documentation, and test data. It automates the storing, retrieval, logging and identification of revisions, and it provides selection mechanisms for composing configurations." He also points out that "RCS was originally intended for programs".
For a firsthand look at how RCS works, bash out a really ugly little hello.c:
Then create a subdirectory named "RCS" for your repository and, with the RCS binaries in your PATH and rcslib.dll in LIBPATH, check in the file:
The repository hello.c is divided into four sections: header, revision list, description and revisions.
The first line of the header records the latest version, here 1.4. The "access" field holds the login names of the people allowed to update the file. When empty, as in our example, normal UNIX permissions apply. Just below that, "symbols" records the symbolic names associated with a specific revision or branch. For instance, if you decided to name version 1.3 "GONZO", the RCS/hello.c header would read:
With this, you can pull version 1.3 with:
The next item, "locks", records the login name of anyone who has a lock on the file along with the version they locked, e.g. "fuzzhead:2.7;". Finally, "comment" provides space for adding a note to RCS/hello.c in place of the asterisk. "@" is the string delimiter in RCS files, so to include that character in a log entry (for example, if adding the reviser's email address), you have to use "@@".
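The "@@" doubling rule can be sketched in a few lines of Python. This is only an illustration of the quoting convention described above, not code taken from RCS; the helper names rcs_quote and rcs_unquote and the sample address are made up for the example.

```python
def rcs_quote(text):
    """Wrap text in '@' delimiters, doubling any '@' inside it."""
    return "@" + text.replace("@", "@@") + "@"


def rcs_unquote(quoted):
    """Reverse rcs_quote: drop the outer delimiters and undouble '@@'."""
    if len(quoted) < 2 or not (quoted.startswith("@") and quoted.endswith("@")):
        raise ValueError("not an @-delimited string")
    return quoted[1:-1].replace("@@", "@")


# Hypothetical log entry containing an email address.
log_entry = "Fixed greeting; questions to pierre@example.org\n"
stored = rcs_quote(log_entry)
print(stored)
assert rcs_unquote(stored) == log_entry
```

Because every embedded "@" is doubled, a lone "@" in the stored text can only ever mean "end of string", which is what lets RCS keep arbitrary source text between the delimiters.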
The revision list beneath the header is in reverse chronological order, with the latest version at the top. First comes the revision number, followed by the date. The timestamp is in year-month-day-hours-minutes-seconds format. This value is always stored as UTC (Coordinated Universal Time), also known as GMT (Greenwich Mean Time), in the RCS file. A "-z" option to both "ci" and "co", however, can be used to specify a format for keyword substitution, or the timezone when using the -d option either to control the check-in date and time, or to check out a file by date instead of revision number.
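As a rough illustration, here is how such a timestamp could be read in Python. The sketch assumes the six fields are separated by dots and that a two-digit year means 19xx; the sample value and the five-hour offset are invented for the example, and parse_rcs_date is a hypothetical helper, not part of RCS.

```python
from datetime import datetime, timedelta, timezone


def parse_rcs_date(stamp):
    """Turn a dot-separated RCS date into a timezone-aware UTC datetime."""
    year, month, day, hour, minute, second = (int(p) for p in stamp.split("."))
    if year < 100:  # assumption: two-digit years belong to the 1900s
        year += 1900
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


when = parse_rcs_date("1999.03.21.17.05.42")  # made-up sample value
print(when.isoformat())

# The stored value stays in UTC; a zone is applied only for display,
# which is roughly the job the -z option does for keyword substitution.
print(when.astimezone(timezone(timedelta(hours=-5))).isoformat())
```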
Next on the same line is a record of the login name of the author of the revision, derived from the LOGNAME environment variable. Lastly, "state" is here set to the default value of "Exp", or "experimental". All other states are user-defined by using the -s option of "ci". For instance:
The "desc" section simply contains the program description you entered when you first checked in hello.c.
The final block of RCS/hello.c stores the contents of the successive versions of hello.c, again in reverse order. First comes the latest in its entirety, followed by the previous ones expressed as deltas, or differences, each accompanied by your log entry. Here you can see the "@" characters marking the beginning and end of the comments you entered and of the content of the revisions.
The differences are relative to the next higher version, and use what is known as "RCS diff format". Each command consists of a command letter followed immediately by the starting line number, then a space and the number of lines. Thus "d5 3" means "delete 3 lines starting at line 5", and "a7 1" means "add 1 line after line 7 (of the first file)". Differences are stored by line rather than by character, so changing a single letter will cause the entire line to appear among the deltas.
Thus to retrieve a specific revision, RCS reads the latest version and applies diffs until it reaches the one you want. Because of the reverse order, any version can quickly be pulled with a single pass through the RCS file.
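To make the mechanics concrete, here is a minimal Python sketch of applying such deltas and walking back from the latest version. It models a trunk-only history as a plain list rather than parsing a real RCS file, and the names apply_delta and checkout, along with the two sample revisions, are invented for the illustration.

```python
def apply_delta(lines, delta):
    """Apply one RCS-format delta to a list of lines and return the result.

    Line numbers in a delta refer to the text being edited, so the
    commands are applied bottom-up to keep earlier line numbers valid.
    """
    commands = []
    i = 0
    while i < len(delta):
        op, rest = delta[i][0], delta[i][1:]
        start, count = (int(n) for n in rest.split())
        i += 1
        if op == "a":
            commands.append((start, op, delta[i:i + count]))
            i += count  # skip the text lines that belong to this command
        elif op == "d":
            commands.append((start, op, count))
        else:
            raise ValueError("unknown delta command: %r" % op)

    result = list(lines)
    for start, op, arg in reversed(commands):
        if op == "a":
            result[start:start] = arg  # insert after line `start`
        else:
            del result[start - 1:start - 1 + arg]
    return result


def checkout(revisions, wanted):
    """Start from the full latest text and apply deltas until `wanted`.

    `revisions` is ordered newest first: the first entry holds the full
    text, every later entry holds the delta that recreates that revision.
    """
    (head, text), older = revisions[0], revisions[1:]
    lines = list(text)
    if head == wanted:
        return lines
    for revision, delta in older:
        lines = apply_delta(lines, delta)
        if revision == wanted:
            return lines
    raise KeyError(wanted)


# Hypothetical two-revision history of a hello.c-like file.
revisions = [
    ("1.2", ["#include <stdio.h>", "int main(void)", "{",
             '    puts("Hello, Warphead!");', "    return 0;", "}"]),
    ("1.1", ["d2 1", "a2 1", "main()", "d5 1"]),
]
print("\n".join(checkout(revisions, "1.1")))
```

Note that the loop in checkout never looks back: each delta is applied once, in file order, which is the single pass described above.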
Going Out on a Limb
In terms of the tree metaphor commonly used in working with RCS and CVS, the first version of hello.c is the root, and 1.4 is the tip. Now let's create a branch. Suppose your friend Pierre wants a French version of hello.c to greet him. Easy enough:
To store it separately from your original, you can create a branch:
Now if you reopen RCS/hello.c, you'll see 1.4.1.1 listed under "branches" for version 1.4, with its diff stored between 1.4 and 1.3. But meanwhile Pierre decides he would rather be greeted by "Salut, Warpie!", so you pull his branch with
But accommodating Pierre gives you the idea of customizing the message, so you revamp hello.c to read:
This, you decide, should become a separate version by going:
The check-in creates version 2.1 in RCS/hello.c where, as before, only the latest revision appears in its entirety and even the final 1.4 is stored as a diff. This version numbering, it should be noted, is RCS-specific; you can use a completely different set of digits for releases. You can also go back and work from any point on the tree. For instance, if Klaus and Gino come along and want the same thing as Pierre in their native languages, you could just pull 1.4 and create branches 1.4.2 and 1.4.3.
What happens if you simply pull 1.3, change "Warphead" to "Warpie", and check it back in? You effectively create yet another branch (1.3.1).
Now suppose you want to update Pierre's branch to the new version. You can do this by merging the two files. First, check out 1.4.1 and then enter:
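The numbering pattern behind these branches can be sketched in Python. The sketch relies on the RCS convention that a revision number has an even number of dot-separated fields and a branch number an odd one; the helper names are hypothetical and nothing here touches an actual RCS file.

```python
def is_branch(number):
    """Branch numbers (1.4.1) have an odd field count; revisions (1.4) an even one."""
    return len(number.split(".")) % 2 == 1


def next_branch(base, existing):
    """Return the next free branch number sprouting from revision `base`."""
    taken = [
        int(b.rsplit(".", 1)[1])
        for b in existing
        if is_branch(b) and "." in b and b.rsplit(".", 1)[0] == base
    ]
    return "%s.%d" % (base, max(taken, default=0) + 1)


def first_revision(branch):
    """The first check-in on a branch gets '.1' appended to the branch number."""
    return branch + ".1"


branches = []
for _ in ("Pierre", "Klaus", "Gino"):      # three branches off 1.4
    branches.append(next_branch("1.4", branches))
branches.append(next_branch("1.3", branches))  # the edit made from 1.3

print(branches)
print(first_revision(branches[0]))
```

Run as written, the list comes out as 1.4.1, 1.4.2, 1.4.3 and 1.3.1, matching the branches named in the text.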
This overwrites hello.c with a new merged version which in our example turns out to be identical to 2.1. If you now check it in without specifying a revision (-r flag), you create the next revision on the 1.4.1 branch. Or you could translate the greeting and check it in as -r2.1.1 which creates a new branch off 2.1.
Moreover, if you don't want rcsmerge to overwrite hello.c, you can use the -p flag and direct the output to another file. For example:
These mergers sometimes give rise to conflicts that only you can resolve. For instance, suppose the variable "foo" is declared as an "int" in version 1. You change it to "long", yet meanwhile someone else working on the file makes it a "long long" and checks it in as 1.2. The conflict will turn up in your merged version as:
Here you have to decide which data type you want, and then delete the conflicting declaration along with the pair of lines marking the start and end of the conflict block, as well as the line of "=" signs separating the two versions.
In our example, there are no conflicts and, after the merged version has been checked in, RCS/hello.c will look something like Figure 2. Again, notice how the revisions are ordered so that RCS can pull any version in one pass through the file.
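The cleanup itself is mechanical once the choice is made, as this small Python sketch shows. It assumes the usual merge-style markers (a line starting with "<<<<<<<", a line of "=" signs, and a line starting with ">>>>>>>"), with your working copy in the first half of the block. The function name resolve_conflicts and the sample text are invented for the illustration.

```python
def resolve_conflicts(lines, keep="mine"):
    """Drop conflict markers, keeping one side of every conflict block.

    keep="mine" keeps the text above the "=======" line,
    keep="theirs" keeps the text below it.
    """
    if keep not in ("mine", "theirs"):
        raise ValueError("keep must be 'mine' or 'theirs'")
    result = []
    side = None  # None outside a conflict block, else "mine" or "theirs"
    for line in lines:
        if line.startswith("<<<<<<<"):
            side = "mine"
        elif line.startswith("=======") and side == "mine":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None
        elif side is None or side == keep:
            result.append(line)
    return result


# Hypothetical merge result for the "foo" declaration.
merged = [
    "<<<<<<< foo.c",
    "long foo;",
    "=======",
    "long long foo;",
    ">>>>>>> 1.2",
    "int bar;",
]
print("\n".join(resolve_conflicts(merged, keep="theirs")))
```

A script like this only helps when one side should win wholesale; where the two changes need to be combined, editing the block by hand remains the only option.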
Next month in Part II we'll describe the two basic administrative utilities, rcs and rlog, as well as take a closer look at locks, the "controversial" feature that gave rise to CVS.