K&R C


I was recently reading the C99 rationale (because why the fuck not) and I was intrigued by some comments that certain features were retired in C89, so I started wondering what other weird features existed in pre-standard C. Naturally, I decided to buy a copy of the first edition of The C Programming Language, which was published in 1978.

"The C programming Language" by Kernighan and Ritchie

Here are the most interesting things I found in the book:

  • The original hello, world program appears in this book on page 6.

    In C, the program to print “hello, world” is

    main()
    {
         printf("hello, world\n");
    }

    Note that return 0; was not necessary.

  • Exercise 1-8 reads:

    Write a program to replace each tab by the three-column sequence >, backspace, -, which prints as ᗒ, and each backspace by the similar sequence ᗕ. This makes tabs and backspaces visible.

    … Wait, what? This made no sense to me at first glance. I guess the output device is assumed to literally be a teletypewriter, and backspace moves the carriage to the left and then the next character just gets overlaid on top of the one already printed there. Unbelievable!

  • Function definitions looked like this:
    power(x, n)
    int x, n;
    {
         ...
         return(p);
    }

    The modern style, in which the argument types are given within the parentheses, didn’t exist in K&R C. The K&R style is still permitted even as of C11, but has been obsolescent for many years. (N.B.: It has never been valid in C++, as far as I know.) Also note that the return value has been enclosed in parentheses. This was not required, so the authors must have preferred this style for some reason (which is not given in the text). Nowadays it’s rare for experienced C programmers to use parentheses when returning a simple expression.
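
For comparison, here is a minimal sketch of the same function written in the modern prototype style. The book's snippet above elides the body, so the loop below is my own assumption about what a simple power function might do; the K&R spelling is kept in a comment because current compilers may reject it.

```c
/*
 * K&R (first edition) spelling, shown only as a comment:
 *
 *     power(x, n)
 *     int x, n;
 *     { ... return(p); }
 */

/* Prototype-style definition: return type and parameter types are explicit. */
int power(int x, int n)
{
    int p = 1;

    /* Multiply p by x, n times; n is assumed to be non-negative. */
    for (int i = 0; i < n; ++i)
        p = p * x;
    return p;   /* no parentheses needed around the returned expression */
}
```

Because the parameter types are part of the declaration here, a compiler can check every call to power against them, which the old style did not allow.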

  • Because of the absence of function prototypes, if, say, you wanted to take the square root of an int, you had to explicitly cast to double, as in: sqrt((double) n) (p. 42). Arguments were not converted to the parameter types; apart from the default promotions (such as char to int and float to double), they were just passed as-is. Failing to cast would result in nonsense. (In modern C, a prototype in scope makes the conversion happen implicitly; calling sqrt without one and passing an int is undefined behaviour.)
  • void didn’t exist in the book (although it did exist at some point before C89; see here, section 2.1). If a function had no useful value to return, you just left off the return type (so it would default to int). return;, with no value, was allowed in any function. The general rule is that it’s okay for a function not to return anything, as long as the calling function doesn’t try to use the return value. A special case of this was main, which is never shown returning a value; instead, section 7.7 (page 154) introduces the exit function and states that its argument’s value is made available to the calling process. So in K&R C it appears you had to call exit to do what is now usually accomplished by returning a value from main (though of course exit is useful for terminating the program when you’re not inside main).
  • Naturally, since there was no void, void* didn’t exist, either. Stroustrup’s account (section 2.2) appears to leave it unclear whether C or C++ introduced void* first, although he does say it appeared in both languages at approximately the same time. The original implementation of the C memory allocator, given in section 8.7, returns char*. On page 133 there is a comment:

    The question of the type declaration for alloc is a vexing one for any language that takes its type-checking seriously. In C, the best procedure is to declare that alloc returns a pointer to char, then explicitly coerce the pointer into the desired type with a cast.

    (N.B.: void* behaves differently in ANSI C and C++. In C it may be implicitly converted to any other pointer type, so you can directly do int* p = malloc(sizeof(int)). In C++ an explicit cast is required.)
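
The difference between the two conventions can be sketched as follows. The names old_alloc, new_int_old_style, and new_int_ansi_style are hypothetical, and old_alloc is only a stand-in built on malloc, not the allocator from section 8.7.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a pre-void* allocator: it hands back char *. */
char *old_alloc(size_t nbytes)
{
    return malloc(nbytes);   /* void * converts to char * implicitly in C */
}

/* K&R-era usage: the char * result must be cast to the desired type. */
int *new_int_old_style(int value)
{
    int *p = (int *) old_alloc(sizeof(int));

    if (p != NULL)
        *p = value;
    return p;
}

/* ANSI C usage: malloc returns void *, so no cast is required. */
int *new_int_ansi_style(int value)
{
    int *p = malloc(sizeof *p);

    if (p != NULL)
        *p = value;
    return p;
}
```

Both functions return memory that the caller releases with free. Compiled as C++, the second one would need a cast as well.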

  • It appears that stdio.h was the only header that existed. For example, strcpy and strcmp are said to be part of the standard I/O library (section 5.5). Likewise, on page 154 exit is called in a program that only includes stdio.h.
  • Although printf existed, the variable arguments library (varargs.h, later stdarg.h) didn’t exist yet. K&R says that printf is … non-portable and must be modified for different environments. (Presumably it peeked directly at the stack to retrieve arguments.)
  • The authors seemed to prefer separate declaration and initialization. I quote from page 83:

    In effect, initializations of automatic variables are just shorthand for assignment statements. Which form to prefer is largely a matter of taste. We have generally used explicit assignments, because initializers in declarations are harder to see.

    These days, I’ve always been told it’s good practice to initialize the variable in the declaration, so that there’s no chance you’ll ever forget to initialize it.

  • Automatic arrays could not be initialized (p. 83)
  • The address-of operator could not be applied to arrays. In fact, when you really think about it, it’s a bit odd that ANSI C allows it. This reflects a deeper underlying difference: arrays are not lvalues in K&R C. I believe in K&R C lvalues were still thought of as expressions that can occur on the left side of an assignment, and of course arrays do not fall into this category. And of course the address-of operator can only be applied to lvalues (although not to bit fields or register variables). In ANSI C, arrays are lvalues so it is legal to take their addresses; the result is of type pointer to array. The address-of operator also doesn’t seem to be allowed before a function in K&R C, and the decay to function pointer occurs automatically when necessary. This makes sense because functions aren’t lvalues in either K&R C or ANSI C. (They are, however, lvalues in C++.) ANSI C, though, specifically allows functions to occur as the operand of the address-of operator.
  • The standard memory allocator was called alloc, not malloc.
  • It appears that it was necessary to dereference function pointers before calling them; this is not required in ANSI C.
  • Structure assignment wasn’t yet possible, but the text says [these] restrictions will be removed in forthcoming versions. (Section 6.2) (Likewise, you couldn’t pass structures by value.) Indeed, structure assignment is one of the features Stroustrup says existed in pre-standard C despite not appearing in K&R (see here, section 2.1).
  • In PDP-11 UNIX, you had to explicitly link in the standard library: cc ... -lS (section 7.1)
  • Memory allocated with calloc had to be freed with a function called cfree (p. 157). I guess this is because calloc might have allocated memory from a different pool than alloc, one which is pre-zeroed or something. I don’t know whether such facilities exist on modern systems.
  • Amusingly, creat is followed by [sic] (p. 162)
  • In those days, a directory in UNIX was a file that contains a list of file names and some indication of where they are located (p. 169). There was no opendir or readdir; you just opened the directory as a file and read a sequence of struct direct objects directly. Example is given on page 172. You can’t do this in modern Unix-like systems, in case you were wondering.
  • There was an unused keyword, entry, said to be reserved for future use. No indication is given as to what use that might be. (Appendix A, section 2.3)
  • Octal literals were allowed to contain the digits 8 and 9, which had the octal values 10 and 11, respectively, as you might expect. (Appendix A, section 2.4.1)
  • All string literals were distinct, even those with exactly the same contents (Appendix A, section 2.5). Note that this guarantee does not exist in ANSI C, nor C++. Also, it seems that modifying string literals was well-defined in K&R C; I didn’t see anything in the book to suggest otherwise. (In both ANSI C and C++ modifying string literals is undefined behaviour, and in C++11 it is not possible without casting away constness anyway.)
  • There was no unary + operator (Appendix A, section 7.2). (Note: ANSI C only allows the unary + and - operators to be applied to arithmetic types. In C++, unary + can also be applied to pointers.)
  • It appears that unsigned could only be applied to int; there were no unsigned chars, shorts, or longs. Curiously, you could declare a long float; this was equivalent to double. (Appendix A, section 8.2)
  • There is no mention of const or volatile; those features did not exist in K&R C. In fact, const was originally introduced in C++ (then known as C With Classes); this C++ feature was the inspiration for const in C, which appeared in C89. (More info here, section 2.3.) volatile, on the other hand, originated in C89. Stroustrup says it was introduced in C++ [to] match ANSI C (The Design and Evolution of C++, p. 128)
  • The preprocessing operators # and ## appear to be absent.
  • The text notes (Appendix A, section 17) that earlier versions of C had compound assignment operators with the equal sign at the beginning, e.g., x=-1 to decrement x. (Supposedly you had to insert a space between = and - if you wanted to assign -1 to x instead.) It also notes that the equal sign before the initializer in a declaration was not present, so int x 1; would define x and initialize it with the value 1. Thank goodness that even in 1978 the authors had had the good sense to eliminate these constructs… :P
  • A reading of the grammar on page 217 suggests that trailing commas in initializers were allowed only at the top level. I have no idea why. Maybe it was just a typo.
  • I saved the biggest WTF of all for the end: the compilers of the day apparently allowed you to write something like foo.bar even when foo does not have the type of a structure that contains a member named bar. Likewise the left operand of -> could be any pointer or integer. In both cases, the compiler supposedly looks up the name on the right to figure out which struct type you intended. So foo->bar if foo is an integer would do something like ((Foo*)foo)->bar where Foo is a struct that contains a member named bar, and foo.bar would be like ((Foo*)&foo)->bar. The text doesn’t say how ambiguity is resolved (i.e., if there are multiple structs that have a member with the given name).
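
To make the integer case concrete, here is a rough sketch in modern C of what foo->bar would have amounted to when foo was an integer. Foo and bar are the placeholder names used above; the pad member, bar_of, and demo are my own hypothetical additions, and uintptr_t stands in for the plain int an old compiler would have accepted.

```c
#include <stdint.h>

/* Hypothetical struct; bar sits at a nonzero offset to make the lookup visible. */
struct Foo {
    int pad;
    int bar;
};

/* What "foo->bar" amounted to when foo was a plain integer holding an address. */
int bar_of(uintptr_t foo)
{
    return ((struct Foo *) (void *) foo)->bar;
}

/* Store a real object's address in an integer, then read bar through it. */
int demo(void)
{
    struct Foo f = { 1, 42 };
    uintptr_t address = (uintptr_t) (void *) &f;

    return bar_of(address);
}
```

Here demo returns 42, the value stored in f.bar. A modern compiler insists on the explicit casts; the compilers described in the book supplied the equivalent silently, using only the member name to pick the struct type.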

About Brian

Hi! I'm Brian Bi. As of November 2014 I live in Sunnyvale, California, USA and I'm a software engineer at Google. Besides code, I also like math, physics, chemistry, and some other miscellaneous things.

16 Responses to K&R C

  1. I am not sure if function prototypes were in the _original_ K&R, but they definitely existed well before ANSI. They looked like this (quote from a working program):

    int zmsndfiles(file_list*);
    int zmsndfiles(lst)
    file_list *lst;
    {

  2. Not sure why it’s unbelievable that you would print a character, then a backspace, then another character to get overprinting… back in the day, we did that all the time. It didn’t work on what we called “glass TTYs” i.e. “modern” serial terminals, but it worked on teletypewriters, most dot matrix or daisy wheel printers (yes, I remember them too). I also worked with a vector terminal a few times, and overprinting worked on it. In fact, it was a bear to figure out how to erase characters inline with that one.

  3. James Brown says:

    “return 0” from main is not necessary even now (C99).

  4. Struct members were in the same global namespace, so there was no ambiguity possible.

    The Unix v6 sources used this quite a bit for cheap type overloading. Weird when you first see it.

  5. Rob Pike says:

    Guess you never used a paper terminal (or line printers). Overstriking was the norm. APL depended on it. There were magical overstriking video terminals too, such as the Tektronix 4014.

    I believe void* first appeared in pcc, Steven Johnson’s portable C compiler, although I don’t remember the exact reason why. It might have been to allow explicitly an idea of “pointer to anything” in an era when a byte pointer and a word pointer might have very different representations, but that’s just a guess.

    printf was special but so was nargs (number of arguments) a sorta-related function that needed to be implemented afresh with every new PDP-11 MMU, and always had bugs.

    C with classes was, I believe, the first place not to require dereferencing a function pointer, perhaps because it made methods clumsy. It then appeared in pcc, although I think that was pretty much by accident. I remember asking for it to be added to some compiler for consistency as well as convenience, but I’m not sure whether it was pcc or VAX cc or maybe the 68020 pcc. DMR’s cc didn’t do it.

    Whoever broke directory I/O is my enemy for life.

    I don’t believe const and volatile belong in C at all, but that’s just an opinion.

    Linking with -lS got you standard I/O, as it was called, but it wasn’t really standard yet. The real standard C library came for free. What you now know as standard I/O came in V7.

    In V6 there was #define and #include, implemented by cc.c, which was just a wrapper for the C compiler’s multiple passes (c0, c1, c2, as, ld). The abominations like ## and #if came much later; ## was invented by the ANSI committee as part of making the preprocessor the worst part of C, after wide character handling. Do not invent in committee!

    For the frame buffer on our custom hardware at U of T that lived with registers at address 0, we wrote 0->reg and such to access the control registers defined in a struct like:

    struct X {
    int reg;

    };

    You’ve barely scratched the surface of what was different then, but enjoy.

  6. I think I have a newer edition of the book printed in 1984 which I am sure is not the second edition, since it has the same cover as your book.

    I can’t remember the teletype example or returning in parentheses stuff. I guess preface was saying that ANSI standard is being developed and some changes were made since the first edition.

    I have to check though. Once I find the book I will look at the points you wrote and check which ones were changed before second edition (if any).

  7. kernelbob says:

    Rob Pike wrote: “Whoever broke directory I/O is my enemy for life.”
    Especially directory O, right? (-: I’ve heard that Unix directories were user-writable early on, but by the time I got to Unix (between V6 and V7), that was not possible.

    As for exit() being in stdio, no. In K&R C, it was not necessary to declare extern functions. Any undefined word used as a call was assumed to be an extern.

    The really alien thing about Unix in the 1970s was the size of it. It really was possible to read and understand _everything_. When my school got its V7 tapes, I printed off all eight chapters of the man pages, took them home and read them that night. They were only about 3/4 inch thick (2 cm). Maybe 300 pages, probably less, and most were quite brief. And you’ve no doubt seen the Lions book, which includes the full kernel source code in ~100 pages. (And Digital published full schematics for many of their machines of that era, but I digress.)

  8. The fact that the structure member names came from a common namespace (the reason you could say 0->bar) is the reason that Unix-defined structures have a three-letter prefix. Each member in struct tm, for example, begins with the string “tm_”. The use of these pesky prefixes has now been immortalized in the Posix standard.

    When Dennis heard that I was writing a quick and dirty C compiler from the 1973 definition of C for a project in 2003, he said that function prototypes were a long-overdue feature even in 1973. If he had it to do again, they would have been in the language from the beginning.

  9. In Solaris 10 and OpenSolaris, reading a directory still works. I haven’t tried it in Solaris 11.
    The Classic C argument lists are easy to understand when you remember two things:
    – V6 C mostly managed without types at all, they were mainly added to distinguish between integers and pointers, and until C99 you could still omit the ‘int’ type, e.g., “auto x;” was a legal way to declare an int variable. So you already had funcname(args) { body } declarations. Adding the types between the header and the body meant not disrupting the existing header.
    – Algol 68 and Pascal put argument types inside the header; previous languages, including Fortran, Algol 60, PL/I, Algol W, and Simula 67, put them between the header and the body. C was following a well established tradition. The problem was not where the argument types were written, but that they didn’t have to be written at all and the compiler didn’t use them in processing calls.
    Void, enums, and struct assignment/pass by value were added in between Unix V6 and Unix V7, so some time around 1980.

  10. Dima Korolev says:

    I remember writing those exit(0)-s at the end of main()-s.

    It’s interesting to look back and observe how the old C was slowly diverging from old old Bash.

  11. It’s not very often that I get to appreciate articles that are as good as this. I shared this on my twitter. Thank you for posting this.

  12. Pingback: Linux++ (20th Edition) [Part 1] - Front Page Linux

  13. Pingback: A Guide Through The History of Unix & Linux: Everything You Need To Know - Front Page Linux

  14. segin2010 says:

    While I have not tried this on FreeBSD in about 7 years, as of FreeBSD 9 (released 8 years ago), one can open() and read() directories. However, the data returned was in the directory format of the underlying filesystem. This behavior appeared to be valid at least for FreeBSD UFS and FAT.
