Literate programming part 1: What is it?

Previously I’ve written about Niels Malotaux’s drive for zero defects. One of his principles (shared by the IBM Cleanroom approach) is that any failure, such as the discovery of a bug during testing, triggers a return to the design phase. The idea is that a strong design should yield zero defects; any defect is first assumed to be a defect in the design. Niels spends most of his time in the hardware world, and when I saw him last he demonstrated how a design document can link directly to an implementation, such as a circuit diagram or microcontroller code.

I was intrigued by this idea and wondered how it could be applied to the world of software. Ben Linders suggested I look at literate programming (LP) and that sent me down a rabbit hole. Several months later I thought it was about time I wrote down some of what I’d discovered. I hope this will go some small way towards breathing new life into a much-overlooked subject.

This is therefore a series of articles on literate programming:

  1. What is it? (The rest of this article)
  2. Problems and challenges
  3. Modern variants
  4. Closing thoughts

Literate programming is an approach to writing software, developed by Donald Knuth in the 1980s. I’m inclined to describe it as “design-driven coding”, but he didn’t, so perhaps I shouldn’t. What he does say is:

Literate programming is a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.

The broad approach is that you, as a developer, primarily write a design document describing the design of your software. But throughout this document you insert the code that demonstrates your design. So you might say “We should make a connection to the server, and we’ll capture network errors, but save the reporting of them for later,” and that will be followed by the code which does exactly that. This way the design decision and the code that implements it sit right alongside each other.
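As a sketch of what that looks like on the page (an entirely hypothetical fragment, with made-up names, using the chunk notation of the Literate tool shown in the fuller example below), the passage might read:

We should make a connection to the server, and we’ll capture network errors, but save the reporting of them for later.

--- Connect to the server
int sock = open_connection(server_host, server_port);
if (sock < 0) {
    @{Remember the network error for later reporting}
}
---

The prose carries the design reasoning; the code chunk that follows it is what that reasoning justifies.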

One crucial aspect of this is that the source code is extracted from the design document. Source code is never written directly. This makes the design and the source code almost inseparable. A “code review”, if you choose to do one, becomes a “design and code review”, and we escape the problem of documentation falling behind the code.

A second crucial aspect of Knuth’s original idea is that source code can be written out of order; in fact, it probably should be. The primary ordering of content is the story of the design. The code included in our design document can contain placeholder references to low-level code, which keeps the focus on the high-level design; the placeholders are then filled in with detail later in the document.

A full example might be useful. This design document is from Zachary Yedidia’s Literate project, which also works with a Vim plugin for editing. Here, @s signals a section heading, and @{...} indicates a placeholder, to be detailed further down:

@title Hello world in C

@s Introduction

This is an example hello world C program.
We can define codeblocks with `---`

--- hello.c
@{Includes}

int main() {
    @{Print a string}
    return 0;
}
---

Now we can define the `Includes` codeblock:

--- Includes
#include <stdio.h>
---

Finally, our program needs to print "hello world"

--- Print a string
printf("hello world\n");
---

With a single command (the lit command), this document can then be turned into a nicely-formatted HTML document, complete with hyperlinks allowing the reader to see where a piece of code is detailed, or where detailed code is used. That’s especially useful for non-trivial documents. And of course the same command will also extract and assemble the source code to compile Hello World.
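For completeness, the hello.c extracted from the example above should come out as essentially the ordinary C file you’d expect, with each placeholder substituted in place (the comment is mine, just to show where the blocks landed):

/* hello.c: the Includes block, then main() with Print a string inlined */
#include <stdio.h>

int main() {
    printf("hello world\n");
    return 0;
}

It then compiles and runs like any hand-written C program.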

This is a simple example, but there’s a much more powerful example for the Unix word count program. I hope you can see the potential for this. Knuth himself wrote the whole of TeX in this way, and says he couldn’t have done so without it. You can see the rendered design document of his original code as a PDF.

And literate programming has some supporters today. After 6 months of use, Joël Franusic says:

Not only is it a lot of fun to write literate programs, I feel like I have gained a new “super power” of sorts.

Timothy Daly has been using it for 15 years and says:

Imagine a physics textbook that was “just the equations” without any surrounding text. That’s the way we write code today. It is true that the equations are the essence but without the surrounding text they are quite opaque.

He has a great article going into more detail.

And John W. Shipman gives this example, after defending LP against criticism:

I have been doing data entry for the Institute for Bird Populations since 1988. My data entry system is on its seventh complete rewrite. Sometimes several years go by between change requests from the client. The specification is 34 pages, and the internals about 200 pages. A lot of the code is now obsolete. Yet when I get a change request, I can generally refamiliarize myself with the project structure and jet down to the point of the change and charge the client less than an hour’s labor.

If these are the stories and promises, why isn’t literate programming used more widely? I’ll look at that in the next article in the series.


2 thoughts on “Literate programming part 1: What is it?”

  1. Hi Nik

    Interesting return to the IBM Program Design Language we used back in the early 1980s (https://dl.acm.org/citation.cfm?id=988253).

The developer took the functional spec and wrote a pseudocode definition of the requirements, translated into human-understandable language that described what the program would do to deliver the requirement. It was then walked through as code, to check that the logic described met the requirements of the spec; the business user could participate and understand the logic, not just the developers. Only then could the executable code be developed, placed between the related blocks of pseudocode, which served as readily understandable explanatory comments.

The benefits to code quality and maintainability were so obvious that we used it not only for languages such as PL/1 but also for the early 4GLs emerging at the time.
