Photo by J Brew

Literate programming part 3: Modern variants

Literate programming (LP) hasn’t taken off, despite its promises to developers. After my previous posts on understanding literate programming itself and looking at some possible challenges and problems, in this post I’ll be looking at some tooling that’s been developed recently. These tools try to fulfill some of those promises, and some explicitly take on some of the challenges.

In this series

  1. What is it?
  2. Problems and challenges
  3. Modern variants (below)
  4. Closing thoughts

We’ll start with a…

Brief history

The tools today can be said to have descended from the tools of past decades, so let’s review those briefly.

Knuth’s original tool was called WEB [pdf]. Though it heralded literate programming it was complex. It required the source material to be written in TeX (which is itself complicated), it was tied to producing Pascal output, and it had a large number of directives, most of which allowed additional formatting of the Pascal code as it was embedded in the rendered design document. The manual itself runs to 210 pages, which says a lot. After WEB came CWEB [pdf], which was WEB for C programming.

Then in 1993 Norman Ramsey produced noweb, which was effectively a simplified WEB. It is language-independent, has very few directives, and its manual runs to 25 pages.

The more recent attempts to revive literate programming take their lead primarily from noweb, in that they aim to be easy to get started with and aim to be language-independent. Although this is not intended to be a complete survey, they can be categorised as follows.

Faithful LP tools

These are tools which aim to be quite faithful to the original intent and tooling of LP. They provide placeholders (which allow source code to be written out of order, to promote the design narrative) and are mostly language-independent.
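To make the placeholder idea concrete, here is a minimal sketch of the "tangle" step in Python. It borrows noweb's chunk notation (`<<name>>=` opens a chunk, `@` closes it, and `<<name>>` on a line of its own is a placeholder), but it is not the implementation of any tool mentioned here; the function names and the sample document are made up for illustration.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # opens a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # placeholder on its own line


def parse_chunks(document):
    """Collect the code lines of each named chunk; prose is skipped."""
    chunks = {}
    current = None
    for line in document.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand a chunk, recursively replacing placeholders with their code."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            # Carry the placeholder's indentation into the expanded code.
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


DOC = """\
The program greets, then says goodbye. We start with the overall shape.

<<main>>=
def main():
    <<greet>>
    <<farewell>>
@

Only later do we fill in the details, in whatever order suits the story.

<<farewell>>=
print("bye")
@

<<greet>>=
print("hello")
@
"""

source = "\n".join(tangle(parse_chunks(DOC), "main"))
print(source)
```

The document defines `farewell` before `greet`, yet the assembled program has them in the order `main` asks for. That reordering is what lets the narrative, rather than the compiler, dictate the sequence.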

Zachary Yedidia’s Literate is a small tool that I’ve spent a bit of time with, and it’s what was behind the examples I provided in the first article of the series. It produces not only single pages split into sections, but also multi-page books split into chapters. (Each chapter is one of the aforementioned pages.) It also has a vim plugin, so if vim is your editor of choice then you can edit its .lit files and get appropriate syntax highlighting and navigation. If vim is not your editor of choice you may have a bit of work to do. I found it simple, but also quite complete.

James Taylor’s literate-programming-lib is the basis for his lightweight litpro tool and the larger literate-programming command line client, which is litpro plus a lot of plugins specifically to aid JavaScript developers. As it’s a JavaScript library it allows the end-user tools to be extended with more plugins. The raw design documents for litpro and friends are markdown, or rather markdown plus some additional syntax. There’s a manual for all of them on LeanPub, though it’s unfinished, and the whole thing hasn’t been touched for a couple of years. But that’s not to say it’s not working or not useful.

If you code in F# then you might like to take a look at F# Formatting. This allows the raw design document to be either a markdown document with F# code, or an F# script document with embedded markup. Only the latter seems to provide placeholder inclusion, but the former has compiler integration, too.

It’s also worth mentioning Bob Myers’ modernlit. You may recall I referenced Bob extensively in the last article, because he wrote a very detailed piece addressing seriously some of the arguments against literate programming. But modernlit is, sadly, an incomplete project, even though the documentation promises much. It’s intended to rely on pure markdown, and integrate with VSCode. It’s a shame.

Finally we should not forget noweb itself. I’ve not tried this but it’s alive and kicking and available on GitHub (among other places).

The challenge with all these tools is that writing code in markdown (or similar) doesn’t allow full use of an IDE. The suggestion from these authors is that getting full IDE support will always require use of an IDE’s hooks. Specifically, with a pre-compile hook that first assembles the code from the raw design documents, we can get our code compiling in the IDE. Once there, Literate and modernlit both allow source mapping to enable line number referencing back to the raw design docs. Perhaps this isn’t unreasonable; after all, unit test frameworks (for example) stand alone from their related IDE integration.
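The general idea of source mapping can be sketched in a few lines of Python. This is an illustration only, not how Literate or modernlit actually implement it: the function names are hypothetical, and the chunk markers (`<<name>>=` and `@`) are borrowed from noweb. While assembling the code we remember which line of the design document each code line came from, so a compiler error at a given code line can be reported against the document instead.

```python
def assemble_with_map(document):
    """Return (code_lines, origins); origins[i] is the 1-based document
    line that produced code line i + 1."""
    code, origins = [], []
    in_chunk = False
    for doc_line_no, line in enumerate(document.splitlines(), start=1):
        if line.startswith("<<") and line.rstrip().endswith(">>="):
            in_chunk = True
        elif line.strip() == "@":
            in_chunk = False
        elif in_chunk:
            code.append(line)
            origins.append(doc_line_no)
    return code, origins


def to_doc_line(origins, code_line_no):
    """Translate a 1-based line number in the assembled code to the
    corresponding line in the design document."""
    return origins[code_line_no - 1]


DOC = """\
First we define a helper.

<<helper>>=
def double(x):
    return x * 2
@

Then we use it.

<<use>>=
print(double(21))
@
"""

code, origins = assemble_with_map(DOC)
# An error reported on line 3 of the assembled code belongs to
# line 11 of the design document.
doc_line = to_doc_line(origins, 3)
```

A pre-compile hook would write `code` to a file for the compiler and keep `origins` around for translating any diagnostics that come back.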

If you want to have a go with literate programming more-or-less as it was intended, then Literate and noweb should be fairly easy to get started with.

Git-based tools

Picture from Bo-Yi Wu

As mentioned before, one of the protestations levelled at literate programming is that it was created for a time when source control wasn’t in use, and that source control seems to be a good place to document coding decisions. With that in mind, some people have experimented with git as a way of approaching literate programming.

Pete Corey has discussed literate commits. The intent is that git allows more flexibility over how to present the story of the code. But Pete has only started down that road. He’s using commit messages as a way of thoughtfully documenting his design decisions:

The knowledge that the project’s revision history will be on display, rather than buried in the annals of git log is a powerful motivating factor for doings things right the first time.

Then it’s possible to extract a narrative using the git log comments with the code, as in his example.
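As a rough idea of what such an extraction involves, here is a small Python sketch. It is not Pete's tooling; the function names and the output layout are assumptions made for illustration. It walks the history oldest-first and turns each commit message into a section of prose followed by that commit's diff.

```python
import subprocess


def git(repo, *args):
    """Run a git command in `repo` and return its stdout."""
    result = subprocess.run(["git", "-C", repo, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def read_commits(repo):
    """Yield (message, diff) pairs, oldest commit first."""
    for commit in git(repo, "log", "--reverse", "--format=%H").split():
        message = git(repo, "show", "-s", "--format=%B", commit).strip()
        diff = git(repo, "show", "--format=", commit).strip()
        yield message, diff


def render_narrative(commits):
    """Turn (message, diff) pairs into a markdown document."""
    parts = []
    for message, diff in commits:
        subject, _, body = message.partition("\n")
        parts.append("## " + subject)
        if body.strip():
            parts.append(body.strip())
        # Indent the diff by four spaces so markdown treats it as code.
        parts.append("\n".join("    " + line for line in diff.splitlines()))
    return "\n\n".join(parts) + "\n"


# In-memory example, so the rendering can be tried without a repository.
sample = [("Add greeting\n\nWe start by saying hello.", "+print('hello')")]
narrative = render_narrative(sample)
```

Against a real repository the call would be `render_narrative(read_commits("."))`. The quality of the result depends entirely on the care taken over each commit message, which is rather the point.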

A very similar tool is gitorial, which does largely the same thing, but with the explicit intent of producing a tutorial. There is an example output to be viewed, too.

Ben North has taken the idea much further with literate-git. This presents git commits (written as markdown) as a hierarchical document—no longer a linear history. With his demonstration project you can see a story of how the code is developed (description plus code) but at each stage there is a chance to drill down and see the more detailed story of that stage, including code changes in GitHub’s diff style—this placeholder was deleted and replaced with this code, etc. The result is an interactive design document.

However, there are limitations with literate-git, as Ben explains very openly. It requires rewriting a linear series of git commits, adding markers to indicate which commits are a subsection of a larger piece of work. This alone requires care. Also problematic is the principle of editing the design commentary (i.e. the commits) of work already published or pushed to others.

Clearly there has been some thought going into git-based literate programming, but it still seems as though it’s twisting git for a purpose it wasn’t designed for. In all these tools the design narrative is source control, whereas it seems to me that the narrative itself should be under source control. If we evolve our design there is nowhere to store that history.

Editor-based tools

There are a couple of literate programming tools tied directly into text editors.

Babel builds on Emacs’s Org-mode (outline mode) by allowing multiple languages to co-exist in a single Org-mode doc. It uses the same syntax as noweb.

Leo is a text editor wholly centred on writing in outline mode (like Emacs’s Org-mode) and integrates with noweb.

Simplified approaches

If you like the idea of documentation tied closely to the code then you might like a much simplified approach.

KISS Literate Programming (KLP) is very, very simple. It seems to me to be just a script to extract code from a README file.
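Something along these lines, perhaps. The following Python sketch is not KLP's own code, and its convention (pull out every fenced block tagged with a given language and join them in order) is an assumption made for illustration.

```python
FENCE = "`" * 3  # a markdown code fence, spelled out to keep this listing tidy


def extract_code(readme, language="python"):
    """Return the contents of all fenced blocks tagged `language`, in order."""
    blocks, current = [], None
    for line in readme.splitlines():
        if current is None:
            # Outside a wanted block: only an opening fence with the
            # right tag is of interest.
            if line.strip() == FENCE + language:
                current = []
        elif line.strip() == FENCE:
            blocks.append("\n".join(current))
            current = None
        else:
            current.append(line)
    return "\n\n".join(blocks) + "\n"


README = "\n".join([
    "# Greeter",
    "",
    "First we need a function that builds the greeting.",
    "",
    FENCE + "python",
    "def greeting(name):",
    "    return 'Hello, ' + name",
    FENCE,
    "",
    "Shell commands are documentation only and are skipped.",
    "",
    FENCE + "shell",
    "python greeter.py",
    FENCE,
    "",
    "Finally we call it.",
    "",
    FENCE + "python",
    "print(greeting('world'))",
    FENCE,
])

program = extract_code(README)
```

There are no placeholders here, so the code must appear in the README in the order the language requires. That is the price of the simplicity.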

Reverse literate programming by Quentin Bonnard is based on this idea:

Instead of writing a story that will be transformed into a program, we will write a program that will be reassembled into a story.

There is the code, and there is a markdown file describing the code. Quentin’s tool is an extension to Jekyll (which is what GitHub Pages is based on), so it allows Jekyll markdown documents to include instructions that identify and insert code snippets.
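The mechanism can be sketched in Python as follows. The marker syntax here (`# snippet: name` and `# end snippet` in the code, `[[snippet: name]]` in the story) is invented for this example and is not the syntax of Quentin's Jekyll extension; the function names are hypothetical too.

```python
import re

BEGIN = re.compile(r"^\s*# snippet: (\w+)\s*$")   # hypothetical marker
END = re.compile(r"^\s*# end snippet\s*$")        # hypothetical marker
INCLUDE = re.compile(r"^\[\[snippet: (\w+)\]\]$")  # hypothetical include


def collect_snippets(source):
    """Map each snippet name to the code lines between its markers."""
    snippets, name = {}, None
    for line in source.splitlines():
        begin = BEGIN.match(line)
        if begin:
            name = begin.group(1)
            snippets[name] = []
        elif END.match(line):
            name = None
        elif name is not None:
            snippets[name].append(line)
    return snippets


def weave(story, snippets):
    """Replace each include line in the story with the snippet's code,
    indented four spaces so markdown renders it as a code block."""
    out = []
    for line in story.splitlines():
        include = INCLUDE.match(line)
        if include:
            out.extend("    " + code for code in snippets[include.group(1)])
        else:
            out.append(line)
    return "\n".join(out)


SOURCE = """\
# snippet: area
def area(radius):
    return 3.14159 * radius ** 2
# end snippet

# snippet: main
print(area(2))
# end snippet
"""

STORY = """\
The program prints the area of a circle:

[[snippet: main]]

It relies on this helper:

[[snippet: area]]
"""

page = weave(STORY, collect_snippets(SOURCE))
```

The source file stays an ordinary program that any IDE can work with, while the story is free to present `main` before `area`.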

Language-specific approaches

There is also a small number of language-specific tools that, by virtue of being language-specific, enable a very powerful, well-integrated workflow.

Rmarkdown, Jupyter Notebooks and Wolfram Notebooks all enable the creation of interactive or “live” notebooks, allowing us to explain complex analyses. These are typically used—widely—in the science and data science communities and produce particularly compelling results.

On a less grand scale Literate Haskell is a way of compiling Haskell from markup documents, and Marginalia allows Clojure code to be viewed alongside its associated commentary (extracted from the code). You can see an example of Marginalia’s output.

None of these allow LP’s all-important placeholder substitution, so none of them can really claim to be literate programming as it was intended, but certainly the notebook tools demonstrate that IDE integration really elevates their capabilities.

What have we learned?

Well, I’m not sure what you’ve learned, but here’s what I’ve learned. There are some respectable (and relatively straightforward) LP tools available today. However, they are mostly missing integration with the rest of a developer’s toolchain, IDEs in particular. And when we look at Rmarkdown and friends it’s clear how compelling that integration would make them. Meanwhile, the few experiments with git commits as the home for design decisions have shown (to me at least) that git is just not the right tool for this job; its use is either too basic to constitute real literate programming or too complicated for practical use.

In the next and final part of the series I’ll offer some closing thoughts on where we might go from here.

Bombe photo from J Brew

gitflow picture from Bo-Yi Wu