1986. Don Knuth vs Doug McIlroy

It is well-known computing history that in 1986 Donald Knuth had been exploring an idea called “Literate Programming”: setting out a programming solution in an order that made sense to humans rather than to a compiler (in the WEB language), after which it was processed and turned into a compilable Pascal program.

As a demonstration, he gave an example word-count program that ran to several pages and hundreds of lines of code.

Doug McIlroy critiqued it, comparing it to a “one-line” Unix script …

tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
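
For readers who don’t live in the shell, here is the same pipeline with a comment per stage (my annotations, not McIlroy’s):

tr -cs A-Za-z '\n' |   # squeeze every run of non-letters into a newline: one word per line
tr A-Z a-z |           # fold upper case to lower case
sort |                 # bring identical words together
uniq -c |              # collapse each run of identical lines into “count word”
sort -rn |             # sort numerically by count, largest first
sed ${1}q              # print the first $1 lines (the script’s argument), then quit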

McIlroy: So, when I wrote this critique a year or two ago of Knuth’s web demonstration. Jon Bentley got Knuth to demonstrate his web programming system, which is a beautiful idea. You write a program in an order that fits the exposition of the program, rather than the order that fits the syntax of the programming language. And then you have a processor which takes the program written in expository order and turns it into a Pascal program and files it. The expository order not only contains program fragments, it contains lots of text right with it. The declarations are stuck in at the point where you need them, rather than up at the head of the block, and then they’re moved by the web processor to the right place to compile the run. Really elegant, and he calls it, Knuth calls it “literate programming”. And Bentley asked him to write a demonstration literate program for Bentley’s “Programming Pearls” column.

I wrote a critical report about Knuth’s program. It’s a little unfair, because his program was written as an illustration of a technique, and my report criticizes it on engineering grounds, that that was not the right way to write the program. And one of the things I wrote about his program was, it reads the text and prints out a list of the distinct words in the text and word counts, sorted by word count. An easy problem. And he wrote one monolithic program to do the whole job, and I said “Look, here’s the way you do it in Unix with canonical pipelines, and although I don’t recommend it … although this is not what he was out to do, I really think that he should not have put the definition of what a word is into the program that builds tables, that these were completely unrelated things.”

I have heard people a number of times, e.g. in a YouTube video, compare the Unix script to the Pascal (actually WEB) program, framing the solutions as something like 500 lines vs 6 (or 1).

For a long time, I have thought … “Hold up. Each of those Unix utilities is a program made up of code. You might not have written them, but they did have to be written”.

And I have wanted to check how much code is embodied in those utilities: tr, sort, uniq, and sed.

So today I asked Claude.ai about it.

When asking an AI anything, I always add ‘List references’.

Not only is it often very instructive, but when you get a ridiculous answer, you can usually immediately see why.


What are you interpreting as incorrect?

Given the prompt, and realising now that there are only 4 utilities involved …
(Not 1986 versions, but indicative)

tr : ~2,000 lines
sort : ~5,000 lines
uniq : ~650 lines
sed : ~5,500 lines (without dependencies)

CoreUtils : coreutils/src at master · coreutils/coreutils · GitHub
Sed : sed/sed at master · mirror/sed · GitHub
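
If you want to reproduce rough counts like these yourself (against today’s sources, not the 1986 ones), a quick sketch with git and wc does the job; the exact file paths are my guesses at the relevant sources in the two repositories linked above:

git clone --depth 1 https://github.com/coreutils/coreutils
git clone --depth 1 https://github.com/mirror/sed
wc -l coreutils/src/tr.c coreutils/src/sort.c coreutils/src/uniq.c
wc -l sed/sed/*.c   # the sed/ directory proper, matching the “without dependencies” figure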

I remembered that I forgot to add the page that got me started thinking about this stuff again.