wf.sh is frequently used in introductory CompSci courses to get students to rethink how they imagine data. The classic puzzle is expressed as:
“Given a text file and an integer K, you are to print the K most common words in the file (and the number of their occurrences) in decreasing frequency” (Bentley & Knuth, May 1986).
McIlroy’s solution (June 1986) forms the core of my Catastrophic Frequency script, which can process an arbitrarily large corpus at a rate of approximately 100 megabytes per minute (on a single 2 GHz processor). Below the fold: the sources.
- Classic Shell Scripting: Hidden Commands that Unlock the Power of Unix, by Arnold Robbins & Nelson H. F. Beebe. Published by O’Reilly, 2005. p. 102
Wherein Robbins & Beebe tell the history of wf.sh.
- Programming pearls: literate programming, by Jon Bentley & Don Knuth. Communications of the ACM archive, Volume 29, Issue 5 (May 1986). p. 364 – 369
Wherein Bentley presents his word counting challenge to Knuth.
- Programming pearls: a literate …, by Jon Bentley, Don Knuth & Doug McIlroy. Communications of the ACM archive, Volume 29, Issue 6 (June 1986). p. 471 – 483
Wherein Doug McIlroy presents wf.sh in response to a challenge by Bentley to Knuth.
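
For readers who have not seen it, the pipeline McIlroy presented is usually quoted as six small stages. The version here is a minimal sketch of that widely reproduced form, not my Catastrophic Frequency script itself; wrapping it in a function named `wf` and defaulting K to 10 are my own assumptions for illustration.

```shell
#!/bin/sh
# wf: print the K most common words on standard input, most frequent first.
# Sketch of the six-stage pipeline commonly attributed to McIlroy (1986).
# The function wrapper, the name "wf", and the default K=10 are assumptions.
wf() {
    k="${1:-10}"
    tr -cs 'A-Za-z' '\n' |   # squeeze every run of non-letters into one newline
    tr 'A-Z' 'a-z' |         # fold upper case to lower case
    sort |                   # bring identical words next to each other
    uniq -c |                # collapse each run into "count word"
    sort -rn |               # order by count, largest first
    sed "${k}q"              # print the first K lines, then quit
}
```

Called as `wf 10 < corpus.txt` (a hypothetical file name), it reads the text once and never holds a word table in the script itself; `sort` does the heavy lifting, which is what lets the approach cope with inputs larger than memory. Note that only ASCII letters count as word characters here, so accented words are split.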