Classic Shell Scripting: the word-frequency problem is a frequently used exercise in introductory CompSci courses, meant to get students to rethink how they imagine data. The classic puzzle is expressed as:

“Given a text file and an integer K, you are to print the K most common words in the file (and the number of their occurrences) in decreasing frequency” (Bentley & Knuth, May 1986).

McIlroy’s solution (June 1986) forms the core of my Catastrophic Frequency script, which can process an arbitrarily large corpus at a rate of approximately 100 megabytes per minute (on a single 2 GHz processor). Below the fold: the sources.
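
For orientation, McIlroy's published answer is usually quoted as a six-stage pipeline: split the text into one word per line, fold case, sort, count duplicates, sort by count, and keep the first K lines. The sketch below wraps that pipeline in a function; the name `top_k_words` is hypothetical, and this is the textbook pipeline only, not necessarily what the Catastrophic Frequency script itself does.

```shell
# Sketch of McIlroy's word-frequency pipeline.
# top_k_words is a hypothetical wrapper name, not taken from the original script.
# Reads text on stdin, prints the K most common words with their counts.
top_k_words() {
    k="$1"                      # K: how many words to print
    tr -cs 'A-Za-z' '\n' |      # turn every run of non-letters into one newline
    tr 'A-Z' 'a-z' |            # fold upper case to lower case
    sort |                      # bring identical words together
    uniq -c |                   # collapse duplicates, prefixing each with its count
    sort -rn |                  # order by count, largest first
    sed "${k}q"                 # print the first K lines, then quit
}
```

Assuming a file named `corpus.txt` (a placeholder), `top_k_words 10 < corpus.txt` would print the ten most frequent words. Words with equal counts come out in whatever order `sort -rn` leaves them, so ties are not broken in any guaranteed way.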