A follow-up to the ngram.sh post with data sources for Wikipedia and Project Gutenberg ngrams. The ngram.sh script can easily be adapted to extract keywords from these databases and many others.
Drawing inspiration from the General Inquirer (1966) and KWIC (keyword-in-context) indexing, this post proposes an iterative hybrid of existing methods in a quest for a more flexible and robust machine-assisted content analysis system.
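KWIC lists every occurrence of a search term together with its neighboring words. The sketch below shows the idea in the same shell-and-awk style. It is not code from the post, and it rests on a few assumptions: the function name `kwic` is hypothetical, a word is any run of ASCII letters, the match is case-insensitive, and a fixed number of words is shown on each side. Punctuation is dropped, and context can run across sentence boundaries.

```shell
# kwic WORD [WIDTH]: hypothetical sketch of a keyword-in-context listing.
# Reads text on stdin and prints each occurrence of WORD (case-insensitive),
# bracketed, with up to WIDTH words (default 3) of context on either side.
kwic() (
  LC_ALL=C; export LC_ALL              # byte-wise letter ranges, set only inside this subshell
  tr -cs 'A-Za-z' '\n' |               # one word per line; runs of non-letters become a newline
  sed '/^$/d' |                        # drop the empty line produced by leading punctuation
  awk -v kw="$1" -v w="${2:-3}" '
    { words[NR] = $0 }                 # keep the whole token stream in memory
    END {
      for (i = 1; i <= NR; i++) {
        if (tolower(words[i]) != tolower(kw)) continue
        line = ""
        for (j = i - w; j < i; j++)    # left context, clipped at the start of the text
          if (j >= 1) line = line words[j] " "
        line = line "[" words[i] "]"
        for (j = i + 1; j <= i + w && j <= NR; j++)  # right context, clipped at the end
          line = line " " words[j]
        print line
      }
    }'
)
```

For example, `kwic disaster 5 < articles.txt` would print one line per hit, with five words on each side. `articles.txt` is a placeholder file name.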
A short bibliography on the famous 1986 word-frequency problem behind wf.sh:
“Given a text file and an integer K, you are to print the K most common words in the file (and the number of their occurrences) in decreasing frequency.”
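The problem statement above can be met with a short Unix pipeline. The sketch below is a hypothetical re-creation in the spirit of Doug McIlroy's well-known 1986 pipeline answer. It is not the wf.sh script itself. The function name `wf`, the default K of 10, and the rule that a word is any run of ASCII letters are all assumptions.

```shell
# wf K: hypothetical sketch. Reads text on stdin and prints the K most
# common words with their counts, most frequent first (default K = 10).
wf() (
  LC_ALL=C; export LC_ALL   # predictable ranges and sort order, set only inside this subshell
  k=${1:-10}
  tr -cs 'A-Za-z' '\n' |    # split into one word per line (runs of non-letters -> newline)
  tr 'A-Z' 'a-z' |          # fold case so "The" and "the" count as the same word
  sed '/^$/d' |             # drop the empty line left by leading punctuation
  sort |                    # bring identical words together
  uniq -c |                 # count each run of identical words
  sort -rn |                # order by count, highest first
  sed "${k}q"               # print the first K lines, then quit
)
```

For example, `wf 25 < book.txt` would list the 25 most frequent words in a placeholder file `book.txt`. Each stage does one job, which is why the pipeline is easy to modify: a stop-word filter, for instance, would be one more stage before `sort`.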
A longitudinal study of keyword frequencies in the New York Times between 2001 and 2008 supported the hypothesized typologies of catastrophic myths. Patterns of occurrence are consistent between natural and man-made disasters.