Science, Culture And The Google Ngram Viewer

Written by January 12, 2011 10:06 pm

The scientific study of cultural trends typically involves years of training and preparation, followed by several more years of data compilation, and ends with a major chunk of one's life spent analyzing all the data collected.

With Google’s new Ngram Viewer application you can become a scientific researcher on cultural trends without having to leave your room!

Here's how it works. Google has archived a huge number of books in several languages, going back a few centuries. This archive can be searched with Google's algorithms according to various criteria: the whole database can be scanned for specific words or phrases, and the results presented as useful graphical comparisons of how often they appear over time.


Comparison of how often the word "science" appears in books over time vs the word "religion".
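To make "scanned for specific words" a little more concrete, here is a minimal Python sketch under stated assumptions: a toy list of (year, text) pairs stands in for the scanned books, words are split on whitespace, and each word's count in a given year is divided by the total number of words from that year. Dividing by the yearly total matters because far more books were published in recent decades, so raw counts alone would rise for almost any word. The data and the function names (`yearly_totals_and_counts`, `relative_frequency`) are hypothetical; this is roughly the kind of normalized frequency the graphs display, not Google's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a scanned book collection: (publication year, text).
BOOKS = [
    (1900, "religion and science in the modern age"),
    (1900, "a history of religion"),
    (1950, "science and the public"),
    (1950, "the science of science"),
]

def yearly_totals_and_counts(books):
    """Return (total words per year, per-year word counts)."""
    totals = Counter()
    counts = defaultdict(Counter)
    for year, text in books:
        words = text.split()  # naive whitespace tokenizer (an assumption)
        totals[year] += len(words)
        counts[year].update(words)
    return totals, counts

def relative_frequency(word, books):
    """Share of all words from each year's books that are exactly `word`."""
    totals, counts = yearly_totals_and_counts(books)
    return {year: counts[year][word] / totals[year] for year in sorted(totals)}

for word in ("science", "religion"):
    print(word, relative_frequency(word, BOOKS))
```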

Google calls this Culturomics. They have been working with the Cultural Observatory at Harvard to come up with the datasets and the tools to analyze them. According to Google’s description of the Ngram Viewer tool, it is “the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases (‘United States of America’) in books over time. You’ll be searching through over 5.2 million books: ~4% of all books ever published!”


Same comparison as above, with dates narrowed down to the last few decades.
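Phrases such as "United States of America" are n-grams: runs of n consecutive words (here n = 4). A similar sketch can count a phrase per year within a narrowed date range, as in the graph above. It assumes, as a simplification, that each year's denominator is the number of n-grams of the same length in that year's text; the sample sentences and the helpers `ngrams` and `phrase_frequency` are hypothetical.

```python
from collections import Counter

# Hypothetical sample texts tagged with publication years.
BOOKS = [
    (1850, "the United States of America"),
    (1990, "the United States of America and the United Kingdom"),
    (2000, "United States of America"),
]

def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def phrase_frequency(phrase, books, start, end):
    """Per-year share of n-grams equal to `phrase`, for years start..end inclusive.

    Assumption: the denominator is the number of n-grams of the same
    length in that year's text (a simplification, not Google's code).
    """
    target = tuple(phrase.split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in books:
        if not start <= year <= end:
            continue  # outside the narrowed date range
        grams = ngrams(text.split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years whose texts were too short to hold any n-gram of this length.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

print(phrase_frequency("United States of America", BOOKS, 1980, 2010))
```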

Each language represented in the database makes up its own corpus (plural: corpora), and together they form the full collection. You can search each corpus and refine your search within it, and you can compare the cultural trends observed in one corpus with those in others. This allows one to compare cultural trends across different cultures!
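Comparing one corpus with another only makes sense with relative frequencies, because the corpora differ greatly in size. The toy example below, with two made-up miniature corpora and a hypothetical `word_share` helper, shows a word that has the higher raw count in the larger corpus but the higher share of the text in the smaller one.

```python
from collections import Counter

def word_share(word, text):
    """Fraction of all whitespace-separated words in `text` equal to `word`."""
    words = text.split()
    return Counter(words)[word] / len(words) if words else 0.0

# Two made-up miniature "corpora" of very different sizes.
small_corpus = "science matters " * 10                  # 20 words
large_corpus = "science " * 15 + "other words " * 100   # 215 words

for name, corpus in (("small", small_corpus), ("large", large_corpus)):
    raw = Counter(corpus.split())["science"]
    print(name, "raw count:", raw, "share:", round(word_share("science", corpus), 3))
```

The larger corpus has more hits in absolute terms, yet the word makes up a much bigger share of the smaller one, which is why cross-corpus comparisons should rest on shares rather than raw counts.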

All searches are case-sensitive, so "science" and "Science" are counted as different words. Various factors, such as changes over time in how terms are used in each language, must be taken into account before drawing conclusions from the raw data.


Similar search to the first one above, but with "Science" and "Religion" capitalized.
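Case sensitivity is easy to see in a small sketch. The sentence and the `count_word` helper below are hypothetical; the point is that a case-sensitive count treats "science" and "Science" as different words, and that capitalized "Science" lumps together sentence-initial uses and the name of the journal Science, which is one reason raw counts need careful interpretation. Lowercasing everything first gives a case-insensitive count.

```python
from collections import Counter

text = ("Science advanced quickly. The journal Science reported it, "
        "and science teachers took note.")

def count_word(word, text, case_sensitive=True):
    """Count exact word matches, optionally ignoring case."""
    # Strip simple punctuation so "quickly." and "it," tokenize cleanly.
    words = [w.strip(".,;:!?\"'") for w in text.split()]
    if not case_sensitive:
        words = [w.lower() for w in words]
        word = word.lower()
    return Counter(words)[word]

print(count_word("science", text))                        # 1
print(count_word("Science", text))                        # 2
print(count_word("science", text, case_sensitive=False))  # 3
```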

As always in science, the interpretation of the raw data is key, and careful controls are absolutely essential. A paper published last month in the journal Science by a team made up of the Harvard group, the Google group and a few other scientists, including Steven Pinker, provides would-be researchers with guidelines on the proper use of the Google Ngram Viewer tool and its corpus of books. The full article is available for free when you register on the website of the journal Science (no credit card necessary).

Let’s use the forums to take a stab at generating and interpreting data using this tool. A thread for this purpose has been created here.


This post was written by:

- who has written 62 posts on Nirmukta.
