Feb 9, 2011

Analyze Democrat vs Republican speeches: trying out NLTK

Task: identify interesting collocations (word pairs that occur together unusually often) in text

Corpus:  speeches made by politicians in the U.S. House of Representatives during debates over legislation

Analysis: using the collocations module of NLTK (the Natural Language Toolkit, a Python package).

Association measures:
  • chi-square
  • mutual information
  • log-likelihood
  • raw frequency
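
To make the four measures concrete, here is a rough sketch that scores a single word pair from its counts. It is plain Python using the usual textbook definitions over a 2x2 contingency table, not NLTK's own code; NLTK exposes the same measures as raw_freq, chi_sq, pmi and likelihood_ratio on BigramAssocMeasures, and its exact counting conventions may differ from the ones assumed here. The function name bigram_scores and its parameter names are made up for this illustration.

```python
import math


def bigram_scores(n_ii, n_ix, n_xi, n_xx):
    """Score one bigram (w1, w2) from its counts.

    n_ii: how often w1 is immediately followed by w2
    n_ix: how many bigrams start with w1
    n_xi: how many bigrams end with w2
    n_xx: total number of bigrams in the corpus
    Assumes 0 < n_ix < n_xx and 0 < n_xi < n_xx (no empty margins).
    """
    # Observed 2x2 contingency table, flattened row by row.
    observed = [
        n_ii,                       # w1 followed by w2
        n_ix - n_ii,                # w1 followed by another word
        n_xi - n_ii,                # another word followed by w2
        n_xx - n_ix - n_xi + n_ii,  # neither w1 first nor w2 second
    ]
    # Counts expected if w1 and w2 occurred independently.
    expected = [
        n_ix * n_xi / n_xx,
        n_ix * (n_xx - n_xi) / n_xx,
        (n_xx - n_ix) * n_xi / n_xx,
        (n_xx - n_ix) * (n_xx - n_xi) / n_xx,
    ]
    return {
        # Share of all bigrams that are exactly (w1, w2).
        "raw_freq": n_ii / n_xx,
        # Pearson chi-square statistic over the four cells.
        "chi_sq": sum((o - e) ** 2 / e for o, e in zip(observed, expected)),
        # Pointwise mutual information, in bits.
        "pmi": math.log2(n_ii * n_xx / (n_ix * n_xi)),
        # Log-likelihood ratio (G^2); empty cells contribute nothing.
        "log_likelihood": 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        ),
    }


# Hypothetical counts: the pair occurs 8 times in a corpus of 100 bigrams.
scores = bigram_scores(n_ii=8, n_ix=10, n_xi=10, n_xx=100)
```

Raw frequency rewards pairs that are simply common, while the other three compare the observed count with what independence would predict, so they can rank the same pairs quite differently.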

Color key for the table:
Green: appears in both parties' speeches, but at a different rank.
Red: appears only in Republican speeches.
Blue: appears only in Democratic speeches.
Black: given almost equal importance by both parties.

The table below ranks pairs of consecutive words (bigrams) by raw frequency, i.e. by how often each pair occurs in the corpus, with the most frequent pair first.
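
The counting step itself is simple enough to sketch without NLTK. The version below is only an approximation of what the collocations module does: the tokenizer, the tiny STOPWORDS set and the function name top_bigrams are all assumptions made for illustration, and the post does not say which word filter, if any, was actually applied.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "a", "in", "to", "and", "is", "for"}


def top_bigrams(text, n=16, stopwords=STOPWORDS):
    """Return the n most frequent pairs of consecutive words with counts."""
    # Crude tokenizer: lowercase alphabetic runs only.
    words = re.findall(r"[a-z]+", text.lower())
    # Count adjacent pairs, skipping any pair that contains a stopword.
    pairs = Counter(
        (first, second)
        for first, second in zip(words, words[1:])
        if first not in stopwords and second not in stopwords
    )
    return [("-".join(pair), count) for pair, count in pairs.most_common(n)]


# Made-up sample text, not taken from the corpus.
sample = (
    "Stem cell research matters. Stem cell research in the United States. "
    "The United States funds health care."
)
top_three = top_bigrams(sample, n=3)
```

Note that this sketch ignores sentence boundaries, so the last word of one sentence is paired with the first word of the next.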

Top 16 collocations by party

Rank  Democrats             Republicans
   1  united-states         united-states
   2  stem-cell             stem-cell
   3  health-care           embryonic-stem
   4  american-people       small-businesses
   5  cell-research         small-business
   6  social-security       would-like
   7  tax-cuts              cell-research
   8  patriot-act           patriot-act
   9  embryonic-stem        may-consume
  10  conference-report     american-people
  11  estate-tax            health-care
  12  last-year             death-tax
  13  bill-would            homeland-security
  14  endangered-species    federal-government
  15  national-security     law-enforcement
  16  homeland-security     conference-report
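
The color groups can be worked out mechanically from the two ranked lists above. The sketch below does that with plain Python; the cutoff max_rank_gap=1 for "almost equal importance" is my own assumption, as are the function and group names, since the post does not state how close two ranks had to be.

```python
democrats = [
    "united-states", "stem-cell", "health-care", "american-people",
    "cell-research", "social-security", "tax-cuts", "patriot-act",
    "embryonic-stem", "conference-report", "estate-tax", "last-year",
    "bill-would", "endangered-species", "national-security",
    "homeland-security",
]
republicans = [
    "united-states", "stem-cell", "embryonic-stem", "small-businesses",
    "small-business", "would-like", "cell-research", "patriot-act",
    "may-consume", "american-people", "health-care", "death-tax",
    "homeland-security", "federal-government", "law-enforcement",
    "conference-report",
]


def compare_rankings(dem, rep, max_rank_gap=1):
    """Group collocations by which party's list they appear in."""
    dem_rank = {term: i for i, term in enumerate(dem, start=1)}
    rep_rank = {term: i for i, term in enumerate(rep, start=1)}
    groups = {
        "democrat_only": [],    # blue in the color key
        "republican_only": [],  # red
        "similar_rank": [],     # black (assumed cutoff: max_rank_gap)
        "different_rank": [],   # green
    }
    for term in dem:
        if term not in rep_rank:
            groups["democrat_only"].append(term)
        elif abs(dem_rank[term] - rep_rank[term]) <= max_rank_gap:
            groups["similar_rank"].append(term)
        else:
            groups["different_rank"].append(term)
    groups["republican_only"] = [t for t in rep if t not in dem_rank]
    return groups


groups = compare_rankings(democrats, republicans)
for name, terms in groups.items():
    print(name, terms)
```

Changing max_rank_gap moves terms between the last two groups, so the split between them depends on the chosen cutoff.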