Task: identify interesting collocations in text
Corpus: speeches made by politicians in the U.S. House of Representatives during debates over legislation
Data: downloaded from http://www.cs.cornell.edu/home/llee/data/convote.html
Analysis: the collocations module of NLTK (the Natural Language Toolkit, a Python package).
Association measures:
- chi-square
- pointwise mutual information (PMI)
- log-likelihood ratio
- raw frequency
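All four measures can be computed from a 2x2 contingency table of bigram counts. The sketch below is a from-scratch illustration, not NLTK's code. NLTK exposes comparable scorers as `BigramAssocMeasures.raw_freq`, `.pmi`, `.chi_sq` and `.likelihood_ratio`, and its scores may differ slightly because it builds the marginals from unigram counts. The function names and example counts here are hypothetical.

```python
import math

def contingency(n_ii, n_i_, n__i, n_total):
    """Observed 2x2 counts for a bigram (w1, w2).

    n_ii:    occurrences of the bigram "w1 w2"
    n_i_:    bigrams whose first word is w1
    n__i:    bigrams whose second word is w2
    n_total: total number of bigrams in the corpus
    """
    n_io = n_i_ - n_ii                   # w1 followed by a word other than w2
    n_oi = n__i - n_ii                   # w2 preceded by a word other than w1
    n_oo = n_total - n_ii - n_io - n_oi  # bigrams containing neither
    return (n_ii, n_io, n_oi, n_oo)

def expected(obs):
    """Counts expected if w1 (first) and w2 (second) were independent."""
    n_ii, n_io, n_oi, n_oo = obs
    n = sum(obs)
    row1, row2 = n_ii + n_io, n_oi + n_oo  # w1 first / not w1 first
    col1, col2 = n_ii + n_oi, n_io + n_oo  # w2 second / not w2 second
    return (row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n)

def raw_freq(n_ii, n_total):
    return n_ii / n_total

def pmi(n_ii, n_i_, n__i, n_total):
    # log2 of P(w1 w2) / (P(w1 first) * P(w2 second))
    return math.log2(n_ii * n_total / (n_i_ * n__i))

def chi_square(obs):
    # Pearson's chi-square, without continuity correction
    return sum((o - e) ** 2 / e for o, e in zip(obs, expected(obs)) if e > 0)

def log_likelihood(obs):
    # G^2 statistic; cells with an observed count of 0 contribute nothing
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, expected(obs)) if o > 0)

if __name__ == "__main__":
    obs = contingency(n_ii=10, n_i_=20, n__i=25, n_total=1000)
    print(raw_freq(10, 1000), pmi(10, 20, 25, 1000), chi_square(obs), log_likelihood(obs))
```

Raw frequency ignores how common each word is on its own, while the other three compare the observed count with what independence would predict. That is why they can rank rare but tightly bound pairs above frequent ones.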
Green: appears in both parties' lists, but at different ranks.
Red: appears only in Republican speeches.
Blue: appears only in Democratic speeches.
Black: ranked about equally by both parties.
This table ranks bigrams (pairs of consecutive words) by their raw frequency in the corpus, from most to least frequent.
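Ranking by raw frequency only requires counting adjacent word pairs. Below is a minimal standard-library sketch of that step, assuming a simple letters-only tokenizer and a hypothetical stopword list. The original analysis used NLTK and does not say how it tokenized or which words, if any, it filtered.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "of", "to", "and", "a", "in", "that", "is", "for", "this"})

def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters. Punctuation is
    # dropped, so pairs can span sentence boundaries.
    return re.findall(r"[a-z]+", text.lower())

def top_bigrams(text, n=16, stopwords=STOPWORDS):
    """Return the n most frequent consecutive word pairs as ('w1-w2', count)."""
    tokens = tokenize(text)
    pairs = zip(tokens, tokens[1:])  # each token paired with its successor
    # Skip pairs containing a stopword rather than deleting stopwords first,
    # so only words that were actually adjacent are ever paired.
    counts = Counter(p for p in pairs if p[0] not in stopwords and p[1] not in stopwords)
    # Counter.most_common keeps first-seen order among equal counts.
    return [(f"{w1}-{w2}", c) for (w1, w2), c in counts.most_common(n)]

sample = ("The United States Senate and the United States House. "
          "Stem cell research; stem cell lines.")
print(top_bigrams(sample, n=2))  # → [('united-states', 2), ('stem-cell', 2)]
```

Filtering pairs instead of tokens matters: removing "the" before pairing would create bigrams from words that never stood next to each other.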
Rank | Top 16 collocations by Democrats | Top 16 collocations by Republicans
-----|----------------------------------|-----------------------------------
1    | united-states                    | united-states
2    | stem-cell                        | stem-cell
3    | health-care                      | embryonic-stem
4    | american-people                  | small-businesses
5    | cell-research                    | small-business
6    | social-security                  | would-like
7    | tax-cuts                         | cell-research
8    | patriot-act                      | patriot-act
9    | embryonic-stem                   | may-consume
10   | conference-report                | american-people
11   | estate-tax                       | health-care
12   | last-year                        | death-tax
13   | bill-would                       | homeland-security
14   | endangered-species               | federal-government
15   | national-security                | law-enforcement
16   | homeland-security                | conference-report
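The color legend is effectively a rule for comparing the two ranked lists. A minimal sketch of that rule follows, using the lists from the table. The `tolerance` threshold for "almost equal importance" is a hypothetical parameter, since the notes do not say how close two ranks must be.

```python
DEM = ["united-states", "stem-cell", "health-care", "american-people",
       "cell-research", "social-security", "tax-cuts", "patriot-act",
       "embryonic-stem", "conference-report", "estate-tax", "last-year",
       "bill-would", "endangered-species", "national-security", "homeland-security"]
REP = ["united-states", "stem-cell", "embryonic-stem", "small-businesses",
       "small-business", "would-like", "cell-research", "patriot-act",
       "may-consume", "american-people", "health-care", "death-tax",
       "homeland-security", "federal-government", "law-enforcement", "conference-report"]

def color_code(dem, rep, tolerance=1):
    """Map each bigram in either ranked list to a legend color.

    tolerance is hypothetical: rank differences up to this value are
    treated as "almost equal importance".
    """
    dem_rank = {b: r for r, b in enumerate(dem, start=1)}
    rep_rank = {b: r for r, b in enumerate(rep, start=1)}
    colors = {}
    for b in dict.fromkeys(dem + rep):  # union of both lists, first-seen order
        if b not in rep_rank:
            colors[b] = "blue"   # Democratic list only
        elif b not in dem_rank:
            colors[b] = "red"    # Republican list only
        elif abs(dem_rank[b] - rep_rank[b]) <= tolerance:
            colors[b] = "black"  # both lists, similar rank
        else:
            colors[b] = "green"  # both lists, different ranks
    return colors

colors = color_code(DEM, REP)
print([b for b, c in colors.items() if c == "black"])
# → ['united-states', 'stem-cell', 'patriot-act']
```

With `tolerance=1`, only the three bigrams that hold the same rank in both lists come out black. Raising the threshold moves near-misses such as cell-research (ranks 5 and 7) from green to black.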