Aug 18, 2011

Reducing GWT application compile time

After setting up GWT in Eclipse and creating my first GWT project, I saw this in the compiler output console:


Compiling 1 permutation
      Compiling permutation 0...
   Compile of permutations succeeded



I wondered what these permutations were. It turns out that GWT generates a separate version of the application for each browser and locale because, unfortunately, browsers can behave differently. So it generates different versions of the JavaScript for IE, Firefox, Safari, etc. If we use internationalization with, for example, 5 languages, it also generates a separate version for each language. With 5 locales and 3 browsers, that makes 15 different cases, or in other words 15 permutations, and each one adds to the compilation time. During development this can be annoying, so to check the output quickly we can cut this down by naming just one browser, or user agent, to compile for (with a single locale, that leaves just one permutation). We set this property in the GWT module file, <module-name>.gwt.xml:


<module rename-to='myproject'>
  <inherits name='com.google.gwt.user.User'/>
  <set-property name="user.agent" value="gecko1_8"/>
...


In this case it generates only one version, for Firefox (gecko1_8 is Firefox :D), and the compilation time drops to a level a developer can tolerate.
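
To make the 5 x 3 = 15 arithmetic concrete, here is a small Python sketch of how the number of permutations grows. It is only a simplified model that assumes a permutation is one (user agent, locale) pair; the function name count_builds and the locale codes are made up for illustration, and nothing here calls the GWT compiler.

```python
from itertools import product

def count_builds(user_agents, locales):
    """Return every (user agent, locale) pair in this simplified model."""
    return list(product(user_agents, locales))

# Illustrative values only: three browsers and five languages.
user_agents = ["ie6", "gecko1_8", "safari"]
locales = ["en", "fr", "de", "es", "it"]

all_builds = count_builds(user_agents, locales)
print(len(all_builds))  # 15

# Pinning user.agent to one value, with a single locale, leaves one pair.
dev_builds = count_builds(["gecko1_8"], ["en"])
print(len(dev_builds))  # 1
```

Every extra browser or locale multiplies the total, which is why pinning just the user agent already helps so much during development.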




Feb 9, 2011

Analyze Democrat vs Republican speeches: trying out NLTK

Task:  to identify interesting collocations from text

Corpus:  speeches made by politicians in the U.S. House of Representatives during debates over legislation

Analysis: using the collocations module of the NLTK package.

Association measures:
  • chi square
  • mutual information
  • log likelihood
  • raw frequency
Green:  appear in both parties' speech, but in different order.
Red: only in Republican speech
Blue: only in Democrat speech.
Black: almost equal importance given by both parties

This table shows the result of using the raw frequency count of two consecutive words appearing together in the corpus, sorted by frequency.

Top 16 collocations by Democrats    Top 16 collocations by Republicans

united-states                       united-states
stem-cell                           stem-cell
health-care                         embryonic-stem
american-people                     small-businesses
cell-research                       small-business
social-security                     would-like
tax-cuts                            cell-research
patriot-act                         patriot-act
embryonic-stem                      may-consume
conference-report                   american-people
estate-tax                          health-care
last-year                           death-tax
bill-would                          homeland-security
endangered-species                  federal-government
national-security                   law-enforcement
homeland-security                   conference-report
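
For anyone who wants to see what the raw frequency measure does, here is a minimal sketch in plain Python. It is not the script I ran: the stop list, the sample text and the function name top_bigrams are all made up for illustration, and dropping stopwords before pairing words is an assumption. NLTK's collocations module provides BigramCollocationFinder and BigramAssocMeasures for doing this on a real corpus.

```python
from collections import Counter

# Hypothetical stop list; a real run would use a much fuller one.
STOPWORDS = {"the", "of", "in", "and", "to", "a", "is", "for"}

def top_bigrams(text, n):
    """Return the n most frequent adjacent word pairs after dropping stopwords."""
    words = [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]
    # Pair each remaining word with the one that follows it and count the pairs.
    pairs = Counter(zip(words, words[1:]))
    return ["-".join(pair) for pair, _ in pairs.most_common(n)]

# Made-up sample text, not taken from the speech corpus.
sample = (
    "health care in the united states is important "
    "the united states needs health care reform "
    "the united states"
)
print(top_bigrams(sample, 2))  # ['united-states', 'health-care']
```

The other measures in the list above (chi square, mutual information, log likelihood) rank the same pairs differently, because they also take into account how often each word appears on its own.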