V0.4.2 - 12/3/2012
* Multi-threaded bootstrap resampling and approximate randomization significance tests (large time savings when evaluating many systems with many optimizer runs each)
* Fixed a bug in n-best scoring that caused oracle *submetrics* such as precision and recall (*not* the overall scores of metrics such as BLEU, TER, and METEOR) to be reported incorrectly for oracle hypotheses
* Updated to Guava V11
Timing results with 4 threads vs. 1 thread, using the multi-threading improvements in this release, on the 3 example systems with 3 optimizer runs each:
                           4 threads /    1 thread
Load METEOR                   25.5s  /      25.5s
Collect Sufficient Stats      32.7s  /      86.7s
Bootstrap Resampling          70.9s  /     189s
Approximate Randomization    122s    /     336s
TOTAL                       4m 13s   /  10m 39s
V0.4.1 - 12/30/2011
* Fixed a sizing bug, reported by John DeNero, that caused MultEval to crash
* Removed "static" keyword from several places within the TER library to make it more amenable to multi-threading