V0.4.2 - 12/3/2012
* Multi-threaded bootstrap resampling and approximate randomization significance tests (large time savings when comparing many systems, each with many optimizer runs)
* Fixed bug in n-best scoring that caused oracle *submetrics* such as precision and recall (*not* overall score for BLEU, TER, METEOR, etc. metrics) to be reported incorrectly for oracle hypotheses
* Updated to Guava V11
Timing results with 4 threads vs. 1 thread, using the new multi-threading support, on the 3 example systems with 3 optimizer runs each:

Step                        4 threads   1 thread
Load METEOR                 25.5s       25.5s
Collect Sufficient Stats    32.7s       86.7s
Bootstrap Resampling        70.9s       189s
Approximate Randomization   122s        336s
TOTAL                       4m 13s      10m 39s
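
As a rough illustration of how an approximate randomization test can be split across threads, here is a minimal Python sketch. It is not MultEval's (Java) implementation: the metric `corpus_score`, the (numerator, denominator) sufficient-statistic pairs, and all function names are hypothetical. Each worker gets its own seeded RNG and a share of the shuffles, and the per-worker counts are summed at the end. In CPython the GIL limits speedups for pure-Python code like this, so the sketch shows how the work is partitioned rather than reproducing the timings above.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def corpus_score(stats):
    """Hypothetical corpus-level metric: summed numerators over summed denominators."""
    num = sum(s[0] for s in stats)
    den = sum(s[1] for s in stats)
    return num / den if den else 0.0


def _count_extreme(stats_a, stats_b, observed, n_shuffles, seed):
    """Run n_shuffles random swaps; count differences at least as large as observed."""
    rng = random.Random(seed)  # private RNG per worker, so no shared mutable state
    count = 0
    for _ in range(n_shuffles):
        shuf_a, shuf_b = [], []
        for a, b in zip(stats_a, stats_b):
            if rng.random() < 0.5:  # swap this sentence's stats between the two systems
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(corpus_score(shuf_a) - corpus_score(shuf_b)) >= observed:
            count += 1
    return count


def approx_randomization(stats_a, stats_b, n_shuffles=10000, n_threads=4, seed=0):
    """Two-sided paired approximate randomization p-value, shuffles split across threads."""
    observed = abs(corpus_score(stats_a) - corpus_score(stats_b))
    # Divide the shuffles as evenly as possible among the workers.
    base, extra = divmod(n_shuffles, n_threads)
    chunks = [base + (1 if i < extra else 0) for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(_count_extreme, stats_a, stats_b, observed, chunk, seed + i)
            for i, chunk in enumerate(chunks)
        ]
        total = sum(f.result() for f in futures)
    # Common add-one estimate: (extreme count + 1) / (shuffles + 1)
    return (total + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    same = [(3, 10), (5, 10), (7, 10)]
    print(approx_randomization(same, same, n_shuffles=1000))  # 1.0
```

With identical inputs every shuffle ties the observed difference of zero, so the p-value is 1.0. Bootstrap resampling can be parallelized the same way, by giving each worker its own RNG and a share of the resamples.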
V0.4.1 - 12/30/2011
* Fixed a sizing bug, reported by John DeNero, that caused MultEval to crash
* Removed "static" keyword from several places within the TER library to make it more amenable to multi-threading