Jane 2.1 has been released

Matthias Huck
Aug 27, 2012, 11:04:44 AM
to jane-ann...@googlegroups.com
Dear Jane users,

on behalf of the RWTH Aachen University machine translation research
group, I am happy to announce the release of version 2.1 of RWTH's open
source statistical machine translation toolkit. Jane 2.1 is now
available for download at http://www.hltpr.rwth-aachen.de/jane .

We have also updated the Jane manual and extended it with usage
instructions for some of Jane's more advanced features.

Changes in Jane 2.1 are:

* tools
- phrase table interpolation tool
- extraction of reordering events for the lexicalized discriminative
reordering model
* phrase extraction
- more efficient phrase extraction pipeline with filtering of
marginals
- weighted phrase extraction
* features
- extended count feature (enhanced low frequency, ELF)
* translation
- LM look-ahead in source cardinality synchronous search (SCSS)
- word class LM in SCSS
- KenLM interface
- option to apply observation histogram pruning after the computation
of secondary model scores in the cube pruning decoder
* support for running Jane in the LSF queueing system
* support for pseudo-parallelization on multi-processor machines by
running multiple instances of Jane (no "real" shared-memory
multi-threaded parallelization)
* some bugs fixed (e.g., on-demand LM loading with the reloadLM option
in HPBT now works correctly)
* extended manual, with more information on how to use some advanced
features
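
The announcement does not say how the new phrase table interpolation tool combines tables. As a hedged illustration only, linear interpolation is one common way to merge phrase tables. The sketch below assumes that scheme and a toy dictionary representation, and `interpolate` is a hypothetical helper. It is not Jane's implementation or its phrase table file format.

```python
import math
from typing import Dict, List, Tuple

# Toy representation (hypothetical): (source phrase, target phrase) -> p(target | source)
PhraseTable = Dict[Tuple[str, str], float]


def interpolate(tables: List[PhraseTable], weights: List[float]) -> PhraseTable:
    """Linearly interpolate phrase probabilities: p(e|f) = sum_k w_k * p_k(e|f)."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("interpolation weights must sum to 1")
    merged: PhraseTable = {}
    for table, weight in zip(tables, weights):
        for pair, prob in table.items():
            # A pair missing from a table contributes probability 0 from that table.
            merged[pair] = merged.get(pair, 0.0) + weight * prob
    return merged


if __name__ == "__main__":
    in_domain = {("haus", "house"): 0.8, ("haus", "home"): 0.2}
    out_of_domain = {("haus", "house"): 0.6, ("haus", "building"): 0.4}
    print(interpolate([in_domain, out_of_domain], [0.5, 0.5]))
```

In practice the weights would typically be tuned on held-out data (for example, by minimizing perplexity). The sketch simply takes them as given.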

Jane supports phrase-based and hierarchical phrase-based machine
translation, both with many recent enhancements. Let me take the
opportunity to summarize key features of the toolkit:

* phrase extraction
- contiguous phrases
- hierarchical phrases
- support for weighted phrase extraction
* search algorithms
- source cardinality synchronous search for phrase-based translation
- cube pruning, pooled cube pruning and cube growing for hierarchical
translation
* forced alignment phrase training for the phrase-based model
* syntactic extensions of the hierarchical model
- soft syntactic features
- soft syntactic labels
- soft string-to-dependency
* reordering models
- distance-based distortion
- non-lexicalized reordering rules
- discriminative lexicalized reordering model
* lexicon models
- various scoring techniques
- discriminative word lexicon models
- triplet lexicon models
* insertion and deletion models
* optimization methods
- MERT
- MIRA
- Downhill Simplex
* phrase table interpolation tool
* support for different language model formats
- ARPA format
- SRI binary format
- KenLM binary format
- randomized LMs
- in-house format with on-demand loading capabilities
* scoring with class-based LM
* parallelized operation under an Oracle Grid Engine or Platform LSF
batch system
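
The feature list names distance-based distortion without defining it. The sketch below assumes the common textbook definition. Each phrase pays the absolute jump between its first source position and the position right after the previous phrase's last source word, so monotone translation costs 0. Jane's exact formulation may differ, and `distortion_cost` is a hypothetical helper.

```python
from typing import List, Tuple


def distortion_cost(spans: List[Tuple[int, int]]) -> int:
    """Sum of |start_i - end_{i-1} - 1| over phrases.

    `spans` lists the source spans (0-based, inclusive start/end) of the
    phrases in the order their translations appear in the target sentence.
    """
    cost = 0
    prev_end = -1  # virtual position before the first source word
    for start, end in spans:
        # Jump distance relative to continuing right after the previous phrase.
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


if __name__ == "__main__":
    print(distortion_cost([(0, 1), (2, 3)]))  # monotone order
    print(distortion_cost([(2, 3), (0, 1)]))  # swapped phrases
```

For the swapped example, the first phrase jumps 2 positions forward and the second jumps 4 positions back, giving a total cost of 6.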

Do not hesitate to contact us if you have any questions about Jane.

Enjoy!

Best regards,
Matthias

--
Dipl.-Inform. Matthias Huck
Chair of Computer Science 6
RWTH Aachen University
Ahornstr. 55, 52056 Aachen, Germany
Room: 6126
Phone: +49 (241) 80-21617
Fax: +49 (241) 80-22219
http://www.hltpr.rwth-aachen.de/~huck