Skipping the tokenizer


arezooa...@gmail.com

Oct 13, 2014, 9:04:47 AM
to joshua_d...@googlegroups.com
Hi,

I have added the Joshua source code to Eclipse and built it with Ant. However, I want to skip the tokenization step. How can I compile the Joshua source code without the tokenizer?

Thank you

Matt Post

Oct 13, 2014, 1:31:25 PM
to joshua_d...@googlegroups.com
Hi,

I'm not sure what you mean. Eclipse is used to develop the decoder portion of Joshua; the tokenizer is only used when building models, and has nothing to do with Eclipse. You can specify your own tokenizer with the '--tokenizer' flag to pipeline.pl. The command should read the text on STDIN and print it to STDOUT. You can also turn off tokenization (and normalization and lowercasing) with --no-prepare-data.
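
To illustrate the STDIN/STDOUT contract described above, here is a minimal sketch of a stand-in tokenizer in Python. The script and its function names are hypothetical and not part of Joshua. It only collapses whitespace, which is an assumption about what a "do almost nothing" tokenizer might look like; the one property it is careful about is writing exactly one output line per input line.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in tokenizer: reads text on STDIN, prints it to STDOUT."""
import sys


def tokenize_line(line):
    # Collapse runs of whitespace into single spaces and strip the line ends.
    # The tokens themselves are left untouched (no splitting of punctuation).
    return " ".join(line.split())


def tokenize_lines(lines):
    # Yield one output line per input line, including empty lines,
    # so the line count of the output matches the input.
    for line in lines:
        yield tokenize_line(line)


def main():
    for out in tokenize_lines(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing and (presumably) made executable, a script like this could be supplied through the '--tokenizer' flag mentioned above. If no processing is wanted at all, --no-prepare-data is likely the simpler route.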

matt

