--
You received this message because you are subscribed to the Google Groups "Joshua Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to joshua_develop...@googlegroups.com.
To post to this group, send email to joshua_d...@googlegroups.com.
Visit this group at http://groups.google.com/group/joshua_developers.
For more options, visit https://groups.google.com/d/optout.
You received this message because you are subscribed to a topic in the Google Groups "Joshua Developers" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/joshua_developers/FUBFP0hvqlQ/unsubscribe.
Hi Lewis,

That's the monolingual data. You want the parallel data (see the entries under the "Parallel data" box). The monolingual data you pointed to can be used to build a larger language model, but that's a bit more complicated.

I am working on building a Chinese–English model. The parallel data for that is harder to acquire because it's all tied up with DARPA stuff. Is that all right, or do you want to build it yourself?
matt
How large?
--Lewis
As big as possible; the more, the better. I'd say at least 100k sentence pairs, but ideally in the millions. For reference, the Europarl corpora usually have about 2 million sentence pairs. You can see the summary at http://statmt.org/wmt15/translation-task.html for more information.
matt