Some Thrax Notes

Juri Ganitkevitch

Feb 12, 2013, 11:07:40 AM
to Joshua Developers
Hey guys,

I have a couple of questions that are relevant to the changes I'm
(almost done) making now.

(1) The ALL unary rule handler: when we encounter a unary rule/chain
in a parse, we can extract the label for the corresponding span in a
number of ways. One of them is named ALL and, if I'm not mistaken,
concatenates the unary chain into a combined nonterminal like so:
PP:NP:NN. That seems iffy, and it also makes a part of my work hard to
do. Did anyone ever use the ALL handler, and is it any BLEU-good? Can
I drop it from Thrax?
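
For readers who have not seen the handlers, here is a minimal Java sketch of the labeling choices discussed in this thread. It is not Thrax's actual code: the class, enum, and method names are hypothetical, and it assumes the chain is listed from top to bottom, so that PP is the topmost label and NN the bottommost.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names are taken from Thrax.
class UnaryChainLabels {
    enum Handler { TOP, BOTTOM, ALL }

    // chain is assumed to be ordered from the topmost to the bottommost nonterminal.
    static String label(List<String> chain, Handler handler) {
        if (chain.isEmpty()) {
            throw new IllegalArgumentException("empty unary chain");
        }
        switch (handler) {
            case TOP:
                return chain.get(0);
            case BOTTOM:
                return chain.get(chain.size() - 1);
            case ALL:
                // Concatenate the whole chain into one combined nonterminal.
                return String.join(":", chain);
            default:
                throw new AssertionError("unknown handler: " + handler);
        }
    }

    public static void main(String[] args) {
        List<String> chain = Arrays.asList("PP", "NP", "NN");
        for (Handler h : Handler.values()) {
            System.out.println(h + " -> " + label(chain, h));
        }
    }
}
```

Under these assumptions, ALL creates a distinct nonterminal symbol for every distinct chain, whereas TOP and BOTTOM only ever reuse labels that already occur in the parse.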

(2) Numbers: doubles versus floats. Joshua decodes in floats, Thrax
extracts in (mostly) doubles. Do we actually care to do that? We might
be looking at quite a bit of added space savings if we consistently
change to floats. Unless any of you have experience, I propose that
once I get Thrax operational again we do a contrastive run of
doubles-versus-floats to gauge the impact/loss to be expected in
practice.
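
As a rough guide to what is at stake, the sketch below compares the raw size of the two types and measures the relative error from narrowing a double to a float. It is only an illustration: the class and method names are hypothetical, the sample values are made up, and it covers the binary representation only, not the size of a text grammar, which depends on how many digits are printed.

```java
// Hypothetical illustration; not part of Thrax or Joshua.
class FloatVsDouble {
    // Bytes saved by storing numValues feature values as float instead of double.
    static long bytesSaved(long numValues) {
        return numValues * (Double.BYTES - Float.BYTES);
    }

    // Relative error introduced by narrowing a nonzero double to a float.
    static double relativeError(double value) {
        float narrowed = (float) value;
        return Math.abs((double) narrowed - value) / Math.abs(value);
    }

    public static void main(String[] args) {
        System.out.println("bytes saved per million values: " + bytesSaved(1_000_000L));
        // Made-up probabilities, turned into negative-log feature values.
        double[] probabilities = {0.5, 1e-3, 1e-7};
        for (double p : probabilities) {
            double feature = -Math.log(p);
            System.out.println(feature + " relative error: " + relativeError(feature));
        }
    }
}
```

For values in float's normal range the relative error from rounding is at most 2^-24 (about 6e-8), which is why only a contrastive run can show whether it matters for translation quality.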

Thoughts?

-- Juri

Matt Post

Feb 12, 2013, 11:10:17 AM
to joshua_d...@googlegroups.com
> (1) The ALL unary rule handler: when we encounter a unary rule/chain
> in a parse, we can extract the label for the corresponding span in a
> number of ways. One of them is named ALL and, if I'm not mistaken,
> concatenates the unary chain into a combined nonterminal like so:
> PP:NP:NN. That seems iffy, and it also makes a part of my work hard to
> do. Did anyone ever use the ALL handler, and is it any BLEU-good? Can
> I drop it from Thrax?

I've never used this and it seems that standard practice is to collapse unary chains, since they're kind of bogus anyway. Can you retain the option that lets you choose the label from either the top or the bottom?

> (2) Numbers: doubles versus floats. Joshua decodes in floats, Thrax
> extracts in (mostly) doubles. Do we actually care to do that? We might
> be looking at quite a bit of added space savings if we consistently
> change to floats. Unless any of you have experience, I propose that
> once I get Thrax operational again we do a contrastive run of
> doubles-versus-floats to gauge the impact/loss to be expected in
> practice.

I'm happy to stick to floats. I think we even switched a while back to outputting "%.5f" (instead of the 10 or 20 digits you get when printing a double).
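
To illustrate the difference in printed length, here is a small Java sketch that formats a value with "%.5f" and compares it with the default double printing. The class and method names are hypothetical and the sample value is made up; Locale.ROOT is used so the decimal separator does not depend on the machine's locale.

```java
import java.util.Locale;

// Hypothetical illustration; not the actual Thrax or Joshua output code.
class FeatureFormat {
    // Format a feature value with five digits after the decimal point.
    static String format(double value) {
        return String.format(Locale.ROOT, "%.5f", value);
    }

    public static void main(String[] args) {
        double value = 0.123456789;
        System.out.println("default: " + Double.toString(value));
        System.out.println("%.5f:    " + format(value));
    }
}
```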



> Thoughts?
>
> -- Juri

Chris Callison-Burch

Feb 12, 2013, 11:32:49 AM
to joshua_d...@googlegroups.com
I agree that it's fine to remove the non-terminal chains, and that floats are fine. --C

Jonathan Weese

Feb 12, 2013, 11:35:26 AM
to joshua_d...@googlegroups.com
Agreed with everyone else. Drop the "all" (I've never used it; I only included it because the SAMT paper said they had it) and I assume going to floats won't hurt us much.