Alignment appears random

Macho Philipovich

Aug 8, 2019, 10:41:40 PM
to pialign-users
Hi there,

I installed pialign, and the first couple of alignments I tested appear to be no better than random.

I chose an arbitrary sentence from a Canadian statute, namely:
  • First Nation means a band, or an Indigenous group that is party to a self-government agreement implemented by an Act of Parliament.
  • première nation S’entend soit d’une bande, soit d’un groupe autochtone qui est partie à un accord sur l’autonomie gouvernementale mis en oeuvre par une loi fédérale.
The answer pialign gave me is the following:
  • < [ ((( première ||| by ))) [ [ ((( nation ||| an ))) [ ((( S’entend ||| Act ))) { < ((( soit ||| of ))) ((( d’une |||  ))) > } ] ] ((( bande, ||| Parliament. ))) ] ] [ [ [ ((( soit ||| First ))) [ ((( d’un ||| Nation ))) [ [ ((( groupe ||| means ))) < ((( autochtone ||| a ))) < [ [ ((( qui ||| band, ))) [ < ((( est |||  ))) < ((( partie ||| an ))) ((( à ||| or ))) > > < < [ ((( un ||| group ))) { < ((( accord ||| that ))) ((( sur |||  ))) > } ] ((( l’autonomie |||  ))) > ((( gouvernementale ||| Indigenous ))) > ] ] [ < ((( mis ||| party ))) ((( en ||| is ))) > ((( oeuvre ||| to ))) ] ] ((( par ||| a ))) > > ] ((( une ||| self-government ))) ] ] ] ((( loi ||| agreement ))) ] ((( fédérale. ||| implemented ))) ]
I tried a second test sentence with pialign, in case this was some kind of outlier, but the result was similar. I don't expect perfection, but this output appeared random.

In disappointment, I turned to mgiza++, but its out-of-the-box response to the same input was even more incomprehensible, namely:
  • # Sentence pair (1) source length 24 target length 36 alignment score : 4.33196e-97
    première nation s ’ entend soit d ’ une bande , soit d ’ un groupe autochtone qui est partie à un accord sur l ’ autonomie gouvernementale mis en oeuvre par une loi fédérale .
  • NULL ({ 2 3 5 10 }) first ({ 29 30 31 32 36 }) nation ({ }) means ({ }) a ({ 33 34 35 }) band ({ }) , ({ 4 }) or ({ 6 }) an ({ }) indigenous ({ }) group ({ }) that ({ }) is ({ 11 12 13 15 16 17 18 19 22 }) party ({ 20 21 23 24 25 27 28 }) to ({ }) a ({ 9 }) self-government ({ }) agreement ({ }) implemented ({ }) by ({ }) an ({ 1 }) act ({ }) of ({ 7 }) parliament ({ }) . ({ 8 14 26 })
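For anyone reading the mgiza++ output above: as far as I understand the GIZA++-style format, the second line is the tokenized French sentence, and each English word on the third line is followed by the 1-based positions of the French tokens linked to it. Here is a minimal Python sketch of decoding that, assuming this reading of the format is right (the function name `parse_alignment` and the toy sentence are made up for illustration):

```python
import re

# One "word ({ i j k })" group; the index list may be empty.
PAIR = re.compile(r"(\S+) \(\{([\d ]*)\}\)")

def parse_alignment(token_line, alignment_line):
    """Return (aligned_word, token) links from the last two lines of one record.

    Assumption: indices are 1-based positions into token_line.
    """
    tokens = token_line.split()
    links = []
    for word, positions in PAIR.findall(alignment_line):
        for pos in positions.split():
            links.append((word, tokens[int(pos) - 1]))
    return links

# Toy record, not real aligner output.
toy_links = parse_alignment(
    "la maison .",
    "NULL ({ }) the ({ 1 }) house ({ 2 }) . ({ 3 })",
)
print(toy_links)
```

Read this way, `first` in the output above is linked to positions 29 to 32 and 36, which are `mis`, `en`, `oeuvre`, `par` and the final `.`.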
Is there anything I can do to get helpful results? I looked at other word alignment tools, namely eflomal, the Berkeley aligner, and tdx-nlp, but I wasn't able to get them to either compile or run. The Berkeley aligner, in particular, required a tree-format version of the input, and its docs say a script to generate this would be released later, so that stumped me.

Any help in getting reasonable word alignment working would be greatly appreciated.

Many thanks,
Macho

Leather Dog Muksihs

Sep 25, 2019, 10:06:33 PM
to pialign-users
How large is your bilingual corpus? [line count]

Macho Philipovich

Sep 25, 2019, 11:19:21 PM
to pialig...@googlegroups.com, Leather Dog Muksihs

Thanks for the response.

I'm really sorry to have caused the trouble. At the time I wrote this, I fundamentally misunderstood what sentence aligners do, and my entire corpus was the text I included in the email. I have been using a bigger corpus since then and getting more reasonable results.
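
To illustrate why a one-sentence corpus cannot work: statistical aligners learn from how often words co-occur across many sentence pairs. The sketch below is a toy IBM Model 1 trainer in Python. It is a simplification for illustration, not the model that pialign or mgiza++ actually runs, and the function name `train_model1` and the toy sentences are made up.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Toy IBM Model 1 EM. pairs: list of (french_tokens, english_tokens).

    Returns t[(f, e)], an estimate of how likely English word e
    is to translate to French word f.
    """
    french_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(french_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of credit for f among the English words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# One sentence pair: nothing distinguishes "the" from "house".
single = train_model1([(["la", "maison"], ["the", "house"])])

# Three pairs: co-occurrence now separates the candidates.
small = train_model1([
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["une", "maison"], ["a", "house"]),
])

print(single[("maison", "the")], single[("maison", "house")])
print(small[("maison", "the")], small[("maison", "house")])
```

With a single pair the probabilities never move away from uniform, so any alignment is as good as any other. With even a few overlapping pairs, `maison` starts to prefer `house` over `the`.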

Best,

Macho
