Dec 1, 2015, 2:50:58 PM
to pialign-users

I'm trying to use pialign to align data from https://wit3.fbk.eu/
I have most of the sentence alignment problems solved, so now I just want to get some word alignments.

I have two files, ted_en and ted_es, of size ~10K, that I'd like to run pialign on.
It runs for a while, then I get:

WARNING: parsing failed! loosening beam to prob=2664.09

addToChart(Span(28,29,28,28), Bus error: 10

Actually, I seem to get these "parsing failed!" warnings a lot, and I have no idea what's causing them.

Here are some example sentences:

i first became fascinated with octopus at an early age . 

i grew up in mobile , alabama -- somebody's got to be from mobile , right ? -- 

and mobile sits at the confluence of five rivers, forming this beautiful delta . 

and the delta has alligators crawling in and out of rivers filled with fish and cypress trees dripping with snakes, birds of every flavor . 

it's an absolute magical wonderland to live in -- if you're a kid interested in animals , to grow up in . 

and this delta water flows to mobile bay , and finally into the gulf of mexico . 

and i remember my first real contact with octopus was probably at age five or six . 

i was in the gulf , and i was swimming around and saw a little octopus on the bottom . 

and i reached down and picked him up , and immediately became fascinated and impressed by its speed and its strength and agility . 

it was prying my fingers apart and moving to the back of my hand . 

los pulpos me fascinaron desde una edad muy temprana .

crecí en mobile , alabama. alguien tenía que ser de mobile , ¿no ?

y mobile está emplazado en la confluencia de 5 ríos que forman este hermoso delta .

y el delta tiene caimanes que entran y salen de ríos llenos de peces y cipreses repletos de serpientes y aves de todo tipo .

es un mundo absolutamente mágico para vivir y crecer allí , si uno es niño y le interesan los animales .

el agua del delta fluye hacia la bahía mobile y finalmente al golfo de méxico .

recuerdo mi primer contacto real con pulpos ... fue probablemente a los 5 ó 6 años .

yo estaba nadando en el golfo y vi un pequeño pulpo en el fondo .

me agaché y lo recogí , e inmediatamente quedé fascinado e impresionado por su velocidad , su fuerza y agilidad .

curioseaba mis dedos y se movía hacia la palma de mi mano .

I don't see any issues with them, so it's hard to know what the error is.

I ran pialign with: tools/pialign/pialign -threads 8 -batchlen 32 data/raw/ted/ted_en data/raw/ted/ted_es data/fastalign/
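
For what it's worth, one way to rule out basic formatting problems in the two files is a quick check like the one below. This is only a minimal sketch: the function name `check_parallel` and the `max_len=80` token threshold are made up for illustration, and I don't know whether any of these conditions (mismatched line counts, empty lines, very long sentences) is what actually triggers the warning or the bus error.

```python
import sys
from itertools import zip_longest


def check_parallel(src_lines, trg_lines, max_len=80):
    """Return (line_number, problem) tuples for a pair of parallel corpora.

    max_len is a hypothetical token threshold, not a pialign setting.
    """
    problems = []
    pairs = zip_longest(src_lines, trg_lines)
    for line_no, (src, trg) in enumerate(pairs, start=1):
        # One side ran out of lines before the other.
        if src is None or trg is None:
            problems.append((line_no, "line count mismatch"))
            break
        src_tokens, trg_tokens = src.split(), trg.split()
        if not src_tokens or not trg_tokens:
            problems.append((line_no, "empty line"))
        elif max(len(src_tokens), len(trg_tokens)) > max_len:
            problems.append((line_no, "long sentence"))
    return problems


# Only runs when exactly two file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f_src, \
         open(sys.argv[2], encoding="utf-8") as f_trg:
        for line_no, problem in check_parallel(f_src, f_trg):
            print(f"{line_no}\t{problem}")
```

Saved under a hypothetical name such as check_parallel.py, it would be run with the two corpus paths as arguments, e.g. data/raw/ted/ted_en and data/raw/ted/ted_es; no output means none of the three conditions was found.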



Graham Neubig

Dec 1, 2015, 3:12:43 PM
to pialig...@googlegroups.com
Hi Vishesh,

Thanks for the report!
Would it be possible for you to provide the ted_en and ted_es files? It will be a lot easier to debug if you can provide them.


Dec 1, 2015, 3:24:56 PM
to pialign-users


Dec 1, 2015, 3:28:04 PM
to pialign-users
Also, I tried running this without the multithreaded options on, and it did produce 1.samp and 1.pt files with no warnings. The output looked very different (EM algorithm style, versus a whole bunch of addSpan writeouts).
However, when I run itgstats align < data/1.samp I get a whole bunch of blank lines, and I'm not sure what's up with that.

Graham Neubig

Dec 1, 2015, 4:03:26 PM
to pialig...@googlegroups.com
Hi Vishesh,

I tried running it on the data, and I occasionally get the "parse failed" warning, but wasn't able to reproduce the bus error. Does this happen to you every time, or only occasionally?

WRT the "parse failed" warning, parsing does have the potential to fail occasionally, but this is not the end of the world. I commented out one line that gave over-aggressive debug output when this happened, and it seems to be continuing fine after that. Could you try checking out the latest version and seeing if it works?

