Word-Break alignment issues


Elana Feldman

Jun 6, 2014, 10:26:32 AM
to autobi...@googlegroups.com
I'm trying to run AuToBI on a file that has both words and breaks annotated, but it fails with an error that says 'unequal number of breaks and words.' I know the counts really are unequal, but is there a way to override this requirement? From all the ToBI tutorials, it seems that there should be an unequal number of boundaries on each tier.

Additionally, when I run the program with only the word tier, every word block on the pitch accent hypothesis tier comes out as either L* or deaccented. Am I doing something wrong, or is this valid output from the program?

Thanks,
Elana Feldman

Andrew Rosenberg

Jun 6, 2014, 3:34:27 PM
to autobi...@googlegroups.com
Hi Elana,

Under the ToBI standard, each non-silent word should have a break index associated with the degree of disjuncture at the end of the word. The problem may be that the way silent words are marked in the file you're working with is not recognized by AuToBI. Words that match the regular expression (#|>brth|brth|}sil|endsil|sil|_|_\*_|\*_|_\*) will be considered silent. You can change this regular expression using the command-line flag -silence_regex.
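
If it helps to check a file before running AuToBI, here is a rough Python sketch (not AuToBI code) that applies the regular expression above to a list of word labels and compares the number of non-silent words with the number of breaks. The function names and the sample labels are made up for illustration, and it assumes the whole label has to match the expression; AuToBI's actual matching may differ.

```python
import re

# Silence pattern quoted above, copied as-is.
SILENCE_REGEX = r"(#|>brth|brth|}sil|endsil|sil|_|_\*_|\*_|_\*)"


def is_silent(label, pattern=SILENCE_REGEX):
    """Return True if the word label counts as silence.

    Assumption: the entire label must match the pattern
    (full match), not just a substring of it.
    """
    return re.fullmatch(pattern, label.strip()) is not None


def count_words_and_breaks(words, breaks, pattern=SILENCE_REGEX):
    """Return (number of non-silent words, number of breaks)."""
    non_silent = [w for w in words if not is_silent(w, pattern)]
    return len(non_silent), len(breaks)


# Hypothetical tier contents: "<SIL>" is a silence marker that the
# quoted pattern does not cover, so it is counted as a word.
words = ["sil", "the", "cat", "<SIL>", "sat", "#"]
breaks = ["1", "1", "4"]

default_counts = count_words_and_breaks(words, breaks)

# Extending the pattern so "<SIL>" is also treated as silent.
custom_regex = r"(<SIL>|" + SILENCE_REGEX[1:]
custom_counts = count_words_and_breaks(words, breaks, custom_regex)

print(default_counts, custom_counts)
```

If the two counts only agree after extending the pattern, that extended expression is the kind of value to pass with -silence_regex.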

About the results: this is valid output, though most likely very incorrect. The accent/deaccent decisions are typically reliable; the accent type predictions, however, are quite error-prone. (They're still state of the art, but it's a challenging task. I'm still trying to figure out the best way to handle this; some people have reduced the task to a HIGH/DOWNSTEP/LOW distinction.)

-Andrew

