gestures coded without objects, and obtaining parser input


Karin

Sep 23, 2017, 4:12:50 PM
to chibolts
Hello all,

I will list here two questions, but if it's better to split them up between posts, please let me know.

(1) I am wondering how to interpret the following coding of a gesture:


you take <two pieces> [//] &=ges two [/] two &=fingers:two &sli slices of bread .


Does this imply that (a) the speaker made an empty gesture, and then said "two", or (b) the annotator forgot a colon?


(2) Is there a program in CLAN that allows for the cleaning of the speaker tiers, such that they appear in the way that the parser sees them?   If not, is there a CLAN program that I could modify to produce such output?  I am reasonably experienced with programming, and hopefully would know enough C++ to manage this.


Thank you in advance for your time --

-- Karin



Leonid Spektor

Sep 23, 2017, 6:37:08 PM
to chib...@googlegroups.com
Karin,

I can only answer your second question. Two CLAN commands can clean up speaker tiers in CHAT files; which one to use depends on how much cleanup you need. The first is FLO. It removes all codes and more, leaving as little as plain text lines. The second is CHAT2CONLL. It creates output in a format suited to any one of the following parsers: Depparse, MaltParser, TurboParser, AnCoraCorpus, Connexor, Clearparser.


Leonid.


Davida Fromm

Sep 23, 2017, 8:53:32 PM
to ChiBolts
Karin,

I will answer your first question.  The transcriber did not forget a colon.  The speaker repeated the word “two” in revising “two pieces” to “two slices”.  Whether it was an “empty gesture” may depend on your definition of an empty gesture.  It was probably relatively imprecise if it was flagged simply as &=ges, but the best thing to do is to view the video, either at the Browsable database or by downloading the relevant media file.  The files are set up so that each utterance is linked to the video, so you can search for that utterance, play it, and watch the video.  Please send an email directly to me (fr...@andrew.cmu.edu) if you want more information about the gesture coding or video playback options.
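
To make the reading above concrete, here is a minimal Python sketch (not part of CLAN) that lists the &= codes in the utterance together with any qualifier after a colon. The function name simple_events and the regular expression are illustrative assumptions, and the sketch assumes that a qualifier is whatever follows the first colon inside the same token.

```python
import re

# Hypothetical helper, not a CLAN program: split each "&=" token into
# a label and an optional qualifier (the part after the first colon).
EVENT = re.compile(r"&=([^\s:]+)(?::(\S+))?")

def simple_events(tier):
    """Return (label, qualifier) pairs; qualifier is None when absent."""
    return [(m.group(1), m.group(2)) for m in EVENT.finditer(tier)]

tier = ("you take <two pieces> [//] &=ges two [/] two "
        "&=fingers:two &sli slices of bread .")
print(simple_events(tier))
# [('ges', None), ('fingers', 'two')]
```

Under this tokenization, &=ges stands alone with no qualifier, and the "two" that follows it is an ordinary spoken word marked as repeated by [/].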

-Davida

Karin

Sep 24, 2017, 8:40:02 PM
to chibolts
Thanks for the detailed answers!  I'll look into those CLAN commands, and I agree that the video would help disambiguate the "ges" annotation.  It's helpful to know that it wasn't a mistake.

Karin

Oct 22, 2017, 5:53:04 PM
to chibolts
Quick follow-up question on this discussion from a month ago: I do not see chat2conll in the list of available programs in unix-clan/unix/bin.  Here's a list of all files that start with "chat2":

chat2anvil  chat2ca  chat2elan  chat2praat  chat2xmar

Would any of those programs create text that is identical or similar to what CLAN's parser sees?   If not, is there a way I can install the chat2conll program (or am I looking in the wrong directory)?

Thanks again --

- Karin

Leonid Spektor

Oct 22, 2017, 11:24:51 PM
to chib...@googlegroups.com
Karin,

The "chat2..." commands convert CHAT-formatted data to the formats of the corresponding applications: "Anvil", "Elan", "Praat", "EXMARaLDA" and "CA". CHAT2CONLL was not ported to Unix. It can be ported if the people in charge at our end approve it, but I would suggest that you try that command in either the MacOS or the Windows version of CLAN to see if it creates the output that you really want before we spend time porting it to Unix. As far as I know, no one except us in house has ever used the CHAT2CONLL command.

I have to ask you to clarify what you mean by "text that is identical or similar to what CLAN's parser sees". CLAN has one parser, MOR, and it sees the CHAT-formatted files, so they do not need to be converted to anything else. Perhaps I misunderstood your question and so gave you the wrong suggestion in recommending CHAT2CONLL. Please explain in more detail what you are trying to achieve and what data format is your starting point. It would also help if you could give an example of the output format you want to get.


Leonid.

Karin

Oct 23, 2017, 12:55:08 AM
to chibolts
Thanks for the quick response!   I don't think that porting to Unix should be necessary.  I could use another OS.   I also now see that FLO might be sufficient.   Some example output from FLO might illustrate what I was trying to say.

Here's an example sentence:

*PAR:    and they whole [: all] [* s:r] <mushed up into> [//]
    &=hands:together &=laughs mushed in together .
%mor:    coord|and pro:sub|they post|all v|mush-PAST adv|in adv|together .

Here's the FLO output, using the +d option and +t*PAR:
*PAR:    and they all mushed in together .
%mor:    coord|and pro:sub|they post|all v|mush-PAST adv|in adv|together .


I think when I said "what the parser sees", this was motivated by the fact that the %mor line reflects the dependency parse after a lot of postprocessing (e.g., removal of repetition, replacement with annotator's suggested word, etc).  If I were to use an external parsing program, I would want it to see the results of that postprocessing, rather than the original text with all its annotations.  And it looks like FLO might be getting something close to that.
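
For anyone who wants to approximate this kind of cleanup outside CLAN, here is a rough Python sketch. It is not FLO and does not reproduce FLO's actual rules; the function name clean_main_tier and the regular expressions are assumptions that cover only the codes appearing in the two examples in this thread (replacements like [: all], retracings and repetitions marked [/], [//] or [///], other bracketed codes, and &-prefixed tokens). CHAT has many more codes than these.

```python
import re

def clean_main_tier(text):
    """Approximate cleanup of a CHAT main-tier utterance (illustrative only)."""
    # 1. Replacement: "whole [: all]" -> "all" (single-word targets only).
    text = re.sub(r"\S+\s+\[:\s+([^\]]+)\]", r"\1", text)
    # 2. Drop retraced or repeated material: first "<...> [//]" groups,
    #    then a single word followed by [/], [//] or [///].
    text = re.sub(r"<[^>]*>\s*\[/{1,3}\]", " ", text)
    text = re.sub(r"\S+\s*\[/{1,3}\]", " ", text)
    # 3. Drop any remaining bracketed code, e.g. "[* s:r]".
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # 4. Drop &-prefixed tokens such as "&=laughs" or "&sli".
    text = re.sub(r"&\S+", " ", text)
    # 5. Collapse whitespace, including continuation-line indentation.
    return " ".join(text.split())

utterance = """and they whole [: all] [* s:r] <mushed up into> [//]
    &=hands:together &=laughs mushed in together ."""
print(clean_main_tier(utterance))
# and they all mushed in together .
```

The order of the steps matters: replacements are resolved before the generic bracket removal, and angle-bracket groups are removed before single retraced words so that "pieces>" is never treated as a word on its own. On the gesture example from the start of this thread the same function yields "you take two slices of bread ."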

Thanks again for your help --
-Karin