CLAN: Text Extraction


Snigdha Khanna

Feb 6, 2024, 3:08:36 PM
to chibolts
Hello!

I am trying to extract "clean" text from my annotated transcripts. Is there a way to use CLAN to export them as plain-text (.txt) files, or a simpler method to remove the annotations, so that I can parse the text with NLP tools?

Any help is appreciated!

Thanks,
Snigdha

Brian MacWhinney

Feb 6, 2024, 4:10:32 PM
to ChiBolts
CLAN’s FLO program does most of this. Alternatively, you could grab all the <w> tags from the XML version of the database.

What kind of NLP do you want to use? You could apply Universal Dependencies directly.

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology,
Language Technologies and Modern Languages, CMU
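
The suggestion to grab the <w> tags from the XML version could be sketched roughly as follows. This is a minimal illustration, not an official TalkBank tool: it assumes each utterance is a <u> element with a "who" attribute and that each word is a <w> element, and it matches tag names without relying on a specific namespace URI. The sample document, the function names, and the choice to keep only the leading text of each <w> are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Reduce a namespaced tag such as '{http://...}w' to 'w'."""
    if not isinstance(tag, str):  # comments / processing instructions
        return ""
    return tag.rsplit("}", 1)[-1]


def utterance_words(xml_text):
    """Return a list of (speaker, [words]) pairs, one per <u> element."""
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter():
        if local_name(elem.tag) != "u":
            continue
        words = []
        for child in elem.iter():
            if local_name(child.tag) == "w":
                # Only the leading text is kept, so text inside nested
                # child elements of <w> is deliberately ignored here.
                text = (child.text or "").strip()
                if text:
                    words.append(text)
        result.append((elem.get("who"), words))
    return result


# Hypothetical, heavily simplified sample; real corpus files carry more markup.
SAMPLE = """<CHAT xmlns="http://www.talkbank.org/ns/talkbank">
  <u who="CHI" uID="u0"><w>more</w><w>cookie</w><t type="p"/></u>
  <u who="MOT" uID="u1"><w>you</w><w>want</w><w>more</w><t type="q"/></u>
</CHAT>"""

for speaker, words in utterance_words(SAMPLE):
    print(speaker, " ".join(words))
```

Because only <w> elements are collected, anything encoded in other elements is skipped. Whether that matches the desired notion of "clean" text depends on the corpus, so the output is worth spot-checking against the original transcripts.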

Snigdha Khanna

Feb 6, 2024, 4:14:57 PM
to chibolts
I want to remove all annotations, such as gestures and errors, so I need a plain-text (.txt) version containing only the transcribed speech.

Any idea how to do that?

Leonid Spektor

Feb 6, 2024, 4:39:03 PM
to chib...@googlegroups.com
The command flo +ca +t* *.cha should work.


Leonid.

Giulia Sanguedolce

Feb 6, 2024, 5:31:16 PM
to chib...@googlegroups.com
Hello Snigdha :) I used some Python code to extract the text lines from the .cha files. Let me know if this would help and I'll send you the code!

Regards, 
Giulia
________________
Giulia Sanguedolce 
PhD student - AI & Machine Learning for Healthcare https://ai4health.io 
Department of Electrical & Electronic Engineering | Department of Brain Science
Imperial College London
e-mail: gs2...@ic.ac.uk | sangue...@gmail.com
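
For readers who want to try the Python route on .cha files, here is one possible sketch. It is not the code mentioned in the message above, which was not posted to the thread. It assumes main tiers start with "*", dependent tiers with "%", headers with "@", and continuation lines with a tab, and it strips a handful of common CHAT annotations with regular expressions. The function names and the sample transcript are hypothetical, and the patterns cover only a simplified subset of CHAT.

```python
import re


def main_tier_lines(chat_text):
    """Yield (speaker, utterance) per main tier, joining continuation lines."""
    speaker, parts = None, []
    for line in chat_text.splitlines():
        if line.startswith("*"):
            if speaker is not None:
                yield speaker, " ".join(parts)
            head, _, rest = line.partition(":")
            speaker, parts = head[1:], [rest.strip()]
        elif line.startswith("\t") and speaker is not None:
            parts.append(line.strip())  # continuation of the main tier
        else:
            # A header (@) or dependent tier (%) closes the current utterance.
            if speaker is not None:
                yield speaker, " ".join(parts)
            speaker, parts = None, []
    if speaker is not None:
        yield speaker, " ".join(parts)


def clean_utterance(utt):
    """Strip a simplified subset of CHAT annotations from one utterance."""
    utt = re.sub(r"\x15.*?\x15", " ", utt)           # time-alignment bullets
    utt = re.sub(r"\[[^\]]*\]", " ", utt)            # bracketed codes, e.g. [* m], [//]
    utt = re.sub(r"&\S+", " ", utt)                  # &=gestures, &-fillers, fragments
    utt = re.sub(r"\([\d.:]+\)", " ", utt)           # pauses such as (.) or (1.5)
    utt = re.sub(r"[<>]", " ", utt)                  # scope markers
    utt = re.sub(r"\b(?:xxx|yyy|www)\b", " ", utt)   # unintelligible material
    utt = utt.replace("(", "").replace(")", "")      # (be)cause -> because
    return re.sub(r"\s+", " ", utt).strip()


def extract_clean_text(chat_text):
    """Return [(speaker, cleaned utterance), ...] for a whole transcript."""
    return [(spk, clean_utterance(utt)) for spk, utt in main_tier_lines(chat_text)]


# Hypothetical sample transcript, for illustration only.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tCHI Child, MOT Mother\n"
    "*CHI:\tmore cookie &=points [+ gram] . \x150_1200\x15\n"
    "%mor:\tqn|more n|cookie .\n"
    "*MOT:\tyou want (.) <more cook> [//] more cookies ?\n"
    "*CHI:\t(be)cause I goed [: went] [* m] there .\n"
    "@End\n"
)

for speaker, text in extract_clean_text(SAMPLE):
    print(f"{speaker}: {text}")
```

Two design choices are worth noting. Retraced material is kept (the second utterance comes out as "you want more cook more cookies ?"), and the speaker's original form is kept rather than the [: ...] replacement ("goed" rather than "went"). Either behaviour can be changed by handling those codes before the generic bracket pattern runs.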
