Replacement function in a series of transcripts


sophie....@gmail.com

Oct 27, 2023, 4:52:21 AM
to chibolts
Dear everyone,

I'm using CLAN's MOR function, and I need to correct some annotations in a large number of transcripts (usually by separating two words like "j'ai" into "j' ai" so that both words get annotated). Is there a function in CLAN that can do this across all the transcripts at once?

Sorry if this question is naive, but I haven't found an answer in the documentation.

I'd also like to take this opportunity to point out that the MOR function no longer works with the French grammar in the latest versions of CLAN; I have to use an earlier version to make it work. Is there a reason for this? Please know that I'm willing to help with any project aimed at extending CLAN's functions to French (especially the Universal Dependencies (UD) framework).

Thank you in advance for your help,

Best regards,

Sophie Fagniart
University of Mons, Belgium

Leonid Spektor

Oct 27, 2023, 6:50:20 AM
to chib...@googlegroups.com
Sophie,

1. You can use the command chstring +s"j'ai" "j' ai" *.cha to separate the words with a space character.

2. MOR for French still works. CLAN can download and install the French (fra) grammar, but many words seem to be missing from the French lexicon. Someone else will have to answer the question about the French MOR grammar.

Leonid.
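Leonid's chstring command is the CLAN route for this. For readers who want to see what such a batch edit involves, or who need to do it outside CLAN, here is a minimal, hypothetical Python sketch. It is not part of CLAN, and the script name split_tokens.py and the function names are illustrative only. It assumes UTF-8 .cha files in a single directory and, as a design choice, rewrites only main tiers (lines starting with *, plus their tab-indented continuation lines), leaving @ headers and % dependent tiers alone. It matches whole words, so "j'ai" is split but "j'aime" is not. Matching is case-sensitive, so "J'ai" needs a separate run. Whether chstring applies the same restrictions is not covered here.

```python
import re
import sys
from pathlib import Path


def split_on_main_tiers(text: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new` on CHAT main tiers.

    Main tiers are lines starting with '*' plus their tab-indented
    continuation lines; @headers and %dependent tiers are left untouched.
    """
    # Whole-word match: "j'ai" is rewritten, but "j'aime" is not.
    pattern = re.compile(r"(?<!\w)" + re.escape(old) + r"(?!\w)")
    out = []
    in_main_tier = False
    for line in text.splitlines(keepends=True):
        if line.startswith("*"):
            in_main_tier = True
        elif not line.startswith("\t"):
            # Any other non-continuation line (@header, %tier, blank) ends the main tier.
            in_main_tier = False
        # A lambda keeps backslashes in `new` from being read as group references.
        out.append(pattern.sub(lambda m: new, line) if in_main_tier else line)
    return "".join(out)


def split_in_directory(directory: str, old: str, new: str) -> None:
    """Apply split_on_main_tiers to every *.cha file in `directory`, in place."""
    for path in sorted(Path(directory).glob("*.cha")):
        # newline="" preserves the file's original line endings.
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        updated = split_on_main_tiers(original, old, new)
        if updated != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(updated)
            print(f"updated {path}")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        split_in_directory(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: python split_tokens.py DIR OLD NEW", file=sys.stderr)
```

A hypothetical run would be python split_tokens.py corpus "j'ai" "j' ai". Because the files are modified in place, try it on a copy of the transcripts first. Swapping the last two arguments rejoins "j' ai" into "j'ai", which matters because, as Brian notes in his reply, the UD pipeline expects j'ai as a single unit.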


Brian MacWhinney

Oct 27, 2023, 12:32:31 PM
to ChiBolts, info-c...@groups.google.com
Dear Sophie,
As I noted in a posting to chibolts and info-childes about three months ago, I have been implementing UD (Universal Dependencies) for about half of the languages in CHILDES, including all of the Romance and Germanic languages, along with Turkish and, soon, the East Asian languages. Note that many of these languages did not have MOR grammars and many of their corpora were not tagged, but most are tagged now. All of the French data are tagged using UD.
During this process, it was necessary to move as much as possible to the standard orthography of each language, as determined by the computational linguists who created the UD training sets. For French, for example, this means keeping j’ai as a single unit.
If you wish to run UD on new French data, please first make sure that the files pass check. After that, you can either send the data to me for addition to CHILDES, or use the morphotag command in Batchalign, which you can download from https://github.com/talkbank.

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology,
Language Technologies and Modern Languages, CMU

Fagniart Sophie

Oct 31, 2023, 8:33:34 AM
to chib...@googlegroups.com, info-c...@groups.google.com
Mr Spektor, thank you for the command; it works very well and will save me a lot of time. As for MOR, I had made a few mistakes when updating my MOR files after a recent update, but everything is back to normal now. Thank you for the information.

Mr MacWhinney, thank you for all this valuable information about this very interesting system. I'll familiarize myself with how it works and get back to you.

Kind regards,

Sophie
