Freqmerg and neologisms

Birsu Kandemirci

Dec 10, 2021, 6:52:17

to chib...@googlegroups.com

Hi everyone,

I have two questions about freqmerg and neologism functions on CLAN.

1) When using freqmerg, is there a way to create an Excel spreadsheet? I've been trying to add +d2 to the freqmerg command, but the output suggests the option is invalid. I will end up with a big document (I am hoping to merge around 100 stories to look for commonly used words), and having tried with smaller freqmerg outputs, I have found it is not straightforward to copy and paste the output into Excel and work on it manually.

2) We are using @n (neologisms) as a way to highlight grammatical mistakes and/or typos. We were hoping it would be possible to count the number of words tagged with @n using a specific command, but so far we haven't been able to find one. Is counting them manually the best way?

Please let me know if you need further clarification re my questions. Thank you!

Best wishes,

Birsu

Leonid Spektor

Dec 10, 2021, 10:42:41

to ChiBolts

Hi Birsu,

FREQMERG is a short, limited program; the FREQ command can now do the same thing with just a few more options. FREQMERG was created a long time ago, when FREQ was much more limited in its functionality. You are now better off using the FREQ command by itself.

1) As I understand what you are doing, you can use the command "freq +o3 +u +d2 *.cha" to get your result. If you are interested in the words of just one speaker, for example *CHI, then use the command "freq +t*chi +u +d2 *.cha".

2) To look for @n (neologism) words, add the option "+s*@n" to either of the above commands.
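For a quick cross-check outside CLAN, here is a minimal Python sketch that counts @n-tagged words on the main tiers of a transcript and writes the counts as CSV, which Excel can open. It is not part of CLAN: the function names and the sample transcript are hypothetical, and it assumes that @n always ends a whitespace-separated token and that continuation lines start with a tab.

```python
import csv
import io
from collections import Counter


def count_neologisms(chat_text, speaker=None):
    """Count word@n tokens on the main tiers of one CHAT transcript.

    speaker: optional code such as "CHI"; None counts every speaker.
    """
    counts = Counter()
    active = False  # True while we are inside a main tier we want to count
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # Main tier, e.g. "*CHI:\tutterance ."
            code, _, body = line.partition(":")
            active = speaker is None or code[1:].upper() == speaker.upper()
        elif line.startswith("\t"):
            # Continuation line: belongs to whatever tier came before it.
            body = line
        else:
            # "@" header or "%" dependent tier: stop counting.
            active = False
            body = ""
        if active:
            for token in body.split():
                if token.endswith("@n"):
                    counts[token[:-2].lower()] += 1
    return counts


def to_csv(counts):
    """Return the counts as CSV text, most frequent word first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "count"])
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, n])
    return buf.getvalue()


# Hypothetical sample transcript, for illustration only.
sample = (
    "@Begin\n"
    "*CHI:\tI goed@n to the wug@n .\n"
    "%mor:\tpro|I\n"
    "*MOT:\tyou went ?\n"
    "*CHI:\tmore wug@n !\n"
    "@End\n"
)
print(to_csv(count_neologisms(sample, speaker="CHI")))
```

To cover a whole folder, the same function could be called on the text of each .cha file and the resulting counters added together before writing the CSV.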

If I misunderstood your goal, then please provide more information about what you are trying to achieve.

Leonid.

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CAK11xC8ivrKyxrL6QEgvEn3SO6OQG%2BBAbMzN22yHgBuuq%3DwXZw%40mail.gmail.com.

Xiaowei Zhao

Dec 11, 2021, 12:11:34

to chib...@googlegroups.com

Hi Birsu,

We just did an error analysis on a corpus, and here is the command we used to list all the errors marked as neologisms. I hope it helps.

freq +s"[\* n*]" +d6 +u *.cha
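As a rough illustration of what that search pattern targets, the Python sketch below tallies bracketed error codes that begin with "n" on main tiers. It only approximates the idea and does not reproduce FREQ's matching rules; the function name and the sample codes are hypothetical.

```python
import re
from collections import Counter

# Matches bracketed error codes such as "[* n:uk]" and captures the code text.
N_ERROR_CODE = re.compile(r"\[\*\s*(n[^\]]*)\]")


def count_n_error_codes(chat_text):
    """Tally error codes starting with "n" on main tiers of a CHAT transcript."""
    counts = Counter()
    on_main_tier = False
    for line in chat_text.splitlines():
        if line.startswith("*"):
            on_main_tier = True
        elif not line.startswith("\t"):
            # "@" header or "%" dependent tier ends the main tier.
            on_main_tier = False
        # Lines starting with a tab continue the previous tier unchanged.
        if on_main_tier:
            counts.update(code.strip() for code in N_ERROR_CODE.findall(line))
    return counts


# Hypothetical sample lines, for illustration only.
sample = (
    "*PAR:\tthe tiger [* n:uk] was big .\n"
    "*PAR:\tshe goed [: went] [* m:=ed] home .\n"
    "%com:\tnot counted [* n:uk]\n"
    "*PAR:\ta flim [* n:uk] and a drog [* n:k] .\n"
)
print(count_n_error_codes(sample))
```

Words tagged only with @n carry no bracketed code, so this sketch would not count them.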

Best wishes,

Xiaowei Zhao

Xiaowei Zhao, Ph.D.

Associate Professor of Psychology

Emmanuel College

400 The Fenway | Boston | MA 02115

Brielle Stark

Dec 11, 2021, 12:21:46

to ChiBolts

To clarify: for that command to work, the neologisms need to have been marked as [: n*] rather than @n. The typical error coding is the [: ] notation. It may be easiest to find and replace @n with the [: ] notation, to standardize on the usual error marking.

Brie

Brielle C. Stark, PhD
Assistant Professor
Department of Speech, Language and Hearing Sciences
Program in Neuroscience, Cognitive Science Program
Indiana University Bloomington

sent from mobile, please excuse errors

Brielle Stark

Dec 11, 2021, 12:22:40

to ChiBolts

On second glance, I see the backslash in the command, so it may work with @n. But for future reference, errors are usually marked with [: ]. Good luck!

Brie

Brielle C. Stark, PhD
Assistant Professor
Department of Speech, Language and Hearing Sciences
Program in Neuroscience, Cognitive Science Program
Indiana University Bloomington

sent from mobile, please excuse errors

Brian Macwhinney

Dec 11, 2021, 14:39:35

to ChiBolts

Dear Brie,
I agree that it is time to replace use of the @n special form with more specific and diagnostic coding. What people have called neologisms seem to fall into three general classes. First, there are morphological marking errors (omissions, overgeneralizations, undergeneralizations, wrong affix) that are best handled by first using the [: target] replacement form so that MOR can work smoothly. Second, there are forms that are something like non-words. For these, the target is just not clear, and it would be best to mark them using some other special form marker, such as @b, @c, or @wp. Third, there are occasionally forms that seem like true “creative” neologisms, often based on compounding or analogy with other forms. My sense is that the amount of true “creative” neologism in CHILDES or AphasiaBank is rather low. So, converting the @n forms to something more accurate and diagnostic would be a good idea. I don’t think this is a major problem for AphasiaBank, because Davida used the newer system for error coding.

Let me add a bit of history. Back in the 70s, people were quite concerned about not characterizing child forms as “errors”. In part, that was the reason for relying on the @n marker to tag various error forms as neologisms. In addition, the error coding scheme that we developed in the 80s was so cumbersome that it was never actually used. About 9 years ago, Davida and I worked out the newer error coding system, and we could use this scheme to convert the uses of @n into either the error coding format or other special form markers, perhaps leaving a few true neologisms.

— Brian

