MLU in characters?

Janet Bang

Jul 25, 2025, 2:53:22 PM
to chibolts
Hello!

Is there a way to use the MLU program to extract MLU in characters? We are exploring measures to facilitate cross-linguistic comparisons between English and Spanish, and someone had recommended using characters (rather than MLU in words) given the orthographic transparency of Spanish.

We saw some other programs on GitHub, but I was hoping there was something within CLAN because we had already used MOR within CLAN.

Thanks, 
Janet

Leonid Spektor

Jul 25, 2025, 3:12:41 PM
to chib...@googlegroups.com
Janet,

I am sorry to say it, but MLU can only count words or morphemes.

If you plan to use another program, then please keep in mind that MLU uses a lot of rules to decide whether an utterance or word should be counted. You can read those rules in the CLAN manual at https://talkbank.org/0info/manuals/CLAN.pdf; please look for section 7.19, "MLU".


Leonid.


Janet Bang

Jul 25, 2025, 3:15:52 PM
to chib...@googlegroups.com
Got it, thank you!



--
Janet Y. Bang, Ph.D (she/her/hers)
Assistant Professor
Child and Adolescent Development
Lurie College of Education, San José State University

Rasmus Steinkrauss

Jul 29, 2025, 12:26:54 PM
to chibolts
Dear Janet,

To get at the MLU in characters, it might be worth trying to run WDLEN and then calculating the MLU in characters from the values it provides.
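A sketch of that arithmetic, assuming you can read off from WDLEN's output how many word tokens there are of each length in characters, plus the number of utterances counted for the same speaker. The function name and the numbers are hypothetical, not actual WDLEN output. MLU in characters is simply the total number of characters divided by the number of utterances:

```python
def mlu_chars_from_word_lengths(length_counts, n_utterances):
    """MLU in characters from a word-length histogram.

    length_counts: hypothetical dict mapping word length (in characters)
    to the number of word tokens of that length, e.g. copied by hand
    from a WDLEN table. n_utterances: utterances counted for the same speaker.
    """
    if n_utterances <= 0:
        raise ValueError("n_utterances must be positive")
    # Total characters = sum over lengths of (length x number of words with that length)
    total_chars = sum(length * count for length, count in length_counts.items())
    return total_chars / n_utterances


# Made-up values: 10 one-letter words, 25 two-letter words, and so on
print(mlu_chars_from_word_lengths({1: 10, 2: 25, 3: 40, 4: 15}, 30))  # 8.0
```

Equivalently, mean word length in characters multiplied by MLU in words gives the same number, provided both are computed over the same words and utterances.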

Best wishes,
Rasmus

Shanley

Jul 29, 2025, 12:26:57 PM
to chib...@googlegroups.com, Shanley
The poor person's workaround: you could tweak the system by simply making each character into a word, i.e. by putting a space between every character on whatever tier you're using to count MLU. Surely a Python script could easily do this for you.
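A minimal sketch of such a script for CHAT main tiers, assuming a hypothetical helper name and deliberately simple rules: only lines starting with `*` are changed, bracketed codes such as `[/]` or `[: word]` are left alone, tokens that do not start with a letter (terminators, `&` fillers, `0` words) are kept as-is, and `xxx`/`yyy`/`www` are not split up so they are not counted as letters. Apostrophes, `+` compounds and `@` markers inside a word would also become separate "words", so those would need extra handling.

```python
UNTRANSCRIBED = {"xxx", "yyy", "www"}


def space_out_characters(line):
    """Put a space between the characters of each word on a CHAT main tier.

    Hypothetical helper, not part of CLAN. Headers (@...) and dependent
    tiers (%...) are returned unchanged.
    """
    if not line.startswith("*") or ":" not in line:
        return line
    speaker, _, text = line.partition(":")
    out, in_bracket = [], False
    for token in text.split():
        if token.startswith("[") or in_bracket:
            # Keep bracketed codes intact, even when they span several tokens
            out.append(token)
            in_bracket = not token.endswith("]")
        elif token[0].isalpha() and token not in UNTRANSCRIBED:
            out.append(" ".join(token))  # "perro" -> "p e r r o"
        else:
            out.append(token)  # terminators, &-fillers, 0-words, xxx, ...
    return f"{speaker}:\t" + " ".join(out)


result = space_out_characters("*CHI:\tmi [: my] perro .")
# result == "*CHI:\tm i [: my] p e r r o ."
```

This would be run on copies of the files; dependent tiers such as `%mor` would no longer line up with the altered main tier.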

Or a more complicated variant would be to write a Python script to calculate what you want from the existing file.
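A minimal sketch of that second route, reading CHAT lines directly. All names and the inclusion rules here are hypothetical and far simpler than MLU's: it drops bracketed codes, terminators, `&` fillers, `0` words and `xxx`/`yyy`/`www`, but it does not handle retracings, repetitions, or the many other codes described in the MLU section of the CLAN manual. Only letters are counted as characters (an assumption; apostrophes and `+` are ignored).

```python
import re
from pathlib import Path

# Hypothetical, deliberately minimal inclusion rules; CLAN's MLU applies many more.
TERMINATORS = {".", "?", "!"}
UNTRANSCRIBED = {"xxx", "yyy", "www"}


def main_tier_utterances(lines, speaker="CHI"):
    """Yield each main-tier utterance of one speaker, joining tab-indented continuation lines."""
    current = None
    for raw in lines:
        if raw.startswith("\t") and current is not None:
            current += " " + raw.strip()  # continuation of the current main tier
            continue
        if current is not None:
            yield current
            current = None
        if raw.startswith(f"*{speaker}:"):
            current = raw.split(":", 1)[1].strip()
    if current is not None:
        yield current


def countable_words(utterance):
    """Drop bracketed codes, terminators, &-fillers, 0-words, +-codes and xxx/yyy/www."""
    words, in_bracket = [], False
    for token in utterance.split():
        if token.startswith("[") or in_bracket:
            in_bracket = not token.endswith("]")
            continue
        if token in TERMINATORS or token in UNTRANSCRIBED or token.startswith(("&", "0", "+")):
            continue
        words.append(token)
    return words


def char_count(word):
    """Count letters only, after removing @-markers and parenthesized untranscribed material."""
    word = word.split("@", 1)[0]           # e.g. "mama@f" -> "mama"
    word = re.sub(r"\([^)]*\)", "", word)  # e.g. "(be)cause" -> "cause"
    return sum(ch.isalpha() for ch in word)


def mlu_in_characters(lines, speaker="CHI"):
    """Total characters in countable words divided by the number of utterances counted."""
    total_chars = n_utts = 0
    for utt in main_tier_utterances(lines, speaker):
        words = countable_words(utt)
        if not words:
            continue  # assumption: utterances with no countable words are not counted
        total_chars += sum(char_count(w) for w in words)
        n_utts += 1
    return total_chars / n_utts if n_utts else 0.0


def mlu_in_characters_from_file(path, speaker="CHI"):
    """Read a .cha file as UTF-8 and compute MLU in characters for one speaker."""
    return mlu_in_characters(Path(path).read_text(encoding="utf-8").splitlines(), speaker)
```

Keeping the parsing, filtering and counting in separate functions makes it easy to swap in whatever inclusion rules are eventually chosen.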

In both cases, you should of course take Leonid's earlier observation into account: you'd first need to decide which words/utterances should be included.

Best,
Shanley Allen.




********************************************************************************
Prof. Dr. Shanley E. M. Allen
Director, Psycholinguistics and Language Development Group
Center for Cognitive Science
University of Kaiserslautern-Landau
Erwin-Schrödinger-Straße 57/409
67663 Kaiserslautern
Germany

e-mail: al...@rptu.de
phone: +49-631-205-4136
fax: +49-631-205-5182
office: Building 57, Office 409
web: http://www.sowi.uni-kl.de/psycholinguistics/home/
********************************************************************************

Nan Bernstein Ratner

Jul 29, 2025, 1:24:54 PM
to chib...@googlegroups.com, Shanley
Couldn't WDLEN do something in this regard? It counts characters...

Nan Bernstein Ratner, F-, H-ASHA, F-AAAS, Board Certified Specialist in Stuttering, Cluttering, and Fluency Disorders 
she/her/hers
Distinguished University Professor
Hearing and Speech Sciences
University of Maryland
0100 Lefrak Hall, 7251 Preinkert Drive
College Park, MD 20742

Faculty, Language Science (languagescience.umd.edu), Neuroscience & Cognitive Neuroscience (NACS, nacs.umd.edu), Developmental Science Field Committee





Leonid Spektor

Jul 29, 2025, 2:01:27 PM
to chib...@googlegroups.com, Shanley
Hi,

It is easy to add an option to MLU to count characters over utterances. Currently MLU counts words or morphemes over utterances. 

Just to confirm that I understand what you want: I will change the +b option to count either characters or words. When counting characters, MLU will count the characters in each word, and the sum of all characters will be divided by the number of utterances to compute MLU. Is this what you want?

If it is, then I will put a new version of CLAN on the web by the end of today.
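The computation described above can be written out as a short sketch next to MLU in words. The helper names and the utterances are made up, and the utterances are assumed to be already filtered to the words MLU would count. The sketch uses `len()` on NFC-normalized strings so that an accented letter such as "á" counts as one character; how CLAN itself treats apostrophes or other symbols is not stated in this thread.

```python
import unicodedata


def mlu_words(utterances):
    """Mean number of words per utterance."""
    return sum(len(utt) for utt in utterances) / len(utterances)


def mlu_characters(utterances):
    """Sum the character counts of all words, then divide by the number of utterances."""
    total = sum(len(unicodedata.normalize("NFC", w)) for utt in utterances for w in utt)
    return total / len(utterances)


# Made-up utterances, already filtered to the words that should be counted
english = [["I", "want", "more", "water"], ["all", "gone"]]
spanish = [["quiero", "más", "agua"], ["se", "acabó"]]
print(mlu_words(english), mlu_characters(english))  # 3.0 10.5
print(mlu_words(spanish), mlu_characters(spanish))  # 2.5 10.0
```

In these illustrative samples the two languages differ more in words per utterance than in characters per utterance, which is the kind of contrast a character-based measure is meant to expose.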


Leonid.

Janet Bang

Jul 29, 2025, 2:29:25 PM
to chib...@googlegroups.com, Leonid Spektor, Shanley
Hi everyone,

Thank you for your ideas. The thought also crossed our mind to do the manual version of inserting a space! We'll also look into WDLEN.

@Leonid Spektor, yes, I think that would work for our exploratory use case (comparing types, tokens, and MLU for English and Spanish using morphemes, words, and characters). We are still in early stages.

Would the +b option consider the same words and utterances that would be counted with MLUw? Or would this disregard the MLU rules that are built in?

Janet

Leonid Spektor

Jul 29, 2025, 6:33:36 PM
to Janet Bang, chib...@googlegroups.com, Shanley
I have changed MLU to count characters. The new options are -bw for counting words and -bc for counting characters. Without the -b option, MLU counts morphemes.

The new version of CLAN is on the web.


Leonid.

Janet Bang

Jul 29, 2025, 7:10:51 PM
to Leonid Spektor, chib...@googlegroups.com, Shanley
Much appreciated!