Changes to Gleason files

carla hudson Kam

Oct 31, 2013, 3:12:55 PM

to chib...@googlegroups.com

I apologize if the answer to this is somewhere already, but I couldn't find anything relevant when I searched old messages.

I've been using various data files in classes for several years, and a question came up about one of them today that seems curious. It's about the Nanette file (the mother transcript specifically, but it could be true of other parts of Nanette, or of other children in the Gleason files). The version of the file I have (downloaded several years ago) contains a lot of overregularized past tense verbs produced by the Mother. It's a version of the file with a %mor tier but no %gra tier, which should give people an idea of how old it is. One of my students was trying to figure out why this might be the case, went and looked at the file on the website (in this particular class I give them excerpts in an Excel file), and found that the overregularizations weren't in the file. I double-checked that it's not just an issue with the Excel file I created, and it's not: the overregularizations are actually in the .cha file I have.

My question is whether anyone knows when and how these changes were made to the files, and whether similar changes have been made to other files. Were the overregularizations in the original just transcription errors (although odd ones, since breaked and broke are not very confusable)? If not, then the replacement correct forms are errors, right?
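One way to see exactly which words differ between an old local copy and the current version is a word-level diff of the main lines. The following is only a minimal Python sketch: it assumes main lines start with `*` followed by a speaker code and a colon, ignores dependent tiers such as %mor, and the helper names `main_line_tokens` and `word_substitutions` are hypothetical.

```python
import difflib


def main_line_tokens(text):
    """Collect word tokens from CHAT main lines (those starting with '*')."""
    tokens = []
    for line in text.splitlines():
        # Skip headers (@...) and dependent tiers (%...); keep speaker lines only.
        if line.startswith("*") and ":" in line:
            tokens.extend(line.split(":", 1)[1].split())
    return tokens


def word_substitutions(old_text, new_text):
    """Return (old_words, new_words) pairs where the two versions differ."""
    old, new = main_line_tokens(old_text), main_line_tokens(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Passing the contents of the old and new .cha files to `word_substitutions` gives a list of changed word spans, which can then be checked by hand or against the audio.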

Thanks for any and all assistance with this.
Carla Hudson Kam

Carla L. Hudson Kam
Associate Professor
Canada Research Chair in Language Acquisition
Department of Linguistics
University of British Columbia

Brian MacWhinney

Oct 31, 2013, 4:09:03 PM

to ChiBolts, carla hudson Kam

Dear Carla,

The problem here is that, in the good old days, CHAT allowed people to encode grammatical morphemes directly on the main line. This method was originally introduced in SALT. So, the word broke was actually coded on the main line as break-ed. This was obviously a pretty terrible thing, because it was never clear whether something was a main-line morphemicization or a real error. We stopped using this practice and removed these codes from the transcripts in the early 1990s. Once we built the MOR program, we changed CHAT to no longer allow this type of main-line coding.
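To find the candidates worth checking in an older copy, one could scan the main lines for irregular verb stems followed by "ed", with or without a hyphen. This is a rough Python sketch under stated assumptions: the hyphenated shape follows the break-ed example above, the stem list is a small illustrative subset, and `find_regularized_past` is a hypothetical helper. It cannot tell a legacy morpheme code from a genuine overregularization; it only lists the tokens to verify.

```python
import re

# Illustrative subset of irregular verb stems; not an exhaustive list.
IRREGULAR_STEMS = {"break", "go", "fall", "eat", "throw", "catch"}

# A stem, an optional hyphen, then "ed": matches "break-ed" and "breaked".
TOKEN_RE = re.compile(r"\b([a-z]+?)(-?)ed\b")


def find_regularized_past(lines):
    """Return (line_no, speaker, token, hyphenated) for suspicious main-line tokens."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        # Main lines are assumed to begin with "*", a speaker code, and a colon.
        if not line.startswith("*") or ":" not in line:
            continue
        speaker, utterance = line[1:].split(":", 1)
        for match in TOKEN_RE.finditer(utterance.lower()):
            stem, hyphen = match.group(1), match.group(2)
            if stem in IRREGULAR_STEMS:
                hits.append((line_no, speaker, match.group(0), hyphen == "-"))
    return hits
```

Hyphenated hits are the more likely legacy codes; unhyphenated ones would need to be checked against the audio either way.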

Fortunately, for the Gleason data we later received all the audio. So, your student could go back to check these out. I am curious what they learn.

—Brian MacWhinney

--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+u...@googlegroups.com.
To post to this group, send email to chib...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/0ae3618b-c360-471f-b943-f069a6886474%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

William Snyder

Oct 31, 2013, 5:05:31 PM

to chib...@googlegroups.com, carla hudson Kam

Dear Brian,

Is there any type of "log" where changes to the CHILDES transcripts are documented?

If not, do you think it might be possible to begin one, and make it visible to CHILDES users (somewhere on the website)? (I assume this situation arises only rarely, so maybe it wouldn't take too much effort?)

With best wishes,

William


Brian MacWhinney

Oct 31, 2013, 6:25:44 PM

to ChiBolts, William Snyder, carla hudson Kam

Dear Info-CHILDES,

I realize this is an important issue. I think it would be most useful to document the major types of changes, rather than some log file listing line-by-line changes. Tracing through all the little changes would not be that easy going forward and impossible going backwards. I will try to compile a few paragraphs explaining these, when I get a chance.

If people want to test alternative analyses against an absolutely unchanging target corpus, then I recommend storing a version locally. Or, if several people want to target some particular data set for competing analyses, as is being done for the segmentation analysis learning programs, then we can store that particular target corpus in a standard frozen form.
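For anyone keeping such a local frozen copy, a checksum manifest is one simple way to confirm later that the files have not changed, or to see which ones differ from a fresh download. The sketch below is only an illustration in Python: it assumes the transcripts are .cha files under a single directory, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(corpus_dir):
    """Map each .cha file (relative path) to the SHA-256 of its contents."""
    root = Path(corpus_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.cha"))
    }


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or whose contents differ."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Saving the manifest alongside the analysis, for example as JSON, records exactly which version of the corpus the results were based on.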

— Brian

