update


Brian MacWhinney

Jul 6, 2023, 3:45:26 PM
to fluen...@googlegroups.com
Dear FluencyBank,

Here is an update on several advances in FluencyBank over the last year. Apologies for cross-posting.

1. Corpora:
* We have earlier FluencyBank protocol data for 22 children who stutter (CWS) and 38 adults who stutter (AWS).
* We have newer data from 15 CWS and 15 control children in the UMD/CMU corpus using the current protocol.
* We have created new versions of the transcripts for the IISRP corpus using the ASR system described below.
* We are processing a corpus of 40 AWS from Nathan Maxfield.

2. Automatic Speech Recognition (ASR): In the summer of 2022, we worked with Houjun Liu to apply the Rev-AI ASR system and the Montreal Forced Aligner (MFA) to CHILDES and other TalkBank data using a Python script. This system has been remarkably successful, reducing transcription time to about four times the recording time. We are now using it to automatically transcribe new data, transcribe untranscribed audio, and time-align FluencyBank and other TalkBank data. Although ASR still has problems with highly disfluent speech, it now does relatively well with moderately disfluent speech and very well with data from control participants. An open-access article describing this "Batchalign" system is available at https://doi.org/10.1044/2023_JSLHR-22-00642, and we have made Batchalign publicly available at https://github.com/talkbank. We are happy to provide email and Zoom support for users who want to explore this system.
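
As a rough illustration of the last step of such a pipeline, here is a minimal Python sketch (not taken from Batchalign) that turns word-level ASR timings into CHAT-style main-tier lines with time bullets. It assumes the recognizer returns (word, start_ms, end_ms) tuples, splits utterances at pauses longer than an arbitrary 1000 ms, and ends every utterance with a period; the names group_utterances and to_chat_lines are hypothetical. It also assumes CHAT's convention of delimiting media bullets with the U+0015 character.

```python
from typing import List, Tuple

# Hypothetical ASR output format: (word, start_ms, end_ms).
Word = Tuple[str, int, int]

# CHAT media bullets are delimited by the U+0015 control character.
BULLET = "\x15"


def group_utterances(words: List[Word], max_pause_ms: int = 1000) -> List[List[Word]]:
    """Start a new utterance whenever the silence between two consecutive
    words exceeds max_pause_ms (an assumed heuristic, not Batchalign's)."""
    utterances: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current and word[1] - current[-1][2] > max_pause_ms:
            utterances.append(current)
            current = []
        current.append(word)
    if current:
        utterances.append(current)
    return utterances


def to_chat_lines(words: List[Word], speaker: str = "PAR") -> List[str]:
    """Render each utterance as a main-tier line ending in a time bullet
    that runs from the first word's start to the last word's end."""
    lines = []
    for utt in group_utterances(words):
        text = " ".join(w for w, _, _ in utt)
        start, end = utt[0][1], utt[-1][2]
        lines.append(f"*{speaker}:\t{text} . {BULLET}{start}_{end}{BULLET}")
    return lines


if __name__ == "__main__":
    asr_words = [("the", 0, 180), ("dog", 200, 520), ("ran", 560, 900),
                 ("and", 2400, 2600), ("jumped", 2650, 3100)]
    for line in to_chat_lines(asr_words):
        print(repr(line))
```

In the system described above, recognition and alignment are handled by Rev-AI and MFA; the pause-based segmentation here is only a stand-in to show how timed words map onto bulleted CHAT lines.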

3. Cross-tier analyses: Using word-level time marking or the ASR output, we are now able to study patterns of disfluency across the lexical, phonological, morphological, and syntactic tiers in a CHAT transcript. This is possible because these tiers are now in one-to-one alignment. For example, one can ask whether initial-syllable repetitions occur more or less often on words of a certain morphological class and syntactic position, or of a certain phonological composition.
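
To make the kind of question above concrete, here is a minimal Python sketch of one cross-tier count. It assumes the disfluency coding on the main tier has already been reduced to one boolean per word, and that each %mor item is simplified to a "pos|stem" string; the function name and the example data are hypothetical, not real FluencyBank output. The one-to-one alignment is what allows the tiers to be walked in parallel with zip.

```python
from collections import Counter
from typing import Dict, List


def repetition_rate_by_pos(words: List[str], mor: List[str],
                           repeated: List[bool]) -> Dict[str, float]:
    """Share of words in each part-of-speech class that carry an
    initial-syllable repetition. Works only because the tiers are in
    one-to-one alignment: the i-th entry of each list is the same word."""
    if not (len(words) == len(mor) == len(repeated)):
        raise ValueError("tiers are not in one-to-one alignment")
    totals: Counter = Counter()
    hits: Counter = Counter()
    for morph, rep in zip(mor, repeated):
        pos = morph.split("|", 1)[0]  # simplified %mor item: "n|dog" -> "n"
        totals[pos] += 1
        if rep:
            hits[pos] += 1
    return {pos: hits[pos] / totals[pos] for pos in totals}


if __name__ == "__main__":
    # Hypothetical example utterance, not real FluencyBank data.
    words = ["the", "dog", "ran", "to", "the", "park"]
    mor = ["det|the", "n|dog", "v|run", "prep|to", "det|the", "n|park"]
    repeated = [False, True, False, False, True, True]
    print(repetition_rate_by_pos(words, mor, repeated))
```

The same pattern extends to syntactic position or phonological composition by adding another aligned list and grouping on that instead of the part-of-speech prefix.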

4. Collaborative Commentary: With funding from NSF, we are continuing development of the Collaborative Commentary (CC) system at https://talkbank.org/CC. CC allows project groups to create a set of tags for language behaviors and locate instances of those tags in CHILDES data available directly through the TalkBankBrowser on the web. Eight research groups are using the alpha version of this system for teaching and research: three for AphasiaBank data, one for ClassBank data, one for DementiaBank data, and three for CHILDES data. We also hope to encourage use of the system for analysis of specific patterns of disfluency.

5. TalkBankDB: Last year we completed initial development of the TalkBankDB database search system at https://talkbank.org/DB. This year, use of the system was so heavy that it strained our server capacity. To deal with this, we rewrote the server access code to offload computations to the client machine and to avoid redundant queries. We also added more memory to the server, and it no longer crashes. Going forward, we will be adding methods for cross-domain analyses of phonology, lexicon, syntax, and discourse. We are also continuing to expand support for direct analysis of TalkBankDB data from R and Python.

Finally, we plan to implement a more standard and comprehensive authentication system for TalkBank by the end of the summer.

— Brian MacWhinney and Nan Bernstein Ratner