README

Masaki Eguchi

Apr 6, 2022, 3:58:07 PM

to Suite of automatic linguistic analysis tools

Welcome to the SALAT support community page! This community page was created to better support users of linguistic analysis tools such as TAALES, TAACO, TAALED and TAASSC (among others).

This page is moderated by Kristopher Kyle and his PhD students (Masaki Eguchi and Hakyung Sung) and by Scott Crossley and his PhD students.

If you run into a technical issue with any of the above tools, you are welcome to start a "New conversation" (a thread) to ask a question or open a discussion about the tool. Feature requests and general discussion about the tools are also welcome. Before posting a question:

Please check the official documentation first to see whether your question is already covered, and
Please search the existing threads to see whether one of them answers your question (use the search box above!).

Please use the following template to get started. It helps us keep track of the necessary information and allows community members to respond to your question more efficiently.

=== start of the template ===

Title:

{ Name of the Tool }: { Topic }

Body:

{brief description of what you are trying to do}

{brief description of what you have tried, and what error message and/or output you are (or are not) getting}

Operating system: {OS (version); e.g., macOS Monterey 12.1}
Linguistic tool: {the name of the tool; e.g., TAALES}
Version of the tool: {version; e.g., 2.8.1}
Distribution: {GUI / Python package; e.g., GUI}

If you are working with a Python package:

Python version: {e.g., Python 3.8, Anaconda distribution}
spaCy and pylats installed: {yes / no / not sure if this is relevant}
Other packages in your working environment: {list of packages and their versions; optional}

Labels for the post if applicable (Choose from below):

#bug #installation #file_formatting #output_file

=== end of the template ===
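
For Python package users, the environment lines of the template can be collected with a short script. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the packages are installed under the distribution names "spacy" and "pylats".

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("spacy", "pylats")):
    """Build the environment lines of the template as a list of strings."""
    lines = [
        f"Operating system: {platform.system()} {platform.release()}",
        f"Python version: {platform.python_version()}",
    ]
    # Distribution names are an assumption; adjust them to your environment.
    for name in packages:
        lines.append(f"{name}: {package_version(name)}")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

You can paste the printed lines directly into the body of your post.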

Thank you for spending time reading this document! Your contribution to the community is greatly appreciated!!

Brahim Ait Hammou

May 27, 2022, 6:24:32 AM

to Suite of automatic linguistic analysis tools

Thank you for inviting me to this group. I'm looking forward to more updates about the tools.

I have a question related to TAALES: Why are the frequency and range counts based on word tokens instead of lemmas or at least word types? I think there should be more options in the tool.

sacro...@gmail.com

May 27, 2022, 11:05:36 AM

to Suite of automatic linguistic analysis tools

Most of the frequency and range counts are lemma based.

Please see https://www.linguisticanalysistools.org/uploads/1/3/9/3/13935189/taales_2.2_index_guide_11-8-2016.xlsx

The index guide has a column (M) labeled "Raw/Lemma".

It lets you know which indices are lemma-based and which are token-based.

Scott

Ait Hammou

May 27, 2022, 11:19:26 AM

to sacro...@gmail.com, Suite of automatic linguistic analysis tools

Thank you professor,

I have already downloaded and consulted the index guide, because I used TAALES (and also TAALED and TAASSC) in my dissertation study and in other short articles. The COCA frequency and range indices, which are very important compared to traditional frequency lists such as the AWL, are raw (not lemmatized).

My comment is technical: Isn't there a way to allow the researcher/user to choose how the words in the corpus are treated?

I think this would only require including a tagger/lemmatizer that receives the user's choice and processes the corpus accordingly.

Regards,

--

Brahim Ait Hammou - Morocco
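
The difference between counting by word token and counting by lemma, which is the option requested above, can be illustrated with a small Python sketch. It is not how TAALES computes its indices: the function name, the "unit" parameter, and the toy lemma table are all hypothetical, and a real pipeline would call a tagger/lemmatizer instead of a lookup table.

```python
from collections import Counter

# Hypothetical toy lemma table, for illustration only.
TOY_LEMMAS = {"runs": "run", "running": "run", "ran": "run", "dogs": "dog"}


def count_units(tokens, unit="token"):
    """Count lowercased word forms, either as they appear or grouped by lemma."""
    if unit == "token":
        keys = [t.lower() for t in tokens]
    elif unit == "lemma":
        # Fall back to the lowercased form when the table has no entry.
        keys = [TOY_LEMMAS.get(t.lower(), t.lower()) for t in tokens]
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return Counter(keys)


tokens = "The dog runs and the dogs ran".split()
token_counts = count_units(tokens, unit="token")
lemma_counts = count_units(tokens, unit="lemma")
```

With the token setting, "runs" and "ran" are counted separately; with the lemma setting, both are counted under "run". The same choice changes which entry of a reference frequency list each word is looked up under.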

Ait Hammou

Jun 6, 2022, 7:48:41 AM

to sacro...@gmail.com, Suite of automatic linguistic analysis tools

Good morning dear researchers,

I have two questions related to TAALES if you don't mind:

1) I wonder how function and content words are operationalized in TAALES. Which grammatical categories count as function words?

2) Is the classification of content/function words adopted in TAALES similar to the one which is adopted in COCA (or other known corpora such as BNC)?

I hope I am clear.

Regards,

Scott Crossley

Jun 6, 2022, 8:28:55 AM

to Ait Hammou, Suite of automatic linguistic analysis tools

Hi Ait,

TAALES uses a stopword list.

It should be this one: http://www.d.umn.edu/~tpederse/Group01/WordNet/words.txt

Kris is working on a new pipeline that will incorporate spaCy, and this may use the built-in spaCy stop word list (although we may use an existing stop word list). Kris can say more about this.

Scott

--

Scott Crossley
Professor, Departments of Applied Linguistics, Computer Science, and Learning Science
Georgia State University

http://alsl.gsu.edu/profile/crossley-scott/

https://www.linguisticanalysistools.org/
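
A stop-list approach of the kind described above can be sketched in Python as follows. This is only an illustration, not the TAALES code: the function name is hypothetical, and the tiny stop list below is a made-up stand-in for the full list linked above.

```python
# Hypothetical miniature stop list; the real list is much longer.
TOY_STOP_WORDS = {"the", "on", "a", "of"}


def split_content_function(tokens, stop_words):
    """Split tokens into (content, function) lists using a stop list."""
    content, function = [], []
    for token in tokens:
        # Any word found on the stop list is treated as a function word.
        if token.lower() in stop_words:
            function.append(token)
        else:
            content.append(token)
    return content, function


content, function = split_content_function(
    ["The", "cat", "sat", "on", "the", "mat"], TOY_STOP_WORDS
)
```

With this approach, the content/function classification depends entirely on which words appear on the list, not on how a word is used in a given sentence.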

Ait Hammou

Jun 6, 2022, 8:47:29 AM

to Scott Crossley, Suite of automatic linguistic analysis tools

Thank you very much. I will check it out.

Kristopher Kyle

Jun 6, 2022, 1:23:21 PM

to Ait Hammou, Scott Crossley, Suite of automatic linguistic analysis tools

Hi Ait,

Earlier versions of TAALES (up to and including version 2.2) use the attached function word stop list (see the last row). Later versions use a part-of-speech tagger to identify content words as verbs, nouns, adjectives, and some adverbs.

Upcoming versions of TAALES define content words as lexical verbs (excluding copular "be", as well as "have" and "do" when used as auxiliary verbs), nouns, adjectives, and adverbs that end in "-ly".

Best,

Kris

--

Kristopher Kyle

Assistant Professor

Department of Linguistics

University of Oregon

www.kristopherkyle.com

master_word_list.txt
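
The part-of-speech-based definition for upcoming versions can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the TAALES implementation: it assumes tokens already carry Universal POS tags, with copular "be" and auxiliary "have"/"do" tagged AUX and lexical verbs tagged VERB. The Token type and function name are hypothetical, and the treatment of proper nouns (PROPN) is not specified in this thread, so they are left out here.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    pos: str  # Universal POS tag, e.g. "VERB", "AUX", "NOUN" (assumed tagset)


def is_content_word(token):
    """Apply the content-word definition described in the post above."""
    # Lexical verbs, nouns, and adjectives count as content words.
    # Copulas and auxiliaries are assumed to be tagged AUX, so they fall through.
    if token.pos in {"VERB", "NOUN", "ADJ"}:
        return True
    # Only adverbs ending in "-ly" count as content words.
    if token.pos == "ADV":
        return token.text.lower().endswith("ly")
    return False


sentence = [
    Token("She", "PRON"), Token("is", "AUX"), Token("quickly", "ADV"),
    Token("reading", "VERB"), Token("very", "ADV"), Token("long", "ADJ"),
    Token("books", "NOUN"),
]
content_words = [t.text for t in sentence if is_content_word(t)]
```

In this example, "quickly", "reading", "long", and "books" are kept, while "She", "is", and "very" are not.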

Ait Hammou

Jun 6, 2022, 1:39:27 PM

to Kristopher Kyle, Scott Crossley, Suite of automatic linguistic analysis tools

Thank you so much professor. That was very helpful.

Ait Hammou

Jun 15, 2022, 3:55:33 AM

to Kristopher Kyle, Scott Crossley, Suite of automatic linguistic analysis tools

Good morning all,

I have a question concerning TAASSC if you don't mind.

Why is the COCA spoken section not taken into account in the output results of syntactic sophistication? (at least in version 1.3.8)

Regards,

Kristopher Kyle

Jun 20, 2022, 4:50:09 PM

to Ait Hammou, Scott Crossley, Suite of automatic linguistic analysis tools

Thanks for the question, Ait.

We had not formally evaluated the performance of Stanford CoreNLP on spoken texts, so we decided to omit that data.

We got a grant to investigate NLP accuracy on spoken L2 texts, and those papers will be coming out soon. The good news is that state-of-the-art NLP tools (e.g., spaCy's transformer models) do quite well on spoken texts (particularly if they are trained with appropriate corpora).

The next version of TAASSC (which we plan to release sometime in Fall 2022) will feature these models (and different reference corpora).

Best,

Kris

--

Kristopher Kyle

Associate Professor

Ait Hammou

Jun 20, 2022, 4:53:19 PM

to Kristopher Kyle, Scott Crossley, Suite of automatic linguistic analysis tools

Great, thanks for the update!

Suite of automatic linguistic analysis tools

Jun 27, 2022, 10:51:38 PM

to Suite of automatic linguistic analysis tools

Dear All,

Thank you very much for starting the first conversation in the group. We greatly appreciate your contributions.

I am sending this message as a reminder that you can create a new conversation in the group rather than posting a reply to this README thread (see the screenshot for how to locate the "New conversation" button). If you do not find a relevant existing thread, please start an independent one. As the group grows, we plan to assign labels to each thread for future reference, so that we can curate an FAQ.

Thank you again for being one of the first contributors to the group, and I hope to see more conversations on this platform.

Sincerely,

Masaki Eguchi

Ph.D. Candidate

Learner Corpus Research and Applied Data Science lab

Department of Linguistics

University of Oregon
