#bug TAALED 1.4.1: selected indices not included in output file (64-bit Windows 10)

Andrew Barnes

Jul 19, 2024, 11:23:46 PM

to Suite of automatic linguistic analysis tools

Dear SALAT support team,

I am trying to process short texts written by Japanese EFL students. The texts have been placed in individual .txt files, and spelling and punctuation errors have been corrected. That is the only formatting I have applied to the input files. I have not used any labelling because I have been unable to find the manual or guidelines for preparing data.

--------------------------------------------

I have tried the following versions and OSs with varying results:

· Operating system: 64-bit Windows 10

· Linguistic tool: TAALED

· Versions of the tool: 1.4.1

· Distribution: GUI

Selected indices do not appear in the output file. Proper nouns such as “English”, “Japanese”, etc. are being ignored in calculations.

· Operating system: macOS

· Linguistic tool: TAALED

· Versions of the tool: 1.4.1

· Distribution: GUI

Selected indices do appear in the output file, but proper nouns such as “English”, “Japanese”, etc. are being ignored.

· Operating system: 64-bit Windows 10

· Linguistic tool: TAALED

· Versions of the tool: 1.3.1

· Distribution: GUI

Selected indices do appear in the output file, but descriptive statistics are very different from the output of 1.4.1. I cannot make an educated guess as to why.

---------------------------------------------------------------

If possible, I would like a way to include common proper nouns such as “English”, “Japanese”, “Google” in calculations of lexical diversity.

Thank you for your help!

Andrew J Barnes
PhD Candidate, Waseda University

Kristopher Kyle

Jul 22, 2024, 4:48:24 PM

to Andrew Barnes, Suite of automatic linguistic analysis tools

Hi Andrew,

Thank you for your message.

I will let Windows users chime in regarding common issues with Windows 10, as I do not often work in Windows. There may be a filepath issue (particularly if there are non-ASCII characters in the filepath).
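One way to check for the non-ASCII possibility is sketched below. This is only an illustration and not part of TAALED: the function names `has_non_ascii` and `find_non_ascii_paths` and the example folder are hypothetical, and the script assumes the input texts are .txt files under a single folder.

```python
from pathlib import Path


def has_non_ascii(text):
    """Return True if the string contains any non-ASCII character."""
    return not text.isascii()


def find_non_ascii_paths(folder):
    """List the full paths of .txt files under `folder` that contain non-ASCII characters.

    The whole resolved path is checked, so a non-ASCII user name or
    parent folder is flagged as well as a non-ASCII file name.
    """
    flagged = []
    for path in Path(folder).rglob("*.txt"):
        full_path = str(path.resolve())
        if has_non_ascii(full_path):
            flagged.append(full_path)
    return flagged


if __name__ == "__main__":
    # Hypothetical folder name; replace with the folder holding the input texts.
    for flagged_path in find_non_ascii_paths("input_texts"):
        print(flagged_path)
```

If anything is printed, moving the input (and output) folder to a location with a plain ASCII path is a cheap thing to try before digging further.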

As I recall, version 1.4.1 (released in 2020) fixed some tokenization issues that existed in 1.3.1 (released in 2018).

With the compiled version of TAALED there is no quick way to include proper nouns. I/we made the decision not to count the use of varied proper nouns as evidence of lexical diversity.

With the Python version(s) of TAALED it is straightforward to bypass this bit of code.
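To illustrate what including or excluding proper nouns does to a diversity score, here is a minimal sketch. It is not TAALED's code and does not use TAALED's functions: it assumes the text has already been tokenized and part-of-speech tagged by some other tool, treats the tags NNP, NNPS, and PROPN as proper nouns, and uses a simple type-token ratio as a stand-in for whichever index is of interest. All names below are hypothetical.

```python
# Tags assumed to mark proper nouns (Penn Treebank and Universal Dependencies).
PROPER_NOUN_TAGS = {"NNP", "NNPS", "PROPN"}


def type_token_ratio(words):
    """Number of distinct word forms divided by number of tokens (case-insensitive)."""
    lowered = [word.lower() for word in words]
    if not lowered:
        return 0.0
    return len(set(lowered)) / len(lowered)


def ttr(tagged_tokens, keep_proper_nouns):
    """Type-token ratio over (word, tag) pairs, optionally dropping proper nouns."""
    words = [
        word
        for word, tag in tagged_tokens
        if keep_proper_nouns or tag not in PROPER_NOUN_TAGS
    ]
    return type_token_ratio(words)


# Hand-tagged example sentence: "I study English and I use Google"
tagged = [
    ("I", "PRP"), ("study", "VBP"), ("English", "NNP"), ("and", "CC"),
    ("I", "PRP"), ("use", "VBP"), ("Google", "NNP"),
]

print(ttr(tagged, keep_proper_nouns=True))   # 6 types over 7 tokens
print(ttr(tagged, keep_proper_nouns=False))  # 4 types over 5 tokens
```

On short learner texts, a handful of proper nouns can shift the ratio noticeably, which is why the choice matters for this kind of data.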

Hope that helps.

Kris


--

Kristopher Kyle

Associate Professor

Department of Linguistics

University of Oregon

www.kristopherkyle.com

Andrew Barnes

Jul 23, 2024, 9:39:19 AM

to Suite of automatic linguistic analysis tools

Hello Professor Kyle,

Thank you very much for your clarification. I am happy to use 1.4.1 as it was designed. Thank you for your suggestion regarding Python as well.

Regarding Windows, I do have access to a Mac as a workaround.

Kind regards,

Andrew
