Hi Kristopher and Scott,
I am Kexin Yan, a PhD student in corpus linguistics, and I am using TAALES 2.2 in my research. I would like to ask a few questions about TAALES 2.2.
First, regarding the word frequency indices: TAALES 2.2 includes at least nine written word frequency indices, based on corpora such as the BNC, COCA (four sub-corpora), Kucera-Francis, SUBTLEXus, Thorndike-Lorge, and the Brown corpus. I know that some of these corpora, such as COCA, are updated over time. For example, here is the link to COCA:
https://www.english-corpora.org/coca/. The official introduction states that the current version contains "25+ million words each year 1990-2019 from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and Movies subtitles, blogs, and other web pages." Could you please tell me which version of COCA is used in TAALES, including its size in words and the years the data were collected (e.g., 2001 to 2005)?
Second, I have read the user manuals for TAALES 2.5 and 2.8, and the definitions of content words and function words differ between the two versions. However, the user manual for TAALES 2.2 is no longer available on the website. Could you please describe how content words and function words are defined in TAALES 2.2?
Thank you so much.
Kind regards,
Kexin