Issues about TAALES.


Yan, Kexin

Dec 5, 2022, 4:58:47 PM
to linguistic-a...@googlegroups.com
Hi Kris,

Thank you for your reply. In this email, I would like to confirm my understanding of two points. Please reply "yes" or "no" to each sentence below; for any "no", please add your comments.

Sentence 1. In the BNC frequency indices (for example, BNC_written_bigram_normed), if a bigram returns a number (for example, the frequency of "how is" is 0.008586364), it means that "how is" occurs in the BNC written corpus at least once. If a bigram returns N/A (for example, "lately I"), it means that "lately I" does not occur in the BNC written corpus at all (occurrences = 0). The arbitrary cut-off of 5 occurrences is not applied to the BNC frequency indices.

Sentence 2. In the COCA frequency indices (for example, COCA_Academic_Bigram_Frequency), if a bigram returns a number (for example, the frequency of "how is" is 7.07015027663), it means that "how is" occurs in the COCA Academic corpus at least 5 times. If a bigram returns N/A (for example, "you lately"), then, because the arbitrary cut-off of 5 occurrences is applied, "you lately" occurs fewer than 5 times: it may occur in the COCA Academic corpus (0 < occurrences < 5), or it may not occur there at all (occurrences = 0).

Thank you very much. I look forward to your comments.

Kind regards,

Kexin Yan

05/12/2022

Kristopher Kyle

Dec 6, 2022, 7:28:11 PM
to Yan, Kexin, linguistic-a...@googlegroups.com
Sentence 1. In the BNC frequency indices (for example, BNC_written_bigram_normed), if a bigram returns a number (for example, the frequency of "how is" is 0.008586364), it means that "how is" occurs in the BNC written corpus at least once. If a bigram returns N/A (for example, "lately I"), it means that "lately I" does not occur in the BNC written corpus at all (occurrences = 0). The arbitrary cut-off of 5 occurrences is not applied to the BNC frequency indices.
Not exactly. I didn't calculate the BNC norms myself, but each list comprises the 50,000 most frequent bigrams or trigrams. The number is the frequency normed per 1,000 words. Working backwards, the least frequent item in the written bigram list (e.g., "and fishing") occurs approximately 170 times. This is approximate because I don't have the actual numbers they were working with, and there are a few possible permutations; I searched one version of the BNC and found this string 159 times in the written section.
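
A minimal sketch of the "working backwards" step, converting a value normed per 1,000 words into an approximate raw count. The function names are hypothetical, and the corpus size of 88 million tokens is an assumption used only for illustration; the thread does not state the token count behind the BNC norms, so the result is a rough estimate at best.

```python
# Hypothetical helpers: convert between raw counts and frequencies
# normed per 1,000 words (the unit described in the reply above).

def normed_to_raw(normed_freq: float, corpus_tokens: int, per: int = 1000) -> float:
    """Estimate the raw count behind a normed frequency."""
    return normed_freq * corpus_tokens / per


def raw_to_normed(raw_count: float, corpus_tokens: int, per: int = 1000) -> float:
    """Norm a raw count to occurrences per `per` words."""
    return raw_count * per / corpus_tokens


# ASSUMPTION: approximate size of the written BNC. This figure is not
# given in the thread and differs between versions of the corpus.
ASSUMED_BNC_WRITTEN_TOKENS = 88_000_000

# Value reported for "how is" in BNC_written_bigram_normed.
estimate = normed_to_raw(0.008586364, ASSUMED_BNC_WRITTEN_TOKENS)
print(f"estimated raw count for 'how is': {estimate:.1f}")
```

Because each list holds only the 50,000 most frequent items, a bigram that is missing from the list may still occur in the corpus many times; the estimate is only meaningful for bigrams that do appear.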

Sentence 2. In the COCA frequency indices (for example, COCA_Academic_Bigram_Frequency), if a bigram returns a number (for example, the frequency of "how is" is 7.07015027663), it means that "how is" occurs in the COCA Academic corpus at least 5 times. If a bigram returns N/A (for example, "you lately"), then, because the arbitrary cut-off of 5 occurrences is applied, "you lately" occurs fewer than 5 times: it may occur in the COCA Academic corpus (0 < occurrences < 5), or it may not occur there at all (occurrences = 0).

Yes, based on the 1990-2012 version of COCA.
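
A minimal sketch of how a cut-off of 5 occurrences produces the N/A behaviour confirmed above. This is not TAALES code: the function names and the toy counts are hypothetical, chosen only to show that a missing entry cannot distinguish "never occurs" from "occurs 1 to 4 times".

```python
from typing import Dict, Optional

CUTOFF = 5  # minimum raw count for a bigram to be kept in the list


def apply_cutoff(raw_counts: Dict[str, int], cutoff: int = CUTOFF) -> Dict[str, int]:
    """Keep only bigrams that occur at least `cutoff` times."""
    return {bigram: n for bigram, n in raw_counts.items() if n >= cutoff}


def lookup(index: Dict[str, int], bigram: str) -> Optional[int]:
    """Return the stored count, or None (reported as N/A) if absent."""
    return index.get(bigram.lower())


# Toy counts, invented for illustration; not real COCA figures.
toy_counts = {"how is": 120, "you lately": 3}
index = apply_cutoff(toy_counts)

for bigram in ("how is", "you lately", "never seen"):
    result = lookup(index, bigram)
    print(bigram, "->", "N/A" if result is None else result)
```

Both "you lately" (3 occurrences in the toy data) and "never seen" (0 occurrences) come back as N/A, which is why an N/A result only shows that the count is below 5.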


--
Kristopher Kyle
Associate Professor
Department of Linguistics
University of Oregon