Thank you so much for your explanations. Everything is becoming a bit clearer. I have two follow-up questions.
First, I want to check if my understanding of #1 (COCA_Academic_Frequency_Log_CW) is correct. Here is what I understand:
1. The program checks to see which words in the text appear in COCA.
2. Each word in COCA appears X number of times -- that is its "frequency score." This frequency score is then log-transformed.
3. Each word in the text gets a frequency score based on the log-transformed frequency scores in COCA.
4. The frequency scores of all the words in the text are summed. This sum becomes the numerator.
5. The denominator is the number of words in the text that appear in COCA.
6. Interpretation: texts with higher index scores use more high frequency words.
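To make steps 1-6 concrete, here is a minimal sketch of the calculation as I understand it. It is not the program's actual code: the `coca_freq` table, its counts, the `mean_log_frequency` helper, and the choice of a base-10 logarithm on raw counts are all assumptions made purely for illustration.

```python
import math

# Hypothetical frequency counts standing in for the COCA academic list.
# These numbers are invented for illustration only.
coca_freq = {"analysis": 1000, "data": 10000, "result": 100}


def mean_log_frequency(words, freq_table):
    """Average the log-transformed frequencies of the words found in freq_table.

    Words missing from the table are skipped, so they count in neither the
    numerator (sum of log frequencies) nor the denominator (number of words
    found). The base-10 log is an assumption.
    """
    logs = [math.log10(freq_table[w]) for w in words if w in freq_table]
    if not logs:
        return None  # no word in the text appears in the table
    return sum(logs) / len(logs)


# "zyx" is not in the table, so only three words contribute.
text_words = ["analysis", "data", "zyx", "result"]
score = mean_log_frequency(text_words, coca_freq)
print(score)  # approximately 3.0, i.e. (3 + 4 + 2) / 3
```

In this sketch, a text made up of more frequent words would get a higher average, which is the interpretation in step 6.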
Second, I want to check if my understanding of #2 (COCA_academic_bi_T) is correct. Here is what I understand:
1. The program checks to see which bigrams in the text appear in COCA.
2. The T-score for each bigram is calculated.
3. The T-scores for all the bigrams in the text that appear in COCA are summed. This becomes the numerator.
4. The denominator is the number of bigrams in the text that have T-scores in COCA.
5. Interpretation: texts with higher index scores contain more bigrams with high T-scores.
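Here is the same kind of sketch for steps 1-5 of the bigram index. I am assuming the usual collocation t-score, (observed - expected) / sqrt(observed), where expected = count(w1) * count(w2) / corpus size; I do not know whether the program uses exactly this formula. The `t_score` and `mean_bigram_t` helpers and all counts below are hypothetical.

```python
import math


def t_score(bigram_count, w1_count, w2_count, corpus_size):
    """Standard collocation t-score (assumed formula, not confirmed for the tool)."""
    expected = w1_count * w2_count / corpus_size
    return (bigram_count - expected) / math.sqrt(bigram_count)


def mean_bigram_t(tokens, t_table):
    """Average the T-scores of the text's bigrams that appear in t_table.

    Bigrams without an entry in the table are left out of both the sum
    and the count.
    """
    bigrams = zip(tokens, tokens[1:])
    scores = [t_table[b] for b in bigrams if b in t_table]
    if not scores:
        return None  # no bigram in the text has a T-score
    return sum(scores) / len(scores)


# Invented reference counts, for illustration only.
t_table = {
    ("results", "of"): t_score(100, 1000, 2000, 1_000_000),
    ("of", "the"): t_score(25, 500, 1000, 1_000_000),
}

tokens = ["the", "results", "of", "the", "study"]
index = mean_bigram_t(tokens, t_table)
print(index)  # approximately 7.35, the mean of about 9.8 and 4.9
```

Only two of the four bigrams in this toy text are in the table, so the denominator is 2 rather than 4, matching step 4.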
Please correct me if I am misunderstanding anything. I really appreciate your time and patience!
Best,
Shireen