Hi everyone,
Thanks for your reply. I have another question to ask.
In some published papers, the authors first look up each bigram in a large national corpus to check its frequency, and only then decide whether to calculate an MI score and a t-score for it.
For example, the authors search for the bigrams in the BNC. If a bigram occurs fewer than 5 times, it is placed in a below-threshold group and is not assigned association scores.
If a bigram reaches the frequency threshold of 5 occurrences in the BNC, it is assigned an MI score and a t-score. The below-threshold group therefore covers frequencies of 4, 3, 2, 1 and even 0, where 0 means that the word sequence is absent from the BNC.
So even a word sequence that occurs 4 times in the BNC is not assigned association scores.
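To make the procedure concrete, here is a minimal Python sketch of the thresholding as I understand it. It assumes the commonly used formulas MI = log2(O / E) and t = (O - E) / sqrt(O), where O is the observed bigram frequency and E = f(w1) * f(w2) / N is the expected frequency. The function name, parameter names and example counts are hypothetical and are not taken from any paper or from TAALES.

```python
import math
from typing import Optional, Tuple

MIN_FREQ = 5  # frequency threshold described above


def association_scores(
    bigram_freq: int,
    w1_freq: int,
    w2_freq: int,
    corpus_size: int,
    min_freq: int = MIN_FREQ,
) -> Optional[Tuple[float, float]]:
    """Return (MI, t-score), or None for a below-threshold bigram.

    Assumed formulas (not confirmed for any specific paper or tool):
        E  = f(w1) * f(w2) / N
        MI = log2(O / E)
        t  = (O - E) / sqrt(O)
    """
    # Frequencies 0-4 fall into the below-threshold group: no scores.
    if bigram_freq < min_freq:
        return None
    expected = w1_freq * w2_freq / corpus_size
    mi = math.log2(bigram_freq / expected)
    t_score = (bigram_freq - expected) / math.sqrt(bigram_freq)
    return mi, t_score


# Hypothetical counts, for illustration only.
below = association_scores(4, 1000, 2000, 1_000_000)   # below threshold
above = association_scores(8, 1000, 2000, 1_000_000)   # reaches threshold
```

With these made-up counts, the bigram that occurs 4 times gets no scores at all, while the one that occurs 8 times gets both an MI score and a t-score.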
I have a similar question about frequencies in the association strength calculation in TAALES 2.0. Below is part of the TAALES output as an example.
Bigram | COCA_academic_bi_MI
how is | -0.436766314163
is going | 2.4478375438
you lately | N/A
you want | 4.46709863611
about with | -2.59387173594
was helped | N/A
It is clear that for the COCA_academic_bi_MI index, different bigrams receive different MI values. N/A means that the two bigrams ("you lately" and "was helped") cannot be found in the COCA academic corpus. My question is about the other 4 bigrams that were assigned MI scores.
In the COCA academic corpus, is a word sequence assigned association scores automatically as long as it can be found in the corpus at all? Does this hold even if the sequence occurs only once in COCA academic,
that is, even if it is rare in that corpus?
By the way, after the COCA calculation there is a Microsoft Excel file named results, which contains the results for my self-built target corpus. In this file, is the value of the index "COCA_academic_bi_MI" the mean score
for each individual txt file (or for all txt files together), calculated from the bigram MI scores in the COCA academic corpus?
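To show what I mean by a mean score for one txt file, here is a small Python sketch using the six values from the output above. This is only my guess at the computation, not the confirmed TAALES behaviour: the function name is hypothetical, and skipping the N/A bigrams rather than counting them as 0 is an assumption.

```python
from typing import Optional, Sequence


def mean_bigram_mi(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average the MI scores of one text, ignoring bigrams with no score.

    Assumption: N/A bigrams (represented here as None) are left out of
    the average. Whether TAALES does this is part of my question.
    """
    found = [s for s in scores if s is not None]
    if not found:
        return None  # no bigram in the text had an MI score
    return sum(found) / len(found)


# MI values copied from the example output; None stands for N/A.
example_scores = [
    -0.436766314163,  # how is
    2.4478375438,     # is going
    None,             # you lately
    4.46709863611,    # you want
    -2.59387173594,   # about with
    None,             # was helped
]

text_mean = mean_bigram_mi(example_scores)
```

If the index is computed this way, the value in the results file would be the average of the four available scores, with the two N/A bigrams left out.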
Thank you for your reply.
Kind regards,
Kexin