Hi all,
I have a problem that I haven’t found a satisfying solution for yet.
I ran two vector space models on amplifier-adjective bigrams (very + good, really + good, very + nice, really + nice, etc.) in two corpora.
So, the data the models worked on looked something like this:
        nice  good  beautiful  bad  …
so        15     5         13    2  …
really    10    54         12    5  …
very      30    24         34   20  …
…
Now I have two vectors of cosine values for the amplifiers, reflecting their distributional similarity in the two corpora. (In creating the models I essentially followed Levshina, Natalia. 2015. How to do Linguistics with R: Data exploration and statistical analysis. Amsterdam & Philadelphia: John Benjamins, pp. 323-332.)
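To make concrete what is meant by a cosine value here, the following is a minimal sketch in Python that computes pairwise cosine similarities from the toy counts shown above. It is only an illustration under assumptions: it works on the raw counts without any weighting step, it uses the made-up numbers from the example rather than the real data, and the helper name cosine is hypothetical. It is not the actual modelling procedure.

```python
import math
from itertools import combinations

# Toy co-occurrence counts from the example above
# (columns: nice, good, beautiful, bad). Illustrative numbers only.
counts = {
    "so":     [15, 5, 13, 2],
    "really": [10, 54, 12, 5],
    "very":   [30, 24, 34, 20],
}

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One cosine value per amplifier pair, for a single corpus.
for first, second in combinations(counts, 2):
    print(first, second, round(cosine(counts[first], counts[second]), 3))
```

Running the same computation on the second corpus gives the second set of values; the question is then about one specific pair (really, very) in each set.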
I want to test whether very and really are more similar to each other (in terms of their cosine value) in one corpus than in the other. In other words, I want to test whether two individual values, each of which is part of a larger vector of values, differ significantly. (Mind you, I am not interested in whether the two distributions differ, only in two specific values taken from them.)
Comparing the cluster solutions and merely reporting the number of nodes connecting really and very in the two corpora seems quite unsatisfactory in this case. I thought about z-transforming the values so that I can at least compare them, but this does not tell me much in terms of significance. I also thought about modeling the data as a repeated-measures design, but this does not strike me as a recommendable procedure for my data either (as it is not truly a repeated-measures design). I am sorry for bothering you all with this, but I am at a loss as to how to test for significance here. Any ideas?
Best, Martin