Regression analysis: genre variation in unbalanced corpora


cbechet90

Aug 26, 2020, 2:04:25 PM

to StatForLing with R

Dear all,

I know that this kind of question is not strictly about R programming, but since methodological issues are often discussed in the books, I will venture to ask it in the group.

Since historical corpora are rarely genre-balanced (COHA is an exception, but my work is based on corpora with a greater time depth), is it methodologically sound to use genre as a predictor in regression analysis when one is not sure how the genres are distributed across the different time periods? What if the corpus size is 1+ billion words (e.g. in EEBO)? Once the corpus is that large, isn't there a way to do away with the problem of an item being overrepresented?
