Dear Radim,
Many thanks for your response. I am interested in studying modern language models to further our understanding of human psychology. Modern language models such as word2vec encode the language patterns of thousands or millions of individuals. Thus, I think, instead of asking individuals about their opinions, thoughts, or feelings (in the context of an online or a lab study), one could interview/consult the language model. In the context of studying bias, for example, instead of asking people about their perceptions of young vs. old, or men vs. women (while trying to find a way of circumventing people’s tendency to hide their biases), one could examine a model derived from language samples collected in the wild, as in this Science paper: https://science.sciencemag.org/content/356/6334/183. Alternatively, one could study gender biases encoded in language by examining the probability distributions over the words a model predicts to finish a sentence such as “She is…” vs. “He is…”.
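To make the idea concrete, here is a minimal sketch of the kind of association measure I have in mind. It is only an illustration under stated assumptions: the three-dimensional vectors are made up (in practice they would come from a trained model such as word2vec), the word lists are arbitrary, and the score is a simplified difference of mean cosine similarities, not necessarily the exact statistic used in the paper linked above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attr_a, attr_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(word_vec, a) for a in attr_a) / len(attr_a)
    mean_b = sum(cosine(word_vec, b) for b in attr_b) / len(attr_b)
    return mean_a - mean_b


# Made-up toy vectors, purely illustrative (NOT taken from any real model).
toy_vectors = {
    "she": [0.9, 0.1, 0.0],
    "woman": [0.8, 0.2, 0.1],
    "he": [0.1, 0.9, 0.0],
    "man": [0.2, 0.8, 0.1],
    "nurse": [0.7, 0.2, 0.3],
    "engineer": [0.2, 0.7, 0.3],
}

female = [toy_vectors[w] for w in ("she", "woman")]
male = [toy_vectors[w] for w in ("he", "man")]

# Positive score: closer to the female terms; negative: closer to the male terms.
scores = {
    w: association(toy_vectors[w], female, male) for w in ("nurse", "engineer")
}
for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With real embeddings, the toy dictionary would be replaced by lookups into the trained model, and the same comparison could be repeated across several pre-trained models to see whether the direction of the association is stable.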
To answer your question about the kind of models needed, I think that the most useful ones would be derived from language used spontaneously in the wild to describe people’s own thoughts, feelings, opinions, etc. (e.g., Twitter, blogs, websites). Interesting insights could also be derived from language used to describe others, society, or politics (e.g., Google News or Wiki datasets).
I can definitely train the models myself, but I would rather examine a range of existing, state-of-the-art models. This would not only enable others to easily replicate and build on my research, but would also minimize the chances that my findings are biased by the particular training approach I used.