NLP for the social sciences: opportunities and challenges
Massive digital datasets, such as social media data, are a promising source for studying social and cultural phenomena. They provide the opportunity to study language use and behaviour in a variety of social situations on a large scale. However, to fully leverage their potential for research in the social sciences, new computational approaches are needed. In this talk I will start with a general introduction to this research area. I will then focus on two case studies. First, I will discuss how natural language processing can help to scale up social science research by applying social science theories to large-scale naturalistic text data. Drawing on the Social Identity Model of Collective Action, I will describe how we investigated the effect of participants’ motivations on the amount of donations raised in the public health campaign Movember. Second, I will discuss how advances in machine learning can be used to develop better tools for sociolinguists. Existing approaches for identifying linguistic variables that exhibit geographical variation (e.g., pop vs. soda vs. coke in the US) have several important drawbacks. I will present a method for measuring geographical language variation based on reproducing kernel Hilbert space (RKHS) representations. I will conclude by discussing my perspective on a few big challenges in this area.