Dear Colleagues,
I am pleased to announce the launch of the Global Contentious Politics Dataset, a comprehensive, fully automatically curated resource on social movements, protests, and conflict, available through our project website. The dataset is the result of an extensive research effort funded by the European Research Council (ERC). It draws on recent advances in Artificial Intelligence, Natural Language Processing, and Machine Learning to analyze and catalog instances of contentious politics in Argentina, Brazil, India, South Africa, and Turkey, with additional countries planned for the future. A video introducing the project is available on YouTube.
The Global Contentious Politics Dataset (GLOCON) is the first multi-country protest event repository built for the Global South, drawing on local news sources through automated data processing. It catalogs a range of contentious political events, from protests and rallies to strikes, confrontations, and episodes of political turbulence. Developed during the Emerging Markets Welfare (EMW) Project, GLOCON was originally designed to examine the interplay between contentious politics and social welfare schemes in the Global South, but we hope the dataset will be useful to a broader academic community, including scholars of social movements, conflict, and computational social science. The dataset, pioneering in its multilingual and fully automated collection, spans from the 1990s to the present and records the timing, location, participants, and organizers of each event. For India and South Africa in particular, it distinguishes rural from urban events and violent from non-violent events; these features are accessible through the interactive Dashboard.

As of 2023, GLOCON houses data on 621,290 events derived from local news documents. The dataset is updated annually, and we aim to expand its global reach. Data for India and South Africa are harvested from English-language sources, data for Argentina from Spanish-language sources, and data for Brazil from Portuguese-language sources, while Turkish sources are manually processed for Turkey and include data on the Kurdish ethnic conflict. The GLOCON Dashboard allows users to visualize event data through geolocation, temporal, and categorical filters, offering a dynamic tool for researchers. Further details on the dataset's methodology, the definitions of protest event features, and the creation of training data can be found in the Annotation Manual. For raw data access, users are directed to the Download section.
The supervised machine learning algorithm, designed to approach the precision of human annotators, is trained on a 'Gold Standard Corpus' (GSC): a meticulously double-annotated dataset of more than 17,000 documents that underpins the accuracy of the automated system. The annotation was carried out by skilled social science graduate students at Koç University and the University of São Paulo under expert guidance, which helps ensure the consistency and quality of the GLOCON database. We believe that the GSC will be an important resource for computational social scientists.
We invite you to explore the dataset and utilize it in your research and teaching. To facilitate its use, we have provided comprehensive documentation and user guides. Furthermore, we encourage feedback and collaboration, as we see this launch not as an end but as the beginning of an ongoing dialogue within the academic community. Should you have any inquiries or require further assistance, please do not hesitate to contact us through the contact form available on our website. We look forward to your contributions in advancing our collective understanding of contentious politics.
Thank you for your attention, and we eagerly await the insights that your engagement with the Global Contentious Politics Dataset will undoubtedly bring.
With my best wishes,
On behalf of the GLOCON Team
Dr. Erdem Yörük