<apologies for cross-posting>
Call for Papers: Dependency Grammar for Typology
Workshop @ ALT 15 in Zhuhai, China; November 8-10, 2024
Large-scale multilingual corpora such as Universal Dependencies (de Marneffe et al. 2021) have enabled advances in quantitative methods in morphosyntactic typology, allowing a transition from binary or multivariate classifications of linguistic features to more nuanced, continuous classifications. These classifications capture variation better than before (Levshina et al. 2023) and allow linguistic variation to be studied from a token-based perspective (Haspelmath 2018). Beyond their direct use in typological research, the Universal Dependencies treebanks are used to annotate further large-scale multilingual corpora (Kondratyuk & Straka 2019), to syntactically parse languages that the framework does not yet cover, and for zero-shot parsing (Ammar et al. 2016; Tran & Bisazza 2019; Üstün et al. 2022). Hence, they have become a valuable tool for multilingual morphosyntactic analysis, the products of which are indispensable for typology.
However, large-scale multilingual resources such as the Universal Dependencies treebanks have also been regarded as problematic. First, a major concern for typologists has always been language sampling: resources of this type are typically biased towards WEIRD and especially European languages. Second, there is as yet no dedicated program to counter this sampling bias, i.e. any coordinated effort to include low-resource and less-described languages rests on the shoulders of individual language specialists, whose time and funds are already under pressure. Third, as with any attempt to construct cross-linguistically appropriate schemes for tagging and annotation, the universal applicability of such schemes has been called into question (Croft et al. 2017).
This workshop aims to bring together typologists who use dependency-annotated resources for quantitative typological research. We aim to include both new studies that use dependency-annotated corpora to answer typological questions and more critical contributions that point to the limitations of ‘dependency grammar for typology’. This also includes proposals on how quantitative typology can be conducted using heterogeneous data sources and the development of new resources, as long as a focus on comparative research is maintained.
Topics of interest include, but are not limited to:
➔ Synchronic comparative studies on variation that can only be accessed using corpora, such as word order (Levshina 2019, Talamo & Verkerk 2022);
➔ Comparative studies that employ such resources to uncover universal principles of grammar, including dependency length optimization (Futrell, Mahowald & Gibson 2015; Liu 2021; Jing, Blasi & Bickel 2022), word order universals (Choi et al. 2021, Gerdes et al. 2021, Yan & Liu 2023), and the memory-surprisal trade-off (Hahn, Degen & Futrell 2021);
➔ Diachronic studies of language change, such as the evolution rate of word order in main and subordinate clauses (Jing et al. 2023) or word order change (Hahn & Xu 2022);
➔ Theoretical challenges in annotation, such as the universality of syntactic labels, as well as of parts of speech, morpho-syntactic features, and tokenization (Croft et al. 2017, Osborne & Gerdes 2019, Sinnemäki & Haakana 2020, Höhn 2021);
➔ Development of new resources, in particular with respect to low-resource languages, starting from different types of texts (corpora, fieldwork notes, existing treebanks, Wikipedia, grammars, etc.) (Zariquiey et al. 2022, Kahane et al. 2023);
➔ Projects that employ such resources to go beyond sentence-level syntactic dependencies by developing additional layers of annotation for studying discourse and information structure, among other levels;
➔ Robustness and statistical validity of quantitative typological measures across different theoretical approaches and annotation schemes (Gerdes et al. 2018, Osborne & Gerdes 2019, Yan & Liu 2019);
➔ Limits of dependency grammar for typology: issues such as unbalanced sampling, limitations of annotation in terms of availability and quality, ‘missing’ annotation, and heterogeneity of the annotation across treebanks, both in terms of application and quality.
We envision a worthwhile exchange between more traditional typologists and typologists who have already worked with these resources. If you want to join us, please submit your abstract to ALT15, explicitly indicating that it is intended for the workshop "Dependency Grammar for Typology". Instructions on how to submit abstracts can be found on the ALT2024 page:
https://sites.google.com/view/alt2024/call-for-papers
Abstracts are due March 15th!
Organizers: Andrew Dyer, Luigi Talamo, Annemarie Verkerk (Saarland University), Luca Brigada Villa, and Erica Biagetti (Universities of Bergamo and Pavia)
Ammar, Waleed, George Mulcaire, Miguel Ballesteros, Chris Dyer & Noah A. Smith. 2016. Many Languages, One Parser. In Transactions of the Association for Computational Linguistics, edited by Lillian Lee, Mark Johnson and Kristina Toutanova. 4:431–444.
Choi, Hee-Soo, Bruno Guillaume & Karën Fort. 2021. Corpus-based language universals analysis using Universal Dependencies. In Proceedings of the Second Workshop on Quantitative Syntax (Quasy, SyntaxFest 2021), 33–44, Sofia, Bulgaria. Association for Computational Linguistics.
Croft, William, Dawn Nordquist, Katherine Looney & Michael Regan. 2017. Linguistic Typology Meets Universal Dependencies. In Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT15), edited by Markus Dickinson, Jan Hajic, Sandra Kübler, and Adam Przepiórkowski, 63–75. CEUR Workshop Proceedings.
Futrell, Richard, Kyle Mahowald & Edward Gibson. 2015. Quantifying Word Order Freedom in Dependency Corpora. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), edited by Joakim Nivre, Eva Hajičová, 91–100, Uppsala, Sweden. Uppsala University, Uppsala, Sweden.
Gerdes, Kim, Bruno Guillaume, Sylvain Kahane & Guy Perrier. 2018. SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD. Universal Dependencies Workshop 2018. Brussels, Belgium. https://doi.org/10.18653/v1/W18-6008 (hal-01930614)
Hahn, Michael, Judith Degen & Richard Futrell. 2021. Modeling word and morpheme order in natural language as an efficient trade-off of memory and surprisal. Psychological Review, 128(4), 726–756. https://doi.org/10.1037/rev0000269
Hahn, Michael & Yang Xu. 2022. Crosslinguistic word order variation reflects evolutionary pressures of dependency and information locality. Proceedings of the National Academy of Sciences of the United States of America 119(24), e2122604119. https://doi.org/10.1073/pnas.2122604119
Haspelmath, Martin. 2018. How Comparative Concepts and Descriptive Linguistic Categories Are Different. In Aspects of Linguistic Variation, edited by Daniël Van Olmen, Tanja Mortelmans, and Frank Brisard, 83–114. Berlin, Boston: De Gruyter.
Höhn, Georg F. K. 2021. Towards a Consistent Annotation of Nominal Person in Universal Dependencies. In Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021), edited by Miryam de Lhoneux, Reut Tsarfaty, 75–83.
Jing, Yingqi, Damián E. Blasi & Balthasar Bickel. 2022. Dependency-length minimization and its limits: A possible role for a probabilistic version of the final-over-final condition. Language 98(3), 397–418.
Jing, Yingqi, Paul Widmer & Balthasar Bickel. 2023. Word order evolves at similar rates in main and subordinate clauses. Diachronica. https://doi.org/10.1075/dia.20035.jin
Kahane, Sylvain, Santiago Herrera, Bruno Guillaume & Kim Gerdes. 2023. Autogramm : développement simultané de treebanks et de grammaires à partir de corpus. In Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 6 : projets, 37–42, Paris, France. ATALA.
Kondratyuk, Dan & Milan Straka. 2019. 75 Languages, 1 Model: Parsing Universal Dependencies Universally. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan, 2779–2795.
Levshina, Natalia. 2019. Token-based typology and word order entropy: A study based on Universal Dependencies. Linguistic Typology 23(3), 533–572. https://doi.org/10.1515/lingty-2019-0025
Levshina, Natalia, Savithry Namboodiripad, Marc Allassonnière-Tang, Mathew Alex Kramer, Luigi Talamo, Annemarie Verkerk, Sasha Wilmoth et al. 2023. Why We Need a Gradient Approach to Word Order. Linguistics 61(4), 825–883. https://doi.org/10.31234/osf.io/yg9bf.
Liu, Zoey. 2021. The Crosslinguistic Relationship between Ordering Flexibility and Dependency Length Minimization: A Data-Driven Approach. In Proceedings of the Society for Computation in Linguistics: Vol. 4, Article 25. https://doi.org/10.7275/xt42-4282
Marneffe, Marie-Catherine de, Christopher D. Manning, Joakim Nivre & Daniel Zeman. 2021. Universal Dependencies. Computational Linguistics 47(2), 255–308. https://doi.org/10.1162/coli_a_00402
Osborne, Timothy & Kim Gerdes. 2019. The status of function words in dependency grammar: A critique of Universal Dependencies (UD). Glossa: a journal of general linguistics 4(1), 17. https://doi.org/10.5334/gjgl.537
Sinnemäki, Kaius & Viljami Haakana. 2020. Variation in Universal Dependencies Annotation: A Token-Based Typological Case Study on Adpossessive Constructions. In Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), edited by Marie-Catherine de Marneffe, Miryam de Lhoneux, Joakim Nivre, Sebastian Schuster, 158–167.
Talamo, Luigi & Annemarie Verkerk. 2022. A new methodology for an old problem. Italian Journal of Linguistics 34(2), 171–226.
Tran, Ke & Arianna Bisazza. 2019. Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo), edited by Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta, 281–288.
Üstün, Ahmet, Arianna Bisazza, Gosse Bouma & Gertjan van Noord. 2022. UDapter: Typology-based Language Adapters for Multilingual Dependency Parsing and Sequence Labeling. Computational Linguistics 48, 1–37.
Yan, Jianwei & Haitao Liu. 2019. Which annotation scheme is more expedient to measure syntactic difficulty and cognitive demand? In Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019), 16–24, Paris, France. Association for Computational Linguistics.
Yan, Jianwei & Haitao Liu. 2023. Basic word order typology revisited: a crosslinguistic quantitative study based on UD and WALS. Linguistics Vanguard. https://doi.org/10.1515/lingvan-2021-0001
Zariquiey, Roberto, Arturo Oncevay & Javier Vera. 2022. CLD² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, 20–30, Dublin, Ireland. Association for Computational Linguistics.