Dear CG community,
I am reaching out to you because we have the idea of following up on Anssi Yli-Jyrä's work on comparing CG to transformer models, to see whether there are commonalities between expert-made linguistic grammars and learned neural language models.
This is a fascinating question, and we would like to carry out empirical studies to find possible correlations and patterns.
It would be great to get an update on available CG resources to get started, and it would also be interesting to hear whether any of you would be interested in collaborating on this study. What I had in mind was to look into the disambiguation
process performed on real-world data using CG-based parsers and compare it with the activations triggered in trained neural language models.
It would be excellent to know whether there are wide-coverage grammars and parsers available (ideally freely available) that we can study. Most likely, we will need to look into high-resource languages (including Finnish) to make proper
comparisons to neural models, but other scenarios are possible as well. Please let me and Anssi know if you have any suggestions. Thanks a lot!