Hello there,
A question that Jin-Dong can probably answer best, but whose answer may interest others besides me: what Linked Open Data (LOD) representation(s) do you prefer for linguistic corpora? I am looking for something I could use to encode metadata like the following:
1. Some data is spoken, some is written
2. Some corpora are "open" (content is added over time), some are not (their content is fixed)
3. Some corpora contain multiple-author data (e.g. newspaper articles), some contain single-author data (e.g. suicide notes)
Searching around, I have found tons of papers about building ontologies from corpora, but no ontologies of corpora. And I have found OntoLex-lemon for lexical resources, but nothing similar for corpora themselves. Ontologies of linguistic annotations: absolutely. Of metadata: of course. Of languages: yep. Of corpora: no.
Opinions/advice?
Kev
--
Kevin Bretonnel Cohen, PhD
Director, Biomedical Text Mining Group
Computational Bioscience Program, U. Colorado School of Medicine
D'Alembert Chair in Natural Language Processing for the Biomedical Domain (Emeritus),
LIMSI, CNRS, Université Paris-Saclay