Preferred LOD representation(s) for linguistic corpora?

Kevin B. Cohen

Mar 12, 2021, 9:41:33 AM
to biohac...@googlegroups.com, bl...@linkedannotation.org
Hello there,

A question that Jin-Dong can answer best, but whose answer may interest others besides me: which LOD representation(s) do you prefer for linguistic corpora? I am looking for something I could use to encode metadata like the following:

1. Some data is spoken, some is written
2. Some corpora are "open" (content is added over time), some are not (their content is fixed)
3. Some corpora contain multiple-author data (e.g. newspaper articles), some contain single-author data (e.g. suicide notes)
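
To make the three distinctions above concrete, here is a minimal sketch of how they might be written out as RDF triples (in N-Triples syntax). It is only an illustration: the namespace http://example.org/corpus-meta# and every property and class name in it (modality, growth, authorship, Spoken, Open, and so on) are hypothetical placeholders, not terms from any existing ontology, and the values assigned to the example corpus are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Hypothetical namespace: a placeholder, not a published vocabulary.
EX = "http://example.org/corpus-meta#"


class Modality(Enum):
    SPOKEN = "Spoken"
    WRITTEN = "Written"


class Growth(Enum):
    OPEN = "Open"      # content is added over time
    CLOSED = "Closed"  # content is fixed


class Authorship(Enum):
    SINGLE = "SingleAuthor"
    MULTIPLE = "MultipleAuthor"


@dataclass(frozen=True)
class CorpusMetadata:
    identifier: str
    modality: Modality
    growth: Growth
    authorship: Authorship

    def to_ntriples(self) -> List[str]:
        """Serialize the three properties as N-Triples lines."""
        subject = f"<{EX}{self.identifier}>"
        pairs = [
            ("modality", self.modality),
            ("growth", self.growth),
            ("authorship", self.authorship),
        ]
        return [
            f"{subject} <{EX}{prop}> <{EX}{value.value}> ."
            for prop, value in pairs
        ]


# Illustrative values only; "exampleCorpus" is not a real resource.
example = CorpusMetadata(
    "exampleCorpus", Modality.WRITTEN, Growth.OPEN, Authorship.MULTIPLE
)
triples = example.to_ntriples()
for line in triples:
    print(line)
```

Using a fixed set of values per property (rather than free-text strings) is one possible design choice; an actual vocabulary might model these distinctions quite differently.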

Searching around, I have found tons of papers about building ontologies from corpora, but no ontologies of corpora.  And I have found OntoLex-lemon for lexical resources, but nothing similar for corpora themselves.  Ontologies of linguistic annotations: absolutely.  Of metadata: of course.  Of languages: yep.  Of corpora: no.

Opinions/advice?

Kev

--
Kevin Bretonnel Cohen, PhD
Director, Biomedical Text Mining Group
Computational Bioscience Program, U. Colorado School of Medicine
D'Alembert Chair in Natural Language Processing for the Biomedical Domain (Emeritus),
LIMSI, CNRS, Université Paris-Saclay