Preferred LOD representation(s) for linguistic corpora?

Kevin B. Cohen

Mar 12, 2021, 9:41:33 AM

to biohac...@googlegroups.com, bl...@linkedannotation.org

Hello there,

A question that Jin-Dong can answer best, but whose answer may be of interest to others besides me: which LOD representation(s) do you prefer for linguistic corpora? I am looking for something that I could use to encode corpus-level metadata like the following:

1. Some data is spoken, some is written

2. Some corpora are "open" (content is added over time), some are not (their content is fixed)

3. Some corpora contain multiple-author data (e.g. newspaper articles), some contain single-author data (e.g. suicide notes)

Searching around, I have found tons of papers about building ontologies from corpora, but no ontologies of corpora. And I have found OntoLex-lemon for lexical resources, but nothing similar for corpora themselves. Ontologies of linguistic annotations: absolutely. Of metadata: of course. Of languages: yep. Of corpora: no.

Opinions/advice?

Kev

Kevin Bretonnel Cohen, PhD
Director, Biomedical Text Mining Group
Computational Bioscience Program, U. Colorado School of Medicine

D'Alembert Chair in Natural Language Processing for the Biomedical Domain (Emeritus),
LIMSI, CNRS, Université Paris-Saclay

303-916-2417
http://compbio.ucdenver.edu/Hunter_lab/Cohen
