I took a look at your sample LWT example and have some comments on
possible changes. I'm not that conversant in how GOLD has chosen to
model certain things. So, some details may not be GOLD-compliant. I
also added some concepts and predicates that aren't actually in GOLD
in the gold namespace (but it looks like you may have too).
Major changes:
1. For wordlists, after much discussion with various people, I think
it's safe to say a consensus has emerged not to code a wordlist
concept like "the cormorant" as a meaning in the GOLD sense of
meaning, but rather as a kind of "comparative concept" which has
"counterparts" in various languages. The GOLD sense of "meaning" would
then be reserved for particular translations of specific words in a
given language. This is kind of abstract, I know, but it has to do
with the special uses of wordlist concepts to try to generalize over
language-particular meanings. For example, we can talk of a concept
"person" which (at least a century ago) would have had "man" as its
counterpart in English and "Mensch" in German. However, English "man"
has two important senses, "person" and "male person", while German
"Mensch" lacks that ambiguity. In a wordlist context, both would be
linked to LWT's "the person", but if we were to assign each a language-
internal meaning, they would differ.
2. In an academic context, like the LWT project, where attribution is
vital, it is probably inadvisable to ever link any kind of data about
a language directly to a node for that language. Rather, an
intermediating device specifying the source of the data should be
employed. In the RDF below, I've used the notion of Doculect for this
(i.e., "a variety of a language which is documented in some source")
and connected the doculect to a language. I had thought GOLD
"described variety" was used for this concept, but upon reading the
documentation, it seems that is used for something different.
3. (Relatively minor) You had some content coming off of a Form
node which struck me as being better connected to a Word node. So, I
moved some annotation "up" in the tree.
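To make point 1 a bit more concrete, here is a minimal Python sketch of the distinction between a comparative concept and language-internal meanings. It is only an illustration: the dictionary layout and the names counterparts, senses, share_concept and same_internal_meaning are my own assumptions and are not part of GOLD or the LWT data. The man/Mensch facts are the ones from the example above.

```python
# Illustration only: the data layout and all names below are hypothetical,
# not part of GOLD or the LWT data model.

# Comparative concept -> its counterpart word in each language.
counterparts = {
    "the person": {"English": "man", "German": "Mensch"},
}

# Language-internal meanings (senses) of each word.
senses = {
    ("English", "man"): {"person", "male person"},
    ("German", "Mensch"): {"person"},
}


def share_concept(concept, lang_a, lang_b):
    """True if both languages have a counterpart for the comparative concept."""
    entry = counterparts.get(concept, {})
    return lang_a in entry and lang_b in entry


def same_internal_meaning(concept, lang_a, lang_b):
    """True if the two counterparts have identical language-internal senses."""
    entry = counterparts[concept]
    return senses[(lang_a, entry[lang_a])] == senses[(lang_b, entry[lang_b])]


linked = share_concept("the person", "English", "German")
identical = same_internal_meaning("the person", "English", "German")
print(linked, identical)
```

Both words are linked to the same wordlist concept, yet their language-internal meanings differ, which is why the two notions are kept apart.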
I've pasted revised RDF below. It should be valid if you want to
examine it in detail.
Jeff
----------------
# lwt data model example in turtle (http://www.w3.org/2007/02/turtle/primer/)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix gold: <http://purl.org/linguistics/gold#>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix lwt: <http://www.livingreviews.org/lwt/>.
#
# describe a word:
#
<lwt:word/72181920467485626> rdf:type gold:LinguisticSign .
<lwt:word/72181920467485626> gold:hasForm <lwt:word/72181920467485626#Form> .
<lwt:word/72181920467485626> gold:translatesConcept <lwt:meaning/3.597> .
<lwt:word/72181920467485626> rdf:type gold:SyntacticWord .
<lwt:word/72181920467485626> gold:inDoculect <lwt:languoid/vanderSijs2008Dutch> .
<lwt:word/72181920467485626#Form> gold:OrthographicWord <lwt:word/72181920467485626#OrthographicForm> .
<lwt:word/72181920467485626#OrthographicForm> rdf:type
gold:OrthographicWord .
<lwt:word/72181920467485626#OrthographicForm>
gold:orthographicRepresentation "aalscholver" .
#
# describe a meaning:
#
<lwt:meaning/3.597> rdf:type gold:ComparativeSemanticConcept .
<lwt:meaning/3.597> rdfs:label "the cormorant" .
#
# describe a doculect
#
<lwt:languoid/vanderSijs2008Dutch> rdf:type gold:Doculect .
<lwt:languoid/vanderSijs2008Dutch> gold:describesLanguoid
<lwt:languoid/Dutch> .
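As a rough sketch of how the model above is meant to be read, the following Python snippet copies the triples into plain tuples and walks from the word to its spelling, its wordlist concept, and (via the doculect) the languoid. It uses no RDF library; the helper names objects and describe are hypothetical, and the identifiers are kept as the literal strings used in the Turtle rather than expanded IRIs.

```python
# Illustration only: triples copied from the Turtle above as plain tuples.
# The helper names (objects, describe) are hypothetical.
W = "lwt:word/72181920467485626"
DOCULECT = "lwt:languoid/vanderSijs2008Dutch"

TRIPLES = [
    (W, "gold:hasForm", W + "#Form"),
    (W, "gold:translatesConcept", "lwt:meaning/3.597"),
    (W, "gold:inDoculect", DOCULECT),
    (W + "#Form", "gold:OrthographicWord", W + "#OrthographicForm"),
    (W + "#OrthographicForm", "gold:orthographicRepresentation", "aalscholver"),
    ("lwt:meaning/3.597", "rdfs:label", "the cormorant"),
    (DOCULECT, "gold:describesLanguoid", "lwt:languoid/Dutch"),
]


def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def describe(word):
    """Follow the links from a word to its spelling, concept and languoid."""
    form = objects(word, "gold:hasForm")[0]
    ortho = objects(form, "gold:OrthographicWord")[0]
    concept = objects(word, "gold:translatesConcept")[0]
    doculect = objects(word, "gold:inDoculect")[0]
    return {
        "spelling": objects(ortho, "gold:orthographicRepresentation")[0],
        "concept": objects(concept, "rdfs:label")[0],
        "doculect": doculect,
        # The language is reached only through the doculect, never directly.
        "languoid": objects(doculect, "gold:describesLanguoid")[0],
    }


info = describe(W)
print(info)
```

Note that the word carries no direct link to the language node; the only path to it runs through the doculect, which is what keeps the source attribution attached to the data.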
I mentioned in my previous post that we're putting together a toolkit to
help migrate data to GOLD-aware RDF. We should have an update posted this
weekend. So far, I have implemented migration code for these formats:
-Praat (IGT only)
-ELAN (IGT only)
-BibTeX (for citing data)
-Leipzig glossed text
These formats are planned in the near future:
-phonetic feature geometries (in a text format)
-WordNet style entries
If there are any other formats that you think would be useful, please let
me know.
Scott
University of Washington
Department of Linguistics
B-201 Padelford Hall
Box 354340
Seattle, WA 98195-4340
Phone: (206) 616 5728
Fax: (206) 685-7978
webpage: http://faculty.washington.edu/farrar
> terms on my own, i'll accept that as being part of pioneering work.
> i'd still hope to find ideas for actual reuse of the LWT data - maybe
> comparison to other wordlists or dictionaries - to help me understand
> what level of interoperability we should go for.
This will be part of the LEGO project--there's a plan to start work on
this early next year, if you can wait. We want to put a large number
of wordlists and dictionaries into a broadly interoperable format.
Details to be determined...
Jeff