"Vocabulary" sessions on Friday morning, 3 June

Thomas Baker

Jun 17, 2011, 10:39:23 PM

to LOD in Libraries, Archives, Museums

Dear all,

Here are some notes from the Friday morning "vocabulary" sessions,
as best I can reconstruct them from the post-it sheets.

Three main concepts for further discussion and exploration
emerged from this session (details below):

-- Vocabulary Maintenance Toolkit
-- Vocabulary Preservation Framework
-- Alignment Primer

The main next-step action was to pursue these ideas in the following
contexts:

-- Tom (i.e., me) will set up a mailing list for discussing the Vocabulary
Preservation Framework idea. As I will be on the road for meetings and
vacation for much of the time through the end of July, realistically this
will happen in August.

-- Corey, Diane, and Tom are organizing a day-long meeting on
vocabularies as part of the DC-2011 conference in The Hague --
probably for Wednesday, 21 September. All three of the above
topics will be on the agenda. Preparations for the meeting will
take place on the following mailing lists:
http://www.jiscmail.ac.uk/lists/dc-registry.html
http://www.jiscmail.ac.uk/lists/dc-architecture.html
Contact:
Corey Harper <corey....@nyu.edu>
Tom Baker <t...@tombaker.org>
Diane Hillmann <metadat...@gmail.com>

Tom

======================================================================
2011-06-03 Friday, 9:30-11:30 - "Vocabularies"

Mapping the problem space

Alignments
Who makes them? Maintainers/owners of vocabularies (formalized).
Also individuals who need to map vocabularies for a specific
project or need (ad-hoc) -- "bottom-up" mappings.

Wikification of authority files?
Mapping text labels or tags to vocabularies?

Layering
Who says?
Trusted data versus community contributions.
"How to use it" versus "how to express it".

Versioning
Temporal, timestamps.
Flip side: commitment, durability.
Changing labels: time-stamped.
Active versioning: owners supply versioning information.
Passive versioning: Memento snapshots.

Preservation
Inheritance of ownership.
Orphaned vocabularies.
Passive preservation: Memento snapshots.

Provenance

----------------------------------------------------------------------
Preservation of RDF vocabularies

There is an expectation that URIs of RDF vocabularies (properties and
classes) will resolve to documentation, and there are evolving conventions
about the types of documentation to which they should resolve.

Who stands behind RDF vocabularies? How can vocabularies be put into a
"preservation context"? How can maintainers improve the short- and
long-term viability of their vocabularies? Who can step in and provide
access to vocabularies when primary servers go down? What arrangements
have been made for the long-term inheritance of vocabularies? Can cultural
memory organizations play a larger role in reinforcing vocabulary
maintainers?

Maintainers of RDF vocabularies range from private individuals (FOAF
Project) to major libraries (Library of Congress Subject Headings).
Vocabularies maintained by time-limited projects and by individuals seem
more vulnerable, but that is where a lot of the innovation in vocabularies
occurs. What is the best practice, for example, for a scholar to publish a
vocabulary? And is any maintenance organization, in principle, "too big to
fail" in the long-term?

What types of partnerships are possible between innovators and stable
memory institutions? Inheritance arrangements for vocabularies.

Persistent identifiers (URIs): addressing link rot.

Possible organizational approaches:
-- Arrangements between vocabulary maintainers. Example: DCMI (Dublin Core)
and the FOAF Project have made an agreement. DCMI will mirror FOAF and,
if FOAF Project servers become unavailable, can use its access to the DNS
for the FOAF namespace to put the cached copy online.

-- A coalition of memory institutions could mirror caches of RDF vocabularies,
as with the LOCKSS system.

-- Other possible solutions: Memento Project approach to saving snapshots of
Web servers.

To what extent are RDF vocabularies "special cases" from the standpoint of
preservation? What makes them different from any other type of Web resource?
Possibly different requirements: content negotiation, status of versions.

Follow up with: LOCKSS Project, Memento Project, Internet Archive,
Library of Congress, Dublin Core Metadata Initiative.

----------------------------------------------------------------------
Vocabulary maintenance and publication toolkit

Problem: Vocabulary maintainers have evolved a range of tools and approaches
for versioning and publishing their vocabularies, e.g., Vocabulary Management
Tool (DCMI), SpecGen (FOAF), Open Metadata Registry (RDA Vocabularies),
ID.LOC.GOV (Library of Congress Subject Headings and MARC Vocabularies),
etc.

Approaches include content negotiation between HTML and RDF, RDFa
embedded in Web pages, and vocabulary representations generated from
underlying databases or content management systems (e.g., Drupal).

Versioning is a BIG issue for maintenance.

----------------------------------------------------------------------
Alignments (mappings between vocabularies)

Problem: Increasing number of overlapping vocabularies. How to
"align" (map) terms in different vocabularies with similar or identical
meanings.

Users need to know what alignments are available.
Makers of alignments need to know how to make their mappings available for reuse.

Vocabulary maintainers can take the initiative to declare useful mappings in
order to align overlapping vocabularies.

On the other hand, lots of projects work on an ad-hoc, individual basis
making pragmatic alignments for local, project-specific purposes.

There are problems on two levels:
-- Ontological: how to express alignments (mappings)
-- How to make individual, ad-hoc mappings visible and allow them
to "bubble up" into formal standardization processes.

ACTION: Vocabulary meeting, 21 September, The Hague
-- Day-long meeting on vocabularies being planned, probably for Wednesday,
21 September, at DC-2011 in The Hague. Alignments will probably be one
of the issues discussed.
Discussion of the meeting will take place on the following two mailing lists:
http://www.jiscmail.ac.uk/lists/dc-registry.html
http://www.jiscmail.ac.uk/lists/dc-architecture.html
Contact:
Corey Harper <corey....@nyu.edu>
Tom Baker <t...@tombaker.org>
Diane Hillmann <metadat...@gmail.com>

Longer-term: Need for an "alignment primer" (best practice, at least for the
LOD-LAM world, for making alignments).

======================================================================
Ideas for next steps

-- "Vocabulary maintenance toolkit"

Problem: Maintainers of element sets and value vocabularies used in Linked
Data follow ad-hoc approaches to editing and publishing their vocabularies,
with little sharing of tools. None of the major Linked Data vocabularies
currently provide an example that other potential vocabulary maintainers
could easily emulate.

First step: Bring vocabulary maintainers together with developers to discuss
emerging tools and approaches for managing vocabularies -- i.e., making (and
versioning) editorial changes, generating multiple representations (HTML,
RDF/XML, Turtle, JSON, RDFa...?), and publishing in accordance with Web
architecture, possibly with content negotiation.

Follow-through: Focus the efforts of developers on improving a handful of
open-source tools for vocabulary managers, then promote their use with good
user documentation explaining which tools are best suited to which scenarios.

-- "Vocabulary preservation framework"

Problem: The value of any given vocabulary depends on the certainty (both
perceived and real) that its schemas and Web documentation will remain
reliably accessible over time and that its URIs will not be sold,
re-purposed, or simply forgotten. Without a well-understood framework for
long-term planning (e.g., arrangements for "inheriting" preservation
responsibility), much innovative work will be lost and the long-term
usability of data based on those vocabularies will be compromised.

First step: Bring vocabulary maintainers together with memory institutions
to clarify requirements with regard to ensuring the continued resolvability
of URIs to documentation in the short term (e.g., rapid response to service
outages) and in the long term (e.g., after owners retire, projects end, or
institutions become unable to continue maintenance duties).

Follow-through: Pull together a new consortium (or mobilize an existing
consortium) to prototype new types of partnership between vocabulary
maintainers and memory institutions.

-- "Alignment primer"

Problem: Implementers of Linked Data approaches, faced with an increasingly
diverse landscape of vocabularies that overlap in uncontrolled ways, must
continually make ad-hoc decisions about mapping between "apparently
equivalent" terms. There is a current lack of best practice both for
declaring alignments (i.e., mapping predicates as alternatives to the
overused owl:sameAs) and for publishing alignments (e.g., whether by
maintainers themselves or by third parties). Without alignments we could
see the proliferation of "Linked Data silos" (sets of data formally
published as Linked Data but based on ad-hoc vocabularies unrelated to
commonly understood vocabularies).
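To illustrate declaring alignments with mapping predicates weaker than owl:sameAs, here is a minimal sketch using the SKOS mapping properties (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch). It also records who asserts each mapping, so that alignments published by maintainers and by third parties can be kept in separate N-Triples documents. SKOS defines these properties for concepts in value vocabularies; aligning properties or classes in element sets would call for other predicates (e.g., rdfs:subPropertyOf), which this sketch does not cover. The `Alignment` type, publisher names, and example.org URIs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

SKOS = "http://www.w3.org/2004/02/skos/core#"
# SKOS mapping properties: each makes a weaker claim than owl:sameAs.
MAPPING_PREDICATES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


@dataclass(frozen=True)
class Alignment:
    source: str     # URI of a concept in one vocabulary
    relation: str   # local name of a SKOS mapping property
    target: str     # URI of a concept in another vocabulary
    publisher: str  # hypothetical identifier of whoever asserts the mapping

    def __post_init__(self):
        if self.relation not in MAPPING_PREDICATES:
            raise ValueError(f"not a SKOS mapping property: {self.relation}")


def to_ntriples(alignments: list[Alignment]) -> dict[str, str]:
    """Group alignments by publisher and serialize each group as an N-Triples document."""
    docs = defaultdict(list)
    for a in alignments:
        docs[a.publisher].append(f"<{a.source}> <{SKOS}{a.relation}> <{a.target}> .")
    return {pub: "\n".join(sorted(lines)) + "\n" for pub, lines in docs.items()}
```

Separating documents by publisher is only one option; it keeps an ad-hoc, project-specific mapping visible as such, so it could later be reviewed and "bubble up" into a maintainer's own published alignments.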

First step: Bring vocabulary maintainers together with ontologists to
clarify potential mapping predicates. Involve developers with insight into
how crowdsourced mappings might bubble up as input into more formal
alignment processes.

Follow-through: Seek research funding to test potential approaches.

--
Tom Baker <t...@tombaker.org>

Thomas Baker

Jun 17, 2011, 11:28:47 PM

to LOD in Libraries, Archives, Museums

Jon Voss

Jun 18, 2011, 9:07:08 AM

to lod...@googlegroups.com

This is great Tom, thanks so much for putting it together!

Jon
