schema.org workshop

BT

Sep 22, 2011, 2:56:27 AM
to Linked Open Data in Libraries, Archives, & Museums
Was anyone else on this list at the schema.org workshop in Mountain
View today? What did you think?

quick notes (paraphrasing):

"mea culpa" messaging of initial schema.org announcement was off.
Alternative serializations (such as those getting picked up now and
before schema.org) will be used as well depending on return on
investment. mixed vocabularies are fine, we'll just ignore what we do
not understand. Alternative serializations will be explored.

schema.org wants to collaborate; but this is not consensus. We have
deadlines and the show must go on.

Who is schema.org working with? [people who want to get stuff done]

two w3c web schema working groups have been formed; one for syntax and
one for vocabulary. Google groups will migrate to w3c public-
voc...@w3.org R.V. Guha [sp] chair of vocab group; Jeni Tennison is
the chair of the syntax group.

http://schema.org/NewsArticle was added today; working with IPTC
http://rnews.org/ and Evan Sandhaus [sp?] from the new york times
(also learned of data.nytimes.com ) rNews data model (not a news
ontology) has RDFa, schema.org microdata, and JSON serializations

LRMI http://lrmi.net/ http://wiki.creativecommons.org/LRMI funded by
gates foundation; will represent Learning resource metadata Initiative
in schema.org see also http://www.learningregistry.org/ (Greg
Grossmeir [sp]) see also CAST [?] universal design for learning

e business + web science research group (Martin [?]) extending
schema.org with good relations and productontology.org
http://www.heppnetz.de/projects/goodrelations/

Highwire ... vocabulary for academic publications discussion on google
group (does not have a peer reviewed status field right now)

Vocabularies for Information guidance topics (technet/MSDN)

Job postings for veterans + other projects with whitehouse CTO

data commons in discussion

sports in discussion

wikipedia infoboxes in discussion

have more vocabs? Let's talk!

process to incorporate into schema.org
1. develop and discuss
* gather broad industry support or adoption
2. release
* waiver for patent rights
* CC-BY-SA
3. Incorporate into schema.org
4. support in search engines and other tools

I also have notes from Kavi Goel's schema.org implementation intro;
but I'm not going to type them in right now and I think the slides
might be going to be put online. [editorializing] Don't take this
personally if you worked on RDFa or microformats -- but my feeling is
that microdata and schema.org are easier to understand than RDFa or
microformats. There seemed to be a lot of venting going on in the
meeting / arguments and sensitivity about syntax and proper
functioning of a standards activity. To my mind: microdata seems
easy. RDFa seems hard. If RDFa had a sexier name it would still seem
hard. If schema.org vocab in microdata was called XYZg it would still
seem easy.

-- Brian

Corey A Harper

Sep 22, 2011, 3:47:12 AM
to lod...@googlegroups.com
Brian,

Thanks so much for sharing this. Glad you were at the meeting, and I'm
looking forward to some more in-depth thoughts from others in the
coming days. So far, I've just been browsing twitter's #schemaorg,
which clearly doesn't allow much richness.

A few quick thoughts / comments:
* Great news re: alternative syntaxes / serializations will still be
considered by these SE's
* Also glad to hear about moving groups to w3c, and that Jeni
Tennison's involved
* Evan & Co. turned NewsArticle around very quickly, and it's nice to
see a somewhat more deeply modeled data-type in schema.org. Wondering
what it would take for some of us in the LAM world to do the same for
"after the slash" schemaorg types underneath CreativeWork, Book,
Sculpture, &c.
* Good relations creator is Martin Hepp

As a <editorial> postscript of my own, I still don't see what's so
hard about RDFa:

##Declare some namespaces for metadata properties
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:rdv="http://rdvocab.info/Elements/"
xmlns:rdfrbr="http://rdvocab.info/uri/schema/FRBRentitiesRDA/"
xmlns:rdrel="http://rdvocab.info/RDARelationshipsWEMI/"
xmlns:bibo="http://purl.org/ontology/bibo/"

## Identify your thing as a kind of thing with a name:
about="#book" typeof="rdfrbr:Manifestation">

## Give some metadata about it:
<div property="dc:identifier" content="dedupmrg17257152"></div>
<div property="dc:title" content="A history of philosophy in
America."></div>
<div property="dcterms:alternative" content=""></div>
<div property="dcterms:extent" content="2v. cm.."></div>
<div property="dc:identifier" content="1958381"></div>
<div property="dc:identifier" content="75040254"></div>
<div property="bibo:isbn" content="0399116508"></div>
<div property="dc:publisher" content="Capricorn"></div>
<div property="dc:publisher" content=""></div>
<div property="dcterms:created" content="1977"></div>
<div property="dc:subject" content="Philosophy, American --
History"></div>

## relate it to other named & typed things:
<div rel="rdrel:workManifested">
<div typeof="rdfrbr:Work"
about="http://example.org/thing21827719#work">
<div property="dc:identifier" content="21827719"></div>
</div>
</div>
</div>

Thanks again for sharing your notes on this important meeting!

Best,
-Corey

--
Corey A Harper
Metadata Services Librarian
New York University Libraries
20 Cooper Square, 3rd Floor
New York, NY 10003-7112
212.998.2479
corey....@nyu.edu

Corey A Harper

Sep 22, 2011, 4:10:40 AM
to lod...@googlegroups.com
Addenda: This post from Ivan Herman has more info on the two new W3C
SWIG Task Groups (Vocabs / Syntax) Brian mentions:
http://lists.w3.org/Archives/Public/semantic-web/2011Sep/0086.html

-c

Ford, Kevin

Oct 4, 2011, 3:58:21 PM
to lod...@googlegroups.com
Before this fell too far down our inboxes I wanted to

1) Thank Brian for putting these notes together (Thanks!) and
2) Ask if others had anything to add to Brian's notes...

I know the attending group was rather small (relatively speaking) so, to anyone else who participated, it would be great to get your thoughts/notes etc, especially as it pertains to LAM orgs.

Cordially,

Kevin

p.s. I agree with Corey about the RDFa. It's really not *that* hard. My biggest issue (unknown) is how much semantic noise will be in our HTML once we've done everything humanly possible to make our data available (RDFa, microdata, schema.org...).

BT

Oct 10, 2011, 1:52:34 PM
to Linked Open Data in Libraries, Archives, & Museums
re: "what's so hard about it" / "it's not that hard"

here is a little example I worked out comparing microdata and RDFa
before schema.org came out

https://gist.github.com/916958

Now I'm not 100% sure the microdata example is correct; but it was
very easy to do after reading the spec and I'm reasonably confident it
is correct.

With RDFa; the spec just made my head hurt, and I needed twitter
followers to help me out (thanks @rubinsztajn and @jaclark)

The guy from google made a point that "some specifications" seem to be
written for other people that write specifications -- rather than for
end users. schema.org tries to make its documentation as easy to use
as possible.

specifically about Corey's RDFa example; the xmlns business is easy
for xml-wonks; but to most folks it is just cargo cult magic. Also,
re: hidden divs; google said they preferred to have the actual data
that is showing up on the page marked up semantically because it is
less likely to get out of sync or gamed by SEOers. (This latter point
is orthogonal to the syntax)

anyway; link away link-heads! -- Brian

Ed Summers

Oct 10, 2011, 10:58:29 PM
to lod...@googlegroups.com
On Mon, Oct 10, 2011 at 1:52 PM, BT <brian.tingl...@gmail.com> wrote:
> With RDFa; the spec just made my head hurt, and I needed twitter
> followers to help me out (thanks @rubinsztajn and @jaclark)
>
> The guy from google made a point that "some specifications" seem to be
> written for other people that write specifications -- rather than for
> end users.  schema.org tries to make its documentation as easy to use
> as possible.

I'm not sure which spec you were reading, but definitely check out the
RDFa Primer [1], Introduction to RDFa from A List Apart [2] or the
Open Graph Protocol docs [3] if you want a more helpful and friendly
introduction than the RDFa Syntax and Processing spec [4].

I am definitely leaning toward using microdata more these days, since
it allows you to do most of what RDFa offers in a much more
straightforward (I think) syntax...and parsing it is *way* easier. But
RDFa and Microformats still have their benefits, and pre-existing
deployments, so I think it's important to know what's available and be
able to use the right tool for the job.

//Ed

[1] http://www.w3.org/TR/xhtml-rdfa-primer/
[2] http://www.alistapart.com/articles/introduction-to-rdfa/
[3] http://ogp.me/
[4] http://www.w3.org/TR/rdfa-syntax/

Corey A Harper

Oct 11, 2011, 1:56:52 PM
to lod...@googlegroups.com
Thanks for sharing this list of RDFa links, Ed. I've looked at most of
these, but the "A List Apart" post was new to me.

I think your point about knowing "what's available and be[ing] able to
use the right tool for the job" is the crux of the issue, and I'd love
for there to be some more discussion about which approach is best
suited to particular use cases.

Microdata seems like a very effective--and much simpler than
RDFa--approach to embedding structured data into web documents. It
also seems to lose some of the more useful features of the RDF data
model & graph-based approach to metadata. My primary use cases for RDF
are centered around relationships *between* resources, and about the
topics and entities (people, places, things) that are in our data.

In your experience, are schema.org's enumerations & canonical
references [1] sufficient for capturing this kind of information? I
suspect I'm blindered by years of studying semantic web technologies,
but it seems to me that these constructs aren't quite the same as a
generic syntax for saying resource A has relationship B to resource
C...

-Corey

[1] http://schema.org/docs/gs.html#advanced_enum

--

Ed Summers

Oct 11, 2011, 2:20:51 PM
to lod...@googlegroups.com
On Tue, Oct 11, 2011 at 1:56 PM, Corey A Harper <corey....@nyu.edu> wrote:
> Microdata seems like a very effective--and much simpler than
> RDFa--approach to embedding structured data into web documents. It
> also seems to lose some of the more useful features of the RDF data
> model & graph-based approach to metadata. My primary use cases for RDF
> are centered around relationships *between* resources, and about the
> topics and entities (people, places, things) that are in our data.

The interesting thing is that you can use microdata to describe a
graph of related resources. Take a look at the example at schema.org
for a Person [1]. I snipped a bit out here:

--

<div itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>

Graduate students:
<a href="http://www.xyz.edu/students/alicejones.html" itemprop="colleagues">
Alice Jones</a>
<a href="http://www.xyz.edu/students/bobsmith.html" itemprop="colleagues">
Bob Smith</a>
</div>

--

If this was sitting on the web somewhere at http://example.com/jane it
would be basically asserting:

--

@prefix person: <http://schema.org/Person#> .

<http://example.com/jane>
a <http://schema.org/Person> ;
person:name "Jane Doe" ;
person:colleagues <http://www.xyz.edu/students/alicejones.html> ,
<http://www.xyz.edu/students/bobsmith.html> .

--

Kinda neat right?
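
A minimal sketch of that microdata-to-triples reading, in Python with only the standard library. It makes loud simplifying assumptions: one flat item, no nested itemscope, and the page URL standing in as the subject. The names MicrodataTriples and microdata_triples are made up for illustration, and this is not the full microdata extraction algorithm.

```python
from html.parser import HTMLParser


class MicrodataTriples(HTMLParser):
    """Collect (subject, property, value) triples from one flat microdata item.

    Assumptions (hypothetical simplifications): a single top-level
    itemscope, no nested items, and page_url used as the subject.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []
        self._pending = None  # itemprop whose value is the next text node

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and "itemtype" in attrs:
            self.triples.append((self.page_url, "rdf:type", attrs["itemtype"]))
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the URL, not the link text.
            self.triples.append((self.page_url, prop, attrs["href"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.triples.append((self.page_url, self._pending, data.strip()))
            self._pending = None


def microdata_triples(html, page_url):
    """Parse html and return the list of triples found."""
    parser = MicrodataTriples(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


SNIPPET = """
<div itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>
Graduate students:
<a href="http://www.xyz.edu/students/alicejones.html" itemprop="colleagues">
Alice Jones</a>
<a href="http://www.xyz.edu/students/bobsmith.html" itemprop="colleagues">
Bob Smith</a>
</div>
"""

for triple in microdata_triples(SNIPPET, "http://example.com/jane"):
    print(triple)
```

Each colleague link becomes its own triple with the link target as the value, which is what makes the snippet a small graph rather than a flat record.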

//Ed

[1] http://schema.org/Person

Corey A Harper

Oct 11, 2011, 2:40:47 PM
to lod...@googlegroups.com
Yes! This is very cool. It sort of implicitly allows for this kind
of thing, though it's not really hard-wired in the spec anywhere.

I think this is part of why I find the various http://rdfs.schema.org/
efforts to be significant.

Thanks for posting this concrete example.

-Corey

--

Corey A Harper

Oct 11, 2011, 2:48:09 PM
to lod...@googlegroups.com
Got the link below backwards. That should have been http://schema.rdfs.org/

Ford, Kevin

Oct 11, 2011, 4:05:17 PM
to lod...@googlegroups.com
I'm still not convinced that microdata is significantly "easier" than RDFa, especially since the two functionally overlap considerably at their most basic levels. Here's Ed's example as RDFa:

<div xmlns:schemaorg="http://schema.org/" typeof="schemaorg:Person">
<span property="schemaorg:name">Jane Doe</span>
Graduate students:
<a href="http://www.xyz.edu/students/alicejones.html" rel="schemaorg:colleagues">Alice Jones</a>
<a href="http://www.xyz.edu/students/bobsmith.html" rel="schemaorg:colleagues">Bob Smith</a>
</div>

That's not hugely different. Now, because RDFa can accommodate more complicated data, creating RDFa can get a lot more elaborate very quickly.

But, to speak to Brian's points (a few of which were expressed by the Google people), I would also say that the RDFa spec is *not* designed for the average web developer. The Schema.org vocabulary, which is distinct from, but nevertheless effectively acts as another authoritative source of information for, the microdata spec [1], is far more accessible. For starters - and this is a personal gripe - with microdata, I do not have to work as hard to determine the correct attribute to use, in part because the microdata attributes follow a naming pattern to some degree and in part because I feel they're a little more intuitive. On the other hand, I like being able to declare something to be multiple types and, if you have complex data, xmlns prefixes are handy. Above all, I think the most powerful thing schema.org has going for it presently is its website, which is replete with example after example about how to *implement* the schema.org vocabulary using the microdata HTML5 format. We shouldn't underestimate how important such a basic thing as clear documentation and examples can be for individuals trying to figure this stuff out. (And, if a web developer starts using schema.org, Google, Yahoo! and Bing are, presumably, already in position to start using the embedded data, which is more than we - LODLAMers - can generally say about the data we've published and made available.)

About schema.org's documentation, I wonder precisely how much that level of commitment to documentation and examples has played a role in schema.org's (and microdata's) rise in the last few months (putting aside schema.org's backers for a moment). That's probably a study in itself. And, considering Ed's point about "using the right tool for the job," I'd be interested whether the LODLAM community believes

1) whether our data are sufficiently expressible in microdata
2) if not, is that a problem, otherwise what needs to change and
3) Is there a distinct role for microdata or RDFa within LAM organizations, depending on the use case

FWIW, a number of people have written about the positives and negatives of RDFa over microdata (and vice-versa). Despite the author's very, very close association with RDFa, I do think Manu Sporny's post comparing RDFa, microdata, and microformats is very well done [2]. A quick Google search will return the thoughts of many more commentators.

Warmly,

Kevin

[1] http://dev.w3.org/html5/md/Overview.html
[2] http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/

Ed Summers

Oct 12, 2011, 5:15:47 PM
to lod...@googlegroups.com
On Tue, Oct 11, 2011 at 4:05 PM, Ford, Kevin <ke...@loc.gov> wrote:
> We shouldn't underestimate how important such a basic thing as clear documentation and examples can
> be for individuals trying to figure this stuff out.

Totally. This is incredibly important, especially if the point is to
encourage adoption. A friendly validator or inspector that helps
people quietly determine if they are doing things right can help quite
a bit too.
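
As a rough illustration of what a very small "inspector" could look like, here is a Python sketch using only the standard library. It checks just two per-element rules (itemtype without itemscope, and an empty itemprop); the names MicrodataLint and lint are hypothetical, and a real validator would need to cover far more of the microdata spec.

```python
from html.parser import HTMLParser


class MicrodataLint(HTMLParser):
    """Flag two simple microdata authoring slips, element by element.

    This is an illustrative subset of checks, not a conformance checker.
    """

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        line, _ = self.getpos()
        # itemtype is only meaningful on an element that also has itemscope.
        if "itemtype" in attrs and "itemscope" not in attrs:
            self.warnings.append(
                f"line {line}: <{tag}> has itemtype but no itemscope")
        # An itemprop attribute with no property name says nothing.
        if "itemprop" in attrs and not (attrs["itemprop"] or "").strip():
            self.warnings.append(
                f"line {line}: <{tag}> has an empty itemprop")


def lint(html):
    """Return a list of human-readable warnings for the given markup."""
    parser = MicrodataLint()
    parser.feed(html)
    parser.close()
    return parser.warnings


BAD = '<div itemtype="http://schema.org/Person">\n<span itemprop="">Jane Doe</span>\n</div>'
GOOD = '<div itemscope itemtype="http://schema.org/Person">\n<span itemprop="name">Jane Doe</span>\n</div>'

for warning in lint(BAD):
    print(warning)
print(lint(GOOD))
```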

> 1) whether our data are sufficiently expressible in microdata
> 2) if not, is that a problem, otherwise what needs to change and
> 3) Is there a distinct role for microdata or RDFa within LAM organizations, depending on the use case

I guess it's kind of dodging the question, but I use "right tool for
the job" as a subjective measure. I know I have my own tool
preferences, and often the rightness of a tool is how well it fits how
I think, or a team I'm on thinks, and what we happen to be doing. I
think the only way we'll see commonality in serialization formats and
vocabularies is by actually using them, and seeing what works and what
doesn't, and collectively arriving at the same conclusions.

At the moment it seems we are in the unfortunate position of having to
use RDFa to communicate metadata to Facebook, and Microdata to do the
same with Google, Bing, Yahoo, et al. Although you'd think it would be
in the interests of all involved to look for both and Microformats.
But this is a big step forward from where we were a few years ago,
when there weren't any widely deployed consumer grade tools looking
for structured data on the Web.

I think it can help a lot to stay focused on what you are trying to
enable by putting metadata in HTML, and choosing appropriately. So
that kind of begs your question of what the LOD-LAM use cases are. One
could do worse than look at the draft Use Case Report from the W3C
Library Linked Data Incubator Group:

http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

But maybe (just maybe) it's more in the spirit of LOD-LAM to build
some apps that make library, archives and museum data available on the
web, and tools to use it, and see what works...

//Ed

Ed Summers

Oct 13, 2011, 11:44:40 AM
to lod...@googlegroups.com
Incidentally, I just noticed this DLF Fall Forum event on November
1st. Note the emphasis on schema.org vocabs...

//Ed

Linked Data: Hands on How-To
http://www.diglib.org/forums/2011forum/schedule/linked-data-hands-on-how-to/

We propose to facilitate a ‘hands-on’ workshop in which participants
can gain direct experience working with their collection data as
linked data sets.

Bring an Excel spreadsheet or other file containing your collection
data and a laptop prepared to jump in elbow to elbow with colleagues
who want to learn more about LOD in a fun, low key, collaborative way.
Don’t have collection data you can bring? That’s ok! We’ll supply 2-3
sample data sets to choose from to experiment on locally.

We will attempt to package everything in a way that if connectivity is
sparse the workshop can proceed on your laptop and you will go home
with material for future experimentation.

Targeted Learning outcomes: Understand how institutions can…
* express collection data as Linked Open Data according to LOD
ontologies/schemas from schema.org
* use and/or create schemas on schema.org
* process LOD collections in Refine and index them in Freebase
* consume LOD via Exhibit and/or Omeka/other tool of choice

This is designed to be an ~3 hour introductory workshop.

BT

Oct 13, 2011, 11:55:21 AM
to Linked Open Data in Libraries, Archives, & Museums
On Oct 12, 2:15 pm, Ed Summers <e...@pobox.com> wrote:
> On Tue, Oct 11, 2011 at 4:05 PM, Ford, Kevin <k...@loc.gov> wrote:
> > We shouldn't underestimate how important such a basic thing as clear documentation and examples can
> > be for individuals trying to figure this stuff out.
>
> Totally. This is incredibly important, especially if the point is to
> encourage adoption. A friendly validator or inspector that helps
> people quietly determine if they are doing things right can help quite
> a bit too.

I think having fewer choices/ ways to do things is also important. I
think one of the major problems with METS, for example, is that there
are so many "6 of one, half a dozen of another, or 3 + 3 of something
else" types of choices that really in the end make no difference
except to the poor soul who is supposed to support all the different
METS. I could be totally off base, but it seems like there are more
ways to do things in RDFa.

> > 1) whether our data are sufficiently expressible in microdata
> > 2) if not, is that a problem, otherwise what needs to change and
> > 3) Is there a distinct role for microdata or RDFa within LAM organizations, depending on the use case
>
> I guess it's kind of dodging the question, but I use "right tool for
> the job" as a subjective measure. I know I have my own tool
> preferences, and often the rightness of a tool is how well it fits how
> I think, or a team I'm on thinks, and what we happen to be doing. I
> think the only way we'll see commonality in serialization formats and
> vocabularies is by actually using them, and seeing what works and what
> doesn't, and collectively arriving at the same conclusions.
>
> At the moment it seems we are in the unfortunate position of having to
> use RDFa to communicate metadata to Facebook, and Microdata to do the
> same with Google, Bing, Yahoo, et al.

The vocabulary and the syntax both differ in this schema.org vs.
opengraph case as well.

I saw something very discouraging fly across the list for Jeni
Tennison's syntax working group yesterday.

] One of the assumptions we're making within the HTML Data TF is that
] publishers will need to publish in multiple formats (rather than
] consumers understanding multiple formats)

I almost cried and was thinking about alternative career choices for a
split second until I read Ian Hickson's encouraging reply

} That sounds like a horrible authoring experience. :-)
}
} My assumption is that authors will typically use zero vocabularies. I
} think we have to consider ourselves lucky if they actually use any at all.
}
} I certainly wouldn't encourage people to use more than one. So long as
} they always use multivendor, well-documented vocabularies, they'll always
} be able to trivially move to other vocabularies by applying mechanical
} transformations later. No need to expose two. (There's rarely a need to
} expose any!)

my hero!

http://lists.w3.org/Archives/Public/public-html-data-tf/2011Oct/0067.html
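
To make the "mechanical transformations" idea concrete, here is a minimal Python sketch of a property crosswalk. The DC_TO_SCHEMA pairings are illustrative assumptions only, not an agreed Dublin Core to schema.org alignment, and the function name translate is made up for this example.

```python
# Hypothetical crosswalk: these pairings are illustrative assumptions,
# not an agreed Dublin Core to schema.org alignment.
DC_TO_SCHEMA = {
    "dc:title": "name",
    "dc:publisher": "publisher",
    "dc:subject": "about",
    "dcterms:created": "dateCreated",
    "bibo:isbn": "isbn",
}


def translate(record, mapping):
    """Rename properties via mapping; return (translated, unmapped).

    record maps a property name to a list of values. Properties with no
    entry in mapping are kept aside rather than silently dropped.
    """
    translated, unmapped = {}, {}
    for prop, values in record.items():
        target = mapping.get(prop)
        if target is None:
            unmapped.setdefault(prop, []).extend(values)
        else:
            translated.setdefault(target, []).extend(values)
    return translated, unmapped


record = {
    "dc:title": ["A history of philosophy in America."],
    "dc:publisher": ["Capricorn"],
    "dcterms:created": ["1977"],
    "dcterms:extent": ["2v. cm.."],
}

translated, unmapped = translate(record, DC_TO_SCHEMA)
print(translated)
print(unmapped)
```

The unmapped bucket is the interesting part: anything that lands there is a property the target vocabulary cannot express without a human decision.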

> Although you'd think it would be
> in the interests of all involved to look for both and Microformats.

There is a return on investment to consider. There needs to be a
critical mass of adoption / or some very high value content that uses
the format to make it worth it. But I think the consumers here are
the big guns with lots of technical wherewithal and they should have
to deal with multiple syntaxes and vocabularies if this never gets
settled and there is a business case for them. Expecting web page
authors to mark things up every which way but loose seems like a pipe
dream.
...

> But maybe (just maybe) it's more in the spirit of LOD-LAM to build
> some apps that make library, archives and museum data available on the
> web, and tools to use it, and see what works...

I've been exposing calisphere dc metadata via http://www.ietf.org/rfc/rfc2731.txt
for several years, and I don't
see what is so hard about it. But I don't see what anyone is doing
with it. (I also did a very simple oEmbed, and I can't see where
anything uses it either).

Ed Summers

Oct 13, 2011, 3:06:19 PM
to lod...@googlegroups.com
On Thu, Oct 13, 2011 at 11:55 AM, BT <brian.tingl...@gmail.com> wrote:
> I've been exposing calisphere dc metadata via http://www.ietf.org/rfc/rfc2731.txt
> for several years, and I don't
> see what is so hard about it.  But I don't see what anyone is doing
> with it.  (I also did a very simple oEmbed, and I can't see where
> anything uses it either).

Can we see an example?

//Ed

Corey A Harper

Oct 13, 2011, 6:02:20 PM
to lod...@googlegroups.com
This is great, Brian!

Ed, for an example, just check out the source of pretty much any
calisphere item. The first one I hit has loads of embedded DC (as well
as XTF) metadata. [1]
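
For anyone who wants to see what a consumer of that kind of embedding might do, here is a minimal Python sketch that pulls RFC 2731-style "DC." meta tags out of a page. The sample HTML is made up for illustration and is not the actual calisphere source; the names DCMetaExtractor and dc_elements are hypothetical.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        # Match the "DC." prefix case-insensitively and keep the element name.
        if name.lower().startswith("dc.") and content is not None:
            self.elements.append((name[3:].lower(), content))


def dc_elements(html):
    """Return a list of (element, value) pairs found in html."""
    parser = DCMetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


# Made-up sample page, for illustration only.
SAMPLE = """
<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="A history of philosophy in America.">
<meta name="DC.publisher" content="Capricorn">
<meta name="DC.date" content="1977">
<meta name="viewport" content="width=device-width">
</head>
"""

for element, value in dc_elements(SAMPLE):
    print(element, "=", value)
```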

There was the start of a conversation on the DC General list about how
best to work DC metadata into emerging HTML5 best practice, which
would happen alongside a process to map dcterms properties / classes
to the schema.org vocab. [2] If that discussion gets revived, I may
reach out to the two of you and others on this list for input.

Thanks,
-Corey

[1] http://content.cdlib.org/ark:/13030/tf387008q3/
[2] http://wiki.dublincore.org/index.php/Schema.org_Alignment

--
