Why a new ontology?


Peter Mika

Apr 15, 2007, 3:25:34 PM
to Bibliographic Ontology Specification Group, mcak...@cs.vu.nl
Dear All,

I've read Frederick Giasson's call for this group on PlanetRDF.com.
But before getting started on the actual topic of developing an
ontology for bibliographies, my question is: why develop a new
ontology? What is lacking in SWRC/BuRST or PRISM that this new
ontology would add? I'm asking because I'm concerned about (even)
more fragmentation in this space.

Best,
Peter Mika
openacademia.org

P.S. Please CC responses to pmika at yahoo-inc.com

bda...@gmail.com

Apr 15, 2007, 3:55:09 PM
to Bibliographic Ontology Specification Group, pm...@yahoo-inc.com

Hi Peter,

On Apr 15, 3:25 pm, "Peter Mika" <peter.m...@gmail.com> wrote:

> I've read Frederick Giasson's call for this group on PlanetRDF.com.
> But before getting started on the actual topic of developing an
> ontology for bibliographies, my question is: why develop a new
> ontology? What is lacking in SWRC/BuRST or PRISM that this new
> ontology would add? I'm asking this, because I'm concerned by (even)
> more fragmentation in this space.

A fair point, but we need this because the existing options are not
adequate. SWRC/BuRST serves a narrow range of academic users (last I
looked it wouldn't work for the humanities or law), and PRISM is just
a series of properties, mostly already covered by DC, maintained by a
fairly closed industry group not very interested in RDF. The only
properties they have that are useful and unique, IIRC, are volume and
number, and the latter is actually misnamed (it should be "issue" or
"issueNumber") anyway.

Also, nobody yet seems to have even tried to solve how to incorporate
RDF into authoring workflows. In other words: I have RDF data, then
what? How do I use it to format my citations?

In the absence of that support, existing RDF data is not very helpful
for users. This is why the class model is important.

Finally, I've not been happy with how anybody has solved contributor
modeling for bibliographic data in RDF.

I'd prefer reusing as much as possible from other ontologies (DC,
vCard, SKOS, etc.), but certainly at minimum we need a comprehensive
class model.

I don't think fragmentation is the problem here. The problem is the
lack of compelling solutions (applications, services and so forth),
and tools like Zotero will change that. Bottom line: we need something
that can support Zotero and OpenOffice bibliographic user needs. The
existing options do not.

Bruce

Peter Mika

Apr 16, 2007, 9:44:29 AM
to Bibliographic Ontology Specification Group, pm...@yahoo-inc.com

Hi Bruce,

I'm wondering if you could clarify.


> A fair point, but the reason why we need this is because the existing
> stuff is not adequate. The first corresponds to a narrow range of
> academic users (last I looked it wouldn't work for the humanities or
> law), and the second is just a series of properties, mostly already
> covered by DC and maintained by a fairly closed industry group not
> very interested in RDF. The only properties they have that are useful
> and unique IIRC are volume and number, and the latter is actually
> wrong (it should be "issue" or "issueNumber") anyway.
>

So the SWRC is basically an RDF representation of BibTeX, which is
completely domain independent. (Although it is mostly used in
combination with LaTeX, i.e. in the science domain, there are no
features that are specific to the sciences.) AFAIK, same goes for
PRISM. Both have an RDF representation that you can extend if you see
fit, i.e. the extent to which these communities are interested in RDF
shouldn't matter. As an example, we are using SWRC in combination with
FOAF, because the modelling of persons in SWRC is not very detailed.
This causes no problems and thanks to RDF we could do it
independently.


> Also, nobody has yet seemed to even try to solve how to incorporate
> RDF into authoring workflows. I have RDF data, in other words, then
> what? How do I use it to format my citations?
>

The BuRST format is an RSS representation of SWRC, which also imposes
some structure. This allows you to apply XSLT stylesheets to format
bibliographic data. This is what we are doing.

> In the absence of that support, existing RDF data is not very helpful
> for users. This is why the class model is important.
>

What would be a class model that would achieve this?


> Finally, I've not been happy with how anybody has solved contributor
> modeling for bibliographic data in RDF.
>

We were also not, hence the use of SWRC in combination with FOAF.


> I'd prefer reusing as much as possible from other ontologies (DC,
> vCard, SKOS, etc.), but certainly at minimum we need a comprehensive
> class model.
>
> I don't think fragmentation is the problem in here. The problem is the
> lack of compelling solutions (applications, services and so forth),
> and stuff like Zotero will change that. Bottomline: we need something
> that can support Zotero and OpenOffice bibliographic user needs. The
> existing options do not.
>

If Zotero turns out to be very convincing, people will use its
data in whatever format it dictates. But still, you would need to
expand on what the missing features are.

Best,
Peter

Yves Raimond

Apr 16, 2007, 10:26:40 AM
to bibliographic-ontolog...@googlegroups.com
Hello!

I think this is actually a *really* good point. And it is true that
the combination of FOAF+SWRC is already really effective (although I
don't really know how well it works in domains other than the
scientific one) - I have been using it for a while for my group's and
my personal web page.

So I think that this new effort should really try not to build an
ontology from scratch, especially in a domain which has already had so
many data modeling efforts (there are at least 4 bibtex-in-rdf
vocabularies I can think of off the top of my head).

So I think one of the first things we should do is a deep analysis of
what exists, what it covers, and what is missing, and then build on
top of that the few concepts/relationships that we need.

Cheers!
Yves


Frederick Giasson

Apr 16, 2007, 10:29:10 AM
to bibliographic-ontolog...@googlegroups.com, pm...@yahoo-inc.com
Hi Peter,

>> A fair point, but the reason why we need this is because the existing
>> stuff is not adequate. The first corresponds to a narrow range of
>> academic users (last I looked it wouldn't work for the humanities or
>> law), and the second is just a series of properties, mostly already
>> covered by DC and maintained by a fairly closed industry group not
>> very interested in RDF. The only properties they have that are useful
>> and unique IIRC are volume and number, and the latter is actually
>> wrong (it should be "issue" or "issueNumber") anyway.
>>
>>
> So the SWRC is basically an RDF representation of BibTeX, which is
> completely domain independent. (Although it is mostly used in
> combination with Latex, i.e. in the science domain, there are no
> features that are specific to the sciences.) AFAIK, same goes for
> PRISM. Both have an RDF representation that you can extend if you see
> fit, i.e. the extent to which these communities are interested in RDF
> shouldn't matter. As an example, we are using SWRC in combination with
> FOAF, because the modelling of persons in SWRC is not very detailed.
> This causes no problems and thanks to RDF we could do it
> independently.
>

Bruce will answer all these questions later (I think he is off for the
week at a conference).

However, there is a core problem I have run into with Zitgist, one that
other people have too and that has been widely discussed on the
Linked-Open-Data mailing list. The problem is that yes, thanks to RDF,
we can do virtually anything, plug everything together, etc. We can
take mainstream vocabularies like DC, or obscure academic ontologies, etc.

This is certainly one of the most powerful features of RDF, no doubt. It
works fine in a closed world, no problem. However, what happens when we
push this data into the wild? My experience with Zitgist and
PingtheSemanticWeb.com taught me that it becomes useless. Why? Because
there is no way for me to be aware of all these vocabularies, how they
work and how they can be queried. This is probably one of the biggest
problems for the semantic web right now, and this is why projects like
Linked-Open-Data and community-driven ontology efforts like SIOC, the
Music Ontology, the Bibliographic Ontology, and many others are so
important. These communities make sure to create "best practice
guidelines" for developers to use. Since these ontologies are developed
by many people from many fields, there is a sort of consensus that
encourages their use. This is what I realized while developing the
Music Ontology, and when I participated in the development of SIOC, in
the Linked-Open-Data community, etc.

There are many ontologies, parts of ontologies, etc. out there that
currently deal with the problem of describing citations and
bibliographic references. However, they didn't answer the needs of the
OpenOffice project, nor Zotero's, and certainly not Zitgist's.

The mere fact that 17 people subscribed to this mailing list in less
than 1 day tells me that there are questions to ask, and this is what we
are doing here right now. This is only the beginning of the
brainstorming, and I have the intuition that it will be fruitful and
that it could lead to dramatic changes.

The idea here is to develop yet-another-bibliographic-ontology. But the
goal isn't to reinvent the wheel yet again. The goal is to fill in the
blanks: to develop a sort of ontology framework designed in such a way
that we can easily plug in future modules, and make it interact easily
with already existing ontologies. Yes, in RDF you can "theoretically"
plug everything into everything, but in reality this is not that simple
or effective. This new ontology initiative should also act as a "best
practices" guide for describing citations and bibliographic references
on the Semantic Web, for developers who have little knowledge of the
semantic web.

This is a question of the adoption of the semantic web by Web
developers: people who just don't have the time to check all these
little "fragmented" ontologies written in OWL, RDFS or whatever, without
very explicit comments, without documentation, examples, etc. This is
why microformats are doing so well: because there is clear
documentation, good examples, etc. Like microformats or not, they got
the attention of developers because there is support, docs, examples
and a strong community.


Salutations,


Fred

Frederick Giasson

Apr 16, 2007, 10:50:00 AM
to bibliographic-ontolog...@googlegroups.com
Hi Yves!

Welcome aboard!


>
> I think this is actually a *really* good point. And it is true that
> the combination of FOAF+SWRC is already really efficient (although I
> don't really know to what extent in other domains than the scientific
> one) - I have been using it for a while for my group's and my personal
> web page.
>
> So I think that this new effort should really try to not build an
> ontology from scratch, especially in a domain which already had so many
> data modeling efforts (there are at least 4 bibtex-in-rdf vocabulary I
> can think of on the top of my head).
>
> So I think one of the first thing we may do is a deep analysis of what
> exists, what does it cover, what is missing, and construct on top of
> that the few concepts/relationships that we need.
>

Yes, from the beginning we have said that we should re-use other
ontologies as much as possible, as we did for the Music Ontology.


However, consider Chris Bizer's comment on my blog:


==============

yes, it would really be nice to have a community-backed ontology for
describing publications which is a bit more Semantic-Webby than Dublin
Core. So developing a best practice for mixing DC, FOAF, SIOC and the
event ontology would be really useful.

Once you guys have developed this best practice, we are happy to change
the D2R mapping of our DBLP server
(http://www4.wiwiss.fu-berlin.de/dblp/) and the RDF book
mashup (http://sites.wiwiss.fu-berlin.de/suhl/bizer/bookmashup/index.html),
so that they export RDF according to your best practice.

==============


It shows us that there is a real need for this sort of community-driven
development. The problem with the current landscape of bibliographic
ontologies is the following:


I go to the BuRST home page[1] and click on one of its examples[2]. I
check the code and see some SWRC terms... now I try to dereference the
URI of this ontology[3] to get the schema explaining what these
properties are. Then I look for the properties/classes: they are not
there. I think this simple example illustrates most of the problems out
there. There is no consistency, no good documentation (I can't find
good SWRC documentation at the moment), no examples, etc. This is why
this initiative started, and this is the sort of thing we will try to
fix with it.
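
The check described above can be approximated offline. The following is
a minimal Python sketch, not the actual SWRC schema: the schema snippet
and the http://example.org/schema# namespace are invented for
illustration, and a real check would first fetch the document served at
the ontology URI.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical schema document, standing in for whatever the ontology URI returns.
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/schema#Article"/>
  <rdf:Property rdf:about="http://example.org/schema#title"/>
</rdf:RDF>"""


def defined_terms(schema_xml):
    """Collect the URI of every top-level term the schema describes with rdf:about."""
    root = ET.fromstring(schema_xml)
    about = f"{{{RDF}}}about"
    return {el.get(about) for el in root if el.get(about)}


def undefined_terms(used, schema_xml):
    """Return the terms used in instance data that the schema never defines."""
    return sorted(set(used) - defined_terms(schema_xml))


# Terms found in some (hypothetical) example document.
used_in_example = [
    "http://example.org/schema#Article",
    "http://example.org/schema#title",
    "http://example.org/schema#journal",  # used in the data, missing from the schema
]
missing = undefined_terms(used_in_example, SCHEMA)
print(missing)
```

Any URI left in `missing` is a term a developer would look up in the
schema and fail to find.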

Why should Web developers care about these ontologies if they can't find
out how they work? They won't; they will simply spend their time
elsewhere because in their business world, time is money, period.


And this initiative is not about telling people who is right and who is
not. This is a "community" project where everybody has their say.
There are real problems with the current landscape, and this project
will try to fix them. We have to develop a framework that we can extend
with modules and plug existing ontologies into.

Salutations,


Fred

[1] http://www.cs.vu.nl/~pmika/research/burst/BuRST.html
[2] http://www.cs.vu.nl/~pmika/research/burst/BuRST-example.rdf
[3] http://swrc.ontoware.org/ontology/ontoware#

Frederick Giasson

Apr 16, 2007, 2:38:26 PM
to bibliographic-ontolog...@googlegroups.com
Hi again all,


I just published a post on my blog with my thoughts on this:


http://fgiasson.com/blog/index.php/2007/04/16/why-another-bibliographic-ontology/


BTW Peter, thanks for asking this question; it was the best
starting question for the brainstorming we could have found :)


Let the discussion continue!


Take care,


Fred

Bruce D'Arcus

Apr 18, 2007, 7:13:16 PM
to Peter Mika, Bibliographic Ontology Specification Group

On Apr 16, 2007, at 9:38 AM, Peter Mika wrote:

> So the SWRC is basically an RDF representation of BibTeX, which is
> completely domain independent. (Although it is mostly used in
> combination with Latex, i.e. in the science domain, there are no
> features that are specific to the sciences.)

I'm not going to get into yet another discussion of all that is wrong
with BibTeX (I've been having this discussion with people for the past
few years), but will just say that it was designed by a scientist for
scientists. That it's been hacked to work for the humanities (say with
Jurabib) doesn't obscure that it still needed to be hacked. See below
for more ...

> AFAIK, same goes for PRISM. Both have an RDF representation that you
> can extend if you see fit, i.e. the extent to which these communities
> are interested in RDF shouldn't matter.

It does matter, because it impacts all kinds of details of design and
deployment.

For example, I'd rather use (or simply refer to) dcterms, which has a
more robust community process, and better RDF support.

> As an example, we are using SWRC in combination with FOAF, because the
> modelling of persons in SWRC is not very detailed. This causes no
> problems and thanks to RDF we could do it independently.

I like aspects of FOAF, but I think that its name model needs a lot
of work. I'd prefer to use the new vCard work happening in SWIG. That
would be more consistent with the microformats efforts too (hCite,
hCard), which I'd call "nice to have."

>> Also, nobody has yet seemed to even try to solve how to incorporate
>> RDF into authoring workflows. I have RDF data, in other words, then
>> what? How do I use it to format my citations?
>>

> The BuRST format is an RSS representation of SWRC, which also imposes
> some structure. This allows you to apply XSLT stylesheets toward
> formatting bibliographic data. This is what we are doing.

I think if you look into what we are doing with CSL and Zotero, our
goals are pretty ambitious. We're talking about fully automatic and
real-time formatting of citations and bibliographies in document
editors (Word, OpenOffice, etc.), and distributed user-defined XML
citation styles that can be used by different language libraries (XSLT,
Javascript, Python, etc.) and services. Writing formatting code in raw
XSLT is not very robust or scalable.

In the mid/long run, one should be able to have a document full of
citations (URIs), provide the editing application with a URI for the
citation style, and have formatting happen automatically.
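
A toy version of that workflow, assuming citations and styles are both
looked up by URI. This is a minimal Python sketch: the two registries
and all of their URIs are invented for illustration, and the "style" is
a plain format string rather than a real CSL style.

```python
# Hypothetical in-memory registries; in practice both would be fetched by URI.
REFERENCES = {
    "http://ex.net/ref/1": {"author": "Doe, John", "year": 2005, "title": "A Title"},
    "http://ex.net/ref/2": {"author": "Smith, Jane", "year": 1999, "title": "Another Title"},
}
STYLES = {
    "http://ex.net/style/author-date": "({author} {year})",
    "http://ex.net/style/full": "{author}. {year}. {title}.",
}


def format_citations(citation_uris, style_uri):
    """Render each cited reference with the template named by style_uri."""
    template = STYLES[style_uri]
    return [template.format(**REFERENCES[uri]) for uri in citation_uris]


# The "document" holds only citation URIs; the style is chosen separately.
document = ["http://ex.net/ref/2", "http://ex.net/ref/1"]
in_text = format_citations(document, "http://ex.net/style/author-date")
full = format_citations(document, "http://ex.net/style/full")
print(in_text)
print(full)
```

Switching the style URI changes the output without touching the
document or the reference data, which is the separation being argued
for.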

>> In the absence of that support, existing RDF data is not very helpful
>> for users. This is why the class model is important.
>>
>

> What would be a class model that would achieve this?

The one I've written, IMHO. It really has to be quite flexible to
account for the range of user communities: a rich list of classes,
which can be plugged together through relational properties to create
more complex descriptions. Example:

A Book --> versionOf --> a Book (originally published in Kanji in 1876)

... or:

an Article --> presentedAt --> a Conference
           --> publishedIn --> a ConferenceProceedings

... and so forth.
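
To make the "classes plugged together through relational properties"
idea concrete, here is a minimal Python sketch using plain tuples as
triples. The ex: identifiers are hypothetical; only the class and
property names come from the examples above.

```python
# Triples as (subject, property, object); all ex: identifiers are hypothetical.
TRIPLES = [
    ("ex:article1", "rdf:type", "Article"),
    ("ex:article1", "presentedAt", "ex:conf1"),
    ("ex:article1", "publishedIn", "ex:proc1"),
    ("ex:conf1", "rdf:type", "Conference"),
    ("ex:proc1", "rdf:type", "ConferenceProceedings"),
    ("ex:book2", "rdf:type", "Book"),
    ("ex:book2", "versionOf", "ex:book1"),
    ("ex:book1", "rdf:type", "Book"),
]


def type_of(resource):
    """Return the class of a resource, or None if it has no rdf:type."""
    for s, p, o in TRIPLES:
        if s == resource and p == "rdf:type":
            return o
    return None


def describe(resource):
    """List each relation of a resource with the class of the related thing."""
    return [
        (p, type_of(o))
        for s, p, o in TRIPLES
        if s == resource and p != "rdf:type"
    ]


article = describe("ex:article1")
book = describe("ex:book2")
print(article)
print(book)
```

Each resource keeps a simple type, and richer descriptions come from
linking typed resources rather than from adding more specialized types.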

>> Finally, I've not been happy with how anybody has solved contributor
>> modeling for bibliographic data in RDF.
>>

> We were also not, hence the use of SWRC in combination with FOAF.

My issue is not so much the description of agents as the description
of the contributor relation, i.e. the property (or properties)
attached to the reference resource.

I consider all of these problematic:

<x:author>John Doe and Jane Smith</x:author>

The problem with this one is obvious: two names packed into one string.

Using rdf:Seq is also problematic:

<x:author>
  <rdf:Seq>
    <rdf:li rdf:resource="http://ex.net/1"/>
    <rdf:li rdf:resource="http://ex.net/2"/>
  </rdf:Seq>
</x:author>

OK, so Ian Davis recommended instead explicitly modeling order:

<x:author rdf:parseType="Resource">
  <x:contributor rdf:resource="http://ex.net/1"/>
  <x:sequence>1</x:sequence>
</x:author>
<x:author rdf:parseType="Resource">
  <x:contributor rdf:resource="http://ex.net/2"/>
  <x:sequence>2</x:sequence>
</x:author>

That's still an option (and is similar to SWRC IIRC), but it still
means different ways to represent single-authored vs. multiple-authored
items.
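
For what consuming the explicit-order pattern might look like, here is a
minimal Python sketch using only the standard library. The namespace
bound to the x: prefix and the enclosing rdf:Description are
assumptions added to make the fragment a complete document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
X = "http://example.org/x#"  # hypothetical namespace for the x: prefix

# The authors are deliberately listed out of order in the document.
DATA = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:x="{X}">
  <rdf:Description rdf:about="http://ex.net/ref">
    <x:author rdf:parseType="Resource">
      <x:contributor rdf:resource="http://ex.net/2"/>
      <x:sequence>2</x:sequence>
    </x:author>
    <x:author rdf:parseType="Resource">
      <x:contributor rdf:resource="http://ex.net/1"/>
      <x:sequence>1</x:sequence>
    </x:author>
  </rdf:Description>
</rdf:RDF>"""


def ordered_authors(rdf_xml):
    """Return contributor URIs sorted by their explicit x:sequence value."""
    root = ET.fromstring(rdf_xml)
    entries = []
    for author in root.iter(f"{{{X}}}author"):
        contributor = author.find(f"{{{X}}}contributor").get(f"{{{RDF}}}resource")
        sequence = int(author.find(f"{{{X}}}sequence").text)
        entries.append((sequence, contributor))
    return [uri for _, uri in sorted(entries)]


authors = ordered_authors(DATA)
print(authors)
```

The order survives regardless of how the statements are serialized, but
the consumer has to know about the intermediate node and its
x:sequence property.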

So what about just a single author property where multiple authors get
modeled as what they are: a group?

<x:author rdf:resource="http://ex.net/1"/>

<foaf:Group rdf:about="http://ex.net/1">
  <vcard:sort-string>Doe, John; Smith, Jane</vcard:sort-string>
  ....
</foaf:Group>
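
A rough Python sketch of how a consumer could treat the single author
property uniformly. The agent store, its URIs and the ordered "members"
list are assumptions for illustration; the fragment above only shows
the group's sort string.

```python
# Hypothetical store: an author URI may name a single person or a group.
AGENTS = {
    "http://ex.net/p/1": {"type": "Person", "sort": "Doe, John"},
    "http://ex.net/p/2": {"type": "Person", "sort": "Smith, Jane"},
    "http://ex.net/g/1": {
        "type": "Group",
        "sort": "Doe, John; Smith, Jane",
        # Assumed: the group keeps its members in citation order.
        "members": ["http://ex.net/p/1", "http://ex.net/p/2"],
    },
}


def author_names(author_uri):
    """Return the author names, whether the URI is a person or a group."""
    agent = AGENTS[author_uri]
    if agent["type"] == "Group":
        return [AGENTS[member]["sort"] for member in agent["members"]]
    return [agent["sort"]]


single = author_names("http://ex.net/p/1")
multiple = author_names("http://ex.net/g/1")
print(single)
print(multiple)
```

The reference itself always carries exactly one author value; whether
that value is one person or several is resolved on the agent side.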

>> I'd prefer reusing as much as possible from other ontologies (DC,
>> vCard, SKOS, etc.), but certainly at minimum we need a comprehensive
>> class model.
>>
>> I don't think fragmentation is the problem in here. The problem is the
>> lack of compelling solutions (applications, services and so forth),
>> and stuff like Zotero will change that. Bottomline: we need something
>> that can support Zotero and OpenOffice bibliographic user needs. The
>> existing options do not.
>>

> If Zotero will turn out to be very convincing, people will use its
> data in whatever format it dictates. But still, you would need to
> expand on what the missing features are.

At the most basic, it needs to be able to represent data common in the
humanities and law, and even the social sciences. This includes, but is
by no means limited to:

- a much broader range of types of resources (interviews, hearing
  transcripts, archival documents, etc.)
- a wider range of contributor relations (translator, director, etc.)
- relations to original versions, potentially in other languages
  and/or scripts (with support for transliteration and such)
- notes and annotations (not BibTeX annote as a property of a record,
  but full resources)

Zotero also needs support for collections.

As part of "best practices" documentation, BTW, we also need
conventions for subject URIs. It's easy for web resources, but more
complicated for other stuff (books, journal articles, archival
manuscripts in non-web-accessible collections, etc.).

Bruce
