I've read Frederick Giasson's call for this group on PlanetRDF.com. But before getting started on the actual topic of developing an ontology for bibliographies, my question is: why develop a new ontology? What is lacking in SWRC/BuRST or PRISM that this new ontology would add? I'm asking this, because I'm concerned by (even) more fragmentation in this space.
Best, Peter Mika openacademia.org
p.s. please CC: response to pmika at yahoo-inc.com
On Apr 15, 3:25 pm, "Peter Mika" <peter.m...@gmail.com> wrote:
> I've read Frederick Giasson's call for this group on PlanetRDF.com. But before getting started on the actual topic of developing an ontology for bibliographies, my question is: why develop a new ontology? What is lacking in SWRC/BuRST or PRISM that this new ontology would add? I'm asking this, because I'm concerned by (even) more fragmentation in this space.
A fair point, but the reason why we need this is because the existing stuff is not adequate. The first corresponds to a narrow range of academic users (last I looked it wouldn't work for the humanities or law), and the second is just a series of properties, mostly already covered by DC and maintained by a fairly closed industry group not very interested in RDF. The only properties they have that are useful and unique IIRC are volume and number, and the latter is actually wrong (it should be "issue" or "issueNumber") anyway.
Also, nobody has yet seemed to even try to solve how to incorporate RDF into authoring workflows. I have RDF data, in other words, then what? How do I use it to format my citations?
In the absence of that support, existing RDF data is not very helpful for users. This is why the class model is important.
Finally, I've not been happy with how anybody has solved contributor modeling for bibliographic data in RDF.
I'd prefer reusing as much as possible from other ontologies (DC, vCard, SKOS, etc.), but certainly at minimum we need a comprehensive class model.
I don't think fragmentation is the problem here. The problem is the lack of compelling solutions (applications, services and so forth), and stuff like Zotero will change that. Bottom line: we need something that can support Zotero and OpenOffice bibliographic user needs. The existing options do not.
> A fair point, but the reason why we need this is because the existing stuff is not adequate. The first corresponds to a narrow range of academic users (last I looked it wouldn't work for the humanities or law), and the second is just a series of properties, mostly already covered by DC and maintained by a fairly closed industry group not very interested in RDF. The only properties they have that are useful and unique IIRC are volume and number, and the latter is actually wrong (it should be "issue" or "issueNumber") anyway.
So the SWRC is basically an RDF representation of BibTeX, which is completely domain independent. (Although it is mostly used in combination with Latex, i.e. in the science domain, there are no features that are specific to the sciences.) AFAIK, same goes for PRISM. Both have an RDF representation that you can extend if you see fit, i.e. the extent to which these communities are interested in RDF shouldn't matter. As an example, we are using SWRC in combination with FOAF, because the modelling of persons in SWRC is not very detailed. This causes no problems and thanks to RDF we could do it independently.
> Also, nobody has yet seemed to even try to solve how to incorporate RDF into authoring workflows. I have RDF data, in other words, then what? How do I use it to format my citations?
The BuRST format is an RSS representation of SWRC, which also imposes some structure. This allows you to apply XSLT stylesheets toward formatting bibliographic data. This is what we are doing.
> In the absence of that support, existing RDF data is not very helpful for users. This is why the class model is important.
What would be a class model that would achieve this?
> Finally, I've not been happy with how anybody has solved contributor modeling for bibliographic data in RDF.
We were also not, hence the use of SWRC in combination with FOAF.
> I'd prefer reusing as much as possible from other ontologies (DC, vCard, SKOS, etc.), but certainly at minimum we need a comprehensive class model.
> I don't think fragmentation is the problem here. The problem is the lack of compelling solutions (applications, services and so forth), and stuff like Zotero will change that. Bottom line: we need something that can support Zotero and OpenOffice bibliographic user needs. The existing options do not.
If Zotero turns out to be very convincing, people will use its data in whatever format it dictates. But still, you would need to expand on what the missing features are.
I think this is actually a *really* good point. And it is true that the combination of FOAF+SWRC is already really efficient (although I don't really know to what extent in other domains than the scientific one) - I have been using it for a while for my group's and my personal web page.
So I think that this new effort should really try not to build an ontology from scratch, especially in a domain which has already had so many data modeling efforts (there are at least 4 bibtex-in-rdf vocabularies I can think of off the top of my head).
So I think one of the first things we could do is a deep analysis of what exists, what it covers, and what is missing, and then build on top of that the few concepts/relationships that we need.
>> A fair point, but the reason why we need this is because the existing stuff is not adequate. The first corresponds to a narrow range of academic users (last I looked it wouldn't work for the humanities or law), and the second is just a series of properties, mostly already covered by DC and maintained by a fairly closed industry group not very interested in RDF. The only properties they have that are useful and unique IIRC are volume and number, and the latter is actually wrong (it should be "issue" or "issueNumber") anyway.
> So the SWRC is basically an RDF representation of BibTeX, which is completely domain independent. (Although it is mostly used in combination with Latex, i.e. in the science domain, there are no features that are specific to the sciences.) AFAIK, same goes for PRISM. Both have an RDF representation that you can extend if you see fit, i.e. the extent to which these communities are interested in RDF shouldn't matter. As an example, we are using SWRC in combination with FOAF, because the modelling of persons in SWRC is not very detailed. This causes no problems and thanks to RDF we could do it independently.
Bruce will answer all these questions later (I think he is away this week at a conference).
However, there is a core problem that I have run into with Zitgist, that other people have too, and that has been widely discussed on the Linked-Open-Data mailing list. Yes, thanks to RDF, we can do virtually anything: plug everything together, use mainstream vocabularies like DC or obscure academic ontologies, and so on.
This is certainly one of the most powerful features of RDF, no doubt. It works fine in a closed world. However, what happens when we push this data into the wild? My experience with Zitgist and Pingthesemanticweb.com taught me that it becomes useless. Why? Because there is no way for me to be aware of all these vocabularies, how they work, and how they can be queried. This is probably one of the biggest problems for the Semantic Web right now, and it is why projects like Linked-Open-Data and community-driven ontology efforts like SIOC, the Music Ontology, the Bibliographic Ontology, and many others are so important. These communities make sure to create "best practice guidelines" for developers to use. Since these ontologies are developed by many people from many fields, there is a sort of consensus that encourages their use. This is what I realized while developing the Music Ontology, and while participating in the development of SIOC and in the Linked-Open-Data community.
There are many ontologies, parts of ontologies, etc. that currently deal with describing citations and bibliographic references. However, they have not answered the needs of the OpenOffice project, nor those of Zotero, and certainly not Zitgist's.
The mere fact that 17 people subscribed to this mailing list in less than a day tells me that there are questions to ask, and this is what we are doing here right now. This is only the beginning of the brainstorming, and I have the intuition that it will be fruitful and could lead to dramatic changes.
The idea here is to develop yet-another-bibliographic-ontology. But the goal isn't to reinvent the wheel once more. The goal is to fill in the blanks: to develop a sort of ontology framework designed so that we can easily plug in future modules and make it interact easily with existing ontologies. Yes, in RDF you can theoretically plug everything into everything, but in reality it is not that simple or effective. This new ontology initiative should also act as a "best practices" guide for describing citations and bibliographic references on the Semantic Web, aimed at developers who have little knowledge of the Semantic Web.
This is a question of the adoption of the Semantic Web by Web developers: people who just don't have the time to check all these little "fragmented" ontologies written in OWL, RDFS or whatever, without explicit comments, documentation, or examples. This is why microformats are doing so well: there is clear documentation and there are good examples. Like microformats or not, they got the attention of developers because there is support, docs, examples and a strong community.
> I think this is actually a *really* good point. And it is true that the combination of FOAF+SWRC is already really efficient (although I don't really know to what extent in other domains than the scientific one) - I have been using it for a while for my group's and my personal web page.
> So I think that this new effort should really try not to build an ontology from scratch, especially in a domain which has already had so many data modeling efforts (there are at least 4 bibtex-in-rdf vocabularies I can think of off the top of my head).
> So I think one of the first things we could do is a deep analysis of what exists, what it covers, and what is missing, and then build on top of that the few concepts/relationships that we need.
Yes, from the beginning we have said that we should try to reuse other ontologies as much as possible, as we did for the Music Ontology.
However, consider Chris Bizer's comment on my blog:
==============
yes, it would really be nice to have a community-backed ontology for describing publications which is a bit more Semantic-Webby than Dublin Core. So developing a best practice for mixing DC, FOAF, SIOC and the event ontology would be really useful.
It shows us that there is a real need for this sort of community-driven development. The problem with the current landscape of bibliographic ontologies is the following:
I go to the BuRST home page[1] and click on one of its examples[2]. I check the code and see some SWRC terms. Now I try to dereference the URI of this ontology[3] to get the schema explaining what these properties are. Then I look for the properties/classes: they are not there. I think this simple example illustrates most of the problems out there. There is no consistency, no good documentation (I can't find good SWRC documentation at the moment), no examples, etc. This is why this initiative started, and this is the sort of thing we will try to fix with it.
Why should Web developers care about these ontologies if they can't find out how they work? They won't; they will simply spend their time elsewhere, because in their business world time is money, full stop.
And this initiative is not about telling people who is right and who is not. This is a "community" project where everybody has their say. There are real problems with the current landscape, and this project will try to fix them. We have to develop a framework that we can extend with modules and plug existing ontologies into.
> So the SWRC is basically an RDF representation of BibTeX, which is completely domain independent. (Although it is mostly used in combination with Latex, i.e. in the science domain, there are no features that are specific to the sciences.)
I'm not going to get into yet another discussion of all that is wrong with BibTeX (I've been having this discussion with people for the past few years), but will just say that it was designed by a scientist for scientists. That it's been hacked to work for the humanities (say with Jurabib) doesn't obscure that it still needed to be hacked. See below for more ...
> AFAIK, same goes for PRISM. Both have an RDF representation that you can extend if you see fit, i.e. the extent to which these communities are interested in RDF shouldn't matter.
It does matter, because it impacts all kinds of details of design and deployment.
For example, I'd rather use (or simply refer to) dcterms, which has a more robust community process, and better RDF support.
> As an example, we are using SWRC in combination with FOAF, because the modelling of persons in SWRC is not very detailed. This causes no problems and thanks to RDF we could do it independently.
I like aspects of FOAF, but I think that its name model needs a lot of work. I'd prefer to use the new vCard work happening in SWIG. That would be more consistent with the microformats efforts too (hCite, hCard), which I'd call "nice to have."
>> Also, nobody has yet seemed to even try to solve how to incorporate RDF into authoring workflows. I have RDF data, in other words, then what? How do I use it to format my citations?
> The BuRST format is an RSS representation of SWRC, which also imposes some structure. This allows you to apply XSLT stylesheets toward formatting bibliographic data. This is what we are doing.
I think if you look into what we are doing with CSL and Zotero, our goals are pretty ambitious. We're talking about fully automatic and real-time formatting of citations and bibliographies in document editors (Word, OpenOffice, etc.), and distributed user-defined XML citation styles that can be used by different language libraries (XSLT, JavaScript, Python, etc.) and services. Writing formatting code in raw XSLT is not very robust or scalable.
In the medium to long run, one should be able to have a document full of citations (URIs), give the editing application a URI for the citation style, and have formatting happen automatically.
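To make that workflow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the URIs, the record fields, and the tiny template-based style format are invented and are not CSL. A real implementation would dereference the URIs instead of using in-memory dictionaries.

```python
# Hypothetical stand-ins for data that would really be fetched by URI.
RECORDS = {
    "http://example.org/ref/smith2001": {
        "family": "Smith", "given": "John", "year": "2001", "title": "First Title",
    },
    "http://example.org/ref/doe2005": {
        "family": "Doe", "given": "Jane", "year": "2005", "title": "Second Title",
    },
}

# Invented style format (NOT CSL): two templates plus a sort key.
STYLES = {
    "http://example.org/styles/author-date": {
        "citation": "({family} {year})",
        "entry": "{family}, {given}. {year}. {title}.",
        "sort_key": ("family", "year"),
    },
}


def format_document(cited_uris, style_uri):
    """Return (in-text citations, sorted bibliography) for a list of cited URIs."""
    style = STYLES[style_uri]
    # One in-text citation per occurrence, in document order.
    citations = [style["citation"].format(**RECORDS[uri]) for uri in cited_uris]
    # One bibliography entry per distinct reference, sorted as the style asks.
    unique = sorted(
        set(cited_uris),
        key=lambda uri: tuple(RECORDS[uri][field] for field in style["sort_key"]),
    )
    bibliography = [style["entry"].format(**RECORDS[uri]) for uri in unique]
    return citations, bibliography
```

The document only stores reference URIs and one style URI; swapping the style URI changes every citation and the bibliography without touching the data.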
>> In the absence of that support, existing RDF data is not very helpful for users. This is why the class model is important.
> What would be a class model that would achieve this?
The one I've written, IMHO. It really has to be quite flexible to account for the range of user communities: a rich list of classes, which can be plugged together through relational properties to create more complex descriptions. For example:
A Book --> versionOf --> a Book (originally published in Kanji in 1876)
... or:
an Article --> presentedAt --> a Conference
           publishedIn --> a ConferenceProceedings
... and so forth.
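As a rough illustration of how such typed resources plug together, the two examples can be written as plain subject/predicate/object triples. This is only a sketch: the "ex:" identifiers are made up, the class and property names are simply the ones used in the examples, and it assumes that both presentedAt and publishedIn hang off the Article.

```python
from collections import defaultdict

# Hypothetical identifiers; class and property names are taken from the
# examples above and are not claimed to exist in any published ontology.
TRIPLES = [
    ("ex:translation", "rdf:type", "Book"),
    ("ex:original", "rdf:type", "Book"),
    ("ex:translation", "versionOf", "ex:original"),
    ("ex:paper", "rdf:type", "Article"),
    ("ex:conf", "rdf:type", "Conference"),
    ("ex:proc", "rdf:type", "ConferenceProceedings"),
    ("ex:paper", "presentedAt", "ex:conf"),
    ("ex:paper", "publishedIn", "ex:proc"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under each subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def describe(subject, index):
    """Render a resource's relations as 'Type --> property --> Type' lines."""
    def type_of(node):
        for predicate, obj in index.get(node, []):
            if predicate == "rdf:type":
                return obj
        return "Resource"  # fallback for untyped nodes

    return [
        f"{type_of(subject)} --> {predicate} --> {type_of(obj)}"
        for predicate, obj in index.get(subject, [])
        if predicate != "rdf:type"
    ]


INDEX = index_by_subject(TRIPLES)
```

Calling describe("ex:paper", INDEX) walks from the article to the typed resources it is related to, which is the kind of traversal a formatter needs in order to decide how to render a reference.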
>> Finally, I've not been happy with how anybody has solved contributor modeling for bibliographic data in RDF.
> We were also not, hence the use of SWRC in combination with FOAF.
My issue is not so much with the description of agents as with the description of the contributor relation; i.e. the property (or properties) attached to the reference resource.
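To show what "the property attached to the reference resource" means in practice, here is one possible shape, sketched in Python with plain triples. It is an assumption for illustration only, not a design proposed in this thread: each contributor role is its own property on the reference, and the agent description lives elsewhere (FOAF, vCard, or anything else). All identifiers are hypothetical.

```python
# Hypothetical role properties; an ontology would have to enumerate these.
CONTRIBUTOR_PROPERTIES = {"author", "editor", "translator", "director"}

# Hypothetical data: the agents (ex:person1 ...) are described separately.
TRIPLES = [
    ("ex:book", "author", "ex:person1"),
    ("ex:book", "translator", "ex:person2"),
    ("ex:book", "title", "Some Title"),
    ("ex:film", "director", "ex:person3"),
]


def contributors(resource, triples):
    """Return {role: [agents]} for one reference resource."""
    roles = {}
    for subject, predicate, obj in triples:
        if subject == resource and predicate in CONTRIBUTOR_PROPERTIES:
            roles.setdefault(predicate, []).append(obj)
    return roles
```

The point of the sketch is that the role is carried by the relation, not by the agent: the same person can be an author of one resource and a translator of another without any change to how the person is described.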
>> I'd prefer reusing as much as possible from other ontologies (DC, vCard, SKOS, etc.), but certainly at minimum we need a comprehensive class model.
>> I don't think fragmentation is the problem here. The problem is the lack of compelling solutions (applications, services and so forth), and stuff like Zotero will change that. Bottom line: we need something that can support Zotero and OpenOffice bibliographic user needs. The existing options do not.
> If Zotero turns out to be very convincing, people will use its data in whatever format it dictates. But still, you would need to expand on what the missing features are.
At the most basic, it needs to be able to represent data common in the humanities and law, and even the social sciences. This includes, but is by no means limited to:
- much broader range of types of resources (interviews, hearing transcripts, archival documents, etc.)
- wider range of contributor relations (translator, director, etc.)
- relations to original versions, potentially in other languages and/or scripts (with support for transliteration and such)
- notes and annotations (not BibTeX annote as a property of a record, but full resources)
Zotero also needs support for collections.
As part of "best practices" documentation, BTW, we also need conventions for subject URIs. It's easy for web resources, but more complicated for other stuff (books, journal articles, archival manuscripts in non-web-accessible collections, etc.).
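One possible convention, sketched in Python purely as a starting point for discussion: use the resource's own URL when it has one, fall back to an identifier-based URI (the "urn:isbn:" and "info:doi/" forms) when a standard identifier exists, and otherwise mint a local URI from a few descriptive fields. The record field names, the base URI, and the hashing fallback are all assumptions, not an agreed practice.

```python
import hashlib


def subject_uri(record, base="http://example.org/ref/"):
    """Pick a subject URI for a bibliographic record (hypothetical convention)."""
    if record.get("url"):
        # Web resources: the resource's own address is the subject.
        return record["url"]
    if record.get("doi"):
        return "info:doi/" + record["doi"]
    if record.get("isbn"):
        # Strip hyphens and spaces so equivalent ISBNs map to the same URI.
        digits = record["isbn"].replace("-", "").replace(" ", "")
        return "urn:isbn:" + digits
    # No identifier at all (e.g. an archival manuscript): mint a stable
    # local URI from whatever descriptive fields are present.
    key = "|".join(str(record.get(field, "")) for field in ("type", "title", "year"))
    return base + hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
```

The last branch is the hard case mentioned above: two people describing the same manuscript with slightly different titles would mint different URIs, which is exactly why a shared convention would need documenting.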