Proposed changes to Scholarly Article

Rachel Sanders

Aug 22, 2011, 2:05:48 PM
to schemaorg-...@googlegroups.com
Hi all,

I work for HighWire Press, a division of Stanford University and a hosting service for many scholarly journals. We're interested in embedding microdata on our sites, and noticed the ScholarlyArticle schema was just a stub.

I have some suggested changes for you all, and would welcome feedback or discussion on any of them. I've tried to keep them in the style of existing schema.org fields, and to reuse as many descriptors from http://bibliontology.com/ as possible. 

If the text changes below make your eyes cross, here's a PDF mimicking the schema.org style: 


Name: articleAbstract
Format: Text
Description: The abstract or summary of the article.

Name: volume
Format: Text
Description: The print volume that this article was published in.

Name: issue
Format: Text
Description: The print issue that this article was published in.

Name: pages
Format: Text
Description: The printed page range for this article.

Name: locator
Format: Text
Description: A description (often numeric) that locates this article within its publication.

Name: doi
Format: Text
Description: The digital object identifier (DOI) used to uniquely identify this article.

Name: pmid
Format: Text
Description: The PMID (PubMed identifier or PubMed unique identifier) assigned to this article's PubMed record.

Name: publication
Format: Text
Description: The full name of the journal or publication the article was published in.

Name: publicationAbbreviation
Format: Text
Description: The PubMed or other scholarly abbreviation of the journal or publication this article was published in.

Name: issn
Format: Text
Description: The International Standard Serial Number used to identify the print edition.

Name: eissn
Format: Text
Description: The International Standard Serial Number used to identify the electronic edition.

Name: owner
Format: Text
Description: The copyright holder for this article.

Name: rights
Format: Text
Description: The license status for this article.

Name: cites
Format: ScholarlyArticle, Book, or CreativeWork
Description: A reference that this article cites.
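
To make the list above a bit more concrete, here is a minimal sketch (in Python, purely for illustration) of how a few of these proposed fields might be emitted as HTML5 microdata. The property names are the ones proposed above and are not part of schema.org; the helper name to_microdata and all of the values, including the DOI and ISSN, are made up.

```python
from html import escape

# A few of the proposed ScholarlyArticle properties; every value is invented.
article = {
    "articleAbstract": "We measure X and report Y.",
    "volume": "12",
    "issue": "3",
    "pages": "101-110",
    "doi": "10.1234/example.5678",  # hypothetical DOI
    "publication": "Journal of Examples",
    "issn": "1234-5678",  # hypothetical ISSN
}


def to_microdata(item_type, props):
    """Render a flat dict of text properties as one microdata item."""
    lines = ['<div itemscope itemtype="%s">' % escape(item_type, quote=True)]
    for name, value in props.items():
        lines.append(
            '  <span itemprop="%s">%s</span>'
            % (escape(name, quote=True), escape(value))
        )
    lines.append("</div>")
    return "\n".join(lines)


html_snippet = to_microdata("http://schema.org/ScholarlyArticle", article)
print(html_snippet)
```

All fields are plain text here, matching the "Format: Text" entries in the proposal.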

A couple points on the schema:  the new tags down through "pmid" are, I feel, pretty uncontroversial and part of most scholarly content.  The tags for "publication", "publicationAbbreviation", "issn" and "eissn" all describe the journal the article was published in.  It might make sense to create a ScholarlyPublication schema under "Organization" to hold that data.

The last three ("owner", "rights" and "cites") are all things I thought might be generally useful, but I'm not wedded to them. Often the publisher will keep copyright, but license the article under Creative Commons, for example. "cites" is the scholarly content/books/sites this article cites, which is useful for many reasons, but it can describe links to potentially hundreds of other pieces of content. 

Thanks in advance!

Alexander Shubin

Sep 5, 2011, 8:28:35 AM
to Schema.org Discussion
Hi Rachel,

You did a great job! And I agree with your suggestions.

I'm mainly interested in two new fields:
1. "publication" from your proposal
2. references - a list of the articles and literature used in writing the
article. I didn't find this field at schema.org or in your proposal.

I hope the schema.org working group will answer us :)

On Aug 22, 10:05 pm, Rachel Sanders <rcord...@gmail.com> wrote:
> Hi all,
>
> I work for HighWire Press, a division of Stanford University and a hosting
> service for many scholarly journals. We're interested in embedding microdata
> on our sites, and noticed the ScholarlyArticle scheme was just a stub.  
>
> I have some suggested changes for you all, and would welcome feedback or
> discussion on any of them. I've tried to keep them in the style of existing
> schema.org fields, and to reuse as many descriptors from http://bibliontology.com/ as possible.
>
> If the text changes below make your eyes cross, here's a PDF mimicking the
> schema.org style:
>
> http://dl.dropbox.com/u/105439/ScholarlyArticle/ScholarlyArticle%20-%...

Rachel Sanders

Sep 5, 2011, 11:44:37 PM
to schemaorg-...@googlegroups.com
Hi Alexander,

Thanks so much for the feedback. I did include referenced material - I
believe I called it "cites" or "cites". Take a look and see if that
fits your needs.

thanks!
Rachel

Alexander Shubin

Sep 8, 2011, 12:58:29 PM
to Schema.org Discussion
Yep, Rachel, it really does :) It seems I didn't notice it the first time.

Are you using this markup already, or only planning to? And on which
pages? If it's not a secret :)

On Sep 6, 7:44 am, Rachel Sanders <rcord...@gmail.com> wrote:
> Hi Alexander,
>
> Thanks so much for the feedback. I did include referenced material - I
> believe I called it "cites" or "cites". Take a look and see if that
> fits your needs.
>
> thanks!
> Rachel
>

Rachel Sanders

Sep 8, 2011, 1:43:12 PM
to schemaorg-...@googlegroups.com
Right now it's still a pilot, but we're very interested in having this be a part of our scholarly content going forward.  It'd be a good thing for the search engines as well as general discoverability.

What about yourself? Have you thought about how to use it in your sites?  I'm curious to see what others are thinking.

Peter Sefton

Sep 13, 2011, 1:36:00 AM
to schemaorg-...@googlegroups.com
Hi Rachel,

This is useful.

I think it would be good to align this with the work that has already
been done in the Bibliographic Ontology - why not just use their terms
as documented? It looks like most of what you have presented here is
already pretty much aligned with that work
(http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/index.html).
As I understand it BIBO is more or less aligned with the way citation
data is handled in Zotero and Mendeley.


> Name: articleAbstract
> Format: Text
> Description: The abstract or summary of the article.

Why not description or abstract? This could then be used for multiple types.


> Name: rights
> Format: Text
> Description: The license status for this article.

This must have been covered somewhere else in schema.org I am assuming.

> Name: cites
> Format: ScholarlyArticle, Book, or CreativeWork
> Description: A reference that this article cites.

Yes. In the use case I am working through, which looks at embedded
citation data, the work would be of type ScholarlyArticle and
the cites property could refer to another ScholarlyArticle item.
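
A rough sketch of that nesting, assuming the proposed "cites" property were adopted: each cited work becomes its own microdata item nested inside the citing article. The function name render_item and the sample titles are invented for illustration; "name" is the existing schema.org property, while "cites" is only the proposal from this thread.

```python
from html import escape

SCHEMA = "http://schema.org/"


def render_item(item, indent=0, prop=None):
    """Render {'type': ..., 'props': {...}} as (possibly nested) microdata lines."""
    pad = "  " * indent
    attr = ' itemprop="%s"' % escape(prop, quote=True) if prop else ""
    out = [
        '%s<div%s itemscope itemtype="%s%s">'
        % (pad, attr, SCHEMA, escape(item["type"], quote=True))
    ]
    for name, value in item["props"].items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # A nested item, e.g. a cited article or book.
                out.extend(render_item(v, indent + 1, prop=name))
            else:
                out.append(
                    '%s  <span itemprop="%s">%s</span>'
                    % (pad, escape(name, quote=True), escape(v))
                )
    out.append("%s</div>" % pad)
    return out


citing = {
    "type": "ScholarlyArticle",
    "props": {
        "name": "A citing article",
        "cites": [  # proposed property, not in schema.org
            {"type": "ScholarlyArticle", "props": {"name": "An earlier article"}},
            {"type": "Book", "props": {"name": "A cited book"}},
        ],
    },
}

markup = "\n".join(render_item(citing))
print(markup)
```

An article with hundreds of references would simply carry hundreds of nested items, which is the size concern raised in the original proposal.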

--
-------------------------------
Peter Sefton +61410326955 p...@ptsefton.com http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton

Peter Sefton

Sep 13, 2011, 1:38:58 AM
to schemaorg-...@googlegroups.com
Sorry Rachel - I should have read the post more carefully when I
returned to it to respond - didn't notice the reference to the bibo
work that was already there. No wonder your terms line up!

Peter Sefton

Sep 22, 2011, 7:00:33 PM
to schemaorg-...@googlegroups.com
Hi again,

I am working on a UK project to look at Scholarly Resources in HTML5 and this issue has become quite pressing for us. We will definitely encourage the use of Microdata, and of Schema.org where appropriate. One suggestion on that project for dealing with citations was to use the BIBO URIs as is, rather than minting new terms in Schema.org. This seems to me to be the easiest to document and support for all parties. Has this method for extending Schema.org been considered by the owners in this or other domains?


Peter
--

Martin Hepp

Sep 23, 2011, 5:42:08 AM
to schemaorg-...@googlegroups.com
According to yesterday's schema.org workshop in California, all three search engines tolerate mixing schema.org with properties and types from external vocabularies, e.g. by using full URIs. This does not mean that they actually consider arbitrary vocabularies, but you could always use schema.org terms for the basics for mainstream search engines and combine it with additional properties and types from external vocabularies.

This will not harm parsing by the big three, and it will provide the data for new applications.

Plus, it may become a relevance factor for your page for specific queries.
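
As a small illustration of the mixing described above, the sketch below (Python, for illustration only) builds one ScholarlyArticle item that uses a plain schema.org property alongside Bibliographic Ontology properties written as full URIs. The helper name span and the sample values are invented, and the DOI is made up; whether any consumer actually reads the external properties is not guaranteed.

```python
from html import escape

BIBO = "http://purl.org/ontology/bibo/"  # Bibliographic Ontology namespace


def span(prop, value):
    """One microdata property; prop is a short name or a full URI."""
    return '<span itemprop="%s">%s</span>' % (escape(prop, quote=True), escape(value))


rows = [
    span("name", "An example article"),          # plain schema.org property
    span(BIBO + "doi", "10.1234/example.5678"),  # external property as a full URI
    span(BIBO + "volume", "12"),
]

mixed = (
    '<div itemscope itemtype="http://schema.org/ScholarlyArticle">\n  '
    + "\n  ".join(rows)
    + "\n</div>"
)
print(mixed)
```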

See here for examples:

http://wiki.goodrelations-vocabulary.org/Microdata

and

http://www.slideshare.net/mhepp/extending-schemaorg-with-goodrelations-and-wwwproductontologyorg

Martin

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail: he...@ebusiness-unibw.org
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
http://www.heppnetz.de/ (personal)
skype: mfhepp
twitter: mfhepp

Greg Grossmeier

Sep 23, 2011, 6:18:42 PM
to schemaorg-...@googlegroups.com
On 09/12/2011 10:36 PM, Peter Sefton wrote:
>> Name: rights
>> Format: Text
>> Description: The license status for this article.
> This must have been covered somewhere else in schema.org I am assuming.

It wasn't until this past Wednesday, when some of the rNews properties
were included in Schema.org (or at least announced to be, during the
Schema.org Workshop). Now there are, as a part of CreativeWork:

* copyrightHolder
* copyrightYear
* copyrightNotice (though, as referenced elsewhere on this list, this
one isn't showing up in CreativeWork, but it is showing up on /Painting,
oddly).

"rights" seems to have some overlap with those above terms, but it would
be wonderful, from a licensing perspective, to have a property that
doesn't take free text, but instead a URL to a license. Just to give it
a name to refer to: something like "copyrightLicenseUrl"

That may meet the needs of many community members, including HighWire
Press, which needs to make reference to multiple types of copyright
licenses depending on the article.


As an aside: Should this discussion move to the newly created
public-vocab mailing list[0] as per the new Web Schemas Task Force[1]
that was announced[2]?


Best,

Greg

[0] http://lists.w3.org/Archives/Public/public-vocabs/
[1] http://www.w3.org/2001/sw/interest/webschema.html
[2] http://www.w3.org/QA/2011/09/proposing_two_new_sw_interest.html

--
| Greg Grossmeier |
| http://grossmeier.net |

Philip (flip) Kromer

Sep 29, 2011, 2:23:23 PM
to Schema.org Discussion
Those seem well-chosen extensions. Some comments:

* Instead of articleAbstract, I think there's a good argument for a
"summary" field on CreativeWork itself. As opposed to "description" (A
short description of the item), "summary" is something like "A brief
recapitulation of the content of this work". All of these feel like
"summary" to me:
- The abstract of a ScholarlyArticle
- An IMDB movie (http://www.imdb.com/title/tt0053291/) would file
the contents of "Storyline" under "summary"
- A recap of a TV episode
- A plot summary (as opposed to a review) of a book

* For "owner", can you instead use copyrightHolder?

* "rights" seems like it should go on CreativeWork, as a dedicated
LicensingRights type. Right now there are scattered fields:
"copyrightHolder" / "copyrightYear" on CreativeWork, the proposed
"copyrightLicenseUrl", "regionsAllowed" on MediaObject, no clear place
to put "CC-BY" or separate links to the license statement and its
plain-text summary. The CC embeddable widget is an example where the
licensing is reified as an independently-existing object.

* "cites" seems like a property of Article, not just ScholarlyArticle

* publicationAbbreviation is better placed as a property on the
publication

Mike Linksvayer

Sep 29, 2011, 2:49:03 PM
to schemaorg-...@googlegroups.com
On Thu, Sep 29, 2011 at 11:23, Philip (flip) Kromer <fl...@infochimps.org> wrote:
> * "rights" seems like it should go on CreativeWork, as a dedicated
> LicensingRights type. Right now there are scattered fields:
> "copyrightHolder" / "copyrightYear" on CreativeWork, the proposed
> "copyrightLicenseUrl", "regionsAllowed" on MediaObject, no clear place
> to put "CC-BY" or separate links to the license statement and its
> plain-text summary. The CC embeddable widget is an example where the
> licensing is reified as an independently-existing object.

copyrightLicenseUrl is clearly needed on CreativeWork to obtain the same
functionality currently available in rel="license" (which works as a
microformat and RDFa).

I'd approach a LicensingRights type with caution. Lots of things can
be relevant to copyright status and licensing. A dedicated type could
easily become a poorly thought out "rights expression language", for
which there isn't much evidence of usefulness in the wild (i.e.,
deployed on the web). CC has been rather successful in just getting
people to point to a specific license URL, which is both very simple
and extensible.
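
For illustration, a minimal sketch of the "just point to a license URL" approach: a single link that carries both the existing rel="license" and the property name floated earlier in this thread. Note that copyrightLicenseUrl is only a proposed name here, not a schema.org property, and the helper license_link is invented.

```python
from html import escape


def license_link(url, label):
    """Anchor carrying rel="license" plus the proposed microdata property."""
    return (
        '<a rel="license" itemprop="copyrightLicenseUrl" href="%s">%s</a>'
        % (escape(url, quote=True), escape(label))
    )


link = license_link("http://creativecommons.org/licenses/by/3.0/", "CC BY 3.0")
print(link)
```

Switching an article to a different license only changes the URL and label, which is the simplicity argued for above.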

Mike
