DSpace and Google Scholar


Monika Mevenkamp

Jul 28, 2016, 5:19:41 PM
to DSpace Tech


  • We are about ready to start a new repository, and I want to make sure I stay on the good side of Google Scholar.
    The new repository will make previously published articles, their metadata, and full-text PDFs available on the open web. We want Google Scholar to crawl this new instance without trouble. I checked Google's indexing guidelines and looked at an example item page in our current test instance to figure out whether the instance will be compliant.

    For reference, I include the item metadata as well as the generated meta tags at the end of this message.

    My first observation: there is a mix of citation_*, DCTERMS, and DC meta tags in the HTML. The DCTERMS tags are a bit mysterious, since the item I tested has no dcterms metadata values. Does the meta tag generator somehow prefer dcterms for some metadata fields?


    Here are the Google Scholar requirements that may pose an issue:

    Google*) The publication date tag, e.g., citation_publication_date or DC.issued, must contain the date of publication, i.e., the date that would normally be cited in references to this paper from other papers. Don't use it for the date of entry into the repository - that should go into citation_online_date instead. Provide full dates in the "2010/5/12" format if available; or a year alone otherwise. This tag is required for inclusion in Google Scholar.

    There is a DCTERMS.issued tag but no DC.issued tag; that looks like a potential problem to me.

    In addition, we decided to store the electronic issue date in the dc.date.eissued field. When an item has no dc.date.issued value, we end up with no issue date in the meta tags.
    Is there an easy way to fix this, short of storing all issue dates in dc.date.issued?
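
To make the date question concrete, here is a minimal Python sketch of the fallback I have in mind. It is not DSpace code: the function name `scholar_date` and the field order are my own assumptions. It picks the first available date field and rewrites it in the "2010/5/12" style that the guidelines quote, or returns the year alone when no full date is stored.

```python
from datetime import date


def scholar_date(metadata, fields=("dc.date.issued", "dc.date.eissued")):
    """Return the first available date as YYYY/M/D, a bare year, or None.

    `metadata` maps field names to ISO-style strings such as "2015-08-22".
    The field order is an assumption: prefer the print issue date, then
    fall back to the electronic issue date.
    """
    for field in fields:
        value = metadata.get(field)
        if not value:
            continue
        # Drop any time component, e.g. "2016-07-28T19:25:55Z".
        parts = value.split("T")[0].split("-")
        if len(parts) == 3:
            # date() also validates the month and day.
            d = date(int(parts[0]), int(parts[1]), int(parts[2]))
            return f"{d.year}/{d.month}/{d.day}"
        # Year only (or year-month): fall back to the year alone.
        return parts[0]
    return None
```

With the test item, `scholar_date({"dc.date.eissued": "2015-08-19"})` would fall back to the electronic issue date.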

    Google*) For journal and conference papers, provide the remaining bibliographic citation data in the following tags: citation_journal_title or citation_conference_title, citation_issn, citation_isbn, citation_volume, citation_issue, citation_firstpage, and citation_lastpage. Dublin Core equivalents are DC.relation.ispartof for journal and conference titles and the non-standard tags DC.citation.volume, DC.citation.issue, DC.citation.spage (start page), and DC.citation.epage (end page) for the remaining fields. Regardless of the scheme chosen, these fields must contain sufficient information to identify a reference to this paper from another document, which is normally all of: (a) journal or conference name, (b) volume and issue numbers, if applicable, and (c) the number of the first page of the paper in the volume (or issue) in question.

    Now this may be problematic:
    the item's dc.identifier.citation shows up as DCTERMS.bibliographicCitation, and
    the journal is listed in the DCTERMS.isPartOf tag rather than in citation_journal_title or DC.relation.ispartof.

    For ISBNs we decided to store values in dc.identifier.isbn10 and dc.identifier.isbn13. Again: is there an easy way to make these appear in the right meta tag?


    Google*) The author tag, e.g., citation_author or DC.creator, must contain the authors (and only the actual authors) of the paper. Don't use it for the author of the website or for contributors other than authors, e.g., thesis advisors. Author names can be listed either as "Smith, John" or as "John Smith". Put each author name in a separate tag and omit all affiliations, degrees, certifications, etc., from this field. At least one author tag is required for inclusion in Google Scholar.

    There is both a DC.contributor.author tag and a citation_author tag.
    Google Scholar shouldn't complain about having both, but I am just checking.

Lots of questions - hoping for answers …  

Monika

-------

dc.contributor.author   Uppaluri, Sravanti
dc.contributor.author   Brangwynne, Clifford P
dc.date.accessioned     2016-07-28T19:25:55Z
dc.date.available       2016-07-28T19:25:55Z
dc.date.issued          2015-08-22
dc.identifier.citation  "A size threshold governs Caenorhabditis elegans developmental progression" Proceedings of the Royal Society B: Biological Sciences, (1813), 282, 20151283 - 20151283, doi:10.1098/rspb.2015.1283
dc.identifier.issn      0962-8452
dc.format.extent        20151283 - 20151283
dc.relation.ispartof    Proceedings of the Royal Society B: Biological Sciences
dc.title                A size threshold governs Caenorhabditis elegans developmental progression
dc.type                 Journal Article
dc.identifier.doi       doi:10.1098/rspb.2015.1283
dc.date.eissued         2015-08-19
dc.identifier.eissn     1471-2954


<meta name="DC.contributor.author" content="Uppaluri, Sravanti" xml:lang="en_US" />
<meta name="DC.contributor.author" content="Brangwynne, Clifford P" xml:lang="en_US" />
<meta name="DCTERMS.dateAccepted" content="2016-07-28T19:25:55Z" scheme="DCTERMS.W3CDTF" />
<meta name="DCTERMS.available" content="2016-07-28T19:25:55Z" scheme="DCTERMS.W3CDTF" />
<meta name="DCTERMS.issued" content="2015-08-22" xml:lang="en_US" scheme="DCTERMS.W3CDTF" />
<meta name="DCTERMS.bibliographicCitation" content="&quot;A size threshold governs Caenorhabditis elegans developmental progression&quot; Proceedings of the Royal Society B: Biological Sciences, (1813), 282, 20151283 - 20151283, doi:10.1098/rspb.2015.1283" xml:lang="en_US" />
<meta name="DC.identifier" content="0962-8452" xml:lang="en_US" />
<meta name="DC.identifier" content="http://arks.princeton.edu/ark:/99999/fk4x06br14" scheme="DCTERMS.URI" />
<meta name="DCTERMS.extent" content="20151283 - 20151283" xml:lang="en_US" />
<meta name="DCTERMS.isPartOf" content="Proceedings of the Royal Society B: Biological Sciences" xml:lang="en_US" />
<meta name="DC.title" content="A size threshold governs Caenorhabditis elegans developmental progression" xml:lang="en_US" />
<meta name="DC.type" content="Journal Article" />
<meta name="DC.identifier" content="doi:10.1098/rspb.2015.1283" xml:lang="en_US" />
<meta name="DC.date" content="2015-08-19" xml:lang="en_US" scheme="DCTERMS.W3CDTF" />
<meta name="DC.identifier" content="1471-2954" xml:lang="en_US" />

<meta name="citation_keywords" content="Journal Article" />
<meta name="citation_title" content="A size threshold governs Caenorhabditis elegans developmental progression" />
<meta name="citation_issn" content="0962-8452" />
<meta name="citation_author" content="Uppaluri, Sravanti" />
<meta name="citation_author" content="Brangwynne, Clifford P" />
<meta name="citation_pdf_url" content="http://oar-dev.princeton.edu/jspui/bitstream/99999/fk4x06br14/1/Generic.pdf" />
<meta name="citation_date" content="2015-08-22" />
<meta name="citation_abstract_html_url" content="http://oar-dev.princeton.edu/jspui/handle/99999/fk4x06br14" />
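
A rough way to check a page like this against the three tags the guidelines call required (title, author, publication date) is sketched below in Python, using only the standard library. The `REQUIRED` table and the function name `missing_scholar_tags` are my own. The accepted tag names come from the guideline text, except `citation_date`, which is what DSpace emits here and which I am assuming Scholar also accepts.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed through here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            # Tag names are compared case-insensitively.
            self.tags.setdefault(name.lower(), []).append(content)


# Assumption: any one of the listed names satisfies the requirement.
REQUIRED = {
    "title": ("citation_title", "dc.title"),
    "author": ("citation_author", "dc.creator"),
    "publication date": ("citation_publication_date", "citation_date", "dc.issued"),
}


def missing_scholar_tags(html):
    """Return the labels of required tag groups that the page lacks."""
    collector = MetaCollector()
    collector.feed(html)
    return [
        label
        for label, names in REQUIRED.items()
        if not any(name in collector.tags for name in names)
    ]
```

Feeding it the HTML of an item page returns an empty list when all three groups are present. It says nothing about whether the values themselves are correct.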


Monika Mevenkamp
Digital Repository Infrastructure Developer
Princeton University
Skype: mo-meven

Andrea Schweer

Jul 28, 2016, 5:28:48 PM
to Monika Mevenkamp, DSpace Tech
Hi Monika,

I don't have time for a detailed response, but it is my understanding that Scholar takes the citation_* fields as first preference. Things like journal/conference name typically need a little bit of tweaking because there is no standard field in DSpace for these. You should start with this config file for the mappings: https://github.com/DSpace/DSpace/blob/master/dspace/config/crosswalks/google-metadata.properties
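
To illustrate the idea of such a mapping, here is a small Python sketch. It is neither DSpace's crosswalk code nor the syntax of the properties file; the `MAPPING` table, the field choices, and the function name are hypothetical. Each citation_* tag gets an ordered list of candidate metadata fields, and the first field that has values wins.

```python
from html import escape

# Hypothetical mapping: citation tag -> candidate fields in order of preference.
MAPPING = {
    "citation_journal_title": ["dc.relation.ispartof"],
    "citation_isbn": ["dc.identifier.isbn13", "dc.identifier.isbn10"],
    "citation_issn": ["dc.identifier.issn"],
}


def citation_meta_tags(metadata, mapping=MAPPING):
    """Build <meta> lines from item metadata (field name -> list of values)."""
    lines = []
    for tag, candidates in mapping.items():
        for field in candidates:
            values = metadata.get(field)
            if values:
                for value in values:
                    content = escape(value, quote=True)
                    lines.append(f'<meta name="{tag}" content="{content}" />')
                break  # first field with values wins
    return lines
```

For Monika's item, the journal title would come from dc.relation.ispartof. The real configuration for this lives in the properties file linked above.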

cheers,
Andrea

-- 
Dr Andrea Schweer
Lead Software Developer, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120