Dataverse and the new Google Dataset Search

Michel Bamouni

Sep 28, 2018, 8:41:18 AM
to Dataverse Users Community
Hi,

I see that Google now offers Google Dataset Search. As I understand it, this engine lets us search for datasets.
So, how can I reference Dataverse datasets in this new Google Dataset Search?

Best regards,

Michel

Philip Durbin

Sep 28, 2018, 8:56:58 AM
to dataverse...@googlegroups.com
I may be misunderstanding the question but you can click "Share" in the top right and then click "Click to copy link". I'll attach a screenshot.

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/73bee06c-b00b-4e09-8bf2-2b686d8b6257%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Screen Shot 2018-09-28 at 8.54.18 AM.png

Michel Bamouni

Oct 1, 2018, 4:14:31 AM
to Dataverse Users Community
Hi Phil,

I think I did not ask my question clearly, so I will try to reformulate it.
Recently, I heard that Google set up a new engine called "Dataset Search", available at https://toolbox.google.com/datasetsearch.
In this tool, we can search for datasets. So, what I want to know is how I can connect my Dataverse datasets to this tool. The goal, once they are connected or referenced, is to be able to find my datasets in https://toolbox.google.com/datasetsearch.

regards,

Michel



Philip Durbin

Oct 1, 2018, 7:07:55 AM
to dataverse...@googlegroups.com
Hi Michel,

You seem to be saying "I'd like Google's search engine (and specifically Google's new Dataset Search feature) to index my datasets."

The first thing to check is that robots.txt is not blocking Google or other search engine crawlers. Please see "Letting Search Engines Crawl Your Installation" at http://guides.dataverse.org/en/4.9.2/installation/config.html#letting-search-engines-crawl-your-installation
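
One way to sanity-check a robots.txt file before waiting on a crawler is Python's standard `urllib.robotparser` module. The sketch below is only illustrative: the rules in `SAMPLE_ROBOTS_TXT` and the `dataverse.example.edu` URLs are made up for this example and are not the robots.txt that ships with Dataverse.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /dataset.xhtml
Disallow: /
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://dataverse.example.edu"  # hypothetical installation
    paths = (
        "/dataset.xhtml?persistentId=doi:10.5072/FK2/EXAMPLE",
        "/dataverse.xhtml",
    )
    for path in paths:
        print(path, is_crawlable(SAMPLE_ROBOTS_TXT, "Googlebot", base + path))
```

To test a live installation instead, `RobotFileParser.set_url()` followed by `read()` fetches the real file over the network.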

In the future, Dataverse will create a sitemap that you can submit to Google or other search engines to make it easier for them to index your datasets. Please see the "Dataverse discovery in Google - Machine Readable Sitemaps" issue at https://github.com/IQSS/dataverse/issues/4261

Feedback (on both code and documentation) is very welcome on the pull request I made on Friday to implement sitemap support for the issue above. You can see the proposed changes at https://github.com/IQSS/dataverse/pull/5084/files

I hope this helps,

Phil

p.s. You might be interested in this previous thread about Google Dataset Search: https://groups.google.com/d/msg/dataverse-community/TlQPNI3Ip2E/srLf29aSBAAJ


Michel Bamouni

Oct 8, 2018, 5:24:00 AM
to Dataverse Users Community
Hi Phil,

So if I want to be able to find my datasets in Google's new Dataset Search feature, there is actually no configuration to do on the Dataverse side.
All I have to do is allow search engines to crawl my data.

Michel

Philip Durbin

Oct 9, 2018, 7:29:25 AM
to dataverse...@googlegroups.com
That's correct, as of the latest release of Dataverse, which is 4.9.4, there is no configuration to do other than making sure search engines can crawl your Dataverse installation by opening up robots.txt as described at http://guides.dataverse.org/en/4.9.4/installation/config.html#letting-search-engines-crawl-your-installation

In the next version of Dataverse (I'm not sure if this will be 4.9.5 or 4.10), you will need to do a little configuration to enable a new feature where Dataverse will create a machine-readable sitemap of dataverses and datasets. For now, you can read about the sitemap feature at https://github.com/IQSS/dataverse/blob/c80dc43daf372dd0956c8910b9c78d84eb55d6e9/doc/sphinx-guides/source/installation/config.rst#creating-a-sitemap-and-submitting-it-to-search-engines
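
For readers unfamiliar with sitemaps, here is a rough sketch of what a crawler sees. It parses a small file in the sitemaps.org format with Python's standard library and lists the URLs it advertises. The sample XML and its `dataverse.example.edu` URLs are placeholders invented for this example, not output from a real Dataverse installation.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; the URLs are placeholders.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://dataverse.example.edu/dataverse/demo</loc>
    <lastmod>2019-03-01</lastmod>
  </url>
  <url>
    <loc>https://dataverse.example.edu/dataset.xhtml?persistentId=doi:10.5072/FK2/EXAMPLE</loc>
    <lastmod>2019-03-15</lastmod>
  </url>
</urlset>
"""


def list_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemaps.org <urlset> document."""
    # Parse from bytes so the encoding declaration in the XML is honored.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    locations = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locations if loc.text]


if __name__ == "__main__":
    for url in list_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```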

I hope this helps,

Phil

James Turitto

Feb 22, 2019, 12:23:04 PM
to Dataverse Users Community
Hi Philip, 

Do you know if the new machine-readable sitemap feature has been enabled on Harvard Dataverse? We did a cursory check on Google Dataset Search and could not find any datasets from our dataverse (Abdul Latif Jameel Poverty Action Lab), which resides in the Harvard DV installation.

Thanks!
James

danny...@g.harvard.edu

Feb 22, 2019, 2:21:09 PM
to Dataverse Users Community
Hey James,

Thanks for the report. We need to change our robots.txt file in Harvard Dataverse. We'll do it next week. We had been experimenting with some changes to the file because we had seen some instability that we thought was related to search crawler traffic. 

Thanks,

Danny

Philip Durbin

Feb 25, 2019, 1:34:40 PM
to dataverse...@googlegroups.com
Danny's right about robots.txt. Production installations of Dataverse should make sure that search engines aren't being blocked[1].

Additionally, if you are running Dataverse 4.10 or higher, you should enable a sitemap[2]. Dataverse 4.10 also includes improved Schema.org JSON-LD representations of datasets[3]. I know you didn't mention searching by ORCID id, for example, but author identifiers and a bunch of other fields were added. I like using a browser addon called "OpenLink Structured Data Sniffer" to look at them (screenshot attached) but you can also just "view source" on the dataset landing page to see the JSON-LD.
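
As a scripted alternative to "view source", the sketch below pulls JSON-LD out of a page using only Python's standard library. It assumes the JSON-LD sits in a `<script type="application/ld+json">` tag, which is the usual convention; the sample page and its values are invented for illustration and are not real Dataverse output.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._chunks: list[str] = []
        self.documents: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text, possibly in several pieces.
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_json_ld = False


def extract_json_ld(html: str) -> list[dict]:
    """Return every JSON-LD document embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical landing page, invented for this example.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    print(json.dumps(extract_json_ld(SAMPLE_PAGE), indent=2))
```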

I hope this helps,

Phil


Screen Shot 2019-02-25 at 1.11.56 PM.png

Amber Leahey

Feb 26, 2019, 9:33:59 AM
to Dataverse Users Community
I just wanted to say that we started allowing Google to crawl our DV site again after the upgrade to 4.10.1, and things have improved greatly compared with the previous time we enabled this. Thanks!

Philip Durbin

Mar 20, 2019, 12:42:00 PM
to dataverse...@googlegroups.com
Harvard Dataverse recently enabled a sitemap and I can finally find my dataset in Google Dataset Search! Screenshot attached. I think as recently as Monday I still couldn't find it but now I can. Hooray!

I already linked to the docs we have on enabling Dataverse's new sitemap feature but we plan to make improvements to those docs in https://github.com/IQSS/dataverse/issues/5639

Screen Shot 2019-03-20 at 12.34.10 PM.png

Amber Leahey

Apr 1, 2019, 12:32:37 PM
to Dataverse Users Community
Are you seeing "affiliation" show up correctly in these google dataset search results? 

We ran some testing with our Dataverse schema.org metadata and Google doesn't seem to like "Affiliation"


Any ideas? Should this be improved?

Thanks in advance, 
Amber

Julian Gautier

Apr 1, 2019, 1:22:20 PM
to dataverse...@googlegroups.com
That structured data tool doesn't like that the metadata doesn't say whether the author is a person or an organization, and Dataverse has no way of including that info right now. It's discussed in this GitHub issue: https://github.com/IQSS/dataverse/issues/5029.

Regarding how that metadata is displayed in Google Dataset Search, as far as I can tell it still never shows author affiliation for any dataset it indexes. Have you seen examples of Dataset Search displaying affiliation?

Philipp at UiT

Sep 17, 2019, 5:22:46 AM
to Dataverse Users Community
We have followed your advice on how to enable Google Dataset Search to harvest our metadata, but we still experience some issues.

When searching in Google Dataset Search for a quite recently published dataset (https://doi.org/10.18710/UWP6LL) we get a positive search result when searching for (parts of) the title of the dataset. However, when searching for one of the authors, the dataset is not in the result list.

Searching for the data producer (e.g. UiT The Arctic University of Norway) results only in a small subset of the published datasets.

Do you know if this is due to issues in Google Dataset Search (beta!) or in the Dataverse software?

Best, Philipp

Philip Durbin

Sep 17, 2019, 6:28:46 AM
to dataverse...@googlegroups.com
Hmm, I'm not sure about producers but not being able to find Dataverse datasets by author in Google Dataset Search is (unfortunately) a known issue. :(

We're tracking this in https://github.com/IQSS/dataverse/issues/5029 and it boils down to not having "@type": "Person" in the JSON-LD Schema.org output. But we can't hard code this because sometimes the author is an organization. I think we need a more flexible UI so the difference can be expressed with a checkbox or slider (person vs organization) or whatever.
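
To see whether a given dataset is affected, a check along these lines could be run against its JSON-LD. This is a minimal sketch: it assumes authors are listed under an `author` or `creator` key as objects with a `name`, and the sample record is invented for illustration.

```python
def authors_missing_type(dataset: dict) -> list[str]:
    """Return the names of author/creator entries that have no "@type"."""
    missing = []
    for key in ("author", "creator"):  # assumed property names
        entries = dataset.get(key, [])
        if isinstance(entries, dict):
            # A single author may be given as one object instead of a list.
            entries = [entries]
        for entry in entries:
            if isinstance(entry, dict) and "@type" not in entry:
                missing.append(entry.get("name", "<unnamed>"))
    return missing


# Hypothetical JSON-LD record, invented for this example.
SAMPLE_DATASET = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",
    "author": [
        {"name": "Doe, Jane", "affiliation": "Example University"},
        {"@type": "Organization", "name": "Example Research Group"},
    ],
}

if __name__ == "__main__":
    print(authors_missing_type(SAMPLE_DATASET))
```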


Julian Gautier

Sep 18, 2019, 12:26:13 PM
to Dataverse Users Community
Hi Philipp. When you wrote data producer, did you mean what's in Dataverse's Producer metadata field? That field isn't mapped to any Schema.org properties. But maybe the larger issue is that we're not 100% sure which parts of the Schema.org metadata Google Dataset Search is indexing. Their testing tool is throwing all kinds of errors that weren't there a few months ago, so it's very beta-y like you said.

I'm also not sure if and how adding or improving a repository's sitemap would help with discovery. I say this because when I search for DataverseNO, Google Dataset Search returns only around 225 datasets. I think sitemap support was added in Dataverse 4.10 (https://github.com/IQSS/dataverse/issues/4261)?

As far as metadata mapping, Dataverse maps the repository name, in this case DataverseNO, to Schema.org publisher and provider properties:

Screen Shot 2019-09-18 at 11.12.18 AM.png


We could map Dataverse's Producer field to publisher or provider or add the producer property and map Dataverse's Producer field to that, but I don't know if Google Dataset Search would make those fields searchable.

(For now, in this working crosswalk I mapped Dataverse's Producer field to the Schema.org producer property.)
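
To make the mapping idea concrete, here is a toy crosswalk in Python. Everything in it is an assumption made for illustration: the internal field names (`repository_name`, `producer`), the choice to emit each value as an `Organization` object, and the sample record. It is not how Dataverse's exporter is implemented.

```python
# Hypothetical crosswalk: internal field name -> Schema.org properties.
CROSSWALK = {
    "repository_name": ("publisher", "provider"),
    "producer": ("producer",),
}


def apply_crosswalk(record: dict) -> dict:
    """Build a minimal Schema.org Dataset object from an internal record."""
    json_ld = {"@context": "http://schema.org", "@type": "Dataset"}
    for field, properties in CROSSWALK.items():
        value = record.get(field)
        if not value:
            continue  # leave out properties we have no value for
        for prop in properties:
            # Assumption: every mapped value names an organization.
            json_ld[prop] = {"@type": "Organization", "name": value}
    return json_ld


if __name__ == "__main__":
    sample = {
        "repository_name": "DataverseNO",
        "producer": "UiT The Arctic University of Norway",
    }
    print(apply_crosswalk(sample))
```

Whether Google Dataset Search would make a `producer` value searchable is a separate question that a mapping like this cannot answer.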

Philipp at UiT

Sep 20, 2019, 9:55:27 AM
to Dataverse Users Community
Hi Julian,

Thanks for diving into this. I think ideally, we'd like the metadata about institutions/organizations to be harvested by Google and others from several fields in Dataverse. There is, e.g., the Affiliation field of the author, and the Affiliation field of the Distributor, and as I mentioned the Producer metadata field. For the partner organizations of DataverseNO it would be useful to be able to search for their organization name in discovery services like Google Dataset Search, and then they would get a complete list of datasets that their researchers have published in DataverseNO or in any other repository providing metadata in a standardized way. Once ORCIDs and institutional IDs (cf. ROR; https://www.ror.community) are widely implemented, such monitoring should be easy to achieve? As you say, Google Dataset Search still seems to be rather beta-ish. So I think for the time being, this is the answer I'll give to our partner organizations who have been asking why not all of their DataverseNO datasets appear in Google Dataset Search. But maybe we could provide some feedback about the issue to Google?

Best, Philipp

Julian Gautier

Sep 23, 2019, 2:13:03 PM
to Dataverse Users Community
I agree!

A Google blog post published when the service was announced last year briefly mentioned how they'll tackle the inevitable issues with how data publishers use the same metadata fields in different ways:

Therefore, in some cases we substitute a more general field name (e.g., “provided by”) to display the values coming from multiple other fields (e.g., “publisher”, “creator”, etc.). In other cases, we are not able to use some of the fields at all: if a specific field is being misinterpreted in many different ways by dataset providers, we bypass that field for now and work with the community to clarify the guidelines.

This seems to be the case still. I haven't seen a clarification of their guidelines for using Schema.org, yet.

They point to their Webmaster's forum as a way to get answers and give feedback, but in my experience they haven't been really responsive there about schema.org and Dataset Search questions. Amber from Scholar's Portal posted a similar question, which has gone unanswered. 

Some Google employees are pretty involved in professional org workshops and working groups - I could reach out to members of an RDA group whose meetings I try to attend to ask if they have insights.

Philipp at UiT

Sep 24, 2019, 12:25:25 AM
to Dataverse Users Community
Thanks, Julian, for following up on this!

In my previous posting, I mentioned briefly ROR - Research Organization Registry, which aims at establishing a common framework for institutional IDs. On October 7, there will be a webinar on how ROR is being implemented in DRYAD. I think this could be of interest for Dataverse, too:

Best, Philipp

Leonhard Maylein

Sep 24, 2019, 3:25:35 AM
to Dataverse Users Community
May I ask another question in this thread?

Google Webmaster Tools complain about invalid data in our instance: Invalid object type for field "license".

This refers, for example, to the following data record:
https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/10087

We have entered HTML code in "Terms of use" because we want to display the name of the CC license and the URL (as a link).

However, Google requires either a URL or plain text in "License" (https://developers.google.com/search/docs/data-types/dataset).

Would it make sense to make the metadata for "Terms of use" more machine-readable for this purpose? However, it would probably be necessary to be able to enter the license name and the URL separately.

Or are there recommendations on how we can better record the license information in our Dataverse instance?

Philip Durbin

Sep 24, 2019, 7:56:03 AM
to dataverse...@googlegroups.com
Hi Leonhard,

This isn't really an answer but I wanted to put on your radar that
just yesterday Don Sizemore from Odum mentioned similar trouble with
"license" at http://irclog.iq.harvard.edu/dataverse/2019-09-23#i_106489

It's still early for him (and me) but when I see him in
http://chat.dataverse.org I'll be sure to point him to this thread.
You are welcome to join us in Dataverse chat, of course! You are also
welcome to go ahead and create an issue at
https://github.com/IQSS/dataverse/issues because I do think there's
something going on. Where there's smoke, there's fire. :)

Thanks,

Phil

Julian Gautier

Sep 25, 2019, 12:06:34 PM
to dataverse...@googlegroups.com
Hi Leonhard,
 


Google's Search Console and Structured Data Tool are marking new errors and warnings in Dataverse's Schema.org metadata, but for the most part Google Dataset Search is still indexing and displaying the metadata that it's showing these errors and warnings about.

In Harvard Dataverse's Google Search Console, the first three of the errors in the screenshot below are about the license metadata:

Screen Shot 2019-09-25 at 10.48.45 AM.png



Google recommends that the license metadata looks like one of these two examples:

Screen Shot 2019-09-25 at 10.43.17 AM.png


 
For the dataset you mentioned where CC0 is waived and another license is entered in Terms of Use, the license metadata in the Dataverse Schema.org json export looks like this:

Screen Shot 2019-09-25 at 10.44.32 AM.png



The 'Invalid object type for field "license"' error you mentioned, which we also see in the first screenshot, is pointing out that the @type called "Dataset" is invalid for Schema.org's "license" property. In the second screenshot, I see that Google recommends an @type of "CreativeWork" (and according to the license property page, only CreativeWork and URL are valid @types). This is an error Google never flagged, in its Structured Data Tool or in the Search Console, until recently. When we were implementing Schema.org metadata for Dataverse, we made the decision to use the "Dataset" @type, hoping that eventually it would be an allowable @type for the license property.

I'll save thoughts on the other errors for another post.

The Structured Data Tool and Google Dataset Search don't have a problem with us putting HTML in "text" (the property, under license, being used to hold the Terms of Use). Google Dataset Search still displays the Terms of Use of the dataset you mentioned, with the rendered HTML.

But I agree, at the least, the CC BY 4.0 name and its url should go in their own properties, so it's more machine readable. Something like:

"license": {
   "@type": "Dataset",
   "text": "Creative Commons Attribution 4.0 International (CC BY 4.0)",
   "url": "http://creativecommons.org/licenses/by/4.0/"
}

I think ideally we'd want to let people editing metadata choose from a list of licenses. This is discussed in https://github.com/IQSS/dataverse/issues/1753. When a depositor editing her metadata chooses CC BY 4.0, the license name and url are displayed in the UI and included in the Schema.org export (and other metadata exports).

For now, I don't think we have any way to make what's put in the Terms of Use field be more machine readable in the Schema.org metadata export.
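
As a rough illustration of the points above, the Python sketch below builds a license object with the name and URL in separate properties and checks a license value against the constraints mentioned in this thread (a plain string, or an object whose @type is CreativeWork or URL). The function names and the choice of `name` and `url` as property names are assumptions for this example, not Dataverse's export code or Google's validator.

```python
# The @type values this thread reports as valid for the "license" property.
ALLOWED_LICENSE_TYPES = {"CreativeWork", "URL"}


def build_license(name: str, url: str) -> dict:
    """Return a license object with the name and URL in separate properties."""
    return {"@type": "CreativeWork", "name": name, "url": url}


def license_problems(license_value) -> list[str]:
    """List the problems found in a Schema.org "license" value."""
    if isinstance(license_value, str):
        # A bare URL or plain-text license is accepted as-is.
        return [] if license_value.strip() else ["license is an empty string"]
    if not isinstance(license_value, dict):
        return ["license should be a string or an object"]
    problems = []
    license_type = license_value.get("@type")
    if license_type not in ALLOWED_LICENSE_TYPES:
        problems.append(f"invalid @type for license: {license_type!r}")
    if not license_value.get("url"):
        problems.append("license has no url")
    return problems


if __name__ == "__main__":
    good = build_license(
        "Creative Commons Attribution 4.0 International (CC BY 4.0)",
        "http://creativecommons.org/licenses/by/4.0/",
    )
    print(license_problems(good))
    print(license_problems({"@type": "Dataset", "text": "<a href='#'>CC BY 4.0</a>"}))
```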

Julian Gautier

Sep 26, 2019, 10:59:00 AM
to Dataverse Users Community
I made some small corrections to my last post.

Also wanted to share that Google started a Google Dataset Search Announcements forum last week (a day after they announced the new Dataset section in the Search Console). Hopefully they'll start syncing their guidelines to the warnings and errors that their tools are reporting.

