Dataverse and the new Google Dataset Search


Michel Bamouni

Sep 28, 2018, 8:41:18 AM
to Dataverse Users Community
Hi,

I see that Google now offers Google Dataset Search, which, as I understand it, is a search engine for datasets.
How can I get Dataverse datasets listed in this new Google Dataset Search?

Best regards,

Michel

Philip Durbin

Sep 28, 2018, 8:56:58 AM
to dataverse...@googlegroups.com
I may be misunderstanding the question, but you can click "Share" in the top right and then click "Click to copy link". I'll attach a screenshot.

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/73bee06c-b00b-4e09-8bf2-2b686d8b6257%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Screen Shot 2018-09-28 at 8.54.18 AM.png

Michel Bamouni

Oct 1, 2018, 4:14:31 AM
to Dataverse Users Community
Hi Phil,

I don't think I asked my question clearly, so I will try to rephrase it.
Recently, I heard that Google set up a new engine called "Dataset Search", available at https://toolbox.google.com/datasetsearch.
With this tool, we can search for datasets. What I want to know is how I can connect my Dataverse datasets to this tool, so that once they are connected (indexed), I can find them by searching at https://toolbox.google.com/datasetsearch.

regards,

Michel



Philip Durbin

Oct 1, 2018, 7:07:55 AM
to dataverse...@googlegroups.com
Hi Michel,

You seem to be saying "I'd like Google's search engine (and specifically Google's new Dataset Search feature) to index my datasets."

The first thing to check is that robots.txt is not blocking Google or other search engine crawlers. Please see "Letting Search Engines Crawl Your Installation" at http://guides.dataverse.org/en/4.9.2/installation/config.html#letting-search-engines-crawl-your-installation
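A quick way to test the robots.txt point is to run the file through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser; the robots.txt contents and the dataverse.example.edu URLs are hypothetical placeholders, not the rules recommended in the Dataverse guide. This parser applies simple prefix matching in file order, which is close to, but not identical to, how Googlebot evaluates rules, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (assumption, not the Dataverse guide's file).
# In practice, fetch https://<your-installation>/robots.txt and test that instead.
ROBOTS_TXT = """\
User-agent: *
Allow: /dataset.xhtml
Allow: /dataverse/
Disallow: /api/
Disallow: /
"""


def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    landing_page = ("https://dataverse.example.edu/dataset.xhtml"
                    "?persistentId=doi:10.5072/FK2/EXAMPLE")
    print(googlebot_can_fetch(ROBOTS_TXT, landing_page))
```

If a dataset landing page comes back as not fetchable, that page cannot be indexed by Google Dataset Search no matter what else is configured.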

In the future, Dataverse will create a sitemap that you can submit to Google or other search engines to make it easier for them to index your datasets. Please see the "Dataverse discovery in Google - Machine Readable Sitemaps" issue at https://github.com/IQSS/dataverse/issues/4261

Feedback (on both code and documentation) is very welcome on the pull request I made on Friday to implement sitemap support for the issue above. You can see the proposed changes at https://github.com/IQSS/dataverse/pull/5084/files
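For readers unfamiliar with sitemaps: a sitemap is an XML file in the sitemaps.org format that lists page URLs (optionally with a last-modified date) so crawlers can find them without following links. The following is a generic sketch using Python's xml.etree.ElementTree, not the code from the pull request above; the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemaps.org <urlset> document from (url, lastmod) pairs.

    lastmod should be a W3C datetime string such as "2018-09-28",
    or None to omit the <lastmod> element for that URL.
    """
    # Serialize the sitemap namespace as the default xmlns (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://dataverse.example.edu/dataverse/demo", None),
        ("https://dataverse.example.edu/dataset.xhtml?persistentId=doi:10.5072/FK2/EXAMPLE",
         "2018-09-28"),
    ]))
```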

I hope this helps,

Phil

p.s. You might be interested in this previous thread about Google Dataset Search: https://groups.google.com/d/msg/dataverse-community/TlQPNI3Ip2E/srLf29aSBAAJ


Michel Bamouni

Oct 8, 2018, 5:24:00 AM
to Dataverse Users Community
Hi Phil,

So, if I want to be able to find my datasets in Google's new Dataset Search feature, there is actually no configuration to do on the Dataverse side.
All I have to do is allow search engines to crawl my data.

Michel

Philip Durbin

Oct 9, 2018, 7:29:25 AM
to dataverse...@googlegroups.com
That's correct. As of the latest release of Dataverse, which is 4.9.4, there is no configuration to do other than making sure search engines can crawl your Dataverse installation by opening up robots.txt, as described at http://guides.dataverse.org/en/4.9.4/installation/config.html#letting-search-engines-crawl-your-installation

In the next version of Dataverse (I'm not sure if this will be 4.9.5 or 4.10), you will need to do a little configuration to enable a new feature where Dataverse will create a machine-readable sitemap of dataverses and datasets. For now, you can read about the sitemap feature at https://github.com/IQSS/dataverse/blob/c80dc43daf372dd0956c8910b9c78d84eb55d6e9/doc/sphinx-guides/source/installation/config.rst#creating-a-sitemap-and-submitting-it-to-search-engines
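Besides submitting a sitemap to a search engine directly, the sitemaps.org protocol also allows robots.txt to advertise it with a `Sitemap:` line. This sketch does not reflect any particular Dataverse configuration; it only shows how to check which sitemap URLs a robots.txt file advertises, using urllib.robotparser's site_maps() method (Python 3.8 or later). The robots.txt rules and the sitemap URL are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a Sitemap directive (assumed URL).
ROBOTS_TXT_WITH_SITEMAP = """\
User-agent: *
Allow: /dataset.xhtml
Allow: /dataverse/
Disallow: /

Sitemap: https://dataverse.example.edu/sitemap.xml
"""


def advertised_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs listed in robots.txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(advertised_sitemaps(ROBOTS_TXT_WITH_SITEMAP))
```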

I hope this helps,

Phil

James Turitto

Feb 22, 2019, 12:23:04 PM
to Dataverse Users Community
Hi Philip, 

Do you know whether the new machine-readable sitemap feature has been enabled on Harvard Dataverse? We did a cursory check in Google Dataset Search and could not find any datasets from our dataverse (Abdul Latif Jameel Poverty Action Lab), which resides in the Harvard Dataverse installation.

Thanks!
James

danny...@g.harvard.edu

Feb 22, 2019, 2:21:09 PM
to Dataverse Users Community
Hey James,

Thanks for the report. We need to change our robots.txt file in Harvard Dataverse. We'll do it next week. We had been experimenting with some changes to the file because we had seen some instability that we thought was related to search crawler traffic. 

Thanks,

Danny

Philip Durbin

Feb 25, 2019, 1:34:40 PM
to dataverse...@googlegroups.com
Danny's right about robots.txt. Production installations of Dataverse should make sure that search engines aren't being blocked[1].

Additionally, if you are running Dataverse 4.10 or higher, you should enable a sitemap[2]. Dataverse 4.10 also includes improved Schema.org JSON-LD representations of datasets[3]. I know you didn't mention searching by ORCID iD, for example, but author identifiers and many other fields were added. I like using a browser add-on called "OpenLink Structured Data Sniffer" to look at them (screenshot attached), but you can also just "view source" on the dataset landing page to see the JSON-LD.
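The "view source" check can also be scripted. The sketch below uses Python's html.parser to pull the contents of every `<script type="application/ld+json">` element out of a page and parse each one as JSON. SAMPLE_HTML is a made-up, heavily trimmed stand-in for a dataset landing page, not Dataverse's actual output.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect and parse the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks; buffer it all.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical, trimmed landing page for illustration only.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset",
 "name": "Example dataset",
 "author": [{"name": "Doe, Jane", "affiliation": "Example University"}]}
</script>
</head><body></body></html>"""

if __name__ == "__main__":
    for block in extract_jsonld(SAMPLE_HTML):
        print(json.dumps(block, indent=2))
```

To check a live landing page instead, one could pass in the HTML returned by `urllib.request.urlopen(url).read().decode("utf-8")`.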

I hope this helps,

Phil


Screen Shot 2019-02-25 at 1.11.56 PM.png

Amber Leahey

Feb 26, 2019, 9:33:59 AM
to Dataverse Users Community
I just wanted to say that we started allowing Google to crawl our Dataverse site again after upgrading to 4.10.1, and the results have improved greatly compared with the previous time we enabled this. Thanks!

Philip Durbin

Mar 20, 2019, 12:42:00 PM
to dataverse...@googlegroups.com
Harvard Dataverse recently enabled a sitemap and I can finally find my dataset in Google Dataset Search! Screenshot attached. I think as recently as Monday I still couldn't find it but now I can. Hooray!

I already linked to the docs we have on enabling Dataverse's new sitemap feature but we plan to make improvements to those docs in https://github.com/IQSS/dataverse/issues/5639

Screen Shot 2019-03-20 at 12.34.10 PM.png

Amber Leahey

Apr 1, 2019, 12:32:37 PM
to Dataverse Users Community
Are you seeing "affiliation" show up correctly in these google dataset search results? 

We ran some tests on our Dataverse schema.org metadata, and Google doesn't seem to like "Affiliation".


Any ideas? Should this be improved?

Thanks in advance, 
Amber

Julian Gautier

Apr 1, 2019, 1:22:20 PM
to dataverse...@googlegroups.com
That structured data tool complains because the metadata doesn't say whether the author is a person or an organization, and Dataverse currently has no way of including that information. It's discussed in this GitHub issue: https://github.com/IQSS/dataverse/issues/5029.
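For illustration only, a typed author entry in schema.org JSON-LD could look like the output of this sketch. The is_person flag is a hypothetical input, since Dataverse does not currently record whether an author is a person or an organization. In schema.org, affiliation is a property of Person and expects an Organization, so the sketch attaches it only to persons.

```python
import json


def schema_org_author(name, is_person, affiliation=None):
    """Build a schema.org author object with an explicit @type.

    is_person is a hypothetical input: the person/organization distinction
    would have to come from somewhere outside the current metadata.
    """
    author = {"@type": "Person" if is_person else "Organization", "name": name}
    if affiliation and is_person:
        # schema.org Person.affiliation expects an Organization.
        author["affiliation"] = {"@type": "Organization", "name": affiliation}
    return author


if __name__ == "__main__":
    example = schema_org_author("Doe, Jane", is_person=True,
                                affiliation="Example University")
    print(json.dumps(example, indent=2))
```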

Regarding how that metadata is displayed in Google Dataset Search, as far as I can tell it still never shows author affiliation for any dataset it indexes. Have you seen examples of Dataset Search displaying affiliation?

Philipp at UiT

Sep 17, 2019, 5:22:46 AM
to Dataverse Users Community
We have followed your advice on enabling Google Dataset Search to harvest our metadata, but we are still experiencing some issues.

When we search Google Dataset Search for a fairly recently published dataset (https://doi.org/10.18710/UWP6LL), we get a hit when searching for (parts of) the title of the dataset. However, when we search for one of the authors, the dataset does not appear in the result list.

Searching for the data producer (e.g. UiT The Arctic University of Norway) returns only a small subset of our published datasets.

Do you know if this is due to issues in Google Dataset Search (beta!) or in the Dataverse software?
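One way to check the Dataverse side of this is to confirm that the author names actually appear in the dataset's JSON-LD, for example after extracting it from the landing page. The sketch below collects names from the author and creator properties of a schema.org Dataset dict. Which of these keys a given Dataverse version emits is an assumption here, and a name being present says nothing about how Google indexes it.

```python
def author_names(jsonld: dict) -> list:
    """Return unique author/creator names from a schema.org Dataset dict, in order."""
    names = []
    for key in ("author", "creator"):
        value = jsonld.get(key)
        if value is None:
            continue
        # A property may hold a single value or a list of values.
        items = value if isinstance(value, list) else [value]
        for item in items:
            # Each value may be an object with a "name" or a plain string.
            name = item.get("name") if isinstance(item, dict) else item
            if isinstance(name, str) and name not in names:
                names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical record for illustration only.
    record = {"@type": "Dataset",
              "author": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
              "creator": [{"name": "Doe, Jane"}]}
    print(author_names(record))
```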

Best, Philipp

Philip Durbin

Sep 17, 2019, 6:28:46 AM
to dataverse...@googlegroups.com