Data Explorer Demo


danny...@g.harvard.edu

Nov 8, 2016, 5:01:43 PM
to Dataverse Users Community
Hi everyone,

Big thanks to Amber and Kevin from SP for the Data Explorer demo on Tuesday's community call! It was great to see it in action and to hear the discussion afterwards. The notes from the call are here:


We've been doing some thinking internally about how we can better display tabular metadata, and the outcome of that IQSS meeting is here:


So, I'd like to generate some discussion - between the features demonstrated in the Data Explorer, the ideas in the ticket above, and any other ideas, what would be the most valuable for the researchers in your respective institutions and disciplines?  







Vyacheslav Tikhonov

Nov 8, 2016, 6:08:50 PM
to Dataverse Users Community

We (DANS) are also very interested in the Data Explorer. I'm not sure whether SP should call it an Explorer or a Data Frame; it's more of a dataset discovery tool, but it seems to be very useful for the community.


I've given it some thought and have some questions and suggestions:

- Is it an open source, API-based tool?

- Is it possible to connect it to other visualisations, such as D3.js?

- How can the same variables coming from different datasets be linked? It could be interesting for us to provide suggestions to users during the upload process.

- A word cloud could be an interesting feature for the front page or for a specific dataverse.

- Can it be used to link variables to controlled vocabularies?

 

-- 

Best regards,

Vyacheslav Tykhonov (Slava)

Senior Information Scientist,

Research and Innovation Department

vyachesla...@dans.knaw.nl

 

Data Archiving and Networked Services (DANS)

DANS offers durable access to digital research data.

Please visit www.dans.knaw.nl for more information and contact details.

DANS is an institute of KNAW and NWO.

 

DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509 AB Den Haag | +31 70 349 44 50 | in...@dans.knaw.nl | www.dans.knaw.nl

Kevin Worthington

Mar 1, 2017, 1:46:06 PM
to Dataverse Users Community
Thanks for the opportunity to present this project, Danny. We're really excited about the interest and the possibility of having it bundled with the master branch in a future Dataverse release. I understand that version 4.6.2 will have tabular data support for geospatial data, and I feel this would be a great opportunity to incorporate native support for visualizing ingested tab files too. Let me know what's needed to make this happen.

Hi Vyacheslav,
I've added my answers after each of your questions below.

- Is it an open source, API-based tool? Yes. The source code can be downloaded from https://github.com/scholarsportal/Dataverse-Data-Explorer and the data that powers it comes from the Dataverse Data Access API, as documented in the "Data Variable Metadata Access" section at http://guides.dataverse.org/en/latest/api/dataaccess.html

A second API call retrieves the variable frequency information, which is currently stored in the "prep" file. Since the frequencies are supposed to be shipped with the DDI metadata, this second call won't be needed in the future. A working group is being established to help complete the DDI metadata served by Dataverse.

The Data Explorer can be incorporated into the Dataverse interface, as we've done, or run stand-alone on your local host by passing it a remote URI, as demonstrated by the following link:
https://dataverse.scholarsportal.info/ddi_explore/index.html?uri=https://dataverse.scholarsportal.info/api/access/datafile/47298/metadata/ddi
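The two URLs in the answer above follow a simple pattern. A minimal Java sketch of building them, assuming only that pattern: the server, the file id 47298 and the `ddi_explore/index.html` path are taken from the example link, while the class and method names are hypothetical. The optional fetch uses `java.net.http.HttpClient`, so it needs Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; only the URL patterns come from the example link.
class DdiExplorerLinks {

    // Data Access API endpoint returning a datafile's DDI variable metadata.
    static String ddiMetadataUrl(String server, long fileId) {
        return server + "/api/access/datafile/" + fileId + "/metadata/ddi";
    }

    // Stand-alone Data Explorer page pointed at a remote DDI URI.
    // The value is appended unencoded, matching the example link.
    static String explorerUrl(String explorerPage, String ddiUrl) {
        return explorerPage + "?uri=" + ddiUrl;
    }

    // Download the DDI XML as a string (needs network access).
    static String fetchDdi(String ddiUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(ddiUrl)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        String server = "https://dataverse.scholarsportal.info";
        String ddi = ddiMetadataUrl(server, 47298);
        // Reproduces the stand-alone link quoted above.
        System.out.println(explorerUrl(server + "/ddi_explore/index.html", ddi));
    }
}
```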

- Is it possible to connect it to other visualisations, such as D3.js? I see no reason why not.

- How can the same variables coming from different datasets be linked? It could be interesting for us to provide suggestions to users during the upload process. Currently, the Data Explorer is powered by metadata at the file level. Showing a longitudinal study that allows variables to be compared across surveys could easily be done by pulling the DDI metadata at the dataset level. This was done in ODESI, another of our applications that uses the same Data Explorer codebase, and can be seen here: http://search1.odesi.ca/#/details?uri=%2Fdataverse_sp%2Fhdl--10864_10677.xml

Comparing variables across studies outside of a dataset is definitely of interest to us, and we'd be happy to discuss how this might look.
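A naive Java sketch of linking the same variables across files, assuming two file-level DDI documents such as those served by the Data Access API's DDI metadata endpoint. It treats variables as "the same" only when the `name` attributes of their `<var>` elements match exactly; that matching rule and the class name are assumptions for illustration, and a real comparison would likely also look at labels, types and value categories.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for comparing variable names between two DDI files.
class SharedVariables {

    // Collect the name attribute of every <var> element in a DDI document.
    static Set<String> variableNames(String ddiXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // DDI output may use a default namespace
        NodeList vars = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(ddiXml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagNameNS("*", "var"); // match <var> in any namespace
        Set<String> names = new LinkedHashSet<>();
        for (int i = 0; i < vars.getLength(); i++) {
            names.add(((Element) vars.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Variable names present in both files: a first cut at "the same variable".
    static Set<String> shared(String ddiA, String ddiB) throws Exception {
        Set<String> common = variableNames(ddiA);
        common.retainAll(variableNames(ddiB));
        return common;
    }
}
```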

- A word cloud could be an interesting feature for the front page or for a specific dataverse. An API call could be used to make this possible at the dataverse level, but I'm thinking it would be a great way to show responses to open-ended questions :)

- Can it be used to link variables to controlled vocabularies? If you have a controlled variable mapping structure, it would be possible to create these links.


Kevin

danny...@g.harvard.edu

Mar 1, 2017, 9:25:05 PM
to Dataverse Users Community
Hey Kevin - thanks for including this additional info. I pinged a few folks on the team who will be interested and will have comments/questions!

Philip Durbin

Mar 2, 2017, 9:20:17 AM
to dataverse...@googlegroups.com
A quick comment: there's a new 'Make "Explore" calls more modular' issue at https://github.com/IQSS/dataverse/issues/3657 that the team discussed yesterday. My understanding is that, ideally, each tool would provide a jar file that can be dropped into Glassfish to enable the Explore button for that tool. (We would provide guidance and examples of how to do this.) As for what goes into the master branch, maybe it's better to think of it as what goes into the "installer" (dvinstall.zip). Instead of just dataverse.war, the zip file could contain two-ravens.jar, geoconnect.jar, data-explorer.jar, etc. That's the idea, anyway. The plan would be to make use of the Service Provider Interface (SPI), per the "Creating Extensible Applications" tutorial at https://docs.oracle.com/javase/tutorial/ext/basics/spi.html

Currently we use SPI in the "export" code and you can see an example at https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/export/DublinCoreExporter.java

Currently, that DublinCoreExporter code has not been factored out as a separate jar, but that would be the next step in guiding developers on how to make Dataverse more modular and more extensible.
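For readers unfamiliar with SPI, a minimal Java sketch of the mechanism using `java.util.ServiceLoader`. The `ExploreTool` interface and `ExploreToolLoader` class are hypothetical, not Dataverse's actual API. A tool's jar would contain an implementing class plus a provider-configuration file at `META-INF/services/ExploreTool` (the interface's binary name) listing that class's fully qualified name; `ServiceLoader` then finds it on the classpath without any change to the core code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface that each "Explore" tool jar would implement.
interface ExploreTool {
    String name();
    boolean supports(String contentType);
}

class ExploreToolLoader {

    // Ask ServiceLoader for every ExploreTool provider registered on the
    // classpath via a META-INF/services/ExploreTool file.
    static List<ExploreTool> discover() {
        List<ExploreTool> tools = new ArrayList<>();
        for (ExploreTool tool : ServiceLoader.load(ExploreTool.class)) {
            tools.add(tool);
        }
        return tools;
    }

    // Keep only the tools that can handle a given file type.
    static List<ExploreTool> forContentType(List<ExploreTool> tools, String contentType) {
        List<ExploreTool> matches = new ArrayList<>();
        for (ExploreTool tool : tools) {
            if (tool.supports(contentType)) {
                matches.add(tool);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        for (ExploreTool tool : discover()) {
            System.out.println("Found explore tool: " + tool.name());
        }
    }
}
```

With no provider jars present, `discover()` simply returns an empty list, so the core code keeps working whether or not any tool is installed.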

I hope this makes sense. I guess it wasn't so quick of a comment after all. :)

This SPI stuff might be more on topic over at https://groups.google.com/forum/#!forum/dataverse-dev :)

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/663e731e-6ba7-46a8-a9d0-1bc034a40ab1%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.




Vyacheslav Tikhonov

Mar 2, 2017, 10:39:09 AM
to Dataverse Users Community, philip...@harvard.edu
Hi all,

I think the "Explore" button should lead to the various applications built around Dataverse, not only Data Explorer or TwoRavens, and the choice should depend on the file type. Basically, you can consider all of these applications as Virtual Research Environments connected to Dataverse that can run standardisation pipelines (for example), visualise content from a dataset, or do something else useful with files.

We also have an idea to connect Dataverse, through the Explore button, to OCR and NLP systems and to put the recognised text and named entities back into the original dataset, along with provenance information about the tools and services applied to it. There is also great potential to use the Explore button with different applications to analyse data quality; for example, to let users apply Benford's law to statistical data and produce a kind of quality ranking for the material they store in Dataverse.
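Benford's law predicts that, in many naturally occurring data sets spanning several orders of magnitude, the leading digit d (1 to 9) appears with probability log10(1 + 1/d), so about 30% of values start with 1. A minimal Java sketch of the kind of check described above, assuming the mean absolute deviation between observed and expected digit shares as the score; the class name and the scoring choice are hypothetical, and how such a score would feed into a quality ranking is left open.

```java
import java.util.List;

// Hypothetical Benford first-digit check for a column of numeric values.
class BenfordCheck {

    // Expected share of values whose leading digit is d (1..9).
    static double expected(int d) {
        return Math.log10(1.0 + 1.0 / d);
    }

    // Leading non-zero digit of |x|, or 0 for zero, NaN or infinite values.
    static int leadingDigit(double x) {
        double v = Math.abs(x);
        if (v == 0 || Double.isNaN(v) || Double.isInfinite(v)) {
            return 0;
        }
        while (v >= 10) v /= 10; // scale down into [1, 10)
        while (v < 1) v *= 10;   // scale up into [1, 10)
        return (int) v;
    }

    // Mean absolute deviation between observed and expected leading-digit
    // shares; smaller means closer to Benford. Values with no leading digit
    // are skipped; returns NaN if nothing is left.
    static double meanAbsoluteDeviation(List<Double> values) {
        int[] counts = new int[10];
        int n = 0;
        for (double x : values) {
            int d = leadingDigit(x);
            if (d > 0) {
                counts[d]++;
                n++;
            }
        }
        if (n == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (int d = 1; d <= 9; d++) {
            sum += Math.abs((double) counts[d] / n - expected(d));
        }
        return sum / 9;
    }
}
```

Note that a large deviation only flags a column for a closer look; many legitimate variables (ages, Likert scales, IDs) are not expected to follow Benford's law at all.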

I see Dataverse as a kind of Android or iOS for research datasets, with all the applications built around it as its App Store. There is also great potential to reuse existing data processing services without reinventing the wheel; we just need to think about RESTful APIs (or something similar) that will allow them to connect to Dataverse.

Best,
Slava

DataverseNL application manager

Durand, Gustavo

Mar 2, 2017, 1:01:05 PM
to dataverse...@googlegroups.com, Durbin, Philip
Hi all,

I was planning on sending an e-mail clarifying a few things, but Slava beat me to it! :)

But, generally, yes, the idea is that Dataverse is the repository for the datasets, and can connect to any and all external* tools.

* this includes TwoRavens, WorldMap, DataExplorer, and anything else that can talk to Dataverse via its APIs. In some cases, the tool that talks to Dataverse will be middleware that talks to the external tool via its APIs (e.g. GeoConnect for WorldMap)

The modularity behind the "Explore" button is about providing the link, on the Dataverse side, to these tools. Right now the code is hard-coded to work for TwoRavens and GeoConnect, but once the infrastructure is set up, an installation should be able to connect to any new tool:

In simpler cases, this would just involve adding a row to the database and using a Default Handler (for example, TwoRavens will likely work this way).

In more complex cases, there may also be a need to write a specific Handler class. This is where the SPI concept comes in, as this Handler would implement some interface. If it is a tool that we think many installations will use, we'll probably accept the handler as a pull request to the core code. However, and this is the critical piece in my opinion, if not, it should be possible to put the handler in its own jar, drop it in, restart the server, add the row to the db, and voila!: a new tool**. This way, you don't have to fork the core code to provide custom functionality, and you'll still be able to upgrade easily when we make new releases.

** assuming of course, that the tool has been properly set up on its side
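A minimal Java sketch of the two cases Gustavo describes, assuming a hypothetical `ExternalToolRow` standing in for the database row and a hypothetical `ToolHandler` interface. The `{fileId}`-style placeholders and the example.org URLs are illustrative only, not an actual Dataverse schema.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for the database row describing one external tool.
class ExternalToolRow {
    final String displayName;
    final String contentType; // file type the tool accepts
    final String urlTemplate; // placeholders written as {name}

    ExternalToolRow(String displayName, String contentType, String urlTemplate) {
        this.displayName = displayName;
        this.contentType = contentType;
        this.urlTemplate = urlTemplate;
    }
}

// Hypothetical handler interface; custom implementations could live in
// their own jar and be discovered through SPI.
interface ToolHandler {
    String launchUrl(ExternalToolRow row, Map<String, String> params);
}

// Default handler: fills {placeholders} from the row's template, no tool-specific code.
class DefaultToolHandler implements ToolHandler {
    public String launchUrl(ExternalToolRow row, Map<String, String> params) {
        String url = row.urlTemplate;
        for (Map.Entry<String, String> p : params.entrySet()) {
            url = url.replace("{" + p.getKey() + "}", transform(p.getValue()));
        }
        return url;
    }

    // Hook for subclasses; the default leaves values untouched.
    protected String transform(String value) {
        return value;
    }
}

// Example custom handler for a tool that expects URL-encoded parameters.
class EncodingToolHandler extends DefaultToolHandler {
    @Override
    protected String transform(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

In this sketch the default handler only fills in placeholders; the subclass marks where tool-specific behaviour would go, and a class like it is what could be packaged in its own jar.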

This is still in the proof-of-concept stage, so it may (and likely will) change somewhat, but that's the general idea.

(And note that we are planning on using this SPI concept for other functionality as well; eventually we'll have a document listing all the different areas that have these kinds of hooks. Right now, as Phil pointed out, it's used for export.)

Let me know your thoughts / questions.

Gustavo

Kevin Worthington

Mar 2, 2017, 2:19:30 PM
to Dataverse Users Community
I'm totally in support of using SPI to integrate the Data Explorer into Dataverse in a more formalized manner, though I wonder whether the ability to see the variables in a tab file could be incorporated within the Dataverse interface. A great example of this is presented in the Tabular Mapping documentation under the heading "1. Once "Map Data" is pressed, the user is brought to the following page:" https://github.com/IQSS/geoconnect/wiki/Tabular-Mapping#1-once-map-data-is-pressed-the-user-is-brought-to-the-following-page

Kevin


Durand, Gustavo

Mar 2, 2017, 2:58:53 PM
to dataverse...@googlegroups.com
Good idea, Kevin. I think we're tracking that already in:

and this other issue is related:


But I do think "how" we do it is closely tied to this whole discussion.


Vyacheslav Tikhonov

Mar 2, 2017, 5:11:07 PM
to Dataverse Users Community
Oh well, I think the problem here is much more complex than just using SPI to run some application; that's just one technical issue.

We already have a lot of beautiful applications integrated with Dataverse (see http://nlgis.nl and http://www.clio-infra.eu as examples), but the biggest pain is providing support and maintenance for them after the projects have finished and there is no money (or time) left. This brings us to the discussion about the sustainability of software, which is a very tough topic.

Given users' high expectations when applying tools to their data in Dataverse, we should consider licence agreements like those used by marketplaces, which guarantee support of the software for a defined period so that users can trust it. If something isn't working properly or new functionality is required, someone has to take action, and this pain point for all stakeholders can't be ignored.

I think the right first step in this direction is to create a registry of all apps developed for Dataverse, with information about the licence, the responsible people, and so on.
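A registry entry along the lines suggested above could be as small as the following Java sketch. The field set (including the support end date, which reflects the idea of guaranteed support for a defined period) and the class name are assumptions for illustration.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical registry entry for an app built around Dataverse.
class AppRegistryEntry {
    final String name;
    final String repositoryUrl;
    final String licence;
    final List<String> maintainers;  // responsible people or contacts
    final LocalDate supportedUntil;  // null: no support commitment recorded

    AppRegistryEntry(String name, String repositoryUrl, String licence,
                     List<String> maintainers, LocalDate supportedUntil) {
        this.name = name;
        this.repositoryUrl = repositoryUrl;
        this.licence = licence;
        this.maintainers = List.copyOf(maintainers);
        this.supportedUntil = supportedUntil;
    }

    // True if someone is named as responsible and the support window
    // (inclusive of its last day) covers the given date.
    boolean isSupportedOn(LocalDate date) {
        return !maintainers.isEmpty()
                && supportedUntil != null
                && !date.isAfter(supportedUntil);
    }
}
```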

Best,
Slava


On Thursday, March 2, 2017 at 8:58:53 PM UTC+1, Gustavo Durand wrote:
Good idea, Kevin. I think we're tracking that already in:

and this other issue is related:


But I do think "how" we do is very tied into this whole discussion.
On Thu, Mar 2, 2017 at 2:19 PM, Kevin Worthington <kwor...@gmail.com> wrote:
I'm totally in support of using SPI to integrate the Data Explorer into Dataverse in a more formalized manner, though I wonder if being able to see the variables in a tab file could be incorporated within the dataverse interface. A great example of this is presented within the Tabular Mapping documentation under the heading "1. Once "Map Data" is pressed, the user is brought to the following page:" https://github.com/IQSS/geoconnect/wiki/Tabular-Mapping#1-once-map-data-is-pressed-the-user-is-brought-to-the-following-page

Kevin

On Tuesday, November 8, 2016 at 5:01:43 PM UTC-5, danny...@g.harvard.edu wrote:
Hi everyone,

Big thanks to Amber and Kevin from SP for the Data Explorer demo on Tuesday's community call! It was great to see it in action and to hear the discussion afterwards. The notes from the call are here:


We've been doing some thinking internally about how we can better display tabular metadata, and the outcome of that IQSS meeting is here:


So, I'd like to generate some discussion - between the features demonstrated in the Data Explorer, the ideas in the ticket above, and any other ideas, what would be the most valuable for the researchers in your respective institutions and disciplines?  







--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.

Steven McEachern

Mar 2, 2017, 6:39:20 PM
to Dataverse Users Community
Hi all

The establishment of this set of possible tools sounds like a great development to me.

However, I think you're placing a lot of expectations on the "Explore" button. For example, if I have a CSV file, does my user want to use TwoRavens, DataExplorer, or some other tool? All would be legitimate options. I can imagine an "Explore using..." dropdown box, just like the one the Download button has.

This definitely needs some thought, as previous contributors have noted. User experience is going to be important as well.

Cheers
Steve

Durand, Gustavo

Mar 2, 2017, 11:10:30 PM
to dataverse...@googlegroups.com
Slava, yes, there's a major policy / management aspect to this - I am mostly just focused on the technical side, so we can enable those discussions to then happen.

Steve, agreed. The idea I have for the modular "Explore" button* is to be a drop down similar to download (when there are multiple options), and differing depending on the file type. I forget that not everyone sees what's in my head! :)

* and, of course, we'll also get our design team involved to test user experience
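The rendering rule described above (a dropdown only when there are multiple options, varying by file type) can be sketched in a few lines of Java. The map from content type to tool names, the class name, and the tool names in the test are hypothetical; in practice the list would come from whatever tool configuration the installation stores.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: how the Explore control could render for one file.
class ExploreMenu {

    enum Kind { HIDDEN, BUTTON, DROPDOWN }

    // Tools configured for the file's content type (empty if none).
    static List<String> toolsFor(Map<String, List<String>> toolsByType, String contentType) {
        return toolsByType.getOrDefault(contentType, List.of());
    }

    // No tool: hide the control. One tool: a plain "Explore" button.
    // Several tools: an "Explore using..." dropdown, like Download.
    static Kind render(List<String> tools) {
        if (tools.isEmpty()) {
            return Kind.HIDDEN;
        }
        return tools.size() == 1 ? Kind.BUTTON : Kind.DROPDOWN;
    }
}
```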


Vyacheslav Tikhonov

Mar 3, 2017, 1:56:03 AM
to Dataverse Users Community
Hi Steve,

Yes, this is what I'm also saying. A registry should be created of tools that can be used/reused with Dataverse, preferably in Harvard Dataverse itself.

In one of our projects, we're working on a semi-automatic overview of everything produced by all partners (more than 40), using Dataverse to track all activities:
- for tools, we're planning to automatically monitor GitHub repositories added to Dataverse by stakeholders, to get the latest releases and place them in Dataverse
- all research datasets produced by the project's community
- publications and presentations from community members
- video and audio content
- news about the project and PR activity

In principle, it should be a universal solution for getting an overview of any open source project, and if the tools (VREs) have APIs, there is an opportunity to connect them to Dataverse.
Basically, your "Explore using..." button should go here.

Best,
Slava

Mercè Crosas

Mar 3, 2017, 9:58:53 AM
to dataverse...@googlegroups.com
Great conversation! 

I also agree with the Explore button evolving into a dropdown with a number of potential options for exploring, visualizing and analyzing. Those options should be enabled not only at the Dataverse installation level, but also at the dataset level.

These types of options at the dataset level will expand even further once we have the option for "access to compute in the cloud" or similar features (related to Cloud Dataverse and Data Access Alliance). In this case, however, "access to compute" might apply to the entire dataset, with access to all the files associated with that dataset at once.
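One way to read "enabled at the installation level and at the dataset level" is as a filter: the installation defines which tools are available, and a dataset may narrow that set. A minimal Java sketch of that assumed policy; the class name and the null-means-inherit rule are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical policy: the installation decides which tools are available;
// a dataset may restrict that set. A null dataset setting means "inherit".
class ToolEnablement {

    static Set<String> effectiveTools(Set<String> installationEnabled,
                                      Set<String> datasetEnabled) {
        Set<String> result = new LinkedHashSet<>(installationEnabled);
        if (datasetEnabled != null) {
            // A dataset cannot enable a tool the installation has not enabled.
            result.retainAll(datasetEnabled);
        }
        return result;
    }
}
```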

Merce


----------
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University
@mercecrosas mercecrosas.com


Philip Durbin

Mar 3, 2017, 7:15:43 PM
to dataverse...@googlegroups.com
Yes, great discussion! A few things.

The "registry of tools" (or apps) idea reminds me of the "Appiverse" meme from the 2015 Dataverse Community Meeting. I guess you had to be there, but there are some traces* of the idea at https://twitter.com/bencomp/status/609085921229705218 and in the 'Determine how best to present which "apps" to show for which files' issue at https://github.com/IQSS/dataverse/issues/2269 . Thanks, Slava, for reminding us of this.

For my part, when I wrote http://guides.dataverse.org/en/4.6/api/apps.html I was trying to highlight a few "apps" that integrated with Dataverse as open source examples of how to integrate. The list includes TwoRavens but also integrations that have nothing to do with the "Explore" button such as OJS and OSF. Monkey see, monkey do. Look at some open source apps to see how to integrate with Dataverse... then go build your own. I'd be happy to add more open source apps to that list in the API Guide.

There's also a new page at http://dataverse.org/integrations that covers similar ground... apps that integrate with Dataverse such as TwoRavens, WorldMap, OSF, OJS, RSpace, SHARE, etc. Again, some of these integrations are "Explore" button territory and some are not. (Don't miss the "future integrations" tab!)

I guess I'm trying to say that we already have a couple of places to list tools and apps. Maybe we need a third. I'm not sure. :)

I hear you Slava, about the "Sustainability of Software" stuff and I agree it can be a tough topic, especially for open source. Recently I started listening to the "Request For Commits" podcast at https://changelog.com/rfc which is all about "exploring different perspectives in open source sustainability." One of the co-hosts, Nadia Eghbal, recently published "Roads and Bridges" about the topic of open source sustainability and I've been meaning to read it: http://www.fordfoundation.org/library/reports-and-studies/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure

Good stuff. Keep up the chatter. Have a good weekend!

Phil

 
