Benjamin Peuch

May 26, 2020, 8:16:29 AM
to Dataverse Users Community
Hello everybody,

Because of the GDPR and other legal obligations to look into how we handle our users' personal information, we've been studying how Dataverse stores personally identifiable information (PII) in order to monitor personal data collection processes and come up with a comprehensive, transparent policy about it.

Overall, it seems to us that Dataverse automatically stores little personal data and that it does so in a very open, unambiguous manner. Users are seldom unaware that Dataverse collects information about them since, most of the time, the software explicitly asks them to provide such information.

We have identified the following elements. I am hoping to get community input and feedback on this list.

1. User-related / Account information

What information?
First name, surname and email address (mandatory information) + affiliation and position (optional) + (potentially PII) username (e.g. "benjamin_peuch").

Where does it come from?
Users either encode this information themselves when they sign up, or it is imported from another database if they use their institutional login.

What does Dataverse do with this information besides recording it?
Mostly, Dataverse uses a user's email address to send them notifications when they have created a dataset, when their dataset was published or when they have received a new role. Dataverse also automatically fills in the Depositor field with the user's surname and first name, and the user is free to change that unless this field was hidden from view by the admin.

How is the information protected?
Dataverse does not disseminate its users' email addresses: neither are they available e.g. on a public list or on public user pages, nor are they harvestable by crawlers. There is a list with user info, but it is available only to Dataverse administrators and super users via their dashboard.

2. Email address(es) of a (sub-)Dataverse's administrator(s)

What information?
One or more email addresses.

Where does it come from?
Provided by a (sub-)Dataverse's administrator through the GUI (homepage > Edit > General Information).

What does Dataverse do with this information besides recording it?
Dataverse sends notifications to the email address(es) as the software monitors general activity on the Web application (creating datasets, publishing datasets, file ingest…).

How is the information protected?
Dataverse does not disseminate the email address(es) of its administrator(s) via the GUI: neither are they available e.g. on a public list or on public admin account pages. They do appear in the robots.txt file, so they are theoretically harvestable by crawlers. However, said file can be easily encrypted to prevent this.

3. People identified in a dataset's metadata

What information?
Information about various people related to a study or the ensuing dataset.

Where does it come from?
Provided by depositors.

What does Dataverse do with this information besides recording it?
This information is made public as soon as a dataset is published (a tautology, considering the meaning of the verb publish).

How is the information protected?
The email address(es) encoded in the Contact field(s) are not available as plain data: they only appear if a metadata record is generated. They are therefore not harvestable by the usual crawlers. As for the other PII, it does not require "protection" considering its purpose is to be made public. However, Dataverse administrators have a responsibility regarding the content published (whether by them or directly by their users) on their platform, so they should take the necessary precautions to ensure that the published personal information is not problematic.


We are also wondering which personal data Dataverse might be automatically recording outside of the GUI.

It is stated in AUSSDA's Privacy Policy that a Web server "generates log files that include the IP address of [the user's] computer." Is this an institution-specific Web server or is it a feature of Dataverse?

I know that IP addresses are not recorded in Dataverse's general server log file (located at /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log) but maybe they are saved somewhere else?
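For anyone who wants to check a given log file themselves, here is a minimal Python sketch that scans lines for strings that look like IPv4 addresses. It is only an illustration: the function names are hypothetical, the pattern ignores IPv6, and dotted version numbers with four parts would show up as false positives, so any hit needs to be read in context.

```python
import re
from pathlib import Path

# Loose IPv4 pattern: four dot-separated groups of 1 to 3 digits.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_ipv4(candidate):
    """Return True if every dot-separated part is in the range 0..255."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))


def find_ipv4(lines):
    """Yield (line_number, address) for each plausible IPv4 address."""
    for number, line in enumerate(lines, start=1):
        for match in IPV4.findall(line):
            if is_ipv4(match):
                yield number, match


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return list(find_ipv4(handle))
```

Calling scan_log() on a log path returns a list of (line number, address) pairs. An empty list only means that this particular file contains no IPv4-looking strings; it says nothing about other log files on the server.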

Thanks for your time.

Philip Durbin

May 26, 2020, 10:25:30 AM
to dataverse...@googlegroups.com
Email addresses are exposed to harvesting/export but you can set :ExcludeEmailFromExport to true to prevent this. For details on this, please see http://guides.dataverse.org/en/4.20/installation/config.html#email-privacy
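As a rough sketch of what setting this looks like, the Python snippet below builds the PUT request against the admin settings API. It assumes the installation is reachable at http://localhost:8080 and that the admin API is accessible from where the script runs; the helper name is hypothetical, and the linked guide remains the authoritative reference.

```python
import urllib.request


def build_setting_request(base_url, name, value):
    """Build (but do not send) a PUT request for a database setting."""
    url = f"{base_url.rstrip('/')}/api/admin/settings/{name}"
    return urllib.request.Request(url, data=value.encode("utf-8"), method="PUT")


# Assumption: a local installation listening on port 8080.
request = build_setting_request(
    "http://localhost:8080", ":ExcludeEmailFromExport", "true"
)

# To actually apply the setting, send the request:
# urllib.request.urlopen(request)
```

The request is only constructed here, so nothing changes on the server until the last line is uncommented.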

Guestbooks also store the authenticateduser_id (database id of the user who downloaded the data). See the "guestbookresponse" table. Here's the schema: http://phoenix.dataverse.org/schemaspy/latest/tables/guestbookresponse.html

Julian Gautier

May 26, 2020, 10:33:01 AM
to Dataverse Users Community
2. Email address(es) of a (sub-)Dataverse's administrator(s)

How is the information protected?
Dataverse does not disseminate the email address(es) of its administrator(s) via the GUI: neither are they available e.g. on a public list or on public admin account pages. They do appear in the robots.txt file, so they are theoretically harvestable by crawlers. However, said file can be easily encrypted to prevent this.

I'm not sure if this is covered by what you're seeing in the robots.txt file, but the email addresses of (sub-)Dataverse admins are also shown in the Native API "view dataverse" endpoint, like https://demo.dataverse.org/api/dataverses/juliangautier.
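To see which addresses that endpoint exposes for a given collection, something like the following Python sketch could be used. The field names ("data", "dataverseContacts", "contactEmail") are my assumption about the shape of the JSON response and should be checked against a real response from your installation; the sample payload and the function names are made up for illustration.

```python
import json
import urllib.request


def contact_emails(payload):
    """Extract contact emails from a parsed "view dataverse" response."""
    contacts = payload.get("data", {}).get("dataverseContacts", [])
    return [c["contactEmail"] for c in contacts if "contactEmail" in c]


def fetch_dataverse(base_url, alias):
    """Fetch and parse the "view dataverse" response for one alias."""
    url = f"{base_url.rstrip('/')}/api/dataverses/{alias}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Illustrative payload only, not copied from a real server.
sample = {
    "status": "OK",
    "data": {
        "alias": "example",
        "dataverseContacts": [
            {"displayOrder": 0, "contactEmail": "admin@example.org"}
        ],
    },
}
```

For a live check, contact_emails(fetch_dataverse("https://demo.dataverse.org", "juliangautier")) would list whatever that endpoint currently returns.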

Benjamin Peuch

May 27, 2020, 4:40:28 AM
to Dataverse Users Community
Thanks a lot for your input, Philip & Julian!