Production Date: Date range?


Leonhard Maylein

Dec 5, 2018, 9:44:14 AM
to Dataverse Users Community

Is it possible to allow date ranges for the production date in future releases?

Or is there a better way to record that the production of the data lasted several months or years?


Julian Gautier

Dec 5, 2018, 10:36:03 AM
to Dataverse Users Community
Hi Leonhard,

Could you share some more about what kind of data it is, how it's being collected and what factors go into deciding when the data is deposited into a Dataverse repository and published? After the data is created/collected and published, will more data in the study be collected and published?

In August, a working group representing several instances of Dataverse repositories across Canada started a discussion in another Google Groups thread about clarifying the date metadata (https://groups.google.com/forum/#!topic/dataverse-community/n4I-bn1ukyQ), and they called out production date as one of the confusing fields. So I'm hoping we can learn about your use case and what would make sense.

Thanks!
Julian

Leonhard Maylein

Dec 6, 2018, 4:21:26 AM
to Dataverse Users Community
Hi Julian,

We provide datasets from a lot of disciplines. I think in the humanities it is not unusual for data to be collected over several years.

Here you will find an example (unfortunately in German): https://doi.org/10.11588/data/H2ILIH

This is another example:
The correct production date: 2012-2018

Leonhard

Sherry Lake

Dec 6, 2018, 9:35:12 AM
to Dataverse Users Community
Hi Leonhard,

If you read the link that Julian sent, you will see there are many interpretations of the many date metadata fields in Dataverse.

The "Production Date" (according to Dataverse) is when all of the data and documentation were put together (packaged). It is not a required field, so you don't have to use it.

In your most recent email (below), you say "data is collected over several years". Date of Collection (which can be a range) is what you are talking about, not Production Date. Dataverse has fields for "Date of Collection". It is not part of the initial set of fields when you are creating a dataset, but is part of the additional fields you can add once your dataset is created. (Edit metadata once the dataset has been saved).

Hope this helps -
Sherry Lake

Crosas, Mercè

Dec 6, 2018, 9:39:10 AM
to dataverse...@googlegroups.com
I agree with Sherry. I was going to make a similar comment: you should use Date of Collection for what you are trying to describe.

Best,
Merce

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/f045feea-037d-4ed1-86b5-b6c3763fa5fb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University

Leonhard Maylein

Dec 6, 2018, 9:51:48 AM
to Dataverse Users Community
Okay, thanks for the clarification.
"Production Date" only refers to the published dataset, not to the data themselves.

Maybe the descriptions in the linked document don't fit such things like "transcriptions" (https://doi.org/10.11588/data/PEFKJM) because they are not "collected" but also "produced".

It seems to me that "Dates of collection", "Publication Date", "Date of deposit" and "Date of distribution" are much more important than the "Production Date".
In case of text publications (articles etc.) the "production date" does not matter. Or am I wrong?

Sherry Lake

Dec 6, 2018, 11:27:40 AM
to dataverse...@googlegroups.com
If you are going to use "Date of Collection" and want to see it in the Summary Metadata fields section, where you have your current "Note" (as I see in this example: https://doi.org/10.11588/data/H2ILIH), you can change which fields appear there with the :CustomDatasetSummaryFields setting.

By default, running this command replaces the current default summary, so if you want to add the field dateOfCollection (and keep the others), do it this way:

curl http://localhost:8080/api/admin/settings/:CustomDatasetSummaryFields -X PUT -d 'dsDescription,subject,keyword,publication,notesText,dateOfCollection'
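
Because the PUT replaces the whole list, it may be safer to build the value from the fields you want to keep and only then send it. Here is a minimal sketch, assuming a POSIX shell. The helper name `append_summary_field` is hypothetical and not part of Dataverse, and the starting list is simply the one from the command above without dateOfCollection.

```shell
# Hypothetical helper: append a field to a comma-separated list
# unless the list already contains it.
append_summary_field() {
  current="$1"
  field="$2"
  case ",$current," in
    *",$field,"*) printf '%s\n' "$current" ;;                 # already present
    *) printf '%s\n' "${current:+$current,}$field" ;;         # append (no leading comma if empty)
  esac
}

fields=$(append_summary_field 'dsDescription,subject,keyword,publication,notesText' 'dateOfCollection')
echo "$fields"

# Then send the result with the same command as above:
# curl http://localhost:8080/api/admin/settings/:CustomDatasetSummaryFields -X PUT -d "$fields"
```

Running the helper twice with the same field leaves the list unchanged, so the command can be repeated without duplicating entries.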





Leonhard Maylein

Dec 7, 2018, 2:44:49 AM
to Dataverse Users Community
Thanks, this information is valuable.

Julian Gautier

Dec 7, 2018, 1:34:27 PM
to Dataverse Users Community
I agree that the descriptions of these date fields, and maybe even the field names, can be improved. I see that as one outcome of the discussion in the other Google Groups thread. And the thinking you've shared so far about the purpose of the "Production Date" field versus "Date of Collection" is really helpful. I agree when you write that "Production Date" refers only to the published dataset, not to the data themselves. So the goal, I think, is to describe that metadata field in a way that makes this clearer.

I'm hoping we can continue clarifying these two fields with examples. Without examples the discussion can get frustratingly semantic (at least for me, such as "what does it mean to collect versus produce").  

You wrote that

Maybe the descriptions in the linked document don't fit such things like "transcriptions" (https://doi.org/10.11588/data/PEFKJM) because they are not "collected" but also "produced".

For that dataset, my thinking is that we could use "Date of Collection" to record:
  • the time duration during which the manuscript was being transcribed
  • or the time duration during which the text was being digitized
I think it depends on the purpose of the dataset and what you think is more relevant to record. Do you agree?

Then, as Sherry pointed out, the "Production Date" would be when all of the data and documentation were put together (packaged).

I also wanted to ask about your comment:
In case of text publications (articles etc.) the "production date" does not matter. Or am I wrong?

By "articles," do you mean published journal articles? If so, I agree, I think it would be very uncommon to record a "production date" in the way I think Dataverse defines it. Of course, Dataverse isn't designed to describe journal articles, and you've referred to transcriptions, which I would also call a text publication. So I just wanted to ask what you meant by articles and text publications.

Philipp at UiT

Dec 9, 2018, 12:17:50 AM
to Dataverse Users Community
We usually encourage researchers to fill in the field "Collection Date". I have interpreted this field to cover the period when the data were collected for the project which the deposited dataset is about. In some cases, these data may be a subset of a larger collection of data which had been collected in an earlier period. Let me illustrate this with a dataset that I have published myself: The dataset consists of annotated sentences of written Norwegian. I collected these sentences from a corpus of written Norwegian in 2016, so I specified the field "Date of Collection" as Start: 2016-07-07; End: 2016-10-23. However, the data that my dataset is a subset of, i.e. the data in the corpus of written Norwegian, were collected at some time before 2011. After having read the Google Groups threads on the different date fields in Dataverse, I'm not quite sure anymore whether this interpretation is correct.
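
For reference, a start/end pair like this might look roughly as follows in a dataset's JSON metadata. This is only a sketch: it assumes the citation block field names `dateOfCollection`, `dateOfCollectionStart` and `dateOfCollectionEnd`, which should be checked against your own installation, and the helper name `date_of_collection_json` is hypothetical.

```shell
# Hypothetical helper: print a "dateOfCollection" compound field
# for the given start and end dates (YYYY-MM-DD).
# Field names are assumed from the citation metadata block.
date_of_collection_json() {
  start="$1"
  end="$2"
  cat <<EOF
{
  "typeName": "dateOfCollection",
  "multiple": true,
  "typeClass": "compound",
  "value": [
    {
      "dateOfCollectionStart": {
        "typeName": "dateOfCollectionStart",
        "multiple": false,
        "typeClass": "primitive",
        "value": "$start"
      },
      "dateOfCollectionEnd": {
        "typeName": "dateOfCollectionEnd",
        "multiple": false,
        "typeClass": "primitive",
        "value": "$end"
      }
    }
  ]
}
EOF
}

date_of_collection_json 2016-07-07 2016-10-23
```

In this sketch the range is stored as two separate dates, which is what lets "Date of Collection" express a period.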

Best,
Philipp

Julian Gautier

Dec 10, 2018, 12:24:57 PM
to Dataverse Users Community
Thanks Philipp. My opinion is that your first interpretation is best: the date of collection for the dataset of written Norwegian would be in 2016, when this subset derived from the corpus was collected, and not the dates when the corpus was collected. The tooltip text for "Date of Collection" right now is "Contains the date(s) when the data were collected." Would it be helpful if the description was "Contains the date(s) when this dataset was collected."? Do you think that would clarify which collection activity dates should be used for "Date of Collection"?

Philipp at UiT

Dec 10, 2018, 10:41:53 PM
to Dataverse Users Community
Thanks Julian. Yes, maybe putting "this dataset" in bold would help.