"Format" property, datasets and statistical packages


Domingo Scisci

Jul 26, 2016, 4:38:04 PM
to DataCite Metadata

Hi all,
I'm writing from UniData, the Social Science Data Archive of the University of Milano-Bicocca. We're implementing the DOI service for our research datasets and I have a question regarding the "Format" property. Most of the data we distribute are in SPSS, Stata, SAS, or R formats (these are statistical packages, in case you don't know them). The DataCite documentation suggests using the file extension or the MIME type. The first option seems OK, but I think it is ambiguous: for instance, several programs besides SPSS use the "SAV" extension. The second doesn't account for the fact that there is no complete list of available MIME types (for instance, the IANA list does not contain any of the aforementioned software). So I'm thinking of using the software name and its version in the "Format" property, like:
<Format>SPSS (22)</Format>
<Format>Stata (14)</Format>
<Format>SAS (9.4)</Format>
<Format>R (3.3.1)</Format>
That way, I think we can give users clear information about the software (and version) used to build the data files.
What do you think? Is it too far outside the DataCite schema? Do you know what other research data archives are using?
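
As a rough illustration of the proposal above, the values could be generated rather than typed by hand. This is only a sketch: it assumes Python's standard `xml.etree.ElementTree`, and the `formats` wrapper element, the lowercase `format` tag, and the `build_formats` helper are assumptions made for the example, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Software name and version pairs taken from the examples above.
PACKAGES = [("SPSS", "22"), ("Stata", "14"), ("SAS", "9.4"), ("R", "3.3.1")]


def build_formats(packages):
    """Return a <formats> element with one <format> child per (name, version) pair.

    The wrapper element name is an assumption made for this sketch.
    """
    formats = ET.Element("formats")
    for name, version in packages:
        child = ET.SubElement(formats, "format")
        # Free-text value in the "Name (version)" style proposed above.
        child.text = f"{name} ({version})"
    return formats


formats_xml = ET.tostring(build_formats(PACKAGES), encoding="unicode")
print(formats_xml)
```

Keeping the name and version as separate fields in the source list makes it easy to switch to another convention later without retyping every record.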

Thank you very much.
Kind regards,

Domingo Scisci
UniData - Bicocca Data Archive
University of Milano-Bicocca

Joan Starr

Jul 27, 2016, 11:18:47 AM
to DataCite Metadata
Hi Domingo,
You can get an idea about what other data centres are doing by using the "old" DataCite Metadata Store search tool (http://search.datacite.org/ui) and choosing the Advanced option. You will see that you can search by the Format property. I just did a search for DOIs with the value "R" in Format: http://search.datacite.org/ui?q=format%3AR&fq=&fq=&fq=&fq=&fq=&fq=

You might do more searching and explore this.
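
For repeated searches, the query URL can be built programmatically. A minimal sketch, assuming only the `q=format:<value>` pattern visible in the link above; the `format_search_url` helper is hypothetical, the empty `fq=` parameters are left out, and the sketch does not check whether the search tool still responds at this address.

```python
from urllib.parse import urlencode

# Base address copied from the link above.
BASE_URL = "http://search.datacite.org/ui"


def format_search_url(value):
    """Build a search URL for records whose Format property matches `value`."""
    # urlencode percent-encodes the colon, giving q=format%3A<value>.
    return BASE_URL + "?" + urlencode({"q": f"format:{value}"})


url = format_search_url("R")
print(url)
```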

Best,
Joan Starr
Co-chair, DataCite Metadata Working Group

Domingo Scisci

Aug 16, 2016, 5:01:35 AM
to DataCite Metadata
Hi Joan,
thank you for your reply. I did some searching and I must say there is no standard way of indicating dataset formats. To stick to the schema, I decided to use MIME types. In my opinion, it could be a good idea to add the software version to the "format" property as an attribute, like:
<format version="22">application/x-spss-sav</format>
<format version="13">application/x-stata-dta</format>
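
A small sketch of how such elements could be produced, assuming Python's standard `xml.etree.ElementTree`. Note that the `version` attribute is only the suggestion made here, not an existing part of the DataCite schema, and the `format_element` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# MIME type and software version pairs taken from the examples above.
FILES = [("application/x-spss-sav", "22"), ("application/x-stata-dta", "13")]


def format_element(mime_type, version):
    """Return a <format> element carrying the proposed (non-standard) version attribute."""
    elem = ET.Element("format", {"version": version})
    elem.text = mime_type
    return elem


lines = [ET.tostring(format_element(m, v), encoding="unicode") for m, v in FILES]
print("\n".join(lines))
```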

Thank you,
Domingo

--
Domingo Scisci
UniData - Bicocca Data Archive
University of Milano-Bicocca

Joan Starr

Aug 16, 2016, 10:36:01 AM
to DataCite Metadata
Hi Domingo,
Thank you for your suggestion. I will bring it to the small group that is considering changes we can make to the schema to support software citation.
Best regards,

Joan Starr
Co-chair, DataCite Metadata Working Group
