Two questions about the Dataverse BagIt implementation

Menko de Ruijter

Dec 12, 2019, 5:20:07 AM
to Dataverse Users Community
Hello all,

Here at DANS I have been asked to look into the Dataverse (version 4.18) implementation of BagIt.
It's part of some research that's being done for an Innovation project here at DANS. 
We would like to get more detailed knowledge about the Research Data Alliance specification that's being used for the bag. 
 
Firstly, a question regarding one of the definitions in the BagPack Format Description: 
The according profile MUST be referenced by a property BagIt-Profile-Identifier inside bag-info.txt

[Attached image: BagItQ1.png]

Our first question is why the previously mentioned definition doesn't seem to be implemented in the Dataverse bag-info.txt?

Lastly, we would like to know why the dataset title is used as a subfolder containing all the dataset files inside the data directory?
One of our concerns is that some dataset titles from our archives can be very long and complicated, so the extra folder seems a bit unnecessary.

[Attached image: BagItQ2.png]

Thanks in advance!


Menko

James Myers

Dec 12, 2019, 8:40:01 AM
to dataverse...@googlegroups.com

Menko,

 

First – a disclaimer. The current implementation is “just a version 1”, so that’s part of any answer, and any discussion of issues or improvements is very welcome.

 

For profiles, the recommendation is only that a BagPack SHOULD have a profile, and I assume that the “MUST be referenced” line only applies if you do have one. My personal opinion is that profiles at the bag level aren’t that useful and that, if they are provided at all, they should really be made available per repository. For example, profiles include things such as Allow-Fetch, which indicates whether a Bag can or can’t have a fetch file pointing to external data. A single bag either does or doesn’t have such a file, so a profile entry indicating whether it’s allowed seems to be of limited value, especially if profiles are generic with lots of optional items. It might be more useful to know whether Bags from a given source are ever going to have fetch files, for instance if you wanted to decide whether to harvest from that source. However, the fact that the existing Dataverse implementation doesn’t have a profile is as much due to it being optional as anything, so I would definitely be interested to hear what you/others think about the value of having one.
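
For anyone who wants to check their own exported bags for this property, here is a minimal Python sketch. It is not part of Dataverse; it only assumes the usual bag-info.txt layout of "Label: value" lines with indented continuation lines. The function names, the sample contents, and the example.org profile URL are all made up for illustration, and matching the label case-insensitively is a deliberately lenient choice.

```python
def parse_bag_info(text):
    """Parse bag-info.txt content into a list of (label, value) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and entries:
            # Indented line: continuation of the previous value.
            label, value = entries[-1]
            entries[-1] = (label, value + " " + line.strip())
            continue
        label, sep, value = line.partition(":")
        if sep:
            entries.append((label.strip(), value.strip()))
    return entries


def profile_identifiers(text):
    """Return every BagIt-Profile-Identifier value found (possibly none)."""
    return [
        value
        for label, value in parse_bag_info(text)
        if label.lower() == "bagit-profile-identifier"
    ]


# Hypothetical bag-info.txt contents, for illustration only.
with_profile = (
    "Source-Organization: Example Repository\n"
    "Bagging-Date: 2019-12-12\n"
    "BagIt-Profile-Identifier: https://example.org/profiles/example-profile.json\n"
)
without_profile = (
    "Source-Organization: Example Repository\n"
    "External-Description: A long description\n"
    "  that continues on the next line\n"
)

print(profile_identifiers(with_profile))
print(profile_identifiers(without_profile))
```

An empty result simply means the bag does not reference a profile, which is consistent with the SHOULD reading above.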

 

For the structure inside the data directory: my understanding is that the Bag specification and RDA recommendation don’t require anything here, so any structure is arbitrary. Nominally, all the information about file names, path structure, and, as you note, the overall title of the dataset, are recorded in metadata, so none of them are required in the data directory. I think my choice was to just make it convenient to be able to unzip the Bag and go to the data dir and be able to copy the contents out to your disk without having to create a top-level directory first. That said, removing the dataset title directory, or removing the path structure from the dataset, or even shortening file names are all allowable – that information is all in the ORE map and other files. Automated processing (e.g. for import) definitely has to follow the metadata rather than the direct physical structure in the data dir, so the question is really about what is convenient for manual inspection/use.
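
As a rough illustration of how a consumer could ignore the title folder for manual use, the Python sketch below reads payload paths from a manifest in the standard "checksum path" line format and drops a single shared top-level folder under data/ when there is one. This is only a convenience sketch, not how Dataverse builds or reads bags; the function names and the sample manifest are invented, and automated import should still follow the ORE map rather than paths derived this way.

```python
from pathlib import PurePosixPath


def payload_paths(manifest_text):
    """Extract the payload paths from manifest lines of the form 'checksum path'."""
    paths = []
    for line in manifest_text.splitlines():
        parts = line.split(None, 1)  # checksum, then the rest of the line
        if len(parts) == 2:
            paths.append(PurePosixPath(parts[1].strip()))
    return paths


def without_title_dir(paths):
    """Return paths relative to data/, minus a single shared top-level folder."""
    relative = [p.relative_to("data") for p in paths]
    top_levels = {p.parts[0] for p in relative if len(p.parts) > 1}
    every_file_nested = all(len(p.parts) > 1 for p in relative)
    if len(top_levels) == 1 and every_file_nested:
        # All files sit under one folder (e.g. the dataset title): strip it.
        return [PurePosixPath(*p.parts[1:]) for p in relative]
    return relative


# Hypothetical manifest contents, for illustration only.
sample_manifest = (
    "abc123  data/A Very Long Dataset Title/readme.txt\n"
    "def456  data/A Very Long Dataset Title/tables/results.csv\n"
)

for path in without_title_dir(payload_paths(sample_manifest)):
    print(path)
```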

 

For all of these types of questions that come up, I’m going to start a Google Doc or wiki page to get community feedback, so stay tuned. For something like having the dataset title as a dir inside /data, the development-level question is basically whether the community can agree to always have it, to never have it, or to make it a configuration option. As we move toward making it possible to ingest Bags as well as export them, we can start looking at other improvements as well.

 

Thanks,

   -- Jim

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/54759f09-6ab1-4a77-a100-6d60f3ba50a8%40googlegroups.com.

Menko de Ruijter

Dec 13, 2019, 5:41:10 AM
to Dataverse Users Community
Hello Jim,

Thanks for clearing that up!
I think we missed the connection with the previous "SHOULD have a profile" line, and thought the "MUST be referenced" line meant the bag had to state that it (or its profile) was made according to the BagPack spec. Honestly, I'm a bit new to BagIt/BagPack.
Let me share your answers with my colleagues first. It's quite possible we'll have some more questions afterwards.

Thanks,
Menko

