Greetings all,
Short version:
We (the team here at UQ) are considering making the Experiment/Dataset relationship many-to-many (i.e. datasets can exist in multiple experiments) with a view to generalizing experiment into a “collection” of datasets. The aim is to solve the “3-tier problem” for everybody without breaking lots of existing functionality based on experiments. In the process, grouping datasets would start feeling more like tagging than hierarchical placement.
This is a medium-term idea, and feedback would be appreciated. The idea is to keep it simple initially and give MyTardis room to grow.
Long version:
We had a chat with our lab managers here at UQ on Friday, and once again we ran into the infamous “3-tier” problem. In this case though, it wasn’t that we needed “n-tiers”, it was more that there were many different ways to collect datasets.
The lab managers pointed out that some researchers would be quite happy to group their data by the booking session that they collected it in, but in other cases grouping by sample would be useful, and in other cases grouping for publication might be useful. We had originally intended to solve this by moving/splitting datasets between collections, but the lab managers pointed out that the relationship to the booking session was actually a nice one to keep for finding data. We could always use metadata to mark it, but that still left us wondering how to initially group datasets.
We considered tagging to group related datasets to samples… at which point the tag would need a title, description, and the possibility of associating metadata with it… which is really an experiment. So, why not simply extend experiment?
The idea would be to transform experiment into a more flexible “collection”, and would most likely happen in stages:
1. First, the relationship between Experiment/Dataset would be transformed into many-to-many. This would be a fairly big change, but existing installations could be left largely unaffected.
2. Experiments would morph into collections, with existing experiments acquiring a “type” of “experiment”. This would provide room for other “types” of groupings.
3. The new collections would be slimmed down, with truly “experiment” concepts being shifted out into metadata or one-to-one “facet” objects.
4. At some future point it might make sense for collections to include other collections, to create a “project” or “gallery” concept.
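To make the stages a little more concrete, here is a rough sketch in plain Python (deliberately not the actual MyTardis models). The names `Collection`, `kind`, `children` and `collections_for` are placeholders I've made up for illustration, and the collection titles are invented; the real change would be made in the existing Django models.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality keeps instances hashable
class Dataset:
    description: str


@dataclass(eq=False)
class Collection:
    title: str
    kind: str = "experiment"  # stage 2: existing experiments keep this type
    datasets: set = field(default_factory=set)  # stage 1: many-to-many link
    children: set = field(default_factory=set)  # stage 4: nested collections

    def add(self, dataset):
        """Place a dataset in this collection without removing it elsewhere."""
        self.datasets.add(dataset)


def collections_for(dataset, collections):
    """Return the sorted titles of every collection holding the dataset."""
    return sorted(c.title for c in collections if dataset in c.datasets)


# One dataset grouped two ways at once (hypothetical names).
scan = Dataset("Sample A, scan 1")
session = Collection("Friday booking", kind="booking session")
sample = Collection("Sample A", kind="sample")
session.add(scan)
sample.add(scan)
memberships = collections_for(scan, [session, sample])
```

The point of the sketch is only that the dataset keeps its link to the booking session while also appearing under the sample, which is what makes grouping feel like tagging.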
Does this sound like an improvement to MyTardis? It’s certainly a lot of work, but it would help remove one of the most often stated limitations of MyTardis. I feel like I’m running out ahead of a lot of the development community these days, so I’d like to get some feedback about whether you think this is the right way to solve the problem (or even if the problem needs solving in your case).
It would certainly have an impact on how the file-system is arranged, as “experiment/dataset/datafile” would be impossible to replicate without file-system links. Permissions would stay largely the same, but deletion would be an interesting concept that could potentially cause issues. (Most likely we’d handle it like hard-links, with the last reference removing the dataset, but that doesn’t solve data file deletion.)
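As a rough illustration of the hard-link style of deletion, here is a minimal sketch in plain Python. It assumes collections are just a mapping from a name to a set of dataset ids; the `unlink` helper and the ids are hypothetical and not part of MyTardis.

```python
def unlink(collections, name, dataset_id):
    """Remove dataset_id from one collection.

    Returns True when no collection references the dataset any more,
    which is the point where the dataset itself could be deleted.
    """
    collections[name].discard(dataset_id)
    still_referenced = any(
        dataset_id in members for members in collections.values()
    )
    return not still_referenced


# Hypothetical example: one dataset filed under two collections.
collections = {"booking": {"ds1"}, "sample": {"ds1"}}
first = unlink(collections, "booking", "ds1")  # still held by "sample"
second = unlink(collections, "sample", "ds1")  # last reference removed
```

This only covers removing a dataset from collections. It says nothing about deleting individual data files inside a dataset that is still shared, which remains the open question.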
What do you think?
Thank you,
Tim Dettrick
Senior Software Engineer
ITEE eResearch Group
The University of Queensland