Jun 24, 2016, 6:22:42 PM6/24/16
So after reading the SciPy proceedings paper I noticed that I use
datreant a little differently than originally anticipated.
When I simulate a system I currently have about a hundred to a thousand
different runs for sampling and for varying parameters. The analysis is
saved alongside the simulations in CSV files (already a big win for me,
since it reduces the data volume from 500 GB to a few GB).
To make transferring the data from the cluster to my laptop easier, I
copy all the analysis results into a single datreant object. The code
that aggregates the data basically calls lines like this a bunch of times:
obs['param1/obs1'] = some_data_from_all_sims_in_a_df
obs['param1/obs2'] = more_data
obs['param2/obs1'] = some_other_data
This pattern is useful to me because it makes transferring the data very
simple, and I can query what is in the datreant just by looking at the keys.
This ease of copying and querying is actually what made me use datreant
in the first place.
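To make the pattern above concrete: the path-like keys behave roughly like a dict backed by the filesystem. Here is a minimal stand-in in pure Python (the `ObsStore` class and its CSV layout are illustrative assumptions, not datreant's actual implementation):

```python
import csv
from pathlib import Path


class ObsStore:
    """Toy stand-in for the obs['param/obs'] pattern above:
    each key maps to a CSV file under one root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def __setitem__(self, key, rows):
        path = self.root / f"{key}.csv"
        path.parent.mkdir(parents=True, exist_ok=True)  # creates e.g. param1/
        with path.open("w", newline="") as f:
            csv.writer(f).writerows(rows)

    def __getitem__(self, key):
        with (self.root / f"{key}.csv").open(newline="") as f:
            return [row for row in csv.reader(f)]

    def keys(self):
        # query what is stored by walking the tree
        return sorted(p.relative_to(self.root).as_posix()[:-4]
                      for p in self.root.rglob("*.csv"))


obs = ObsStore("aggregated")
obs["param1/obs1"] = [["step", "energy"], [0, -1.5], [1, -1.7]]
obs["param1/obs2"] = [["step", "temp"], [0, 300]]
obs["param2/obs1"] = [["step", "energy"], [0, -2.0]]
print(obs.keys())  # → ['param1/obs1', 'param1/obs2', 'param2/obs1']
```

The whole aggregated tree can then be copied to the laptop with a single rsync/scp of the root directory.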
The unfortunate part is that I can't assign any metadata to the
individual observables. Since everything lives in a single datreant
object, I can only assign metadata once, to the whole datreant. This
breaks some of the useful metadata filters that datreant offers.
The easy solution for me now is to generate more datreant objects and
store them all in the same folder. This means the script that aggregates
the data has to handle the directory generation itself. That isn't too
much extra work, but simply using the getitem method of a single
datreant object was easier.
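A rough sketch of that layout, one directory per run with its own metadata, in plain Python (the `meta.json` file and the `make_run`/`find_runs` helpers are stand-ins for datreant's categories and filtering, not its actual API):

```python
import json
from pathlib import Path


def make_run(root, name, **metadata):
    """Create one directory per run (one datreant-like object each)
    and record its metadata so runs can be filtered later."""
    run = Path(root) / name
    run.mkdir(parents=True, exist_ok=True)
    (run / "meta.json").write_text(json.dumps(metadata))
    return run


def find_runs(root, **query):
    """Yield run names whose metadata matches every query item,
    mimicking filtering on datreant categories."""
    for meta_file in Path(root).glob("*/meta.json"):
        meta = json.loads(meta_file.read_text())
        if all(meta.get(k) == v for k, v in query.items()):
            yield meta_file.parent.name


root = "all_runs"
make_run(root, "run_T300_p1", temperature=300, pressure=1)
make_run(root, "run_T350_p1", temperature=350, pressure=1)
make_run(root, "run_T300_p2", temperature=300, pressure=2)

print(sorted(find_runs(root, temperature=300)))
# → ['run_T300_p1', 'run_T300_p2']
```

With real datreants, the per-run metadata would live in each treant's own state file instead of a hand-rolled `meta.json`, which is exactly the filtering that a single shared datreant can't provide.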
I think the reason I did this is that the current docs suggested to me
that I should work with a single datreant. After reading the paper it
finally clicked, and now I understand what you mean by the filesystem
being the ad hoc database that datreant helps you deal with. So the
examples from the paper should either go into the docs, or the docs
should link to the paper.