Where we're going in 2016

David Dotson

Jan 11, 2016, 1:23:46 AM
to datreant
Hey all,

First, happy new year, and I hope your 2016 is worth living! :D

I wanted to take a moment to lay out my overall vision for datreant, which has been in near-constant flux over the latter part of 2015, but which I think has now settled into something coherent enough to share. What once was just the existing core of MDSynthesis is progressing into a more general-purpose package, addressing what I consider a fairly universal pain point among scientists. That is, datreant is giving pythonic form to the fundamental data structure of scientific research: directory trees.

Max is responsible for this broader view. Originally the only thing datreant did with directories was store data structures using the `Treant.data` interface, but a more general API is not only possible, it makes complete sense. To this end, the core functionality of a Treant should be pythonic manipulation of directory trees and, by extension, of the files that live in them.
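
To make "pythonic tree manipulation" a little more concrete, here is a rough sketch of the kind of thing meant. This is not datreant's API: the `Tree` class and its `make`, `leaves`, and `subtrees` members are made-up names for illustration, and it assumes Python 3 for `pathlib`.

```python
from pathlib import Path


class Tree:
    """Hypothetical wrapper around a directory; not datreant's actual API."""

    def __init__(self, path):
        self.path = Path(path)

    def __getitem__(self, name):
        # Indexing descends into a subdirectory, whether or not it exists yet.
        return Tree(self.path / name)

    def make(self):
        # Create the directory (and any missing parents) on disk.
        self.path.mkdir(parents=True, exist_ok=True)
        return self

    @property
    def leaves(self):
        # Names of the files living directly in this directory.
        return sorted(p.name for p in self.path.iterdir() if p.is_file())

    @property
    def subtrees(self):
        # Names of the directories living directly in this directory.
        return sorted(p.name for p in self.path.iterdir() if p.is_dir())
```

With something like this, `Tree("project")["run1"].make()` creates `project/run1`, and `leaves` and `subtrees` list what lives there without any manual path joining.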

This broader picture means a few big changes. First, the core of datreant should exclude heavy dependencies with limited appeal, such as the use of HDF5. This means breaking out the `Treant.data` limb and others into a separate module that upon import attaches these interfaces to the appropriate classes. This makes datreant's core very light, but the same machinery that makes it possible also makes it easy to build other modules with future appeal, such as `datreant.blaze` or `datreant.dask`. The less-centralized structure keeps us from bloating the core library, while giving the freedom to experiment.
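
One way the "attach on import" idea could work is sketched below. It is only an illustration under assumptions: `attach_limb`, `DataLimb`, and the `_limbs` registry are hypothetical names, and the `Treant` here is a bare stand-in rather than the real class.

```python
# Hypothetical sketch: none of these names are datreant's actual API.


class Treant:
    """Stand-in for a core class that knows nothing about heavy dependencies."""

    _limbs = {}  # limb name -> limb class, filled in by add-on modules

    def __init__(self, path):
        self.path = path

    @classmethod
    def attach_limb(cls, name, limb_cls):
        """Register a limb so that instances expose it as an attribute."""
        cls._limbs[name] = limb_cls

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        limbs = type(self)._limbs
        if name in limbs:
            return limbs[name](self)
        raise AttributeError(name)


# --- what an add-on module would run when it is imported ---


class DataLimb:
    """Stand-in for a data-storage interface that needs a heavy dependency."""

    def __init__(self, treant):
        self._treant = treant

    def describe(self):
        return "data limb for {}".format(self._treant.path)


Treant.attach_limb("data", DataLimb)


# Subclasses share the registry, so they pick up the same limbs.
class Sim(Treant):
    pass
```

In this sketch the core class never imports the heavy dependency itself; only the add-on module does, and subclasses such as the stand-in `Sim` see every limb that has been attached to `Treant`.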

This change requires the core of datreant to move to a new namespace, such as `datreant.core`, since `datreant` itself must become a namespace package. See this issue for discussion on that particular change.

What does this all mean for MDSynthesis? On the surface, nothing will change. Whereas `datreant` will be an à la carte collection of subpackages with a core, domain-specific packages like MDSynthesis will still come "batteries-included", and will pull in whatever datreant subpackages they need as dependencies. But because of this larger change in datreant, `Sim` objects will see all the same interfaces `Treant`s can obtain through imports of datreant subpackages, and of course they'll also get improvements to the core `Treant` object itself.

One last thing: state files are moving to a JSON format. This solves many of the issues we had with using HDF5 for state files, and doesn't come with a real performance hit for typical `Treant`s, or even MDSynthesis `Sim`s. A script for converting existing state files in HDF5 format is already available for use.
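
For a sense of what a JSON state file involves, here is a minimal sketch. The field names in the example state are invented for illustration and say nothing about the actual schema; the write-to-a-temporary-file-then-rename step is an assumption of this sketch, not a description of how datreant writes its files. It assumes Python 3 for `os.replace`.

```python
import json
import os
import tempfile


def write_state(path, state):
    """Write `state` (a JSON-serializable dict) to `path`.

    The data goes to a temporary file in the same directory first and is
    then renamed over the target, so readers never see a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


def read_state(path):
    """Load the state dict back from a JSON file."""
    with open(path) as f:
        return json.load(f)


# Hypothetical state contents; the real schema is not specified here.
example_state = {
    "name": "run1",
    "tags": ["equilibrated"],
    "notes": {"temperature": 300},
}
```

Because the result is plain text, a state file written this way can be inspected or repaired with any editor, which is not possible with an HDF5 file.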

All of these major changes should (finally!!) coalesce into a series of releases for each of these packages. It's been a long road, with most of the concerns about releasing coming from the fact that we are also supporting a file format, which must be made to change gracefully with time. I think we are finally nearing the point where we can move to regular and frequent releases!

All of this is open to revision, of course, and I welcome any alternative ideas anyone has for what our course should be. The packages should be useful, after all. :D

To a wonderful and productive 2016!

David