Great, thanks. In the end (things move fast) I think we'll do a hybrid solution. The underlying directory structure looks like this (and can't be easily changed - the instrument writes directly into it):
projects
- myproject_sm
- jays_project
- another_project_by_jay
etc. (i.e., the folder names contain the project name and the PI's name, but not in any machine-readable form)
We'll set up a symlinked directory structure like this:
newprojects
- smatthews
--myproject_sm
- jsmith
--jays_project
--another_project_by_jay
That way the username is encoded in the directory structure as usual, so the atom dataset provider can easily pull it out and put it into the template.
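Since the original folder names don't encode the owner reliably, the symlink tree has to be driven by a hand-maintained mapping. Here's a rough sketch of how that could look. The function name link_projects and the mapping file format (one "username project_folder" pair per line) are my own assumptions for illustration, not something that ships with the provider:

```shell
#!/bin/sh
# Sketch only: build newprojects/<username>/<project> symlinks from a mapping file.
# link_projects and the "username project_folder" file format are hypothetical.
link_projects() {
    src="$1"    # existing flat directory the instrument writes into
    dest="$2"   # root of the new per-user symlink tree
    map="$3"    # mapping file: "<username> <project folder>" per line

    # The "|| [ -n ... ]" part handles a final line without a trailing newline.
    while read -r user project || [ -n "$user" ]; do
        # Skip blank lines and comment lines starting with '#'.
        [ -z "$user" ] && continue
        case "$user" in \#*) continue ;; esac

        mkdir -p "$dest/$user"
        # -s: symbolic link, -f: replace an existing link,
        # -n: don't follow an existing link that points to a directory.
        ln -sfn "$src/$project" "$dest/$user/$project"
    done < "$map"
}
```

It would be called as something like: link_projects /path/to/projects /path/to/newprojects projects.map (all three paths are placeholders). Re-running it is harmless, so new projects can be picked up by adding a line to the mapping file and running it again.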
Speaking of which, I cleaned up the scripts that come with the atom dataset provider a bit, to hopefully make it easier to set up.
In particular, provider.sh contains this:
----
STAGING="/mnt/np_staging"
USERNAME='[^/]+'
INSTRUMENT='[^/]+'
EXPERIMENT='[^/]+'
DATASET='[^/]+'
# Modify the above regex components to suit your installation. For example, if usernames are like e1234 or s2345:
# USERNAME="[EeSs][0-9]+"
# You will also need to modify the templates to suit.
# In this structure, if users put files directly at the experiment level, they'll be grouped together in a dataset sharing that name.
GROUPPATTERN="^(${STAGING}/${USERNAME}/${INSTRUMENT}/${EXPERIMENT}/(${DATASET}/)?).*"
----
Writing separate regexes for the staging area, username, instrument, experiment and dataset folders should hopefully be a bit easier on the brain.
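If you want to sanity-check the pattern after changing one of the components, something like the following works as a quick test. Note that the group_of helper, the use of sed, and the sample paths are just my assumptions for trying the regex out; this isn't necessarily how the provider itself applies GROUPPATTERN:

```shell
#!/bin/sh
# Same components as in provider.sh.
STAGING="/mnt/np_staging"
USERNAME='[^/]+'
INSTRUMENT='[^/]+'
EXPERIMENT='[^/]+'
DATASET='[^/]+'
GROUPPATTERN="^(${STAGING}/${USERNAME}/${INSTRUMENT}/${EXPERIMENT}/(${DATASET}/)?).*"

# Hypothetical helper: print the grouping prefix (capture group 1) for a path,
# or print nothing if the path doesn't match the pattern.
group_of() {
    printf '%s\n' "$1" | sed -nE "s#${GROUPPATTERN}#\\1#p"
}

# Made-up sample paths: one file inside a dataset folder,
# one file placed directly at the experiment level.
group_of "/mnt/np_staging/jsmith/instrument1/exp1/scan01/image.tif"
group_of "/mnt/np_staging/jsmith/instrument1/exp1/notes.txt"
```

The first call prints /mnt/np_staging/jsmith/instrument1/exp1/scan01/ and the second prints /mnt/np_staging/jsmith/instrument1/exp1/, which is the experiment-level grouping described in the comment in provider.sh.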
Steve