All datasets owned by dummy user


Steve Bennett

Oct 3, 2013, 11:53:02 PM
to tardis...@googlegroups.com
Hi all,
  One recurring issue in every auto-ingest deployment is trying to work out which user owns each incoming dataset. One idea I had was to basically skip this process as follows:

1) All datasets get harvested with some dummy user as the owner
2) The datasets do know which experiment they belong to
3) The correct owners of each experiment are manually administered in Tardis (there aren't that many, and it's much better to have a senior user doing this through the admin interface rather than fiddling with scripts or something)
4) Hence, the datasets are shared with the right users in Tardis (they're still owned by the dummy user, but full access is given to the owners of the experiment, and any other users that have access to the experiment)

Steps 1-3 are pretty straightforward. Is step 4 possible with the new auth system? Hard? Easy? Anyone tried a setup like this?

In this particular deployment, there are something like 20-30 experiments total, and they correspond more or less 1 experiment per researcher. But it's not exactly like that (sometimes 2 experiments per researcher, sometimes 2 researchers per experiment - or both.)

Steve

Steve Androulakis

Oct 4, 2013, 12:03:53 AM
to tardis...@googlegroups.com
Hi Steve,

The interface/api/models fully support giving ownership to others (while retaining ownership yourself).

There are 3 levels of access:
Read = User can view and download data, but cannot change descriptions or add data
Edit = User can view and download data, change descriptions, and add new data
Owner = User has the same permissions as Edit, and can also share with others (giving any of these levels of access)


[Inline image: sharing.png]

So this sounds like a realistic idea (and one that's crossed my mind in the past). I'd be happy for you to do it!

Cheers,
Steve


--
You received this message because you are subscribed to the Google Groups "tardis-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tardis-devel...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

sharing.png

Steve Bennett

Oct 4, 2013, 12:24:35 AM
to tardis...@googlegroups.com
On Fri, Oct 4, 2013 at 2:03 PM, Steve Androulakis <steve.an...@gmail.com> wrote:
> So this sounds like a realistic idea (and one that's crossed my mind in the past). I'd be happy for you to do it!

Thanks - so to make sure I've understood, there's currently no capability for datasets to automatically inherit sharing from the experiment level, but it could be done with a little bit of hacking?

Steve

Steve Androulakis

Oct 4, 2013, 12:29:35 AM
to tardis...@googlegroups.com
I'm not sure if I understand that last question, but datasets inherit sharing permissions from their associated experiment(s).

If dataset 1 is associated with experiments A and B, and Jim has edit permissions on experiment A and read only for experiment B, then as far as dataset 1 is concerned, he has the most open inherited permission for dataset 1 -- that's edit permissions in this case.
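
The "most open permission wins" rule can be illustrated with a small shell sketch. This is not MyTardis code: the function names `level_rank` and `effective_level` are hypothetical, and the only assumption taken from the thread is the ordering read < edit < owner.

```sh
# Rank the three access levels from the thread; a higher number is more open.
# Anything unrecognised (including "none") ranks as 0.
level_rank() {
  case "$1" in
    read)  echo 1 ;;
    edit)  echo 2 ;;
    owner) echo 3 ;;
    *)     echo 0 ;;
  esac
}

# Given a user's level on each experiment a dataset belongs to,
# print the most open one ("none" if no levels are passed).
effective_level() {
  best="none"
  for lvl in "$@"; do
    if [ "$(level_rank "$lvl")" -gt "$(level_rank "$best")" ]; then
      best="$lvl"
    fi
  done
  echo "$best"
}

# Jim: edit on experiment A, read on experiment B.
effective_level edit read
```

For the Jim example, the last line prints `edit`.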

Or do you mean a sharing interface specifically for datasets themselves? The models now support ACLs at the datafile and dataset level, but there is no interface for them.

S



Steve Bennett

Oct 4, 2013, 1:07:21 AM
to tardis...@googlegroups.com
On Fri, Oct 4, 2013 at 2:29 PM, Steve Androulakis <steve.an...@gmail.com> wrote:
> I'm not sure if I understand that last question, but datasets inherit sharing permissions from their associated experiment(s).
>
> If dataset 1 is associated with experiments A and B, and Jim has edit permissions on experiment A and read only for experiment B, then as far as dataset 1 is concerned, he has the most open inherited permission for dataset 1 -- that's edit permissions in this case.

Oh, good. And when dataset 2 is added to experiment A, then Jim will be able to edit it, without any further fiddling? That's excellent.

Steve

Steve Androulakis

Oct 4, 2013, 1:09:48 AM
to tardis...@googlegroups.com
Indeed!

Steve

Steve Bennett

Oct 4, 2013, 3:40:33 AM
to tardis...@googlegroups.com
Great, thanks. In the end (things move fast) I think we'll do a hybrid solution. The underlying directory structure looks like this (and can't be easily changed - the instrument writes directly into it):

projects
- myproject_sm
- jays_project
- another_project_by_jay

etc. (i.e., folder names contain the project name and PI name, but not in any machine-readable form)

We'll set up a symlinked directory structure like this:

newprojects
- smatthews
--myproject_sm
- jsmith
--jays_project
--another_project_by_jay

That way the username is encoded in the directory structure like normal, so the atom dataset provider can easily pull it out and put it into the template.
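
A minimal sketch of how such a symlink tree could be built. Because the folder names don't encode the owner in a machine-readable way, it assumes a hand-maintained mapping file with one "<project_folder> <username>" pair per line. The function name `link_projects`, the mapping file format and the example paths are all hypothetical.

```sh
# link_projects SRC DEST MAPFILE
#   SRC      directory the instrument writes into (use an absolute path,
#            so the symlinks resolve from anywhere)
#   DEST     root of the per-user symlink tree
#   MAPFILE  lines of the form "<project_folder> <username>"
link_projects() {
  src="$1"; dest="$2"; map="$3"
  # "|| [ -n ... ]" also handles a last line with no trailing newline.
  while read -r folder user || [ -n "$folder" ]; do
    # Skip blank lines and comment lines starting with '#'.
    case "$folder" in
      ''|\#*) continue ;;
    esac
    # Skip malformed lines that have no username.
    [ -n "$user" ] || continue
    mkdir -p "$dest/$user"
    # -f replaces an existing link; -n stops ln from following an existing
    # link into the target directory, so reruns are safe.
    ln -sfn "$src/$folder" "$dest/$user/$folder"
  done < "$map"
}

# Example call (hypothetical paths):
#   link_projects /data/projects /data/newprojects /data/owners.txt
```

With a mapping file containing `myproject_sm smatthews`, `jays_project jsmith` and `another_project_by_jay jsmith`, this produces the layout shown above. New project folders only need a new line in the mapping file and a rerun.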

Speaking of which, I cleaned up the scripts that come with the atom dataset provider, to hopefully make it a bit easier to set up.

In particular, provider.sh contains this:

----
STAGING="/mnt/np_staging"
USERNAME='[^/]+'
INSTRUMENT='[^/]+'
EXPERIMENT='[^/]+'
DATASET='[^/]+'

# Modify the above regex components to suit your installation. For example, if usernames are like e1234 or s2345:
# USERNAME="[EeSs][0-9]+"
# You will also need to modify the templates to suit.
# In this structure, if users put files directly in the experiment level, they'll be grouped together in a dataset sharing that name.
GROUPPATTERN="^(${STAGING}/${USERNAME}/${INSTRUMENT}/${EXPERIMENT}/(${DATASET}/)?).*"
----

Writing separate regexes for the staging area, username, instrument, experiment and dataset folders should hopefully be a bit easier on the brain.
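
One way to check what the grouping pattern captures is to run it against a few sample paths. The sketch below reuses the variables from provider.sh and uses `sed -E` purely as a test harness. How the provider itself consumes GROUPPATTERN isn't shown in this thread, and the helper name `group_prefix` and the sample paths are hypothetical.

```sh
STAGING="/mnt/np_staging"
USERNAME='[^/]+'
INSTRUMENT='[^/]+'
EXPERIMENT='[^/]+'
DATASET='[^/]+'
GROUPPATTERN="^(${STAGING}/${USERNAME}/${INSTRUMENT}/${EXPERIMENT}/(${DATASET}/)?).*"

# Print the grouping prefix (capture group 1) for a file path.
# Prints nothing if the path does not match the pattern.
# '#' is used as the sed delimiter because the pattern contains '/'.
group_prefix() {
  printf '%s\n' "$1" | sed -nE "s#${GROUPPATTERN}#\1#p"
}

# A file inside a dataset folder, then a file placed directly
# at the experiment level (made-up sample paths).
group_prefix "/mnt/np_staging/jsmith/scope1/jays_project/run1/image.tif"
group_prefix "/mnt/np_staging/jsmith/scope1/jays_project/image.tif"
```

The first call prints `/mnt/np_staging/jsmith/scope1/jays_project/run1/` and the second prints `/mnt/np_staging/jsmith/scope1/jays_project/`, which matches the comment in provider.sh: files placed directly at the experiment level are grouped into a dataset sharing the experiment's name.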

Steve