Adding post-processed output files to data records (esp. for display in the web interface)


Maximilian Albert

May 30, 2013, 9:49:12 AM
to sumatr...@googlegroups.com
Hi,

I have been using Sumatra semi-regularly over the past few months and
really like the way it provides a clean overview of my simulation
results. The web interface in particular is great for quickly comparing
outcomes of different runs. So thanks once again for a really useful
piece of software! :)

My current setup is that I have a script with a really long-running
simulation which also does some post-processing afterwards to produce
a few plots. However, sometimes I only realise after a few simulation
runs that a certain kind of visualisation is rather useful. I can
certainly add this to my simulation script, but the relevant output
plots will of course only be present in data records of subsequent
simulation runs and not for runs that have already been done. I don't
mind producing the plots manually for previous runs, but even if I
save the files in the relevant directories they don't show up in the
web interface because Sumatra didn't record them as output files
during the simulation runs.

So my question is: does there exist an easy way of manually adding
'output files' to a certain run even after it has finished? I am aware
that this might cause problems with exact reproducibility, but imho it
would be really useful to have an easy way of accessing them via the
web interface. Maybe there could be a separate section for
'Post-processed files' (similar to the already existing sections for
'Input files' and 'Output files')?

I'm imagining that there could be a command along the lines of:

smt add-postprocessed-files RECORD_LABEL FILE1 [FILE2 ...]

which could be used to register a bunch of files that would then be
displayed in the 'postprocessing' section for this particular record
in the web interface. Does this make sense? Any comments or
alternative suggestions (also for a more concise command name or
section title)? If this sounds useful, which files do I need to look
at to implement this and submit a pull request?

Thanks,
Max

Andrew Davison

May 30, 2013, 2:14:20 PM
to sumatr...@googlegroups.com
I think the best approach would be to turn your "manual" approach into a script, run it with "smt run", and then have Sumatra be able to display the two records (the original simulation and the "extra visualization" step) as a kind of "mini-workflow" with just two steps.

However, I also like your idea of being able to add extra files, which should definitely be in a separate section.

I would suggest either
  1. a new command "smt attach" or
  2. extend the "smt comment" command to accept binary filenames instead of just text strings (i.e. add an "--attach" option)

To start with, you will need to modify commands.py, the Record class (records.py), the Project class (projects.py), the recordstore module and the web module (mainly web/templates/record_detail.html).

Cheers,

Andrew

Maximilian Albert

Jun 1, 2013, 3:32:34 PM
to sumatr...@googlegroups.com
Hi Andrew,

> I think the best approach would be to turn your "manual" approach into a
> script, run it with "smt run", and then have Sumatra be able to display the
> two records (the original simulation and the "extra visualization" step) as
> a kind of "mini-workflow" with just two steps.

I had indeed considered this, but then the post-processed files would
be separated from the original simulation output in the web interface,
which imho makes it much less convenient to find the desired files
(especially if only a few early simulation runs are missing the
post-processed files). But I can imagine it's a useful strategy in
other situations.


> However, I also like your idea of being able to add extra files, which
> should definitely be in a separate section.
>
> I would suggest either
> 1. a new command "smt attach" or
> 2. extend the "smt comment" command to accept binary filenames instead of
> just text strings (i.e. add an "--attach" option)

I really like the "smt attach" idea. Judging from some initial
attempts I don't think it should be too difficult to implement, but I
have a couple of questions regarding syntax and functionality.

1) Just as a sanity check: is the following syntax reasonable?

smt attach [--remove] [--label LABEL] FILE1 [FILE2 ...]

In case the "--label" argument is omitted, the most recent record is
used. Moreover, if the "--remove" option is set, this would remove
attachments from the given record.

2) More importantly, what exactly should the attach command actually
do? Initially, I thought it would just 'register' (or deregister, if
the "--remove" option is set) certain files as attachments to a given
record. However, how to deal with files that are outside that record's
data store? Should they be copied into it (and be deleted again if the
"--remove" option is given)? Otherwise, should only files that are
already present in the data store be accepted as attachments?

Imho the latter would quite severely limit the usefulness of the
command, but I don't have enough experience with Sumatra yet to decide
which alternative is preferable (or if there are any other, better
options). So any suggestions would be appreciated.

Many thanks,
Max

Andrew Davison

Jun 11, 2013, 5:20:21 AM
to sumatr...@googlegroups.com
On 1 June 2013, at 21:32, Maximilian Albert wrote:
>> I think the best approach would be to turn your "manual" approach into a
>> script, run it with "smt run", and then have Sumatra be able to display the
>> two records (the original simulation and the "extra visualization" step) as
>> a kind of "mini-workflow" with just two steps.
>
> I had indeed considered this, but then the post-processed files would
> be separated from the original simulation output in the web interface,
> which imho makes it much less convenient to find the desired files
> (especially if only a few early simulation runs are missing the
> post-processed files). But I can imagine it's a useful strategy in
> other situations.

In future, I hope to be able to represent "workflows" better in the
web interface. For example, this could let multiple, related
computations be grouped together on a single page. But this will
probably take some time.

>> However, I also like your idea of being able to add extra files, which
>> should definitely be in a separate section.
>>
>> I would suggest either
>> 1. a new command "smt attach" or
>> 2. extend the "smt comment" command to accept binary filenames instead of
>> just text strings (i.e. add an "--attach" option)
>
> I really like the "smt attach" idea. Judging from some initial
> attempts I don't think it should be too difficult to implement, but I
> have a couple of questions regarding syntax and functionality.
>
> 1) Just as a sanity check: is the following syntax reasonable?
>
> smt attach [--remove] [--label LABEL] FILE1 [FILE2 ...]
>
> In case the "--label" argument is omitted, the most recent record is
> used. Moreover, if the "--remove" option is set, this would remove
> attachments from the given record.

For consistency with "smt comment", I would prefer

smt attach [--remove] [LABEL] FILE1 [FILE2 ...]

i.e. make LABEL an optional positional argument. This might be a bit
tricky to get working with argparse though, since we want to be able
to specify multiple files - you would probably have to implement a
custom Action (see http://docs.python.org/dev/library/argparse.html#action)
to check whether the first argument is a label or a filename.
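
As a rough illustration of the custom Action idea, the sketch below decides whether the first positional argument is a label or a filename. It is only a sketch under assumptions: the `known_labels` set stands in for a lookup in the record store, and the names `make_label_or_files_action` and `build_parser` are hypothetical, not part of Sumatra.

```python
import argparse


def make_label_or_files_action(known_labels):
    """Build an Action class that splits an optional LABEL from the files.

    known_labels is an assumption of this sketch: a stand-in for however
    the record store would be asked whether a label exists.
    """

    class LabelOrFiles(argparse.Action):
        def __call__(self, parser, namespace, values, option_string=None):
            # Treat the first positional as a label only if it names a
            # record and at least one file follows it.
            if len(values) > 1 and values[0] in known_labels:
                namespace.label = values[0]
                namespace.files = values[1:]
            else:
                namespace.label = None  # caller falls back to the latest record
                namespace.files = list(values)

    return LabelOrFiles


def build_parser(known_labels):
    parser = argparse.ArgumentParser(prog="smt attach")
    parser.add_argument("--remove", action="store_true",
                        help="remove the attachments instead of adding them")
    parser.add_argument("args", nargs="+", metavar="ARG",
                        action=make_label_or_files_action(known_labels),
                        help="optional record label followed by one or more files")
    return parser
```

One open design question with this approach is ambiguity: a file whose name happens to equal an existing label would be read as the label whenever other files follow it.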


> 2) More importantly, what exactly should the attach command actually
> do? Initially, I thought it would just 'register' (or deregister, if
> the "--remove" option is set) certain files as attachments to a given
> record. However, how to deal with files that are outside that record's
> data store? Should they be copied into it (and be deleted again if the
> "--remove" option is given)? Otherwise, should only files that are
> already present in the data store be accepted as attachments?
>
> Imho the latter would quite severely limit the usefulness of the
> command, but I don't have enough experience with Sumatra yet to decide
> which alternative is preferable (or if there are any other, better
> options). So any suggestions would be appreciated.

A record already has two data stores, one for input and one for output
data. I think the cleanest approach would be to add an optional third
data store for attachments (which would only be created the first time
"smt attach" is used).
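
To make the "optional third data store" idea concrete, here is a minimal sketch in which the attachment store does not exist until the first attach call. The class `AttachableRecord` and its fields are hypothetical and only illustrate the lazy creation; they are not Sumatra's Record class or data store API.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class AttachableRecord:
    """Hypothetical stand-in for a record; not Sumatra's Record class."""
    label: str
    output_root: Path
    attachment_root: Optional[Path] = None  # no third store until first attach
    attachments: List[str] = field(default_factory=list)

    def attach(self, filename: str, default_root: Path) -> None:
        # Create the attachment store lazily, the first time it is needed.
        if self.attachment_root is None:
            self.attachment_root = default_root
            self.attachment_root.mkdir(parents=True, exist_ok=True)
        if filename not in self.attachments:
            self.attachments.append(filename)

    def detach(self, filename: str) -> None:
        # Deregister only; the file itself is left on disk.
        self.attachments.remove(filename)
```

Records that never use attachments keep `attachment_root` as `None`, so existing records would not need to change.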

Cheers,

Andrew



Maximilian Schmidt

Dec 17, 2013, 4:48:15 AM
to sumatr...@googlegroups.com
Hi,
I just stumbled upon the same problem/feature request and would like to
ask whether you implemented your ideas, possibly in a developer branch,
or what the status of this is?
Thanks and best regards,
Max
--
Maximilian Schmidt
Institute of Neuroscience and Medicine (INM-6)
Computational and Systems Neuroscience
Institute for Advanced Simulation (IAS-6)
Theoretical Neuroscience
Juelich Research Center and JARA
Juelich, Germany

tel.: +49 2461 61-9472

maximilia...@fz-juelich.de





Maximilian Albert

Dec 17, 2013, 6:20:55 AM
to sumatr...@googlegroups.com
Hi Max,

I personally haven't made any progress in this regard yet, I'm afraid.
As far as I remember, there were still a few things regarding the user
interface (e.g. the label vs. filename issue mentioned in an earlier
email) that needed thinking about, but unfortunately I have been swamped
with work over the past few months and haven't gotten around to it. It's
still on my (long) TODO list, but I don't expect it to happen too
soon, so if anyone else wants to take a stab before me then feel free.

Cheers,
Max


Gregor Wautischer

Apr 12, 2016, 9:00:59 AM
to sumatra-users

Hi,

I have taken a shot at this. It's my first attempt at adding something to Sumatra, so don't be too harsh on me :)

To be able to add post-processing data to a Sumatra record, I added a new category for records called "evaluation_data", implemented in the same way as "output_data". Both share the same data store (I don't see a problem with that?). Furthermore, two functions called add_evaluation_data and remove_evaluation_data have been added in projects.py, modelled loosely on add_comment.

Syntax:

project.add_evaluation_data(recordlabel, filename)
project.remove_evaluation_data(recordlabel, sumatra.datastore.base.DataKey, delete=True/False)
(where project comes from project=load_project())

add_evaluation_data adds a file to a record (identified by its label) and moves the file to the record's data store if outside of it.
remove_evaluation_data removes a file from a record and deletes it if delete=True (False by default).

There is no way yet to add files from the command line, and the web interface does not incorporate evaluation_data yet either. However, please have a look at my implementation by cloning the evaluation_data branch from https://github.com/GregorWautischer/sumatra/ (git clone -b evaluation_data https://github.com/GregorWautischer/sumatra/). If people agree, there should be no problem in making the commands available from the command line and including evaluation_data in the web interface.

Greetings,

Gregor

Andrew Davison

Apr 14, 2016, 8:18:32 AM
to sumatra-users
Hi Gregor,

Thanks very much for this contribution. My first reaction is to ask how the evaluation data are generated? If they are generated by running another computation, wouldn't it be better to track that computation with Sumatra as well? Then you would have computation A which produces output data X, and computation B (post-processing) which takes X as input data and produces output data Y (the evaluation data).

Cheers,

Andrew

Andrew Davison

Apr 14, 2016, 8:30:49 AM
to sumatra-users
Hi again,

I just realized that your post was in the context of the previous discussion about "smt attach". You can ignore the message I just posted, as it rehashes part of the earlier discussion.

In this context, I think your `evaluation_data` branch is a good start. Comments:

- I think the name "evaluation_data" is too narrow. I suggest "extra_data" or "additional_data" or "associated_data" as more generic alternatives.
- these additional data should have their own data store (it may in usual practice have the same path as the output data store, but some people may wish to keep these directories separate)
- I don't like the idea of automatically moving the file to the record's data store if outside it, unless you are using ArchivingDataStore
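
The last two points can be sketched roughly as follows: files already inside the attachment store are registered in place, and files outside it are rejected unless copying is explicitly requested. The function `resolve_attachment` and its `copy_outside` flag are hypothetical names for illustration; they stand in for whatever opt-in (such as an archiving store) would eventually be used, and are not Sumatra's API.

```python
import shutil
from pathlib import Path


def resolve_attachment(path, store_root, copy_outside=False):
    """Return the path, relative to store_root, under which to register a file.

    Files already inside the store are registered in place. Files outside it
    are rejected unless copy_outside is True, in which case they are copied
    (not moved) into the store root.
    """
    path = Path(path).resolve()
    store_root = Path(store_root).resolve()
    try:
        return path.relative_to(store_root)
    except ValueError:
        pass  # the file lives outside the store
    if not copy_outside:
        raise ValueError(f"{path} is outside the data store {store_root}")
    store_root.mkdir(parents=True, exist_ok=True)
    target = store_root / path.name
    shutil.copy2(path, target)  # the original file is left untouched
    return target.relative_to(store_root)
```

Copying rather than moving keeps the user's original file where it was, which avoids the surprise of files silently disappearing from a working directory.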

It would be best to continue this discussion on Github, so I suggest you either create a new issue, or create a pull request from your `evaluation_data` branch.

Cheers,

Andrew

Can Pervane

Mar 16, 2017, 5:40:34 PM
to sumatra-users
Hi,

I have just started using Sumatra, and I think it's a great piece of software.

I am having the same workflow problem as mentioned in this series of posts. That is:

I have a long simulation run that produces data, and then I have a short data analysis step that takes the generated data as input.

More generally, the analysis step could use data outputs from multiple simulations.

Thus, to reproduce the output of an analysis step, you need either the input data or the Sumatra record of the run that generated it. Tracking the analysis and the data generation as separate Sumatra runs could result in a loss of information for the analysis step if all the information about the data-generation run is not saved in the analysis input file. The input data used for the analysis step could be a binary file, and usually does not contain any information on how the data was generated. So is there a currently recommended Sumatra workflow that avoids losing the information about the data used by the analysis step?

I was thinking of a workflow where you can chain Sumatra records, such that one record can take another record as its input. The new run would then either use the output of the input record or rerun that record to (re)generate the data to be used by the analysis step.
In addition, the run that took a Sumatra record as input could list that record as a dependency or input in its own record.
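
A minimal sketch of what such chaining could look like as a data structure, assuming each record simply stores the labels of the records it took as input. `ChainedRecord` and `upstream_labels` are hypothetical names for illustration, not existing Sumatra classes or functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChainedRecord:
    """Hypothetical record holding only the fields needed for the illustration."""
    label: str
    output_files: List[str] = field(default_factory=list)
    input_records: List[str] = field(default_factory=list)  # labels of upstream runs


def upstream_labels(label: str, records: Dict[str, ChainedRecord]) -> List[str]:
    """Return every record the given one depends on, nearest first."""
    seen: List[str] = []
    queue = list(records[label].input_records)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(records[current].input_records)
    return seen
```

With this, an analysis record that consumed the outputs of several simulations can be traced back to every run that contributed data, even if the input files themselves carry no provenance information.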

Any thoughts on that? Would this kind of a workflow make sense or be useful in general? 

Bw,
Can

thomas...@boostheat.com

Apr 12, 2017, 10:40:48 AM
to sumatra-users
Hello,
I handle this with two approaches in the web interface, thanks to two powerful tools:
- Django
- Bokeh

I choose between the two approaches by asking:
- Is the file produced by the run itself? E.g. I have a log file with residuals that I need to plot against the iteration number.
- Is it an extra result output?

For the first case, I already have an XML file, so I embed a Bokeh app and a server. Plots are generated on the fly since they all have the same shape.

For the second case, I take advantage of this file-upload widget with Django: http://blueimp.github.io/jQuery-File-Upload/
I define another Django model with a ManyToMany(Record) attribute.

I have also tried another option: adding a simple button which launches a Python process and generates the output.

I can share some ideas with you on this.

Thomas