writing a new ACQ4 analysis module


Michael Graupner
Oct 30, 2017, 10:48:48 AM
to ac...@googlegroups.com
Hello, 

I would like to write an analysis module in ACQ4 for ACQ4 data. Does any first-steps documentation exist on how to do this? I thought I had seen some instructions somewhere but cannot find them. Any hints would be greatly appreciated.

Thanks in advance.

Best,
Michael Graupner

 

Luke Campagnola
Oct 30, 2017, 12:56:31 PM
to ac...@googlegroups.com

Ok, first: although the analysis module system is pretty well tested and functional, it has a few design decisions that I am no longer happy with. Most recently I have been working with a new analysis system that I think has a better architectural design, but it is considerably less mature (https://github.com/AllenInstitute/neuroanalysis). Moving forward, I will be focusing on this system and importing these tools into ACQ4 for online analysis. I am also considering implementing ACQ4 analysis modules just as regular modules (rather than having an entirely separate module system for analysis).


Maybe if you tell me more about what you want to do, then I can point you in the right direction? 



Luke




Michael Graupner
Nov 6, 2017, 5:02:02 AM
to ac...@googlegroups.com
Dear Luke, 

Thank you for your response.

I am very unhappy with the way I am doing analysis right now. I feel like I am starting from scratch with the basic things (visualization, data import, data and analysis storage) for every new experiment. It feels like a waste of time, and I am very curious about what you consider to be a good analysis architecture.

My current experiments involve high-speed behavioral video recordings paired with calcium imaging and recordings of treadmill activity (from a rotary encoder). The analysis will involve tracking paws in the video and ROI analysis of the calcium imaging data.

Any thoughts are highly appreciated. 

Cheers,
Michael 



Luke Campagnola
Nov 6, 2017, 1:42:00 PM
to ac...@googlegroups.com

It’s a big topic! What makes a good analysis infrastructure? I have found analysis to be much more difficult to generalize than acquisition, so the tendency here is toward having a larger number of very small, reusable pieces. One of the themes you’ll see here is “separation of concerns” – as much as possible, keep your visualization code separate from analysis, separate from data reading, separate from database management, etc. This will allow you to build tools that are reusable, so that you don’t feel like you’re starting from scratch every time.

 

Things I have been working on:

 

* A data abstraction layer – something that represents the basic components in your data in a way that is agnostic to the actual format of the data on disk. This has two important benefits: 1) it future-proofs your code against changes in the data format over time, and 2) it forces you to separate the logic of reading your data from the logic of analyzing it. The "data model" system in ACQ4 was a first attempt at this, but it turned out to be difficult to use. I am much happier with the data abstraction system here: https://github.com/AllenInstitute/neuroanalysis/blob/master/neuroanalysis/data.py
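
To make this concrete, here is a minimal sketch of the idea (the class and method names are illustrative, not the actual neuroanalysis API):

```python
import numpy as np

class Recording(object):
    """Format-agnostic view of one recorded trace.

    Analysis code sees only this interface; how the data are actually
    stored on disk is hidden inside format-specific subclasses.
    """
    def __init__(self, data, sample_rate, units=None):
        self._data = np.asarray(data)
        self.sample_rate = sample_rate
        self.units = units

    @property
    def data(self):
        return self._data

    @property
    def time_values(self):
        """Sample times in seconds, derived rather than stored."""
        return np.arange(len(self._data)) / float(self.sample_rate)
```

A format-specific loader (one that reads ACQ4's MetaArray files, for example) would be responsible for constructing these objects; everything downstream of it only ever touches the abstract interface.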

 

* Functions and classes that perform just one particular kind of analysis each, with no data handling (reading/writing) or user interface code. You will want to be able to call these functions many different ways, from many different contexts. (examples: neuroanalysis/fitting.py, event_detection.py, etc.)
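
As a sketch of what that looks like in practice, a threshold-crossing event detector can be a pure function of arrays, callable from a script, a batch job, or a GUI alike (illustrative, not the actual neuroanalysis implementation):

```python
import numpy as np

def detect_events(trace, threshold):
    """Return indices where *trace* crosses *threshold* from below.

    Pure function: no file I/O, no plotting, no global state, so it
    is trivially testable and reusable from any context.
    """
    above = trace > threshold
    # rising edges: samples above threshold whose predecessor was not
    return np.nonzero(above[1:] & ~above[:-1])[0] + 1
```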

 

* Reusable user interface tools. Making reusable UI tools is a lot more work than just writing the UI for a single specific use case. However, it begins to pay off in the cases where you want a similar functionality in more than one context. For example, a very common type of interaction for us is to plot a signal from which we will detect events (perhaps a calcium trace, or a recording of evoked synaptic currents), and on top of that we have a line showing the threshold at which events are detected, and tick marks showing the events that were detected. If the user moves the threshold line, then the events are re-detected and the tick marks update. Because we have repeated this motif in so many different places, it made sense to encapsulate it in a single class that creates the plot line, threshold line, tick marks, and a set of control parameters. Then for each specific use case, we can just insert these elements wherever they are needed in the UI. So a common theme here is that a reusable UI tool contains references to multiple UI elements (but does not attempt to assemble them into a final window or layout), and also the code that functionally connects these elements and calls out to the actual analysis functions.
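
With pyqtgraph (which ACQ4 is built on), that motif might be bundled roughly like this -- a sketch that assumes the hypothetical detect_events function from above:

```python
import pyqtgraph as pg

class ThresholdEventDetector(object):
    """A data curve, a draggable threshold line, and event tick marks,
    plus the logic connecting them. The caller decides where these
    items actually go in the larger UI.
    """
    def __init__(self, time, trace, threshold):
        self.time, self.trace = time, trace
        self.curve = pg.PlotDataItem(time, trace)
        self.threshold_line = pg.InfiniteLine(pos=threshold, angle=0, movable=True)
        self.ticks = pg.VTickGroup(yrange=[0.9, 1.0])
        # re-run detection whenever the user drags the threshold line
        self.threshold_line.sigPositionChanged.connect(self.update_events)
        self.update_events()

    def update_events(self):
        inds = detect_events(self.trace, self.threshold_line.value())
        self.ticks.setXVals(list(self.time[inds]))

    def add_to(self, plot_item):
        # insert the elements into whatever plot the caller provides
        for item in (self.curve, self.threshold_line, self.ticks):
            plot_item.addItem(item)
```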

 

* Reusable visualization tools. If you write the code to display the results of a particular analysis, keep it separate from the rest of your UI. Chances are good that you will want to display that same result again, but in a different context (for example, it’s common to visualize a result immediately after it is generated, but then to want to see the result later on by reloading from the stored analysis file or from a database).
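
For example, the display code can be a single function that takes only a plot surface and a result object, so exactly the same code runs in the live UI and in a script that reloads stored results (a sketch; the result keys are made up):

```python
def show_event_result(plot, result):
    """Draw a stored event-detection result on any pyqtgraph PlotItem.

    Works the same whether *result* was computed seconds ago or
    reloaded from an analysis file or database.
    """
    plot.plot(result['time'], result['trace'])
    for t in result['event_times']:
        plot.addLine(x=t)
```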

 

* A set workflow for performing analyses, where each analysis generates a new file (don’t modify existing files). Ideally, these results become part of your data abstraction layer, so that you can re-load them alongside your data.

 

* A database for cataloging similar experiments. The point here is to take all of the “messy” raw data and force it all into nice rectangular tables, so that every experiment has the same format, and the results of the primary analyses for each experiment can be compared easily. For a small number of experiments, it might be easiest just to do this manually in a spreadsheet. As you get more and more experiments, it may become worthwhile to automatically construct this database from raw data (perhaps just saving it as a set of csv files, for example), so that it can be completely regenerated from raw data and analysis result files whenever you want to modify your analysis. For even larger experiments, you may eventually reach a point where you need a full SQL database to manage all of your data efficiently. ACQ4 has a database system based on sqlite that works reasonably well, and my latest project uses a postgres database. In both cases, building the database was a lot of work, but ultimately allowed us to do the high-level analyses that we needed.
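
At the lightweight end of that spectrum, rebuilding a summary table from the per-experiment result files might look like this (a sketch using sqlite; the file layout and column names are made up for illustration):

```python
import glob, json, sqlite3

def build_database(db_path, result_files):
    """Rebuild the experiment summary table from scratch out of the
    per-experiment analysis result files.
    """
    db = sqlite3.connect(db_path)
    db.execute("DROP TABLE IF EXISTS experiments")
    db.execute("CREATE TABLE experiments "
               "(expt_id TEXT, cell_type TEXT, event_rate REAL, mean_amplitude REAL)")
    for path in result_files:
        with open(path) as f:
            result = json.load(f)
        db.execute("INSERT INTO experiments VALUES (?, ?, ?, ?)",
                   (result['expt_id'], result['cell_type'],
                    result['event_rate'], result['mean_amplitude']))
    db.commit()
    return db

# regenerate the whole table whenever the analysis changes:
# build_database('experiments.db', glob.glob('data/*/analysis.json'))
```

Because the table is always regenerated from the result files, it is safe to throw away and never becomes the only copy of anything.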

 

 

That's all for now. I think we could write a book on this topic when we are done...

 

 

 


Michael Graupner
Nov 7, 2017, 11:05:21 AM
to ac...@googlegroups.com
Dear Luke, 

Thank you for providing all those insights. They make sense reading through them.

I didn't get one point: "A set workflow for performing analyses, where each analysis generates a new file (don't modify existing files)." What do you mean by a set workflow?

At some point, I had to create data structures which contained all the information related to an experiment, both the raw data and the analyzed data. What do you think of that?

I tried to get an example of "neuroanalysis" working in order to see your ideas in practice. However, the raw data for
"test_event_detection.py": "test_data/synaptic_events/events1.npz"
or for
"test_psp_fit.py": "psp.csv"
seems to be missing. Or am I missing something?

Thanks!

Cheers,
Michael




Luke Campagnola
Nov 28, 2017, 8:18:30 PM
to acq4
On Tue, Nov 7, 2017 at 8:04 AM, Michael Graupner <graupner...@gmail.com> wrote:
> Dear Luke,
>
> Thank you for providing all those insights. They make sense reading through them.
>
> I didn't get one point: "A set workflow for performing analyses, where each analysis generates a new file (don't modify existing files)." What do you mean by a set workflow?

In most projects I have worked on, analysis happens in multiple stages. A typical workflow might look like:
1. Load all data in MosaicEditor to make sure it is well-aligned and to annotate the locations of cells or anatomical features
2. Analyze each ephys or calcium trace and extract some parameters like the timing and amplitude of events
3. Combine these metrics across many repeated traces to extract an average metric for each cell
4. Combine averaged metrics across many experiments in a large table to make comparisons and check for correlations between ephys features and other experimental parameters

Each stage in the workflow has a specific set of inputs and outputs. Whenever one stage generates a new result, keep a record not only of what the results were, but also of the names of the input files, the values of any configurable parameters, and even the git commit ID of the analysis code. This can save you a lot of time later on if you change your analysis routines and need to re-analyze large batches of data.

There's also an open question about where to store the output of each stage--inside the HDF5 files? In separate files? In a completely separate directory? In a database?
I personally prefer to have each stage produce a single file (often json, yaml, or HDF5) that lives alongside the raw data, and then have tools that can rebuild the entire relational database from these raw files.
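
A sketch of what saving one stage's output with that provenance might look like (the helper and field names are illustrative, not an existing ACQ4 or neuroanalysis API):

```python
import json, subprocess, time

def save_stage_result(out_path, results, input_files, params):
    """Write one analysis stage's output as a *new* file, together
    with everything needed to reproduce or audit it later.
    """
    commit = subprocess.check_output(['git', 'rev-parse', 'HEAD']).strip().decode()
    record = {
        'results': results,        # the actual analysis output
        'inputs': input_files,     # names of the files this stage read
        'params': params,          # configurable analysis parameters
        'code_version': commit,    # git commit ID of the analysis code
        'timestamp': time.time(),
    }
    with open(out_path, 'w') as f:  # a new file; existing files are never modified
        json.dump(record, f, indent=2)
```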


> At some point, I had to create data structures which contained all the information related to an experiment, both the raw data and the analyzed data. What do you think of that?

I think that's a good idea -- I like to model my raw data and results as classes like Experiment, Pipette, Cell, Synapse, Spike, etc., where each class gets its properties either by reading the raw data and analysis files or by pulling from a database. This makes it much easier to write higher-level analysis code, and it also separates the analysis code from the underlying data representation (for example, you could easily switch from reading files to reading from a database while keeping the same object-model API).
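
A rough sketch of that kind of object model, with lazily loaded properties (the loader backend and its list_cells/load_spikes methods are hypothetical placeholders):

```python
class Experiment(object):
    """One experiment; properties are filled in on demand from raw
    data files, analysis results, or a database -- callers don't care which.
    """
    def __init__(self, expt_id, loader):
        self.expt_id = expt_id
        self._loader = loader   # swappable backend: files or database
        self._cells = None

    @property
    def cells(self):
        if self._cells is None:
            self._cells = [Cell(self, cid)
                           for cid in self._loader.list_cells(self.expt_id)]
        return self._cells


class Cell(object):
    def __init__(self, expt, cell_id):
        self.expt = expt
        self.cell_id = cell_id

    @property
    def spikes(self):
        # delegate to whichever backend the experiment was built with
        return self.expt._loader.load_spikes(self.expt.expt_id, self.cell_id)
```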
 

> I tried to get an example of "neuroanalysis" working in order to see your ideas in practice. However, the raw data for
> "test_event_detection.py": "test_data/synaptic_events/events1.npz"
> or for
> "test_psp_fit.py": "psp.csv"
> seems to be missing. Or am I missing something?

Some of the test data is kept in a git submodule, and some just hasn't been added to the repository yet.

 
Cheers,
Luke