task involving high-speed camera renders the GUI unresponsive


Michael Graupner

Jun 19, 2017, 4:59:15 PM
to ac...@googlegroups.com
Hello, 

I recently implemented a high-speed camera (200 frames/sec) for behavioral imaging using the micromanager camera class. Implementation was straightforward, and using this camera in ACQ4 works great for the most part. The micromanager camera class is a gem! 

When I use the camera in a task of 20 or 30 sec, 4000 to 6000 images are recorded, amounting to 2 to 3 GB of data. Unfortunately, some processing at the end of the task renders the entire ACQ4 interface (not only the task manager) unresponsive. None of the running threads are updated during that period (e.g. the thread polling the laser status). The GUI comes back to life after 20 to 30 sec, but sometimes with an error from a thread and most of the time with the laser switched off. I don't fully understand why the laser is switched off, but I presume that this is a safeguard of the device when an interaction error occurs. 

Do you have an idea of what is going on during the dead time and is there a way to avoid that? I don't mind the unresponsive interface - even though it is annoying - but I want to make sure the threads are running safely and the laser does not switch off. 

Any help and suggestions are much appreciated. Thank you very much in advance. 

Cheers,
Michael 

Luke Campagnola

Jun 19, 2017, 5:42:13 PM
to ac...@googlegroups.com

Hi Michael,


There is a tool we use to debug lockups here: https://github.com/acq4/acq4/blob/develop/acq4/pyqtgraph/debug.py#L1098

Just create an instance, and every 10 seconds it will print a stack trace from every running thread. This should make it possible to determine where the system is getting hung up. 
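
For readers who want to see the idea without pulling in the linked tool, here is a minimal standard-library sketch of a periodic thread-stack dump. It is not the pyqtgraph implementation; the names `format_thread_stacks` and `PeriodicStackDump` are made up for this illustration, and the 10-second default only mirrors the interval mentioned above.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump with one stack trace per running thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    # sys._current_frames() maps thread id -> topmost frame of that thread
    for ident, frame in sys._current_frames().items():
        parts.append("--- thread %s (%s) ---\n" % (ident, names.get(ident, "unknown")))
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


class PeriodicStackDump(threading.Thread):
    """Hypothetical helper: write all thread stacks every `interval` seconds."""

    def __init__(self, interval=10.0, out=sys.stderr):
        super().__init__(daemon=True, name="PeriodicStackDump")
        self.interval = interval
        self.out = out
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait returns False on timeout, True once stop() was called
        while not self._stop_event.wait(self.interval):
            self.out.write(format_thread_stacks())

    def stop(self):
        self._stop_event.set()
```

Starting it with `PeriodicStackDump().start()` before running the task should show, during a freeze, which call each thread is sitting in.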


One possibility is that the task runner (I assume you are using the task runner?) is doing something inefficient when displaying the results from the task. However, I tested this with a mock camera and it seemed to have no trouble with a 30 second task (~6500 frames). Is it possible you are running out of memory (are you using a 32- or 64-bit python)?



Luke


 
--
You received this message because you are subscribed to the Google Groups "ACQ4" group.
To unsubscribe from this group and stop receiving emails from it, send an email to acq4+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/acq4/CAFVjdW%2BkjfmEfW0uAocX9PU3Lr_975D6Y7MfHQR%2BXRsKP6xkyA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Michael Graupner

Jun 20, 2017, 9:45:51 AM
to ac...@googlegroups.com
Dear Luke, 

thank you for your prompt response and suggestions. 

Yes, I am using the task runner. I am running 64-bit Python (2.7.11, Anaconda) on a 64-bit machine. Physical memory does not run out during the task: usage increases from 15 % to 27 % during the task and stays there until the GUI becomes responsive again (64 GB total). 

Note that the freezing does not happen during a "Test" run in the task runner, but only during "Record Single". Memory consumption of the computer stays elevated during the freezing period and drops abruptly at the end. 

"Record Single" finishes with the following error : 
[15:30:24]  Error starting camera acquisition:

    |==============================>>
    |    File "acq4\util\Thread.py", line 23, in __run_wrapper
    |      self.__subclass_run()
    |    File "acq4\devices\Camera\Camera.py", line 871, in run
    |      printExc("Error starting camera acquisition:")
    |    File "acq4\util\debug.py", line 15, in printExc
    |      pgdebug.printExc(msg, indent, prefix)
    |    ---- exception caught ---->
    |    File "acq4\devices\Camera\Camera.py", line 858, in run
    |      self.dev.noFrameWarning(diff)
    |    File "acq4\devices\Camera\Camera.py", line 190, in noFrameWarning
    |      print "Camera acquisition thread has been waiting %02f sec but no new frames have arrived; shutting down." % diff
    |  TypeError: float argument required, not function
    |==============================<<
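
A note on the TypeError at the bottom of this traceback: it is what a `%f` format raises when the value handed to it is a function object rather than a number. The sketch below reproduces that in isolation. It is written for Python 3 (the log above is from Python 2.7, where the message text differs), and `no_frame_warning` is a hypothetical stand-in, not the ACQ4 method.

```python
import time


def no_frame_warning(diff):
    """Format the warning the same way the traceback shows (hypothetical stand-in)."""
    return "Camera acquisition thread has been waiting %02f sec but no new frames have arrived" % diff


# Passing a number works as intended.
ok_message = no_frame_warning(1.5)

# Passing a callable (for example a clock function that was never called)
# raises TypeError, as in the traceback above.
try:
    no_frame_warning(time.time)
    raised = False
except TypeError:
    raised = True
```

Whether this is really what happens inside Camera.py is an assumption; the traceback only shows that `diff` was a function at that point.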


I have attached three runs using the ThreadTrace function. I marked the beginning of the task and the point during which the program freezes within the #### marks. 

I could not find any indication in the ThreadTrace output of where the freezing comes from, except that it disrupts the running threads. 

Cheers,
Michael 


ThreadTrace_1.txt
ThreadTrace_2.txt
ThreadTrace_3.txt

Luke Campagnola

Jun 20, 2017, 2:56:15 PM
to ac...@googlegroups.com

I tested this on my rig and got the same behavior you described. For a 30-second recording, ACQ4 hangs for several seconds while it processes the image data. For a 60-second recording, the entire machine hangs for several minutes because it goes into swap. 


I found one place where I could reduce memory overhead, when converting from a list of frames into a single array (whether this needs to be done at all is a different question, but that would take more work to change). With these changes, the acquisition thread still takes several seconds to process the image data, but it no longer causes the other threads to lock up: https://github.com/acq4/acq4/pull/61. Let me know if that helps at all.


This has now unmasked another issue: HDF5 cannot handle chunks larger than 4 GB. Perhaps you can take a look at that (assuming you get the same error)? 

Best place to fix this might be in CameraTask.storeResult(); note that the keyword arguments to dh.storeResult(...) are ultimately passed to MetaArray.writeHDF5(). 
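
As a rough illustration of the chunk-size point, here is a small sketch that picks a chunk shape covering a whole number of frames while staying far below the 4 GB limit. The function names, the 64 MiB target, and the 512 x 512 16-bit frame size in the usage line are assumptions for this example only; how such a shape is actually passed through `dh.storeResult(...)` to `MetaArray.writeHDF5()` is not shown here.

```python
HDF5_MAX_CHUNK_BYTES = 2**32 - 1  # HDF5 chunks must stay below 4 GiB


def frames_per_chunk(frame_shape, itemsize, target_bytes=64 * 2**20):
    """Number of whole frames that fit into a chunk of roughly target_bytes."""
    frame_bytes = itemsize
    for n in frame_shape:
        frame_bytes *= n
    if frame_bytes > HDF5_MAX_CHUNK_BYTES:
        raise ValueError("a single frame already exceeds the HDF5 chunk limit")
    target = min(target_bytes, HDF5_MAX_CHUNK_BYTES)
    return max(1, target // frame_bytes)


def chunk_shape(n_frames, frame_shape, itemsize, target_bytes=64 * 2**20):
    """Chunk shape (frames, rows, cols) for a stack of n_frames images."""
    n = min(n_frames, frames_per_chunk(frame_shape, itemsize, target_bytes))
    return (n,) + tuple(frame_shape)


# Hypothetical stack: 6000 frames of 512 x 512 pixels, 2 bytes per pixel
example = chunk_shape(6000, (512, 512), 2)
```

Chunking along the frame axis also fits the later idea of appending frames one at a time, since each write then touches only the last chunk.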



Luke





Michael Graupner

Jun 21, 2017, 9:04:36 AM
to ac...@googlegroups.com



Unfortunately, the patch does not change anything on my side. The processing still interferes with the laser thread, which causes the laser to shut down. Is it possible to run the processing in a separate thread? 

Cheers,
Michael
 


Luke Campagnola

Jun 21, 2017, 7:25:58 PM
to acq4


The task (including image processing and storage) is already running in a background thread. When it begins to process and store the image data, it actually seems to lock up all threads (in my tests, at least), and in some cases the entire machine. It could just be that the disk I/O is saturated, so any thread or process may become blocked waiting for disk access? On our system, we have a relatively fast SSD for data storage that is separate from the operating system disk (including ACQ4 configuration files). 

Maybe we should focus instead on figuring out why the laser is switching off--could you find the code that switches off the laser and insert a `traceback.print_stack()` so we can see who is calling it?
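
A minimal sketch of that debugging step, assuming a made-up `MockLaser` class with a `switch_off()` method (the real laser device class and method names in ACQ4 will differ):

```python
import io
import traceback


class MockLaser:
    """Hypothetical stand-in for a laser device; not the ACQ4 class."""

    def __init__(self, log=None):
        self.is_on = True
        self.log = log  # file-like object; None makes print_stack use stderr

    def switch_off(self):
        # Print the call chain that led here, so the caller can be identified
        traceback.print_stack(file=self.log)
        self.is_on = False


def shutdown_on_error(laser):
    """Example caller whose name should show up in the printed stack."""
    laser.switch_off()


log = io.StringIO()
laser = MockLaser(log=log)
shutdown_on_error(laser)
```

If nothing is printed when the laser goes off, the shutdown is probably not triggered from this code path at all, which is useful information in itself.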


Michael Graupner

Jul 12, 2017, 10:37:09 AM
to ac...@googlegroups.com
I am back working on this issue. 

I don't think there is a call to the laser routine. I think the problem arises in the laser thread (MaiTaiThread):
It seems to me that the laser switches off as a precaution when something goes wrong in the serial protocol communication. The stalling interrupts the thread such that writes and reads are not completed properly. This in turn makes the laser turn off. 

Is there a way to make that thread fail-safe? Can I build in a precaution there? 

Thanks,
Michael

 



Luke Campagnola

Jul 12, 2017, 2:59:26 PM
to acq4


When I did the test on my machine, it seemed that the entire OS was blocked while the HDF5 file was being written. So I am not sure that this can be solved just by moving the laser access to a more privileged thread, or even to a separate process.

Ok, next step: you said that there are no problems if you run the task without storing any data. Let's just verify that it is indeed the camera image storage that is causing the system to lock up. Under devices/Camera/Camera.py, CameraTask.storeResult(), comment out the last line of the function `dh.writeFile(data, k, info=info)`, and then run your task. 

If that really does prevent the lockup from occurring, then it may help to think about how and where you are storing data -- on our machines, we use a fast SSD for data storage that is separate from the system drive (which includes ACQ4 and its configuration files). Another possibility is that we need to write the image data out more slowly as it is being collected, rather than in a single write at the end. 


Luke





 

Michael Graupner

Jul 13, 2017, 4:43:47 AM
to ac...@googlegroups.com

Commenting out `dh.writeFile(data, k, info=info)` solves the issue. The results of a `Record Single` run in the Task Runner are displayed immediately, and ACQ4 is immediately responsive again. So it is probably the issue you mention: the system gets blocked when the HDF5 file is written to disk. However, in my case only ACQ4 is blocked; I can still interact with other programs and launch them. Is that also the case on your machine? 
 


I am in the process of ordering a SATA SSD for the ACQ4 data storage, so this is underway. To what extent will an SSD solve the issue? Saving the data during acquisition might be necessary if the issue persists even with SSD storage.

Cheers,
Michael


 



Luke Campagnola

Jul 14, 2017, 2:35:12 PM
to acq4

For me it depended on the amount of data that was being processed. It
was only for very long recordings that I would see other threads or
processes start to lock up. Some parts of the system were still
responsive (for example, I could open the start menu), whereas others
would hang until the write was complete.



The hope is that 1) having fast storage will substantially reduce
the duration of the lockup, and 2) having all acq4 config files on a
separate drive from the data may prevent some other threads from
locking up, assuming they were competing with the task thread for disk
I/O time.

Michael Graupner

Oct 10, 2019, 3:33:08 AM
to ac...@googlegroups.com
Dear Luke, 
I am coming back to the "storing large data renders ACQ4 unresponsive" issue from 2 years ago. I would like to know what it would take to store data progressively during a task rather than all at once at the end. 

To recap, we were recording a video at 200 frames/sec for 30 sec, and storing this data to an SSD renders ACQ4 unresponsive for about 40-60 seconds. We are now planning to extend our recording periods to 1 min and beyond and would like to avoid the inconvenient stalling of ACQ4.

You suggested 2 years ago:
" ... If that really does prevent the lockup occurring, then it may help to think about how and where you are storing data -- on our machines, we use a fast SSD for data storage that is separate from the system drive (which includes ACQ4 and its configuration files). Another possibility is that we need to write the image data out more slowly as it is being collected, rather than in a single write at the end."

As mentioned above, we upgraded to a fast SSD. Do you have an idea how the incremental storing could be implemented in ACQ4? Would it entail large modifications in the code? 

Thank you for your thoughts on that. 

Best regards,
Michael Graupner




Luke Campagnola

Oct 22, 2019, 1:28:38 PM
to ac...@googlegroups.com
Hi Michael!
Sorry for the delay. In theory, this is pretty simple to implement. Currently, a CameraTask accumulates frames while the task is running by appending to a list here: https://github.com/acq4/acq4/blob/develop/acq4/devices/Camera/Camera.py#L549, and when the task finishes, the list is packed into a MetaArray and stored here: https://github.com/acq4/acq4/blob/develop/acq4/devices/Camera/Camera.py#L602.

What you want instead is to create the MetaArray file when the first frame arrives and append frames to the file every time newFrame is called. This functionality is already implemented in the RecordThread class, which is used by the CameraModule when it records a stack: https://github.com/acq4/acq4/blob/develop/acq4/util/imaging/record_thread.py. My inclination would be to create an instance of RecordThread just for use by the CameraTask, and then CameraTask.newFrame simply needs to call RecordThread.newFrame to have the frame appended to file. It looks like RecordThread will need to be slightly modified so that CameraTask can specify exactly what file name to write.
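
The pattern described above can be sketched in a few lines of standard-library Python. This is not the RecordThread code: `FrameWriter`, `new_frame`, and `finish` are hypothetical names, and frames are appended as raw bytes to a plain file rather than to a MetaArray/HDF5 file, purely to keep the example self-contained.

```python
import queue
import threading


class FrameWriter(threading.Thread):
    """Hypothetical background writer: appends each frame to a file as it arrives."""

    def __init__(self, path):
        super().__init__(daemon=True)
        self.path = path
        self.frames_written = 0
        self._queue = queue.Queue()

    def new_frame(self, frame_bytes):
        # Called from the acquisition side; returns immediately
        self._queue.put(frame_bytes)

    def finish(self):
        # None is the sentinel telling the writer to drain the queue and stop
        self._queue.put(None)
        self.join()

    def run(self):
        with open(self.path, "ab") as fh:
            while True:
                item = self._queue.get()
                if item is None:
                    break
                fh.write(item)
                self.frames_written += 1
```

In this arrangement the acquisition callback only enqueues the frame, so the disk writes are spread over the whole recording instead of happening in one large write at the end of the task.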

Let me know how that goes!

Cheers,
Luke


Michael Graupner

Oct 30, 2019, 3:03:40 PM
to ac...@googlegroups.com
Dear Luke, 

thank you very much for your suggestions. I implemented them and it works! The storage problem (stalling when saving large numbers of images to disk) no longer exists when each newly arriving frame is immediately appended to the file. I am really excited about that, as it improves our recordings tremendously. 

I have added only minor changes so far in CameraTask to make this work, which you can check out in my cameraStorage branch https://github.com/mgraupe/acq4/tree/cameraStorage

There are a few minor things for which I would appreciate your help: 
- When self.recordThread.newFrame(frame) is called here, CameraTask should know whether it is a 'Test' run or a 'Record' run. Currently, images are stored in both cases. How can CameraTask access this information at this point? 

- In the Record Single run, the image stack is not stored in the protocol directory but in the default storage directory. I suppose the protocol directory has to be loaded here: 
Do you agree? How can the current protocol directory be accessed by record_thread? In other words, where is this information stored so that it can be handed over to record_thread? 

Thank you for your thoughts on these points in advance. 

Cheers,
Michael 


Luke Campagnola

Oct 30, 2019, 3:13:54 PM
to ac...@googlegroups.com
Cool!

The camera task is given a reference to the main Task object here (as `parentTask`) : https://github.com/acq4/acq4/blob/develop/acq4/devices/Camera/Camera.py#L463

You should be able to use parentTask.cmd['storeData'] and parentTask.cmd['storagePath'] to decide whether/where to store data. Currently these decisions are implemented here: https://github.com/acq4/acq4/blob/develop/acq4/Manager.py#L1172
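
A minimal sketch of that decision, assuming `cmd` behaves like a dictionary and that `storagePath` can be treated as a plain directory string (in ACQ4 it may be a different kind of object); the function name and the default file name `video.ma` are made up for this example:

```python
import os


def storage_target(cmd, file_name="video.ma"):
    """Return the file path to record to, or None for a run that should not store data."""
    if not cmd.get("storeData", False):
        return None  # e.g. a 'Test' run: do not write frames to disk
    return os.path.join(cmd["storagePath"], file_name)


# Hypothetical command dictionaries for a test run and a recording run
test_target = storage_target({"storeData": False})
record_target = storage_target({"storeData": True, "storagePath": "protocol_dir"})
```

CameraTask could evaluate something like this once at the start of the task and only forward frames to the record thread when the result is not None.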


Luke


Michael Graupner

Oct 31, 2019, 5:09:10 PM
to ac...@googlegroups.com
Hi Luke, 

thanks for the hints; as always, they pointed me in the right direction. I got it working to my satisfaction on a test machine and will run more tests on the experiment computer next week. 

On a different note, we will upgrade the experiment computer to Windows 10 and will have to reinstall everything for that purpose. What would it take to run ACQ4 on Python 3? Should the current develop branch on GitHub already run with Python 3? What is the status there? 

Cheers,
Michael  



Luke Campagnola

Oct 31, 2019, 6:28:02 PM
to acq4
We are not quite ready for python 3 yet, but close. Still having some issues with PyQt5 and MicroManager.

Michael Graupner

Nov 5, 2019, 5:40:24 AM
to ac...@googlegroups.com
Dear Luke, 

we are now storing camera images during the acquisition, and this part works perfectly. ACQ4 is non-responsive for a few (3-4) sec after the acquisition, which is nothing compared to before, when storing the video after the task rendered the GUI non-responsive for a long time. 

However, we are having a memory accumulation issue. The recorded frames accumulate somewhere in ACQ4 over multiple recordings, i.e., the used system memory increases over a sequence but also over successive sequence recordings. I would appreciate any hints about where to look. 

I tested the following modifications, but they did not fix the problem: 
- I commented out the recordThread part in the CameraTask class. I don't think the issue is related to acq4.util.imaging.record_thread.

- Deleting self.frames in the CameraTask class did not fix the memory accumulation. 

I have no idea where else the frames could be stored. Also, it is not clear to me whether this issue has to do with the sequential storing. Is there maybe some variable handling linked with storeResults? 

Thanks very much in advance.

Best regards,
Michael 



Luke Campagnola

Nov 5, 2019, 12:05:12 PM
to Michael Graupner, ac...@googlegroups.com
Memory leaks can be tricky to resolve in Python. I don't have a good idea what could be leaking (I think you have already checked the obvious places), but I do have a tool you can use to try tracking the leak: 


I originally developed this for tracking leaks in acq4, so hopefully it works for you. Let me know if you need help getting that working. 
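
Independently of the tool mentioned above, a rough first pass at locating a leak can be done with the standard `gc` module by comparing object counts per type before and after a recording. The sketch below is only that generic technique; `count_objects` and `growth` are names invented for this example.

```python
import gc
from collections import Counter


def count_objects():
    """Count gc-tracked objects by type name after forcing a collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=10):
    """Types whose instance count increased between two snapshots, largest first."""
    # Counter subtraction keeps only the positive differences
    return (after - before).most_common(top)
```

Taking one snapshot before a recording and one after it, then printing `growth(before, after)`, should show which object types keep piling up from run to run. Note that this counts objects, not bytes, so a handful of very large arrays will not stand out by count alone.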
