Bonsai and DeepLabCut - feature recognition less accurate on-line than off-line


Blasiak Tomasz

Dec 28, 2020, 5:55:50 AM
to Bonsai Users
Hi,
I have several well-trained (judging by their off-line accuracy) rat-tracking DLC networks (ResNet-50), trained on open field and conditioned place preference tests. After exporting these networks, I wanted to use them on-line in Bonsai. As a test, I fed in the raw videos (FileCapture) that I had previously used for training in DLC. Unfortunately, on-line feature detection is noticeably less accurate than off-line detection. Where could this difference come from?
Best,
Tomasz.

Blasiak Tomasz

Dec 28, 2020, 5:59:10 AM
to Bonsai Users
PS. I use DLC v2.2b8 (single animal) and the newest DeepLabCut packages for Bonsai.

k.dani...@nencki.edu.pl

Dec 28, 2020, 6:23:52 AM
to Bonsai Users
Hi,

I'm kinda curious about this also. Is the difference a problem of computational power that would be solved by a faster GPU or a less computationally expensive, lower-latency model (like MobileNet)? Or does it come from non-batch processing when inferring online?

Konrad

Edmund Chong

Dec 29, 2020, 2:46:45 PM
to Bonsai Users
I've been struggling with the same issue for the past few weeks. I have tried MobileNet, which does not work, and also the latest DLC (2.2b).

For me I've also noticed that the more iterations the network is trained on, the worse this problem becomes (the accuracy in Bonsai becomes much, much worse), even though in DLC itself the model looks really good.

I've spoken to Goncalo about this--the computation in Bonsai should be identical to the one in DLC-Live, which has a slightly different implementation from DLC but should give the same end result. I have looked at the DLC-Live inference--it looks almost (?) identical to the one in DLC, and better than the one in Bonsai.

The other thing I'm trying right now, with Goncalo, relates to a "global_scale" parameter in the config file used for training the network. DLC automatically applies some scaling (e.g. 0.8, 0.5), which is presumably accounted for during model evaluation. In contrast, Bonsai does not account for this scaling. This could be the reason.
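For reference, the scaling Edmund mentions lives in the training pose_cfg.yaml of a DLC project; the exact file path and default values vary between DLC versions, so treat this excerpt as purely illustrative:

```yaml
# dlc-models/iteration-0/<task>/train/pose_cfg.yaml (illustrative excerpt)
global_scale: 0.8        # frames are rescaled by this factor before training
scale_jitter_lo: 0.5     # additional random scale augmentation bounds
scale_jitter_up: 1.25
```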

Edmund

Edmund Chong

Dec 29, 2020, 2:51:53 PM
to Bonsai Users
Also: for me, the accuracy difference between Bonsai and DLC/DLC-Live is already noticeable with offline detection of videos played through FileCapture.

Gonçalo Lopes

Jan 12, 2021, 5:36:08 PM
to Edmund Chong, Bonsai Users
Hi Edmund et al.

Any updates on this? I haven't observed the discrepancy myself with pretrained models downloaded from the model Zoo (e.g. human pose) and with the ones I happened to train myself. I have asked around and it seems like some people experience this issue, and some don't, so I would like to understand better what exactly causes this.

I have asked Mackenzie about this and she suggested this might be down to an effect of pre-scaling which might be enabled by default in deeplabcut (this is sometimes done for robustness of the network to changes in resolution from the original training set).

How exactly are you training the models? Are you using the DLC gui as is, or the python scripts?

Any accuracy issue will be the same regardless of whether you are playing videos offline or online, as the evaluation code is exactly the same and simply expects a sequence of frames as an input, regardless of where they came from.

This is not likely to be related to GPU computational power or hardware, as in that case it would affect DeepLabCut and Bonsai+DLC the same, as they both use TensorFlow under the hood, with very little overhead and no particular conversion.

From the examples Edmund shared with me the issue seemed much more a matter of scale and shift, as the proportions all seemed correct, which I don't believe you would expect if the tracking was independently failing to regress the locations of features due to poor training accuracy. I'm not sure if the failures reported by Blasiak and Konrad show the same pattern.
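To build intuition for the pattern described above (the coordinates here are made up, purely for illustration): if predictions are off by a uniform scale factor, the ratios of distances between keypoints are preserved, unlike the scattered errors a poorly trained network would produce.

```python
import math

# Hypothetical "true" keypoint positions, in pixels.
true_xy = [(100.0, 50.0), (200.0, 150.0), (160.0, 90.0)]

# Predictions off by a uniform scale (e.g. an unaccounted global_scale).
s = 0.8
pred_xy = [(s * x, s * y) for x, y in true_xy]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Absolute positions are wrong...
assert pred_xy[0] != true_xy[0]

# ...but the proportions between keypoints are unchanged.
ratio_true = dist(true_xy[0], true_xy[1]) / dist(true_xy[1], true_xy[2])
ratio_pred = dist(pred_xy[0], pred_xy[1]) / dist(pred_xy[1], pred_xy[2])
assert abs(ratio_true - ratio_pred) < 1e-9
```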

If anyone can share their training procedure and projects, that might help us to figure out what exactly is causing this discrepancy.


--
You received this message because you are subscribed to the Google Groups "Bonsai Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bonsai-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bonsai-users/6e730e3e-8c9b-42d5-8da3-4a2a13a9683an%40googlegroups.com.

kevf...@gmail.com

May 20, 2021, 9:38:12 AM
to Bonsai Users
Hi all,

I'd like to bump this discussion up and ask if some of you managed to solve the issue or at least understand the root of the problem.

I started training a network (mobnet35) on mouse reaching movements (100 fps movies, 224 x 174 pixels) through the DLC GUI (DLC version 2.2b8). The offline tracking is beautiful, with very few tracking mistakes on the first iteration. But the tracking does very poorly online (tested through live imaging at 100 fps and using the FileCapture approach). I tried to adjust the scale parameter based on the config file (I set it to 0.8 in Bonsai, with a minimum confidence factor of 0.5), but it does not really change the tracking performance. I am currently training a new network using ResNet-50 to compare. Also, tracking and the experiment were performed on the same computer, so this is unlikely to be a computing power issue.

Any new insights?
Thanks!
Kevin

Gonçalo Lopes

May 29, 2021, 10:09:04 AM
to kevf...@gmail.com, Bonsai Users
Hi Kevin,

The last I've heard of these issues, it seems there is indeed some difference in specific cases, but it has been very hard to pinpoint the exact conditions under which the problem shows up. It definitely does not happen with every video or training dataset.

There is a more specific issue tracking the problem across the DLC and DLC-Live codebases: Mismatch between video annotations in DLC and Bonsai-DLC for the same trained network · Issue #37 · DeepLabCut/DeepLabCut-live · GitHub

It might be useful to bring some of this discussion there with some more examples to see if we can get some insight into what is going on.

kevf...@gmail.com

Jun 15, 2021, 9:16:10 AM
to Bonsai Users
Hi Gonçalo, 

thanks a lot for the reply. Before I move the issue to the GitHub page, I'd like to post an error message that seems to prevent the correct use of the GPU on my machine (see the screenshot below): "Invoking ptxas not supported in Windows". I see that my GPU runs at about 9-10% during the pose detection process, but I am not sure if this is what I should expect or if it should be more, and/or whether it is linked to this message.

[Screenshot: "Invoking ptxas not supported in Windows" error]


Also, we have another machine in the lab equipped with an NVIDIA GeForce RTX 3090, but this one works with CUDA 11+ and cuDNN 8.0+. I could not run the inference in Bonsai there, since Bonsai relies on cuDNN v7.6.5. Is there any plan to release a package that works with higher CUDA/cuDNN versions? I believe this would also require TensorFlow v2.4 or higher.

Thanks for your insights,
Best,
Kevin

kevf...@gmail.com

Jun 17, 2021, 7:24:32 AM
to Bonsai Users
Hi all, after discussing with the DLC creators, it turns out that the issue - which has been raised slightly differently in another discussion here - stems from a problem in color conversion.

I solved my problem by applying a ConvertColor node between my video capture node and the DetectPose node. In my specific case, the camera is set to capture RGB24 frames. By using the Rgba2Bgr conversion I got excellent tracking through Bonsai, equivalent to the tracking performance tested using the python-based DLC-Live GUI. So for those who've raised this issue, I suggest that you check your video format and either change it or convert it the right way.
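The effect of that conversion is just a channel reversal. A minimal NumPy sketch (illustrative only; the node names are the Bonsai operators mentioned above) of why feeding RGB frames to a BGR-trained network scrambles the colors:

```python
import numpy as np

# A tiny stand-in for an RGB24 frame: pure red everywhere.
frame_rgb = np.zeros((4, 4, 3), dtype=np.uint8)
frame_rgb[..., 0] = 255  # channel 0 holds R in RGB order

# What a ConvertColor (RGB -> BGR) step effectively does: reverse the channel axis.
frame_bgr = frame_rgb[..., ::-1]

# After conversion the red intensity sits in channel 2, where a
# BGR-trained network expects it; without it, R and B are swapped.
assert frame_bgr[0, 0, 2] == 255 and frame_bgr[0, 0, 0] == 0
```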

Best,
Kevin

Gonçalo Lopes

Jun 22, 2021, 3:40:53 AM
to kevf...@gmail.com, Bonsai Users
Hi Kevin,

Thanks for your insightful observations! I hadn't personally come across this problem before, as I mostly use grayscale formats. The default color format for OpenCV (and therefore Bonsai) is BGR. There was some discussion about whether the DetectPose operator should automatically perform color conversions, but currently I am leaning against it, for a few reasons:

  1) there is no way of knowing definitively whether a given image is in RGB or BGR format (both have 3 channels);
  2) it would break the behaviour of workflows which are already passing images in the correct format;
  3) if by chance your input image source is already receiving images in the correct color format, you would be forced not only to invert the channels outside the node, but also to pay the extra performance penalty of the built-in conversion.
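A quick sketch of point 1 (toy arrays, purely illustrative): an RGB image and its BGR counterpart are indistinguishable by shape or dtype, so no automatic detection is possible.

```python
import numpy as np

# The same image stored in two channel orders.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 200  # red
rgb[..., 2] = 10   # blue
bgr = rgb[..., ::-1]

# Metadata gives no way to tell which is which...
assert rgb.shape == bgr.shape and rgb.dtype == bgr.dtype

# ...even though the pixel data differ wherever R != B.
assert not np.array_equal(rgb, bgr)
```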

Besides improving the package documentation, one solution for those using RGB and worried about performance would be to train the original dataset with the flipped color channel order, so that final inference does not require re-flipping.

