Memory saving and performance issues in control point editor

81 views
Skip to first unread message

johnfi...@gmail.com

unread,
Feb 17, 2022, 12:08:21 PM2/17/22
to hugin and other free panoramic software
I recently rewrote my local change for saving memory in the CP editor, to make it more suitable for committing on the branch with my other changes.

I ran into a bunch of performance side issues (yes, it saves memory, but the savings vs. cost in cpu time are a much trickier topic).

I have only one test environment and it is unusual, so I'm hoping for some feedback after I commit the changes on how this affects performance for others.  But while I'm still doing some final cleanups pre-commit, it would be great to get some opinions on which simple extra tweaks sound like they would be effective for other environments.

The basic change is: I wrote an efficient method for extracting and rotating a subimage from a wxImage, which allowed me to replace work on the whole image when zooming with work on just the visible portion when it is drawn.  That results in a big memory saving and a complicated performance change.

If you scroll around enough, then eventually the added cpu time would exceed the saved cpu time.  But for an interactive application, that typically doesn't matter, because the extra cpu time is distributed across user actions.

But a user expects scrolling to be much less sluggish than zooming, which demands more attention on the draw speed.

In my own environment (a giant high res display and an ancient graphics card) scrolling the image was painfully slow in my default-branch build and a bit slower in my version.  As I relearned linux performance tools  (which I had not used since over a decade ago), it took me a while to realize two things:
1) Almost all the sluggishness is CPU time spent in the XORG process that is independent of my change.
2) Of the CPU time within the Hugin process, almost all is creation of the magnifier image.  That ought to be such a small thing, that I initially paid no attention to the fact that it wasn't happening at all in the 200% zoom in default branch that I was comparing to my branch.  Recomparing at 150%, just so both had a magnifier, the performance difference in scrolling is too tiny to feel (but that could change in another environment).

I don't understand the intent of the magnifier current code.  I haven't yet learned vigra well enough to really follow the code.  For my own use, it would be easier to just call the same extract and rotate function I use for the whole visible portion.  But that would conflict with ever merging my code with Thomas's change to warp the magnifier.

Simply saving the magnifier result and only computing it when it changes would eliminate most of its contribution to sluggish scrolling.  Maybe I should do that before committing the change.

Scaling of the visible portion each time that changes is the biggest performance wild card in my change.  My faster, but messier, original version mixed that into the efficient extract and rotate code I wrote.  Since we support just a few scaling values and each is a rational number made from small integers, it is practical to have that execute far faster than the more general function in wxImage.  I never bothered to support .75 and 1.5 in my private copy because I never use them.  That and other needed cleanups prevent my committing that speedup for at least weeks.  On my AMD 5800X, the wxImage version is good enough for use even within the draw method.  But on other CPUs, especially those with less cache, I'm less sure.

There is also a giant performance flaw for float images, which would be easy and clean to fix in the image cache code vs. easy and ugly to fix in the control point code.  So I haven't yet done either.

On my system, all these performance changes are just tiny changes in a basically bad situation, because none of them affect the massive CPU time in the XORG process.  So I'm barely better than guessing at what will matter to others with better graphics cards.

In some environments, the whole rotate problem and likely the whole scaling problem can be passed all the way down to the graphics card, making this all really smooth.  These changes make those changes easier and I'm trying to keep those future changes in mind as I code.  But I can't test anything like that on my linux system and I still can't even build on my Windows system.

Gunter Königsmann

unread,
Feb 17, 2022, 1:23:54 PM2/17/22
to johnfi...@gmail.com, hugin and other free panoramic software
I don't know if that is the case here. But if the GUI feels sluggish one typical reason is that you try to update the display for every single mouse event you get sent (which means that if you draw slightly slower than your 19200dpi-7ms-latency-mouse sends data you start to fall behind with work and soon start lagging by a noticable amount). The way to counteract it is to update the display only when you get sent an "idle" event: this means that while and directly after the mouse moves you'll redraw the display as soon as the GUI thread manages to keep up with the work, but only if something has happened since the last redraw (which might be a timer or a mouse event or a key press). This might mean that you don't redraw after every single mouse event. But your monitor refresh rate might be slower than your mouse, anyway... ...and even if it isn't: Updating the zoom window only 40 times a second instead of at every screen refresh isn't bad. But lagging behind the user actions is.

Kind regards,

Gunter.

Gunter Königsmann

unread,
Feb 17, 2022, 1:56:17 PM2/17/22
to hugin and other free panoramic software
The same "speedup" method is used by the real time Linux kernel: The default kernel acts on every interrupt as fast as possible and by doing that saves every CPU cycle it can. But on slow computers that feels sluggish.
The real time kernel patch makes every interrupt set a flag that tells the system that if it has time it has some work to do here. If it has time it looks at the flags and priotizes the work. Both wastes additional CPU time. But the "choosing the right priority" part means there the system reacts instantaneously when you want it ti. Even if it makes your all- night-rendering-job run 20 minutes longer.

johnfi...@gmail.com

unread,
Feb 17, 2022, 3:10:42 PM2/17/22
to hugin and other free panoramic software
On Thursday, February 17, 2022 at 1:23:54 PM UTC-5 gunter.ko...@gmail.com wrote:
I don't know if that is the case here. But if the GUI feels sluggish one typical reason is that you try to update the display for every single mouse event you get sent (which means that if you draw slightly slower than your 19200dpi-7ms-latency-mouse sends data you start to fall behind with work and soon start lagging by a noticable amount). The way to counteract it is to update the display only when you get sent an "idle" event:

At least on my system, we aren't talking about anywhere near the speed implied by your speculation.  This isn't a video game.

I assume wxWidgets is dealing with the issue of getting behind by not even calling the onDraw method while previous onDraw calls haven't finished.  The results are lagging because a single update takes much too long, not because of any accumulation  of failure to keep up.

I haven't looked at the lower levels (gtk and x11) where most of the performance problem is, because from above wxWidgets I don't have any meaningful control over the way wxWidgets calls lower levels.  I certainly don't want to expand this effort into modifying wxWidgets.  It is also likely (and a big part of my reason for this thread) that the giant problem in lower levels doesn't act that way for most hugin users.  By contrast, the performance problem in the use of vigra to create the magnifier almost certainly does exist for anyone using the magnifier.  It might not make overall performance bad enough to be perceived as a problem, but I'm confident fixing it would have a noticeable effect for most users.

For my own use, the sluggish scrolling doesn't bother me much (as compared to the other things I fixed or enhanced).  But for some of my other changes to have a chance of being accepted, I expect the memory savings is needed and then that needs to be done in a way that across almost all use makes hugin more responsive, not less.  In the current rough version, there are probably more use cases in which the basic memory savings change makes it less responsive, even though there are also use cases in which this change already makes it more responsive.

Gunter Königsmann

unread,
Feb 17, 2022, 3:39:58 PM2/17/22
to johnfi...@gmail.com, hugin and other free panoramic software
Scrolling tends to be a place where wxWidgets might issue a excess number of redraw steps.
In all other cases if the intervals at which onDraw is called are strange I would wonder where Hugin triggers that.

The wxWidgets application I maintain (wxMaxima) draws the whole worksheet as a bitmap and displays it and I cannot complain about wxWidgets's speed even if my computer is only slightly better than an Intel Atom. But as I said I trigger redraws only from the OnIdle() function and only work on the part of the Big Bitmap I currently need...

johnfi...@gmail.com

unread,
Feb 19, 2022, 12:57:37 PM2/19/22
to hugin and other free panoramic software
I pushed the change discussed here to my branch.

Hopefully someone else will try building and testing it.

With this change, higher zoom level no longer affects memory use:  All levels other than 100% and "fit" have the same memory use as each other, and that memory use is lower than any of them had before.  (100% and "fit" are no worse than before and might be better).  I hope this can eliminate the objection to providing 400% and 800% zoom choices (which were already in this branch).  With a large high res display, I depend on the higher zoom.

I didn't find time for most of the pre-commit testing I hoped to do.  I did do the simple version of reducing cpu time used by creating the magnifier subimage.  I did test enough that I think it is appropriate for others to build/test.

With a reasonable test sequence, I compared my branch against a build of the Feb-17 default branch.  I compared at 150% so both would have a magnifier displayed.  Mine was a tiny bit more responsive and used 15% less total CPU time (to open an existing project, switch to 150%, and review every CP without changing any).

There should be a common base class for CPImageCtrl and MaskImageCtrl.  The areas I changed would fit better there than they do in CPImageCtrl itself.  Some might fit better in some util file.  I wouldn't ask permission to merge any of this to default without first resolving that decision.  But that only needs discussion if there is a possibility of such permission.

T. Modes

unread,
Feb 20, 2022, 10:55:26 AM2/20/22
to hugin and other free panoramic software
johnfi...@gmail.com schrieb am Samstag, 19. Februar 2022 um 18:57:37 UTC+1:
I pushed the change discussed here to my branch.

Hopefully someone else will try building and testing it.

I asked you several times not to mix up several things in one commit. You still mixed up several unrelated code changes in one changeset. (Yes, they have both the same target. But the code changes are totally unrelated. And the commit message say nothing if somebody looks later in the history.)

Yes, it builds and the memory consumptions is reduced. But at the same time the CPU usage increased significant. When scrolling a single image the CPU usage increases on my system to nearly 100 %!
And the synchronized scrolling of both images is broken - in the default branch it takes 3-4 % CPU for a single image scrolling and 6-7 % for the synchronized scrolling.

johnfi...@gmail.com

unread,
Feb 20, 2022, 11:17:18 AM2/20/22
to hugin and other free panoramic software
On Sunday, February 20, 2022 at 10:55:26 AM UTC-5 T. Modes wrote:

I asked you several times not to mix up several things in one commit. You still mixed up several unrelated code changes in one changeset. (Yes, they have both the same target. But the code changes are totally unrelated. And the commit message say nothing if somebody looks later in the history.)

I don't see how it is "several" unrelated changes.  There is the minor change (that in hindsight I should not have done, because it was separable and because you made the same change between when I tested and when I committed and I failed to notice that when committing).  But all the rest is one unit and I can't see how it could be split.  I also can't think of what a better commit message might be.

Yes, it builds and the memory consumptions is reduced. But at the same time the CPU usage increased significant. When scrolling a single image the CPU usage increases on my system to nearly 100 %!
And the synchronized scrolling of both images is broken - in the default branch it takes 3-4 % CPU for a single image scrolling and 6-7 % for the synchronized scrolling.

My system was always cpu bound when scrolling that window (not reported as 100% because it isn't multi-threaded enough to report as 100%) and my version is faster on my system.  I have tried to guess how it might behave on other systems, but apparently something is happening on your system that I didn't predict.  How many cores does your CPU have and what tool did you use to measure that 100%?

I will investigate the issue with synchronized scrolling.

There are a number of different ways to compromise the change for potentially better CPU performance and worse (but still better than original) memory use.  But I would need better feedback.

It is also possible to get another large performance increase by mixing the scaling into the subimage extraction, rather than using the slower more general version in wxImage.

johnfi...@gmail.com

unread,
Feb 21, 2022, 3:24:47 PM2/21/22
to hugin and other free panoramic software
I should have repeated my earlier statement that my committed code should only be tested with non-float images.  It has a serious performance flaw for float images.  That should be easy to correct, but I expected you to have a strong opinion on where/how that should be corrected, and I thought resolving any issues with non-float images should happen first.  But if you were testing with a float image, that would explain part of why your results were so different from my results.

I haven't seen the synchonized scrolling ever work, which makes it hard for me to diagnose why my code might have broken it.

My Windows system has an unmodified (downloaded from the web page) binary for Hugin and has great hardware.  Attempts at synchronized scrolling make both images jump around, but neither gets very far in the direction the mouse moves.
My Linux system has an old display card with a flawed display driver, so initially I thought that might be the reason synchronized scrolling doesn't work.  It fails the same between my version and the build from default, though not the same as on my Windows system.


T. Modes

unread,
Feb 22, 2022, 11:13:34 AM2/22/22
to hugin and other free panoramic software
johnfi...@gmail.com schrieb am Montag, 21. Februar 2022 um 21:24:47 UTC+1:
I should have repeated my earlier statement that my committed code should only be tested with non-float images.  It has a serious performance flaw for float images.  That should be easy to correct, but I expected you to have a strong opinion on where/how that should be corrected, and I thought resolving any issues with non-float images should happen first.  But if you were testing with a float image, that would explain part of why your results were so different from my results.
No, I tested with 8 bit jpg images.

For the synchronized scrolling the images need some overlap. If the overlap is too small it does not work so good, but for a normal overlap it works here.

johnfi...@gmail.com

unread,
Feb 22, 2022, 11:48:12 AM2/22/22
to hugin and other free panoramic software
On Tuesday, February 22, 2022 at 11:13:34 AM UTC-5 T. Modes wrote:

For the synchronized scrolling the images need some overlap. If the overlap is too small it does not work so good, but for a normal overlap it works here.

I tested with about 30% full image overlap on Windows and about 50% on Linux.  In both cases, I brought the same control point to the center of each subwindow, with significant room for either to be scrolled in any direction.  So there were no special boundary cases involved, and 100% overlap of the displayed portions of the images.

When I get back from vacation I'll look at that bit of code and that may help guide my further testing to understand why it hasn't worked for me.

Reply all
Reply to author
Forward
0 new messages