Improving GUI response with large dataset

Israel Brewster

May 25, 2021, 12:51:05 PM
to pyqtgraph
Note: this is not related to the recent thread with a similar subject

I am using PyQtGraph with PySide2 to plot some large datasets. This works fine with the smaller datasets, but when I start getting up into the hundreds of thousands of data points - the attached sample image contains 159,936 data points, for example - zooming and panning get VERY laggy. I’m guessing this is simply due to Qt trying to re-draw all 159,936 data points whenever something changes?

I tried using the clipToView() option, but that crashed and burned (and wouldn’t help much when viewing the entire dataset anyway):

Traceback (most recent call last):
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/PlotDataItem.py", line 885, in viewRangeChanged
    self.updateItems(styleUpdate=False)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/PlotDataItem.py", line 620, in updateItems
    self.scatter.setData(x=x, y=y, **scatterArgs)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/ScatterPlotItem.py", line 483, in setData
    self.addPoints(*args, **kargs)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/ScatterPlotItem.py", line 599, in addPoints
    self.setPointData(kargs['data'], dataSet=newData)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/ScatterPlotItem.py", line 753, in setPointData
    raise Exception("Length of meta data does not match number of points (%d != %d)" % (len(data), len(dataSet)))
Exception: Length of meta data does not match number of points (50 != 2)
Traceback (most recent call last):
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/PlotDataItem.py", line 885, in viewRangeChanged
    self.updateItems(styleUpdate=False)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/PlotDataItem.py", line 620, in updateItems
    self.scatter.setData(x=x, y=y, **scatterArgs)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/ScatterPlotItem.py", line 483, in setData
    self.addPoints(*args, **kargs)
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/ScatterPlotItem.py", line 589, in addPoints
    setMethod(kargs[k], update=False, dataSet=newData, mask=kargs.get('mask', None))
  File "/Users/israel/Development/tropomi_gui/lib/python3.8/site-packages/pyqtgraph/graphicsItems/ScatterPlotItem.py", line 668, in setBrush
    raise Exception("Number of brushes does not match number of points (%d != %d)" % (len(brushes), len(dataSet)))
Exception: Number of brushes does not match number of points (159936 != 36657)

So then I got the thought of simply “rendering” all the data points to a single QPixmap or the like, and simply displaying that at the appropriate location and size. And that’s where I got stuck: I can’t figure out a) how to render the data to an image object, or b) how to work out the appropriate location and size.

Is this a reasonable and valid approach? If so, can anyone give me some tips as to how I might go about it? Is there a better way? Thanks!

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320

Martin Chase

May 25, 2021, 10:24:48 PM
to pyqt...@googlegroups.com
Hey Israel,

A couple of thoughts:

Downsampling should be able to make things faster. Look at the "optimization" section in https://pyqtgraph.readthedocs.io/en/latest/graphicsItems/plotdataitem.html

QGraphicsItems can be cached, which might help with panning. Docs https://doc.qt.io/qt-5/qgraphicsitem.html#CacheMode-enum

Others may have more ideas? Let us know if any of that helps!

 - Martin

Israel Brewster

May 26, 2021, 2:04:57 PM
to pyqtgraph
Thanks for the suggestions!

On May 25, 2021, at 6:24 PM, Martin Chase <outofc...@gmail.com> wrote:

> Hey Israel,
>
> A couple of thoughts:
>
> Downsampling should be able to make things faster. Look at the "optimization" section in https://pyqtgraph.readthedocs.io/en/latest/graphicsItems/plotdataitem.html

I don’t think downsampling is appropriate for my dataset. Since the points are arranged spatially, any *simple* downsampling would result in holes in the data coverage. I could, of course, *resample* to a larger grid size, but that would either lose resolution when zoomed in - which we can’t afford, given that the features we are looking for are often only one or two grid cells in size - or require periodic re-resampling depending on the zoom, which doubtless would be more computationally expensive than simply re-drawing the dataset, not to mention complicated to implement.

> QGraphicsItems can be cached, which might help with panning. Docs https://doc.qt.io/qt-5/qgraphicsitem.html#CacheMode-enum

It seems like this should work, based on the description - in fact, it sounds like it should do essentially what I was suggesting, rendering once to an offscreen pixmap and then simply using that rendered version. Unfortunately, I spent some time playing around with the setCacheMode function, as well as the QPixmapCache.setCacheLimit function that the setCacheMode documentation mentions, with no noticeable effect. Which suggests to me that I might be doing something wrong.

That said, I also spent a bit more time pursuing my original thought of rendering the plot to a single pixmap, then just displaying that single image. Once I took a step back from PyQtGraph, implementing this solution using the base Qt classes/functions was surprisingly simple - simplified, no doubt, by the fact that I was already supplying the symbol brushes to the plot command as a list of QPainterPaths. So I was able to simply take the x and y coordinates, and use a QPainter to directly paint the QPainterPaths into a QPixmap, which could then be added to my plot as a single item. I *did* have to reduce the quality of the rendering a bit to fit things into memory - so if you zoom in far enough, the individual data cells have rough edges - but other than that, it seems to work fine - and performance goes from painfully sluggish on my old machine to silky smooth.

There may be other issues with this approach, but so far I haven’t run into any, at least with my specific application. More testing remains.
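
For anyone wanting to try the same thing, a minimal sketch of the render-once idea described above follows. It is not the actual code from this application, only an illustration under assumptions: each point has a QPainterPath symbol given in data units and centered on the origin, plus a matching brush; the names `raster_geometry`, `render_points`, `px_per_cell`, and `max_pixels` are hypothetical; and the plot uses pyqtgraph's default y-up view, so the image is painted without a vertical flip.

```python
import math


def raster_geometry(xs, ys, cell_w, cell_h, px_per_cell=8.0, max_pixels=4_000_000):
    """Work out where the image sits in data coordinates and its pixel size.

    Returns (x0, y0, width, height, px_w, px_h). The pixel size is shrunk
    uniformly when it would exceed max_pixels, trading edge quality for memory.
    """
    # Pad the bounding box by half a cell so edge symbols are not cut off.
    x0 = min(xs) - cell_w / 2.0
    y0 = min(ys) - cell_h / 2.0
    width = max(xs) - min(xs) + cell_w
    height = max(ys) - min(ys) + cell_h
    px_w = width / cell_w * px_per_cell
    px_h = height / cell_h * px_per_cell
    shrink = min(1.0, math.sqrt(max_pixels / (px_w * px_h)))
    return x0, y0, width, height, max(1, int(px_w * shrink)), max(1, int(px_h * shrink))


def render_points(xs, ys, paths, brushes, geom):
    """Paint one path per point into a single QImage (PySide2 assumed)."""
    # Imported here so raster_geometry can be used without a Qt install.
    from PySide2.QtCore import Qt
    from PySide2.QtGui import QImage, QPainter, QTransform

    x0, y0, width, height, px_w, px_h = geom
    sx, sy = px_w / width, px_h / height
    image = QImage(px_w, px_h, QImage.Format_ARGB32_Premultiplied)
    image.fill(Qt.transparent)
    painter = QPainter(image)
    painter.setPen(Qt.NoPen)
    # Map data coordinates to pixel coordinates: (x0, y0) lands on pixel (0, 0).
    painter.setTransform(QTransform(sx, 0, 0, sy, -x0 * sx, -y0 * sy))
    for x, y, path, brush in zip(xs, ys, paths, brushes):
        painter.setBrush(brush)
        painter.drawPath(path.translated(x, y))
    painter.end()
    return image
```

The resulting image could then be wrapped in a QGraphicsPixmapItem (via QPixmap.fromImage), given a transform that scales by width / px_w and height / px_h and translates to (x0, y0), and added to the plot with addItem. How well auto-ranging copes with a plain pixmap item is something to check in your own setup.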

Thanks again!

Ognyan Moore

May 26, 2021, 3:12:50 PM
to pyqt...@googlegroups.com
Hi Israel,

I think Martin was pointing out that pyqtgraph offers downsampling methods which adjust sampling based on the pixel size of your display.   The "peak" downsample method draws a vertical line between the lowest and highest value within the range of values that would normally fit onto one pixel.  The problem with this method is that it assumes all data to be uniformly spaced in the x-axis, which of course may not be the case.  The nice thing about this downsampling method is that it will re-sample for you as your zoom level changes.

Ogi
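
To make the "peak" idea above concrete, here is a small pure-Python sketch. It is not pyqtgraph's implementation; `peak_downsample` and its parameter `n` (the number of samples assumed to fall on one pixel) are hypothetical, and the sketch assumes the samples are ordered and evenly spaced along x, which is exactly the limitation mentioned above.

```python
def peak_downsample(xs, ys, n):
    """Collapse each run of n consecutive samples to its (min, max) pair.

    Both output points of a run share the x value of the run's first sample,
    so drawing them joins the lowest and highest value with a vertical line.
    """
    out_x, out_y = [], []
    for start in range(0, len(ys), n):
        chunk = ys[start:start + n]
        out_x += [xs[start], xs[start]]
        out_y += [min(chunk), max(chunk)]
    return out_x, out_y


# Six samples reduced to two vertical segments of two points each.
dx, dy = peak_downsample(list(range(6)), [3, 1, 2, 9, 7, 8], 3)
```

In a live plot, n would be recomputed from the visible range and the widget width whenever the zoom changes, which is the re-sampling behavior described above.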

Israel Brewster

May 26, 2021, 3:28:03 PM
to pyqtgraph
On May 26, 2021, at 11:12 AM, Ognyan Moore <ognyan...@gmail.com> wrote:

> Hi Israel,
>
> I think Martin was pointing out that pyqtgraph offers downsampling methods which adjust sampling based on the pixel size of your display. The "peak" downsample method draws a vertical line between the lowest and highest value within the range of values that would normally fit onto one pixel.

Sure, and for a basic x-y plot, like a time series, where you have a row of dots along the x-axis, this would work fine. For geospatial data like I have, this would be a total mess - there is no line to connect the dots. Rather, the dots fill the field, and if you remove any of them, then the field will not be filled at any zoom level. Trying to bridge the gap with a line wouldn’t work.

> The problem with this method is that it assumes all data to be uniformly spaced in the x-axis, which of course may not be the case.

Right, and for geospatial data this is not only not the case, but any given X will have hundreds of data points that cover that X value.

Point being that while I’m sure the downsampling methods are good and very useful in general, for my geospatial data they simply don’t work.
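
A toy illustration of the coverage argument above, using made-up data: a 4 x 4 grid with one point per cell, thinned by keeping every second point. The helper `covered_cells` is hypothetical and only counts which grid cells still contain a point.

```python
def covered_cells(points, cell):
    """Return the set of (column, row) grid cells that contain at least one point."""
    return {(int(x // cell), int(y // cell)) for x, y in points}


# One point at the center of every cell of a 4 x 4 grid.
grid = [(i + 0.5, j + 0.5) for j in range(4) for i in range(4)]

full = covered_cells(grid, 1.0)          # every cell is filled
thinned = covered_cells(grid[::2], 1.0)  # simple "keep every 2nd point" downsampling

holes = full - thinned                   # cells left empty after thinning
```

Half of the cells end up empty, and since each cell held exactly one point, no zoom level brings them back.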

Ognyan Moore

May 26, 2021, 3:37:58 PM
to pyqt...@googlegroups.com
Completely missed your original post with the geospatial data bit, my apologies!  Side note, if there is some standard downsampling method involving geospatial data that you think would be useful to have in the library, please let me know.

For a sanity check, I would suggest running examples/ScatterPlotSpeedTest.py and modifying the parameters there (it uses a parameter tree, so you can do this in the GUI) to best match your use case (number of points and so on). There is a zooming and panning mode you can adjust. Throwing 150k points at it definitely took a toll; it's going at < 1 fps.

We also recently filed an issue that would significantly speed up scatter plot code here: https://bugreports.qt.io/browse/PYSIDE-1572  If you think this issue would assist you, I would encourage you to chime in there encouraging the PySide developers to adopt this feature.



Israel Brewster

May 26, 2021, 5:24:03 PM
to pyqt...@googlegroups.com
On May 26, 2021, at 11:37 AM, Ognyan Moore <ognyan...@gmail.com> wrote:

> Completely missed your original post with the geospatial data bit, my apologies!

No worries!

> Side note, if there is some standard downsampling method involving geospatial data that you think would be useful to have in the library, please let me know.

I’m thinking that might be more specialized than would be appropriate. What I use when I need to “downsample” (or really just resample; whether it is a downsample or not depends on the parameters used) is the pyresample library, specifically in my case the bilinear interpolation for swath data functions (https://pyresample.readthedocs.io/en/latest/swath.html#pyresample-bilinear). The process generally takes several seconds for me, depending on the grid size I choose for the result, so I don’t know that it would integrate well into a “live” downsampling algorithm.


> For a sanity check, I would suggest running examples/ScatterPlotSpeedTest.py and modifying the parameters there (it uses a parameter tree, so you can do this in the GUI) to best match your use case (number of points and so on). There is a zooming and panning mode you can adjust. Throwing 150k points at it definitely took a toll; it's going at < 1 fps.

That sounds about right, maybe a bit slower on my machine (which is a 2013 model, so if it works well here, it should work well about anywhere). Functional, but a bit painful when trying to pan/zoom to a specific feature.

> We also recently filed an issue that would significantly speed up scatter plot code here: https://bugreports.qt.io/browse/PYSIDE-1572 If you think this issue would assist you, I would encourage you to chime in there encouraging the PySide developers to adopt this feature.

Done. Looks interesting, although I might think that using pandas data frames or xarray datasets (both of which use numpy ndarrays under the hood) might be a bit simpler than “simple” multi-dimensional numpy arrays, since the named columns in those libraries make it easy to separate out x, y, brushes, symbols, etc. Just my 2¢, though; not having looked at the implementation, maybe straight ndarrays would be the better option for all I know :-)

BTW, might I say that PyQtGraph is an excellent library? I love how it is built upon the Qt classes, just adding features - it makes it so easy for me to hack on it when needed, or bring in additional “base” Qt functionality whenever it doesn’t do exactly what I want it to, rather than being forced to try to figure out how to “force” PyQtGraph itself to bend to my will. Makes it easy to use the best of both worlds - PyQtGraph, and Qt itself. Thanks for all the hard work!

Ognyan Moore

May 26, 2021, 10:21:32 PM
to pyqt...@googlegroups.com
Hi Israel,

ImageItems have recently received significant performance boosts, line plots are in the process of getting a significant speed boost, and while scatter plots have also received a speed boost recently, we haven't figured out a good way to leverage vectorization or exploit contiguous memory blocks, so there is likely a lot more performance to be had out there. The PySide devs seem motivated to add more numpy-compatible methods, so I'm hopeful that there is a performance boost to be had.

Glad to hear the library has worked out for you.  If you would like to hop on our slack workspace, feel free to come join us.  

Sorry I don't have a good suggestion for improving your scatter plot performance. Oh, one tip involving scatter plots, which likely won't fix your situation, is to reuse QPen, QColor and QBrush objects as much as possible instead of creating new ones. During testing, we found that recreating them was a huge performance hit.

Ogi
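
One way to follow the reuse tip above is a small cache keyed by the style parameters, so each distinct pen or brush is built once and shared by all points that use it. This is only a sketch: `StyleCache` is a hypothetical helper, not part of pyqtgraph or Qt, and the factory is passed in so the class works with any constructor.

```python
class StyleCache:
    """Build each distinct style object once and hand back the same instance."""

    def __init__(self, factory):
        self._factory = factory  # callable that turns a key into a pen/brush
        self._items = {}

    def get(self, key):
        # Keys must be hashable, e.g. an (r, g, b) tuple.
        try:
            return self._items[key]
        except KeyError:
            item = self._items[key] = self._factory(key)
            return item


# Stand-in factory so the sketch runs without Qt; it records each construction.
built = []
cache = StyleCache(lambda key: built.append(key) or object())
first = cache.get((255, 0, 0))
second = cache.get((255, 0, 0))
```

With pyqtgraph installed, the factory could be something like `lambda rgb: pg.mkBrush(rgb)`, and the per-point brush list would then be built from `cache.get(...)` calls, so 150k points sharing a few dozen colors would create only a few dozen QBrush objects.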
