
Plotting EXTREMELY large data set


Patrick Hines

Sep 24, 2003, 6:52:06 PM
I need our website to generate some pretty plots of some data. The rub is
that the datasets will be on the order of 30 GB of binary data. Anyone have
experience with GNUPlot and a data set this size?


Hans-Bernhard Broeker

Sep 25, 2003, 5:21:56 AM
Patrick Hines <phi...@earthlink.net> wrote:

There is absolutely *no* sensible way you can display 30 GB of data in
a single plot. Not with any plotting program on the planet. The plot
would be completely unreadable. Note that a 30 GB dataset would have
at least 1000 times as many data points as there are pixels in any
diagram you could ever post as a webpage.

So you'll have to reduce that size, by several orders of magnitude,
and gnuplot is definitely not the tool to run that selection for you
--- it's prepared for large-ish datasets, but not for ones *that*
large. You'll have to come up with some idea what your plot is
supposed to tell the viewers, and how to extract that message from the
input data.
--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
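The kind of size reduction described above can be sketched in a few lines of Python. This is a minimal sketch only, assuming the 30 GB file is a flat array of raw float32 samples (an assumption; the actual format isn't stated in the thread), streamed in chunks so it never has to fit in memory, keeping every nth point:

```python
import numpy as np

def decimate(path, stride=10_000, chunk=1_000_000):
    """Stream a raw float32 file and yield every `stride`-th sample.

    The file is read `chunk` samples at a time, so memory use stays
    bounded no matter how large the file is.  The stride ratio here
    is hypothetical; pick it from (points in file) / (pixels in plot).
    """
    offset = 0  # global index of the first sample in the current block
    with open(path, "rb") as f:
        while True:
            block = np.fromfile(f, dtype=np.float32, count=chunk)
            if block.size == 0:
                break
            # first local index that lands on the global stride grid
            first = (-offset) % stride
            yield block[first::stride]
            offset += block.size
```

Plain striding like this is the simplest possible reduction; as the next reply in the thread points out, it can miss short spikes, which calls for a smarter per-bucket reduction.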

MidniteArrow

Sep 26, 2003, 1:09:12 AM
Well, I completely disagree with that statement. However, at least you
agree with everyone else but me! While it is true that a lot of these data
points are redundant because of display technology limitations, they are not
when you consider the fact that you don't know beforehand what the data
will look like. We can't just plot every nth data point, because there
could be data of interest that exists only outside of the plot frequency
(such as a spike that lasts only a millisecond or two, which we have seen in
the data). We could process the data and show only what is not redundant,
but the customer does not want that. They don't want our software to
"automate engineering decisions". They want a single plot with ALL the
data.

That 30 GB number is a little misleading, but just a little. It
represents the entire data set, which will get turned into a bunch of
plots, not just one. For the last run, it was about 80 plots. But the
customer may want to plot 2, 3, 5, or 10 different series on the same plot, so I'd
rather get a clear answer on whether GNUPlot can support this. Matlab can, but
it processes at about real time, which we'd like to improve upon. Another
perk of plotting all the data for a series at once (letting the display
device filter redundant points instead of our software) is that replots,
i.e. changing the display ranges, would be simple to implement (if they happen in
a timely manner with this much data).

"Hans-Bernhard Broeker" <bro...@physik.rwth-aachen.de> wrote in message
news:bkuc3k$jp2$1...@nets3.rz.RWTH-Aachen.DE...
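The millisecond-spike concern doesn't actually force a choice between every-nth sampling and "automating engineering decisions": per-bucket min/max decimation keeps every extreme, however short, while cutting the point count to roughly twice the plot width in pixels. A minimal sketch, assuming the samples for one plot fit in memory (the function name and bucket count are hypothetical):

```python
import numpy as np

def minmax_decimate(samples, n_buckets):
    """Reduce `samples` to two values per bucket (its min and its max),
    so a spike survives reduction no matter how short it is.

    Returns an interleaved array [min0, max0, min1, max1, ...] of
    length 2 * n_buckets.  Any trailing samples that don't fill a
    complete bucket are dropped in this sketch.
    """
    samples = np.asarray(samples, dtype=np.float64)
    per_bucket = len(samples) // n_buckets
    usable = per_bucket * n_buckets
    buckets = samples[:usable].reshape(n_buckets, per_bucket)
    out = np.empty(2 * n_buckets)
    out[0::2] = buckets.min(axis=1)
    out[1::2] = buckets.max(axis=1)
    return out
```

Because each bucket contributes both its minimum and its maximum, even a one-sample spike is guaranteed to appear in the reduced plot, which is exactly the property plain nth-point sampling lacks.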

Hans-Bernhard Broeker

Sep 26, 2003, 5:22:30 AM
MidniteArrow <pri...@knology.net> wrote:

> While It is true that a lot of these data points are redundant
> because of display technology limitations,

This goes deeper than display technology. Human vision has a finite
resolution, too.

> process the data and only show what is not redundant, but the
> customer does not want that.

Then, with all due respect, your customer doesn't know what they're
talking about. You *will* always have some kind of preprocessing and
effective removal of redundant data, simply because there's no
30-Gigapixel display technology on the market, and even if there was,
it'd be useless, because humans don't have 30-Gigapixel eyes.

The only choice you get to make is *when* and *how* this reduction
happens.

> rather get a clear answer on if GNUPlot can support this.

Well, you can try. But there's more bad news waiting for you: gnuplot
doesn't handle binary datafiles for 2D plots, yet. And it will try to
*store* all the data it reads from the file, for at least a short
while. Which means the odds are good that it'll just run out of memory, if
you don't cull redundant data before passing it to gnuplot.

Putting this all together: you could do it, but you would have to do
at least some kind of pre-filtering before you pass the data to
gnuplot.
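As for the "when and how" of handing pre-filtered data to gnuplot, one option is inline data via gnuplot's special `'-'` filename, so no intermediate data file is needed. A sketch that builds such a script as a string, assuming a `gnuplot` executable on PATH and data that has already been reduced (the function name is hypothetical):

```python
def gnuplot_inline(ys, png_path, width=1024, height=768):
    """Build a gnuplot script with the (already reduced) samples
    inlined after `plot '-'`, terminated by the 'e' sentinel line.

    Feed the returned string to gnuplot's stdin, e.g.:
        subprocess.run(["gnuplot"], input=script, text=True)
    """
    lines = [
        f"set terminal png size {width},{height}",
        f"set output '{png_path}'",
        "plot '-' using 1:2 with lines notitle",
    ]
    # two-column inline data: sample index, value
    lines += [f"{i} {y}" for i, y in enumerate(ys)]
    lines.append("e")  # ends the inline data block
    return "\n".join(lines) + "\n"
```

Keeping the script generation pure like this also makes the reduction/plotting boundary explicit: everything memory-hungry happens before the string is built.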

michaels...@gmail.com

Feb 22, 2018, 5:21:23 PM
Disappointing to see people attack the OP. The point of a plot is to visualize data, and the guy has 30 GB. Maybe he wants candlesticks; maybe he wants a scatter plot with translucent dots. It's not inconceivable for a plotting program to handle consolidation.

Lame comment about the eye resolution... although the high-resolution area of the eye is small, the eye moves. Like a scanner.

Hans-Bernhard Bröker

Feb 22, 2018, 7:59:31 PM
Am 22.02.2018 um 23:21 schrieb michaels...@gmail.com:
> Disappointing to see people attack the OP. The point of a plot is to visualize data, and the guy has 30 GB. Maybe he wants candlesticks; maybe he wants a scatter plot with translucent dots. It's not inconceivable for a plotting program to handle consolidation.
>
> Lame comment about the eye resolution... although the high-resolution area of the eye is small, the eye moves. Like a scanner.

Congratulations, you just failed miserably ... at trolling.

Gavin Buxton

Feb 23, 2018, 8:55:54 AM
I think the point people are trying to make is that the data might be best represented statistically. For example, all those data points are too much for the computer to handle, but if you went through and plotted the numbers as a histogram, a much smaller data set would represent the larger one. Having no idea what the data set is, I can't speculate on what this reduced form might look like, but suffice it to say, most post-processing would be done outside of plotting (with gnuplot or any other plotting tool).
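The histogram suggestion combines naturally with streaming: the full 30 GB never has to be in memory at once, because only the fixed-size bin counts accumulate. A minimal sketch, assuming the data arrives as an iterable of numeric chunks (names hypothetical):

```python
import numpy as np

def streaming_histogram(chunks, bins):
    """Accumulate one histogram over an iterable of sample chunks.

    `bins` is a fixed array of bin edges; memory use is proportional
    to the number of bins, not the size of the data set.
    """
    counts = np.zeros(len(bins) - 1, dtype=np.int64)
    for chunk in chunks:
        c, _ = np.histogram(chunk, bins=bins)
        counts += c
    return counts
```

The resulting `counts` array is tiny regardless of input size, so it can be handed to gnuplot (or anything else) without any of the memory concerns raised earlier in the thread.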

Ethan A Merritt

Feb 23, 2018, 2:24:44 PM
Note that the original query was posted in 2003.

Possible trolling aside, in the 15 years since then gnuplot has added LFS
(Large File Support) and other optimizations for large data sets.

If you want to plot from a 30GB file, go right ahead.

Ethan

Gavin Buxton

Feb 24, 2018, 2:00:42 PM
Ha ha, didn't see that!