
Gnuplot is memory hungry


roucaries dot bastien

Jan 27, 2009, 7:34:30 AM
Hi,

From Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513114

It needs more than 1GB of memory to draw a 40MB file!

How can I debug this problem?

Regards

sfeam

Jan 27, 2009, 1:32:43 PM
roucaries dot bastien wrote:

I think you first need to debug your data-generation program.
The source file mathieustab4.c attached to that bug report
gives compilation warnings here (gcc 4.2.2), and generated 750+MB
of output before I killed it.

Anyhow, the amount of memory required is proportional to the number
of input data points. It has no relationship to the size of the
output file. A bitmap output file, for instance, will always
be roughly the same size no matter how simple or complex your data.

Sylvain Lévêque

Jul 11, 2009, 2:34:14 PM
Hello

>> From Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513114
>>
>> It needs more than 1GB of memory to draw a 40MB file!
>>
>> How can I debug this problem?
>
> I think you first need to debug your data-generation program.
> The source file mathieustab4.c attached to that bug report
> gives compilation warnings here (gcc 4.2.2), and generated 750+MB
> of output before I killed it.

Here's a scenario for testing. First, save anything you are working on!!!!

Plot a 1MB zeroed file; then, if you get a plot, increase the dd count
progressively until you reach the "out of memory for expanding curve
points" error. Your computer might be unresponsive for some time.

$ cd /tmp
$ dd if=/dev/zero of=testfile bs=1M count=1
$ echo "plot 'testfile' binary format='%int8' using 1" | gnuplot

It happens here on Cygwin running gnuplot 4.2 patchlevel 4, and the
situation is the same on Debian running gnuplot 4.2 patchlevel 5.

It happens with at least the x11 and the png output drivers.

Things get better if I use the "every" parameter.

$ echo "plot 'testfile' binary format='%int8' every 2 using 1" | gnuplot

but the gnuplot process eats over 250MB of RAM.

> Anyhow, the amount of memory required is proportional to the number
> of input data points.

The proportionality factor seems particularly high, in this case... Is
each sample stored as a float or a struct? In my case, I specify that it
should be treated as int8.

Any suggestion?

Thanks in advance
--
Sylvain

sfeam

Jul 11, 2009, 8:40:24 PM
Sylvain Lévêque wrote:

> Hello
>
>>> From Debian bug
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513114
>>>
>>> It needs more than 1GB of memory to draw a 40MB file!
>>>
>>> How can I debug this problem?
>>
>> I think you first need to debug your data-generation program.
>> The source file mathieustab4.c attached to that bug report
>> gives compilation warnings here (gcc 4.2.2), and generated 750+MB
>> of output before I killed it.
>
> Here's a scenario for testing. First, save anything you are working
> on!!!!
>
> Plot a 1MB zeroed file; then, if you get a plot, increase the dd count
> progressively until you reach the "out of memory for expanding curve
> points" error. Your computer might be unresponsive for some time.
>
> $ cd /tmp
> $ dd if=/dev/zero of=testfile bs=1M count=1
> $ echo "plot 'testfile' binary format='%int8' using 1" | gnuplot

OK.
I tried it here and got a roughly linear increase in memory usage:

file    VIRT     RES     SHR
1MB     222m     197m    12m
2MB     401m     376m    12m
4MB     761m     736m    12m
8MB     1278m    1.4G    2480
16MB    out of memory for expanding curve points

So it can store and process more than 8 million points.
Not too shabby!

Of course there's no way that it makes any sense to plot 8 million
points on a screen that has only about 1 million pixels.

It took 150 seconds to produce the plot using an 8MB input file
and the commands:
set datafile nofpe_trap
plot "testfile" binary format="%int8" using 1 with dots

This was on a 32-bit OS, by the way. I think there would
effectively be no such limit in a 64-bit environment.

> It happens with at least the x11 and the png output drivers.

Sure. It has nothing to do with the output driver.



> Things get better if I use the "every" parameter.
>
> $ echo "plot 'testfile' binary format='%int8' every 2 using 1" | gnuplot

Well yeah. Now you're throwing out every second data point
so of course it only takes half the storage space.


>> Anyhow, the amount of memory required is proportional to the number
>> of input data points.
>
> The proportionality factor seems particularly high, in this case...
> Is each sample stored as a float or a struct? In my case, I specify
> that it should be treated as int8.

In any sort of linear input mode (i.e. not array or image plots)
each point is stored in a structure:

typedef double coordval;
typedef struct coordinate {
    enum coord_type type;
    coordval x, y, z;
    coordval ylow, yhigh;   /* ignored in 3d */
    coordval xlow, xhigh;   /* also ignored in 3d */
} coordinate;

So the storage requirement per point is 7 * (sizeof double) + 1 * (sizeof int)
On ancient 16-bit architectures, coordval is configured by default
as float rather than double. You could do the same to gain a factor of 2,
but you'd lose precision.
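
(With 8-byte doubles and a 4-byte enum, that works out to 60 bytes per
point, typically padded to 64; 8 million points therefore need on the
order of 500MB for point storage alone, which accounts for much of the
memory usage measured above.)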

On the other hand, we were considering whether to add additional
fields to the structure for the next version of gnuplot.
If you can make a compelling case for reducing the memory requirements,
we might reconsider.

> Any suggestion?

Maybe. Please explain again what you are trying to accomplish.

If you are looking for a streaming mode (plot each point as it is
read then throw it away) you're out of luck; gnuplot doesn't have
such a mode. On the other hand, you could probably abuse multiplot
mode to read XX points + plot, read another XX points + plot,
read another XX points + plot, etc, etc, ad nauseam.
In multiplot mode the successive batches of plotted points would
appear superimposed on the same output screen.
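
For instance, here is an untested sketch of that multiplot idea, using
the 1MB testfile from earlier in the thread; the chunk size is just
illustrative, and the fixed xrange/yrange are there so the superimposed
plots line up:

#!/bin/sh
# Feed gnuplot one chunk of the int8 file at a time;
# multiplot overlays the successive plots on one screen.
(
  echo "set multiplot"
  echo "set xrange [0:1048576]; set yrange [-128:127]"
  i=0
  while [ $i -lt 4 ]; do
    s=$((i * 262144)); e=$(( (i + 1) * 262144 - 1 ))
    echo "plot 'testfile' binary format='%int8' every ::$s::$e using 1 with dots notitle"
    i=$((i + 1))
  done
  echo "unset multiplot"
) | gnuplot -persist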

Ethan

Sylvain Lévêque

Jul 12, 2009, 2:55:24 AM
Hello

> I tried it here and got a roughly linear increase in memory usage:
>
> file    VIRT     RES     SHR
> 1MB     222m     197m    12m
> 2MB     401m     376m    12m
> 4MB     761m     736m    12m
> 8MB     1278m    1.4G    2480
> 16MB    out of memory for expanding curve points
>
> So it can store and process more than 8 million points.
> Not too shabby!

I need to plot 64MB files.

> Of course there's no way that it makes any sense to plot 8 million
> points on a screen that has only about 1 million pixels.

I understand your argument, but I haven't found a way to do exactly
what I need.

> It took 150 seconds to produce the plot using an 8MB input file
> and the commands:
> set datafile nofpe_trap

Available as of 4.3, it seems. Thanks for the hint.

> plot "testfile" binary format="%int8" using 1 with dots
>
> This was on a 32-bit OS, by the way. I think there would
> effectively be no such limit in a 64-bit environment.

Ok.

>>> Anyhow, the amount of memory required is proportional to the number
>>> of input data points.
>> The proportionality factor seems particularly high, in this case...
>> Is each sample stored as a float or a struct? In my case, I specify
>> that it should be treated as int8.
>
> In any sort of linear input mode (i.e. not array or image plots)
> each point is stored in a structure:
>
> typedef double coordval;
> typedef struct coordinate {
>     enum coord_type type;
>     coordval x, y, z;
>     coordval ylow, yhigh;   /* ignored in 3d */
>     coordval xlow, xhigh;   /* also ignored in 3d */
> } coordinate;
>
> So the storage requirement per point is 7 * (sizeof double) + 1 * (sizeof int)

Well, that makes everything much clearer.

> On the other hand, we were considering whether to add additional
> fields to the structure for the next version of gnuplot.
> If you can make a compelling case for reducing the memory requirements,
> we might reconsider.
>
>> Any suggestion?
>
> Maybe. Please explain again what you are trying to accomplish.
>
> If you are looking for a streaming mode (plot each point as it is
> read then throw it away) you're out of luck; gnuplot doesn't have
> such a mode. On the other hand, you could probably abuse multiplot
> mode to read XX points + plot, read another XX points + plot,
> read another XX points + plot, etc, etc, ad nauseam.
> In multiplot mode the successive batches of plotted points would
> appear superimposed on the same output screen.

That was the kind of solution I was thinking of, plotting my large files
by 100000 samples at a time, then reassembling them.

My needs are quite simple: I work with oscilloscope curves that I would
like to batch process for inclusion in reports (take 100 curves, plot
them to get 100 PNGs of a defined size, for example). I don't want hand
manipulation, because I need to guarantee reproducibility, so
screenshots of our dedicated GUI curve viewer are not an option.

The format is a binary file of successive int8 samples, exactly the use
case I described in my previous post. I want to plot the implicit sample
number (its position in the file) on x and the sample value on y; no
advanced processing is expected.

Ideally, I would like vertical bars showing the dynamic range of the
samples represented in each column of pixels on the screen, refreshing
when zooming. I didn't find this min-max range capability in gnuplot, so
I get the same effect by plotting all the samples with lines, at the
expense of longer processing time (displaying thousands of samples
instead of just the line between min and max).

The precise min-max range is very important to me, so downsampling is
not an option. "every" makes things better, but I need an "every 10" for
the 64MB files to be processed, which is definitely not what I want.

Given the internal data representation in gnuplot, it seems I have to
use something else to avoid running into memory problems. Any suggestion?

Thanks
--
Sylvain

Hans-Bernhard Bröker

Jul 12, 2009, 4:41:52 PM
Sylvain Lévêque wrote:

> My needs are quite simple: I work with oscilloscope curves that I would
> like to batch process for inclusion in reports (take 100 curves, plot
> them to get 100 PNGs of a defined size, for example). I don't want hand
> manipulation, because I need to guarantee reproducibility, so
> screenshots of our dedicated GUI curve viewer are not an option.

As is so often the case, all this could have been solved a _lot_ faster
if those needs had been described right from the beginning. What you
need is to read all of "help every" and apply it to your task.

> The precise min-max range is very important to me, so downsampling is
> not an option.

Not quite correct. Downsampling would absolutely be an option --- you
would just have to do it yourself, to fit your exact needs.

Please try to remember that gnuplot is not meant to be a full-service
data processing environment. It only plays one on TV.

Sylvain Lévêque

Jul 12, 2009, 7:19:45 PM
>> My needs are quite simple: I work with oscilloscope curves that I
>> would like to batch process for inclusion in reports (take 100 curves,
>> plot them to get 100 PNGs of a defined size, for example). I don't
>> want hand manipulation, because I need to guarantee reproducibility,
>> so screenshots of our dedicated GUI curve viewer are not an option.
>
> As is so often the case, all this could have been solved a _lot_ faster
> if those needs had been described right from the beginning. What you
> need is to read all of "help every" and apply it to your task.

Well, I read "help every" again, and I don't think it will do what I
want, because there is no reason why the min and max would belong to the
samples that would be kept. When I try it, it is obvious the images are
not the same.

I played with "every" this morning, splitting my large file into
several chunks that I process separately, iterating over them. It works,
but it really takes a long time. Oddly enough, if I halve my chunks,
doubling the number of iterations, the processing time almost doubles.
It behaves as if the whole file were read on each iteration, not just
the chunk selected by the "every ::start_pos::end_pos" that I specified.

My initial point was to give an easy-to-reproduce use case, simpler than
the one in the initial post, so that whenever someone else runs into it,
they will find some explanation.

Grace seems to handle large files, but it is not as well documented as
gnuplot and requires converting to ASCII first, so I am not sure I want
to go down that path.

>> The precise min-max range is very important to me, so downsampling is
>> not an option.
>
> Not quite correct. Downsampling would absolutely be an option --- you
> would just have to do it yourself, to fit your exact needs.

I guess I'll have to keep the min and max for fixed sample intervals,
recomputing everything whenever I decide to change the output size,
since the interval to consider will have changed. It is just a little
annoying that there is no easy/dumb way to do this. Matlab generates
these images without complaint.
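
A minimal sketch of that preprocessing step, assuming raw int8 samples
on stdin (the program and file names and the BLOCK size are made up, and
this is untested):

/* minmax.c -- reduce raw int8 samples to one "index min max" line per
 * BLOCK samples, so gnuplot only has to store the envelope:
 *
 *   ./minmax < capture.bin > envelope.dat
 *   gnuplot> plot 'envelope.dat' using 1:2 with lines, '' using 1:3 with lines
 */
#include <stdio.h>
#include <limits.h>

#define BLOCK 4096   /* samples per output row; tune to the plot width */

int main(void)
{
    int c, n = 0, min = INT_MAX, max = INT_MIN;
    long block = 0;

    while ((c = getchar()) != EOF) {
        int s = (signed char)c;           /* reinterpret the byte as int8 */
        if (s < min) min = s;
        if (s > max) max = s;
        if (++n == BLOCK) {
            printf("%ld %d %d\n", block * (long)BLOCK, min, max);
            min = INT_MAX; max = INT_MIN; n = 0; block++;
        }
    }
    if (n > 0)                            /* flush a final partial block */
        printf("%ld %d %d\n", block * (long)BLOCK, min, max);
    return 0;
}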

Thanks for your help.
--
Sylvain

Hans-Bernhard Bröker

Jul 13, 2009, 5:18:44 PM
Sylvain Lévêque wrote:

> It behaves as if the whole file were read on each iteration, not just
> the chunk selected by the "every ::start_pos::end_pos" that I
> specified.

Not quite, but close enough. The code does read sequentially from
beginning-of-file to <end_pos>. That's how it finds <start_pos>.
Remember that binary files are not the native data format of gnuplot.
It was all developed for and with ASCII data files, for which there is
no other option but to read it all to find a particular position. Last
I looked, we were still doing the same thing for binary data.

The net effect is the same, of course. You're effectively reading
roughly N^2/(2*n) bytes (file size N, chunk size n), so yes, a factor of
2 decrease in n will increase processing time by that same factor. If
it were reading the entire file every time, it'd be twice that: N^2/n
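
(To put numbers on that: reading a 64MB file in 100000-sample chunks
means N^2/(2*n) = (64*2^20)^2 / 200000, roughly 2*10^10 bytes, i.e. some
20GB of reading in total, which is consistent with the long processing
times reported above.)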
