The boxxyerrorbars style and variable color

Péter Juhász

Jun 24, 2010, 10:50:10 AM

Dear developers,

The 'boxxyerrorbars' style provides a way to plot data with arbitrary
rectangles. I wanted to use this style with variable color - the
manual mentions that this is possible, but in practice it wasn't. The
reason for this is that the implementation of variable color with this
style is not complete.
This can be fixed relatively simply; I can upload a patch that
contains this fix and this fix only to Sourceforge. It shouldn't be
too messy.

However, while we are at it, I cannot help but notice that many other
styles are also currently exempt from the variable color mechanism.
Notably, every style that uses error bars.

A case could be made that, at least from a user's standpoint, it would
be more consistent and more useful if most, if not all, available
styles supported this capability.

On the other hand, from a programmer's standpoint, this would
contribute to the further spaghettification of the code, at least if
we continue with the present tactic of stuffing the variable color
value into the yhigh attribute of the coordinates' structure (or into
the width one, with some styles). Alternatively, we could introduce an
additional color attribute to that structure, thus allowing for a
clean and universal solution, at the expense of more memory
consumption. As far as I can gather from the comments in the code,
this - the memory consumption - was the main reason why the business
with the yhigh-as-color was started in the first place.

So my question:
Should I
- upload a self-contained patch that adds variable color to the
boxxyerrorbars style only,
- write a more complete patch that adds this capability to all
errorbar styles where possible,
- strive for a rewrite that introduces a new color field and handles
all variable color stuff with that,
- do nothing
?

Péter Juhász

sfeam

Jun 24, 2010, 12:07:53 PM

Péter Juhász wrote:

> Dear developers,
>
> The 'boxxyerrorbars' style provides a way to plot data with arbitrary
> rectangles. I wanted to use this style with variable color - the
> manual mentions that this is possible, but in practice it wasn't. The
> reason for this is that the implementation of variable color with this
> style is not complete.
> This can be fixed relatively simply; I can upload a patch that
> contains this fix and this fix only to Sourceforge. It shouldn't be
> too messy.

OK

> However, while we are at it, I cannot help but notice that many other
> styles are also currently exempt from the variable color mechanism.

[snip]

> from a programmer's standpoint, this would
> contribute to the further spaghettification of the code, at least if
> we continue with the present tactic of stuffing the variable color
> value into the yhigh attribute of the coordinates' structure (or into
> the width one, with some styles).

Yes, it is ugly. Some of the spaghetti code could be removed by
consolidating all plot styles into the more general mechanism used for
VECTOR, CIRCLES, and BOXES (plot2d.c lines 525-539).

> Alternatively, we could introduce an
> additional color attribute to that structure, thus allowing for a
> clean and universal solution, at the expense of more memory
> consumption. As far as I can gather from the comments in the code,
> this - the memory consumption - was the main reason why the business
> with the yhigh-as-color was started in the first place.

Correct. I have been carrying along a patch that does this, but
last time the topic came up there was resistance to increasing the
memory footprint for large datasets. It was at that point that I
started looking into introducing a separate plotting mode that would
"stream" data from an input file directly into plot commands rather
than storing it internally and plotting it in a separate path.
I figured that if a separate mode were available for very large data
files then the issue of per-data-point memory consumption for normal
data files would be less important. But I haven't gotten very far
with implementing such an alternate plotting mode.

> So my question:
> Should I
> - upload a self-contained patch that adds variable color to the
> boxxyerrorbars style only

> - write a more complete patch that adds this capability to all
> errorbar styles where possible,

I think that the 6-parameter styles can all be handled by adding them
to the list of plot styles in the switch statement at plot2d.c line 531,
and then carrying through variable_color_value appropriately at line 896.

> - strive for a rewrite that introduces a new color field and handles
> all variable color stuff with that,

I already have a patch that introduces a new color field, although
it probably needs to be updated. What we need is a stronger consensus
that the increased memory use is acceptable. Adding a separate color
field increases the size of (struct coordinate) by 12%.
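
To make the 12% figure concrete, here is a rough sketch. The layout is an assumption (a type tag plus seven doubles, matching the "seven doubles per point" mentioned later in this thread), not a copy of gnuplot's actual declaration, and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for gnuplot's struct coordinate: a type tag
 * plus seven doubles. The real declaration may differ. */
struct coord_base {
    int type;                 /* stand-in for the point-type enum */
    double x, y, z;
    double ylow, yhigh;
    double xlow, xhigh;
};

/* Same layout with one extra field reserved for variable color. */
struct coord_with_color {
    int type;
    double x, y, z;
    double ylow, yhigh;
    double xlow, xhigh;
    double color;             /* assumed new field */
};

/* Relative growth of the per-point footprint, in percent. */
double color_field_overhead_percent(void)
{
    size_t base = sizeof(struct coord_base);
    size_t extended = sizeof(struct coord_with_color);
    return 100.0 * (double)(extended - base) / (double)base;
}
```

On a typical 64-bit platform the two structs come out at 64 and 72 bytes, so the function returns 12.5, in line with the figure quoted above. Other ABIs may pad differently.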

> - do nothing

It wouldn't hurt to clean up the spaghetti code, if only to prepare
the way for a more complete extension later.

Ethan

Péter Juhász

Jun 24, 2010, 2:32:41 PM

On Jun 24, 6:07 pm, sfeam <sf...@users.sourceforge.net> wrote:
> Péter Juhász wrote:
> > Dear developers,
>
> > The 'boxxyerrorbars' style provides a way to plot data with arbitrary
> > rectangles. I wanted to use this style with variable color - the
> > manual mentions that this is possible, but in practice it wasn't. The
> > reason for this is that the implementation of variable color with this
> > style is not complete.
> > This can be fixed relatively simply; I can upload a patch that
> > contains this fix and this fix only to Sourceforge. It shouldn't be
> > too messy.
>
> OK
>
>

done.

> Yes, it is ugly. Some of the spaghetti code could be removed by
> consolidating all plot styles into the more general mechanism used for
> VECTOR, CIRCLES, and BOXES (plot2d.c lines 525-539).
>

I was wondering why it wasn't done like that in the first place.
Would "consolidating" include styles such as LINES, which write the
third column explicitly into the yhigh field?

>
> > So my question:
> > Should I
> > - upload a self-contained patch that adds variable color to the
> > boxxyerrorbars style only
> > - write a more complete patch that adds this capability to all
> > errorbar styles where possible,
>
> I think that the 6-parameter styles can all be handled by adding them
> to the list of plot styles in the switch statement at plot2d.c line 531,
> and then carrying through variable_color_value appropriately at line 896.

That's one part of the equation, but then there is the problem that
the yhigh field that we currently hijack for color information does
have a legitimate function in certain styles (YERRORBARS et al.), and
the other possible unused field, width in store2d_point(), is
sometimes used as well (for example, it being negative triggers
automatic box width calculation for styles that have boxes).
So we have to use whichever field is not taken, which is inconsistent
(and possibly interferes with autoscaling), then have some ugly
special cases in the plot_boxes(), plot_bars() etc. functions
themselves.

> > - strive for a rewrite that introduces a new color field and handles
> > all variable color stuff with that,
>
> I already have a patch that introduces a new color field, although
> it probably needs to be updated. What we need is a stronger consensus
> that the increased memory use is acceptable. Adding a separate color
> field increases the size of (struct coordinate) by 12%.
>

Perhaps we could run a series of tests comparing the memory
consumption of the two versions (the present way vs. the additional
color field) in some realistic scenarios with large files.

> Ethan

sfeam

Jun 24, 2010, 4:26:11 PM

Péter Juhász wrote:

>> I already have a patch that introduces a new color field, although
>> it probably needs to be updated. What we need is a stronger consensus
>> that the increased memory use is acceptable. Adding a separate color
>> field increases the size of (struct coordinate) by 12%.
>>
>
> Perhaps we could run a series of tests comparing the memory
> consumption of the two versions (the present way vs. the additional
> color field) in some realistic scenarios with large files.

An alternative approach occurred to me over lunch, while pondering the
improbability of Japan scoring twice off free kicks against Denmark.

Rather than stuffing variable color information into a new field in the
plot->points[] array, we can create a dynamically allocated parallel
array for it, plot->colors[]. That will both clean up the code and
avoid any memory penalty for plots that do not use variable color.
The only down side I can see is that it affects more places in the code.
But given that it should lead to overall cleaner code, it's probably
worth the price.
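
The parallel-array idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not gnuplot code: `struct plot`, `plot_alloc()`, `plot_color()` and `plot_free()` are hypothetical names, and the point record is reduced to two fields for brevity.

```c
#include <stdlib.h>

/* Hypothetical, simplified records; not gnuplot's actual structures. */
struct point2d { double x, y; };

struct plot {
    struct point2d *points;   /* always allocated */
    double *colors;           /* NULL unless variable color is in use */
    size_t count;
};

/* Allocate storage for n points (n must be nonzero); the color array
 * is allocated only on request.
 * Returns 0 on success, -1 on allocation failure. */
int plot_alloc(struct plot *p, size_t n, int use_variable_color)
{
    p->points = malloc(n * sizeof *p->points);
    p->colors = use_variable_color ? malloc(n * sizeof *p->colors) : NULL;
    p->count = n;
    if (!p->points || (use_variable_color && !p->colors)) {
        free(p->points);
        free(p->colors);
        p->points = NULL;
        p->colors = NULL;
        p->count = 0;
        return -1;
    }
    return 0;
}

/* Color for point i, or a fixed fallback when no per-point colors exist. */
double plot_color(const struct plot *p, size_t i, double fallback)
{
    return p->colors ? p->colors[i] : fallback;
}

/* Release both arrays; safe to call when colors was never allocated. */
void plot_free(struct plot *p)
{
    free(p->points);
    free(p->colors);
    p->points = NULL;
    p->colors = NULL;
    p->count = 0;
}
```

A plot without variable color pays only for one extra pointer per plot, not one extra double per point, and drawing code can ask `plot_color()` instead of checking which coordinate field happens to hold the color for a given style.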

Ethan

Péter Juhász

Jun 25, 2010, 4:08:27 AM

On Jun 24, 10:26 pm, sfeam <sf...@users.sourceforge.net> wrote:

> An alternative approach occurred to me over lunch, while pondering the
> improbability of Japan scoring twice off free kicks against Denmark.
>
> Rather than stuffing variable color information into a new field in the
> plot->points[] array, we can create a dynamically allocated parallel
> array for it, plot->colors[]. That will both clean up the code and
> avoid any memory penalty for plots that do not use variable color.
> The only down side I can see is that it affects more places in the code.
> But given that it should lead to overall cleaner code, it's probably
> worth the price.
>
> Ethan

I was going to propose something similar.
In fact, similar reasoning could be applied to other fields of the
coordinate structure.
Some styles don't need an xlow/xhigh, or ylow/yhigh field, and the
POINTSTYLE, LINES styles, which are the most commonly used styles of
gnuplot, need neither of those.

So instead of defining a coordinates structure with a pre-set number
of fields and having a plot->points[] array of this type, why not have
a separate, dynamically allocated array for every field needed?
This way we would only have to allocate arrays for the coordinates that
are needed for the particular style and use case. In the case of
the aforementioned POINTS, LINES, etc. styles, this would mean a
considerable decrease in memory usage (two doubles per point instead of
the current seven).
The argument can be extended to the other direction, too: if a
particular style demands extra fields, we can just allocate extra
arrays for it. This would make saner code in the case of styles like
3D vectors etc.
The variable color mechanism, as well as the variable pointsize
mechanism, could be naturally assimilated into this scheme. (Note that
the latter is currently supported only for the POINTS and LINESPOINTS
styles, but not for the errorbar styles that have a point component.)
Perhaps other extensions could be supported this way, e.g. variable
arrow sizes for vectors etc.
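
As a rough sketch of the per-field arrays proposed above: each style requests only the fields it needs, and everything else stays unallocated. The field list, `struct columns` and the function names are assumptions made for illustration; they do not exist in gnuplot.

```c
#include <stdlib.h>

/* Hypothetical field identifiers; a style requests only those it needs. */
enum field { F_X, F_Y, F_YLOW, F_YHIGH, F_XLOW, F_XHIGH, F_COLOR, F_COUNT };

struct columns {
    double *col[F_COUNT];     /* col[f] is NULL when field f is not stored */
    size_t npoints;
};

/* Release every allocated column. */
void columns_free(struct columns *c)
{
    int f;
    for (f = 0; f < F_COUNT; f++) {
        free(c->col[f]);
        c->col[f] = NULL;
    }
    c->npoints = 0;
}

/* Allocate one array per requested field (npoints must be nonzero).
 * Bit f of 'mask' is set for each field the style needs.
 * Returns 0 on success, -1 on allocation failure. */
int columns_alloc(struct columns *c, size_t npoints, unsigned mask)
{
    int f;
    for (f = 0; f < F_COUNT; f++)
        c->col[f] = NULL;
    c->npoints = npoints;
    for (f = 0; f < F_COUNT; f++) {
        if (!(mask & (1u << f)))
            continue;
        c->col[f] = malloc(npoints * sizeof(double));
        if (!c->col[f]) {
            columns_free(c);
            return -1;
        }
    }
    return 0;
}

/* Bytes of coordinate storage actually used per data point. */
size_t columns_bytes_per_point(const struct columns *c)
{
    size_t bytes = 0;
    int f;
    for (f = 0; f < F_COUNT; f++)
        if (c->col[f])
            bytes += sizeof(double);
    return bytes;
}
```

A plain points or lines plot would pass `(1u << F_X) | (1u << F_Y)` and store two doubles per point; a style with variable color would add `1u << F_COLOR`. The cost is one malloc() per requested field.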

This is a major rewrite, however. Luckily, a mechanical search and
replace could deal with most of the places where the point coordinates
are accessed; only the places where they are allocated and filled with
data need particular attention.

Péter Juhász

Hans-Bernhard Bröker

Jun 25, 2010, 5:18:57 AM

On 25.06.2010 10:08, Péter Juhász wrote:
> On Jun 24, 10:26 pm, sfeam <sf...@users.sourceforge.net> wrote:

>> Rather than stuffing variable color information into a new field in the
>> plot->points[] array, we can create a dynamically allocated parallel
>> array for it, plot->colors[]. That will both clean up the code and

> So instead of defining a coordinates structure with a pre-set number
> of fields and having a plot->points[] array of this type, why not have
> a separate, dynamically allocated array for every field needed?

Ah, ye olde array-of-structs vs. struct-of-arrays dispute. The downside
of the struct of arrays is that you need several malloc()s instead of
just one, thus moving to more memory fragmentation.

To really solve this would require a complete re-think. Get rid of
structs altogether, and use a dynamic 2D array instead. I.e. allocate
<# columns> * <# data points> * sizeof(double), dump all the data in
there, and use parametrized macros to extract individual coordinates of
a given datapoint. It may even help to "tile" this, i.e. go from one
huge malloc() to a set of fixed-size ones.
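
A minimal sketch of the single-block layout described above, assuming row-major storage (all columns of one data point are adjacent). `struct datablock`, the `DATUM` macro and the function names are hypothetical; overflow checking of the size computation and the tiling variant are left out.

```c
#include <stdlib.h>

/* Hypothetical flat store: ncols doubles per data point, row-major. */
struct datablock {
    double *data;
    size_t ncols;
    size_t npoints;
};

/* Parametrized accessor: column 'col' of data point 'i'.
 * Expands to an lvalue, so it can be used for reads and writes. */
#define DATUM(blk, i, col) ((blk)->data[(i) * (blk)->ncols + (col)])

/* One allocation of <# columns> * <# data points> * sizeof(double).
 * Both counts must be nonzero.
 * Returns 0 on success, -1 on allocation failure. */
int datablock_alloc(struct datablock *b, size_t ncols, size_t npoints)
{
    b->data = malloc(ncols * npoints * sizeof(double));
    if (!b->data) {
        b->ncols = 0;
        b->npoints = 0;
        return -1;
    }
    b->ncols = ncols;
    b->npoints = npoints;
    return 0;
}

/* Release the block and reset the bookkeeping fields. */
void datablock_free(struct datablock *b)
{
    free(b->data);
    b->data = NULL;
    b->ncols = 0;
    b->npoints = 0;
}
```

With this layout a style that needs x, y and a color would allocate three columns and read the color of point i as `DATUM(&blk, i, 2)`; which column index means what is a per-style convention that the caller has to define.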

The memory layout of 'vtk', the Visualization ToolKit, might be worth
studying.

But let's take this to the developers' mailing list --- it really
doesn't concern the users.