How to determine a sensible width for bars in a plotter.BarChart

Dan Kortschak

Jul 15, 2013, 9:17:04 PM
to plotinum...@googlegroups.com
I have a set of mechanically determined data sets which have two series
(of the same length) but whose series lengths vary depending on the
experiments.

Using the plotter.BarChart type (at this stage there is no statistical
analysis so a box plot is not appropriate) I can get a good comparison
of the two series. The issue I'm facing though is how to determine a
good value of width for creating the chart. I should be able to
determine this from the number of points in the series and the concrete
length of the x-axis, but I don't think I can do that since
p.Transforms() takes a DrawArea, which I don't have (I'm using p.Save).

Any suggestions? Just do the Save actions myself?

Dan

Ethan Burns

Jul 16, 2013, 9:57:35 AM
to plotinum...@googlegroups.com
Hi Dan,

Unfortunately, I don't think there is a great way to achieve what you want, because the bar chart (and box plot) widths are in absolute units, not units relative to the final plot.Save width. Perhaps, this was an oversight.  Maybe a future version (gonum?) could change these width parameters to be a percentage of the total final data area width.  Ideally, we would be able to support both without the API being terrible, but that may take some thought.  Anyway, it should be easy to copy/hack the bar chart plotter to try using relative widths.  If it works much better than the absolute widths then please open an issue with label gonum and attach the code.  I'd really appreciate it.

Otherwise, I recommend that you compute the widths based on the width passed to plot.Save minus the width of the y axis and its label (which you have to figure out by playing with it a bit).  So, if you are making a 3cm plot, let's pretend there's 1cm worth of y axis (just to make the calculation easy); then there is 2cm worth of data draw width.  Divide that evenly among your bars, allowing for some spacing between them.  It's probably still going to be finicky, because the left-most and right-most bars are half in and half out of the draw area (if that makes any sense), since they are treated as glyphs.
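
As a rough sketch of that arithmetic: the function name barWidth, the gap fraction, and the 1cm axis allowance below are assumptions made for illustration and are not part of Plotinum.

```go
package main

import (
	"errors"
	"fmt"
)

// barWidth returns a width for each of n bars, in the same absolute unit
// as totalWidth. axisWidth is the guessed space taken by the y axis and
// its label; gapFrac is the fraction of each bar's slot left empty.
// All names here are hypothetical.
func barWidth(totalWidth, axisWidth float64, n int, gapFrac float64) (float64, error) {
	if n <= 0 {
		return 0, errors.New("need at least one bar")
	}
	if gapFrac < 0 || gapFrac >= 1 {
		return 0, errors.New("gapFrac must be in [0, 1)")
	}
	dataWidth := totalWidth - axisWidth
	if dataWidth <= 0 {
		return 0, errors.New("no room left for the data area")
	}
	// Give every bar an equal slot, then shrink it to leave a gap.
	slot := dataWidth / float64(n)
	return slot * (1 - gapFrac), nil
}

func main() {
	// A 3cm plot with a pretend 1cm y axis, 10 bars, 20% spacing.
	w, err := barWidth(3, 1, 10, 0.2)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("bar width: %.2fcm\n", w) // prints "bar width: 0.16cm"
}
```

This ignores the half-in, half-out overhang of the outermost bars, so the result probably still needs some hand tuning.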


Best,
Ethan

Dan Kortschak

Jul 16, 2013, 7:40:29 PM
to Ethan Burns, plotinum...@googlegroups.com
Thanks Ethan,

On Tue, 2013-07-16 at 06:57 -0700, Ethan Burns wrote:
> Unfortunately, I don't think there is a great way to achieve what you want,
> because the bar chart (and box plot) widths are in absolute units, not
> units relative to the final plot.Save width.

Yes, there are a few issues that collide to cause this problem if a
client wants to directly use BarChart.

I've figured out a solution, though it is convoluted by a circularity in
the API:

I create the Plot and BarCharts and then make a DrawArea and get the trX
to find out how wide the plot area is. I can then use this to determine
the width and set the Width field of the BarCharts. This approach
requires that I lie to NewBarChart, since I must give it a non-zero width
value or it errors out. While it makes no real difference whether I
assign a zero or a non-zero value, it looks weird and would require a
comment to explain it if this were being published.

I then recreate the actions performed by Save using my created DrawArea
(I could throw it away and use Save I guess, but that doesn't feel nice
either).
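
The shape of that workaround, as a self-contained model rather than real Plotinum code: barChart, newBarChart, drawArea, transformX and fitWidth are all hypothetical stand-ins that only mimic the behaviour described above (a constructor that rejects a zero width, an exported Width field, and an x transform that is only available once a draw area exists).

```go
package main

import (
	"errors"
	"fmt"
)

// barChart and newBarChart are hypothetical stand-ins for the real
// bar chart type and its constructor.
type barChart struct {
	Values []float64
	Width  float64
}

func newBarChart(vs []float64, width float64) (*barChart, error) {
	if width <= 0 {
		return nil, errors.New("width parameter was not positive")
	}
	return &barChart{Values: vs, Width: width}, nil
}

// drawArea is a hypothetical stand-in for a draw area: just the
// horizontal extent of the region the data is drawn into.
type drawArea struct{ minX, maxX float64 }

// transformX models the x transform: it maps a data coordinate in
// [min, max] onto the draw area.
func transformX(da drawArea, min, max float64) func(float64) float64 {
	return func(x float64) float64 {
		return da.minX + (x-min)/(max-min)*(da.maxX-da.minX)
	}
}

// fitWidth overwrites the placeholder width with fill times the
// distance between adjacent bar centres in the draw area.
func fitWidth(b *barChart, da drawArea, fill float64) {
	if len(b.Values) < 2 {
		b.Width = fill * (da.maxX - da.minX)
		return
	}
	trX := transformX(da, 0, float64(len(b.Values)-1))
	b.Width = fill * (trX(1) - trX(0))
}

func main() {
	// The width of 1 is a placeholder that only satisfies the constructor.
	bars, err := newBarChart([]float64{3, 1, 4, 1, 5}, 1)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Only now, with a draw area in hand, can the real width be set.
	fitWidth(bars, drawArea{minX: 50, maxX: 450}, 0.8)
	fmt.Printf("bar width: %.1f\n", bars.Width)
}
```

The placeholder passed to newBarChart is the "lie": it is never used for drawing, which is why it would need an explanatory comment in published code.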

> Perhaps, this was an oversight. Maybe a future version (gonum?) could
> change these width parameters to be a percentage of the total final
> data area width. Ideally, we would be able to support both without
> the API being terrible, but that may take some thought.

Yes, this is a difficult path; have a look at the parameterisation of R
graphics for a view into Hell.

> Anyway, it should be easy to copy/hack the bar chart plotter to try
> using relative widths. If it works much better than the absolute
> widths then please open an issue with label gonum and attach the code.
> I'd really appreciate it.

What I have is working, so I probably won't given my time constraints,
but if I do, I will certainly send that.


Wandering around in the code while I was sorting this out raised a few
issues that I think would be worth addressing/discussing before the
gonum transition. Are you up for a googledoc or wiki page for design
discussion for possible changes during that transition?

Things that struck me:
1. There seem to be a fair few things that could be exposed without
   danger and would allow a more general application of types (an
   example of this is the change that Seb made to make a
   gnuplot-style plot - his approach was forced by the fact that
   some axis types and methods are unexported, so a closure was not
   possible).
2. A corollary of 1. is the possible use of interfaces rather
   than concrete types for some of the primitives, e.g. if Axis
   were an interface instead of a concrete type, an alternative
   implementation could be provided that does gnuplot-style
   rendering.
3. The plotter types hold concrete values that are a copy of the
   values returned by an interface. I don't understand the
   motivation behind this.
4. The plotter.New* functions are opinionated about the values they
   accept at creation, but those values can be altered afterwards
   because the fields are public (this is what made my solution
   possible, though it required the lying).

cheers
Dan

Ethan Burns

Jul 17, 2013, 9:41:45 AM
to Dan Kortschak, plotinum...@googlegroups.com
On Tue, Jul 16, 2013 at 7:40 PM, Dan Kortschak
<dan.ko...@adelaide.edu.au> wrote:
> Thanks Ethan,
>
> On Tue, 2013-07-16 at 06:57 -0700, Ethan Burns wrote:
>> Unfortunately, I don't think there is a great way to achieve what you want,
>> because the bar chart (and box plot) widths are in absolute units, not
>> units relative to the final plot.Save width.
>
> Yes, there are a few issues that collide to cause this problem if a
> client wants to directly use BarChart.
>
> I've figured out a solution, though it is convoluted by a circularity in
> the API:
>
> I create the Plot and BarCharts and then make a DrawArea and get the trX
> to find out how wide the plot area is. I can then use this to determine
> the width and set the Width field of the BarCharts. This approach
> requires that I lie to NewBarChart since I must give it a non-zero width
> value or it errors out. While it's no real difference assigning a 0 or
> a !0, it looks weird and would require a comment to explain it if this
> were being published.
>
> I then recreate the actions performed by Save using my created DrawArea
> (I could throw it away and use Save I guess, but that doesn't feel nice
> either).

That certainly is roundabout.

>> Perhaps, this was an oversight. Maybe a future version (gonum?) could
>> change these width parameters to be a percentage of the total final
>> data area width. Ideally, we would be able to support both without
>> the API being terrible, but that may take some thought.
>
> Yes, this is a difficult path; have a look at the parameterisation of R
> graphics for a view into Hell.

You're right, we don't want to have too many parameters; I'd rather
have fewer features.

>> Anyway, it should be easy to copy/hack the bar chart plotter to try
>> using relative widths. If it works much better than the absolute
>> widths then please open an issue with label gonum and attach the code.
>> I'd really appreciate it.
>
> What I have is working, so I probably won't given my time constraints,
> but if I do, I will certainly send that.

That is fair. I'll add an issue to reconsider this problem for gonum.
(https://code.google.com/p/plotinum/issues/detail?id=133&thanks=133).

> Wandering around in the code while I was sorting this out raised a few
> issues that I think would be worth addressing/discussing before the
> gonum transition. Are you up for a googledoc or wiki page for design
> discussion for possible changes during that transition?

Sure. But, I am going to be out of the country until July 27th, then
I'll be moving to a new apartment, so I may not be available to work
on it more until August 3rd-4th. Perhaps then I can start moving
Plotinum over to gonum. I'll ask for a Plotinum repository to be
created on gonum-dev, and you can start a wiki page there if you'd
like.

> Things that struck me:
> 1. There seem to be a fair few things that could be exposed without
> danger and would allow a more general application of types (an
> example of this is the change that Seb made to make a
> gnuplot-style plot - his approach was forced by the fact that
> some axis types and methods are unexported, so a closure was not
> possible).

One of my goals was to leave the exported side of the interface as
small as possible. But, I must admit that I didn't foresee someone
wanting to change the style of the axes.

> 2. A corollary of 1. is the possibility of use of interfaces rather
> than concrete types for some of the primitives, e.g. if Axis
> were an interface instead of a concrete type an alternative
> implementation could be provided that does gnuplot-style
> rendering.

That would be a bigger change, because not only would Axis need to be
an interface but Plot would need changes to accept a more general form
of axis. For example, Plot assumes that the Y axis is only on the
left side and the x axis is only on the bottom of the plot.
Personally, I don't like plots that completely box in their data area
like gnuplot does, so I never considered that someone would want to
add support for that. I am open to it, but I think it will be a lot
of work.

> 3. The plotter types hold concrete values that are a copy of the
> values returned by an interface. I don't understand the
> motivation behind this.

That was done on purpose. The problem with holding a reference to the
data is that if the user changes the data, then plotters made with
that reference can be invalid--they expect the data to remain
unchanged from the time of creation until they are drawn to the plot.
My preference was to make it impossible for subtle bugs like that to
creep into a user's plot by requiring the copy in all of the standard
plotters. This is the same reason why I decided to check and error on
NaN and infinity values in the data. These are almost always bugs,
they can be very hard to track down, it is easy for them to go
unnoticed, and we don't want papers being published with bad results
because of plotting bugs.

By the way, it should be very easy to copy and hack any of the
standard plotters to create non-copying versions. So, if someone has
so much data that the copy is taking too much memory or is taking a
very long time, then they are free to make their own reference-only
versions of whatever plotters they use.
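
To make the trade-off concrete, here is a minimal sketch of a copy-and-check step. The valuer interface, the values type and copyValues are written for this example and modelled on the description above; treat their exact shape as an assumption, not as the library's code.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// valuer is modelled on the kind of interface the plotters accept;
// the method set here is an assumption.
type valuer interface {
	Len() int
	Value(i int) float64
}

// values is a simple slice-backed valuer.
type values []float64

func (v values) Len() int            { return len(v) }
func (v values) Value(i int) float64 { return v[i] }

// copyValues returns a private copy of the data, rejecting NaN and
// infinite values so that they surface as errors rather than as
// silently wrong plots.
func copyValues(vs valuer) (values, error) {
	cpy := make(values, vs.Len())
	for i := range cpy {
		x := vs.Value(i)
		if math.IsNaN(x) || math.IsInf(x, 0) {
			return nil, errors.New("NaN or infinite data point")
		}
		cpy[i] = x
	}
	return cpy, nil
}

func main() {
	data := values{1, 2, 3}
	held, err := copyValues(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The caller changes its data after the plotter was created.
	data[0] = 100
	// The plotter's copy is unaffected: prints "1 100".
	fmt.Println(held[0], data[0])
}
```

A reference-only variant would simply keep vs instead of cpy, at the cost of seeing any later change the caller makes.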

> 4. There is opinionation on acceptable values that can be used in
> type creation by plotter.New* methods but which can be altered
> because the fields are public (this is what made my solution
> possible, though required the lying).

I agree that we should probably not error on zero-width bars and boxes.


Best,
Ethan

Dan Kortschak

Jul 17, 2013, 7:20:06 PM
to Ethan Burns, plotinum...@googlegroups.com
On Wed, 2013-07-17 at 09:41 -0400, Ethan Burns wrote:
> > Yes, this is a difficult path; have a look at the parameterisation of R
> > graphics for a view into Hell.
>
> You're right, we don't want to have too many parameters; I'd rather
> have fewer features.

This is where I think interfaces really come into the picture. I've used
conditional interface implementation to quite good effect (if I say so
myself) in my rings package to fine tune data representation.

> Sure. But, I am going to be out of the country until July 27th, then
> I'll be moving to a new apartment, so I may not be available to work
> on it more until August 3rd-4th. Perhaps then I can start moving
> Plotinum over to gonum. I'll ask for a Plotinum repository to be
> created on gonum-dev, and you can start a wiki page there if you'd
> like.

Yeah, I'm happy to start populating that - probably also link to this
thread.

I think there should probably be a relatively long migration period
while the design discussion happens.

> That would be a bigger change, because not only would Axis need to be
> an interface but Plot would need changes to accept a more general form
> of axis. For example, Plot assumes that the Y axis is only on the
> left side and the x axis is only on the bottom of the plot.

A single optional interface implementation would cover this to get the
Axis' desired placement or fall back to the plotinum standard placement.
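
A minimal sketch of what such an optional interface could look like. Every name below (axis, placer, side, placementOf) is hypothetical and chosen for this example; none of it is existing Plotinum API.

```go
package main

import "fmt"

// side says which edge of the data area an axis is drawn on.
type side int

const (
	bottom side = iota
	left
	top
	right
)

func (s side) String() string {
	return [...]string{"bottom", "left", "top", "right"}[s]
}

// axis is the minimal hypothetical interface every axis satisfies.
type axis interface {
	Label() string
}

// placer is the optional interface: an axis that implements it
// chooses its own placement.
type placer interface {
	Placement() side
}

// placementOf returns the axis' requested placement, or the supplied
// standard placement when the axis does not implement placer.
func placementOf(a axis, standard side) side {
	if p, ok := a.(placer); ok {
		return p.Placement()
	}
	return standard
}

// plainAxis does not implement placer, so it gets the standard placement.
type plainAxis struct{ label string }

func (a plainAxis) Label() string { return a.label }

// rightAxis asks to be drawn on the right, e.g. for a second y axis.
type rightAxis struct{ plainAxis }

func (rightAxis) Placement() side { return right }

func main() {
	// Prints "left": no preference, so the standard placement is used.
	fmt.Println(placementOf(plainAxis{"Y"}, left))
	// Prints "right": the axis supplied its own placement.
	fmt.Println(placementOf(rightAxis{plainAxis{"Y2"}}, left))
}
```

Existing axes would keep working unchanged, since anything that does not implement the extra method falls through to the standard placement.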

> Personally, I don't like plots that completely box in their data area
> like gnuplot does, so I never considered that someone would want to
> add support for that. I am open to it, but I think it will be a lot
> of work.

Yes, this will undoubtedly be a fair amount of work. I think that given
that you are happy with how plotinum works as it stands that it would
fall on others to come up with proposals to get this up.

I also don't particularly like boxed graphs, but some journals like them
and for more complex figures they can be helpful e.g. [1].

> That was done on purpose. The problem with holding a reference to the
> data is that if the user changes the data, then plotters made with
> that reference can be invalid--they expect the data to remain
> unchanged from the time of creation until they are drawn to the plot.

I think the Python dictum of "We're all adults here" is relevant. AFAICS
this is integrated in the Go idiom too; byte slices are often reused in
potentially dangerous ways throughout the standard library. All it
really takes is a statement in the docs that the data are expected to
remain unaltered after the call to (*plot).Add until the call to
(*plot).Draw or (*plot).Save.

The advantage of using an interface is that you can make a lot of the
plotter types' field parameterisation go away by querying the values
for their preferred representation.

> My preference was to make it impossible for subtle bugs like that to
> creep into a user's plot by requiring the copy in all of the standard
> plotters. This is the same reason why I decided to check and error on
> NaN and infinity values in the data.

I agree with this check - even if only because axis length and plot size
are indeterminable if these exist and data points should not be
arbitrarily dropped. R does the same thing.

> These are almost always bugs, they can be very hard to track down, it
> is easy for them to go unnoticed, and we don't want papers being
> published with bad results because of plotting bugs.
>
> By the way, it should be very easy to copy and hack any of the
> standard plotters to create non-copying versions.

Yes. Of course.

> So, if someone has so much data that the copy is taking too much
> memory or is taking a very long time, then they are free to make their
> own reference-only versions of whatever plotters they use.

My issue with using a concrete copied type is not so much that it's
copied, but that it's not an interface (for the reasons above). If
people are plotting so much that memory is becoming an issue here
they are probably in need of some data summarisation prior to the plot
stage.


thanks
Dan

[1]http://musicroamer.com/blog/2011/01/16/r-tips-and-tricks-modified-pairs-plot/

Ethan Burns

Jul 17, 2013, 8:01:10 PM
to plotinum...@googlegroups.com

Forgot to reply all.

---------- Forwarded message ----------
From: "Ethan Burns" <burns...@gmail.com>
Date: Jul 17, 2013 8:00 PM
Subject: Re: gonum transition design discussion
To: "Dan Kortschak" <dan.ko...@adelaide.edu.au>
Cc:

On Jul 17, 2013 7:20 PM, "Dan Kortschak" <dan.ko...@adelaide.edu.au> wrote:
>
> On Wed, 2013-07-17 at 09:41 -0400, Ethan Burns wrote:
> > > Yes, this is a difficult path; have a look at the parameterisation of R
> > > graphics for a view into Hell.
> >
> > You're right, we don't want to have too many parameters; I'd rather
> > have fewer features.
>
> This is where I think interfaces really come into the picture. I've used
> conditional interface implementation to quite good effect (if I say so
> myself) in my rings package to fine tune data representation.
>
> > Sure.  But, I am going to be out of the country until July 27th, then
> > I'll be moving to a new apartment, so I may not be available to work
> > on it more until August 3rd-4th.  Perhaps then I can start moving
> > Plotinum over to gonum.  I'll ask for a Plotinum repository to be
> > created on gonum-dev, and you can start a wiki page there if you'd
> > like.
>
> Yeah, I'm happy to start populating that - probably also link to this
> thread.

Thanks.

> I think there should probably be a relatively long migration period
> while the design discussion happens.

I completely agree.  It is rare to have a period to rethink an API after it has seen some use.  We should take full advantage of it.

> > That would be a bigger change, because not only would Axis need to be
> > an interface but Plot would need changes to accept a more general form
> > of axis.  For example, Plot assumes that the Y axis is only on the
> > left side and the x axis is only on the bottom of the plot.
>
> A single optional interface implementation would cover this to get the
> Axis' desired placement or fall back to the plotinum standard placement.
>
> > Personally, I don't like plots that completely box in their data area
> > like gnuplot does, so I never considered that someone would want to
> > add support for that.  I am open to it, but I think it will be a lot
> > of work.
>
> Yes, this will undoubtedly be a fair amount of work. I think that given
> that you are happy with how plotinum works as it stands that it would
> fall on others to come up with proposals to get this up.

Sounds good to me.

> I also don't particularly like boxed graphs, but some journals like them
> and for more complex figures they can be helpful e.g. [1].

Yes, I can see your point.

> > That was done on purpose.  The problem with holding a reference to the
> > data is that if the user changes the data, then plotters made with
> > that reference can be invalid--they expect the data to remain
> > unchanged from the time of creation until they are drawn to the plot.
>
> I think the Python dictum of "We're all adults here" is relevant. AFAICS
> this is integrated in the Go idiom too; byte slices are often reused in
> potentially dangerous ways throughout the standard library. All it
> really takes is a statement in the docs that the data are expected to
> remain unaltered after the call to (*plot).Add until the call to
> (*plot).Draw or (*plot).Save.

I still prefer the safety.  I also like type checking and bounds checking.  We may all be adults, but we are also only human. Though, I may not quite understand your proposal (see below), so perhaps in this case the benefits are worth it.

> The advantage of using an interface is that you can make a lot of the
> plotter types' field parameterisation go away by querying the values
> for their preferred representation.

I don't understand how avoiding the copy on XYers and Valuers allows us to avoid parameter fields in the plotter types.  I wonder if I am misunderstanding your proposal.

> > My preference was to make it impossible for subtle bugs like that to
> > creep into a user's plot by requiring the copy in all of the standard
> > plotters.  This is the same reason why I decided to check and error on
> > NaN and infinity values in the data.
>
> I agree with this check - even if only because axis length and plot size
> are indeterminable if these exist and data points should not be
> arbitrarily dropped. R does the same thing.
>
> > These are almost always bugs, they can be very hard to track down, it
> > is easy for them to go unnoticed, and we don't want papers being
> > published with bad results because of plotting bugs.
> >
> > By the way, it should be very easy to copy and hack any of the
> > standard plotters to create non-copying versions.
>
> Yes. Of course.
>
> > So, if someone has so much data that the copy is taking too much
> > memory or is taking a very long time, then they are free to make their
> > own reference-only versions of whatever plotters they use.
>
> My issue with using a concrete copied type is not so much that it's
> copied, but that it's not an interface (for the reasons above). If
> people are plotting so much that memory is becoming an issue here
> they are probably in need of some data summarisation prior to the plot
> stage.

I agree about summarizing, but I don't understand why keeping it an interface helps.  I may need an example.

By the way, I am going to be out of the country starting now, so I'll be mostly away from email until the 27th.  Talk to you then, and thanks for helping think about this stuff with me.

Ethan

tracey....@gmail.com

Jul 19, 2013, 3:17:51 PM
to plotinum...@googlegroups.com
Just to chime in, one benefit of allowing axes in different places is to allow multiple Y (or X) axes on a single plot (as in http://www.mathworks.com/help/matlab/ref/plotyy.html)

Dan Kortschak

Jul 19, 2013, 6:42:44 PM
to tracey....@gmail.com, plotinum...@googlegroups.com
Yes, this is what prompted the query.

Dan