variable-width of box plot (candlesticks)

Witold Baryluk

Dec 16, 2009, 7:24:37 PM
Hi,

This is my first post here, but I have used gnuplot exclusively for the
last 10 years to plot practically all my data. I want to thank everyone
involved in the development of gnuplot for this great work.

I have two enhancement proposals: the first relates to shared memory with
another process, and the second concerns candlesticks. Here I would like to
talk about the simpler second point :)

I am talking about variable-width candlesticks (known in statistics
as "box plots"); see
http://lis.epfl.ch/~markus/References/McGill78.pdf, chapter 3.

In gnuplot we can currently control:
- 1: x
- 2: open
- 3: low
- 4: high
- 5: close
- whether the box is empty or filled with vertical bars (if close < open)
- the presence of 'whiskerbars' at the extremes (low & high)
- the global per-plot width, via 'set boxwidth {<width>} {absolute|relative}'

Candlesticks are used not only in finance but mostly in statistics, where
the parameters are known as: open==1st quartile, close==3rd quartile,
low==minimum, high==maximum.

Additionally, we can add a statistical "median" bar by using an additional
'using 1:4:4:4:4 with candlesticks lt -1 notitle'

This is all fine, well documented in the gnuplot help system, and shown in
nice demos on the web page. :)

I would like to ask for support for a 6th data column that directly gives
the relative width of the boxes (and whiskerbars). This is useful when each
box is based on a different sample size. In that situation it is useful to
set the width proportional to the square root of the sample size, which lets
the reader quickly judge the importance of each data set represented by a
candlestick.

This is discussed in detail in McGill78.pdf.

Other "box plot" variations presented there could also be interesting
to have in gnuplot, notably 'dashed' lines and the 'notched box plot'.

sfeam

Dec 16, 2009, 8:21:24 PM
Witold Baryluk wrote:

> I am talking about variable-width candlesticks, (known in statistics
> as "box plot")
> http://lis.epfl.ch/~markus/References/McGill78.pdf chapter 3.
>
> In gnuplot we can currently control:
> - 1: x
> - 2: open
> - 3: low
> - 4: high
> - 5: close
> - whether the box is empty or filled with vertical bars (if close < open)
> - the presence of 'whiskerbars' at the extremes (low & high)
> - the global per-plot width, via 'set boxwidth {<width>} {absolute|relative}'
>
> Candlesticks are used not only in finance but mostly in statistics, where
> the parameters are known as: open==1st quartile, close==3rd quartile,
> low==minimum, high==maximum.
>
> Additionally, we can add a statistical "median" bar by using an additional
> 'using 1:4:4:4:4 with candlesticks lt -1 notitle'
>
> This is all fine, well documented in the gnuplot help system, and shown in
> nice demos on the web page. :)
>
> I would like to ask for support for a 6th data column that directly gives
> the relative width of the boxes (and whiskerbars). This is useful when each
> box is based on a different sample size.

This feature is in the works. There is a technical limitation in the
way at the moment, but nothing fundamental.

If you care about the gory details:
The problem is that gnuplot stores each data point in a structure
that holds exactly 7 items of information. That's more than enough for
most plotting styles, but candlesticks exhausts the available space per
data item. We will need to either expand the structure or free up one of
the slots currently in use.

Ethan

Witold Baryluk

Dec 17, 2009, 10:50:59 AM
On 17 Dec, 02:21, sfeam <sf...@users.sourceforge.net> wrote:
> This feature is in the works.  There is a technical limitation in the
> way at the moment, but nothing fundamental.  
>
> If you care about the gory details:
> The problem is that gnuplot stores each data point in a structure
> that holds exactly 7 items of information.  That's more than enough for
> most plotting styles, but candlesticks exhausts the available space per
> data item.  We will need to either expand the structure or free up one of
> the slots currently in use.

So I will ask a different question: if we are using a simple plotting
style, like "dot", are we still using this 7-item struct for each point,
ignoring the fact that a 2-item struct would be sufficient?


If so, I would suggest creating multiple versions of this structure
(or just making it hold a small array) and having a function that returns
the n-th entry in the array, so that no other function accesses the
entries directly (the function would check what kind of structs are
inside, move the pointer when necessary, and read or write the data there).
This would give big memory savings for many plotting styles, and would
also make some more advanced plotting styles implementable.
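
A rough sketch of the accessor idea in C might look like the following. It is only an illustration of the proposal, not gnuplot's actual code: the names point_store, point_store_init, point_field and point_store_free are all hypothetical, and it assumes every value of a point can be stored as a double.

```c
#include <stdlib.h>

/* Hypothetical container, not gnuplot's actual data structure: every point
 * of one plot occupies `stride` doubles in a single flat array, so a
 * 2-column style pays for 2 slots and a 7-column style for 7. */
typedef struct {
    size_t  stride;   /* number of values stored per point */
    size_t  count;    /* number of points */
    double *values;   /* count * stride doubles, point after point */
} point_store;

/* Allocate zero-initialised storage; returns 0 on success, -1 on failure. */
int point_store_init(point_store *s, size_t stride, size_t count)
{
    s->stride = stride;
    s->count  = count;
    s->values = calloc(stride * count, sizeof *s->values);
    return (s->values != NULL || stride * count == 0) ? 0 : -1;
}

/* The only place that knows the layout: returns a pointer to value `field`
 * of point `index`, or NULL if either is out of range. */
double *point_field(point_store *s, size_t index, size_t field)
{
    if (index >= s->count || field >= s->stride)
        return NULL;
    return &s->values[index * s->stride + field];
}

/* Release the storage and leave the container empty. */
void point_store_free(point_store *s)
{
    free(s->values);
    s->values = NULL;
    s->count = 0;
}
```

Code that only goes through point_field() would not care whether a plot stores 2 or 8 values per point; the cost is one extra function call and bounds check per access.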

Just adding an 8th field would only take additional memory for people
using the other plotting styles, so I would be against it :) I already
have problems with very big datasets.

Witek

sfeam

Dec 17, 2009, 3:03:38 PM
Witold Baryluk wrote:

> On 17 Dec, 02:21, sfeam <sf...@users.sourceforge.net> wrote:
>> This feature is in the works.  There is a technical limitation in the
>> way at the moment, but nothing fundamental.
>>
>> If you care about the gory details:
>> The problem is that gnuplot stores each data point in a structure
>> that holds exactly 7 items of information.  That's more than enough for
>> most plotting styles, but candlesticks exhausts the available space per
>> data item.  We will need to either expand the structure or free up one of
>> the slots currently in use.
>
> So I will ask a different question: if we are using a simple plotting
> style, like "dot", are we still using this 7-item struct for each point,
> ignoring the fact that a 2-item struct would be sufficient?

Correct, although actually the number of items for "dot" would be at
least 3: x, y, variable color. It may store a variable size also, although
most terminals ignore this information.

> If so, I would suggest creating multiple versions of this structure
> (or just making it hold a small array) and having a function that returns
> the n-th entry in the array, so that no other function accesses the
> entries directly (the function would check what kind of structs are
> inside, move the pointer when necessary, and read or write the data there).
> This would give big memory savings for many plotting styles, and would
> also make some more advanced plotting styles implementable.

I don't think that's an option. Much of the current code is shared by
all plotting styles, and access to each point is via a pointer to the
corresponding (struct coordinate *)point. For example, in order to
determine the range on x it scans through all points looking at
point->xmin and point->xmax. It would not work to use a different
structure for each plot type.



> Just adding an 8th field would only take additional memory for people
> using the other plotting styles, so I would be against it :)

Correct. That is the down side of adding a new field.

> I have already problems with very big datasets.

There may be other fixes for this problem. I have been playing with
creating a "streaming" data input mode, where the data is plotted as
it is read in rather than being stored in one pass and plotted in a
second pass. The current code design is not well matched to this
idea either, but I think it could be made to work if limited to a
single plot clause per graph. That is, one could say something like
  plot 'A' streaming with points
but not
  plot 'A' streaming with points, 'B' streaming with lines
But I don't have this working yet, even as a prototype.
So no promises.

Ethan

>
> Witek

Witold Baryluk

Dec 17, 2009, 8:32:41 PM
On 17 Dec, 21:03, sfeam <sf...@users.sourceforge.net> wrote:

> Witold Baryluk wrote:
> > If so, I would suggest creating multiple versions of this structure
> > (or just making it hold a small array) and having a function that returns
> > the n-th entry in the array, so that no other function accesses the
> > entries directly (the function would check what kind of structs are
> > inside, move the pointer when necessary, and read or write the data there).
> > This would give big memory savings for many plotting styles, and would
> > also make some more advanced plotting styles implementable.
>
> I don't think that's an option.  Much of the current code is shared by
> all plotting styles, and access to each point is via a pointer to the
> corresponding (struct coordinate *)point.  For example, in order to
> determine the range on x it scans through all points looking at
> point->xmin and point->xmax.  It would not work to use a different
> structure for each plot type.

One could say that gnuplot needs some structural refactoring. :)
I bet you don't really like C++ (I don't like it either),
but it is sometimes good to have abstraction and to hide details such as
which data structure is used, so that data and the algorithms over them
can be decoupled.
And it is good to work in a similar manner in non-OO languages like C.


>
> > Just adding an 8th field would only take additional memory for people
> > using the other plotting styles, so I would be against it :)
>
> Correct. That is the down side of adding a new field.
>
> > I have already problems with very big datasets.
>
> There may be other fixes for this problem.  I have been playing with
> creating a "streaming" data input mode, where the data is plotted as
> it is read in rather than being stored in one pass and plotted in a
> second pass.  The current code design is not well matched to this
> idea either, but I think it could be made to work if limited to a
> single plot clause per graph.  That is, one could say something like
>   plot 'A' streaming with points
> but not
>   plot 'A' streaming with points, 'B' streaming with lines
> But I don't have this working yet, even as a prototype.

Yes. It is a very good idea to work on streams of data;
actually, streaming was going to be my 3rd proposal
for gnuplot! You read my mind :)

Most of the time it is sufficient to do a one-pass
iteration over the data without storing any additional data
beyond the data sent to the graphics backend [aka terminal] (in the
case of vector graphics this can be a lot of data, but we already
store it there). Min/max can be determined in a separate
pre-pass. And if the user has assigned xrange/yrange manually
(set xrange [a:b]), the pre-pass is in fact not needed.
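
As a rough illustration of this two-pass scheme, here is a small C sketch. It has nothing to do with gnuplot's real input code: the function names scan_yrange and stream_points are hypothetical, the input is assumed to be plain whitespace-separated "x y" pairs in a seekable file, and writing a line of text stands in for sending a point to the terminal.

```c
#include <stdio.h>

/* Pre-pass: read "x y" pairs from the start of `in` and record the y range.
 * Returns the number of points read; nothing is stored per point. */
long scan_yrange(FILE *in, double *ymin, double *ymax)
{
    double x, y;
    long n = 0;

    rewind(in);
    while (fscanf(in, "%lf %lf", &x, &y) == 2) {
        if (n == 0 || y < *ymin) *ymin = y;
        if (n == 0 || y > *ymax) *ymax = y;
        n++;
    }
    return n;
}

/* Streaming pass: re-read the data, map each y from [ymin, ymax] onto
 * [0, height] and write it out immediately. Clipping of points outside
 * the range is left out of this sketch. Returns the number of points. */
long stream_points(FILE *in, FILE *out,
                   double ymin, double ymax, double height)
{
    double x, y;
    double span = ymax - ymin;
    long n = 0;

    rewind(in);
    while (fscanf(in, "%lf %lf", &x, &y) == 2) {
        double scaled = (span > 0.0) ? (y - ymin) / span * height : 0.0;
        fprintf(out, "%g %g\n", x, scaled);
        n++;
    }
    return n;
}
```

When the range is given by the user, the caller would skip scan_yrange() and pass the fixed limits straight to stream_points(), so the data is read only once.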

> So no promises.

This is a big change, so more research is needed, but
it could bring gnuplot to a new level. :)

Thank you very much for the answers.

sfeam

Dec 21, 2009, 5:26:05 PM
sfeam wrote:

> Witold Baryluk wrote:
>>
>> I would like to ask for support for a 6th data column that directly gives
>> the relative width of the boxes (and whiskerbars). This is useful when each
>> box is based on a different sample size.
>
> This feature is in the works.  There is a technical limitation in the
> way at the moment, but nothing fundamental.
>
> If you care about the gory details:
> The problem is that gnuplot stores each data point in a structure
> that holds exactly 7 items of information.  That's more than enough for
> most plotting styles, but candlesticks exhausts the available space per
> data item.

I am both delighted and embarrassed to report that I was wrong about there
not being enough room to store additional information associated with
candlesticks.

Therefore I have gone ahead and implemented a new plotting style
"with boxplot" that supports both the variable width option and
the inclusion of a bar representing the median of a distribution.
Please see patchset
#2918957 New plot style "with boxplot"

Since the new plot style is particularly useful in conjunction with the
"stats" command being developed as a separate patch series by Zoltán Vörös
and Philipp Janert, they will probably be added to CVS at the same time
when both are ready.

Ethan
