Sorry if the subject line is not the right term for what I describe
...
I have a text file containing numerical data, sorted into order with
one number per line. This data all lies in the range 10.0 --> 50.0,
but my question is more general.
I want a frequency plot of the data, as I need to find the percentiles
for this data (50% of values < 27.3, 25% < 22.9, 10% < 19.6, and so
on.). i.e. I want to see a continuous line starting at zero and
ending at 100(%), basically heading from bottom left to top right with
the "values" as the x-axis ranging from 10 to 50 (in this case).
I hacked a quick fortran program for this data, but I am sure gnuplot
could do this and I would like to script the graph generation bit for
web publishing purposes.
Suggestions ...
Thanks
Kevin
> I want a frequency plot of the data, as I need to find the percentiles
> for this data (50% of values < 27.3, 25% < 22.9, 10% < 19.6, and so
> on.). i.e. I want to see a continuous line starting at zero and
> ending at 100(%), basically heading from bottom left to top right with
> the "values" as the x-axis ranging from 10 to 50 (in this case).
Maybe there is a better solution but what I did in a similar case was:
- sort the file numerically outside of gnuplot
- count the lines
- in gnuplot: plot 'sorted-values' us 1:($0*100.0/number-of-lines)
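
The three steps above can also be tried outside gnuplot to check the numbers before plotting. This is only a minimal sketch, assuming a file with one number per line; the function name ecdf is made up for illustration and does not come from gnuplot or from the original post.

```shell
# ecdf FILE: print "value percent" pairs for a file with one number per line.
# The function name "ecdf" is hypothetical, chosen for this sketch.
ecdf() {
    # Step 2 of the recipe: count the lines (the redirect keeps the file name out of the output).
    n=$(wc -l < "$1")
    # Steps 1 and 3: sort numerically, then pair each value with its rank as a percentage.
    # NR - 1 mirrors the $0 column in gnuplot, which is 0 for the first record.
    sort -n "$1" | awk -v n="$n" '{ print $1, (NR - 1) * 100.0 / n }'
}
```

Calling ecdf on the data file should give the same pairs that the plot command draws, so the last percentage is slightly below 100 because the count starts at zero.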
--
Regards
Heinz
> Sorry if the subject line is not the right term for what I describe
> ...
It isn't. A frequency plot would be what's more commonly called a
histogram, i.e. a plot of the sampled probability density function
(PDF) of your data. What you're describing below is a plot of the
sample dataset's cumulative distribution function (CDF).
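
For contrast, a frequency plot in the histogram sense needs counts per bin rather than a running total. A rough sketch of that binning, assuming non-negative values, one per line, and a bin width chosen by the caller (the function name histogram and its arguments are hypothetical):

```shell
# histogram WIDTH FILE: count how many values fall into each bin of the given width.
# Prints "bin_start count" sorted by bin. Name and arguments are hypothetical.
# Assumes non-negative values, because int() truncates toward zero.
histogram() {
    awk -v w="$1" '
        { b = int($1 / w) * w; count[b]++ }   # left edge of the bin holding this value
        END { for (b in count) print b, count[b] }
    ' "$2" | sort -n
}
```

The output has one line per occupied bin, which is the shape of data a box or impulse plot would expect, whereas the CDF has one line per data point.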
> I want a frequency plot of the data, as I need to find the percentiles
> for this data (50% of values < 27.3, 25% < 22.9, 10% < 19.6, and so
> on.). i.e. I want to see a continuous line starting at zero and
> ending at 100(%), basically heading from bottom left to top right with
> the "values" as the x-axis ranging from 10 to 50 (in this case).
This requires summation of the input, which is not in gnuplot's bag of
tricks. You'll need a little bit of external script code for that.
'awk' can do it for you. Even on the fly, if you're on a somewhat
unix-ish platform that supports pipes (includes DOS and OS/2, but
currently not the MS-Windows versions of gnuplot):
plot "< awk '{sum = sum + $1 ; print sum}' data.dat" u 1 with lines
This won't automatically scale the output to go from y = 0 to y = 100
percent, either: that would require two passes over the dataset.
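
The two passes mentioned here can be had from a single awk call by naming the file twice. A minimal sketch, assuming data.dat is a regular file (not a pipe) whose values do not sum to zero; the function name scaled_cumsum is made up for illustration:

```shell
# scaled_cumsum FILE: running sum of column 1, rescaled so the last value is 100.
# The function name is hypothetical; FILE is read twice, so it must be a regular file.
scaled_cumsum() {
    awk '
        NR == FNR { total += $1; next }          # first pass: grand total only
        { sum += $1; print sum * 100 / total }   # second pass: running sum as a percentage
    ' "$1" "$1"
}
```

The same awk program should also work after "<" in a gnuplot plot command, with data.dat given twice. Note that this rescales the running sum of the values themselves, which is not the same thing as the percentage of lines seen so far.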
--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
> plot "< awk '{sum = sum + $1 ; print sum}' data.dat" u 1 with lines
>
> This won't automatically scale the output to go from y = 0 to y = 100
> percent, either: that would require two passes over the dataset.
This:
plot "<awk '{x[NR]=$1; y[NR]=$2; if ($2 > maxY) {maxY = $2}}END{for (i=1;i<=NR;i++) {print x[i], y[i]*100/maxY}}' data.dat" u 1:2 with lines
would work for two-column data, as long as you have fewer data points than whatever limit
your version of awk sets on array sizes (e.g. 4096 is one I've seen - gawk may be higher).
You could set the yrange to 0:100 before this if you want to ensure it starts at
zero.
Or, instead of:
plot "< awk '{sum = sum + $1 ; print sum}' data.dat" u 1 with lines
do this:
plot "< awk '{sum = sum + $1; x[NR] = sum}END{for(i=1;i<=NR;i++){print x[i]*100/sum}}' data.dat" u 1 with lines
Regards,
Ed
Heinz wrote:
> - sort the file numerically outside of gnuplot
> - count the lines
> - in gnuplot: plot 'sorted-values' us 1:($0*100.0/number-of-lines)
Thanks. Other replies displayed the lack of precision
in my original question, for which I apologise. I did indeed
want what I now remember is called a cumulative distribution function (CDF).
Using the above, I solved my problem (I know my data points all
lie between 10 and 50) with
set xrange [10:50]
set yrange [0:100]
set xtics 10,2,50
set ytics 0,10,100
numlines=`cat data | wc -l`
plot "<sort -n data" using 1:($0*100.0/numlines) with lines
although it would have been fairly trivial for me to
pre-sort the data if necessary.
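
Since the aim was to read percentiles off the curve, the same sorted data can also give them directly. A hedged sketch using the nearest-rank definition (other definitions interpolate between neighbours and give slightly different numbers); the function name percentile is hypothetical and the file is assumed to hold one number per line:

```shell
# percentile P FILE: print the smallest value with at least P percent of the
# data at or below it (nearest-rank definition). The name is hypothetical.
percentile() {
    sort -n "$2" | awk -v p="$1" '
        { v[NR] = $1 }                            # keep the sorted values
        END {
            if (NR == 0) exit 1                   # no data, nothing to report
            k = p * NR / 100                      # fractional rank
            r = (k == int(k)) ? k : int(k) + 1    # round up to a whole rank
            if (r < 1) r = 1
            print v[r]
        }'
}
```

For example, percentile 50 data would print the median under this definition, and percentile 10 data the value below which roughly a tenth of the points lie.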
Many Thanks to all those who responded.
Kevin