"A mathematician is a device for turning coffee into theorems." - Paul Erdos
1. Use a direct non-linear regression model from R and use the
goodness-of-fit data / confidence intervals from that. That is, rather
than doing the algebra to convert GCap into a quadratic, fit the data
directly to the equation. I had some R code that did this a while back
- I don't remember whether I posted it anywhere or not. It may or may
not be in one of my GitHub repositories and it may or may not be
somewhere on my hard drive. ;-)
2. Use any regression - transformed to quadratic or direct - and do a
bootstrap or jackknife resampling. The jackknife is simpler - you just
re-run the analysis N times, once for each data point, leaving a data
point out each time. Then you compute a few statistics and you've got
confidence intervals. The most "user friendly" description of the
process is in "Data Analysis and Regression: A Second Course in
Statistics" by Mosteller and Tukey. You can do a jackknife in Excel, I
think - last time I did one I did it in FORTRAN but you should be able
to construct the iterations in VBA.
If you want to bootstrap, there are R routines to do it. A bootstrap
works by drawing repeated samples from your data with replacement,
performing the regression on each sample, and then computing the
statistics just as you would with a jackknife.
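For what it's worth, the jackknife in option 2 can be sketched in a few lines. This is only an illustration in Python, not the FORTRAN or R code mentioned above. It assumes the USL in its usual form, C(N) = N / (1 + sigma (N-1) + kappa N (N-1)), with capacities normalized so that C(1) = 1, and it uses the transformed (quadratic) regression. The data points and the function names are made up for the example.

```python
import math

def usl(n, sigma, kappa):
    """Relative capacity C(N) under the USL, normalized so C(1) = 1."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

def fit_usl_linearized(ns, caps):
    """Least-squares fit of the transformed USL; returns (sigma, kappa).

    With x = N - 1 and y = N / C(N) - 1 the USL becomes
    y = kappa * x^2 + (sigma + kappa) * x, a quadratic with no intercept.
    """
    xs = [n - 1 for n in ns]
    ys = [n / c - 1 for n, c in zip(ns, caps)]
    # Normal equations for y = a*x^2 + b*x.
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a  # sigma = b - a, kappa = a

def jackknife(ns, caps, fit=fit_usl_linearized):
    """Leave-one-out jackknife: list of (full-data estimate, standard error)."""
    count = len(ns)
    full = fit(ns, caps)
    # Re-run the fit once per data point, leaving that point out.
    loo = [fit(ns[:i] + ns[i + 1:], caps[:i] + caps[i + 1:])
           for i in range(count)]
    out = []
    for k in range(len(full)):
        vals = [est[k] for est in loo]
        mean = sum(vals) / count
        var = (count - 1) / count * sum((v - mean) ** 2 for v in vals)
        out.append((full[k], math.sqrt(var)))
    return out

# Made-up data: a USL curve with small invented relative errors added.
ns = [1, 2, 4, 8, 12, 16, 24, 32]
noise = [0.0, 0.01, -0.01, 0.008, -0.012, 0.01, -0.006, 0.009]
caps = [usl(n, 0.02, 0.0009) * (1 + e) for n, e in zip(ns, noise)]

results = jackknife(ns, caps)
for name, (est, se) in zip(("sigma", "kappa"), results):
    print(f"{name} = {est:.6g} +/- {se:.3g} (jackknife standard error)")
```

The standard errors can then be turned into rough confidence intervals in the usual way. A bootstrap would replace the leave-one-out loop with repeated resampling of the (N, C) pairs with replacement.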
I'd recommend the first option - it's really not that difficult unless
your data set is flaky. In that case, a simple scatterplot will tell
you it's flaky and you can simply remove bad points before doing the
fitting.
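As a rough illustration of the first option: in R the usual tool for a direct non-linear fit is nls(). The sketch below is not that code. It is a small hand-rolled Gauss-Newton iteration in plain Python, with step halving so the sum of squared errors does not increase, fitting the USL directly to normalized capacities (C(1) = 1). The data, starting values and function names are assumptions made for the example, and a real analysis would also want the standard errors a package reports.

```python
def usl(n, sigma, kappa):
    """Relative capacity C(N) under the USL, normalized so C(1) = 1."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

def sse(ns, caps, sigma, kappa):
    """Sum of squared residuals between data and the USL curve."""
    return sum((c - usl(n, sigma, kappa)) ** 2 for n, c in zip(ns, caps))

def fit_usl_direct(ns, caps, sigma=0.0, kappa=0.0, max_iter=100, tol=1e-12):
    """Fit sigma and kappa directly to C(N) by damped Gauss-Newton."""
    current = sse(ns, caps, sigma, kappa)
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for n, c in zip(ns, caps):
            d = 1 + sigma * (n - 1) + kappa * n * (n - 1)
            r = c - n / d                  # residual
            j1 = -n * (n - 1) / d ** 2     # dC/dsigma
            j2 = j1 * n                    # dC/dkappa
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Solve the 2x2 system (J^T J) step = J^T r.
        det = a11 * a22 - a12 * a12
        ds = (g1 * a22 - g2 * a12) / det
        dk = (a11 * g2 - a12 * g1) / det
        # Halve the step until the fit stops getting worse.
        scale = 1.0
        while True:
            new_s, new_k = sigma + scale * ds, kappa + scale * dk
            trial = sse(ns, caps, new_s, new_k)
            if trial <= current or scale < 1e-10:
                break
            scale /= 2
        sigma, kappa, current = new_s, new_k, trial
        if abs(scale * ds) + abs(scale * dk) < tol:
            break
    return sigma, kappa

# Made-up data: a USL curve with small invented relative errors added.
ns = [1, 2, 4, 8, 12, 16, 24, 32]
noise = [0.0, 0.01, -0.01, 0.008, -0.012, 0.01, -0.006, 0.009]
caps = [usl(n, 0.02, 0.0009) * (1 + e) for n, e in zip(ns, noise)]

sigma_hat, kappa_hat = fit_usl_direct(ns, caps)
print(f"sigma = {sigma_hat:.6g}, kappa = {kappa_hat:.6g}")
```

Starting from sigma = kappa = 0 corresponds to assuming perfect linear scaling and letting the data pull the curve down from there.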
--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb
Quoting Baron Schwartz <baron.s...@gmail.com>:
I'm not a stats expert -- so thanks for correcting my terminology.
That's what I'm looking for, I think. I'll study that.
--
Baron Schwartz
Percona Inc <http://www.percona.com/>
Consulting, Training, Support & Services for MySQL
Thanks. I noticed the same thing. I think the data is suspect. I
looked at another set from the same benchmarks and the C(1)
measurement is way too low -- it makes the C(2 .. 5) measurements look
like they are 110% over linear speedup. I think the benchmarks might
not be properly warmed up, or something else like that might be going
on.
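
A quick way to spot this kind of problem is to compute the scaling efficiency C(N) / (N * C(1)) for each point; anything above 1 means better-than-linear speedup and points at a bad C(1). The sketch below is in Python and the numbers in it are invented for illustration, not the benchmark data discussed here.

```python
def scaling_efficiency(ns, throughputs):
    """Return C(N) / (N * C(1)) for each load level N."""
    base = throughputs[ns.index(1)]
    return [x / (n * base) for n, x in zip(ns, throughputs)]

# Hypothetical measurements, invented for illustration only.
ns = [1, 2, 3, 4, 5]
throughputs = [400.0, 900.0, 1350.0, 1750.0, 2100.0]

for n, eff in zip(ns, scaling_efficiency(ns, throughputs)):
    flag = "  <-- better than linear, check C(1)" if eff > 1.0 else ""
    print(f"N={n}  efficiency={eff:.3f}{flag}")
```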
- Baron
--
Thanks. I have been using gnuplot, which can do regressions against
any closed-form equation such as the USL. Does anyone have experience
with gnuplot, or advice about it? I have no experience using R, but I
can see that I need to learn someday. I'd like to compare gnuplot's
notes with R. Using gnuplot against the dataset I gave, I get
    sigma  0.0207163    +/- 0.001323   (6.385%)
    kappa  0.000861226  +/- 5.414e-05  (6.287%)
    R^2    0.999624
I computed the R^2 myself with awk. Apparently gnuplot does not
really believe in R^2, but I am not educated enough to understand the
explanation given in the gnuplot manual. It just looks to me like
someone knows a lot more than I do, and can't explain it to me!
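
For reference, the R^2 meant here is presumably the usual coefficient of determination, 1 - SS_res / SS_tot. The awk script itself is not shown in this thread, so the Python sketch below is only an assumption about what it computes, and the function name is made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

Here observed would be the measured throughputs and predicted the fitted USL values at the same N.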
My final model (still using gnuplot) gives me a peak predicted
capacity of 12253 at N=33.
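
The location of the peak can be cross-checked from the fitted coefficients alone, using the USL's closed form N* = sqrt((1 - sigma) / kappa). The Python sketch below plugs in the gnuplot values quoted above. It reports only relative capacity C(N)/C(1), because the C(1) throughput needed to scale that up to an absolute figure is not given in this message.

```python
import math

def usl(n, sigma, kappa):
    """Relative capacity C(N) under the USL, normalized so C(1) = 1."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

def usl_peak(sigma, kappa):
    """Load N* at which the USL curve reaches its maximum."""
    return math.sqrt((1 - sigma) / kappa)

# Coefficients from the gnuplot fit quoted above.
sigma, kappa = 0.0207163, 0.000861226

n_star = usl_peak(sigma, kappa)
print(f"N* = {n_star:.2f}")
for n in (math.floor(n_star), math.ceil(n_star)):
    print(f"N={n}  relative capacity C(N)/C(1) = {usl(n, sigma, kappa):.3f}")
```

The continuous peak falls between N=33 and N=34, which is consistent with the N=33 figure if it was rounded down.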
If the above is different from the results others on this list obtain,
then I think I need to take the time to learn R, and learn the
techniques you and Dr. Gunther described (jackknife, etc). But if
gnuplot's +/- results for sigma and kappa are usable, then I'll save
the time I don't have to learn R, and go on with my work.
Thank you!
- Baron