# Improving USL results

### Mohit Chawla

Jul 16, 2016, 4:23:43 PM
Hello folks,

I am modeling an application's behavior with USL.

I am happy to report that the data aligns perfectly with Little's Law: the number computed from Little's Law equals the actual number of service threads in the system.

So the load ('N') and throughput data are good. But when I apply the USL to the data, the best R-squared value I have been able to get so far is 0.87. I would appreciate some help improving the model's results.
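For context, that Little's Law check amounts to N = X · R (concurrency = throughput × mean residence time); a minimal sketch in Python, with purely illustrative numbers rather than the actual measurements:

```python
# Little's Law: N = X * R
# concurrency = throughput * mean response (residence) time.
# The numbers below are illustrative only, not the poster's data.
throughput = 250.0       # X, requests per second
response_time = 0.4      # R, seconds per request

concurrency = throughput * response_time  # N, average busy service threads
print(f"N = {concurrency:.1f}")  # N = 100.0
```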

Here is the summary of applying USL to the data:

```
Call:
usl(formula = throughput ~ N, data = values)

Scale Factor for normalization: 3.3

Efficiency:
   Min     1Q Median     3Q    Max
0.5401 0.6916 0.7475 0.8282 1.0000

Residuals:
     Min       1Q   Median       3Q      Max
-151.626  -19.735    8.142   32.249   96.047

Coefficients:
        Estimate  Std. Error
sigma  6.649e-04   1.409e-04
kappa  2.452e-06   4.857e-07

Residual standard error: 42.32 on 233 degrees of freedom
Multiple R-squared: 0.8786, Adjusted R-squared: 0.878
```

I have uploaded a screenshot of the data and the model curve at https://i.imgur.com/4q6qUHV.png
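For readers without R's usl package, the same two-coefficient fit can be sketched with scipy's `curve_fit`; the data below are synthetic (coefficients and noise picked only to resemble the magnitudes in this thread, not the actual measurements):

```python
import numpy as np
from scipy.optimize import curve_fit

def usl(N, lam, sigma, kappa):
    """Universal Scalability Law: X(N) = lam*N / (1 + sigma*(N-1) + kappa*N*(N-1))."""
    return lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1))

# Synthetic load/throughput measurements (NOT the data from this thread).
rng = np.random.default_rng(1)
N = np.arange(1, 301, dtype=float)
X = usl(N, 3.3, 6.6e-4, 2.5e-6) + rng.normal(0, 5, N.size)

(lam, sigma, kappa), pcov = curve_fit(usl, N, X, p0=(1.0, 1e-4, 1e-6))

# R-squared: 1 - SS_res / SS_tot
resid = X - usl(N, lam, sigma, kappa)
r_squared = 1 - np.sum(resid**2) / np.sum((X - X.mean())**2)
print(f"lambda={lam:.3f} sigma={sigma:.2e} kappa={kappa:.2e} R^2={r_squared:.4f}")
```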

Thanks,
Mohit

### DrQ

Jul 16, 2016, 4:46:26 PM
to Guerrilla Capacity Planning
Hi Mohit,

I'm not sure why you think there's an issue that needs to be improved.

Be aware that R^2 does not tell the whole story about the fit. It's only a leading indicator.
And you shouldn't expect it to be close to 100% (if that's what you're thinking), especially
when you have some degree of scatter (which you do).

It looks about right to me, given the residuals about the USL curve.
What does R report as the significance levels?

Quite honestly, I wouldn't be sweating bullets over R^2. I'd be looking at the USL projections
beyond N = 350 threads or wherever, and assessing the magnitudes of the USL coefficients.
Do the projections (the thing that can't be seen in the data) make sense?

The other thing that looks unusual to me is the sheer number of data points. You have enough
data to really calculate the error bars for those measurements. Since those are also related to
R^2, are they acceptable?
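On the error-bar point: with repeated throughput samples at each load level, the error bar is simply the standard error of the mean per value of N. A small sketch (Python, with purely illustrative sample values):

```python
import numpy as np

# Repeated throughput measurements per concurrency level (illustrative values).
samples = {
    10: [32.1, 30.8, 33.0],
    20: [58.4, 61.2, 59.9],
    40: [99.5, 96.1, 101.3],
}

for n, xs in sorted(samples.items()):
    xs = np.asarray(xs)
    mean = xs.mean()
    sem = xs.std(ddof=1) / np.sqrt(xs.size)  # standard error of the mean
    print(f"N={n:3d}: X = {mean:6.1f} +/- {sem:.2f}")
```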

### Mohit Chawla

Jul 17, 2016, 11:29:08 AM
to Guerrilla Capacity Planning
Hello Dr. Gunther,

The projections for the same day do make sense, to an extent, as can be seen at http://i.imgur.com/SmiKkUG.png - the red dots are the continued projection points.

I was trying to use the model fitted to one day's data to predict the measurements for another day with a similar load pattern, and I found that the predictions were not close to the measured values. So I thought of improving the model's correctness. But maybe I need to repeat the calculations over a week to be able to make such predictions. Any ideas here would be helpful as well.

Since the data is multivalued, I tried using the average of the points with the same value of 'N' instead, but that involved rounding and converting to integers, which somehow brought the R-squared value down quite a bit.

I do not know how I can get R to report the significance levels for the data - is there a library or function for that? Error bars sound useful; I'll try to plot them. As for their acceptability, I am not sure how to ascertain that, or what to do to clean them up (the data fits Little's Law well). Any suggestions?

Thanks,
Mohit

--
You received this message because you are subscribed to the Google Groups "Guerrilla Capacity Planning" group.
To unsubscribe from this group and stop receiving emails from it, send an email to guerrilla-capacity-...@googlegroups.com.
To post to this group, send email to guerrilla-cap...@googlegroups.com.

### DrQ

Jul 17, 2016, 5:25:33 PM
to Guerrilla Capacity Planning
Here is my USL fit (blue curve) to the initial data (black points) ...

Extended data (red points) are then superimposed. Those data fall below the USL prediction based on the initial data set.
Next, I fit all those data (black + red) to get the new USL coefficients and see how much they changed.

This clearly tells you the newer data "failed" to scale as well as the USL predicted using the original measurements. Someone now needs to explain what went "wrong". Of course, this may just be how it is. Maybe the system (whatever it is) can't scale as well as anticipated by the USL based on the lower scaling data. Nonetheless, it should be explained why not. It might be a potential performance opportunity. :)

I also interpolated the N = 1 throughput value using R and found it to be X(1) = 3.344884.

### Mohit Chawla

Jul 17, 2016, 10:14:29 PM

Hello Dr. Gunther,

Thanks a lot for taking the time to analyse and interpret the data ! I'll take up the performance opportunity and check it further.

Thanks,
Mohit


### Baron Schwartz

Jul 17, 2016, 10:14:29 PM
I'd suggest looking at the data as a time series as well. Can you plot the throughput and concurrency as time series and provide the links in a reply?

I was going to reply and suggest exactly what Dr. Gunther did: the "tail" of the data falls under the predicted scaling curve, which strongly suggests that the system is underperforming at the higher concurrencies and that this trend will continue and accelerate. I'm basing this on my own experience: when I've seen this, the system is already getting into serious trouble at the upper end and can't be counted on to produce increasing throughput under increasing load. Basically, you're already looking at its peak in the data you have at present.

### DrQ

Jul 18, 2016, 1:08:41 AM
to Guerrilla Capacity Planning
Here are the significance levels for the respective USL models.
For some reason they don't show up if you use Stefan Moeding's pkg.

```
> summary(usl.fit)

Formula: Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))

Parameters:
       Estimate  Std. Error  t value  Pr(>|t|)
alpha 6.406e-04   1.399e-04    4.579  7.61e-06 ***
beta  2.686e-06   4.832e-07    5.559  7.40e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.61 on 233 degrees of freedom

Number of iterations to convergence: 13
Achieved convergence tolerance: 4.1e-06

> summary(usl.fit2)

Formula: Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))

Parameters:
       Estimate  Std. Error  t value  Pr(>|t|)
alpha 2.026e-04   1.026e-04    1.975    0.0492 *
beta  4.340e-06   3.250e-07   13.354    <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.8 on 287 degrees of freedom

Number of iterations to convergence: 13
Achieved convergence tolerance: 9.683e-06
```
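For anyone who wants the same t- and p-values outside R: they follow from the parameter covariance matrix of any least-squares fit. A hedged sketch with scipy, on synthetic data (coefficients picked only to resemble the magnitudes above, not this thread's measurements):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t as t_dist

def usl_norm(N, alpha, beta):
    """Normalized USL: C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return N / (1 + alpha * (N - 1) + beta * N * (N - 1))

# Synthetic normalized-throughput data (NOT this thread's measurements).
rng = np.random.default_rng(0)
N = np.linspace(1, 300, 235)
C = usl_norm(N, 6.4e-4, 2.7e-6) + rng.normal(0, 2, N.size)

popt, pcov = curve_fit(usl_norm, N, C, p0=(1e-4, 1e-6))
stderr = np.sqrt(np.diag(pcov))            # standard errors of alpha, beta
tvals = popt / stderr                      # t statistics
dof = N.size - popt.size                   # residual degrees of freedom
pvals = 2 * t_dist.sf(np.abs(tvals), dof)  # two-sided p-values

for name, est, se, tv, pv in zip(("alpha", "beta"), popt, stderr, tvals, pvals):
    print(f"{name}: estimate={est:.3e}  SE={se:.3e}  t={tv:.2f}  p={pv:.2g}")
```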


### Harry van der horst

Jul 18, 2016, 10:14:23 AM
Mohit,
It looks a bit as if your database starts to misbehave at the higher utilizations.
Maybe there is a single data element that is causing contention.
kind regards
harry


### Stefan Moeding

Oct 14, 2016, 4:35:56 AM
to Guerrilla Capacity Planning
Hi!

On Monday, July 18, 2016 at 7:08:41 AM UTC+2, DrQ wrote:

> Here are the significance levels for the respective USL models.
> For some reason they don't show up if you use Stefan Moeding's pkg.
>
>     > summary(usl.fit)
>
>     Formula: Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))
>
>     Parameters:
>            Estimate  Std. Error  t value  Pr(>|t|)
>     alpha 6.406e-04   1.399e-04    4.579  7.61e-06 ***
>     beta  2.686e-06   4.832e-07    5.559  7.40e-08 ***
>     ---
>     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The significance levels were never part of the summary output for a usl model. It's just a different piece of code: while the calculation was already implemented, the output was not.

Anyway, this has been fixed with the latest release of the usl package for R. The summary command now includes the t-value and the two-sided p-value for both coefficients, so the output looks similar to the one shown here.

Version 1.7.0 of the package should be available on your favorite CRAN mirror shortly.

Regards
Stefan