Nonlinear curve fit with Deducer


Stan Warford

May 9, 2013, 1:54:17 PM
to ded...@googlegroups.com
Hi everybody,

I am new to R and Deducer, running R 3.0.0, and want to provide a point-and-click tool for my students to do a nonlinear curve fit of the equation

y = A x ln(x) + B x + C

I want my students to compare the RMSE of the best fit of this equation with the RMSE of the best fit of a quadratic equation to determine which is the better fit.
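For readers working at the console, here is a minimal sketch of that comparison in plain R. Although the curve is nonlinear in x, it is linear in A, B and C, so an ordinary linear model (lm) can fit it. The data set referred to below is not included in this post, so the sketch uses hypothetical data generated exactly from y = 2 x ln(x) + 3 x + 5; the names d, fit_nlogn, fit_quad and rmse are illustrative, and RMSE is taken here as the square root of the mean squared residual.

```r
# Hypothetical data generated exactly from y = 2 x ln(x) + 3 x + 5
# (illustration only; not the data set from this thread).
d <- data.frame(x = c(10, 20, 50, 100, 200, 500, 1000, 2000, 5000))
d$y <- 2 * d$x * log(d$x) + 3 * d$x + 5

# Model 1: y = A x ln(x) + B x + C; I() makes * mean arithmetic multiplication.
fit_nlogn <- lm(y ~ I(x * log(x)) + x, data = d)

# Model 2: quadratic y = a x^2 + b x + c.
fit_quad <- lm(y ~ x + I(x^2), data = d)

# RMSE as the square root of the mean squared residual.
rmse <- function(fit) sqrt(mean(residuals(fit)^2))

print(c(nlogn = rmse(fit_nlogn), quadratic = rmse(fit_quad)))
print(coef(fit_nlogn))  # (Intercept) = C, I(x * log(x)) = A, x = B
```

Both models have three coefficients, so ranking them by the "Residual standard error" that summary() reports (which divides by the residual degrees of freedom instead of n) gives the same order.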

I see in the Analysis menu there are selections for Linear Model, Logistic Model, and Generalized Linear model but no general nonlinear model. I have two questions.

(1) Is it possible to do a point-and-click process with Deducer for the above nonlinear model, or must that be done on the Console command line? My students could handle the command line process, but I would like to avoid that if possible.

(2) With Deducer's Plot Builder, is there a template to plot the best fit curve with the data points?

Here is a typical data set:

Thanks in advance,

Stan

J. Stanley Warford

Professor of Computer Science

Pepperdine University

Malibu, CA 90263


Ian Fellows

May 9, 2013, 2:19:27 PM
to ded...@googlegroups.com
(1) There are two ways of doing it.
      (a) Go into Data -> Transform and create a new variable using the log transform (e.g. named x.tr). Then go to Linear Model and add x and x.tr as predictors of y. This will give you your model.
      (b) Open Linear Model, with x predicting y. In the model builder, double-click on x to edit it into log(x). Then select x on the left and click the plus button. You should now have x and log(x) in your model.

(2) Create a scatter plot and then add in Smooth from Geometric Elements. In Smooth's options, select "Linear", and put y ~ x + log(x) in the formula.
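At the console, these steps correspond roughly to the sketch below (hypothetical data; the column name x.tr follows the example above). Steps (1a) and (1b) fit the same model, y = A ln(x) + B x + C, and the grid of predictions is the curve a linear-model smooth traces through the points. Deducer's Plot Builder generates ggplot2 code, where the comparable layer would be geom_smooth(method = "lm", formula = y ~ x + log(x)); the sketch uses base graphics to avoid that extra dependency.

```r
# Hypothetical data, illustration only.
d <- data.frame(x = c(1, 2, 4, 8, 16, 32, 64, 128))
d$y <- 4 * log(d$x) + 0.5 * d$x + 1 + rep(c(-0.1, 0.1), 4)

# (1a) Create the transformed column first, then use it as a predictor.
d$x.tr <- log(d$x)
fit_a <- lm(y ~ x + x.tr, data = d)

# (1b) Write log(x) directly in the model formula.
fit_b <- lm(y ~ x + log(x), data = d)

# Same model, so the fitted values agree.
print(all.equal(fitted(fit_a), fitted(fit_b)))

# (2) Evaluate the fitted curve on a fine grid and draw it over the points.
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
grid$y <- predict(fit_b, newdata = grid)

plot(y ~ x, data = d)      # data points
lines(y ~ x, data = grid)  # fitted curve
```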

ian




Stan Warford

May 9, 2013, 6:30:24 PM
to ded...@googlegroups.com
Thanks for the quick response.

I chose 1(b) as it seems easier than 1(a). However, computer science theory predicts a term x*log(x) with coefficient A and a term x with coefficient B, as in my original post. So I edited the term to be x*log(x), which the dialog box accepted, and used the plus button to add x back into the model.

I get the following for the formula:

Call:
lm(formula = InsertComp ~ DataCount * log(DataCount) + DataCount, 
    data = .gui.working.env$comparisonsR, na.action = na.omit)

where InsertComp is y and DataCount is x. But for the coefficients I get

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)              -5645650.9  1142193.1  -4.943  0.00260 ** 
DataCount                  -16964.3     1376.3 -12.326 1.74e-05 ***
log(DataCount)            1292621.0   226823.8   5.699  0.00126 ** 
DataCount:log(DataCount)     2015.6      142.9  14.109 7.92e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I assume (Intercept) Estimate corresponds to my coefficient C, and DataCount Estimate corresponds to my coefficient B, but what corresponds to my coefficient A?

Also, I got

Residual standard error: 94760 on 6 degrees of freedom
  (1 observation deleted due to missingness)

and wonder what the "missingness" comment means. I thought it might be the NA entry after the last entry in the Data Viewer, but I cannot seem to get rid of the NA. Is NA the sentinel for the list of data values in the viewer?
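The note comes from na.action = na.omit in the lm() call above: any row with an NA in a model variable is dropped before fitting, and summary() reports how many rows were dropped. A small sketch with hypothetical data (the names d, fit, d_clean and fit_clean are illustrative):

```r
# Hypothetical data: the last row has a missing y value.
d <- data.frame(x = c(10, 20, 40, 80, 160),
                y = c(35, 80, 190, 430, NA))

fit <- lm(y ~ x, data = d, na.action = na.omit)

print(nobs(fit))                # rows actually used in the fit: 4
print(sum(!complete.cases(d)))  # rows containing an NA: 1
print(summary(fit))             # includes "(1 observation deleted due to missingness)"

# Dropping incomplete rows beforehand gives the same coefficients without the note.
d_clean <- d[complete.cases(d), ]
fit_clean <- lm(y ~ x, data = d_clean)
```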

On another note, every time I try to resize the Linear Regression Model Explorer, the application freezes and I must force quit JGR. I am running Mac OS X 10.8.3, R 3.0.0, RJavaClassLoader 1.0. Is this a known issue?

I must say that I am impressed with R and Deducer so far. As a non-statistician, however, I find it difficult to get answers to these simple scenarios from the documentation. I appreciate the support!

Ian Fellows

May 10, 2013, 1:35:23 AM
to ded...@googlegroups.com
Stanley,

I had interpreted the "x" in your first term as multiplication, leading me to tell you how to specify A*ln(x) + B*x + C. You are basically right with your formula, but R interprets the "*" in x*log(x) as main effects plus their interaction, _not_ as multiplication. The correct formula is y ~ x:log(x) + x, or y ~ I(x*log(x)) + x. The "I" insulates the term, allowing "*" to represent multiplication.
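A quick way to see how R reads a formula is to list its terms. This sketch uses hypothetical data generated from y = 2 x ln(x) + 3 x + 5. It shows that x*log(x) expands to three terms, which accounts for the extra log(DataCount) row in the coefficient table earlier in the thread, and that the two corrected formulas give the same fit:

```r
# Hypothetical data generated from y = 2 x ln(x) + 3 x + 5.
d <- data.frame(x = c(10, 20, 50, 100, 200, 500, 1000))
d$y <- 2 * d$x * log(d$x) + 3 * d$x + 5

# In a formula, * means main effects plus interaction.
print(attr(terms(y ~ x * log(x) + x), "term.labels"))  # "x" "log(x)" "x:log(x)"

# The two corrected forms of y = A x ln(x) + B x + C.
fit1 <- lm(y ~ x:log(x) + x, data = d)
fit2 <- lm(y ~ I(x * log(x)) + x, data = d)
print(all.equal(fitted(fit1), fitted(fit2)))

# With I(), the coefficients read directly as C, A and B.
print(coef(fit2))
```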

The way models are specified is very expressive, allowing for a simple statement of y ~ a*b*c*d to represent a full factorial model with many coefficients, but it does require some getting used to.
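For example, with four predictors the single * expression expands as follows. terms() works on the formula alone, so no data are needed; a, b, c and d are placeholder names.

```r
# Expand a four-way full factorial formula without fitting anything.
labs <- attr(terms(y ~ a * b * c * d), "term.labels")
print(labs)

# 4 main effects + 6 two-way + 4 three-way + 1 four-way interaction = 15 terms;
# with numeric predictors, adding the intercept gives 16 coefficients.
print(length(labs))
```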


best,
Ian

Stan Warford

May 10, 2013, 5:49:16 PM
to ded...@googlegroups.com
I bet ":" for multiplication bites a lot of newcomers! I thought I could get a quadratic fit by putting in x + x : x, but that works for neither the curve fit nor the plot. I figured out how to do the quadratic curve fit with the poly option in the Linear Regression Model Builder, but I cannot find the equivalent in Plot Builder. If I enter

y ~ x + x : x

it does a linear smooth (like it does with Model Builder).

How do I get a quadratic smooth curve in Plot Builder?

Thanks again!

Ian Fellows

May 10, 2013, 6:09:25 PM
to ded...@googlegroups.com
Use y ~ poly(x,2), which is a drop-down option. You can also type y ~ x + I(x^2).
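The two forms draw the same curve and give the same RMSE, but poly(x, 2) uses orthogonal polynomials by default, so its coefficients are not the a, b, c of a x^2 + b x + c. For a smooth in Plot Builder only the curve matters; for reading off coefficients, poly(x, 2, raw = TRUE) switches to raw powers. A sketch with hypothetical data:

```r
# Hypothetical data, illustration only.
d <- data.frame(x = 1:10)
d$y <- 3 + 2 * d$x - 0.5 * d$x^2 + rep(c(-0.2, 0.2), 5)

fit_poly <- lm(y ~ poly(x, 2), data = d)              # orthogonal polynomial basis
fit_raw  <- lm(y ~ x + I(x^2), data = d)              # raw powers of x
fit_raw2 <- lm(y ~ poly(x, 2, raw = TRUE), data = d)  # raw powers via poly()

# Same fitted curve ...
print(all.equal(fitted(fit_poly), fitted(fit_raw)))

# ... but only the raw forms give coefficients readable as c, b, a.
print(coef(fit_raw))
print(coef(fit_raw2))
print(coef(fit_poly))
```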

best,
Ian

Stan Warford

May 10, 2013, 6:51:30 PM
to ded...@googlegroups.com
I should have seen that. (Although I do not know why x : x does not work.) Thanks.
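The reason is that formula operators work on terms, not numbers: crossing x with itself gives back x, so y ~ x + x:x is just y ~ x. Inside a formula, ^ also means crossing, which is why the squared term needs I(). Listing the terms shows this:

```r
# A term crossed with itself collapses to the term itself.
print(attr(terms(y ~ x + x:x), "term.labels"))     # "x"

# Inside a formula ^ means crossing too, so x^2 also collapses to x ...
print(attr(terms(y ~ x^2), "term.labels"))         # "x"

# ... while I() protects the arithmetic square.
print(attr(terms(y ~ x + I(x^2)), "term.labels"))  # "x" "I(x^2)"
```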

As a result of this investigation I am going to switch to R with Deducer for my course and urge others in my department to do the same. Some of my colleagues are reluctant to switch because of the perceived difficulty of R. And I will say that the initial experience for non-specialists leaves a lot to be desired. Two big problems are:

(1) Documentation for the non-specialist. There do not appear to be any "how to" step-by-step cookbook recipes for accomplishing simple tasks like this one. It even took me a long time to figure out how to delete a row of data in the Data Viewer, until I stumbled on the contextual menu feature in your recent Statistical Software article. (Contextual menus should have regular menu equivalents.) The online Deducer manual is nothing more than a verbalization of the menus and dialog boxes. I did not find it helpful.

(2) Deployment. Installation for the non-specialist is way more complicated than it should be. There should be a Windows .exe installer and a Mac .dmg installer for the whole ball of wax -- R plus JGR plus Deducer plus launcher. I never could get the launcher for Windows to work.

Don't get me wrong. I am switching because R/JGR/Deducer is a great system, once you know how to use it. And I would not have made the switch without your outstanding support. I hope these weaknesses will eventually be corrected so others will make the switch as well.

Best regards,
Stan

Tom Hopper

May 11, 2013, 3:56:15 AM
to ded...@googlegroups.com
Stan,

I completely agree with your points. The steep learning curve is R's biggest weakness.

Still, there are several good manuals and sources of help for R. Though I'm not in a position to teach statistics or R (not professionally, anyway), I think that "An Introduction to R" by Venables, Smith, and the R Core Team, available as HTML and PDF in the "Manuals" section of r-project.org (also available in print), would address your questions about how to form statistical models.

There is a whole section on r-project.org listing books about R. Some are directed toward the non-specialist, including the O'Reilly "R Cookbook" and "R in a Nutshell." I've liked Maindonald's "Data Analysis and Graphics Using R" (though Deducer and ggplot2 have rendered it largely irrelevant), and I've found the books from Springer pretty good, including Wickham's "ggplot2" and Dalgaard's "Introductory Statistics with R."

There are a number of reasonably good, free PDF books or booklets out on the web on how to use R for various analysis techniques such as fitting distributions, time series analysis, etc.

The R Commander (Rcmdr) package (http://www.rcommander.com) and related packages (which continue to add capabilities) provide a good approximation of the Minitab interface for R. Designed for introductory statistics classes, it greatly shortens the learning curve once you have it installed. Though not nearly as good as Deducer for exploratory data analysis, I've often thought that it would be a good alternative for my engineering colleagues who have neither the background nor the time to really learn statistics and R.

And last but certainly not least is the R-Help email list (https://stat.ethz.ch/mailman/listinfo/r-help), the archives of which can be searched and are often quite helpful for such questions.

Good luck,

Tom

