Improving multiple regression models: log transformation and leverage points


Nisha Arora

Mar 22, 2016, 4:19:49 AM
to dataanalys...@googlegroups.com
Respected Neeraj Sir & group members,

Please bear with me, this is going to be a little lengthy.

With 1 response and 6-7 predictors, I have been trying different models to improve my prediction. I managed to arrive at a model with an adjusted R-squared of around 56%.

I looked into the issues and found three leverage points in the data (and thankfully no outliers) that were badly affecting my prediction accuracy.

When I removed those observations one by one, my adjusted R-squared increased to around 62%.
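A minimal sketch of how leverage can be checked, written in Python with NumPy on made-up data. The thread does not show the poster's data or R code, so the function name `hat_values`, the simulated predictors, and the 2p/n cutoff (a common rule of thumb, not a fixed standard) are all assumptions for illustration.

```python
import numpy as np

def hat_values(X):
    """Return the leverage h_ii of each row: the diagonal of the hat matrix
    H = A (A'A)^-1 A', where A is X with an intercept column added."""
    A = np.column_stack([np.ones(len(X)), X])
    Q, _ = np.linalg.qr(A)           # thin QR, so H = Q Q'
    return np.sum(Q ** 2, axis=1)    # row-wise squared norms = diag(H)

# Simulated predictors (assumption: 50 rows, 3 predictors)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [8.0, -8.0, 8.0]              # one row far from the rest in predictor space

h = hat_values(X)
p = X.shape[1] + 1                   # number of coefficients, including the intercept
threshold = 2 * p / len(X)           # rule-of-thumb cutoff (assumption)
flagged = np.flatnonzero(h > threshold)
print("rows with high leverage:", flagged)
```

Leverage depends only on the predictors, not on the response, so a flagged row is unusual in its predictor values but is not necessarily badly fitted. That is worth keeping in mind before deciding to drop it.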

Then, to get an even better model, I looked carefully at the plots and tried a log transformation on one of my predictors. The results changed dramatically (adjusted R-squared of 71%).
I feel good about it, but the output says "(42 observations deleted due to missingness)".

Probably R (I love R for regression modelling due to its flexibility) removes observations with nearly zero values of that variable, as log is not defined at zero.

To overcome this issue, I applied log(x+x/2) or log(x+c), c being a small positive number. This further improved model accuracy and reduced the number of missing observations [but there are still some missing observations].
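A small sketch of why a shifted log can reduce, but not always eliminate, unusable rows. It is written in Python with NumPy on made-up values; the array `x` and the shift `c = 1.0` are assumptions, and R may treat non-finite values differently from what is shown here.

```python
import numpy as np

# Made-up predictor values containing a zero and a negative number (assumption)
x = np.array([0.0, 0.5, 2.0, -1.0, 10.0])

# Plain log: log(0) gives -inf and log(negative) gives nan
with np.errstate(divide="ignore", invalid="ignore"):
    plain = np.log(x)

# Shifted log with a hypothetical constant c; values at or below -c still fail
c = 1.0
with np.errstate(divide="ignore", invalid="ignore"):
    shifted = np.log(x + c)

usable_plain = int(np.isfinite(plain).sum())
usable_shifted = int(np.isfinite(shifted).sum())
print(usable_plain, "usable rows with log(x)")
print(usable_shifted, "usable rows with log(x + c)")
```

In this toy example the shift rescues the zero but not the value equal to -c, which is one possible reason some observations could remain missing after a log(x+c) transformation.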

Now my concerns are:
  • Should I report the regression model with the log transformation (around 75% adjusted R2 but ~20 missing observations) or the previous model without the log transformation (62% R2, no missing observations)?
  • Would you suggest looking for another transformation, like beta, gamma or Poisson, for that predictor?
  • Is it OK to remove the three observations with high leverage, which affect the model badly?
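One way to make the first comparison fairer is to fit both models on the same rows, since adjusted R-squared values computed on different subsets of the data are not directly comparable. The sketch below uses Python with NumPy on simulated data; the helper `adjusted_r2`, the variables `x1`, `x2`, `y`, and the data-generating formula are all assumptions, not the poster's model.

```python
import numpy as np

def adjusted_r2(y, X):
    """Fit OLS with an intercept by least squares and return adjusted R-squared."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Simulated data (assumption): x1 contains exact zeros, so log(x1) is undefined there
rng = np.random.default_rng(1)
n = 200
x1 = np.clip(rng.uniform(-0.5, 5.0, n), 0.0, None)
x2 = rng.normal(size=n)
y = 2.0 * np.log(x1 + 0.1) + x2 + rng.normal(scale=0.5, size=n)

# Restrict BOTH models to the rows where the log is defined
keep = x1 > 0
X_raw = np.column_stack([x1[keep], x2[keep]])
X_log = np.column_stack([np.log(x1[keep]), x2[keep]])

adj_raw = adjusted_r2(y[keep], X_raw)
adj_log = adjusted_r2(y[keep], X_log)
print("rows used:", int(keep.sum()))
print("adjusted R2, raw predictor:", round(adj_raw, 3))
print("adjusted R2, log predictor:", round(adj_log, 3))
```

If the log model still looks better on the common rows, the gain is less likely to be a side effect of the dropped observations.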
Lastly, I found a guide to reporting regression results in APA style here; I hope it helps others too: http://www.adart.myzen.co.uk/reporting-multiple-regressions-in-apa-format-part-two/


Thanks for being patient, please provide your valuable insights.

Thanks,
Nisha Arora 


 

Neeraj Kaushik

Mar 22, 2016, 11:00:27 AM
to dataanalysistraining
Dear Nisha
Are you just experimenting with regression, transformations, etc., or is there a requirement for such things in your model?
Best wishes
Neeraj


Nisha Arora

Mar 23, 2016, 12:58:34 AM
to dataanalys...@googlegroups.com
Dear sir,
I am currently trying to improve my regression model for a paper in which one of my objectives is to build a model predicting academic grades on the basis of some variables.

Besides, I have been learning Machine Learning using R/Python for the last few months.

Thanks & Regards,
Nisha Arora 

Neeraj Kaushik

Apr 1, 2016, 4:51:37 AM
to dataanalysistraining
Dear Nisha

There is a difference between Data Analysis and Data Mining.
Data Analysis always works toward fixed objectives; on the basis of the number of variables and how these variables are measured, we decide which tool/technique to apply.
Data Mining, on the other hand, is exploratory, with no particular objective in mind.

What you are doing here seems like Data Mining.

Best wishes
Neeraj


Nisha Arora

Apr 1, 2016, 8:40:15 AM
to dataanalys...@googlegroups.com
Dear Sir, 
Thank you for your response. I agree that data analysis and data mining are different.
My recent interest in machine learning made me play with the model to improve it further.
This is my first paper that includes data analysis, as my area of research was operations research modeling.
That's why I am not sure where we can stop improving the model for the analysis part of a research paper. I understand that we don't need to use cross-validation techniques etc. for this purpose. Researchers generally don't use R, but it's just awesome.
Researchers generally prefer the stepwise method, but it is heavily criticized by statisticians, so I am not using it directly.
Sorry if this is going off track.

Neeraj Kaushik

Apr 4, 2016, 12:24:36 PM
to dataanalysistraining
Actually, it's good to know that you have learnt R.
If possible, please plan to take some introductory classes for beginners in R.

As regards your work, I would suggest working with regression only and not the other things (unless you have a theoretical basis to justify why you are doing them).

Best wishes
Neeraj

Nisha Arora

Apr 5, 2016, 1:02:24 AM
to dataanalys...@googlegroups.com
Thank you sir.

I would love to take a class/workshop for R. Please let me know if there is an opportunity.

Regards,
Nisha Arora 