[R] variable importance in Random Forest


Changbin Du

Apr 28, 2010, 8:57:46 PM
to andy...@merck.com, r-h...@r-project.org
Hi, dear Andy,

I ran randomForest in R and got the variable importance results shown
below.

What is the meaning of MeanDecreaseAccuracy and MeanDecreaseGini?

I found that they are raw values and are not scaled to 1, right?

Which column is most similar to the variable rel.influence in boosting?

Thanks so much!



> fit$importance
                         0           1 MeanDecreaseAccuracy MeanDecreaseGini
CT            0.0022352025 0.003829344         0.0030311246         5.184427
DP            0.0069461974 0.016387520         0.0116650960        15.440624
DY            0.0141150255 0.026031690         0.0200603555        19.901538
FC            0.0024279188 0.005158945         0.0037948155         5.527078
NE            0.0352705133 0.070503233         0.0527718526        46.278504
NW            0.0256059127 0.034433862         0.0299981496        26.440402
QT            0.0037228694 0.008181262         0.0059571350         9.308828
SK            0.0048187014 0.008895719         0.0068609174        10.662129
TA            0.0042134249 0.011746533         0.0079851331        12.878367
WC            0.0177155268 0.014981440         0.0163366320        14.240232
WD            0.0232972311 0.034083695         0.0286702065        25.335182
WG            0.0328547215 0.053142508         0.0429480441        30.663749
WW            0.0093983693 0.006377956         0.0078681474         7.250101
YG            0.0051691399 0.007338639         0.0062618144        11.084111
num_cell      0.0061355526 0.005373049         0.0057463613         5.060577
num_genes     0.0364878788 0.044544488         0.0404558096        32.745034
position      0.0025375614 0.011566496         0.0070255302        10.070505
freq_hypo     0.0008723241 0.001757602         0.0013181209         1.930695
freq_intra    0.0009449492 0.001943090         0.0014431451         2.611950
log_hypo      0.0004514713 0.001366561         0.0009096419         1.736749
acid_per      0.0125815445 0.023360179         0.0179634375        21.131681
base_per      0.0070077737 0.012196570         0.0096129124        13.675893
charge_per    0.0095668425 0.024125997         0.0168345956        20.969665
hydrophob_per 0.0185736697 0.031941513         0.0252200036        25.994903
polar_per     0.0169369327 0.023633413         0.0202776247        20.890415
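
On the scaling question: the values stored in fit$importance are raw, and the importance() extractor in randomForest has a scale argument that divides the permutation measure by its standard error. In gbm, summary() reports relative influence normalized to sum to 100 by default. A minimal base-R sketch of putting MeanDecreaseGini on that same 0-100 footing, using five values copied from the table above (the names gini and rel_gini are made up for illustration). Rescaling only changes the units; it does not by itself make the two measures equivalent.

```r
# Five MeanDecreaseGini values copied from the fit$importance output above.
# With a real fit one would start from fit$importance[, "MeanDecreaseGini"].
gini <- c(NE = 46.278504, num_genes = 32.745034, WG = 30.663749,
          NW = 26.440402, hydrophob_per = 25.994903)

# Rescale so that the selected values sum to 100 (percent of this subset's total)
rel_gini <- 100 * gini / sum(gini)

# Order from most to least important and show one decimal place
rel_gini <- sort(rel_gini, decreasing = TRUE)
print(round(rel_gini, 1))
```

The ranking of variables is unchanged by this rescaling, so it is mainly useful for reading the numbers as shares of a total.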

--
Sincerely,
Changbin
--


______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Changbin Du

Apr 29, 2010, 11:45:27 AM
to Liaw, Andy, r-h...@r-project.org
Hi, Andy,

Thanks so much for your reply!

The first page of the paper "Classification and regression by
randomForest" says that random forest estimates the importance of a
variable by looking at how much the prediction error increases when the
variable is permuted.

In the help document of randomForest, variable importance is measured as
the total decrease in node impurities. Should it be the total *increase*
in node impurities?

If it is the total decrease in node impurities, does that contradict the
paper?

Also, in fit$importance, what do the first two columns mean?

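
The two statements describe different measures, so they need not conflict: the permutation measure looks at how much the error goes up when a variable is scrambled, while the impurity measure adds up how much each split on a variable brings node impurity down. For classification, the randomForest help page states that node impurity is measured by the Gini index. A minimal base-R sketch of the impurity decrease from a single split, on made-up toy data (gini_impurity, x and y are hypothetical names, not part of the package):

```r
# Gini impurity of a vector of class labels: 1 - sum over classes of p_k^2
gini_impurity <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Toy parent node: four observations of each class (made-up data)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
x <- c(1, 2, 3, 4, 5, 6, 7, 3.5)   # hypothetical predictor

# Split the node at x <= 4.5
left  <- y[x <= 4.5]
right <- y[x >  4.5]

# Impurity of the children, weighted by the share of observations in each
n <- length(y)
child <- (length(left) / n) * gini_impurity(left) +
         (length(right) / n) * gini_impurity(right)

# A useful split makes the children purer than the parent,
# so the decrease is positive
decrease <- gini_impurity(y) - child
print(decrease)
```

A variable that produces large decreases across many splits ends up with a large MeanDecreaseGini, which points the same way as a large increase in error under permutation.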
On Thu, Apr 29, 2010 at 5:22 AM, Liaw, Andy <andy...@merck.com> wrote:

> Please see the "Detail" section of the help page for the importance()
> function in the randomForest package, and let me know which part of it you
> do not understand.
>
> For boosting, you need to read its documentation and decide for yourself if
> its importance measure is at all comparable to the two in RF.
>
> Andy
>
> ------------------------------
> *From:* Changbin Du [mailto:chan...@gmail.com]
> *Sent:* Wednesday, April 28, 2010 8:58 PM
> *To:* Liaw, Andy
> *Cc:* r-h...@r-project.org
> *Subject:* variable importance in Random Forest