[R] LDA on pre-assigned training and testing data sets

Peter Flom

Jun 25, 2008, 12:21:46 PM
to r-h...@r-project.org
Dear r-help

I am trying to run LDA on a training data set and test it on another data set with the same variables. I found examples that use cross-validation, and examples where the training and testing sets are created with sample(), but none where the two sets are pre-assigned.

Here is what I tried:

# FIRST SET UP A DATAFRAME WITH ALL THE DATA AND CREATE NEW VARIABLES

traintest1 <- arnaudnognod1[arnaudnognod1$DISC_USE1 == 1.01|arnaudnognod1$DISC_USE1 == 1.03|arnaudnognod1$DISC_USE1 == 1.04
|arnaudnognod1$DISC_USE1 == 1.02|arnaudnognod1$DISC_USE1 == 1.05|arnaudnognod1$DISC_USE1 == 1.06,]
traintest1$normal <- traintest1$DISC_USE1 == 1.01|traintest1$DISC_USE1 == 1.03|traintest1$DISC_USE1 == 1.04
traintest1$mafelev <- apply(traintest1[,1:40], 1, FUN = mean)
traintest1$mafscatter <- apply(traintest1[,1:40], 1, FUN = sd)

# NEXT CREATE TRAINING AND TESTING DATAFRAMES

train <- traintest1[traintest1$DISC_USE1 == 1.01|traintest1$DISC_USE1 == 1.02,]
test <- traintest1[traintest1$DISC_USE1 > 1.02,]

# NOW, TRAIN HAS 400 ROWS, TEST HAS 396 ROWS, AND TRAINTEST1 HAS 796 ROWS, EACH HAS 615 COLUMNS, AS EXPECTED

# RUN DISCRIM ON TRAINING DATA

mafdisc <- lda(normal~mafelev + mafscatter, data = train)

#mafdisc$counts IS 210 AND 190, AS EXPECTED

#FINALLY, TEST IT ON THE TEST DATA

mafdiscpred <- predict(mafdisc, data = test)

# BUT mafdiscpred$class HAS LENGTH 400, NOT THE 396 I EXPECTED.

any help appreciated

thanks

Peter

Peter L. Flom, PhD
Brainscope, Inc.
212 263 7863 (MTW)
212 845 4485 (Th)
917 488 7176 (F)

Michael Conklin

Jun 25, 2008, 12:37:54 PM
to Peter Flom, r-h...@r-project.org
I think this line

mafdiscpred <- predict(mafdisc, data = test)

needs to be

mafdiscpred <- predict(mafdisc, newdata = test)
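
The reason, as far as I can tell, is that the predict() method for lda objects in MASS takes the new observations through its newdata argument. An argument named data is not one of its formal parameters, so it is absorbed by ... and silently ignored, and the method falls back to the data used for fitting. That would explain the 400 predictions. Below is a minimal sketch of the behaviour on the built-in iris data with an arbitrary pre-assigned split; the names train_demo, test_demo, fit, pred_wrong, and pred_right are illustrative only and are not from the original post.

```r
library(MASS)  # provides lda() and its predict() method

# Illustrative pre-assigned split of the built-in iris data (an assumption,
# not the poster's data): odd rows for training, even rows 2..100 for testing.
train_demo <- iris[seq(1, nrow(iris), by = 2), ]  # 75 rows
test_demo  <- iris[seq(2, 100, by = 2), ]         # 50 rows

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = train_demo)

# 'data' is not a formal argument of predict() here, so it is ignored and
# the predictions are made for the training rows.
pred_wrong <- predict(fit, data = test_demo)
length(pred_wrong$class)  # 75, the size of the training set

# 'newdata' is the argument that supplies the observations to classify.
pred_right <- predict(fit, newdata = test_demo)
length(pred_right$class)  # 50, the size of the test set

# Cross-tabulate predicted against actual classes on the test set.
table(predicted = pred_right$class, actual = test_demo$Species)
```

With the corrected call, the length of the predicted class vector should match the number of rows in the test data frame, which is a quick check worth running before looking at the confusion table.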


Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170
Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile: 612.201.8978
Michael...@markettools.com
MarketTools® http://www.markettools.com
