--> Part of the Dataset:
| 0.131 | 0.297 | 0.633 | 0.492 | 0.704 | 0.747 | 0.491 | 0.698 | 0.738 | 0.481 | 0.771 | 0.532 |
| 0.311 | 0.496 | 0.001 | 0 | 0.638 | 0.009 | 0.991 | 0.44 | 0.414 | 0.009 | 0.021 | 0.999 |
| 0.773 | 0.01 | 0.032 | 0.01 | 0.006 | 0.042 | 0.988 | 0.993 | 1 | 0.549 | 0.577 | 0.99 |
| 0.719 | 0.534 | 0.028 | 0.008 | 0 | 0.569 | 0.983 | 0.985 | 0.025 | 0.022 | 0.6 | 0.374 |
--> Code:
patient.train<-h2o.importFile("C:\\Users\\patient\\patient_trainingset.csv")
patient.test<-h2o.importFile("C:\\Users\\patient\\patient_testset.csv")
dim(patient.train)
#[1] 83 16384
dim(patient.test)
#[1] 81 16384
y.dep<-16384
x.indep<-1:16383
system.time(dlearning.model3<-h2o.deeplearning(y=y.dep,x=x.indep,training_frame=patient.train,activation="RectifierWithDropout",hidden=c(1200,50),epochs=100))
#   user  system elapsed
#   6.42    0.25  668.32
h2o.performance(dlearning.model3)
# ** Reported on training data. **
# Description: Metrics reported on full training frame
# MSE: 0.01964181
# R2 : 0.851305
# Mean Residual Deviance : 0.01964181
predict.dl2<-as.data.frame(h2o.predict(dlearning.model3,patient.test))
submi_dlearning3<-data.frame(Predicted_Mort=predict.dl2$predict)
write.csv(submi_dlearning3,file="submi_dlearning3.csv",row.names=F)
--> Part of the current output:
| Predicted_Mort |
| 0.131749662 |
| 0.14698337 |
| 0.155728288 |
| 0.130509461 |
| 0.130420171 |
| 0.133914652 |
| 0.134027134 |
| 0.124258962 |
| 0.136126049 |
| 0.136019254 |
| 0.122301849 |
| . . . |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| . . . |
--
You received this message because you are subscribed to a topic in the Google Groups "H2O Open Source Scalable Machine Learning - h2ostream" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/h2ostream/zazcc2rtvwA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to h2ostream+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
library(h2o)
localH2O=h2o.init(nthreads=-1)
> s_train<-h2o.importFile("C:\\Users\\Desktop\\snp_trainingset_70_13.csv")
> s_test<-h2o.importFile("C:\\Users\\Desktop\\snp_testset_69_12.csv")
> dim(s_train)
[1] 83 16384
> dim(s_test)
[1] 81 16384
> s_train[,16384]=as.factor(s_train[,16384])
> x.indep<-1:16383
# now, y=16384
> system.time(dlearning.model6<-h2o.deeplearning(y=16384,x=x.indep,training_frame=s_train,activation="RectifierWithDropout",hidden=c(1200,50),epochs=100))
|============================================| 100%
user system elapsed
3.18 0.04 98.04
> h2o.performance(dlearning.model6)
H2OBinomialMetrics: deeplearning
** Reported on training data. **
Description: Metrics reported on full training frame
MSE: 6.163486e-16
R^2: 1
LogLoss: 3.458327e-09
AUC: 1
Gini: 1
Confusion Matrix for F1-optimal threshold:
        0  1    Error   Rate
0      70  0 0.000000  =0/70
1       0 13 0.000000  =0/13
Totals 70 13 0.000000  =0/83
Maximum Metrics: Maximum metrics at their respective thresholds
                      metric threshold    value idx
1                     max f1  1.000000 1.000000   1
2                     max f2  1.000000 1.000000   1
3               max f0point5  1.000000 1.000000   1
4               max accuracy  1.000000 1.000000   1
5              max precision  1.000000 1.000000   0
6                 max recall  1.000000 1.000000   1
7            max specificity  1.000000 1.000000   0
8           max absolute_MCC  1.000000 1.000000   1
9 max min_per_class_accuracy  1.000000 1.000000   1
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
--> And here is where I get stuck: the 'h2o.predict' function:
> pred6<-as.data.frame(h2o.predict(dlearning.model6,s_test))
Error message:
ERROR: Unexpected HTTP Status code: 404 Not Found (url = http://localhost:54321/4/Predictions/models/DeepLearning_model_R_1464053711040_1/frames/RTMP_sid_8367_4)
water.exceptions.H2OKeyNotFoundArgumentException
[1] "water.api.ModelMetricsHandler.predict2(ModelMetricsHandler.java:236)"
[2] "sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
[3] "sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)"
[4] "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
[5] "java.lang.reflect.Method.invoke(Method.java:606)"
[6] "water.api.Handler.handle(Handler.java:62)"
[7] "water.api.RequestServer.handle(RequestServer.java:653)"
[8] "water.api.RequestServer.serve(RequestServer.java:594)"
[9] "water.JettyHTTPD$H2oDefaultServlet.doGeneric(JettyHTTPD.java:616)"
[10] "water.JettyHTTPD$H2oDefaultServlet.doPost(JettyHTTPD.java:564)"
[11] "javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[12] "javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[13] "org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Object 'RTMP_sid_8367_4' not found in function: predict for argument: frame
--
Hello Lauren,
Yes, this is what I expected: the 16384th column in the trainingset is continuous while the response column in the testset should be categorical.
Also, I do get an error while executing the command s_test[,16384]=as.factor(s_test[,16384]). The error is as follows:
ERROR: Unexpected HTTP Status code: 400 Bad Request (url = http://localhost:54321/99/Rapids)
java.lang.IllegalArgumentException
[1] "water.rapids.ASTColSlice.col_select(ASTColSlice.java:39)"
[2] "water.rapids.ASTColSlice.apply(ASTColSlice.java:25)"
[3] "water.rapids.ASTExec.exec(ASTExec.java:46)"
[4] "water.rapids.ASTAsFactor.apply(ASTStrList.java:104)"
[5] "water.rapids.ASTExec.exec(ASTExec.java:46)"
[6] "water.rapids.ASTAppend.apply(ASTAssign.java:231)"
[7] "water.rapids.ASTAppend.apply(ASTAssign.java:225)"
[8] "water.rapids.ASTExec.exec(ASTExec.java:46)"
[9] "water.rapids.ASTTmpAssign.apply(ASTAssign.java:260)"
[10] "water.rapids.ASTTmpAssign.apply(ASTAssign.java:253)"
[11] "water.rapids.ASTExec.exec(ASTExec.java:46)"
[12] "water.rapids.Session.exec(Session.java:56)"
[13] "water.rapids.Exec.exec(Exec.java:63)"
[14] "water.api.RapidsHandler.exec(RapidsHandler.java:25)"
[15] "sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
[16] "sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)"
[17] "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
[18] "java.lang.reflect.Method.invoke(Method.java:606)"
[19] "water.api.Handler.handle(Handler.java:62)"
[20] "water.api.RequestServer.handle(RequestServer.java:653)"
[21] "water.api.RequestServer.serve(RequestServer.java:594)"
[22] "water.JettyHTTPD$H2oDefaultServlet.doGeneric(JettyHTTPD.java:616)"
[23] "water.JettyHTTPD$H2oDefaultServlet.doPost(JettyHTTPD.java:564)"
[24] "javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[25] "javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[26] "org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Column must be an integer from 0 to 16382
--> How do we work around this?
Thank you so much.
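A note on reading that message: the range it prints appears to be zero-based, so "from 0 to 16382" would correspond to a frame of 16383 columns at the moment of the call, which would put column 16384 out of range. That reading is an assumption drawn from the message text, not something confirmed in the thread, so it may be worth re-running dim(s_test) right before the conversion. A minimal sketch of such a guard follows; factor_column_if_present is a hypothetical helper, and toy is a made-up stand-in data frame used only so the example runs without an H2O cluster.

```r
# Hypothetical helper: convert column `j` of `frame` to a factor,
# but stop with a readable message if that column does not exist.
factor_column_if_present <- function(frame, j) {
  n <- ncol(frame)
  if (j < 1 || j > n) {
    stop(sprintf("column %d requested, but the frame has only %d columns", j, n))
  }
  frame[, j] <- as.factor(frame[, j])
  frame
}

# Made-up stand-in for a real frame: one predictor, one 0/1 response.
toy <- data.frame(a = c(0.1, 0.7), b = c(0, 1))
toy <- factor_column_if_present(toy, 2)

# Asking for a column past the end now fails with a clear message
# instead of a server-side stack trace.
out_of_range <- try(factor_column_if_present(toy, 3), silent = TRUE)
```

The same assignment form, frame[, j] <- as.factor(frame[, j]), is the one already used on s_train earlier in the thread, so the helper should carry over to an H2O frame, though that has not been tried here.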
--
Hi Lauren,
The last column in the training set has been made categorical (through the command that Darren suggested: s_train[,16384]=as.factor(s_train[,16384])). The last column in the training set contains only 0/1, as it denotes 'mortality', while the first 16383 columns contain continuous values.
The test set has 16384 columns filled with continuous values. There is no 0/1 in any column (because we need to predict it to be 0/1, i.e. we need to predict the 'mortality').
My original intention when starting out on the problem was summed up well by Darren in the previous reply: " * Turn the continuous variable into 2 categories, then make a binary classification model."
Though now, since I'm not able to move forward, I think I'd be okay with adopting his other approach: "Though the thread subject says "binary classification". I wonder if it is expected 0 to 0.5 are supposed to be 0, and 0.5 to 1.0 are supposed to be 1?" and " * Make a regression model, then take the predicted continuous value and convert that to 0 or 1."
So, to tweak my question:
1) How can I correctly load the test and train files? (train file: the last column (16384th) is 0/1; test file: all 16384 columns have continuous values; and I do not wish to delete any columns in either file)
2) How do I set a threshold for the prediction results from the regression model, so that 0 to 0.5 are supposed to be 0, and 0.5 to 1.0 are supposed to be 1?
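For question 2, one possible approach is to apply the cutoff in plain R after the predictions have been pulled into a data frame. The sketch below is only an illustration: to_binary is a hypothetical helper, the choice to send a value of exactly 0.5 to 1 is an assumption (the question leaves that boundary open), and the input values are the first few predictions quoted earlier in the thread.

```r
# First few regression predictions quoted earlier in the thread.
predicted_mort <- c(0.131749662, 0.14698337, 0.155728288,
                    0.130509461, 0.130420171, 0.133914652)

# Hypothetical helper: values at or above `cutoff` become 1, the rest 0.
# Treating exactly 0.5 as 1 is an assumption, not something the thread settles.
to_binary <- function(p, cutoff = 0.5) {
  as.integer(p >= cutoff)
}

binary_mort <- to_binary(predicted_mort)

# Same shape as the submission data frame built in the original code.
submission <- data.frame(Predicted_Mort = binary_mort)
```

With the objects from the original code this would be to_binary(predict.dl2$predict). Note that every prediction quoted in the thread sits near 0.13, so a 0.5 cutoff maps all of them to 0.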