Rattle on H2O or Spark

ramkumar nimmakayala

Oct 19, 2016, 8:47:37 AM
to rattle-users
Just a basic question: is there a way to use Hadoop or Spark as the backend and run Rattle on top of it today?

Graham Williams

Oct 19, 2016, 7:30:45 PM
to rattle-users
Not yet, as such. If R is installed on the instance you could run Rattle there, but it would not make use of the model functions specific to Spark/Hadoop.

It is being looked at.

One approach that comes very close is to use the RevoScaleR (now Microsoft R) functions. Initial support for tree and forest models is already in Rattle for the local compute context. RevoScaleR allows the compute context to be changed to Hadoop or Spark with a couple of lines of code; the remaining code stays as it is and runs unchanged on the remote compute context, whether that is a Hadoop or a Spark server. So you can run Rattle on your laptop and target the computation at Hadoop/Spark.
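
A minimal sketch of that compute-context switch, assuming Microsoft R Server with the RevoScaleR package installed (it will not run on CRAN R alone). The calls rxSetComputeContext(), RxSpark(), RxHdfsFileSystem(), RxTextData() and rxDTree() are RevoScaleR functions; the formula, the HDFS path and the helper fit_tree() are hypothetical, and RxSpark() is called with its defaults on the assumption that the session runs on a cluster edge node. One caveat the sketch makes visible: the modelling call stays the same, but the data source has to point at data the cluster can reach.

```r
# Sketch only: needs Microsoft R Server / RevoScaleR, not plain CRAN R.
library(RevoScaleR)

# Hypothetical formula; the column names are assumed to exist in both data sets.
form <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width

# The modelling code itself: identical for every compute context.
fit_tree <- function(data) rxDTree(form, data = data)

# 1. Local compute context: the data is an ordinary in-memory data frame.
rxSetComputeContext("local")
model_local <- fit_tree(iris)

# 2. Spark compute context: these are the "couple of lines" that change.
rxSetComputeContext(RxSpark())

# The data must live where the cluster can read it, for example on HDFS.
# "/share/iris.csv" is a hypothetical path, not a real location.
remote_data <- RxTextData("/share/iris.csv", fileSystem = RxHdfsFileSystem())
model_spark <- fit_tree(remote_data)

# Switch back to local execution when done.
rxSetComputeContext("local")
```

Using RxHadoopMR() in place of RxSpark() would target Hadoop MapReduce instead, under the same assumptions.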

For general open source support, contributions of code, even sample code based on Rattle's Log tab, are always useful. For example, how would you change the code exposed in the Log tab to use Hadoop or Spark? That can then be incorporated into Rattle's code generator.

Looking forward to any contributions!

Regards,
Graham

--
You received this message because you are subscribed to the Google Groups "rattle-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rattle-users+unsubscribe@googlegroups.com.
To post to this group, send email to rattle...@googlegroups.com.
Visit this group at https://groups.google.com/group/rattle-users.
For more options, visit https://groups.google.com/d/optout.

ramkumar nimmakayala

May 21, 2017, 4:53:28 PM
to rattle-users

Thanks, Graham, for your response. With packages like sparklyr and rsparkling we can already take advantage of H2O and Spark. I am not a developer myself, but I would love to see Rattle running on top of a compute framework like Spark.

I agree with your point about the code output in the Log tab. However, I am struggling to get Rattle to read a dataset with 500 columns and a few thousand rows; it gets stuck. I am wondering whether there is something I need to do with respect to memory.

Graham Williams

May 27, 2017, 11:37:06 AM
to rattle-users
The code that handles very wide data is not as efficient as it should be, which is why we see issues with quite wide datasets. It is something to work on. Thanks for the information.
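
A rough way to check whether memory is the limiting factor is to build a synthetic data frame of the reported shape in base R and measure it. This is only a sketch: the row count of 3,000 stands in for "a few thousand", all columns are assumed to be numeric, and the variable names are made up for illustration. Character or factor columns would give different numbers.

```r
# Synthetic stand-in for the reported data: 500 numeric columns and
# 3,000 rows (the row count is an assumption for "a few thousand").
n_rows <- 3000
n_cols <- 500
set.seed(42)
wide <- as.data.frame(matrix(rnorm(n_rows * n_cols),
                             nrow = n_rows, ncol = n_cols))

# Raw payload: one 8-byte double per cell.
payload_mb <- n_rows * n_cols * 8 / 1024^2

# Size R actually reports for the data frame, including per-column overhead.
actual_mb <- as.numeric(object.size(wide)) / 1024^2

cat(sprintf("columns: %d, rows: %d\n", ncol(wide), nrow(wide)))
cat(sprintf("raw numeric payload: %.1f MB\n", payload_mb))
cat(sprintf("in-memory data frame: %.1f MB\n", actual_mb))
```

The raw payload works out to about 11.4 MB, so for purely numeric data of this shape the data itself is small, which fits the explanation that the slowdown comes from how wide data is processed rather than from its size in memory.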

Regards,
Graham