H2O and Julia


Rogerio Normand

Oct 19, 2015, 8:22:48 AM
to H2O Open Source Scalable Machine Learning - h2ostream, Arno Candel, Tom Kraljevic, Cliff Click
Hi there,

This weekend I discovered Julia (http://julialang.org) and became interested in exploring it a bit. It seems to provide speed for basic data preparation, and it should then be possible to call Java to run H2O.

I would like to know if you have had any experience with it. If so, do you have any tips or material?

Kind Regards,


--
Rogério Normand
YNRo Advisory Services

Cliff Click

Oct 19, 2015, 11:11:05 AM
to Rogerio Normand, H2O Open Source Scalable Machine Learning - h2ostream, Arno Candel, Tom Kraljevic
I've not used Julia personally, although I've read about it extensively.

Can I ask what your data prep needs are? We're on a big push to work out data prep with H2O using R, Python, Scala or Java, and I'm curious to see if what we're doing meshes well with what you need to get done.

Thanks
Cliff

Rogerio Normand

Oct 19, 2015, 2:08:31 PM
to cli...@acm.org, H2O Open Source Scalable Machine Learning - h2ostream, Arno Candel, Tom Kraljevic
Hi Cliff,

Thank you for your email.

I am working with EEG (electroencephalography) data. The data need to be read from the original files, sliced appropriately for my studies, and then labeled, with extra calculations added, etc. All of those steps come before the H2O algorithm.

As an example, for one subject I can get a file with 150k rows and 124 columns (about 300 seconds for 300 images seen by that subject). Then I have to add the label for the activity (which image) that was being seen at each instant, and remove the rows that are not relevant to the study for each image/block. After that come some math procedures on the data to better show the aspects I want to identify (in one study I was trying to distinguish alcoholics from non-alcoholics based on their EEG signals). After all that, I can feed the data into the H2O environment.
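To make the labeling and filtering steps concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions, not the actual pipeline: the block tuples `(start_second, end_second, image_label)`, the function names, and the idea that each row is one sample at a fixed sampling rate are all hypothetical. (150k rows over about 300 seconds would correspond to roughly 500 rows per second.)

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical block description: (start_second, end_second, image_label).
Block = Tuple[float, float, str]


def label_for_time(t: float, blocks: Sequence[Block]) -> Optional[str]:
    """Return the label of the block containing time t, or None if no block does."""
    for start, end, label in blocks:
        if start <= t < end:
            return label
    return None


def label_and_filter(
    rows: Sequence[Sequence[float]],
    sample_rate_hz: float,
    blocks: Sequence[Block],
) -> List[Tuple[List[float], str]]:
    """Attach an image label to each row and drop rows outside every block.

    Assumes row i was recorded at time i / sample_rate_hz seconds.
    """
    labeled = []
    for i, row in enumerate(rows):
        t = i / sample_rate_hz  # time of this sample in seconds
        label = label_for_time(t, blocks)
        if label is not None:  # rows between blocks are not relevant
            labeled.append((list(row), label))
    return labeled
```

For example, with ten one-channel rows sampled at 1 Hz and blocks `[(0.0, 4.0, "image_1"), (6.0, 10.0, "image_2")]`, the rows at seconds 4 and 5 are dropped and the rest carry their image label. The "math procedures" step would then operate on the labeled rows.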

As you can see, the preparation is quite specific in my case. My expectation with Julia is to improve the speed of preparing the data (many times I prefer to create specific datasets on the fly to avoid saving too many big files). Julia also runs in parallel and is multi-platform, which I will certainly need in the near future.

I was considering adding C or even Fortran functions to improve speed, but it seems Julia could be a better approach in terms of speed and easier syntax. In your opinion, could it be a good option? If so, I hope it would be possible to call H2O as Java from inside Julia.

Let me know your thoughts.

Kind Regards,

RN

Erin LeDell

Oct 19, 2015, 3:04:24 PM
to Rogerio Normand, cli...@acm.org, H2O Open Source Scalable Machine Learning - h2ostream, Arno Candel, Tom Kraljevic
Hi Rogerio,
I am a big fan of Julia, but although it's used quite a bit in academia, it's not too popular in industry yet. Therefore, it might be a while before we are able to put resources toward creating a Julia H2O API. However, if that is something you are interested in, you could give it a try. You can also write your processed data from Julia to disk and then read it into H2O when you are ready to switch over to modeling.

I saw these posts about working with EEG data on Kaggle the other day; they might interest you: http://blog.kaggle.com/tag/eeg-data/
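As a rough sketch of the write-to-disk handoff, shown in Python rather than Julia: the prepared rows are written to a CSV file, and a separate step loads that file into H2O. The helper names `write_csv` and `load_into_h2o`, the column names, and the file path are hypothetical. The loading step assumes the `h2o` Python package is installed and that a local H2O cluster can be started; the same file could equally be read from R or the H2O web UI.

```python
import csv
from typing import Iterable, Sequence


def write_csv(path: str, header: Sequence[str], rows: Iterable[Sequence]) -> None:
    """Write prepared rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def load_into_h2o(path: str):
    """Load the CSV into an H2O frame.

    Assumption: the h2o Python package is installed and a local
    cluster can be started on this machine.
    """
    import h2o  # imported lazily so the CSV step works without H2O

    h2o.init()  # start or connect to a local H2O cluster
    return h2o.import_file(path)
```

A typical use would be `write_csv("eeg_prepared.csv", ["ch1", "label"], rows)` at the end of data preparation, followed by `frame = load_into_h2o("eeg_prepared.csv")` when switching over to modeling.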

Best,
Erin
-- 
Erin LeDell Ph.D.
Statistician & Machine Learning Scientist | H2O.ai

Rogerio Normand

Oct 20, 2015, 1:53:31 AM
to Erin LeDell, cli...@acm.org, H2O Open Source Scalable Machine Learning - h2ostream, Arno Candel, Tom Kraljevic
Hi Erin,

Thank you for your email.

I fully understand H2O's position regarding Julia (and I agree). My expectation, assuming Julia speeds up the data preparation process, is to call H2O with Java instructions from inside Julia.

Thank you for the Kaggle tip. I will check the website.

Kind Regards,

RN

