Does the h2o.splitFrame function randomize, or not?


Ogukku

Apr 23, 2017, 10:10:52 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I'd like to get a final, complete answer about h2o.splitFrame(). In my experience, splitFrame seems to "cut" the data into consecutive partitions.

However the following two links suggest that the h2o.splitFrame function introduces randomization as part of the splitting process:

https://groups.google.com/forum/m/#!searchin/h2ostream/Splitframe/h2ostream/zDgBPCb4wkY

http://stackoverflow.com/q/43463178/6088414

Yet, the link below by Arno Candel suggests that splitFrame only "cuts" data into consecutive partitions:

https://stats.stackexchange.com/questions/168480/inverse-progression-for-training-validation-data-during-training-with-h2o

So which one is it?

Thanks!

Tom Kraljevic

Apr 23, 2017, 11:19:21 AM
to Ogukku, H2O Open Source Scalable Machine Learning - h2ostream

[ Final, complete answers can change in software! ]

It used to cut. Now it's a random row-by-row “coin flip”: each row gets its own runif value, and that value is compared against a threshold to decide which split the row lands in.
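To make the difference concrete, here is a minimal sketch in plain Python. It is not H2O's implementation; the function names cut_split and coin_flip_split and the toy row list are made up for illustration, and Python's random module stands in for runif.

```python
import random


def cut_split(rows, ratio):
    """Consecutive "cut": the first `ratio` fraction of rows, then the rest."""
    k = int(len(rows) * ratio)
    return rows[:k], rows[k:]


def coin_flip_split(rows, ratio, seed=None):
    """Row-by-row split: draw one uniform value per row and compare it
    against the threshold `ratio` (hypothetical stand-in for runif)."""
    rng = random.Random(seed)
    left, right = [], []
    for row in rows:
        if rng.random() < ratio:
            left.append(row)
        else:
            right.append(row)
    return left, right


rows = list(range(20))  # toy data: row ids 0..19

train_cut, test_cut = cut_split(rows, 0.75)
train_rand, test_rand = coin_flip_split(rows, 0.75, seed=42)

print("cut:      ", train_cut, test_cut)
print("coin flip:", train_rand, test_rand)
```

In this sketch the cut always returns the first 15 rows as the larger part, while the coin flip scatters rows across both parts and only approximates the 75/25 ratio, since each row is decided independently. Fixing the seed makes the coin-flip split repeatable.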

The git commit that converted the behavior dates from Oct. 2015; its hash, f2b73c87eb12494998a529360f14da5a2bb11a38, is the one used in the git command below.

This is the relevant file in the master branch (search for splitFrame):

And these are the release branches (as well as the master branch, and therefore future releases) that contain the random behavior.  So it’s been a while.
$ git branch -r --contains f2b73c87eb12494998a529360f14da5a2bb11a38 | grep rel-
  origin/rel-tibshirani
  origin/rel-tukey
  origin/rel-turan
  origin/rel-turchin
  origin/rel-turin
  origin/rel-turing
  origin/rel-turing-8a
  origin/rel-turnbull
  origin/rel-tutte
  origin/rel-tverberg
  origin/rel-ueno

I also added a comment about this on the older post by Arno.


Tom

