Bug in DenseDesignMatrix's _apply_holdout?

Gerrit Kieffer

Nov 20, 2015, 9:32:23 AM
to pylearn-dev
Dear pylearn-dev team,

I was recently trying to split my dataset into training and test/validation sets using bootstrap_holdout(...) when I encountered the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 597, in bootstrap_holdout
    return self._apply_holdout("random_slice", train_size, train_prop)
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 520, in _apply_holdout
    batch_size=size)
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 301, in iterator
    rng),
  File "[...]/pylearn2.git/pylearn2/utils/iteration.py", line 557, in __init__
    raise ValueError("num_batches cannot be None for random slice "
ValueError: num_batches cannot be None for random slice iteration

I believe this is a bug that was introduced with commit b5082926151c2b3b94159c614e7ef5e0adbb8b35 (https://github.com/lisa-lab/pylearn2/commit/b5082926151c2b3b94159c614e7ef5e0adbb8b35), where the parameter num_batches=2 was removed from the call to self.iterator in _apply_holdout. Without it, the random-slice iterator receives num_batches=None and raises the ValueError shown above.

Can you confirm this? 

I would also like to know whether there is a way to split the dataset randomly -without- resampling. As far as I can tell, split_dataset_*(...) splits the dataset but keeps the examples in order, so you always get the same examples in the training and test sets, whereas bootstrap_*(...) samples randomly with replacement, so some examples will most likely end up in both the training and the test set.
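
To make clear what I mean by a random split without resampling, here is a minimal sketch in plain numpy. It does not use any pylearn2 API: the helper name shuffled_split and its parameters train_prop and seed are hypothetical, and X and y stand in for a design matrix and its targets.

```python
import numpy as np


def shuffled_split(X, y, train_prop=0.8, seed=0):
    """Hypothetical helper (not part of pylearn2).

    Shuffle the example indices once, then cut the permutation in two,
    so every example lands in exactly one of the two parts.
    """
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    perm = rng.permutation(n)              # each index appears exactly once
    n_train = int(round(train_prop * n))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])


# Toy data: 10 examples with 2 features each; targets double as example ids.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
(train_X, train_y), (test_X, test_y) = shuffled_split(X, y, train_prop=0.7, seed=42)
```

Because the indices come from a single permutation, the two parts are disjoint and together cover the whole dataset, which is the behaviour I am looking for. The resulting arrays would presumably still have to be wrapped in dataset objects afterwards.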

Thanks in advance.

Kind regards,
Gerrit Kieffer