Bug in DenseDesignMatrix's _apply_holdout?

Gerrit Kieffer

Nov 20, 2015, 9:32:23 AM

to pylearn-dev

Dear pylearn-dev team,

I was recently trying to split my dataset into training and test/validation sets using bootstrap_holdout(...) when I encountered the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 597, in bootstrap_holdout
    return self._apply_holdout("random_slice", train_size, train_prop)
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 520, in _apply_holdout
    batch_size=size)
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 301, in iterator
    rng),
  File "[...]/pylearn2.git/pylearn2/utils/iteration.py", line 557, in __init__
    raise ValueError("num_batches cannot be None for random slice "
ValueError: num_batches cannot be None for random slice iteration

I believe this is a bug introduced with commit b5082926151c2b3b94159c614e7ef5e0adbb8b35 (https://github.com/lisa-lab/pylearn2/commit/b5082926151c2b3b94159c614e7ef5e0adbb8b35), where the parameter num_batches=2 was removed from the call to self.iterator in _apply_holdout.

Can you confirm this?

Furthermore, I wanted to know whether there is a way to split randomly -without- resampling. As far as I can tell, split_dataset_*(...) splits the dataset but keeps the examples in order, so the same examples always end up in the training and test sets, whereas bootstrap_*(...) samples randomly with replacement, so some examples will most likely appear in both the training and the test set.
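To make clear what kind of split I mean, here is a minimal sketch in plain NumPy. It is not pylearn2 API: the helper name random_split and its parameters are made up for illustration, and it assumes the data is available as a design matrix X with targets y. It shuffles the indices once and then cuts them, so every example lands in exactly one of the two sets.

```python
import numpy as np


def random_split(X, y, train_prop=0.8, seed=0):
    """Hypothetical helper (not part of pylearn2): random split without replacement."""
    n = X.shape[0]
    rng = np.random.RandomState(seed)
    # A permutation contains every index exactly once, so the two
    # slices below cannot share an example.
    perm = rng.permutation(n)
    n_train = int(round(train_prop * n))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])


# Toy data: 10 examples with 2 features each; y doubles as an example id.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
(train_X, train_y), (test_X, test_y) = random_split(X, y, train_prop=0.7, seed=42)
```

Presumably the two resulting (X, y) pairs could then each be wrapped in their own DenseDesignMatrix, but I would prefer a built-in method if one exists.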

Thanks in advance.

Kind regards,

Gerrit Kieffer
