I was recently trying to split my dataset into training and test/validation sets using bootstrap_holdout(...) when I encountered the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 597, in bootstrap_holdout
    return self._apply_holdout("random_slice", train_size, train_prop)
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 520, in _apply_holdout
    batch_size=size)
  File "[...]/pylearn2.git/pylearn2/datasets/dense_design_matrix.py", line 301, in iterator
    rng),
  File "[...]/pylearn2.git/pylearn2/utils/iteration.py", line 557, in __init__
    raise ValueError("num_batches cannot be None for random slice "
ValueError: num_batches cannot be None for random slice iteration
Furthermore, I wanted to know whether there is a way to do a random split *without* resampling. As far as I can tell, split_dataset_*(...) splits the dataset but keeps the examples in order, so you always get the same examples in the training and test sets, while bootstrap_*(...) samples randomly with replacement, so you will most likely end up with some examples in both the training and the test set. What I'm after is a shuffled split where each example lands in exactly one of the two sets.
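For reference, here is the kind of split I mean, as a minimal NumPy sketch (not a pylearn2 API; `random_split` is a hypothetical helper name). It shuffles a permutation of the indices once and slices it, so the partition is random but sampled without replacement:

```python
import numpy as np

def random_split(X, y, train_prop=0.8, seed=0):
    """Hypothetical helper: shuffle indices once, then slice, so every
    example lands in exactly one of the training and test sets."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))          # random order, no replacement
    n_train = int(train_prop * len(X))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

Assuming the dataset exposes its design matrix and targets as arrays (e.g. the `X` and `y` attributes of a DenseDesignMatrix), this could be applied to those arrays directly as a workaround.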
Thanks in advance.