Hi everybody,
I am quite new to Blocks and Fuel and I would really like to solve a small problem. The goal is to 1) shuffle the stream each epoch and 2) read a few batches ahead so the data can be sorted by length (which reduces the amount of padding and therefore memory).
The following code solves 2). First, it creates an IterableDataset from NumPy arrays x and y stored in memory. Second, a DataStream is created and batched into chunks of k * b examples, where b is the batch size and k is the number of batches to read ahead. Finally, if k > 1, the read-ahead examples are sorted by length, unpacked, and re-packed into batches of size b. The last line adds padding (with a mask for 'x') to the resulting batches.
from fuel.datasets import IterableDataset
from fuel.schemes import ConstantScheme
from fuel.streams import DataStream
from fuel.transformers import Batch, Mapping, Padding, SortMapping, Unpack

dataset = IterableDataset({'x': x, 'y': y})
stream = DataStream(dataset=dataset)
# Read k batches ahead, i.e. chunks of k * b examples.
stream = Batch(stream, iteration_scheme=ConstantScheme(k * b))
if k > 1:
    # Sort each chunk by length (_length wraps len), unpack, and re-batch into batches of b.
    stream = Mapping(stream, SortMapping(_length))
    stream = Unpack(stream)
    stream = Batch(stream, iteration_scheme=ConstantScheme(b))
stream = Padding(stream, mask_sources=['x'])
This actually works well. My problem is 1), i.e. how to randomly shuffle the stream each training epoch. I hoped that setting ShuffledScheme or ShuffledExampleScheme as the iteration_scheme would do it, but for some reason it didn't work: either additional arguments were required or ValueErrors were raised.
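For reference, here is a minimal sketch of roughly what I tried (the exact scheme arguments are a reconstruction, not the code I actually ran):

from fuel.schemes import ShuffledExampleScheme

# Attempt: give the stream a shuffled iteration scheme so that examples
# come out in a different order each epoch. With the IterableDataset
# above, this is where the errors described appear.
stream = DataStream(dataset=dataset,
                    iteration_scheme=ShuffledExampleScheme(len(x)))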
Can I somehow use one of these schemes to achieve shuffling? Could you please tell me how? I believe this is a common problem, so it should be very straightforward. Thanks in advance!
Petr