Use of DataStream with different data subsets for different epochs

hoshi...@gmail.com

Sep 28, 2016, 9:46:21 AM
to fuel-users
Hey everyone,

For one of my projects, I have a validation dataset that contains about 1 million entries. Running every validation pass over the entire dataset would take far too long, so I've been wondering: Is there a way to make the DataStream return a different subset of that dataset for each epoch?

For example: For each validation, I want to use a subset of 1000 validation entries. For my first validation, the first epoch that the DataStream returns should contain the entries [0, ..., 999], the second one for the second validation should contain the entries [1000, ..., 1999], and so on. If possible, the batch size should be freely configurable as well.

I've been studying the API and experimenting with different IterationSchemes and DataStreams, but I've not been able to find a solution.

Apologies if it's something really simple that I've overlooked, and thanks in advance!

David Warde-Farley

Sep 28, 2016, 1:12:02 PM
to hoshi...@gmail.com, fuel-users
I'd actually thought about this problem before, and how it'd be nice to have, and so I went ahead and implemented it. It's pretty straightforward. (Python 3.x only, shouldn't be hard to port to Python 2 as I've noted in the comments.)

Basically, construct your stream however you want, with whatever batch sizes etc., then wrap it in this thing, telling it how many batches (or examples, if the wrapped stream produces single examples) you want to constitute an "epoch".
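
The implementation referred to above is not reproduced in this thread. As a rough illustration of the idea only (not the code that was posted, and not part of Fuel's API), here is a minimal sketch in plain Python. The class name `EpochSlicer`, its `make_iterator` and `batches_per_epoch` parameters, and the `batches` helper are all hypothetical; with Fuel you would presumably pass something that returns a fresh iterator over your already-batched stream.

```python
_END = object()  # sentinel marking an exhausted underlying iterator


class EpochSlicer:
    """Hypothetical wrapper that serves a long sequence in short "epochs".

    `make_iterator` is any zero-argument callable returning a fresh
    iterator over batches (or single examples) of the underlying data.
    """

    def __init__(self, make_iterator, batches_per_epoch):
        self.make_iterator = make_iterator
        self.batches_per_epoch = batches_per_epoch
        self._iterator = None  # created lazily, kept between epochs

    def get_epoch_iterator(self):
        """Yield the next `batches_per_epoch` items, resuming where
        the previous epoch stopped."""
        if self._iterator is None:
            self._iterator = self.make_iterator()
        produced = 0
        while produced < self.batches_per_epoch:
            item = next(self._iterator, _END)
            if item is _END:
                # Underlying data exhausted: wrap around to the start.
                self._iterator = self.make_iterator()
                item = next(self._iterator, _END)
                if item is _END:
                    return  # the underlying data is empty
            yield item
            produced += 1


def batches(data, batch_size):
    """Hypothetical stand-in for a batched stream over `data`."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


# 5000 entries in batches of 100; each "epoch" is 10 batches = 1000 entries.
entries = list(range(5000))
stream = EpochSlicer(lambda: batches(entries, 100), batches_per_epoch=10)

first = list(stream.get_epoch_iterator())   # covers entries 0..999
second = list(stream.get_epoch_iterator())  # covers entries 1000..1999
```

The batch size stays a property of the wrapped iterator, so it can be chosen freely; the wrapper only counts how many items it hands out per epoch. One design choice in this sketch is that it wraps around to the beginning once the underlying data runs out, which may or may not be what you want for validation.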




hoshi...@gmail.com

Sep 29, 2016, 4:55:32 AM
to fuel-users, hoshi...@gmail.com
I'll have a look at it. Thanks!