Implementing a consistent test set for Python-supplied input data


Stepan Andreenko

Jun 29, 2016, 10:11:29 AM
to Caffe Users
Hello!

I am trying to train a Caffe net using a custom Python data layer for input.
Currently I use Python to generate shuffled batches for the training phase, and it works.
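
Roughly, my training layer looks like the sketch below (simplified; the file paths, the use of param_str for the batch size, and the class name are placeholders, not my real setup):

    import caffe
    import numpy as np

    class ShuffledTrainDataLayer(caffe.Layer):
        # Simplified shuffled-batch training layer.

        def setup(self, bottom, top):
            self.batch_size = int(self.param_str)   # batch size passed via python_param
            self.data = np.load('train_data.npy')   # placeholder paths
            self.labels = np.load('train_labels.npy')
            self.order = np.random.permutation(len(self.data))
            self.cursor = 0
            top[0].reshape(self.batch_size, *self.data.shape[1:])
            top[1].reshape(self.batch_size)

        def reshape(self, bottom, top):
            pass    # tops already sized in setup()

        def forward(self, bottom, top):
            for i in range(self.batch_size):
                if self.cursor == len(self.order):  # epoch finished: reshuffle
                    self.order = np.random.permutation(len(self.data))
                    self.cursor = 0
                idx = self.order[self.cursor]
                top[0].data[i, ...] = self.data[idx]
                top[1].data[i] = self.labels[idx]
                self.cursor += 1

        def backward(self, top, propagate_down, bottom):
            pass    # a data layer has no gradients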

Now I want the solver to use another Python layer, with different data, for the test phase.
What I want to achieve is that the net is always tested against exactly the whole test set,
without any shuffling or sampling.

The test set is too big to fit in GPU memory, and I think it will fail if I try to set the test-phase
batch size as large as the whole set. So I have to divide it into several iterations somehow, but my
Python code cannot figure out which iteration it is on when forward() is called, or when to reset its
position for the next test pass.

What is the right way to achieve testing consistency in Caffe? It seems the same problem
would exist even if I switched to another data layer type.

Thanks,
Stepan.




Vijay Kumar

Jun 30, 2016, 2:14:58 AM
to Caffe Users
Shuffling is off by default, so unless you set shuffle to true, it won't shuffle your data. And you don't have to run the entire test set at once; run it in smaller batches instead.
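
For example, if your test set has 10,000 samples (the numbers here are just an illustration), the solver can cover all of it in small batches:

    # solver.prototxt
    test_iter: 100        # run 100 test batches per test pass
    test_interval: 1000   # run a test pass every 1000 training iterations

    # net.prototxt: give the test-phase data layer batch_size 100,
    # so test_iter * batch_size = 10000 = the whole test set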

Stepan Andreenko

Jun 30, 2016, 11:12:57 AM
to Caffe Users
Thank you for your answer, but perhaps my question was not clear enough.

The problem is that once a data layer (regardless of its exact type) has been initialized and forward() is being called, there is no way
for it to get any information about which test iteration it is on or when the next test phase starts.

I would like to reset the data layer at the start of every test phase and feed exactly as many batches as there are in the test set.

Currently the only way I see to do this is to set the number of test iterations (multiplied by the batch size) exactly equal
to the test set size, keep an iteration counter in the (custom) data layer itself, and reset it once the whole test set has been consumed.
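
In code the workaround looks something like this (a sketch only; the paths and the param_str usage are placeholders, as above):

    import caffe
    import numpy as np

    class SequentialTestDataLayer(caffe.Layer):
        # Walks the test set strictly in order; correct only while
        # test_iter * batch_size in the solver equals the test set size.

        def setup(self, bottom, top):
            self.batch_size = int(self.param_str)
            self.data = np.load('test_data.npy')    # placeholder paths
            self.labels = np.load('test_labels.npy')
            self.cursor = 0                          # the hand-kept counter
            top[0].reshape(self.batch_size, *self.data.shape[1:])
            top[1].reshape(self.batch_size)

        def reshape(self, bottom, top):
            pass

        def forward(self, bottom, top):
            for i in range(self.batch_size):
                top[0].data[i, ...] = self.data[self.cursor]
                top[1].data[i] = self.labels[self.cursor]
                self.cursor += 1
                if self.cursor == len(self.data):    # whole set consumed:
                    self.cursor = 0                  # reset for the next test pass

        def backward(self, top, propagate_down, bottom):
            pass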

But this is completely idiotic, because I have to synchronize the solver parameters with the test set size every time I change
anything, and any mistake will lead to inconsistent testing.

Is there a proper way to do consistent testing?

Stepan.