Pascal Lamblin
May 10, 2013, 2:39:09 PM
to pylearn-dev
Hi,
The mechanism for using data_specs (Spaces and sources) in costs,
models and monitoring channels, to request the appropriate data in the
appropriate format from data sets, is making progress: it now almost
works with SGD.
I'm having a design problem in the case where no data from a
given data set is actually needed: for instance, if the only costs
used for learning and monitoring are penalties over the model's
parameters.
I've created a Space (NullSpace) for that, so that costs can
specify that they do not use any data, which avoids that data being
generated, copied, reshaped, etc. I've used None as a placeholder value
for batches, for code that actually needs an object passed.
The problem occurs when trying to compute the actual batch size of that
batch. Since the data is not actually read from the data set, I don't
know what the batch size would have been (if it is the last batch of the
set, for instance, it may be smaller than the requested one). I talked
with David W-F about that when I first encountered the problem, and we
decided to use 0, since no actual sample was returned, and there is no
other way a 0 could be returned. The iterator would still
produce as many values as if actual data were requested.
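A minimal sketch of the behaviour described above, using hypothetical
names (NullSpaceSketch, iterate_batches and this get_batch_size are
illustrative stand-ins, not the actual Pylearn2 code). It assumes a data
set of 4 examples and a requested batch size of 3:

```python
import math


class NullSpaceSketch(object):
    """Hypothetical stand-in for a Space that carries no data."""


def iterate_batches(num_examples, batch_size, space):
    """Yield one item per batch; None when the space carries no data."""
    num_batches = int(math.ceil(num_examples / float(batch_size)))
    for i in range(num_batches):
        if isinstance(space, NullSpaceSketch):
            # No data is read, so the real size of this batch is unknown.
            yield None
        else:
            start = i * batch_size
            stop = min(start + batch_size, num_examples)
            yield list(range(start, stop))


def get_batch_size(space, batch):
    """Return 0 for the null space, the number of samples otherwise."""
    if isinstance(space, NullSpaceSketch):
        return 0
    return len(batch)


null_space = NullSpaceSketch()
null_batches = list(iterate_batches(4, 3, null_space))
data_batches = list(iterate_batches(4, 3, object()))

# Same number of iterations either way, but the sizes sum differently.
null_total = sum(get_batch_size(null_space, b) for b in null_batches)
data_total = sum(get_batch_size(None, b) for b in data_batches)
```

In this sketch both iterations produce two batches, but only the second
one sums to the 4 examples the data set advertises.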
However, Monitor actually checks that the sum of the returned batch
sizes corresponds to the total size the data set advertised, and it
complains (see error below).
I see different possible solutions:
- Make the data set iterate over the data, even if none is requested,
so we have the right batch size, and find a way to convey that information
to get_batch_size();
- Make the iterator not return any items, preventing iterating over
a data set when no data is returned, and change the existing tests
that use this feature;
- Make Monitor accept 0 as the number of examples, provided the Space
does not contain any data.
I would favour the third solution, but I'm open to discussion and other
suggestions.
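As a rough illustration of the third option, here is a sketch of a
relaxed consistency check. The function name and the space_is_null flag
are hypothetical and only meant to show the idea; the real check lives
in Monitor.__call__ and may be structured differently:

```python
def check_example_count(advertised, actual, space_is_null):
    """Raise if the iterator returned a different total than advertised.

    A total of 0 is accepted when the space carries no data (hypothetical
    space_is_null flag), since no sample was actually read in that case.
    """
    if space_is_null and actual == 0:
        return
    if actual != advertised:
        raise RuntimeError(
            "At compile time, your iterator said it had "
            + str(advertised)
            + " examples total, but at runtime it gave us "
            + str(actual) + ".")


# Accepted: nothing was requested, so 0 examples is expected.
check_example_count(4.0, 0, space_is_null=True)

# Still rejected: real data was requested but none came back.
try:
    check_example_count(4.0, 0, space_is_null=False)
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
```

The point of keeping the flag explicit is that a genuine mismatch on a
data-carrying Space is still reported.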
======================================================================
ERROR: test_monitor.test_dont_serialize_dataset
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/lisa/os/epd-7.1.2/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/u/lamblinp/code/Pylearn2/pylearn2/tests/test_monitor.py", line 333, in test_dont_serialize_dataset
    monitor()
  File "/u/lamblinp/code/Pylearn2/pylearn2/monitor.py", line 221, in __call__
    + str(actual_ne) + ".")
RuntimeError: At compile time, your iterator said it had 4.0 examples total, but at runtime it gave us 0.
----------------------------------------------------------------------
--
Pascal