predicting with a convolutional neural network

Vincent Alexander Saulys

Jun 1, 2015, 12:30:56 PM
to pylear...@googlegroups.com
So I recently trained a convolutional neural network on a given dataset. Later, in another script, I load it up and call 'fprop' with it. What I get is the following error:
"""
Traceback (most recent call last):
  File "gsr_raw_scoring.py", line 44, in <module>
    y_preds = f( x_test )
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 513, in __call__
    allow_downcast=s.allow_downcast)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/type.py", line 169, in filter
    data.shape))
TypeError: ('Bad input argument to theano function with name "gsr_raw_scoring.py:35"  at index 0(0-based)', 'Wrong number of dimensions: expected 4, got 2 with shape (368, 20014).')
"""
I suspect this has to do with how the data is transformed during training into a series of images. My questions are the following:
How does the dataset get transformed during training?
Why does it not get transformed during fprop?
How can I transform the data to fit what it expects?

Thanks in Advance,
Vincent

Nicu Tofan

Jun 1, 2015, 1:12:11 PM
to pylear...@googlegroups.com
Is the dataset made of images or not? I suspect not, as 20014 = 2 * 10007.
What dataset (class) do you use?
The information is probably in your yaml file, so it may help to post that, too.
The questions are too broad for me to answer, sorry.

Pascal Lamblin

Jun 1, 2015, 3:29:19 PM
to pylear...@googlegroups.com
On Mon, Jun 01, 2015, Vincent Alexander Saulys wrote:
> I suspect this has to do with how the data is transformed during training
> into a series of images. My questions are the following:
> How does the dataset get transformed during training?
> Why does it not get transformed during fprop?
> How can I transform the data to fit what it expects?

The model expects the variables passed through fprop to be transformed
in the appropriate space already. The reshaping from whatever format the
data was in, to whatever the model needs, is usually done in the dataset
iterator.

To know which format the model needs, you can use `get_input_space()`.
To make your dataset iterator spit out data in that format, you can
build a data_specs using that space, and assuming you are using a
standard Dataset object, the iterator should take care of it.
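
To make the error message concrete, here is a minimal NumPy sketch of the shape mismatch. It is only an illustration under assumptions not confirmed in this thread: the input space is taken to use the axis order ('b', 0, 1, 'c') with image shape [1, 20014] and one channel, and `x_test` is a zero-filled stand-in for the real data. In normal use the dataset iterator does this conversion, so a manual reshape like this is mainly useful for understanding what "expected 4, got 2" means.

```python
import numpy as np

# Stand-in for the 2-D matrix from the traceback: 368 examples, 20014 features.
x_test = np.zeros((368, 20014), dtype="float32")

# Assumed target layout: axes ('b', 0, 1, 'c'), i.e. (batch, rows, cols, channels).
rows, cols, channels = 1, 20014, 1
x_4d = x_test.reshape(x_test.shape[0], rows, cols, channels)
print(x_4d.shape)

# If the model's space used axes ('c', 0, 1, 'b') instead, the same data would
# additionally need its axes reordered.
x_c01b = x_4d.transpose(3, 1, 2, 0)
print(x_c01b.shape)
```

The axis order the model really wants is whatever `get_input_space()` reports, so check that before relying on either layout.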

Please let us know if you need more in-depth information,

--
Pascal

Vincent Alexander Saulys

Jun 2, 2015, 9:48:17 AM
to pylear...@googlegroups.com
Well, I do wrap the data as a Conv2DSpace in the training yaml file. I have trouble following the flow of the code here as the yaml file gets parsed, so I was curious if anybody could point to how to do this. I know that <http://deeplearning.net/software/pylearn2/internal/data_specs.html> has information on this, but it doesn't seem to have any actual examples, just detailed write-ups.
What I have is a CSV, with the first column being a target value (this is a regression problem) and the following columns being features. I want to transform this into a Conv2DSpace of shape [1,20014] with the number of channels being 1. What's the way to do this?

Nicu Tofan

Jun 2, 2015, 10:41:41 AM
to pylear...@googlegroups.com
About the YAML file: remember that everything you see in the YAML file is a constructor invocation.
You can easily convert yaml to python, understand how it works, and then go back to using yaml (I confess this is how I started).
For example, see cifar10.yaml converted to python here.
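
As a toy illustration of the "constructor invocation" idea, the sketch below imports a class from a dotted path and calls it with keyword arguments. This is not pylearn2's YAML parser; `instantiate` is a hypothetical helper written for this example, and `fractions.Fraction` is used only because it ships with Python.

```python
import importlib


def instantiate(dotted_path, **kwargs):
    """Import `package.module.ClassName` and call it with the keyword arguments."""
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


# Conceptually, a YAML entry such as
#   !obj:fractions.Fraction { numerator: 3, denominator: 4 }
# amounts to this plain constructor call.
value = instantiate("fractions.Fraction", numerator=3, denominator=4)
print(value)
```

Read this way, each `!obj:` entry in a training yaml file can be rewritten by hand as an ordinary Python call with the same keyword arguments.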

My understanding is that your examples are not images. Conv2DSpace is to be used when the data has topological meaning. In your case I would guess a VectorSpace would be more appropriate.

Also, remember to add task: 'regression' to the CSVDataset constructor.
Maybe you can take inspiration from this discussion (this is the correct yaml file)
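
For the CSV layout described earlier in the thread (first column is the target, remaining columns are features), the split into targets and a design matrix can be sketched with plain NumPy. The CSV text below is made up for illustration, and this is not CSVDataset's actual implementation, only the general idea of what a regression dataset has to produce.

```python
import io

import numpy as np

# Tiny made-up CSV: first column is the regression target, the rest are features.
csv_text = "0.5,1.0,2.0,3.0\n1.5,4.0,5.0,6.0\n"

data = np.loadtxt(io.StringIO(csv_text), delimiter=",", ndmin=2)
y = data[:, :1]  # targets, kept 2-D: one column per output
X = data[:, 1:]  # design matrix: one row per example

print(y.shape, X.shape)
```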

--
You received this message because you are subscribed to the Google Groups "pylearn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylearn-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pascal Lamblin

Jun 2, 2015, 2:02:13 PM
to pylear...@googlegroups.com
On Tue, Jun 02, 2015, Vincent Alexander Saulys wrote:
> Well I do wrap the data as a Conv2DSpace in the training yaml file. I have
> trouble following the flow of the code here as the yaml file gets parsed
> out, I was curious if anybody could point to how to do this? I know that
> <http://deeplearning.net/software/pylearn2/internal/data_specs.html>
> has information on this, but they don't seem to actually have any examples,
> just detailed write ups.

Right, that part is mainly about explaining how things are currently
organized and called, not how to implement new functionalities based on
that framework.

> What I have is a CSV, with the first column being a target value (this is a
> regression problem) and the following columns being features. I want to
> transform this into a Conv2DSpace of shape [1,20014] & number of channels
> being 1. Whats the way to do this?

The dataset's iterator will take care of the conversion; what you need
is to tell it how you want the data.

For instance, if "dataset" is your instance of CSVDataset containing
the data, and "model" is your model:

data_space = model.get_input_space() # should be a Conv2DSpace
data_source = model.get_input_source() # probably "features"
data_specs = (data_source, data_space)

iter = dataset.iterator(mode='sequential',
                        batch_size=batch_size,
                        data_specs=data_specs)


Then you need the prediction function:

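import theano

X = data_space.make_theano_batch('X')  # symbolic input in the model's space
pred = model.fprop(X)                  # symbolic prediction
predict = theano.function([X], pred)   # compiled prediction function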

And then you can call that "predict" function on the data coming from
the iterator:

predictions = []
for item in iter:
    predictions.append(predict(*item))

Disclaimer: I did not test that code, so minor adjustments may be needed.


--
Pascal

Nicu Tofan

Jun 2, 2015, 2:10:00 PM
to pylear...@googlegroups.com
Pascal, if my comments are wrong please don't hesitate to say so.
A future seeker may be confused about two trains of thought with no connection between them.
Also, the code that you just posted is pretty elegant.
Is there any way to post it somewhere in the documentation (after testing it, of course)?
Like a new page - getting predictions from a pylearn2 model.

Pascal Lamblin

Jun 2, 2015, 4:55:39 PM
to pylear...@googlegroups.com
Yes, I think something like that should be added in the documentation,
I can try to do that if no one else volunteers.

Even better, in my opinion, would be a script like [1], but starting from
a dataset, and supporting various output spaces and export formats.

[1] https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/mlp/predict_csv.py

--
Pascal

Nicu Tofan

Jun 2, 2015, 5:09:26 PM
to pylear...@googlegroups.com
I'm not sure that I'm familiar enough with reStructuredText, but I can give it a try over the weekend.

About the second point: doesn't that mean that potentially large amounts of memory may be used (for example with DenseDesignDataset)? Is there any acceptable way around this?

In my classes I added the attributes of interest to the model. Not sure if that can be a universal solution for all classes inheriting Dataset.




Pascal Lamblin

Jun 2, 2015, 5:43:11 PM
to pylear...@googlegroups.com
On Wed, Jun 03, 2015, Nicu Tofan wrote:
> I'm not sure that I'm familiar enough with reStructuredText but I can give
> it a try in the weekend.

Great, thanks!

> About second point - doesn't that mean that potentially large amounts of
> memory may be used (for example with DenseDesignDataset)? Is there any
> acceptable way around this?

Well, it depends on the size of the dataset and of the dimensionality of
the prediction, but there are a couple of things that can be done:
- compute the prediction minibatch by minibatch, like the example in my
  previous e-mail, rather than all at once
- pre-allocate a numpy ndarray with the right number of rows to store
  the prediction, rather than storing the predictions in a list of ndarray
  that we then have to concatenate
- if the output is to be saved in a CSV or HDF5 file, then we can store
  it directly, which removes some more memory usage.
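
The first and second points, plus the CSV variant of the third, can be sketched as follows. This is an untested-against-pylearn2 illustration: `fake_predict` is a hypothetical stand-in for the compiled Theano `predict` function, `batches` stands in for the dataset iterator, and the data is made up.

```python
import csv
import io

import numpy as np


def fake_predict(batch):
    """Hypothetical stand-in for the compiled `predict` function."""
    return batch.sum(axis=1, keepdims=True)


def batches(data, batch_size):
    """Yield consecutive row slices of `data`, like a sequential iterator."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = np.arange(20, dtype="float32").reshape(10, 2)
batch_size = 4

# Option 1: pre-allocate the output once and fill it slice by slice,
# instead of collecting a list of arrays and concatenating them.
predictions = np.empty((len(data), 1), dtype="float32")
row = 0
for batch in batches(data, batch_size):
    out = fake_predict(batch)
    predictions[row:row + len(out)] = out
    row += len(out)

# Option 2: write each minibatch's predictions straight to a CSV stream,
# so the full prediction array never has to be held in memory.
stream = io.StringIO()  # a real script would open a file instead
writer = csv.writer(stream)
for batch in batches(data, batch_size):
    writer.writerows(fake_predict(batch).tolist())
```

Pre-allocation requires knowing the number of examples and the output dimensionality up front; streaming to a file does not, at the cost of not having the predictions available as an array afterwards.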

> In my classes I added the attributes of interest to the model. Not sure if
> that can be a universal solution for all classes inheriting Dataset.

I'm not sure what you mean by that.
--
Pascal

Nicu Tofan

Jun 2, 2015, 5:53:55 PM
to pylear...@googlegroups.com
For some reason I misunderstood your first comment (actually I interpreted it in the context of some of my previous failed experiments). I thought that you meant to load the old dataset only to be able to create the minibatches.
Of course you're saying that a new dataset should be created with the examples to predict values for and that makes sense.

Pascal Lamblin

Jun 2, 2015, 6:25:09 PM
to pylear...@googlegroups.com
On Wed, Jun 03, 2015, Nicu Tofan wrote:
> For some reason I misunderstood your first comment (actually I interpreted
> it in the context of some of my previous failed experiments). I thought
> that you mean to load the old dataset only to be able to create the
> minibatches.
> Of course you're saying that a new dataset should be created with the
> examples to predict values for and that makes sense.

Well, I had in mind to start from a Dataset object (so that the whole
conversion can happen in the iterator), rather than from a CSV file as
predict_csv.py does.

The creation of that data would be outside of that script's scope, but
of course people could use CSVDataset if their data is in CSV format.
Or maybe, for ease of use, we should automatically try to build an
appropriate type of Dataset from files.

Nicu Tofan

Jun 10, 2015, 3:19:22 PM
to pylear...@googlegroups.com
Pascal, please have a look at #1538

Pascal Lamblin

Jun 11, 2015, 2:39:32 PM
to pylear...@googlegroups.com
On Wed, Jun 10, 2015, Nicu Tofan wrote:
> Pascal, please have a look at #1538
> <https://github.com/lisa-lab/pylearn2/pull/1538>

Thanks! I'll try to take a look before the end of the week.