Iterating along one dimension of the data


Yuri Baburov

Apr 4, 2015, 6:03:01 AM
to caffe...@googlegroups.com
I have a lot of sound recordings and want to train a CNN on them (and an RNN later).

I have my data in (N, 256) entries, where N is the number of input frames for each recording.

My network reads (20, 256) entries.

I realise that I could slice the data into (20, 256) chunks, but then I'd have a 20x overhead in input file size.
I did this, but I also want to add random noise, and now 1 hour of my recordings takes 6 GB!

What is the best way to do this in Caffe without duplicating the data?
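For reference, outside of any Caffe layer the "overlapping windows without copying" trick can be sketched in numpy with `as_strided`, which builds a strided view over the same buffer. This is a minimal sketch of the general technique, assuming the (N, 256) frames and (20, 256) window size from above; a custom data layer would do the equivalent internally:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_windows(frames, win=20):
    """View a (N, 256) recording as overlapping (win, 256) chunks
    without copying the underlying data."""
    n, dim = frames.shape
    s0, s1 = frames.strides
    # Each window starts one frame later than the previous one, so the
    # first axis reuses the frame stride; no data is duplicated.
    return as_strided(frames,
                      shape=(n - win + 1, win, dim),
                      strides=(s0, s0, s1),
                      writeable=False)

rec = np.random.rand(1000, 256).astype(np.float32)
wins = sliding_windows(rec)   # a (981, 20, 256) view over the same memory
```

The view is marked read-only because the windows alias each other; any augmentation (e.g. the added noise) should happen on the per-batch copy, not on the shared buffer.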

J. Yegerlehner

Apr 10, 2015, 1:18:16 PM
to caffe...@googlegroups.com
Hi Yuri,

As far as I know, presenting a window of past values as an input to a net from a time history of data without duplication would require new feature development. I could be wrong. Pull requests welcome! We're probably going to need that if we ever deal with video.

But 6GB? That's cute. You should try working with images. I filled up a 2TB dedicated hard drive extracting features from a modest sized image database. :P

Jim

Yuri Baburov

Apr 11, 2015, 3:48:19 AM
to J. Yegerlehner, caffe...@googlegroups.com
I meant that 6 GB was for just 1 hour of speech, not the whole corpus.
Typical speech corpus sizes start from 1,000 hours -- so 6-600 TB is the more realistic figure...
Baidu used a 10,000-hour corpus (plus 100,000 hours of background noise).

Ok, thanks. I'm working on a layer that does exactly what I need, and I'll open a pull request.

--
You received this message because you are subscribed to a topic in the Google Groups "Caffe Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/caffe-users/PkA2de88i3k/unsubscribe.
To unsubscribe from this group and all its topics, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/b7928575-c512-41cc-ba6f-12bd5bdf9362%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Best regards, Yuri V. Baburov, Skype: yuri.baburov

Evan Shelhamer

Apr 11, 2015, 4:09:02 PM
to Yuri Baburov, J. Yegerlehner, caffe...@googlegroups.com
Hey Yuri,

Yeah, you'll want to make a data layer that internally seeks over windows or caches past values. I've heard the same has been done for video, but I haven't worked on it myself or seen a pull request for it yet, so please do send whatever you work out.

Evan Shelhamer


J. Yegerlehner

Apr 14, 2015, 1:11:20 PM
to caffe...@googlegroups.com, yeger...@gmail.com
Yuri,
If you're not familiar with it, I recommend considering boost::circular_buffer in your design for this. It's a natural fit.
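For anyone prototyping the same idea from Python rather than C++, `collections.deque(maxlen=...)` plays roughly the role of `boost::circular_buffer`: old frames fall off the front automatically as new ones arrive. A minimal sketch, with the 20-frame window size assumed from the earlier messages (`push_frame` is a hypothetical helper, not a Caffe API):

```python
from collections import deque
import numpy as np

WIN = 20  # network input length, as in the (20, 256) example above

buf = deque(maxlen=WIN)  # bounded buffer: oldest frame is dropped automatically

def push_frame(frame):
    """Append one (256,) frame; return the current (WIN, 256) window once full."""
    buf.append(frame)
    if len(buf) == WIN:
        return np.stack(buf)  # copy only the WIN frames actually fed to the net
    return None

window = None
for t in range(25):
    window = push_frame(np.full(256, t, dtype=np.float32))
# after 25 frames, the window covers frames 5..24
```

Only the window actually handed to the network is materialized; the stream itself is never duplicated 20x on disk.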


Yuri Baburov

Apr 14, 2015, 2:53:17 PM
to caffe...@googlegroups.com
Jim, Evan,

Thanks a lot,

I used to write a lot of C/C++, but that was many years ago :)
I'll check it out. However, I thought the data copying was necessary to get the correct data representation, and that the CPU-to-GPU copying would be what takes most of the CPU time. Is that right?

And actually this is only one tiny piece of the puzzle.

Currently, for low-dimensional versions (~100x10), the CPU is 120% busy while the reported GPU load is only 6%.
That probably means it could run 15 times faster if we eliminated the CPU-side manipulations.
This can easily be reproduced with MNIST (GPU mode, large batches): it still won't speed up beyond about 10x over CPU, while for images the difference is up to 50x.
I would like to improve that, but I don't know how.

I got very good results with just a CNN, but I'd also like to do RNN processing -- which should be very easy by itself... But as I understand it, that's not straightforward in Caffe at the moment...
I considered the following pipeline:

data -> sliding window -> normalizing to [0, 1] interval & adding noise -> several layers of neurons -> RNN

where consecutive data chunks are terminated with zeros (or marked in some other way), so that the RNN knows when to reset its state.
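One way to make the reset explicit is a per-timestep continuation indicator: 0 at the first frame of each chunk (reset the hidden state) and 1 everywhere else. This is a sketch of the general idea rather than an existing Caffe interface, and `cont_markers` is a hypothetical helper:

```python
import numpy as np

def cont_markers(seq_lengths):
    """Build a per-timestep continuation indicator for concatenated chunks:
    0 at the first frame of each chunk (reset the RNN state), 1 elsewhere."""
    cont = []
    for n in seq_lengths:
        c = np.ones(n, dtype=np.float32)
        c[0] = 0.0  # state reset at each chunk boundary
        cont.append(c)
    return np.concatenate(cont)

markers = cont_markers([3, 2, 4])
# markers == [0, 1, 1, 0, 1, 0, 1, 1, 1]
```

Feeding such an indicator alongside the data keeps chunk boundaries explicit even after chunks are concatenated or batched, instead of relying on zero frames being recognized implicitly.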

But then there is the batching issue: how do we run these in parallel?

I'm considering running this from Python with net.forward / net.backward... Then I'd avoid any additional C/C++ code and could control timing and other details...
But then I'd be worried about speed much more than when running ./caffe directly! Or shouldn't I be?

Multimodal networks are my passion... but how do I develop them while using the computing power efficiently?
That's my biggest question at the moment.

On Tue, Apr 14, 2015 at 11:11 PM, J. Yegerlehner <yeger...@gmail.com> wrote:
> Yuri,
> If you're not familiar with it, I recommend considering boost::circular_buffer in your design for this. It's a natural fit.



J. Yegerlehner

Apr 15, 2015, 1:54:05 AM
to caffe...@googlegroups.com



Yuri,
 
> Multimodal networks are my passion...

I understand that drug. Except for the "multimodal" thing.

> Currently, for low-dimensional versions (~100x10), the CPU is 120% busy while the reported GPU load is only 6%.

My intuition is that this can be done without maxing out the CPU, if it's designed properly. But without seeing what you've done, I don't know how to help.

My suggestion is to submit a PR or issue on https://github.com/BVLC/caffe that shows what you did as a diff. We might be able to spot the problem.
