I am trying to figure out whether I can use Caffe for word embeddings. Here is a simplified architecture:
The input to the network consists of n ≈ 10 words, where each word is represented as a one-hot binary vector of ~100K dimensions. Each of the n input words is connected to its own distinct set of ~100 hidden units, so there are ~1,000 hidden units in total. The set of weights from a word position to its hidden units is shared across all positions, i.e. a word causes the same activation in its hidden units (the word's embedding) no matter which position it occupies. These ~1,000 hidden units are connected to further hidden layers and finally to a softmax output.
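To make this concrete, here is a rough sketch of what I imagine the net would look like in pycaffe's NetSpec, with the weight sharing done through Caffe's named-parameter mechanism. The layer names, the batch size of 64, the 500-unit hidden layer, and the softmax-over-vocabulary output are placeholders of mine, not part of the design above:

```python
# Rough NetSpec sketch of the tied-embedding architecture described above.
# Sizes follow the question: 10 words, ~100K vocab, ~100 embedding dims.
# Batch size, the 500-unit hidden layer, and the softmax size are
# placeholder choices of mine.
import caffe
from caffe import layers as L

VOCAB, WORDS, EMBED = 100000, 10, 100

n = caffe.NetSpec()
# One dense blob per instance holding the 10 concatenated one-hot vectors.
n.data = L.Input(shape=dict(dim=[64, WORDS * VOCAB]))
n.label = L.Input(shape=dict(dim=[64]))

# Split the input into one ~100K-dim slice per word position.
words = L.Slice(n.data, ntop=WORDS, axis=1,
                slice_point=[VOCAB * i for i in range(1, WORDS)])

# One InnerProduct per position; giving every layer's weight blob the
# same param name ("embed_w") makes Caffe share (tie) the weights.
embeds = []
for i, w in enumerate(words):
    ip = L.InnerProduct(w, num_output=EMBED, bias_term=False,
                        param=[dict(name='embed_w')])
    setattr(n, 'embed%d' % i, ip)
    embeds.append(ip)

# The ~1,000-dim concatenation of the per-word embeddings.
n.concat = L.Concat(*embeds, axis=1)
n.fc1 = L.InnerProduct(n.concat, num_output=500)  # further hidden layer(s)
n.relu1 = L.ReLU(n.fc1, in_place=True)
n.score = L.InnerProduct(n.relu1, num_output=VOCAB)
n.loss = L.SoftmaxWithLoss(n.score, n.label)

print(n.to_proto())  # emits the prototxt for this net
```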
The problem is the size of the input. Even though the number of weights is manageable (~10M in the example above: one shared 100K × 100 embedding matrix), each input instance is ~1M-dimensional (10 × 100K), although only 10 of those entries are non-zero. This makes storing the training set in the standard Caffe way (dense blobs in an LMDB/LevelDB or HDF5 file) rather tricky. Any suggestions?
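To quantify why: with float32 blobs and, say, 1M training instances (an assumed example size, not part of the setup above), the dense encoding is hopeless, while the word indices alone are tiny:

```python
# Storage cost of the dense one-hot encoding vs. just the word indices.
# 1M training instances is an assumed example size.
WORDS, VOCAB, N = 10, 100000, 1000000

dense_bytes = N * WORDS * VOCAB * 4  # float32, fully dense blobs
index_bytes = N * WORDS * 4          # one int32 index per non-zero entry

print("dense  : %.1f TB" % (dense_bytes / 1e12))  # ~4.0 TB
print("indices: %.1f MB" % (index_bytes / 1e6))   # ~40.0 MB
```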