RNN example


alepb...@gmail.com

Jan 30, 2017, 6:30:21 PM
to BigDL User Group
Folks,

I have the same input shape as the one in TextClassifier.scala:


val maxSequenceLength = 10
val embeddingDim = 100
val batching = Batching(param.batchSize, Array(maxSequenceLength, embeddingDim))
val trainingDataSet = DataSet.rdd(trainingRDD) -> batching
val valDataSet = DataSet.rdd(valRDD) -> batching

I'd like to train an RNN for a classification task, and I was trying something like this:

val hiddenSize = 40
val bpttTruncate = 4
val outputSize = 1000
val inputSize = 1000

val model_N = Sequential[Float]()
  .add(Recurrent[Float](hiddenSize, bpttTruncate)
    .add(RnnCell[Float](inputSize, hiddenSize))
    .add(Tanh[Float]()))
  .add(Linear[Float](hiddenSize, outputSize))
  .add(Linear(2, classNum))
  .add(LogSoftMax())

Where classNum is the number of labels I have. 

val optimizer = Optimizer(
  model = model_N,
  dataset = trainingDataSet,
  criterion = new CrossEntropyCriterion[Float]()
).asInstanceOf[DistriOptimizer[Float]].disableCheckSingleton()

val numEpochs = 5
optimizer.
  setState(state).
  setValidation(Trigger.everyEpoch, valDataSet, Array(new Top1Accuracy[Float])).
  setOptimMethod(new SGD[Float]()).
  setEndWhen(Trigger.maxEpoch(numEpochs)).
  optimize()

This is the error that I'm getting.

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: requirement failed: input should be a two dimension Tensor

I think I'm lost as to how to reshape my data correctly. I would expect Batching to spit out mini-batches of [maxSequenceLength x embeddingDim], but I'm probably wrong.

Any support would be much appreciated.


shell...@gmail.com

Jan 30, 2017, 11:01:08 PM
to BigDL User Group, alepb...@gmail.com
Hi,
Currently, the RNN model only supports single-sample input, which means the model is trained on training samples one by one. The error is thrown when Recurrent.scala receives an input with more than two dimensions. (One sentence is represented by a two-dimensional tensor.)
BigDL will support mini-batch training for the RNN soon.
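For clarity, here is a minimal plain-Scala sketch of the shape being described (illustrative only, not BigDL API calls): one training sample is one sentence, i.e. a [sequenceLength x embeddingDim] structure with no batch dimension.

// One sample = one sentence = a 2D [sequenceLength x embeddingDim] structure;
// the Recurrent layer consumes such samples one at a time.
val sequenceLength = 10
val embeddingDim = 100
val oneSentence: Array[Array[Double]] = Array.fill(sequenceLength, embeddingDim)(0.0)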

alepb...@gmail.com

Jan 31, 2017, 4:01:48 PM
to BigDL User Group, alepb...@gmail.com
Hi again,

I understand that the problem is my input being a 2D tensor, so I tried to bring it back to one dimension and feed the RNN to perform classification, but I keep getting an exception.

Here is the code:

Here is the exception:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: requirement failed

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)

at scala.collection.AbstractTraversable.map(Traversable.scala:104)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:125)

at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)

at org.apache.spark.scheduler.Task.run(Task.scala:85)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.IllegalArgumentException: requirement failed

at scala.Predef$.require(Predef.scala:207)

at com.intel.analytics.bigdl.tensor.DenseTensorMath$.addmv(DenseTensorMath.scala:603)

at com.intel.analytics.bigdl.tensor.DenseTensor.addmv(DenseTensor.scala:1204)

at com.intel.analytics.bigdl.nn.Linear.updateOutput(Linear.scala:70)

at com.intel.analytics.bigdl.nn.Linear.updateOutput(Linear.scala:29)

at com.intel.analytics.bigdl.nn.ParallelTable.updateOutput(ParallelTable.scala:36)

at com.intel.analytics.bigdl.nn.RnnCell.updateOutput(RNN.scala:66)

at com.intel.analytics.bigdl.nn.RnnCell.updateOutput(RNN.scala:28)

at com.intel.analytics.bigdl.nn.Recurrent.updateOutput(Recurrent.scala:47)

at com.intel.analytics.bigdl.nn.Recurrent.updateOutput(Recurrent.scala:26)

at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:129)

at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:33)

at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:129)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply$mcI$sp(DistriOptimizer.scala:164)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply(DistriOptimizer.scala:158)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply(DistriOptimizer.scala:158)

at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$1$$anon$4.call(Engine.scala:119)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)


Any help would be highly appreciated. Thanks,

Alessandro


shell...@gmail.com

Jan 31, 2017, 8:07:51 PM
to BigDL User Group, alepb...@gmail.com
Hi,

I suspect the problem is that the RNN model in BigDL does not match your text classification model.

The RNN model in BigDL is an implementation of a language model, meaning that a sequence of inputs maps to a sequence of outputs ([x1, x2, x3, .., xn] -> [x2, x3, x4, .., xn+1], where each x_i is a vector).
However, an RNN for text classification requires a single input, or a sequence of inputs, mapping to a single output ([x] -> [y], where x and y are vectors).

As a result, the forward method in the Recurrent layer will yield a sequence of hidden units given a sequence of inputs. A 2D input tensor represents one training sample, i.e. one sentence;
each row correlates with the others.

In your case, is the input a 1000-length vector [0.2 0.3 -0.2 .... 0.8], and the expected output a 40-length vector?
Could you please provide a detailed explanation of your recurrent steps? In text classification, each word is represented by a fixed-length vector (1000?). Then the whole article would be
wrapped as a matrix? In this way, you will only need the last hidden unit as the output generated by the Recurrent layer. An additional layer will then be needed to extract that last hidden unit before the Linear layer.

Besides, the toSample code accepts (nRow = 1, nCol = 2) parameters. Does that mean it will form the labeled data into a (2*500, 1) tuple?
Currently, batch training is not supported in the Recurrent layer. Please feed the RNN one sample at a time.

alepb...@gmail.com

Feb 1, 2017, 5:46:02 PM
to BigDL User Group
Hello there, thanks for the support so far.

Let me first give some more context, as you asked:
In my training set I have sentences, each of them labeled (I have a total of 34 different labels for now).
Each sentence is a concatenation of words, each of them vectorized by its word2vec representation (i.e. the word "dog" will be vectorized via word2vec_model.getVectors("dog")).
Each vector coming out of word2vec has size 100.

Every sentence has the same length; in other words, I pad every sentence to be a fixed-length vector.
Because my sentences are short, I fix the maximum length at 1000 (which means 10 words * 100 each). If I have a sentence of only 3 words, the remaining 7*100 floats will simply be zeros.

So if, for example, I have "German shepherd" labeled as "Dog", the same sentence in my training set will look something like this:

sentence_1 = [ 0.2,0.4,...x100, 0.3,0.2,...x100, 0.1,0.6,...x100, 0.0,0.0,0.0,...x700] , 1.0

where each group of one hundred floats is the word2vec representation of a word, followed by the padding (700 zeros). The float 1.0 at the end represents the label "Dog".

Hopefully this gives you some more clarity on what my training set looks like.
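For concreteness, here is a minimal sketch of the padding step described above (the names are illustrative, not my actual code; it assumes a plain Map[String, Array[Double]] lookup built from the word2vec model):

// Concatenate up to maxWords word vectors and zero-pad the result to maxWords * embeddingDim floats.
def sentenceToVector(words: Seq[String],
                     w2v: Map[String, Array[Double]],  // assumed lookup table built from word2vec
                     maxWords: Int = 10,
                     embeddingDim: Int = 100): Array[Double] = {
  val padded = new Array[Double](maxWords * embeddingDim)  // initialized to zeros
  words.take(maxWords).zipWithIndex.foreach { case (word, i) =>
    w2v.get(word).foreach { v =>
      Array.copy(v, 0, padded, i * embeddingDim, math.min(v.length, embeddingDim))
    }
  }
  padded
}

// e.g. a labeled sample for "German shepherd" -> "Dog":
// val sample = (sentenceToVector(Seq("german", "shepherd"), w2v), 1.0)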

Having said that, I tried, as you suggested, to remodel the network and reduce my batch size to 1, which threw this error:

java.lang.IllegalArgumentException: requirement failed: total batch size(1) should be at least two times of node number(1) * core number(1), please change your batch size



So I changed it to 2. Please look at the gist for the complete code:


but I'm still getting the following exception:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: requirement failed

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)

at scala.collection.AbstractTraversable.map(Traversable.scala:104)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:125)

at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)

at org.apache.spark.scheduler.Task.run(Task.scala:85)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)



Please note the following observations:

val model_N = Sequential[Double]()
  .add(Recurrent[Double](hiddenSize, bpttTruncate)
    .add(RnnCell[Double](inputSize, hiddenSize))
    .add(Tanh[Double]()))
  // .add(Linear(2, classNum))
  .add(LogSoftMax())

is what I understood I needed to change to make the network perform the classification over my number of classes (classNum = 34). I'm not sure I even need the Linear layer, which I also tried, but it ended up failing too.

Any support would be highly appreciated. Thanks!

shell...@gmail.com

Feb 2, 2017, 3:44:36 AM
to BigDL User Group, alepb...@gmail.com
Hi,

I understand your data format now.

Since the input sentence is represented by a 10*100 tensor and the label is one double ([ 0.2,0.4,...x100, 0.3,0.2,...x100, 0.1,0.6,...x100, 0.0,0.0,0.0,...x700] , 1.0),
I would suggest setting (nRows, nCols) = (10, 100) in your toSample class. Thus each sample will be a 10*100 tensor. The Recurrent model will read this sample row by row and do the recurrence.
The output of the Recurrent model will be a 10*40 tensor (10 rows, each row of size 40).

What you need is to extract the last row from the output and then pass it to a Linear(40, classNum) layer.

The parameters of the recurrent layer should be set as follows (it will read a 2D tensor and recur row by row):

hiddenSize = 40
inputSize = 100


Currently, there is no layer that performs this extraction, because the recurrent layer was initially designed for a language model. I have posted the issue on the BigDL GitHub and we will support it in a couple of days.
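To make the shapes concrete, here is a minimal plain-Scala sketch of the flow described above (illustrative only; the real layers are BigDL modules, and the weights below are placeholders):

// 10x100 input -> Recurrent -> 10x40 hidden states -> take the last row -> Linear(40, classNum).
val recurrentOutput: Array[Array[Double]] = Array.fill(10, 40)(0.0)  // stands in for the 10x40 output
val lastHidden: Array[Double] = recurrentOutput.last                 // vector of size 40
def linear(x: Array[Double], weight: Array[Array[Double]], bias: Array[Double]): Array[Double] =
  weight.zip(bias).map { case (row, b) => row.zip(x).map { case (w, xi) => w * xi }.sum + b }
val classNum = 34
val scores = linear(lastHidden, Array.fill(classNum, 40)(0.01), Array.fill(classNum)(0.0))  // size classNum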

Jason Dai

Feb 2, 2017, 8:56:01 AM
to shell...@gmail.com, BigDL User Group, Alessandro Panebianco
I think you can:

1) Set (nRows, nCols) = (10, 100) in toSample and do the recurrence using batch size 1 (for now, you can call Engine.init after calling SampleToBatch to work around the minimum batch size requirement, as the RNN example did); the output of the recurrence is a 10x40 tensor

2) Add a Reshape layer after the RNN to reshape the output of the recurrence to a vector of size 400

3) Add a Linear(400, 34) layer after the reshape, followed by the LogSoftMax.

BTW, for the "java.util.concurrent.ExecutionException", you may search the logs from Spark tasks to see which layer is reporting errors.

Thanks,
-Jason


alepb...@gmail.com

Feb 2, 2017, 3:28:54 PM
to BigDL User Group, shell...@gmail.com, alepb...@gmail.com
Folks,

for some reason it is still failing. 
Here are the modifications after your suggestions (I didn't change the SampleToBatch implementation from the gist above):

val nrows = 10
val ncols = 100
val batchSize = 2
val trainSet = DataSet.rdd(trainingRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)
val valSet = DataSet.rdd(valRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)

val hiddenSize = 40
val bpttTruncate = 4
val inputSize = 100
val classNum = 34

val model_N = Sequential[Double]()
  .add(Recurrent[Double](hiddenSize, bpttTruncate)
    .add(RnnCell[Double](inputSize, hiddenSize))
    .add(Tanh[Double]()))
  .add(Reshape(Array(400)))
  .add(Linear(400, classNum))
  .add(LogSoftMax())

Here's the error:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: requirement failed: input should be a two dimension Tensor

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)

at scala.collection.AbstractTraversable.map(Traversable.scala:104)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:125)

at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)

at org.apache.spark.scheduler.Task.run(Task.scala:85)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.IllegalArgumentException: requirement failed: input should be a two dimension Tensor

at scala.Predef$.require(Predef.scala:219)

at com.intel.analytics.bigdl.nn.Recurrent.updateOutput(Recurrent.scala:35)

at com.intel.analytics.bigdl.nn.Recurrent.updateOutput(Recurrent.scala:26)

at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:129)

at com.intel.analytics.bigdl.nn.Sequential.updateOutput(Sequential.scala:33)

at com.intel.analytics.bigdl.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:129)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply$mcI$sp(DistriOptimizer.scala:164)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply(DistriOptimizer.scala:158)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply(DistriOptimizer.scala:158)

at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$1$$anon$4.call(Engine.scala:119)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

... 3 more


Jason, I can't initialize the Engine after the batching because of how I designed the application, so I'll stick to batchSize=2 for now.

Alessandro


Jason Dai

Feb 2, 2017, 7:42:19 PM
to Alessandro Panebianco, BigDL User Group, shell...@gmail.com
The problem is due to batchSize=2: currently the RNN only supports batchSize=1; we'll implement a workaround for you to be able to use batchSize=1 soon.

Thanks,
-Jason


Jason Dai

Feb 3, 2017, 6:05:07 AM
to Alessandro Panebianco, BigDL User Group, shell...@gmail.com
Hi Alessandro,

We just merged the fix (https://github.com/intel-analytics/BigDL/pull/427), so that you can train your model using batchSize=1; we are currently working on a fix to support batchSize>1 for RNN models.

Thanks,
-Jason

alepb...@gmail.com

Feb 3, 2017, 1:59:41 PM
to BigDL User Group, alepb...@gmail.com, shell...@gmail.com
Jason,

thanks for the fix. However, if I compile the latest merge I get an error with the Sample:

error: too many arguments for method copy: (other: com.intel.analytics.bigdl.dataset.Sample[Double])com.intel.analytics.bigdl.dataset.Sample[Double]

             buffer.copy(featureBuffer, labelBuffer,


What I did was recompile my working version of BigDL with just the edits you made to Utils.scala ( https://github.com/intel-analytics/BigDL/commit/287a697fcb5b0a9f61e21d6e8a18eb1f6eab772a ). That allowed me to execute my code, but I keep getting an error.

Now the batch size is 1, so it should be fine, but this is what the exception looks like:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: requirement failed: invalid size

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$7.apply(DistriOptimizer.scala:176)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)

at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)

at scala.collection.AbstractTraversable.map(Traversable.scala:104)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:176)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4.apply(DistriOptimizer.scala:125)

at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)

at org.apache.spark.scheduler.Task.run(Task.scala:85)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.IllegalArgumentException: requirement failed: invalid size

at scala.Predef$.require(Predef.scala:219)

at com.intel.analytics.bigdl.tensor.DenseTensor.valueAt(DenseTensor.scala:513)

at com.intel.analytics.bigdl.nn.ClassNLLCriterion.updateOutput(ClassNLLCriterion.scala:44)

at com.intel.analytics.bigdl.nn.CrossEntropyCriterion.updateOutput(CrossEntropyCriterion.scala:40)

at com.intel.analytics.bigdl.nn.CrossEntropyCriterion.updateOutput(CrossEntropyCriterion.scala:32)

at com.intel.analytics.bigdl.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:43)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply$mcI$sp(DistriOptimizer.scala:165)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply(DistriOptimizer.scala:158)

at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$4$$anonfun$5$$anonfun$apply$2.apply(DistriOptimizer.scala:158)

at com.intel.analytics.bigdl.utils.ThreadPool$$anonfun$1$$anon$4.call(Engine.scala:119)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

 

I tried to use the same state/optimizer you use in the RNN example, as it looks like the error is coming from updateOutput.
The model remains the following:
val nrows = 10
val ncols = 100
val batchSize = 1
val trainSet = DataSet.rdd(trainingRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)
val valSet = DataSet.rdd(valRDD) -> ToSample(nrows, ncols)  -> SampleToBatch(batchSize)

val hiddenSize = 40
val bpttTruncate = 4
val inputSize = 100
val classNum = 34

val model_N = Sequential[Double]()
  .add(Recurrent[Double](hiddenSize, bpttTruncate)
    .add(RnnCell[Double](inputSize, hiddenSize))
    .add(Tanh[Double]()))
  .add(Reshape(Array(400)))
  .add(Linear(400, classNum))
  .add(LogSoftMax())

So far I tried both:

1)
val learningRate: Double = 0.1
val momentum: Double = 0.0
val weightDecay: Double = 0.0
val dampening: Double = 0.0

val state = {
  T("learningRate" -> learningRate,
    "momentum" -> momentum,
    "weightDecay" -> weightDecay,
    "dampening" -> dampening)
}

val optimizer = Optimizer(
  model = model_N,
  dataset = trainSet,
  criterion = new CrossEntropyCriterion[Double]()
).asInstanceOf[DistriOptimizer[Double]].disableCheckSingleton()

val numEpochs = 5
optimizer
  .setValidation(Trigger.everyEpoch, valSet, Array(new Loss[Double]))
  .setState(state)
  .setEndWhen(Trigger.maxEpoch(numEpochs))
  .optimize() 

2)
val state = T("learningRate" -> 0.01, "learningRateDecay" -> 0.0002)

val optimizer = Optimizer(
  model = model_N,
  dataset = trainSet,
  criterion = new ClassNLLCriterion[Double]()
).asInstanceOf[DistriOptimizer[Double]].disableCheckSingleton()

val numEpochs = 5
optimizer.
  setState(state).
  setValidation(Trigger.everyEpoch, valSet, Array(new Top1Accuracy[Double])).
  setOptimMethod(new Adagrad[Double]()).
  setEndWhen(Trigger.maxEpoch(numEpochs)).
  optimize()

Sorry if this is becoming painful for you as well and thanks for the support.

Alessandro

shell...@gmail.com

Feb 3, 2017, 9:27:26 PM
to BigDL User Group, alepb...@gmail.com
Hi,

Please use buffer.set() instead.

Jason Dai

Feb 4, 2017, 3:51:43 AM
to shell...@gmail.com, BigDL User Group, Alessandro Panebianco
Hi Alessandro,

Thanks for the detailed information; we have looked into the problem and finally got to the bottom of it; here are two code snippets (https://gist.github.com/zhangxiaoli73/5dc4ab0ca468c3e83156f6782e7d21e8 and https://gist.github.com/zhangxiaoli73/6802ff39e0ac5c3f0640041d1b34a6ca) that should work with the latest BigDL code on GitHub.

1) Each input record is a tuple of feature (a 10x100 tensor) and label (a double), and we should change ToSample.apply to something like:

       ...
       if (labelBuffer == null) {
           labelBuffer = new Array[Double](1)
       }
       ...
       buffer.set(featureBuffer, labelBuffer, Array(nRows, nCols), Array(1))

2) As mentioned before, currently the Recurrent layer implements a language model and only works with batchSize=1; as a result, it will output a 10x40 tensor for your model and ignore the batch size information, which is a mismatch with your input classification labels (a batchSize x 1 tensor for each MiniBatch). We just fixed the Recurrent layer to keep the batch size information (see https://github.com/intel-analytics/BigDL/pull/430), and replaced the Reshape layer with a Select layer, which can extract the output of the last element (a vector of size 40) in the Recurrent layer for classification:

     val model_N = Sequential[Double]()
       .add(Recurrent[Double](hiddenSize, bpttTruncate)
         .add(RnnCell[Double](inputSize, hiddenSize))
         .add(Tanh[Double]()))
       .add(Select(2, 10))
       // .add(Reshape(Array(400)))
       .add(Linear(40, classNum))
       .add(LogSoftMax())


Hope this helps; we are currently working on a fix to support batchSize>1 in the Recurrent layer.

Thanks,
-Jason


alepb...@gmail.com

Feb 6, 2017, 6:24:06 PM
to BigDL User Group, shell...@gmail.com, alepb...@gmail.com
Hey Jason,

thanks for the support. The RNN is indeed working now with the latest fixes you made.

I saw the pull request for supporting batch sizes greater than 1, which is really good news.

As of now I am not able to converge to a good accuracy, but then again, the training takes forever and the number of epochs has to be significant to make the network work properly.

As soon as you guys merge the version mentioned above I'll be happy to share my results.

Thanks,

Alessandro

Jason Dai

Feb 7, 2017, 12:06:23 AM
to Alessandro Panebianco, BigDL User Group, shell...@gmail.com
Hi Alessandro,

We just merged the pull request for supporting batchSize>1 in RNN, and have tested the example above using batchSize=448 on 4 nodes.

Thanks,
-Jason


alepb...@gmail.com

Feb 7, 2017, 5:30:20 PM
to BigDL User Group, alepb...@gmail.com, shell...@gmail.com
Hey Jason,

the batch size works just great, so first of all, thanks.

However, being able to work with more data raised another couple of issues.

The batching process seems to be incredibly slow. It might be my lack of understanding of how the iterators/transformers work in BigDL, but if I look at this line:

val trainSet = DataSet.rdd(trainingRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)

(please refer to the previous gists for the exact implementation of ToSample)

What I'm trying to do is simply batch my data, going from RDD[(Array[Double], Double)] to AbstractDataSet[com.intel.analytics.bigdl.dataset.MiniBatch[Double], _].
As long as I work with a small dataset (~25K) this takes about 10 minutes, but when I try with a 10-times-bigger dataset (250K) it takes about 3 hours to finish(!).

On a related note, I was also trying to save/load the model and test it right after, but again, there is something wrong with the way I prep the data. More specifically:

val path_savedModel = "model_test"
val loaded_model = Module.load[Double](path_savedModel)

Recalling trainSet from above, shouldn't I be able to predict on it by doing something like this:

loaded_model.forward(trainSet)

but I get this exception:

 found   : com.intel.analytics.bigdl.dataset.AbstractDataSet[com.intel.analytics.bigdl.dataset.MiniBatch[Double],_$5] where type _$5

 required: com.intel.analytics.bigdl.nn.abstractnn.Activity

And I'm not sure how to go from an AbstractDataSet to an Activity.


Thanks for the support.


Zhang, Cherry

Feb 8, 2017, 1:19:44 AM
to alepb...@gmail.com, BigDL User Group, shell...@gmail.com

Hi Alessandro,

 

I just used the script from https://gist.github.com/zhangxiaoli73/6802ff39e0ac5c3f0640041d1b34a6ca and did some tests.

Spark local, 1 node and 1 core, 378K sentences, batchSize=10, epoch=5: it took just 36.58 minutes to finish training and testing.

So, please describe your question more clearly and give your code if possible.

 

To go from an AbstractDataSet to an Activity, you can use trainSet.toDistributed().data(train = false) to get an RDD[MiniBatch]; example code:

val path_saveModel = "model_test"
val loaded_model = Module.load[Double](path_saveModel)
val tmpRDD = trainSet.toDistributed().data(train = false) // if not for training, please set train = false
val tmpOutput = tmpRDD.mapPartitions(dataIter => {
  dataIter.map(batch => {
    val input = batch.data
    loaded_model.forward(input).toTensor[Double]
  })
}).collect()

 

If you want to test your model, you can also refer to some test example code (such as: https://github.com/intel-analytics/BigDL/tree/master/dl/src/main/scala/com/intel/analytics/bigdl/models/lenet)

 

Thanks,

 

Cherry

Jason Dai

Feb 8, 2017, 2:38:58 AM
to Zhang, Cherry, alepb...@gmail.com, BigDL User Group, shell...@gmail.com
Hi Alessandro,

I wonder if you can share more details on what exactly you mean by "the batching process seems to be incredibly slow"? The transformers (ToSample and SampleToBatch) are built on RDD operations (when running on Spark), which are lazily executed just like other RDD operations. "val trainSet = DataSet.rdd(trainingRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)" will only load the data, randomly shuffle it over the network and then cache it in memory. The actual batching (including both ToSample and SampleToBatch) only happens during training (Optimizer.optimize).
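As a minimal illustration of this lazy-evaluation point (plain Spark, not BigDL-specific; assumes an existing SparkContext sc): transformations only build a plan, and the work happens when an action forces it.

val rdd = sc.parallelize(1 to 1000000)
val transformed = rdd.map(_ * 2)   // lazy: returns immediately, nothing is computed yet
val n = transformed.count()        // action: this is where the actual computation happens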

If your data loading (and shuffle) is slow, I wonder where your data are stored and whether there are network I/Os. And if the training is slow, I wonder what throughput BigDL reports for each iteration of your training (we have seen >800 records/sec using 1 core in Spark local).

Thanks,
-Jason



alepb...@gmail.com

Feb 8, 2017, 3:43:52 PM
to BigDL User Group, cherry...@intel.com, alepb...@gmail.com, shell...@gmail.com
Jason, 
thanks for the explanation, and sorry for the lack of details. As you already identified, the problem I'm facing is most likely due to the data loading part.

I'm not sure what I could have done wrong.
I'm using Scala 2.11 and Spark 2.0 on my local machine.
The CSV I'm loading is also on my local machine, and I'm re-attaching the loading code for clarity:

val nodeNum = 1
val coreNum = 8
val sc = new SparkContext(Engine.init(nodeNum, coreNum, true).get.setMaster("local[*]").setAppName("test").set("spark.task.maxFailures", "1"))
val sqlContext = new SQLContext(sc)
val path = "~/Desktop/PATH_TSV.tsv"
val data = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("delimiter", "\t").load(path)

// some more processing

val vectorizedRdd :RDD[(Array[Double], Double)] = labeled.select("labels","vectors").rdd.map(r => (r(1).asInstanceOf[WrappedArray[Double]].toArray,r(0).asInstanceOf[Double] + 1.0))
val Array(trainingRDD, valRDD) = vectorizedRdd.randomSplit(Array(trainingSplit, 1 - trainingSplit))


val nrows = 10
val ncols = 100
val batchSize = 64

val trainSet = DataSet.rdd(trainingRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)
val valSet = DataSet.rdd(valRDD) -> ToSample(nrows, ncols) -> SampleToBatch(batchSize)

// which is where it takes about 2hrs to load ~250K rows

Here are a couple of snapshots of the throughput during the same training:

2017-02-08 14:14:31 INFO  DistriOptimizer$:241 - [Epoch 4 157248/244390][Iteration 13915][Wall Clock 2789.300859796s] Train 64 in 0.194042336seconds. Throughput is 329.8249305759749 records/second. Loss is 1.755370063192032.


2017-02-08 14:37:17 INFO  DistriOptimizer$:241 - [Epoch 6 89472/244390][Iteration 20494][Wall Clock 4155.918198255s] Train 64 in 0.21744152seconds. Throughput is 294.33201165996263 records/second. Loss is 1.9625739991949165


Do you see anything I am doing wrong?

On a side note:

I'm not able to access port 4040 to monitor Spark jobs. Maybe this is related to the issue above. In any case, I'm attaching the output from when I first launch the SparkContext:

2017-02-08 11:18:43 INFO  ThreadPool$:87 - Set mkl threads to 1 on thread 1

2017-02-08 11:18:43 WARN  Engine$:344 - Invalid env setting. Please use bigdl.sh set the env. For spark application, please use Engine.sparkConf() to initialize your sparkConf

2017-02-08 11:18:43 INFO  ThreadPool$:87 - Set mkl threads to 1 on thread 1

2017-02-08 11:18:43 WARN  SparkConf:66 - The configuration key 'spark.akka.frameSize' has been deprecated as of Spark 1.6 and may be removed in the future. Please use the new key 'spark.rpc.message.maxSize' instead.

2017-02-08 11:18:43 INFO  SparkContext:54 - Running Spark version 2.0.0

2017-02-08 11:18:43 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2017-02-08 11:18:44 WARN  Utils:66 - Your hostname, XXXXXXXX resolves to a loopback address: 127.0.0.1; using 192.168.1.XXX instead (on interface en0)

2017-02-08 11:18:44 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address

2017-02-08 11:18:44 INFO  SecurityManager:54 - Changing view acls to: user

2017-02-08 11:18:44 INFO  SecurityManager:54 - Changing modify acls to: user

2017-02-08 11:18:44 INFO  SecurityManager:54 - Changing view acls groups to: 

2017-02-08 11:18:44 INFO  SecurityManager:54 - Changing modify acls groups to: 

2017-02-08 11:18:44 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(user); groups with view permissions: Set(); users  with modify permissions: Set(user); groups with modify permissions: Set()

2017-02-08 11:18:44 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 49347.

2017-02-08 11:18:44 INFO  SparkEnv:54 - Registering MapOutputTracker

2017-02-08 11:18:44 INFO  SparkEnv:54 - Registering BlockManagerMaster

2017-02-08 11:18:44 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/dy/13g7lrsj31d34ccjfm33hqcsqr4dwy/T/blockmgr-0ed2f0ac-5e71-4148-b68e-ae0949e041c1

2017-02-08 11:18:44 INFO  MemoryStore:54 - MemoryStore started with capacity 8.4 GB

2017-02-08 11:18:44 INFO  SparkEnv:54 - Registering OutputCommitCoordinator

2017-02-08 11:18:44 INFO  log:186 - Logging initialized @18330ms

2017-02-08 11:18:44 INFO  Server:327 - jetty-9.2.z-SNAPSHOT

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@495e8a3{/jobs,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@6a7aa675{/jobs/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@6eded11a{/jobs/job,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@602a3237{/jobs/job/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@4b511e61{/stages,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@74a74070{/stages/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@6c6919ff{/stages/stage,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@5de335cf{/stages/stage/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@1e029a04{/stages/pool,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@50e8ed74{/stages/pool/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@74eab077{/storage,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@3063be68{/storage/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@3a3bc0da{/storage/rdd,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@2d2f09a4{/storage/rdd/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@c677d7e{/environment,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@215a0264{/environment/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@45832b85{/executors,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@103478b8{/executors/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@270f7b4d{/executors/threadDump,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@56b704ea{/executors/threadDump/json,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@ab4d2ba{/static,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@39f68aec{/,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@65ff4b8c{/api,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@b81938d{/stages/stage/kill,null,AVAILABLE}

2017-02-08 11:18:44 INFO  ServerConnector:266 - Started ServerConnector@3d8bd881{HTTP/1.1}{0.0.0.0:4040}

2017-02-08 11:18:44 INFO  Server:379 - Started @18429ms

2017-02-08 11:18:44 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.

2017-02-08 11:18:44 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://192.168.1.66:4040

2017-02-08 11:18:44 INFO  Executor:54 - Starting executor ID driver on host localhost

2017-02-08 11:18:44 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49348.



Please let me know if you need further information from my side and thanks again.

Alessandro


Jason Dai

Feb 9, 2017, 1:57:27 AM
to Alessandro Panebianco, BigDL User Group, Zhang, Cherry, shell...@gmail.com
Hi Alessandro,

I wonder if you can share the actual command you use to run the program? And for a quick check, can you do something like "vectorizedRdd.coalesce(1, true).count()" to check the speed of data loading? And maybe you can also check the Java GC log to see if there are a lot of GCs?

In addition, there are two potential issues from the log (which should only impact the training throughput though):

1) There is warning of "Invalid env setting. Please use bigdl.sh set the env." Did you use BigDL to launch your program? See https://github.com/intel-analytics/BigDL/wiki/Getting-Started#spark-program-local-mode-and-cluster-mode for how to launch BigDL programs.

2) It seems you are using 8 cores on your local machine; it is recommended to set the number of cores to the number of physical cores (not logical processors) on the computer.

Thanks,
-Jason



alepb...@gmail.com

Feb 9, 2017, 12:44:56 PM
to BigDL User Group, alepb...@gmail.com, cherry...@intel.com, shell...@gmail.com

Jason,

The way I start my program is through the Scala shell, passing the compiled .jar:

scala -cp /dl/target/bigdl-0.1.0-SNAPSHOT-jar-with-dependencies-and-spark.jar -J-Xmx16g


For some reason I get different errors when I try to use an IDE (IntelliJ) to work with BigDL. The downside is also the fact (as I mentioned in my previous message) that I'm not able to monitor the Spark UI. Maybe you can tell me more after reading this:

2017-02-07 10:12:44 WARN  Utils:66 - Your hostname, MACHINE_ID resolves to a loopback address: 127.0.0.1; using 10.36.XX.XXX instead (on interface en0)

2017-02-07 10:12:44 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address

2017-02-07 10:12:44 INFO  SecurityManager:54 - Changing view acls to: MY_USER

2017-02-07 10:12:44 INFO  SecurityManager:54 - Changing modify acls to: MY_USER

2017-02-07 10:12:44 INFO  SecurityManager:54 - Changing view acls groups to: 

2017-02-07 10:12:44 INFO  SecurityManager:54 - Changing modify acls groups to: 

2017-02-07 10:12:44 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(MY_USER); groups with view permissions: Set(); users  with modify permissions: Set(MY_USER); groups with modify permissions: Set()



To answer your questions: when I run vectorizedRdd.coalesce(1, true) this is what I get:

scala> vectorizedRdd.coalesce(1, true).count()

2017-02-09 10:36:30 INFO  SparkContext:54 - Starting job: count at <console>:61


2017-02-09 11:01:44 INFO  DAGScheduler:54 - Job 5 finished: count at <console>:61, took 1514.288633 s

res0: Long = 257291



The problem is indeed the loading of the data, as you can tell from the fact that it takes almost 25 minutes to load ~250K lines. Please note that in this last run I changed the number of cores to 1 to follow your previous suggestion.

And this is a GC log snapshot. Hopefully you will find it useful for gaining more insights:



The heap seems to act weird, but I'm not sure I'm able to do anything about it.

Hopefully you will find this information useful to guide me further, and thanks so far.

Alessandro

...

zhichao

Feb 9, 2017, 7:57:21 PM
to alepb...@gmail.com, BigDL User Group, cherry...@intel.com, shell...@gmail.com
For the UI part, I think it's because the service would not be enabled by default in local mode in Spark.

Thanks,
Zhichao


zhichao

Feb 9, 2017, 8:25:51 PM
to alepb...@gmail.com, BigDL User Group, cherry...@intel.com, shell...@gmail.com
The 8080 service would not be enabled in local mode, but for the 4040 service you should see something like this in the log:
 org.apache.spark.util.Utils main:58 Successfully started service 'SparkUI' on port 4040

and for the warning:
 2017-02-07 10:12:44 WARN  Utils:66 - Your hostname, MACHINE_ID resolves to a loopback address: 127.0.0.1; using 10.36.XX.XXX instead (on interface en0)
2017-02-07 10:12:44 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address

I think it should be fine to use the loopback address, but you can set SPARK_LOCAL_IP="<IP address>" in the spark-env.sh file or change the IP binding in /etc/hosts to eliminate that warning.



zhichao

Feb 9, 2017, 11:21:35 PM
to Alessandro Panebianco, BigDL User Group, cherry...@intel.com, shell...@gmail.com
Hi Alessandro,

I just tried on my local machine; it only took seconds to finish:
scala> vectorizedRdd.coalesce(1, true).count()

I think this is more about how Spark loads data than about BigDL.

Anyway, the job UI might be able to give some hints on this, so could you elaborate more on "cannot access 'SparkUI'"? There might be some log indicating the reason why your HTTP request fails each time you refresh the page.

Thanks,
Zhichao

zhichao

Feb 10, 2017, 1:56:41 AM
to Alessandro Panebianco, BigDL User Group, cherry...@intel.com, shell...@gmail.com
FYI.
I ran it under IntelliJ with the default values.
25K mock records.
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz.
1G memory (just for testing the data loading).
Spark 2.0, Scala 2.11.


alepb...@gmail.com

Feb 10, 2017, 12:05:41 PM
to BigDL User Group, alepb...@gmail.com, cherry...@intel.com, shell...@gmail.com
Zhichao,

I think what is still unclear to me is where/how I can change the Spark env variables. Please note that so far I've been starting BigDL by passing the compiled jar to the Scala shell, more precisely:

scala -cp /dl/target/bigdl-0.1.0-SNAPSHOT-jar-with-dependencies-and-spark.jar -J-Xmx16g


Because I compiled BigDL via:

mvn clean package -DskipTests -P mac -P spark_2.0


When I initialize my Spark context, I'm getting the following output:

scala> val sc = new SparkContext(Engine.init(1, 1, true).get.setMaster("local[*]").setAppName("W2V").set("spark.task.maxFailures", "1"))

2017-02-10 10:45:41 INFO  ThreadPool$:87 - Set mkl threads to 1 on thread 1

2017-02-10 10:45:41 WARN  Engine$:344 - Invalid env setting. Please use bigdl.sh set the env. For spark application, please use Engine.sparkConf() to initialize your sparkConf

2017-02-10 10:45:41 INFO  ThreadPool$:87 - Set mkl threads to 1 on thread 1

2017-02-10 10:45:41 WARN  SparkConf:66 - The configuration key 'spark.akka.frameSize' has been deprecated as of Spark 1.6 and may be removed in the future. Please use the new key 'spark.rpc.message.maxSize' instead.

2017-02-10 10:45:41 INFO  SparkContext:54 - Running Spark version 2.0.0

2017-02-10 10:45:41 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2017-02-10 10:45:42 WARN  Utils:66 - Your hostname, MY_MACHINE_ID resolves to a loopback address: 127.0.0.1; using 10.36.XX.XXX instead (on interface en0)

2017-02-10 10:45:42 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address

2017-02-10 10:45:42 INFO  SecurityManager:54 - Changing view acls to: MY_USER_ID

2017-02-10 10:45:42 INFO  SecurityManager:54 - Changing modify acls to: MY_USER_ID

2017-02-10 10:45:42 INFO  SecurityManager:54 - Changing view acls groups to: 

2017-02-10 10:45:42 INFO  SecurityManager:54 - Changing modify acls groups to: 

2017-02-10 10:45:42 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(MY_USER_ID); groups with view permissions: Set(); users  with modify permissions: Set(MY_USER_ID); groups with modify permissions: Set()

2017-02-10 10:45:42 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 64233.

2017-02-10 10:45:42 INFO  SparkEnv:54 - Registering MapOutputTracker

2017-02-10 10:45:42 INFO  SparkEnv:54 - Registering BlockManagerMaster

2017-02-10 10:45:42 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/dy/13g7lrsj31d34ccjfm33hqcsqr4dwy/T/blockmgr-52ce438e-3b24-4403-834c-400184117cfb

2017-02-10 10:45:42 INFO  MemoryStore:54 - MemoryStore started with capacity 8.4 GB

2017-02-10 10:45:42 INFO  SparkEnv:54 - Registering OutputCommitCoordinator

2017-02-10 10:45:42 INFO  log:186 - Logging initialized @20908ms

2017-02-10 10:45:42 INFO  Server:327 - jetty-9.2.z-SNAPSHOT

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@6722db6e{/jobs,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@18f20260{/jobs/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@4ae33a11{/jobs/job,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@7a48e6e2{/jobs/job/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@b40bb6e{/stages,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@3a94964{/stages/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@5049d8b2{/stages/stage,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@6d0b5baf{/stages/stage/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@631e06ab{/stages/pool,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@2a3591c5{/stages/pool/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@34a75079{/storage,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@346a361{/storage/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@107ed6fc{/storage/rdd,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@1643d68f{/storage/rdd/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@186978a6{/environment,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@2e029d61{/environment/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@482d776b{/executors,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@4052274f{/executors/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@132ddbab{/executors/threadDump,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@297ea53a{/executors/threadDump/json,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@acb0951{/static,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@5bf22f18{/,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@267f474e{/api,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@7a7471ce{/stages/stage/kill,null,AVAILABLE}

2017-02-10 10:45:42 INFO  ServerConnector:266 - Started ServerConnector@3f2ef586{HTTP/1.1}{0.0.0.0:4040}

2017-02-10 10:45:42 INFO  Server:379 - Started @21002ms

2017-02-10 10:45:42 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.

2017-02-10 10:45:42 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.36.XX.XXX:4040

2017-02-10 10:45:42 INFO  Executor:54 - Starting executor ID driver on host localhost

2017-02-10 10:45:42 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64234.

2017-02-10 10:45:42 INFO  NettyBlockTransferService:54 - Server created on 10.36.XX.XXX:64234

2017-02-10 10:45:42 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver,10.36.XX.XXX, 64234)

2017-02-10 10:45:42 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 10.36.XX.XXX:64234 with 8.4 GB RAM, BlockManagerId(driver, 10.36.XX.XXX, 64234)

2017-02-10 10:45:42 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 10.36.XX.XXX, 64234)

2017-02-10 10:45:42 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@6b739528{/metrics/json,null,AVAILABLE}

sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@3be8821f



It looks like the UI starts successfully, according to the log:
2017-02-10 10:45:42 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.36.XX.XXX:4040

But in reality, when I try to access http://10.36.XX.XXX:4040 via browser, I get the following error:

scala> 2017-02-10 10:39:45 WARN  HttpChannel:384 - /

java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z

at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:484)

at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)

at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.spark_project.jetty.server.Server.handle(Server.java:499)

at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)

at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)

at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)

at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)

at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)

2017-02-10 10:39:45 WARN  HttpChannel:482 - Could not send response error 500: java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z

2017-02-10 10:39:45 WARN  HttpChannel:384 - /jobs/

java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z

at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:484)

at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)

at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.spark_project.jetty.server.Server.handle(Server.java:499)

at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)

at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)

at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)

at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)

at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)


Hopefully this gives you some more context to help me fix the issue. 

P.S. I'm using Scala 2.11.7.


Thanks,

Alessandro


...

alepb...@gmail.com

unread,
Feb 10, 2017, 6:04:44 PM2/10/17
to BigDL User Group, alepb...@gmail.com, cherry...@intel.com, shell...@gmail.com
Although my questions from the previous message remain, I was able to figure out the main problem with loading my data.

It has nothing to do with Spark; rather, it is Scala itself, specifically Scala 2.11.

I have a function that I use to generate my vectors from my w2vec model. Among others, that function calls the method scala.collection.IndexedSeqOptimized$class.slice(), which seems to be incredibly slow.
Digging around on the web I found this:

which basically highlights a bug in Scala 2.11 that was fixed in 2.12.

Unfortunately, if I use Scala 2.12 BigDL seems to fail, so I have no option but to change my function so that the method doesn't get called.
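
For anyone hitting the same thing, the kind of change I mean is roughly this; it is only a minimal sketch, not my actual code (the name sliceFast and the Float element type are made up): keep the vector as a plain Array and copy ranges out with System.arraycopy instead of calling slice.

// Hypothetical replacement for something like vec.slice(from, until) on a plain Array;
// System.arraycopy avoids the slow IndexedSeqOptimized.slice path in Scala 2.11.
def sliceFast(vec: Array[Float], from: Int, until: Int): Array[Float] = {
  val out = new Array[Float](until - from)
  System.arraycopy(vec, from, out, 0, until - from)
  out
}

// e.g. instead of embedding.slice(i, i + embeddingDim):
// val window = sliceFast(embedding, i, i + embeddingDim)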

I just wanted to share this even though it might not be that helpful for others.

I'd still be interested in feedback on my previous message. Thanks!


Alessandro

zhichao

unread,
Feb 13, 2017, 2:19:38 AM2/13/17
to Alessandro Panebianco, BigDL User Group, cherry...@intel.com, shell...@gmail.com
BigDL requires some environment variables to be set via "source PATH_To_BigDL/scripts/bigdl.sh" before starting. Please refer to: https://github.com/intel-analytics/BigDL/wiki/Getting-Started.

As for the UI exception you provided, it's due to BigDL including multiple versions of servlet-api. We will create a patch for this (https://github.com/intel-analytics/BigDL/issues/458); for now, I think you can remove javax.servlet:servlet-api:2.5 to remedy the problem.

Thanks,
Zhichao


alepb...@gmail.com

unread,
Feb 13, 2017, 11:36:30 AM2/13/17
to BigDL User Group, alepb...@gmail.com, cherry...@intel.com, shell...@gmail.com
Zhichao,

how would you remove javax.servlet:servlet-api:2.5 temporarily? Should I recompile BigDL after editing the pom.xml?

Thanks

zhichao

unread,
Feb 13, 2017, 9:30:44 PM2/13/17
to Alessandro Panebianco, BigDL User Group, cherry...@intel.com, shell...@gmail.com
You can remove it directly in IntelliJ from Project Structure/Libraries, or modify the pom.xml.
I've just fixed it in https://github.com/intel-analytics/BigDL/issues/458.
Let me know if you are still facing the problem with the latest code.
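
If you want to double-check which servlet-api artifacts are actually being pulled in before and after the change, something like this should work from the BigDL source root (just a suggestion):

mvn dependency:tree -Dincludes=javax.servlet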

Thanks


alepb...@gmail.com

unread,
Feb 14, 2017, 1:49:44 PM2/14/17
to BigDL User Group, alepb...@gmail.com, cherry...@intel.com, shell...@gmail.com
I compiled the latest BigDL:

mvn clean package -DskipTests -P mac -P spark_2.0


Then I execute:

source scripts/bigdl.sh


And use the Scala shell (I can't use IntelliJ because I get memory problems which I'm not sure would be easy to solve). So:
scala -cp $BIGDL_HOME/dl/target/bigdl-0.1.0-SNAPSHOT-jar-with-dependencies-and-spark.jar -J-Xmx16g 

Then I start the Spark context:

2017-02-14 12:40:38 INFO  Server:379 - Started @23817ms
2017-02-14 12:40:38 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4041.
2017-02-14 12:40:38 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.36.XX.XXX:4041
2017-02-14 12:40:38 INFO  Executor:54 - Starting executor ID driver on host localhost
2017-02-14 12:40:38 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58878.
2017-02-14 12:40:38 INFO  NettyBlockTransferService:54 - Server created on 10.36.XX.XXX:58878
2017-02-14 12:40:38 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver,10.36.XX.XXX, 58878)
2017-02-14 12:40:38 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 10.36.XX.XXX:58878 with 8.4 GB RAM, BlockManagerId(driver, 10.36.XX.XXX, 58878)
2017-02-14 12:40:38 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 10.36.XX.XXX, 58878)
2017-02-14 12:40:38 INFO  ContextHandler:744 - Started o.s.j.s.ServletContextHandler@7bc2ae16{/metrics/json,null,AVAILABLE}



But as soon as I try to access the UI from the browser, I get this:

scala> 2017-02-14 12:40:50 WARN  HttpChannel:384 - /
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:484)
at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.spark_project.jetty.server.Server.handle(Server.java:499)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
2017-02-14 12:40:50 WARN  HttpChannel:482 - Could not send response error 500: java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
2017-02-14 12:40:50 WARN  HttpChannel:384 - /jobs/
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:484)
at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.spark_project.jetty.server.Server.handle(Server.java:499)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
2017-02-14 12:40:50 WARN  QueuedThreadPool:610 -
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletResponse.getStatus()I
at org.spark_project.jetty.server.handler.ErrorHandler.handle(ErrorHandler.java:112)
at org.spark_project.jetty.server.Response.sendError(Response.java:597)
at org.spark_project.jetty.server.HttpChannel.handleException(HttpChannel.java:487)
at org.spark_project.jetty.server.HttpConnection$HttpChannelOverHttp.handleException(HttpConnection.java:594)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:387)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
2017-02-14 12:40:50 WARN  QueuedThreadPool:617 - Unexpected thread death: org.spark_project.jetty.util.thread.QueuedThreadPool$3@639f749f in SparkUI{STARTED,8<=8<=200,i=2,q=0}


Not sure what else I should do...



zhichao

unread,
Feb 15, 2017, 1:46:26 AM2/15/17
to Alessandro Panebianco, BigDL User Group, cherry...@intel.com, shell...@gmail.com
Hi Alessandro,

I tested on Ubuntu and might try Mac later.

In the meantime, if you want to run on Spark interactively, the more standard way is to run it like this:

source scripts/bigdl.sh

$SPARK_HOME/bin/spark-shell --jars ~/bin/god/BigDL/dist/lib/bigdl-0.1.0-SNAPSHOT-jar-with-dependencies.jar

Thanks,
Zhichao




alepb...@gmail.com

unread,
Feb 15, 2017, 12:35:04 PM2/15/17
to BigDL User Group, alepb...@gmail.com, cherry...@intel.com, shell...@gmail.com
Zhichao,

using spark-shell with the BigDL jar solved the UI issue. I am now able to monitor my application. Thanks for the support.

val sc = new SparkContext(Engine.init(nodeNum, coreNum, true).get.setMaster("local[*]").setAppName("test").set("spark.task.maxFailures", "1&
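
Written out fully, the pattern I am using looks roughly like this; it is only a sketch with placeholder values for nodeNum, coreNum and the app name, and it assumes the BigDL 0.1 Engine.init signature that returns an Option[SparkConf]:

// Assumptions: Engine.init(node, core, onSpark) returns Option[SparkConf] (BigDL 0.1),
// and the import path below is the usual one for Engine.
import com.intel.analytics.bigdl.utils.Engine
import org.apache.spark.SparkContext

val nodeNum = 1   // placeholder
val coreNum = 4   // placeholder
val conf = Engine.init(nodeNum, coreNum, true).get
  .setMaster("local[*]")
  .setAppName("test")
  .set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)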