Caffe Shakespeare LSTM text generator producing nonsense (code attached)


Andrew Kyngdon

Jun 4, 2017, 7:24:57 AM
to Caffe Users
Hi All,

I'm new to NLP and I want to use Caffe for a project I'm working on. To try it out I used the Shakespeare text file from the Caffe2 site.

The problem is that it just produces a stream of seemingly random characters rather than text which approximates English in iambic pentameter. I've tried to isolate the cause but to no avail.

I trained one net (see the deploy.prototxt attached) for 200,000 iterations and it seemed to learn. I used two LSTM layers preceded by an "embed" layer (62 inputs, as this was the size of the "vocabulary", i.e. the number of unique characters, and 50 outputs). Training loss ended at 0.40 (down from 4.12 at the start), with a training accuracy of 0.8776. I didn't use dropout because, as I understand it, with per-character text generation you fit the data "hard".
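As a quick sanity check on those numbers: a net that spreads its probability evenly over all 62 characters has a cross-entropy of ln(62), which is close to the starting loss of 4.12. The snippet below is only a sketch of that arithmetic, and it assumes the training loss is a natural-log softmax cross-entropy (the loss layer itself is not shown in this post).

```python
import math

VOCAB_SIZE = 62  # number of unique characters, from the post

# Cross-entropy of a uniform guess over the vocabulary (natural log assumed).
uniform_loss = math.log(VOCAB_SIZE)

# Average probability given to the correct character at the final loss of 0.40.
final_prob = math.exp(-0.40)

print(round(uniform_loss, 3), round(final_prob, 3))
```

So the starting loss is what an untrained net should give, and the final loss corresponds to roughly two-thirds of the probability mass landing on the correct character on average.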

Attached are the Python scripts for the data layer (Input_Lang.py) and for generating the text (Predict_char.py). I suspect the problem lies in one or both of these. The Python data layer feeds the embed layer data in the T * N * ... form that an LSTM expects, where T is the number of timesteps in the sequence and N is the batch size. It also prepares the clip markers and labels for the LSTM. Each text string and its target are selected at random from the text.
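To make the layout described above concrete, here is a minimal sketch of building one time-major batch. It is not the attached Input_Lang.py: the names build_vocab and make_batch are hypothetical, and it uses plain nested lists rather than the NumPy arrays a Caffe Python layer would fill. It assumes the target is the input shifted by one character and that the clip marker is 0 on the first timestep of each sequence and 1 afterwards.

```python
import random


def build_vocab(text):
    """Map each unique character to an index, in sorted (reproducible) order."""
    return {c: i for i, c in enumerate(sorted(set(text)))}


def make_batch(text, char_to_idx, T, N, rng=random):
    """Return (data, clip, label), each a T x N nested list (time-major)."""
    if len(text) < T + 1:
        raise ValueError("text must contain at least T + 1 characters")
    data = [[0] * N for _ in range(T)]
    clip = [[0] * N for _ in range(T)]
    label = [[0] * N for _ in range(T)]
    for n in range(N):
        # Random window start; the last target index start + T stays in range.
        start = rng.randrange(len(text) - T)
        for t in range(T):
            data[t][n] = char_to_idx[text[start + t]]
            label[t][n] = char_to_idx[text[start + t + 1]]  # next character
            clip[t][n] = 0 if t == 0 else 1  # 0 marks the start of a sequence
    return data, clip, label


text = "to be or not to be"
vocab = build_vocab(text)
data, clip, label = make_batch(text, vocab, T=5, N=3, rng=random.Random(0))
```

Indexing is data[t][n], i.e. timestep first and batch column second, which matches the T * N ordering described above.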

The text generation code takes a "seed" character and builds a string of the requested length, running a forward pass through the net at each step to get the probabilities for the next character. The problem is that it produces a string like this:

GenerateText(100,'A')
 
AbYob;nChmXESobkk.bBbo,.K.;UCn:,GobnCKGnb',.KmblhkYb;Gnlbafd.;no;
.K.KbCKnCb;nCsKUwb;bvK;k.z.K.KK.nB.
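For reference, the generation loop described above can be sketched as follows. This is not the attached Predict_char.py: generate_text and step_fn are hypothetical names, and step_fn stands in for a single forward pass of the net (it takes the current character index and the clip marker and returns one probability per vocabulary entry). Sampling from the distribution, rather than taking the most likely character, is a choice made for this sketch.

```python
import random


def generate_text(length, seed_char, vocab, step_fn, rng=random):
    """Build a string of `length` characters starting from `seed_char`.

    vocab   -- list of characters, in the same index order used in training
    step_fn -- hypothetical stand-in for one forward pass:
               step_fn(char_index, cont) -> probabilities over vocab
    """
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    idx = char_to_idx[seed_char]
    out = [seed_char]
    for step in range(length - 1):
        cont = 0 if step == 0 else 1  # reset state only on the first step
        probs = step_fn(idx, cont)
        if len(probs) != len(vocab):
            raise ValueError("step_fn must return one probability per character")
        # Sample the next index and feed it back in on the next iteration.
        idx = rng.choices(range(len(vocab)), weights=probs, k=1)[0]
        out.append(vocab[idx])
    return "".join(out)


def cycle_step(idx, cont):
    """Toy model: always predicts the next vocabulary entry, wrapping around."""
    probs = [0.0, 0.0, 0.0]
    probs[(idx + 1) % 3] = 1.0
    return probs


print(generate_text(7, "A", ["A", "b", "c"], cycle_step))  # -> AbcAbcA
```

The sketch relies on two things: the index-to-character mapping at generation time has to be identical to the one used by the data layer, and the recurrent state has to carry over between steps (clip marker 1 after the first step).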

I'd appreciate any ideas as to what may be going wrong, and I'm happy to share any successful model that results.

Cheers,

Andrew
Predict_char.py
LangLSTM_Deploy.prototxt
Input_Lang.py