Alex Graves
Department of Computer Science
University of Toronto
Abstract
This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is
demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting
synthesis by allowing the network to condition its predictions on a text
sequence. The resulting system is able to generate highly realistic cursive
handwriting in a wide variety of styles.
...
Section 3 applies the prediction network to text from the Penn Treebank and Hutter Prize Wikipedia datasets.
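Independent of the LSTM details, the character-level prediction setup can be illustrated with a toy autoregressive loop: train a predictive distribution over the next character given the preceding context, then generate by repeatedly sampling from it. The sketch below stands in a bigram count model for the network's learned distribution; the function names are hypothetical, not from the paper.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    # Count next-character frequencies for each context character
    # (a crude stand-in for the LSTM's learned predictive distribution,
    # which conditions on the whole history rather than one character).
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, seed, length, rng=None):
    # Autoregressive generation: feed each sampled character back in
    # as the context for the next prediction, one data point at a time.
    rng = rng or random.Random(0)
    out = [seed]
    for _ in range(length):
        ctx = counts.get(out[-1])
        if not ctx:
            break
        chars, weights = zip(*ctx.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)
```

With a richer model in place of the bigram counts, the same loop is what produces the text and handwriting samples described in the paper.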
...
Table 2: Wikipedia Results (bits-per-character)
Train Validation (static) Validation (dynamic)
1.42 1.67 1.33
To put these results in context, the current winner of the Hutter Prize (a variant of the PAQ-8 compression algorithm [20]) achieves 1.28 BPC on the same data, a figure that includes the code required to implement the algorithm. Mainstream compressors such as zip generally exceed 2 BPC. A character-level RNN applied to a text-only version of the data (i.e. with all the XML, markup tags etc. removed) achieved 1.54 BPC on held-out data, improving to 1.47 when the RNN was combined with a maximum entropy model [24].
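Bits-per-character is simply the average negative base-2 log probability the model assigns to each observed character, so it measures both predictive quality and (by the source-coding connection) achievable compression. A minimal sketch, with hypothetical per-character probabilities standing in for a real model's outputs:

```python
import math

def bits_per_character(char_probs):
    # Average -log2(p) over the probabilities the model assigned to the
    # characters that actually occurred; lower is better. A uniform model
    # over a 256-symbol alphabet would score log2(256) = 8 BPC.
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# Hypothetical probabilities a model might assign to four observed characters.
probs = [0.5, 0.25, 0.9, 0.1]
bpc = bits_per_character(probs)
```

On this view, the gap between 1.67 (static) and 1.33 (dynamic) BPC reflects how much the network gains by continuing to adapt its weights to the validation text as it reads it.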