I simply followed the training procedure in "Training Tesseract"
(http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract). It
turned out not to be that hard, and it really improved performance in
my case.
I think the most critical part of successful, useful training is
generating the training images. I manually cropped many character
regions from real-world number-plate photos, resized each crop to
roughly the same number-plate size, thresholded them, and placed all
of these plate images (containing characters only) on a single-page
image, leaving enough space between lines (the image ends up very
large). I saved the result as a TIFF and used it as the training
image. The remaining steps are as described in the instructions. You
will also need to create your dictionary data by whatever means suits
you.
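To make the image-preparation step concrete, here is a rough sketch of
the resize / threshold / stack-onto-one-page logic. It is not the code
I actually used. It works on grayscale images held as nested lists
(rows of 0..255 values) so that it runs with plain Python; in practice
you would load the crops and write the final TIFF with an imaging
library. The function names, the fixed cutoff of 128, and the default
sizes are all assumptions chosen for illustration.

```python
# Sketch only: grayscale images are nested lists of ints in 0..255.
WHITE, BLACK = 255, 0


def resize_nearest(img, width, height):
    """Nearest-neighbour resize of a nested-list image."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def threshold(img, cutoff=128):
    """Binarise: pixels darker than cutoff become black, the rest white."""
    return [[BLACK if p < cutoff else WHITE for p in row] for row in img]


def compose_page(crops, width=200, height=40, line_gap=30, margin=20):
    """Stack preprocessed crops vertically on one white page.

    line_gap is the blank space left between consecutive plates;
    margin is the white border around the whole page.
    """
    page_w = width + 2 * margin
    page = [[WHITE] * page_w for _ in range(margin)]  # top margin
    for i, crop in enumerate(crops):
        if i > 0:
            # blank rows between lines of characters
            page.extend([WHITE] * page_w for _ in range(line_gap))
        plate = threshold(resize_nearest(crop, width, height))
        for row in plate:
            page.append([WHITE] * margin + row + [WHITE] * margin)
    page.extend([WHITE] * page_w for _ in range(margin))  # bottom margin
    return page
```

The page height grows with every crop you add, which is why the final
image gets very large. The returned page still has to be saved as a
TIFF before it can be used for training.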
For your case, I think your training image will need to contain lots
of preprocessed handwritten characters (how many is enough?). The
other steps are just the same.