Hi Ryan,
I can suggest the following:
- Use higher resolution and don't use JPEG. At this resolution and compression level you are doomed to poor OCR results because the character strokes are literally ruined. It's not clear whether your current camera can do better, but I suspect it can; at the very least it should be able to use a lower JPEG compression level (higher quality). So you probably won't need another cam.
- A fixture... Mmm, I don't think it's necessary. Most of your target text is easy to distinguish and localize. If you just provide good lighting and focus and position the camera sensibly, that's enough. The rest can be done with good old ImageMagick. OTOH, if you need to process thousands of cards, a fixture would simply be convenient.
- Training. No, it won't help at all. Your digits are very similar to what the stock (English) traineddata files already contain.
- Cropping. If your typical photos contain a lot of complex surroundings, it's necessary to strip that off. If it's just the card itself, Tess should work well.
- Rotation. No need. Tess can handle it well, even for the skew level you have shown in your image. See my results below.
- PIN. Here you'd probably need to work in the color domain. The scratch-off leftovers would definitely need to be filtered out. Show us a color variant.
In fact, show us your entire unedited source image, and maybe a couple of other images too. The community will probably be able to help you better then.
What I have achieved so far:
"2rsz.jpg" - Your source image upscaled 4x. This helps mitigate those destructive JPEG compression artifacts a bit.
>tesseract inet012\2rsz.jpg inet012\2rsz.jpg -psm 7 -c tessedit_char_whitelist=0123456789
Result: "2rsz.jpg.txt"
All digits of the number are recognized perfectly. Don't expect it to work well for the PIN part, though - it requires cleaning first.
I used the single-line page segmentation mode (-psm 7) and restricted the possible characters with a whitelist.
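If you want to script these two steps, here is a minimal sketch in Python that only builds the command lines. Treat it as an illustration under a few assumptions: the upscaling is done with ImageMagick's `convert ... -resize 400%` (I did not say above which tool produced "2rsz.jpg"), the input name "2.jpg" and the helper names `upscale_command` and `ocr_command` are made up, and newer Tesseract releases (4.x and later) spell the option `--psm` rather than `-psm`.

```python
import shlex


def upscale_command(src, dst, factor=4):
    """Build an ImageMagick command that enlarges `src` by an integer factor.

    Assumption: ImageMagick's `convert` is on PATH; 4x is expressed as 400%.
    """
    return ["convert", src, "-resize", f"{factor * 100}%", dst]


def ocr_command(image, out_base, whitelist="0123456789", legacy_flags=True):
    """Build a Tesseract command for one text line with a restricted charset.

    legacy_flags=True uses `-psm` as in the command above (Tesseract 3.x);
    set it to False to emit `--psm` for newer releases.
    """
    psm_flag = "-psm" if legacy_flags else "--psm"
    return [
        "tesseract", image, out_base,
        psm_flag, "7",  # page segmentation mode 7: treat image as a single text line
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


if __name__ == "__main__":
    # "2.jpg" is a hypothetical name for the original photo.
    for cmd in (upscale_command("2.jpg", "2rsz.jpg"),
                ocr_command("2rsz.jpg", "2rsz.jpg")):
        print(shlex.join(cmd))
```

The lists can be passed directly to `subprocess.run(cmd, check=True)` once both tools are installed; as with the command above, Tesseract writes its result to the output base plus ".txt".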