When someone sends in labels like the attached ones, I need to grab the model number. So far, the results from running Tesseract directly are dismal. I used an ImageMagick library to clean up the image a bit before sending it in, but if the image is rotated at ALL, the results are still dismal. Overall, I am just looking to increase accuracy.
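Since even slight rotation hurts so much, one option is to straighten the image before OCR. ImageMagick's own `-deskew` option (e.g. `-deskew 40%`) may already be enough. If it is not, here is a minimal sketch of the projection-profile idea many deskewers use. It assumes you have already binarized the label and collected the foreground (dark) pixels as `(x, y)` coordinates. `estimate_skew` is a hypothetical helper, and the ±10° search window and 0.5° step are guesses, not tuned values.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate text skew in degrees from foreground pixel coordinates.

    Projection-profile method: rotate the points by each candidate angle,
    bin the rotated y values into 1-pixel rows, and keep the angle whose
    row histogram is most sharply peaked (largest sum of squared counts).
    Returns theta such that text lines follow y ~ x * tan(theta) + c in the
    coordinate frame of `points`; rotating by -theta straightens them.
    """
    n_steps = int(round(2 * max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(n_steps + 1):
        a = -max_angle + i * step
        rad = math.radians(a)
        sin_a, cos_a = math.sin(rad), math.cos(rad)
        # Row index of each point after rotating it by angle a.
        rows = Counter(round(x * sin_a + y * cos_a) for x, y in points)
        # Aligned text packs into few rows, which maximizes this score.
        score = sum(c * c for c in rows.values())
        if score > best_score:
            best_angle, best_score = a, score
    # The best correcting rotation is -theta, so flip the sign.
    return -best_angle
```

Image coordinates usually have y growing downward, so check the sign against whatever rotation function you use before applying `-theta`. Each candidate angle is one pass over the points, so on a full-resolution label you may want to subsample them first.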
Steps I have taken:
1) Used a pre-processing library to clean up the image
2) Added a new config that turns off the dictionary and loads a words file containing all the different Samsung model numbers
3) Took my most promising pre-processed image, created a box file for it, and then ran "tesseract <image_name> <box_file_name> nobatch box.train" to train Tesseract not to miss the two characters it had missed. This caused a segmentation fault.
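A words file only biases Tesseract toward those strings; it can still return something that is not a real model number. One cheap complement, sketched here under assumptions, is to snap each token of the OCR output to the closest entry in the same model list using Python's standard `difflib`. The look-alike table (O/0, I/1, L/1, S/5, B/8, Z/2) and the 0.8 cutoff are guesses to tune on your own labels, and `fold` / `snap_to_model` are hypothetical names.

```python
import difflib
import re

# Assumed look-alike pairs; both the OCR text and the model list are folded
# with this table so that e.g. a letter O and a digit 0 compare as equal.
CONFUSABLE = str.maketrans("OILSBZ", "011582")


def fold(s):
    """Uppercase, drop non-alphanumerics, and merge look-alike characters."""
    return re.sub(r"[^A-Z0-9]", "", s.upper()).translate(CONFUSABLE)


def snap_to_model(ocr_text, models, cutoff=0.8):
    """Return the known model that best matches any OCR token, or None.

    Returns None when no token reaches `cutoff`, or when the winning folded
    form is shared by several models (they differ only by look-alikes).
    """
    by_key = {}
    for m in models:
        by_key.setdefault(fold(m), []).append(m)
    best_key, best_ratio = None, 0.0
    for token in ocr_text.split():
        tok = fold(token)
        if not tok:
            continue
        for key in by_key:
            r = difflib.SequenceMatcher(None, tok, key).ratio()
            if r > best_ratio:
                best_key, best_ratio = key, r
    if best_key is None or best_ratio < cutoff or len(by_key[best_key]) > 1:
        return None
    return by_key[best_key][0]
```

For example, with `models = ["UN55NU7100FXZA", "UN65NU7100FXZA"]`, the OCR line `MODEL: UN55NU71OOFXZA` (letter O read in place of zero) snaps to `UN55NU7100FXZA`. Returning `None` for ambiguous matches is deliberate, so a wrong model number is never reported with false confidence.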
Any hints or advice about how I can use Tesseract to grab this information with at least 50% accuracy would be GREATLY appreciated.
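To tell whether a change (pre-processing, config, training) actually moves you toward the 50% target, it helps to score every variant against the same set of labels whose model numbers you have typed in by hand. A minimal sketch, assuming your pipeline yields one `(expected, predicted)` pair per image with `None` when nothing was found; `levenshtein` and `score` are hypothetical helpers. It reports exact-match accuracy (the 50% figure) and mean character error rate, which shows progress even before whole model numbers come out right.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


def score(results):
    """Score (expected_model, predicted_model_or_None) pairs.

    Returns (exact-match accuracy, mean character error rate).
    """
    if not results:
        raise ValueError("no results to score")
    exact = sum(1 for expected, got in results if got == expected)
    # A missing prediction counts as the empty string, i.e. CER of 1.0.
    cer = sum(levenshtein(expected, got or "") / max(len(expected), 1)
              for expected, got in results)
    n = len(results)
    return exact / n, cer / n
```

For example, one correct read plus one image where nothing was found gives `(0.5, 0.5)`.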
Thanks!!