Hello,
I'm building an OCR tool to read selected fields from invoices. I used Tesseract, and the problems I'm facing are:
1) I'm not able to get table structures as they appear in the image. I was expecting at least a pipe symbol between columns, which would help in parsing the text.
2) A few characters were not extracted correctly. How can I improve the quality? Does training Tesseract 4 help?
3) Why would you train Tesseract 4 additionally?
4) Is there any option I can use to keep the white space between words and the text alignment as they are in the image after conversion?
I have spent almost 1 month on this and have only been able to build an OCR tool with 40% accuracy.
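
To make point 1 concrete, here is a rough sketch of the kind of output I am after. It assumes per-word bounding boxes are available (for example from OCR output that reports a position for every word). The function name `rows_with_pipes`, the `gap` and `line_tol` thresholds, and the sample boxes are all made up for illustration and are not part of Tesseract.

```python
def rows_with_pipes(words, gap=40, line_tol=10):
    """Group word boxes into text lines and put ' | ' where the
    horizontal gap between neighbouring words looks like a column break.

    words: list of dicts with 'text', 'left', 'top', 'width' (pixels).
    gap: assumed minimum pixel gap that counts as a column break.
    line_tol: assumed vertical tolerance for words on the same line.
    """
    # Sort top to bottom, then left to right.
    ordered = sorted(words, key=lambda w: (w["top"], w["left"]))

    # Group words whose 'top' is close to the first word of the current line.
    lines = []
    for w in ordered:
        if lines and abs(w["top"] - lines[-1][0]["top"]) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])

    out = []
    for line in lines:
        line.sort(key=lambda w: w["left"])
        text = line[0]["text"]
        for prev, cur in zip(line, line[1:]):
            space = cur["left"] - (prev["left"] + prev["width"])
            # Wide gap: treat as a column boundary; otherwise a normal space.
            text += (" | " if space > gap else " ") + cur["text"]
        out.append(text)
    return out


# Hypothetical word boxes for a two-row, two-column table.
sample = [
    {"text": "Item", "left": 10, "top": 100, "width": 40},
    {"text": "Qty", "left": 200, "top": 101, "width": 30},
    {"text": "Blue", "left": 10, "top": 130, "width": 40},
    {"text": "pen", "left": 55, "top": 131, "width": 30},
    {"text": "2", "left": 200, "top": 130, "width": 10},
]
for row in rows_with_pipes(sample):
    print(row)
```

The thresholds would need tuning per invoice layout; a fixed pixel gap is only a guess and will not work across different scan resolutions.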