Hi,I am trying to run tesseract-ocr on invoices to detect user ID's, Invoice numbers, tax codes etc. I think tesseract has not been trained on this kind of data so i need to fine tune the network on my data. Now it will be a bit difficult for me to get labelled data to fine tune tesseract as stated in training-tesseract wiki page. So wanted to know if its possible to only tune the language model of tesseract-ocr in an unsupervised way just like the language models trained for English Language Understanding i.e. showing the language model just the pins and ids by passing the output generated at previous (t-1) timestep as input to current timestep (t).
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/276e6dcf-f0b5-43e0-a794-d1bb69c88857%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.