Hi Zdenko,
Thanks for your feedback!
I have implemented the following in Colab:
1/ Installed Tesseract OCR and pytesseract.
2/ Used pytesseract.image_to_string to convert the image of the scanned document to text (roughly as in the sketch below).
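This is a minimal sketch of what I ran; 'scan.png' is only a placeholder for my uploaded scanned page:

# Colab cell 1: install the Tesseract engine and the Python wrapper
!apt-get install -y tesseract-ocr
!pip install pytesseract

# Colab cell 2: run OCR on the scanned page
from PIL import Image
import pytesseract

img = Image.open('scan.png')   # placeholder path for my scanned document
text = pytesseract.image_to_string(img)
print(text)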
The output text looks like this:
sae S\Pewnowet refer Yo We Uniovetha, Bops
don't
a where MWAH ple Commvadityer gre. Avediarie tee wode
Onden OMe wol
'
and On Wigs kcale. of Oferakin,
nee. es:
[rer Bat Chain in Prd Vegelanie “roger |
SP in Pst Vegelasie “Wieder |
; AD Me ]8 inc ug Maer Contumneg
hom Nes “I —> ty
Uae | . Mere ed
Serigh Soma)
which does not make sense.
So I was asking whether there are ways to dig deeper into Tesseract's built-in model and understand the output of each layer, and then try some enhancements to decode this better.
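For example, the most I can get out of pytesseract itself is word-level confidence scores via image_to_data (a minimal sketch, reusing the same placeholder image), which shows where the recognition fails but not why:

from PIL import Image
import pytesseract
from pytesseract import Output

img = Image.open('scan.png')   # same placeholder scan as above

# image_to_data returns per-word bounding boxes and confidence scores
data = pytesseract.image_to_data(img, output_type=Output.DICT)
for word, conf in zip(data['text'], data['conf']):
    # conf is -1 for layout-only entries; skip those and empty strings
    if word.strip() and int(float(conf)) >= 0:
        print(word, conf)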
But to go deeper than that, I would need to understand the model in detail and be able to work with it in Colab, and I am not able to find any relevant documentation. All I could find is fine-tuning of the model from the command line, and that too only on Linux machines.
So if there is any reference on this, I would request you to share it.
Ruchika