Deserialize Header Failed


Memeroni

Oct 14, 2023, 6:24:02 AM
to tesseract-ocr
Hey folks, I downloaded tesseract tonight and I'm having an issue I can't get past. The error output is as follows:

Deserialize header failed: ☺
First document cannot be empty!!
num_pages_per_doc_ > 0:Error:Assert failed:in file ../../../src/ccstruct/imagedata.cpp, line 704

I am using a tif file as my raw image source, and I have tried two ways of generating it. The first is taking a screenshot with Snipping Tool, pasting it into GIMP, and saving it as a tif; I also tried Print Screen instead of Snipping Tool. The second is taking a screenshot with Snipping Tool, saving it as a .png, then converting it to .tif on the ImageMagick command line. I am creating the box file like so:

tesseract 9.tif 9 makebox

I then edit the box file to make sure it accurately represents the characters on the screen. I have also tried creating the box file and leaving it unedited to see whether that resolves the issue; it does not. I then create the lstmf file like so:

tesseract 9.tif 9 --psm 6 lstm.train

I then try to run lstmtraining or lstmeval and I get the header error every time. I am using version 5.3.3, but I have also tried v4.1, recreating all the files, and I still got the same issue. Does anyone know why I'm getting this error and how to resolve it? About to give up with tesseract because this shit does not work out of the box. I am following Google's instructions to a T, so either I overlooked something crucial that is ruining my lstmf file or this shit just does not work for me. Appreciate any help that can be provided.
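
One thing that may be worth ruling out, since the post does not show the actual lstmtraining or lstmeval command: the `--train_listfile` and `--eval_listfile` options expect a plain-text file containing one .lstmf path per line, not the .lstmf file itself. If a binary .lstmf were passed there directly, its bytes would be read as file names, which could plausibly produce a garbage name such as ☺ after "Deserialize header failed:". This is a guess, not a confirmed diagnosis. A minimal Python sketch for writing and sanity-checking such a list file follows; the helper names `write_listfile` and `check_listfile` are hypothetical and not part of Tesseract.

```python
from pathlib import Path


def write_listfile(lstmf_paths, listfile):
    """Write one .lstmf path per line to a plain-text list file."""
    lines = [str(Path(p)) for p in lstmf_paths]
    Path(listfile).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def check_listfile(listfile):
    """Return a list of problems found in a list file (empty list = looks fine)."""
    raw = Path(listfile).read_bytes()
    # A real list file is plain text; NUL bytes or undecodable bytes suggest
    # that a binary file (for example the .lstmf itself) was passed instead.
    if b"\x00" in raw:
        return [f"{listfile} looks binary; is it an .lstmf file itself?"]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return [f"{listfile} is not plain text; is it an .lstmf file itself?"]

    problems = []
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    if not entries:
        problems.append(f"{listfile} lists no files")
    for entry in entries:
        path = Path(entry)
        if not path.is_file():
            problems.append(f"missing: {entry}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {entry}")
    return problems
```

For the files in this post, that would mean calling `write_listfile(["9.lstmf"], "list.txt")`, confirming that `check_listfile("list.txt")` returns an empty list, and then passing `list.txt` (an assumed file name) to `--train_listfile` or `--eval_listfile`.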

Zdenko Podobny

Oct 14, 2023, 8:39:59 AM
to tesser...@googlegroups.com
Hello,

tesseract works out of the box.

What does not work is users downloading Tesseract at night and jumping straight into Tesseract training. Training requires knowledge and experience that you will not get by following random internet tutorials (most of them are outdated and only pretend to be successful in order to monetize a video, blog, etc.).

The better approach is to read the official Tesseract documentation, read this forum, and understand Tesseract's limitations (yes, like all software, it has limitations).
Then you can make an informed decision about whether training makes sense or not. Or ask more experienced users for advice (if you are willing to provide details of what you are trying to achieve, e.g. input images).

Otherwise, you are alone with your problems. And that is not because of Tesseract.

Zdenko


On Sat, 14 Oct 2023 at 12:23, Memeroni <ericpi...@gmail.com> wrote:

Memeroni

Oct 14, 2023, 9:03:05 AM
to tesseract-ocr
You say it's user error, but you have not pointed to a single mistake I've made in my process. Please reply back with something helpful or gladly piss off :)