How to use my own traineddata language in OCR process?


Clint William Theron

Aug 23, 2019, 11:43:18 PM
to tesseract-ocr
Hi. I have a web-app and I'm using the Tesseract CDN like so:



I then try to add the path to my own created traineddata language like so:

const worker = new TesseractWorker({
    langPath: 'https://lottoticketscanner.iclips.co.za/assets/tesseract/langs-folder/',
});

and finally I call the recognize method like so:

worker.recognize(canvas.toDataURL('image/png'), 'eng')
    .progress(progress => console.log('progress', progress))
    .then(result => console.log('result', result.text))
    .finally(() => worker.terminate());

This doesn't seem to use my language (which I created myself). I get the same output as when I remove the langPath, so I can clearly see that my language is not being used in the recognition process. I know this because I tested my traineddata on a Windows desktop, where I get the results I actually want. What needs to change in my code so that my custom traineddata, and only mine, is used?

Thanks.

Shree Devi Kumar

Aug 23, 2019, 11:56:30 PM
to tesseract-ocr
You can give your custom traineddata file a different name, e.g. mycustom.traineddata, copy it to your tessdata folder (the one TESSDATA_PREFIX refers to), and then use 'mycustom' instead of 'eng' in your program.
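For a browser setup, the same naming idea can be pictured as the language code being appended to langPath to form the URL that is requested. The sketch below is only an illustration: `traineddataUrl` is a hypothetical helper, and both the joining rule and the optional `.gz` suffix are assumptions rather than confirmed behaviour of any particular tesseract.js release.

```javascript
// Hypothetical helper: shows how a language code and a langPath could combine
// into the URL that gets requested. The joining rule and the optional ".gz"
// suffix are assumptions, not confirmed behaviour of any tesseract.js release.
function traineddataUrl(langPath, lang, gzip = false) {
  const base = langPath.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/${lang}.traineddata${gzip ? '.gz' : ''}`;
}

const langPath = 'https://lottoticketscanner.iclips.co.za/assets/tesseract/tessdata/';
console.log(traineddataUrl(langPath, 'mycustom'));       // uncompressed file name
console.log(traineddataUrl(langPath, 'mycustom', true)); // gzip-compressed variant
```

If the library in use requests the compressed variant, a plain mycustom.traineddata on the server would not be found even though the folder is correct, so it may be worth comparing both printed URLs with the failing request shown in the browser's network tab.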

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/8a823afb-fde5-43aa-a643-2cf69462c2f6%40googlegroups.com.

Clint William Theron

Aug 24, 2019, 9:32:25 AM
to tesseract-ocr
Thanks for your answer. I have a different error now, though. I'm not sure what you mean by the tessdata folder: I'm using the Tesseract CDN in an HTML page served from an HTTP web server. The following images illustrate my current problem:

Untitled.jpg

Untitled.png


The first image illustrates the type of server and directory structure I'm using and the second image shows the error. What is the resolution? I know I'm close now though, thanks to you.

My current code looks like so:

html
js
const worker = new Tesseract.TesseractWorker({
    langPath: 'https://lottoticketscanner.iclips.co.za/assets/tesseract/tessdata/',
});

worker
    .recognize(cameraSensor2.toDataURL('image/png'), 'mycustom')
    .progress(progress => console.log('progress', progress))
    .then(result => console.log('result', result))
    .finally(() => worker.terminate());

Thank you.                    

Shree Devi Kumar

Aug 24, 2019, 10:44:12 AM
to tesseract-ocr
Check that mycustom.traineddata is available in https://lottoticketscanner.iclips.co.za/assets/tesseract/tessdata/
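One way to run that check from the page itself is to send a HEAD request for each file name the library might ask for. This is a rough sketch under stated assumptions: `checkTraineddata` is a hypothetical helper (not part of tesseract.js), the two candidate file names are guesses, and the fetch function is passed in so the sketch can also run against a stub.

```javascript
// Hypothetical helper: asks the server whether each candidate file exists.
// `fetchFn` is injected so the sketch runs in a browser or against a stub;
// nothing here is part of tesseract.js, and the candidate names are guesses.
async function checkTraineddata(langPath, lang, fetchFn) {
  const base = langPath.replace(/\/+$/, ''); // drop trailing slashes
  const candidates = [
    `${base}/${lang}.traineddata`,
    `${base}/${lang}.traineddata.gz`,
  ];
  const report = [];
  for (const url of candidates) {
    try {
      const res = await fetchFn(url, { method: 'HEAD' });
      report.push({ url, ok: res.ok, status: res.status });
    } catch (err) {
      // A thrown error usually points to a network or cross-origin failure.
      report.push({ url, ok: false, status: null, error: String(err) });
    }
  }
  return report;
}
```

In the browser console this could be called as `checkTraineddata(langPath, 'mycustom', (url, init) => fetch(url, init)).then(console.table)`. A 404 for one name and a 200 for the other would suggest a file-name mismatch, while a thrown error would point at the network or at cross-origin restrictions instead.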



--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Clint William Theron

Aug 24, 2019, 12:58:55 PM
to tesseract-ocr
The traineddata file is available in that location. I took a screenshot of the current error:

Untitled.png

I changed the name to custom.traineddata, but that's not relevant to the problem. I also changed the location of the file to see if that makes a difference, but it didn't:

const worker = new Tesseract.TesseractWorker({
    langPath: 'https://iclips.co.za/images/tessdata/',
});

worker.recognize(cameraSensor2.toDataURL('image/png'), 'custom')...

Do you see what I'm missing here?
Thanks already

Shree Devi Kumar

Aug 24, 2019, 1:13:01 PM
to tesseract-ocr
I have not used the Tesseract CDN in an HTML page on an HTTP web server.

The error says that the traineddata file cannot be found. You need to check the value of TESSDATA_PREFIX and put your custom traineddata there.


Clint William Theron

Aug 24, 2019, 1:34:00 PM
to tesseract-ocr
I don't know where to find the TESSDATA_PREFIX value since I'm using the Tesseract CDN on an HTTP web server. What did you do? Did you create a Node.js app? I installed Tesseract on Windows 10, replaced the traineddata file in the tessdata directory, and it worked. I'm looking to build an online solution, though. I got started and found out about the custom traineddata idea from the following link:


It's not necessary to use the CDN or even an HTML web page. The solution just needs to work online and use my custom traineddata language.

Clint William Theron

Aug 24, 2019, 1:38:11 PM
to tesseract-ocr
Where do I find the value of TESSDATA_PREFIX? 

Clint William Theron

Aug 24, 2019, 6:53:29 PM
to tesseract-ocr
Check here:
mycustom.traineddata definitely points to the correct location. It must be something else. Would you help me figure it out, or at least give me a working solution? Thanks already.

Clint William Theron

Aug 25, 2019, 12:07:48 PM
to tesseract-ocr
If you recently tested the above link, it required authentication, but I have removed that now, so the link should work. Sorry for any inconvenience.

Shree Devi Kumar

Aug 26, 2019, 4:56:49 AM
to tesseract-ocr

Please file your issue in that repo.


Clint William Theron

Aug 27, 2019, 1:11:22 PM
to tesser...@googlegroups.com
Which group am I in now? I don't understand the groups. I thought https://groups.google.com/forum/#!forum/tesseract-ocr was the correct group to ask about Tesseract.
