Tesseract.js and traineddata language.


Clint William Theron

Sep 1, 2019, 7:04:45 PM
to tesseract-ocr
Hey. I started a Gitpod and built a tesseract.js app on a Node.js server using this tutorial. I now want to use my own custom traineddata language file. I imagine I should copy my trained file somewhere into the project, but I'm not sure. I tried replacing the eng.traineddata file located in /tests/assets/traineddata/ but, on Windows 10, that didn't work. What should I do? Any advice would be greatly appreciated.
Thanks

Clint William Theron

Sep 2, 2019, 5:07:20 PM
to tesseract-ocr
Correction:

Hey. I started a Gitpod and built a tesseract.js app on a Node.js server using this tutorial. I now want to use my own custom traineddata language file. How do I do that?
Thanks

Clint William Theron

Sep 3, 2019, 4:21:54 PM
to tesseract-ocr
Just give me a clue!

Timothy Snyder

Sep 3, 2019, 4:28:23 PM
to tesser...@googlegroups.com
10 seconds of investigation yielded an FAQ page from the repo explaining how tesseract.js maintains .traineddata files.


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/57177753-a4f5-4411-900b-ac88e1676fec%40googlegroups.com.

Timothy Snyder

Sep 3, 2019, 4:28:41 PM
to tesser...@googlegroups.com

Clint William Theron

Sep 4, 2019, 12:01:39 PM
to tesseract-ocr
Thank you. I truly appreciate it. Coming up with the most efficient search term is the key.



Clint William Theron

Sep 4, 2019, 12:54:39 PM
to tesseract-ocr
Actually, that doesn't answer my question. It only says where tesseract stores the .traineddata file after download and not how to set the langPath. I tried to set the langPath like this:

const worker = new TesseractWorker({
  corePath: '../../node_modules/tesseract.js-core/tesseract-core.wasm.js',
  langPath: lang_path
});

worker
  .recognize(file, 'cus')
  .progress(function (packet) {
    console.info(packet)
    progressUpdate(packet)
  })
  .then(function (data) {
    console.log(data)
    progressUpdate({ status: 'done', data: data })
  })



but I get errors: 
* Error opening data file ./cus.traineddata
* Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Thanks for anything...
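
One way to narrow down an error like the one above is to check, before creating the worker, whether the directory passed as langPath really contains the language file. The following is only a minimal sketch under stated assumptions: findTrainedData is a hypothetical helper (not part of tesseract.js), it assumes the code runs in Node.js with langPath pointing at a local directory, and it checks for both cus.traineddata and a gzipped cus.traineddata.gz because which name the worker looks for may depend on the tesseract.js version in use.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, not part of tesseract.js.
// Resolves langDir to an absolute path and reports which traineddata
// file (plain or gzipped) exists there for the given language code.
// Returns null when neither file is present.
function findTrainedData(langDir, lang) {
  const absDir = path.resolve(langDir);
  const candidates = [`${lang}.traineddata`, `${lang}.traineddata.gz`];
  for (const name of candidates) {
    const full = path.join(absDir, name);
    if (fs.existsSync(full)) {
      return { langPath: absDir, file: full, gzip: name.endsWith('.gz') };
    }
  }
  return null;
}
```

Calling findTrainedData(lang_path, 'cus') and getting null would suggest the file is not where the relative path resolves to, in which case passing the returned absolute langPath (once the file is in place) may be worth trying. Whether that resolves the TESSDATA_PREFIX message is not confirmed by anything in this thread.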

Clint William Theron

Sep 4, 2019, 2:24:02 PM
to tesseract-ocr
Intuitively I know it answers my question, but I fail to see the answer. Here's what went through my mind as I read your link: "I think a Gitpod is a Node.js server, so that means the file should be in the fs where the command was executed. The command got executed in the demo.html file, which is located in the browser directory, but there is no .traineddata in that folder. Maybe the command got executed in the /dist directory, because at the beginning of the script we included the following

<script src="/dist/tesseract.dev.js"></script>

but if so I don't see the directory in the project..."

It's about here that I get lost. I now think maybe I should declare the langPath, but I did that and I already told you what happens...

Guys, help me out here, because after I get this right I still need to work on my .traineddata file itself. It's working but it's limited. I just made it to get started...

Thanks already :-)

ElGato ElMago

Sep 4, 2019, 11:58:47 PM
to tesseract-ocr
Why don't you ask questions over there?  I guess you've been advised so.

On Thursday, September 5, 2019 at 3:24:02 AM UTC+9, Clint William Theron wrote: