tesseract-ocr

Welcome,

Before sending an email to the group:

Please read the Frequently Asked Questions
Make sure you read the Tesseract documentation
Search internet sources (including this group) for a solution

If you have a problem:

Provide all steps (including input resources) for its replication.
Do not send a screenshot of the terminal. Send the logs or copy the text from the terminal.

testcoal, Misti Hamon (2)

Apr 18

Train Tesseract (german)

Scanned books? No help on training or choosing datasets, but, if these images are photoscanned book

Jayrajsinh Zala, Zdenko Podobny (2)

Apr 18

tesseract misleading in 8 and 6

Unfortunately, your post is very vague. Unless you provide a detailed description of what you are

Leder Extreme BR

Apr 18

Cursive letters

Hello, I'm testing tesseract and I'm not able to process texts that use cursive fonts. How do

achille sadjang

Apr 17

Tesseract to recognize images or shapes

Hello everyone, I have a concern: is it possible to train Tesseract to recognize images or shapes? If

Omar Samir

Apr 12

Fine-Tune Arabic Model

I have created a dataset with almost 200 million words. So there are about 20 million examples to

Mark Pellegrino, … Jeremiah (17)

Apr 11

Post OCR Verification and Editing

Hi Mark, Glad you found Scribe OCR useful. Regarding character support, all characters in the Windows

Nathan Bierema

Apr 11

Building from souce

I'm trying to build Tesseract from source using these instructions, but I believe I'm doing

Shatter, Jeremiah (2)

Apr 8

Recognition when font is known

Cropping the image to only include the relevant area can significantly improve performance in cases

Cain Pian, Jeremiah (3)

Apr 7

Is there a good way to change the recognition rate for such images?

Yes, I've seen a lot of discussion on this issue that ended up going nowhere, it might be helpful

Misti Hamon

Apr 5

Image preprocessing - textbook like layout

I'm hoping someone here can help. I'm working with a scan of a book with a textbook like

Jean-Marc Spaggiari, René JM Clais (5)

Apr 3

Shord word detection recommendations

Thanks for giving it a try! I ended up generating 11 versions of the same picture with very little

Cain Pian, Zdenko Podobny (3)

Apr 3

Does training new images increase the size of the traindata file?

Thanks for the reply I'm simply confused by the fact that training a large number of images didn

aum hren, Tom Morris (3)

Mar 29

english-arabic dictionary - transliteration text

Rather than using random web resources, I'd suggest using the official documentation. The most

Madhav Pandey, … Zdenko Podobny (14)

Mar 27

Getting Error: No such file or directory: 'data/foo/all-lstmf'

You can try custom images - see the example ocrd-testset.zip And follow the example from https://

roei shlezinger, Zdenko Podobny (2)

Mar 27

fine tuning on images

You can easily test your hypothesis by modifying Makefile[1] lines from tesseract "$<" $

Ajay Pandya, Zdenko Podobny (2)

Mar 27

Lack of accuracy on reading numbers

Always test the command line if there is an issue with the wrapper. tesseract -v tesseract 5.3.4-44-

inKi Wang, Zdenko Podobny (3)

Mar 26

Reading large gray images with only numbers yields incorrect results

Yes, we have suggestions for me to improve the accuracy of the results - they are already in the

Misti Hamon, Ger Hobbelt (2)

Mar 25

hOCR verification and editing plus non-word characters

In your scenario, I would check performance of both modern lstm (v4/v5 engine) and old "classic

Keith M, … Graham Toal (14)

Mar 21

advice for OCR'ing 9-pin dot matrix BASIC code

I believe that for fixed font width listings, it is preferable to segment the page into characters

Liam Doherty, … Tom Morris (5)

Mar 19

why are there no new trained models since 2018?

Thanks, that's helpful. Is the collaboration with Google ongoing then? Can you give me a sense of

Jan Ploska

Mar 16

Chinise characters.

Hello, I am making a transcrypt of YT wideos using tessaract. Images I input to tessaract look like

Quan Nguyen, JB Data313

Mar 13

VietOCR v6.3.0 & VietOCR.NET v6.3.0 Releases

VietOCR v6.13.0 & VietOCR.NET v6.11.0 Releases A Java/.NET WPF GUI frontend for Tesseract OCR

Ravil R, Zdenko Podobny (2)

Mar 13

Leptonica directory

It seems like you are not following the official documented way for compiling leptonica and tesseract

Roman Seidel, … Zdenko Podobny (7)

Mar 12

user patterns with tesserocr python API

One correction: I checked the example in the below mentioned url with the Tesseract executable and

Jan F

Mar 12

Some PDF readers see double spaces in tesseract PDF output

Dear readers, I'm experimenting with Tesseract 5.3.3.20231005 on Windows and I keep running into

Panumeth Khongsawatkiat

Mar 12

Training Tesseract 5 for a New Font in Thai not wroking

I tried to train Tesseract 5 with a new font in Thai but The BCER value keeps increasing. This is the

Mridul Davesar

Mar 12

LSTM training tesseract OCR high error rate

Hey everyone , I am train my own lstm model based using some specific images that I want tesseract to

Ali öksüzoglu

Mar 11

I can't create OCR traindata

Hello, I am trying to solve the Captcha in this image, but I am getting an error. Is there anyone who

thangaraj r

Mar 8

i got Failed to continue from: data/eng/eng_num_vert.lstm

Warning: LSTMTrainer deserialized an LSTMRecognizer! Error, data/eng/eng_num_vert.lstm is an integer

Minh Nguyen

Mar 7

How to get path tesseract_cmd

I'm using sam cli to build and deploy images to AWS ECR. The code snippet has been packaged into
