Tesseract in containers

Chris G
Oct 19, 2019, 7:17:07 AM
to tesseract-ocr
Greetings,

I hope this question is not too general; I am really just looking for others' experiences.

We are running one of the latest versions of Tesseract in containers in a Kubernetes cluster. Performance is not great. We are converting PDFs into searchable PDFs, and generating the searchable PDF is the step that takes the longest.

~30 seconds per page.

Each pod/container has 2 cores and 4 GB of memory.

We are experimenting with various configurations (cores, memory, and now thread count), based on what I have read here and on GitHub.

For those of you running in containers, what are you setting your resources to? I expect a range of answers and am just looking for what is typical.

We are planning to test the following configurations.

Test 1
Tesseract thread count: 2
Pod core count: 2
Pod memory: 2
Job count: 2

Test 2
Tesseract thread count: 1
Pod core count: 1
Pod memory: 2
Job count: 1

Test 3
Tesseract thread count: 1
Pod core count: 1
Pod memory: 1
Job count: 1


Job count is the number of documents a pod works on concurrently; jobs are assigned by our orchestration engine.
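
To compare configurations like these, the timing could be done with something like the following. This is only a minimal sketch under several assumptions: the Tesseract build uses OpenMP, so the standard OMP_THREAD_LIMIT environment variable caps its thread count; each input is a single page image rather than a whole document; and the helper names (time_command, run_config, tesseract_pdf_cmd) are made up for illustration. Pod cores and memory are set on the Kubernetes side, so the script only varies thread count and job count.

```python
import os
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_command(cmd, thread_limit):
    """Run cmd once with OMP_THREAD_LIMIT set; return elapsed seconds."""
    # Assumption: the tesseract binary was built with OpenMP support.
    env = dict(os.environ, OMP_THREAD_LIMIT=str(thread_limit))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_config(make_cmd, pages, thread_limit, job_count):
    """Process pages with job_count concurrent workers.

    Returns (total wall-clock seconds, list of per-page seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=job_count) as pool:
        per_page = list(pool.map(
            lambda page: time_command(make_cmd(page), thread_limit), pages))
    return time.perf_counter() - start, per_page


def tesseract_pdf_cmd(page_image):
    """Build 'tesseract <image> <output base> pdf' for a searchable PDF."""
    output_base = os.path.splitext(page_image)[0]
    return ["tesseract", page_image, output_base, "pdf"]


if __name__ == "__main__":
    pages = sys.argv[1:]  # page images passed on the command line
    if pages:
        # (thread count, job count) pairs taken from the planned tests
        for threads, jobs in [(2, 2), (1, 1)]:
            wall, per_page = run_config(tesseract_pdf_cmd, pages, threads, jobs)
            mean = sum(per_page) / len(per_page)
            print(f"threads={threads} jobs={jobs} "
                  f"wall={wall:.1f}s mean_per_page={mean:.1f}s")
```

Running it inside a pod sized for each test (for example `python bench.py page-*.png`, where bench.py is a placeholder file name) would give per-page and wall-clock numbers that can be compared across configurations.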


I have not found many posts from people sharing numbers like the above. I think 30 seconds per page is way too long, so we are trying to optimize everything we can.


Thanks in advance for any insight.