GitHub Issue #1492

CCExtractor.org CI Platform

Mar 10, 2023, 5:31:38 AM
to ccextra...@googlegroups.com
[BUG] Incorrect path for loading tesseract traineddata - ibrahim-akrab
Link to Issue: https://www.github.com/CCExtractor/ccextractor/issues/1492
ibrahim-akrab

CCExtractor version: 0.94

Necessary information

  • Is this a regression (i.e. did it work before)? {NO}
  • What platform did you use? {Linux}
  • What were the used arguments? {}

Video links

channel5-2018-02-12.ts from the TV Samples page

Additional information

ccextractor tries to load the Tesseract traineddata from the wrong location, then blames TESSDATA_PREFIX. Here is the output it produces:

    Opening file: /home/ibrahim/Downloads/channel5-2018-02-12.ts
    File seems to be a transport stream, enabling TS mode
    Analyzing data in general mode
    Error opening data file /usr/share/eng.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'eng'
    Tesseract couldn't load any languages!
    Failed TessBaseAPIInit4 -1

I checked the logic in ocr.c and found that probe_tessdata_location works fine. I traced the openat calls it makes for each candidate tessdata location by running

    strace -e trace=openat ./ccextractor ~/Downloads/channel5-2018-02-12.ts

and the result is as follows:

    Opening file: /home/ibrahim/Downloads/channel5-2018-02-12.ts
    openat(AT_FDCWD, "/home/ibrahim/Downloads/channel5-2018-02-12.ts", O_RDONLY) = 3
    File seems to be a transport stream, enabling TS mode
    Analyzing data in general mode
    openat(AT_FDCWD, "./tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
    openat(AT_FDCWD, "/usr/share/eng.traineddata", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    openat(AT_FDCWD, "/usr/share/eng.traineddata", O_RDONLY) = -1 ENOENT (No such file or directory)
    Error opening data file /usr/share/eng.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'eng'
    Tesseract couldn't load any languages!
    Failed TessBaseAPIInit4 -1

It checks the paths correctly and stops when it finds the directory at /usr/share/tessdata/, so I suspect the problem is in the TessBaseAPIInit4 call.
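To make the suspected mismatch concrete, here is a minimal sketch of how the failing file name could come about. It assumes, based only on the trace above, that the data path handed to Tesseract is joined directly with the language code and ".traineddata". The helper traineddata_path is hypothetical and is not part of ccextractor or the Tesseract API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from ccextractor or Tesseract): joins a data
 * directory and a language code into the file name that would be opened,
 * ASSUMING the library does a plain "<datapath>/<lang>.traineddata" join.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
static int traineddata_path(char *out, size_t size,
                            const char *datapath, const char *lang)
{
    assert(out != NULL && datapath != NULL && lang != NULL);

    size_t len = strlen(datapath);
    /* Add a separator only when datapath does not already end in one. */
    const char *sep = (len > 0 && datapath[len - 1] == '/') ? "" : "/";

    int n = snprintf(out, size, "%s%s%s.traineddata", datapath, sep, lang);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Under that assumption, a data path of "/usr/share/" with language "eng" produces /usr/share/eng.traineddata, which is exactly the file the trace shows failing, while "/usr/share/tessdata" produces /usr/share/tessdata/eng.traineddata. That would fit the probe locating the tessdata directory while its parent directory is what reaches the init call, though the trace alone does not prove it.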
