Hi Junye,
I now have a workstation with 36 cores (Intel(R) Xeon(R) E7-4820 v2 @ 2.00GHz) and 32 GB of memory, running RHEL 7.3.
My training text is about 29 MB and contains 9,470,568 characters.
Each .tif file is about 2.5 GB (the size varies slightly between fonts), and generating one .tif file takes about 12 hours.
Generating one .lstmf file from a .tif file then takes about 40 hours.
These are my commands:
/usr/local/bin/tesseract /root/tesseract_train/tif_and_box/lyq_chn.ReejiCloudYuanXiGBK.exp0.tif /root/tesseract_train/lstm/aaa/ReejiCloudYuanXiGBK.exp0 /usr/share/tesseract/4/tessdata/configs/lstm.train /usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config > /root/tesseract_train/lstmlogs/ReejiCloudYuanXiGBK.log 2>&1
/usr/local/bin/tesseract /root/tesseract_train/tif_and_box/lyq_chn.MSmartPRC.exp0.tif /root/tesseract_train/lstm/aaa/MSmartPRC.exp0 /usr/share/tesseract/4/tessdata/configs/lstm.train /usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config > /root/tesseract_train/lstmlogs/MSmartPRC.log 2>&1
/usr/local/bin/tesseract /root/tesseract_train/tif_and_box/lyq_chn.SimSun.exp0.tif /root/tesseract_train/lstm/aaa/SimSun.exp0 /usr/share/tesseract/4/tessdata/configs/lstm.train /usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config > /root/tesseract_train/lstmlogs/SimSun.log 2>&1
As shown in the screenshot, I found that a single tesseract process only uses one core.
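Since each tesseract process keeps only one core busy, and the three commands above work on different fonts independently, they could in principle run at the same time instead of one after another. The following is a minimal bash sketch of that idea, not a tested recipe: `make_lstmf` and `make_all_lstmf` are hypothetical helper names, and the `TESSERACT`, `TRAIN_DIR` and `TESSDATA` variables are assumptions that simply default to the paths used in the commands above. It also assumes the machine has enough RAM and disk bandwidth to process several ~2.5 GB .tif files at once.

```shell
#!/usr/bin/env bash
# Sketch (assumption): run one tesseract process per font in parallel.
# Defaults are the paths from the manual commands; override them via the environment.
TESSERACT="${TESSERACT:-/usr/local/bin/tesseract}"
TRAIN_DIR="${TRAIN_DIR:-/root/tesseract_train}"
TESSDATA="${TESSDATA:-/usr/share/tesseract/4/tessdata}"

# Hypothetical helper: build the .lstmf file for one font,
# using the same arguments as the manual commands, logging to lstmlogs/<font>.log.
make_lstmf() {
  local font="$1"
  "$TESSERACT" "$TRAIN_DIR/tif_and_box/lyq_chn.$font.exp0.tif" \
    "$TRAIN_DIR/lstm/aaa/$font.exp0" \
    "$TESSDATA/configs/lstm.train" \
    "$TESSDATA/scripts/lang/lyq_chn/lyq_chn.config" \
    > "$TRAIN_DIR/lstmlogs/$font.log" 2>&1
}

# Hypothetical helper: start every font as a background job, wait for all of them,
# and return non-zero if any job failed.
make_all_lstmf() {
  local font pids=() status=0
  for font in "$@"; do
    make_lstmf "$font" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

Usage, after sourcing the script: `make_all_lstmf ReejiCloudYuanXiGBK MSmartPRC SimSun`. Run one after another at roughly 40 hours each, three fonts take about 120 hours; run concurrently, the wall-clock time would ideally approach that of the slowest single font, provided memory and disk I/O keep up.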
Here is the output of tesseract --version:
This is far too time-consuming. Is there any other way to speed it up?
On Tuesday, November 27, 2018 at 5:27:44 PM UTC+8, Junye Li wrote: