How to recognize specific symbols with Tess4.0


roberty...@gmail.com

Aug 1, 2017, 1:57:57 AM
to tesseract-ocr

Hello,

I'm trying to use Tess4.0 to recognize Simplified Chinese, with the following command-line arguments:
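  argc = 13;
  argv[1] = "E:/数据库/yanghui_results/yanghui_100_0.jpg";  // input image
  argv[2] = "E:/sample/01";                                  // output base name (.txt is appended)
  argv[3] = "-l";
  argv[4] = "chi_sim+eng";                                   // Simplified Chinese + English
  argv[5] = "--psm";                                         // single-dash -psm is deprecated in 4.0
  argv[6] = "7";                                             // treat the image as a single text line
  argv[7] = "--oem";
  argv[8] = "2";                                             // 2 = OEM_TESSERACT_LSTM_COMBINED; the CLI expects the numeric mode
  argv[9] = "--tessdata-dir";
  argv[10] = "../tessdata";
  argv[11] = "--user-words";
  argv[12] = "../tessdata/chi_sim.user-words";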

I use the chi_sim and eng traineddata files as the recognition languages, but some special symbols, such as '∠' (the angle sign), are not recognized correctly.


For example, with the image shown above as input, Tess4.0 returns the following result:
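如图, 在口ABCD中, 点E, F在AC上, 且乙ABE=乙CDF, 求证: BE=DF,
(The intended text means: "As shown in the figure, in parallelogram ABCD, points E and F lie on AC, and ∠ABE = ∠CDF. Prove: BE = DF.")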

As the result shows, the angle symbol '∠' is recognized as '乙', the parallelogram (rhomboid) symbol '▱' as '口', and the period '.' as a comma ','.


How can I get Tess4.0 to recognize these symbols correctly? Any help would be appreciated.