Unrecognized argument "linedata_only"

thiyam...@gmail.com

Jul 21, 2018, 12:12:22 PM
to tesseract-ocr
My command is:

usr/share/tesseract-ocr/./tesstrain.sh \
--fonts_dir /usr/share/fonts \
--lang ben \
--linedata_only\
--noextract_font_properties \
--langdata_dir /home/jennil/Desktop/pro/langdata-master/ben\
--tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata –output_dir /home/jennil/Desktop/pro/output/ben_output\
--fontlist “Lohit Bengali”

and here is the error:

ERROR: Unrecognized argument --linedata_only--noextract_font_properties


Zdenko Podobny

Jul 21, 2018, 12:14:44 PM
to tesser...@googlegroups.com
Your command is wrong. You forgot to put a space there.

On Sat, Jul 21, 2018 at 18:12, <thiyam...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3aef2479-d04f-4b80-8d3b-abec3d4a9468%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

thiyam...@gmail.com

Jul 21, 2018, 12:30:14 PM
to tesseract-ocr
Can you please point out where to put the space?

Thank you.

Shree Devi Kumar

Jul 21, 2018, 1:42:41 PM
to tesser...@googlegroups.com
--linedata_only\

You need a space before the continuation mark \
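
For readers wondering why the space matters: the shell deletes a backslash followed by a newline, so the text on both sides is joined with nothing in between. The sketch below illustrates this with `show_args`, a hypothetical helper that only prints the arguments it receives (it is not part of Tesseract or tesstrain.sh).

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, in brackets, so the word boundaries become visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# No space before the backslash: the two options fuse into one word.
broken=$(show_args --linedata_only\
--noextract_font_properties)

# A space before the backslash keeps them as two separate arguments.
fixed=$(show_args --linedata_only \
--noextract_font_properties)

printf 'without space:\n%s\n' "$broken"
printf 'with space:\n%s\n' "$fixed"
```

The fused word in the first case is the same string reported in the error message at the top of the thread.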


thiyam...@gmail.com

Jul 22, 2018, 2:59:22 AM
to tesseract-ocr
Hello again. I fixed the command the way you said and that error is gone, but now the same "unrecognized argument" error occurs for output_dir.
The error is:
ERROR: Unrecognized argument -–output_dir

My command is:

/usr/share/tesseract-ocr/./tesstrain.sh \
--fonts_dir /usr/share/fonts \
--lang ben \
--linedata_only \
--noextract_font_properties \
--langdata_dir /home/jennil/Desktop/pro/langdata-master/ben \
--tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata \
-–output_dir /home/jennil/Desktop/pro/output/ben_output \
--fontlist “Lohit Bengali”


Please help.

Shree Devi Kumar

Jul 22, 2018, 6:57:30 AM
to tesser...@googlegroups.com
It needs two dashes.
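
In the command quoted in the question, the option is written as an ASCII hyphen followed by an en dash, which looks like two dashes on screen but is not the same as `--`. A minimal sketch of one way to spot such characters before running a command, assuming a `grep` that honours `LC_ALL=C`; `has_non_ascii` is a hypothetical helper name, not a Tesseract tool:

```shell
#!/bin/sh
# The first option is copied from the failing command: an ASCII hyphen
# followed by an en dash (U+2013). The second is typed with two hyphens.
pasted='-–output_dir'
typed='--output_dir'

# has_non_ascii is a hypothetical helper: it succeeds if the string
# contains any byte outside the printable ASCII range (space to tilde).
has_non_ascii() {
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

for opt in "$pasted" "$typed"; do
    if has_non_ascii "$opt"; then
        printf '%s: contains non-ASCII characters, retype it by hand\n' "$opt"
    else
        printf '%s: plain ASCII\n' "$opt"
    fi
done
```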




--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Jennil Thiyam

Jul 22, 2018, 7:16:17 AM
to tesser...@googlegroups.com
Oh, sorry for the mistake. I put two dashes, but it still says unrecognized.

Jennil Thiyam

Jul 22, 2018, 10:50:32 AM
to tesser...@googlegroups.com
You guys helped me, and now there is no error. But I don't know about the fonts: I tried to train Bengali with the "lohit-bengali" font, thinking it is already in the fonts folder, but I got

=== Starting training for language 'ben'
[Sun Jul 22 10:48:33 EDT 2018] /usr/bin/text2image --fonts_dir=/usr/share/fonts/truetype --font=“lohit-bengali” --outputbase=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt --text=/tmp/font_tmp.z6y7AIvqyI/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI
Could not find font named “lohit-bengali”.
Pango suggested font FreeMono.
Please correct --font arg.

=== Phase I: Generating training images ===
Rendering using “lohit-bengali”
[Sun Jul 22 10:48:34 EDT 2018] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.z6y7AIvqyI --fonts_dir=/usr/share/fonts/truetype --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0 --max_pages=3 --font=“lohit-bengali” --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
Could not find font named “lohit-bengali”.
Pango suggested font FreeMono.
Please correct --font arg.
ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist or is not readable

So, please tell me: are all the fonts in this fonts folder already available to Tesseract or not?



Shree Devi Kumar

Jul 22, 2018, 11:00:31 AM
to tesser...@googlegroups.com

needs two dashes, 




Jennil Thiyam

Jul 23, 2018, 3:48:08 AM
to tesser...@googlegroups.com

I tried using Lohit Bengali, and here is the command:

/usr/share/tesseract-ocr/./tesstrain.sh --fonts_dir /usr/share/fonts --lang ben --linedata_only --noextract_font_properties --langdata_dir /home/jennil/Desktop/pro/langdata-master --tessdata_dir /usr/share/tesseract-ocr/4.00/tessdata --output_dir /home/jennil/Desktop/pro/output/ben_output --fontlist “Lohit Bengali”

and the error I got is:

== Starting training for language 'ben'

[Mon Jul 23 01:18:01 EDT 2018] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=“Lohit --outputbase=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt --text=/tmp/font_tmp.zAepRNq6Yo/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo
Could not find font named “Lohit.


Pango suggested font FreeMono.
Please correct --font arg.

=== Phase I: Generating training images ===

Rendering using “Lohit
Rendering using Bengali”
[Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0 --max_pages=3 --font=Bengali” --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
[Mon Jul 23 01:18:16 EDT 2018] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.zAepRNq6Yo --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0 --max_pages=3 --font=“Lohit --text=/home/jennil/Desktop/pro/langdata-master/ben/ben.training_text
Could not find font named Bengali”.


Pango suggested font FreeMono.
Please correct --font arg.

Could not find font named “Lohit.


Pango suggested font FreeMono.
Please correct --font arg.

ERROR: /tmp/tmp.abQfzSYB19/ben/ben.Bengali”.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.abQfzSYB19/ben/ben.“Lohit.exp0.box does not exist or is not readable

Please help me out, shreeshrii.
I read the link, but I am still confused about the fonts. The Lohit Bengali font is already on the system, so why is this happening?

Some of the fonts that showed up when I ran text2image --fonts_dir /usr/share/fonts --list_available_fonts are:

101: Liberation Serif Italic
102: Likhan Medium
103: Lohit Assamese
104: Lohit Bengali
105: Lohit Devanagari
106: Lohit Gujarati
107: Lohit Gurmukhi
108: Lohit Kannada
109: Lohit Malayalam
110: Lohit Odia
111: Lohit Tamil
112: Lohit Tamil Classical
113: Lohit Telugu
114: Loma
115: Loma Bold
116: Loma Bold Oblique
117: Loma Oblique
118: Manjari
119: Manjari Bold
120: Manjari Thin
121: Meera
122: Mitra Mono
...

Lohit Bengali is in the list, so please tell me why the error occurs. Do I need to do something else too?
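
One way to compare the name passed to the script against this listing is an exact, whole-line match on the font name. The sketch below is only an illustration: it uses three sample lines copied from the listing above instead of calling text2image, and `font_is_listed` is a hypothetical helper name. The names it tests are the ones that appear in the logs earlier in the thread.

```shell
#!/bin/sh
# Sample lines in the "NNN: Font Name" format shown in the listing above.
# They are hard-coded here so the sketch runs without text2image installed.
font_list='103: Lohit Assamese
104: Lohit Bengali
105: Lohit Devanagari'

# font_is_listed is a hypothetical helper: it strips the leading index
# and checks for an exact, whole-line match of the requested name.
font_is_listed() {
    printf '%s\n' "$font_list" | sed 's/^ *[0-9]*: *//' | grep -Fxq -- "$1"
}

# "lohit-bengali" and the fragment “Lohit are the names from the logs.
for name in "Lohit Bengali" "lohit-bengali" "“Lohit"; do
    if font_is_listed "$name"; then
        printf 'found:     %s\n' "$name"
    else
        printf 'not found: %s\n' "$name"
    fi
done
```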




Lorenzo Bolzani

Jul 23, 2018, 9:29:41 AM
to tesser...@googlegroups.com

Please read the complete error message: it's telling you exactly where the problem is.

I think you are using "fancy double quotes" or something like that rather than the normal ones.

Are you doing cut and paste from some word processor? This is probably causing all the errors...
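
The log earlier in the thread shows text2image being called once with --font=“Lohit and once with --font=Bengali”, which is what one would expect if the quotes were typographic: the shell treats them as ordinary characters, so the space still splits the font name into two words. A small sketch of the difference, where `count_args` is a hypothetical helper that just reports how many arguments it was given:

```shell
#!/bin/sh
# count_args is a hypothetical helper that reports how many arguments
# the shell passed to it.
count_args() {
    echo "$#"
}

# Typographic quotes are ordinary characters to the shell, so the space
# still splits the font name into two words: “Lohit and Bengali”.
curly=$(count_args “Lohit Bengali”)

# ASCII double quotes group the words into a single argument.
straight=$(count_args "Lohit Bengali")

echo "curly quotes:    $curly argument(s)"
echo "straight quotes: $straight argument(s)"
```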



Jennil Thiyam

Jul 23, 2018, 9:52:25 AM
to tesser...@googlegroups.com
Even though the double quotes look fancy here, that is not the case in the command prompt.
Thanks to all your help I am able to run this command, but I still get lots of messages that say

Normalization failed for string  <some character>

and at the end this:

Error writing unicharset!!

Any help is welcome. I am very new to Tesseract and trying to find my way in.
