How to extract Gujarati text using Parichit-OCR

Jayant Solanki

Nov 19, 2012, 1:24:41 AM
to parich...@googlegroups.com

Hi,

Please help me. I have been searching for a Gujarati OCR for the last two months.

How can I use Parichit OCR to extract Gujarati text from an image file?

Please help.


Thanks 

Indu

Nov 19, 2012, 6:45:27 AM
to parich...@googlegroups.com
Hi,

Download Parichit_WIN_LIN.tar.gz and the Gujarati training data, guj.traineddata, from http://code.google.com/p/parichit/downloads/list. Check that guj.traineddata is in the folder Parichit_WIN_LIN/tesseract/tessdata. Instructions for launching the GUI are included in the help file. In the Parichit GUI, select 'guj' in the OCR Language drop-down menu.

New training data will be uploaded soon. In the meantime, please try the training data available now and report its accuracy.
--
Thanks & Regards

Indu

Jayant Solanki

Nov 20, 2012, 1:52:19 AM
to parich...@googlegroups.com
Hi Indu,

Thanks a lot for your reply. It has kept my hope alive.

As you suggested, I downloaded Parichit_WIN_LIN.tar.gz, extracted it on my computer, and ran the run.bat file, which brought up the Parichit OCR interface.

I opened a Gujarati text image, selected Gujarati as the OCR Language from the drop-down box, and clicked the OCR button. I got the error message "Cannot find Tesseract. Please set its path." A screenshot showing the error is attached.

The Gujarati training data is present in the folder Parichit_WIN_LIN/tesseract/tessdata as guj.traineddata.

Please tell me which path I have to set.

Indu

Nov 20, 2012, 1:59:01 AM
to parich...@googlegroups.com
Check whether the Tesseract executable (tesseract.exe) is in Parichit_WIN_LIN/tesseract. If it is not, copy it into that folder.
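
For anyone who prefers to script this check, here is a minimal Python sketch. It assumes the folder layout described in this thread (a `tesseract` folder containing the executable and a `tessdata` subfolder). The function name `missing_parichit_files` is made up for illustration and is not part of Parichit.

```python
from pathlib import Path


def missing_parichit_files(root, lang="guj"):
    """List the files this thread expects under a Parichit_WIN_LIN folder
    that are not present. An empty list means the layout looks complete."""
    tess_dir = Path(root) / "tesseract"
    missing = []

    # Language data, e.g. tesseract/tessdata/guj.traineddata
    if not (tess_dir / "tessdata" / f"{lang}.traineddata").is_file():
        missing.append(f"tesseract/tessdata/{lang}.traineddata")

    # The executable: tesseract.exe on Windows, plain "tesseract" on Linux
    if not any((tess_dir / name).is_file() for name in ("tesseract.exe", "tesseract")):
        missing.append("tesseract/tesseract.exe")

    return missing
```

Calling `missing_parichit_files(r"E:\Parichit_WIN_LIN")` would return the relative names of whatever is absent, which may help when reporting a "Cannot find Tesseract" error.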

Indu

Nov 20, 2012, 2:06:38 AM
to parich...@googlegroups.com
If you have Tesseract copied to a different location, set the path by modifying the run.bat script.
Currently the paths are set as follows:
set PATH=%PATH%;E:\Parichit_WIN_LIN\tesseract\
set TESSDATA_PREFIX=E:\Parichit_WIN_LIN\tesseract\
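
To test Tesseract outside the GUI, the same two settings can be reproduced from Python. This is only a sketch: the helper names `tesseract_env` and `ocr_command` are hypothetical, and pointing `TESSDATA_PREFIX` at the folder that contains `tessdata` simply mirrors the run.bat lines above (other Tesseract releases may expect a different value). The command uses the standard form `tesseract <image> <output base> -l <lang>`.

```python
import os


def tesseract_env(tess_dir, base_env=None):
    """Copy the environment and add the two settings that run.bat makes."""
    env = dict(os.environ if base_env is None else base_env)
    # Append the tesseract folder to PATH, like: set PATH=%PATH%;<tess_dir>
    env["PATH"] = env.get("PATH", "") + os.pathsep + tess_dir
    # Mirrors run.bat: the prefix is the folder that holds "tessdata"
    env["TESSDATA_PREFIX"] = tess_dir
    return env


def ocr_command(tess_dir, image_path, output_base, lang="guj"):
    """Build the command line; Tesseract writes <output_base>.txt."""
    # The full path to the executable avoids relying on PATH lookup.
    exe = os.path.join(tess_dir, "tesseract")
    return [exe, image_path, output_base, "-l", lang]


# Example use (not run here; the paths are the ones quoted in this thread):
#   import subprocess
#   tess = r"E:\Parichit_WIN_LIN\tesseract"
#   subprocess.run(ocr_command(tess, "testJPG1.jpg", "out"),
#                  env=tesseract_env(tess), check=True)
```

If the command fails with a message about missing language data, that usually points at the `TESSDATA_PREFIX` value rather than at the executable.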

Jayant Solanki

Nov 20, 2012, 5:13:40 AM
to parich...@googlegroups.com
There is no tesseract.exe in Parichit_WIN_LIN/tesseract.
I tried setting both the variable and the path you suggested.

Please help further.

Jayant Solanki

Nov 21, 2012, 4:45:11 AM
to parich...@googlegroups.com
Hi Indu,

I downloaded the Tesseract setup from the link below.



After running the setup, I copied tesseract.exe from the installation and pasted it into E:\Parichit_WIN_LIN\tesseract.

The error is gone and the Gujarati OCR now runs. The only problem is that it does not extract the words correctly: out of 400 words, only 1 was correct.

Please suggest what to do.
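
A figure like "1 correct word out of 400" can be measured in a repeatable way by comparing the OCR output against a hand-typed copy of the same page. The sketch below is one possible way to do that in Python, not a tool shipped with Parichit; the function name `word_accuracy` is hypothetical. It counts the words that line up, in order, between the two texts.

```python
from difflib import SequenceMatcher


def word_accuracy(reference_text, ocr_text):
    """Fraction of reference words that also appear, in order, in the OCR text."""
    ref = reference_text.split()
    hyp = ocr_text.split()
    if not ref:
        return 0.0
    # autojunk=False keeps frequent words from being ignored on long pages
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)
```

Because the comparison works on whitespace-separated words, it applies to Gujarati text as well, provided both files are read as Unicode (for example with `open(path, encoding="utf-8")`).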

Indu

Nov 21, 2012, 4:59:17 AM
to parich...@googlegroups.com
Can you please send the document?

Jayant Solanki

Nov 27, 2012, 2:59:58 AM
to parich...@googlegroups.com
testJPG1.jpg

Jayant Solanki

Nov 27, 2012, 3:01:14 AM
to parich...@googlegroups.com
Hi Indu,

I have attached the file.

Jayant Solanki

Dec 4, 2012, 6:10:54 AM
to parich...@googlegroups.com
Hi Indu,

Is there any improvement in the OCR? Please help.

Jayant Solanki

Sep 7, 2013, 2:45:50 AM
to parich...@googlegroups.com
Hi Indu,

How can I train the data for the Gujarati language?

Please provide some guidelines.

Regards
Jayant Solanki

Jayant Solanki

Sep 10, 2013, 7:09:39 AM
to parich...@googlegroups.com
Hi Indu,

I have an image file in Gujarati and I want to convert it to text. How can I train Tesseract for my font?

Please give me a few ideas.
I need your help now; I am stuck. There is no proper tutorial on the website and no video is provided. I want to train the Tesseract engine to work with my Unicode (Gujarati) file.



--
Solanki Jayant H.
RB One Source Pvt. Ltd.