PANCARD AADHAAR


Shubhranshu Panda

Jan 6, 2020, 9:07:13 AM
to tesseract-ocr
I don't know how to extract specific fields from an image that has a standard layout. I want to extract the name, DOB and PAN number from a PAN card. I have attached a sample image for reference.
image_pan.jpg

Suresh Anand

Jan 6, 2020, 9:23:00 AM
to tesser...@googlegroups.com
Use an NER model after you extract the text with OCR.

If you are interested in extracting both the text and the photo, then use YOLO object detection followed by Tesseract.


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6c6a10e6-9705-45a3-b7c4-7cf0eb9d8367%40googlegroups.com.

Shubhranshu Panda

Jan 6, 2020, 9:51:10 AM
to tesser...@googlegroups.com
Thanks.
Additionally, is there any way to tell the system exactly where in the image to look when extracting text?

Suresh Anand

Jan 6, 2020, 9:54:32 AM
to tesser...@googlegroups.com
There are annotation tools you can use to annotate the images and then train a model.
I use labelImg. There are many others you can explore.

Saurabh Pal

Jan 7, 2020, 2:32:47 PM
to tesseract-ocr
NER seems like overkill for this.


Saurabh Pal

Jan 7, 2020, 2:32:56 PM
to tesseract-ocr
Try template matching for your use case (I am assuming that the PAN card format is the same all over India). At least the DOB and the PAN number can be found easily using a regex. For the names, you can reject all the other text boxes such as 'INCOME TAX DEPARTMENT' and 'GOVT. OF INDIA'; you will then be left with only the father's name and the holder's name, and you can tell them apart by comparing the y coordinates of those two text boxes.
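A minimal Python sketch of the regex-and-filter idea above, not a definitive implementation. It assumes the OCR step already yields (text, y) pairs for each text box, that the PAN is five capital letters, four digits and one capital letter, that the DOB is printed as DD/MM/YYYY, and that the holder's name sits above the father's name on the card. The names `extract_fields` and `FIXED_LABELS` are made up for illustration.

```python
import re

# PAN format: five capital letters, four digits, one capital letter.
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
# Date of birth as DD/MM/YYYY (assumed print format).
DOB_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# Fixed card strings to reject (illustrative list, extend as needed).
FIXED_LABELS = {
    "INCOME TAX DEPARTMENT",
    "GOVT. OF INDIA",
    "PERMANENT ACCOUNT NUMBER",
    "SIGNATURE",
}


def extract_fields(boxes):
    """boxes: iterable of (text, y) pairs; y is the top of the text box."""
    pan = dob = None
    remaining = []
    for text, y in boxes:
        text = text.strip()
        pan_match = PAN_RE.search(text)
        dob_match = DOB_RE.search(text)
        if pan_match:
            pan = pan or pan_match.group()   # keep the first PAN found
        elif dob_match:
            dob = dob or dob_match.group()   # keep the first date found
        elif text.upper() not in FIXED_LABELS:
            remaining.append((y, text))      # candidate name box
    # Assumption: the holder's name is printed above the father's name.
    remaining.sort()
    names = [text for _, text in remaining]
    return {
        "pan": pan,
        "dob": dob,
        "holder": names[0] if names else None,
        "father": names[1] if len(names) > 1 else None,
    }
```

Calling `extract_fields` on clean OCR output should give the PAN, the DOB and the two names ordered top to bottom. Noisy boxes would still end up among the name candidates, so some extra filtering is likely needed on real scans.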

Shubhranshu Panda

Jan 11, 2020, 2:21:09 AM
to tesseract-ocr
Yes, that's what I am doing right now, but there's an issue.

While I am getting all the names, I cannot reliably pick out the father's name even though it is in the list, because some unwanted garbage text sometimes gets in the way.

I am attaching a sample image for your reference, along with the text generated from it (as a list).


['MONIKA MAHADEV SHINDE', 'ARa Gar', 'GOVT. OF INDIA', 'MAHADEV SHINDE', '31/10/1992', 'Permanent Account Number', 'EJAPS0276M ~', 'MONIKA 1 SHIN OE :', '- 8', 'Signature']

This is the list I obtain. The father's name comes out as "ARa Gar" instead of MAHADEV SHINDE.
Since the unwanted text is not always generated, I need a way to figure out which entry is the actual name.

Can you please look into this?
pan_card_4.jpg
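One possible way to drop garbage entries such as "ARa Gar" is sketched below in Python. It assumes that names on the card are printed only in capital Latin letters separated by single spaces, so any line with lowercase letters, digits or punctuation is treated as noise. The helper `name_candidates`, the `REJECT` set and the minimum length are illustrative choices, not part of any library.

```python
import re

# Assumption: a name is one or more words made of capital letters only.
NAME_RE = re.compile(r"[A-Z]+(?: [A-Z]+)*")

# Fixed card strings that would otherwise look like names (illustrative).
REJECT = {"INCOME TAX DEPARTMENT", "GOVT OF INDIA"}

# Hypothetical threshold to skip very short all-caps fragments.
MIN_NAME_LEN = 3


def name_candidates(lines):
    """Return the OCR lines that look like printed names, in input order."""
    candidates = []
    for line in lines:
        text = line.strip()
        if len(text) < MIN_NAME_LEN or text in REJECT:
            continue
        if NAME_RE.fullmatch(text):
            candidates.append(text)
    return candidates


ocr_lines = ['MONIKA MAHADEV SHINDE', 'ARa Gar', 'GOVT. OF INDIA',
             'MAHADEV SHINDE', '31/10/1992', 'Permanent Account Number',
             'EJAPS0276M ~', 'MONIKA 1 SHIN OE :', '- 8', 'Signature']
print(name_candidates(ocr_lines))
# -> ['MONIKA MAHADEV SHINDE', 'MAHADEV SHINDE']
```

With the list posted above, only the two real names survive the filter, so the second one can be taken as the father's name. An all-caps garbage fragment would still pass, so this is a heuristic rather than a guarantee.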

Ameer Sheik

Jan 25, 2022, 5:52:57 AM
to tesseract-ocr
Can you suggest some tools for template matching?

Ameer Sheik

Jan 25, 2022, 5:52:57 AM
to tesseract-ocr
I'm also facing the same issue. Can someone shed some light on this, please?

Ed Dow

Jan 25, 2022, 12:28:22 PM
to tesseract-ocr
You could use OpenCV to define a template with regions of interest (ROIs) and then use tesseract to OCR them?
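A rough Python sketch of the ROI idea, assuming the input image is already cropped and aligned to the card, and that `opencv-python` and `pytesseract` are installed. The ROI fractions in `ROIS` are placeholders, not measured values; they would have to be measured on a real card. The helper names `roi_to_pixels` and `ocr_rois` are hypothetical.

```python
# Hypothetical ROIs as fractions of card width/height: (x0, y0, x1, y1).
# Placeholder numbers only; measure them on an aligned sample card.
ROIS = {
    "name": (0.05, 0.30, 0.65, 0.40),
    "father_name": (0.05, 0.42, 0.65, 0.52),
    "dob": (0.05, 0.54, 0.40, 0.64),
    "pan": (0.05, 0.70, 0.50, 0.82),
}


def roi_to_pixels(roi, width, height):
    """Convert a fractional ROI to integer pixel coordinates."""
    x0, y0, x1, y1 = roi
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def ocr_rois(image_path, rois=ROIS):
    """Crop each ROI from the card image and OCR it with Tesseract."""
    # Imported here so the coordinate helper works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    height, width = gray.shape
    fields = {}
    for field, roi in rois.items():
        x0, y0, x1, y1 = roi_to_pixels(roi, width, height)
        crop = gray[y0:y1, x0:x1]
        # --psm 7 tells Tesseract to treat the crop as a single text line.
        text = pytesseract.image_to_string(crop, config="--psm 7")
        fields[field] = text.strip()
    return fields
```

Using fractions rather than fixed pixel values lets the same template work on scans of different resolutions, as long as each scan is cropped to the card edges first.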