Break down pedigree

Daron Goode

May 28, 2018, 12:02:13 AM

to tesseract-ocr

Hello,

I am new to Tesseract and could use some guidance on how an experienced person would tackle this issue. I have a PHP website where I can get the data out of a PDF without any issues, but the order of the data I am pulling is a mess. The issue is that the return is only one long string without any newline characters or other way to break it down into parts. I was going to slice the PDF into several chunks and run each one through OCR separately, but I find that Tesseract has the power to do what I need it to do. Also, with users uploading new PDFs thousands of times, a given PDF might not line up exactly the way I need it to.

My end goal is to be able to update all these values to my database in the order they are related. For the 4th generation that would be 31 different areas to scoop up the data I need. If these are in order with an X coordinate I can always use that and work my Y values down.
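The X-then-Y idea could be sketched roughly as follows. This is only an illustration under assumptions: the `(x, y, text)` tuples stand for whatever coordinates the OCR step ends up providing, and `group_into_columns` and `column_tolerance` are hypothetical names, not part of Tesseract.

```python
def group_into_columns(boxes, column_tolerance=40):
    """Group (x, y, text) boxes into columns by X, then read each column top to bottom.

    column_tolerance is a hypothetical pixel threshold: a box whose X is within
    this distance of a column's first box is treated as part of that column.
    """
    columns = []  # list of (anchor_x, [boxes in this column])
    for x, y, text in sorted(boxes):  # left to right
        if columns and abs(x - columns[-1][0]) <= column_tolerance:
            columns[-1][1].append((x, y, text))
        else:
            columns.append((x, [(x, y, text)]))
    # Within each column, order by Y so entries come out top to bottom.
    return [[t for _, _, t in sorted(col, key=lambda b: b[1])] for _, col in columns]


# Made-up coordinates for illustration only.
boxes = [
    (410, 300, "Dam"),
    (12, 200, "Dog"),
    (400, 100, "Sire"),
    (805, 50, "Sire's sire"),
    (800, 150, "Sire's dam"),
]
for generation, names in enumerate(group_into_columns(boxes)):
    print(generation, names)
```

Each inner list would then correspond to one generation, already ordered top to bottom, which should make it easier to map entries to database fields.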

Even if all I had to work with is a \n character for each line, I might be able to make that work.

On the 4th generation pedigree I tried cutting out the entire last (4th) generation. If I go that route, that would only be 6 crops I need to make on this (1 for the dog, two for each of those parents, and then each generation). My users will have 3- or 4-generation pedigrees.

Any advice would be greatly appreciated.

Thanks

Daron

Daron Goode

May 28, 2018, 8:34:00 AM

to tesseract-ocr

I have built a template that can cut this into 6 pieces.

What I need to do with these is to put \n newline characters at the end of the lines, or be able to get the Y coordinates so I can see when they change and by how much. I am not able to find anything useful on how to accomplish this.
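One possible route to both the line breaks and the Y coordinates is Tesseract's TSV output (for example `tesseract crop.png out tsv`, available in recent Tesseract versions), which lists each word with its line number and its `left`/`top`/`width`/`height` box. The sketch below only parses such output with the Python standard library. The sample rows are made up, and `lines_from_tsv` is a hypothetical helper, not a Tesseract function.

```python
import csv
import io


def lines_from_tsv(tsv_text):
    """Rebuild text lines from Tesseract TSV output, keeping each line's top Y."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = {}  # (page, block, paragraph, line) -> {"top": int, "words": [str]}
    for row in reader:
        text = (row["text"] or "").strip()
        if row["level"] != "5" or not text:  # level 5 rows are individual words
            continue
        key = (int(row["page_num"]), int(row["block_num"]),
               int(row["par_num"]), int(row["line_num"]))
        entry = lines.setdefault(key, {"top": int(row["top"]), "words": []})
        entry["top"] = min(entry["top"], int(row["top"]))
        entry["words"].append(text)
    ordered = sorted(lines.values(), key=lambda e: e["top"])
    return [(e["top"], " ".join(e["words"])) for e in ordered]


# Made-up sample in the TSV column layout, for illustration only.
SAMPLE = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
    "4\t1\t1\t1\t1\t0\t10\t20\t200\t18\t-1\t",
    "5\t1\t1\t1\t1\t1\t10\t20\t60\t18\t96\tCH",
    "5\t1\t1\t1\t1\t2\t80\t21\t120\t18\t95\tRex",
    "4\t1\t1\t1\t2\t0\t10\t60\t200\t18\t-1\t",
    "5\t1\t1\t1\t2\t1\t10\t60\t90\t18\t93\tAKC",
    "5\t1\t1\t1\t2\t2\t110\t60\t90\t18\t91\t12345",
])

rows = lines_from_tsv(SAMPLE)
print("\n".join(text for _, text in rows))  # one OCR line per output line
```

The `top` value returned with each line gives the Y coordinate, so the size of the jump between consecutive lines can be used to tell where one pedigree entry ends and the next begins.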

Thanks,

Daron

Lorenzo Bolzani

May 28, 2018, 10:26:58 AM

to tesser...@googlegroups.com

Use OpenCV SIFT (or another feature detector) to align the picture with your template.

http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html#feature-homography

https://docs.opencv.org/3.3.0/dc/dc3/tutorial_py_matcher.html

That will make everything much much easier.
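To illustrate why alignment helps: feature matching as in the tutorials above ends with a 3x3 homography matrix (what `cv2.findHomography` returns). Assuming it was estimated from template points to scan points, it can carry each template crop box over to the uploaded scan. The sketch below uses plain Python and a made-up matrix; `apply_homography` and `map_box` are hypothetical helper names, not OpenCV functions.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 homography given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    px = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    py = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return px, py


def map_box(H, box):
    """Map a template crop box (left, top, right, bottom) into scan coordinates.

    Returns the axis-aligned rectangle that bounds the four mapped corners.
    """
    left, top, right, bottom = box
    corners = [apply_homography(H, x, y)
               for x in (left, right) for y in (top, bottom)]
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up homography: the scan is shifted 15 px right and 8 px up
# relative to the template (assumption for illustration only).
H = [[1, 0, 15],
     [0, 1, -8],
     [0, 0, 1]]

template_box = (100, 200, 300, 260)  # hypothetical crop region in the template
print(map_box(H, template_box))
```

With this, the crop regions only need to be defined once on the template, and each new scan gets its own adjusted regions.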

Bye

Lorenzo
