Hey everyone,

I've got this PDF document, which is a schedule. I'm trying to extract the text from it with Tesseract, but the results aren't very good.

I've tried a lot of different things. In my inexperienced opinion the image seems very high quality, as I can zoom in a lot without seeing pixels. I've also tried converting the PDF to TIFF and adding a grayscale filter (all via Java).

I've attached both the end result and the original PDF here, along with a sample of the output. Any help making the output better would be appreciated.

The TIFF file is too big for the attachment; see this link: http://wltd.org/Daily%20schedule-14.tiff

---Begin text---
008 KIERA MCG 3:00 PM 11:00 PM TRWN 8.00 —718 KYLE s 11:00 PM 7:00 AM MT 8.00 < —686 JOSEPH e 11:00 PM 5:00 AM MT 6.00 — >718 KYLE s 11:00 PM 7:00 AM MT 8.00 — >656 CHANDLER A 1:00 PM 4:00 PM MB 3.00 —720 TYLER D 11:00 PM 7:00 AM T|_ F 8.00 < —720 TYLER D 11:00 PM 7:00 AM T|_ F 8.00 — >052 SH ELLY L 5:30 AM 2:00 PM FLRIFFIMGR F 8.50 _:IRiley M 372 8:00 AM 4:00 PM FLR F 8.00 —‘ Raphael B602 4:00 PM 12:00 AM FLRIMGR F 8.00 ‘ —:| I‘ Kevin G 652 11:00 AM 7:00 PM g$Y$IWNIMNY$I F 8.00 ‘ I:-:| IJoseph C 191 8:00 AM 4:00 PM ADMIBKIMB F 8.00 -:—2014 ROXANA T 11:00 AM 7:00 PM ADM F 8.00 _
--END TEXT---

As you can see, Tesseract becomes quite creative in its attempts at parsing this. Earlier in the document it even parsed the letter "N" as "|\|": creative, but useless for parsing!
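For reference, the grayscale step mentioned above could look roughly like the following in plain Java. This is only a sketch, not the exact code used on the schedule: the class name GrayscalePrep, the method names, and the fixed threshold of 128 in the usage note are all assumptions made for illustration. The binarize step is an extra that was not part of the original attempt, and whether it actually improves the OCR output on this document is untested.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper; names and the luminance weights are illustrative choices.
final class GrayscalePrep {

    /** Converts an image to 8-bit grayscale using a weighted sum of R, G and B. */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int level = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                // Write the gray level straight into band 0 of the raster.
                gray.getRaster().setSample(x, y, 0, level);
            }
        }
        return gray;
    }

    /** Maps each gray pixel to white (level >= threshold) or black (below it). */
    static BufferedImage binarize(BufferedImage gray, int threshold) {
        BufferedImage out = new BufferedImage(
                gray.getWidth(), gray.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int level = gray.getRaster().getSample(x, y, 0);
                out.setRGB(x, y, level >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

Usage would be something like GrayscalePrep.binarize(GrayscalePrep.toGray(page), 128), with the result written out as the TIFF that gets passed to Tesseract. The threshold probably needs tuning per document.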
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f77f8dd8-f6d2-4f6b-b5fe-5510fac4f878%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I could; the only issue is that the box can grow depending on the number of people scheduled, which would change all the x,y coordinates.

What can easily be done is to narrow the scope of the OCR by taking only the horizontal table part and omitting the rest. I'm guessing that might also help?

Thanks for the help, by the way!
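A minimal sketch of that band crop in Java, assuming the top and bottom pixel rows of the table are already known or can be found some other way. The class name TableCrop and the method cropBand are made up for illustration; only BufferedImage.getSubimage is a real library call here.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for cutting the table band out of a rendered page.
final class TableCrop {

    /**
     * Returns the full-width band between rows top (inclusive) and
     * bottom (exclusive), clamped so it always stays inside the page.
     */
    static BufferedImage cropBand(BufferedImage page, int top, int bottom) {
        int y0 = Math.max(0, Math.min(top, page.getHeight() - 1));
        int y1 = Math.max(y0 + 1, Math.min(bottom, page.getHeight()));
        // getSubimage shares pixel data with the original image (no copy).
        return page.getSubimage(0, y0, page.getWidth(), y1 - y0);
    }
}
```

Because the crop keeps the full width and only limits the rows, it does not depend on the x coordinates of the individual boxes. It does still need a top and bottom row from somewhere, which is the part that changes as the schedule grows.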