Need help with TikTok image processing

Anna Muller

Dec 9, 2022, 12:15:46 PM
to tesseract-ocr
Hello, I am very new to Tesseract. I am working on a project that requires me to read text from TikTok screenshots; I attached a random example image from TikTok to this post. The output I am currently getting is quite inaccurate. Below is the code I am using, which I took from an online tutorial.

I was wondering if anybody had any suggestions, or could point me to resources where I could learn how to fine-tune my image processing parameters.

I am currently using Jupyter Notebook, but if anybody suggests accessing Tesseract differently, please let me know.

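My code:
from PIL import Image
import pytesseract

# Load the screenshot and convert it to grayscale
column = Image.open('tiktoktest.png')
gray = column.convert('L')
# Binarize: pixels darker than 200 become black, everything else white
blackwhite = gray.point(lambda x: 0 if x < 200 else 255, '1')
# Save as PNG (lossless); JPEG compression adds artifacts to a black-and-white image
blackwhite.save("tiktok.png")

text_from_image = pytesseract.image_to_string(Image.open('tiktok.png'))
print(text_from_image)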
Attachment: tiktok test.png
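The fixed cutoff of 200 in the code above is one of the parameters that usually needs tuning per image. One common alternative is Otsu's method, which picks the cutoff from the image's own histogram. The following is only a minimal sketch, not the tutorial's method: the function name `otsu_threshold` is made up for illustration, and it assumes a 256-bin grayscale histogram such as the one Pillow's `Image.histogram()` returns for a mode 'L' image.

```python
def otsu_threshold(histogram):
    """Return the cutoff t that maximizes the between-class variance.

    `histogram` is a sequence of pixel counts indexed by gray level.
    Pixels with value < t form the dark class, the rest the light class.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels below the candidate cutoff
    sum_dark = 0     # sum of gray levels below the candidate cutoff

    for t in range(1, len(histogram)):
        # Move gray level t - 1 into the dark class
        weight_dark += histogram[t - 1]
        sum_dark += (t - 1) * histogram[t - 1]
        weight_light = total - weight_dark
        if weight_dark == 0 or weight_light == 0:
            continue  # one class is empty, so the variance is undefined
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Hypothetical usage with the variables from the post above:
#   t = otsu_threshold(gray.histogram())
#   blackwhite = gray.point(lambda x: 0 if x < t else 255, '1')
```

A single global cutoff, whether hand-picked or computed, may still struggle on screenshots where the text sits on top of a busy video frame, so this is a starting point rather than a fix.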

Zdenko Podobny

Dec 9, 2022, 12:59:05 PM
to tesser...@googlegroups.com
  1. Implement text detection on the image (EAST, YOLO, ...; see https://www.youtube.com/watch?v=ZpRNfWzuexQ), or search for "text detection python".
  2. Process the detected areas so that each contains only text, without any graphics; see the suggestions in the docs (https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md).
  3. Run OCR (Tesseract) on the processed area(s).
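Steps 2 and 3 could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the text detector from step 1 is not included, `boxes` is assumed to be a list of (left, top, right, bottom) pixel rectangles that it produced, and the helper names `pad_box` and `ocr_regions`, the 10-pixel margin, the autocontrast step, and the `--psm 6` page segmentation mode are all choices made here for the example.

```python
def pad_box(box, margin, width, height):
    """Grow a (left, top, right, bottom) box by `margin` pixels,
    clamped to the image bounds, so the text keeps a small border."""
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def ocr_regions(image_path, boxes, margin=10):
    """Crop each detected box, clean it up a little, and OCR it."""
    # Imported here so pad_box can be used without these packages installed
    from PIL import Image, ImageOps
    import pytesseract

    image = Image.open(image_path).convert('L')  # grayscale
    width, height = image.size
    texts = []
    for box in boxes:
        region = image.crop(pad_box(box, margin, width, height))
        region = ImageOps.autocontrast(region)  # stretch the contrast
        # --psm 6 tells Tesseract to treat the crop as one block of text
        text = pytesseract.image_to_string(region, config='--psm 6')
        texts.append(text.strip())
    return texts
```

Running OCR on each crop separately keeps the surrounding graphics out of Tesseract's input, which is the point of step 2.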

Zdenko


On Fri, Dec 9, 2022 at 18:15, Anna Muller <amul...@nd.edu> wrote: