Dear OpenCV team,
I hope you are doing well.
As we all know, OpenCV is primarily designed for computer vision. However, integrating Large Language Model (LLM) support could significantly enhance its capabilities. Currently, OpenCV can detect and extract text from images using OCR or deep learning models, but these pipelines often produce misrecognized or missing characters and poorly formatted output.
One of the major limitations of OpenCV's DNN module is the lack of native support for tokenization. As a result, running an LLM inside OpenCV requires an external tokenizer library, which adds unnecessary complexity to the pipeline. I strongly believe that addressing this gap would make OpenCV more efficient and impactful in real-world applications, which is why I am eager to contribute to solving this problem.
To better understand tokenization algorithms, I am referring to the Hugging Face documentation:
https://huggingface.co/learn/nlp-course/en/chapter6/5
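To make the discussion concrete, here is a toy sketch of the greedy merge step used by byte-pair-encoding (BPE) style tokenizers, in the spirit of the algorithms covered in the Hugging Face course linked above. This is a deliberately simplified illustration with a hand-written merge table, not OpenCV code and not a real vocabulary; a native tokenizer in the DNN module would implement a similar merge loop in C++.

```python
def bpe_tokenize(word, merges):
    """Split `word` into characters, then apply learned merges in priority order."""
    tokens = list(word)
    for left, right in merges:  # merges are applied in the order they were learned
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right:
                tokens[i:i + 2] = [left + right]  # merge the adjacent pair in place
            else:
                i += 1
    return tokens

# Tiny hypothetical merge table, for illustration only:
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_tokenize("lower", merges))  # ['low', 'er']
```

A real tokenizer would also handle byte-level fallback, special tokens, and a vocabulary lookup from token string to ID, which is exactly the machinery that currently has to come from an external library.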
I have also followed Krish Naik's NLP playlist on YouTube in the past, and I have experience fine-tuning models as well.
Additionally, I found this video helpful in understanding the concepts:
https://youtu.be/zduSFxRajkE?si=JF725Ipnzc4R5Nnc
That said, I still need to explore OpenCV's DNN module further to fully understand how a tokenizer could be integrated effectively. If anyone has relevant resources, insights, or experience related to this, I would greatly appreciate it if you could share them with me.
Looking forward to your thoughts!
Best regards,
Anushka Sharma
anushkas...@gmail.com
LinkedIn