Objective:
Improve the OpenCV DNN module so that it can run large language models (LLMs) efficiently, with a focus on models compatible with llama.cpp (such as LLaMA, Mistral, and GPT-J). The goal is to optimize inference performance, particularly for autoregressive decoding, and to provide a user-friendly API that supports real-time applications.
Key Ideas:
Dynamic Memory Management:
Enhance the current static blob allocation mechanism to support dynamic input sizes.
Avoid costly reallocations when extending input sequences during token-by-token generation.
Ensure efficient handling of varying sequence lengths to reduce memory overhead.
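A minimal sketch of what such preallocation could look like, using only cv::Mat from OpenCV core. LayerKVBuffer, maxSeqLen, and width are illustrative names (not existing OpenCV DNN structures): the buffer is sized once for the maximum sequence length, so appending one token per decoding step never reallocates or copies the cache.

```cpp
// Sketch: per-layer key/value buffer preallocated to the maximum sequence
// length, so token-by-token growth never triggers a reallocation.
// LayerKVBuffer is an illustrative name, not an existing OpenCV DNN type.
#include <opencv2/core.hpp>

struct LayerKVBuffer {
    cv::Mat keys, values;   // maxSeqLen x width, allocated once
    int len = 0;            // rows currently holding valid tokens

    LayerKVBuffer(int maxSeqLen, int width)
        : keys(maxSeqLen, width, CV_32F), values(maxSeqLen, width, CV_32F) {}

    // Copy the new token's key/value row into the next free slot.
    void append(const cv::Mat& k, const cv::Mat& v) {
        CV_Assert(len < keys.rows && k.cols == keys.cols && v.cols == values.cols);
        k.copyTo(keys.row(len));
        v.copyTo(values.row(len));
        ++len;
    }

    // Lightweight views over the valid prefix, for use in attention.
    cv::Mat validKeys()   const { return keys.rowRange(0, len); }
    cv::Mat validValues() const { return values.rowRange(0, len); }
};
```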
Past Key/Value Caching:
Implement a caching mechanism for transformer-based models so that key/value pairs computed for earlier tokens are reused instead of being recomputed for each new token.
This caching should keep the per-token computation cost roughly constant after the initial prompt (prefill) pass; see the sketch below.
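As a rough illustration of that reuse, here is a single-token attention step written against the LayerKVBuffer from the previous sketch; attendWithCache is a hypothetical helper, not an existing cv::dnn function. Only the new token's projections (q, kNew, vNew) are computed in this step; everything cached is read as-is.

```cpp
// Sketch: attention for one newly generated token, reusing the LayerKVBuffer
// from the previous sketch. Past keys/values come straight from the cache;
// only the new token's query/key/value are computed per step.
#include <opencv2/core.hpp>
#include <cmath>

cv::Mat attendWithCache(const cv::Mat& q,     // 1 x width query of the new token
                        const cv::Mat& kNew,  // 1 x width key of the new token
                        const cv::Mat& vNew,  // 1 x width value of the new token
                        LayerKVBuffer& cache)
{
    cache.append(kNew, vNew);                          // extend the cache, never recompute
    cv::Mat K = cache.validKeys();                     // len x width
    cv::Mat V = cache.validValues();                   // len x width

    cv::Mat scores = q * K.t() / std::sqrt((double)q.cols);   // 1 x len
    double maxVal;
    cv::minMaxLoc(scores, nullptr, &maxVal);           // numerically stable softmax
    cv::Mat w;
    cv::exp(scores - maxVal, w);
    w = w / cv::sum(w)[0];

    return w * V;                                      // 1 x width context vector
}
```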
Dynamic Sequence Extension & Batch Processing:
Provide support for seamless, incremental token generation where the model processes only the new token along with the cached states.
Enable batched processing of multiple sequences to maximize throughput, which is especially important for applications such as generating multiple text completions in parallel (see the sketch below).
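The decoding loop below sketches how batched, incremental generation could be driven from user code. BatchedLlmSession and decodeStep are assumed interfaces used purely for illustration (not current cv::dnn API); each iteration issues one forward pass that covers all still-active sequences, feeding only the last token of each.

```cpp
// Sketch: batched greedy decoding over several sequences at once.
// BatchedLlmSession is a hypothetical wrapper around a cv::dnn network with
// per-sequence KV caches; it is not an existing OpenCV class.
#include <algorithm>
#include <cstdint>
#include <vector>

struct BatchedLlmSession {
    // Feeds one new token per sequence; returns per-sequence logits [batch][vocab].
    std::vector<std::vector<float>> decodeStep(const std::vector<int64_t>& lastTokens);
};

void decodeBatch(BatchedLlmSession& session,
                 std::vector<std::vector<int64_t>>& sequences,  // prompts already prefilled
                 int maxNewTokens, int64_t eosId)
{
    std::vector<int64_t> last(sequences.size());
    for (size_t i = 0; i < sequences.size(); ++i)
        last[i] = sequences[i].back();              // last prompt token of each sequence

    std::vector<bool> done(sequences.size(), false);
    for (int step = 0; step < maxNewTokens; ++step) {
        auto logits = session.decodeStep(last);     // one batched forward pass
        bool anyActive = false;
        for (size_t i = 0; i < sequences.size(); ++i) {
            if (done[i]) continue;
            int64_t tok = std::max_element(logits[i].begin(), logits[i].end())
                          - logits[i].begin();      // greedy pick per sequence
            done[i] = (tok == eosId);
            if (!done[i]) { sequences[i].push_back(tok); last[i] = tok; anyActive = true; }
        }
        if (!anyActive) break;                      // every sequence has finished
    }
}
```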
User-Friendly API & Demo Applications:
Offer a high-level wrapper that abstracts the complexity of caching and dynamic input handling, making it easier for users to integrate LLM inference into their projects.
Develop sample demos (e.g., token streaming for a real-time chatbot, parallel text generation) that illustrate how to use the enhanced DNN module effectively.
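To give a feel for the intended ergonomics, a possible shape of the token-streaming demo is sketched below. cv::dnn::LLMInference, loadFromGGUF, and generateStream are hypothetical names chosen to illustrate the wrapper, not existing OpenCV API; caching and dynamic input handling would be hidden behind them.

```cpp
// Sketch of a token-streaming chatbot demo against a hypothetical high-level
// wrapper. All LLMInference names below are illustrative, not existing API.
#include <functional>
#include <iostream>
#include <string>

namespace cv { namespace dnn {
struct LLMInference {                               // assumed wrapper hiding cache details
    static LLMInference loadFromGGUF(const std::string& path);
    void generateStream(const std::string& prompt,
                        const std::function<void(const std::string&)>& onToken);
};
}}

int main()
{
    auto llm = cv::dnn::LLMInference::loadFromGGUF("mistral-7b-q4.gguf");
    std::string prompt;
    while (std::getline(std::cin, prompt)) {
        // Each decoded piece is printed as soon as it is generated,
        // giving the real-time feel expected from a chatbot.
        llm.generateStream(prompt, [](const std::string& piece) {
            std::cout << piece << std::flush;
        });
        std::cout << std::endl;
    }
    return 0;
}
```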