Hello OpenCV,
I hope this email finds you well. I am writing to share my proposal for implementing Dynamic CUDA Support in OpenCV’s DNN Module, which I believe would greatly enhance OpenCV’s flexibility and usability for NVIDIA GPU acceleration.
Overview
Currently, OpenCV’s DNN module supports CUDA as a backend for deep learning inference, but enabling it requires the CUDA Toolkit at build time and links OpenCV directly against the CUDA libraries. My goal is to decouple CUDA support by implementing a dynamic loading mechanism, similar to the plugin approach used for the OpenVINO backend. This will allow OpenCV to be built and run without CUDA present, while still enabling GPU acceleration at runtime through a separate CUDA plugin.
Key Objectives & Approach
Develop a CUDA Plugin: The CUDA execution engine will be compiled as a separate shared library (opencv_cuda_dnn.so/.dll), which OpenCV can load dynamically.
Implement Dynamic Loading: Use dlopen() (Linux/macOS) or LoadLibrary() (Windows) to detect and load the CUDA plugin at runtime, avoiding direct linking to CUDA.
Automatic Memory Management: Ensure seamless GPU memory transfers by handling host-to-device and device-to-host memory copying within the plugin.
Modify OpenCV Build System: Introduce a CMake option (WITH_CUDA_PLUGIN) to enable building the CUDA plugin separately, ensuring OpenCV itself remains CUDA-independent.
Graceful Fallback & Performance Considerations: If CUDA is unavailable, OpenCV should safely fall back to another backend, ensuring robust error handling and minimal performance overhead.
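To make the dynamic loading and fallback objectives above more concrete, here is a minimal POSIX-only sketch of how the loader might look. It is an illustration under assumptions, not a final design: the entry point name opencv_cuda_dnn_init, the Backend enum, and the helper names are hypothetical placeholders, and a Windows build would use LoadLibrary()/GetProcAddress() in place of dlopen()/dlsym().

```cpp
#include <cassert>
#include <dlfcn.h>   // POSIX dynamic loading (Linux/macOS)
#include <memory>
#include <string>

// Hypothetical backend identifiers for this sketch (not OpenCV's real enum).
enum class Backend { CPU, CUDA_PLUGIN };

// RAII wrapper: the library stays loaded for as long as this object lives.
class PluginHandle {
public:
    explicit PluginHandle(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {}
    ~PluginHandle() { if (handle_) dlclose(handle_); }
    PluginHandle(const PluginHandle&) = delete;
    PluginHandle& operator=(const PluginHandle&) = delete;

    bool loaded() const { return handle_ != nullptr; }

    // Returns nullptr if the library is not loaded or the symbol is absent.
    void* symbol(const char* name) const {
        return handle_ ? dlsym(handle_, name) : nullptr;
    }

private:
    void* handle_;
};

// Assumed plugin entry point: C linkage, returns 0 on success.
using InitFn = int (*)();

// Returns a live handle on success, or nullptr if the library is missing,
// the entry point is not exported, or initialization reports failure.
std::unique_ptr<PluginHandle> tryLoadPlugin(
    const std::string& path,
    const char* initSymbol = "opencv_cuda_dnn_init") {
    assert(initSymbol != nullptr);
    std::unique_ptr<PluginHandle> plugin(new PluginHandle(path));
    if (!plugin->loaded()) return nullptr;
    InitFn init = reinterpret_cast<InitFn>(plugin->symbol(initSymbol));
    if (init == nullptr || init() != 0) return nullptr;
    return plugin;
}

// Graceful fallback: use the plugin only if it loaded and initialized.
Backend selectBackend(const PluginHandle* plugin) {
    return plugin != nullptr ? Backend::CUDA_PLUGIN : Backend::CPU;
}
```

In this sketch the caller keeps the returned handle alive for the lifetime of the network, so the plugin is never unloaded while its functions are in use. Every failure mode collapses to a null handle, which selectBackend() maps to the CPU backend. Older glibc versions may require linking with -ldl.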
Expected Deliverables
CUDA Plugin Implementation – A separate dynamically loaded library for CUDA inference.
Integration with OpenCV DNN – Modifications to cv::dnn::Net for detecting and using the CUDA plugin dynamically.
Build System Enhancements – Updates to CMake to support plugin-based CUDA loading.
Testing & Benchmarking – Ensuring functionality, correctness, and measuring performance impact.
Comprehensive Documentation – Clear usage guides and API references.
Timeline & Next Steps
I plan to engage with the OpenCV community to refine the implementation details, validate feasibility through a prototype, and submit my work as a series of patches. I would appreciate any feedback or guidance on this approach to ensure alignment with OpenCV’s long-term vision.
Qualifications
Education: B.S. in Computer Science and Data Science, University of Wisconsin-Madison
Relevant Experience:
AMD GPU Intern: Experience with HIP, CUDA and OpenMP.
HPC & Compiler Optimization: Research in High-Performance Computing (HPC) and performance engineering at AMD.
C++ Development: Strong background in C++ for system programming and software performance tuning.
Would you be available for a brief discussion on this proposal? I am happy to incorporate any suggestions and further refine the plan before formally submitting it.
Looking forward to your thoughts.
Best regards,
Ambika Sharan