Hello again, fellow JavaCV users!
I am using CascadeClassifier to do object detection on a video feed. Currently I run upper-body, full-body, and face detection with three separate CascadeClassifier instances, and I am trying to run each detection on its own CPU thread. However, that ends up being slower than running them sequentially. It almost seems as if there is a lock behind calls to CascadeClassifier.detectMultiScale that allows only one call to run at a time.
Here is how I am testing this:
// Create one job per classifier
var classifierJobs = getClassifiers().stream()
    .map((classifier) -> {
        // The job returns the RectVector it fills, so the Callable is typed to match
        Callable<RectVector> classifierJob = () -> {
            var detectedObjs = new RectVector();
            classifier.detectMultiScale(grayMat,
                detectedObjs,
                scaleFactor,
                minNeighbors,
                CASCADE_SCALE_IMAGE,
                new Size(minSize, minSize),
                new Size()
            );
            return detectedObjs;
        };
        return classifierJob;
    })
    .collect(Collectors.toList());

// Execute jobs with ExecutorService instance
ExecutorService classifierPool = Executors.newCachedThreadPool();
classifierPool.invokeAll(classifierJobs);
As you can see, I am creating a Callable for each CascadeClassifier instance. Inside, there's a call to detectMultiScale. I feed all of these Callables to a cached thread pool via invokeAll.
This code runs, but it is slow. If I instead create classifierPool with newSingleThreadExecutor(), the code runs much faster, even though everything then executes sequentially on a single thread.
So what's the deal here? Am I running into threading overhead? I thought using a thread pool would reduce some of that overhead, particularly the cost of thread creation and destruction.
Or is there some kind of mutex lock behind the scenes that I am not aware of that's causing my multithreaded code to slow down?
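One way to separate those two explanations is to time the same executor setup with a plain CPU-bound job that never touches JavaCV. Here is a minimal sketch of that comparison. The class PoolTimingSketch and the method busyWork are hypothetical stand-ins invented for illustration (busyWork takes the place of detectMultiScale); only the java.util.concurrent calls are real APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolTimingSketch {

    // Hypothetical CPU-bound stand-in for one detectMultiScale call (not a JavaCV API).
    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (acc * 31 + i) % 7;
        }
        return acc;
    }

    // Submits jobCount identical jobs via invokeAll and collects their results in order.
    static List<Long> runAll(ExecutorService pool, int jobCount, int iterations) {
        List<Callable<Long>> jobs = new ArrayList<>();
        for (int i = 0; i < jobCount; i++) {
            jobs.add(() -> busyWork(iterations));
        }
        try {
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Wall-clock time in milliseconds for one batch on either pool type.
    static long timeMillis(boolean parallel, int jobCount, int iterations) {
        ExecutorService pool = parallel
                ? Executors.newCachedThreadPool()
                : Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        runAll(pool, jobCount, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Short warm-up so JIT compilation does not dominate the first measurement
        timeMillis(true, 3, 1_000_000);
        timeMillis(false, 3, 1_000_000);

        System.out.println("cached pool:   " + timeMillis(true, 3, 50_000_000) + " ms");
        System.out.println("single thread: " + timeMillis(false, 3, 50_000_000) + " ms");
    }
}
```

If the cached pool wins with this stand-in but loses with the real classifiers, that would suggest the slowdown comes from something inside the native detectMultiScale call rather than from executor overhead.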
Any help appreciated. Thanks in advance!
Ben