Hello,
I am a graduate student at Iowa State University. I have built a face recognition system that includes both face detection and recognition. I am interested in your speaker tracking project, and my idea is to provide a parallel version of face detection and speaker tracking.
I am thinking about using the OpenCV and CUDA libraries; both are system independent.
For face detection, I would use the Haar cascade detection algorithm in OpenCV, which has already proved effective. A GPU version is also included in OpenCV.
For speaker tracking, I would like to test two algorithms: CamShift and mean shift. I found two requirements in your document: "The background is unevenly illuminated" and "The camera moves in a controlled fashion, such as panning in either the X or Y direction and zooming." I expect that under these conditions the two algorithms alone would not produce good results, so I am thinking about adding a scale-invariant feature descriptor, such as SIFT or SURF, to achieve a better detection rate when the size and illumination change. SIFT and SURF could also be used to re-find the speaker after he has been occluded by another person for a few seconds.
So this is my basic idea. Do you think this would be a good project for GSoC?
Are there any open questions in the speaker tracking project? I would be happy to hear about them and try to solve them.
Regards,
Yijia Xu
Iowa State University
Computer Science Department