Hello,
I am a graduate student at Iowa State University. I have built a face recognition system that includes both face detection and recognition. I am interested in your speaker tracking project, and my idea is to provide a parallel version of face detection and speaker tracking.
I am thinking about using the OpenCV and CUDA libraries; both are system independent.
For face detection, I would use the Haar cascade detection algorithm in OpenCV, which has already proved effective. A GPU version is also included in OpenCV.
For speaker tracking, I would like to test two algorithms: CamShift and mean shift. I found two requirements in your document: "The background is unevenly illuminated" and "The camera moves in a controlled fashion, such as panning in either the X or Y direction and zooming." I expect that under these conditions the two algorithms alone would not produce good results, so I am thinking about adding a scale-invariant feature descriptor, such as SIFT or SURF, to achieve a better detection rate when the size and illumination change. SIFT and SURF could also be used to re-find the speaker after he has been occluded by another person for a few seconds.
So this is my basic idea. Do you think this would be a good project for GSoC?
Are there any open questions in the speaker tracking project? I would be happy to hear about them and try to solve them.
Regards,
Yijia Xu
Iowa State University
Computer Science Department