An idea on speaker tracking


Samson Mulder

Apr 17, 2013, 5:17:16 PM
to gst-s...@googlegroups.com

Hello,

 

I am a graduate student at Iowa State University. I have built a face recognition system that covers both face detection and recognition. I am interested in your speaker tracking project, and my idea is to provide a parallel version of face detection and speaker tracking.

I am thinking of using the OpenCV and CUDA libraries. Both run on the major operating systems, although CUDA requires an NVIDIA GPU.

For face detection, I would use the Haar cascade face detector in OpenCV, which has already proved effective. OpenCV also includes a GPU version.

For speaker tracking, I would like to test two algorithms, CAMShift and mean shift. Your document lists two requirements: "The background is unevenly illuminated" and "The camera moves in a controlled fashion, such as panning in either X or Y direction and zoom." I expect that neither algorithm will give good results under these conditions on its own, so I am thinking about adding a scale-invariant feature descriptor, such as SIFT or SURF, to improve the detection rate when the size and illumination change. SIFT and SURF could also be used to find the speaker again after he has been occluded by another person for a few seconds.
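To make the mean shift step concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the project's code: `weights` stands in for a per-pixel likelihood map (for example a histogram back-projection of the speaker's colours), and the function name and parameters are hypothetical. It implements plain mean shift with a fixed window size; CAMShift additionally adapts the window size and orientation. OpenCV ships its own versions as `cv2.meanShift` and `cv2.CamShift`.

```python
import math


def mean_shift(weights, window, max_iter=20):
    """Slide a fixed-size window toward the weighted centroid beneath it.

    weights: 2D list of non-negative numbers (assumed likelihood map).
    window:  (x, y, w, h) with (x, y) the top-left corner in pixels.
    Returns the window after convergence or after max_iter steps.
    """
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                wt = weights[r][c]
                total += wt
                sum_x += wt * c
                sum_y += wt * r
        if total == 0:
            break  # no evidence under the window, so stay put
        # Re-centre the window on the centroid, rounding half up.
        new_x = math.floor(sum_x / total - (w - 1) / 2 + 0.5)
        new_y = math.floor(sum_y / total - (h - 1) / 2 + 0.5)
        # Keep the window inside the image.
        new_x = min(max(new_x, 0), cols - w)
        new_y = min(max(new_y, 0), rows - h)
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return x, y, w, h
```

Because the window only follows weight that already overlaps it, the tracker loses the target once the speaker is fully occluded or jumps further than one window width between frames. That is the gap a feature descriptor such as SIFT or SURF would be meant to close.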

 

So this is my basic idea. Do you think this would be a good project for GSoC?

Are there any open questions in the speaker tracking project? I would be happy to hear them and work on them.

Regards,

Yijia Xu

Iowa State University

Computer Science Department

Tim Ansell

Apr 17, 2013, 8:16:48 PM
to gst-s...@googlegroups.com
On 18 April 2013 07:17, Samson Mulder <walk...@gmail.com> wrote:


For face detection, I would use the Haar cascade face detector in OpenCV, which has already proved effective. OpenCV also includes a GPU version.


This is what we currently use (OpenCV with Haar face detection). Take a look at the speaker-tracking branch at;


So this is my basic idea. Do you think this would be a good project for GSoC?

Are there any open questions in the speaker tracking project? I would be happy to hear them and work on them.


The biggest open question is how to develop a test suite which doesn't require any hardware.
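One possible way to approach a hardware-free test suite is to replace the camera with synthetic frames whose ground truth is known. The sketch below is an illustration only, assuming grayscale frames stored as nested lists. All names in it are hypothetical, and `locate_target` is a trivial stand-in for whatever tracker would actually be under test. The frames include a horizontal brightness ramp to mimic an unevenly illuminated background, and the target drifts sideways as it would under a camera pan.

```python
def make_frame(width, height, target_x, target_y, size=4):
    """Grayscale frame: brightness ramp background plus one bright square."""
    frame = [[40 + 2 * c for c in range(width)] for _ in range(height)]
    for r in range(target_y, target_y + size):
        for c in range(target_x, target_x + size):
            frame[r][c] = 255
    return frame


def locate_target(frame, threshold=200):
    """Stand-in tracker: top-left corner of the pixels at or above threshold."""
    hits = [
        (c, r)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value >= threshold
    ]
    if not hits:
        return None
    return min(c for c, _ in hits), min(r for _, r in hits)


def simulate_pan(steps, width=32, height=24, start=(2, 10), dx=3):
    """Yield (ground_truth, frame) pairs for a target moving right by dx per frame."""
    x, y = start
    for i in range(steps):
        position = (x + i * dx, y)
        yield position, make_frame(width, height, *position)


def run_tracking_test(steps=8):
    """Return True if the stand-in tracker matches ground truth on every frame."""
    return all(
        locate_target(frame) == truth for truth, frame in simulate_pan(steps)
    )
```

The same harness could feed recorded video files instead of generated frames, as long as each frame comes with a known speaker position to compare against.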

Tim
 