Spectral analysis of (simple) speech using python (and introducing Chidiya)

Dilawar Singh

Dec 24, 2014, 7:27:30 PM
to wncc_iitb
Processing human speech is challenging for the simple reason of the sheer complexity of human languages. A relatively tractable problem is recognizing bird songs. Most work has been done on identifying bird species by processing their songs, but some evolutionary biologists look at bird songs for a different reason:

Shortwing birds are found on the mountain-tops of the Western Ghats and are isolated from each other by deep valleys; people like to call these sky-islands. Genetic data produced by researchers at NCBS shows that major valleys in the Ghats have isolated populations for millions of years. More recent, fine-scaled genetic data shows that human-mediated fragmentation has recently disrupted gene flow between forest patches that were historically connected. With measurements of just a few parameters (time & frequency), the songs of all populations were found to be statistically different. Curiously, some populations (not the geographically closest ones) that shared alleles (genes) across a 1-million-year divide also showed song similarity. Hence the question: is there an invariant singing pattern in two species of Shortwings that have been separated for millions of years?

To figure out whether some elements of song remain conserved over millions of years (reflecting ancient genetic relationships) while others change rapidly (reflecting recent, roughly 100-year, fragmentation effects), someone came up with the idea that one should be able to find certain repeating motifs (a grammar) that could be identified by approaches similar to the ones used on the Indus Valley script. But to get to this 'grammar', we first need data on the 'alphabet', i.e. the notes.

Sorry for the biology above; I thought it was too cool to miss. For the last 4 weeks, I have been working on a Cython program which can figure out the 'alphabet' of bird songs. Since the project is open source and in good enough shape to be introduced to the community for further development, I thought it would be nice to write a brief post about it.

The github page here will have complete information about it over time. I don't want to mix programming details into this post, so I'll just summarize what the application does. It is pretty basic (and can be used by others in similar domains):
  • Read the recorded song in aif/wav format (Python standard library).
  • Compute the spectrogram (scipy) and save it as an image (matplotlib); a rough sketch of these first two steps follows this list.
  • Read the spectrogram and extract notes. The algorithm is pretty simple and is documented in the code as 'slither'. Although not perfect when there is noise, it works quite effectively.
  • Sort the notes according to start-time and cluster/partition them into songs.
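To make the pipeline concrete, here is a minimal sketch of the first two steps. This is not the actual Chidiya code: the file names and the FFT parameters are placeholders, and 16-bit PCM input is assumed.

```python
"""Minimal sketch: read a .wav file with the standard library, compute a
spectrogram with scipy and save it as an image with matplotlib."""
import wave
import numpy as np
import matplotlib
matplotlib.use('Agg')          # write the image to disk, no display needed
import matplotlib.pyplot as plt
from scipy import signal

def spectrogram_image(wav_path, out_png):
    # Read raw samples using the standard-library wave module
    # (assumes 16-bit PCM samples).
    with wave.open(wav_path, 'rb') as w:
        rate = w.getframerate()
        data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
        if w.getnchannels() > 1:
            data = data[::w.getnchannels()]   # keep only the first channel

    # Short-time Fourier transform; Sxx is a frequency x time power matrix.
    f, t, Sxx = signal.spectrogram(data, fs=rate, nperseg=512, noverlap=256)

    # Save the spectrogram as an image (dB scale for visibility).
    plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading='auto')
    plt.xlabel('Time [s]')
    plt.ylabel('Frequency [Hz]')
    plt.savefig(out_png, dpi=150)
    plt.close()

spectrogram_image('song.wav', 'song_spectrogram.png')
```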

The obvious next steps are the following. I would be really grateful for any suggestions on how to do them robustly.

  • Compute a simple geometry from each note. See the image on the github page for what a note looks like.
  • CLASSIFICATION of notes. This looks like a hard problem. I have a few ideas but don't know how they will work out in practice. To the human eye two notes might look very similar, but which algorithm will compute that similarity? In a nutshell, the problem is to find the similarity between splines with sharp turns and twists. I am thinking of it as a geometry problem but would love to hear about other approaches; one rough possibility is sketched after this list.
  • Once notes are classified and songs are written to a text file, the problem reduces to constructing a regular expression which accepts all songs (or generates various songs), and also to a Markov-chain based analysis as done in the Indus paper linked above (a toy transition-count sketch is below). This would be a great tool for research in evolutionary biology.
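On classification, one idea I am toying with (just a sketch, nothing implemented in Chidiya yet) is to treat each note as a frequency-vs-time contour taken from the spectrogram and compare contours with dynamic time warping (DTW), which tolerates notes being stretched or compressed in time. The two contours below are made-up numbers purely for illustration.

```python
"""Sketch of DTW distance between two 1-D frequency contours."""
import numpy as np

def dtw_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic-time-warping distance.
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Two hypothetical note contours (dominant frequency per time bin, in Hz).
note1 = np.array([2100, 2300, 2600, 3000, 2800, 2500], dtype=float)
note2 = np.array([2150, 2350, 2700, 3050, 2750], dtype=float)  # similar shape, fewer bins
print(dtw_distance(note1, note2))
```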
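And a toy sketch of the Markov-chain step: once every note has a label and each song is a sequence of labels (one song per line in the text file), first-order transition probabilities are just normalized bigram counts. The A/B/C labels and the example songs here are made up.

```python
"""Sketch: first-order transition probabilities over note labels."""
from collections import Counter, defaultdict

def transition_probabilities(songs):
    counts = defaultdict(Counter)
    for song in songs:                      # song is a sequence of note labels
        for a, b in zip(song, song[1:]):
            counts[a][b] += 1
    # Normalise counts into probabilities P(next note | current note).
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# In practice these would be read from the songs text file, one song per line.
songs = [['A', 'B', 'B', 'C', 'A'],
         ['A', 'B', 'C', 'C'],
         ['B', 'C', 'A']]
print(transition_probabilities(songs))
```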
--
Dilawar
NCBS Bangalore
