Hey guys. I put together this interface for YAMNet that can be used either in a GUI or on the command line. I figured it may be useful for someone.
https://github.com/tomysshadow/YAMosse
I created this application because a while back, I wanted an app that could give me a list of timestamps of some sounds in a sound file. I knew the technology for this definitely existed, but I was surprised to find there didn't seem to be any existing program I could just drag and drop a file into, in order to detect the sounds that were in it. The only thing I could find along those lines was YAMalyzer, but it did not report the timestamps of the sounds it detected.
Instead, when I Googled how to get a list of timestamps of sounds in a sound file, all I got were tutorials about how to write code to do it yourself in Python. Perhaps Google was catering to me because I usually use it to look up programming questions, but I didn't want to have to write a bunch of code to do this; I just wanted a program that did it for me. So naturally, I wrote a bunch of code to do it. And now I have a program that can do it for me.
Here's what it can do:
- it can detect all 521 classes of common sounds that the YAMNet model recognizes
- it supports multiple file selection and can scan multiple files at once using multiprocessing
- it provides multiple ways to identify sounds: using a Confidence Score or using the Top Ranked classes
- you can import and export preset files in order to save the options you used for a scan
- you can calibrate individual sound classes so that the scan is more confident or less confident about them, in order to eliminate false positives
- it can output the results as plaintext or as a JSON file
- it can write out timestamps for long sounds as timespans (like 1:30 - 1:35, instead of 1:30, 1:31, 1:32...)
- you can filter out silence by setting the background noise volume
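To give a rough idea of what a couple of those options mean, here is a minimal Python sketch. It is not taken from YAMosse's code: the function names are made up for illustration, the scores are invented, and it assumes detections arrive as whole-second timestamps. It shows the difference between picking classes by a Confidence Score threshold and picking the Top Ranked classes, and how consecutive timestamps could be collapsed into timespans.

```python
def pick_by_confidence(scores, threshold):
    """Return the indices of classes whose score is at least `threshold`."""
    return [i for i, score in enumerate(scores) if score >= threshold]


def pick_top_ranked(scores, k):
    """Return the indices of the k highest scoring classes, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def to_timespans(seconds):
    """Merge consecutive whole-second timestamps into (start, end) pairs."""
    spans = []
    for t in sorted(set(seconds)):
        if spans and t == spans[-1][1] + 1:
            # This second directly follows the previous span, so extend it.
            spans[-1] = (spans[-1][0], t)
        else:
            # Otherwise there is a gap, so start a new span.
            spans.append((t, t))
    return spans


def format_span(span):
    """Format a (start, end) pair as m:ss, or as m:ss - m:ss for a range."""
    def fmt(t):
        return f"{t // 60}:{t % 60:02d}"

    start, end = span
    return fmt(start) if start == end else f"{fmt(start)} - {fmt(end)}"


# Invented scores for four hypothetical classes in one slice of audio.
frame_scores = [0.05, 0.80, 0.30, 0.10]
print(pick_by_confidence(frame_scores, 0.25))  # [1, 2]
print(pick_top_ranked(frame_scores, 1))        # [1]

# Invented detections, in seconds from the start of the file.
detections = [90, 91, 92, 93, 94, 95, 120]
print([format_span(s) for s in to_timespans(detections)])
# ['1:30 - 1:35', '2:00']
```

With a threshold, the number of classes reported varies from moment to moment, whereas Top Ranked always reports a fixed number of classes, even when none of them score very high.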
This is my first "real" Python script. I say "real" in quotes because I have written Python before, but only in the form of quick 'n' dirty batch script replacements that I didn't spend much time on. So this is what I'd consider my first actual Python project, the first time I've made something medium-sized.
I am an experienced developer in other languages, but this is well outside of my usual wheelhouse - most of the stuff I program has something to do with videogames, usually in C++, and is usually command line based or a DLL, so it doesn't have any GUI. So, please excuse any parts of the code that could be improved. I tried my absolute best to make it good quality.
If you're looking for something you can plug a sound file into and get out a text file you can parse through, then I hope you will enjoy my project.