Given we have the Web Audio API and getUserMedia, I wondered if I could make a passable guitar tuner. Looks like I can, and in the process I learned way more stuff about audio than I care to mention. Cool stuff, though! I thought I'd do a breakdown of what went into building it.
If you're not a guitarist, or you don't have a guitar to hand, you can always check out the video below where I show it in use. Unfortunately it does involve seeing me play the guitar, for which I can only apologise, but hopefully I at least get points for trying.
It is a small app, mind. The whole thing weighs in at 40.1KB including Polymer (but excluding the 12KB Web Components polyfills), so if it had been slow to load I think I'd have found that more than a little depressing.
You can read the other post if you want the super gory details, but the quick version here is that I'm loading all three of my web components individually, and as each one arrives it upgrades the element it manages. In order to prevent FOUC, I inline some styles in the head of the app's index.html that make it look like this:
The elements all race to get Polymer, and, because of the way HTML Imports work and because Polymer is always requested with the same URL, we only request it once. Once loaded, all three components will be able to use it.
I think one of the really nice bits of Web Components is that it encourages healthy code decoupling. Sure you can achieve it anyway without making components, but I just find that it helps to have a nudge every now and then! And of course I can now bundle up the logic so if I need any more audio mangling I have an element ready to go.
I did have a bit of an "umm" and an "ahh" over whether something like the audio processing should be an element or not. On the one hand it doesn't really offer any semantic value to have it there in the DOM, on the other it can dispatch events, which is really handy. Clearly you can see which way I came down on this one since there is an element, but I wouldn't blame anyone for calling it the other way.
You can't pass the class itself (or an instance) to Polymer, because without sugar the class is a function and the Web Components registerElement function that Polymer calls expects an object as its second parameter, not a function. It also expects a tag name as its first, so I used a getter for is because it appears as a property on the prototype. I guess I could have done this.constructor.prototype.is = 'my-rad-element', but getters look neater to me.
Another side-effect of this approach is that you don't get to use an instance of the class anywhere, so anything you would have done in constructor now needs to be done in the created and attached callbacks, which is a bit limiting but also no big deal. I guess that's just the nature of using a class / function instead of an object.
All of this isn't strictly necessary, or even remotely so; there's nothing wrong with giving Polymer an object. But I like ES6 Classes (controversial, I know) and if I'm in ES6 world, or want to be, why not just try and get it all working nicely? Yes? Winner.
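To make the class-versus-object point concrete, here's a minimal sketch. The class name, the tag name and the registerStandIn function are all made up for illustration; the stand-in only mimics the "give me an object, not a function" requirement and is not Polymer's actual code.

```javascript
// Hypothetical element class; the name and tag are for illustration only.
class MyRadElement {
  // Getters are defined on the prototype, so `is` shows up on the object
  // that gets handed over for registration.
  get is() {
    return 'my-rad-element';
  }

  // Setup that would normally live in a constructor goes in the callbacks.
  created() {
    this.sampleCount = 0;
  }

  attached() {
    this.isAttached = true;
  }
}

// Assumption: a stand-in for the registration step, NOT Polymer's real code.
// It only mimics the "needs an object, not a function" requirement.
function registerStandIn(proto) {
  if (typeof proto !== 'object' || proto === null) {
    throw new TypeError('Expected an object, got ' + typeof proto);
  }
  return proto.is;
}

// The class itself is a function, but its prototype is a plain object.
const registeredTag = registerStandIn(MyRadElement.prototype);
console.log(registeredTag, typeof MyRadElement);
```

In a page with Polymer loaded, the equivalent step would presumably be handing MyRadElement.prototype to Polymer in place of the stand-in.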
With elements in place, let's talk about analysing audio, because I thought this bit was going to be relatively easy to do. I was wrong. Very wrong. Essentially I'm a clown and still haven't learned to estimate work well. But let me see if I can't make it easier for the next troubled soul who attempts to do something similar.
Attempt number one, then: Fast Fourier Transforms, or FFTs. If you're not familiar with them, what they do is give you a breakdown of the current audio in frequency buckets. The Web Audio API lets you get at that data in, say, a requestAnimationFrame callback with an AnalyserNode, on which you call getFloatFrequencyData.
I thought that if I took an FFT of the audio, I would be able to step through it and look for the most active frequency. Then it's a case of figuring out which string it's likely to be based on the frequency, and then providing "tune up", "tune down", or "in tune" messages accordingly.
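As a rough sketch of that first idea (not the code I shipped), something like the following would do it. The function names, the tolerance and the fake data are assumptions for illustration; the only bit borrowed from the Web Audio API is the shape of the data, one decibel value per bucket, with bucket i centred on i * sampleRate / fftSize.

```javascript
// Assumption: `frequencyData` holds one decibel value per bucket, the shape
// AnalyserNode.getFloatFrequencyData() fills in.
function findPeakFrequency(frequencyData, sampleRate, fftSize) {
  let peakIndex = 0;
  for (let i = 1; i < frequencyData.length; i++) {
    if (frequencyData[i] > frequencyData[peakIndex]) {
      peakIndex = i;
    }
  }
  // Convert the bucket index back into Hz.
  return peakIndex * sampleRate / fftSize;
}

// Hypothetical helper: turn a measured frequency into one of the messages.
function tuningMessage(frequency, targetFrequency, toleranceHz) {
  if (Math.abs(frequency - targetFrequency) <= toleranceHz) {
    return 'in tune';
  }
  return frequency < targetFrequency ? 'tune up' : 'tune down';
}

// Fake data standing in for a real FFT: everything quiet except bucket 5.
const peakFftSize = 2048;
const peakSampleRate = 44100;
const fakeFrequencyData = new Float32Array(peakFftSize / 2).fill(-100);
fakeFrequencyData[5] = -20;

const peakFrequency = findPeakFrequency(fakeFrequencyData, peakSampleRate, peakFftSize);
console.log(peakFrequency.toFixed(2), tuningMessage(peakFrequency, 110, 1));
```

In a page you'd fill the array with analyser.getFloatFrequencyData(data) inside a requestAnimationFrame callback, and pass in audioContext.sampleRate and analyser.fftSize.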
In order to get enough resolution on frequencies, you need a colossal FFT for this approach. With an FFT of 32K (the largest you can get), each bucket in the array represents a frequency range of roughly 1.3Hz at a 44.1kHz sample rate (the sample rate divided by the FFT size).
Filling up an array of that size takes somewhere in the region of 11ms on a Nexus 5 on a good day with a following wind. If you're trying to do that in a requestAnimationFrame callback, you're going to have a bad time. Doubly bad is the fact that you're also going to have to process the audio data after getting it. For 60fps you have about 8-10ms of JavaScript time at the absolute maximum. The browser has housekeeping to do, so you have to share CPU time. In the end this approach yielded something with a frame rate that fluctuated wildly between 30 and 60fps, and something which can only be described by its friends as a "CPU melter".
See how there are peaks all over the place? Each string brings its own special combination of frequencies with it, called harmonics. One thing is for sure: it's not a "pure" sample where you can infer that you're hitting a given string just from the most active frequency.
I'm a little hard of understanding sometimes, so I attempted to work around this with some good ol' fashioned number fishing and fudging. It kind of worked under very specific circumstances, but it really wasn't robust.
Then Chris Wilson helped me. For context, I'd got to the end of my hack-fudge approach and started googling for things like "please i am a clown how do you do simple pitch detection?" As you might expect, the top results were Wikipedia articles that may as well be written in Ancient Egyptian hieroglyphics for all the sense they make. They're seemingly written by people who already understand these topics, and whose sole aim seems to be to ensure that you won't. I got the same deal when I made a 3D engine a few years back and, as with that period in my life, all of me screamed out for simple, treat-me-like-a-human explanations. Thankfully that's exactly what Chris provided over the course of several hours.
In retrospect I guess the name is a clue: auto- (self-) and correlation (matching). The idea is if you have an audio wave you can compare it to itself at various offsets. If you find a match then you have found where this wave repeats itself, even factoring in harmonics (more on that in a moment). Once you know when a wave repeats itself you have theoretically found its frequency.
You can get the wave data from the Web Audio API (of course you can, what a lovely API) with getFloatTimeDomainData, which has nearly zero documentation and also sounds like a function named after buzz words' greatest hits. But it does precisely what we need it to: it populates an array with floating point wave data with values ranging between -1 and 1.
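Here's a minimal sketch of the compare-the-wave-to-itself idea, assuming a sum of absolute differences as the "does it match?" measure. The function names and the offset range are my own choices for illustration, and the sine wave stands in for whatever getFloatTimeDomainData would hand you.

```javascript
// Average absolute difference between the wave and itself shifted by `offset`.
function averageDifference(samples, offset, windowSize) {
  let total = 0;
  for (let i = 0; i < windowSize; i++) {
    total += Math.abs(samples[i] - samples[i + offset]);
  }
  return total / windowSize;
}

// Try every offset in [minOffset, maxOffset] and keep the closest match.
function estimateFrequency(samples, sampleRate, minOffset, maxOffset) {
  // Same window for every offset, so the scores are comparable.
  const windowSize = samples.length - maxOffset;
  let bestOffset = minOffset;
  let bestDifference = Infinity;

  for (let offset = minOffset; offset <= maxOffset; offset++) {
    const difference = averageDifference(samples, offset, windowSize);
    if (difference < bestDifference) {
      bestDifference = difference;
      bestOffset = offset;
    }
  }

  // A wave that repeats every `bestOffset` samples has this frequency.
  return sampleRate / bestOffset;
}

// Synthetic stand-in for analyser.getFloatTimeDomainData(samples):
// a 441Hz sine, which repeats every 100 samples at 44.1kHz.
const waveSampleRate = 44100;
const waveSamples = new Float32Array(4096);
for (let i = 0; i < waveSamples.length; i++) {
  waveSamples[i] = Math.sin(2 * Math.PI * 441 * i / waveSampleRate);
}

const estimatedFrequency = estimateFrequency(waveSamples, waveSampleRate, 20, 150);
console.log(estimatedFrequency);
```

The offset range is capped in the demo so only one repeat of the wave falls inside it; a wider range would also match at multiples of the period, so you'd want to prefer the smallest good match.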
Ideally speaking one would do some curve fitting here to figure out exactly where the wave repeats itself, but I found I was getting good enough results without that. The main problem I had with this approach was getting it to run quickly enough. With an array of 4,096, I was going to end up doing potentially 2,048 * 2,047 = 4,192,256 calculations, which wasn't quick enough to be done inside 8-10ms on mobile.
What I ended up doing was to do an initial pass where I just used 6 offsets, one for each string. Since I knew what frequency each string should be, I decided to offset the wave by that much and choose whichever string's offset yielded the lowest difference. The nearest match can then be considered the "target" string, kind of an "Oh, it looks like you're tuning the D3 string!" approach.
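Sketched out, that first pass might look like the following. The frequencies are the usual equal-temperament values for standard tuning; the function names and the synthetic "plucked" wave are assumptions for illustration, not the code from the app.

```javascript
// Standard-tuning frequencies in Hz (equal temperament, A4 = 440Hz).
const GUITAR_STRINGS = [
  { name: 'E2', frequency: 82.41 },
  { name: 'A2', frequency: 110.00 },
  { name: 'D3', frequency: 146.83 },
  { name: 'G3', frequency: 196.00 },
  { name: 'B3', frequency: 246.94 },
  { name: 'E4', frequency: 329.63 }
];

// Average absolute difference between the wave and itself shifted by `offset`.
function differenceAtOffset(samples, offset, windowSize) {
  let total = 0;
  for (let i = 0; i < windowSize; i++) {
    total += Math.abs(samples[i] - samples[i + offset]);
  }
  return total / windowSize;
}

// One comparison per string: whichever offset matches best is the target.
function findTargetString(samples, sampleRate) {
  // The lowest string has the largest offset; size the window around it so
  // every string is scored over the same number of samples.
  const largestOffset = Math.round(sampleRate / GUITAR_STRINGS[0].frequency);
  const windowSize = samples.length - largestOffset;
  let target = null;
  let lowestDifference = Infinity;

  for (const guitarString of GUITAR_STRINGS) {
    const offset = Math.round(sampleRate / guitarString.frequency);
    const difference = differenceAtOffset(samples, offset, windowSize);
    if (difference < lowestDifference) {
      lowestDifference = difference;
      target = guitarString;
    }
  }
  return target;
}

// Synthetic stand-in for a plucked D3 string: a plain sine at 146.83Hz.
const pluckSampleRate = 44100;
const pluckedD3 = new Float32Array(4096);
for (let i = 0; i < pluckedD3.length; i++) {
  pluckedD3[i] = Math.sin(2 * Math.PI * 146.83 * i / pluckSampleRate);
}

console.log(findTargetString(pluckedD3, pluckSampleRate).name);
```

That's six passes over the data instead of a couple of thousand, which is where the time saving comes from.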
In the above image you can see the E4 string being plucked, and the various offset versions. You can also see that, when moved by E4's expected offset, the wave matches itself more closely than for any other offset, which is exactly what we want.
Now that I had the candidate for the closest match, I used the code from above to figure out exactly how far away from the target frequency the plucked string was. Instead of using offsets from 0 to 2,048, however, I did it from (an admittedly random value of) 10 either side of the expected offset for that string. The net result was far fewer overall comparisons, although at the cost of only supporting standard tuning. I figure there may be a version of this I'm missing which would allow me to support any tuning, but alas it eludes me. I am eluded.
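A sketch of that narrowed-down search, under the same assumptions as before (absolute differences as the match measure, made-up function names, and a synthetic wave in place of real guitar audio):

```javascript
// Average absolute difference between the wave and itself shifted by `offset`.
function mismatch(samples, offset, windowSize) {
  let total = 0;
  for (let i = 0; i < windowSize; i++) {
    total += Math.abs(samples[i] - samples[i + offset]);
  }
  return total / windowSize;
}

// Only look `range` offsets either side of where the target string should be.
function refineFrequency(samples, sampleRate, targetFrequency, range) {
  const expectedOffset = Math.round(sampleRate / targetFrequency);
  const windowSize = samples.length - (expectedOffset + range);
  let bestOffset = expectedOffset;
  let lowestMismatch = Infinity;

  for (let offset = expectedOffset - range; offset <= expectedOffset + range; offset++) {
    const current = mismatch(samples, offset, windowSize);
    if (current < lowestMismatch) {
      lowestMismatch = current;
      bestOffset = offset;
    }
  }
  return sampleRate / bestOffset;
}

// Synthetic stand-in for a slightly flat G3 string: the wave repeats every
// 230 samples, where an in-tune G3 (196Hz) would repeat every 225 at 44.1kHz.
const refineSampleRate = 44100;
const flatG3 = new Float32Array(4096);
for (let i = 0; i < flatG3.length; i++) {
  flatG3[i] = Math.sin(2 * Math.PI * i / 230);
}

const measuredFrequency = refineFrequency(flatG3, refineSampleRate, 196, 10);
console.log(measuredFrequency < 196 ? 'tune up' : 'tune down or in tune');
```

With a range of 10 that's 21 offsets to check rather than 2,048, and the resolution is still one sample, which is where the curve fitting mentioned earlier would help.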
Eventually I realised that I kept turning my head about 45° to the left, and that was the clue I needed. What I was looking for were the left- and right-most points of the dial when looking at it at 45°. Or, put another way, rotate the points of the dial clockwise by 45° then order them by x. Then choose the 0th and last values, since they are where the dial's extremities are.
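The rotate-then-sort trick is small enough to sketch. I'm assuming canvas-style coordinates here (y grows downwards) and points stored as objects with x and y; the function name and the test points are made up for illustration.

```javascript
// Assumptions: canvas-style coordinates (y grows downwards) and points
// stored as { x, y } objects.
function findExtremities(points) {
  const angle = Math.PI / 4; // 45 degrees
  const cos = Math.cos(angle);
  const sin = Math.sin(angle);

  const ordered = points
    // Only the rotated x value is needed for the ordering. With y pointing
    // down, this rotation appears clockwise on screen.
    .map(point => ({ point, rotatedX: point.x * cos - point.y * sin }))
    .sort((a, b) => a.rotatedX - b.rotatedX);

  return {
    leftMost: ordered[0].point,
    rightMost: ordered[ordered.length - 1].point
  };
}

// Eight points around a unit circle standing in for the dial's outline.
const dialPoints = [];
for (let step = 0; step < 8; step++) {
  const theta = step * Math.PI / 4;
  dialPoints.push({ x: Math.cos(theta), y: Math.sin(theta) });
}

const extremities = findExtremities(dialPoints);
console.log(extremities.leftMost, extremities.rightMost);
```

For a circle that picks out the bottom-left and top-right points, which are the two places a shadow falling towards the bottom right would leave the outline.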
From there it's a case of drawing out the points as part of the shadow and, when you hit the right-most point start the drop down for the shadow, go along the bottom and come back up to the left-most point. Ta-daaaa!
Finally, I just wanted to share one little tip about working with Service Workers that I've found helpful. I came across a gulp plugin called bump. It's useful for taking the version in your package.json file and incrementing the number. (You tell it if it's a patch, major or minor revision.)
When I'm cutting a release I bump the version automatically using it. If I want to do major or minor versions I'll just tweak the package.json file myself, but for patching it'll do it for me during the build.
Whenever I run the tasks that write out the Service Worker, or anywhere else where I might want to include the version number, I grab the version from package.json and do a string replace on the target file.
What this gives me is the assurance that I won't accidentally leave my users on an old version of the app due to an unchanged Service Worker. The Service Worker has the version string in it and, in my case, I also use that as part of its cache's name:
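As a rough sketch of the string replace (the placeholder token, the cache name prefix and the function name are all assumptions for illustration, and the objects below stand in for reading package.json and the Service Worker file during the build):

```javascript
// Hypothetical placeholder that the build step swaps for the real version.
const VERSION_TOKEN = '@VERSION@';

// Replace every occurrence of the token with the version string.
function injectVersion(source, version) {
  return source.split(VERSION_TOKEN).join(version);
}

// Stand-ins for package.json and the Service Worker source file.
const packageInfo = { version: '1.0.3' };
const serviceWorkerTemplate = "var CACHE_NAME = 'guitar-tuner-@VERSION@';";

const serviceWorkerSource = injectVersion(serviceWorkerTemplate, packageInfo.version);
console.log(serviceWorkerSource);
```

Because the version ends up inside the file, any bump changes the Service Worker's bytes, so the browser treats it as a new worker, and the new cache name keeps its files apart from the old version's.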