Guitar Tuner Hack

Hilda Bagnoli

Aug 5, 2024, 8:06:56 AM
to mintrotoma
Sign up for FREE access to Fender Tune's new feature - Tune Plus. This includes the largest suite of guitar learning resources like the Pro Tuner, 7000 interactive chords and scales, drum tracks and metronomes. Now available for iOS and Android.

As you may have noticed, all the notes in a C major chord, and then some, are present. This natural phenomenon is deeply intertwined with the formation of music. The frequencies in the series are also mathematically related to each other. The 2nd harmonic is roughly twice the frequency in hertz of the 1st harmonic, and in musical terms is one octave up. So, if our sample note is C4, the 1st harmonic would be about 262Hz and the 2nd harmonic would be about 524Hz. The 3rd harmonic is the first G after the 2nd harmonic and the 4th harmonic is two octaves above the 1st harmonic, or about 1048Hz.
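As a rough sketch of that arithmetic, the snippet below builds an ideal harmonic series for C4 and labels each harmonic with its nearest equal-tempered note. The function names and the A4 = 440Hz reference are assumptions made for illustration only.

```javascript
// Ideal (perfectly harmonic) series: the nth harmonic is n times the fundamental.
function harmonicSeries(fundamentalHz, count) {
  const harmonics = [];
  for (let n = 1; n <= count; n++) {
    harmonics.push(n * fundamentalHz);
  }
  return harmonics;
}

// Nearest equal-tempered note name, assuming A4 = 440Hz (MIDI note 69).
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];
function nearestNoteName(frequencyHz) {
  const midi = Math.round(69 + 12 * Math.log2(frequencyHz / 440));
  const octave = Math.floor(midi / 12) - 1;
  return NOTE_NAMES[midi % 12] + octave;
}

const C4 = 261.63; // approximate frequency of C4 in Hz
const names = harmonicSeries(C4, 5).map(nearestNoteName);
console.log(names.join(' ')); // the C, G and E of a C major chord all show up
```

The first five harmonics land on C, C, G, C and E, which is where the C major chord comes from.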


When piano technicians tune a piano aurally, meaning by ear, they listen to how the harmonics in the notes they are tuning interact with each other. For the sake of simplicity, which is desperately needed when trying to wade into the seething waters of tuning theory, we will stick with the 1st harmonic and the 2nd harmonic. Just keep in mind that piano technicians and sophisticated piano tuning apps use multiple harmonics to determine the best way forward when tuning.


Since a piano has a high degree of inharmonicity, the frequency of the 2nd harmonic of the C4 will usually be more than twice the frequency of the 1st harmonic. In musical terms it will be sharp compared to what it should be in a perfect harmonic series. This means that the C5 that is being tuned must also be sharp, or it will cause beating. If there is too much beating in the tuning of a piano it will sound muddy and inarticulate. Furthermore, that means that the C5 on a piano will most likely be tuned sharper than the C5 on a guitar, because that will help the piano sound more in tune with itself.
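To put a number on "sharp", here is a minimal sketch using the common stiff-string approximation, in which partial n sits at n * f0 * sqrt(1 + B * n^2). The coefficient B below is a made-up illustrative value, not a measurement of any real piano, and the function names are hypothetical.

```javascript
// Stiff-string approximation: B is the inharmonicity coefficient.
// With B = 0 this collapses to the ideal harmonic series.
function partialFrequency(f0, n, B) {
  return n * f0 * Math.sqrt(1 + B * n * n);
}

// Interval between two frequencies in cents (100 cents = one semitone).
function cents(fromHz, toHz) {
  return 1200 * Math.log2(toHz / fromHz);
}

const B = 0.0004;   // hypothetical coefficient, chosen only for illustration
const f0 = 261.63;  // nominal C4 in Hz
const first = partialFrequency(f0, 1, B);
const second = partialFrequency(f0, 2, B);

// How far the 2nd partial sits above a pure octave over the 1st partial.
const stretch = cents(2 * first, second);
console.log(stretch.toFixed(2) + ' cents sharp');
```

With this assumed B the 2nd partial comes out roughly a cent sharp of a pure octave; a C5 tuned to match it would be stretched by the same amount.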


So, all things considered, a guitar tuner is a poor substitute for the tuning you would get from an experienced technician. A well-crafted piano tuning app will give you the measurements for a good tuning, which is a good start, but tuning a piano well also requires a certain skill set that can take years to cultivate.


Given we have the Web Audio API and getUserMedia, I wondered if I could make a passable guitar tuner. Looks like I can, and in the process I learned way more stuff about audio than I care to mention. Cool stuff, though! I thought I'd do a breakdown of what went into building it.


If you're not a guitarist, or you don't have a guitar to hand, you can always check out the video below where I show it in use. Unfortunately it does involve seeing me play the guitar, for which I can only apologise, but hopefully I at least get points for trying.


It is a small app, mind. The whole thing weighs in at 40.1KB including Polymer (but excluding the 12KB Web Components polyfills), so if it had been slow to load I think I'd have found that more than a little depressing.


You can read the other post if you want the super gory details, but the quick version here is that I'm loading all three of my web components individually, and as each one arrives it upgrades the element it manages. In order to prevent FOUC, I inline some styles in the head of the app's index.html that make it look like this:


The elements all race to get Polymer, and, because of the way HTML Imports work and because Polymer is always requested with the same URL, we only request it once. Once loaded, all three components will be able to use it.


I think one of the really nice bits of Web Components is that it encourages healthy code decoupling. Sure you can achieve it anyway without making components, but I just find that it helps to have a nudge every now and then! And of course I can now bundle up the logic so if I need any more audio mangling I have an element ready to go.


I did have a bit of an "umm" and an "ahh" over whether or not something like the audio-processing logic should be an element. On the one hand it doesn't really offer any semantic value to have it there in the DOM, on the other it can dispatch events, which is really handy. Clearly you can see which way I came down on this one since there is an element, but I wouldn't blame anyone for calling it the other way.


You can't pass the class itself (or an instance) to Polymer, because without sugar the class is a function and the Web Components registerElement function that Polymer calls expects an object as its second parameter, not a function. It also expects a tag name as its first, so I used a getter for is because it appears as a property on the prototype. I guess I could have done this.constructor.prototype.is = 'my-rad-element', but getters look neater to me.


Another side-effect of this approach is that you don't get to use an instance of the class anywhere, so anything you would have done in constructor now needs to be done in the created and attached callbacks, which is a bit limiting but also no big deal. I guess that's just the nature of using a class / function instead of an object.
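A minimal sketch of that class shape follows. The registerSketch function is a hypothetical stand-in that only mimics the "wants an object, reads is from it" behaviour described above; it is not Polymer's implementation, and the actual registration call isn't shown in this post.

```javascript
class MyRadElement {
  // A getter is defined on the prototype, so anything reading `proto.is`
  // sees the tag name without an instance ever being created.
  get is() {
    return 'my-rad-element';
  }

  // Work that would normally live in a constructor moves into the
  // lifecycle callbacks instead.
  created() {
    this.ready = false;
  }

  attached() {
    this.ready = true;
  }
}

// Hypothetical stand-in for the registration step (NOT the real Polymer call).
function registerSketch(proto) {
  if (typeof proto !== 'object' || proto === null) {
    throw new TypeError('expected a prototype object, not a class/function');
  }
  return { tagName: proto.is, hasCreated: typeof proto.created === 'function' };
}

// Hand over the prototype (an object) rather than the class (a function).
const registration = registerSketch(MyRadElement.prototype);
console.log(registration.tagName);
```

Passing MyRadElement itself to the stand-in throws, which mirrors the function-versus-object mismatch described above.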


All of this isn't strictly necessary, or even remotely so; there's nothing wrong with giving Polymer an object. But I like ES6 Classes (controversial, I know) and if I'm in ES6 world, or want to be, why not just try and get it all working nicely? Yes? Winner.


With elements in place, let's talk about analysing audio, because I thought this bit was going to be relatively easy to do. I was wrong. Very wrong. Essentially I'm a clown and still haven't learned to estimate work well. But let me see if I can't make it easier for the next troubled soul who attempts to do something similar.


Attempt number one, then: Fast Fourier Transforms, or FFTs. If you're not familiar with them, what they do is give you a breakdown of the current audio in frequency buckets. The Web Audio API can let you get access to that data in - say - a requestAnimationFrame with an AnalyserNode, on which you call getFloatFrequencyData.


I thought that if I took an FFT of the audio, I would be able to step through that, look for the most active frequency. Then it's a case of figuring out which string it's likely to be based on the frequency, and then providing "tune up", "tune down", or "in tune" messages accordingly.
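Sketched out, that first attempt looks something like the following. It assumes the data has already been copied out of the AnalyserNode into an array of decibel values (which is what getFloatFrequencyData fills in); the function names and the 1Hz tolerance are hypothetical choices for illustration.

```javascript
// Nominal frequency of FFT bin `index`.
function binToFrequency(index, sampleRate, fftSize) {
  return index * sampleRate / fftSize;
}

// Step through the bins and return the frequency of the loudest one.
function loudestFrequency(frequencyData, sampleRate, fftSize) {
  let peakIndex = 0;
  for (let i = 1; i < frequencyData.length; i++) {
    if (frequencyData[i] > frequencyData[peakIndex]) {
      peakIndex = i;
    }
  }
  return binToFrequency(peakIndex, sampleRate, fftSize);
}

// Turn a detected frequency into a message for the player.
function tuningHint(detectedHz, targetHz, toleranceHz = 1) {
  if (Math.abs(detectedHz - targetHz) <= toleranceHz) return 'in tune';
  return detectedHz < targetHz ? 'tune up' : 'tune down';
}

// Fake FFT output: silence everywhere except one loud bin.
const data = new Float32Array(1024).fill(-100);
data[10] = -20;
const peakHz = loudestFrequency(data, 44100, 2048);
console.log(peakHz, tuningHint(peakHz, 196));
```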


In order to get enough resolution on frequencies, you need a colossal FFT for this approach. With an FFT of 32K (the largest you can get), each bucket in the array represents a frequency range of the sample rate divided by the FFT size, which is roughly 1.3-1.5Hz at the usual 44.1kHz or 48kHz sample rates.
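For a feel of why the FFT has to be so large, here is the resolution arithmetic as a small sketch. The 44.1kHz sample rate is an assumption (devices may well run at 48kHz), and the function name is made up.

```javascript
// Width of one FFT bin in Hz: the sample rate divided by the FFT size.
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

const width = binWidthHz(44100, 32768);

// Gap between the low E string (E2) and the semitone below it.
const E2 = 82.41;
const semitoneBelow = E2 / Math.pow(2, 1 / 12);
const binsPerSemitone = (E2 - semitoneBelow) / width;

console.log(width.toFixed(2) + 'Hz per bin');
console.log(binsPerSemitone.toFixed(1) + ' bins per semitone at E2');
```

Even at the maximum size there are only a handful of bins per semitone around the low E string, which is not much to go on when you want to say "slightly flat".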


Filling up an array of that size takes somewhere in the region of 11ms on a Nexus 5 on a good day with a following wind. If you're trying to do that in a requestAnimationFrame callback, you're going to have a bad time. Doubly bad is the fact that you're also going to have to process the audio data after getting it. For 60fps you have about 8-10ms of JavaScript time at the absolute maximum. The browser has housekeeping to do, so you have to share CPU time. In the end this approach yielded something with a frame rate that fluctuated wildly between 30 and 60fps, and something which can only be described by its friends as a "CPU melter".


See how there are peaks all over the place? Each string brings its own special combination of frequencies with it, called harmonics. One thing is for sure: it's not a "pure" sample where you can infer that you're hitting a given string just from the most active frequency.


I'm a little hard of understanding sometimes, so I attempted to work around this with some good ol' fashioned number fishing and fudging. It kind of worked under very specific circumstances, but it really wasn't robust.


Then Chris Wilson helped me. For context, I'd got to the end of my hack-fudge approach and started googling for things like "please i am a clown how do you do simple pitch detection?" As you might expect, the top results were Wikipedia articles that may as well be written in Ancient Egyptian hieroglyphics for all the sense they make. They're seemingly written by people who already understand these topics, and whose sole aim seems to be to ensure that you won't. I got the same deal when I made a 3D engine a few years back and, as with that period in my life, all of me screamed out for simple, treat-me-like-a-human explanations. Thankfully that's exactly what Chris provided over the course of several hours.


In retrospect I guess the name is a clue: auto- (self-) and correlation (matching). The idea is if you have an audio wave you can compare it to itself at various offsets. If you find a match then you have found where this wave repeats itself, even factoring in harmonics (more on that in a moment). Once you know when a wave repeats itself you have theoretically found its frequency.
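A minimal sketch of the idea, under some assumptions: it scores each offset by the average absolute difference between the wave and its shifted copy (a difference-based cousin of textbook multiply-and-sum autocorrelation), and it searches a caller-supplied offset range so that the trivial match at offset zero is excluded. The names and the synthetic 110Hz test signal are illustrative only.

```javascript
// Average absolute difference between a wave and itself shifted by `offset`.
function differenceAtOffset(wave, offset) {
  const count = wave.length - offset;
  let total = 0;
  for (let i = 0; i < count; i++) {
    total += Math.abs(wave[i] - wave[i + offset]);
  }
  return total / count;
}

// Try every offset in [minOffset, maxOffset] and keep the closest match.
function estimateFrequency(wave, sampleRate, minOffset, maxOffset) {
  let bestOffset = minOffset;
  let bestDifference = Infinity;
  for (let offset = minOffset; offset <= maxOffset; offset++) {
    const difference = differenceAtOffset(wave, offset);
    if (difference < bestDifference) {
      bestDifference = difference;
      bestOffset = offset;
    }
  }
  // The best offset is the period in samples; convert it to Hz.
  return sampleRate / bestOffset;
}

// Synthetic signal: a 110Hz sine sampled at 44.1kHz.
const sampleRate = 44100;
const wave = new Float32Array(4096);
for (let i = 0; i < wave.length; i++) {
  wave[i] = Math.sin(2 * Math.PI * 110 * i / sampleRate);
}
const estimate = estimateFrequency(wave, sampleRate, 100, 600);
console.log(estimate.toFixed(2) + 'Hz');
```

Because offsets are whole samples, the estimate is only as fine as the nearest sample; curve fitting between offsets would tighten it up.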


You can get the wave data from the Web Audio API (of course you can, what a lovely API) with getFloatTimeDomainData, which has nearly zero documentation and also sounds like a function named after buzz words' greatest hits. But it does precisely what we need it to: it populates an array with floating point wave data with values ranging between -1 and 1.
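Pulling the data out could look roughly like this. The readWave helper is a hypothetical name; it accepts anything shaped like an AnalyserNode, which also makes it possible to exercise outside a browser with a stand-in object.

```javascript
// `analyser` is expected to be a Web Audio AnalyserNode, or anything with
// the same fftSize property and getFloatTimeDomainData(array) method.
// Passing in a reusable buffer avoids allocating a new array every frame.
function readWave(analyser, buffer) {
  const wave = buffer || new Float32Array(analyser.fftSize);
  analyser.getFloatTimeDomainData(wave);
  return wave;
}

// In a browser the analyser would come from something like:
//   const context = new AudioContext();
//   const analyser = context.createAnalyser();
//   analyser.fftSize = 4096;
// Here a stand-in object fills the buffer with a constant instead.
const fakeAnalyser = {
  fftSize: 8,
  getFloatTimeDomainData(array) {
    array.fill(0.5);
  },
};
const samples = readWave(fakeAnalyser);
console.log(samples.length, samples[0]);
```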


Ideally speaking one would do some curve fitting here to figure out exactly where the wave repeats itself, but I found I was getting good enough results without that. The main problem I had with this approach was getting it to run quickly enough. With an array of 4,096, I was going to end up doing potentially 2,048 * 2,047 = 4,192,256 calculations, which wasn't quick enough to be done inside 8-10ms on mobile.


What I ended up doing was to do an initial pass where I just used 6 offsets, one for each string. Since I knew what frequency each string should be, I decided to offset the wave by that much and choose whichever string's offset yielded the lowest difference. The nearest match can then be considered the "target" string, kind of an "Oh, it looks like you're tuning the D3 string!" approach.
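That initial pass can be sketched as below. The string frequencies are standard tuning in equal temperament with A4 = 440Hz, and the function names and synthetic sine input are assumptions; this is an illustration of the approach rather than the tuner's actual code.

```javascript
// Standard-tuning string frequencies in Hz (equal temperament, A4 = 440Hz).
const STRINGS = [
  { name: 'E2', frequency: 82.41 },
  { name: 'A2', frequency: 110.0 },
  { name: 'D3', frequency: 146.83 },
  { name: 'G3', frequency: 196.0 },
  { name: 'B3', frequency: 246.94 },
  { name: 'E4', frequency: 329.63 },
];

// Average absolute difference between a wave and itself shifted by `offset`.
function differenceAtOffset(wave, offset) {
  const count = wave.length - offset;
  let total = 0;
  for (let i = 0; i < count; i++) {
    total += Math.abs(wave[i] - wave[i + offset]);
  }
  return total / count;
}

// Six comparisons instead of thousands: one offset per string, lowest difference wins.
function guessTargetString(wave, sampleRate) {
  let bestName = null;
  let bestDifference = Infinity;
  for (const string of STRINGS) {
    const offset = Math.round(sampleRate / string.frequency); // period in samples
    const difference = differenceAtOffset(wave, offset);
    if (difference < bestDifference) {
      bestDifference = difference;
      bestName = string.name;
    }
  }
  return bestName;
}

// Build a plain sine wave for testing.
function sineWave(frequency, sampleRate, length) {
  const wave = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    wave[i] = Math.sin(2 * Math.PI * frequency * i / sampleRate);
  }
  return wave;
}

console.log(guessTargetString(sineWave(146.83, 44100, 4096), 44100));
```

One caveat with this naive version: E2 and E4 are two octaves apart, so E2's offset is also very nearly four whole periods of an E4 wave, and a real tuner likely needs some extra care to tell those two strings apart.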


In the above image you can see the E4 string being plucked, and the various offset versions. You can also see that, when moved by E4's expected offset, the wave matches itself more closely than for any other offset, which is exactly what we want.
