Hey David (and the rest of AMD who might be interested in synth
library implementation),
If you are coming from a C background, some of the tricks I used will
probably feel more natural to you than they would to a career Java guy.
For a bit of context, I want to point out that the "table" array in
WtOsc is the actual wavetable, a mostly-static snippet of audio data
that will be repeated many many times to make the output audio. In any
wavetable synthesizer there is a tradeoff between audio quality and
memory/cache efficiency. A tiny wavetable will be able to render
faster, but, even with fancy interpolation, it might not sound exactly
like the waveform you thought you put into it. Using a larger
wavetable might waste memory (usually not a problem at all these
days), but it will allow you to get better audio quality even with a
crappier interpolation method. This interpolation shows up in
"render(.)", where samples are read out of "table" and stuffed into
"buffer", an array of samples passed in from the UGens upstream of the
WtOsc.
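To make that concrete, here's roughly what a linearly interpolated
table read looks like (a sketch with made-up names, not the actual
WtOsc code):

    // Sketch of a linearly interpolated wavetable read. "phase" is a
    // fractional position into "table"; we blend the two nearest samples.
    float readInterpolated(float[] table, float phase) {
        int i = (int) phase;                      // whole-sample index
        float frac = phase - i;                   // fractional part, 0..1
        float a = table[i % table.length];        // sample on the left
        float b = table[(i + 1) % table.length];  // sample on the right (wraps)
        return a + frac * (b - a);                // linear interpolation
    }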
Ok, let's look at the bit shifting stuff. The size of the wavetable is
a designer choice (as I described above), so, on the one night I
happened to be making WtOsc, I had to pick some arbitrary number to
use as the number of samples ("ENTRIES"). Suppose I had picked 500.
When it came time to implement "render(.)", there is a piece of code
where I want to make sure that I never attempt to read any samples
past the end of the wavetable. Initially, you'd think that "index%500"
would do the trick (and it actually would!), but this function
contains the innermost loop of my entire program. The ability to play
back audio without skipping relies on WtOsc#render not wasting too
much time. So, given the opportunity to replace that one division-
equivalent operation (remember, this is interpreted code on a battery-
powered mobile device) with a (probably) much faster bitwise-and, I
took it.
The snippets "index % ENTRIES" is equivalent to "index & (ENTRIES-1)"
iff "ENTRIES" was an integer power of two. Think about mentally
computing some number % 10, this just means hacking off all of the
upper decimal digits in a number and leaving the rest. You can do the
same trick in binary where % 256 means simply hack off (zero out) all
bits above the bottom eight. So, the fiddling with bitshifts in
"BITS", "ENTRIES", and "MASK" is to that I can specify how many bits
of precision I want in my wavetable index counter, the number (a power
of two) of entries that implies that the table contains, and a bitmask
I can use to zero out all of the overflowing bits I don't want.
The basic idea? "(index+1)&MASK" simply advances "index" around to the
next sample in the wavetable, wrapping if necessary, without having to
do a division/modulus operation.
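In code, the whole trick comes down to three constants and one
bitwise-and (a sketch of the idea, not a verbatim copy of WtOsc):

    static final int BITS    = 8;            // designer-chosen index precision
    static final int ENTRIES = 1 << BITS;    // table size: 2^8 = 256 samples
    static final int MASK    = ENTRIES - 1;  // 0xFF, zeroes out overflow bits

    // Advance to the next sample, wrapping without a division/modulus:
    index = (index + 1) & MASK;  // same result as (index + 1) % ENTRIES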
Why did I pick 8? Dunno. It sounded fine so I left it wherever it was
last set in testing.
Ok, now for why "CHUNK_SIZE" seems to be disconnected from the
wavetable size. Simple answer: they are disconnected. More interesting
answer: "CHUNK_SIZE" is the size of the upstream audio buffer that may
(or may not) make it all the way to the user's ears, with some effects
added along the way. By analogy to 3D graphics, think of "CHUNK_SIZE"
as telling you something about the size of the window in your fancy
game and "ENTRIES" as the size of the texture you are mapping onto
various polygons. If a triangle is far away, the contents of your
texture (wavetable) might be repeated many times to fill a small area
of the output window (this is the case for a high-pitched note). Other
polygons might be closer, even sitting at screen depth, where pixels
in the texture are mapped almost one-to-one with pixels in the window.
Just as using a larger window size might slow down the framerate of
your game on a slower machine, using a larger "CHUNK_SIZE" in the
synth library implies additional latency between the user's
interaction with the touchpad and what they hear through the speakers
(my value, which also happened to be 256, happens to allow an audio
interaction framerate of about 86 fps (Hz), i.e. the sample rate
divided by CHUNK_SIZE, comfortably faster than the rate at which touch
events are pushed in by the UI framework).
Basic idea? The wavetable (holding "ENTRIES" samples) is repeated many
times at different speeds to fill a fixed-size audio frame window
(holding "CHUNK_SIZE" samples). Coincidence: they both happen to be
256.
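If you want the analogy in code: pitch just controls how fast you step
through the table while filling a frame. Something like this ("freq",
"sampleRate", and the fractional "phase" field are my illustrative
names; table, ENTRIES, MASK, and CHUNK_SIZE are as above, and the read
here is a crude nearest-sample lookup rather than interpolated):

    // Sketch: fill one frame from the wavetable at a given pitch. One cycle
    // of the waveform spans ENTRIES table samples, so a tone at freq Hz
    // steps through the table by freq * ENTRIES / sampleRate entries per
    // output sample. A high pitch means a big step, so the table repeats
    // many times per frame (the far-away triangle in the texture analogy).
    void renderFrame(float[] buffer, float freq, float sampleRate) {
        float step = freq * ENTRIES / sampleRate;
        for (int i = 0; i < CHUNK_SIZE; i++) {
            buffer[i] = table[((int) phase) & MASK]; // nearest-sample read
            phase += step;
            if (phase >= ENTRIES) phase -= ENTRIES;  // wrap the position
        }
    }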
Finally, why all the work to return true at the end!!!??? Suppose you
are implementing an effect UGen like a delay or a flanger. When
someone asks you to render your audio into a buffer they provide, you
first have to ask the UGens downstream of you to render into
temporary buffers before you can apply your effect to the audio. But,
suppose you were told that the downstream UGens were only going to
produce silence; then you could be lazy and not attempt to compute
the effect of silence reverberating, saving some tough computation.
To support this kind of dynamic laziness in computation, I made the
library so that UGens must explicitly ask their kids (the ones who
provide their input) to render, and calling render on a kid produces
a boolean describing whether or not the kid actually produced any
audio.
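In interface terms, the contract looks roughly like this (a sketch;
the real base class may differ in details like the sample type):

    // Sketch of the rendering contract described above.
    abstract class UGen {
        // Fill "buffer" with CHUNK_SIZE samples of output. Return false to
        // tell the caller "that was all silence, feel free to skip your own
        // work"; return true if real audio was (probably) produced.
        public abstract boolean render(float[] buffer);
    }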
Take a look at the ExpEnv (envelope) UGen. At the very top of its render
method, it checks if its own attenuation value is below a certain near-
silent threshold. If so, it simply returns false right away -- it is
saying that it won't be producing any non-silent output this frame,
and that it didn't even need to bother consulting its kids to decide
if they had any output (potentially saving a whole chain of UGens from
rendering this frame). Otherwise, it asks its kids to render into an
audio buffer. Checking the return value, it knows that, even if the
envelope isn't closed off, applying any gain change to a silent signal
is a waste of time, so it again returns false right
away. Finally, only if the envelope wasn't closed and the kids had
some actual audio, does the envelope calculation actually run this
frame (returning true at the end to say that it thinks that it
produced some real output that the envelope's parents might want).
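Paraphrased in code, that logic looks something like this (the field
and constant names are my shorthand, not the literal ExpEnv source,
and I'm assuming a single kid for simplicity):

    // Rough paraphrase of the ExpEnv laziness described above.
    public boolean render(float[] buffer) {
        if (attenuation < SILENCE_THRESHOLD) {
            return false;              // envelope closed: don't even ask the kid
        }
        if (!kid.render(buffer)) {
            return false;              // kid was silent: gain times zero is zero
        }
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] *= attenuation;  // only now does the real math run
            attenuation *= decay;      // exponential decay per sample
        }
        return true;                   // we think we produced real output
    }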
Back in WtOsc, oscillators always produce audio every frame. So, of
course, WtOsc is hardcoded to return true every time. If you place an
envelope right after the oscillator in the DSP chain, the envelope's
logic for laziness will keep the oscillator from doing unnecessary
computation. The result, in Ethereal Dialpad, is that, if you stop
poking at the screen with your finger, the processor usage quickly
ramps down as the various UGens give up and go silent. It doesn't
quite get to zero processor usage (even for the DSP chain) because I
didn't implement lazy logic for all of the UGens. Delay, in
particular, doesn't attempt to estimate how much audio is left in its
internal buffer (which might take a long time to go near-silent,
depending on the feedback setting) so it just always returns true
also, meaning anything upstream of it will probably run every frame.
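Sketched the same way, Delay is stuck doing something like this (again
my paraphrase, not the real code):

    // Delay can't cheaply tell when its internal buffer has decayed to
    // near-silence, so it never claims to be silent.
    public boolean render(float[] buffer) {
        kid.render(buffer);  // whether or not the kid made audio...
        // ... mix "buffer" with the internal delay line, apply feedback ...
        return true;         // ...the tail might still be ringing, so always true
    }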
If you are going to take any ideas away from my little UGen library,
take high-level ideas such as an example of how wavetables work, a DSP
graph structure, and short-cutting lazy traversal of the graph. Don't
worry too much about low-level details like the bit shifting stuff.
Either you are making a processor-bound app where you need every tiny
bit of performance you can get (in which case you should be back in C
using the NDK instead of my fluffy Java stuff), or you have plenty of
processor to spare and you should be optimizing for what sounds the
nicest, is the easiest to program, and the most fun to play with, and
let the fancy new Android just-in-time compiler save you from
yourself.
On a final design note, I've looked at the implementation of a few
other synth libraries and seen one other big design paradigm which
isn't represented in my library. My library is based on computing
whole "frames" at a time (256 sample chunks). This is nice and
efficient, but it makes for some ugliness in programming because every
single UGen has this same stupid loop all over the place. I started
writing an LPF, but I scrapped it because it just didn't feel pretty
enough to me. The alternative is to have a graph where the render()
operation only produces a single sample at a time (STK works this way,
for example). This makes many UGens very easy and clean to write, but
it means that you need to be running on hardware that can handle doing
a bunch of function calls per sample of output. At the time I made
Ethereal Dialpad (testing on a G1), this was not the case. However, I
expect that on current generation hardware and either native or jitted
code, this might be feasible. Dunno, someone should try it.
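For comparison, the per-sample style would look something like this
(hypothetical; nothing like it is in my library):

    // Hypothetical single-sample style, as in STK: each UGen produces one
    // sample per call instead of filling a whole frame.
    interface SampleUGen {
        float tick();  // compute and return the next output sample
    }

    // A gain stage becomes a one-liner: no frame loop in sight...
    class Gain implements SampleUGen {
        private final SampleUGen input;
        private final float gain;
        Gain(SampleUGen input, float gain) { this.input = input; this.gain = gain; }
        public float tick() { return input.tick() * gain; }
    }

    // ...but rendering a chunk now costs a virtual call (or several) per
    // sample: for (int i = 0; i < CHUNK_SIZE; i++) buffer[i] = graph.tick();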
Hope this helps! Make sure to share any big discoveries you make.
Adam