The Future of Game Audio



Jun 9, 2002, 8:16:07 AM
To me, it seems that the visual side has always been 'the focus' of
video game developers and that the audio just tagged along. As a result,
game audio is about 10 years behind game video in terms of realism and
computing power used. I mean, look at many modern first-person
shooters. The graphics are astoundingly gorgeous, but the sound is
rarely more than MIDI or CD music plus audio samples, with a few
surround-sound and amplitude effects thrown in for good measure. It
could be so much better...

When the Amiga machines came along with the ability to SAMPLE
real-world music and speech at acceptable quality, that was about
it for the audio development of computer games. Pretty soon computers
could play 44.1 kHz samples, which is CD quality, without wasting too
many CPU cycles. Sound development ground to a halt.

Graphics, on the other hand, have kept developing. Ever since 1992,
when the classic "Wolfenstein 3D" became one of the first games to
give us the high-framerate, texture-mapped 3D worlds that tons of
games now use, computer graphics have gotten better, and better, and
better. Now we buy special hardware graphics cards to generate all
these fantastic visuals, while any old bulk Sound Blaster 128 card
can easily handle the audio side of things. Where are the
corresponding developments in audio? Why did we focus so exclusively
on the visual side of things?

If you think that nothing more can be done about audio, I maintain
that a great deal can. A new generation of 3D games, using a
rendered 3D audio system, would be so much more immersive than the
same 3D graphics with just the regular, cardboard sound.

What do I mean by a "rendered 3D audio system"? Well, a game using
this system would, for instance, store acoustic surface qualities
along with the visual bitmaps. A steel plate has radically different
acoustic properties from a wool curtain. Any sound going off in such a
game would start out as a dry audio sample, but the engine would also
calculate the following things:

Where did the sound go off?
Where are your two ears?
Which surfaces in this 3D world would the sound waves reflect off
in order to reach each of your ears? (Calculated separately for each
ear, for a true stereo effect.)
How does the sound change when reflecting off these surfaces?
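To make the list above a bit more concrete, here is a minimal Python sketch of one common way to answer the last two questions: the first-order image-source method. Each surface is treated as an infinite flat plane, the sound source is mirrored across it, and the distance from that mirror image to each ear gives the delay and the distance loss of that reflection. The `Surface` class, the absorption values and the scene coordinates are illustrative assumptions, not taken from any real engine. The sketch also ignores occlusion, higher-order bounces, frequency-dependent absorption, the requirement that source and ear sit on the same side of a surface, and the check that the reflection point actually lands on the finite surface.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # metres per second, air at roughly 20 degrees C


@dataclass
class Surface:
    point: tuple       # any point (x, y, z) on the surface's plane
    normal: tuple      # unit normal vector of the plane
    absorption: float  # fraction of sound energy absorbed, 0.0 to 1.0 (illustrative)


def mirror(p, s):
    """Reflect point p across the plane of surface s (the 'image source')."""
    d = sum((pi - qi) * ni for pi, qi, ni in zip(p, s.point, s.normal))
    return tuple(pi - 2 * d * ni for pi, ni in zip(p, s.normal))


def paths_to_ear(source, ear, surfaces):
    """(delay_s, gain) for the direct path and each first-order reflection, earliest first.

    Gain uses 1/distance spreading loss; a reflection is further scaled by the
    pressure reflection factor sqrt(1 - absorption).
    """
    d = math.dist(source, ear)
    paths = [(d / SPEED_OF_SOUND, 1.0 / d)]
    for s in surfaces:
        d = math.dist(mirror(source, s), ear)
        paths.append((d / SPEED_OF_SOUND, math.sqrt(1.0 - s.absorption) / d))
    return sorted(paths)


# Hypothetical scene: a steel floor, a wool curtain, and two ears 20 cm apart.
floor = Surface((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), absorption=0.02)
curtain = Surface((0.0, 5.0, 0.0), (0.0, -1.0, 0.0), absorption=0.5)
source = (0.0, 3.0, 1.5)
for name, ear in (("left", (2.0, 1.9, 1.7)), ("right", (2.0, 2.1, 1.7))):
    for delay, gain in paths_to_ear(source, ear, [floor, curtain]):
        print(f"{name}: {delay * 1000:6.2f} ms  gain {gain:.3f}")
```

Running the calculation once per ear gives two slightly different lists of delays and gains, which is the per-ear difference the stereo effect depends on. Feeding each (delay, gain) pair into a delay line on the dry sample would produce the "rendered" result.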

This is a huge calculation, but hardly more demanding than real-time
fog, smoke, ripples, etc. A dedicated 3D audio card could easily handle
these calculations while the 3D video card and the CPU handled the
rest. The result would be a world where an explosion in a long, narrow
cave would sound completely different from the same explosion in a
metal tank, or in the open air. The realism given by these principles
would make any game not using this system seem plastic and ancient.
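A classic back-of-the-envelope way to see why a metal tank and a curtained room of the same size sound so different is Sabine's reverberation formula, RT60 = 0.161 V / A, where V is the room volume in cubic metres and A is the sum of each surface area times its absorption coefficient. The Python sketch below assumes a simple box-shaped room and uses illustrative absorption coefficients (0.02 for bare steel, 0.5 for heavy wool curtain) that are placeholders, not measured data. Sabine's formula assumes a diffuse sound field, so it is only a rough guide for a long, narrow cave, and it does not apply in the open air at all, where there is no enclosure to produce a reverberant tail.

```python
def sabine_rt60(volume_m3, surfaces):
    """Reverberation time (seconds for a 60 dB decay) by Sabine's formula.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("an enclosure with no absorption never decays")
    return 0.161 * volume_m3 / total_absorption


def box_rt60(width, depth, height, alpha):
    """RT60 of a box-shaped room whose walls, floor and ceiling share one coefficient."""
    volume = width * depth * height
    area = 2 * (width * depth + width * height + depth * height)
    return sabine_rt60(volume, [(area, alpha)])


# Same 3 m x 3 m x 3 m space, two hypothetical surface materials.
print(f"steel tank:     {box_rt60(3, 3, 3, 0.02):.2f} s")  # about 4 s
print(f"curtained room: {box_rt60(3, 3, 3, 0.5):.2f} s")   # about 0.16 s
```

A 25-fold difference in decay time for the same room shape is exactly the kind of cue that per-surface acoustic data would let an engine reproduce.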

Further down the line, I think we should consider the audio aspects of
an emerging video game genre, the massively multiplayer online RPG, such
as Ultima Online, EverQuest and the upcoming Star Wars Galaxies. In
this genre, audio is more important than ever for removing the last
obstacles that keep these games from becoming true virtual reality.

What is sorely needed here is solid, believable voice synthesis:
software and hardware to convert text into computer-generated speech.
We've all heard the robotic Stephen Hawking drone of 'modern' voice
synthesizers (which are basically the same as the ones that were
invented 20 years ago, back in ancient times, computer-wise). This is
clearly not good enough. It needs two things:

A more natural and correct delivery of words. This could be achieved
by simply entering the correct pronunciation of every word in the
dictionary (the phonetic spelling is already in there, so that's no big
job), plus some standard ways of pronouncing unfamiliar words. We
all have some idea how we would pronounce a word like "Skizmo" or
"Fnarx". Coding these basic pronunciation rules into a voice synth
would also improve it a whole lot.
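The two-step idea above, a dictionary lookup first and general rules for anything the dictionary lacks, is roughly how many text-to-speech front ends are organised. The Python sketch below shows the shape of it. The three-word lexicon, the ARPAbet-style phoneme symbols and the tiny letter-to-sound rule table are hypothetical stand-ins; a real synth would use a full pronouncing dictionary and far richer, context-sensitive rules.

```python
# Hypothetical mini-lexicon with ARPAbet-style phoneme symbols.
LEXICON = {
    "hey": ["HH", "EY"],
    "steal": ["S", "T", "IY", "L"],
    "car": ["K", "AA", "R"],
}

# Hypothetical, deliberately crude letter-to-sound rules: grapheme -> phonemes.
# Two-letter graphemes are tried before single letters.
RULES = {
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ng": ["NG"], "ee": ["IY"], "oo": ["UW"],
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["OW"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}


def letter_to_sound(word):
    """Greedy longest-match conversion of spelling to phonemes."""
    phones, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in RULES:
                phones.extend(RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule for this character (digit, punctuation): skip it
    return phones


def pronounce(word):
    """Dictionary pronunciation if known, otherwise fall back to the rules."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    return letter_to_sound(key)


for w in ("car", "Skizmo", "Fnarx"):
    print(w, " ".join(pronounce(w)))
```

Here "car" comes straight from the dictionary, while "Skizmo" and "Fnarx" fall through to the rules, which is the fallback behaviour the paragraph describes.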

A way of managing emphasis to avoid the monotony of current voice
synths. Luckily, a large portion of the world has been working on this
problem for the last 7 years or so! The problems of expressing
emphasis, humour, sarcasm and seriousness have been solved quite
satisfactorily by the countless chatters in internet chat rooms.
Nowadays, nobody really has trouble conveying a lot of hidden
information in text, using _emphasis_, laughter *LOL* or sarcasm :-P.
A host of smileys and other non-textual cues are already
hardcoded into gamers' brains, and it would not be difficult to find
an easy-to-remember consensus on how to write emphasis 'properly'.
A second-generation voice synth would be able to speak the two typed
sentences "Hey, did _you_ steal my car? *G* :-)" and "Hey, did you
_steal_ my _car_?? >:-(", each with the appropriate intonation and
emphasis.
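As a rough illustration of how little machinery the text-analysis side of this needs, here is a Python sketch that turns a typed chat line into words, stress marks and mood hints that a synthesizer could then act on. The smiley-to-mood table, the cue names for `*G*` and `*LOL*`, and the layout of the returned dictionary are hypothetical conventions invented for this example; the post only argues that such a consensus would be easy to reach. Actually producing the raised pitch or louder stress is left to the synthesizer.

```python
import re

# Hypothetical conventions; any agreed-upon table would do.
SMILEY_MOODS = {
    ":-)": "friendly", ":)": "friendly",
    ":-(": "sad", ":(": "sad",
    ">:-(": "angry",
    ":-P": "sarcastic", ";-P": "sarcastic",
}
ACTION_CUES = {"*G*": "grin", "*LOL*": "laugh"}

# Order matters: the angry smiley and markup are tried before plain words.
TOKEN = re.compile(r">:-\(|[:;]-?[()P]|\*\w+\*|_\w+_|[\w']+|[?!.,]+")


def parse_chat_line(line):
    """Split a typed chat line into words plus prosody hints for a voice synth."""
    words, cues = [], []
    mood, question, intensity = "neutral", False, 0
    for tok in TOKEN.findall(line):
        if tok in SMILEY_MOODS:
            mood = SMILEY_MOODS[tok]
        elif len(tok) > 2 and tok[0] == tok[-1] == "*":
            cues.append(ACTION_CUES.get(tok.upper(), tok.strip("*").lower()))
        elif len(tok) > 2 and tok[0] == tok[-1] == "_":
            words.append((tok.strip("_"), True))  # _word_ -> stressed
        elif tok[0] in "?!.,":
            question = question or "?" in tok
            # "??" or "!!" raises intensity; a single mark counts as 1
            intensity = max(intensity, tok.count("?") + tok.count("!"))
        else:
            words.append((tok, False))
    return {"words": words, "mood": mood, "cues": cues,
            "question": question, "intensity": intensity}


for line in ("Hey, did _you_ steal my car? *G* :-)",
             "Hey, did you _steal_ my _car_?? >:-("):
    print(parse_chat_line(line))
```

For the two example sentences, the first comes out with stress on "you", a friendly mood and a grin cue; the second stresses "steal" and "car", with an angry mood and doubled question intensity. That is enough information to pick different intonation contours for the same six words.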

Imagine this boon in a multiplayer game. You type your intonated
message, and your character in the game says it in his/her/its voice,
with the correct emphasis, to the others! Combining the voice emphasis
with appropriate body language would be no problem. No more reading
long texts on the screen. The whole gaming experience becomes more
immersive.

Combining 3D rendered audio with this type of voice synthesis would
add another dimension to such games. Typing
"<shout>Damn you!! >:-(" while standing on the edge of a virtual Grand
Canyon would sound a lot different from typing "<whisper>Dahm you.. ;-P"
inside a small closet.

Let's bring the audio in games up to speed with the video! I betcha
that one day, games with conventional audio will seem as mundane as
the old platform games of the '80s.

