By Bobby Prince
Presented at the CGDC
March 1996
What is sound?
You have heard the question "if a tree falls in a forest far from any
sound detector (human ear, microphone, etc.), does the tree's fall make any
noise?" If we define sound as waves that are carried by the air, the answer
would be yes -- wherever there are sound waves there is sound. What if we
define sound subjectively as a sensation in the ear? Then the tree falling
would make no sound if there is no ear to detect it. We could go even further
in the definition and say that sound is what a brain decodes using electrical
impulses from the ears. In this case, too, the tree makes no sound when there
are no ears to detect it. Since we are going to be relating sound to interactive
game play, let's consider sound to be whatever the game player's brain decodes.
It will be helpful for us to keep the following overall goal in mind:
Sound should communicate effectively what we want the listener to know
or experience -- it should focus the player's attention. In order to meet
this goal we must always approach sound from the player's viewpoint.
The Physics of Sound
Sound is produced by vibrations that disturb the air, causing pressure waves
that travel out in all directions from the source of the sound. When the
waves reach someone's ear, they set up vibrations that cause electrical signals
to be sent to the brain. These electrical signals are perceived as sound.
Sound is one of the first sensations an unborn child experiences. It brings
us our first lessons of life while we are still in the womb. We don't have
to learn to detect the vibrations of sound. We take it all for granted from
our earliest experiences.
Recording and Storing Sounds
In the early days of movie making it was discovered that the human eye is
fooled into seeing lifelike moving pictures if the individual frames pass
in front of the eye at 24 frames per second. Similarly, the audio community
experimented with sound to see how many snapshots of a sound must be taken
per second to fool the ear into hearing lifelike sounds from digital signals.
It was decided that 44,100 audio snapshots per second would do the same trick for the
ears as 24 frames per second does for the eyes. These audio snapshots are
called samples and they are the equivalent of a video frame in a movie.
The audio equivalent of the number of video frames per second is the
sampling rate. You have probably heard the term sample used for a
complete sound. It is also used to describe the smallest part of a digital
sound.
Very few people can hear anything below 16 Hz or above about 22 kHz
(thousand cycles per second). So, what sampling rate is required to record
sounds from 16 Hz to 22 kHz? This question is answered by the Nyquist formula,
which states that the sampling rate of a sound must be at least twice the
frequency of the highest sound to be sampled. For a 22 kHz ceiling that means
at least 44 kHz, which is why 44,100 samples per second is enough. You might
ask what will happen if you record something at too low a sampling rate,
disregarding the Nyquist formula. The answer is that frequencies above half
the sampling rate are not captured faithfully, and the average person would
not think that the sound was lifelike.
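To make the Nyquist rule concrete, here is a minimal Python sketch. The function names are made up for illustration and are not from any audio library. The second function shows the standard "folding" behavior of sampling, usually called aliasing (a term the text above does not use): a tone above half the sampling rate comes back as a lower, wrong tone.

```python
def min_sampling_rate(highest_hz: float) -> float:
    """Smallest sampling rate (Hz) that satisfies the Nyquist rule."""
    return 2.0 * highest_hz


def alias_frequency(tone_hz: float, sampling_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after being sampled.

    Tones at or below half the sampling rate come back unchanged;
    higher tones fold down to a lower frequency.
    """
    folded = float(tone_hz) % float(sampling_rate_hz)
    return min(folded, float(sampling_rate_hz) - folded)


print(min_sampling_rate(22_000))       # 44000.0
print(alias_frequency(3_000, 7_000))   # 3000.0 (below half the rate: unchanged)
print(alias_frequency(5_000, 7_000))   # 2000.0 (above half the rate: folded)
```

At a 7 kHz sampling rate, then, a 5 kHz component of the original sound would be heard as a 2 kHz component that was never there, which is one reason such a recording does not sound lifelike.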
How do we store the digital samples of a sound?
We take snapshots of the amplitudes of a sound at regular intervals and store
the values. We are merely storing values that represent the air pressure
of a vibration at any one point in time. So a digital sound file is merely
a method of storing air pressure! How much air pressure can be stored in
one byte? One byte (8 bits) can have a value from 0 to 255 decimal. For this
reason, 8-bit digital audio can store 256 discrete air pressure levels. 16
bit digital audio can store air pressure much more precisely, allowing values
from 0 to 65535, or 65536 discrete air pressure levels. As you can well imagine,
8 bit resolution leaves a lot to be desired. Original waveforms are "squared
off" as a result of using it. The sound is not very clear. But, maybe clarity
is not what you are after.
A good way to visualize the difference in 8 and 16 bit audio is to consider
a home stereo that has a volume knob with "detents" in it. You know the type
that clicks as you turn it? Imagine that you buy a tuner/amplifier that has
a knob with positions (clicks) from 0, or no volume, to 65535, or full volume.
The volume change from one click to an adjacent one would be indistinguishable.
Now take the same knob and give it positions (clicks) from 0, or no volume,
to 255, or full volume. Now you have only 256 discrete volume settings. You
can see how coarse your volume changes will be from click to click -- you
can hear each change. Now imagine those changes taking place thousands of
times a second. You can hear the roughness of 8 bit amplitude changes.
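The knob analogy can be tried out numerically. The sketch below is only an illustration under simple assumptions: it rounds one cycle of a full-scale sine wave to the nearest of 256 or 65536 evenly spaced levels and reports the largest rounding error. The names `quantize` and `worst_error` are hypothetical.

```python
import math


def quantize(value: float, bits: int) -> float:
    """Round a value in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)             # spacing between adjacent "clicks"
    index = round((value + 1.0) / step)   # 0 .. levels - 1, like the stored value
    return index * step - 1.0


def worst_error(bits: int, samples: int = 1000) -> float:
    """Largest rounding error over one cycle of a full-scale sine wave."""
    worst = 0.0
    for n in range(samples):
        x = math.sin(2.0 * math.pi * n / samples)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst


for bits in (8, 16):
    print(bits, "bit:", 2 ** bits, "levels, worst error", worst_error(bits))
```

The worst error can never exceed half a step, so each extra bit of resolution halves it. That is the difference between hearing every click of the knob and hearing none of them.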
Wolfenstein 3D
The sound effects for Wolfenstein were recorded directly into a 16-bit sampling
keyboard, using live actor voices, live Foley effects and effects recorded
on a cassette recorder. Foley effects are named after a man named Foley who
created a way to add sound effects live during post-production using a recording
studio. Foley effects such as footsteps, rain, thunder, doors opening, keys
rattling, etc. are recorded "live," but are not necessarily actual recordings
of "real" sounds. Thunder can be made by wiggling a large piece of metal.
Fire can be made by crushing paper or plastic. The imagination is the only
limit when it comes to Foley effects.
The sound driver for Wolf 3D was designed for playback at a 7k sampling rate and 8
bit resolution, so I had to start with high quality material for the voices.
That was the reason for recording directly into the sampler. The cassette
recorder was used when the microphone cable to the sampler was too short.
Digital Audio Tape Recorders (DAT's) were still relatively new and not yet
portable. The human voices were recorded using compression. Compressors bring
up the low level sounds and hold down the high level sounds -- this makes
the sound "meatier" and also helps to hide any noise. You hear compressors
at work every time you listen to an FM radio announcer. That's how they get
the boom in their voices. With the advent of digital recording equipment
and sound editing software, there is not a lot of need for compressors. It
can be done in software. The sound driver for Wolf 3D was capable of playing
only one digital sound at a time, so priority of sounds was very important.
Some of the weapon firing sounds were recorded at a shooting range. The
explosions began as the crashing of the lid on a "Dempsey Dumpster." Scott
Miller was the vocal talent for the word "Scheist!" and he could not talk
for several hours after doing 3-4 takes at full volume! The other vocal "talent"
was everyone with id Software at the time and me. I generally do not suggest
that local talent be used, but it worked well in this project.
The only software available at the time to move the effects from the sampler
to the computer was Sample Vision by Turtle Beach. Because it did not have
any digital effects editing capabilities, I used several outboard processors
on some of the final effects. Thank goodness we don't have to do that sort
of thing any longer.
Hexxagon and Argo Checkers
What do you do for sound effects in board games? I had the fun of discovering
just that when I worked on these two games. I went to the toy store and the
party supply store to buy every kind of noisemaker I could find -- whistles,
clickers, horns, and other leftovers from Halloween. They were each recorded
with as many variations as possible so there would be a "palette" of sounds
from which to choose. One of the game pieces had to spin and fall to the
playing surface. To make this sound, a large coin was spun on a coffee table.
The final effect was a combination of a slide whistle and the coin spin.
Wave for Windows was available at the time I was doing these sounds, and
it made mixing different sounds easy. It also had the potential to change
the timing of an effect without affecting its pitch. This came in handy for
getting the sounds to sync with the animations.
Doom
Early in the development of Doom, Tom Hall created what was called the Doom
Bible. It gave a lot of background information regarding the demons in the
game. It provided the information I needed to create raw material. The game
changed considerably between that time and the final version, but the raw
material was still appropriate when it came time to complete the final sound
effects. The raw material consisted of many animal sounds, explosions, weapon
sounds, etc. To fatten these sounds up, I used my own voice. The idea was
to make a similar sound on the same pitch and mix it in at a lower amplitude.
The final sounds were completed in the last month of development. The sound
driver for Doom was a major change from Wolf 3D in that more than one digital
sound could be played at once. This made it easier to set the priority of
play but it made it more difficult to get the relative volumes correct. There
were several classes of sounds in Doom. One was general active sounds that
were not attached to any one demon. These were more or less ambient sounds,
but they didn't play until demons close to the player "woke up" [usually
based upon the player making some noise in the area]. Then there were demon
active sounds that were attached to individual demons. These sounds let the
player know what class of demon was around the corner. Each type of demon
had a sight sound that played when the demon "saw" the player. There were
also attack, hurt and death sounds particular to each type of demon. Another
helpful thing about the sound driver was that the volume of sounds depended
upon the distance from the player to the source of the sound. This helped
keep the overall volume down during non-combat. It also helped scare
the pants off the player when a demon in a dark niche woke up and immediately
screamed his attack sound.
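The idea of volume depending on distance can be sketched in a few lines of Python. This is not the Doom driver's actual formula, which the text does not give. The linear falloff, the `near` and `far` limits and the function name are all assumptions chosen for illustration.

```python
import math


def distance_volume(listener, source, full_volume=1.0, near=1.0, far=20.0):
    """Scale a sound's volume by how far its source is from the listener.

    Sources closer than `near` play at full volume, sources beyond `far`
    are silent, and the volume falls off linearly in between.
    """
    distance = math.dist(listener, source)
    if distance <= near:
        return full_volume
    if distance >= far:
        return 0.0
    return full_volume * (far - distance) / (far - near)


print(distance_volume((0, 0), (0.5, 0)))    # 1.0  (right next to the player)
print(distance_volume((0, 0), (10.5, 0)))   # 0.5  (halfway through the falloff)
print(distance_volume((0, 0), (30, 0)))     # 0.0  (too far away to hear)
```

With a rule like this, distant demons stay quiet in the mix, while one that wakes up beside the player plays at full volume.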
I was at id Software for the month that the sound effects were finalized.
As I finished sounds, John Romero would plug them into the game so we could
test them. John was the evil voice at the end of Doom II. One evening, he
was playing the game with clipping off so he could walk through walls. Something
in the final level caught his attention. It was his head on a stake! The
artists had digitized John for rough artwork and as a joke put him in the
game. Since he has a great sense of humor and a good (wild) imagination,
John decided to put his own joke in the game. We recorded him saying "In
order to win the game, you must kill me, John Romero." After that, I put
heavy flanging and echo on his voice and then reversed the whole thing. It
was fun to wait for the artists to discover this joke. It was quite a while
before we told them what was being said.
The software that I depended upon most during the development of the Doom
sound effects was Wave for Windows. The most helpful thing to come along
between the two Dooms was a shareware program called Cool Edit. It added
significantly to the arsenal of digital effects and made it easy to mix sounds
from different files. I depended upon it heavily, and it is the reason that
my outboard effects equipment has since gathered a good bit of dust.
Because so many of the sounds from Doom were familiar, it was decided to
keep them in Doom II. Several additional sounds were required, though. The
Archvile is an evil healer. Anyone getting in his way is blasted with fire
and disintegrated. This includes other demons. But, after he has wrought
his destruction, he then goes around and reanimates all of the demons. Because
of this interesting dual personality, I decided to give him a very evil laugh
as an active sound. For his death sound, I recorded a young girl saying "why,"
pitch shifted it down and mixed it with other sounds. The Archvile just doesn't
understand why anyone would want to kill him as he sees himself as only doing
good for his fellow demon.
Duke Nukem 3D
Another great piece of software came to my attention after Cool Edit. Rob
Wallace had been praising Sound Forge since he first began using it. Because
I was comfortable with Cool Edit, I did not consider using Sound Forge. In
looking for new tools to complete the sound effects for the latest in the
Duke Nukem series, I tried it. What a beautiful program! It is much faster
than Cool Edit for most operations. The one thing that really made it handy
was a pop and click removal tool. I had some very old analog sound effects
that would be excellent for use in Duke, but they had included a loud hum
and some crackling every once in a while. The hum was easily removed using
both Cool Edit and Sound Forge. The crackles were readily removed with Sound
Forge.
The voice of Duke was recorded in California on DAT (Digital Audio Tape)
and sent to me. I then played the tape into a digital i/o board using either
Sound Forge or Cool Edit. The result was a wave file that I could edit to
my heart's content and send on to 3D Realms for a decision as to what was
to be used in the game.
How much space is required to store the different types of samples?
Audio CDs have stereo samples at 44.1 kHz, 16 bit resolution. That resolution
requires 2 bytes per sample. At 44,100 samples per second, each taking 2
bytes (16 bits) of precision, we need 88,200 bytes per second for storage.
Since stereo is two tracks we have to double that: 176,400 bytes per second.
Take that times 60 seconds per minute and you get 10,584,000 bytes per minute
-- one minute of stereo 44.1k 16 bit sound requires a little over 10 megabytes of
storage (uncompressed).
Storage Requirements for One Minute of Sound

Type:          | Mono   | Mono   | Stereo | Stereo
Resolution:    | 8 bit  | 16 bit | 8 bit  | 16 bit
Sampling Rate  |        |        |        |
44.1k          | 2646k  | 5292k  | 5292k  | 10584k
22.05k         | 1323k  | 2646k  | 2646k  | 5292k
11.025k        | 661.5k | 1323k  | 1323k  | 2646k
8k             | 480k   | 960k   | 960k   | 1920k
7k             | 420k   | 840k   | 840k   | 1680k
6k             | 360k   | 720k   | 720k   | 1440k
5k             | 300k   | 600k   | 600k   | 1200k

Figure 1
The table in Figure 1 indicates the different
storage requirements for different types, sample rates and resolutions of
one minute of digital sound. The decision one makes regarding these variables
should be based upon the balancing of sound quality and storage requirement.
If the sound effects we want to use have a large dynamic range (amplitudes
from very low to very high), we would probably want to use 16 bit resolution.
If all sounds are going to be about the same volume, 8 bit should suffice.
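The arithmetic behind Figure 1 is simple enough to sketch in Python. The function name `bytes_per_minute` is made up for this example, and "k" is taken to mean 1,000 bytes, which is what the figures in the table imply.

```python
def bytes_per_minute(sampling_rate_hz: float, bits: int, channels: int) -> float:
    """Uncompressed storage for one minute of sound, in bytes."""
    bytes_per_sample = bits // 8          # 8 bit -> 1 byte, 16 bit -> 2 bytes
    return sampling_rate_hz * bytes_per_sample * channels * 60


# Rebuild the rows of Figure 1: mono 8, mono 16, stereo 8, stereo 16.
for rate in (44_100, 22_050, 11_025, 8_000, 7_000, 6_000, 5_000):
    row = [bytes_per_minute(rate, bits, channels) / 1000
           for channels in (1, 2) for bits in (8, 16)]
    print(f"{rate / 1000:g}k", row)
```

Changing the sampling rate, the resolution or the number of channels each scales the total in direct proportion, so halving any one of them halves the storage.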
The Psychology Of Sound
We are assailed by sound constantly. It is sometimes enough to completely
disorient us. Do you turn the car radio down when you are trying to concentrate
on street names or when you stop to look at a map? Sounds can be very
distracting. This is something we have to keep in mind before we start throwing
myriad sfx into our projects. Our brains know how distracting sound can be.
To handle the distraction, the brain focuses attention on the more important
sounds. It blocks us from consciously hearing competing sounds. In a computer
game our mind can get very confused. The brain gets befuddled and doesn't
know what sounds to focus on. This is where proper game sound design comes
in. As in the movies, we decide for the game player's brain what it
is to focus on by making one sound predominant at a time. Careful planning
will ensure that this predominant sound is the most important one for the
game player to hear.
What sounds do we hear each day? There are the sounds we remember well: voices,
sirens, explosions, favorite songs, and the like. There are also the sounds
that were there but never caught our attention: a breeze rustling leaves,
an airplane far off in the distance, general traffic noise, the sounds of
animals, the sound of our computer fans, children playing, elevator music,
someone making a presentation during a boring meeting, the sound of our boss
droning on and on ;), etc. We generally do not notice these sounds -- that
is, until they stop. That we would notice quickly. Complete silence is neither
normal nor realistic.
This brings us to two important aspects of sound in computer games. Consider
both of these as generalities. First, we should have one predominant
sound. That sound can be one single sound or a cacophony of other sounds
(multiple explosions, screams, weapons fire, etc.) -- either one serves to
focus the player's attention. Second, we want to have the normal background
sounds of life going on in our games. Normal here means normal for the
environment we have created within the game. This is ambient sound that is
not ordinarily used to focus attention -- just to make the gaming experience
more lifelike.
Ambient sounds can be divided into two categories: the sounds that are constant
in an environment and the sounds that occur on a random basis. Listening
to my environment at this very moment (02/10/96 14:26), I hear: two computer
fans on different pitches (constant), insects (constant--I live in Florida),
a powerboat passing by (random), a small airplane overhead (random), a lawn
mower far in the distance (random), my wife working in the other room (random,
but she says "constant"), my typing on the computer keyboard (random), and
a car passing by very slowly (random -- I said that I live in
Florida!). There is no one sound that my brain is having to focus on,
thankfully, as I need to concentrate on what I am typing. Until I consciously
paid attention to the sounds around me, I did not even know they existed.
If I were immersed in a computer game and these same things were going on,
I'd not know what to focus on based upon sound cues alone. This is not the
best situation. It breaks the general rule of having one sound keep the player
focused on the game. This does not mean that the focus sound must be invasive,
intruding, loud or any other obvious attention-getting factor. We could focus
attention by playing random sounds more often than in real life (maybe to
keep a fear factor alive, or in a children's game to make the child wonder
what the heck is going on in the distance and how do I get to that
action). We could also focus attention by bringing up the volume of a
constant sound (remember how in movie swamp scenes the insect noises get
very loud to let us know the terrible circumstances our hero is in?). Another
"movie" method of focusing a viewer's attention works well in computer games
too. You could have things get very, very, very quiet and then BLAM
something exciting happens. Whatever method is used, remember that we are
trying to do the sound work the brain handles in everyday life -- we are
trying to focus attention. Of course, there are computer games where sound
should always be ambient and never draw attention to itself or anything else.
Examples would be board games like chess and checkers.
Questions to ask when deciding what sounds to use in a project:
1. What does the subject matter of the project suggest in the way of sound?
Come up with some adjectives that describe the project and look for sound
effects that bring these adjectives to mind.
2. How much space is available for digital sound? If space is not limited,
think in terms of high quality, but remember that lower sampling rates can
be used to make effects sound more gritty, distant or muffled. Sometimes
that is desirable. Don't be afraid to mix effects at different sampling rates
and resolutions if your sound drivers support them.
3. How many sounds can be played at one time (based upon sound driver or
hardware limitations)? If the number is limited, you will have to decide
which effects will receive priority of play.
4. What is the dynamic range of all of the sounds to be used? If it is great,
use 16 bit resolution. Otherwise, 8 bit resolution will suffice.
5. What is the range of the sampling rates to be used? Using higher sampling
rates with 8 bit resolution can increase the apparent volume of background
hiss. Since 8 bit sounds have more hiss and since hiss is generally higher
pitched, a higher sampling rate will increase the volume of those higher
pitched sounds (hissing included). Don't be afraid to experiment with changing
the resolution of noisy sounds to see if the noise can be reduced.
6. Is there going to be intelligible speech? In general, it requires a higher
sampling rate. A female voice will usually require a higher sampling rate
than a male voice. For "full bodied" speech, use 16 bit resolution.
7. Are the effects or voice-overs to be recorded professionally or are they
to be recorded with less than professional equipment? Using less than
professional equipment usually means more noise.
8. Are the sound effects going to be processed? If so, what types of processing
will be used?
9. What are the relative volumes of the different digital sounds
(music/voice-over/Foley)? Decide which sounds should have priority in volume.
10. Where do you want sound to emphasize the action in your project? Remember
that your goal is to focus the listener's attention.
11. Are ambient sounds needed? Do you ever want complete silence? Remember
that ambient sounds can mask noise.
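Question 3 above, about priority of play, can be illustrated with a toy model. The sketch below is not any real sound driver's interface. The `Mixer` class, its method names and the priority numbers are all hypothetical, and a real driver would also have to notice when sounds finish playing.

```python
from dataclasses import dataclass, field


@dataclass
class Mixer:
    """Toy model of a driver that can play only `max_channels` sounds at once."""
    max_channels: int
    playing: list = field(default_factory=list)   # (priority, name) pairs

    def request(self, name: str, priority: int) -> bool:
        """Try to start a sound; higher priority numbers win.

        Returns True if the sound starts, False if it is dropped.
        """
        if len(self.playing) < self.max_channels:
            self.playing.append((priority, name))
            return True
        lowest = min(self.playing)        # least important sound now playing
        if priority > lowest[0]:
            self.playing.remove(lowest)   # cut it off to make room
            self.playing.append((priority, name))
            return True
        return False


mixer = Mixer(max_channels=1)          # one sound at a time
print(mixer.request("door", 1))        # True
print(mixer.request("footstep", 1))    # False: no more important than the door
print(mixer.request("gunshot", 5))     # True: replaces the door
```

With only one channel, every new sound either interrupts the current one or is lost, so the priority numbers end up deciding what the player hears.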
After answering these questions, where does one start in deciding which
sfx to use in a project?
A good place to start without having to reinvent the wheel is the movies.
State of the art movie sound is years ahead of computer games, so it would
pay to take lessons from Hollywood in this respect. Of course we must keep
in mind the linearity of movies as opposed to the "random access" of a computer
game. In looking to Hollywood for lessons, I decided to make a list of all
of the movies that have won Academy Awards for sound and/or sound effects.
The list in Appendix A is the result. I have watched and listened to many of these movies
and it is fascinating to see how Hollywood of years ago had many of the same
technical problems that we in computer gaming face today. Early Hollywood
had a major advantage -- they controlled the theaters and the equipment contained
in them. Another advantage that Hollywood has always had with sound is that
they control the whole movie experience and know what is going to happen
next. We can do that with cinematics, but it is a challenge in truly interactive
gameplay.
What do award winning sound effects in movies have in common?
1. They focus the viewer's attention.
2. They are bigger than life.
3. The sound effects and the music work together to focus the viewer's attention.
4. There is rarely complete silence. Some background (ambient) sound is going
on almost all of the time. Otherwise, the viewer will be distracted by some
sounds outside of the movie. How many times in a completely silent part of
a movie have you been annoyed by someone talking? It is very annoying
because it causes a loss of focus. This is not to say that silence cannot
be used in a computer game. But remember that there is always the drone of
the cooling fan(s) and the buzz of a sound card.
5. They do not "get in the face" of the dialog.
6. They do not "get in the face" of one another. Usually one effect takes
precedence over all of the others.
7. They prepare us for what is to come.
8. They set us up for what is to come.
9. They distract us from what is to come.
10. They take the place of the senses that we cannot experience in a movie (touch,
taste, and smell).
- We know that the cook has touched something hot when we hear his anguished
cry of pain along with the sizzle of flesh. We "feel" the pain with him.
- We can smell the roses along with the beautiful princess when we hear her
take a deep breath while a dainty, sparkling sound plays.
- We can taste the bitter poison along with the murder victim as we hear him
gag and froth at the mouth while a discordant sound effect plays.
11. They help place the listener in another "reality."
General rules for better sound effects:
1. Start with the absolute best raw materials --
samples/recordings/actors/sounds/etc.
2. Start with the highest quality digital data. Record sound effects into
a portable Digital Audio Tape (DAT) recorder at 44.1k 16 bit. This leaves
nothing out of the recording and there is no tape hiss like you get on an
analog tape machine. Make sure that you record the hottest signal you can
without pegging the record meter. This will keep the "intelligent data" at
a high enough amplitude to cover up much of the noise present.
3. Use a high quality stereo microphone for Foley effects. Use a superior
quality mono microphone for speech.
4. With the possible exception of compression on vocals, do not use outboard
effects during recording. What is compression? It is a reduction in the dynamic
range of a sound. It can help to hide noise in a sample. Digital effects,
including compression, can be added via software.
5. Use a digital sound interface card to transfer the data from the DAT to
computer via fiber optic or coaxial cables. All wave editing software will
make this a simple matter.
6. Edit the sound file before downsampling or converting to 8 bit
format. This keeps the noise out of the sample for as long as possible. It
is acceptable sometimes to convert a file before applying some type of digital
processing. The results can be interesting when the noise is used to make
a sound less realistic.
7. The experienced sound designer does not take sound for granted and realizes
that the sounds we hear every day are pretty wimpy. There is little expectation
of getting usable sound effects by recording "real" sounds. Instead, record
similar, but greatly exaggerated, sounds.
8. If you cannot help recording extraneous background noises, make sure to
get a good sample of them for noise reduction when you get back to the digital
editing software. Also, there will often be noises that you fail to hear
at the time of the recording because you are concentrating so hard on getting
the sound effect. You will want a recording of these by themselves also.
9. But never depend on noise reduction algorithms/equipment if you can avoid it.
10. There is often some type of background noise that will become objectionable
when a 16 bit file is reduced to an 8 bit file.
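Rule 4 describes compression as a reduction in the dynamic range of a sound. Here is a minimal Python sketch of that idea, assuming samples scaled to the range -1.0 to 1.0. The threshold and ratio values are arbitrary illustrations, and a real compressor also smooths its gain changes over time, which this sketch leaves out.

```python
def compress(sample: float, threshold: float = 0.5, ratio: float = 4.0) -> float:
    """Hold down the part of a sample's level that exceeds `threshold`.

    Levels below the threshold pass through unchanged; above it, every
    `ratio` units of input become one unit of output.
    """
    level = abs(sample)
    if level <= threshold:
        return sample
    squeezed = threshold + (level - threshold) / ratio
    return squeezed if sample > 0 else -squeezed


def compress_and_normalize(samples, threshold=0.5, ratio=4.0):
    """Compress, then add make-up gain so the loudest peak returns to 1.0."""
    squeezed = [compress(s, threshold, ratio) for s in samples]
    peak = max(abs(s) for s in squeezed)
    if peak == 0.0:
        return squeezed
    return [s / peak for s in squeezed]


quiet_and_loud = [0.1, -0.1, 1.0, -1.0]
print(compress_and_normalize(quiet_and_loud))
```

In this example the loud peaks end up where they started, at full scale, while the quiet samples rise from 0.1 to about 0.16. The low-level sounds are brought up relative to the high-level ones.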
And Speaking Of Noise --
What is noise? Instead of answering this subjective question, let's ask one
that can be answered more objectively. What is silence? A scientific definition
is that silence is the absence of rapid changes in air pressure. Gordon Hempton,
an Emmy-winning recordist, has searched the world over for quiet. He defines
a quiet place as "a place where, for a period of time, there is no human
intrusion. No chain saws. No trail bikes. No distant trucks. A quiet place
is a place where we're able to hear the world as our ancestors heard it."
Hempton feels lucky if he can record twenty minutes of noise-free sound,
patched together from a week's worth of work. As a listener and a sound
recordist, Hempton says that he has no control over the performance [noise
included], but he does have control over where the audience will sit.
"So that's exactly what I do -- I find the best seats possible in nature's
amphitheater."
Like it or not, even though you may prepare very carefully you will still
face the problem of noisy raw materials for your sound effects. Because this
is a universal problem, it would be good to discuss handling the problem.
Let's say that you have done everything to reduce noise when recording the
raw material but when you get down to using it you hear noise. The problem
gets worse when you reduce the sample resolution to 8 bits. So, what can
you do? First of all, as was said above, always record a few seconds of live
mike with nothing but ambient noise. It is best to start the recorder, let
it run a few seconds, record the sound effect, and then let the tape run
for a few more seconds of ambiance -- all of this being one recording without
turning the recorder off. These seconds of "quiet" will be very useful as
you will see. Remember that it is very important to keep your recording volume
maximized. When you load the digital waveform into your digital editor, find
a selection in the sample where there is only "quiet" (noise). Since you
will always record a few seconds of ambiance, this is no problem. Next, have
the editing software analyze the selection. Then use the noise reduction
algorithm on the total sample. You will probably have to work at this a while
to get the feel for the proper settings, but the time you spend doing so
will be well spent. Cool Edit and Sound Forge have excellent noise reduction
algorithms.
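The workflow above (measure the "quiet" selection, then process the whole sample) can be sketched with a much cruder technique than the noise reduction in the editors named here. The following Python example is a simple noise gate, not the algorithm used by Cool Edit or Sound Forge. The function names, block size and margin are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gate_noise(samples, quiet_selection, block=64, margin=2.0):
    """Silence every block whose level is close to the measured noise floor.

    `quiet_selection` is the stretch of recorded ambiance; `margin` says how
    far above that floor a block must be to count as real signal.
    """
    floor = rms(quiet_selection)
    cleaned = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        if rms(chunk) <= floor * margin:
            cleaned.extend([0.0] * len(chunk))   # treat as noise only
        else:
            cleaned.extend(chunk)                # keep the sound effect
    return cleaned


ambiance = [0.01, -0.01] * 32                    # a few "seconds" of quiet
effect = [0.5, -0.5] * 32                        # the sound effect itself
recording = ambiance + effect + ambiance
cleaned = gate_noise(recording, ambiance)
print(cleaned[:4], cleaned[64:68])
```

A gate only removes noise between sounds, not noise underneath them, which is one more reason to record the cleanest raw material you can.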
What digital editing software is available?
A non-exhaustive list of software is in Appendix B.
Where can one purchase "raw sound effects" that have already been recorded?
Most game development companies license a sound effect library on CD. The
library is generally purchased as a buyout, meaning that no other fees have
to be paid to use the effects in a product. This license is good for the
life of the CD's, and you get to use the effects as many times as you wish
for as long as you own the CD's. There are differences in the sound quality
of many of the libraries -- some use true man-made sound effects while others
depend upon electronic sounds. The important things to look for in a library
are variety of sounds, quality of sounds, and (probably most importantly)
a complete index with cross references. If you cannot find the sounds on
a CD set without resorting to browsing, you are no better off than if you
had no CD's at all. The ideal situation is to have the index and cross reference
in a database that will allow you to search for key words. Track names are
too limited in most cases, so the ability to search the descriptions of the
effects on a CD becomes a requirement if you want to hear everything that
resembles what you are looking for. Additionally, a database would allow
you to add comments so that you can make specific notes about each effect.
Many companies in the sound effect CD business offer a demo CD. This should
be your first step in researching what is available. Upon purchase of a library,
most companies will offer a 30-day satisfaction guarantee or your money back.
Ask about this if the offer is not advertised.
Appendix C lists several companies offering buyout effects CD's.
Why not buy the sound effects CD's offered in audio CD stores? You have to
watch out regarding the licensing agreement on these. Some state on the cover
that they are "royalty free," but the fine print states "royalty free for
personal use." Read the agreement carefully.
Who can design good sound effects?
Almost anyone. Some of the common characteristics of designers of
good sound effects: ability to hear the "sound effect" potential in sounds
that others overlook; above normal ability to hear pitch; ability to visualize
what would happen to a sound if it is altered by a sound editor without actually
having to perform the alteration; ability to visualize a sound from a description
of the source of the sound; ability to create a sound that is only heard
in one's head to begin with; knowledge of proper recording technique and
equipment; fondness for gadgets; patience; good sense of humor; good looks
;).