Sound and Music

Tricks and Techniques for Sound Effect Design

 
By Bobby Prince
Presented at the Computer Game Developers Conference (CGDC)
March 1996

What is sound?

You have heard the question "if a tree falls in a forest far from any sound detector (human ear, microphone, etc.), does the tree's fall make any noise?" If we define sound as waves that are carried by the air, the answer would be yes -- wherever there are sound waves there is sound. What if we define sound subjectively as a sensation in the ear? Then the tree falling would make no sound if there is no ear to detect it. We could go even further in the definition and say that sound is what a brain decodes using electrical impulses from the ears. In this case, too, the tree makes no sound when there are no ears to detect it. Since we are going to be relating sound to interactive game play, let's consider sound to be whatever the game player's brain decodes. It will be helpful for us to keep the following overall goal in mind: Sound should communicate effectively what we want the listener to know or experience -- it should focus the player's attention. In order to meet this goal we must always approach sound from the player's viewpoint.

The Physics of Sound

Sound is produced by vibrations that disturb the air, causing pressure waves that travel out in all directions from the source of the sound. When the waves reach someone's ear, they set up vibrations that cause electrical signals to be sent to the brain. These electrical signals are perceived as sound. Sound is one of the first sensations an unborn child experiences. It brings us our first lessons of life while we are still in the womb. We don't have to learn to detect the vibrations of sound. We take it all for granted from our earliest experiences.

Recording and Storing Sounds

In the early days of movie making it was discovered that the human eye is fooled into seeing lifelike moving pictures if the individual frames pass in front of the eye at 24 frames per second. Similarly, the audio community experimented with sound to see how many snapshots of a sound must be taken per second to fool the ear into hearing lifelike sounds from digital signals. It was decided that 44,100 audio snapshots per second would do the same trick for the ears as 24 frames per second does for the eyes. These audio snapshots are called samples and they are the equivalent of a video frame in a movie. The audio equivalent of the number of video frames per second is the sampling rate. You have probably heard the term sample used for a complete sound. It is also used to describe the smallest part of a digital sound.

Very few people can hear frequencies below about 16 Hz or above about 22 kHz (22 thousand cycles per second). So, what sampling rate is required to record sounds from 16 Hz to 22 kHz? This question is answered by the Nyquist formula, which states that the sampling rate must be at least twice the frequency of the highest sound to be sampled. For 22 kHz that means at least 44 kHz, which is why 44,100 samples per second is enough. You might ask what will happen if you record something at too low a sampling rate, disregarding the Nyquist formula. The answer is that the frequencies above half the sampling rate are not captured correctly, and the average person would not think that the sound was lifelike.
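
The Nyquist rule is easy to check with a little arithmetic. The sketch below is only an illustration in Python: the function names are made up for this example, and it assumes a pure tone recorded with no filtering ahead of the sampler. It shows the minimum sampling rate for a given top frequency, and the wrong, lower frequency (an alias) that comes out when a tone is higher than half the sampling rate.

```python
def min_sampling_rate(highest_hz):
    """Lowest sampling rate (Hz) the Nyquist rule allows for this frequency."""
    return 2 * highest_hz


def alias_frequency(tone_hz, sampling_rate):
    """Frequency (Hz) at which a pure tone comes back after sampling.

    Tones at or below half the sampling rate come back unchanged. Higher
    tones "fold" down to a lower, wrong frequency.
    """
    folded = tone_hz % sampling_rate
    return min(folded, sampling_rate - folded)


print(min_sampling_rate(22_000))        # 44000
print(alias_frequency(5_000, 11_025))   # 5000 (below half the rate: unchanged)
print(alias_frequency(9_000, 11_025))   # 2025 (above half the rate: folded)
```

A 9 kHz tone recorded at 11.025k plays back as a 2,025 Hz tone, which is one reason a sound recorded at too low a rate does not seem lifelike.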

How do we store the digital samples of a sound?

We take snapshots of the amplitudes of a sound at regular intervals and store the values. We are merely storing values that represent the air pressure of a vibration at any one point in time. So a digital sound file is merely a method of storing air pressure! How much air pressure can be stored in one byte? One byte (8 bits) can have a value from 0 to 255 decimal. For this reason, 8-bit digital audio can store 256 discrete air pressure levels. 16 bit digital audio can store air pressure much more precisely, allowing values from 0 to 65535, or 65536 discrete air pressure levels. As you can well imagine, 8 bit resolution leaves a lot to be desired. Original waveforms are "squared off" as a result of using it. The sound is not very clear. But, maybe clarity is not what you are after.

A good way to visualize the difference between 8 and 16 bit audio is to consider a home stereo that has a volume knob with "detents" in it. You know the type that clicks as you turn it? Imagine that you buy a tuner/amplifier that has a knob with positions (clicks) from 0, or no volume, to 65535, or full volume. The volume change from one click to an adjacent one would be indistinguishable. Now take the same knob and give it positions (clicks) from 0, or no volume, to 255, or full volume. Now you have only 256 discrete volume settings. You can see how coarse your volume changes will be from click to click -- you can hear each change. Now imagine those changes taking place thousands of times a second. You can hear the roughness of 8 bit amplitude changes.
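
The "clicks on the knob" idea can be tried out in a few lines. This is a rough sketch, not a description of any particular sound card or file format: it assumes air pressure is expressed as a value from -1.0 to 1.0, and the function names are invented for the example. It stores one cycle of a sine wave at a given resolution and reports how far off the stored values are.

```python
import math


def quantize(value, bits):
    """Map a pressure value in [-1.0, 1.0] to the nearest of 2**bits levels.

    Returns the stored integer, from 0 up to 2**bits - 1.
    """
    top = 2 ** bits - 1
    return round((value + 1.0) / 2.0 * top)


def restore(level, bits):
    """Turn a stored level back into a pressure value in [-1.0, 1.0]."""
    top = 2 ** bits - 1
    return level / top * 2.0 - 1.0


def worst_error(bits, points=1000):
    """Largest round-trip error over one cycle of a sine wave."""
    worst = 0.0
    for i in range(points):
        value = math.sin(2 * math.pi * i / points)
        error = abs(restore(quantize(value, bits), bits) - value)
        worst = max(worst, error)
    return worst


print(2 ** 8, 2 ** 16)                     # 256 65536
print(worst_error(8) > worst_error(16))    # True
```

The 8 bit error is a couple of hundred times larger than the 16 bit error, which is the roughness described above.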

Wolfenstein 3D

The sound effects for Wolfenstein were recorded directly into a 16-bit sampling keyboard, using live actor voices, live Foley effects and effects recorded on a cassette recorder. Foley effects are named after Jack Foley, who created a way to add sound effects live during post-production in a recording studio. Foley effects such as footsteps, rain, thunder, doors opening, keys rattling, etc. are recorded "live," but are not necessarily actual recordings of "real" sounds. Thunder can be made by wiggling a large piece of metal. Fire can be made by crushing paper or plastic. The imagination is the only limit when it comes to Foley effects.

The sound driver for Wolf 3D was designed for playback of samples at a 7k sampling rate and 8 bit resolution, so I had to start with high quality material for the voices. That was the reason for recording directly into the sampler. The cassette recorder was used when the microphone cable to the sampler was too short. Digital Audio Tape recorders (DATs) were still relatively new and not yet portable. The human voices were recorded using compression. Compressors bring up the low level sounds and hold down the high level sounds -- this makes the sound "meatier" and also helps to hide any noise. You hear compressors at work every time you listen to an FM radio announcer. That's how they get the boom in their voices. With the advent of digital recording equipment and sound editing software, there is not much need for hardware compressors, because compression can be done in software. The sound driver for Wolf 3D was capable of playing only one digital sound at a time, so priority of sounds was very important.
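
The compressor behavior described above (bring up the quiet sounds, hold down the loud ones) can be sketched in software. This is a bare-bones illustration, not the circuit in any real compressor and not the settings used for these recordings: it works on one sample at a time with no attack or release smoothing, assumes samples in the range -1.0 to 1.0, and the threshold and ratio values are arbitrary.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Reduce the dynamic range of samples in the range [-1.0, 1.0].

    Levels above `threshold` rise only 1/ratio as fast as the input (holding
    down loud sounds). Make-up gain then scales everything so full scale
    stays at full scale, which brings up the quiet sounds.
    """
    # Output level produced by a full-scale input before make-up gain.
    ceiling = threshold + (1.0 - threshold) / ratio
    makeup = 1.0 / ceiling
    result = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        sign = -1.0 if s < 0 else 1.0
        result.append(sign * level * makeup)
    return result


quiet, loud = compress([0.1, 1.0])
print(round(quiet, 3), round(loud, 3))   # 0.16 1.0
```

Before compression the loud sample is 10 times the quiet one; afterward it is 6.25 times. The quiet material (and the voice's body) sits closer to the peaks.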

Some of the weapon firing sounds were recorded at a shooting range. The explosions began as the crashing of the lid on a "Dempster Dumpster." Scott Miller was the vocal talent for the word "Scheist!" and he could not talk for several hours after doing 3-4 takes at full volume! The other vocal "talent" was everyone with id Software at the time and me. I generally do not suggest that local talent be used, but it worked well in this project.

The only software available at the time to move the effects from the sampler to the computer was Sample Vision by Turtle Beach. Because it did not have any digital effects editing capabilities, I used several outboard processors on some of the final effects. Thank goodness we don't have to do that sort of thing any longer.

Hexxagon and Argo Checkers

What do you do for sound effects in board games? I had the fun of discovering just that when I worked on these two games. I went to the toy store and the party supply store to buy every kind of noisemaker I could find -- whistles, clickers, horns, and other leftovers from Halloween. They were each recorded with as many variations as possible so there would be a "palette" of sounds from which to choose. One of the game pieces had to spin and fall to the playing surface. To make this sound, a large coin was spun on a coffee table. The final effect was a combination of a slide whistle and the coin spin. Wave for Windows was available at the time I was doing these sounds, and it made mixing different sounds easy. It also had the ability to change the timing of an effect without affecting its pitch. This came in handy for getting the sounds to sync with the animations.

Doom

Early in the development of Doom, Tom Hall created what was called the Doom Bible. It gave a lot of background information regarding the demons in the game. It provided the information I needed to create raw material. The game changed considerably between that time and the final version, but the raw material was still appropriate when it came time to complete the final sound effects. The raw material consisted of many animal sounds, explosions, weapon sounds, etc. To fatten these sounds up, I used my own voice. The idea was to make a similar sound on the same pitch and mix it in at a lower amplitude. The final sounds were completed in the last month of development.

The sound driver for Doom was a major change from Wolf 3D in that more than one digital sound could be played at once. This made it easier to set the priority of play but it made it more difficult to get the relative volumes correct.

There were several classes of sounds in Doom. One was general active sounds that were not attached to any one demon. These were more or less ambient sounds, but they didn't play until demons close to the player "woke up" [usually based upon the player making some noise in the area]. Then there were demon active sounds that were attached to individual demons. These sounds let the player know what class of demon was around the corner. Each type of demon had a sight sound that played when the demon "saw" the player. There were also attack, hurt and death sounds particular to each type of demon.

Another helpful thing about the sound driver was that the volume of sounds depended upon the distance from the player to the source of the sound. This helped keep the overall volume down during non-combat. It also helped scare the pants off the player when a demon in a dark niche woke up and immediately screamed his attack sound.
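
Distance-based volume of the kind just described can be sketched as follows. How the Doom driver actually computed volume is not covered here, so everything specific in this example is an assumption made for illustration: the function name, the two radii, the 2D positions and the straight-line falloff between the radii.

```python
import math


def volume_for_distance(source, listener,
                        full_volume_radius=200.0, silent_radius=1200.0):
    """Return a volume from 0.0 to 1.0 based on the 2D distance.

    Inside `full_volume_radius` the sound plays at full volume, beyond
    `silent_radius` it is silent, and in between it fades in a straight line.
    Both radii are made-up numbers.
    """
    distance = math.dist(source, listener)
    if distance <= full_volume_radius:
        return 1.0
    if distance >= silent_radius:
        return 0.0
    return (silent_radius - distance) / (silent_radius - full_volume_radius)


print(volume_for_distance((0, 0), (100, 0)))    # 1.0
print(volume_for_distance((0, 0), (700, 0)))    # 0.5
print(volume_for_distance((0, 0), (5000, 0)))   # 0.0
```

With a rule like this, a demon that wakes up right next to the player plays at full volume, while distant activity stays in the background.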

I was at id Software for the month that the sound effects were finalized. As I finished sounds, John Romero would plug them into the game so we could test them. John was the evil voice at the end of Doom II. One evening, he was playing the game with clipping off so he could walk through walls. Something in the final level caught his attention. It was his head on a stake! The artists had digitized John for rough artwork and as a joke put him in the game. Since he has a great sense of humor and a good (wild) imagination, John decided to put his own joke in the game. We recorded him saying "In order to win the game, you must kill me, John Romero." After that, I put heavy flanging and echo on his voice and then reversed the whole thing. It was fun to wait for the artists to discover this joke. It was quite a while before we told them what was being said.

The software that I depended upon most during the development of the Doom sound effects was Wave for Windows. The most helpful thing to come along between the two Dooms was a shareware program called Cool Edit. It added significantly to the arsenal of digital effects and made it easy to mix sounds from different files. I depended upon it heavily, and it is the reason that my outboard effects equipment has since gathered a good bit of dust.

Because so many of the sounds from Doom were familiar, it was decided to keep them in Doom II. Several additional sounds were required, though. The Archvile is an evil healer. Anyone getting in his way is blasted with fire and disintegrated. This includes other demons. But, after he has wrought his destruction, he then goes around and reanimates all of the demons. Because of this interesting dual personality, I decided to give him a very evil laugh as an active sound. For his death sound, I recorded a young girl saying "why," pitch shifted it down and mixed it with other sounds. The Archvile just doesn't understand why anyone would want to kill him as he sees himself as only doing good for his fellow demon.

Duke Nukem 3D

Another great piece of software came to my attention after Cool Edit. Rob Wallace had been praising Sound Forge since he first began using it. Because I was comfortable with Cool Edit, I did not consider using Sound Forge. In looking for new tools to complete the sound effects for the latest in the Duke Nukem series, I tried it. What a beautiful program! It is much faster than Cool Edit for most operations. The one thing that really made it handy was a pop and click removal tool. I had some very old analog sound effects that would be excellent for use in Duke, but they included a loud hum and some crackling every once in a while. The hum was easily removed using either Cool Edit or Sound Forge. The crackles were readily removed with Sound Forge.

The voice of Duke was recorded in California on DAT (Digital Audio Tape) and sent to me. I then played the tape into a digital I/O board using either Sound Forge or Cool Edit. The result was a wave file that I could edit to my heart's content and send on to 3D Realms for a decision as to what was to be used in the game.

How much space is required to store the different types of samples?

Audio CDs have stereo samples at 44.1 kHz, 16 bit resolution. That resolution requires 2 bytes per sample. At 44,100 samples per second, each taking 2 bytes (16 bits) of precision, we need 88,200 bytes per second for storage. Since stereo is two tracks we have to double that: 176,400 bytes per second. Multiply that by 60 seconds per minute and you get 10,584,000 bytes per minute -- one minute of stereo 44.1k 16 bit sound requires a little over 10 megabytes of storage (uncompressed).


Storage Requirements for One Minute of Sound

Type:            Mono      Mono      Stereo    Stereo
Resolution:      8 bit     16 bit    8 bit     16 bit
Sampling Rate
44.1k            2646k     5292k     5292k     10584k
22.05k           1323k     2646k     2646k     5292k
11.025k          661.5k    1323k     1323k     2646k
8k               480k      960k      960k      1920k
7k               420k      840k      840k      1680k
6k               360k      720k      720k      1440k
5k               300k      600k      600k      1200k

Figure 1

The table in Figure 1 indicates the different storage requirements for different types, sample rates and resolutions of one minute of digital sound. The decision one makes regarding these variables should be based upon the balancing of sound quality and storage requirement. If the sound effects we want to use have a large dynamic range (amplitudes from very low to very high), we would probably want to use 16 bit resolution. If all sounds are going to be about the same volume, 8 bit should suffice.
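
The arithmetic behind Figure 1 fits in one small function. The sketch below is just a convenience for trying other combinations; the function name is made up, and it assumes uncompressed sound with a whole number of bytes per sample.

```python
def bytes_per_minute(sampling_rate, bits, channels):
    """Uncompressed storage for one minute of sound, in bytes."""
    bytes_per_sample = bits // 8
    return sampling_rate * bytes_per_sample * channels * 60


print(bytes_per_minute(44_100, 16, 2))   # 10584000 (stereo, 16 bit)
print(bytes_per_minute(11_025, 8, 1))    # 661500   (mono, 8 bit)
print(bytes_per_minute(7_000, 8, 1))     # 420000   (mono, 8 bit)
```

Dividing each result by 1,000 gives the "k" figures in the table.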

The Psychology Of Sound

We are assailed by sound constantly. It is sometimes enough to completely disorient us. Do you turn the car radio down when you are trying to concentrate on street names or when you stop to look at a map? Sounds can be very distracting. This is something we have to keep in mind before we start throwing myriad sfx into our projects. Our brains know how distracting sound can be. To handle the distraction, the brain focuses attention on the more important sounds. It blocks us from consciously hearing competing sounds. In a computer game our mind can get very confused. The brain gets befuddled and doesn't know what sounds to focus on. This is where proper game sound design comes in. As in the movies, we decide for the game player's brain what it is to focus on by making one sound predominant at a time. Careful planning will ensure that this predominant sound is the most important one for the game player to hear.

What sounds do we hear each day? There are the sounds we remember well: voices, sirens, explosions, favorite songs, and the like. There are also the sounds that were there but never caught our attention: a breeze rustling leaves, an airplane far off in the distance, general traffic noise, the sounds of animals, the sound of our computer fans, children playing, elevator music, someone making a presentation during a boring meeting, the sound of our boss droning on and on ;), etc. We generally do not notice these sounds, that is until they stop. That we would notice quickly. Complete silence is neither normal nor realistic.

This brings us to two important aspects of sound in computer games. Consider both of these as generalities. First, we should have one predominant sound. That sound can be one single sound or a cacophony of other sounds (multiple explosions, screams, weapons fire, etc.) -- either one serves to focus the player's attention. Second, we want to have the normal background sounds of life going on in our games. Normal here means normal for the environment we have created within the game. This is ambient sound that is not ordinarily used to focus attention -- just to make the gaming experience more lifelike.

Ambient sounds can be divided into two categories: the sounds that are constant in an environment and the sounds that occur on a random basis. Listening to my environment at this very moment (02/10/96 14:26), I hear: two computer fans on different pitches (constant), insects (constant--I live in Florida), a powerboat passing by (random), a small airplane overhead (random), a lawn mower far in the distance (random), my wife working in the other room (random, but she says "constant"), my typing on the computer keyboard (random), and a car passing by very slowly (random -- I said that I live in Florida!). There is no one sound that my brain is having to focus on, thankfully, as I need to concentrate on what I am typing. Until I consciously paid attention to the sounds around me, I did not even know they existed.

If I were immersed in a computer game and these same things were going on, I'd not know what to focus on based upon sound cues alone. This is not the best situation. It breaks the general rule of having one sound keep the player focused on the game. This does not mean that the focus sound must be invasive, intruding, loud or any other obvious attention-getting factor. We could focus attention by playing random sounds more often than in real life (maybe to keep a fear factor alive, or in a children's game to make the child wonder what the heck is going on in the distance and how do I get to that action). We could also focus attention by bringing up the volume of a constant sound (remember how in movie swamp scenes the insect noises get very loud to let us know the terrible circumstances our hero is in?). Another "movie" method of focusing a viewer's attention works well in computer games too. You could have things get very, very, very quiet and then BLAM something exciting happens. Whatever method is used, remember that we are trying to do the sound work the brain handles in everyday life -- we are trying to focus attention. Of course, there are computer games where sound should always be ambient and never draw attention to itself or anything else. Examples would be board games like chess and checkers.
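
The split between constant and random ambient sounds suggests a simple way to drive them in a game: loop the constant sounds, and schedule the random ones at unpredictable moments. The sketch below covers only the scheduling half and is one possible approach, not a description of any shipped game. The function name, the sound names and the use of exponentially distributed gaps are all assumptions for the example.

```python
import random


def schedule_random_sounds(names, duration, mean_gap, seed=None):
    """Return a list of (time_in_seconds, name) one-shot events, in time order.

    Gaps between events are drawn from an exponential distribution, so a
    smaller `mean_gap` makes the random sounds occur more often.
    """
    rng = random.Random(seed)
    events = []
    when = rng.expovariate(1.0 / mean_gap)
    while when < duration:
        events.append((when, rng.choice(names)))
        when += rng.expovariate(1.0 / mean_gap)
    return events


# One minute of ambience, with a random sound about every 10 seconds.
for when, name in schedule_random_sounds(
        ["powerboat", "airplane", "lawn mower"], 60.0, 10.0, seed=1996):
    print(f"{when:5.1f}s  {name}")
```

Lowering `mean_gap` is one way to play random sounds "more often than in real life" when the scene calls for it.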

Questions to ask when deciding what sounds to use in a project:

1. What does the subject matter of the project suggest in the way of sound? Come up with some adjectives that describe the project and look for sound effects that bring these adjectives to mind.

2. How much space is available for digital sound? If space is not limited, think in terms of high quality, but remember that lower sampling rates can be used to make effects sound more gritty, distant or muffled. Sometimes that is desirable. Don't be afraid to mix effects at different sampling rates and resolutions if your sound drivers support them.

3. How many sounds can be played at one time (based upon sound driver or hardware limitations)? If the number is limited, you will have to decide which effects will receive priority of play.

4. What is the dynamic range of all of the sounds to be used? If it is great, use 16 bit resolution. Otherwise, 8 bit resolution will suffice.

5. What is the range of the sampling rates to be used? Using higher sampling rates with 8 bit resolution can increase the apparent volume of background hiss. Since 8 bit sounds have more hiss and since hiss is generally higher pitched, a higher sampling rate preserves more of those higher pitched sounds (hiss included), making the hiss more noticeable. Don't be afraid to experiment with changing the resolution of noisy sounds to see if the noise can be reduced.

6. Is there going to be intelligible speech? In general, it requires a higher sampling rate. A female voice will usually require a higher sampling rate than a male voice. For "full bodied" speech, use 16 bit resolution.

7. Are the effects or voice-overs to be recorded professionally or are they to be recorded with less than professional equipment? Using less than professional equipment usually means more noise.

8. Are the sound effects going to be processed? If so, what types of processing will be used?

9. What are the relative volumes of the different digital sounds (music/voice-over/Foley)? Decide which sounds should have priority in volume.

10. Where do you want sound to emphasize the action in your project? Remember that your goal is to focus the listener's attention.

11. Are ambient sounds needed? Do you ever want complete silence? Remember that ambient sounds can mask noise.
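
Question 3 deserves a concrete picture. When only one sound can play at a time, every request to play a sound has to be weighed against whatever is already playing. The sketch below is a hypothetical illustration, not the Wolf 3D driver: the class, the sound names and the priority numbers are invented, and letting an equal-priority sound interrupt is an arbitrary choice.

```python
class SingleChannelPlayer:
    """Plays one sound at a time; a higher priority sound may interrupt."""

    def __init__(self):
        self.current = None          # (name, priority), or None when idle

    def request(self, name, priority):
        """Try to start a sound. Returns True if it will play."""
        if self.current is not None and priority < self.current[1]:
            return False             # something more important is playing
        self.current = (name, priority)
        return True

    def finished(self):
        """Call when the current sound reaches its end."""
        self.current = None


player = SingleChannelPlayer()
print(player.request("door opening", 1))   # True  (channel was idle)
print(player.request("guard shout", 5))    # True  (interrupts the door)
print(player.request("footstep", 0))       # False (shout is more important)
player.finished()
print(player.request("footstep", 0))       # True  (channel is idle again)
```

Deciding the priority numbers is the real design work: they determine which sound gets the player's attention when several things happen at once.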

After answering these questions, where does one start in deciding which sfx to use in a project?

A good place to start without having to reinvent the wheel is the movies. State of the art movie sound is years ahead of computer games, so it would pay to take lessons from Hollywood in this respect. Of course we must keep in mind the linearity of movies as opposed to the "random access" of a computer game. In looking to Hollywood for lessons, I decided to make a list of all of the movies that have won Academy Awards for sound and/or sound effects. The list in Appendix A is the result. I have watched and listened to many of these movies and it is fascinating to see how Hollywood of years ago had many of the same technical problems that we in computer gaming face today. Early Hollywood had a major advantage -- they controlled the theaters and the equipment contained in them. Another advantage that Hollywood has always had with sound is that they control the whole movie experience and know what is going to happen next. We can do that with cinematics, but it is a challenge in truly interactive gameplay.

What do award winning sound effects in movies have in common?

1. They focus the viewer's attention.

2. They are bigger than life.

3. The sound effects and the music work together to focus the viewer's attention.

4. There is rarely complete silence. Some background (ambient) sound is going on almost all of the time. Otherwise, the viewer will be distracted by some sounds outside of the movie. How many times in a completely silent part of a movie have you been annoyed by someone talking? It is very annoying because it causes a loss of focus. This is not to say that silence cannot be used in a computer game. But remember that there is always the drone of the cooling fan(s) and the buzz of a sound card.

5. They do not "get in the face" of the dialog.

6. They do not "get in the face" of one another. Usually one effect takes precedence over all of the others.

7. They prepare us for what is to come.

8. They set us up for what is to come.

9. They distract us from what is to come.

10. They take the place of the senses that we cannot experience in a movie (touch, taste, and smell).

  • We know that the cook has touched something hot when we hear his anguished cry of pain along with the sizzle of flesh. We "feel" the pain with him.
  • We can smell the roses along with the beautiful princess when we hear her take a deep breath while a dainty, sparkling sound plays.
  • We can taste the bitter poison along with the murder victim as we hear him gag and froth at the mouth while a discordant sound effect plays.

11. They help place the listener in another "reality."

General rules for better sound effects:

1. Start with the absolute best raw materials -- samples/recordings/actors/sounds/etc.

2. Start with the highest quality digital data. Record sound effects into a portable Digital Audio Tape (DAT) recorder at 44.1k 16 bit. This leaves nothing out of the recording and there is no tape hiss like you get on an analog tape machine. Make sure that you record the hottest signal you can without pegging the record meter. This will keep the "intelligent data" at a high enough amplitude to cover up much of the noise present.

3. Use a high quality stereo microphone for Foley effects. Use a superior quality mono microphone for speech.

4. With the possible exception of compression on vocals, do not use outboard effects during recording. What is compression? It is a reduction in the dynamic range of a sound. It can help to hide noise in a sample. Digital effects, including compression, can be added via software.

5. Use a digital sound interface card to transfer the data from the DAT to computer via fiber optic or coaxial cables. All wave editing software will make this a simple matter.

6. Edit the sound file before downsampling or converting to 8 bit format. This keeps the noise out of the sample for as long as possible. It is acceptable sometimes to convert a file before applying some type of digital processing. The results can be interesting when the noise is used to make a sound less realistic.

7. The experienced sound designer does not take sound for granted and realizes that the sounds we hear every day are pretty wimpy. There is little expectation of getting usable sound effects by recording "real" sounds. Instead, record similar, but greatly exaggerated, sounds.

8. If you cannot help recording extraneous background noises, make sure to get a good sample of them for noise reduction when you get back to the digital editing software. Also, there will often be noises that you fail to hear at the time of the recording because you are concentrating so hard on getting the sound effect. You will want a recording of these by themselves also.

9. But do not depend on noise reduction algorithms or equipment if you can avoid it.

10. There is often some type of background noise that will become objectionable when a 16 bit file is reduced to an 8 bit file.
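
Rules 2, 6 and 10 come down to the same point: an 8 bit file has very little room, so the signal needs to fill it. The sketch below shows why. It is an illustration only, with invented function names. It assumes signed 16 bit samples going in and unsigned 8 bit samples coming out (the layout 8 bit WAV files use), and the conversion simply keeps the top 8 bits with no dithering.

```python
def normalize_16bit(samples, peak=32767):
    """Scale signed 16 bit samples so the loudest one just reaches `peak`."""
    loudest = max((abs(s) for s in samples), default=0)
    if loudest == 0:
        return list(samples)         # pure silence: nothing to scale
    return [round(s * peak / loudest) for s in samples]


def to_8bit(samples):
    """Convert signed 16 bit samples (-32768..32767) to unsigned 8 bit (0..255).

    Keeps the top 8 bits of each sample and shifts the zero point to 128.
    """
    return [(s >> 8) + 128 for s in samples]


quiet_recording = [0, 100, -100, 50]
print(to_8bit(quiet_recording))                    # [128, 128, 127, 128]
print(to_8bit(normalize_16bit(quiet_recording)))   # [128, 255, 0, 192]
```

The quiet recording all but disappears at 8 bits, while the full-level version keeps its shape. Note that boosting the level in software also boosts any noise that was recorded, so this is no substitute for recording a hot signal in the first place.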

And Speaking Of Noise --

What is noise? Instead of answering this subjective question, let's ask one that can be answered more objectively. What is silence? A scientific definition is that silence is the absence of rapid changes in air pressure. Gordon Hempton, an Emmy-winning recordist, has searched the world over for quiet. He defines a quiet place as "a place where, for a period of time, there is no human intrusion. No chain saws. No trail bikes. No distant trucks. A quiet place is a place where we're able to hear the world as our ancestors heard it." Hempton feels lucky if he can record twenty minutes of noise-free sound, patched together from a week's worth of work. As a listener and a sound recordist, Hempton says that he has no control over the performance [noise included], but he does have control over where the audience will sit. "So that's exactly what I do -- I find the best seats possible in nature's amphitheater."

Like it or not, even though you may prepare very carefully you will still face the problem of noisy raw materials for your sound effects. Because this is a universal problem, it would be good to discuss handling the problem. Let's say that you have done everything to reduce noise when recording the raw material but when you get down to using it you hear noise. The problem gets worse when you reduce the sample resolution to 8 bits. So, what can you do? First of all, as was said above, always record a few seconds of live mike with nothing but ambient noise. It is best to start the recorder, let it run a few seconds, record the sound effect, and then let the tape run for a few more seconds of ambiance -- all of this being one recording without turning the recorder off. These seconds of "quiet" will be very useful as you will see. Remember that it is very important to keep your recording volume maximized. When you load the digital waveform into your digital editor, find a selection in the sample where there is only "quiet" (noise). Since you will always record a few seconds of ambiance, this is no problem. Next, have the editing software analyze the selection. Then use the noise reduction algorithm on the total sample. You will probably have to work at this a while to get the feel for the proper settings, but the time you spend doing so will be well spent. Cool Edit and Sound Forge have excellent noise reduction algorithms.
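
The "analyze a quiet selection, then process the whole sample" workflow can be imitated with a very crude noise gate. To be clear, this is not the algorithm in Cool Edit or Sound Forge, whose internals are not described here. It is a much simpler stand-in that only silences the stretches where nothing but noise is present, and the function names, block size and margin are assumptions for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gate_noise(samples, noise_selection, block=256, margin=2.0):
    """Silence every block whose level is close to the measured noise level.

    `noise_selection` is the stretch of "quiet" recorded before or after the
    effect. `block` and `margin` are settings to experiment with.
    """
    floor = rms(noise_selection) * margin
    cleaned = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        if rms(chunk) <= floor:
            chunk = [0] * len(chunk)
        cleaned.extend(chunk)
    return cleaned


ambience = [3, -3] * 128            # the few seconds of "quiet"
effect = [1000, -1000] * 128        # the sound effect itself
cleaned = gate_noise(ambience + effect + ambience, ambience)
print(cleaned[:4], cleaned[256:260], cleaned[-4:])
```

A gate like this cannot remove noise that sits underneath the effect itself, which is another reason to keep the recording volume maximized.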

What digital editing software is available?

A non-exhaustive list of software is in Appendix B.

Where can one purchase "raw sound effects" that have already been recorded?

Most game development companies license a sound effect library on CD. The library is generally purchased as a buyout, meaning that no other fees have to be paid to use the effects in a product. This license is good for the life of the CD's, and you get to use the effects as many times as you wish for as long as you own the CD's. There are differences in the sound quality of many of the libraries -- some use true man-made sound effects while others depend upon electronic sounds. The important things to look for in a library are variety of sounds, quality of sounds, and (probably most importantly) a complete index with cross references. If you cannot find the sounds on a CD set without resorting to browsing, you are no better off than if you had no CD's at all. The ideal situation is to have the index and cross reference in a database that will allow you to search for key words. Track names are too limited in most cases, so the ability to search the descriptions of the effects on a CD becomes a requirement if you want to hear everything that resembles what you are looking for. Additionally, a database would allow you to add comments so that you can make specific notes about each effect.
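
A searchable index does not need to be elaborate. The sketch below is a toy version of the idea: the entries, field names and function name are all hypothetical, and a real library index would live in a database rather than a list. It shows key word searching across both the description and your own notes.

```python
def search_effects(library, *keywords):
    """Return entries whose description or notes contain every key word."""
    wanted = [k.lower() for k in keywords]
    matches = []
    for entry in library:
        text = (entry["description"] + " " + entry.get("notes", "")).lower()
        if all(k in text for k in wanted):
            matches.append(entry)
    return matches


# Hypothetical index entries, one per CD track.
library = [
    {"disc": 1, "track": 4, "description": "Heavy wooden door slams shut"},
    {"disc": 2, "track": 9, "description": "Metal door creaks open slowly",
     "notes": "good for dungeon gate"},
    {"disc": 3, "track": 1, "description": "Thunder, distant rumble"},
]

for hit in search_effects(library, "door"):
    print(hit["disc"], hit["track"], hit["description"])
```

Searching for "dungeon" finds the second entry through its note even though the word never appears in the track description.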

Many companies in the sound effect CD business offer a demo CD. This should be your first step in researching what is available. Upon purchase of a library, most companies will offer a 30-day satisfaction guarantee or your money back. Ask about this if the offer is not advertised. Appendix C lists several companies offering buyout effects CD's.

Why not buy the sound effects CD's offered in audio CD stores? You have to watch out regarding the licensing agreement on these. Some state on the cover that they are "royalty free," but the fine print states "royalty free for personal use." Read the agreement carefully.

Who can design good sound effects?

Almost anyone. Some of the common characteristics of designers of good sound effects: ability to hear the "sound effect" potential in sounds that others overlook; above normal ability to hear pitch; ability to visualize what would happen to a sound if it is altered by a sound editor without actually having to perform the alteration; ability to visualize a sound from a description of the source of the sound; ability to create a sound that is only heard in one's head to begin with; knowledge of proper recording technique and equipment; fondness for gadgets; patience; good sense of humor; good looks ;).