Avoiding a DirectSound3D Disaster

By Rich Warwick
Published in Game Developer Magazine, January 1998

3D audio can have a tremendous effect on a gamer's experience. Unfortunately, if a game doesn't use the DirectSound3D API correctly, the results can range from little or no 3D audio positioning to more serious problems, such as accidentally overloading the system. While the methods for implementing 3D positional audio via DirectSound3D can be found in various books as well as in Microsoft's own documentation, knowing how not to write to this API is just as critical to a successful implementation in your game.

Defining the Vocabulary

Before diving into how and how not to use 3D audio, let's clear up some of the confusion associated with the technology. Often the terms "3D audio," "positional audio," "spatialization," "virtualization," and "stereo expansion" are used interchangeably. In reality, 3D positional audio, virtualization, and spatialization are three different concepts, and their differences must be understood before they can be properly applied to games or applications.

Spatialization, sometimes called stereo expansion, uses signal processing to expand the perceived location of speakers. It is a nonlocalizing effect, meaning that it doesn't localize a sound or a channel to a specific location. In fact, it does the opposite. Spatialization disperses the perceived location of the sound so that the listener can no longer determine the exact location of the speakers. It makes the listener believe that the sound is coming from an area that is much wider than the actual speakers. The general perception of spatialization is that it makes a stereo stream sound much richer to the average listener.
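
One simple way to picture stereo expansion is mid/side processing: split the stereo pair into a mid (sum) signal and a side (difference) signal, boost the side, and recombine. The sketch below only illustrates that idea; it is not the algorithm used by any particular sound card or driver, and the function name `widen_stereo` and its `width` parameter are hypothetical.

```python
def widen_stereo(left, right, width=1.5):
    """Return (left, right) sample lists with the side signal scaled by `width`.

    width = 1.0 leaves the signal unchanged, width > 1.0 widens the image,
    and width = 0.0 collapses it to mono. Hypothetical helper for illustration.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0    # what the two channels share
        side = (l - r) / 2.0   # what differs between the channels
        side *= width          # exaggerate the difference to widen the image
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right
```

Note that content common to both channels passes through untouched; only the differences between left and right are exaggerated. A production effect would also have to guard against clipping, which this sketch ignores.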

Virtualization uses signal processing to fool the listener into thinking that there are other speakers present that aren't really there. For example, a system could use only two front speakers or headphones and virtualize rear or surround sound speakers for a home theater effect. Virtualization localizes a specific channel of audio (such as left rear) to a specific location, as opposed to localizing a specific sound to an exact location (which is what occurs in 3D positional audio). Virtualization is only used when the source media has more than two channels of audio, so its usefulness for positioning interactive sound sources (such as those found in action games) is limited. Examples of multichannel source media are Dolby Pro Logic or Dolby Digital audio streams. In this article, when I refer to 3D audio, I mean 3D positional audio.

3D positional audio uses signal processing to localize a single sound to a specific location in three-dimensional space around the listener. 3D positional audio is the most common effect used in interactive games, because a sound effect, such as the sound of an opponent's automobile, can be localized to a specific position. This position, for instance, could be behind the listener and quickly moving around the left side while all the other sounds are positioned separately.

3D positional audio is also referred to as HRTF-based 3D audio. HRTF stands for "head-related transfer function," a method by which sounds are processed to localize them in space around the player. Although this technique is acceptable for 3D positioning, it requires a large amount of processing power. This is the reason 3D audio hardware accelerators are becoming so common in PCs. For an explanation of the mechanics of 3D audio and HRTF processing, see "Exploiting Surround Sound using DirectSound3D" in the December/January 1997 issue of Game Developer.

Applications for Spatialization. Spatialization is an effect for processing music, such as an audio CD or a stereo music soundtrack in a game. This effect is especially useful for PC speakers that are built into the monitor because it makes the sound appear to come from a much wider sound field than the actual speakers, which in this case are very close together.

Care must be taken never to apply this effect to an audio stream that has already been processed by a 3D positional algorithm, because spatialization alters the phase of the signal and can destroy the 3D positional effect. However, such conflicts are the concern of the audio system designers, not the developer of the game or application.

Applications for Virtualization. Virtualization is an effect that gives the listener the impression of a home theater environment even when only two speakers or headphones are present, which is typically the case with multimedia PCs. However, to use this effect, multichannel audio must be available, and the sounds to be played back on the virtualized rear speakers must be encoded onto those tracks during production. This makes this solution less than ideal for the action portion of games, in which sounds might have to jump from the front speakers to the rear (depending on the player's actions), but cannot due to prior encoding on a specific channel. However, multichannel audio and the virtualization of these channels are very effective for noninteractive game intro scenes.

Virtualization is typically used to play back Dolby AC-3 or Dolby Pro Logic audio streams on a system that only has two speakers. For example, DVD movies can have a 5.1 channel Dolby AC-3 audio track encoded on the DVD. (The 5.1 channels are actually left front, right front, center, left rear, right rear, and a subwoofer. The subwoofer is referred to as the ".1" channel of the 5.1.) 3D positional audio techniques are used to position a front center channel as well as right and left rear channels at their virtual locations. Virtualization simulates the additional speakers that aren't typically present on a computer. Virtualizing rear speakers is an example of using 3D positional audio for a noninteractive application, because the virtual speaker locations aren't moving or responding to the listener.
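
To make the idea of fixed virtual speakers concrete, the sketch below places each virtualized channel at a static position around the listener. The angles are assumptions borrowed from the common ITU-R BS.775 home theater layout (center straight ahead, surrounds at roughly 110 degrees to either side), and the names `VIRTUAL_SPEAKER_AZIMUTHS` and `speaker_position` are hypothetical. A real virtualizer would feed positions like these to its 3D positional filters.

```python
import math

# Azimuth in degrees, measured clockwise from straight ahead (assumed layout).
VIRTUAL_SPEAKER_AZIMUTHS = {
    "center": 0.0,
    "left_rear": -110.0,
    "right_rear": 110.0,
}

def speaker_position(channel, distance=1.0):
    """Return (x, z) for a virtual speaker: +x is right, +z is straight ahead."""
    azimuth = math.radians(VIRTUAL_SPEAKER_AZIMUTHS[channel])
    return (distance * math.sin(azimuth), distance * math.cos(azimuth))
```

Because these positions never change at run time, the channel-to-location mapping can be a constant table, which is exactly what distinguishes virtualization from interactive positioning.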

Applications for 3D Positional Audio. One of the reasons that 3D positional audio is so popular in action games is that it can be interactive. Sounds don't have to be preprocessed during the game's development to position them. As the listener changes location in a virtual world, all the sound objects can maintain their correct location, speed, and path of motion around the listener as the action unfolds.
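
The interactive part amounts to recomputing each sound's position relative to the listener whenever either one moves. Here is a minimal 2D sketch, assuming a listener described by a position and a heading angle. The function name `relative_azimuth_and_distance` and the coordinate convention are assumptions made for illustration and are not part of DirectSound3D.

```python
import math

def relative_azimuth_and_distance(listener_pos, listener_heading_deg, source_pos):
    """Return (azimuth_deg, distance) of a source as heard by the listener.

    Positions are (x, z) pairs with +z as "north"; heading is measured in
    degrees clockwise from +z. Azimuth 0 is straight ahead, positive is to
    the listener's right, and the result lies in [-180, 180).
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dz)
    bearing = math.degrees(math.atan2(dx, dz))  # world-space direction to the source
    azimuth = (bearing - listener_heading_deg + 180.0) % 360.0 - 180.0
    return azimuth, distance
```

With the listener at the origin facing north, a source at `(5, 0)` is heard about 90 degrees to the right. Once the listener turns to a heading of 90 degrees, the same source is straight ahead, even though the sound itself was never reprocessed offline.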

This is different from applications that encode the audio into a certain channel (such as a rear or surround channel) during development. Encoding the location of a sound into a particular channel of a multichannel stream is an example of noninteractive audio placement. Multichannel audio is typically used in an environment where the listener doesn't have control over the sounds - such as when you watch a movie in a home theater.

Implementing 3D Audio Today

The use of 3D audio in PC games initially wasn't widespread, mostly due to poor 3D audio support in the Microsoft DirectSound3D 3.0 API. The first iteration of this API didn't allow specialized 3D audio hardware to process the 3D streams, and as a result, game developers couldn't be certain that 3D sound objects would sound satisfactory and have the desired effect - even if a world-class 3D audio accelerator was present in the PC. So there wasn't much incentive for game developers to incorporate 3D audio into their games early last year. However, when DirectSound3D 5 started shipping in August 1997, the situation changed (see Table 1). That API supports specialized 3D audio accelerators for processing 3D audio streams.
