Producing Interactive Audio

Creating an Adaptive Audio Engine
 
Armed with the desire for truly interactive audio, we at Crystal Dynamics set out to create our own sound driver for the Sony PlayStation and, perhaps later, for Windows 95. From my notions of interactivity and my set of rules for interactive audio, we derived a number of design goals for our driver.

EMPHASIZE MIDI OVER DIGITAL AUDIO. For most of our products, MIDI and custom DLS-like instruments are a better way to go than Red Book or streaming digital audio. Rules 1 and 5 have some implications for Red Book audio. Red Book audio sounds great, but our programmers know how to get the most out of the CD-ROM drive for game play (fortunately for Crystal Dynamics), so the drive isn't always available for audio. Furthermore, creating interactive sound designs using streamed, branching digital audio segments is limited in many ways, primarily by disk space, seek times, bus bandwidth, and buffer sizes. Red Book, or any other kind of linear, streamed digital audio, requires a lot of storage space in any situation; at 44.1kHz, 16 bits, and two channels, uncompressed CD audio consumes about 176K per second, or roughly 10MB per minute. It becomes even more problematic in an adaptive setting, where each variation of a given section of music has to be remixed and stored uncompressed. Finally, most consoles (including the PlayStation) save money by using inexpensive (read: slow and not very reliable) CD-ROM drives, so the constant seeking and playing of digital audio tracks is likely to tax the drive to the point of failure. Red Book audio should therefore be reserved for noninteractive sections of the game, such as title screens.

On the other hand, MIDI is small, compact, and easily modifiable on the fly. The PlayStation has its own dedicated sound RAM, and all of the data needed for a level can be loaded in less than a second. Once loaded, the data is out of the way and the CD-ROM returns to its other duties. Furthermore, the PlayStation contains a pretty good sampler. Respectable music and sound effects were created for the Super Nintendo as well, but that platform suffered from limited sound RAM. Fortunately, the PlayStation has almost ten times as much sound RAM, much better sound interpolation (an on-the-fly sample rate conversion technique used to stretch a sample up or down the keyboard from its native pitch), and superior DSP (used for reverb and the like). In my opinion as a confirmed curmudgeon, anyone who says that they can't make high-quality MIDI music on the PlayStation under these conditions is just whining about the amount of work involved in such an endeavor.
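To make the interpolation idea concrete, here is a minimal sketch in C of how a sampler stretches one recording across the keyboard. The names are hypothetical (this is not the PlayStation's actual sound API): the playback step comes from the distance in semitones between the target note and the sample's root note, and output samples are linearly interpolated between adjacent source samples.

#include <math.h>

/* Sketch only: hypothetical names, not the PlayStation's real sound API. */

/* Fractional read step for playing 'note' from a sample recorded at
   'rootNote'. Each semitone up multiplies the playback rate by 2^(1/12). */
double PlaybackStep(int note, int rootNote)
{
    return pow(2.0, (note - rootNote) / 12.0);
}

/* Read one output sample at fractional position 'pos' by linear
   interpolation. Caller must keep pos below the last source index. */
short ReadInterpolated(const short *src, double pos)
{
    int    i    = (int)pos;      /* integer sample index   */
    double frac = pos - i;       /* fractional part, 0..1  */
    return (short)(src[i] + frac * (src[i + 1] - src[i]));
}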

KEEP SOUND DRIVERS EFFICIENT. When we considered replacing the existing sound driver with our own technology, we decided that our code would need to be faster, smaller, easier to implement, and more capable than the code we would be replacing. Otherwise, the project was not worth undertaking. Rule 1 dictates that sound drivers must be small and fast, since both system RAM (where the driver resides) and CPU time are scarce commodities in a fully rendered 3D world. Ease of use matters, too: programmer and game designer time is also in short supply, so making basic implementation easier leaves these folks more time to get the world ready for your interactive audio.

There should also be a simple, consistent means for your game to communicate relevant information about itself to the sound driver. Adding interactive sound capabilities requires programmers and designers to spend more time communicating information to the sound driver about the state of the world and the characters within it. At Crystal Dynamics, we tried to remedy this situation by communicating the state of the world to the sound driver in the form of a set of simple, numerically registered variables. Most often, we use values from 0 to 127 so that they can be set from standard 7-bit MIDI controllers. Thus, the number of enemies alive on the screen might be represented as one 7-bit variable. Your distance from the level's exit might be stored in another. We have tried to use these same variables throughout the game so that they only need to be coded once.
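As a rough illustration (the names below are invented for this article, not our actual driver interface), the game side can treat the driver as a small bank of numbered 7-bit variables and clamp each quantity into the 0-to-127 range a MIDI controller can carry:

/* Hypothetical sketch of game-state variables shared with a sound driver. */

#define SND_NUM_VARS 32

static unsigned char sndVars[SND_NUM_VARS];   /* one 7-bit value each */

/* Clamp an arbitrary game quantity into MIDI controller range. */
static unsigned char Clamp7(int value)
{
    if (value < 0)   return 0;
    if (value > 127) return 127;
    return (unsigned char)value;
}

/* The game calls this whenever a tracked quantity changes. */
void SndSetVar(int id, int value)
{
    if (id >= 0 && id < SND_NUM_VARS)
        sndVars[id] = Clamp7(value);
}

For example, variable 0 might track the number of enemies on screen (SndSetVar(0, numEnemies)), while variable 1 holds the distance to the exit, rescaled into 0-127 before the call.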

CODE OR DIE. It's important to put the logic programming in the hands of the sound designer, not the game programmer. Rule 8 clearly shows the logic behind this. It's hard enough to explain which aspects of the world you need to track. It's almost impossible (and, I think, unreasonable) to expect the game programmers to write code to mute and unmute specific MIDI channels when various conditions arise. To solve this problem, we created (with some help from Jim Wright at IBM's Watson research lab) a programming language that allows us to author logical commands within a stock MIDI sequencing environment and store them within a standard MIDI file. The language contains a set of fairly simple Boolean functions (if/then, else, endif), navigational commands (goto, label, loop, loop end), a set of data manipulation commands (get, set, sweep), and parameter controls (channel volume, pan, transpose, and so on). Next, we created an auditioning tool that allowed us to simulate the run-time game environment, kick out logic-enhanced sequences, manipulate the game state variables, send commands, and see what happened.
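To give a flavor of how such embedded logic events might be evaluated at run time, here is a hypothetical sketch; the opcodes and record layout below are invented for illustration and are not our actual file format. Each logic command is read from the sequence as it plays and handed to a dispatcher:

/* Invented opcodes standing in for the if/else/endif, goto, and
   get/set commands described above. */
enum Op { OP_IF, OP_ELSE, OP_ENDIF, OP_GOTO, OP_SET, OP_VOLUME };

struct LogicCmd {
    enum Op op;
    int     var;     /* which game-state variable to test or set */
    int     value;   /* comparison value, new value, or label id */
};

extern unsigned char sndVars[];    /* the shared 7-bit variables */

/* Handle one logic command. Returns a label to jump to, or -1 to
   continue playing in sequence. 'skipping' mutes event processing
   while a failed 'if' branch is active. */
int ProcessCmd(const struct LogicCmd *c, int *skipping)
{
    switch (c->op) {
    case OP_IF:     /* sketch tests '<'; a real format would encode
                       the comparison operator as well */
        *skipping = (sndVars[c->var] < c->value);
        break;
    case OP_ELSE:   *skipping = !*skipping;                    break;
    case OP_ENDIF:  *skipping = 0;                             break;
    case OP_SET:    sndVars[c->var] = (unsigned char)c->value; break;
    case OP_GOTO:   return c->value;  /* jump to a labeled point */
    default:        break;            /* volume, pan, and so on  */
    }
    return -1;
}

A command stream along these lines is what lets the sound designer, rather than the game programmer, decide that a channel should mute when the enemy count reaches zero.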