MPEG-4 Structured Audio

MPEG-4 AudioBIFS



Introduction

MPEG-4 AudioBIFS is part of the overall MPEG-4 BIFS specification. BIFS stands for Binary Format for Scene Description, and it is the part of MPEG-4 that lets you describe how video, graphics, sound, and sound synthesis are combined to create an interactive multimedia application.

The core structure of BIFS is based on the VRML 2.0 specification. Everything you can do in VRML, you can do in MPEG-4, but MPEG-4 also adds a number of other features, including the audio capabilities described below.


AudioBIFS

In MPEG-4, AudioBIFS allows you to describe a sound as the combination of a number of sound objects. These sound objects may be coded with different coders (for example, CELP-coded voice and synthetic background music) and combined in many ways: they can be mixed together, or passed through special filters and other processing functions written in SAOL.
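As a rough taste of what this looks like, here is a sketch, in VRML-style text, of an AudioFX node that applies a simple SAOL gain to a CELP-coded voice stream. The stream name, the inline placement of the SAOL code, and the exact field syntax are illustrative only; the nodes used here are described in more detail below.

    AudioFX {
      orch "
        instr fxgain () {
          // input[] carries the audio arriving from the children nodes
          output(input[0] * 0.5);   // scale the voice to half amplitude
        }
      "
      children [
        AudioSource {
          url "voice_stream"        # elementary stream carrying the CELP-coded voice
          numChan 1
        }
      ]
    }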

Like the rest of BIFS, AudioBIFS is based on a scene graph. However, unlike in visual BIFS, the nodes in an AudioBIFS scene graph don't each represent an object presented directly to the user. Instead, each AudioBIFS sound subgraph as a whole represents one sound object, created by mixing and processing the elementary sound streams on which it is based.

For example, here is an audio subgraph which shows how a simple sound is created from three elementary sound streams:

Each of the rectangles shows a node in the audio scene subgraph. Each node has a certain function, like mixing some sounds together, delaying a sound, or doing some effects processing. The arrows along the bottom represent the three elementary sound streams which make up the sound object. Each sound stream can be coded a different way: for example, we might code the piano sound with the Structured Audio decoder, the bass sound with the MPEG-4 parametric HILN coder, and the vocal track with the MPEG-4 CELP coder.

These three sound streams are just like a "multitrack" recording of the final music sound object. The sound of each instrument is represented separately, and the scene graph then mixes them all together. The processing in the audio subgraph works like a "data-flow" diagram: the sounds flow from the streams at the bottom, up through the nodes, and emerge as a single sound at the top.
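To make this concrete, here is a rough sketch of how that subgraph might be written in the VRML-style text form of BIFS. The stream names and mixing gains are made up for illustration; in a real bitstream each url would refer to an elementary stream carried in the MPEG-4 multiplex.

    AudioMix {
      numInputs 3                  # three mono input channels
      matrix [ 1, 1, 1 ]           # mixing gains: sum the three inputs into one output channel
      children [
        AudioSource { url "piano_stream" }   # decoded by the Structured Audio decoder
        AudioSource { url "bass_stream" }    # decoded by the parametric (HILN) coder
        AudioSource { url "vocal_stream" }   # decoded by the CELP speech coder
      ]
    }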

This single, final sound can be put into an audiovisual scene: it can be given a 3-D spatial location, moved around, and so on. For example, here is a picture of an audiovisual scene with several sound objects in it.

In this scene, each of the objects has a different sound associated with it, and as you move around the virtual world, you hear the different objects coming from different places, at different loudness levels. Some of the sounds are speech, some are music, and some are sound effects. Each of these sounds has its own sound subgraph, and so might have many underlying elementary audio streams.
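For instance, attaching one of these sound objects to a position in the 3-D world might look roughly like the sketch below; the stream name and field values are placeholders.

    Sound {
      source AudioSource { url "birdsong_stream" }   # any audio subgraph can go here
      location 2 0 -5        # where the sound sits in the virtual world
      intensity 0.8          # overall loudness scaling
      spatialize TRUE        # render the sound with 3-D spatial cues
    }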


The AudioBIFS nodes

Here is a list, with short descriptions, of each of the AudioBIFS nodes. A longer description is coming soon.

AudioSource: Insert sound into an audio subgraph (connects it to an elementary audio stream)
AudioMix: Mix N channels of sound to produce M channels of sound
AudioDelay: Delay a sound for a short amount of time relative to the rest of the audio subgraph
AudioSwitch: Select N channels of sound out of a set of M channels
AudioClip: Save a short "snippet" of sound for use with interactions or loops
AudioFX: Execute parametric sound-effects processing given as SAOL code
Sound (and Sound2D): Attach the sound created by an audio subgraph to a 3-D world (or a 2-D scene, for Sound2D)
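As a sketch of how several of these nodes fit together, the fragment below delays a two-channel effects stream slightly, keeps only its first channel, and attaches the result to a 2-D scene. The field names follow the node list above, but the values and the stream name are illustrative.

    Sound2D {
      source AudioSwitch {
        whichChoice [ 0 ]            # keep only the first of the two channels
        children [
          AudioDelay {
            delay 0.25               # delay the whole subgraph by a quarter of a second
            children [
              AudioSource { url "effects_stream" numChan 2 }
            ]
          }
        ]
      }
    }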

 

Software

We're working with the MPEG-4 Audio/Systems Integration Ad-Hoc Group to produce software that can do all of these things, multiplex and demultiplex bitstreams, and more. It's not ready for release yet, but it will be available here when it is.

