MPEG-4 Structured Audio

Papers about Structured Audio



Casey, Michael A., and Smaragdis, Paris J. (1996)
NetSound: High-quality audio from semantic descriptions.
Proc. 1996 International Computer Music Conference, Hong Kong, 1996.
(Available here.)

Abstract: We describe a sound and music specification protocol called NetSound that is oriented towards networked low-bandwidth, native-signal-processing sound synthesis applications. One such application is music distribution on the internet. We describe the concept behind NetSound and outline a prototype implementation that uses Csound, a synthesis specification language designed and implemented at the MIT Media Lab, as a client-side real-time synthesis engine.


Scheirer, Eric D., and Vercoe, Barry L. (in press)
SAOL: The MPEG-4 Structured Audio Orchestra Language
To appear in Computer Music Journal.

Abstract: The MPEG-4 standard, on which technical work was completed in October 1998, contains extensive provisions for sound synthesis as well as traditional methods of audio compression. At the heart of MPEG-4 Structured Audio, the sound-synthesis framework, is a new music-synthesis language called SAOL. This language, based on the Music-N model, is specified and defined fully in the MPEG-4 International Standard; a real-time implementation must be embedded in any device conforming to the full MPEG-4 Audio standard. In this paper, the structure and capabilities of SAOL are discussed, especially in comparison to other music languages. A discussion of the role of international standardization in the future development of computer-music tools is also presented.
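
As a brief illustration of the kind of code the article discusses, here is a minimal SAOL orchestra sketch: a single instrument playing a decaying tone from a stored wavetable. This example is not drawn from the paper itself; the instrument name, parameters, and exact opcode arguments are illustrative and should be checked against the MPEG-4 Structured Audio specification.

    // Orchestra: one instrument playing a decaying tone from a wavetable.
    global {
      srate 32000;                     // audio sampling rate
      krate 100;                       // control rate
    }

    instr tone(cps, amp) {
      table wave(harm, 2048, 1);       // one-cycle sine built by the "harm" generator
      ksig env;                        // control-rate amplitude envelope
      asig sound;                      // audio-rate output signal

      env = kline(amp, dur, 0);        // linear decay over the note's duration
      sound = oscil(wave, cps) * env;  // table-lookup oscillator
      output(sound);                   // send to the output bus
    }

A corresponding score line in SASL, the companion score language, might play a one-second 440 Hz note at time zero:

    0.0 tone 1.0 440.0 0.5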


Scheirer, Eric D., and Youngjik Lee and Jae-Woo Yang (in press).
Synthetic and SNHC Audio in MPEG-4.
In Atul Puri and Tsuhan Chen (eds.),
Advances in Multimedia: Signals, Standards and Networks.
New York: Marcel Dekker, in press.


Scheirer, Eric D. (1999).
Structured Audio and Effects Processing in the MPEG-4 Multimedia Standard.
Multimedia Systems 7:1 pp. 11-22, Jan. 1999.

Abstract: While previous generations of the MPEG multimedia standard have focused primarily on coding and delivery of content digitally sampled from the real world, MPEG-4 contains extensive support for structured, synthetic, and synthetic/natural hybrid coding methods. We describe work-in-progress on the "Structured Audio and Effects" component of MPEG-4, which allows for the description of synthetic soundtracks, musical scores, and effects algorithms, and the compositing, manipulation, and synchronization of real and synthetic audio sources. A discussion of the separation of functionality between the Systems layer and the Audio component of MPEG-4 is presented, and prospects for efficient DSP-based implementations are discussed.


Scheirer, Eric D., and Jyri Huopaniemi and Riitta Väänänen (1998)
AudioBIFS: The MPEG-4 Standard for Effects Processing.
Proc. DAFX98 Workshop on Digital Audio Effects, Barcelona, Nov. 1998.
(Available here.)

Abstract: We present a tutorial overview of the AudioBIFS system, part of the Binary Format for Scene Description in the MPEG-4 International Standard. AudioBIFS allows the flexible construction of sound scenes using streaming audio, interactive presentation, 3-D spatialization and auralization, and dynamic download of custom signal processing routines. MPEG-4 sound scenes are based on a model which is a superset of the model in VRML 2.0, and we clearly describe the relationship between sound in VRML and sound in MPEG-4. We discuss the use of SAOL, the MPEG-4 Structured Audio Orchestra Language, for writing downloadable effects, present example sound scenes in AudioBIFS, and describe the current and future state of implementations of the standard.


Scheirer, Eric D. (1998)
The MPEG-4 Structured Audio Orchestra Language.
Proc. 1998 International Computer Music Conference, Ann Arbor, MI, Oct 1998.
(Available here.)

Abstract: The MPEG-4 standard, proceeding to International Standard status in November 1998, merges the existing worlds of audio-video coding and synthetic content representation. Part of the standard is a new set of tools called, together, Structured Audio. The MPEG-4 Structured Audio system enables the efficient, streaming transmission of synthetic sound and music using a variety of formats. The central component of this system is a new music language called SAOL, some features of which will be described in this paper.


Scheirer, Eric D., and Lee Ray (1998)
Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard.
Proc. 105th Meeting of the AES (invited paper), San Francisco, Sept 1998.
(Available here.)

Abstract: The newly released MPEG-4 standard for multimedia transmission contains several novel tools for the low-bitrate coding of audio. Among these is a new codec called "Structured Audio" that allows sound to be transmitted in a synthetic description format and synthesized at the client. MPEG-4 Structured Audio contains facilities for both algorithmic synthesis, using a new software-synthesis language called SAOL, and wavetable synthesis, using a new format for the efficient transmission of banks of samples. We contrast the use of these techniques for various multimedia applications, discussing scenarios in which one is favored over the other, or in which they are profitably used together.


Scheirer, Eric D. (1998)
The MPEG-4 Structured Audio Standard.
Proc. 1998 IEEE ICASSP (invited paper), Seattle, May 1998.
(Available here.)

Abstract: The MPEG-4 standard defines numerous tools that represent the state of the art in representation, transmission, and decoding of multimedia data. Among these is a new type of audio standard, termed "Structured Audio". The MPEG-4 standard for structured audio allows for the efficient, flexible description of synthetic music and sound effects, and the use of synthetic sound in synchronization with natural sound in interactive multimedia scenes. A discussion of the capabilities, technological underpinnings, and application of MPEG-4 Structured Audio is presented.


Vercoe, Barry L., and William G. Gardner and Eric D. Scheirer (1998).
Structured Audio: Creation, Transmission, and Rendering of Parametric Sound Representations.
Proc. IEEE 86:5 (May 1998), pp. 922-940 (invited paper).

Abstract: Structured audio representations are semantic and symbolic descriptions which are useful for ultra-low bitrate transmission, flexible synthesis, and perceptually-based manipulation and retrieval of sound. We present an overview of techniques for transmitting and synthesizing sound represented in structured formats, and for creating structured representations from audio waveforms. We discuss applications for structured audio in virtual environments, music synthesis, gaming, content-based retrieval, interactive broadcast, and other multimedia contexts.


For more information on these papers, or to request reprints, contact Eric Scheirer at eds@media.mit.edu.
