MPEG Audio FAQ

MPEG-1: coded storage of sampled sound waves

Overview of MPEG-1

MPEG-1 audio standardizes three different coding schemes for digitized sound waves called Layers I, II, and III. It does not standardize the encoder, but rather standardizes the type of information that an encoder has to produce and write to an MPEG-1 conformant bitstream as well as the way in which the decoder has to parse, decompress, and resynthesize this information in order to regain the encoded sound. The encoded sound bitstream can be stored together with an encoded video bitstream and other data streams in a so-called MPEG-1 systems stream.

What are the typical applications of MPEG-1 Audio?

Within the professional and consumer markets, four fields of application can be identified: broadcasting, storage, multimedia, and telecommunication. This variety of applications is possible because of the wide range of bitrates and the numerous configurations allowed within the MPEG-1 Audio standard. Some of the most important applications are:

Consumer Recording (DCC)
Disc based storage (CD-i, CD-Video)
DVD
Disc based Editing, audio broadcasting station automation
Solid State Storage Audio
Cable and satellite TV (e.g. DVB, USSB, DirecTV, EchoStar)
Cable Radio
Digital Audio Broadcasting (e.g. ADR, DAB, US-Digital Radio, Worldspace Radio)
Internet Radio
Computer based Multimedia
Contribution Links
Distribution Links
ISDN Links
Stand-alone electronic information systems

What is so special about MPEG-1 audio coding?

MPEG-1 audio targets generic sound waves: it is not restricted to a particular class such as speech signals, but codes all types of sound signals.
It performs perceptual audio coding rather than lossless coding. In lossless coding, redundancy in the waveform is reduced to compress the sound signal, and the decoded sound wave does not differ from the original sound wave. In contrast, a perceptual audio codec does not attempt to retain the input signal exactly after encoding and decoding; its goal is to ensure that the output signal sounds the same to a human listener. It aims at eliminating those parts of the sound signal that are irrelevant to the human ear, i.e. that are not heard. Roughly speaking, an MPEG-1 audio encoder transforms the sound signal into the frequency domain, eliminates those frequency components that are masked by stronger frequency components (i.e. cannot be heard), and packages this analyzed signal into an MPEG-1 conformant audio bitstream.

MPEG-1 language and abbreviations

What are the different layers in MPEG-1?

The different layers have been defined because they all have their merits. Basically, the complexity of the encoder and decoder, the encoder/decoder delay, and the coding efficiency increase when going from Layer I via Layer II to Layer III.
Layer I has the lowest complexity and is specifically suitable for applications where the encoder complexity also plays an important role.
Layer II requires a more complex encoder and a slightly more complex decoder, and is directed towards 'one to many' applications, i.e. one encoder serves many decoders. Compared to Layer I, Layer II is able to remove more of the signal redundancy and to apply the psychoacoustic threshold more efficiently.
Layer III is again more complex and is directed towards lower bit rate applications due to the additional redundancy and irrelevancy extraction from enhanced frequency resolution and Huffman coding.

The term 'layers' suggests that the higher layers are 'on top' of the lower layers. Is that true?

Not exactly. But it is true that the main functional modules of the lower layers are also used by the higher layers. For example, the subband filterbank of Layer I is also used by Layer II and Layer III; Layer II adds a more efficient coding of side information; and Layer III adds a frequency transform in all the subbands.
The three layers have been defined to be compatible in a hierarchical way, i.e. a 'full Layer N' decoder is able to decode bitstreams encoded in Layer N and all layers below N. Consequently, a 'full Layer III' decoder accepts Layer I, II, and III bitstreams, and a 'full Layer II' decoder accepts Layer I and II bitstreams; a Layer I decoder, however, only accepts Layer I bitstreams.
Nevertheless, MPEG-1 Audio decoders may exist which do not support the full functionality for a certain layer, or do not support the lower layers. These decoders may, however, not be referred to as (full) Layer N decoders.

What is MP3?

Originally, the file extension ".mp3" was created with the emergence of MPEG-1 Layer III encoder and decoder software for Windows. After standardisation of MPEG-2, sound files encoded with the MPEG-2 lower sampling rate extension of Layer III are also called "MP3"-files. Sometimes MP3 is wrongly called MPEG-3.

Technicalities of MPEG-1

How does MPEG-1 Audio work?

The primary psychoacoustic effect that the perceptual MPEG-1 audio coder uses is called 'auditory masking', where parts of a signal are inaudible because of the way the human auditory system works. For example, if a sound consists mainly of one frequency, all other sounds at a nearby frequency that are much quieter will not be heard. The parts of the signal that are masked are commonly called 'irrelevant', as opposed to parts of the signal that are removed by a lossless coding operation, which are termed 'redundant'.
In order to remove this irrelevancy, the encoder contains a psychoacoustic model (see Figure 1). This psychoacoustic model analyzes the input signals within consecutive time blocks and determines for each block the spectral components of the input audio signal by applying a frequency transform. Then it models the masking properties of the human auditory system, and estimates the just noticeable noise-level for each frequency band, sometimes called the threshold of masking.
In parallel, the input signal is fed through a time-to-frequency mapping, resulting in spectrum components for subsequent coding. In its quantisation and coding stage, the encoder tries to allocate the available number of data bits in a way that meets both the bitrate and masking requirements taking into account the calculated thresholds of masking. The information on how the bits are distributed over the spectrum is contained in the bitstream as side information.
The decoder is much less complex, because it does not require a psychoacoustic model and bit allocation procedure. Its only task is to reconstruct an audio signal from the coded spectral components and associated side information.
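The bit allocation step described above can be illustrated with a toy sketch. It assumes that the psychoacoustic model has already produced one signal-to-mask ratio (SMR, in dB) per band, and that each extra quantizer bit buys roughly 6.02 dB of signal-to-noise ratio. Both the function name allocate_bits and the 6.02 dB rule of thumb are assumptions made for illustration; the standard itself works with tabulated quantizer steps and also has to budget for scalefactors and side information.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Toy greedy bit allocation over frequency bands.

    smr_db     -- signal-to-mask ratio per band in dB (from a psychoacoustic model)
    total_bits -- number of bits to hand out, one at a time
    Returns the number of bits given to each band.
    """
    DB_PER_BIT = 6.02  # assumed SNR gain per quantizer bit (rule of thumb)
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Only bands that can still take another bit are candidates.
        open_bands = [i for i, b in enumerate(bits) if b < max_bits_per_band]
        if not open_bands:
            break
        # Mask-to-noise ratio: MNR = SNR - SMR. The band with the lowest
        # MNR has the most audible quantization noise, so it gets the bit.
        worst = min(open_bands, key=lambda i: bits[i] * DB_PER_BIT - smr_db[i])
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    # Band 0 is far above its masking threshold, band 2 is fully masked.
    print(allocate_bits([20.0, 5.0, -10.0], total_bits=6))
```

In this sketch, a band whose signal lies below the masking threshold (negative SMR) receives bits only once every other band has a larger noise margin, which is the sense in which masked components are 'irrelevant'.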

Figure 1: Overview of MPEG-1 audio encoding

What is the general layout of the bit stream?

The digitized sound signal is divided up into blocks of 384 samples in Layer I and 1152 samples in Layers II and III. Such a block is encoded within one MPEG-1 audio frame. An MPEG-1 audio stream therefore consists of consecutive audio frames. A frame consists of a header and the encoded sound data. A Layer III frame may distribute its encoded sound data over several consecutive other frames if those frames do not require all of their bits. The header of a frame contains general information such as the MPEG Layer, the sampling frequency, the number of channels, whether the frame is CRC protected, whether the sound is an original etc. Although most of this information may be the same for all frames, MPEG decided to give each audio frame such a header in order to simplify synchronization and bitstream editing.
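As an illustration of the header fields listed above, the following sketch unpacks a 32-bit MPEG-1 audio frame header. The bit positions and code tables reflect the commonly documented layout of ISO/IEC 11172-3 and should be checked against the standard before being relied upon; the function name parse_header and the dictionary keys are hypothetical. Headers of the MPEG-2 LSF extension are deliberately rejected to keep the sketch short.

```python
import struct

# Code tables for the 2-bit fields (reserved codes are left out).
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}
SAMPLE_RATES_HZ = {0b00: 44100, 0b01: 48000, 0b10: 32000}
MODES = {0b00: "stereo", 0b01: "joint stereo", 0b10: "dual channel", 0b11: "mono"}
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}


def parse_header(four_bytes):
    """Decode the main fields of an MPEG-1 audio frame header (4 bytes)."""
    (word,) = struct.unpack(">I", four_bytes)  # big-endian 32-bit word
    if word >> 20 != 0xFFF:
        raise ValueError("syncword not found")
    if (word >> 19) & 1 != 1:
        raise ValueError("not an MPEG-1 header (ID bit is 0)")
    layer_code = (word >> 17) & 0b11
    rate_code = (word >> 10) & 0b11
    if layer_code not in LAYERS or rate_code not in SAMPLE_RATES_HZ:
        raise ValueError("reserved layer or sampling frequency code")
    layer = LAYERS[layer_code]
    return {
        "layer": layer,
        # protection_bit == 0 means a CRC follows the header
        "crc_protected": ((word >> 16) & 1) == 0,
        # index into the layer-specific bit rate table (not resolved here)
        "bitrate_index": (word >> 12) & 0xF,
        "sample_rate_hz": SAMPLE_RATES_HZ[rate_code],
        "padding": bool((word >> 9) & 1),
        "mode": MODES[(word >> 6) & 0b11],
        "original": bool((word >> 2) & 1),
        "samples_per_frame": SAMPLES_PER_FRAME[layer],
    }


if __name__ == "__main__":
    print(parse_header(bytes.fromhex("fffb9000")))
```

Because every frame carries such a header, a decoder or editing tool can resynchronize at any frame boundary by searching for the syncword and validating the fields that follow.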

What bit rates are supported by MPEG-1 Audio?

In order to be applicable to a large number of different application scenarios, MPEG-1 supports a wide range of bit rates, from 32 kbit/s up to 320 kbit/s for Layer III (the Layer II and Layer I bit rate tables extend to 384 kbit/s and 448 kbit/s respectively). The "Low Sampling Frequency" (LSF) extension of MPEG-2 extends this range down to 8 kbit/s. In addition, switching of bit rates at the frame level is explicitly included in the standard, thus allowing applications to adapt their bit rate to environmental conditions.
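To relate a bit rate to the size of a frame, the following sketch applies the frame length formulas as they are commonly stated for MPEG-1 audio: Layer I frames are counted in 4-byte slots, Layer II and III frames in 1-byte slots, and a padding slot may be added to keep the average rate exact. The function names are hypothetical and the bit rate is passed in bit/s rather than as a header index.

```python
def frame_length_bytes(layer, bitrate_bps, sample_rate_hz, padding=False):
    """Length of one MPEG-1 audio frame in bytes, header included."""
    if layer == 1:
        # 384 samples per frame: 384 / 8 bits per byte / 4 bytes per slot = 12
        return (12 * bitrate_bps // sample_rate_hz + int(padding)) * 4
    if layer in (2, 3):
        # 1152 samples per frame: 1152 / 8 bits per byte = 144
        return 144 * bitrate_bps // sample_rate_hz + int(padding)
    raise ValueError("layer must be 1, 2 or 3")


def frame_duration_ms(layer, sample_rate_hz):
    """Playing time covered by one frame, in milliseconds."""
    samples = 384 if layer == 1 else 1152
    return 1000.0 * samples / sample_rate_hz


if __name__ == "__main__":
    print(frame_length_bytes(3, 128000, 44100))  # Layer III, 128 kbit/s
    print(frame_duration_ms(3, 44100))
```

At 44.1 kHz the division does not come out even, which is why the padding flag exists: some frames carry one extra slot so that the long-term bit rate matches the nominal one.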

Is Variable Bit Rate allowed in MPEG-1 Audio?

For Layer III, the answer is simply 'yes'. The average bit rate is the one given in the header of a Layer III frame, but as the bits may be distributed over several frames, this effectively implies a variable bit rate.
For Layers I and II, according to the standard, it is not mandatory for decoders to support Variable Bit Rate (VBR). However, in practice the majority of the decoders do support Variable Bit Rate, and it is perfectly in line with the standard to specify for a certain application that decoders should support VBR. This is implemented by specifying for each audio frame separately the bit rate at which it is encoded.
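Since every frame of a stream covers the same number of samples, the average bit rate of a variable bit rate stream is simply the mean of the per-frame bit rates. The sketch below shows this arithmetic; the function names are hypothetical, and the per-frame bit rates are assumed to have been read from the frame headers already.

```python
def average_bitrate_bps(frame_bitrates_bps):
    """Mean bit rate of a stream whose frames all have the same duration."""
    if not frame_bitrates_bps:
        raise ValueError("need at least one frame")
    return sum(frame_bitrates_bps) / len(frame_bitrates_bps)


def stream_duration_s(n_frames, samples_per_frame, sample_rate_hz):
    """Playing time of n_frames frames, in seconds."""
    return n_frames * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    rates = [128000, 64000, 192000, 128000]  # one entry per frame
    print(average_bitrate_bps(rates))
    print(stream_duration_s(len(rates), 1152, 48000))
```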

What stereo modes are supported by MPEG-1 Audio?

MPEG-1 audio works for one- and two-channel audio signals. A method called joint stereo coding can be used to exploit some redundancy between left and right channels of a stereo audio signal. Four different modes are standardized:

mono
stereo
joint stereo (intensity stereo or mid/side stereo)
dual channel (two independent channels e.g. for two languages)
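The idea behind mid/side stereo can be sketched as follows: instead of left and right, the encoder codes their normalized sum and difference. When both channels are similar, the side signal is small and cheap to code. This is a simplified illustration on plain lists of sample values with hypothetical function names; in an actual Layer III encoder the transform is applied to spectral values, and intensity stereo works differently.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)  # keeps the total energy unchanged


def to_mid_side(left, right):
    """Convert left/right values into mid (sum) and side (difference)."""
    mid = [(l + r) * _SCALE for l, r in zip(left, right)]
    side = [(l - r) * _SCALE for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform: recover left/right from mid/side."""
    left = [(m + s) * _SCALE for m, s in zip(mid, side)]
    right = [(m - s) * _SCALE for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.5, -0.25, 0.75]
    right = [0.5, -0.25, 0.75]  # identical channels
    mid, side = to_mid_side(left, right)
    print(side)  # all zeros: nothing to code in the side channel
```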

What is the Signal-To-Noise Ratio (SNR) of MPEG-1 Audio?

For a perceptual codec, this is not really a relevant question. The SNR is a very bad measure of perceptual audio quality, even for a waveform coder. The SNR measured in a conventional way may vary from a few dB up to more than 100 dB, mostly depending on the signal, while no noise is audible in any of these cases. Within the International Telecommunication Union (ITU-R), a task group (TG 10/4) is working on the development of a more appropriate objective measurement system, based on perceptual models. For the moment, one has to rely on the human ear as a measuring instrument, i.e. there are no other reliable means to determine the quality of a perceptual codec than listening tests. Even when a standardised perceptually based objective measurement system is available, listening tests will still be wise for comparison of different audio codecs.
On the basis of psychoacoustics, it must be noted that within the range of 5 to 80 dB SNR, it is easily possible to generate two test signals, one that can be exactly reproduced in the perceptual sense, and one that cannot, showing the large range over which SNR is not meaningful as a quality measure.
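For reference, the conventional SNR mentioned above is the ratio of signal energy to the energy of the difference between the original and the decoded signal, expressed in dB. A minimal sketch, with a hypothetical function name and assuming two time-aligned sample sequences of equal length:

```python
import math


def snr_db(original, decoded):
    """Conventional signal-to-noise ratio in dB between two sample sequences."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0:
        return math.inf  # bit-exact reconstruction
    if signal_energy == 0:
        return -math.inf  # silence in, noise out
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    original = [1.0, -1.0, 1.0, -1.0]
    decoded = [0.9, -0.9, 0.9, -0.9]
    print(snr_db(original, decoded))
```

The number says nothing about where the noise lies relative to the masking threshold, which is why it does not predict what a listener hears.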

How is the performance of MPEG-1 Audio with respect to cascading, i.e. multiple coding?

This functionality was tested by the International Telecommunication Union (ITU-R). They tested various configurations of repeated encoder/decoder chains at different bitrates with a variety of audio coding algorithms.
MPEG-1 Audio performed best in this test. On the basis of these tests, ITU-R recommends the use of MPEG-1 Audio Layer II for contribution (i.e. link between broadcasting studios with provisions for post processing), for distribution (i.e. link between the broadcasting and transmitter station) and for emission (i.e. final transmission between transmitter and receiver at home). The use of MPEG-1 Audio Layer III is recommended for commentary links, i.e. a link for speech signals which are transmitted to the broadcasting station using e.g. one B-channel of an ISDN line.

Implementing MPEG-1 audio software

What kind of support does MPEG provide for implementers of MPEG Audio?

MPEG provides different kinds of support to implementers. Firstly, a Technical Report is issued that contains software that describes the decoder and an example encoder. This software can be used by implementers to analyze and become familiar with the algorithms, and could be used as a basis for an implementation. The encoder can be used to generate test sequences. The Technical Report is published as part 5 of the standard, i.e. ISO/IEC 11172-5 for MPEG-1 and ISO/IEC 13818-5 for MPEG-2 and MPEG-2 AAC.
Secondly, a conformance document is issued. This document provides guidelines for testing the conformance of bitstreams and of decoders to the standard. It also describes the accuracy level that a decoder should meet in order to be called an MPEG audio decoder or a 'high accuracy' MPEG Audio decoder.
An important part of the conformance document is a set of bitstreams and the corresponding reference decoder output, that address several functionalities of the decoder. The conformance document is published as part 4 of the standard, i.e. ISO/IEC 11172-4 for MPEG-1 and ISO/IEC 13818-4 for MPEG-2 and MPEG-2 AAC.
For MPEG-2 ISO/IEC 13818-5 a CD ROM was released which contains all the reference bitstreams needed to perform the conformance test of the decoder implementations. The CD ROM can be ordered directly from the ISO/IEC Copyright Office.

Market position

How many MPEG-1 Audio decoders are already in the market-place?

Because of the widespread applications, it is rather difficult to give exact numbers. But at the end of 1996, a rough estimate put the total number of decoders in the marketplace at several million.

What are the reasons that MPEG-1 Audio is used so widely?

Because of its technical merits and excellent audio quality, several standardisation bodies include the MPEG-1 Audio standard in their recommendations. In 1994, ITU-R (International Telecommunication Union) issued recommendation BS.1115, which recommends MPEG Audio for audio as well as television broadcasting, including contribution, distribution, commentary and emission links. In 1995, DAVIC (Digital Audio Visual Council) specified the use of MPEG Audio for mono and stereo audio signals. In January 1995, ETSI (European Telecommunications Standards Institute) included MPEG-1 and MPEG-2 Audio in its DAB standard prETS 300 401, 'Radio Broadcasting system; Digital Audio Broadcasting (DAB) to mobile, portable and fixed receivers', and later in ETR 154 on 'Digital broadcasting systems for television; Implementation guidelines for the use of MPEG-2 systems; Video and audio in satellite and cable broadcasting applications'. In 1995, ITU-T recommended in its recommendation J.52, 'Digital Transmission of High-Quality Sound-Programme Signals using one, two or three 64 kbit/s Channels per Mono Signal (and up to Six per Stereo Signal)', the use of MPEG-1 and MPEG-2 Audio as an audio coding system to provide high quality audio over telecommunication lines.

Relation to other standards / methods

What is the relation between MUSICAM and MPEG-1 Audio Layer II?

MUSICAM was the name of an audio coding system submitted to MPEG, which became the basis for MPEG-1 Audio Layer I and II. Since the finalisation of MPEG-1 Audio, the original MUSICAM algorithm is not used anymore. The name MUSICAM is however mistakenly still used regularly when MPEG-1 Audio Layer II is meant. This is especially to be avoided because the name MUSICAM is trademarked by different companies in different regions of the world.

Bibliographic references

Can you suggest more detailed information in the literature?

Since 1992, many articles about MPEG-1 Audio coding have been published all over the world. The following list of standards, recommendations, and articles provides you with more information, and gives you both a better overview and more detailed knowledge, so that you will become a 'real MPEG Audio fan'.

ISO/IEC 11172-3. "Coding of Moving pictures and associated audio for digital storage media at up to 1.5 Mbit/s - Audio Part". International Standard, 1993.
Brandenburg, K.-H.; Stoll, G. et al.: "ISO-MPEG-1 Audio: A Generic Standard for Coding of High Quality Digital Audio". Journal of the Audio Engineering Society, Oct. 1994, Vol. 42, No. 10, pp. 780-792.
Davis Pan: "A Tutorial on MPEG/Audio Compression". IEEE Multimedia Vol. 2, No. 7, 1995, pp. 60-74.
Chapter 4 in Haskell/Puri/Netravali: "Digital Video: An Introduction to MPEG-2". Chapman & Hall, New York, 1997.
Peter Noll: "MPEG Digital Audio Coding". IEEE Signal Processing Magazine, Sept. 1997, pp. 59-81.
Seymour Shlien: "Guide to MPEG-1 Audio Standard". IEEE Transactions on Broadcasting, Dec. 1994, Vol. 40, No. 4, pp. 206-218.
Karlheinz Brandenburg: "MP3 and AAC explained". Proc. of the AES 17th International Conference on High Quality Audio Coding, Florence, Italy, 1999.
etc.

Organisational details

What's the status of the standardisation process?

MPEG-1 was finalised in 1992 and resulted in the International Standard ISO/IEC 11172-3 which was published in 1993.

Where can I find information on MPEG-1 licensing?

Information on licensing of MPEG-1 Layer I and Layer II can be found at:

http://www.audiompeg.com/ (SISVEL, Italy)
http://www.licensing.philips.com/information/mpeg/

Information on licensing of MPEG-1 Layer III can be found at:

http://www.mp3licensing.com/
http://www.iis.fhg.de/amm/legal/index.html

The information above is provided for convenience of the reader. MPEG, however, is not in a position to guarantee the validity of any claim made by a party with respect to IPR ownership.

Heiko Purnhagen 07-Nov-2001