INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11
N3075
December 1999 / Maui, HI

Source: Audio Subgroup
Title: Report on the MPEG-4 Audio Version 2 Verification Test
Authors: Ralph Sperschneider (FhG), Frank Feige (T-Nova), Schuyler Quackenbush (AT&T)

Report on the MPEG-4 Audio Version 2 Verification Test

This web page contains only excerpts of the document. The complete document is available as a PDF file.

Summary

The MPEG-4 Audio Version 2 coding tools have undergone a performance verification test for coding of monophonic audio signals in the range of 6 kbit/s to 64 kbit/s and stereophonic audio signals in the range of 64 kbit/s to 96 kbit/s. The coding tools tested were Harmonic and Individual Lines plus Noise (HILN) coding, Bit Sliced Arithmetic Coding (BSAC), Low Delay Advanced Audio Coding (AAC LD) and the Error Robustness tools comprising Error Resilience (ER) and Error Protection (EP). It was found that, relative to Version 1 tools, Version 2 tools provide new capabilities while still providing comparable audio quality and comparable levels of compression. The new capabilities evaluated as part of these tests are parametric signal representation (allowing independent speed and pitch modification), fine-step bit rate scalability, very low communications delay, and robustness to channel errors.

Introduction

MPEG-4 Version 2 is the name given to the technology in Amendment 1 of MPEG-4 (ISO/IEC 14496). Although it is formally an amendment, Version 2 is more correctly viewed as technology that required more time to develop and hence was not available at the time ISO/IEC 14496 was issued as an International Standard. The purpose of the tests reported here is to verify that the Version 2 tools bring valuable technology to the MPEG-4 standard. The figure of merit in the test is subjective audio quality. This, together with each tool's features and capabilities, allows system developers to better judge the merit of the technology as a basis for future applications.

The technology tested was Harmonic and Individual Lines plus Noise (HILN) coding, Bit Sliced Arithmetic Coding (BSAC), Low Delay Advanced Audio Coding (AAC LD) and the Error Robustness tools comprising Error Resilience (ER) and Error Protection (EP). Although the Version 2 technology provides compression, it usually does so in conjunction with other valuable features, such as very low bit rate (for HILN), very low delay (for AAC LD), fine-step bit rate scalability (for BSAC) or robustness to bit stream errors (for the ER and EP tools). The ER and EP tools are valuable in systems in which compressed audio must be transmitted over error-prone channels. These may be radio channels that incur bit or byte errors, or packet channels that incur lost (or late) packets. The increasing importance of wireless communications and the Internet makes these tools particularly valuable.

In this document, the following Audio object type names are used to identify the different codecs (for details, see [n3058]):

Object type ID	Audio object type	MPEG-4 version	Description
1	AAC main	1	Advanced Audio Coding in main configuration
3	AAC SSR	1	Advanced Audio Coding in scalable sampling rate configuration
8	CELP	1	Code Excited Linear Prediction
12	TTSI	1	Text to speech interface
7	TwinVQ	1	Transform Domain weighted interleave Vector Quantization
17	ER AAC LC	2	Error Resilient Advanced Audio Coding with Low Complexity
23	ER AAC LD	2	Error Resilient Advanced Audio Coding with Low Delay
20	ER AAC scalable	2	Error Resilient scalable Advanced Audio Coding
22	ER BSAC	2	Error Resilient Bit Sliced Arithmetic Coding
26	ER HILN	2	Error Resilient Harmonic and Individual Lines plus Noise
25	ER HVXC	2	Error Resilient Harmonic Vector Excitation Coding
21	ER TwinVQ	2	Error Resilient Transform Domain weighted interleave Vector Quantization
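Tools that need to tell the codecs apart can represent the table as a lookup structure keyed by object type ID. The sketch below is a minimal illustration and covers only the IDs listed above. The names `AUDIO_OBJECT_TYPES`, `describe` and `version2_ids` are hypothetical and do not come from any MPEG reference software.

```python
# Audio object types as listed in the table above: ID -> (name, MPEG-4 version).
# Only the subset used in this document; see [n3058] for details.
AUDIO_OBJECT_TYPES: dict[int, tuple[str, int]] = {
    1: ("AAC main", 1),
    3: ("AAC SSR", 1),
    7: ("TwinVQ", 1),
    8: ("CELP", 1),
    12: ("TTSI", 1),
    17: ("ER AAC LC", 2),
    20: ("ER AAC scalable", 2),
    21: ("ER TwinVQ", 2),
    22: ("ER BSAC", 2),
    23: ("ER AAC LD", 2),
    25: ("ER HVXC", 2),
    26: ("ER HILN", 2),
}


def describe(object_type_id: int) -> str:
    """Return e.g. 'ER BSAC (Version 2)' for an ID listed in the table."""
    if object_type_id not in AUDIO_OBJECT_TYPES:
        raise ValueError(f"object type {object_type_id} is not listed in this table")
    name, version = AUDIO_OBJECT_TYPES[object_type_id]
    return f"{name} (Version {version})"


def version2_ids() -> list[int]:
    """IDs of the Version 2 (error resilient) object types in the table, ascending."""
    return sorted(i for i, (_, version) in AUDIO_OBJECT_TYPES.items() if version == 2)


print(describe(22))     # ER BSAC (Version 2)
print(version2_ids())   # [17, 20, 21, 22, 23, 25, 26]
```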

The set of new tools provided by MPEG-4 Audio Version 2 is listed below:

New codecs:

ER HILN, Parametric (ER HVXC + ER HILN)
ER AAC LD
ER BSAC

Codec extensions:

Silence compression for ER CELP
Variable rate coding for ER HVXC at 4 kbit/s

Error robustness:

EP tool
Error resilient bit stream syntax for all Version 1 object types (except for AAC main, AAC SSR, TTSI and structured-audio-related object types)
Error resilience tools for ER AAC LC, ER AAC LTP, ER AAC scalable, and ER AAC LD
Error resilience mode for ER BSAC

From this pool, the following Version 2 tools were evaluated in this test:

ER HILN (Session A1)
ER BSAC (Session A2)
ER AAC LD (Session A3)
Error robustness applied to ER AAC LC and ER TwinVQ (Session A4)

No per-item tuning was permitted on any of the codecs involved in these verification tests.

Codecs under Test

During the Vancouver MPEG meeting, it was decided to test the following Version 2 coding tools in three distinct sessions: ER HILN, ER BSAC and ER AAC LD. It was also decided to test, in a separate session, the ER and EP tools as applied to ER AAC LC and ER TwinVQ. The four sessions are designated A1, A2, A3, and A4.

The tables in this chapter list, for each session, the parameters of the codecs under test, the test method, and the reference codec. The reference codec serves as an anchor in the test, which makes it easier to compare results from this test with those of previous tests that used the same reference codec.

Session A1 – ER HILN

Codec under test	Reference Codec	Test method
ER HILN 6 kbit/s @ 16 kHz (mono)	TwinVQ 6 kbit/s @ 16 kHz (mono)	BS.1284 quality scale, R/A (R: band-limited to 8 kHz)
ER HILN scalable 6 kbit/s @ 16 kHz (mono) based on scalable configuration: 6 kbit/s @ 16 kHz (mono) + 10 kbit/s @ 16 kHz (mono)	TwinVQ 6 kbit/s @ 16 kHz (mono)	BS.1284 quality scale, R/A (R: band-limited to 8 kHz)
ER HILN 16 kbit/s @ 16 kHz (mono)	AAC main 16 kbit/s @ 22.05 kHz (mono)	BS.1284 quality scale, R/A (R: band-limited to 8 kHz)
ER HILN scalable 16 kbit/s @ 16 kHz (mono) based on scalable configuration: 6 kbit/s @ 16 kHz (mono) + 10 kbit/s @ 16 kHz (mono)	AAC main 16 kbit/s @ 22.05 kHz (mono)	BS.1284 quality scale, R/A (R: band-limited to 8 kHz)

Session A2 – ER BSAC

Codec under test	Reference Codec	Test method
ER BSAC 96 kbit/s @ 32 kHz (stereo)	AAC main 96 kbit/s @ 32 kHz (stereo)	BS.1284 Quality scale, R/A/R/A
ER BSAC 88 kbit/s @ 32 kHz (stereo) derived from configuration 96 kbit/s @ 32 kHz (stereo)	AAC main 96 kbit/s @ 32 kHz (stereo)	BS.1284 Quality scale, R/A/R/A
ER BSAC 80 kbit/s @ 32 kHz (stereo) derived from configuration 96 kbit/s @ 32 kHz (stereo)	AAC main 96 kbit/s @ 32 kHz (stereo)	BS.1284 Quality scale, R/A/R/A
ER BSAC 72 kbit/s @ 32 kHz (stereo) derived from configuration 96 kbit/s @ 32 kHz (stereo)	AAC main 96 kbit/s @ 32 kHz (stereo)	BS.1284 Quality scale, R/A/R/A
ER BSAC 64 kbit/s @ 32 kHz (stereo) derived from configuration 96 kbit/s @ 32 kHz (stereo)	AAC main 64 kbit/s @ 32 kHz (stereo)	BS.1284 Quality scale, R/A/R/A
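The four lower rates are all derived from the single 96 kbit/s ER BSAC configuration. The short calculation below uses a hypothetical helper, `layer_steps`, to give the size of each step between adjacent tested rates, both in kbit/s and as a percentage of the lower rate:

```python
# Rates (kbit/s) decoded from the single 96 kbit/s ER BSAC configuration in Session A2.
BSAC_RATES_KBPS = [96, 88, 80, 72, 64]


def layer_steps(rates: list[int]) -> list[tuple[int, int, float]]:
    """For each adjacent pair of rates (low -> high), return
    (lower rate, step in kbit/s, step as a percentage of the lower rate)."""
    ascending = sorted(rates)
    steps = []
    for low, high in zip(ascending, ascending[1:]):
        step = high - low
        steps.append((low, step, 100.0 * step / low))
    return steps


for low, step, pct in layer_steps(BSAC_RATES_KBPS):
    print(f"{low} -> {low + step} kbit/s: +{step} kbit/s ({pct:.1f} %)")
# 64 -> 72 kbit/s: +8 kbit/s (12.5 %)
# 72 -> 80 kbit/s: +8 kbit/s (11.1 %)
# 80 -> 88 kbit/s: +8 kbit/s (10.0 %)
# 88 -> 96 kbit/s: +8 kbit/s (9.1 %)
```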

Session A3 – ER AAC LD

Codec under test	Reference Codec	Test method
ER AAC LD 64 kbit/s @ 48 kHz (mono), 20 ms delay	AAC main 56 kbit/s @ 44.1 kHz (mono)	BS.1284 quality scale, R/A/R/A (R: full-band original)
ER AAC LD 32 kbit/s @ 32 kHz (mono), 30 ms delay	AAC main 24 kbit/s @ 24 kHz (mono); G.722 64 kbit/s @ 16 kHz (mono); CELP 24 kbit/s @ 16 kHz (mono)	BS.1284 quality scale, R/A/R/A (R: band-limited to 8 kHz)
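The delay figures in the table agree with a simple frame-based calculation. The sketch below rests on two illustrative assumptions that this report does not state: ER AAC LD operates on frames of 480 samples, and its algorithmic delay equals two frame lengths (one frame for framing and one for the overlap of the transform window). `algorithmic_delay_ms` is a hypothetical helper.

```python
# Assumptions (not stated in this report): 480-sample frames and an algorithmic
# delay of two frame lengths (framing plus transform window overlap).
FRAME_LENGTH = 480
DELAY_FRAMES = 2


def algorithmic_delay_ms(sample_rate_hz: float,
                         frame_length: int = FRAME_LENGTH,
                         delay_frames: int = DELAY_FRAMES) -> float:
    """One-way algorithmic delay in milliseconds at the given sampling rate."""
    return 1000.0 * delay_frames * frame_length / sample_rate_hz


for fs in (48_000, 32_000):
    print(f"{fs / 1000:g} kHz: {algorithmic_delay_ms(fs):.0f} ms")
# 48 kHz: 20 ms
# 32 kHz: 30 ms
```

Because the delay is a fixed number of samples under these assumptions, lowering the sampling rate from 48 kHz to 32 kHz lengthens it proportionally, from 20 ms to 30 ms.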

Session A4 – Error Robustness

Codec under test	Reference Codec	Test method
ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo), EP tool, critical error condition	ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo)	MUSHRA (see section 5.2)
ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo), EP tool, very critical error condition	ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo)	MUSHRA (see section 5.2)
ER TwinVQ 16 kbit/s @ 32 kHz (mono), EP tool, critical error condition	ER TwinVQ 16 kbit/s @ 32 kHz (mono)	MUSHRA (see section 5.2)
ER TwinVQ 16 kbit/s @ 32 kHz (mono), EP tool, very critical error condition	ER TwinVQ 16 kbit/s @ 32 kHz (mono)	MUSHRA (see section 5.2)

The error conditions of this test are described in the table below. A burst error channel, as a typical example of a wireless mobile transmission channel, is used as the physical layer. Its error conditions are defined as follows:

Name	Average Bit Error Rate	Length of Burst Error
Critical Error Condition	10^-3	10 ms
Very Critical Error Condition	10^-3	1 ms
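Both conditions have the same average bit error rate, so the burst length mainly determines how often bursts occur. The sketch below makes this concrete with a two-state (Gilbert-Elliott) burst model. The model itself, an in-burst bit error probability of 0.5, and error-free transmission between bursts are illustrative assumptions, not the channel model used in the test, and the helper names are hypothetical. Under these assumptions, bursts occur ten times as often in the 1 ms condition as in the 10 ms condition. That is one plausible reason it is the more critical of the two: the same number of bit errors is spread over more bursts and can therefore reach more coded frames.

```python
import random

IN_BURST_BER = 0.5  # assumption: bits inside a burst are in error with probability 0.5


def burst_fraction(avg_ber: float, in_burst_ber: float = IN_BURST_BER) -> float:
    """Fraction of time the channel must spend in bursts to reach avg_ber,
    assuming error-free transmission between bursts."""
    return avg_ber / in_burst_ber


def mean_burst_interval_s(avg_ber: float, burst_length_s: float) -> float:
    """Mean time between burst onsets: one burst of mean length L occurs
    every L / fraction seconds on average."""
    return burst_length_s / burst_fraction(avg_ber)


def gilbert_elliott_pattern(n_bits: int, bit_rate: float, avg_ber: float,
                            burst_length_s: float, seed: int = 0) -> list[int]:
    """Draw a 0/1 error pattern (1 = bit in error) from a two-state model whose
    mean burst length and long-run average BER match the given values."""
    burst_bits = burst_length_s * bit_rate
    if burst_bits < 1:
        raise ValueError("burst must span at least one bit")
    p_bad_to_good = 1.0 / burst_bits                     # mean burst = burst_bits
    frac = burst_fraction(avg_ber)
    p_good_to_bad = frac * p_bad_to_good / (1.0 - frac)  # stationary P(bad) = frac
    rng = random.Random(seed)
    bad = False
    pattern = []
    for _ in range(n_bits):
        # State transition first, then draw this bit's error in the current state.
        if bad and rng.random() < p_bad_to_good:
            bad = False
        elif not bad and rng.random() < p_good_to_bad:
            bad = True
        pattern.append(1 if bad and rng.random() < IN_BURST_BER else 0)
    return pattern


for name, burst_s in (("critical", 10e-3), ("very critical", 1e-3)):
    interval = mean_burst_interval_s(1e-3, burst_s)
    print(f"{name}: one burst every {interval:.1f} s on average")
# critical: one burst every 5.0 s on average
# very critical: one burst every 0.5 s on average
```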

Test Material

Two selection panels selected the test items for sessions A1, A2, and A3. Whenever possible, the typical and critical test items and the training items were to be distributed among the four signal categories speech, single instrument, music, and complex signals, as shown in the following table:

	Speech	Single instrument	Music	Complex
Typical	1	1	1	1
Critical	1	1	1	1
Training	1	1	1	1

Based on the test items used in the previous Audio on Internet tests (test D, see [n2278]), the following 8 items were used:

No.	Item number	Category
1	01	speech
2	02	single instrument
3	11	single instrument
4	13	speech
5	20	complex
6	31	classical
7	33	complex
8	37	pop

Test Methodology

Test Method and Test Design for Sessions A1, A2, and A3

The subjective assessment of sound quality was carried out according to ITU-R Recommendation BS.1284 [bs1284]. This method was chosen so that these results can be compared to those of the MPEG-4 Version 1 tests.

The following 5-grade scale was used:

5	Excellent
4	Good
3	Fair
2	Poor
1	Bad

To achieve higher precision in the test results, the quality scale was used as a continuous scale with a resolution of one decimal place.
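Because the five-grade scale was used continuously with one decimal place, each response is a number between 1.0 and 5.0. The sketch below shows one way to validate such responses, map them to the nearest adjective, and summarise a condition by its mean and an approximate 95 % confidence interval. The helper names and the example grades are made up for illustration. The normal-approximation interval is also an assumption: this excerpt does not say which statistical analysis the full report used.

```python
import math
import statistics

# Five-grade scale used in Sessions A1-A3 (BS.1284).
LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def validate_grade(grade: float) -> float:
    """Accept a grade on the continuous 1.0-5.0 scale with at most one decimal place."""
    if not 1.0 <= grade <= 5.0:
        raise ValueError(f"grade {grade} is outside 1.0-5.0")
    if abs(round(grade, 1) - grade) > 1e-9:
        raise ValueError(f"grade {grade} has more than one decimal place")
    return round(grade, 1)


def nearest_label(grade: float) -> str:
    """Adjective of the nearest integer scale point (halves round up)."""
    return LABELS[math.floor(validate_grade(grade) + 0.5)]


def mean_with_ci95(grades: list[float]) -> tuple[float, float]:
    """Mean grade and half-width of an approximate 95 % confidence interval
    (normal approximation: 1.96 * sample standard deviation / sqrt(n))."""
    values = [validate_grade(g) for g in grades]
    if len(values) < 2:
        raise ValueError("need at least two grades")
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), half_width


# Made-up grades from five listeners for one condition, for illustration only.
mean, ci = mean_with_ci95([4.2, 3.8, 4.5, 4.0, 3.9])
print(f"mean {mean:.2f} +/- {ci:.2f} ({nearest_label(round(mean, 1))})")
# mean 4.08 +/- 0.24 (Good)
```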

The listening test was designed as follows:

training with the corresponding selected training items
presentation of stimuli in A-B pairs (each pair is called a trial), with "A" always the reference stimulus and "B" the processed version
division of each grading phase into sections of approximately 20 minutes
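The design above can be sketched as follows. This is a hypothetical illustration. The file naming, the 90-second trial duration and the simple chunking into sections are assumptions, not details taken from the test. The item numbers are the ones listed earlier, and the two conditions are Session A1 systems.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    item: str        # test item number, e.g. "01"
    condition: str   # system under test
    a: str           # stimulus A: always the reference
    b: str           # stimulus B: the processed version


def build_trials(items: list[str], conditions: list[str], seed: int = 0) -> list[Trial]:
    """One A-B trial per (item, condition) pair, presented in random order."""
    trials = [Trial(i, c, a=f"ref/{i}", b=f"{c}/{i}") for i in items for c in conditions]
    random.Random(seed).shuffle(trials)
    return trials


def split_into_sections(trials: list[Trial], trial_s: float,
                        max_section_s: float = 20 * 60) -> list[list[Trial]]:
    """Cut the trial list into consecutive sections of at most max_section_s seconds."""
    per_section = max(1, int(max_section_s // trial_s))
    return [trials[k:k + per_section] for k in range(0, len(trials), per_section)]


trials = build_trials(["01", "02", "11", "13", "20", "31", "33", "37"],
                      ["ER HILN 6 kbit/s", "TwinVQ 6 kbit/s"])
sections = split_into_sections(trials, trial_s=90.0)  # assumed 90 s per trial
print(len(trials), [len(s) for s in sections])  # 16 [13, 3]
```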

Test Method and Test Design for Session A4

"Subjective assessment of sound quality" (MUSHRA) [included in n2953] was the test method used in Session A4. (This method is a proposed standard at EBU and ITU-R.)

Session A4 was divided into two parts, each with a common channel bit rate and a common number of audio channels, designated as follows:

A4 @ 16 kbit/s: ER TwinVQ at 16 kbit/s (excluding the EP or ER tool rate), mono stimuli
A4 @ 96 kbit/s: ER AAC LC at 96 kbit/s (excluding the EP or ER tool rate), stereo stimuli
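The split into two parts amounts to grouping the Session A4 systems by channel bit rate and channel count. The sketch below also shows how a MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) trial presents all systems of one part side by side, together with a hidden reference and an anchor. The helper names, labels and random ordering are assumptions for illustration, and the sketch leaves out the reference codecs from the Session A4 table.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Condition(NamedTuple):
    codec: str
    rate_kbps: int        # channel bit rate, excluding EP or ER tool rate
    channels: int         # 1 = mono, 2 = stereo
    error_condition: str  # "critical" or "very critical"


# Systems under test from the Session A4 table (reference codecs omitted).
A4_CONDITIONS = [
    Condition("ER AAC LC + EP tool", 96, 2, "critical"),
    Condition("ER AAC LC + EP tool", 96, 2, "very critical"),
    Condition("ER TwinVQ + EP tool", 16, 1, "critical"),
    Condition("ER TwinVQ + EP tool", 16, 1, "very critical"),
]


def split_parts(conditions: list[Condition]) -> dict[str, list[Condition]]:
    """Group conditions sharing a channel bit rate and channel count into one part."""
    groups: dict[tuple[int, int], list[Condition]] = defaultdict(list)
    for c in conditions:
        groups[(c.rate_kbps, c.channels)].append(c)
    return {
        f"A4 @ {rate} kbit/s ({'mono' if ch == 1 else 'stereo'})": members
        for (rate, ch), members in sorted(groups.items())
    }


def mushra_stimuli(part: list[Condition], seed: int = 0) -> list[str]:
    """Stimuli of one MUSHRA trial: all conditions of a part plus a hidden
    reference and an anchor, in random order."""
    labels = [f"{c.codec}, {c.error_condition}" for c in part]
    labels += ["hidden reference", "anchor"]
    random.Random(seed).shuffle(labels)
    return labels


parts = split_parts(A4_CONDITIONS)
for name, members in parts.items():
    print(name, mushra_stimuli(members))
```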

Conclusions

The MPEG-4 Audio Version 2 coding tools have undergone a performance verification test for coding of monophonic audio signals in the range of 6 kbit/s to 64 kbit/s and stereophonic audio signals in the range of 64 kbit/s to 96 kbit/s. The coding tools tested were Harmonic and Individual Lines plus Noise (ER HILN) coding, Bit Sliced Arithmetic Coding (ER BSAC), Low Delay Advanced Audio Coding (ER AAC LD) and the Error Robustness tools comprising Error Resilience (ER) and Error Protection (EP). These tools were evaluated in four distinct test sessions. For each session, the report presents a description of the systems under test, the method of test material selection, the selected test items, the test methodology and the test results.

The results of these tests support the following broad conclusions:

The base plus enhancement layers of ER HILN form a bit rate scalable coder that, at all scalable bit rates, provides quality comparable to that of a fixed-rate ER HILN coder at the same bit rate.
ER HILN has performance comparable to that of other MPEG-4 coding technology operating at similar bit rates, but additionally allows the speed or pitch of the audio signal to be changed independently during decoding.
At the upper end of the bit rate range, ER BSAC provides quality comparable to that of AAC main at the same bit rate, and hence the scalability feature comes at no cost to performance. However, at the lower end of the range, the scalability provided by ER BSAC appears to require a bit rate overhead of approximately 12.5 % relative to AAC main for both to deliver comparable quality.
In the tests, ER BSAC demonstrated scalability in steps of 8 kbit/s (approximately 12 % increments), and, for the most part, each increase in rate provided a statistically significant increase in quality.
At comparable quality levels, ER AAC LD provides a significant decrease in one-way communications delay relative to AAC main, and does so at only a modest increase in bit rate (around 8 kbit/s).
The test results indicate that the ER and EP tools are able to provide significant error robustness over a range of channel error conditions, and do so with only a modest bit rate overhead.
The test results suggest that the ER and EP tools enable MPEG-4 coding tools to perform nearly as well over error-prone channels as the same coding tools operating over a clear channel, even when the clear-channel performance approaches the level of "excellent" on the quality scale.
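The bit rate figures quoted in these conclusions are simple differences between tested configurations. The hypothetical helper below reproduces only that arithmetic. The quality comparisons themselves rest on the listening test results, which are not part of this excerpt.

```python
def overhead(rate_kbps: float, reference_kbps: float) -> tuple[float, float]:
    """Extra bit rate of a system relative to a reference: (kbit/s, percent)."""
    extra = rate_kbps - reference_kbps
    return extra, 100.0 * extra / reference_kbps


# ER BSAC: 12.5 % on top of 64 kbit/s is one 8 kbit/s layer step (72 kbit/s).
print(overhead(72, 64))   # (8, 12.5)
# ER AAC LD versus the AAC main anchors chosen in Session A3.
print(overhead(64, 56))   # 8 kbit/s, about 14.3 %
print(overhead(32, 24))   # 8 kbit/s, about 33.3 %
```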

Heiko Purnhagen 08-Feb-2000