INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11
N3075
December 1999 / Maui, HI

Source: Audio Subgroup
Title: Report on the MPEG-4 Audio Version 2 Verification Test
Authors: Ralph Sperschneider (FhG), Frank Feige (T-Nova), Schuyler Quackenbush (AT&T)



Report on the MPEG-4 Audio Version 2 Verification Test


This web page contains only excerpts of the document. The complete document is available as a PDF file.



Summary

The MPEG-4 Audio Version 2 coding tools have undergone a performance verification test for coding of monophonic audio signals in the range of 6 kbit/s to 64 kbit/s and stereophonic audio signals in the range of 64 kbit/s to 96 kbit/s. The coding tools tested were Harmonic and Individual Lines plus Noise (HILN) coding, Bit Sliced Arithmetic Coding (BSAC), Low Delay Advanced Audio Coding (AAC LD), and the Error Robustness tools, comprising Error Resilience (ER) and Error Protection (EP). It was found that, relative to Version 1 tools, Version 2 tools provide new capabilities while still offering comparable audio quality and comparable levels of compression. The new capabilities evaluated in these tests are parametric signal representation (allowing independent speed and pitch modification), fine step bit rate scalability, very low communications delay, and robustness to channel errors.

Introduction

MPEG-4 Version 2 is the name given to the technology in Amendment 1 of MPEG-4 (ISO/IEC 14496). Although it is an amendment, Version 2 is more correctly viewed as technology that required more time to develop and hence was not available at the time that ISO/IEC 14496 was issued as an International Standard. The purpose of the tests reported here is to verify that Version 2 tools bring valuable technology to the MPEG-4 standard. The figure of merit in the test is subjective audio quality. Together with each tool's features and capabilities, this permits system developers to better judge the merit of the technology as a basis for future applications.

The technology tested was Harmonic and Individual Lines plus Noise (HILN) coding, Bit Sliced Arithmetic Coding (BSAC), Low Delay Advanced Audio Coding (AAC LD), and the Error Robustness tools, comprising Error Resilience (ER) and Error Protection (EP). While the Version 2 technology provides compression, it most often does so in conjunction with other valuable features, such as very low bit rate (HILN), very low delay (AAC LD), fine step bit rate scalability (BSAC), or robustness to bit stream errors (ER and EP tools). The ER and EP tools are valuable in systems in which compressed audio information must be transmitted over error-prone channels. These may be radio channels that incur bit or byte errors, or packet channels that incur lost (or late) packets. The increasing importance of wireless communications and the Internet makes these tools particularly valuable.

In this document the names of the following Audio object types are used to identify the different codecs (for details see [n3058]):

Object type ID   Audio object type   MPEG-4 version   Description
1                AAC main            1                Advanced Audio Coding in main configuration
3                AAC SSR             1                Advanced Audio Coding in scalable sampling rate configuration
8                CELP                1                Code Excited Linear Prediction
12               TTSI                1                Text to speech interface
7                TwinVQ              1                Transform Domain weighted interleave Vector Quantization
17               ER AAC LC           2                Error Resilient Advanced Audio Coding with Low Complexity
23               ER AAC LD           2                Error Resilient Advanced Audio Coding with Low Delay
20               ER AAC scalable     2                Error Resilient scalable Advanced Audio Coding
22               ER BSAC             2                Error Resilient Bit Sliced Arithmetic Coding
26               ER HILN             2                Error Resilient Harmonic and Individual Lines plus Noise
25               ER HVXC             2                Error Resilient Harmonic Vector Excitation Coding
21               ER TwinVQ           2                Error Resilient Transform Domain weighted interleave Vector Quantization
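The object type IDs in the table above are the values signaled in the 5-bit audioObjectType field at the start of an MPEG-4 AudioSpecificConfig, which is followed by a 4-bit samplingFrequencyIndex and a 4-bit channelConfiguration. The Python sketch below decodes only those first 13 bits and looks up the name from the table. It is a minimal illustration under stated assumptions: escape codes and all fields after channelConfiguration are ignored, and the helper name parse_audio_specific_config is hypothetical.

```python
# Audio object types from the table above (ID -> name).
OBJECT_TYPES = {
    1: "AAC main", 3: "AAC SSR", 7: "TwinVQ", 8: "CELP", 12: "TTSI",
    17: "ER AAC LC", 20: "ER AAC scalable", 21: "ER TwinVQ",
    22: "ER BSAC", 23: "ER AAC LD", 25: "ER HVXC", 26: "ER HILN",
}

# samplingFrequencyIndex values 0..11 mapped to sampling rates in Hz.
SAMPLING_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                  24000, 22050, 16000, 12000, 11025, 8000]


def parse_audio_specific_config(data: bytes):
    """Decode audioObjectType (5 bits), samplingFrequencyIndex (4 bits)
    and channelConfiguration (4 bits) from the first two bytes.
    Hypothetical helper: escape codes are not handled."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    bits = int.from_bytes(data[:2], "big")  # 16 bits, most significant first
    aot = bits >> 11                        # top 5 bits
    sf_index = (bits >> 7) & 0x0F           # next 4 bits
    channels = (bits >> 3) & 0x0F           # next 4 bits
    if sf_index >= len(SAMPLING_RATES):
        raise ValueError(f"unsupported samplingFrequencyIndex {sf_index}")
    name = OBJECT_TYPES.get(aot, f"unknown ({aot})")
    return name, SAMPLING_RATES[sf_index], channels


# ER AAC LD, 48 kHz, mono (as in Session A3): bits 10111 0011 0001 000
print(parse_audio_specific_config(bytes([0xB9, 0x88])))  # ('ER AAC LD', 48000, 1)
```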

The set of new tools provided by MPEG-4 Audio Version 2 is listed below:

New codecs:

Codec extensions:

Error robustness:

From this set, the following Version 2 object types were evaluated in this test:

No per-item tuning was permitted on any of the codecs involved in these verification tests.

Codecs under Test

During the Vancouver MPEG meeting, it was decided to test the following Version 2 coding tools in three distinct sessions: ER HILN, ER BSAC, and ER AAC LD. It was also decided to test, in a separate session, the ER and EP tools as they apply to ER AAC LC and ER TwinVQ. The four sessions are designated A1, A2, A3, and A4.

The tables in this chapter give the parameters of each codec under test, the test method, and the reference codec. The reference codec serves as an anchor in the test, so that results from this test can be more easily compared with those of previous tests in which the same reference codec was also tested.

Session A1 – ER HILN

Codec under test                              Reference codec                          Test method
ER HILN, 6 kbit/s @ 16 kHz (mono)             TwinVQ, 6 kbit/s @ 16 kHz (mono)         BS.1284 quality scale, R/A; R: band limited to 8 kHz
ER HILN scalable, 6 kbit/s @ 16 kHz (mono)*   TwinVQ, 6 kbit/s @ 16 kHz (mono)         BS.1284 quality scale, R/A; R: band limited to 8 kHz
ER HILN, 16 kbit/s @ 16 kHz (mono)            AAC main, 16 kbit/s @ 22.05 kHz (mono)   BS.1284 quality scale, R/A; R: band limited to 8 kHz
ER HILN scalable, 16 kbit/s @ 16 kHz (mono)*  AAC main, 16 kbit/s @ 22.05 kHz (mono)   BS.1284 quality scale, R/A; R: band limited to 8 kHz

* Based on the scalable configuration 6 kbit/s @ 16 kHz (mono) + 10 kbit/s @ 16 kHz (mono).

Session A2 – ER BSAC

Codec under test                        Reference codec                         Test method
ER BSAC, 96 kbit/s @ 32 kHz (stereo)    AAC main, 96 kbit/s @ 32 kHz (stereo)   BS.1284 quality scale, R/A/R/A
ER BSAC, 88 kbit/s @ 32 kHz (stereo)*   AAC main, 96 kbit/s @ 32 kHz (stereo)   BS.1284 quality scale, R/A/R/A
ER BSAC, 80 kbit/s @ 32 kHz (stereo)*   AAC main, 96 kbit/s @ 32 kHz (stereo)   BS.1284 quality scale, R/A/R/A
ER BSAC, 72 kbit/s @ 32 kHz (stereo)*   AAC main, 96 kbit/s @ 32 kHz (stereo)   BS.1284 quality scale, R/A/R/A
ER BSAC, 64 kbit/s @ 32 kHz (stereo)*   AAC main, 64 kbit/s @ 32 kHz (stereo)   BS.1284 quality scale, R/A/R/A

* Derived from the 96 kbit/s @ 32 kHz (stereo) configuration.
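The 88, 80, 72 and 64 kbit/s ER BSAC conditions are derived from the 96 kbit/s configuration, which is how fine step bit rate scalability is exercised in this session. The Python sketch below computes the byte budget per frame at each tested rate, assuming (for illustration only) 1024 samples per frame at 32 kHz, i.e. 32 ms frames; this gives 384 bytes per frame at 96 kbit/s down to 256 bytes at 64 kbit/s. The function truncate_frame is a hypothetical simplification that keeps the leading bytes of a frame; BSAC is designed so that a truncated frame remains decodable, but this sketch models only the size, not the bit-sliced syntax.

```python
FRAME_LENGTH = 1024   # samples per frame (assumption for this sketch)
SAMPLE_RATE = 32000   # Hz, as in Session A2


def frame_budget_bytes(bit_rate: int) -> int:
    """Bytes available per frame at bit_rate (bit/s)."""
    bits_per_frame = bit_rate * FRAME_LENGTH // SAMPLE_RATE
    return bits_per_frame // 8


def truncate_frame(frame: bytes, bit_rate: int) -> bytes:
    """Hypothetical fine step scaling: keep only the leading part of a
    frame encoded at a higher rate so that it fits the lower rate."""
    return frame[:frame_budget_bytes(bit_rate)]


for rate in (96000, 88000, 80000, 72000, 64000):
    print(rate // 1000, "kbit/s ->", frame_budget_bytes(rate), "bytes/frame")
```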

Session A3 – ER AAC LD

Codec under test                                   Reference codec                         Test method
ER AAC LD, 64 kbit/s @ 48 kHz (mono), 20 ms delay  AAC main, 56 kbit/s @ 44.1 kHz (mono)   BS.1284 quality scale, R/A/R/A; R: full band original
ER AAC LD, 32 kbit/s @ 32 kHz (mono), 30 ms delay  AAC main, 24 kbit/s @ 24 kHz (mono)     BS.1284 quality scale, R/A/R/A; R: band limited to 8 kHz
                                                   G.722, 64 kbit/s @ 16 kHz (mono)
                                                   CELP, 24 kbit/s @ 16 kHz (mono)
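The two delay figures for ER AAC LD are consistent with a simple rule of thumb, assuming a frame length of 480 samples and an algorithmic delay of two frame lengths: 2 × 480 / 48000 Hz = 20 ms and 2 × 480 / 32000 Hz = 30 ms. Both the frame length and the two-frame rule are assumptions of this sketch, not statements taken from the test report; the function name is hypothetical.

```python
def algorithmic_delay_ms(frame_length: int, sample_rate: int) -> float:
    """Assumed rule: delay of two frame lengths (frame buffering plus
    transform overlap), expressed in milliseconds."""
    return 2 * frame_length * 1000 / sample_rate


print(algorithmic_delay_ms(480, 48000))  # 20.0
print(algorithmic_delay_ms(480, 32000))  # 30.0
```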

Session A4 – Error Robustness

Codec under test                           Reference codec                 Test method
ER AAC LC (incl. ER tools)                 ER AAC LC (incl. ER tools)      MUSHRA (see section 5.2)
  96 kbit/s @ 32 kHz (stereo)                96 kbit/s @ 32 kHz (stereo)
  EP tool, critical error condition

ER AAC LC (incl. ER tools)                 ER AAC LC (incl. ER tools)      MUSHRA (see section 5.2)
  96 kbit/s @ 32 kHz (stereo)                96 kbit/s @ 32 kHz (stereo)
  EP tool, very critical error condition

ER TwinVQ                                  ER TwinVQ                       MUSHRA (see section 5.2)
  16 kbit/s @ 32 kHz (mono)                  16 kbit/s @ 32 kHz (mono)
  EP tool, critical error condition

ER TwinVQ                                  ER TwinVQ                       MUSHRA (see section 5.2)
  16 kbit/s @ 32 kHz (mono)                  16 kbit/s @ 32 kHz (mono)
  EP tool, very critical error condition

The error conditions used in this test are described in the table below. A burst error channel, as a typical example of a wireless mobile transmission channel, is used as the physical layer. Its error conditions are defined as follows:

Name                            Average bit error rate   Length of burst error
Critical error condition        10^-3                    10 ms
Very critical error condition   10^-3                    1 ms
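Both conditions share the same average bit error rate and differ only in burst length, so under any model with a fixed error density inside a burst, shorter bursts must occur more often. The Python sketch below makes this concrete with a hypothetical channel model (an assumption of this example, not the channel simulator used in the test): bursts of fixed duration start at random, each bit inside a burst is in error with probability p_in_burst = 0.5, and bits outside bursts are error-free. Under these assumptions a 10^-3 average bit error rate needs about 0.2 bursts per second with 10 ms bursts and about 2 bursts per second with 1 ms bursts. All function names and the p_in_burst parameter are illustrative.

```python
import random


def bursts_per_second(avg_ber: float, burst_ms: float, p_in_burst: float = 0.5) -> float:
    """Mean burst rate needed to reach avg_ber (hypothetical model)."""
    fraction_in_burst = avg_ber / p_in_burst  # share of time spent inside bursts
    return fraction_in_burst / (burst_ms / 1000)


def burst_error_mask(n_bits: int, bit_rate: int, avg_ber: float, burst_ms: float,
                     p_in_burst: float = 0.5, seed: int = 0) -> list:
    """Return 0/1 error flags for n_bits bits sent at bit_rate (bit/s).

    Hypothetical channel: error-free gaps of geometric length alternate
    with bursts of fixed length; inside a burst each bit errs with
    probability p_in_burst.
    """
    rng = random.Random(seed)
    burst_bits = max(1, round(bit_rate * burst_ms / 1000))
    f = avg_ber / p_in_burst
    # Per-bit burst start probability q such that the expected share of
    # bits inside bursts equals f: L / ((1 - q) / q + L) = f.
    q = f / (burst_bits * (1 - f) + f)
    mask = [0] * n_bits
    i = 0
    while i < n_bits:
        if rng.random() < q:
            for j in range(i, min(i + burst_bits, n_bits)):
                mask[j] = 1 if rng.random() < p_in_burst else 0
            i += burst_bits
        else:
            i += 1
    return mask


for burst_ms in (10, 1):
    print(f"{burst_ms} ms bursts: {bursts_per_second(1e-3, burst_ms):.1f} bursts/s")

# Simulate 20 s of a 96 kbit/s stream under the 1 ms condition.
mask = burst_error_mask(96000 * 20, 96000, 1e-3, 1)
print("measured BER:", sum(mask) / len(mask))
```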

Test Material

Two selection panels selected the test items for sessions A1, A2, and A3. Whenever possible, the typical and critical test items and the training items were to be distributed among the four signal categories (speech, single instrument, music, and complex signals), as shown in the following table:

           Speech   Single instrument   Music   Complex
Typical    1        1                   1       1
Critical   1        1                   1       1
Training   1        1                   1       1

Based on the test items used in the previous Audio on Internet tests (test D, see [n2278]), the following 8 items were used:

No.   Item number   Category
1     01            speech
2     02            single instrument
3     11            single instrument
4     13            speech
5     20            complex
6     31            classical
7     33            complex
8     37            pop

Test Methodology

Test Method and Test Design for Sessions A1, A2, and A3

The subjective assessment of sound quality was done according to ITU-R Recommendation BS.1284 [bs1284]. This method was chosen so that these results can be compared with those of the MPEG-4 Version 1 tests.

The following 5-grade scale was used:

5 Excellent
4 Good
3 Fair
2 Poor
1 Bad

To achieve higher precision in the test results, the quality scale was used as a continuous scale with a resolution of one decimal place.
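With a continuous scale of one decimal place, each grade is a value from 1.0 to 5.0 in steps of 0.1. The Python sketch below validates individual grades on that basis and summarizes one item by its mean grade and the nearest label of the 5-grade scale. Mapping a mean grade to the nearest label by rounding half up is an illustrative assumption of this sketch, not a rule from BS.1284, and the function names are hypothetical.

```python
from statistics import mean

# Labels of the 5-grade scale used in sessions A1, A2 and A3.
LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def validate_grade(grade: float) -> float:
    """Accept grades from 1.0 to 5.0 with at most one decimal place."""
    if not 1.0 <= grade <= 5.0:
        raise ValueError(f"grade {grade} outside 1.0..5.0")
    if abs(grade * 10 - round(grade * 10)) > 1e-9:
        raise ValueError(f"grade {grade} has more than one decimal place")
    return round(grade, 1)


def summarize(grades):
    """Mean grade of one item and its nearest label (illustrative only)."""
    m = mean(validate_grade(g) for g in grades)
    return round(m, 2), LABELS[int(m + 0.5)]  # round half up to a label


print(summarize([3.4, 3.9, 4.2]))
```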

The listening test was designed as follows:

Test Method and Test Design for Session A4

"Subjective assessment of sound quality" (MUSHRA) [included in n2953] was the test method used in Session A4. (This method is a proposed standard at EBU and ITU-R.)

Session A4 was divided into two parts, each with a common channel bit rate and a common number of signal channels, designated as follows:

Conclusions

The MPEG-4 Audio Version 2 coding tools have undergone a performance verification test for coding of monophonic audio signals in the range of 6 kbit/s to 64 kbit/s and stereophonic audio signals in the range of 64 kbit/s to 96 kbit/s. The coding tools tested were Harmonic and Individual Lines plus Noise (ER HILN) coding, Bit Sliced Arithmetic Coding (ER BSAC), Low Delay Advanced Audio Coding (ER AAC LD), and the Error Robustness tools, comprising Error Resilience (ER) and Error Protection (EP). These tools were tested in four distinct sessions, and for each session the systems under test, the method of test material selection, the selected test items, the test methodology, and the test results were presented.

The results of these tests support the following broad conclusions:



Heiko Purnhagen 08-Feb-2000