INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
N3075
December 1999 / Maui, HI
Source: Audio Subgroup
Title: Report on the MPEG-4 Audio Version 2 Verification Test
Authors: Ralph Sperschneider (FhG), Frank Feige (T-Nova), Schuyler Quackenbush (AT&T)
This web page contains only excerpts of the document. The complete document is available as a PDF file.
The MPEG-4 Audio Version 2 coding tools have undergone a performance verification test for coding of monophonic audio signals in the range of 6 kbit/s to 64 kbit/s and stereophonic audio signals in the range of 64 kbit/s to 96 kbit/s. The coding tools tested were Harmonic and Individual Lines plus Noise (HILN) coding, Bit Sliced Arithmetic Coding (BSAC), Low Delay Advanced Audio Coding (AAC LD) and the Error Robustness tools comprising Error Resilience (ER) and Error Protection (EP). It was found that, relative to Version 1 tools, Version 2 tools provide new capabilities while still providing comparable audio quality and comparable levels of compression. The new capabilities evaluated as part of these tests are parametric signal representation (allowing independent speed and pitch modification), fine step bit rate scalability, very low communications delay, and robustness to channel errors.
MPEG-4 Version 2 is the name given to the technology in Amendment 1 of MPEG-4 (ISO/IEC 14496). Although it is an amendment, Version 2 is more correctly viewed as technology that required more time to develop and hence was not available at the time that ISO/IEC 14496 was issued as an international standard. The purpose of the tests reported on here is to verify that Version 2 tools bring valuable technology to the MPEG-4 standard. The figure of merit in the test is subjective audio quality. This, together with each tool's features and capabilities, permits system developers to better judge the merit of the technology as a basis for future applications.
The technology tested was Harmonic and Individual Lines plus Noise (HILN) coding, Bit Sliced Arithmetic Coding (BSAC), Low Delay Advanced Audio Coding (AAC LD) and the Error Robustness tools comprising Error Resilience (ER) and Error Protection (EP). While the Version 2 technology provides compression, it is most often compression in conjunction with other valuable features, such as very low bit rate (for HILN), very low delay (for AAC LD), fine step bit rate scalability (for BSAC) or robustness to bit stream errors (for ER and EP tools). The ER and EP tools are valuable in systems in which compressed audio information must be transmitted over error-prone channels. These may be radio channels that incur bit or byte errors, or packet channels that incur lost (or late) packets. The increasing importance of wireless communications and the Internet makes these tools particularly valuable.
In this document the names of the following Audio object types are used to identify the different codecs (for details see [n3058]):
Object type ID | Audio object type | MPEG-4 version | Description |
1 | AAC main | 1 | Advanced Audio Coding in main configuration |
3 | AAC SSR | 1 | Advanced Audio Coding in scalable sampling rate configuration |
8 | CELP | 1 | Code Excited Linear Prediction |
12 | TTSI | 1 | Text to speech interface |
7 | TwinVQ | 1 | Transform Domain weighted interleave Vector Quantization |
17 | ER AAC LC | 2 | Error Resilient Advanced Audio Coding with Low Complexity |
23 | ER AAC LD | 2 | Error Resilient Advanced Audio Coding with Low Delay |
20 | ER AAC scalable | 2 | Error Resilient scalable Advanced Audio Coding |
22 | ER BSAC | 2 | Error Resilient Bit Sliced Arithmetic Coding |
26 | ER HILN | 2 | Error Resilient Harmonic and Individual Lines plus Noise |
25 | ER HVXC | 2 | Error Resilient Harmonic Vector Excitation Coding |
21 | ER TwinVQ | 2 | Error Resilient Transform Domain weighted interleave Vector Quantization |
The set of new tools provided by MPEG-4 Audio Version 2 is listed below:
New codecs:
Codec extensions:
Error robustness:
Out of this pool, the following Version 2 object types have been evaluated in this test:
No per-item tuning was permitted on any of the codecs involved in these verification tests.
During the Vancouver MPEG meeting it was decided to test the following Version 2 coding tools in three distinct sessions: ER HILN, ER BSAC and ER AAC LD. It was also decided to test in a separate session ER and EP tools as they apply to ER AAC LC and ER TwinVQ. The four sessions are designated A1, A2, A3, and A4.
The tables in this chapter indicate the parameters for the respective codec under test, the test method, and the reference codec. The reference codec serves as an anchor in the test, permitting results from this test to be more easily compared to those of previous tests in which the same reference codec was also tested.
Codec under test | Reference Codec | Test method |
ER HILN 6 kbit/s @ 16 kHz (mono) | TwinVQ 6 kbit/s @ 16 kHz (mono) | BS.1284 quality scale, R/A; R: band limited to 8 kHz |
ER HILN scalable 6 kbit/s @ 16 kHz (mono), based on scalable configuration: 6 kbit/s @ 16 kHz (mono) + 10 kbit/s @ 16 kHz (mono) | TwinVQ 6 kbit/s @ 16 kHz (mono) | BS.1284 quality scale, R/A; R: band limited to 8 kHz |
ER HILN 16 kbit/s @ 16 kHz (mono) | AAC main 16 kbit/s @ 22.05 kHz (mono) | BS.1284 quality scale, R/A; R: band limited to 8 kHz |
ER HILN scalable 16 kbit/s @ 16 kHz (mono), based on scalable configuration: 6 kbit/s @ 16 kHz (mono) + 10 kbit/s @ 16 kHz (mono) | AAC main 16 kbit/s @ 22.05 kHz (mono) | BS.1284 quality scale, R/A; R: band limited to 8 kHz |
Codec under test | Reference Codec | Test method |
ER BSAC 96 kbit/s @ 32 kHz (stereo) | AAC main 96 kbit/s @ 32 kHz (stereo) | BS.1284 quality scale, R/A/R/A |
ER BSAC 88 kbit/s @ 32 kHz (stereo), derived from configuration 96 kbit/s @ 32 kHz (stereo) | AAC main 96 kbit/s @ 32 kHz (stereo) | BS.1284 quality scale, R/A/R/A |
ER BSAC 80 kbit/s @ 32 kHz (stereo), derived from configuration 96 kbit/s @ 32 kHz (stereo) | AAC main 96 kbit/s @ 32 kHz (stereo) | BS.1284 quality scale, R/A/R/A |
ER BSAC 72 kbit/s @ 32 kHz (stereo), derived from configuration 96 kbit/s @ 32 kHz (stereo) | AAC main 96 kbit/s @ 32 kHz (stereo) | BS.1284 quality scale, R/A/R/A |
ER BSAC 64 kbit/s @ 32 kHz (stereo), derived from configuration 96 kbit/s @ 32 kHz (stereo) | AAC main 64 kbit/s @ 32 kHz (stereo) | BS.1284 quality scale, R/A/R/A |
Codec under test | Reference Codec | Test method |
ER AAC LD 64 kbit/s @ 48 kHz (mono), 20 ms delay | AAC main 56 kbit/s @ 44.1 kHz (mono) | BS.1284 quality scale, R/A/R/A; R: full band original |
ER AAC LD 32 kbit/s @ 32 kHz (mono), 30 ms delay | AAC main 24 kbit/s @ 24 kHz (mono); G.722 64 kbit/s @ 16 kHz (mono); CELP 24 kbit/s @ 16 kHz (mono) | BS.1284 quality scale, R/A/R/A; R: band limited to 8 kHz |
Codec under test | Reference Codec | Test method |
ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo), EP Tool, critical error condition | ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo) | MUSHRA (see section 5.2) |
ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo), EP Tool, very critical error condition | ER AAC LC (incl. ER tools) 96 kbit/s @ 32 kHz (stereo) | MUSHRA (see section 5.2) |
ER TwinVQ 16 kbit/s @ 32 kHz (mono), EP Tool, critical error condition | ER TwinVQ 16 kbit/s @ 32 kHz (mono) | MUSHRA (see section 5.2) |
ER TwinVQ 16 kbit/s @ 32 kHz (mono), EP Tool, very critical error condition | ER TwinVQ 16 kbit/s @ 32 kHz (mono) | MUSHRA (see section 5.2) |
The error conditions of this test are described in the table below. As a typical example of a wireless mobile transmission channel, a burst error channel is used as the physical layer. Its error conditions are defined as follows:
Name | Average Bit Error Rate | Length of Burst Error |
Critical Error Condition | 10^-3 | 10 ms |
Very Critical Error Condition | 10^-3 | 1 ms |
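To make the two error conditions more concrete, the following Python sketch derives the parameters of a two-state burst error channel from an average bit error rate and a burst length. The excerpt does not say which channel model was used in the test, so the two-state (Gilbert-Elliott style) model, the error probability of 0.5 inside a burst, and the function names `burst_channel_params` and `simulate_errors` are all assumptions made for illustration. The 96 kbit/s rate in the usage lines is the ER AAC LC rate from session A4.

```python
import random

def burst_channel_params(avg_ber, burst_ms, bitrate_bps, ber_in_burst=0.5):
    """Derive transition probabilities for an assumed two-state burst channel."""
    # Mean burst length expressed in bits at the given channel bit rate.
    burst_bits = burst_ms * bitrate_bps / 1000.0
    # Geometric burst duration: leave the "bad" state with probability 1/length.
    p_bad_to_good = 1.0 / burst_bits
    # Share of bits that must fall in the "bad" state to reach avg_ber.
    frac_bad = avg_ber / ber_in_burst
    # Stationary condition: frac_bad = p_gb / (p_gb + p_bg), solved for p_gb.
    p_good_to_bad = p_bad_to_good * frac_bad / (1.0 - frac_bad)
    return burst_bits, p_good_to_bad, p_bad_to_good

def simulate_errors(n_bits, p_good_to_bad, p_bad_to_good, ber_in_burst=0.5, seed=0):
    """Return a list of booleans, True where a bit is hit by an error."""
    rng = random.Random(seed)
    bad = False
    errors = []
    for _ in range(n_bits):
        # Update the channel state, then decide whether this bit is corrupted.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        errors.append(bad and rng.random() < ber_in_burst)
    return errors

# Critical condition (10 ms bursts) and very critical condition (1 ms bursts).
critical = burst_channel_params(1e-3, 10, 96000)
very_critical = burst_channel_params(1e-3, 1, 96000)
```

Under these assumptions both conditions share the same average error rate; what changes is how the errors cluster. At 96 kbit/s a 10 ms burst spans 960 bits and a 1 ms burst spans 96 bits.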
Two selection panels selected the test items for sessions A1, A2, and A3. Whenever possible, the typical and critical test items and the training items were to be distributed among the four signal categories (speech, single instrument, music, and complex signals), as shown in the following table:
Item type | Speech | Single instrument | Music | Complex |
Typical | 1 | 1 | 1 | 1 |
Critical | 1 | 1 | 1 | 1 |
Training | 1 | 1 | 1 | 1 |
Based on the test items used for the previous Audio on Internet tests (test D, see [n2278], [n2278]), the following 8 items were used:
No. | Item number | Category |
1 | 01 | speech |
2 | 02 | single instrument |
3 | 11 | single instrument |
4 | 13 | speech |
5 | 20 | complex |
6 | 31 | classical |
7 | 33 | complex |
8 | 37 | pop |
The subjective assessment of sound quality was done according to ITU-R Recommendation BS.1284 [bs1284]. This method was chosen to permit these results to be compared to those of the MPEG-4 Version 1 tests.
The following 5-grade scale was used:
5 | Excellent |
4 | Good |
3 | Fair |
2 | Poor |
1 | Bad |
In order to achieve higher precision in the test results the quality scale was used as a continuous scale with one decimal place.
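As an illustration of how grades on this continuous scale might be summarised, the sketch below rounds each grade to one decimal place and computes a mean with an approximate 95 % confidence interval. The excerpt does not describe the statistical analysis that was actually used, so the normal approximation (z = 1.96), the function name `summarise_grades`, and the sample grades are hypothetical.

```python
import math
import statistics

def summarise_grades(grades, z=1.96):
    """Mean and approximate confidence interval for grades on the 1.0-5.0 scale."""
    for g in grades:
        if not 1.0 <= g <= 5.0:
            raise ValueError(f"grade {g} is outside the 1.0-5.0 quality scale")
    # Grades are recorded with one decimal place.
    rounded = [round(g, 1) for g in grades]
    mean = statistics.mean(rounded)
    # Sample standard deviation needs at least two grades.
    half_width = z * statistics.stdev(rounded) / math.sqrt(len(rounded))
    return mean, mean - half_width, mean + half_width

# Hypothetical grades from five listeners for one item and one codec.
mean, low, high = summarise_grades([4.2, 3.8, 4.5, 4.0, 3.9])
```

With a summary of this kind, two codecs are commonly treated as differing significantly when their confidence intervals do not overlap, although the report may have applied a different criterion.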
The listening test was designed as follows:
"Subjective assessment of sound quality" (MUSHRA) [included in n2953] was the test method used in Session A4. (This method is a proposed standard at EBU and ITU-R.)
Session A4 was separated into two parts, each with a common channel bit rate and common number of signal channels, designated as follows:
The MPEG-4 Audio Version 2 coding tools have undergone a performance verification test for coding of monophonic audio signals in the range of 6 kbit/s to 64 kbit/s and stereophonic audio signals in the range of 64 kbit/s to 96 kbit/s. The coding tools tested were Harmonic and Individual Lines plus Noise (ER HILN) coding, Bit Sliced Arithmetic Coding (ER BSAC), Low Delay Advanced Audio Coding (AAC LD) and the Error Robustness tools comprising Error Resilience (ER) and Error Protection (EP). These tools were tested in four distinct tests, and for each of these tests a description of the systems under test, the method of test material selection, the selected test items, the test methodology and the test results were presented.
The results of these tests support the following broad conclusions:
The base plus enhancement layers of ER HILN support a bit rate scalable coder that provides at all scalable bit rates quality comparable to that of a fixed-rate ER HILN coder at the same bit rate.
ER HILN has performance comparable to other MPEG-4 coding technology operating at similar bit rates, but provides the additional capability of independent audio signal speed or pitch change while decoding.
At the upper end of the bit rate range, ER BSAC provides quality comparable to that of AAC main at the same bit rate, and hence the scalability feature comes at no cost to performance. However, at the lower end of the range, the scalability provided by ER BSAC appears to require approximately a 12.5 % bit rate overhead relative to AAC main in order for both to deliver comparable quality.
In the tests ER BSAC demonstrated scalability in approximately 12 % increments, and, for the most part, each increase in rate provided a statistically significant increase in quality.
At comparable quality levels, ER AAC LD provides a significant decrease in one-way communications delay relative to AAC main, and does so at only a modest increase in bit rate (around 8 kbit/s).
The test results indicate that the ER and EP tools are able to provide significant error robustness over a range of channel error conditions, and do so with only a modest bit rate overhead.
The test results suggest that the ER and EP tools enable MPEG-4 coding tools to provide performance in error-prone channels that is nearly as good as the same coding tools operating over a clear channel, even when the clear channel performance approaches the level of "excellent" on the impairment scale.
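The bit rate figures quoted in these conclusions can be checked against the configurations listed for the ER BSAC and ER AAC LD sessions. The sketch below is plain arithmetic on those tabulated rates. Pairing ER BSAC at 72 kbit/s with AAC main at 64 kbit/s to arrive at the 12.5 % overhead is one reading that is consistent with the tested rates, not something the excerpt states explicitly, and the variable names are illustrative.

```python
# ER BSAC rates tested (kbit/s, stereo), lowest to highest.
bsac_rates = [64, 72, 80, 88, 96]

# Absolute and relative size of each scalability step.
bsac_steps = [hi - lo for lo, hi in zip(bsac_rates, bsac_rates[1:])]
bsac_step_pct = [100.0 * (hi - lo) / lo for lo, hi in zip(bsac_rates, bsac_rates[1:])]

# Assumed reading of the overhead: ER BSAC at 72 kbit/s vs AAC main at 64 kbit/s.
bsac_overhead_pct = 100.0 * (72 - 64) / 64

# (ER AAC LD rate, AAC main reference rate) in kbit/s for the two AAC LD tests.
aac_ld_pairs = [(64, 56), (32, 24)]
aac_ld_extra = [ld - ref for ld, ref in aac_ld_pairs]

print(bsac_steps)
print([round(p, 1) for p in bsac_step_pct])
print(bsac_overhead_pct)
print(aac_ld_extra)
```

Each ER BSAC step is 8 kbit/s, which is 12.5 % of the lowest rate of 64 kbit/s, and in both AAC LD comparisons the low delay coder runs 8 kbit/s above its AAC main reference.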
Heiko Purnhagen 08-Feb-2000