INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11
MPEG98/N2425
October 1998 / Atlantic City
Source: Audio and Test subgroup
Title: MPEG-4 Audio verification test results: Audio on Internet
Authors: Eric Scheirer, Sang-Wook Kim, Martin Dietz
This web page contains only excerpts of the document.
The complete document is available as PDF file.
Introduction
The MPEG-4 Audio coding tools cover a bitrate range from 2 kbit/s
to 64 kbit/s, and the corresponding subjective audio quality needs
to be evaluated. It was recognized that the verification tests should
first address applications that are potentially of great interest to
users. To this end, three important applications for MPEG-4 Audio are
addressed in the verification tests:
- Internet audio applications (6 to 56 kbit/s),
- digital audio broadcasting on AM bands (16 to 24 kbit/s), and
- speech applications.
Four sites volunteered to run the listening tests: Sony (Japan), Mitsubishi Electric America (USA), NTT (Japan) and Samsung AIT (Korea). The final analysis of the results was performed by MIT (USA).
The purpose of this document is to describe the procedures that were followed and to present the outcome of the verification tests for the Audio on Internet application. The remaining verification tests are reported in separate documents.
Test motivation
The rapidly growing demand for music transmission over networks such as the Internet motivated this test, which evaluates recent MPEG coders at bitrates suitable for analog modem and ISDN connections.
The comparisons of interest are:
- the TwinVQ and HILN tools provided by MPEG-4 against existing
techniques for transmitting audio at bitrates below 10 kbit/s;
- the HILN and AAC tools provided by MPEG-4 against existing
techniques for transmitting audio at bitrates between 10 and
20 kbit/s;
- the AAC-based tools for large step scalability provided by MPEG-4
against existing tools (unscaled AAC and MPEG Layer 3). The scalable
system provides mono/stereo scalability, offering 24 kbit/s mono,
40 kbit/s stereo and 56 kbit/s stereo within a single 56 kbit/s
bitstream. The purpose of this comparison is to evaluate the
performance of the scalable coding scheme relative to traditional
unscaled coding;
- the fine-granularity scalable tool AAC-BSAC provided by MPEG-4
against unscaled AAC coding, to evaluate the impact of the small step
scalability functionality on sound quality.
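The mono/stereo scalable configuration packs three operating points into one 56 kbit/s bitstream. As a rough illustration only (this is not the MPEG-4 bitstream syntax), the sketch below derives the per-layer increments implied by the cumulative rates of 24, 40 and 56 kbit/s and picks the highest operating point that fits a given channel rate. The names `LAYERS`, `layer_increments` and `best_layer` are hypothetical, and the assumption that a receiver simply drops whole upper layers is a simplification.

```python
# Cumulative operating points of the large step scalable configuration
# described in the text: (total bitrate in kbit/s, channel mode).
# The data structure and function names are hypothetical illustrations.
LAYERS = [(24, "mono"), (40, "stereo"), (56, "stereo")]


def layer_increments(layers):
    """Return the extra bitrate each layer adds on top of the layers below it."""
    increments = []
    previous = 0
    for total, _mode in layers:
        increments.append(total - previous)
        previous = total
    return increments


def best_layer(layers, channel_kbps):
    """Return the highest (total, mode) operating point that fits the channel,
    assuming upper layers are simply dropped; None if even the core layer
    does not fit."""
    fitting = [layer for layer in layers if layer[0] <= channel_kbps]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    print(layer_increments(LAYERS))  # [24, 16, 16]
    print(best_layer(LAYERS, 33.6))  # (24, 'mono') on a 33.6 kbit/s modem link
    print(best_layer(LAYERS, 64))    # (56, 'stereo') on a 64 kbit/s ISDN B channel
```

The point of the sketch is that a single encoded bitstream can serve several link rates, whereas unscaled coding would need a separate encoding per rate.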
Codecs under Test
Test overview
The test was divided into four groups of coding schemes and bitrates.
- Group A tests the codecs at 6 and 8 kbit/s mono and contains
HILN, TwinVQ and MPEG Layer 3. The reference coder for Group A is MPEG
Layer 3 (MP3).
- Group B tests the codecs at 16 kbit/s mono and contains HILN,
AAC, and G.722 at 48 kbit/s as a reference.
Groups C and D belong to the same coding system, but are separated
because the lowest layer is a mono layer while the higher layers are
stereo layers.
- Group C tests the mono core layer of the AAC large step scalable
coder against an unscaled AAC coder and MPEG Layer 3. The reference
coder for Group C is MPEG Layer 3 (MP3).
- Group D tests the upper layers of the scalable coders against
unscaled coders and contains AAC, the AAC large step scalable coder,
the AAC-BSAC fine-granularity scalable coder and MPEG Layer 3. The
reference coder for Group D is MPEG Layer 3 (MP3). The AAC-BSAC coder
has no counterpart in Group C because it is based on an unscaled
stereo AAC coder and therefore does not provide mono/stereo
scalability.
It should be noted that in MPEG standards only the decoder is
normative, and that the MPEG-4 encoders supplied for this test are
developmental versions for which further optimization is expected. It
must also be stressed that some of the coders in the test are
parametric coders, which are not designed for some of the natural
sounds present in several items used in this test.
The codecs tested are listed below:
| Group & codec # | Codec | Mode | Sampling rate (kHz) | Total bitrate (layer bitrate) in kbit/s |
|---|---|---|---|---|
| A1 | HILN | mono | 8 | 6 |
| A2 | TwinVQ | mono | 16 | 6 |
| A3 | MPEG Layer 3 (MP3) | mono | 8 | 8 |
| B1 | HILN | mono | 16 | 16 |
| B2 | AAC | mono | 16 | 16 |
| B3 | G.722 | mono | 16 | 48 |
| C1 | AAC | mono | 24 | 24 |
| C2 | AAC scalable | mono | 24 | 24 |
| C3 | MPEG Layer 3 | mono | 16 | 24 |
| D1 | AAC | stereo | 24 | 40 |
| D2 | AAC | stereo | 24 | 56 |
| D3 | AAC scalable | stereo | 24 | 40 |
| D4 | AAC scalable | stereo | 24 | 56 |
| D5 | AAC scalable (BSAC) | stereo | 24 | 40 |
| D6 | AAC scalable (BSAC) | stereo | 24 | 56 |
| D7 | MPEG Layer 3 | stereo | 24 | 40 |
| D8 | MPEG Layer 3 | stereo | 24 | 56 |
Test methodology
Sound quality was assessed subjectively according to
ITU-R Recommendation BS.562-3.
This method uses a five-grade scale for scoring:
BS.562-3 quality scale
| Grade | Quality |
|---|---|
| 5 | Excellent |
| 4 | Good |
| 3 | Fair |
| 2 | Poor |
| 1 | Bad |
The Audio and Test subgroup recommends using this scale as a
continuous scale with one decimal place.
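Because the five-grade scale is used continuously with one decimal place, a score such as 3.6 falls between two labels. The sketch below shows one hypothetical way of recording such a score and naming the nearest grade label; the function names and the nearest-label convention are assumptions for illustration and are not part of BS.562-3.

```python
# Grade labels of the five-grade quality scale.
GRADE_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def record_score(raw):
    """Clamp a listener's score to the 1.0-5.0 range and keep one decimal place
    (hypothetical helper)."""
    clamped = min(5.0, max(1.0, raw))
    return round(clamped, 1)


def nearest_label(score):
    """Return the label of the integer grade closest to a continuous score
    (round half up; assumes score is already within 1.0-5.0)."""
    grade = int(score + 0.5)
    return GRADE_LABELS[grade]


if __name__ == "__main__":
    score = record_score(3.64)
    print(score, nearest_label(score))  # 3.6 Good
```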
Within each test (A, B, C, D), the coders were compared to a
bandlimited reference. The bandwidth of this reference was chosen to
equal the bandwidth of the coder with the highest bandwidth in that
test.
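The rule for the bandlimited reference amounts to taking the maximum coded bandwidth within a group, where no coder can exceed its Nyquist frequency (half its sampling rate). The sketch below is a minimal illustration under that reading; the `Coder` type and `reference_bandwidth` function are hypothetical, and the bandwidth values in the usage are placeholders, not measured bandwidths of the coders in this test.

```python
from dataclasses import dataclass


@dataclass
class Coder:
    name: str
    sampling_rate_khz: float
    bandwidth_khz: float  # audio bandwidth actually coded (hypothetical input)


def reference_bandwidth(coders):
    """Bandwidth of the bandlimited reference: the widest coder bandwidth in the group."""
    for coder in coders:
        # A coder cannot reproduce content above its Nyquist frequency (fs / 2).
        if coder.bandwidth_khz > coder.sampling_rate_khz / 2:
            raise ValueError(f"{coder.name}: bandwidth exceeds Nyquist frequency")
    return max(coder.bandwidth_khz for coder in coders)


if __name__ == "__main__":
    # Sampling rates follow the Group A rows of the table; the bandwidths are
    # placeholder values for illustration only, not measured coder bandwidths.
    group_a = [
        Coder("A1", 8, 3.5),
        Coder("A2", 16, 6.0),
        Coder("A3", 8, 3.0),
    ]
    print(reference_bandwidth(group_a))  # 6.0 with these placeholder values
```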
Conclusions
The following conclusions can be drawn from the test results:
Test A
- TwinVQ at 6 kbit/s shows quality statistically indistinguishable
from MPEG Layer 3 at 8 kbit/s. TwinVQ is therefore a valuable MPEG-4
tool for improved coding efficiency at the lowest bitrates.
- HILN at 6 kbit/s shows significantly worse average quality than
TwinVQ and MPEG Layer 3 with the items used in this test. Further
investigations (see document m4087) have shown that the quality of
HILN is highly dependent on the test material and is better than that
of TwinVQ for some items. The item selection process within this test,
however, has been found to be correct in selecting the critical test
items. Therefore, the results of this test are a valid indication of
the audio quality achieved when the coders are used as general audio
coding systems on critical material. This leads to the conclusion that
more work on HILN is required to improve the coding quality for
critical material (see also Test B).
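Statements such as "statistically indistinguishable" and "significantly worse" rest on comparing mean scores together with their uncertainty. The excerpt does not state the exact statistical procedure used in the analysis; the sketch below assumes a common approach: 95% confidence intervals for the mean computed with a normal approximation (z = 1.96), with two coders treated as distinguishable only when their intervals do not overlap. Non-overlap is a conservative heuristic rather than a formal hypothesis test, and the score lists in the usage are made-up placeholders, not results from this test.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) of a 95% confidence interval for the mean,
    using a normal approximation (z = 1.96); requires at least two scores."""
    mean = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if the confidence intervals of two score lists overlap
    (treated here as 'no significant difference')."""
    _, lo_a, hi_a = mean_ci95(a)
    _, lo_b, hi_b = mean_ci95(b)
    return lo_a <= hi_b and lo_b <= hi_a


if __name__ == "__main__":
    # Placeholder scores for illustration only; not data from this test.
    coder_x = [3.1, 3.4, 2.9, 3.3, 3.0, 3.2]
    coder_y = [3.0, 3.3, 3.1, 2.8, 3.2, 3.1]
    coder_z = [1.9, 2.2, 2.0, 1.8, 2.1, 2.0]
    print(intervals_overlap(coder_x, coder_y))  # True
    print(intervals_overlap(coder_x, coder_z))  # False
```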
Test B
- AAC at 16 kbit/s performed 0.6 grades worse than G.722, but
operated at one third of the bitrate (16 kbit/s versus 48 kbit/s). It
can therefore be concluded that AAC is a valuable MPEG-4 tool for
coding music signals at bitrates as low as 16 kbit/s.
- HILN at 16 kbit/s performed equal to or worse than AAC at
16 kbit/s for almost all items at both test sites. The test results
also show that the quality of HILN is again highly dependent on the
test material (see also Test A). This leads to the conclusion that
more work on HILN is required to improve the coding quality for
critical material.
Test C&D
- At all three bitrates, AAC audio coding shows significantly
better audio quality than MPEG Layer 3 (by around 0.8 grades).
- The Large Step Scalable System (AAC Scalable) shows almost the
same quality as unscaled AAC at the lower (mono) layer and about
0.4-0.5 grades worse quality at the higher (stereo) layers. Still, all
layers perform slightly better (highest layer) or significantly better
(lower and middle layers) than MPEG Layer 3. The scalable system
therefore shows good performance compared to older standards while
providing the additional functionality of mono/stereo scalable
coding.
- The Small Step Scalable System (BSAC) performed very well at the
highest bitrate of 56 kbit/s (item 20 should be excluded from the
evaluation; see section 'Test Results'), which matches earlier
results. At the lower bitrate of 40 kbit/s, however, BSAC performed
worse than expected. Although the BSAC tool was mainly designed for
bitrates of 40-64 kbit/s mono at a 48 kHz sampling rate, it is still
expected to show reasonably good performance when going from
56 kbit/s stereo to 40 kbit/s stereo at a 24 kHz sampling rate. The
conclusion therefore is that the integration of BSAC into the MPEG-4
audio framework needs further investigation to determine whether the
integration is incomplete or needs changes.
Heiko Purnhagen
12-Nov-1998