INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11
MPEG98/N2424
October 1998 / Atlantic City

Source: MPEG Audio and Test subgroups
Title: Report on the MPEG-4 speech codec verification tests
Authors: Pasi Ojala (Nokia Research Center), Henri Toukomaa (Nokia Research Center), Takehiro Moriya (NTT) and Oliver Kunz (FhG)



Report on the MPEG-4 speech codec verification tests


This web page contains only excerpts of the document. The complete document is available as a PDF file.


Introduction

The MPEG-4 Audio coding tools cover a bit-rate range from 2 kbit/s to 64 kbit/s with a corresponding subjective audio quality. Therefore, the MPEG-4 verification tests were carried out in several parts. The tests addressed Internet audio applications using codecs with bit-rates ranging from 20 to 56 kbit/s, digital audio broadcasting on AM modulated bands with bit-rates of 16 to 24 kbit/s, and speech applications. This document presents the MPEG-4 audio verification test results for speech coders. The performance of the speech coders is evaluated in comparison with other standard coders. The results of three independent test sites are presented.

Test Format

The test was defined in the Tokyo and Dublin meetings [1]. The following decisions were taken:

Due to the different technologies and bandwidths used in the speech coders, the test had to be divided into three groups:

Test 1 contains narrow-band parametric speech coders at 2 and 4 kbit/s. FS1016 was selected as the reference coder.

Codec          Bit rate (kbit/s)
Parametric     2, 4
Ref. FS1016    4.8

Test 2 contains narrow-band CELP (NB-CELP) coders with bit-rates ranging from 6 to 12 kbit/s. The test contains fixed bit-rate as well as bit-rate scalable coders. The G.723.1, G.729 and GSM EFR coders serve as reference coders.

Codec                          Bit rate (kbit/s)
CELP (Mode VIII multi rate)    6, 8.3, 12
CELP (Mode VIII scalable)      8, 12
Ref. ITU-T G.723.1             6.3
Ref. ITU-T G.729               8
Ref. GSM EFR                   12.2

Test 3 contains the wide-band CELP (WB-CELP) coders with bit-rates ranging from 17.9 to 18.2 kbit/s as well as a bandwidth-scalable CELP coder at 16 kbit/s. The G.722 and MPEG-2 Layer 3 coders serve as reference coders.

Codec                          Bit rate (kbit/s)
CELP (fixed rate Mode III)     18.2
CELP (bandwidth scalable)      16
Optimized VQ + MPE             17.9
Optimized VQ + RPE             18.1
Ref. G.722                     48, 56
Ref. MPEG-2 Layer 3            24

Test method

The Absolute Category Rating (ACR) method according to ITU-T Recommendation P.800 was used, with a five-grade scoring scale:

ACR scale
5 Excellent
4 Good
3 Fair
2 Poor
1 Bad
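
ACR ratings on this scale are commonly summarised as a mean opinion score (MOS) per codec. The following is a minimal Python sketch of such a summary, not the analysis procedure used in the report, which this excerpt does not describe. The function name, the example scores and the normal-approximation 95% confidence interval are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# The five-grade ACR scale listed above.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(scores):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    The interval uses a normal approximation (1.96 * standard error),
    which is an assumption of this sketch, not a detail from the report.
    """
    if not scores:
        raise ValueError("at least one score is required")
    for s in scores:
        if s not in ACR_LABELS:
            raise ValueError(f"not a valid ACR grade: {s!r}")
    mos = mean(scores)
    if len(scores) < 2:
        return mos, 0.0
    half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return mos, half_width


# Hypothetical scores for one codec condition (not data from the report).
example_scores = [4, 3, 5, 4, 4, 3, 2, 4]
mos, ci = mean_opinion_score(example_scores)
print(f"MOS = {mos:.3f} +/- {ci:.3f}")
```

Comparing two codecs then amounts to comparing their MOS values while checking whether the confidence intervals overlap.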

The test sites and the number of valid listeners are shown below. Originally, two more listeners (one for experiment 2 and one for experiment 3) took part in the test at the FhG site. When the results were analysed, all scores of these listeners in the respective experiment were discarded, since some rating scores were missing and could not be recovered.

                                Japanese items    European items
Test site                       NTT               FhG        NRC
Native language of listeners    Japanese          German     Finnish
Number of listeners, Exp 1      16                18         16
Number of listeners, Exp 2      16                17         16
Number of listeners, Exp 3      16                16         16
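
The listener screening described above (dropping every score of a listener whose ratings are incomplete within an experiment) can be sketched as follows. This is an illustrative Python sketch under assumed data structures: the dictionary layout, the listener and item identifiers and the function name are hypothetical and do not come from the report.

```python
def valid_listeners(ratings, expected_items):
    """Keep only listeners who rated every expected item.

    ratings: dict mapping listener id -> dict mapping item id -> score
             (a missing key or a None value counts as a missing score).
    expected_items: iterable of item ids presented in the experiment.
    """
    expected = list(expected_items)
    kept = {}
    for listener, scores in ratings.items():
        if all(scores.get(item) is not None for item in expected):
            kept[listener] = scores
    return kept


# Hypothetical ratings for one experiment (not data from the report).
items = ["item_a", "item_b", "item_c"]
ratings = {
    "L01": {"item_a": 4, "item_b": 3, "item_c": 5},
    "L02": {"item_a": 2, "item_b": None, "item_c": 3},  # unrecoverable score
    "L03": {"item_a": 5, "item_b": 4},                  # item_c never rated
}
kept = valid_listeners(ratings, items)
print(sorted(kept))
```

Discarding the whole listener, rather than only the missing items, keeps the number of ratings per condition equal across the remaining listeners.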

Analysis of the test

Remarks on each experiment

Experiment 1

MPEG-4 HVXC at both 2.0 and 4.0 kbit/s outperforms the reference codec FS1016 at 4.8 kbit/s. In addition, the HVXC coder offers functionality such as pitch and speed change and bit-rate scalability.

Experiment 2

The MPEG-4 NB-CELP coder, with bit-rates ranging from 6 to 12 kbit/s, provides quality competitive with the speech coding standards that were optimised for a single specific bit-rate.

Furthermore, the tested MPEG-4 CELP coder offers bit-rate scalability. The speech quality can be improved step-by-step by adding enhancement layers on top of the base layer coder.

There are some differences in quality depending on the tested language and input items.

Experiment 3

At a bit-rate of 18 kbit/s, the MPEG-4 WB-CELP coders for wide-band speech signals provide quality competitive with G.722 at 48 kbit/s and MPEG-2 Layer III at 24 kbit/s, as far as speech signals are concerned. They also offer additional functionality, such as bit-rate, bandwidth and complexity scalability.

There are some differences in quality depending on the tested language and input items.



Heiko Purnhagen 12-Nov-1998