HILN / Re: MPEG-4 Version 2 FCD ...

From: Heiko Purnhagen (purnhage@tnt.uni-hannover.de)
Date: Wed Jan 19 2000 - 18:42:35 EST


Dear John, all,

John Lazzaro wrote:
> Just noticed a few days ago that the Final Committee Draft
> of MPEG-4 Version 2 is now available to the general public:
>
> http://sound.media.mit.edu/mpeg4/audio/documents/w2803.html#ftp
>
> There's no obvious changes directly relevant to Structured
> Audio and thus sfront, at least from a cursory glance over the
> document -- hopefully someone in the subgroup listening in will
> correct me if I'm wrong ...

Yes. MPEG-4 Version 2 (an extension of V1, formally "Amendment 1")
contains no changes related to the Structured Audio Tools in V1. The
only aspect of the SA tools which is still under some discussion is the
problem of definition of (complexity) levels to permit the specification
of so-called "conformance points". More details about this have just
recently been pointed out here by Giorgio, who is currently working on
the very final editing of the V1 conformance document (and 10 days to go
...).

Just to give you a brief idea about the new audio-related features of
MPEG-4 that become available with V2:
 - Error Resilience for the audio coders
 - Low-Delay Audio Coding and Small Step Scalability, both for
     the transform-based audio coders
 - Parametric Audio Coding (HILN, see below)
 - Environmental Spatialisation

The Environmental Spatialisation tools in V2 are probably of most
interest to SAOL people. As the SA tools in V1 are not only used for
synthesis but also for composition (mixing, effects processing) of
so-called natural and synthetic audio objects, the Environmental
Spatialisation tools of V2 permit easier and more flexible description
of the composition of audio objects to form an "audio scene". Both a
"physical approach" (somewhat VRML-like, e.g. for 3D VR with audio) and
a "perceptual approach" (using parameters for reverberation, source
presence etc.) are supported.

For more info, check out document w2803 (see above), some other links on
the MPEG audio web page
(http://www.tnt.uni-hannover.de/project/mpeg/audio, also mirrored by Eric
at MIT; an updated FAQ will be available in 2 weeks), or some publications
available from my own home page (see signature).

> However, there is a new codec (HILN, for Harmonic and Individual
> Line plus Noise) that is a parametric normative codec designed for
> music compression at low bitrates, that might actually be an interesting
> device to drive as a synthesis engine ...

It's nice to see HILN being mentioned here as well ;-) ...
Just to give you a brief idea of what's behind this acronym: HILN
permits coding of natural audio and speech signals (typically sampled at
16 kHz) with bitrates typically in the range of 6 to 16 kbit/s. Similar
to sinusoidal coding schemes, it uses a "parametric representation" of
the audio signal and is based on the decomposition of the audio signal
into components which are described by appropriate source models and
represented by model parameters. The basic component types and
parameters are:
 - harmonic lines:
     fundamental frequency, amplitude, and spectrum (LPC parameters)
 - individual lines (sinusoids):
     frequency and amplitude, optional amplitude envelope and phase
 - noise:
     noise energy and spectrum (described by LPC parameters)
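
As a rough illustration, the three component types listed above could be
held in a per-frame data structure like the one sketched here. All class
and field names are hypothetical and chosen for readability; they are not
the MPEG-4 bitstream syntax, and quantisation is ignored entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HarmonicLines:
    """Harmonic tone: one fundamental plus a spectral envelope."""
    fundamental_hz: float
    amplitude: float
    lpc_coeffs: List[float]  # spectrum described by LPC parameters


@dataclass
class IndividualLine:
    """A single sinusoid; envelope and phase are optional."""
    frequency_hz: float
    amplitude: float
    envelope: Optional[List[float]] = None  # hypothetical envelope samples
    phase: Optional[float] = None           # radians, if transmitted


@dataclass
class NoiseComponent:
    """Noise described by its energy and an LPC spectrum."""
    energy: float
    lpc_coeffs: List[float]


@dataclass
class HilnFrame:
    """One analysis frame; any of the three component types may be absent."""
    harmonic: Optional[HarmonicLines] = None
    lines: List[IndividualLine] = field(default_factory=list)
    noise: Optional[NoiseComponent] = None


# Example frame: a 220 Hz harmonic tone, one extra sinusoid, some noise.
example = HilnFrame(
    harmonic=HarmonicLines(220.0, 0.8, [1.0, -0.5]),
    lines=[IndividualLine(1000.0, 0.1)],
    noise=NoiseComponent(0.01, [1.0]),
)
```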

HILN provides coding efficiency similar to the transform-based MPEG-4
coders and - because of its signal representation - permits independent
speed and pitch modification in the decoder at no extra cost, as well
as bitrate scalability.
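
To see why a parametric representation makes speed and pitch independent,
consider the toy additive synthesiser below. It is only a sketch under
simplifying assumptions, not the HILN decoder: each frame is a plain list
of (frequency, amplitude) pairs, lines are matched across frames by list
position, and the frame length of 320 samples is an arbitrary choice.
Pitch is changed by scaling the frequencies, speed by changing how many
samples are rendered per frame.

```python
import math


def synthesize(frames, sample_rate=16000.0, frame_len=320,
               pitch=1.0, speed=1.0):
    """Render frames of (freq_hz, amp) pairs as a sum of sinusoids.

    pitch scales every frequency; speed scales the number of output
    samples per frame. Neither setting affects the other.
    """
    samples_per_frame = max(1, round(frame_len / speed))
    phases = {}  # running phase per line index, for continuity
    out = []
    for frame in frames:
        block = [0.0] * samples_per_frame
        for idx, (freq_hz, amp) in enumerate(frame):
            step = 2.0 * math.pi * freq_hz * pitch / sample_rate
            phase = phases.get(idx, 0.0)
            for n in range(samples_per_frame):
                block[n] += amp * math.sin(phase)
                phase += step
            phases[idx] = math.fmod(phase, 2.0 * math.pi)
        out.extend(block)
    return out


# Ten frames of a single 440 Hz line.
frames = [[(440.0, 0.5)] for _ in range(10)]
normal = synthesize(frames)
faster = synthesize(frames, speed=2.0)   # half as long, same pitch
higher = synthesize(frames, pitch=2.0)   # same length, one octave up
```

A real decoder would also interpolate parameters between frames and add
the harmonic and noise components; both are left out here.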

But now back to John's initial point: I also like the idea of using HILN
as a synthesis engine very much! While the HILN synthesis process is
_much_ less flexible than what is possible with SAOL, it has on the
other hand a clearly bounded computational complexity: 16 kbit/s HILN
bitstreams require at most 100 MHz of a Pentium to decode, and future
optimised decoders may need even less.
 
Because of the amplitude/frequency-like parameters used by HILN, it
shouldn't be too difficult to write a transcoder that converts a score
(e.g. in SASL or MIDI) into a HILN bitstream. The difficult aspect is
probably how to convert the instrument's sounds as described by SAOL
into the "additive synthesis"-like HILN parametrisation. It might also
be worthwhile to use some kind of psychoacoustic model in this
transcoder to remove less important or inaudible components from the
HILN bitstream - as done in any natural audio encoder.
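
The first half of such a transcoder might look roughly like the sketch
below, which maps a MIDI note to a set of harmonic partials and then
discards the weak ones. Everything except the standard MIDI tuning
formula (note 69 = 440 Hz) is an assumption made for illustration: the
1/k amplitude roll-off stands in for a real instrument model, and the
fixed level threshold is only a crude placeholder for a psychoacoustic
model, which would use masking thresholds instead. Writing the result
into an actual HILN bitstream is not shown.

```python
def midi_note_to_hz(note):
    """Standard equal-tempered tuning: MIDI note 69 is A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def note_to_partials(note, velocity, n_partials=8, rolloff=1.0,
                     nyquist_hz=8000.0):
    """Turn one note into (freq_hz, amp) pairs for its harmonics.

    Hypothetical instrument model: partial k has amplitude
    (velocity / 127) / k**rolloff. Partials at or above the Nyquist
    frequency (8 kHz for 16 kHz sampling) are skipped.
    """
    f0 = midi_note_to_hz(note)
    gain = velocity / 127.0
    partials = []
    for k in range(1, n_partials + 1):
        freq = k * f0
        if freq >= nyquist_hz:
            break
        partials.append((freq, gain / k ** rolloff))
    return partials


def prune_partials(partials, floor_db=-40.0):
    """Drop partials more than floor_db below the strongest one.

    This is NOT a psychoacoustic model, only a stand-in for one.
    """
    if not partials:
        return []
    peak = max(amp for _, amp in partials)
    if peak <= 0.0:
        return []
    threshold = peak * 10.0 ** (floor_db / 20.0)
    return [(f, a) for f, a in partials if a >= threshold]


# A4 at full velocity, four harmonics, then a fairly aggressive pruning.
partials = note_to_partials(69, 127, n_partials=4)
kept = prune_partials(partials, floor_db=-7.0)
```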

To conclude, I'd like to invite anybody interested in these ideas to get
in contact with me - and I'll try to provide any assistance possible for
further activities related to HILN transcoding, synthesis, etc. ...

Best regards

     Heiko

----------------------------------------------------------------------
Dipl.-Ing. Heiko Purnhagen                  Universitaet Hannover
                                            Laboratorium fuer
mailto:purnhage@tnt.uni-hannover.de         Informationstechnologie
http://www.tnt.uni-hannover.de/~purnhage/   Schneiderberg 32
phone: +49-511-762-5033                     D-30167 Hannover
fax:   +49-511-762-5052                     Germany
----------------------------------------------------------------------


