Re: MPEG 4-SA for audio compression?

From: John Lazzaro (lazzaro@cs.berkeley.edu)
Date: Mon Apr 24 2000 - 13:01:58 EDT


> What about future possibilities? What are your views on the paper
> "Generalized Audio Coding with MPEG 4 Structured Audio" by Eric Scheirer and
> Youngmoo Kim? Couldn't that be part of an MPEG4 codec in the future?

Personally, I think that scenario is quite reasonable -- people coming
up with new ideas for encode-decode schemes, implementing the encoder
in a language like C, generating an MP4-SA compatible bitstream, and
implementing the decoder in SAOL that also becomes part of the bitstream.

I think there's an interesting question here (maybe it is addressed in
the Scheirer and Kim paper, I haven't read it yet): if your coding
algorithm uses data types that are less than 16 bits long, and you
want to pack them into an MP4-SA compatible bitstream and reliably decode
them in SAOL in a normative way, there's some trickery needed.
The obvious way to do this is to use the SASL table command with
the sample wavetable generator, and use the "fixed sample" option
of class_sample (which sends an array of 16-bit ints) to send your
packed data, and then unpack it in SAOL. The issue here is, the
sample wavetable generator converts these ints to floats between
-1.0 and 1.0. Doing floating-point math to reliably extract (say)
a 6-bit number concatenated to a 7-bit number concatenated to a
3-bit number, given a float between -1.0 and 1.0 that codes
the concatenated triplet, may be tough to do -- roundoff will be
your enemy. If you know for certain the decoder is using IEEE
floats, you might be better off craftily encoding these bit values
inside the float representation, and doing extraction with floating
point math. But clearly, life becomes easier if there's a way for
the sample wavetable generator to normatively map the 16-bit ints
into values between -32768 and 32767, instead of -1 to 1 ... and
even better, add a representation to class_sample designed for
tightly packed irregular-width integer data.
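
To make the packing problem concrete, here is a rough Python sketch of the
6/7/3-bit example above. It is a model of the idea, not SAOL and not anything
taken from the standard. The function names are made up for illustration, and
the scale factor is an assumption: the wavetable generator is modeled as
dividing the 16-bit int by 32768 and storing the result as an IEEE single.
The unpacking side sticks to multiply, add, subtract and floor, the sort of
float arithmetic a decoder would have to rely on.

```python
import math
import struct

def pack_fields(a, b, c):
    """Concatenate a 6-bit, a 7-bit and a 3-bit unsigned value into one
    signed 16-bit int (hypothetical encoder-side helper)."""
    assert 0 <= a < 64 and 0 <= b < 128 and 0 <= c < 8
    word = (a << 10) | (b << 3) | c                  # unsigned, 0..65535
    return word - 65536 if word >= 32768 else word   # two's complement view

def to_float32(x):
    """Round to the nearest IEEE single, as a 32-bit decoder would store it."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sample_to_float(sample, scale=32768.0):
    """ASSUMPTION: models the int-to-float conversion as sample / scale.
    The real normative scale factor is not stated in this post."""
    return to_float32(sample / scale)

def unpack_fields(value, scale=32768.0):
    """Recover the three fields using float-style arithmetic only."""
    word = math.floor(value * scale + 0.5)   # round to nearest to absorb roundoff
    if word < 0:
        word += 65536                        # back to the unsigned 16-bit view
    a = math.floor(word / 1024)              # top 6 bits
    rest = word - a * 1024
    b = math.floor(rest / 8)                 # middle 7 bits
    c = rest - b * 8                         # low 3 bits
    return int(a), int(b), int(c)

sample = pack_fields(37, 100, 5)
print(sample, unpack_fields(sample_to_float(sample)))
```

Under the divide-by-32768 assumption every 16-bit int maps to a float that an
IEEE single holds exactly, so the round trip is lossless. If the decoder
scales by some other factor, or is not using IEEE floats at all, the rounding
step is the only thing standing between you and roundoff errors, and that is
exactly the part that is hard to guarantee normatively.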

Of course, this whole line of reasoning assumes you're doing things
"the old-fashioned way" of quantizing integers to do coding -- maybe
if you accept from the outset that your unit of coding is a 16-bit
int that maps into -1 to 1, you can devise algorithms that compress
well in that format, using all the bits as they naturally exist in
the word instead of using them in irregular chunks ...

Finally, it's a good question to ask "why would you bother to do this"
rather than just use the natural codecs already in MPEG 4 audio,
assuming your terminals have both Structured Audio and the natural
codec decoders built in? Three reasons come to mind:

-- Special-purpose coding algorithms for subsets of natural sounds,
like the singing voice in isolation.

-- Patent avoidance -- a public-domain natural coding algorithm can
have both its encoder and decoder available as GPL'd software, and
Structured Audio solves the problem of "how to get the decoder installed
by the masses".

-- Experimentally deploying new general-purpose algorithms.

                                                                --jl

-------------------------------------------------------------------------
John Lazzaro -- Research Specialist -- CS Division -- EECS -- UC Berkeley
lazzaro [at] cs [dot] berkeley [dot] edu www.cs.berkeley.edu/~lazzaro
-------------------------------------------------------------------------



This archive was generated by hypermail 2b29 : Mon Jan 28 2002 - 11:46:39 EST