Request for Comment: sfront optimizations

From: John Lazzaro (lazzaro@CS.Berkeley.EDU)
Date: Thu Mar 09 2000 - 21:10:57 EST


> I'm contemplating making the following changes to the current SFRONT
> sources on an experimental basis to see if it improves performance of
> the SFRONT compiler.

This sounds like a great project ...

Sfront has one foot in the computer systems research world, and one
foot in the "rough consensus and running code" IETF tradition. The
goal is to implement optimizations while maintaining a stable tool
people can use for multimedia research. We've thought about some of
the issues you bring up and decided to put off working on them for
now -- if you want to do it first, that's fantastic.

Some specific comments:

> (1) Convert the state variables from arrays to explicit structure
> members.

All sfront indexes into the array are constants, albeit constants off
of a pointer -- nstate is a pointer passed in to the function, and all
accesses are done via:

nstate->v[constant].f or nstate->v[constant].i

I would _think_ this gives just as much information to the compiler
as an array of explicit structures, but I haven't checked gcc's
output to verify this. It is true that not passing nstate into the
function, but rather crafting a different function for each instance
that accesses elements in a giant structure, would give the compiler
more information. There may be a tricky way to do this without
literally having a function for every note played ... but if not,
it's a pretty large cost to be paid in C file compilation time.
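
A minimal sketch of the two access styles being compared, not sfront's
actual generated code. Only nstate, v, f and i come from the text above;
the type names, the union layout, the slot constants and the toy
oscillator are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one state slot: read as float or int. */
typedef union { float f; int i; } cell;

/* Array-style state, indexed by compile-time constants. */
typedef struct { cell v[4]; } note_state;

/* The proposed alternative: explicit structure members. */
typedef struct { float phase; float incr; int count; } osc_state;

/* Hypothetical slot numbers for the array-style layout. */
enum { OSC_PHASE = 0, OSC_INCR = 1, OSC_COUNT = 2 };

/* Constant index off a pointer: each address is nstate + fixed offset. */
float step_array(note_state *nstate)
{
    assert(nstate != NULL);
    nstate->v[OSC_PHASE].f += nstate->v[OSC_INCR].f;
    nstate->v[OSC_COUNT].i += 1;
    return nstate->v[OSC_PHASE].f;
}

/* Named members: each address is also s + fixed offset. */
float step_struct(osc_state *s)
{
    assert(s != NULL);
    s->phase += s->incr;
    s->count += 1;
    return s->phase;
}
```

In both functions every access is a fixed offset from a pointer argument,
which is the reason to expect similar code from the compiler. Comparing
the assembly (for example with gcc -O2 -S) would be the way to check.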

> (3) The practice of performing a single A cycle across all instruments
> instead of doing a K-Cycles-worth of A cycles for each instrument denies
> the compiler opportunites to optimize temporary variable usage in the
> FPU, and in most cases will deny the compiler an opportunity to fully
> optimize the super-scalar execution of code.

The tradeoff here is that holding A/K intermediate values for each bus
fills up lines in the cache -- and there may be many buses. Is what
you gain by improved compiler optimizations lost by more cache misses?
I've been hoping to do a set of experiments to answer this, over
different cache sizes and over a suite of SAOL programs. The cache
issue also impacts the more aggressive strategy of blocking
line-by-line, which would be the compiled version of what Giorgio Zoia
and collaborators do in their virtual machine work.

I'd suggest reading FDIS section 5.7.3.3.6 part 10 Notes 1 and 2 (page
26) before starting this.
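
The two loop orders under discussion can be sketched as follows. This is
a toy model under stated assumptions, not sfront's scheduler: the
instrument type, NINSTR, ACYCLES and the single output bus are all
hypothetical, and the instruments share no state, so both orders produce
the same samples.

```c
#include <assert.h>

#define NINSTR  2   /* hypothetical number of active instruments */
#define ACYCLES 4   /* hypothetical a-rate samples per k-cycle   */

typedef struct { float phase, incr; } instr;

/* One a-rate tick of a toy instrument. */
static float tick(instr *in) { in->phase += in->incr; return in->phase; }

/* Order A: each a-cycle visits every instrument in turn. */
void render_interleaved(instr ins[NINSTR], float bus[ACYCLES])
{
    assert(ins && bus);
    for (int a = 0; a < ACYCLES; a++) {
        bus[a] = 0.0f;
        for (int n = 0; n < NINSTR; n++)
            bus[a] += tick(&ins[n]);
    }
}

/* Order B: each instrument runs a k-cycle's worth of a-cycles.
   The inner loop can keep phase in a register, but the bus must
   now hold a whole block of intermediate values. */
void render_blocked(instr ins[NINSTR], float bus[ACYCLES])
{
    assert(ins && bus);
    for (int a = 0; a < ACYCLES; a++)
        bus[a] = 0.0f;
    for (int n = 0; n < NINSTR; n++) {
        float phase = ins[n].phase;
        const float incr = ins[n].incr;
        for (int a = 0; a < ACYCLES; a++) {
            phase += incr;
            bus[a] += phase;
        }
        ins[n].phase = phase;
    }
}
```

Order B gives the compiler a tight inner loop per instrument, at the cost
of one ACYCLES-sized buffer per bus, which is where the cache pressure
mentioned above comes from.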

> (3) Eliminate *all* calls during kpass and apass operations.

"Some" or "most" will probably be faster than "all" because of I-cache
issues -- a huge function that only runs once, and is inlined in a bad
location in the calling function, could make things run slower when
inlined.

> (1) floats have a distinct drawback on x86 platforms: they're actually
> SLOWER than doubles.

The FDIS forbids using doubles -- so to be a compliant MPEG 4 Structured
Audio decoder, you have to use floats. See the posting Eric Scheirer made
to saol-devs about this (appended to the end of my email) for details, or
read in the FDIS page 29, 5.8.3, for the rationale. It is true that
creating a "64-bit mode" in sfront which is non-compliant would be a
good option to add (many people would use it as the default, FDIS
notwithstanding), but one could make the case that the performance of
sfront as an MPEG 4 decoder should be measured in compliant mode, which
would disallow using doubles. On the bright side, not using doubles
gives you a 2X boost in effective cache size.
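
The 2X figure is simple arithmetic on element size, sketched below. The
function name and the 16 KB cache size used in the note are assumptions
for illustration, and the sketch assumes the usual 4-byte float and
8-byte double.

```c
#include <assert.h>
#include <stddef.h>

/* How many values of a given size fit in a cache of cache_bytes. */
size_t values_per_cache(size_t cache_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0);
    return cache_bytes / elem_bytes;
}
```

For a hypothetical 16 KB data cache, values_per_cache(16384, sizeof(float))
is twice values_per_cache(16384, sizeof(double)) on platforms where float
is 4 bytes and double is 8.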

                                                                --jl

From a posting Eric Scheirer made to saol-devs in August (search
http://sound.media.mit.edu/~eds/mpeg4-old/saol-dev-archive to find
the complete discussion):

----

From: "Eric Scheirer" <eds@media.mit.edu>
To: "Richard Dobson" <rwd@cableinet.co.uk>,
    "Saol-dev" <saol-dev@media.mit.edu>
Subject: Re: Things I would like to see
Date: Sun, 29 Aug 1999 13:09:57 -0400

Another nitpick:

>There are probably several cases where ~internal~ calculation can be
>done in doubles (filter coefficients being the main target), so long as
>the final output word is a float, for compliance with the standard.

This is backwards from the spec. The spec doesn't say anything about final output formats when SAOL is in a standalone implementation (ie not integrated into a bigger MPEG-4 audio-video system). What it does say is that internal calculations must be done in 32-bit floating point format.

It also deprecates the creation of instruments that break if you run them in 64-bit floating-point calculation, BTW, to accept the reality that some people will want to build 64-bit implementations. So instruments *must* work in 32-bit floating point and are strongly encouraged to work in 64-bit floating point, in order to comply with the standard.

----


