Re: looking for other saol users

From: John Lazzaro (lazzaro@cs.berkeley.edu)
Date: Mon Jan 17 2000 - 13:47:46 EST


> We'd like to know if our SAOL-file is programmed efficiently. If not, how should
> it be programmed otherwise?

Taking a look at the code and at the profiling output from the sa.c
program sfront generates, I'd say that, as far as sfront is concerned,
it's very efficiently written: for practically all the code I would
look at and say "you could rewrite this to go faster", sfront had
in fact already optimized it into the faster code. Here's the profile
of the top procedure calls (the total "cumulative seconds" for the
whole run is 8.00 seconds; only the top entries are shown):

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 19.27      1.54     1.54   800008     1.92     8.99  main_apass
  9.51      2.30     0.76   720040     1.06     1.06  tone_lopass11
  7.63      2.91     0.61   800000     0.76     0.76  pad_flange6
  7.63      3.52     0.61   720040     0.85     0.85  tone_flange14
  6.01      4.00     0.48                             main
  5.88      4.47     0.47   720040     0.65     0.65  tone_reverb15
  5.63      4.92     0.45   800000     0.56     0.56  pad_reverb7
  5.01      5.32     0.40   320128     1.25     1.25  hihat_reverb4
  3.63      5.61     0.29   720040     0.40     0.40  tone_bandpass10
  3.63      5.90     0.29   720040     0.40     0.40  tone_oscil9
  2.88      6.13     0.23   800000     0.29     0.29  pad_lopass5
  2.50      6.33     0.20   800008     0.25     0.25  rvrb1_reverb1
  2.50      6.53     0.20   800000     0.25     0.25  pad_oscil4
  2.38      6.72     0.19   280072     0.68     0.68  fseq_oscil15
  2.13      6.89     0.17   720040     0.24     0.24  tone_lopass13
  1.38      7.00     0.11   280072     0.39     0.39  fseq_oscil14
  1.00      7.08     0.08   800008     0.10     0.10  mixer1_apass
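In gprof's flat profile, "self us/call" is the self seconds divided by the number of calls. Recomputing it for a few rows is a quick way to check how to read the table, and it is the per-call figure behind the comparison of the two lopass calls further down. The numbers are copied from the profile; the struct and function names are made up for this sketch.

```c
#include <assert.h>
#include <math.h>

/* One row of the flat profile (values copied from the table). */
struct profile_row {
    const char *name;
    double self_seconds;      /* "self seconds" column */
    long calls;               /* "calls" column */
    double printed_us_call;   /* "self us/call" column, as printed */
};

static const struct profile_row rows[] = {
    {"main_apass",    1.54, 800008, 1.92},
    {"tone_lopass11", 0.76, 720040, 1.06},
    {"hihat_reverb4", 0.40, 320128, 1.25},
    {"tone_lopass13", 0.17, 720040, 0.24},
};

/* Average time spent in the function itself per call, in microseconds. */
static double self_us_per_call(double self_seconds, long calls)
{
    assert(calls > 0);
    return self_seconds * 1e6 / (double)calls;
}
```

Because "self seconds" is itself rounded to 0.01 s, the recomputed values only match the printed ones to within a few thousandths of a microsecond.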

A few comments:

-- It's misleading to think the "main_apass" call at the top
represents only overhead: gcc ends up inlining several of the
instrument apass code sections into main_apass. In fact, all
instrument apass's except for mixer1 and fseq are inlined.

-- The single biggest thing you could do to speed things up
has artistic side-effects and may not be worth it. The
calls to reverb() and flange() inside several of the
instruments mean that each note played by the instrument
has its own independent reverb or flange computation. If
it's musically OK to have all instances of, say, a hi-hat
share the same reverb state, you could make an effects
instrument (say hihat-reverb) and route the hihat output
to it. This would have the effect of:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
  5.01      5.32     0.40   320128     1.25     1.25  hihat_reverb4
                            ^^^^^^

Reducing that number (the number of calls to the unit), although
how much it drops really depends on how many simultaneous notes are
playing. Also, of course, it's going to sound different; maybe you
don't want the hihats to sound reverberated together. (At least I think
it will sound different. On second thought, maybe linear superposition
would hold here and it would be sample-by-sample identical, since
the filter structures in the sfront reverb() are all linear. But
in general such a substitution could make things sound different.)
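The superposition argument can be checked with a toy model. The sketch below uses a single feedback comb filter as a hypothetical stand-in for sfront's reverb(), whose actual structure isn't shown here; the only property it shares with reverb() is that it is linear and time-invariant. One filter per note, summed, matches the summed notes run through one shared filter, up to floating-point rounding, while the shared version makes one filter call per sample instead of one per note per sample. This assumes every per-note filter keeps running for the whole render; if a note's instance (and its reverb tail) is cut off when the note ends, the two versions will differ.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

#define DELAY 64       /* hypothetical comb delay, in samples */
#define MAX_NOTES 8

/* Feedback comb filter y[n] = x[n] + g * y[n - DELAY]: a linear,
 * hypothetical stand-in for a reverb unit. */
struct comb {
    double buf[DELAY];  /* last DELAY outputs, used as a ring buffer */
    int pos;
    double g;
};

static long comb_calls = 0;  /* counts per-sample filter invocations */

static void comb_init(struct comb *c, double g)
{
    memset(c->buf, 0, sizeof c->buf);
    c->pos = 0;
    c->g = g;
}

static double comb_tick(struct comb *c, double x)
{
    double y = x + c->g * c->buf[c->pos];  /* buf[pos] holds y[n - DELAY] */
    c->buf[c->pos] = y;
    c->pos = (c->pos + 1) % DELAY;
    comb_calls++;
    return y;
}

/* Hypothetical test signal: note k is a short burst starting at 10*k. */
static double note_input(int note, int n)
{
    int start = 10 * note;
    return (n >= start && n < start + 5) ? 1.0 / (note + 1) : 0.0;
}

/* Per-note effects: every note owns its own filter state. */
static void render_per_note(int notes, int len, double *out)
{
    struct comb fx[MAX_NOTES];
    assert(notes <= MAX_NOTES);
    for (int k = 0; k < notes; k++)
        comb_init(&fx[k], 0.7);
    for (int n = 0; n < len; n++) {
        out[n] = 0.0;
        for (int k = 0; k < notes; k++)
            out[n] += comb_tick(&fx[k], note_input(k, n));
    }
}

/* Shared effect bus: notes are summed first, then filtered once. */
static void render_shared(int notes, int len, double *out)
{
    struct comb fx;
    comb_init(&fx, 0.7);
    for (int n = 0; n < len; n++) {
        double mix = 0.0;
        for (int k = 0; k < notes; k++)
            mix += note_input(k, n);
        out[n] = comb_tick(&fx, mix);
    }
}
```

With four notes over 500 samples, the per-note version makes 2000 filter calls and the shared bus makes 500.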

-- The second biggest time user:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
  9.51      2.30     0.76   720040     1.06     1.06  tone_lopass11

Happens because the cutoff frequency of the lopass is being updated
at the k-rate, so the filter coefficients get recomputed once
per k-cycle, and sfront's lopass/hipass/bandpass/bandstop filters
aren't as efficient as they should be. However, the story is a
bit more complicated, because just a few lines after this call there
is another lopass call with a ksig cutoff:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
  2.13      6.89     0.17   720040     0.24     0.24  tone_lopass13

That takes a quarter of the time per call! I'm pretty sure this is
cache related: during the k-pass, the first lopass call brings the
functions used to compute the new coefficients into the cache,
sometimes taking cache misses, but the second call reaps the benefits.
But that's just a theory; reading profile traces at this level of
detail can be tricky to do well ...

Hope this helps. I think the main thing to take away, though, is
that for sfront, your coding style is pretty efficient. Of course,
different decoders are a different story. For example, as of
sfront 0.53, these table generations:

instr tone(freq, cutoffext)
{

  table sine(harm, 2048, 1, 1);
  table sine1(harm, 2048, 0.5,0.5);

are converted internally by sfront into global tables, computed
just once at the start of the program. Another decoder might not do
that (in fact, sfront 0.52 didn't) and might make 4096 sin() calls
(two 2048-point tables) every time a note is launched, which may be
either trivial or significant depending on how many notes are launched
and how long they last ...

                                                        --john lazzaro


