Re: feedback opcodes

From: Robin Davies (rerdavies@msn.com)
Date: Sat Apr 22 2000 - 14:25:36 EDT


> Can't the same effect be achieved with the downsamp() core
> opcode, though? That is,
>
> kSomeVal = downsamp(aSomeSignal,t);
>
> where t is the table [0 0 0 . . . 1] with length of the
> k-period?
>

The main differences are in efficiency and storage. The storage issue is
obvious, since the downsamp operator has to store a separate value for each
a-rate sample.
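
To make the storage point concrete, here is a rough C sketch (not SAOL, and
not taken from any real implementation). It assumes that a windowed downsamp
keeps one k-period of a-rate samples and returns their dot product with the
window table; that reading of downsamp, and the names windowed_downsamp and
hold_sample, are assumptions made up for illustration.

```c
#include <stddef.h>

/* Assumed model of a windowed downsamp: n samples must be kept until the
   k-rate pass, then reduced to one value by a dot product with the window. */
float windowed_downsamp(const float *period, const float *window, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += period[i] * window[i];
    }
    return sum;
}

/* Array-style alternative: one float of state, overwritten at a-rate. */
void hold_sample(float *held, float sample)
{
    *held = sample;
}
```

With the window [0 0 0 . . . 1] both give the last sample of the k-period,
but the first needs n floats of storage and the second needs one.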

The efficiency issue is less obvious. Consider a SAOL implementation that
implements high-quality interpolation. The table read operations
interpolate. The array reads do not.

Consider this from a code generation point of view.

Assume you have a compiler that inlines *everything* in order to maximize
execution speed. Generally speaking, because most processors these days are
superscalar, floating-point ops are cheap, but conditional branches are
expensive, since mispredicting a branch requires speculative execution to be
discarded and the processor state to be returned to a known valid state.
This is true of x86, but also true of PowerPC (definitely) and SPARC (I
believe), &c.

The tableread has to execute the following two pieces of code, since
pre-determining the integerness of the index is difficult at best, and
involves a lot of flow analysis.

    <warning: a lot of handwaving and back-of-the-envelope follows. Cycle
      counts are necessarily inexact.>

// tableread(tbl, index)
    int offset = (int)(index); // 1 cycle.
    // optionally range check the offset. Not sure.
    float_t fraction = index-offset; // 3 or 4 cycles?
    Interpolate(&tbl, offset, fraction); // inlined. 0 cycles.

The interpolate call has to execute (assuming it wants to optimize for this
case, which it doesn't):
        ....
           if (fraction == 0) { // 4-ish (correctly predicted on P6), or
                                // 18+ on a mispredicted branch.
                return tableStorage->m_pData[offset]; // 1-ish, inlined and
                                                      // compiler optimized.
           }

Or it can just go ahead and perform the interpolation anyway without
performing the test (not sure; about 20-ish cycles).
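
The two choices for the table read can be sketched in compilable C roughly
as follows. This is only an illustration: it uses plain linear interpolation
as a stand-in for whatever high-quality interpolator a real implementation
would use, and the Table struct and function names are hypothetical (only
m_pData echoes the fragment above). Indices are assumed to be non-negative
and within the table.

```c
#include <stddef.h>

/* Hypothetical table layout for illustration. */
typedef struct {
    const float *m_pData;
    size_t length;
} Table;

/* Always interpolate: no test on the fraction, more arithmetic per read. */
float tableread_lerp(const Table *tbl, float index)
{
    size_t offset = (size_t)index;              /* truncate; index >= 0 assumed */
    float fraction = index - (float)offset;
    /* Clamp the second tap so the last entry does not read past the end. */
    size_t next = (offset + 1 < tbl->length) ? offset + 1 : offset;
    float a = tbl->m_pData[offset];
    float b = tbl->m_pData[next];
    return a + fraction * (b - a);
}

/* Test for an integer index first: cheap when the branch is predicted,
   expensive when it is not. */
float tableread_fastpath(const Table *tbl, float index)
{
    size_t offset = (size_t)index;
    float fraction = index - (float)offset;
    if (fraction == 0.0f) {
        return tbl->m_pData[offset];
    }
    return tableread_lerp(tbl, index);
}
```

Both functions return the same values; they differ only in whether an
integer index pays for a branch or for the full interpolation arithmetic.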

The array read generates the following code:

    int offset = Round(index); // 1 cycle in assembler;
                                // 2-3 cycles in C using Intel-endorsed
                                // magic hackery.
    // Optionally range-check the offset. Not sure. Very efficient in
    // assembler.
    return tableStorage->m_pData[offset]; // 2-ish (inlined).

Furthermore, if it's a constant array index, then the rounding and
range-checking can be performed at compile time. If this is the case, the
code is:

    return tableStorage->m_pData[0]; // 1/2 or 1 cycle.
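
For comparison, a compilable C sketch of the array read might look like the
following. The function name arrayread is made up for illustration, the
rounding is done by adding 0.5 and truncating (which assumes a non-negative
index, and is not the "magic hackery" mentioned above), and the clamping
range check is just one possible policy.

```c
#include <stddef.h>

/* Round the index to the nearest integer, clamp it, and read directly.
   No interpolation and no test on a fractional part. */
float arrayread(const float *data, size_t length, float index)
{
    size_t offset = (size_t)(index + 0.5f);   /* index >= 0 assumed */
    if (offset >= length) {
        offset = length - 1;                  /* optional range check */
    }
    return data[offset];
}
```

When the index is a compile-time constant, an inlining compiler can fold
the rounding and the range check away, leaving only the load.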

So. We're talking about an operation that takes 1 cycle for the array
reference, but somewhere in excess of 15 machine cycles for the tableread.

In addition, array reference allows processing to occur at a-rate time which
would otherwise have to be performed using a k-rate while loop.

Trying to think of a concrete example... try this:

Note that this opcode is actually illegal in SAOL (I think), since it's a
specialop, but it *could* be performed inline within an instr or opcode.

kopcode VUMeterPeakIndicator(asig input) {
    asig AsigZero; // A hack. We need an Asig with value zero.
    ksig KSigZero;
    isig peakLevel[1];
    peakLevel[AsigZero] = max(input ,peakLevel[AsigZero]*decay); // A-rate.
    return peakLevel[KSigZero]; // K-Rate.
}

Ugly? Yes.

Legal? Yes.

Efficient? Yes. Very.

Compare this to the downsamp implementation.

kopcode VUMeterPeakIndicator(asig input) {
    table tbl(empty, s_rate/k_rate);
    ksig i;
    ksig peak;
    downsamp(tbl, input);

    i = 0;
    while (i < s_rate/k_rate) {
        peak = max(tableread(tbl,i), peak*decay);
        i = i + 1; // advance to the next buffered sample
    }
    return (peak);
}

Now, not only do we pay the penalty for the interpolated table reads, we
pay an additional penalty of 5 or 6 cycles for the loop overhead.
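
The same contrast can be sketched in C, outside SAOL. This is a loose
analogy rather than a translation: PeakMeter, peak_meter_tick and
peak_from_buffer are hypothetical names, and the buffered version reads the
buffer directly instead of going through an interpolating table read.

```c
#include <stddef.h>

/* Streaming meter: one float of state, updated once per a-rate sample. */
typedef struct {
    float peak;
    float decay;
} PeakMeter;

void peak_meter_tick(PeakMeter *m, float sample)
{
    float decayed = m->peak * m->decay;
    m->peak = (sample > decayed) ? sample : decayed;
}

/* Buffered meter: the whole k-period is stored first, then walked in a
   loop at k-rate, paying the loop overhead on every sample. */
float peak_from_buffer(float peak, const float *buf, size_t n, float decay)
{
    for (size_t i = 0; i < n; i++) {
        float decayed = peak * decay;
        peak = (buf[i] > decayed) ? buf[i] : decayed;
    }
    return peak;
}
```

Both produce the same peak for the same input; the streaming form simply
does the work as each sample arrives and needs no per-period buffer.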

Regards,

Robin Davies.



This archive was generated by hypermail 2b29 : Mon Jan 28 2002 - 12:03:54 EST