> Can't the same effect be achieved with the downsamp() core
> opcode, though? That is,
>
> kSomeVal = downsamp(aSomeSignal,t);
>
> where t is the table [0 0 0 . . . 1] with length of the
> k-period?
>
The main differences are efficiency and storage. The storage issue is
obvious, since the downsamp operator has to store separate values for each
a-rate sample.
The efficiency issue is less obvious. Consider a SAOL implementation that
provides high-quality interpolation. The table read operations interpolate;
the array reads do not.
Consider this from a code generation point of view.
Assume you have a compiler that inlines *everything* in order to maximize
execution speed. Generally speaking, because most processors these days are
superscalar, floating-point ops are cheap, but conditional branches are
expensive, since mispredicting a branch requires speculative execution to be
discarded and the processor state to be rolled back to a known valid state.
This is true of x86, but also true of PowerPC (definitely), and SPARC (I
believe), &c.
The tableread has to execute the following two pieces of code, since
pre-determining whether the index is an integer is difficult at best, and
involves a lot of flow analysis.
<warning: a lot of handwaving and back-of-the-envelope estimation follows.
Cycle counts are necessarily inexact.>
    // tableread(tbl, index)
    int offset = (int)(index);           // 1 cycle.
    // Optionally range-check the offset. Not sure.
    float_t fraction = index - offset;   // 3 or 4 cycles?
    Interpolate(&tbl, offset, fraction); // inlined. 0 cycles.
The interpolate call has to execute (assuming it wants to optimize for this
case, which it doesn't):
    ....
    if (fraction == 0) {                      // 4-ish (correctly predicted on P6),
                                              // or 18+ on a mispredicted branch.
        return tableStorage->m_pData[offset]; // 1-ish, inlined and
                                              // compiler-optimized.
    }
Or it can just go ahead and perform the interpolation unconditionally,
without the test (not sure; about 20 cycles-ish).
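To make the two code paths concrete, here is a minimal compilable sketch of
what I mean. The names (Table, tableread_interp, arrayread) and the linear
interpolation scheme are mine, not SAOL's or any particular implementation's;
a real decoder will differ in the details:

```cpp
#include <cmath>
#include <vector>

// Hypothetical table wrapper; real SAOL implementations will differ.
struct Table {
    std::vector<float> data;
};

// Interpolated table read: split the index into an integer offset and a
// fractional part, then linearly interpolate between adjacent samples.
float tableread_interp(const Table& tbl, float index) {
    int offset = (int)index;          // truncate
    float fraction = index - offset;  // fractional part
    if (fraction == 0.0f)             // the shortcut test: costs a branch
        return tbl.data[offset];
    return tbl.data[offset] +
           fraction * (tbl.data[offset + 1] - tbl.data[offset]);
}

// Direct array read: round, index, done. No interpolation, no branch
// on the fractional part.
float arrayread(const Table& tbl, float index) {
    int offset = (int)std::lround(index);
    return tbl.data[offset];
}
```

The point of the comparison is visible in the source: the array read is one
rounding op and one load, while the table read pays for the split, the test
(or the interpolation), and the extra loads.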
The array read generates the following code:
    int offset = Round(index);            // 1 cycle in assembler,
                                          // 2-3 cycles in C using
                                          // Intel-endorsed magic hackery.
    // Optionally range-check the offset. Not sure. Very efficient in
    // assembler.
    return tableStorage->m_pData[offset]; // 2-ish (inlined).
Furthermore, if it's a constant array index, then the Rounding and
range-checking can be performed at compile time. If this is the case, the
code is:
return tableStorage->m_pData[0]; // 1/2 or 1 cycle.
So. We're talking about an operation that takes 1 cycle for the array
reference, but somewhere in excess of 15 machine cycles for the tableread.
In addition, array references allow processing to occur at a-rate which
would otherwise have to be performed using a k-rate while loop.
Trying to think of a concrete example..... try this:
Note that this opcode is actually illegal in SAOL (I think), since it's a
specialop, but it *could* be performed inline within an instr or opcode.
    kopcode VUMeterPeakIndicator(asig input) {
        asig AsigZero;  // A hack. We need an asig with value zero.
        ksig KSigZero;
        isig peakLevel[1];

        peakLevel[AsigZero] = max(input, peakLevel[AsigZero]*decay); // a-rate.
        return peakLevel[KSigZero];                                  // k-rate.
    }
Ugly? Yes.
Legal? Yes.
Efficient? Yes. Very.
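In C-like terms, the a-rate work this hack buys is just one max and one
multiply per sample, with the result read once per k-period. A sketch of
what a compiler could generate for it (the function name, the block-of-samples
framing, and the decay parameter are my own illustration, not generated code):

```cpp
#include <algorithm>
#include <vector>

// Sketch of the per-k-period work the array-indexed opcode performs:
// track a decaying peak across one k-period's worth of a-rate samples.
float vu_peak(const std::vector<float>& block, float peak, float decay) {
    for (float x : block)                  // a-rate: one max + one multiply
        peak = std::max(x, peak * decay);  // per sample; no table traffic.
    return peak;                           // k-rate: read the single value.
}
```

No table writes, no table reads, no inner-loop branching beyond the loop
itself: just the peak-follower arithmetic.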
Compare this to the downsamp implementation.
    kopcode VUMeterPeakIndicator(asig input) {
        table tbl(empty, s_rate/k_rate);
        ksig i;
        ksig peak;

        downsamp(tbl, input);
        i = 0;
        while (i < s_rate/k_rate) {
            peak = max(tableread(tbl, i), peak*decay);
            i = i + 1;
        }
        return (peak);
    }
Now, not only do we pay the penalty for the interpolated table reads, we
pay an additional penalty of 5 or 6 cycles for the loop overhead.
Regards,
Robin Davies.
This archive was generated by hypermail 2b29 : Mon Jan 28 2002 - 12:03:54 EST