UDOs, and a big section on Oparray implementation options.

From: Robin Davies (rerdavies@msn.com)
Date: Thu May 18 2000 - 11:10:22 EDT

Next message: Giorgio Zoia: "Re: UDOs, and a big section on Oparray implementation options."
Previous message: Giorgio Zoia: "Re: On UDOs ..."
In reply to: Giorgio Zoia: "Re: On UDOs ..."
Next in thread: Giorgio Zoia: "Re: UDOs, and a big section on Oparray implementation options."
Reply: Giorgio Zoia: "Re: UDOs, and a big section on Oparray implementation options."
Maybe reply: John Lazzaro: "Re: UDOs, and a big section on Oparray implementation options."

>("at the first a-rate call", "has not a proper
> representation of time, but shall infer it from the number of calls",
> etc.)
> i-block: first a-rate (or k-rate) call.
> k-block: first a-rate call of an a-cycle
> a-block: as understood.
>
Interesting idea. I am very conscious that the standard uses this kind of
terminology for many of the core opcodes (well, probably *all* of the core
aopcodes that have i-rate-derived state use this terminology). I'm also
aware that SAOLC implements these types of core opcodes quite literally,
following the given verbiage; i.e. the i-rate-derived values are
calculated at a-rate, but are effectively guarded by an "if (bFirstTime)"
guard, also executed at a-rate. I'd have to check, but I think this guard is
usually implicit, since SAOLC core opcodes allocate memory for their core
state on first use.

Sfx, where possible, avoids this construct. There are, admittedly, some core
opcodes that really do require the a-rate "if (bFirstTime)" guard. However,
for non-oparray use of core opcodes, pre-calculation of internal values
derived from sub-a-rate arguments can usually be performed at the rate of
the argument. This partly constitutes Sfx's motivation for moving
sub-guard-rate opcode statements outside the guard block. It means that
these internal state variables can be calculated without placing a
conditional branch into the execution stream of the a-rate code.
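
A minimal sketch of the two approaches, assuming a hypothetical gain opcode
whose internal state is derived from an i-rate argument. None of these names
come from Sfx or SAOLC; they exist only to show that hoisting the
pre-calculation out of the a-rate path gives the same output without the
per-sample branch.

```cpp
#include <cassert>

// Variant A: "first use" guard tested on every a-rate call.
// (Hypothetical opcode; the 0.5f scaling is an arbitrary stand-in.)
struct GuardedGain {
    bool  firstTime = true;
    float gain = 0.0f;
    float AEval(float iArg, float aArg) {
        if (firstTime) {            // conditional branch in the a-rate path
            gain = iArg * 0.5f;     // i-rate-derived state
            firstTime = false;
        }
        return gain * aArg;
    }
};

// Variant B: the derived value is computed at the rate of its argument,
// so the a-rate path contains no branch.
struct HoistedGain {
    float gain = 0.0f;
    void  IEval(float iArg) { gain = iArg * 0.5f; }
    float AEval(float aArg) const { return gain * aArg; }
};

// Runs both variants over n samples and reports whether the outputs agree.
inline bool variantsAgree(float iArg, const float* in, int n) {
    assert(n >= 0);
    GuardedGain a;
    HoistedGain b;
    b.IEval(iArg);                  // executed once, outside the a-rate loop
    for (int i = 0; i < n; ++i) {
        if (a.AEval(iArg, in[i]) != b.AEval(in[i])) return false;
    }
    return true;
}
```

Calling variantsAgree() with any input buffer should report agreement, since
both variants compute the same gain and differ only in where it is computed.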

loscil is actually a fairly good example to consider. Caveat: this isn't
actual Sfx code, but indicates how I would implement loscil based on my best
recollection of the opcode.

aopcode loscil(table tbl, cps, ivar baseFrequency, ivar loopStart,
               ivar loopEnd)

IRate code:
    iRateConversion = kSRate/baseFrequency;
    // Compiler optimization removes this when loopStart is -1.
    if (loopStart >= 0) m_kLoopStart = loopStart;
    if (loopEnd > 0) m_kLoopEnd = loopEnd;
    runtime checks on loopStart and loopEnd

KRate code:
    // Update the loop parameters. Rely on compiler optimization to remove
    // inappropriate code when loopEnd is const.
    if (loopEnd <= 0) {
        m_kLoopEnd = tbl->m_LoopEnd;
    }
    if (loopStart <= 0) {
        m_kLoopStart = tbl->m_LoopStart;
    }
    m_CurrentIndex = 0;
    runtime checks on legality of m_kLoopStart, m_kLoopEnd.
    m_LoopAdjustment = m_kLoopEnd-m_kLoopStart;

ARate code:
    float_t increment = cps*iRateConversion;             // 3
    float_t currentIndex = m_CurrentIndex + increment;   // 3
    if (currentIndex >= m_kLoopEnd) { // 20 clocks, randomly mis-taken branch.
        currentIndex -= m_LoopAdjustment; // usually not taken. Free.
        if (currentIndex >= m_kLoopEnd) {
            currentIndex =
               fmod(currentIndex-m_kLoopStart, m_LoopAdjustment) // Expensive!
               + m_kLoopStart;
        }
    }
    m_CurrentIndex = currentIndex;                        // 1 to 2.
    aResult = Interp(tbl, m_CurrentIndex); // ~30 for 8-point hi-quality
                                           // interpolation.

Approximate clock times are appended. The 20 clocks for the mispredicted
branch assume that the x86 branch prediction unit can't figure out which way
the branch will go because there are too many branches already in the a-rate
loop.
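
For reference, the loop wrap-around step of the a-rate code can be written
as a small standalone function. This is only a sketch: the function name is
made up, it uses double rather than float_t, and it assumes a non-negative
increment and a loop of non-zero length.

```cpp
#include <cassert>
#include <cmath>

// Advances a table read position and wraps it back into the loop region
// [loopStart, loopEnd). Hypothetical helper; parameter names mirror the
// pseudocode above.
inline double advanceAndWrap(double index, double increment,
                             double loopStart, double loopEnd) {
    assert(loopEnd > loopStart);
    assert(increment >= 0.0);
    const double loopLength = loopEnd - loopStart;
    index += increment;
    if (index >= loopEnd) {          // the one branch in the common path
        index -= loopLength;         // cheap fix-up for a single overshoot
        if (index >= loopEnd) {      // rare: overshot by more than one loop
            index = std::fmod(index - loopStart, loopLength) + loopStart;
        }
    }
    return index;
}
```

The cheap subtraction handles the usual case where the increment is smaller
than the loop length; fmod is only reached when the increment is large enough
to skip over the whole loop in one step.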

Adding an a-rate "if (bFirstTime) KRate code" test to this code adds another
conditional branch into the a-rate codepath. This slows down the execution
of the a-rate code path from ~60 clocks to ~80 clocks. Admittedly this is
fairly reasonable, but there are other core opcodes that go from say, ~6
clocks to ~26 clocks, which isn't so reasonable.

Since the results of the two methods are computationally equivalent, it
seems entirely justified to me to execute the k-rate code unconditionally
when the opcode is guarded by an a-rate if guard. The condition for this
optimization to be computationally equivalent to the language of the spec:
(1) The k-rate code must not have any side effects that modify opcode
state that persists from k-cycle to k-cycle. If condition (1) does not hold,
then we have a situation that must either generate an error or produce
undefined results per the current specification.

I think this is an important optimization for SAOL compilers that should be
kept in mind if we modify the behaviour of opcodes and oparrays. I'd have to
check my implementation, but currently, there are only one or two core
opcodes that have the "at first use" clause that cannot be optimized this
way.

Interestingly, Giorgio's comment raises a possible strategy for dealing with
this problem if the "first-use" terminology really becomes a problem. Sfx's
code generator could possibly be taught to generate the following code:

    Eval() {
        KRatePass();
        ARateWithFirstTimeChecks();
        for (int i = 1; i < kSamplesPerControlCycle; ++i) // NB: i = 1!!
        {
            ARateWithoutFirstTimeChecks();
        }
    }

However, this only works for k-rate guards, and not for a-rate guards. And
it's a hell of a lot of work. I'm going to wait to see where we end up here
before performing this kind of code surgery.
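
To make the idea concrete, here is a rough sketch of one control cycle with
the first sample peeled off. The opcode, its members and the guardTests
counter are all invented for illustration; the separate KRatePass() is left
out, and a real code generator would emit two variants of the a-rate code
rather than two member functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode with k-rate-derived state and a "first use" rule.
struct PeeledOpcode {
    bool  kDone = false;   // cleared at the start of every k-cycle
    float kGain = 0.0f;
    int   guardTests = 0;  // counts first-use tests, for illustration only

    void KEval(float kArg) { kGain = kArg; kDone = true; }

    float AEvalChecked(float kArg, float aArg) {
        ++guardTests;
        if (!kDone) KEval(kArg);     // first-use test
        return kGain * aArg;
    }
    float AEvalUnchecked(float aArg) const { return kGain * aArg; }
};

// One control cycle: the first sample is checked, the rest are not.
inline std::vector<float> evalCycle(PeeledOpcode& op, float kArg,
                                    const std::vector<float>& in) {
    std::vector<float> out;
    out.reserve(in.size());
    op.kDone = false;
    if (in.empty()) return out;
    out.push_back(op.AEvalChecked(kArg, in[0]));
    assert(op.kDone);                // the unchecked path relies on this
    for (std::size_t i = 1; i < in.size(); ++i) {   // NB: starts at 1
        out.push_back(op.AEvalUnchecked(in[i]));
    }
    return out;
}
```

The first-use test runs once per control cycle instead of once per sample,
which is the whole point of the rewrite.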

Now, the oparray case. Well. I'm less clear on this. As I've said, Sfx's
oparray implementation is very speculative right now, and currently
constitutes the major barrier to alpha release. I'm ready to go except for
this issue.

The "first use" language now becomes much more pertinent, since determining
first use at compile-time is computationally infeasible when oparray indices
are not constant.

Furthermore, oparrays introduce an additional complication, which is this:
the same oparray state can be referenced using both a k-rate index *and* an
a-rate index in the same a-rate cycle. This is the core of Sfx's current
problem.

The real problem is this. If k-rate-argument derived parameters are
evaluated at k-rate, then the opcode state will reflect the *last use* of
the particular oparray member when a-rate execution starts. There are
endless possibilities here.
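
A small sketch of the "last use" effect, assuming a hypothetical oparray
member that simply copies its k-rate argument into state. The names are
illustrative only. Two references resolve to the same member; the k-pass for
both runs before any a-rate code, so the second k-rate argument wins.

```cpp
#include <cassert>

// Hypothetical oparray member whose state is derived from a k-rate argument.
struct Member {
    float kArgCopy = 0.0f;
    void  KEval(float kArg) { kArgCopy = kArg; }
    float AEval(float aArg) const { return kArgCopy * aArg; }
};

struct TwoResults { float first; float second; };

// Two references to the same member, with k-rate arguments k1 and k2.
inline TwoResults kPassThenAPass(float k1, float k2, float aArg) {
    Member members[12];
    const int index = 1;             // both references resolve to member 1
    assert(index >= 0 && index < 12);

    members[index].KEval(k1);        // k-pass, first reference
    members[index].KEval(k2);        // k-pass, second reference overwrites

    TwoResults r;
    r.first  = members[index].AEval(aArg);   // sees k2, not k1
    r.second = members[index].AEval(aArg);
    return r;
}
```

With k1 = 1 and k2 = 2, both a-rate results are scaled by 2, even though the
first reference asked for a scale of 1.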

I think I'd like to introduce some new terminology here:

opcode state: internal state variables of an opcode.
oparray item state: internal state variables of an oparray member, including
additional state variables generated specifically for oparray use.
oparray reference state: internal state variables of an oparray reference.

The opcode state variables would be precisely the same as when an opcode is
generated inline. The oparray item state consists of state variables
associated with a particular member of the oparray that are generated
specifically for managing oparray use, but are not present when the opcode
is generated inline. The oparray reference state is more subtle. Just as an
inline opcode generates state variables, an inline oparray index expression
could also generate a state block.

Perhaps the best way to illustrate the idea is to write a little
illustrative code. Hopefully everyone is comfortable enough with C++ to
understand member variables.

class CX_AnOpcode {
public:
    void IEval(float_t iArg);
    void KEval(float_t kArg);
    float_t AEval(float_t aArg);
    float_t m_kRateLocalVariable;
    float_t m_aRateLocalVariable;
};

class CX_AnOpcode_OpArrayAdapter {
public:
    CX_AnOpcode opstate;    // The core opcode state.
    bool m_bIRateExecuted;  // for example.
    bool m_bKRateExecuted;
};

// State variables of the oparray.
CX_AnOpcode_OpArrayAdapter m_AnOpArray[12];

class CX_AnOpcode_OpArrayReference {
public:
    float_t m_kSavedIArg;
    float_t m_kSavedKArg;
    void IEval(int iOpIndex, float_t iArg) {
        m_kSavedIArg = iArg;
    }
    void KEval(int iOpIndex, float_t kArg) {
        m_kSavedKArg = kArg;
    }
    float_t AEval(int iOpIndex, float_t aArg) {
        CX_AnOpcode_OpArrayAdapter &item = m_AnOpArray[iOpIndex];
        if (!item.m_bIRateExecuted) {
            item.opstate.IEval(m_kSavedIArg);
            item.m_bIRateExecuted = true;
        }
        if (!item.m_bKRateExecuted) {
            item.opstate.KEval(m_kSavedKArg);
            item.m_bKRateExecuted = true;
        }
        return item.opstate.AEval(aArg);
    }
};

This is intended to be illustrative, rather than an implementation that I'd
recommend. The CX_AnOpcode class holds state and members for a compiled
opcode that can be used in inline code. The CX_AnOpcode_OpArrayAdapter is an
"oparray item state" for AnOpcode. It contains additional data items to
manage use of AnOpcode in an oparray (first-use flags, in this case).

The CX_AnOpcode_OpArrayReference is an "oparray reference state" class. The
class contains state data associated with a particular lexical occurrence of
an oparray in the source code.

For example:

oparray fracdelay[12];         // generates CX_AnOpcode_OpArrayAdapter[12];

aResult = fracdelay(1,2,3);    // generates a hidden CX_AnOpcode. Direct
                               // reference.

aResult = fracdelay[5](1,2,3); // generates a CX_AnOpcode_OpArrayReference
                               // state variable, and generates code
                               // CX_AnOpcode_OpArrayReference::XEval(5, args);

Now let's get concrete with a problematic example:

    aopcode AnOpcode(ksig kArg, asig aArg) {
        ksig kArgCopy;
        kArgCopy = kArg;
        return (kArgCopy*aArg);
    }

    instr ...
        oparray AnOpcode[12];
        ksig kIndex;
        asig aIndex;

        kIndex = 1;
        aIndex = 1;
        aResult = AnOpcode[kIndex](1, aArg); // [1]
        aResult = AnOpcode[aIndex](2, aArg); // [2]

Here are the possible implementation options I can think of.

(1) Results are illegal/undefined. Fine. I'll issue an error/warning.
However, we still need to define what conditions must hold for the results
to be defined. Continue on.

(2) kIndex gets promoted to a-rate to match the rate of the opcode. Perhaps
helpful. Perhaps not. Still need to define which statement's k-rate
arguments have effect.

(3) multiple k-rate passes are executed sequentially. [1] Assigns kArgCopy
the value 1. [2] Assigns kArgCopy 2. a-rate passes are executed
sequentially, after k-pass is complete. [1] returns aArg*2 (since kArgCopy
was overwritten by [2]); [2] returns aArg*2;

(4) k-rate passes are executed using the first-use rule. A bKRateExecuted
flag in the oparray state is cleared at the start of k-rate execution; [1]
constitutes the first use, so kArgCopy gets assigned the value 1. [2] does
not execute at k-rate. [1] returns aArg*1; [2] also returns aArg*1 (since
kArgCopy is never assigned by [2]).

(5) k-rate passes are executed at every a-rate cycle for every oparray
reference. At k-rate, no execution takes place. [1] returns aArg*1; [2]
returns aArg*2. This is equivalent to flattening the rate of the entire
opcode. k-rate subexpression optimizations must be turned off while the
opcode is generated.

(6) oparray opcodes and opcode rates must be uniform. Since oparrays are
almost invariably executed in while loops, state explicitly that statements
(and expressions also, and arguments possibly as well) within the opcode
referenced by the oparray must match the rate of the while loop guard. This
still doesn't implicitly address what happens with mixed-rate if guard
statements, so make the rule explicit. To make the example legal, the
definition must now be:

asig aArgCopy;
aArgCopy = kArg; // or possibly aArgCopy = aArg;

Problem 1: the reference to kArg is a k-rate expression. A: Rate-restrict
it, so that kArg has to be immediately promoted to a-rate before further
expression analysis takes place. i.e. all variable and argument references
become a-rate references regardless of their stated rate.

[1] returns 1*aArg; [2] returns 2*aArg. A caveat, however. This has
unpleasant consequences for opcodes that rely on i-rate or k-rate
optimization. Consider aopcode ResonantFilter(asig input, ivar Q, ksig
cutoff) as a particularly unpleasant example. aopcode ResonantFilter(asig
input, asig Q, asig cutoff) is gonna be *verrrry* slow, whereas the former
is entirely reasonable.

(7) State explicitly in the spec that k-rate arguments must match across
*all* references to an opcode state for the results to be defined. This
leaves implementers free to use (3), (4) or (5). It leaves the burden on
authors to ensure that the k-rate arguments match, which is usually trivial
to do. Compilers will not be able to issue warnings at compile time except
perhaps for the simplest cases. In this case:

aResult = AnOpcode[kIndex](1, aArg); // [1]
aResult = AnOpcode[aIndex](2, aArg); // [2]

is illegal, but

aResult = AnOpcode[kIndex](2, aArg); // [1]
aResult = AnOpcode[aIndex](2, aArg); // [2]

is legal, and returns well-defined results. This runs into problems when
combined with conditional code, however. See (8).

(8) Accept 7. So, now consider:

    while (aIterator < 12) {
        aResult = aResult + AnOpcode[aIterator](2, aArg);
       aIterator = aIterator+1;
    }

When does the k-pass code get executed? The only feasible implementation I
can think of is that all sub-index-rate evaluation will be calculated across
the entire oparray prior to execution of the while guard (and outside the
while code block).

e.g.:
    k pass compiler rewrite (note that it's unguarded):
        for (i = 0; i < 12; ++i) { AnOpcode[i].KEval(2); }
    a pass:
        while (aIterator < 12) {
            aResult = aResult + AnOpcode[aIterator].AEval(aArg);
            aIterator = aIterator+1;
        }

Now consider:

AnOpcode[kIndex](2,aArg);
AnOpcode[aIndex](2,aArg);

At k-rate, this would have to be implemented as:

AnOpcode[kIndex].KEval(2);
for (i = 0; i < 12; ++i) AnOpcode[i].KEval(2); // <-- rewritten by the compiler.

Which at least produces well-defined results if (7) is adhered to by the
author. I don't have a terrible problem with this. This implementation
option also has the rather pleasant side effect of guaranteeing that opcodes
used in oparrays are always called at least once at i-, k- and a-rate.

Interpreters would either have to rewrite as well, or walk the if/while code
blocks looking for oparray references. Although, again, I'm skeptical as to
whether anyone will ever write a pure SAOL interpreter that runs in realtime
that doesn't perform optimization/re-write passes. I don't imagine present
company would have a terrible problem with minor re-ordering of statements
in the parse tree prior to emitting code.

    if (indexRate > currentRate) {
        for (i = 0; i < dim(AnOpcode_Oparray); ++i) {
            AnOpcode[i].KEval(kArg);
        }
    } else {
        AnOpcode[index].AEval(aArg);
    }
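
The option (8) rewrite can be sketched end to end as follows. This is a toy
model under stated assumptions, not Sfx output: Opcode8, kPass, aPass and
controlCycle are invented names, the opcode is the trivial AnOpcode from the
example, and kCalls exists only so the k-pass can be observed.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the compiled AnOpcode (hypothetical).
struct Opcode8 {
    float kArgCopy = 0.0f;
    int   kCalls = 0;      // counts k-rate evaluations, for illustration
    void  KEval(float kArg) { kArgCopy = kArg; ++kCalls; }
    float AEval(float aArg) const { return kArgCopy * aArg; }
};

constexpr std::size_t kDim = 12;   // oparray dimension from the example

// k-pass: the oparray index is a-rate, so every member is evaluated,
// unguarded, before the while loop runs.
inline void kPass(Opcode8 (&ops)[kDim], float kArg) {
    for (std::size_t i = 0; i < kDim; ++i) ops[i].KEval(kArg);
}

// a-pass for one sample: the original while loop, with no k-rate work left.
inline float aPass(const Opcode8 (&ops)[kDim], float aArg) {
    float aResult = 0.0f;
    std::size_t aIterator = 0;
    while (aIterator < kDim) {
        aResult = aResult + ops[aIterator].AEval(aArg);
        aIterator = aIterator + 1;
    }
    return aResult;
}

// One control cycle over n samples; returns the result for the last sample.
inline float controlCycle(Opcode8 (&ops)[kDim], float kArg,
                          const float* in, int n) {
    assert(n >= 0);
    kPass(ops, kArg);
    float last = 0.0f;
    for (int s = 0; s < n; ++s) last = aPass(ops, in[s]);
    return last;
}
```

Every member sees exactly one k-rate evaluation per control cycle regardless
of which indices the a-rate loop actually visits, which is what makes the
results well-defined as long as restriction (7) holds.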

My vote: Option (8) with restriction (7) explicitly added to the
spec/corrigendum. Sfx would implement (3). I think this provides an
implementable solution that provides predictable unambiguous results for UDO
oparrays without hampering the capabilities of the language.

Regards,

Robin.


This archive was generated by hypermail 2b29 : Mon Jan 28 2002 - 12:03:55 EST