Dear Eric, Juhana and all,
        I am trying to put some pieces together concerning the issue
of sample-by-sample vs block-by-block execution.
In my first reply on the 18th of April I wrote:
>I don't see at the moment if an easy solution exists in general, this case
>could for instance be solved at the price of some extra memory allocation,
>but probably I must also consider how and where the opcode is called, if
>there are other relations at the a-rate between x and y, etc.
The same concept was explained even better by Juhana:
> I see the problem only if x depends on the y outside of the integrator.
> If the x is N samples block and don't depend on y, then is the above the
> same than
>    initialize y[]
>    for(i = 0; i < N; i++) {
>      y[i+1] = y[i] + x[i];
>    }
> or such?
> 
where the two arrays x[] and y[] are the extra memory allocations. This works
as long as we do not have:
y=integrate(x);
x=f(y);
i.e. as long as there is no feedback at a-rate. If, for instance, I imagine
implementing this in VHDL, this 'for' loop corresponds to turning a
single-cell serial adder into a parallel N-cell adder (very, very roughly,
just to clear my ideas).
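
To make the no-feedback case concrete, here is a minimal sketch in C of the
same integrator written both ways. The names integ_state, integ_tick and
integ_block are hypothetical and not taken from any decoder; the sketch also
assumes that y[i] already includes x[i], which is an indexing convention
chosen here and differs by one sample from the y[i+1] form quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical integrator state: the running sum carried across blocks. */
typedef struct {
    float acc;
} integ_state;

/* Sample-by-sample form: one call per a-rate sample. */
float integ_tick(integ_state *s, float x)
{
    s->acc += x;
    return s->acc;
}

/* Block-by-block form: only valid when x[] does not depend on y[]
 * inside the block, i.e. when there is no a-rate feedback. */
void integ_block(integ_state *s, const float *x, float *y, size_t n)
{
    assert(s != NULL && x != NULL && y != NULL);
    float acc = s->acc;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
        y[i] = acc;
    }
    s->acc = acc; /* keep the last value for the next block */
}
```

Both forms should produce the same output for the same input; the block form
only pays for the x[] and y[] buffers and for keeping acc between calls.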
Eric's example clearly presents a case of feedback:
>Think of the following code block: 
>
>  asig w,x,y,z;
>
>  w = sum(y, z);  // 1
>  x = sum(x, y);  // 2
>  x = sum(x, z);  // 3
>
because the output of 3 is the input to 2. In this case, I could do one of
two things: 1) keep the two sum calls, but then I need to create a
feedback, and since I don't have clocks faster than a-rate I must execute
sample-by-sample; 2) merge the two sum calls into a single sum(x,y,z), in
which case I could parallelize the execution, i.e.
for (i=0;i<ksmps;i++) {
        x[i] = x[i-1] + y[i];
        x[i] = x[i] + z[i];
}
(I leave out the handling of x[-1] here for clarity). In this case, in
hardware, I create a two-stage parallel adder.
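
A minimal sketch in C of option 2, with the x[-1] handling written out. The
function names sum2_tick and sum3_block and the x_state parameter are
hypothetical, introduced only for illustration; x_state is assumed to hold
the last x value of the previous block.

```c
#include <assert.h>
#include <stddef.h>

/* Sample-by-sample form: statements 2 and 3 executed once per sample. */
float sum2_tick(float x_prev, float y, float z)
{
    float x = x_prev + y; /* x = sum(x, y)  -- statement 2 */
    x = x + z;            /* x = sum(x, z)  -- statement 3 */
    return x;
}

/* Merged block form, equivalent to a single sum(x, y, z) per sample.
 * *x_state holds x[-1] on entry and the last x of the block on exit. */
void sum3_block(float *x_state, const float *y, const float *z,
                float *x, size_t n)
{
    assert(x_state != NULL && y != NULL && z != NULL && x != NULL);
    float prev = *x_state;
    for (size_t i = 0; i < n; i++) {
        prev = prev + y[i] + z[i];
        x[i] = prev;
    }
    *x_state = prev;
}
```

The feedback has not disappeared: it is now the loop-carried variable prev
inside one merged operation, instead of a dependency between two separate
calls.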
Now, if I am right so far, the question is:
>I think your flow-graph analysis is very correct; I've been thinking
>about similar things.  What I'm not yet sure about is whether there's
>good "prototypes" (type 1,2,3 above) for all the feedback cases 
>which allow for block-based code in all circumstances.  
>
>Your shared BBB and SBS model will work, though, as a minimum.
>
In my opinion it is theoretically possible to go through the flow-graph and
merge all the lines that involve a given feedback, so that I can parallelize
the execution. But:
1) the problem of course becomes much tougher if multiple feedback rings
must be detected and followed.
2) the built-in opcodes are, from this point of view, "black boxes" which
can't be optimized by rearranging the SAOL code. I need to think about them
at the decoder implementation level.
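
As a rough illustration of the detection step only, here is a sketch in C
that marks which statements lie on a feedback ring. The flow_graph type and
the in_feedback function are hypothetical; the sketch assumes the compiler
has already built a table saying which statement reads a signal written by
which other statement, and it does not attempt the merging itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical flow-graph: edge[a][b] is true when statement b reads
 * a signal written by statement a. */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
} flow_graph;

/* Depth-first search: can we get from `from` to `target` along edges? */
static bool reaches(const flow_graph *g, int from, int target, bool *seen)
{
    for (int next = 0; next < g->n; next++) {
        if (!g->edge[from][next])
            continue;
        if (next == target)
            return true;
        if (!seen[next]) {
            seen[next] = true;
            if (reaches(g, next, target, seen))
                return true;
        }
    }
    return false;
}

/* True when statement `node` can reach itself, i.e. it is part of a
 * feedback ring and cannot be run block-by-block on its own. */
bool in_feedback(const flow_graph *g, int node)
{
    assert(g != NULL && g->n <= MAX_NODES && node >= 0 && node < g->n);
    bool seen[MAX_NODES] = { false };
    return reaches(g, node, node, seen);
}
```

For Eric's example, numbering the statements 0, 1, 2, the edges would be
1 -> 2 and 2 -> 1 (through x), so statements 1 and 2 are reported as
feedback and statement 0 is free to run block-by-block.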
In the end: I understand that the problem is easy enough to solve if I have
no feedback, at least in words. If I have a very simple feedback I can
optimize the code to work with parallel blocks. I fear that finding
block-based code for all circumstances will be a heavy optimization task for
the compiler or interpreter.
Comments? Is my analysis correct? Can I consider this a starting point
for this issue?
Best regards,
        Giorgio
__________________________________________________________________
Giorgio ZOIA
Integrated Systems Center - DE/c3i - EPFL
CH-1015 Lausanne - SWITZERLAND
Phone: + 41 21 693 69 79      E-mail: Giorgio.Zoia@epfl.ch
Fax: +41 21 693 46 63
__________________________________________________________________