>From Mike's posting, and from Eric's earlier posting, there seem to
>be two ways for sfront (and other implementations not specifically
>being done as part of a full MPEG system, but from a bottom-up
>"Structured Audio is cool" perspective) to go:
>
>[1] Implement MPEG 4 Systems in a compliant way.
>
>[2] The approach which Eric seemed to suggest in the earlier snippet:
>
>> Put another way, you don't need to use the official
>> MP4 format in a streaming MPEG-4 application, since
>> you're just sending and receiving chunks of data and don't
>> need 'files'.
>
>Basically, I read this as "an implementer has my blessing
>to take StructuredAudioSpecificConfig and SA_access_unit
>chunks and incorporate them into other file formats
>besides MPEG 4 -- don't feel like you need to use MPEG
>4 Systems if you're (for example) writing a Structured
>Audio decoder for an open-source streaming system. Use
>some other existing streamer if you wish, or just
>concatenate StructuredAudioSpecificConfig and SA_access_unit
>chunks in the worst case."
We should be very clear about what is in MPEG-4 Systems
and how it relates to SA.
[1] The MPEG-4 multiplex and framing format for streaming
data. This tells you how you turn the SASpecificConfig
and SA_access_unit chunks into real packets, framed
with timestamps, ready to be put on the wire, and
how to receive them from the wire and turn them back
into chunks the decoder can deal with.
[2] An API (called DMIF) that lets you map the abstract
transport requirements given by [1] into an actual,
physical transport layer for some specific network
architecture like ATM, IP, or some satellite system.
For example, in a previous email I alluded to work
in IETF to specify "MPEG-4 over IP" -- this is exactly
a specific embodiment of DMIF for the IP transport.
Such embodiments are outside the scope of the
standard. (Strictly speaking, DMIF is not part of
Systems, but a part unto itself -- it's part 6 of
the MPEG-4 standard).
[3] AudioBIFS, which tells you how to mix down multiple
natural and/or synthetic audio streams into a single
presentation. This hasn't come up in this discussion,
so I won't talk about it. (There's a paper on my
WWW page).
[4] MP4, the MPEG-4 File Format. This part of the standard
tells you how to store StructuredAudioSpecificConfig
and SA_access_unit chunks, along with timestamps and other
information, in a fixed file that is very much like
a QuickTime file.
So the parts of this that you need depend entirely on
your application. If you just want to decode textual
SAOL and synthesize sound from SASL and MIDI on a local
machine, none of it is needed. If you want to stream
SA across a network or other streaming connection, the
decoder has to deal with [1] and probably some instance
of [2]. For [2], some applications will use public
protocols (for example, SA-over-IP) and some will use
private ones (for example, SA-over-DirecTV). And
of course, the server has to deal with the same
implementation of [1] and [2] as it figures out what
data to stream.
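To make the framing in [1] a little more concrete, here is a
toy sketch in C of what "a chunk framed with a timestamp" could
look like in a private streaming experiment.  To be clear, this
is *not* the MPEG-4 Systems SL-packet syntax; the type codes,
the clock, and the 32-bit big-endian fields are just
assumptions I'm making for illustration.

    /* Toy framing in the spirit of [1]: each packet carries a
     * type tag, a timestamp, and a length-prefixed payload.
     * This is NOT the MPEG-4 Systems syntax -- just a sketch of
     * what "chunks framed with timestamps" means.  All field
     * choices are assumptions made for illustration. */

    #include <stdint.h>
    #include <string.h>

    #define PKT_CONFIG 0  /* payload: StructuredAudioSpecificConfig */
    #define PKT_AU     1  /* payload: one SA_access_unit            */

    static void put_u32(uint8_t *p, uint32_t v)
    {
      p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
      p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
    }

    /* Frame one chunk into buf; returns bytes written.
     * buf must have room for len + 9 bytes. */
    size_t frame_chunk(uint8_t *buf, uint8_t type, uint32_t time_ms,
                       const uint8_t *payload, uint32_t len)
    {
      buf[0] = type;             /* which kind of chunk follows     */
      put_u32(buf + 1, time_ms); /* when the decoder should use it  */
      put_u32(buf + 5, len);     /* how many payload bytes follow   */
      memcpy(buf + 9, payload, len);
      return 9 + (size_t)len;
    }

The receiver does the inverse: read the 9-byte header, read len
bytes of payload, and hand the payload to the SA decoder as a
config or an access unit depending on the type byte.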
If you want to package SAOL and SASL files
together for exchange and manipulation, you probably
want to implement [4] in your encoder and decoder,
although no one's yet done it for SA. (Apple and
Mike have a great API for working with MP4 that they
contributed to MPEG and might be convinced to contribute
to SA causes). There would be additional advantages
to having the streaming system work with [4] as well,
because it makes features like client-directed rewind
easy to implement.
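For a sense of what [4] looks like on disk: an MP4 file, like a
QuickTime file, is a sequence of "boxes" (QuickTime calls them
atoms), each starting with a 32-bit big-endian size and a
four-character type code.  The rough sketch below just walks
the top-level boxes of a file and prints their types; it
ignores the 64-bit extended-size case, and the default file
name is only a placeholder.

    /* Sketch: walk the top-level boxes of an MP4/QuickTime-style
     * file and print each box's type and size.  The 64-bit
     * extended-size case (size == 1) and size == 0 ("rest of
     * file") are skipped for brevity. */

    #include <stdio.h>

    int main(int argc, char **argv)
    {
      /* "test.mp4" is just a placeholder name */
      FILE *f = fopen(argc > 1 ? argv[1] : "test.mp4", "rb");
      unsigned char hdr[8];

      if (!f) { perror("fopen"); return 1; }
      while (fread(hdr, 1, 8, f) == 8) {
        unsigned long size = ((unsigned long)hdr[0] << 24) |
                             ((unsigned long)hdr[1] << 16) |
                             ((unsigned long)hdr[2] << 8)  |
                              (unsigned long)hdr[3];
        printf("box '%c%c%c%c', %lu bytes\n",
               hdr[4], hdr[5], hdr[6], hdr[7], size);
        if (size < 8) break;             /* size 0 or 1: give up */
        if (fseek(f, (long)(size - 8), SEEK_CUR) != 0) break;
      }
      fclose(f);
      return 0;
    }

The real file format of course defines particular boxes for
sample tables, timestamps, and so on; the point is only that
the container itself is a simple, QuickTime-like structure.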
Of course, none of this is strictly "required"; as
we say in MPEG, there's no MPEG police that will come
and get you for doing equivalent things in a private
way for your application. The advantage of doing
this as the standard lays out comes in the interoperability
of multiple applications, which is really only possible
when people work through the standard.
>The only technical issue I can see in the "worst case"
>of concatenating StructuredAudioSpecificConfig and
>SA_access_unit chunks is the midi_event issue we hashed
>out earlier -- every midi_event chunk has an implicit
>has_time bit set, and the time needs to come from somewhere.
>The simplest convention I could imagine is sending
>dummy score_line chunks at regular intervals with the
>has_time bit set and an explicit timestamp, along with
>a convention that a midi_event uses the timestamp of the
>last score_line sent. This has the advantage of being
>compatible at the bit level, whereas defining a non-standard
>event_type for SA_access_unit would not be.
Yes, I agree that this is an issue. For the simple
encoder 'saenc' that is packaged with 'saolc', I just
added an extra time-stamp to the beginning of each
SA_access_unit. Conceptually this isn't part of the AU,
but part of the stream conveyance. This is documented
in the saolc implementation technical report.
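For anyone who wants to experiment with that kind of stream,
here is a rough sketch of a reader.  This is not the exact
saenc layout (that is spelled out in the technical report);
for illustration I'm simply assuming a length-prefixed
configuration block followed by records of [32-bit timestamp]
[32-bit length][SA_access_unit], all big-endian, and the file
name is a placeholder.

    /* Sketch of a reader for an saenc-like stream.  The layout
     * assumed here (length-prefixed config, then records of
     * timestamp + length + access unit) is an illustration only;
     * the real saenc format is documented in the saolc report. */

    #include <stdint.h>
    #include <stdio.h>

    static int read_u32(FILE *f, uint32_t *v)
    {
      unsigned char b[4];
      if (fread(b, 1, 4, f) != 4) return 0;
      *v = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
      return 1;
    }

    int main(int argc, char **argv)
    {
      /* "stream.sa" is just a placeholder name */
      FILE *f = fopen(argc > 1 ? argv[1] : "stream.sa", "rb");
      uint32_t ts, len;

      if (!f) { perror("fopen"); return 1; }

      /* decoder configuration comes first ...             */
      if (!read_u32(f, &len)) { fclose(f); return 1; }
      fseek(f, (long)len, SEEK_CUR);   /* hand to decoder   */

      /* ... then one timestamped access unit per record    */
      while (read_u32(f, &ts) && read_u32(f, &len)) {
        printf("AU at time %lu, %lu bytes\n",
               (unsigned long)ts, (unsigned long)len);
        fseek(f, (long)len, SEEK_CUR); /* hand to decoder   */
      }
      fclose(f);
      return 0;
    }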
>I guess the MSB from my end is that binary formats aren't
>the biggest problem at the moment for Structured Audio --
>the problem is that a culture of SAOL programming needs to
>flower first, so that content worth streaming (be it
>SAOL-only algorithmic content or SAOL + (MIDI || SASL)
>score-based content) can be created. In that sense, sfront's
>focus right now is to improve code optimization, to inspire
>people to write SAOL programs -- once the SAOL programming
>culture takes off, hopefully it will be obvious whether [1],
>[2], or some other option is the right one for sfront to
>support. In the short term, I hope to get the "worst case"
>version of [1], or some variant of it as people suggest, up
>and running in sfront, so that people interested in streaming
>can use it for their experiments.
With this, I couldn't agree more!
Best,
-- Eric
+-----------------+
| Eric Scheirer |A-7b5 D7b9|G-7 C7|Cb C-7b5 F7#9|Bb |B-7 E7|
|eds@media.mit.edu| < http://sound.media.mit.edu/~eds >
| 617 253 1750 |A A/G# F#-7 F#-/E|Eb-7b5 D7b5|Db|C7b5 B7b5|Bb|
+-----------------+