ISO/JTC 1/SC 29 WG 11
N2203SA
Date: 1998-05-15
ISO/IEC FCD 14496-3 Subpart 5
ISO/JTC 1/SC 29/WG11
Narumi Hirose
Information Technology - Coding of Audiovisual Objects -
Low Bitrate Coding of Multimedia Objects
Part 3: Audio
Subpart 5: Structured Audio
FCD 14496-3 Subpart 5
MPEG-4 Structured Audio
Editor:
Eric D. Scheirer, MIT Media Laboratory
eds@media.mit.edu
+1 617 253 0112
<http://sound.media.mit.edu/mpeg4>
5.0 Introduction
5.0.1 Overview of subpart
5.0.1.1 Purpose
5.0.1.2 Introduction to major elements
5.0.2 Normative References
5.0.3 Glossary of Terms
5.0.4 Description methods
5.0.4.1 Bitstream syntax
5.0.4.2 SAOL syntax
5.0.4.3 SASL Syntax
5.0.5 Bibliography
5.1 Bitstream syntax and semantics
5.1.1 Introduction to bitstream syntax
5.1.2 Bitstream syntax
5.2 Profiles
5.3 Decoding process
5.3.1 Introduction
5.3.2 Decoder configuration header
5.3.3 Bitstream data and sound creation
5.3.3.1 Relationship with systems layer
5.3.3.2 Bitstream data elements
5.3.3.3 Scheduler semantics
5.3.4 Conformance
5.4 SAOL syntax and semantics
5.4.1 Relationship with bitstream syntax
5.4.2 Lexical elements
5.4.2.1 Concepts
5.4.2.2 Identifiers
5.4.2.3 Numbers
5.4.2.4 String constants
5.4.2.5 Comments
5.4.2.6 Whitespace
5.4.3 Variables and values
5.4.4 Orchestra
5.4.5 Global block
5.4.5.1 Syntactic form
5.4.5.2 Global parameter
5.4.5.3 Global variable declaration
5.4.5.4 Route statement
5.4.5.5 Send statement
5.4.5.6 Sequence specification
5.4.6 Instrument definition
5.4.6.1 Syntactic form
5.4.6.2 Instrument name
5.4.6.3 Parameter fields
5.4.6.4 Preset and channel tags
5.4.6.5 Instrument variable declarations
5.4.6.6 Block of code statements
5.4.6.7 Expressions
5.4.6.8 Standard names
5.4.7 Opcode definition
5.4.7.1 Syntactic Form
5.4.7.2 Rate tag
5.4.7.3 Opcode name
5.4.7.4 Formal parameter list
5.4.7.5 Opcode variable declarations
5.4.7.6 Opcode statement block
5.4.7.7 Opcode rate
5.4.8 Template declaration
5.4.8.1 Syntactic form
5.4.8.2 Semantics
5.4.8.3 Template instrument definitions
5.4.9 Reserved words
5.5 SAOL core opcode definitions and semantics
5.5.1 Introduction
5.5.2 Specialop type
5.5.3 List of core opcodes
5.5.4 Math functions
5.5.4.1 Introduction
5.5.4.2 int
5.5.4.3 frac
5.5.4.4 dbamp
5.5.4.5 ampdb
5.5.4.6 abs
5.5.4.7 sgn
5.5.4.8 exp
5.5.4.9 log
5.5.4.10 sqrt
5.5.4.11 sin
5.5.4.12 cos
5.5.4.13 atan
5.5.4.14 pow
5.5.4.15 log10
5.5.4.16 asin
5.5.4.17 acos
5.5.4.18 ceil
5.5.4.19 floor
5.5.4.20 min
5.5.4.21 max
5.5.5 Pitch converters
5.5.5.1 Introduction to pitch representations
5.5.5.2 gettune
5.5.5.3 settune
5.5.5.4 octpch
5.5.5.5 pchoct
5.5.5.6 cpspch
5.5.5.7 pchcps
5.5.5.8 cpsoct
5.5.5.9 octcps
5.5.5.10 midipch
5.5.5.11 pchmidi
5.5.5.12 midioct
5.5.5.13 octmidi
5.5.5.14 midicps
5.5.5.15 cpsmidi
5.5.6 Table operations
5.5.6.1 ftlen
5.5.6.2 ftloop
5.5.6.3 ftloopend
5.5.6.4 ftsr
5.5.6.5 ftbasecps
5.5.6.6 ftsetloop
5.5.6.7 ftsetend
5.5.6.8 ftsetbase
5.5.6.9 tableread
5.5.6.10 tablewrite
5.5.6.11 oscil
5.5.6.12 loscil
5.5.6.13 doscil
5.5.6.14 koscil
5.5.7 Signal generators
5.5.7.1 kline
5.5.7.2 aline
5.5.7.3 kexpon
5.5.7.4 aexpon
5.5.7.5 kphasor
5.5.7.6 aphasor
5.5.7.7 pluck
5.5.7.8 buzz
5.5.7.9 fof
5.5.8 Noise generators
5.5.8.1 Note on noise generators and pseudo-random sequences
5.5.8.2 irand
5.5.8.3 krand
5.5.8.4 arand
5.5.8.5 ilinrand
5.5.8.6 klinrand
5.5.8.7 alinrand
5.5.8.8 iexprand
5.5.8.9 kexprand
5.5.8.10 aexprand
5.5.8.11 kpoissonrand
5.5.8.12 apoissonrand
5.5.8.13 igaussrand
5.5.8.14 kgaussrand
5.5.8.15 agaussrand
5.5.9 Filters
5.5.9.1 port
5.5.9.2 hipass
5.5.9.3 lopass
5.5.9.4 bandpass
5.5.9.5 bandstop
5.5.9.6 biquad
5.5.9.7 allpass
5.5.9.8 comb
5.5.9.9 fir
5.5.9.10 iir
5.5.9.11 firt
5.5.9.12 iirt
5.5.10 Spectral analysis
5.5.10.1 fft
5.5.10.2 ifft
5.5.11 Gain control
5.5.11.1 rms
5.5.11.2 gain
5.5.11.3 balance
5.5.11.4 compress
5.5.11.5 pcompress
5.5.11.6 sblock
5.5.12 Sample conversion
5.5.12.1 decimate
5.5.12.2 upsamp
5.5.12.3 downsamp
5.5.12.4 samphold
5.5.13 Delays
5.5.13.1 delay
5.5.13.2 delay1
5.5.13.3 fracdelay
5.5.14 Effects
5.5.14.1 reverb
5.5.14.2 chorus
5.5.14.3 flange
5.6 SAOL core wavetable generators
5.6.1 Introduction
5.6.2 Sample
5.6.3 Data
5.6.4 Random
5.6.5 Step
5.6.6 Lineseg
5.6.7 Expseg
5.6.8 Cubicseg
5.6.9 Spline
5.6.10 Polynomial
5.6.11 Window
5.6.12 Harm
5.6.13 Harm_phase
5.6.14 Periodic
5.6.15 Buzz
5.6.16 Concat
5.6.17 Empty
5.7 SASL syntax and semantics
5.7.1 Introduction
5.7.2 Syntactic Form
5.7.3 Instr line
5.7.4 Control line
5.7.5 Tempo line
5.7.6 Table line
5.7.7 End line
5.8 SAOL/SASL tokenisation
5.8.1 Introduction
5.8.2 SAOL tokenisation
5.8.3 SASL Tokenisation
5.9 Sample Bank syntax and semantics
5.9.1 Introduction
5.9.2 Elements of bitstream
5.9.2.1 RIFF Structure
5.9.2.2 The INFO-list Chunk
5.9.2.3 The sdta-list Chunk
5.9.2.4 The pdta-list Chunk
5.9.3 Enumerators
5.9.3.1 Generator Enumerators
5.9.3.2 Default Modulators
5.9.3.3 Precedence and Absolute and Relative values.
5.9.4 Parameters and Synthesis Model
5.9.4.1 Synthesis Model
5.9.4.2 MIDI Functions
5.9.4.3 Parameter Units
5.9.4.4 The SASBF Generator Model
5.9.4.5 The SASBF Modulator Controller Model
5.9.5 Error Handling
5.9.5.1 Structural Errors
5.9.5.2 Unknown Chunks
5.9.5.3 Unknown Enumerators
5.9.5.4 Illegal Parameter Values
5.9.5.5 Out-of-range Values
5.9.5.6 Missing Required Parameter or Terminator
5.9.5.7 Illegal enumerator
5.9.6 Profile 2 (Sample Bank and MIDI decoding)
5.9.6.1 Stream information header
5.9.6.2 Bitstream data and sound creation
5.9.6.3 Conformance
5.9.7 Profile 4 (Sample Bank decoding in SAOL instruments)
5.9.8 Sample Bank Format Glossary
5.10 MIDI semantics
5.10.1 Introduction
5.10.2 Profile 1 decoding process
5.10.3 Mapping MIDI events into orchestra control
5.10.3.1 Introduction
5.10.3.2 MIDI events
5.10.3.3 Standard MIDI Files
5.10.3.4 Default controller values
5.11 Input sounds and relationship with AudioBIFS
5.11.1 Introduction
5.11.2 Input sources and phaseGroup
5.11.3 The AudioFX node
5.11.4 Interactive 3-D spatial audio scenes
C.1 Introduction
C.2 Lexical grammar for SAOL in lex
C.3 Syntactic grammar for SAOL in yacc
The Structured Audio decoder allows for the transmission and decoding of synthetic sound effects and music using several techniques. Using Structured Audio, high-quality sound can be created at extremely low bandwidth. Typical synthetic music may be coded in this format at bit rates ranging from 0 kbps (no continuous cost) to 2 or 3 kbps for extremely subtle coding of expressive performance using multiple instruments.
MPEG-4 does not standardise a particular set of synthesis methods, but a method for describing synthesis methods. Any current or future sound-synthesis method may be described in the MPEG-4 Structured Audio format.
There are five major elements to the Structured Audio toolset:
1. The Structured Audio Orchestra Language, or SAOL. SAOL is a digital-signal processing language which allows for the description of arbitrary synthesis and control algorithms as part of the content bitstream. The syntax and semantics of SAOL are standardised here in a normative fashion.
2. The Structured Audio Score Language, or SASL. SASL is a simple score and control language which is used in certain profiles (see Subclause 5.2) to describe the manner in which sound-generation algorithms described in SAOL are used to produce sound.
3. The Structured Audio Sample Bank Format, or SASBF. The Sample Bank format allows for the transmission of banks of audio samples to be used in wavetable synthesis and the description of simple processing algorithms to use with them.
4. A normative scheduler description. The scheduler is the supervisory run-time element of the Structured Audio decoding process. It maps structural sound control, specified in SASL or MIDI, to real-time events dispatched using the normative sound-generation algorithms.
5. Normative reference to the MIDI standards, standardised externally by the MIDI Manufacturers Association. MIDI is an alternate means of structural control which can be used in conjunction with or instead of SASL. Although less powerful and flexible than SASL, MIDI support in this standard provides important backward compatibility with existing content and authoring tools.
[MIDI] The Complete MIDI 1.0 Detailed Specification v. 96.2, (c) 1996 MIDI Manufacturers Association
Absolute time The time at which sound corresponding to a particular event is really created; time in the real-world. Contrast score time.
Actual parameter The expression which, upon evaluation, is passed to an opcode as a parameter value.
A-cycle See audio cycle.
A-rate See audio rate.
asig The lexical tag indicating an a-rate variable.
Audio cycle The sequence of processing which computes new values for all a-rate expressions in a particular code block.
Audio rate The rate type associated with a variable, expression or statement which may generate new values as often as the sampling rate.
Audio sample A short snippet or clip of digitally represented sound. Typically used in wavetable synthesis.
Authoring In Structured Audio, the combined processes of creatively composing music and sound control scripts, creating instruments which generate and alter sound, and encoding the instruments, control scripts, and audio samples in MPEG-4 Structured Audio format.
Backus-Naur Format (BNF) A format for describing the syntax of programming languages, used here to specify the SAOL and SASL syntax. See Subclause 5.0.4.2.
Bank A set of samples used together to define a particular sound or class of sounds with wavetable synthesis.
Beat The unit in which score time is measured.
BNF See Backus-Naur Format.
Bus An area in memory which is used to pass the output of one instrument into the input of another.
Context See state space.
Control An instruction used to describe how to use a particular synthesis method to produce sound.
EXAMPLES
"Using the piano instrument, play middle C at medium volume for 2 seconds."
"Glissando the violin instrument up to middle C."
"Turn off the reverberation for 8 seconds."
Control cycle The sequence of processing which computes new values for all control-rate expressions in a particular code block.
Control period The length of time (typically measured in audio samples) corresponding to the control rate.
Control rate 1. The rate at which instantiation and termination of instruments, parametric control of running instrument instances, sharing of global variables, and other non-sample-by-sample computation occurs in a particular orchestra.
2. The rate type of variables, expressions, and statements which can generate new values as often as the control rate.
Decoding The process of turning an MPEG-4 Structured Audio bitstream into sound.
Duration The amount of time between instantiation and termination of an instrument instance.
Encoding The process of creating a legal MPEG-4 bitstream, whether automatically, by hand, or using special authoring tools.
Envelope A loudness-shaping function applied to a sound, or more generally, any function controlling a parametric aspect of a sound.
Event One control instruction.
Expression A mathematical or functional combination of variable values, symbolic constants, and opcode calls.
Formal parameter The syntactic element which gives a name to one of the parameters of an opcode.
Future wavetable A wavetable which is declared but not defined in the SAOL orchestra; its definition must arrive in the bitstream before it is used.
Global block The section of the orchestra which describes global variables, route and send statements, sequence rules, and global parameters.
Global context The state space used to hold values of global variables and wavetables.
Global parameters The sampling rate, control rate, and number of input and output channels of audio associated with a particular orchestra.
Global variable A variable which can be accessed and/or changed by several different instruments.
Grammar A set of rules which describes the set of allowable sequences of lexical elements comprising a particular language.
Guard expression The expression standing at the front of an if, while, or else statement which determines whether or how many times a particular block of code is executed.
I-cycle See initialisation cycle.
Identifier A sequence of characters in a textual SAOL program which denotes a symbol.
Informative Aspects of a standards document which are provided to assist implementors, but are not required to be implemented in order for a particular system to be compliant to the standard.
I-pass See initialisation pass.
I-rate See initialisation rate.
Initialisation cycle See initialisation pass.
Initialisation rate The rate type of variables, expressions, and statements which are set once at instrument instantiation and then do not change.
Initialisation pass The sequence of processing which computes new values for each i-rate expression in a particular code block.
Instance See instrument instantiation.
Instantiation The process of creating a new instrument instantiation based on an event in the score or statement in the orchestra.
Instrument An algorithm for parametric sound synthesis, described using SAOL. An instrument encapsulates all of the algorithms needed for one sound-generation element to be controlled with a score.
NOTE
An MPEG-4 Structured Audio instrument does not necessarily correspond to a real-world instrument. A single instrument might be used to represent an entire violin section, or an ambient sound such as the wind. On the other hand, a single real-world instrument which produces many different timbres over its performance range might be represented using several SAOL instruments.
Instrument instantiation The state space created as the result of executing a note-creation event with respect to a SAOL orchestra.
ivar The lexical tag indicating an i-rate variable.
K-cycle See control cycle.
K-rate See control rate.
ksig The lexical tag indicating a k-rate variable.
Lexical element See token.
Looping A typical method of wavetable synthesis. Loop points in an audio sample are located and the sound between those endpoints is played repeatedly while being simultaneously modified by envelopes, modulators, etc.
MIDI The Musical Instrument Digital Interface standards, see [MIDI] in Subclause 5.0.2. MIDI is one method for specifying control of synthesis in MPEG-4 Structured Audio.
Natural Sound A sound created through recording from a real acoustic space. Contrasted with synthetic sound.
Normative Those aspects of a standard which must be implemented in order for a particular system to be compliant to the standard.
Opcode A parametric signal-processing function which encapsulates a certain functionality so that it may be used by several instruments.
Orchestra The set of sound-generation and sound-processing algorithms included in an MPEG-4 bitstream. Includes instruments, opcodes, routing, and global parameters.
Orchestra cycle A complete pass through the orchestra, during which new instrument instantiations are created, expired ones are terminated, each instance receives one k-cycle and one control period worth of a-cycles, and output is produced.
Parameter fields The names given to the parameters to an instrument.
P-fields See parameter fields.
Production rule In Backus-Naur Form grammars, a rule which describes how one syntactic element may be expressed in terms of other lexical and syntactic elements.
Rate-mismatch error The condition that results when the rate semantics rules are violated in a particular SAOL construction. A type of syntax error.
Rate semantics The set of rules describing how rate types are assigned to variables, expressions, statements, and opcodes, and the normative restrictions that apply to a bitstream regarding combining these elements based on their rate types.
Rate type The "speed of execution" associated with a particular variable, expression, statement, or opcode.
Route statement A statement in the global block which describes how to place the output of a certain set of instruments onto a bus.
Run-time error The condition that results from improper calculations or memory accesses during execution of a SAOL orchestra.
SASBF See Sample Bank Format.
SAOL The Structured Audio Orchestra Language, pronounced like the English word "sail." SAOL is a digital-signal processing language which allows for the description of arbitrary synthesis and control algorithms as part of the content bitstream.
SAOL orchestra See orchestra.
SASL The Structured Audio Score Language. SASL is a simple format which allows for powerful and flexible control of music and sound synthesis.
Sample See Audio sample.
Sample Bank Format A component format of MPEG-4 Structured Audio which allows the description of a set of samples for use in wavetable synthesis and processing methods to apply to them.
Scheduler The component of MPEG-4 Structured Audio which describes the mapping from control instructions to sound synthesis using the specified synthesis techniques. The scheduler description provides normative bounds on event-dispatch times and responses.
Scope The code within which access to a particular variable name is allowed.
Score A description in some format of the sequence of control parameters needed to generate a desired music composition or sound scene. In MPEG-4 Structured Audio, scores are described in SASL and/or MIDI.
Score time The time at which an event happens in the score, measured in beats. Score time is mapped to absolute time by the current tempo.
Send statement A statement in the global block which describes how to pass a bus on to an effect instrument for post-processing.
Semantics The rules describing what a particular instruction or bitstream element should do. Most aspects of bitstream and SAOL semantics are normative in MPEG-4.
Sequence rules The set of rules, both default and explicit, given in the global block which define in what order to execute instrument instantiations during an orchestra cycle.
Signal variable A unit of memory, labelled with a name, which holds intermediate processing results. Each signal variable in MPEG-4 Structured Audio is instantaneously representable by a 32-bit floating point value.
Spatialisation The process of creating special sounds which a listener perceives as emanating from a particular direction.
State space A set of variable-value associations which define the current computational state of an instrument instantiation or opcode call. All the "current values" of the variables in an instrument or opcode call.
Statement "One line" of a SAOL orchestra.
Structured audio Sound-description methods which make use of high-level models of sound generation and control. Typically involving synthesis description, structured audio techniques allow for ultra-low bitrate description of complex, high-quality sounds. See [SAUD] in Subclause 5.0.5.
Symbol A sequence of characters in a SAOL program, or a symbol token in an MPEG-4 Structured Audio bitstream, which represents a variable name, instrument name, opcode name, table name, bus name, etc.
Symbol table In an MPEG-4 Structured Audio bitstream, a sequence of data which allows the tokenised representation of SAOL and SASL code to be converted back to a readable textual representation. The symbol table is an optional component.
Symbolic constant A floating-point value explicitly represented as a sequence of characters in a textual SAOL orchestra, or as a token in a bitstream.
Syntax The rules describing what a particular instruction or bitstream element should look like. All aspects of bitstream and SAOL syntax are normative in MPEG-4.
Syntax error The condition that results when a bitstream element does not comply with its governing rules of syntax.
Synthesis The process of creating sound based on algorithmic descriptions.
Synthetic Sound Sound created through synthesis.
Tempo The scaling parameter which specifies the relationship between score time and absolute time. A tempo of 60 beats per minute means that the score time measured in beats is equivalent to the absolute time measured in seconds; higher numbers correspond to faster tempi, so that 120 beats per minute is twice as fast.
Terminal The "client side" of an MPEG transaction; whatever hardware and software are necessary in a particular implementation to allow the capabilities described in this document.
Termination The process of destroying an instrument instantiation when it is no longer needed.
Timbre The combined features of a sound which allow a listener to recognise such aspects as the type of instrument, manner of performance, manner of sound generation, etc. Those aspects of sound which distinguish sounds equivalent in pitch and loudness.
Token A lexical element of a SAOL orchestra: a keyword, punctuation mark, symbol name, or symbolic constant.
Tokenisation The process of converting an orchestra in textual SAOL format into a bitstream representation consisting of a stream of tokens.
Variable See signal variable.
Wavetable synthesis A synthesis method in which sound is created by simple manipulation of audio samples, such as looping, pitch-shifting, enveloping, etc.
Width The number of channels of data which an expression represents.
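EXAMPLE (informative)
The Tempo and Control period entries above each imply a small piece of arithmetic. The following is a minimal sketch, not a normative procedure: the function names are hypothetical, the tempo is assumed constant over the span being converted, and the control rate is assumed to divide the sampling rate evenly.

```python
def beats_to_seconds(beats, tempo_bpm):
    """Convert a span of score time (beats) to absolute time (seconds).

    Assumes a constant tempo over the whole span.
    """
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return beats * 60.0 / tempo_bpm


def control_period_samples(sampling_rate_hz, control_rate_hz):
    """Length of one control period, measured in audio samples."""
    if sampling_rate_hz % control_rate_hz != 0:
        raise ValueError("sketch assumes the control rate divides the sampling rate")
    return sampling_rate_hz // control_rate_hz


print(beats_to_seconds(4, 60))             # 4.0 (beats equal seconds at 60 bpm)
print(beats_to_seconds(4, 120))            # 2.0 (twice as fast)
print(control_period_samples(44100, 441))  # 100
```

When the tempo changes during a performance, the conversion would have to be applied piecewise to each constant-tempo span; that case is left out of this sketch.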
The Structured Audio bitstream syntax is described using MSDL, the MPEG-4 Syntactic Description Language. See 14496-1 Subclause XXX.
The textual SAOL syntax (in Subclause 5.4) is described using extended Backus-Naur format (BNF) notation [see DRAG in Subclause 5.0.5]. BNF is a notation for describing context-free grammars of programming languages. Normative BNF rules will be described in the Arial font.
BNF grammars are composed of terminals, also called tokens, and production rules. Terminals represent syntactic elements of the language, such as keywords and punctuation; production rules describe the composition of these elements into structural groups.
Terminals will be represented in boldface; production rules will be represented in <angle brackets>.
The rewrite rules which map productions into sequences of other productions and terminals are represented with the -> symbol.
EXAMPLE
<letter> -> a
<letter> -> b
<sequence> -> <letter>
<sequence> -> <letter> <sequence>
This grammar (starting from the <sequence> production) describes, using a recursive rewrite rule and a two-symbol alphabet, all strings containing at least one letter which are made up of ‘a’ and ‘b’ characters.
In addition, rewrite rules using optional elements will be described using the [ ] symbols. Using this notation does not increase the power of the syntax description (in terms of the languages it can represent), but makes certain constructs simpler.
EXAMPLE
<head> -> c
<seqhead> -> [<head>] <sequence>
This grammar (starting from the <seqhead> production) describes, in addition to the set above, all strings beginning with a ‘c’ character and followed by a sequence of ‘a’s and ‘b’s.
The NULL token may be used to indicate that a sequence of no characters (the empty string) is a permissible rewrite for a particular production.
Normative aspects of the relationship between the BNF grammar, other grammar representation methods, the bitstream syntax, and the textual description format are described in Subclause 5.4.1.
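EXAMPLE (informative)
To make the two example grammars above concrete, the following sketch recognises them with one function per production (a recursive-descent recogniser). The function names are hypothetical and the code is only an illustration of how rewrite rules map to a recogniser; it is not part of the SAOL grammar.

```python
def parse_letter(s, pos):
    """<letter> -> a | b. Return the position after the letter, or None."""
    if pos < len(s) and s[pos] in "ab":
        return pos + 1
    return None


def parse_sequence(s, pos):
    """<sequence> -> <letter> | <letter> <sequence>."""
    nxt = parse_letter(s, pos)
    if nxt is None:
        return None
    rest = parse_sequence(s, nxt)       # try the recursive alternative first
    return nxt if rest is None else rest


def parse_seqhead(s, pos=0):
    """<seqhead> -> [<head>] <sequence>, with <head> -> c."""
    if pos < len(s) and s[pos] == "c":  # the optional [<head>] element
        pos += 1
    return parse_sequence(s, pos)


def matches_seqhead(s):
    """True if the whole string can be derived from <seqhead>."""
    return parse_seqhead(s) == len(s)


print(matches_seqhead("abba"))  # True
print(matches_seqhead("cab"))   # True
print(matches_seqhead("c"))     # False (a <sequence> needs at least one letter)
print(matches_seqhead("abc"))   # False ('c' may only appear at the front)
```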
The SASL syntax is specified using extended BNF grammars, as described in Subclause 5.0.4.2.
[DRAG] Aho, Alfred V., and Ravi Sethi and Jeffrey Ullman, Compilers: Principles, Techniques, and Tools. Reading, Mass: Addison-Wesley, 1984.
[ICASSP] Scheirer, Eric, "The MPEG-4 Structured Audio standard", Proc 1998 IEEE ICASSP, Seattle, 1998.
[NETSOUND] Casey, Michael, and Paris Smaragdis, "Netsound", Proc. 1996 ICMC, Hong Kong, 1996.
[SAFX] Scheirer, Eric, "Structured audio and effects processing in the MPEG-4 multimedia standard", ACM Multimedia Sys. J., in press.
[SAOL] Scheirer, Eric, "SAOL: The MPEG-4 Structured Audio Orchestra Language", Proc 1998 ICMC, Ann Arbor, MI, 1998.
[SAUD] Vercoe, Barry, and William G. Gardner and Eric D. Scheirer, "Structured Audio: Creation, Transmission, and Rendering of Parametric Sound Descriptions". Proc. IEEE 85:5 (May 1998), pp.
[WAVE] Scheirer, Eric, and Lee Ray, "Algorithmic and wavetable synthesis in the MPEG-4 multimedia standard". Proc 105th Conv AES, San Francisco, 1998.
This Subclause describes the bitstream format defining an MPEG-4 Structured Audio bitstream.
Each group of classes is notated with normative semantics, which define the meaning of the data represented by those classes.
/*********************************
symbol table definitions
***********************************/
class symbol {
    unsigned int(16) sym;          // no more than 65536 symbols/orch + score
}
class sym_name {                   // one name in a symbol table
    unsigned int(4) length;        // 4-bit length: names up to 15 chars long
    unsigned int(8) name[length];
}
class symtable {                   // a whole symbol table
    unsigned int(16) length;       // 16-bit count: no more than 65535 names
    sym_name name[length];
}
A bitstream may contain a symbol table, but this is not required. The symbol table allows textual SAOL and SASL code to be recovered from the tokenised bitstream representation. The inclusion or exclusion of a symbol table does not affect the decoding process.
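EXAMPLE (informative)
The symbol table classes above can be read with a small bit-field reader. The following is a minimal sketch under stated assumptions that are not confirmed by this subclause: fields are packed most significant bit first, names are ASCII, the i-th sym_name entry names symbol i, and a zero-length name marks a symbol left unnamed. `BitReader`, `parse_symtable` and `symbol_name` are hypothetical helper names. Unnamed symbols fall back to the `_sym_x` form that this subclause suggests for detokenisation.

```python
class BitReader:
    """Reads unsigned bit fields, most significant bit first, from bytes."""

    def __init__(self, data):
        self.bits = "".join(format(byte, "08b") for byte in data)
        self.pos = 0

    def read(self, nbits):
        if self.pos + nbits > len(self.bits):
            raise EOFError("bitstream ended inside a field")
        chunk = self.bits[self.pos:self.pos + nbits]
        self.pos += nbits
        return int(chunk, 2) if chunk else 0


def parse_symtable(reader):
    """Parse a symtable: a 16-bit count, then that many sym_name entries."""
    count = reader.read(16)
    names = {}
    for sym in range(count):            # assumption: entry i names symbol i
        length = reader.read(4)         # sym_name.length
        raw = bytes(reader.read(8) for _ in range(length))
        if length:                      # assumption: empty name = unnamed
            names[sym] = raw.decode("ascii")
    return names


def symbol_name(names, sym):
    """Return the table name, or the suggested _sym_x default."""
    return names.get(sym, "_sym_%d" % sym)


# Two entries: symbol 0 is named "hi", symbol 1 has a zero-length name.
data = bytes([0x00, 0x02, 0x26, 0x86, 0x90])
names = parse_symtable(BitReader(data))
print(symbol_name(names, 0))  # hi
print(symbol_name(names, 1))  # _sym_1
```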
If a symbol table is included, then all or some of the symbols in the orchestra and score shall be associated with a textual name in the following way: each symbol (a symbol is just an integer) shall be associated with the character string paired with that symbol in a sym_name object. There shall be no more than one name associated with a given symbol, otherwise the bitstream is invalid. It is permissible for the symbol table to be incomplete and contain names associated with some, but not all, symbols used in the orchestra and score.
SAOL and SASL implementations which require textual input, rather than tokenised input, are permissible in a compliant decoder, in which case the decoder must detokenise the bitstream before it can be processed. In such a case, it is suggested that any symbols without associated names be given a default name of the form _sym_x, where x is the symbol value. Names of this form are reserved in SAOL for this purpose, and so following this suggestion guarantees that names will not clash with symbol-table-defined symbol names.
/*********************************
orchestra file definitions
***********************************/
class orch_token {                 // a token in an orchestra
    int done;
    unsigned int(8) token;         // see standard token table, Annex A
    switch (token) {
        case 0xF0 :                // a symbol
            symbol sym;            // the symbol name
            break;
        case 0xF1 :                // a constant value
            float(32) val;         // the floating-point value
            break;
        case 0xF2 :                // a constant int value
            unsigned int(32) val;  // the integer value
            break;
        case 0xF3 :                // a string constant
            unsigned int(8) length;
            unsigned int(8) str[length]; // strings no more than 255 chars
            break;
        case 0xFF :                // end of orch
            done = 1;
            break;
    }
}
class orc_file {                   // a whole orch file
    unsigned int(16) length;
    orch_token data[length];
}
An orchestra file is a string of tokens. These tokens represent syntactic elements such as reserved words, core opcode names, and punctuation marks as given in the table in Annex A; in addition, there are five special tokens. Token 0xF0 is the symbol token; when it is encountered, the next 16 bits in the bitstream shall be a symbol number. Token 0xF1 is the value token; when it is encountered, the next 32 bits in the bitstream shall be a floating-point value. This token shall be used for all symbolic constants within the SAOL program except for those encountered in special integer contexts, as described in Subclause 5.8. Token 0xF2 is the integer token; when it is encountered, the next 32 bits in the bitstream shall be an unsigned integer value. Token 0xF3 is the string token; when it is encountered, the next several bits in the bitstream shall represent a character string (this token is currently unused). Token 0xFF is the end-of-orchestra token; this token has no syntactic function in the SAOL orchestra, but signifies the end of the orchestra file section of the bitstream.
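EXAMPLE (informative)
The token rules above can be sketched as a small decoder. This is an illustration only, under assumptions not stated in this subclause: the orc_file data is taken to be byte-aligned and big-endian, the 8-bit string length is read as unsigned, and the token value 0x10 in the usage lines is an arbitrary stand-in for an Annex A token. `parse_orch_tokens` is a hypothetical name.

```python
import struct


def parse_orch_tokens(data):
    """Decode an orc_file buffer into a list of (kind, value) pairs."""
    (count,) = struct.unpack_from(">H", data, 0)   # orc_file.length
    pos = 2
    tokens = []
    for _ in range(count):
        tok = data[pos]
        pos += 1
        if tok == 0xF0:                            # symbol: 16-bit number
            (sym,) = struct.unpack_from(">H", data, pos)
            pos += 2
            tokens.append(("symbol", sym))
        elif tok == 0xF1:                          # 32-bit float constant
            (val,) = struct.unpack_from(">f", data, pos)
            pos += 4
            tokens.append(("float", val))
        elif tok == 0xF2:                          # 32-bit unsigned integer
            (val,) = struct.unpack_from(">I", data, pos)
            pos += 4
            tokens.append(("int", val))
        elif tok == 0xF3:                          # length-prefixed string
            n = data[pos]
            pos += 1
            tokens.append(("string", data[pos:pos + n].decode("ascii")))
            pos += n
        elif tok == 0xFF:                          # end-of-orchestra
            tokens.append(("end", None))
            break
        else:                                      # an Annex A table token
            tokens.append(("token", tok))
    return tokens


data = (struct.pack(">H", 4) + bytes([0x10])
        + b"\xF0" + struct.pack(">H", 7)
        + b"\xF1" + struct.pack(">f", 0.5)
        + b"\xFF")
print(parse_orch_tokens(data))
# [('token', 16), ('symbol', 7), ('float', 0.5), ('end', None)]
```

The decoder only splits the stream into tokens; checking that the token sequence forms a legal orchestra is a separate step governed by the grammar in Subclause 5.4.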
Not every sequence of tokens is permitted to occur as an orchestra file. Subclause 5.4 contains extensive syntactic rules restricting the possible sequence of tokens, described according to the textual SAOL format. Normative rules for mapping back and forth between the tokenised format and the textual format are given in Subclause 5.8. The overall sequence of orchestra tokens shall correspond to an <orchestra> production as given in Subclause 5.4.4.
/*********************************
score file definitions
***********************************/
class instr_event {                // a note-on event
    bit(1) has_label;
    if (has_label)
        symbol label;
    symbol iname_sym;              // the instrument name
    float(32) dur;                 // note duration
    unsigned int(8) num_pf;
    float(32) pf[num_pf];          // all the pfields (no more than 255)
}
class control_event {              // a control event
    bit(1) has_label;
    if (has_label)
        symbol label;
    symbol varsym;                 // the controller name
    float(32) value;               // the new value
}
class table_event {
    symbol tname;                  // the name of the table
    bit(1) destroy;                // a table destructor
    if (!destroy) {
        token tgen;                // a core wavetable generator
        bit(1) refers_to_sample;
        if (refers_to_sample)
            symbol table_sym;      // the name of the sample
        unsigned int(16) num_pf;   // the number of pfields
        float(32) pf[num_pf];      // all the pfields
    }
}
class end_event {
    // fixed at nothing
}
class tempo_event {                // a tempo event
    float(32) tempo;
}
class score_line {
    float(32) time;                // the event time
    bit(3) type;
    switch (type) {
        case 0b000 : instr_event inst; break;
        case 0b001 : control_event control; break;
        case 0b010 : table_event table; break;
        case 0b100 : end_event end; break;
        case 0b101 : tempo_event tempo; break;
    }
}
class score_file {                 // a whole score file
    unsigned int(20) num_lines;
    score_line lines[num_lines];
}
A score file is a set of lines of score information provided in the stream information header. Thus, events which are known before the real-time bitstream transmission begins may be included in the header, so that they are available to the decoder immediately, which may aid efficient computation in certain implementations. Each line shall be one of five events. Each type of event has different implications in the decoding and scheduling process, see Subclause 5.3.3. An instrument event specifies the start time, instrument name symbol, duration, and any other parameters of a note played on a SAOL instrument. A control event specifies a control parameter which is passed to an instrument or instruments already generating sound. A table event dynamically creates or destroys a global wavetable in the orchestra. An end event signifies the end of orchestra processing. A tempo event dynamically changes the tempo of orchestra playback.
A score file need not be presented in increasing order of event times; the events shall be "sorted" by the scheduler as they are processed.
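EXAMPLE (informative)
Because score_line carries 3-bit and 1-bit fields, its contents are not byte-aligned. The following sketch reads a score_file bit by bit and then orders the events by time, as the paragraph above describes. It is an illustration under assumptions not confirmed here: fields are packed most significant bit first, floats are IEEE 754 big-endian, and simultaneous events keep their bitstream order. Only control, tempo and end events are handled; all helper names are hypothetical, and `pack_bits` exists only to build test input.

```python
import struct


class BitReader:
    """Reads bit fields, most significant bit first, from bytes."""

    def __init__(self, data):
        self.bits = "".join(format(byte, "08b") for byte in data)
        self.pos = 0

    def read(self, nbits):
        if self.pos + nbits > len(self.bits):
            raise EOFError("bitstream ended inside a field")
        chunk = self.bits[self.pos:self.pos + nbits]
        self.pos += nbits
        return int(chunk, 2) if chunk else 0

    def read_float(self):
        return struct.unpack(">f", self.read(32).to_bytes(4, "big"))[0]


def parse_score_line(r):
    time = r.read_float()
    kind = r.read(3)
    if kind == 0b001:                               # control_event
        label = r.read(16) if r.read(1) else None   # optional label symbol
        varsym = r.read(16)
        return (time, "control", label, varsym, r.read_float())
    if kind == 0b101:                               # tempo_event
        return (time, "tempo", r.read_float())
    if kind == 0b100:                               # end_event (no payload)
        return (time, "end")
    raise NotImplementedError("event type %d not handled in this sketch" % kind)


def parse_score_file(data):
    r = BitReader(data)
    return [parse_score_line(r) for _ in range(r.read(20))]


def pack_bits(fields):
    """Test helper: pack (value, nbits) pairs MSB-first, zero-padding the tail."""
    bits = "".join(format(value, "0%db" % nbits) for value, nbits in fields)
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def f32(x):
    """Test helper: the 32-bit pattern of a float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


# A tempo event at time 2.0 stored before a control event at time 1.0.
data = pack_bits([(2, 20),
                  (f32(2.0), 32), (0b101, 3), (f32(120.0), 32),
                  (f32(1.0), 32), (0b001, 3), (0, 1), (5, 16), (f32(0.25), 32)])
events = sorted(parse_score_file(data), key=lambda e: e[0])  # stable sort
print(events)
# [(1.0, 'control', None, 5, 0.25), (2.0, 'tempo', 120.0)]
```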
/*********************************
MIDI definitions
***********************************/
/* NB that a midi_file (SMF format) is not just an array
   of MIDI events */
class midi_event {
    // not done yet
}
class midi_file {
    /* Right now it's just an array of bytes; I'd rather do
       this than lay out the whole SMF format here */
    unsigned int(20) length;
    unsigned int(8) data[length];
}
The MIDI chunks allow the inclusion of MIDI score information in the bitstream header and bitstream. The MIDI event class contains a single MIDI instruction as specified in [MIDI]; the MIDI file class contains an array of bytes corresponding to a Standard MIDI File as specified in [MIDI]. Note that not every sequence of data may occur in either case; the legal syntaxes of MIDI events and Standard MIDI Files as specified in [MIDI] place normative bounds on syntactically valid MPEG-4 Structured Audio bitstreams. The semantics of MIDI data are given in Subclause 5.9 (for Profile 1 and 2 implementations) and Subclause 5.10 (for Profile 4 implementations).
/**********************************
sample data
************************************/
class sample {
/* note that 'sample' can be used for any big chunk of data
which needs to get into a wavetable */
symbol sample_name_sym;
unsigned int(24) length; // length in samples
bit(1) has_srate;
if (has_srate)
unsigned int(17) srate; // sampling rate (needs to go to 96 kHz)
bit(1) has_loop;
if (has_loop) {
unsigned int(24) loopstart; // loop points in samples
unsigned int(24) loopend;
}
bit(1) has_base;
if (has_base)
float(32) basecps; // base freq in Hz
bit(1) float_sample;
if (float_sample) {
float(32) float_sample_data[length];
}
else {
int(16) sample_data[length]; // all the data
}
}
A sample chunk includes a block of data which will be included in a wavetable in a SAOL orchestra. Each sample consists of a name, a length, a block of data, and four optional parameters: the sampling rate, the loop start and loop end points, and the base frequency. Access to the data in the sample is provided through the sample core wavetable generator; see Subclause 5.6.2.
The sample data may be represented either as 32-bit floating-point values, in which case it shall be scaled between -1 and 1, or as 16-bit integer values, in which case it shall be scaled between -32768 and 32767. In the case that the sample data is represented as integer values, upon inclusion in a wavetable, it shall be rescaled to floating-point as described in Subclause 5.6.2.
/**********************************
sample bank data
************************************/
The sample bank chunk describes a bank of wavetable data and associated processing parameters for use with the sample bank synthesis procedure in Subclause 5.9.
const int sbf_chunk_ID = 0x7366626b; // ‘sfbk’
const int INFO_list_ID = 0x494e464f; // ‘INFO’
const int ifil_chunk_ID = 0x6966696c; // ‘ifil’
const int isng_chunk_ID = 0x69736e67; // ‘isng’
const int INAM_chunk_ID = 0x494e414d; // ‘INAM’
const int irom_chunk_ID = 0x69726f6d; // ‘irom’
const int iver_chunk_ID = 0x69766572; // ‘iver’
const int ICRD_chunk_ID = 0x49435244; // ‘ICRD’
const int IENG_chunk_ID = 0x49454e47; // ‘IENG’
const int IPRD_chunk_ID = 0x49505244; // ‘IPRD’
const int ICOP_chunk_ID = 0x49434f50; // ‘ICOP’
const int ICMT_chunk_ID = 0x49434d54; // ‘ICMT’
const int ISFT_chunk_ID = 0x49534654; // ‘ISFT’
const int sdta_chunk_ID = 0x73647461; // ‘sdta’
const int smpl_chunk_ID = 0x736d706c; // ‘smpl’
const int pdta_chunk_ID = 0x70647461; // ‘pdta’
const int phdr_chunk_ID = 0x70686472; // ‘phdr’
const int pbag_chunk_ID = 0x70626167; // ‘pbag’
const int pmod_chunk_ID = 0x706d6f64; // ‘pmod’
const int pgen_chunk_ID = 0x7067656e; // ‘pgen’
const int inst_chunk_ID = 0x696e7374; // ‘inst’
const int ibag_chunk_ID = 0x69626167; // ‘ibag’
const int imod_chunk_ID = 0x696d6f64; // ‘imod’
const int igen_chunk_ID = 0x6967656e; // ‘igen’
const int shdr_chunk_ID = 0x73686472; // ‘shdr’
aligned(16) class chunk: bit(32) ckID = 0x00000000 {
unsigned int(32) ckSize; // size of chunk data in bytes
}
class ifil_chunk extends chunk: bit(32) ckID = ifil_chunk_ID {
unsigned int(16) wMajor; // file format version number
unsigned int(16) wMinor;
}
class isng_chunk extends chunk: bit(32) ckID = isng_chunk_ID {
char(8) isng[ck_hdr.ckSize]; // sound engine identifier
}
class INAM_chunk extends chunk: bit(32) ckID = INAM_chunk_ID {
char(8) INAM[ck_hdr.ckSize]; // bank name
}
class irom_chunk extends chunk: bit(32) ckID = irom_chunk_ID {
char(8) irom[ck_hdr.ckSize]; // rom name
}
class iver_chunk extends chunk: bit(32) ckID = iver_chunk_ID {
unsigned int(16) wMajor; // rom version
unsigned int(16) wMinor;
}
class ICRD_chunk extends chunk: bit(32) ckID = ICRD_chunk_ID {
char(8) ICRD[ck_hdr.ckSize]; // creation date
}
class IENG_chunk extends chunk: bit(32) ckID = IENG_chunk_ID {
char(8) IENG[ck_hdr.ckSize]; // sound designer name
}
class IPRD_chunk extends chunk: bit(32) ckID = IPRD_chunk_ID {
char(8) IPRD[ck_hdr.ckSize]; // product name
}
class ICOP_chunk extends chunk: bit(32) ckID = ICOP_chunk_ID {
char(8) ICOP[ck_hdr.ckSize]; // copyright string
}
class ICMT_chunk extends chunk: bit(32) ckID = ICMT_chunk_ID {
char(8) ICMT[ck_hdr.ckSize]; // comment string
}
class ISFT_chunk extends chunk: bit(32) ckID = ISFT_chunk_ID {
char(8) ISFT[ck_hdr.ckSize]; // tool name
}
class INFO_list extends chunk: bit(32) ckID = INFO_list_ID {
ifil_chunk ifil_ck;
isng_chunk isng_ck;
INAM_chunk INAM_ck;
aligned(16) bit(32)* test0;
if (test0 == irom_chunk_ID) {
irom_chunk irom_ck;
}
aligned(16) bit(32)* test1;
if (test1 == iver_chunk_ID) {
iver_chunk iver_ck;
}
aligned(16) bit(32)* test2;
if (test2 == ICRD_chunk_ID) {
ICRD_chunk ICRD_ck;
}
aligned(16) bit(32)* test3;
if (test3 == IENG_chunk_ID) {
IENG_chunk IENG_ck;
}
aligned(16) bit(32)* test4;
if (test4 == IPRD_chunk_ID) {
IPRD_chunk IPRD_ck;
}
aligned(16) bit(32)* test5;
if (test5 == ICOP_chunk_ID) {
ICOP_chunk ICOP_ck;
}
aligned(16) bit(32)* test6;
if (test6 == ICMT_chunk_ID) {
ICMT_chunk ICMT_ck;
}
aligned(16) bit(32)* test7;
if (test7 == ISFT_chunk_ID) {
ISFT_chunk ISFT_ck;
}
}
class smpl_chunk extends chunk: bit(32) ckID = smpl_chunk_ID {
int(16) smpl[ck_hdr.ckSize / 2]; // sample data
}
class sdta_list extends chunk: bit(32) ckID = sdta_chunk_ID {
smpl_chunk smpl_ck;
}
class phdr_chunk extends chunk: bit(32) ckID = phdr_chunk_ID {
unsigned int i;
for (i = 0; i < ck_hdr.ckSize / 38; i++) {
char(8) achPresetName[20];
unsigned int(16) wPreset;
unsigned int(16) wBank;
unsigned int(16) wPresetBagNdx;
unsigned int(32) dwLibrary;
unsigned int(32) dwGenre;
unsigned int(32) dwMorphology;
}
}
class bag_chunk extends chunk {
unsigned int i;
for (i = 0; i < ck_hdr.ckSize / 4; i++) {
unsigned int(16) wGenNdx;
unsigned int(16) wModNdx;
}
}
class pbag_chunk extends bag_chunk: bit(32) ckID = pbag_chunk_ID {
}
class ibag_chunk extends bag_chunk: bit(32) ckID = ibag_chunk_ID {
}
class mod_chunk extends chunk {
unsigned int i;
for (i = 0; i < ck_hdr.ckSize / 10; i++) {
unsigned int(16) sfModSrcOper;
unsigned int(16) sfModDestOper;
int(16) modAmount;
unsigned int(16) sfModAmtSrcOper;
unsigned int(16) sfModTransOper;
}
}
class pmod_chunk extends mod_chunk: bit(32) ckID = pmod_chunk_ID {
}
class imod_chunk extends mod_chunk: bit(32) ckID = imod_chunk_ID {
}
class gen_chunk extends chunk {
unsigned int i;
for (i = 0; i < ck_hdr.ckSize / 4; i++) {
unsigned int(16) sfGenOper;
bit(16) genAmount;
}
}
class pgen_chunk extends gen_chunk: bit(32) ckID = pgen_chunk_ID {
}
class igen_chunk extends gen_chunk: bit(32) ckID = igen_chunk_ID {
}
class inst_chunk extends chunk: bit(32) ckID = inst_chunk_ID {
unsigned int i;
for (i = 0; i < ck_hdr.ckSize / 22; i++) {
char(8) achInstName[20];
unsigned int(16) wInstBagNdx;
}
}
class shdr_chunk extends chunk: bit(32) ckID = shdr_chunk_ID {
unsigned int i;
for (i = 0; i < ck_hdr.ckSize / 46; i++) {
char(8) achSampleName[20];
unsigned int(32) dwStart;
unsigned int(32) dwEnd;
unsigned int(32) dwStartloop;
unsigned int(32) dwEndloop;
unsigned int(32) dwSampleRate;
unsigned int(8) byOriginalPitch;
int(8) chCorrection;
unsigned int(16) wSampleLink;
unsigned int(16) sfSampleType;
}
}
class pdta_list extends chunk: bit(32) ckID = pdta_chunk_ID {
phdr_chunk phdr_ck; // preset headers
pbag_chunk pbag_ck; // preset index list
pmod_chunk pmod_ck; // preset modulator list
pgen_chunk pgen_ck; // preset generator list
inst_chunk inst_ck; // instrument names and indices
ibag_chunk ibag_ck; // instrument index list
imod_chunk imod_ck; // instrument modulator list
igen_chunk igen_ck; // instrument generator list
shdr_chunk shdr_ck; // sample headers
}
class sbf extends chunk: bit(32) ckID = sbf_chunk_ID {
INFO_list INFO_lt;
sdta_list sdta_lt;
pdta_list pdta_lt;
}
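The chunk identifier constants in the sample bank syntax are four ASCII characters packed into a 32-bit value, most significant byte first, as the comments beside each constant indicate. The following informative sketch shows that packing; the function names are hypothetical and are not part of the bitstream syntax.

```python
def fourcc_to_int(tag: str) -> int:
    """Pack a four-character ASCII tag into a 32-bit big-endian integer."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a chunk identifier must be exactly four characters")
    return int.from_bytes(raw, "big")


def int_to_fourcc(value: int) -> str:
    """Unpack a 32-bit integer back into its four-character tag."""
    return value.to_bytes(4, "big").decode("ascii")
```

For example, fourcc_to_int("sfbk") gives 0x7366626b and int_to_fourcc(0x70686472) gives "phdr", which agrees with the constants sbf_chunk_ID and phdr_chunk_ID.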
/***********************************
bitstream formats
***********************************/
class SA_decoder_config { // the bitstream header
bit more_data = 1;
while (more_data) { // must have at least one chunk
bit(3) chunk_type;
switch (chunk_type) {
case 0b000 : orc_file orc; break;
case 0b001 : score_file score; break;
case 0b010 : midi_file SMF; break;
case 0b011 : sample samp; break;
case 0b100 : sbf sample_bank; break;
case 0b101 : symtable sym; break;
}
bit(1) more_data;
}
}
The bitstream decoder configuration contains all the information required to configure and start up a Structured Audio decoder. It contains a sequence of one or more chunks, where each chunk is of one of the following types: orchestra file, score file, MIDI file, sample data, sample bank, or symbol table.
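The following informative sketch walks the chunk loop of SA_decoder_config for the restricted case in which every chunk is a midi_file, since the other chunk classes depend on definitions outside this passage. It assumes that fields are read most significant bit first and that no alignment is inserted between fields; BitReader and parse_midi_only_config are hypothetical names, and the MIDI data is returned without validation against [MIDI].

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative)."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        """Read nbits bits and return them as an unsigned integer."""
        value = 0
        for _ in range(nbits):
            byte_index, bit_index = divmod(self._pos, 8)
            if byte_index >= len(self._data):
                raise EOFError("ran out of bitstream data")
            bit = (self._data[byte_index] >> (7 - bit_index)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


CHUNK_MIDI_FILE = 0b010  # chunk_type value for midi_file


def parse_midi_only_config(data: bytes) -> list:
    """Return the byte content of each midi_file chunk in the header."""
    reader = BitReader(data)
    files = []
    more_data = 1  # the header must hold at least one chunk
    while more_data:
        chunk_type = reader.read(3)
        if chunk_type != CHUNK_MIDI_FILE:
            raise NotImplementedError(
                f"chunk type {chunk_type:03b} is not handled in this sketch"
            )
        length = reader.read(20)  # midi_file.length
        files.append(bytes(reader.read(8) for _ in range(length)))
        more_data = reader.read(1)
    return files
```

A full decoder would dispatch on all six chunk_type values; the loop structure and the trailing more_data bit remain the same.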
class SA_access_unit { // the streaming data
bit(2) event_type;
switch (event_type) {
case 0b00 : score_line score_ev; break;
case 0b01 : midi_event midi_ev; break;
case 0b10 : sample samp; break;
}
}
The Structured Audio access unit contains real-time streaming control information to be provided to a running Structured Audio decoding process. It shall not contain new instrument definitions; the orchestra configuration is fixed at decoder startup. It may contain score lines, MIDI events, and new sample data.
There are three profiles standardised for Structured Audio, called Profile 1, Profile 2, and Profile 4. Each of these profiles corresponds to a particular set of application requirements. The default profile is Profile 4; when reference is made to MPEG-4 Structured Audio format without reference to a profile, it shall be understood that the reference is to Profile 4.
Terminals implementing MPEG-4 Systems Audio Composition Profile XXX (see ISO/IEC 14496-1, Subclause XXX) shall also implement Structured Audio Profile 4.
1. MIDI only. In this profile, only the midi_file chunk shall occur in the stream information header, and only the midi_event event shall occur in the bitstream data. In this profile, the General MIDI patch mappings are used, and the decoding process is described in Subclause 5.9. This profile is used to enable backward-compatibility with existing MIDI content and rendering devices. Implementation-independent sound quality cannot be produced in this profile.
4. Standard profile. All bitstream elements and stream information elements may occur.
The decoding process for Profile 4 is described in Subclause 5.3.
Decoding process
Introduction
This Subclause describes the decoding process, in which a bitstream conforming to Profile 4 is converted into sound. The decoding process for Profile 1 bitstreams is described in Subclause 5.9, and the decoding process for Profile 2 bitstreams in Subclause 5.9.6.
At the creation of a Structured Audio Elementary Stream, a Structured Audio decoder is instantiated and a bitstream object of class SA_decoder_config provided to that decoder as configuration information. At this time, the decoder shall initialise a run-time scheduler, and then parse the stream information object into its component parts and use them as follows:
· Orchestra file: The orchestra file shall be checked for syntactic conformance with the SAOL grammar and rate semantics as specified in Subclause 5.4. Whatever preprocessing (i.e., compilation, allocation of static storage, etc.) need be done to prepare for orchestra run-time execution shall be performed.
· Score file: Each event in the score file shall be registered with the scheduler. To "register" means to inform the scheduler of the presence of a particular parametrised event at a particular future time, and the scheduler’s associated actions.
· MIDI file: Each event in the MIDI file shall be converted into an appropriate event as described in Subclause 5.9, and those events registered with the scheduler.
· Sample bank: The data in the bank shall be stored, and whatever preprocessing necessary to prepare for using the bank for synthesis shall be performed.
· Sample data: The data in the sample shall be stored, and whatever preprocessing necessary to prepare the data for reference from a SAOL wavetable generator shall be performed. If the sample data is represented as 16-bit integers in the bitstream, it shall be converted to floating-point format at this time.
If there is more than one orchestra file in the stream information header, the various files are combined together via concatenation and processed as one large orchestra file. That is, each orchestra file within the bitstream refers to the same global namespace, instrument namespace, and opcode namespace.
At each time step within the systems operation, the systems layer may present the Structured Audio decoder with an Access Unit containing data conforming to the SA_access_unit class. The run-time responsibility of the Structured Audio decoder is to receive these AU data elements, parse and understand them as the various Structured Audio bitstream data elements, execute the on-going SAOL orchestra, via the scheduler, to produce one Composition Unit of output, and present the systems layer with that Composition Unit.
As Access Units are received from the systems demultiplexer, they are parsed and used by the Structured Audio decoder in various ways, as follows:
· Score line events shall be registered with the scheduler.
· MIDI events shall be converted into appropriate SAOL events (see Subclause 5.10) and then registered with the scheduler, if they have time stamps, or executed in the next k-cycle, if not.
· Sample data shall be stored, and whatever preprocessing is necessary for reference by forthcoming score lines containing references to that sample shall be performed. If the sample data is represented as 16-bit integers in the bitstream, it shall be converted to floating-point format at this time.
The scheduler is the central control mechanism of a Structured Audio decoding system. It is responsible for handling events by instantiating and terminating instruments, keeping track of what instrument instantiations are active, instructing the various instrument instantiations to perform synthesis, routing the output of instruments onto busses, and sending busses to effects instruments. Although there are many ways to perform these tasks, the exact nature of what must be done can be clearly specified. This Subclause provides normative bounds on the activities of the scheduler.
To instantiate an instrument is to create data space for its variables and the data space required for any opcodes called by that instrument. When an instrument is instantiated, the following tasks shall be performed. First, space for any parameter fields shall be allocated and their values set according to the p-fields of the instantiating expression or event. Then, space for any locally declared variables shall be allocated and these variable values set to 0. Then, the current values of any imported i-rate variables shall be copied into the local storage space. Then, locally declared wavetables shall be created and filled with data according to their declaration and the appropriate rules in Subclause 5.6.
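The order of the instantiation steps can be illustrated by the following informative sketch. The classes InstrumentDef and InstrumentInstance and the function instantiate are hypothetical; the representation of the data space as a dictionary is an assumption, and wavetable creation under Subclause 5.6 is left out.

```python
from dataclasses import dataclass, field


@dataclass
class InstrumentDef:
    """Hypothetical static description of an instrument."""
    name: str
    pfield_names: list
    local_names: list
    imported_irate_names: list


@dataclass
class InstrumentInstance:
    """Hypothetical per-instantiation data space."""
    definition: InstrumentDef
    variables: dict = field(default_factory=dict)


def instantiate(definition, pfield_values, global_vars):
    """Create an instance, following the order of steps given above."""
    inst = InstrumentInstance(definition)
    # 1. Parameter fields take the values of the instantiating event.
    #    (Handling of a p-field count mismatch is outside this sketch.)
    for name, value in zip(definition.pfield_names, pfield_values):
        inst.variables[name] = float(value)
    # 2. Locally declared variables are allocated and set to 0.
    for name in definition.local_names:
        inst.variables[name] = 0.0
    # 3. Current values of imported i-rate variables are copied in.
    for name in definition.imported_irate_names:
        inst.variables[name] = global_vars[name]
    # 4. Local wavetable creation (Subclause 5.6) is omitted here.
    return inst
```

Terminating the instance then corresponds to discarding the InstrumentInstance object.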
To terminate an instrument instantiation is to destroy the data space for that instance.
To execute an instrument instantiation at a particular rate is to calculate the results of the instructions given in that instrument definition. When an instrument instance is executed at a particular rate, the following steps shall be performed. First, the values of any global variables and wavetables imported by that instrument at that rate shall be copied into the storage space of the instrument. In addition, when executing at the a-rate an instrument instance which is the target of a send statement, the current value of the input standard name in the instance shall be set to the current value of the bus or busses referenced in the send statement. Then, the code block for that instrument shall be executed at the particular rate with regard to the data space of the instrument instantiation, as given by the rules in Subclause 5.4.6.6. Then, the values of any global variables and wavetables exported by that instrument at that rate shall be copied into the global storage space. Finally, when executing an instrument instantiation at the a-rate, the value of the instance output shall be added to the bus onto which the instrument is routed according to the rules in Subclause 5.4.5.4, unless the instance is the target of a send expression referencing the special bus output_bus, in which case the output of the instrument instance is the output of the orchestra and may be turned into sound.
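The sequence of steps for one a-rate execution can be sketched informally as follows. Everything about the data layout is an assumption made for illustration: the instance is a dictionary with hypothetical keys, a bus holds one value per channel, the busses named in a send statement are concatenated to form the input, and a route_bus of None stands for the case in which the instance output is the orchestra output.

```python
def execute_a_rate(instance, global_vars, busses, run_code):
    """One a-rate pass for a single instance (illustrative only).

    instance: dict with hypothetical keys 'vars', 'imports', 'exports',
              'send_busses' (list of bus names or None) and
              'route_bus' (bus name or None).
    run_code: callable that takes the variable dict and returns the
              per-channel output values for this pass.
    """
    local = instance["vars"]
    # Copy imported global values into the instance data space.
    for name in instance["imports"]:
        local[name] = global_vars[name]
    # A send target sees the referenced bus contents as its input.
    if instance["send_busses"] is not None:
        local["input"] = [
            value for bus in instance["send_busses"] for value in busses[bus]
        ]
    # Execute the instrument code block on the instance data space.
    output = run_code(local)
    # Copy exported values back into global storage.
    for name in instance["exports"]:
        global_vars[name] = local[name]
    # Add the output onto the routed bus, channel by channel.
    bus_name = instance["route_bus"]
    if bus_name is not None:
        bus = busses[bus_name]
        for channel, value in enumerate(output):
            bus[channel] += value
    return output
```

Because the output is added to the bus rather than assigned, several instances routed to the same bus are mixed by summation.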
At orchestra startup time, before the first Composition Unit of audio samples is created in the scheduler, the following tasks shall be performed. First, space for any global signal variables (see Subclause 5.4.5.3) shall be allocated and their values set to zero. If there is an instrument called startup in the orchestra, that instrument shall be instantiated and executed at the i-rate. After this execution is complete, then all global wavetables are created and filled with data according to their definitions in the global block of the orchestra and the appropriate rules in Subclause 5.6.
After the global wavetable creation, the orchestra busses are initialised. Each bus’s width is determined, in the order specified by the global sequencing rules (Subclause 5.4.5.6), as the width of the output expression by instruments on that bus. For the purposes of calculating bus widths, any instrument which does not receive any bus data according to the sequence rules shall have an inchannels width of 0 (this specification is needed since output widths may depend on the value of inchannels).
After busses are created, all instruments which are the targets of send statements as described in Subclause 5.4.5.5 shall be instantiated and executed at the i-rate in the order specified by the global sequencing rules described in the global block according to Subclause 5.4.5.6. Finally, the global absolute orchestra time shall be set to 0.
NOTE
A time is called absolute if it is specified in seconds. When a tempo instruction is first decoded and the value of tempo changes from its default value, the score time and the absolute time are not identical anymore; all the times in the score, subsequent to a tempo line execution, are scaled according to the new tempo and enqueued in absolute dispatch and duration times as specified in Subclause 5.3.3.3.6, list item 7.
In each orchestra cycle, one Composition Unit of samples is produced by the real-time synthesis process. This synthesis is performed according to the rules below and the resulting orchestra output, as described in list item 11, is presented to the Systems layer as a Composition Unit. To execute one orchestra cycle, the following tasks shall be performed in the order denoted:
NOTE
If the current orchestra time differs from the tempo dispatch time, the former shall be used to calculate the new durations and future dispatch times of events.
With regard to all normative language in this Subpart of ISO/IEC 14496-3, conformance to the normative language is measured at the time of orchestra output. Any optimisation of SAOL code or rearrangement of processing sequence may be performed as long as to do so has no effect on the output of the orchestra. "Has no effect" in this sense means that the output of the rearranged or optimised orchestra is sample-by-sample identical to the output of the original orchestra according to the decoding rules given in this Subpart.
SAOL syntax and semantics
Relationship with bitstream syntax
The bitstream syntax description as given in Subclause 5.1 specifies the representation of SAOL instruments and algorithms that shall be presented to the decoder in the bitstream. However, the tokenised description as presented there is not adequate to describe the SAOL language syntax and semantics. In addition, for purposes of enabling bitstream creation and exchange in a robust manner, it is useful to have a standard human-readable textual representation of SAOL code in addition to the tokenised binary format.
The Backus-Naur Form (BNF) grammar presented in this Subclause denotes a language, or an infinite set of programs; the legal programs which may be transmitted in the bitstream are restricted to this set. Any program which cannot be parsed by this grammar is not a legal SAOL program (it has a syntax error), and a bitstream containing it is an invalid bitstream. Although the bitstream is made up of tokens, the grammar will be described in terms of lexical elements, that is, a textual representation, for clarity of presentation. The syntactic rules expressed by the grammar which restrict the set of textual programs also normatively restrict the syntax of the bitstream, through the relationship of the bitstream and the textual format in the normative tokenisation process.
This Subclause thus describes a textual representation of SAOL which is standardised, but stands outside of the bitstream-decoder relationship. Subclause 5.8 describes the mapping between this textual representation and the bitstream representation. The exact normative semantics of SAOL will be described in reference to the textual representation, but also apply to the tokenised bitstream representation as created via the normative tokenisation mapping.
Annex C contains a grammar for the SAOL textual language, represented in the ‘lex’ and ‘yacc’ formats. Using these versions of the grammar, parsers can be automatically created using the ‘lex’ and ‘yacc’ tools. However, these versions are for informative purposes only; there is no requirement to use these tools in building a decoder.
Normative language regarding syntax in this Subclause provides bounds on syntactically legal SAOL programs, and by extension, the syntactically legal bitstream sequences which can appear in an orchestra bitstream class. That is, there are constructions which appear to be permissible upon reading only the BNF grammar, but are disallowed in the normative text accompanying the grammar. The status of such constructions is exactly that of those which are outside of the language defined by the grammar alone. In addition, normative language describing static rate semantics further bounds the set of syntactically legal SAOL programs, and by extension, the set of syntactically legal bitstream sequences.
The decoding process for bitstreams containing syntactically illegal SAOL programs (i.e., SAOL programs which do not conform to the BNF grammar, or contain syntax errors or rate mismatch errors) is unspecified.
Normative language regarding semantics in this Subclause describes the semantic bounds on the behaviour of the Structured Audio decoder. Certain constructions describe "run-time error" situations; the behaviour of the decoder in such circumstances is not normative, but implementations are encouraged to recover gracefully from such situations and continue decoding if possible.
Concepts
The textual SAOL orchestra contains punctuation marks, which syntactically disambiguate the orchestra; identifiers, which denote symbols of the orchestra; numbers, which denote constant values; string constants, which are not currently used; comments, which allow internal documentation of the orchestra; and whitespace, which lexically separates the various textual elements. These elements do not occur in the bitstream – since each is represented there by a token – but we define them here to ground the subsequent discussion of SAOL. Within the rest of Subclause 5.4, when we discuss the semantics of "an identifier", this shall be taken to normatively refer to the semantics of the symbol denoted by that identifier; the language used is for clarity of presentation.
A lexical grammar for parsing SAOL, written in the ‘lex’ language, is provided for informative purposes in Annex 5.C.
An identifier is a series of one or more letters, digits and the underscore that begins with a letter or underscore; it denotes a symbol of the orchestra. Every identifier which consists of the same characters in the first 16 characters (is equivalent under string comparison to the first 16 characters) denotes the same symbol. Identifiers are case-sensitive, meaning that identifiers which differ only in the case of one or more characters denote different symbols.
A string of characters equivalent to one of the reserved words listed in Subclause 5.4.9, to one of the standard names listed in Subclause 5.4.6.8, to the name of one of the core opcodes listed in Subclause 5.5.3, or to the name of one of the core wavetable generators listed in Subclause 5.6 does not denote a symbol, but rather denotes that reserved word, standard name, core opcode, or core wavetable generator.
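The identifier rules above can be illustrated by the following informative sketch. It assumes that "letters" means the ASCII letters, and it does not test for reserved words, standard names, core opcodes or core wavetable generators, since those lists lie outside this passage. The names is_identifier and symbol_key are hypothetical.

```python
import re

# One or more letters, digits and underscores, not starting with a digit.
# Assumption: "letter" is taken to mean an ASCII letter.
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

SIGNIFICANT_CHARS = 16  # only the first 16 characters distinguish symbols


def is_identifier(text: str) -> bool:
    """Return True if text has the lexical form of an identifier."""
    return _IDENT_RE.fullmatch(text) is not None


def symbol_key(identifier: str) -> str:
    """Return the key under which identifiers denote the same symbol."""
    if not is_identifier(identifier):
        raise ValueError(f"not an identifier: {identifier!r}")
    # Case is preserved, so 'Freq' and 'freq' give different keys.
    return identifier[:SIGNIFICANT_CHARS]
```

Under this sketch, a_very_long_identifier_one and a_very_long_identifier_two denote the same symbol, because they agree in their first 16 characters.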
An identifier is denoted in the BNF grammar below by the terminal symbol <ident>.
There are two kinds of symbolic constants which hold numeric values in SAOL: integer constants and floating-point constants.
The integer constant must occur in certain contexts, such as array definitions. An integer token is a series of one or more digits. Since the contexts in which integers must occur in SAOL do not allow negative values, there is no provision for negative integers. A string of characters which appears to be a negative integer shall be lexically analysed as a floating-point constant. No integer constant greater than 2^32 (4294967296) shall occur in the orchestra.
An integer constant is denoted in the BNF grammar below by the terminal symbol <int>.
The floating-point constant occurs in SAOL expressions, and denotes a constant numeric value. A floating-point token consists of a base, optionally followed by an exponent. A base is either a series of one or more digits, optionally followed by a decimal point and a series of zero or more digits, or a decimal point followed by a series of one or more digits. An exponent is the letter e, optionally followed by either a + or - character, followed by a series of one or more digits. Since the floating-point constant appears in a SAOL expression, where the unary negation operator is always available, floating-point constants need not be lexically negative. Every floating-point constant in the orchestra shall be representable by a 32-bit floating-point number.
A floating-point constant is denoted in the BNF grammar below by the terminal symbol <number>.
String constants are not used in the normative SAOL specification, but a description is provided here so that they may be treated consistently by implementors who choose to add functionality over and above normative requirements to their implementations.
A string constant denotes a constant string value, that is, a character sequence. A string constant is a series of characters enclosed in double quotation marks ("). The double quotation character may be included in the string constant by preceding it with a backslash (\) character. Any other character, including the line-break (newline) character, may be explicitly enclosed in the quotation marks.
The interpretation and use of string constants is left open to implementors.
Comments may be used in the textual SAOL representation to internally document an orchestra. However, they are not included in the bitstream, and so are lost on a tokenisation/detokenisation sequence.
A comment is any series of characters beginning with two slashes (//), and terminating with a new line. During lexical analysis, whenever the // element is found on a line, the rest of the line is ignored.
Whitespace serves to lexically separate the various elements of a textual SAOL orchestra. It has no syntactic function in SAOL, and is not represented in the bitstream, so the exact whitespacing of a textual orchestra is lost on a tokenisation/detokenisation sequence.
A whitespace is any series of one or more space, tab, and/or newline characters.
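The lexical forms of integer constants, floating-point constants and comments described in this Subclause can be expressed informally with regular expressions, as in the following sketch. The function names are hypothetical; the sketch does not check that a floating-point constant is representable in 32 bits, and it is not a substitute for the informative 'lex' grammar referred to above.

```python
import re

# Integer token: one or more digits.
_INT_RE = re.compile(r"[0-9]+")

# Floating-point token: a base (digits with optional fraction, or a
# decimal point followed by digits), then an optional exponent
# introduced by the letter e.
_NUMBER_RE = re.compile(r"(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:e[+-]?[0-9]+)?")

MAX_INT = 2 ** 32  # no integer constant greater than this may occur


def is_int_token(text: str) -> bool:
    """True if text is a lexically valid integer constant in range."""
    return _INT_RE.fullmatch(text) is not None and int(text) <= MAX_INT


def is_number_token(text: str) -> bool:
    """True if text has the lexical form of a floating-point constant."""
    return _NUMBER_RE.fullmatch(text) is not None


def strip_comment(line: str) -> str:
    """Drop everything from the first '//' to the end of the line."""
    return line.split("//", 1)[0]
```

A series of digits such as 100 satisfies both patterns; which terminal symbol it denotes depends on the syntactic context in which it occurs.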
Each variable within the SAOL orchestra holds a value, or an ordered set of values for array variables, as an intermediate calculation by the orchestra. At any point in time, the value of a variable, sample in a wavetable, or single element of an array variable, shall be represented by a 32-bit floating-point value.
Conformance to this Subclause is in accordance with Subclause 5.3.4; that is, implementations are free to use any internal representation for variable values, so long as the results calculated are identical to the results of the calculations using 32-bit floating-point values.
NOTE
For certain sensitive digital-filtering operations, the results of using greater precision in a calculation may be as detrimental to orchestra output as the results of using less precision, as the stability of the filter may be critically dependent on the quantization error which is provided with 32-bit values. It is strongly deprecated for bitstreams to contain code which generates widely different results when calculated with 32-bit and 64-bit arithmetic.
At orchestra output, the values calculated by the orchestra should reside between a minimum value of –1 and a maximum value of 1. These values at orchestra output represent the maximum negatively- and positively-valued audio samples which can be produced by the terminal. If the values calculated by the orchestra fall outside that range, they are clipped to [-1,1] as described in Subclause 5.3.3.3 list item 11. When the terminal presents the sound to a listener, it is likely that further rescaling of the signal will be necessary, as required by the particular digital-analog converter present in the terminal. This scaling is not done by the orchestra, but is outside the scope of the standard and happens after all processing described in Subclause 5.3.3.3 is completed.
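The clipping of orchestra output to the range [-1,1] can be illustrated as follows. This is an informative sketch only: the function name is hypothetical, and the normative description of the output stage is the one in Subclause 5.3.3.3, which is not reproduced here.

```python
def clip_output(samples):
    """Limit each output sample to the range [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s)) for s in samples]
```

Values already inside the range pass through unchanged; any later rescaling for the digital-analog converter happens outside the orchestra.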
<orchestra> -> <orchestra element> <orchestra>
<orchestra> -> <orchestra element>
The orchestra is the collection of signal processing routines and declarations that make up a Structured Audio processing description. It shall consist of a list of one or more orchestra elements.
<orchestra element> -> <global block>
<orchestra element> -> <instrument declaration>
<orchestra element> -> <opcode declaration>
<orchestra element> -> <template declaration>
<orchestra element> -> NULL
There are four kinds of orchestra elements:
1. The global block contains instructions for global orchestra parameters, bus routings, global variable declarations, and instrument sequencing. It is not permissible to have more than one global block in an orchestra.
2. Instrument declarations describe sequences of processing instructions which can be parametrically controlled using SASL or MIDI score files.
3. Opcode declarations describe sequences of processing instructions which provide encapsulated functionality used by zero or more instruments in the orchestra.
4. Template declarations describe multiple instruments which differ only slightly using a concise parametric form.
Orchestra elements may appear in any order within the orchestra; in particular, opcode definitions may occur either syntactically before or after they are used in instruments or other opcodes.
Syntactic form
<global block> -> global { <global list> }
<global list> -> <global statement> <global list>
<global list> -> NULL
A global block shall contain a global list, which shall consist of a sequence of zero or more global statements.
<global statement> -> <global parameter>
<global statement> -> <global variable declaration>
<global statement> -> <route statement>
<global statement> -> <send statement>
<global statement> -> <sequence definition>
There are five kinds of global statement:
1. Global parameters set orchestra parameters such as sampling rate, control rate, and number of input and output channels of sound.
2. Global variable declarations define global variables which can be shared by multiple instruments.
3. Route statements describe the routing of instrument outputs onto busses.
4. Send statements describe the sending of busses to effects instruments.
5. Sequence definitions describe the sequencing of instruments by the run-time scheduler.
<global parameter> -> srate <int>;
The srate global parameter specifies the audio sampling rate of the orchestra. The decoding process shall create audio internally at this sampling rate. It is not permissible to simplify orchestra complexity or account for terminal capability by generating audio internally at other sampling rates, for to do so may have seriously detrimental effects on certain processing elements of the orchestra.
The srate parameter shall be an integer value between 4000 and 96000 inclusive, specifying the audio sampling rate in Hz. If the srate parameter is not provided in an orchestra, the default shall be the highest sampling rate among the audio signals provided as input (see Subclause 5.11). If the srate parameter is not provided, and there are no input audio signals, the default sampling rate shall be 32000 Hz.
<global parameter> -> krate <int>;
The krate global parameter specifies the control rate of the orchestra. The decoding process shall execute k-rate processing internally at this rate. It is not permissible to simplify orchestra complexity or account for terminal capability by executing k-rate processing at other rates, unless it can be determined that to do so will have no effect on orchestra output. In this case, "no effect" means that the resulting output of the orchestra is sample-by-sample identical to the output created if the control rate is not altered.
The krate parameter shall be an integer value between 1 and the sampling rate inclusive, specifying the control rate in Hz. If the krate parameter is not provided in an orchestra, the default control rate shall be 100 Hz.
If the control rate as determined by the previous paragraph does not divide the sampling rate exactly, then the control rate is the next larger integer which does divide the sampling rate exactly. The control period of the orchestra is the number of samples, or amount of time represented by these samples, in one control cycle.
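The control-rate adjustment described above can be sketched as follows. This is an informative illustration only; the function names effective_krate and control_period are hypothetical and do not appear in this subpart.

```python
def effective_krate(srate, krate=100):
    """Return the control rate actually used by the orchestra.

    srate: audio sampling rate in Hz (4000..96000 inclusive).
    krate: requested control rate in Hz (1..srate inclusive); 100 is the
           default when no krate parameter is given.
    """
    if not 4000 <= srate <= 96000:
        raise ValueError("srate shall be between 4000 and 96000 inclusive")
    if not 1 <= krate <= srate:
        raise ValueError("krate shall be between 1 and srate inclusive")
    k = krate
    # Step up to the next integer that divides the sampling rate exactly.
    # The loop always terminates because srate divides itself.
    while srate % k != 0:
        k += 1
    return k


def control_period(srate, krate=100):
    """Number of samples in one control cycle."""
    return srate // effective_krate(srate, krate)
```

For instance, a requested control rate of 1000 Hz at a sampling rate of 44100 Hz does not divide the sampling rate exactly, so under this sketch the control rate becomes the next larger exact divisor.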
<global parameter> -> inchannels <int>;
The inchannels global parameter specifies the number of input channels to process. If there are fewer than this many audio channels provided as input sources, the additional channels shall be set to continuous zero-valued signals. If there are more than this many audio channels provided as input sources, the extra channels are ignored.
If the inchannels parameter is not provided in an orchestra, the default shall be the sum of the numbers of channels provided by the input sources (see Subclause 5.11). If there are no input sources provided, the value shall be 0.
<global parameter> -> outchannels <int>;
The outchannels global parameter specifies the number of output channels of sound to produce. The run-time decoding process shall produce and render this number of channels internally. It is not permissible to simplify orchestra complexity or account for terminal capability by producing fewer channels.
If the outchannels parameter is not provided in an orchestra, the default shall be one channel.
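The default values of the global parameters can be summarised in a short informative sketch. The function names, the dictionary layout, and the representation of input sources as (sampling rate, channel count) pairs are assumptions made for illustration; the divisor adjustment of the control rate is not repeated here.

```python
def resolve_global_parameters(declared, input_sources):
    """Fill in defaults for parameters missing from the global block.

    declared:      dict with any of 'srate', 'krate', 'inchannels',
                   'outchannels' (hypothetical representation).
    input_sources: list of (sampling_rate_hz, num_channels) tuples.
    """
    params = dict(declared)
    if "srate" in params:
        if not 4000 <= params["srate"] <= 96000:
            raise ValueError("srate shall be between 4000 and 96000 inclusive")
    else:
        # Highest input sampling rate, or 32000 Hz if there is no input.
        params["srate"] = max((rate for rate, _ in input_sources), default=32000)
    params.setdefault("krate", 100)
    if not 1 <= params["krate"] <= params["srate"]:
        raise ValueError("krate shall be between 1 and srate inclusive")
    # Sum of the channel counts of all input sources (0 if there are none).
    params.setdefault("inchannels", sum(ch for _, ch in input_sources))
    params.setdefault("outchannels", 1)
    return params


def fit_input_channels(channels, inchannels, block_len):
    """Pad with zero-valued channels, or ignore extra channels.

    channels: list of per-channel sample lists, each of length block_len.
    """
    fitted = [list(ch) for ch in channels[:inchannels]]
    while len(fitted) < inchannels:
        fitted.append([0.0] * block_len)
    return fitted
```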
<global variable declaration> -> ivar <namelist> ;
<global variable declaration> -> ksig <namelist> ;
<global variable declaration> -> <table declaration> ;
Global variable declarations declare variables which may be shared and accessed by all instruments and by a SASL score. Only ivar and ksig type variables, as well as wavetables, may be declared globally. A global variable declaration is either a table definition, or an allowed type name followed by a list of name declarations.
A global name declaration specifies that a name token shall be created and space equal to one signal value allocated for variable storage in the global context. A global array declaration specifies that a name token shall be created and space equal to the specified number of signal values allocated in the global context.
<namelist> -> <name>, <namelist>
<namelist> -> <name>
A namelist is a sequence of one or more name declarations.
<name> -> <ident>
<name> -> <ident>[<array length>]
<array length> -> <int>
<array length> -> inchannels
<array length> -> outchannels
A name declaration is an identifier (see Subclause 5.4.2.2), or an array declaration. For an array declaration, the parameter shall be either an integer strictly greater than 0, or one of the tokens inchannels or outchannels. If the latter, the array length shall be the same as the number of input channels or output channels to the instrument, respectively, as described in Subclause 5.4.5.2. It is illegal to use the token inchannels if the number of input channels to the instrument is 0.
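As an informative illustration, the resolution of an array length might be sketched as below. The function name array_length and the convention of passing the bracketed text as a string are hypothetical.

```python
def array_length(spec, inchannels, outchannels):
    """Resolve the <array length> of a name declaration.

    spec: the text between the brackets, e.g. "4", "inchannels" or
          "outchannels".
    inchannels, outchannels: channel counts of the instrument.
    """
    if spec == "inchannels":
        if inchannels == 0:
            raise ValueError("inchannels is illegal when there are 0 input channels")
        return inchannels
    if spec == "outchannels":
        return outchannels
    # Otherwise the length shall be an integer strictly greater than 0.
    if not (spec.isascii() and spec.isdigit()) or int(spec) <= 0:
        raise ValueError("array length shall be an integer greater than 0")
    return int(spec)
```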
Not every identifier may be used as a variable name; in particular, the reserved words listed in Subclause 5.4.8, the standard names listed in Subclause 5.4.6.8, the names of the core opcodes listed in Subclause 5.5, and the names of the core wavetable generators listed in Subclause 5.6 shall not be declared as variable names.
<table declaration> -> table <ident> ( <ident> , <expr> [ , <expr list>] ) ;
Wavetables are structures of memory allocated for the typical purpose of allowing rapid oscillation, looping, and playback. The wavetable declaration associates a name (the first identifier) with a wavetable created by a core wavetable generator referenced by the second identifier. It is a syntax error if the second identifier is not one of the core wavetable generators named in Subclause 5.6. The first expression in the comma-delimited parameter sequence is termed the size expression; the remaining zero or more expressions comprise the wavetable parameter list.
The semantics of the size expression and wavetable parameter list are determined by the particular core wavetable generator, see Subclause 5.6. Any expression which is i-rate (see Subclause 5.4.6.7.2) is legal as part of the table parameter list; in particular, reference to i-rate global variables is allowed (their values may be set by the special instrument startup). Each expression must be single-valued, except in the case of the concat generator (Subclause 5.6.16), in which case the expressions must be table references. The order of creation of wavetables is non-deterministic; it is not recommended for calls to the tableread() opcode to occur in the table parameter expressions, and to do so gives unspecified results.
A global wavetable may be referenced by a wavetable placeholder in any instrument or opcode. See Subclause 5.4.6.5.4. Global wavetables shall be created and initialised with data at orchestra initialisation time, immediately after the execution of the special instrument startup. They shall not be destroyed unless they are explicitly destroyed or replaced by a table line in a SASL score.
To create a wavetable, first, the expression fields are evaluated in the order they appear in the syntax according to the rules in Subclause 5.4.6.7. Then, the particular wavetable generator named in the second identifier is executed; the normative semantics of each wavetable generator detail exactly how large a wavetable shall be created, and which values shall be placed in the wavetable, for each generator.
<route statement> -> route ( <ident> , <identlist> ) ;
<identlist> -> <ident> , <identlist>
<identlist> -> <ident>
<identlist> -> NULL
A route statement consists of a single identifier, which specifies a bus, and a sequence of one or more instrument names, which specify instruments. The route statement specifies that the instruments listed do not produce sound output directly, but instead their results are placed on the given bus. The output channels from the instruments listed each are placed on a separate channel of the bus. Multiple route statements onto the same bus indicate that the given instrument outputs should be summed on the bus. Multiple route statements with differing numbers of channels referencing the same bus are illegal, unless each statement has either n channels or 1 channel. In this case, each of the one-channel route statements places the same signal on each channel of the bus, which is n channels wide.
There shall be at least one instrument name in the instrument list (the NULL production in the grammar is provided so that constructions appearing later may use the same production).
EXAMPLES
Assume that instruments a, b, and c produce one, two, and three channels of output, respectively.
1. The sequence
route(bus1, a, b);
route(bus1, c);
is legal and specifies a three-channel bus. The first bus channel contains the sum of the output of a and the first channel of c; the second contains the sum of the first output channel of b and the second of c; and the third contains the sum of the second channel of b and the third channel of c.
2. The sequence
route(bus1,b);
route(bus1,c);
is illegal since the statements refer different numbers of channels to the same bus.
3. The sequence
route(bus1,a,c);
route(bus1,a);
route(bus1,b,b);
is legal and specifies a four-channel bus. The first and third route statements each refer to four channels of audio, and the second refers to one channel, which will be mapped to each of the four channels.
The resulting channel values are as follows, using array notation to indicate the channel outputs from each instrument:
| Channel | Value |
| 1 | a + a + b[1] |
| 2 | c[1] + a + b[2] |
| 3 | c[2] + a + b[1] |
| 4 | c[3] + a + b[2] |
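The channel-combination rules illustrated in the examples above can be sketched informally as follows. The function name bus_layout and its data representation are hypothetical; the sketch only lists which instrument outputs are summed on each bus channel and does not process audio.

```python
def bus_layout(routes, widths):
    """List the terms summed on each channel of one bus.

    routes: one list of instrument names per route statement that
            references the bus, in source order.
    widths: dict mapping instrument name -> number of output channels.
    Returns a list (one entry per bus channel) of lists of terms such
    as 'a' or 'b[1]'.
    """
    statements = []
    for instruments in routes:
        terms = []
        for name in instruments:
            width = widths[name]
            if width == 1:
                terms.append(name)
            else:
                terms.extend(f"{name}[{i}]" for i in range(1, width + 1))
        statements.append(terms)

    sizes = {len(terms) for terms in statements}
    n = max(sizes)
    # Every statement shall have either n channels or 1 channel.
    if not sizes <= {1, n}:
        raise ValueError("route statements refer different numbers of "
                         "channels to the same bus")

    channels = [[] for _ in range(n)]
    for terms in statements:
        for ch in range(n):
            # A one-channel statement places its signal on every channel.
            channels[ch].append(terms[0] if len(terms) == 1 else terms[ch])
    return channels
```

Applied to example 3 with widths {"a": 1, "b": 2, "c": 3}, the sketch lists the same terms per channel as the table of channel values.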
It is illegal for a route statement to reference a bus which is not the special bus output_bus and which does not occur in a send statement. See Subclause 5.4.5.5.
It is illegal for a route statement to refer to the special bus input_bus (see Subclause 5.11.2).
All instruments which are not referred to in route statements place their output on the special bus output_bus, except for an effect instrument to which output_bus was sent (see Subclause 5.4.5.5). The same rules for allowable channel combinations to the special bus output_bus apply as if the route statements were explicit; these rules are implicit in the rules for the output statement, see Subclause 5.4.6.6.8.
<send statement> -> send ( <ident> ; <expr list> ; <identlist> );
The send statement creates an instrument instantiation, defines busses, and specifies that the referenced instrument is used as an effects processor for those busses.
All busses in the orchestra are defined by using send statements. It is illegal for a statement referencing a bus to refer to a bus which is not defined in a send statement. The exception is the special bus output_bus which is always defined.
The identifier in the send statement references an instrument which will be used as a bus-processing instrument, also called effect instrument. There is no syntactic distinction between effect instruments and other instruments. The identifier list references one or more busses which shall be made available to the effect instrument through its input standard name, as follows:
The first n0 channels of input, channels 0 through n0-1, are the n0 channels of the first referenced bus;
In addition, the grouping of busses in the input array shall be made available to the effect instrument through its inGroup standard name, as follows:
The first n0 values of inGroup have the value 1;
The expression list is a list of zero or more i-rate expressions which are provided to the effect instrument as its parameter fields. Any expression which is i-rate (see Subclause 5.4.6.7.2) is legal as part of this list; in particular, reference to i-rate global variables is allowed. The number of expressions provided shall match the number of parameter fields defined in the instrument declaration; otherwise, it is a syntax error.
The effect instrument referred to in a send statement shall be instantiated no later than immediately after the first instantiation of an instrument which either is routed to a bus which is sent to the effect instrument or refers to the bus in an outbus or sbsynth statement. These instrument instantiations shall remain in effect until the orchestra synthesis process terminates. One instrument instantiation shall be created for each send statement in the orchestra. If such an instrument instantiation utilises the turnoff statement, the instantiation is destroyed (and sound is no longer routed to it). No other changes are made in the orchestra.
Any bus may be sent to more than one effect instrument, except for the special bus output_bus. The special bus output_bus represents the penultimate stage of processing of a sound stream; it may be sent to at most one effect instrument, and it is a syntax error if that instrument is itself routed or makes use of the outbus statement. If output_bus is not sent to an instrument, it is turned into sound at the end of an orchestra cycle (see Subclause 5.3.3.3); if output_bus is sent to an instrument, the output of that instrument is turned into sound at the end of an orchestra cycle. This instrument is not permitted to use the turnoff statement.
At least one bus name shall be provided in the send statement.
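Several of the static constraints on route and send statements stated above can be checked with a short informative sketch. The function name check_busses and the tuple representation of the statements are hypothetical, and only the constraints named in the comments are covered.

```python
def check_busses(routes, sends):
    """Return a list of error messages (empty if none are found).

    routes: list of (bus, [instrument, ...]) from route statements.
    sends:  list of (effect_instrument, [bus, ...]) from send statements.
    """
    errors = []
    defined = {"output_bus"}  # output_bus is always defined
    for effect, busses in sends:
        if not busses:
            errors.append(f"send({effect}): at least one bus name is required")
        defined.update(busses)

    for bus, instruments in routes:
        if not instruments:
            errors.append(f"route({bus}): at least one instrument is required")
        if bus == "input_bus":
            errors.append("route statement refers to input_bus")
        elif bus not in defined:
            errors.append(f"route({bus}): bus is not defined in a send statement")

    # output_bus may be sent to at most one effect instrument, and that
    # instrument shall not itself be routed.
    output_effects = [effect for effect, busses in sends if "output_bus" in busses]
    if len(output_effects) > 1:
        errors.append("output_bus is sent to more than one effect instrument")
    routed = {name for _, instruments in routes for name in instruments}
    for effect in output_effects:
        if effect in routed:
            errors.append(f"{effect} receives output_bus but is itself routed")
    return errors
```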
<sequence specification> -> sequence ( <identlist> ) ;
The sequence statement allows the specification of the ordering of execution of instrument instantiations by the run-time scheduler. The identlist references a list of instruments which describes a partial ordering on the set of instruments. If instrument a and instrument b are referenced in the same sequence statement with a preceding b, then instantiations of instrument a shall be executed strictly before instantiations of instrument b.
There are several default sequence rules:
1. The special instrument startup is instantiated and the instantiation executed at the i-rate at the very beginning of the orchestra.
2. Any instrument instances corresponding to the startup instrument are executed first in a particular orchestra cycle.
3. If output_bus is sent to an instrument, the instrument instantiation corresponding to that send statement is the last instantiation executed in the orchestra cycle.
4. For each instrument routed to a bus which is sent to an effect instrument, instantiations of the routed instrument are executed before instantiations of the effect instrument. If loops are created using route and send statements, the ordering is resolved syntactically: whichever send statement occurs latest, that instrument instantiation is executed latest.
Default rules 2, 3, and 4 may be overridden by use of the sequence statement. Rule 1 cannot be overridden.
It is a syntax error if explicit sequence statements create loops in ordering. Any send statements which are the "backward" part of an implicit send loop have no effect.
If the sequence of two instruments is not defined by the default or explicit sequence rules, their instantiations may be executed in any order or in parallel.
It is not possible to specify the ordering of multiple instantiations of the same instrument; these instantiations can be run in any order or in parallel.
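The interaction of default rule 4 with explicit sequence statements can be sketched informally as follows. This is an illustration under simplifying assumptions, not a scheduler: the function name execution_order and its data representation are hypothetical, the rules for startup and output_bus are not modelled, and implicit route/send loops are reported as errors rather than resolved syntactically. The sketch uses the Python standard library module graphlib (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def _has_predecessor(preds, start, target):
    """True if target is a direct or transitive predecessor of start."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p == target:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def execution_order(instruments, routes, sends, sequences):
    """Return one instrument order consistent with the constraints.

    routes:    list of (bus, [instrument, ...]) from route statements.
    sends:     list of (effect_instrument, [bus, ...]) from send statements.
    sequences: list of [instrument, ...] from sequence statements.
    """
    # Explicit constraints: each name precedes every later name in the list.
    preds = {name: set() for name in instruments}
    for seq in sequences:
        for before, after in combinations(seq, 2):
            preds.setdefault(before, set())
            preds.setdefault(after, set()).add(before)
    try:
        list(TopologicalSorter(preds).static_order())
    except CycleError as exc:
        raise ValueError("explicit sequence statements create a loop") from exc
    explicit = {name: set(p) for name, p in preds.items()}

    # Default rule 4: routed instruments run before the effect instrument,
    # unless a chain of explicit constraints says the opposite.
    for effect, busses in sends:
        for bus, routed in routes:
            if bus not in busses:
                continue
            for name in routed:
                if name == effect or _has_predecessor(explicit, name, effect):
                    continue  # overridden by an explicit sequence statement
                preds.setdefault(name, set())
                preds.setdefault(effect, set()).add(name)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError as exc:
        raise ValueError("route/send loop; resolution not modelled") from exc
```

Where the constraints leave two instruments unordered, the sketch returns just one of the permitted orders; a decoder may equally run such instruments in another order or in parallel.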
EXAMPLES
An orchestra consists of five instruments, a, b, c, d, and e.
1. The following code fragment
route(bus1, a, b);
send(c; ; bus1);
is legal and specifies (using the default sequencing rules) that instantiations of instruments a and b shall be executed strictly before instantiations of instrument c. This ordering applies to all instantiations of instrument c, not only to the one corresponding to the send statement. No ordering is specified between instruments a and b.
2. The following code fragment
route(bus1, a, b);
send(c; ; bus1);
sequence(c,a);
send(d; ; bus1);
is legal and specifies that instantiations of instrument b shall be executed first, followed by instantiations of instrument c, followed by instantiations of instrument a, followed by instantiations of instrument d. The ordering of b before c, and of a and b before d, follows from default rule 4; the placement of instrument c before instrument a follows from the explicit sequence statement, which overrides default rule 4. Due to this ordering, the output samples of instrument a are not provided to instrument c (they get put on the bus "too late"), and however many channels of output this represents are set to 0 in instrument c. The output samples of instrument a are provided to instrument d.
3. The following code fragment
sequence(a,b);
sequence(b,c,d);
sequence(c,e);
sequence(e,a);
is ille