INTERNET-DRAFT                                          John Lazzaro
September 22, 2002                                    John Wawrzynek
Expires: March 22, 2003                                  UC Berkeley

              The MIDI Wire Protocol Packetization (MWPP)

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

The MIDI Wire Protocol Packetization (MWPP) is a general-purpose RTP packetization for the MIDI command language. MWPP is suitable for use in both interactive applications (such as the remote operation of musical instruments) and content-delivery applications (such as MIDI file streaming).

MWPP is suitable for use over unicast and multicast UDP, and defines tools that support the graceful recovery from packet loss. MWPP may also be used over reliable transport such as TCP.

The SDP parameters defined for MWPP support the customization of stream behavior (including the MIDI rendering method) during session setup.

MWPP is compatible with the MPEG-4 generic RTP payload format, to support the MPEG 4 Audio object types for General MIDI, DLS2, and Structured Audio.

Lazzaro/Wawrzynek                                               [Page 1]

INTERNET-DRAFT                                         22 September 2002

0. Change Log for

Many document changes were made, in response to comments from experts (noted parenthetically below) received on -04.txt:

o Table of contents added (Colin Perkins).

o Rewritten abstract, Section 1, and Section 2.
o In Section 2, implementors are warned to refer to references [1], [2], and [3] for normative descriptions of MIDI and RTP. Uses of RTP/AVT have been changed to RTP/AVP, and uses of fmpt have been changed to fmtp (Colin Perkins).

o MIDI acronym definition is now correct (Michel Jullian).

o The preamble of Section 2 discusses factors in the choice of transport for an MWPP stream. These factors include unsuitability of the recovery journal for archival applications (Herbie Robinson).

o Several changes to Section 2.1. The definition of the MWPP RTP timestamp has changed, so that the sampling instant is compatible with RTCP statistics. Also, the timestamp increment policy is clarified. In Section 2.2, the MTU limits for the MWPP payload size over UDP are made explicit, and the 1500-octet Ethernet MTU is used as an example. In addition, rationale for multiple MWPP streams is given, with an explanation of the 16-channel MIDI namespace that is explicitly coded in the MIDI command syntax (Colin Perkins and Dominique Fober).

o Recovery journal changes to Section 2.2. The top-level payload definition has been generalized, so that alternative journal systems may be defined to replace the recovery journal. The lack of deep redundancy in the recovery journal is noted; UDP (RFC2733, RFC2198) and TCP approaches to deep redundancy are discussed, as is the possibility of adding deep redundancy to an alternative recovery journal format (Herbie Robinson).

o In Section 3, an efficiency justification for the delta time coding method has been added (Colin Perkins).

o In Section 4, the high-level interface to the recovery journal system has been redesigned, in response to issues raised by Dominique Fober, Colin Perkins, and Herbie Robinson. The related SDP parameters have been reworked (Appendix C.1), but journal bitfield syntax and semantics are unchanged, apart from the removal of the G bit from the recovery journal header.
   Highlights of the redesign (see Section 4 for details):

   -- We classify the types of artifacts that may result from lost MIDI commands, following suggestions from Herbie Robinson.

   -- We define the quality of recovery a compliant recovery journal system must produce. This mandate is expressed in terms of the artifact classification scheme noted above.

   -- We provide normative tools (the journal bitfields, SDP, ...) as building blocks that implementations may use to produce the mandated quality. However, we do not normatively define how to use the tools to meet the mandate.

   -- We sketch three general approaches to fulfilling the mandate: a simple RTP-only approach for when bandwidth is not an issue, a closed-loop RTCP approach for denser MIDI streams, and an RTP-only approach that combines sender and receiver semantics to reduce the stream bandwidth. These sketches are included to prepare the reader to understand the definitions of the normative tools. A forthcoming informative memo will expand upon each of these approaches, with sufficient detail to be useful as a cookbook for implementors. Many of these ideas follow from suggestions by Dominique Fober.

   -- For closed-loop systems, fallback strategies for handling unresponsive receivers are discussed. For open-loop systems, the importance of sender behavior in minimizing the occurrence of overrun is discussed, and the inadvisability of updating the checkpoint packet identity if the update does not reduce the journal size is noted (Dominique Fober).

   -- Caveats for multicast scaling are presented, in response to suggestions by Colin Perkins.

o In Section 5, the use of the Checkpoint Sequence Number field by receivers to detect overruns is described, and the G bit is removed.
o In Section 6 and Appendix C.4, SDP examples that contain full session descriptions are now legal session descriptions, complete with v=, o=, s=, and t= lines. The other SDP examples have all been renamed media stream examples, and show syntax for an MWPP media stream component of a session description only. In Section 6.2, the role of the MTU in limiting the size of the AudioSpecificConfig string has been clarified (Colin Perkins).

o The preamble of Appendix C now includes a table of the MWPP SDP parameters, with pointers to the section of the document that defines each parameter (Colin Perkins).

o Appendix C.1 has been rewritten. All-new SDP parameters support the configuration of journal semantics.

o In Appendix C.4, we clarify the responsibilities of senders and receivers when splitting a MIDI name space across several MWPP streams (Martijn Sipkema).

o In Appendix C.5, we clarify the extension policy for the render parameter. The render parameter may only be extended via IETF standards-track documents, but these documents are expected to define complete registration hierarchies for rendering algorithms, whose management will be independent of the IETF. For example, a registration hierarchy could be based on the MIDI Manufacturers Association tree for System Exclusive IDs (in response to discussions at Yokohama).

o A new Appendix C.6 defines the syntax of the MWPP parameters, using ABNF. A new Appendix C.7 lists the IANA considerations for the three MIME contexts of MWPP: the audio/mwpp, audio/mpeg4-generic, and audio/sasc MIME types (Colin Perkins).

o Appendix D, a MIDI overview for networking specialists new to computer music applications, has been added. References to Appendix D appear several times at appropriate places in the document (Colin Perkins).

o The terms "native MWPP streams" and "mpeg4-generic MWPP streams" are now used throughout the document, to reference MWPP streams that use the mwpp and mpeg4-generic MIME types, respectively.
Table of Contents

1. Introduction
   1.1 MWPP RTP Overview
   1.2 Overview of SDP Parameters for MWPP
2. MWPP Packet Format
   2.1 RTP Header
   2.2 MWPP Payload
3. MIDI Command Section
4. Recovery Journal Overview
   4.1. Recovery Journal Sender Strategies
5. Recovery Journal Format
6. MWPP and the Session Description Protocol
   6.1 Session Descriptions for Native MWPP Streams
   6.2 Session Description for mpeg4-generic MWPP Streams
   6.3 MWPP SDP Parameters
7. Security Considerations
8. Congestion Control
9. Acknowledgements
Appendix A. The Recovery Journal Channel Chapters
   Appendix A.1. Recovery Journal Definitions
   Appendix A.2. Chapter P: MIDI Program Change
   Appendix A.3. Chapter W: MIDI Pitch Wheel
   Appendix A.4. Chapter N: MIDI NoteOff and NoteOn
   Appendix A.5. Chapter A: MIDI Poly Aftertouch
   Appendix A.6. Chapter T: MIDI Channel Aftertouch
   Appendix A.7. Chapter C: MIDI Control Change
   Appendix A.8. Chapter M: MIDI Parameter System
Appendix B. The Recovery Journal System Chapters
   Appendix B.1. System Chapter D: Reset, etc.
   Appendix B.2. System Chapter V: Active Sense Command
   Appendix B.3. System Chapter Q: Sequencer State Commands
   Appendix B.4. System Chapter E: MIDI Time Code
      B.4.1 Informative Description of Chapter E
      B.4.2 Normative Definition of Chapter E
   Appendix B.5. System Chapter X: System Exclusive
Appendix C. Session Description Protocol (SDP) Definitions
   Appendix C.1. The Journalling System
      C.1.1. The j_sec and j_update Parameters
      C.1.2. Chapter Inclusion Parameters
   Appendix C.2. Command Execution Semantics
      C.2.1 Description of the async method
      C.2.2 Description of the buffer method
   Appendix C.3. Media Time
   Appendix C.4. Multiple Streams
      C.4.1 The midiport parameter
      C.4.2 The zerosync parameter
   Appendix C.5. MIDI Rendering
      C.5.1 The sasc Method
   Appendix C.6. ABNF Specifications for MWPP Parameters
   Appendix C.7. IANA Considerations
      Appendix C.7.1 mwpp MIME Registration
      Appendix C.7.2 mpeg4-generic MIME Registration
      Appendix C.7.3 sasc MIME Registration
Appendix D. A MIDI Overview for Networking Specialists
Appendix E. Author Addresses
Appendix F. References

1. Introduction

The Internet Engineering Task Force (IETF) has developed a set of focused tools for multimedia networking ([2] [9] [10] [12]). These tools can be combined in different ways to support a variety of real-time applications over Internet Protocol (IP) networks.

For example, to support IP telephony, applications might use the Session Initiation Protocol (SIP, [10]) to set up phone calls. Call setup might include negotiations (using the SIP offer/answer protocol [11]) to agree on a common audio codec. These negotiations would use the Session Description Protocol (SDP, [9]) to describe candidate codecs. After a call is set up, audio data would flow between the participants using the Real-time Transport Protocol (RTP, [2]) under the Audio/Visual Profile (RTP/AVP, [3]).

The tools used in this telephony example (SIP, SDP, RTP/AVP) might be combined in a different way to support a content streaming application, perhaps in conjunction with other tools (such as the Real Time Streaming Protocol (RTSP, [12])).

The Musical Instrument Digital Interface (MIDI, [1]), a standard for musical instrument control, is widely used in applications that are roughly analogous to the example applications described above. On stage and in the recording studio, MIDI is used for the interactive remote control of musical instruments, an application similar in spirit to telephony. On web pages, Standard MIDI Files [1] rendered using the General MIDI standard [1] provide a low-bandwidth substitute for audio streaming, suitable for simple "background music" uses.

This memo is motivated by a simple premise: if MIDI performances could be sent as RTP streams that are managed by IETF session tools, a hybridization of the MIDI and IETF application domains may occur. For example, manufacturers of professional audio equipment and electronic musical instruments may consider adopting the IETF multimedia stack (IP, SIP, RTP) as the networking layer for a MIDI control plane.
As another example, the audio streaming community may begin to use gestural codes (such as MIDI) for normative low-bitrate audio, perhaps using the sound synthesis standards described in [5] or [18].

To provide a foundation for these new applications, this memo extends two of the IETF tools (RTP and SDP) to support the MIDI standard. The memo extends RTP by adding a new packetization, the MIDI Wire Protocol Packetization (MWPP), to the Audio/Visual Profile. The memo extends SDP by defining a set of SDP parameters to support the configuration and negotiation of MIDI endpoint behaviors using SIP, RTSP, and other IETF session setup tools.

The scope of this memo is limited in several respects. This memo normatively defines the syntax and semantics of MWPP, an RTP packetization for MIDI. However, this memo does not define algorithms for sending and receiving MWPP RTP packets. An ancillary IETF document [22] provides informative guidance on MWPP algorithms. Supplemental information may be found in related conference publications [6] [8] and reference software [7].

The scope of this memo is also limited in that it defines MIDI extensions for RTP and SDP, but it does not define frameworks for using RTP, SDP and other IETF tools in any specific MIDI application domain. Other documents, from the IETF or from other organizations, may define frameworks that incorporate MWPP, but this memo does not.

1.1 MWPP RTP Overview

The first part of this memo (Sections 2-5, Appendices A.1-8 and B.1-5) defines the MIDI Wire Protocol Packetization (MWPP), a MIDI RTP [2] packetization for the Audio/Visual Profile [3].

The MIDI standard [1] defines a command set that describes sound as a series of events (NoteOn command to start a musical note event, NoteOff command to end a note, etc).
Commands execute on one of the 16 voice channels (a voice channel is usually devoted to a single instrument timbre) or on the special systems channel. The command syntax explicitly codes the execution channel. See Appendix D for a more detailed introduction to MIDI.

MWPP maps a single MIDI command stream (16 voice channels + systems) onto an RTP stream. Section 2 of this memo introduces the modular design of MWPP packetization.

The simplest form of MWPP uses the MIDI command section (described in Section 3) as a complete self-framed RTP payload. This lightweight version of MWPP is suitable for use over reliable transport such as TCP.

MWPP is also suitable for use over unreliable transport such as unicast and multicast UDP. The term unreliable transport means that packets may be lost in transit or delivered out-of-order. MWPP provides feed-forward resiliency by inserting a journal section (such as the recovery journal, described in Sections 4 and 5 and Appendices A.1-8 and B.1-5) into each RTP packet. The journal codes the recent history of the stream. Receivers use the journal to gracefully recover from packet loss and out-of-order packet delivery.

MWPP supports the two command execution timing methods defined in the MIDI standard: the implicit "time-of-arrival" code used in the MIDI wire protocol (a networking standard for the remote operation of musical instruments over short asynchronous serial lines), and the explicit timestamps of Standard MIDI Files (a file format for representing complete musical performances).

1.2 Overview of SDP Parameters for MWPP

The second part of this memo (Section 6 and Appendices C.1-5) defines Session Description Protocol (SDP, [9]) parameters for MWPP. These parameters may be used to customize (and perhaps negotiate [11]) the configuration of an MWPP session, by using SDP in conjunction with session setup tools like SIP [10] or RTSP [12].
For example, the extensible SDP parameter "render" configures the method of rendering the MIDI command stream into audio output (Appendix C.5). Other SDP parameters provide tools for structuring multiple MWPP streams (Appendix C.4), setting the resiliency configuration (Appendix C.1), and customizing the MWPP timestamp semantics (Appendix C.2).

Section 6 describes the SDP syntax for binding an MWPP stream to a MIME type. MWPP supports two MIME types: the general-purpose mwpp MIME type, and the mpeg4-generic MIME type [4]. The mpeg4-generic MIME type supports MIDI rendering using the MPEG 4 Audio synthesis tools (General MIDI [1], DLS2 [18], and Structured Audio [5]).

In this memo, the phrase "native MWPP stream" refers to an MWPP stream that uses the mwpp MIME type. The phrase "mpeg4-generic MWPP stream" refers to an MWPP stream that uses the mpeg4-generic MIME type.

2. MWPP Packet Format

In this section, we introduce the format of MWPP RTP packets. The description includes some background information on RTP/AVP, for the benefit of MIDI implementors new to IETF tools. Likewise, Appendix D provides a MIDI overview, for the benefit of networking specialists new to musical applications. However, implementors should consult the normative documents for RTP/AVP [2,3] and MIDI [1] for authoritative descriptions of these standards.

An RTP media stream is a sequence of logical packets that share a common format. Each RTP packet consists of two parts: the RTP header and a payload. Figure 1 shows this format for MWPP RTP packets (vertical space delineates the header from the payload).
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | V |P|X|  CC   |M|     PT      |        Sequence number        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           Timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             SSRC                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   MIDI command section ...                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Journal section ...                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 1 -- MWPP packet format

We describe RTP packets as "logical" packets to highlight the fact that RTP itself does not define a transport protocol. Instead, RTP packets are mapped onto network protocols (such as unicast UDP, multicast UDP, or TCP) by an application [13]. An application chooses a particular network protocol for an MWPP stream based on several factors:

o Some MWPP applications are a good match to the one-to-many architecture of UDP multicast transport (one piano keyboard controlling several synthesizers over a LAN, one streaming server broadcasting to many receivers over a WAN, etc).

o Low latency is a desirable property for some MWPP interactive applications. The latency of unicast UDP is comparable to that of TCP over networks with very low loss. For higher loss rates, UDP has a latency advantage: TCP retransmission of a lost packet adds at least one round-trip time to that packet's latency, and head-of-line blocking also delays the packets that follow it.

o Some MWPP applications have an archival requirement. In these applications, the receiver performs two functions: a real-time rendering of the stream (which may be imperfect) and an archival recording of the stream (which must be perfect). TCP provides reliability for archival applications at the transport layer.
  While it is possible to enhance MWPP with resiliency tools to support archival work over UDP, the default resiliency tool presented in this memo does not provide archival support.

o Low-cost embedded environments may not support TCP, forcing a UDP solution. The IETF multimedia toolkit is designed to work in UDP-only environments.

Next, we describe the RTP header and payload, in separate sections.

2.1 RTP Header

The RTP header begins with an octet of fields (V, P, X, and CC) to support specialized RTP uses (see [2] and [3] for details). For the bulk of RTP applications, V is set to 2, and the P, X, and CC fields are set to 0. These default values yield an RTP stream with a fixed header size of 12 octets. If network bandwidth is at a premium, header compression [14] may be used to reduce overhead.

The second RTP header octet holds the M and PT fields. The 1-bit M field is set to 1 for all MWPP packets. The 7-bit PT field encodes the payload format type. The PT field value for a stream is set during session configuration by the SDP rtpmap line (Sections 6.1 and 6.2).

The other RTP header fields code the 16-bit sequence number and 32-bit timestamp for the packet, and the 32-bit synchronization source identifier (SSRC) for the stream. These unsigned integer values are coded in the IETF network byte ordering (big-endian). We discuss the timestamp and sequence fields below, and refer the reader to [2] for information on the SSRC field.

The sequence number is initialized to a randomly chosen value, and is incremented by one (modulo 65536) for each packet sent in the stream. A related quantity, the 32-bit extended packet sequence number, may be computed by tracking rollovers of the 16-bit sequence number. Note that different receivers in the same session may compute different extended packet sequence numbers, depending on when each receiver joined the session.
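The header defaults and sequence-number rules above can be sketched in Python. This is a minimal, non-normative sketch: the names pack_mwpp_rtp_header and ExtendedSeqTracker are hypothetical, PT 96 in the usage note is only an example of a dynamically assigned payload type, and the rollover logic assumes that a packet more than half the 16-bit sequence space behind the newest packet is a late (reordered) packet rather than the start of a new cycle.

```python
import struct

RTP_VERSION = 2


def pack_mwpp_rtp_header(payload_type: int, seq: int,
                         timestamp: int, ssrc: int) -> bytes:
    """Pack the 12-octet fixed RTP header using the default field values
    described above: V=2, P=0, X=0, CC=0, and M=1."""
    if not 0 <= payload_type <= 127:
        raise ValueError("PT is a 7-bit field")
    first_octet = RTP_VERSION << 6       # V=2; P, X, and CC are all 0
    second_octet = 0x80 | payload_type   # M=1, followed by the 7-bit PT
    # "!" selects network (big-endian) byte order.
    return struct.pack("!BBHII", first_octet, second_octet,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


class ExtendedSeqTracker:
    """Hypothetical receiver helper that extends 16-bit RTP sequence
    numbers to 32 bits by counting rollovers."""

    def __init__(self) -> None:
        self.cycles = 0      # number of 16-bit rollovers observed
        self.highest = None  # newest 16-bit sequence number observed

    def extend(self, seq: int) -> int:
        seq &= 0xFFFF
        if self.highest is None:
            self.highest = seq
            return seq
        ahead = (seq - self.highest) & 0xFFFF  # forward distance mod 2^16
        if ahead < 0x8000:
            # In-order (or duplicate) packet: detect a wrap past 65535.
            if seq < self.highest:
                self.cycles += 1
            self.highest = seq
            return ((self.cycles << 16) | seq) & 0xFFFFFFFF
        # Late packet: a number above the newest one was sent before
        # the most recent rollover.
        cycles = self.cycles - 1 if seq > self.highest else self.cycles
        return ((cycles << 16) | seq) & 0xFFFFFFFF
```

For example, feeding a fresh tracker the sequence numbers 65534, 65535, 0, 1, and then a late 65535 yields the extended numbers 65534, 65535, 65536, 65537, and 65535. Because the tracker starts counting at the first packet it sees, two receivers that join at different times can disagree on the upper 16 bits, as noted above.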
The RTP timestamp sets the base timestamp value for the packet. The MWPP payload codes MIDI command timestamps relative to this base timestamp value (Section 3). The sampling instant of the RTP packet (used in [2] to calculate stream statistics) is the command timestamp of the first MIDI command in the MIDI command section. If an RTP packet has an empty MIDI command section, the RTP timestamp of the packet codes the sampling instant for the packet.

The RTP timestamp units are set during session configuration by the SDP rtpmap parameter srate (Sections 6.1 and 6.2). For example, if configuration sets srate to a value of 44100 Hz, two MWPP packets whose base timestamp values differ by 2 seconds have RTP timestamp fields that differ by 88200. By default (Appendix C.4.2) the timestamp field is initialized to a randomly chosen value.

MWPP RTP timestamps do not necessarily increment at a fixed rate, because MWPP packets are not necessarily sent at a fixed rate. The timestamps for two sequential RTP packets may be identical, or the second packet may have a timestamp arbitrarily larger than the first packet (modulo 2^32). This RTP timestamp definition supports interactive applications that vary the packet rate to track the gestural rate of a human performer [6].

However, note that the RTP timestamps of an MWPP stream MAY increment at a fixed rate, and the MWPP payload includes features to support fixed-rate operation. In this way, MWPP is compatible with content-streaming servers that require a fixed-rate timestamp increment policy.

MWPP defines the length of media time a packet encodes as the RTP timestamp difference (modulo 2^32) between the packet's successor and the packet itself. By default, the media time for a packet may be arbitrarily long. However, a maximum media time for MWPP packets in a stream may be set during session configuration, via the SDP parameter maxptime (Appendix C.3).
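The timestamp arithmetic above (srate units, modulo 2^32 wraparound, and the media time of a packet) can be illustrated with a short Python sketch. The helper names are hypothetical, and the sketch assumes srate is an integer clock rate in Hz; it does not model the maxptime parameter, whose units are defined in Appendix C.3.

```python
TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit unsigned integers


def seconds_to_ts_delta(seconds: float, srate: int) -> int:
    """Convert a time interval into RTP timestamp units at clock rate srate."""
    return round(seconds * srate) % TS_MODULUS


def media_time(ts_packet: int, ts_successor: int) -> int:
    """Media time of a packet: the RTP timestamp of its successor minus its
    own RTP timestamp, modulo 2^32. Identical timestamps give zero."""
    return (ts_successor - ts_packet) % TS_MODULUS


def media_time_seconds(ts_packet: int, ts_successor: int, srate: int) -> float:
    """Media time of a packet, converted from timestamp units to seconds."""
    return media_time(ts_packet, ts_successor) / srate
```

With srate set to 44100, seconds_to_ts_delta(2.0, 44100) returns 88200, matching the example above, and media_time(0xFFFFFF00, 0x00000100) returns 512 even though the successor's timestamp field has wrapped past 2^32. Taking the difference modulo 2^32 is what lets a receiver ignore the randomly chosen initial timestamp value.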
2.2 MWPP Payload

The MWPP payload (Figure 1) MUST begin with the MIDI command section. The MIDI command section codes a (possibly empty) list of timestamped MIDI commands, and provides the essential service of MWPP.

The payload may also contain a journal section. The journal section provides resiliency by coding the recent history of the stream.

Section 3 defines the format for the MIDI command section. Sections 4-5 and Appendices A.1-8 and B.1-5 define the recovery journal, the default format for the journal section. Here, we describe how these payload sections operate in an MWPP stream.

During session configuration, the journalling method for an MWPP stream is set. A stream may be set up to use the recovery journal, to use an alternative journal format (not defined in this memo), or to not use journalling. Alternative journal formats may pair recovery services with other functions, such as deep history coding (for redundancy) or archival support.

By default, the journalling method of a stream is inferred from its transport type. Streams that use unreliable transport (such as UDP) default to using the recovery journal. Streams that use reliable transport (such as TCP) default to not using journalling. Appendix C.1.1 defines tools for overriding these defaults.

If an MWPP stream uses the recovery journal, every payload in the stream MUST include a journal section. If an MWPP stream does not use journalling, a journal section MUST NOT appear in a stream payload. If an MWPP stream uses an alternative journal format, the specification for the journal format defines an inclusion policy.

The recovery journal codes the minimal information needed for a graceful recovery (ending stuck notes, updating channel volumes, etc) from a packet loss episode. The minimal approach is a good fit to low-latency interactive applications.
High-latency content-streaming applications may benefit by augmenting the recovery journal with a deeper redundancy layer, using generic RTP tools [20] [21].

In general, it is not possible to reconstruct the lost MIDI command stream from the recovery journal contents. Therefore, an unreliable MWPP stream protected by the recovery journal system is not suitable for use in archival applications (as described in the preamble of Section 2). Archival applications should use MWPP over a reliable transport like TCP.

The payload of an MWPP stream encodes data for a single MIDI command namespace (16 voice channels + systems). Applications may use several MWPP streams in a session to customize the session namespace. For example, an application may use 2 MWPP streams to send 32 MIDI voice channels. As a second example, an application may split a single MIDI namespace between a UDP MWPP stream and a TCP MWPP stream, to separate real-time data and archival bulk data. Session configuration tools for multiple MWPP streams are defined in Appendix C.4.

The sender of an MWPP stream often has a model of the method the receiver uses to render MIDI into audio (or sometimes, into control actions such as the rewind of a tape deck or the dimming of stage lights). Appendix C.5 defines session configuration tools to set the MIDI rendering model for an MWPP stream. These tools support standards-based models (such as the General MIDI [1], DLS2 [18], and Structured Audio [5] profiles of MPEG 4 Audio [5]), and may be extended to support proprietary MIDI renderers.

The theoretical size of the MIDI command section ranges from 1 to 16384 octets; the theoretical size of a recovery journal ranges from 3 to 17394 octets. If an MWPP stream is sent over UDP transport, the Maximum Transmission Unit (MTU) of the underlying network limits the practical size of these payload sections (for example, an Ethernet MTU is 1500 octets).
The session configuration tools defined in Appendix C.4 may be used to split a dense MIDI namespace into several UDP MWPP streams, so that the MWPP payload fits comfortably into an MTU.

3. MIDI Command Section

Figure 2 shows the format of the MIDI command section.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |B|Z|  LEN ...  |                 MIDI list ...                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 2 -- MIDI command section

The MIDI command section begins with a variable-length header.

The header field LEN codes the length (in units of octets) of the MIDI list that follows the header. If the header flag B is 0, the header is one octet long, and LEN is a 6-bit field, supporting a maximum MIDI list length of 63 octets. If B is 1, the header is two octets long, and LEN is a 14-bit field, supporting a maximum MIDI list length of 16383 octets.

A LEN value of 0 is legal, and codes an empty MIDI list. If LEN is nonzero, the MIDI list has the structure shown in Figure 3.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Delta Time 0 (if Z = 1)       | MIDI Command 0 ...            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Delta Time 1                  | MIDI Command 1 ...            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Delta Time 2                  | MIDI Command 2 ...            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             .....                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Delta Time N                  | MIDI Command N ...            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 3 -- MIDI list structure

If the header flag Z is 1, the MIDI list begins with a complete MIDI command (MIDI Command 0) preceded by a delta time (Delta Time 0).
If Z is 0, the Delta Time 0 field is not present in the MIDI list, and MIDI Command 0 has an implicit delta time of 0. The MIDI list structure may also encode a list of N additional complete MIDI commands. Each additional command is preceded by a delta time.

The MWPP delta time syntax is a modified form of the MIDI File delta time syntax [1]. MWPP delta times use 1-4 octet fields to encode 32-bit unsigned integers. Figure 4 shows the encoded and decoded forms of delta times. Note that delta time values may be legally encoded in multiple formats; for example, there are four legal ways to encode the zero delta time (0x00, 0x8000, 0x800000, 0x80000000).

    One-Octet Delta Time:

       Encoded form: 0ddddddd
       Decoded form: 00000000 00000000 00000000 0ddddddd

    Two-Octet Delta Time:

       Encoded form: 1ccccccc 0ddddddd
       Decoded form: 00000000 00000000 00cccccc cddddddd

    Three-Octet Delta Time:

       Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
       Decoded form: 00000000 000bbbbb bbcccccc cddddddd

    Four-Octet Delta Time:

       Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
       Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

              Figure 4 -- Decoding delta time formats

MWPP uses delta times to encode a timestamp for each MIDI command. The timestamp for MIDI Command K is the summation (modulo 2^32) of the RTP timestamp and decoded delta times 0 through K. This cumulative coding technique, borrowed from MIDI File delta time coding, is efficient because it reduces the number of multi-octet delta times: the interval between adjacent commands is never larger than the offset of a command from the RTP timestamp, so interval coding never needs a longer delta time field than offset coding would. All command timestamps in a packet MUST be less than or equal to the RTP timestamp of the next packet in the MWPP stream (modulo 2^32).

By default, a command timestamp indicates the execution time for the command. The difference between two timestamps indicates the time delay between the execution of the commands. This difference may be zero, coding simultaneous execution.
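The header layout of Figure 2, the delta time formats of Figure 4, and the cumulative timestamp rule can be sketched in Python. This is an illustrative, non-normative sketch with hypothetical function names. It assumes the 14-bit LEN of a two-octet header carries its most significant bits in the first octet (following the field order of Figure 2), and it does not parse the MIDI commands themselves, since locating command boundaries requires knowledge of MIDI command lengths and running status.

```python
from typing import List, Tuple

TS_MODULUS = 1 << 32


def parse_command_section_header(buf: bytes) -> Tuple[int, int, int, int]:
    """Return (B, Z, LEN, header_length) for a MIDI command section."""
    if not buf:
        raise ValueError("empty buffer")
    first = buf[0]
    b_flag = first >> 7
    z_flag = (first >> 6) & 1
    if b_flag == 0:
        return b_flag, z_flag, first & 0x3F, 1    # one-octet header, 6-bit LEN
    if len(buf) < 2:
        raise ValueError("truncated two-octet header")
    length = ((first & 0x3F) << 8) | buf[1]       # two-octet header, 14-bit LEN
    return b_flag, z_flag, length, 2


def decode_delta_time(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a 1-4 octet delta time at buf[pos]; return (value, next_pos)."""
    value = 0
    for i in range(4):
        if pos + i >= len(buf):
            raise ValueError("truncated delta time")
        octet = buf[pos + i]
        value = (value << 7) | (octet & 0x7F)   # append 7 payload bits
        if (octet & 0x80) == 0:                 # high bit 0 ends the field
            return value, pos + i + 1
    raise ValueError("delta time longer than four octets")


def encode_delta_time(value: int) -> bytes:
    """Encode value in the shortest of its legal delta time forms."""
    if not 0 <= value < (1 << 28):
        raise ValueError("delta time must fit in 28 bits")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                            # most significant group first
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def command_timestamps(rtp_timestamp: int, deltas: List[int]) -> List[int]:
    """Timestamp of command K = RTP timestamp + deltas 0..K (mod 2^32).
    When Z is 0, the caller passes 0 as the implicit Delta Time 0."""
    stamps = []
    current = rtp_timestamp
    for delta in deltas:
        current = (current + delta) % TS_MODULUS
        stamps.append(current)
    return stamps
```

For example, decode_delta_time(bytes([0x80, 0x80, 0x80, 0x00]), 0) returns (0, 4), one of the four legal encodings of zero listed above, while encode_delta_time(128) returns the two octets 0x81 0x00. The encoder always picks the shortest form; the longer forms remain legal input for the decoder.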
MIDI sources that use explicit command timestamps, such as the MIDI file format, are simple to transcode into MWPP streams using these default semantics. MIDI command sources that use implicit command timing, such as the MIDI wire protocol, must be annotated with timestamps as part of the MWPP transcoding process.

The hardware and systems environment for an application may dictate a particular approach to timestamps, which may not be a good fit for the default MWPP timestamp semantics. To address this issue, the semantics of command timestamps may be customized during session configuration, as described in Appendix C.2.

As a rule, each MIDI Command field in the MIDI list contains a complete MIDI command, in the binary command format defined in the MIDI standard [1]. In the remainder of this section, we describe exceptions to this rule.

The first MIDI channel command in the MIDI list MUST include a status octet. Running status coding, as defined in [1], may be used for all subsequent MIDI channel commands in the MIDI list. As in [1], System Common and System Exclusive messages (0xF0 ... 0xF7) cancel running status state, but System RealTime messages (0xF8 ... 0xFF) do not affect running status state.

In the MIDI wire protocol [1], a System RealTime command may be embedded inside of another "host" MIDI command. This syntactic construction is not supported in MWPP: a MIDI Command field in the MIDI list codes exactly one complete MIDI command. To encode an embedded System RealTime command, senders MUST extract the command from its host, and code it in the MIDI list as a separate command. The host command and System RealTime command SHOULD appear in the same MIDI list. The delta time of the System RealTime command SHOULD result in a command timestamp that encodes the System RealTime command placement in its original embedded position.
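A receiver must restore the status octets omitted by running status coding. The sketch below shows one way to do this. read_command() and its length tables are hypothetical names. The sketch assumes the caller has already consumed the delta time that precedes each command. It handles channel commands, System RealTime commands, and the fixed-length System Common commands 0xF1, 0xF2, 0xF3, and 0xF6. SysEx commands, including the segmented and 0xF5-terminated forms described later in this section, are deliberately left out.

```python
from typing import Optional, Tuple

# Data octets that follow each channel status, keyed by the high nibble.
CHANNEL_DATA_OCTETS = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}
# System Common commands handled by this sketch (SysEx is omitted).
SYSTEM_COMMON_DATA_OCTETS = {0xF1: 1, 0xF2: 2, 0xF3: 1, 0xF6: 0}


def read_command(buf: bytes, pos: int,
                 running_status: Optional[int]) -> Tuple[bytes, int, Optional[int]]:
    """Read one MIDI Command field at buf[pos], restoring any omitted status.

    Returns (command with status octet, next position, new running status).
    """
    first = buf[pos]
    if first >= 0xF8:
        # System RealTime: a single octet that leaves running status untouched.
        return bytes([first]), pos + 1, running_status
    if first in SYSTEM_COMMON_DATA_OCTETS:
        # System Common commands cancel running status.
        end = pos + 1 + SYSTEM_COMMON_DATA_OCTETS[first]
        status, data = first, buf[pos + 1:end]
        new_running = None
    elif first >= 0xF0:
        raise NotImplementedError("SysEx and undefined statuses not handled here")
    else:
        if first & 0x80:
            status, start = first, pos + 1  # explicit status octet
        elif running_status is not None:
            status, start = running_status, pos  # running status coding
        else:
            raise ValueError("first channel command lacks a status octet")
        end = start + CHANNEL_DATA_OCTETS[status >> 4]
        data = buf[start:end]
        new_running = status
    if end > len(buf):
        raise ValueError("truncated MIDI command")
    return bytes([status]) + data, end, new_running
```

Passing the returned running status into the next call mirrors the rule above: a System RealTime command leaves the running status unchanged, and a System Common command clears it.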
Two methods are provided for encoding MIDI System Exclusive (SysEx) commands in the MIDI list. A SysEx command may be encoded in a MIDI Command field verbatim: an 0xF0 octet, followed by an arbitrary number of data octets, followed by an 0xF7 octet.

Alternatively, a SysEx command may be encoded as multiple segments. The command is divided into two or more SysEx command segments, and each segment is encoded in its own MIDI Command field in the MIDI list.

MWPP supports segmentation in order to encode SysEx commands that carry information in the temporal pattern of their data octets. By encoding such a command as a series of segments, a sender can associate each data octet (or group of data octets) with its own delta time. Segmentation may also be useful in coding very large SysEx commands across several RTP packets.

To segment a SysEx command, first partition its data octet list into two or more sublists. Each sublist must contain at least one data octet. To complete the segmentation, add status octets to the head and tail of each sublist, as detailed in Figure 5. Figure 6 shows example segmentations of a SysEx command.
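The partition-and-wrap procedure just described can be sketched as follows. segment_sysex() and its cut_points argument are hypothetical conveniences for this illustration. The head and tail status octets it adds follow Figure 5.

```python
from typing import List


def segment_sysex(command: bytes, cut_points: List[int]) -> List[bytes]:
    """Split a verbatim SysEx command into segments using the Figure 5 octets.

    cut_points are indices into the data octets (0xF0 and 0xF7 excluded) at
    which a new sublist starts; every sublist must keep at least one octet.
    """
    if len(command) < 3 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("expected 0xF0, one or more data octets, then 0xF7")
    data = command[1:-1]
    bounds = [0] + sorted(cut_points) + [len(data)]
    if not cut_points or any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("need two or more non-empty sublists")
    last = len(bounds) - 2  # index of the last sublist
    segments = []
    for i, (start, end) in enumerate(zip(bounds, bounds[1:])):
        head = 0xF0 if i == 0 else 0xF7     # only the first sublist opens with 0xF0
        tail = 0xF7 if i == last else 0xF0  # only the last sublist closes with 0xF7
        segments.append(bytes([head]) + data[start:end] + bytes([tail]))
    return segments
```

For the command 0xF0 0x01 ... 0x08 0xF7, cut_points=[4] yields the first two-segment segmentation of Figure 6. cut_points=[2, 4] yields the three-segment segmentation, and cut_points=[1, 2, 3, 4, 5, 6, 7] yields the segmentation with the largest number of segments.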
   ------------------------------------------------------------
   | Sublist Position | Head Status Octet | Tail Status Octet |
   |----------------------------------------------------------|
   | first            | 0xF0              | 0xF0              |
   |----------------------------------------------------------|
   | middle           | 0xF7              | 0xF0              |
   |----------------------------------------------------------|
   | last             | 0xF7              | 0xF7              |
   ------------------------------------------------------------

           Figure 5 -- Command Segmentation Status Octets

   Original SysEx command:

      0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

   A two-segment segmentation:

      0xF0 0x01 0x02 0x03 0x04 0xF0
      0xF7 0x05 0x06 0x07 0x08 0xF7

   A different two-segment segmentation:

      0xF0 0x01 0xF0
      0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

   A three-segment segmentation:

      0xF0 0x01 0x02 0xF0
      0xF7 0x03 0x04 0xF0
      0xF7 0x05 0x06 0x07 0x08 0xF7

   The segmentation with the largest number of segments:

      0xF0 0x01 0xF0
      0xF7 0x02 0xF0
      0xF7 0x03 0xF0
      0xF7 0x04 0xF0
      0xF7 0x05 0xF0
      0xF7 0x06 0xF0
      0xF7 0x07 0xF0
      0xF7 0x08 0xF7

                  Figure 6 -- Example segmentations

The relative ordering of SysEx command segments in a MIDI list must match the relative ordering of the sublists in the original SysEx command. Only System RealTime MIDI commands may appear between SysEx command segments. If the command segments of a SysEx command are placed in the MIDI lists of two or more RTP packets, the segment ordering rules apply to the concatenation of all affected MIDI lists.

The MIDI wire protocol [1] permits a "dropped 0xF7" construction for SysEx commands. In this coding method, the 0xF7 octet is dropped from the end of the SysEx command, and the status octet of the next MIDI command acts both to terminate the SysEx command and to start the next command.
To encode this construction in MWPP, follow these steps:

o Determine the appropriate delta times for the SysEx command and the command that follows the SysEx command.

o Insert the "dropped" 0xF7 octet at the end of the SysEx command, to form the standard SysEx syntax.

o Code both commands into the MIDI list using the rules above.

o Replace the 0xF7 octet that terminates the verbatim SysEx encoding (or the last segment of the segmented SysEx encoding) with a 0xF5 octet. This substitution informs the receiver of the original dropped 0xF7 coding.

4. Recovery Journal Overview

In this section we introduce the recovery journal, the default MWPP resiliency tool for unreliable transport. Readers unfamiliar with the semantics of the MIDI command set may wish to review Appendix D before reading this section.

MIDI is a fragile code. A single lost command in a MIDI command stream may produce an artifact in the rendered performance. We classify MIDI loss artifacts into three categories:

o Transient artifacts. Transient artifacts produce immediate but short-term glitches in the rendered performance. For example, a lost NoteOn command produces a transient artifact: one note fails to play, but no long-term consequences ensue.

o Indefinite artifacts (recoverable). Indefinite artifacts produce long-lasting errors in the rendered performance. For example, a lost NoteOff command may produce an indefinite artifact: the note that should have been ended by the lost NoteOff command may sustain indefinitely.

  However, a lost NoteOff is a recoverable indefinite artifact. A recoverable artifact is one that the renderer can fix without specific knowledge about the lost command. For example, if a renderer suspects that a command loss has occurred, it can issue NoteOff commands for all active NoteOn commands. By doing so, it recovers from all ongoing NoteOff indefinite artifacts, at the cost of unpleasant transient artifacts.
o Indefinite artifacts (unrecoverable). A renderer may need to know specific information about a lost MIDI command in order to perform a recovery. We refer to artifacts that result from the loss of such commands as unrecoverable indefinite artifacts. For example, the loss of a MIDI Controller Change command for the channel volume (controller number 7) produces an unrecoverable indefinite artifact. If this command is lost, all future notes on the channel will play too softly or too loudly. Without knowledge of the volume parameter of the lost command, a renderer cannot reset the channel volume to the correct value. Compliant senders and receivers interoperate to satisfy the following mandate: a MIDI performance rendered from an unreliable MWPP stream MUST NOT contain indefinite artifacts. This memo does not define normative algorithms to meet this mandate. Instead, this memo defines the normative tools that make up the recovery journal system: the recovery journal bitfield format (Section 5, Appendices A.1-8 and B.1-5) and the SDP parameters for customizing the use of the recovery journal (Appendix C.1). These tools, if used judiciously by senders and receivers, are capable of transforming all indefinite artifacts in the received MWPP stream into (at worst) transient artifacts in the rendered MIDI performance. However, the best way to apply the tools depends on the application domain. By mandating an end result, and defining tools to achieve this result, this memo avoids the domain-specific issues inherent in the specification of complete recovery journal algorithms. The recovery journal system is not based on packet retransmission. Instead, each MWPP packet includes a special section (the "recovery journal") that codes the recent history of the stream. By default, the recovery journal codes information about all MIDI command types. Typically, the recovery process begins when a receiver detects a break in the RTP sequence number pattern of the stream. 
The receiver uses the recovery journal of the break packet to guide corrective rendering actions, such as ending stuck notes and updating out-of-date controller values. By doing so, a compliant receiver transforms all indefinite artifacts in the incoming MWPP stream into transient artifacts in the rendered MIDI performance, thereby fulfilling the mandate. [22] discusses receiver implementation issues in detail.

Senders also have a role in fulfilling the mandate. Senders are expected to generate recovery journals that code the information receivers need to transform all possible indefinite artifacts into transient artifacts. In the following section, we examine sender issues in detail.

4.1. Recovery Journal Sender Strategies

The recovery journal codes the history of the MWPP stream, back to an earlier packet called the checkpoint packet. The range of coverage for the journal is called the checkpoint history (Appendix A.1). The recovery journal codes the information necessary to recover from the loss of an arbitrary number of packets in the checkpoint history.

Senders choose a checkpoint history length for each MWPP packet. We refer to the algorithm a sender uses to choose the history length as the sending strategy. Good sender implementations choose a sending strategy that is well-matched to the network properties of the application (bandwidth constraints, unicast or multicast transport, RTP/RTCP or RTP-only sessions, etc.).

The simplest sending strategy is the anchored checkpoint strategy: the sender anchors the checkpoint packet at the first packet in the stream for the duration of the session. As a result, the checkpoint history always covers the entire stream.

Flexibility is the key benefit of the anchored checkpoint strategy. The strategy works for unicast or multicast streams. It does not use RTCP, and it does not require receivers to track the changing identity of a checkpoint packet.
In a multicast session, receivers that join the session mid-stream may easily discover the current value of state variables (such as MIDI channel volumes) by parsing the recovery journal of the first received packet.

The main limitation of the anchored checkpoint strategy is bandwidth efficiency. Because the checkpoint history covers the entire stream, the recovery journals produced by this strategy are usually larger than the journals produced by alternative strategies. However, for some MWPP applications, the absolute bandwidth required by the anchored checkpoint strategy is quite reasonable.

Reference [6] analyzes an MWPP stream that uses the anchored checkpoint strategy (Appendix A.4 in [6]). The stream is driven by a realistic model of a musician playing a synthesizer keyboard. The analysis yields a payload bandwidth for the stream of 6.88 kb/s, comparable to modern voice codecs. The recovery journal for this stream asymptotes to a fixed size of 39 octets, comparable to the 36 octets of overhead for the IP, UDP, and RTP headers.

Unfortunately, the bandwidth requirements of the anchored checkpoint strategy become excessive for dense MWPP streams. For dense streams, the dynamic checkpoint strategy is an efficient alternative to the anchored checkpoint strategy.

The dynamic checkpoint strategy assumes the use of RTCP. In an RTCP implementation, receivers periodically send receiver report (RR) packets to the sender, which code reception statistics. Of particular interest is a 32-bit RR field that codes the extended highest sequence number received (EHSNR) from a sender.

In the dynamic checkpoint strategy, an MWPP sender keeps track of the EHSNR reported by each receiver of the stream. These EHSNR values can be used to deduce the smallest checkpoint history that ensures that the rendered MIDI performances are free from indefinite artifacts.
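A sender using the dynamic checkpoint strategy might derive the checkpoint from the EHSNR values roughly as sketched below. The function and its input mapping are hypothetical. The sketch assumes that a receiver that has reported EHSNR value N has already used the recovery journal of packet N, so the history only needs to cover packets after N. The normative definition of the checkpoint history is in Appendix A.1. Under this assumption, the EHSNR of the slowest receiver is the most recent safe checkpoint. This choice is consistent with the overrun test in Section 5, which flags a checkpoint that is later than the highest sequence number a receiver holds.

```python
from typing import Dict, Optional


def choose_checkpoint(ehsnr: Dict[int, Optional[int]]) -> Optional[int]:
    """Pick a Checkpoint Packet Seqnum from the receivers' RTCP RR reports.

    ehsnr maps each receiver's SSRC to the 32-bit extended highest sequence
    number from its latest RR, or None if it has not reported yet. Returns
    None when the sender should keep its current checkpoint.
    """
    if not ehsnr or any(value is None for value in ehsnr.values()):
        return None  # some receiver's progress is unknown: do not shorten history
    # Extended values order correctly across 16-bit wraparound, so the
    # minimum identifies the receiver that is furthest behind.
    oldest = min(ehsnr.values())
    # Assumption: the history must cover every packet after this one.
    return oldest & 0xFFFF  # the journal field carries only 16 bits
```

Because the 32-bit extended values are compared before truncation, the sender picks the correct receiver even when the 16-bit sequence numbers of different receivers straddle a wraparound.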
If the sender continually updates the checkpoint packet to reduce the checkpoint history to the smallest acceptable size, the average bandwidth of the stream is minimized.

The dynamic checkpoint strategy depends on regular RTCP feedback from all receivers. If one receiver stops sending RTCP packets for an extended period of time, the checkpoint history grows larger, and the dynamic checkpoint strategy degrades into an anchored checkpoint strategy. If bandwidth or MTU limits preclude the use of an anchored checkpoint strategy, and no other options are available, a sender MUST terminate the stream to the unresponsive receiver. It MUST NOT reduce the checkpoint history in a way that prevents receivers from eliminating all indefinite artifacts from the rendered stream. [22] describes sender options for this failure mode in detail.

The dynamic checkpoint strategy is compatible with multicast transport, with a few caveats. The sender state scales linearly with the number of receivers, as the sender needs to track the identity and EHSNR value for each receiver. Receivers that join sessions mid-stream must be made aware of MIDI state variables, such as the channel volume for each MIDI channel. Unlike the anchored checkpoint strategy, this strategy may not carry that information in the recovery journal. Finally, the average recovery journal size depends on the number of receivers, due to the RTCP reporting interval backoff. The backoff interval may also increase the amount of ancillary state used by certain sending and receiving strategies. [22] describes multicast issues in detail.

A third class of checkpoint strategies, named hybrid strategies, addresses the needs of applications whose bandwidth requirements preclude the use of the anchored checkpoint strategy, but whose implementation constraints preclude the use of RTCP (and by inference, the dynamic checkpoint strategy).
Hybrid strategies, in effect, multiplex two virtual recovery journals into a single journal structure. One virtual journal uses the anchored checkpoint strategy, and codes MIDI commands whose loss produces unrecoverable indefinite artifacts. The second virtual journal codes all other MIDI commands, and uses an open-loop variant of the dynamic checkpoint strategy (as the lack of RTCP precludes the use of closed-loop techniques).

Because the second virtual journal runs open-loop, a long packet loss episode might "overrun" its checkpoint history, and leave the rendered MIDI performance vulnerable to (recoverable) indefinite artifacts. However, receivers are always able to detect overruns, by examining the header of the recovery journal. If a receiver detects an overrun, it takes actions to transform all possible recoverable indefinite artifacts into transient artifacts. For example, if note commands are coded in the open-loop journal, the receiver cancels all pending NoteOn commands after an overrun, because a NoteOff command may have "fallen out" of the journal during the overrun.

Hybrid strategies work best if senders manage the size of the checkpoint history to ensure that overruns occur infrequently, by taking into account the delay and loss characteristics of the network. Also, as each checkpoint packet change incurs the risk of an overrun, senders should only move the checkpoint if doing so reduces the size of the journal. [22] discusses open-loop sender behavior in detail.

The SDP parameters defined in Appendix C.1 support hybrid sending strategies. Using these parameters, applications may partition a journal into sections that use the anchored checkpoint strategy and sections that use the open-loop dynamic checkpoint strategy.
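The receiver response in the note-command example above might look like the following sketch. The class and method names are hypothetical, and so is the release velocity of 64 in the generated NoteOff commands, which is an arbitrary choice. The sketch leaves overrun detection to the caller; the Checkpoint Packet Seqnum test is described in Section 5. It assumes each observed command is a complete channel command with its status octet restored.

```python
from typing import List, Set, Tuple


class ActiveNotes:
    """Tracks sounding notes so a receiver can silence them after an overrun."""

    def __init__(self) -> None:
        self._on: Set[Tuple[int, int]] = set()  # (channel, note number)

    def observe(self, command: bytes) -> None:
        """Update state from one complete channel command (status included)."""
        kind, channel = command[0] & 0xF0, command[0] & 0x0F
        if kind == 0x90 and command[2] > 0:
            self._on.add((channel, command[1]))
        elif kind == 0x80 or kind == 0x90:
            # NoteOff, or NoteOn with velocity 0, which MIDI treats as NoteOff.
            self._on.discard((channel, command[1]))

    def cancel_all(self) -> List[bytes]:
        """Return NoteOff commands for every pending note and forget them."""
        note_offs = [bytes([0x80 | ch, note, 64]) for ch, note in sorted(self._on)]
        self._on.clear()
        return note_offs
```

Calling cancel_all() converts a possible stuck note (an indefinite artifact) into, at worst, a prematurely ended note (a transient artifact).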
This memo does not classify the MIDI command set with respect to the loss artifact type (transient, unrecoverable indefinite, or recoverable indefinite). It therefore does not mandate the command partitioning between anchored and dynamic strategies in a hybrid strategy. Instead, the senders and receivers implicitly negotiate an artifact classification scheme, by agreeing to use a session description whose journal configuration is acceptable to all participants.

5. Recovery Journal Format

This section introduces the structure of the recovery journal, and defines the bitfields of recovery journal headers. Appendices A.1-8 and B.1-5 complete the bitfield definition of the recovery journal.

The recovery journal has a three-level structure:

o Top-level header.

o Channel and system journal headers. These encode recovery information for a single MIDI channel (channel journal) and for all MIDI System commands (system journal).

o Chapters. Each chapter describes recovery information for a single MIDI command type.

Figure 7 shows the top-level structure of the recovery journal. A recovery journal consists of a 3-octet header, optionally followed by a system journal and a list of channel journals.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|A|Y|R|TOTCHAN|   Checkpoint Packet Seqnum    |  ...          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ... System journal ...       |  Channel journals ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 7 -- Top-level recovery journal format

If the Y bit is set to 1, a system journal follows the recovery journal header. If the A bit is set to 1, the recovery journal ends with a list of (TOTCHAN + 1) channel journals. If A and Y are both zero, the recovery journal only contains the 3-octet header, and is considered to be an "empty" journal.
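A receiver might parse the top-level header of Figure 7 and apply the overrun test described in the text that follows, as sketched below. The type and function names are hypothetical. The overrun test treats both values as 16-bit RTP sequence numbers. It compares them with modulo 2^16 serial arithmetic and treats a forward distance of less than 2^15 as "greater"; that threshold is an assumption of this sketch.

```python
from typing import NamedTuple


class JournalHeader(NamedTuple):
    single_packet_loss: bool    # S bit
    has_channel_journals: bool  # A bit
    has_system_journal: bool    # Y bit
    channel_journal_count: int  # TOTCHAN + 1 (meaningful only if A is 1)
    checkpoint_seqnum: int      # 16-bit Checkpoint Packet Seqnum


def parse_journal_header(buf: bytes) -> JournalHeader:
    """Parse the 3-octet top-level recovery journal header (Figure 7)."""
    if len(buf) < 3:
        raise ValueError("recovery journal header is 3 octets long")
    first = buf[0]  # bit 0x10 is the reserved R bit, ignored here
    return JournalHeader(
        single_packet_loss=bool(first & 0x80),
        has_channel_journals=bool(first & 0x40),
        has_system_journal=bool(first & 0x20),
        channel_journal_count=(first & 0x0F) + 1,
        checkpoint_seqnum=(buf[1] << 8) | buf[2],
    )


def is_overrun(checkpoint_seqnum: int, highest_received: int) -> bool:
    """True if the checkpoint is later than the highest seqnum already received.

    Both values are 16-bit RTP sequence numbers, compared with serial arithmetic.
    """
    diff = (checkpoint_seqnum - highest_received) & 0xFFFF
    return 0 < diff < 0x8000
```

A journal whose A and Y bits are both zero parses as an empty journal, and the caller would stop after the third octet.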
The S (single-packet loss) bit appears in most recovery journal structures. It helps receivers efficiently parse the recovery journal in the common case of the loss of a single packet. Appendix A.1 defines S bit semantics.

The R bit is reserved. The semantics for all R fields are uniform throughout the recovery journal, and are defined in Appendix A.1.

The 16-bit Checkpoint Packet Seqnum field codes the sequence number of the checkpoint packet for this journal. The choice of the checkpoint packet sets the depth of the recovery journal history, as defined in Appendix A.1.

As described in Section 4, some sending strategies use open-loop techniques, which rely on receivers to detect checkpoint history overruns (the checkpoint history is defined in Appendix A.1). A receiver may check for an overrun by detecting if the Checkpoint Packet Seqnum field is greater (modulo 2^16) than the highest RTP sequence number previously received on the stream.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S| CHAN  |R|      LENGTH       |P|W|N|A|T|C|M|R| Chapters ...  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 8 -- Channel journal format

Figure 8 shows the structure of a channel journal: a 3-octet header, followed by a list of leaf elements called channel chapters. A channel journal encodes information about MIDI commands on the MIDI channel coded by the 4-bit CHAN header field. The 10-bit LENGTH field codes the length of the channel journal. The semantics for LENGTH fields are uniform throughout the recovery journal, and are defined in Appendix A.1.

The third octet of the channel journal header is the Table of Contents (TOC) of the channel journal. The TOC is a set of bits that encode the presence of a chapter in the journal.
Each chapter contains information about a certain class of MIDI channel command:

o Chapter P: MIDI Program Change (0xC)
o Chapter W: MIDI Pitch Wheel (0xE)
o Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)
o Chapter A: MIDI Poly Aftertouch (0xA)
o Chapter T: MIDI Channel Aftertouch (0xD)
o Chapter C: MIDI Control Change (0xB)
o Chapter M: MIDI Parameter System (part of 0xB)

Chapters appear in a list following the header, in order of their appearance in the TOC. Appendices A.2-8 describe the bitfield format for each chapter, and define the conditions under which a chapter type MUST appear in the recovery journal. If any chapter types are required for a channel, an associated channel journal MUST appear in the recovery journal.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|D|V|Q|E|X|      LENGTH       |  System chapters ...          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 9 -- System journal format

Figure 9 shows the structure of the system journal: a 2-octet header, followed by a list of system chapters. System chapters code information about a specific class of MIDI System command:

o Chapter D: Song Select (0xF3), Tune Request (0xF6), Reset (0xFF)
o Chapter V: Active Sense (0xFE)
o Chapter Q: Sequencer State (0xF2, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC)
o Chapter E: MTC Tape Position (0xF1, 0xF0 0x7F 0xcc 0x01 0x01)
o Chapter X: System Exclusive (all other 0xF0)

If header bits D, V, Q, or E are set to 1, one chapter for each chapter type whose associated bit is set appears in a list following the header. The chapter ordering follows the ordering of chapter header bits in the header bitfield. If header bit X is set to 1, one or more Chapter X bitfields appear at the end of the chapter list.
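The channel journal header of Figure 8 and the system journal header of Figure 9 can be decoded as sketched below. The function names and the dictionary result format are hypothetical. The sketch extracts the raw LENGTH value only; its semantics are defined in Appendix A.1. The chapter letters are reported in the order in which the chapters follow the header.

```python
from typing import Dict, List

CHANNEL_TOC_BITS = "PWNATCM"  # TOC bits, most significant first; the last bit is R
SYSTEM_HEADER_BITS = "DVQEX"  # system journal header bits that follow S


def parse_channel_journal_header(buf: bytes) -> Dict[str, object]:
    """Parse the 3-octet channel journal header of Figure 8."""
    chapters: List[str] = [
        name for i, name in enumerate(CHANNEL_TOC_BITS) if buf[2] & (0x80 >> i)
    ]
    return {
        "S": bool(buf[0] & 0x80),
        "CHAN": (buf[0] >> 3) & 0x0F,               # 4-bit MIDI channel
        "LENGTH": ((buf[0] & 0x03) << 8) | buf[1],  # 10 bits; see Appendix A.1
        "chapters": chapters,                       # listed in TOC order
    }


def parse_system_journal_header(buf: bytes) -> Dict[str, object]:
    """Parse the 2-octet system journal header of Figure 9."""
    chapters: List[str] = [
        name for i, name in enumerate(SYSTEM_HEADER_BITS) if buf[0] & (0x40 >> i)
    ]
    return {
        "S": bool(buf[0] & 0x80),
        "LENGTH": ((buf[0] & 0x03) << 8) | buf[1],
        "chapters": chapters,  # "X" means one or more Chapter X bitfields at the end
    }
```

For example, a channel journal header whose TOC octet is 0xA4 announces Chapters P, N, and C, in that order.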
Appendices B.1-5 describe the bitfield format for the system chapters, and define the conditions under which a chapter type MUST appear in the recovery journal. If any system chapter type is required to appear in the recovery journal, the system journal MUST appear in the recovery journal.

6. MWPP and the Session Description Protocol

RTP is a standard for the transport of media streams, but RTP does not perform session management for the streams it carries. Instead, RTP is designed to work together with tools that perform session management, such as the Session Initiation Protocol (SIP, [10]) and the Real Time Streaming Protocol (RTSP, [12]). RTP interacts with session management tools via another standard, the Session Description Protocol (SDP, [9]).

SDP is a textual format for specifying session descriptions. A session description is an ordered list of declarative statements (or "lines"). A session description includes one or more media stream descriptions. A stream description maps an RTP stream to a network transport (for example, unicast UDP at a certain IP number and port number), and defines the numeric value of the PT field in the RTP header for the stream. A stream description also maps each RTP stream to a media encoding (such as MWPP), and may carry configuration parameters for the media encoding.

Session management tools like SIP and RTSP coordinate the exchange of complete session descriptions between session participants. The exchange protocol may be unilateral in nature: a sender proposes a session description, which a receiver must accept in order to join the session. Alternatively, some exchange protocols, like the SIP offer/answer model [11], specify negotiation methods, in which the proposal and acceptance/rejection of session descriptions are components of the negotiation process.
In the sections that follow, we show how to construct session descriptions that include MWPP stream descriptions. Section 6.1 defines the stream description syntax for native MWPP streams. Section 6.2 defines the stream description syntax for mpeg4-generic MWPP streams. In Section 6.3, we introduce the SDP parameter extensions for MWPP; these extensions are described in detail in Appendices C.1-5.

6.1 Session Descriptions for Native MWPP Streams

In this section, we show the session description syntax for sessions that use native MWPP streams (i.e. MWPP streams layered directly onto RTP). For simplicity, we specialize the syntax for unicast UDP transport. See [15] for the syntax for reliable TCP and TLS transport, and see [9] for the syntax for multicast UDP transport.

A session description begins with lines that describe the session characteristics common to all streams (session name, start and end time, etc.). These common lines do not relate to MWPP, and so we do not discuss them here; instead, we refer the reader to [9]. All session description examples in this memo use the same set of common lines, shown below:

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 cs.Berkeley.edu
   s=Example
   t=3238012065 0

One or more media stream descriptions follow the common lines of a session description. The minimal SDP stream description consists of three lines: a media (m=) line, a connection data (c=) line, and an rtpmap attribute (a=rtpmap) line. The media line binds the UDP port number to an RTP payload type, and has the syntax:

   m=audio <port> RTP/AVP <payload type>

The connection line sets the IP number for the RTP stream, and has the syntax:

   c=IN IP4 <IP number>

The rtpmap line maps the payload type to the MIME type for the stream, and has the syntax:

   a=rtpmap:<payload type> <encoding name>/<sample rate>[/<channels>]

The <encoding name> for native MWPP streams is mwpp. The rtpmap line also sets the sample rate and the number of audio channels.
For many MWPP applications, the <channels> field is irrelevant or redundant; we include it here for compatibility reasons. Note that the square brackets around <channels> indicate that it is an optional field. The default value for <channels> is 1 (mono).

We now show an example session description that includes one minimal MWPP stream description:

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 cs.Berkeley.edu
   s=Example
   t=3238012065 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 169.229.60.64
   a=rtpmap:96 mwpp/44100

In this example, each MWPP packet in the stream has an RTP header PT field value of 96, and the sample rate for the RTP header timestamp field is 44100 Hz (Section 2.1 describes the RTP header fields). The RTP stream flows from sender to receiver over unicast UDP address 169.229.60.64, on port 5004. If the RTP Control Protocol (RTCP) is in use, a second unicast UDP stream flowing from receiver to sender appears on port 5005. The low-bandwidth RTCP stream carries information about the reception quality of the forward channel (see [2] for details).

We describe this stream description as minimal, because it does not customize the stream. Without such customization, a native MWPP stream has these default characteristics:

1. If the stream uses unreliable transport (unicast UDP, multicast UDP, ...), the recovery journal system is in use, and the RTP payload contains both the MIDI command section and the journal section. If the stream uses reliable transport (TCP, TLS, ...), the stream does not use journalling, and the payload contains only the MIDI command section. See Section 2.2 for details.

2. If the stream uses the recovery journal system, the stream uses the default format of the recovery journal, as defined in Sections 4 and 5 and Appendices A.1-8 and B.1-5 of this memo.

3. In the MIDI command section of the payload, the command timestamps are interpreted as the command execution time, using the default semantics described in Section 3.

4. An RTP packet does not have a defined maximum media time, and so the timestamp difference between adjacent packets in the stream may be arbitrarily large. See Section 2.1 for details.

5. If more than one minimal mwpp stream appears in a session, the MIDI namespaces for these streams are independent: channel 1 in the first stream does not reference the same MIDI channel as channel 1 in the second stream. In addition, the RTP timestamp fields for the streams do not necessarily share the same random offset value (see Section 2.1), and thus synchronization of the streams must use the generic RTP tools defined in [2].

6. A MIDI rendering method for the stream is not specified.

6.2 Session Description for mpeg4-generic MWPP Streams

In this section, we show the session description syntax for sessions that use mpeg4-generic MWPP streams (i.e. streams that layer MWPP packets onto the mpeg4-generic RTP payload [4]). These streams support MIDI rendering using the MPEG 4 Audio synthetic codecs:

o General MIDI (Object Profile ID 14). This profile renders the MIDI stream using the General MIDI standard [1].

o Wavetable Synthesis (Object Profile ID 13). This profile renders the MIDI stream using the DLS2 standard [18]. The session description includes the RIFF file to initialize the wavetable synthesis engine.

o Main Synthetic (Object Profile ID 12). This profile renders the MIDI stream using Structured Audio [5], an algorithmic synthesis system based on the programming language SAOL. The session description includes the SAOL program and associated data.

Minimal mpeg4-generic MWPP stream descriptions use the same media line, connection line, and rtpmap line format as native MWPP stream descriptions (Section 6.1). The only syntactic difference occurs in the <encoding name> field (mpeg4-generic replaces mwpp).
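The three lines shared by both kinds of minimal stream description can be generated with a helper like the one below. The function is a hypothetical illustration. It omits the common session lines (v=, o=, s=, t=) and, for mpeg4-generic streams, the required fmtp line. It writes the rtpmap attribute with no space after the colon, following the SDP attribute syntax in [9].

```python
from typing import Optional


def minimal_stream_description(port: int, payload_type: int, ip4: str,
                               encoding: str, sample_rate: int,
                               channels: Optional[int] = None) -> str:
    """Return the m=, c=, and a=rtpmap lines of a minimal MWPP stream description."""
    if encoding not in ("mwpp", "mpeg4-generic"):
        raise ValueError("encoding name must be mwpp or mpeg4-generic")
    rtpmap = f"{encoding}/{sample_rate}"
    if channels is not None:  # optional field; when omitted, 1 (mono) is implied
        rtpmap += f"/{channels}"
    return "\n".join([
        f"m=audio {port} RTP/AVP {payload_type}",
        f"c=IN IP4 {ip4}",
        f"a=rtpmap:{payload_type} {rtpmap}",
    ])
```

With port 5004, payload type 96, address 169.229.60.64, encoding mwpp, and sample rate 44100, the helper reproduces the stream lines of the native example in Section 6.1.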
However, a minimal mpeg4-generic MWPP stream description also sets the value of several mpeg4-generic SDP parameters, using fmtp lines. Two of these parameters (mode and streamtype) must be set to specific constant values to create a legal mpeg4-generic MWPP stream. We show the proper initialization for these parameters in the fmtp line below:

   a=fmtp:<payload type> streamtype=5; mode=mwpp;

A third required parameter, profile-level-id, takes on the value 74 for Main Synthetic (Object Profile ID 12), 75 for Wavetable Synthesis (Object Profile ID 13), and 76 for General MIDI (Object Profile ID 14). A fourth required parameter, config, is set to a double-quoted hexadecimal string representation of the AudioSpecificConfig() binary data block. The format of AudioSpecificConfig() is defined in [16]. For the Main Synthetic or Wavetable Synthesis profiles, AudioSpecificConfig() codes the system initialization data (DLS2 samples, SAOL programs, etc.). The config parameter may also be set to the empty string, which acts as an escape code (see Appendix C.5.1).

We now show an example session description that uses a minimal mpeg4-generic MWPP stream to drive General MIDI (Object Profile ID 14):

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 cs.Berkeley.edu
   s=Example
   t=3238012065 0
   m=audio 5004 RTP/AVP 61
   c=IN IP4 169.229.60.64
   a=rtpmap:61 mpeg4-generic/44100
   a=fmtp:61 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;

Each packet in the stream has an RTP header PT field value of 61, and the sample rate for the RTP header timestamp field is 44100 Hz (see Section 2.1 for RTP header field descriptions). The profile-level-id value of 76 informs the receiver to render the MIDI stream using the General MIDI object type. The config value is a hexadecimal string encoding of the short AudioSpecificConfig() used by General MIDI.
The RTP stream flows from sender to receiver over unicast UDP, at port 5004 on IP number 169.229.60.64. If the RTP Control Protocol (RTCP) is in use, a second unicast UDP stream flowing from receiver to sender appears on port 5005. The low-bandwidth RTCP stream carries information about the reception quality of the forward channel (see [2] for details).

We describe this stream description as minimal, because it defines the SDP parameters that are required for mpeg4-generic operation, but does not customize the stream via additional SDP parameters.

In Section 6.1, we describe the behavior of a minimal native MWPP stream, as a numbered list of characteristics. Characteristics 1-4 on that list also describe the minimal mpeg4-generic MWPP stream, but characteristics 5 and 6 require restatements, as listed below:

5. If more than one minimal mpeg4-generic MWPP stream appears in a session, each stream denotes an independent instance of the synthesizer of the object type coded in the profile-level-id parameter. In addition, the RTP timestamp fields for the streams do not necessarily share the same random offset value (see Section 2.1), and thus synchronization of the streams must use the generic RTP tools defined in [2].

6. The minimal mpeg4-generic MWPP stream encodes the AudioSpecificConfig() as an inline double-quoted hexadecimal string. This encoding limits the size of the AudioSpecificConfig() in some situations. Specifically, if the session management tool distributes a session description in a single datagram (such as SIP [10] over UDP transport), the size of the AudioSpecificConfig() string is limited by the Maximum Transmission Unit (MTU) of the underlying network (for Ethernet, the MTU is 1500 octets).

6.3 MWPP SDP Parameters

This section introduces optional MWPP session description parameters, to add features to the minimal streams described in Sections 6.1 and 6.2.
In this section, we briefly discuss the purpose of each parameter, and reference the Appendix C sub-section that contains the complete parameter description. To use an optional parameter in a stream description, include an fmtp line to set the parameter value, in the position mandated by [9]. The syntax for fmtp lines is:

   a=fmtp: <payload type> <parameter name>=<value>; <parameter name>=<value>; ...

The MWPP optional parameters provide several distinct sets of services:

o Journal customization. The j_sec and j_update parameters configure the use of the journal section in the MWPP payload. The ch_default, ch_unused, ch_never, and ch_anchor parameters configure the semantics of the chapter types that appear in the recovery journal. These parameters are described in Appendix C.1, and override the default stream behaviors 1 and 2 listed in Section 6.1 and referenced in Section 6.2.

o MIDI command timestamp semantics. The tsmode, octpos, mperiod, and linerate parameters customize the semantics of the timestamps that label commands in the MIDI command section. These parameters let MWPP accurately encode the implicit time coding of the MIDI wire protocol. These parameters are described in Appendix C.2, and override default stream behavior 3 listed in Section 6.1 and referenced in Section 6.2.

o Media time limits. The standard SDP parameter maxptime sets the maximum media time of an MWPP RTP packet, and as a consequence imposes a minimum sending rate for MWPP. This feature benefits algorithms performing clock-skew compensation, network latency estimation, and packet loss recovery. This parameter is described in Appendix C.3, and overrides default stream behavior 4 listed in Section 6.1 and referenced in Section 6.2.

o Multiple streams. The midiport SDP parameter supports mapping multiple MWPP streams to the same MIDI namespace (for native MWPP streams) or to the same instance of an MPEG 4 object type (for mpeg4-generic MWPP streams).
The zerosync SDP parameter provides an alternative way to synchronize multiple MWPP streams. These parameters are described in Appendix C.4, and override default stream behavior 5 in Sections 6.1 and 6.2.

o MIDI rendering. An extensible set of SDP parameters supports the specification of the MWPP rendering method, for both native MWPP streams and mpeg4-generic MWPP streams. These parameters are described in Appendix C.5, and override default stream behavior 6 in Sections 6.1 and 6.2.

7. Security Considerations

Cryptographic authentication of incoming RTP and RTCP packets is highly recommended when using MWPP. Without such protections, attackers could forge MIDI commands into an ongoing stream, potentially damaging speakers and eardrums. An attacker could also craft RTP and RTCP packets to exploit known bugs in the client, and take effective control of a client machine.

The session management tool should also use cryptographic authentication on all session descriptions, as spoofed AudioSpecificConfig() data blocks are another point of entry for attackers.

The zerosync SDP parameter (described in Appendix C.4.2) impairs a security feature of RTP. In standard RTP, the RTP timestamp is initialized to a randomly chosen value, to reduce the predictability of RTP header values. If the zerosync SDP parameter is used with a non-zero value in a stream description, and a plain-text session description is snooped, an attacker knows the randomly chosen RTP timestamp offset for the stream. If the zerosync SDP parameter is used with a zero value for several stream descriptions in a session, all of these streams use the same randomly chosen RTP offset, and so an attacker may find this offset value easier to determine.

The sasc rendering value for the SDP render parameter (defined in Appendix C.5.1) supports the inclusion of AudioSpecificConfig() data by reference, using the url parameter.
If this url is spoofed, an attacker could change the session configuration in an arbitrary way, and thus forge an attack on the MPEG 4 client.

8. Congestion Control

MWPP has congestion control issues that are unique among RTP audio packetizations. In certain applications, such as network musical performance [6], the packet rate is linked to the gestural rate of a human performer. MWPP implementations SHOULD monitor the MIDI wire protocol stream for command patterns that result in excessive packet rates, and filter these streams as part of MWPP to reduce the packet rate. [22] offers implementation guidance on this issue.

9. Acknowledgements

We thank the networking, media compression, and computer music community members who have contributed to the MWPP standardization effort, including Steve Casner, Robin Davies, Dominique Fober, Philippe Gentric, Chris Grigg, Michel Jullian, Phil Kerr, Young-Kwon Lim, Jan van der Meer, Colin Perkins, Herbie Robinson, Larry Rowe, Dave Singer, Martijn Sipkema, and Giorgio Zoia.

Appendix A. The Recovery Journal Channel Chapters

Appendix A.1. Recovery Journal Definitions

In this Appendix, we define the terminology and the coding idioms that are used in the recovery journal bitfield descriptions in Section 5 (journal header structure), Appendices A.2-8 (channel journal chapters), and Appendices B.1-5 (system journal chapters).

These descriptions assume that the recovery journal resides in the journal section of an RTP packet with sequence number I ("packet I") and that the Checkpoint Packet Seqnum field in the top-level recovery journal header refers to a packet with sequence number C. Sequence number algorithms defined for the recovery journal system use modulo 2^16 arithmetic.

Several bitfield coding idioms appear throughout the recovery journal system, with consistent semantics. Most recovery journal elements begin with an "S" (Single-packet loss) bit.
S bits are designed to help receivers efficiently parse through the recovery journal hierarchy in the common case of the loss of a single packet.

The default value of the S bit is 1. An S bit for a recovery journal element in packet I is set to 0 if the element encodes data about a MIDI command stored in the MIDI command section of packet I - 1. If an element has its S bit set to 0, all higher-level recovery journal elements that contain it also have S bits that are set to 0, including the top-level recovery journal header (Figure 7 in Section 5).

Other coding idioms that appear with consistent semantics throughout the recovery journal system are described below.

o R flag bit. R flag bits are reserved for future use by MWPP. Senders MUST set R bits to 0; receivers MUST ignore R bit values.

o LENGTH field. A field named LENGTH (as distinct from LEN) codes the number of octets in the structure that contains it, including the header it resides in and all hierarchical levels below it. This definition simplifies parsing, as receivers may skip over the entire structure with an addition operation.

We now define normative terms used to describe recovery journal semantics.

o Checkpoint history. The checkpoint history of a recovery journal is the concatenation of the MIDI command sections of packets C through I - 1. The last MIDI command in the MIDI command section for packet I - 1 is considered the most recent command; the first MIDI command in the MIDI command section for packet C is the oldest command. A checkpoint history with no MIDI commands is considered to be empty. The checkpoint history never contains the MIDI command section of packet I (the packet containing the recovery journal), so if C == I, the checkpoint history is empty by definition.
The definitions of MIDI command recency and history emptiness are the same as in the checkpoint history. The session history never contains the MIDI command section of packet I, and so the session history of the first packet in the session is empty by definition.

o Finished/unfinished commands. If all octets of a MIDI command appear in the session history, the command is defined to be finished. If some but not all octets of a MIDI command appear in the session history, the command is defined to be unfinished. Unfinished commands occur if segments of a SysEx command appear in several RTP packets. For example, if a SysEx command is coded as 3 segments, with segment 1 in packet K, segment 2 in packet K + 1, and segment 3 in packet K + 2, the session histories for packets K + 1 and K + 2 contain unfinished versions of the command.

o Active commands (default). For most types of MIDI commands, an active MIDI command is defined to be a MIDI command that does not appear before one of the following MIDI commands in the session history: System Reset (0xFF), General MIDI System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7). A few types of MIDI commands use a modified meaning of active (see below).

o Active commands (NoteOn, NoteOff, Poly Aftertouch). For MIDI NoteOn, NoteOff, and Poly Aftertouch commands, an active MIDI command is defined to be a MIDI command that does not appear before one of the following MIDI commands in the session history: System Reset (0xFF), General MIDI System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7), MIDI Control Change number 120 (All Sound Off) or 123 (All Notes Off).

o Active commands (MIDI Control Change).
For MIDI Control Change commands, an active MIDI command is defined to be a MIDI command that does not appear before one of the following MIDI commands in the session history: System Reset (0xFF), General MIDI System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7), MIDI Control Change number 121 (All Controllers Off).

The chapter definitions in Appendices A.2-8 and B.1-5 reflect the default recovery journal behavior of MWPP. The j_update, ch_default, ch_unused, ch_never, and ch_anchor SDP parameters modulate these definitions, as described in Appendix C.1.

Finally, we note that channel journals only encode information about MIDI commands appearing on the MIDI channel the journal protects. All references to MIDI commands in Appendices A.2-8 should be read as "MIDI commands appearing on this channel."

Appendix A.2. Chapter P: MIDI Program Change

A channel journal MUST contain Chapter P if an active Program Change (0xC) command appears in the checkpoint history. Figure A.2.1 shows the format for Chapter P.

       0                   1                   2
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   PROGRAM   |C| BANK-COARSE |F|  BANK-FINE  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure A.2.1 -- Chapter P Format

The chapter has a fixed size of 24 bits. The PROGRAM field indicates the program value of the most recent Program Change command in the checkpoint history.

By default, bits 8-23 of Chapter P are set to 0. However, if an active Control Change (0xB) command for controller 0 (Bank Select Coarse) appears before this Program Change command in the session history, the C bit is set to 1, and the BANK-COARSE field is set to the 7-bit data value for the most recent Control Change command for controller 0. The F bit and BANK-FINE field code the Control Change command for controller 32 (Bank Select Fine) in an identical manner.
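The fixed Chapter P layout in Figure A.2.1 maps directly onto three octets. The following is a minimal Python sketch, assuming that bit 0 in the figure is the most significant bit of the first octet (the usual RTP bitfield convention). The function names are hypothetical. The sketch does not decide whether a Bank Select command qualifies for coding. It only packs and unpacks values that the caller has already chosen.

```python
from typing import Optional


def pack_chapter_p(program: int, bank_coarse: Optional[int] = None,
                   bank_fine: Optional[int] = None, s: int = 1) -> bytes:
    """Pack Chapter P (Figure A.2.1) into its fixed 3-octet form.

    bank_coarse / bank_fine are None when no qualifying Bank Select
    command exists; the C / F bit and its field then stay 0 (the default).
    """
    for v in (program, bank_coarse, bank_fine):
        if v is not None and not 0 <= v <= 127:
            raise ValueError("MIDI data values are 7-bit")
    if s not in (0, 1):
        raise ValueError("S is a single bit")
    c = 0 if bank_coarse is None else 1
    f = 0 if bank_fine is None else 1
    return bytes([
        (s << 7) | program,             # S | PROGRAM
        (c << 7) | (bank_coarse or 0),  # C | BANK-COARSE
        (f << 7) | (bank_fine or 0),    # F | BANK-FINE
    ])


def unpack_chapter_p(chapter: bytes) -> dict:
    """Decode the 3 octets of a Chapter P bitfield."""
    if len(chapter) < 3:
        raise ValueError("Chapter P is 3 octets long")
    o0, o1, o2 = chapter[0], chapter[1], chapter[2]
    return {
        "s": o0 >> 7,
        "program": o0 & 0x7F,
        # A field is meaningful only when its flag bit (C or F) is set.
        "bank_coarse": o1 & 0x7F if o1 & 0x80 else None,
        "bank_fine": o2 & 0x7F if o2 & 0x80 else None,
    }
```

For example, program 5 with no bank data packs to 0x85 0x00 0x00. The zero second and third octets reflect the default rule that bits 8-23 are 0.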
Appendix A.3. Chapter W: MIDI Pitch Wheel

A channel journal MUST contain Chapter W if an active MIDI Pitch Wheel (0xE) command appears in the checkpoint history. Figure A.3.1 shows the format for Chapter W.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|    FIRST    |R|   SECOND    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure A.3.1 -- Chapter W Format

The chapter has a fixed size of 16 bits. The FIRST and SECOND fields are the 7-bit values of the first and second data octets of the most recent active Pitch Wheel command in the checkpoint history.

Appendix A.4. Chapter N: MIDI NoteOff and NoteOn

In this Appendix, we consider NoteOn commands with zero velocity to be NoteOff commands.

A channel journal MUST contain Chapter N if an active MIDI NoteOn (0x9) or NoteOff (0x8) command appears in the checkpoint history. Figure A.4.1 shows the format for Chapter N.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |B|     LEN     |  LOW  | HIGH  |S|   NOTENUM   |Y|  VELOCITY   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NOTENUM   |Y|  VELOCITY   |             ....              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   BITFIELD    |   BITFIELD    |     ....      |   BITFIELD    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.4.1 -- Chapter N Format

Chapter N codes the most recent active NoteOn or NoteOff reference to each MIDI note number in the checkpoint history. Chapter N consists of a 2-octet header, followed by at least one of the following data structures:

o A list of note logs to code NoteOn commands.

o A NoteOff bitfield structure to code NoteOff commands.

The note log list MUST contain an entry for all note numbers whose most recent checkpoint history appearance is in a NoteOn command.
The NoteOff bitfield structure MUST contain a set bit for all note numbers whose most recent checkpoint history appearance is in a NoteOff command. A note number is never coded in both structures.

The header for Chapter N, reproduced in Figure A.4.2, codes the size of the note list and bitfield structures.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |B|     LEN     |  LOW  | HIGH  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure A.4.2 -- Chapter N Header

The 7-bit LEN field codes the number of 2-octet note logs in the note list. Zero is a valid value for LEN, and codes the empty note list. A LEN value of 127 serves double duty, coding a note list length of 128 note logs (if LOW = 0xF and HIGH = 0x0) or 127 note logs (for any other LOW/HIGH combination). This mechanism supports the unlikely, but legal, condition of 128 concurrent NoteOn commands, one for each note number.

The 4-bit LOW and HIGH fields code the number of NoteOff bitfield octets that follow the note log list. LOW and HIGH are unsigned integer values. If LOW is less than or equal to HIGH, there are (HIGH - LOW + 1) NoteOff bitfield octets in the chapter. An empty NoteOff bitfield structure is coded by setting LOW to 15 and HIGH to 0 or 1.

The B bit is set to 1 if the MIDI command section of packet I - 1 does not include a NoteOff command for this channel. The B bit, like the S bit (Appendix A.1), helps receivers efficiently parse recovery journals in the common case of the loss of a single packet.

We now describe the 2-octet note log structure, reproduced in Figure A.4.3.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NOTENUM   |Y|  VELOCITY   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure A.4.3 -- Chapter N Note Log

The 7-bit NOTENUM field codes the note number for the log; a note number may not be represented by multiple note logs in the note list. The 7-bit VELOCITY field codes the velocity value for the most recent NoteOn command for the note number in the checkpoint history. VELOCITY is never zero; NoteOn commands with zero velocity are coded as NoteOff commands in the NoteOff bitfield structure.

The note log does not code the execution time of the NoteOn command. However, the Y bit codes a hint from the sender about the NoteOn execution time. This hint takes the form of a recommendation to play (Y = 1) or skip (Y = 0) a recovered NoteOn command from this log. More specifically, Y is set to 1 if the NoteOn command coded by the note log is considered to be simultaneous with the RTP timestamp of the packet that contains the note log. The metric used to judge simultaneity is implementation dependent.

We now describe the NoteOff bitfield structure. A NoteOff bitfield octet codes NoteOff information for eight consecutive MIDI note numbers, with the MSB representing the lowest note number. The MSB of the first bitfield octet codes the note number 8*LOW; the MSB of the last bitfield octet codes the note number 8*HIGH. A set bit codes a NoteOff command for the note number; Chapter N does not code NoteOff velocity data.

In the most efficient coding for the NoteOff bitfield structure, the first and last octets of the structure contain at least one set bit.

Appendix A.5. Chapter A: MIDI Poly Aftertouch

A channel journal MUST contain Chapter A if an active Poly Aftertouch (0xA) command appears in the checkpoint history. Figure A.5.1 shows the format for Chapter A.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|     LEN     |S|   NOTENUM   |R|  PRESSURE   |S|   NOTENUM   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |R|  PRESSURE   |                     ....                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.5.1 -- Chapter A Format

The chapter consists of a 1-octet header, followed by a variable length list of 2-octet note logs. A note log MUST appear for a note number if an active Poly Aftertouch command for the note number appears in the checkpoint history. A note number may not be represented by multiple note logs in the note list. The 7-bit LEN field codes the number of note logs in the list, minus one.

Figure A.5.2 reproduces the note log structure of Chapter A.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NOTENUM   |R|  PRESSURE   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure A.5.2 -- Chapter A Note Log

The 7-bit PRESSURE field codes the pressure value of the most recent Poly Aftertouch command in the checkpoint history. The MIDI note number for this command is coded in the 7-bit NOTENUM field.

Appendix A.6. Chapter T: MIDI Channel Aftertouch

A channel journal MUST contain Chapter T if an active MIDI Channel Aftertouch (0xD) command appears in the checkpoint history. Figure A.6.1 shows the format for Chapter T.

       0
       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |S|  PRESSURE   |
      +-+-+-+-+-+-+-+-+

    Figure A.6.1 -- Chapter T Format

The chapter has a fixed size of 8 bits. The 7-bit PRESSURE field holds the pressure value of the most recent active Channel Aftertouch command sent on this channel.

Appendix A.7. Chapter C: MIDI Control Change

A channel journal MUST contain Chapter C if an active Control Change (0xB) command appears in the checkpoint history (excepting controller numbers 0, 6, 32, 38, 96, 97, 98, 99, 100, and 101). In certain cases (defined later in this Appendix) this rule also applies to the excepted controller numbers. Figure A.7.1 shows the format for Chapter C.
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|     LEN     |S|   NUMBER    |A|  VALUE/ALT  |S|   NUMBER    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |A|  VALUE/ALT  |                     ....                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.7.1 -- Chapter C Format

The chapter consists of a 1-octet header, followed by a variable length list of 2-octet controller logs. The list MUST contain an entry for a controller number if an active Control Change command for the number appears in the checkpoint history (excepting numbers 0, 6, 32, 38, 96, 97, 98, 99, 100, 101, 124, 125, 126, and 127). In certain cases (defined later in this Appendix) this rule also applies to the excepted controller numbers.

The 7-bit LEN field codes the number of controller logs in the list, minus one. A controller number may not appear in multiple controller logs in the list.

Figure A.7.2 reproduces the controller log structure of Chapter C.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NUMBER    |A|  VALUE/ALT  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure A.7.2 -- Chapter C Controller Log

The 7-bit NUMBER field identifies the controller number. The 7-bit VALUE/ALT field codes recovery information for the most recent Control Change command for this number in the checkpoint history.

Chapter C provides three tools for coding recovery information for a command in the VALUE/ALT field: the value tool, the toggle tool, and the count tool. Implementations may choose among the tools to code a Control Change command.

In the value tool, the 7-bit VALUE field codes the control value of the most recent Control Change command for this controller number. This tool works best for controllers that code a continuous quantity, such as number 1 (Modulation Wheel). If the value tool is chosen, the A bit is set to 0.
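The controller log layout and the value tool can be illustrated with a short Python sketch. It makes several simplifying assumptions. Every log uses the value tool. All S bits keep their default value of 1, whereas a real sender clears them as described in Appendix A.1. Logs are emitted in ascending controller-number order, although the draft does not mandate an order. The excepted-controller rules given later in this Appendix are not applied. Function names are hypothetical.

```python
def pack_chapter_c_values(controllers: dict[int, int]) -> bytes:
    """Pack a Chapter C whose controller logs all use the value tool.

    controllers maps controller number -> control value of the most
    recent Control Change command for that number; a dict keeps each
    controller number in at most one log.
    """
    if not 1 <= len(controllers) <= 128:
        raise ValueError("Chapter C holds 1 to 128 controller logs")
    out = bytearray([0x80 | (len(controllers) - 1)])  # S=1 | LEN (count - 1)
    for number, value in sorted(controllers.items()):
        if not (0 <= number <= 127 and 0 <= value <= 127):
            raise ValueError("controller numbers and values are 7-bit")
        out.append(0x80 | number)  # S=1 | NUMBER
        out.append(value)          # A=0 (value tool) | VALUE
    return bytes(out)


def unpack_chapter_c(chapter: bytes) -> list[tuple[int, int, int]]:
    """Return (NUMBER, A, VALUE/ALT) for each controller log."""
    count = (chapter[0] & 0x7F) + 1  # LEN codes the log count minus one
    if len(chapter) < 1 + 2 * count:
        raise ValueError("truncated Chapter C")
    logs = []
    for i in range(count):
        first, second = chapter[1 + 2 * i], chapter[2 + 2 * i]
        logs.append((first & 0x7F, second >> 7, second & 0x7F))
    return logs
```

Because LEN stores the log count minus one, a two-log chapter carries LEN = 1. The header octet is therefore 0x81 when its S bit is 1.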
The A bit is set to 1 to code the toggle or count tool. These tools work best for controllers that code discrete actions. Figure A.7.3 shows the controller log for these tools.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|   NUMBER    |1|T|    ALT    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure A.7.3 -- Controller Log for ALT Tools

The T flag is set to 1 to code the toggle tool; T is set to 0 to code the count tool. Both methods use the 6-bit ALT field as an unsigned integer.

The toggle tool works best for controllers that act as on/off switches, such as 64 (Hold Pedal). These controllers code the "off" state with control values 0-63 and the "on" state with 64-127. The ALT field codes the total number of toggles (off->on and on->off) due to Control Change commands for this controller number in the session history. Toggle counting is performed modulo 64, and the controller is assumed to be off at the start of a session.

The Hold Pedal controller illustrates the benefit of the toggle tool over the value tool for switch controllers. As often used in piano applications, the "on" state of the Hold Pedal lets notes resonate, while the "off" state immediately damps notes to silence. The loss of the "off" command in an "on->off->on" sequence results in ringing notes that should have been damped silent. The toggle tool lets receivers detect this lost "off" command, but the value tool does not.

The count tool is similar to the toggle tool, but is optimized for controllers whose value octet is ignored, such as 123 (All Notes Off). For the count tool, the ALT field codes the total number of Control Change commands for this controller number in the session history. Command counting is performed modulo 64, and the command count is set to 0 at the start of the session.

We now describe normative coding rules for the controller numbers that are excepted from the general rules presented in the beginning of this Appendix.
For each excepted controller number, we define the conditions under which a controller log MUST appear in Chapter C for the controller number. By extension, these conditions imply that Chapter C MUST appear in the recovery journal.

If active Control Change commands for controller numbers 0 (Bank Select Coarse) or 32 (Bank Select Fine) appear in the checkpoint history, the most recent commands for these numbers MUST appear as entries in the controller list only if the data values for these commands are not coded in the BANK-COARSE (0) or BANK-FINE (32) fields of Chapter P (Appendix A.2) for the channel journal. This rule avoids redundant coding in Chapters C and P.

Several controller number pairs are defined to be mutually exclusive. Controller numbers 124 (Omni Off) and 125 (Omni On) form a mutually exclusive pair, as do controller numbers 126 (Mono) and 127 (Poly). If active Control Change commands for one or both members of a mutually exclusive pair appear in the checkpoint history, one and only one controller log MUST appear in the controller list to code the pair. This controller log MUST code the controller number of the most recent Control Change command of the pair.

Appendix A.8 defines Chapter M, the MIDI Parameter chapter, to provide resiliency for the MIDI registered/non-registered parameter system. Here, we define the Chapter C rules for coding Control Change commands related to the registered/non-registered parameter system. These Chapter C rules serve to minimize redundancy with Chapter M.

Control Change commands for controller numbers 6 and 38 (Data Slider) and 96 and 97 (Data Button) may be used as part of the parameter system, or may be used as general-purpose controllers. Control Change commands for controller numbers 6, 38, 96, or 97 that appear in the checkpoint history, and that are used in the parameter system, MUST NOT appear as entries in the controller list.
However, if active Control Change commands for controller numbers 6, 38, 96, or 97 appear in the checkpoint history, and these commands are used as general-purpose controllers, the most recent general-purpose command instance for these numbers MUST appear as entries in the controller list.

A parameter system transaction begins with paired Control Change commands for numbers 98 and 99 (Non-Registered Parameter LSB and MSB) or 100 and 101 (Registered Parameter LSB and MSB). Chapter M codes these paired Control Change commands. The Chapter C rule below acts to code "unpaired" commands for these controller numbers, which appear in the checkpoint history if a (98, 99) or (100, 101) pair is split across the MIDI command sections of two MWPP packets.

If the most recent active Control Change command for controller 98, 99, 100, or 101 in the checkpoint history is part of a (98, 99) or (100, 101) command pair that begins a parameter system transaction, the command MUST NOT appear in the controller list. However, if the most recent active Control Change command for controller 98, 99, 100, or 101 in the checkpoint history does not form part of a (98, 99) or (100, 101) command pair, an entry MUST appear in the controller list.

Appendix A.8. Chapter M: MIDI Parameter System

A channel journal MUST contain Chapter M if an active Control Change command that forms part of an initiated parameter system transaction (as defined below) appears in the checkpoint history.

We begin by defining the terms "parameter system", "parameter system transaction", and "initiated parameter system transaction" as used in this Appendix.

o Parameter system. This phrase refers to a MIDI feature that provides two sets of 16,384 parameters to augment the Control Change controller number space. The Registered Parameter Names (RPN) system and the Non-Registered Parameter Names (NRPN) system each provide 16,384 parameters.

o Parameter system transaction.
The values of RPNs and NRPNs are changed by a series of Control Change commands that form a transaction. A transaction begins with two Control Change commands to set the parameter number (controller numbers 98 and 99 for NRPNs, controller numbers 100 and 101 for RPNs). The transaction continues with an arbitrary number of Data Entry (controller numbers 6 and 38) and Data Button (controller numbers 96 and 97) Control Change commands to set the parameter value. The transaction ends with a second pair of (98, 99) or (100, 101) Control Change commands. These terminal commands are considered a part of the transaction. In addition, the terminal commands may start a second parameter system transaction; in this case, these commands belong to two transactions.

o Initiated parameter system transaction. An initiated parameter system transaction is a transaction whose (98, 99) or (100, 101) initial active Control Change command pair appears in the session history. Under certain conditions, unpaired active Control Change commands for controller numbers 98, 99, 100, or 101 are coded in Chapter C, as described in Appendix A.7.

Figure A.8.1 shows the variable-length format of Chapter M.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|P|N|R|R|R|      LENGTH       |   Transaction log list ...    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.8.1 -- Top-level Chapter M Format

Chapter M consists of a 2-octet header, followed by a list of transaction log entries. The 10-bit LENGTH field codes the length of Chapter M, and conforms to the semantics described in Appendix A.1.

If an active Control Change command that forms part of an initiated parameter system transaction appears in the checkpoint history, a log entry for the transaction MUST appear in the transaction list.
The relative order of transaction list entries MUST reflect the relative position of parameter transactions in the session history: the first log entry codes the most recent parameter transaction in the history, the second log entry codes a transaction that appears before the first parameter transaction in the history, etc.

The P header bit is set to 1 if an active Control Change command pair to terminate the first RPN transaction in the log list does not appear in the session history. The N header bit has the same role for the first NRPN transaction in the log list.

Figure A.8.2 shows the structure of a transaction log.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|T|        PARAM-NUMBER       |      KEY      |   DATA ...    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |              ...              |      KEY      |   DATA ...    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure A.8.2 -- Transaction Log Structure

The transaction log consists of a 2-octet header, followed by a compressed enumeration of the Control Change commands for controller numbers 6, 38, 96, and 97 for this transaction in the session history. The presence of Control Change commands to terminate the transaction is coded implicitly by the P and N header bits of the top-level chapter format (Figure A.8.1).

A transaction log header codes the parameter identity. If T is set to 1, the log codes an NRPN parameter; if T is set to 0, the log codes an RPN parameter. The 14-bit PARAM-NUMBER header field codes the parameter number.

The KEY and DATA fields that follow the log header encode the compressed enumeration of the Control Change commands for numbers 6, 38, 96, and 97.
The ordering of this enumeration matches the ordering of commands in the transaction: the first transaction command appears as the first command in the enumeration, the second transaction command appears as the second command in the enumeration, etc.

KEY and DATA fields always appear in pairs in the transaction log; at least one KEY-DATA pair MUST appear in a transaction log, even if no Control Change commands need to be coded. The KEY field has a fixed 1-octet size, and acts as a directory for the KEY-DATA pair; each DATA field has a variable size of 0-3 octets. Figure A.8.3 shows the format of the KEY octet.

       0
       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |S|M|IN1|IN2|IN3|
      +-+-+-+-+-+-+-+-+

     Figure A.8.3 -- Key Octet

The two-bit fields IN1, IN2, and IN3 code the appearance and meaning of the first, second, and third DATA octets that may follow the KEY octet. The IN fields code the following information:

o IN_k = 00. The DATA octet for this position is not present. The permitted placements of the 00 value are: IN1 = IN2 = IN3 = 00 (no DATA octets follow the KEY octet), IN2 = IN3 = 00 (one DATA octet follows the KEY octet), IN3 = 00 (two DATA octets follow the KEY octet).

o IN_k = 01. Indicates an active Control Change command for controller number 6 (Data Entry Slider Coarse); the DATA octet codes the third octet of the Control Change command.

o IN_k = 10. Indicates an active Control Change command for controller number 38 (Data Entry Slider Fine); the DATA octet codes the third octet of the Control Change command.

o IN_k = 11. Indicates one or more active Control Change commands for controller number 96 (Data Button Increment) and/or 97 (Data Button Decrement), without an intervening Control Change command for controller number 6 or 38. The DATA octet codes the cumulative effect of the Data Button commands, as a two's complement 8-bit value: controller 96 commands increment the value by 1, controller 97 commands decrement the value by 1.
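The KEY-DATA enumeration described above can be sketched in Python as follows. The helper names are hypothetical. The IN codes follow the list above. The M bit of Figure A.8.3 is set when another KEY octet follows, and all S bits are left at 1. One behavior is an assumption of this sketch: it rejects a net Data Button effect outside -128..127, because the draft does not say how such a case is coded.

```python
IN_ABSENT, IN_CC6, IN_CC38, IN_BUTTONS = 0b00, 0b01, 0b10, 0b11


def enumerate_transaction(commands: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Map a transaction's (controller, value) commands to (IN code, DATA).

    Consecutive 96/97 commands collapse into one IN_BUTTONS item holding
    their net effect (+1 per 96, -1 per 97); their value octets are unused.
    """
    items: list[list[int]] = []
    for controller, value in commands:
        if controller in (6, 38):
            if not 0 <= value <= 127:
                raise ValueError("Data Entry values are 7-bit")
            items.append([IN_CC6 if controller == 6 else IN_CC38, value])
        elif controller in (96, 97):
            step = 1 if controller == 96 else -1
            if items and items[-1][0] == IN_BUTTONS:
                items[-1][1] += step  # no intervening 6 or 38: accumulate
            else:
                items.append([IN_BUTTONS, step])
        else:
            raise ValueError(f"controller {controller} is not 6, 38, 96 or 97")
    result = []
    for code, data in items:
        if code == IN_BUTTONS:
            if not -128 <= data <= 127:
                raise ValueError("net Data Button effect exceeds 8 bits")
            data &= 0xFF  # 8-bit two's complement
        result.append((code, data))
    return result


def pack_key_data(items: list[tuple[int, int]], s: int = 1) -> bytes:
    """Pack (IN code, DATA) items into KEY-DATA pairs (Figure A.8.3).

    Each KEY octet carries up to three IN fields, with unused trailing
    fields set to 00. An empty enumeration still yields one KEY octet,
    because at least one KEY-DATA pair must appear.
    """
    groups = [items[i:i + 3] for i in range(0, len(items), 3)] or [[]]
    out = bytearray()
    for n, group in enumerate(groups):
        m = 1 if n < len(groups) - 1 else 0  # M = 1: another KEY follows
        codes = [code for code, _ in group] + [IN_ABSENT] * (3 - len(group))
        out.append((s << 7) | (m << 6) | (codes[0] << 4) | (codes[1] << 2) | codes[2])
        out.extend(data for _, data in group)
    return bytes(out)
```

For example, the transaction commands 6, 96, 96, 97, 38 collapse to three items. They fit behind a single KEY octet, 0x9E, followed by three DATA octets. The Data Button item's octet is 0x01, the net effect of +1 +1 -1.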
The M flag is 1 if another KEY octet follows the DATA octet(s). If M is 0, another transaction log may follow the DATA octet(s), or the DATA octet(s) may mark the end of Chapter M, depending on the LENGTH field of the top-level Chapter M header shown in Figure A.8.1.

In comparison with other recovery journal chapters, Chapter M is inefficient: each transaction for a parameter number in the checkpoint history is listed in the transaction list, and each Control Change command for a transaction is enumerated in a transaction log. This design decision trades off recovery journal size for design simplicity. In practice, parameter system commands rarely appear in MIDI streams, and this design decision does not have a significant impact on MWPP bandwidth requirements.

Appendix B. The Recovery Journal System Chapters

Appendix B.1. System Chapter D: Reset, Song Select, Tune Request

The system journal MUST contain Chapter D if an active MIDI Reset (0xFF), MIDI Tune Request (0xF6), or MIDI Song Select (0xF3) command appears in the checkpoint history. Note that General MIDI reset commands are coded in Chapter X (Appendix B.5), not in Chapter D.

Figure B.1.1 shows the variable-length format for Chapter D.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|E|T|G|R|R|R|R|               Command logs ...                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure B.1.1 -- System Chapter D Format

The chapter consists of a 1-octet header, followed by one or more command logs. Header flag bits indicate the presence of command logs for the Reset (E = 1), Tune Request (T = 1), and Song Select (G = 1) commands. Command logs appear in a list following the header, in the order that their flag bits appear in the header.

Figure B.1.2 shows the 1-octet command log format for the Reset and Tune Request commands.
    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |S|    COUNT    |
   +-+-+-+-+-+-+-+-+

          Figure B.1.2 -- Command Log for Reset and Tune Request

Chapter D MUST contain the Reset command log if an active Reset command appears in the checkpoint history. The 7-bit COUNT field codes the total number of Reset commands (modulo 128) present in the session history.

Chapter D MUST contain the Tune Request command log if an active Tune Request command appears in the checkpoint history. The 7-bit COUNT field codes the total number of Tune Request commands (modulo 128) present in the session history.

Figure B.1.3 shows the 1-octet command log format for the Song Select command.

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |S|    VALUE    |
   +-+-+-+-+-+-+-+-+

          Figure B.1.3 -- Song Select Command Log Format

Chapter D MUST contain the Song Select command log if an active Song Select command appears in the checkpoint history. The 7-bit VALUE field codes the song number of the most recent Song Select command in the checkpoint history.

Appendix B.2. System Chapter V: Active Sense Command

The system journal MUST contain Chapter V if an active MIDI Active Sense (0xFE) command appears in the checkpoint history. Figure B.2.1 shows the format for Chapter V.

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |S|    COUNT    |
   +-+-+-+-+-+-+-+-+

          Figure B.2.1 -- System Chapter V Format

The 7-bit COUNT field codes the total number of Active Sense commands (modulo 128) present in the session history.

Appendix B.3. System Chapter Q: Sequencer State Commands

This Appendix describes Chapter Q, the system chapter for the MIDI sequencer commands. The system journal MUST contain Chapter Q if an active MIDI Song Position Pointer (0xF2), MIDI Clock (0xF8), MIDI Tick (0xF9), MIDI Start (0xFA), MIDI Continue (0xFB), or MIDI Stop (0xFC) command appears in the checkpoint history.
Note that MIDI Tick, a relatively recent addition to the MIDI standard [1], is a seconds-based alternative to MIDI Clock.

Figure B.3.1 shows the variable-length format for Chapter Q.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|N|D|C|T|Q|TOP|             CLOCK             |     TICKS     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |             QNOTE             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      |
   +-+-+-+-+-+-+-+-+

               Figure B.3.1 -- System Chapter Q Format

Unlike most chapters, Chapter Q does not provide resiliency by coding log entries for individual MIDI commands. Instead, Chapter Q captures the cumulative effect of all sequencer commands in the session history, by encoding the most recent sequencer system state. This coding strategy yields an efficient chapter design: the minimal Chapter Q configuration fits in 3 octets.

In a temporal sense, the fields of Chapter Q reflect system state up to (but not including) the moment encoded by the RTP timestamp of the packet in which it resides (packet I, as defined in Appendix A.1).

In normal operation, a receiver examines Chapter Q after a packet loss episode, in order to re-synchronize its open-loop estimation of the sequencer state. Chapter Q state information includes the position of the sequencer pointer (coded by the CLOCK and/or TICKS field), the presence of the downbeat (the D bit), and the on/off state of the sequencer (the N bit).

In addition, Chapter Q may optionally code an estimate of the current tempo in the QNOTE field. QNOTE helps loss recovery in two ways. If the sequencer is running, a tempo estimate may help a receiver re-synchronize faster.
If the sequencer is stopped, QNOTE tracks tempo changes in the MIDI Clock or MIDI Tick stream; this information helps receivers react smoothly if a Start or Continue command appears soon after a packet loss episode.

We now state the normative definition of the Chapter Q bitfields. Chapter Q consists of a 1-octet header followed by several optional fields, in the order shown in Figure B.3.1. Three header bits (C, T, and Q) indicate the presence of fields following the header. Two header bits (N and D) encode aspects of the sequencer system state directly.

Header flag bits C, T, and Q signal the presence of the 16-bit CLOCK field (C set to 1), the 24-bit TICKS field (T set to 1), and the 24-bit QNOTE field (Q set to 1).

The N header bit encodes the relative occurrence of the Start, Continue, and Stop commands in the session history. If an active Start or Continue command appears most recently, N is set to 1. If an active Stop command appears most recently, or if no active instances of these commands appear in the session history, N is set to 0.

The D header bit encodes the presence of the downbeat. If N is set to 1, D is set to 1 if at least one Clock or Tick command follows the most recent Start or Continue command in the session history. If this condition does not hold, or if N is 0, then D is set to 0.

If N is set to 0 (coding a stopped sequence), or if N is set to 1 and D is set to 0 (coding a sequence on the verge of beginning), Chapter Q MUST encode the starting song position of the sequence. The C and T header flags, the optional CLOCK (if C is set to 1) and TICKS (if T is set to 1) fields, and the TOP header field act to code the starting song position, via the methods described below.

o If C = 0 and T = 0, the starting song position is at the beginning of the song.

o If C = 1 and T = 0, the 2-bit TOP header field and the 16-bit CLOCK field are combined to form the 18-bit unsigned quantity 65536*TOP + CLOCK.
This value encodes the starting song position, in units of clocks (24 clocks per quarter note). Use this method if the MIDI source uses Clock commands as timing pulses.

o If C = 0 and T = 1, the 24-bit TICKS field codes the starting song position, in units of milliseconds. Use this method if the MIDI source uses Tick commands as timing pulses (10 ms per Tick). The song position MUST be encoded using sub-Tick (i.e., sub-10 ms) resolution.

o If C = 1 and T = 1, the starting song position is the sum of the positions encoded by the CLOCK, TOP, and TICKS fields, as described above. Use this method if the MIDI stream uses Tick commands as timing pulses and also uses the clock-based Song Position Pointer commands to reposition the sequence.

If the N and D header bits are both set to 1, the sequence is playing, and Chapter Q MUST encode the current song position in the sequence. The current song position is coded using the same fields and methods as the starting song position (see above). If the TICKS field is used to code the current song position, the field value counts time up to the moment encoded by the RTP timestamp of packet I.

Chapter Q MAY encode an estimate of the current tempo, by setting the Q header bit to 1 and placing the estimated tempo value in the 24-bit QNOTE field. The QNOTE field has units of microseconds per quarter note. This memo does not define a normative algorithm for tempo estimation for the QNOTE field. Note that Q may be set to 1 even if N is set to 0, providing a method for coding the current tempo while the sequence is stopped.

Appendix B.4. System Chapter E: MIDI Time Code Tape Position

This Appendix describes Chapter E, the system chapter for the MIDI Time Code (MTC) commands.
The system journal MUST contain Chapter E if an active MIDI System Common Quarter Frame command (0xF1) or an active finished System Exclusive (Universal Real Time) MTC Full Frame command (F0 7F cc 01 01 hr mn sc fr F7) appears in the checkpoint history. Unfinished MTC Full Frame commands are coded in Chapter X, as described in Appendix B.5. See Appendix A.1 for definitions of finished and unfinished MIDI commands.

Figure B.4.1 shows the variable-length format for Chapter E.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|Q|C|P|D|POINT|                   COMPLETE                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    PARTIAL                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure B.4.1 -- System Chapter E Format

This Appendix contains two sub-sections. B.4.1 is an informative description of the Chapter E design; B.4.2 is the normative definition of the Chapter E bitfield semantics.

B.4.1 Informative Description of Chapter E

The MIDI standard uses MTC to tag a particular moment in the MIDI stream with a SMPTE timestamp (a frame-based timestamp standard for video and film). In a typical application, a receiver uses these SMPTE timestamps to synchronize the playback of a video tape deck with the MIDI stream.

MTC provides two methods for sending a SMPTE timestamp. The simple method, the Full Frame command, encodes the entire timestamp in a 10-octet System Exclusive command. Alternatively, the timestamp value may be transmitted incrementally, via 8 two-octet Quarter Frame commands (the 0xF1 status octet plus one data octet) sent at regular intervals over two video frames.

Chapter E encodes SMPTE recovery information derived from MTC commands that appear in the session history. In normal operation, a receiver examines Chapter E after a packet loss episode, in order to re-synchronize its open-loop estimation of the current SMPTE time.
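As an informative illustration of the incremental method, the Python sketch below reassembles a timestamp from the data octets of 8 Quarter Frame commands. It assumes the standard MTC piece layout of the MIDI standard [1] (pieces 0-1 carry the frames count, 2-3 seconds, 4-5 minutes, 6-7 hours, with the frame type in piece 7); this memo does not restate that layout, and the function name is hypothetical.

```python
def assemble_quarter_frames(data_octets):
    """Combine the data octets of eight Quarter Frame (0xF1) commands.

    Each data octet has the form 0nnndddd, where nnn is the piece number
    (the "high nibble" value, 0-7) and dddd is a 4-bit timestamp fragment.
    Returns (frame_type, hours, minutes, seconds, frames).
    """
    pieces = {}
    for octet in data_octets:
        pieces[(octet >> 4) & 0x7] = octet & 0xF
    if len(pieces) != 8:
        raise ValueError("a complete timestamp needs pieces 0 through 7")
    frames = pieces[0] | ((pieces[1] & 0x1) << 4)
    seconds = pieces[2] | ((pieces[3] & 0x3) << 4)
    minutes = pieces[4] | ((pieces[5] & 0x3) << 4)
    hours = pieces[6] | ((pieces[7] & 0x1) << 4)
    frame_type = (pieces[7] >> 1) & 0x3   # 0: 24, 1: 25, 2: 30 drop, 3: 30 fps
    return frame_type, hours, minutes, seconds, frames
```

Because each fragment carries its own piece number, the sketch accepts the octets in either transmission direction (pieces 0 to 7 or 7 to 0).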
Chapter E may hold two SMPTE timestamps. The 24-bit COMPLETE field, present if the C bit is set, codes the most recent complete MTC timestamp that appears in the session history. This timestamp may be coded by one finished Full Frame command or by 8 Quarter Frame commands. If the COMPLETE field codes data from Quarter Frame commands, the COMPLETE field value is two frames ahead of the timestamp encoded in the Quarter Frame commands, to compensate for the transmission delay of the incremental Quarter Frame code.

Chapter E may also contain a 24-bit PARTIAL field, which codes the timestamp data fragments coded by an incomplete Quarter Frame sequence. The P bit signals the presence of the PARTIAL field. The D, Q, and POINT fields hold ancillary data that is essential for decoding the meaning of the PARTIAL field.

B.4.2 Normative Definition of Chapter E

Chapter E holds information about the most recent MIDI Time Code (MTC) tape position coded in the session history. Chapter E consists of a 1-octet header followed by two optional fields (COMPLETE and PARTIAL) in the order shown in Figure B.4.1. The 24-bit COMPLETE field is present if header bit C is set to 1; the 24-bit PARTIAL field is present if header bit P is set to 1.

MTC tape position updates in the session history may occur atomically, via a finished Full Frame command, or incrementally, via a series of Quarter Frame commands spaced over the time period of two video frames. The Q header bit codes whether a Quarter Frame command (Q set to 1) or a finished Full Frame command (Q set to 0) appears most recently in the session history.

At any moment in time, the session history may hold a sequence of zero or more complete MTC frame values. A partially complete MTC frame value (coded by an incomplete sequence of Quarter Frame commands) may also appear in the session history (after the most recent complete MTC frame value, if one exists).
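When the COMPLETE field is derived from Quarter Frame commands, its value is two frames ahead of the assembled timestamp. This memo does not spell out how that offset carries into the seconds, minutes, and hours values; the Python sketch below assumes ordinary SMPTE counting, including the drop-frame rule for frame type 2 (30 fps drop-frame). The function names are hypothetical.

```python
FRAMES_PER_SECOND = {0: 24, 1: 25, 2: 30, 3: 30}   # indexed by TYP
DROP_FRAME_TYP = 2                                   # 30 fps drop-frame


def next_frame(typ, hours, minutes, seconds, frames):
    """Advance a SMPTE time by one frame, propagating carries."""
    frames += 1
    if frames == FRAMES_PER_SECOND[typ]:
        frames, seconds = 0, seconds + 1
        if seconds == 60:
            seconds, minutes = 0, minutes + 1
            if minutes == 60:
                minutes, hours = 0, (hours + 1) % 24
            # Drop-frame time code skips frame numbers 0 and 1 at the
            # start of each minute, except minutes 0, 10, 20, ...
            if typ == DROP_FRAME_TYP and minutes % 10 != 0:
                frames = 2
    return typ, hours, minutes, seconds, frames


def complete_from_quarter_frames(typ, hours, minutes, seconds, frames):
    """Return the value two frames ahead, as used for the COMPLETE field."""
    return next_frame(*next_frame(typ, hours, minutes, seconds, frames))
```

Applying the single-frame step twice keeps the drop-frame skip correct even when the two-frame offset crosses a minute boundary.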
If the session history holds a complete MTC frame, and if the Quarter Frame command or finished Full Frame command that completes this frame encoding appears in the checkpoint history, Chapter E MUST include the 24-bit COMPLETE field to encode the frame value. The C header bit is set to 1 to signal the presence of the COMPLETE field.

If a partially complete MTC frame value appears in the session history (after the most recent complete MTC frame value, if one exists), if this partially complete frame value is not malformed (i.e., the high nibble sequence of Quarter Frame commands starts at 0 and increments contiguously to an intermediate value, or else starts at 7 and decrements contiguously to an intermediate value), and if at least one Quarter Frame command coding this partial value appears in the checkpoint history, Chapter E MUST include the 24-bit PARTIAL field to encode the frame value in progress. The P header bit is set to 1 to signal the presence of the PARTIAL field. Note that the PARTIAL field never codes a frame value coded in a Full Frame command; unfinished Full Frame commands are coded in Chapter X, as described in Appendix B.5.

The D header flag bit signals the direction the tape is moving. D is set to 0 for forward or no movement; D is set to 1 for reverse movement. If Q is set to 1, the relative motion of the upper nibble of the Quarter Frame data value determines D. If Q is set to 0, the relative tape motion from its last position determines D.

The D bit serves two roles in Chapter E. If a PARTIAL field is present in Chapter E, the D bit serves a syntactic role: its value is required to parse the contents of PARTIAL (as explained below). In addition, the tape direction information coded in the D bit serves an advisory role for receivers performing tape re-synchronization after a packet loss episode.

The 3-bit POINT field holds information about the incremental Quarter Frame encoding in the session history.
If Q is set to 1, POINT codes the upper nibble of the most recent Quarter Frame data value in the session history. If the PARTIAL field is present in Chapter E, the POINT field serves a syntactic role: its value is required to parse the contents of PARTIAL (as explained below). If Q is set to 0, POINT is reserved for future use; senders MUST set POINT to 0x0, and receivers must ignore its value.

Figure B.4.2 shows the common format for the COMPLETE and PARTIAL fields.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |TYP|  HOURS  |  MINUTES  |  SECONDS  | FRAMES  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure B.4.2 -- COMPLETE and PARTIAL format

The 5-bit HOURS, 6-bit MINUTES, 6-bit SECONDS, and 5-bit FRAMES fields encode the SMPTE values encoded in Full Frame and Quarter Frame commands. The bit allocations are sufficient to encode legal SMPTE values; note that for some fields, the associated MIDI commands use larger encodings. The 2-bit TYP field encodes the SMPTE frame type, using the same encoding as the Quarter Frame and Full Frame commands.

If used in the COMPLETE field, the TYP, HOURS, MINUTES, SECONDS, and FRAMES fields hold the most recent complete frame value, encoded by a finished Full Frame command or a series of 8 Quarter Frame commands in the session history. If the COMPLETE field codes data from Quarter Frame commands, the COMPLETE field value is two frames larger than the timestamp encoded in the Quarter Frame commands, to compensate for the transmission delay of the incremental Quarter Frame code.

If used in the PARTIAL field, the TYP, HOURS, MINUTES, SECONDS, and FRAMES fields do not all contain valid values. Recall that the PARTIAL field encodes a partially complete SMPTE value encoded by a series of Quarter Frame commands in the session history.
The bits in the PARTIAL field that correspond to data values in these Quarter Frame commands hold valid values; all other PARTIAL bits are set to 0. The valid PARTIAL bits directly reflect the data values encoded in the Quarter Frame commands in the session history; this PARTIAL field encoding MUST NOT include a compensatory offset for transmission delay.

The D and POINT header values signal the valid bits in the PARTIAL field. If D is set to 0, PARTIAL field bits corresponding to Quarter Frame commands with High Nibble values (0, 1, ..., POINT) are valid. If D is set to 1, PARTIAL field bits corresponding to Quarter Frame commands with High Nibble values (7, 6, ..., POINT) are valid.

Appendix B.5. System Chapter X: System Exclusive

This Appendix describes Chapter X, the system journal chapter for the MIDI System Exclusive command (opcode 0xF0, abbreviation SysEx).

The system journal MUST contain at least one Chapter X entry if an active SysEx command (excluding a finished MTC Full Frame command) appears in the checkpoint history. A SysEx command is said to "appear" in the checkpoint history if the history contains a verbatim encoding of the SysEx command, or if the history contains at least one segment of the segmental encoding of the SysEx command.

Note that finished MTC Full Frame commands are coded in Chapter E, as described in Appendix B.4. Unfinished MTC Full Frame commands, however, are coded in Chapter X. See Appendix A.1 for definitions of finished and unfinished commands.

The Chapter X encoding is optimized for the short SysEx commands that signal real-time events. Chapter X is not intended for use with the longer SysEx commands used in bulk data transport, because the recovery journal system is very inefficient if the journal size is large.
A MIDI session that combines real-time and bulk-data functions SHOULD be sent over two MWPP streams: a bulk-data stream sent over reliable transport, and a real-time unreliable stream for shorter commands. The midiport SDP parameter (Appendix C.4) supports split-stream operation.

Note that the structure of the system journal (Figure 9 in Section 5) permits multiple entries for Chapter X. Each Chapter X entry codes information about exactly one SysEx command. The relative ordering of Chapter X entries MUST reflect the relative position of commands in the checkpoint history: the first Chapter X entry codes the most recent SysEx command in the history, the second Chapter X entry codes a SysEx command that appears before the first coded SysEx command in the history, etc.

A Chapter X entry for a SysEx command encodes all information about the command that appears in the session history (as distinct from the checkpoint history; see Appendix A.1 for definitions). This distinction is relevant for the coding of SysEx commands whose segments appear across multiple packets. In this case, the Chapter X entry includes the starting segments for the SysEx command, even if these segments no longer appear in the checkpoint history.

Chapter X provides two tools for encoding multiple SysEx commands of the same type. Each command of a certain type may be encoded in a separate Chapter X entry (the list tool), or only the most recent command of a certain type may be encoded (the recency tool).

Each active SysEx command that appears in the checkpoint history (excluding finished MTC Full Frame commands) MUST be associated with a Chapter X entry via the list or recency tool. For each SysEx command type, an implementation may choose either coding tool. Simple implementations may use the list tool for all command types; sophisticated implementations may reduce bandwidth by using the recency tool for some command types.
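A Python sketch of how a sender might combine the list and recency tools when choosing which SysEx commands receive Chapter X entries. The type_of callback is a hypothetical stand-in for the T-flag type rules defined later in this Appendix (for example, taking the first four data octets matches the T = 0 rule for Universal commands); the sketch assumes the caller has already removed finished MTC Full Frame commands and passes only active commands from the checkpoint history.

```python
from typing import Callable, Hashable, List, Sequence, Set


def select_chapter_x_commands(history: Sequence[bytes],
                              type_of: Callable[[bytes], Hashable],
                              recency_types: Set[Hashable]) -> List[bytes]:
    """Choose which SysEx commands get a Chapter X entry.

    history lists SysEx commands (data octets only), oldest first.
    Types in recency_types use the recency tool; all other types use
    the list tool.  The result is newest first, matching the required
    order of Chapter X entries in the system journal.
    """
    coded_recency_types: Set[Hashable] = set()
    selected: List[bytes] = []
    for command in reversed(history):        # newest command first
        kind = type_of(command)
        if kind in recency_types:
            if kind in coded_recency_types:
                continue                     # a newer command of this type is already coded
            coded_recency_types.add(kind)
        selected.append(command)
    return selected
```

With an empty recency_types set, the sketch reduces to the simple all-list-tool implementation.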
Figure B.5.1 shows the variable-length format for System Chapter X.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|IDC|L|T| LEN |                   DATA ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure B.5.1 -- System Chapter X Format

Chapter X consists of a 1-octet header, followed by an arbitrary-length DATA field. The DATA field encodes a modified version of the data octets of the SysEx command, as described below. The leading 0xF0 and trailing 0xF7 SysEx octets never appear in the DATA field.

If the Manufacturer ID value of the SysEx command (coded in the first data octet of the command) has the value 0x00, 0x7E, or 0x7F, the DATA field begins with the second data octet of the SysEx command; for all other Manufacturer ID values, the DATA field begins with the first data octet of the SysEx command. The 2-bit IDC header field codes the 0x00, 0x7E, and 0x7F ID values, using the method shown in Figure B.5.2.
   -----------------------------------------------------------------
   | IDC | Manufacturer ID                  | First DATA octet is: |
   |-----|----------------------------------|----------------------|
   | 0x0 | 0x7E (Universal Non-Real-Time)   | 2nd SysEx data octet |
   |-----|----------------------------------|----------------------|
   | 0x1 | 0x7F (Universal Real-Time)       | 2nd SysEx data octet |
   |-----|----------------------------------|----------------------|
   | 0x2 | 0x00 (Extension Escape Code)     | 2nd SysEx data octet |
   |-----|----------------------------------|----------------------|
   | 0x3 | in the range 0x01--0x7D          | 1st SysEx data octet |
   -----------------------------------------------------------------

               Figure B.5.2 -- IDC Header Field Encoding

The 3-bit LEN header field codes the exact length of short, complete SysEx commands, and signals alternative coding techniques for longer commands and truncated commands.

The LEN values 0x0 through 0x5 indicate that the length of the DATA field is 1-6 octets. For these LEN values, the DATA field encodes a complete SysEx command, as a verbatim copy of the SysEx data octets (possibly skipping the first octet, as detailed in Figure B.5.2).

The LEN value 0x6 indicates that the DATA field contains 7 or more octets. The DATA field encodes a complete SysEx command, as a verbatim copy of the data octets of the SysEx command (possibly skipping the first octet, as detailed in Figure B.5.2), with one exception: bit 7 (the most-significant bit) of the final data octet is set to one. This set bit implicitly codes the length of the DATA field (MIDI data octets, by definition, clear bit 7).

The LEN value 0x7 indicates that the DATA field encodes a truncated SysEx command. This coding option is only to be used for SysEx commands encoded using the segmented method, for the case where not all segments appear in the session history.
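For complete commands (LEN values 0x0 through 0x6), the IDC and LEN rules can be sketched in Python as follows. The function name is hypothetical, the S bit is simply left at 0, and the L and T header flags (whose semantics are defined in the text that follows) are passed in as booleans; truncated commands (LEN 0x7) are out of scope for this sketch.

```python
# IDC values for the Manufacturer IDs that are not copied into DATA.
IDC_FOR_ID = {0x7E: 0x0, 0x7F: 0x1, 0x00: 0x2}


def encode_chapter_x(sysex_data: bytes, list_tool: bool, t_flag: bool) -> bytes:
    """Build a Chapter X entry (header + DATA) for a complete SysEx command.

    sysex_data holds the SysEx data octets only, without the leading
    0xF0 and trailing 0xF7.  The S bit is left at 0.
    """
    idc = IDC_FOR_ID.get(sysex_data[0], 0x3)
    # Skip the first data octet for IDC 0x0-0x2; keep it for IDC 0x3.
    data = bytearray(sysex_data[1:] if idc != 0x3 else sysex_data)
    if not data:
        raise ValueError("LEN cannot code an empty DATA field")
    if len(data) <= 6:
        length_code = len(data) - 1      # LEN 0x0-0x5: exactly 1-6 octets
    else:
        length_code = 0x6                # LEN 0x6: 7 or more octets
        data[-1] |= 0x80                 # set bit 7 to mark the last octet
    header = (idc << 5) | (int(list_tool) << 4) | (int(t_flag) << 3) | length_code
    return bytes([header]) + bytes(data)
```

The bit-7 marker is safe only because MIDI data octets always have bit 7 clear, so a decoder can scan for the first octet with bit 7 set.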
If LEN is 0x7, the DATA field encodes the data octets of the SysEx command segments that appear in the session history. The DATA field holds a verbatim copy of the data octets of the coded portion of the SysEx command, with two exceptions: the first octet may be skipped (as detailed in Figure B.5.2), and bit 7 (the most-significant bit) of the final coded data octet is set to one (to provide an implicit field length, as in the case where LEN is 0x6).

The L and T header flags describe the coding tool used for the Chapter X bitfield. If L is set to 1 (the list tool), all SysEx commands of this type have an associated Chapter X bitfield in the system journal. If L is set to 0 (the recency tool), only the most recent SysEx command of this type has an associated Chapter X bitfield in the system journal.

The T flag defines the meaning of the word "type" in the previous paragraph. The T flag has different semantics for MIDI Universal SysEx commands (Manufacturer ID 0x7E and 0x7F) and for generic SysEx commands (all other Manufacturer ID values).

We first define the T flag for Universal SysEx commands. The first four data octets of Universal commands have defined semantics in the MIDI standard; we symbolically represent these four octets as: ID cc SubID SubID1. If T is set to 0, all Universal commands with the same ID, cc, SubID, and SubID1 values are considered the same type. If T is set to 1, all Universal commands with the same ID, cc, and SubID values are considered the same type.

For generic SysEx commands (all Manufacturer ID values except 0x7E and 0x7F), we define the T flag as follows. The first data octet of a generic SysEx command is the Manufacturer ID; the remaining data octets may have an arbitrary organization, but often have a set of octets coding device and sub-command, followed by data octets for the command.
If T is set to 0, all generic SysEx commands with the same ID value are considered to be of the same type. If T is set to 1, the SysEx command is assumed to have a device/sub-command/data organization, and all generic SysEx commands with the same ID, device, and sub-command values are considered to be of the same type. If the SysEx command has a multi-level sub-command structure, these semantics require identical sub-command values at all levels.

Appendix C. Session Description Protocol (SDP) Definitions

In this Appendix, we define the Session Description Protocol (SDP) parameters for MWPP. These parameters may be used to customize (and perhaps negotiate [11]) the configuration of an MWPP session, by using SDP in conjunction with session setup tools like SIP [10] or RTSP [12].

Figure C.1 lists the parameters described in Appendices C.1 through C.5. With the exception of the standard SDP parameter maxptime (defined in [9]), these parameters are defined in this memo for use with MWPP. Appendix C.6 formally defines the syntax for these parameters, using ABNF [23].

MWPP uses parameters in three contexts, as formally defined in the IANA considerations (Appendix C.7). Session descriptions for native (Section 6.1) or mpeg4-generic (Section 6.2) MWPP streams may use parameters in fmtp lines. In addition, a few MWPP parameters may be used to customize the audio/sasc MIME encodings of Structured Audio initialization data (Appendix C.5). The right-most columns of Figure C.1 (mwpp, mpeg4-generic, and sasc) show which parameters may be used in each MIME context.
   ------------------------------------------------------------------
   | Parameter  | Type     | Appendix | mwpp | mpeg4-generic | sasc |
   |----------------------------------------------------------------|
   | j_sec      | custom   | C.1      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | j_update   | custom   | C.1      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | ch_default | custom   | C.1      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | ch_unused  | custom   | C.1      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | ch_never   | custom   | C.1      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | ch_anchor  | custom   | C.1      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | tsmode     | custom   | C.2      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | linerate   | custom   | C.2      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | octpos     | custom   | C.2      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | mperiod    | custom   | C.2      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | maxptime   | standard | C.3      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | midiport   | custom   | C.4      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | zerosync   | custom   | C.4      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | render     | custom   | C.5      |  x   |       x       |      |
   |----------------------------------------------------------------|
   | url        | custom   | C.5      |      |       x       |      |
   |----------------------------------------------------------------|
   | inline     | custom   | C.5      |      |       x       |      |
   |----------------------------------------------------------------|
   | compr      | custom   | C.5      |      |       x       |  x   |
   |----------------------------------------------------------------|
   | cid        | custom   | C.5      |      |       x       |  x   |
   ------------------------------------------------------------------

                 Figure C.1 -- Table of MWPP Parameters

Appendix C.1. SDP Definitions: The Journalling System

In this Appendix, we define the session description parameters that affect the journalling system.

C.1.1. The j_sec and j_update Parameters

By default, MWPP streams that use unreliable transport (such as UDP) MUST contain a journal section in each payload, and this journal section MUST use the recovery journal format. Also by default, MWPP streams that use reliable transport (such as TCP) MUST NOT include a journal section in the payload.

The SDP parameter j_sec may be used to override these defaults. This memo defines two symbolic values for j_sec: "none", to indicate that all stream payloads MUST NOT contain a journal section, and "recj", to indicate that all stream payloads MUST contain a journal section that uses the recovery journal format.

In practice, the j_sec parameter might be used to disable the recovery journal for a UDP MWPP stream, if the stream uses a MIDI rendering method that is inherently robust to lost MIDI commands. A drum machine that only responds to NoteOn commands is an example of a renderer that exhibits this robustness property. The stream description below configures a UDP stream that does not use the recovery journal:

   m=audio 5004 RTP/AVP 96
   c=IN IP4 169.229.60.64
   a=rtpmap:96 mwpp/44100
   a=fmtp:96 j_sec=none;

Other IETF standards-track documents may define alternative formats for the journal section. These documents MUST define new symbolic values for the j_sec parameter to signal the use of the alternative journal format. If a session description uses a j_sec value unknown to the recipient, the recipient MUST NOT accept the description.
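A Python sketch of how a receiver might resolve the journal format from the transport defaults and an optional j_sec value. The small fmtp parser is a simplification that assumes unquoted name=value pairs separated by semicolons (it does not implement the ABNF of Appendix C.6), and both function names are hypothetical.

```python
def parse_fmtp(line: str) -> dict:
    """Parse 'a=fmtp:<pt> name=value; name=value;' into a dict (simplified)."""
    body = line.split(":", 1)[1].strip()      # drop the "a=fmtp" prefix
    _, _, params = body.partition(" ")        # drop the payload type
    result = {}
    for item in params.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            result[name.strip()] = value.strip()
    return result


def journal_format(fmtp_params: dict, reliable_transport: bool) -> str:
    """Return "recj" or "none" for a stream, applying the j_sec defaults."""
    value = fmtp_params.get("j_sec")
    if value is None:
        # Default: recovery journal over unreliable transport, none over reliable.
        return "none" if reliable_transport else "recj"
    if value not in ("none", "recj"):
        # An unknown j_sec value means the description MUST NOT be accepted.
        raise ValueError(f"unknown j_sec value {value!r}")
    return value
```

Raising an error for unknown values, rather than falling back to a default, mirrors the requirement that the recipient not accept such a description.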
A second SDP parameter, j_update, declares the method the sender uses to update the checkpoint packet of the journal. This parameter is designed for use with the recovery journal system, but may be used by alternative journal formats if appropriate.

Recall that in the recovery journal system, the recovery journal codes the recent history of the stream. The checkpoint packet of the stream, encoded in the recovery journal header (Section 6), sets the range of coverage of the journal (the checkpoint history, as defined in Appendix A.1).

This memo defines three symbolic values for j_update: "closed-loop" (the default), "open-loop", and "anchor". The semantics of these values are defined below.

o closed-loop (default). If j_update is assigned the closed-loop value, the sender MUST NOT advance the checkpoint packet to extended sequence number N until all receivers have received a packet with extended sequence number M >= (N - 1). The sender MUST use receiver feedback (such as RTCP RR reports) to meet this constraint. If the constraint cannot be met due to an unresponsive receiver, the sender MUST drop the receiver from the session, or (if applicable) update the session description to use a different j_update value. In Section 4, we discuss closed-loop checkpoint management in the context of the dynamic checkpoint strategy.

o open-loop. If j_update is assigned the open-loop value, the sender may update the identity of the checkpoint packet at any time during the stream. The sender is not obliged to use RTCP or other receiver feedback to guarantee that overruns (as defined in Section 4) do not occur for receivers. In Section 4, we discuss open-loop checkpoint management in the context of hybrid checkpoint strategies.

o anchor. If j_update is assigned the anchor value, the sender uses the first stream packet as the checkpoint packet, for the duration of the session.
As a result, the checkpoint history for the stream is always identical to the session history. In Section 4, we discuss anchored checkpoint management in the context of the anchored checkpoint strategy.

In Appendix C.1.2, we show examples that use the j_update parameter.

C.1.2. Recovery Journal Chapter Inclusion Parameters

By default, a chapter appears in the recovery journal if the normative text for the chapter in Appendices A.1-8 or B.1-5 demands it. In most cases, the normative text states that if a MIDI command appears in the checkpoint history, certain chapter(s) MUST appear in the recovery journal to protect the command.

In this section, we describe SDP parameters that change these default semantics on a chapter-by-chapter basis. We refer to these parameters collectively as the chapter inclusion parameters. These parameters serve to customize journals for certain sending strategies.

Each chapter inclusion parameter represents a type of inclusion semantics. An assignment to a parameter declares which chapters (or chapter subsets) obey the inclusion semantics of the parameter. We describe the assignment syntax for these parameters later in this section.

Below, we define the SDP chapter inclusion parameters. For clarity, we define the action of parameters on complete chapters; if a parameter is assigned a subset of a chapter, the definition applies only to the chapter subset.

o ch_default. Chapters assigned to the ch_default parameter follow the default semantics for the chapter (as defined in Appendices A.1-8 or B.1-5).

o ch_unused. Chapters assigned to the ch_unused parameter never appear in the recovery journal. In addition, all MIDI command types encoded by these chapters never appear in the MIDI command sections of the packets in the stream.
Session participants use ch_unused to declare which MIDI commands require no resiliency support, because the commands themselves are not to be used in the stream.

o ch_never. Chapters assigned to the ch_never parameter never appear in the recovery journal. However, the MIDI command types encoded by these chapters may appear in the MIDI command sections of the packets in the stream. Session participants use ch_never to declare (1) which MIDI commands may be lost without producing an unrecoverable indefinite artifact (as defined in Section 4), and (2) which MIDI commands may undergo loss recovery without journal support while maintaining an acceptable rendering quality for the session.

o ch_anchor. Chapters assigned to the ch_anchor parameter obey a stricter variant of the default semantics for the chapter. For these chapters, all references to the checkpoint history in the chapter definition (in Appendices A.1-8 or B.1-5) are replaced by references to the session history (as defined in Appendix A.1). The checkpoint packet for a ch_anchor chapter is the first packet sent in the stream, not the packet indicated by the Checkpoint Sequence Number field in the header of the recovery journal that contains the chapter. The ch_anchor parameter lets participants in a session that uses a hybrid sending strategy (Section 4 and [22]) declare which MIDI commands are vulnerable to unrecoverable indefinite artifacts.

The ch_default, ch_unused, ch_never, and ch_anchor parameters may be set using the following syntax (defined formally in Appendix C.6):

      <parameter> = [channel list]<chapter list>[field list];

The chapter list is mandatory; the channel and field lists are optional. Multiple assignments to these parameters have a cumulative effect, and are applied in the order of parameter appearance.
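The assignment syntax, together with the channel-list and field-list rules described in this section (comma-separated numbers and dash-separated ranges), can be parsed with a short routine. The sketch below is hypothetical: the names parse_inclusion and chapter_semantics are illustrative, the 0-127 field range is the 7-bit MIDI controller and note number range, and the ordered, cumulative application of assignments follows the rule stated above. Its usage mirrors the hybrid checkpoint example in this section, written with the ch_ parameter names.

```python
import re

# Hypothetical parser for chapter inclusion values of the form
# [channel list]<chapter list>[field list], e.g. "4,11-13N" or "C7,64".
_ASSIGN = re.compile(r"^([0-9,\-]*)([A-Z]+)([0-9,\-]*)$")


def parse_number_list(text, limit):
    """Expand a list such as "0-5,8" into a set; empty text means "all" (None)."""
    if not text:
        return None
    numbers = set()
    for item in text.split(","):
        low, sep, high = item.partition("-")
        low_n = int(low)
        high_n = int(high) if sep else low_n
        if not 0 <= low_n <= high_n <= limit:
            raise ValueError(f"bad list item {item!r}")
        numbers.update(range(low_n, high_n + 1))
    return numbers


def parse_inclusion(value):
    """Split a value into (channels, chapters, fields); None means "all"."""
    match = _ASSIGN.match(value)
    if match is None:
        raise ValueError(f"malformed assignment {value!r}")
    channels, chapters, fields = match.groups()
    return (parse_number_list(channels, 15),   # MIDI channels 0-15
            set(chapters),
            parse_number_list(fields, 127))    # controller or note numbers 0-127


def chapter_semantics(assignments, chapter, channel=None, field=None):
    """Name of the parameter governing a chapter (or chapter subset).

    assignments: (parameter name, value) pairs in order of appearance, so a
    later assignment overrides an earlier one. Pass channel=None for system
    chapters (channel lists are ignored) and field=None when not applicable.
    """
    result = "ch_default"
    for name, value in assignments:
        channels, chapters, fields = parse_inclusion(value)
        if chapter not in chapters:
            continue
        if channel is not None and channels is not None and channel not in channels:
            continue
        if field is not None and fields is not None and field not in fields:
            continue
        result = name
    return result


hybrid = [("ch_unused", "WATCMDVQEX"), ("ch_anchor", "P"),
          ("ch_anchor", "C7,64"), ("ch_never", "4,11-13N")]
print(chapter_semantics(hybrid, "C", channel=0, field=7))   # ch_anchor
```

Because the assignments are scanned in order, the later ch_anchor assignment for Chapter C wins over the earlier ch_unused assignment only for controllers 7 and 64.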
The chapter list specifies the channel and system chapters to which the parameter applies, using a concatenated list of one or more uppercase letters corresponding to the chapter types. The optional channel list specifies the channel journals to which the parameter applies; if no channel list is provided, the parameter applies to all channel journals. The channel list takes the form of a comma-separated list of channel numbers (0 through 15) and dash-separated channel number ranges (e.g., 0-5, 8-12). The channel list is irrelevant for system chapters.

The optional field list is only relevant for Chapters C, N, and A. For Chapter C (coding the MIDI Control Change command), the field list codes the controller numbers to which the parameter applies. For Chapters N and A, the field list codes the note numbers to which the parameter applies. The syntax for field lists follows the syntax for channel lists. If no field list is provided, the parameter applies to all controller or note numbers.

The stream configuration example below sets up an anchored checkpoint strategy (Section 4).

      m=audio 5004 RTP/AVP 96
      c=IN IP4 169.229.60.64
      a=rtpmap: 96 mwpp/44100
      a=fmtp: 96 j_update=anchor; ch_unused=DVQEX;

The ch_unused parameter in this example declares that the MIDI command sections in this stream do not code system commands. Note that the j_update parameter (Appendix C.1.1) configures the anchored checkpoint semantics for all journal chapters, and so per-chapter assignments to ch_anchor are not required.

The stream configuration example below sets up a hybrid checkpoint strategy (Section 4).

      m=audio 5004 RTP/AVP 96
      c=IN IP4 169.229.60.64
      a=rtpmap: 96 mwpp/44100
      a=fmtp: 96 j_update=open-loop; ch_unused=WATCMDVQEX;
      a=fmtp: 96 ch_anchor=P; ch_anchor=C7,64;
      a=fmtp: 96 ch_never=4,11-13N;

The j_update parameter is set to open-loop, to declare that the checkpoint history may experience overruns.
Most chapters are assigned to ch_unused, reflecting a typical MIDI usage pattern for a low-bandwidth stream. To guard against unrecoverable indefinite artifacts, the MIDI Program Change command (Chapter P) and two MIDI Control Change controller numbers (7 and 64) are assigned to ch_anchor. Note that because the ch_anchor assignment for Chapter C appears after the ch_unused assignment, it overrides the ch_unused assignment for the listed controller numbers.

Chapter N for several MIDI channels is assigned to ch_never; in practice, this assignment pattern would reflect knowledge about a resilient rendering method in use for certain channels. In this example, Chapter N for MIDI channels other than 4, 11, 12, and 13 may appear in the recovery journal, per the default behavior.

Appendix C.2. SDP Definitions: Command Execution Semantics

As defined in Section 3, the MIDI command section of the MWPP payload consists of a list of MIDI commands, each with an associated command timestamp. By default, a command timestamp indicates the execution time for the command. If two commands have identical timestamps, the commands execute simultaneously.

This default timestamp behavior is not a good fit for the MIDI wire protocol [1]. The MIDI wire protocol, a networking standard for the remote control of musical instruments over serial lines, does not send timestamps over the wire. Instead, MIDI commands are placed on the wire at the moment of occurrence, and receivers infer the timestamp from the moment of reception. In this memo, we refer to this coding technique as an "implicit" or a "time-of-arrival" code.

As these names suggest, it is not possible to code two simultaneous MIDI commands over the MIDI wire protocol, because two commands cannot be sent simultaneously over a serial line. If two musical events occur at the same moment in time, a wire protocol sender arbitrarily sends one MIDI command first, followed by the second MIDI command.
The wire protocol receiver sees a sequence of MIDI commands offset in time, but cannot tell if the MIDI command offsets are serialization artifacts or genuine event timing offsets played by the musician.

This Appendix defines alternative semantics of MIDI command timestamps, for use in transcoding time-of-arrival MIDI data streams into MWPP packets. The optional SDP parameter tsmode codes the choice of timestamp semantics. The tsmode parameter takes on one of three symbolic values: comex, async, or buffer.

The comex value indicates the default "command execution timestamp" semantics defined in Section 3. The async and buffer values code two different methods for coding MIDI wire protocol data, which we describe in sub-sections C.2.1 and C.2.2 below.

The async and buffer methods are based on a simple idea: each method describes a sampling algorithm to sense data octets on a MIDI wire. The async and buffer methods use several SDP parameters to describe the physical properties of the sampling algorithm, in order to describe a wide range of plausible hardware and operating system environments.

One such SDP parameter is linerate. The linerate parameter codes the timespan of one octet on the serial line. The linerate parameter has units of nanoseconds, and takes on integral values. For the MIDI wire protocol as defined in [1], linerate is 320000 nanoseconds (320 microseconds). Implicit MIDI data sent over other physical layers (such as IEEE-1394) might require a different linerate value. If linerate is not specified, it is considered to be undefined.

We now describe the async and buffer methods in detail.

C.2.1 Description of the async method

The async method assumes an asynchronous sampling of the MIDI serial line. At the moment a complete octet is received, it is labelled with an accurate wall-clock time value, whose units match the units of the RTP header timestamp field.
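The 320000 nanosecond linerate value follows from the MIDI 1.0 serial framing: 31250 bits per second, with each octet framed by a start bit and a stop bit, for 10 bits per octet. The arithmetic below is a sketch of that derivation; the helper names are hypothetical, and the back-to-back assumption in command_span_ns (no idle time between octets) is an assumption about the sender, not a requirement of [1].

```python
# Worked arithmetic for linerate, assuming MIDI 1.0 serial framing:
# 31250 bits per second, 10 bits per octet (start bit, 8 data bits, stop bit).
MIDI_BAUD = 31_250
BITS_PER_OCTET = 10
NS_PER_SECOND = 1_000_000_000


def linerate_ns(baud=MIDI_BAUD, bits_per_octet=BITS_PER_OCTET):
    """Time span of one octet on the serial line, in integral nanoseconds."""
    return bits_per_octet * NS_PER_SECOND // baud


def octet_span_in_rtp_units(linerate, srate):
    """Duration of one octet in RTP timestamp units (usually not an integer)."""
    return linerate * srate / NS_PER_SECOND


def command_span_ns(num_octets, linerate):
    """Time between completion of the first and last octets of a command,
    assuming the octets are sent back-to-back with no idle time."""
    return (num_octets - 1) * linerate
```

At an srate of 44100, one octet spans about 14.1 RTP timestamp units, and the first and last octets of a three-octet NoteOn command complete 640000 nanoseconds apart, which is the gap that the octpos parameter resolves.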
The SDP parameter octpos defines how MWPP command timestamps are derived from these octet timestamps. If octpos has the symbolic value first, a MIDI command timestamp codes the time value for the first octet of the MIDI command. If octpos has the symbolic value last, a MIDI command timestamp codes the time value for the last octet of the MIDI command. If an octpos parameter does not appear in the session description, the MIDI command timestamp value may reflect any octet of the MIDI command.

Note that the octpos value refers to the first or last octet of the MIDI command as it appears on the MIDI wire, not the MIDI command as it appears in the MWPP packet. This distinction is important for cases where the MWPP command representation includes extra octets that do not appear on the MIDI wire. For example, if a MIDI command appears on the wire using running status coding, and this command becomes the first command in the MIDI command section of an MWPP packet, the MWPP representation begins with a status octet that did not appear in the original MIDI source on the wire. In the case of segmented SysEx commands (see Section 3), the octpos rules apply to the octets of the SysEx command segment as they appear on the MIDI wire.

We now show a session description example for the async method. Consider an MWPP sender that is transcoding a MIDI wire protocol command stream into an MWPP UDP RTP stream. The sender runs on a computing platform that timestamps every incoming octet on the MIDI cable serial line, and the sender chooses to use the timestamp of the first octet of each command as the MIDI command timestamp. This stream description accurately describes the transcoding:

      m=audio 5004 RTP/AVP 96
      c=IN IP4 169.229.60.64
      a=rtpmap: 96 mwpp/44100
      a=fmtp: 96 tsmode=async;linerate=320000;octpos=first;

C.2.2 Description of the buffer method

The buffer method uses a synchronous sampling of the MIDI wire data.
In this model, each arriving octet on the MIDI wire is placed in a buffer, without adding a timestamp. At periodic intervals, the MWPP sender examines the buffer. The sender removes complete MIDI commands from the buffer, and places those commands into the MIDI command section of an MWPP packet. The command timestamp reflects the actual moment of buffer examination, expressed in the units of the RTP timestamp field. Note that in this coding scheme, several commands may have the same command timestamp.

The SDP parameter mperiod defines the nominal periodic sampling interval for the buffer tsmode. The mperiod parameter takes on positive integral values, and has units of the RTP timestamp field.

The SDP parameter octpos (described in C.2.1 for the async method) is also defined for the buffer method, but takes on different semantics. These semantics address the choice of the command timestamp for MIDI commands whose octets appear on the MIDI wire across several sampling periods.

If octpos takes on the symbolic value first, the command timestamp reflects the sampling period in which the first octet of the command arrived on the wire. If octpos takes on the symbolic value last, the command timestamp reflects the sampling period in which the last octet of the command arrived on the wire. If an octpos parameter does not appear in the session description, MIDI commands whose octets appear across several sampling periods may take on the timestamp value associated with any arrival period of an octet in the command. In the case of segmented SysEx commands (see Section 3), the octpos rules apply to the octets of the SysEx command segment as they appear on the MIDI wire.

We now show a session description example for the buffer method. Consider an MWPP sender that is transcoding a MIDI wire protocol command stream into an MWPP UDP RTP stream.
The sender runs on a computing platform that places MIDI serial line data into a buffer upon receipt, without timestamps. The sender polls the buffer 1000 times a second, extracts all complete commands from the buffer, and places them in the MIDI command section of an MWPP packet. All of the MIDI command timestamps in this packet are identical, and reflect the actual clock value at the sampling instant, in RTP timestamp units. This stream description accurately describes the transcoding:

      m=audio 5004 RTP/AVP 96
      c=IN IP4 169.229.60.64
      a=rtpmap: 96 mwpp/44100
      a=fmtp: 96 tsmode=buffer;linerate=320000;octpos=last;mperiod=44;

Note that mperiod takes on an integral value, and has the units of the RTP timestamp field. In this example, the mperiod value of 44 is derived by dividing the rtpmap srate (44100 Hz) by the 1000 Hz buffer sampling rate (44100 / 1000 = 44.1), and rounding to the nearest integer. The MIDI command timestamps might not advance by exact multiples of 44, as the actual buffer sampling period might not precisely match the nominal sampling period.

Appendix C.3. SDP Definitions: Media Time

In Section 2.1, we define the media time of an MWPP RTP packet as the RTP timestamp difference (modulo 2^32) between the packet's successor and the packet itself. By default, the media time for a packet may be arbitrarily long. For example, consider an MWPP stream that codes the real-time behavior of a musician playing a piano keyboard. If the musician does not play a note for several seconds, there is no reason to send a new packet, and so the media time of the last packet sent may grow without bound.

However, for some applications, it is desirable to set a maximum media time for an MWPP packet that is independent of the source rate of MIDI event data. This constraint acts to set a minimum packet sending rate, which may simplify algorithms performing clock-skew compensation, network latency estimation, and packet loss recovery.
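The media time definition relies on modulo 2^32 arithmetic, so that the difference stays correct when the 32-bit RTP timestamp wraps around. Below is a minimal sketch of that computation and of a check against a maximum media time in milliseconds; the function names and the max_ms argument are hypothetical illustrations, and the conversion to milliseconds assumes the srate value from the rtpmap line.

```python
RTP_TS_MODULUS = 2 ** 32


def media_time(ts, successor_ts):
    """RTP timestamp difference (mod 2^32) between a packet's successor and the packet."""
    return (successor_ts - ts) % RTP_TS_MODULUS


def media_time_ms(ts, successor_ts, srate):
    """Media time in milliseconds, using the rtpmap srate (e.g. 44100)."""
    return 1000.0 * media_time(ts, successor_ts) / srate


def exceeds_limit(ts, successor_ts, srate, max_ms):
    """True if the packet's media time is longer than max_ms milliseconds."""
    return media_time_ms(ts, successor_ts, srate) > max_ms


# A packet stamped just before the 32-bit wrap, followed by one stamped after it:
print(media_time(4294967000, 21754))            # 22050 timestamp units
print(media_time_ms(4294967000, 21754, 44100))  # 500.0
```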
Applications may use the SDP maxptime parameter (defined in [9]) for this purpose. The maxptime parameter specifies the maximum amount of media time an MWPP packet encodes, in units of milliseconds. For example, this stream description sets a maximum media time of 0.5 seconds, and thus a minimum packet rate of 2 Hz:

      m=audio 5004 RTP/AVP 96
      c=IN IP4 169.229.60.64
      a=rtpmap: 96 mwpp/44100
      a=fmtp: 96 maxptime=500;

Appendix C.4. SDP Definitions: Multiple Streams

Several MWPP streams may appear in a session description. By default, each MWPP stream is an independent entity. The MIDI name space (16 voice channels + systems) for each MWPP stream is unique, and the rendering for each MWPP stream proceeds independently. The audio outputs of the streams are presented simultaneously, using the standard synchronization and audio mixing conventions for RTP.

In this Appendix, we define two SDP parameters for use in sessions with several MWPP streams. These parameters (midiport and zerosync) add three features to MWPP:

   1. Several MWPP streams may target the same MIDI name space.

   2. Several MWPP streams may be bundled to form a larger MIDI name space, which a single rendering system may treat as an ordered entity.

   3. Receivers may be informed of the synchronized behavior of the RTP timestamp fields of several MWPP streams, to simplify the time-locked rendering of multi-stream MWPP systems.

In Sections C.4.1 and C.4.2, we normatively define the midiport and zerosync parameters. In Section C.4.3, we show a series of examples that illustrate the feature set described above.

C.4.1 The midiport parameter

The midiport SDP parameter codes an arbitrary identification number for the MIDI name space (16 voice channels + systems) of an MWPP stream. The midiport parameter may take on integer values between 0 and 4294967295 (2^32 - 1). If several MWPP streams in a session share the same midiport value, the streams target the same MIDI name space.
We refer to this relationship as the identity relationship.

If several MWPP streams in a session have contiguous midiport values (i.e., i, i+1, ..., i+k), the name spaces of the MWPP streams form an ordered entity. In this case, the streams in the entity are said to share an ordered relationship. Note that streams may participate in both an identity and an ordered relationship, if the streams in an identity relationship share a midiport value that forms part of an ordered relationship.

If the midiport values of two MWPP streams are not part of an ordered or identity relationship, the two streams are independent, and have independent MIDI name spaces.

MWPP streams in an ordered or identity relationship MUST be all native MWPP streams or all mpeg4-generic MWPP streams. Thus, we refer to relationships as being native relationships or mpeg4-generic relationships.

All streams in an ordered or identity mpeg4-generic relationship generate audio using the same instance of the synthesis engine, and thus the following restrictions apply:

   1. All streams in an identity or ordered relationship must have the same profile-level-id (74 for Main Synthetic, 75 for Wavetable Synthesis, 76 for General MIDI).

   2. Ordered relationships MUST NOT be used with Wavetable Synthesis or General MIDI object types, because these systems are only defined for 16 MIDI voice channels. Ordered relationships MAY be used with the Main Synthetic object type, and follow the MIDI semantics defined in Section 5.14.3.2.2 of [5].

   3. At most one of the streams in an identity or ordered relationship may have a config parameter value other than the empty string. In this case, the non-empty config value configures the stream. Alternatively, the config parameter for all streams may be set to the empty string. In this case, exactly one stream in the relationship MUST define the configuration using the tools described in Appendix C.5.
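The identity and ordered relationships are defined purely by the pattern of midiport values, so a session participant can derive them mechanically. The sketch below is hypothetical: midiport_relationships and extended_channel are illustrative names, and extended_channel assumes that each successive midiport value in an ordered run contributes the next block of 16 channels. That mapping matches the Structured Audio example in Section C.4.3, but the normative semantics for Main Synthetic are those of [5].

```python
from collections import defaultdict


def midiport_relationships(ports):
    """Classify streams by midiport value.

    ports: dict mapping a stream label to its midiport value.
    Returns (identity, ordered): identity maps a midiport value to the
    streams that share it (2+ streams only); ordered lists runs of
    contiguous midiport values (runs of 2+ values only).
    """
    by_port = defaultdict(list)
    for stream, port in ports.items():
        by_port[port].append(stream)
    identity = {port: streams for port, streams in by_port.items() if len(streams) > 1}

    ordered, run = [], []
    for port in sorted(by_port):
        if run and port == run[-1] + 1:
            run.append(port)          # extends the current contiguous run
        else:
            if len(run) > 1:
                ordered.append(run)   # close a run of two or more values
            run = [port]
    if len(run) > 1:
        ordered.append(run)
    return identity, ordered


def extended_channel(port, channel, run):
    """Hypothetical mapping of (midiport, channel 0-15) into an ordered run."""
    return 16 * (port - run[0]) + channel
```

For example, streams with midiport values 5 and 6 form one ordered run, [5, 6], and channel 0 of the midiport=6 stream maps to extended channel 16.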
For MWPP streams in an ordered or identity native relationship, at most one stream may specify a MIDI renderer (using the tools described in Appendix C.5). Each MIDI rendering type may define its own semantics with regard to identity and ordered relationships.

In an identity relationship, the sender partitions a MIDI name space (16 voice channels + systems) into several MWPP streams. Receivers may process these streams independently, or may merge the streams to reconstitute the original MIDI command stream. We now specify receiver and sender responsibilities to ensure the robust transmission of identity relationships.

Receivers that merge identity relationship streams into a single MIDI command stream MUST maintain the structural integrity of the MIDI commands coded in each MWPP stream during the merging process, in the same way that software that merges traditional MIDI wire protocol flows is responsible for creating a merged command flow compatible with [1].

Senders MUST partition the name space so that the rendered MIDI performance does not contain indefinite artifacts (as defined in Section 4). This responsibility holds even if all streams are sent over reliable transport, as imperfect synchronization of reliable streams may yield indefinite artifacts. For example, stuck notes may occur if a single-channel MIDI performance is split over two TCP streams, with NoteOn commands sent on the first stream and NoteOff commands sent on the second stream.

A simple way to safely partition voice channel commands is to place all MIDI commands for a particular voice channel into the same MWPP stream. Safe partitions of system commands may be more complex for streams that extensively use System Exclusive commands. In [22], we discuss identity partitioning issues in detail.

C.4.2 The zerosync parameter

The RTP timestamp value of the first packet in a stream is not set to zero.
Instead, the RTP standard [2] mandates that the RTP timestamp is initialized to a randomly chosen value, to make known-plaintext attacks on encrypted streams more difficult. As a consequence, a receiver cannot directly use RTP timestamps to play back two RTP streams in sync, even if the sender is generating synchronized timestamps for the streams.

Note that the RTP Control Protocol (RTCP), a low-bandwidth feedback channel that is paired with each RTP stream, includes a synchronization feature. Certain types of RTCP packets (Sender Reports) code the current time in two forms: the format of the RTP timestamp, and the 64-bit Network Time Protocol (NTP) format. A receiver may examine the NTP timestamps of several RTCP streams, and use this information to compute the ongoing temporal relationship between the RTP streams associated with the RTCP streams.

For many MWPP applications, this RTCP-based method is a good way to synchronize streams. In some applications, however, this method is not optimal, because receivers cannot synchronize the streams until the first RTCP Sender Reports arrive, which delays synchronization at the start of the session.

The SDP parameter zerosync provides an alternative mechanism for MWPP stream synchronization. The zerosync parameter codes the RTP timestamp offsets for each stream, so that streams that are generated in a synchronized fashion may be played back in sync without using RTCP feedback. The use of the zerosync parameter weakens the security of RTP, as discussed in Section 7 of this memo.

The zerosync parameter supports two different ways to normalize RTP timestamp fields. One mechanism is in effect if the zerosync parameter takes on integer values in the range 1 to 4294967295. A second mechanism is in effect if the zerosync parameter takes on the special value 0.

We first describe the synchronization behavior for non-zero values of zerosync. This synchronization mechanism is designed for use with a set of MWPP streams that form an ordered or identity relationship.
For a relationship to use this mechanism, all streams in the relationship MUST include a zerosync parameter set to a non-zero value, and the srate rtpmap parameter (see Section 6.1) of all streams in the relationship MUST have the same value. Given these conditions, the normalized RTP timestamp for a packet in a stream is computed by subtracting (modulo 2^32) the stream zerosync parameter value from the original RTP timestamp of the packet.

Next, we describe the synchronization behavior for zero-valued zerosync parameters. All streams in a session with zerosync = 0 are generated from a single RTP timebase. In other words, these streams simply ignore the RTP requirement for random timestamp offsets. All streams whose zerosync values are set to 0 MUST have the same srate rtpmap parameter value.

Note that a stream description may contain, at most, one zerosync parameter assignment. A stream may participate in a non-zero-valued zerosync behavior or a zero-valued zerosync behavior, but not both.

C.4.3 Multi-stream examples using midiport and zerosync

This section shows several session description examples that use the midiport and zerosync parameters.

Our first session description example shows two mpeg4-generic MWPP streams that drive the same General MIDI decoder.

      v=0
      o=lazzaro 2520644554 2838152170 IN IP4 cs.Berkeley.edu
      s=Example
      t=3238012065 0
      m=audio 5004 RTP/AVP 61
      c=IN IP4 169.229.60.64
      a=rtpmap: 61 mpeg4-generic/44100
      a=fmtp: 61 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;
      a=fmtp: 61 midiport=12;zerosync=1726;
      m=audio 5006 RTP/AVP 62
      c=IN IP4 169.229.60.64
      a=rtpmap: 62 mpeg4-generic/44100
      a=fmtp: 62 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
      a=fmtp: 62 midiport=12;zerosync=726;

The two UDP streams in the session use different UDP ports (5004 and 5006), and their RTP packets carry different PT header field values (61 and 62). The profile-level-id codes General MIDI.
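The non-zero zerosync normalization is a single modular subtraction per packet. The sketch below, with a hypothetical function name, applies it to the first example above (zerosync values 1726 and 726, both streams at an srate of 44100). The packet timestamps used in the illustration are invented for the example, not taken from any real session.

```python
RTP_TS_MODULUS = 2 ** 32


def normalized_timestamp(rtp_ts, zerosync):
    """Subtract the stream's zerosync value from an RTP timestamp, modulo 2^32.

    With zerosync=0 the result equals the original timestamp, which matches
    the zero-valued mechanism: those streams already share one RTP timebase.
    """
    return (rtp_ts - zerosync) % RTP_TS_MODULUS


# Hypothetical packets, one from each stream, generated at the same instant:
print(normalized_timestamp(45826, 1726))  # 44100 (stream with zerosync=1726)
print(normalized_timestamp(44826, 726))   # 44100 (stream with zerosync=726)
```

Equal normalized values mean the two packets belong to the same point in the shared timeline, here one second into the session.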
Note that only one config parameter is set to a non-empty string. The midiport values indicate that the streams share an identity relationship; the presence of zerosync parameters with non-zero values establishes the synchronization mechanism.

A variant on this example, whose session description is not shown, is to have two streams in an identity relationship driving the same MIDI renderer, each with a different transport type. One stream would use UDP, and would be dedicated to real-time messages. A second stream would use TCP, and would be dedicated to sending reliable bulk SysEx dumps.

In the next example, two mpeg4-generic MWPP streams form an ordered relationship to drive a Structured Audio decoder with 32 MIDI voice channels.

      v=0
      o=lazzaro 2520644554 2838152170 IN IP4 cs.Berkeley.edu
      s=Example
      t=3238012065 0
      m=audio 5004 RTP/AVP 61
      c=IN IP4 169.229.60.64
      a=rtpmap: 61 mpeg4-generic/44100
      a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=74;
      a=fmtp: 61 midiport=5;zerosync=0;
      m=audio 5006 RTP/AVP 62
      c=IN IP4 169.229.60.64
      a=rtpmap: 62 mpeg4-generic/44100
      a=fmtp: 62 streamtype=5; mode=mwpp; config=""; profile-level-id=74;
      a=fmtp: 62 midiport=6;zerosync=0;
      a=fmtp: 62 render=sasc; url="http://www.stanford.edu/cardinal.sasc";
      a=fmtp: 62 cid="azsldkaslkdjqpwojdkmsldkfpe";

The sequential midiport pattern for the two streams establishes the ordered relationship; the profile-level-id values of 74 indicate Main Synthetic (i.e., Structured Audio). The midiport=5 stream maps to Structured Audio extended channels 0-15, and the midiport=6 stream maps to Structured Audio extended channels 16-31. Note the use of the zero-valued zerosync option.

Finally, note that both config strings are empty. The configuration information for the Structured Audio decoder is specified in the final two fmtp lines of the second media stream description.
In Appendix C.5, we describe the coding used in these lines in detail.

Appendix C.5. SDP Definitions: MIDI Rendering

A MIDI command stream codes a series of high-level events, such as the onset and termination of musical notes. A receiver turns this event stream into audio (or, in some applications, into control actions such as the dimming of stage lights) by applying a MIDI rendering algorithm.

By default, native MWPP streams do not specify a rendering algorithm. This default behavior assumes that the rendering algorithm is sent in-band, via MIDI System Exclusive commands. The minimal native MWPP stream description in Section 6.1 exhibits this default behavior.

In contrast, the default rendering algorithm for mpeg4-generic MWPP streams is the MPEG 4 synthesis algorithm coded in the SDP config parameter. The minimal mpeg4-generic MWPP stream description in Section 6.2 exhibits this default behavior.

In this Appendix, we define the SDP parameter "render" to override these default rendering methods. Uses of the render parameter must obey the restrictions defined in Appendix C.4.1.

This document defines two symbolic values for render: "default" and "sasc". Other standards-track IETF documents may define additional values for render. Receivers MUST NOT participate in sessions if the session description sets the SDP render parameter to a value that is not known by the receiver.

We anticipate that the standards-track IETF documents that extend the render parameter will define registration hierarchies for rendering algorithms, whose management will be independent of the IETF AVT Working Group. Candidate hierarchies include the Manufacturer ID registration tree used in MIDI System Exclusive commands [1], and hierarchies based on the DNS registration tree.

If the SDP parameter render takes on the value "default", the stream uses the default rendering method, as defined in Section 6.1 (for native MWPP streams) or Section 6.2 (for mpeg4-generic MWPP streams). We describe the use of the sasc value for the render parameter in the following sub-section.
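A receiver's handling of the render parameter combines three rules: the per-stream-type defaults, the requirement to refuse unknown values, and the restriction (stated in C.5.1) that only mpeg4-generic streams may use sasc. The sketch below is a hypothetical illustration of that logic; the function name and the descriptive return strings are placeholders, not values defined by this memo.

```python
KNOWN_RENDER = {"default", "sasc"}  # the two values defined by this memo


def rendering_method(stream_kind, render=None):
    """Resolve the effective rendering method for an MWPP stream.

    stream_kind: "native" or "mpeg4-generic".
    render: value of the render fmtp parameter, or None if it is absent.
    Raises ValueError when the receiver must not participate in the session.
    """
    if render is not None and render not in KNOWN_RENDER:
        raise ValueError(f"unknown render value: {render!r}")
    if render is None or render == "default":
        # Native default: renderer sent in-band via System Exclusive commands.
        # mpeg4-generic default: MPEG 4 synthesis coded in the config parameter.
        return "in-band SysEx" if stream_kind == "native" else "config parameter"
    if stream_kind != "mpeg4-generic":
        raise ValueError("only mpeg4-generic streams may use render=sasc")
    return "sasc"
```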
We describe the use of the sasc value for the render parameter in the following sub-section. Lazzaro/Wawrzynek [Page 75] INTERNET-DRAFT 22 September 2002 C.5.1 The sasc Method The sasc method supports the flexible transport of the MPEG 4 Audio AudioSpecificConfig() binary data block. This structure may contain the configuration data for the General MIDI [1], DLS2 [18], or Structured Audio [5] synthesis methods, as specified in [5]. Only an mpeg4-generic MWPP stream description may use the sasc method. To signal the use of sasc, the config parameter for the mpeg4-generic stream MUST be set to the empty string, AND the SDP render parameter MUST be set to the symbolic value sasc. Two AudioSpecificConfig() transport parameters are defined by sasc method: o The SDP parameter url may be assigned a string that contains a Uniform Resource Locator (URL) to the AudioSpecificConfig() data. o The SDP parameter inline may be assigned a string that contains a Base64 encoding of a representation of AudioSpecificConfig(). Exactly one url parameter assignment or exactly one inline parameter assignment MUST appear in a stream description that uses the sasc method. The url and inline parameters MUST NOT both appear in the same stream description. The sasc method is based on MIME [17]. We consider sasc to be a MIME subtype for the audio media type. The SDP parameters we define in the remainder of this sub-section may also act as MIME parameters for the audio/sasc MIME type. If the url parameter is used in a stream description, the coded URL SHOULD that returns a MIME document of type audio/sasc. We define the following SDP/MIME parameters for use with the sasc method: o compr. The compr parameter indicates which lossless compression algorithm is in use to reduce the size of AudioSpecificConfig(). Compression occurs before any content transfer encoding (such as the Base64 encoding for the inline parameter). 
This memo defines two legal values for compr: none (for no compression) and gzip (for the gzip compression algorithm as defined in [19]). The default value for compr is gzip. The compr parameter is an extensible parameter; other IETF documents may define new compression methods. Receivers MUST NOT participate in a session if the session description sets the compr parameter to a value that is not known by the receiver.

o cid. The cid parameter is assigned a string value that encodes a globally unique identifier for the content encoded in the AudioSpecificConfig(). The cid value supports cache management: if a receiver notices it has previously used an AudioSpecificConfig(), it can avoid redundant transmission or decoding. If an AudioSpecificConfig() is coded in a MIME document, the Content-ID header [17] value MUST match the cid value in the stream description. Using the cid parameter in a MIME document is legal but redundant, because Content-ID also codes the string.

If these parameters are in use for a stream, SDP fmtp lines that assign values to these parameters MUST appear in the session description. In addition, if the stream description uses the url parameter to reference a MIME document, the MIME version of these parameters SHOULD appear in the MIME document, unless the parameter definition indicates otherwise.

We now show stream description examples for the sasc method. The stream description below uses the inline SDP parameter to code the AudioSpecificConfig() block for an mpeg4-generic General MIDI stream. This stream has the same characteristics as the example shown in Section 6.2.

      m=audio 5004 RTP/AVP 61
      c=IN IP4 169.229.60.64
      a=rtpmap: 61 mpeg4-generic/44100
      a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
      a=fmtp: 61 render=sasc; inline="e4"; compr=none;

Note the empty string value for config, and the presence of the render parameter.
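The compr rule fixes the order of operations for the inline parameter: compress first (gzip by default), then apply Base64. The sketch below shows that ordering using Python's standard gzip and base64 modules; the function names are hypothetical, and the byte strings in the usage are arbitrary placeholders rather than a valid AudioSpecificConfig().

```python
import base64
import gzip


def encode_inline(asc_bytes, compr="gzip"):
    """Build an inline value: optional compression first, then Base64."""
    if compr == "gzip":
        payload = gzip.compress(asc_bytes)
    elif compr == "none":
        payload = asc_bytes
    else:
        # Receivers MUST NOT participate if compr is unknown.
        raise ValueError(f"unknown compr value: {compr!r}")
    return base64.b64encode(payload).decode("ascii")


def decode_inline(inline_value, compr="gzip"):
    """Reverse encode_inline: Base64-decode, then decompress if needed."""
    payload = base64.b64decode(inline_value)
    if compr == "gzip":
        return gzip.decompress(payload)
    if compr == "none":
        return payload
    raise ValueError(f"unknown compr value: {compr!r}")


print(encode_inline(b"\x00\x01", compr="none"))  # AAE=
```

Since gzip adds a header and trailer of its own, compression only pays off for larger configurations, such as Structured Audio decoder descriptions; for a very short block, compr=none yields the smaller inline string.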
We use a General MIDI stream in the inline example above for didactic purposes; in practice, the sasc method would not be used for a General MIDI stream, because the configuration string is trivially short.

The stream description below uses the url SDP parameter to code the AudioSpecificConfig() block for the same General MIDI stream:

     m=audio 5004 RTP/AVP 61
     c=IN IP4 169.229.60.64
     a=rtpmap: 61 mpeg4-generic/44100
     a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
     a=fmtp: 61 render=sasc; url="http://www.berkeley.edu/oski.sasc";
     a=fmtp: 61 cid="xjflsoeiurvpa09itnvlduihgnvet98pa3w9utnuighbuk";

In this example, the MIME-encoded document oski.sasc, of MIME type audio/sasc, contains the AudioSpecificConfig(). The default gzip compression is used on the AudioSpecificConfig(), and the cid value matches the Content-ID value of oski.sasc.

Appendix C.6. ABNF Specifications for MWPP Parameters

In this Appendix, we formally define the syntax for the MWPP parameters, using ABNF [23].

MWPP parameters appear in the fmtp lines of session descriptions for native or mpeg4-generic MWPP streams. An fmtp line may be defined in the following way:

     ;
     ; SDP fmtp line definition
     ;

     fmtp = "a=fmtp:" token 1*(param-assign ";") CRLF

where the token element codes the RTP payload type. At the end of this Appendix, we define the param-assign rule as a set of 17 incremental rules, one for each custom parameter listed in Figure C.1 of the Appendix C preamble.

The mpeg4-generic RTP packetization [4] defines a mode parameter, which signals the type of MPEG stream in use. We extend the mode parameter to signal an MWPP mpeg4-generic stream, using the ABNF rule below:

     ;
     ; mpeg4-generic mode parameter extension
     ;

     mode /= "mwpp"   ; as described in Section 6.2 of this memo

Two of the parameters listed in Figure C.1 ("compr" and "cid") may appear in the Content-Type field of audio/sasc MIME documents.
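As an illustration of the fmtp line rule defined earlier in this Appendix, the Python sketch below splits an fmtp line into its payload type and its parameter assignments. It is not a validating ABNF parser: it tolerates the space after "a=fmtp:" used in this memo's examples, treats semicolons inside double-quoted values (such as url or cid) as ordinary characters, and strips the quotes from values. The function name parse_fmtp is hypothetical.

```python
def parse_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Split an SDP fmtp line into (payload type, {parameter: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp line")
    # Tolerate optional whitespace after the colon, as in this memo's examples.
    pt_text, _, params_text = line[len(prefix):].strip().partition(" ")
    params: dict[str, str] = {}
    field, in_quotes = "", False
    for ch in params_text + ";":  # the extra ";" flushes a final field
        if ch == '"':
            in_quotes = not in_quotes
            field += ch
        elif ch == ";" and not in_quotes:
            assignment = field.strip()
            if assignment:
                name, sep, value = assignment.partition("=")
                if not sep:
                    raise ValueError(f"malformed assignment: {assignment!r}")
                value = value.strip()
                if len(value) >= 2 and value[0] == value[-1] == '"':
                    value = value[1:-1]  # drop the double-quote delimiters
                params[name.strip()] = value
            field = ""
        else:
            field += ch
    if in_quotes:
        raise ValueError("unterminated quoted value")
    return int(pt_text), params


if __name__ == "__main__":
    lines = [
        'a=fmtp: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;',
        'a=fmtp: 61 render=sasc; url="http://www.berkeley.edu/oski.sasc";',
    ]
    merged: dict[str, str] = {}
    for line in lines:
        pt, params = parse_fmtp(line)
        merged.update(params)  # several fmtp lines may share payload type 61
    print(pt, merged)
```

Since the examples in this memo spread one stream's parameters across several fmtp lines for the same payload type, the usage code merges the per-line dictionaries; a real receiver would also check the merged set against the constraints of Appendix C.5.1 (for example, that url and inline do not both appear).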
The rule for MIME headers defined in Appendix A of [17] is compatible with the definitions of "compr" and "cid" in the MWPP parameter ABNF listed below.

     ;
     ; top-level definition for all MWPP parameters
     ;

     param-assign =  "j_sec" "=" ("none" / "recj" / (*ietf-extension))
                     ; described in Appendix C.1
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "j_update" "=" ("anchor" / "closed-loop"
                                     / "open-loop")
                     ; described in Appendix C.1
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "ch_default" "=" ([channel-list] chapter-list
                                       [field-list])
                     ; described in Appendix C.1
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "ch_unused" "=" ([channel-list] chapter-list
                                      [field-list])
                     ; described in Appendix C.1
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "ch_never" "=" ([channel-list] chapter-list
                                     [field-list])
                     ; described in Appendix C.1
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "ch_anchor" "=" ([channel-list] chapter-list
                                      [field-list])
                     ; described in Appendix C.1
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "tsmode" "=" ("comex" / "async" / "buffer")
                     ; described in Appendix C.2
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "linerate" "=" nonzero-four-octet
                     ; described in Appendix C.2
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "octpos" "=" ("first" / "last")
                     ; described in Appendix C.2
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "mperiod" "=" nonzero-four-octet
                     ; described in Appendix C.2
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "midiport" "=" four-octet
                     ; described in Appendix C.4
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "zerosync" "=" four-octet
                     ; described in Appendix C.4
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "render" "=" ("default" / "sasc"
                                   / (*ietf-extension))
                     ; described in Appendix C.5
                     ; for audio/mwpp and audio/mpeg4-generic

     param-assign /= "url" "=" double-quote uri-element double-quote
                     ; described in Appendix C.5
                     ; for audio/mpeg4-generic

     param-assign /= "inline" "=" double-quote base-64-block
                                  double-quote
                     ; described in Appendix C.5
                     ; for audio/mpeg4-generic

     param-assign /= "compr" "=" ("none" / "gzip" / (*ietf-extension))
                     ; described in Appendix C.5
                     ; for audio/mpeg4-generic and audio/sasc

     param-assign /= "cid" "=" double-quote cid-block double-quote
                     ; described in Appendix C.5
                     ; for audio/mpeg4-generic and audio/sasc

     ;
     ; list definitions for the ch_ chapter-list
     ;

     chapter-list = chapter-part1 chapter-part2 chapter-part3

     chapter-part1 = 0*1"P" 0*1"W" 0*1"N" 0*1"A"

     chapter-part2 = 0*1"T" 0*1"C" 0*1"M" 0*1"D"

     chapter-part3 = 0*1"V" 0*1"Q" 0*1"E" 0*1"X"

     ;
     ; list definitions for the ch_ channel-list
     ;

     channel-list = midi-chan-element *("," midi-chan-element)

     midi-chan-element = midi-chan / midi-chan-range

     midi-chan-range = midi-chan "-" midi-chan
                       ; decimal value of left midi-chan
                       ; MUST be strictly less than decimal
                       ; value of right midi-chan

     midi-chan = %d0-15

     ;
     ; list definitions for the ch_ field-list
     ;

     field-list = midi-field-element *("," midi-field-element)

     midi-field-element = midi-field / midi-field-range

     midi-field-range = midi-field "-" midi-field
                        ; decimal value of left midi-field
                        ; MUST be strictly less than decimal
                        ; value of right midi-field

     midi-field = %d0-127

     ;
     ; generic rules
     ;

     ietf-extension = token
                      ; token as defined in reference [9].
                      ; ietf-extension may only be defined in
                      ; standards-track RFCs, but we expect
                      ; those RFCs to define namespaces that
                      ; do not require IETF actions.

     four-octet = %d0-4294967295
                  ; unsigned encoding of 32 bits

     nonzero-four-octet = %d1-4294967295
                          ; unsigned encoding of 32 bits,
                          ; excluding zero

     uri-element = uri             ; as defined in reference [9].

     base-64-block = base64        ; as defined in reference [9].
     double-quote = %x22           ; the double-quote (") character

     cid-block = msg-id            ; as discussed in Section 7 of
                                   ; reference [17]

     ;
     ; End of ABNF for MWPP parameters.
     ;

Appendix C.7. IANA Considerations

In this Appendix, we register the audio/mwpp and audio/sasc MIME types, and we extend the audio/mpeg4-generic MIME type for use with MWPP. The audio/mwpp and audio/sasc registrations are in the IETF tree, as we expect MWPP to be widely used in MIDI and MPEG applications. The mpeg4-generic extensions are in compliance with the extension guidelines in [4].

Appendix C.7.1 mwpp MIME Registration

This section registers mwpp as a MIME subtype for the audio type.

MIME media type name: audio

MIME subtype name: mwpp

Required parameters:

   rate: The RTP timestamp clock rate, as specified in the rtpmap line. See Sections 2.1 and 6.1 of this memo for usage details.

Optional parameters:

   Standard SDP parameters:

      maxptime: See Appendix C.3 for usage details.

   Non-extensible parameters:

      j_sec: See Appendix C.1 for usage details.
      j_update: See Appendix C.1 for usage details.
      ch_default: See Appendix C.1 for usage details.
      ch_unused: See Appendix C.1 for usage details.
      ch_never: See Appendix C.1 for usage details.
      ch_anchor: See Appendix C.1 for usage details.
      tsmode: See Appendix C.2 for usage details.
      linerate: See Appendix C.2 for usage details.
      octpos: See Appendix C.2 for usage details.
      mperiod: See Appendix C.2 for usage details.
      midiport: See Appendix C.4 for usage details.
      zerosync: See Appendix C.4 for usage details.

   Extensible parameter:

      render: See Appendix C.5 for usage details. The render parameter may only be extended via a standards track IETF document.
      We anticipate only a few such extensions; however, these extensions will serve to define methods for using existing registries (such as the MIDI Manufacturer Code registry [1]), so that implementors may define new rendering schemes without IETF involvement.

Encoding considerations:

   This type is only defined for real-time transfers of MIDI streams via RTP transport. Note that an industry standard already exists for stored MIDI files [1].

Security considerations:

   See Section 7 of this memo.

Interoperability considerations:

   None.

Published specification:

   This memo and [1] serve as the normative specification. In addition, references [6], [8], and [22] provide non-normative implementation guidance.

Applications which use this media type:

   Audio content-creation hardware, such as MIDI controller piano keyboards and MIDI audio synthesizers. Audio content-creation software, such as music sequencers, digital audio workstations, and soft synthesizers. In addition, content distribution servers and terminals may use this media type for low bit-rate music coding.

Additional information:

   None.

Person & email address to contact for further information:

   John Lazzaro <lazzaro@cs.berkeley.edu>

Intended usage:

   COMMON. The goal is to replace the asynchronous serial line MIDI networking described in [1] with RTP. If this goal is achieved, thousands of embedded devices will use this media type.

Author/Change controller:

   John Lazzaro

Appendix C.7.2 mpeg4-generic MWPP extensions MIME Registration

The mpeg4-generic MIME type [4] permits extensions to support new modes. The registration below defines mode mwpp for use with mpeg4-generic. These extensions support the MPEG Audio codecs [5] that use MIDI as a control language.
MIME media type name: audio

MIME subtype name: mpeg4-generic

Required parameter extensions:

   We extend the mpeg4-generic required parameter mode by adding the parameter=value syntax

      mode=mwpp

   to the list of legal mode values defined in [4]. See Section 6.2 for usage details.

   rate: In mode mwpp, rate is a required parameter. Rate specifies the RTP timestamp clock rate on the rtpmap line. See Sections 2.1 and 6.2 for usage details.

Optional parameter extensions:

   Standard SDP parameters:

      maxptime: See Appendix C.3 for usage details.

   Non-extensible parameters:

      j_sec: See Appendix C.1 for usage details.
      j_update: See Appendix C.1 for usage details.
      ch_default: See Appendix C.1 for usage details.
      ch_unused: See Appendix C.1 for usage details.
      ch_never: See Appendix C.1 for usage details.
      ch_anchor: See Appendix C.1 for usage details.
      tsmode: See Appendix C.2 for usage details.
      linerate: See Appendix C.2 for usage details.
      octpos: See Appendix C.2 for usage details.
      mperiod: See Appendix C.2 for usage details.
      midiport: See Appendix C.4 for usage details.
      zerosync: See Appendix C.4 for usage details.
      url: See Appendix C.5.1 for usage details.
      inline: See Appendix C.5.1 for usage details.
      cid: See Appendix C.5.1 for usage details.

   Extensible parameters:

      render: See Appendix C.5 for usage details. The render parameter may only be extended via a standards track IETF document. We expect extensions of render in the context of mpeg4-generic to be rare; we define render as extensible to match the render parameter defined for audio/mwpp in Appendix C.7.1.

      compr: See Appendix C.5.1 for usage details. The compr parameter may only be extended via a standards track IETF document. As compr specifies a compression method for a binary data block, we expect extensions of compr to be rare.
Encoding considerations:

   This type is only defined for real-time transfers of audio/mpeg4-generic streams with mode=mwpp.

Security considerations:

   See Section 7 of this memo.

Interoperability considerations:

   The RTP packet formats for audio/mwpp and audio/mpeg4-generic mode=mwpp are identical. The two packetizations differ in purpose: audio/mpeg4-generic mode=mwpp is for MIDI transport for MPEG synthetic codecs, and audio/mwpp is for MIDI transport in all other applications. Software may interoperate with both audio/mwpp and audio/mpeg4-generic mode=mwpp simply by supporting the differing parameter sets for each MIME type. See Section 6 for details.

Published specification:

   This memo, [1], and [5] are the normative references. In addition, references [6], [8], and [22] provide non-normative implementation guidance.

Applications which use this media type:

   MPEG 4 servers and terminals that support [5].

Additional information:

   None.

Person & email address to contact for further information:

   John Lazzaro <lazzaro@cs.berkeley.edu>

Intended usage:

   COMMON. The codecs in [5] have the potential to be for electronic musical instruments what PostScript is for printers: the common language for expressing rendering. If [5] succeeds in this goal, audio/mpeg4-generic mode=mwpp will be the RTP transport for playing these electronic musical instruments.

Author/Change controller:

   John Lazzaro

Appendix C.7.3 sasc MIME Registration

This section registers sasc as a MIME subtype for the audio type.

MIME media type name: audio

MIME subtype name: sasc

Required parameters: none

Optional parameters:

   Non-extensible parameter:

      cid: See Appendix C.5.1 for usage details.

   Extensible parameter:

      compr: See Appendix C.5.1 for usage details. The compr parameter may only be extended via a standards track IETF document.
      As compr specifies a compression method for a binary data block, we expect extensions of compr to be rare.

Encoding considerations:

   This type is only defined for stored-file transfer. In the MIME registration extension for audio/mpeg4-generic mode=mwpp (Appendix C.7.2), we define an optional parameter url. The stored-file data coded by url has the MIME type audio/sasc. The most common transports for audio/sasc are HTTP and SMTP.

Security considerations:

   See Section 7 of this memo.

Interoperability considerations:

   None.

Published specification:

   The binary data coded in an audio/sasc document is normatively defined as the StructuredAudioSpecificConfig object in section 5.5.2 of [5]. Methods for coding this data into a MIME document are normatively defined in Appendix C.5.1 of this memo.

Applications which use this media type:

   Applications that use RTP streams of type audio/mpeg4-generic mode=mwpp, and which wish to specify initialization data of non-trivial size in the session description.

Additional information:

   None.

Person & email address to contact for further information:

   John Lazzaro <lazzaro@cs.berkeley.edu>

Intended usage:

   COMMON. [5] defines three synthetic codecs for MPEG 4: the General MIDI codec originally defined in [1], the DLS2 codec originally defined in [18], and the Structured Audio codec. The latter two codecs have initialization data blocks too large for direct inclusion in SDP session descriptions sent over UDP. If audio/mpeg4-generic mode=mwpp becomes a popular MIME type for use with DLS2 or Structured Audio, audio/sasc will also become a popular MIME type.

Author/Change controller:

   John Lazzaro

Appendix D. A MIDI Overview for Networking Specialists

This Appendix presents an overview of the MIDI standard, for the benefit of networking specialists new to musical applications. MWPP implementors should consult [1] for a normative description of MIDI.
Musicians make music by performing a controlled sequence of physical movements. For example, a pianist plays by coordinating a series of key presses, key releases, and pedal actions. MIDI represents a musical performance by encoding these physical gestures as a sequence of MIDI commands. This high-level musical representation is compact but fragile: one lost command may be catastrophic to the performance.

MIDI commands have much in common with the machine instructions of a microprocessor. MIDI commands are defined as binary elements. Bitfields within a MIDI command have a regular structure and a specialized purpose. For example, the upper nibble of the first command octet (the opcode field) codes the command type. MIDI commands may consist of an arbitrary number of complete octets, but most MIDI commands are 1, 2, or 3 octets in length.

      -------------------------------------------------------------
     | Name                           | Bitfield Pattern           |
     |-------------------------------------------------------------|
     | NoteOff (end a note)           | 1000cccc 0nnnnnnn 0vvvvvvv |
     |-------------------------------------------------------------|
     | NoteOn (start a note)          | 1001cccc 0nnnnnnn 0vvvvvvv |
     |-------------------------------------------------------------|
     | PTouch (Polyphonic Aftertouch) | 1010cccc 0nnnnnnn 0aaaaaaa |
     |-------------------------------------------------------------|
     | CControl (Controller Change)   | 1011cccc 0xxxxxxx 0yyyyyyy |
     |-------------------------------------------------------------|
     | PChange (Program Change)       | 1100cccc 0ppppppp          |
     |-------------------------------------------------------------|
     | CTouch (Channel Aftertouch)    | 1101cccc 0aaaaaaa          |
     |-------------------------------------------------------------|
     | PWheel (Pitch Wheel)           | 1110cccc 0xxxxxxx 0yyyyyyy |
     |-------------------------------------------------------------|
     | System (sub-opcode is xxxx)    | 1111xxxx ...               |
      -------------------------------------------------------------

                    Figure D.1 -- MIDI Command Chart

Figure D.1 shows the MIDI command family. There are two major classes of commands: voice commands (opcode field values in the range 0x8 through 0xE) and system commands (opcode field value 0xF). Voice commands code the musical gestures for each timbre in a composition. System commands perform housekeeping functions, such as System Reset (the one-octet command 0xFF).

Voice commands execute on one of 16 MIDI channels, as coded by the 4-bit channel field (field cccc in Figure D.1). In most applications, notes for different timbres are assigned to different channels. To support applications that require more than 16 channels, MIDI systems use several MIDI command streams in parallel, to yield 32, 48, or 64 MIDI channels.

As an example of a voice command, consider a NoteOn command (opcode 0x9), with binary encoding 1001cccc 0nnnnnnn 0vvvvvvv. This command signals the start of a musical note on MIDI channel cccc. The note has a pitch coded by the note number nnnnnnn, and an onset amplitude coded by the note velocity vvvvvvv.

Other voice commands signal the end of notes (NoteOff, opcode 0x8), map a specific timbre to a MIDI channel (PChange, opcode 0xC), or set the value of parameters that modulate the timbral quality (all other voice commands). The exact meaning of most voice commands depends on the rendering algorithms the MIDI receiver uses to generate sound. In most applications, a MIDI sender has a model (in some sense) of the rendering method used by the receiver.

An examination of the bitfield patterns in Figure D.1 reveals a special structure: the leading bit of the first octet is set to 1, and the leading bit of all subsequent octets is set to 0. This structure supports a data compression system, called running status [1], that significantly reduces the size of the MIDI command stream.
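Because first octets (leading bit 1) and subsequent octets (leading bit 0) are distinguishable, a receiver can parse a voice-command stream even when repeated first octets are omitted. The Python sketch below, with the hypothetical name parse_voice_stream, uses the command lengths from Figure D.1, restores first octets omitted under running status [1], and treats a NoteOn whose third octet is zero as a NoteOff. It deliberately rejects system commands (opcode 0xF), whose lengths vary and whose interaction with running status this sketch does not model; [1] remains the normative source for those rules.

```python
# Octet lengths and names of voice commands, keyed by opcode (upper nibble),
# taken from Figure D.1.
VOICE_LENGTHS = {0x8: 3, 0x9: 3, 0xA: 3, 0xB: 3, 0xC: 2, 0xD: 2, 0xE: 3}
VOICE_NAMES = {0x8: "NoteOff", 0x9: "NoteOn", 0xA: "PTouch", 0xB: "CControl",
               0xC: "PChange", 0xD: "CTouch", 0xE: "PWheel"}


def parse_voice_stream(data: bytes) -> list[tuple[str, int, tuple[int, ...]]]:
    """Split MIDI voice commands into (name, channel, data octets) tuples."""
    commands = []
    status = None  # last first octet seen; reused under running status
    i = 0
    while i < len(data):
        if data[i] & 0x80:  # leading bit 1: an explicit first octet
            status = data[i]
            i += 1
        elif status is None:
            raise ValueError("data octet with no preceding first octet")
        opcode, channel = status >> 4, status & 0x0F
        if opcode == 0xF:
            raise ValueError("system commands are not handled in this sketch")
        n_data = VOICE_LENGTHS[opcode] - 1
        args = tuple(data[i:i + n_data])
        if len(args) != n_data or any(b & 0x80 for b in args):
            raise ValueError("truncated or malformed command")
        i += n_data
        name = VOICE_NAMES[opcode]
        if name == "NoteOn" and args[1] == 0:  # velocity 0 acts as NoteOff
            name = "NoteOff"
        commands.append((name, channel, args))
    return commands


if __name__ == "__main__":
    # NoteOn channel 0, then two more NoteOns sent with running status.
    print(parse_voice_stream(bytes([0x90, 60, 100, 64, 100, 60, 0])))
```

The example stream is seven octets long instead of nine: the second and third commands reuse the 0x90 first octet, and the third (velocity 0) is reported as a NoteOff, so the script prints [('NoteOn', 0, (60, 100)), ('NoteOn', 0, (64, 100)), ('NoteOff', 0, (60, 0))].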
In running status coding, the first octet of a MIDI command may be dropped if it is identical to the first octet of the previous MIDI command. This rule, in combination with a convention to treat NoteOn commands whose third octet (the velocity) is zero as NoteOff commands, supports the coding of note sequences using two octets per command.

Finally, note that the bitfield formats in Figure D.1 do not encode the execution time for a command. Timing information is not a part of the MIDI command syntax itself; different applications of the MIDI command language use different methods to encode timing. For example, the MIDI Wire Protocol [1], a networking standard for the remote control of musical instruments over short asynchronous serial lines, does not place timestamps on the wire. Instead, the protocol uses an implicit "time of arrival" code: receivers execute MIDI commands at the moment they appear on the wire. In contrast, Standard MIDI Files [1], a file format for representing complete musical performances, adds a timestamp field to each MIDI command, using a delta-time code that is tuned to the statistics of musical performance.

Appendix E. Author Addresses

John Lazzaro (corresponding author)
UC Berkeley
CS Division
315 Soda Hall
Berkeley CA 94720-1776
Email: lazzaro@cs.berkeley.edu

John Wawrzynek
UC Berkeley
CS Division
631 Soda Hall
Berkeley CA 94720-1776
Email: johnw@cs.berkeley.edu

Appendix F. References

[1] MIDI Manufacturers Association. The Complete MIDI 1.0 Detailed Specification, 1996. http://www.midi.org

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. Work in progress, draft-ietf-avt-rtp-new-11.txt.

[3] H. Schulzrinne and S. Casner. RTP Profile for Audio and Video Conferences with Minimal Control. Work in progress, draft-ietf-avt-profile-new-12.txt.
[4] Internet Engineering Task Force. Transport of MPEG-4 Elementary Streams. Work in progress, draft-ietf-avt-mpeg4-simple-04.txt.

[5] International Standards Organization. ISO 14496 MPEG-4, Part 3 (Audio) Subpart 5 (Structured Audio), 1999.

[6] John Lazzaro and John Wawrzynek. A Case for Network Musical Performance. The 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2001), June 25-26, 2001, Port Jefferson, New York. http://www.cs.berkeley.edu/~lazzaro/sa/pubs/pdf/nossdav01.pdf

[7] Sfront source code release, includes a Linux networking client that implements the MIDI RTP packetization. http://www.cs.berkeley.edu/~lazzaro/sa/

[8] Dominique Fober, Yann Orlarey, and Stephane Letz. Real Time Musical Events Streaming over Internet. Proceedings of the International Conference on WEB Delivering of Music 2001, pages 147-154. http://www.grame.fr/~fober/RTESP-Wedel.pdf

[9] M. Handley, V. Jacobson, and C. Perkins. SDP: Session Description Protocol. Work in progress, draft-ietf-mmusic-sdp-new-10.txt.

[10] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. Internet Engineering Task Force, RFC 3261.

[11] J. Rosenberg and H. Schulzrinne. An Offer/Answer Model with SDP. Internet Engineering Task Force, RFC 3264.

[12] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming Protocol (RTSP). Work in progress, draft-ietf-mmusic-rfc2326bis-00.txt.

[13] D. D. Clark and D. L. Tennenhouse. Architectural Considerations for a New Generation of Protocols. In SIGCOMM Symposium on Communications Architectures and Protocols, Philadelphia, Pennsylvania, pp. 200-208, IEEE, September 1990. Computer Communications Review, Vol. 20(4), September 1990.

[14] C. Bormann et al. Robust Header Compression (ROHC). Internet Engineering Task Force, RFC 3095.
     Also see related work at http://www.ietf.org/html.charters/rohc-charter.html.

[15] D. Yon. Connection-Oriented Media Transport in SDP. Work in progress, draft-ietf-mmusic-sdp-comedia-03.txt.

[16] International Standards Organization. ISO 14496 MPEG-4, Part 3 (Audio) Subpart 1 (Main Document), 1999.

[17] N. Freed and N. Borenstein. MIME Part 1: Format of Internet Message Bodies. Internet Engineering Task Force, RFC 2045, November 1996.

[18] MIDI Manufacturers Association. The MIDI Downloadable Sounds Specification, v98.2. Available for purchase at http://www.midi.org.

[19] P. Deutsch. GZIP File Format Specification Version 4.3. Internet Engineering Task Force, RFC 1952.

[20] C. Perkins et al. RTP Payload for Redundant Audio Data. Internet Engineering Task Force, RFC 2198.

[21] J. Rosenberg and H. Schulzrinne. An RTP Payload Format for Generic Forward Error Correction. Internet Engineering Task Force, RFC 2733.

[22] John Lazzaro and John Wawrzynek. An Implementation Guide to the MIDI Wire Protocol Packetization (MWPP). An informative IETF I-D (in preparation).

[23] D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. Internet Engineering Task Force, RFC 2234.