[AVT] Re: [MMUSIC] maxptime
On Oct 21, 2004, at 12:45 AM, Magnus Westerlund wrote:
Hi John,
Hi Magnus!
There are a few separable issues here, let's take
them one by one ... CC'd to AVT since these issues cross
boundaries ...
I think the usage of maxptime=0 may be a bit problematic. One
interpretation is that no audio can be included in the packet, as no
audio duration is allowed.
I see that you have made another interpretation in the MIDI draft
(draft-ietf-avt-rtp-midi-format-06):
"0 ms is a reasonable media time value for MIDI packets. In a packet
with a 0 ms media time, all commands execute at the instant coded by
the packet timestamp. Prohibitions in [13] against 0 ms ptime values
are not relevant for MIDI streams, and may be ignored."
Strictly speaking, an RTP timestamp tick is not an instant; it has a
duration equal to 1/(RTP timestamp rate).
However, this is my interpretation.
Here's the underlying motivation for the choice I made.
Imagine playing the piano.
Play three notes simultaneously. In MIDI, this corresponds
to three separate NoteOn commands. In RTP MIDI, each
has a timestamp, expressed in RTP timestamp units.
It's really important to the sonic integrity of the piece to code that
the notes happen simultaneously -- even one RTP tick matters.
The reason is, phase effects as the waveforms beat against each
other change the timbre. Hand drummers on the list (conga, etc)
can attest to this -- the art of creating certain timbres on those
types of percussion instruments is based on fine timing of strokes
on the drum with the two hands. Standard MIDI Files do in fact code
true simultaneity, which is why they can be used in studio work.
Musicians go crazy shifting around MIDI notes by tiny amounts inside
the DAW to get this sort of stuff right, and SMFs preserve that hard
work on export.
Pull back from the nanosecond level to the few-millisecond
level, and we move from timbre distortion to timing sloppiness.
MIDI as transmitted on a MIDI 1.0 DIN cable suffers from this:
at roughly 320 microseconds per byte (about 1 ms for a three-byte
NoteOn), playing full chords with both hands creates timing slews
you can hear. Part of the motivation
to move off of MIDI DIN cable technology is to make instrument
controllers that don't suffer this timing problem.
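
To put rough numbers on the DIN cable slew, here is a back-of-envelope
sketch. It assumes the standard MIDI 1.0 wire rate of 31,250 baud with
10 bits on the wire per byte, and it assumes the notes of a chord are
sent back to back. The function names are made up for illustration.

```python
# Back-of-envelope serialization delay on a MIDI 1.0 DIN cable.
MIDI_BAUD = 31_250      # bits per second on the wire
BITS_PER_BYTE = 10      # 1 start bit + 8 data bits + 1 stop bit


def byte_time_us() -> float:
    """Time to serialize one MIDI byte, in microseconds."""
    return BITS_PER_BYTE * 1_000_000 / MIDI_BAUD


def chord_slew_us(notes: int, running_status: bool = True) -> float:
    """Gap between the first and last NoteOn of a chord completing on the wire.

    With running status, every NoteOn after the first omits its status
    byte and costs 2 bytes; without it, each NoteOn costs 3 bytes.
    """
    if notes < 1:
        raise ValueError("a chord needs at least one note")
    bytes_per_followup = 2 if running_status else 3
    return (notes - 1) * bytes_per_followup * byte_time_us()


print(byte_time_us())                           # per-byte time
print(chord_slew_us(10))                        # ten notes, running status
print(chord_slew_us(10, running_status=False))  # ten notes, full status bytes
```

Under these assumptions a ten-note chord is smeared over roughly 5.8 ms
with running status and roughly 8.6 ms without it.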
You might ask -- why can't we just put all simultaneous notes
in the same packet, and use something other than RTP timestamps
to denote intra-packet timing?
To understand the answer, you need to understand that a core
use of RTP MIDI is on "media LANs" -- a stage or studio situation
that may consist of two nodes -- piano controller and laptop --
connected by an Ethernet cable as the Layer 2 for RTP MIDI.
These folks want absolute lowest latency -- so, they don't want
their piano controller to wait around after the first MIDI command
to see if a second command that is "simultaneous" is happening soon.
Instead, they want the piano to take each command, put it in its own RTP
packet, and send it off. So, they want the ability to send two RTP
packets in a row, each coding one MIDI command, each with the same
RTP timestamp. To do this, they need the RTP timestamp semantics
that say an RTP MIDI packet can have 0 ms media time.
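
A minimal sketch of that sender model, in Python for brevity. This is
not the RTP MIDI wire format: MidiPacket and ImmediateSender are
hypothetical names, and the packet is just a record of the fields that
matter for the argument (sequence number, RTP timestamp, commands).

```python
from typing import List


class MidiPacket:
    """Toy stand-in for an RTP MIDI packet (not the real payload layout)."""

    def __init__(self, seq: int, timestamp: int, commands: List[bytes]) -> None:
        self.seq = seq              # RTP sequence number (16 bits)
        self.timestamp = timestamp  # RTP timestamp in ticks (32 bits)
        self.commands = commands    # MIDI commands, all at `timestamp`


class ImmediateSender:
    """Sends every MIDI command in its own packet, with no waiting."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq

    def send(self, tick: int, command: bytes) -> MidiPacket:
        # One command per packet, so the media time of the packet is 0.
        packet = MidiPacket(self.next_seq & 0xFFFF, tick & 0xFFFFFFFF, [command])
        self.next_seq += 1
        return packet


# A C major chord: three NoteOn commands that share one RTP timestamp.
sender = ImmediateSender()
chord = [bytes([0x90, note, 0x64]) for note in (60, 64, 67)]
packets = [sender.send(48_000, cmd) for cmd in chord]
for p in packets:
    print(p.seq, p.timestamp, p.commands[0].hex())
```

The three packets carry consecutive sequence numbers and the same
timestamp, which is only legal if a 0 ms media time is allowed.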
Some instrument makers use Ethernet today. They use their own UDP
solutions (some use OSC on top of UDP) and have their own code. They
want to add support for RTP MIDI, but RTP MIDI needs to work with their
model, which is as I describe it above.
I would also not be surprised if the recovery journal semantics
failed if we moved to your RTP timestamp interpretation, although
I'd have to think about it to say for certain. Other things might break
too. This is a 4-year-old project now, and the timestamp model is
a fundamental assumption that underlies RTP MIDI in subtle ways.
So, that's the reply to issue #1.
Issue #2:
Is it really your intention in using maxptime=0 that, for each timestamp
tick where there is one or more commands, a new packet should be sent?
I can understand using ptime=0 to recommend this behaviour, but
using maxptime to enforce it?
Yes. Once again, we're talking about apps in the "media network"
world, a laptop and piano keyboard with a point-to-point 2-node
Ethernet as the Layer 2 for RTP MIDI. The maxptime=0 parameter
says: all packets must either code no MIDI commands, or code one
or more MIDI commands with the same timestamp -- and thus, the
media time for all packets must be 0 ms.
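
The rule can be sketched as a check a sender might run before emitting
a packet. The assumption here (mine, for illustration) is that the
media time of a packet is the span between its earliest and latest
command timestamps, measured in RTP ticks; the function names are
hypothetical.

```python
from typing import Sequence


def media_time_ticks(command_ticks: Sequence[int]) -> int:
    """Span, in RTP ticks, covered by the commands in one packet.

    An empty packet and a packet whose commands all share one
    timestamp both have a media time of 0.
    """
    if not command_ticks:
        return 0
    return max(command_ticks) - min(command_ticks)


def allowed_by_maxptime(command_ticks: Sequence[int],
                        rate_hz: int, maxptime_ms: int) -> bool:
    """True if the packet's media time does not exceed maxptime.

    Compares ticks * 1000 against ms * rate to stay in integer math.
    """
    return media_time_ticks(command_ticks) * 1000 <= maxptime_ms * rate_hz


print(allowed_by_maxptime([], 44_100, 0))                  # empty packet
print(allowed_by_maxptime([1000, 1000, 1000], 44_100, 0))  # simultaneous chord
print(allowed_by_maxptime([1000, 1001], 44_100, 0))        # one tick apart
```

With maxptime=0 the first two packets pass and the third does not,
since its commands are one tick apart.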
This accommodates receivers (such as, currently, sfront!) that do not
implement a receiver buffer of any sort. Instead, sfront grabs a
packet, and executes all MIDI commands in the packet simultaneously.
It relies on the constant nominal latency of the point-to-point Layer 2,
and the constant nominal latency of the keyboard sender, for timing
accuracy -- not RTP timestamps, which are ignored.
Thus, these receivers can't handle a packet that has a non-zero media
time -- all they know is "now".
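
In sketch form, a no-buffer receiver looks something like the
following. This is an illustration of the model, not sfront's actual
code; NoBufferReceiver and its callback are hypothetical names.

```python
from typing import Callable, Iterable, List


class NoBufferReceiver:
    """Executes every command the moment its packet arrives."""

    def __init__(self, execute: Callable[[bytes], None]) -> None:
        self.execute = execute

    def on_packet(self, rtp_timestamp: int, commands: Iterable[bytes]) -> None:
        # The RTP timestamp is deliberately ignored: timing comes from
        # the constant latency of the link and the sender, so every
        # command in the packet is treated as happening "now".
        for command in commands:
            self.execute(command)


played: List[bytes] = []
receiver = NoBufferReceiver(played.append)
receiver.on_packet(48_000, [b"\x90\x3c\x64", b"\x90\x40\x64"])
receiver.on_packet(48_000, [b"\x90\x43\x64"])
print([c.hex() for c in played])
```

A receiver like this has nowhere to hold a command that is scheduled
later than the packet's arrival, which is why a non-zero media time
is a problem for it.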
If one is designing receivers exclusively for use on media LANs
as live performance instruments (in other words -- musical
instrument synthesizers intended to be played only locally), this
"no buffer" model makes perfect sense -- it's how soft synths that
are driven by MIDI 1.0 cables work today. Eventually, it would be
better if these synths incorporated small configurable buffers to
compensate for sub-ms jitter, but the conceptual hurdle of doing so
would be too great for many synth authors. In practice,
OS APIs will probably implement those buffers if/when they choose
to support RTP MIDI as a media LAN MIDI source.
So, we need a way for those soft synths to negotiate (via
Offer/Answer) and say "I can only handle 0 ms packets, do not send me
any other kind".
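
As a rough sketch of the sending side of that negotiation, the
following reads the maxptime attribute out of a session description and
decides whether only 0 ms packets may be sent. The SDP text is
illustrative only: the port, payload type, and clock rate are made up,
and a=maxptime:0 is the usage under discussion here, not settled
syntax. The function names are hypothetical.

```python
from typing import Optional

EXAMPLE_SDP = """\
m=audio 5004 RTP/AVP 96
a=rtpmap:96 rtp-midi/44100
a=maxptime:0
"""


def parse_maxptime(sdp: str) -> Optional[int]:
    """Return the first a=maxptime value in ms, or None if absent."""
    prefix = "a=maxptime:"
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return int(line[len(prefix):])
    return None


def zero_media_time_only(sdp: str) -> bool:
    """True if the peer asked for packets with 0 ms media time only."""
    return parse_maxptime(sdp) == 0


print(parse_maxptime(EXAMPLE_SDP))
print(zero_media_time_only(EXAMPLE_SDP))
```

A sender that sees zero_media_time_only() return True would then put
only simultaneous commands (or no commands) in each packet.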
As an aside, RTP MIDI works fairly well over a Berkeley/Stanford/Berkeley
round trip using the no-buffer sfront -- it doesn't get down to the
"hand percussion jitter" I mention above, but with a few coding heuristics
it produces a playable instrument when I do loop-back (keyboard in
Berkeley, RTP MIDI to Stanford and back, which sfront then turns into
audio). Chris Chafe sees jitter under the audio sample rate on his
DAT-quality long-distance streaming sessions on occasion!
Issue #3
Especially any that is primarily to handle Offer/Answer, where a
normative MUST be larger than 0 is present. I think you should consider
the impact on interoperability.
Well, the easiest thing to do is invent my own SDP parameter as an
RTP MIDI replacement for maxptime -- this skirts the compatibility
issue. But as I explained above, I think we have a legitimate need
for the negotiating functionality. And I think the most elegant thing
to do is for maxptime to bend its definition to include 0 ms, just for
RTP MIDI, which is why I did it that way ... a compromise would be
to put interoperability warning language into the RTP MIDI I-D, but
keep the normative text as is.
So let's try to sort this out by answering some questions:
- How big is the need to be able to specify ptime=0 and maxptime=0,
and what is the effect if the smallest one can use is ptime=1?
Hopefully this is answered above.
- Do any implementers of generic SDP parsers have any problems with
ptime or maxptime equals 0?
- Why does RFC 3264 have a normative MUST against using ptime=0?
- Any further considerations?
These questions seem aimed at others on the list ... however, I
should note that I didn't take the route I did on maxptime
lightly -- after several years of consensus building with the music
technology community, the approach to RTP timing I outline above
is the best way I can see to solve their problems.
---
John Lazzaro
http://www.cs.berkeley.edu/~lazzaro
lazzaro [at] cs [dot] berkeley [dot] edu
---
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt