[AVT] Re: [MMUSIC] maxptime
Hi John,
Please see below. I also have to point out that I haven't read your
drafts during the last year or so. I have only looked at certain
specific things in them.
lazzaro wrote:
On Oct 21, 2004, at 12:45 AM, Magnus Westerlund wrote:
> Hi John,
Hi Magnus!
There are a few separable issues here; let's take
them one by one. I've CC'd AVT since these issues cross
boundaries.
> I think the usage of maxptime=0 may be a bit problematic. One
> interpretation is that no audio can be included in the packet, as no
> audio duration is allowed.
>
> I see that you have made another interpretation in the MIDI draft
> (draft-ietf-avt-rtp-midi-format-06):
>
> "0 ms is a reasonable media time value for MIDI packets. In a packet
> with a 0 ms media time, all commands execute at the instant coded by
> the packet timestamp. Prohibitions in [13] against 0 ms ptime values
> are not relevant for MIDI streams, and may be ignored."
>
> An RTP timestamp tick is strictly not an instant; it has a duration
> equal to 1/(RTP timestamp rate).
> However, this is my interpretation.
I should probably clarify myself, as I have received comments that it
seems like I am reinterpreting the timestamp model. The only point I was
trying to make is that in the digital domain, unlike the analog one, an
instant is not strictly a precise moment without any duration. Due to
the analog-to-digital conversion, the smallest point in time we can
express is a single sample (one RTP TS tick). That sample represents a
duration equal to 1/(sampling frequency), so it is not truly
instantaneous. To connect this to John's keyboardist example below: even
if he strikes the keys simultaneously on a timescale of 1/10,000 of a
second, that is most likely not true on a timescale of 1/10^9 of a
second.
I will return to this further down.
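To make the "a tick is not an instant" point concrete, here is a small sketch that computes the length of one RTP timestamp tick (1/rate) for a few common clock rates, and shows that two key strikes a few microseconds apart land in the same tick at 48 kHz. The rates and the helper names (tick_duration_us, tick_index) are illustrative assumptions, not values or functions taken from either draft.

```python
import math
from fractions import Fraction

def tick_duration_us(rate_hz: int) -> float:
    """Length of one RTP timestamp tick in microseconds, i.e. 1/rate."""
    return 1e6 / rate_hz

def tick_index(t_seconds: Fraction, rate_hz: int) -> int:
    """Index of the tick containing time t (t measured from the stream origin)."""
    return math.floor(t_seconds * rate_hz)

for rate in (8000, 44100, 48000, 96000):
    print(f"{rate:>6} Hz: one tick lasts {tick_duration_us(rate):.2f} us")

# With these offsets, two key strikes 5 us apart fall in the same 48 kHz tick
# (about 20.8 us long), so RTP can only code them as simultaneous.
first, second = Fraction(0), Fraction(5, 1_000_000)
same_tick = tick_index(first, 48000) == tick_index(second, 48000)
```

Exact fractions are used for the time offsets so that the tick assignment is not affected by floating-point rounding.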
Here's the underlying motivation for the choice I made.
Imagine playing the piano.
Play three notes simultaneously. In MIDI, this corresponds
to three separate NoteOn commands. In RTP MIDI, each
has a timestamp, expressed in RTP timestamp units.
It's really important to the sonic integrity of the piece to code that
the notes happen simultaneously -- even one RTP tick matters.
The reason is, phase effects as the waveforms beat against each
other change the timbre. Hand drummers on the list (conga, etc)
can attest to this -- the art of creating certain timbres on those
types of percussion instruments is based on fine timing of strokes
on the drum with the two hands. Standard MIDI Files do in fact code
true simultaneity, which is why they can be used in studio work.
Musicians go crazy shifting around MIDI notes by tiny amounts inside
the DAW to get this sort of stuff right, and SMFs preserve that hard
work on export.
Pull back from the nanosecond level to the few-millisecond
level, and we move from timbre distortion to timing sloppiness.
MIDI as transmitted on a MIDI 1.0 DIN cable suffers from this:
at 300 microseconds/command, playing full chords with both
hands creates timing slews you can hear. Part of the motivation
to move off of MIDI DIN cable technology is to make instrument
controllers that don't suffer this timing problem.
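The timing slew on a DIN cable follows from its serial framing: MIDI 1.0 runs at 31,250 baud with 10 bit times per byte (start, 8 data, stop), i.e. 320 us per byte. The sketch below estimates when each Note On of a "simultaneous" chord finishes arriving, assuming all notes are on one channel (so running status drops the status byte after the first message) and no other traffic shares the cable.

```python
# MIDI 1.0 DIN: 31,250 baud, each byte framed as 1 start + 8 data + 1 stop bit.
BAUD = 31_250
BITS_PER_BYTE = 10
BYTE_TIME_US = BITS_PER_BYTE * 1e6 / BAUD  # 320 microseconds per byte

def chord_onsets_us(num_notes: int, running_status: bool = True) -> list[float]:
    """Time (us) at which each Note On of a 'simultaneous' chord has fully arrived."""
    onsets = []
    t = 0.0
    for i in range(num_notes):
        # The first message carries the status byte; with running status the
        # following ones send only note number + velocity (2 bytes).
        nbytes = 3 if (i == 0 or not running_status) else 2
        t += nbytes * BYTE_TIME_US
        onsets.append(t)
    return onsets

ten_finger_chord = chord_onsets_us(10)
print(f"last note arrives {ten_finger_chord[-1] - ten_finger_chord[0]:.0f} us after the first")
```

Under these assumptions a ten-note chord is smeared over several milliseconds, which is the audible slew described above.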
I think I understand this. However, as I understand it, the RTP
timestamp rate for MIDI would be equal to the sample rate of the audio
produced? That would mean it is reasonable to expect that we are
working with a sample frequency of, for example, 48 kHz. Or do you need
even higher resolution?
You might ask -- why can't we just put all simultaneous notes
in the same packet, and use something other than RTP timestamps
to denote intra-packet timing?
No, I am not questioning this timing model at all. It seems appropriate
for RTP based on your description. However, one needs to be aware of
what it means: a MIDI command can't be synchronized on a finer scale
than one RTP TS tick, so the precision depends on the timestamp
rate.
To understand the answer, you need to understand that a core
use of RTP MIDI is on "media LANs" -- a stage or studio situation
that may consist of two nodes -- piano controller and laptop --
connected by an Ethernet cable as the Layer 2 for RTP MIDI.
These folks want absolute lowest latency -- so, they don't want
their piano controller to wait around after the first MIDI command
to see if a second command that is "simultaneous" is happening soon.
Instead, they want the piano to take each command, put it in its own RTP
packet, and send it off. So, they want the ability to send two RTP
packets in a row, each coding one MIDI command, each with the same
RTP timestamp. To do this, they need the RTP timestamp semantics
that say an RTP MIDI packet can have 0 ms media time.
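The "one command per packet, same timestamp" sender described above can be sketched as follows. It packs the 12-byte RTP fixed header from RFC 3550 and increments the sequence number per packet while reusing the chord's timestamp. The class name ImmediateSender, the payload type 96 and the SSRC are hypothetical, and the payload is raw MIDI bytes: a real RTP MIDI payload also carries a command-section header and possibly a recovery journal, which are omitted here.

```python
import struct

RTP_VERSION = 2

def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int,
               marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (RFC 3550): no padding, extension or CSRCs."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

class ImmediateSender:
    """Hypothetical sender: one packet per MIDI command, never waiting to see
    whether another 'simultaneous' command follows."""

    def __init__(self, ssrc: int, payload_type: int = 96, first_seq: int = 0):
        self.ssrc = ssrc
        self.payload_type = payload_type
        self.seq = first_seq

    def send(self, midi_cmd: bytes, timestamp: int) -> bytes:
        # Simplification: raw MIDI bytes stand in for the RTP MIDI payload.
        pkt = rtp_header(self.seq, timestamp, self.ssrc, self.payload_type) + midi_cmd
        self.seq = (self.seq + 1) & 0xFFFF
        return pkt

sender = ImmediateSender(ssrc=0x1234ABCD)
ts = 480_000  # RTP timestamp shared by all three chord notes
packets = [sender.send(bytes([0x90, note, 100]), ts) for note in (60, 64, 67)]
```

Each packet leaves as soon as its command exists; the shared timestamp is what tells a timestamp-aware receiver that the three notes belong to the same instant.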
First, I understand that, due to the desire to keep latency to a
minimum, even short buffering is undesirable. However, RTP payloads
are normally gathered using some type of buffering, either at the
encoding step, due to the need to gather a sufficient number of samples
to perform the encoding, or at the packetization step, to aggregate
enough data to achieve the right transmission efficiency. Thus MIDI
commands are a bit special.
Second, I don't think we are really talking about media time here. As
you say, the commands do not truly have a duration. They take effect at
the instant (on a specific sample) indicated by the RTP timestamp. The
problem is really about indicating packetization strategies.
Some instrument makers use Ethernet today. They use their own UDP
solutions (some use OSC on top of UDP) and have their own code. They
want to add support for RTP MIDI, but RTP MIDI needs to work with their
model, which is as I describe it above.
I would also not be surprised if the recovery journal semantics
failed if we moved to your RTP timestamp interpretation, although
I'd have to think about it to say for certain. Other things might break
too. This is a 4-year-old project now, and the timestamp model is
a fundamental assumption that underlies RTP MIDI in subtle ways.
I am not redefining anything. I think we simply have misunderstood each
other.
So, that's the reply to issue #1.
Issue #2:
> Is it really your intention with maxptime=0 that a new packet should
> be sent for each timestamp tick that has one or more commands? I can
> understand using ptime=0 to recommend this behaviour, but using
> maxptime to enforce it?
Yes. Once again, we're talking about apps in the "media network"
world, a laptop and piano keyboard with a point-to-point 2-node
Ethernet as the Layer 2 for RTP MIDI. The maxptime=0 parameter
says: all packets must either code no MIDI commands, or code one
or more MIDI commands with the same timestamp -- and thus, the
media time for all packets must be 0 ms.
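The constraint John describes for maxptime=0 can be written as a simple conformance check. This sketch assumes each command's absolute RTP timestamp is known (packet timestamp plus any intra-packet delta) and treats a packet's media time as the span between its earliest and latest command; timestamp wraparound is ignored. The function names are hypothetical.

```python
from typing import Sequence

def packet_media_time_ticks(cmd_timestamps: Sequence[int]) -> int:
    """RTP time spanned by the commands in one packet (0 if all share a timestamp)."""
    if not cmd_timestamps:
        return 0
    return max(cmd_timestamps) - min(cmd_timestamps)

def conforms_to_maxptime_zero(cmd_timestamps: Sequence[int]) -> bool:
    """Draft reading of maxptime=0: the packet codes no commands, or every
    command in it carries the same timestamp."""
    return packet_media_time_ticks(cmd_timestamps) == 0
```

For example, a packet holding a three-note chord at timestamp 100 conforms, while a packet holding commands at 100 and 101 does not.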
This accommodates receivers (such as, currently, sfront!) that do not
implement a receiver buffer of any sort. Instead, sfront grabs a
packet and executes all MIDI commands in the packet simultaneously.
It relies on the constant nominal latency of the point-to-point Layer 2,
and the constant nominal latency of the keyboard sender, for timing
accuracy -- not RTP timestamps, which are ignored.
Then it is not really RTP you are using. You are ignoring the
functionality the timestamp provides: placing the different packets
correctly on a timescale to combat transport jitter. It seems that the
local stage applications rely on consistent transport behavior.
Thus, these receivers can't handle a packet that has a non-zero media
time -- all they know is "now".
Then they are not RTP-aware; an RTP receiver needs to have knowledge
about how time flows.
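What "knowledge about how time flows" means for a receiver can be sketched as a minimal playout buffer. This is a generic jitter-buffer idea, not sfront's design and not anything specified by RTP MIDI: the first packet fixes the mapping between RTP time and the local clock, and every command is scheduled at its RTP timestamp plus a fixed delay. Clock drift, timestamp wraparound and late-packet handling are deliberately left out; the class name and parameters are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

class PlayoutBuffer:
    """Schedules MIDI commands by RTP timestamp plus a fixed playout delay,
    so network jitter smaller than the delay does not reach the output."""

    def __init__(self, rate_hz: int, delay_s: float):
        self.rate = rate_hz
        self.delay = delay_s
        self.base: Optional[Tuple[int, float]] = None  # (first RTP ts, its arrival time)
        self.queue: List[Tuple[float, int, bytes]] = []
        self._order = 0  # tie-breaker keeps same-time commands in arrival order

    def push(self, rtp_ts: int, arrival_s: float, midi_cmd: bytes) -> None:
        if self.base is None:
            self.base = (rtp_ts, arrival_s)
        base_ts, base_arrival = self.base
        play_at = base_arrival + self.delay + (rtp_ts - base_ts) / self.rate
        heapq.heappush(self.queue, (play_at, self._order, midi_cmd))
        self._order += 1

    def due(self, now_s: float) -> List[bytes]:
        """Pop every command whose playout time has been reached."""
        out = []
        while self.queue and self.queue[0][0] <= now_s:
            out.append(heapq.heappop(self.queue)[2])
        return out

buf = PlayoutBuffer(rate_hz=48000, delay_s=0.010)
buf.push(0, 1.000, b"\x90\x3c\x64")    # arrives on time
buf.push(480, 1.013, b"\x90\x40\x64")  # 10 ms of RTP time later, plus 3 ms of jitter
```

The second note still plays 10 ms after the first, as its timestamp says, even though it arrived 13 ms later; a "no buffer" receiver would instead play it 3 ms late.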
If one is designing receivers exclusively for use on media LANs
as live performance instruments (in other words -- musical
instrument synthesizers intended only to be played locally), this
"no buffer" model makes perfect sense -- it's how soft synths that
are driven by MIDI 1.0 cables work today. Eventually, it would be
better if these synths incorporated small configurable buffers to
compensate for sub-ms jitter, but the conceptual hurdle would be too
great for many synth authors. In practice, OS APIs will probably
implement those buffers if/when they choose to support RTP MIDI as
a media LAN MIDI source.
So, we need a way for those soft synths to negotiate (via Offer/Answer)
and say "I can only handle 0 ms packets, do not send me any other
kind".
The problem is really to indicate the desired behavior, which is not
really what ptime and maxptime are about. They give the recommended and
maximum amount of media duration for a packet.
Issue #3
> Especially any that is primarily to handle Offer/Answer, where a
> normative MUST be larger than 0 is present. I think you should consider
> the impact on interoperability.
Well, the easiest thing to do is invent my own SDP parameter as an
RTP MIDI replacement for maxptime -- this skirts the compatibility
issue. But as I explained above, I think we have a legitimate need
for the negotiating functionality. And I think the most elegant thing
to do is for maxptime to bend its definition to include 0 ms, just for
RTP MIDI, which is why I did it that way ... a compromise would be
to put interoperability warning language into the RTP MIDI I-D, but
keep the normative text as is.
The basic problem is that you are using ptime and maxptime way beyond
what was ever intended by the authors of these parameters. I am certain
of this when it comes to maxptime, as I am the one that defined that one
with some help.
Therefore I think one needs to look at the different use cases for RTP
MIDI. I think there are several.
1. Local play, where most of RTP's timing model seems to be ignored, as
it can't be expected to buffer at either end of the chain.
2. Transmitting a live session to another point, in which case it
is acceptable to introduce some small delay. This is done over a network
that does not have the jitter behavior necessary to ignore the timestamp,
which means a receiver must follow the timestamp to give the right
inter-command time spacing.
3. Streaming of MIDI content.
In 1, I think you need something new to indicate the intended behavior
of minimal transmission delay and local live play. However, I do hope
that a sender at least correctly timestamps the RTP packets it sends, so
that RTP receivers can start using the timing model correctly in the future.
In 2, I think the usage of ptime and maxptime could be reasonable to
instruct a sender to gather 10, 50 or 100 samples, or 1, 5 or 20 ms, of
commands into each packet. However, if you want finer resolution
than 1 ms, a new parameter is necessary.
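The resolution point can be checked with exact arithmetic. Assuming ptime and maxptime carry a whole number of milliseconds (as the "finer resolution than 1 ms" remark implies), this sketch tests which sample counts at a given RTP clock rate correspond to an integer millisecond value. The helper names are hypothetical.

```python
from fractions import Fraction

def samples_to_ms(samples: int, rate_hz: int) -> Fraction:
    """Exact packet duration in milliseconds for a given number of RTP ticks."""
    return Fraction(samples * 1000, rate_hz)

def expressible_in_whole_ms(samples: int, rate_hz: int) -> bool:
    # Assumption: ptime/maxptime values are integer milliseconds.
    return samples_to_ms(samples, rate_hz).denominator == 1

for n in (10, 48, 50, 100, 240):
    print(n, "samples @ 48 kHz =", samples_to_ms(n, 48000), "ms",
          "(ok)" if expressible_in_whole_ms(n, 48000) else "(needs finer unit)")
```

At 48 kHz, 10 samples is 5/24 ms, so a sample-granular request like that cannot be stated through a millisecond-based parameter.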
In 3, these parameters are not used at all. The number of commands in
each packet is predetermined for best performance at the receiver.
In conclusion, I would suggest that you use a MIME parameter other than
ptime and maxptime to indicate the desired behavior.
I also think that the format drafts need to be more explicit about
considerations for the RTP timestamp: which rates are suitable, and what
impact they have on playback timing. There seems to be a bit on this in the
guidelines, but I haven't had time to read it fully.
Cheers
Magnus Westerlund
Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB | Phone +46 8 4048287
Torshamsgatan 23 | Fax +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com
_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt