
[AVT] Re: [MMUSIC] maxptime



Hi John,

Please see below. I also have to point out that I haven't read your drafts during the last year or so. I have only looked at certain specific things in them.

lazzaro wrote:

On Oct 21, 2004, at 12:45 AM, Magnus Westerlund wrote:
 > Hi John,

Hi Magnus!

        There are a few separable issues here, let's take
them one by one ... CC'd to AVT since these issues cross
boundaries ...

 > I think the usage of maxptime=0 may be a bit problematic. A
 > interpretation is that no audio can be included in the packet, as no
 > audio duration is allowed.
 >
 > I see that you have made another interpretation in the MIDI draft
 > (draft-ietf-avt-rtp-midi-format-06):
 >
 > "0 ms is a reasonable media time value for MIDI packets.  In a packet
 > with a 0 ms media time, all commands execute at the instant coded by
 > the packet timestamp.  Prohibitions in [13] against 0 ms ptime values are
 > not relevant for MIDI streams, and may be ignored."
 >
 > An RTP timestamp tick is strictly not an instant, it has a duration
 > equal to 1/(RTP timestamp rate).
 > However, this is my interpretation.


I should probably clarify, as I have received comments that I seem to be reinterpreting the timestamp model. The only point I was trying to make is that in the digital domain, unlike the analog one, an instant is not strictly a precise moment without any duration. Because of the analog-to-digital conversion, the smallest point in time we can express is a single sample (one RTP TS tick). That sample represents some duration, equal to 1/(sampling frequency), so it is not truly instantaneous. To connect to John's example below about the keyboardist: even if he strikes the keys simultaneously on a timescale of 1/10000 of a second, they are most likely not simultaneous on a timescale of 1/10^9 of a second.


I will return to this further down.

Here's the underlying motivation for the choice I made.

Imagine playing the piano.

Play three notes simultaneously. In MIDI, this corresponds
to three separate NoteOn commands.  In RTP MIDI, each
has a timestamp, expressed in RTP timestamp units.

It's really important to the sonic integrity of the piece to code that
the notes happen simultaneously -- even one RTP tick matters.
The reason is, phase effects as the waveforms beat against each
other change the timbre.  Hand drummers on the list (conga, etc)
can attest to this -- the art of creating certain timbres on those
types of percussion instruments is based on fine timing of strokes
on the drum with the two hands.  Standard MIDI Files do in fact code
true simultaneity, which is why they can be used in studio work.
Musicians go crazy shifting around MIDI notes by tiny amounts inside
the DAW to get this sort of stuff right, and SMFs preserve that hard
work on export.

Pull back from the nanosecond level to the few-millisecond
level, and we move from timbre distortion to timing sloppiness.
MIDI as transmitted on a MIDI 1.0 DIN cable suffers from this:
at 300 microseconds/command, playing full chords with both
hands creates timing slews you can hear.  Part of the motivation
to move off of MIDI DIN cable technology is to make instrument
controllers that don't suffer this timing problem.

I think I understand this. However, as I understand it, the RTP timestamp rate for MIDI would be equal to the sample rate of the audio produced? That would mean it is reasonable to expect a sample frequency of, for example, 48 kHz. Or do you need even higher resolution?



You might ask -- why can't we just put all simultaneous notes in the same packet, and use something other than RTP timestamps to denote intra-packet timing?

No, I am not questioning this timing model at all. It seems appropriate for RTP, based on your description. However, one needs to be aware of what it means. A MIDI command can't be synchronized on a finer scale than one RTP TS tick. Thus the precision depends on the timestamp rate.
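
To make the resolution point concrete, here is a small sketch. The function names and the example times are my own assumptions for illustration, not anything taken from the draft. It computes the duration of one tick at a given timestamp rate and shows that two key strikes 10 microseconds apart fall into the same tick at 48 kHz, while strikes 30 microseconds apart do not.

```python
from fractions import Fraction

MICROSECONDS_PER_SECOND = 1_000_000


def tick_duration_us(clock_rate_hz):
    """Duration of one RTP timestamp tick in microseconds (exact)."""
    return Fraction(MICROSECONDS_PER_SECOND, clock_rate_hz)


def to_ticks(event_time_us, clock_rate_hz):
    """Index of the tick that contains an event time given in microseconds."""
    return event_time_us * clock_rate_hz // MICROSECONDS_PER_SECOND


# At 48 kHz one tick lasts 125/6 microseconds, a little under 21.
print(float(tick_duration_us(48000)))

# Strikes at 1000 us and 1010 us share a tick; 1030 us lands in the next one.
print(to_ticks(1000, 48000), to_ticks(1010, 48000), to_ticks(1030, 48000))
```

Anything closer together than one tick is coded as simultaneous, whatever happened at the keyboard.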



To understand the answer, you need to understand that a core use of RTP MIDI is on "media LANs" -- a stage or studio situation that may consist of two nodes -- piano controller and laptop -- connected by an Ethernet cable as the Layer 2 for RTP MIDI. These folks want absolute lowest latency -- so, they don't want their piano controller to wait around after the first MIDI command to see if a second command that is "simultaneous" is happening soon. Instead, they want the piano to take each command, put it in its own RTP packet, and send it off. So, they want the ability to send two RTP packets in a row, each coding one MIDI command, each with the same RTP timestamp. To do this, they need the RTP timestamp semantics that say an RTP MIDI packet can have 0 ms media time.

First, I can understand that, given the desire to keep latency to a minimum, even a short buffering period is not desirable. However, RTP payloads are normally gathered using some type of buffering, either at the encoding step, because enough samples must be collected to perform the encoding, or at the packetization step, to aggregate enough data for efficient transmission. Thus MIDI commands are a bit special.


Secondly, I don't think we are really talking about media time here. The commands do not truly have a duration, as you say. They take effect at the instant (on a specific sample) indicated by the RTP timestamp. The problem is really about indicating packetization strategies.


Some instrument makers use Ethernet today. They use their own UDP solutions (some use OSC on top of UDP) and have their own code. They want to add support for RTP MIDI, but RTP MIDI needs to work with their model, which is as I describe it above.

I would also not be surprised if the recovery journal semantics
failed if we moved to your RTP timestamp interpretation, although
I'd have to think about it to say for certain.  Other things might break
too.  This is a 4-year-old project now, and the timestamp model is
a fundamental assumption that underlies RTP MIDI in subtle ways.


I am not redefining anything. I think we simply have misunderstood each other.


So, that's the reply to issue #1.

Issue #2:

 > Is really your intention of using maxptime=0 that for each timestamp
 > tick there is a command(s), a new packet should be sent? I can
 > understand using ptime=0 to recommend that you do this behaviour but
 > using maxptime to enforce the behaviour?

Yes. Once again, we're talking about apps in the "media network"
world, a laptop and piano keyboard with a point-to-point 2-node
Ethernet as the Layer 2 for RTP MIDI.  The maxptime=0 parameter
says: all packets must either code no MIDI commands, or code one
or more MIDI commands with the same timestamp -- and thus, the
media time for all packets must be 0 ms.
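
The rule stated above can be written down as a simple check. This is only a sketch under assumptions: a packet is modelled as a plain list of per-command RTP timestamps, the function names are hypothetical, and 32-bit timestamp wraparound is ignored for brevity.

```python
def media_time_ticks(command_timestamps):
    """Span between the earliest and latest command timestamp, in ticks.

    An empty packet (no MIDI commands) has a media time of 0.
    Timestamp wraparound is ignored in this sketch.
    """
    if not command_timestamps:
        return 0
    return max(command_timestamps) - min(command_timestamps)


def allowed_under_maxptime_zero(command_timestamps):
    """True if the packet codes no commands, or only commands that
    share a single timestamp."""
    return media_time_ticks(command_timestamps) == 0


print(allowed_under_maxptime_zero([]))               # empty packet
print(allowed_under_maxptime_zero([960, 960, 960]))  # three-note chord
print(allowed_under_maxptime_zero([960, 961]))       # one tick apart
```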

This accommodates receivers (such as, currently, sfront!) that do not
implement a receiver buffer of any sort.  Instead, sfront grabs a
packet, and executes all MIDI commands in the packet simultaneously.
It relies on the constant nominal latency of the point-to-point Layer 2,
and the constant nominal latency of the keyboard sender, for timing
accuracy -- not RTP timestamps, which are ignored.

Then it is not really RTP you are using. You are ignoring the functionality the timestamp provides: placing the different packets correctly on a timescale to combat any transport jitter. It seems that the local stage applications rely on consistent transport behavior instead.
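
What I mean by placing packets on a timescale can be sketched as follows. This is an illustration only: the function name, the fixed buffer delay and the use of the first packet as the reference point are all assumptions of mine, and a real receiver would also have to handle clock drift and reordering.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


def playout_time_s(rtp_ts, first_rtp_ts, first_arrival_s,
                   clock_rate_hz, buffer_s):
    """Local time (seconds) at which commands stamped rtp_ts should execute.

    The schedule depends only on the timestamp difference from the first
    packet plus a fixed buffer delay, not on when each packet arrived.
    """
    elapsed_ticks = (rtp_ts - first_rtp_ts) % RTP_TS_MODULUS
    return first_arrival_s + buffer_s + elapsed_ticks / clock_rate_hz


# A command stamped 480 ticks (10 ms at 48 kHz) after the first packet is
# played 10 ms after the first one, however much jitter its packet saw.
print(playout_time_s(48480, 48000, 10.0, 48000, 0.005))
```

A packet that arrives later than its computed playout time is late, which is why the buffer delay has to cover the expected jitter.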



Thus, these receivers can't handle a packet that has a non-zero media time -- all they know is "now".


Then they are not RTP-aware; an RTP receiver needs to have knowledge of how time flows.


If one is designing receivers exclusively for use on media LANs
as live performance instruments (in other words -- musical
instrument synthesizers intended to only be played locally), this
"no buffer" model makes perfect sense -- it's how soft synths that
are driven by MIDI 1.0 cables work today.  Eventually, it would be
better if these synths incorporated small configurable buffers to
compensate for sub-ms jitter, but the conceptual hurdle to do so
would be too great for many synth authors to do.  In practice,
OS APIs will probably implement those buffers if/when they choose
to support RTP MIDI as a media LAN MIDI source.

So, we need a way for those soft synths to negotiate (via
Offer/Answer) to say "I can only handle 0 ms packets, do not send
me any other kind".

The problem is really how to indicate the desired behavior, which is not what ptime and maxptime are about. They give the recommended and maximum amount of media duration for a packet.



Issue #3

 > Especially any that is primarily to handle Offer/Answer, where a
 > normative MUST be larger than 0 is present. I think you should
 > consider the impact on interoperability.

Well, the easiest thing to do is invent my own SDP parameter as an
RTP MIDI replacement for maxptime -- this skirts the compatibility
issue.  But as I explained above, I think we have a legitimate need
for the negotiating functionality.  And I think the most elegant thing
to do is for maxptime to bend its definition to include 0 ms, just for
RTP MIDI, which is why I did it that way ... a compromise would be
to put interoperability warning language into the RTP MIDI I-D, but
keep the normative text as is.

The basic problem is that you are using ptime and maxptime way beyond what was ever intended by the authors of these parameters. I am certain of this when it comes to maxptime, as I am the one who defined it, with some help.


Therefore I think one needs to look at the different use cases for RTP MIDI. I think there are several.

1. Local play, where most of RTP's timing model seems to be ignored, as neither end of the chain can be expected to buffer.

2. Transmitting a live session to another point, in cases where it is acceptable to introduce some small delay. This is done over a network that does not have the jitter behavior necessary to ignore the timestamp, which means a receiver must follow the timestamp to give the right inter-command time spacing.

3. Streaming of MIDI content.

In 1, I think you need something new to indicate the intended behavior of minimal transmission delay and local live play. However, I do hope that a sender at least timestamps the RTP packets it sends correctly, so that RTP receivers can start using the timing model correctly in the future.

In 2, I think the usage of ptime and maxptime could be reasonable to instruct a sender to gather 10, 50 or 100 samples, or 1, 5 or 20 ms, of commands into each packet. However, if you would like finer resolution than 1 ms, a new parameter is necessary.
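
As a rough sketch of what a sender could do in case 2 (the function names, the 48 kHz rate and the grouping rule are assumptions for illustration only), a window given in whole milliseconds is converted to ticks, and commands are gathered into a packet as long as they fall within that window of the packet's first command:

```python
def ms_to_ticks(ms, clock_rate_hz):
    """Convert a whole number of milliseconds to RTP timestamp ticks."""
    return ms * clock_rate_hz // 1000


def packetize(commands, window_ticks):
    """Group (timestamp, command) pairs, already sorted by timestamp,
    into packets whose media time does not exceed window_ticks."""
    packets = []
    for ts, cmd in commands:
        # Compare against the timestamp of the current packet's first command.
        if packets and ts - packets[-1][0][0] <= window_ticks:
            packets[-1].append((ts, cmd))
        else:
            packets.append([(ts, cmd)])
    return packets


window = ms_to_ticks(1, 48000)  # 1 ms is 48 ticks at 48 kHz
events = [(0, "NoteOn C"), (10, "NoteOn E"), (48, "NoteOn G"), (49, "NoteOff C")]
for packet in packetize(events, window):
    print(packet)
```

Because the window here is given in whole milliseconds, the smallest non-zero value is 1 ms (48 ticks in this example), which is the limitation mentioned above.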

In 3, these parameters are not used at all. The number of commands in each packet is predetermined for best performance at the receiver.

In conclusion, I would suggest that you use a MIME parameter other than ptime and maxptime to indicate the desired behavior.

I also think that the format draft needs to be more explicit about considerations for the RTP timestamp: what rates are suitable, and what the impact on playback timing is. There seems to be a bit on this in the guidelines, but I haven't had time to read it fully.

Cheers

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund at ericsson.com

_______________________________________________
Audio/Video Transport Working Group
avt at ietf.org
https://www1.ietf.org/mailman/listinfo/avt