-----Original Message-----
From: simple-bounces at ietf.org
[mailto:simple-bounces at ietf.org] On Behalf Of ext Paul Kyzivat
Sent: 09.October.2004 01:51
To: Cullen Jennings
Cc: Adam Roach; simple at ietf.org; Niemi Aki (Nokia-NRC/Helsinki); Rohan
Mahy
Subject: Re: MSRP: Are REPORTs per-chunk or per-message? (was Re:
[Simple] Review of draft-ietf-simple-message-sessions-08 - Byte Ranges
in REPORTs)
Cullen Jennings wrote:
I think Paul clarified this for me. The semantics of the
protocol are that
reports are for the byterange specified in the report. No
more or less is
implied on how this range lines up with any chunk or message.
If someone has asked for reports that a message succeeded,
you can't do that
till you have all the message so you don't send a SUCCESS
report until you
have all the bytes in the message.
You *can* do it on a per-byte-range basis. There is a
question of *why*
you would do that - it would only be because it was of some value in
error recovery.
If you know some bytes have failed, and the sender has
asked for failure
reports, you send that as soon as you know which bytes have
failed. You
might later send another one that other bytes have failed.
The byte ranges
in a FAILURE report will correspond to some chunk somewhere
but that may not
correspond to any chunk that the first sender sent since an
intermediate
relay may have chunked differently.
Right.
The issue of tracking which bytes you have received and not
received already
existed as soon as we did chunks even if there was no
reporting. This is not
hard to implement with linked lists of chunks.
Yes. But if you want to send partial success reports, there is some
complexity in deciding when to do so, and what range to include in it.
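As an illustration of the receiver-side bookkeeping discussed above, here is
a minimal sketch. It is not from the draft. It assumes byte ranges are
1-based and inclusive, and it uses a sorted Python list of merged ranges
in place of the linked list of chunks mentioned above. The class and
method names (ReceivedRanges, add, is_complete) are hypothetical.

```python
class ReceivedRanges:
    """Tracks which bytes of one message have arrived, as merged ranges.

    Assumption: ranges are 1-based and inclusive, e.g. (1, 100).
    """

    def __init__(self):
        # Sorted, non-overlapping, non-adjacent (start, end) pairs.
        self.ranges = []

    def add(self, start, end):
        """Record bytes start..end, merging overlapping or adjacent ranges."""
        merged = []
        for s, e in self.ranges:
            if e < start - 1 or s > end + 1:
                # No overlap and not adjacent: keep the range unchanged.
                merged.append((s, e))
            else:
                # Overlapping or adjacent: absorb it into the new range.
                start, end = min(start, s), max(end, e)
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

    def is_complete(self, total):
        """True once every byte 1..total has been received."""
        return self.ranges == [(1, total)]
```

The tracking itself stays small this way. Deciding when to report on a
range is a separate question that this sketch does not address.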
We haven't mentioned the issue, but if we can lose chunks of
message, we
can probably lose REPORTs too. If there is known to be only
one success
report, and it fails to come, the sender can time out and declare
a failure.
If there are multiple success reports for disjoint byte
ranges, and you
don't get them all, then what? I guess you wait until you have sent
everything once, and then timed out waiting for all the
success reports,
and then you could try retransmitting the ranges that haven't been
confirmed. But then what? Do you go thru another timeout cycle?
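The "retransmit what hasn't been confirmed" step can be sketched as
follows. This is only an illustration under stated assumptions: byte
ranges are 1-based and inclusive, reports never exceed the message
length, and the function name unconfirmed_ranges is hypothetical. It
does not answer the question of how many timeout cycles to run.

```python
def unconfirmed_ranges(confirmed, total):
    """Return the inclusive (start, end) gaps in 1..total that no
    success report has covered. Reports may overlap or arrive in any order.
    """
    gaps = []
    next_needed = 1  # lowest byte not yet known to be confirmed
    for start, end in sorted(confirmed):
        if start > next_needed:
            gaps.append((next_needed, start - 1))
        next_needed = max(next_needed, end + 1)
    if next_needed <= total:
        gaps.append((next_needed, total))
    return gaps


# Example: reports arrived for 1-100 and 301-400 of a 500-byte message.
to_resend = unconfirmed_ranges([(1, 100), (301, 400)], 500)
```

An empty result means the reports received so far cover the whole
message.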
When you receive a negative report on a byterange, you
renegotiate the
connection and resend that chunk.
That works for negative ones, but not the positive kind. But negative
ones are even less reliable. If things aren't working well so that
chunks are being lost, what is the chance that the failure report will
be lost too?
Another question that comes to mind: Suppose I have sent bytes
1-1,000,000. I get a failure report for bytes 500,000-501,000. If I
renegotiate a new connection, right away, and close the old
one, what do
I retransmit? Just 500,000-501,000, or everything starting
from 500,000?
There could be failure reports for other byte ranges that I would
receive if I kept listening to the old connection - even for earlier
byte ranges than the one I received. If I kill the connection
and start
another then I won't get those. I can just retransmit starting from
500,000 and hope that nothing earlier failed too.
At some level this is complicated - as soon as we said that
we wanted a
system that multiplexed messages onto one TCP transport, we
chose this level
of complexity. None of this is that complicated - we have
been around this
over and over for the last 1.5 years, if we remove any part
of this system
we break requirements that people said were absolutely
critical to them.
I agree that some scheme like this is ultimately not that complicated.
But it isn't written down anywhere. The question is whether
we want to
take the time to write it all down.
The various combinations of success and failure reports with
and without
byte ranges at least make a lot of cases to explain and for
people to
understand and implement correctly.
Forbidding byte ranges in success reports would at least prune away a
few of the options.
Paul
On 10/8/04 7:39 AM, "Paul Kyzivat" <pkyzivat at cisco.com> wrote:
Cullen Jennings wrote:
I had a conversation with Cullen, where he expressed the
same opinion
for failure reports. Imagine, in your LoTR example, you
were using a
relay, and it has a transport failure between itself and
the next hop.
It sends you one or more failure reports for the chunks
for which it
could not confirm delivery. You establish a new session,
and continue;
resending the failed chunks.
I'd like to close on this quickly, so I offer the
following questions
to anyone who cares. I would appreciate opinions asap, so
I can get
this into the next revision.
1) Should failure reports be sent per chunk or per
complete message? (I
think it is per chunk.)
Sort of both - yes, they have to be per chunk for the case where a chunk
had a failure in
transmission but the message is fine. If the whole message
is going to be
bad, because of something like the body type is not
understood, then they
are sent on the whole message. You indicate they are on
the whole message by
using a * in the byterange
This is confusing. The failure reports are really for byte
ranges, not
chunks, since there is no consistent understanding by all parties of
what the chunking is. And the failure reports aren't necessarily all
sent by the same node.
It seems that if failure reports for byte ranges are
supported at all,
then the sender will need to in effect keep a separate
status for every
byte sent. The status for each byte is one of:
- unknown
- failure reported
- success reported
(Obviously there are optimizations on how to store this status for a
message of many bytes.) It appears that there could be at
least as many
reasons for overlapping byte ranges in reports as in sends, so that
needs to be handled as well.
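One possible form of that optimization, as a rough sketch only: keep a
list of segments, each carrying one status, instead of a status per
byte. The names (SendStatus, mark) are hypothetical, ranges are assumed
to be 1-based and inclusive, and the rule that a later report overwrites
an earlier one for the same bytes is an assumption made here, not
something decided in this thread.

```python
UNKNOWN = "unknown"
FAILED = "failure reported"
SUCCEEDED = "success reported"


class SendStatus:
    """Per-byte report status for one sent message, stored as segments."""

    def __init__(self, total):
        # Contiguous (start, end, status) segments covering 1..total.
        self.segments = [(1, total, UNKNOWN)]

    def mark(self, start, end, status):
        """Apply a report for bytes start..end.

        Assumption: the latest report wins where reports overlap.
        """
        pieces = []
        for s, e, old in self.segments:
            if e < start or s > end:
                pieces.append((s, e, old))  # untouched by this report
                continue
            if s < start:
                pieces.append((s, start - 1, old))  # part before the report
            pieces.append((max(s, start), min(e, end), status))
            if e > end:
                pieces.append((end + 1, e, old))  # part after the report
        # Coalesce neighbouring segments that ended up with the same status.
        merged = [pieces[0]]
        for s, e, st in pieces[1:]:
            prev_start, _, prev_status = merged[-1]
            if prev_status == st:
                merged[-1] = (prev_start, e, st)
            else:
                merged.append((s, e, st))
        self.segments = merged
```

Overlapping reports are handled by splitting segments, so storage grows
with the number of distinct reported ranges rather than with message
size.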
Then the sender will need to decide what to do if failure
is reported
for one or more bytes. It could decide to just give up on
the message
and stop sending. Or it could just retransmit a chunk
containing the bad
byte(s). (For that to be useful, it would have to assume
that some relay
was at fault and will recover.) Or it could negotiate a replacement
session and then send a chunk containing the bad byte(s).
This is starting to look very ugly.
2) Should success reports be sent per chunk or per
complete message?
Note that, if we send them per-chunk, then the sender
must accumulate 1
or more reports covering all the bytes in the message
before it can
consider the message successful. These reports may or may not
correspond one-to-one with the chunks it sent, as a relay
may re-chunk.
(Again, I vote per-chunk.)
I think we have to have a success for the whole message or
there is no way
to know if everything has arrived - the question to me is whether
we also need to
have chunk by chunk acknowledgments along the way. (This
is all assuming
that positive reports were requested). I'm not sure I see
the reason that
these would be needed but I feel like I am forgetting something.
Success reports only make sense end to end. Ultimately you need
confirmation of success of the *whole* message. So if byte
ranges are
used then the sum total of them must cover the entire message.
This is complicated by the fact that REPORTs aren't
confirmed, and so
might be lost. (E.g. by a relay.)
If byte ranges in success reports are to be used, then I
think the best
strategy might be for the receiving node to send cumulative
ones. For
instance, send one for the largest range of bytes received
that includes
previously unconfirmed bytes. (With in-order delivery this
would mean
that each report would cover everything received so far.)
These reports
might not be sent for every chunk received - just every so often.
This is of course just as complicated for the sender as the failure
reports above, and more complicated for the receiver.
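A rough sketch of how a receiver might pick the range for such a
cumulative report, under stated assumptions: ranges are 1-based and
inclusive, `received` is already merged into sorted non-overlapping
ranges, and `reported` lists ranges covered by earlier success reports.
The function names are hypothetical, and when to call this ("every so
often") is left open.

```python
def has_unreported_bytes(rng, reported):
    """True if rng contains at least one byte not covered by `reported`."""
    start, end = rng
    next_needed = start  # lowest byte in rng not yet known to be reported
    for s, e in sorted(reported):
        if s > next_needed:
            break  # a gap starts at next_needed
        next_needed = max(next_needed, e + 1)
    return next_needed <= end


def next_success_range(received, reported):
    """Pick the largest contiguous received range that still includes
    previously unconfirmed bytes, or None if there is nothing new.
    """
    candidates = [r for r in received if has_unreported_bytes(r, reported)]
    return max(candidates, key=lambda r: r[1] - r[0], default=None)
```

With in-order delivery `received` holds a single range starting at byte
1, so each report returned here covers everything received so far.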
3) Do we need to say anything about how long the
_receiver_ holds onto
the chunks of an incomplete message in hopes any remaining chunks
arrive? In particular, do we need to talk about this for
holding chunks
after a session fails, in case the sender establishes a
new session and
sends the remaining chunks?
No, I think this is like how long the end to end timer
should be - it is
totally situation dependent. Probably needs to say
something along the lines of
"for a while"
I am inclined to agree.
(I vote that we say no more than we already did in 08 concerning
re-sending chunks on new sessions.)
4) Do we need to enable the receiver to send a failure
report about
_missing_ chunks? This does not make sense with success reports
requested, but might make sense when only failure reports are
requested.
I don't think so - if the sender cared about the data,
someone will send a
negative at the appropriate time. The chunks can arrive
out of order and
this makes it hard for the receiver to know when to
request a missing block.
I agree this doesn't make sense.
Because of the complexity that is now apparent, I only see
a couple of
useful ways forward:
- Specify how to handle recovery, driven by byte ranges in
responses.
- Put off recovery to a future draft. Remove support for byte ranges
in reports. But figure out how we could gracefully
incorporate that
future draft and migrate to support of it.
- Entirely abandon any thought of recovery in MSRP. Remove
support for
byte ranges from reports. Leave it for a replacement protocol.
I think leaving byte ranges in reports without specifying how to use
them for recovery will result in major interoperability problems.
I am of mixed thought about which of those alternatives I
prefer. Given
the time pressures and priorities, I could easily be convinced to
abandon any thought of recovery, though I am not happy about it.
Paul
_______________________________________________
Simple mailing list
Simple at ietf.org
https://www1.ietf.org/mailman/listinfo/simple