
[dccp] dccp spec expert review (Minshall, main spec)



Folks-

This is the first set of review comments; they apply to the main spec.
Of the other two reviewers, one indicated that we'd receive comments
by today and the other indicated they'd be a bit late.  

The review is quite detailed and raises many points that require clarification,
challenging some of the DCCP design decisions.  I'd like to try to
resolve as much as we can before the IETF meeting next month.

In Vienna, we'll be trying something somewhat uncommon in the IETF, a
design review.  I'll be sending out more on the meeting format and
objectives soon. So, stay tuned ...

--aaron

----- Forwarded message from Greg Minshall <minshall@acm.org> -----

From: Greg Minshall <minshall@acm.org>
To: Aaron Falk <falk@ISI.EDU>
Subject: Re: dccp spec expert review 
Date: Fri, 13 Jun 2003 14:14:05 -0700

Aaron,

below you should find my review of the base document.

Greg
----
DRAFT-IETF-DCCP-SPEC-02.TXT

This is a review of draft-ietf-dccp-spec-02.txt, which I was asked to
review by Aaron Falk <falk@isi.edu> on behalf of the DCCP working
group of the IETF.

Along with this document, I was provided pointers to the charter of
the DCCP working group and to draft-ietf-dccp-problem-00.txt, which I
have read.  I was also provided a pointer to a paper "Designing DCCP:
Congestion Control Without Reliability", by Kohler, Handley, and
Floyd.  I chose not to read this paper as part of the review process,
believing that the design specification of a protocol for the IETF
should be completely specified within IETF documents.

In addition to the reviews currently solicited, there are many other
members of the community who might provide useful and insightful input
into the design of DCCP.  Included in this group might be Dave Borman,
Bob Braden, Steve Casner, Jeff Mogul, and Stefan Savage.

This review is written in the style I would use in reviewing a paper
for publication.


SUMMARY STATEMENT

This draft is on the right track, but has open issues which are
described below in the review.


GENERAL COMMENTS

It would be important to have multiple, independent, interoperable
implementations of DCCP before fully standardizing the protocol.

I worry about 24 bit sequence numbers as networks get faster.  I would
feel more comfortable with 32 bit sequence numbers.  I understand,
though, the engineering constraint of trying to keep the header as
small as possible.

There need to be extension mechanisms for such pieces of the protocol
as options and reset reasons, with a way of knowing what can be
ignored if not understood by the receiver.  I didn't see this in the
document.


SECTION-SPECIFIC COMMENTS

(3.3)

"DCCP does not support TCP-style simultaneous open..."  Why is this?

"DCCP does not support half-open connections either.  That is, DCCP
shuts down both half-connections as a unit. However, DCCP SHOULD allow
applications to declare they are no longer interested in receiving
data..."  I think the authors misunderstand TCP's concept of a
half-open connection.  In TCP, a connection is said to be "half-open"
if one of the endpoints has knowledge of the connection (in a state, I
suppose, past SYNSENT and before TIMEWAIT) and the other endpoint has
no knowledge of the connection.  This is an important concept to get
right, and I'm not sure if DCCP supports it correctly.  The authors
here appear to be discussing the effects of shutdown(2) with ``how''
of 0 or 1.

(4)

"(8) The server sends a DCCP-Reset packet whose reason is set to
"Closed", and clears its connection state."  I would suggest using
reset for an "unclean" close, and using another mechanism for a clean
close.

(4.1.1)

"(3) ... and including a Confirm option finalizing the negotiation of
the client-to-server CCID..."  Why is this?

"(5) ... ... The client's Ack Vector echoes the accumulated ECN Nonce
for the server's packets."  Does this mean that the server doesn't use
an Ack Vector, or that the server's Ack Vector does not echo the
accumulated ECN Nonce for the client's packets?

"(7) ... The server responds to lost or marked DCCP-Ack packets by
modifying the ACK Ratio sent to the client..."  I assume the reason
for this is to have "flow control for ACKs".  However, I think the
solution of reducing the number of ACKs per data packet may complicate
simple DCCP implementations, which would have to deal with one ACK per
every 20 or 100 data packets, which makes the "clocking" harder.  At
the least, I would think that an option should be for the server to
reduce the rate at which it is sending data packets (which would
implicitly slow down the rate at which ACK packets are being inserted
into the network).

In general, I'm uneasy about the "ACK Ratio" feature of DCCP.  There
is today strong disagreement among experts about whether the right
ratio is one ACK per data packet, 0.5 ACK per data packet, or 0.1 ACK
per data packet (this last case usually qualified with "both endpoints
on the same LAN").  I think giving control to application designers is
likely to be a mistake.  The protocol designers and/or relevant
standardization body should come up with the desired ratio, and system
implementors should use that number, without any negotiation between
the two sides.

(5.1)

I will note, though I'm sure this was discussed and dismissed, that
the source port, destination port, and random ISS could be replaced
with a random connection ID, with the sequence number starting at
zero.

"Despite this, we leave the design of mechanisms to protect against
wrapped sequence numbers for future work.  In particular, if it is
decided that very large packet sizes are better than very large
congestion windows for very-high-bandwidth flows, then 24 bits may be
enough."  This seems wrong to me.  I think of jumbograms as being a
mistake that people will always fall back on (just like people fall
back on only ACKing one out of every ten packets "on a LAN").  I don't
think we should build a mechanism that even slightly promotes the use
of jumbograms.  24 bits is probably too few bits for the sequence
number field.
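A rough calculation shows why the width of the field matters.  The
sketch below assumes one sequence number per packet and a sender
running continuously at an illustrative 10 Gb/s, and uses TCP's
conservative two-minute Maximum Segment Lifetime as the yardstick;
none of these figures come from the draft, and the function name is
hypothetical.

```python
# Rough sketch: how long does a packet sequence-number space last?
# Assumptions (not from the draft): one sequence number per packet,
# and a sender running continuously at full link rate.

MSL_SECONDS = 120  # TCP's conservative two-minute Maximum Segment Lifetime


def wrap_time_seconds(seq_bits: int, link_bps: float, packet_bytes: int) -> float:
    """Seconds until a seq_bits-wide per-packet counter wraps."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return (2 ** seq_bits) / packets_per_second


if __name__ == "__main__":
    for bits in (24, 32):
        for size in (1500, 9000):  # Ethernet-sized packets vs. jumbograms
            t = wrap_time_seconds(bits, 10e9, size)
            print(f"{bits}-bit space, {size}-byte packets at 10 Gb/s: "
                  f"wraps in {t:.1f} s; longer than one MSL: {t > MSL_SECONDS}")
```

Under these assumptions, 24 bits with 1500-byte packets wrap in about
20 seconds, while 9000-byte jumbograms stretch that to about two
minutes, which is roughly the draft's argument for why large packets
could make 24 bits enough.  32 bits lasts well over an hour at either
size.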

"Values between 1 and 14, inclusive, indicate that the checksum
additionally covers that number of initial 32-bit words of the
packet's payload, padded to the right with zeros as necessary."  I
would reword this: "the checksum covers the DCCP header, DCCP options,
pseudo-header, plus an initial number of 32-bit words..."
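To make the suggested wording concrete, here is a minimal sketch of
partial checksum coverage using the usual Internet (RFC 1071)
ones'-complement sum.  It assumes the checksum field inside the header
bytes has already been zeroed and that the pseudo-header and header
lengths are multiples of four bytes, and it deliberately models only
coverage values 1 through 14.  The function names are hypothetical.

```python
# Sketch of partial checksum coverage using the RFC 1071 ones'-complement
# sum.  Names are hypothetical; coverage values 0 and 15 are not modeled.

def ones_complement_checksum(data: bytes) -> int:
    """16-bit ones'-complement of the ones'-complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def partial_coverage_checksum(pseudo_header: bytes, header_and_options: bytes,
                              payload: bytes, coverage: int) -> int:
    """Cover pseudo-header, DCCP header and options, plus the first
    `coverage` 32-bit words of payload, zero-padded on the right.
    Assumes the checksum field in `header_and_options` is zeroed."""
    if not 1 <= coverage <= 14:
        raise ValueError("this sketch only models coverage values 1..14")
    covered = payload[:coverage * 4].ljust(coverage * 4, b"\x00")
    return ones_complement_checksum(pseudo_header + header_and_options + covered)
```

Payload bytes beyond the covered words never enter the sum, which is
where the CPU saving of partial coverage comes from.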

(5.2)

"One good guideline is to set it to about 3 or 4 times the maximum
number of packets the sender expects to send in any round trip time."
I don't know how the sender, at the beginning of a connection, is
supposed to estimate the RTT.  I would suggest simply deciding on and
stating either one half or one third of the sequence number space.

"The receiver may center the loss window on GSN, or arrange it
asymmetrically."  A problem with this flexibility is that network
management tools (looking at a set of captured packets) will have a
hard time understanding which packets will, and which packets won't,
be accepted by the receiver.
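The concern about asymmetric loss windows can be illustrated with a
small sketch of a receiver-side check in 24-bit circular sequence
space.  The window width and how far it extends behind GSN are treated
as receiver policy; all names here are hypothetical, and the fixed
one-third width simply reflects the alternative suggested above.

```python
# Sketch of a loss-window check in 24-bit circular sequence space.
# Window parameters are receiver policy; names are hypothetical.

SEQ_MOD = 1 << 24
FIXED_WIDTH = SEQ_MOD // 3  # e.g. a fixed fraction of the sequence space


def seq_distance(a: int, b: int) -> int:
    """Signed circular distance from b to a, in [-2**23, 2**23)."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def in_loss_window(seqno: int, gsn: int, width: int, behind: int) -> bool:
    """True if seqno falls in a window of `width` sequence numbers that
    starts `behind` numbers before GSN (GSN itself is inside)."""
    d = seq_distance(seqno, gsn)
    return -behind <= d < width - behind
```

With GSN 1000 and a 100-number window, a centered receiver
(`behind=50`) accepts sequence number 1040, while one that skews its
window backwards (`behind=75`) rejects it.  A tool looking only at
captured packets cannot tell which happened without knowing the
receiver's choice.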

"(1) No valid packet has been received recently (for instance, within
at least one round-trip time)."  Going quiet for one round trip time
is not unusual.  In fact, the network delaying a flight of packets by
one round trip time is not that unusual.  If the quoted sentence means
that after one round trip time of quiet, any packet will be accepted,
no matter what the sequence number, this would be a problem.  The
concept of Maximum Segment Lifetime (MSL) is defined in TCP as the
longest a segment may exist in the network before being presented to a
receiving host.  MSL is typically (though conservatively) set at one
or two minutes.

"(2) The packet includes a correct Identification or Challenge option
(see Section 6.4.3)."  This should not be required (for example, to
deal with half-open connections).

(5.3)

The state diagram should be included in the ASCII version of this
document.  If a diagram itself is difficult to include, then a set of
state transition tables should be used in the ASCII and postscript
versions of this document.

The lack of a state diagram makes this analysis incomplete.

The action table which is included has some entries marked "Old/New",
but other entries are not so marked.  I think it might be clearer if
all the entries were so marked.

In the "Request" column of the "RESPOND" row, the entry is marked
"-/OK".  Should this be "RST/OK"?

I didn't understand Note 1 to the action table.

I think a number of the resets in the state diagram should be
evaluated once DCCP supports cleaning up half-open connections.  (In
TCP, the side which is "open" sends an ACK to the "closed" side, which
elicits a reset.  The intent, I believe, is to prevent spurious
resets.)

"The Open state does not signify that a DCCP connection is ready for
data transfer."  I find this unfortunate.

(5.5)

"All valid packets received by a DCCP stack MUST be acknowledged as
'received', even if their payloads were dropped (due to receive buffer
overflow or payload corruption, for example)."  This seems like a
mistake.  First off, it assumes the term "received" (in "all valid
packets received") is well specified, which it isn't.  Second,
congestion "towards" the application (because the application's
"socket buffer" is full, say) is congestion.  An old streams bug with
TCP was to do the TCP processing, then drop the packet because the
application buffer was full.

"Each new DCCP-Response MUST increment the Sequence Number, and
possibly #NDP, by one."  Which sequence number?  That of the
responder?  What if Seq0 is sent and delayed, then Seq1 is sent and
arrives?  When Seq0 arrives at the Requester, the action table of
section 5.3 implies that a Rst would occur.

(5.6)

By not having an ACK on every packet, there is less possibility of
error/validity/timeliness checking than is possible in TCP.

"A DCCP-Data or DCCP-DataAck packet may contain no data bytes if the
application sends a zero-length datagram.  Such zero-length datagrams
MUST be reported to the receiving application."  I can see some
utility in having this facility (framing, sending an EOF, etc.).  But,
I can imagine that some systems would have trouble implementing this
requirement (some systems might have to re-work their buffer
management code, the API between applications and the kernel, etc.).

"The receiver of a valid DCCP-Close packet SHOULD respond with a
DCCP-Reset packet, with Reason set to "Closed"..."  Why not respond
with an ACK?

(5.9)

"DCCP B SHOULD respond to the DCCP-Move with a DCCP-Reset (with Reason
set to "Invalid Move") if any of the following conditions holds..."
Would "silently ignore" be more secure?  By sending the DCCP-Reset,
system B is telling system C ("far away" from A and B) something about
the state of the connection between A and B.

"After receiving such an invalid DCCP-Move, DCCP B MAY ignore
subsequent DCCP-Move packets, valid or not, for a short period of
time, such as one round trip time."  Which round trip time?  Also, do
DCCP *receivers* track round trip time?  (I don't believe TCP
receivers track round trip time.)

"If DCCP B accepts the move, it MUST send this acknowledgment to the
network address/Source Port combination."  I believe the intent of
this is to say "to the network address and source port contained in
the received datagram", but I'm not sure.  It should be re-worded to
be more clear.

"... if it rejects the move, which it MAY do for any reason, it MUST
send the acknowledgment to the Old Address/Old Port combination."
Does this mean that if attacker C sends an invalid DCCP-Move to A
purporting to be from B, A will then reset the connection with B?

"If the acknowledgment is lost, DCCP A might resend the DCCP-Move
packet (using a new sequence number).  DCCP B will detect this case
because the network address/Source port combination corresponds to a
valid connection, for which the ...  It SHOULD respond by sending
another acknowledgment, as allowed by the congestion control
mechanism in use."  This seems very baroque to me.

(6.2)

The problem I have is that Ignored is not guaranteed delivery.  So, if
A would have sent a Reset on seeing an Ignored, a lost Ignored may let A
continue, with subsequent bad performance/semantics.  My sense is that
even though data transfer does not need to be reliable in DCCP,
session management does need to have reliability.

(6.3.2)

"DCCP A SHOULD retransmit the Change option until it receives some
relevant response."  How often should it retransmit?

Is it the case that Change (and others) do *not* consume sequence
number space?

(6.3.3)

"DCCP B SHOULD respond to a valid Prefer option..."  The last sentence
of 6.3.2 implies "MUST" rather than "SHOULD".

(6.3.5)

The second example ("Here, A and B jointly settle on CC mechanism 5")
seems contrived, in that I would think that B would have started with
"Change(CC, 3, 4, 5)" if it understood 5.

(6.3.7)

"... rather, it says which feature option must be sent on the next
packet generated."  What if the next packet is already as large as
possible?

(REQUESTER STATE DIAGRAM (DCCP B))

In the Unknown state, "RECV - Pr | APP", why "-"?

I don't see any retransmissions.

I don't know how sequence numbers are interpreted on received packets,
i.e., what to do with "in the window" and "out of the window" packets.

(FEATURE LOCATION STATE DIAGRAM (DCCP A))

In the Unknown state, "RECV - | APP".  What does the "RECV -" mean?

In the Known state, "RECV Chg | APP" should be "RECV Chg (X) | APP".

(6.4.2)

I'm not sure how "Connection Nonce defaults to a random 8-byte string"
can work.  If B's Connection Nonce feature is set to a different
random 8-byte string than A's connection nonce, how does that work?

(6.7)

I believe one still needs to specify "network byte order" when
discussing quantities larger than a single byte.

"Elapsed time is meant to help the Timestamp sender separate the
network round-trip time from the Timestamp receiver's processing time.
This may be particularly important for CCIDs where acknowledgments
are sent infrequently, so that there might be a considerable delay
between receiving a Timestamp option and sending the corresponding
Timestamp Echo."  I think asking a system to put in the actual elapsed
time will not work.  I think the best we might be able to do is put in
an *estimated* elapsed time.  Different systems might put in different
values (half the period at which ACKs are sent, or the maximum, or the
amount of time left on the ACK timer for this connection).
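The arithmetic the Elapsed Time option is meant to support is simple,
which makes it easy to see how an estimate, rather than a measurement,
feeds straight into the sender's RTT sample.  This sketch assumes
millisecond units; the function name and the numbers are hypothetical.

```python
# Sketch: turning a Timestamp Echo into an RTT sample at the sender.
# Units (milliseconds) and example values are assumptions.

def rtt_sample_ms(now_ms: int, echoed_timestamp_ms: int, elapsed_ms: int) -> int:
    """Network RTT estimate: total echo delay minus the peer's reported
    hold time (the Elapsed Time value)."""
    return now_ms - echoed_timestamp_ms - elapsed_ms
```

Suppose the true network round trip is 40 ms and the receiver actually
held the Timestamp for 70 ms, so the echo returns 110 ms after it was
sent.  An exact Elapsed Time of 70 gives `rtt_sample_ms(1110, 1000, 70)`
= 40, but a receiver that reports half of a 100 ms ACK period instead
gives `rtt_sample_ms(1110, 1000, 50)` = 60, a 50% overestimate.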

(6.9)

A diagram of the loss window would be useful.

(7)

"A new connection starts with CCID 2 for both DCCPs.  If this is
unacceptable for either DCCP, that DCCP will start in the Unknown
state."  This seems problematic, in that A thinks B is in 2, but B is
in Unknown and isn't about to tell A that it is in Unknown.

"A DCCP SHOULD NOT send data when its Congestion Control feature is in
the Unknown state."  What about cases where the sending of *control*
information is limited by the CCID?

(7.1)

"For example, say that CCID 98, a new sender-based congestion control
mechanism using Ack Vector for acknowledgments, has entered the IETF
standards process, and the IETF has approved the use of CCID 1 as a
backup for CCID 98.  Now, DCCP A, which understands and would like to
use CCID 98, is trying to communicate with DCCP B, which doesn't yet
know about CCID 98.  DCCP A can simply negotiate use of CCID 1 and,
separately, negotiate Use Ack Vector.  DCCP B will provide the
feedback DCCP A requires for CCID 98, namely Ack Vector, without
needing to understand the congestion control mechanism in use."  This
sounds very good.  But, there are two issues.  First, there is the
question of what control information sender and receiver agree to
send each other on the wire.  Second, there is the question of how
the sender and receiver expect each other to act based on the control
information exchanged.  It isn't clear to me that there won't be
trouble if the two systems have different expectations about what
will happen.  I understand the argument, "in that case, the IETF
would not have approved the use of ...".  However, my concern is that
this is a mechanism that makes the protocol more complicated (as all
mechanisms do) but that may not have any *practical* utility in the
future.

(7.4)

I believe it was Gordon Bell's assessment of the PDP-10 (-11?)
architecture that made the point that the main mistake an architect
can make is not having enough address bits.  In the same spirit, I
worry about partitioning the CCID-specific option space into ranges
that apply to HC-Sender, HC-Receiver, etc., which leaves fewer values
in each range.  I would suggest instead having a second byte that
would specify this information.

(8.1)

"However, note that acks-of-acks need not be reliable themselves:
when an ack-of-acks is lost, the HC-Receiver will simply maintain
[(and retransmit)] old acknowledgment state for a little longer."

"For instance, DCCP A might send a DCCP-DataAck packet every now and
then, instead of DCCP-Data."  Why not send DCCP-DataAck every time
there is Ack data to transmit?

"DCCP A switches its ack pattern from bidirectional to unidirectional
when it notices that DCCP B has gone quiescent.  It switches from
unidirectional to bidirectional when it must acknowledge even a single
DCCP-Data or DCCP-DataAck packet from DCCP B..."  Isn't it the case
that if DCCP A always sent DCCP-DataAck (when it had data and ACKs to
send), it wouldn't have to "notice" these transitions?
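A minimal sketch of the rule suggested here: the packet type depends
only on what is queued at the moment, so no explicit detection of the
partner going quiescent is needed.  It ignores Ack Ratio and
congestion-control limits, and the function name is hypothetical.

```python
from typing import Optional


def choose_packet_type(has_data: bool, has_ack_info: bool) -> Optional[str]:
    """Suggested rule: piggyback acknowledgment information whenever
    there is any, instead of tracking bidirectional/unidirectional mode."""
    if has_data and has_ack_info:
        return "DCCP-DataAck"
    if has_data:
        return "DCCP-Data"
    if has_ack_info:
        return "DCCP-Ack"
    return None  # nothing to send right now
```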

(8.4)

"... although DCCP B MAY send Ack Vector options even when Use Ack
Vector is false."  Why is this?  It seems counterintuitive and could
cause problems.

(8.5)

"Packets reported as State 0 or State 1 ...  And data on the packet
need not have been delivered to the receiving application; in fact,
the data may have been dropped."  I'll repeat that this seems like a
mistake to me.  I would say: "if the data cannot be delivered to the
application, the packet should be silently discarded" (or "ECN
discarded" would be fine, too).
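A minimal sketch of this alternative, assuming a hypothetical receiver
with a fixed-size application queue: a packet is recorded as received,
and so appears in later acknowledgments, only if its payload was
actually queued for the application.  The class and method names are
illustrative, not from the draft.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver: a valid packet counts as 'received' only
    if its payload was actually queued for the application."""

    def __init__(self, buffer_limit: int):
        self.buffer = deque()
        self.buffer_limit = buffer_limit
        self.received = set()  # sequence numbers to report in acks

    def on_valid_packet(self, seqno: int, payload: bytes) -> bool:
        if len(self.buffer) >= self.buffer_limit:
            # The application can't keep up: discard silently, so the
            # sender sees a loss and its congestion control backs off.
            return False
        self.buffer.append(payload)
        self.received.add(seqno)
        return True
```

Congestion "towards" the application then looks to the sender exactly
like congestion in the network, with no extra option required.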

(8.5.1)

If the Ack of packet 24 (showing it to be State 0) has itself been
acked, and a duplicate of packet 24 is then received with an ECN mark,
does a new Ack of packet 24 need to be generated?

In the "old state"/"received state" table, I would suggest that the top
row be set to "0 0 0" (rather than "0 1 0").  This would mean that the
state is not changed from State 0 or from State 1, but only from State
3.  Yes, this means a bit of information may be lost, but simplicity
probably trumps absolute efficiency in this case.
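The suggested change can be sketched as a lookup table.  Only the top
row ("0 1 0") of the draft's table is quoted in this review; the other
two rows and the column order (received State 0, 1, 3) are assumptions
for illustration, as are the names.

```python
# Ack Vector states: 0 = received, 1 = received ECN-marked,
# 3 = not yet received.  Only the draft's top row is quoted in this
# review; the other rows and the column order are assumed.

DRAFT_TABLE = {
    0: {0: 0, 1: 1, 3: 0},  # top row "0 1 0"
    1: {0: 1, 1: 1, 3: 1},  # assumed: State 1 is never downgraded
    3: {0: 0, 1: 1, 3: 3},  # assumed: first arrival sets the state
}

# Suggested change: State 0 is never overwritten either.
PROPOSED_TABLE = {**DRAFT_TABLE, 0: {0: 0, 1: 0, 3: 0}}


def merge_state(old: int, received: int, table=DRAFT_TABLE) -> int:
    """New Ack Vector state, given the old state and the state of the
    newly received copy of the packet."""
    return table[old][received]
```

Under the suggested table, an ECN-marked duplicate that arrives after
an unmarked copy leaves the state at 0; that lost mark is the bit of
information traded for simplicity.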

(8.5.2)

"The union of groups 2 and 3 is called the Unacknowledged Window."
This is the HC-Receiver's point of view.  From the HC-Sender's point
of view, the union of groups 2, 3, and 4 is what is unacknowledged.

(8.6)

I don't think the "Slow Receiver Option" is a good idea.  I don't see
any need to invent a new mechanism to indicate "congestion" in the
path to the application.  Packets that arrive should be dropped
silently (or, with ECN).

"Slow Receiver implements a portion of TCP's receive window
functionality.  We believe receiver operating systems and applications
will find it much easier to send Slow Receiver when appropriate than
they currently find it to correctly set a TCP receive window."  To me,
the receive window portion of TCP is a way of trying to get N bytes in
flight.  The "guarantee" of being able to, in fact, accept all those N
bytes is at the level of statistical multiplexing: most likely the
receiving TCP will accept all N bytes, but it is possible it will
not.  As long as in a "best effort" sense this breaking of the
"guarantee" doesn't happen "too often", it is okay.

(I seem to remember at the SIGCOMM in 1988, Dave Cheriton asking Van
Jacobson if his work ultimately eliminated the utility of the receive
window.  I don't know of any subsequent work looking at this issue,
though it would be interesting.)

(8.7)

Again, I do not see a strong case for this option.

"Drop State 4 ("application no longer listening") means the
application running at the endpoint that sent the option is no longer
listening for data. ..."  Why not simply reset the connection if data
arrives when the receiving application is not expecting any data?

(8.8)

I don't see the checksum option as being particularly useful.  I know
this is an open issue currently, but I wouldn't want to rely on the
ones complement checksum in the case where the link level CRC
indicates a corrupted packet.  I see checksum coverage of less than 15
as being a CPU optimization (don't waste cycles computing the checksum
on the payload).

(8.9)

This might be a separate document, or perhaps an appendix.

(8.9.4)

"... a single acknowledgment number tells HC-Receiver how much ack
information has arrived."  I thought HC-Sender needed to maintain a
vector of received ACKs and send this vector to HC-Receiver every now
and then.

(9.1)

"The ECN Capable feature lets a DCCP inform its partner that it cannot
read ECN bits from received IP headers, so the partner must not set
ECN-Capable Transport on its packets."  Why support this?  Why not
just require ECN?

(9.2)

I think ECN Nonces are a clever idea, but I'm not sure they are worth
the complication in the protocol.  Both the sender and the receiver
are at least somewhat motivated to ignore congestion control, and this
doesn't help against any malfeasance on the part of the sender.  I suspect you
cannot use technical means to police "good behavior" in this case.
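To make the trade-off concrete, here is a sketch of the nonce-sum idea
(as in RFC 3540), abstracted away from DCCP's packet formats.  The
sender puts a random one-bit nonce on each packet, a congestion mark
erases it, and the receiver echoes the XOR of the nonces of the
packets it reports as unmarked.  A receiver that hides a mark must
guess the erased bit.  Note that the check constrains only the
receiver.  The names and example values are hypothetical.

```python
from functools import reduce
from operator import xor


def nonce_echo(nonces: dict, seqnos) -> int:
    """One-bit XOR of the nonces of the given packets."""
    return reduce(xor, (nonces[s] for s in seqnos), 0)


def sender_accepts(sent_nonces: dict, reported_unmarked: set, echoed: int) -> bool:
    """Sender's check: does the echo match the nonces it actually sent?"""
    return nonce_echo(sent_nonces, reported_unmarked) == echoed


# Example: packet 2 was CE-marked, so its nonce never reached the receiver.
sent = {1: 1, 2: 0, 3: 1}
seen = {1: 1, 3: 1}  # nonces visible to the receiver

honest = sender_accepts(sent, {1, 3}, nonce_echo(seen, {1, 3}))
# A cheating receiver claims packet 2 was unmarked and guesses its nonce.
cheater_guessed_1 = sender_accepts(sent, {1, 2, 3}, nonce_echo({**seen, 2: 1}, {1, 2, 3}))
```

The honest report passes and the wrong guess is caught; a guess of 0
would have passed, so each concealed mark is caught only half the
time, and nothing here detects a sender ignoring congestion.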

(10.3)

"... unless they have high-quality information about actual network
conditions between the two new endpoints."  I wouldn't allow for any
exceptions.  The endpoints should start as they would at the beginning
of a connection.

"Normally, the only way to get this information would be by
instrumenting a DCCP connection between the new addresses."  What does
"instrumenting" mean in this context?

(10.4)

"The mobile DCCP MUST NOT let loss events on packets from the old
address/port pair affect the new congestion control state."  Why not?
Clearly, it wouldn't necessarily be correct (though it might be).
But, it seems like the conservative (towards overall network health)
thing to do.

(11)

"A DCCP implementation SHOULD be capable of performing Path MTU (PMTU)
Discovery..."  Why not say "MUST"?

"However, it is undesirable for MTU discovery to occur on the initial
connection setup handshake, as the connection setup process may not be
representative of packet sizes used during the connection, and
performing MTU discovery on the initial handshake might unnecessarily
delay connection establishment.  Thus, DF SHOULD NOT be set on
DCCP-Request and DCCP-Response packets."  Am I right in remembering
that IPv6 doesn't support fragmentation in routers along the path?  I
think a new protocol should probably set DF in every single packet.
For connection setup, assuming a 576 byte PMTU seems conservative, and
is what I would recommend.

"(We are aware that this may cause problems for DCCP endpoints behind
certain firewalls.)"  I'm not aware of these problems, so it might be
good to discuss them briefly.

(12)

"(In TCP, sequence number modification is required to support legacy
protocols like FTP that carry variable-length addresses in the data
stream.  If such an application were deployed over DCCP, middleboxes
would simply grow or shrink the relevant packets as necessary, without
changing their sequences numbers.)"  "Legacy" has a somewhat negative
connotation to it and, in this instance, could be safely left out.
And, a middlebox might need to inject an extra packet into the
data stream in the case where the packet that needed to be extended was
already of a maximum size.

(14)

"However, this approach to multiplexing sub-flows above DCCP will not
work in circumstances such as RTP where the RTP subflows require
separate port numbers."  I would think that a multiplexing layer above
DCCP would have to have port numbers in its header.  I don't see why
that rules out such multiplexing.



----- End forwarded message -----
_______________________________________________
dccp IETF mailing list: dccp@ietf.org
list info:  https://www1.ietf.org/mailman/listinfo/dccp
wg charter: http://www.ietf.org/html.charters/dccp-charter.html