TCPM WG Meeting - IETF 68 - Prague
Tuesday, March 20, 2007, 17:40 -- 18:40

Note takers: Pasi Sarolahti and Gorry Fairhurst
Acting chair: David Borman (for Mark and Ted)
Jabber watcher: Lars Eggert

(My thanks to Pasi and Gorry for taking good notes, these minutes
 are mainly just a merger of their notes.  -David)

Agenda bashing -- no comments


WG status
* anti-spoof: Passed WGLC, currently in IETF last call
* syn-flooding: in seemingly good shape, intent to have WGLC
* soft errors: seemingly good shape, intent to have WGLC
* rfc2581bis: authors believe issues have been addressed
  - for Draft Standard we need implementation reports
  - Mark to survey implementors -- implementors are asked to contact Mark
* ecn-syn: Sally working on simulations to get clarity on what is
  right response to ECN-marked SYN-ACK. Sally and Mark have some
  disagreements on what the response should be.
* tcp-auth: There is WG consensus on developing TCP-MD5
  replacement. TCPM to do the transport protocol work. Design team has
  been established to produce -00 version from two competing drafts,
  chaired by Steve Bellovin. After that the usual WG process will take
  over. 


User Timeout -- Lars Eggert
draft-ietf-tcpm-tcp-uto-05.txt
* Lars and chairs did not remember if this is going for experimental or
  proposed. Tim Shepard has preference for experimental.
  - Tim Shepard: in San Diego I asked the chairs this explicitly during
    agenda bashing; they said the plan is for Experimental.
  - Lars: ok
* Status
  - TCP connection dies if there is long period of disconnection
    without ACKs coming in, relates to mobility, etc.
  - Draft proposes one way to solve it, a TCP option to signal the
    appropriate timeout 
  - ver -05 has clarifications. Ted Faber commented earlier that the
    language needs to be consistent; the UTO value is heavily
    overloaded. There was also one other issue during author discussion.

* Changing to make it explicit of what we talk about -- three pieces
  of connection state: 
  - "enabled": whether UTO is enabled or not
  - "local_uto": local UTO in use, system-wide default, or optionally
    applications to tell TCP stack what to use
  - "changeable": controls whether local UTO may be changed based on
    incoming options. Default is true; it becomes false once the
    application has set local_uto.
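
  The three pieces of state above can be sketched as follows. This is a
  minimal Python illustration, not the draft's specification: the class
  name, the method names, and the 300-second default are hypothetical,
  and the way a received value is applied is simplified (the sketch
  adopts it as-is).

```python
from dataclasses import dataclass

# Hypothetical system-wide default, in seconds (not taken from the draft).
SYSTEM_DEFAULT_UTO = 300


@dataclass
class UtoState:
    """Per-connection UTO state, using the three names from the minutes."""
    enabled: bool = False                # whether UTO is enabled
    local_uto: int = SYSTEM_DEFAULT_UTO  # local UTO currently in use
    changeable: bool = True              # may incoming options change local_uto?

    def set_from_application(self, seconds: int) -> None:
        # Once the application sets local_uto, incoming options
        # no longer change it.
        self.local_uto = seconds
        self.changeable = False

    def on_incoming_option(self, advertised: int) -> None:
        # Simplification: adopt the peer's value directly when allowed.
        if self.enabled and self.changeable:
            self.local_uto = advertised
```

  For example, a connection with UTO enabled follows the peer's value
  until the application calls set_from_application(); after that,
  on_incoming_option() leaves local_uto untouched.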

* Last issue
  - Distinction between UTO I am currently using vs. last sent UTO
    information. Ability to shorten the local UTO depends on maintaining
    last_sent_uto information. If we don't need to shorten the local UTO,
    the implementation becomes simpler. Does the WG consider this
    important?
  - There will be a revision, then WGLC
  - Gorry Fairhurst: Is there a real use-case for reducing the UTO? In the last meeting,
    we discussed the issues of having too small a value. So, things are simpler
    if we can only make this bigger.
  - Lars: That is one option. A TCP may wish to shorten this to release resources.
    Use case for short UTO could be busy web server, for long
    UTO it is periods of disconnectivity. Fernando would like to keep
    short UTO, Lars would like to remove. No one has requested short
    UTO, no one has opposed.
  - David Borman: if we don't have ability to shorten it, does it affect
    the web server? If the server uses a shorter timeout and closes the
    connection, the client would know about it anyway.
  - Lars: it wouldn't change anything on the web server. Would give
    the other guy ability to shorten timeout, if a disconnection happens.
  - Lars: server can locally timeout while client might keep trying for a
    longer time vs. client would stop immediately.

* The next version needs to choose which way to go, then we will need committed
  reviewers.
    - Do the new variables clarify things?
    - Do we need to be able to reduce the UTO?
  Two volunteers:
    Anantha Ramaiah
    Arjuna Sathiaseelan


TCP-secure -- Anantha Ramaiah
draft-ietf-tcpm-tcpsecure-07.txt

Improving TCP robustness. Has been around as WG document for two
years, would like to go for WG last call.

Provides mitigations to known security issues. Has been thoroughly
discussed on list. Have been running in Internet for two years. No
known serious issues.

Changes:
* Mitigation recommendations changed from MUST to SHOULD, after a comment
  by Ted Faber that MUST would have made existing TCP implementations
  non-compliant with this.
* Security considerations text rewritten.
* Got many comments: some public, some private.

Data injection mitigation: SHOULD or MAY?
* explaining data injection mitigation specification
* implementations can choose to hard code the value of max.snd.wnd
  mitigation. Increases robustness to FIN attack.

Comments? Ready for WGLC?
* Joe Touch: All documents need to be careful with the use of SHOULD - it
  means most implementations will do this, but there could be cases where a
  specific implementor will not do this. We SHOULD say when (under what
  conditions) this is NOT OK to implement. Perhaps we could say a SHOULD (or
  even MUST) for all routers.
* Joe Touch: Raising a comment from the list -- this is a general
  statement for all documents: What SHOULD means, and what it allows?
  There seems to be consensus about how it is documented in RFC
  2119. I would be ok with SHOULD or even a MUST with regard to
  TCP implementations on routers. Requiring SHOULD without caveats to
  all hosts in the world, it is not compliant to update. On general
  host it is MAY. 
* Scott Bradner: This interpretation is correct. The idea of SHOULD is saying
  this is to be done. You may not imagine there is a good reason for not doing
  this, (but you may not know all the possibilities at the time the RFC is
  published) - the aim is to do something to make the system to work.
* Tim Shepard: clarifying question to Joe -- why did you say router?
  There could be end hosts and all sorts of places where you run
  long-lived connections that find this useful to improve
  robustness. Router is a wrong distinction. If you said BGP that
  might make more sense. 
* Joe: because in routers TCP attack would be reasonable and likely.
* Tim: There could be applications on end hosts, that may need this.
  Not just "router" it could be that we really mean a BGP session.
* Anantha: Distinction to router is gone. We have router that can use
  TCP for all kinds of purposes, HTTP, voice-over-IP, all kinds of stuff. 
* Joe: It may be useful for end hosts that have long-lived sessions where both
  ends are likely to be known.
* Tim: It is needed on any system that may in the future need robustness.
* Joe: ... yes, but in cases where the host fails to implement "proper"
  security.

* Lars Eggert: Note that previous discussions considered the document also has
  an IPR statement. The WG needs to include consideration of this in the
  decisions.
* Anantha: why mention it now?
* Lars: WG needs to include the existence of IPR in its decision. Draft
  has a long history, and shouldn't forget about the IPR.
* Mark Allman - via jabber: The IPR point is that the IPR could have an impact
  on MUST v. SHOULD v. whatever, i.e., I would personally be against saying a
  TCP MUST implement tcpsecure, because it has an IPR statement.


F-RTO update to proposed standard -- Markku Kojo
(RFC 4138)

Updating F-RTO to proposed standard. No revised protocol specification
available, just an evaluation report

Small modification in TCP sender algorithm, allows detecting a spurious
RTO
* Experimental RFC since Aug 2005.
* number of known implementations.
* Experimentations with all major implementations show encouraging results.
* Interest to promote has been expressed already earlier
* Last IETF we were asked to write document to evaluate &
  show it is not harmful. Material is available on a web page, but not
  yet in the repository.

First question: is F-RTO useful?
* showing time-sequence graphs of normal TCP behavior on delay spike
* full window of segments unnecessarily retransmitted, wastes requests,
  breaks the packet conservation principle.
* Problem in a mobility case of moving from WLAN to GPRS environment.
* Problem is not about causing congestion, but about performance of an
  individual TCP flow 
* Presenting time-seq diagram of case using F-RTO. If two segments
  acknowledge new data, can declare timeout spurious and continue
  sending new data. Avoids unnecessary retransmissions, and additionally can
  take RTT samples from the delayed segments.
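
  The detection rule described above (two ACKs in a row that acknowledge
  new data after the RTO retransmission) can be sketched as below. This
  is a simplified Python illustration, not the normative algorithm: the
  function name and return strings are hypothetical, the "recover" check
  is modelled on the basic algorithm in RFC 4138, and sequence-number
  wraparound is ignored.

```python
def classify_rto(snd_una: int, recover: int,
                 first_ack: int, second_ack: int) -> str:
    """Classify a retransmission timeout from the two ACKs that follow it.

    snd_una    -- lowest unacknowledged sequence number when the RTO fired
    recover    -- highest sequence number sent before the RTO fired
    first_ack  -- cumulative ACK number of the first ACK after the RTO
    second_ack -- cumulative ACK number of the second ACK after the RTO
    """
    # First ACK must advance the window, but must not already cover
    # everything that was outstanding at the time of the timeout.
    if first_ack <= snd_una or first_ack >= recover:
        return "conventional"
    # (At this point the sender would transmit up to two new segments.)
    # Second ACK must advance the window again; a duplicate ACK means
    # falling back to conventional RTO recovery.
    if second_ack <= first_ack:
        return "conventional"
    return "spurious"
```

  With snd_una=1000 and recover=5000, ACKs of 2000 and then 3000 are
  classified as a spurious timeout, while a duplicate ACK in either
  position falls back to conventional recovery.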

Can F-RTO be harmful? No
* If the RTO is not spurious, or F-RTO cannot detect that it is spurious,
  the sender reverts to the traditional RTO recovery. Exactly the same
  number of segments are transmitted as with normal RTO recovery.
* There are a few corner cases where F-RTO can declare RTO spurious
  even if there are packet losses. It would be harmful if congestion
  control response was aggressive. If congestion window was not halved
  in response to spurious RTO, it should be ok.
* Few known scenarios
  - 1: loss of unnecessary RTO retransmission. Quite rare situation to
    happen 
  - 2: severe reordering
    * Mark Allman (via jabber): You have only ten minutes, get to the point
  - 3: malicious receiver
  - Might be harmful if congestion control response is reverted, but
    proposing that congestion window is reduced in response to
    spurious RTO, in which case false positives are not harmful.

Next steps
* Revise RFC 4138, targeting at Proposed Standard. Specify basic
  algorithm only, and TCP only, leave out SCTP because there is no
  implementation experience. Leave out SACK-enhanced variant, because it
  has only limited benefit.
* What to do with response? The original RFC does not specify any
  response. Options are 1) do not specify response 2) specify
  conservative default response, or 3) specify a conservative response in
  the new draft
* Recommend implementing conservative response

* Anantha Ramaiah: how does it cooperate with other related
  enhancements: Eifel, DSACK? - e.g., if all three coexist.
* Markku: Does not require additional information. If timestamps have
  been enabled, Eifel can be used without problems. DSACK does not
  prevent unnecessary retransmissions.
* David: Continue discussion on the mailing list. Please indicate if you
  want or don't want this to be promoted from experimental to proposed.


Non-WG Drafts

Identifying Cheating Receivers -- Toby Moncaster
draft-moncaster-tcpm-rcv-cheat-00.txt

First presentation of a new draft
* TCP senders rely on accurate feedback from receivers
* Dishonest receiver can do optimistic ACKs, causing sender to
  transmit at higher rate, conceal lost data, harmful for congestion
  control 
* Some existing proposed solutions:
  - Randomly skipped segments
  - ECN nonce
  - Transport layer nonce in TCP headers

Listing 7 key requirements for solution
* Joe Touch: Proposes re-phrasing "Test should not harm innocent
  receiver": Anything that network could have done cannot be
  interpreted as malicious. Sender should not be allowed to do
  anything that the network couldn't have done anyway. That allows to
  develop the solution rather than just mess up the receiver.
* Joe: The test should not harm an innocent receiver, i.e. anything the
  network may have done accidentally should not be seen as malicious.
* Joe: It also should only do what the network would normally allow a sender
  to do.

Assessing proposed solutions
* Table evaluating the earlier solutions
* Joe: should have a separate column in the draft, or a separate row,
  for "works only with certain options". Otherwise I like the table,
  should be also in the draft.
* Toby: it already is

Our proposed solution
* Based on Rob Sherwood's randomly skipped segments solution
  1. delay a segment by small amount
  2. delay segment until duplicate ack is received.
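
  The idea behind withholding a segment can be sketched as below. This is
  a hypothetical Python illustration of the underlying check only, not the
  draft's two-stage procedure: while a segment is held back, an honest
  receiver cannot cumulatively acknowledge past the start of the hole.
  The function and parameter names are assumptions.

```python
from typing import Iterable


def acks_unsent_data(withheld_start: int,
                     acks_while_withheld: Iterable[int]) -> bool:
    """Return True if any cumulative ACK covers data not yet sent.

    withheld_start      -- first sequence number of the withheld segment
    acks_while_withheld -- cumulative ACK numbers seen while the segment
                           was being held back by the sender
    """
    # An honest receiver can at most repeat ACKs for withheld_start
    # (duplicate ACKs); anything higher acknowledges data that was
    # never transmitted.
    return any(ack > withheld_start for ack in acks_while_withheld)
```

  For instance, repeated ACKs of 3000 while the segment starting at 3000
  is withheld are consistent with an honest receiver, whereas an ACK of
  4000 in that interval is not.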

Graph of stage 1 test

Assessing stage 1 test
* Meets all requirements set, but does not strictly prove dishonesty.
* Joe: Should also say that it works only with certain options.
* Tim Shepard: if receiver is using SACK, the test gives receiver
  chance to prove it works nicely with SACK. The table did not mention
  delayed ACKs.
* Toby: main gain of optimistic acking is to reduce RTT
* Joe: any indications that things mentioned in tcpsecure have
  relationship with this. For example if anyone wanting to cheat by
  bursting. 
* Toby: need to look at that in detail

Stage 2 test
* Meets all requirements set, except doesn't harm innocent receiver

Conclusion
* Cheating receiver gets more resources than it should get. Could
  possibly cause congestion collapse
  (Gorry needs to leave the room, other minute taker is gone)
* Anantha: what about senders using byte counting?
* Joe: should verify that there are no interactions with Nagle in
  cases when there are segments dropped, should check interactions with
  partial segments 


TCP Response to Lower-Layer Connectivity-Change Indications -- Simon Schütz
draft-schuetz-tcpm-tcp-rlci-01.txt

Problem: TCP is unaware of what happens on lower layers. RLCI uses
generic indications from lower layers, avoids long idle time due to
repetitive RTOs.

The connection stays idle after connectivity returns following a
disconnection, because the RTO has backed off
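
The idle period caused by a backed-off RTO can be illustrated with a
small Python sketch. The initial RTO of 1 second, the 64-second cap,
and the function names are illustrative assumptions, not values from
the draft.

```python
from typing import List, Optional


def retransmit_times(initial_rto: float, max_rto: float,
                     attempts: int) -> List[float]:
    """Times (seconds after the first loss) of successive RTO retransmissions."""
    times = []
    t, rto = 0.0, initial_rto
    for _ in range(attempts):
        t += rto                      # wait one full RTO, then retransmit
        times.append(t)
        rto = min(rto * 2, max_rto)   # exponential backoff, capped
    return times


def idle_after_reconnect(reconnect_at: float,
                         times: List[float]) -> Optional[float]:
    """Seconds the connection sits idle after connectivity returns."""
    for t in times:
        if t >= reconnect_at:
            return t - reconnect_at
    return None  # no retransmission scheduled after reconnect_at
```

With these assumed values, retransmissions fall at 1, 3, 7, 15, 31 and
63 seconds. If connectivity returns at 16 seconds, the next attempt is
not until 31 seconds, so the connection idles for 15 seconds unless
something prompts an earlier retransmission.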

Why RLCI?
* Provides generic approach to overcome problems
  - hand-overs
  - connectivity disruptions
* Bob Briscoe: The draft was not clear on whether this is about getting
  information from the local link or from the remote link.
* Simon: Defining the source of indications is out of scope; the draft
  defines the response to indications.
* Joe Touch: has been discussed before, problem with earlier
  approaches is that you are trying to get indications from the link
  layer, transitive to issue on putting this on the API layer. Tickling
  the end point deliberately, unreasonable thing to do. 

What's new

Next steps
* Would like to start discussion on the mailing list
* Candidate for experimental.
* Soliciting discussion on the mailing list


Concluding the TCPM meeting
* Explicit reviewers sought for tcpsecure. Send mail to Ted and Mark if
  you are volunteering.
* Looking for people to read tcp-persist and comment on the mailing list