DCLCRG - IETF90 - 25 July 2014 - Toronto, ON, Canada
Chairs: Fred Baker (Cisco), Lingli Deng (China Mobile)
Note taker: Dirk.Kutscher@neclab.eu
Proposed Agenda at http://trac.tools.ietf.org/group/irtf/trac/wiki/DclcIetfNinty

Fred: problem statement
- Erik Nordmark: one aspect of management is TCP (flow) management; another aspect is a DC with multiple tenants -- are we thinking about both?
- Fred: there is a variety of aspects, so yes

Fred: data sharing -- what data can people share?
- Richard Scheffenegger: some companies cannot share anything; some operators have very high security standards, so it might be difficult
- Lars Eggert: floated this idea in London; RGs can give themselves different rules -- that could cover data sharing, and perhaps an NDA could be signed. Also, universities run pretty large data centers
- Dirk: carrier DC measurements could be feasible -- product-specific performance measurement is different. The hope is that someone finds this work interesting enough to share data
- Lars Eggert: without data being brought in, work in the group might be seriously impeded

Bob Briscoe: Reducing Internet Latency -- a survey of techniques and their metrics
- http://riteproject.eu/publications
- http://riteproject.files.wordpress.com/2014/07/latency_preprint-31.pdf
- http://www.bobbriscoe.net/presents/1407ietf/1407latency-survey.pdf
- Lars: this is about Internet latency reduction, not DC latency, right?
- Bob: there are different flows in the picture

Vic Liu: Gap Analysis on Virtualized Network Test
- http://tools.ietf.org/html/draft-liu-dclc-gap-virtual-test
- Lars (on Throughput slide): figure 1 is surprising. Is the hypervisor not getting enough cycles to send packets?
- Vic: the VM does not get enough cycles
- Lars: timers inside the VM cannot be synchronized at high accuracy?
- Vic: they cannot be synchronized, because they are based on two different Linux systems
- Karen Sollins (on Throughput): how many physical CPUs?
- Vic: 2 cores, 24 vCPUs
- Karen: you don't pin vCPUs to physical CPUs?
- Vic: we do pin them
- Dirk: are you sharing measurement data with vendors?
- Vic: currently with one vendor

Dapeng Liu: Latency Test Report
- http://tools.ietf.org/html/draft-shi-dclc-latency-test
- Lars: buffer threshold -- is that the queue depth?
- Dapeng: yes
- Lars: does a threshold of 1% mean limiting to 1% of maximum capacity?
- Dapeng: yes
- Lars: it would be nice to see absolute values, not just percentages; by using percentages, you are losing a bit of information
- Lars (on "The Minimum Delay"): is the minimum latency the lowest latency seen during a test run?
- Dapeng: yes
- Richard Scheffenegger: did you have a ramp-up phase? measuring after a couple of seconds?
- Lingli: each test round has different phases -- a starting phase, a stable phase, and a teardown phase; the statistics are drawn from the middle phase, when the load stays stable for 30 to 60 seconds (see the sketch below)
- Al Morton: how repeatable are the results? Can you repeat them one week later or so? There might be processes sharing the buffer, etc.
- Dapeng: good point. Different things affect the test; with different vendors' products, results could differ.
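A minimal sketch of the per-round statistics step Lingli describes, assuming each round yields timestamped latency samples and fixed phase durations; function and field names are illustrative, not taken from the draft:

    def stable_phase_stats(samples, ramp_up_s, stable_s):
        """samples: list of (timestamp_s, latency_us) tuples for one test round.
        Statistics are drawn only from the stable middle phase; the starting
        and teardown phases are discarded. Returns absolute values (Lars's
        point: percentages alone lose information)."""
        window = [lat for (t, lat) in samples
                  if ramp_up_s <= t < ramp_up_s + stable_s]
        if not window:
            raise ValueError("no samples fall inside the stable phase")
        return {
            "min_us": min(window),  # "minimum delay" = lowest latency in the run
            "avg_us": sum(window) / len(window),
            "max_us": max(window),
        }

    # Example (hypothetical data): a round with a 20 s starting phase and a
    # 60 s stable phase:
    # stats = stable_phase_stats(samples, ramp_up_s=20, stable_s=60)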
Fred Baker: TCP Congestion Control as Latency Control (Tsinghua work)
- Fred: maybe AQM is not the right place to optimize
- Alexander Zimmermann: surprised to see that NewReno does better than SACK TCP?
- Fred: no good answer
- Fred: conclusion: using a delay-based procedure helps a bit but does not solve incast collapse
- Fred: what are the research options, next steps?
- Lars Eggert: TCP pretends that all flows are unrelated; ideas from integrated congestion control (the congestion manager) might help here
- Lars (on map-reduce slide): with SSDs you have less jitter, so the effects are more tightly synchronized than before
- Fred: conclusions: TCP and related protocols should use delay-based or jitter-based procedures such as FAST or CDG
- Fred: what can be done to move away from Map-Reduce-based apps?
- Al Morton: excited about the connection between research and benchmarking

Haibin Song: TCP parameter control
- http://tools.ietf.org/html/draft-song-dclc-tcpdc
- Lars: default TCP parameters were chosen to be safe in all possible scenarios. Assuming that there is an all-knowing entity that can derive optimal parameters for all hosts is a very strong assumption -- traffic patterns might change drastically. How stale is the info that the centralized controller sees?
- Fred: what parameters would you set? RWIN? effective windows?
- Haibin: initial congestion window size
- Richard Scheffenegger: on what timescales? daytime/nighttime seems rather coarse. Lemmings is more about microseconds. If you reduce the timescales, what about overhead?
- Haibin: no answer to that
- Q: if you change TCP parameters dynamically, what about fairness?
- Haibin: no answer to that
- Fred: look at MIT Fastpass -- slotted Aloha in the DC

Bob Briscoe: Accurate TCP ECN
- http://tools.ietf.org/html/draft-ietf-tcpm-accecn-reqs
- "An Enabler for New E2E Transport Behaviours: More Accurate ECN Feedback Reflector (AccECN)"
- Lars (on DCTCP): the safe-operation problem is fixed in the Windows and FreeBSD stacks (a sketch of the DCTCP computation appears at the end of these notes)
- Lars (on extended feedback): do we need to modify DCTCP for this? Would unmodified stacks still work?
- Bob: because of capability negotiation, unmodified stacks would still work
- Lars: do we need to take as many bits as possible now, or is it better to reserve some?
- Bob: trying to make it modular
- Bob: does anyone have concerns about turning off delayed ACKs?
- Gorry Fairhurst: DCs do care about delayed ACKs
- Lars: we cannot make that decision now
- Gorry (on Open Design Issues): what are the implications?
- Bob: with a delayed response, it may take longer to learn that there is a specific codepoint
- Richard Scheffenegger: that's an issue of receiver behaviour -- not of the wire protocol
- Lars: a super-efficient design may be a problem -- difficult to understand, review, debug...

Discussion:
- Lingli (on data sharing): better to start thinking about what data you actually need
- Dirk: yes
- Lars: there may also be existing data that could be leveraged if we specify better what we need
- Fred -- show of hands:
  - most of us find this interesting
  - many want to have another meeting (perhaps at IETF91)
  - people should be on the mailing list
- Bob: the presentations are currently rather random -- should we give the work a purpose?

NOT DISCUSSED:
Bob Briscoe: Network Performance Isolation in Data Centres using Congestion Policing
- http://tools.ietf.org/html/draft-briscoe-conex-data-centre
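For context on the DCTCP exchange under Accurate TCP ECN: DCTCP reacts to the fraction of CE-marked packets per window of data, which is the kind of feedback AccECN is meant to carry back to the sender more accurately than the single echoed bit in classic ECN. A minimal sketch of the sender-side computation, following the published DCTCP update rule; the gain constant and names are illustrative:

    G = 1.0 / 16  # EWMA gain suggested in the DCTCP paper

    def on_window_acked(state, acked_pkts, ce_marked_pkts):
        """Run once per window of acknowledged data.
        state: dict with 'alpha' (smoothed mark fraction) and 'cwnd' (packets)."""
        frac = ce_marked_pkts / acked_pkts if acked_pkts else 0.0
        # Exponentially weighted moving average of the marked fraction;
        # this is why the sender needs a count of marks, not a single bit.
        state["alpha"] = (1 - G) * state["alpha"] + G * frac
        if ce_marked_pkts:
            # Back off in proportion to congestion severity rather than
            # halving the window as classic TCP would.
            state["cwnd"] = max(1.0, state["cwnd"] * (1 - state["alpha"] / 2))
        return state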