Internet Engineering Task Force                               Jim Gettys
Internet-Draft                                  Alcatel-Lucent Bell Labs
Intended status: Informational                           August 26, 2011
Expires: February 27, 2012

                         IW10 Considered Harmful
                 draft-gettys-iw10-considered-harmful-00

Abstract

   The proposed change of the TCP initial window to 10 segments in
   draft-ietf-tcpm-initcwnd must be considered deeply harmful: not
   because the proposed change is evil taken in isolation, but because,
   combined with other changes in web browsers and web sites over the
   last decade, it makes the problem of transient congestion at a
   user's broadband connection two and a half times worse.  This result
   has been hidden by the bufferbloat already widespread in broadband
   connections.  Packet loss in isolation is no longer a useful metric
   of a path's quality.  The very drive to improve the latency of web
   page rendering is already destroying other low-latency applications,
   such as VOIP and gaming, and will prevent reliable rich real-time
   web applications such as those contemplated by the IETF rtcweb
   working group.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 27, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Discussion
   3.  Solutions
   4.  IANA Considerations
   5.  Security Considerations
   6.  Informative References
       Author's Address

1.  Introduction

   In the second half of the 2000's, competition among web browsers
   reappeared and changed focus from features alone to speed (meaning
   latency, at least as seen from data centers, which can be highly
   misleading), with the discovery (most clearly understood by Google)
   that web sites are stickier the faster (lower latency) they are.
   Perhaps Sergey Brin and Larry Page knew Stuart Cheshire at Stanford?
   [Cheshire].

   The problem, in short, is the multiplicative effect of the following
   factors (a back-of-the-envelope sketch of the combined burst follows
   this list):

   o  Browsers ignoring the RFC 2068 [RFC2068] and RFC 2616 [RFC2616]
      requirement to use no more than two simultaneous TCP connections,
      with current browsers often using 6, or sometimes many more,
      simultaneous TCP connections.

   o  "Sharded" web sites that sometimes deliberately hide the path to
      servers actually located in the same data center, to encourage
      browsers to use even more simultaneous TCP connections.

   o  The proposed change to the TCP initial congestion window, which
      allows each fresh TCP connection to send as much as 2.5 times as
      much data as in the past.

   o  Current broadband connections offering customers a single queue,
      which is usually badly over-buffered, hiding packet loss.

   o  Web pages containing large numbers of embedded objects.

   o  Web servers having such large memory caches and so much
      processing power when generating objects on the fly that
      responses are often, or usually, transmitted effectively
      instantaneously at line rate.

   The result can easily be a horrifyingly large impulse of packets
   sent effectively simultaneously to the user as a continuous packet
   train, landing in, and clogging, the one queue in their broadband
   connection and/or home router for extended periods of time.  Any
   chance for your VOIP call to work correctly, or for you to avoid
   being fragged in your game, evaporates.
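   As a rough illustration of this multiplication, the following sketch
   (assuming 6 connections, 1500-byte packets as implied by the table
   in Section 2, and no protocol overhead; the numbers are illustrative,
   not measurements) computes the burst a single page fetch can inject
   into the subscriber's one queue before any ACK clocking begins:

      # Back-of-the-envelope sketch of the combined effect of more
      # connections and a larger initial congestion window (IW).
      PACKET_BYTES = 1500           # assumed full-size packet, no overhead

      def initial_burst(connections, iw_segments):
          """Bytes that can be emitted before the first ACKs return."""
          return connections * iw_segments * PACKET_BYTES

      old = initial_burst(2, 4)     # RFC 2616 limit with ICW=4:  12000 bytes
      new = initial_burst(6, 10)    # common browser with IW10:   90000 bytes
      print(new / old)              # 7.5x larger burst into the same queue

   The 2.5x factor from IW10 alone thus compounds with the tripling (or
   worse) of simultaneous connections.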
2.  Discussion

   The original reasons for the two-connection rule in Section 8.1.4 of
   RFC 2068 [RFC2068] and RFC 2616 [RFC2616] are long gone.  In the
   1990's, dial-up modem banks were often badly under-buffered, and
   multiple simultaneous connections could easily cause excessive
   packet loss due to self-congestion, either on the dialup port itself
   or in the dialup bank overall.

   Since the 1990's, memory has become very cheap, and we now have the
   opposite problem: buffering in broadband equipment is much, much too
   large, larger than any sane amount, as shown by the Netalyzr
   [Netalyzr] and FCC data [Sundaresan], a phenomenon I christened
   "bufferbloat", as we lacked a good term for this problem.

   What is more, broadband equipment usually provides only a single,
   unmanaged, bloated queue to the subscriber; a large impulse of
   packets for a single user will block other packets from other
   applications of that user, or from other users who share that
   connection.  This buffering is so large that slow start is badly
   damaged (TCP will attempt to run many times faster than it should
   until packet loss finally brings it back under control), and
   congestion avoidance is no longer stable, as I discovered in 2010
   [Gettys].

   I had expected and hoped that high performance would be achieved via
   HTTP pipelining [HTTPPerf] and that web traffic would have longer
   TCP sessions.  HTTP pipelining is painful due to HTTP's lack of any
   multiplexing layer and its lack of response numbering to allow out-
   of-order responses; "poor man's multiplexing" is possible, but
   complex.  The benefits of pipelining to the length of TCP sessions
   are somewhat less than one might naively presume, as significantly
   fewer packets are ultimately necessary.  But HTTP pipelining has
   never seen widespread browser deployment (though it is supported by
   a high fraction of web servers).  You will seldom see packet loss
   from using many TCP connections simultaneously in today's Internet,
   as buffers are now so large they can absorb huge transients,
   sometimes even a megabyte or more.
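   To see why such buffering hides packet loss and destabilizes TCP's
   control loops, it helps to convert buffer occupancy into queueing
   delay.  The sketch below is only a rough illustration: the one-
   megabyte figure is the transient size mentioned above, and the rates
   are merely representative of shared 802.11 and broadband links.

      # Delay added by a standing queue: delay = bytes * 8 / line rate.
      def queue_delay_ms(buffer_bytes, mbps):
          return buffer_bytes * 8 / (mbps * 1_000_000) * 1000

      for mbps in (1, 10, 50):
          # A one-megabyte queue at representative access-link rates.
          print(f"{mbps:>2} Mbps: {queue_delay_ms(1_000_000, mbps):6.0f} ms")

   At 1 Mbps, a megabyte of queued data represents eight full seconds
   of delay; even at 50 Mbps it is 160 ms, far more than VOIP or gaming
   can tolerate, and far longer than the round-trip times TCP's
   feedback loop was designed around.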
   Web browsers (seeing no packet loss) changed from obeying the RFC
   2616 requirement of two TCP connections to using 6 or even 15 TCP
   connections from a browser to a web server.  What is more, some web
   sites, called "sharded" web sites, deliberately split themselves
   across multiple names to trick web browsers into even more
   profligate use of TCP connections, and there is no easy way for a
   web client to determine that a web site has been so engineered.

   A browser may then render a page with many embedded objects (e.g.,
   images).  Current web browsers will therefore simultaneously open or
   reuse 6, or often many more, TCP connections at once, and the
   initial congestion window's worth of packets on each of those
   connections may be sent from the same data center simultaneously to
   a user's broadband connection.  These packets enter the single queue
   present in most broadband systems between the Internet and the home
   and, with no QOS or other fair queuing present, induce transient
   latency; your VOIP or gaming packet will be stuck behind this burst
   of web traffic until the burst drains.  Similarly, current home
   routers often lack any QOS or sophisticated queuing to ensure
   fairness between different users.  The proposal by Chu, et al. [Chu]
   to raise the initial congestion window from four to ten makes the
   blocking, and the resulting latency and jitter problem, up to 2.5
   times worse.

   Note that broadband equipment is not the only overbuffered equipment
   in most users' paths.  Home routers, 3G wireless, and users'
   operating systems are usually even worse than broadband equipment.
   In a user's home, whenever the wireless bandwidth happens to be
   below that of the broadband connection, the bottleneck link is the
   wireless hop, and so the problem may occur there rather than in the
   broadband connection.  It is the rate of the bottleneck link that
   matters, not the theoretical bandwidth of the links: 802.11g is at
   best 20 Mbps, and often much worse.  Other bottleneck points in
   users' paths may also lack AQM.

   I believe the performance analysis in draft-ietf-tcpm-initcwnd is
   flawed not by being incorrect in what it presents, but by
   overlooking the latency and jitter inflicted on other traffic
   sharing the broadband link, due to the large buffering in these
   links and their typically single queue.  The issue is the damage the
   IW change would do to other real-time applications sharing that link
   (including rtcweb applications), or what those sharing that link do
   to you.

   Simple arithmetic to compute the induced transient latency, even
   ignoring all overhead, comes up with scary results:

   +------+---------+--------+---------+---------+--------+---------+
   |  #   |  ICW=4  |  Time  |  Time   |  ICW=10 |  Time  |  Time   |
   | conn | (bytes) | @1Mbps | @50Mbps | (bytes) | @1Mbps | @50Mbps |
   +------+---------+--------+---------+---------+--------+---------+
   |   2  |   12000 |   96ms |  1.92ms |   30000 |  240ms |   4.8ms |
   |   6  |   36000 |  288ms |  5.76ms |   90000 |  720ms |  14.4ms |
   |  15  |   90000 |  720ms |  14.4ms |  225000 | 1800ms |    36ms |
   |  30  |  180000 | 1440ms |  28.8ms |  450000 | 3600ms |    72ms |
   +------+---------+--------+---------+---------+--------+---------+

                      Table 1: Unloaded Latency

   1 Mbps may be your fair share of a loaded 802.11 link.  50 Mbps is
   near the top end of today's broadband.  Available bandwidths in
   other parts of the world are often much, much lower than in parts of
   the world where broadband has been deployed.
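   The table entries follow directly from burst size divided by line
   rate; the short sketch below (assuming 1500-byte packets and
   ignoring all overhead, exactly as the table does) reproduces them
   and can be re-run for other connection counts or link rates:

      PACKET_BYTES = 1500   # per-packet size assumed by Table 1

      def burst_bytes(connections, icw):
          # Total data all connections may emit in their initial windows.
          return connections * icw * PACKET_BYTES

      def drain_ms(nbytes, mbps):
          # Time for that burst to drain through the bottleneck link.
          return nbytes * 8 / (mbps * 1_000_000) * 1000

      for conns in (2, 6, 15, 30):
          for icw in (4, 10):
              b = burst_bytes(conns, icw)
              print(conns, icw, b,
                    round(drain_ms(b, 1)), round(drain_ms(b, 50), 2))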
   Simple experiments over 50 Mbps home cable service against Google
   Images confirm latencies that reach, or sometimes double, those in
   the table.  Steady-state competing TCP traffic will multiply these
   times correspondingly; even at 50 Mbps, reliable, low-latency VOIP
   can therefore be problematic.  From this table, it becomes obvious
   that QOS in shared wireless networks has become essential, if only
   because of this change in web browser behavior.  Note that the two-
   connection rule still results in roughly 100 ms of latency on 1 Mbps
   connections, which is already very problematic for VOIP because of
   the jitter it induces.  Two TCP connections are capable of driving a
   megabit link at saturation over most paths today, even from a cold
   start with ICW=4.

   In the effort to maximise speed (as seen by a data center), web
   browsers and servers have turned web traffic into a delta function,
   congesting the user's queue for extended periods.  Since the
   broadband edge is badly over-buffered, as first shown by Netalyzr
   [Netalyzr], packets are usually not lost but instead fill the one
   queue separating people from the rest of the Internet until they
   drain.

   Many carriers' telephony services are not blocked by this web
   traffic, since the carriers have generally provisioned voice
   channels independently of data service; but most competing services,
   such as Vonage or Skype, will be blocked, as they must use the
   single, oversized queue.  While I do not believe this advantage was
   by design, it is an effect of bufferbloat and of current broadband
   supporting only a single queue, at most accelerating ACKs ahead of
   other bulk data packets.  In the presently deployed broadband
   infrastructure, any other queues are usually unavailable for use by
   time-sensitive traffic, and DiffServ [RFC3260] is not implemented in
   broadband head-end equipment.  Time-sensitive packets therefore
   share the same queue as non-time-sensitive bulk data (HTTP) traffic.

3.  Solutions

   If HTTP pipelining were deployed, it would result in lower actual
   times for most users: fewer bytes are needed due to sharing packets
   among objects and requests, packet overhead and ACK traffic are much
   lower, and TCP congestion behavior is significantly better.  While
   increasing the initial window may someday indeed make sense, it is
   truly frightening to raise the ICW during this arms race, given the
   already deployed HTTP/1.1 implementations.  SPDY [SPDY] should have
   similar (or better) results, but requires server-side support that
   will take time to develop and deploy, whereas most deployed web
   servers have supported pipelining for over a decade (sometimes with
   bugs, which is part of why it is painful to deploy web client HTTP
   pipelining).

   A full discussion of solutions that would improve latency for
   general web browsing without destroying real-time applications is
   beyond the scope of this document.  I note a few quickly (they are
   not mutually exclusive) that can and should be pursued.  They all
   have differing time scales and costs; all are desirable in my view,
   but a full discussion would be much more than I can cover here.

   o  Deployment of HTTP/1.1 pipelining (with a reduction of the number
      of simultaneous connections back to RFC 2616 levels).

   o  Deployment of SPDY.

   o  DiffServ deployment in the broadband edge and its use by
      applications (a minimal socket-marking sketch follows this list).

   o  DiffServ deployment in home routers (which, often unbeknownst to
      those outside the gaming industry, has already partially occurred
      due to its inclusion in the default Linux pfifo_fast queuing
      discipline).

   o  Some sort of "per user" or "per machine" queuing mechanism on
      broadband connections, so that complete starvation of service for
      extended periods can be avoided.
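   As one concrete illustration of the "use by applications" bullet
   above, an application can at least request better treatment by
   marking its packets with a DSCP.  The sketch below (the peer address
   and port are placeholders, and whether anything along today's
   broadband path honors the marking is, as noted above, doubtful)
   marks a UDP socket with Expedited Forwarding:

      import socket

      DSCP_EF = 46             # Expedited Forwarding code point
      TOS_EF = DSCP_EF << 2    # DSCP sits in the top six bits of the TOS byte

      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      # IP_TOS is exposed on Linux and most Unix-like systems.
      sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)
      sock.sendto(b"voice frame", ("192.0.2.1", 5004))   # placeholder peer

   Home routers running the default Linux pfifo_fast queuing discipline
   select their priority band from this TOS byte, which is why the
   partial home-router deployment mentioned above has already quietly
   occurred.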
   At a deeper and more fundamental level, individual applications
   (such as web browsers) may game the network, with consequent bad
   results for the user (or for other users sharing that edge
   connection), and with the advent of WebSockets, even individual web
   applications may similarly game the network's behavior.  With the
   huge dynamic range of today's edge environments, we have no good way
   to know how large an initial impulse of packets a server may safely
   send into the network in any given situation.  Today there is no
   disincentive for applications that abuse the network.  Congestion
   exposure mechanisms such as ConEx [ConEx] are badly needed, as is
   some way to enable users (and their applications, on their behalf)
   to be aware of, and react to, badly behaved applications.

4.  IANA Considerations

   This memo includes no request to IANA.

5.  Security Considerations

   The current practice of web browsers, in concert with "sharded" web
   sites, the changes to the initial congestion window, and the
   currently deployed broadband infrastructure, can be considered a
   denial of (low latency) service attack on consumers' broadband
   service.

6.  Informative References

   [Cheshire]   Cheshire, "It's the Latency, Stupid", 1996.

   [Chu]        Chu, Dukkipati, Cheng, and Mathis, "Increasing TCP's
                Initial Window", draft-ietf-tcpm-initcwnd (work in
                progress), 2011.

   [ConEx]      Briscoe, "Congestion Exposure (ConEx), Re-feedback and
                Re-ECN", 2005.

   [Gettys]     Gettys, "Whose house is of glasse, must not throw
                stones at another", January 2011.

   [HTTPPerf]   Nielsen, Gettys, Baird-Smith, Prud'hommeaux, Lie, and
                Lilley, "Network Performance Effects of HTTP/1.1, CSS1,
                and PNG", June 1997.

   [Netalyzr]   Kreibich, Weaver, Nechaev, and Paxson, "Netalyzr:
                Illuminating the Edge Network", November 2010.

   [RFC2068]    Fielding, R., Gettys, J., Mogul, J., Nielsen, H., and
                T. Berners-Lee, "Hypertext Transfer Protocol --
                HTTP/1.1", RFC 2068, January 1997.

   [RFC2616]    Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
                Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
                Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3260]    Grossman, D., "New Terminology and Clarifications for
                Diffserv", RFC 3260, April 2002.

   [SPDY]       Belshe, "SPDY: An Experimental Protocol for a Faster
                Web", 2011.

   [Sundaresan] Sundaresan, de Donato, Feamster, Teixeira, Crawford,
                and Pescape, "Broadband Internet Performance: A View
                From the Gateway", Proceedings of SIGCOMM 2011,
                August 2011.

Author's Address

   Jim Gettys
   Alcatel-Lucent Bell Labs
   21 Oak Knoll Road
   Carlisle, Massachusetts  01741
   USA

   Phone: +1 978 254-7060
   Email: jg@freedesktop.org