idnits 2.17.1 

draft-ietf-pilc-link-design-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 2537 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 1996: '...   SHOULD provide a mechanism to authe...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC 2460' is mentioned on line 251, but not defined

  ** Obsolete undefined reference: RFC 2460 (Obsoleted by RFC 8200)

  == Missing Reference: 'RFC768' is mentioned on line 843, but not defined

  == Missing Reference: 'RFC1662' is mentioned on line 874, but not defined

  == Missing Reference: 'RFC2582' is mentioned on line 971, but not defined

  ** Obsolete undefined reference: RFC 2582 (Obsoleted by RFC 3782)

  == Missing Reference: 'RFC2990' is mentioned on line 1233, but not defined

  == Missing Reference: 'RFC1633' is mentioned on line 1239, but not defined

  == Missing Reference: 'RFC2205' is mentioned on line 1245, but not defined

  == Missing Reference: 'RFC2210' is mentioned on line 1245, but not defined

  == Missing Reference: 'RFC 2212' is mentioned on line 1250, but not defined

  == Missing Reference: 'RFC2211' is mentioned on line 1256, but not defined

  == Missing Reference: 'RFC2208' is mentioned on line 1266, but not defined

  == Missing Reference: 'RFC2475' is mentioned on line 1268, but not defined

  == Missing Reference: 'RFC2474' is mentioned on line 1364, but not defined

  == Missing Reference: 'RFC2598' is mentioned on line 1289, but not defined

  ** Obsolete undefined reference: RFC 2598 (Obsoleted by RFC 3246)

  == Missing Reference: 'RFC2597' is mentioned on line 1295, but not defined

  == Missing Reference: 'RFC 2990' is mentioned on line 1305, but not defined

  == Missing Reference: 'RFC2212' is mentioned on line 1362, but not defined

  == Missing Reference: 'RFC2865' is mentioned on line 1761, but not defined

  == Missing Reference: 'RFC2131' is mentioned on line 1768, but not defined

  == Missing Reference: 'RFC1332' is mentioned on line 1770, but not defined

  == Missing Reference: 'RFC1939' is mentioned on line 1780, but not defined

  == Missing Reference: 'RFC2060' is mentioned on line 1780, but not defined

  ** Obsolete undefined reference: RFC 2060 (Obsoleted by RFC 3501)

  == Missing Reference: 'RFC2002' is mentioned on line 1784, but not defined

  ** Obsolete undefined reference: RFC 2002 (Obsoleted by RFC 3220)

  == Missing Reference: 'RFC2322' is mentioned on line 1850, but not defined

  == Missing Reference: 'RFC2332' is mentioned on line 1875, but not defined

  == Missing Reference: 'RFC1991' is mentioned on line 1935, but not defined

  ** Obsolete undefined reference: RFC 1991 (Obsoleted by RFC 4880)

  == Missing Reference: 'RFCs-2630-2634' is mentioned on line 1935, but not
     defined

  == Missing Reference: 'Wilbur99' is mentioned on line 2003, but not defined

  == Missing Reference: 'Schneier4' is mentioned on line 2045, but not defined

  == Unused Reference: 'MAGMA-SNOOP' is defined on line 2346, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2460' is defined on line 2398, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2630' is defined on line 2407, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2631' is defined on line 2409, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2632' is defined on line 2412, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2710' is defined on line 2418, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3376' is defined on line 2431, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3590' is defined on line 2438, but no explicit
     reference was found in the text

  == Unused Reference: 'Stevens94' is defined on line 2445, but no explicit
     reference was found in the text

  == Unused Reference: 'Wilbur89' is defined on line 2458, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ATMFTM'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BGW01'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'BPK98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO3309'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MSMO97'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PFTK98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RED93'

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Downref: Normative reference to an Informational RFC: RFC 1435

  ** Obsolete normative reference: RFC 1981 (Obsoleted by RFC 8201)

  ** Obsolete normative reference: RFC 2246 (Obsoleted by RFC 4346)

  ** Obsolete normative reference: RFC 2309 (Obsoleted by RFC 7567)

  ** Obsolete normative reference: RFC 2393 (Obsoleted by RFC 3173)

  ** Downref: Normative reference to an Informational RFC: RFC 2394

  ** Downref: Normative reference to an Informational RFC: RFC 2395

  ** Obsolete normative reference: RFC 2581 (Obsoleted by RFC 5681)

  ** Obsolete normative reference: RFC 2406 (Obsoleted by RFC 4303, RFC 4305)

  ** Downref: Normative reference to an Informational RFC: RFC 2689

  ** Downref: Normative reference to an Informational RFC: RFC 2923

  ** Obsolete normative reference: RFC 2988 (Obsoleted by RFC 6298)

  ** Downref: Normative reference to an Informational RFC: RFC 3096

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Schneier95'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Schneier00'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SRC81'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SSL2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SSL3'

  == Outdated reference: A later version (-06) exists of
     draft-ietf-magma-igmp-proxy-04

  == Outdated reference: A later version (-12) exists of
     draft-ietf-magma-snoop-09

  -- Unexpected draft version: The latest known version of 
     draft-ietf-mboned-iesg-gap-analysis is -00, but you're referring to -01.

  -- Obsolete informational reference (is this intentional?): RFC 1750
     (Obsoleted by RFC 4086)

  -- Obsolete informational reference (is this intentional?): RFC 2401
     (Obsoleted by RFC 4301)

  -- Obsolete informational reference (is this intentional?): RFC 2440
     (Obsoleted by RFC 4880)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 2461
     (Obsoleted by RFC 4861)

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 2630
     (Obsoleted by RFC 3369, RFC 3370)

  -- Obsolete informational reference (is this intentional?): RFC 2632
     (Obsoleted by RFC 3850)

  -- Obsolete informational reference (is this intentional?): RFC 2633
     (Obsoleted by RFC 3851)


     Summary: 23 errors (**), 0 flaws (~~), 44 warnings (==), 24 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                        Phil Karn, editor
3	INTERNET DRAFT                                                  Qualcomm
4	                                                         Carsten Bormann
5	                                             Universitaet Bremen FB3 TZI
6	                                                Godred (Gorry) Fairhurst
7	                                                  University of Aberdeen
8	                                                            Dan Grossman
9	                                                          Motorola, Inc.
10	                                                           Reiner Ludwig
11	                                                       Ericsson Research
12	                                                         Jamshid Mahdavi
13	                                                            Volera, Inc.
14	                                                      Gabriel Montenegro
15	                                   Sun Microsystems Laboratories, Europe
16	                                                               Joe Touch
17	                                                                 USC/ISI
18	                                                              Lloyd Wood
19	                                                           Cisco Systems
20	File: draft-ietf-pilc-link-design-15.txt                  December, 2003
21	                                                     Expires: June, 2004

23	                Advice for Internet Subnetwork Designers

25	Status of this Memo

27	   This document is an Internet-Draft and is in full conformance with
28	   all provisions of Section 10 of RFC2026.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF), its areas, and its working groups.  Note that
32	   other groups may also distribute working documents as Internet-
33	   Drafts.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   The list of current Internet-Drafts can be accessed at
41	   http://www.ietf.org/ietf/1id-abstracts.txt

43	   The list of Internet-Draft Shadow Directories can be accessed at
44	   http://www.ietf.org/shadow.html.

46	Abstract

48	   This document provides advice to the designers of digital
49	   communication equipment, link-layer protocols and packet-switched
50	   local networks (collectively referred to as subnetworks) who wish to
51	   support the Internet protocols but who may be unfamiliar with the
52	   Internet architecture and the implications of their design choices on
53	   the performance and efficiency of the Internet.

55	Contributors

57	   This document represents a consensus of the members of the IETF
58	   Performance Implications of Link Characteristics (PILC) working
59	   group.

61	   This document would not have been possible without the contributions
62	   of a great number of people in the Performance Implications of Link
63	   Characteristics Working Group.  In particular, the following people
64	   provided major contributions of text, editing and advice to this
65	   document: Mark Allman provided the final editing to complete this
66	   document.  Carsten Bormann provided text on robust header
67	   compression.  Gorry Fairhurst provided text on broadcast and
68	   multicast issues and many valuable comments on the entire document.
69	   Aaron Falk provided text on bandwidth on demand.  Dan Grossman
70	   provided text on security considerations as well as on many facets of
71	   the document.  Reiner Ludwig provided thorough document review and
72	   text on TCP vs. Link-Layer Retransmission.  Jamshid Mahdavi provided
73	   text on TCP performance calculations.  Saverio Mascolo provided
74	   feedback on the document.  Gabriel Montenegro provided feedback on
75	   the document.  Marie-Jose Montpetit provided text on bandwidth on
76	   demand.  Joe Touch provided text on multicast and broadcast.
77	   Lloyd Wood provided many valuable comments on drafts of the document.

79	Table of Contents

81	   1      Introduction and Overview
82	   2      Maximum Transmission Units (MTUs) and IP Fragmentation
83	   2.1    Choosing the MTU in Slow Networks
84	   3      Framing on Connection-Oriented Subnetworks
85	   4      Connection-Oriented Subnetworks
86	   5      Broadcasting and Discovery
87	   6      Multicasting
88	   7      Bandwidth on Demand (BoD) Subnets
89	   8      Reliability and Error Control
90	   8.1    TCP vs Link-Layer Retransmission
91	   8.2    Recovery from Subnetwork Outages
92	   8.3    CRCs, Checksums and Error Detection
93	   8.4    How TCP Works
94	   8.5    TCP Performance Characteristics
95	   8.5.1  The Formulae
96	   8.5.2  Assumptions
97	   8.5.3  Analysis of Link-Layer Effects on TCP Performance
98	   9      Quality-of-Service (QoS) considerations
99	   10     Fairness vs Performance
100	   11     Delay Characteristics
101	   12     Bandwidth Asymmetries
102	   13     Buffering, flow & congestion control
103	   14     Compression
104	   15     Packet Reordering
105	   16     Mobility
106	   17     Routing
107	   18     Security Considerations
108	          Normative References
109	          Informative References

111	1 Introduction and Overview

113	   IP, the Internet Protocol [RFC791], is the core protocol of the
114	   Internet. IP defines a simple "connectionless" packet-switched
115	   network.  The success of the Internet is largely attributed to IP's
116	   simplicity, the "end-to-end principle" [SRC81] on which the Internet
117	   is based, and the resulting ease of carrying IP on a wide variety of
118	   subnetworks not necessarily designed with IP in mind.  A subnetwork
119	   refers to any network operating immediately below the IP layer to
120	   connect two or more systems using IP (i.e., end hosts or routers).
121	   In its simplest form, this may be a direct connection between the IP
122	   systems (e.g., using a length of cable or over a wireless medium).

124	   This document defines a subnetwork as a layer 2 network, which is a
125	   network that does not rely upon the services of IP routers to forward
126	   packets between parts of the subnetwork, although IP routers may
127	   bridge frames at layer 2 between parts of a subnetwork. Sometimes, it
128	   is convenient to aggregate a group of such subnetworks into a single
129	   logical subnetwork. IP routing protocols (e.g., OSPF, IS-IS, and PIM)
130	   can be configured to support this aggregation, but typically present
131	   a layer-3 subnetwork rather than a layer-2 subnetwork. This may also
132	   result in a specific packet passing several times over the same
133	   layer-2 subnetwork via an intermediate layer-3 gateway (router).
134	   Because that aggregation requires layer-3 components, issues thereof
135	   are beyond the scope of this document.

137	   However, while many subnetworks carry IP, they do not necessarily do
138	   so with maximum efficiency, minimum complexity or minimum cost; nor
139	   do they implement certain features to efficiently support newer
140	   Internet features of increasing importance, such as multicasting or
141	   quality of service.

143	   With the explosive growth of the Internet, IP packets comprise an
144	   increasingly large fraction of the traffic carried by the world's
145	   telecommunications networks. It therefore makes sense to optimize
146	   both existing and new subnetwork technologies for IP as much as
147	   possible.

149	   Optimizing a subnetwork for IP involves three complementary
150	   considerations:

152	   1. Providing functionality sufficient to carry IP.

154	   2. Eliminating unnecessary functions that increase cost or
155	   complexity.

157	   3. Choosing subnetwork parameters that maximize the performance of
158	   the Internet protocols.

160	   Because IP is so simple, consideration 2 is more of an issue than
161	   consideration 1. That is to say, subnetwork designers make many more
162	   errors of commission than errors of omission.  However, certain
163	   enhancements to Internet features, such as multicasting and quality-
164	   of-service, benefit significantly from support given by the
165	   underlying subnetworks beyond that necessary to carry "traditional"
166	   unicast, best-effort IP.

168	   A major consideration in the efficient design of any layered
169	   communication network is the appropriate layer(s) in which to
170	   implement a given function. This issue was first addressed in the
171	   seminal paper "End-to-End Arguments in System Design" [SRC81]. That
172	   paper argued that many functions can be implemented properly *only*
173	   on an end-to-end basis, i.e., at the highest protocol layers, outside
174	   the subnetwork. These functions include ensuring the reliable
175	   delivery of data and the use of cryptography to provide
176	   confidentiality and message integrity.

178	   Such functions cannot be provided solely by the concatenation of hop-
179	   by-hop services, so duplicating these functions at the lower protocol
180	   layers (i.e., within the subnetwork) can be needlessly redundant or
181	   even harmful to cost and performance.

183	   However, partial duplication of functionality in a lower layer can
184	   *sometimes* be justified by performance, security or availability
185	   considerations. Examples include link-layer retransmission to improve
186	   the performance of an unusually lossy channel, e.g., mobile radio;
187	   link level encryption intended to thwart traffic analysis; and
188	   redundant transmission links to improve availability, increase
189	   throughput, or to guarantee performance for certain classes of
190	   traffic.  Duplication of protocol function should be done only with
191	   an understanding of system-level implications, including possible
192	   interactions with higher-layer mechanisms.

194	   The original architecture of the Internet was influenced by the end-
195	   to-end principle, and in our view it has been part of the reason for
196	   the Internet's success.

198	   The remainder of this document discusses the various subnetwork
199	   design issues that the authors consider relevant to efficient IP
200	   support.

202	2 Maximum Transmission Units (MTUs) and IP Fragmentation

204	   IPv4 packets (datagrams) vary in size from 20 bytes (the size of the
205	   IPv4 header alone) to a maximum of 65535 bytes. Subnetworks need not
206	   support maximum-sized (64KB) IP packets, as IP provides a scheme that
207	   breaks packets that are too large for a given subnetwork into
208	   fragments that travel as independent IP packets and are reassembled
209	   at the destination. The maximum packet size supported by a subnetwork
210	   is known as its Maximum Transmission Unit (MTU).
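
   As an illustration of the fragmentation scheme described above, the
   following Python sketch computes how an IPv4 packet could be split
   for a given MTU.  It assumes a 20-byte header with no options and a
   clear Don't Fragment bit; the name fragment_ipv4 is hypothetical and
   is not taken from any library.

```python
def fragment_ipv4(total_length, mtu, header_len=20):
    """Return a list of (offset_in_8_byte_units, fragment_length, more_fragments).

    Assumes a fixed header_len (no IP options) that is copied into every
    fragment, and that fragmentation is permitted for this packet.
    """
    if total_length <= mtu:
        # The packet fits; it is sent unfragmented.
        return [(0, total_length, False)]

    # Every fragment except the last must carry a multiple of 8 payload
    # bytes, because the fragment offset field counts 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    payload = total_length - header_len
    fragments = []
    sent = 0
    while sent < payload:
        chunk = min(max_payload, payload - sent)
        more_fragments = sent + chunk < payload
        fragments.append((sent // 8, header_len + chunk, more_fragments))
        sent += chunk
    return fragments


if __name__ == "__main__":
    for offset, length, more in fragment_ipv4(4000, 1500):
        print(f"offset={offset:4d} length={length:5d} MF={int(more)}")
```

   For a 4000-byte packet and a 1500-byte MTU, this yields fragments of
   1500, 1500 and 1040 bytes at offsets 0, 185 and 370 (in 8-byte
   units).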

212	   Subnetworks may, but are not required to, indicate the length of each
213	   packet they carry.  One example is Ethernet with the widely used DIX
214	   [DIX82] (not IEEE 802.3 [IEEE8023]) header, which lacks a length
215	   field to indicate the true data length when the packet is padded to a
216	   minimum of 60 bytes.  This is not a problem for uncompressed IP
217	   because each IP packet carries its own length field.

219	   If optional header compression [RFC1144] [RFC2507] [RFC2508]
220	   [RFC3095] is used, however, it is required that the link framing
221	   indicate frame length because it is needed for the reconstruction of
222	   the original header.

224	   In IP version 4 (the version now in widespread use), fragmentation
225	   can occur at either the sending host or in an intermediate router,
226	   and fragments can be further fragmented at subsequent routers if
227	   necessary.

229	   In IP version 6 [RFC 2460], fragmentation can occur only at the
230	   sending host; it cannot occur in a router (called "router
231	   fragmentation" in this document).

233	   Both IPv4 and IPv6 provide a "path MTU discovery" procedure
234	   [RFC1191] [RFC1435] [RFC1981] that allows the sending host to avoid
235	   fragmentation by discovering the minimum MTU along a given path and
236	   reducing its packet sizes accordingly. This procedure is optional in
237	   IPv4 and IPv6.

239	   Path MTU discovery is widely deployed, but it sometimes encounters
240	   problems. Some routers fail to generate the ICMP messages that convey
241	   path MTU information to the sender, and sometimes the ICMP messages
242	   are blocked by overly restrictive firewalls.  The result can be a
243	   "Path MTU Black Hole" [RFC2923] [RFC1435].

245	   The Path MTU Discovery procedure, the persistence of path MTU black
246	   holes, and the deletion of router fragmentation in IPv6 reflect a
247	   consensus of the Internet technical community that router
248	   fragmentation is best avoided. This requires that subnetworks support
249	   MTUs that are "reasonably" large. The smallest MTU permitted in IPv4
250	   by [RFC791] is 576 bytes, but such a small value would clearly be
251	   inefficient. Because IPv6 omits fragmentation by routers, [RFC 2460]
252	   specifies a larger minimum MTU of 1280 bytes. Any subnetwork with an
253	   internal packet payload smaller than 1280 bytes must implement a
254	   mechanism that performs fragmentation/reassembly of IP packets
255	   to/from subnetwork frames if it is to support IPv6.

257	   If a subnetwork cannot directly support a "reasonable" MTU with
258	   native framing mechanisms, it should internally fragment. That is, it
259	   should transparently break IP packets into internal data elements and
260	   reassemble them at the other end of the subnetwork.

262	   This leaves the question of what is a "reasonable" MTU.  Ethernet (10
263	   and 100 Mb/s) has an MTU of 1500 bytes, and because of the ubiquity of
264	   Ethernet few Internet paths have MTUs larger than this value.  This
265	   severely limits the utility of larger MTUs provided by other
266	   subnetworks. Meanwhile larger MTUs are increasingly desirable on
267	   high-speed subnetworks to reduce the per-packet processing overhead
268	   in host computers, and implementers are encouraged to provide them
269	   even though they may not be usable when Ethernet is also in the path.

271	   Various "tunneling" schemes, such as GRE [RFC2784] or IP Security in
272	   tunnel mode [RFC2406], treat IP as a subnetwork for IP.  Since
273	   tunneling adds header overhead, it can trigger fragmentation even
274	   when the same physical subnetworks (e.g., Ethernet) are used on both
275	   sides of the host performing IPsec encapsulation. Tunneling has made
276	   it more difficult to avoid router fragmentation and has increased the
277	   incidence of path MTU black holes [RFC2401], [RFC2923]. Larger
278	   subnetwork MTUs may help to alleviate this problem.
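
   The effect of tunnel overhead on the usable MTU can be sketched with
   simple arithmetic.  The example below assumes GRE over IPv4 with a
   20-byte outer IPv4 header (no options) and the 4-byte GRE base
   header without optional fields; other tunnels, IPsec in particular,
   add different and often larger amounts.  The function names are
   illustrative only.

```python
OUTER_IPV4_HEADER = 20  # bytes, assuming no IP options
GRE_BASE_HEADER = 4     # bytes, assuming no optional GRE fields


def inner_mtu(link_mtu, overhead):
    """Largest inner packet that fits in one link frame after encapsulation."""
    return link_mtu - overhead


def needs_fragmentation(inner_packet_len, link_mtu, overhead):
    """True if the encapsulated packet exceeds the link MTU."""
    return inner_packet_len + overhead > link_mtu


if __name__ == "__main__":
    overhead = OUTER_IPV4_HEADER + GRE_BASE_HEADER
    print("usable inner MTU on a 1500-byte link:", inner_mtu(1500, overhead))
    print("1500-byte inner packet must be fragmented:",
          needs_fragmentation(1500, 1500, overhead))
```

   Under these assumptions a full-sized 1500-byte packet arriving from
   one Ethernet no longer fits on the next Ethernet once encapsulated,
   which is the situation described above.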

280	2.1 Choosing the MTU in Slow Networks

282	   In slow networks, the largest possible packet may take a considerable
283	   time to send.  This is known as channelisation or serialisation
284	   delay. Total end-to-end interactive response time should not exceed
285	   the well-known human factors limit of 100 to 200 ms. This includes
286	   all sources of delay: electromagnetic propagation delay, queuing
287	   delay, serialisation delay, and the store-and-forward time, i.e., the
288	   time to transmit a packet at link speed.

290	   At low link speeds, store-and-forward delays can dominate total end-
291	   to-end delay, and these are in turn directly influenced by the
292	   maximum transmission unit (MTU) size. Even when an interactive packet
293	   is given a higher queuing priority, it may have to wait for a large
294	   bulk transfer packet to finish transmission.  This worst-case wait
295	   can be set by an appropriate choice of MTU.

297	   For example, if the MTU is set to 1500 bytes, then an MTU-sized packet
298	   will take about 8 milliseconds to send on a T1 (1.536 Mb/s) link.
299	   But if the link speed is 19.2kb/s, then the transmission time becomes
300	   625 ms -- well above our 100-200ms limit.  A 256-byte MTU would lower
301	   this delay to a little over 100 ms. However, care should be taken not
302	   to lower the MTU excessively, as this will increase header overhead
303	   and trigger frequent router fragmentation (if Path MTU discovery is
304	   not in use).  Path MTU discovery is unlikely to be in use with multicast.
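
   The figures in this example follow directly from packet size and
   link rate.  A minimal sketch of the arithmetic, with hypothetical
   function names, is:

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time in milliseconds to clock one packet onto a link of link_bps bits/s."""
    return packet_bytes * 8 * 1000 / link_bps


def largest_mtu_for_delay(link_bps, max_delay_ms):
    """Largest packet size in bytes whose transmission time is within max_delay_ms."""
    return link_bps * max_delay_ms // 8000


if __name__ == "__main__":
    for mtu, rate in [(1500, 1_536_000), (1500, 19_200), (256, 19_200)]:
        delay = serialization_delay_ms(mtu, rate)
        print(f"{mtu:5d} bytes at {rate:9d} b/s: {delay:7.1f} ms")
    print("largest MTU for 100 ms at 19.2 kb/s:",
          largest_mtu_for_delay(19_200, 100), "bytes")
```

   This gives roughly 7.8 ms, 625 ms and 106.7 ms for the three cases
   in the text, and shows that an MTU of 240 bytes or less would be
   needed to stay within 100 ms at 19.2 kb/s.  The sketch ignores link
   framing overhead, which would lengthen each delay slightly.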

306	   One way to limit delay for interactive traffic without imposing a
307	   small MTU is to give priority to this traffic and to preempt (abort)
308	   the transmission of a lower-priority packet when a higher priority
309	   packet arrives in the queue.  However, the link resources used to
310	   send the aborted packet are lost, and overall throughput will
311	   decrease.

313	   Another way is to implement a link-level multiplexing scheme that
314	   allows several packets to be in progress simultaneously, with
315	   transmission priority given to segments of higher-priority IP
316	   packets. For links using the Point-To-Point Protocol (PPP) [RFC1661],
317	   multi-class multilink [RFC2686] [RFC2687] [RFC2689] provides such a
318	   facility.

320	   ATM (asynchronous transfer mode), where SNDUs are fragmented and
321	   interleaved across smaller 53-byte ATM cells, is another example of
322	   this technique. However, ATM is generally used on high-speed links
323	   where the store-and-forward delays are already minimal, and it
324	   introduces significant (~9%) additional overhead due to the addition
325	   of a 5-byte header to the 48-byte payload of each ATM cell.
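
   The ATM overhead can be worked out as follows.  This sketch takes
   the 48-byte payload and 5-byte header from the text; the optional
   trailer_bytes parameter is a hypothetical allowance for any
   adaptation-layer trailer and defaults to zero.  Padding of the final
   cell adds to the header overhead, so the total depends on packet
   size.

```python
CELL_PAYLOAD = 48  # bytes of payload per cell
CELL_HEADER = 5    # bytes of header per cell
CELL_SIZE = CELL_PAYLOAD + CELL_HEADER


def header_overhead_fraction():
    """Fraction of every cell taken up by the cell header."""
    return CELL_HEADER / CELL_SIZE


def cells_needed(packet_bytes, trailer_bytes=0):
    """Cells required for one packet; the last cell is padded if not full."""
    return (packet_bytes + trailer_bytes + CELL_PAYLOAD - 1) // CELL_PAYLOAD


def wire_bytes(packet_bytes, trailer_bytes=0):
    """Bytes actually transmitted, including headers and padding."""
    return cells_needed(packet_bytes, trailer_bytes) * CELL_SIZE


def total_overhead_fraction(packet_bytes, trailer_bytes=0):
    """Fraction of transmitted bytes that are not packet data."""
    wire = wire_bytes(packet_bytes, trailer_bytes)
    return (wire - packet_bytes) / wire


if __name__ == "__main__":
    print(f"header overhead: {header_overhead_fraction():.1%}")
    for size in (40, 576, 1500):
        print(f"{size:5d}-byte packet: {cells_needed(size):3d} cells, "
              f"{total_overhead_fraction(size):.1%} total overhead")
```

   With no trailer, a 1500-byte packet occupies 32 cells (1696 bytes on
   the wire), so the total overhead is about 11.6% rather than the 9.4%
   attributable to cell headers alone.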

327	   A third example is Data-Over-Cable Service Interface Specifications
328	   (DOCSIS) with typical upstream bandwidths of 2.56 Mb/s or 5.12 Mb/s.
329	   To reduce the impact of a 1500-byte MTU in DOCSIS 1.0 [DOCSIS1], a
330	   data link layer fragmentation mechanism is specified in DOCSIS 1.1
331	   [DOCSIS2]. To accommodate the installed base, DOCSIS 1.1 must be
332	   backward compatible with DOCSIS 1.0 cable modems, which generally do
333	   not support fragmentation. Under the co-existence of DOCSIS 1.0 and
334	   DOCSIS 1.1, the unfragmented large data packets from DOCSIS 1.0 cable
335	   modems may affect the quality of service for voice packets from
336	   DOCSIS 1.1 cable modems. In this case, it has been shown in [DOCSIS3]
337	   that use of bandwidth allocation algorithms can mitigate this effect.

339	   To summarize, there is a fundamental tradeoff between efficiency and
340	   latency in the design of a subnetwork, and the designer should keep
341	   this tradeoff in mind.

343	3 Framing on Connection-Oriented Subnetworks

345	   IP requires that subnetworks mark the beginning and end of each
346	   variable-length, asynchronous IP packet.  Some examples of links and
347	   subnetworks that do not provide this as an intrinsic feature include:

349	   1. leased lines carrying a synchronous bit stream;

351	   2. ISDN B-channels carrying a synchronous octet stream;

353	   3. dialup telephone modems carrying an asynchronous octet stream;

355	   and

357	   4. Asynchronous Transfer Mode (ATM) networks carrying an asynchronous
358	   stream of fixed-sized "cells".

360	   The Internet community has defined packet framing methods for all
361	   these subnetworks. The Point-To-Point Protocol (PPP) [RFC1661], which
362	   uses a variant of HDLC, is applicable to bit synchronous, octet
363	   synchronous and octet asynchronous links (i.e., examples 1-3 above).
364	   PPP is one preferred framing method for IP, since a large number of
365	   systems interoperate with PPP. ATM has its own framing methods
366	   described in [RFC2684] [RFC2364].

368	   At high speeds, a subnetwork should provide a framed interface
369	   capable of carrying asynchronous, variable-length IP datagrams.  The
370	   maximum packet size supported by this interface is discussed above in
371	   the MTU/Fragmentation section.  The subnetwork may implement this
372	   facility in any convenient manner.

374	   IP packet boundaries need not coincide with any framing or
375	   synchronization mechanisms internal to the subnetwork. When the
376	   subnetwork implements variable sized data units, the most
377	   straightforward approach is to place exactly one IP packet into each
378	   subnetwork data unit (SNDU), and to rely on the subnetwork's existing
379	   ability to delimit SNDUs to also delimit IP packets.  A good example
380	   is Ethernet. However, some subnetworks have SNDUs of one or more
381	   fixed sizes, as dictated by switching, forward error correction
382	   and/or interleaving considerations.  Examples of such subnetworks
383	   include ATM, with a single cell payload size of 48 bytes plus a 5-byte
384	   header, and IS-95 digital cellular, with two "rate sets" of four
385	   fixed frame sizes each that may be selected on 20 millisecond
386	   boundaries.

388	   Because IP packets are of variable length, they may not necessarily
389	   fit into an integer multiple of fixed-sized SNDUs. An "adaptation
390	   layer" is needed to convert IP packets into SNDUs while marking the
391	   boundary between each IP packet in some manner.

393	   There are several approaches to the problem. The first is to encode
394	   each IP packet into one or more SNDUs, with no SNDU containing pieces
395	   of more than one IP packet, and padding out the last SNDU of the
396	   packet as needed.  Bits in a control header added to each SNDU
397	   indicate where the data segment belongs in the IP packet. If the
398	   subnetwork provides in-order, at-most-once delivery, the header can
399	   be as simple as a pair of bits to indicate whether the SNDU is the
400	   first and/or the last in the IP packet. Alternatively, for subnetworks
401	   that do not reorder the fragments of an SNDU, only the last SNDU of
402	   the packet could be marked, as this would implicitly indicate the
403	   next SNDU as the first in a new IP packet. The AAL5 (ATM Adaptation
404	   Layer 5) scheme used with ATM is an example of this approach, though
405	   it adds other features, including a payload length field and a
406	   payload CRC.
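
   The first approach can be sketched as below, assuming a subnetwork
   that delivers fixed-size SNDUs in order and at most once.  The SNDU
   class and the segment and reassemble functions are hypothetical
   names chosen for illustration.  Because this sketch carries no
   payload length field of its own, it strips the padding using the
   IPv4 Total Length field, so it applies to uncompressed IPv4 only.

```python
import struct
from dataclasses import dataclass


@dataclass
class SNDU:
    first: bool   # set on the SNDU that starts an IP packet
    last: bool    # set on the SNDU that ends an IP packet
    data: bytes   # fixed-size payload; the final SNDU is zero-padded


def segment(packet, sndu_size):
    """Split one IP packet into fixed-size SNDUs, padding the last one."""
    if not packet:
        raise ValueError("empty packet")
    chunks = [packet[i:i + sndu_size] for i in range(0, len(packet), sndu_size)]
    return [
        SNDU(first=(i == 0), last=(i == len(chunks) - 1),
             data=chunk.ljust(sndu_size, b"\x00"))
        for i, chunk in enumerate(chunks)
    ]


def reassemble(sndus):
    """Rebuild IP packets from an in-order sequence of SNDUs."""
    packets = []
    buffer = None
    for sndu in sndus:
        if sndu.first:
            buffer = bytearray()   # start a new packet, dropping any partial one
        if buffer is None:
            continue               # first SNDU was lost; discard until the next one
        buffer += sndu.data
        if sndu.last:
            # Bytes 2-3 of the IPv4 header hold the Total Length; use it
            # to remove the padding added by segment().
            (total_length,) = struct.unpack("!H", buffer[2:4])
            packets.append(bytes(buffer[:total_length]))
            buffer = None
    return packets
```

   Note that the loss of a middle SNDU goes undetected in this sketch;
   a payload length field and CRC, as in AAL5, allow the receiver to
   detect and discard such a damaged packet.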

408	   In AAL5, the ATM User-User Indication, which is encoded in the
409	   Payload Type field of an ATM cell, indicates the end cell of a
410	   packet.  The packet trailer is located at the end of the SNDU and
411	   contains the packet length and a CRC.

413	   Another framing technique is to insert per-segment overhead to
414	   indicate the presence of a segment option.  When present, the option
415	   carries a pointer to the end of the packet.  This differs from AAL5
416	   in that it permits another packet to follow within the same segment.
417	   MPEG-2 [EN301] [ISO13818] supports this style of fragmentation, and
418	   may either use padding (limiting each transport stream packet to
419	   carry only part of one packet) or allow a second packet to start
420	   (no padding).

422	   A third approach is to insert a special flag sequence into the data
423	   stream between each IP packet, and to pack the resulting data stream
424	   into SNDUs without regard to SNDU boundaries. This may have
425	   implications when frames are lost. The flag sequence can also pad
426	   unused space at the end of an SNDU. If the special flag appears in
427	   the user data, it is escaped to an alternate sequence (usually larger
428	   than a flag) to avoid being misinterpreted as a flag.  The HDLC-based
429	   framing schemes used in PPP are all examples of this approach.
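
	   A minimal Python sketch of the flag-and-escape idea follows. The
	   octet values (flag 0x7E, escape 0x7D, XOR with 0x20) are those of
	   PPP in HDLC-like framing; the function names are hypothetical, and
	   the sketch omits the frame check sequence and the other fields a
	   real PPP frame carries.

```python
FLAG, ESC, XOR = 0x7E, 0x7D, 0x20


def stuff(packet: bytes) -> bytes:
    """Escape flag-like octets so they cannot be mistaken for a flag."""
    out = bytearray()
    for b in packet:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ XOR])   # one octet expands to two
        else:
            out.append(b)
    return bytes(out)


def frame_stream(packets) -> bytes:
    """Concatenate packets into one byte stream with flags in between."""
    stream = bytearray([FLAG])
    for p in packets:
        stream += stuff(p)
        stream.append(FLAG)
    return bytes(stream)


def deframe(stream: bytes):
    """Recover packets; runs of flags (e.g., padding) yield nothing."""
    packets = []
    for body in stream.split(bytes([FLAG])):
        if not body:
            continue
        out, escaped = bytearray(), False
        for b in body:
            if escaped:
                out.append(b ^ XOR)
                escaped = False
            elif b == ESC:
                escaped = True
            else:
                out.append(b)
        packets.append(bytes(out))
    return packets


packets = [b"\x7e\x7dab", b"plain"]
stream = frame_stream(packets)
```

	   The resulting byte stream can then be cut into SNDUs at arbitrary
	   offsets, with extra flag octets filling any unused space.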

431	   All three adaptation schemes introduce overhead; how much depends on
432	   the distribution of IP packet sizes, the size(s) of the SNDUs, and in
433	   the HDLC-like approaches, the content of the IP packet (since flag-
434	   like sequences occurring in the packet must be escaped, which expands
435	   them). The designer must also weigh implementation complexity and
436	   performance in the choice and design of an adaptation layer.

438	4 Connection-Oriented Subnetworks

440	   IP has no notion of a "connection"; it is a purely connectionless
441	   protocol.  When a connection is required by an application, it is
442	   usually provided by TCP [RFC793], the Transmission Control Protocol,
443	   running atop IP on an end-to-end basis.

445	   Connection-oriented subnetworks can be (and are widely) used to carry
446	   IP, but often with considerable complexity.  Subnetworks with a few
447	   nodes can simply open a permanent connection between each pair of
448	   nodes. This is frequently done with ATM. However, the number of
449	   connections increases as the square of the number of nodes, so this
450	   is clearly impractical for large subnetworks. A "shim" layer between
451	   IP and the subnetwork is therefore required to manage connections.
452	   This is one of the most common functions of a Subnetwork Dependent
453	   Convergence Function (SNDCF) sublayer between IP and a subnetwork.

455	   SNDCFs typically open subnetwork connections as needed when an IP
456	   packet is queued for transmission and close them after an idle
457	   timeout. There is no relation between subnetwork connections and any
458	   connections that may exist at higher layers (e.g., TCP).

460	   Because Internet traffic is typically bursty and transaction-
461	   oriented, it is often difficult to pick an optimal idle timeout. If
462	   the timeout is too short, subnetwork connections are opened and
463	   closed rapidly, possibly over-stressing the subnetwork call
464	   management system (especially if it was designed for voice traffic
465	   holding times). If the timeout is too long, subnetwork connections
466	   are idle much of the time, wasting any resources dedicated to them by
467	   the subnetwork.
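
	   The trade-off can be explored with a small Python sketch. It is a
	   simplified model, not a description of any real SNDCF: packet
	   transmission time is taken as negligible, arrival times are given
	   in seconds in ascending order, and the function name is
	   hypothetical.

```python
def simulate_idle_timeout(arrivals, timeout):
    """Return (connections_opened, seconds_held_idle) for one node pair."""
    opens, idle, last = 0, 0.0, None
    for t in arrivals:
        if last is None or t - last > timeout:
            opens += 1                 # no connection is open: set one up
            if last is not None:
                idle += timeout        # previous one idled until it timed out
        else:
            idle += t - last           # connection was held open while idle
        last = t
    if last is not None:
        idle += timeout                # final idle period before the last close
    return opens, idle


arrivals = [0, 1, 2, 30, 31, 90]       # a bursty, transaction-like pattern
short = simulate_idle_timeout(arrivals, timeout=5)
long_ = simulate_idle_timeout(arrivals, timeout=60)
```

	   For this arrival pattern, the 5-second timeout opens three
	   connections and holds them idle for 18 seconds in total, while the
	   60-second timeout opens only one but holds it idle for 150 seconds.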

469	   Purely connectionless subnets (such as Ethernet), which have no state
470	   and dynamically share resources, are optimal for supporting best-
471	   effort IP, which is stateless and dynamically shares resources.
472	   Connection-oriented packet networks (such as ATM and Frame Relay),
473	   which have state and dynamically share resources, are less optimal,
474	   since best-effort IP does not benefit from the overhead of creating
475	   and maintaining state.  Connection-oriented circuit switched networks
476	   (including the PSTN and ISDN) both have state and statically allocate
477	   resources for a call, and thus not only require state creation and
478	   maintenance overhead, but also do not benefit from the efficiencies
479	   of the statistical multiplexing of capacity inherent in IP.

481	   In any event, if an SNDCF that opens and closes subnet connections is
482	   used to support IP, care should be taken to make sure that call
483	   processing in the subnet can keep up with relatively short holding
484	   times.

486	5 Broadcasting and Discovery

488	   Subnetworks fall into two categories: point-to-point and shared.  A
489	   point-to-point subnet has exactly two endpoint components (hosts or
490	   routers); a shared link has more than two, using either an
491	   inherently broadcast medium (e.g., Ethernet, radio) or a switching
492	   layer hidden from the network layer (e.g., switched Ethernet, Myrinet
493	   [MYR95], ATM).  Switched subnetworks handle broadcast by copying
494	   broadcast packets so that each interface that supports one or more
495	   systems (hosts or routers) receives a copy of each packet.

497	   Several Internet protocols for IPv4 make use of broadcast
498	   capabilities, including link-layer address lookup (ARP), auto-
499	   configuration (RARP, BOOTP, DHCP), and routing (RIP).

501	   A lack of broadcast capability can impede the performance of these
502	   protocols, or render them inoperable (e.g., DHCP). ARP-like link
503	   address lookup can be provided by a centralized database, but at the
504	   expense of potentially higher response latency and the need for nodes
505	   to have explicit knowledge of the ARP server address. Shared links
506	   should support native, link-layer subnet broadcast.

508	   A corresponding set of IPv6 protocols uses multicasting (see next
509	   section) instead of broadcast to provide similar functions with
510	   improved scaling in large networks.

512	6 Multicasting

514	   The Internet model includes "multicasting", where IP packets are sent
515	   to all the members of a multicast group [RFC1112] [RFC2236].
516	   Multicast is an option in IPv4, but a standard feature of IPv6.  IPv4
517	   multicast is currently used by multimedia, teleconferencing, gaming,
518	   and file distribution (web, peer-to-peer sharing) applications, as
519	   well as by some key network and host protocols (e.g., RIPv2, OSPF,
520	   NTP).  IPv6 additionally relies on multicast for network
521	   configuration (DHCP-like autoconfiguration) and link-layer address
522	   discovery [RFC2461] (replacing ARP). In the case of IPv6 this can
523	   allow autoconfiguration and address discovery to span across routers,
524	   whereas the IPv4 broadcast-based services cannot without ad-hoc
525	   router support [RFC1812].

527	   Multicast-enabled IP routers organize each multicast group into a
528	   spanning tree, and route multicast packets by making copies of each
529	   multicast packet and forwarding them to each output interface
530	   that includes at least one downstream member of the multicast group.

532	   Multicasting is considerably more efficient when a subnetwork
533	   explicitly supports it. For example, a router relaying a multicast
534	   packet onto an Ethernet subnet need send only one copy of the packet,
535	   no matter how many members of the multicast group are connected to
536	   the segment.  Without native multicast support, routers and switches
537	   on shared links would need to use broadcast with software filters,
538	   such that every multicast packet sent incurs software overhead for
539	   every node on the subnetwork, even if a node is not a member of the
540	   multicast group.  Alternatively, the router would transmit a separate
541	   copy to every member of the multicast group on the segment, as is
542	   done on multicast-incapable switched subnets.

544	   Subnetworks using shared channels (e.g., radio LANs, Ethernets, etc.)
545	   are especially suitable for native multicasting, and their designers
546	   should make every effort to support it. This involves designating a
547	   section of the subnetwork's own address space for multicasting. On
548	   these networks, multicast is basically broadcast on the medium, with
549	   Layer-2 receiver filters.

551	   Subnet interfaces also need to be designed to accept packets
552	   addressed to some number of multicast addresses in addition to the
553	   unicast packets specifically addressed to them. How many multicast
554	   addresses need to be supported by a host depends on the requirements
555	   of the associated host; at least several dozen will meet most current
556	   needs.

558	   On low-speed networks the multicast address recognition function may
559	   be readily implemented in host software, but on high-speed networks
560	   it should be implemented in subnetwork hardware. This hardware need
561	   not be complete; for example, many Ethernet interfaces implement a
562	   "hashing" function where the IP layer receives all of the multicast
563	   (and unicast) traffic to which the associated host subscribes, plus
564	   some small fraction of multicast traffic to which the host does not
565	   subscribe.  Host/router software then has to discard the unwanted
566	   packets that pass the Layer-2 multicast address filter [RFC1112].

568	   There does not need to be a one-to-one mapping between a layer 2
569	   multicast address and an IP multicast address. An address overlap may
570	   significantly degrade the filtering capability of a receiver's
571	   hardware multicast address filter. A subnetwork supporting only
572	   broadcast should use this service for multicast and must rely on
573	   software filtering.
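
	   Ethernet illustrates the address overlap. The Python sketch below
	   follows the IPv4-to-Ethernet mapping of [RFC1112], which copies
	   only the low-order 23 bits of the group address into the
	   01-00-5E-00-00-00 block, so 32 IP groups share each Ethernet
	   multicast address. The function name is hypothetical.

```python
import ipaddress


def ipv4_group_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF            # the top 5 group bits are dropped
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))


# Three distinct IP groups that a Layer-2 filter cannot tell apart.
overlapping = ["224.1.1.1", "225.1.1.1", "224.129.1.1"]
macs = {g: ipv4_group_to_mac(g) for g in overlapping}
```

	   A host subscribed to only one of these groups still receives
	   frames for the other two, and must discard them in software.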

575	   Switched subnetworks must also provide a mechanism for copying
576	   multicast packets to ensure the packets reach at least all members of
577	   a multicast group.  One option is to "flood" multicast packets, in
578	   the same manner as broadcast.  This can lead to unnecessary
579	   transmissions on some subnetwork links (notably non-multicast-aware
580	   Ethernet switches). Some subnetworks therefore allow multicast filter
581	   tables to control which links receive packets belonging to a specific
582	   group.  To configure this automatically requires access to layer 3
583	   group membership information (e.g., IGMP).  Various implementation
584	   options currently exist to provide a subnet node with a list of
585	   multicast addresses to port/interface mappings [MBONED-GAP].  These
586	   employ a range of approaches, including signaling from end hosts
587	   (e.g., IEEE 802 GARP/GMRP [802.1p]), signaling from switches (e.g.,
588	   CGMP [CGMP] and RGMP [RFC3488]), interception and proxy of IP group
589	   membership packets (e.g., IGMP/MLD Proxy [MAGMA-PROXY]), and enabling
590	   Layer 2 devices to snoop/inspect/peek into forwarded Layer 3 protocol
591	   headers (e.g., IGMP, MLD, PIM) so that they may infer L3 multicast
592	   group membership. These approaches differ in their complexity,
593	   flexibility and ability to support new protocols.

595	7 Bandwidth on Demand (BoD) Subnets

597	   Some subnets allow a number of subnet nodes to share a channel
598	   efficiently by assigning transmission opportunities dynamically.
599	   Transmission opportunities are requested by a subnet node when it has
600	   packets to send.  The subnet schedules and grants transmission
601	   opportunities sufficient to allow the transmitting subnet node to
602	   send one or more packets (or packet fragments).  We call these
603	   subnets Bandwidth on Demand (BoD) subnets.  Examples of BoD subnets
604	   include Demand Assignment Multiple Access (DAMA) satellite and
605	   terrestrial wireless networks, IEEE 802.11 point coordination
606	   function (PCF) mode, and DOCSIS.  A connection-oriented network (like
607	   the PSTN, ATM or Frame Relay) reserves resources on a much longer
608	   timescale, and is therefore not a BoD subnet in our taxonomy.

610	   The design parameters for BoD are similar to those in connection
611	   oriented subnetworks, although the implementations may vary
612	   significantly.  In BoD, the user typically requests access to the
613	   shared channel for some duration. Access may be allocated for a
614	   period of time at a specific rate, for a certain number of packets,
615	   or until the user releases the channel. Access may be coordinated
616	   through a central management entity or with a distributed algorithm
617	   amongst the users.  Examples of the resource that may be shared
618	   include a terrestrial wireless hop, a cable modem uplink, a satellite
619	   uplink, and an end-to-end satellite channel.

621	   Long-delay BoD subnets pose problems similar to connection-oriented
622	   networks in anticipating traffic. While connection-oriented subnets
623	   that expect new data to arrive hold idle channels open, BoD subnets
624	   request channel access based on buffer occupancy (or expected buffer
625	   occupancy) on the sending port. Poor performance will likely result
626	   if the sender does not anticipate additional traffic arriving at that
627	   port during the time it takes to grant a transmission request. It is
628	   recommended that the algorithm have the capability to extend a hold
629	   on the channel for data that has arrived after the original request
630	   was generated (this may be done by piggybacking new requests on user
631	   data).

633	   There is a wide variety of BoD protocols available.  However, there
634	   has been relatively little comprehensive research on the interactions
635	   between BoD mechanisms and Internet protocol performance.  Research
636	   on some specific mechanisms is available (e.g., [AR02]).  One item
637	   that has been studied is TCP's retransmission timer [KY02].  BoD
638	   systems can cause spurious timeouts when adjusting from a relatively
639	   high data rate to a relatively low data rate.  In this case, TCP's
640	   transmitted data takes longer to get through the network than
641	   predicted by the TCP sender's computed retransmission timeout and
642	   therefore the TCP sender is prone to resending a segment prematurely.

644	8 Reliability and Error Control

646	   In the Internet architecture, the ultimate responsibility for error
647	   recovery is at the end points [SRC81]. The Internet may occasionally
648	   drop, corrupt, duplicate or reorder packets, and the transport
649	   protocol (e.g., TCP) or application (e.g., if UDP is used as the
650	   transport protocol) must recover from these errors on an end-to-end
651	   basis.  Error recovery in the subnetwork is therefore justifiable
652	   only to the extent that it can enhance overall performance.  It is
653	   important to recognize that a subnetwork can go too far in attempting
654	   to provide error recovery services in the Internet environment.
655	   Subnet reliability should be "lightweight", i.e., it only has to be
656	   "good enough", *not* perfect.

658	   In this section we discuss how to analyze characteristics of a
659	   subnetwork to determine what is "good enough".  The discussion below
660	   focuses on TCP, which is the most widely-used transport protocol in
661	   the Internet.  It is widely believed (and is a stated goal within the
662	   IETF) that non-TCP transport protocols should attempt to be "TCP-
663	   friendly" and have many of the same performance characteristics.
664	   Thus, the discussion below should be applicable even to portions of
665	   the Internet where TCP may not be the predominant protocol.

667	 8.1 TCP vs Link-Layer Retransmission

669	   Error recovery involves the generation and transmission of redundant
670	   information computed from user data. Depending on how much redundant
671	   information is sent and how it is generated, the receiver can use it
672	   to reliably detect transmission errors; correct up to some maximum
673	   number of transmission errors; or both. The general approach is known
674	   as Error Control Coding, or ECC.

676	   The use of ECC to detect transmission errors so that retransmissions
677	   (hopefully without errors) can be requested is widely known as "ARQ"
678	   (Automatic Repeat Request).

680	   When enough ECC information is available to permit the receiver to
681	   correct some transmission errors without a retransmission, the
682	   approach is known as Forward Error Correction (FEC). Due to the
683	   greater complexity of the required ECC and the need to tailor its
684	   design to the characteristics of a specific modem and channel, FEC
685	   has traditionally been implemented in special-purpose hardware
686	   integral to a modem. This effectively makes it part of the physical
687	   layer.

689	   Unlike ARQ, FEC was seldom used for telecommunications outside of
690	   space links prior to the 1990s.  It is now nearly universal in
691	   telephone, cable and DSL modems, digital satellite links and digital
692	   mobile telephones. FEC is also heavily used in optical and magnetic
693	   storage where "retransmissions" are not possible.

695	   Some systems use hybrid combinations of ARQ layered atop FEC; V.90
696	   dialup modems (in the upstream direction) with V.42 error control are
697	   one example. Most errors are corrected by the trellis (FEC) code
698	   within the V.90 modem, and most that remain are detected and
699	   corrected by the ARQ mechanisms in V.42.

701	   Work is now underway to apply FEC above the physical layer, primarily
702	   in connection with reliable multicasting [RFC3048] where conventional
703	   ARQ mechanisms are inefficient or difficult to implement. However, in
704	   this discussion we will assume that if FEC is present, it is
705	   implemented within the physical layer.

707	   Depending on the layer where it is implemented, error control can
708	   operate on an end-to-end basis or over a shorter span such as a
709	   single link.  TCP is the most important example of an end-to-end
710	   protocol that uses an ARQ strategy.

712	   Many link-layer protocols use ARQ, usually some flavor of HDLC
713	   [ISO3309]. Examples include the X.25 link layer, the AX.25 protocol
714	   used in amateur packet radio, 802.11 wireless LANs, and the reliable
715	   link layer specified in IEEE 802.2.

717	   Only end-to-end error recovery can ensure a reliable service to the
718	   application (see Section 8).  However, some subnetworks (e.g., many
719	   wireless links) also require link-layer error recovery as a
720	   performance enhancement [RFC3366].  For example, many cellular links
721	   have small physical frame sizes (< 100 bytes) and relatively high
722	   frame loss rates. Relying entirely on end-to-end error recovery
723	   clearly yields a performance degradation, as retransmissions across
724	   the end-to-end path take much longer to be received than when link
725	   layer retransmissions are used. Thus, link-layer error recovery can
726	   often increase end-to-end performance. As a result, link-layer and
727	   end-to-end recovery often co-exist; this can lead to the possibility
728	   of inefficient interactions between the two layers of ARQ protocols.

730	   This inter-layer "competition" might lead to the following wasteful
731	   situation. When the link layer retransmits (parts of) a packet, the
732	   link latency momentarily increases. Since TCP bases its
733	   retransmission timeout on prior measurements of total end-to-end
734	   latency, including that of the link in question, this sudden increase
735	   in latency may trigger an unnecessary retransmission by TCP of a
736	   packet that the link layer is still retransmitting.  Such spurious
737	   end-to-end retransmissions generate unnecessary load and reduce end-
738	   to-end throughput. As a result, the link layer may even have multiple
739	   copies of the same packet in the same link queue at the same time. In
740	   general, one could say the competing error recovery is caused by an
741	   inner control loop (link-layer error recovery) reacting to the same
742	   signal as an outer control loop (end-to-end error recovery) without
743	   any coordination between the loops.  Note that this is solely an
744	   efficiency issue; TCP continues to provide reliable end-to-end
745	   delivery over such links.
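
	   The effect can be reproduced with a minimal Python sketch of the
	   standard TCP retransmission timer estimator (smoothing gains of
	   1/8 and 1/4, RTO = SRTT + 4 * RTTVAR, one-second minimum). The
	   class name and the sample values are assumptions for illustration,
	   and clock granularity and timer backoff are ignored.

```python
class RtoEstimator:
    """Smoothed RTT / RTT variance estimator for the retransmission timer."""

    ALPHA, BETA = 1 / 8, 1 / 4

    def __init__(self, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto

    def update(self, rtt):
        """Fold one round-trip time measurement (seconds) into the estimate."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    @property
    def rto(self):
        return max(self.min_rto, self.srtt + 4 * self.rttvar)


est = RtoEstimator()
for _ in range(20):
    est.update(0.5)            # a stable path with a 500 ms round trip
rto_before = est.rto

# Hypothetical case: link-layer retransmissions delay one segment so that
# its round trip takes 1.5 s. The ACK arrives after the timer has expired.
delayed_rtt = 1.5
spurious = delayed_rtt > rto_before
est.update(delayed_rtt)        # the estimator adapts only afterwards
rto_after = est.rto
```

	   The timer was computed from measurements taken before the link
	   layer began retransmitting, so TCP resends a segment that the link
	   is still working to deliver.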

747	   This raises the question of how persistent a link-layer sender should
748	   be in performing retransmission [RFC3366]. We define the link-layer
749	   (LL) ARQ persistency as the maximum time that a particular link will
750	   spend trying to transfer a packet before it can be discarded. This
751	   deliberately simplified definition says nothing about maximum number
752	   of retransmissions, retransmission strategies, queue sizes, queuing
753	   disciplines, transmission delays, or the like. The reason we use the
754	   term LL ARQ persistency instead of a term such as 'maximum link-layer
755	   packet holding time' is that the definition closely relates to link-
756	   layer error recovery. For example, on links that implement
757	   straightforward error recovery strategies, LL ARQ persistency will
758	   often correspond to a maximum number of retransmissions permitted per
759	   link-layer frame.

761	   For link layers that do not or cannot differentiate between flows
762	   (e.g., due to network layer encryption), the LL ARQ persistency
763	   should be small.  This avoids any harmful effects or performance
764	   degradation resulting from indiscriminate high persistence.  A
765	   detailed discussion of these issues is provided in [RFC3366].

767	   However, when a link layer can identify individual flows and apply
768	   ARQ selectively [LKJK02], then the link ARQ persistency should be
769	   high for a flow using reliable unicast transport protocols (e.g.,
770	   TCP) and must be low for all other flows.  Setting the link ARQ
771	   persistency larger than the largest link outage allows TCP to rapidly
772	   restore transmission without the need to wait for a retransmission
773	   time out. This generally improves TCP performance in the face of
774	   transient outages.  However, excessively high persistence may be
775	   disadvantageous; a practical upper limit of 30-60 seconds may be
776	   desirable. Implementation of such schemes remains a research issue.
777	   (See also Section 8.2, "Recovery from Subnetwork Outages").

779	   Many subnetwork designers have opportunities to reduce the
780	   probability of packet loss, e.g., with FEC, ARQ and interleaving, at
781	   the cost of increased delay. TCP performance improves with decreasing
782	   loss but worsens with increasing end-to-end delay, so it is important
783	   to find the proper balance, through analysis and simulation, for the
784	   TCP traffic expected on end-to-end paths across the subnet.

786	 8.2 Recovery from Subnetwork Outages

788	   Some types of subnetworks, particularly mobile radio, are subject to
789	   frequent temporary outages. For example, an active cellular data user
790	   may drive or walk into an area (such as a tunnel) that is out of
791	   range of any base station. No packets will be successfully delivered
792	   until the user returns to an area with coverage.

794	   The Internet protocols currently provide no standard way for a
795	   subnetwork to explicitly notify an upper layer protocol (e.g., TCP)
796	   that it is experiencing an outage rather than severe congestion.

798	   Under these circumstances TCP will, after each unsuccessful
799	   retransmission, wait even longer before trying again; this is its
800	   "exponential back-off" algorithm. Furthermore, TCP will not discover
801	   that the subnetwork outage has ended until its next retransmission
802	   attempt. If TCP has backed off, this may take some time.  This can
803	   lead to extremely poor TCP performance over such subnetworks.
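
	   A short Python sketch shows how long the back-off can leave a
	   connection idle after coverage returns. The initial timeout of 1
	   second, the 60-second cap on the timer, and the function names are
	   assumptions for illustration; real stacks differ in these details.

```python
def retransmission_times(first_send, initial_rto, max_rto=60.0, attempts=10):
    """Times at which TCP retries a segment that keeps being lost."""
    times, t, rto = [], first_send, initial_rto
    for _ in range(attempts):
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)    # exponential back-off, capped
    return times


def recovery_delay(outage_end, times):
    """Seconds between the end of the outage and the next retry."""
    for t in times:
        if t >= outage_end:
            return t - outage_end
    return None                        # TCP gave up before the outage ended


# A segment sent at t=0 just as the user enters a tunnel for 32 seconds.
times = retransmission_times(first_send=0, initial_rto=1)
wasted = recovery_delay(outage_end=32, times=times)
```

	   Here the retries occur at 1, 3, 7, 15, 31 and 63 seconds, so a
	   32-second outage is followed by a further 31 seconds in which the
	   link is usable but TCP sends nothing.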

805	   It is therefore highly desirable that a subnetwork subject to outages
806	   not silently discard packets during an outage. Ideally, the
807	   subnetwork should define an interface to the next higher layer (i.e.,
808	   IP) that allows it to refuse packets during an outage, and to
809	   automatically ask IP for new packets when it is again able to deliver
810	   them. If it cannot do this, then the subnetwork should hold onto at
811	   least some of the packets it accepts during an outage and attempt to
812	   deliver them when the outage ends. When packets are discarded, IP
813	   should be notified so that the appropriate ICMP messages can be sent.

815	   Note that it is *not* necessary to completely avoid dropping packets
816	   during an outage. The purpose of holding onto a packet during an
817	   outage, either in the subnetwork or at the IP layer, is so that its
818	   eventual delivery will implicitly notify TCP that the subnetwork is
819	   again operational. This is to enhance performance, not to ensure
820	   reliability -- reliability, as discussed earlier, can only be ensured
821	   on an end-to-end basis.

823	   Only a few packets per TCP connection, including ACKs, need be held
824	   in this way to cause the TCP sender to recover from the additional
825	   losses once the flow resumes [RFC3366].

827	   Because it would be a layering violation (and possibly a performance
828	   hit) for IP or a subnetwork layer to look at TCP headers (which would
829	   in any event be impossible if IPsec [RFC2401] encryption is in use),
830	   it would be reasonable for the IP or subnetwork layers to choose, as
831	   a design parameter, some small number of packets that will be
832	   retained during an outage.

834	 8.3 CRCs, Checksums and Error Detection

836	   The TCP [RFC793], UDP [RFC768], ICMP, and IPv4 [RFC791] protocols all
837	   use the same simple 16-bit 1's complement checksum algorithm
838	   [RFC1071] to detect corrupted packets.  The IPv4 header checksum
839	   protects only the IPv4 header, while the TCP, ICMP, and UDP checksums
840	   provide end-to-end error detection for both the transport pseudo
841	   header (including network and transport layer information) and the
842	   transport payload data. Protection of the data is optional for
843	   applications using UDP [RFC768] for IPv4, but is required for IPv6.
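
	   For reference, the checksum algorithm of [RFC1071] can be sketched
	   in a few lines of Python. This is an unoptimized illustration that
	   covers an arbitrary byte string; it does not build the pseudo
	   header, and the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes.fromhex("0001f203f4f5f6f7")
reordered = bytes.fromhex("f2030001f4f5f6f7")   # first two words swapped
same = internet_checksum(original) == internet_checksum(reordered)
```

	   Because the checksum is a simple sum, reordering 16-bit words
	   leaves it unchanged, so the two byte strings above are
	   indistinguishable to it.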

845	   The Internet checksum is not very strong from a coding theory
846	   standpoint, but it is easy to compute in software, and various
847	   proposals to replace the Internet checksums with stronger checksums
848	   have failed.  However, it is known that undetected errors can and do
849	   occur in packets received by end hosts [SP2000].

851	   To reduce processing costs, IPv6 has no IP header checksum.  The
852	   destination host detects "important" errors in the IP header such as
853	   the delivery of the packet to the wrong destination. This is done by
854	   including the IP source and destination addresses (pseudo header) in
855	   the computation of the checksum in the TCP or UDP header, a practice
856	   already performed in IPv4.  Errors in other IPv6 header fields may go
857	   undetected within the network; this was considered a reasonable price
858	   to pay for a considerable reduction in the processing required by
859	   each router, and it was assumed that subnetworks would use a strong
860	   link CRC.

862	   One way to provide additional protection for an IPv4 or IPv6 header
863	   is by the authentication and packet integrity services of the IP
864	   Security (IPsec) protocol [RFC2401]. However, this may not be a
865	   choice available to the subnetwork designer.

867	   Most subnetworks implement error detection just above the physical
868	   layer. Packets corrupted in transmission are detected and discarded
869	   before delivery to the IP layer.  A 16-bit cyclic redundancy check
870	   (CRC) is usually the minimum for error detection. This is
871	   significantly more robust against most patterns of errors than the
872	   16-bit Internet checksum.  However, note that the error detection
873	   properties of a specific CRC code diminish with increasing frame
874	   size. The Point-to-Point Protocol [RFC1662] requires support of a
875	   16-bit CRC for each link frame, with a 32-bit CRC as an option.  (PPP
876	   is often used in conjunction with a dialup modem, which can provide
877	   its own error control). Other subnetworks, including 802.3/Ethernet,
878	   AAL5/ATM, FDDI, Token Ring and PPP over SONET/SDH all use a 32-bit
879	   CRC.  Many subnetworks can also use other mechanisms to enhance the
880	   error detection capability of the link CRC (e.g., FEC in dialup
881	   modems, mobile radio and satellite channels).

883	   Any new subnetwork designed to carry IP should therefore provide
884	   error detection for each IP packet that is at least as strong as the
885	   32-bit CRC specified in [ISO3309].  While this will achieve a very
886	   low undetected packet error rate due to transmission errors, it will
887	   not (and need not) achieve a very low packet loss rate as the
888	   Internet protocols are better suited to dealing with lost packets
889	   than to dealing with corrupted packets [SRC81].
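
	   A minimal Python sketch of per-SNDU error detection follows. It
	   uses the CRC-32 from the standard zlib module, which implements
	   the same generator polynomial as the Ethernet and HDLC 32-bit
	   checks. The function names and the little-endian placement of the
	   check value are assumptions; real link layers define their own bit
	   and byte ordering.

```python
import zlib


def add_fcs(sndu: bytes) -> bytes:
    """Append a 32-bit CRC to the SNDU before transmission."""
    return sndu + zlib.crc32(sndu).to_bytes(4, "little")


def check_fcs(frame: bytes) -> bool:
    """Return True if the frame arrived intact; otherwise discard it."""
    if len(frame) < 4:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == fcs


frame = add_fcs(bytes.fromhex("0001f203f4f5f6f7"))

# Flip a single bit in transit.
bit_error = bytes([frame[0] ^ 0x01]) + frame[1:]

# Swap the first two 16-bit words, a change that a plain 16-bit sum
# such as the Internet checksum cannot see.
word_swap = frame[2:4] + frame[0:2] + frame[4:]
```

	   A receiver that drops every frame failing check_fcs() converts
	   corruption into loss, which is the behavior recommended above.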

891	   Packet corruption may be, and is, also caused by bugs in host and
892	   router hardware and software. Even if every subnetwork implemented
893	   strong error detection, it is still essential that end-to-end
894	   checksums are used at the receiving end host [SP2000].

896	   Designers of complex subnetworks consisting of internal links and
897	   packet switches should consider implementing error detection on an
898	   edge-to-edge basis to cover an entire SNDU (or IP packet). A CRC
899	   would be generated at the entry point to the subnetwork and checked
900	   at the exit endpoint.  This may be used instead of, or in combination
901	   with, error detection at the interface to each physical link. An
902	   edge-to-edge check has the significant advantage of protecting
903	   against errors introduced anywhere within the subnetwork, not just
904	   within its transmission links.  Examples of this approach include the
905	   way in which the Ethernet CRC-32 is handled by LAN bridges [802.1D].  ATM
906	   AAL5 [ITU-I363] also uses an edge-to-edge CRC-32.

908	   Some specific applications may be tolerant of residual errors in the
909	   data they exchange, but removal of the link CRC may expose the
910	   network to an undesirable increase in undetected errors in the IP and
911	   transport headers. Applications may also require a high level of
912	   error protection for control information exchanged by protocols
913	   acting above the transport layer.  One example is a voice codec which
914	   is robust against bit errors in the speech samples.  For such
915	   mechanisms to work, the receiving application must be able to
916	   tolerate receiving corrupted data. This also requires that an
917	   application use a mechanism to signal that payload corruption is
918	   permitted and to indicate the coverage (headers and data) that is
919	   required to be protected by the subnetwork CRC.  Currently there is
920	   no Internet standard for supporting partial payload protection.
921	   Receipt of corrupt data by arbitrary application protocols carries a
922	   serious danger that a subnet delivers data with errors which remain
923	   undetected by the application and hence corrupt the communicated data
924	   [SRC81].

926	 8.4 How TCP Works

928	   One of TCP's functions is end-host based congestion control for the
929	   Internet.  This is a critical part of the overall stability of the
930	   Internet, so it is important that link-layer designers understand
931	   TCP's congestion control algorithms.

933	   TCP assumes that, at the most abstract level, the network consists of
934	   links and queues.  Queues provide output-buffering on links that are
935	   momentarily oversubscribed.  They smooth instantaneous traffic bursts
936	   to fit the link bandwidth.  When demand exceeds link capacity long
937	   enough to fill the queue, packets must be dropped. The traditional
938	   action of dropping the most recent packet ("tail dropping") is no
939	   longer recommended [RFC2309,RFC2914], but it is still widely
940	   practiced.

942	   TCP uses sequence numbering and acknowledgments (ACKs) on an end-to-
943	   end basis to provide reliable, sequenced delivery.  TCP ACKs are
944	   cumulative, i.e., each implicitly ACKs every segment received so far.
945	   If a packet with an unexpected sequence number is received, the ACK
946	   field in the packets returned by the receiver will cease to advance.
947	   Using an optional enhancement, TCP can send selective acknowledgments
948	   (SACKs) [RFC2018] to indicate which segments have arrived at the
949	   receiver.

951	   Since the most common cause of packet loss is congestion, TCP treats
952	   packet loss as a potential indication of Internet congestion along
953	   the path between TCP endhosts. This happens automatically, and the
954	   subnetwork need not know anything about IP or TCP. A subnetwork node
955	   simply drops packets whenever it must, though some packet-dropping
956	   strategies (e.g., RED) are more fair to competing flows than others.

958	   TCP recovers from packet losses in two different ways. The most
959	   important mechanism is the retransmission timeout. If an ACK fails to
960	   arrive after a certain period of time, TCP retransmits the oldest
961	   unacked packet. Taking this as a hint that the network is congested,
962	   TCP waits for the retransmission to be ACKed before it continues, and
963	   it gradually increases the number of packets in flight as long as a
964	   timeout does not occur again.

966	   A retransmission timeout can impose a significant performance
967	   penalty, as the sender is idle during the timeout interval and
968	   restarts with a congestion window of 1 following the timeout. To
969	   allow faster recovery from the occasional lost packet in a bulk
970	   transfer, an alternate scheme known as "fast recovery" was introduced
971	   [RFC2581] [RFC2582] [RFC2914] [TCPF98].

973	   Fast recovery relies on the fact that when a single packet is lost in
974	   a bulk transfer, the receiver continues to return ACKs to subsequent
975	   data packets that do not actually acknowledge any newly-received
976	   data. These are known as "duplicate acknowledgments" or "dupacks".
977	   The sending TCP can use dupacks as a hint that a packet has been lost
978	   and retransmit it without waiting for a timeout.  Dupacks effectively
979	   constitute a negative acknowledgment (NAK) for the packet sequence
980	   number in the acknowledgment field.  TCP waits until a certain number
981	   of dupacks (currently 3) are seen prior to assuming a loss has
982	   occurred; this helps avoid an unnecessary retransmission during out-
983	   of-sequence delivery.

985	   A new technique called "Explicit Congestion Notification" (ECN)
986	   [RFC3168] allows routers to directly signal congestion to hosts
987	   without dropping packets.  This is done by setting a bit in the IP
988	   header.  Since ECN support is likely to remain optional, the lack of
989	   an ECN bit must NEVER be interpreted as a lack of congestion.  Thus,
990	   for the foreseeable future, TCP must interpret a lost packet as a
991	   signal of congestion.

993	   The TCP "congestion avoidance" [RFC2581] algorithm maintains a
994	   congestion window (cwnd) controlling the amount of data TCP may have
995	   in flight at any moment.  Reducing cwnd reduces the overall bandwidth
996	   obtained by the connection; similarly, raising cwnd increases the
997	   performance, up to the limit of the available capacity.

999	   TCP probes for available network capacity by initially setting cwnd
1000	   to one or two packets and then increasing cwnd by one packet for each
1001	   ACK returned from the receiver. This is TCP's "slow start" mechanism.
1002	   When a packet loss is detected (or congestion is signaled by other
1003	   mechanisms), cwnd is reset to one and the slow start process is
1004	   repeated until cwnd reaches one half of its previous setting before
1005	   the reset. Cwnd continues to increase past this point, but at a much
1006	   slower rate than before. If no further losses occur, cwnd will
1007	   ultimately reach the window size advertised by the receiver.

1009	   This is an "Additive Increase, Multiplicative Decrease" (AIMD)
1010	   algorithm.  The steep decrease of cwnd in response to congestion
1011	   provides for network stability; the AIMD algorithm also provides for
1012	   fairness between long running TCP connections sharing the same path.
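
	   The window dynamics described above can be illustrated with a short
	   sketch in Python.  This is a simplified model that works at
	   round-trip granularity, not an implementation of any real TCP: the
	   function name, the per-round-trip doubling, the floor of two packets
	   on the threshold and the way losses are injected are all assumptions
	   made for illustration.

```python
def simulate_cwnd(rounds, loss_rounds, rwnd=64, initial_cwnd=1):
    """Return cwnd (in packets) at the start of each round trip.

    Simplified model (all parameters are illustrative assumptions):
      - slow start: one packet per ACK, i.e. cwnd doubles per round trip
      - past the threshold: cwnd grows by one packet per round trip
      - on loss: threshold = half the current cwnd, cwnd reset to one
    """
    cwnd = initial_cwnd
    ssthresh = rwnd  # no loss seen yet: slow start up to the receiver window
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Multiplicative decrease: remember half the window, restart at 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: exponential growth up to the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Additive increase, capped by the receiver's advertised window.
            cwnd = min(cwnd + 1, rwnd)
    return history


# One loss injected in the fifth round trip (index 4).
print(simulate_cwnd(14, {4}))
```

	   With a single loss injected in the fifth round trip, the window
	   climbs exponentially, collapses to one, slow-starts back to half its
	   previous value and then grows by one packet per round trip.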

1014	 8.5 TCP Performance Characteristics

1016	  Caveat

1018	   Here we present a current "state-of-the-art" understanding of TCP
1019	   performance.  This analysis attempts to characterize the performance
1020	   of TCP connections over links of varying characteristics.

1022	   Link designers may wish to use the techniques in this section to
1023	   predict what performance TCP/IP may achieve over a new link-layer
1024	   design.  Such analysis is encouraged.  Because this is a relatively
1025	   new analysis, and the theory is based on single-stream TCP
1026	   connections under "ideal" conditions, it should be recognized that
1027	   the results of such analysis may differ from actual performance in
1028	   the Internet.  That being said, we have done the best we can to
1029	   provide information which will help designers get an accurate picture
1030	   of the capabilities and limitations of TCP under various conditions.

1032	  8.5.1 The Formulae

1034	   The performance of TCP's AIMD Congestion Avoidance algorithm has been
1035	   extensively analyzed.  The current best formula for the performance
1036	   of the specific algorithms used by Reno TCP (i.e., the TCP specified
1037	   in [RFC2581]) is given by Padhye, et al [PFTK98].  This formula is:

1039	                                         MSS
1040	           BW = --------------------------------------------------------
1041	                RTT*sqrt(1.33*p) + RTO*p*[1+32*p^2]*min[1,3*sqrt(.75*p)]

1043	   where

1045	           BW   is the maximum TCP throughput achievable by an
1046	                individual TCP flow
1047	           MSS  is the TCP segment size being used by the connection
1048	           RTT  is the end-to-end round trip time of the TCP connection
1049	           RTO  is the packet timeout (based on RTT)
1050	           p    is the packet loss rate for the path
1051	                (e.g., 0.01 if there is 1% packet loss)

1053	   Note that the speed of the links making up the Internet path does not
1054	   explicitly appear in this formula. Attempting to send faster than the
1055	   slowest link in the path causes the queue to grow at the transmitter
1056	   driving the bottleneck. This increases the RTT, which in turn reduces
1057	   the achievable throughput.

1059	   This is currently considered to be the best approximate formula for
1060	   Reno TCP performance.  A further simplification to this formula is
1061	   generally made by assuming that RTO is approximately 5*RTT.

1063	   TCP is constantly being improved.  A simpler formula, which gives an
1064	   upper bound on the performance of any AIMD algorithm which is likely
1065	   to be implemented in TCP in the future, was derived by Ott, et al
1066	   [MSMO97].

1068	                     MSS   1
1069	           BW = C    --- -------
1070	                     RTT sqrt(p)

1072	   where C is 0.93.
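
	   Both formulae are easy to evaluate numerically.  The following Python
	   sketch is one way to do so.  The function names are hypothetical, BW
	   is returned in bits per second, and the default RTO of 5*RTT is the
	   simplification mentioned above; a different RTO assumption yields a
	   different result.  The sample parameters in the loop are purely
	   illustrative.

```python
import math


def bw_simple(mss_bytes, rtt, p, c=0.93):
    """Upper-bound formula: BW = C * MSS / (RTT * sqrt(p)), in bit/s."""
    return c * (mss_bytes * 8) / (rtt * math.sqrt(p))


def bw_reno(mss_bytes, rtt, p, rto=None):
    """Reno formula from the text, in bit/s.

    rto defaults to 5 * rtt, the simplifying assumption noted in the text.
    """
    if rto is None:
        rto = 5 * rtt
    denominator = (
        rtt * math.sqrt(1.33 * p)
        + rto * p * (1 + 32 * p ** 2) * min(1.0, 3 * math.sqrt(0.75 * p))
    )
    return (mss_bytes * 8) / denominator


# Illustrative values only: 1460-byte MSS, 100 msec RTT, three loss rates.
for loss in (0.0001, 0.001, 0.01):
    print(loss, round(bw_simple(1460, 0.100, loss)), round(bw_reno(1460, 0.100, loss)))
```

	   Both functions take RTT and RTO in seconds and p as a fraction
	   between 0 and 1 (exclusive of 0, where the formulae diverge).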

1074	  8.5.2 Assumptions

1076	   Both formulae assume that the TCP Receiver Window is not limiting the
1077	   performance of the connection.  Because the receiver window is
1078	   entirely determined by end-hosts, we assume that hosts will maximize
1079	   the announced receiver window to maximize their network performance.

1081	   Both of these formulae allow BW to become infinite if there is no
1082	   loss.  However, an Internet path will drop packets at bottleneck
1083	   queues if the load is too high.  Thus, a completely lossless TCP/IP
1084	   network can never occur (unless the network is being underutilized).

1086	   The RTT used is the arithmetic average, including queuing delays.

1088	   The formulae are for a single TCP connection.  If a path carries many
1089	   TCP connections, each will follow the formulae above independently.

1091	   The formulae assume long-running TCP connections.  For connections
1092	   that are extremely short (<10 packets) and don't lose any packets,
1093	   performance is driven by the TCP slow-start algorithm.  For
1094	   connections of medium length, where on average only a few segments
1095	   are lost, single connection performance will actually be slightly
1096	   better than given by the formulae above.

1098	   The difference between the simple and complex formulae above is that
1099	   the complex formula includes the effects of TCP retransmission
1100	   timeouts.  For very low levels of packet loss (significantly less
1101	   than 1%), timeouts are unlikely to occur, and the formulae lead to
1102	   very similar results.  At higher packet losses (1% and above), the
1103	   complex formula gives a more accurate estimate of performance (which
1104	   will always be significantly lower than the result from the simple
1105	   formula).

1107	   Note that these formulae break down as p approaches 100%.

1109	  8.5.3 Analysis of Link-Layer Effects on TCP Performance

1111	   Consider the following example:

1113	   A designer invents a new wireless link layer which, on average, loses
1114	   1% of IP packets.  The link layer supports packets of up to 1040
1115	   bytes, and has a one-way delay of 20 msec.

1117	   If this link layer were used in the Internet, on a path that
1118	   otherwise had a round trip of 80 msec, you could compute an upper
1119	   bound on the performance as follows:

1121	   For MSS, use 1000 bytes to exclude the 40 bytes of minimum IPv4 and
1122	   TCP headers.

1124	   For RTT, use 120 msec (80 msec for the Internet part, plus 20 msec
1125	   each way for the new wireless link).

1127	   For p, use .01.  For C, assume 1.

1129	   The simple formula gives:

1131	   BW = (1000 * 8 bits) / (.120 sec * sqrt(.01)) = 666 kbit/sec

1133	   The more complex formula gives:

1135	   BW = 402.9 kbit/sec

1137	   If this were a 2 Mb/s wireless LAN, the designers might be somewhat
1138	   disappointed.

1140	   Some observations on performance:

1142	   1.  We have assumed that the packet losses on the link layer are
1143	   interpreted as congestion by TCP.  This is a "fact of life" that must
1144	   be accepted.

1146	   2.  The equations for TCP performance are all expressed in terms of
1147	   packet loss, but many subnetwork designers think in terms of bit-
1148	   error ratio.  *If* channel bit errors are independent, then the
1149	   probability of a packet being corrupted is:

1151	   p = 1 - ([1 - BER]^[FRAME_SIZE*8])

1153	   Here we assume FRAME_SIZE is in bytes and "^" represents
1154	   exponentiation. FRAME_SIZE includes the user data and all headers
1155	   (TCP, IP and subnetwork).  (Note: this analysis assumes the
1156	   subnetwork does not perform ARQ or transparent fragmentation
1157	   [RFC3366].)  If the inequality

1159	   BER * [FRAME_SIZE*8] << 1

1161	   holds, the packet loss probability p can be approximated by:

1163	   p = BER * [FRAME_SIZE*8]

1165	   These equations can be used to apply BER to the performance equations
1166	   above.

1168	   Note that FRAME_SIZE can vary from one packet to the next.  Small
1169	   packets (such as TCP acks) generally have a smaller probability of
1170	   packet error than, say, a TCP packet carrying one MSS (maximum
1171	   segment size) of user data.  A flow of small TCP acks can be expected
1172	   to be slightly more reliable than a stream of larger TCP data
1173	   segments.

1175	   It bears repeating that the above analysis assumes that bit errors
1176	   are statistically independent. Because this is not true for many real
1177	   links, our computation of p is actually an upper bound, not the exact
1178	   probability of packet loss.

1180	   There are many reasons why bit errors are not independent on real
1181	   links.  Many radio links are affected by propagation fading or by
1182	   interference that lasts over many bit times.

1184	   Also, links with Forward Error Correction (FEC) generally have very
1185	   non-uniform bit error distributions that depend on the type of FEC,
1186	   but in general the uncorrected errors tend to occur in bursts even
1187	   when channel symbol errors are independent.  In all such cases our
1188	   computation of p from BER can only place an upper limit on the packet
1189	   loss rate.

1191	   If the distribution of errors under the FEC scheme is known, one
1192	   could apply the same type of analysis as above, using the correct
1193	   distribution function for the BER.  It is more likely in these FEC
1194	   cases, however, that empirical methods are needed to determine the
1195	   actual packet loss rate.

1197	   3.  Note that the packet size plays an important role.  If the
1198	   subnetwork loss characteristics are such that large packets have the
1199	   same probability of loss as smaller packets, then larger packets will
1200	   yield improved performance.

1202	   4.  We have chosen a specific RTT that might occur on a wide-area
1203	   Internet path within the USA.  It is important to recognize that a
1204	   variety of RTT values are experienced in the Internet.

1206	   For example, RTTs are typically less than 10 msec in a wired LAN
1207	   environment when communicating with a local host.  International
1208	   connections may have RTTs of 200 msec or more.  Modems and other low-
1209	   capacity links can add considerable delay due to their long packet
1210	   transmission (serialisation) times.

1212	   Links over geostationary repeater satellites have one-way speed-of-
1213	   light delays of around 250ms: a minimum of 125ms propagation delay up
1214	   to the satellite and 125ms down. The RTT of an end-to-end TCP
1215	   connection that includes such a link can be expected to be greater
1216	   than 250ms.

1218	   Queues on heavily-congested links may back up, increasing RTTs.
1219	   Finally, virtual private networks (VPNs) and other forms of
1220	   encryption and tunneling can add significant end-to-end delay to
1221	   network connections.
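
	   Returning to observation 2 above, the conversion from bit-error
	   ratio to packet loss probability can be sketched in Python as
	   follows.  The function names and the sample BER are hypothetical,
	   and the computation rests on the same assumption as the text: bit
	   errors are independent and the subnetwork performs no ARQ.

```python
def packet_loss_exact(ber, frame_size_bytes):
    """p = 1 - (1 - BER)^(FRAME_SIZE*8), assuming independent bit errors."""
    return 1.0 - (1.0 - ber) ** (frame_size_bytes * 8)


def packet_loss_approx(ber, frame_size_bytes):
    """p ~= BER * FRAME_SIZE*8, valid only when BER * FRAME_SIZE*8 << 1."""
    return ber * frame_size_bytes * 8


# Illustrative BER of 1e-6: a 40-byte ACK versus a 1040-byte data packet.
for size in (40, 1040):
    print(size, packet_loss_exact(1e-6, size), packet_loss_approx(1e-6, size))
```

	   The approximation always slightly overestimates the exact value, and
	   the gap widens as BER * FRAME_SIZE*8 approaches 1.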

1223	9 Quality-of-Service (QoS) considerations

1225	   It is generally recognized that specific service guarantees are
1226	   needed to support real-time multimedia, toll-quality telephony and
1227	   other performance-critical applications. The provision of such
1228	   Quality of Service guarantees in the Internet is an active area of
1229	   research and standardization. The IETF has not converged on a single
1230	   service model, set of services or single mechanism that will offer
1231	   useful guarantees to applications and be scalable to the Internet.
1232	   Indeed, the IETF does not have a single definition of Quality of
1233	   Service.  [RFC2990] represents a current understanding of the
1234	   challenges in architecting QoS for the Internet.

1236	   There are presently two architectural approaches to providing
1237	   mechanisms for QoS support in the Internet.

1239	   IP Integrated Services (Intserv) [RFC1633] provides fine-grained
1240	   service guarantees to individual flows.  Flows are identified by a
1241	   flow specification (flowspec), which creates a stateful association
1242	   between individual packets by matching fields in the packet header.
1243	   Capacity is reserved for the flow, and appropriate traffic
1244	   conditioning and scheduling is installed in routers along the path.
1245	   The ReSerVation Protocol (RSVP) [RFC2205, RFC2210] is usually, but
1246	   need not necessarily be, used to install the flow QoS state.  Intserv
1247	   defines two services, in addition to the Default (best effort)
1248	   service.

1250	     -- Guaranteed Service (GS) [RFC2212] offers hard upper bounds on
1251	     delay to flows that conform to a traffic specification (TSpec).  It
1252	     uses a fluid-flow model to relate the TSpec and reserved bandwidth
1253	     (RSpec) to variable delay.  Non-conforming packets are forwarded on
1254	     a best-effort basis.

1256	     -- Controlled Load Service (CLS) [RFC2211] offers delay and packet
1257	     loss equivalent to that of an unloaded network to flows that
1258	     conform to a TSpec, but no hard bounds. Non-conforming packets are
1259	     forwarded on a best-effort basis.

1261	   Intserv requires installation of state information in every
1262	   participating router. Performance guarantees cannot be made unless
1263	   this state is present in every router along the path.  This, along
1264	   with RSVP processing and the need for usage-based accounting, is
1265	   believed to have scalability problems, particularly in the core of
1266	   the Internet [RFC2208].

1268	   IP Differentiated Services (Diffserv) [RFC2475] provides a "toolkit"
1269	   offering coarse-grained controls to aggregates of flows.  Diffserv in
1270	   itself does NOT provide QoS guarantees, but can be used to construct
1271	   services with QoS guarantees across a Diffserv domain.  Diffserv
1272	   attempts to address the scaling issues associated with Intserv by
1273	   requiring state awareness only at the edge of a Diffserv domain.  At
1274	   the edge, packets are classified into flows, and the flows are
1275	   conditioned (marked, policed or shaped) to a traffic conditioning
1276	   specification (TCS).  A Diffserv Codepoint (DSCP), identifying a per-
1277	   hop behavior (PHB), is set in each packet header.  The DSCP is
1278	   carried in the DS-field, subsuming six bits of the former Type-of-
1279	   Service (ToS) byte [RFC791] of the IP header [RFC2474].   The PHB
1280	   denotes the forwarding behavior to be applied to the packet in each
1281	   node in the Diffserv domain. Although there is a "recommended" DSCP
1282	   associated with each PHB, the mappings from DSCPs to PHBs are defined
1283	   by the DS-domain.  In fact, there can be several DSCPs associated
1284	   with the same PHB.  Diffserv presently defines three PHBs.

1286	   The class selector PHB [RFC2474] replaces the IP precedence field of
1287	   the former ToS byte. It offers relative forwarding priorities.

1289	   The Expedited Forwarding (EF) PHB [RFC2598] guarantees that packets
1290	   will have a well-defined minimum departure rate which, if not
1291	   exceeded, ensures that the associated queues are short or empty.  EF
1292	   is intended to support services that offer tightly-bounded loss,
1293	   delay and delay jitter.

1295	   The Assured Forwarding (AF) PHB group [RFC2597] offers different
1296	   levels of forwarding assurance for each aggregated flow of packets.
1297	   Each AF group is independently allocated forwarding resources.
1298	   Packets are marked with one of three drop precedences; those with the
1299	   highest drop precedence are dropped with higher probability than
1300	   those marked with the lowest drop precedence.  DSCPs are recommended
1301	   for four independent AF groups, although a DS domain can have more or
1302	   fewer AF groups.

1304	   Ongoing work in the IETF is addressing ways to support Intserv with
1305	   Diffserv.  There is some belief (e.g. as expressed in [RFC2990])
1306	   that such an approach will allow individual flows to receive service
1307	   guarantees and scale to the global Internet.

1309	   The QoS guarantees that can be offered by the IP layer are a product
1310	   of two factors:

1312	     -- the concatenation of the QoS guarantees offered by the subnets
1313	     along the path of a flow. This implies that a subnet may wish to
1314	     offer multiple services (with different QoS guarantees) to the IP
1315	     layer, which can then determine which flows use which subnet
1316	     service.  To put it another way, forwarding behavior in the subnet
1317	     needs to be 'clued' by the forwarding behavior (service or PHB) at
1318	     the IP layer, and

1320	     -- the operation of a set of cooperating mechanisms, such as
1321	     bandwidth reservation and admission control, policy management,
1322	     traffic classification, traffic conditioning (marking, policing
1323	     and/or shaping), selective discard, queuing and scheduling.  Note
1324	     that support for QoS in subnets may require similar mechanisms,
1325	     especially when these subnets are general topology subnets (e.g.,
1326	     ATM, frame relay or MPLS) or shared media subnets.

1328	   Many subnetwork designers face inherent tradeoffs between delay,
1329	   throughput, reliability and cost. Other subnetworks have parameters
1330	   that manage bandwidth, internal connection state, and the like.
1331	   Therefore, the following subnetwork capabilities may be desirable,
1332	   although some might be trivial or moot if the subnet is a dedicated
1333	   point-to-point link.

1335	     - The subnetwork should have the ability to reserve bandwidth for a
1336	     connection or flow and schedule packets accordingly.

1338	     - Bandwidth reservations should be based on a one- or two-token
1339	     bucket model, depending on whether the service is intended to
1340	     support constant-rate or bursty traffic.

1342	     - If a connection or flow does not use its reserved bandwidth at a
1343	     given time, the unused bandwidth should be available for other
1344	     flows.

1346	     - Packets in excess of a connection or flow's agreed rate should be
1347	     forwarded as best-effort or discarded, depending on the service
1348	     offered by the subnet to the IP layer.

1350	     - If a subnet contains error control mechanisms (retransmission
1351	     and/or FEC), it should be possible for the IP layer to influence
1352	     the inherent tradeoffs between uncorrected errors, packet losses
1353	     and delay.  These capabilities at the subnet/IP layer service
1354	     boundary correspond to selection of more or less error control
1355	     and/or to selection of particular error control mechanisms within
1356	     the subnetwork.

1358	     - The subnet layer should know, and be able to inform the IP layer,
1359	     how much fixed delay and delay jitter it offers for a flow or
1360	     connection.  If the Intserv model is used, the delay jitter
1361	     component may best be expressed in terms of the TSpec/RSpec model
1362	     described in [RFC2212].

1364	     - Support of the Diffserv class selectors [RFC2474] suggests that
1365	     the subnet might consider mechanisms that support priorities.
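
	   The single token bucket model mentioned in the list above can be
	   sketched in Python as follows.  The class and method names are
	   hypothetical, time is supplied by the caller in seconds, and what
	   happens to a non-conforming packet (best-effort forwarding or
	   discard) is left to the caller, since it depends on the service the
	   subnet offers.

```python
class TokenBucket:
    """Single token bucket: sustained rate plus a bounded burst."""

    def __init__(self, rate_bytes_per_sec, depth_bytes):
        self.rate = rate_bytes_per_sec
        self.depth = depth_bytes
        self.tokens = depth_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous check, in seconds

    def conforms(self, now, packet_bytes):
        """Return True if the packet fits within the reservation."""
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        # Non-conforming: the caller forwards as best-effort or discards.
        return False
```

	   A two-token-bucket model would add a second bucket that limits the
	   peak rate; it is omitted here for brevity.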

1367	10 Fairness vs Performance

1369	   Subnetwork designers should be aware of the tradeoffs between
1370	   fairness and efficiency inherent in many transmission scheduling
1371	   algorithms. For example, many local area networks use contention
1372	   protocols to resolve access to a shared transmission channel.  These
1373	   protocols represent overhead. Limiting the amount of data that a
1374	   subnet node may transmit per contention cycle helps assure timely
1375	   access to the channel for each subnet node, but it also increases
1376	   contention overhead per unit of data sent.

1378	   In some mobile radio networks, capacity is limited by interference,
1379	   which in turn depends on average transmitter power. Some receivers
1380	   may require considerably more transmitter power (generating more
1381	   interference and consuming more channel capacity) than others.

1383	   In each case, the scheduling algorithm designer must balance
1384	   competing objectives: providing a fair share of capacity to each
1385	   subnet node while maximizing the total capacity of the network.  One
1386	   approach for balancing performance and fairness is outlined in
1387	   [ES00].

1389	11 Delay Characteristics

1391	   The TCP sender bases its retransmission timeout (RTO) on measurements
1392	   of the round trip delay experienced by previous packets. This allows
1393	   TCP to adapt automatically to the very wide range of delays found on
1394	   the Internet. The recommended algorithms are described in [RFC2988].
1395	   Evaluations of TCP's retransmission timer can be found in [AP99] and
1396	   [LS00].

1398	   These algorithms model the delay along an Internet path as a
1399	   normally-distributed random variable with slowly-varying mean and
1400	   standard deviation. TCP estimates these two parameters by
1401	   exponentially smoothing individual delay measurements, and it sets
1402	   the RTO to the estimated mean delay plus some fixed number of
1403	   standard deviations. (The algorithm actually uses mean deviation as
1404	   an approximation to standard deviation, as it is easier to compute.)
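
	   The estimator described above can be sketched in Python as follows.
	   The class name is hypothetical.  The smoothing gains (1/8 and 1/4),
	   the multiplier of four deviations and the one-second minimum follow
	   our reading of [RFC2988]; the clock-granularity term in that
	   specification is omitted, so this is an illustration rather than a
	   conforming implementation.

```python
class RtoEstimator:
    """Smoothed RTT / mean-deviation estimator for the RTO."""

    ALPHA = 1 / 8   # gain for the smoothed mean (assumed from RFC 2988)
    BETA = 1 / 4    # gain for the smoothed mean deviation
    K = 4           # number of deviations added to the mean
    MIN_RTO = 1.0   # lower bound on the RTO, in seconds

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the deviation first, using the old smoothed mean.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return max(self.MIN_RTO, self.srtt + self.K * self.rttvar)
```

	   Feeding the estimator a series of identical samples shows the
	   deviation term decaying, so the RTO converges toward the mean delay
	   (or the minimum, whichever is larger).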

1406	   The goal is to compute a RTO that is small enough to detect and
1407	   recover from packet losses while minimizing unnecessary ("spurious")
1408	   retransmissions when packets are unexpectedly delayed but not lost.
1409	   Although these goals conflict, the algorithm works well when the
1410	   delay variance along the Internet path is low, or the packet loss
1411	   rate is low.

1413	   If the path delay variance is high, TCP sets a RTO that is much
1414	   larger than the mean of the measured delays. But if the packet loss
1415	   rate is low, the large RTO is of little consequence, as timeouts
1416	   occur only rarely.  Conversely, if the path delay variance is low,
1417	   then TCP recovers quickly from lost packets; again, the algorithm
1418	   works well.  However, when delay variance and the packet loss rate
1419	   are both high, these algorithms perform poorly, especially when the
1420	   mean delay is also high.

1422	   Because TCP uses returning acknowledgments as a "clock" to time the
1423	   transmission of additional data, excessively high delays (even if the
1424	   delay variance is low) also affect TCP's ability to fully utilize a
1425	   high-speed transmission pipe. Such delays also slow the recovery of
1426	   lost packets even when delay variance is small.

1428	   Subnetwork designers should therefore minimize all three parameters
1429	   (delay, delay variance and packet loss) as much as possible.

1431	   In many subnetworks, these parameters are inherently in conflict.
1432	   For example, on a mobile radio channel the subnetwork designer can
1433	   use retransmission (ARQ) and/or forward error correction (FEC) to
1434	   trade off delay, delay variance and packet loss in an effort to
1435	   improve TCP performance. For instance, while ARQ increases delay
1436	   variance, FEC does not. However, FEC (especially when combined with
1437	   interleaving) often increases mean delay even on good channels where
1438	   ARQ retransmissions are not needed and ARQ would not increase either
1439	   the delay or the delay variance.

1441	   The tradeoffs among these error control mechanisms and their
1442	   interactions with TCP can be quite complex, and are the subject of
1443	   much ongoing research. We therefore recommend that subnetwork
1444	   designers provide as much flexibility as possible in the
1445	   implementation of these mechanisms, and provide access to them as
1446	   discussed above in the section on Quality of Service.

1448	12 Bandwidth Asymmetries

1450	   Some subnetworks may provide asymmetric bandwidth (or may cause TCP
1451	   packet flows to experience asymmetry in the capacity) and the
1452	   Internet protocol suite will generally still work fine.  However,
1453	   there is a case when such a scenario reduces TCP performance.  Since
1454	   TCP data segments are 'clocked' out by returning acknowledgments, TCP
1455	   senders are limited by the rate at which ACKs can be returned
1456	   [BPK98].  Therefore, when the ratio of the bandwidth of the
1457	   subnetwork carrying the data to the bandwidth of the subnetwork
1458	   carrying the acknowledgments is too large, the slow return of the
1459	   ACKs directly impacts performance.  Since ACKs are generally smaller
1460	   than data segments, TCP can tolerate some asymmetry, but as a general
1461	   rule designers of subnetworks should be aware that subnetworks with
1462	   significant asymmetry can result in reduced performance, unless
1463	   steps are taken to mitigate this [RFC3449].

1465	   Several strategies have been identified for reducing the impact of
1466	   asymmetry of the network path between two TCP end hosts, e.g.
1467	   [RFC3449].  These techniques attempt to reduce the number of ACKs
1468	   transmitted over the return path (low bandwidth channel) by changes
1469	   at the end host(s), and/or by modification of subnetwork packet
1470	   forwarding. While these solutions may mitigate the performance issues
1471	   caused by asymmetric subnetworks, they do have associated cost and
1472	   may have other implications. A fuller discussion of strategies and
1473	   their implications is provided in [RFC3449].

1475	13 Buffering, flow & congestion control

1477	   Many subnets include multiple links with varying traffic demands and
1478	   possibly different transmission speeds. At each link there must be a
1479	   queuing system, including buffering, scheduling and a capability to
1480	   discard excess subnet packets.  These queues may also be part of a
1481	   subnet flow control or congestion control scheme.

1483	   For the purpose of this discussion, we talk about packets without
1484	   regard to whether they refer to a complete IP packet or a subnetwork
1485	   frame.  At each queue, a packet experiences a delay that depends on
1486	   competing traffic and the scheduling discipline, and is subjected to
1487	   a local discarding policy.

1489	   Some subnets may have flow or congestion control mechanisms in
1490	   addition to packet dropping.  Such mechanisms can operate on
1491	   components in the subnet layer, such as schedulers, shapers or
1492	   discarders, and can affect the operation of IP forwarders at the
1493	   edges of the subnet.  However, with the exception of Explicit
1494	   Congestion Notification [RFC3168] (discussed below), IP has no way to
1495	   pass explicit congestion or flow control signals to TCP.

1497	   TCP traffic, especially aggregated TCP traffic, is bursty.  As a
1498	   result, instantaneous queue depths can vary dramatically, even in
1499	   nominally stable networks.  For optimal performance, packets should
1500	   be dropped in a controlled fashion, not just when buffer space is
1501	   unavailable.  How much buffer space should be supplied is still a
1502	   matter of debate, but as a rule of thumb, each node should have
1503	   enough buffering to hold one link_bandwidth*link_delay product's
1504	   worth of data for each TCP connection sharing the link.

1506	   This is often difficult to estimate, since it depends on parameters
1507	   beyond the subnetwork's control or knowledge. Internet nodes
1508	   generally do not implement admission control policies, and cannot
1509	   limit the number of TCP connections that use them.  In general, it is
1510	   wise to err in favor of too much buffering rather than too little.
1511	   It may also be useful for subnets to incorporate mechanisms that
1512	   measure propagation delays to assist in buffer sizing calculations.

1514	   There is a rough consensus in the research community that active
1515	   queue management is important to improving fairness, link utilization
1516	   and throughput [RFC2309].  Although there are questions and concerns
1517	   about the effectiveness of active queue management (e.g., [MBDL99]),
1518	   it is widely considered an improvement over tail-drop discard
1519	   policies.

1521	   One form of active queue management is the Random Early Detection
1522	   (RED) algorithm [RED93], actually a family of related algorithms. In
1523	   one version of RED, an exponentially-weighted moving average of the
1524	   queue depth is maintained:

1526	     When this average queue depth is between a maximum threshold
1527	     max_th, and a minimum threshold min_th, packets are dropped with a
1528	     probability which is proportional to the amount by which the
1529	     average queue depth exceeds min_th.

1531	     When this average queue depth is equal to max_th, the drop
1532	     probability is equal to a configurable parameter max_p.

1534	     When this average queue depth is greater than max_th, packets are
1535	     always dropped.  Numerous variants on RED appear in the literature,
1536	     and there are other active queue management algorithms which claim
1537	     various advantages over RED [GM02].
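
   The version of RED described above might be sketched as follows.
   This is an illustration, not the algorithm as specified in [RED93]:
   the class name, the averaging weight and the use of zero drop
   probability below min_th are assumptions made for the example.

```python
import random


class RedQueue:
    """Minimal sketch of one RED variant (names are hypothetical)."""

    def __init__(self, min_th, max_th, max_p, weight=0.002, rng=None):
        if not 0 <= min_th < max_th:
            raise ValueError("require 0 <= min_th < max_th")
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p
        self.weight = weight          # EWMA weight (assumed value)
        self.avg = 0.0                # averaged queue depth
        self.rng = rng or random.Random()

    def update_average(self, queue_depth):
        # Exponentially-weighted moving average of the queue depth.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_depth
        return self.avg

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0                # assumption: no early drops
        if self.avg > self.max_th:
            return 1.0                # always drop above max_th
        # Proportional to the excess over min_th; equals max_p at max_th.
        span = self.max_th - self.min_th
        return self.max_p * (self.avg - self.min_th) / span

    def should_drop(self, queue_depth):
        self.update_average(queue_depth)
        return self.rng.random() < self.drop_probability()
```

   A caller would invoke should_drop() with the instantaneous queue
   depth on each packet arrival; the parameter values chosen strongly
   affect behavior.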

1539	   With an active queue management algorithm, dropped packets become a
1540	   feedback signal to trigger more appropriate congestion behavior by
1541	   the TCPs in the end hosts.  Randomization of dropping tends to break
1542	   up the observed tendency of TCP windows belonging to different TCP
1543	   connections to become synchronized by correlated drops, and it also
1544	   imposes a degree of fairness on those connections that properly
1545	   implement TCP congestion avoidance.  Another important property of
1546	   active queue management algorithms is that they attempt to keep
1547	   average queue depths short while accommodating large short term
1548	   bursts.

1550	   Since TCP neither knows nor cares whether congestive packet loss
1551	   occurs at the IP layer or in a subnet, it may be advisable for
1552	   subnets that perform queuing and discarding to consider implementing
1553	   some form of active queue management.  This is especially true if
1554	   large aggregates of TCP connections are likely to share the same
1555	   queue.  However, active queue management may be less effective in the
1556	   case of many queues carrying smaller aggregates of TCP connections,
1557	   e.g., in an ATM switch that implements per-VC queuing.

1559	   Note that the performance of active queue management algorithms is
1560	   highly sensitive to settings of configurable parameters, and also to
1561	   factors such as RTT [MBB00] [FB00].

1563	   Some subnets, most notably ATM, perform segmentation and reassembly
1564	   at the subnetwork edges.  Care should be taken here in designing
1565	   discard policies.  If the subnet discards a fragment of an IP packet,
1566	   then the remaining fragments become an unproductive load on the
1567	   subnet that can markedly degrade end-to-end performance [RF95].
1568	   Subnetworks should therefore attempt to discard these extra fragments
1569	   whenever one of them must be discarded.  If the IP packet has already
1570	   been partially forwarded when discarding becomes necessary, then
1571	   every remaining fragment except the one marking the end of the IP
1572	   packet should also be discarded.  For ATM subnets, this specifically
1573	   means using Early Packet Discard and Partial Packet Discard [ATMFTM].
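
   The discard policy described above can be modeled in a few lines.
   This is a simplified sketch rather than the ATM Forum mechanisms
   themselves: the class, the per-packet identifier and the two
   thresholds are hypothetical, and fragments are represented only by
   a packet identifier and an end-of-packet flag.

```python
from collections import deque


class FragmentDiscarder:
    """Queue that avoids carrying fragments of already-damaged packets."""

    def __init__(self, capacity, epd_threshold):
        self.capacity = capacity            # hard queue limit
        self.epd_threshold = epd_threshold  # early-discard threshold
        self.queue = deque()
        self.mode = {}                      # packet_id -> current mode

    def enqueue(self, packet_id, is_last):
        mode = self.mode.get(packet_id)
        if mode is None:
            # First fragment: refuse the whole packet if the queue is
            # already long (early discard), otherwise start forwarding.
            if len(self.queue) >= self.epd_threshold:
                mode = "discard_all"
            else:
                mode = "forward"
        has_room = len(self.queue) < self.capacity
        accepted = False
        if mode == "forward":
            if has_room:
                accepted = True
            else:
                # Forced drop mid-packet: discard the rest as well.
                mode = "discard_rest"
        elif mode == "discard_rest":
            # Keep only the fragment marking the end of the packet.
            accepted = is_last and has_room
        if accepted:
            self.queue.append((packet_id, is_last))
        if is_last:
            self.mode.pop(packet_id, None)
        else:
            self.mode[packet_id] = mode
        return accepted

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

   The end-of-packet fragment is retained after a partial discard so
   that the reassembler can still find the packet boundary.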

1575	   Some subnets include flow control mechanisms that effectively require
1576	   that the rate of traffic flows be shaped on entry to the subnet.  One
1577	   example of such a subnet mechanism is the ATM Available Bit Rate
1578	   (ABR) service category [ATMFTM].  Such flow control mechanisms have
1579	   the effect of making the subnet nearly lossless by pushing congestion
1580	   into the IP routers at the edges of the subnet.  In such a case,
1581	   adequate buffering and discard policies are needed in these routers
1582	   to deal with a subnet that appears to have varying bandwidth.
1583	   Whether there is benefit in this kind of flow control is
1584	   controversial; there are numerous simulation and analytical studies
1585	   that go both ways.  It appears that some of the issues that lead to
1586	   such different results include sensitivity to ABR parameters, use of
1587	   binary rather than explicit rate feedback, use (or not) of per-VC
1588	   queuing, and the specific ATM switch algorithms selected for the
1589	   study.  Anecdotally, some large networks have used IP over ABR to
1590	   carry TCP traffic, have claimed it to be successful, but have
1591	   published no results.

1593	   Another possible approach to flow control in the subnet would be to
1594	   work with TCP Explicit Congestion Notification (ECN) semantics
1595	   [RFC3168] through utilizing explicit congestion indicators in subnet
1596	   frames.  Routers at the edges of the subnet, rather than shaping,
1597	   would set the explicit congestion bit in those IP packets that are
1598	   received in subnet frames that have an ECN indication.  Nodes in the
1599	   subnet would need to implement an active queue management protocol
1600	   that marks subnet frames instead of dropping them.

1602	   ECN is currently a proposed standard, but it is not yet widely
1603	   deployed.

1605	14 Compression

1607	   Application data compression is a function that can usually be
1608	   omitted in the subnetwork. The endpoints typically have more CPU and
1609	   memory resources to run a compression algorithm and a better
1610	   understanding of what is being compressed.  End-to-end compression
1611	   benefits every network element in the path, while subnetwork-layer
1612	   compression, by definition, benefits only a single subnetwork.

1614	   Data presented to the subnetwork layer may already be in compressed
1615	   format (e.g., a JPEG file), compressed at the application layer
1616	   (e.g., the optional "gzip", "compress", and "deflate" compression in
1617	   HTTP/1.1 [RFC2616]), or compressed at the IP layer (the IP Payload
1618	   Compression Protocol [RFC2393] supports DEFLATE [RFC2394] and LZS
1619	   [RFC2395]).  Compression at the subnetwork edges is of no benefit for
1620	   any of these cases.

1622	   The subnetwork may also process data that has been encrypted by the
1623	   application (OpenPGP [RFC2440] or S/MIME [RFC2633]), just above TCP
1624	   (SSL, TLS [RFC2246]), or just above IP (IPsec ESP [RFC2406]). Ciphers
1625	   generate high entropy bit streams lacking any patterns that can be
1626	   exploited by a compression algorithm.

1628	   However, much data is still transmitted uncompressed over the
1629	   Internet, so subnetwork compression may be beneficial.  Any
1630	   subnetwork compression algorithm must not expand incompressible data,
1631	   e.g., data that has already been compressed or encrypted.

1633	   We make a strong recommendation that subnetworks operating at low
1634	   speed or with small MTUs compress IP and transport-level headers (TCP
1635	   and UDP) using several header compression schemes developed within
1636	   the IETF. An uncompressed 40-byte TCP/IP header takes about 33
1637	   milliseconds to send at 9600 bps.  "VJ" TCP/IP header compression
1638	   [RFC1144] compresses most headers to 3-5 bytes, reducing transmission
1639	   time to several milliseconds on dialup modem links. This is
1640	   especially beneficial for small, latency-sensitive packets in
1641	   interactive sessions.
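
   The arithmetic behind these figures is simple serialization delay.
   The helper below is a sketch with a hypothetical name; it ignores
   framing overhead and modem compression.

```python
def serialization_ms(num_bytes, link_bps):
    """Time in milliseconds to clock num_bytes onto a link_bps link."""
    return num_bytes * 8 * 1000 / link_bps


if __name__ == "__main__":
    # Uncompressed 40-byte TCP/IP header versus 3- and 5-byte
    # compressed headers on a 9600 bps link.
    for size in (40, 5, 3):
        print(size, "bytes:", round(serialization_ms(size, 9600), 1), "ms")
```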

1643	   Similarly, RTP compression schemes such as CRTP [RFC2508] and ROHC
1644	   [RFC3095] compress most IP/UDP/RTP headers to one to four bytes.  The
1645	   resulting savings are especially significant when audio packets are
1646	   kept small to minimize store-and-forward latency.

1648	   Designers should consider the effect of the subnetwork error rate on
1649	   the performance of header compression. TCP ordinarily recovers from
1650	   lost packets by retransmitting only those packets that were actually
1651	   lost; packets arriving correctly after a packet loss are kept on a
1652	   resequencing queue and do not need to be retransmitted.  In VJ TCP/IP
1653	   [RFC1144] header compression, however, the receiver cannot explicitly
1654	   notify a sender of data corruption and subsequent loss of
1655	   synchronization between compressor and decompressor. It relies
1656	   instead on TCP retransmission to re-synchronize the decompressor.
1657	   After a packet is lost, the decompressor must discard every
1658	   subsequent packet, even if the subnetwork makes no further errors,
1659	   until the sending TCP retransmits to re-synchronize the decompressor.
1660	   This effect can substantially magnify the effect of subnetwork packet
1661	   losses if the sending TCP window is large, as it will often be on a
1662	   path with a large bandwidth*delay product [LRKOJ99].
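
   The magnification effect can be illustrated with a toy model.  It
   assumes, for the sake of the example, that one window of packets
   crosses the link, that exactly one of them is lost, and that no
   retransmission arrives before the window ends; the function name
   is hypothetical.

```python
def delivered_after_loss(window_packets, lost_index, vj_compression):
    """Return the indices of packets that reach the receiving TCP.

    Without header compression only the lost packet is missing.  With
    VJ compression the decompressor loses synchronization at the loss
    and discards every later packet in the window.
    """
    delivered = []
    desynchronized = False
    for i in range(window_packets):
        if i == lost_index:
            desynchronized = vj_compression
            continue
        if desynchronized:
            continue
        delivered.append(i)
    return delivered
```

   For an 8-packet window with the third packet lost, the model
   delivers seven packets without compression but only the first two
   with it, so a larger window means more collateral loss.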

1664	   Alternate header compression schemes, such as those described in
1665	   [RFC2507] include an explicit request for retransmission of an
1666	   uncompressed packet to allow decompressor resynchronization without
1667	   waiting for a TCP retransmission.  However, these schemes are not yet
1668	   in widespread use.

1670	   Neither TCP header compression scheme compresses widely-used TCP
1671	   options such as selective acknowledgements (SACK), and both fail to
1672	   compress TCP traffic that makes use of explicit congestion
1673	   notification (ECN).  Work is under way in the IETF ROHC WG to address
1674	   these shortcomings in a ROHC header compression scheme for TCP
1675	   [RFC3095] [RFC3096].

1677	   The subnetwork error rate also is important for RTP header
1678	   compression.  CRTP uses delta encoding, so a packet loss on the link
1679	   causes uncertainty about the subsequent packets, which often must be
1680	   discarded until the decompressor has notified the compressor and the
1681	   compressor has sent re-synchronizing information.  This typically
1682	   takes slightly more than the end-to-end path round-trip time.  For
1683	   links that combine significant error rates with latencies that
1684	   require multiple packets to be in flight at a time, this leads to
1685	   significant error propagation, i.e., subsequent losses caused by an
1686	   initial loss.

1688	   For links that are both high-latency (multiple packets in flight from
1689	   a typical RTP stream) and error-prone, RTP ROHC provides a more
1690	   robust way of RTP header compression, at a cost of higher complexity
1691	   at the compressor and decompressor.  For example, within a talk
1692	   spurt, only extended losses of (depending on the mode chosen) 12 to
1693	   64 packets typically cause error propagation.

1695	15 Packet Reordering

1697	   The Internet architecture does not guarantee that packets will arrive
1698	   in the same order in which they were originally transmitted, and
1699	   transport protocols like TCP must take this into account.

1701	   However, reordering does come at a cost with TCP as it is currently
1702	   defined. Because TCP returns a cumulative acknowledgment (ACK)
1703	   indicating the last in-order segment that has arrived, out-of-order
1704	   segments cause a TCP receiver to transmit a duplicate acknowledgment.
1705	   When the TCP sender notices three duplicate acknowledgments, it
1706	   assumes that a segment was dropped by the network and uses the fast
1707	   retransmit algorithm [Jac90] [RFC2581] to resend the segment.  In
1708	   addition, the congestion window is reduced by half, effectively
1709	   halving TCP's sending rate.  If a subnetwork reorders segments
1710	   significantly such that three duplicate ACKs are generated, the TCP
1711	   sender needlessly reduces the congestion window and performance
1712	   suffers.
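
   The interaction between reordering and fast retransmit can be
   sketched as below.  The model is deliberately simplified: segments
   are numbered from zero, the receiver acknowledges every arrival
   with the next segment number it expects, delayed ACKs are ignored,
   and the function names are hypothetical.

```python
def cumulative_acks(arrival_order):
    """Return the cumulative ACK sent after each arriving segment."""
    next_expected = 0
    buffered = set()
    acks = []
    for seg in arrival_order:
        if seg == next_expected:
            next_expected += 1
            # Drain any out-of-order segments that are now in sequence.
            while next_expected in buffered:
                buffered.remove(next_expected)
                next_expected += 1
        elif seg > next_expected:
            buffered.add(seg)
        acks.append(next_expected)
    return acks


def triggers_fast_retransmit(acks, dupthresh=3):
    """True if the sender sees dupthresh duplicate ACKs in a row."""
    duplicates = 0
    last = None
    for ack in acks:
        if ack == last:
            duplicates += 1
            if duplicates >= dupthresh:
                return True
        else:
            last = ack
            duplicates = 0
    return False
```

   In this model a segment delayed behind three later segments causes
   a spurious fast retransmit, while one delayed behind a single
   segment does not.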

1714	   Packet reordering does frequently occur in parts of the Internet, and
1715	   it seems to be difficult or impossible to eliminate [BPS99].  For
1716	   this reason, research has begun into improving TCP's behavior in the
1717	   face of packet reordering [LK00] [BA02].

1719	   [BPS99] cites reasons why it may even be undesirable to eliminate
1720	   reordering. There are situations where average packet latency can be
1721	   reduced, link efficiency can be increased, and/or reliability can be
1722	   improved if reordering is permitted.  Examples include certain high
1723	   speed switches within the Internet backbone and the parallel links
1724	   used over many Internet paths for load splitting and redundancy.

1726	   This suggests that subnetwork implementers should try to avoid packet
1727	   reordering whenever possible, but not if doing so compromises
1728	   efficiency, impairs reliability or increases average packet delay.

1730	   Note that every header compression scheme currently standardized for
1731	   the Internet requires in-order packet delivery on the link between
1732	   compressor and decompressor. PPP is frequently used to carry
1733	   compressed TCP/IP packets; since it was originally designed for
1734	   point-to-point and dialup links it is assumed to provide in-order
1735	   delivery. For this reason, subnetwork implementers who provide PPP
1736	   interfaces to VPNs and other, more complex subnetworks must also
1737	   maintain in-order delivery of PPP frames.

1739	16 Mobility

1741	   Internet users are increasingly mobile. Not only are many Internet
1742	   nodes laptop computers, but pocket organizers and mobile embedded
1743	   systems are also becoming nodes on the Internet. These nodes may
1744	   connect to many different access points on the Internet over time,
1745	   and they expect this to be largely transparent to their activities.
1746	   Apart from periods of disconnection, and performance differences
1747	   while they are connected, they expect that everything will "just
1748	   work" regardless of their current Internet attachment point or
1749	   local subnetwork technology.

1751	   Changing a host's Internet attachment point involves one or more of
1752	   the following steps.

1754	   First, if use of the local subnetwork is restricted, the user's
1755	   credentials must be verified and access granted.  There are many ways
1756	   to do this. A trivial example would be an "Internet cafe" that grants
1757	   physical access to the subnetwork for a fee.  Subnetworks may
1758	   implement technical access controls of their own; one example is IEEE
1759	   802.11 Wired Equivalent Privacy [IEEE80211]. And it is common
1760	   practice for both cellular telephone and Internet service providers
1761	   (ISPs) to agree to serve each other's users; RADIUS [RFC2865] is the
1762	   standard means for ISPs to exchange authorization information.

1764	   Second, the host may have to be reconfigured with IP parameters
1765	   appropriate for the local subnetwork. This usually includes setting
1766	   an IP address, default router, and domain name system (DNS) servers.
1767	   On multiple-access networks, the Dynamic Host Configuration Protocol
1768	   (DHCP) [RFC2131] is almost universally used for this purpose. On PPP
1769	   links, these functions are performed by the IP Control Protocol
1770	   (IPCP) [RFC1332].

1772	   Third, traffic destined for the mobile host must be routed to its
1773	   current location. This roaming function is the most common meaning of
1774	   the term "Internet mobility".

1776	   Internet mobility can be provided at any of several layers in the
1777	   Internet protocol stack, and there is ongoing debate as to which are
1778	   the most appropriate and efficient. Mobility is already a feature of
1779	   certain application layer protocols; the Post Office Protocol (POP)
1780	   [RFC1939] and the Internet Message Access Protocol (IMAP) [RFC2060]
1781	   were created specifically to provide mobility in the receipt of
1782	   electronic mail.

1784	   Mobility can also be provided at the IP layer [RFC2002]. This
1785	   mechanism provides greater transparency, viz., IP addresses that
1786	   remain fixed as the nodes move, but at the cost of potentially
1787	   significant network overhead and increased delay because of the sub-
1788	   optimal network routing and tunneling involved.

1790	   Some subnetworks may provide internal mobility, transparent to IP, as
1791	   a feature of their own internal routing mechanisms. To the extent
1792	   that these simplify routing at the IP layer, reduce the need for
1793	   mechanisms like Mobile IP, or exploit mechanisms unique to the
1794	   subnetwork, this is generally desirable. This is especially true when
1795	   the subnetwork covers a relatively small geographic area and the
1796	   users move rapidly between the attachment points within that area.
1797	   Examples of internal mobility schemes include Ethernet switching and
1798	   intra-system handoff in cellular telephony.

1800	   However, if the subnetwork is physically large and connects to other
1801	   parts of the Internet at multiple geographic points, care should be
1802	   taken to optimize the wide-area routing of packets between nodes on
1803	   the external Internet and nodes on the subnet. This is generally done
1804	   with "nearest exit" routing strategies. Because a given subnetwork
1805	   may be unaware of the actual physical location of a destination on
1806	   another subnetwork, it simply routes packets bound for the other
1807	   subnetwork to the nearest router between the two. This implies some
1808	   awareness of IP addressing and routing within the subnetwork. The
1809	   subnetwork may wish to use IP routing internally for wide area
1810	   routing and restrict subnetwork-specific routing to constrained
1811	   geographic areas where the effects of suboptimal routing are
1812	   minimized.

1814	17 Routing

1816	   Subnetworks connecting more than two systems must provide their own
1817	   internal layer-2 forwarding mechanisms, either implicitly (e.g.,
1818	   broadcast) or explicitly (e.g., switched).  Since routing is the
1819	   major function of the Internet layer, the question naturally arises
1820	   as to the interaction between routing at the Internet layer and
1821	   routing in the subnet, and proper division of function between the
1822	   two.

1824	   Layer 2 subnetworks can be point-to-point, connecting two systems, or
1825	   multipoint.  Multipoint subnetworks can be broadcast (e.g., shared
1826	   media or emulated) or non-broadcast.  Generally, IP considers
1827	   multipoint subnetworks as broadcast, with shared-medium Ethernet as
1828	   the canonical (and historical) example, and point-to-point
1829	   subnetworks as a degenerate case.  Non-broadcast subnetworks may
1830	   require additional mechanisms, e.g., above IP at the routing layer
1831	   [RFC2328].

1833	   IP is ignorant of the topology of the subnetwork layer. In
1834	   particular, reconfiguration of subnetwork paths is not tracked by the
1835	   IP layer.  IP is only affected by whether it can send/receive packets
1836	   sent to the remotely connected systems via the subnetwork interface
1837	   (i.e., the reachability from one router to another). IP further
1838	   considers that subnetworks are largely static - that both their
1839	   membership and existence are stable at routing timescales (tens of
1840	   seconds); a change in either is considered re-provisioning, rather
1841	   than routing.

1843	   Routing functionality in a subnetwork is related to addressing in
1844	   that subnetwork.  Resolution of addresses on subnetwork links is
1845	   required for forwarding IP packets across links (e.g., ARP for IPv4,
1846	   or ND for IPv6). There is unlikely to be direct interaction between
1847	   subnetwork routing and IP routing.  Where broadcast is provided or
1848	   explicitly emulated, address resolution can be used directly; where
1849	   not provided, the link layer routing may interface to a protocol for
1850	   resolution, e.g., to the Next-Hop Resolution Protocol [RFC2332] to
1851	   provide context-dependent address resolution capabilities.

1853	   Subnetwork routing can either complement or compete with IP routing.
1854	   It complements IP when a subnetwork encapsulates its internal
1855	   routing, and where the effects of that routing are not noticeable at
1856	   the IP layer.  However, if different paths in the subnetwork have
1857	   characteristics that affect IP routing, it can affect or even inhibit
1858	   the convergence of IP routing.

1860	   Routing protocols generally consider layer 2 subnetworks, i.e., with
1861	   subnet masks and no intermediate IP hops, to have uniform routing
1862	   metrics to all members.  Routing can break when a link's
1863	   characteristics do not match the routing metric, for example,
1864	   when some member pairs have different path characteristics. Consider
1865	   a virtual Ethernet subnetwork that includes nearby (sub-millisecond
1866	   latency) and remote (hundreds of milliseconds away) systems.
1867	   Presenting that group as a single subnetwork means that some routing
1868	   protocols will assume that all pairs have the same delay, and that it
1869	   is small. Because this is not the case, the routing tables
1870	   constructed may be suboptimal or may even fail to converge.

1872	   When a subnetwork is used to transit between a set of routers, it
1873	   conventionally provides the equivalent of a full mesh of point-to-
1874	   point links. Simplicity of the internal subnet structure can be used
1875	   (e.g., via NHRP [RFC2332]) to reduce the size of address resolution
1876	   tables, but routing exchanges will continue to reflect the full mesh
1877	   they emulate. In general, subnetworks should not be used as a transit
1878	   among a set of routers where routing protocols would break if a full
1879	   mesh of equivalent point-to-point links were used.

1881	   Some subnetworks have special features that allow the use of more
1882	   effective or responsive routing mechanisms that cannot be implemented
1883	   in IP because of its need for generality. One example is the self-
1884	   learning bridge algorithm widely used in Ethernet networks. Learning
1885	   bridges perform Layer-2 subnetwork forwarding, avoiding the need for
1886	   dynamic routing at each subnetwork hop.  Another is the "handoff"
1887	   mechanism in cellular telephone networks, particularly the "soft
1888	   handoff" scheme in IS-95 CDMA.
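
   The self-learning bridge algorithm mentioned above can be sketched
   in a few lines.  This is a minimal illustration with hypothetical
   names; it omits table aging, broadcast and multicast addresses, and
   loop prevention, all of which a real bridge needs.

```python
class LearningBridge:
    """Toy self-learning bridge: learn on source, forward on destination."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}               # MAC address -> port last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        # Learn (or refresh) the location of the sender.
        self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every other port.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Destination is on the arrival segment: filter the frame.
            return []
        return [out_port]
```

   No routing protocol runs between bridges here; forwarding state is
   derived entirely from observed traffic.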

1890	   Subnetworks that cover large geographic areas or include links of
1891	   widely-varying capabilities should be avoided. IP routing generally
1892	   considers all multipoint subnets equivalent to a local, shared-medium
1893	   link with uniform metrics between any pair of systems, and ignores
1894	   internal subnetwork topology. Where a subnetwork diverges from that
1895	   assumption, it is the obligation of subnetwork designers to provide
1896	   compensating mechanisms. Not doing so can affect the scalability and
1897	   convergence of IP routing, as noted above.

1899	   The subnetwork designer who decides to implement internal routing
1900	   should consider whether a custom routing algorithm is warranted, or
1901	   if an existing Internet routing algorithm or protocol may suffice.
1902	   The designer should consider whether the goal is to reduce the
1903	   address resolution table size (possible, but with additional protocol
1904	   support required), or to reduce routing table complexity.
1905	   The latter may be better achieved by partitioning the subnetwork,
1906	   either physically or logically, and using network-layer protocols to
1907	   support partitioning (e.g., AS's in BGP). Protocols and routing
1908	   algorithms can be notoriously subtle, complex and difficult to
1909	   implement correctly.  Much work can be avoided if an existing
1910	   protocol or existing implementations can be readily used.

1912	18 Security Considerations

1914	   Security has become a high priority in the design and operation of
1915	   the Internet. The Internet is vast, and countless organizations and
1916	   individuals own and operate its various components.  A consensus has
1917	   emerged for what might be called a "security placement principle": a
1918	   security mechanism is most effective when it is placed as close as
1919	   possible to, and under the direct control of the owner of, the asset
1920	   that it protects.

1922	   A corollary of this principle is that end-to-end security (e.g.,
1923	   confidentiality, authentication, integrity and access control) cannot
1924	   be ensured with subnetwork security mechanisms.  Not only are end-to-
1925	   end security mechanisms much more closely associated with the end-
1926	   user assets they protect, they are also much more comprehensive. For
1927	   example, end-to-end security mechanisms cover gaps that can appear
1928	   when otherwise good subnetwork mechanisms are concatenated.  This is
1929	   an important application of the end-to-end principle [SRC81].

1931	   Several security mechanisms that can be used end-to-end have already
1932	   been deployed in the Internet and are enjoying increasing use. The
1933	   most important are the Secure Sockets Layer (SSL) [SSL2] [SSL3] and
1934	   TLS [RFC2246] primarily used to protect web commerce; Pretty Good
1935	   Privacy (PGP) [RFC1991] and S/MIME [RFCs-2630-2634], primarily used
1936	   to protect and authenticate email and software distributions; the
1937	   Secure Shell (SSH), used for secure remote access and file transfer;
1938	   and IPsec [RFC2401], a general purpose encryption and authentication
1939	   mechanism that sits just above IP and can be used by any IP
1940	   application. (IPsec can actually be used either on an end-to-end
1941	   basis or between security gateways that do not include either or both
1942	   end systems.)

1944	   Nonetheless, end-to-end security mechanisms are not used as widely as
1945	   might be desired. However, the group could not reach consensus on
1946	   whether subnetwork designers should be actively encouraged to
1947	   implement mechanisms to protect user data.

1949	   The clear consensus of the working group held that subnetwork
1950	   security mechanisms, especially when weak or incorrectly implemented
1951	   [BGW01], may actually be counterproductive.  The argument is that
1952	   subnetwork security mechanisms can lull end users into a false sense
1953	   of security, diminish the incentive to deploy effective end-to-end
1954	   mechanisms, and encourage "risky" uses of the Internet that would not
1955	   be made if users understood the inherent limits of subnetwork
1956	   security mechanisms.

1958	   The other point of view encourages subnetwork security on the
1959	   principle that it is better than the default situation, which all too
1960	   often is no security at all.  Users of especially vulnerable subnets
1961	   (such as consumers who have wireless home networks and/or shared
1962	   media Internet access) often have control over at most one endpoint
1963	   -- usually a client -- and therefore cannot enforce the use of end-
1964	   to-end mechanisms. However, subnet security can be entirely adequate
1965	   for protecting low-valued assets against the most likely threats.  In
1966	   any event, subnet mechanisms do not preclude the use of end-to-end
1967	   mechanisms, which are typically used to protect highly-valued assets.
1968	   This viewpoint recognizes that many security policies implicitly
1969	   assume that the entire end-to-end path is composed of a series of
1970	   concatenated links that are nominally physically secured.  That is,
1971	   these policies assume that all endpoints of all links are trusted and
1972	   that access to the physical medium by attackers is difficult.  To
1973	   meet the assumptions of such policies, explicit mechanisms are needed
1974	   for links (especially shared medium links) that lack physical
1975	   protection.  This, for example, is the rationale that underlies Wired
1976	   Equivalent Privacy (WEP) in the IEEE 802.11 [IEEE80211] wireless LAN
1977	   standard, and the Baseline Privacy Interface in the DOCSIS [DOCSIS1]
1978	   [DOCSIS2] data over cable television networks standards.

1980	   We therefore recommend that subnetwork designers who choose to
1981	   implement security mechanisms to protect user data be as candid as
1982	   possible with the details of such security mechanisms and the
1983	   inherent limits of even the most secure mechanisms when implemented
1984	   in a subnetwork rather than on an end-to-end basis.

1986	   In keeping with the "placement principle", a clear consensus exists
1987	   for another subnetwork security role: the protection of the
1988	   subnetwork itself.  Possible threats to subnetwork assets include
1989	   theft of service and denial of service; shared media subnets tend to
1990	   be especially vulnerable to such attacks.  In some cases, mechanisms
1991	   that protect subnet assets can also improve (but cannot ensure) end-
1992	   to-end security.

1994	   One security service can be provided by the subnetwork that will aid
1995	   in the solution to an overall Internet problem: subnetwork security
1996	   SHOULD provide a mechanism to authenticate the source of a subnetwork
1997	   frame.  This function is missing in some current protocols, e.g., the
1998	   use of ARP [RFC0826] to associate an IPv4 address with a MAC address.
1999	   The IPv6 Neighbor Discovery (ND) [RFC2461] performs a similar
2000	   function.

2002	   There are well-known security flaws with this address resolution
2003	   mechanism [Wilbur99].  However, including subnetwork frame source
2004	   authentication will permit secure subnetwork address resolution.

2006	   Another potential role for subnetwork security is to protect users
2007	   against traffic analysis, i.e., identifying the communicating parties
2008	   and determining their communication patterns and volumes even when
2009	   their actual contents are protected by strong end-to-end security
2010	   mechanisms.  Lower-layer security can be more effective against
2011	   traffic analysis due to its inherent ability to aggregate the
2012	   communications of multiple parties sharing the same physical
2013	   facilities while obscuring higher layer protocol information that
2014	   indicates specific end points, such as IP addresses and TCP/UDP port
2015	   numbers.

2017	   However, traffic analysis is a notoriously subtle and difficult
2018	   threat to understand and defeat, far more so than threats to
2019	   confidentiality and integrity.  We therefore urge extreme care in the
2020	   design of subnetwork security mechanisms specifically intended to
2021	   thwart traffic analysis.

2023	   Subnetwork designers must keep in mind that design and implementation
2024	   for security is difficult [Schneier00].  [Schneier95] describes
2025	   protocols and algorithms which are considered well understood and
2026	   believed to be sound.

2028	   Poor design process, subtle design errors and flawed implementation
2029	   can result in gaping vulnerabilities.  In recent years, a number of
2030	   subnet standards have had problems exposed.  The following are
2031	   examples of mistakes that have been made:

2033	   1. Use of weak and untested algorithms [Crypto9912] [BGW01].  For a
2034	   variety of reasons, algorithms were chosen which had subtle flaws
2035	   that made them vulnerable to a variety of attacks.

2037	   2. Use of 'security by obscurity' [Schneier4] [Crypto9912].  One
2038	   common mistake is to assume that keeping cryptographic algorithms
2039	   secret makes them more secure.  This is intuitive, but wrong.  Full
2040	   public disclosure early in the design process attracts peer review by
2041	   knowledgeable cryptographers.  Exposure of flaws by this review far
2042	   outweighs any imagined benefit from forcing attackers to reverse
2043	   engineer security algorithms.

2045	   3. Inclusion of trapdoors [Schneier4] [Crypto9912].  Trapdoors are
2046	   flaws surreptitiously left in an algorithm to allow it to be broken.
2047	   This might be done to recover lost keys or to permit surreptitious
2048	   access by governmental agencies.  Trapdoors can be discovered and
2049	   exploited by malicious attackers.

2051	   4. Sending passwords or other identifying information as clear text.
2052	   For many years, analog cellular telephones could be cloned and used
2053	   to steal service.  The cloners merely eavesdropped on the
2054	   registration protocols that exchanged everything in clear text.

2056	   5. Keys which are common to all systems on a subnet [BGW01].

2058	   6. Incorrect use of a sound mechanism.  For example [BGW01], one
2059	   subnet standard includes an initialization vector which is poorly
2060	   designed and poorly specified.  A determined attacker can easily
2061	   recover multiple ciphertexts encrypted with the same key stream and
2062	   perform statistical attacks to decipher them.

2064	   7. Identifying information sent in clear text that can be resolved to
2065	   an individual, identifiable device. This creates a vulnerability to
2066	   attacks targeted to that device (or its owner).

2068	   8. Inability to renew and revoke shared secret information.

2070	   9. Insufficient key length.

2072	   10. Failure to address "man-in-the-middle" attacks, e.g., by
2073	   providing mutual authentication.

2075	   11. Failure to provide a form of replay detection, e.g., to prevent a
2076	   receiver from accepting packets from an attacker that simply resends
2077	   previously captured network traffic.

2079	   12. Failure to provide integrity mechanisms when providing
2080	   confidentiality schemes [Bel98].
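
   As a rough illustration of item 6 (a sketch only, not the mechanism
   of any particular subnet standard), the following Python fragment
   shows why reusing a key stream is dangerous.  The toy keystream()
   function, the key, the IV and both plaintexts are hypothetical
   values chosen for this example; the hash-based generator merely
   stands in for a real stream cipher.

```python
import hashlib


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream generator (illustration only, NOT a real cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher style encryption: plaintext XOR keystream."""
    return xor_bytes(plaintext, keystream(key, iv, len(plaintext)))


# Hypothetical values: one key shared by the subnet, and an IV that
# is (incorrectly) used for two different frames.
key = b"shared-subnet-key"
iv = b"\x00\x00\x01"
p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"

c1 = encrypt(key, iv, p1)
c2 = encrypt(key, iv, p2)

# The keystream cancels out: an eavesdropper learns p1 XOR p2
# without ever knowing the key.
leak = xor_bytes(c1, c2)

# If one plaintext is known or guessable, the other follows directly.
recovered = xor_bytes(leak, p1)
```

   The point of the sketch is that the key never appears in the
   attack: a repeated IV under the same key is enough.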

2082	   This list is by no means comprehensive.  Design problems are
2083	   difficult to avoid, but expert review is generally invaluable in
2084	   catching them.
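
   The replay detection mentioned in item 11 is often built around a
   sliding window of sequence numbers, similar in spirit to the
   anti-replay service of ESP [RFC2406].  The following Python sketch
   is one possible shape for such a window.  The class name
   ReplayWindow, its methods and the default window size are
   assumptions made for this example, and the sketch presumes that
   the sequence number has already passed an integrity check before
   accept() is called.

```python
class ReplayWindow:
    """Sliding-window replay detector (illustrative sketch).

    Assumes sequence numbers start at 1 and that each packet was
    authenticated before its sequence number is passed to accept().
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => (highest - i) was already seen

    def accept(self, seq: int) -> bool:
        """Return True if seq is fresh, False if replayed or too old."""
        if seq <= 0:
            return False
        if seq > self.highest:
            # Packet advances the window: shift history, mark seq as seen.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the window: reject
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset  # late but fresh: remember it
        return True


window = ReplayWindow(size=4)
results = [window.accept(s) for s in (1, 3, 2, 2, 10, 6, 7)]
```

   In this run the second copy of packet 2 is rejected as a replay,
   and packet 6 is rejected because it falls outside the four-packet
   window once packet 10 has been accepted.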

2086	   In addition, well-designed security protocols can be compromised by
2087	   implementation defects.  Examples of such defects include use of
2088	   predictable pseudo-random numbers [RFC1750], vulnerability to buffer
2089	   overflow attacks due to unsafe use of certain I/O system calls
2090	   [WFBA2000], and inadvertent exposure of secret data.
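
   To make the first of these defects concrete, the following Python
   sketch contrasts key material drawn from a seeded general-purpose
   generator with key material drawn from the operating system's
   cryptographic source.  The function names weak_key() and
   strong_key(), the seed value and the search range are hypothetical
   and chosen only for this illustration.

```python
import random
import secrets


def weak_key(seed: int, nbytes: int = 16) -> bytes:
    """Key from a seeded, general-purpose PRNG: fully determined by seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))


def strong_key(nbytes: int = 16) -> bytes:
    """Key from the operating system's cryptographic random source."""
    return secrets.token_bytes(nbytes)


# Hypothetical scenario: the seed is a timestamp that an attacker can
# narrow down to a small range, so the key space collapses to that range.
victim = weak_key(seed=1_000_000_123)
guesses = range(1_000_000_000, 1_000_001_000)
found = next(s for s in guesses if weak_key(s) == victim)
```

   Here the attacker needs at most a thousand trials to reproduce a
   nominally 128-bit key, because only the seed was ever unknown.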

2092	Normative References

2094	   References of the form RFCnnnn are Internet Request for Comments
2095	   (RFC) documents available online at www.rfc-editor.org.

2097	   [ATMFTM] The ATM Forum, "Traffic Management Specification, Version
2098	   4.0", April 1996, document af-tm-0056.000 (www.atmforum.com).

2100	   [BGW01] Nikita Borisov, Ian Goldberg and David Wagner, "Intercepting
2101	   Mobile Communications: The Insecurity of 802.11," In Proceedings of
2102	   ACM MobiCom, July 2001.

2104	   [BPK98] Hari Balakrishnan, Venkata Padmanabhan, Randy H. Katz.  "The
2105	   Effects of Asymmetry on TCP Performance."  ACM Mobile Networks and
2106	   Applications (MONET), 1998.

2108	   [ISO3309] ISO/IEC 3309:1991(E), "Information Technology -
2109	   Telecommunications and information exchange between systems - High-
2110	   level data link control (HDLC) procedures - Frame structure",
2111	   International Organization For Standardization, Fourth edition
2112	   1991-06-01.

2114	   [MSMO97] M. Mathis, J. Semke, J. Mahdavi, T. Ott, "The Macroscopic
2115	   Behavior of the TCP Congestion Avoidance Algorithm", Computer
2116	   Communication Review, volume 27, number 3, July 1997.

2118	   [PFTK98] Padhye, J., Firoiu, V., Towsley, D., and Kurose, J.,
2119	   "Modeling TCP Throughput: a Simple Model and its Empirical
2120	   Validation", UMASS CMPSCI Tech Report TR98-008, Feb. 1998.

2122	   [RED93] S. Floyd, V. Jacobson, "Random Early Detection gateways for
2123	   Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 N.4,
2124	   August 1993, http://www.aciri.org/floyd/papers/red/red.html

2126	   [RFC791] Jon Postel.  "Internet Protocol". September 1981.

2128	   [RFC793] Jon Postel.  "Transmission Control Protocol", September
2129	   1981.

2131	   [RFC1144] Jacobson, V., "Compressing TCP/IP Headers for Low-Speed
2132	   Serial Links," RFC 1144, February 1990.

2134	   [RFC1191] J. Mogul, S. Deering. "Path MTU Discovery". November 1990.

2136	   [RFC1435] S. Knowles. "IESG Advice from Experience with Path MTU
2137	   Discovery".  March 1993.

2139	   [RFC1661] W. Simpson. "The Point-to-Point Protocol (PPP)". July 1994.

2141	   [RFC1812] F. Baker, "Requirements for IP Version 4 Routers".  June
2142	   1995.

2144	   [RFC1981] J. McCann, S. Deering, J. Mogul. "Path MTU Discovery for IP
2145	   version 6".  August 1996.

2147	   [RFC2246] T. Dierks, C. Allen. "The TLS Protocol Version 1.0".
2148	   January 1999.

2150	   [RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S.  Deering,
2151	   D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L.
2152	   Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang.
2153	   "Recommendations on Queue Management and Congestion Avoidance in the
2154	   Internet".  April 1998.

2156	   [RFC2364] G. Gross et al. "PPP Over AAL5". July 1998.

2158	   [RFC2393] A. Shacham et al. "IP Payload Compression Protocol
2159	   (IPComp)". December 1998.

2161	   [RFC2394] R. Pereira. "IP Payload Compression Using DEFLATE".
2162	   December 1998.

2164	   [RFC2395] R. Friend, R. Monsour. "IP Payload Compression Using LZS".
2165	   December 1998.

2167	   [RFC2507] M. Degermark, B. Nordgren, S. Pink. "IP Header
2168	   Compression".  February 1999.

2170	   [RFC2508] S. Casner, V. Jacobson. "Compressing IP/UDP/RTP Headers for
2171	   Low-Speed Serial Links". February 1999.

2173	   [RFC2581] M. Allman, V. Paxson, W. Stevens. "TCP Congestion Control".
2174	   April 1999.

2176	   [RFC2406] S. Kent, R. Atkinson. "IP Encapsulating Security Payload
2177	   (ESP)". November 1998.

2179	   [RFC2684] D. Grossman, J. Heinanen. "Multiprotocol Encapsulation over
2180	   ATM Adaptation Layer 5". September 1999.

2182	   [RFC2686] C. Bormann, "The Multi-Class Extension to Multi-Link PPP",
2183	   September 1999.

2185	   [RFC2687] C. Bormann, "PPP in a Real-time Oriented HDLC-like
2186	   Framing", September 1999.

2188	   [RFC2689] C. Bormann, "Providing Integrated Services over Low-bitrate
2189	   Links", September 1999.

2191	   [RFC2914] S. Floyd. "Congestion Control Principles". September 2000.

2193	   [RFC2923] K. Lahey.  "TCP Problems with Path MTU Discovery".
2194	   September 2000.

2196	   [RFC2988] V. Paxson, M. Allman. "Computing TCP's Retransmission
2197	   Timer". November 2000.

2199	   [RFC3095] C. Bormann, ed., C. Burmeister, M. Degermark, H. Fukushima,
2200	   H. Hannu, L-E. Jonsson, R. Hakenberg, T. Koren, K. Le, Z. Liu, A.
2201	   Martensson, A. Miyazaki, K. Svanbro, T. Wiebke, T. Yoshimura, H.
2202	   Zheng, "RObust Header Compression (ROHC): Framework and four
2203	   profiles: RTP, UDP, ESP, and uncompressed", July 2001.

2205	   [RFC3096] M. Degermark, ed., "Requirements for robust IP/UDP/RTP
2206	   header compression", July 2001.

2208	   [RFC3168] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of
2209	   Explicit Congestion Notification (ECN) to IP", September 2001.

2211	   [Schneier95] Schneier, Bruce, Applied Cryptography: Protocols,
2212	   Algorithms and Source Code in C (John Wiley and Sons, October 1995).

2214	   [Schneier00] Schneier, Bruce, Secrets and Lies: Digital Security in a
2215	   Networked World (John Wiley & Sons, August 2000).

2217	   [SRC81] Jerome H. Saltzer, David P. Reed and David D. Clark, "End-to-
2218	   End Arguments in System Design".  Second International Conference on
2219	   Distributed Computing Systems (April, 1981) pages 509-512. Published
2220	   with minor changes in ACM Transactions on Computer Systems 2, 4,
2221	   November, 1984, pages 277-288. Reprinted in: Craig Partridge, editor,
2222	   Innovations in Internetworking. Artech House, Norwood, MA, 1988,
2223	   pages 195-206. ISBN 0-89006-337-0.
2224	   http://people.qualcomm.com/karn/library.html

2226	   [SSL2]   Hickman, Kipp, "The SSL Protocol", Netscape Communications
2227	   Corp., Feb 9, 1995.

2229	   [SSL3]   A. Frier, P. Karlton, and P. Kocher, "The SSL 3.0 Protocol",
2230	   Netscape Communications Corp., Nov 18, 1996.

2232	Informative References

2234	   References of the form RFCnnnn are Internet Request for Comments
2235	   (RFC) documents available online at www.rfc-editor.org.

2237	   [802.1D] Information Technology Telecommunications and information
2238	   exchange between systems Local and metropolitan area networks, Common
2239	   specifications Media access control (MAC) bridges, IEEE 802.1D, 1998.
2240	   ISO 15802-3.

2242	   [802.1p] IEEE, 802.1p, Standard for Local and Metropolitan Area
2243	   Networks - Supplement to Media Access Control (MAC) Bridges: Traffic
2244	   Class Expediting and Multicast

2246	   [AP99] M. Allman, V. Paxson, On Estimating End-to-End Network Path
2247	   Properties, In Proceedings of ACM SIGCOMM 99.

2249	   [AR02] G. Acar and C. Rosenberg, Weighted Fair Bandwidth-on-Demand
2250	   (WFBoD) for Geo-Stationary Satellite Networks with On-Board
2251	   Processing, Computer Networks, 39(1), 2002.

2253	   [BA02] Ethan Blanton, Mark Allman. On Making TCP More Robust to
2254	   Packet Reordering. ACM Computer Communication Review, 32(1), January
2255	   2002.

2257	   [Bel98] Steven M. Bellovin, "Cryptography and the Internet", in
2258	   Proceedings of CRYPTO '98, August 1998.
2259	   (http://www.research.att.com/~smb/papers/inet-crypto.pdf)

2261	   [BPS99] "Packet Reordering is Not Pathological Network Behavior", Jon
2262	   C. R. Bennett, Craig Partridge, Nicholas Shectman, IEEE/ACM
2263	   Transactions on Networking, Vol 7, No. 6, December 1999.

2265	   [CGMP] Farinacci D., Tweedly A., Speakman T., "Cisco Group Management
2266	   Protocol (CGMP)", 1996/1997
2267	   ftp://ftpeng.cisco.com/ipmulticast/specs/cgmp.txt

2269	   [ITU-I363] ITU-T I.363.5 B-ISDN ATM Adaptation Layer Specification
2270	   Type AAL5, International Telecommunication Union (ITU), 1996.

2272	   [RFC3366] Fairhurst, G., and L. Wood, Advice to link designers on
2273	   link Automatic Repeat reQuest (ARQ), August 2002.

2275	   [RFC3449] H. Balakrishnan, V. N. Padmanabhan, G. Fairhurst, M.
2276	   Sooriyabandara. "TCP Performance Implications of Network Path
2277	   Asymmetry", December 2002.

2279	   [Crypto9912] Schneier, Bruce "European Cellular Encryption
2280	   Algorithms" Crypto-Gram (December 15, 1999)
2281	   http://www.counterpane.com

2283	   [DIX82] Digital Equipment Corp, Intel Corp, Xerox Corp, Ethernet
2284	   Local Area Network Specification Version 2.0, November 1982.

2286	   [DOCSIS1] Data-Over-Cable Service Interface Specifications, Radio
2287	   Frequency Interface Specification 1.0, SP-RFI-I05-991105, November
2288	   1999, Cable Television Laboratories, Inc.

2290	   [DOCSIS2] Data-Over-Cable Service Interface Specifications, Radio
2291	   Frequency Interface Specification 1.1, SP-RFIv1.1-I05-000714, July
2292	   2000, Cable Television Laboratories, Inc.

2294	   [DOCSIS3] W.S. Lai, "DOCSIS-Based Cable Networks: Impact of Large
2295	   Data Packets on Upstream Capacity", 14th ITC Specialists Seminar on
2296	   Access Networks and Systems, Barcelona, Spain, April 25-27, 2001.

2298	   [EN301] ETSI, European Broadcasting Union, Digital Video Broadcasting
2299	   (DVB); DVB Specification for Data Broadcasting, European Standard
2300	   (Telecommunications Series)  EN 301 192 v1.2.1(1999-06)

2302	   [ES00] David A. Eckhardt and Peter Steenkiste, "Effort-limited Fair
2303	   (ELF) Scheduling for Wireless Networks", Proceedings of IEEE Infocom
2304	   2000.

2306	   [FB00] Firoiu V., and Borden M., "A Study of Active Queue Management
2307	   for Congestion Control", to appear in Infocom 2000.

2309	   [IEEE8023] IEEE 802.3 CSMA/CD Access Method. Available from
2310	   http://standards.ieee.org/

2312	   [IEEE80211] IEEE 802.11 Wireless LAN standard. Available from
2313	   http://standards.ieee.org/

2315	   [ISO13818] ISO/IEC, ISO/IEC 13818-1:2000(E)  Information Technology
2316	   - Generic coding of moving pictures and associated audio information:
2317	   Systems,    Second edition, 2000-12-01 International Organization for
2318	   Standardization and International Electrotechnical Commission.

2320	   [Jac90] Van Jacobson.  Modified TCP Congestion Avoidance Algorithm.
2321	   Email to the end2end-interest mailing list, April 1990.  URL:
2322	   ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt.

2324	   [KY02] F. Khafizov, M. Yavuz.  Running TCP Over IS-2000, Proceedings
2325	   of IEEE ICC, 2002.

2327	   [LK00] R. Ludwig, R. H. Katz, "The Eifel Algorithm: Making TCP Robust
2328	   Against Spurious Retransmissions", ACM Computer Communication Review,
2329	   Vol.  30, No. 1, January 2000.

2331	   [LKJK02] R. Ludwig, A. Konrad, A. D. Joseph, R. H. Katz, "Optimizing
2332	   the End-to-End Performance of Reliable Flows over Wireless Links",
2333	   Kluwer/ACM Wireless Networks Journal, Vol. 8, Nos. 2/3, pp. 289-299,
2334	   March-May 2002.

2336	   [LRKOJ99] R. Ludwig, B. Rathonyi, A. Konrad, K. Oden, A. Joseph,
2337	   Multi-Layer Tracing of TCP over a Reliable Wireless Link, pp.
2338	   144-154, In Proceedings of ACM SIGMETRICS 99.

2340	   [LS00] R. Ludwig, K. Sklower, The Eifel Retransmission Timer, ACM
2341	   Computer Communication Review, Vol. 30, No. 3, July 2000.

2343	   [MAGMA-PROXY] Work In Progress, MAGMA WG, draft-ietf-magma-igmp-
2344	   proxy-04.txt

2346	   [MAGMA-SNOOP] Work In Progress, MAGMA WG, draft-ietf-magma-
2347	   snoop-09.txt

2349	   [MBB00] May, M., Bonald, T., and Bolot, J-C., "Analytic Evaluation of
2350	   RED Performance", INFOCOM 2000.

2352	   [MBDL99] May, M., Bolot, J., Diot, C., and Lyles, B., "Reasons not to
2353	   deploy RED", Proc. of 7th. International Workshop on Quality of
2354	   Service (IWQoS'99), June 1999.

2356	   [GM02] Luigi Alfredo Grieco, Saverio Mascolo, "TCP Westwood and Easy
2357	   RED to Improve Fairness in High-Speed Networks", Proceedings of the
2358	   7th International Workshop on Protocols for High-Speed Networks,
2359	   April 2002.

2361	   [MBONED-GAP] Meyer, D. and B. Nickless, Work In Progress, MBoned WG,
2362	   draft-ietf-mboned-iesg-gap-analysis-01.txt.

2364	   [MYR95] Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E.
2365	   Kulawik, Charles L. Seitz, et al.  MYRINET: A Gigabit per Second
2366	   Local Area Network, IEEE Micro, Vol. 15, No. 1, February 1995, pp.
2366	   29-36.

2368	   [RF95] Romanow, A., and Floyd, S., "Dynamics of TCP Traffic over ATM
2369	   Networks".  IEEE Journal of Selected Areas in Communication, V. 13 N.
2370	   4, May 1995, p. 633-641.

2372	   [RFC0826] Plummer, D.C., "Ethernet Address Resolution Protocol: Or
2373	   converting network protocol addresses to 48-bit Ethernet address for
2374	   transmission on Ethernet hardware," STD 37, RFC 826, November 1982.

2376	   [RFC1071] R. Braden, D. Borman, C. Partridge, "Computing the Internet
2377	   Checksum", September 1988.

2379	   [RFC1112] S. Deering, "Host Extensions for IP Multicasting", August
2380	   1989.

2382	   [RFC1750] D. Eastlake, S. Crocker, J. Schiller, "Randomness
2383	   Recommendations for Security", December 1994.

2385	   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow.  "TCP
2386	   Selective Acknowledgement Options".  October 1996.

2388	   [RFC2236] W. Fenner, Internet Group Management Protocol, Version 2,
2389	   November 1997.

2391	   [RFC2328] J. Moy, "OSPF Version 2", April 1998.

2393	   [RFC2401]  S. Kent, R. Atkinson, "Security Architecture for the
2394	   Internet Protocol".  November 1998.

2396	   [RFC2440] J. Callas et al. "OpenPGP Message Format". November 1998.

2398	   [RFC2460] S. Deering, R. Hinden.  "Internet Protocol, Version 6
2399	   (IPv6) Specification".  December 1998.

2401	   [RFC2461] T. Narten, E. Nordmark, W. Simpson.  "Neighbor Discovery
2402	   for IP Version 6 (IPv6)".  December 1998.

2404	   [RFC2616] R. Fielding et al. "Hypertext Transfer Protocol --
2405	   HTTP/1.1". June 1999.

2407	   [RFC2630] R. Housley.  "Cryptographic Message Syntax".  June 1999.

2409	   [RFC2631] E. Rescorla.  "Diffie-Hellman Key Agreement Method".  June
2410	   1999.

2412	   [RFC2632] B. Ramsdell.  "S/MIME Version 3 Certificate Handling".
2413	   June 1999.

2415	   [RFC2633] B. Ramsdell.  "S/MIME Version 3 Message Specification".
2416	   June 1999.

2418	   [RFC2710] S. Deering, W. Fenner, B. Haberman, Multicast Listener
2419	   Discovery (MLD) for IPv6, October 1999.

2421	   [RFC2784] D. Farinacci, T. Li, S. Hanks, D. Meyer, P. Traina.
2422	   "Generic Routing Encapsulation (GRE)".  March 2000.

2427	   [RFC3048] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S. Floyd,
2428	   M. Luby.  "Reliable Multicast Transport Building Blocks for One-to-
2429	   Many Bulk-Data Transfer".  January 2001.

2431	   [RFC3376] B. Cain, S. Deering, I. Kouvelas, B. Fenner, A.
2432	   Thyagarajan, Internet Group Management Protocol, Version 3, October
2433	   2002.

2435	   [RFC3488] I. Wu, T. Eckert.  "Cisco Systems Router-port Group
2436	   Management Protocol (RGMP)".  February 2003.

2438	   [RFC3590] B. Haberman, Source Address Selection for the Multicast
2439	   Listener Discovery (MLD) Protocol, September 2003.

2441	   [SP2000] "When the CRC and TCP Checksum Disagree", Jonathan Stone &
2442	   Craig Partridge, ACM SIGCOMM, September 2000.
2443	   http://www.acm.org/sigcomm/sigcomm2000/conf/paper/sigcomm2000-9-1.pdf

2445	   [Stevens94] R. Stevens, "TCP/IP Illustrated, Volume 1," Addison-
2446	   Wesley, 1994 (section 2.10).

2448	   [TCPF98] Dong Lin and H.T. Kung, "TCP Fast Recovery Strategies:
2449	   Analysis and Improvements", IEEE Infocom, March 1998.  Available
2450	   from: "http://www.eecs.harvard.edu/networking/papers/infocom-tcp-
2451	   final-198.pdf"

2453	   [WFBA2000] David Wagner, Jeffrey S. Foster, Eric Brewer and Alexander
2454	   Aiken, "A First Step Toward Automated Detection of Buffer Overrun
2455	   Vulnerabilities", Proceedings of NDSS2000, or
2456	   http://www.berkeley.edu:80/~daw/papers/

2458	   [Wilbur89] Wilbur, Steve R., Jon Crowcroft, and Yuko Murayama.  "MAC
2459	   layer Security Measures in Local Area Networks," Local Area Network
2460	   Security, Workshop LANSEC '89 Proceedings, Springer-Verlag, April
2461	   1989, pp.53-64.

2463	Authors' Addresses:

2465	   Phil Karn, Editor Qualcomm 5775 Morehouse Drive San Diego CA 92121
2466	   858 587 1121 karn@qualcomm.com

2468	   Carsten Bormann Universitaet Bremen FB3 TZI Postfach 330440 D-28334
2469	   Bremen, GERMANY +49 421 218 7024 cabo@tzi.org

2471	   Godred (Gorry) Fairhurst Department of Engineering University of
2472	   Aberdeen Aberdeen, AB24 3UE UK gorry@erg.abdn.ac.uk
2473	   http://www.erg.abdn.ac.uk/users/gorry

2475	   Dan Grossman Motorola, Inc.  20 Cabot Blvd.  Mansfield, MA 02048
2476	   dan@dma.isg.mot.com

2478	   Reiner Ludwig Ericsson Research Ericsson Allee 1 52134 Herzogenrath,
2479	   Germany +49 2407 575 719 Reiner.Ludwig@ericsson.com

2481	   Jamshid Mahdavi Volera, Inc.  2211 N. 1st St.  San Jose, CA 95131
2482	   mahdavi@volera.com

2484	   Gabriel Montenegro Sun Microsystems Laboratories, Europe 29, chemin
2485	   du Vieux Chene 38240 Meylan, FRANCE gab@sun.com

2487	   Joe Touch USC/ISI 4676 Admiralty Way Marina del Rey CA 90292 310 448
2488	   9151 touch@isi.edu http://www.isi.edu/touch

2490	   Lloyd Wood Global Defense and Space Group, Cisco Systems 9 New Square
2491	   Park, Bedfont Lakes Feltham TW14 8HA, United Kingdom +44 (0)20 8824
2492	   4236 lwood@cisco.com http://www.ee.surrey.ac.uk/Personal/L.Wood/

2494	Contributors' Addresses:

2496	   Aaron Falk USC Information Sciences Institute 4676 Admiralty Way
2497	   Marina Del Rey, CA 90292 310-448-9327 falk@isi.edu

2499	   Saverio Mascolo Dipartimento di Elettrotecnica ed Elettronica,
2500	   Politecnico di Bari Via Orabona 4, 70125 Bari,  Italy +39 080 596
2501	   3621 mascolo@poliba.it http://www-dee.poliba.it/dee-
2502	   web/Personale/mascolo.html

2504	   Marie-Jose Montpetit marie@mjmontpetit.com

2506	Full Copyright Statement

2508	   Copyright (C) The Internet Society (2003). All Rights Reserved.

2510	   This document and translations of it may be copied and furnished to
2511	   others, and derivative works that comment on or otherwise explain it
2512	   or assist in its implementation may be prepared, copied, published
2513	   and distributed, in whole or in part, without restriction of any
2514	   kind, provided that the above copyright notice and this paragraph are
2515	   included on all such copies and derivative works. However, this
2516	   document itself may not be modified in any way, such as by removing
2517	   the copyright notice or references to the Internet Society or other
2518	   Internet organizations, except as needed for the purpose of
2519	   developing Internet standards in which case the procedures for
2520	   copyrights defined in the Internet Standards process must be
2521	   followed, or as required to translate it into languages other than
2522	   English.

2524	   The limited permissions granted above are perpetual and will not be
2525	   revoked by the Internet Society or its successors or assigns.

2527	   This document and the information contained herein is provided on an
2528	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
2529	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
2530	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
2531	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
2532	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.