Internet Engineering Task Force                        Phil Karn, editor
INTERNET DRAFT                                                  Qualcomm
                                                         Carsten Bormann
                                             Universitaet Bremen FB3 TZI
                                                Godred (Gorry) Fairhurst
                                                  University of Aberdeen
                                                            Dan Grossman
                                                          Motorola, Inc.
                                                           Reiner Ludwig
                                                       Ericsson Research
                                                         Jamshid Mahdavi
                                                            Volera, Inc.
                                                      Gabriel Montenegro
                                   Sun Microsystems Laboratories, Europe
                                                               Joe Touch
                                                                 USC/ISI
                                                              Lloyd Wood
                                                           Cisco Systems
File: draft-ietf-pilc-link-design-15.txt                  December, 2003
Expires: June, 2004

                Advice for Internet Subnetwork Designers

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document provides advice to the designers of digital communication equipment, link-layer protocols and packet-switched local networks (collectively referred to as subnetworks) who wish to support the Internet protocols but who may be unfamiliar with the Internet architecture and the implications of their design choices on the performance and efficiency of the Internet.

Contributors

This document represents a consensus of the members of the IETF Performance Implications of Link Characteristics (PILC) working group.

This document would not have been possible without the contributions of a great number of people in the Performance Implications of Link Characteristics Working Group. In particular, the following people provided major contributions of text, editing, and advice to this document. Mark Allman provided the final editing to complete this document. Carsten Bormann provided text on robust header compression. Gorry Fairhurst provided text on broadcast and multicast issues and many valuable comments on the entire document. Aaron Falk provided text on bandwidth on demand. Dan Grossman provided text on security considerations as well as on many facets of the document. Reiner Ludwig provided thorough document review and text on TCP vs. Link-Layer Retransmission. Jamshid Mahdavi provided text on TCP performance calculations. Saverio Mascolo provided feedback on the document. Gabriel Montenegro provided feedback on the document. Marie-Jose Montpetit provided text on bandwidth on demand. Joe Touch provided text on multicast and broadcast. Lloyd Wood provided many valuable comments on drafts of the document.

Table of Contents

1 Introduction and Overview
2 Maximum Transmission Units (MTUs) and IP Fragmentation
   2.1 Choosing the MTU in Slow Networks
3 Framing on Connection-Oriented Subnetworks
4 Connection-Oriented Subnetworks
5 Broadcasting and Discovery
6 Multicasting
7 Bandwidth on Demand (BoD) Subnets
8 Reliability and Error Control
   8.1 TCP vs Link-Layer Retransmission
   8.2 Recovery from Subnetwork Outages
   8.3 CRCs, Checksums and Error Detection
   8.4 How TCP Works
   8.5 TCP Performance Characteristics
      8.5.1 The Formulae
      8.5.2 Assumptions
      8.5.3 Analysis of Link-Layer Effects on TCP Performance
9 Quality-of-Service (QoS) considerations
10 Fairness vs Performance
11 Delay Characteristics
12 Bandwidth Asymmetries
13 Buffering, flow & congestion control
14 Compression
15 Packet Reordering
16 Mobility
17 Routing
18 Security Considerations
Normative References
Informative References

1 Introduction and Overview

IP, the Internet Protocol [RFC791], is the core protocol of the Internet. IP defines a simple "connectionless" packet-switched network. The success of the Internet is largely attributed to IP's simplicity, the "end-to-end principle" [SRC81] on which the Internet is based, and the resulting ease of carrying IP on a wide variety of subnetworks not necessarily designed with IP in mind. A subnetwork refers to any network operating immediately below the IP layer to connect two or more systems using IP (i.e., end hosts or routers). In its simplest form, this may be a direct connection between the IP systems (e.g., using a length of cable or over a wireless medium).

This document defines a subnetwork as a layer 2 network, which is a network that does not rely upon the services of IP routers to forward packets between parts of the subnetwork. IP routers may, however, bridge frames at layer 2 between parts of a subnetwork. Sometimes, it is convenient to aggregate a group of such subnetworks into a single logical subnetwork. IP routing protocols (e.g., OSPF, IS-IS, and PIM) can be configured to support this aggregation, but typically present a layer-3 subnetwork rather than a layer-2 subnetwork. This may also result in a specific packet passing several times over the same layer-2 subnetwork via an intermediate layer-3 gateway (router). Because that aggregation requires layer-3 components, the issues it raises are beyond the scope of this document.

While many subnetworks carry IP, they do not necessarily do so with maximum efficiency, minimum complexity or minimum cost; nor do they implement the functions needed to efficiently support newer Internet features of increasing importance, such as multicasting or quality of service.

With the explosive growth of the Internet, IP packets comprise an increasingly large fraction of the traffic carried by the world's telecommunications networks. It therefore makes sense to optimize both existing and new subnetwork technologies for IP as much as possible.

Optimizing a subnetwork for IP involves three complementary considerations:

1. Providing functionality sufficient to carry IP.

2. Eliminating unnecessary functions that increase cost or complexity.

3. Choosing subnetwork parameters that maximize the performance of the Internet protocols.

Because IP is so simple, consideration 2 is more of an issue than consideration 1. That is to say, subnetwork designers make many more errors of commission than errors of omission. However, certain enhancements to Internet features, such as multicasting and quality-of-service, benefit significantly from support given by the underlying subnetworks beyond that necessary to carry "traditional" unicast, best-effort IP.

A major consideration in the efficient design of any layered communication network is the appropriate layer(s) in which to implement a given function. This issue was first addressed in the seminal paper "End-to-End Arguments in System Design" [SRC81]. That paper argued that many functions can be implemented properly *only* on an end-to-end basis, i.e., at the highest protocol layers, outside the subnetwork. These functions include ensuring the reliable delivery of data and the use of cryptography to provide confidentiality and message integrity.

Such functions cannot be provided solely by the concatenation of hop-by-hop services, so duplicating these functions at the lower protocol layers (i.e., within the subnetwork) can be needlessly redundant or even harmful to cost and performance.

However, partial duplication of functionality in a lower layer can *sometimes* be justified by performance, security or availability considerations. Examples include link-layer retransmission to improve the performance of an unusually lossy channel, e.g., mobile radio; link-level encryption intended to thwart traffic analysis; and redundant transmission links to improve availability, increase throughput, or guarantee performance for certain classes of traffic. Duplication of protocol function should be done only with an understanding of system-level implications, including possible interactions with higher-layer mechanisms.

The original architecture of the Internet was influenced by the end-to-end principle, and in our view it has been part of the reason for the Internet's success.

The remainder of this document discusses the various subnetwork design issues that the authors consider relevant to efficient IP support.

2 Maximum Transmission Units (MTUs) and IP Fragmentation

IPv4 packets (datagrams) vary in size from 20 bytes (the size of the IPv4 header alone) to a maximum of 65535 bytes. Subnetworks need not support maximum-sized (64KB) IP packets, as IP provides a scheme that breaks packets that are too large for a given subnetwork into fragments that travel as independent IP packets and are reassembled at the destination. The maximum packet size supported by a subnetwork is known as its Maximum Transmission Unit (MTU).

Subnetworks may, but are not required to, indicate the length of each packet they carry. One example of a subnetwork that does not is Ethernet with the widely used DIX [DIX82] (not IEEE 802.3 [IEEE8023]) header, which lacks a length field to indicate the true data length when the packet is padded to the minimum frame size of 60 bytes (excluding the frame check sequence). This is not a problem for uncompressed IP because each IP packet carries its own length field.
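The following minimal Python sketch illustrates why the missing DIX length field is harmless for uncompressed IPv4: the receiver recovers the true packet length from the IPv4 Total Length field and discards any trailing padding. It assumes an untagged DIX frame delivered without its frame check sequence; the function and constant names are hypothetical, and IP header checksum validation is omitted.

```python
import struct

DIX_HEADER_LEN = 14        # destination MAC (6) + source MAC (6) + EtherType (2)
DIX_MIN_FRAME_LEN = 60     # minimum DIX frame size, excluding the 4-byte FCS
ETHERTYPE_IPV4 = 0x0800
IPV4_MIN_HEADER_LEN = 20


def ipv4_packet_from_dix_frame(frame: bytes) -> bytes:
    """Return the IPv4 packet carried in a DIX frame, without link padding.

    The DIX header has no length field, so the true data length is taken
    from the IPv4 Total Length field (bytes 2-3 of the IP header).
    """
    if len(frame) < DIX_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        raise ValueError("frame too short to hold an IPv4 header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    payload = frame[DIX_HEADER_LEN:]
    (total_length,) = struct.unpack_from("!H", payload, 2)
    if not IPV4_MIN_HEADER_LEN <= total_length <= len(payload):
        raise ValueError("IPv4 Total Length inconsistent with frame size")
    return payload[:total_length]          # bytes beyond this are padding


# A 28-byte IPv4 packet (most header fields zero), padded out to 60 bytes.
ip_packet = bytes([0x45, 0x00]) + struct.pack("!H", 28) + bytes(24)
frame = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4) + ip_packet
frame += bytes(DIX_MIN_FRAME_LEN - len(frame))
assert ipv4_packet_from_dix_frame(frame) == ip_packet
```

Once header compression removes or encodes the Total Length field, this recovery is no longer possible, which is why compressed headers need framing that carries the length itself.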

If optional header compression [RFC1144] [RFC2507] [RFC2508] [RFC3095] is used, however, the link framing must indicate the frame length, because the length is needed to reconstruct the original header.

In IP version 4 (the version now in widespread use), fragmentation can occur either at the sending host or at an intermediate router, and fragments can be further fragmented at subsequent routers if necessary.

In IP version 6 [RFC2460], fragmentation can occur only at the sending host; it cannot occur in a router. (Fragmentation performed by a router is called "router fragmentation" in this document.)

Both IPv4 and IPv6 provide a "path MTU discovery" procedure [RFC1191] [RFC1435] [RFC1981] that allows the sending host to avoid fragmentation by discovering the minimum MTU along a given path and reducing its packet sizes accordingly. This procedure is optional in IPv4 and IPv6.

Path MTU discovery is widely deployed, but it sometimes encounters problems. Some routers fail to generate the ICMP messages that convey path MTU information to the sender, and sometimes the ICMP messages are blocked by overly restrictive firewalls. The result can be a "Path MTU Black Hole" [RFC2923] [RFC1435].

The Path MTU Discovery procedure, the persistence of path MTU black holes, and the deletion of router fragmentation in IPv6 reflect a consensus of the Internet technical community that router fragmentation is best avoided. This requires that subnetworks support MTUs that are "reasonably" large. The smallest MTU permitted in IPv4 by [RFC791] is 68 bytes, but such a small value would clearly be inefficient. Because IPv6 omits fragmentation by routers, [RFC2460] specifies a larger minimum MTU of 1280 bytes.
Any subnetwork with an internal packet payload smaller than 1280 bytes must implement a mechanism that performs fragmentation/reassembly of IP packets to/from subnetwork frames if it is to support IPv6.

If a subnetwork cannot directly support a "reasonable" MTU with native framing mechanisms, it should internally fragment. That is, it should transparently break IP packets into internal data elements and reassemble them at the other end of the subnetwork.

This leaves the question of what is a "reasonable" MTU. Ethernet (10 and 100 Mb/s) has an MTU of 1500 bytes, and because of the ubiquity of Ethernet, few Internet paths have MTUs larger than this value. This severely limits the utility of larger MTUs provided by other subnetworks. Meanwhile, larger MTUs are increasingly desirable on high-speed subnetworks to reduce the per-packet processing overhead in host computers, and implementers are encouraged to provide them even though they may not be usable when Ethernet is also in the path.

Various "tunneling" schemes, such as GRE [RFC2784] or IP Security in tunnel mode [RFC2406], treat IP as a subnetwork for IP. Since tunneling adds header overhead, it can trigger fragmentation even when the same physical subnetworks (e.g., Ethernet) are used on both sides of the host performing the encapsulation. Tunneling has made it more difficult to avoid router fragmentation and has increased the incidence of path MTU black holes [RFC2401] [RFC2923]. Larger subnetwork MTUs may help to alleviate this problem.

2.1 Choosing the MTU in Slow Networks

In slow networks, the largest possible packet may take a considerable time to send. This is known as channelisation or serialisation delay. Total end-to-end interactive response time should not exceed the well-known human factors limit of 100 to 200 ms.
This includes all sources of delay: electromagnetic propagation delay, queuing delay, and serialisation (store-and-forward) delay, i.e., the time to transmit a packet at link speed.

At low link speeds, store-and-forward delays can dominate total end-to-end delay, and these are in turn directly influenced by the maximum transmission unit (MTU) size. Even when an interactive packet is given a higher queuing priority, it may have to wait for a large bulk transfer packet to finish transmission. This worst-case wait can be set by an appropriate choice of MTU.

For example, if the MTU is set to 1500 bytes, then an MTU-sized packet will take about 8 milliseconds to send on a T1 (1.536 Mb/s) link. But if the link speed is 19.2 kb/s, then the transmission time becomes 625 ms, well above our 100-200 ms limit. A 256-byte MTU would lower this delay to a little over 100 ms. However, care should be taken not to lower the MTU excessively, as this will increase header overhead and trigger frequent router fragmentation if Path MTU discovery is not in use, which is likely to be the case for multicast traffic.

One way to limit delay for interactive traffic without imposing a small MTU is to give priority to this traffic and to preempt (abort) the transmission of a lower-priority packet when a higher-priority packet arrives in the queue. However, the link resources used to send the aborted packet are lost, and overall throughput will decrease.

Another way is to implement a link-level multiplexing scheme that allows several packets to be in progress simultaneously, with transmission priority given to segments of higher-priority IP packets. For links using the Point-To-Point Protocol (PPP) [RFC1661], multi-class multilink [RFC2686] [RFC2687] [RFC2689] provides such a facility.
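The T1 and 19.2 kb/s figures in the example above are simple serialization arithmetic: bits to send divided by link rate. The short Python sketch below, with hypothetical helper names, reproduces them and inverts the calculation to find the largest MTU that keeps the worst-case wait behind one bulk packet within a given budget. It ignores link framing overhead, so real delays are slightly higher.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock packet_bytes onto a link of link_bps bits/s, in ms.

    Framing overhead (flags, escapes, CRC) is ignored, so this is a
    slight underestimate for real links.
    """
    return packet_bytes * 8 * 1000 / link_bps


def largest_mtu_for_wait(max_wait_ms: float, link_bps: float) -> int:
    """Largest packet whose serialization fits within max_wait_ms.

    This bounds how long an interactive packet can be stuck behind a
    bulk packet whose transmission has already started.
    """
    return int(max_wait_ms * link_bps // 8000)


print(serialization_delay_ms(1500, 1_536_000))   # T1: 7.8125
print(serialization_delay_ms(1500, 19_200))      # 625.0
print(serialization_delay_ms(256, 19_200))       # about 106.7
print(largest_mtu_for_wait(100, 19_200))         # 240
```

At 19.2 kb/s, a 100 ms budget allows only about 240 bytes, consistent with the observation that a 256-byte MTU gives a little over 100 ms.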

ATM (Asynchronous Transfer Mode), where SNDUs are fragmented and interleaved across smaller 53-byte ATM cells, is another example of this technique. However, ATM is generally used on high-speed links where the store-and-forward delays are already minimal, and it introduces significant (~9%) additional overhead because a 5-byte header is added to each 48-byte cell payload.

A third example is Data-Over-Cable Service Interface Specifications (DOCSIS), with typical upstream bandwidths of 2.56 Mb/s or 5.12 Mb/s. To reduce the impact of a 1500-byte MTU in DOCSIS 1.0 [DOCSIS1], a link-layer fragmentation mechanism is specified in DOCSIS 1.1 [DOCSIS2]. To accommodate the installed base, DOCSIS 1.1 must be backward compatible with DOCSIS 1.0 cable modems, which generally do not support fragmentation. When DOCSIS 1.0 and DOCSIS 1.1 modems coexist, the unfragmented large data packets from DOCSIS 1.0 cable modems may affect the quality of service for voice packets from DOCSIS 1.1 cable modems. In this case, it has been shown in [DOCSIS3] that bandwidth allocation algorithms can mitigate this effect.

To summarize, there is a fundamental tradeoff between efficiency and latency in the design of a subnetwork, and the designer should keep this tradeoff in mind.

3 Framing on Connection-Oriented Subnetworks

IP requires that subnetworks mark the beginning and end of each variable-length, asynchronous IP packet. Some examples of links and subnetworks that do not provide this as an intrinsic feature include:

1. leased lines carrying a synchronous bit stream;

2. ISDN B-channels carrying a synchronous octet stream;

3. dialup telephone modems carrying an asynchronous octet stream; and

4. Asynchronous Transfer Mode (ATM) networks carrying an asynchronous stream of fixed-size "cells".
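For fixed-cell subnetworks such as ATM (item 4 above), both packet delimiting and the cell overhead mentioned in Section 2.1 can be made concrete. The Python sketch below is a simplified model loosely based on AAL5: it pads each packet so that the packet plus an 8-byte trailer fill a whole number of 48-byte cell payloads, flags the last cell, and recovers packets from the length stored in the trailer. The trailer keeps AAL5's field positions but leaves the UU, CPI and CRC fields zero; all names are hypothetical, and cell loss is not handled.

```python
import math
import struct

CELL_PAYLOAD = 48   # bytes of payload per ATM cell
CELL_HEADER = 5     # bytes of header per ATM cell
TRAILER_LEN = 8     # AAL5-style trailer: UU(1) CPI(1) Length(2) CRC(4)


def cells_needed(packet_len: int) -> int:
    """Cells required for a packet plus its trailer, after padding."""
    return math.ceil((packet_len + TRAILER_LEN) / CELL_PAYLOAD)


def segment(packet: bytes) -> list[tuple[bool, bytes]]:
    """Split a packet into (last_cell_flag, 48-byte payload) pairs."""
    n = cells_needed(len(packet))
    pad = n * CELL_PAYLOAD - len(packet) - TRAILER_LEN
    trailer = struct.pack("!2xH4x", len(packet))   # UU, CPI and CRC left zero
    pdu = packet + bytes(pad) + trailer
    return [(i == n - 1, pdu[i * CELL_PAYLOAD:(i + 1) * CELL_PAYLOAD])
            for i in range(n)]


def reassemble(cells):
    """Yield packets from an in-order stream of (last_cell_flag, payload)."""
    buf = bytearray()
    for last, payload in cells:
        buf += payload
        if last:
            # The Length field sits 6 bytes from the end of the last cell.
            (length,) = struct.unpack_from("!H", buf, len(buf) - 6)
            yield bytes(buf[:length])
            buf.clear()


def wire_efficiency(packet_len: int) -> float:
    """Fraction of transmitted bytes that carry packet data."""
    return packet_len / (cells_needed(packet_len) * (CELL_PAYLOAD + CELL_HEADER))


packets = [b"\x45" * 40, bytes(range(100))]
cells = [c for p in packets for c in segment(p)]
assert list(reassemble(cells)) == packets
print(len(cells))                       # 4: one cell for 40 bytes, three for 100
print(round(wire_efficiency(40), 3))    # 0.755
```

In this model a 40-byte packet, such as a bare TCP acknowledgement, yields only about 75% useful bytes on the wire, so the ~9% cell-header figure is a lower bound once padding and the trailer are counted.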

The Internet community has defined packet framing methods for all these subnetworks. The Point-To-Point Protocol (PPP) [RFC1661], which uses a variant of HDLC, is applicable to bit-synchronous, octet-synchronous and octet-asynchronous links (i.e., examples 1-3 above). PPP is one preferred framing method for IP, since a large number of systems interoperate with PPP. ATM has its own framing methods, described in [RFC2684] [RFC2364].

At high speeds, a subnetwork should provide a framed interface capable of carrying asynchronous, variable-length IP datagrams. The maximum packet size supported by this interface is discussed in Section 2 above. The subnetwork may implement this facility in any convenient manner.

IP packet boundaries need not coincide with any framing or synchronization mechanisms internal to the subnetwork. When the subnetwork implements variable-sized data units, the most straightforward approach is to place exactly one IP packet into each subnetwork data unit (SNDU), and to rely on the subnetwork's existing ability to delimit SNDUs to also delimit IP packets. A good example is Ethernet. However, some subnetworks have SNDUs of one or more fixed sizes, as dictated by switching, forward error correction and/or interleaving considerations. Examples of such subnetworks include ATM, with a single cell size of 48 bytes plus a 5-byte header, and IS-95 digital cellular, with two "rate sets" of four fixed frame sizes each that may be selected on 20 millisecond boundaries.

Because IP packets are of variable length, they may not necessarily fit into an integer multiple of fixed-sized SNDUs. An "adaptation layer" is needed to convert IP packets into SNDUs while marking the boundaries between IP packets in some manner.

There are several approaches to the problem.
The first is to encode each IP packet into one or more SNDUs, with no SNDU containing pieces of more than one IP packet, and padding out the last SNDU of the packet as needed. Bits in a control header added to each SNDU indicate where the data segment belongs in the IP packet. If the subnetwork provides in-order, at-most-once delivery, the header can be as simple as a pair of bits to indicate whether the SNDU is the first and/or the last in the IP packet. Alternatively, for subnetworks that do not reorder the SNDUs carrying a packet, only the last SNDU of the packet need be marked, as this implicitly identifies the next SNDU as the first in a new IP packet. The AAL5 (ATM Adaptation Layer 5) scheme used with ATM is an example of this approach, though it adds other features, including a payload length field and a payload CRC.

In AAL5, the ATM User-User Indication, which is encoded in the Payload Type field of an ATM cell, indicates the last cell of a packet. The packet trailer is carried at the end of that last cell and contains the packet length and a CRC.

Another framing technique is to insert per-segment overhead to indicate the presence of a segment option. When present, the option carries a pointer to the end of the packet. This differs from AAL5 in that it permits another packet to follow within the same segment. MPEG-2 [EN301] [ISO13818] supports this style of fragmentation, and may either use padding (so that each transport stream packet carries data from only one IP packet) or allow a second packet to start within the same transport stream packet (no padding).

A third approach is to insert a special flag sequence into the data stream between IP packets, and to pack the resulting data stream into SNDUs without regard to SNDU boundaries. This has implications when an SNDU is lost, since data on either side of the loss may then be merged into a single damaged frame. The flag sequence can also pad unused space at the end of an SNDU.
If the special flag appears in the user data, it is escaped to an alternate sequence (usually longer than a flag) to avoid being misinterpreted as a flag. The HDLC-based framing schemes used in PPP are all examples of this approach.

All three adaptation schemes introduce overhead; how much depends on the distribution of IP packet sizes, the size(s) of the SNDUs, and, in the HDLC-like approaches, the content of the IP packet (since flag-like sequences occurring in the packet must be escaped, which expands them). The designer must also weigh implementation complexity and performance in the choice and design of an adaptation layer.

4 Connection-Oriented Subnetworks

IP has no notion of a "connection"; it is a purely connectionless protocol. When a connection is required by an application, it is usually provided by TCP [RFC793], the Transmission Control Protocol, running atop IP on an end-to-end basis.

Connection-oriented subnetworks can be (and are widely) used to carry IP, but often with considerable complexity. Subnetworks with a few nodes can simply open a permanent connection between each pair of nodes. This is frequently done with ATM. However, the number of connections increases as the square of the number of nodes (a full mesh of n nodes needs n(n-1)/2 connections), so this is clearly impractical for large subnetworks. A "shim" layer between IP and the subnetwork is therefore required to manage connections. This is one of the most common functions of a Subnetwork Dependent Convergence Function (SNDCF) sublayer between IP and a subnetwork.

SNDCFs typically open subnetwork connections as needed when an IP packet is queued for transmission and close them after an idle timeout. There is no relation between subnetwork connections and any connections that may exist at higher layers (e.g., TCP).
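The on-demand behavior of an SNDCF described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: open_conn, transmit and close_conn are hypothetical hooks standing in for the subnetwork's call setup, data transfer and teardown; connections are keyed by next-hop address; and idle connections are reaped only when expire_idle() is called (for example, from a periodic timer).

```python
import time
from collections.abc import Hashable
from typing import Callable, Dict, List


class OnDemandSndcf:
    """Toy SNDCF: opens a subnetwork connection when a packet is queued for
    a next hop that has none, and closes connections after an idle timeout.
    These connections are unrelated to any higher-layer (e.g., TCP) ones."""

    def __init__(self,
                 open_conn: Callable[[Hashable], object],
                 transmit: Callable[[object, bytes], None],
                 close_conn: Callable[[object], None],
                 idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.open_conn = open_conn          # hypothetical call-setup hook
        self.transmit = transmit            # hypothetical send hook
        self.close_conn = close_conn        # hypothetical teardown hook
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.conns: Dict[Hashable, List] = {}   # next hop -> [conn, last_used]

    def send(self, next_hop: Hashable, packet: bytes) -> None:
        entry = self.conns.get(next_hop)
        if entry is None:                   # no connection yet: set one up
            entry = [self.open_conn(next_hop), 0.0]
            self.conns[next_hop] = entry
        entry[1] = self.clock()             # refresh the idle timer
        self.transmit(entry[0], packet)

    def expire_idle(self) -> None:
        now = self.clock()
        for hop, (conn, last_used) in list(self.conns.items()):
            if now - last_used >= self.idle_timeout:
                self.close_conn(conn)
                del self.conns[hop]
```

Too small an idle_timeout makes nearly every burst trigger a new call setup, while too large a value keeps connections, and any resources reserved for them, open long after traffic stops.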
Because Internet traffic is typically bursty and transaction-oriented, it is often difficult to pick an optimal idle timeout. If the timeout is too short, subnetwork connections are opened and closed rapidly, possibly over-stressing the subnetwork call management system (especially if it was designed for voice traffic holding times). If the timeout is too long, subnetwork connections are idle much of the time, wasting any resources dedicated to them by the subnetwork.

Purely connectionless subnets (such as Ethernet), which have no state and dynamically share resources, are optimal for supporting best-effort IP, which is stateless and dynamically shares resources. Connection-oriented packet networks (such as ATM and Frame Relay), which have state and dynamically share resources, are less optimal, since best-effort IP does not benefit from the overhead of creating and maintaining state. Connection-oriented circuit-switched networks (including the PSTN and ISDN) both have state and statically allocate resources for a call; they thus not only require state creation and maintenance overhead, but also do not benefit from the efficiencies of the statistical multiplexing (sharing of capacity) inherent in IP.

In any event, if an SNDCF that opens and closes subnet connections is used to support IP, care should be taken to make sure that call processing in the subnet can keep up with relatively short holding times.

5 Broadcasting and Discovery

Subnetworks fall into two categories: point-to-point and shared. A point-to-point subnet has exactly two endpoint components (hosts or routers); a shared link has more than two, connected either by an inherent broadcast medium (e.g., Ethernet, radio) or by a switching layer hidden from the network layer (e.g., switched Ethernet, Myrinet [MYR95], ATM).
Switched subnetworks handle broadcast by copying broadcast packets, so that each interface that supports one or more systems (hosts or routers) receives a copy of each packet.

Several Internet protocols for IPv4 make use of broadcast capabilities, including link-layer address lookup (ARP), autoconfiguration (RARP, BOOTP, DHCP), and routing (RIP).

A lack of broadcast capability can impede the performance of these protocols, or render them inoperable (e.g., DHCP). ARP-like link address lookup can be provided by a centralized database, but at the expense of potentially higher response latency and the need for nodes to have explicit knowledge of the ARP server address. Shared links should support native, link-layer subnet broadcast.

A corresponding set of IPv6 protocols uses multicasting (see next section) instead of broadcast to provide similar functions with improved scaling in large networks.

6 Multicasting

The Internet model includes "multicasting", where IP packets are sent to all the members of a multicast group [RFC1112] [RFC2236]. Multicast is an option in IPv4, but a standard feature of IPv6. IPv4 multicast is currently used by multimedia, teleconferencing, gaming, and file distribution (web, peer-to-peer sharing) applications, as well as by some key network and host protocols (e.g., RIPv2, OSPF, NTP). IPv6 additionally relies on multicast for network configuration (DHCP-like autoconfiguration) and link-layer address discovery [RFC2461] (replacing ARP). In the case of IPv6, this can allow autoconfiguration and address discovery to span across routers, whereas the IPv4 broadcast-based services cannot without ad-hoc router support [RFC1812].
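As an illustration of how IPv6 link-layer address discovery [RFC2461] replaces the ARP broadcast with multicast, the sketch below derives the solicited-node multicast group to which an address-resolution query for a given unicast address is sent (the prefix ff02::1:ff00:0/104 followed by the low-order 24 bits of the address), and the Ethernet multicast address that group maps to under the standard IPv6-over-Ethernet mapping (33:33 followed by the low-order 32 bits of the group). The function names are hypothetical; only Python's standard `ipaddress` module is used.

```python
import ipaddress

# Solicited-node multicast prefix ff02::1:ff00:0/104.
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node group for a unicast address: prefix + low 24 bits."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def ipv6_multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet address for an IPv6 multicast group: 33:33 + low 32 bits."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)
```

For example, resolving 2001:db8::abcd:1234 sends the query to ff02::1:ffcd:1234, carried on Ethernet to 33:33:ff:cd:12:34. With hardware multicast filtering, typically only the nodes whose addresses share those low-order 24 bits need to process the query, rather than every node on the link as with an ARP broadcast.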
Multicast-enabled IP routers organize each multicast group into a spanning tree, and route multicast packets by making a copy of each multicast packet and forwarding the copies to each output interface that includes at least one downstream member of the multicast group.

Multicasting is considerably more efficient when a subnetwork explicitly supports it. For example, a router relaying a multicast packet onto an Ethernet subnet need send only one copy of the packet, no matter how many members of the multicast group are connected to the segment. Without native multicast support, routers and switches on shared links would need to use broadcast with software filters, such that every multicast packet sent incurs software overhead for every node on the subnetwork, even if a node is not a member of the multicast group. Alternatively, the router would transmit a separate copy to every member of the multicast group on the segment, as is done on multicast-incapable switched subnets.

Subnetworks using shared channels (e.g., radio LANs, Ethernets) are especially suitable for native multicasting, and their designers should make every effort to support it. This involves designating a section of the subnetwork's own address space for multicasting. On these networks, multicast is basically broadcast on the medium, with Layer 2 receiver filters.

Subnet interfaces also need to be designed to accept packets addressed to some number of multicast addresses in addition to the unicast packets specifically addressed to them. How many multicast addresses need to be supported by a host depends on the requirements of the associated host; at least several dozen will meet most current needs.

On low-speed networks the multicast address recognition function may be readily implemented in host software, but on high-speed networks it should be implemented in subnetwork hardware.
This hardware need not be complete; for example, many Ethernet interfaces implement a "hashing" function where the IP layer receives all of the multicast (and unicast) traffic to which the associated host subscribes, plus some small fraction of multicast traffic to which the host does not subscribe. Host/router software then has to discard the unwanted packets that pass the Layer 2 multicast address filter [RFC1112].

There does not need to be a one-to-one mapping between a Layer 2 multicast address and an IP multicast address; on Ethernet, for example, only the low-order 23 bits of an IPv4 group address are mapped into the MAC address [RFC1112], so 32 IPv4 groups share each Ethernet multicast address. Such address overlap may significantly degrade the filtering capability of a receiver's hardware multicast address filter. A subnetwork supporting only broadcast should use this service for multicast and must rely on software filtering.

Switched subnetworks must also provide a mechanism for copying multicast packets to ensure the packets reach at least all members of a multicast group. One option is to "flood" multicast packets, in the same manner as broadcast. This can lead to unnecessary transmissions on some subnetwork links (notably with non-multicast-aware Ethernet switches). Some subnetworks therefore allow multicast filter tables to control which links receive packets belonging to a specific group. To configure this automatically requires access to Layer 3 group membership information (e.g., IGMP). Various implementation options currently exist to provide a subnet node with a list of multicast addresses to port/interface mappings [MBONED-GAP]. These employ a range of approaches, including signaling from end hosts (e.g., IEEE 802 GARP/GMRP [802.1p]), signaling from switches (e.g., CGMP [CGMP] and RGMP [RFC3488]), interception and proxy of IP group membership packets (e.g., IGMP/MLD Proxy [MAGMA-PROXY]), and enabling Layer 2 devices to snoop/inspect/peek into forwarded Layer 3 protocol headers (e.g., IGMP, MLD, PIM) so that they may infer Layer 3 multicast group membership. These approaches differ in their complexity, flexibility and ability to support new protocols.

7 Bandwidth on Demand (BoD) Subnets

Some subnets allow a number of subnet nodes to share a channel efficiently by assigning transmission opportunities dynamically. Transmission opportunities are requested by a subnet node when it has packets to send. The subnet schedules and grants transmission opportunities sufficient to allow the transmitting subnet node to send one or more packets (or packet fragments). We call these subnets Bandwidth on Demand (BoD) subnets. Examples of BoD subnets include Demand Assignment Multiple Access (DAMA) satellite and terrestrial wireless networks, IEEE 802.11 point coordination function (PCF) mode, and DOCSIS. A connection-oriented network (like the PSTN, ATM or Frame Relay) reserves resources on a much longer timescale, and is therefore not a BoD subnet in our taxonomy.

The design parameters for BoD are similar to those in connection-oriented subnetworks, although the implementations may vary significantly. In BoD, the user typically requests access to the shared channel for some duration. Access may be allocated for a period of time at a specific rate, for a certain number of packets, or until the user releases the channel. Access may be coordinated through a central management entity or with a distributed algorithm amongst the users. Examples of the resource that may be shared include a terrestrial wireless hop, a cable modem uplink, a satellite uplink, and an end-to-end satellite channel.

Long-delay BoD subnets pose problems similar to connection-oriented networks in anticipating traffic.
While connection-oriented subnets hold idle channels open in expectation of new data, BoD subnets request channel access based on buffer occupancy (or expected buffer occupancy) on the sending port. Poor performance will likely result if the sender does not anticipate additional traffic arriving at that port during the time it takes to grant a transmission request. It is recommended that the algorithm have the capability to extend a hold on the channel for data that has arrived after the original request was generated (this may be done by piggybacking new requests on user data).

There is a wide variety of BoD protocols available. However, there has been relatively little comprehensive research on the interactions between BoD mechanisms and Internet protocol performance. Research on some specific mechanisms is available (e.g., [AR02]). One item that has been studied is TCP's retransmission timer [KY02]. BoD systems can cause spurious timeouts when adjusting from a relatively high data rate to a relatively low data rate. In this case, TCP's transmitted data takes longer to get through the network than predicted by the TCP sender's computed retransmission timeout, and therefore the TCP sender is prone to resending a segment prematurely.

8 Reliability and Error Control

In the Internet architecture, the ultimate responsibility for error recovery is at the end points [SRC81]. The Internet may occasionally drop, corrupt, duplicate or reorder packets, and the transport protocol (e.g., TCP) or application (e.g., if UDP is used as the transport protocol) must recover from these errors on an end-to-end basis. Error recovery in the subnetwork is therefore justifiable only to the extent that it can enhance overall performance. It is important to recognize that a subnetwork can go too far in attempting to provide error recovery services in the Internet environment.
Subnet reliability should be "lightweight", i.e., it only has to be "good enough", *not* perfect.

In this section we discuss how to analyze characteristics of a subnetwork to determine what is "good enough". The discussion below focuses on TCP, which is the most widely used transport protocol in the Internet. It is widely believed (and is a stated goal within the IETF) that non-TCP transport protocols should attempt to be "TCP-friendly" and have many of the same performance characteristics. Thus, the discussion below should be applicable even to portions of the Internet where TCP may not be the predominant protocol.

8.1 TCP vs Link-Layer Retransmission

Error recovery involves the generation and transmission of redundant information computed from user data. Depending on how much redundant information is sent and how it is generated, the receiver can use it to reliably detect transmission errors, to correct up to some maximum number of transmission errors, or both. The general approach is known as Error Control Coding, or ECC.

The use of ECC to detect transmission errors so that retransmissions (hopefully without errors) can be requested is widely known as "ARQ" (Automatic Repeat Request).

When enough ECC information is available to permit the receiver to correct some transmission errors without a retransmission, the approach is known as Forward Error Correction (FEC). Due to the greater complexity of the required ECC and the need to tailor its design to the characteristics of a specific modem and channel, FEC has traditionally been implemented in special-purpose hardware integral to a modem. This effectively makes it part of the physical layer.

Unlike ARQ, FEC was seldom used for telecommunications outside of space links prior to the 1990s.
It is now nearly universal in telephone, cable and DSL modems, digital satellite links and digital mobile telephones. FEC is also heavily used in optical and magnetic storage, where "retransmissions" are not possible.

Some systems use hybrid combinations of ARQ layered atop FEC; V.90 dialup modems (in the upstream direction) with V.42 error control are one example. Most errors are corrected by the trellis (FEC) code within the V.90 modem, and most that remain are detected and corrected by the ARQ mechanisms in V.42.

Work is now underway to apply FEC above the physical layer, primarily in connection with reliable multicasting [RFC3048], where conventional ARQ mechanisms are inefficient or difficult to implement. However, in this discussion we will assume that if FEC is present, it is implemented within the physical layer.

Depending on the layer where it is implemented, error control can operate on an end-to-end basis or over a shorter span such as a single link. TCP is the most important example of an end-to-end protocol that uses an ARQ strategy.

Many link-layer protocols use ARQ, usually some flavor of HDLC [ISO3309]. Examples include the X.25 link layer, the AX.25 protocol used in amateur packet radio, 802.11 wireless LANs, and the reliable link layer specified in IEEE 802.2.

Only end-to-end error recovery can ensure a reliable service to the application (see Section 8). However, some subnetworks (e.g., many wireless links) also require link-layer error recovery as a performance enhancement [RFC3366]. For example, many cellular links have small physical frame sizes (< 100 bytes) and relatively high frame loss rates. Relying entirely on end-to-end error recovery clearly yields a performance degradation, as retransmissions across the end-to-end path take much longer to be received than when link-layer retransmissions are used.
Thus, link-layer error recovery can often increase end-to-end performance. As a result, link-layer and end-to-end recovery often co-exist; this can lead to inefficient interactions between the two layers of ARQ protocols.

This inter-layer "competition" might lead to the following wasteful situation. When the link layer retransmits (parts of) a packet, the link latency momentarily increases. Since TCP bases its retransmission timeout on prior measurements of total end-to-end latency, including that of the link in question, this sudden increase in latency may trigger an unnecessary retransmission by TCP of a packet that the link layer is still retransmitting. Such spurious end-to-end retransmissions generate unnecessary load and reduce end-to-end throughput. As a result, the link layer may even have multiple copies of the same packet in the same link queue at the same time. In general, one could say the competing error recovery is caused by an inner control loop (link-layer error recovery) reacting to the same signal as an outer control loop (end-to-end error recovery) without any coordination between the loops. Note that this is solely an efficiency issue; TCP continues to provide reliable end-to-end delivery over such links.

This raises the question of how persistent a link-layer sender should be in performing retransmission [RFC3366]. We define the link-layer (LL) ARQ persistency as the maximum time that a particular link will spend trying to transfer a packet before it can be discarded. This deliberately simplified definition says nothing about maximum number of retransmissions, retransmission strategies, queue sizes, queuing disciplines, transmission delays, or the like.
The reason we use the term LL ARQ persistency instead of a term such as 'maximum link-layer packet holding time' is that the definition closely relates to link-layer error recovery. For example, on links that implement straightforward error recovery strategies, LL ARQ persistency will often correspond to a maximum number of retransmissions permitted per link-layer frame.

For link layers that do not or cannot differentiate between flows (e.g., due to network-layer encryption), the LL ARQ persistency should be small. This avoids any harmful effects or performance degradation resulting from indiscriminate high persistence. A detailed discussion of these issues is provided in [RFC3366].

However, when a link layer can identify individual flows and apply ARQ selectively [LKJK02], then the link ARQ persistency should be high for a flow using reliable unicast transport protocols (e.g., TCP) and must be low for all other flows. Setting the link ARQ persistency larger than the largest link outage allows TCP to rapidly restore transmission without the need to wait for a retransmission timeout. This generally improves TCP performance in the face of transient outages. However, excessively high persistence may be disadvantageous; a practical upper limit of 30-60 seconds may be desirable. Implementation of such schemes remains a research issue. (See also Section 8.2, "Recovery from Subnetwork Outages".)

Many subnetwork designers have opportunities to reduce the probability of packet loss, e.g., with FEC, ARQ and interleaving, at the cost of increased delay. TCP performance improves with decreasing loss but worsens with increasing end-to-end delay, so it is important to find, through analysis and simulation, the proper balance for the expected TCP traffic on its end-to-end paths across the subnet.
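To make the loss-versus-delay trade-off and the notion of LL ARQ persistency concrete, the sketch below uses a deliberately simple model that is an assumption, not part of the analysis above: every link-layer transmission attempt fails independently with probability q, and each attempt (including waiting for the link-layer acknowledgment) takes a fixed time. Under that model, capping a frame at n attempts gives a residual loss of q^n and a persistency of n attempt times. The function name and parameters are hypothetical.

```python
from typing import Tuple


def ll_arq_tradeoff(q: float, max_attempts: int,
                    attempt_time: float) -> Tuple[float, float, float]:
    """Hypothetical i.i.d.-loss model of a persistent link-layer ARQ.

    q            -- probability that one transmission attempt is lost
    max_attempts -- attempts allowed before the frame is discarded
    attempt_time -- seconds per attempt, including the link-layer ACK wait
    Returns (residual loss, mean attempts per frame, LL ARQ persistency).
    """
    residual = q ** max_attempts                 # every attempt failed
    # P(more than k attempts) = q**k for k < n, so E[attempts] = sum of q**k.
    mean_attempts = sum(q ** k for k in range(max_attempts))
    persistency = max_attempts * attempt_time    # worst-case holding time
    return residual, mean_attempts, persistency
```

With q = 0.1 and 20 ms per attempt, raising the cap from 1 to 3 attempts cuts residual loss from 10% to 0.1%, at the cost of raising the worst-case holding time from 20 ms to 60 ms while adding only about 0.11 attempts per frame on average. Real links with correlated (bursty) losses behave worse than this independent-loss model suggests.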
8.2 Recovery from Subnetwork Outages

Some types of subnetworks, particularly mobile radio, are subject to frequent temporary outages. For example, an active cellular data user may drive or walk into an area (such as a tunnel) that is out of range of any base station. No packets will be successfully delivered until the user returns to an area with coverage.

The Internet protocols currently provide no standard way for a subnetwork to explicitly notify an upper-layer protocol (e.g., TCP) that it is experiencing an outage rather than severe congestion.

Under these circumstances TCP will, after each unsuccessful retransmission, wait even longer before trying again; this is its "exponential back-off" algorithm. Furthermore, TCP will not discover that the subnetwork outage has ended until its next retransmission attempt. If TCP has backed off, this may take some time. This can lead to extremely poor TCP performance over such subnetworks.

It is therefore highly desirable that a subnetwork subject to outages not silently discard packets during an outage. Ideally, the subnetwork should define an interface to the next higher layer (i.e., IP) that allows it to refuse packets during an outage, and to automatically ask IP for new packets when it is again able to deliver them. If it cannot do this, then the subnetwork should hold onto at least some of the packets it accepts during an outage and attempt to deliver them when the outage ends. When packets are discarded, IP should be notified so that the appropriate ICMP messages can be sent.

Note that it is *not* necessary to completely avoid dropping packets during an outage. The purpose of holding onto a packet during an outage, either in the subnetwork or at the IP layer, is so that its eventual delivery will implicitly notify TCP that the subnetwork is again operational.
This is to enhance performance, not to ensure reliability; reliability, as discussed earlier, can only be ensured on an end-to-end basis.

Only a few packets per TCP connection, including ACKs, need be held in this way to cause the TCP sender to recover from the additional losses once the flow resumes [RFC3366].

Because it would be a layering violation (and possibly a performance hit) for IP or a subnetwork layer to look at TCP headers (which would in any event be impossible if IPsec [RFC2401] encryption is in use), it would be reasonable for the IP or subnetwork layers to choose, as a design parameter, some small number of packets that will be retained during an outage.

8.3 CRCs, Checksums and Error Detection

The TCP [RFC793], UDP [RFC768], ICMP, and IPv4 [RFC791] protocols all use the same simple 16-bit 1's complement checksum algorithm [RFC1071] to detect corrupted packets. The IPv4 header checksum protects only the IPv4 header, while the TCP, ICMP, and UDP checksums provide end-to-end error detection for both the transport pseudo header (including network and transport layer information) and the transport payload data. Protection of the data is optional for applications using UDP [RFC768] over IPv4, but is required for IPv6.

The Internet checksum is not very strong from a coding theory standpoint, but it is easy to compute in software, and various proposals to replace the Internet checksums with stronger checksums have failed. However, it is known that undetected errors can and do occur in packets received by end hosts [SP2000].

To reduce processing costs, IPv6 has no IP header checksum. The destination host detects "important" errors in the IP header such as the delivery of the packet to the wrong destination.
This is done by including the IP source and destination addresses (pseudo header) in the computation of the checksum in the TCP or UDP header, a practice already performed in IPv4. Errors in other IPv6 header fields may go undetected within the network; this was considered a reasonable price to pay for a considerable reduction in the processing required by each router, and it was assumed that subnetworks would use a strong link CRC.

One way to provide additional protection for an IPv4 or IPv6 header is by the authentication and packet integrity services of the IP Security (IPsec) protocol [RFC2401]. However, this may not be a choice available to the subnetwork designer.

Most subnetworks implement error detection just above the physical layer. Packets corrupted in transmission are detected and discarded before delivery to the IP layer. A 16-bit cyclic redundancy check (CRC) is usually the minimum for error detection. This is significantly more robust against most patterns of errors than the 16-bit Internet checksum. However, note that the error detection properties of a specific CRC code diminish with increasing frame size. The Point-to-Point Protocol [RFC1662] requires support of a 16-bit CRC for each link frame, with a 32-bit CRC as an option. (PPP is often used in conjunction with a dialup modem, which can provide its own error control.) Other subnetworks, including 802.3/Ethernet, AAL5/ATM, FDDI, Token Ring and PPP over SONET/SDH, all use a 32-bit CRC. Many subnetworks can also use other mechanisms to enhance the error detection capability of the link CRC (e.g., FEC in dialup modems, mobile radio and satellite channels).

Any new subnetwork designed to carry IP should therefore provide error detection for each IP packet that is at least as strong as the 32-bit CRC specified in [ISO3309].
While this will achieve a very low undetected packet error rate due to transmission errors, it will not (and need not) achieve a very low packet loss rate, as the Internet protocols are better suited to dealing with lost packets than to dealing with corrupted packets [SRC81].

Packet corruption may be, and is, also caused by bugs in host and router hardware and software. Even if every subnetwork implemented strong error detection, it is still essential that end-to-end checksums are used at the receiving end host [SP2000].

Designers of complex subnetworks consisting of internal links and packet switches should consider implementing error detection on an edge-to-edge basis to cover an entire SNDU (or IP packet). A CRC would be generated at the entry point to the subnetwork and checked at the exit endpoint. This may be used instead of, or in combination with, error detection at the interface to each physical link. An edge-to-edge check has the significant advantage of protecting against errors introduced anywhere within the subnetwork, not just within its transmission links. Examples of this approach include the way in which the Ethernet CRC-32 is handled by LAN bridges [802.1D]. ATM AAL5 [ITU-I363] also uses an edge-to-edge CRC-32.

Some specific applications, such as a voice codec that is robust against bit errors in the speech samples, may be tolerant of residual errors in the data they exchange, but removal of the link CRC may expose the network to an undesirable increase in undetected errors in the IP and transport headers. Applications may also require a high level of error protection for control information exchanged by protocols acting above the transport layer. For such mechanisms to work, the receiving application must be able to tolerate receiving corrupted data.
This also requires that an application use a mechanism to signal that payload corruption is permitted and to indicate the coverage (headers and data) that is required to be protected by the subnetwork CRC. Currently there is no Internet standard for supporting partial payload protection. Receipt of corrupt data by arbitrary application protocols carries a serious danger that a subnet delivers data with errors which remain undetected by the application and hence corrupt the communicated data [SRC81].

8.4 How TCP Works

One of TCP's functions is end-host-based congestion control for the Internet. This is a critical part of the overall stability of the Internet, so it is important that link-layer designers understand TCP's congestion control algorithms.

TCP assumes that, at the most abstract level, the network consists of links and queues. Queues provide output buffering on links that are momentarily oversubscribed. They smooth instantaneous traffic bursts to fit the link bandwidth. When demand exceeds link capacity long enough to fill the queue, packets must be dropped. The traditional action of dropping the most recent packet ("tail dropping") is no longer recommended [RFC2309] [RFC2914], but it is still widely practiced.

TCP uses sequence numbering and acknowledgments (ACKs) on an end-to-end basis to provide reliable, sequenced delivery. TCP ACKs are cumulative, i.e., each implicitly ACKs every segment received so far. If a packet with an unexpected sequence number is received, the ACK field in the packets returned by the receiver will cease to advance. Using an optional enhancement, TCP can send selective acknowledgments (SACKs) [RFC2018] to indicate which segments have arrived at the receiver.
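The difference between a cumulative ACK and SACK information can be shown by computing both from the byte ranges a receiver holds. This is a simplified Python sketch, not TCP receiver code: ranges are half-open [left, right) sequence numbers (matching the left/right edges of an RFC 2018 SACK block), sequence-number wraparound is ignored, blocks are reported in sequence order rather than most-recent-first as RFC 2018 requires, and the limit of three blocks is an assumption about the available option space. The function name is hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [left, right) byte sequence numbers


def ack_and_sack(rcv_next: int, received: List[Range],
                 max_blocks: int = 3) -> Tuple[int, List[Range]]:
    """Return (cumulative ACK, SACK blocks) for the ranges received so far.

    rcv_next is the first sequence number the receiver is still waiting for.
    """
    # Merge overlapping or adjacent ranges, in sequence-number order.
    merged: List[List[int]] = []
    for left, right in sorted(received):
        if merged and left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)
        else:
            merged.append([left, right])

    ack = rcv_next
    blocks: List[Range] = []
    for left, right in merged:
        if left <= ack:
            ack = max(ack, right)         # in-sequence data advances the ACK
        else:
            blocks.append((left, right))  # data beyond a hole -> SACK block
    return ack, blocks[:max_blocks]
```

For example, if 1000-byte segments starting at 0 arrive except those at 1000 and 4000, the ACK field stays at 1000 (each further arrival produces another ACK for 1000), while the SACK blocks (2000, 4000) and (5000, 6000) tell the sender exactly which data is already at the receiver.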
Since the most common cause of packet loss is congestion, TCP treats packet loss as a potential indication of Internet congestion along the path between TCP end hosts. This happens automatically, and the subnetwork need not know anything about IP or TCP. A subnetwork node simply drops packets whenever it must, though some packet-dropping strategies (e.g., RED) are more fair to competing flows than others.

TCP recovers from packet losses in two different ways. The most important mechanism is the retransmission timeout. If an ACK fails to arrive after a certain period of time, TCP retransmits the oldest unacknowledged packet. Taking this as a hint that the network is congested, TCP waits for the retransmission to be ACKed before it continues, and it gradually increases the number of packets in flight as long as a timeout does not occur again.

A retransmission timeout can impose a significant performance penalty, as the sender is idle during the timeout interval and restarts with a congestion window of 1 following the timeout. To allow faster recovery from the occasional lost packet in a bulk transfer, an alternate scheme known as "fast recovery" was introduced [RFC2581] [RFC2582] [RFC2914] [TCPF98].

Fast recovery relies on the fact that when a single packet is lost in a bulk transfer, the receiver continues to return ACKs to subsequent data packets that do not actually acknowledge any newly received data. These are known as "duplicate acknowledgments" or "dupacks". The sending TCP can use dupacks as a hint that a packet has been lost and retransmit it without waiting for a timeout. Dupacks effectively constitute a negative acknowledgment (NAK) for the packet sequence number in the acknowledgment field.
TCP waits until a certain number of dupacks (currently 3) are seen prior to assuming a loss has occurred; this helps avoid an unnecessary retransmission during out-of-sequence delivery.

A new technique called "Explicit Congestion Notification" (ECN) [RFC3168] allows routers to directly signal congestion to hosts without dropping packets. This is done by setting a bit in the IP header. Since ECN support is likely to remain optional, the lack of an ECN bit must NEVER be interpreted as a lack of congestion. Thus, for the foreseeable future, TCP must interpret a lost packet as a signal of congestion.

The TCP "congestion avoidance" [RFC2581] algorithm maintains a congestion window (cwnd) controlling the amount of data TCP may have in flight at any moment. Reducing cwnd reduces the overall bandwidth obtained by the connection; similarly, raising cwnd increases the performance, up to the limit of the available capacity.

TCP probes for available network capacity by initially setting cwnd to one or two packets and then increasing cwnd by one packet for each ACK returned from the receiver, which roughly doubles cwnd every round-trip time. This is TCP's "slow start" mechanism. When a packet loss is detected (or congestion is signaled by other mechanisms), cwnd is reset to one and the slow start process is repeated until cwnd reaches one half of its previous setting before the reset. Cwnd continues to increase past this point, but at a much slower rate than before (about one packet per round-trip time). If no further losses occur, cwnd will ultimately reach the window size advertised by the receiver.

This is an "Additive Increase, Multiplicative Decrease" (AIMD) algorithm. The steep decrease of cwnd in response to congestion provides for network stability; the AIMD algorithm also provides for fairness between long-running TCP connections sharing the same path.
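The slow-start and congestion-avoidance behaviour described above can be traced one round-trip time at a time. This Python sketch follows the simplified model of the text rather than any particular TCP implementation: it assumes every segment is ACKed individually (no delayed ACKs), that a loss resets cwnd to one and sets the slow-start threshold to half the window at the time of loss, and it ignores fast recovery. The function name and parameters are hypothetical.

```python
from typing import List, Set


def cwnd_trace(rounds: int, loss_rounds: Set[int],
               rwnd: int = 64, init_cwnd: int = 1) -> List[int]:
    """Congestion window (in segments) at the start of each round trip.

    loss_rounds -- round indices in which a loss is detected
    rwnd        -- receiver's advertised window, the final ceiling on cwnd
    """
    cwnd, ssthresh = init_cwnd, rwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 2)    # multiplicative decrease
            cwnd = 1                        # restart with slow start
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start: +1 per ACK, ~x2 per RTT
        else:
            cwnd = min(cwnd + 1, rwnd)      # congestion avoidance: +1 per RTT
    return trace
```

A single loss in round 5, when cwnd is 32, gives the trace [1, 2, 4, 8, 16, 32, 1, 2, 4, 8, 16, 17]: exponential growth back to half the previous window, then linear growth of one segment per round trip, the sawtooth characteristic of AIMD.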
8.5 TCP Performance Characteristics

Caveat

Here we present a current "state-of-the-art" understanding of TCP performance. This analysis attempts to characterize the performance of TCP connections over links of varying characteristics.

Link designers may wish to use the techniques in this section to predict what performance TCP/IP may achieve over a new link-layer design. Such analysis is encouraged. This is a relatively new analysis, and the theory is based on single-stream TCP connections under "ideal" conditions. The results of such analysis may therefore differ from actual performance in the Internet. That said, we have done the best we can to provide information that will help designers get an accurate picture of the capabilities and limitations of TCP under various conditions.

8.5.1 The Formulae

The performance of TCP's AIMD Congestion Avoidance algorithm has been extensively analyzed. The current best formula for the performance of the specific algorithms used by Reno TCP (i.e., the TCP specified in [RFC2581]) is given by Padhye et al. [PFTK98]. This formula is:

                                  MSS
   BW = --------------------------------------------------------
        RTT*sqrt(1.33*p) + RTO*p*[1+32*p^2]*min[1,3*sqrt(.75*p)]

where

   BW   is the maximum TCP throughput achievable by an individual TCP flow
   MSS  is the TCP segment size being used by the connection
   RTT  is the end-to-end round trip time of the TCP connection
   RTO  is the retransmission timeout (based on RTT)
   p    is the packet loss rate for the path (e.g., .01 if there is 1% packet loss)

Note that the speed of the links making up the Internet path does not explicitly appear in this formula. Attempting to send faster than the slowest link in the path causes the queue to grow at the transmitter driving the bottleneck.
This increases the RTT, which in turn reduces the achievable throughput.

This is currently considered to be the best approximate formula for Reno TCP performance. A further simplification to this formula is generally made by assuming that RTO is approximately 5*RTT.

TCP is constantly being improved. A simpler formula was derived by Ott et al. [MSMO97]. It gives an upper bound on the performance of any AIMD algorithm that is likely to be implemented in TCP in the future:

             MSS     1
   BW = C --- -------
             RTT sqrt(p)

where C is 0.93.

8.5.2 Assumptions

Both formulae assume that the TCP Receiver Window is not limiting the performance of the connection. Because the receiver window is entirely determined by end hosts, we assume that hosts will maximize the announced receiver window to maximize their network performance.

Both of these formulae allow BW to become infinite if there is no loss. However, an Internet path will drop packets at bottleneck queues if the load is too high. Thus, a completely lossless TCP/IP network can never occur (unless the network is being underutilized).

The RTT used is the arithmetic average, including queuing delays.

The formulae are for a single TCP connection. If a path carries many TCP connections, each will follow the formulae above independently.

The formulae assume long-running TCP connections. For connections that are extremely short (<10 packets) and don't lose any packets, performance is driven by the TCP slow-start algorithm. For connections of medium length, where on average only a few segments are lost, single-connection performance will actually be slightly better than given by the formulae above.

The difference between the simple and complex formulae above is that the complex formula includes the effects of TCP retransmission timeouts.
For very low levels of packet loss (significantly less than 1%), timeouts are unlikely to occur, and the formulae lead to very similar results. At higher packet losses (1% and above), the complex formula gives a more accurate estimate of performance, which will always be significantly lower than the result from the simple formula.

Note that these formulae break down as p approaches 100%.

8.5.3 Analysis of Link-Layer Effects on TCP Performance

Consider the following example:

A designer invents a new wireless link layer which, on average, loses 1% of IP packets. The link layer supports packets of up to 1040 bytes, and has a one-way delay of 20 msec.

If this link layer were used in the Internet, on a path that otherwise had a round trip of 80 msec, you could compute an upper bound on the performance as follows:

For MSS, use 1000 bytes to exclude the 40 bytes of minimum IPv4 and TCP headers.

For RTT, use 120 msec (80 msec for the Internet part, plus 20 msec each way for the new wireless link).

For p, use .01. For C, assume 1.

The simple formula gives:

   BW = (1000 * 8 bits) / (.120 sec * sqrt(.01)) = 666 kbit/sec

The more complex formula, taking RTO = 5*RTT = 600 msec as suggested above, gives:

   BW = 8000 bits / (.120*sqrt(.0133) + .600*.01*[1+32*.0001]*min[1,.26]) sec
      = 8000 bits / (.01384 + .00156) sec
      = approximately 519 kbit/sec

If this were a 2 Mb/s wireless LAN, the designers might be somewhat disappointed.

Some observations on performance:

1. We have assumed that the packet losses on the link layer are interpreted as congestion by TCP. This is a "fact of life" that must be accepted.

2. The equations for TCP performance are all expressed in terms of packet loss, but many subnetwork designers think in terms of bit-error ratio. *If* channel bit errors are independent, then the probability of a packet being corrupted is:

   p = 1 - ([1 - BER]^[FRAME_SIZE*8])

Here we assume FRAME_SIZE is in bytes and "^" represents exponentiation.
FRAME_SIZE includes the user data and all headers (TCP, IP and subnetwork). (Note: this analysis assumes the subnetwork does not perform ARQ or transparent fragmentation [RFC3366].) If the inequality

   BER * [FRAME_SIZE*8] << 1

holds, the packet loss probability p can be approximated by:

   p = BER * [FRAME_SIZE*8]

These equations can be used to apply BER to the performance equations above.

Note that FRAME_SIZE can vary from one packet to the next. Small packets (such as TCP acks) generally have a smaller probability of packet error than, say, a TCP packet carrying one MSS (maximum segment size) of user data. A flow of small TCP acks can be expected to be slightly more reliable than a stream of larger TCP data segments.

It bears repeating that the above analysis assumes that bit errors are statistically independent. Because this is not true for many real links, our computation of p is actually an upper bound, not the exact probability of packet loss.

There are many reasons why bit errors are not independent on real links. Many radio links are affected by propagation fading or by interference that lasts over many bit times.

Also, links with Forward Error Correction (FEC) generally have very non-uniform bit error distributions that depend on the type of FEC. In general, the uncorrected errors tend to occur in bursts even when channel symbol errors are independent. In all such cases, our computation of p from BER can only place an upper limit on the packet loss rate.

If the distribution of errors under the FEC scheme is known, one could apply the same type of analysis as above, using the correct distribution function for the bit errors. It is more likely in these FEC cases, however, that empirical methods are needed to determine the actual packet loss rate.

3. Note that the packet size plays an important role.
If the subnetwork loss characteristics are such that large packets have the same probability of loss as smaller packets, then larger packets will yield improved performance, since BW in both formulae scales with MSS.

4. We have chosen a specific RTT that might occur on a wide-area Internet path within the USA. It is important to recognize that a variety of RTT values are experienced in the Internet.

For example, RTTs are typically less than 10 msec in a wired LAN environment when communicating with a local host. International connections may have RTTs of 200 msec or more. Modems and other low-capacity links can add considerable delay due to their long packet transmission (serialisation) times.

Links over geostationary repeater satellites have one-way speed-of-light delays of around 250 msec: a minimum of 125 msec propagation delay up to the satellite and 125 msec down. The RTT of an end-to-end TCP connection that includes such a link can be expected to be greater than 250 msec.

Queues on heavily congested links may back up, increasing RTTs. Finally, virtual private networks (VPNs) and other forms of encryption and tunneling can add significant end-to-end delay to network connections.

9 Quality-of-Service (QoS) considerations

It is generally recognized that specific service guarantees are needed to support real-time multimedia, toll-quality telephony and other performance-critical applications. The provision of such Quality of Service guarantees in the Internet is an active area of research and standardization. The IETF has not converged on a single service model, set of services or single mechanism that will offer useful guarantees to applications and be scalable to the Internet. Indeed, the IETF does not have a single definition of Quality of Service. [RFC2990] represents a current understanding of the challenges in architecting QoS for the Internet.
There are presently two architectural approaches to providing mechanisms for QoS support in the Internet.

IP Integrated Services (Intserv) [RFC1633] provides fine-grained service guarantees to individual flows. Flows are identified by a flow specification (flowspec), which creates a stateful association between individual packets by matching fields in the packet header. Capacity is reserved for the flow, and appropriate traffic conditioning and scheduling is installed in routers along the path. The ReSerVation Protocol (RSVP) [RFC2205] [RFC2210] is usually, though not necessarily, used to install the flow QoS state. Intserv defines two services, in addition to the Default (best effort) service.

-- Guaranteed Service (GS) [RFC2212] offers hard upper bounds on delay to flows that conform to a traffic specification (TSpec). It uses a fluid-flow model to relate the TSpec and reserved bandwidth (RSpec) to variable delay. Non-conforming packets are forwarded on a best-effort basis.

-- Controlled Load Service (CLS) [RFC2211] offers delay and packet loss closely approximating that of an unloaded network to flows that conform to a TSpec, but no hard bounds. Non-conforming packets are forwarded on a best-effort basis.

Intserv requires installation of state information in every participating router. Performance guarantees cannot be made unless this state is present in every router along the path. This requirement, along with RSVP processing and the need for usage-based accounting, is believed to pose scalability problems, particularly in the core of the Internet [RFC2208].

IP Differentiated Services (Diffserv) [RFC2475] provides a "toolkit" offering coarse-grained controls to aggregates of flows. Diffserv in itself does NOT provide QoS guarantees, but can be used to construct services with QoS guarantees across a Diffserv domain.
Diffserv attempts to address the scaling issues associated with Intserv by requiring state awareness only at the edge of a Diffserv domain. At the edge, packets are classified into flows, and the flows are conditioned (marked, policed or shaped) to a traffic conditioning specification (TCS). A Diffserv Codepoint (DSCP), identifying a per-hop behavior (PHB), is set in each packet header. The DSCP is carried in the DS-field, subsuming six bits of the former Type-of-Service (ToS) byte [RFC791] of the IP header [RFC2474]. The PHB denotes the forwarding behavior to be applied to the packet in each node in the Diffserv domain. Although there is a "recommended" DSCP associated with each PHB, the mappings from DSCPs to PHBs are defined by the DS-domain. In fact, there can be several DSCPs associated with the same PHB. Diffserv presently defines three PHBs.

The class selector PHB [RFC2474] replaces the IP precedence field of the former ToS byte. It offers relative forwarding priorities.

The Expedited Forwarding (EF) PHB [RFC2598] guarantees that packets will have a well-defined minimum departure rate. As long as the EF arrival rate does not exceed this rate, the associated queues remain short or empty. EF is intended to support services that offer tightly bounded loss, delay and delay jitter.

The Assured Forwarding (AF) PHB group [RFC2597] offers different levels of forwarding assurance for each aggregated flow of packets. Each AF group is independently allocated forwarding resources. Packets are marked with one of three drop precedences; those with the highest drop precedence are dropped with higher probability than those marked with the lowest drop precedence. DSCPs are recommended for four independent AF groups, although a DS domain can have more or fewer AF groups.

Ongoing work in the IETF is addressing ways to support Intserv with Diffserv.
There is some belief (e.g., as expressed in [RFC2990]) that such an approach will allow individual flows to receive service guarantees and scale to the global Internet.

The QoS guarantees that can be offered by the IP layer are a product of two factors:

-- the concatenation of the QoS guarantees offered by the subnets along the path of a flow. This implies that a subnet may wish to offer multiple services (with different QoS guarantees) to the IP layer, which can then determine which flows use which subnet service. To put it another way, forwarding behavior in the subnet needs to be 'clued' by the forwarding behavior (service or PHB) at the IP layer, and

-- the operation of a set of cooperating mechanisms, such as bandwidth reservation and admission control, policy management, traffic classification, traffic conditioning (marking, policing and/or shaping), selective discard, queuing and scheduling. Note that support for QoS in subnets may require similar mechanisms, especially when these subnets are general-topology subnets (e.g., ATM, frame relay or MPLS) or shared media subnets.

Many subnetwork designers face inherent tradeoffs between delay, throughput, reliability and cost. Other subnetworks have parameters that manage bandwidth, internal connection state, and the like. Therefore, the following subnetwork capabilities may be desirable, although some might be trivial or moot if the subnet is a dedicated point-to-point link.

- The subnetwork should have the ability to reserve bandwidth for a connection or flow and schedule packets accordingly.

- Bandwidth reservations should be based on a one- or two-token bucket model, depending on whether the service is intended to support constant-rate or bursty traffic.
- If a connection or flow does not use its reserved bandwidth at a given time, the unused bandwidth should be available for other flows.

- Packets in excess of a connection or flow's agreed rate should be forwarded as best-effort or discarded, depending on the service offered by the subnet to the IP layer.

- If a subnet contains error control mechanisms (retransmission and/or FEC), it should be possible for the IP layer to influence the inherent tradeoffs between uncorrected errors, packet losses and delay. These capabilities at the subnet/IP layer service boundary correspond to the selection of more or less error control and/or to the selection of particular error control mechanisms within the subnetwork.

- The subnet layer should know, and be able to inform the IP layer, how much fixed delay and delay jitter it offers for a flow or connection. If the Intserv model is used, the delay jitter component may best be expressed in terms of the TSpec/RSpec model described in [RFC2212].

- Support of the Diffserv class selectors [RFC2474] suggests that the subnet might consider mechanisms that support priorities.

10 Fairness vs Performance

Subnetwork designers should be aware of the tradeoffs between fairness and efficiency inherent in many transmission scheduling algorithms. For example, many local area networks use contention protocols to resolve access to a shared transmission channel. These protocols represent overhead. Limiting the amount of data that a subnet node may transmit per contention cycle helps assure timely access to the channel for each subnet node, but it also increases contention overhead per unit of data sent.

In some mobile radio networks, capacity is limited by interference, which in turn depends on average transmitter power.
Some receivers may require considerably more transmitter power (generating more interference and consuming more channel capacity) than others.

In each case, the scheduling algorithm designer must balance competing objectives: providing a fair share of capacity to each subnet node while maximizing the total capacity of the network. One approach for balancing performance and fairness is outlined in [ES00].

11 Delay Characteristics

The TCP sender bases its retransmission timeout (RTO) on measurements of the round trip delay experienced by previous packets. This allows TCP to adapt automatically to the very wide range of delays found on the Internet. The recommended algorithms are described in [RFC2988]. Evaluations of TCP's retransmission timer can be found in [AP99] and [LS00].

These algorithms model the delay along an Internet path as a normally distributed random variable with slowly varying mean and standard deviation. TCP estimates these two parameters by exponentially smoothing individual delay measurements. It sets the RTO to the estimated mean delay plus a fixed number of standard deviations (four, in [RFC2988]). (The algorithm actually uses mean deviation as an approximation to standard deviation, as it is easier to compute.)

The goal is to compute an RTO that is small enough to detect and recover from packet losses while minimizing unnecessary ("spurious") retransmissions when packets are unexpectedly delayed but not lost. Although these goals conflict, the algorithm works well when either the delay variance along the Internet path or the packet loss rate is low.

If the path delay variance is high, TCP sets an RTO that is much larger than the mean of the measured delays. But if the packet loss rate is low, the large RTO is of little consequence, as timeouts occur only rarely.
Conversely, if the path delay variance is low, then TCP recovers quickly from lost packets; again, the algorithm works well. However, when delay variance and the packet loss rate are both high, these algorithms perform poorly, especially when the mean delay is also high.

Because TCP uses returning acknowledgments as a "clock" to time the transmission of additional data, excessively high delays (even if the delay variance is low) also affect TCP's ability to fully utilize a high-speed transmission pipe. High delays also slow down the recovery of lost packets, even when delay variance is small.

Subnetwork designers should therefore minimize all three parameters (delay, delay variance and packet loss) as much as possible.

In many subnetworks, these parameters are inherently in conflict. For example, on a mobile radio channel the subnetwork designer can use retransmission (ARQ) and/or forward error correction (FEC) to trade off delay, delay variance and packet loss in an effort to improve TCP performance. ARQ increases delay variance, while FEC does not. However, FEC (especially when combined with interleaving) often increases mean delay. It does so even on good channels, where ARQ retransmissions are not needed and ARQ would not increase either the delay or the delay variance.

The tradeoffs among these error control mechanisms and their interactions with TCP can be quite complex, and are the subject of much ongoing research. We therefore recommend that subnetwork designers provide as much flexibility as possible in the implementation of these mechanisms, and provide access to them as discussed above in the section on Quality of Service.
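As a sketch of the estimator described above, the class below follows the update rules of [RFC2988]. It uses that document's recommended gains (alpha = 1/8, beta = 1/4), K = 4 deviations, a 3-second initial RTO, and a 1-second minimum. The 60-second cap and the 100 msec clock granularity G are assumptions chosen for illustration. The class name is hypothetical. Karn's algorithm (not sampling retransmitted segments) and exponential backoff after a timeout are not modeled.

```python
class RtoEstimator:
    """Sketch of the RFC 2988 retransmission timer (all times in seconds)."""

    ALPHA = 1 / 8    # gain for the smoothed RTT (SRTT)
    BETA = 1 / 4     # gain for the RTT variation (mean deviation, RTTVAR)
    K = 4            # number of deviations added to SRTT
    MIN_RTO = 1.0    # RFC 2988: RTO SHOULD be rounded up to 1 second
    MAX_RTO = 60.0   # assumed cap; RFC 2988 allows one of at least 60 s

    def __init__(self, clock_granularity=0.1):
        self.g = clock_granularity   # G, an assumed timer granularity
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0               # initial RTO before any measurement

    def on_measurement(self, r):
        """Fold in one RTT sample r and return the updated RTO."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.MIN_RTO), self.MAX_RTO)
        return self.rto
```

Feeding it samples of 2.0, 2.0 and then 4.0 seconds yields RTOs of 6.0, 5.0 and 6.5 seconds. Consistent samples shrink the deviation term, while a single delayed sample raises both the mean and the deviation. This is why a path with high delay variance ends up with an RTO well above its mean delay.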
12 Bandwidth Asymmetries

Some subnetworks may provide asymmetric bandwidth (or may cause TCP packet flows to experience asymmetry in the capacity), and the Internet protocol suite will generally still work fine. However, there is one case in which such a scenario reduces TCP performance. Since TCP data segments are 'clocked' out by returning acknowledgments, TCP senders are limited by the rate at which ACKs can be returned [BPK98]. Therefore, when the ratio of the bandwidth of the subnetwork carrying the data to the bandwidth of the subnetwork carrying the acknowledgments is too large, the slow return of the ACKs directly impacts performance. Since ACKs are generally smaller than data segments, TCP can tolerate some asymmetry. As a general rule, however, subnetwork designers should be aware that significant asymmetry can reduce performance unless steps are taken to mitigate it [RFC3449].

Several strategies have been identified for reducing the impact of asymmetry of the network path between two TCP end hosts [RFC3449]. These techniques attempt to reduce the number of ACKs transmitted over the return path (the low-bandwidth channel). They do so by changes at the end host(s), and/or by modification of subnetwork packet forwarding. While these solutions may mitigate the performance issues caused by asymmetric subnetworks, they do have associated costs and may have other implications. A fuller discussion of strategies and their implications is provided in [RFC3449].

13 Buffering, flow & congestion control

Many subnets include multiple links with varying traffic demands and possibly different transmission speeds. At each link there must be a queuing system, including buffering, scheduling and a capability to discard excess subnet packets.
These queues may also be part of a subnet flow control or congestion control scheme.

For the purpose of this discussion, we talk about packets without regard to whether they are complete IP packets or subnetwork frames. At each queue, a packet experiences a delay that depends on competing traffic and the scheduling discipline, and is subjected to a local discarding policy.

Some subnets may have flow or congestion control mechanisms in addition to packet dropping. Such mechanisms can operate on components in the subnet layer, such as schedulers, shapers or discarders, and can affect the operation of IP forwarders at the edges of the subnet. However, with the exception of Explicit Congestion Notification [RFC3168] (discussed below), IP has no way to pass explicit congestion or flow control signals to TCP.

TCP traffic, especially aggregated TCP traffic, is bursty. As a result, instantaneous queue depths can vary dramatically, even in nominally stable networks. For optimal performance, packets should be dropped in a controlled fashion, not just when buffer space is unavailable. How much buffer space should be supplied is still a matter of debate. As a rule of thumb, however, each node should have enough buffering to hold one link_bandwidth*link_delay product's worth of data for each TCP connection sharing the link.

This is often difficult to estimate, since it depends on parameters beyond the subnetwork's control or knowledge. Internet nodes generally do not implement admission control policies, and cannot limit the number of TCP connections that use them. In general, it is wise to err in favor of too much buffering rather than too little. It may also be useful for subnets to incorporate mechanisms that measure propagation delays to assist in buffer sizing calculations.
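The buffering rule of thumb above is simple arithmetic, sketched below. The sketch assumes link_delay is given in seconds. It also assumes the designer already has an estimate of how many TCP connections will share the link, which, as noted above, is often the hard part. The function names are hypothetical.

```python
import math


def rule_of_thumb_buffer_bytes(link_bandwidth_bps, link_delay_s, n_connections):
    """Buffer suggested by the rule of thumb: one link_bandwidth*link_delay
    product's worth of data for each TCP connection sharing the link."""
    if link_bandwidth_bps <= 0 or link_delay_s < 0 or n_connections < 0:
        raise ValueError("bandwidth must be positive; delay and count non-negative")
    bdp_bytes = link_bandwidth_bps * link_delay_s / 8.0   # bits -> bytes
    return bdp_bytes * n_connections


def buffer_in_packets(buffer_bytes, packet_bytes):
    """Round a byte requirement up to whole packets of packet_bytes each."""
    return math.ceil(buffer_bytes / packet_bytes)
```

For example, a 2 Mb/s link with a 100 msec delay shared by 10 connections calls for about 250,000 bytes, or 167 packets of 1500 bytes. The requirement grows linearly with the connection count, so an underestimate of that count translates directly into too little buffering.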
There is a rough consensus in the research community that active queue management is important to improving fairness, link utilization and throughput [RFC2309]. Although there are questions and concerns about the effectiveness of active queue management (e.g., [MBDL99]), it is widely considered an improvement over tail-drop discard policies.

One form of active queue management is the Random Early Detection (RED) algorithm [RED93], actually a family of related algorithms. In one version of RED, an exponentially weighted moving average of the queue depth is maintained. Below a minimum threshold min_th, no packets are dropped. Above it:

   When this average queue depth is between the minimum threshold min_th and a maximum threshold max_th, packets are dropped with a probability that is proportional to the amount by which the average queue depth exceeds min_th.

   When this average queue depth is equal to max_th, the drop probability is equal to a configurable parameter max_p.

   When this average queue depth is greater than max_th, packets are always dropped.

Numerous variants on RED appear in the literature, and there are other active queue management algorithms which claim various advantages over RED [GM02].

With an active queue management algorithm, dropped packets become a feedback signal to trigger more appropriate congestion behavior by the TCPs in the end hosts. Randomization of dropping tends to break up the observed tendency of TCP windows belonging to different TCP connections to become synchronized by correlated drops. It also imposes a degree of fairness on those connections that properly implement TCP congestion avoidance. Another important property of active queue management algorithms is that they attempt to keep average queue depths short while accommodating large short-term bursts.
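The RED drop decision described above can be sketched as follows. This follows only the simple version in the text: an EWMA of the queue depth and a linear drop-probability ramp between min_th and max_th. It omits refinements from [RED93] such as the count-based spacing of drops and the adjustment of the average after idle periods. The EWMA weight w_q = 0.002 is an assumed default, and the class name is hypothetical.

```python
import random


class RedQueue:
    """Sketch of the basic RED drop decision (average depth in packets)."""

    def __init__(self, min_th, max_th, max_p, w_q=0.002, rng=None):
        if not 0 <= min_th < max_th:
            raise ValueError("need 0 <= min_th < max_th")
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p
        self.w_q = w_q                # EWMA weight (assumed value)
        self.avg = 0.0                # moving average of queue depth
        self.rng = rng if rng is not None else random.Random()

    def drop_probability(self, avg):
        if avg < self.min_th:
            return 0.0                # short queue: never drop
        if avg > self.max_th:
            return 1.0                # long queue: always drop
        # Linear ramp from 0 at min_th up to max_p at max_th.
        return self.max_p * (avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self, instantaneous_queue_len):
        """Update the average and return True if the arriving packet is dropped."""
        self.avg = (1 - self.w_q) * self.avg + self.w_q * instantaneous_queue_len
        return self.rng.random() < self.drop_probability(self.avg)
```

Because decisions use the smoothed average rather than the instantaneous depth, a short burst barely moves avg and is absorbed, while a persistently long queue pushes avg into the dropping region. The values chosen for min_th, max_th, max_p and w_q matter a great deal in practice.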
Since TCP neither knows nor cares whether congestive packet loss occurs at the IP layer or in a subnet, it may be advisable for subnets that perform queuing and discarding to consider implementing some form of active queue management. This is especially true if large aggregates of TCP connections are likely to share the same queue. However, active queue management may be less effective in the case of many queues carrying smaller aggregates of TCP connections, e.g., in an ATM switch that implements per-VC queuing.

Note that the performance of active queue management algorithms is highly sensitive to settings of configurable parameters, and also to factors such as RTT [MBB00] [FB00].

Some subnets, most notably ATM, perform segmentation and reassembly at the subnetwork edges. Care should be taken here in designing discard policies. If the subnet discards a fragment of an IP packet, then the remaining fragments become an unproductive load on the subnet that can markedly degrade end-to-end performance [RF95]. Subnetworks should therefore attempt to discard all remaining fragments of a packet whenever one of them must be discarded. If the IP packet has already been partially forwarded when discarding becomes necessary, then every remaining fragment should also be discarded, except the one marking the end of the IP packet. That final fragment is still needed to delimit packets at the reassembly point. For ATM subnets, this specifically means using Early Packet Discard and Partial Packet Discard [ATMFTM].

Some subnets include flow control mechanisms that effectively require that the rate of traffic flows be shaped on entry to the subnet. One example of such a subnet mechanism is in the ATM Available Bit Rate (ABR) service category [ATMFTM]. Such flow control mechanisms have the effect of making the subnet nearly lossless by pushing congestion into the IP routers at the edges of the subnet.
In such a case, adequate buffering and discard policies are needed in these routers to deal with a subnet that appears to have varying bandwidth. Whether there is benefit in this kind of flow control is controversial; there are numerous simulation and analytical studies that go both ways. The factors that appear to lead to such different results include the following:

- sensitivity to ABR parameters;
- use of binary rather than explicit rate feedback;
- use (or not) of per-VC queuing;
- the specific ATM switch algorithms selected for the study.

Anecdotally, some large networks have used IP over ABR to carry TCP traffic and have reported it to be successful, but have published no results.

Another possible approach to flow control in the subnet would be to work with TCP Explicit Congestion Notification (ECN) semantics [RFC3168] by using explicit congestion indicators in subnet frames. Routers at the edges of the subnet, rather than shaping, would set the ECN Congestion Experienced (CE) codepoint in those IP packets that are received in subnet frames carrying a congestion indication. Nodes in the subnet would need to implement an active queue management algorithm that marks subnet frames instead of dropping them.

ECN is currently a proposed standard, but it is not yet widely deployed.

14 Compression

Application data compression is a function that can usually be omitted in the subnetwork. The endpoints typically have more CPU and memory resources to run a compression algorithm and a better understanding of what is being compressed. End-to-end compression benefits every network element in the path, while subnetwork-layer compression, by definition, benefits only a single subnetwork.
Data presented to the subnetwork layer may already be in compressed format (e.g., a JPEG file), compressed at the application layer (e.g., the optional "gzip", "compress", and "deflate" compression in HTTP/1.1 [RFC2616]), or compressed at the IP layer (the IP Payload Compression Protocol [RFC2393] supports DEFLATE [RFC2394] and LZS [RFC2395]). Compression at the subnetwork edges is of no benefit in any of these cases.

The subnetwork may also process data that has been encrypted by the application (OpenPGP [RFC2440] or S/MIME [RFC2633]), just above TCP (SSL, TLS [RFC2246]), or just above IP (IPsec ESP [RFC2406]). Ciphers generate high-entropy bit streams lacking any patterns that can be exploited by a compression algorithm.

However, much data is still transmitted uncompressed over the Internet, so subnetwork compression may be beneficial. A subnetwork compression algorithm must not expand uncompressible data, e.g., data that has already been compressed or encrypted.

We make a strong recommendation that subnetworks operating at low speed or with small MTUs compress IP and transport-level headers (TCP and UDP), using one of the header compression schemes developed within the IETF. An uncompressed 40-byte TCP/IP header takes about 33 milliseconds to send at 9600 bps. "VJ" TCP/IP header compression [RFC1144] compresses most headers to 3-5 bytes, reducing transmission time to several milliseconds on dialup modem links. This is especially beneficial for small, latency-sensitive packets in interactive sessions.

Similarly, RTP compression schemes such as CRTP [RFC2508] and ROHC [RFC3095] compress most IP/UDP/RTP headers to one to four bytes. The resulting savings are especially significant when audio packets are kept small to minimize store-and-forward latency.
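The header transmission times quoted above follow from simple serialization arithmetic. The sketch below assumes 8 bits per byte and ignores start/stop bits and link-layer framing, any of which would add to the figures on a real modem link. The function name is hypothetical.

```python
def serialization_ms(n_bytes, link_bps):
    """Milliseconds needed to clock n_bytes onto a link running at link_bps.

    Counts 8 bits per byte; ignores start/stop bits and link framing.
    """
    return n_bytes * 8 * 1000.0 / link_bps


if __name__ == "__main__":
    for label, size in [("uncompressed TCP/IP header", 40),
                        ("3-byte VJ-compressed header", 3),
                        ("5-byte VJ-compressed header", 5)]:
        print(f"{label:28s} {size:3d} bytes  "
              f"{serialization_ms(size, 9600):5.1f} ms at 9600 bps")
```

This reproduces the figure of about 33 msec for the uncompressed header. It shows the 3- to 5-byte compressed headers taking roughly 2.5 to 4.2 msec, which is a saving paid on every packet, including every small interactive or audio packet.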

Designers should consider the effect of the subnetwork error rate on the performance of header compression. TCP ordinarily recovers from lost packets by retransmitting only those packets that were actually lost; packets arriving correctly after a packet loss are kept on a resequencing queue and do not need to be retransmitted. In VJ TCP/IP header compression [RFC1144], however, the receiver cannot explicitly notify a sender of data corruption and the resulting loss of synchronization between compressor and decompressor. It relies instead on TCP retransmission to re-synchronize the decompressor. After a packet is lost, the decompressor must discard every subsequent packet, even if the subnetwork makes no further errors, until the sending TCP retransmits and thereby re-synchronizes the decompressor. This can substantially magnify the impact of subnetwork packet losses if the sending TCP window is large, as it often is on a path with a large bandwidth*delay product [LRKOJ99].

Alternative header compression schemes, such as those described in [RFC2507], include an explicit request for retransmission of an uncompressed packet. This allows decompressor resynchronization without waiting for a TCP retransmission. However, these schemes are not yet in widespread use.

Neither TCP header compression scheme compresses widely used TCP options such as selective acknowledgments (SACK), and both fail to compress TCP traffic that makes use of Explicit Congestion Notification (ECN). Work is under way in the IETF ROHC WG to address these shortcomings in a ROHC header compression scheme for TCP [RFC3095] [RFC3096].

The subnetwork error rate is also important for RTP header compression.
CRTP uses delta encoding, so a packet loss on the link causes uncertainty about the subsequent packets. These often must be discarded until the decompressor has notified the compressor and the compressor has sent re-synchronizing information. This typically takes slightly more than one link round-trip time. For links that combine significant error rates with latencies that require multiple packets to be in flight at a time, this leads to significant error propagation, i.e., subsequent losses caused by an initial loss.

For links that are both high-latency (multiple packets in flight from a typical RTP stream) and error-prone, ROHC provides a more robust way of compressing RTP headers, at the cost of higher complexity at the compressor and decompressor. For example, within a talk spurt, only extended losses of 12 to 64 packets (depending on the mode chosen) typically cause error propagation.

15 Packet Reordering

The Internet architecture does not guarantee that packets will arrive in the same order in which they were originally transmitted, and transport protocols like TCP must take this into account.

However, reordering does come at a cost with TCP as it is currently defined. Because TCP returns a cumulative acknowledgment (ACK) indicating the last in-order segment that has arrived, out-of-order segments cause a TCP receiver to transmit a duplicate acknowledgment. When the TCP sender notices three duplicate acknowledgments, it assumes that a segment was dropped by the network and uses the fast retransmit algorithm [Jac90] [RFC2581] to resend the segment. In addition, the congestion window is reduced by half, effectively halving TCP's sending rate.
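
The duplicate-ACK mechanism described above can be illustrated with a simplified model. The Python sketch below makes several assumptions:

- sequence numbers count whole segments;
- the receiver acknowledges every arriving segment immediately;
- no segment is actually lost, so any fast retransmit it reports is spurious.

The threshold of three duplicate ACKs follows [RFC2581]. The function names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit [RFC2581]


def cumulative_acks(arrival_order):
    """Cumulative ACK (next expected segment) the receiver sends per arrival.

    Segments are numbered 1, 2, 3, ... in whole segments rather than bytes,
    and every arriving segment is acknowledged immediately.
    """
    received = set()
    next_expected = 1
    acks = []
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def triggers_fast_retransmit(acks):
    """True if the sender sees DUPACK_THRESHOLD duplicates of the same ACK."""
    last_ack, dupacks = None, 0
    for ack in acks:
        if ack == last_ack:
            dupacks += 1
            if dupacks >= DUPACK_THRESHOLD:
                return True
        else:
            last_ack, dupacks = ack, 0
    return False


# No segment is lost; segment 2 is merely delayed by the subnet.
for order in ([1, 3, 4, 5, 2, 6], [1, 3, 4, 2, 5, 6]):
    acks = cumulative_acks(order)
    print(order, acks, triggers_fast_retransmit(acks))
# [1, 3, 4, 5, 2, 6] [2, 2, 2, 2, 6, 7] True
# [1, 3, 4, 2, 5, 6] [2, 2, 2, 5, 6, 7] False
```

In the first ordering, three later segments overtake segment 2. This produces three duplicate ACKs and a needless fast retransmit. In the second ordering, only two segments overtake it, so the sender stays below the threshold.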
If a subnetwork reorders segments so significantly that three duplicate ACKs are generated, the TCP sender needlessly reduces its congestion window and performance suffers.

Packet reordering occurs frequently in parts of the Internet, and it seems to be difficult or impossible to eliminate [BPS99]. For this reason, research has begun into improving TCP's behavior in the face of packet reordering [LK00] [BA02].

[BPS99] cites reasons why it may even be undesirable to eliminate reordering. In some situations, permitting reordering can reduce average packet latency, increase link efficiency, and/or improve reliability. Examples include certain high-speed switches within the Internet backbone and the parallel links used over many Internet paths for load splitting and redundancy.

This suggests that subnetwork implementers should try to avoid packet reordering whenever possible, but not if doing so compromises efficiency, impairs reliability, or increases average packet delay.

Note that every header compression scheme currently standardized for the Internet requires in-order packet delivery on the link between compressor and decompressor. PPP is frequently used to carry compressed TCP/IP packets; because it was originally designed for point-to-point and dialup links, it is assumed to provide in-order delivery. For this reason, subnetwork implementers who provide PPP interfaces to VPNs and other, more complex subnetworks must also maintain in-order delivery of PPP frames.

16 Mobility

Internet users are increasingly mobile. Not only are many Internet nodes laptop computers, but pocket organizers and mobile embedded systems are also becoming nodes on the Internet.
These nodes may connect to many different access points on the Internet over time, and they expect this to be largely transparent to their activities. Except when they are not connected to the Internet at all, and apart from performance differences when they are connected, they expect everything to "just work" regardless of their current Internet attachment point or local subnetwork technology.

Changing a host's Internet attachment point involves one or more of the following steps.

First, if use of the local subnetwork is restricted, the user's credentials must be verified and access granted. There are many ways to do this. A trivial example would be an "Internet cafe" that grants physical access to the subnetwork for a fee. Subnetworks may implement technical access controls of their own; one example is IEEE 802.11 Wired Equivalent Privacy (WEP) [IEEE80211]. It is also common practice for both cellular telephone providers and Internet service providers (ISPs) to agree to serve each other's users; RADIUS [RFC2865] is the standard means for ISPs to exchange authorization information.

Second, the host may have to be reconfigured with IP parameters appropriate for the local subnetwork. This usually includes setting an IP address, a default router, and domain name system (DNS) servers. On multiple-access networks, the Dynamic Host Configuration Protocol (DHCP) [RFC2131] is almost universally used for this purpose. On PPP links, these functions are performed by the IP Control Protocol (IPCP) [RFC1332].

Third, traffic destined for the mobile host must be routed to its current location. This roaming function is the most common meaning of the term "Internet mobility".

Internet mobility can be provided at any of several layers in the Internet protocol stack, and there is ongoing debate as to which layers are the most appropriate and efficient.
Mobility is already a feature of certain application-layer protocols; the Post Office Protocol (POP) [RFC1939] and the Internet Message Access Protocol (IMAP) [RFC2060] were created specifically to provide mobility in the receipt of electronic mail.

Mobility can also be provided at the IP layer [RFC2002]. This mechanism provides greater transparency, namely IP addresses that remain fixed as the nodes move. However, it comes at the cost of potentially significant network overhead and increased delay, because of the suboptimal network routing and tunneling involved.

Some subnetworks may provide internal mobility, transparent to IP, as a feature of their own internal routing mechanisms. This is generally desirable to the extent that these mechanisms simplify routing at the IP layer, reduce the need for mechanisms like Mobile IP, or exploit mechanisms unique to the subnetwork. This is especially true when the subnetwork covers a relatively small geographic area and the users move rapidly between the attachment points within that area. Examples of internal mobility schemes include Ethernet switching and intra-system handoff in cellular telephony.

However, some subnetworks are physically large and connect to other parts of the Internet at multiple geographic points. In that case, care should be taken to optimize the wide-area routing of packets between nodes on the external Internet and nodes on the subnet. This is generally done with "nearest exit" routing strategies. Because a given subnetwork may be unaware of the actual physical location of a destination on another subnetwork, it simply routes packets bound for the other subnetwork to the nearest router between the two. This implies some awareness of IP addressing and routing within the subnetwork.
The subnetwork may wish to use IP routing internally for wide-area routing and restrict subnetwork-specific routing to constrained geographic areas where the effects of suboptimal routing are minimized.

17 Routing

Subnetworks connecting more than two systems must provide their own internal layer-2 forwarding mechanisms, either implicitly (e.g., broadcast) or explicitly (e.g., switched). Since routing is the major function of the Internet layer, the question naturally arises as to how routing at the Internet layer interacts with routing in the subnet, and how function should properly be divided between the two.

Layer-2 subnetworks can be point-to-point, connecting two systems, or multipoint. Multipoint subnetworks can be broadcast (e.g., shared media or emulated) or non-broadcast. Generally, IP treats multipoint subnetworks as broadcast, with shared-medium Ethernet as the canonical (and historical) example, and treats point-to-point subnetworks as a degenerate case. Non-broadcast subnetworks may require additional mechanisms, e.g., above IP at the routing layer [RFC2328].

IP is ignorant of the topology of the subnetwork layer. In particular, the IP layer does not track reconfiguration of subnetwork paths. IP is affected only by whether it can send packets to, and receive packets from, the remotely connected systems via the subnetwork interface (i.e., the reachability from one router to another). IP further assumes that subnetworks are largely static, i.e., that both their membership and their existence are stable at routing timescales (tens of seconds); changes to either are considered re-provisioning rather than routing.

Routing functionality in a subnetwork is related to addressing in that subnetwork. Resolution of addresses on subnetwork links is required for forwarding IP packets across links (e.g., ARP for IPv4, or ND for IPv6).
There is unlikely to be direct interaction between subnetwork routing and IP routing. Where broadcast is provided or explicitly emulated, address resolution can be used directly. Where it is not provided, the link-layer routing may interface to a resolution protocol to provide context-dependent address resolution capabilities, e.g., the Next Hop Resolution Protocol (NHRP) [RFC2332].

Subnetwork routing can either complement or compete with IP routing. It complements IP when a subnetwork encapsulates its internal routing and the effects of that routing are not noticeable at the IP layer. However, if different paths in the subnetwork have characteristics that affect IP routing, subnetwork routing can affect or even inhibit the convergence of IP routing.

Routing protocols generally consider layer-2 subnetworks, i.e., those with subnet masks and no intermediate IP hops, to have uniform routing metrics to all members. Routing can break when a link's characteristics do not match the routing metric. Here, that happens when some member pairs have different path characteristics from others. Consider a virtual Ethernet subnetwork that includes both nearby (sub-millisecond latency) and remote (hundreds of milliseconds away) systems. Presenting that group as a single subnetwork means that some routing protocols will assume that all pairs have the same delay, and that this delay is small. Because this is not the case, the routing tables constructed may be suboptimal or may even fail to converge.

When a subnetwork is used to transit between a set of routers, it conventionally provides the equivalent of a full mesh of point-to-point links. The simplicity of the internal subnet structure can be exploited (e.g., via NHRP [RFC2332]) to reduce the size of address resolution tables, but routing exchanges will continue to reflect the full mesh they emulate.
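
The virtual Ethernet example above can be made concrete with a small shortest-path computation using Dijkstra's algorithm. The topology and delays below are hypothetical:

- R1, R2 and R3 share one emulated Ethernet, and R3 is about 150 ms away.
- A destination router D is reachable either through R3 or through a short detour via R2 and R4.

The sketch gives every subnet member pair the same small metric, as the text describes. Under that metric, the routing computation prefers the path through the remote member.

```python
import heapq

# Hypothetical topology (delays in ms). R1, R2 and R3 share one virtual
# Ethernet subnet; R3 is remote. D is the destination router.
LINKS = [
    # (end A, end B, one-way delay in ms, link is inside the virtual subnet?)
    ("R1", "R2", 1, True),
    ("R1", "R3", 150, True),
    ("R2", "R3", 150, True),
    ("R3", "D", 1, False),
    ("R2", "R4", 5, False),
    ("R4", "D", 1, False),
]
DELAY = {frozenset((a, b)): d for a, b, d, _ in LINKS}


def uniform_metric(delay_ms, in_subnet):
    """Every subnet member pair gets the same small metric, as the text describes."""
    return 1 if in_subnet else delay_ms


def delay_metric(delay_ms, in_subnet):
    """Metric equal to the real delay of the link or subnet path."""
    return delay_ms


def shortest_path(src, dst, metric):
    """Dijkstra's algorithm over LINKS; returns (cost, path) or None."""
    graph = {}
    for a, b, delay, in_subnet in LINKS:
        w = metric(delay, in_subnet)
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    queue = [(0, src, [src])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nbr, w in graph[node]:
            if nbr not in settled:
                heapq.heappush(queue, (cost + w, nbr, path + [nbr]))
    return None


def path_delay(path):
    """Actual end-to-end delay of a path, summed over its hops."""
    return sum(DELAY[frozenset(hop)] for hop in zip(path, path[1:]))


for metric in (uniform_metric, delay_metric):
    _, path = shortest_path("R1", "D", metric)
    print(f"{metric.__name__}: {'-'.join(path)}, actual delay {path_delay(path)} ms")
```

With the uniform subnet metric, R1 picks R1-R3-D (151 ms of actual delay). A delay-based metric picks R1-R2-R4-D (7 ms).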
In general, subnetworks should not be used as a transit among a set of routers if routing protocols would break when run over a full mesh of equivalent point-to-point links.

Some subnetworks have special features that allow the use of more effective or responsive routing mechanisms, which cannot be implemented in IP because of its need for generality. One example is the self-learning bridge algorithm widely used in Ethernet networks. Learning bridges perform layer-2 subnetwork forwarding, avoiding the need for dynamic routing at each subnetwork hop. Another example is the "handoff" mechanism in cellular telephone networks, particularly the "soft handoff" scheme in IS-95 CDMA.

Subnetworks that cover large geographic areas or include links of widely varying capabilities should be avoided. IP routing generally considers all multipoint subnets equivalent to a local, shared-medium link with uniform metrics between any pair of systems, and ignores internal subnetwork topology. Where a subnetwork diverges from that assumption, subnetwork designers are obliged to provide compensating mechanisms. Not doing so can affect the scalability and convergence of IP routing, as noted above.

The subnetwork designer who decides to implement internal routing should consider whether a custom routing algorithm is warranted, or whether an existing Internet routing algorithm or protocol may suffice. The designer should also consider whether the goal is to reduce the address resolution table size (possible, but requiring additional protocol support) or to reduce routing table complexity. The latter may be better achieved by partitioning the subnetwork, either physically or logically, and using network-layer protocols to support partitioning (e.g., autonomous systems in BGP).
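
The self-learning bridge algorithm mentioned earlier in this section forwards frames without any routing protocol exchange. It learns where each source address lives from the frames it observes. The following is a minimal Python sketch, assuming a single bridge. It deliberately omits address aging and the spanning tree protocol, both of which a real IEEE 802.1D [802.1D] bridge also needs. The class and method names are hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Minimal self-learning bridge: learn source addresses, flood unknown ones.

    Simplifications: no address-aging timer and no spanning tree, both of
    which a real IEEE 802.1D bridge needs.
    """

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port on which it was last seen as a source

    def receive(self, in_port, src, dst):
        """Process one frame and return the list of ports it is forwarded to."""
        self.table[src] = in_port  # learn (or refresh) the sender's location
        out_port = self.table.get(dst)
        if dst == BROADCAST or out_port is None:
            # Broadcast or not-yet-learned destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the arrival segment: filter the frame
        return [out_port]


# Short labels stand in for real 48-bit MAC addresses.
bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive(1, "A", "B"))  # [2, 3]: B unknown, so flood
print(bridge.receive(2, "B", "A"))  # [1]: A was learned on port 1
print(bridge.receive(1, "A", "B"))  # [2]: B now learned on port 2
```

After a frame has been seen from each station, traffic between them stops being flooded. No per-hop routing protocol is involved.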
Protocols and routing algorithms can be notoriously subtle, complex, and difficult to implement correctly. Much work can be avoided if an existing protocol or existing implementations can be readily used.

18 Security Considerations

Security has become a high priority in the design and operation of the Internet. The Internet is vast, and countless organizations and individuals own and operate its various components. A consensus has emerged for what might be called a "security placement principle": a security mechanism is most effective when it is placed as close as possible to, and under the direct control of the owner of, the asset that it protects.

A corollary of this principle is that end-to-end security (e.g., confidentiality, authentication, integrity, and access control) cannot be ensured with subnetwork security mechanisms. End-to-end security mechanisms are much more closely associated with the end-user assets they protect, and they are also much more comprehensive. For example, end-to-end security mechanisms cover gaps that can appear when otherwise good subnetwork mechanisms are concatenated. This is an important application of the end-to-end principle [SRC81].

Several security mechanisms that can be used end-to-end have already been deployed in the Internet and are enjoying increasing use. The most important are:

- the Secure Sockets Layer (SSL) [SSL2] [SSL3] and TLS [RFC2246], primarily used to protect web commerce;
- Pretty Good Privacy (PGP) [RFC1991] and S/MIME [RFCs-2630-2634], primarily used to protect and authenticate email and software distributions;
- the Secure Shell (SSH), used for secure remote access and file transfer; and
- IPsec [RFC2401], a general-purpose encryption and authentication mechanism that sits just above IP and can be used by any IP application.
(IPsec can actually be used either on an end-to-end basis or between security gateways that do not include either or both end systems.)

Nonetheless, end-to-end security mechanisms are not used as widely as might be desired. However, the working group could not reach consensus on whether subnetwork designers should be actively encouraged to implement mechanisms to protect user data.

One point of view holds that subnetwork security mechanisms may actually be counterproductive, especially when weak or incorrectly implemented [BGW01]. The argument is that subnetwork security mechanisms can lull end users into a false sense of security and diminish the incentive to deploy effective end-to-end mechanisms. They can also encourage "risky" uses of the Internet that users would avoid if they understood the inherent limits of subnetwork security mechanisms.

The other point of view encourages subnetwork security on the principle that it is better than the default situation, which all too often is no security at all. Users of especially vulnerable subnets often have control over at most one endpoint, usually a client, and therefore cannot enforce the use of end-to-end mechanisms. Examples include consumers who have wireless home networks and/or shared-media Internet access. Moreover, subnet security can be entirely adequate for protecting low-valued assets against the most likely threats. In any event, subnet mechanisms do not preclude the use of end-to-end mechanisms, which are typically used to protect highly valued assets. This viewpoint recognizes that many security policies implicitly assume that the entire end-to-end path is composed of a series of concatenated links that are nominally physically secured.
That is, these policies assume that all endpoints of all links are trusted and that access to the physical medium by attackers is difficult. To meet the assumptions of such policies, explicit mechanisms are needed for links that lack physical protection, especially shared-medium links. This, for example, is the rationale that underlies two mechanisms:

- Wired Equivalent Privacy (WEP) in the IEEE 802.11 [IEEE80211] wireless LAN standard; and
- the Baseline Privacy Interface in the DOCSIS [DOCSIS1] [DOCSIS2] standards for data over cable television networks.

We therefore recommend that subnetwork designers who choose to implement security mechanisms to protect user data be as candid as possible about the details of such mechanisms. They should also be candid about the inherent limits of even the most secure mechanisms when implemented in a subnetwork rather than on an end-to-end basis.

In keeping with the "placement principle", a clear consensus exists for another subnetwork security role: the protection of the subnetwork itself. Possible threats to subnetwork assets include theft of service and denial of service; shared-media subnets tend to be especially vulnerable to such attacks. In some cases, mechanisms that protect subnet assets can also improve (but cannot ensure) end-to-end security.

The subnetwork can provide one security service that will aid in solving an overall Internet problem: subnetwork security SHOULD provide a mechanism to authenticate the source of a subnetwork frame. This function is missing in some current protocols, e.g., the use of ARP [RFC0826] to associate an IPv4 address with a MAC address. The IPv6 Neighbor Discovery (ND) [RFC2461] performs a similar function.

There are well-known security flaws with this address resolution mechanism [Wilbur99].
However, the inclusion of subnetwork frame source authentication would permit the design of a secure subnetwork address resolution mechanism.

Another potential role for subnetwork security is to protect users against traffic analysis. Traffic analysis means identifying the communicating parties and determining their communication patterns and volumes, even when the actual contents are protected by strong end-to-end security mechanisms. Lower-layer security can be more effective against traffic analysis because it can inherently aggregate the communications of multiple parties sharing the same physical facilities. At the same time, it can obscure higher-layer protocol information that indicates specific end points, such as IP addresses and TCP/UDP port numbers.

However, traffic analysis is a notoriously subtle and difficult threat to understand and defeat, far more so than threats to confidentiality and integrity. We therefore urge extreme care in the design of subnetwork security mechanisms specifically intended to thwart traffic analysis.

Subnetwork designers must keep in mind that design and implementation for security is difficult [Schneier00]. [Schneier95] describes protocols and algorithms that are considered well understood and believed to be sound.

Poor design processes, subtle design errors, and flawed implementations can result in gaping vulnerabilities. In recent years, problems have been exposed in a number of subnet standards. The following are examples of mistakes that have been made:

1. Use of weak and untested algorithms [Crypto9912] [BGW01]. For a variety of reasons, algorithms were chosen that had subtle flaws that made them vulnerable to a variety of attacks.

2. Use of 'security by obscurity' [Schneier4] [Crypto9912]. One common mistake is to assume that keeping cryptographic algorithms secret makes them more secure. This is intuitive, but wrong.
Full public disclosure early in the design process attracts peer review by knowledgeable cryptographers. Exposure of flaws by this review far outweighs any imagined benefit from forcing attackers to reverse-engineer security algorithms.

3. Inclusion of trapdoors [Schneier4] [Crypto9912]. Trapdoors are flaws surreptitiously left in an algorithm to allow it to be broken. This might be done to recover lost keys or to permit surreptitious access by governmental agencies. Malicious attackers can discover and exploit trapdoors.

4. Sending passwords or other identifying information as clear text. For many years, analog cellular telephones could be cloned and used to steal service. The cloners merely eavesdropped on the registration protocols, which exchanged everything in clear text.

5. Keys that are common to all systems on a subnet [BGW01].

6. Incorrect use of a sound mechanism. For example [BGW01], one subnet standard includes an initialization vector that is poorly designed and poorly specified. A determined attacker can easily recover multiple ciphertexts encrypted with the same key stream and perform statistical attacks to decipher them.

7. Identifying information sent in clear text that can be resolved to an individual, identifiable device. This creates a vulnerability to attacks targeted at that device (or its owner).

8. Inability to renew and revoke shared secret information.

9. Insufficient key length.

10. Failure to address "man-in-the-middle" attacks, e.g., with mutual authentication.

11. Failure to provide a form of replay detection, e.g., to prevent a receiver from accepting packets from an attacker that simply resends previously captured network traffic.

12. Failure to provide integrity mechanisms when providing confidentiality schemes [Bel98].

This list is by no means comprehensive.
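
Item 11 above can be addressed with a sliding-window replay check in the style of the IPsec anti-replay window. Each frame carries a monotonically increasing sequence number, and the receiver remembers which recent numbers it has accepted. The following is a minimal Python sketch. The class name and the 64-entry default window are illustrative assumptions. The check is meaningful only if the sequence number is covered by the frame's integrity protection, so an attacker cannot simply rewrite it.

```python
class ReplayWindow:
    """Sliding-window replay check in the style of the IPsec anti-replay window.

    Each frame is assumed to carry a sequence number that starts at 1, increases
    by one per frame, and is covered by the frame's integrity check. Call
    check_and_update() only after that integrity check has passed.
    """

    def __init__(self, size=64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence number (highest - i) already seen

    def check_and_update(self, seq):
        """Return True if seq is new and in range, False if replayed or too old."""
        if seq <= 0:
            return False
        if seq > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the window can remember: reject
        if self.bitmap & (1 << offset):
            return False  # already accepted once: this is a replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow(size=64)
print([window.check_and_update(s) for s in (1, 3, 2, 3, 1)])
# [True, True, True, False, False]
```

Late but unseen frames within the window are still accepted, which tolerates modest reordering. Frames older than the window are rejected outright.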
Design problems are difficult to avoid, but expert review is generally invaluable in avoiding them.

In addition, implementation defects can compromise well-designed security protocols. Examples of such defects include use of predictable pseudo-random numbers [RFC1750], vulnerability to buffer overflow attacks due to unsafe use of certain I/O system calls [WFBA2000], and inadvertent exposure of secret data.

Normative References

References of the form RFCnnnn are Internet Request for Comments (RFC) documents available online at www.rfc-editor.org.

[ATMFTM] The ATM Forum, "Traffic Management Specification, Version 4.0", April 1996, document af-tm-0056.000 (www.atmforum.com).

[BGW01] Nikita Borisov, Ian Goldberg and David Wagner, "Intercepting Mobile Communications: The Insecurity of 802.11", Proceedings of ACM MobiCom, July 2001.

[BPK98] Hari Balakrishnan, Venkata Padmanabhan, Randy H. Katz, "The Effects of Asymmetry on TCP Performance", ACM Mobile Networks and Applications (MONET), 1998.

[ISO3309] ISO/IEC 3309:1991(E), "Information Technology - Telecommunications and information exchange between systems - High-level data link control (HDLC) procedures - Frame structure", International Organization For Standardization, Fourth edition, 1991-06-01.

[MSMO97] M. Mathis, J. Semke, J. Mahdavi, T. Ott, "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", Computer Communication Review, volume 27, number 3, July 1997.

[PFTK98] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, "Modeling TCP Throughput: a Simple Model and its Empirical Validation", UMASS CMPSCI Tech Report TR98-008, February 1998.

[RED93] S. Floyd, V.
Jacobson, "Random Early Detection gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1 N.4, August 1993, http://www.aciri.org/floyd/papers/red/red.html

[RFC791] J. Postel, "Internet Protocol", September 1981.

[RFC793] J. Postel, "Transmission Control Protocol", September 1981.

[RFC1144] V. Jacobson, "Compressing TCP/IP Headers for Low-Speed Serial Links", February 1990.

[RFC1191] J. Mogul, S. Deering, "Path MTU Discovery", November 1990.

[RFC1435] S. Knowles, "IESG Advice from Experience with Path MTU Discovery", March 1993.

[RFC1661] W. Simpson, "The Point-to-Point Protocol (PPP)", July 1994.

[RFC1812] F. Baker, "Requirements for IP Version 4 Routers", June 1995.

[RFC1981] J. McCann, S. Deering, J. Mogul, "Path MTU Discovery for IP version 6", August 1996.

[RFC2246] T. Dierks, C. Allen, "The TLS Protocol Version 1.0", January 1999.

[RFC2309] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge, L. Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang, "Recommendations on Queue Management and Congestion Avoidance in the Internet", April 1998.

[RFC2364] G. Gross et al., "PPP Over AAL5", July 1998.

[RFC2393] A. Shacham et al., "IP Payload Compression Protocol (IPComp)", December 1998.

[RFC2394] R. Pereira, "IP Payload Compression Using DEFLATE", December 1998.

[RFC2395] R. Friend, R. Monsour, "IP Payload Compression Using LZS", December 1998.

[RFC2507] M. Degermark, B. Nordgren, S. Pink, "IP Header Compression", February 1999.

[RFC2508] S. Casner, V. Jacobson, "Compressing IP/UDP/RTP Headers for Low-Speed Serial Links", February 1999.

[RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control", April 1999.

[RFC2406] S. Kent, R. Atkinson,
"IP Encapsulating Security Payload (ESP)", November 1998.

[RFC2684] D. Grossman, J. Heinanen, "Multiprotocol Encapsulation over ATM Adaptation Layer 5", September 1999.

[RFC2686] C. Bormann, "The Multi-Class Extension to Multi-Link PPP", September 1999.

[RFC2687] C. Bormann, "PPP in a Real-time Oriented HDLC-like Framing", September 1999.

[RFC2689] C. Bormann, "Providing Integrated Services over Low-bitrate Links", September 1999.

[RFC2914] S. Floyd, "Congestion Control Principles", September 2000.

[RFC2923] K. Lahey, "TCP Problems with Path MTU Discovery", September 2000.

[RFC2988] V. Paxson, M. Allman, "Computing TCP's Retransmission Timer", November 2000.

[RFC3095] C. Bormann, ed., C. Burmeister, M. Degermark, H. Fukushima, H. Hannu, L-E. Jonsson, R. Hakenberg, T. Koren, K. Le, Z. Liu, A. Martensson, A. Miyazaki, K. Svanbro, T. Wiebke, T. Yoshimura, H. Zheng, "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed", July 2001.

[RFC3096] M. Degermark, ed., "Requirements for robust IP/UDP/RTP header compression", July 2001.

[RFC3168] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", September 2001.

[Schneier95] B. Schneier, "Applied Cryptography: Protocols, Algorithms and Source Code in C", John Wiley and Sons, October 1995.

[Schneier00] B. Schneier, "Secrets and Lies: Digital Security in a Networked World", John Wiley & Sons, August 2000.

[SRC81] Jerome H. Saltzer, David P. Reed and David D. Clark, "End-to-End Arguments in System Design", Second International Conference on Distributed Computing Systems (April 1981), pages 509-512. Published with minor changes in ACM Transactions on Computer Systems 2, 4, November 1984, pages 277-288. Reprinted in: Craig Partridge, editor, Innovations in Internetworking.
Artech House, Norwood, MA, 1988, pages 195-206. ISBN 0-89006-337-0. http://people.qualcomm.com/karn/library.html

[SSL2] K. Hickman, "The SSL Protocol", Netscape Communications Corp., February 9, 1995.

[SSL3] A. Frier, P. Karlton, and P. Kocher, "The SSL 3.0 Protocol", Netscape Communications Corp., November 18, 1996.

Informative References

References of the form RFCnnnn are Internet Request for Comments (RFC) documents available online at www.rfc-editor.org.

[802.1D] IEEE 802.1D, "Information Technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Common specifications - Media access control (MAC) bridges", 1998 (ISO 15802-3).

[802.1p] IEEE 802.1p, "Standard for Local and Metropolitan Area Networks - Supplement to Media Access Control (MAC) Bridges: Traffic Class Expediting and Multicast".

[AP99] M. Allman, V. Paxson, "On Estimating End-to-End Network Path Properties", Proceedings of ACM SIGCOMM 99.

[AR02] G. Acar and C. Rosenberg, "Weighted Fair Bandwidth-on-Demand (WFBoD) for Geo-Stationary Satellite Networks with On-Board Processing", Computer Networks, 39(1), 2002.

[BA02] Ethan Blanton, Mark Allman, "On Making TCP More Robust to Packet Reordering", ACM Computer Communication Review, 32(1), January 2002.

[Bel98] Steven M. Bellovin, "Cryptography and the Internet", Proceedings of CRYPTO '98, August 1998. (http://www.research.att.com/~smb/papers/inet-crypto.pdf)

[BPS99] Jon C. R. Bennett, Craig Partridge, Nicholas Shectman, "Packet Reordering is Not Pathological Network Behavior", IEEE/ACM Transactions on Networking, Vol. 7, No. 6, December 1999.
[CGMP] Farinacci D., Tweedly A., Speakman T., "Cisco Group Management Protocol (CGMP)", 1996/1997. ftp://ftpeng.cisco.com/ipmulticast/specs/cgmp.txt

[ITU-I363] ITU-T I.363.5 B-ISDN ATM Adaptation Layer Specification Type AAL5, International Standards Organisation (ISO), 1996.

[RFC3366] Fairhurst, G., and L. Wood, "Advice to link designers on link Automatic Repeat reQuest (ARQ)", August 2002.

[RFC3449] H. Balakrishnan, V. N. Padmanabhan, G. Fairhurst, M. Sooriyabandara. "TCP Performance Implications of Network Path Asymmetry", December 2002.

[Crypto9912] Schneier, Bruce, "European Cellular Encryption Algorithms", Crypto-Gram (December 15, 1999). http://www.counterpane.com

[DIX82] Digital Equipment Corp, Intel Corp, Xerox Corp, Ethernet Local Area Network Specification Version 2.0, November 1982.

[DOCSIS1] Data-Over-Cable Service Interface Specifications, Radio Frequency Interface Specification 1.0, SP-RFI-I05-991105, November 1999, Cable Television Laboratories, Inc.

[DOCSIS2] Data-Over-Cable Service Interface Specifications, Radio Frequency Interface Specification 1.1, SP-RFIv1.1-I05-000714, July 2000, Cable Television Laboratories, Inc.

[DOCSIS3] W.S. Lai, "DOCSIS-Based Cable Networks: Impact of Large Data Packets on Upstream Capacity", 14th ITC Specialists Seminar on Access Networks and Systems, Barcelona, Spain, April 25-27, 2001.

[EN301] ETSI, European Broadcasting Union, Digital Video Broadcasting (DVB); DVB Specification for Data Broadcasting, European Standard (Telecommunications Series) EN 301 192 v1.2.1 (1999-06).

[ES00] David A. Eckhardt and Peter Steenkiste, "Effort-limited Fair (ELF) Scheduling for Wireless Networks", Proceedings of IEEE Infocom 2000.
[FB00] Firoiu V., and Borden M., "A Study of Active Queue Management for Congestion Control", to appear in Infocom 2000.

[IEEE8023] IEEE 802.3 CSMA/CD Access Method. Available from http://standards.ieee.org/

[IEEE80211] IEEE 802.11 Wireless LAN standard. Available from http://standards.ieee.org/

[ISO13818] ISO/IEC, ISO/IEC 13818-1:2000(E) Information Technology - Generic coding of moving pictures and associated audio information: Systems, Second edition, 2000-12-01, International Organization for Standardization and International Electrotechnical Commission.

[Jac90] Van Jacobson. "Modified TCP Congestion Avoidance Algorithm". Email to the end2end-interest mailing list, April 1990. URL: ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt.

[KY02] F. Khafizov, M. Yavuz. "Running TCP Over IS-2000", Proceedings of IEEE ICC, 2002.

[LK00] R. Ludwig, R. H. Katz, "The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions", ACM Computer Communication Review, Vol. 30, No. 1, January 2000.

[LKJK02] R. Ludwig, A. Konrad, A. D. Joseph, R. H. Katz, "Optimizing the End-to-End Performance of Reliable Flows over Wireless Links", Kluwer/ACM Wireless Networks Journal, Vol. 8, Nos. 2/3, pp. 289-299, March-May 2002.

[LRKOJ99] R. Ludwig, B. Rathonyi, A. Konrad, K. Oden, A. Joseph, "Multi-Layer Tracing of TCP over a Reliable Wireless Link", pp. 144-154, In Proceedings of ACM SIGMETRICS 99.

[LS00] R. Ludwig, K. Sklower, "The Eifel Retransmission Timer", ACM Computer Communication Review, Vol. 30, No. 3, July 2000.

[MAGMA-PROXY] Work In Progress, MAGMA WG, draft-ietf-magma-igmp-proxy-04.txt

[MAGMA-SNOOP] Work In Progress, MAGMA WG, draft-ietf-magma-snoop-09.txt

[MBB00] May, M., Bonald, T., and Bolot, J-C., "Analytic Evaluation of RED Performance", INFOCOM 2000.
[MBDL99] May, M., Bolot, J., Diot, C., and Lyles, B., "Reasons not to deploy RED", Proc. of 7th International Workshop on Quality of Service (IWQoS'99), June 1999.

[GM02] Luigi Alfredo Grieco, Saverio Mascolo, "TCP Westwood and Easy RED to Improve Fairness in High-Speed Networks", Proceedings of the 7th International Workshop on Protocols for High-Speed Networks, April 2002.

[MBONED-GAP] Meyer, D. and B. Nickless, Work In Progress, MBoned WG, draft-ietf-mboned-iesg-gap-analysis-01.txt.

[MYR95] Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E. Kulawik, Charles L. Seitz, et al. "MYRINET: A Gigabit per Second Local Area Network", IEEE Micro, Vol. 15, No. 1, February 1995, pp. 29-36.

[RF95] Romanow, A., and Floyd, S., "Dynamics of TCP Traffic over ATM Networks". IEEE Journal on Selected Areas in Communications, V. 13 N. 4, May 1995, p. 633-641.

[RFC0826] Plummer, D.C., "Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48-bit Ethernet address for transmission on Ethernet hardware," STD 37, RFC 826, November 1982.

[RFC1071] R. Braden, D. Borman, C. Partridge, "Computing the Internet Checksum", September 1988.

[RFC1112] S. Deering, "Host Extensions for IP Multicasting", August 1989.

[RFC1750] D. Eastlake, S. Crocker, J. Schiller, "Randomness Recommendations for Security", December 1994.

[RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow. "TCP Selective Acknowledgement Options". October 1996.

[RFC2236] W. Fenner, "Internet Group Management Protocol, Version 2", November 1997.

[RFC2328] J. Moy, "OSPF Version 2", April 1998.

[RFC2401] S. Kent, R. Atkinson, "Security Architecture for the Internet Protocol". November 1998.

[RFC2440] J. Callas et al. "OpenPGP Message Format". November 1998.

[RFC2460] S. Deering, R. Hinden.
"Internet Protocol, Version 6 (IPv6) Specification". December 1998.

[RFC2461] T. Narten, E. Nordmark, W. Simpson. "Neighbor Discovery for IP Version 6 (IPv6)". December 1998.

[RFC2616] R. Fielding et al. "Hypertext Transfer Protocol -- HTTP/1.1". June 1999.

[RFC2630] R. Housley. "Cryptographic Message Syntax". June 1999.

[RFC2631] E. Rescorla. "Diffie-Hellman Key Agreement Method". June 1999.

[RFC2632] B. Ramsdell. "S/MIME Version 3 Certificate Handling". June 1999.

[RFC2633] B. Ramsdell. "S/MIME Version 3 Message Specification". June 1999.

[RFC2710] S. Deering, W. Fenner, B. Haberman, "Multicast Listener Discovery (MLD) for IPv6", October 1999.

[RFC2784] D. Farinacci, T. Li, S. Hanks, D. Meyer, P. Traina. "Generic Routing Encapsulation (GRE)". March 2000.

[RFC2923] K. Lahey. "TCP Problems with Path MTU Discovery". September 2000.

[RFC3048] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S. Floyd, M. Luby. "Reliable Multicast Transport Building Blocks for One-to-Many Bulk-Data Transfer". January 2001.

[RFC3376] B. Cain, S. Deering, I. Kouvelas, B. Fenner, A. Thyagarajan, "Internet Group Management Protocol, Version 3", October 2002.

[RFC3488] I. Wu, T. Eckert, "Cisco Systems Router-port Group Management Protocol (RGMP)", February 2003.

[RFC3590] B. Haberman, "Source Address Selection for the Multicast Listener Discovery (MLD) Protocol", September 2003.

[SP2000] "When the CRC and TCP Checksum Disagree", Jonathan Stone & Craig Partridge, ACM SIGCOMM, September 2000. http://www.acm.org/sigcomm/sigcomm2000/conf/paper/sigcomm2000-9-1.pdf

[Stevens94] R. Stevens, "TCP/IP Illustrated, Volume 1," Addison-Wesley, 1994 (section 2.10).

[TCPF98] Dong Lin and H.T. Kung, "TCP Fast Recovery Strategies: Analysis and Improvements", IEEE Infocom, March 1998.
Available from: "http://www.eecs.harvard.edu/networking/papers/infocom-tcp-final-198.pdf"

[WFBA2000] David Wagner, Jeffrey S. Foster, Eric Brewer and Alexander Aiken, "A First Step Toward Automated Detection of Buffer Overrun Vulnerabilities", Proceedings of NDSS 2000, or http://www.berkeley.edu:80/~daw/papers/

[Wilbur89] Wilbur, Steve R., Jon Crowcroft, and Yuko Murayama. "MAC layer Security Measures in Local Area Networks," Local Area Network Security, Workshop LANSEC '89 Proceedings, Springer-Verlag, April 1989, pp. 53-64.

Authors' Addresses:

Phil Karn, Editor
Qualcomm
5775 Morehouse Drive
San Diego CA 92121
858 587 1121
karn@qualcomm.com

Carsten Bormann
Universitaet Bremen FB3 TZI
Postfach 330440
D-28334 Bremen, GERMANY
+49 421 218 7024
cabo@tzi.org

Godred (Gorry) Fairhurst
Department of Engineering
University of Aberdeen
Aberdeen, AB24 3UE
UK
gorry@erg.abdn.ac.uk
http://www.erg.abdn.ac.uk/users/gorry

Dan Grossman
Motorola, Inc.
20 Cabot Blvd.
Mansfield, MA 02048
dan@dma.isg.mot.com

Reiner Ludwig
Ericsson Research
Ericsson Allee 1
52134 Herzogenrath, Germany
+49 2407 575 719
Reiner.Ludwig@ericsson.com

Jamshid Mahdavi
Volera, Inc.
2211 N. 1st St.
San Jose, CA 95131
mahdavi@volera.com

Gabriel Montenegro
Sun Microsystems Laboratories, Europe
29, chemin du Vieux Chene
38240 Meylan, FRANCE
gab@sun.com

Joe Touch
USC/ISI
4676 Admiralty Way
Marina del Rey CA 90292
310 448 9151
touch@isi.edu
http://www.isi.edu/touch

Lloyd Wood
Global Defense and Space Group, Cisco Systems
9 New Square Park, Bedfont Lakes
Feltham TW14 8HA, United Kingdom
+44 (0)20 8824 4236
lwood@cisco.com
http://www.ee.surrey.ac.uk/Personal/L.Wood/

Contributors' Addresses:

Aaron Falk
USC Information Sciences Institute
4676 Admiralty Way
Marina Del Rey, CA 90292
310-448-9327
falk@isi.edu

Saverio Mascolo
Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari
Via Orabona 4, 70125 Bari, Italy
+39 080 596 3621
mascolo@poliba.it
http://www-dee.poliba.it/dee-web/Personale/mascolo.html

Marie-Jose Montpetit
marie@mjmontpetit.com

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.