6lo                                                      P. Thubert, Ed.
Internet-Draft                                             Cisco Systems
Updates: 4944 (if approved)                                       J. Hui
Intended status: Standards Track                               Nest Labs
Expires: January 20, 2018                                  July 19, 2017


                  LLN Fragment Forwarding and Recovery
               draft-thubert-6lo-forwarding-fragments-06

Abstract

   Considering that an LLN frame can have a MAC payload below 100 bytes,
   an IPv6 packet might be fragmented into more than 10 fragments at the
   6LoWPAN layer. In a 6LoWPAN mesh-under network, the fragments can be
   forwarded individually across the mesh, whereas in a route-over mesh
   network, a fragmented 6LoWPAN packet must be reassembled at every
   hop, which causes latency and congestion. This draft introduces a
   simple protocol to forward individual fragments across a route-over
   mesh network, and, regardless of the type of mesh, recover the loss
   of individual fragments across the mesh and protect the network
   against bloat with a minimal flow control.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 20, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Updating RFC 4944
   3.  Terminology and Referenced Work
   4.  New Dispatch types and headers
     4.1.  Recoverable Fragment Dispatch type and Header
     4.2.  RFRAG Acknowledgment Dispatch type and Header
   5.  Fragments Recovery
   6.  Forwarding Fragments
     6.1.  Upon the first fragment
     6.2.  Upon the next fragments
     6.3.  Upon the RFRAG Acknowledgments
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Rationale
   Appendix B.  Requirements
   Appendix C.  Considerations On Flow Control
   Authors' Addresses

1. Introduction

   In most Low Power and Lossy Network (LLN) applications, the bulk of
   the traffic consists of small chunks of data (on the order of a few
   bytes to a few tens of bytes) at a time. Given that an IEEE Std.
   802.15.4 [IEEE.802.15.4] frame can carry 74 bytes or more in all
   cases, fragmentation is usually not required. However, and though
   this happens only occasionally, a number of mission critical
   applications do require the capability to transfer larger chunks of
   data, for instance to support a firmware upgrade of the LLN nodes or
   an extraction of logs from LLN nodes. In the former case, the large
   chunk of data is transferred to the LLN node, whereas in the latter,
   the large chunk flows away from the LLN node. In both cases, the
   size can be on the order of 10 Kbytes or more and an end-to-end
   reliable transport is required.

   "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944]
   defines the original 6LoWPAN datagram fragmentation mechanism for
   LLNs. One critical issue with this original design is that routing
   an IPv6 [RFC8200] packet across a route-over mesh requires
   reassembling the full packet at each hop, which may cause latency
   along a path and an overall buffer bloat in the network. Those
   undesirable effects can be alleviated by a hop-by-hop fragment
   forwarding technique such as the one proposed in this specification,
   and arguably this could be achieved without the need to define a new
   protocol. However, adding that capability alone to the local
   implementation of the original 6LoWPAN fragmentation would not
   address the bulk of the issues raised against it, and may create new
   issues like uncontrolled state in the network.

   Another issue with RFC 4944 [RFC4944] is that it does not define a
   mechanism to first discover the loss of a fragment along a multi-hop
   path (e.g. having exhausted the link-layer retries at some hop on the
   way), and then to recover that loss. With RFC 4944, the forwarding
   of a whole datagram fails when one fragment is not delivered properly
   to the destination 6LoWPAN endpoint. End-to-end transport or
   application-level mechanisms may require a full retransmission of the
   datagram, wasting resources in an already constrained network.

   In that situation, the source 6LoWPAN endpoint will not be aware that
   a loss occurred and will continue sending all fragments for a
   datagram that is already doomed. The original support is missing
   signaling to abort a multi-fragment transmission at any time and from
   either end, and, if the capability to forward fragments is
   implemented, to clean up the related state in the network. It also
   lacks flow control capabilities to avoid contributing to a
   congestion that may in turn cause the loss of a fragment and trigger
   the retransmission of the full datagram.

   This specification proposes a method to forward fragments across a
   multi-hop route-over mesh, and to recover individual fragments
   between LLN endpoints. The method is designed to limit congestion
   loss in the network and addresses the requirements that are detailed
   in Appendix B.

2. Updating RFC 4944

   This specification updates the fragmentation mechanism that is
   specified in "Transmission of IPv6 Packets over IEEE 802.15.4
   Networks" [RFC4944] for use in route-over LLNs by providing a model
   where fragments can be forwarded end-to-end across a 6LoWPAN LLN, and
   where fragments that are lost on the way can be recovered
   individually. New dispatch types are defined in Section 4.

3. Terminology and Referenced Work

   Past experience with fragmentation has shown that misassociated or
   lost fragments can lead to poor network behavior and, occasionally,
   trouble at the application layer. The reader is encouraged to read
   "IPv4 Reassembly Errors at High Data Rates" [RFC4963] and follow the
   references for more information.

   That experience led to the definition of the "Path MTU discovery"
   [RFC8201] (PMTUD) protocol that limits fragmentation over the
   Internet.

   Specifically in the case of UDP, valuable additional information can
   be found in "UDP Usage Guidelines for Application Designers"
   [RFC8085].

   Readers are expected to be familiar with all the terms and concepts
   that are discussed in "IPv6 over Low-Power Wireless Personal Area
   Networks (6LoWPANs): Overview, Assumptions, Problem Statement, and
   Goals" [RFC4919] and "Transmission of IPv6 Packets over IEEE 802.15.4
   Networks" [RFC4944].

   "The Benefits of Using Explicit Congestion Notification (ECN)"
   [RFC8087] provides useful information on the potential benefits and
   pitfalls of using ECN.

   Quoting the "Multiprotocol Label Switching (MPLS) Architecture"
   [RFC3031]: with MPLS, "packets are "labeled" before they are
   forwarded. At subsequent hops, there is no further analysis of the
   packet's network layer header. Rather, the label is used as an index
   into a table which specifies the next hop, and a new label". The
   MPLS technique is leveraged in the present specification to forward
   fragments that actually do not have a network layer header, since the
   fragmentation occurs below IP.

   This specification uses the following terms:

   6LoWPAN endpoints

      The LLN nodes in charge of generating or expanding a 6LoWPAN
      header from/to a full IPv6 packet. The 6LoWPAN endpoints are the
      points where fragmentation and reassembly take place.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

4. New Dispatch types and headers

   This specification enables the 6LoWPAN fragmentation sublayer to
   provide an MTU up to 2047 bytes to the upper layer, which can be the
   6LoWPAN Header Compression sublayer that is defined in the
   "Compression Format for IPv6 Datagrams" [RFC6282] specification. In
   order to achieve this, this specification enables the fragmentation
   and the reliable transmission of fragments over a multihop 6LoWPAN
   mesh network.

   This specification provides a technique that is derived from MPLS in
   order to forward individual fragments across a 6LoWPAN route-over
   mesh. The datagram_tag is used as a label; it is locally unique to
   the node that is the source MAC address of the fragment, so together
   the MAC address and the label can identify the fragment globally. A
   node may build the datagram_tag in its own locally-significant way,
   as long as the selected tag stays unique to the particular datagram
   for the lifetime of that datagram. As a result, the label does not
   need to be globally unique, but it must be swapped at each hop as
   the source MAC address changes.

   This specification extends RFC 4944 [RFC4944] with 4 new Dispatch
   types, for Recoverable Fragment (RFRAG) headers with or without
   Acknowledgment Request (RFRAG vs. RFRAG-ARQ), and for the RFRAG
   Acknowledgment back, with or without ECN Echo (RFRAG-ACK vs. RFRAG-
   ECHO).
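
   As an illustration only, the following Python sketch shows how a
   receiver might tell the four types apart from the dispatch octet.
   It assumes the Value Bit Patterns proposed in Figure 1, which are
   still to be confirmed by IANA. The names RFRAG_DISPATCH,
   classify_dispatch and carries_fragment are hypothetical and are not
   defined by this specification.

```python
from typing import Optional

# Dispatch octets taken from Figure 1 (assumption: values as proposed
# in this draft, pending IANA confirmation).
RFRAG_DISPATCH = {
    0b11101000: "RFRAG",       # Recoverable Fragment
    0b11101001: "RFRAG-ARQ",   # RFRAG with Ack Request
    0b11101010: "RFRAG-ACK",   # RFRAG Acknowledgment
    0b11101011: "RFRAG-ECHO",  # RFRAG Ack with ECN Echo
}


def classify_dispatch(octet: int) -> Optional[str]:
    """Return the RFRAG header type name, or None for any other dispatch."""
    return RFRAG_DISPATCH.get(octet & 0xFF)


def carries_fragment(octet: int) -> bool:
    """True for RFRAG and RFRAG-ARQ, which differ only in the last bit."""
    return (octet & 0xFE) == 0b11101000
```

   With these values, the low-order bit of the dispatch octet doubles
   as the Ack Requested flag for fragments and as the ECN Echo flag for
   acknowledgments.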

   (to be confirmed by IANA) The new 6LoWPAN Dispatch types use the
   Value Bit Pattern of 11 1010xx from page 0 [RFC8025], as follows:

              Pattern    Header Type
            +------------+------------------------------------------+
            | 11  101000 | RFRAG      - Recoverable Fragment        |
            | 11  101001 | RFRAG-ARQ  - RFRAG with Ack Request      |
            | 11  101010 | RFRAG-ACK  - RFRAG Acknowledgment        |
            | 11  101011 | RFRAG-ECHO - RFRAG Ack with ECN Echo     |
            +------------+------------------------------------------+

            Figure 1: Additional Dispatch Value Bit Patterns

4.1. Recoverable Fragment Dispatch type and Header

   In this specification, the size and offset of the fragments are
   expressed on the compressed packet form as opposed to the
   uncompressed - native - packet form.

   The first fragment is recognized by a sequence of 0; it carries its
   fragment_size and the datagram_size of the compressed packet, whereas
   the other fragments carry their fragment_size and fragment_offset.
   The last fragment for a datagram is recognized when its
   fragment_offset and its fragment_size add up to the datagram_size.

   Recoverable Fragments are sequenced and a bitmap is used in the RFRAG
   Acknowledgment to indicate the received fragments by setting the
   individual bits that correspond to their sequence.

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1 1 1 0 1 0 0 X|E|fragment_size|         datagram_tag          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |sequence |   fragment_offset   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    X set == Ack Requested

               Figure 2: RFRAG Dispatch type and Header

   X: 1 bit; Ack Requested: when set, the sender requires an RFRAG
      Acknowledgment from the receiver.

   E: 1 bit; Explicit Congestion Notification; the "E" flag is reset by
      the source of the fragment and set by intermediate routers to
      signal that this fragment experienced congestion along its path.

   Fragment_size: 7 bit unsigned integer; the size of this fragment in
      a unit that depends on the MAC layer technology. For IEEE Std.
      802.15.4, the unit is the octet, and the maximum fragment size,
      which is constrained by the maximum frame size of 128 octets
      minus the overheads of the MAC and Fragment Headers, is not
      limited by this encoding.

   Sequence: 5 bit unsigned integer; the sequence number of the
      fragment. Fragments are sequence numbered [0..N] where N is in
      [0..31]. As long as the overheads allow a fragment size of 64
      octets or more, a packet of 2047 octets can be fragmented within
      these 32 sequence numbers.

   Fragment_offset: 11 bit unsigned integer; when set to 0, this field
      indicates an abort condition; else, its value depends on the value
      of the Sequence. When the sequence is not 0, this field indicates
      the offset of the fragment in the compressed form. When the
      sequence is 0, denoting the first fragment of a datagram, this
      field is overloaded to indicate the datagram_size of the
      compressed packet, to help the receiver allocate an adapted
      buffer for the reception and reassembly operations. This format
      limits the maximum MTU on a 6LoWPAN link to 2047 bytes, but 1280
      bytes is the recommended value to avoid issues with IPv6 Path MTU
      Discovery [RFC8201].

4.2. RFRAG Acknowledgment Dispatch type and Header

   This specification also defines a 4-octet RFRAG Acknowledgment bitmap
   that is used by the reassembling end point to confirm selectively the
   reception of individual fragments. A given offset in the bitmap maps
   one to one with a given sequence number.
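
   The mapping between sequence numbers and bitmap offsets can be
   sketched in Python as follows. This is a minimal sketch that assumes
   offset 0 is the leftmost (most significant) bit of the 4-octet field
   in network byte order, as the figures in this section are drawn. The
   function names ack_bitmap and acked_sequences are hypothetical. The
   sample values reproduce the case of Figure 4, where fragments 0 to 20
   were received except 1, 2 and 16.

```python
from typing import Iterable, List


def ack_bitmap(received: Iterable[int]) -> int:
    """Build the 32-bit bitmap; sequence x sets the bit at offset x."""
    bitmap = 0
    for seq in received:
        if not 0 <= seq <= 31:
            raise ValueError("sequence must be in [0..31]")
        # Offset 0 is the most significant bit (assumed bit ordering).
        bitmap |= 1 << (31 - seq)
    return bitmap


def acked_sequences(bitmap: int) -> List[int]:
    """List the sequence numbers acknowledged by a bitmap, ascending."""
    return [seq for seq in range(32) if bitmap & (1 << (31 - seq))]


# Fragments 0..20 received, except 1, 2 and 16.
sample = ack_bitmap(set(range(21)) - {1, 2, 16})
wire_octets = sample.to_bytes(4, "big")  # octets as sent on the wire
```

   Keeping the bitmap as a single integer makes the two special values
   easy to test: 0 for an all-zero bitmap and 0xFFFFFFFF for an all-one
   bitmap.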

   The offset of the bit in the bitmap indicates which fragment is
   acknowledged as follows:

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           RFRAG Acknowledgment Bitmap                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       ^                   ^
       |                   |    bitmap indicating whether:
       |                   +--- Fragment with sequence 10 was received
       +----------------------- Fragment with sequence 00 was received

             Figure 3: RFRAG Acknowledgment bitmap encoding

   Figure 4 shows an example Acknowledgment bitmap which indicates that
   all fragments from sequence 0 to 20 were received, except for
   fragments 1, 2 and 16 that were either lost or are still in the
   network over a slower path.

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|0|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 4: Example RFRAG Acknowledgment bitmap

   The RFRAG Acknowledgment Bitmap is included in an RFRAG
   Acknowledgment header, as follows:

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                      |1 1 1 0 1 0 1 Y|         datagram_tag          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          RFRAG Acknowledgment Bitmap (32 bits)                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 5: RFRAG Acknowledgment Dispatch type and Header

   Y: 1 bit; Explicit Congestion Notification Echo

      When set, the sender indicates that at least one of the
      acknowledged fragments was received with an Explicit Congestion
      Notification, indicating that the path followed by the fragments
      is subject to congestion.

   RFRAG Acknowledgment Bitmap

      An RFRAG Acknowledgment Bitmap, whereby setting the bit at offset
      x indicates that fragment x was received, as shown in Figure 3.
      All 0's is a NULL bitmap that indicates that the fragmentation
      process is aborted. All 1's is a FULL bitmap that indicates that
      the fragmentation process is complete, that is, all fragments
      were received at the reassembly end point.

5. Fragments Recovery

   The Recoverable Fragment headers RFRAG and RFRAG-ARQ are used to
   transport a fragment and optionally request an RFRAG Acknowledgment
   that will confirm the good reception of one or more fragments. An
   RFRAG Acknowledgment can optionally carry an ECN indication; it is
   carried as a standalone header in a message that is sent back to the
   6LoWPAN endpoint that was the source of the fragments, as known by
   its MAC address. The process ensures that at every hop, the source
   MAC address and the datagram_tag in the received fragment are enough
   information to send the RFRAG Acknowledgment back towards the source
   6LoWPAN endpoint by reversing the MPLS operation.

   The 6LoWPAN endpoint that fragments the packets at 6LoWPAN level (the
   sender) also controls when the reassembling end point sends the RFRAG
   Acknowledgments by setting the Ack Requested flag in the RFRAG
   packets. It may set the Ack Requested flag on any fragment to
   perform congestion control by limiting the number of outstanding
   fragments, which are the fragments that have been sent but for which
   reception or loss was not positively confirmed by the reassembling
   endpoint. When the sender of the fragment knows that an underlying
   link-layer mechanism protects the fragments, it may refrain from
   using the RFRAG Acknowledgment mechanism, and never set the Ack
   Requested bit. When it receives a fragment with the Ack Requested
   flag set, the 6LoWPAN endpoint that reassembles the packets at
   6LoWPAN level (the receiver) sends back an RFRAG Acknowledgment to
   confirm reception of all the fragments it has received so far.

   The sender transfers a controlled number of fragments and MAY flag
   the last fragment of a series with an RFRAG Acknowledgment Request.
   The receiver MUST acknowledge a fragment with the acknowledgment
   request bit set. If any fragment immediately preceding an
   acknowledgment request is still missing, the receiver MAY
   intentionally delay its acknowledgment to allow in-transit fragments
   to arrive. Delaying the acknowledgment might defeat the round trip
   delay computation, so it should be configurable and not enabled by
   default.

   The receiver MAY issue unsolicited acknowledgments. An unsolicited
   acknowledgment signals to the sender endpoint that it can resume
   sending if it had reached its maximum number of outstanding
   fragments. Another use is to inform the sender that the reassembling
   endpoint has canceled the processing of an individual datagram.
   Note that acknowledgments might consume precious resources, so the
   use of unsolicited acknowledgments should be configurable and not
   enabled by default.

   The sender protects the transmission with a retry timer that is
   computed according to the method detailed in [RFC6298]. It is
   expected that the upper layer retries obey the same or friendly
   rules, in which case a single round of fragment recovery should fit
   within the upper layer recovery timers.

   Fragments are sent in a round robin fashion: the sender sends all the
   fragments for a first time before it retries any lost fragment; lost
   fragments are retried in sequence, oldest first. This mechanism
   enables the receiver to acknowledge fragments that were delayed in
   the network before they are actually retried.
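
   The sender side of this exchange can be sketched in Python as
   follows. This is a minimal sketch under stated assumptions, not a
   normative algorithm: offset 0 of the bitmap is taken to be the most
   significant bit, "oldest first" is read as ascending sequence number,
   and an all-zero bitmap is treated as the NULL bitmap of Section 4.2.
   The names is_acked and retry_order are hypothetical.

```python
from typing import List, Optional

NULL_BITMAP = 0x00000000  # fragmentation process aborted
FULL_BITMAP = 0xFFFFFFFF  # all fragments received


def is_acked(bitmap: int, seq: int) -> bool:
    """True if the bit at offset seq is set (offset 0 = MSB, assumed)."""
    return bool(bitmap & (1 << (31 - seq)))


def retry_order(fragment_count: int, bitmap: int) -> Optional[List[int]]:
    """Sequences to resend after an RFRAG Acknowledgment.

    Returns None when the receiver aborted (NULL bitmap), otherwise the
    unacknowledged sequence numbers, lowest (oldest) first.
    """
    if not 1 <= fragment_count <= 32:
        raise ValueError("a datagram has between 1 and 32 fragments")
    if bitmap == NULL_BITMAP:
        return None
    return [s for s in range(fragment_count) if not is_acked(bitmap, s)]
```

   For a datagram of 21 fragments and the bitmap of Figure 4, this
   sketch selects fragments 1, 2 and 16 for the next round; an empty
   list means that nothing is left to retry.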

   When the sender decides that a packet should be dropped and the
   fragmentation process canceled, it sends a pseudo fragment with the
   fragment_offset, sequence and fragment_size all set to 0, and no
   data. Upon reception of this message, the receiver should clean up
   all resources for the packet associated to the datagram_tag. If an
   acknowledgment is requested, the receiver responds with a NULL
   bitmap.

   The receiver might need to cancel the processing of a fragmented
   packet for internal reasons, for instance if it is out of reassembly
   buffers, or considers that this packet is already fully reassembled
   and passed to the upper layer. In that case, the receiver SHOULD
   indicate so to the sender with a NULL bitmap. Upon an acknowledgment
   with a NULL bitmap, the sender MUST abort the current fragmented
   transmission of the datagram.

6. Forwarding Fragments

   It is assumed that the first fragment is large enough to carry the
   IPv6 header and make routing decisions. If that is not so, then this
   specification MUST NOT be used.

   This specification enables intermediate routers to forward fragments
   with no intermediate reconstruction of the entire packet. The first
   fragment carries the IP header and it is routed all the way from the
   fragmenting end point to the reassembling end point. Upon the first
   fragment, the routers along the path install a label-switched path
   (LSP), and the following fragments are label-switched along that
   path. As a consequence, alternate routes are not possible for
   individual fragments. The datagram_tag is used to carry the label,
   which is swapped at each hop. All fragments follow the same path and
   fragments are delivered in the order in which they are sent.

6.1. Upon the first fragment

   In Route-Over mode, the source and destination MAC addresses in a
   frame change at each hop. The label that is formed and placed in the
   datagram_tag is associated to the source MAC and only valid (and
   unique) for that source MAC. Say the first fragment has:

      Source IPv6 address = IP_A (maybe hops away)

      Destination IPv6 address = IP_B (maybe hops away)

      Source MAC = MAC_previous

      Datagram_tag = DT_previous

   The intermediate router that forwards individual fragments performs
   the following actions:

      a route lookup to get the next hop IPv6 address towards IP_B,
      which resolves as IP_next

      a MAC address resolution to get the MAC address associated to
      IP_next, which resolves as MAC_next

   Since it is a first fragment of a packet from that source MAC address
   MAC_previous for that tag DT_previous, the router:

      cleans up any leftover resource associated to the tuple
      (MAC_previous, DT_previous)

      allocates a new label for that flow, DT_next, from a Least
      Recently Used pool or some similar procedure

      allocates a label-swap entry indexed by (MAC_previous,
      DT_previous) that contains (MAC_next, DT_next)

      allocates a label-swap entry indexed by (MAC_next, DT_next)
      that contains (MAC_previous, DT_previous); this enables the
      reverse MPLS switching operation that is used to route the RFRAG-
      ACK

      changes the source MAC address from MAC_previous to MAC_self

      changes the destination MAC address from MAC_self to MAC_next

      swaps the datagram_tag to DT_next

   At this point the router is all set and can forward the fragment to
   the next hop.

6.2. Upon the next fragments

   Upon the next fragments (those that are not a first fragment), the
   router expects to have already installed a label-swap entry indexed
   by (MAC_previous, DT_previous). The router:

      looks up the label-swap entry for (MAC_previous, DT_previous),
      which resolves as (MAC_next, DT_next)

      sets the MAC addresses to source MAC_self and destination
      MAC_next

      swaps the datagram_tag to DT_next

   If the label-swap entry for (MAC_previous, DT_previous) is not found,
   the router builds an RFRAG-ACK to indicate the error. The resulting
   message has the following information:

      MAC addresses set to source MAC_self and destination
      MAC_previous, as found in the fragment

      datagram_tag set to DT_previous

      NULL bitmap to indicate the error

   At this point the router is all set and can send the RFRAG-ACK back
   to the previous router.

6.3. Upon the RFRAG Acknowledgments

   Upon an RFRAG Acknowledgment, the router expects to already have a
   label-swap entry indexed by (MAC_next, DT_next), which are
   respectively the source MAC address of the received frame and the
   received datagram_tag. DT_next should have been computed by this
   router and this router should have assigned it to this particular
   datagram. The router:

      looks up the label-swap entry for (MAC_next, DT_next), which
      resolves as (MAC_previous, DT_previous)

      sets the MAC addresses to source MAC_self and destination
      MAC_previous

      swaps the datagram_tag to DT_previous

   At this point the router is all set and can forward the RFRAG-ACK to
   the previous hop.

   If the label-swap entry for (MAC_next, DT_next) is not found, the
   router MUST silently drop the packet.

   If the RFRAG-ACK indicates either an error (NULL bitmap) or that the
   datagram was entirely received (FULL bitmap), the router schedules
   the label-swap entries for recycling. If the RFRAG-ACK is lost on
   the way back, the source may retry the last fragment, which will
   result in an error RFRAG-ACK from the first router on the way that
   has already cleaned up.

7. Security Considerations

   The process of recovering fragments does not appear to create any
   opening for new threats compared to "Transmission of IPv6 Packets
   over IEEE 802.15.4 Networks" [RFC4944].

8. IANA Considerations

   Need extensions for formats defined in "Transmission of IPv6 Packets
   over IEEE 802.15.4 Networks" [RFC4944].

9. Acknowledgments

   The author wishes to thank Thomas Watteyne for in-depth reviews and
   comments, as well as Jay Werb, Christos Polyzois, Soumitri Kolavennu,
   Pat Kinney, Margaret Wasserman, Richard Kelsey, Carsten Bormann and
   Harry Courtice for their various contributions.

10. References

10.1. Normative References

   [IEEE.802.15.4]
              IEEE, "IEEE Standard for Low-Rate Wireless Networks",
              IEEE Standard 802.15.4, DOI 10.1109/IEEESTD.2016.7460875.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4944]  Montenegro, G., Kushalnagar, N., Hui, J., and D. Culler,
              "Transmission of IPv6 Packets over IEEE 802.15.4
              Networks", RFC 4944, DOI 10.17487/RFC4944, September 2007.

   [RFC6282]  Hui, J., Ed. and P. Thubert, "Compression Format for IPv6
              Datagrams over IEEE 802.15.4-Based Networks", RFC 6282,
              DOI 10.17487/RFC6282, September 2011.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC8025]  Thubert, P., Ed. and R. Cragie, "IPv6 over Low-Power
              Wireless Personal Area Network (6LoWPAN) Paging Dispatch",
              RFC 8025, DOI 10.17487/RFC8025, November 2016.

10.2. Informative References

   [I-D.ietf-6tisch-architecture]
              Thubert, P., "An Architecture for IPv6 over the TSCH mode
              of IEEE 802.15.4", draft-ietf-6tisch-architecture-11 (work
              in progress), January 2017.
609 [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, 610 RFC 2914, DOI 10.17487/RFC2914, September 2000, 611 . 613 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 614 Label Switching Architecture", RFC 3031, 615 DOI 10.17487/RFC3031, January 2001, 616 . 618 [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition 619 of Explicit Congestion Notification (ECN) to IP", 620 RFC 3168, DOI 10.17487/RFC3168, September 2001, 621 . 623 [RFC4919] Kushalnagar, N., Montenegro, G., and C. Schumacher, "IPv6 624 over Low-Power Wireless Personal Area Networks (6LoWPANs): 625 Overview, Assumptions, Problem Statement, and Goals", 626 RFC 4919, DOI 10.17487/RFC4919, August 2007, 627 . 629 [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly 630 Errors at High Data Rates", RFC 4963, 631 DOI 10.17487/RFC4963, July 2007, 632 . 634 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 635 Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, 636 . 638 [RFC7554] Watteyne, T., Ed., Palattella, M., and L. Grieco, "Using 639 IEEE 802.15.4e Time-Slotted Channel Hopping (TSCH) in the 640 Internet of Things (IoT): Problem Statement", RFC 7554, 641 DOI 10.17487/RFC7554, May 2015, 642 . 644 [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF 645 Recommendations Regarding Active Queue Management", 646 BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, 647 . 649 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 650 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 651 March 2017, . 653 [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using 654 Explicit Congestion Notification (ECN)", RFC 8087, 655 DOI 10.17487/RFC8087, March 2017, 656 . 658 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 659 (IPv6) Specification", STD 86, RFC 8200, 660 DOI 10.17487/RFC8200, July 2017, 661 . 663 [RFC8201] McCann, J., Deering, S., Mogul, J., and R. 
              Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017.

Appendix A. Rationale

   There are a number of uses for large packets in Wireless Sensor
   Networks. Such usages may not be the most typical or represent the
   largest amount of traffic over the LLN; however, the associated
   functionality can be critical enough to justify extra care for
   ensuring effective transport of large packets across the LLN.

   The list of those usages includes:

   Towards the LLN node:

      Firmware update: For example, a new version of the LLN node
         software is downloaded from a system manager over unicast or
         multicast services. Such a reflashing operation typically
         involves updating a large number of similar LLN nodes over a
         relatively short period of time.

      Packages of Commands: A number of commands or a full
         configuration can be packaged as a single message to ensure
         consistency and enable atomic execution or complete roll back.
         Until such commands are fully received and interpreted, the
         intended operation will not take effect.

   From the LLN node:

      Waveform captures: A number of consecutive samples are measured
         at a high rate for a short time and then transferred from a
         sensor to a gateway or an edge server as a single large report.

      Data logs: LLN nodes may generate large logs of sampled data for
         later extraction. LLN nodes may also generate system logs to
         assist in diagnosing problems on the node or network.

      Large data packets: Rich data types might require more than one
         fragment.

   Uncontrolled firmware download or waveform upload can easily result
   in a massive increase of the traffic and saturate the network.

   When a fragment is lost in transmission, the lack of recovery in the
   original fragmentation system of RFC 4944 implies that all fragments
   are resent, further contributing to the congestion that caused the
   initial loss, and potentially leading to congestion collapse.

   This saturation may lead to excessive radio interference, or random
   early discard (leaky bucket) in relaying nodes. Additional queuing
   and memory congestion may result while waiting for a low power next
   hop to emerge from its sleeping state.

   Considering that RFC 4944 defines an MTU of 1280 bytes and that in
   most incarnations (except 802.15.4g) an IEEE Std. 802.15.4 frame can
   limit the MAC payload to as few as 74 bytes, a packet might be
   fragmented into at least 18 fragments at the 6LoWPAN shim layer.
   Taking into account the worst-case header overhead for 6LoWPAN
   Fragmentation and Mesh Addressing headers will increase the number of
   required fragments to around 32. This level of fragmentation is much
   higher than that traditionally experienced over the Internet with
   IPv4 fragments. At the same time, the use of radios increases the
   probability of transmission loss and Mesh-Under techniques compound
   that risk over multiple hops.

   Mechanisms such as TCP or application-layer segmentation could be
   used to support end-to-end reliable transport. One option to support
   bulk data transfer over a frame-size-constrained LLN is to set the
   Maximum Segment Size to fit within the link maximum frame size.
   Doing so, however, can add significant header overhead to each
   802.15.4 frame. In addition, deploying such a mechanism requires
   that the end-to-end transport is aware of the delivery properties of
   the underlying LLN, which is a layer violation, and difficult to
   achieve from the far end of the IPv6 network.

Appendix B.
Requirements

   For one-hop communications, a number of Low Power and Lossy Network
   (LLN) link-layers propose a local acknowledgment mechanism that is
   enough to detect and recover the loss of fragments. In a multihop
   environment, an end-to-end fragment recovery mechanism might be a
   good complement to a hop-by-hop MAC level recovery. This draft
   introduces a simple protocol to recover individual fragments between
   6LoWPAN endpoints that may be multiple hops away. The method
   addresses the following requirements of an LLN:

   Number of fragments

      The recovery mechanism must support highly fragmented packets,
      with a maximum of 32 fragments per packet.

   Minimum acknowledgment overhead

      Because the radio is half duplex, and because of silent time spent
      in the various medium access mechanisms, an acknowledgment
      consumes roughly as many resources as a data fragment.

      The new end-to-end fragment recovery mechanism should be able to
      acknowledge multiple fragments in a single message and not require
      an acknowledgment at all if fragments are already protected at a
      lower layer.

   Controlled latency

      The recovery mechanism must succeed or give up within the time
      boundary imposed by the recovery process of the Upper Layer
      Protocols.

   Optional congestion control

      The aggregation of multiple concurrent flows may lead to the
      saturation of the radio network and congestion collapse.

      The recovery mechanism should provide means for controlling the
      number of fragments in transit over the LLN.

Appendix C. Considerations On Flow Control

   Considering that a multi-hop LLN can be a very sensitive environment
   due to the limited queuing capabilities of a large population of its
   nodes, this draft recommends a simple and conservative approach to
   congestion control, based on TCP congestion avoidance.

   Congestion on the forward path is assumed in case of packet loss, and
   packet loss is assumed upon time out. The draft allows controlling
   the number of outstanding fragments, that is, fragments that have
   been transmitted but for which an acknowledgment was not received
   yet. It must be noted that the number of outstanding fragments
   should not exceed the number of hops in the network, but the way to
   determine the number of hops is out of scope for this document.

   Congestion on the forward path can also be indicated by an Explicit
   Congestion Notification (ECN) mechanism. Though whether and how ECN
   [RFC3168] is carried out over the LoWPAN is out of scope, this draft
   provides a way for the destination endpoint to echo an ECN indication
   back to the source endpoint in an acknowledgment message as
   represented in Figure 5 in Section 4.2.

   It must be noted that congestion and collision are different topics.
   In particular, when a mesh operates on the same channel over multiple
   hops, then the forwarding of a fragment over a certain hop may
   collide with the forwarding of the next fragment that is following
   over a previous hop but in the same interference domain. This draft
   enables an end-to-end flow control, but leaves it to the sender stack
   to pace individual fragments within a transmit window, so that a
   given fragment is sent only when the previous fragment has had a
   chance to progress beyond the interference domain of this hop. In
   the case of 6TiSCH [I-D.ietf-6tisch-architecture], which operates
   over the Time-Slotted Channel Hopping [RFC7554] (TSCH) mode of
   operation of IEEE 802.15.4, a fragment is forwarded over a different
   channel at a different time and it makes full sense to transmit the
   next fragment as soon as the previous fragment has had its chance to
   be forwarded at the next hop.
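
   As an illustration only, the control of outstanding fragments
   described above (loss assumed upon time out, an echoed ECN
   indication, and a cap related to the number of hops) could be
   sketched as follows. This is not the mechanism specified by this
   draft: the class and method names, the initial window of one
   fragment, the increase by one per acknowledgment, and the halving
   on congestion are all assumptions borrowed loosely from TCP-style
   additive increase and multiplicative decrease.

```python
from dataclasses import dataclass


@dataclass
class FragmentWindow:
    """Hypothetical sender-side window of outstanding fragments.

    All names and constants are illustrative assumptions; the draft
    does not define them.
    """

    max_window: int       # upper bound, e.g. an estimate of the hop count
    window: int = 1       # fragments currently allowed in flight
    outstanding: int = 0  # fragments sent but not yet acknowledged

    def can_send(self) -> bool:
        """True if another fragment may be put in flight."""
        return self.outstanding < self.window

    def on_send(self) -> None:
        """Account for one fragment that was just transmitted."""
        if not self.can_send():
            raise RuntimeError("window is full")
        self.outstanding += 1

    def on_ack(self, acked: int, ecn_echo: bool = False) -> None:
        """Process an acknowledgment that covers `acked` fragments."""
        self.outstanding = max(0, self.outstanding - acked)
        if ecn_echo:
            # The destination echoed a congestion indication: back off.
            self._decrease()
        else:
            # No sign of congestion: grow by one, up to the cap.
            self.window = min(self.max_window, self.window + 1)

    def on_timeout(self) -> None:
        """Time out: treat outstanding fragments as lost to congestion."""
        self.outstanding = 0
        self._decrease()

    def _decrease(self) -> None:
        """Halve the window, never going below one fragment."""
        self.window = max(1, self.window // 2)
```

   A sender using such a sketch would call can_send() before each
   transmission, on_ack() for each acknowledgment message, and
   on_timeout() when its retransmission timer fires. Pacing of
   fragments within the window, which the text above leaves to the
   sender stack, is not modeled here.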

   From the standpoint of a source 6LoWPAN endpoint, an outstanding
   fragment is a fragment that was sent but for which no explicit
   acknowledgment was received yet. This means that the fragment might
   be on the way, received but not yet acknowledged, or the
   acknowledgment might be on the way back. It is also possible that
   either the fragment or the acknowledgment was lost on the way.

   From the sender standpoint, all outstanding fragments might still be
   in the network and contribute to its congestion. There is an
   assumption, though, that after a certain amount of time, a frame is
   either received or lost, so it is not causing congestion anymore.
   This amount of time can be estimated based on the round trip delay
   between the 6LoWPAN endpoints. The method detailed in [RFC6298] is
   recommended for that computation.

   The reader is encouraged to read through "Congestion Control
   Principles" [RFC2914]. Additionally, [RFC7567] and [RFC5681] provide
   deeper information on why this mechanism is needed and how TCP
   handles Congestion Control. Basically, the goal here is to manage
   the number of fragments present in the network; this is achieved by
   reducing the number of outstanding fragments over a congested path
   by throttling the sources.

   Section 5 describes how the sender decides how many fragments are
   (re)sent before an acknowledgment is required, and how the sender
   adapts that number to the network conditions.

Authors' Addresses

   Pascal Thubert (editor)
   Cisco Systems, Inc
   Building D
   45 Allee des Ormes - BP1200
   MOUGINS - Sophia Antipolis 06254
   FRANCE

   Phone: +33 497 23 26 34
   Email: pthubert@cisco.com

   Jonathan W. Hui
   Nest Labs
   3400 Hillview Ave
   Palo Alto, California 94304
   USA

   Email: jonhui@nestlabs.com