[6lo] Thomas' review of draft-thubert-6lo-forwarding-fragments-04

Thomas Watteyne <thomas.watteyne@inria.fr> Mon, 27 March 2017 21:01 UTC

From: Thomas Watteyne <thomas.watteyne@inria.fr>
Date: Mon, 27 Mar 2017 23:00:59 +0200
To: "6lo@ietf.org" <6lo@ietf.org>, "6tisch@ietf.org" <6tisch@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/6lo/QArOM8HExw2kF5TApR_nwP0NjUc>

Authors, all,

[CC'ing the 6TiSCH ML as this draft has been discussed quite a number of
times at 6TiSCH]

After a couple of weeks disconnected, I'm back just in time for reviewing
draft-thubert-6lo-forwarding-fragments-04 before the 6lo WG meeting. I'll
be (remotely) attending the 6lo WG meeting on Wednesday, happy to further
discuss there.

What follows are a couple of high level discussion points.
This being a relatively short draft, the entire document is copied at the
end of this e-mail, with corrections/suggestions directly inline.
I left questions and comments with the "TW>" prefix.
Please use Diff to report them to the XML.

Overall, I believe the draft addresses a very important point, and I'm
surprised by the relatively low number of people who stand up to
support this work. Are we all happy to reassemble at every hop? Really?

I believe it is time to adopt this work in 6lo.

Thomas

---

point 1: clearer intro needed

From my understanding, this draft introduces two things: (1) a way of
forwarding fragments that avoids having to reassemble at every hop and
(2) an end-to-end protocol to request retransmission of lost fragments. I
think item (1) is really the main contribution of the draft, and (2) is a
necessary corollary to making (1) work. The abstract and intro, however,
seem to focus almost exclusively on (2), not clearly stating (1). A single
sentence at the very beginning stressing the two contributions would go a
long way IMO.

---

point 2: justification

Section 3 provides a rationale as to why this draft is needed. It focuses
almost exclusively on reliability: if the link is 99.9% reliable, then the
probability that all 5 fragments of a datagram are delivered drops to
99.5%, etc. IMO that doesn't make much sense as each
fragment will be link-layer ACK'ed and retransmitted (at the link layer,
i.e. "transparently" to 6LoWPAN). There is of course a probability that all
retries fail, but justifying the draft by assuming there are no link-layer
ACKs makes little sense. I may completely misunderstand, in which case,
please point that out.

The major rationale IMO for this draft is that it doesn't require
intermediate nodes to reassemble! Why not just say that? Reassembly at each
hop causes several problems, the biggest one in my experience being
memory management (if I'm missing one fragment from a packet, a
fragment from a second packet arrives, and I have only one reassembly
buffer, what should I do?).

I feel like I'm missing something, or even reading a draft that is not what
I expected it to be...

---

point 3: Section 5 Overview is not an overview

Section 5 is called "Overview" and appears before the actual proposal is
described. From the title, I was expecting a couple of paragraphs with the
main idea, maybe a diagram or two to make the big picture clear. Instead,
the section contains a discussion about very subtle points, including
congestion control, interaction with link-layer protocols, timeout
calculation and security.

To summarize, the text in this section is interesting and should certainly
appear in the specification, but as discussion points AFTER the main idea
is presented.

---

point 4: the confusing use of the word "acknowledgment"

While understandable, the use of the word "acknowledgment", which can refer
to either L2 ACKs or RFRAG-ACK messages, is confusing. I suggest using the
term "RFRAG-ACK" rather than the generic "Acknowledgment".

---

point 5: label switching

The label switching mechanism is elegant as the labels are locally
significant only, with no need to maintain a network-wide label registry.
The document should state so.

===

6lo                                                      P. Thubert, Ed.
Internet-Draft                                             Cisco Systems
Intended status: Standards Track                                  J. Hui
Expires: July 14, 2017                                         Nest Labs
                                                        January 10, 2017


                  LLN Fragment Forwarding and Recovery
               draft-thubert-6lo-forwarding-fragments-04

Abstract

   In order to be routed, a fragmented 6LoWPAN packet must be
   reassembled at every hop of a multihop link where lower layer
   fragmentation occurs.  Considering that the IPv6 minimum MTU is 1280
   bytes, and that an 802.15.4 frame can have a payload limited to 74
   bytes in the worst case, a packet might end up fragmented into as
   many as 18 fragments by 6LoWPAN.
TW> not sure the term "shim layer" is widely used. I suggest simply replacing
it with "6LoWPAN" (done already)
   If a single one of
   those fragments is lost in transmission, all fragments must be
   resent, further contributing to the congestion that might have caused
   the initial packet loss.  This specification introduces a simple
   protocol to forward and recover individual fragments that might be
   lost over multiple hops between 6LoWPAN endpoints.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Thubert & Hui             Expires July 14, 2017                 [Page 1]
Internet-Draft    LLN Fragment Forwarding and Recovery      January 2017


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Rationale
   4.  Requirements
   5.  Overview
   6.  New Dispatch types and headers
     6.1.  Recoverable Fragment Dispatch type and Header
     6.2.  Fragment acknowledgment Dispatch type and Header
   7.  Fragments Recovery
   8.  Forwarding Fragments
     8.1.  Upon the first fragment
     8.2.  Upon the next fragments
     8.3.  Upon the fragment acknowledgments
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   In most Low Power and Lossy Network (LLN) applications, the bulk of
   the traffic consists of small chunks of data (on the order of a few
   bytes to a few tens of bytes) at a time.  Given that an 802.15.4 frame can
   carry 74 bytes or more in all cases, fragmentation is usually not
   required.  However, and though this happens only occasionally, a
   number of mission-critical applications do require the capability to
   transfer larger chunks of data, for instance to support firmware
   update of the LLN nodes, or an extraction of logs from LLN nodes.
   In the former case, the large chunk of data is transferred to the LLN
   node, whereas in the latter, the large chunk flows away from the LLN
   node.  In both cases, the size can be on the order of 10 kB or
   more, and an end-to-end reliable transport is required.

   Mechanisms such as TCP or application-layer segmentation will be used
   to support end-to-end reliable transport.  One option to support bulk
   data transfer over a frame-size-constrained LLN is to set the Maximum
   Segment Size to fit within the link maximum frame size.  Doing so,
   however, can add significant header overhead to each 802.15.4 frame.
   This causes the end-to-end transport to be intimately aware of the
   delivery properties of the underlying LLN, which is a layer
   violation.

   An alternative mechanism uses 6LoWPAN fragmentation in
   addition to transport or application-layer segmentation.  Increasing
   the Maximum Segment Size reduces header overhead by the end-to-end
   transport protocol.  It also encourages the transport protocol to
   reduce the number of outstanding datagrams, ideally to a single
   datagram, thus reducing the need to support out-of-order delivery
   common to LLNs.

   [RFC4944] defines a datagram fragmentation mechanism for LLNs.
   However, because [RFC4944] does not define a mechanism for recovering
   fragments that are lost, datagram forwarding fails if even one
   fragment is not delivered properly to the next IP hop.  End-to-end
   transport mechanisms will require retransmission of all fragments,
   wasting resources in an already resource-constrained network.

   Past experience with fragmentation has shown that mis-associated or
   lost fragments can lead to poor network behavior and, eventually,
   trouble at application layer.  The reader is encouraged to read
   [RFC4963] and follow the references for more information.  That
   experience led to the definition of the Path MTU discovery [RFC1191]
   protocol that limits fragmentation over the Internet.

   For one-hop communications, a number of media propose a local
   acknowledgment mechanism that is enough to protect the fragments.  In
   a multihop environment, an end-to-end fragment recovery mechanism
   might be a good complement to a hop-by-hop MAC level recovery.  This
   draft introduces a simple protocol to recover individual fragments
   between 6LoWPAN endpoints.  Specifically in the case of UDP, valuable
   additional information can be found in UDP Usage Guidelines for
   Application Designers [RFC5405].

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Readers are expected to be familiar with all the terms and concepts
   that are discussed in "IPv6 over Low-Power Wireless Personal Area
   Networks (6LoWPANs): Overview, Assumptions, Problem Statement, and
   Goals" [RFC4919] and "Transmission of IPv6 Packets over IEEE 802.15.4
   Networks" [RFC4944].

   ERP

      Error Recovery Procedure.

   6LoWPAN endpoints

      The LLN nodes in charge of generating or expanding a 6LoWPAN
      header from/to a full IPv6 packet.  The 6LoWPAN endpoints are the
      points where fragmentation and reassembly take place.
TW> should we add a sentence stating that one of those points is NOT
necessarily the LBR?

3.  Rationale

   There are a number of use cases for large packets in Wireless Sensor
   Networks.  Such usages may not be the most typical or represent the
   largest amount of traffic over the LLN; however, the associated
   functionality can be critical enough to justify extra care for
   ensuring effective transport of large packets across the LLN.

   The list of those use cases includes:

   Towards the LLN node:

      Packages of Commands:  A number of commands or a full
         configuration can be packaged as a single message to ensure
         consistency and enable atomic execution or complete rollback.
         Until such commands are fully received and interpreted, the
         intended operation will not take effect.

      Firmware update:  For example, a new version of the LLN node
         software is downloaded from a system manager over unicast or
         multicast services.  Such a reflashing operation typically
         involves updating a large number of similar LLN nodes over a
         relatively short period of time.

   From the LLN node:

      Waveform captures:  A number of consecutive samples are measured
         at a high rate for a short time and then transferred from a
         sensor to a gateway or an edge server as a single large report.

      Data logs:  LLN nodes may generate large logs of sampled data for
         later extraction.  LLN nodes may also generate system logs to
         assist in diagnosing problems on the node or network.





      Large data packets:  Rich data types might require more than one
         fragment.

   Uncontrolled firmware download or waveform upload can easily result
   in a massive increase of the traffic and saturate the network.

   When a fragment is lost in transmission, all fragments are resent,
   further contributing to the congestion that caused the initial loss,
   and potentially leading to congestion collapse.

   This saturation may lead to excessive radio interference, or random
   early discard (leaky bucket) in relaying nodes.  Additional queuing
   and memory congestion may result while waiting for a low power next
   hop to emerge from its sleeping state.

   To demonstrate the severity of the problem, consider a fairly
   reliable 802.15.4 frame delivery rate of 99.9% over a single 802.15.4
   hop.  The expected delivery rate of a 5-fragment datagram would be
   about 99.5% over a single 802.15.4 hop.  However, the expected
   delivery rate would drop to 95.1% over 10 hops, a reasonable network
   diameter for LLN applications.  The expected delivery rate for a
   1280-byte datagram is 98.4% over a single hop and 85.2% over 10 hops.
TW> I don't think this paragraph makes much sense.
TW> You would have link-layer ACKs and retries to ensure that all fragments
reach the next hop.
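
The figures in the paragraph above can be reproduced with a short
calculation. The following is only a sketch under stated assumptions:
frame losses are independent, every fragment must cross every hop, and
the function name and the max_tx parameter (number of link-layer
transmission attempts per frame) are hypothetical, not part of the draft.
With max_tx=1 (no link-layer retries) it gives the draft's numbers; the
98.4% and 85.2% values match a datagram of 16 frames. Raising max_tx
illustrates the reviewer's point that link-layer retries change the
picture considerably.

```python
def delivery_rate(frame_success, fragments, hops, max_tx=1):
    """Probability that every fragment of a datagram crosses every hop.

    frame_success: probability that one transmission attempt succeeds.
    max_tx: assumed number of link-layer attempts per frame (1 = no retry).
    Losses are assumed to be independent of one another.
    """
    # A frame is lost on a hop only if all max_tx attempts fail.
    per_hop = 1.0 - (1.0 - frame_success) ** max_tx
    # All fragments must succeed on all hops.
    return per_hop ** (fragments * hops)


for fragments in (5, 16):
    for hops in (1, 10):
        rate = delivery_rate(0.999, fragments, hops)
        print(f"{fragments} fragments, {hops} hop(s): {rate:.1%}")
```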

   Considering that [RFC4944] defines an MTU of 1280 bytes and that in
   most incarnations (except 802.15.4g) an 802.15.4 frame can limit the MAC
   payload to as few as 74 bytes, a packet might be fragmented into at
   least 18 fragments by 6LoWPAN.  Taking into account
   the worst-case header overhead for 6LoWPAN Fragmentation and Mesh
   Addressing headers will increase the number of required fragments to
   around 32.  This level of fragmentation is much higher than that
   traditionally experienced over the Internet with IPv4 fragments.  At
   the same time, the use of radios increases the probability of
   transmission loss and Mesh-Under techniques compound that risk over
   multiple hops.
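
The fragment counts quoted above follow from simple arithmetic, sketched
below. The function name is hypothetical. The 40-byte figure is purely an
illustrative assumption about how much payload is left per frame once the
worst-case headers are taken out; it is not a value taken from the draft.

```python
import math


def fragments_needed(datagram_bytes, payload_per_frame):
    """Number of frames needed to carry a datagram of the given size."""
    return math.ceil(datagram_bytes / payload_per_frame)


# 1280-byte MTU over frames that carry 74 bytes each.
print(fragments_needed(1280, 74))

# Assumed 40 usable bytes per frame after worst-case header overhead.
print(fragments_needed(1280, 40))

# Upper bound on a datagram carried in 32 fragments of 74 bytes each.
print(32 * 74)
```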

4.  Requirements

   This specification proposes a method to recover individual fragments
   between LLN endpoints.
TW> add that endpoints can be multiple hops away from one another?
   The method is designed to fit the following
   requirements of an LLN (with or without a Mesh-Under routing
   protocol):

   Number of fragments

      The recovery mechanism must support highly fragmented packets,
      with a maximum of 32 fragments per packet.
TW> resulting length of the packet? The abstract implies a 32*74=2368 byte
limit. Is that correct?

   Minimum acknowledgment overhead



      Because the radio is half duplex, and because of silent time spent
      in the various medium access mechanisms, an acknowledgment
      consumes roughly as many resources as a data fragment.

      The recovery mechanism should be able to acknowledge multiple
      fragments in a single message and not require an acknowledgment at
      all if fragments are already protected at a lower layer.

TW> maybe specify here that we are talking about end-to-end ACKs between
endpoints, not L2 ACKs

   Controlled latency

      The recovery mechanism must succeed or give up within the time
      boundary imposed by the recovery process of the Upper Layer
      Protocols.

   Support for out-of-order fragment delivery

      A Mesh-Under load balancing mechanism such as the ISA100 Data Link
      Layer can introduce out-of-sequence packets.
TW> I don't fully understand why we focus so much on mesh-under.
TW> If you have a fully mesh-under network, 4944 fragmentation results in
fragment forwarding.
TW> I would remove the repeated mentions of mesh-under (as well as ISA100)

      The recovery mechanism must account for packets that appear lost
      but are actually only delayed over a different path.

   Optional congestion control

      The aggregation of multiple concurrent flows may lead to the
      saturation of the radio network and congestion collapse.

      The recovery mechanism should provide means for controlling the
      number of fragments in transit over the LLN.

5.  Overview

   Considering that a multi-hop LLN can be a very sensitive environment
   due to the limited queuing capabilities of a large population of its
   nodes, this specification recommends a simple and conservative
   approach to congestion control, based on TCP congestion avoidance.
TW> "based on" or "inspired by"

TW> The "overview" section delves immediately into subtle discussions about
congestion control.
TW> At this point, the reader (e.g. me) has no idea what the basic
principle is.
TW> Discussions about subtleties such as congestion control seem out of
place here.

   Congestion on the forward path is assumed in case of packet loss, and
   packet loss is assumed upon timeout.  This specification allows the
   sender to control the number of outstanding fragments, i.e., fragments
   that have been transmitted but for which an acknowledgment has not
   been received yet.  It must be noted
   that the number of outstanding fragments should not exceed the number
   of hops in the network, but the way to figure the number of hops is
   out of scope for this document.

   Congestion on the forward path can also be indicated by an Explicit
   Congestion Notification (ECN) mechanism.  Though whether and how ECN
   [RFC3168] is carried out over the LoWPAN is out of scope, this draft
   provides a way for the destination endpoint to echo an ECN indication
   back to the source endpoint in an acknowledgment message as
   represented in Figure 5 in Section 6.2.

TW> The following paragraph talks about how a link-layer transmits multiple
frames in a row.
TW> The link-layer does so whether those frames are fragments or regular
packets.
TW> I would remove this discussion, which seems to be link-layer only, and
not specific to fragment forwarding.

   It must be noted that congestion and collision are different topics.
   In particular, when a mesh operates on the same channel over multiple
   hops, the forwarding of a fragment over a certain hop may
   collide with the forwarding of the next fragment that is following over
   a previous hop but in the same interference domain.  This draft enables
   end-to-end flow control, but leaves it to the sender stack to pace
   individual fragments within a transmit window, so that a given
   fragment is sent only when the previous fragment has had a chance to
   progress beyond the interference domain of this hop.  In the case of
   6TiSCH [I-D.ietf-6tisch-architecture], which operates over the
   TimeSlotted Channel Hopping [I-D.ietf-6tisch-tsch] (TSCH) mode of
   operation of IEEE 802.15.4, a fragment is forwarded over a different
   channel at a different time, and it makes full sense to send the next
   fragment as soon as the previous fragment has had its chance to be
   forwarded at the next hop, retry (ARQ) operations included.

   From the standpoint of a source 6LoWPAN endpoint, an outstanding
   fragment is a fragment that was sent but for which no explicit
   acknowledgment was received yet.  This means that the fragment might
   be on the way, received but not yet acknowledged, or the
   acknowledgment might be on the way back.  It is also possible that
   either the fragment or the acknowledgment was lost on the way.

   Because a meshed LLN might deliver frames out of order, it is
   virtually impossible to differentiate these situations.  In other
   words, from the sender's standpoint, all outstanding fragments might
   still be in the network and contribute to its congestion.  There is
   an assumption, though, that after a certain amount of time, a frame
   is either received or lost, so it is not causing congestion anymore.
   This amount of time can be estimated based on the round trip delay
   between the 6LoWPAN endpoints.  The method detailed in [RFC6298] is
   recommended for that computation.
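
For readers unfamiliar with [RFC6298], the computation it describes can
be sketched as follows. This is a minimal illustration, not the draft's
normative procedure: the class and parameter names are hypothetical, the
constants (alpha = 1/8, beta = 1/4, K = 4, a 1-second floor and a 1-second
initial value) are the ones [RFC6298] specifies for TCP, and the clock
granularity default is an assumption. Whether the same floor is
appropriate between 6LoWPAN endpoints is left open here.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the RFC 6298 formulas."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, clock_granularity=0.1, min_rto=1.0, initial_rto=1.0):
        self.clock_granularity = clock_granularity  # assumed, in seconds
        self.min_rto = min_rto
        self.srtt = None     # smoothed round trip time
        self.rttvar = None   # round trip time variation
        self.rto = initial_rto

    def update(self, rtt_sample):
        """Feed one round trip measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt_sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        candidate = self.srtt + max(self.clock_granularity,
                                    self.K * self.rttvar)
        self.rto = max(self.min_rto, candidate)
        return self.rto

    def on_timeout(self):
        """Back off the timer after an expiry (doubling, as in RFC 6298)."""
        self.rto *= 2
        return self.rto
```

A sender would call update() each time an acknowledgment yields a fresh
round trip measurement and on_timeout() when the timer expires without
one.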

   The reader is encouraged to read through "Congestion Control
   Principles" [RFC2914].  Additionally [RFC2309] and [RFC5681] provide
   deeper information on why this mechanism is needed and how TCP
   handles Congestion Control.  Basically, the goal here is to manage
   the number of fragments present in the network; this is achieved by
   reducing the number of outstanding fragments over a congested path
   by throttling the sources.

   Section 7 describes how the sender decides how many fragments are
   (re)sent before an acknowledgment is required, and how the sender
   adapts that number to the network conditions.



6.  New Dispatch types and headers

   This specification extends "Transmission of IPv6 Packets over IEEE
   802.15.4 Networks" [RFC4944] with 4 new dispatch types, for
   Recoverable Fragments (RFRAG) headers with or without Acknowledgment
   Request, and for the Acknowledgment back, with or without ECN Echo.

            Pattern    Header Type
          +------------+-----------------------------------------------+
          | 11  101000 | RFRAG      - Recoverable Fragment             |
          | 11  101001 | RFRAG-AR   - RFRAG with Ack Request           |
          | 11  101010 | RFRAG-ACK  - RFRAG Acknowledgment             |
          | 11  101011 | RFRAG-AEC  - RFRAG Ack with ECN Echo          |
          +------------+-----------------------------------------------+

             Figure 1: Additional Dispatch Value Bit Patterns

   In the following sections, the semantics of "datagram_tag",
   "datagram_offset" and "datagram_size" and the reassembly process are
   changed from [RFC4944] Section 5.3, "Fragmentation Type and Header".
   The size and offset are expressed on the compressed packet per
   [RFC6282] as opposed to the uncompressed - native packet - form.

6.1.  Recoverable Fragment Dispatch type and Header

                            1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |1 1 1 0 1 0 0 X|datagram_offset|         datagram_tag          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Sequence |    datagram_size    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                  X set == Ack Requested

          Figure 2: Recoverable Fragment Dispatch type and Header

   X: 1 bit; When set, the sender requires an Acknowledgment from the
      receiver

   Sequence:  5 bits; The sequence number of the fragment.  Fragments
      are numbered [0..N] where N is in [0..31].
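
As a reading aid for Figure 2, the 6-octet header can be packed and
parsed as sketched below. This is an illustration only: the function
names are hypothetical, the field widths (8-bit datagram_offset, 16-bit
datagram_tag, 5-bit Sequence, 11-bit datagram_size) are read off the
figure, network byte order is assumed, and datagram_offset is treated as
an opaque 8-bit value because its unit is not restated here.

```python
import struct

RFRAG_DISPATCH = 0b11101000  # pattern 1110100X, X is the Ack Requested bit


def pack_rfrag(ack_requested, datagram_offset, datagram_tag,
               sequence, datagram_size):
    """Build the 6-octet recoverable fragment header of Figure 2."""
    if not 0 <= sequence <= 31:
        raise ValueError("sequence must fit in 5 bits")
    if not 0 <= datagram_size < 2 ** 11:
        raise ValueError("datagram_size must fit in 11 bits")
    dispatch = RFRAG_DISPATCH | (1 if ack_requested else 0)
    # Sequence occupies the top 5 bits, datagram_size the low 11 bits.
    seq_and_size = (sequence << 11) | datagram_size
    return struct.pack("!BBHH", dispatch, datagram_offset,
                       datagram_tag, seq_and_size)


def unpack_rfrag(header):
    """Parse the first 6 octets of a recoverable fragment."""
    dispatch, offset, tag, seq_and_size = struct.unpack("!BBHH", header[:6])
    if dispatch & 0xFE != RFRAG_DISPATCH:
        raise ValueError("not an RFRAG or RFRAG-AR dispatch")
    return {
        "ack_requested": bool(dispatch & 0x01),
        "datagram_offset": offset,
        "datagram_tag": tag,
        "sequence": seq_and_size >> 11,
        "datagram_size": seq_and_size & 0x07FF,
    }
```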

6.2.  Fragment acknowledgment Dispatch type and Header

   The specification also defines a 4-octet acknowledgment bitmap that
   is used to carry selective acknowledgments for the received
   fragments.  A given offset in the bitmap maps one to one with a given
   sequence number.



   The offset of the bit in the bitmap indicates which fragment is
   acknowledged as follows:

                            1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           Acknowledgment Bitmap                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        ^                   ^
        |                   |    bitmap indicating whether:
        |                   +--- Fragment with sequence 10 was received
        +----------------------- Fragment with sequence 00 was received

                 Figure 3: Acknowledgment bitmap encoding

TW> rewrote the paragraph below

   Figure 4 shows an example Acknowledgment bitmap which indicates that
   all fragments from sequence 0 to 20 were received, except for
   fragments 1, 2 and 16 that were either lost or are still in the
   network over a slower path.

                            1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |1 0 0 1 1 1 1 1|1 1 1 1 1 1 1 1|0 1 1 1 1 0 0 0|0 0 0 0 0 0 0 0|
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 4: Example acknowledgment bitmap
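
The mapping between sequence numbers and bitmap offsets can be checked
with a few lines of code. The sketch below assumes that offset 0 is the
leftmost (most significant) bit of the 32-bit word, as drawn in Figures 3
and 4; the function names are hypothetical. Under that assumption the
bitmap of Figure 4 is the value 0x9FFF7800.

```python
def encode_ack_bitmap(received):
    """Return the 32-bit bitmap for an iterable of received sequences."""
    bitmap = 0
    for seq in received:
        if not 0 <= seq <= 31:
            raise ValueError("sequence must be in [0..31]")
        # Offset 0 is the most significant bit of the word.
        bitmap |= 1 << (31 - seq)
    return bitmap


def decode_ack_bitmap(bitmap):
    """Return the sorted list of sequences flagged as received."""
    return [seq for seq in range(32) if bitmap & (1 << (31 - seq))]


# Figure 4: fragments 0 to 20 received, except 1, 2 and 16.
received = [seq for seq in range(21) if seq not in (1, 2, 16)]
print(hex(encode_ack_bitmap(received)))
```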

   The acknowledgment bitmap is carried in a Fragment Acknowledgment as
   follows:

                            1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       |1 1 1 0 1 0 1 Y|         datagram_tag          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          Acknowledgment Bitmap (32 bits)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 5: Fragment Acknowledgment Dispatch type and Header

   Y: 1 bit; Explicit Congestion Notification (ECN) signaling

      When set, the sender indicates that at least one of the
      acknowledged fragments was received with an Explicit Congestion
      Notification, indicating that the path followed by the fragments
      is subject to congestion.

   Acknowledgment Bitmap



      An acknowledgment bitmap, whereby the bit at offset x indicates that
      fragment x was received.
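
A sketch of how the 7-octet Fragment Acknowledgment of Figure 5 could be
serialized is given below. The function names are hypothetical, and
network byte order is assumed for datagram_tag and for the bitmap.

```python
import struct

RFRAG_ACK_DISPATCH = 0b11101010  # pattern 1110101Y, Y is the ECN echo bit


def pack_rfrag_ack(datagram_tag, bitmap, ecn=False):
    """Build the 7-octet acknowledgment: dispatch, tag, 32-bit bitmap."""
    if not 0 <= bitmap <= 0xFFFFFFFF:
        raise ValueError("bitmap must fit in 32 bits")
    dispatch = RFRAG_ACK_DISPATCH | (1 if ecn else 0)
    return struct.pack("!BHI", dispatch, datagram_tag, bitmap)


def unpack_rfrag_ack(message):
    """Parse the first 7 octets of a Fragment Acknowledgment."""
    dispatch, tag, bitmap = struct.unpack("!BHI", message[:7])
    if dispatch & 0xFE != RFRAG_ACK_DISPATCH:
        raise ValueError("not an RFRAG-ACK or RFRAG-AEC dispatch")
    return {"ecn": bool(dispatch & 0x01),
            "datagram_tag": tag,
            "bitmap": bitmap}
```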

7.  Fragments Recovery

   The Recoverable Fragments header RFRAG and RFRAG-AR deprecate the
   original fragment headers from [RFC4944] and replace them in the
   fragmented packets.
TW> are we really deprecating 4944 fragmentation, or adding another option?
   The Fragment Acknowledgment RFRAG-ACK is
   introduced as a standalone header in a message that is sent back to the
   fragment source endpoint as known by its MAC address.  This assumes
   that the source MAC address in the fragment (if any) and datagram_tag
   are enough information to send the Fragment Acknowledgment back to
   the source fragmentation endpoint.
TW> I get confused. Isn't the source MAC address changed at each hop?

   The 6LoWPAN endpoint that fragments the packets at the 6LoWPAN level (the
   sender) controls the Fragment Acknowledgments.  It may do that at any
   fragment to implement its own policy or perform congestion control
   which is out of scope for this document.
TW> I don't understand the previous sentence.
   When the sender of the
   fragment knows that an underlying mechanism protects
TW> "protects" -> "ensures delivery of"?
   the Fragments
   already it MAY refrain from using the Acknowledgment mechanism, and
   never set the ACK Requested bit.  The 6LoWPAN endpoint that
   recomposes the packets at 6LoWPAN level (the receiver) MUST
   acknowledge the fragments it has received when asked to, and MAY
   slightly defer that acknowledgment.

   The sender transfers a controlled number of fragments and MAY flag
   the last fragment of a series with an acknowledgment request.  The
   receiver MUST acknowledge a fragment with the acknowledgment request
   bit set.
TW> reword to "the receiver which receives a fragment with the ACK Request
bit set MUST send back an acknowledgment"?
   If any fragment immediately preceding an acknowledgment
   request is still missing, the receiver MAY intentionally delay its
   acknowledgment to allow in-transit fragments to arrive. Delaying the
   acknowledgment might defeat the round trip delay computation so it
   should be configurable and not enabled by default.

   The receiver interacts with the sender using an Acknowledgment
   message with a bitmap that indicates which fragments were actually
   received.  The bitmap is a 32 bit SWORD
TW> why SWORD?
   , which accommodates up to 32
   fragments and is sufficient for the 6LoWPAN MTU.  For all n in
   [0..31], bit n is set to 1 in the bitmap to indicate that fragment
   with sequence n was received, otherwise the bit is set to 0.  All
   zeros is a NULL bitmap that indicates that the fragmentation process
   was canceled by the receiver for that datagram.
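
As an illustration only (not part of the draft), the bitmap handling
described above can be sketched in Python. The helper names are
hypothetical, and mapping fragment n to the n-th bit counted from the
most significant end is an assumption: the draft says only that bit n
reports fragment n.

```python
BITMAP_BITS = 32
NULL_BITMAP = 0  # all zeros: the receiver canceled this datagram


def _mask(seq):
    """Bit mask for fragment `seq` (assumed: bit 0 is the leftmost bit)."""
    if not 0 <= seq < BITMAP_BITS:
        raise ValueError("sequence must be in [0..31]")
    return 1 << (BITMAP_BITS - 1 - seq)


def mark_received(bitmap, seq):
    """Return the bitmap with the bit for fragment `seq` set to 1."""
    return bitmap | _mask(seq)


def is_received(bitmap, seq):
    """True if the bitmap reports fragment `seq` as received."""
    return bool(bitmap & _mask(seq))


def missing(bitmap, count):
    """Sequences in [0..count-1] that the bitmap does not acknowledge."""
    return [n for n in range(count) if not is_received(bitmap, n)]
```

For example, a receiver that got fragments 0, 1 and 3 out of 5 would
report a bitmap from which the sender derives that 2 and 4 need a retry.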

   The receiver MAY issue unsolicited acknowledgments.  An unsolicited
   acknowledgment enables the sender endpoint to resume sending if it
   had reached its maximum number of outstanding fragments or indicate
   that the receiver has canceled the process of an individual
   datagram.  Note that acknowledgments might consume precious resources
   so the use of unsolicited acknowledgments should be configurable and
   not enabled by default.

   The sender arms a retry timer to cover the fragment that carries the
   Acknowledgment request.  Upon time out, the sender assumes that all
   the fragments on the way are received or lost.  The process must have
   completed within an acceptable time that is within the boundaries of
   upper layer retries.  The method detailed in [RFC6298] is recommended
   for the computation of the retry timer.  It is expected that the
   upper layer retries obey the same or friendly rules in which case a
   single round of fragment recovery should fit within the upper layer
   recovery timers.
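
A minimal sketch of the [RFC6298] computation that the draft recommends
for the retry timer, for illustration only. The class name, the clock
granularity and the use of seconds are assumptions; the constants
(alpha = 1/8, beta = 1/4, K = 4, a 1 second initial and minimum RTO,
doubling on timeout) come from RFC 6298 and may need tuning for an LLN.

```python
class RetryTimer:
    ALPHA = 1 / 8   # SRTT gain
    BETA = 1 / 4    # RTTVAR gain
    K = 4

    def __init__(self, granularity=0.01, min_rto=1.0, initial_rto=1.0):
        self.granularity = granularity  # assumed clock granularity G
        self.min_rto = min_rto
        self.srtt = None                # smoothed round-trip time
        self.rttvar = None              # round-trip time variation
        self.rto = initial_rto

    def on_rtt_sample(self, r):
        """Update the timer from one round-trip measurement `r`."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = max(self.min_rto, rto)
        return self.rto

    def on_timeout(self):
        """Back off the timer after the retry timer expires."""
        self.rto *= 2
        return self.rto
```

Only round trips that were not ambiguous because of a retry should be
fed to on_rtt_sample (Karn's algorithm, as RFC 6298 requires), which is
also why a deferred acknowledgment distorts the measurement.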

   Fragments are sent in a round robin fashion: the sender sends all the
   fragments for a first time before it retries any lost fragment; lost
   fragments are retried in sequence, oldest first.  This mechanism
   enables the receiver to acknowledge fragments that were delayed in
   the network before they are actually retried.

   When the sender decides that a packet should be dropped and the
   fragmentation process canceled, it sends a pseudo fragment with the
   datagram_offset, sequence and datagram_size all set to zero, and no
   data.  Upon reception of this message, the receiver should clean up
   all resources for the packet associated to the datagram_tag.  If an
   acknowledgment is requested, the receiver responds with a NULL
   bitmap.

   The receiver might need to cancel the process of a fragmented packet
   for internal reasons, for instance if it is out of recomposition
   buffers, or considers that this packet is already fully recomposed
   and passed to the upper layer.  In that case, the receiver SHOULD
   indicate so to the sender with a NULL bitmap.  Upon an acknowledgment
   with a NULL bitmap, the sender MUST drop the datagram.

8.  Forwarding Fragments

   This specification enables intermediate routers to forward fragments
   with no intermediate reconstruction of the entire packet.  Upon the
   first fragment, the routers lay a label along the path that is
   followed by that fragment (that is IP routed), and all further
   fragments are label switched along that path.  As a consequence,
   alternate routes are not possible for individual fragments.  The
   datagram_tag is used to carry the label, that is swapped at each hop.

8.1.  Upon the first fragment

   In route over the L2 source changes at each hop.
TW> something's wrong with the previous sentence
   The label that is
   formed and placed in the datagram_tag is associated to the source MAC
   and only valid (and unique) for that source MAC.  Say the first
   fragment has:

      Source IPv6 address = IP_A (may be multiple hops away)

      Destination IPv6 address = IP_B (may be multiple hops away)

      Source MAC = MAC_prv (prv as previous)

      Datagram_tag= DT_prv

   The intermediate router that forwards individual fragments does the
   following:

      a route lookup to get Next hop IPv6 towards IP_B, which resolves
      as IP_nxt (nxt as next)

      a MAC address resolution to get the MAC address associated to
      IP_nxt, which resolves as MAC_nxt

   Since it is a first fragment of a packet from that source MAC address
   MAC_prv for that tag DT_prv, the router:

      cleans up any leftover resource associated to the tuple (MAC_prv,
      DT_prv)

      allocates a new label for that flow, DT_nxt, from a Least Recently
      Used pool or some similar procedure.

      allocates a Label swap structure indexed by (MAC_prv, DT_prv) that
      contains (MAC_nxt, DT_nxt)

      allocates a Label swap structure indexed by (MAC_nxt, DT_nxt) that
      contains (MAC_prv, DT_prv)

      swaps the MAC info to from self to MAC_nxt

      swaps the datagram_tag to DT_nxt

   At this point the router is all set and can forward the packet to
   nxt.
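
The first-fragment steps listed above can be sketched as follows, for
illustration only. The class and method names are hypothetical, the
route lookup and MAC resolution are assumed to have produced mac_nxt
already, and a wrapping 16-bit counter stands in for the "Least Recently
Used pool" (a real allocator would also avoid tags still in use).

```python
import itertools


class LabelSwapTable:
    """Per-router label-swap state, keyed by (MAC address, datagram_tag)."""

    def __init__(self):
        self.forward = {}   # (MAC_prv, DT_prv) -> (MAC_nxt, DT_nxt)
        self.reverse = {}   # (MAC_nxt, DT_nxt) -> (MAC_prv, DT_prv)
        self._counter = itertools.count(1)

    def _allocate_tag(self):
        # Assumption: simple counter, truncated to an assumed 16-bit tag.
        return next(self._counter) & 0xFFFF

    def on_first_fragment(self, mac_prv, dt_prv, mac_nxt):
        """Install both swap entries and return (MAC_nxt, DT_nxt)."""
        # Clean up any leftover state for (MAC_prv, DT_prv).
        stale = self.forward.pop((mac_prv, dt_prv), None)
        if stale is not None:
            self.reverse.pop(stale, None)
        dt_nxt = self._allocate_tag()
        # One entry for fragments going forward, one for ACKs coming back.
        self.forward[(mac_prv, dt_prv)] = (mac_nxt, dt_nxt)
        self.reverse[(mac_nxt, dt_nxt)] = (mac_prv, dt_prv)
        return mac_nxt, dt_nxt
```

Keeping two separate maps is a design choice of this sketch; it avoids
a key clash when the same neighbor appears as both prv and nxt.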

TW> this assumes the first fragment is long enough to contain the
source/destination IP addresses, right?

8.2.  Upon the next fragments

   Upon next fragments (that are not first fragment), the router expects
   to already have a Label swap structure indexed by (MAC_prv, DT_prv).
   The router:

      looks up the Label swap entry for (MAC_prv, DT_prv), which
      resolves as (MAC_nxt, DT_nxt)

      swaps the MAC info to from self to MAC_nxt;

      swaps the datagram_tag to DT_nxt

   At this point the router is all set and can forward the packet to
   nxt.

   If the Label swap entry for (MAC_prv, DT_prv) is not found, the
   router builds an RFRAG-ACK to indicate the error.  The acknowledgment
   message has the following information:

      MAC info set to from self to MAC_prv as found in the fragment
TW> "to from"?

      datagram_tag set to DT_prv

      Bitmap of all zeroes to indicate the error

   At this point the router is all set and can send the RFRAG-ACK back
   to the previous router.

8.3.  Upon the fragment acknowledgments

   Upon fragment acknowledgments next fragments
TW> reads strange
   (that are not first
   fragment), the router expects to have already Label swap structure
   indexed by (MAC_nxt, DT_nxt).  The router:

      looks up the Label swap entry for (MAC_nxt, DT_nxt), which
      resolves as (MAC_prv, DT_prv)

      swaps the MAC info to from self to MAC_prv;

      swaps the datagram_tag to DT_prv

   At this point the router is all set and can forward the RFRAG-ACK to
   prv.

   If the Label swap entry for (MAC_nxt, DT_nxt) is not found, it simply
   drops the packet.

   If the RFRAG-ACK indicates either an error or that the fragment was
   fully received, the router schedules the Label swap entries for
   recycling.  If the RFRAG-ACK is lost on the way back, the source may
   retry the last fragments, which will result in an error RFRAG-ACK
   from the first router on the way that has already cleaned up.
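
The handling of next fragments and of acknowledgments (Sections 8.2 and
8.3) can be sketched as below, for illustration only. The function
names and return shapes are hypothetical; `forward` and `reverse` are
plain dictionaries holding the two Label swap entries, the `complete`
flag is an assumed input telling the router that the bitmap reports full
reception, and immediate deletion stands in for "schedules ... for
recycling".

```python
NULL_BITMAP = 0  # all zeros: error or cancellation


def on_next_fragment(forward, mac_prv, dt_prv):
    """Return a forwarding decision for a fragment that is not the first."""
    hit = forward.get((mac_prv, dt_prv))
    if hit is None:
        # No state: send an error RFRAG-ACK back to the previous hop.
        return ("error_ack", mac_prv, dt_prv, NULL_BITMAP)
    mac_nxt, dt_nxt = hit
    return ("forward", mac_nxt, dt_nxt)


def on_ack(forward, reverse, mac_nxt, dt_nxt, bitmap, complete=False):
    """Return (MAC_prv, DT_prv) to relay the RFRAG-ACK to, or None to drop."""
    hit = reverse.get((mac_nxt, dt_nxt))
    if hit is None:
        return None  # unknown label: drop silently
    if bitmap == NULL_BITMAP or complete:
        # Error or full reception: release both entries.
        del reverse[(mac_nxt, dt_nxt)]
        forward.pop(hit, None)
    return hit
```

After the entries are released, a retried fragment for the same
(MAC_prv, DT_prv) takes the "error_ack" branch, which matches the
behavior the draft describes for a lost RFRAG-ACK.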

9.  Security Considerations

   The process of recovering fragments does not appear to create any
   opening for new threat compared to "Transmission of IPv6 Packets over
   IEEE 802.15.4 Networks" [RFC4944].

10.  IANA Considerations

   Need extensions for formats defined in "Transmission of IPv6 Packets
   over IEEE 802.15.4 Networks" [RFC4944].

11.  Acknowledgments

   The author wishes to thank Jay Werb, Christos Polyzois, Soumitri
   Kolavennu, Pat Kinney, Margaret Wasserman, Richard Kelsey, Carsten
   Bormann and Harry Courtice for their contributions and review.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC4944]  Montenegro, G., Kushalnagar, N., Hui, J., and D. Culler,
              "Transmission of IPv6 Packets over IEEE 802.15.4
              Networks", RFC 4944, DOI 10.17487/RFC4944, September 2007,
              <http://www.rfc-editor.org/info/rfc4944>.

   [RFC6282]  Hui, J., Ed. and P. Thubert, "Compression Format for IPv6
              Datagrams over IEEE 802.15.4-Based Networks", RFC 6282,
              DOI 10.17487/RFC6282, September 2011,
              <http://www.rfc-editor.org/info/rfc6282>.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011,
              <http://www.rfc-editor.org/info/rfc6298>.

12.2.  Informative References

   [I-D.ietf-6tisch-architecture]
              Thubert, P., "An Architecture for IPv6 over the TSCH mode
              of IEEE 802.15.4", draft-ietf-6tisch-architecture-10 (work
              in progress), June 2016.

   [I-D.ietf-6tisch-tsch]
              Watteyne, T., Palattella, M., and L. Grieco, "Using
              IEEE802.15.4e TSCH in an IoT context: Overview, Problem
              Statement and Goals", draft-ietf-6tisch-tsch-06 (work in
              progress), March 2015.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990,
              <http://www.rfc-editor.org/info/rfc1191>.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
              <http://www.rfc-editor.org/info/rfc2309>.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
              RFC 2914, DOI 10.17487/RFC2914, September 2000,
              <http://www.rfc-editor.org/info/rfc2914>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <http://www.rfc-editor.org/info/rfc3168>.

   [RFC4919]  Kushalnagar, N., Montenegro, G., and C. Schumacher, "IPv6
              over Low-Power Wireless Personal Area Networks (6LoWPANs):
              Overview, Assumptions, Problem Statement, and Goals",
              RFC 4919, DOI 10.17487/RFC4919, August 2007,
              <http://www.rfc-editor.org/info/rfc4919>.

   [RFC4963]  Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly
              Errors at High Data Rates", RFC 4963,
              DOI 10.17487/RFC4963, July 2007,
              <http://www.rfc-editor.org/info/rfc4963>.

   [RFC5405]  Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines
              for Application Designers", BCP 145, RFC 5405,
              DOI 10.17487/RFC5405, November 2008,
              <http://www.rfc-editor.org/info/rfc5405>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <http://www.rfc-editor.org/info/rfc5681>.

Authors' Addresses

   Pascal Thubert (editor)
   Cisco Systems, Inc
   Building D
   45 Allee des Ormes - BP1200
   MOUGINS - Sophia Antipolis  06254
   FRANCE

   Phone: +33 497 23 26 34
   Email: pthubert@cisco.com


   Jonathan W. Hui
   Nest Labs
   3400 Hillview Ave
   Palo Alto, California  94304
   USA

   Email: jonhui@nestlabs.com

-- 
_______________________________________

Thomas Watteyne, PhD
Research Scientist & Innovator, Inria
Sr Networking Design Eng, Linear Tech
Founder & co-lead, UC Berkeley OpenWSN
Co-chair, IETF 6TiSCH

www.thomaswatteyne.com
_______________________________________