Internet-Draft ipv6-qos-exthdr March 2024
Eckert, et al. Expires 5 September 2024 [Page]
Intended Status:
T. Eckert
Futurewei Technologies USA
J. Joung
Sangmyung University
S. Peng
ZTE Corp.
X. Geng

Considerations for common QoS IPv6 extension header(s)


This document is written to start a discussion and to collect opinions and answers to the questions it raises on defining IPv6 extension headers for DETNET-WG functionality with IPv6.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 5 September 2024.

1. Introduction

1.1. Process and innovation problem and proposal

DetNet has defined, or is considering, several functions that would require IPv6 extension headers if they were to be supported with IPv6 without an additional hop-by-hop or tunneling mechanism. Due to the absence of such headers, DetNet has so far been developing variations of transporting IPv6 over MPLS, because in MPLS some of this functionality was already defined by DetNet.

For the problem of hop-by-hop bounded latency guarantees, especially in large-scale networks such as Service-Provider or private metropolitan aggregation networks, various competing proposals are being made. Agreeing on only one or very few of these proposals will be a hard competitive decision, and it does not support the proven IETF model of allowing new technology to be productized, letting the market decide, and then, after sufficient experience, discussing further standardization of what was successful.

The main obstacle to gaining experience is the overhead that would exist if each of these proposals were to ask for an IPv6 extension header to embody its functionality. This amounts to a chicken-and-egg problem: 6MAN would very likely not want to spend time on multiple extension headers for different proposals that have not yet achieved enough adoption experience to qualify for standards track.

This problem also extends to per-hop QoS functionality beyond DetNet, such as novel congestion control mechanisms that were already presented to the IETF and did not progress due to the high overhead of getting extension headers, or possible new work, including from research (IRTF or other), that does not even dare to attempt an IPv6 solution due to the process overhead of defining IPv6 extension headers.

This document therefore proposes to cut through this chicken-and-egg problem with a single but extensible (set of) IPv6 extension header(s) built to support multiple different end-to-end and hop-by-hop QoS functions.

The goal would be to ultimately arrive at a 6MAN standards track document that defines all the encoding and allocation aspects under the purview of 6MAN, so that by using those extension header(s), technical groups who are experts on specific functionality can then create specifications defining multiple alternative options for QoS packet processing that leverage the common header. These specifications can range from industry-specific ones with public documentation, through informational IETF RFCs, to experimental or standards track RFCs for those methods.

To rephrase the above at the packet level: the goal is to produce a common packet header that can be thought of as a larger variation of the IPv4 TOS / IPv6 Traffic Class field (which over time evolved into the ECN and DSCP sub-fields), and to allow those different specifications to define the encoding and the end-to-end and per-hop processing options by subdividing that packet header space into multiple metadata fields used in processing.

As with ECN and DSCP, the semantics of the processing are no longer the purview of 6MAN, but of groups that are expert in the intended processing.

One of the tasks needed to clearly demarcate the responsibilities of 6MAN and DetNet is to define, in such a common packet header specification, the permitted limits of what the processing may do: for example, it must not impact routing, but only per-hop packet scheduling, admission or congestion based functions, or end-to-end resiliency functions derived from DetNet's PREOF (described below). Likewise, some definition of the appropriate type of metadata will be required, so that it does not violate privacy but only describes the characteristics of the traffic that the processing requires.

While the core initial driver for this discussion is DetNet QoS functions, this work should be equally applicable to non-DetNet QoS functions, such as congestion control algorithms that need more metadata than DSCP and ECN can carry. To this end, two examples are included: DPS and LBF.

1.2. Technical Problem (DetNet)

DetNet supports today, or plans to support through additional work, functions that require packet header "metadata" elements, and not all of those elements are supported in IETF standards for existing network layers.

1.2.1. Background

The following attempts to summarize DetNet functionality as relevant to the discussions in this document.

The DetNet architecture [RFC8655] specifies that "DetNet operates at the IP layer and delivers service over lower-layer technologies such as MPLS and Time-Sensitive Networking (TSN) as defined by IEEE 802.1". DetNet forwarding has two sub-layers, the forwarding sub-layer and the service sub-layer.

Nodes that participate only in the forwarding sub-layer, but not in the service sub-layer, are called DetNet transit nodes. Transit nodes operate some hop-by-hop forwarding protocol, such as IP, IPv6, MPLS, 802.1 (Ethernet switching with TSN services), or another protocol that is not explicitly mentioned in the DetNet architecture, such as BIER [RFC8279]. Note that DetNet supports unicast and multicast.

The DetNet forwarding sub-layer provides resource allocation and explicit routing to DetNet flows or their aggregates. Resource allocation means bandwidth reservation and buffer management to ensure no-loss forwarding, and when required for the traffic also per-hop scheduling to guarantee bounded latency of the forwarded DetNet traffic.

The DetNet service sub-layer provides (according to [RFC8655]) the Packet Replication Function (PRF), the Packet Elimination Function (PEF) and the Packet Ordering Function (POF). These functions are collectively called PREOF (Packet Replication Elimination and Ordering Functions). PREOF provides resilience against even individual packet loss by utilizing or inserting some sequence number in packets in the PRF and sending out two or more copies of the traffic to disjoint paths, which converge only in a DetNet service node performing the PEF and optionally the POF. These functions may occur on network nodes or on DetNet sender/receiver nodes. POF is optional because it implies packet buffering in the face of packet reordering, something which may be too complex to perform in a network node.

In DetNet MPLS data plane [RFC8964], DetNet/MPLS packets have one (or more) F(orwarding) labels through which the explicit path and resource reservation can happen and which can be set up by various MPLS control plane mechanisms, including, but not limited to RSVP-TE. These are followed by a S(ervice) label (S-Label) and at the bottom of the MPLS stack a DetNet Control Word (d-CW) containing a sequence number for the traffic (DetNet flow or aggregate) identified by the S-label.

In a simple example MPLS network deployment of DetNet, the ingress PE LSR would perform the DetNet service sub-layer with PRF and replicate the traffic into two copies sent over two RSVP-TE tunnels (over disjoint paths) to the egress PE, which performs the PEF/POF. Intervening MPLS P LSRs act solely as DetNet forwarding sub-layer nodes, forwarding the traffic as MPLS, and resource allocation happens only out-of-band in the Path Computation Engine (PCE) and Admission Controller (AC).

Note that service sub-layer functions are not necessarily applied only on ingress and egress of a network. Depending on the design and deployment of a service, they may also occur on intermediate DetNet service nodes, for example to protect against packet loss on more than one hop along a long path.

1.2.2. Gaps and challenges

1.2.2.1. No PREOF header for IP

For IP and IPv6, there is no equivalent network packet header to the S-label/d-CW in MPLS. Hence, it is currently not possible to build PREOF purely with an IP packet header. Instead, there are proposals to utilize the MPLS header within an IP environment, such as [I-D.ietf-detnet-mpls-over-ip-preof]. Such mixed stacks may not be ideal for processing performance and/or implementation, including on DetNet senders/receivers.

1.2.2.2. Limited choice for bounded latency scheduling in IP and MPLS

The only scheduling mechanism documented in an RFC that provides per-hop bounded latency is [RFC2212]. This severely limits the ability to easily implement bounded-latency functionality in the forwarding sub-layer of IP routers or MPLS LSRs. The IEEE TSN working group has defined several per-hop bounded latency mechanisms. Today, however, these cannot be used hop-by-hop when the forwarding node is operating not as an 802.1 bridge but as an IP/IPv6 router or MPLS LSR. The IEEE intends to extend the specification of TSN mechanisms to make them applicable to IP/MPLS forwarding layers, but apart from some summary of this effort, it is unclear what degree of IETF review, and hence IETF standards applicability, those mechanisms would get (ongoing work as of 2/2024).

1.2.2.3. Scalability issues of explicit routing

In most cases, bandwidth and, where possible, bounded-latency guarantees can only be provided on a strict explicit path, because every re-route that could happen would require pre-allocated resources. In per-hop explicit routing, such as RSVP-TE, this requires a so-called strict ERO, and every LSR requires an MPLS LSP for each such RSVP-TE tunnel. In a large MPLS network with, for example, 1000 PEs and a full mesh of PE-PE RSVP-TE tunnels, this would require on the order of 1,000,000 LSPs. PCE-established LSPs could gain some optimization through MP2P LSP setup, but this is still an undesirably large design, purely from the perspective of the required PCE signaling to P LSRs.
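
As a quick check of the order of magnitude quoted above, the following Python sketch counts the unidirectional LSPs in a full PE-PE mesh. The function name is illustrative only and does not come from any specification.

```python
def full_mesh_lsp_count(num_pe):
    """Unidirectional PE-to-PE LSPs needed for a full mesh of num_pe PEs."""
    return num_pe * (num_pe - 1)


# 1000 PEs need 999,000 LSPs, i.e. roughly one million.
print(full_mesh_lsp_count(1000))
```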

The same considerations apply to strict routes in IP/IPv6 networks. Due to the absence of an RSVP-TE equivalent signaling of ERO in IP/IPv6, only PCE signaling based solutions are available through standard mechanisms today.

Segment Routing (SR) for MPLS and IPv6 overcomes this issue by eliminating the per-hop steering state and moving the explicit route into the packet header. However, SR started out primarily for use-cases such as capacity optimization, where strict explicit routing is not required and loose routing is sufficient, with the PCE calculating a minimum number of loose hops to put into the steering header. This is insufficient for DetNet: large service provider networks, especially in sparsely populated but large countries, may have ring-type topologies with 20 or more hops, requiring 20 or more explicit steering hops in the source routing header.

In SRv6, mechanisms such as [I-D.ietf-spring-srv6-srh-compression] are looking into supporting longer SR paths in the header, which should hopefully suffice for the long and strict explicit paths required for DetNet in such networks. Likewise, MPLS LSR requirements for the maximum supported length of label stacks should take the requirement for strict explicit paths into account in profiling implementation requirements.

1.2.2.4. Scalability issues of hop-by-hop bounded latency mechanisms

Both [RFC2212] and most TSN mechanisms in support of hop-by-hop bounded latency scheduling operate on the basis of per-flow, per-hop traffic state, such as per-flow Weighted-Fair-Queuing or Shaping. This state needs to be established by a PCE, and its operational performance and scalability requirements are worse than those of per-hop, per-flow explicit traffic steering without source routing (segment routing).

2. Common header proposal

The following two pictures show the proposed common metadata blocks from which one or two IPv6 extension header(s) are to be composed. These blocks do not include the common headers of the possible IPv6 extension header options, but only their payload.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |Per Hop Method |    Method Parameters (56 bits)                |
  +-+-+-+-+-+-+-+-+                                               |
  |                                                               |
Figure 1: Per-hop metadata block

Per Hop Method, Method Parameters:

The Per Hop Method field (abbreviated to Method) indicates the format of the following 56 bits of method parameters and their processing by a per-hop scheduling/marking algorithm. When used with DetNet, this header is processed by every DetNet transport node.

For this option, IANA would have to allocate one new 5-bit codepoint for the Hop-by-Hop option in combination with all "act"/"chg" bit combinations. The specification for each method would be required to comply with [I-D.ietf-6man-hbh-processing] and define which "act"/"chg" options the method uses and how.

The common specification for this hop-by-hop option would specify that the requirements of [I-D.ietf-6man-hbh-processing] for this new option apply individually on a per-method basis. In other words, implementations supporting this option must perform the right "act" processing based not only on whether this option is supported/configured, but also on whether/how the specific "Per Hop Method" is supported/configured. More specifically, packets with any "Per Hop Method" that is not explicitly configured must by default be discarded with only internal logging, with no or minimal impact on the performance of other packets (for example, by incrementing a single, potentially sampled, per-received-interface counter for packets with this option, or better).

ICMP reply support must only be provided for explicitly supported and configured methods. Likewise, ignoring/passing packets with unknown methods must be explicitly configured on a per-method basis.

Ultimately, one core goal of this per-method approach is to escape the limited space still left for hop-by-hop options, and another is to share across multiple different QoS mechanism variations all reasonable common-practice aspects, so that each such method does not have to re-invent that common part of the wheel.

Algorithms supported by this option will typically attempt to achieve per-hop bounded latency, and optionally also limit per-hop jitter when they are serving DetNet. Alternatively (or additionally) they may also attempt to achieve specific congestion control goals. Nevertheless, the 56 bits may support any per-hop function of interest, based on method allocation rules and the limits seen as reasonable for the purpose of this extension header - for example: scheduling and marking/discarding of the packet.

Allocation of Method values is proposed/discussed in Section 5.
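
The following Python sketch illustrates how the 64-bit per-hop metadata block could be packed, parsed and dispatched. It assumes network byte order and the 8-bit Method / 56-bit parameter split shown in Figure 1. The function names, the handler table and the drop counter are hypothetical; they only illustrate the default-discard rule for Methods that are not explicitly configured.

```python
import struct

PARAM_BITS = 56
PARAM_MASK = (1 << PARAM_BITS) - 1


def pack_per_hop_block(method, params):
    """Pack an 8-bit Method and 56 bits of parameters into 8 bytes."""
    if not 0 <= method <= 0xFF:
        raise ValueError("Method must fit in 8 bits")
    if not 0 <= params <= PARAM_MASK:
        raise ValueError("parameters must fit in 56 bits")
    return struct.pack("!Q", (method << PARAM_BITS) | params)


def unpack_per_hop_block(block):
    """Return (method, params) from an 8-byte per-hop metadata block."""
    (word,) = struct.unpack("!Q", block)
    return word >> PARAM_BITS, word & PARAM_MASK


def process_per_hop_block(block, handlers, counters):
    """Dispatch to a configured Method handler, else discard and count."""
    method, params = unpack_per_hop_block(block)
    handler = handlers.get(method)
    if handler is None:
        # Method not explicitly configured: discard, counting the drop only.
        counters["unconfigured_method_drops"] = (
            counters.get("unconfigured_method_drops", 0) + 1
        )
        return None
    return handler(params)
```

A node would populate `handlers` only with the Methods it has been explicitly configured to support; everything else falls through to the counted discard.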

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |e2eMthd|  End-to-end Method (e2emth) Parameters (60 bits)      |
  +-+-+-+-+                                                       |
  |                                                               |
Figure 2: End-to-end metadata block

e2eMthd, Method Parameters:

This extension header data block follows the same logic as the hop-by-hop metadata block, except that in general it is meant to be processed only by DetNet service nodes or, for non-DetNet functions, equally only by the ingress/egress nodes of an IPv6 packet path. Depending on how this is packetized, it may be part of a per-hop extension header, in which case the data might simply be ignored by per-hop nodes.

Processing rules would have to be written similarly to what was outlined above for the hop-by-hop option.

Allocation of e2eMthd values is proposed/discussed in Section 5.

3. Examples

This section describes example functionalities that would have to be specified (as IETF standard/experimental/informational, ISE ... external specification) utilizing the above proposed extension header options.

The text for these examples does not attempt to include all such possible specifications, but instead focuses on a summary of the functionality and how the metadata carried in the extension header(s) achieves it.

3.1. Example End-to-End extension header method for DetNet

The following is an example for how the end-to-end extension header might be defined for DetNet.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |e2eMthd|Rsrvd|P|           End-to-End Timestamp                |
  |0|0 0 0|                Sequence Number                        |
Figure 3: Example DetNet End-to-end metadata block

Rsrvd: Reserved.

Sequence Number: Explained in the following PREOF subsection.

End-to-end Timestamp, P: Explained in the following End-to-end Time-stamping subsection.
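
A minimal Python sketch of packing and parsing this example block follows. The bit widths are read off Figure 3 and are therefore an assumption: 4 bits e2eMthd, 3 bits Rsrvd, 1 bit P, 24 bits End-to-End Timestamp, then 4 zero bits and a 28-bit Sequence Number. The function names are illustrative only.

```python
import struct


def pack_detnet_e2e(e2e_method, playout, timestamp, sequence):
    """Pack the example DetNet end-to-end block into 8 bytes."""
    if not 0 <= e2e_method < (1 << 4):
        raise ValueError("e2eMthd must fit in 4 bits")
    if not 0 <= timestamp < (1 << 24):
        raise ValueError("timestamp must fit in 24 bits")
    if not 0 <= sequence < (1 << 28):
        raise ValueError("sequence number must fit in 28 bits")
    # Rsrvd (3 bits) is sent as zero.
    word0 = (e2e_method << 28) | (int(bool(playout)) << 24) | timestamp
    word1 = sequence  # upper 4 bits stay zero
    return struct.pack("!II", word0, word1)


def unpack_detnet_e2e(block):
    """Parse the 8-byte block back into its fields."""
    word0, word1 = struct.unpack("!II", block)
    return {
        "e2e_method": word0 >> 28,
        "playout": bool((word0 >> 24) & 1),
        "timestamp": word0 & 0xFFFFFF,
        "sequence": word1 & 0x0FFFFFFF,
    }
```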

3.1.1. PREOF

Sequence Number:

The field carrying the Sequence Number has the same encoding and semantic as the "DetNet Control Word" as specified in the MPLS Data plane for DetNet, [RFC8964].

Note that processing the sequence number field (insertion, reordering, duplicate elimination) is relative to the DetNet flow that the packet belongs to, requiring per-DetNet flow state in the processing nodes.

This means that it is relative to the N-tuple that is looked up by the processing node. The DetNet architecture lays out various options. In simple, non-SRv6 end-to-end flow scenarios, this is the typical 5-tuple of (Src-IPv6/Mask, Dst-IPv6/Mask, Proto, SrcPort, DstPort), where Proto is the value of the first IPv6 "Next Header" field which is not an IPv6 extension header, and SrcPort, DstPort are the first two 16 bit values of that protocol header.

When a routing header is used, Dst-IPv6 is the final routing header address, which will only be the IPv6 header's destination address on the last hop. In addition, the DetNet flow tuple can be as small as a 2-tuple (Src-IPv6/Mask, Dst-IPv6/Mask), indicating for example that all traffic between a service provider network's ingress PE (Src-IPv6) and egress PE (Dst-IPv6) uses DetNet service.
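
To illustrate why sequence number processing needs per-flow state, the following Python sketch keeps a small history of recently seen sequence numbers per flow tuple and eliminates duplicates. This is a deliberately simplified, assumed scheme and not the elimination algorithm of any DetNet specification; the class name, the history size and the use of plain tuples as flow keys are all hypothetical.

```python
from collections import deque


class EliminationState:
    """Per-flow history of recently seen sequence numbers."""

    def __init__(self, history=64):
        self.history = history
        self.seen = set()
        self.order = deque()

    def accept(self, seq):
        """Return True for the first copy of seq, False for duplicates."""
        if seq in self.seen:
            return False
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.history:
            # Forget the oldest sequence number to bound the state.
            self.seen.discard(self.order.popleft())
        return True


flows = {}


def pef_accept(flow_key, seq):
    """Look up (or create) the state for flow_key and check seq."""
    state = flows.setdefault(flow_key, EliminationState())
    return state.accept(seq)
```

The flow key can be the 5-tuple or the 2-tuple described above; two packets carrying the same sequence number are only treated as duplicates when they map to the same key.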

3.1.2. End-to-end time-stamping

The P flag and the End-to-end Timestamp field provide an "end-to-end time-stamping" service. The timestamp has a unit of 0.1 usec.

When the hop-by-hop scheduling mechanism is "in-time", traffic will incur no or little per-hop scheduling delay in the absence of competing traffic, but more delay when there is competing traffic. This can lead to a high degree of end-to-end latency variation ("jitter") under varying degrees of competing traffic, something that applications may not be able to easily deal with.

End-to-end timestamping is a way to permit the egress DetNet service node to buffer packets received before their guaranteed maximum bounded latency. This is a tentative feature which has not yet been considered in other drafts in DetNet, and is included to offer a more comprehensive view of possible still to be resolved DetNet packet header features beyond per-hop scheduling.

If the P(layout) flag is 0, then the end-to-end timestamp indicates the time I at which the ingress DetNet service node injects the packet. The egress DetNet service node then needs to know the guaranteed bounded latency D for the packet's flow and delay the packet up to time I+D.

In some cases, it may be easier for the ingress DetNet service node to know D, in which case it can set the End-to-End Timestamp to I+D and indicate this via P=1. The egress DetNet service node then needs to delay the packet up to the time indicated by the end-to-end timestamp.

Because the end-to-end timestamp is not a full timestamp, it needs to be defined as a 24 bit modulo against some reference clock timestamp, such as seconds since Jan-1-1970. There is also the need for the egress DetNet node to check the modulo (formula TBD).

The unit of the end-to-end timestamp is 0.1 usec, allowing a maximum latency of about 1.6 seconds (24 bit value). This allows any potentially necessary timestamp calculations to be at least 1 usec precise. 1.6 seconds is assumed to be significantly longer than any relevant end-to-end delay that needs to be supported.
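
Since the modulo check is still TBD, the following Python sketch shows only one possible way an egress node could derive the remaining playout delay from the 24-bit timestamp. It assumes that ingress and egress share a reference clock counted in 0.1 usec ticks, and that a target more than half the modulo range away is already in the past; both assumptions, and all names, are illustrative.

```python
TS_BITS = 24
TS_MOD = 1 << TS_BITS          # 16,777,216 ticks
TICKS_PER_SECOND = 10_000_000  # one tick is 0.1 usec


def to_wire_timestamp(clock_ticks):
    """Reduce a full reference clock value to the 24-bit header field."""
    return clock_ticks % TS_MOD


def playout_delay_ticks(wire_ts, playout_flag, bound_d_ticks, now_ticks):
    """Ticks the egress node should still hold the packet (0 if late)."""
    if playout_flag:
        target = wire_ts                              # P=1: field is I+D
    else:
        target = (wire_ts + bound_d_ticks) % TS_MOD   # P=0: field is I
    diff = (target - now_ticks) % TS_MOD
    # Assumed rule: differences in the upper half of the range are "past".
    return diff if diff < TS_MOD // 2 else 0


# Range of the field in seconds: 2^24 ticks of 0.1 usec each.
print(TS_MOD / TICKS_PER_SECOND)
```

The modular subtraction gives the same result whether or not the reference clock wrapped past a multiple of 2^24 between ingress and egress.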

3.2. Example Hop-by-hop extension header Methods

The different example methods described in the following subsections use a one-bit field V(ersion) to allow introduction of both backward compatible and non-backward compatible extensions of a method without requiring a new Method field allocation.

When V=0, extensions need to be backward compatible and use only Reserved bits in the header, which are by default sent as 0 and ignored by the following methods.

When V=1, receivers will recognize the packet header as being incompatible with the following specifications. In that case all bits of the Method parameters can be redefined as seen fit by such extensions, but all nodes processing such a header must then support that new version of the Method.

3.2.1. C-SCORE

"Work Conserving Stateless Core Fair Queuing" (C-SCORE, [I-D.joung-detnet-stateless-fair-queuing]) is a mechanism for per-hop stateless fair queuing that, together with admission control, can guarantee per-hop bounded latency. Because of the properties of fair queuing, the latency of bursts is only paid once. It aims to achieve end-to-end bounded latency guarantees similar to those of [RFC2212], which also leverages fair queuing. However, that mechanism does not operate statelessly: it requires per-flow state for each flow on every relevant hop, specifically the weighted fair queue for the flow.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |    Method     |  Reserved     | Service Rate                  |
  |                  Finish Time                                  |
Figure 4: C-SCORE hop-by-hop block

Finish Time:

This is also called F(p) in the C-SCORE draft. This is in units of usec and updated through the C-SCORE algorithm and experienced processing latency on every hop. See below in the gLBF section for thoughts on how to deal with overflow modulo for such a parameter.

Service Rate:

This is also called r in the C-SCORE draft. The unit is bits/sec. TBD: this may need to be a larger unit, such as kbps.

Note that C-SCORE also needs to know the length of the packet for its calculation. It is assumed that this length is known when the C-SCORE algorithm is run, and it is up to implementations to determine the length, for example from the L2 encapsulation of the IPv6 packet or from further parsing of the packet header chain following the IPv6 extension headers.

3.2.2. TCQF

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |    Method     |V|   Reserved          | Prio  |    Cycle      |
  |                     Reserved                                  |
Figure 5: TCQF hop-by-hop block

"Tagged Cyclic Queuing and Forwarding" (TCQF) is a forwarding method derived from IEEE "Cyclic Queuing and Forwarding" ([CQF]). In TCQF, multiple cycles are used to forward packets. For example, if 3 cycles are used, packets will be sent for a cycle time period T of e.g.: 10 usec from cycle buffer 1, then for T from cycle buffer 2, then from cycle buffer 3, and then from cycle buffer 1 again. The Cycle value in packets received from a neighbor is mapped via a simple (neighbor, input-cycle) -> output-cycle function; output-cycle is written into the Cycle field and the packet is enqueued into that cycle buffer. The mapping is set up so that all packets for input-cycle can be received before output-cycle starts to send. Carrying the Cycle value in the packet makes it possible to support arbitrary link latencies (which are limited in CQF) as well as a higher level of error between the clocks on adjacent nodes. This error is called "Maximum Time Interval Error" (MTIE) in clock synchronization protocols such as PTP.

Cycle=0 is reserved. Current considerations are that fewer than e.g.: 10 Cycles are needed with TCQF, but the field is defined larger to allow extensions without having to redefine the field in an incompatible fashion.

While not specified in the current TCQF draft, it is possible to use multiple independent sets of cycle buffers for packets of different priorities. Thus the encoding also includes a Prio(rity) field. Prio=0 is reserved. Typically, a maximum of 8 priorities is likely to be beneficial/necessary.
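
The per-hop TCQF step described above can be sketched in Python as follows. The packet is modelled as a plain dictionary, and the mapping table and buffer structure are hypothetical; the sketch only shows the (neighbor, input-cycle) -> output-cycle lookup, the rewrite of the Cycle field and the enqueue into the matching cycle buffer, with a single priority.

```python
from collections import defaultdict


def tcqf_forward(packet, neighbor, cycle_map, cycle_buffers):
    """Map the received Cycle, rewrite it and enqueue the packet."""
    in_cycle = packet["cycle"]
    if in_cycle == 0:
        raise ValueError("Cycle=0 is reserved")
    out_cycle = cycle_map[(neighbor, in_cycle)]
    packet["cycle"] = out_cycle
    cycle_buffers[out_cycle].append(packet)
    return out_cycle


# Example with 3 cycles: packets from neighbor "A" are shifted by one cycle.
cycle_map = {("A", 1): 2, ("A", 2): 3, ("A", 3): 1}
cycle_buffers = defaultdict(list)
pkt = {"cycle": 3, "payload": b"data"}
tcqf_forward(pkt, "A", cycle_map, cycle_buffers)
```

Supporting priorities would, under the same assumptions, only require keying the buffers by (Prio, Cycle) instead of Cycle alone.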

3.2.3. TQF

"Timeslot Queueing and Forwarding" (TQF), [I-D.peng-detnet-packet-timeslot-mechanism], is a variation of CQF (and TCQF, CSQF) in which a larger number of cycles, called Timeslots, is used to allow direct interleaving of more, smaller flows without having to reserve bandwidth for such flows in every cycle; instead, bandwidth is reserved only in a small subset of the larger number of timeslots.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |    Method     |V|   Reserved  |G| Timeslot                    |
  |                     Reserved    | Deviation                   |
Figure 6: TQF hop-by-hop block

Timeslot is the equivalent of Cycle in TCQF, but may use 1024 or more values.

If G(lobal)=0, this indicates Local Timeslot style; G(lobal)=1 indicates Global Timeslot style. See [I-D.peng-detnet-packet-timeslot-mechanism] for more explanations.

Deviation indicates the number of timeslots which a packet was delayed by on one or more hops because it did not fit into the earliest available timeslot on a processing node. This value is updated by adding the number of slots the packet is delayed to the Deviation value and updating the Deviation field in the packet.

When so desired, the egress DetNet service node can then use this value and the admission control system calculated maximum value of this field for this flow to delay packets such that all packets of the flow will have the same latency (reducing jitter).
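
The Deviation handling can be sketched in Python as below. It assumes that the admission control system provides the per-flow maximum Deviation to the egress node and that the egress equalizes latency by holding each packet for the timeslots it did not already lose in the network. The function names are hypothetical.

```python
def add_deviation(deviation, slots_delayed):
    """Per-hop update: add the timeslots this node delayed the packet."""
    if slots_delayed < 0:
        raise ValueError("a packet cannot be delayed by a negative amount")
    return deviation + slots_delayed


def egress_hold_slots(deviation, max_deviation):
    """Timeslots the egress node holds the packet to equalize latency."""
    return max(0, max_deviation - deviation)


# A packet delayed by 2 slots on one hop and 1 slot on another, for a flow
# whose calculated maximum Deviation is 5, is held for 2 more slots.
dev = add_deviation(add_deviation(0, 2), 1)
print(egress_hold_slots(dev, 5))
```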

3.2.5. CSQF

CSQF [I-D.chen-detnet-sr-based-bounded-latency] operates fundamentally like TCQF, except that it does not operate on a single cycle identifier in a header field that is rewritten on every hop as in TCQF. Instead, the cycle identifier for every hop is a (parameter of the) SRv6 SID for that hop, i.e., it requires the use of a routing header such as SRH.

Compared to TCQF, this approach has the benefit of allowing a more flexible per-hop sequence of cycles because the cycle for every hop is programmable for each packet, whereas it is fixed by router mapping tables in TCQF.

As a downside, when the cycle mapping does not require this flexibility, it costs more bits on the wire, because when e.g.: N=4 bit of cycle values are required, then this requires as many bits per hop in the routing header, which may be a relevant consideration when using compressed routing headers.

However, CSQF, like TCQF, can also be implemented with different priorities, and if that is an end-to-end priority, then it may be beneficial not to replicate the priority field into every source routing header hop (e.g.: SRv6 SID in SRH), but to carry it in the hop-by-hop extension header field. For this purpose, it would even be possible to re-use the TCQF Method formatting, setting the Cycle field to 0 and the Prio field to a non-zero value indicating the priority. Likewise, it should then be possible to allocate only a single Method value for both TCQF and CSQF.

3.2.6. Guaranteed Latency Based Forwarding (gLBF)

"guaranteed Latency Based Forwarding" (gLBF, [I-D.eckert-detnet-glbf]) is a mechanism for on-time stateless forwarding of packets without requirements for clock synchronization, utilizing the calculus for admission control and flow specifications of TSN-ATS. To eliminate the per-hop shaper or interleaved regulator state of TSN-ATS, it instead uses a Damper.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  |    Method     |V|   Reserved                  | PPrio | Prio  |
  |                         Damper Time                           |
Figure 7: gLBF hop-by-hop block

If Method equals the value assigned to gLBF, the method parameters are as follows:


Prio:

Prio is the end-to-end priority of the packet, with defined values 1 to 8. 1 is the highest priority, 8 is the lowest priority. Values 9-15 are reserved. If Prio is 0, then the per-hop priority needs to be derived from the per-hop routing header information, such as the SID or its parameters when using SRH. When using per-hop priorities solely via SIDs (such as when using compressed SIDs), this requires up to 8 SIDs per node (depending on whether all 8 priorities are required in the deployment).

PPrio (prior hop priority):

This is an optional parameter. It is set to 0 when not used. When supported then a node will insert the per-hop priority extracted from its SID (or its parameters) into this field (value 1-8) in support of less complex processing of the packet on the following node, such as simple timed FIFOs. See [I-D.eckert-detnet-glbf] for further explanations.

Note that the prior hop prio is also available from the routing header SID when using IPv6 routing headers, because unlike in MPLS, it is not discarded. Nevertheless, a node typically would not know the semantics of the prior-hop node's SID-to-priority mapping.

Damper Time:

This is the time, calculated by the prior-hop node, that the receiving node needs to delay the packet so that the sum of the scheduling latency on the prior hop and this damper time equals the known maximum bounded latency for this hop and the prior hop's priority of the packet.

Each node rewrites this field after knowing when the packet will hit the outgoing interface (e.g.: after dequeuing it from the egress queue).
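
The relationship between scheduling latency and Damper Time can be written as a short Python sketch. The time unit is left abstract because the text above does not fix one, and the function names are hypothetical.

```python
def damper_time(max_hop_latency, scheduling_latency):
    """Damper Time written by the prior hop when the packet is sent."""
    if scheduling_latency > max_hop_latency:
        raise ValueError("bounded latency violated on the prior hop")
    return max_hop_latency - scheduling_latency


def eligibility_time(arrival_time, damper):
    """Time at which the receiving node may start scheduling the packet."""
    return arrival_time + damper


# Two packets with different queuing delays on the prior hop end up with the
# same total of scheduling latency plus damper time (here 100 units).
for sched in (30, 90):
    print(sched + damper_time(100, sched))
```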

3.2.7. Dynamic Packet State (DPS)

"Dynamic Packet State" (DPS) [I-D.stoica-diffserv-dps] is a proposal that was brought to the IETF in 2002. It is not considered for DetNet, but it is the first example presented here of how the proposed common header might also help accelerate practical experimentation with research QoS options, which so far have always struggled to even reach that stage because of packetization.

DPS provides an approximation of weighted fair scheduling without per-flow state (such as a per-flow weighted queue) by carrying a flow's target data rate as metadata in the packet. In the most basic setup, all DPS flows share a single traffic class. The scheduler calculates an ongoing estimate of the (over-)load of that traffic class, derives from it a packet ECN-mark or discard probability, and then adjusts that probability up or down based on the packet's target data rate.

A solution like DPS does require controlled networks where the target data rate of flows can be trusted, but it then makes it easy to solve issues such as allowing devices/applications that require much faster flows to get those higher speeds under congestion - without introducing any non-scalable per-flow state management issues (except for wherever on ingress some policy admission is needed).

The DPS draft lays out primarily complex encoding options to minimize the number of bits required to encode the metadata. These considerations not only predate the faster networks of today, but also ignore the issue that complex encoding will not necessarily work at high speeds in programmable forwarding planes. Hence a simple, non-floating point representation would likewise accelerate the ability to experiment with these types of advanced mechanisms.

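   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    Method     |V|   Reserved                                  |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                         Flow Rate                             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 8: DPS hop-by-hop block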

In a simple encoding option, the DPS metadata would simply consist of a 32-bit Flow Rate field in units of 10 bps, allowing maximum flow rates of roughly 40 Gbps (2^32 x 10 bps, or about 42.9 Gbps), thus allowing experimentation in high-speed networks as well as in low-speed radio networks.
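The simple encoding can be sketched as follows. This is only an illustration of the layout in Figure 8: the function names are hypothetical, the V bit is treated as an opaque flag, rounding the rate down to the next multiple of 10 bps is an assumption, and the Method value used in the usage note is an arbitrary pick.

```python
import struct

RATE_UNIT_BPS = 10                               # Flow Rate is in units of 10 bps
MAX_RATE_BPS = (2**32 - 1) * RATE_UNIT_BPS       # about 42.9 Gbps


def pack_dps_block(method: int, v_flag: bool, rate_bps: int) -> bytes:
    """Encode the 64-bit DPS hop-by-hop block in network byte order."""
    if not 0 <= method <= 0xFF:
        raise ValueError("Method must fit into 8 bits")
    if not 0 <= rate_bps <= MAX_RATE_BPS:
        raise ValueError("rate does not fit into the 32-bit Flow Rate field")
    # Word 0: Method (8 bits) | V (1 bit) | Reserved (23 bits, sent as zero).
    word0 = (method << 24) | (int(v_flag) << 23)
    # Word 1: Flow Rate, rounded down to a multiple of 10 bps (assumption).
    return struct.pack("!II", word0, rate_bps // RATE_UNIT_BPS)


def unpack_dps_block(block: bytes) -> tuple:
    """Decode the block into (method, v_flag, rate_bps)."""
    word0, rate_units = struct.unpack("!II", block)
    return word0 >> 24, bool((word0 >> 23) & 1), rate_units * RATE_UNIT_BPS
```

For example, pack_dps_block(224, True, 1_000_000) yields an 8-byte block whose Flow Rate field holds 100000, and unpack_dps_block() recovers the 1 Mbps target rate with a single multiplication, without any floating point handling.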

3.2.8. Latency Based Forwarding (LBF)

Like DPS, "Latency Based Forwarding" (LBF, [LBF]) is also not a DetNet target per-hop method, but more of a research experiment from which gLBF was derived.

It is included here as a second example of how the common header structure proposed here could also help researchers avoid having to introduce entirely new packet headers and instead rely on the framework of the common header. It also shows how the desire for a space-optimized common header may impose limitations on more experimental, advanced solutions.

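   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    Method     |V| Reserved  |A| eLatency                      |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | minLatency                    | maxLatency                    |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 9: LBF hop-by-hop block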

eLatency is the latency, in units of usec, experienced by the packet as it travels hop-by-hop. Every node along the path adds its latency, including link and queuing latency, to the value, updating the packet header field.

minLatency and maxLatency are only set by the originator and never changed. Like eLatency their unit is usec.

Every node performing LBF scheduling management uses routing information that includes propagation latencies to know the minimum, no-queuing latency to the packet's destination. From this it determines how much scheduling time budget the packet has, taking eLatency and maxLatency into account. When it calculates that the packet could not reach the destination in time, it discards the packet immediately, hence avoiding unnecessary congestion downstream. When immediate sending of the packet would make the packet likely arrive too early (taking minLatency into account), the packet will be intentionally delayed even in the absence of contention. As soon as the packet would no longer be too early to be sent, its dynamic scheduling priority versus contending packets is based on the calculated maximum time it could still spend without exceeding maxLatency when it reaches the destination.
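The per-hop decision described above can be sketched as follows. This is a simplified illustration, not the algorithm from [LBF]: the names are hypothetical, all values are assumed to be in usec, and path_min_latency stands for the minimum, no-queuing latency to the destination that the node is assumed to learn from routing.

```python
from dataclasses import dataclass


@dataclass
class LbfDecision:
    action: str           # "discard", "delay" or "schedule"
    delay_usec: int = 0   # intentional delay before the packet may be sent
    budget_usec: int = 0  # time the packet may still lose to queuing


def lbf_decide(e_latency: int, min_latency: int, max_latency: int,
               path_min_latency: int) -> LbfDecision:
    """Decide how to treat a packet on one hop, all values in usec."""
    # Arrival time at the destination if no further queuing occurs.
    earliest_arrival = e_latency + path_min_latency
    if earliest_arrival > max_latency:
        # The packet cannot arrive in time anymore: drop it right away.
        return LbfDecision("discard")
    # Remaining slack; a smaller budget means a higher dynamic priority.
    budget = max_latency - earliest_arrival
    if earliest_arrival < min_latency:
        # Sending now would make the packet arrive too early: hold it back.
        return LbfDecision("delay", min_latency - earliest_arrival, budget)
    return LbfDecision("schedule", 0, budget)
```

With eLatency = 100, minLatency = 500, maxLatency = 1000 and a remaining path latency of 200, the packet would be held for 200 usec and then compete with a budget of 700 usec. Once eLatency has grown to 400, the same packet is scheduled immediately with a budget of only 400 usec.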

A)dmitted is a flag indicating whether the traffic is admission controlled, which increases its dynamic scheduling priority in LBF.

minLatency and maxLatency are so-called "Service Level Objectives" (SLO), and overall LBF attempts to show that latency SLOs can not only be orchestrated in complex fashions in the controller/control-plane, but can also be handled in a lightweight and stateless (hence scalable) manner directly in the forwarding plane. It can equalize end-to-end latency for flows across different long paths to create fairness, such as in metropolitan access rings, and it can minimize multi-hop queuing latency - once a packet experiences undesired queuing on one hop, its dynamic priority will be higher on the next.

Because three time values are required, each of them can only be 16 bits long, making the maximum supportable latency about 65 msec (65535 usec). This should well suffice for any interactive time-critical applications, but it would not be good enough for arbitrary use over wide-area networks.
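The effect of the 16-bit fields can be made concrete with a small sketch of the block in Figure 9. The function names are hypothetical, the V bit is treated as an opaque flag, and saturating eLatency at the maximum value instead of letting it wrap is an assumption of this example, not something the text specifies.

```python
import struct

U16_MAX = 0xFFFF            # 65535 usec, about 65.5 msec
LBF_FORMAT = "!BBHHH"       # Method, flags, eLatency, minLatency, maxLatency


def pack_lbf_block(method: int, v_flag: bool, admitted: bool,
                   e_latency: int, min_latency: int, max_latency: int) -> bytes:
    """Encode the 64-bit LBF hop-by-hop block, latencies in usec."""
    for value in (e_latency, min_latency, max_latency):
        if not 0 <= value <= U16_MAX:
            raise ValueError("latency does not fit into 16 bits")
    # Flags octet: V is the high-order bit, A the low-order bit,
    # the six Reserved bits in between are sent as zero.
    flags = (int(v_flag) << 7) | int(admitted)
    return struct.pack(LBF_FORMAT, method, flags,
                       e_latency, min_latency, max_latency)


def add_hop_latency(block: bytes, hop_latency_usec: int) -> bytes:
    """Return the block with eLatency updated for one more hop."""
    method, flags, e_lat, min_lat, max_lat = struct.unpack(LBF_FORMAT, block)
    # Saturate instead of wrapping around (assumption of this sketch).
    e_lat = min(e_lat + hop_latency_usec, U16_MAX)
    return struct.pack(LBF_FORMAT, method, flags, e_lat, min_lat, max_lat)
```

A path whose accumulated latency exceeds 65535 usec cannot be represented in this layout, which is the wide-area limitation mentioned above.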

4. Open questions

4.1. Functional requirements / limitation

Assume a standards track draft/RFC were to be created from this header concept. What are the functional requirements / limitations that are necessary and sufficient to make it as easy as possible to create standard/experimental/informational/external methods leveraging this encapsulation?

For example:

Defining constraints that must be met by the mechanism to be allowed to use this header, such as:

o The header operation may primarily only impact scheduling, marking and/or discarding of packets containing the header relative to other packets. The mechanisms explicitly need to minimize exposure of application information to the network according to [RFC9419]. For example, past proposals attempted to include metadata characterizing the application, session and type of media transported so that the network operator could try to provide mappings to the service quality deemed necessary for the traffic. Instead, the metadata should explicitly only be the control parameters for the QoS algorithm provided by the method.

o If parameters beyond this limit are proposed to be transported, then they must be encrypted in a way that would allow for them to be decoded only by entities to whom the originating application can build an equal trust relationship to the one deemed to be sufficient to carry the same information in an encrypted/authenticated end-to-end connection. Note that the potential to include such encryption related parameters may be one reason to potentially reserve more space, such as another 32 bits for some form of security-association identifier.

o Unless the mechanism claims and provides sufficient evidence (any standard requirements to achieve this?) to be compatible with Internet congestion control (best RFC reference?), the mechanism needs to be scoped for intra-domain/controlled-network/federation use (not across the Internet) and/or require participating hosts to employ circuit breakers ([RFC8084]).

o Operation of the method must only depend on parameters included in the header(s) (hop-by-hop, end-to-end), and the base IPv6 header, but not other headers or payload.

o All operations on any hop-by-hop header defined for a method are intended to be possible within the "fast-path" forwarding plane of advanced routers; any control-plane / slow-path functionality should not rely on these headers. Any normal end-to-end header should in general be designed in the same way, but an additional method may be assigned to end-to-end headers that are intended to operate only in software. For example, the proposed playout metadata in the end-to-end header requires timed buffering of packets, something typically not deemed feasible for even the most advanced high-speed router forwarding plane engines today. However, this functionality is easy to implement in software on receiver host stacks, and supporting it via an IPv6 destination option header will likely make it easier to add such functionality to existing application/host-stacks than defining a new "transport" header for it, because the latter option would then require "tunneling" e.g., UDP on top of that transport.

o The method must specify how it fits into a DiffServ configuration model.

o A method specification must describe how the method interacts with traffic not carrying the method's header.

4.2. Combination with IPv6 routing header

When DetNet wants to guarantee per-hop resources for bandwidth and optionally per-hop latency, this means, in almost all network designs with redundancy, that the network path needs to be fixed through what is commonly called a strict, explicit hop-by-hop route. Otherwise, a re-route event on a loose intermediate hop will cause the traffic to reconverge to a path without prior resource allocation.

Today, in IPv6 there are two relevant source-routing headers through which this steering is done in the industry: for high-speed networks the "IPv6 Segment Routing Header" [RFC8754], and for lower-speed, industrial, and often also wireless-mesh networks the "IPv6 Routing Header for Source Routes with the Routing Protocol for Low-Power and Lossy Networks" [RFC6554]. Because DetNet explicitly includes the wireless network architecture aspects originating from the IETF RAW working group, we should assume that in the ideal case, the DetNet header can be combined with the functionality of either of these types of networks.

4.3. Hop-by-hop or Routing-Header

Should the DetNet header (primarily) be a Hop-by-Hop (HbH) header or a routing header? Here are a couple of considerations:

A HbH header would have the benefit of allowing to combine DetNet with unmodified routing headers [RFC8754] or [RFC6554].

A HbH header would have the possible downside that parsing and executing both a HbH header and a routing header may be more expensive in high-speed forwarding planes than if the DetNet header became part of a new routing header, especially because in the worst case 50% of the DetNet functionality may need to be applied on ingress before routing, and the other 50% may need to be applied after routing on the egress of the router.

A HbH header would have the benefit of allowing per-hop operation even if the routing header is loose hop. As mentioned above, this does not seem to be a significant use-case for DetNet.

4.4. Extending existing routing headers or new routing header?

Technically it seems to be an option to include the DetNet header into an SRH as a "DetNet" TLV. So far, all existing SRH TLVs are, as far as the authors are aware, only examined by the final SRH hop, but not hop-by-hop. In this respect, the SRH TLV options seem to be mostly a replacement for a separate Destination Options Header, and implementations may have a higher overhead acting hop-by-hop on a TLV encoded DetNet header.

However, the option of hop-by-hop examined TLVs is architecturally possible in SRH through the high order bit of the TLV type field.

For [RFC6554] extensions are not explicitly considered, but it should be possible to update this RFC with a DetNet header added at the end (similar to the TLV section in SRH), but without having to add a TLV encoding.

Extending the existing headers will have the architectural downside of having to support two routing headers with DetNet, but this seems to be only a theoretical and RFC text duplication downside, because almost every device will only support one type of header anyhow.

4.5. Integrated or split DetNet header

The service sub-layer functions and associated DetNet packet header elements do not need to be executed on every hop where DetNet transport sub-layer functions and hence the associated packet header elements are required.

Therefore it could be considered to be technically feasible and architecturally sound to split up the DetNet header into two IPv6 extension headers.

A DetNet transport sub-layer extension header with the first 64 bits of data would be encoded as a HbH header and/or an extension to an existing or new routing header.

A DetNet service sub-layer extension header with the second 64 bits would be encoded in a Destination Options header or (as the SRH TLV example shows) added to a routing header. When encoded into a Destination Options header, there is the option of adding the 64 bits of information as options into the common Destination Options extension header or allocating a new Destination Options extension header.

It would even be possible to consider calling this header not a Destination Options header, but a new "DetNet service transport header" - by simply not declaring the new "Next Header" value to be an IPv6 extension header.

Which of the options is best is an open issue.

One core functional benefit of having a single joint header is that it would be possible to consider the option that different Methods can also redefine the semantics of the 64 end-to-end bits and perform per-hop operations on them. This could, for example, allow longer metadata values in LBF.

5. Method and e2eMthd code-point/semantic allocation

The hop-by-hop Method is proposed to be allocated through different mechanisms in different blocks as follows.

Values 0 - 31: Header encoding and associated forwarding behavior specified through standards track RFC.

Values 32 - 63: Header encoding and associated forwarding behavior specified through experimental or informational IETF or IRTF RFC.

Values 64 - 223: Header encoding and associated forwarding behavior specified through specification (including IETF) plus expert review. Typical use would be for third-party SDO or research / industry specifications.

Values 224 - 240: Experimental use. No RFC shall refer to a binding of encoding or associated forwarding behavior to a specific code point in this range.

Values 241 - 255: Configurable.

Nodes along the path need to be configured with a consistent Configured Method Semantic. Configured Method Semantic is another IANA registry of 64-bit values allocated on a FCFS basis to a public specification without expert review. Each referenced specification can only request one Method unless expert review allows it to be associated with more than one.

The difference between Experimental and Configurable code points is that experimental explicitly attempts to avoid creating documentation for experiments that would cause them to proliferate beyond a stage of an experiment, while Configurable explicitly demands documentation to be produced, without consuming a limited space codepoint (instead consuming only a codepoint from a large space).

Configurable and Experimental are targeted to be similar to private DSCPs, for which no standard functionality is assigned, but for which consistent behavior, such as queue assignment and queue / early-discard / marking behavior, instead needs to be configured on every node in the network.

For values 0 - 223, temporary allocations are permitted through IETF or IRTF working group drafts (of the right track) until the draft expires or is abandoned.
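The proposed block structure can be summarized as a small lookup. The table and function names are hypothetical, and the policy strings are shortened paraphrases of the descriptions above.

```python
# (first value, last value, allocation policy) for the hop-by-hop Method field
METHOD_BLOCKS = [
    (0, 31, "standards track RFC"),
    (32, 63, "experimental or informational IETF/IRTF RFC"),
    (64, 223, "specification plus expert review"),
    (224, 240, "experimental use"),
    (241, 255, "configurable"),
]


def allocation_policy(method: int) -> str:
    """Return the allocation policy of the block containing a Method value."""
    for first, last, policy in METHOD_BLOCKS:
        if first <= method <= last:
            return policy
    raise ValueError("Method must be in the range 0 - 255")


def allows_temporary_allocation(method: int) -> bool:
    """Temporary allocations via working group drafts apply to 0 - 223 only."""
    if not 0 <= method <= 255:
        raise ValueError("Method must be in the range 0 - 255")
    return method <= 223
```

A node receiving an unknown Method value could use such a lookup to decide, for example, whether locally configured semantics (values 241 - 255) should be consulted.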

For e2eMthd, similarly blocks would be assigned to different allocation policies (TBD).

6. Security Considerations

This document has no security considerations (yet?).

7. IANA Considerations

This document has no IANA considerations.

8. Logistics

The authors welcome feedback, please address it to Feel free to submit suggestions to improve the document also as issues to

8.1. Changelog

00 Initial version.

9. Acknowledgments

10. References

10.1. Normative References

Shenker, S., Partridge, C., and R. Guerin, "Specification of Guaranteed Quality of Service", RFC 2212, DOI 10.17487/RFC2212.
Hui, J., Vasseur, JP., Culler, D., and V. Manral, "An IPv6 Routing Header for Source Routes with the Routing Protocol for Low-Power and Lossy Networks (RPL)", RFC 6554, DOI 10.17487/RFC6554.
Fairhurst, G., "Network Transport Circuit Breakers", BCP 208, RFC 8084, DOI 10.17487/RFC8084.
Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A., Przygienda, T., and S. Aldrin, "Multicast Using Bit Index Explicit Replication (BIER)", RFC 8279, DOI 10.17487/RFC8279.
Finn, N., Thubert, P., Varga, B., and J. Farkas, "Deterministic Networking Architecture", RFC 8655, DOI 10.17487/RFC8655.
Filsfils, C., Ed., Dukes, D., Ed., Previdi, S., Leddy, J., Matsushima, S., and D. Voyer, "IPv6 Segment Routing Header (SRH)", RFC 8754, DOI 10.17487/RFC8754.
Varga, B., Ed., Farkas, J., Berger, L., Malis, A., Bryant, S., and J. Korhonen, "Deterministic Networking (DetNet) Data Plane: MPLS", RFC 8964, DOI 10.17487/RFC8964.
Arkko, J., Hardie, T., Pauly, T., and M. Kühlewind, "Considerations on Application - Network Collaboration Using Path Signals", RFC 9419, DOI 10.17487/RFC9419.

10.2. Informative References

IEEE Time-Sensitive Networking (TSN) Task Group, "IEEE Std 802.1Qch-2017: IEEE Standard for Local and Metropolitan Area Networks -- Bridges and Bridged Networks -- Amendment 29: Cyclic Queuing and Forwarding".
Chen, M., Geng, X., Li, Z., Joung, J., and J. Ryoo, "Segment Routing (SR) Based Bounded Latency", Work in Progress, Internet-Draft, draft-chen-detnet-sr-based-bounded-latency-03.
Eckert, T. T., Clemm, A., Bryant, S., and S. Hommes, "Deterministic Networking (DetNet) Data Plane - guaranteed Latency Based Forwarding (gLBF) for bounded latency with low jitter and asynchronous forwarding in Deterministic Networks", Work in Progress, Internet-Draft, draft-eckert-detnet-glbf-02.
Hinden, R. M. and G. Fairhurst, "IPv6 Hop-by-Hop Options Processing Procedures", Work in Progress, Internet-Draft, draft-ietf-6man-hbh-processing-14.
Varga, B., Farkas, J., and A. G. Malis, "Deterministic Networking (DetNet): DetNet PREOF via MPLS over UDP/IP", Work in Progress, Internet-Draft, draft-ietf-detnet-mpls-over-ip-preof-11.
Cheng, W., Filsfils, C., Li, Z., Decraene, B., and F. Clad, "Compressed SRv6 Segment List Encoding", Work in Progress, Internet-Draft, draft-ietf-spring-srv6-srh-compression-13.
Joung, J., Ryoo, J., Cheung, T., Li, Y., and P. Liu, "Latency Guarantee with Stateless Fair Queuing", Work in Progress, Internet-Draft, draft-joung-detnet-stateless-fair-queuing-02.
Peng, S., Du, Z., Basu, K., cheng, Yang, D., and C. Liu, "Deadline Based Deterministic Forwarding", Work in Progress, Internet-Draft, draft-peng-detnet-deadline-based-forwarding-09.
Peng, S., Liu, P., Basu, K., Liu, A., Yang, D., and G. Peng, "Timeslot Queueing and Forwarding Mechanism", Work in Progress, Internet-Draft, draft-peng-detnet-packet-timeslot-mechanism-06.
Stoica, I., Zhang, H., Baker, F., and Y. Bernet, "Per Hop Behaviors Based on Dynamic Packet State", Work in Progress, Internet-Draft, draft-stoica-diffserv-dps-02.
Clemm, A. and T. Eckert, "High-Precision Latency Forwarding over Packet-Programmable Networks", IEEE 2020 IEEE/IFIP Network Operations and Management Symposium (NOMS 2020), DOI 10.1109/NOMS47738.2020.9110431.

Authors' Addresses

Toerless Eckert
Futurewei Technologies USA
2220 Central Expressway
Santa Clara, CA 95050
United States of America
Jinoo Joung
Sangmyung University
South Korea
Shaofu Peng
ZTE Corp.
Xuesong Geng