Network Shaofu. Peng Internet-Draft Bin. Tan Intended status: Standards Track ZTE Corporation Expires: 11 June 2023 Peng. Liu China Mobile 8 December 2022 Deadline Based Deterministic Forwarding draft-peng-detnet-deadline-based-forwarding-04 Abstract This document describes a deterministic forwarding mechanism based on deadline. The mechanism enhances strict priority scheduling algorithm with dynamically adjusting the priority of the queue according to its deadline attribute. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 11 June 2023. Copyright Notice Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Peng, et al. Expires 11 June 2023 [Page 1] Internet-Draft Deadline Routing December 2022 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 2. Deadline Queue . . . . . . . . . . . . . . . . . . . . . . . 4 3. Get Deadline Information of Packets . . . . . . . . . . . . . 7 3.1. Get Planned Residence Time . . . . . . . . . . . . . . . 7 3.2. Get Existing Accumulated Planned Residence Time . . . . . 8 3.3. Get Existing Accumulated Actual Residence Time . . . . . 9 3.4. Get Existing Accumulated Residence Time Deviation . . . . 9 4. Put Packets into the Deadline Queues . . . . . . . . . . . . 10 4.1. Alternate Queue Allocation Rules . . . . . . . . . . . . 13 5. Buffer Size Design . . . . . . . . . . . . . . . . . . . . . 14 6. Schedulability Condition for Deadline Mechanism . . . . . . . 14 6.1. Schedulability Condition Analysis for On-time Mode . . . 17 7. Traffic Regulation and Orchestrating . . . . . . . . . . . . 18 8. Latency Compensation Considerations . . . . . . . . . . . . . 20 9. Compatibility Considerations . . . . . . . . . . . . . . . . 22 10. Deployment Considerations . . . . . . . . . . . . . . . . . . 24 11. Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . 25 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 13. Security Considerations . . . . . . . . . . . . . . . . . . . 25 14. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 25 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 25 15.1. Normative References . . . . . . . . . . . . . . . . . . 25 15.2. Informative References . . . . . . . . . . . . . . . . . 26 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 1. Introduction [RFC8655] describes the architecture of deterministic network and defines the QoS goals of deterministic forwarding: Minimum and maximum end-to-end latency from source to destination, timely delivery, and bounded jitter (packet delay variation); packet loss ratio under various assumptions as to the operational states of the nodes and links; an upper bound on out-of-order packet delivery. In order to achieve these goals, deterministic networks use resource reservation, explicit routing, service protection and other means. Resource reservation refers to the occupation of resources by service traffic, exclusive or shared in a certain proportion, such as dedicated physical link, link bandwidth, queue resources, etc; Explicit routing means that the transmission path of traffic flow in the network needs to be selected in advance to ensure the stability of the route and does not change with the real-time change of network topology, and based on this, the upper bound of end-to-end delay and delay jitter can be accurately calculated; Service protection refers to sending multiple service flows along multiple disjoint paths at the same time to reduce the packet loss rate. In general, a Peng, et al. Expires 11 June 2023 [Page 2] Internet-Draft Deadline Routing December 2022 deterministic path is a strictly explicit path calculated by a centralized controller, and resources are reserved on the nodes along the path to meet the SLA requirements of deterministic services. [I-D.stein-srtsn] describes that the controller calculates the local deadline time of each node for the traffic to be transmitted in advance, which is a absolute system time, forms a stack of these local deadline time, and then carries them in the forwarded data packets. Each node forwards the packets according to its own local deadline. [I-D.stein-srtsn] suggests that FIFO queue can not be used to realize this function, because the packets stored in the queue are always first in first out and a special data structure is recommoned. The packets in this data structure will be automatically sorted with the order from emergency to non emergency according to the deadline of the packets. However, it may be difficult to implement this structure in hardware, and especially for a large network it may be challenge to synchronize time. Considering that the link propagation delay is generally a fixed value, and we focus on the residence time of the packets inside the node, an alternate approach is to make the deadline eliminate the interference of link propagation delay and avoids relying on time synchronization between nodes. This document desrbies an alternate packets scheduling scheme based on EDF (earliest-deadline-firs) combined with latency compensation and used for wide area network. It suggests to only use a single deadline time to control the packets scheduling of all nodes along the path. The single deadline time is an offset time, which is based on the time when the packet enters the node and represents the maximum time allowed for the packet to stay inside the node. However, if each node has obvious differences in the capability of packets forwarding and scheduling, more offset-time may be needed. 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. Peng, et al. Expires 11 June 2023 [Page 3] Internet-Draft Deadline Routing December 2022 2. Deadline Queue For nodes in the network, some queues with count-down time (also termed as CT) are introduced and maintained for each outgoing port. These queues are called deadline queues and participate in priority based scheduling. Deadline queues have the following characteristics: * The CT of each deadline queue will decrease with the passage of time. When it decreases to 0, the scheduling priority of the queue will be set to the highest, and the scheduling opportunity will be obtained immediately (note that there may be interference delay caused by a packet being sent by a low priority queue). It will prohibit receiving new packets, in which the buffered packets will be sent to the outgoing port immediately, and the maximum duration allowed to send packets is the preset authorization time (also termed as AT), e.g, 10us, 20us, etc. In principle, all packets buffered in the queue shall be sent within this authorization time. If the queue is sent to empty and the authorization time is still free, other queues with lower priority can be scheduled during this authorization time. * The scheduling engine can initiate a cycle timer to decrement the CT of all deadline queues, that is, whenever the timer expires, the CT values of all queues will be subtracted from the timer interval (also termed as TI). Note that TI must be less than or equal to the AT, with AT = N * TI, where the natural number N >= 1. * For a deadline queue whose CT has been reduced to 0, after a new round of authorization time, the CT will return to the maximum initial value (also termed as MAX_CT), allow receiving new packets, and continue to enter the next round of operation that decreases with the passage of time. * For a deadline queue whose CT is not reduced to 0, it can receive packets. In detailed, when a node receives a packet to be forwarded from a specific outgoing port, it first obtains the expected deadline of the packet, and then put the packet to the deadline queue with the relevant CT value of the outgoing port for transmission. * For a deadline queue whose CT is not reduced to 0, its scheduling priority cannot be set to the highest value. The smaller the CT, the higher the priority. A transmission mode can be further configured for a deadline queue to control its transmission behavior. There are two modes: Peng, et al. Expires 11 June 2023 [Page 4] Internet-Draft Deadline Routing December 2022 - The first mode is to allow participation in priority based scheduling, also termed as in-time mode; - The second mode, not allowed, also termed as on-time mode. That is, a queue with on-time mode is allowed to participate in priority based scheduling only when its CT becomes 0. In-time mode is work-conserving and applicable to low latency services, and on-time mode is not work-conserving and applicable to low latency variation services. Only one mode can be configured for each deadline queue. One implementation can support one set of queues in a single mode, or two sets of queues support two modes. * At the beginning, all deadline queues have different CT values, i.e., staggered from each other, so that the CT of only one deadline queue will decrease to 0 at any time. It should be noted that CT is just the countdown of the head of the queue, and other packets in the queue (especially those at the end of the queue) have to wait longer to be scheduled. It can be seen that the scheduling countdown of any specific packet in the queue is in the range [CT, CT+AT). The above AT, TI and MAX_CT value shall be choosed according to the actual capacity of the node. In fact, each node in the network can independently use different AT for different outgoing ports. The general principle is that if an outgoing port has a large bandwidth (such as 100G bps), the AT can be small (such as 1us), because the link with large bandwidth can send the required bits amount even within a small duration; If an outgoing port has a small bandwidth (e.g. 1G bps), the AT should be larger (e.g. 10us), because the link with small bandwidth needs to send the required bits amount within a larger duration. In the FE (Fast Ethernet, 100M bps) scenario, that however may not be the typical scenario this document focuses on, the transmission time of a single packet may take several microseconds, then attention must be paid to ensure that the AT is larger than the transmission time of a single packet, especially the AT should include the interferrence delay caused by a single low priority packet with maximum size. The choose of TI should consider the latency granularity of various service flows, so that CT updated per TI can match the delay requirements of different services. For example, if the delay difference of different traffic flows is several microseconds, TI can be choosed as 1 us. If the delay difference of different traffic flows is several 10 microseconds, TI can be choosed as 10 us. A specific example of the deadline queue is depicted in Figure 1. Peng, et al. Expires 11 June 2023 [Page 5] Internet-Draft Deadline Routing December 2022 +------------------------------+ +------------------------------+ | Deadline Queue Group: | | Deadline Queue Group: | | queue-1(CT=60us) | | queue-1(CT=59us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| | queue-2(CT=50us) | | queue-2(CT=49us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| | queue-3(CT=40us) | | queue-3(CT=39us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| | queue-4(CT=30us) | | queue-4(CT=29us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| | queue-5(CT=20us) | | queue-5(CT=19us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| | queue-6(CT=10us) | | queue-6(CT=9us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| | queue-7(CT=0us) | | queue-7(CT=0us) | | in-time sub-queue ######| | in-time sub-queue ######| | on-time sub-queue ######| | on-time sub-queue ######| +------------------------------+ +------------------------------+ +------------------------------+ +------------------------------+ | Non-deadline Queue Group: | | Non-deadline Queue Group: | | queue-8 ############ | | queue-8 ############ | | queue-9 ############ | | queue-9 ############ | | queue-10 ############ | | queue-10 ############ | | ... ... | | ... ... | +------------------------------+ +------------------------------+ -o----------------------------------o--------------------------------> T0 T0+1us time Figure 1: Example of Deadline Queue for outgoing Port In this example, the AT for deadline queue group is configured to 10us. Queue-1 ~ queue-7 are deadline queues, and other queues are traditional non-deadline queues. Each deadline queue has its CT attribute. The MAX_CT is 60us. At the initial time (T0), the CT of all deadline queues are staggered from each other. For example, the CT of queue-1 is 60us, the CT of queue-2 is 50uS, the CT of queue-3 is 40us, and so on. At this time, only the CT of queue-7 is 0, which has the highest scheduling priority. Peng, et al. Expires 11 June 2023 [Page 6] Internet-Draft Deadline Routing December 2022 Suppose the scheduling engine initiates a cycle timer with a time interval of 1us, i.e., AT = 10 * TI in this case. After each timer timeout, the timer interval will be subtracted from the CT of all deadline queues. As shown in the figure, at T0 + 1us, the timer timeout, the CT of queue-1 becomes 59us, the CT of queue-2 becomes 49us, the CT of queue-3 becomes 39us, etc. At this time, the CT of queue-7, depending on the implementation, can be maintained as 0 or equal to -1. At T0 + 10us, the CT of queue-1 becomes 50us, the CT of queue-2 becomes 40us, the CT of queue-3 becomes 30us, etc. At this time, the CT of queue-7, returns to MAX_CT and is no longer set to the highest scheduling priority; The CT of queue-6 becomes 0, which has the highest scheduling priority. When the CT of a deadline queue becomes 0, it has a time limit of 10us (i.e., authorization time) to send packets in the queue. During this period, it will be prohibited to receive new packets (in fact, there can be no new packets with a deadline of 0). After the 10us time elapses, the CT of another deadline queue will change to 0. If the deadline queue with the highest priority is still free after sending packets within the authorized time, the scheduling engine will visit other queues with the second highest priority during the rest of the authorized time. Note that if both in-time and on-time policies are supported, the queue-1...7 can be taken regard as logical queues and each logical queue may include two sub-queues, one used to buffer in-time packets and the other used to buffer on-time packets. 3. Get Deadline Information of Packets 3.1. Get Planned Residence Time The planned residence time of the packet is an offset time, which is based on the time when the packet arrivals the node and represents the maximum time allowed for the packet to stay inside the node. There are many ways to obtain the planned residence time of the packet. * Carried in the packet. The ingress PE node, when encapsulating the deterministic service flow, can explicitly insert the planned residence time into the packet according to SLA. The intermediate node, after receiving the packet, can directly obtain the planned residence time from the packet. Generally, only a single planned residence time needs to be carried in the packet, which is applicable to all nodes along the path; Or insert a stack composed Peng, et al. Expires 11 June 2023 [Page 7] Internet-Draft Deadline Routing December 2022 of multiple deadlines, one for each node. [I-D.peng-6man-deadline-option] defined a method to carry the planned residence time in the IPv6 packets . * Included in the FIB entry. Each node in the network can maintain the deterministic FIB entry. After the packet hits the deterministic FIB entry, the planned residence time is obtained from the forwarding information contained in the FIB entry. * Included in the policy entry. Configure local policies on each node in the network, and then set the corresponding planned residence time according to the matched specific characteristics of the packet, such as 5-tuple. An implementation should support the policy to forcibly override the planned residence time obtained by other methods. For a deterministic delay path based on deadline scheduling, the path it passes through has deterministic end-to-end delay requirements. It includes two parts, one is the accumulated node delay and the other is the accumulated link propagation delay. The end-to-end delay is subtracted from the accumulated link propagation delay to obtain the accumulated node delay. A simple method is that the accumulated node delay is shared equally by each intermediate node along the path to obtain the planning deadline of each node. 3.2. Get Existing Accumulated Planned Residence Time The existing accumulated planned residence time of the packet refers to the sum of the planned residence time of all upstream nodes before the packet is transmitted to the current node. This information needs to be carried in the packet. Every time the packet passes through a node, the node accumulates its corresponding planned residence time to the existing accumulated planned residence time field in the packet. [I-D.peng-6man-deadline-option] defined a method to carry existing accumulated planned residence time in the IPv6 packets. The setting of "existing accumulated planned residence time" in the packet needs to be friendly to the chip for reading and writing. For example, it should be designed as a fixed position in the packet. The chip may support flexible configuration for that position. Peng, et al. Expires 11 June 2023 [Page 8] Internet-Draft Deadline Routing December 2022 3.3. Get Existing Accumulated Actual Residence Time The existing accumulated actual residence time of the packet, refers to the sum of the actual residence time of all upstream nodes before the packet is transmitted to the current node. This information needs to be carried in the packet. Every time the packet passes through a node, the node accumulates its corresponding actual residence time to the existing accumulated actual residence time field in the packet. [I-D.peng-6man-deadline-option] defined a method to carry existing accumulated actual residence time in the IPv6 packets. The setting of "existing accumulated actual residence time" in the packet needs to be friendly to the chip for reading and writing. For example, it should be designed as a fixed position in the packet. The chip may support flexible configuration for that position. Although other methods can also be, for example, carrying the absolute system time of receiving and sending in the packet to compute the actual residence time indirectly, that has a low encapsulation efficiency and require strict time synchronization between nodes. A possible method to get the actual residence time in the node is that, the receiving and sending time of the packet can be recorded in the auxiliary data structure (note that is not packet itself) of the packet, and the actual residence time of the packet in the node can be calculated according to these two times. 3.4. Get Existing Accumulated Residence Time Deviation The existing accumulated residence time deviation equals existing accumulated planned residence time minus existing accumulated actual residence time. This value can be positive or negative. If the existing accumulated planned residence time and the existing accumulated actual residence time are carried in the packet, it is not necessary to carry the existing accumulated residence time deviation. Otherwise, it is necessary. The advantage of the former is that it can be applied to more scenarios, while the later has less packaging overhead. Peng, et al. Expires 11 June 2023 [Page 9] Internet-Draft Deadline Routing December 2022 4. Put Packets into the Deadline Queues [RFC9320] presents a latency model for DetNet nodes. There are six type of delays that a packet can experience from hop to hop. The processing delay (type-4), the regulator delay (type-5), the queueing subsystem delay (type-6), and the output delay (type-1) together contribute to the residence time in the node. In this document, the residence delay in the node is simplified into two parts: the first part is to lookup the forwarding table when the packet is received from the incoming port (or generated by the control plane) and deliver the packet to the line card where the outgoing port is located; the second part is to store the packet in the queue of the outgoing port for transmission. These two parts contribute to the actual residence time of the packet in the node. The former can be called forwarding delay and the latter can be called queueing delay. The forwarding delay is related to the chip implementation and is generally constant; The queueing delay is unstable. When a node receives a packet from an upstream node, it can first get the existing accumulated residence time deviation, and then add it to the planned residence time of the packet at this node to obtain the deadline adjustment value, and then on the basis of the deadline adjustment value, deducting the forwarding delay of the packet in the node, the allowable queueing delay value (termd as Q) is obtained, and then the packet will be put to the deadline queue with corresponding CT, meeting the condition: CT <= Q < CT+AT. As mentioned above, if the queue already contains many packets, the newly inserted packet may need to wait more time to be scheduled. Therefore, an alternate implemantation may support to select deadline queue for the packet based on Q-k*AT, where 0 <= k <= 1, i.e., meeting the condition: CT <= Q-k*AT < CT+AT. In fact, according to the following introduction, the deadline scheduling mechanism supports latency credit compensation, so the end-to-end latency caused by these two conditions is the same. Assume that the local node in a deterministic path is i, all upstream nodes are from 1 to i-1, and downstream nodes are i + 1, the planned residence time is D, the actual residence time is R, the deadline adjustment value is M, the forwarding delay inside the node is F, the existing accumulated residence time deviation is E, and the allowable queueing delay is Q, then the allowable queueing delay (Q) of the packet on this node i is calculated as follows: * E(i-1) = D(1) + D(2) + ... + D(i-1) - R(1) - R(2) - ... - R(i-1) * M(i) = D(i) + E(i-1) Peng, et al. Expires 11 June 2023 [Page 10] Internet-Draft Deadline Routing December 2022 * Q(i) = M(i) - F(i) Under normal circumstances, if each hop strictly controls the scheduling of the packet according to its planned residence time, the actual residence time of the packet will be very close to the planned residence time, thus the absolute value of the existing accumulated residence time deviation will be very small. Consider some extreme cases. For example, many upstream nodes adopt the in-time mode to send packets quickly. Packets almostly need not queue in these nodes, but only depend on the forwarding delay. Then the existing accumulated residence time deviation (E) may be a very large positive value, resulting in a large allowable queueing delay (Q). If this value exceeds the MAX_CT of the deadline queue maintained by the node, the allowable queueing delay (Q) should be modified to the MAX_CT, or such a packet can be inserted into an escape deadline queue with fixed larger CT than MAX_CT, or it can also be inserted into a higher-level queue in a hierarchical deadline queue. For another example, if some upstream nodes are abnormal and have a very large actual residence time (R), the existing accumulated residence time deviation (E) may be a negative number, resulting in the allowable queueing delay (Q) may be less than or equal to 0, the allowable queueing delay (Q) should be modified to a minimum CT other than 0. If the queue choosed by the condition is full, the packet must be inserted into the next queue with higher CT value. Figure 2 depicts an example of packets inserted to the deadline queues. Peng, et al. Expires 11 June 2023 [Page 11] Internet-Draft Deadline Routing December 2022 P2 P1 +------------------------------+ +--------+ +--------+ | Deadline Queue Group: | | D=20us | | D=30us | | queue-1(CT=55us) ###### | | E=15us | | E=-8us | +--+ | queue-2(CT=45us) ###### | +--------+ +--------+ |\/| | queue-3(CT=35us) ###### | ------incoming port-1------> |/\| | queue-4(CT=25us) ###### | |\/| | queue-5(CT=15us) ###### | P4 P3 |/\| | queue-6(CT=5us) ###### | +--------+ +--------+ |\/| | queue-7(CT=0us) ###### | | | | D=30us | |/\| +------------------------------+ +--------+ | E=-30us| |\/| +--------+ |/\| ------incoming port-2------> |\/| +------------------------------+ |/\| | Non-deadline Queue Group: | P6 P5 |\/| | queue-8 ############ | +--------+ +--------+ |/\| | queue-9 ############ | | | | D=40us | |\/| | queue-10 ############ | +--------+ | E=40us | |/\| | ... ... | +--------+ +--+ +------------------------------+ ------incoming port-3------> ---------outgoing port----------> -o----------------------------------o--------------------------------> receiving-time base +F time Figure 2: Time Sensitive Packets Buffered to Deadline Queue As shown in Figure 2, the node successively receives six packets from three incoming ports, among which packet 1, 2, 3 and 5 have corresponding deadline information, while packet 4 and 6 are traditional packets. These packets need to be forwarded to the same outgoing port according to the forwarding table entries. It is assumed that they arrive at the line card where the outgoing port is located at almost the same time after the forwarding delay in the node (F = 5us). At this time, the queue status of the outgoing port is shown in the figure. Then: * The allowable queueing delay (Q) of packet 1 in the node is 30 - 8 - 5 = 17us, and it will be put into the deadline queue-5 (its CT is 15us), meeting the condition that Q is in the range [15, 25). * The allowable queueing delay (Q) of packet 2 in the node is 20 + 15 - 5 = 30us, and it will be put into the deadline queue-4 (its CT is 25us), meeting the condition that Q is in the range [25, 35). Peng, et al. Expires 11 June 2023 [Page 12] Internet-Draft Deadline Routing December 2022 * The allowable queueing delay (Q) of packet 3 in the node is 30 - 30 - 5 = -5us, and it will be modified to the minimum positive value 5 us then put into the deadline queue-6 (its CT is 5us), meeting the condition that Q is in the range [5, 15). * The allowable queueing delay (Q) of packet 5 in the node is 40 + 40 - 5 = 75us, and it will be modified to the MAX_CT (60 us) then put into the deadline queue-1 (its CT is 55us), meeting the condition that Q is in the range [55, 65). * Packets 4 and 6 will be put into the non-deadline queue in the traditional way. 4.1. Alternate Queue Allocation Rules One option may further divide a deadline queue into multiple sub- queues, each for a typical planned residence time (D) such as 10us, 20us, ..., 80us. That is, packets with different planned residence time are inserted into different sub-queues and protected. In this way, for two packets with the same allowable queueing delay (Q) but different planned residence time, we can decide to firstly schedule the packet with the smallest planned residence time. This is beneficial because when packets with smaller planned residence time facing larger interference delay, it is difficult to have space for latency compensation on downstream nodes, while packets with larger planned residence time have larger space for latency compensation. However, this option may cause in-time traffic with high priority (small planned residence time) to preempt the sending time of on-time traffic with low priority (large planned residence time). Another option may further divide a deadline queue into multiple sub- queues, each for a typical residence time deviation (E) such as -60, -50us, -40us, ..., 0, PV(positive value). That is, packets with different existing accumulated residence time deviation are inserted into different sub-queues and protected. In this way, for two packets with the same allowable queueing delay (Q) but different existing accumulated residence time, we can decide to firstly schedule the packet with smallest existing accumulated residence time deviation. An implementation can also maintain sub-queues with different sub- priorities. Packets are mapped to different sub-priorities according to their planned residence time (D) and accumulated residence time deviation (E), and placed in corresponding sub-queues. The basic principle is to schedule packets with smaller accumulated residence time deviation (E) first. If the accumulated residence time deviation (E) is equal, then schedule packets with smaller planned residence time (D). Peng, et al. Expires 11 June 2023 [Page 13] Internet-Draft Deadline Routing December 2022 5. Buffer Size Design We assume that the service rate of the deadline scheduler can be the link rate (termed as C), so the buffer size of each deadline queue is AT * C - M, where M is the maximum size of the packet with low priority. 6. Schedulability Condition for Deadline Mechanism In this section, we first discuss the schedulability condition of in- time scheduling mode. Then analyze the condition of the on-time mode. Consider that a node is traversed by multiple paths, and each path may carry one or more service flows. For each service flow, it follows the corresponding constraint function, such as the leaky bucket arrival curve A(t) = b + r*t. Generally there are two mechanisms to enforce that traffic of a service entering the deadline scheduler conforms to the given traffic constraint function. The first such mechanism is a traffic policer at the entrance of the network which rejects traffic exceeding the constraint, and the second mechanism is a traffic re-shaper on the intermediate node which temporarily buffers packets to make it conform to the constraint. Suppose that all nodes along each path have reserved corresponding bandwidth resources according to all service flows carried by that path, then we can classify all service flows forwarded by a node based on different planned residence times. For example, for the class of planned residence time d_i, the corresponding accumulated constraint function is A_i(t). Let d_i < d_(i+1), then the schedulability condition for deadline in-time mode is:. * A_1(t-d_1) + sum{A_i(t+AT-d_i) for all i>=2} <= C*t where AT is the authorization time of each deadline queue, C is service rate of the deadline scheduler. The proof is similar with that in [RPQ], except that the step length is fine-grained by TI when the queue rotates. Figure 3 below gives a rough explanation. Peng, et al. Expires 11 June 2023 [Page 14] Internet-Draft Deadline Routing December 2022 ^ | Planned Residence Time | | | |CT_t+AT+2*TI -> ===== | | | CT_t+AT+TI -> =====| | CT_t + - - - - - - - - - - - - >+====+ -\ +AT | | | | | | | | | | | > Phycical queue-x d_p + - - - - - - - - - - - - >| | | | | | | CT_t + - - - - - - - - - - - - >+====+ -/ | | |===== <- CT_t-TI | | | ===== <- CT_t-2*TI | | | | | TI | ... ... | TI | TI | TI | TI | TI | TI | ---+----+-----------+----+----+----+----+----+----+---------> 0 ^ ^ ^ ^ time | | | | t-tau' t-ofst t t+tau (busy period begin) (arrival) (departure) Figure 3: Deadline In-time Scheduling Suppose that the observed packet, with planned residence time d_p, arrives at the scheduler at time t and leaves the scheduler at time t+tau. It will be inserted to physical queue-x with count-down time CT_t at the current timer interval TI with starting time t-ofst and end time t-ofst+TI. According to the above packet queueing rules, we have CT_t <= d_p < CT_t+AT. Also suppose that t-tau' is the beginning of the busy period closest to t. Then, we can get the amount of packets within time interval [t-tau', t+tau] that must be scheduled before the observed packet. In detailed: * for all service i with planned residence time d_i meeting CT_t <= d_i < CT_t+AT, the workload is sum{A_i[t-tau', t]}. Explanation: since the packets with priority d_i in the range [CT_t, CT_t+AT) at time t will be sent before the observed packet, the packets with the same priority d_i before time t will become more urgent at time t, and must also be sent before the observed packet. * for all service i with planned residence time d_i meeting d_i >= CT_t+AT, the workload is sum{A_i[t-tau', t-ofst+TI-(d_i-CT_t- AT+TI)]}. Peng, et al. Expires 11 June 2023 [Page 15] Internet-Draft Deadline Routing December 2022 Explanation: although the packets with priority d_i larger than CT_t+AT at time t will be sent after the observed packet, but the packets with the same priority d_i before time t, at time t-ofst+TI-(d_i-CT_t-AT+TI), will become more urgent at time t, and must be sent before the observed packet. * for all service i with planned residence time d_i meeting d_i < CT_t, the workload is sum{A_i[t-tau', t+(CT_t-d_i)]}. Explanation: the packets with priority d_i less than CT_t at time t will certainly be sent before the observed packet, at a future time t+(CT_t-d_i) the packets with the same priority d_i will still be urgent than the observed packet (even the observed packet also become urgent), and must be sent before the observed packet. * then deduct the traffic that has been sent during the busy period, i.e., C*(tau+tau'). Let tau as d_p, and remember that CT_t <= d_p, the above workload is less than sum{A_i(tau'+CT_t+AT-d_i) for all d_i >= CT_t} + sum{A_i(tau'+CT_t-d_i) for all d_i < CT_t} - C*(tau'+d_p) It is further less than sum{A_i(tau'+d_p+AT-d_i) for all d_i >= d_2} + A_1(tau'+d_p-d_1) - C*(tau'+d_p) Then, denote x as tau'+d_p, we have sum{A_i(x+AT-d_i) for all d_i >= d_2} + A_1(x-d_1) - C*(x) Let the above workload less than zero, then we get the condition. It should be noted that for a class i, its planned residence time d_i is actually contributed by its own flows and all the classes with lower planned residence time. Thus, based on the above schedulability conditions, and knowing the traffic constraint functions of all classes, we can then give the guidance to allocate appropriate (i.e., not arbitrary) planned residence time for each class. In brief, we can choose the traffic arrival constraint function according to the preset planned residence time, or we can choose the planned residence time according to the preset traffic arrival constraint function. Peng, et al. Expires 11 June 2023 [Page 16] Internet-Draft Deadline Routing December 2022 Deadline in-time mode is a variant of EDF scheduler with work- conserving characteristic and has several aggregated queues. It needs to rely on the known traffic arrival curve to obtain the sufficient and necessary schedulability conditions. In traffic aggregation case, the arrival traffic faced by the intermidiate nodes may be irregular due to work-conserving scheduling. * One option may choose to implement re-shaping per flow or interleaved regulator (IR) introduced by [UBS] placed before the deadline schedulre on the intermediate node. It is not recommended to maintain re-shapers per flow in a large-scale network. As described in [IR-Theory], re-shaper per flow or interleaved regulator does not increate worst-case latency bounds. * Another option is that the deadline queue based on latency compensation has essentially played the role of re-shaper queue. Assume that an in-time burst A_i released late in the network quickly reaches the intermediate node, and catches up with the traffic released early in the network, resulting in the actual arrival curve of A_i exceeding the constraints. However, the excess traffic can be identified by the scheduler, which has a greater residence time deviation and is placed in a queue with a larger CT value, without causing interference to class i. Although the excess traffic may be ranked before class j (j>i) after latency compensation, considering that the essence of latency compensation is to align to the on-time mode, that is, even if the excess traffic is sent in the on-time mode, this part of traffic naturally needs to be sent before class j traffic. The test of schedulability conditions needs to be based on the whole network view. When we need to add new traffic to the network, we need to consider which nodes the related path will pass through, and then check in turn whether these nodes still meet the schedulability conditions after adding new traffic. 6.1. Schedulability Condition Analysis for On-time Mode Compared with the in-time mode, on-time mode is non-work-conserving, which can be considered as the combination of damper and EDF scheduler. The intuitive understanding of the on-time mode is that if the headends of multiple paths (they converge at the same intermediate node) apply admission control strictly according to the service bandwidth when releasing traffic to the network, and successfully reserve the corresponding bandwidth resources for these paths in the control plane, then the on-time forwarding behavior on a path controls the interval between early and late release of traffic on that path, but not lead to an increase in the bandwidth occupied by that path. Therefore, the on-time mode does not cause the arrival Peng, et al. Expires 11 June 2023 [Page 17] Internet-Draft Deadline Routing December 2022 curve to exceed the expected traffic constraint function. If any packets with on-time mode can always be sent before or at their deadline, e.g, place packets to deadline queues according to CT <= Q-k*AT < CT+AT, the above schedulability condition can also be applied to the on-time mode, that is, if the aggregate traffic for the specific planned residence time reaches the maximum limit allowed by the conditions, it will achieve passive on-time behavior. If the aggregate traffic is lower than the limit, the deadline on-time scheduler will also achieve active on-time behavior by applying the built-in damper. Since the built-in dumper has shaped the on-time service flow, there is also no need to re-shape the on-time service flow by additional shapers in front of deadline scheduler. However, this is not always the case. Since the on-time mode explicitly introduces the hold time, the actual departure time of the packet may be after the deadline. For example, a set of packets with class of d_60 (i.e., the planned residence time is 60 us) arrives at the deadline scheduler at time T, assuming that these packets require a total transmission time of 10us. Then after 50 us, another set of packets with class of d_10 (i.e., the planned residence time is 10 us) arrives, also assuming a total transmission time of 10us. These two batches of packets will start to be sent at the same time, which will inevitably lead to some packets not leaving before their deadline. Although the delayed packets can be accelerated on the downstream nodes through latency compensation, it still bring risks to end-to-end QoS target. On the one hand, hold time is used to avoid jitter, but on the other hand, it is also a waste of transmission opportunities. This means that implementing more relaxed on time scheduling on intermediate nodes will be more beneficial to end-to-end QoS target. For example, for CT <= Q-k*AT < CT+AT, one can config a large k (>1) on the intermediate nodes for loose on-time scheduling, while config a small k (<1) on the egress for strict on-time scheduling. A node needs to identify its role as an intermediate node or egress node for a specific packet and provide an appropriate k value. Note that even if all intermediate nodes apply loose on-time scheduling, as long as the egress node applies strict on-time scheduling, the jitter target can still be achieved. 7. Traffic Regulation and Orchestrating On the ingress PE node, traffic regulation must be performed on the incoming port, so that the service traffic does not exceed its constraint funciton, such as leacky bucket A_i(t) = b_i + r_i*t for service flow i. Peng, et al. Expires 11 June 2023 [Page 18] Internet-Draft Deadline Routing December 2022 Then, on the ingress PE node, traffic is orchestrated into different deadline queues of the outgoing port. Multiple continuous packets of the specific service flow are stored in the deadline queue with corresponding remaining time according to the planned residence time of the service flow. Note that these packets are not stored in the same queue over time. The amount of bits that can be stored in one queue for service i is equal to r_i * AT, however, at least one whole packet shall be loaded (i.e., there may be overprivisioning per AT in the case of large packet size and small AT value). For example, if the allowable queueing delay (Q) is 20us, then within the current authorization time, the first sequence of the packets will be put into the current deadline queue with [CT, CT+AT) range that can cover 20us, until the reserved bandwidth limit is reached; Then, within the next authorization time, the next sequence of packets will be put into the queue with [CT, CT+AT) range that can cover 20us, until the reserved bandwidth limit is reached; and so on, until the total service bits are loaded. Figure 4 depicts an example of deadline based traffic orchestrated on the ingress PE node. It is assumed that the packets loaded in each authorization time do not exceed the reserved bandwidth of the service. Peng, et al. Expires 11 June 2023 [Page 19] Internet-Draft Deadline Routing December 2022 1st packet | v +-+ +-+ +----+ +-+ +--+ +------+ |1| |2| | 3 | |4| |5 | | 6 | <= packet sequence +-+ +-+ +----+ +-+ +--+ +------+ | | | | | | ~+F ~+F ~+F ~+F ~+F ~+F | | | | | | UNI v v v v v v ingr-PE -+--------+--------+--------+--------+--------+--------+----> NNI | Auth | Auth | Auth | Auth | Auth | Auth | time | time | time | time | time | time | time | v v v v 1,2 in 3 in 4,5 in 6 in Buffer-A Buffer-B Buffer-C Buffer-D (CT<=Q) (CT<=Q) (CT<=Q) (CT<=Q) | | | | ~+Q ~+Q ~+Q ~+Q | | | | sending v v v v +-+ +-+ +----+ +-+ +--+ +------+ |1| |2| | 3 | |4| |5 | | 6 | +-+ +-+ +----+ +-+ +--+ +------+ Figure 4: Deadline Based Packets Orchestrating 8. Latency Compensation Considerations As mentioned above, the calculation of allowable queueing delay (Q) reflects the meaning of latency compensation. It can be used for in- time services to distinguish between emergency flows and non emergency flows (compared with traditional strict priority scheduling), and also for on-time services to strictly control delay jitter. In both cases, packets have a tolerable maximum queueing delay in the queueing subsystem. The traditional EDF scheduling can also benefit from this latency compensation. Peng, et al. Expires 11 June 2023 [Page 20] Internet-Draft Deadline Routing December 2022 Suppose that two packets, P1, P2, are generated instantaneously from a specific in-time service flow at the same source, and the two packets have the same planned residence time. Packet-1 may face less interference delay than P2 in their process of forwarding. When they arrive at an intermediate node successively, P2 will have less allowable queueing delay (Q) than P1 to try to stay close to P1 again. It should be noted that to compary who is ealier is based on queue's CT and packet's Q, according to the above queueing rule (CT <= Q < CT+AT), and the CT of the queue is not changed in real-time, but gradually with the decreasing step TI. It is possible to get an unexpected comparision result. As shown in Figure 5, P1 and P2 are two packets belonging to the same burst. The arrival time when they are received on the current scheduler is shown in the figure. Suppose that the AT of the deadline queue is 10us, the decreasing step TI is 1us, and the transmission time of each packet is 0.01us. Also suppose that the Q values of two adjacent packets P1 and P2 are 40us and 39us respectively, and they are both received in the window from T0 to T0+1us. P1 will enter queue-B with CT range [40, 50), while P2 will enter queue-A with CT range [30, 40). This means that P2 will be scheduled before P1, resulting in disorder. packets arrived later packets arrived earlier | | | V V --------+-----------------------------------------------+--------- ... ... | .P1.................P2....................... | ... ... --------+-----------------------------------------------+--------- allowable queueing delay (Q): P1.Q=40us P2.Q=39us | | | --------o---------------------o---------------------o-----------> T0 T0+1us T0+2us time queue-A.CT[30,40) queue-A.CT[29,39) queue-B.CT[40,50) queue-B.CT[39,49) queue-C.CT[50,60) queue-C.CT[49,59) Figure 5: Packets queueing based on Latency Compensation DetNet architecture [RFC8655] provides Packet Ordering Function (POF), that can be used to solve the above disorder problem caused by the latency compensation. Peng, et al. Expires 11 June 2023 [Page 21] Internet-Draft Deadline Routing December 2022 Alternatively, if the POF is not enabled, we can also maintain states for service flows to record the last queueing information to address this issue. For example, one ore more OGOs (order guarantee object) are maintained per flow (or per incoming port, planned residence time, incoming port plus planned residence time, etc), on each outgoing port. An OGO records the queue information which is the queue that all the packets mapped to this OGO was inserted recently. For simplicity, a count-down time (CT), which is copied from the recent inserted deadline queue, can be recorded in OGO. Note that the CT of OGO needs to decrease synchronously with that of other deadline queues, with the same decreasing step TI. If the CT of OGO decreases to 0, it will remain at 0. When a packet arrives at the deadline scheduler at the outgoing port, it is first mapped to its OGO, and get the CT of OGO, termed as OGO.CT. Then, according to the above queueing rule (CT <= Q-k*AT < CT+AT), get the CT of a preliminarily selected queue, termed as preliminary CT. Let Q' is MAX{OGO.CT, preliminary CT}, and put the packet in the target queue according to CT <= Q' < CT+AT Update the value of OGO.CT to the CT of target queue. The intuitive meaning of the above operation is that for two packets belonging to the same OGO (e.g, per flow), the packet received later should not be queued before the packet received first. This also implies that the main purpose of latency compensation is to compensate the interference delay between pakckets of different flows, rather than compensating the interference delay between packets of the same flow. 9. Compatibility Considerations For a particular path, if only some nodes in the path upgrade support the deadline mechanism defined in this document, the end-to-end deterministic delay/jitter target will only be partially achieved. Those legacy devices may adopt the existing priority based scheduling mechanism, and ignore the possible deadline information in the packet, thus the delay intra node produced by them cannot be perceived by the adjacent upgraded node. The more upgraded nodes included in the path, the closer to the delay/jitter target. Although, the legacy devices may not support the dataplane mechanism described in this document, but they can be freely programmed (such as P4 language) to measure and insert the deadline information into packets, in this case the achievement of delay/jitter target will be Peng, et al. Expires 11 June 2023 [Page 22] Internet-Draft Deadline Routing December 2022 more perfect. Only a few key nodes are upgraded to support deadline mechanism, which is low-cost, but can meet a service with relatively loose time requirements. Figure 6 shows an example of upgrading only several network border nodes. In the figure, only R1, R2, R3 and R4 are upgraded to support deadline mechanism. A deterministic path across domain 1, 2, and 3 is established, which contains nodes R1, R2, R3, and R4, as well as explicit nodes in each domain. Domain 1, 2 and 3 use the traditional strict priority based forwarding mechanism. The encoding of the packet sent by R1 includes the planned residence time and the accumulated residence time deviation. Especially, DS filed in IP header ([RFC2474]) are also set to appropriate values. The basic principle of setting is that the less the planned residence time, the higher the priority. However, in order to avoid the interference of non deterministic flow to deterministic flow, the priority of deterministic flow should be set as high as possible. The delay analysis based on strict priority in each domain can be found in [SP-LATENCY], which gives the formula to evaluate the worst- case delay of each hop during the resource reservation procedure. The worst-case delay depends on the number of hops and the burst size of interference flows that may be faced on each hop. [EF-FIFO] also shows that, for FIFO packet scheduling be used to support the EF (expedited forwarding) per-hop behavior (PHB), if the network utilization level alpha < l/(H-l), the worst-case delay bound is inversely proportional to 1-alpha*(H-1), where H is the number of hops in the longest path of the network. Although the deadline in- time scheduling, the SP scheduling and EF FIFO scheduling are all work-conserving, the deadline in-time scheduling can further distinguish between emergency and non emergency according to deadline information other than traffic class. Therefore, when analyzing the latency of the in-time mode, the latency is not evaluated just according to the order in which the packets arrive at the scheduler, but also according to the deadline of the packets. An intuitive phenomenon is that if a packet unfortunately faces more interference delays at the upstream nodes, it will become more urgent at the downstream node, and will not always be unfortunate. This operation of dynamically modifying the key fields, e.g, the existing acumulated residence time deviation (E), of the packet can avoid always overestimating worst-case latency on all hops. According to schedulability condition, in-time mode can be pessimistic that its worst-case latancy bounds is the same as on-time mode. When the boundary node (e.g, R2) receives the deterministic traffic, it will decide whether to speed up or hold according to the existing accumulated residence time deviation information carried in the packet. The in-time traffic is always sent as soon as possible, Peng, et al. Expires 11 June 2023 [Page 23] Internet-Draft Deadline Routing December 2022 especially when the residence time deviation of the packet is negative. The on-time traffic always controls the sending time so that the average planned residence time is followed, especially when the residence time deviation of the packet is positive. For a specific deterministic flow, if it experiences too much latency in the SP domain (due to unreasonable setting of traffic class and the inability to distinguish between deterministic and non deterministic flows), even if the boundary node accelerates the transmission, it may not be able to achieve the target of low E2E latency. If the traffic experiences less latency within the SP domain, on-time mode will work to achieve the end-to-end jitter target. _____ ___ _____ ___ _____ ___ / \/ \___ / \/ \___ / \/ \___ / \ / \ / \ +--+ +--+ +--+ +--+ |R1| Strict Priority |R2| Strict Priority |R3| Strict Priority |R4| +--+ domian 1 +--+ domian 2 +--+ domian 3 +--+ \____ __/ \____ __/ \____ __/ \_______/ \_______/ \_______/ Figure 6: Example of partial upgrade 10. Deployment Considerations According to the above schedulability conditions, the planned residence time that can be provided in the network is related to the entire deployed service flows. Each planned residence time can be considered as a delay resource, and the smaller the residence time, the more valuable it is. The operator needs to match the corresponding planned residence time for each service. It is not recommended to allocate a large planned residence time for the on- time service, which will consume a large amount of buffer and is not beneficial to target jitter. However, the in-time service may also not want to be assigned to a larger planned residence time, that may be determined by its SLA that requires a smaller target end-to-end latency. Thus it is necessary to find a balance to meet more service requirements as much as possible. We believe that the planned residence time specified for the service flows is completely determined by the network, not by the user. Therefore, it is impossible to carry the planned residence time beyond the network capacity in the service packets. However, there may be some services that expect a large residence time, in this case Peng, et al. Expires 11 June 2023 [Page 24] Internet-Draft Deadline Routing December 2022 it is recommended to hold them at the border nodes, but not on the intermediate node. For this purpose, the MAX_CT configured on the border nodes can be much larger than the MAX_CT configured on the intermediate nodes, or the hierarchical deadline queues can be configured on the border nodes. As mentioned above, loose on-time scheduling or in-time scheduling can be applied in the network, and the jitter is removed at the edge node according to strict on-time scheduling. 11. Benefits The mechanism described in this document has the following benefits: * Time synchronization is not required between network nodes. Each node can flexibly set the authorization time length of the deadline queue according to its own outgoing port bandwidth. * Packet multiplexing based, it is an enhancement of PQ scheduling algorithm, friendly to the upgrade of packet switching network. All nodes in the network can independently use cycle timers with different timeout intervals to rotate the deadline queues. * The packet can control its expected residence time in the node. A single set of deadline queues supports multiple levels of residence time. * For in-time mode, the end-to-end delay is H*(F~D), jitter is H*Q; For on-time mode, the end-to-end delay is H*D, jitter is a just single authorization time. 12. IANA Considerations There is no IANA requestion for this document. 13. Security Considerations TBD 14. Acknowledgements TBD 15. References 15.1. Normative References Peng, et al. Expires 11 June 2023 [Page 25] Internet-Draft Deadline Routing December 2022 [I-D.peng-6man-deadline-option] Peng, S., Tan, B., and P. Liu, "Deadline Option", Work in Progress, Internet-Draft, draft-peng-6man-deadline-option- 01, 11 July 2022, . [I-D.stein-srtsn] Stein, Y. J., "Segment Routed Time Sensitive Networking", Work in Progress, Internet-Draft, draft-stein-srtsn-01, 29 August 2021, . [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, DOI 10.17487/RFC2474, December 1998, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . [RFC8655] Finn, N., Thubert, P., Varga, B., and J. Farkas, "Deterministic Networking Architecture", RFC 8655, DOI 10.17487/RFC8655, October 2019, . [RFC9320] Finn, N., Le Boudec, J.-Y., Mohammadpour, E., Zhang, J., and B. Varga, "Deterministic Networking (DetNet) Bounded Latency", RFC 9320, DOI 10.17487/RFC9320, November 2022, . 15.2. Informative References [EF-FIFO] "Fundamental Trade-Offs in Aggregate Packet Scheduling", 2001, . [IR-Theory] "A Theory of Traffic Regulators for Deterministic Networks with Application to Interleaved Regulators", 2018, . Peng, et al. Expires 11 June 2023 [Page 26] Internet-Draft Deadline Routing December 2022 [RPQ] "Exact Admission Control for Networks with a Bounded Delay Service", 1996, . [SP-LATENCY] "Guaranteed Latency with SP", 2020, . [UBS] "Urgency-Based Scheduler for Time-Sensitive Switched Ethernet Networks", 2016, . Authors' Addresses Shaofu Peng ZTE Corporation China Email: peng.shaofu@zte.com.cn Bin Tan ZTE Corporation China Email: tan.bin@zte.com.cn Peng Liu China Mobile China Email: liupengyjy@chinamobile.com Peng, et al. Expires 11 June 2023 [Page 27]