< draft-ietf-raw-oam-support-01.txt   draft-ietf-raw-oam-support-02.txt >
RAW F. Theoleyre RAW F. Theoleyre
Internet-Draft CNRS Internet-Draft CNRS
Updates: draft-ietf-raw-oam-support-00 G. Papadopoulos Updates: draft-ietf-raw-oam-support-01 G. Papadopoulos
(if approved) IMT Atlantique (if approved) IMT Atlantique
Intended status: Informational G. Mirsky Intended status: Informational G. Mirsky
Expires: November 25, 2021 ZTE Corp. Expires: December 5, 2021 ZTE Corp.
CJ. Bernardos CJ. Bernardos
UC3M UC3M
May 24, 2021 June 3, 2021
Operations, Administration and Maintenance (OAM) features for RAW Operations, Administration and Maintenance (OAM) features for RAW
draft-ietf-raw-oam-support-01 draft-ietf-raw-oam-support-02
Abstract Abstract
Some critical applications may use a wireless infrastructure. Some critical applications may use a wireless infrastructure.
However, wireless networks exhibit a bandwidth several orders of However, wireless networks exhibit a bandwidth several orders of
magnitude lower than wired networks. Besides, wireless transmissions magnitude lower than wired networks. Besides, wireless transmissions
are lossy by nature; the probability that a packet cannot be decoded are lossy by nature; the probability that a packet cannot be decoded
correctly by the receiver may be quite high. In these conditions, correctly by the receiver may be quite high. In these conditions,
guaranteeing that the network infrastructure works properly is providing high reliability and a low delay is challenging. This
particularly challenging, since we need to address some issues document lists the requirements of the Operation, Administration, and
specific to wireless networks. This document lists the requirements Maintenance (OAM) features recommended to construct a predictable
of the Operation, Administration, and Maintenance (OAM) features communication infrastructure on top of a collection of wireless
recommended to construct a predictable communication infrastructure segments. This document describes the benefits, problems, and trade-
on top of a collection of wireless segments. This document describes offs for using OAM in wireless networks to achieve Service Level
the benefits, problems, and trade-offs for using OAM in wireless Objectives (SLO).
networks to achieve Service Level Objectives (SLO).
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 25, 2021. This Internet-Draft will expire on December 5, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 36 skipping to change at page 2, line 36
2. Role of OAM in RAW . . . . . . . . . . . . . . . . . . . . . 6 2. Role of OAM in RAW . . . . . . . . . . . . . . . . . . . . . 6
2.1. Link concept and quality . . . . . . . . . . . . . . . . 7 2.1. Link concept and quality . . . . . . . . . . . . . . . . 7
2.2. Broadcast Transmissions . . . . . . . . . . . . . . . . . 7 2.2. Broadcast Transmissions . . . . . . . . . . . . . . . . . 7
2.3. Complex Layer 2 Forwarding . . . . . . . . . . . . . . . 8 2.3. Complex Layer 2 Forwarding . . . . . . . . . . . . . . . 8
2.4. End-to-end delay . . . . . . . . . . . . . . . . . . . . 8 2.4. End-to-end delay . . . . . . . . . . . . . . . . . . . . 8
3. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Information Collection . . . . . . . . . . . . . . . . . 8 3.1. Information Collection . . . . . . . . . . . . . . . . . 8
3.2. Continuity Check . . . . . . . . . . . . . . . . . . . . 9 3.2. Continuity Check . . . . . . . . . . . . . . . . . . . . 9
3.3. Connectivity Verification . . . . . . . . . . . . . . . . 9 3.3. Connectivity Verification . . . . . . . . . . . . . . . . 9
3.4. Route Tracing . . . . . . . . . . . . . . . . . . . . . . 9 3.4. Route Tracing . . . . . . . . . . . . . . . . . . . . . . 9
3.5. Fault Verification/detection . . . . . . . . . . . . . . 9 3.5. Fault Verification/detection . . . . . . . . . . . . . . 10
3.6. Fault Isolation/identification . . . . . . . . . . . . . 10 3.6. Fault Isolation/identification . . . . . . . . . . . . . 10
4. Administration . . . . . . . . . . . . . . . . . . . . . . . 10 4. Administration . . . . . . . . . . . . . . . . . . . . . . . 10
4.1. Worst-case metrics . . . . . . . . . . . . . . . . . . . 11 4.1. Worst-case metrics . . . . . . . . . . . . . . . . . . . 11
4.2. Efficient data retrieval . . . . . . . . . . . . . . . . 11 4.2. Efficient data retrieval . . . . . . . . . . . . . . . . 11
4.3. Reporting OAM packets to the source . . . . . . . . . . . 12 4.3. Reporting OAM packets to the source . . . . . . . . . . . 12
5. Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . 12 5. Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.1. Soft transition after reconfiguration . . . . . . . . . . 12 5.1. Soft transition after reconfiguration . . . . . . . . . . 13
5.2. Predictive maintenance . . . . . . . . . . . . . . . . . 12 5.2. Predictive maintenance . . . . . . . . . . . . . . . . . 13
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13
8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 13 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 13
9. Informative References . . . . . . . . . . . . . . . . . . . 13 9. Informative References . . . . . . . . . . . . . . . . . . . 14
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15
1. Introduction 1. Introduction
Reliable and Available Wireless (RAW) is an effort that extends Reliable and Available Wireless (RAW) is an effort that extends
DetNet to approach end-to-end deterministic performances over a DetNet to approach end-to-end deterministic performances over a
network that includes scheduled wireless segments. In wired network that includes scheduled wireless segments. In wired
networks, many approaches try to enable Quality of Service (QoS) by networks, many approaches try to enable Quality of Service (QoS) by
implementing traffic differentiation so that routers handle each type implementing traffic differentiation so that routers handle each type
of packets differently. However, this differentiated treatment was of packets differently. However, this differentiated treatment was
expensive for most applications. expensive for most applications.
skipping to change at page 3, line 36 skipping to change at page 3, line 36
depending on a broad set of parameters. Thus, providing high depending on a broad set of parameters. Thus, providing high
reliability through wireless segments is particularly challenging. reliability through wireless segments is particularly challenging.
Wired networks rely on the concept of _links_. All the devices Wired networks rely on the concept of _links_. All the devices
attached to a link receive any transmission. The concept of a link attached to a link receive any transmission. The concept of a link
in wireless networks is somewhat different from what many are used to in wireless networks is somewhat different from what many are used to
in wireline networks. A receiver may or may not receive a in wireline networks. A receiver may or may not receive a
transmission, depending on the presence of a colliding transmission, transmission, depending on the presence of a colliding transmission,
the radio channel's quality, and the external interference. Besides, the radio channel's quality, and the external interference. Besides,
a wireless transmission is broadcast by nature: any _neighboring_ a wireless transmission is broadcast by nature: any _neighboring_
device may be able to decode it. The document includes detailed device may be able to decode it. This document includes detailed
information on what the implications for the OAM features are. information on what the implications for the OAM features are.
Last but not least, radio links present volatile characteristics. If Last but not least, radio links present volatile characteristics. If
the wireless networks use an unlicensed band, packet losses are no the wireless networks use an unlicensed band, packet losses are no
longer temporally and spatially independent. Typically, links may longer temporally and spatially independent. Typically, links may
exhibit a very bursty characteristic, where several consecutive exhibit a very bursty characteristic, where several consecutive
packets may be dropped. Thus, providing availability and reliability packets may be dropped, because of e.g. temporary external
on top of the wireless infrastructure requires specific Layer 3 interference. Thus, providing availability and reliability on top of
mechanisms to counteract these bursty losses. the wireless infrastructure requires specific Layer 3 mechanisms to
counteract these bursty losses.
Operations, Administration, and Maintenance (OAM) Tools are of Operations, Administration, and Maintenance (OAM) Tools are of
primary importance for IP networks [RFC7276]. It defines a toolset primary importance for IP networks [RFC7276]. They define a toolset
for fault detection, isolation, and performance measurement. for fault detection, isolation, and performance measurement.
The primary purpose of this document is to detail the specific The primary purpose of this document is to detail the specific
requirements of the OAM features recommended to construct a requirements of the OAM features recommended to construct a
predictable communication infrastructure on top of a collection of predictable communication infrastructure on top of a collection of
wireless segments. This document describes the benefits, problems, wireless segments. This document describes the benefits, problems,
and trade-offs for using OAM in wireless networks to provide and trade-offs for using OAM in wireless networks to provide
availability and predictability. availability and predictability.
1.1. Terminology 1.1. Terminology
skipping to change at page 6, line 25 skipping to change at page 6, line 25
RAW networks expect to make the communications reliable and RAW networks expect to make the communications reliable and
predictable on top of a wireless network infrastructure. Most predictable on top of a wireless network infrastructure. Most
critical applications will define an SLO to be required for the data critical applications will define an SLO to be required for the data
flows they generate. RAW considers network plane protocol elements flows they generate. RAW considers network plane protocol elements
such as OAM to improve the RAW operation at the service and the such as OAM to improve the RAW operation at the service and the
forwarding sub-layers. forwarding sub-layers.
To respect strict guarantees, RAW relies on the Path Selection Engine To respect strict guarantees, RAW relies on the Path Selection Engine
(PSE) (as defined in [I-D.pthubert-raw-architecture]) to monitor and (PSE) (as defined in [I-D.pthubert-raw-architecture]) to monitor and
maintain the network. As an example, a Software-Defined Network maintain the L3 network. An L2 scheduler may be used to allocate
(SDN) controller may be used to schedule the transmissions in the transmission opportunities, based on the radio link characteristics,
deployed network, based on the radio link characteristics, SLO of the SLO of the flows, the number of packets to forward. The PSE exploits
flows, the number of packets to forward. Thus, resources have to be the L2 resources reserved by the scheduler, and organizes the L3
provisioned a priori to handle any defect. OAM represents the core paths to introduce redundancy, fault tolerance, and create backup
of the pre-provisioning process and maintains the network operational paths. OAM represents the core of the pre-provisioning process by
by updating the schedule dynamically. supervising the network. It maintains a global view of the
network resources, to detect defects, faults, over-provisioning,
anomalies.
Fault-tolerance also assumes that multiple paths have to be Fault-tolerance also assumes that multiple paths have to be
provisioned so that an end-to-end circuit keeps on existing whatever provisioned so that an end-to-end circuit keeps on existing whatever
the conditions. The Packet Replication and Elimination Function the conditions. The Packet Replication and Elimination Function
([PREF-draft]) on a node is typically controlled by the PSE. OAM ([PREF-draft]) on a node is typically controlled by the PSE. OAM
mechanisms can be used to monitor that PREOF is working correctly on mechanisms can be used to monitor that PREOF is working correctly on
a node and within the domain. a node and within the domain.
To be energy-efficient, reserving some dedicated out-of-band To be energy-efficient, reserving some dedicated out-of-band
resources for OAM seems idealistic, and only in-band solutions are resources for OAM seems idealistic, and only in-band solutions are
considered here. considered here.
RAW supports both proactive and on-demand troubleshooting. RAW supports both proactive and on-demand troubleshooting.
Proactively, it is necessary to detect anomalies, to report defects,
or to reduce over-provisioning if it is not required. However, on-
demand may also be required to identify the cause of a specific
defect. Indeed, some specific faults may only be detected with a
global, detailed view of the network, which is too expensive to
acquire in the normal operating mode.
The specific characteristics of RAW are discussed below. The specific characteristics of RAW are discussed below.
2.1. Link concept and quality 2.1. Link concept and quality
In wireless networks, a _link_ does not exist physically. A device In wireless networks, a _link_ does not exist physically. A device
has a set of *neighbors* that correspond to all the devices that have has a set of *neighbors* that correspond to all the devices that have
a non-null probability of correctly receiving its packets. We make a a non-null probability of correctly receiving its packets. We make a
distinction between: distinction between:
o point-to-point (p2p) link with one transmitter and one receiver. o point-to-point (p2p) link with one transmitter and one receiver.
These links are used to transmit unicast packets. These links are used to transmit unicast packets.
o point-to-multipoint (p2m) link associates one transmitter and a o point-to-multipoint (p2m) link associates one transmitter and a
collection of receivers. For instance, broadcast packets assume collection of receivers. For instance, broadcast packets assume
the existence of p2m links to avoid duplicating a broadcast packet the existence of p2m links to avoid duplicating a broadcast packet
to reach each possible radio neighbor. to reach each possible radio neighbor.
In scheduled radio networks, p2m and p2p links are commonly not In scheduled radio networks, p2m and p2p links are commonly not
scheduled simultaneously to save energy. More precisely, only one scheduled simultaneously to save energy, and/or to reduce the number
part of the neighbors may wake-up at a given instant. of collisions. More precisely, only one part of the neighbors may
wake-up at a given instant.
Anycast is used in p2m links to improve reliability. A Anycast is used in p2m links to improve reliability. A
collection of receivers is scheduled to wake up simultaneously, so collection of receivers is scheduled to wake up simultaneously, so
that the transmission fails only if none of the receivers is able to that the transmission fails only if none of the receivers is able to
decode the packet. decode the packet.
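As a minimal sketch of why anycast improves reliability, the delivery probability of one anycast transmission can be computed from the per-receiver PDR values. The function name and the inputs are hypothetical, and the sketch assumes that losses are independent across receivers, which is a simplification (a shared source of external interference would correlate them).

```python
from math import prod


def anycast_delivery_probability(pdrs):
    """Probability that at least one scheduled receiver decodes the packet.

    `pdrs` holds one Packet Delivery Ratio per receiver that wakes up.
    Assumption (not from the draft): losses are independent across receivers.
    """
    if not pdrs:
        # No receiver is awake, so the transmission cannot succeed.
        return 0.0
    for pdr in pdrs:
        if not 0.0 <= pdr <= 1.0:
            raise ValueError(f"PDR must be within [0, 1], got {pdr}")
    # The transmission fails only if every receiver fails to decode it.
    return 1.0 - prod(1.0 - pdr for pdr in pdrs)
```

Under that assumption, two receivers with a PDR of 0.5 each yield anycast_delivery_probability([0.5, 0.5]) == 0.75, compared to 0.5 for a unicast transmission to either of them.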
Each wireless link is associated with a link quality, often measured Each wireless link is associated with a link quality, often measured
as the Packet Delivery Ratio (PDR), i.e., the probability that the as the Packet Delivery Ratio (PDR), i.e., the probability that the
receiver can decode the packet correctly. It is worth noting that receiver can decode the packet correctly. It is worth noting that
this link quality depends on many criteria, such as the level of this link quality depends on many criteria, such as the level of
skipping to change at page 9, line 29 skipping to change at page 9, line 31
all the receivers have different probabilities of forwarding a all the receivers have different probabilities of forwarding a
packet. To verify a delay SLO for a given flow, we must also packet. To verify a delay SLO for a given flow, we must also
consider all the possible combinations, leading to a probability consider all the possible combinations, leading to a probability
distribution function for end-to-end transmissions. If this distribution function for end-to-end transmissions. If this
verification is implemented naively, the number of combinations to verification is implemented naively, the number of combinations to
test may be exponential and too costly for wireless networks with low test may be exponential and too costly for wireless networks with low
bandwidth. bandwidth.
3.4. Route Tracing 3.4. Route Tracing
Wireless networks are meshed by nature: we have many redundant radio Wireless networks are broadcast by nature: a radio transmission can
links. These meshed networks are both an asset and a drawback: while be decoded by any radio neighbor. In multihop wireless networks,
several paths exist between two endpoints, and we should choose the several paths exist between two endpoints. In hub networks, a device
most efficient one(s), concerning specifically the reliability, and may be covered by several Access Points. We should choose the most
efficient path or AP, concerning specifically the reliability, and
the delay. the delay.
Thus, multipath routing can be considered to make the network fault- Thus, multipath routing / multi-attachment can be considered to make
tolerant. Even better, we can exploit the broadcast nature of the network fault-tolerant. Even better, we can exploit the
wireless networks to exploit meshed multipath routing: we may have broadcast nature of wireless networks: we may have
multiple Maintenance Intermediate Endpoints (MIP) for each hop in the multiple Maintenance Intermediate Endpoints (MIP) for each such
path. In that way, each Maintenance Intermediate Endpoint has hop. While it may be reasonable in the multi-attachment
several possible next hops in the forwarding plane. Thus, all the case, the complexity quickly increases with the path length. Indeed,
possible paths between two maintenance endpoints should be retrieved, each Maintenance Intermediate Endpoint has several possible next hops
which may quickly become untractable if we apply a naive approach. in the forwarding plane. Thus, all the possible paths between two
maintenance endpoints should be retrieved, which may quickly become
intractable if we apply a naive approach.
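The growth in the number of paths can be illustrated with a small sketch. It builds a hypothetical layered topology (a source, a number of layers of relays, and a destination, where every node can forward to every node of the next layer) and counts the paths with memoization instead of listing them one by one. The topology, the node names, and both function names are assumptions made for this illustration only.

```python
from functools import lru_cache


def layered_topology(width, depth):
    """Build a hypothetical layered topology as a next-hop dictionary.

    "src" reaches `depth` successive layers of `width` relays, then "dst".
    Every node may forward to every node of the following layer.
    """
    layers = [["src"]]
    for layer in range(depth):
        layers.append([f"r{layer}_{index}" for index in range(width)])
    layers.append(["dst"])
    next_hops = {"dst": []}
    for current, following in zip(layers, layers[1:]):
        for node in current:
            next_hops[node] = list(following)
    return next_hops


def count_paths(next_hops, source, destination):
    """Count the loop-free paths of an acyclic next-hop graph.

    Memoization visits each node once, whereas listing every path
    takes a time proportional to the number of paths.
    """
    @lru_cache(maxsize=None)
    def walk(node):
        if node == destination:
            return 1
        return sum(walk(following) for following in next_hops[node])

    return walk(source)
```

With 3 candidate relays per hop and 4 relay layers, count_paths(layered_topology(3, 4), "src", "dst") returns 81, i.e., width to the power depth. Tracing each of these paths individually is what becomes too costly as the path length increases.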
3.5. Fault Verification/detection 3.5. Fault Verification/detection
Wired networks tend to present stable performances. On the contrary, Wired networks tend to present stable performances. On the contrary,
wireless networks are time-variant. We must consequently make a wireless networks are time-variant. We must consequently make a
distinction between _normal_ evolutions and malfunction. distinction between _normal_ evolutions and malfunction.
3.6. Fault Isolation/identification 3.6. Fault Isolation/identification
The network has isolated and identified the cause of the fault. The network has isolated and identified the cause of the fault.
skipping to change at page 11, line 5 skipping to change at page 11, line 13
considered by an operator to take proper decisions. considered by an operator to take proper decisions.
These metrics should be collected per device, virtual circuit, and These metrics should be collected per device, virtual circuit, and
path, as DetNet already does. However, RAW has to deal with a path, as DetNet already does. However, RAW has to deal with a
finer granularity: finer granularity:
o per radio channel to measure, e.g., the level of external o per radio channel to measure, e.g., the level of external
interference, and to be able to apply counter-measures (e.g., interference, and to be able to apply counter-measures (e.g.,
blacklisting). blacklisting).
o per physical radio technology / interface if a device has multiple
NICs.
o per link to detect misbehaving links (asymmetrical link, o per link to detect misbehaving links (asymmetrical link,
fluctuating quality). fluctuating quality).
o per resource block: a collision in the schedule is particularly o per resource block: a collision in the schedule is particularly
challenging to identify in radio networks with spectrum reuse. In challenging to identify in radio networks with spectrum reuse. In
particular, a collision may not be systematic (depending on the particular, a collision may not be systematic (depending on the
radio characteristics and the traffic profile) radio characteristics and the traffic profile).
4.1. Worst-case metrics 4.1. Worst-case metrics
RAW inherits the same requirements as DetNet: we need to know the RAW inherits the same requirements as DetNet: we need to know the
distribution of a collection of metrics. However, wireless networks distribution of a collection of metrics. However, wireless networks
are known to be highly variable. Changes may be frequent, and may are known to be highly variable. Changes may be frequent, and may
exhibit a periodical pattern. Collecting and analyzing this amount exhibit a periodical pattern. Collecting and analyzing this amount
of measurements is challenging. of measurements is challenging.
Wireless networks are known to be lossy, and RAW has to implement Wireless networks are known to be lossy, and RAW has to implement
strategies to improve reliability on top of unreliable links. Hybrid strategies to improve reliability on top of unreliable links.
Automatic Repeat reQuest (ARQ) has typically to enable Reliability is typically achieved through Automatic Repeat Request
retransmissions based on the end-to-end reliability and latency (ARQ), and Forward Error Correction (FEC). Since the different flows
requirements. do not have the same SLO, RAW must adjust the ARQ and FEC based on the
link and path characteristics.
4.2. Efficient data retrieval 4.2. Efficient data retrieval
We have to minimize the number of statistics / measurements to We have to minimize the number of statistics / measurements to
exchange: exchange:
o energy efficiency: low-power devices have to limit the volume of o energy efficiency: low-power devices have to limit the volume of
monitoring information since every bit consumes energy. monitoring information since every bit consumes energy.
o bandwidth: wireless networks exhibit a bandwidth significantly o bandwidth: wireless networks exhibit a bandwidth significantly
skipping to change at page 11, line 48 skipping to change at page 12, line 12
o per-packet cost: it is often more expensive to send several o per-packet cost: it is often more expensive to send several
packets instead of combining them in a single link-layer frame. packets instead of combining them in a single link-layer frame.
In conclusion, we have to take care of power and bandwidth In conclusion, we have to take care of power and bandwidth
consumption. The following techniques aim to reduce the cost of such consumption. The following techniques aim to reduce the cost of such
maintenance: maintenance:
on-path collection: some control information is inserted in the on-path collection: some control information is inserted in the
data packets if they do not fragment the packet (i.e., the MTU is data packets if they do not fragment the packet (i.e., the MTU is
not exceeded). Information Elements represent a standardized way not exceeded). Information Elements represent a standardized way
to handle such information; to handle such information. IP hop by hop extension headers may
help to collect metrics all along the path;
flags/fields: we have to set flags in the packets to be able to flags/fields: we have to set flags in the packets to be able to
monitor the forwarding process accurately. A sequence monitor the forwarding process accurately. A sequence
number field may help to detect packet losses. Similarly, path number field may help to detect packet losses. Similarly, path
inference tools such as [ipath] insert additional information in inference tools such as [ipath] insert additional information in
the headers to identify the path followed by a packet a the headers to identify the path followed by a packet a
posteriori. posteriori.
hierarchical monitoring; localized and centralized mechanisms have hierarchical monitoring: localized and centralized mechanisms have
to be combined together. Typically, a local mechanism should to be combined together. Typically, a local mechanism should
continuously monitor a set of metrics and trigger distant OAM continuously monitor a set of metrics and trigger distant OAM
exchanges only when a fault is detected (but possibly not exchanges only when a fault is detected (but possibly not
identified). For instance, local temporary defects must not identified). For instance, local temporary defects must not
trigger expensive OAM transmissions. trigger expensive OAM transmissions. Besides, the wireless
segments often represent the weakest parts of a path: the volume
of control information they produce has to be fixed accordingly.
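The sequence number field and the hierarchical monitoring described above can be sketched together as follows. A device locally infers losses from gaps in the received sequence numbers, and escalates to a distant OAM exchange only when the longest burst of consecutive losses exceeds a threshold. All names and the threshold are hypothetical; the sketch also assumes that sequence numbers do not wrap around, and that a gap inside the observed range means a loss rather than a late packet.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the lowest and highest seen.

    Assumption (illustration only): no wrap-around of the sequence space.
    """
    seen = set(received)
    if not seen:
        return []
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


def longest_loss_burst(missing):
    """Length of the longest run of consecutive missing sequence numbers."""
    longest = current = 0
    previous = None
    for seq in sorted(missing):
        # Extend the run if this loss directly follows the previous one.
        current = current + 1 if previous is not None and seq == previous + 1 else 1
        longest = max(longest, current)
        previous = seq
    return longest


def needs_remote_oam(received, max_burst):
    """Local decision: escalate only when a loss burst exceeds `max_burst`.

    Isolated losses are handled locally and trigger no OAM transmission.
    """
    return longest_loss_burst(missing_sequence_numbers(received)) > max_burst
```

For instance, receiving the sequence numbers 1, 2, 5, 4, 8 means that 3, 6 and 7 are missing, with a longest burst of 2 packets. With max_burst set to 2, the device stays silent; with max_burst set to 1, it triggers a distant OAM exchange.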
4.3. Reporting OAM packets to the source 4.3. Reporting OAM packets to the source
TODO: statistics are collected when a packet goes from the source to TODO: statistics are collected when a packet goes from the source to
the destination. However, they also have to be reported to the source. the destination. However, they also have to be reported to the source.
Problem: resources may not be reserved bidirectionally. Even worse: Problem: resources may not be reserved bidirectionally. Even worse:
the inverse path may not exist. the inverse path may not exist.
Reporting everything exhaustively to the source may in most cases be too
expensive. Thus, devices may take local decisions when possible, and
receive end-to-end information when possible.
5. Maintenance 5. Maintenance
Maintenance needs to facilitate the maintenance (repairs and Maintenance needs to facilitate the maintenance (repairs and
upgrades). In wireless networks, repairs are expected to occur much upgrades). In wireless networks, repairs are expected to occur much
more frequently, since the link quality may be highly time-variant. more frequently, since the link quality may be highly time-variant.
Thus, maintenance represents a key feature for RAW. Thus, maintenance represents a key feature for RAW.
5.1. Soft transition after reconfiguration 5.1. Soft transition after reconfiguration
Because of the wireless medium, the link quality may fluctuate, and Because of the wireless medium, the link quality may fluctuate, and
 End of changes. 24 change blocks. 
52 lines changed or deleted 75 lines changed or added
