ippm                                                        R. Geib, Ed.
Internet-Draft                                          Deutsche Telekom
Intended status: Standards Track                         January 2, 2020
Expires: July 5, 2020

               A Connectivity Monitoring Metric for IPPM
              draft-geib-ippm-connectivity-monitoring-02

Abstract

   Within a Segment Routing domain, segment routed measurement packets
   can be sent along pre-determined paths.  This enables new kinds of
   measurements.
   Connectivity monitoring allows supervising the state
   and performance of a connection or a (sub)path from one or a few
   central monitoring systems.  This document specifies a suitable
   Type-P connectivity monitoring metric.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 5, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  A brief segment routing connectivity monitoring framework
   3.  Singleton Definition for
       Type-P-SR-Path-Connectivity-and-Congestion
     3.1.  Metric Name
     3.2.  Metric Parameters
     3.3.  Metric Units
     3.4.  Definition
     3.5.  Discussion
     3.6.  Methodologies
     3.7.  Errors and Uncertainties
     3.8.  Reporting the Metric
   4.  Singleton Definition for
       Type-P-SR-Path-Round-Trip-Delay-Estimate
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Author's Address

1.  Introduction

   Within a Segment Routing domain, Segment Routing enables sending
   measurement packets along pre-determined segment routed paths
   [RFC8402].  A segment routed path may consist of pre-determined
   sub-paths down to specific router interfaces.  It may also consist of
   sub-paths spanning multiple routers, given that all segments needed
   to address a desired path are available and known at the SR domain
   edge interface.

   A Path Monitoring System or PMS (see [RFC8403]) is a dedicated
   central Segment Routing domain monitoring device (as compared to a
   distributed monitoring approach based on router data and router
   functions only).  Individual sub-paths or point-to-point connections
   are monitored for different purposes.  An IGP exchanges hello
   messages between neighbors to keep routing alive and to swiftly adapt
   routing to topology changes.
   Network operators may be interested in monitoring
   connectivity and congestion of interfaces or sub-paths at a timescale
   of seconds, minutes or hours.  In both cases, the period is
   significantly shorter than that of commodity interface monitoring
   based on router counters, which may be collected only on a minute
   timescale if the processor or monitoring data load is to be kept low.

   The IPPM architecture was a first step in that direction [RFC2330].
   Commodity IPPM solutions require dedicated measurement systems, a
   large number of measurement agents and synchronised clocks.
   Monitoring a domain from edge to edge by commodity IPPM solutions
   increases scalability of the monitoring system.  But localising the
   site of a detected change in network behaviour may then require
   network tomography methods.

   The IPPM Metrics for Measuring Connectivity offer generic
   connectivity metrics [RFC2678].  These metrics allow measuring
   connectivity between end nodes without making any assumption about
   the paths between them.  The metric and the Type-P packet specified
   by this document follow a different approach: they are designed to
   monitor connectivity and performance of a specific single link or a
   path segment.  The underlying definition of connectivity is partially
   the same: a packet not reaching a destination indicates a loss of
   connectivity.  An IGP re-route may indicate the loss of a link, while
   it might not cause a loss of connectivity between end systems.  The
   metric specified here is able to detect the loss of a link if the
   end-to-end delay along the new route differs from that of the
   original path.

   A Segment Routing PMS which is part of an SR domain is IGP topology
   aware, covering the IP and (if present) the MPLS layer topology
   [RFC8402].  This allows designing a PMS which can steer packets along
   arbitrary pre-determined concatenated sub-paths, identified by
   suitable segments.
   Basically, a number of overlaid measurement paths
   is set up.  The delays of packets sent along each one of these paths
   are measured.  Single changes in topology cause correlated changes in
   the measurement packet delay or connectivity of different measurement
   paths.  With a suitable set-up, one measurement path per connection
   (or path) to be monitored suffices, and the location of congestion
   can still be identified (in addition to the monitoring information
   revealed by a comparable single commodity ICMP ping relation, which
   fails to identify the location of a congested interface).  Combining
   the SR measurement path configuration with a priori network
   tomography assumptions and methods allows localising detected
   changes.  The latter requires setting up multiple measurement paths
   which share sub-paths following the constraints derived from network
   tomography, and a suitable evaluation of measurement results.

   This document specifies a Type-P metric determining properties of an
   SR path.  The metric allows monitoring connectivity and congestion of
   interfaces and further allows locating the path or interface which
   caused a change in the reported Type-P metric.  This document is
   focussed on the MPLS layer, but the methodology may be applied within
   SR domains or MPLS domains in general.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  A brief segment routing connectivity monitoring framework

   The Segment Routing IGP topology information consists of the IP and
   (if present) the MPLS layer topology.  The minimum SR topology
   information consists of Node-Segment-Identifiers (Node-SID),
   identifying an SR router.
   The IGP exchange of Adjacency-SIDs
   [I-D.draft-ietf-isis-segment-routing-extensions], which identify
   local interfaces to adjacent nodes, is optional.  It is RECOMMENDED
   to distribute Adj-SIDs in a domain operating a PMS to monitor
   connectivity as specified below.  If Adj-SIDs aren't available,
   [RFC8029] provides methods to steer packets along desired paths
   by the proper choice of an MPLS Echo-request IP-destination address.
   A detailed description of [RFC8029] methods as a replacement of
   Adj-SIDs is out of scope of this document.

   A round trip measurement between two adjacent nodes is a simple
   method to monitor connectivity of a connecting link.  If multiple
   links are operational between two adjacent nodes and only a single
   one fails, a single plain round trip measurement may fail to identify
   which link has failed.  A round trip measurement also fails to
   identify which interface is congested, even if only a single link
   connects two adjacent nodes.

   Segment Routing enables the set-up of extended measurement loops.
   Several different measurement loops can be set up.  If these form a
   partial overlay, any change in the network properties impacts more
   than a single loop's round trip time (or causes drops of packets of
   more than one loop).  Randomly chosen loop paths including the
   interfaces or paths to be monitored may fail to produce unique result
   patterns.  The approach picked here uses a specified measurement loop
   and path overlay design.  A centralised monitoring approach benefits
   from keeping the number of required measurement loops low: this
   improves scalability and also keeps the number of required packets
   and of results to be evaluated and correlated low.

   An additional property of the measurement path set-up specified below
   is that it allows estimating the packet round trip and the one-way
   delay of a monitored link (or path).  The delay along a single link
   is not perfectly symmetric.  Packet processing causes small delay
   differences per interface and direction.  These cause an error, which
   can't be quantified or removed by the specified method.  Quantifying
   this error requires a different measurement set-up.  As this will
   introduce additional measurement loops, packets and evaluations, the
   cost in terms of reduced scalability is not felt to be worth the
   benefit in measurement accuracy.  IPPM however honors precision more
   than accuracy and the mentioned processing differences are relatively
   stable, resulting in relatively precise delay estimates.

   An example SR domain is shown below.  The PMS shown should monitor
   the connectivity of all 6 links between nodes L100 and L200 on one
   side and the connected nodes L050, L060 and L070 on the other side.
   The round trip times per measurement loop are assumed to exhibit
   unique delays.

      +---+      +----+     +----+
      |PMS|      |L100|-----|L050|
      +---+      +----+\   /+----+
        |        /   \  \_/_____
        |       /     \  /      \+----+
      +----+/          \/_  +----|L060|
      |L300|           / |/      +----+
      +----+\         / /\_
             \       / /   \
              \+----+ /    +----+
               |L200|-----|L070|
               +----+     +----+

                 Connectivity verification with a PMS

                               Figure 1

   The SID values are picked for convenient reading only.  Node-SID: 100
   identifies L100, Node-SID: 300 identifies L300 and so on.  Adj-SID
   10050: Adjacency L100 to L050, Adj-SID 10060: Adjacency L100 to L060,
   Adj-SID 60200: Adjacency L060 to L200.

   Monitoring the 6 links between Ln00 and L0m0 nodes requires 6
   measurement loops, each of which has the following properties:

   o  Each loop follows a single round trip from one Ln00 to one L0m0
      (e.g., between L100 and L050).

   o  Each loop passes two more links: one between that Ln00 and another
      L0m0 and from there to the other Ln00 (e.g., between L100 and L060
      and then L060 to L200).

   o  Every link is passed as a round trip by exactly one measurement
      loop and unidirectionally by exactly two other loops, and the
      latter two pass it in opposing directions (that's three loops
      passing each single link, e.g., one having a round trip L100 to
      L050 and back, a second passing L100 to L050 only and a third loop
      passing L050 to L100 only).

   Note that any 6 links between two to six nodes can be monitored that
   way too (if multiple parallel links between two nodes are monitored,
   the differences in delay may require a sufficiently high clock
   resolution, if applicable).

   This results in 6 measurement loops for the given example (the start
   and end of each measurement loop is PMS to L300 to L100 or L200 and a
   similar sub-path on the return leg; it is omitted here for brevity):

   1.  M1 is the delay along L100 -> L050 -> L100 -> L060 -> L200

   2.  M2 is the delay along L100 -> L060 -> L100 -> L070 -> L200

   3.  M3 is the delay along L100 -> L070 -> L100 -> L050 -> L200

   4.  M4 is the delay along L200 -> L050 -> L200 -> L060 -> L100

   5.  M5 is the delay along L200 -> L060 -> L200 -> L070 -> L100

   6.  M6 is the delay along L200 -> L070 -> L200 -> L050 -> L100

   An example for a stack of Node-SID segments of a loop allowing to
   capture M1 is (top to bottom): 100 | 050 | 100 | 060 | 200 | PMS.

   An example for a stack of Adj-SID segments for the loop resulting in
   M1 is (top to bottom): 100 | 10050 | 50100 | 10060 | 60200 | PMS.  As
   can be seen, the Node-SIDs 100 and PMS are present at top and bottom
   of the segment stack.  Their purpose is to transport the packet from
   the PMS to the start of the measurement loop at L100 and return it to
   the PMS from its end.
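
   The loop and segment stack construction described above can be
   sketched in a few lines of Python.  This is a minimal illustration
   only and not part of the metric specification: the function names
   are hypothetical, SIDs are modelled as plain integers following the
   reading convention of this example (an Adj-SID is the concatenation
   of the two Node-SID values), and the PMS segment is represented by
   the string "PMS".

```python
# Illustrative sketch; names and the SID encoding are assumptions that
# follow the "convenient reading" convention of the example above.
UPPER = [100, 200]      # Node-SIDs of L100 and L200
LOWER = [50, 60, 70]    # Node-SIDs of L050, L060 and L070


def build_loops(upper, lower):
    """Return loops M1..M6 as node sequences (PMS legs omitted)."""
    loops = []
    for start in upper:
        other = upper[1] if start == upper[0] else upper[0]
        for i, first in enumerate(lower):
            # Round trip start <-> first, then via the next lower node
            # to the other upper node.
            second = lower[(i + 1) % len(lower)]
            loops.append([start, first, start, second, other])
    return loops


def node_sid_stack(loop):
    """Node-SID stack, top to bottom, ending with the PMS segment."""
    return [f"{sid:03d}" for sid in loop] + ["PMS"]


def adj_sid_stack(loop):
    """Node-SID of the loop start, one Adj-SID per hop, then PMS."""
    hops = zip(loop, loop[1:])
    return [f"{loop[0]:03d}"] + [f"{a}{b}" for a, b in hops] + ["PMS"]


if __name__ == "__main__":
    m1 = build_loops(UPPER, LOWER)[0]
    print(" | ".join(node_sid_stack(m1)))
    print(" | ".join(adj_sid_stack(m1)))
```

   For M1 the two functions reproduce the stacks given above.  The same
   construction applies to any two "upper" and three "lower" nodes, as
   long as the Adj-SIDs are looked up from the IGP rather than derived
   by concatenation, which is done here for readability only.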

   The measurement loops set up as shown have the following properties:

   o  If the loops are set up using Node-SIDs only, any single complete
      loss of connectivity caused by a failing single link between any
      Ln00 and any L0m0 node briefly disturbs three loops (and changes
      their measured delay).  Traffic to Node-SIDs is rerouted.

   o  If the loops are set up using Adj-SIDs only (and Node-SIDs only to
      send the packet from PMS to the loop starting point and from the
      loop end back to the PMS), any single complete loss of
      connectivity caused by a failing single link between any Ln00 and
      any L0m0 node terminates the traffic along three loops.  The
      packets of these loops will be dropped until the link gets back
      into service.  Traffic to Adj-SIDs is not rerouted.

   o  Any congested single interface between any Ln00 and any L0m0 node
      only impacts the measured delay of two measurement loops.

   o  As an example, the formula for a single Round Trip Delay (RTD)
      is:

         4 * RTD_L100-L050-L100 = 3 * M1 + M3 + M6 - M2 - M4 - M5

      The one-way delays of the other links cancel in this sum only to
      the extent that they are symmetric per direction (see the remarks
      on delay symmetry above).

   A closer look reveals that each single event of interest for the
   proposed metric, i.e., a loss of connectivity or a case of
   congestion, impacts a unique, a priori determinable set of
   measurement loops.  If, e.g., connectivity is lost between L200
   and L050, measurement loops (3), (4) and (6) indicate a change in the
   measured delay.

   As a second example, if the interface L070 to L100 is congested,
   measurement loops (3) and (5) indicate a change in the measured
   delay.  Without listing all events, all cases of single losses of
   connectivity or single events of congestion influence only delay
   measurements of a unique set of measurement loops.

   A congestion event adding latency to two specific measurement loops
   allows calculation of the delay added by the queue at the congested
   interface.
   Thus, the resulting RTD increase can be assigned to a
   single interface.

3.  Singleton Definition for Type-P-SR-Path-Connectivity-and-Congestion

3.1.  Metric Name

   Type-P-SR-Path-Connectivity-and-Congestion

3.2.  Metric Parameters

   o  Src, the IP address of a source host

   o  Dst, the IP address of a destination host if IP routing is
      applicable; in the case of MPLS routing, a diagnostic address as
      specified by [RFC8029]

   o  T, a time

   o  lambda, a rate in reciprocal seconds

   o  L, a packet length in bits.  The packets of a Type-P packet stream
      from which the sample Path-Connectivity-and-Congestion metric is
      taken MUST all be of the same length.

   o  MLA, a Monitoring Loop Address information ensuring that a
      singleton passes a single sub-path_a to be monitored
      bidirectionally, a sub-path_b to be monitored unidirectionally and
      a sub-path_c to be monitored unidirectionally, where sub-path_a,
      sub-path_b and sub-path_c MUST NOT be identical.

   o  P, the specification of the packet type, over and above the source
      and destination addresses

   o  DS, a constant time interval between two Type-P packets

3.3.  Metric Units

   A sequence of consecutive time values.

3.4.  Definition

   A moving average of AV time values per measurement path is evaluated
   by a change point detection algorithm.  The temporal packet spacing
   value DS represents the smallest period within which a change in
   connectivity or congestion may be detected.

   A single loss of connectivity of a sub-path between two nodes affects
   three different measurement paths.  Depending on the value chosen for
   DS, packet loss might occur (note that the moving average evaluation
   needs to span a longer period than convergence time; alternatively,
   packet loss visible along the three measurement paths may serve as an
   evaluation criterion).  After routing convergence the Type-P packets
   along the three measurement paths show a change in delay.
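
   As an illustration of the evaluation described in this section, the
   following Python sketch flags a delay change on a single measurement
   path.  The draft does not specify a change point detection
   algorithm; comparing the mean of the newest AV samples with the mean
   of the AV samples before them is an assumption made here, and the
   class name, the threshold and the use of integer microseconds are
   hypothetical.

```python
from collections import deque


class PathChangeDetector:
    """Flags a delay change on one measurement path.

    Assumption: the mean of the newest `av` samples is compared with
    the mean of the `av` samples preceding them.  `threshold` is a
    hypothetical tuning value in the same unit as the samples.
    """

    def __init__(self, av, threshold):
        self.av = av
        self.threshold = threshold
        self.samples = deque(maxlen=2 * av)

    def add(self, delay):
        """Add one delay sample; return True if a change is seen."""
        self.samples.append(delay)
        if len(self.samples) < 2 * self.av:
            return False  # not enough history for two windows yet
        values = list(self.samples)
        old_mean = sum(values[: self.av]) / self.av
        new_mean = sum(values[self.av:]) / self.av
        return abs(new_mean - old_mean) > self.threshold


if __name__ == "__main__":
    detector = PathChangeDetector(av=4, threshold=900)
    # Delays in microseconds: stable at 10 ms, then a 2 ms increase.
    for delay in [10000] * 8 + [12000] * 4:
        print(delay, detector.add(delay))
```

   Running one such detector per measurement path and collecting the
   paths that report a change at about the same time yields the set of
   two or three paths that the metric evaluates.  Lost packets would
   need separate handling, e.g., by counting missing sequence numbers.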

   A congestion of a single interface of a sub-path connecting two nodes
   affects two different measurement paths.  The Type-P packets
   along the two congested measurement paths show an additional change
   in delay.

3.5.  Discussion

   Detection of multiple losses of monitored sub-path connectivity or
   of congestion of multiple monitored sub-paths may be possible.  These
   cases have not been investigated, but may occur in the case of Shared
   Risk Link Groups.  Monitoring Shared Risk Link Groups and sub-paths
   with multiple failures and congestion is not within scope of this
   document.

3.6.  Methodologies

   For the given Type-P, the methodology is as follows:

   o  The set of measurement paths MUST be routed in a way that each
      single loss of connectivity and each case of single interface
      congestion of one of the sub-paths passed by a Type-P packet
      creates a unique pattern, i.e., the Type-P packets belonging to a
      unique subset of all configured measurement paths indicate a
      change in the measured delay.  As a minimum, each sub-path to be
      monitored MUST be passed

      *  by one measurement_path_1 and its Type-P packet in
         bidirectional direction

      *  by one measurement_path_2 and its Type-P packet in "downlink"
         direction

      *  by one measurement_path_3 and its Type-P packet in "uplink"
         direction

   o  "Uplink" and "Downlink" have no architectural relevance.  The
      terms are chosen to express that the packets of
      measurement_path_2 and measurement_path_3 pass the monitored
      sub-path unidirectionally in opposing directions.
      Measurement_path_1, measurement_path_2 and measurement_path_3
      MUST NOT be identical.

   o  All measurement paths SHOULD terminate between identical sender
      and receiver interfaces.  It is recommended to connect the sender
      and receiver as closely to the paths to be monitored as possible.
      Each intermediate sub-path between sender and receiver on one
      hand and the sub-paths to be monitored on the other is an
      additional source of errors requiring separate monitoring.

   o  Segment Routed domains supporting Node- and Adj-SIDs should enable
      the monitoring path set-up as specified.  Other routing protocols
      may be used as well, but the monitoring path set-up might be
      complex or impossible.

   o  Pre-compute how the two and three measurement path delay changes
      correlate to sub-path connectivity and congestion patterns.
      Absolute change values aren't required, a simultaneous change of
      two or three particular measurement paths is.

   o  Ensure that the temporal resolution of the measurement clock
      allows reliably capturing a unique delay value for each
      configured measurement path while sub-path connectivity is
      complete and no congestion is present.

   o  Synchronised clocks are not strictly required, as the metric is
      evaluating differences in delay.  Changes in clock synchronisation
      SHOULD NOT be close to the time interval within which changes in
      connectivity or congestion should be monitored.

   o  At the Src host, select Src and Dst IP addresses, and address
      information to route the Type-P packet along one of the configured
      measurement paths.  Form a test packet of Type-P with these
      addresses.

   o  Configure the Dst host access to receive the packet.

   o  At the Src host, place a timestamp, a sequence number and a unique
      identifier of the measurement path in the prepared Type-P packet,
      and send it towards Dst.

   o  Capture the one-way delay and determine packet loss by the metrics
      specified by [RFC7679] and [RFC7680] respectively and store the
      result for the path.

   o  If two or three sub-paths indicate a change in delay, report a
      change in connectivity or congestion status as pre-computed above.
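
   The pre-computation and reporting steps above can be sketched as
   follows for the six loops M1 to M6 of Section 2.  This is an
   illustrative Python sketch under stated assumptions: the function
   names and the event strings are hypothetical, the PMS legs are
   omitted, and an interface is identified by the directed hop whose
   egress queue is congested.

```python
# Loops M1..M6 as given in Section 2 (PMS legs omitted).
LOOPS = {
    "M1": ["L100", "L050", "L100", "L060", "L200"],
    "M2": ["L100", "L060", "L100", "L070", "L200"],
    "M3": ["L100", "L070", "L100", "L050", "L200"],
    "M4": ["L200", "L050", "L200", "L060", "L100"],
    "M5": ["L200", "L060", "L200", "L070", "L100"],
    "M6": ["L200", "L070", "L200", "L050", "L100"],
}


def signature_table(loops):
    """Map each set of affected loops to the single event causing it."""
    by_hop = {}  # directed hop (a, b) -> names of loops crossing it
    for name, path in loops.items():
        for hop in zip(path, path[1:]):
            by_hop.setdefault(hop, set()).add(name)
    table = {}
    for (a, b), names in by_hop.items():
        # Congestion of one interface affects the loops using that hop.
        table[frozenset(names)] = f"congestion {a}->{b}"
        # Loss of the link affects the loops using either direction.
        both = names | by_hop.get((b, a), set())
        link = " - ".join(sorted((a, b)))
        table[frozenset(both)] = f"loss of link {link}"
    return table


def classify(changed_loops, table):
    """Report the pre-computed event for a set of changed loops."""
    return table.get(frozenset(changed_loops), "unknown or multiple events")


if __name__ == "__main__":
    table = signature_table(LOOPS)
    print(classify({"M3", "M4", "M6"}, table))
    print(classify({"M3", "M5"}, table))
```

   For this topology the table holds 18 distinct entries (6 link
   losses and 12 congested interfaces), so every single event maps to
   its own set of two or three loops.  A set that is not in the table
   points to multiple simultaneous events or to a problem on the paths
   between the PMS and the monitored links.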

   Note that monitoring 6 sub-paths requires setting up 6 monitoring
   paths as shown in the figure above.

3.7.  Errors and Uncertainties

   Sources of error are:

   o  Measurement paths whose delays don't indicate a change after
      sub-path connectivity changed.

   o  Timestamps whose resolution is insufficient or inaccurate for the
      delays measured for the different monitoring paths.

   o  Multiple occurrences of sub-path connectivity loss and congestion.

   o  Loss of connectivity and congestion along sub-paths connecting the
      measurement device(s) with the sub-paths to be monitored.

3.8.  Reporting the Metric

   The metric reports loss of connectivity of a monitored sub-path or
   congestion of an interface and identifies the sub-path and, in the
   case of congestion, the direction of traffic.

4.  Singleton Definition for Type-P-SR-Path-Round-Trip-Delay-Estimate

   This section will be added in a later version, if there's interest in
   picking up this work.

5.  IANA Considerations

   If standardised, the metric will require an entry in the IPPM metric
   registry.

6.  Security Considerations

   This draft specifies how to use methods specified or described within
   [RFC8402] and [RFC8403].  It does not introduce new or additional SR
   features.  The security considerations of both references apply here
   too.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2678]  Mahdavi, J. and V. Paxson, "IPPM Metrics for Measuring
              Connectivity", RFC 2678, DOI 10.17487/RFC2678, September
              1999.

   [RFC7679]  Almes, G., Kalidindi, S., Zekauskas, M., and A.
              Morton, Ed., "A One-Way Delay Metric for IP Performance
              Metrics (IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679,
              January 2016.

   [RFC7680]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
              Ed., "A One-Way Loss Metric for IP Performance Metrics
              (IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, January
              2016.

   [RFC8029]  Kompella, K., Swallow, G., Pignataro, C., Ed., Kumar, N.,
              Aldrin, S., and M. Chen, "Detecting Multiprotocol Label
              Switched (MPLS) Data-Plane Failures", RFC 8029,
              DOI 10.17487/RFC8029, March 2017.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

7.2.  Informative References

   [RFC2330]  Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
              "Framework for IP Performance Metrics", RFC 2330,
              DOI 10.17487/RFC2330, May 1998.

   [RFC8403]  Geib, R., Ed., Filsfils, C., Pignataro, C., Ed., and N.
              Kumar, "A Scalable and Topology-Aware MPLS Data-Plane
              Monitoring System", RFC 8403, DOI 10.17487/RFC8403, July
              2018.

Author's Address

   Ruediger Geib (editor)
   Deutsche Telekom
   Heinrich Hertz Str. 3-7
   Darmstadt  64295
   Germany

   Phone: +49 6151 5812747
   Email: Ruediger.Geib@telekom.de