idnits 2.17.1

draft-ietf-trill-oam-req-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (August 22, 2012) is 4263 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC4377' is defined on line 491, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6325' is defined on line 459, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC791' is defined on line 485, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4379' is defined on line 487, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-16) exists of
     draft-ietf-opsawg-oam-overview-06

  -- Obsolete informational reference (is this intentional?): RFC 5101
     (Obsoleted by RFC 7011)

  -- Obsolete informational reference (is this intentional?): RFC 2680
     (Obsoleted by RFC 7680)

  -- Obsolete informational reference (is this intentional?): RFC 2679
     (Obsoleted by RFC 7679)

  -- Obsolete informational reference (is this intentional?): RFC 4379
     (Obsoleted by RFC 8029)

     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 5 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1   TRILL Working Group               Tissa Senevirathne
2   Internet Draft                    CISCO
3   Intended status: Informational    David Bond
4                                     IBM
5                                     Sam Aldrin
6                                     Yizhou Li
7                                     Huawei
8                                     Rohit Watve
9                                     CISCO
10                                    Anoop Ghanwani
11                                    DELL
12                                    Jon Hudson
13                                    Brocade
14                                    Naveen Nimmu
15                                    Broadcom
16                                    Radia Perlman
17                                    Intel
18                                    Tal Mizrahi
19                                    Marvell

21  August 22, 2012
22  Expires: February 2013

24  Requirements for Operations, Administration and Maintenance (OAM) in
25  TRILL
26  draft-ietf-trill-oam-req-01.txt

28  Status of this Memo

30  This Internet-Draft is submitted in full conformance with the
31  provisions of BCP 78 and BCP 79.

33  Internet-Drafts are working documents of the Internet Engineering
34  Task Force (IETF), its areas, and its working groups. Note that
35  other groups may also distribute working documents as Internet-
36  Drafts.

38  Internet-Drafts are draft documents valid for a maximum of six
39  months and may be updated, replaced, or obsoleted by other documents
40  at any time. It is inappropriate to use Internet-Drafts as
41  reference material or to cite them other than as "work in progress."

43  The list of current Internet-Drafts can be accessed at
44  http://www.ietf.org/ietf/1id-abstracts.txt
45  The list of Internet-Draft Shadow Directories can be accessed at
46  http://www.ietf.org/shadow.html

48  This Internet-Draft will expire on February 22, 2013.

50  Copyright Notice

52  Copyright (c) 2012 IETF Trust and the persons identified as the
53  document authors. All rights reserved.

55  This document is subject to BCP 78 and the IETF Trust's Legal
56  Provisions Relating to IETF Documents
57  (http://trustee.ietf.org/license-info) in effect on the date of
58  publication of this document. Please review these documents
59  carefully, as they describe your rights and restrictions with
60  respect to this document.
Code Components extracted from this
61  document must include Simplified BSD License text as described in
62  Section 4.e of the Trust Legal Provisions and are provided without
63  warranty as described in the Simplified BSD License.

65  Abstract

67  OAM (Operations, Administration and Maintenance) is a general term
68  used to identify functions and toolsets to troubleshoot and monitor
69  networks. This document presents OAM requirements applicable to
70  TRILL.

72  Table of Contents

74  1. Introduction...................................................3
75     1.1. Contributors..............................................3
76  2. Conventions used in this document..............................3
77  3. Terminology....................................................4
78  4. OAM Requirements...............................................5
79     4.1. Data Plane................................................5
80     4.2. Connectivity Verification.................................5
81        4.2.1. Unicast..............................................5
82        4.2.2. Multicast............................................6
83     4.3. Continuity Check..........................................6
84     4.4. Path Tracing..............................................6
85     4.5. General Requirements......................................7
86     4.6. Performance Monitoring....................................7
87        4.6.1. Packet Loss..........................................7
88        4.6.2. Packet Delay.........................................8

90     4.7. ECMP Utilization..........................................9
91     4.8. Security and Operational considerations...................9
92     4.9. Fault Indications.........................................9
93     4.10. Defect Indications.......................................9
94     4.11. Live Traffic monitoring.................................10
95  5. Security Considerations.......................................10
96  6. IANA Considerations...........................................10
97  7.
References....................................................10
98     7.1. Normative References.....................................10
99     7.2. Informative References...................................11
100  8. Acknowledgments...............................................11

102  1. Introduction

104  OAM (Operations, Administration and Maintenance) generally covers
105  various production aspects of a network. In this document we use the
106  term OAM as defined in [RFC6291].

108  The success of any mission-critical network depends on the ability to
109  proactively monitor it for faults, performance, etc., as well
110  as the ability to efficiently and quickly troubleshoot defects and
111  failures. A well-defined OAM toolset is a vital requirement for
112  wider adoption of TRILL as the next generation data forwarding
113  technology in larger networks such as data centers.

115  In this document we define the requirements for TRILL OAM. It is
116  assumed that readers are familiar with the OAM concepts and
117  terminology defined in other OAM standards such as [8021ag] and
118  [RFC5860]. This document does not attempt to redefine the terms and
119  concepts specified elsewhere.

121  1.1. Contributors

123  The following members were part of the design team that produced
124  this document. Their names are listed below in alphabetical order.

126  Anoop Ghanwani, David Bond, Donald Eastlake, Jon Hudson, Naveen
127  Nimmu, Radia Perlman, Rohit Watve, Sam Aldrin, Shivakumar Sundaram,
128  Tal Mizrahi, Thomas Narten, Tissa Senevirathne, Yizhou Li.

130  2. Conventions used in this document

132  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
133  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
134  document are to be interpreted as described in RFC-2119 [RFC2119].
136  Although this document is not a protocol specification, the use of
137  this language clarifies the instructions to protocol designers
138  producing solutions that satisfy the requirements set out in this
139  document.

141  3. Terminology

143  Section: The term Section refers to a partial segment of a path
144  between any two given RBridges. As an example, consider the case
145  where RB1 is connected to RBx via RB2, RB3 and RB4. The segment
146  from RB2 to RB4 is referred to as a Section of the path RB1 to
147  RBx.

149  Flow: The term Flow indicates a set of packets that share the same
150  path and per-hop behavior (such as priority). A flow is typically
151  identified by a portion of the inner payload that affects the hop-by-
152  hop forwarding decisions. This may contain Layer 2 through Layer 4
153  information.

155  All Selectable Least Cost Paths: The term "all selectable least cost
156  paths" refers to a subset of all potentially available least cost
157  paths to a specified destination RBridge that are available (and
158  usable) for forwarding of frames. It is important to note that, in
159  practice, not all available least cost paths are selectable for
160  forwarding due to limitations in implementations.

162  Connectivity: The term connectivity indicates reachability between
163  an arbitrary RBridge RB1 and any other RBridge RB2. The specific
164  path can be either explicit (i.e., associated with a specific flow)
165  or unspecified. Unspecified means that messages used for
166  connectivity verification take whatever path the RBs happen to
167  select.

169  Continuity Verification: Continuity Verification refers to proactive
170  verification of Connectivity between two RBridges at periodic
171  intervals and generation of explicit notification when Connectivity
172  failures occur.

174  Fault: The term Fault refers to an inability to perform a required
175  action, e.g., an unsuccessful attempt to deliver a packet.
177  Defect: The term Defect refers to an interruption in the normal
178  operation, such that over a period of time no packets are delivered
179  successfully.

181  Failure: The term Failure refers to the termination of the required
182  function over a longer period of time. Persistence of a defect for a
183  period of time is interpreted as a failure.

185  4. OAM Requirements

187  4.1. Data Plane

189  OAM frames, utilized for connectivity verification, continuity
190  checks, performance measurements, etc., will by default take
191  whatever path TRILL chooses based on the current topology and
192  per-hop equal cost path choices. In some cases, it may be required
193  that the OAM frames utilize specific paths. Thus, it MUST be
194  possible to arrange that OAM frames follow the path taken by a
195  specific flow.

197  RBridges MUST have the ability to distinguish OAM frames that are
198  destined for them, or that require processing by the OAM plane, from
199  normal data frames.

201  TRILL OAM frames MUST NOT be forwarded out as native frames on end
202  station service enabled ports.

204  OAM MUST have the ability to include all Ethernet traffic types carried
205  by TRILL, including both IP and non-IP traffic.

207  4.2. Connectivity Verification

209  4.2.1. Unicast

211  From an arbitrary RBridge RB1, OAM MUST have the ability to verify
212  connectivity to any other RBridge RB2.

214  From an arbitrary RBridge RB1, OAM MUST have the ability to verify
215  connectivity to any other RBridge RB2 for a specific flow via the
216  path associated with the specified flow.

218  An RBridge SHOULD have the ability to perform the above connectivity
219  tests on sections. As an example, assume RB1 is connected to RB5 via
220  RB2, RB3 and RB4. An operator SHOULD be able to verify the RB1 to
221  RB5 connectivity on the section from RB3 to RB5.
The difference is
222  that the ingress and egress TRILL nicknames in this case are RB1 and
223  RB5 as opposed to RB3 and RB5, even though the message itself may
224  originate at RB3.

226  4.2.2. Multicast

228  OAM MUST have the ability to verify connectivity, from an arbitrary
229  RBridge RB1, to either a specific set of RBridges or all member
230  RBridges, for a specified multicast tree. This functionality is
231  referred to as verification of the un-pruned multicast tree.

233  OAM MUST have the ability to verify connectivity, from an arbitrary
234  RBridge RB1, to either a specific set of RBridges or all member
235  RBridges, for a specified multicast tree and for a specified flow.
236  This functionality is referred to as verification of the pruned
237  tree.

239  4.3. Continuity Check

241  OAM MUST provide functions that allow any arbitrary RBridge RB1 to
242  perform a Continuity Check to any other RBridge.

244  OAM MUST provide functions that allow any arbitrary RBridge RB1 to
245  perform a Continuity Check to any other RBridge using a path
246  associated with a specified flow.

248  OAM SHOULD provide functions that allow any arbitrary RBridge to
249  perform a Continuity Check to any other RBridge over all selectable
250  least cost paths.

252  OAM SHOULD provide the ability to perform a Continuity Check on
253  sections of any path within the network.

255  OAM SHOULD provide the ability to perform a multicast Continuity
256  Check for specified multi-destination tree(s) as well as specified
257  multi-destination tree and flow combinations. The former is referred
258  to as an un-pruned multi-destination tree Continuity Check and the
259  latter is referred to as a pruned tree Continuity Check.

261  4.4. Path Tracing

263  OAM MUST provide the ability to trace a path between any two
264  RBridges per specified unicast flow.

266  OAM SHOULD provide the ability to trace all selectable least cost
267  paths between any two RBridges.
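As an informal illustration of what tracing all least cost paths between two RBridges involves, the sketch below enumerates every equal-cost path in a small topology. It is not part of any TRILL OAM solution: the function name, the topology, and the unit link cost are assumptions made for the example, and a real implementation would use the link costs computed by the routing protocol and would report only the subset of paths that is selectable on the device.

```python
from collections import deque

def all_least_cost_paths(links, src, dst):
    """Return every minimum-hop path from src to dst.

    Assumes bidirectional links with unit cost (an illustrative simplification).
    """
    # Breadth-first search records each node's hop distance from src.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in links.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if dst not in dist:
        return []

    # Walk back from dst, following only neighbors one hop closer to src.
    paths = []

    def extend(node, suffix):
        if node == src:
            paths.append([src] + suffix)
            return
        for prev in links.get(node, ()):
            if dist.get(prev) == dist[node] - 1:
                extend(prev, [node] + suffix)

    extend(dst, [])
    return sorted(paths)

# Hypothetical topology: RB1 reaches RB4 via RB2 or RB3 at equal cost.
TOPOLOGY = {
    "RB1": ["RB2", "RB3"],
    "RB2": ["RB1", "RB4"],
    "RB3": ["RB1", "RB4"],
    "RB4": ["RB2", "RB3"],
}

for path in all_least_cost_paths(TOPOLOGY, "RB1", "RB4"):
    print(" -> ".join(path))
```

A path trace tool would then probe each listed path separately, which is why the requirement depends on the ability to steer OAM frames along the path of a specific flow.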
269  OAM SHOULD provide functionality to trace all branches of a
270  specified multi-destination tree (un-pruned tree).

271  OAM SHOULD provide functionality to trace all branches of a
272  specified multi-destination tree for a specified flow (pruned tree).

274  4.5. General Requirements

276  OAM MUST provide the ability to initiate and maintain multiple
277  concurrent sessions for multiple OAM functions between any arbitrary
278  RBridge RB1 and any other RBridge. In general, multiple OAM
279  operations will run concurrently. For example, proactive continuity
280  checks may take place between RB1 and RB2 at the same time an
281  operator decides to test connectivity between the same two RBs.
282  Multiple OAM functions and instances of those functions MUST be able
283  to run concurrently without interfering with each other.

285  OAM MUST provide a single OAM framework for all TRILL OAM functions.

287  OAM, as practical and as possible, SHOULD provide a single framework
288  between TRILL and other similar standards.

290  OAM MUST maintain related error and operational counters. Such
291  counters MUST be accessible via network management applications
292  (e.g., SNMP).

294  OAM functions related to continuity and connectivity checks MUST be
295  able to be invoked either proactively or on-demand.

297  OAM SHOULD NOT require extensions to the TRILL header. OAM MAY be
298  required to provide the ability to specify a desired response mode
299  for a specific OAM message. The desired response mode can be either
300  in-band, out-of-band or none.

302  The OAM Framework MUST be extensible to future needs of TRILL and
303  the needs of other standards organizations.

305  OAM MAY provide methods to verify control plane and forwarding plane
306  alignments.

308  OAM SHOULD leverage existing OAM technologies, where practical.

310  4.6. Performance Monitoring

312  4.6.1. Packet Loss

314  In this document, the term loss of a packet is used as defined in
315  [RFC2680] (see Section 2.4 of RFC2680).
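As a rough illustration of how a loss statistic can be derived from that per-packet definition, the sketch below counts probes that were sent but never seen at the receiver. The function name and the use of sequence numbers to match probes are assumptions made for this example; the requirements in this document do not prescribe a message format or a measurement method.

```python
def loss_ratio(sent_seqs, received_seqs):
    """Fraction of sent probe sequence numbers never seen at the receiver."""
    sent = set(sent_seqs)
    if not sent:
        raise ValueError("no probes were sent")
    # A probe counts as lost if its sequence number was sent but not received.
    lost = sent - set(received_seqs)
    return len(lost) / len(sent)

# Hypothetical run: ten probes sent, sequence numbers 3 and 8 never arrive.
print(loss_ratio(range(10), [0, 1, 2, 4, 5, 6, 7, 9]))
```

In practice the receiver also needs a waiting time after which a missing probe is declared lost; that threshold is left out here for brevity.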
317  NOTE: The term simulated flow below indicates a flow that is generated
318  by an RBridge RB1 for OAM purposes. The fields of the simulated flow
319  may or may not be identical to the actual data. However, the simulated
320  flow is required to follow the intended path.

322  OAM SHOULD provide the ability to measure packet loss statistics for
323  a simulated flow from any arbitrary RBridge RB1 to any other
324  RBridge.

326  OAM SHOULD provide the ability to measure packet loss statistics
327  over a segment, for a simulated flow between any arbitrary RBridge
328  RB1 and any other RBridge.

330  OAM SHOULD provide the ability to measure simulated packet loss
331  statistics between any two RBridges over all least cost paths.

333  An RBridge SHOULD be able to perform the above packet loss
334  measurement functions either proactively or on-demand.

336  4.6.2. Packet Delay

338  There are two types of packet delays -- one-way delay and two-way
339  delay (Round Trip Delay).

341  One-way delay is defined in [RFC2679] as the time elapsed from the
342  start of transmission of the first bit of a packet by an RBridge
343  until the reception of the last bit of the packet by the destination
344  RBridge.

346  Two-way delay, also referred to as Round Trip Delay, is defined
347  similarly to [RFC2681]; i.e., the time elapsed from the start of
348  transmission of the first bit of a packet by an RBridge until the
349  reception of the last bit of the packet by the same RBridge.

351  OAM SHOULD provide functions to measure two-way delay between two
352  RBridges for a specified flow.

354  OAM SHOULD provide functions to measure two-way delay between two
355  RBridges for a specified flow over a specific section.

357  OAM MAY provide functions to measure one-way delay between two
358  RBridges for a specified flow.

360  OAM MAY provide functions to measure one-way delay between two
361  RBridges for a specified flow over a specific section.

363  4.7.
ECMP Utilization

365  OAM MAY provide functionality to monitor the effectiveness of per-
366  hop ECMP hashing. For example, individual RBridges could maintain
367  counters that show how packets are being distributed across equal
368  cost next hops for a specified destination RBridge or RBridges as a
369  result of ECMP hashing.

371  4.8. Security and Operational considerations

373  Methods MUST be provided to protect against exploitation of the OAM
374  framework for security and denial of service attacks.

376  Methods SHOULD be provided to prevent OAM messages from causing
377  congestion in the networks. Periodically generated messages with
378  high frequencies may lead to congestion, hence methods such as
379  shaping or rate limiting SHOULD be utilized.

381  4.9. Fault Indications

383  The term Fault refers to an inability to perform a required action,
384  e.g., an unsuccessful attempt to deliver a packet [OAMOVER]. The
385  unsuccessful attempt may be due to Hop Count expiry, invalid
386  nickname, etc.

388  OAM MUST provide a Fault Indication framework to report faults to
389  the ingress RBridge of the flow or other interested parties (such as
390  syslog servers).

392  OAM MUST provide functions to selectively enable or disable
393  different types of Fault Indications.

395  4.10. Defect Indications

397  [OAMOVER] defines Defect as follows: "The term Defect refers to an
398  interruption in the normal operation, such as a consecutive period of
399  time where no packets are delivered successfully."

401  OAM SHOULD provide a framework for Defect Detection and Indication.

403  OAM implementations that provide Defect Indication MUST provide
404  methods to selectively enable or disable Defect Detection per defect
405  type.

407  OAM implementations that provide Defect Indication MUST provide
408  methods to configure Defect Detection thresholds for each type
409  of defect.
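The per-type enable/disable and threshold requirements above can be pictured with the following minimal sketch, in which a defect is declared once a configurable number of consecutive delivery faults has been observed. The class, its fields, and the defect type names are hypothetical and chosen only for illustration; this document does not define defect types or a detection algorithm.

```python
from dataclasses import dataclass

@dataclass
class DefectDetector:
    threshold: int               # consecutive faults before a defect is declared
    enabled: bool = True         # detection can be switched off per defect type
    consecutive_faults: int = 0

    def record(self, delivered: bool) -> bool:
        """Record one probe outcome; return True while a defect is declared."""
        if not self.enabled:
            return False
        # A successful delivery ends the run of faults; a fault extends it.
        self.consecutive_faults = 0 if delivered else self.consecutive_faults + 1
        return self.consecutive_faults >= self.threshold

# Hypothetical per-defect-type configuration.
detectors = {
    "loss_of_continuity": DefectDetector(threshold=3),
    "hop_count_expiry": DefectDetector(threshold=5, enabled=False),
}

outcomes = [False, False, False, True]
print([detectors["loss_of_continuity"].record(ok) for ok in outcomes])
```

Keeping one detector object per defect type makes both the threshold and the enabled flag independently configurable, which is the property the two requirements ask for.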
411  OAM implementations that provide Defect Indication facilities MUST
412  provide methods to log defect indications to a locally defined
413  archive such as a log buffer or SNMP traps.

415  OAM implementations that provide Defect Indication facilities SHOULD
416  provide a Remote Defect Indication framework that facilitates
417  notifying the originator/owner of the flow experiencing the defect,
418  which is the ingress RBridge.

420  Remote Defect Indication MAY be either in-band or out-of-band.

422  4.11. Live Traffic monitoring

424  OAM implementations MAY provide methods to utilize live traffic for
425  troubleshooting and performance monitoring.

427  Implementations MAY leverage Data Driven CFM [8021Q] or IPFIX
428  [RFC5101] for the purpose of performance monitoring.

430  5. Security Considerations

432  Security requirements are specified in Section 4.8.

434  6. IANA Considerations

436  None.

438  7. References

440  7.1. Normative References

442  [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
443            Requirement Levels", BCP 14, RFC 2119, March 1997.

445  [OAMOVER] Mizrahi, T., et.al., "An Overview of Operations,
446            Administration, and Maintenance (OAM) Mechanisms", draft-
447            ietf-opsawg-oam-overview-06, Work in Progress, March 2012.

449  [RFC5860] Vigoureux, M., et.al., "Requirements for Operations,
450            Administration and Maintenance (OAM) in MPLS Transport
451            Networks", RFC 5860, May 2010.

453  [RFC4377] Nadeau, T., et.al., "Operations and Management (OAM)
454            Requirements for Multi-Protocol Label Switched (MPLS)
455            Networks", RFC 4377, February 2006.

457  7.2. Informative References

459  [RFC6325] Perlman, R., et.al., "Routing Bridges (RBridges): Base
460            Protocol Specification", RFC 6325, July 2011.

462  [RFC5101] Claise, B., "Specification of the IP Flow Information
463            Export (IPFIX) Protocol for the Exchange of IP Traffic
464            Flow Information", RFC 5101, January 2008.

466  [RFC2680] Almes, G., et.al.
"A One-way Packet Loss Metric for IPPM", 467 RFC 2680, September 1999. 469 [RFC2679] Almes, G., et.al. "A One-way Delay Metric for IPPM", RFC 470 2679, September 1999. 472 [RFC2681] Almes, G., et.al. "A Round-trip Delay Metric for IPPM", 473 RFC 2681, September 1999. 475 [RFC6291] Anderson, L., et.al. "Guidelines for the Use of the "OAM" 476 Acronym in the IETF", RFC 6291, June 2011. 478 [8021ag] IEEE, "Virtual Bridged Local Area Networks Amendment 5: 479 Connectivity Fault Management", 802.1ag, 2007. 481 [8021Q] IEEE, "Media Access Control (MAC) Bridges and Virtual 482 Bridged Local Area Networks", IEEE Std 802.1Q-2011, 483 August, 2011. 485 [RFC791] "Internet Protocol", RFC 791, September 1981. 487 [RFC4379] Kompela, K., et.al. "Detecting Multi-protocol Label 488 Switched (MPLS) Data Plane Failures", RFC 4379, February 489 2006. 491 [RFC4377] Nadeau, T., et.al. "Operations and Management (OAM) 492 Requirements for Multi-protocol Label Switched 493 (MPLS)Networks", RFC 4377, February 2006. 495 8. Acknowledgments 497 Special acknowledgments to IEEE 802.1 chair, Tony Jeffree for 498 allowing us to solicit comments from IEEE 802.1 group. Also 499 recognized are the comments received from IEEE group, Ayal Loir and 500 others. 502 This document was prepared using 2-Word-v2.0.template.dot. 504 Authors' Addresses 506 Tissa Senevirathne 507 CISCO Systems 508 375 East Tasman Drive 509 San Jose, CA 95134 510 USA. 
512  Phone: +1-408-853-2291
513  Email: tsenevir@cisco.com

515  David Bond
516  IBM
517  2051 Mission College Blvd
518  Santa Clara, CA 95054
519  USA

521  Phone: +1-603-339-7575
522  Email: mokon@mokon.net

524  Sam Aldrin
525  Huawei Technologies
526  2330 Central Express Way
527  Santa Clara, CA 95951
528  USA

530  Email: aldrin.ietf@gmail.com

531  Yizhou Li
532  Huawei Technologies
533  101 Software Avenue,
534  Nanjing 210012
535  China

537  Phone: +86-25-56625375
538  Email: liyizhou@huawei.com

540  Rohit Watve
541  CISCO Systems
542  375 East Tasman Drive
543  San Jose, CA 95134
544  USA

546  Phone: +1-408-424-2091
547  Email: rwatve@cisco.com

549  Anoop Ghanwani
550  DELL
551  350 Holger Way
552  San Jose, CA 95134
553  USA

555  Phone: +1-408-571-3500
556  Email: Anoop@duke.alumni.duke.edu

558  Jon Hudson
559  Brocade
560  120 Holger Way
561  San Jose, CA 95134
562  USA

564  Email: jon.hudson@gmail.com

566  Naveen Nimmu
567  Broadcom
568  9th Floor, Building no 9, Raheja Mind space
569  Hi-Tec City, Madhapur,
570  Hyderabad - 500 081, INDIA

572  Phone: +1-408-218-8893
573  Email: naveen@broadcom.com

574  Radia Perlman
575  Intel Labs
576  2700 156th Ave NE, Suite 300,
577  Bellevue, WA 98007
578  USA

580  Phone: +1-425-881-4824
581  Email: radia.perlman@intel.com

583  Tal Mizrahi
584  Marvell
585  6 Hamada St.
586  Yokneam, 20692 Israel

588  Email: talmi@marvell.com