idnits 2.17.1

draft-adubey-bfd-service-redundancy-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 24, 2020) is 1551 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC5880' is defined on line 233, but no explicit
     reference was found in the text

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

BFD Workgroup                                            S. Boutros, Ed.
Internet-Draft                                                     Ciena
Intended status: Standards Track                                A. Dubey
Expires: July 27, 2020                                            VMware
                                                               R. Rahman
                                                                   Cisco
                                                        January 24, 2020

                      Service Redundancy using BFD
                 draft-adubey-bfd-service-redundancy-03

Abstract

   In a data center, multiple routing/service nodes may provide single
   active redundancy for a set of L2, L3 and/or L4-L7 services.  Both
   non-revertive and revertive failover modes are required for these
   services.
   This draft describes a method to achieve the non-revertive and
   revertive failover modes for services using Bidirectional
   Forwarding Detection (BFD).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 27, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Solution Overview
     2.1.  Node failover
     2.2.  Per service failover for non-revertive services
   3.  Security Considerations
   4.  IANA Considerations
   5.  Acknowledgments
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document describes how a group of service/routing nodes in a
   data center, providing single active redundancy for multiple L2/L3
   and/or L4/L7 services, can use the BFD protocol to support both
   non-revertive and revertive failover modes.

   Typically, BFD is used between the group of service nodes to verify
   the connectivity as well as the liveness of the service nodes.  The
   assignment of which node in the group is the primary designated
   forwarder for a given service can be determined using a centralized
   or distributed control plane.

   In this document, BFD is also used to communicate the set of
   services that are currently active on a given service node to the
   other service nodes.  On a node failure, the backup node for a given
   service will take over that service.  If the service is configured
   with a non-revertive failover mode, the backup node should continue
   to perform the service forwarding even after the primary node
   recovers and comes back up.  To do so, the backup node MUST inform
   the primary node that it is currently active for the service.  This
   is achieved through the BFD protocol extension proposed here, as
   described in the following sections.

   Note that for the revertive failover mode of operation, the primary
   node should be able to take over the active role from the backup
   node when the primary node returns to an operational state.
   This can also be communicated through the establishment of the BFD
   session between the primary node and the backup node.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Solution Overview

                        +----------+
                        |Controller|
                        +----------+
                          //  |  \
                       //     |     \
                    //        |        \
            +-------+     +-------+     +-------+
            |Node1  |-BFD-|Node2  |-BFD-|Node3  |
            +-------+     +-------+     +-------+
                 |--------------BFD--------|

                     Figure 1: Solution Overview

   Figure 1 shows three routing nodes using BFD to implement single
   active redundancy for revertive and non-revertive services.  More
   than three routing nodes can be used.

   Multiple L2/L3 and/or L4/L7 services are offered in a data center by
   a set of routing/service nodes providing single active redundancy.
   The provisioning of the services can be done using a centralized
   control plane implemented in a controller or using a distributed
   dynamic control plane.

2.1.  Node failover

   An implementation MAY choose to support only node failover and not
   per-service failover.  A node can be primary or backup for a given
   service.  On a primary node failure, all non-revertive and revertive
   services will become active on the backup node.

   In Figure 1, let's assume that Node1 is the primary node for a set A
   of non-revertive services with Node2 as backup, and for another set B
   of non-revertive services with Node3 as backup.  Likewise, Node1 is
   primary for a set C of revertive services with Node2 as backup, and
   for another set D of revertive services with Node3 as backup.

   If Node1 fails, Node2 and Node3 will set a new diag code in the BFD
   control packet.
   This diag code informs Node1 that Node2 and Node3 did not fail, and
   that Node1 MUST NOT activate the non-revertive service sets A and B,
   respectively, when it comes back up.  Once the BFD session comes up,
   BFD control packets carrying the new diag code will be sent for at
   least twice the detection multiplier count.

   Therefore, upon receiving the BFD control packet with the new diag
   code, Node1 MUST NOT attempt to activate the non-revertive services;
   it remains in the standby state for those services until the node
   that took over (Node2 or Node3) fails.

   Revertive services are assumed to revert to the primary node, Node1,
   after that node recovers.  Once the BFD session comes up between the
   primary and backup nodes, the backup node should stop forwarding for
   any revertive services.  A node MUST start forwarding all revertive
   services for which it is configured as primary once the BFD session
   comes up with the corresponding backup nodes.  A node MUST stop
   forwarding for revertive services for which it is a backup once the
   BFD session comes up with the corresponding primary.

2.2.  Per service failover for non-revertive services

   An implementation MAY choose to support per-service failover for
   non-revertive services.  For example, in Figure 1, some non-revertive
   services could be active on Node1 while others could be active on
   Node2 or Node3 for better load balancing of service traffic.  In this
   mode, every L2/L3 and/or L4/L7 non-revertive service will be
   identified by a unique ID known across the routing/service nodes
   providing the services.

   A bitmap will be used to represent the non-revertive services, where
   each non-revertive service is represented by one bit in the bitmap.
   All the service nodes MUST have the same mapping of the bit position
   to the non-revertive service unique ID.
   The mapping between bitmap positions and unique service IDs could be
   maintained by a network controller.

   A node that is assigned as backup for a given non-revertive service
   will take over as active in either of the following cases:

   1)  The node assigned as primary for this service failed.

   2)  This specific service failed on the primary node for this
       service.

   In case 1, the BFD session will go down since it is a node failure.
   In case 2, the BFD session between the nodes will remain up.  In
   either scenario, the node assigned as backup will become active for
   the non-revertive service.  In case 1, the backup node will set the
   new diag code in the BFD control packets once the BFD session is
   established.  The new diag code will be set in the BFD control
   packets for at least twice the detection multiplier count.  In case
   2, this diag code will be set in the next BFD control packets sent
   after the node takes over as active for a given non-revertive
   service.  If there is at least one non-revertive service for which
   this node is not active AND at least one non-revertive service for
   which it is active, the node will also send the bitmap in the BFD
   control packet payload.  The bits identifying the active
   non-revertive services will be set in this bitmap.  The new diag code
   and the optional bitmap payload will be sent in the BFD control
   packets for at least twice the detection multiplier count.

   Therefore, if a node receives a BFD control packet with the new diag
   code set but no payload, it MUST NOT activate any of the
   non-revertive services for which it is primary.  If, instead, a
   payload is present in the BFD control packet that has the new diag
   code set, the receiving node MUST NOT activate the non-revertive
   services indicated by the set bits in the bitmap.

   Per service failover is not applicable to revertive services.
   They behave as described in Section 2.1.

3.  Security Considerations

   This document does not introduce any additional security constraints.

4.  IANA Considerations

   IANA is requested to assign a new diag code from the "BFD Diagnostic
   Codes" registry:

      Value   BFD Diagnostic Code Name
      -----   ------------------------------------------------
      0xNN    Out-lived and optional BitMap BFD control packet
              payload for non-revertive services.

5.  Acknowledgments

6.  References

6.1.  Normative References

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

6.2.  Informative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Sami Boutros (editor)
   Ciena
   USA

   Email: sboutros@ciena.com


   Ankur Dubey
   VMware
   USA

   Email: adubey@vmware.com


   Reshad Rahman
   Cisco
   USA

   Email: rrahman@cisco.com
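
Illustrative sketch (non-normative)

   The receiver behavior described in Section 2.2 can be sketched in
   Python as follows.  This is only a sketch under stated assumptions,
   not part of the proposed extension: the function name, the argument
   names, and the service IDs are hypothetical, and the bit ordering
   inside the bitmap (least significant bit first within each octet) is
   an assumption, since the draft does not define the payload encoding.
   The diag code value itself is unassigned (0xNN), so the sketch takes
   a boolean saying whether the received packet carried it.

```python
from typing import Dict, Optional, Set


def services_to_hold(local_services: Set[str],
                     bit_position: Dict[str, int],
                     diag_set: bool,
                     bitmap: Optional[bytes]) -> Set[str]:
    """Return the non-revertive services this node must not activate.

    local_services: non-revertive service IDs for which this node is primary.
    bit_position:   service ID -> bit position, identical on all nodes.
    diag_set:       True if the received packet carries the new diag code.
    bitmap:         optional bitmap payload; None when there is no payload.
    """
    if not diag_set:
        # No new diag code: this sketch places no restriction.
        return set()
    if bitmap is None:
        # Diag code without payload: hold every non-revertive service.
        return set(local_services)
    held = set()
    for service in local_services:
        # ASSUMPTION: position N is bit (N % 8) of octet (N // 8),
        # least significant bit first. The draft does not specify this.
        byte_index, bit_index = divmod(bit_position[service], 8)
        if byte_index < len(bitmap) and bitmap[byte_index] & (1 << bit_index):
            held.add(service)
    return held
```

   A node would call such a function for each BFD control packet
   received from a peer and keep the returned services in standby.  A
   position that falls beyond the end of a short bitmap is treated as
   "not set" here, which is a design choice of this sketch rather than
   something the draft states.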