INTERNET-DRAFT                                              Sami Boutros
Intended Status: Standards Track                             Ankur Dubey
                                                                  VMware

                                                           Reshad Rahman
                                                                   Cisco

Expires: January 2, 2020                                    July 1, 2019

                      Service Redundancy using BFD
                 draft-adubey-bfd-service-redundancy-02

Abstract

   In a data center, multiple routing/service nodes may provide
   single-active redundancy for a set of L2, L3, and/or L4-L7 services.
   Both non-revertive and revertive failover modes are required for
   these services. This draft describes a method to achieve
   non-revertive and revertive failover modes for services using
   Bidirectional Forwarding Detection (BFD).

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1 Introduction
     1.1 Terminology
   2. Solution Overview
     2.1 Node failover
     2.2 Per service failover for non-revertive services
   3 Acknowledgements
   4 Security Considerations
   5 IANA Considerations
   6 References
   Authors' Addresses

1 Introduction

   This document describes how a group of service/routing nodes in a
   data center that provide single-active redundancy for multiple L2/L3
   and/or L4/L7 services can use the BFD protocol [RFC5880] to support
   both non-revertive and revertive failover modes.

   Typically, BFD is run between the service nodes in the group to
   verify connectivity and the liveness of each node. Which node in the
   group is the primary designated forwarder for a given service can be
   determined by a centralized or a distributed control plane.

   In this document, BFD is also used to communicate to the other
   service nodes the set of services that are currently active on a
   given service node. When a node fails, the backup node for each of
   its services takes over. If a service is configured in non-revertive
   failover mode, the backup node should continue forwarding for the
   service even after the primary node recovers and comes back up. To
   achieve this, the backup node MUST inform the primary node that it
   is currently active for the service. This is done through the BFD
   extension proposed in the following sections.

   In revertive failover mode, by contrast, the primary node should be
   able to take over the active role from the backup node once the
   primary node is operational again. This can also be signaled by the
   establishment of the BFD session between the primary node and the
   backup node.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Solution Overview

                   +----------+
                   |Controller|
                   +----------+
                  //     |    \
                 //      |     \
                //       |      \
       +-------+     +-------+     +-------+
       |Node1  |-BFD-|Node2  |-BFD-|Node3  |
       +-------+     +-------+     +-------+
           |------------BFD------------|

                   Figure 1: Service nodes using BFD

   Figure 1 shows three routing nodes using BFD to implement
   single-active redundancy for revertive and non-revertive services.
   More than three routing nodes can be used.

   Multiple L2/L3 and/or L4/L7 services are offered in a data center by
   a set of routing/service nodes that provide single-active redundancy.
   The services can be provisioned by a centralized control plane
   implemented in a controller or by a distributed, dynamic control
   plane.

2.1 Node failover

   An implementation MAY choose to support only node failover and not
   per-service failover. A node can be the primary or the backup for a
   given service. When a primary node fails, all of its non-revertive
   and revertive services become active on the corresponding backup
   nodes.

   In Figure 1, assume that Node1 is the primary for a set A of
   non-revertive services with Node2 as backup, and for a set B of
   non-revertive services with Node3 as backup. Node1 is also the
   primary for a set C of revertive services with Node2 as backup, and
   for a set D of revertive services with Node3 as backup.

   If Node1 fails, Node2 and Node3 take over and set a new diag code in
   their BFD control packets. This diag code informs Node1 that Node2
   and Node3 did not fail (they outlived Node1), and that Node1 MUST NOT
   activate the non-revertive service sets A and B, respectively, when
   it comes back up. Once the BFD session with Node1 has come back up,
   the BFD control packets carry the new diag code for at least twice
   the detection multiplier count (that is, at least 2 x Detect Mult
   packets).

   Therefore, upon receiving a BFD control packet with the new diag
   code, Node1 MUST NOT attempt to activate the non-revertive services,
   but MUST remain in standby state for them until the Node2 or Node3
   that took over fails.

   Revertive services revert to the primary node (Node1) after it
   recovers. Once the BFD session comes up between the primary and
   backup nodes, the backup node should stop forwarding for any
   revertive services.
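
   The node failover behavior described so far in this section can be
   sketched in Python as below. This is a non-authoritative
   illustration under stated assumptions, not a reference
   implementation: the class and method names are hypothetical, the new
   diag code uses the placeholder value 31 (the draft leaves it as 0xNN
   for IANA to assign, and it must fit in the 5-bit Diag field), and
   "twice the detection multiplier count" is read as 2 x Detect Mult
   transmitted control packets. Only the Vers/Diag byte layout is taken
   from [RFC5880].

```python
from dataclasses import dataclass, field

BFD_VERSION = 1  # RFC 5880: the 3-bit Vers field carries version 1
DIAG_NONE = 0    # RFC 5880: Diag 0 is "No Diagnostic"
# HYPOTHETICAL placeholder for the new diag code (0xNN in Section 5).
# The real value is to be assigned by IANA and must fit in the 5-bit
# Diag field (0..31); 31 is used here only for illustration.
DIAG_OUTLIVED = 31


def vers_diag_byte(diag: int) -> int:
    """First byte of a BFD control packet: Vers (3 bits) | Diag (5 bits)."""
    if not 0 <= diag <= 0x1F:
        raise ValueError("Diag must fit in 5 bits")
    return (BFD_VERSION << 5) | diag


@dataclass
class BackupNode:
    """Hypothetical backup-side state for one BFD session to a primary."""
    detect_mult: int
    active_non_revertive: set = field(default_factory=set)
    active_revertive: set = field(default_factory=set)
    packets_left: int = field(default=0, init=False)

    def on_primary_failed(self, non_revertive: set, revertive: set) -> None:
        # Node failover: take over every service backed up for this primary.
        self.active_non_revertive |= non_revertive
        self.active_revertive |= revertive

    def on_session_up(self) -> None:
        # Revertive services go back to the recovered primary.
        self.active_revertive.clear()
        # Non-revertive services stay here; say so with the new diag code
        # in the next 2 x Detect Mult control packets (the minimum).
        if self.active_non_revertive:
            self.packets_left = 2 * self.detect_mult

    def next_vers_diag(self) -> int:
        """Vers/Diag byte for the next transmitted control packet."""
        if self.packets_left > 0:
            self.packets_left -= 1
            return vers_diag_byte(DIAG_OUTLIVED)
        return vers_diag_byte(DIAG_NONE)


@dataclass
class PrimaryNode:
    """Hypothetical primary-side state for the same session."""
    revertive: set
    active: set = field(default_factory=set)
    non_revertive_standby: bool = False

    def on_session_up(self) -> None:
        # Revertive services are reclaimed as soon as the session is up.
        self.active |= self.revertive

    def on_control_packet(self, first_byte: int) -> None:
        if (first_byte & 0x1F) == DIAG_OUTLIVED:
            # The backup outlived this node: stay in standby for the
            # non-revertive services until that backup fails.
            self.non_revertive_standby = True
```

   With Detect Mult set to 3, the backup in this sketch sets the new
   diag code in its first six packets after the session comes back up.
   The recovered primary then stays in standby for set A while
   reclaiming revertive set C.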

   A node MUST start forwarding all revertive services for which it is
   configured as the primary once the BFD session comes up with the
   corresponding backup nodes. A node MUST stop forwarding revertive
   services for which it is the backup once the BFD session comes up
   with the corresponding primary.

2.2 Per service failover for non-revertive services

   An implementation MAY choose to support per-service failover for
   non-revertive services. For example, in Figure 1, some non-revertive
   services could be active on Node1 while others are active on Node2
   or Node3, to better load-balance service traffic. In this mode, each
   L2/L3 and/or L4/L7 non-revertive service is identified by a unique ID
   that is known to all the routing/service nodes providing the
   services.

   A bitmap is used to represent the non-revertive services, with each
   non-revertive service represented by one bit in the bitmap. All the
   service nodes MUST use the same mapping of bit positions to
   non-revertive service unique IDs. This mapping between bit positions
   and unique service IDs could be maintained by a network controller.

   A node that is assigned as the backup for a given non-revertive
   service takes over as active in either of the following cases:

   1) The node assigned as the primary for this service fails.

   2) This specific service fails on the primary node for this service.

   In case 1, the BFD session goes down, since this is a node failure.
   In case 2, the BFD session between the nodes remains up. In either
   case, the backup node becomes active for the non-revertive service.
   In case 1, the backup node sets the new diag code in its BFD control
   packets once the BFD session is re-established, and keeps setting it
   for at least twice the detection multiplier count.
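
   The shared bit-position mapping described above, together with the
   rules for building and interpreting the bitmap payload given later
   in this section, can be sketched in Python as below. The draft does
   not define a mapping or a wire layout, so both are hypothetical here:
   an example service-ID-to-bit table, and bit position 0 taken as the
   most significant bit of the first payload byte. The sketch also
   assumes that "not active" in the sender rule refers to services the
   sending node backs up, and it scopes the no-payload receive rule to
   the services that the sending node backs up, following the Section
   2.1 example.

```python
from typing import Dict, Iterable, Optional, Set

# HYPOTHETICAL controller-distributed mapping: unique service ID -> bit
# position. All service nodes must hold the same mapping.
EXAMPLE_MAPPING: Dict[int, int] = {1001: 0, 1002: 1, 1003: 2, 2001: 9}


def encode_bitmap(active_ids: Iterable[int], bit_of: Dict[int, int]) -> bytes:
    """Set one bit per active non-revertive service.

    Assumed layout: bit position p is bit (7 - p % 8) of byte p // 8,
    so position 0 is the most significant bit of the first byte.
    """
    buf = bytearray(max(bit_of.values()) // 8 + 1)
    for sid in active_ids:
        pos = bit_of[sid]
        buf[pos // 8] |= 0x80 >> (pos % 8)
    return bytes(buf)


def decode_bitmap(payload: bytes, bit_of: Dict[int, int]) -> Set[int]:
    """Return the service IDs whose bits are set in the payload."""
    return {
        sid
        for sid, pos in bit_of.items()
        if pos // 8 < len(payload)
        and payload[pos // 8] & (0x80 >> (pos % 8))
    }


def build_payload(active: Set[int], backed_up: Set[int],
                  bit_of: Dict[int, int]) -> Optional[bytes]:
    """Sender side: attach the bitmap only if this node is active for some,
    but not all, of the non-revertive services it backs up."""
    if active and backed_up - active:
        return encode_bitmap(active, bit_of)
    return None


def must_not_activate(payload: Optional[bytes], primary_here: Set[int],
                      bit_of: Dict[int, int]) -> Set[int]:
    """Receiver side, for a packet that has the new diag code set.

    primary_here: non-revertive services this node is primary for and
    the sender backs up (assumed scoping, as in the Section 2.1 example).
    """
    if payload is None:
        return set(primary_here)  # no payload: withhold all of them
    return decode_bitmap(payload, bit_of)  # withhold only the set bits
```

   For example, a backup that has taken over service 1001 but not 1002
   would send the two-byte payload 0x80 0x00, and the primary would
   then withhold only service 1001.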

   In case 2, the backup node sets the new diag code in the next BFD
   control packets it sends after taking over as active for a given
   non-revertive service. If there is at least one non-revertive
   service for which the node is not active and at least one
   non-revertive service for which it is active, the node also sends
   the bitmap in the BFD control packet payload. The bits identifying
   the active non-revertive services are set in this bitmap. The new
   diag code and the optional bitmap payload are sent in the BFD
   control packets for at least twice the detection multiplier count.

   Therefore, if a node receives a BFD control packet that has the new
   diag code set but carries no payload, it MUST NOT activate any of
   the non-revertive services for which it is the primary and the
   sending node is the backup. If a payload is present in a BFD control
   packet that has the new diag code set, the receiving node MUST NOT
   activate the non-revertive services indicated by the set bits in the
   bitmap.

   Per service failover is not applicable to revertive services; they
   behave as described in Section 2.1.

3 Acknowledgements

4 Security Considerations

   This document does not introduce any additional security
   constraints.

5 IANA Considerations

   IANA is requested to assign a new diag code from the "BFD Diagnostic
   Codes" registry:

      Value  BFD Diagnostic Code Name
      -----  ------------------------------------------------
      0xNN   Out-lived and optional BitMap BFD control packet
             payload for non-revertive services.

6 References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

Authors' Addresses

   Sami Boutros
   VMware
   Email: boutross@vmware.com

   Ankur Dubey
   VMware
   Email: adubey@vmware.com

   Reshad Rahman
   Cisco
   Email: rrahman@cisco.com