| < draft-adubey-bfd-service-redundancy-00.txt | draft-adubey-bfd-service-redundancy-01.txt > | |||
|---|---|---|---|---|
| INTERNET-DRAFT Sami Boutros | INTERNET-DRAFT Sami Boutros | |||
| Intended Status: Standard Track Ankur Dubey | Intended Status: Standard Track Ankur Dubey | |||
| VMware | VMware | |||
| Reshad Rahman | Reshad Rahman | |||
| Cisco | Cisco | |||
| Expires: November 15, 2017 May 14, 2017 | Expires: May 31, 2018 November 27, 2017 | |||
| Service Redundancy using BFD | Service Redundancy using BFD | |||
| draft-adubey-bfd-service-redundancy-00 | draft-adubey-bfd-service-redundancy-01 | |||
| Abstract | Abstract | |||
| In a data center, when multiple routing/service nodes are providing | In a data center, when multiple routing/service nodes are providing | |||
| single active redundancy for a set of L2, L3 and/or L4-L7 services, | single active redundancy for a set of L2, L3 and/or L4-L7 services, | |||
| both non-revertive and revertive fail over modes are required for the | both non-revertive and revertive fail over modes are required for the | |||
| services. This draft describes a method to achieve the non-revertive | services. This draft describes a method to achieve the non-revertive | |||
| and revertive fail over modes for services using Bidirectional | and revertive fail over modes for services using Bidirectional | |||
| Forwarding Detection (BFD). | Forwarding Detection (BFD). | |||
| skipping to change at page 2, line 21 ¶ | skipping to change at page 2, line 21 ¶ | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Solution Overview . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Solution Overview . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 5 | 2.1 Node failover . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 4 Security Considerations . . . . . . . . . . . . . . . . . . . . 5 | 2.2 Per service failover for non-revertive services . . . . . . 5 | |||
| 5 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 5 | 3 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 6 References . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 4 Security Considerations . . . . . . . . . . . . . . . . . . . . 6 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 5 | 5 IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 6 References . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 6 | ||||
| 1 Introduction | 1 Introduction | |||
| This document describes how a group of service/routing nodes in a | This document describes how a group of service/routing nodes in a | |||
| data center providing single active redundancy for multiple L2/L3 | data center providing single active redundancy for multiple L2/L3 | |||
| and/or L4/L7 services can use the BFD protocol to support non-revertive | and/or L4/L7 services can use the BFD protocol to support non-revertive | |||
| as well as revertive fail over modes. | as well as revertive fail over modes. | |||
| Typically, BFD is used between the group of service nodes to verify | Typically, BFD is used between the group of service nodes to verify | |||
| the connectivity as well as the aliveness of the service nodes. The | the connectivity as well as the aliveness of the service nodes. The | |||
| skipping to change at page 4, line 21 ¶ | skipping to change at page 4, line 21 ¶ | |||
| // | \ | // | \ | |||
| // | \ | // | \ | |||
| +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ | |||
| |Node1 |-BFD-|Node2 |-BFD-|Node3 | | |Node1 |-BFD-|Node2 |-BFD-|Node3 | | |||
| +-------+ +-------+ +-------+ | +-------+ +-------+ +-------+ | |||
| |--------------BFD--------| | |--------------BFD--------| | |||
| Figure 1: | Figure 1: | |||
| Figure 1 shows 3 routing nodes using BFD to implement the single | Figure 1 shows 3 routing nodes using BFD to implement the single | |||
| active redundancy for revertive and non-revertive services. | active redundancy for revertive and non-revertive services. More than | |||
| 3 routing nodes can be used. | ||||
| Multiple L2/L3 and/or L4/L7 services are offered in a data center by | Multiple L2/L3 and/or L4/L7 services are offered in a data center by | |||
| a set of routing/service nodes providing single active redundancy. | a set of routing/service nodes providing single active redundancy. | |||
| The provisioning of the services can be done using a centralized | The provisioning of the services can be done using a centralized | |||
| control plane implemented in a controller or using a distributed | control plane implemented in a controller or using a distributed | |||
| dynamic control plane. | dynamic control plane. | |||
| Every L2/L3 and/or L4/L7 service is identified by a unique ID known | 2.1 Node failover | |||
| across the routing/service nodes providing the services. | ||||
| A bitmap will be used to represent the services, where each service | An implementation MAY choose to support only node failover and not a | |||
| is represented by one bit in the bit map. All the service nodes MUST | per service failover. A node can be primary or backup for a given | |||
| have the same mapping of the bit position to the service unique ID. | service. On a primary node failure, all non-revertive and revertive | |||
| The bitmap position and the unique service ID could be maintained by | services will become active on the backup node. | |||
| a network controller. The bitmap will be used in the payload of the | ||||
| BFD packets sent by the service node to indicate which services the | ||||
| node maintains an active status for. | ||||
| Service nodes providing single active redundancy will communicate | In Figure 1, let's assume that Node1 is the primary node for a set A | |||
| this bitmap using BFD, carried in the BFD control packet payload. When | of non-revertive services with Node2 as backup, and another set B of | |||
| a backup service node takes over a service with a non-revertive fail | non-revertive services with Node3 as backup. As well, Node1 is | |||
| over mode after primary node failure, the backup node, once the BFD | primary for a set C of revertive services with Node2 as backup, and | |||
| session comes up with the recovered primary node, will set the bit | another set D of revertive services with Node3 as backup. | |||
| associated with this service in the bitmap payload carried in the BFD | ||||
| control packet sent to the primary node. Furthermore, the backup node | ||||
| will use a new Diag code in the BFD control packet to inform the | ||||
| primary node that it out-lived it and took over the set of | ||||
| non-revertive services encoded in the bitmap of the BFD control packet | ||||
| payload. | ||||
| The BFD control packet with the new Diag code and the bitmap will be | If Node1 fails, Node2 and Node3 will set a new diag code in the BFD | |||
| sent after the BFD session came up in the BFD control packets for at | control packet. This diag code will inform Node1 that both Node2 and | |||
| least twice the detection multiplier count. Only the non-revertive | Node3 didn't fail, and Node1 MUST NOT activate the non-revertive set | |||
| services associated bits in the bitmap will be set by a service node | of services A and B respectively, when it comes back up. The BFD | |||
| acting as a backup for those services after a primary node failure | control packet with the new diag code will be sent after the BFD | |||
| recovery. Primary node upon receiving the BFD control packet with the | session came up for at least twice the detection multiplier count. | |||
| bit set for the corresponding non-revertive service MUST NOT attempt | ||||
| to activate the service, but should remain in standby state for the | Therefore, Node1 upon receiving the BFD control packet with the new | |||
| service until the backup node that took over fails. | diag code, MUST NOT attempt to activate the non-revertive services, | |||
| but remain in standby state for the non-revertive services until the | ||||
| node that took over (Node2 or Node3) fails. | ||||
| Revertive services are assumed to revert back to the primary node | Revertive services are assumed to revert back to the primary node | |||
| after primary node recovers. Once the BFD session comes up between | Node1, after the node recovers. Once the BFD session comes up between | |||
| the primary and backup node, the backup node should stop forwarding | the primary and backup nodes, the backup node should stop forwarding | |||
| for any revertive services. A node MUST start forwarding all | for any revertive services. A node MUST start forwarding all | |||
| revertive services for which it is configured as a primary once the | revertive services for which it is configured as a primary once the | |||
| BFD session comes up with the corresponding backup nodes. A node MUST | BFD session comes up with the corresponding backup nodes. A node MUST | |||
| stop forwarding for revertive services for which it is a backup once | stop forwarding for revertive services for which it is a backup once | |||
| the BFD session comes up with the corresponding primary. | the BFD session comes up with the corresponding primary. | |||
| 2.2 Per service failover for non-revertive services | ||||
| An implementation MAY choose to support per service failover for | ||||
| non-revertive services. For example, in Figure 1, some non-revertive | ||||
| services could be active on Node1 while some non-revertive services | ||||
| could be active on Node2 or Node3 for better load balancing of | ||||
| services traffic. In this mode, every L2/L3 and/or L4/L7 non- | ||||
| revertive service will be identified by a unique ID known across the | ||||
| routing/service nodes providing the services. | ||||
| A bitmap will be used to represent the non-revertive services, where | ||||
| each non-revertive service is represented by one bit in the bitmap. | ||||
| All the service nodes MUST have the same mapping of the bit position | ||||
| to the non-revertive service unique ID. The bitmap position and the | ||||
| unique service ID could be maintained by a network controller. | ||||
| A node that is assigned as backup for a given non-revertive service | ||||
| will take over as active in either of the following cases: 1) | ||||
| The node assigned as primary for this service failed. 2) This | ||||
| specific service failed on the primary node for this service. | ||||
| In case 1, the BFD session will go down since it is a node failure. | ||||
| In case 2, the BFD session between the nodes will remain up. In either | ||||
| scenario, the node assigned as secondary will become active for the | ||||
| non-revertive service. In case 1, the secondary node will set the new | ||||
| diag code in the BFD control packets once the BFD session is | ||||
| established. The new diag code will be set in the BFD control packets | ||||
| for at least twice the detection multiplier count. In case 2, this | ||||
| diag code will be set in the next BFD control packets sent after the | ||||
| node takes over as Active for a given non-revertive service. If there | ||||
| is at least one non-revertive service for which this node is not | ||||
| active AND at least 1 non-revertive service for which it is active, | ||||
| the node will also send the bitmap in the BFD control packets | ||||
| payload. The bits identifying the active non-revertive services will | ||||
| be set in this bitmap. The new diag code and the optional bitmap | ||||
| payload will be sent in the BFD control packets for at least twice | ||||
| the detection multiplier count. | ||||
| Therefore, if a node receives a BFD control packet with the new diag | ||||
| code set but no payload in the BFD control packet, it MUST NOT | ||||
| activate any of the non-revertive services for which this node is | ||||
| primary. Whereas, if a payload is present in the BFD control packet | ||||
| that has the new diag code set, the receiving node MUST NOT activate | ||||
| the non-revertive services indicated by the set bits in the bitmap. | ||||
| Per service failover is not applicable to revertive services. They | ||||
| will behave the same way as described in Section 2.1. | ||||
| 3 Acknowledgements | 3 Acknowledgements | |||
| 4 Security Considerations | 4 Security Considerations | |||
| This document does not introduce any additional security constraints. | This document does not introduce any additional security constraints. | |||
| 5 IANA Considerations | 5 IANA Considerations | |||
| IANA is requested to assign a new diag code from the "BFD Diagnostic | IANA is requested to assign a new diag code from the "BFD Diagnostic | |||
| Codes" registry: | Codes" registry: | |||
| Value BFD Diagnostic Code Name | Value BFD Diagnostic Code Name | |||
| ----- ------------------------------------------------------------ | ----- ------------------------------------------------ | |||
| 0xNN Out-lived and BitMap payload set with non-revertive services | 0xNN Out-lived and optional BitMap BFD control packet | |||
| payload for non-revertive services. | ||||
| 6 References | 6 References | |||
| [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection | |||
| (BFD)", RFC 5880, June 2010. | (BFD)", RFC 5880, June 2010. | |||
| Authors' Addresses | Authors' Addresses | |||
| Sami Boutros | Sami Boutros | |||
| VMware | VMware | |||
| Email: sboutros@vmware.com | Email: sboutros@vmware.com | |||
| Ankur Dubey | Ankur Dubey | |||
| VMware | VMware | |||
| Email: adubey@vmware.com | Email: adubey@vmware.com | |||
| Reshad Rahman | Reshad Rahman | |||
| Cisco | Cisco | |||
| Email: rrahman@cisco.com | Email: rrahman@cisco.com | |||
| End of changes. 13 change blocks. | ||||
| 43 lines changed or deleted | 87 lines changed or added | |||
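
The per service failover rules added in Section 2.2 of the -01 text can be illustrated with a short Python sketch. It is not taken from the draft. The draft does not define a wire encoding for the bitmap or a value for the new diag code (it is listed as 0xNN), so the big-endian layout, the fixed bitmap width, the `OUTLIVED_DIAG` placeholder, and all function names below are assumptions made only for illustration.

```python
from typing import Dict, Iterable, Optional, Set

# Placeholder only: the draft lists the new diag code as 0xNN (unassigned).
OUTLIVED_DIAG = 0x1F


def encode_bitmap(active: Iterable[str], bit_of: Dict[str, int], width: int) -> bytes:
    """Set one bit per active non-revertive service (assumed big-endian layout)."""
    value = 0
    for service_id in active:
        value |= 1 << bit_of[service_id]
    return value.to_bytes(width, "big")


def decode_bitmap(payload: bytes, bit_of: Dict[str, int]) -> Set[str]:
    """Return the service IDs whose bits are set in the payload."""
    value = int.from_bytes(payload, "big")
    return {sid for sid, bit in bit_of.items() if (value >> bit) & 1}


def build_payload(active: Iterable[str], known: Iterable[str],
                  bit_of: Dict[str, int], width: int) -> Optional[bytes]:
    """Sender side: include the bitmap only when the node is active for
    at least one, but not all, of the non-revertive services it knows."""
    active_set, known_set = set(active), set(known)
    if active_set and active_set != known_set:
        return encode_bitmap(active_set, bit_of, width)
    return None


def services_to_hold(diag: int, payload: Optional[bytes],
                     bit_of: Dict[str, int], primary_for: Iterable[str]) -> Set[str]:
    """Receiver side: non-revertive services that must stay in standby."""
    if diag != OUTLIVED_DIAG:
        return set()
    if not payload:
        # New diag code without a payload: hold every service we are primary for.
        return set(primary_for)
    # New diag code with a payload: hold the services flagged in the bitmap.
    return decode_bitmap(payload, bit_of)
```

Under these assumptions, all nodes share the same `bit_of` mapping (for example, distributed by a network controller), which is what lets the receiver interpret the sender's bitmap.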