Internet-Draft Framework of FFD for IP-based Network March 2023
Wang, et al. Expires 12 September 2023
Workgroup: Network Working Group
Internet-Draft: draft-wang-ffd-framework-01
Published: March 2023
Intended Status: Informational
Expires: 12 September 2023
Authors:
H. Wang
Huawei
F. Qin
China Mobile
L. Zhao
Huawei
S. Chen
Huawei

Framework of Fast Fault Detection for IP-based Network

Abstract

IP-based distributed systems and application software often use heartbeats to maintain network topology status. However, heartbeat intervals are usually set long, which prolongs fault detection. IP-based storage networks are a typical example: when a fault occurs in the network, NVMe connections need to be switched over, but no effective method for fast detection is currently available. Switchover is triggered only by keepalive timeout, which degrades performance.

This document defines a basic framework in which the network assists host devices in quickly detecting application connection failures caused by network faults.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 12 September 2023.

1. Introduction

Distributed systems based on network communication are widely used today. Heartbeats are a common technique for ensuring that both ends of a distributed system can perceive faults. However, relying on heartbeats to detect a faulty peer has drawbacks: if the heartbeat interval is too short, transient network disturbances may be misjudged as faults; if it is too long, a fault may go undetected for a long time.
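
The trade-off can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes that a peer is declared down after a fixed number of consecutive missed heartbeats, and the interval and miss-count values are hypothetical, not taken from any NVMe or storage specification.

```python
def worst_case_detection_time(interval_s: float, missed_limit: int) -> float:
    """Upper bound on the time between a fault and its detection.

    Assumption (not from this document): the peer is declared down after
    `missed_limit` consecutive heartbeats are missed, so a fault that occurs
    just after a heartbeat was received goes unnoticed for about
    interval_s * missed_limit seconds.
    """
    if interval_s <= 0 or missed_limit < 1:
        raise ValueError("interval must be positive and missed_limit >= 1")
    return interval_s * missed_limit


# Hypothetical settings: a conservative heartbeat and an aggressive one.
slow = worst_case_detection_time(interval_s=5.0, missed_limit=3)
fast = worst_case_detection_time(interval_s=0.1, missed_limit=3)
print(f"slow heartbeat: up to {slow:.1f} s before the fault is noticed")
print(f"fast heartbeat: up to {fast:.1f} s, but brief congestion may trip it")
```

Shortening the interval lowers the bound, but it also shrinks the margin that separates a real fault from a short burst of packet loss, which is the misjudgment risk described above.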

IP-based NVMe, distributed storage, and cluster computing are typical application scenarios that rely on this technique.

[I-D.guo-ffd-requirement] describes the problems of the current IP-based NVMe solution. On an IP-based storage area network, if the access link of a storage device fails, hosts cannot access the storage device. Because a host cannot detect the fault directly, it has to wait for the keepalive (KA) timeout. To speed up fault detection, hosts and storage devices can implement fast KA or BFD. However, this approach introduces additional cost on hosts and storage devices and is hard to use in a large-scale IP-based storage area network. In fact, the IP network can detect these faults directly, so it can assist the access endpoints in perceiving a fault quickly and thereby recovering service quickly.

2. Terminology

NoF: NVMe over Fabrics

FC: Fibre Channel

NVMe: Non-Volatile Memory Express

SAN: Storage Area Network

3. Reference Models

The framework described here applies to systems in which the terminals are directly connected to the IP network.

 +--------+    +-----------+     +-----------+    +--------+
 |Terminal|----| IP Network|-----| IP Network|----|Terminal|
 | Device |    |   Device  |     |   Device  |    | Device |
 +--------+    +-----------+     +-----------+    +--------+
             Figure 1 : Basic framework

Terminals are connected to the IP network and establish IP connections using the reachability that the network provides. When the connection path fails, the terminals cannot detect the failure quickly. They detect it only after the keepalive timeout expires, and only then can they start service protection processing. This delay may be relatively long. Therefore, the network needs to notify the terminal devices of failures that break IP connections between terminals, such as access port failures and internal network failures, so that the terminal devices can respond quickly and perform the corresponding service processing.

Figure 1 shows an abstract model. In actual use, as introduced in Section 1, there are scenarios such as IP-based NVMe, distributed storage, and cluster computing. IP-based NVMe is described here as a typical scenario; the processing behavior in the other scenarios is similar.

An IP-based storage area network mainly includes three types of roles:

o Initiator: a terminal device, also called the host.

o Switch: a network device that provides network access for terminal devices.

o Target: a terminal device, also known as the storage device.

               +--+      +--+      +--+      +--+
   Host        |H1|      |H2|      |H3|      |H4|
(Initiator)    +/-+      +-,+      +.-+      +/-+
                |         | '.   ,-`|         |
                |         |   `',   |         |
                |         | ,-`  '. |         |
              +-\--+    +--`-+    +`'--+    +-\--+
              | SW1|    | SW2|    | SW3|    | SW4|
              +--,-+    +---,,    +,.--+    +-.--+
                  `.          `'.,`         .`
                    `.   _,-'`    ``'.,   .`
    IP              +--'`+            +`-`-+
  Network           | SW5|            | SW6|
                    +--,,+            +,.,-+
                    .`   `'.,     ,.-``   ',
                  .`         _,-'`          `.
              +--`-+    +--'`+    `'---+    +-`'-+
              | SW7|    | SW8|    | SW9|    |SW10|
              +-.,-+    +-..-+    +-.,-+    +-_.-+
                | '.   ,-` |        | `.,   .' |
                |   `',    |        |    '.`   |
                | ,-`  '.  |        | ,-`  `', |
  Storage      +-`+      `'\+      +-`+      +`'+
  (Target)     |S1|      |S2|      |S3|      |S4|
               +--+      +--+      +--+      +--+
               Figure 2 : Large-scale SAN

Figure 2 shows a typical IP-based NVMe dual-plane storage area network. When the access link of a storage device fails, the host needs to detect the failure quickly so that the NVMe connection initiated by the host can switch to the backup path without delay.

4. Functional Components

An IP-based NVMe SAN consists of storage devices, hosts, and switches. Hosts and storage devices need to obtain the fault information they require from the IP network. Switches need to synchronize locally detected fault information across the IP network so that other switches can learn of the faults and notify the hosts or storage devices that require the fault information.

4.1. Storage Device

As the server side, storage devices provide storage access services for hosts. If a storage device is connected to an IP network and is interested in the status of other devices, the storage device can initiate a subscription request to the connected switch to obtain status notifications of other devices from the access switch.

To reduce the complexity of storage device implementation and improve device security, it is recommended to extend LLDP so that a storage device can subscribe at its access switch, and to use a new L2 protocol extension for the switch to notify the storage device of status information.

  +-------+                  +------+
  |Storage|                  |Switch|
  +-------+                  +------+
      |      Subscribe Msg      |
      | ----------------------->|
      |                         |
      |     Notification Msg    |
      | <-----------------------|
      |                         |
      |                         |
      Figure 3 : Storage Device
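
The two messages in Figure 3 could be modeled as follows. This is a minimal sketch under stated assumptions: this document does not define the message contents or the LLDP encoding, so the class names, field names, and status values below are hypothetical and only illustrate what a subscription and a notification need to convey.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address
from typing import FrozenSet, List


class Status(Enum):
    UP = "up"
    DOWN = "down"


@dataclass(frozen=True)
class SubscribeMsg:
    """Sent by a terminal device to its access switch (hypothetical layout)."""
    subscriber: str              # address of the subscribing device
    watched: FrozenSet[str]      # addresses whose status it wants to learn


@dataclass(frozen=True)
class NotificationMsg:
    """Sent by the access switch to a subscriber (hypothetical layout)."""
    address: str                 # address whose status changed
    status: Status


def make_subscription(subscriber: str, peers: List[str]) -> SubscribeMsg:
    # Validate and normalise every address before building the message;
    # ip_address() raises ValueError on malformed input.
    canonical = frozenset(str(ip_address(p)) for p in peers)
    return SubscribeMsg(str(ip_address(subscriber)), canonical)


sub = make_subscription("192.0.2.10", ["192.0.2.20", "192.0.2.21"])
note = NotificationMsg("192.0.2.20", Status.DOWN)
print(sub.subscriber, sorted(sub.watched))
print(note.address in sub.watched, note.status.value)
```

The wire format is deliberately left out; only the information carried by each message is sketched.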

4.2. Host

As a client accessing a storage device, the host needs to obtain the service status of the storage device quickly. When the host receives a failure notification for the storage device from the switch, it promptly tears down the connection in use and switches to the redundant storage device.

The recommended protocol on the host side is the same as that on the storage device.

+-------+                  +------+
|  HOST |                  |Switch|
+-------+                  +------+
    |      Subscribe Msg      |
    | ----------------------->|
    |                         |
    |     Notification Msg    |
    | <-----------------------|
    |                         |
    |                         |
     Figure 4 : Host Device
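
The host-side reaction to a notification can be sketched as below. This is an illustration only: the `HostPaths` class, its method names, and the preference-ordered target list are assumptions made for the example, and a real NVMe host would act through its multipath layer rather than through code like this.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class HostPaths:
    """Paths from one host to its storage, in order of preference."""
    targets: List[str]                              # target IP addresses
    failed: Set[str] = field(default_factory=set)   # targets reported down

    def active(self) -> Optional[str]:
        # The first target not known to be failed carries the I/O.
        for target in self.targets:
            if target not in self.failed:
                return target
        return None

    def on_notification(self, address: str, is_up: bool) -> Optional[str]:
        # Apply a status notification from the access switch and return
        # the target that should be used afterwards.
        if is_up:
            self.failed.discard(address)
        else:
            self.failed.add(address)
        return self.active()


paths = HostPaths(targets=["192.0.2.20", "198.51.100.20"])
print(paths.active())
print(paths.on_notification("192.0.2.20", is_up=False))
```

In this sketch the host reverts to the preferred target as soon as it is reported up again; whether to revert automatically is a policy choice that this document does not prescribe.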

4.3. Network Device

The switch can quickly detect local failures or network failures and can calculate the affected IP addresses based on these failures. The switch synchronizes the affected IP address information to the other switches in the IP network. After a switch obtains the fault information, it notifies the hosts that require it so that they can quickly switch to the redundant storage device.

+------+                  +------+
|Switch|                  |Switch|
+------+                  +------+
   |    Information Sync     |
   | ----------------------->|
   |                         |
   |                         |
   |                         |
    Figure 5 : Network Device
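
One way a switch might calculate the affected IP addresses is sketched below. It assumes the switch keeps a table of which terminal addresses are reached through which local port; the `AccessTable` class and the port names are hypothetical. The sketch also assumes that an address reachable through another surviving local port is not affected, which is a design choice of this example rather than a rule stated in this document.

```python
from collections import defaultdict
from typing import Dict, Set


class AccessTable:
    """Which terminal addresses are reached through which local port."""

    def __init__(self) -> None:
        self._by_port: Dict[str, Set[str]] = defaultdict(set)

    def learn(self, port: str, address: str) -> None:
        self._by_port[port].add(address)

    def affected_by(self, failed_port: str) -> Set[str]:
        # Start from the addresses behind the failed port, then remove any
        # address that a surviving port also reaches.
        candidates = self._by_port.get(failed_port, set())
        still_reachable: Set[str] = set()
        for port, addresses in self._by_port.items():
            if port != failed_port:
                still_reachable |= addresses
        return candidates - still_reachable


table = AccessTable()
table.learn("eth1", "192.0.2.20")
table.learn("eth2", "192.0.2.21")
table.learn("eth3", "192.0.2.21")   # same device, second link
print(sorted(table.affected_by("eth1")))
print(sorted(table.affected_by("eth2")))
```

The resulting set is what the switch would synchronize to the other switches.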

5. Procedures

5.1. Network Deployment

An IP-based SAN uses standard Ethernet technology. Network deployments typically use current IP technologies; for example, OSPF is usually deployed as the underlay protocol.

5.2. Storage and Host Access

Hosts and storage devices are connected to the Ethernet network. The administrator assigns access IP addresses to the hosts and storage devices. In most scenarios, routes to these addresses can be advertised through the underlay protocol. In addition, after hosts and storage devices go online, they need to send subscription requests to the switch to obtain the status information of the target device.

So that hosts and storage devices do not need to learn any additional IP address, it is recommended that LLDP be used to carry this message.

5.3. Status Information Sync and Notification

When hosts and storage devices go online, the switch calculates the initial state of these devices and synchronizes that state across the IP network.

After detecting a local fault, the switch needs to notify the other access devices that need the fault information. In addition, the switch needs to synchronize the fault information to other switches on the network. To ensure that synchronization messages are delivered reliably to other switches, a reliable transport protocol, such as TCP or QUIC, must be used. For large-scale IP networks, hierarchical synchronization can be used to reduce the number of sessions between switches.
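
The benefit of hierarchical synchronization can be estimated by counting sessions. The sketch below compares a full mesh with a two-level arrangement in which a few switches act as synchronization relays; the relay role and the function names are assumptions made for this illustration, since this document does not specify how the hierarchy is built.

```python
def full_mesh_sessions(switches: int) -> int:
    # Every pair of switches keeps one synchronization session.
    return switches * (switches - 1) // 2


def hierarchical_sessions(switches: int, relays: int) -> int:
    """Sessions when `relays` of the switches act as synchronization relays.

    Assumption: each non-relay switch keeps one session to every relay,
    and the relays are fully meshed among themselves.
    """
    if not 0 < relays <= switches:
        raise ValueError("relays must be between 1 and the switch count")
    clients = switches - relays
    return clients * relays + full_mesh_sessions(relays)


n = 100
print(full_mesh_sessions(n))         # 4950
print(hierarchical_sessions(n, 2))   # 197
```

With 100 switches, two relays cut the session count from 4950 to 197, at the cost of one extra hop for each synchronized update.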

The synchronized information about hosts and storage devices is application-layer information.

+-------+           +----+      +------+      +----+         +-------+
|  HOST |-----------|TOR1|------|Spine1|------|TOR3|---------|Storage|
+---/---+           +-/--+      +--/---+      +-/--+         +---/---+
    |---------------->|  Info Sync |  Info Sync |<---------------|
    |  SubscribeMsg   |----------->|<-----------|  Subscribe Msg |
    |                 |<-----------|----------->|                |
    |<----------------|  Info Sync |  Info Sync |                |
    |Notification Msg |            |            |                |
    |                 |            |            |                |
            Figure 6 : Information Advertisement

When an access link fails, the access switch can detect the failure. Based on the faulty link, the access switch can calculate the IP addresses of the affected devices. The access switch advertises the affected IP address information to the other local devices that need to learn of the fault. At the same time, the switch synchronizes the calculated affected IP address information to the other switches in the IP network.

After a switch receives synchronized fault IP information from other switches, it notifies the local access devices that require the fault information.
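
The notification step on the receiving switch amounts to matching the synchronized addresses against the local subscriptions. The sketch below shows one way to do this; the `SubscriptionTable` class and its method names are hypothetical and are not defined by this framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class SubscriptionTable:
    """Local subscribers on one access switch, indexed by watched address."""

    def __init__(self) -> None:
        self._watchers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, subscriber: str, watched: Iterable[str]) -> None:
        for address in watched:
            self._watchers[address].add(subscriber)

    def notifications_for(self, failed: Iterable[str]) -> List[Tuple[str, str]]:
        # Return (subscriber, failed address) pairs. Addresses that no local
        # device subscribed to produce no notification at all.
        pairs = []
        for address in failed:
            for subscriber in sorted(self._watchers.get(address, ())):
                pairs.append((subscriber, address))
        return pairs


table = SubscriptionTable()
table.subscribe("192.0.2.10", ["198.51.100.20", "198.51.100.21"])
table.subscribe("192.0.2.11", ["198.51.100.20"])
for subscriber, address in table.notifications_for(["198.51.100.20", "203.0.113.9"]):
    print(f"notify {subscriber}: {address} is unreachable")
```

Indexing by watched address keeps the lookup proportional to the number of failed addresses rather than to the number of subscribers.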

For faults inside the network, ECMP or redundant link protection is usually deployed so that a single failure does not interrupt connectivity.

However, when a fault occurs from which the network cannot converge, the access switch can detect it quickly by deploying a detection technology, calculate the IP addresses affected by the fault, and then perform the same actions as above.

6. Security Considerations

To limit the scope in which this information is communicated and to reduce the impact of possible information flooding, it is suggested that the Subscribe Msg and Notification Msg in this framework be implemented through an L2 protocol extension, so that the sending and receiving of this information is confined to the access network device within the domain. In addition, a network device is not allowed to forward these messages; it may only receive or send them as needed.

For the communication protocol between network devices, security can be ensured with commonly used mechanisms, including but not limited to TCP-AO for authentication and TLS for encryption.
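
As an illustration of the TLS option, the sketch below builds a client-side TLS configuration that a synchronization session could use. It relies only on the Python standard library `ssl` module; the function name is hypothetical, and a real deployment would additionally load its own CA and device certificates, which are not shown here.

```python
import ssl


def make_sync_client_context() -> ssl.SSLContext:
    # Default client settings: certificate verification and hostname
    # checking are both enabled.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_sync_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The context would then be used to wrap the TCP connection between two switches before any fault information is exchanged.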

7. IANA Considerations

This document makes no request of IANA.

8. References

8.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

8.2. Informative References

[I-D.guo-ffd-requirement]
Guo, L., Feng, Y., Zhao, J., Qin, F., Zhao, L., Wang, H., and W. Quan, "Requirement of Fast Fault Detection for IP-based Network", Work in Progress, Internet-Draft, draft-guo-ffd-requirement-01, <https://datatracker.ietf.org/doc/html/draft-guo-ffd-requirement-01>.

Authors' Addresses

Haibo Wang
Huawei
No. 156 Beiqing Road
Beijing
100095
P.R. China

Fengwei Qin
China Mobile
Beijing
China

Lily Zhao
Huawei
No. 3 Shangdi Information Road
Beijing
100085
P.R. China

Shuanglong Chen
Huawei
No. 156 Beiqing Road
Beijing
100095
P.R. China