INTERNET DRAFT
V. Kashyap
IBM
Expiration Date: October 26, 2001
April 26, 2001


IPv4 multicast and broadcast over InfiniBand networks


Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress''.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

This document specifies a method for the transmission of IP version 4
multicast and broadcast datagrams over InfiniBand subnets.

Table of Contents

1.0 Introduction
2.0 InfiniBand addresses
  2.1 Unicast GIDs
  2.2 Multicast GIDs
  2.3 InfiniBand Multicast group management
3.0 IPv4 on IB multicast fabrics
  3.1 Scope bits
    3.1.1 Options for implementing IPv4 subnets spanning multiple
          IB subnets
  3.2 Flag bits
  3.3 Mapping of IPv4 multicast address to IB address
    3.3.1 IPv4 multicast addresses spanning multiple IB subnets
4.0 Security considerations
5.0 Acknowledgement
6.0 References
7.0 Author's Address
8.0 Full Copyright statement

1.0 Introduction

IPv4 multicasting provides a means of transmitting IPv4 datagrams to a
group of interfaces. A group IPv4 address is used as the destination
address in the IPv4 datagram as documented in STD 5, RFC 1112 [1].
Standard mappings are defined for various media types, e.g. Ethernet
[1], FDDI (RFC 1188 [2]), and token ring (RFC 1469 [3]).

An IPv4 broadcast address is used to send packets to all the IPv4 nodes
in a specific IPv4 network.

The address range of the multicast addresses is 224.0.0.0 to
239.255.255.255. The limited broadcast address is 255.255.255.255. The
net broadcast address is <net, -1> or <net, subnet, -1>, i.e. the
network (or network and subnet) number followed by all-ones host bits.

This document defines the mappings for IPv4 multicast and broadcast
addresses to the InfiniBand multicast group addresses.

This document addresses the issues with respect to IPv4. It further
assumes the unreliable datagram and raw datagram services of the
InfiniBand Architecture (IBA). These services are described in the
InfiniBand architecture specification [4]. For a concise overview of
the InfiniBand architecture refer to
draft-kashyap-ipoib-requirements-00.txt [5].

IPv6 multicast over the datagram service of IBA will be described in a
subsequent document.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
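
As an illustration of the address ranges given above, the following is
a minimal Python sketch, not part of this specification, that
classifies an IPv4 destination address. The function name
classify_destination and its return labels are hypothetical, and the
optional network argument is an assumption used only to recognise a net
broadcast address.

```python
import ipaddress

# Ranges quoted in the Introduction.
MULTICAST_FIRST = ipaddress.IPv4Address("224.0.0.0")
MULTICAST_LAST = ipaddress.IPv4Address("239.255.255.255")
LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def classify_destination(addr, network=None):
    """Classify an IPv4 destination address (hypothetical helper).

    addr    -- dotted-quad string, e.g. "224.0.0.1"
    network -- optional prefix string, e.g. "192.0.2.0/24"; needed only
               to recognise the net broadcast (all-ones host) address
    """
    ip = ipaddress.IPv4Address(addr)
    if MULTICAST_FIRST <= ip <= MULTICAST_LAST:
        return "multicast"
    if ip == LIMITED_BROADCAST:
        return "limited-broadcast"
    if network is not None:
        net = ipaddress.IPv4Network(network)
        if ip == net.broadcast_address:
            return "net-broadcast"
    return "other"
```

Under these assumptions, classify_destination("192.0.2.255",
"192.0.2.0/24") reports a net broadcast address, while the same address
without the network argument is reported as "other", since the host
part cannot be identified without knowing the prefix length.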

This document utilises the text representations described in RFC 2373
[6] for both the IPv6 and InfiniBand (IB) addresses.

2.0 InfiniBand addresses

The InfiniBand architecture borrows heavily from the IPv6 architecture
in terms of the InfiniBand subnet structure and global identifiers
(GIDs).

The InfiniBand architecture defines the global identifier associated
with a port as follows:

   GID (Global Identifier): A 128-bit unicast or multicast identifier
   used to identify a port on a channel adapter, a port on a router, a
   switch, or a multicast group. A GID is a valid 128-bit IPv6 address
   (per RFC 2373) with additional properties/restrictions defined
   within IBA to facilitate efficient discovery, communication, and
   routing.

   Note: These rules apply only to IBA operation and do not apply to
   raw IPv6 operation unless specifically called out.

The raw IPv6 operation referred to in the note in the definition above
is the IPv6 mode of InfiniBand's raw datagram service. It does not mean
IPv6 itself. The routers and switches referred to in the above
definition are the InfiniBand routers and switches.

The InfiniBand (IB) specification defines two types of GIDs: unicast
and multicast.

2.1 Unicast GIDs

The unicast GIDs are defined, as in IPv6, with three scopes. The IB
specification states:

   a. link local: This is defined to be FE80::/10. The IB routers will
      not forward any packets with a link-local address in source or
      destination beyond the IB subnet.

   b. site local: FEC0::/10. A unicast GID used within a collection of
      subnets which is unique within that collection (e.g. a data
      center or campus) but is not necessarily globally unique. IB
      routers must not forward any packets with either a site-local
      Source GID or a site-local Destination GID outside of the site.

   c. global: A unicast GID with a global prefix, i.e. an IB router may
      use this GID to route packets throughout an enterprise or
      internet.

2.2 Multicast GIDs

The multicast GIDs also parallel the IPv6 multicast addresses. The IB
specification defines the multicast GIDs as follows:

   FFxy:<112 bits>

Flag bits:

The 4 bits, denoted by x above, are the 4 flag bits: 000T. The first
three bits are reserved and are set to zero. The last bit is defined as
follows:

   T=0: denotes a permanently assigned, i.e. well known, GID
   T=1: denotes a transient group

Scope bits:

The 4 bits, denoted by y in the GID above, are the scope bits. These
are defined as:

   Table 1: IB multicast GID scope values

   scope value    Address value
   0              Reserved
   1              Unassigned
   2              Link-local
   3              Unassigned
   4              Unassigned
   5              Site-local
   6              Unassigned
   7              Unassigned
   8              Organization-local
   9              Unassigned
   0xA            Unassigned
   0xB            Unassigned
   0xC            Unassigned
   0xD            Unassigned
   0xE            Global
   0xF            Reserved

The IB specification further refers to RFC 2373 [6] and RFC 2375 [7]
while defining the well known multicast addresses. However, it then
states that the well known addresses apply to IB raw IPv6 datagrams
only. The IB unreliable datagram (UD) service recognises only one well
known multicast address. This is the ALL_CHANNEL_ADAPTERS multicast
address defined to be FF02::1. The scope of this address is limited to
a single IB subnet.

2.3 InfiniBand Multicast group management

IB multicast groups (multicast GIDs) are managed by the subnet manager
(SM). The SM explicitly programs the IB switches in the fabric to
ensure that the packets are received by all the members of the
multicast group.

When the group is created a create request is sent to the SM. The
subnet manager records the group GIDs and the associated
characteristics. The group characteristics are defined by the group
path MTU, whether the group will be used for raw datagrams or
unreliable datagrams, the service level, the partition key associated
with the group, the LID associated with the group, etc.
These characteristics are defined at the time of the group creation.

Any member IB node wanting to participate in the group must join the
group. As part of the join operation the node is returned the group
characteristics. At the same time the subnet manager ensures that the
requestor can indeed participate in the group by verifying that it can
support the group MTU and that it has accessibility to the rest of the
group members. Other group characteristics may need verification too.

For groups that span IB subnet boundaries, the SM must interact with IB
routers to determine the presence of the group in other IB subnets. If
present, the MTU must match across the IB subnets. The P_Key is another
characteristic that must match across IB subnets, since the P_Key
inserted into a packet is not modified by the IB switches or IB
routers. Thus, if the P_Keys did not match, either the IB router(s) or
the destinations on other subnets might drop the packets. These
characteristics are returned to the IB endnode that joins the multicast
group.

A join operation may cause the SM to reprogram the fabric so that the
new member can participate in the multicast group.

3.0 IPv4 on IB multicast fabrics

The InfiniBand architecture defines multiple transport methods to
communicate between the IB endnodes. However, only two of these methods
support multicast. These are the IB unreliable datagram (UD) service
and the IB raw datagram service. Of the two, the raw IB datagram
service is optional. The UD service is the only service that all IB end
nodes must support.

IPv4 over InfiniBand multicast implementations are RECOMMENDED to use
the UD service of IB.

The IB specification does not make multicast support mandatory though.
Thus, in some IP subnets the multicast service may have to be
implemented using a multicast server or some other method.
It is RECOMMENDED that IPv4 implementations on IB be deployed on
fabrics that support multicast.

Note, however, that the mappings defined in this document are not
affected by the above choices. The IPv4 broadcast and multicast to IB
multicast GID mappings are applicable to any IPv4 over InfiniBand
network.

3.1 Scope bits

The IB multicast GID scope is as defined in Table 1. Thus the use of
link-local scope will confine the IB multicast packets to an IB subnet.

The link-local scope at the IB level conflicts with the requirement of
an IP broadcast address if IP subnets can span across IB subnets. The
IP broadcast address will need to be mapped to an IB GID that has a
greater scope than the IB subnet.

Extending the IB router to bridge such packets (using link-local scope
GIDs) across IB subnets implies an extension to the IB specification.
It also makes an IP limited broadcast address extend across all of the
connected IB subnets if any of them contains an IPv4 subnet, since all
IPv4 subnets support the broadcast address (multicast is optional).

The alternative of using the global scope has the same result of
extending the group across all the connected IB subnets when IPv4
subnets are created in the IB subnets.

This brings in the associated administrative difficulties of ensuring
common MTUs and P_Keys across IB subnets for implementing the IP
multicast groups. The IB multicast groups also take other values such
as the TClass, HopLimit, Flowid etc., which must likewise be kept
consistent across the IB subnets participating in the group and add to
the administrative burden.

The use of any other scope is not well defined in the IB specification.

Therefore, in the interest of simplicity, it is RECOMMENDED that the
IPv4 multicast and broadcast addresses be mapped to link-local scope IB
multicast GIDs.

It is further RECOMMENDED that IPv4 subnet implementations not span
multiple IB subnets.
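
The flag and scope nibbles of section 2.2 occupy the second byte of a
multicast GID. The following Python sketch, offered only as an
illustration, builds the leading 16 bits of a multicast GID from those
two fields and reads the scope back from a GID in text form. The names
multicast_gid_prefix, scope_of and scope_name are hypothetical and do
not come from the IB specification.

```python
import ipaddress

# Assigned scope values from the scope table in section 2.2.
SCOPE_NAMES = {
    0x2: "Link-local",
    0x5: "Site-local",
    0x8: "Organization-local",
    0xE: "Global",
}
RESERVED_SCOPES = {0x0, 0xF}


def multicast_gid_prefix(transient, scope):
    """Return the first 16 bits (FFxy) of a multicast GID as an int.

    x is the flag nibble 000T and y is the scope nibble.
    """
    if not 0 <= scope <= 0xF:
        raise ValueError("scope must fit in 4 bits")
    flags = 0b0001 if transient else 0b0000
    return 0xFF00 | (flags << 4) | scope


def scope_of(gid):
    """Extract the scope nibble from a multicast GID given as text."""
    top16 = int(ipaddress.IPv6Address(gid)) >> 112
    if top16 >> 8 != 0xFF:
        raise ValueError("not a multicast GID")
    return top16 & 0xF


def scope_name(scope):
    """Map a scope nibble to its name in the scope table."""
    if scope in RESERVED_SCOPES:
        return "Reserved"
    return SCOPE_NAMES.get(scope, "Unassigned")
```

Under these assumptions a transient group with link-local scope yields
the prefix FF12, and the ALL_CHANNEL_ADAPTERS address FF02::1 reports
scope 2 (Link-local).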
IPv4 subnetting can be used to span a particular IPv4 subnet with a
shorter mask across multiple IB subnets.

Note that the IB GID group takes a hop limit. However, setting a hop
limit in the SM does not limit the span of the multicast group. The hop
limit only specifies the hop limit that the packets sent out by end
nodes must use. Secondly, several IB subnets may each be only one
subnet away, linked by the same IB router or by multiple IB routers.

3.1.1 Options for implementing IPv4 subnets spanning multiple IB subnets

There are two alternatives for implementations that do need to span
multiple IB subnets and cannot use IPv4 subnetting for this. Based on
the discussions in the working group one of these methods will be
chosen.

1. Use of a 'spans multiple IB subnets' option

   Some implementations may however wish to implement IP subnets across
   IB subnets. All IPv4 over IB implementations MUST define a
   configuration parameter, associated with an IPv4 subnet, that
   defines the scope bits to be used for the multicast translations.
   The default is to use the link-local scope. This parameter MUST be
   IP subnet wide to ensure that all the IP end nodes map the addresses
   the same way.

   It is however NOT RECOMMENDED that such a parameter be set to
   anything but the default value. The implication is that the IP
   subnets are by default assumed to be within an IB subnet.

2. Use of an IP subnet number

   Each IP subnet can be associated with a specific 16-bit number. This
   number MUST be kept unique by the fabric administrator. This number
   MUST be made part of the multicast mapping, thereby creating unique
   IPv4 subnet wide mappings. In this case the IB multicast GIDs MUST
   use the global scope.

3.2 Flag bits

IPv4 multicast/broadcast addresses have no well defined IPv6 or IB
subnet mappings. The flag bits will therefore always be set to 0001 in
IPv4 multicast/broadcast mappings to the IB multicast GID.

3.3 Mapping of IPv4 multicast address to IB address

The IPv4 broadcast to IB multicast GID mapping is defined to be:

   FF1y::255.255.255.255

The IPv4 multicast to IB multicast GID mapping is correspondingly
defined as:

   FF1y::<IPv4 multicast address>

The default value of 'y' is 0x2. Thus the default translation of the
IPv4 broadcast address is FF12::255.255.255.255.

3.3.1 IPv4 multicast addresses spanning multiple IB subnets

If the IPv4 subnet spans multiple IB subnets the scope value will be
set according to the parameters defined in section 3.1.1.

If the IP subnet number is used the mapping will be defined as
FF1E::IP_subnet_number:<IPv4 address>.

4.0 Security considerations

No security issues are discussed in this document.

5.0 Acknowledgement

The author thanks David L. Stevens for his useful suggestions and
comments.

6.0 References

[1] RFC 1112: Host extensions for IP multicasting. S.E. Deering.

[2] RFC 1188: Proposed Standard for the Transmission of IP Datagrams
    over FDDI Networks. D. Katz.

[3] RFC 1469: IP Multicast over Token-Ring Local Area Networks.
    T. Pusateri.

[4] InfiniBand(TM) Architecture Specification Volume 1, Release 1.0

[5] draft-kashyap-ipoib-requirements-00.txt

[6] RFC 2373: IP Version 6 Addressing Architecture. R. Hinden,
    S. Deering.

[7] RFC 2375: IPv6 Multicast Address Assignments. R. Hinden,
    S. Deering.

7.0 Author's Address

Vivek Kashyap
IBM
15450, SW Koll Parkway
Beaverton, OR 97006

Work: 503 578 3422
Email: vivk@us.ibm.com

8.0 Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing the
copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.