idnits 2.17.1 draft-ietf-ipoib-link-multicast-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. == There are 4 instances of lines with non-RFC3849-compliant IPv6 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHALL not' in this paragraph: It is up to the network administrator to select a link MTU to use when configuring an IPoIB link. The link MTU SHALL not be greater than the MTU of any IB devices on the IPoIB link. Here the IB devices include IB switches, CAs, or routers. 
== Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: In case an IPoIB link spans more than one IB subnet, the IPoIB link MTU MUST not exceed the path MTU of any path connecting two nodes in the same IB partition. It is up to the network administrator to determine the appropriate path MTU value that will work for any node in the same IPoIB link. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'IP6MLD' is defined on line 602, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2373 (ref. 'AARCH') (Obsoleted by RFC 3513) ** Obsolete normative reference: RFC 2461 (ref. 'DISC') (Obsoleted by RFC 4861) -- Possible downref: Non-RFC (?) normative reference: ref. 'IBTA' ** Obsolete normative reference: RFC 2460 (ref. 'IPV6') (Obsoleted by RFC 8200) Summary: 6 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 INTERNET-DRAFT H.K. Jerry Chu 3 Sun Microsystems 4 Vivek Kashyap 5 IBM 6 Expires: December, 2003 June, 2003 8 IP link and multicast over InfiniBand networks 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with 13 all provisions of Section 10 of RFC2026. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that other 17 groups may also distribute working documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 Copyright (C) The Internet Society (2003). All Rights Reserved. 32 Abstract 34 This document specifies a method for setting up IP subnets and 35 multicast services over InfiniBand(TM) networks. Discussions in this 36 document are applicable to both IPv4 and IPv6, unless explicitly 37 specified. A separate document will cover unicast and encapsulation 38 of IP datagrams over InfiniBand networks. 40 Table of Contents 41 1.0 Introduction 42 2.0 Terminology 43 3.0 Basic IPoIB Transport - Unreliable Datagram 44 4.0 IB Multicast Architecture 45 5.0 IB Links vs. 
IPoIB Links 46 6.0 Setting up an IPoIB Link 47 6.1 Maximum Transmission Unit 48 6.2 IPoIB Link Q_Key 49 6.3 Other Link Attributes 50 7.0 The IPoIB Broadcast Group 51 8.0 Mapping for other Multicast Groups 52 9.0 Sending and Receiving IP Multicast Packets 53 10.0 IP Multicast Routing 54 11.0 New Types of Vulnerability in IB Multicast 55 12.0 Security Considerations 56 13.0 Acknowledgments 57 14.0 References 58 15.0 Author's Address 59 16.0 Full Copyright Statement 61 1.0 Introduction 63 InfiniBand Architecture (IBA) defines four layers of network services 64 corresponding to layer one through layer four of the OSI reference 65 model. For the purpose of running IP over an InfiniBand (IB) 66 network, the IB link, network, and transport layers collectively 67 constitute the data link layer to the IP stack. One can find a 68 general overview of IB architecture related to IP networks in 69 [IPoIB_ARCH]. 71 This document will focus on the necessary steps in order to lay out 72 an IP network on top of an IB network. It will describe all the 73 elements of an IP over InfiniBand (IPoIB) link, how to configure its 74 associated attributes, and how to set up basic broadcast and 75 multicast services for it. IPoIB links are the building blocks upon 76 which an IP network consisting of many IP subnets connected by 77 routers can be built. Subnetting allows the containment of broadcast 78 traffic within a single link. It also provides certain degree of 79 isolation for the administration purpose between nodes on different 80 subnets. 82 2.0 Terminology 84 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 85 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 86 document are to be interpreted as described in [RFC2119]. 88 3.0 Basic IPoIB Transport - Unreliable Datagram 90 InfiniBand defines four types of transport services [IBTA]. They are 91 reliable connection, unreliable connection, reliable datagram, 92 unreliable datagram. 
IBA also defines a special raw datagram service 93 for encapsulation purposes. Both unreliable datagram and raw datagram 94 define support for multicast. They provide the basic transport 95 mechanism that best matches the IP datagram paradigm. 97 IB unreliable datagram provides many additional features such as the 98 partition key (P_Key) protection, multiple queue pairs (QPs), and 99 Q_Key protection. Moreover, it defines a 32-bit invariant CRC 100 checksum, which provides much stronger protection against data 101 corruption, compared with the 16-bit CRC that a raw datagram carries. 103 For these reasons, IB unreliable datagram is considered to be a much 104 better choice as the basic IPoIB transport than the raw datagram, and 105 is chosen as the default IPoIB transport mechanism ([IPoIB_ARCH], 106 [IPoIB_ENCAP]). 108 4.0 IB Multicast Architecture 110 The following discussion gives a short overview of the multicast 111 architecture in InfiniBand. For a complete specification, the reader 112 is referred to [IBTA]. 114 IBA defines two layers of multicast services. Its link layer uses 115 multicast LIDs (MLIDs) in the Local Route Header (LRH). MLIDs are 116 allocated by the Subnet Manager (SM) and fall in the range from 117 0xC000 to 0xFFFE (approximately 16k). MLIDs are used by IB switches 118 to program their multicast forwarding tables. An IB switch 119 implementation may support far fewer MLIDs in its forwarding table, 120 though. 122 The IB network layer uses multicast GIDs (MGIDs) in the Global Route 123 Header (GRH). MGIDs closely resemble IPv6 multicast addresses [AARCH] 124 shown below. 126 | 8 | 4 | 4 | 112 bits | 127 +--------+----+----+---------------------------------------------+ 128 |11111111|flgs|scop| group ID | 129 +--------+----+----+---------------------------------------------+ 131 Figure 1 133 [IPoIB_ARCH] describes each field in more detail.
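The field layout in Figure 1 can be illustrated with a short sketch. This is not part of the specification; the function name split_mgid and the sample value FF12::1 are hypothetical, chosen only to show how the 8-bit prefix, 4-bit flags, 4-bit scope, and 112-bit group ID are separated. It assumes an MGID is written in the same textual form as an IPv6 address.

```python
import ipaddress


def split_mgid(mgid: str) -> dict:
    """Split a textual MGID into the fields of Figure 1 (illustrative only)."""
    value = int(ipaddress.IPv6Address(mgid))
    # The top 8 bits of a multicast GID are all ones.
    if (value >> 120) != 0xFF:
        raise ValueError("not a multicast GID: top 8 bits must be all ones")
    return {
        "flags": (value >> 116) & 0xF,          # 4-bit flgs field
        "scope": (value >> 112) & 0xF,          # 4-bit scop field
        "group_id": value & ((1 << 112) - 1),   # low 112 bits
    }


if __name__ == "__main__":
    # FF12::1 is an arbitrary sample value, not one defined by this document.
    print(split_mgid("FF12::1"))
```

The shifts follow directly from the field widths in the figure: 128 - 8 = 120 for the prefix, then 4 bits each for flags and scope.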
135 Since every IB multicast packet is required to carry a LRH and a GRH, 136 both a valid MGID and a valid MLID are needed before an IB multicast 137 packet can be constructed. 139 An IB multicast group is uniquely identified by a valid MGID. Before 140 a MGID can be used within an IB subnet, either as a destination 141 address of a multicast packet, or to represent a multicast group that 142 an IB node can join, an IB multicast group corresponding to the MGID 143 must be created through the Subnet Administrator (SA). Besides 144 the MGID, the creator of an IB multicast group must supply values of 145 path MTU, P_Key, Q_Key, Service Level (SL), FlowLabel, TClass that 146 are appropriate for all the potential clients of the multicast group 147 to use. In return, SA will allocate a MLID to be used by switches in 148 the local IB subnet. 150 Unreliable multicast is defined by IBA as an optional functionality 151 for channel adaptors (CAs) and switches. In today's IP technology, 152 link multicast has become an indispensable function for better 153 supporting a modern IP network. For this reason, it is required that 154 an IPoIB fabric supports multicast. This includes all the CAs and 155 switches that are part of an IP network. 157 5.0 IB Links vs. IPoIB Links 159 A link segment on top of which an IP subnet can be configured is 160 defined in [IPV6] as a communication facility or medium over which 161 nodes can communicate at the "link" layer. For most types of 162 communication media, the boundary between different data link 163 segments closely follows the physical topology of the network. For 164 instance, an Ethernet network connected by switches, hubs, or bridges 165 usually forms a single link segment and broadcast/multicast domain. 166 Different Ethernet segments can be connected by IP routers at the 167 network layer to form an IP network. 169 InfiniBand defines its own link-layer and subnets consisting of nodes 170 connected by IB switches and routers.
However, the IPoIB link 171 boundary need not follow the IB link boundary. Nodes residing on 172 different IB subnets can still communicate directly with one another 173 through IB routers at the InfiniBand network layer. This 174 communication at the network layer applies to unicast as well as 175 multicast. 177 The ultimate requirement for two nodes in the same IB fabric to 178 communicate at the IB level, besides physical connectivity, is a 179 common P_Key. 181 Partitioning in IB provides an isolation mechanism among nodes in an 182 IB fabric, much like VLANs in the Ethernet network. Each port of an 183 HCA (Host Channel Adaptor) contains a P_Key table holding all the 184 valid P_Keys the port is allowed to use. The P_Key table is set up by 185 the SM of the local IB subnet. Each QP is programmed with a P_Key 186 from the local P_Key table. This P_Key is carried in all the outgoing 187 packets from the QP, and is used to compare against the P_Key of all 188 incoming packets to the QP. Any packet with an invalid P_Key will be 189 discarded by the QP and a P_Key violation trap will be generated. IB 190 switches may optionally enforce partition checking too. 192 Following the above, IB partitions are the natural choice for 193 defining IPoIB link boundary. It also provides much needed 194 flexibility for a network administrator to group nodes logically into 195 different subnets in a large network. 197 6.0 Setting up an IPoIB Link 199 A network administrator defines an IPoIB link by setting up an IB 200 partition and assigning it a unique P_Key. Since a full-duplex 201 communication is required among IP nodes, full-membership P_Keys, 202 that is, those with the high-order bit set to 1 shall be used. An IB 203 partition may or may not span multiple IB subnets; and whether it 204 does or not is mostly transparent to IPoIB. 206 Each node attached to an IB partition MUST have one of its HCAs 207 assigned the P_Key to use. 
Note that the P_Key table of an HCA port 208 may contain many P_Keys. It is up to the implementation to define the 209 method by which the P_Key relevant to a particular IPoIB subnet is 210 determined and conveyed to the IPoIB stack. For instance, 211 implementations may resort to a manual configuration when choosing 212 the P_Key or a set of P_Keys for IPoIB, and rely on DHCP [DHCP] to 213 assign an IP subnet number to each IPoIB link. 215 Once an IB partition is established for IPoIB use, the link MTU and 216 Q_Key are two other attributes that must be chosen before an IPoIB 217 link can be configured. 219 6.1 Maximum Transmission Unit 221 IB defines five permissible maximum payload sizes (MTUs). They are 222 256, 512, 1024, 2048 and 4096 bytes. [IPV6] requires a link MTU of 223 1280 bytes or greater. To be more compatible with Ethernet, the 224 dominant network medium in both the LAN and WAN environments, the IPoIB 225 link MTU should be 1500 bytes or greater. This leaves only 2048 and 226 4096 bytes as the two acceptable MTUs for IPoIB. Channel adaptors 227 supporting a MTU smaller than the minimum requirement can still expose 228 an acceptable MTU to IP through an adaptation layer that fragments 229 larger messages into smaller IB packets, and reassembles them on the 230 receiving end. But this must be done in a way that is transparent to 231 the IP stack. 233 It is up to the network administrator to select a link MTU to use 234 when configuring an IPoIB link. The link MTU SHALL NOT be greater 235 than the MTU of any IB device on the IPoIB link. Here the IB devices 236 include IB switches, CAs, and routers. 238 In general, the largest possible link MTU should be employed 239 to attain better throughput.
One caveat is that once a 240 link MTU is chosen for a given IPoIB link, nodes connected by CAs of 241 a smaller MTU won't be able to join the link unless the whole link 242 and all the devices attached to it are reconfigured to use the 243 smaller MTU. 245 It may be desirable in some cases to use a smaller link MTU than the 246 full size. For example, bridging an IPoIB link with an Ethernet link 247 could be made much easier if the IPoIB link MTU is reduced to 1500 248 bytes. For IPv4, this may require a manual configuration of a 249 different link MTU than the maximum that all the nodes support. For 250 IPv6, one can use the MTU option of the router advertisement [DISC] 251 to announce a smaller MTU to all the nodes. 253 In case an IPoIB link spans more than one IB subnet, the IPoIB link 254 MTU MUST NOT exceed the path MTU of any path connecting two nodes in 255 the same IB partition. It is up to the network administrator to 256 determine the appropriate path MTU value that will work for any node 257 in the same IPoIB link. 259 6.2 IPoIB Link Q_Key 261 A Q_Key is programmed by the source QP in every IB datagram, and is 262 compared against the Q_Key of the destination QP. A Q_Key violation 263 will cause the offending datagram to be dropped, and a Q_Key 264 violation counter to be incremented on the receiving port. A trap is 265 also generated if the feature is supported on that port. 267 A single Q_Key must be selected for all the QPs attached to an IPoIB 268 link to use. It is recommended that a controlled Q_Key be used with 269 the high order bit set. This is to prevent non-privileged software 270 from fabricating and sending out bogus IP datagrams. All QPs 271 configured for a given IPoIB link SHALL be assigned the same per-link 272 Q_Key. 274 6.3 Other Link Attributes 276 TClass, FlowLabel, HopLimit, and SL are four other attributes that 277 are required if an IPoIB link covers more than a single IB subnet.
278 The selection of these values is implementation dependent. 279 Implementations must take into account the topology of IB subnets 280 comprising the IPoIB link to ensure a successful communication 281 between any two nodes in the same IPoIB link. 283 7.0 The IPoIB Broadcast Group 285 Once an IB partition is created with link attributes identified for 286 an IPoIB link, the network administrator must create a special IB 287 all-node multicast group (henceforth referred to as the broadcast 288 group) with these link attributes for every node on the IPoIB link to 289 join. The creation of an IB multicast group is through the use of 290 the "MCMemberRecord" SA attribute as described in the IBA 291 specification. 293 The MGID of an IPoIB broadcast group will embed in it the P_Key of 294 the IB partition that defines the IPoIB link. A special signature is 295 also embedded to identify all the MGIDs for IPoIB use only. For IPv4 296 over IB, the signature will be "0x401B". For IPv6 over IB, the 297 signature will be "0x601B". 299 For an IPv4 subnet, the MGID for this special IB multicast group 300 SHALL have the following format: 302 | 8 | 4 | 4 | 16 bits | 16 bits | 48 bits | 32 bits | 303 +--------+----+----+----------------+---------+----------+---------+ 304 |11111111|0001|scop|0100000000011011|< P_Key >|00.......0|<all 1's>| 305 +--------+----+----+----------------+---------+----------+---------+ 307 Figure 2 309 For an IPv6 subnet, the format of the MGID SHALL look like this: 311 | 8 | 4 | 4 | 16 bits | 16 bits | 80 bits | 312 +--------+----+----+----------------+---------+--------------------+ 313 |11111111|0001|scop|0110000000011011|< P_Key >|000.............0001| 314 +--------+----+----+----------------+---------+--------------------+ 316 Figure 3 318 As for the scop bits, if the IPoIB link is fully contained within a 319 single IB subnet, the scop bits SHALL be set to 2 (link-local). 320 Otherwise the scope will be set higher.
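As an illustration of the two broadcast MGID formats above, the following sketch assembles the 128-bit value from its fields. It is not a definitive implementation: the function name broadcast_mgid is hypothetical, the P_Key 0x8006 is only a sample value, and the low 32 bits of the IPv4 form are assumed to be all ones, consistent with the IPv4 limited broadcast address being mapped to this group.

```python
import ipaddress

IPV4_SIGNATURE = 0x401B  # IPoIB signature for IPv4 over IB
IPV6_SIGNATURE = 0x601B  # IPoIB signature for IPv6 over IB


def broadcast_mgid(p_key: int, ipv6: bool = False, scope: int = 2) -> str:
    """Build the broadcast MGID for an IPoIB link (illustrative sketch)."""
    if not 0 <= p_key <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    if not 0 <= scope <= 0xF:
        raise ValueError("scope must fit in 4 bits")
    signature = IPV6_SIGNATURE if ipv6 else IPV4_SIGNATURE
    # Low-order bits: ...0001 for IPv6; all ones in the low 32 bits for
    # IPv4 (an assumption, see the lead-in above).
    low = 0x1 if ipv6 else 0xFFFFFFFF
    value = (
        (0xFF << 120)        # 8 bits, all ones
        | (0x1 << 116)       # flags = 0001
        | (scope << 112)     # scop, 2 = link-local
        | (signature << 96)  # 16-bit IPoIB signature
        | (p_key << 80)      # 16-bit P_Key
        | low
    )
    return str(ipaddress.IPv6Address(value))


if __name__ == "__main__":
    print(broadcast_mgid(0x8006))             # IPv4 broadcast group
    print(broadcast_mgid(0x8006, ipv6=True))  # IPv6 broadcast group
```

A node coming up on the link could compute this value first with scope 2 and only then with a larger scope when searching the SA for the group to join.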
322 The broadcast group for IPv4 will serve to provide a broadcast 323 service for protocols like ARP to use. 325 When a node is first brought up on an IPoIB link identified by a 326 P_Key, it must look for the right broadcast group to join. This is 327 done by querying the SA MCMemberRecord database for a multicast group 328 with a MGID matching the one constructed from the link P_Key and the 329 IPoIB signature. The node SHOULD always look for a MGID of a link- 330 local scope first before attempting one with a greater scope. 332 Once the right MGID and broadcast group are identified, the local 333 node SHOULD use the MTU associated with the broadcast group. In case 334 the MTU of the broadcast group is greater than what the local HCA can 335 support, the node can not join the IPoIB link and operate as an IP 336 node. Otherwise the local node must join the broadcast group as a 337 "full member" and use the rest of link attributes associated with the 338 group for all future communication to the link. 340 In addition to the special all-node multicast group for broadcast 341 purpose, an all-router multicast group may be created at link 342 configuration time if an IP router will be attached to the link. This 343 is to facilitate IP multicast operations described later. An IB 344 multicast group for the all-router MGID must cover every IB subnet 345 that the IPoIB link encompasses. The format of the all-router MGID 346 will be covered in the next section. 348 8.0 Mapping for other Multicast Groups 350 The general IP multicast [IPMULT] support over IB is similar to the 351 case of the special broadcast group discussed above. An algorithmic 352 mapping is used so that given an IP multicast address, individual 353 host can compute the corresponding IB multicast address (MGID) all by 354 itself without having to consult an external entity. This also 355 removes the need for an externally maintained IP to IB multicast 356 mapping table. 
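The algorithmic mapping can be sketched in a few lines, using the field layout and group ID rules given in the remainder of this section. This is a minimal sketch under stated assumptions, not a normative algorithm: the function name ip_multicast_to_mgid is hypothetical, flags are fixed at 0001, and the scope defaults to 2 (link-local) for a link contained in a single IB subnet.

```python
import ipaddress


def ip_multicast_to_mgid(group: str, p_key: int, scope: int = 2) -> str:
    """Map an IP multicast address to an IPoIB MGID (illustrative sketch)."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError("not an IP multicast address")
    if not 0 <= p_key <= 0xFFFF:
        raise ValueError("P_Key must fit in 16 bits")
    if addr.version == 4:
        signature = 0x401B
        # The IPv4 group ID is the low 28 bits; the rest of the
        # 80-bit field is filled with zeros.
        group_id = int(addr) & 0x0FFFFFFF
    else:
        signature = 0x601B
        # The low 80 bits of the IPv6 group ID are used directly.
        group_id = int(addr) & ((1 << 80) - 1)
    value = (
        (0xFF << 120)
        | (0x1 << 116)       # flags = 0001 (transient)
        | (scope << 112)
        | (signature << 96)
        | (p_key << 80)
        | group_id
    )
    return str(ipaddress.IPv6Address(value))


if __name__ == "__main__":
    print(ip_multicast_to_mgid("224.0.0.2", 0x8006))  # IPv4 all-routers
    print(ip_multicast_to_mgid("ff02::2", 0x8006))    # IPv6 all-routers
```

Because the computation needs only the IP multicast address, the link P_Key, and the link scope, every host on the link arrives at the same MGID without consulting a mapping table.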
358 The IPoIB multicast mapping is depicted in Figure 4. The same mapping 359 function is used for both IPv4 and IPv6 except the IPoIB signature 360 field. 362 | 8 | 4 | 4 | 16 bits | 16 bits | 80 bits | 363 +--------+----+----+-----------------+---------+--------------------+ 364 |11111111|0001|scop|<IPoIB signature>|< P_Key >| group ID | 365 +--------+----+----+-----------------+---------+--------------------+ 367 Figure 4 369 Since a MGID allocated for transporting IP multicast datagrams is 370 considered only a transient link-layer multicast address, all IB 371 MGIDs allocated for IPoIB purposes SHOULD have T = 1. The scope bits 372 SHALL be the same as those of the all-node MGID for the same IPoIB 373 link. 375 An IP multicast address is used together with a given IPoIB link 376 P_Key to form the MGID of the IB multicast group. For IPv6 the lower 377 80 bits of the group ID are used directly in the lower 80 bits of the 378 MGID. For IPv4, the group ID is only 28 bits long and the rest of the 379 80 bits are filled with 0. 381 The rest of the bits are the same as those of the broadcast MGID. 382 For example, on an IPoIB link that is fully contained within a single 383 IB subnet with a P_Key of 0x8006, the MGIDs for the all-router 384 multicast group with group ID 2 [AARCH, IGMP2] are: 386 FF12:401B:8006:0:0:0:0:2 388 or 390 FF12:401B:8006::2 392 for IPv4 in a compressed format, and 394 FF12:601B:8006:0:0:0:0:2 396 or 398 FF12:601B:8006::2 400 for IPv6 in a compressed format. 402 A special case exists for the IPv4 limited broadcast address 403 "255.255.255.255" [HOSTS]. The address SHALL be mapped to the 404 broadcast MGID for IPv4 networks as described in section 7 above. 405 Also the IPv6 all-node multicast address "FF0X::1" [AARCH] maps 406 naturally to the special broadcast MGID for IPv6 networks. 408 9.0 Sending and Receiving IP Multicast Packets 410 Multicast in InfiniBand differs in a number of ways from multicast in 411 Ethernet.
This adds some complexity to an IPoIB implementation when 412 supporting IP multicast over IB. 414 A) An IB multicast group must be explicitly created through the SA 415 before it can be used. 417 This implies that in order to send a packet destined for an IP 418 multicast address, the IPoIB implementation must check with the SA on 419 the outbound link first for a "MCMemberRecord" that matches the MGID. 420 If one does exist, the MLID associated with the multicast group is 421 used as the DLID for the packet. Otherwise, it implies no member 422 exists on the local link. If the scope of the IP multicast group is 423 beyond link-local, the packet must be sent to the on-link routers 424 through the use of the all-router multicast group or the broadcast 425 group. This is to allow local routers to forward the packet to 426 multicast listeners on remote networks. The all-router multicast 427 group is preferred over the broadcast group for better efficiency. If 428 the all-router multicast group does not exist, the sender can assume 429 that there are no routers on the local link; hence the packet can be 430 safely dropped. 432 B) A multicast sender must join the target multicast group as a 433 "SendOnlyNonMember" before outgoing multicast messages from it can be 434 successfully routed. The "SendOnlyNonMember" join is different from 435 the regular "FullMember" join in two aspects. First, both types of 436 joins enable multicast packets to be routed FROM the local port, but 437 only the "FullMember" join causes multicast packets to be routed TO 438 the port. Second, the sender port of a "SendOnlyNonMember" join will 439 not be counted as a member of the multicast group for purposes of 440 group creation and deletion. 442 The following code snippet demonstrates the steps in a typical 443 implementation when processing an egress multicast packet. 
445 if the egress port is already a "SendOnlyNonMember", or a 446 "FullMember" 447 => send the packet 449 else if the target multicast group exists 450 => do "SendOnlyNonMember" join 451 => send the packet 453 else if scope > link-local AND the all-router multicast group exists 454 => send the packet to all routers 455 else 456 => drop the packet 458 Implementations should cache the information about the existence of 459 an IB multicast group, its MLID and other attributes. This is to 460 avoid expensive SA calls on every outgoing multicast packet. Senders 461 MUST subscribe to the multicast group create and delete traps in 462 order to monitor the status of specific IB multicast groups. E.g., 463 multicast packets directed to the all-router multicast group due to a 464 lack of listener on the local subnet must be forwarded to the right 465 multicast group if the group is created later. This happens when a 466 listener shows up on the local subnet. 468 A node joining an IP multicast group must first construct a MGID 469 according to the rule described in section 8 above. Once the correct 470 MGID is calculated, the node must call the SA of the outbound link to 471 attempt a "FullMember" join of the IB multicast group corresponding 472 to the MGID. If the IB multicast group doesn't already exist, one 473 must be created first with the IPoIB link MTU. For the rest of 474 attributes, the same values from the all-node multicast/broadcast 475 group SHOULD be used. 477 The join request will cause the local port to be added to the 478 multicast group. It also enables the SM to program IB switches and 479 routers with the new multicast information to ensure the correct 480 forwarding of multicast packets for the group. 482 When a node leaves an IP multicast group, it SHOULD make a 483 "FullMember" leave request to the SA. 
This gives SM an opportunity to 484 update relevant forwarding information, to delete an IB multicast 485 group if the local port is the last FullMember to leave, and free up 486 the MLID allocated for it. The specific algorithm is implementation- 487 dependent, and is out of the scope of this document. 489 Note that for an IPoIB link that spans more than one IB subnet 490 connected by IB routers, an adequate multicast forwarding support at 491 the IB level is required for multicast packets to reach listeners on 492 a remote IB subnet. The specific mechanism for this will be covered 493 in [IBTA], and is beyond the scope of IPoIB. 495 10.0 IP Multicast Routing 497 IP multicast routing requires multicast routers to receive a copy of 498 every link multicast packet on a locally connected link [IPMULT, 499 IP6MLD]. For Ethernet this is usually achieved by turning on the 500 promiscuous multicast mode on a locally connected Ethernet interface. 502 IBA does not provide any hardware support for promiscuous multicast 503 mode. Fortunately a promiscuous multicast mode can be emulated in 504 the software running on a router through the following steps. 506 A) Obtain a list of all active IB multicast groups from the local SA. 508 B) Make a "NonMember" join request to the SA for every group that has 509 a signature in its MGID matching the one for either IPv4 or IPv6. 511 C) Subscribe to the IB multicast group creation events using a 512 wildcarded MGID so that the router can "NonMember" join all IB 513 multicast groups created subsequently for IPv4 or IPv6. 515 The "NonMember" join has the same effect as a "FullMember" join 516 except that the former will not be counted as a member of the 517 multicast group for purposes of group creation or deletion. That is, 518 when the last "FullMember" leaves a multicast group, the group can be 519 safely deleted by the SA without concerning any "NonMember" routers. 
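Step B of the emulation above depends on recognizing which active groups carry an IPoIB signature. The sketch below shows only that filtering step, with the SA query and the "NonMember" join left out because they depend on the platform's IB management interface. The names is_ipoib_mgid and groups_to_join are hypothetical, and the sample MGIDs are illustrative values.

```python
import ipaddress

# IPoIB signatures carried in bits 96..111 of the MGID.
IPOIB_SIGNATURES = {0x401B, 0x601B}  # IPv4 over IB, IPv6 over IB


def is_ipoib_mgid(mgid: str) -> bool:
    """Return True if the MGID carries an IPv4 or IPv6 IPoIB signature."""
    value = int(ipaddress.IPv6Address(mgid))
    is_multicast = (value >> 120) == 0xFF
    signature = (value >> 96) & 0xFFFF
    return is_multicast and signature in IPOIB_SIGNATURES


def groups_to_join(active_mgids):
    """Select the groups a router would 'NonMember' join (step B only)."""
    return [mgid for mgid in active_mgids if is_ipoib_mgid(mgid)]


if __name__ == "__main__":
    # Sample list standing in for the result of step A (hypothetical data).
    active = ["FF12:401B:8006::2", "FF12:601B:8006::1", "FF12:A01B:8006::1"]
    print(groups_to_join(active))
```

The same predicate could be applied to each group creation event received under step C, so that groups created later are joined as they appear.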
521 11.0 New Types of Vulnerability in IB Multicast 523 Many IB multicast functions are subject to failures due to a number 524 of possible resource constraints. These include the creation of IB 525 multicast groups, the join calls ("SendOnlyNonMember", "FullMember", 526 and "NonMember"), and the attaching of a QP to a multicast group. 528 In general, the occurrence of these failure conditions is highly 529 implementation dependent, and is believed to be rare. Usually a 530 failed multicast operation at the IB level can be propagated back to 531 the IP level, causing the original operation to fail, and the 532 initiator of the operation to be notified. But some IB multicast 533 functions are not tied to any foreground operation, making their 534 failures hard to detect. E.g., if an IP multicast router attempts to 535 "NonMember" join a newly created multicast group in the local subnet, 536 but the join call fails, packet forwarding for that particular 537 multicast group will likely fail silently, that is, without the 538 attention of local multicast senders. This type of problem can add 539 more vulnerability to the already unreliable IP multicast operations. 541 Implementations should log error messages upon any failure from an IB 542 multicast operation. Network administrators should be aware of this 543 vulnerability, and preserve enough multicast resources at the points 544 where IP multicast will be used heavily. E.g., HCAs with ample 545 multicast resources should be used at any IP multicast router. 547 12.0 Security Considerations 549 All the operations for creating and configuring an IPoIB link 550 described in this document, including assigning P_Keys to CAs, 551 creating IB multicast groups in SA, creating and attaching QPs to IB 552 multicast groups, etc., are privileged operations, and MUST be 553 protected by the underlying operating system.
This is to prevent 554 malicious, non-privileged software from hijacking important resources 555 and configurations. For example, a bogus IPoIB broadcast group may 556 prevent a proper one from being created when the network 557 administrator tries to set up a link. 559 Controlled Q_Keys SHOULD be used in IPoIB links. This is to prevent 560 non-privileged software from fabricating IP datagrams to send, as 561 mentioned in section 6.2. 563 13.0 Acknowledgments 565 The authors would like to thank Bruce Beukema, David Brean, Dan 566 Cassiday, Aditya Dube, Yaron Haviv, Michael Krause, Thomas Narten, 567 Erik Nordmark, Greg Pfister, Renato Recio, Kanoj Sarcar, Satya 568 Sharma, and David L. Stevens for their suggestions and many 569 clarifications on the IBA specification. 571 14.0 References 573 [AARCH] Hinden, R. and S. Deering, "IP Version 6 Addressing 574 Architecture", RFC 2373, July 1998. 576 [DHCP] R. Droms, "Dynamic Host Configuration Protocol", RFC 2131, 577 March 1997. 579 [DISC] Narten, T., Nordmark, E. and W. Simpson, "Neighbor 580 Discovery for IP Version 6 (IPv6)", RFC 2461, December 581 1998. 583 [HOSTS] Braden R., "Requirements for Internet Hosts -- 584 Communication Layers", RFC 1122, October 1989. 586 [IBTA] InfiniBand Architecture Specification, Release 1.0.a by 587 InfiniBand Trade Association at www.infinibandta.org 589 [IGMP2] Fenner W., "Internet Group Management Protocol, Version 2", 590 RFC 2236, November 1997. 592 [IPMULT] Deering S., "Host Extensions for IP Multicasting", RFC 593 1112, August 1989. 595 [IPoIB_ARCH] draft-ietf-ipoib-architecture-01.txt 597 [IPoIB_ENCAP] draft-ietf-ipoib-ip-over-infiniband-01.txt 599 [IPV6] Deering, S. and R. Hinden, "Internet Protocol, Version 6 600 (IPv6) Specification", RFC 2460, December 1998. 602 [IP6MLD] Deering S., Fenner W., Haberman B., "Multicast Listener 603 Discovery (MLD) for IPv6", RFC 2710, October 1999.
605 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 606 Requirement Levels", BCP 14, RFC 2119, March 1997. 608 15.0 Author's Address 610 H.K. Jerry Chu 611 17 Network Circle, UMPK17-201 612 Menlo Park, CA 94025 613 USA 615 Phone: +1 650 786-5146 616 EMail: jerry.chu@sun.com 618 Vivek Kashyap 619 IBM 620 15450, SW Koll Parkway 621 Beaverton, OR 97006 623 Phone: 503 578 3422 624 EMail: vivk@us.ibm.com 626 16.0 Full Copyright Statement 628 Copyright (C) The Internet Society (2003). All Rights Reserved. 630 This document and translations of it may be copied and furnished to 631 others, and derivative works that comment on or otherwise explain it 632 or assist in its implementation may be prepared, copied, published 633 and distributed, in whole or in part, without restriction of any 634 kind, provided that the above copyright notice and this paragraph are 635 included on all such copies and derivative works. However, this 636 document itself may not be modified in any way, such as by removing 637 the copyright notice or references to the Internet Society or other 638 Internet organizations, except as needed for the purpose of 639 developing Internet standards in which case the procedures for 640 copyrights defined in the Internet Standards process must be 641 followed, or as required to translate it into languages other than 642 English. 644 The limited permissions granted above are perpetual and will not be 645 revoked by the Internet Society or its successors or assigns. 647 This document and the information contained herein is provided on an 648 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 649 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 650 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 651 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 652 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.