INTERNET DRAFT                                                V. Kashyap
                                                                     IBM
Expiration Date: January 12, 2002                          July 12, 2001


         Transmission of IPv6 packets over InfiniBand networks


Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

This document specifies a method for the transmission of IP version 6 datagrams over InfiniBand subnets.

Table of Contents

   1.0  Introduction
   2.0  InfiniBand data link
        2.1  UD packet format
        2.2  IPv6 over UD requirements

Kashyap                                                         [Page 1]

INTERNET-DRAFT             IPv6 over InfiniBand            July 12, 2001

   3.0  IPv6 and IB Raw IPv6 datagrams
   4.0  IPv6 address mapping over InfiniBand
        4.1  Multicast address mapping
        4.2  Link local address mapping
        4.3  Stateless Autoconfiguration
   5.0  Address Resolution
        5.1  Link layer address
             5.1.1  LID
             5.1.2  Capability flags
             5.1.3  QPN and Q_Key
             5.1.4  GID
             5.1.5  Service Level
             5.1.6  Determining the MTU
             5.1.7  P_Key
   6.0  Maximum Transmission Unit
   7.0  Frame Format
   8.0  Security Considerations
   9.0  References
   10.0 Author's Address
   11.0 APPENDIX A: Introduction to InfiniBand
   12.0 APPENDIX B: Headers used in UD communication

1.0 Introduction

This memo specifies the MTU and frame format for transmission of IPv6 packets on InfiniBand networks. It also specifies two further items for InfiniBand networks. The first is the method of forming IPv6 link-local addresses. The second is the content of the Source/Target Link-layer Address option used in Router Solicitation, Router Advertisement, Redirect, Neighbor Solicitation and Neighbor Advertisement messages when those messages are transmitted on an InfiniBand network.

APPENDIX A at the end of this document gives a brief description of the InfiniBand(TM) architecture. The InfiniBand specification [1] can be found at www.infinibandta.org. The document 'IP over InfiniBand: Overview, issues and requirements' [2] provides a short overview of the InfiniBand architecture and of the issues involved in specifying IP over InfiniBand.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2.0 InfiniBand data link

InfiniBand (IB) provides multiple methods of packet exchange between two endpoints. These are:

   Reliable Connected (RC)
   Reliable Datagram (RD)
   Unreliable Connected (UC)
   Unreliable Datagram (UD)
   Raw Datagram
      - Raw IPv6 (R6)
      - Raw Ethertype (RE)

IPv6 can be specified over any one of these methods, over several of them, or over all of them. A case can be made for supporting any of the methods, depending on the desired parameters. However, only Unreliable Datagram is required to be supported by all IB nodes. Host channel adapters (HCAs) are additionally required to support the Reliable Connected and Unreliable Connected modes. This is not the case for target channel adapters. However, only the Unreliable Datagram and Raw Datagram modes support multicast.

The Raw IPv6 mode of Raw Datagram is an obvious fit for IPv6, since it can carry IPv6 packets directly.
Unfortunately, it has been specified as an optional feature in the InfiniBand specification. Given these conditions, this document specifies a method to encapsulate IPv6 packets over the UD mode of InfiniBand.

2.1 UD packet format

A UD packet may be transmitted in one of two ways:

1) Local (within an IB subnet) packets

   +--------+---------+---------+-------+---------+---------+
   |Local   |Base     |Datagram |Packet |Invariant| Variant |
   |Routing |Transport|Extended |Payload|   CRC   |   CRC   |
   |Header  |Header   |Transport|       |         |         |
   |        |         |Header   |       |         |         |
   +--------+---------+---------+-------+---------+---------+

2) Global (between IB subnets) packets

   +--------+-------+---------+---------+-------+---------+---------+
   |Local   |Global |Base     |Datagram |Packet |Invariant| Variant |
   |Routing |Routing|Transport|Extended |Payload|   CRC   |   CRC   |
   |Header  |Header |Header   |Transport|       |         |         |
   |        |       |         |Header   |       |         |         |
   +--------+-------+---------+---------+-------+---------+---------+

For details of the header formats, please refer to APPENDIX B.

2.2 IPv6 over UD requirements

The UD headers show that the IPv6 implementation must know the following information before it can send a packet to a peer:

   1. LID
   2. Service Level
   3. Path MTU between the communicating ports
   4. Partition Key
   5. Queue Pair Number
   6. Q_Key

3.0 IPv6 and IB Raw IPv6 datagrams

The IB specification defines the Raw IPv6 mode as being indistinguishable from IPv6. It has the following format:

   +-------+------+---------+---------+
   |Local  |      |         | 16-bit  |
   |Routing| IPv6 | Payload | Variant |
   |Header |Header|         |   CRC   |
   +-------+------+---------+---------+

Thus the simplest way of encapsulating IPv6 (or IPv4, for that matter) is to use the Raw IPv6 mode of IB transport. However, this is not likely to find favor, because the mode carries only a 16-bit CRC. Additionally, this mode is optional, so not all channel adapters will support it.
However, this mode is specified by IB and hence could be implemented on a subnet. Such an implementation is straightforward and is not specified in this document. Such an implementation is NOT RECOMMENDED. It is RECOMMENDED that all IPv6 over InfiniBand implementations utilise the UD mode.

4.0 IPv6 address mapping over InfiniBand

The IB specification utilises a GRH that is identical in appearance to the IPv6 header. The GRH appears in the global packet format in section 2.1 and in Figure GRH in APPENDIX B. The IB specification further specifies the multicast addresses for Raw mode exactly as in the IPv6 addressing architecture, RFC2373 [3]. For UD mode, the IB specification defines only one multicast group: FF02::1, the all-nodes-on-the-subnet multicast group. The IB group address refers to the IB subnet, whereas the IPv6 group would refer to all the IPv6 systems on the subnet. The two are in different address spaces but look very much the same.

Similarly, the IB specification describes a link-local address as FE80:: of the IB port. The IPv6 link-local address is exactly the same if the IPv6 interface is associated with the port. An IB router will not forward any packet that has the link-local prefix FE80::/10 in the source address of the GRH/IPv6 header.

There is therefore a lot of scope for confusion in setting up IPv6 over IB fabrics. The link layer addresses and rules are very similar, or identical, to the IPv6 addresses and rules.

4.1 Multicast address mapping

As noted above, the Raw IPv6 mode addresses are the same as the IPv6 addresses. The IB specification also refers to RFC2373 and acknowledges that all well-known IPv6 multicast addresses apply to IB's Raw IPv6 mode. It might therefore be assumed that the same mapping could be extended to UD multicast groups. However, multicast groups in the SA are keyed by the multicast address itself (the multicast GID). Each multicast GID is marked as being either for Raw mode or for UD mode.
Additionally, Raw IPv6 packets and UD packets cannot be received by QPs of the other type, multicast or otherwise. Thus this straightforward mapping does not work.

The IPv6 multicast addresses MUST be mapped to the corresponding IB multicast groups as follows:

   i)   The scope bits are copied as is.
   ii)  The transient flag is always set. No IPv6 address mapped over UD is a well-known IB address.
   iii) The 3rd and 4th octets of the IB multicast GID MUST always be set to 0x33 each (0x3333 as a 16-bit value).

Thus the IPv6 multicast address FFxy::<112 bits> is mapped to the IB multicast GID FF1y:3333::<96 bits>, where y is the scope and the low-order 96 bits are carried over unchanged. For example, the IPv6 all-nodes address FF02::1 maps to FF12:3333::1.

It follows from the above that IPv6 subnets are always fully contained within an IB subnet. An IPv6 local-scope multicast address is mapped to a local-scope IB multicast GID, and a packet sent to such an address cannot cross the IB subnet boundary. If IPv6 subnets were to span IB subnets, the resulting multicast groups would always span all IB subnets. This is because site-local scope values are not well defined in the IB specification, and the global scope applies to all subnets. Additionally, setting up multicast groups that cross IB subnets is difficult, since the various group parameters must be consistent across subnets. The task is made harder by the absence of a well-defined IB routing protocol and of a specification for IB node to IB router interaction.

4.2 Link local address mapping

Every port in an InfiniBand subnet has two unique interface identifiers:

   i)  the EUI-64 identifier associated with the port;
   ii) the 16-bit LID associated with the port.

It is RECOMMENDED that the link-local address be constructed from the port's EUI-64 identifier as per the rules specified in RFC2373 [3]. Note that this is exactly the same as the InfiniBand-defined link-local address for the IB subnet [1]. The link-local address MAY also be constructed from the 16-bit LID.
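The mapping rules of section 4.1 and the RECOMMENDED link-local construction of section 4.2 can be illustrated with a short Python sketch. It uses only the standard `ipaddress` module. The function names and the example GUID are hypothetical. The interface identifier follows the RFC2373 rule of inverting the universal/local ('u') bit of the EUI-64. This is an illustration under those assumptions, not a normative algorithm.

```python
import ipaddress

IPOIB_SIGNATURE = b"\x33\x33"  # octets 3 and 4 of every mapped IB multicast GID (rule iii)


def ipv6_mcast_to_ib_mgid(addr: str) -> str:
    """Map FFxy::<112 bits> to the IB multicast GID FF1y:3333::<96 bits>."""
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    raw = bytearray(ip.packed)
    scope = raw[1] & 0x0F          # rule i): copy the scope bits as is
    raw[1] = 0x10 | scope          # rule ii): flags nibble holds only the transient flag
    raw[2:4] = IPOIB_SIGNATURE     # rule iii): octets 3 and 4 become 0x33, 0x33
    # The remaining 12 octets (the low-order 96 bits) are carried over unchanged.
    return str(ipaddress.IPv6Address(bytes(raw)))


def interface_id_from_eui64(eui64: bytes) -> bytes:
    """RFC2373 interface identifier: the EUI-64 with the 'u' bit (0x02 of octet 0) inverted."""
    if len(eui64) != 8:
        raise ValueError("an EUI-64 identifier is 8 octets")
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]


def address_from_eui64(eui64: bytes, prefix: str = "fe80::/64") -> str:
    """Prepend a /64 prefix (link-local by default) to the port's interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    iid = int.from_bytes(interface_id_from_eui64(eui64), "big")
    return str(ipaddress.IPv6Address(int(net.network_address) | iid))


if __name__ == "__main__":
    print(ipv6_mcast_to_ib_mgid("ff02::1"))                        # ff12:3333::1
    print(address_from_eui64(bytes.fromhex("0002c90300012345")))   # fe80::202:c903:1:2345
```

Given a global /64 prefix instead of fe80::/64, the same `address_from_eui64` helper also shows the prefix-prepending step used in stateless autoconfiguration.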
4.3 Stateless Autoconfiguration

The IPv6 prefix is prepended to the interface identifier derived from the EUI-64 identifier associated with the port.

5.0 Address Resolution

Section 2.2 listed the parameters needed to send and receive an InfiniBand packet successfully.

5.1 Link layer address

The procedure for IPv6 address resolution is described in RFC2461 [5]. The Source/Target Link-layer Address option is specified as follows:

Option fields:

   Type:
      Source Link-Layer Address     1
      Target Link-Layer Address     2

   Length:  32

   Link Information:
       16 bits:  LID
        8 bits:  Capability flags
       24 bits:  QPN
       32 bits:  Q_Key
      128 bits:  GID
       48 bits:  zero pad

5.1.1 LID

This is the LID associated with the port to which the IPv6 address is attached by way of the logical interface.

5.1.2 Capability flags

Only the first 5 bits are defined; the rest are reserved for future use. The first 4 bits denote the InfiniBand modes, other than UD, over which IPv6 is supported:

   UC - unreliable connected
   RC - reliable connected
   RE - raw ethertype
   R6 - raw IPv6

Support of IPv6 over UD is mandatory and therefore need not be indicated in these bits. The other modes are all optional. Support of the raw IPv6 mode is described in the IB specification, but implementing it is a choice both in the IP stack and in the IB setup. The implementation details of the other formats are beyond the scope of this document. The flags provide a way for IPv6 over IB implementations to indicate these possibilities to each other. The use of these capabilities is then a choice between the communicating endpoints.

QPN flag: the QPN flag indicates that the endpoint supports applications that are tied to specific QPs. There may be a large number of QPs available at the endpoints, since the QP number is 24 bits. An endpoint can therefore choose to map various services (protocol and port pairs) to specific QPNs. This flag indicates the use of such demultiplexing.
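The following Python sketch shows the Link Information layout of section 5.1 together with the capability flags described above. It packs and parses the link information fields in network byte order: 16 + 8 + 24 + 32 + 128 + 48 bits, or 32 octets. The sketch rests on the following assumptions:

- The "first" capability bit is the most significant bit of the flags octet.
- "Length: 32" refers to the link information alone, so the option's Type and Length octets are not included.
- The constant and function names are hypothetical.

```python
import ipaddress
import struct

# Capability flag bits. ASSUMPTION: "first" bit = most significant bit of the
# 8-bit field. The CAP_* names are hypothetical.
CAP_UC = 0x80   # unreliable connected
CAP_RC = 0x40   # reliable connected
CAP_RE = 0x20   # raw ethertype
CAP_R6 = 0x10   # raw IPv6
CAP_QPN = 0x08  # QPN-based service demultiplexing
DEFINED_CAPS = CAP_UC | CAP_RC | CAP_RE | CAP_R6 | CAP_QPN


def pack_link_info(lid: int, flags: int, qpn: int, qkey: int, gid: str) -> bytes:
    """Pack the Link Information of section 5.1 in network byte order (32 octets)."""
    if not 0 <= lid <= 0xFFFF:
        raise ValueError("LID is 16 bits")
    if not 0 <= flags <= 0xFF or flags & ~DEFINED_CAPS:
        raise ValueError("only the first 5 capability bits are defined")
    if not 0 <= qpn <= 0xFFFFFF:
        raise ValueError("QPN is 24 bits")
    if not 0 <= qkey <= 0xFFFFFFFF:
        raise ValueError("Q_Key is 32 bits")
    gid_bytes = ipaddress.IPv6Address(gid).packed  # a GID uses the IPv6 address format
    # LID (16) | flags (8) + QPN (24) as one 32-bit word | Q_Key (32) | GID (128) | pad (48)
    return (struct.pack("!HII", lid, (flags << 24) | qpn, qkey)
            + gid_bytes + bytes(6))


def unpack_link_info(data: bytes) -> dict:
    """Inverse of pack_link_info; the trailing zero pad is not checked."""
    if len(data) != 32:
        raise ValueError("Link Information is 32 octets")
    lid, word, qkey = struct.unpack("!HII", data[:10])
    return {
        "lid": lid,
        "flags": word >> 24,
        "qpn": word & 0xFFFFFF,
        "qkey": qkey,
        "gid": str(ipaddress.IPv6Address(data[10:26])),
    }
```

A receiver that does not implement QPN demultiplexing can simply disregard `CAP_QPN` in the parsed flags and keep using the default QPN.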
The QPN flag will be set by hosts that want to advertise such use. Endpoints that do not support QPN demultiplexing do not set this flag. The receiver is free to ignore this flag, continue to use the default QPN (described below), and not determine the service-related QPN. By the same token, a host that implements QPN-based demultiplexing MUST accept packets received on the default QPN, even if it is demultiplexing the corresponding service by use of QPNs. The method of resolving a service to the corresponding QPN is not defined in this document.

5.1.3 QPN and Q_Key

This is the default QPN to be used to communicate with the end node. The sender lists the QPN to which it expects packets to be sent, and the target replies with its own QPN. The Q_Key is the corresponding Q_Key that the endpoints intend to use.

5.1.4 GID

The GID fulfils the needs of implementations that might prefer to use a well-defined, largely invariant link address to identify endpoints.

5.1.5 Service Level

Every IB packet must include the service level (SL). The service level may be derived from any parameter, such as QoS mappings or the load-balancing setup. The only condition is that it must be one of the valid SLs between the two endpoints. This information is kept by the subnet administrator (SA) component of the subnet manager (SM) [1]. It is up to the implementation to determine the SL in a suitable manner. It is RECOMMENDED that the SL used with the all-hosts multicast group (IB GID FF12:3333::1) be the default SL for all communication. An implementation MUST provide configuration parameters to define the method of determining the SL.

5.1.6 Determining the MTU

The MTU used for the multicast group is, by definition, the MTU that can be used by all the members of the IPv6 subnet.
Thus the IB interface MTU is set to the MTU returned when the interface joins the IB group FF12:3333::1, which corresponds to the IPv6 all-hosts multicast address.

5.1.7 P_Key

The partition key is required in all IB communication. This document recommends using the P_Key associated with the all-hosts multicast group (IB GID FF12:3333::1) for all IPv6 subnet related communication.

6.0 Maximum Transmission Unit

The IB specification defines the following set of link MTUs: 256, 512, 1024, 2048 and 4096 bytes.

All IPv6 hosts join the all-hosts multicast group, IB GID FF12:3333::1. This group includes the link MTU at which packets can be sent across the subnet. This value, along with other subnet parameters, is returned to the node when it joins the IB group, so the IPv6 host learns the MTU as a result of the join. This size may be modified by a Router Advertisement containing an MTU option, or by configuration (manual or via DHCP). If the configured MTU is larger than the value received from the IB multicast group, it MUST be ignored.

The subnet administrator MUST set up the all-hosts IB multicast group when an IPv6 subnet is set up. It is RECOMMENDED that the group be set up with an MTU of at least 2048 bytes.

7.0 Frame Format

The IB frame does not indicate the payload type. The queue pairs (QPs) at the endpoints are tied to specific 'users' who know the data they are likely to receive. In this scenario, if a common QP is used to receive multiple protocols, there are two options:

a) Introduce a protocol identifier header into the payload.

This option means introducing a 4-byte field into the packet before the IPv6 header, carrying the value assigned to indicate IPv6. This option, however, has a flaw.
The header is meant to protect against receiving packets of some other protocol, or random data, on the QP. However, random data is as likely to look like the new header as it is to look like an IPv6 header.

b) Identify the packet type from the IPv6 header requirements.

The received packet can be checked for IPv6 signatures, such as the value 6 in the first nibble. The likely conflict is with IPv4/ARP packets on the unicast connections, and these can be distinguished by looking at the first nibble. Any intermediate IB switches or IB routers might as well look at the first nibble (and other fields) directly, rather than at a header in the payload.

[ Opinions of the WG members are solicited on this. The author's preference is for not specifying an additional header. ]

The frame format is as follows:

   +-------+------+---------+---------+---------+---------+---------+
   |Local  |      |Base     |Datagram |  IPv6   |Invariant| Variant |
   |Routing| GRH* |Transport|Extended |Datagram |   CRC   |   CRC   |
   |Header |Header|Header   |Transport|         |         |         |
   |       |      |         |Header   |         |         |         |
   +-------+------+---------+---------+---------+---------+---------+

The GRH (marked GRH* above) is optional.

8.0 Security Considerations

This document specifies IPv6 packet transmission over a broadcast network. Any network of this kind is vulnerable to a sender claiming another's identity and forging traffic, and to eavesdropping. If this is a problem, the higher layers or applications are responsible for implementing suitable countermeasures.

9.0 References

[1] InfiniBand Architecture Specification, Volume 1, Release 1.0.
[2] V. Kashyap, "IP over InfiniBand: Overview, issues and requirements", draft-kashyap-ipoib_requirements-00.txt.
[3] R. Hinden, S. Deering, "IP Version 6 Addressing Architecture", RFC 2373.
[4] R. Hinden, S. Deering, "IPv6 Multicast Address Assignments", RFC 2375.
[5] T. Narten, E. Nordmark, W. Simpson, "Neighbor Discovery for IP Version 6 (IPv6)", RFC 2461.
[6] V. Kashyap, draft-kashyap-ipoib-ipv4-multicast-00.txt.
[7] V. Kashyap, draft-kashyap-ipoib-ipv4-and-arp-00.txt.

10.0 Author's Address

Vivek Kashyap
IBM
15450, SW Koll Parkway
Beaverton, OR 97006

Work: 503 578 3422
Email: vivk@us.ibm.com

11.0 APPENDIX A: Introduction to InfiniBand

For a more complete overview, the reader is referred to chapter 3 of the InfiniBand specification [1].

InfiniBand Architecture (IBA) defines a System Area Network (SAN) for connecting multiple independent processor platforms, I/O platforms and I/O devices. The IBA SAN is a communications and management infrastructure supporting both I/O and inter-processor communications for one or more computer systems. An IBA SAN consists of processor nodes and I/O units connected through an IBA fabric made up of cascaded switches and IB routers (connecting IB subnets). I/O units can range in complexity from a single-ASIC IBA-attached device, such as a LAN adapter, to a large, memory-rich RAID subsystem.

An IBA network is subdivided into subnets interconnected by IB routers. These are IB routers and IB subnets, not IP routers or IP subnets.

Each IB node or switch may attach to a single switch or to multiple switches, or nodes may be connected directly to each other. Each node interfaces with the link by way of channel adapters (CAs). The architecture supports multiple CAs per unit, with each CA providing one or more ports that connect to the fabric. Each CA appears as a node to the fabric.

The ports are the endpoints to which data is sent. However, each port may include multiple QPs (queue pairs) that may be directly addressed from a remote peer. From the point of view of data transfer, the QP number (QPN) is part of the address.

IBA supports both connection-oriented and datagram service between ports. The peers are identified by the QPN and the port identifier; in raw datagram mode the QPN is not used. A port may be identified by a local ID (LID) and, optionally, a Global ID (GID).
The GID is 128 bits long and is formed by concatenating a 64-bit subnet prefix and a 64-bit EUI-64 compliant portion (GUID). The LID is a 16-bit value that is assigned when the port becomes active. Note that the GUID is the only persistent identifier of a port; however, it cannot be used as an address in a packet. If the prefix is modified, the GID may change. The subnet manager may attempt to keep LID values constant across shutdowns, but that is not a requirement.

The GIDs and LIDs are assigned by the subnet manager. Every IB subnet has at least one subnet manager component that controls the fabric. The subnet manager assigns the LIDs and GIDs, and it programs the switches so that they route packets between destinations. The subnet manager and a related component, the subnet administrator (SA), are the central repository of all the information required to set up and bring up the fabric.

IB routers are components that route packets between IB subnets based on the GIDs. Thus within an IB subnet a packet may or may not include a GID, but a packet going across IB subnets must include the GID. A LID is always needed in a packet, since it determines the destination within a subnet.

A CA and a switch may have multiple ports. Each CA port is assigned its own LID or a range of LIDs. The ports of a switch are not addressable by LIDs/GIDs; in other words, they are transparent to other end nodes.

Each port has its own set of buffers. The buffering is channeled through virtual lanes (VLs), where each VL has its own flow control. There may be up to 16 VLs. VLs provide a mechanism for creating multiple virtual links within a single physical link. All ports, however, must support VL15, which is reserved exclusively for subnet management datagrams and hence does not concern the IPoIB discussions.
The actual VL that a port uses is configured by the SM and is based on the Service Level (SL) specified in every packet. There are 16 possible SLs.

In addition to the features described above, namely Queue Pairs (QPs), Service Levels (SLs) and addressing (GID/LID), IBA also defines the following:

P_Keys or partition keys:
   Every packet except raw datagrams carries the partition key (P_Key). These values are used for isolation in the fabric. A switch may be programmed by the SM to drop packets that do not carry a certain key (this is an optional switch feature). The same applies to the receiving CA.

Q_Keys:
   These are used to enforce access rights for the reliable and unreliable IB datagram services. Raw datagram services do not require this value. At communication establishment the endpoints exchange Q_Keys, and they must always use the relevant Q_Keys when communicating with one another.

Multicast support:
   A switch may support multicasting, i.e., replication of packets across multiple output ports. This is an optional feature at the switches. A multicast group is identified by a GID, whose format is as defined in RFC2373 [3] on IPv6 addressing. Thus, from the point of view of IPv6 over IB, the data link multicast address looks like the network address. To receive packets, an IB node must explicitly join a multicast group by a request to the SM. A node may send packets to any multicast group. In both cases, the multicast LID to be used in the packets is received from the SM.

There are 6 transport types specified by the IB architecture. These are:

1. Unreliable Datagram (unacknowledged - connectionless)
   The UD service is connectionless and unacknowledged. It allows a QP to communicate with any unreliable datagram QP on any node. The switches, and hence each link, can support only a certain MTU: 256, 512, 1024, 2048 or 4096 bytes. A UD packet cannot be larger than the smallest link MTU between the two peers.
2. Reliable Datagram (acknowledged - multiplexed)
   The RD service is multiplexed over connections between nodes called End-to-End Contexts (EECs). These allow each RD QP to communicate with any RD QP on any node with an established EEC. Multiple QPs can use the same EEC, and a single QP can use multiple EECs (one for each remote node per reliable datagram domain).

3. Reliable Connected (acknowledged - connection oriented)
   The RC service associates a local QP with one and only one remote QP. Messages may be as large as 2^31 bytes in length; the CA implementation takes care of segmentation and reassembly.

4. Unreliable Connected (unacknowledged - connection oriented)
   The UC service associates one local QP with one and only one remote QP. There is no acknowledgment and hence no resending of lost or corrupted packets; such packets are simply dropped. Otherwise it is similar to RC.

5. Raw Ethertype (unacknowledged - connectionless)
   The Ethertype raw datagram packet contains a generic transport header that is not interpreted by the CA but that specifies the protocol type. The ethertype values are the same as those defined in RFC1700.

6. Raw IPv6 (unacknowledged - connectionless)
   Using the IPv6 raw datagram service, the IBA CA can support standard protocol layers atop IPv6 (such as TCP/UDP). Thus native IPv6 packets can be bridged into the IBA SAN and delivered directly to a port and to its IPv6 raw datagram QP.

The first 4 are referred to as IB transports; the latter two are classified as raw datagrams. Raw datagram packets carry no indication of the QP number. Raw datagram packets are limited in size by the link MTU.
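Two MTU rules bound IPv6 over UD. First, a UD packet is limited by the smallest link MTU on the path. Second, per section 6, the interface MTU starts from the all-hosts group MTU and may only be lowered by configuration. The Python sketch below shows both rules. The function names are hypothetical. Like section 6, the sketch treats the IB MTU directly as the IPv6 MTU, since the frame format in section 7.0 places the IPv6 datagram immediately after the DETH.

```python
from typing import Optional, Sequence

IB_LINK_MTUS = (256, 512, 1024, 2048, 4096)  # link MTUs defined by the IB specification


def path_mtu(link_mtus: Sequence[int]) -> int:
    """A UD packet cannot be larger than the smallest link MTU between the two peers."""
    if not link_mtus:
        raise ValueError("a path has at least one link")
    for mtu in link_mtus:
        if mtu not in IB_LINK_MTUS:
            raise ValueError(f"{mtu} is not an IB link MTU")
    return min(link_mtus)


def interface_mtu(group_mtu: int, configured_mtu: Optional[int] = None) -> int:
    """IPv6 interface MTU following section 6.

    group_mtu is the MTU returned when joining the all-hosts group FF12:3333::1.
    configured_mtu stands for a Router Advertisement MTU option or a manual/DHCP
    setting; a configured value larger than group_mtu is ignored.
    """
    if group_mtu not in IB_LINK_MTUS:
        raise ValueError(f"{group_mtu} is not an IB link MTU")
    if configured_mtu is not None and configured_mtu <= 0:
        raise ValueError("MTU must be positive")
    if configured_mtu is None or configured_mtu > group_mtu:
        return group_mtu
    return configured_mtu
```

For example, with a 2048-byte all-hosts group, a Router Advertisement advertising 1500 would lower the interface MTU to 1500. A configured 4096 would be ignored.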
12.0 APPENDIX B: Headers used in UD communication

1 Local Routing Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Virtual| Link  |Service|Rsv|LNH|     Destination Local ID      |
   | Lane  |Version| Level |   |   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Reserved |   Packet Length     |       Source Local ID         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Of the header elements, the sending node's IPv6 stack must know the Service Level, the Destination LID and the Source LID. In addition, the Packet Length cannot specify a payload larger than the path MTU between the source and destination ports. The other values are either well-known standard values or are determined from other known values. For example, the VL is determined from the SL.

2 Global Routing Header

This header is used when the packet must traverse IB subnet boundaries. The GRH looks like the IPv6 header, and the GID looks like an IPv6 address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        |  Next Header  |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source GID                           |
   |                                                               |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Destination GID                        |
   |                                                               |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                              Figure GRH

This header is needed only if the packet is sent across IB subnets. Note that, from the point of view of the IPv6 layer, the GID is another form of MAC address. It is an incomplete one, however, since the LID is always needed for any communication.
The Version is always set to 6. The Traffic Class, Flow Label and similar fields are likely to be determined by policy, or default values may be used. The Next Header field always indicates the BTH (Base Transport Header). The Hop Limit is a function of the configuration. Only the Destination GID needs to be determined from the resolution of the target IPv6 address to the link layer address.

3 Base Transport Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    OpCode     |S|M|PC | Tver  |     Partition Key (P_Key)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |          Destination Queue Pair (QP)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |A|  Reserved   |            Packet Sequence Number             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Of these fields, the P_Key and the Destination QP must be determined as part of the IPv6 address resolution process. The remaining fields are either not used by UD mode or are filled in by the channel adapter based on local conditions/values.

The P_Key index in the P_Key table is attached to the QP used for transmission of packets. If the P_Key table on the port is more than one entry deep, the software needs to decide which P_Key to use. Note: the P_Key table can be written only by the SM [1].

When multicasting, the Destination QP is always set to 0xFFFFFF.

4 Datagram Extended Transport Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Queue Key (Q_Key)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |               Source Queue Pair               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This header includes the sender's queue pair number and the Q_Key used in the communication.
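The Python sketch below shows where the values obtained through address resolution land in the LRH, BTH and DETH described in this appendix. In practice the channel adapter builds these headers; the sketch only illustrates the bit layout. The function names are hypothetical. The UD "SEND Only" opcode and the LNH values are assumptions based on the IB specification [1]. They are not defined in this memo and should be confirmed against [1].

```python
import struct

MULTICAST_QPN = 0xFFFFFF  # destination QP used when multicasting (Appendix B, item 3)

# ASSUMPTIONS (values as understood from the IB specification [1], not from this memo):
OPCODE_UD_SEND_ONLY = 0x64  # UD transport class, SEND Only operation
LNH_LOCAL = 0b10            # Link Next Header: BTH follows the LRH
LNH_GLOBAL = 0b11           # Link Next Header: GRH follows the LRH


def pack_lrh(vl: int, sl: int, lnh: int, dlid: int, slid: int,
             pkt_len: int, lver: int = 0) -> bytes:
    """8-octet Local Routing Header; pkt_len is the raw 11-bit Packet Length field."""
    if not all(0 <= v <= 15 for v in (vl, sl, lver)) or not 0 <= lnh <= 3:
        raise ValueError("VL, LVer and SL are 4 bits; LNH is 2 bits")
    if not 0 <= pkt_len <= 0x7FF:
        raise ValueError("Packet Length is 11 bits")
    # Reserved bits (2 in byte 1, 5 above the packet length) are left zero.
    return struct.pack("!BBHHH", (vl << 4) | lver, (sl << 4) | lnh, dlid, pkt_len, slid)


def pack_bth(pkey: int, dest_qpn: int, psn: int = 0,
             opcode: int = OPCODE_UD_SEND_ONLY) -> bytes:
    """12-octet Base Transport Header; S, M, PC, TVer and the A bit are left zero."""
    if not (0 <= dest_qpn <= 0xFFFFFF and 0 <= psn <= 0xFFFFFF):
        raise ValueError("QPN and PSN are 24 bits")
    return struct.pack("!BBHII", opcode, 0, pkey, dest_qpn, psn)


def pack_deth(qkey: int, src_qpn: int) -> bytes:
    """8-octet Datagram Extended Transport Header."""
    if not 0 <= src_qpn <= 0xFFFFFF:
        raise ValueError("QPN is 24 bits")
    return struct.pack("!II", qkey, src_qpn)


def ud_transport_headers(pkey: int, dest_qpn: int, qkey: int, src_qpn: int,
                         multicast: bool = False, psn: int = 0) -> bytes:
    """BTH + DETH for an IPv6 datagram; P_Key, QPN and Q_Key come from address resolution."""
    qpn = MULTICAST_QPN if multicast else dest_qpn
    return pack_bth(pkey, qpn, psn) + pack_deth(qkey, src_qpn)
```

For a multicast send, setting `multicast=True` forces the destination QP to 0xFFFFFF, as required for multicasting.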
Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.