< draft-ietf-ipoib-architecture-03.txt   draft-ietf-ipoib-architecture-04.txt >
INTERNET DRAFT INTERNET DRAFT
<draft-ietf-ipoib-architecture-03.txt> Vivek Kashyap <draft-ietf-ipoib-architecture-04.txt> Vivek Kashyap
Expiration Date: April, 2004 IBM Expiration Date: October, 2004 IBM
October, 2003 April, 2004
IP over InfiniBand(IPoIB) Architecture IP over InfiniBand(IPoIB) Architecture
Status of this memo Status of this memo
This document is an Internet-Draft and is in full conformance This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC 2026. with all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working Engineering Task Force (IETF), its areas, and its working
skipping to change at page 2, line 5 skipping to change at page 2, line 5
InfiniBand is a high speed, channel based interconnect between InfiniBand is a high speed, channel based interconnect between
systems and devices. systems and devices.
This document presents an overview of the InfiniBand This document presents an overview of the InfiniBand
architecture. It further describes the requirements and architecture. It further describes the requirements and
guidelines for the transmission of IP over InfiniBand. guidelines for the transmission of IP over InfiniBand.
Discussions in this document are applicable to both IPv4 and Discussions in this document are applicable to both IPv4 and
IPv6 unless explicitly specified. The encapsulation of IP over IPv6 unless explicitly specified. The encapsulation of IP over
InfiniBand and the mechanism for IP address resolution on IB InfiniBand and the mechanism for IP address resolution on IB
fabrics are covered in [IPOIB_ENCAP] and [IPOIB_DHCP]. fabrics are covered in other documents.
Table of Contents Table of Contents
1.0 Introduction to InfiniBand 1.0 Introduction to InfiniBand
1.1 InfiniBand Architecture Specification 1.1 InfiniBand Architecture Specification
1.2 Overview of InfiniBand Architecture 1.2 Overview of InfiniBand Architecture
1.2.1 InfiniBand Addresses 1.2.1 InfiniBand Addresses
1.2.1.1 Unicast GIDs 1.2.1.1 Unicast GIDs
1.2.1.2 Multicast GIDs 1.2.1.2 Multicast GIDs
1.3 InfiniBand Multicast Group Management 1.3 InfiniBand Multicast Group Management
skipping to change at page 2, line 28 skipping to change at page 2, line 28
1.3.2 Join and Leave operations 1.3.2 Join and Leave operations
1.3.2.1 Creating a Multicast Group 1.3.2.1 Creating a Multicast Group
1.3.2.3 Deleting a Multicast Group 1.3.2.3 Deleting a Multicast Group
1.3.2.4 Multicast Group Create/Delete Traps 1.3.2.4 Multicast Group Create/Delete Traps
2.0 Management of InfiniBand Subnet 2.0 Management of InfiniBand Subnet
3.0 IP over IB 3.0 IP over IB
3.1 InfiniBand as Datalink 3.1 InfiniBand as Datalink
3.2 Multicast Support 3.2 Multicast Support
3.2.1 Mapping IP Multicast to IB Multicast 3.2.1 Mapping IP Multicast to IB Multicast
3.2.2 Transient Flag in IB MGIDs 3.2.2 Transient Flag in IB MGIDs
3.3 IP Subnet Across IB Subnets ? 3.3 IP Subnet Across IB Subnets
4.0 IP Subnets in InfiniBand Fabrics 4.0 IP Subnets in InfiniBand Fabrics
4.1 IPoIB VLANs 4.1 IPoIB VLANs
4.2 Multicast in IPoIB Subnets 4.2 Multicast in IPoIB Subnets
4.2.1 Sending IP Multicast Datagrams 4.2.1 Sending IP Multicast Datagrams
4.2.2 Receiving Multicast Packets 4.2.2 Receiving Multicast Packets
4.2.3 Forwarding Multicast Packets 4.2.3 Forwarding Multicast Packets
4.2.4 Impact of InfiniBand Architecture Limits 4.2.4 Impact of InfiniBand Architecture Limits
4.2.5 Leaving/Deleting a Multicast Group 4.2.5 Leaving/Deleting a Multicast Group
5.0 QoS and Related Issues 5.0 QoS and Related Issues
6.0 Security Considerations 6.0 Security Considerations
7.0 Acknowledgements 7.0 Acknowledgments
8.0 References 8.0 References
9.0 Author's Address 9.0 Author's Address
1.0 Introduction to InfiniBand 1.0 Introduction to InfiniBand
The InfiniBand Trade Association(IBTA) was formed to develop The InfiniBand Trade Association(IBTA) was formed to develop
an I/O specification to deliver a channel based, switched an I/O specification to deliver a channel based, switched
fabric technology. The InfiniBand standard is aimed at meeting fabric technology. The InfiniBand standard is aimed at meeting
the requirements of scalability, reliability, availability and the requirements of scalability, reliability, availability and
performance of servers in data centers. performance of servers in data centers.
skipping to change at page 7, line 29 skipping to change at page 7, line 26
its IPv6 raw datagram QP. its IPv6 raw datagram QP.
The first 4 types are referred to as IB transports. The latter The first 4 types are referred to as IB transports. The latter
two are classified as Raw datagrams. There is no indication of two are classified as Raw datagrams. There is no indication of
the QP number in the raw datagram packets. The raw datagram the QP number in the raw datagram packets. The raw datagram
packets are limited by the link MTU in size. packets are limited by the link MTU in size.
The two connected modes and the reliable datagram mode may The two connected modes and the reliable datagram mode may
also support 'Automatic Path Migration(APM)'. This is an also support 'Automatic Path Migration(APM)'. This is an
optional facility that provides for a hardware based path optional facility that provides for a hardware based path
failover. An alternate path is associated with the QP when the fail over. An alternate path is associated with the QP when
connection/EE context is first created. If unrecoverable the connection/EE context is first created. If unrecoverable
errors are encountered the connection switches to using the errors are encountered the connection switches to using the
alternate path. alternate path.
1.2.1 InfiniBand Addresses 1.2.1 InfiniBand Addresses
The InfiniBand architecture borrows heavily from the IPv6 The InfiniBand architecture borrows heavily from the IPv6
architecture in terms of the InfiniBand subnet structure and architecture in terms of the InfiniBand subnet structure and
global identifiers (GIDs). global identifiers (GIDs).
The InfiniBand architecture defines the global identifier The InfiniBand architecture defines the GID associated with a
associated with a port as follows: port as a 128-bit unicast or multicast identifier. IBA derives
the GID address format from the IPv6 format[RFC_2373] with
GID (Global Identifier): A 128-bit unicast or some additional properties/restrictions defined to facilitate
multicast identifier used to identify a port on a efficient discovery, communication and routing.
channel adapter, a port on a router, a switch, or a
multicast group. A GID is a valid 128-bit IPv6
address(per RFC 2373) with additional
properties/restrictions defined within IBA to
facilitate efficient discovery, communication, and
routing.
Note: These rules apply only to IBA operation and do
not apply to raw IPv6 operation unless specifically
called out.
The raw IPv6 operation referred to in the note Note:
above is the IPv6 mode of InfiniBand's raw datagram The IBA refers to [RFC_2373] explicitly. It must be noted
service. It does not mean IPv6 itself. The routers and that IBA is therefore unaffected by any further changes
switches referred to in the above definition are the that are introduced in IPv6 addressing architecture.
InfiniBand routers and switches.
The InfiniBand(IB) specification defines two types of GIDs: IBA defines two types of GIDs:
unicast and multicast. unicast and,
multicast.
1.2.1.1 Unicast GIDs 1.2.1.1 Unicast GIDs
The unicast GIDs are defined, as in IPv6, with three scopes. The unicast GIDs are defined, as in IPv6, with three scopes.
The IB specification states: The IB specification states:
a. link local: This is defined to be FE80/10. a. link local: This is defined to be FE80/10.
The IB routers will not forward packets with a The IB routers will not forward packets with a
link local address in source or destination link local address in source or destination
skipping to change at page 8, line 47 skipping to change at page 8, line 36
c. global: c. global:
A unicast GID with a global prefix, i.e. an IB A unicast GID with a global prefix, i.e. an IB
router may use this GID to route packets router may use this GID to route packets
throughout an enterprise or internet. throughout an enterprise or internet.
1.2.1.2 Multicast GIDs 1.2.1.2 Multicast GIDs
The multicast GIDs also parallel the IPv6 multicast addresses. The multicast GIDs also parallel the IPv6 multicast addresses.
The IB specification defines the multicast GIDs as follows: The IB specification defines the multicast GIDs as follows:
FFxy:<112 bits> FFxy:<112 bits>
Flag bits: Flag bits:
The nibble, denoted by x above, holds the 4 flag bits: 000T. The nibble, denoted by x above, holds the 4 flag bits: 000T.
The first three bits are reserved and are set to zero. The The first three bits are reserved and are set to zero. The
last bit is defined as follows: last bit is defined as follows:
T=0: denotes a permanently assigned i.e. well known GID T=0: denotes a permanently assigned i.e. well known GID
T=1: denotes a transient group T=1: denotes a transient group
Scope bits: Scope bits:
The 4 bits, denoted by y in the GID above, are the scope The 4 bits, denoted by y in the GID above, are the scope
bits. These scope values are described in Table 1. bits. These scope values are described in Table 1.
scope value Address value scope value Address value
0 Reserved 0 Reserved
1 Unassigned 1 Unassigned
2 Link-local 2 Link-local
3 Unassigned 3 Unassigned
4 Unassigned 4 Unassigned
5 Site-local 5 Site-local
6 Unassigned 6 Unassigned
7 Unassigned 7 Unassigned
8 Organization-local 8 Organization-local
9 Unassigned 9 Unassigned
0xA Unassigned 0xA Unassigned
0xB Unassigned 0xB Unassigned
0xC Unassigned 0xC Unassigned
0xD Unassigned 0xD Unassigned
0xE Global 0xE Global
0xF Reserved 0xF Reserved
Table 1 Table 1
The IB specification further refers to [RFC_2373] and The IB specification further refers to [RFC_2373] and
[RFC_2375] while defining the well known multicast addresses. [RFC_2375] while defining the well known multicast addresses.
However, it then states that the well known addresses apply to However, it then states that the well known addresses apply to
IB raw IPv6 datagrams only. It must be noted though that a IB raw IPv6 datagrams only. It must be noted though that a
multicast group can be associated with only a single MGID. multicast group can be associated with only a single MGID.
Thus the same MGID cannot be associated with the UD mode and Thus the same MGID cannot be associated with the UD mode and
the raw datagram mode. the raw datagram mode.
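The FFxy layout above can be illustrated with a short sketch. This is a minimal example, not taken from the IB specification: the function name parse_mgid and the SCOPES dictionary are hypothetical, and the MGID is assumed to be written in IPv6 textual notation.

```python
import ipaddress

# Scope names from Table 1; every value not listed here is "Unassigned".
SCOPES = {
    0x0: "Reserved",
    0x2: "Link-local",
    0x5: "Site-local",
    0x8: "Organization-local",
    0xE: "Global",
    0xF: "Reserved",
}


def parse_mgid(text):
    """Split a multicast GID (IPv6 notation) into its flag and scope fields."""
    raw = ipaddress.IPv6Address(text).packed
    if raw[0] != 0xFF:
        raise ValueError("not a multicast GID: first octet must be 0xFF")
    flags = raw[1] >> 4       # the nibble 'x': 000T
    scope = raw[1] & 0x0F     # the nibble 'y': scope bits
    return {
        "transient": bool(flags & 0x1),   # T=1 denotes a transient group
        "scope": SCOPES.get(scope, "Unassigned"),
    }
```

For example, parse_mgid("ff12::1") reports a transient, link-local group, while parse_mgid("ff0e::1") reports a permanently assigned group with global scope.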
1.3 InfiniBand Multicast Group Management 1.3 InfiniBand Multicast Group Management
skipping to change at page 10, line 36 skipping to change at page 10, line 25
characteristics that define a group. characteristics that define a group.
A LID is associated with the multicast group by the subnet A LID is associated with the multicast group by the subnet
manager(SM) at the time of the multicast group creation. The manager(SM) at the time of the multicast group creation. The
SM determines the multicast tree based on all the group SM determines the multicast tree based on all the group
members and programs the relevant switches. The Multicast members and programs the relevant switches. The Multicast
LID(MLID) is used by the switches to route the packets. LID(MLID) is used by the switches to route the packets.
Any member IB port wanting to participate in the multicast Any member IB port wanting to participate in the multicast
group must join the group. As part of the join operation the group must join the group. As part of the join operation the
port receives the group characteristics from the SM. At the node receives the group characteristics from the SM. At the
same time the subnet manager ensures that the requester can same time the subnet manager ensures that the requester can
indeed participate in the group by verifying that it can indeed participate in the group by verifying that it can
support the group MTU, and accessibility to the rest of the support the group MTU, and accessibility to the rest of the
group members. Other group characteristics may need group members. Other group characteristics may need
verification too. verification too.
The SM, for groups that span IB subnet boundaries, must The SM, for groups that span IB subnet boundaries, must
interact with IB routers to determine the presence of this interact with IB routers to determine the presence of this
group in other IB subnets. If present the MTU must match group in other IB subnets. If present the MTU must match
across the IB subnets. across the IB subnets.
skipping to change at page 11, line 26 skipping to change at page 11, line 15
MGID - Multicast GID for this multicast group MGID - Multicast GID for this multicast group
PortGID - Valid GID of the port joining this multicast group PortGID - Valid GID of the port joining this multicast group
Q_Key - Q_Key to be used by this multicast group Q_Key - Q_Key to be used by this multicast group
MLID - Multicast LID for this multicast group MLID - Multicast LID for this multicast group
MTU - MTU for this multicast group MTU - MTU for this multicast group
P_Key - Partition key for this multicast group P_Key - Partition key for this multicast group
SL - Service Level for this multicast group SL - Service Level for this multicast group
Scope - Same as MGID address scope Scope - Same as MGID address scope
JoinState - Join/Leave status requested by the port: JoinState - Join/Leave status requested by the port:
bit 0: FullMemeber bit 0: FullMember
bit 1: NonMember bit 1: NonMember
bit 2: SendOnlyNonMember bit 2: SendOnlyNonMember
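The JoinState bit positions listed above can be modelled as a bit mask. The following is a sketch only: the class name JoinState and the helpers join and leave are hypothetical and are not part of any IB interface. It assumes that a join adds the requested bits and a leave deletes them, as described in the JoinState section.

```python
from enum import IntFlag


class JoinState(IntFlag):
    FULL_MEMBER = 1 << 0            # bit 0
    NON_MEMBER = 1 << 1             # bit 1
    SEND_ONLY_NON_MEMBER = 1 << 2   # bit 2


def join(current, requested):
    """Add the requested membership bits to the port's current state."""
    return JoinState(int(current) | int(requested))


def leave(current, leaving):
    """Delete the given membership bits from the port's current state."""
    return JoinState(int(current) & ~int(leaving))
```

A port that joined as both FullMember and NonMember and then leaves as FullMember keeps only the NonMember bit.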
1.3.1.1 JoinState 1.3.1.1 JoinState
The JoinState indicates the membership qualities a port wishes The JoinState indicates the membership qualities a port wishes
to add while joining/creating a group or delete when leaving a to add while joining/creating a group or delete when leaving a
group. The meanings of the JoinState bits are: group. The meanings of the JoinState bits are:
FullMember: FullMember:
skipping to change at page 12, line 51 skipping to change at page 12, line 40
the group. the group.
Note that a special 'delete' message does not exist. It is a Note that a special 'delete' message does not exist. It is a
side effect of the last FullMember 'leave' operation. side effect of the last FullMember 'leave' operation.
1.3.2.4 Multicast Group Create/Delete Traps 1.3.2.4 Multicast Group Create/Delete Traps
The SA may be requested by the ports to generate a report The SA may be requested by the ports to generate a report
whenever a multicast group is created or deleted. The port can whenever a multicast group is created or deleted. The port can
specify the multicast group it is interested in i.e. use a specify the multicast group it is interested in i.e. use a
specific MGID or use a wildcard request. The SA will report specific MGID or use a wild card request. The SA will report
these events using traps 66 (for creates) and 67 (for these events using traps 66 (for creates) and 67 (for
deletes)[IB_ARCH]. deletes)[IB_ARCH].
Therefore, a port wishing to join a group but not create it by Therefore, a port wishing to join a group but not create it by
itself may request a create notification or a port might even itself may request a create notification or a port might even
request a notification for all groups that are created(a request a notification for all groups that are created(a
wildcarded request). The SA will diligently inform them of the wild card request). The SA will diligently inform them of the
creation utilising the aforementioned traps. The requestor can creation utilizing the aforementioned traps. The requester can
then join the multicast group indicated. Similarly, a then join the multicast group indicated. Similarly, a
SendOnlyNonMember or a NonMember might request the SA to SendOnlyNonMember or a NonMember might request the SA to
inform it of group deletions. The endnode, on receiving a inform it of group deletions. The endnode, on receiving a
delete report, can safely release the resources associated delete report, can safely release the resources associated
with the group. The associated MLID is no longer valid for the with the group. The associated MLID is no longer valid for the
group and may be reassigned to a new multicast group by the group and may be reassigned to a new multicast group by the
SM. SM.
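The reaction of an endnode to create and delete reports can be sketched as follows. Only the trap numbers 66 and 67 come from the text above. The function handle_report, its arguments, and its return strings are hypothetical and serve only to illustrate the decision logic.

```python
TRAP_GROUP_CREATED = 66   # SA report: multicast group created
TRAP_GROUP_DELETED = 67   # SA report: multicast group deleted


def handle_report(trap, mgid, wanted, joined):
    """Decide what an endnode does with an SA report.

    wanted: set of MGIDs the port is waiting to join once they exist.
    joined: set of MGIDs the port currently holds resources for.
    """
    if trap == TRAP_GROUP_CREATED and mgid in wanted:
        wanted.discard(mgid)
        joined.add(mgid)      # the port would now issue its join request
        return "join"
    if trap == TRAP_GROUP_DELETED and mgid in joined:
        joined.discard(mgid)  # the MLID is no longer valid for this group
        return "release"
    return "ignore"
```

A wild card subscription corresponds to receiving reports for every MGID. The sets then filter the reports that are relevant to this port.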
2.0 Management of InfiniBand Subnet 2.0 Management of InfiniBand Subnet
To aid in the monitoring and configuration of InfiniBand To aid in the monitoring and configuration of InfiniBand
subnet components a set of MIBs need to be defined. MIBs are subnet components a set of MIB modules need to be defined.
needed for the channel adapters, InfiniBand interfaces, MIB modules are needed for the channel adapters, InfiniBand
InfiniBand subnet manager, InfiniBand subnet management agents interfaces, InfiniBand subnet manager, InfiniBand subnet
and to allow the management of specific device properties. It management agents and to allow the management of specific
must be noted that the management objects addressed in the device properties. It must be noted that the management
IPoIB documents are for all of the IB subnet components and objects addressed in the IPoIB documents are for all of the
are not limited to IP(over IB). The relevant MIBs are IB subnet components and are not limited to IP(over IB).
described in separate documents and are not covered here. The relevant MIB modules are described in separate
documents and are not covered here.
3.0 IP over IB 3.0 IP over IB
As described in section 1.0, the InfiniBand architecture As described in section 1.0, the InfiniBand architecture
provides a broad set of capabilities to choose from when provides a broad set of capabilities to choose from when
implementing IP over InfiniBand networks. implementing IP over InfiniBand networks.
The IPoIB specification must not, and does not, require The IPoIB specification must not, and does not, require
changes in IP and higher layer protocols. Nor does it mandate changes in IP and higher layer protocols. Nor does it mandate
requirements on IP stacks to implement special user level requirements on IP stacks to implement special user level
programs. It is an aim of IPoIB specification that the IPoIB programs. It is an aim of IPoIB specification that the IPoIB
changes be amenable to modularisation and incorporation into changes be amenable to modularization and incorporation into
existing implementations at the same level as other media existing implementations at the same level as other media
types. types.
3.1 InfiniBand as Datalink 3.1 InfiniBand as Datalink
InfiniBand architecture provides multiple methods of data InfiniBand architecture provides multiple methods of data
exchange between two endpoints as was noted above. These are: exchange between two endpoints as was noted above. These are:
Reliable Connected (RC) Reliable Connected (RC)
Reliable Datagram (RD) Reliable Datagram (RD)
skipping to change at page 14, line 42 skipping to change at page 14, line 26
fabric support multicasting. This is possible only in fabric support multicasting. This is possible only in
Unreliable datagram (UD) and IB's Raw datagram modes. Unreliable datagram (UD) and IB's Raw datagram modes.
Thus it is only the UD mode that is universal, supports Thus it is only the UD mode that is universal, supports
multicast, and has a robust CRC. Given these conditions it is the multicast, and has a robust CRC. Given these conditions it is the
obvious choice for IP over InfiniBand [IPOIB_ENCAP]. obvious choice for IP over InfiniBand [IPOIB_ENCAP].
Future documents might consider the connected modes. In Future documents might consider the connected modes. In
contrast to the limited link MTU offered by UD mode, the contrast to the limited link MTU offered by UD mode, the
connected modes can offer significant benefit in terms of connected modes can offer significant benefit in terms of
performance by utilising a larger MTU. Reliability is also performance by utilizing a larger MTU. Reliability is also
enhanced if the underlying feature of automatic path migration enhanced if the underlying feature of automatic path migration
of connected modes is utilised. of connected modes is utilized.
3.2 Multicast Support 3.2 Multicast Support
InfiniBand specification makes support of multicasting in the InfiniBand specification makes support of multicasting in the
switches optional. Multicast, however, is a basic requirement switches optional. Multicast, however, is a basic requirement
in IP networks. Therefore, IPoIB requires that multicast in IP networks. Therefore, IPoIB requires that multicast
capable InfiniBand fabrics be used to implement IPoIB capable InfiniBand fabrics be used to implement IPoIB
subnets. subnets.
3.2.1 Mapping IP Multicast to IB Multicast 3.2.1 Mapping IP Multicast to IB Multicast
skipping to change at page 15, line 38 skipping to change at page 15, line 14
section 1.3. The IB specification also defines some well known section 1.3. The IB specification also defines some well known
IB multicast GIDs(MGIDs). The MGIDs are reserved for the IB's IB multicast GIDs(MGIDs). The MGIDs are reserved for the IB's
Raw datagram mode which is incompatible with the other Raw datagram mode which is incompatible with the other
transports of IB. Any mapping that is defined from IP transports of IB. Any mapping that is defined from IP
multicast addresses therefore must not fall into IB's multicast addresses therefore must not fall into IB's
definition of a well-known address. definition of a well-known address.
Therefore all IPoIB related multicast GIDs always set the Therefore all IPoIB related multicast GIDs always set the
transient bit. transient bit.
3.3 IP Subnets Across IB Subnets ? 3.3 IP Subnets Across IB Subnets
Some implementations may wish to support multiple clusters of Some implementations may wish to support multiple clusters of
machines in their own IB subnets but otherwise be part of a machines in their own IB subnets but otherwise be part of a
common IP subnet. For such a solution the IB specification common IP subnet. For such a solution the IB specification
needs multiple upgrades. Some of the required enhancements needs multiple upgrades. Some of the required enhancements
are: are:
1) A method for creating IB multicast GIDs that span multiple 1) A method for creating IB multicast GIDs that span multiple
IB subnets. The partition keys and other parameters need to IB subnets. The partition keys and other parameters need to
be consistent across IB subnets. be consistent across IB subnets.
skipping to change at page 16, line 25 skipping to change at page 15, line 49
The IPoIB subnet is overlaid over the IB subnet. The IPoIB The IPoIB subnet is overlaid over the IB subnet. The IPoIB
subnet is brought up in the following steps: subnet is brought up in the following steps:
Note: the join/leave operation at the IP level will be Note: the join/leave operation at the IP level will be
referred to as IP_join/IP_leave and the join/leave referred to as IP_join/IP_leave and the join/leave
operations at the IB level will be referred to as operations at the IB level will be referred to as
IB_join/IB_leave in this document. IB_join/IB_leave in this document.
1. The all-IPoIB nodes IB multicast group is created 1. The all-IPoIB nodes IB multicast group is created
The fabric administrator creates an IB multicast The fabric administrator creates a IB multicast
group(henceforth called 'broadcast group') when the IP subnet group(henceforth called 'broadcast group') when the IP subnet
is set up. The 'broadcast group' is defined in [IPOIB_ENCAP]. is set up. The 'broadcast group' is defined in [IPOIB_ENCAP].
The method by which the broadcast group is set up is not The method by which the broadcast group is set up is not
defined by IPoIB. The group may be set up at the SM by the defined by IPoIB. The group may be set up at the SM by the
administrator or by the first IB_join. administrator or by the first IB_join.
As noted earlier, at the time of creating an IB multicast As noted earlier, at the time of creating an IB multicast
group, multiple values such as the P_Key, Q_Key, Service group, multiple values such as the P_Key, Q_Key, Service
Level, Hop Limit, Flow ID, TClass, MTU etc., have to be Level, Hop Limit, Flow ID, TClass, MTU etc., have to be
specified. These values should be such that all potential specified. These values should be such that all potential
members of the IB multicast group are able to communicate members of the IB multicast group are able to communicate
with one another when using them. In the future, as the IB with one another when using them. In the future, as the IB
skipping to change at page 17, line 31 skipping to change at page 17, line 8
However, the P_Key must still be known to the IPoIB endnode However, the P_Key must still be known to the IPoIB endnode
before it can join the broadcast-group. The P_Key is included before it can join the broadcast-group. The P_Key is included
in the mapping of the broadcast group[IPOIB_ENCAP]. Another in the mapping of the broadcast group[IPOIB_ENCAP]. Another
parameter, the scope of the broadcast group, also needs to be parameter, the scope of the broadcast group, also needs to be
known to the endnode before it can join the broadcast group. known to the endnode before it can join the broadcast group.
It is an implementation choice on how the P_Key and the scope It is an implementation choice on how the P_Key and the scope
bits related to the IPoIB subnet are determined by the bits related to the IPoIB subnet are determined by the
implementation. These could be configuration parameters implementation. These could be configuration parameters
initialised by some means by the administrator. initialized by some means by the administrator.
The methods employed by an implementation to determine the The methods employed by an implementation to determine the
P_Key and scope bits are not specified by IPoIB. P_Key and scope bits are not specified by IPoIB.
4.1 IPoIB VLANs 4.1 IPoIB VLANs
The endpoints in an IB subnet must have compatible P_Keys to The endpoints in an IB subnet must have compatible P_Keys to
communicate with one another. Thus the administrator when communicate with one another. Thus the administrator when
setting up an IP subnet over an IB subnet must ensure that all setting up an IP subnet over an IB subnet must ensure that all
the members have compatible P_Keys. An IP subnet can have only the members have compatible P_Keys. An IP subnet can have only
skipping to change at page 18, line 19 skipping to change at page 17, line 44
IP multicast on InfiniBand subnets follows the same concepts IP multicast on InfiniBand subnets follows the same concepts
and rules as on any other media. However, unlike most other and rules as on any other media. However, unlike most other
media multicast over InfiniBand requires interaction with media multicast over InfiniBand requires interaction with
another entity, the IB subnet manager. This section describes another entity, the IB subnet manager. This section describes
the outline of the process and suggests some guidelines. the outline of the process and suggests some guidelines.
IB architecture specifies the following format for IB IB architecture specifies the following format for IB
multicast packets when used over unreliable datagram(UD) multicast packets when used over unreliable datagram(UD)
mode: mode:
+--------+-------+---------+---------+-------+---------+---------+ +--------+-------+---------+---------+-------+---------+---------+
|Local |Global |Base |Datagram |Packet |Invariant| Variant | |Local |Global |Base |Datagram |Packet |Invariant| Variant |
|Routing |Routing|Transport|Extended |Payload| CRC | CRC | |Routing |Routing|Transport|Extended |Payload| CRC | CRC |
|Header |Header |Header |Transport| (IP) | | | |Header |Header |Header |Transport| (IP) | | |
| | | |Header | | | | | | | |Header | | | |
+--------+-------+---------+---------+-------+---------+---------+ +--------+-------+---------+---------+-------+---------+---------+
For details about the various headers please refer to For details about the various headers please refer to
InfiniBand Architecture Specification[IB_ARCH]. InfiniBand Architecture Specification[IB_ARCH].
The Global routing header (GRH) includes the IB multicast The Global routing header (GRH) includes the IB multicast
group GID. The Local routing header (LRH) includes the local group GID. The Local routing header (LRH) includes the local
identifier (LID). The IB switches in the fabric route the identifier (LID). The IB switches in the fabric route the
packet based on the LID. packet based on the LID.
The GID is made available to the receiving IB user (the IPoIB The GID is made available to the receiving IB user (the IPoIB
interface driver for example). The driver can therefore interface driver for example). The driver can therefore
determine the IB group the packet belongs to. determine the IB group the packet belongs to.
IPv4 defines three levels of multicast compliance. These are: IPv4 defines three levels of multicast compliance. These are:
Level 0: No support for IP multicasting Level 0: No support for IP multicasting
Level 1: Support for sending but not receiving multicasts Level 1: Support for sending but not receiving multicasts
Level 2: Full support for IP multicasting Level 2: Full support for IP multicasting
In IPv6 there is no such distinction. Full multicast support In IPv6 there is no such distinction. Full multicast support
is mandatory. Additionally, all IPv4 subnets support is mandatory. Additionally, all IPv4 subnets support
broadcast(255.255.255.255). IPv4 broadcast can always be broadcast(255.255.255.255). IPv4 broadcast can always be
sent/received by all IPv4 interfaces. sent/received by all IPv4 interfaces.
Every IPoIB subnet requires the broadcast GID to be defined. Every IPoIB subnet requires the broadcast GID to be defined.
Thus a packet can always be broadcast. Thus a packet can always be broadcast.
4.2.1 Sending IP Multicast Datagrams 4.2.1 Sending IP Multicast Datagrams
An IP host may send a multicast packet at any time to any An IP host may send a multicast packet at any time to any
multicast address. multicast address.
The IP layer conveys the multicast packet to the IPoIB The IP layer conveys the multicast packet to the IPoIB
interface driver/module. This module attempts to IB_join the interface driver/module. This module attempts to IB_join the
relevant IB multicast group. This is required since otherwise relevant IB multicast group. This is required since otherwise
skipping to change at page 21, line 47 skipping to change at page 21, line 21
The encapsulation of IP packets in InfiniBand is described The encapsulation of IP packets in InfiniBand is described
in [IPOIB_ENCAP]. in [IPOIB_ENCAP].
It specifies the use of an 'Ethertype' value [IANA] in all It specifies the use of an 'Ethertype' value [IANA] in all
IPoIB communication packets. The link-layer address is IPoIB communication packets. The link-layer address is
comprised of the Global Identifier(GID) and the Queue Pair comprised of the Global Identifier(GID) and the Queue Pair
Number(QPN) [IPOIB_ENCAP]. Number(QPN) [IPOIB_ENCAP].
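A link-layer address built from a QPN and a GID can be sketched as below. The exact layout is defined in [IPOIB_ENCAP] and not in this document. The sketch assumes a 20-octet address made of one reserved octet, a 24-bit QPN and the 128-bit GID. The function names are hypothetical.

```python
import ipaddress


def pack_link_address(qpn, gid):
    """Build a 20-octet link-layer address: reserved octet, QPN, GID.

    The field layout is an assumption about [IPOIB_ENCAP].
    """
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QPN must fit in 24 bits")
    return bytes([0]) + qpn.to_bytes(3, "big") + ipaddress.IPv6Address(gid).packed


def unpack_link_address(addr):
    """Return (qpn, gid) from a 20-octet link-layer address."""
    if len(addr) != 20:
        raise ValueError("link-layer address must be 20 octets")
    qpn = int.from_bytes(addr[1:4], "big")
    gid = str(ipaddress.IPv6Address(addr[4:]))
    return qpn, gid
```

Because the QPN is part of the address, any change of QPN at an interface produces a different link-layer address even when the GID is unchanged.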
To allow for multiple IB subnet based IPoIB subnets, the To allow for multiple IB subnet based IPoIB subnets, the
specification utilises the Global Identifier(GID) as part of specification utilizes the Global Identifier(GID) as part of
the link-layer address. Since all packets in IB have to use the link-layer address. Since all packets in IB have to use
the Local Identifier(LID) the address resolution process has the Local Identifier(LID) the address resolution process has
the additional step of resolving the destination GID, returned the additional step of resolving the destination GID, returned
in response to ARP/ND request, to the LID [IPOIB_ENCAP]. This in response to ARP/ND request, to the LID [IPOIB_ENCAP]. This
phase of address resolution might also be used to determine phase of address resolution might also be used to determine
other essential parameters (e.g. the SL, path rate etc.) for other essential parameters (e.g. the SL, path rate etc.) for
successful IB communication between two peers. successful IB communication between two peers.
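
The two-phase resolution described above can be sketched as follows. This is a minimal model, not an implementation: ARP_TABLE and PATH_TABLE are hypothetical in-memory stand-ins for the ARP/ND exchange and for whatever IB query an implementation uses to map a GID to a LID, and all addresses and values in them are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathInfo:
    dlid: int   # destination Local Identifier (LID)
    sl: int     # service level to use on this path
    rate: str   # path rate, kept as an opaque label in this sketch


# Phase 1 stand-in: IP address -> (QPN, GID), as learned via ARP/ND.
ARP_TABLE = {"192.0.2.7": (0x48, "fe80::2:c903:0:1")}

# Phase 2 stand-in: GID -> LID plus other path parameters.
PATH_TABLE = {"fe80::2:c903:0:1": PathInfo(dlid=0x12, sl=0, rate="10 Gb/s")}


def resolve(ip: str):
    """Resolve an IP address to everything needed to send on IB."""
    qpn, gid = ARP_TABLE[ip]   # phase 1: IP -> link-layer address
    path = PATH_TABLE[gid]     # phase 2: GID -> LID, SL, rate
    return qpn, gid, path
```

A sender cannot transmit after phase 1 alone, since the LID needed by every IB packet is known only after phase 2.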
As noted earlier, all communication in the IPoIB subnet As noted earlier, all communication in the IPoIB subnet
derives the Q_Key to use from the Q_Key specified in the derives the Q_Key to use from the Q_Key specified in the
skipping to change at page 22, line 25 skipping to change at page 21, line 47
link-addresses. In the case of IPoIB, the link-address link-addresses. In the case of IPoIB, the link-address
includes the QPN which might not be constant across reboots or includes the QPN which might not be constant across reboots or
even across network interface resets. Therefore, static ARP even across network interface resets. Therefore, static ARP
entries or RARP server entries will only work if the entries or RARP server entries will only work if the
implementation(s) using these options can ensure that the QPN implementation(s) using these options can ensure that the QPN
associated with an interface is invariant across associated with an interface is invariant across
reboots/network resets [IPOIB_ENCAP]. reboots/network resets [IPOIB_ENCAP].
4.5 DHCPv4 and IPoIB 4.5 DHCPv4 and IPoIB
DHCPv4 [RFC_2131] utilises a 'client identifier' field DHCPv4 [RFC_2131] utilizes a 'client identifier' field
(expected to hold the link-layer address) of 16 bytes. The (expected to hold the link-layer address) of 16 bytes. The
address in the case of IPoIB is 20 bytes. To get around this address in the case of IPoIB is 20 bytes. To get around this
problem IPoIB specifies [IPOIB_DHCP] that the 'broadcast flag' problem IPoIB specifies [IPOIB_DHCP] that the 'broadcast flag'
be used by the client when requesting an IP address. be used by the client when requesting an IP address.
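
The size mismatch and the resulting use of the broadcast flag can be illustrated with a short Python sketch. The function names are hypothetical. The only protocol detail assumed beyond this document is that the broadcast flag is the most significant bit of the 16-bit DHCPv4 'flags' field, as defined in [RFC_2131]. The flag is presumably needed because a server that never receives the client's full link-layer address cannot unicast its reply at the link layer.

```python
import struct

FIELD_LEN = 16            # bytes; field size stated in this document
IPOIB_LL_ADDR_LEN = 20    # bytes; IPoIB link-layer address length
BROADCAST_FLAG = 0x8000   # top bit of the 16-bit DHCPv4 'flags' field


def fits_in_field(ll_addr_len: int, field_len: int = FIELD_LEN) -> bool:
    """True if a link-layer address of this length fits in the field."""
    return ll_addr_len <= field_len


def request_flags(ll_addr_len: int) -> bytes:
    """Return the 2-byte 'flags' field for a client request.

    The broadcast flag is set whenever the link-layer address is too
    long to be carried in the field, so replies are broadcast.
    """
    flags = 0
    if not fits_in_field(ll_addr_len):
        flags |= BROADCAST_FLAG
    return struct.pack("!H", flags)
```

For a 20-byte IPoIB address the flag is set; for a 6-byte address, such as an Ethernet one, it is left clear.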
5.0 QoS and Related Issues 5.0 QoS and Related Issues
The IB specification suggests the use of service levels for The IB specification suggests the use of service levels for
load balancing, QoS and deadlock avoidance within an IB load balancing, QoS and deadlock avoidance within an IB
subnet. But the IB specification leaves the usage and mode of subnet. But the IB specification leaves the usage and mode of
skipping to change at page 22, line 49 skipping to change at page 22, line 23
Every IPoIB implementation will determine the relevant SL Every IPoIB implementation will determine the relevant SL
value based on its own policy. No method or process for value based on its own policy. No method or process for
choosing the SL has been defined by the IPoIB standards. choosing the SL has been defined by the IPoIB standards.
6.0 Security Considerations 6.0 Security Considerations
This document describes the IB architecture as relevant to This document describes the IB architecture as relevant to
IPoIB. It further restates issues specified in other IPoIB. It further restates issues specified in other
documents. It does not itself specify any requirements. There documents. It does not itself specify any requirements. There
are no security issues introduced by this document. IPoIB are no security issues introduces by this document. IPoIB
related security issues are described in related security issues are described in [IPOIB_ENCAP] and
[IPOIB_ENCAP] and [IPOIB_DHCP]. [IPOIB_DHCP].
7.0 Acknowledgements 7.0 Acknowledgments
This document has benefited from the comments and suggestion This document has benefited from the comments and suggestions
of the members of the IPoIB working group and the members of of the members of the IPoIB working group and the members of
the InfiniBand(SM) Trade Association. the InfiniBand(SM) Trade Association.
8.0 References 8.0 References
8.1 Normative References
[IB_ARCH] InfiniBand Architecture Specification, Volume 1.1 [IB_ARCH] InfiniBand Architecture Specification, Volume 1.1
[IPOIB_ENCAP] draft-ietf-ipoib-ip-over-infiniband-06.txt
[IPOIB_DHCP] draft-ietf-ipoib-dhcp-over-infiniband-05.txt
8.2 Informative References
[RFC_2373] IP Version 6 Addressing Architecture [RFC_2373] IP Version 6 Addressing Architecture
[RFC_2375] IPv6 Multicast Address Assignments [RFC_2375] IPv6 Multicast Address Assignments
[RFC_1700] Assigned Numbers [RFC_1700] Assigned Numbers
[RFC_1112] Host extensions for IP multicasting [RFC_1112] Host extensions for IP multicasting
[RFC_2236] Internet Group Management Protocol, Version 2 [RFC_2236] Internet Group Management Protocol, Version 2
[RFC_2710] Multicast Listener Discovery [RFC_2710] Multicast Listener Discovery
[IPOIB_ENCAP] draft-ietf-ipoib-ip-over-infiniband-05.txt
[IPOIB_DHCP] draft-ietf-ipoib-dhcp-over-infiniband-05.txt
9.0 Author's Address 9.0 Author's Address
Vivek Kashyap Vivek Kashyap
IBM IBM
15450, SW Koll Parkway 15450, SW Koll Parkway
Beaverton, OR 97006 Beaverton, OR 97006
Phone: +1 503 578 3422 Phone: +1 503 578 3422
Email: vivk@us.ibm.com Email: vivk@us.ibm.com
 End of changes. 44 change blocks. 
94 lines changed or deleted 88 lines changed or added
