2.3.6 IP over InfiniBand (ipoib)

NOTE: This charter is a snapshot of the 51st IETF Meeting in London, England. It may now be out-of-date. Last Modified: 31-Jul-01


Chair(s):

Bill Strahm <bill@strahm.net>

Internet Area Director(s):

Thomas Narten <narten@raleigh.ibm.com>
Erik Nordmark <nordmark@eng.sun.com>

Internet Area Advisor:

Thomas Narten <narten@raleigh.ibm.com>

Mailing Lists:

General Discussion: ipoverib@mailbag.intel.com
To Subscribe: listserv@mailbag.intel.com
In Body: subscribe ipoverib
Archive: message to listserv@mailbag.intel.com with index ipoverib in body

Description of Working Group:

InfiniBand is an emerging standard intended as an interconnect for processor and I/O systems and devices (see the InfiniBand Trade Association web site at http://www.infinibandta.org for details). IP is one type of traffic (and a very important one) that could use this interconnect. InfiniBand would benefit greatly from a standardized method of handling IP traffic on IB fabrics. It is also important to be able to manage InfiniBand devices in a common way.

The working group will specify the procedures and protocols to support IPv4/v6 over an InfiniBand fabric. Further, it will specify the set of MIB objects to allow management of the InfiniBand protocol.

The scope of this WG is limited to the definition of an encapsulation format for carrying IPv4 and IPv6 over IB networks and for performing address resolution between IP addresses and IB link-layer addresses. At the present time, more advanced functionalities such as mapping IP QoS into IB-specific capabilities are out of scope. Such work items may be considered in the future, but will require a recharter.

Work items

1. Specify a standards track procedure for supporting ARP/ND packets, and resolving IP addresses to IB link addresses.

2. Specify a standards track encapsulation for carrying IPv4 and IPv6 packets over IB.

3. Determine how to transfer IP multicast over IB, and specify a standard for it. IB has an optional receiver join multicast capability. Current working group plans are to use IB multicast as part of ARP, so using it for IP multicast as well may be a reasonable approach.

4. Specify a standards track channel adapter MIB that will allow management of an InfiniBand channel adapter. InfiniBand types will also need to be approved and added to the ifType values defined by IANA.

5. Specify a standards track baseboard management MIB that will allow management of specified device properties.

6. Specify sample counter MIBs to allow InfiniBand sample counters to be exposed to external SNMP management applications.

Goals and Milestones:

Jul 01    Submit initial Internet-Draft of ARP encapsulation
Jul 01    Submit initial Internet-Draft of Requirements/Overview
Jul 01    Submit initial Internet-Draft of IP V4/V6 Encapsulation
Jul 01    Submit initial Internet-Draft of Infiniband-Like MIB
Jul 01    Submit initial Internet-Draft of Channel Adapter MIB
Jul 01    Submit initial Internet-Draft of Multicast
Nov 01    Submit initial Internet-Draft of Baseboard MIB
Nov 01    Submit initial Internet-Draft of Sample Counter MIB
Feb 02    Submit initial Internet-Draft of Subnet Management MIB
Mar 02    Submit ARP/IP/Multicast encapsulation drafts for IESG Last Call
Mar 02    Submit Infiniband-Like MIB for IESG Last Call
Mar 02    Submit Channel Adapter MIB for IESG Last Call

No Current Internet-Drafts
No Request For Comments

Current Meeting Report

IPOIB Minutes

Bill started the meeting on time at 2:15pm. He first quickly reviewed the very tight agenda and then asked the group for permission to cut the break short to make sure we moved through the agenda.

Bill reviewed the goals of the meeting. Pointed out that the IB switch MIB is not a working group item - might need to work on a charter extension. A new working group mail alias will be set up soon and will be the official mailing list.

Asked for consensus on whether he should move all the existing subscribers to the new mail alias. There seemed to be strong consensus to do that rather than require everyone to resubscribe to the new alias.

Bill Anderson presented on the IB Interface MIB.

Status: a first draft was submitted, but no comments have been seen yet.

Bill said the draft was preliminary and he would appreciate feedback. Covers interface status and statistics. Don't see VL or link speed showing up. Co-resident management paradigms addressed. Tried to define mappings to IBA managed attributes. Defined two new groups. Asked for feedback on several issues:

- per VL traffic counters? VLs are not virtual circuits, but you could require them to look like them. Bill feels strongly we need traffic counters on a per VL basis.

- Permissive DLID packets do not count as broadcast packets. Joe proposed a permissive DLID is a unicast.

- Many IB counters are optional, but some of the error counters are in the MIB as being mandatory.

- Counters initially were 64 bits, but got feedback and backed it down to 32. Concerned about back-to-back link error counts, and others. It was pointed out that a 64 bit counter does not require a 64 bit hardware counter. Software could accumulate a 32 bit counter into a 64 bit counter.
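
The point about accumulating in software can be sketched as follows. This is a minimal illustration, not anything from the draft: the class name `Counter64` and its methods are hypothetical, and it assumes the hardware counter wraps modulo 2^32 and is polled at least once per wrap period. A counter that saturates instead of wrapping would need a read-and-clear scheme.

```python
class Counter64:
    """Extends a wrapping 32-bit hardware counter to 64 bits in software.

    Hypothetical helper; assumes the raw counter wraps modulo 2**32 and
    that poll() is called at least once per wrap period.
    """

    MASK32 = 0xFFFFFFFF
    MASK64 = 0xFFFFFFFFFFFFFFFF

    def __init__(self, initial_raw=0):
        self.last_raw = initial_raw & self.MASK32
        self.total = 0

    def poll(self, raw):
        """Fold a new 32-bit reading into the 64-bit running total."""
        raw &= self.MASK32
        # Modular subtraction gives the right delta across a single wrap.
        delta = (raw - self.last_raw) & self.MASK32
        self.total = (self.total + delta) & self.MASK64
        self.last_raw = raw
        return self.total
```

For example, a reading of `0x10` following a reading of `0xFFFFFFF0` is treated as an increase of `0x20`, not as a decrease.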

Next Steps
- develop case diagrams
- analyze error counter wrap frequency
- others?
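
As a rough aid for the wrap-frequency analysis, the arithmetic is just counter range divided by event rate. The sketch below uses an assumed rate of one million events per second purely for illustration; the function name and the rate are not from the minutes or the draft.

```python
def seconds_to_wrap(bits, events_per_second):
    """Seconds until a counter of the given width wraps at a constant rate."""
    if events_per_second <= 0:
        raise ValueError("rate must be positive")
    return (2 ** bits) / events_per_second


# Assumed, illustrative rate: 1,000,000 events per second.
ASSUMED_RATE = 1_000_000
wrap32 = seconds_to_wrap(32, ASSUMED_RATE)  # roughly 4295 s, a little over an hour
wrap64 = seconds_to_wrap(64, ASSUMED_RATE)  # on the order of 1.8e13 s
```

At that assumed rate a 32-bit counter would have to be polled more often than about once every 71 minutes to avoid missing a wrap, while a 64-bit counter would not wrap in any practical time frame.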

Additional MIB areas that interest Bill
- port info - is a work item, need it to be done by
- switch info
- flow control state and statistics
- baseboard management

Bill S. asked for a consensus vote on whether anyone thought this should not be a work item. No one objected. He asked Bill A. to submit a revised draft ASAP that was more complete.

Brian Forbes presented on the IB Switch MIB

The initial proposal is preliminary. Pointed out that the topic is actually broader than just a switch. Draft-ietf-forbes-ibswitchmib-00

This is outside the original charter, but would like to see it put on. Scope is broad (see slides). Even includes baseboard management.

- provide SNMP management
- range of switch implementations (embedded, pizza box, modular/bladed)
- <??>
- establish a framework for multi-protocol SAN (system and storage) Management (IB, FC, etc).

Wants to try to raise the bar in terms of MIB functionality.

Showed a slide of the schema. Eurom at Silicant (???) expressed an opinion that we try to map to existing management schema.

Question on relevance of RFC 2573 trap management for SNMP - Forbes said that he'd get back on that question.

Want to intercept where some of the other interconnects are in terms of manageability, specifically Fibre Channel. Question asked on where this was derived from. It's derived from the Infiniband link record.

Known action items
- finish SNMPv2 compatibility, plus SMI.
- Take feedback
- Overview too brief
- Add diagrams
- Expand comments (e.g. traps)

Bill asked a scoping question: if the group is interested, does it fit in the charter? Comment was made that we can't determine this in real time.

Bill summarized next steps:
- clean up doc
- ADs need to determine if this is within scope, or whether we will have to re-charter.

Bill asked for a consensus vote on whether this was a good thing. About 5-7 folks felt it was a good thing, no-one said it was bad. Some were concerned about whether this was in the existing charter. Bill said he hoped to resolve this via e-mail and not wait until December. The AD said he was concerned about the WG taking on too many things initially and then blowing it.

Jerry Chu presented on IPoIB IPv6 proposal


What delimits an IPoIB link? According to RFC2460, "nodes that can communicate at the LINK layer". How does this map to an IB subnet? Or an IB partition? Through IB routers?

How to support IP multicast (incl. broadcast) on the link? One requirement is a fixed mapping from IP multicast to IB multicast. Link multicast must have a boundary which matches the link boundary (link-local scope). IP multicast routing works independently of link type.
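
To make "a fixed mapping" concrete, here is a minimal sketch of a deterministic function from an IP multicast group to a 128-bit multicast GID. The layout is invented for illustration and is not taken from any of the drafts or from the IB specification: it borrows the IPv6 multicast convention of an `0xFF` first byte and a scope nibble, zero-fills the middle, and places the low 32 bits of the IP group at the end. The function name and constant are hypothetical.

```python
import ipaddress

# Assumed scope value for "link-local", following the IPv6 multicast convention.
LINK_LOCAL_SCOPE = 0x2


def ip_group_to_mgid(group, scope=LINK_LOCAL_SCOPE):
    """Hypothetical fixed mapping from an IP multicast group to a 128-bit GID.

    Layout (illustrative only): 0xFF, flags/scope byte, 10 zero bytes,
    then the low 32 bits of the IP group address.
    """
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low32 = int(addr) & 0xFFFFFFFF
    mgid = bytes([0xFF, scope & 0x0F]) + bytes(10) + low32.to_bytes(4, "big")
    return ipaddress.IPv6Address(mgid)
```

Because the mapping is a pure function of the group address, every node on the link derives the same GID without any exchange. This toy layout lets distinct groups collide when they share their low 32 bits, which a real proposal would have to address.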

Jerry summarized IB "network layer 3" (MGID) and IB "link layer 2" (MLID). Jerry pointed out that multicast routing is not defined yet. At the MLID layer, the spec defines a link-local <??>. Bill asked whether the talk was relevant only to the 1.0 IB spec or also to 1.0.a.

Vivek claims IB spec doesn't support multiple MGIDs. Jerry agrees that it is ambiguous. What does it mean to have records with the same MGID, but different attributes? Jerry asks whether the same MGID can be shared?

Vivek says that all we need to do is map to an MGID. Joe Pelessier suggested we need to clearly delineate what is an IETF problem vs. what is an IBTA problem. He believes we should only map to the MGID, leave the mapping from an MGID to an MLID to the IBTA.

Vivek says he doesn't think we need to worry about the mapping of an MGID to an MLID - it doesn't affect us. Jerry says we need better data on how to map the MGID to the MLID. Joe P. agrees.

Jerry limited an IP link to only be within an IB subnet. If we don't do this, then the scoping essentially doesn't work. Further, should we limit it to a single partition? P_Key + Q_Key? Site-local scope with all nodes?

The link-layer address in Jerry's proposal is simpler: it includes only the fields that have to be there to identify an IB endpoint and make this interoperable. He believes GUID or GID is the right choice. Vivek asks what the model of the address resolution is. Vivek points out that this could mean that the driver then has to do the DLID lookup to the subnet manager. Vivek points out that the other end knows all the information you need - why not put it in the ARP response? Jerry points out that this assumes the path is symmetric - not necessarily a good thing. Also, this makes the ARP resolution more complex. Joe P. believes that this subnet manager communication has to be implemented anyway, so putting it all in the ARP reply doesn't eliminate any code. Vivek didn't agree.

Some discussion on link MTU. Jerry felt that if a 4K MTU HCA comes up on a 2K MTU fabric, it should not be allowed to join the subnet.

Took a 30 minute break.

Vivek's talk

Vivek talked about the status of his IDs. He's authored a requirements draft and is soliciting feedback.

Multicasting draft focused on v4, but he felt that the differences with IPv6 are minimal. Recommendation to make sure you send comments on the draft to the mailing list. Vivek feels that IP subnets across IB subnets is a bad idea because it is poorly specified.

IB has a lack of many things, including scope bits, routing protocol, port to router interaction, router discovery/redirects, complex address resolution.

Jerry comments that if we solve the multicast problem and the IBTA comes out with a routing spec, it will work. Joe agrees. AD says he has a hard time believing this will work.

VLAN tags. Issue is whether an MGID can be present multiple times in the same subnet. If it were, it makes it simple because the VLAN maps to the partition. If it is not possible, we would have to use some other mechanism, possibly Q_Key. Some skeptics on whether the IBTA would allow multiple mappings of the MGID.

Discussed several issues. Doesn't like having a payload type header (could require a different QPN for each protocol). Determination of the service level. QPN flag to map services to QPs. Flags for IP over IB transport say what the capability is; a separate draft would be needed to specify the protocol that uses the capability.

Joe P's presentation - comparing the 3 drafts

SL - Vivek proposes using the SL of the multicast all-hosts. Joe is proposing that it be configured into the subnet administration. Joe believes this adheres to the IB intended use. Vivek points out his proposal is actually a bit more complex and permits you to use the IB mechanism. Joe is also proposing it be required, Vivek is proposing it be optional.

Joe walks through a deadlock scenario that he believes can crash the fabric. In the example all nodes use one VL and assert flow control. The loop can deadlock when the switches assert flow control and a chain of 2-hop communications forms a loop. One classic way to solve this is to label one link of the loop as an epoch and require that traffic through that link be on a different VL.
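
The scenario can be illustrated with a small channel-dependency model. This is a sketch under stated assumptions, not Joe's slides: it assumes a unidirectional ring in which flow i crosses links i and i+1, treats each (link, VL) pair as a buffer resource, and models VLs directly for clarity. The function names and the `special_link` parameter are hypothetical. A cycle in the dependency graph means every buffer in the loop can end up waiting on the next one.

```python
def two_hop_dependencies(n_links, special_link=None):
    """Buffer dependencies for a ring where flow i crosses links i and i+1.

    Each (link, vl) pair is one buffer resource. If special_link is given,
    traffic moves to VL 1 on that link and stays there (illustrative rule).
    """
    edges = {}
    for i in range(n_links):
        vl = 0
        hops = []
        for link in (i, (i + 1) % n_links):
            if special_link is not None and link == special_link:
                vl = 1
            hops.append((link, vl))
        # A packet holding the first buffer waits for the second one.
        edges.setdefault(hops[0], set()).add(hops[1])
        edges.setdefault(hops[1], set())
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GREY
        for nxt in edges[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in edges)
```

With four links and a single VL, `has_cycle(two_hop_dependencies(4))` reports a cycle; with `special_link=3` the chain of dependencies ends on VL 1 and the cycle disappears.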

In IB you can't specify VL, just SL. Might map to same VL. Nodes can only control VL on the local link. IB specified solution is to have the SM solve this. But this solution is not required by IB. BUT, IBTA did do extensive work to solve this. All these mechanisms depend on using the SL provided by the SM. So why is it not required - IBTA believes this is a mistake. Quite a bit of discussion on trade-offs of SL solutions.

IB transport. All have UD as required. Vivek allows transport over other service types. Joe feels it's better to remain silent until they are specified, rather than creating the flag but not specifying an algorithm. But Joe feels this is a nit.

IP spanning IB subnets - Joe states the main issue is multicast. If we can solve that, we should. The rest is IBTA issues.

IBARP Contents - agree on QPN, GUID, QKey
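
For orientation, the three agreed items could be carried in a record like the one below. This is a hypothetical layout for illustration only, not the format of any of the three drafts: the class name, byte order, and the choice to carry the QPN in a 32-bit field with the top 8 bits reserved are all assumptions, as are the field widths (24-bit QPN, 64-bit GUID, 32-bit Q_Key).

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: 32-bit word holding the QPN, 64-bit GUID, 32-bit Q_Key.
_LAYOUT = struct.Struct(">IQI")


@dataclass(frozen=True)
class IbArpRecord:
    qpn: int    # queue pair number (assumed 24 bits)
    guid: int   # port GUID (assumed 64 bits)
    q_key: int  # Q_Key (assumed 32 bits)

    def pack(self) -> bytes:
        """Serialize to the illustrative 16-byte layout."""
        if not 0 <= self.qpn < (1 << 24):
            raise ValueError("QPN must fit in 24 bits")
        return _LAYOUT.pack(self.qpn, self.guid, self.q_key)

    @classmethod
    def unpack(cls, data: bytes) -> "IbArpRecord":
        """Parse the illustrative layout, ignoring the reserved top byte."""
        qpn, guid, q_key = _LAYOUT.unpack(data)
        return cls(qpn & 0xFFFFFF, guid, q_key)
```

Whether a GID replaces or accompanies the GUID, and whether path information such as the DLID also belongs in the reply, are exactly the open points in the discussion above; the sketch takes no position on them.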

Joe feels GID/GUID choice is not something to lose sleep over.

Skipped through some slides in the interest of time.

Joe talks about GRH. Vivek said he's concerned that the driver has to deal with two formats in terms of what is in the receive buffer (GRH or no GRH). Joe feels the argument is not strong enough to mandate this.

Some discussion on what is required vs. optional. Joe claims you are required to be able to handle either - thus the issue does not affect interoperability.

Ended the discussion with where to go from here, given that we have 3 different IDs on the table. Bill outlined several options, including putting the 3 groups of authors in a room, voting and picking one, or ??? Much discussion on the appropriate path. The ADs recommended that consensus be reached on what differences there are between the proposals, as well as what is the same. This might make it clearer. The 3 author groups agreed to take this to the mail alias. Bill asked Joe to revise the document he published to the e-mail alias and add to it a comparison of Vivek's vs. Jerry's proposal, then have the folks work together on the e-mail alias to come up with a summary from which the group can understand where consensus lies.

On the requirements document, Bill said that Vivek's draft could be the initial form of a requirements draft, but that it should wait until after consensus is reached on how to proceed with the 3 proposals.


IB Switch MIB
IPv6 over IB Encapsulation Unicast and Multicast
IB Interface MIB
Comparison of IP over IB Proposals
Multicasting, IP Encapsulation, Address Resolution