[RTG-DIR] RtgDir review: draft-ietf-armd-problem-statement-03

"Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com> Wed, 15 August 2012 12:41 UTC

From: "Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com>
To: "rtg-ads@tools.ietf.org" <rtg-ads@tools.ietf.org>
Date: Wed, 15 Aug 2012 18:09:38 +0530
Cc: "rtg-dir@ietf.org" <rtg-dir@ietf.org>, "draft-ietf-armd-problem-statement.all@tools.ietf.org" <draft-ietf-armd-problem-statement.all@tools.ietf.org>
Subject: [RTG-DIR] RtgDir review: draft-ietf-armd-problem-statement-03

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://www.ietf.org/iesg/directorate/routing.html

Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-armd-problem-statement-03
Reviewer: Manav Bhatia
Review Date: Aug 15 2012
IETF LC End Date: Aug 23 2012
Intended Status: Informational

Summary: I have some concerns about this document that I think should be resolved before publication.

Major Issues:

1. In Sec 5 why is there a "may" in the following statement?

"From an L2 perspective, sending to a multicast vs. broadcast address *may* result in the packet being delivered to all nodes, but most (if not all) nodes will filter out the (unwanted) query via filters installed in the NIC -- hosts will never see such packets. "

"may" seems to indicate that there are scenarios in which a multicast, from an L2 perspective, will not be delivered to all nodes. I am unable to envisage a scenario in which this can happen. All BUM (broadcast, unknown unicast and multicast) traffic in vanilla L2 and VPLS (Virtual Private LAN Service) is delivered to *all* nodes. There are exceptions in H-VPLS or if MMRP is enabled, but I doubt the authors had these in mind when they wrote the above text.

2. Sec 7.1 begins with the following text:

"One pain point with large L2 broadcast domains is that the routers connected to the L2 domain need to process "a lot of" ARP traffic."

I am not sure this is consistent with how an L2 broadcast domain is defined in Sec 2. I would wager that a bigger pain point for a large L2 broadcast domain is handling unknown unicast traffic that needs to be flooded, rather than dealing with the "ARP" traffic. I am aware of very large L2 broadcast domains that have no ARP/ND scaling problems. Would it then make more sense to replace "L2 broadcast domain" with "ARP/ND domain"? If yes, then ARP/ND domain also needs to be defined in Sec 2.

3. Sec 7.1 seems to suggest that Gratuitous ARPs pre-populate ARP caches on the neighboring devices. Without an explicit description of what a neighboring device is, I would presume that this also includes edge/core routers. In that case this statement is not entirely correct as I am aware of routers that will by default not pre-populate their ARP caches on receiving Gratuitous ARPs.

4. Sec 7.2 must also discuss the scaling impact of how the neighbor cache is maintained in IPv6, especially the impact of moving the neighbor state from REACHABLE to STALE. Once the "IPv6 ARP" gets resolved, the neighbor entry moves from REACHABLE to STALE after around 30 seconds. The neighbor entry remains in this state until a packet needs to be forwarded to this neighbor. The first time a node sends a packet to a neighbor whose entry is STALE, the sender changes the state to DELAY and sets a timer to expire in around 5 seconds. Most routers initiate the move from STALE to DELAY by punting a copy of the data packet to the CPU so that the sender can reinitiate the Neighbor Discovery process. This can be quite CPU and buffer intensive if the neighbor cache is huge.
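
To make the transitions in item 4 concrete, here is a minimal Python sketch of a neighbor cache entry. It is only an illustration under stated assumptions: the timer values are the RFC 4861 defaults (the real ReachableTime is randomized), the class and method names are hypothetical, and the "punt" is modeled as a boolean return value rather than any particular router's behavior.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Default protocol constants from RFC 4861, Section 10 (not randomized here).
REACHABLE_TIME = 30.0          # seconds
DELAY_FIRST_PROBE_TIME = 5.0   # seconds


class State(Enum):
    REACHABLE = auto()
    STALE = auto()
    DELAY = auto()
    PROBE = auto()


@dataclass
class NeighborEntry:
    """Hypothetical, simplified neighbor cache entry."""
    state: State = State.REACHABLE
    since: float = 0.0  # time at which the entry entered its current state

    def _enter(self, state: State, now: float) -> None:
        self.state = state
        self.since = now

    def tick(self, now: float) -> None:
        """Apply timer-driven transitions at time `now`."""
        if self.state is State.REACHABLE and now - self.since >= REACHABLE_TIME:
            self._enter(State.STALE, now)
        elif self.state is State.DELAY and now - self.since >= DELAY_FIRST_PROBE_TIME:
            self._enter(State.PROBE, now)  # unicast probing would start here

    def on_packet_to_neighbor(self, now: float) -> bool:
        """Return True if forwarding this packet involves the control plane."""
        self.tick(now)
        if self.state is State.STALE:
            self._enter(State.DELAY, now)
            return True  # assumption: models "punt a copy to the CPU"
        return False


def count_punts(n_neighbors: int, now: float) -> int:
    """Count punts when one packet is sent to each of n fresh entries at `now`."""
    entries = [NeighborEntry() for _ in range(n_neighbors)]
    return sum(e.on_packet_to_neighbor(now) for e in entries)
```

In this model, sending one packet to each of N neighbors after their entries have gone STALE produces N punts, which is the scaling concern when the neighbor cache is large.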

Minor Issues:

1. Sec 2 - Terminology should define Address Resolution as this seems to be the core issue that the draft is discussing.

Address Resolution:  Address resolution is the process through which a node determines the link-layer address of a neighbor given only its IP address.  In IPv6, address resolution is performed as part of Neighbor Discovery [RFC4861], Section 7.2.

2. In Sec 7.1 you mention that routers need to drop all transit traffic when there is no response received for an ARP/ND request. You should mention that in addition to this, routers also need to send an ICMP host unreachable error packet back to the sender. ICMP error packets are generated in the control card CPU. So, if the CPU has to generate a high number of such ICMP errors then this can load the CPU. The whole process can be quite CPU as well as buffer intensive. The CPU/buffer overload is usually mitigated by rate limiting the number of ICMP errors generated.

3. In Sec 7.1 you mention that the entire ARP/ND process can be quite CPU intensive since transit data traffic needs to be queued while the address resolution is underway. You could mention that this is mitigated by offloading the queuing part to the line card CPUs so that the CPU on the control card is not inundated with such packets. This obviously would only work on distributed systems that have separate CPUs on the line cards and the main card.

4. Sec 7.1 should mention that this could be used as a DoS attack wherein the attacker sends a high volume of packets for which ARPs need to be resolved. This could result in genuine packets that need to resolve ARPs getting dropped as there is only a finite rate at which packets are sent to CPU for ARP resolution. Again this is both CPU and buffer intensive.

5. Sec 7.2 discusses issues with the address resolution mechanism in IPv6. I think it's useful for this draft to discuss the fact that, unlike IPv4, IPv6 has subnets that are /64. A /64 covers 2^64 (roughly 1.8 x 10^19) IP addresses, most of which would be unassigned. Thus simplistic IPv6 ND implementations can be vulnerable to attacks that inundate the CPU with requests to perform address resolution for a large number of IPv6 addresses, most of which are unassigned. As a result, genuine IPv6 devices may not be able to join the network. You might want to refer to RFC 6583 for more details.

6. The last paragraph of Sec 7.3 says the following:

"Finally, IPv6 and IPv4 are often run simultaneously and in parallel on the same network, i.e., in dual-stack mode.  In such environments, the IPv4 and IPv6 issues enumerated above compound each other."

While I understand the sentiment behind the above statement, I fail to see how this is related to the MAC problem being described in Sec 7.3. The MAC scaling is a function of the total number of unique MACs that the system has to learn and is orthogonal to the presence of IPv4 or IPv6. I read this statement to mean that something extra happens in the dual stack mode which exacerbates the MAC problem even further. This I believe is patently not the case.

7. Sec 11 - Security Considerations should at the very least give pointers to references on issues related to ARP security vulnerabilities. I don't see IPv6 ND mentioned at all. Since ND relies on ICMPv6 and does not run directly over layer 2, there could possibly be security concerns specific to ND in the data center environments that don't apply to ARP. This document ought to discuss those so that ARMD (or some other WG) can look at solutions addressing those concerns. 

8. Should it be mentioned somewhere in the document (Sec 11?) that data center administrators can configure ACLs to filter packets addressed to unallocated IPv6 addresses? Folks can consider the valid IPv6 address ranges and filter out packets that use unallocated addresses. Doing this avoids unnecessary address resolution for invalid IPv6 addresses. The list of IPv6 addresses that are legitimate and should be permitted is small and maintainable because of IPv6's address hierarchy. http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xml gives a list of large address blocks that have been allocated by IANA.
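
As a rough illustration of minor issues 5 and 8, the Python sketch below computes the size of a /64 and checks destinations against a permit list before any address resolution would be attempted. The prefixes are hypothetical placeholders taken from the 2001:db8::/32 documentation range, and `is_permitted` is an illustrative name, not a real ACL API.

```python
import ipaddress

# Size of a single /64 subnet (minor issue 5): 2**64 addresses.
ADDRESSES_PER_SLASH_64 = 2 ** (128 - 64)

# Hypothetical permit list. A real deployment would list the prefixes that are
# actually legitimate for it; documentation prefixes stand in for them here.
PERMITTED_PREFIXES = [
    ipaddress.ip_network("2001:db8:0:1::/64"),
    ipaddress.ip_network("2001:db8:0:2::/64"),
]


def is_permitted(destination: str) -> bool:
    """Return True if the destination falls inside a permitted prefix."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in PERMITTED_PREFIXES)


if __name__ == "__main__":
    print(ADDRESSES_PER_SLASH_64)            # addresses in one /64
    print(is_permitted("2001:db8:0:1::10"))  # inside a permitted prefix
    print(is_permitted("2001:db8:0:3::10"))  # outside the permit list
```

A filter of this kind only drops traffic to prefixes outside the list; it does not by itself protect against resolution requests for unassigned addresses inside a permitted /64, which is the case RFC 6583 discusses.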

Cheers, Manav