idnits 2.17.1 

draft-gashinsky-6man-v6nd-enhance-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC4861]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document date (October 22, 2012) is 4205 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC2119' is defined on line 386, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4398' is defined on line 389, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4862' is defined on line 396, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6164' is defined on line 399, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4255' is defined on line 411, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-07) exists of
     draft-ietf-6man-impatient-nud-02


     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                          W. Kumari
3	Internet-Draft                                                    Google
4	Intended status: Informational                              I. Gashinsky
5	Expires: April 25, 2013                                           Yahoo!
6	                                                              J. Jaeggli
7	                                                                   Zynga
8	                                                         K. Chittimaneni
9	                                                                  Google
10	                                                        October 22, 2012

12	           Neighbor Discovery Enhancement for DOS mitigation
13	                  draft-gashinsky-6man-v6nd-enhance-02

15	Abstract

17	   In IPv4, subnets are generally small, made just large enough to cover
18	   the actual number of machines on the subnet.  In contrast, the
19	   default IPv6 subnet size is a /64, a number so large it covers
20	   trillions of addresses, the overwhelming number of which will be
21	   unassigned.  Consequently, simplistic implementations of Neighbor
22	   Discovery can be vulnerable to denial of service attacks whereby they
23	   attempt to perform address resolution for large numbers of unassigned
24	   addresses.  Such denial of service attacks can be launched
25	   intentionally (by an attacker), or result from legitimate operational
26	   tools that scan networks for inventory and other purposes.  As a
27	   result of these vulnerabilities, new devices may not be able to
28	   "join" a network, it may be impossible to establish new IPv6 flows,
29	   and existing IPv6 transport flows may be interrupted.

31	   This document describes a modification to the [RFC4861] neighbor
32	   discovery protocol aimed at improving the resilience of the neighbor
33	   discovery process.  We call this process gratuitous neighbor
34	   discovery; it derives inspiration in part from the analogous IPv4
35	   gratuitous ARP implementation.

37	Status of this Memo

39	   This Internet-Draft is submitted in full conformance with the
40	   provisions of BCP 78 and BCP 79.

42	   Internet-Drafts are working documents of the Internet Engineering
43	   Task Force (IETF).  Note that other groups may also distribute
44	   working documents as Internet-Drafts.  The list of current Internet-
45	   Drafts is at http://datatracker.ietf.org/drafts/current/.

47	   Internet-Drafts are draft documents valid for a maximum of six months
48	   and may be updated, replaced, or obsoleted by other documents at any
49	   time.  It is inappropriate to use Internet-Drafts as reference
50	   material or to cite them other than as "work in progress."

52	   This Internet-Draft will expire on April 25, 2013.

54	Copyright Notice

56	   Copyright (c) 2012 IETF Trust and the persons identified as the
57	   document authors.  All rights reserved.

59	   This document is subject to BCP 78 and the IETF Trust's Legal
60	   Provisions Relating to IETF Documents
61	   (http://trustee.ietf.org/license-info) in effect on the date of
62	   publication of this document.  Please review these documents
63	   carefully, as they describe your rights and restrictions with respect
64	   to this document.  Code Components extracted from this document must
65	   include Simplified BSD License text as described in Section 4.e of
66	   the Trust Legal Provisions and are provided without warranty as
67	   described in the Simplified BSD License.

69	Table of Contents

71	   1.  Introduction
72	     1.1.  Applicability
73	   2.  The Problem
74	     2.1.  Scenario 1 - DoS condition induced by default router
75	           failure
76	   3.  Terminology
77	   4.  Background
78	   5.  Neighbor Discovery Overview
79	   6.  Proposed Solutions
80	     6.1.  NDP Protocol Gratuitous NA
81	     6.2.  User Configurable DELAY_FIRST_PROBE_TIME
82	   7.  IANA Considerations
83	   8.  Security Considerations
84	   9.  Acknowledgements
85	   10. References
86	     10.1. Normative References
87	     10.2. Informative References
88	   Authors' Addresses

90	1.  Introduction

92	   This document describes modifications to the IPv6 Neighbor Discovery
93	   protocol [RFC4861] in order to reduce exposure to vulnerabilities
94	   when a network is scanned, either by an intruder, as part of a
95	   deliberate DOS attempt, or through the use of scanning tools that
96	   perform network inventory, security audits, etc. (e.g., "nmap").  In
97	   some cases, DOS-like conditions can also be induced by legitimate
98	   traffic in heavy traffic networks such as campuses or datacenters.

100	1.1.  Applicability

102	   This document is primarily intended for implementors of [RFC4861].

104	   This document is a companion to two other documents.  The first,
105	   "Operational Neighbor Discovery Problems" [RFC6583], addresses the
106	   problem in detail and describes operational and implementation
107	   mitigations within the framework of the existing protocol.  The
108	   second, "Neighbor Unreachability Detection is too impatient"
109	   [I-D.ietf-6man-impatient-nud], proposes to alter Neighbor
110	   Unreachability Detection by relaxing its rules in an attempt to keep
111	   devices in the cache.

113	   In this document we propose alterations that allow the update or
114	   installation of neighbor entries without the instigation of a full
115	   [RFC4861] neighbor solicitation.

117	2.  The Problem

119	   In IPv4, subnets are generally small, made just large enough to cover
120	   the actual number of machines on the subnet.  For example, an IPv4
121	   /20 contains only 4096 addresses.  In contrast, the default IPv6
122	   subnet size is a /64, a number so large it covers literally billions
123	   of billions of addresses, the overwhelming number of which will be
124	   unassigned.  Consequently, simplistic implementations of Neighbor
125	   Discovery can be vulnerable to denial of service attacks whereby they
126	   perform address resolution for large numbers of unassigned addresses.
127	   Such denial of service attacks can be launched intentionally (by an
128	   attacker), or result from legitimate operational tools that scan
129	   networks for inventory and other purposes.  As a result of these
130	   vulnerabilities, new devices may not be able to "join" a network, it
131	   may be impossible to establish new IPv6 flows, and existing IPv6
132	   transport flows may be interrupted.
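
The size difference described above can be checked with a few lines of
arithmetic.  The following is a minimal sketch; the helper name and
the scan rate of one million probes per second are assumptions chosen
only for illustration and do not come from this document.

```python
def addresses_in_prefix(prefix_len: int, address_bits: int) -> int:
    """Number of addresses covered by a prefix of the given length."""
    if not 0 <= prefix_len <= address_bits:
        raise ValueError("prefix length out of range")
    return 2 ** (address_bits - prefix_len)


# The two subnet sizes mentioned in the text.
ipv4_slash20 = addresses_in_prefix(20, 32)    # 4096
ipv6_slash64 = addresses_in_prefix(64, 128)   # 2^64

# Hypothetical scan rate, used only to give a sense of scale.
PROBES_PER_SECOND = 1_000_000
seconds_to_sweep = ipv6_slash64 // PROBES_PER_SECOND
years_to_sweep = seconds_to_sweep / (365 * 24 * 3600)

print(ipv4_slash20, ipv6_slash64, round(years_to_sweep))
```

Even at that assumed rate a full sweep of a /64 would take hundreds of
thousands of years, so nearly every probed address is unassigned and
each probe can trigger a fresh address resolution attempt.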

134	   Network scans attempt to find and probe devices on a network.
135	   Typically, scans are performed on a range of target addresses, or all
136	   the addresses on a particular subnet.  When such probes are directed
137	   via a router, and the target addresses are on a directly attached
138	   network, the router will attempt to perform address resolution on
139	   a large number of destinations (i.e., some fraction of the 2^64
140	   addresses on the subnet).  The process of testing for the
141	   (non)existence of neighbors can induce a denial of service condition,
142	   where the number of Neighbor Discovery requests overwhelms the
143	   implementation's capacity to process them, exhausts available memory,
144	   replaces existing in-use mappings with incomplete entries that will
145	   never be completed, etc.  The result can be network disruption, where
146	   existing traffic may be impacted, and devices that join the network
147	   find that address resolution fails.

149	   In order to alleviate risk associated with this DOS threat, some
150	   router implementations have taken steps to rate-limit the processing
151	   of Neighbor Solicitations (NS).  While these mitigations do
152	   help, they do not fully address the issue and may introduce their own
153	   set of potential liabilities to the neighbor discovery process.

155	   In some network environments, legitimate Neighbor Discovery traffic
156	   from a large number of connected hosts could induce a DoS condition
157	   even without the use of any scanning tools.

159	2.1.  Scenario 1 - DoS condition induced by default router failure

161	   Consider the following scenario: you have a pair of routers, R1 and
162	   R2, acting as default routers for a campus wifi network that serves
163	   thousands of clients.  These clients range from traditional laptops
164	   with common OSes such as Windows, Mac OS X, etc., to smart phones and
165	   tablets running a slew of mobile OSes.  R1, R2 and all clients are
166	   configured with default ND parameters.

168	   Under normal operating conditions, R1 acts as a default gateway for
169	   all client traffic and R2 is mostly acting as a standby.  R1 and R2
170	   routinely send out Router Advertisements and all nodes perform
171	   Neighbor Discovery as per the default timers configured.  Clients
172	   that are actively transmitting and receiving data will likely have a
173	   Neighbor Cache entry for R1 as REACHABLE and R2 as STALE.

175	   Now imagine that for some reason (power outage, hardware failure,
176	   etc.)  R1 goes down.  When this happens, R2 begins various
177	   housekeeping tasks such as reconverging its routing protocols (OSPF,
178	   BGP, etc.), recalculating layer 2 topologies such as in STP and so
179	   on.  Typically, such reconvergence incidents are quite CPU intensive
180	   depending on the size of the topology and are generally aggravated in
181	   dual stack environments.  Once clients determine that R1 is no longer
182	   reachable, they would start using R2 as their default router.

184	   At this point, the Neighbor Cache Entry for R2 is still marked as
185	   STALE.  As per RFC4861, a node will start sending packets to R2, mark
186	   the neighbor cache entry for R2 as DELAY and set a timer to expire in
187	   DELAY_FIRST_PROBE_TIME seconds.  DELAY_FIRST_PROBE_TIME is a fixed
188	   node constant with a value of 5 seconds.  If the entry is still in
189	   the DELAY state when the timer expires, the entry's state changes to
190	   PROBE.  Upon entering the PROBE state, a node sends a unicast
191	   Neighbor Solicitation message to R2 using the cached link-layer
192	   address.
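
The STALE, DELAY and PROBE transitions described above can be sketched
as a small state machine.  This is an illustration only: the class and
method names are hypothetical, time is passed in explicitly rather than
read from a clock, and probe retransmission is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

DELAY_FIRST_PROBE_TIME = 5.0  # seconds; node constant from RFC 4861


class State(Enum):
    STALE = "STALE"
    DELAY = "DELAY"
    PROBE = "PROBE"
    REACHABLE = "REACHABLE"


@dataclass
class NeighborEntry:
    """Hypothetical Neighbor Cache Entry a client keeps for router R2."""
    state: State = State.STALE
    delay_deadline: Optional[float] = None

    def on_packet_sent(self, now: float) -> None:
        # Sending traffic to a STALE neighbor moves the entry to DELAY
        # and starts the DELAY_FIRST_PROBE_TIME timer.
        if self.state is State.STALE:
            self.state = State.DELAY
            self.delay_deadline = now + DELAY_FIRST_PROBE_TIME

    def on_reachability_confirmed(self) -> None:
        # For example, a hint from an upper layer protocol.
        self.state = State.REACHABLE
        self.delay_deadline = None

    def on_timer(self, now: float) -> bool:
        """Return True when a unicast Neighbor Solicitation is due."""
        if (self.state is State.DELAY
                and self.delay_deadline is not None
                and now >= self.delay_deadline):
            self.state = State.PROBE
            return True
        return False
```

A client that receives confirmation before the timer expires never
reaches PROBE and so sends no unicast NS; one that does not will send
a probe once the timer fires.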

194	   Ordinarily, it is highly likely that the client will receive
195	   reachability confirmation within the 5 seconds of
196	   DELAY_FIRST_PROBE_TIME by virtue of hints from upper layer protocols.
197	   However, in this scenario, given that R2 is busy doing other things,
198	   it is possible that it will take a longer time for the client to
199	   receive said reachability confirmation, forcing it to enter the PROBE
200	   state and send out a unicast NS message.

202	   With thousands of clients now sending out unicast NS messages to R2
203	   in a short period of time, while it is busy dealing with other
204	   reconvergence related calculations, you effectively end up in a DoS
205	   situation caused entirely by legitimate traffic.

207	3.  Terminology

209	   Address Resolution  Address resolution is the process through which a
210	      node determines the link-layer address of a neighbor given only
211	      its IP address.  In IPv6, address resolution is performed as part
212	      of Neighbor Discovery [RFC4861], p60

214	   Forwarding Plane  That part of a router responsible for forwarding
215	      packets.  In higher-end routers, the forwarding plane is typically
216	      implemented in specialized hardware optimized for performance.
217	      Forwarding steps include determining the correct outgoing
218	      interface for a packet, decrementing its Time To Live (TTL),
219	      verifying and updating the checksum, placing the correct link-
220	      layer header on the packet, and forwarding it.

222	   Control Plane  That part of the router implementation that maintains
223	      the data structures that determine where packets should be
224	      forwarded.  The control plane is typically implemented as a
225	      "slower" software process running on a general purpose processor
226	      and is responsible for such functions as the routing protocols,
227	      performing management and resolving the correct link-layer address
228	      for adjacent neighbors.  The control plane "controls" the
229	      forwarding plane by programming it with the information needed for
230	      packet forwarding.

232	   Neighbor Cache  As described in [RFC4861], the data structure that
233	      holds the cache of (amongst other things) IP address to link-layer
234	      address mappings for connected nodes.  The forwarding plane
235	      accesses the Neighbor Cache on every forwarded packet.  Thus it is
236	      usually implemented in an ASIC.

238	   Neighbor Discovery Process  The Neighbor Discovery Process (NDP) is
239	      that part of the control plane that implements the Neighbor
240	      Discovery protocol.  NDP is responsible for performing address
241	      resolution and maintaining the Neighbor Cache.  When forwarding
242	      packets, the forwarding plane accesses entries within the Neighbor
243	      Cache.  Whenever the forwarding plane processes a packet for which
244	      the corresponding Neighbor Cache Entry is missing or incomplete,
245	      it notifies NDP to take appropriate action (typically via a shared
246	      queue).  NDP picks up requests from the shared queue and performs
247	      any necessary actions.  In many implementations it is also
248	      responsible for responding to router solicitation messages,
249	      Neighbor Unreachability Detection (NUD), etc.

251	4.  Background

253	   Modern router architectures separate the forwarding of packets
254	   (forwarding plane) from the decisions that determine where the
255	   packets should go (control plane).  In order to deal with the high
256	   number of packets per second the forwarding plane is generally
257	   implemented in hardware and is highly optimized for the task of
258	   forwarding packets.  In contrast, the NDP control plane is mostly
259	   implemented in software processes running on a general purpose
260	   processor.

262	   When a router needs to forward an IP packet, the forwarding plane
263	   logic performs the longest match lookup to determine where to send
264	   the packet and what outgoing interface to use.  To deliver the packet
265	   to an adjacent node, it encapsulates the packet in a link-layer frame
266	   (which contains a header with the link-layer destination address).
267	   The forwarding plane logic checks the Neighbor Cache to see if it
268	   already has a suitable link-layer destination, and if not, places the
269	   request for the required information into a queue, and signals the
270	   control plane (i.e., NDP) that it needs the link-layer address
271	   resolved.

273	   In order to protect NDP specifically and the control plane generally
274	   from being overwhelmed with these requests, appropriate steps must be
275	   taken.  For example, the size and rate of the queue might be limited.
276	   NDP running in the control plane of the router dequeues requests and
277	   performs the address resolution function (by performing a neighbor
278	   solicitation and listening for a neighbor advertisement).  This
279	   process is usually also responsible for other activities needed to
280	   maintain link-layer information, such as Neighbor Unreachability
281	   Detection (NUD).

283	   An attacker sending the appropriate packets to addresses on a given
284	   subnet can cause the router to queue attempts to resolve so many
285	   addresses that it crowds out attempts to resolve "legitimate"
286	   addresses (and in many cases becomes unable to perform maintenance of
287	   existing entries in the neighbor cache, and unable to answer Neighbor
288	   Solicitations).  This condition can result in the inability to
289	   resolve new neighbors and loss of reachability to neighbors with
290	   existing ND-Cache entries.  During testing it was concluded that 4
291	   simultaneous nmap sessions from a low-end computer were sufficient to
292	   overwhelm a router's neighbor discovery process and therefore make
293	   forwarding unusable.

295	   This behavior has been observed across multiple platforms and
296	   implementations.

298	5.  Neighbor Discovery Overview

300	   When a packet arrives at (or is generated by) a router for a
301	   destination on an attached link, the router needs to determine the
302	   correct link-layer address to send the packet to.  The router checks
303	   the Neighbor Cache for an existing Neighbor Cache Entry for the
304	   neighbor, and if none exists, invokes the address resolution portions
305	   of the IPv6 Neighbor Discovery [RFC4861] protocol to determine the
306	   link-layer address.

308	   RFC4861 Section 5.2 (Conceptual Sending Algorithm) outlines how this
309	   process works.  A very high level summary is that the device creates
310	   a new Neighbor Cache Entry for the neighbor, sets the state to
311	   INCOMPLETE, queues the packet and initiates the actual address
312	   resolution process.  The device then sends out one or more Neighbor
313	   Solicitations, and when it receives a corresponding Neighbor
314	   Advertisement, completes the Neighbor Cache Entry and sends the
315	   queued packet.
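
The high level summary above can be sketched as follows.  This is a
simplified illustration, not the RFC 4861 algorithm itself: the class
and attribute names are hypothetical, Neighbor Solicitations are only
recorded in a list rather than transmitted, and retransmission,
timeouts and queue limits are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CacheEntry:
    state: str = "INCOMPLETE"
    link_addr: Optional[str] = None
    queued: List[bytes] = field(default_factory=list)


class NeighborCache:
    """Hypothetical model of the router-side cache and resolution queue."""

    def __init__(self) -> None:
        self.entries: Dict[str, CacheEntry] = {}
        self.solicitations: List[str] = []             # NS targets "sent"
        self.transmitted: List[Tuple[str, bytes]] = []  # (link addr, packet)

    def send(self, dest: str, packet: bytes) -> None:
        entry = self.entries.get(dest)
        if entry is None:
            # No entry yet: create one in INCOMPLETE, queue the packet
            # and start address resolution.
            entry = CacheEntry()
            self.entries[dest] = entry
            entry.queued.append(packet)
            self.solicitations.append(dest)
        elif entry.state == "INCOMPLETE":
            entry.queued.append(packet)
        else:
            self.transmitted.append((entry.link_addr, packet))

    def on_neighbor_advertisement(self, target: str, link_addr: str) -> None:
        entry = self.entries.get(target)
        if entry is None or entry.state != "INCOMPLETE":
            return
        # Assumes a solicited NA, so the completed entry is REACHABLE.
        entry.link_addr = link_addr
        entry.state = "REACHABLE"
        for packet in entry.queued:
            self.transmitted.append((link_addr, packet))
        entry.queued.clear()
```

In this model every packet addressed to an unassigned address leaves
behind an INCOMPLETE entry that no Neighbor Advertisement will ever
complete, which is the resource a scan consumes.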

317	6.  Proposed Solutions

319	   Let us examine a few possible solutions that could alleviate the
320	   issues discussed in Section 2 ("The Problem").

322	6.1.  NDP Protocol Gratuitous NA

324	   Sections 7.2.5 and 7.2.6 of [RFC4861] require that unsolicited
325	   neighbor advertisements result in the receiver setting its neighbor
326	   cache entry to STALE, kicking off the resolution of the neighbor
327	   using neighbor solicitation.  If the link-layer address in an
328	   unsolicited neighbor advertisement matches that of the existing ND
329	   cache entry, routers SHOULD retain the existing entry, updating its
330	   status with regard to the LRU retention policy.

332	   Hosts MAY be configured to send unsolicited Neighbor Advertisements
333	   at a rate set at the discretion of the operators.  The rate SHOULD be
334	   appropriate to the sizing of ND cache parameters and the host count
335	   on the subnet.  An unsolicited NA rate parameter MUST NOT be enabled
336	   by default.  Hosts applying the unsolicited NA interval must jitter
337	   the interval between transmissions.  Hosts receiving a neighbor
338	   solicitation request from a router following each of three
339	   successive gratuitous NA intervals MUST revert to RFC 4861
340	   behavior.
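
One way a host might apply these rules is sketched below.  Several
details are assumptions, since this document does not specify them:
the jitter is drawn uniformly within plus or minus 25 percent of the
configured interval, "three subsequent intervals" is read as three
consecutive intervals, and all names are hypothetical.

```python
import random


def jittered_interval(base_seconds: float,
                      jitter_fraction: float = 0.25,
                      rng=random) -> float:
    """Pick the wait before the next gratuitous NA.

    The uniform distribution and the 25 percent default are assumptions.
    """
    low = base_seconds * (1 - jitter_fraction)
    high = base_seconds * (1 + jitter_fraction)
    return rng.uniform(low, high)


class GratuitousNaSender:
    """Hypothetical host-side bookkeeping for the fallback rule."""

    REVERT_AFTER = 3  # intervals in a row that were followed by an NS

    def __init__(self, base_interval: float) -> None:
        self.base_interval = base_interval
        self.enabled = True
        self.solicited_streak = 0

    def end_of_interval(self, ns_received_from_router: bool) -> bool:
        """Record one finished interval; return True while still enabled."""
        if ns_received_from_router:
            self.solicited_streak += 1
        else:
            self.solicited_streak = 0
        if self.solicited_streak >= self.REVERT_AFTER:
            # The router keeps soliciting anyway, so fall back to
            # plain RFC 4861 behavior.
            self.enabled = False
        return self.enabled
```

The jitter keeps hosts that were configured with the same interval from
sending their advertisements in lockstep.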

342	   Implementation of new behavior for unsolicited neighbor advertisement
343	   would make it possible under appropriate circumstances to greatly
344	   reduce the dependence on the neighbor solicitation process for
345	   retaining existing ND cache entries.

347	   This may impact the detection of one-way reachability.

349	6.2.  User Configurable DELAY_FIRST_PROBE_TIME

351	   A very simple solution for Scenario 1 could be to have a user
352	   configurable DELAY_FIRST_PROBE_TIME that could be set to a higher
353	   value than the current constant of 5 seconds.  This would allow
354	   clients to keep sending traffic in the DELAY state, while giving more
355	   time for R2 to stabilize before it has to process the barrage of ND
356	   messages.  It will be up to network administrators to determine what
357	   this value should be based upon unique characteristics of their
358	   setup.  Having a longer DELAY_FIRST_PROBE_TIME does run the risk of
359	   clients sending traffic without ever knowing that they have forward
360	   reachability.  However, in most cases, the router's forwarding plane
361	   remains unaffected during high CPU events and therefore the
362	   likelihood of the traffic making it to the destination is high.
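
The effect of a larger DELAY_FIRST_PROBE_TIME can be illustrated with
a simple count.  The sketch below assumes each client enters PROBE
only if its reachability confirmation arrives after the timer expires;
the function name and the list of confirmation delays are hypothetical
values chosen for illustration, not measurements.

```python
from typing import Iterable


def unicast_probes_triggered(confirmation_delays: Iterable[float],
                             delay_first_probe_time: float = 5.0) -> int:
    """Count clients whose DELAY timer fires before confirmation arrives."""
    return sum(1 for delay in confirmation_delays
               if delay > delay_first_probe_time)


# Hypothetical per-client delays (seconds) while R2 is busy reconverging.
delays = [2.0, 4.0, 6.0, 8.0, 12.0]

with_default = unicast_probes_triggered(delays)        # 5 second constant
with_raised = unicast_probes_triggered(delays, 10.0)   # operator-chosen value

print(with_default, with_raised)
```

With these assumed delays the default timer leads three of the five
clients to probe R2, while a 10 second setting leaves only one.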

364	7.  IANA Considerations

366	   No IANA resources or consideration are requested in this draft.

368	8.  Security Considerations

370	   This technique has potential impact on neighbor detection and in
371	   particular the discovery of unidirectional forwarding problems.

373	9.  Acknowledgements

375	   The authors would like to thank Ron Bonica, Troy Bonin, John Jason
376	   Brzozowski, Randy Bush, Vint Cerf, Jason Fesler, Erik Kline, Jared
377	   Mauch, Chris Morrow and Suran De Silva.  Special thanks to Thomas
378	   Narten for detailed review and (even more so) for providing text!

380	   Apologies to anyone we may have missed; it was not intentional.

382	10.  References

384	10.1.  Normative References

386	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
387	              Requirement Levels", BCP 14, RFC 2119, March 1997.

389	   [RFC4398]  Josefsson, S., "Storing Certificates in the Domain Name
390	              System (DNS)", RFC 4398, March 2006.

392	   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
393	              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
394	              September 2007.

396	   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
397	              Address Autoconfiguration", RFC 4862, September 2007.

399	   [RFC6164]  Kohno, M., Nitzan, B., Bush, R., Matsuzaki, Y., Colitti,
400	              L., and T. Narten, "Using 127-Bit IPv6 Prefixes on Inter-
401	              Router Links", RFC 6164, April 2011.

403	10.2.  Informative References

405	   [I-D.ietf-6man-impatient-nud]
406	              Nordmark, E. and I. Gashinsky, "Neighbor Unreachability
407	              Detection is too impatient",
408	              draft-ietf-6man-impatient-nud-02 (work in progress),
409	              July 2012.

411	   [RFC4255]  Schlyter, J. and W. Griffin, "Using DNS to Securely
412	              Publish Secure Shell (SSH) Key Fingerprints", RFC 4255,
413	              January 2006.

415	   [RFC6583]  Gashinsky, I., Jaeggli, J., and W. Kumari, "Operational
416	              Neighbor Discovery Problems", RFC 6583, March 2012.

418	Authors' Addresses

420	   Warren Kumari
421	   Google

423	   Email: warren@kumari.net

425	   Igor Gashinsky
426	   Yahoo!
427	   45 W 18th St
428	   New York, NY
429	   USA

431	   Email: igor@yahoo-inc.com

433	   Joel Jaeggli
434	   Zynga
435	   111 Evelyn
436	   Sunnyvale, CA
437	   USA

439	   Email: jjaeggli@zynga.com

441	   Kiran Chittimaneni
442	   Google
443	   1600 Amphitheater Pkwy
444	   Mountain View, CA
445	   USA

447	   Email: kk@google.com