2	Global Routing Operations                                      K. Sriram
3	Internet-Draft                                             D. Montgomery
4	Intended status: Informational                                   US NIST
5	Expires: November 6, 2016                                   D. McPherson
6	                                                            E. Osterweil
7	                                                          Verisign, Inc.
8	                                                              B. Dickson
9	                                                             May 5, 2016

11	        Problem Definition and Classification of BGP Route Leaks
12	            draft-ietf-grow-route-leak-problem-definition-06

14	Abstract

16	   A systemic vulnerability of the Border Gateway Protocol routing
17	   system, known as 'route leaks', has received significant attention in
18	   recent years.  Frequent incidents that result in significant
19	   disruptions to Internet routing are labeled "route leaks", but to
20	   date a common definition of the term has been lacking.  This document
21	   provides a working definition of route leaks, keeping in mind the
22	   real occurrences that have received significant attention.  Further,
23	   this document attempts to enumerate (though not exhaustively)
24	   different types of route leaks based on observed events on the
25	   Internet.  The aim is to provide a taxonomy that covers several forms
26	   of route leaks that have been observed and are of concern to the
27	   Internet user community as well as the network operator community.

29	Status of This Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at http://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on November 6, 2016.

46	Copyright Notice

48	   Copyright (c) 2016 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	   2.  Working Definition of Route Leaks
65	   3.  Classification of Route Leaks Based on Documented Events
66	     3.1.  Type 1: Hairpin Turn with Full Prefix
67	     3.2.  Type 2: Lateral ISP-ISP-ISP Leak
68	     3.3.  Type 3: Leak of Transit-Provider Prefixes to Peer
69	     3.4.  Type 4: Leak of Peer Prefixes to Transit Provider
70	     3.5.  Type 5: Prefix Re-Origination with Data Path to
71	           Legitimate Origin
72	     3.6.  Type 6: Accidental Leak of Internal Prefixes and More
73	           Specific Prefixes
74	   4.  Additional Comments about the Classification
75	   5.  Security Considerations
76	   6.  IANA Considerations
77	   7.  Acknowledgements
78	   8.  Informative References
79	   Authors' Addresses

81	1.  Introduction

83	   Frequent incidents [Huston2012][Cowie2013][Toonk2015-A][Toonk2015-B]
84	   [Cowie2010][Madory][Zmijewski][Paseka][LRL][Khare] that result in
85	   significant disruptions to Internet routing are commonly called
86	   "route leaks".  Examination of the details of some of these incidents
87	   reveals that they vary in their form and technical details.  In order
88	   to pursue solutions to "the route leak problem" it is important to
89	   first provide a clear, technical definition of the problem and
90	   enumerate its most common forms.  Section 2 provides a working
91	   definition of route leaks, keeping in view many recent incidents that
92	   have received significant attention.  Section 3 attempts to enumerate
93	   (though not exhaustively) different types of route leaks based on
94	   observed events on the Internet.  Further, Section 3 provides a
95	   taxonomy that covers several forms of route leaks that have been
96	   observed and are of concern to the Internet user community as well as
97	   the network operator community.  This document builds on and extends
98	   earlier work in the IETF [draft-dickson-sidr-route-leak-def]
99	   [draft-dickson-sidr-route-leak-reqts].

101	2.  Working Definition of Route Leaks

103	   A proposed working definition of route leak is as follows:

105	   A "route leak" is the propagation of routing announcement(s) beyond
106	   their intended scope.  That is, an AS's announcement of a learned BGP
107	   route to another AS is in violation of the intended policies of the
108	   receiver, the sender and/or one of the ASes along the preceding AS
109	   path.  The intended scope is usually defined by a set of local
110	   redistribution/filtering policies distributed among the ASes
111	   involved.  Often, these intended policies are defined in terms of the
112	   pair-wise peering business relationship between ASes (e.g., customer,
113	   transit provider, peer).  (For literature related to AS relationships
114	   and routing policies, see [Gao] [Luckie] [Gill].  For measurements of
115	   valley-free violations in Internet routing, see [Anwar] [Giotsas]
116	   [Wijchers].)
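
   As an illustration of the valley-free notion referred to above, the
   following Python sketch checks a propagation path under one common
   assumption from the AS-relationship literature [Gao]: a route may
   travel "up" (customer to provider) zero or more times, cross at most
   one lateral peer link, and then travel only "down" (provider to
   customer).  The function name is_valley_free and the hop labels are
   hypothetical and are not defined by this document; real policies are
   more varied than this rule.

```python
def is_valley_free(hops):
    """Check a propagation path given as hop labels in propagation order.

    Each label describes one announcement from an AS to its neighbor:
      "up"   - sent to a transit provider
      "peer" - sent to a lateral (non-transit) peer
      "down" - sent to a customer
    Assumed rule (not from this document): up* peer? down*
    """
    descending = False  # True once a peer or down hop has been seen
    for hop in hops:
        if hop == "up":
            if descending:
                return False  # climbing again after the peak: a valley
        elif hop == "peer":
            if descending:
                return False  # second peer hop, or peer after down
            descending = True
        elif hop == "down":
            descending = True
        else:
            raise ValueError(f"unknown hop label: {hop!r}")
    return True


# Figure 1, leaked path: ISP1 -> customer AS3 ("down"), AS3 -> ISP2 ("up").
leaked = ["down", "up"]
# Figure 1, intended path: ISP1 -> ISP2 over the peer link, then to customers.
intended = ["peer", "down"]
```

   Under these assumptions, is_valley_free(leaked) is False and
   is_valley_free(intended) is True.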

118	   The result of a route leak can be redirection of traffic through an
119	   unintended path which may enable eavesdropping or traffic analysis,
120	   and may or may not result in an overload or black-hole.  Route leaks
121	   can be accidental or malicious, but most often arise from accidental
122	   misconfigurations.

124	   The above definition is not intended to be all-encompassing.  Our aim
125	   here is to have a working definition that fits enough observed
126	   incidents so that the IETF community has a basis for developing
127	   solutions for route leak detection and mitigation.

129	3.  Classification of Route Leaks Based on Documented Events

131	   As illustrated in Figure 1, a common form of route leak occurs when a
132	   multi-homed customer AS (such as AS3 in Figure 1) learns a prefix
133	   update from one transit provider (ISP1) and leaks the update to
134	   another transit provider (ISP2) in violation of intended routing
135	   policies, and further the second transit provider does not detect the
136	   leak and propagates the leaked update to its customers, peers, and
137	   transit ISPs.

139	                                      /\              /\
140	                                       \ route-leak(P)/
141	                                        \ propagated /
142	                                         \          /
143	              +------------+    peer    +------------+
144	        ______| ISP1 (AS1) |----------->|  ISP2 (AS2)|---------->
145	       /       ------------+  prefix(P) +------------+ route-leak(P)
146	      | prefix |          \   update      /\        \  propagated
147	       \  (P)  /           \              /          \
148	        -------   prefix(P) \            /            \
149	                     update  \          /              \
150	                              \        /route-leak(P)  \/
151	                              \/      /
152	                           +---------------+
153	                           | customer(AS3) |
154	                           +---------------+

156	        Figure 1: Illustration of the basic notion of a route leak.

158	   This document proposes the following taxonomy to cover several types
159	   of observed route leaks, while acknowledging that the list is not
160	   meant to be exhaustive.  In what follows, the AS that announces a
161	   route that is in violation of the intended policies is referred to as
162	   the "offending AS".

164	3.1.  Type 1: Hairpin Turn with Full Prefix

166	   Description: A multi-homed AS learns a route from one upstream ISP
167	   and simply propagates it to another upstream ISP (the turn
168	   essentially resembling a hairpin).  Neither the prefix nor the AS
169	   path in the update is altered.  This is similar to a straightforward
170	   path-poisoning attack [Kapela-Pilosov], but with full prefix.  It
171	   should be noted that leaks of this type are often accidental (i.e.
172	   not malicious).  The update basically makes a hairpin turn at the
173	   offending multi-homed AS.  The leak often succeeds (i.e. the leaked
174	   update is accepted and propagated) because the second ISP prefers a
175	   customer announcement over a peer announcement of the same prefix.
176	   Data packets would reach the legitimate destination, albeit via the
177	   offending AS, unless they are dropped at the offending AS due to its
178	   inability to handle the resulting large volumes of traffic.

180	   o  Example incidents: Examples of Type 1 route-leak incidents are (1)
181	      the Dodo-Telstra incident in March 2012 [Huston2012], (2) the
182	      VolumeDrive-Atrato incident in September 2014 [Madory], and (3)
183	      the massive Telekom Malaysia route leak of about 179,000 prefixes,
184	      which in turn Level3 accepted and propagated [Toonk2015-B].
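
   The reason given above for why a Type 1 leak often succeeds (the
   second ISP prefers a customer announcement over a peer announcement)
   can be sketched as follows.  The local-preference values, the Route
   type, and the origin AS number 64500 are assumptions chosen for
   illustration; operators choose their own values, and the real BGP
   decision process has more steps than the two shown here.

```python
from dataclasses import dataclass

# Assumed local-preference values; not taken from this document.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str  # "customer", "peer", or "provider"
    as_path: tuple     # ASNs, nearest neighbor first, origin last


def best_route(candidates):
    """Prefer the highest local preference, then the shortest AS path."""
    return max(
        candidates,
        key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)),
    )


# ISP2 (AS2) in Figure 1 hears prefix P twice.  The prefix and the origin
# AS 64500 are documentation values used as placeholders.
via_peer = Route("192.0.2.0/24", "peer", (1, 64500))
via_leak = Route("192.0.2.0/24", "customer", (3, 1, 64500))
chosen = best_route([via_peer, via_leak])
```

   With these assumed values, chosen is the leaked route via AS3 even
   though its AS path is longer, because local preference is compared
   before AS path length.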

186	3.2.  Type 2: Lateral ISP-ISP-ISP Leak

188	   Description: The term "lateral" here is synonymous with "non-transit"
189	   or "peer-to-peer".  This type of route leak typically occurs when
190	   three sequential ISP peers (e.g., ISP-A, ISP-B, and ISP-C) are
191	   involved, and ISP-B receives a route from ISP-A and in turn leaks it
192	   to ISP-C.  The typical routing policy between laterally (i.e.
193	   non-transit) peering ISPs is that they should propagate only their
194	   respective customer prefixes to each other.

196	   o  Example incidents: In [Mauch-nanog][Mauch], route leaks of this
197	      type are reported by monitoring updates in the global BGP system
198	      and finding three or more very large ISP ASNs in a sequence in a
199	      BGP update's AS path.  [Mauch] observes that its detection
200	      algorithm detects these anomalies and potential route leaks
201	      because very large ISPs do not in general buy transit services
202	      from each other.  However, it also notes that there are exceptions
203	      when one very large ISP does indeed buy transit from another very
204	      large ISP, and accordingly exceptions are made in its detection
205	      algorithm for known cases.
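
   The counting idea described above can be sketched as follows.  This
   is not the algorithm of [Mauch]; it is a minimal illustration under
   stated assumptions.  The set LARGE_ISPS, the exception table
   KNOWN_TRANSIT, and the function names are hypothetical, and the AS
   numbers are documentation values.

```python
# Hypothetical set of "very large ISP" ASNs (documentation ASNs).
LARGE_ISPS = {64496, 64497, 64498, 64499}

# Hypothetical known exceptions: (customer, provider) pairs in which one
# very large ISP does buy transit from another.
KNOWN_TRANSIT = {(64499, 64496)}


def dedupe(as_path):
    """Collapse consecutive repeats caused by AS-path prepending."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def suspect_triples(as_path, large=LARGE_ISPS, known_transit=KNOWN_TRANSIT):
    """Return windows (a, b, c) of three consecutive large-ISP ASNs.

    A window is skipped when the middle AS b is a known transit provider
    of a or of c, since b may then legitimately carry the route.
    """
    path = dedupe(as_path)
    hits = []
    for a, b, c in zip(path, path[1:], path[2:]):
        if not {a, b, c} <= large:
            continue
        if (a, b) in known_transit or (c, b) in known_transit:
            continue
        hits.append((a, b, c))
    return hits
```

   For example, suspect_triples([64500, 64496, 64497, 64498, 64510])
   returns [(64496, 64497, 64498)], while
   suspect_triples([64499, 64496, 64497]) returns an empty list because
   of the assumed exception.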

207	3.3.  Type 3: Leak of Transit-Provider Prefixes to Peer

209	   Description: This type of route leak occurs when an offending AS
210	   leaks routes learned from its transit provider to a lateral (i.e.
211	   non-transit) peer.

213	   o  Example incidents: The incidents reported in [Mauch] include
214	      Type 3 leaks.

216	3.4.  Type 4: Leak of Peer Prefixes to Transit Provider

218	   Description: This type of route leak occurs when an offending AS
219	   leaks routes learned from a lateral (i.e. non-transit) peer to its
220	   (the AS's) own transit provider.  These leaked routes typically
221	   originate from the customer cone of the lateral peer.

223	   o  Example incidents: Examples of Type 4 route-leak incidents are (1)
224	      the Axcelx-Hibernia route leak of Amazon Web Services (AWS)
225	      prefixes causing disruption of AWS and a variety of services that
226	      run on AWS [Kephart], (2) the Hathway-Airtel route leak of 336
227	      Google prefixes causing widespread interruption of Google services
228	      in Europe and Asia [Toonk2015-A], and (3) the Moratel-PCCW route
229	      leak of Google prefixes causing Google's services to go offline
230	      [Paseka].  In addition, some of the example incidents cited for
231	      Type 1 route leaks above also include Type 4 route leaks.  For
232	      instance, in the Dodo-Telstra incident [Huston2012], the leaked
233	      routes from Dodo to Telstra included routes that Dodo learned from
234	      its transit providers as well as lateral peers.

236	3.5.  Type 5: Prefix Re-Origination with Data Path to Legitimate Origin

238	   Description: A multi-homed AS learns a route from one upstream ISP
239	   and announces the prefix to another upstream ISP as if the prefix
240	   were originated by it (i.e. strips the received AS path and
241	   re-originates the prefix).  This can be called re-origination or
242	   mis-origination.  Even so, a reverse path to the legitimate origin AS
243	   may be present, in which case data packets reach the legitimate
244	   destination, albeit via the offending AS.  (Note: The presence of a
245	   reverse path here is not attributable to the use of a path-poisoning
246	   trick by the offending AS.)  In other cases the reverse path is not
247	   present, and data packets destined for the leaked prefix may simply
248	   be discarded at the offending AS.

250	   o  Example incidents: Examples of Type 5 route leaks include (1) the
251	      China Telecom incident in April 2010 [Hiran][Cowie2010][Labovitz],
252	      (2) the Belarusian GlobalOneBel route leak incidents in February-
253	      March 2013 and May 2013 [Cowie2013], (3) the Icelandic Opin Kerfi-
254	      Simmin route leak incidents in July-August 2013 [Cowie2013], and
255	      (4) the Indosat route leak incident in April 2014 [Zmijewski].
256	      The reverse paths (i.e. data paths from the offending AS to the
257	      legitimate destinations) were present in incidents #1, #2 and #3
258	      cited above, but not in incident #4.  In incident #4, the
259	      misrouted data packets were dropped at Indosat's AS.
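
   Because a Type 5 leak replaces the received AS path, the announced
   origin AS differs from the legitimate one.  The following sketch
   compares the rightmost AS in a path against a table of expected
   origins.  The table EXPECTED_ORIGIN, the function names, and all
   prefixes and AS numbers are hypothetical documentation values; the
   check only reports an origin mismatch and cannot by itself tell an
   accidental leak from a deliberate hijack.

```python
import ipaddress

# Hypothetical table of legitimate origin ASNs per prefix.
EXPECTED_ORIGIN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}


def origin_as(as_path):
    """The origin is the last (rightmost) ASN in the AS path."""
    return as_path[-1]


def is_mis_originated(prefix, as_path, expected=EXPECTED_ORIGIN):
    """True if the announced origin differs from the expected origin.

    Prefixes absent from the table yield False (nothing to compare).
    """
    net = ipaddress.ip_network(prefix)
    want = expected.get(net)
    return want is not None and origin_as(as_path) != want
```

   Here is_mis_originated("192.0.2.0/24", [64496, 64510]) is True,
   since AS 64510 appears as the origin in place of AS 64500.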

261	3.6.  Type 6: Accidental Leak of Internal Prefixes and More Specific
262	      Prefixes

264	   Description: An offending AS simply leaks its internal prefixes to
265	   one or more of its transit-provider ASes and/or ISP peers.  The
266	   leaked internal prefixes are often more specific prefixes subsumed by
267	   an already announced less specific prefix.  The more specific
268	   prefixes were not intended to be routed in eBGP.  Further, the AS
269	   receiving those leaks fails to filter them.  Typically, these leaked
270	   announcements are due to some transient failures within the AS; they
271	   are short-lived and typically withdrawn quickly following the
272	   announcements.  However, these more specific prefixes may momentarily
273	   cause the routes to be preferred over other aggregate (i.e. less
274	   specific) route announcements, thus redirecting traffic from its
275	   normal best path.

277	   o  Example incidents: Leaks of internal routes occur frequently (e.g.
278	      multiple times in a week), and the number of prefixes leaked ranges
279	      from hundreds to thousands per incident.  One highly conspicuous
280	      and widely disruptive leak of internal routes happened in August
281	      2014 when AS701 and AS705 leaked about 22,000 more specifics of
282	      already announced aggregates [Huston2014][Toonk2014].
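
   The effect described above, in which a leaked more specific prefix
   is preferred over the covering aggregate, follows from longest-prefix
   matching.  A minimal sketch using Python's standard ipaddress module
   is shown below; the prefixes are documentation values, and the
   next-hop labels and the function name longest_match are illustrative
   assumptions.

```python
import ipaddress


def longest_match(dest, routes):
    """Return the (network, next_hop) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# The /24 stands in for an already announced aggregate; the /25 is a
# hypothetical leaked more specific prefix.
routes = [
    (ipaddress.ip_network("198.51.100.0/24"), "normal-path"),
    (ipaddress.ip_network("198.51.100.0/25"), "leaked-path"),
]
```

   Traffic to 198.51.100.10 matches both entries and follows
   "leaked-path", while traffic to 198.51.100.200 is covered only by
   the aggregate and stays on "normal-path".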

284	4.  Additional Comments about the Classification

286	   It is worth noting that Types 1 through 4 are similar in that a route
287	   is leaked in violation of policy in each case, but what varies is the
288	   context of the leaked-route source AS and destination AS roles.
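
   The observation above can be summarized as a lookup keyed on the
   two roles involved, both as seen from the offending AS.  This is a
   simplified sketch: the names are hypothetical, Type 1 additionally
   assumes that the prefix and AS path are unaltered, and Type 2 is
   described in Section 3.2 in terms of ISPs.

```python
# Keys: (relationship of the AS the route was learned from,
#        relationship of the AS the route was leaked to).
LEAK_TYPE_BY_ROLES = {
    ("provider", "provider"): 1,  # hairpin turn (Section 3.1)
    ("peer", "peer"): 2,          # lateral ISP-ISP-ISP (Section 3.2)
    ("provider", "peer"): 3,      # provider routes to peer (Section 3.3)
    ("peer", "provider"): 4,      # peer routes to provider (Section 3.4)
}


def leak_type(learned_from, leaked_to):
    """Return 1-4 for the role combinations above, else None."""
    return LEAK_TYPE_BY_ROLES.get((learned_from, leaked_to))
```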

290	   A Type 5 route leak (i.e. prefix mis-origination with data path to
291	   legitimate origin) can also happen in conjunction with the AS
292	   relationship contexts in Types 2, 3, and 4.  While these
293	   possibilities are acknowledged, simply enumerating more types to
294	   consider all such special cases does not add value as far as solution
295	   development for route leaks is concerned.  Hence, the special cases
296	   mentioned here are not included in enumerating route leak types.

298	5.  Security Considerations

300	   No security considerations apply since this is a problem definition
301	   document.

303	6.  IANA Considerations

305	   This document does not require an action from IANA.

307	7.  Acknowledgements

309	   The authors wish to thank Jared Mauch, Jeff Haas, Warren Kumari,
310	   Amogh Dhamdhere, Jakob Heitz, Geoff Huston, Randy Bush, Job Snijders,
311	   Ruediger Volk, Andrei Robachevsky, Charles van Niman, Chris Morrow,
312	   and Sandy Murphy for comments, suggestions, and critique.  The
313	   authors are also thankful to Padma Krishnaswamy, Oliver Borchert, and
314	   Okhee Kim for their comments and review.

316	8.  Informative References

318	   [Anwar]    Anwar, R., Niaz, H., Choffnes, D., Cunha, I., Gill, P.,
319	              and N. Katz-Bassett, "Investigating Interdomain Routing
320	              Policies in the Wild",  ACM Internet Measurement
321	              Conference (IMC), October 2015,
322	              <http://www.cs.usc.edu/assets/007/94928.pdf>.

324	   [Cowie2010]
325	              Cowie, J., "China's 18 Minute Mystery",  Dyn
326	              Research/Renesys Blog, November 2010,
327	              <http://research.dyn.com/2010/11/
328	              chinas-18-minute-mystery/>.

330	   [Cowie2013]
331	              Cowie, J., "The New Threat: Targeted Internet Traffic
332	              Misdirection",  Dyn Research/Renesys Blog, November 2013,
333	              <http://research.dyn.com/2013/11/
334	              mitm-internet-hijacking/>.

336	   [draft-dickson-sidr-route-leak-def]
337	              Dickson, B., "Route Leaks -- Definitions",  IETF Internet
338	              Draft (expired), October 2012,
339	              <https://tools.ietf.org/html/draft-dickson-sidr-route-
340	              leak-def-03>.

342	   [draft-dickson-sidr-route-leak-reqts]
343	              Dickson, B., "Route Leaks -- Requirements for Detection
344	              and Prevention thereof",  IETF Internet Draft (expired),
345	              March 2012, <http://tools.ietf.org/html/
346	              draft-dickson-sidr-route-leak-reqts-02>.

348	   [Gao]      Gao, L. and J. Rexford, "Stable Internet routing without
349	              global coordination",  IEEE/ACM Transactions on
350	              Networking, December 2001,
351	              <http://www.cs.princeton.edu/~jrex/papers/
352	              sigmetrics00.long.pdf>.

354	   [Gill]     Gill, P., Schapira, M., and S. Goldberg, "A Survey of
355	              Interdomain Routing Policies",  ACM SIGCOMM Computer
356	              Communication Review, January 2014,
357	              <http://www.cs.bu.edu/~goldbe/papers/survey.pdf>.

359	   [Giotsas]  Giotsas, V. and S. Zhou, "Valley-free violation in
360	              Internet routing - Analysis based on BGP Community data",
361	               IEEE ICC 2012, June 2012.

363	   [Hiran]    Hiran, R., Carlsson, N., and P. Gill, "Characterizing
364	              Large-scale Routing Anomalies: A Case Study of the China
365	              Telecom Incident",  PAM 2013, March 2013,
366	              <http://www3.cs.stonybrook.edu/~phillipa/papers/
367	              CTelecom.html>.

369	   [Huston2012]
370	              Huston, G., "Leaking Routes", March 2012,
371	              <http://labs.apnic.net/blabs/?p=139/>.

373	   [Huston2014]
374	              Huston, G., "What's so special about 512?", September
375	              2014, <http://labs.apnic.net/blabs/?p=520/>.

377	   [Kapela-Pilosov]
378	              Pilosov, A. and T. Kapela, "Stealing the Internet: An
379	              Internet-Scale Man in the Middle Attack", DEFCON-16 Las
380	              Vegas, NV, USA, August 2008,
381	              <https://www.defcon.org/images/defcon-16/dc16-
382	              presentations/defcon-16-pilosov-kapela.pdf>.

384	   [Kephart]  Kephart, N., "Route Leak Causes Amazon and AWS Outage",
385	               ThousandEyes Blog, June 2015,
386	              <https://blog.thousandeyes.com/route-leak-causes-amazon-
387	              and-aws-outage>.

389	   [Khare]    Khare, V., Ju, Q., and B. Zhang, "Concurrent Prefix
390	              Hijacks: Occurrence and Impacts",  IMC 2012, Boston, MA,
391	              November 2012, <http://www.cs.arizona.edu/~bzhang/
392	              paper/12-imc-hijack.pdf>.

394	   [Labovitz]
395	              Labovitz, C., "Additional Discussion of the April China
396	              BGP Hijack Incident",  Arbor Networks IT Security Blog,
397	              November 2010,
398	              <http://www.arbornetworks.com/asert/2010/11/additional-
399	              discussion-of-the-april-china-bgp-hijack-incident/>.

401	   [LRL]      Khare, V., Ju, Q., and B. Zhang, "Large Route Leaks",
402	               Project web page, 2012,
403	              <http://nrl.cs.arizona.edu/projects/
404	              lsrl-events-from-2003-to-2009/>.

406	   [Luckie]   Luckie, M., Huffaker, B., Dhamdhere, A., Giotsas, V., and
407	              kc. claffy, "AS Relationships, Customer Cones, and
408	              Validation",  IMC 2013, October 2013,
409	              <http://www.caida.org/~amogh/papers/asrank-IMC13.pdf>.

411	   [Madory]   Madory, D., "Why Far-Flung Parts of the Internet Broke
412	              Today",  Dyn Research/Renesys Blog, September 2014,
413	              <http://research.dyn.com/2014/09/
414	              why-the-internet-broke-today/>.

416	   [Mauch]    Mauch, J., "BGP Routing Leak Detection System",  Project
417	              web page, 2014,
418	              <http://puck.nether.net/bgp/leakinfo.cgi/>.

420	   [Mauch-nanog]
421	              Mauch, J., "Detecting Routing Leaks by Counting",
422	              NANOG-41 Albuquerque, NM, USA, October 2007,
423	              <https://www.nanog.org/meetings/nanog41/presentations/
424	              mauch-lightning.pdf>.

426	   [Paseka]   Paseka, T., "Why Google Went Offline Today and a Bit about
427	              How the Internet Works",  CloudFlare Blog, November 2012,
428	              <http://blog.cloudflare.com/
429	              why-google-went-offline-today-and-a-bit-about/>.

431	   [Toonk2014]
432	              Toonk, A., "What caused today's Internet hiccup", August
433	              2014, <http://www.bgpmon.net/
434	              what-caused-todays-internet-hiccup/>.

436	   [Toonk2015-A]
437	              Toonk, A., "What caused the Google service interruption",
438	              March 2015, <http://www.bgpmon.net/
439	              what-caused-the-google-service-interruption/>.

441	   [Toonk2015-B]
442	              Toonk, A., "Massive route leak causes Internet slowdown",
443	              June 2015, <http://www.bgpmon.net/
444	              massive-route-leak-cause-internet-slowdown/>.

446	   [Wijchers]
447	              Wijchers, B. and B. Overeinder, "Quantitative Analysis of
448	              BGP Route Leaks",  RIPE-69, November 2014,
449	              <http://ripe69.ripe.net/
450	              presentations/157-RIPE-69-Routing-WG.pdf>.

452	   [Zmijewski]
453	              Zmijewski, E., "Indonesia Hijacks the World",  Dyn
454	              Research/Renesys Blog, April 2014,
455	              <http://research.dyn.com/2014/04/
456	              indonesia-hijacks-world/>.

458	Authors' Addresses

460	   Kotikalapudi Sriram
461	   US NIST

463	   Email: ksriram@nist.gov

465	   Doug Montgomery
466	   US NIST

468	   Email: dougm@nist.gov

469	   Danny McPherson
470	   Verisign, Inc.

472	   Email: dmcpherson@verisign.com

474	   Eric Osterweil
475	   Verisign, Inc.

477	   Email: eosterweil@verisign.com

479	   Brian Dickson

481	   Email: brian.peter.dickson@gmail.com