2	Internet Engineering Task Force                                R. Shakir
3	Internet-Draft                                                        BT
4	Intended status: Informational                              June 6, 2012
5	Expires: December 8, 2012

7	Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
8	           draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

10	Abstract

12	   BGP-4 is utilised as a key intra- and inter-Autonomous System routing
13	   protocol in modern IP networks.  The failure modes as defined by the
14	   original protocol standards are based on a number of assumptions
15	   around the impact of session failure.  Numerous incidents both in the
16	   global Internet routing table and within Service Provider networks
17	   have been caused by strict handling of a single invalid UPDATE
18	   message causing large-scale failures in one or more Autonomous
19	   Systems.

21	   This memo describes the current use of BGP-4 within Service Provider
22	   networks, and outlines a set of requirements for further work to
23	   enhance the mechanisms available to a BGP-4 implementation when
24	   erroneous data is detected.  Whilst this document does not provide
25	   specification of any standard, it is intended as an overview of a set
26	   of enhancements to BGP-4 to improve the protocol's robustness to suit
27	   its current deployment.

29	Status of this Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at http://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on December 8, 2012.

46	Copyright Notice

48	   Copyright (c) 2012 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	     1.1.  Role of BGP-4 in Service Provider Networks
65	     1.2.  Overview of Operator Requirements for BGP-4 Error
66	           Handling
67	   2.  Errors within BGP-4 UPDATE Messages
68	     2.1.  Classifying BGP Errors and Expected Error Handling
69	       2.1.1.  Critical BGP Errors
70	       2.1.2.  Semantic BGP Errors
71	   3.  Avoiding use of NOTIFICATION
72	   4.  Recovering RIB Consistency
73	   5.  Reducing the Impact of Session Reset
74	   6.  Operational Toolset for Monitoring BGP
75	   7.  Operational Complexities Introduced by Altering RFC4271
76	     7.1.  Reducing the Network Impact of Session Teardown
77	   8.  IANA Considerations
78	   9.  Security Considerations
79	   10. Acknowledgements
80	   11. References
81	     11.1. Normative References
82	     11.2. Informational References
83	   Author's Address

85	1.  Introduction

87	   Where BGP-4 [RFC4271] is deployed in the Internet and Service
88	   Provider networks, numerous incidents have been recorded due to the
89	   manner in which [RFC4271] requires errors in routing information
90	   to be handled.  Whilst the behaviour defined in the existing
91	   standards retains utility, the deployments of the protocol have
92	   changed within modern networks, resulting in significantly different
93	   demands for protocol robustness.  Whilst a number of Internet Drafts
94	   have been written to begin to enhance the behaviour of BGP-4 in terms
95	   of the handling of erroneous messages, this memo intends to define a
96	   set of requirements for ongoing work.  These requirements are
97	   considered from the perspective of a Network Operator, and hence this
98	   draft does not intend to define the protocol mechanisms by which such
99	   error handling behaviour is to be implemented.

101	1.1.  Role of BGP-4 in Service Provider Networks

103	   BGP was designed as an inter-Autonomous System (AS) routing protocol
104	   and hence many of the error handling mechanisms within the protocol
105	   specification are designed to be conducive to this role.  In general,
106	   this consideration as an inter-AS routing propagation mechanism
107	   results in the view that a BGP session propagates a relatively small
108	   amount of network-layer reachability information (NLRI) between two
109	   ASes.  In this case, session resilience is expected for those
110	   adjacencies that are key to routing continuity (for example, it is
111	   expected that two networks peering via BGP would connect multiple
112	   times to safeguard against equipment or protocol failure).  In
113	   addition, there is some expectation of multiple paths to a particular
114	   NLRI being available - it would be expected that a network can fall
115	   back to utilising alternate, less direct, paths where a failure of a
116	   more direct path occurs.

118	   Traditional network architectures would deploy an Interior Gateway
119	   Protocol (IGP) to carry infrastructure and customer prefixes, with an
120	   Exterior Gateway Protocol (EGP) such as BGP being utilised to
121	   propagate these prefixes to other Autonomous Systems.  However, with
122	   the growth of IP-based services, this is no longer considered best
123	   practice.  In order to ensure that convergence is within acceptable
124	   time bounds, the amount of routing information carried within the IGP
125	   is significantly reduced - and tends to be only infrastructure
126	   prefixes. iBGP is then utilised to propagate both customer, and
127	   external prefixes within an AS.  As such, BGP has become an IGP, with
128	   traditional IGPs acting as a means by which to propagate the routing
129	   information which is required to establish a BGP session, and reach
130	   the egress node within the local routing domain.  This change in role
131	   presents different requirements for the robustness of BGP as a
132	   routing protocol - with the expectation of a similar level of
133	   robustness to that of an IGP being set.

135	   Along with this change in role, the nature of the IP routing
136	   information that is carried has changed.  BGP has become a ubiquitous
137	   means by which service information can be propagated between devices.
138	   For instance, BGP is utilised to carry routing information for IP/
139	   MPLS VPN services as described in [RFC4364].  Since there is an
140	   existing deployment of the protocol between PE devices in numerous
141	   networks, it has been adapted to propagate this routing information,
142	   as its use limits the number of routing protocols required on each
143	   device.  This additional information being propagated represents a
144	   large change in requirement for the error handling of the protocol -
145	   where session failure occurs, it is likely that a complete service
146	   outage is experienced by at least a subset of a network's customers,
147	   even if the erroneous packet occurred in a different sub-topology
148	   or even service (a different address family for example).  For this
149	   reason, there is a significant demand to avoid service affecting
150	   failures that may be triggered by routing information within a single
151	   sub-topology or service.

153	   Both within Internet and multi-service routing architectures, a
154	   number of BGP sessions propagate a large proportion of the required
155	   routing information for network operation.  For Internet routing,
156	   these are typically BGP sessions which propagate the global routing
157	   table to an AS - failure of these sessions may have a large impact on
158	   network service, based on a single erroneous update.  In a multi-
159	   service environment, typical deployments utilise a small number of
160	   core-facing BGP sessions, typically towards route reflector devices.
161	   Failure of these sessions may also result in a large impact to
162	   network operation.  Clearly, the avoidance of conditions requiring
163	   these sessions to fail is of great utility to any network operator,
164	   and provides further motivation for the revision of the existing
165	   behaviour.

167	   Whilst the behaviour in [RFC4271] is suited to ensuring that BGP
168	   messages containing erroneous routing information are limited in
169	   scope (by means of session reset), with the above considerations, it
170	   is clear that this mechanism is not suited to all deployments.  It
171	   should, however, be noted that the change in scope affects the
172	   handling only of errors occurring after BGP session establishment.
173	   There is no current operational requirement to amend the means by
174	   which error handling in session establishment, or liveness
175	   detection, are performed.

177	1.2.  Overview of Operator Requirements for BGP-4 Error Handling

179	   It is the intention of this document to define a set of criteria to
180	   which a revised error handling mechanism in BGP-4 is required to
181	   conform.  The motivation for the definition of these
182	   requirements can be summarised based on certain behaviour currently
183	   present in the protocol that is not deemed acceptable within current
184	   operational deployments, or where there is a short-fall in the tool
185	   set available to an operator.  These key requirements can be
186	   summarised as follows:

188	   o  It is unacceptable within modern deployments of the BGP-4 protocol
189	      that a single erroneous UPDATE packet affects prefixes that it
190	      does not carry.  This requirement therefore requires some
191	      modification to the means by which erroneous UPDATE packets are
192	      handled, and reacted to - with a particular focus on avoiding the
193	      use of the NOTIFICATION message.

195	   o  It is recognised that some error conditions occurring within the
196	      BGP-4 protocol may not always be handled gracefully, and may
197	      result in conditions whereby an implementation cannot recover.  In
198	      these (and similar) cases, it is undesirable for an operator that
199	      a reset of the BGP-4 session results in interruption to
200	      forwarding packets (by means of withdrawing prefixes installed by
201	      BGP-4 into a device's RIB, and subsequently FIB).  To this end,
202	      there is a requirement to define a session reset mechanism which
203	      provides session re-initialisation in a non-destructive manner.

205	   o  Further to the requirements to provide a more robust protocol, the
206	      current visibility into error conditions within the BGP-4 protocol
207	      is extremely limited - where further modifications to this
208	      behaviour are to be made, complexity is likely to be added.  Thus,
209	      to ensure that BGP-4 is manageable, there are requirements for
210	      mechanisms by which the protocol can be examined and monitored.

212	   This document describes each of these requirements in further depth,
213	   along with an overview of means by which they are expected to be
214	   achieved.  In addition, the mechanism by which the enhancements
215	   meeting these requirements are to interact is discussed.

217	2.  Errors within BGP-4 UPDATE Messages

219	   Both through analysis of incidents occurring within the Internet DFZ,
220	   and multi-service environments utilising BGP-4 to signal service or
221	   routing information, a number of different classes of errors within
222	   BGP-4 UPDATE messages have been observed.  In order to consider the
223	   applicability of enhanced error handling mechanisms, it is possible
224	   to divide these errors into a number of sub-classes, particularly
225	   focusing around the location of the error within the UPDATE message.

227	   Where an UPDATE message is considered invalid by a BGP speaker due to
228	   an error within a path attribute that is not the NLRI (where the
229	   definition of NLRI includes reachability information encoded in the
230	   MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in
231	   [RFC4760]) it is a requirement of any enhanced error handling
232	   mechanism to handle the error in a manner focused on the NLRI
233	   contained within the message.  Since in this case, the message
234	   received from the remote peer is syntactically valid, it is
235	   considered that such an UPDATE is indicative of erroneous data within
236	   a path attribute.  The current behaviour defined within the
237	   protocol implies that the BGP speaker from whom the
238	   message is received is now an invalid path for all NLRI announced via
239	   the session - which results in a disproportionate impact to overall
240	   network operation.  In particular scenarios (such as networks with
241	   centralised BGP route reflection) such action can result in a loss of
242	   all reachability to a network.  In other contexts (such as the
243	   Internet DFZ), it cannot be assumed that the BGP speaker from whom
244	   the UPDATE message is received is directly responsible for the
245	   erroneous information contained within the message.

247	   Two further error cases exist within UPDATE messages, both of which
248	   are related to the mechanisms that are applicable to messages
249	   received where some difficulty exists in parsing the entire BGP
250	   message.  The two cases are those where a valid NLRI attribute can
251	   be extracted, and those where such an attribute cannot be parsed.
252	   In these cases, errors in the packing of
253	   attributes within a BGP message may have occurred.  Such errors are
254	   likely indicative of an error specifically caused by the remote BGP
255	   speaker.  It is, however, desirable to an operator that such errors
256	   are handled without affecting all NLRI across a BGP session.  As
257	   such, there is a key requirement to maximise the number of cases in
258	   which it is possible to extract NLRI from a BGP UPDATE message.  To
259	   this end, it is required that where possible the MP_REACH_NLRI and
260	   MP_UNREACH_NLRI attributes are utilised for encoding all NLRI
261	   (including IPv4 Unicast), and that this attribute is included as the
262	   first attribute of a BGP UPDATE message (as originally recommended in
263	   [I-D.chen-ebgp-error-handling]).  Such a change to the order of
264	   inclusion of this attribute maximises the number of cases in which
265	   NLRI can be extracted from an UPDATE.  Where this is possible, it is
266	   again required that the error handling mechanisms utilised should be
267	   directly applied to the NLRI included in the UPDATE.

269	   For all cases whereby NLRI can be obtained from an UPDATE message, it
270	   is expected that the requirements outlined in Section 3 should be
271	   considered by any enhancement to the BGP-4 protocol.

273	   In the case that it is not possible to completely parse the NLRI
274	   attribute from the UPDATE message received from a peer, it is
275	   extremely likely that this is indicative of a serious error with
276	   either the process of attribute packing, or buffer usage on the
277	   remote BGP speaker.  In this case, clearly, it is not possible to
278	   apply any error handling mechanism that is limited to a specific set
279	   of NLRI, since an implementation has no knowledge of the NLRI
280	   included within the UPDATE message.  In addition, such errors are
281	   considered to be relatively fundamental to the operation of a BGP
282	   implementation, and hence may indicate a case whereby significant
283	   system errors have occurred.  The current BGP-4 standard results in a
284	   BGP speaker restarting a session with the remote BGP speaker.
285	   However where such an error does occur, it is required that a
286	   graceful mechanism is utilised to provide a lower impact to network
287	   operation.  The requirements for enhancements of this nature to BGP-4
288	   are outlined in Section 5, with the requirements outlined therein
289	   focused on providing a means by which system integrity can be
290	   restored whilst allowing for continued network operation.

292	2.1.  Classifying BGP Errors and Expected Error Handling

294	   It is clearly of advantage for BGP-4 implementations to utilise a
295	   consistent set of error handling mechanisms for the different types
296	   of errors that are described in Section 2, and provide consistent
297	   nomenclature to refer to them.  It is therefore suggested that errors
298	   that are indicative of larger scale failures of a BGP speaker, and
299	   hence require some error handling at the session level are referred
300	   to as 'critical' errors, whilst those errors that are identified
301	   based on incorrect content of one or more attributes of a message are
302	   referred to as 'semantic' errors.

304	   The errors identified within the following sections consider only
305	   those errors within the specifications at the time of writing.  It is
306	   recommended that in the definition of future extensions to the BGP-4
307	   specification, the error handling behaviour (and the category within
308	   which errors within the extension should be considered by an
309	   implementation) is defined.

311	2.1.1.  Critical BGP Errors

313	   As described in this document, it is of advantage to limit the number
314	   of 'critical' errors that occur within the protocol.  Therefore,
315	   based on analysis of the processing of BGP UPDATE messages, it is
316	   required that 'critical' error handling behaviour is applied to:

318	   o  UPDATE Message Length errors - whereby the specified overall
319	      UPDATE message length is inconsistent with the sum of the Total
320	      Path Attribute and Withdrawn Routes lengths.  This is
321	      indicative of message packing failure, whereby the NLRI may not be
322	      correctly extracted.

324	   o  Errors Parsing the NLRI attributes of an UPDATE message - where
325	      NLRI carried in either the IPv4-Unicast Advertised or Withdrawn
326	      routes, or in the MP_REACH_NLRI or MP_UNREACH_NLRI attributes
327	      [RFC2858], cannot be parsed, it is not possible to target error
328	      handling mechanisms to specific NLRI, and hence session level
329	      mechanisms must be utilised.

331	   It is expected that those requirements outlined in Section 5 are
332	   utilised to provide session-level handling of those errors identified
333	   as 'critical'.
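
   As an illustration only, the two 'critical' checks listed above might
   be sketched as follows.  This is a minimal sketch rather than a
   specified mechanism: the names CriticalError, split_update_body and
   parse_ipv4_prefixes are hypothetical, the input is assumed to be the
   UPDATE body with the 19-octet message header already removed, and
   only the IPv4 unicast encoding of [RFC4271] is handled.

```python
import struct


class CriticalError(Exception):
    """Hypothetical marker: NLRI cannot be trusted, session-level handling applies."""


def split_update_body(body: bytes):
    """Split an UPDATE body into (withdrawn, path attributes, NLRI) fields."""
    if len(body) < 4:
        raise CriticalError("body too short for both length fields")
    (wd_len,) = struct.unpack_from("!H", body, 0)
    # Withdrawn Routes plus the following 2-octet length field must fit.
    if 2 + wd_len + 2 > len(body):
        raise CriticalError("Withdrawn Routes Length overruns message")
    withdrawn = body[2:2 + wd_len]
    (pa_len,) = struct.unpack_from("!H", body, 2 + wd_len)
    pa_start = 4 + wd_len
    if pa_start + pa_len > len(body):
        raise CriticalError("Total Path Attribute Length overruns message")
    attrs = body[pa_start:pa_start + pa_len]
    nlri = body[pa_start + pa_len:]  # whatever remains is the NLRI field
    return withdrawn, attrs, nlri


def parse_ipv4_prefixes(data: bytes):
    """Decode <length, prefix> pairs; any parse failure is treated as critical."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        if bits > 32:
            raise CriticalError("prefix length greater than 32 bits")
        nbytes = (bits + 7) // 8
        if i + 1 + nbytes > len(data):
            raise CriticalError("truncated prefix")
        raw = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)  # pad to 4 octets
        prefixes.append((".".join(str(b) for b in raw), bits))
        i += 1 + nbytes
    return prefixes
```

   In this sketch, any CriticalError would be passed to the
   session-level handling, whilst a successful return leaves the NLRI
   available for handling targeted at specific prefixes.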

335	2.1.2.  Semantic BGP Errors

337	   Where a BGP message is correctly formed, a number of cases exist
338	   whereby the contents of the UPDATE are not valid.  These cases
339	   represent errors that can be identified as affecting specific
340	   NLRI.  The following cases are expected to be classified as semantic
341	   errors:

343	   o  Zero or invalid length errors in path attributes excluding those
344	      containing NLRI, or where the length of all path attributes
345	      contained within the UPDATE does not correspond to the total path
346	      attributes length.  In this case, the NLRI can be correctly
347	      extracted, and hence acted upon.

349	   o  Messages where invalid data or flags are contained in a path
350	      attribute that does not relate to the NLRI.

352	   o  UPDATE messages that are missing mandatory attributes, or that
353	      contain unrecognised non-optional, duplicate or invalid
354	      attributes (be they unsupported or unexpected).

356	   o  Those messages where the NEXT_HOP, or MP_REACH next-hop values are
357	      missing, of zero length, or invalid for the relevant AFI/SAFI.

359	   In these cases, it is expected that these errors can be handled
360	   gracefully, following the requirements detailed in Section 3 and
361	   Section 4 of this memo.
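
   Some of the semantic checks above can be illustrated with a short
   sketch.  It is an assumption-laden example, not a defined procedure:
   the function name check_attributes and the problem strings are
   hypothetical, the input is assumed to be the path attribute block of
   an UPDATE carrying IPv4 unicast NLRI in the classic NLRI field (so
   that ORIGIN, AS_PATH and NEXT_HOP are all expected), and attribute
   contents are not validated beyond the NEXT_HOP length.

```python
EXTENDED_LENGTH = 0x10               # attribute flag: 2-octet length field
ORIGIN, AS_PATH, NEXT_HOP = 1, 2, 3  # well-known mandatory type codes


def check_attributes(attrs: bytes) -> list:
    """Return a list of semantic problems found in a path attribute block."""
    problems = []
    seen = {}
    i = 0
    while i < len(attrs):
        # Header is flags, type code, then a 1- or 2-octet length.
        hdr = 4 if attrs[i] & EXTENDED_LENGTH else 3
        if i + hdr > len(attrs):
            problems.append("attribute header overruns total length")
            break
        code = attrs[i + 1]
        length = int.from_bytes(attrs[i + 2:i + hdr], "big")
        if i + hdr + length > len(attrs):
            problems.append(f"attribute {code} overruns total length")
            break
        if code in seen:
            problems.append(f"duplicate attribute {code}")
        seen[code] = attrs[i + hdr:i + hdr + length]
        i += hdr + length
    for code in (ORIGIN, AS_PATH, NEXT_HOP):
        if code not in seen:
            problems.append(f"missing mandatory attribute {code}")
    if NEXT_HOP in seen and len(seen[NEXT_HOP]) != 4:
        problems.append("NEXT_HOP length invalid for IPv4")
    return problems
```

   A non-empty result would mark the UPDATE as carrying a semantic
   error; the NLRI, having been extracted separately, remains available
   to the handling described in Section 3.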

363	3.  Avoiding use of NOTIFICATION

365	   The error handling behaviour defined in [RFC4271] is problematic due
366	   to the limited options available to an implementation.  When an
367	   erroneous BGP message is received, at the current time, the
368	   implementation must either ignore the error, or send a NOTIFICATION
369	   message, after which it is mandatory to terminate the BGP session.
370	   It is apparent that this requirement is at odds with that of protocol
371	   robustness.

373	   There is significant complexity to this requirement.  The mechanism
374	   defined in [I-D.chen-ebgp-error-handling] describes a means by which
375	   no NOTIFICATION message is generated for all cases whereby NLRI can
376	   be extracted from an UPDATE.  The NLRI contained within the erroneous
377	   UPDATE message is considered as though the remote BGP speaker has
378	   provided an UPDATE marking it as withdrawn.  This limits the
379	   propagation of the invalid routing information, whilst also
380	   ensuring that no traffic is forwarded via a previously-known path
381	   that may no longer be valid.  This mechanism is referred to as
382	   "treat-as-withdraw".
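
   The effect of "treat-as-withdraw" on a single neighbour's Adj-RIB-In
   can be sketched as below.  This is a simplified model under stated
   assumptions: the class AdjRibIn, its apply_update method and the
   attrs_valid flag are hypothetical names, prefixes are plain strings,
   and the validity decision is assumed to have been made elsewhere.

```python
class AdjRibIn:
    """Toy Adj-RIB-In for one neighbour: prefix -> path attributes."""

    def __init__(self):
        self.routes = {}

    def apply_update(self, announced, withdrawn, attrs, attrs_valid):
        """Apply one UPDATE; return the prefixes treated as withdrawn."""
        for prefix in withdrawn:
            self.routes.pop(prefix, None)
        if attrs_valid:
            for prefix in announced:
                self.routes[prefix] = attrs
            return []
        # Erroneous attributes: behave as though the neighbour had withdrawn
        # the NLRI in this UPDATE, leaving all other routes and the session
        # itself untouched.
        treated = []
        for prefix in announced:
            self.routes.pop(prefix, None)
            treated.append(prefix)
        return treated
```

   The returned list is retained in this sketch because the local
   speaker needs to know which NLRI were affected if it is later to
   attempt recovery of them.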

384	   Whilst this behaviour results in avoiding a NOTIFICATION message,
385	   keeping other routing information advertised by the remote BGP
386	   speaker within the RIB, it may result in unreachability for a sub-set
387	   of the NLRI advertised by the remote speaker.  Two cases should be
388	   considered - that where the entry for a prefix in the Adj-RIB-In of
389	   the neighbour propagating an erroneous packet is utilised, and that
390	   where the prefix installed in the device's RIB is learnt from another
391	   BGP speaker.  In the former case, should the identified NLRI not be
392	   treated as withdrawn, the original NLRI is utilised within the global
393	   RIB.  However, this information is potentially now invalid (i.e. it
394	   no longer provides a valid forwarding path), whilst an alternate
395	   (valid) path may exist in another Adj-RIB-In.  By continuing to
396	   utilise the NLRI for which the UPDATE was considered invalid, traffic
397	   may be forwarded via an invalid path, resulting in routing loops, or
398	   black-holing.  In the second case, no impact to the forwarding of
399	   traffic, or global RIB, is incurred, yet where treat-as-withdraw is
400	   implemented, possibly stale routing information is purged from the
401	   Adj-RIB-In of the neighbour propagating errors.

403	   Whilst mechanisms such as "treat-as-withdraw" are currently
404	   documented, the proposals are limited in their scope - particularly
405	   in terms of restrictions to implementation only on eBGP sessions.
406	   This limitation is made based on the view that the BGP RIB must be
407	   consistent across an autonomous system.  By implementing treat-as-
408	   withdraw for an iBGP session, one or more routers within the
409	   Autonomous System may not have reachability to a prefix, and hence
410	   blackholing of traffic, or routing loops, may occur.  It should,
411	   however, be considered if this view is valid, in light of the manner
412	   in which BGP is utilised within operator networks.  Inconsistency in
413	   a RIB based on a single UPDATE being treated as withdrawn may cause
414	   an inconsistency in a single sub-topology (e.g. Layer 3 VPN service),
415	   or a service not operating completely (in the case of an UPDATE
416	   carrying service membership information).  Where a NOTIFICATION and
417	   teardown is used, this is destructive to all sub-topologies in all
418	   address family identifiers (AFIs) carried by the session in question.
419	   Even where mechanisms such as multi-session BGP are utilised, a whole
420	   AFI is affected by such a NOTIFICATION message.  In terms of routing
421	   operation, it is therefore far less costly to endure a situation
422	   where a limited sub-set of routing information within an AS is
423	   invalid, than to consider all routing information as invalid based on
424	   a single trigger.

426	   It is considered that, if extended to cover iBGP, the mechanisms
427	   described in [I-D.chen-ebgp-error-handling] and
428	   [I-D.ietf-idr-optional-transitive] provide a means to avoid the
429	   transmission of a NOTIFICATION to a remote BGP speaker based on a
430	   single erroneous message, where at all possible, and hence meet this
431	   requirement.  The failure cases whereby NLRI cannot be extracted from
432	   the UPDATE message represent a case whereby the receiving system
433	   cannot handle the error gracefully based on this mechanism.

435	4.  Recovering RIB Consistency

437	   The recommendations described in Section 3 may result in the RIB for
438	   a topology within an AS being inconsistent across the AS's internal
439	   routers.  Alternatively, where such mechanisms are deployed at an AS
440	   boundary, interconnects between two ASes may be inconsistent with
441	   each other.  There are therefore risks of traffic blackholing, due to
442	   missing routing information, or forwarding loops.  Whilst this is
443	   deemed an acceptable compromise in the short term, clearly, it is
444	   suboptimal.  Therefore, a requirement exists to provide mechanisms by
445	   which a BGP speaker is able to recover the consistency of the Adj-
446	   RIB-In for a particular neighbour.

448	   In the general case, the consistency of the BGP RIB can be recovered
449	   by requesting that the entire Adj-RIB-Out of a remote BGP speaker is
450	   re-advertised.  A mechanism to achieve this re-advertisement is
451	   defined within the ROUTE-REFRESH specification [RFC2918].  It is
452	   envisaged that by requesting a refresh of all NLRI advertised by a
453	   BGP speaker, any NLRI which has been withdrawn due to being contained
454	   within an invalid UPDATE message is re-learnt.  Where a ROUTE-REFRESH
455	   is used to directly perform a consistency check between the Adj-RIB-
456	   Out of a remote device, and the Adj-RIB-In of the local BGP speaker,
457	   a demarcation between the ROUTE-REFRESH, and normal UPDATE messages
458	   is required (in order that an "end" of the refresh can be used to
459	   identify any 'stale' NLRI) -
460	   [I-D.ietf-idr-bgp-enhanced-route-refresh] provides a means by which
461	   the ROUTE-REFRESH mechanism can be extended to meet this requirement.
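
   The role of the demarcation can be illustrated with a mark-and-sweep
   sketch.  It is not the encoding of any particular extension: the
   class RefreshTracker and its begin_refresh, on_update and
   end_refresh methods are hypothetical names standing in for whatever
   start and end markers an extension provides, and withdrawals
   received during the refresh are not modelled.

```python
class RefreshTracker:
    """Toy Adj-RIB-In that identifies stale NLRI across a delimited refresh."""

    def __init__(self, routes):
        self.routes = dict(routes)
        self.stale = set()

    def begin_refresh(self):
        # Every route held before the refresh is suspect until re-advertised.
        self.stale = set(self.routes)

    def on_update(self, prefix, attrs):
        self.routes[prefix] = attrs
        self.stale.discard(prefix)

    def end_refresh(self):
        # Routes not re-advertised between the markers are purged.
        purged = sorted(self.stale)
        for prefix in purged:
            del self.routes[prefix]
        self.stale.clear()
        return purged
```

   Without the "end" marker, the receiver in this model could never
   decide that a route still marked stale was absent from the remote
   Adj-RIB-Out rather than merely not yet re-sent.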

463	   Whilst re-advertisement of the whole BGP RIB provides a means by
464	   which withdrawn NLRI can be re-advertised, there are some scaling
465	   implications that must be considered.  In the case that a ROUTE-
466	   REFRESH is generated, all NLRI must be re-packed into UPDATE messages
467	   and advertised by one speaker on the BGP session, whilst the other
468	   must receive all UPDATE messages, and validate the RIB's consistency.
469	   Clearly, it is advantageous to avoid this work where possible.

471	   It is envisaged that during routing inconsistencies caused by
472	   utilising the 'treat-as-withdraw' mechanism, the local BGP speaker is
473	   aware that some routing information was not able to be processed -
474	   due to the fact that an UPDATE message was not parsed correctly.
475	   Since this mechanism (as discussed in Section 3) requires the local
476	   BGP speaker to have determined the set of NLRI for which an erroneous
477	   UPDATE message was received, it is possible to use a targeted
478	   mechanism to re-request the specific NLRI that was contained within
479	   the erroneous UPDATE message.  Re-requesting provides the remote
480	   BGP speaker with an opportunity to re-transmit the NLRI - possibly
481	   providing an opportunity to leverage alternative methods to build the
482	   UPDATE message.  Such a request requires extension to the existing
483	   BGP-4 protocol, in terms of specific UPDATE generation filters with a
484	   transient lifetime.  It is envisaged that the work within
485	   [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted
486	   elements of the Adj-RIB-In for a BGP neighbour to be recovered.

488	   It is of particular note for both means of recovering RIB consistency
489	   described that these are effective only when considering transient
490	   errors within an implementation - for instance, should an RFC
491	   interpretation error within an implementation be present, regardless
492	   of the number of times a specific UPDATE is generated, it is likely
493	   that this error condition will persist (as it may with the existing
494	   behaviour defined by [RFC4271]).  For this reason, there is a
495	   requirement to consider the means by which such consistency recovery
496	   mechanisms are utilised.  It is not advisable that a transient
497	   filter and advertisement mechanism is triggered by all error handling
498	   events due to the load this is likely to place on the neighbour
499	   receiving such a request.  Where this BGP speaker is a relatively
500	   centralised device - a route reflector (as described by [RFC4456])
501	   for example - the act of generation of UPDATE messages with such
502	   frequency is likely to cause disproportionate load.  It is therefore
503	   an operational requirement that any such extension provide a means of
504	   request dampening.
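
   The request dampening requirement above can be illustrated with a
   minimal sketch.  It assumes a fixed minimum interval between recovery
   requests towards each neighbour, with NLRI that arrive during the
   interval batched into the next request.  The class name
   RequestDampener, the min_interval default and the injectable clock
   are hypothetical and are not defined by this document or by any
   referenced draft.

```python
import time


class RequestDampener:
    """Limits how often a recovery request is sent to each neighbour."""

    def __init__(self, min_interval=30.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds; illustrative value only
        self.clock = clock                # injectable so the sketch is testable
        self._last_sent = {}              # neighbour -> time of last request
        self._pending = {}                # neighbour -> NLRI awaiting a request

    def request(self, neighbour, nlri):
        """Queue NLRI; return the batch to request now, or None if dampened."""
        self._pending.setdefault(neighbour, set()).update(nlri)
        now = self.clock()
        last = self._last_sent.get(neighbour)
        if last is not None and now - last < self.min_interval:
            return None                   # too soon; NLRI stays queued
        self._last_sent[neighbour] = now
        return self._pending.pop(neighbour)
```

   A request that arrives inside the interval is not lost; its NLRI is
   carried in the next request that is permitted, so a centralised
   neighbour such as a route reflector sees fewer, larger requests.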

506	5.  Reducing the Impact of Session Reset

508	   Even where protocol enhancements allow errors in the BGP-4 protocol
509	   to cease to trigger NOTIFICATION messages, and hence reset a BGP
510	   session, it is clear that some error conditions may not be exited.
511	   In particular, errors due to existing state, or memory structures,
512	   associated with a specific BGP session will not be handled.  It is
513	   therefore important to consider how these error conditions are
514	   currently handled by the protocol.  It should be noted that the
515	   following discussion and analysis considers only those NOTIFICATION
516	   messages generated in response to errors in UPDATE messages (as
517	   defined by Section 6.3 in [RFC4271]).

519	   The existing NOTIFICATION behaviour triggers a reset of all elements
520	   of the BGP-4 session, as described in Section 6 of [RFC4271].  It is
521	   expected that session teardown requires an implementation to re-
522	   initialise all structures and state required for session maintenance.
523	   Clearly, there is some utility to this requirement, as error
524	   conditions in BGP are, in general, exited from.  However, this
525	   definition is responsible for the forwarding outages within networks
526	   utilising BGP for propagation of routing or service when each error
527	   is experienced.  The requirement described in Section 3 is intended
528	   to reduce the cases whereby a NOTIFICATION is required; however, any
529	   mechanism implemented as a response to this requirement by definition
530	   cannot provide a session reset to the extent of that achieved by the
531	   current behaviour.

533	   In order to address this, there is a requirement for a means by which
534	   a BGP speaker can signal that an unhandled error condition in an
535	   UPDATE message occurred - requiring a session reset - yet also
536	   continue to utilise the paths advertised by the neighbour that are
537	   currently in use within the RIB.  In this case, the Adj-RIB-In
538	   received from the neighbour is not considered invalid, despite a
539	   NOTIFICATION, and session reset, being required.  This set of
540	   requirements is akin to those answered by the BGP Graceful Restart
541	   mechanism described in [RFC4724].  Since the operational requirement
542	   in this case is to provide a means to achieve a complete session
543	   restart without disrupting the forwarding path of those prefixes in
544	   use within a BGP speaker's RIB, it is expected that utilising a
545	   procedure similar to the Graceful Restart mechanism meets the error
546	   handling requirement.  Responding to an error condition (repeated
547	   or otherwise) with a message indicating that an error that cannot be
548	   handled has occurred, forcing session reset, whilst retaining
549	   forwarding information within the RIB, allows forwarding to all
550	   prefixes within a system's RIB to continue during the period in which
551	   the session restarts.  It is envisaged that the additional complexity
552	   introduced by such a mechanism can be limited by extending existing
553	   BGP messages - one such approach is proposed in
555	   [I-D.ietf-idr-bgp-gr-notification].  By placing a time bound on the
556	   restart lifetime, should an error condition not be transient - for
557	   example, should an error have occurred with the BGP process, rather
558	   than something specific to the BGP session - the remote BGP speaker
559	   is still detected as an invalid device for forwarding.
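
   The time-bounded retention described above can be sketched as
   follows.  This is a simplified illustration only: it assumes a single
   restart bound per neighbour and omits the refresh of retained paths
   once the session returns.  The class RetainedAdjRibIn and its method
   names are hypothetical and are not taken from [RFC4724] or
   [I-D.ietf-idr-bgp-gr-notification].

```python
class RetainedAdjRibIn:
    """Paths learned from one neighbour, retained across a session reset."""

    def __init__(self, paths, restart_time):
        self.paths = dict(paths)          # NLRI -> path attributes
        self.restart_time = restart_time  # seconds; illustrative bound
        self.reset_at = None              # time of a pending reset, if any

    def on_session_reset(self, now):
        """Keep the paths for forwarding, but start the restart clock."""
        self.reset_at = now

    def on_session_reestablished(self):
        """The neighbour returned within the bound; stop the clock."""
        self.reset_at = None

    def usable_paths(self, now):
        """Return the paths still valid for forwarding at time `now`."""
        expired = (self.reset_at is not None
                   and now - self.reset_at > self.restart_time)
        if expired:
            self.paths.clear()            # neighbour no longer valid
            self.reset_at = None
        return dict(self.paths)
```

   Forwarding continues on the retained paths while the session
   restarts, and the paths are discarded only if the bound elapses
   without the session being re-established.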

561	   It should be noted that a protocol enhancement meeting this
562	   requirement is not able to solve all error conditions - however, a
563	   complete restart of the BGP and TCP session between two BGP speakers
564	   implements an identical recovery mechanism to that which is achieved
565	   by the existing behaviour.  Where an error condition such as memory
566	   or configuration corruption has occurred in a BGP implementation, it
567	   is expected that a mechanism meeting this requirement continues to
568	   detect this, by means of a bound on time for session restart to
569	   occur.  Whilst there may be some consideration that packets continue
570	   to be forwarded through a device which can be in a failure mode of
571	   this nature for a longer period due to this requirement, the
572	   architecture of modern IP routers should be considered.  A divided
573	   forwarding and control plane is common in many devices, as well as
574	   process separation for software-based devices - corruption of a
575	   specific protocol daemon does not necessarily imply forwarding is
576	   affected.  Indeed, where forwarding behaviour of a device is
577	   affected, it is envisaged that a failure detection mechanism (be it
578	   Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets)
579	   will detect such a failure in almost all cases, with the symptomatic
580	   behaviour of such a failure being an invalid UPDATE message in very
581	   few other cases.

583	6.  Operational Toolset for Monitoring BGP

585	   A significant complexity that is introduced through the requirements
586	   defined in this document is that of monitoring BGP session status for
587	   an operator.  Although the existing error handling behaviour causes a
588	   disproportionate failure, session failure is extremely visible to
589	   most operational personnel within a Network Operator due to both
590	   existing definitions of SNMP trap mechanisms for BGP, and the
591	   forwarding impact typically caused by such a failure.  By introducing
592	   mechanisms by which errors of this nature are not as visible, this is
593	   no longer the case.  There is a requirement that where subsets of the
594	   RIB on a device are no longer reachable from a BGP speaker, or indeed
595	   an AS, some visibility of this situation, alongside a mechanism
596	   to determine the cause, is available to an operator.  Whilst, to some
597	   extent, this can be solved by mandating a sub-requirement of each of
598	   the aforementioned requirements that a BGP speaker must log where
599	   such errors occur, and are hence handled, this does not solve all
600	   cases.  In order to clarify this requirement, the example of the
601	   transmission of an erroneous Optional Transitive attribute can be
602	   considered.  Since, by definition, there is no requirement for all
603	   BGP speakers to parse such an attribute, a receiving router may treat
604	   NLRI as withdrawn based on an erroneous attribute not examined by its
605	   neighbour.  In this case, the upstream device or network, propagating
606	   the UPDATE, has no visibility of this error.  Operationally, however,
607	   it is of interest to the upstream router operator that such invalid
608	   information was propagated.

610	   The requirement for logging of error conditions in transmitted BGP
611	   messages, which are visible to only the receiver, cannot be achieved
612	   by any existing BGP message, or capability.  It is envisaged that
613	   each erroneous event should be transmitted to the remote peer -
614	   including the information as to the set of NLRI that were considered
615	   invalid.  Whilst with some mechanisms this is achieved by default
616	   (for example, One-Time Prefix ORF [I-D.zeng-one-time-prefix-orf]
617	   (Outbound Route Filtering) will transmit the set of prefixes that are
618	   required), the operator requirement is to know which prefixes may
619	   have been unreachable in all cases.  It is envisaged that an
620	   extension to meet this requirement will allow for such information to
621	   be transmitted between peers, and hence logged.  Such a mechanism may
622	   provide further utility as either a diagnostic or logging toolset.

624	   As such, it is possible to divide the messages that are required in
625	   order to provide further visibility into BGP for an operator.  Such a
626	   division can be made based on both the required means of message
627	   transmission and the criticality of each request.

629	   o  Messages required to replace NOTIFICATION - In cases where the
630	      error handling mechanisms defined by [RFC4271] currently result in
631	      a NOTIFICATION message being generated, a number of the
632	      requirements detailed within this document result in this message
633	      being suppressed.  Despite this change, the error condition's
634	      occurrence is still of interest to an operator in order to provide
635	      both monitoring and troubleshooting capabilities, since some form
636	      of invalid data has been received on a session.  It is therefore
637	      considered that an implementation must generate a message both
638	      locally, and transmitted to the remote peer, based on such a
639	      condition.  Where such a message is transmitted to the remote
640	      peer, it is considered that the BGP session via which the
641	      erroneous UPDATE message was received should be used as transport
642	      to the remote peer.  The information transmitted in such a message
643	      should be minimised to allow identification of the paths which
644	      were considered erroneous (i.e. restricting the information to
645	      that which is directly relevant to a network operator in the case
646	      of an error condition occurring).  Any delay to convergence on the
647	      session in question is considered to be acceptable, given the
648	      suboptimal nature of the reception of invalid routing information
649	      via a BGP session.  Further concerns regarding such a mechanism
650	      relate to the load generated on the BGP speaker in question,
651	      however, it must be considered that in the case of an erroneous
652	      UPDATE being received, and the 'treat-as-withdraw' mechanism being
653	      utilised, where the erroneous path is removed from the Loc-RIB,
654	      there is likely to be a requirement to generate UPDATE messages
655	      withdrawing the prefix from all further BGP speakers to which the
656	      prefix is advertised.  The load generated by the generation of
657	      such UPDATEs is likely to be much greater than that of
658	      transmitting error information via a logging message type back to
659	      the speaker from which it was received.  It is envisaged that
660	      light-weight BGP message-based signalling mechanisms such as the
661	      ADVISORY message types detailed in
662	      [I-D.ietf-idr-operational-message] provide a suitable means to
663	      satisfy this requirement.

665	   o  Additional Diagnostic Capabilities for BGP - In a number of cases,
666	      there is an operational requirement to further debug erroneous BGP
667	      UPDATE messages, along with the particulars of the state of a BGP
668	      speaker.  For instance, where an invalid BGP UPDATE message is
669	      transmitted between two BGP speakers, the exact format of the
670	      UPDATE message is of interest to an operator, as this information
671	      provides a clear indication of a message considered to be
672	      erroneous by the BGP speaker to which it was transmitted.  In this
673	      case, it is considered of great utility that the entire UPDATE
674	      message is transmitted back to the advertising speaker, in order
675	      to allow for further debugging to occur.  Whilst such information
676	      is particularly useful to an operator, it clearly provides
677	      information that is not key to protocol operation - for this
678	      reason, it is expected that some of the additional complexity, and
679	      load to which a BGP speaker is subjected, may not be acceptable.
680	      For this reason, it is required that where
681	      mechanisms are developed to support this requirement, messages of
682	      this nature can be supported both within an existing BGP session,
683	      and via a dedicated separate session, be it BGP carrying messages
684	      such as those defined in [I-D.ietf-idr-operational-message] or a
685	      dedicated monitoring protocol akin to BMP described in
686	      [I-D.ietf-grow-bmp].

688	   Whilst the operational requirement for such monitoring tools to allow
689	   for visibility into BGP is clearly agreed upon, the means by which
690	   such messages are transmitted between two BGP speakers is likely to
691	   be dependent upon the positions of the speakers in question (for
692	   instance, the requirements for such a protocol may differ where a
693	   session is between two ASBRs under separate administration).  The
694	   introduction of additional message types to the BGP protocol clearly
695	   introduces further complexity - and leaves room for further
696	   implementation and standardisation errors that may compromise the
697	   robustness of the BGP protocol.  In addition, the queuing and
698	   scheduling of these BGP messages must be interleaved with the
699	   transmission of the key protocol messages - such as KEEPALIVE and
700	   UPDATE packets.  It is therefore a concern that should a large number
701	   of messages specifically for operational visibility be transmitted,
702	   this will delay the transmission of UPDATE packets, and hence
703	   adversely affect the end-to-end convergence time for NLRI carried
704	   within BGP.  The operational reasons why it is advantageous for such
705	   messages to be in-band to the protocol should also be considered.
706	   In particular, it should be noted that where such information is to
707	   be transmitted between administrative boundaries, a BGP session
708	   represents an existing channel between the two ASes.  This
709	   channel is considered to be secure insofar as the routing
710	   information, and requests sent via the session are considered to come
711	   from a trusted source.  Since error information both relates to a
712	   particular attachment, and is key to ensuring that such a session is
713	   operating as expected, it is considered of great operational benefit
714	   that this information is transmitted over this channel.  In addition,
715	   the overall system scalability is improved by such in-band
716	   transmission.  It is expected that erroneous information resulting in
717	   the 'treat-as-withdraw' mechanism being utilised is relatively
718	   infrequently transmitted between two peers (when compared to the
719	   frequency of UPDATE message transmission).  The impact of including
720	   an additional BGP message type for such operational visibility is
721	   relatively small from a resource utilisation perspective - additional
722	   processing overhead is only experienced when such a message is
723	   received.  Where a separate session is maintained, particular network
724	   elements within a service provider topology may require hundreds, or
725	   thousands, of additional sessions for the transmission of this
726	   information.  Such a resource consumption overhead is likely to be
727	   unacceptable to some network operators.

729	   For the reasons explained above, it is expected that mechanisms
730	   specified to meet the requirements for event visibility consider the
731	   relative impacts of additional monitoring sessions, or message
732	   inclusion in band to BGP in order not to compromise the security,
733	   scalability and robustness of the BGP-4 protocol.

735	7.  Operational Complexities Introduced by Altering RFC4271

737	   The existing NOTIFICATION and subsequent teardown of a BGP session
738	   upon encountering an error has the advantage that a consistent
739	   approach to error handling is required of all implementations of the
740	   BGP-4 protocol.  This is of operational advantage as it provides a
741	   clear expectation of the behaviour of the protocol.  The requirements
742	   defined herein add further complexity to the error-handling within
743	   BGP, and hence are liable to compromise the existing deterministic
744	   protocol behaviour.  It is therefore deemed that there is a further
745	   requirement to define a set of recommended behaviours based on the
746	   reception of a particular class of erroneous UPDATE message,
747	   alongside highlighting some of the implementation complexities that
748	   may need to be handled in the case that particular recommendations
749	   made within this memo are deployed.

751	   Utilising the classes of erroneous UPDATE message described in
752	   Section 2, the recommended behaviour for a BGP-4 implementation can
753	   be divided into two branches.  Primarily, where a semantic error is
754	   identified, an implementation is expected to utilise the reduced-
755	   impact error handling approach, as described in Section 3.  In the
756	   case that such an approach results in known NLRI being withdrawn from
757	   the BGP speaker's RIB, and an implementation provides functionality
758	   such that these errors are recovered from through an automatically
759	   triggered means, such as those described within Section 4, some
760	   consideration of the scalability of these recovery mechanisms is
761	   required.  Clearly, there is a computational and bandwidth overhead
762	   associated with the re-advertisement of NLRI between two BGP speakers
763	   - due to the generation of UPDATE messages, their transmission
764	   between the two speakers, and the parsing and processing into the RIB
765	   required.  This overhead is directly proportional to the number of
766	   UPDATE messages that are required.  Where a semantic error is
767	   experienced, by definition the NLRI contained within the UPDATE can
768	   be extracted.  It is therefore possible to minimise the proportion of
769	   the RIB that is re-advertised by targeting any recovery mechanism on
770	   the NLRI contained within the erroneous UPDATE.  Such a targeted
771	   mechanism can be achieved through a means such as One-Time ORF, or
772	   other means of targeting UPDATE messages not discussed within this
773	   memo.  It is recommended that where available, any automatically (or
774	   manually) triggered recovery mechanism utilises such targeted
775	   means in preference to any whole RIB refresh mechanism (such as
776	   ROUTE-REFRESH).

778	   In the case that an erroneous UPDATE has been processed through a
779	   means such as treat-as-withdraw (described within Section 3), a
780	   recovering mechanism may be considered superfluous, if the assumption
781	   is made that the RIB inconsistency will only be recovered from based
782	   on a path re-convergence (or change in BGP attribute) for the
783	   advertising BGP speaker.  However, where this assumption is not
784	   considered to provide adequate recovery behaviour, and a mechanism to
785	   restore RIB consistency automatically is implemented, some
786	   consideration must be made for where repeated erroneous messages
787	   occur.  In this case, in order to limit the impact to the BGP
788	   speaker's network operation, at a pre-defined point it is recommended
789	   that such automatic recovery mechanisms towards the BGP speaker from
790	   which erroneous UPDATEs are repeatedly received are suppressed, and
791	   the fact that such suppression has occurred is highlighted to an
792	   operator.  The point at which such behaviour is suppressed is to be
793	   defined on a per-implementation basis, taking into account feedback
794	   from the Network Operator community based on the deployment of the
795	   recommendations described in this document.  It is expected that such
796	   trigger points are dependent upon the mechanisms implemented for a
797	   particular BGP-4 implementation, and the impact upon the speaker of
798	   these means of RIB recovery.

800	   Where critical errors are experienced, such that a session reset is
801	   required, the mechanism discussed in Section 5 should be used.
802	   Again, since such a mechanism results in a restart of a BGP session,
803	   it is expected that all NLRI carried over the session is re-advertised
804	   as it is re-established, incurring processing overhead on both the
805	   advertising and receiving BGP speaker.  In order to minimise the
806	   consumption of control-plane computational resource on both speakers,
807	   it is recommended that mechanisms allowing a reduced set of BGP
808	   UPDATE messages to be re-transmitted between two speakers are
809	   employed wherever possible - for instance through employing
810	   mechanisms such as those described in [I-D.ietf-idr-enhanced-gr].

812	   In the case that repeated critical errors occur, the overhead of
813	   performing any mechanism implemented based on the requirements in
814	   Section 5 is incurred following each erroneous UPDATE message.  Since
815	   these mechanisms are, by definition, performed automatically in
816	   response to the erroneous message being received, similar
817	   considerations as to the impact to the BGP speaker must be taken into
818	   account.  As such, it is expected that after a certain trigger level,
819	   the ongoing receipt of critical errors within BGP UPDATE messages is
820	   deemed to be indicative of a long-lasting failure, and a session no
821	   longer considered viable.  Where such a case is experienced, it is
822	   expected that the BGP session reverts to the standard session failure
823	   behaviour, as described in [RFC4271] and documents updating this base
824	   standard.  Where such a reversion is implemented, this condition
825	   should be flagged to a network operator.  The number of restart
826	   attempts before the session reverts to being shut down should be
827	   determined based on the overhead of the recovery mechanisms
828	   implemented (for instance, where [I-D.ietf-idr-enhanced-gr] is
829	   implemented, the impact of session restart may be significantly
830	   lower), and operational experience of the deployment of the
831	   recommendations described in this document.
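
   The trigger level described above could take many forms.  The sketch
   below assumes one of them: a count of critical errors within a
   sliding time window, after which the session reverts to the standard
   failure behaviour.  The class CriticalErrorTracker, the max_restarts
   and window defaults, and the returned action strings are all
   hypothetical; this document leaves the actual trigger to the
   implementation and to operational experience.

```python
from collections import deque


class CriticalErrorTracker:
    """Decides how to respond to each critical UPDATE error on a session."""

    def __init__(self, max_restarts=3, window=600.0):
        self.max_restarts = max_restarts  # restarts tolerated per window
        self.window = window              # seconds; illustrative value only
        self._events = deque()            # times of recent critical errors

    def on_critical_error(self, now):
        """Return "graceful-restart" or "teardown" for an error at `now`."""
        self._events.append(now)
        # Forget errors that fall outside the sliding window.
        while now - self._events[0] > self.window:
            self._events.popleft()
        if len(self._events) > self.max_restarts:
            return "teardown"             # revert to standard behaviour
        return "graceful-restart"
```

   An implementation taking the "teardown" branch would also be
   expected to flag the condition to the operator, as required above.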

833	   Since repeated erroneous UPDATE messages which experience critical
834	   errors may be indicative of long-lasting failure modes, it is
835	   recommended that a back-off from restarting BGP sessions experiencing
836	   such behaviour is implemented.  As such, this is not applicable to
837	   restart behaviour through means such as those described in Section 5
838	   since such restarts are time-bound based on the period for which the
839	   Adj-RIB-In from a BGP speaker is maintained as valid (e.g., when
840	   considering BGP Graceful Restart, such restarts are time-bound by the
841	   Restart Time described in [RFC4724]).  However, following a session
842	   reverting to being pulled down based on repeated error conditions, it
843	   is recommended that further restart attempts are subject to an
844	   exponentially increasing interval between subsequent attempts.  It is
845	   therefore recommended that in such cases an implementation implements
846	   the increasing values of IdleHoldTimer as described in the BGP-4 FSM
847	   documented in [RFC4271].
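
   The exponentially increasing interval recommended above can be
   sketched as a simple doubling with an upper bound.  The function name
   idle_hold_time, the initial value of 60 seconds and the cap of 3600
   seconds are assumptions chosen for illustration and are not values
   taken from [RFC4271].

```python
def idle_hold_time(attempt, initial=60, cap=3600):
    """Seconds to remain idle before restart attempt `attempt` (1-based).

    The interval doubles with each attempt and is bounded by `cap`.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, initial * 2 ** (attempt - 1))
```

   With these assumed values the first three attempts wait 60, 120 and
   240 seconds, and the interval stops growing once it reaches the cap.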

849	7.1.  Reducing the Network Impact of Session Teardown

851	   As discussed within the preceding section, where repeated critical
852	   UPDATE message errors are received, it is recommended that the impact
853	   to both the advertising and receiving BGP-4 speakers be limited by
854	   reverting to tearing down the BGP-4 session experiencing such errors.
855	   The BGP-4 specification presented in [RFC4271] achieves such a
856	   session shutdown by sending a NOTIFICATION message; however, this has
857	   the net result that all downstream BGP speakers (i.e. those to whom
858	   the NLRI carried over the now ceased BGP session was readvertised)
859	   must withdraw this NLRI from their RIB, and perform a best-path
860	   selection if required.  In some cases, there may be no alternate path
861	   available, and hence a period of time for which no valid BGP
862	   route exists.  Particularly, this is very likely to occur where an
863	   upstream BGP speaker performs a best-path selection and advertises
864	   only a single path to its neighbours - there is a requirement for the
865	   upstream speaker to perform a best-path selection, and re-advertise a
866	   new set of NLRI before the downstream system is able to converge to a
867	   new path.  It should be noted that where UPDATE messages withdrawing
868	   NLRI are not subject to the BGP session's configured
869	   MinRouteAdvertisementInterval (MRAI) [RFC4271], but re-advertisements
870	   are, this may result in a BGP speaker being without a path for a
871	   period up to the MRAI.

873	   Clearly, it is advantageous to avoid this period of time for which
874	   there may be no reachability for a set of NLRI, especially since the
875	   BGP speaker terminating a particular session is doing so due to a
876	   particular error handling policy.  The graceful shutdown mechanism
877	   detailed in [I-D.ietf-grow-bgp-gshut] provides a mechanism by which a
878	   BGP speaker is able to signal that a set of NLRI is to be withdrawn,
879	   and hence allow downstream systems to pre-emptively perform a best-
880	   path selection, and hence advertise new reachability information in a
881	   make-before-break manner.

883	   It is therefore envisaged that, where a session is to be shut down
884	   based on a trigger relating to erroneous UPDATE messages being
885	   received (be they repeated or not), the graceful shutdown
886	   procedure is utilised, so as to reduce the forwarding impact of NLRI
887	   received on the session being withdrawn.

889	8.  IANA Considerations

891	   This memo includes no request to IANA.

893	9.  Security Considerations

895	   The requirements outlined in this document provide mechanisms by
896	   which erroneous BGP messages may be responded to with limited impact
897	   to forwarding operation.  This is of benefit to the security of a BGP
898	   speaker in general.  Where UPDATE messages may have been propagated
899	   by a single malicious Autonomous System or router within a network
900	   (or the Internet default free zone - DFZ), which are then propagated
901	   to all devices within the same routing domain, all other NLRI
902	   available over the same session become unreachable.  This mechanism
903	   may provide means by which an Autonomous System can be isolated from
904	   required routing domains (such as the Internet), should the relevant
905	   UPDATE messages be propagated via specific paths.  By reducing the
906	   impact of such failures, it is envisaged that this possibility may be
907	   constrained to a specific set of NLRI, or a specific topology.

909	   Some mechanisms meeting the requirements specified in this document,
910	   particularly those within Section 6, may raise further security
911	   concerns; however, it is envisaged that these are addressed in per-
912	   enhancement memos.

914	10.  Acknowledgements

916	   The author would like to thank the following network operators for
917	   their insight, and valuable input in defining the requirements for a
918	   variety of operational deployments of the BGP-4 protocol: Shane
919	   Amante, Bruno Decraene, Rob Evans, David Freedman, Wes George, Tom
920	   Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom
921	   Scholl and Ilya Varlashkin.

923	   In addition, many thanks are extended to Jeff Haas, Wim Hendrickx,
924	   Tony Li, Alton Lo, Keyur Patel, John Scudder, Adam Simpson and Robert
925	   Raszuk for their expertise relating to implementations of the BGP-4
926	   protocol.

928	11.  References

930	11.1.  Normative References

932	   [RFC2858]  Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
933	              "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.

935	   [RFC2918]  Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
936	              September 2000.

938	   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
939	              Protocol 4 (BGP-4)", RFC 4271, January 2006.

941	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
942	              Networks (VPNs)", RFC 4364, February 2006.

944	   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
945	              Reflection: An Alternative to Full Mesh Internal BGP
946	              (IBGP)", RFC 4456, April 2006.

948	   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
949	              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
950	              January 2007.

952	   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
953	              "Multiprotocol Extensions for BGP-4", RFC 4760,
954	              January 2007.

956	11.2.  Informative References

958	   [I-D.chen-ebgp-error-handling]
959	              Chen, E., Mohapatra, P., and K. Patel, "Revised Error
960	              Handling for BGP Updates from External Neighbors",
961	              draft-chen-ebgp-error-handling-01 (work in progress),
962	              September 2011.

964	   [I-D.ietf-grow-bgp-gshut]
965	              Francois, P., Decraene, B., Pelsser, C., Patel, K., and C.
966	              Filsfils, "Graceful BGP session shutdown",
967	              draft-ietf-grow-bgp-gshut-03 (work in progress),
968	              December 2011.

970	   [I-D.ietf-grow-bmp]
971	              Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring
972	              Protocol", draft-ietf-grow-bmp-06 (work in progress),
973	              December 2011.

975	   [I-D.ietf-idr-bgp-enhanced-route-refresh]
976	              Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
977	              Route Refresh Capability for BGP-4",
978	              draft-ietf-idr-bgp-enhanced-route-refresh-01 (work in
979	              progress), December 2011.

981	   [I-D.ietf-idr-bgp-gr-notification]
982	              Patel, K., Fernando, R., and J. Scudder, "Notification
983	              Message support for BGP Graceful Restart",
984	              draft-ietf-idr-bgp-gr-notification-00 (work in progress),
985	              December 2011.

987	   [I-D.ietf-idr-enhanced-gr]
988	              Patel, K., Chen, E., Fernando, R., and J. Scudder,
989	              "Accelerated Routing Convergence for BGP Graceful
990	              Restart", draft-ietf-idr-enhanced-gr-00 (work in
991	              progress), December 2011.

993	   [I-D.ietf-idr-operational-message]
994	              Freedman, D., Raszuk, R., and R. Shakir, "BGP OPERATIONAL
995	              Message", draft-ietf-idr-operational-message-00 (work in
996	              progress), March 2012.

998	   [I-D.ietf-idr-optional-transitive]
999	              Scudder, J., Chen, E., Mohapatra, P., and K. Patel,
1000	              "Revised Error Handling for BGP UPDATE Messages",
1001	              draft-ietf-idr-optional-transitive-04 (work in progress),
1002	              October 2011.

1004	   [I-D.zeng-one-time-prefix-orf]
1005	              Zeng, Q. and J. Dong, "One-time Address-Prefix Based
1006	              Outbound Route Filter for BGP-4",
1007	              draft-zeng-one-time-prefix-orf-01 (work in progress),
1008	              October 2010.

1010	   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
1011	              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
1012	              June 2010.

1014	Author's Address

1016	   Rob Shakir
1017	   BT
1018	   pp C3L
1019	   BT Centre
1020	   81, Newgate Street
1021	   London  EC1A 7AJ
1022	   UK

1024	   Email: rob.shakir@bt.com
1025	   URI:   http://www.bt.com/