idnits 2.17.1 

draft-ietf-grow-ops-reqs-for-bgp-error-handling-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 15, 2011) is 4757 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC5881' is defined on line 675, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-01) exists of
     draft-chen-ebgp-error-handling-00

  == Outdated reference: A later version (-17) exists of
     draft-ietf-grow-bmp-05

  == Outdated reference: A later version (-04) exists of
     draft-ietf-idr-optional-transitive-03


     Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                R. Shakir
3	Internet-Draft                                                       C&W
4	Intended status: Informational                            April 15, 2011
5	Expires: October 17, 2011

7	Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
8	           draft-ietf-grow-ops-reqs-for-bgp-error-handling-00

10	Abstract

12	   BGP-4 is utilised as a key intra- and inter-Autonomous System routing
13	   protocol in modern IP networks.  The failure modes as defined by the
14	   original protocol standards are based on a number of assumptions
15	   around the impact of session failure.  Numerous incidents both in the
16	   global Internet routing table and within Service Provider networks
17	   have been caused by strict handling of a single invalid UPDATE
18	   message causing large-scale failures in one or more Autonomous
19	   Systems.

21	   This memo describes the current use of BGP-4 within Service Provider
22	   networks, and outlines a set of requirements for further work to
23	   enhance the mechanisms available to a BGP-4 implementation when
24	   erroneous data is detected.  Whilst this document does not provide
25	   specification of any standard, it is intended as an overview of a set
26	   of enhancements to BGP-4 to improve the protocol's robustness to suit
27	   its current deployment.

29	Status of this Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at http://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on October 17, 2011.

46	Copyright Notice

48	   Copyright (c) 2011 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	     1.1.  Role of BGP-4 in Service Provider Networks
65	     1.2.  Overview of Operator Requirements for BGP-4 Error
66	           Handling
67	   2.  Avoiding use of NOTIFICATION
68	   3.  Recovering RIB Consistency
69	   4.  Reducing the Impact of Session Reset
70	   5.  Operational Toolset for Monitoring BGP
71	   6.  Operational Complexities Introduced by Altering RFC4271
72	   7.  IANA Considerations
73	   8.  Security Considerations
74	   9.  Acknowledgements
75	   10. References
76	     10.1. Normative References
77	     10.2. Informational References
78	   Author's Address

80	1.  Introduction

82	   Where BGP-4 [RFC4271] is deployed in the Internet and Service
83	   Provider networks, numerous incidents have been recorded due to the
84	   manner in which [RFC4271] specifies errors in routing information
85	   should be handled.  Whilst the behaviour defined in the existing
86	   standards retains utility, the deployments of the protocol have
87	   changed within modern networks, resulting in significantly different
88	   demands for protocol robustness.  Whilst a number of Internet Drafts
89	   have been written to begin to enhance the behaviour of BGP-4 in terms
90	   of the handling of erroneous messages, this draft intends to define a
91	   set of requirements for ongoing work.  These requirements are
92	   considered from the perspective of a Network Operator, and hence this
93	   draft does not intend to define the protocol mechanisms by which such
94	   error handling behaviour is to be implemented.

96	1.1.  Role of BGP-4 in Service Provider Networks

98	   BGP was designed as an inter-Autonomous System (AS) routing protocol
99	   and hence many of the error handling mechanisms within the protocol
100	   specification are designed to be conducive to this role.  In general,
101	   this consideration as an inter-AS routing propagation mechanism
102	   results in the view that a BGP session propagates a relatively small
103	   amount of network-layer reachability information (NLRI) between two
104	   ASes.  In this case, it is the expectation of session resilience for
105	   those adjacencies that are key to routing continuity (for example, it
106	   is expected that two networks peering via BGP would connect multiple
107	   times in order to safeguard equipment or protocol failure).  In
108	   addition, there is some expectation of multiple paths to a particular
109	   NLRI being available - it would be expected that a network can fall
110	   back to utilising alternate, less direct, paths where a failure of a
111	   more direct path occurs.

113	   Traditional network architectures would deploy an Interior Gateway
114	   Protocol (IGP) to carry infrastructure and customer prefixes, with an
115	   Exterior Gateway Protocol (EGP) such as BGP being utilised to
116	   propagate these prefixes to other Autonomous Systems.  However, with
117	   the growth of IP-based services, this is no longer considered best
118	   practice.  In order to ensure that convergence is within acceptable
119	   time bounds, the amount of routing information carried within the IGP
120	   is significantly reduced - and tends to be only infrastructure
121	   prefixes. iBGP is then utilised to propagate both customer, and
122	   external prefixes within an AS.  As such, BGP has become an IGP, with
123	   traditional IGPs acting as a means by which to propagate the routing
124	   information which is required to establish a BGP session, and reach
125	   the egress node within the local routing domain.  This change in role
126	   presents different requirements for the robustness of BGP as a
127	   routing protocol - with the expectation of a similar level of
128	   robustness to that of an IGP being set.

130	   Along with this change in role, the nature of the IP routing
131	   information that is carried has changed.  BGP has become a ubiquitous
132	   means by which service information can be propagated between devices.
133	   For instance, BGP is utilised to carry routing information for IP/
134	   MPLS VPN services as described in [RFC4364].  Since there is an
135	   existing deployment of the protocol between PE devices in numerous
136	   networks, it has been adapted to propagate this routing information,
137	   as its use limits the number of routing protocols required on each
138	   device.  This additional information being propagated represents a
139	   large change in requirement for the error handling of the protocol -
140	   where session failure occurs, a complete service outage is likely
141	   for at least a subset of a network's customers, even though the
142	   erroneous packet may have occurred within a different sub-topology
143	   or even service (a different address family for example).  For this
144	   reason, there is a significant demand to avoid service affecting
145	   failures that may be triggered by routing information within a single
146	   sub-topology or service.

148	   Both within Internet and multi-service routing architectures, a
149	   number of BGP sessions propagate a large proportion of the required
150	   routing information for network operation.  For Internet routing,
151	   these are typically BGP sessions which propagate the global routing
152	   table to an AS - failure of these sessions may have a large impact on
153	   network service, based on a single erroneous update.  In a multi-
154	   service environment, typical deployments utilise a small number of
155	   core-facing BGP sessions, typically towards route reflector devices.
156	   Failure of these sessions may also result in a large impact to
157	   network operation.  Clearly, the avoidance of conditions requiring
158	   these sessions to fail is of great utility to any network operator,
159	   and provides further motivation for the revision of the existing
160	   behaviour.

162	   Whilst the behaviour in [RFC4271] is suited to ensuring that BGP
163	   messages carrying erroneous routing information are limited in scope
164	   (by means of session reset), with the above considerations, it is
165	   clear that this mechanism is not suited to all deployments.  It
166	   should, however, be noted that the change in scope affects the
167	   handling only of errors occurring after BGP session establishment.
168	   There is no current operational requirement to amend the means by
169	   which error handling in session establishment, or liveness
170	   detection, are performed.

172	1.2.  Overview of Operator Requirements for BGP-4 Error Handling

174	   It is the intention of this document to define a set of criteria for
175	   the manner in which a revised error handling mechanism in BGP-4 is
176	   required to conform.  The motivation for the definition of these
177	   requirements can be summarised based on certain behaviour currently
178	   present in the protocol that is not deemed acceptable within current
179	   operational deployments, or where there is a short-fall in the tool
180	   set available to an operator.  These key requirements can be
181	   summarised as follows:

183	   o  It is unacceptable within modern deployments of the BGP-4 protocol
184	      that a single erroneous UPDATE packet affects prefixes that it
185	      does not carry.  This requirement therefore requires some
186	      modification to the means by which erroneous UPDATE packets are
187	      handled, and reacted to - with a particular focus on avoiding the
188	      use of the NOTIFICATION message.

190	   o  It is recognised that some error conditions occurring within the
191	      BGP-4 protocol may not always be handled gracefully, and may
192	      result in conditions whereby an implementation cannot recover.  In
193	      these (and similar) cases, it is unacceptable for an operator that
194	      this reset of the BGP-4 session results in interruption to
195	      forwarding packets (by means of withdrawing prefixes installed by
196	      BGP-4 into a device's RIB, and subsequently FIB).  To this end,
197	      there is a requirement to define a session reset mechanism which
198	      provides session re-initialisation in a non-destructive manner.

200	   o  Further to the requirements to provide a more robust protocol, the
201	      current visibility into error conditions within the BGP-4 protocol
202	      is extremely limited - where further modifications to this
203	      behaviour are to be made, complexity is likely to be added.  Thus,
204	      to ensure that BGP-4 is manageable, there are requirements for
205	      mechanisms by which the protocol can be examined and monitored.

207	   This document describes each of these requirements in further depth,
208	   along with an overview of means by which they are expected to be
209	   achieved.  In addition, the mechanism by which the enhancements
210	   meeting these requirements are to interact is discussed.

212	2.  Avoiding use of NOTIFICATION

214	   The error handling behaviour defined in RFC4271 is problematic due to
215	   the limited options that are available to an implementation.  When an
216	   erroneous BGP message is received, at the current time, the
217	   implementation must either ignore the error, or send a NOTIFICATION
218	   message, after which it is mandatory to terminate the BGP session.
219	   It is apparent that this requirement is at odds with that of protocol
220	   robustness.

222	   There is significant complexity to this requirement.  The mechanism
223	   defined in [I-D.chen-ebgp-error-handling] describes a means by which
224	   no NOTIFICATION message is generated for all cases whereby NLRI can
225	   be extracted from an UPDATE.  The NLRI contained within the erroneous
226	   UPDATE message is considered as though the remote BGP speaker has
227	   provided an UPDATE marking it as withdrawn.  This results in a limit
228	   in the propagation of the invalid routing information, whilst also
229	   ensuring that no traffic is forwarded via a previously-known path
230	   that may no longer be valid.  This mechanism is referred to as
231	   "treat-as-withdraw".
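
   As an illustration only, the choice between "treat-as-withdraw" and
   a session reset might be sketched as follows.  The sketch is written
   in Python; the names ParsedUpdate and handle_erroneous_update, the
   dictionary standing in for the Adj-RIB-In, and the returned strings
   are all hypothetical, and are not taken from [RFC4271] or
   [I-D.chen-ebgp-error-handling].

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ParsedUpdate:
    """Hypothetical result of parsing an UPDATE that contained an error."""
    # Prefixes recovered from the message, or None if the NLRI could not
    # be extracted at all.
    nlri: Optional[List[str]]


def handle_erroneous_update(adj_rib_in: Dict[str, dict],
                            update: ParsedUpdate) -> str:
    """Apply treat-as-withdraw where possible, else request a reset."""
    if update.nlri is None:
        # The error cannot be confined to a known set of prefixes, so
        # this sketch falls back to resetting the session.
        return "session-reset"
    for prefix in update.nlri:
        # Behave as though the neighbour had withdrawn the prefix.
        adj_rib_in.pop(prefix, None)
    return "treat-as-withdraw"
```

   Prefixes not carried in the erroneous UPDATE are left untouched in
   the Adj-RIB-In, which is the property the requirement asks for.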

233	   Whilst this behaviour results in avoiding a NOTIFICATION message,
234	   keeping other routing information advertised by the remote BGP
235	   speaker within the RIB, it may result in unreachability for a sub-set
236	   of the NLRI advertised by the remote speaker.  Two cases should be
237	   considered - that where the entry for a prefix in the Adj-RIB-In of
238	   the neighbour propagating an erroneous packet is utilised, and that
239	   where the prefix installed in the device's RIB is learnt from another
240	   BGP speaker.  In the former case, should the identified NLRI not be
241	   treated as withdrawn, the original NLRI is utilised within the global
242	   RIB.  However, this information is potentially now invalid (i.e. it
243	   no longer provides a valid forwarding path), whilst an alternate
244	   (valid) path may exist in another Adj-RIB-In.  By continuing to
245	   utilise the NLRI for which the UPDATE was considered invalid, traffic
246	   may be forwarded via an invalid path, resulting in routing loops, or
247	   black-holing.  In the second case, no impact to the forwarding of
248	   traffic, or global RIB, is incurred, yet where treat-as-withdraw is
249	   implemented, possibly stale routing information is purged from the
250	   Adj-RIB-In of the neighbour propagating errors.
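
   The two cases above can be illustrated with a small model.  This is
   a sketch under stated assumptions: each neighbour's Adj-RIB-In is a
   dictionary, path selection uses local preference only (a deliberate
   simplification of the BGP decision process), and the function names
   are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical model: neighbour name -> {prefix -> local preference}.
AdjRibIns = Dict[str, Dict[str, int]]


def best_neighbour(adj_rib_ins: AdjRibIns, prefix: str) -> Optional[str]:
    """Return the neighbour whose path is selected, or None if unreachable."""
    candidates = {name: rib[prefix]
                  for name, rib in adj_rib_ins.items() if prefix in rib}
    if not candidates:
        return None
    # Highest local preference wins; ties go to the greater neighbour
    # name purely to keep this example deterministic.
    return max(candidates, key=lambda name: (candidates[name], name))


def treat_as_withdraw(adj_rib_ins: AdjRibIns, neighbour: str,
                      prefix: str) -> None:
    """Remove the prefix from one neighbour's Adj-RIB-In only."""
    adj_rib_ins.get(neighbour, {}).pop(prefix, None)
```

   When the withdrawn entry was the selected path, selection falls back
   to another neighbour if one advertises the prefix, and the prefix
   becomes unreachable otherwise.  When the selected path was learnt
   from a different neighbour, the selection is unchanged.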

252	   Whilst mechanisms such as "treat-as-withdraw" are currently
253	   documented, the proposals are limited in their scope - particularly
254	   in terms of restrictions to implementation only on eBGP sessions.
255	   This limitation is made based on the view that the BGP RIB must be
256	   consistent across an autonomous system.  By implementing treat-as-
257	   withdraw for an iBGP session, one or more routers within the
258	   Autonomous System may not have reachability to a prefix, and hence
259	   blackholing of traffic, or routing loops, may occur.  It should,
260	   however, be considered if this view is valid, in light of the manner
261	   in which BGP is utilised within operator networks.  Inconsistency in
262	   a RIB based on a single UPDATE being treated as withdrawn may cause
263	   inconsistency in a single sub-topology (e.g., Layer 3 VPN service),
264	   or a service not operating completely (in the case of an UPDATE
265	   carrying service membership information).  Where a NOTIFICATION and
266	   teardown is utilised this is destructive to all sub-topologies in all
267	   address family identifiers (AFIs) carried by the session in question.
268	   Even where mechanisms such as multi-session BGP are utilised, a whole
269	   AFI is affected by such a NOTIFICATION message.  In terms of routing
270	   operation, it is therefore far less costly to endure a situation
271	   where a limited sub-set of routing information within an AS is
272	   invalid, than to consider all routing information as invalid based on
273	   a single trigger.

275	   It is considered that, if extended to cover iBGP, the mechanisms
276	   described in [I-D.chen-ebgp-error-handling] and
277	   [I-D.ietf-idr-optional-transitive] provide a means to avoid the
278	   transmission of a NOTIFICATION to a remote BGP speaker based on a
279	   single erroneous message, where at all possible, and hence meet this
280	   requirement.  The failure cases whereby NLRI cannot be extracted from
281	   the UPDATE message represent a case whereby the receiving system
282	   cannot handle the error gracefully based on this mechanism.

284	3.  Recovering RIB Consistency

286	   The recommendations described in Section 2 may result in the RIB for
287	   a topology within an AS being inconsistent across the AS' internal
288	   routers.  Alternatively, where such mechanisms are deployed at an AS
289	   boundary, interconnects between two ASes may be inconsistent with
290	   each other.  There are therefore risks of traffic blackholing, due to
291	   missing routing information, or forwarding loops.  Whilst this is
292	   deemed an acceptable compromise in the short term, clearly, it is
293	   suboptimal.  Therefore, a requirement exists to provide mechanisms by
294	   which a BGP speaker is able to recover the consistency of the Adj-
295	   RIB-In for a particular neighbour.

297	   It is envisaged that during such routing inconsistencies, the local
298	   BGP speaker is aware that some routing information was not able to be
299	   processed - due to the fact that an UPDATE message was not parsed
300	   correctly.  If the 'treat-as-withdraw' mechanism described within
301	   Section 2 is utilised, it is also possible for the local BGP speaker
302	   to have determined the set of NLRI for which an erroneous UPDATE
303	   message was received.  In this scenario, by utilising targeted
304	   mechanisms to re-request the specific NLRI that was unreachable, this
305	   routing information can be re-transmitted from the remote BGP
306	   speaker.  Such a request requires extension to the existing BGP-4
307	   protocol, in terms of specific UPDATE generation filters with a
308	   transient lifetime.  It is envisaged that the work within
309	   [I-D.zeng-one-time-prefix-orf] provides a mechanism allowing targeted
310	   elements of the Adj-RIB-In for a BGP neighbour to be recovered.

312	   In addition to such cases where specific routing information is known
313	   to be erroneous, the more general case where either a large amount of
314	   the Adj-RIB-In is contained in UPDATE messages subject to treat-as-
315	   withdraw, or the specific prefixes are unknown to the local BGP
316	   speaker must be considered.  In this case, there is a requirement for
317	   a BGP speaker to re-request the entire RIB advertised by a remote
318	   neighbour.  In this case, where such re-advertisement is required, it
319	   is envisaged that a ROUTE-REFRESH as per the description in [RFC2918]
320	   is utilised.  [I-D.keyur-bgp-enhanced-route-refresh] provides a means
321	   by which the ROUTE-REFRESH mechanism can be extended in order to meet
322	   this requirement.

324	   It is of particular note for both means of recovering RIB consistency
325	   described that these are effective only when considering transient
326	   errors within an implementation - for instance, should an RFC
327	   interpretation error within an implementation be present, regardless
328	   of the number of times a specific UPDATE is generated, it is likely
329	   that this error condition will persist.  For this reason, there is a
330	   requirement to consider the means by which such consistency recovery
331	   mechanisms are utilised.  It is not advisable that a transient
332	   filter and advertisement mechanism is triggered by all error handling
333	   events due to the load this is likely to place on the neighbour
334	   receiving such a request.  Where this BGP speaker is a relatively
335	   centralised device - a route reflector (as described by [RFC4456])
336	   for example - the act of generation of UPDATE messages with such
337	   frequency is likely to cause disproportionate load.  It is therefore
338	   an operational requirement that any such extension provides a means
339	   of request dampening.
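
   One possible form of request dampening is sketched below, purely as
   an illustration.  It applies an exponentially increasing hold-down
   to recovery requests sent to each neighbour.  The class name
   RefreshDampener and the initial and maximum hold-down values are
   assumptions made for this example, and are not drawn from any of the
   referenced drafts.

```python
from typing import Dict, Tuple


class RefreshDampener:
    """Limit how often a recovery request is sent to each neighbour."""

    def __init__(self, initial: float = 30.0, maximum: float = 960.0) -> None:
        # Hold-down values in seconds; both defaults are illustrative.
        self.initial = initial
        self.maximum = maximum
        # neighbour -> (earliest time of next request, next hold-down)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, neighbour: str, now: float) -> bool:
        """Return True if a request may be sent to `neighbour` at `now`."""
        next_ok, hold = self._state.get(neighbour, (0.0, self.initial))
        if now < next_ok:
            return False  # suppressed: a request was sent too recently
        # Permit the request and double the hold-down for the next one,
        # so that a persistent error generates progressively fewer
        # requests towards the neighbour.
        self._state[neighbour] = (now + hold, min(hold * 2, self.maximum))
        return True
```

   State is kept per neighbour, so dampening requests towards one route
   reflector does not delay recovery from another.  A fuller design
   would presumably also decay the hold-down after a quiet period,
   which this sketch omits.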

341	4.  Reducing the Impact of Session Reset

343	   Even where protocol enhancements allow errors in the BGP-4 protocol
344	   to cease to trigger NOTIFICATION messages, and hence reset a BGP
345	   session, it is clear that some error conditions may not be exited.
346	   In particular, errors due to existing state, or memory structures,
347	   associated with a specific BGP session will not be handled.  It is
348	   therefore important to consider how these error conditions are
349	   currently handled by the protocol.  It should be noted that the
350	   following discussion and analysis considers only those NOTIFICATION
351	   messages generated in response to errors in UPDATE messages (as
352	   defined by Section 6.3 in [RFC4271]).

354	   The existing NOTIFICATION behaviour triggers a reset of all elements
355	   of the BGP-4 session, as described in Section 6 of [RFC4271].  It is
356	   expected that session teardown requires an implementation to re-
357	   initialise all structures and state required for session maintenance.
358	   Clearly, there is some utility to this requirement, as error
359	   conditions in BGP are, in general, exited from.  However, this
360	   definition is responsible for the forwarding outages within networks
361	   utilising BGP for route propagation when each error is experienced.
362	   The requirement described in Section 2 is intended to reduce the
363	   cases whereby a NOTIFICATION is required, however, any mechanism
364	   implemented as a response to this requirement by definition cannot
365	   provide a session reset to the extent of that achieved by the current
366	   behaviour.

368	   In order to address this, there is a requirement for a means by which
369	   a BGP speaker can signal that an unhandled error condition in an
370	   UPDATE message occurred - requiring a session reset - yet also
371	   continue to utilise the paths advertised by the neighbour that are
372	   currently in use within the RIB.  In this case, the Adj-RIB-In
373	   received from the neighbour is not considered invalid, despite a
374	   NOTIFICATION, and session reset, being required.  This set of
375	   requirements is akin to those answered by the BGP Graceful Restart
376	   mechanism described in [RFC4724].  Since the operational requirement
377	   in this case is to provide a means to achieve a complete session
378	   restart without disrupting the forwarding path of those prefixes in
379	   use within a BGP speaker's RIB, it is expected that utilising a
380	   procedure similar to the Graceful Restart mechanism meets the error
381	   handling requirement.  Responding to an error condition (repeated
382	   or otherwise) with a message indicating that an error that cannot be
383	   handled has occurred, and forcing session reset whilst retaining
384	   forwarding information within the RIB, allows forwarding to all
385	   prefixes within a system's RIB to continue whilst the session
386	   restarts.  By placing a time bound on the restart lifetime, should an
387	   error condition not be transient - for example, should an error have
388	   occurred with the BGP process, rather than with a specific BGP
389	   session - the remote BGP speaker is still detected as an invalid
390	   device for forwarding.
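
   The time-bounded retention of forwarding state described above might
   be modelled as in the following sketch.  It is loosely modelled on
   the stale-route handling of Graceful Restart [RFC4724], but the
   class and method names, and the representation of routes as a
   dictionary of stale flags, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NeighbourRoutes:
    """Hypothetical per-neighbour state kept across a session reset."""
    restart_time: float  # bound on the restart, in seconds (assumed)
    routes: Dict[str, bool] = field(default_factory=dict)  # prefix -> stale
    reset_at: Optional[float] = None

    def session_reset(self, now: float) -> None:
        """Keep every route for forwarding, but mark it as stale."""
        self.reset_at = now
        for prefix in self.routes:
            self.routes[prefix] = True

    def route_received(self, prefix: str) -> None:
        """A route advertised again after the restart is no longer stale."""
        self.routes[prefix] = False

    def expire(self, now: float) -> None:
        """Purge routes still stale once the restart bound has elapsed."""
        if self.reset_at is not None and now - self.reset_at >= self.restart_time:
            self.routes = {p: s for p, s in self.routes.items() if not s}
            self.reset_at = None
```

   If the neighbour never re-establishes the session, every route is
   still stale when the bound expires and all of them are purged, so a
   non-transient failure is detected as it would be today, only later.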

392	   It should, however, be noted that a protocol enhancement meeting this
393	   requirement is not able to solve all error conditions - however, a
394	   complete restart of the BGP and TCP session between two BGP speakers
395	   implements an identical recovery mechanism to that which is achieved
396	   by the existing behaviour.  Where an error condition such as memory
397	   or configuration corruption has occurred in a BGP implementation, it
398	   is expected that a mechanism meeting this requirement continues to
399	   detect this, by means of a bound on time for session restart to
400	   occur.  Whilst there may be some consideration that packets continue
401	   to be forwarded through a device which can be in a failure mode of
402	   this nature for a longer period, due to this requirement, the
403	   architecture of modern IP routers should be considered.  A divided
404	   forwarding and control plane is common in many devices, as well as
405	   process separation for software-based devices - corruption of a
406	   specific protocol daemon does not necessarily imply forwarding is
407	   affected.  Indeed, where forwarding behaviour of a device is
408	   affected, it is envisaged that a failure detection mechanism (be it
409	   Bidirectional Forwarding Detection, or indeed BGP KEEPALIVE packets)
410	   will detect such a failure in almost all cases, with the symptomatic
411	   behaviour of such a failure being an invalid UPDATE message in very
412	   few other cases.

414	5.  Operational Toolset for Monitoring BGP

416	   A significant complexity that is introduced through the requirements
417	   defined in this document is that of monitoring BGP session status for
418	   an operator.  Although the existing error handling behaviour causes a
419	   disproportionate failure, session failure is extremely visible to
420	   most operational personnel within a Network Operator due to both
421	   existing definitions of SNMP trap mechanisms for BGP, along with the
422	   forwarding impact typically caused by such a failure.  By introducing
423	   mechanisms by which errors of this nature are not as visible, this is
424	   no longer the case.  There is a requirement that where subsets of the
425	   RIB on a device are no longer reachable from a BGP speaker, or indeed
426	   an AS, some mechanism to determine the cause is available to an
427	   operator.  Whilst, to some extent, this can be solved by mandating a
428	   sub-requirement of each of the aforementioned requirements that a BGP
429	   speaker must log where such errors occur, and are hence handled, this
430	   does not solve all cases.  In order to clarify this requirement, the
431	   example of the transmission of an erroneous Optional Transitive
432	   attribute can be considered.  Since, by definition, there is no
433	   requirement for all BGP speakers to parse such an attribute, a
434	   receiving router may treat NLRI as withdrawn based on an erroneous
435	   attribute not examined by its neighbour.  In this case, the upstream
436	   device or network, propagating the UPDATE, has no visibility of this
437	   error.  Operationally, however, it is of interest to the upstream
438	   router operator that such invalid information was propagated.

440	   The requirement for logging of error conditions in transmitted BGP
441	   messages, which are visible to only the receiver, cannot be achieved
442	   by any existing BGP message, or capability.  It is envisaged that
443	   each erroneous event should be transmitted to the remote peer -
444	   including the information as to the set of NLRI that were considered
445	   invalid.  Whilst with some mechanisms this is achieved by default
446	   (for example, One-Time Prefix ORF (Outbound Route Filtering)
447	   [I-D.zeng-one-time-prefix-orf] will transmit the set of prefixes that
448	   are required), the operator requirement is to know which prefixes may
449	   have been unreachable in all cases.  It is envisaged that an
450	   extension to meet this requirement will allow for such information to
451	   be transmitted between peers, and hence logged.  Such a mechanism may
452	   provide further utility as either a diagnostic or a logging toolset.
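
   As a sketch of the information such an extension might carry, the
   record below captures one erroneous UPDATE as seen by the receiver.
   The field names and the log format are assumptions made for this
   example only; they do not reflect the encoding of any existing BGP
   message or of the referenced drafts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpdateErrorReport:
    """Hypothetical record of one erroneous UPDATE, kept by the receiver."""
    neighbour: str    # address of the speaker that sent the UPDATE
    reason: str       # free-text description of the error
    action: str       # e.g. "treat-as-withdraw" or "session-reset"
    nlri: List[str]   # prefixes considered invalid (may be empty)

    def log_line(self) -> str:
        """Render the record as a single line suitable for local logging."""
        prefixes = ",".join(self.nlri) if self.nlri else "unknown"
        return (f"bgp-error neighbour={self.neighbour} action={self.action} "
                f"nlri={prefixes} reason=\"{self.reason}\"")
```

   The same fields could be logged locally and conveyed to the remote
   peer, so that the operator of the upstream router learns which
   prefixes were affected.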

454	   It should be noted that numerous work items within the IETF exist at
455	   the time of writing that begin to solve this requirement.  Within the
456	   IDR working group both [I-D.raszuk-bgp-diagnostic-message] and
457	   [I-D.ietf-idr-advisory] provide mechanisms by which such information
458	   can be propagated in-band to an existing BGP session.  Transmitting
459	   such diagnostic information in-band is considered the optimal means
460	   by which to propagate details of errors present in UPDATE messages,
461	   due to the fact that no additional protocols (and hence security and
462	   trust concerns) must be configured between two Autonomous Systems
463	   (where the errors occur at an AS boundary), and the load on each BGP
464	   speaker is increased only due to an additional capability, rather
465	   than an additional code base, and protocol.  Clearly, any mechanism
466	   implemented in-band to a BGP session is required to be relatively
467	   lightweight, since the information provided over the session is an
468	   enhancement to the operational visibility of the protocol, and should
469	   not disrupt core protocol operations.  Other, out-of-band, mechanisms
470	   (such as that proposed in [I-D.ietf-grow-bmp]) are likely to provide
471	   mechanisms by which further insight into BGP operation can be
472	   achieved.  The fact that such a protocol is implemented independently
473	   of the BGP protocol results in further flexibility to provide
474	   detailed protocol data, without introducing further complexity to the
475	   BGP protocol itself.

477	6.  Operational Complexities Introduced by Altering RFC4271

479	   The existing NOTIFICATION and subsequent teardown of a BGP session
480	   upon encountering an error has the advantage that a consistent
481	   approach to error handling is required of all implementations of the
482	   BGP-4 protocol.  This is of operational advantage, as it provides a
483	   clear expectation of the behaviour of the protocol.  The requirements
484	   defined herein add further complexity to the error-handling within
485	   BGP, and hence are liable to compromise the existing deterministic
486	   protocol behaviour.  It is therefore deemed that there is a further
487	   requirement to provide a clear method by which an erroneous UPDATE
488	   should be reacted to, in order that all protocol implementations
489	   provide a consistent means by which recovery is achieved.  A further
490	   complexity is introduced due to the disparate nature of the work
491	   items altering the BGP error handling behaviour - since all items are
492	   likely to be implemented as a BGP capability [RFC5492], situations
493	   are likely to occur between devices (especially those with different
494	   BGP implementations), where some of the mechanisms referenced are
495	   unsupported.  This adds further barriers to a standard definition of
496	   the BGP-4 error handling behaviour.
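
	   As an illustration of the capability mismatch described above, the
	   following minimal sketch derives the recovery mechanisms usable on
	   a session from what both speakers support.  The mechanism labels
	   and the function name are hypothetical; they are not capability
	   codes defined by [RFC5492] or by any other memo.  A session reset
	   is assumed to remain available in all cases, since it is the
	   behaviour defined in [RFC4271].

```python
# Hypothetical labels for optional recovery mechanisms, ordered from the
# least to the most resource-intensive. These are NOT real capability codes.
PREFERENCE = ("one-time-prefix-orf", "route-refresh")

# Assumed always available, as it is the base RFC 4271 behaviour.
FALLBACK = "session-reset"


def negotiated_mechanisms(local, peer):
    """Return the recovery mechanisms usable on a session, in preference order.

    `local` and `peer` are iterables of mechanism labels supported by the
    local speaker and advertised by the remote speaker respectively.
    """
    common = set(local) & set(peer)
    usable = [name for name in PREFERENCE if name in common]
    usable.append(FALLBACK)
    return usable


if __name__ == "__main__":
    ours = {"one-time-prefix-orf", "route-refresh"}
    theirs = {"route-refresh"}
    print(negotiated_mechanisms(ours, theirs))
```

	   In this sketch, two implementations that share no optional
	   mechanism are left with the session reset only.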

498	   In general, the approach considered ideal upon encountering an
499	   erroneous UPDATE message can be divided into two cases - those where
500	   the NLRI can be determined from the message, and those where it
501	   cannot be.  The latter case is the simpler of the two.  In this case,
502	   there is a requirement for the implementation to reset the BGP
503	   session, utilising the reduced-impact approach, described in
504	   Section 4.  In the case where the remote BGP speaker is in a
505	   transient error condition related to specific peer data structures,
506	   or state, a single instance of this behaviour is likely to exit the
507	   error condition.  In the case of implementation errors, it is
508	   possible that the BGP session in question may enter a continuous loop
509	   of being reset, with a partial RIB being held by one or more of the
510	   BGP speakers due to a non-deterministic order of UPDATE propagation.
511	   It is therefore a requirement that within this reduced-impact
512	   procedure any subsequent UPDATE messages that would result in further
513	   session resets are ignored.  Whilst this results in a condition where
514	   an undetermined amount of the RIB is inconsistent, partial
515	   reachability is maintained.  In this case, the operational toolsets
516	   discussed in Section 5 are likely to provide mechanisms by which this
517	   condition can be brought to the attention of the relevant operators.
518	   This requirement to accept a partial RIB, which may result in invalid
519	   traffic forwarding, is a direct result of the deployments of
520	   BGP-4, as described in Section 1.1.
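
	   The handling of the case where NLRI cannot be determined could be
	   sketched as follows.  This is an illustration under stated
	   assumptions rather than a definitive implementation: the class and
	   function names are hypothetical, and the state is assumed to be
	   held per neighbour and to survive the session reset itself.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REDUCED_IMPACT_RESET = "reduced-impact reset"
    IGNORE_UPDATE = "ignore update"


@dataclass
class NeighbourErrorState:
    """Hypothetical per-neighbour state, kept across session resets."""
    reset_performed: bool = False  # set once a reset has been triggered
    ignored_updates: int = 0       # count to surface to operators


def handle_undeterminable_update(state):
    """Decide how to react to an erroneous UPDATE with no recoverable NLRI."""
    if not state.reset_performed:
        # First occurrence: a single reset may clear a transient condition.
        state.reset_performed = True
        return Action.REDUCED_IMPACT_RESET
    # Error persists after the reset: ignore the UPDATE rather than loop.
    state.ignored_updates += 1
    return Action.IGNORE_UPDATE


if __name__ == "__main__":
    state = NeighbourErrorState()
    for _ in range(3):
        print(handle_undeterminable_update(state).value)
```

	   The counter of ignored UPDATE messages is one possible input to
	   the operational toolsets referred to above.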

522	   The case where NLRI can be determined from an erroneous UPDATE
523	   provides further complexities.  In this case, a BGP speaker is aware
524	   of the sub-set of the RIB which has been identified as being
525	   contained within invalid UPDATE messages.  This allows a local BGP
526	   speaker to re-request single prefixes, utilising a mechanism such as
527	   "one-time prefix ORF".  However, a similar result is achieved by re-
528	   requesting the entire RIB - albeit with greater resource
529	   requirements.  It is therefore expected that the process of recovery
530	   utilises a staged set of mechanisms to attempt to restore consistency
531	   of the RIB:

533	   1.  Where available, a mechanism capable of requesting only the NLRI
534	       determined to have been contained within an invalid UPDATE should
535	       be utilised.  However, since it is possible that such an error
536	       condition can be transient in nature, it is likely that more than
537	       one request is to be transmitted (assuming the first does not
538	       return a valid UPDATE message).  In order to allow a
539	       deterministic process, there is a requirement for a limit on the
540	       number of specific requests transmitted to be defined.

542	   2.  Where a specific refresh mechanism is not available, a peer
543	       should re-request the entire RIB.  Again, there is a requirement
544	       to limit the number of complete RIB requests that should be sent
545	       by an implementation, in order to provide a bound both on the
546	       expected level of load a device may experience, and on the time
547	       for which the RIB may be inconsistent.

549	   3.  Finally, a session reset should be performed, as per the reduced-
550	       impact NOTIFICATION requirement defined in Section 4.  At this
551	       point, a similar challenge to that discussed above exists, should
552	       the error condition persist.  In this case, as defined above,
553	       there is a requirement to ignore those UPDATE messages that
554	       continue to be erroneous.

556	   It is envisaged that where limits are required, these will be defined
557	   on a per-memo basis, or within a further revision of the requirements
558	   described herein.
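
	   The staged procedure above can be sketched as a small planner that
	   returns the next recovery step each time an erroneous UPDATE is
	   encountered for a given error condition.  The names and the limit
	   values are assumptions made purely for illustration, since no
	   limits are defined in this memo.  The sketch also assumes one
	   reading of the staging, namely that a speaker escalates from
	   targeted requests to a complete RIB request once the targeted limit
	   has been reached.

```python
from enum import Enum


class Step(Enum):
    TARGETED_REFRESH = 1  # re-request only the affected NLRI
    FULL_REFRESH = 2      # re-request the entire RIB
    SESSION_RESET = 3     # reduced-impact NOTIFICATION and reset
    IGNORE_UPDATE = 4     # error persists after reset: ignore the UPDATE


class RecoveryPlanner:
    """Hypothetical planner; limits are illustrative, not specified values."""

    def __init__(self, targeted_supported, max_targeted=3, max_full=2):
        self.targeted_supported = targeted_supported
        self.max_targeted = max_targeted
        self.max_full = max_full
        self.targeted_sent = 0
        self.full_sent = 0
        self.reset_done = False

    def next_step(self):
        """Return the step to take for the next erroneous UPDATE."""
        if self.targeted_supported and self.targeted_sent < self.max_targeted:
            self.targeted_sent += 1
            return Step.TARGETED_REFRESH
        if self.full_sent < self.max_full:
            self.full_sent += 1
            return Step.FULL_REFRESH
        if not self.reset_done:
            self.reset_done = True
            return Step.SESSION_RESET
        return Step.IGNORE_UPDATE


if __name__ == "__main__":
    planner = RecoveryPlanner(targeted_supported=True)
    for _ in range(8):
        print(planner.next_step().name)
```

	   Bounding each stage with a counter keeps the sequence
	   deterministic, and bounds both the load placed on the remote
	   speaker and the time for which the RIB may be inconsistent.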

560	   Whilst the approach described above provides a standard means by
561	   which error recovery may be handled on a per UPDATE basis, further
562	   complexities are raised where multiple errors occur.  Clearly,
563	   following this procedure causes control-plane load on both the BGP
564	   speakers - for this reason, consideration of the repeated use of the
565	   mechanisms discussed in this document is required.  It is notable
566	   that errors may not occur with UPDATE messages relating to only a
567	   single NLRI; independent errors in multiple NLRIs may be experienced.
568	   For this reason, it is required that an implementation rate-limits
569	   the number of error handling events sourced towards a particular
570	   neighbour.  It is expected that such rate limiting, or event
571	   suppression, is achieved on a per-session basis, where state
572	   information is already held, rather than on a per-prefix basis, as
573	   it is envisaged that per-prefix suppression presents significant
574	   scaling problems, and introduces further state requirements for an
575	   implementation of the protocol.  It is recommended that where a flag
576	   indicative of erroneous behaviour is implemented, the state of such a
577	   value is maintained independently of session establishment.
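
	   One possible form of the per-session rate limiting described above
	   is a sliding-window counter held per neighbour, sketched below.
	   The class name, the function name and the default limits are
	   hypothetical; this memo does not define any values.  Timestamps are
	   passed in explicitly so that the behaviour is deterministic.

```python
from collections import deque


class SessionErrorLimiter:
    """Allow at most `max_events` error-handling events per time window."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of recently allowed events

    def allow(self, now):
        """Return True if an error-handling event may be sourced at `now`."""
        # Discard events that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) < self.max_events:
            self._events.append(now)
            return True
        return False


# State is keyed by neighbour, not by prefix, so it scales with the number
# of sessions rather than with the size of the RIB.
limiters = {}


def allow_error_handling(neighbour, now, max_events=5, window_seconds=60.0):
    """Look up (or create) the limiter for `neighbour` and consult it."""
    limiter = limiters.setdefault(
        neighbour, SessionErrorLimiter(max_events, window_seconds)
    )
    return limiter.allow(now)


if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0):
        print(allow_error_handling("192.0.2.1", t, max_events=2))
```

	   Because the dictionary is keyed by neighbour rather than by
	   session instance, the state in this sketch persists across session
	   re-establishment.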

579	7.  IANA Considerations

581	   This memo includes no request to IANA.

583	8.  Security Considerations

585	   The requirements outlined in this document provide mechanisms by
586	   which erroneous BGP messages may be responded to with limited impact
587	   to forwarding operation.  This is of benefit to the security of a BGP
588	   speaker in general.  Where erroneous UPDATE messages originated by
589	   a single malicious Autonomous System or router within a network (or
590	   the Internet default free zone - DFZ) are propagated to all devices
591	   within the same routing domain, the resulting session resets make all
592	   other NLRI on the same sessions unreachable.  This mechanism
593	   may provide means by which an Autonomous System can be isolated from
594	   required routing domains (such as the Internet), should the relevant
595	   UPDATE messages be propagated via specific paths.  By reducing the
596	   impact of such failures, it is envisaged that this possibility may be
597	   constrained to a specific set of NLRI, or a specific topology.

599	   Some mechanisms meeting the requirements specified in this document,
600	   particularly those within Section 5, may raise further security
601	   concerns; however, it is envisaged that these are addressed in per-
602	   enhancement memos.

604	9.  Acknowledgements

606	   The author would like to thank Rob Evans, David Freedman, Tom
607	   Hodgson, Sven Huster, Jonathan Newton, Neil McRae, Thomas Mangin, Tom
608	   Scholl and Ilya Varlashkin for their review and valuable feedback.

610	10.  References

612	10.1.  Normative References

614	   [I-D.chen-ebgp-error-handling]
615	              Chen, E., Mohapatra, P., and K. Patel, "Revised Error
616	              Handling for BGP Updates from External Neighbors",
617	              draft-chen-ebgp-error-handling-00 (work in progress),
618	              September 2010.

620	   [I-D.ietf-grow-bmp]
621	              Scudder, J., Fernando, R., and S. Stuart, "BGP Monitoring
622	              Protocol", draft-ietf-grow-bmp-05 (work in progress),
623	              December 2010.

625	   [I-D.ietf-idr-advisory]
626	              Scholl, T., Scudder, J., Steenbergen, R., and D. Freedman,
627	              "BGP Advisory Message", draft-ietf-idr-advisory-00 (work
628	              in progress), October 2009.

630	   [I-D.ietf-idr-optional-transitive]
631	              Scudder, J. and E. Chen, "Error Handling for Optional
632	              Transitive BGP Attributes",
633	              draft-ietf-idr-optional-transitive-03 (work in progress),
634	              September 2010.

636	   [I-D.keyur-bgp-enhanced-route-refresh]
637	              Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
638	              Route Refresh Capability for BGP-4",
639	              draft-keyur-bgp-enhanced-route-refresh-02 (work in
640	              progress), March 2011.

642	   [I-D.raszuk-bgp-diagnostic-message]
643	              Raszuk, R., Chen, E., and B. Decraene, "BGP Diagnostic
644	              Message", draft-raszuk-bgp-diagnostic-message-02 (work in
645	              progress), March 2011.

647	   [I-D.zeng-one-time-prefix-orf]
648	              Zeng, Q. and J. Dong, "One-time Address-Prefix Based
649	              Outbound Route Filter for BGP-4",
650	              draft-zeng-one-time-prefix-orf-01 (work in progress),
651	              October 2010.

653	   [RFC2918]  Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
654	              September 2000.

656	   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
657	              Protocol 4 (BGP-4)", RFC 4271, January 2006.

659	   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
660	              Networks (VPNs)", RFC 4364, February 2006.

662	   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
663	              Reflection: An Alternative to Full Mesh Internal BGP
664	              (IBGP)", RFC 4456, April 2006.

666	   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
667	              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
668	              January 2007.

670	   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
671	              with BGP-4", RFC 5492, February 2009.

673	10.2.  Informational References

675	   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
676	              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
677	              June 2010.

679	Author's Address

681	   Rob Shakir
682	   Cable&Wireless Worldwide

684	   Email: rob.shakir@cw.com
685	   URI:   http://www.cw.com/