Network Working Group                                              J. Wu
Internet Draft                                                    Y. Cui
Intended Status: Standards Track                     Tsinghua University
Expires: August 16, 2009
                                                                 C. Metz
                                                                E. Rosen
                                                     Cisco Systems, Inc.

                                                       February 16, 2009

                         Softwire Mesh Framework

               draft-ietf-softwire-mesh-framework-06.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright and License Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Abstract

The Internet needs to be able to handle both IPv4 and IPv6 packets. However, it is expected that some constituent networks of the Internet will be "single protocol" networks. One kind of single protocol network can parse only IPv4 packets and can process only IPv4 routing information; another kind can parse only IPv6 packets and can process only IPv6 routing information. It is nevertheless required that either kind of single protocol network be able to provide transit service for the "other" protocol.
This is done by passing the "other kind" of routing information from one edge of the single protocol network to the other, and by tunneling the "other kind" of data packet from one edge to the other. The tunnels are known as "Softwires". This framework document explains how the routing information and the data packets of one protocol are passed through a single protocol network of the other protocol. The document is careful to specify when this can be done with existing technology, and when it requires the development of new or modified technology.

Table of Contents

1       Specification of requirements
2       Introduction
3       Scenarios of Interest
3.1     IPv6-over-IPv4 Scenario
3.2     IPv4-over-IPv6 Scenario
4       General Principles of the Solution
4.1     'E-IP' and 'I-IP'
4.2     Routing
4.3     Tunneled Forwarding
5       Distribution of Inter-AFBR Routing Information
6       Softwire Signaling
7       Choosing to Forward Through a Softwire
8       Selecting a Tunneling Technology
9       Selecting the Softwire for a Given Packet
10      Softwire OAM and MIBs
10.1    Operations and Maintenance (OAM)
10.2    MIBs
11      Softwire Multicast
11.1    One-to-One Mappings
11.1.1  Using PIM in the Core
11.1.2  Using mLDP and Multicast MPLS in the Core
11.2    MVPN-like Schemes
12      Inter-AS Considerations
13      IANA Considerations
14      Security Considerations
14.1    Problem Analysis
14.2    Non-cryptographic techniques
14.3    Cryptographic techniques
15      Contributors
16      Acknowledgments
17      Normative References
18      Informative References

1. Specification of requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Introduction

The routing information in any IP backbone network can be thought of as being in one of two categories: "internal routing information" or "external routing information". The internal routing information consists of routes to the nodes that belong to the backbone, and to the interfaces of those nodes. External routing information consists of routes to destinations beyond the backbone, especially destinations to which the backbone is not directly attached. In general, BGP [RFC4271] is used to distribute external routing information, and an "Interior Gateway Protocol" (IGP) such as OSPF [RFC2328] or IS-IS [RFC1195] is used to distribute internal routing information.

Often an IP backbone will provide transit routing services for packets that originate outside the backbone, and whose destinations are outside the backbone. These packets enter the backbone at one of its "edge routers".
They are routed through the backbone to another edge router, after which they leave the backbone and continue on their way. The edge nodes of the backbone are often known as "Provider Edge" (PE) routers. The term "ingress" (or "ingress PE") refers to the router at which a packet enters the backbone, and the term "egress" (or "egress PE") refers to the router at which it leaves the backbone. Interior nodes are often known as "P routers". Routers which are outside the backbone but directly attached to it are known as "Customer Edge" (CE) routers. (This terminology is taken from [RFC4364].)

When a packet's destination is outside the backbone, the routing information which is needed within the backbone in order to route the packet to the proper egress is, by definition, external routing information.

Traditionally, the external routing information has been distributed by BGP to all the routers in the backbone, not just to the edge routers (i.e., not just to the ingress and egress points). Each of the interior nodes has been expected to look up the packet's destination address and route it towards the egress point. This is known as "native forwarding": the interior nodes look into each packet's header in order to match the information in the header with the external routing information.

It is, however, possible to provide transit services without requiring that all the backbone routers have the external routing information. The routing information which BGP distributes to each ingress router specifies the egress router for each route. The ingress router can therefore "tunnel" the packet directly to the egress router. "Tunneling the packet" means putting on some sort of encapsulation header which will force the interior routers to forward the packet to the egress router. The original packet is known as the "encapsulation payload".
The P routers do not look at the packet header of the payload, but only at the encapsulation header. Since the path to the egress router is part of the internal routing information of the backbone, the interior routers then do not need to know the external routing information. This is known as "tunneled forwarding". Of course, before the packet can leave the egress, it has to be decapsulated.

The scenario where the P routers do not have external routes is sometimes known as a "BGP-free core". That is something of a misnomer, though, since the crucial aspect of this scenario is not that the interior nodes don't run BGP, but that they don't maintain the external routing information.

In recent years, we have seen this scenario deployed to support VPN services, as specified in [RFC4364]. An edge router maintains multiple independent routing/addressing spaces, one for each VPN to which it interfaces. However, the routing information for the VPNs is not maintained by the interior routers. In most of these scenarios, MPLS is used as the encapsulation mechanism for getting the packets from ingress to egress. There are some deployments in which an IP-based encapsulation, such as L2TPv3 (Layer Two Tunneling Protocol version 3) [RFC3931] or GRE (Generic Routing Encapsulation) [RFC2784], is used.

This same technique can also be useful when the external routing information consists not of VPN routes, but of "ordinary" Internet routes. It can be used any time it is desired to keep external routing information out of a backbone's interior nodes, or in fact any time it is desired for any reason to avoid the native forwarding of certain kinds of packets.

This framework focuses on two such scenarios.

1. In this scenario, the backbone's interior nodes support only IPv6. They do not maintain IPv4 routes at all, and are not expected to parse IPv4 packet headers.
Yet it is desired to use such a backbone to provide transit services for IPv4 packets. Therefore tunneled forwarding of IPv4 packets is required. Of course, the edge nodes must have the IPv4 routes, but the ingress must perform an encapsulation in order to get an IPv4 packet forwarded to the egress.

2. This scenario is the reverse of scenario 1, i.e., the backbone's interior nodes support only IPv4, but it is desired to use the backbone for IPv6 transit.

In these scenarios, a backbone whose interior nodes support only one of the two address families is required to provide transit services for the other. The backbone's edge routers must, of course, support both address families. We use the term "Address Family Border Router" (AFBR) to refer to these PE routers. The tunnels that are used for forwarding are referred to as "softwires".

These two scenarios are known as the "Softwire Mesh Problem" [SW-PROB], and the framework specified in this draft is therefore known as the "Softwire Mesh Framework". In this framework, only the AFBRs need to support both address families. The CE routers support only a single address family, and the P routers support only the other address family.

It is possible to address these scenarios via a large variety of tunneling technologies. This framework does not mandate the use of any particular tunneling technology. In any given deployment, the choice of tunneling technology is a matter of policy. The framework accommodates at least the use of MPLS ([RFC3031], [RFC3032]), both LDP-based (Label Distribution Protocol, [RFC5036]) and RSVP-TE-based ([RFC3209]), L2TPv3 [RFC3931], GRE [RFC2784], and IP-in-IP [RFC2003]. The framework will also accommodate the use of IPsec tunneling, when that is necessary in order to meet security requirements.

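To make the tunneled forwarding of scenario 1 (an IPv6-only core providing IPv4 transit) concrete, the sketch below builds a minimal IPv6 encapsulation header around an IPv4 payload. This is only an illustration, not part of the framework: the addresses and payload are invented, and a real AFBR performs this in its forwarding plane. The Next Header value 4 is the IANA protocol number for an encapsulated IPv4 packet.

```python
import socket
import struct

def encapsulate_ipv4_in_ipv6(inner_packet: bytes, src6: str, dst6: str,
                             hop_limit: int = 64) -> bytes:
    """Prepend a fixed 40-byte IPv6 header to an IPv4 packet.

    Next Header = 4 is the IANA protocol number meaning the payload
    is an encapsulated IPv4 packet (IP-in-IP style tunneling).
    """
    version_tc_flow = 6 << 28          # version 6, traffic class 0, flow label 0
    header = struct.pack(
        "!IHBB16s16s",
        version_tc_flow,
        len(inner_packet),             # payload length
        4,                             # next header: encapsulated IPv4
        hop_limit,
        socket.inet_pton(socket.AF_INET6, src6),
        socket.inet_pton(socket.AF_INET6, dst6),
    )
    return header + inner_packet

# Hypothetical ingress/egress AFBR addresses and a dummy 20-byte IPv4 header.
inner = b"\x45" + b"\x00" * 19
packet = encapsulate_ipv4_in_ipv6(inner, "2001:db8::1", "2001:db8::2")
assert len(packet) == 40 + len(inner)
assert packet[0] >> 4 == 6             # outer header is IPv6
```

The P routers route the result purely on the outer IPv6 header; only the egress AFBR ever looks at the IPv4 payload.
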
It is expected that in many deployments, the choice of tunneling technology will be made by a simple expression of policy, such as "always use IP-IP tunnels", or "always use LDP-based MPLS", or "always use L2TPv3".

However, other deployments may have a mixture of routers, some of which support, say, both GRE and L2TPv3, but others of which support only one of those techniques. It is therefore desirable to allow the network administration to create a small set of classes, and to configure each AFBR to be a member of one or more of these classes. Then the routers can advertise their class memberships to each other, and the encapsulation policies can be expressed as, e.g., "use L2TPv3 to tunnel to routers in class X, use GRE to tunnel to routers in class Y". To support such policies, it is necessary for the AFBRs to be able to advertise their class memberships; a standard way of doing this must be developed.

Policy may also require a certain class of traffic to receive a certain quality of service, and this may impact the choice of tunnel and/or tunneling technology used for packets in that class. This needs to be accommodated by the softwires framework.

The use of tunneled forwarding often requires that some sort of signaling protocol be used to set up and/or maintain the tunnels. Many of the tunneling technologies accommodated by this framework already have their own signaling protocols. However, some do not, and in some cases the standard signaling protocol for a particular tunneling technology may not be appropriate, for one or another reason, in the scenarios of interest. In such cases (and in such cases only), new signaling methodologies need to be defined and standardized.

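A minimal sketch of how such a class-based policy might be evaluated at an ingress AFBR. The class names, router names, and data structures here are all invented for illustration; the mechanism for advertising class memberships is, as noted above, left for future standardization.

```python
# Hypothetical class memberships, as they might be learned from
# each AFBR's (yet-to-be-standardized) advertisements.
CLASS_MEMBERSHIP = {
    "afbr1": {"X"},          # supports L2TPv3 only
    "afbr2": {"X", "Y"},     # supports both L2TPv3 and GRE
    "afbr3": {"Y"},          # supports GRE only
}

# Local policy: "use L2TPv3 to tunnel to routers in class X,
# use GRE to tunnel to routers in class Y" (first match wins).
POLICY = [("X", "l2tpv3"), ("Y", "gre")]

def select_encapsulation(egress: str) -> str:
    """Pick an encapsulation for a given egress AFBR from local policy."""
    classes = CLASS_MEMBERSHIP.get(egress, set())
    for cls, encaps in POLICY:
        if cls in classes:
            return encaps
    raise LookupError(f"no usable softwire encapsulation for {egress}")

assert select_encapsulation("afbr1") == "l2tpv3"
assert select_encapsulation("afbr3") == "gre"
```

Note that the ordering of the policy list resolves the case of an egress that belongs to several classes.
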
In this framework, the softwires do not form an overlay topology which is visible to routing; routing adjacencies are not maintained over the softwires, and routing control packets are not sent through the softwires. Routing adjacencies among backbone nodes (including the edge nodes) are maintained via the native technology of the backbone.

There is already a standard routing method for distributing external routing information among AFBRs, namely BGP. However, in the scenarios of interest, we may be using IPv6-based BGP sessions to pass IPv4 routing information, and we may be using IPv4-based BGP sessions to pass IPv6 routing information. Furthermore, when IPv4 traffic is to be tunneled over an IPv6 backbone, it is necessary to encode the "BGP next hop" for an IPv4 route as an IPv6 address, and vice versa. The method for encoding an IPv4 address as the next hop for an IPv6 route is specified in [V6NLRI-V4NH]; the method for encoding an IPv6 address as the next hop for an IPv4 route is specified in [V4NLRI-V6NH].

3. Scenarios of Interest

3.1. IPv6-over-IPv4 Scenario

In this scenario, the client networks run IPv6 but the backbone network runs IPv4. This is illustrated in Figure 1.

              +--------+              +--------+
              |  IPv6  |              |  IPv6  |
              | Client |              | Client |
              | Network|              | Network|
              +--------+              +--------+
                  |  \                 /  |
                  |   \               /   |
                  |    \             /    |
                  |           X           |
                  |    /             \    |
                  |   /               \   |
                  |  /                 \  |
              +--------+              +--------+
              |  AFBR  |              |  AFBR  |
           +--| IPv4/6 |--------------| IPv4/6 |--+
           |  +--------+              +--------+  |
+--------+ |                                      | +--------+
|  IPv4  | |                                      | |  IPv4  |
| Client | |                                      | | Client |
| Network|-|                 IPv4                 |-| Network|
+--------+ |                 only                 | +--------+
           |                                      |
           |  +--------+              +--------+  |
           +--| AFBR   |--------------| AFBR   |--+
              | IPv4/6 |              | IPv4/6 |
              +--------+              +--------+
                  |  \                 /  |
                  |   \               /   |
                  |    \             /    |
                  |           X           |
                  |    /             \    |
                  |   /               \   |
                  |  /                 \  |
              +--------+              +--------+
              |  IPv6  |              |  IPv6  |
              | Client |              | Client |
              | Network|              | Network|
              +--------+              +--------+

                Figure 1  IPv6-over-IPv4 Scenario

The IPv4 transit core may or may not run MPLS. If it does, MPLS may be used as part of the solution.

While Figure 1 does not show any "backdoor" connections among the client networks, this framework assumes that there will be such connections. That is, there is no assumption that the only path between two client networks is via the pictured transit core network. Hence the routing solution must be robust in any kind of topology.

Many mechanisms for providing IPv6 connectivity across IPv4 networks have been devised over the past ten years. A number of different tunneling mechanisms have been used, some provisioned manually, others based on special addressing. More recently, L3VPN (Layer 3 Virtual Private Network) techniques from [RFC4364] have been extended to provide IPv6 connectivity, using MPLS in the AFBRs and optionally in the backbone [V6NLRI-V4NH]. The solution described in this framework can be thought of as a superset of [V6NLRI-V4NH], with a more generalized scheme for choosing the tunneling (softwire) technology.
In this framework, MPLS is allowed, but not required, even at the AFBRs. As in [V6NLRI-V4NH], there is no manual provisioning of tunnels, and no special addressing is required.

3.2. IPv4-over-IPv6 Scenario

In this scenario, the client networks run IPv4 but the backbone network runs IPv6. This is illustrated in Figure 2.

              +--------+              +--------+
              |  IPv4  |              |  IPv4  |
              | Client |              | Client |
              | Network|              | Network|
              +--------+              +--------+
                  |  \                 /  |
                  |   \               /   |
                  |    \             /    |
                  |           X           |
                  |    /             \    |
                  |   /               \   |
                  |  /                 \  |
              +--------+              +--------+
              |  AFBR  |              |  AFBR  |
           +--| IPv4/6 |--------------| IPv4/6 |--+
           |  +--------+              +--------+  |
+--------+ |                                      | +--------+
|  IPv6  | |                                      | |  IPv6  |
| Client | |                                      | | Client |
| Network|-|                 IPv6                 |-| Network|
+--------+ |                 only                 | +--------+
           |                                      |
           |  +--------+              +--------+  |
           +--| AFBR   |--------------| AFBR   |--+
              | IPv4/6 |              | IPv4/6 |
              +--------+              +--------+
                  |  \                 /  |
                  |   \               /   |
                  |    \             /    |
                  |           X           |
                  |    /             \    |
                  |   /               \   |
                  |  /                 \  |
              +--------+              +--------+
              |  IPv4  |              |  IPv4  |
              | Client |              | Client |
              | Network|              | Network|
              +--------+              +--------+

                Figure 2  IPv4-over-IPv6 Scenario

The IPv6 transit core may or may not run MPLS. If it does, MPLS may be used as part of the solution.

While Figure 2 does not show any "backdoor" connections among the client networks, this framework assumes that there will be such connections. That is, there is no assumption that the only path between two client networks is via the pictured transit core network. Hence the routing solution must be robust in any kind of topology.

While the issue of IPv6-over-IPv4 has received considerable attention in the past, the scenario of IPv4-over-IPv6 has not. Yet it is a significant emerging requirement, as a number of service providers are building IPv6 backbone networks and do not wish to provide native IPv4 support in their core routers.
These service providers have a large legacy of IPv4 networks and applications that need to operate across their IPv6 backbone. Solutions for this do not exist yet because it had always been assumed that the backbone networks of the foreseeable future would be dual stack.

4. General Principles of the Solution

This section gives a very brief overview of the procedures. The subsequent sections provide more detail.

4.1. 'E-IP' and 'I-IP'

In the following we use the term "I-IP" ("Internal IP") to refer to the form of IP (i.e., either IPv4 or IPv6) that is supported by the transit network. We use the term "E-IP" ("External IP") to refer to the form of IP that is supported by the client networks. In the scenarios of interest, E-IP is IPv4 if and only if I-IP is IPv6, and E-IP is IPv6 if and only if I-IP is IPv4.

We assume that the P routers support only I-IP. That is, they are expected to have only I-IP routing information, and they are not expected to be able to parse E-IP headers. We similarly assume that the CE routers support only E-IP.

The AFBRs handle both I-IP and E-IP. However, only I-IP is used on an AFBR's core-facing interfaces, and E-IP is used only on its client-facing interfaces.

4.2. Routing

The P routers and the AFBRs of the transit network participate in an IGP, for the purposes of distributing I-IP routing information.

The AFBRs use IBGP to exchange E-IP routing information with each other. Either there is a full mesh of IBGP connections among the AFBRs, or else some or all of the AFBRs are clients of a BGP Route Reflector. Although these IBGP connections are used to pass E-IP routing information (i.e., the NLRI of the BGP updates is in the E-IP address family), the IBGP connections run over I-IP, and the "BGP next hop" for each E-IP NLRI is in the I-IP address family.

4.3. Tunneled Forwarding

When an ingress AFBR receives an E-IP packet from a client-facing interface, it looks up the packet's destination IP address. In the scenarios of interest, the best match for that address will be a BGP-distributed route whose next hop is the I-IP address of another AFBR, the egress AFBR.

The ingress AFBR must forward the packet through a tunnel (i.e., through a "softwire") to the egress AFBR. This is done by encapsulating the packet, using an encapsulation header which the P routers can process, and which will cause the P routers to send the packet to the egress AFBR. The egress AFBR then extracts the payload, i.e., the original E-IP packet, and forwards it further by looking up its IP destination address.

Several kinds of tunneling technologies are supported. Some of those technologies require explicit AFBR-to-AFBR signaling before the tunnel can be used, others do not.

Transmitting a packet through a softwire always requires that an encapsulation header be added to the original packet. The resulting packet is therefore always longer than the encapsulation payload. As an operational matter, the Maximum Transmission Unit (MTU) of the softwire's path SHOULD be large enough so that (a) no packet will need to be fragmented before being encapsulated, and (b) no encapsulated packet will need to be fragmented while it is being forwarded along a softwire. A general discussion of MTU issues in the context of tunneled forwarding may be found in [RFC4459].

5. Distribution of Inter-AFBR Routing Information

AFBRs peer with routers in the client networks to exchange routing information for the E-IP family.

AFBRs use BGP to distribute the E-IP routing information to each other.
This can be done by an AFBR-to-AFBR mesh of IBGP sessions, but more likely is done through a BGP Route Reflector, i.e., where each AFBR has an IBGP session to one or two Route Reflectors, rather than to other AFBRs.

The BGP sessions between the AFBRs, or between the AFBRs and the Route Reflector, will run on top of the I-IP address family. That is, if the transit core supports only IPv6, the IBGP sessions used to distribute IPv4 routing information from the client networks will run over IPv6; if the transit core supports only IPv4, the IBGP sessions used to distribute IPv6 routing information from the client networks will run over IPv4. The BGP sessions thus use the native networking layer of the core; BGP messages are NOT tunneled through softwires or through any other mechanism.

In BGP, a routing update associates an address prefix (or more generally, "Network Layer Reachability Information", or NLRI) with the address of a "BGP Next Hop" (NH). The NLRI is associated with a particular address family. The NH address is also associated with a particular address family, which may be the same as or different from the address family associated with the NLRI. Generally the NH address belongs to the address family that is used to communicate with the BGP speaker to whom the NH address belongs.

Since routing updates which contain information about E-IP address prefixes are carried over BGP sessions that use I-IP transport, and since the BGP messages are not tunneled, a BGP update providing information about an E-IP address prefix will need to specify a next hop address in the I-IP family.

Due to a variety of historical circumstances, when the NLRI and the NH in a given BGP update are of different address families, it is not always obvious how the NH should be encoded. There is a different encoding procedure for each pair of address families.

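As an illustration of the family relationships just described, the sketch below models a BGP update whose NLRI is in one (E-IP) family while its next hop is taken from the family of the session transport (the I-IP family). The class and field names are invented for illustration; they do not correspond to any on-the-wire encoding.

```python
from dataclasses import dataclass

@dataclass
class BgpUpdate:
    nlri_family: str      # address family of the advertised prefix (E-IP)
    prefix: str
    nh_family: str        # address family of the BGP next hop
    next_hop: str

def make_update(prefix: str, nlri_family: str, session_family: str,
                local_address: str) -> BgpUpdate:
    """Advertise ourselves as next hop.

    The NH is taken from the family of the session transport (the
    I-IP family), even though the NLRI is in the other (E-IP) family.
    """
    return BgpUpdate(nlri_family, prefix, session_family, local_address)

# An IPv4 client route advertised over an IPv6-transport IBGP session:
upd = make_update("192.0.2.0/24", "ipv4", "ipv6", "2001:db8::1")
assert upd.nlri_family == "ipv4" and upd.nh_family == "ipv6"
```

How such a mixed-family next hop is actually encoded on the wire is exactly what [V6NLRI-V4NH] and [V4NLRI-V6NH] specify.
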
In the case where the NLRI is in the IPv6 address family, and the NH is in the IPv4 address family, [V6NLRI-V4NH] explains how to encode the NH.

In the case where the NLRI is in the IPv4 address family, and the NH is in the IPv6 address family, [V4NLRI-V6NH] explains how to encode the NH.

If a BGP speaker sends an update for an NLRI in the E-IP family, and the update is being sent over a BGP session that is running on top of the I-IP network layer, and the BGP speaker is advertising itself as the NH for that NLRI, then the BGP speaker MUST, unless explicitly overridden by policy, specify the NH address in the I-IP family. The address family of the NH MUST NOT be changed by a Route Reflector.

In some cases (e.g., when [V4NLRI-V6NH] is used), one cannot follow this rule unless one's BGP peers have advertised a particular BGP capability. This leads to the following softwires deployment restriction: if a BGP Capability is defined for the case in which an E-IP NLRI has an I-IP NH, all the AFBRs in a given transit core MUST advertise that capability.

If an AFBR has multiple IP addresses, the network administrators usually have considerable flexibility in choosing which one the AFBR uses to identify itself as the next hop in a BGP update. However, if the AFBR expects to receive packets through a softwire of a particular tunneling technology, and if the AFBR is known to that tunneling technology via a specific IP address, then that same IP address must be used to identify the AFBR in the next hop field of the BGP updates. For example, if L2TPv3 tunneling is used, then the IP address which the AFBR uses when engaging in L2TPv3 signaling must be the same as the IP address it uses to identify itself in the next hop field of a BGP update.

In [V6NLRI-V4NH], IPv6 routing information is distributed using the labeled IPv6 address family.
This allows the egress AFBR to associate an MPLS label with each IPv6 address prefix. If an ingress AFBR forwards packets through a softwire that can carry MPLS packets, each data packet can carry the MPLS label corresponding to the IPv6 route that it matched. This may be useful at the egress AFBR, for demultiplexing and/or enhanced performance. It is also possible to do the same for the IPv4 address family, i.e., to use the labeled IPv4 address family instead of the IPv4 address family. The use of the labeled IP address families in this manner is OPTIONAL.

6. Softwire Signaling

A mesh of inter-AFBR softwires spanning the transit core must be in place before packets can flow between client networks. Given N dual-stack AFBRs, this requires N^2 "point-to-point IP" or "label switched path" (LSP) tunnels. While in theory these could be configured manually, that would result in a very undesirable O(N^2) provisioning problem. Therefore manual configuration of point-to-point tunnels is not considered part of this framework.

Because the transit core is providing layer 3 transit services, point-to-point tunnels are not required by this framework; multipoint-to-point tunnels are all that is needed. In a multipoint-to-point tunnel, when a packet emerges from the tunnel there is no way to tell which router put the packet into the tunnel. This models the native IP forwarding paradigm, wherein the egress router cannot determine a given packet's ingress router. Of course, point-to-point tunnels might be required for some reason which goes beyond the basic requirements described in this document. E.g., QoS or security considerations might require the use of point-to-point tunnels. So point-to-point tunnels are allowed, but not required, by this framework.

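The provisioning argument can be made concrete with a little arithmetic. The sketch below simply counts the tunnel endpoints to be configured under each model, assuming (for illustration) that the point-to-point case needs one tunnel per ordered pair of AFBRs, while the multipoint-to-point case needs only one decapsulation endpoint per AFBR.

```python
def provisioning_cost(n_afbrs: int, point_to_point: bool) -> int:
    """Number of tunnel endpoints to configure across the mesh.

    A full mesh of point-to-point softwires needs one tunnel per
    ordered AFBR pair (O(N^2)); multipoint-to-point needs only one
    decapsulation endpoint per AFBR (O(N)).
    """
    if point_to_point:
        return n_afbrs * (n_afbrs - 1)
    return n_afbrs

# With 10 AFBRs: 90 point-to-point tunnels vs. 10 endpoints.
assert provisioning_cost(10, point_to_point=True) == 90
assert provisioning_cost(10, point_to_point=False) == 10
```

The gap widens quadratically as AFBRs are added, which is why manual point-to-point provisioning is excluded from this framework.
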
604 If it is desired to use a particular tunneling technology for the 605 softwires, and if that technology has its own "native" signaling 606 methodology, the presumption is that the native signaling will be 607 used. This would certainly apply to MPLS-based softwires, where LDP 608 or RSVP-TE would be used. A softwire based on IPsec would use 609 standard IKEv2 (Internet Key Exchange) [RFC4306] and IPsec [RFC4301] 610 signaling, as that is necessary in order to guarantee the softwire's 611 security properties. 613 A softwire based on GRE might or might not require signaling, 614 depending on whether various optional GRE header fields are to be 615 used. GRE does not have any "native" signaling, so for those cases, 616 a signaling procedure needs to be developed to support softwires. 618 Another possible softwire technology is L2TPv3. While L2TPv3 does 619 have its own native signaling, that signaling sets up point-to-point 620 tunnels. For the purpose of softwires, it is better to use L2TPv3 in 621 a multipoint-to-point mode, and this requires a different kind of 622 signaling. 624 The signaling to be used for GRE and L2TPv3 to cover these scenarios 625 is BGP-based, and is described in [ENCAPS-SAFI]. 627 If IP-IP tunneling is used, or if GRE tunneling is used without 628 options, no signaling is required, as the only information needed by 629 the ingress AFBR to create the encapsulation header is the IP address 630 of the egress AFBR, and that is distributed by BGP. 632 When the encapsulation IP header is constructed, there may be fields 633 in the IP header whose values are determined neither by whatever signaling has 634 been done nor by the distributed routing information. The values of 635 these fields are determined by policy in the ingress AFBR. Examples 636 of such fields are the TTL (Time to Live) field, the DSCP 637 (Differentiated Services Code Point) bits, etc.
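As an illustration of filling in the policy-determined fields, the following sketch packs a minimal IPv4 delivery header (RFC 791 layout, RFC 1071 checksum). The TTL and DSCP values shown are invented policy choices; no real router API is implied:

```python
import struct
import ipaddress

def build_ipv4_encaps_header(src, dst, payload_len, proto, ttl=64, dscp=0):
    """Pack a 20-byte IPv4 delivery header.  TTL and DSCP are supplied
    by ingress-AFBR policy; they come neither from signaling nor from
    the distributed routing information."""
    ver_ihl = (4 << 4) | 5          # IPv4, 5 x 32-bit words, no options
    tos = dscp << 2                 # DSCP occupies the top 6 bits
    total_len = 20 + payload_len
    header = struct.pack('!BBHHHBBH4s4s',
                         ver_ihl, tos, total_len,
                         0, 0,      # identification, flags/fragment offset
                         ttl, proto,
                         0,         # checksum placeholder, filled in below
                         ipaddress.IPv4Address(src).packed,
                         ipaddress.IPv4Address(dst).packed)
    # RFC 1071 one's-complement checksum over the header words
    s = sum(struct.unpack('!10H', header))
    s = (s & 0xffff) + (s >> 16)
    s = (s & 0xffff) + (s >> 16)
    return header[:10] + struct.pack('!H', ~s & 0xffff) + header[12:]

# Hypothetical policy: EF traffic class (DSCP 46) and TTL 255
hdr = build_ipv4_encaps_header('192.0.2.1', '198.51.100.2', 100,
                               proto=47, ttl=255, dscp=46)  # 47 = GRE
```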
639 It is desirable for all necessary softwires to be fully set up before 640 the arrival of any packets which need to go through the softwires. 641 That is, the softwires should be "always on". From the perspective 642 of any particular AFBR, the softwire endpoints are always BGP next 643 hops of routes which the AFBR has installed. This suggests that any 644 necessary softwire signaling should either be done as part of 645 normal system startup (as would happen, e.g., with LDP-based MPLS), 646 or else should be triggered by the reception of BGP routing 647 information (such as is described in [ENCAPS-SAFI]); it is also 648 helpful if distribution of the routing information that serves as the 649 trigger is prioritized. 651 7. Choosing to Forward Through a Softwire 653 The decision to forward through a softwire, instead of forwarding 654 natively, is made by the ingress AFBR. This decision is a matter of 655 policy. 657 In many cases, the policy will be very simple. Some useful policies 658 are: 660 - if routing says that an E-IP packet has to be sent out a "core- 661 facing interface" to an I-IP core, then send the packet through a 662 softwire 664 - if routing says that an E-IP packet has to be sent out an 665 interface that only supports I-IP packets, then send the E-IP 666 packets through a softwire 668 - if routing says that the BGP next hop address for an E-IP packet 669 is an I-IP address, then send the E-IP packets through a softwire 671 - if the route which is the best match for a particular packet's 672 destination address is a BGP-distributed route, then send the 673 packet through a softwire (i.e., tunnel all BGP-routed packets). 675 More complicated policies are also possible, but a consideration of 676 those policies is outside the scope of this document. 678 8. Selecting a Tunneling Technology 680 The choice of tunneling technology is a matter of policy configured 681 at the ingress AFBR.
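Both policy decisions made at the ingress AFBR, whether to tunnel a packet at all (Section 7) and which technology to use (this section), can be sketched together. The route attributes and policy table below are invented for illustration and do not correspond to any real BGP implementation:

```python
# Hypothetical per-remote-AFBR encapsulation policy, plus a default
ENCAPS_BY_PEER = {'192.0.2.2': 'l2tpv3'}
DEFAULT_ENCAPS = 'ldp-mpls'

def ingress_policy(route, i_ip_family='ipv4'):
    """Return None to forward natively, or the tunneling technology
    to use for packets matching this route."""
    # Section 7 policy: tunnel when the BGP next hop is an I-IP address
    if route['next_hop_family'] != i_ip_family:
        return None
    # Section 8 policy: technology chosen per remote endpoint
    return ENCAPS_BY_PEER.get(route['next_hop'], DEFAULT_ENCAPS)

# An IPv6 client route whose BGP next hop is an IPv4 core address
route = {'prefix': '2001:db8::/32',
         'next_hop': '192.0.2.2', 'next_hop_family': 'ipv4'}
```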
683 It is envisioned that in most cases, the policy will be a very simple 684 one, and will be the same at all the AFBRs of a given transit core. 685 E.g., "always use LDP-based MPLS", or "always use L2TPv3". 687 However, other deployments may have a mixture of routers, some of 688 which support, say, both GRE and L2TPv3, but others of which support 689 only one of those techniques. It is desirable therefore to allow the 690 network administration to create a small set of classes, and to 691 configure each AFBR to be a member of one or more of these classes. 692 Then the routers can advertise their class memberships to each other, 693 and the encapsulation policies can be expressed as, e.g., "use L2TPv3 694 to talk to routers in class X, use GRE to talk to routers in class 695 Y". To support such policies, it is necessary for the AFBRs to be 696 able to advertise their class memberships. [ENCAPS-SAFI] specifies a 697 way in which an AFBR may advertise, to other AFBRs, various 698 characteristics which may be relevant to the policy (e.g., "I belong 699 to class Y"). In many cases, these characteristics can be 700 represented by arbitrarily selected communities or extended 701 communities, and the policies at the ingress can be expressed in 702 terms of these classes (i.e., communities). 704 Policy may also require a certain class of traffic to receive a 705 certain quality of service, and this may impact the choice of tunnel 706 and/or tunneling technology used for packets in that class. This 707 framework allows a variety of tunneling technologies to be used for 708 instantiating softwires. The choice of tunneling technology is a 709 matter of policy, as discussed in section 2.
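A class-based encapsulation policy of the kind described above could be sketched as a lookup on the classes (communities) that a remote AFBR has advertised via [ENCAPS-SAFI]. The class names and the policy table are invented for illustration:

```python
# Ingress encapsulation policy, keyed by the class (community) that the
# remote AFBR advertised; earlier entries take precedence.
ENCAPS_POLICY = {
    'class-X': 'l2tpv3',
    'class-Y': 'gre',
}
DEFAULT_ENCAPS = 'ldp-mpls'

def choose_encapsulation(remote_afbr_classes):
    """Pick a tunneling technology for a remote AFBR, given the set of
    classes it advertised; fall back to a network-wide default."""
    for cls, encaps in ENCAPS_POLICY.items():
        if cls in remote_afbr_classes:
            return encaps
    return DEFAULT_ENCAPS
```

For example, a remote AFBR advertising only class Y would be reached over GRE, while one advertising both classes would be reached over L2TPv3, since class X is listed first.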
711 While in many cases the policy will be unconditional, e.g., "always 712 use L2TPv3 for softwires", in other cases the policy may specify that 713 the choice is conditional upon information about the softwire remote 714 endpoint, e.g., "use L2TPv3 to talk to routers in class X, use GRE to 715 talk to routers in class Y". If each such class is represented as a 718 community or extended 719 community, then [ENCAPS-SAFI] specifies a method that AFBRs can use 720 to advertise their class memberships to each other. 722 This framework also allows for policies of arbitrary complexity, 723 which may depend on characteristics or attributes of individual 724 address prefixes, as well as on QoS or security considerations. 725 However, the specification of such policies is not within the scope 726 of this document. 728 9. Selecting the Softwire for a Given Packet 730 Suppose it has been decided to send a given packet through a 731 softwire. Routing provides the address, in the address family of the 732 transport network, of the BGP next hop. The packet MUST be sent 733 through a softwire whose remote endpoint address is the same as the 734 BGP next hop address. 736 Sending a packet through a softwire is a matter of encapsulating the 737 packet with an encapsulation header that can be processed by the 738 transit network, and then transmitting it towards the softwire's remote 739 endpoint address. 741 In many cases, once one knows the remote endpoint address, one has 742 all the information one needs in order to form the encapsulation 743 header. This will be the case if the tunnel technology instantiating 744 the softwire is, e.g., LDP-based MPLS, IP-in-IP, or GRE without 745 optional header fields.
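The rule above, that a packet's softwire is the one whose remote endpoint address equals the BGP next hop, can be sketched as a simple lookup; the softwire table and its entries are hypothetical:

```python
# Softwires already established to remote AFBRs, keyed by the remote
# endpoint address in the transport (I-IP) family.
softwires = {
    '192.0.2.2': {'encaps': 'gre'},
    '192.0.2.3': {'encaps': 'l2tpv3', 'session_id': 42},
}

def softwire_for(bgp_next_hop):
    """The packet MUST be sent through the softwire whose remote
    endpoint address equals the BGP next hop of the matching route."""
    try:
        return softwires[bgp_next_hop]
    except KeyError:
        raise LookupError('no softwire to BGP next hop %s' % bgp_next_hop)
```

A missing entry here corresponds to the "always on" requirement of Section 6 not yet being satisfied for that next hop.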
747 If the tunnel technology being used is L2TPv3 or GRE with optional 748 header fields, additional information from the remote endpoint is 749 needed in order to form the encapsulation header. The procedures for 750 sending and receiving this information are described in [ENCAPS- 751 SAFI]. 753 If the tunnel technology being used is RSVP-TE-based MPLS or IPsec, 754 the native signaling procedures of those technologies will need to be 755 used. 757 If the packet being sent through the softwire matches a route in the 758 labeled IPv4 or labeled IPv6 address families, it should be sent 759 through the softwire as an MPLS packet with the corresponding label. 760 Note that most of the tunneling technologies mentioned in this 761 document are capable of carrying MPLS packets, so this does not 762 presuppose support for MPLS in the core routers. 764 10. Softwire OAM and MIBs 766 10.1. Operations and Maintenance (OAM) 768 Softwires are essentially tunnels connecting routers. If they 769 disappear or degrade in performance, then connectivity through those 770 tunnels will be impacted. There are several techniques available to 771 monitor the status of the tunnel end-points (AFBRs) as well as the 772 tunnels themselves. These techniques allow operations such as 773 softwire path tracing, remote softwire end-point pinging, and remote 774 softwire end-point liveness detection. 776 Examples of techniques applicable to softwire OAM include: 778 o BGP/TCP timeouts between AFBRs 780 o ICMP or LSP echo request and reply addressed to a particular AFBR 782 o BFD (Bidirectional Forwarding Detection) [BFD] packet exchange 783 between AFBR routers 785 Another possibility for softwire OAM is to build something similar to 786 [RFC4378], in other words, to create and generate softwire echo 787 request/reply packets.
The echo request sent to a well-known UDP 788 port would contain the egress AFBR IP address and the softwire 789 identifier as the payload (similar to the MPLS forwarding equivalence 790 class contained in the LSP echo request). The softwire echo packet 791 would be encapsulated with the encapsulation header and forwarded 792 across the same path (inband) as that of the softwire itself. 794 This mechanism can also be automated to periodically verify remote 795 softwire end-point reachability, with the loss of reachability being 796 signaled to the softwires application on the local AFBR, thus enabling 797 suitable actions to be taken. Consideration must be given to the 798 trade-offs between scalability of such mechanisms versus time to 799 detection of loss of endpoint reachability for such automated 800 mechanisms. 802 In general, a framework for softwire OAM can be based in large part 803 on the [RFC4176] framework. 805 10.2. MIBs 807 Specific MIBs do exist to manage elements of the softwire mesh 808 framework. However, there will be a need to either extend these MIBs 809 or create new ones that reflect the functional elements that can be 810 SNMP-managed within the softwire network. 812 11. Softwire Multicast 814 A set of client networks, running E-IP, that are connected to a 815 provider's I-IP transit core, may wish to run IP multicast 816 applications. Extending IP multicast connectivity across the transit 817 core can be done in a number of ways, each with a different set of 818 characteristics. Most (though not all) of the possibilities are 819 slight variations of the procedures defined for L3VPNs in 820 [L3VPN-MCAST]. 822 We will focus on supporting those multicast features and protocols 823 which are typically used across inter-provider boundaries. Support 824 is provided for PIM-SM (PIM Sparse Mode) and PIM-SSM (PIM Source- 825 Specific Mode).
Support for BIDIR-PIM (Bidirectional PIM), BSR 826 (Bootstrap Router Mechanism for PIM), and AutoRP (Automatic Rendezvous 827 Point Determination) is not provided, as these features are not 828 typically used across inter-provider boundaries. 830 11.1. One-to-One Mappings 832 In the "one-to-one mapping" scheme, each client multicast tree is 833 extended through the transit core, so that for each client tree there 834 is exactly one tree through the core. 836 The one-to-one scheme is not used in [L3VPN-MCAST], because it 837 requires an amount of state in the core routers which is proportional 838 to the number of client multicast trees passing through the core. In 839 the VPN context, this is considered undesirable, because the amount 840 of state is unbounded and out of the control of the service provider. 841 However, the one-to-one scheme models the typical "Internet 842 multicast" scenario where the client network and the transit core are 843 both IPv4 or are both IPv6. If it scales satisfactorily for that 844 case, it should also scale satisfactorily for the case where the 845 client network and the transit core support different versions of IP. 847 11.1.1. Using PIM in the Core 849 When an AFBR receives an E-IP PIM control message from one of its 850 CEs, it would translate it from E-IP to I-IP, and forward it towards 851 the source of the tree. Since the routers in the transit core will 852 not generally have a route to the source of the tree, the AFBR must 853 include an "RPF (Reverse Path Forwarding) Vector" [RPF-VECTOR] 854 in the PIM message. 856 Suppose an AFBR A receives an E-IP PIM Join/Prune message from a CE, 857 for either an (S,G) tree or a (*,G) tree. The AFBR would have to 858 "translate" the PIM message into an I-IP PIM message. It would then 859 send it to the neighbor which is the next hop along the route to the 860 root of the (S,G) or (*,G) tree.
In the case of an (S,G) tree the 861 root of the tree is S; in the case of a (*,G) tree the root of the 862 tree is the Rendezvous Point (RP) for the group G. 864 Note that the address of the root of the tree will be an E-IP 865 address. Since the routers within the transit core (other than the 866 AFBRs) do not have routes to E-IP addresses, A must put an "RPF 867 Vector" [RPF-VECTOR] in the PIM Join/Prune message that it sends to 868 its upstream neighbor. The RPF Vector will identify, as an I-IP 869 address, the AFBR B that is the egress point in the transit network 870 along the route to the root of the multicast tree. AFBR B is AFBR 871 A's "BGP next hop" for the route to the root of the tree. The RPF 872 Vector allows the core routers to forward PIM Join/Prune messages 873 upstream towards the root of the tree, even though they do not 874 maintain E-IP routes. 876 In order to "translate" an E-IP PIM message into an I-IP PIM 877 message, the AFBR A must translate the address of S (in the case of 878 an (S,G) group) or the address of G's RP from the E-IP address family 879 to the I-IP address family, and the AFBR B must translate them back. 881 In the case where E-IP is IPv4 and I-IP is IPv6, it may be possible 882 to do this translation algorithmically. A can translate the IPv4 S 883 into the corresponding IPv4-mapped IPv6 address [RFC4291], and then B 884 can translate it back. At the time of this writing, there is no such 885 thing as an IPv4-mapped IPv6 multicast address, but if such a thing 886 were to be standardized, then A could also translate the IPv4 G into 887 IPv6, and B could translate it back. The precise circumstances under 888 which these translations are to be done would be a matter of policy. 890 Obviously, this translation procedure does not generalize to the case 891 where the client multicast is IPv6 but the core is IPv4. To handle 892 that case, one needs additional signaling between the two AFBRs.
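The algorithmic translation described above, mapping an IPv4 source S to an IPv4-mapped IPv6 address [RFC4291] at AFBR A and back at AFBR B, can be sketched with Python's standard ipaddress module:

```python
import ipaddress

def v4_source_to_v6(s_v4):
    """AFBR A: map an IPv4 source address into the IPv4-mapped
    IPv6 range ::ffff:0:0/96 [RFC4291]."""
    return ipaddress.IPv6Address('::ffff:' + s_v4)

def v6_source_to_v4(s_v6):
    """AFBR B: recover the original IPv4 source address."""
    mapped = ipaddress.IPv6Address(s_v6).ipv4_mapped
    if mapped is None:
        raise ValueError('not an IPv4-mapped IPv6 address: %s' % s_v6)
    return mapped
```

As the text notes, no analogous mapping is defined for multicast group addresses, so this sketch covers only the source address S.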
893 Each downstream AFBR needs to signal the upstream AFBR that it needs a 894 multicast tunnel for (S,G). The upstream AFBR must then assign a 895 multicast address G' to the tunnel, and inform the downstream AFBR of 896 the G' value to use. The downstream AFBR then uses PIM/IPv4 to join the 897 (S', G') tree, where S' is the IPv4 address of the upstream AFBR. 900 The (S', G') trees should be SSM trees. 902 This procedure can be used to support client multicasts of either 903 IPv4 or IPv6 over a transit core of the opposite protocol. However, 904 it only works when the client multicasts are SSM, since it provides 905 no method for mapping a client "prune a source off the (*,G) tree" 906 operation into an operation on the (S',G') tree. This method also 907 requires additional signaling. The BGP-based signaling of [L3VPN- 908 MCAST-BGP] is one signaling method that could be used. Other 909 signaling methods could be defined as well. 911 11.1.2. Using mLDP and Multicast MPLS in the Core 913 If the transit core implements mLDP (LDP Extensions for Point-to- 914 Multipoint and Multipoint-to-Multipoint LSPs, [mLDP]) and supports 915 multicast MPLS, then client Source-Specific Multicast (SSM) trees can 916 be mapped one-to-one onto P2MP (Point-to-Multipoint) LSPs. 918 When an AFBR A receives an E-IP PIM Join/Prune message for (S,G) from 919 one of its CEs, where G is an SSM group, it would use mLDP to join a 920 P2MP LSP. The root of the P2MP LSP would be the AFBR B that is A's 921 BGP next hop on the route to S. In mLDP, a P2MP LSP is uniquely 922 identified by a combination of its root and a "FEC (Forwarding 923 Equivalence Class) identifier". The original (S,G) can be 924 algorithmically encoded into the FEC identifier, so that all AFBRs 925 that need to join the P2MP LSP for (S,G) will generate the same FEC 926 identifier.
When the root of the P2MP LSP (AFBR B) receives such an 927 mLDP message, it extracts the original (S,G) from the FEC identifier, 928 creates an "ordinary" E-IP PIM Join/Prune message, and sends it to 929 the CE which is its next hop on the route to S. 931 The method of encoding the (S,G) into the FEC identifier needs to be 932 standardized. The encoding must be self-identifying, so that a node 933 which is the root of a P2MP LSP can determine whether a FEC 934 identifier is the result of having encoded a PIM (S,G). 936 The appropriate state machinery must be standardized so that PIM 937 events at the AFBRs result in the proper mLDP events. For example, 938 if at some point an AFBR determines (via PIM procedures) that it no 939 longer has any downstream receivers for (S,G), the AFBR should invoke 940 the proper mLDP procedures to prune itself off the corresponding P2MP 941 LSP. 943 Note that this method cannot be used when G is a Sparse Mode 944 group. The reason this method cannot be used is that mLDP does not 945 have any function corresponding to the PIM "prune this source off the 946 shared tree" function. So if a (*,G) tree were mapped one-to-one onto 947 a P2MP LSP, duplicate traffic could end up traversing the transit 948 core (i.e., traffic from S might travel down both the shared tree and 949 S's source tree). Alternatively, one could devise an AFBR-to-AFBR 950 protocol to prune sources off the P2MP LSP at the root of the LSP. 951 It is recommended, though, that client SM multicast groups be supported 952 by other methods, such as those discussed below. 954 Client-side bidirectional multicast groups set up by PIM-bidir could 955 be mapped using the above technique to MP2MP (Multipoint-to- 956 Multipoint) LSPs set up by mLDP [mLDP]. We do not consider this 957 further, as inter-provider bidirectional groups are not in use 958 anywhere. 960 11.2.
MVPN-like Schemes 962 The "MVPN-like schemes" are those described in [L3VPN-MCAST] and its 963 companion documents (such as [L3VPN-MCAST-BGP]). To apply those 964 schemes to the softwire environment, it is necessary only to treat 965 all the AFBRs of a given transit core as if they were all, for 966 multicast purposes, PE routers attached to the same VPN. 968 The MVPN-like schemes do not require a one-to-one mapping between 969 client multicast trees and transit core multicast trees. In the MVPN 970 environment, it is a requirement that the number of trees in the core 971 scales less than linearly with the number of client trees. This 972 requirement may not hold in the softwires scenarios. 974 The MVPN-like schemes can support SM, SSM, and Bidir groups. They 975 provide a number of options for the control plane: 977 - LAN-like 979 Use a set of multicast trees in the core to emulate a LAN (Local 980 Area Network), and run the client-side PIM protocol over that 981 "LAN". The "LAN" can consist of a single Bidir tree containing 982 all the AFBRs, or a set of SSM trees, one rooted at each AFBR, 983 and containing all the other AFBRs as receivers. 985 - NBMA (Non-Broadcast Multiple Access), using BGP 987 The client-side PIM signaling can be "translated" into BGP-based 988 signaling, with a BGP route reflector mediating the signaling. 990 These two basic options admit of many variations; a comprehensive 991 discussion is in [L3VPN-MCAST]. 993 For the data plane, there are also a number of options: 995 - All multicast data sent over the emulated LAN. This particular 996 option is not very attractive for the softwires scenarios, though, 997 as every AFBR would have to receive every client multicast 998 packet. 1000 - Every multicast group mapped to a tree which is considered 1001 appropriate for that group, in the sense of not causing the traffic 1002 of that group to go to "too many" AFBRs that don't need to 1003 receive it.
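The data-plane trade-off above, the shared emulated-LAN tree versus a more specific tree, can be sketched as a selection heuristic. The threshold and the rule itself are invented for illustration and are not part of [L3VPN-MCAST]:

```python
def tree_for_group(receiver_afbrs, all_afbrs, waste_threshold=0.5):
    """Map a client multicast group either onto the shared emulated-LAN
    tree or onto a more specific tree, depending on how many AFBRs
    would receive traffic they do not need."""
    wasted = len(all_afbrs - receiver_afbrs) / len(all_afbrs)
    if wasted <= waste_threshold:
        return 'emulated-lan'    # acceptable: few AFBRs get unwanted traffic
    return 'dedicated-tree'      # too many AFBRs would discard the traffic
```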
1005 Again, a comprehensive discussion of the issues can be found in 1006 [L3VPN-MCAST]. 1008 12. Inter-AS Considerations 1010 We have so far only considered the case where a "transit core" 1011 consists of a single Autonomous System (AS). If the transit core 1012 consists of multiple ASes, then it may be necessary to use softwires 1013 whose endpoints are AFBRs attached to different Autonomous Systems. 1014 In this case, the AFBR at the remote endpoint of a softwire is not 1015 the BGP next hop for packets that need to be sent on the softwire. 1016 Since the procedures described above require the address of the remote 1017 softwire endpoint to be the same as the address of the BGP next hop, 1018 those procedures do not work as specified when the transit core 1019 consists of multiple ASes. 1021 There are several ways to deal with this situation. 1023 1. Don't do it; require that there be AFBRs at the edge of each 1024 AS, so that a transit core does not extend beyond one AS. 1026 2. Use multi-hop EBGP to allow AFBRs to send BGP routes to each 1027 other, even if the AFBRs are not in the same or in neighboring 1028 ASes. 1030 3. Ensure that an ASBR which is not an AFBR does not change the 1031 next hop field of the routes for which encapsulation is needed. 1033 In the latter two cases, BGP recursive next hop resolution needs to 1034 be done, and encapsulations may need to be "stacked" (i.e., multiple 1035 layers of encapsulation may need to be used). 1037 For instance, consider packet P with destination IP address D. 1038 Suppose it arrives at ingress AFBR A1, and that the route that is the 1039 best match for D has BGP next hop B1. So A1 will encapsulate the 1040 packet for delivery to B1. If B1 is not within A1's AS, A1 will need 1041 to look up the route to B1 and then find the BGP next hop, call it 1042 B2, of that route.
If the interior routers of A1's AS do not have 1043 routes to B1, then A1 needs to encapsulate the packet a second time, 1044 this time for delivery to B2. 1046 13. IANA Considerations 1048 This document has no actions for IANA. 1050 14. Security Considerations 1052 14.1. Problem Analysis 1054 In the Softwires mesh framework, the data packets that are 1055 encapsulated are E-IP data packets that are traveling through the 1056 Internet. These data packets (the Softwires "payload") may or may 1057 not need such security features as authentication, integrity, 1058 confidentiality, or replay protection. However, the security needs 1059 of the payload packets are independent of whether or not those 1060 packets are traversing softwires. The fact that a particular payload 1061 packet is traveling through a softwire does not in any way affect its 1062 security needs. 1064 Thus the only security issues we need to consider are those which 1065 affect the I-IP encapsulation headers, rather than those which affect 1066 the E-IP payload. 1068 Since the encapsulation headers determine the routing of packets 1069 traveling through softwires, they must appear "in the clear". 1071 In the Softwires mesh framework, for each tunnel receiving endpoint, 1072 there are one or more "valid" transmitting endpoints, where the valid 1073 transmitting endpoints are those which are authorized to tunnel 1074 packets to the receiving endpoint. If the encapsulation header has 1075 no guarantee of authentication or integrity, then it is possible to 1076 have spoofing attacks, in which unauthorized nodes send encapsulated 1077 packets to the receiving endpoint, giving the receiving endpoint the 1078 false impression that the encapsulated packets have really traveled 1079 through the softwire. Replay attacks are also possible. 1081 The effect of such attacks is somewhat limited, though.
The receiving 1082 endpoint of a softwire decapsulates the payload and does further 1083 routing based on the IP destination address of the payload. Since 1084 the payload packets are traveling through the Internet, they have 1085 addresses from the globally unique address space (rather than, e.g., 1086 from a private address space of some sort). Therefore these attacks 1087 cannot cause payload packets to be delivered to an address other than 1088 the one appearing in the destination IP address field of the payload 1089 packet. 1091 However, attacks of this sort can result in policy violations. The 1092 authorized transmitting endpoint(s) of a softwire may be following a 1093 policy according to which only certain payload packets get sent 1094 through the softwire. If unauthorized nodes are able to encapsulate 1095 the payload packets so that they arrive at the receiving endpoint 1096 looking as if they arrived from authorized nodes, then those 1097 policies have been side-stepped. 1099 Attacks of the sort we are considering can also be used in Denial of 1100 Service attacks on the receiving tunnel endpoints. However, such 1101 attacks cannot be prevented by use of cryptographic 1102 authentication/integrity techniques, as the need to do cryptography 1103 on spoofed packets only makes the Denial of Service problem worse. 1104 (The assumption is that the cryptography mechanisms are likely to be 1105 more costly than the decapsulation/forwarding mechanisms. So if one 1106 tries to eliminate a flooding attack on the decapsulation/forwarding 1107 mechanisms by discarding packets that do not pass a cryptographic 1108 integrity test, one ends up just trading one kind of attack for 1109 another.) 1111 This section is largely based on the security considerations section 1112 of RFC 4023, which also deals with encapsulations and tunnels. 1114 14.2.
Non-cryptographic techniques 1116 If a tunnel lies entirely within a single administrative domain, 1117 then, to a certain extent, there are non-cryptographic 1118 techniques one can use to prevent spoofed packets from reaching a 1119 tunnel's receiving endpoint. For example, when the tunnel 1120 encapsulation is IP-based: 1122 - The tunnel receiving endpoints can be given a distinct set of 1123 addresses, and those addresses can be made known to the border 1124 routers. The border routers can then filter out packets, 1125 destined to those addresses, which arrive from outside the 1126 domain. 1128 - The tunnel transmitting endpoints can be given a distinct set of 1129 addresses, and those addresses can be made known to the border 1130 routers and to the tunnel receiving endpoints. The border routers 1131 can filter out all packets arriving from outside the domain with 1132 source addresses that are in this set, and the receiving 1133 endpoints can discard all packets which appear to be part of a 1134 softwire, but whose source addresses are not in this set. 1136 If an MPLS-based encapsulation is used, the border routers can refuse 1137 to accept MPLS packets from outside the domain, or can refuse to 1138 accept such MPLS packets whenever the top label corresponds to the 1139 address of a tunnel receiving endpoint. 1141 These techniques assume that within a domain, the network is secure 1142 enough to prevent the introduction of spoofed packets from within the 1143 domain itself. That may not always be the case. Also, these 1144 techniques can be difficult or impossible to use effectively 1145 for tunnels that are not in the same administrative domain. 1147 A different technique is to have the encapsulation header contain a 1148 cleartext password. The 64-bit "cookie" of L2TPv3 [RFC3931] is 1149 sometimes used in this way.
This can be useful within an 1150 administrative domain if it is regarded as infeasible for an attacker 1151 to spy on packets that originate in the domain and that do not leave 1152 the domain. An attacker would then not be able to discover the 1153 password. An attacker could of course try to guess the password, but 1154 if the password is an arbitrary 64-bit binary sequence, brute force 1155 attacks which run through all the possible passwords would be 1156 infeasible. This technique may be easier to manage than ingress 1157 filtering is, and may be just as effective if the assumptions hold. 1158 Like ingress filtering, though, it may not be applicable for tunnels 1159 that cross domain boundaries. 1161 Therefore it is necessary to also consider the use of cryptographic 1162 techniques for setting up the tunnels and for passing data through 1163 them. 1165 14.3. Cryptographic techniques 1167 If the path between the two endpoints of a tunnel is not adequately 1168 secure, then 1170 - If a control protocol is used to set up the tunnels (e.g., to 1171 inform one tunnel endpoint of the IP address of the other), the 1172 control protocol MUST have an authentication mechanism, and this 1173 MUST be used when the tunnel is set up. If the tunnel is set up 1174 automatically as the result of, for example, information 1175 distributed by BGP, then the use of BGP's MD5-based 1176 authentication mechanism [RFC2385] is satisfactory. 1178 - Data transmission through the tunnel should be secured with 1179 IPsec. In the remainder of this section, we specify the way 1180 IPsec may be used, and the implementation requirements we mention 1181 are meant to be applicable whenever IPsec is being used. 1183 We consider only the case where IPsec is used together with an IP- 1184 based tunneling mechanism. Use of IPsec with an MPLS-based tunneling 1185 mechanism is for further study. 
1187 If it is deemed necessary to use tunnels that are protected by IPsec, 1188 the tunnel type SHOULD be negotiated by the tunnel endpoints using 1189 the procedures specified in [ENCAPS-IPSEC]. That document allows the 1190 use of IPsec tunnel mode, but also allows one to treat the tunnel 1191 head and the tunnel tail as the endpoints of a Security Association, 1192 and to use IPsec transport mode. 1194 In order to use IPsec transport mode, encapsulated packets should be 1195 viewed as originating at the tunnel head and as being destined for 1196 the tunnel tail. A single IP address of the tunnel head will be used 1197 as the source IP address, and a single IP address of the tunnel tail 1198 will be used as the destination IP address. This technique can be 1199 used to carry MPLS packets through an IPsec Security Association, 1200 by first encapsulating the MPLS packets in MPLS-in-IP or MPLS-in-GRE 1201 [RFC4023] and then applying IPsec transport mode. 1203 When IPsec is used to secure softwires, IPsec MUST provide 1204 authentication and integrity. Thus, the implementation MUST support 1205 either ESP (IP Encapsulating Security Payload) with null encryption 1207 [RFC4303] or else AH (IP Authentication Header) [RFC4302]. ESP with 1208 encryption MAY be supported. If ESP is used, the tunnel tail MUST 1209 check that the source IP address of any packet received on a given SA 1210 (IPsec Security Association) is the one expected, as specified in 1211 [RFC4301] section 5.2 step 4. 1213 Since the softwires are set up dynamically as a byproduct of passing 1214 routing information, key distribution MUST be done automatically by 1215 means of IKEv2 [RFC4306]. If a PKI (Public Key Infrastructure) is 1216 not available, the IPsec Tunnel Authenticator sub-TLV described in 1217 [ENCAPS-IPSEC] MUST be used and validated before setting up an SA.
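The per-SA source-address check cited above ([RFC4301] section 5.2 step 4) can be sketched as follows; the SA record layout is invented for illustration:

```python
def accept_on_sa(sa, outer_src):
    """Tunnel-tail check when ESP is used: the source IP address of a
    packet received on a given SA must be the one expected for that SA;
    otherwise the packet is discarded."""
    if outer_src != sa['expected_src']:
        return False     # spoofed or misdirected; drop the packet
    return True

# Hypothetical SA for a softwire from a remote AFBR at 192.0.2.2
sa = {'spi': 0x1000, 'expected_src': '192.0.2.2'}
```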
The selectors associated with the SA are the source and destination addresses of the encapsulation header, along with the IP protocol number representing the encapsulation protocol being used.

15. Contributors

Xing Li
Tsinghua University
Department of Electronic Engineering, Tsinghua University
Beijing 100084
P.R. China

Phone: +86-10-6278-5983
Email: xing@cernet.edu.cn

Simon Barber
Cisco Systems, Inc.
250 Longwater Avenue
Reading, ENGLAND, RG2 6GB
United Kingdom

Email: sbarber@cisco.com

Pradosh Mohapatra
Cisco Systems, Inc.
3700 Cisco Way
San Jose, Ca. 95134
USA

Email: pmohapat@cisco.com

John Scudder
Juniper Networks
1194 North Mathilda Avenue
Sunnyvale, California 94089
USA

Email: jgs@juniper.net

16. Acknowledgments

David Ward, Chris Cassar, Gargi Nalawade, Ruchi Kapoor, Pranav Mehta, Mingwei Xu and Ke Xu provided useful input into this document.

Authors' Addresses

Jianping Wu
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R. China

Phone: +86-10-6278-5983
Email: jianping@cernet.edu.cn

Yong Cui
Tsinghua University
Department of Computer Science, Tsinghua University
Beijing 100084
P.R. China

Phone: +86-10-6278-5822
Email: yong@csnet1.cs.tsinghua.edu.cn

Chris Metz
Cisco Systems, Inc.
3700 Cisco Way
San Jose, Ca. 95134
USA

Email: chmetz@cisco.com

Eric C. Rosen
Cisco Systems, Inc.
1414 Massachusetts Avenue
Boxborough, MA 01719
USA

Email: erosen@cisco.com

17. Normative References

[ENCAPS-IPSEC] "BGP IPSec Tunnel Encapsulation Attribute", L. Berger, R. White, E. Rosen, draft-ietf-softwire-encaps-ipsec-01.txt, April 2008.

[ENCAPS-SAFI] "BGP Information SAFI and BGP Tunnel Encapsulation Attribute", P.
Mohapatra and E. Rosen, draft-ietf-softwire-encaps-safi-05.txt, February 2009.

[RFC2003] "IP Encapsulation within IP", C. Perkins, RFC 2003, October 1996.

[RFC2119] "Key words for use in RFCs to Indicate Requirement Levels", S. Bradner, RFC 2119, March 1997.

[RFC2784] "Generic Routing Encapsulation (GRE)", D. Farinacci, T. Li, S. Hanks, D. Meyer, P. Traina, RFC 2784, March 2000.

[RFC3031] "Multiprotocol Label Switching Architecture", E. Rosen, A. Viswanathan, R. Callon, RFC 3031, January 2001.

[RFC3032] "MPLS Label Stack Encoding", E. Rosen, D. Tappan, G. Fedorkow, Y. Rekhter, D. Farinacci, T. Li, A. Conta, RFC 3032, January 2001.

[RFC3209] D. Awduche, L. Berger, D. Gan, T. Li, V. Srinivasan, G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[RFC3931] J. Lau, M. Townsley, I. Goyret, "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

[RFC4023] "Encapsulating MPLS in IP or Generic Routing Encapsulation (GRE)", T. Worster, Y. Rekhter, E. Rosen, RFC 4023, March 2005.

[V4NLRI-V6NH] F. Le Faucheur, E. Rosen, "Advertising an IPv4 NLRI with an IPv6 Next Hop", draft-ietf-softwire-v4nlri-v6nh-02.txt, January 2009.

[V6NLRI-V4NH] J. De Clercq, D. Ooms, S. Prevost, F. Le Faucheur, "Connecting IPv6 Islands over IPv4 MPLS using IPv6 Provider Edge Routers (6PE)", RFC 4798, February 2007.

18. Informative References

[BFD] D. Katz, D. Ward, "Bidirectional Forwarding Detection", draft-ietf-bfd-base-09.txt, February 2009.

[L3VPN-MCAST] "Multicast in MPLS/BGP IP VPNs", E. Rosen, R. Aggarwal, draft-ietf-l3vpn-2547bis-mcast-07.txt, July 2008.

[L3VPN-MCAST-BGP] "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", R. Aggarwal, E. Rosen, T. Morin, Y. Rekhter, C. Kodeboniya, draft-ietf-l3vpn-2547bis-mcast-bgp-05.txt, June 2008.
[MLDP] "Label Distribution Protocol Extensions for Point-to-Multipoint and Multipoint-to-Multipoint Label Switched Paths", I. Minei, K. Kompella, IJ. Wijnands, B. Thomas, draft-ietf-mpls-ldp-p2mp-05.txt, June 2008.

[RFC1195] "Use of OSI IS-IS for Routing in TCP/IP and Dual Environments", R. Callon, RFC 1195, December 1990.

[RFC2328] J. Moy, "OSPF Version 2", RFC 2328, April 1998.

[RFC2385] "Protection of BGP Sessions via the TCP MD5 Signature Option", A. Heffernan, RFC 2385, August 1998.

[RFC4176] Y. El Mghazli, T. Nadeau, M. Boucadair, K. Chan, A. Gonguet, "Framework for Layer 3 Virtual Private Networks (L3VPN) Operations and Management", RFC 4176, October 2005.

[RFC4271] Y. Rekhter, T. Li, S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4291] "IP Version 6 Addressing Architecture", R. Hinden, S. Deering, RFC 4291, February 2006.

[RFC4301] "Security Architecture for the Internet Protocol", S. Kent, K. Seo, RFC 4301, December 2005.

[RFC4302] "IP Authentication Header", S. Kent, RFC 4302, December 2005.

[RFC4303] "IP Encapsulating Security Payload (ESP)", S. Kent, RFC 4303, December 2005.

[RFC4306] "Internet Key Exchange (IKEv2) Protocol", C. Kaufman, Ed., RFC 4306, December 2005.

[RFC4364] E. Rosen, Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC4378] D. Allan, T. Nadeau, "A Framework for Multi-Protocol Label Switching (MPLS) Operations and Management (OAM)", RFC 4378, February 2006.

[RFC4459] P. Savola, "MTU and Fragmentation Issues with In-the-Network Tunneling", RFC 4459, April 2006.

[RFC5036] "LDP Specification", L. Andersson, I. Minei, B. Thomas, RFC 5036, October 2007.

[RPF-VECTOR] "The RPF Vector TLV", IJ. Wijnands, A. Boers, E. Rosen, draft-ietf-pim-rpf-vector-08.txt, January 2009.

[SW-PROB] X. Li, S.
Dawkins, D. Ward, A. Durand, "Softwire Problem Statement", RFC 4925, July 2007.