idr                                                            W. Kumari
Internet-Draft                                                    Google
Intended status: Informational                                  K. Patel
Expires: April 21, 2013                                    Cisco Systems
                                                              J. Scudder
                                                        Juniper Networks
                                                        October 18, 2012

                       Automagic peering at IXPs.
                     draft-wkumari-idr-socialite-02

Abstract

   This document describes a method for automatically establishing BGP
   peering sessions at an Internet exchange point. Creation of these
   peering sessions is facilitated by a host.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 21, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements notation
   2. Terminology
   3. Overview
   4. Protocol Extensions
     4.1. Debut Capability
   5. Packet Formats
     5.1. Message Header
     5.2. INTRODUCTION Record
     5.3. WITHDRAW Record
   6. Protocol operation
   7. Operational overview / implications
     7.1. Additional eBGP sessions.
     7.2. Simplified debugging.
     7.3. BGP PATH / SIDR implications
   8. IANA Considerations
     8.1. Debut TYPE registry
   9. Security considerations
     9.1. Privacy
   10. Acknowledgements
   11. Author Notes
     11.1. Changelog.
     11.2. Changes from -00 to -01
     11.3. Changes from -01 to -02
   12. References
     12.1. Normative References
     12.2. Informative References
   Authors' Addresses

1. Introduction

   A large amount of Internet traffic is exchanged at Internet Exchange
   Points (IXPs). These are networks that are specifically built and
   operated as locations for networks to peer and exchange traffic.

   Public peering refers to peering across the (IXP provided) switch
   fabric. To avoid each participant at the IXP having to contact all
   of the other participants to enter into peering relationships, the
   IXP often provides a Route Server (RS). The Route Server is a BGP
   speaker that participants peer with and announce routes to.
   The Route Server takes these announcements and serves them to all of
   the other participants who peer with it (so far this is just like
   any other BGP router!). The Route Server differs from a standard
   eBGP speaker in that it neither updates the Next Hop, nor prepends
   its own AS to the AS Path attribute. Because the Next Hop attribute
   is unchanged, traffic between participants flows directly between
   those participants (and does not pass through the Route Server). As
   the traffic doesn't flow through the Route Server, it is appropriate
   that it doesn't appear in the AS Path; this is known as a
   transparent Route Server. By not showing up in the AS Path, the
   Route Server hides the fact that the peering between the
   participants occurs over a public peering session, and participants
   are not penalized by having longer AS Paths.

   This document describes an alternate solution for peering at an IXP.
   Instead of having a server that re-announces the routes from each
   participant to all of the others, we introduce a "socialite", a
   device that is responsible for making introductions between all of
   the participants and facilitating connections between them. This
   socialite can be thought of like a host at a dinner party. The
   guests arrive and the socialite introduces them to each other, and
   then steps out of the way to allow them to communicate (and peer!) on
   their own.

   This solution is aimed at operators who are currently peering with
   route-servers (and operators of those route-servers), and it is not
   expected to be a good alternative to "private peerings".

1.1. Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Terminology

   Internet Exchange Point  A network for exchanging BGP routing
      information and traffic.
   Route Server  A BGP speaker at an IXP that "reflects" routes from one
      participant to all the other participants. See
      [I-D.jasinska-ix-bgp-route-server].
   Socialite  A device running the Debut protocol, responsible for
      making introductions between Guests.
   Guests  Participants of the IXP that speak the Debut protocol with
      the Socialite, and are introduced by the Socialite to other
      Guests.
   Debut  The protocol spoken between the Socialite and the Guests.

3. Overview

   The Guests at the IXP form a BGP peering relationship with the
   Socialite, announcing support for the Debut protocol. The Socialite
   sends the Guests a set of Debut updates, containing information
   about the other participants. The Guests use this information to
   form direct BGP peerings between themselves. Policy can be
   configured on the Socialite to only make introductions between
   subsets of participants if so desired.

4. Protocol Extensions

   The BGP protocol extensions introduced in this document include the
   definition of a new BGP capability, named "Debut Capability", and
   the specification of the message subtypes for the Debut messages.

4.1. Debut Capability

   The "Debut Capability" is a new BGP capability [RFC5492]. The
   Capability Code for this capability is specified in the IANA
   Considerations section of this document. The Capability Length field
   of this capability is zero. By advertising this capability to a
   peer, a BGP speaker conveys to the peer that the speaker supports the
   message subtypes for the Debut protocol and the related procedures
   described in this document.

5. Packet Formats

   The Debut protocol is implemented using TLV structures, and fields
   are in network byte order. These TLV records are carried as payload
   in a standard BGP Message packet (see Section 4.1, "Message Header
   Format", of [RFC4271]).

5.1. Message Header

   The Debut protocol is implemented using the standard
   Type-Length-Value paradigm.

   All Debut messages are carried as payload in a standard BGP Message
   of Type [TBD_BGP] and are preceded by a standard header that
   specifies the type and length of the message.

    0 1 2 3 4 5 6 7              15                              31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  VER  |  RES  |     TYPE      |            LENGTH             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    VALUE (Variable Length)                    |
   ~                                                               ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o VER (4 bits): The VER (version) field specifies the version of the
     Debut protocol. For the initial (this) version of the protocol it
     will be set to 0.
   o RES (4 bits): The RES (reserved) field is reserved in the initial
     (this) version of the protocol. It SHOULD be initialized to 0 on
     transmit and should be ignored on reception.
   o TYPE (2 octets): The TYPE field specifies the TYPE of the TLV
     record, and allows an implementation to determine what type of
     information is carried in the record. If the highest bit of the
     TYPE field is set (the TYPE value is >= 32768), understanding /
     implementation of the TYPE is optional: an implementation that
     does not implement this TYPE may ignore the message (this
     capability is included to allow for possible future logging,
     diagnostics, etc). If the highest bit is not set, and the
     implementation receives a TYPE that it does not implement, it
     should send a BGP NOTIFICATION and tear down the session. The
     TYPE codes are defined in the Debut TYPE registry (Section 8.1).
   o LENGTH (2 octets): The number of octets in the VALUE field of the
     TLV record. The total length of the TLV record in octets can be
     calculated by adding 4 (the number of octets in the TYPE and
     LENGTH fields) to the value of this field. This allows an
     implementation to skip over TLV records that it cannot handle.
   o VALUE (Variable length): The actual data. The meaning of this
     data is given by the TYPE field, and the length by the LENGTH
     field. Parsing of the data field is performed according to the
     value of the TYPE field.

5.2. INTRODUCTION Record

   INTRODUCTION (TYPE 0) TLV records are used to "make introductions"
   between the Guests speaking the Debut protocol. They carry the
   information needed by Guests to contact the other Guests and
   establish a BGP peering session.

       0 1 2 3 4 5 6 7              15                24              32
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   0: |                          NEIGHBOR AS                          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   4: |              AFI              |     SAFI      |      LEN      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                            ADDRESS                            |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      /                                                               /
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   x: |                             Auth                              |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

   o NEIGHBOR AS (4 octets): This specifies the Autonomous System of
     the Guest being introduced. It is encoded in "Four-octet AS
     Number" format as specified in [RFC4893].
   o AFI (2 octets): Address Family Identifier [RFC4760].
   o SAFI (1 octet): Subsequent Address Family Identifier [RFC4760].
   o LEN (1 octet): The length, in bits, of the address in the ADDRESS
     field (32 for IPv4, 128 for IPv6).
   o ADDRESS (Variable length): This contains the IP address of the
     Guest being introduced.
   o Auth (optional, variable length): The existence of this field is
     determined from the LENGTH of the TLV. If the LENGTH is greater
     than the combined length of the NEIGHBOR AS, AFI, SAFI, LEN and
     ADDRESS fields, Auth data is present.

   The AFI and SAFI are included in the INTRODUCTION message to allow
   the Socialite to introduce Guests with multiple address families.

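   As an illustration only, the VALUE of an INTRODUCTION record could
   be decoded along the following lines. This is a minimal Python
   sketch, not part of the protocol definition: the names Introduction
   and parse_introduction are hypothetical, the sketch assumes that LEN
   counts bits (32 or 128) so that ADDRESS occupies LEN / 8 octets, and
   it treats any octets left over after ADDRESS as Auth data.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import Union

# Fixed part of the VALUE, in network byte order:
# NEIGHBOR AS (4 octets), AFI (2 octets), SAFI (1 octet), LEN (1 octet).
_FIXED = struct.Struct("!IHBB")


@dataclass
class Introduction:
    """Hypothetical in-memory form of one INTRODUCTION record."""
    neighbor_as: int
    afi: int
    safi: int
    address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    auth: bytes  # empty when no Auth data follows ADDRESS


def parse_introduction(value: bytes) -> Introduction:
    """Decode the VALUE field of an INTRODUCTION (TYPE 0) TLV record."""
    if len(value) < _FIXED.size:
        raise ValueError("VALUE is shorter than the fixed fields")
    neighbor_as, afi, safi, addr_bits = _FIXED.unpack_from(value)
    if addr_bits not in (32, 128):  # assumption: LEN is a bit count
        raise ValueError(f"unexpected LEN {addr_bits}")
    end = _FIXED.size + addr_bits // 8
    if len(value) < end:
        raise ValueError("VALUE is too short to hold ADDRESS")
    address = ipaddress.ip_address(value[_FIXED.size:end])
    # Octets beyond ADDRESS, if any, are taken to be Auth data.
    return Introduction(neighbor_as, afi, safi, address, value[end:])


# Example: AS 64500 introduced at 192.0.2.1 (AFI 1 / SAFI 1), no Auth.
example = _FIXED.pack(64500, 1, 1, 32) + bytes([192, 0, 2, 1])
record = parse_introduction(example)
print(record.neighbor_as, record.address, record.auth)
```

   The sketch deliberately starts at the VALUE field and leaves the
   VER / RES / TYPE / LENGTH header to the caller, so it does not
   depend on how an implementation reads that header.
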
   On reception of an INTRODUCTION message a Guest should store the
   information and then consult local policy (if any) to determine if it
   is willing to peer with the newly introduced Guest. If so, it should
   proceed as though this were a manually configured peer. This peering
   SHOULD be annotated to note that this is a Socialite created peering.
   It is recommended that the peering show up in the configuration, but
   not persist across reboots -- this is to allow operators to more
   easily see all neighbors while looking through the config.

5.3. WITHDRAW Record

   WITHDRAW (TYPE 1) TLV records are used to inform a Guest that another
   previously introduced Guest is no longer participating. A Guest can
   use this information to abort in-progress connection attempts,
   invalidate information from a cache, for informational logging or in
   any other way it sees fit, but it SHOULD NOT use this information to
   tear down peering sessions to other Guests in ESTABLISHED state.
   Debut is intended to make initial introductions between participants
   and does not provide any mechanisms to invalidate / abort sessions
   once the introductions have been made.

   If a Socialite attempts to unintroduce an unknown Guest, this
   information should be logged and then ignored.

       0 1 2 3 4 5 6 7              15                24              32
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   0: |              AFI              |     SAFI      |      LEN      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   4: |                            ADDRESS                            \
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o AFI (2 octets): Address Family Identifier [RFC4760].
   o SAFI (1 octet): Subsequent Address Family Identifier [RFC4760].
   o LEN (1 octet): The length, in bits, of the address in the ADDRESS
     field (32 for IPv4, 128 for IPv6).
   o ADDRESS (Variable length): This contains the IP address of the
     Guest that is no longer participating.

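   The handling rules for WITHDRAW can be sketched in the same way.
   The following Python fragment is illustrative only: parse_withdraw,
   apply_withdraw and the two containers (a dict of introduced Guests
   and a set of Guests with ESTABLISHED sessions) are hypothetical, and
   LEN is again assumed to be a bit count. It logs and ignores a
   withdrawal of an unknown Guest, forgets a known Guest, and leaves
   ESTABLISHED sessions untouched.

```python
import ipaddress
import logging
import struct

# AFI (2 octets), SAFI (1 octet), LEN (1 octet), network byte order.
_FIXED = struct.Struct("!HBB")
log = logging.getLogger("debut")


def parse_withdraw(value: bytes):
    """Decode a WITHDRAW (TYPE 1) VALUE into an (afi, safi, address) key."""
    if len(value) < _FIXED.size:
        raise ValueError("VALUE is shorter than the fixed fields")
    afi, safi, addr_bits = _FIXED.unpack_from(value)
    if addr_bits not in (32, 128):  # assumption: LEN is a bit count
        raise ValueError(f"unexpected LEN {addr_bits}")
    end = _FIXED.size + addr_bits // 8
    if len(value) < end:
        raise ValueError("VALUE is too short to hold ADDRESS")
    return afi, safi, ipaddress.ip_address(value[_FIXED.size:end])


def apply_withdraw(introduced: dict, established: set, value: bytes) -> str:
    """Apply one WITHDRAW and report what was done with it."""
    key = parse_withdraw(value)
    if key not in introduced:
        # Unknown Guest: log the event and otherwise ignore it.
        log.info("WITHDRAW for unknown Guest %s; ignoring", key[2])
        return "ignored"
    del introduced[key]  # invalidate the cached introduction
    if key in established:
        # Sessions that are already ESTABLISHED are not torn down.
        return "session kept"
    return "attempt aborted"
```

   Keying the cache on (AFI, SAFI, ADDRESS) lets a single address
   family be withdrawn for a Guest while its other families remain
   introduced.
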
   The AFI and SAFI are included so that the Socialite can inform Guests
   that only one of the AFI / SAFIs is being removed.

6. Protocol operation

   Debut BGP sessions behave just like any other BGP sessions; only the
   information carried is different. Guests should use the standard BGP
   peering process to contact Socialites (or Socialites to contact
   Guests). Once the peering is ESTABLISHED, Guests will begin
   receiving INTRODUCTION messages in UPDATES, and will store them in
   something resembling Adj-RIB-IN. Standard BGP logic applies for
   things like error handling, invalidation of previously received
   information, etc.

   As Debut is only intended to make initial introductions between
   Guests (and not to manage sessions between those Guests), if the BGP
   session between Guest and Socialite goes down, established BGP
   peerings between Guests will continue to remain active.

7. Operational overview / implications

   There are many reasons why participants peer with route-servers at
   IXPs (see [I-D.jasinska-ix-bgp-route-server]), including:
   o reducing the administrative burden of arranging and configuring
     BGP sessions with all the other participants,
   o not wanting (or not being able) to carry views from all the
     participants,
   o relying on the IXP operator to implement routing policy decisions
     (see [I-D.jasinska-ix-bgp-route-server], section 2.3).

   This solution only attempts to address the first reason for using a
   route-server, and the implications of deploying this are described
   below.

7.1. Additional eBGP sessions.

   Debut is used to make introductions between all (or a subset of)
   participants at an IXP, and then the participants peer over "regular"
   BGP peerings. This means that each participating router will build a
   separate BGP peering session with every other participating router.
   As participants at IXPs (usually) only advertise a small subset of
   the full Internet routing table (such as internal or customer routes)
   and there is (usually) not a huge overlap of this routing
   information, the additional memory requirements are expected to not
   be too onerous (especially with the capacity of modern routers). As
   with all operational matters though, "Your network, your rules"
   applies -- it is up to each operator to determine the applicability /
   utility of the solution and how it fits (or doesn't) into their
   network (see what I did there? If this ends poorly, it's your own
   fault!)

7.2. Simplified debugging.

   As "normal" eBGP peering sessions are set up between the participants
   (and there is no third party performing route-selection, etc),
   operators have more visibility into the system and can more easily
   leverage their existing troubleshooting / debugging skills to debug
   issues. Debut also more closely aligns the data and control planes.

7.3. BGP PATH / SIDR implications

   Part of the justification for this work is to simplify the design and
   implementation of path security in SIDR. Because a Route-Server
   passes routing information between peers but does not show up in the
   BGP AS-PATH, its behavior is indistinguishable from a BGP path
   shortening attack. By using Debut, eBGP speakers peer directly with
   each other and this problem is avoided.

8. IANA Considerations

   IANA is requested to assign an AFI and SAFI for the Debut protocol.
   The text TBD1 should be replaced with the allocated AFI and the text
   TBD2 should be replaced with the allocated SAFI (and then this
   sentence should be removed).

   The IANA is requested to assign a value from the "BGP Message Types"
   registry and replace the text [TBD_BGP] with this value. The
   definition should be "Debut protocol".

8.1. Debut TYPE registry

   This document creates a new registry, "Debut Message Types".

   The registry policy is "Specification Required".

   The initial entries in the registry are:

   Value        Short description  Reference
   -------------------------------------------------
   0            INTRODUCTION       [This]
   1            WITHDRAW           [This]
   2-3200       Unassigned
   3200-32767   Private Use
   32768-65480  Unassigned
   65481-65535  Private Use

   Applications to the registry can request specific values that have
   yet to be assigned.

9. Security considerations

   This protocol is designed to facilitate direct BGP peerings between
   participants at an IXP, which eliminates the need for transparent
   route servers (which do not show up in the AS_PATH). This will
   facilitate the deployment of SIDR.

   As participants peer with each other directly (and not through a
   third party) there is less opportunity for malicious tampering with
   the control plane (for example, by the IXP).

   Debut currently does not provide a means to securely distribute
   Authentication information (there is a field, but it's not really
   defined). This may be addressed if it turns out to be needed.

   An attacker who manages to subvert the Socialite (or inject UPDATES
   into the Socialite to Guest communication) will be able to make
   Guests peer with a device under the attacker's control -- the impact
   of this seems to be no worse than in the route-server model.

   Currently route-server operators perform some base level checking /
   sanitization of routing information (such as enforcing max-paths) -
   in the socialite model each operator is expected to perform their
   own checks.

9.1. Privacy

   By having participants peer directly (as opposed to having their
   routing information pass through a route-server) the routing
   information is hidden from the IXP / route-server operator.
   Please note that this doesn't protect the data-plane, and the routing
   information could still be sniffed off the wire.

   The biggest concern with regard to privacy on a route server is
   propagating your policy to a third party, rather than propagating
   your routing information.

10. Acknowledgements

   The authors wish to thank Elisa Jasinska, Masataka MAWATARI, Robert
   Raszuk, Martin Hannigan, and Simon Leinen.

11. Author Notes

   [ RFC Editor -- Please remove this section before publication! ]

   1. Choose a better name than "Debut"

11.1. Changelog.

   o Changed the name of the protocol from 'elo-'elo to Debut - this is
     still not great, but "Introduction" is worse.
   o Added Operational section, incorporated notes from John Scudder,
     Keyur.

11.2. Changes from -00 to -01

   o Incorporated some comments from Elisa Jasinska.
   o Mainly a version bump to prevent expiry!

11.3. Changes from -01 to -02

   o Incorporated some long lingering nits / suggestions.
   o 9 or 10 folk have expressed interest and asked us to revive this.
     I (Warren) have done a really poor job of taking notes and
     incorporating them. Apologies, if you mentioned issues to me in
     person I have probably forgotten to incorporate them, *please*
     send them in email and I'll get to them.
   o Clarified the audience slightly, improved some security bits.

12. References

12.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              January 2007.

   [RFC4893]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
              Number Space", RFC 4893, May 2007.

12.2. Informative References

   [I-D.jasinska-ix-bgp-route-server]
              Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
              "Internet Exchange Route Server",
              draft-jasinska-ix-bgp-route-server-03 (work in progress),
              October 2011.

Authors' Addresses

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   US

   Email: warren@kumari.net

   Keyur Patel
   Cisco Systems

   Email: keyupate@cisco.com

   John Scudder
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA
   USA

   Email: jgs@juniper.net