idnits 2.17.1 draft-ietf-msdp-spec-01.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 21 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 22 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 7 instances of too long lines in the document, the longest one being 7 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 122 has weird spacing: '... from the R...' 
== Line 326 has weird spacing: '... change or |...' == Line 722 has weird spacing: '...cluding this ...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-08) exists of draft-ietf-mboned-anycast-rp-04 ** Downref: Normative reference to an Informational draft: draft-ietf-mboned-anycast-rp (ref. 'ANYCASTRP') == Outdated reference: A later version (-02) exists of draft-meyer-gre-update-01 ** Obsolete normative reference: RFC 1771 (Obsoleted by RFC 4271) ** Obsolete normative reference: RFC 1825 (Obsoleted by RFC 2401) ** Downref: Normative reference to an Historic RFC: RFC 1828 ** Obsolete normative reference: RFC 2283 (Obsoleted by RFC 2858) ** Obsolete normative reference: RFC 2362 (Obsoleted by RFC 4601, RFC 5059) Summary: 14 errors (**), 0 flaws (~~), 10 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 Network Working Group Dino Farinacci 2 INTERNET DRAFT Procket Networks 3 Yakov Rekhter 4 David Meyer 5 Cisco Systems 6 Peter Lothberg 7 Sprint 8 Hank Kilmer 9 Jeremy Hall 10 UUnet 11 Category Standards Track 12 December, 1999 14 Multicast Source Discovery Protocol (MSDP) 15 17 1. Status of this Memo 19 This document is an Internet-Draft and is in full conformance with 20 all provisions of Section 10 of RFC 2026. 22 Internet Drafts are working documents of the Internet Engineering 23 Task Force (IETF), its areas, and its working groups. Note that other 24 groups may also distribute working documents as Internet-Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 2. Abstract 39 The Multicast Source Discovery Protocol, MSDP, describes a mechanism 40 to connect multiple PIM-SM domains together. Each PIM-SM domain uses 41 its own independent RP(s) and does not have to depend on RPs in 42 other domains. 44 3. Copyright Notice 46 Copyright (C) The Internet Society (1999). All Rights Reserved. 48 4. Introduction 50 The Multicast Source Discovery Protocol, MSDP, describes a mechanism 51 to connect multiple PIM-SM domains together. Each PIM-SM domain uses 52 its own independent RP(s) and does not have to depend on RPs in other 53 domains. Advantages of this approach include: 55 o No Third-party resource dependencies on RP 57 PIM-SM domains can rely on their own RPs only.
59 o Receiver only Domains 61 Domains with only receivers get data without globally 62 advertising group membership. 64 o Global Source State 66 Global source state is not required, since a router need not 67 cache Source Active (SA) messages (see below). MSDP is a 68 periodic protocol. 70 The keywords MUST, MUST NOT, MAY, OPTIONAL, REQUIRED, RECOMMENDED, 71 SHALL, SHALL NOT, SHOULD, SHOULD NOT are to be interpreted as defined 72 in RFC 2119 [RFC2119]. 74 5. Overview 76 An RP (or other MSDP SA originator) in a PIM-SM [RFC2362] domain will 77 have an MSDP peering relationship with an MSDP speaker in another 78 domain. The peering relationship will be made up of a TCP connection 79 in which control information is exchanged. Each domain will have one or 80 more connections to this virtual topology. 82 The purpose of this topology is to have domains discover multicast 83 sources from other domains. If the multicast sources are of interest 84 to a domain which has receivers, the normal source-tree building 85 mechanism in PIM-SM will be used to deliver multicast data over an 86 inter-domain distribution tree. 88 We envision this virtual topology will essentially be congruent to 89 the existing BGP topology used in the unicast-based Internet today. 90 That is, the TCP connections between MSDP speakers can be realized by 91 the underlying BGP routing system. 93 6. Procedure 95 A source in a PIM-SM domain originates traffic to a multicast group. 96 The PIM DR which is directly connected to the source sends the data 97 encapsulated in a PIM Register message to the RP in the domain. 99 The RP will construct a "Source-Active" (SA) message and send it to 100 its MSDP peers. The SA message contains the following fields: 102 o Source address of the data source. 103 o Group address the data source sends to. 104 o IP address of the RP. 106 Each MSDP peer receives and forwards the message away from the RP 107 address in a "peer-RPF flooding" fashion.
The notion of peer-RPF 108 flooding is with respect to forwarding SA messages. The BGP routing 109 table is examined to determine which peer is the NEXT_HOP towards the 110 originating RP of the SA message. Such a peer is called an "RPF 111 peer". See section 10 below for the details of peer-RPF forwarding. 113 If the MSDP peer receives the SA from a non-RPF peer towards the 114 originating RP, it will drop the message. Otherwise, it forwards the 115 message to all its MSDP peers. 117 The flooding can be further constrained to children of the peer by 118 interrogating BGP reachability information. That is, if a BGP peer 119 advertises a route (back to you) and you are the next to last AS in 120 the AS_PATH, the peer is using you as the NEXT_HOP. In this case, an 121 implementation SHOULD forward an SA message (which was originated 122 from the RP address covered by that route) to the peer. This is 123 known in other circles as Split-Horizon with Poison Reverse. 125 When an MSDP peer which is also an RP for its own domain receives an 126 SA message, it determines if it has any group members interested in 127 the group which the SA message describes. That is, the RP checks for 128 a (*,G) entry with a non-empty outgoing interface list; this implies 129 that the domain is interested in the group. In this case, the RP 130 triggers a (S,G) join event towards the data source as if a 131 Join/Prune message was received addressed to the RP itself (See 132 [RFC2362] Section 3.2.2). This sets up a branch of the source-tree to 133 this domain. Subsequent data packets arrive at the RP and are 134 forwarded down the shared-tree inside the domain. If leaf routers 135 choose to join the source-tree they have the option to do so 136 according to existing PIM-SM conventions. Finally, if an RP in a 137 domain receives a PIM Join message for a new group G, and it is 138 caching SAs, then the RP should trigger a (S,G) join event for each 139 SA for that group in its cache.
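The SA handling described in this section can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the specification. The names MsdpSpeaker, rpf_peer_for and interested_groups are hypothetical, the RPF peer for each originating RP is assumed to have been derived from BGP already, and the SA is not sent back to the peer it arrived from, which is one reading of forwarding "away from the RP".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceActive:
    source: str  # address of the data source
    group: str   # group address the data source sends to
    rp: str      # address of the originating RP


@dataclass
class MsdpSpeaker:
    peers: set          # identifiers of the configured MSDP peers
    rpf_peer_for: dict  # originating RP -> RPF peer (assumed precomputed from BGP)
    # Groups with a (*,G) entry and a non-empty outgoing interface list.
    interested_groups: set = field(default_factory=set)
    # (S,G) join events triggered so far.
    joins: list = field(default_factory=list)

    def receive_sa(self, sa, from_peer):
        """Return the set of peers the SA is forwarded to (empty if dropped)."""
        if self.rpf_peer_for.get(sa.rp) != from_peer:
            return set()  # received from a non-RPF peer: drop the message
        if sa.group in self.interested_groups:
            # The domain has receivers: trigger a (S,G) join towards the source.
            self.joins.append((sa.source, sa.group))
        # Assumption: do not echo the SA back to the peer it came from.
        return self.peers - {from_peer}
```

An RP with no interested receivers still forwards the SA; it simply triggers no join.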
141 This procedure has been affectionately named flood-and-join because 142 if an RP is not interested in the group, it can ignore the SA 143 message. Otherwise, it joins a distribution tree. 145 7. Controlling State 147 While RPs which receive SA messages are not required to keep MSDP 148 (S,G) state, an RP SHOULD cache SA messages by default. The advantage 149 of caching is that newly formed MSDP peers can get MSDP (S,G) state 150 sooner and therefore reduce join latency for new joiners. In 151 addition, caching greatly aids in diagnosis and debugging of various 152 problems. 154 7.1. Timers 156 The main timers for MSDP are: SA Advertisement period, SA Hold-down 157 period, the SA Cache timeout period, KeepAlive, HoldTimer, and 158 ConnectRetry. Each is described below. 160 7.1.1. SA Advertisement Period 162 RPs which originate SA messages do it periodically as long as there 163 is data being sent by the source. The SA Advertisement Period MUST be 164 60 seconds. An RP will not send more than one SA message for a given 165 (S,G) within an SA Advertisement period. Originating periodic SA 166 messages is important so that new receivers who join after a source 167 has been active can get data quickly via the receiver's own RP when 168 it is not caching SA state. Finally, if an RP in a domain receives a 169 PIM Join message for a new group G, and it is caching SAs, then the 170 RP should trigger a (S,G) join for each SA for that group in its 171 cache. 173 7.1.2. SA Hold-down Period 175 A caching MSDP speaker SHOULD NOT forward an SA message it has 176 received in the last SA-Hold-down period. The SA-Hold-down period 177 SHOULD be set to 30 seconds. 179 7.1.3. SA Cache Timeout 181 A caching MSDP speaker times out its SA cache at SA-State-Timer. 182 The SA-State-Timer MUST NOT be less than 90 seconds. 184 7.1.4. KeepAlive, HoldTimer, and ConnectRetry 186 The KeepAlive, HoldTimer, and ConnectRetry timers are defined in RFC 187 1771 [RFC1771]. 189 7.2.
Intermediate MSDP Speakers 191 Intermediate RPs do not originate periodic SA messages on behalf of 192 sources in other domains. In general, an RP MUST only originate an SA 193 for its own sources. 195 7.3. SA Filtering and Policy 197 As the number of (S,G) pairs increases in the Internet, an RP may 198 want to filter which sources it describes in SA messages. Also, 199 filtering may be used as a matter of policy which at the same time 200 can reduce state. Only the RP co-located in the same domain as the 201 source can restrict SA messages. Note, however, that MSDP peers in 202 transit domains should not filter SA messages or the flood-and-join 203 model cannot guarantee that sources will be known throughout the 204 Internet (i.e., SA filtering by transit domains can cause black 205 holes). In general, policy should be expressed using MBGP [RFC2283]. 206 This will cause MSDP messages to flow in the desired direction and 207 to fail peer-RPF otherwise. An exception occurs at an administrative 208 scope [RFC2365] boundary. In particular, an SA message for a (S,G) 209 MUST NOT be sent to peers which are on the other side of an 210 administrative scope boundary for G. 212 7.4. SA Requests 214 If an MSDP peer decides to cache SA state, it may accept SA-Requests 215 from other MSDP peers. When an MSDP peer receives an SA-Request for a 216 group range, it will respond to the peer with a set of SA entries, in 217 an SA-Response message, for all active sources sending to the group 218 range requested in the SA-Request message. The peer that sends the 219 request will not flood the responding SA-Response message to other 220 peers. 222 If an implementation receives an SA-Request message and is not 223 caching SA messages, it sends a notification with Error code 7 224 subcode 1, as defined in section 12.2.7. 226 8. Encapsulated Data Packets 228 For bursty sources, the RP may encapsulate multicast data from the 229 source.
An interested RP may decapsulate the packet, which SHOULD be 230 forwarded as if a PIM register encapsulated packet was received. That 231 is, if packets are already arriving over the interface toward the 232 source, then the packet is dropped. Otherwise, if the outgoing 233 interface list is non-null, the packet is forwarded appropriately. 234 Note that when doing data encapsulation, an implementation MUST bound 235 the number of packets from the source which are encapsulated. 237 This allows for small bursts to be received before the multicast tree 238 is built back toward the source's domain. For example, an 239 implementation SHOULD encapsulate at least the first packet to 240 provide service to bursty sources. 242 Finally, if an implementation supports an encapsulation of SA data 243 other than default TCP encapsulation, then it MUST support GRE 244 encapsulation. In addition, an implementation MUST learn about 245 non-TCP encapsulations via capability advertisement (see section 12.2.5). 247 9. Other Scenarios 249 MSDP is not limited to deployment across different routing domains. 250 It can be used within a routing domain when it is desired to deploy 251 multiple RPs for different group ranges. As long as all RPs have an 252 interconnected MSDP topology, each can learn about active sources as 253 well as RPs in other domains. Another example is the Anycast RP 254 mechanism [ANYCASTRP]. 256 10. MSDP Peer-RPF Forwarding 258 The MSDP Peer-RPF Forwarding rules are used for forwarding SA 259 messages throughout an MSDP enabled internet. Unlike the RPF check 260 used when forwarding data packets, the Peer-RPF check is against the 261 RP address carried in the SA message. 263 10.1. Peer-RPF Forwarding Rules 265 An SA message originated by an MSDP originator R and received by an 266 MSDP router from MSDP peer N is accepted if N is the appropriate RPF 267 neighbor for originator R.
The RPF neighbor is chosen using the first 268 of the following rules that matches: 270 (i). R is the RPF neighbor if we have an MSDP peering with R. 272 (ii). The external MBGP neighbor towards which we are 273 poison-reversing the MBGP route towards R is the RPF neighbor 274 if we have an MSDP peering with it. 276 (iii). If we have an MSDP peering with a neighbor in the first 277 AS along the AS_PATH (the AS from which we learned this 278 route), but no external MBGP peering with that neighbor, 279 pick a neighbor via a deterministic rule if you have 280 several, and that is the RPF neighbor. 282 (iv). The internal MBGP advertiser of the router towards R is 283 the RPF neighbor if we have an MSDP peering with it. 285 (v). If none of the above match, and we have an MSDP 286 default-peer configured, the MSDP default-peer is 287 the RPF neighbor. 289 Once an RPF neighbor is chosen (as defined above), an SA message is 290 accepted if it was received from the RPF neighbor, and discarded 291 otherwise. 293 10.2. MSDP default-peer semantics 295 An MSDP default-peer is much like a default route. It is intended to 296 be used in those cases where a stub network isn't running BGP or 297 MBGP. An MSDP speaker configured with a default-peer accepts all SA 298 messages from the default-peer. Note that a router running BGP or 299 MBGP SHOULD NOT allow configuration of default peers, since this 300 allows the possibility for SA looping to occur. 302 11.
MSDP Connection Establishment 304 MSDP speakers establish peering sessions according to the following 305 state machine: 307 De-configured or 308 disabled 309 +-------------------------------------------+ 310 | | 311 +-----|--------->+----------+ | 312 | | +->| INACTIVE |----------------+ | 313 | | | +----------+ | | 314 Deconf'ed | | | /|\ /|\ | Timer + Higher Address 315 or | | | | | | | 316 disabled | | | | | \|/ | 317 | | | | | | +-------------+ 318 | | | | | +---------------| CONNECTING | 319 | | | | | Timeout or +-------------+ 320 | | | | | Router ID Change | 321 \|/ \|/ | | | | 322 +----------+ | | | | 323 | DISABLED | | | +---------------------+ | TCP Established 324 +----------+ | | | | 325 /|\ /|\ | | Connection Timeout or | | 326 | | | | Router ID change or | | 327 | | | | Authorization Failure | | 328 | | | | | | 329 | | | | | \|/ 330 | | | | +-------------+ 331 | | Router ID | | Timer + | ESTABLISHED | 332 | | Change | | Low Address +-------------+ 333 | | | \|/ /|\ | 334 | | | +--------+ | | 335 | | +--| LISTEN |--------------------+ | 336 | | +--------+ TCP Accept | 337 | | | | 338 | | | | 339 | +---------------+ | 340 | De-configured or | 341 | disabled | 342 | | 343 +------------------------------------------------------+ 344 De-configured or 345 disabled 347 12. Packet Formats 349 MSDP messages will be encapsulated in a TCP connection using well- 350 known port 639. One side of the MSDP peering relationship will listen 351 on the well-known port and the other side will do an active connect 352 on the well-known port. The side with the higher peer IP address will 353 do the listen. This connection establishment algorithm avoids call 354 collision. Therefore, there is no need for a call collision 355 procedure. It should be noted, however, that the disadvantage of this 356 approach is that it may result in longer startup times at the passive 357 end. 
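The listen/connect decision can be sketched as follows. This is a hedged illustration, not normative text. It reads "the side with the higher peer IP address" as "the side whose peer has the higher address", which agrees with the state diagram above (Timer + Low Address leads to LISTEN, Timer + Higher Address leads to CONNECTING). That reading is an assumption, and the function name connection_role is hypothetical.

```python
import ipaddress

MSDP_PORT = 639  # well-known TCP port given in the text


def connection_role(local_addr, peer_addr):
    """Return "listen" or "connect" for the local side of an MSDP peering.

    Assumption: the side whose peer has the numerically higher IPv4
    address listens, and the other side does the active connect.
    """
    local = ipaddress.IPv4Address(local_addr)
    peer = ipaddress.IPv4Address(peer_addr)
    if local == peer:
        raise ValueError("local and peer addresses must differ")
    return "listen" if peer > local else "connect"
```

Because exactly one side of any pair evaluates to "listen", the two ends never attempt simultaneous active opens, which is why no call collision procedure is needed.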
359 Finally, if an implementation receives a TLV that has length that is 360 longer than expected, the TLV SHOULD be accepted. Any additional data 361 SHOULD be ignored. 363 12.1. MSDP messages will be encoded in TLV format: 365 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 367 | Type | Length | Value .... | 368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 370 Type (8 bits) 371 Describes the format of the Value field. 373 Length (16 bits) 374 Length of Type, Length, and Value fields in octets. The 375 minimum length required is 3 octets. 377 Value (variable length) 378 Format is based on the Type value. See below. The length of 379 the value field is Length field minus 3. 381 12.2. The following TLV Types are defined: 383 Code Type 384 =========================================================== 385 1 IPv4 Source-Active 386 2 IPv4 Source-Active Request 387 3 IPv4 Source-Active Response 388 4 KeepAlive 389 5 Encapsulation Capability Advertisement 390 6 Encapsulation Capability Request 391 7 Notification 392 8 GRE Encapsulation 394 Each TLV is described below. 396 12.2.1. IPv4 Source-Active TLV 398 The maximum size SA message that can be sent is 1400 bytes. If an 399 MSDP peer needs to originate a message with information greater than 400 1400 bytes, it sends successive 1400-byte messages. The 1400 byte 401 size does not include the TCP, IP, layer-2 headers. 
403 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 405 | 1 | x + y | Entry Count | 406 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 407 | RP Address | 408 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 409 | Reserved | Gprefix Len | Sprefix Len | \ 410 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \ 411 | Group Address Prefix | ) z 412 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ / 413 | Source Address Prefix | / 414 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 416 Type 417 IPv4 Source-Active TLV is type 1. 419 Length x 420 Is the length of the control information in the message. x is 421 8 octets (for the first two 32-bit quantities) plus 12 times 422 Entry Count octets. 424 Length y 425 If 0, then there is no data encapsulated. Otherwise an IPv4 426 packet follows and y is the length of the total length field 427 of the IPv4 header encapsulated. If there are multiple SA TLVs 428 in a message, and data is also included, y must be 0 in all SA 429 TLVs except the last one. And the last SA TLV must reflect the 430 source and destination addresses in the IP header of the 431 encapsulated data. 433 Entry Count 434 Is the count of z entries (note above) which follow the RP 435 address field. This is so multiple (S,G)s from the same domain 436 can be encoded efficiently for the same RP address. 438 RP Address 439 The address of the RP in the domain the source has become 440 active in. 442 Reserved 443 The Reserved field MUST be transmitted as zeros and ignored 444 by a receiver. 446 Gprefix Len and Sprefix Len 447 The route prefix length associated with the group address 448 prefix and source address prefix, respectively. 450 Group Address Prefix 451 The group address the active source has sent data to. 453 Source Address Prefix 454 The route prefix associated with the active source. 
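The SA TLV layout above can be exercised with a short Python sketch. It is illustrative only. The flattened diagram does not show field widths, so the sketch assumes an 8-bit Type, 16-bit Length, 8-bit Entry Count, and per entry a 16-bit Reserved field followed by two 8-bit prefix lengths, which yields the 8 + 12 * Entry Count octets stated for x. No data is encapsulated (y = 0), and the function names are hypothetical.

```python
import socket
import struct

SA_TYPE = 1          # IPv4 Source-Active TLV
MAX_SA_BYTES = 1400  # maximum SA message size given in the text


def encode_sa(rp, entries):
    """Encode an SA TLV without encapsulated data (y = 0).

    entries is a list of (group, gprefix_len, source, sprefix_len).
    """
    x = 8 + 12 * len(entries)
    if x > MAX_SA_BYTES:
        raise ValueError("too many entries for one SA message")
    out = struct.pack("!BHB4s", SA_TYPE, x, len(entries), socket.inet_aton(rp))
    for group, glen, source, slen in entries:
        # Reserved is transmitted as zeros.
        out += struct.pack("!HBB4s4s", 0, glen, slen,
                           socket.inet_aton(group), socket.inet_aton(source))
    return out


def decode_sa(data):
    """Return (rp, entries, y), where y is the encapsulated data length."""
    tlv_type, length, count = struct.unpack_from("!BHB", data, 0)
    if tlv_type != SA_TYPE:
        raise ValueError("not an IPv4 Source-Active TLV")
    x = 8 + 12 * count
    if length < x or len(data) < x:
        raise ValueError("truncated Source-Active TLV")
    rp = socket.inet_ntoa(data[4:8])
    entries = []
    for i in range(count):
        # The Reserved field is ignored on receipt.
        _reserved, glen, slen, group, source = struct.unpack_from(
            "!HBB4s4s", data, 8 + 12 * i)
        entries.append((socket.inet_ntoa(group), glen,
                        socket.inet_ntoa(source), slen))
    return rp, entries, length - x
```

With one entry the encoded TLV is 20 octets long: 8 for the first two 32-bit words plus 12 for the single (S,G) entry.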
456 Multiple SA TLVs MAY appear in the same message and can be batched 457 for efficiency at the expense of data latency. This would typically 458 occur on intermediate forwarding of SA messages. 460 12.2.2. IPv4 Source-Active Request TLV 462 The Source-Active Request is used to request SA-state from a caching 463 MSDP peer. If an RP in a domain receives a PIM Join message for a 464 group, creates (*,G) state and wants to know all active sources for 465 group G, and it has been configured to peer with an SA-state caching 466 peer, it may send an SA-Request message for the group. 468 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 469 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 470 | 2 | 8 | Gprefix Len | 471 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 472 | Group Address Prefix | 473 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 475 Type 476 IPv4 Source-Active Request TLV is type 2. 478 Gprefix Len 479 The route prefix length associated with the group address prefix. 481 Group Address Prefix 482 The group address prefix the MSDP peer is requesting. 484 12.2.3. IPv4 Source-Active Response TLV 486 The Source-Active Response is sent in response to a Source-Active 487 Request message. The Source-Active Response message has the same 488 format as a Source-Active message but does not allow encapsulation of 489 multicast data. 491 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 492 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 493 | 3 | x | .... | 494 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 496 Type 497 IPv4 Source-Active Response TLV is type 3. 499 Length x 500 Is the length of the control information in the message. x is 8 501 octets (for the first two 32-bit quantities) plus 12 times Entry 502 Count octets. 504 12.2.4. 
KeepAlive TLV 506 A KeepAlive TLV is sent to an MSDP peer if and only if there were no 507 MSDP messages sent to the peer after a period of time. This message 508 is necessary for the active connect side of the MSDP connection. The 509 passive connect side of the connection knows that the connection will 510 be reestablished when a TCP SYN packet is sent from the active 511 connect side. However, the active connect side will not know when the 512 passive connect side goes down. Therefore, the KeepAlive timeout will 513 be used to reset the TCP connection. 515 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 516 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 517 | 4 | 3 | | 518 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 520 The length of the message is 3 bytes which encompasses the 1-byte 521 Type field and the 2-byte Length field. 523 12.2.5. Encapsulation Capability Advertisement TLV 525 This TLV is sent by an MSDP speaker to advertise its ability to 526 receive data packets encapsulated as described by the TLV (in 527 addition to the default TCP encapsulation). 529 A MSDP speaker receiving this TLV can choose to either default TCP 530 encapsulation, or may send a IPv4 Encapsulation Request to change to 531 the advertised encapsulation type. 533 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 534 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 535 | 5 | 8 | ENCAP_TYPE | 536 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 537 | Source Port | Reserved | 538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 540 Type 541 IPv4 Encapsulation Advertisement TLV is type 5. 543 Length 544 Length is a two byte field with value 8. 
546 ENCAP_TYPE 547 The following data encapsulation types are defined for MSDP: 549 Value Meaning 550 --------------------------------------- 551 0 TCP Encapsulation 552 1 UDP Encapsulation [RFC768] 553 2 GRE Encapsulation [GRE] 555 Source Port 556 Port for use by the requester. 558 Reserved 559 The Reserved field MUST be transmitted as zeros and ignored 560 by a receiver. 562 Note that since the TLV does not carry endpoint addresses for the GRE 563 or UDP tunnels, an implementation using these encapsulations MUST use 564 the endpoints that are used for the MSDP peering. 566 12.2.6. Encapsulation Capability Request TLV 568 The Encapsulation Capability Request is sent to notify a peer that 569 has advertised an encapsulation capability that it will encapsulate 570 SA data according to the advertised ENCAP_TYPE. 572 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 573 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 574 | 6 | 4 | ENCAP_TYPE | 575 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 576 Type 577 IPv4 Encapsulation Request TLV is type 6. 579 Length 580 Length is a two byte field with value 4. 582 ENCAP_TYPE 583 ENCAP_TYPE is described above. 585 A requester MAY also provide a source port, in which case 586 the TLV has the following form: 588 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 589 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 590 | 6 | 8 | ENCAP_TYPE | 591 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 592 | Source Port | Reserved | 593 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 595 12.2.7. NOTIFICATION TLV 597 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 598 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 599 | 7 | x + 5 | Error Code | 600 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 601 | Error subcode | ... | 602 +-+-+-+-+-+-+-+-+ | 603 | Data | 604 | ... 
| 605 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 607 Type 608 The Notification TLV is type 7. 610 Length 611 Length is a two byte field with value x + 5, where x is 612 the length of the notification data field. 614 Error code 615 See [RFC1771]. In addition, Error code 7 indicates 616 an SA-Request Error. 618 Error subcode 619 See [RFC1771]. In addition, Error code 7 subcode 1 indicates 620 the receipt of an SA-Request message by a non-caching 621 MSDP speaker. 623 Data 624 See [RFC1771]. In addition, for Error code 7 subcode 1 (receipt 625 of an SA-Request message by a non-caching MSDP speaker), the 626 TLV has the following form: 628 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 629 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 630 | 7 | 20 | 7 | 631 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 632 | 1 | Reserved | Gprefix Len | Sprefix Len | 633 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 634 | Advertising RP Address | 635 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 636 | Group Address Prefix | 637 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 638 | Source Address Prefix | 639 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 641 See [RFC1771] for NOTIFICATION error handling. 643 12.2.8. Encapsulation Capability State Machine 645 The active connect side of an MSDP peering SHALL begin in ADVERTISING 646 state, and the passive side of the TCP connection begins in DEFAULT 647 state. This will cause the state machine to behave deterministically.
649 +-------+ 650 | | Receive TLV which isn't 651 | | understood or 652 | | Receive Request (TLV 6) or 653 | | Receive Advertisement (TLV 5) 654 \|/ | that isn't understood 655 +---------+----+ 656 | DEFAULT |----------------+ 657 +---------+ | 658 | 659 +-------------+ | 660 | ADVERTISING | | 661 +-------------+ | 662 | | 663 Timeout +--------+ | | 664 +-------->| FAILED | | Send Advertisement | Receive Advertisement 665 | +--------+ | (TLV 5) | (TLV 5) 666 | | | 667 | | | 668 | | | 669 | | | 670 | Receive non-matching | | 671 | Request (TLV 6) | | 672 | +----+ | | 673 | | | | | 674 | | | | | 675 | | \|/ | \|/ 676 | | +------+ | +----------+ 677 | +-| SENT |<-------------+ | RECEIVED | 678 +---+------+ +----------+ 679 | \|/ 680 | | 681 | Receive matching | Send matching 682 | Request (TLV 6) | Request (TLV 6) 683 | +--------+ | 684 +------------>| AGREED |<------------------+ 685 +--------+ 687 Note that if an advertiser transitions into the FAILED state, it 688 SHOULD assume that it has an old-style peer which can only support 689 TCP encapsulation. If an implementation wishes to be backward 690 compatible, it SHOULD support TCP encapsulation. In addition, a 691 requester in any state other than AGREED MUST only encapsulate data 692 in the TCP stream. 694 12.2.9. UDP Data Encapsulation 696 When using UDP encapsulation, the UDP pseudo-header has the following 697 form: 699 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 700 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 701 | Source Port | Dest Port | 702 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 703 | Length | Checksum | 704 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 705 | Origin RP Address | 706 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 708 Source Port 709 When using UDP encapsulation, a capability requester 710 uses the advertiser's Source Port as its destination 711 port.
The advertiser MUST provide a Source Port. 713 Destination Port 714 When using UDP encapsulation, a capability advertiser 715 uses the well-known port 639 as the destination port. 716 A capability requester MUST listen on this well-known 717 port. The requester MAY provide a Source Port in its 718 reply to the advertiser. 720 Length 721 Length is the length in octets of this user datagram 722 including this header and the data. The minimum value 723 of the length is twelve. 725 Checksum 726 The checksum is computed according to RFC 768 [RFC768]. 728 Originating RP Address 729 The Originating RP Address is the address of the RP sending 730 the encapsulated data. 732 12.2.10. GRE Encapsulation TLV 734 A TLV is defined to describe GRE encapsulated data packets. The TLV 735 has the following form: 737 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 739 | 8 | 8 + x | Reserved | 740 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 741 | Originating RP IPv4 Address | 742 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 743 | (S,G) Data Packet .... 744 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 746 Type 747 GRE encapsulated data packet TLV is type 8. 749 Length 750 Length is a two byte field with value 8 + x, where 751 x is the length of the (S,G) Data packet. 753 Reserved 754 The Reserved field MUST be transmitted as zeros and ignored 755 by a receiver. 757 The entire GRE header, then, will have the following form: 759 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 760 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 761 | Delivery Headers .....
| 762 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 763 |C| Reserved0 | Ver | Protocol Type |\ 764 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ GRE Header 765 | Checksum (optional) | Reserved1 |/ 766 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ 767 | 8 | 8 + x | Reserved | \ 768 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Payload 769 | Originating RP IPv4 Address | / 770 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . 771 | (S,G) Data Packet .... . 772 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 774 12.2.10.1. Problems with MTU Exceeded by Encapsulation 776 Black holes can arise when PMTU [RFC1191] is used and the tunnel 777 entry point does not relay MTU exceeded errors back to the originator 778 of the packet. A black hole can be realized by the following 779 behavior: the originator sets the Don't Fragment bit in the Delivery 780 Header, the packet gets dropped within the tunnel (MTU is exceeded), 781 but since the originator doesn't receive feedback, it retransmits 782 with the same PMTU, causing subsequently transmitted packets to be 783 dropped. While GRE [GRE] does not require that such errors be relayed 784 back to the originator, known implementations of GRE do not set the 785 Don't Fragment bit in the Delivery Header. 787 13. Security Considerations 789 An MSDP implementation MAY use IPsec [RFC1825] or keyed MD5 [RFC1828] 790 to secure control messages. When encapsulating SA data in GRE, 791 security should be relatively similar to security in a normal IPv4 792 network, as routing using GRE follows the same routing that IPv4 uses 793 natively. Route filtering will remain unchanged. However packet 794 filtering at a firewall requires either that a firewall look inside 795 the GRE packet or that the filtering is done on the GRE tunnel 796 endpoints. 
   In those environments in which this is considered to be a
   security issue, it may be desirable to terminate the tunnel at the
   firewall.

14. Acknowledgments

   The authors would like to thank Dave Thaler, Bill Nickless, John
   Meylor, Liming Wei, Manoj Leelanivas, Mark Turner, and John Zwiebel
   for their design feedback and comments. Bill Fenner also made many
   contributions, including clarification of the Peer-RPF rules.

15. Authors' Addresses

   Dino Farinacci
   Procket Networks
   3850 No. First St., Ste. C
   San Jose, CA 95134
   Email: dino@procket.com

   Yakov Rekhter
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: yakov@cisco.com

   Peter Lothberg
   Sprint
   VARESA0104
   12502 Sunrise Valley Drive
   Reston VA, 20196
   Email: roll@sprint.net

   Hank Kilmer
   Email: hank@rem.com

   Jeremy Hall
   UUnet Technologies
   3060 Williams Drive
   Fairfax, VA 22031
   Email: jhall@uu.net

   David Meyer
   Cisco Systems, Inc.
   170 Tasman Drive
   San Jose, CA, 95134
   Email: dmm@cisco.com

16. REFERENCES

   [ANYCASTRP] Meyer, et al., "Anycast RP mechanism using PIM and
               MSDP", draft-ietf-mboned-anycast-rp-04.txt, November
               1999. Work in Progress.

   [GRE]       Farinacci, D., et al., "Generic Routing Encapsulation
               (GRE)", draft-meyer-gre-update-01.txt, December
               1999. Work in Progress.

   [RFC768]    Postel, J., "User Datagram Protocol", RFC 768, August
               1980.

   [RFC1191]   Mogul, J., and S. Deering, "Path MTU Discovery",
               RFC 1191, November 1990.

   [RFC1771]   Rekhter, Y., and T. Li, "A Border Gateway Protocol 4
               (BGP-4)", RFC 1771, March 1995.

   [RFC1825]   Atkinson, R., "Security Architecture for the Internet
               Protocol", RFC 1825, August 1995.

   [RFC1828]   Metzger, P., and W. Simpson, "IP Authentication using
               Keyed MD5", RFC 1828, August 1995.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, March 1997.

   [RFC2283]   Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
               "Multiprotocol Extensions for BGP-4", RFC 2283,
               February 1998.

   [RFC2362]   Estrin, D., et al., "Protocol Independent Multicast -
               Sparse Mode (PIM-SM): Protocol Specification", RFC
               2362, June 1998.

   [RFC2365]   Meyer, D., "Administratively Scoped IP Multicast", RFC
               2365, July 1998.
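
   As an illustration of the GRE Encapsulation TLV layout described in
   Section 12.2.10, the following Python sketch packs and parses the
   8-octet fixed part (Type 8, two-octet Length of 8 + x, Reserved,
   Originating RP IPv4 Address) followed by the (S,G) data packet. It
   is a minimal sketch and not a reference implementation: the function
   and constant names are hypothetical, network byte order is assumed
   for the Length field, and the GRE and delivery headers are not
   modeled.

```python
import socket
import struct

GRE_ENCAP_TLV_TYPE = 8   # "GRE encapsulated data packet TLV is type 8"
FIXED_LEN = 8            # Type (1) + Length (2) + Reserved (1) + RP address (4)
_FIXED = struct.Struct("!BHB4s")  # assumption: network byte order, no padding


def build_gre_encap_tlv(origin_rp: str, sg_packet: bytes) -> bytes:
    """Return the TLV octets: fixed part followed by the (S,G) data packet."""
    length = FIXED_LEN + len(sg_packet)  # Length = 8 + x
    if length > 0xFFFF:
        raise ValueError("packet does not fit in a two-octet Length field")
    # The Reserved octet is transmitted as zero.
    fixed = _FIXED.pack(GRE_ENCAP_TLV_TYPE, length, 0,
                        socket.inet_aton(origin_rp))
    return fixed + sg_packet


def parse_gre_encap_tlv(tlv: bytes):
    """Return (origin_rp, sg_packet); the Reserved octet is ignored."""
    if len(tlv) < FIXED_LEN:
        raise ValueError("truncated TLV")
    tlv_type, length, _reserved, rp = _FIXED.unpack_from(tlv)
    if tlv_type != GRE_ENCAP_TLV_TYPE:
        raise ValueError("not a GRE encapsulation TLV")
    if length < FIXED_LEN or length > len(tlv):
        raise ValueError("bad Length field")
    return socket.inet_ntoa(rp), tlv[FIXED_LEN:length]
```

   For example, build_gre_encap_tlv("192.0.2.1", data) with a 4-octet
   data packet yields a 12-octet TLV whose Length field is 12, and
   parse_gre_encap_tlv() recovers the RP address and the data from it.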