idnits 2.17.1

draft-ietf-pim-port-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 25, 2010) is 4932 days in the past.  Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761)

  ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260)

  -- Duplicate reference: RFC4601, mentioned in 'HELLO-OPT', was also mentioned
     in 'RFC4601'.

  -- Obsolete informational reference (is this intentional?): RFC 4601 (ref.
     'HELLO-OPT') (Obsoleted by RFC 7761)

     Summary: 3 errors (**), 0 flaws (~~), 1 warning (==), 4 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                       D. Farinacci
Internet-Draft                                              IJ. Wijnands
Intended status: Experimental                                  S. Venaas
Expires: April 28, 2011                                    cisco Systems
                                                            M. Napierala
                                                               AT&T Labs
                                                        October 25, 2010

                 A Reliable Transport Mechanism for PIM
                       draft-ietf-pim-port-04.txt

Abstract

   This draft describes how a reliable transport mechanism can be used
   by the PIM protocol to optimize CPU and bandwidth resource
   utilization by eliminating periodic Join/Prune message transmission.
   This draft proposes a modular extension to PIM to use either the TCP
   or SCTP transport protocol.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 28, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Notation
     1.2.  Definitions
   2.  Protocol Overview
   3.  New PIM Hello Options
     3.1.  PIM over the TCP Transport Protocol
     3.2.  PIM over the SCTP Transport Protocol
   4.  Establishing Transport Connections
     4.1.  TCP Connection Maintenance
     4.2.  Moving from PORT to Datagram Mode
     4.3.  On-demand versus Pre-configured Connections
     4.4.  Possible Hello Suppression Considerations
     4.5.  Avoiding a Pair of Connections between Neighbors
   5.  Common Header Definition
   6.  Explicit Tracking
   7.  Multiple Instances and Address-Family Support
   8.  Miscellany
   9.  Security Considerations
   10. IANA Considerations
   11. Contributors
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   The goals of this specification are:

   o  To create a simple incremental mechanism to provide reliable PIM
      message delivery in PIM version 2 for use with PIM Sparse-Mode
      [RFC4601] (including Source-Specific Multicast) and Bidirectional
      PIM [RFC5015].

   o  The reliable transport mechanism will be used for Join/Prune
      message transmission only.

   o  When a router supports this specification, it need not use the
      reliable transport mechanism with every neighbor.  That is,
      negotiation on a per-neighbor basis will occur.

   The explicit non-goals of this specification are:

   o  Changes to the PIM message formats as defined in [RFC4601].

   o  Provide support for automatic switching between Datagram mode and
      Transport mode.  Two routers that are PIM neighbors on a link will
      use Transport mode if and only if both have Transport mode
      enabled.

   This document specifies how periodic Join/Prune message transmission
   can be eliminated by using TCP [RFC0793] or SCTP [RFC4960] as the
   reliable transport mechanism for Join/Prune messages.

   This specification enables greater scalability in terms of control
   traffic overhead.  However, for routers connected to multi-access
   links, that comes at the price of increased control plane state and
   the overhead required to maintain this state.

   In many existing and emerging networks, particularly wireless and
   mobile satellite systems, link degradation due to weather,
   interference, and other impairments can result in temporary spikes in
   packet loss.  In these environments, periodic PIM joining can add
   join latency: a lost message is retransmitted only at the next
   periodic refresh, 60 seconds later.  By applying a reliable
   transport, a lost join is retransmitted rapidly.
   Furthermore, when the last user leaves a
   multicast group, any lost prune is similarly repaired and the
   multicast stream is quickly removed from the wireless/satellite link.
   Without a reliable transport, the multicast transmission could
   otherwise continue until it timed out, roughly 3 minutes later.  As
   network resources are at a premium in many of these environments,
   rapid termination of the multicast stream is critical to maintaining
   efficient use of bandwidth.

1.1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2.  Definitions

   PORT:  PIM Over Reliable Transport, the short name for the mechanism
      in this specification by which PIM can use the TCP or SCTP
      transport protocol.

   Periodic Join/Prune message:  A Join/Prune message sent periodically
      to refresh state.

   Incremental Join/Prune message:  A Join/Prune message sent as a
      result of state creation or deletion events.  Also known as a
      triggered message.

   Native Join/Prune message:  A Join/Prune message that is carried
      with an IP protocol type of PIM.

   PORT Join/Prune message:  A Join/Prune message using TCP or SCTP for
      transport.

   Datagram Mode:  The current PIM procedures, in which Join/Prune
      messages are encapsulated in IP packets and sent either triggered
      or periodically.

   PORT Mode:  Procedures used by PIM defined in this specification for
      sending Join/Prune messages over the TCP or SCTP transport layer.

2.  Protocol Overview

   PIM Over Reliable Transport (PORT) is a simple extension to PIMv2 for
   refresh reduction of PIM Join/Prune messages.  It involves sending
   incremental rather than periodic Join/Prune messages over a TCP/SCTP
   connection between PIM neighbors.
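
   As a rough illustration of "incremental rather than periodic", the
   Python sketch below derives the Join and Prune entries a downstream
   router would send when its desired state changes.  The function name
   incremental_join_prune and the modelling of state as a set of
   (source, group) tuples are assumptions made for this example; they
   are not defined by this specification.

```python
def incremental_join_prune(old_state, new_state):
    """Return (joins, prunes) that move the upstream from old to new state.

    Both arguments are sets of (source, group) tuples; this representation
    is hypothetical and chosen only for illustration.
    """
    joins = sorted(new_state - old_state)    # state created since last update
    prunes = sorted(old_state - new_state)   # state deleted since last update
    return joins, prunes


old = {("10.0.0.1", "232.1.1.1"), ("10.0.0.2", "232.1.1.2")}
new = {("10.0.0.2", "232.1.1.2"), ("10.0.0.3", "232.1.1.3")}
joins, prunes = incremental_join_prune(old, new)
```

   Unchanged entries produce no message at all, which is the refresh
   reduction that PORT aims for.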

   PORT only applies to PIM Sparse-Mode [RFC4601] and Bidirectional PIM
   [RFC5015] Join/Prune messages.

   This document does not restrict PORT to any specific link types.
   However, the use of PORT on, for example, multi-access LANs with many
   PIM neighbors should be carefully evaluated, because there may be a
   full mesh of PORT connections and explicit tracking of all PIM PORT
   routers is required.

   PORT can be deployed incrementally on a link between PORT-capable
   neighbors.  Routers that are not PORT capable can continue to use
   PIM in Datagram Mode.  PORT capability is detected using new PORT
   Capable PIM Hello Options.

   Once PORT is enabled on an interface and a PIM neighbor also
   announces that it is PORT enabled, only PORT Join/Prune messages will
   be used.  That is, only PORT Join/Prune messages are accepted from,
   and sent to, that particular neighbor.  Native Join/Prune messages
   may still be used for other neighbors.

   PORT Join/Prune messages are sent using a TCP/SCTP connection.  When
   two PIM neighbors are PORT enabled, both for TCP or both for SCTP,
   they will immediately, or on-demand, establish a connection.  If the
   connection goes down, they will again immediately, or on-demand, try
   to reestablish the connection.  No Join/Prune messages (neither
   Native nor PORT) are sent while there is no connection.

   When PORT is used, only incremental Join/Prune messages are sent from
   downstream routers to upstream routers.  As such, downstream routers
   do not generate periodic Join/Prune messages for state for which the
   RPF neighbor is PORT capable.

   For Joins and Prunes that are received over a TCP/SCTP connection,
   the upstream router does not start or maintain timers on the outgoing
   interface entry.  Instead, it keeps track of which downstream routers
   have expressed interest.
   An interface is deleted from the outgoing
   interface list only when all downstream routers on the interface no
   longer wish to receive traffic.

   There is no change proposed for the PIM Join/Prune packet format.
   However, for Join/Prune messages sent over TCP/SCTP connections, no
   IP Header is included.  The message begins with the PIM common
   header, followed by the Join/Prune message.  See Section 5 for
   details on the common header.

3.  New PIM Hello Options

3.1.  PIM over the TCP Transport Protocol

   Option Type: PIM-over-TCP Capable

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 27           |        Length = X + 8         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TCP Connection ID AFI     |   Reserved    |      Exp      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       TCP Connection ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Allocated Hello Type values can be found in [HELLO-OPT].

   When a router is configured to use PIM over TCP on a given interface,
   it MUST include the PIM-over-TCP Capable hello option in its Hello
   messages for that interface.  If a router is explicitly disabled from
   using PIM over TCP it MUST NOT include the PIM-over-TCP Capable hello
   option in its Hello messages.  When the router cannot set up a TCP
   connection, it will refrain from including this option.

   Implementations may provide a configuration option to enable or
   disable PORT functionality.  We recommend that this capability be
   disabled by default.

   Length:  The length in bytes of the value part of the Type/Length/
      Value encoding, where X is 4 bytes if an AFI of value 1 (IPv4) is
      used and 16 bytes if an AFI of value 2 (IPv6) is used [AFI].

   TCP Connection ID AFI:  The AFI value describing the address family
      of the address in the TCP Connection ID field.  When this field is
      0, a mechanism outside the scope of this spec is used to obtain
      the addresses used to establish the TCP connection.

   Reserved:  Set to zero on transmission and ignored on receipt.

   Exp:  For experimental use [RFC3692].

   TCP Connection ID:  An IPv4 or IPv6 address used to establish the
      TCP connection.  This field is omitted (length 0) for the
      Connection ID AFI 0.

   Interface ID:  An Interface ID is used to associate the connection a
      Join/Prune message is received over with an interface that is
      added to or removed from an oif-list.  When unnumbered interfaces
      are used or when a single Transport connection is used for sending
      and receiving Join/Prune messages over multiple interfaces, the
      Interface ID is used to convey the interface from the Join/Prune
      message sender to the Join/Prune message receiver.  When a PIM
      router sets a locally generated value for the Interface ID in the
      Hello TLV, it must send the same Interface ID value in all
      Join/Prune messages it sends to the PIM neighbor.

3.2.  PIM over the SCTP Transport Protocol

   Option Type: PIM-over-SCTP Capable

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 28           |        Length = X + 8         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    SCTP Connection ID AFI     |   Reserved    |      Exp      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      SCTP Connection ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Allocated Hello Type values can be found in [HELLO-OPT].
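
   The two Hello options share one layout and differ only in the Type
   value, so a single encoder can cover both.  The following Python
   sketch is illustrative only: the function name
   encode_port_hello_option is hypothetical, the 16-bit Type and Length
   fields are assumed to follow the usual PIM Hello option encoding of
   [RFC4601], and the Reserved and Exp bits are assumed to occupy the 16
   bits after the AFI and are simply sent as zero.

```python
import ipaddress
import struct

PIM_OVER_TCP_CAPABLE = 27    # Type value shown in Section 3.1
PIM_OVER_SCTP_CAPABLE = 28   # Type value shown in Section 3.2


def encode_port_hello_option(option_type, connection_id, interface_id):
    """Encode a PORT Capable Hello option (hypothetical helper).

    connection_id is an IPv4/IPv6 address string, or None for AFI 0, in
    which case the Connection ID field is omitted.
    """
    if connection_id is None:
        afi, address = 0, b""
    else:
        ip = ipaddress.ip_address(connection_id)
        afi = 1 if ip.version == 4 else 2     # AFI 1 = IPv4, AFI 2 = IPv6
        address = ip.packed                   # X = 4 or 16 bytes
    # Value part: AFI, then Reserved/Exp (assumed 16 bits, sent as zero),
    # then the Connection ID, then the 32-bit Interface ID.
    value = struct.pack("!HH", afi, 0) + address
    value += struct.pack("!I", interface_id)
    # Length covers the value part only, i.e. X + 8 bytes.
    return struct.pack("!HH", option_type, len(value)) + value


option = encode_port_hello_option(PIM_OVER_TCP_CAPABLE, "192.0.2.1", 7)
```

   For an IPv4 Connection ID the Length field comes out as 12 (X = 4
   plus the 8 fixed bytes), matching the "Length = X + 8" rule above.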

   When a router is configured to use PIM over SCTP on a given
   interface, it MUST include the PIM-over-SCTP Capable hello option in
   its Hello messages for that interface.  If a router is explicitly
   disabled from using PIM over SCTP it MUST NOT include the PIM-over-
   SCTP Capable hello option in its Hello messages.  When the router
   cannot set up an SCTP connection, it will refrain from including this
   option.

   Implementations may provide a configuration option to enable or
   disable PORT functionality.  We recommend that this capability be
   disabled by default.

   Length:  The length in bytes of the value part of the Type/Length/
      Value encoding, where X is 4 bytes if an AFI of value 1 (IPv4) is
      used and 16 bytes if an AFI of value 2 (IPv6) is used [AFI].

   SCTP Connection ID AFI:  The AFI value describing the address family
      of the address in the SCTP Connection ID field.  When this field
      is 0, a mechanism outside the scope of this spec is used to obtain
      the addresses used to establish the SCTP connection.

   Reserved:  Set to zero on transmission and ignored on receipt.

   Exp:  For experimental use [RFC3692].

   SCTP Connection ID:  An IPv4 or IPv6 address used to establish the
      SCTP connection.  This field is omitted (length 0) for the
      Connection ID AFI 0.

   Interface ID:  An Interface ID is used to associate the connection a
      Join/Prune message is received over with an interface that is
      added to or removed from an oif-list.  When unnumbered interfaces
      are used or when a single Transport connection is used for sending
      and receiving Join/Prune messages over multiple interfaces, the
      Interface ID is used to convey the interface from the Join/Prune
      message sender to the Join/Prune message receiver.  When a PIM
      router sets a locally generated value for the Interface ID in the
      Hello TLV, it must send the same Interface ID value in all
      Join/Prune messages it sends to the PIM neighbor.

4.  Establishing Transport Connections

   While a router interface is PORT enabled, a PIM-over-TCP or a PIM-
   over-SCTP option is included in the PIM Hello messages sent on that
   interface.  When a router on a PORT-enabled interface receives a
   Hello message containing a PIM-over-TCP/PIM-over-SCTP Option from a
   new neighbor, or an existing neighbor that did not previously include
   the option, it switches to PORT mode for that particular neighbor.

   When a router switches to PORT mode for a neighbor, it stops sending
   and accepting Native Join/Prune messages for that neighbor.  Any
   state from previous Native Join/Prune messages is left to expire as
   normal.  It will also attempt to establish a Transport connection
   (TCP or SCTP) with the neighbor.  If both the router and its neighbor
   have announced both PIM-over-TCP and PIM-over-SCTP options, SCTP MUST
   be used.

   When the router is using TCP it will compare the TCP Connection ID it
   announced in the PIM-over-TCP Capable Option with the TCP Connection
   ID in the Hello received from the neighbor.  The router with the
   lower Connection ID will do an active Transport open to the neighbor
   Connection ID.  The router with the higher Connection ID will do a
   passive Transport open.  An implementation may open connections only
   on demand; in that case, the neighbor with the higher Connection ID
   may do the active open (see Section 4.3).  Note that the source
   address of the active open must be the announced Connection ID.

   When the router is using SCTP, the IP address comparison need not be
   done since the SCTP protocol can handle call collision.

   If PORT is used both for IPv4 and IPv6, both IPv4 and IPv6 PIM Hello
   messages are sent, both containing PORT Hello options.
   If two
   neighbors announce the same transport (TCP or SCTP) and the same
   Connection ID in the IPv4 and IPv6 Hello messages, then only one
   connection is established and is shared.  Otherwise, two connections
   are established and are used separately.

   The PIM router that performs the active open initiates the connection
   with a locally generated source transport port number and a well-
   known destination transport port number.  The PIM router that
   performs the passive open listens on the well-known local transport
   port number and does not qualify the remote transport port number.
   See Section 5 for well-known port number assignment for PORT.

   When a Transport connection is established (or reestablished), the
   two routers MUST both send a full set of Join/Prune messages for
   state for which the other router is the upstream neighbor.  This is
   needed to ensure that the upstream neighbor has the correct state.
   When moving from Datagram mode, or when the connection has gone down,
   the router cannot be sure that all the previous Join/Prune state was
   received by the neighbor.  Any state received while in Datagram mode
   that is not refreshed will be left to expire.

   When a Transport connection goes down, Join/Prune state that was sent
   over the Transport connection is still retained.  The neighbor should
   not be considered down until the neighbor timer has expired.  This
   allows routers to do a control-plane switchover without disrupting
   the network.  If a Transport connection is reestablished before the
   neighbor timer expires, the previous state is intact and any new
   Join/Prune messages sent cause state to be created or removed
   (depending on whether it was a Join or Prune).  If the neighbor timer
   does expire, only the upstream router, which holds oif-list state for
   the expired downstream neighbor, needs to clear state.
   A downstream
   router whose upstream neighbor has expired simply updates the RPF for
   the corresponding state to a new neighbor, to which it triggers
   Join/Prune messages as it would in [RFC4601].  A PIM router is
   required to clear its neighbor table entry for a neighbor that has
   timed out due to neighbor holdtime expiration.

   Note that a Join sent over a Transport connection will only be seen
   by the upstream router, and thus will not cause routers on the link
   that do not use PIM PORT with the upstream router to delay the
   refresh of Join state for the same state.  Similarly, a Prune sent
   over a Transport connection will only be seen by the upstream router,
   and will thus never cause routers on the link that do not use PIM
   PORT with the upstream router to send a Join to override this Prune.

   Note also that a datagram PIM Join/Prune message for a given (S,G) or
   (*,G) sent by some router on a link will not cause routers on the
   same link that use a Transport connection with the upstream router
   for that state to suppress the refresh of that state to the upstream
   router (because they don't need to periodically refresh this state)
   or to send a Join to override a Prune (as the upstream router will
   only stop forwarding the traffic when all joined routers that use a
   Transport connection have explicitly sent a Prune for this state, as
   explained in Section 6).

4.1.  TCP Connection Maintenance

   TCP is designed to keep connections up indefinitely during a period
   of network disconnection.  If a PIM-over-TCP router fails, the TCP
   connection may stay up until the neighbor actually reboots, and even
   then it may continue to stay up until you actually try to send the
   neighbor some information.
   This is particularly relevant to PIM,
   since the flow of Join/Prune messages might be in only one direction,
   and the downstream neighbor might never get any indication via TCP
   that the other end of the connection is not really there.

   Implementations SHOULD support the use of TCP Keep-Alives, see
   [RFC1122] section 4.2.3.6.  We recommend that the use of Keep-Alives
   be optional, allowing network administrators to use them as needed.
   Note that Keep-Alives can be used by a peer independently of whether
   the other peer supports them.  With the use of Keep-Alives one can
   detect that a connection is not working without sending any TCP data.

   Most applications using TCP want to detect when a neighbor is no
   longer there, so that the associated application state can be
   released.  Also, one wants to clean up the TCP state, and not keep
   half-open connections around indefinitely.  This is accomplished by
   using PIM Hellos and by not introducing an application-specific or
   new PIM keep-alive message.  Therefore, when the GENID in a received
   PIM Hello message changes, and a TCP connection is established or in
   the process of being established, the local side will tear down the
   connection and attempt to open a new one for the new instance of
   the neighbor coming up.  However, if the connection is shared by
   multiple interfaces and the GENID changes only for one of them, then
   there was not a full reboot and the connection is likely to still
   work.  In that case, the router should just resend all Join/Prune
   state for that particular neighbor.  This is similar to how state is
   refreshed when GENID changes for PIM in datagram mode.

   There may be situations where a router ignores some joins or prunes,
   e.g., due to wrong RP information or receiving joins on an RPF
   interface.  A router may try to cache such messages and apply them
   later if the error is only temporary.
   It may, however, also ignore the
   message, and later change its GENID for that interface to make the
   neighbor resend all state, including any that may have been
   previously ignored.  It is possible that one receives Join/Prune
   messages for an interface/link that is down.  As long as the neighbor
   has not expired, we recommend processing those messages as usual.  If
   they are ignored, then the router should change the GENID for that
   interface when it comes back up, in order to get a full update.

4.2.  Moving from PORT to Datagram Mode

   There may be situations where an administrator decides to stop using
   PORT.  If PORT is disabled on a router interface, we start expiry
   timers with the respective neighbor holdtimes as the initial values.
   Similarly, if we receive a Hello message without a PORT Capable
   option from a neighbor, we start expiry timers for all Join/Prune
   state we have for that particular neighbor.  The Transport connection
   should be shut down as soon as there are no more PIM neighborships
   using it.

   That is, for the connection we have associated local and remote
   Connection IDs.  When there is no PIM neighbor with that particular
   remote connection ID on any interface where we announce the local
   connection ID, the connection should be shut down.

4.3.  On-demand versus Pre-configured Connections

   Transport connections could be established when they are needed or
   when a router interface to other PIM neighbors has come up.  The
   advantage of on-demand Transport connection establishment is reduced
   use of router resources, especially in the case where there is no
   need for n^2 connections on a network interface.  The disadvantage
   is additional delay and queueing when a Join/Prune message needs to
   be sent and a Transport connection is not established yet.
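
   The queueing cost of on-demand establishment can be pictured with a
   minimal Python sketch.  Everything named here is hypothetical: the
   class OnDemandNeighbor, its methods, and the open_connection callable
   (assumed to start an asynchronous open and later invoke the supplied
   callback with a send function) are illustration only and are not part
   of this specification.

```python
from collections import deque


class OnDemandNeighbor:
    """Queue Join/Prune messages until a Transport connection exists."""

    def __init__(self, open_connection):
        # open_connection(callback) is an assumed hook that begins an
        # active open and calls callback(send) once it is established.
        self._open_connection = open_connection
        self._send = None
        self._opening = False
        self._pending = deque()

    def send_join_prune(self, message):
        if self._send is not None:
            self._send(message)          # connection already up
            return
        self._pending.append(message)    # delay: message waits in a queue
        if not self._opening:
            self._opening = True         # open only once per neighbor
            self._open_connection(self._on_established)

    def _on_established(self, send):
        self._send = send
        self._opening = False
        while self._pending:             # flush in original order
            send(self._pending.popleft())


requests = []
neighbor = OnDemandNeighbor(requests.append)
neighbor.send_join_prune(b"join-1")
neighbor.send_join_prune(b"prune-2")
```

   The sketch leaves out the full state resynchronization that Section 4
   requires whenever a connection is established or reestablished; a
   real implementation would send that full set rather than only the
   queued messages.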

   If a router interface has become operational and PIM neighbors are
   learned from Hello messages, at that time, Transport connections may
   be established.  The advantage is that a connection is ready to
   transport data by the time a Join/Prune message needs to be sent.
   The disadvantage is there can be more connections established than
   needed.  This can occur when there is a small set of RPF neighbors
   for the active distribution trees compared to the total number of
   neighbors.  Even when Transport connections are pre-established
   before they are needed, a connection can go down and an
   implementation will have to deal with an on-demand situation.

   Note that for TCP, it is the router with the lower Connection ID that
   decides whether to open a connection immediately, or on-demand.  The
   router with the higher Connection ID should only initiate a
   connection on-demand, that is, if it needs to send a Join/Prune
   message and there is no currently established connection.

   Therefore, this specification recommends but does not mandate the use
   of on-demand Transport connection establishment.

4.4.  Possible Hello Suppression Considerations

   This specification indicates that a Transport connection cannot be
   established until a Hello message is received.  One reason for this
   is to determine if the PIM neighbor supports this specification and
   the other is to determine the remote address to use to establish the
   Transport connection.

   There are cases where it is desirable to suppress entirely the
   transmission of Hello messages.  In this case, how to determine
   whether the PIM neighbor supports this specification, and the out-of-
   band (outside of the PIM protocol) method used to determine the
   remote address for the Transport connection, are outside the scope of
   this document.

4.5.  Avoiding a Pair of Connections between Neighbors

   To ensure there are not two connections between a pair of PIM
   neighbors, the following set of rules must be followed.  Let A and B
   be two PIM neighbors where A's Connection ID is numerically smaller
   than B's Connection ID, and each is known to the other as having a
   potential PIM adjacency relationship.

   At node A:

   o  If there is already an established TCP connection to B, on the
      PIM-over-TCP port, then A MUST NOT attempt to establish a new
      connection to B.  Rather it uses the established connection to
      send Join/Prune messages to B.  (This is independent of which node
      initiated the connection.)

   o  If A has initiated a connection to B, but the connection is still
      in the process of being established, then A MUST refuse any
      connection on the PIM-over-TCP port from B.

   o  At any time when A does not have a connection to B which is either
      established or in the process of being established, A MUST accept
      connections from B.

   At node B:

   o  If there is already an established TCP connection to A, on the
      PIM-over-TCP port, then B MUST NOT attempt to establish a new
      connection to A.  Rather it uses the established connection to
      send Join/Prune messages to A.  (This is independent of which node
      initiated the connection.)

   o  If B has initiated a connection to A, but the connection is still
      in the process of being established, then if A initiates a
      connection too, B MUST accept the connection initiated by A and
      must release the connection which it (B) initiated.

5.  Common Header Definition

   It may be desirable for scaling purposes to allow Join/Prune messages
   from different PIM protocol instances to be sent over the same
   Transport connection.
   Also, it may be desirable to have a set of
   Join/Prune messages for one address-family sent over a Transport
   connection that is established over a different address-family
   network layer.

   To be able to do this we need a common header that is inserted and
   parsed for each PIM Join/Prune message that is sent on a Transport
   connection.  This common header will provide both record boundary and
   demux points when sending over a stream protocol such as TCP.

   Each Join/Prune message is preceded by the following common header in
   Type/Length/Value format.  Multiple different TLV types can be sent
   over the same Transport connection.

   To make sure PIM Join/Prune messages are delivered as soon as the TCP
   transport layer receives the Join/Prune buffer, the TCP Push flag
   will be set in all outgoing Join/Prune messages sent over a TCP
   transport connection.

   PIM messages will be sent using destination TCP port number 8471.
   When using SCTP as the reliable transport, destination port number
   8471 will be used.  See Section 10 for IANA considerations.

   Join/Prune messages are error checked.  Errors include a bad PIM
   checksum, illegal type fields, illegal addresses or a truncated
   message.  If any parsing errors occur in a Join/Prune message, it is
   skipped, and we proceed processing any following TLVs.

   The TLV type field is 16 bits.  The range 61440 - 65535 is for
   experimental use [RFC3692].
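
   How the common header supplies record boundaries on a byte stream can
   be sketched in Python as follows.  This is an illustration under
   stated assumptions, not a definitive parser: the function names
   extract_tlvs and classify are hypothetical, and the Length field is
   assumed to be the 16 bits that follow the 16-bit Type, in network
   byte order, counting only the value part.

```python
import struct

TLV_IPV4_JOIN_PRUNE = 1                   # Type values defined in Section 5
TLV_IPV6_JOIN_PRUNE = 2
EXPERIMENTAL_TYPES = range(61440, 65536)  # experimental range [RFC3692]


def extract_tlvs(buffer):
    """Split received bytes into complete TLVs (hypothetical helper).

    Returns (tlvs, remainder): tlvs is a list of (type, value) tuples and
    remainder is the tail that does not yet hold a complete TLV.
    """
    tlvs = []
    offset = 0
    while len(buffer) - offset >= 4:
        tlv_type, length = struct.unpack_from("!HH", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break                         # wait for more stream data
        start = offset + 4
        tlvs.append((tlv_type, bytes(buffer[start:start + length])))
        offset = start + length           # record boundary of the next TLV
    return tlvs, bytes(buffer[offset:])


def classify(tlv_type):
    """Demultiplex on the Type field."""
    if tlv_type == TLV_IPV4_JOIN_PRUNE:
        return "ipv4-join-prune"
    if tlv_type == TLV_IPV6_JOIN_PRUNE:
        return "ipv6-join-prune"
    if tlv_type in EXPERIMENTAL_TYPES:
        return "experimental"
    return "unknown"


stream = struct.pack("!HH", 1, 3) + b"abc" + struct.pack("!HH", 2, 5) + b"xy"
tlvs, remainder = extract_tlvs(stream)
```

   Because the Length field delimits each record, a TLV whose Join/Prune
   payload fails error checking can be skipped and parsing can continue
   with the next TLV, as described above.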

   The currently defined TLVs are:

   IPv4 Join/Prune Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Type = 1               |        Length = X + 16        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Reserved                     |  Exp  |I-Type |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Instance ID . . .                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     . . . Instance ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   PIMv2 Join/Prune Message                    |
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The IPv4 Join/Prune common header is used when a Join/Prune message
   is sent that has all IPv4 encoded addresses in the PIM payload.

   Length: In bytes for the value part of the Type/Length/Value
      encoding. X is the number of bytes that make up the PIMv2
      Join/Prune message.

   Reserved: Set to zero on transmission and ignored on receipt.

   Exp: For experimental use [RFC3692].

   I-Type: Defines the encoding and semantics of the Instance ID
      field. Instance Type 0 means Instance ID is not used. Other
      values are not defined in this specification. A message with an
      unknown Instance Type MUST be ignored.

   Interface ID: This is the Interface ID from the Hello TLV, defined
      in this specification, that the PIM router is sending to the PIM
      neighbor. It indicates to the PIM neighbor what interface to
      associate the Join/Prune with.

   Instance ID: This document only defines this for Instance Type 0.
      For type 0 the field should be set to zero on transmission and
      ignored on receipt. This field is always 64 bits.

   PIMv2 Join/Prune Message: PIMv2 Join/Prune message and payload with
      no IP header in front of it. As the packet format diagram shows,
      multiple Join/Prune messages can go into one TCP/SCTP stream from
      the same or different Interface and Instance IDs.

   IPv6 Join/Prune Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Type = 2               |        Length = X + 16        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Reserved                     |  Exp  |I-Type |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Instance ID . . .                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     . . . Instance ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   PIMv2 Join/Prune Message                    |
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The IPv6 Join/Prune common header is used when a Join/Prune message
   is sent that has all IPv6 encoded addresses in the PIM payload.

   Length: In bytes for the value part of the Type/Length/Value
      encoding. X is the number of bytes that make up the PIMv2
      Join/Prune message.

   Reserved: Set to zero on transmission and ignored on receipt.

   Exp: For experimental use [RFC3692].

   I-Type: Defines the encoding and semantics of the Instance ID
      field. Instance Type 0 means Instance ID is not used. Other
      values are not defined in this specification.

   Interface ID: This is the Interface ID from the Hello TLV, defined
      in this specification, that the PIM router is sending to the PIM
      neighbor. It indicates to the PIM neighbor what interface to
      associate the Join/Prune with.

   Instance ID: This document only defines this for Instance Type 0.
      For type 0 the field should be set to zero on transmission and
      ignored on receipt.

   PIMv2 Join/Prune Message: PIMv2 Join/Prune message and payload with
      no IP header in front of it. As the packet format diagram shows,
      multiple Join/Prune messages can go into one TCP/SCTP stream from
      the same or different Interface and Instance IDs.

6. Explicit Tracking

   When explicit tracking is used, a router keeps track of join state
   for individual downstream neighbors on a given interface. This is
   done for all PORT joins and prunes. It may also be done for native
   join/prune messages, if all neighbors on the LAN have set the T bit
   of the LAN Prune Delay option. In the discussion below we will talk
   about ET (explicit tracking) neighbors, and non-ET neighbors. The
   set of ET neighbors always includes the PORT neighbors. The set of
   non-ET neighbors consists of all the non-PORT neighbors unless all
   neighbors have set the LAN Prune Delay T bit. In that case the set
   of ET neighbors contains all neighbors.

   For some link-types, e.g. point-to-point, tracking neighbors is no
   different than tracking interfaces. It may also be possible for an
   implementation to treat different downstream neighbors as being on
   different logical interfaces, even if they are on the same physical
   link. Exactly how this is implemented, and for which link types, is
   left to the implementer.

   For (*,G) and (S,G) state, the router starts forwarding traffic on an
   interface when a Join is received from a neighbor on such an
   interface. When a non-ET neighbor sends a Prune, there is generally
   a small delay to see if another non-ET neighbor sends a Join to
   override the Prune. If there is no override, one should note that no
   non-ET neighbor is interested. If no ET neighbors are interested,
   the interface can be removed from the oif-list.
   When an ET neighbor
   sends a Prune, one removes the join state for that neighbor. If no
   other ET or non-ET neighbors are interested, the interface can be
   removed from the oif-list. When a PORT neighbor sends a prune, there
   can be no Prune Override, since the Prune is not visible to other
   neighbors.

   For (S,G,R) state, the router needs to track Prune state on the
   shared tree. It needs to know which ET neighbors have sent prunes,
   and whether any non-ET neighbors have sent prunes. Normally one
   would forward a packet from a source S to a group G out on an
   interface if a (*,G)-join is received, but no (S,G,R)-prune. With ET
   one needs to do this check per ET neighbor. That is, the packet
   should be forwarded unless all ET neighbors that have sent
   (*,G)-joins have also sent (S,G,R)-prunes and, if a non-ET neighbor
   has sent a (*,G)-join, there is also non-ET (S,G,R)-prune state.

7. Multiple Instances and Address-Family Support

   Multiple instances of the PIM protocol may be used to support e.g.
   multiple address families. Multiple instances can cause a multiplier
   effect on the number of router resources consumed. To use router
   resources more efficiently, Join/Prune messages can be multiplexed
   over fewer Transport connections.

   There are two ways this can be accomplished, one using a common
   header format over a TCP connection and the other using multiple
   streams over a single SCTP connection.

   Using different TLVs of the Common Header format described
   previously in this specification, both IPv4 and IPv6 based
   Join/Prune messages can be encoded within a Transport connection.
   Likewise, within a TLV, multiple occurrences of Join/Prune messages
   can occur and are tagged with an instance-ID so multiple Join/Prune
   messages for different instances can use a single Transport
   connection.
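
   The multiplexing described above can be sketched as follows. This
   is an illustration under stated assumptions, not a definitive
   implementation. It assumes the 16-byte value prefix implied by
   "Length = X + 16": a 32-bit word holding Reserved, Exp and I-Type,
   a 32-bit Interface ID, and a 64-bit Instance ID. The split of the
   first word into 24 Reserved bits, 4 Exp bits and 4 I-Type bits is
   an assumption of this sketch, as are the names decode_value and
   demux. The input is a list of (type, value) pairs already separated
   using the Type and Length fields.

```python
import struct
from collections import defaultdict

TYPE_IPV4_JP = 1   # IPv4 Join/Prune Message
TYPE_IPV6_JP = 2   # IPv6 Join/Prune Message
PREFIX_LEN = 16    # bytes in the value part before the PIMv2 message


def decode_value(value):
    """Decode the fixed fields that precede the PIMv2 Join/Prune message."""
    if len(value) < PREFIX_LEN:
        raise ValueError("truncated common header")
    word, interface_id, instance_id = struct.unpack_from("!IIQ", value, 0)
    exp = (word >> 4) & 0xF   # assumed 4-bit Exp field
    i_type = word & 0xF       # assumed 4-bit I-Type field
    return exp, i_type, interface_id, instance_id, value[PREFIX_LEN:]


def demux(records):
    """Group Join/Prune payloads by (address family, Interface ID)."""
    queues = defaultdict(list)
    for tlv_type, value in records:
        if tlv_type not in (TYPE_IPV4_JP, TYPE_IPV6_JP):
            continue  # type not defined here: skip this record
        try:
            _exp, i_type, interface_id, _instance_id, jp = decode_value(value)
        except ValueError:
            continue  # parsing error: skip and go on with following TLVs
        if i_type != 0:
            continue  # unknown Instance Type: message is ignored
        family = "ipv4" if tlv_type == TYPE_IPV4_JP else "ipv6"
        queues[(family, interface_id)].append(jp)
    return dict(queues)
```

   With Instance Type 0 the Instance ID carries no information, so the
   sketch keys only on address family and Interface ID. An
   implementation that supports other Instance Types would presumably
   add the Instance ID to the key.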

   When using SCTP multi-streaming, the common header is still used to
   convey instance information but an SCTP association is used, on a
   per-instance basis, to send data concurrently for multiple instances.
   When data is sent concurrently, head of line blocking, which can
   occur when using TCP, is avoided.

8. Miscellany

   No changes are expected in the processing of other PIM messages like
   PIM Asserts, Grafts, Graft-Acks, Registers, and Register-Stops. This
   goes for BSR and Auto-RP type messages as well.

   This extension is applicable only to PIM-SM, PIM-SSM and Bidir-PIM.
   It does not take requirements for PIM-DM into consideration.

9. Security Considerations

   Transport connections can be authenticated using HMACs MD5 and
   SHA-1, similar to their use in BGP [RFC4271] and MSDP [RFC3618].

   When using SCTP as the transport protocol, [RFC4895] can be used, on
   a per SCTP association basis, to authenticate PIM data.

10. IANA Considerations

   This specification makes use of a TCP port number and an SCTP port
   number for PIM-Over-Reliable-Transport that have been allocated by
   IANA. It also makes use of IANA PIM Hello Options allocations that
   should be made permanent. In addition, a registry for PORT message
   types is requested. The registry should cover the range 0 - 61439.
   An RFC is required for assignments in that range. This document
   defines two PORT message types: Type 1, IPv4 Join/Prune Message; and
   Type 2, IPv6 Join/Prune Message. The type range 61440 - 65535 is
   for experimental use [RFC3692].

11. Contributors

   In addition to the persons listed as authors, significant
   contributions were provided by Apoorva Karan and Arjen Boers.

12. Acknowledgments

   The authors would like to give a special thank you and appreciation
   to Nidhi Bhaskar for her initial design and early prototype of this
   idea.

   Appreciation goes to Randall Stewart for his authoritative review and
   recommendation for using SCTP.

   Thanks also go to the following for their ideas and commentary
   review of this specification: Mike McBride, Toerless Eckert, Yiqun
   Cai, Albert Tian, Suresh Boddapati, Nataraj Batchu, Daniel Voce, John
   Zwiebel, Yakov Rekhter, Lenny Giuliano, Gorry Fairhurst, Sameer
   Gulrajani, Thomas Morin and Dimitri Papadimitriou.

   A special thank you goes to Eric Rosen for his very detailed review
   and commentary. Many of his comments are reflected as text in this
   specification.

13. References

13.1. Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3618]  Fenner, B. and D. Meyer, "Multicast Source Discovery
              Protocol (MSDP)", RFC 3618, October 2003.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC4895]  Tuexen, M., Stewart, R., Lei, P., and E. Rescorla,
              "Authenticated Chunks for the Stream Control Transmission
              Protocol (SCTP)", RFC 4895, August 2007.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
              "Bidirectional Protocol Independent Multicast (BIDIR-
              PIM)", RFC 5015, October 2007.

13.2. Informative References

   [AFI]      IANA, "Address Family Indicators (AFIs)", ADDRESS FAMILY
              NUMBERS http://www.iana.org/numbers.html, February 2007.

   [HELLO-OPT]
              IANA, "PIM Hello Options", PIM-HELLO-OPTIONS per
              RFC4601 http://www.iana.org/assignments/pim-hello-options,
              March 2007.

   [RFC3692]  Narten, T., "Assigning Experimental and Testing Numbers
              Considered Useful", BCP 82, RFC 3692, January 2004.

Authors' Addresses

   Dino Farinacci
   cisco Systems
   Tasman Drive
   San Jose, CA 95134
   USA

   Email: dino@cisco.com

   IJsbrand Wijnands
   cisco Systems
   Tasman Drive
   San Jose, CA 95134
   USA

   Email: ice@cisco.com

   Stig Venaas
   cisco Systems
   Tasman Drive
   San Jose, CA 95134
   USA

   Email: stig@cisco.com

   Maria Napierala
   AT&T Labs
   200 Laurel Drive
   Middletown, New Jersey 07748
   USA

   Email: mnapierala@att.com