Network Working Group                                       D. Farinacci
Internet-Draft                                              IJ. Wijnands
Intended status: Experimental                                  S. Venaas
Expires: September 5, 2010                                 cisco Systems
                                                            M. Napierala
                                                               AT&T Labs
                                                           March 4, 2010


                 A Reliable Transport Mechanism for PIM
                       draft-ietf-pim-port-03.txt

Abstract

   This draft describes how a reliable transport mechanism can be used
   by the PIM protocol to optimize CPU and bandwidth resource
   utilization by eliminating periodic Join/Prune message transmission.
   This draft proposes a modular extension to PIM to use either the TCP
   or SCTP transport protocol.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements Notation
     1.2. Definitions
   2. Protocol Overview
   3. New PIM Hello Options
     3.1. PIM over the TCP Transport Protocol
     3.2. PIM over the SCTP Transport Protocol
   4. Establishing Transport Connections
     4.1. TCP Connection Maintenance
     4.2. Moving from PORT to Datagram Mode
     4.3. On-demand versus Pre-configured Connections
     4.4. Possible Hello Suppression Considerations
     4.5. Avoiding a Pair of Connections between Neighbors
   5. Common Header Definition
   6. Explicit Tracking
   7. Multiple Instances and Address-Family Support
   8. Miscellany
   9. Security Considerations
   10. IANA Considerations
   11. Contributors
   12. Acknowledgments
   13. References
     13.1. Normative References
     13.2. Informative References
   Authors' Addresses

1. Introduction

   The goals of this specification are:

   o  To create a simple incremental mechanism to provide reliable PIM
      message delivery in PIM version 2 for use with PIM Sparse-Mode
      [RFC4601] and Bidirectional PIM [RFC5015].

   o  The reliable transport mechanism will be used for Join-Prune
      message transmission only.

   o  When a router supports this specification, it need not use the
      reliable transport mechanism with every neighbor. That is,
      negotiation on a per neighbor basis will occur.

   The explicit non-goals of this specification are:

   o  Changes to the PIM protocol machinery as defined in [RFC4601].
      The reliable transport mechanism will be used as a plugin layer so
      the PIM component does not know it is really there.

   o  Provide support for automatic switching between Datagram mode and
      Transport mode. Two routers that are PIM neighbors on a link will
      always use Transport mode if and only if both have Transport mode
      enabled.

   This document will specify how periodic JP message transmission can
   be eliminated by using TCP [RFC0761] or SCTP [RFC4960] as the
   reliable transport mechanism for JP messages.

   This specification enables greater scalability in multicast
   deployment since the processing required for protocol state
   maintenance can be reduced. In addition to reduced processing on PIM
   enabled routers, another important feature is the reduced join and
   leave latency provided through a reliable transport.

   In many existing and emerging networks, particularly wireless and
   mobile satellite systems, link degradation due to weather,
   interference, and other impairments can result in temporary spikes in
   packet loss. In these environments, periodic PIM joining can cause
   join latency, because a lost message is retransmitted only 60
   seconds later. By applying a reliable transport, a lost join is
   retransmitted rapidly. Furthermore, when the last user leaves a
   multicast group, any lost prune is similarly repaired and the
   multicast stream is quickly removed from the wireless/satellite link.
   Without a reliable transport, the multicast transmission could
   otherwise continue until it timed out, roughly 3 minutes later. As
   network resources are at a premium in many of these environments,
   rapid termination of the multicast stream is critical to maintaining
   efficient use of bandwidth.

1.1. Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2. Definitions

   PORT: Stands for PIM Over Reliable Transport, the short form for
      the mechanism in this specification whereby PIM can use the TCP
      or SCTP transport protocol.

   JP Message: An abbreviation for a Join-Prune message.

   Periodic JP: A JP message sent periodically to refresh state.

   Incremental JP: A JP message sent as a result of state creation or
      deletion events. Also known as a triggered message.

   Native JP: A JP message which is carried with an IP protocol type
      of PIM.

   Reliable JP: A JP message using TCP or SCTP for transport.

   Datagram Mode: The current procedures PIM uses by encapsulating JP
      messages in IP packets sent either triggered or periodically.

   PORT Mode: Procedures used by PIM defined in this specification for
      sending JP messages over the TCP or SCTP transport layer.

2. Protocol Overview

   PIM Over Reliable Transport (PORT) is a simple extension to PIMv2 for
   refresh reduction of PIM JP messages. It involves sending
   incremental rather than periodic JPs over a TCP/SCTP connection
   between PIM neighbors.

   PORT only applies to PIM Sparse-Mode [RFC4601] and Bidirectional PIM
   [RFC5015] JP messages.

   This document does not restrict PORT to any specific link types. It
   is, however, not recommended to use PORT on, for example,
   multi-access LANs with many PIM neighbors, because there may be a
   full mesh of PORT connections and there is no join suppression.

   PORT can be incrementally used on a link between PORT capable
   neighbors. Routers which are not PORT capable can continue to use
   PIM in Datagram Mode. PORT capability is detected using new PORT
   Capable PIM Hello Options.

   Once PORT is enabled on an interface and a PIM neighbor also
   announces that it is PORT enabled, only Reliable JP messages will be
   used. That is, only Reliable JP messages are accepted from, and sent
   to, that particular neighbor. Native JP messages may still be used
   for other neighbors.

   Reliable JP messages are sent using a TCP/SCTP connection. When two
   PIM neighbors are PORT enabled, both for TCP or both for SCTP, they
   will immediately, or on-demand, establish a connection. If the
   connection goes down, they will again immediately, or on-demand, try
   to reestablish the connection. No JP messages (neither Native nor
   Reliable) are sent while there is no connection.

   When PORT is used, only incremental JPs are sent from downstream
   routers to upstream routers. As such, downstream routers do not
   generate periodic JPs for routes which RPF to a PORT-capable
   neighbor.
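
   The per-neighbor behavior described in the preceding paragraphs can
   be summarized in a short, non-normative sketch. The names Action and
   jp_action are hypothetical and exist only for illustration, and
   modelling "no JP is sent while there is no connection" as a HOLD
   result is an assumption of this sketch, not something the text
   mandates.

```python
from enum import Enum


class Action(Enum):
    SEND_NATIVE = "send as a Native JP (Datagram Mode)"
    SEND_RELIABLE = "send as a Reliable JP over the connection"
    HOLD = "hold until the connection is established"  # sketch assumption
    SUPPRESS = "do not generate"


def jp_action(port_negotiated: bool, connected: bool, periodic: bool) -> Action:
    """Decide what to do with a JP message destined for one neighbor.

    port_negotiated: both routers announced the same PORT transport.
    connected: the TCP/SCTP connection to that neighbor is established.
    periodic: the JP is a periodic refresh rather than an incremental JP.
    """
    if not port_negotiated:
        # Neighbors that are not PORT capable keep using Datagram Mode.
        return Action.SEND_NATIVE
    if periodic:
        # Only incremental JPs are sent towards a PORT neighbor.
        return Action.SUPPRESS
    # Neither Native nor Reliable JPs are sent while the connection is down.
    return Action.SEND_RELIABLE if connected else Action.HOLD
```

   For example, jp_action(True, False, False) yields Action.HOLD: an
   incremental JP for a PORT neighbor is not sent as a Native JP just
   because the connection happens to be down.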

   For Joins and Prunes that are received over a TCP/SCTP connection,
   the upstream router does not start or maintain timers on the outgoing
   interface entry. Instead, it keeps track of which downstream routers
   have expressed interest. An interface is deleted from the outgoing
   interface list only when all downstream routers on the interface no
   longer wish to receive traffic.

   There is no change proposed for the PIM JP packet format. However,
   for JPs sent over TCP/SCTP connections, no IP Header is included.
   The message begins with the PIM common header, followed by the JP
   message. See Section 5 for details on the common header.

3. New PIM Hello Options

3.1. PIM over the TCP Transport Protocol

   Option Type: PIM-over-TCP Capable

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 27           |        Length = X + 8         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TCP Connection ID AFI     |       Reserved        |  Exp  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       TCP Connection ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Allocated Hello Type values can be found in [HELLO-OPT].

   When a router is configured to use PIM over TCP on a given interface,
   it MUST include the PIM-over-TCP Capable hello option in its Hello
   messages for that interface. If a router is explicitly disabled from
   using JP over TCP, it MUST NOT include the PIM-over-TCP Capable hello
   option in its Hello messages. When the router cannot set up a TCP
   connection, it will refrain from including this option.

   Implementations may provide a configuration option to enable or
   disable PORT functionality. We recommend that this capability be
   disabled by default.
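
   As a non-normative illustration, the PIM-over-TCP Capable option
   shown in the diagram of this section could be serialized as follows.
   The function name encode_tcp_capable is hypothetical. The sketch
   assumes network byte order, a 16-bit AFI of 1 (IPv4), 2 (IPv6) or 0
   (no address carried), and it writes the 16 bits that hold Reserved
   and Exp as zero without interpreting them.

```python
import ipaddress
import struct
from typing import Optional

PIM_OVER_TCP_CAPABLE = 27  # option type from the diagram in this section


def encode_tcp_capable(connection_id: Optional[str], interface_id: int) -> bytes:
    """Build the Hello option; connection_id=None encodes AFI 0."""
    if connection_id is None:
        afi, address = 0, b""  # Connection ID field omitted (length 0)
    else:
        ip = ipaddress.ip_address(connection_id)
        afi = 1 if ip.version == 4 else 2
        address = ip.packed  # X = 4 bytes for IPv4, 16 bytes for IPv6
    # Value part: AFI (2) + Reserved/Exp (2) + Connection ID (X) + Interface ID (4)
    value = struct.pack("!HH", afi, 0) + address + struct.pack("!I", interface_id)
    # The Length field therefore works out to X + 8.
    return struct.pack("!HH", PIM_OVER_TCP_CAPABLE, len(value)) + value
```

   With an IPv4 Connection ID the Length field is 12, with an IPv6
   Connection ID it is 24, and with AFI 0 it is 8. The same layout with
   Type 28 would apply to the PIM-over-SCTP Capable option.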

   Length: The length in bytes of the value part of the
      Type/Length/Value encoding, where X is 4 bytes if an AFI of value
      1 (IPv4) is used and 16 bytes if an AFI of value 2 (IPv6) is used
      [AFI].

   TCP Connection ID AFI: The AFI value to describe the address-family
      of the address of the TCP Connection ID field. When this field is
      0, a mechanism outside the scope of this spec is used to obtain
      the addresses used to establish the TCP connection.

   Reserved: Set to zero on transmission and ignored on receipt.

   Exp: For experimental use [RFC3692].

   TCP Connection ID: An IPv4 or IPv6 address used to establish the
      TCP connection. This field is omitted (length 0) for the
      Connection ID AFI 0.

   Interface ID: An Interface ID is used to associate the connection a
      JP message is received over with an interface which is added or
      removed from an oif-list. When unnumbered interfaces are used or
      when a single Transport connection is used for sending and
      receiving JP messages over multiple interfaces, the Interface ID
      is used to convey the interface from JP message sender to JP
      message receiver. When a PIM router sets a locally generated
      value for the Interface ID in the Hello TLV, it must send the same
      Interface ID value in all JP messages it is sending to the PIM
      neighbor.

3.2. PIM over the SCTP Transport Protocol

   Option Type: PIM-over-SCTP Capable

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 28           |        Length = X + 8         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    SCTP Connection ID AFI     |       Reserved        |  Exp  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      SCTP Connection ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Allocated Hello Type values can be found in [HELLO-OPT].

   When a router is configured to use PIM over SCTP on a given
   interface, it MUST include the PIM-over-SCTP Capable hello option in
   its Hello messages for that interface. If a router is explicitly
   disabled from using JP over SCTP, it MUST NOT include the
   PIM-over-SCTP Capable hello option in its Hello messages. When the
   router cannot set up an SCTP connection, it will refrain from
   including this option.

   Implementations may provide a configuration option to enable or
   disable PORT functionality. We recommend that this capability be
   disabled by default.

   Length: The length in bytes of the value part of the
      Type/Length/Value encoding, where X is 4 bytes if an AFI of value
      1 (IPv4) is used and 16 bytes if an AFI of value 2 (IPv6) is used
      [AFI].

   SCTP Connection ID AFI: The AFI value to describe the
      address-family of the address of the SCTP Connection ID field.
      When this field is 0, a mechanism outside the scope of this spec
      is used to obtain the addresses used to establish the SCTP
      connection.

   Reserved: Set to zero on transmission and ignored on receipt.

   Exp: For experimental use [RFC3692].

   SCTP Connection ID: An IPv4 or IPv6 address used to establish the
      SCTP connection.
      This field is omitted (length 0) for the Connection ID AFI 0.

   Interface ID: An Interface ID is used to associate the connection a
      JP message is received over with an interface which is added or
      removed from an oif-list. When unnumbered interfaces are used or
      when a single Transport connection is used for sending and
      receiving JP messages over multiple interfaces, the Interface ID
      is used to convey the interface from JP message sender to JP
      message receiver. When a PIM router sets a locally generated
      value for the Interface ID in the Hello TLV, it must send the same
      Interface ID value in all JP messages it is sending to the PIM
      neighbor.

4. Establishing Transport Connections

   While a router interface is PORT enabled, a PIM-over-TCP or a
   PIM-over-SCTP option is included in the PIM Hello messages sent on
   that interface. When a router on a PORT-enabled interface receives a
   Hello message containing a PIM-over-TCP/PIM-over-SCTP Option from a
   new neighbor, or an existing neighbor that did not previously include
   the option, it switches to PORT mode for that particular neighbor.

   When a router switches to PORT mode for a neighbor, it stops sending
   and accepting Native JP messages for that neighbor. Any state from
   previous Native JP messages is left to expire as normal. It will
   also attempt to establish a Transport connection (TCP or SCTP) with
   the neighbor.

   When the router is using TCP, it will compare the TCP Connection ID
   it announced in the PIM-over-TCP Capable Option with the TCP
   Connection ID in the Hello received from the neighbor. The router
   with the lower Connection ID will do an active Transport open to the
   neighbor Connection ID. The router with the higher Connection ID
   will do a passive Transport open.
   An implementation may open connections only on-demand; in that case,
   the neighbor with the higher Connection ID may be the one that does
   the active open (see Section 4.3). Note that the source address of
   the active open must be the announced Connection ID.

   When the router is using SCTP, the IP address comparison need not be
   done since the SCTP protocol can handle call collision.

   If PORT is used both for IPv4 and IPv6, both IPv4 and IPv6 PIM Hello
   messages are sent, both containing PORT Hello options. If two
   neighbors announce the same transport (TCP or SCTP) and the same
   Connection ID in the IPv4 and IPv6 Hello messages, then only one
   connection is established and is shared. Otherwise, two connections
   are established and are used separately.

   The PIM router that performs the active open initiates the connection
   with a locally generated source transport port number and a
   well-known destination transport port number. The PIM router that
   performs the passive open listens on the well-known local transport
   port number and does not qualify the remote transport port number.
   See Section 5 for well-known port number assignment for PORT.

   When a Transport connection is established (or reestablished), the
   two routers MUST both send a full set of JP messages for which the
   other router is the upstream neighbor. This is needed to ensure that
   the upstream neighbor has the correct state. When moving from
   Datagram mode, or when the connection has gone down, the router
   cannot be sure that all the previous JP data was received by the
   neighbor. Any state received while in Datagram mode that is not
   refreshed will be left to expire.

   When a Transport connection goes down, Join or Prune state that was
   sent over the Transport connection is still retained. The neighbor
   should not be considered down until the neighbor timer has expired.
   This allows routers to do a control-plane switchover without
   disrupting the network. If a Transport connection is reestablished
   before the neighbor timer expires, the previous state is intact and
   any new JP messages sent cause state to be created or removed
   (depending on whether it was a Join or Prune). If the neighbor timer
   does expire, only the upstream router that has oif-list state for
   the expired downstream neighbor will need to clear state. A
   downstream router, when an upstream neighboring router has expired,
   will simply RPF to a new neighbor where it would trigger JP messages
   like it would in [RFC4601]. A PIM router is required to clear its
   neighbor table of a neighbor that has timed out due to neighbor
   holdtime expiration.

   Note that, since JP messages are sent over a Transport connection, no
   Prune Override or Join Suppression is possible for these messages.

4.1. TCP Connection Maintenance

   TCP is designed to keep connections up indefinitely during a period
   of network disconnection. If a PIM-over-TCP router fails, the TCP
   connection may stay up until the neighbor actually reboots, and even
   then it may continue to stay up until an attempt is made to send the
   neighbor some information. This is particularly relevant to PIM,
   since the flow of JPs might be in only one direction, and the
   downstream neighbor might never get any indication via TCP that the
   other end of the connection isn't really there.

   Most applications using TCP want to detect when a neighbor is no
   longer there, so that the associated application state can be
   released. Also, one wants to clean up the TCP state, and not keep
   half-open connections around indefinitely. This is accomplished by
   using PIM Hellos rather than by introducing an application-specific
   or new PIM keep-alive message.
   Therefore, when the GENID in a received PIM Hello message changes,
   and a TCP connection is established or in the process of being
   established, the local side will tear down the connection and
   attempt to reopen a new one for the new instance of the neighbor
   coming up. However, if the connection is shared by multiple
   interfaces and the GENID changes only for one of them, then there
   was not a full reboot and the connection is likely to still work.
   In that case, the router should just resend all JP state for that
   particular neighbor. This is similar to how state is refreshed when
   GENID changes for PIM in datagram mode.

   There may be situations where a router ignores some joins or prunes,
   e.g., due to wrong RP information or receiving joins on an RPF
   interface. A router may try to cache such messages and apply them
   later if the error is only temporary. It may however also ignore
   the message, and later change its GENID for that interface to make
   the neighbor resend all state, including any that may have been
   previously ignored. It is possible that one receives JP messages for
   an interface/link that is down. As long as the neighbor has not
   expired, we recommend processing those messages as usual. If they
   are ignored, then the router should change the GENID for that
   interface when it comes back up, in order to get a full update.

4.2. Moving from PORT to Datagram Mode

   There may be situations where an administrator decides to stop using
   PORT. If PORT is disabled on a router interface, the router starts
   expiry timers with the respective neighbor holdtimes as the initial
   values. Similarly, if the router receives a Hello message without a
   PORT Capable option from a neighbor, it starts expiry timers for all
   JP state it has for that particular neighbor. The Transport
   connection should be shut down as soon as there are no more PIM
   neighborships using it.
   That is, the connection has associated local and remote Connection
   IDs. When there is no PIM neighbor with that particular remote
   Connection ID on any interface where the router announces the local
   Connection ID, the connection should be shut down.

4.3. On-demand versus Pre-configured Connections

   Transport connections could be established when they are needed or
   when a router interface to other PIM neighbors has come up. The
   advantage of on-demand Transport connection establishment is reduced
   use of router resources, especially in the case where there is no
   need for n^2 connections on a network interface or MDT tunnel. The
   disadvantage is additional delay and queueing when a JP message
   needs to be sent and a Transport connection is not established yet.

   If a router interface has become operational and PIM neighbors are
   learned from Hello messages, at that time, Transport connections may
   be established. The advantage is that a connection is ready to
   transport data by the time a JP message needs to be sent. The
   disadvantage is there can be more connections established than
   needed. This can occur when there is a small set of RPF neighbors
   for the active distribution trees compared to the total number of
   neighbors. Even when Transport connections are pre-established
   before they are needed, a connection can go down and an
   implementation will have to deal with an on-demand situation.

   Note that for TCP, it is the router with the lower Connection ID that
   decides whether to open a connection immediately, or on-demand. The
   router with the higher Connection ID should only initiate a
   connection on-demand, that is, if it needs to send a JP message and
   there is no currently established connection.

   Therefore, this specification recommends but does not mandate the use
   of on-demand Transport connection establishment.

4.4. Possible Hello Suppression Considerations

   This specification indicates that a Transport connection cannot be
   established until a Hello message is received. One reason for this
   is to determine if the PIM neighbor supports this specification and
   the other is to determine the remote address to use to establish the
   Transport connection.

   There are cases where it is desirable to suppress entirely the
   transmission of Hello messages. In this case, how to determine
   whether the PIM neighbor supports this specification, and the
   out-of-band (outside of the PIM protocol) method used to determine
   the remote address to establish the Transport connection, are
   outside the scope of this document.

4.5. Avoiding a Pair of Connections between Neighbors

   To ensure there are not two connections between a pair of PIM
   neighbors, the following set of rules must be followed. Let A and B
   be two PIM neighbors where A's Connection ID is numerically smaller
   than B's Connection ID, and each is known to the other as having a
   potential PIM adjacency relationship.

   At node A:

   o  If there is already an established TCP connection to B, on the
      PIM-over-TCP port, then A MUST NOT attempt to establish a new
      connection to B. Rather it uses the established connection to send
      JPs to B. (This is independent of which node initiated the
      connection.)

   o  If A has initiated a connection to B, but the connection is still
      in the process of being established, then A MUST refuse any
      connection on the PIM-over-TCP port from B.

   o  At any time when A does not have a connection to B which is either
      established or in the process of being established, A MUST accept
      connections from B.

   At node B:

   o  If there is already an established TCP connection to A, on the
      PIM-over-TCP port, then B MUST NOT attempt to establish a new
      connection to A. Rather it uses the established connection to send
      JPs to A.
      (This is independent of which node initiated the connection.)

   o  If B has initiated a connection to A, but the connection is still
      in the process of being established, then if A initiates a
      connection too, B MUST accept the connection initiated by A and
      must release the connection which it (B) initiated.

5. Common Header Definition

   It may be desirable for scaling purposes to allow JP messages from
   different PIM protocol instances to be sent over the same Transport
   connection. Also, it may be desirable to have a set of JP messages
   for one address-family sent over a Transport connection that is
   established over a different address-family network layer.

   To be able to do this, a common header is needed that is inserted
   and parsed for each PIM JP message that is sent on a Transport
   connection. This common header will provide both record boundary and
   demux points when sending over a stream protocol like Transport.

   Each JP message will have in front of it the following common header
   in Type/Length/Value format. Multiple different TLV types can be
   sent over the same Transport connection.

   To make sure PIM JP messages are delivered as soon as the TCP
   transport layer receives the JP buffer, the TCP Push flag will be set
   in all outgoing JP messages sent over a TCP transport connection.

   PIM messages will be sent using destination TCP port number 8471.
   When using SCTP as the reliable transport, destination port number
   8471 will be used. See Section 10 for IANA considerations.

   JP messages are error checked. The checks cover a bad PIM checksum,
   illegal type fields, illegal addresses, or a truncated message. If
   any parsing errors occur in a JP message, it is skipped, and
   processing proceeds with any following TLVs.

   The TLV type field is 16 bits. The range 61440 - 65535 is for
   experimental use [RFC3692].
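
   The record-boundary role of the common header can be illustrated
   with a short, non-normative sketch. The function names split_tlvs
   and split_jp_value are hypothetical. The sketch assumes network byte
   order, a 16-bit Type followed by a 16-bit Length, and a value part
   that begins with 16 fixed bytes (one 32-bit word holding Reserved,
   Exp and I-Type, the 32-bit Interface ID, and the 64-bit Instance
   ID), as read from the diagrams in this section.

```python
import struct
from typing import List, Tuple

FIXED_VALUE_LEN = 16  # flags word (4) + Interface ID (4) + Instance ID (8)


def split_tlvs(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Return the complete (type, value) TLVs in buffer and the unread tail."""
    tlvs = []
    offset = 0
    while len(buffer) - offset >= 4:
        tlv_type, length = struct.unpack_from("!HH", buffer, offset)
        end = offset + 4 + length
        if end > len(buffer):
            break  # the rest of this TLV has not arrived on the stream yet
        tlvs.append((tlv_type, buffer[offset + 4:end]))
        offset = end
    return tlvs, buffer[offset:]


def split_jp_value(value: bytes) -> Tuple[int, bytes]:
    """Return (Interface ID, PIMv2 JP message) from a Type 1 or Type 2 value."""
    if len(value) < FIXED_VALUE_LEN:
        raise ValueError("value shorter than the fixed common header fields")
    (interface_id,) = struct.unpack_from("!I", value, 4)
    return interface_id, value[FIXED_VALUE_LEN:]
```

   A receiver would keep the returned tail and prepend it to the next
   bytes read from the connection. Because every TLV carries its own
   length, a JP message that fails its error checks can be skipped
   without losing the position of the TLVs that follow it.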

   The current list of defined TLVs is:

   IPv4 JP Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 1            |        Length = X + 16        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    |  Exp  |I-Type |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Instance ID . . .                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       . . . Instance ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       PIMv2 JP Message                        |
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The IPv4 JP common header is used when a JP message is sent that has
   all IPv4 encoded addresses in the PIM payload.

   Length: The length in bytes of the value part of the
      Type/Length/Value encoding, where X is the number of bytes that
      make up the PIMv2 JP message.

   Reserved: Set to zero on transmission and ignored on receipt.

   Exp: For experimental use [RFC3692].

   I-Type: Defines the encoding and semantics of the Instance ID
      field. Instance Type 0 means Instance ID is not used. Other
      values are not defined in this specification.

   Interface ID: This is the Interface ID from the Hello TLV, defined
      in this specification, that the PIM router is sending to the PIM
      neighbor. It indicates to the PIM neighbor what interface to
      associate the JP Join or Prune with.

   Instance ID: This document only defines this for Instance Type 0.
      For type 0 the field should be set to zero on transmission and
      ignored on receipt.

   PIMv2 JP Message: PIMv2 Join/Prune message and payload with no IP
      header in front of it.
As you can see from the packet format 621 diagram, multiple JP messages can go into one TCP/SCTP stream from 622 the same or different Interface and Instance IDs. 624 IPv6 JP Message 626 0 1 2 3 627 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 628 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 629 | Type = 2 | Length = X + 16 | 630 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 631 | Reserved | Exp |I-Type | 632 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 633 | Interface ID | 634 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 635 | Instance ID . . . | 636 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 637 | . . . Instance ID | 638 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 639 | PIMv2 JP Message | 640 | . | 641 | . | 642 | . | 643 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 645 The IPv6 JP common header is used when a JP message is sent that has 646 all IPv6 encoded addresses in the PIM payload. 648 Length: In bytes for the value part of the Type/Length/Value 649 encoding. Where X is the number of bytes that make up the PIMv2 650 JP message. 652 Reserved: Set to zero on transmission and ignored on receipt. 654 Exp: For experimental use [RFC3692]. 656 I-Type: Defines the encoding and semantics of the Instance ID 657 field. Instance Type 0 means Instance ID is not used. Other 658 values are not defined in this specification. 660 Interface ID: This is the Interface ID from the Hello TLV, defined 661 in this specification, the PIM router is sending to the PIM 662 neighbor. It indicates to the PIM neighbor what interface to 663 associate the JP Join or Prune with. 665 Instance ID: This document only defines this for Instance Type 0. 666 For type 0 the field should be set to zero on transmission and 667 ignored on receipt. 
   PIMv2 JP Message:  PIMv2 Join/Prune message and payload with no IP
      header in front of it.  As you can see from the packet format
      diagram, multiple JP messages can go into one TCP/SCTP stream from
      the same or different Interface and Instance IDs.

6.  Explicit Tracking

   A router needs to keep track of which PORT neighbors express interest
   in a route on a given interface.  For non-PORT neighbors, there is no
   change; one would usually just need to know if at least one non-PORT
   neighbor is interested.  For some link-types, e.g. point-to-point,
   tracking neighbors is no different than tracking interfaces.  It may
   also be possible for an implementation to treat different downstream
   neighbors as being on different logical interfaces, even if they are
   on the same physical link.  Exactly how this is implemented, and for
   which link types, is left to the implementer.

   For (*,G) and (S,G) routes, the router starts forwarding traffic on
   an interface when a Join is received from a neighbor on such an
   interface.  When a non-PORT neighbor sends a Prune, there is
   generally a small delay to see if another non-PORT neighbor sends a
   Prune Override.  If there is no override, one should note that no
   non-PORT neighbor is interested.  If no PORT neighbors are interested
   either, the interface can be removed from the oif-list.  When a PORT
   neighbor sends a Prune, one removes the join state for that
   neighbor.  If no other PORT or non-PORT neighbors are interested, the
   interface can be removed from the oif-list.  In this case there is no
   Prune Override, since the Prune was not visible to other neighbors.

   For (S,G,R) routes, the router needs to track Prune state on the
   shared tree.  It needs to know which PORT neighbors have sent prunes,
   and whether any non-PORT neighbors have sent prunes.  The latter is
   exactly like when not using PORT.  Normally one would forward a
   packet from a source S to a group G out on an interface if a
   (*,G)-join is received, but no (S,G,R)-prune.  With PORT one needs to
   do this check per PORT neighbor.  That is, the packet should be
   forwarded unless every PORT neighbor that has sent a (*,G)-join has
   also sent an (S,G,R)-prune and, if a non-PORT neighbor has sent a
   (*,G)-join, there is also non-PORT (S,G,R)-prune state.

7.  Multiple Instances and Address-Family Support

   Multiple instances of the PIM protocol may be used to support e.g.
   multiple address families.  Multiple instances can cause a multiplier
   effect on the number of router resources consumed.  To use router
   resources more efficiently, JP messages can be multiplexed over fewer
   Transport connections.

   There are two ways this can be accomplished: one using a common
   header format over a TCP connection, and the other using multiple
   streams over a single SCTP connection.

   Using the Common Header format described previously in this
   specification, with different TLVs, both IPv4 and IPv6 based JP
   messages can be encoded within a Transport connection.  Likewise,
   within a TLV, multiple occurrences of JP messages can occur and are
   tagged with an instance-ID so multiple JP messages for different
   instances can use a single Transport connection.

   When using SCTP multi-streaming, the common header is still used to
   convey instance information but an SCTP association is used, on a
   per-instance basis, to send data concurrently for multiple instances.
   When data is sent concurrently, head of line blocking, which can
   occur when using TCP, is avoided.

8.  Miscellany

   No changes are expected in the processing of other PIM messages such
   as PIM Asserts, Grafts, Graft-Acks, Registers, and Register-Stops.
   This also applies to BSR and Auto-RP type messages.
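   As a non-normative aside, the per-neighbor forwarding check for
   (S,G,R) state described in Section 6 can be sketched in Python as
   follows.  The class and field names are hypothetical, and the sketch
   assumes an implementation that keeps a set of PORT neighbors per
   kind of state and a single aggregate flag for non-PORT neighbors.
   Other bookkeeping is equally possible.

```python
from dataclasses import dataclass, field


@dataclass
class SharedTreeState:
    """Hypothetical per-interface state for one (S,G) on the shared tree."""
    port_star_g_joins: set = field(default_factory=set)  # PORT neighbors with a (*,G)-join
    port_sgr_prunes: set = field(default_factory=set)    # PORT neighbors with an (S,G,R)-prune
    nonport_star_g_join: bool = False  # some non-PORT neighbor joined (*,G)
    nonport_sgr_prune: bool = False    # non-PORT (S,G,R)-prune state exists


def should_forward(state):
    """Forward unless every interested neighbor has pruned S off the shared tree."""
    # Any PORT neighbor that joined (*,G) without pruning (S,G,R) still wants S.
    if state.port_star_g_joins - state.port_sgr_prunes:
        return True
    # Non-PORT neighbors are tracked in aggregate, as without PORT.
    if state.nonport_star_g_join and not state.nonport_sgr_prune:
        return True
    return False
```

   For example, with PORT neighbors A and B both joined to (*,G) and
   only A having sent an (S,G,R)-prune, should_forward() returns True
   because B is still interested in traffic from S.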
   This extension is applicable only to PIM-SM, PIM-SSM and Bidir-PIM.
   It does not take requirements for PIM-DM into consideration.

9.  Security Considerations

   Transport connections can be authenticated using HMACs MD5 and SHA-1,
   similar to their use in BGP [RFC4271] and MSDP [RFC3618].

   When using SCTP as the transport protocol, [RFC4895] can be used, on
   a per SCTP association basis, to authenticate PIM data.

10.  IANA Considerations

   This specification makes use of a TCP port number and an SCTP port
   number for the use of PIM-Over-Reliable-Transport that have been
   allocated by IANA.  It also makes use of IANA PIM Hello Options
   allocations that should be made permanent.  In addition, a registry
   for PORT message types is requested.  The registry should cover the
   range 0 - 61439.  An RFC is required for assignments in that range.
   This document defines two PORT message types: Type 1, IPv4 JP
   Message; and Type 2, IPv6 JP Message.  The type range 61440 - 65535
   is for experimental use [RFC3692].

11.  Contributors

   In addition to the persons listed as authors, significant
   contributions were provided by Apoorva Karan and Arjen Boers.

12.  Acknowledgments

   The authors would like to give a special thank you and appreciation
   to Nidhi Bhaskar for her initial design and early prototype of this
   idea.

   Appreciation goes to Randall Stewart for his authoritative review and
   recommendation for using SCTP.

   Thanks also go to the following for their ideas and commentary
   review of this specification: Mike McBride, Toerless Eckert, Yiqun
   Cai, Albert Tian, Suresh Boddapati, Nataraj Batchu, Daniel Voce, John
   Zwiebel, Yakov Rekhter, Lenny Giuliano, Gorry Fairhurst and Sameer
   Gulrajani.

   A special thank you goes to Eric Rosen for his very detailed review
   and commentary.  Many of his comments are reflected as text in this
   specification.

13.  References

13.1.  Normative References

   [RFC0761]  Postel, J., "DoD standard Transmission Control Protocol",
              RFC 761, January 1980.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3618]  Fenner, B. and D. Meyer, "Multicast Source Discovery
              Protocol (MSDP)", RFC 3618, October 2003.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
              "Protocol Independent Multicast - Sparse Mode (PIM-SM):
              Protocol Specification (Revised)", RFC 4601, August 2006.

   [RFC4895]  Tuexen, M., Stewart, R., Lei, P., and E. Rescorla,
              "Authenticated Chunks for the Stream Control Transmission
              Protocol (SCTP)", RFC 4895, August 2007.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
              "Bidirectional Protocol Independent Multicast (BIDIR-
              PIM)", RFC 5015, October 2007.

13.2.  Informative References

   [AFI]      IANA, "Address Family Indicators (AFIs)", ADDRESS FAMILY
              NUMBERS http://www.iana.org/numbers.html, February 2007.

   [HELLO-OPT]
              IANA, "PIM Hello Options", PIM-HELLO-OPTIONS per
              RFC4601 http://www.iana.org/assignments/pim-hello-options,
              March 2007.

   [RFC3692]  Narten, T., "Assigning Experimental and Testing Numbers
              Considered Useful", BCP 82, RFC 3692, January 2004.
Authors' Addresses

   Dino Farinacci
   cisco Systems
   Tasman Drive
   San Jose, CA 95134
   USA

   Email: dino@cisco.com


   IJsbrand Wijnands
   cisco Systems
   Tasman Drive
   San Jose, CA 95134
   USA

   Email: ice@cisco.com


   Stig Venaas
   cisco Systems
   Tasman Drive
   San Jose, CA 95134
   USA

   Email: stig@cisco.com


   Maria Napierala
   AT&T Labs
   200 Laurel Drive
   Middletown, New Jersey 07748
   USA

   Email: mnapierala@att.com