Network Working Group                                     Dino Farinacci
Internet-Draft                                         IJsbrand Wijnands
Intended status: Experimental                              Apoorva Karan
Expires: February 23, 2009                                   Arjen Boers
                                                           cisco Systems
                                                         Maria Napierala
                                                               AT&T Labs
                                                         August 22, 2008


                 A Reliable Transport Mechanism for PIM
                       draft-ietf-pim-port-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 23, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This draft describes how a reliable transport mechanism can be used
   by the PIM protocol to optimize CPU and bandwidth resource
   utilization by eliminating periodic Join/Prune message transmission.
   This draft proposes a modular extension to PIM to use either the TCP
   or SCTP transport protocol.

Table of Contents

   1.  Introduction
     1.1.  Requirements Notation
     1.2.  Definitions
   2.  Protocol Overview
   3.  New PIM Hello Options
     3.1.  PIM over the TCP Transport Protocol
     3.2.  PIM over the SCTP Transport Protocol
   4.  Establishing Transport Connections
     4.1.  TCP Connection Maintenance
     4.2.  Transitional Periods
     4.3.  On-demand versus Pre-configured Connections
     4.4.  Possible Hello Suppression Considerations
     4.5.  Avoiding a Pair of Connections between Neighbors
   5.  Common Header Definition
   6.  Join/Prune Processing
   7.  Outgoing Interface List Explicit Tracking
   8.  Multiple Instances and Address-Family Support
   9.  Miscellany
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   The goals of this specification are:

   o  To create a simple incremental mechanism to provide reliable PIM
      message delivery in PIM version 2.

   o  To use the reliable transport mechanism for Join-Prune message
      transmission only.

   o  To support link-local transmission of Join-Prune messages as well
      as multi-hop transmission in multicast VPN environments.

   o  To allow a router that supports this specification to use the
      reliable transport mechanism on only some of its interfaces.  That
      is, negotiation occurs on a per-interface (or per-MDT) basis.

   The explicit non-goals of this specification are:

   o  Changes to the PIM protocol machinery as defined in [RFC4601].
      The reliable transport mechanism is used as a plug-in layer, so
      the PIM component is unaware of its presence.

   o  Support for both Datagram Mode and Transport Mode (see Section 1.2
      for definitions) on the same physical interface or MDT.

   This document will specify how periodic JP message transmission can
   be eliminated by using TCP [RFC0761] or SCTP [RFC4960] as the
   reliable transport mechanism for JP messages.

   This specification enables greater scalability in multicast
   deployment since the processing required for protocol state
   maintenance can be reduced.  These enhancements to PIMv2 are
   applicable to IP multicast over routed services and VPNs [MCAST-VPN].
   In addition to reduced processing on PIM enabled routers, another
   important feature is the reduced join and leave latency provided
   through a reliable transport.

   In many existing and emerging networks, particularly wireless and
   mobile satellite systems, link degradation due to weather,
   interference, and other impairments can result in temporary spikes in
   packet loss.  In these environments, periodic PIM joining can cause
   join latency, because a lost message is retransmitted only 60 seconds
   later.  By applying a reliable transport, a lost join is
   retransmitted rapidly.  Furthermore, when the last user leaves a
   multicast group, any lost prune is similarly repaired and the
   multicast stream is quickly removed from the wireless/satellite link.

   Without a reliable transport, the multicast transmission could
   otherwise continue until it timed out, roughly 3 minutes later.  As
   network resources are at a premium in many of these environments,
   rapid termination of the multicast stream is critical to maintaining
   efficient use of bandwidth.

1.1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2.  Definitions

   PORT:  Stands for PIM Over Reliable Transport, the short form for the
      mechanism described in this specification, by which PIM can use
      the TCP or SCTP transport protocol.

   JP Message:  An abbreviation for a Join-Prune message.

   Periodic JP:  A JP message sent periodically to refresh state.

   Incremental JP:  A JP message sent as a result of state creation or
      deletion events.  Also known as a triggered message.

   Native JP:  A JP message which is carried with an IP protocol type
      of PIM.

   Reliable JP:  A JP message using TCP or SCTP for transport.

   Datagram Mode:  The current PIM procedures, in which JP messages are
      encapsulated in IP packets and sent either triggered or
      periodically.

   Transport Mode:  Procedures used by PIM defined in this
      specification for sending JP messages over the TCP or SCTP
      transport layer.

   MDT/PMSI:  Used interchangeably in this document.  An MDT tunnel is
      one used between PE routers to provide support for a Multicast
      VPN.  The new standards term for an MDT tunnel is a Provider-
      Network Multicast Service Interface or PMSI.

   Segmented Multi-Access LAN:  A segmented (or partitioned) LAN is
      like a virtual overlay network using the physical LAN to realize
      control and data packets.  Multiple overlay networks may be
      created using the physical LAN, much like how VLANs or PMSI
      overlays are configured over a multi-access physical LAN.  The
      interface associated with the partitioned LAN is like an NBMA
      interface type so explicit tracking can be accomplished.  Each
      partitioned or segmented LAN has its own data-link encapsulation
      and link-layer multicast is still used to avoid head-end
      replication.  This concept also applies to MDTs/PMSIs and is
      called "Segmented MDTs/PMSIs".  A Segmented MDT/PMSI is an
      MDT/PMSI that has a single forwarder (i.e., a single ingress PE)
      for any multicast stream.

2.  Protocol Overview

   PIM Over Reliable Transport (PORT) is a simple extension to PIMv2 for
   refresh reduction of PIM JP messages.  It involves sending
   incremental rather than periodic JPs over a TCP/SCTP connection
   between PIM neighbors.

   PORT can be incrementally used on an interface between PORT capable
   neighbors.  Routers which are not PORT capable can continue to use
   PIM in Datagram Mode.  PORT capability is detected using a new PORT
   Capable PIM Hello Option.

   When PORT is used, only incremental JPs are sent from downstream
   routers to upstream routers.
   As such, downstream routers do not generate periodic JPs for routes
   which RPF to a PORT-capable neighbor.

   For Joins and Prunes which are received over a TCP/SCTP connection,
   the upstream router does not start or maintain timers on the outgoing
   interface entry.  Instead, it explicitly tracks downstream routers
   which have expressed interest.  An interface is deleted from the
   outgoing interface list only when all downstream routers on the
   interface no longer wish to receive traffic.

   Because incremental JPs are sent over a TCP/SCTP connection, no Join
   suppression or Prune-Override of incremental JPs is possible on
   multi-access LANs.  As a result, upstream routers which receive an
   incremental Join or Prune that creates state explicitly track all
   downstream nodes.  Note, for point-to-point links there is no need
   for explicitly tracking downstream nodes.

   There is no change proposed for the PIM JP packet format.  However,
   for JPs sent over TCP/SCTP connections, no IP Header is included.
   The message begins with the PIM common header, followed by the JP
   message.  See Section 5 for details on the common header.

3.  New PIM Hello Options

3.1.  PIM over the TCP Transport Protocol

   Option Type: PIM-over-TCP Capable

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 65006         |         Length = X + 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TCP Connection ID AFI     |            Reserved           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       TCP Connection ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Interface ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Allocated Hello Type values can be found in [HELLO-OPT].
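
   The layout in the diagram above can be illustrated with a short
   sketch.  This is not a normative encoding.  It assumes 16-bit AFI
   and Reserved fields and a 32-bit Interface ID, and the function name
   encode_port_hello_option is hypothetical.  The Length field is
   written as X + 4, exactly as the diagram gives it.  Whether the
   Interface ID word is also meant to be counted is not stated, so
   treat that choice as an assumption.

```python
import ipaddress
import struct

PIM_OVER_TCP_CAPABLE = 65006  # option type taken from the diagram
AFI_IPV4 = 1                  # IANA address family number for IPv4
AFI_IPV6 = 2                  # IANA address family number for IPv6


def encode_port_hello_option(connection_id: str, interface_id: int,
                             option_type: int = PIM_OVER_TCP_CAPABLE) -> bytes:
    """Pack one PORT Capable Hello option in network byte order.

    Hypothetical helper: field widths are assumptions read off the diagram.
    """
    addr = ipaddress.ip_address(connection_id)
    afi = AFI_IPV4 if addr.version == 4 else AFI_IPV6
    # X is the Connection ID size: 4 bytes for IPv4, 16 bytes for IPv6.
    x = len(addr.packed)
    # Length follows the diagram literally (X + 4); this is an assumption,
    # since the Interface ID word is drawn after the Connection ID as well.
    header = struct.pack("!HH", option_type, x + 4)
    body = struct.pack("!HH", afi, 0) + addr.packed  # AFI, Reserved = 0
    return header + body + struct.pack("!I", interface_id)
```

   For an IPv4 Connection ID this produces 16 bytes on the wire with a
   Length field of 8.  For IPv6 it produces 28 bytes with a Length
   field of 20.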

   When a router is configured to use PIM over TCP on a given interface,
   it MUST include the PORT Capable hello option in its Hello messages
   for that interface.  If a router is explicitly disabled from using JP
   over TCP, it MUST NOT include the PORT Capable hello option in its
   Hello messages.  When the router cannot set up a TCP connection, it
   will refrain from including this option.

   This option is only used when a physical or logical interface is a
   point-to-point link, a segmented multi-access LAN, a PMSI
   [MCAST-VPN], or a point-to-point or point-to-multipoint GRE tunnel.
   In all other cases, such as multi-access LANs, Datagram Mode is used.

   An implementation may provide a configuration option to enable or
   disable PORT functionality.  We recommend that this capability be
   disabled by default.

   Length:  The length in bytes of the value part of the
      Type/Length/Value encoding, where X is 4 when the IPv4 AFI
      (value 1) is used and 16 when the IPv6 AFI (value 2) is used
      [AFI].

   TCP Connection ID AFI:  The AFI value to describe the address-family
      of the address of the TCP Connection ID field.

   Reserved:  Set to zero on transmission and ignored on receipt.

   TCP Connection ID:  An IP or IPv6 address used to establish the TCP
      connection.  When this field is 0, a mechanism outside the scope
      of this spec is used to obtain the addresses used to establish the
      TCP connection.

   Interface ID:  An Interface ID is used to associate the connection a
      JP message is received over with an interface which is added or
      removed from an oif-list.  When unnumbered interfaces are used or
      when a single Transport connection is used for sending and
      receiving JP messages over multiple interfaces, the Interface ID
      is used to convey the interface from JP message sender to JP
      message receiver.
      When a PIM router sets a locally generated value for the Interface
      ID in this Hello TLV, it must send the same Interface ID value in
      all JP messages it is sending to the PIM neighbor.

3.2.  PIM over the SCTP Transport Protocol

   Option Type: PIM-over-SCTP Capable

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Type = 65007         |         Length = X + 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    SCTP Connection ID AFI     |            Reserved           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      SCTP Connection ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Interface ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Allocated Hello Type values can be found in [HELLO-OPT].

   When a router is configured to use PIM over SCTP on a given
   interface, it MUST include the PORT Capable hello option in its Hello
   messages for that interface.  If a router is explicitly disabled from
   using JP over SCTP, it MUST NOT include the PORT Capable hello option
   in its Hello messages.  When the router cannot set up an SCTP
   connection, it will refrain from including this option.

   This option is only used when an interface is point-to-point or when
   a multi-access LAN or MDT is segmented (also known as "Partitioned
   MDTs") in a non-broadcast multi-access (NBMA) mode.  In all other
   cases, such as general purpose multi-access LANs, Datagram Mode is
   used.

   An implementation may provide a configuration option to enable or
   disable PORT functionality.  We recommend that this capability be
   disabled by default.

   Length:  The length in bytes of the value part of the
      Type/Length/Value encoding, where X is 4 when the IPv4 AFI
      (value 1) is used and 16 when the IPv6 AFI (value 2) is used
      [AFI].

   SCTP Connection ID AFI:  The AFI value to describe the address-
      family of the address of the SCTP Connection ID field.

   Reserved:  Set to zero on transmission and ignored on receipt.

   SCTP Connection ID:  An IP or IPv6 address used to establish the
      SCTP connection.  When this field is 0, a mechanism outside the
      scope of this spec is used to obtain the addresses used to
      establish the SCTP connection.

   Interface ID:  An Interface ID is used to associate the connection a
      JP message is received over with an interface which is added or
      removed from an oif-list.  When unnumbered interfaces are used or
      when a single Transport connection is used for sending and
      receiving JP messages over multiple interfaces, the Interface ID
      is used to convey the interface from JP message sender to JP
      message receiver.  When a PIM router sets a locally generated
      value for the Interface ID in this Hello TLV, it must send the
      same Interface ID value in all JP messages it is sending to the
      PIM neighbor.

4.  Establishing Transport Connections

   Since this specification describes using Transport on point-to-point
   links or NBMA-configured MDTs, a router knows when a Transport
   connection is established with the neighbor.  When the Transport
   connection is not established, Datagram Mode is used.  When the
   Transport connection becomes established, Transport Mode is in effect
   and the router can suppress sending periodic JPs.

   When a router receives a Hello from a neighbor it has not previously
   heard from, or an existing neighbor includes the PORT-Capable Option
   in a Hello that did not previously include it, the router will
   attempt to establish a Transport connection with the neighbor.  When
   the router is using TCP it will compare the IP address it uses to
   send Hellos on the interface with the IP address the neighbor is
   using to send Hellos.
   The router with the lower IP address will do an active Transport open
   to the neighbor address.  The higher IP addressed neighbor will do a
   passive Transport open.  When the router is using SCTP, the IP
   address comparison need not be done since the SCTP protocol can
   handle call collision.

   The PIM router that performs the active open initiates the connection
   with a locally generated source transport port number and a well-
   known destination transport port number.  The PIM router that
   performs the passive open listens on the well-known local transport
   port number and does not qualify the remote transport port number.
   See Section 5 for well-known port number assignment for PORT.

   When a Transport connection goes down, Join or Prune state that was
   sent over the Transport connection is still retained.  The neighbor
   should not be considered down until the neighbor timer has expired.
   This allows routers to do a control-plane switchover without
   disrupting the network.  If a Transport connection is reestablished
   before the neighbor timer expires, the previous state is intact and
   any new JP messages sent cause state to be created or removed
   (depending on whether it was a Join or Prune).  If the neighbor timer
   does expire, only the upstream router that holds oif-list state for
   the expired downstream neighbor needs to clear state.  A downstream
   router, when an upstream neighboring router has expired, will simply
   RPF to a new neighbor where it would trigger JP messages like it
   would in [RFC4601].  It is required of a PIM router to clear its
   neighbor table for a neighbor who has timed out due to neighbor
   holdtime expiration.
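
   The address comparison described earlier in this section can be
   sketched as follows.  This is an illustration only.  The function
   name transport_open_role and the string return values are
   hypothetical, and returning "active" for SCTP is an assumption made
   because the text only says that no comparison is needed there.

```python
import ipaddress


def transport_open_role(local_hello_addr: str, neighbor_hello_addr: str,
                        transport: str = "tcp") -> str:
    """Return "active" or "passive" for the local router (hypothetical helper)."""
    if transport == "sctp":
        # SCTP handles call collision, so no comparison is made. Letting
        # either end initiate is an assumption of this sketch.
        return "active"
    local = ipaddress.ip_address(local_hello_addr)
    neighbor = ipaddress.ip_address(neighbor_hello_addr)
    if local.version != neighbor.version:
        raise ValueError("Hello addresses must share an address family")
    if local == neighbor:
        raise ValueError("local and neighbor Hello addresses are identical")
    # The lower address performs the active open; the higher one listens.
    return "active" if local < neighbor else "passive"
```

   The comparison is numeric rather than textual, so 10.0.0.9 is lower
   than 10.0.0.10 and performs the active open.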

   When a router is in Datagram Mode with a neighbor and has been
   sending periodic JP messages to it and then the Transport connection
   has been established to the neighbor, there is no requirement for the
   downstream router to send JP messages to the upstream neighbor.  The
   upstream router can keep the state maintained from the Datagram Mode
   creation.  However, when a router is in Transport Mode with a
   neighbor and moves to Datagram Mode because the transport connection
   went down (and several attempts to reestablish the transport
   connection fail), the router cannot be sure that all the JP data was
   received by the neighbor.  Therefore, it is required to send a full
   set of JP messages to refresh or re-create state in the upstream
   neighbor.

   An upstream neighbor does have the responsibility of removing the
   timer-activated timeout of an oif-list entry.  When a Transport
   connection is established, the timer-activated timeout is disabled.
   When a Transport connection goes down, the timer-activated timeout
   for an oif-list is enabled.  Both the upstream and downstream routers
   stay in sync based on the state of the Transport connection.  If the
   upstream router has timer-activated timeout on oif-lists, the
   downstream router will be sending periodic JPs.  Otherwise, the
   downstream router suppresses sending periodic JPs because it assumes
   the upstream router has disabled the timer-activated timeout of oif-
   list entries the downstream router has previously joined.

4.1.  TCP Connection Maintenance

   TCP is designed to keep connections up indefinitely during a period
   of network disconnection.  If a PIM-over-TCP router fails, the TCP
   connection may stay up until the neighbor actually reboots, and even
   then it may continue to stay up until you actually try to send the
   neighbor some information.
   This is particularly relevant to PIM, since the flow of JPs might be
   in only one direction, and the downstream neighbor might never get
   any indication via TCP that the other end of the connection isn't
   really there.

   Most applications using TCP want to detect when a neighbor is no
   longer there, so that the associated application state can be
   released.  Also, one wants to clean up the TCP state, and not keep
   half-open connections around indefinitely.  This is accomplished by
   using PIM Hellos and by not introducing an application-specific or
   new PIM keep-alive message.  Therefore, when a GENID changes from a
   received PIM Hello message, and a TCP connection is established or
   is being established, the local side will tear down the connection
   and attempt to reopen a new one for the new instance of the neighbor
   coming up.

   When PORT capable routers come up and try to establish transport
   connections with their neighbors, but cannot for some reason, after 3
   attempts to do so, the router should go into Datagram Mode and not
   advertise the PORT Hello option anymore.  Operator intervention is
   required to restart the process after the problem is found.

4.2.  Transitional Periods

   There may be transitional periods when a router receives, from a
   given neighbor, both datagram JP messages and JP messages sent over a
   transport connection.  Once a transport connection to a particular
   neighbor is established, and for as long as it remains established,
   the router MUST ignore PIM messages sent in Datagram Mode from that
   neighbor.  Otherwise, the datagram messages could get out of order
   with respect to the transport messages, and the router could end up
   in an erroneous state of pruning joined state or joining pruned state
   which it is unable to recover from as long as the transport
   connection stays up.

4.3.  On-demand versus Pre-configured Connections

   Transport connections could be established when they are needed or
   when a router interface to other PIM neighbors has come up.  The
   advantage of on-demand Transport connection establishment is reduced
   use of router resources, especially in the case where there is no
   need for n^2 connections on a network interface or MDT tunnel.  The
   disadvantage is having to decide what to do when a JP message needs
   to be sent and a Transport connection is not established yet.  An
   implementation can either send a Datagram Mode JP or queue the JP to
   be sent as a Transport Mode JP after the Transport connection is
   established.

   If a router interface has become operational and PIM neighbors are
   learned from Hello messages, at that time, Transport connections may
   be established.  The advantage is that a connection is ready to
   transport data by the time a JP message needs to be sent.  The
   disadvantage is there can be more connections established than
   needed.  This can occur when there is a small set of RPF neighbors
   for the active distribution trees compared to the total number of
   neighbors.  Even when Transport connections are pre-established
   before they are needed, a connection can go down and an
   implementation will have to deal with an on-demand situation.

   Therefore, this specification recommends but does not mandate the use
   of on-demand Transport connection establishment.

4.4.  Possible Hello Suppression Considerations

   This specification indicates that a Transport connection cannot be
   established until a Hello message is received.  One reason for this
   is to determine if the PIM neighbor supports this specification and
   the other is to determine the remote address to use to establish the
   Transport connection.

   There are cases where it is desirable to suppress entirely the
   transmission of Hello messages.
   In this case, how to determine whether the PIM neighbor supports this
   specification, and the out-of-band (outside of the PIM protocol)
   method used to determine the remote address for the Transport
   connection, are outside the scope of this document.

4.5.  Avoiding a Pair of Connections between Neighbors

   To ensure there are not two connections between a pair of PIM
   neighbors, the following set of rules must be followed.  Let A and B
   be two PIM neighbors where A's IP address is numerically smaller than
   B's IP address, and each is known to the other as having a potential
   PIM adjacency relationship.

   At node A:

   o  If there is already an established TCP connection to B, on the
      PIM-over-TCP port, then A MUST NOT attempt to establish a new
      connection to B.  Rather it uses the established connection to
      send JPs to B.  (This is independent of which node initiated the
      connection.)

   o  If A has initiated a connection to B, but the connection is still
      in the process of being established, then A MUST refuse any
      connection on the PIM-over-TCP port from B.

   o  At any time when A does not have a connection to B which is either
      established or in the process of being established, A MUST accept
      connections from B.

   At node B:

   o  If there is already an established TCP connection to A, on the
      PIM-over-TCP port, then B MUST NOT attempt to establish a new
      connection to A.  Rather it uses the established connection to
      send JPs to A.  (This is independent of which node initiated the
      connection.)

   o  If B has initiated a connection to A, but the connection is still
      in the process of being established, then if A initiates a
      connection to B, B MUST accept the connection initiated by A and
      must release the connection which it (B) initiated.

5.  Common Header Definition

   It may be desirable for scaling purposes to send JP messages from
   different PIM protocol instances over the same Transport connection.
   Also, it may be desirable to have a set of JP messages for one
   address-family sent over a Transport connection that is established
   over a different address-family network layer.

   To be able to do this we need a common header that is inserted and
   parsed for each PIM JP message that is sent on a Transport
   connection.  This common header will provide both record boundary and
   demux points when sending over a stream protocol such as TCP.

   Each JP message will have in front of it the following common header
   in Type/Length/Value format.  Multiple different TLV types can be
   sent over the same Transport connection.

   To make sure PIM JP messages are delivered as soon as the TCP
   transport layer receives the JP buffer, the TCP Push flag will be set
   in all outgoing JP messages sent over a TCP transport connection.

   PIM messages will be sent using destination TCP port number 8471.
   When using SCTP as the reliable transport, destination port number
   8471 will be used.  See Section 11 for IANA considerations.

   If the buffer length of the received TLV message is less than what is
   encoded in the TLV Length field, the entire TLV encoded message is
   ignored and an error message is logged.  Likewise, if the received
   buffer length left to process at each record parsing level is less
   than the JP Message Length, the rest of the message is malformed and
   not processed.

   Each JP message that has passed the length checks above, contained in
   the TLV encoding, will be error checked individually.  This includes
   checking for a bad PIM checksum, illegal type fields, or illegal
   addresses.
   If any parsing errors occur in a single JP message, it is skipped
   over and not processed, but other JP message records in the TLV are
   still parsed and processed.

   The current list of defined TLVs is:

   IPv4 JP Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Type = 1           |     Length = (12 * X) + Y     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       JP Message Length       |        Reserved        |I-Type|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Interface ID                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Instance ID  . . .                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      . . .  Instance ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       PIMv2 JP Message                        |
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       JP Message Length       |        Reserved        |I-Type|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Instance ID  . . .                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      . . .  Instance ID                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       PIMv2 JP Message                        |
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The IPv4 JP common header is used when a JP message is sent that has
   all IPv4 encoded addresses in the PIM payload.

   Length:  The length in bytes of the value part of the
      Type/Length/Value encoding.  There are 12 bytes per JP message
      enclosed in one transmission (X above is the number of JP messages
      contained), plus Y, which is the sum of each "JP Message Length"
      field that appears in the transmission.

   I-Type:  Defines the encoding and semantics of the Instance ID
      field.
This is not specified in this specification. 615 Interface ID: This is the Interface ID from the Hello TLV, defined 616 in this specification, the PIM router is sending to the PIM 617 neighbor. It indicates to the PIM neighbor what interface to 618 associate the JP Join or Prune with. 620 Instance ID: This can be a VPN-ID. This field could also be a BGP 621 Route Target (RT) or BGP Route Distinguisher (RD) as defined in 622 [RFC4364]. Not specified in this specification. 624 Reserved: Set to zero on transmission and ignored on receipt. 626 JP Message Length: The number of bytes that follow which make up 627 the PIMv2 JP message. 629 PIMv2 JP Message: PIMv2 Join/Prune message and payload with no IP 630 header in front of it. As you can see from the packet format 631 diagram, multiple JP messages can go into one TCP/SCTP stream from 632 the same or different Instance IDs. 634 IPv6 JP Message 636 0 1 2 3 637 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 638 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 639 | Type = 2 | Length = (12 * X) + Y | 640 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 641 | JP Message Length | Reserved |I-Type| 642 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 643 | Interface ID | 644 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 645 | Instance ID . . . | 646 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 647 | . . . Instance ID | 648 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 649 | PIMv2 JP Message | 650 | . | 651 | . | 652 | . | 653 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 654 | JP Message Length | Reserved |I-Type| 655 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 656 | Instance ID . . . | 657 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 658 | . . . 
Instance ID | 659 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 660 | PIMv2 JP Message | 661 | . | 662 | . | 663 | . | 664 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 666 The IPv6 JP common header is used when a JP message is sent that has 667 all IPv6 encoded addresses in the PIM payload. 669 Length: In bytes for the value part of the Type/Length/Value 670 encoding. Where there are 12 bytes per JP message (where X above 671 is the number of JP messages contained) enclosed in one 672 transmission plus Y which is the sum of each "JP Message Length" 673 field that appears in the transmission. 675 I-Type: Defines the encoding and semantics of the Instance ID 676 field. This is not specified in this specification. 678 Interface ID: This is the Interface ID from the Hello TLV, defined 679 in this specification, the PIM router is sending to the PIM 680 neighbor. It indicates to the PIM neighbor what interface to 681 associate the JP Join or Prune with. 683 Instance ID: This can be a VPN-ID, BGP Route Target (RT) or BGP 684 Route Distinguisher (RD). Not specified in this specification. 686 Reserved: Set to zero on transmission and ignored on receipt. 688 JP Message Length: The number of bytes that follow which make up 689 the PIMv2 JP message. 691 PIMv2 JP Message: PIMv2 Join/Prune message and payload with no IP 692 header in front of it. As you can see from the packet format 693 diagram, multiple JP messages can go into one TCP/SCTP stream from 694 the same or different Instance IDs. 696 6. Join/Prune Processing 698 When a PORT neighbor transitions to using Transport Mode, the 699 downstream router sends JP messages for existing routes that RPF to 700 the neighbor over the Transport connection. In addition, periodic JP 701 messages are stopped and only incremental JPs are sent thereafter. 
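The transition to Transport Mode described above might be sketched as follows. This is only an illustration in Python: the class PortNeighbor, its fields, and the shape of the route table (a mapping from a route key to the address of its RPF neighbor) are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass, field


@dataclass
class PortNeighbor:
    """Hypothetical per-neighbor state kept by a downstream router."""
    address: str
    transport_mode: bool = False
    periodic_jp_enabled: bool = True
    sent: list = field(default_factory=list)  # log of (kind, route) sent over Transport

    def enter_transport_mode(self, routes):
        """routes maps a route key, e.g. ("*", "G"), to its RPF neighbor address."""
        self.transport_mode = True
        self.periodic_jp_enabled = False  # periodic JP messages are stopped
        for route, rpf_neighbor in sorted(routes.items()):
            if rpf_neighbor == self.address:  # only routes that RPF to this neighbor
                self.sent.append(("join", route))

    def route_changed(self, kind, route):
        """Send one incremental Join or Prune once in Transport Mode."""
        if self.transport_mode:
            self.sent.append((kind, route))


neighbor = PortNeighbor("10.0.0.1")
routes = {("*", "232.1.1.1"): "10.0.0.1", ("10.1.1.1", "232.1.1.1"): "10.0.0.2"}
neighbor.enter_transport_mode(routes)
neighbor.route_changed("prune", ("*", "232.1.1.1"))
```

In this sketch the existing state is sent once at the transition, and every later change is sent as a single incremental message. A real implementation would write to the Transport connection rather than append to a list.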
703     A router which has a Transport connection established MUST send and
704     receive JP messages over the Transport session to that given peer as
705     well as accept and process native JP messages as described in
706     [RFC4601].

708     When a Transport connection is established for a newly discovered
709     neighbor, the downstream router triggers JP messages for its existing
710     state.  This allows the upstream router to build state it may not
711     previously have had.  If state had existed due to a native JP, the
712     expiration timer would have been started.  Now it can be stopped
713     because the state is being sent incrementally over the Transport
714     connection.

716     When a Transport connection to a given neighbor goes down, the
717     downstream router does not have to trigger native JP messages.  It
718     can wait for its next periodic interval to send a native JP message.
719     When the upstream router receives the native JP message, it will
720     start the expiration timer for the oif associated with the state from
721     the JP message.

723     Note that, since JP messages are sent over a Transport connection,
724     neither Prune Override nor Join Suppression is possible for them.

726  7.  Outgoing Interface List Explicit Tracking

728     Since this specification indicates the use of TCP/SCTP for PIM JP
729     messages over point-to-point or NBMA type links, explicit tracking
730     can be achieved by tracking only oif-list state and not per-neighbor
731     per oif-list state.  This is true for segmented LANs and in segmented
732     MDT/PMSI environments.

734     By using explicit tracking of oifs, the router tracks all downstream
735     neighbors which have expressed interest in a route on a given
736     interface.  The list of tracked routers is one of the checks used to
737     determine whether traffic needs to be forwarded on a given interface
738     or not.

740     For (*,G) and (S,G) routes, the router starts forwarding traffic on
741     an interface when a Join is received from a neighbor on such an
742     interface.  This tracks the oif to the neighbor.  When the neighbor
743     sends a Prune, the interface is removed from the oif-list and
744     forwarding of traffic stops on the interface.

746     When all interfaces are removed from the oif-list, the route entry
747     can be removed.

749     For (S,G,R) routes, the router is typically tracking Prune state on
750     the shared tree.  Once at least one downstream neighbor sends a Prune
751     over a Transport connection, the (S,G,R) state is created with an
752     empty outgoing interface list.  If a subsequent JP is received over a
753     Transport connection which has (*,G) in the join-list and does not
754     have (S,G,R) in the prune-list, the upstream router will add the
755     interface the JP message was received on to the oif-list.  Then
756     oif-list based explicit tracking will occur just like in the (*,G)
757     and (S,G) route case above.

759     The only difference in the (S,G,R) route case is that when the
760     outgoing interface is pruned, the entry must stay in the route table
761     or else forwarding will occur on the interfaces for the (*,G) entry.
762     Therefore, explicit tracking for Prunes must be provided.  Only when
763     the (S,G,R) oif-list interfaces match the interfaces in the (*,G) can
764     the (S,G,R) route be removed.

766  8.  Multiple Instances and Address-Family Support

768     Multiple instances of the PIM protocol may be used to support
769     multiple VPNs or within a VPN to support multiple address families.
770     Multiple instances can cause a multiplier effect on the number of
771     router resources consumed.  To use router resources more
772     efficiently, JP messages can be multiplexed over fewer Transport
773     connections.

775     There are two ways this can be accomplished: one using a common
776     header format over a TCP connection and the other using multiple
777     streams over a single SCTP connection.
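As a rough illustration of the common header approach, the Python sketch below packs several JP messages for different instances into one TLV and demultiplexes them again, applying the length checks described in Section 5. It rests on stated assumptions rather than on anything this specification fixes: every record is given a uniform 12-byte header (JP Message Length, a Reserved/I-Type word, and an 8-byte Instance ID) to match the Length = (12 * X) + Y formula, the Interface ID word shown in the first record of the diagram is omitted, and the I-Type is assumed to occupy the low 4 bits of its word. The function names and the opaque JP payloads are hypothetical.

```python
import struct

TLV_HDR = struct.Struct("!HH")    # Type, Length (network byte order)
REC_HDR = struct.Struct("!HH8s")  # JP Message Length, Reserved/I-Type, Instance ID
TYPE_IPV4_JP, TYPE_IPV6_JP = 1, 2


def encode_tlv(tlv_type, records, i_type=0):
    """records is a list of (instance_id, jp_message); instance_id is 8 bytes."""
    value = b"".join(
        REC_HDR.pack(len(jp), i_type & 0x000F, instance_id) + jp
        for instance_id, jp in records
    )
    return TLV_HDR.pack(tlv_type, len(value)) + value


def decode_tlv(buf):
    """Return (tlv_type, [(instance_id, jp_message), ...]), or None if ignored."""
    if len(buf) < TLV_HDR.size:
        return None
    tlv_type, length = TLV_HDR.unpack_from(buf)
    value = buf[TLV_HDR.size:TLV_HDR.size + length]
    if len(value) < length:
        return None  # buffer shorter than the TLV Length: ignore the whole TLV
    records, offset = [], 0
    while offset < len(value):
        if len(value) - offset < REC_HDR.size:
            break  # remainder is malformed and not processed
        jp_len, _reserved_itype, instance_id = REC_HDR.unpack_from(value, offset)
        offset += REC_HDR.size
        if len(value) - offset < jp_len:
            break  # remainder is malformed and not processed
        records.append((instance_id, value[offset:offset + jp_len]))
        offset += jp_len
    return tlv_type, records


vpn_a = b"VPN-A\x00\x00\x00"
vpn_b = b"VPN-B\x00\x00\x00"
wire = encode_tlv(TYPE_IPV4_JP, [(vpn_a, b"join1"), (vpn_b, b"pr2")])
tlv_type, records = decode_tlv(wire)
```

The receiver can then hand each record to the PIM instance selected by its Instance ID, and choose IPv4 or IPv6 address parsing from the TLV Type. Per-message checks such as the PIM checksum are left out of this sketch.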
779     Using the Common Header format described previously in this
780     specification with different TLVs, both IPv4 and IPv6 based JP
781     messages can be encoded within a Transport connection.  Likewise,
782     within a TLV, multiple occurrences of JP messages can occur and are
783     tagged with an instance-ID so multiple JP messages for different VPNs
784     can use a single Transport connection.

786     When using SCTP multi-streaming, the common header is still used to
787     convey instance information but an SCTP association is used, on a
788     per-VPN basis, to send data concurrently for multiple instances.
789     When data is sent concurrently, head of line blocking, which can
790     occur when using TCP, is avoided.

792  9.  Miscellany

794     No changes are expected in the processing of other PIM messages such
795     as PIM Asserts, Grafts, Graft-Acks, Registers, and Register-Stops.
796     The same holds for BSR and Auto-RP type messages.

798     This extension is applicable only to PIM-SM, PIM-SSM and Bidir-PIM.
799     It does not take requirements for PIM-DM into consideration.

801  10.  Security Considerations

803     Transport connections can be authenticated using HMACs MD5 and SHA-1,
804     similar to their use in BGP [RFC4271] and MSDP [RFC3618].

806     When using SCTP as the transport protocol, [RFC4895] can be used on
807     a per-SCTP-association basis to authenticate PIM data.

809  11.  IANA Considerations

811     This specification requests IANA to allocate a TCP port number and an
812     SCTP port number for the use of PIM-Over-Reliable-Transport.

814  12.  Acknowledgments

816     The authors would like to give a special thank you and appreciation
817     to Nidhi Bhaskar for her initial design and early prototype of this
818     idea.

820     Appreciation goes to Randall Stewart for his authoritative review and
821     recommendation for using SCTP.
823     Thanks also go to the following for their ideas and commentary
824     review of this specification: Mike McBride, Toerless Eckert, Yiqun
825     Cai, Albert Tian, Suresh Boddapati, Nataraj Batchu, Daniel Voce, John
826     Zwiebel, Yakov Rekhter, and Lenny Giuliano.

828     A special thank you goes to Eric Rosen for his very detailed review
829     and commentary.  Many of his comments are reflected as text in this
830     specification.

832  13.  References

834  13.1.  Normative References

836     [RFC0761]  Postel, J., "DoD standard Transmission Control Protocol",
837                RFC 761, January 1980.

839     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
840                Requirement Levels", BCP 14, RFC 2119, March 1997.

842     [RFC3618]  Fenner, B. and D. Meyer, "Multicast Source Discovery
843                Protocol (MSDP)", RFC 3618, October 2003.

845     [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
846                Protocol 4 (BGP-4)", RFC 4271, January 2006.

848     [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
849                Networks (VPNs)", RFC 4364, February 2006.

851     [RFC4601]  Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
852                "Protocol Independent Multicast - Sparse Mode (PIM-SM):
853                Protocol Specification (Revised)", RFC 4601, August 2006.

855     [RFC4895]  Tuexen, M., Stewart, R., Lei, P., and E. Rescorla,
856                "Authenticated Chunks for the Stream Control Transmission
857                Protocol (SCTP)", RFC 4895, August 2007.

859     [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
860                RFC 4960, September 2007.

862  13.2.  Informative References

864     [AFI]      IANA, "Address Family Indicators (AFIs)", ADDRESS FAMILY
865                NUMBERS http://www.iana.org/numbers.html, February 2007.

867     [HELLO-OPT]
868                IANA, "PIM Hello Options", PIM-HELLO-OPTIONS per
869                RFC4601 http://www.iana.org/assignments/pim-hello-options,
870                March 2007.

872     [MCAST-VPN]
873                Rosen and Aggarwal, "Multicast in MPLS/BGP VPNs", Internet
874                Draft draft-ietf-l3vpn-2547bis-mcast-05.txt, July 2007.
876  Authors' Addresses

878     Dino Farinacci
879     cisco Systems
880     Tasman Drive
881     San Jose, CA  95134
882     USA

884     Email: dino@cisco.com

886     IJsbrand Wijnands
887     cisco Systems
888     Tasman Drive
889     San Jose, CA  95134
890     USA

892     Email: ice@cisco.com

894     Apoorva Karan
895     cisco Systems
896     170 Tasman Drive
897     San Jose, CA
898     USA

900     Email: apoorva@cisco.com

902     Arjen Boers
903     cisco Systems
904     Tasman Drive
905     San Jose, CA  95134
906     USA

908     Email: aboers@cisco.com

910     Maria Napierala
911     AT&T Labs
912     200 Laurel Drive
913     Middletown, New Jersey  07748
914     USA

916     Email: mnapierala@att.com

918  Full Copyright Statement

920     Copyright (C) The IETF Trust (2008).

922     This document is subject to the rights, licenses and restrictions
923     contained in BCP 78, and except as set forth therein, the authors
924     retain all their rights.

926     This document and the information contained herein are provided on an
927     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
928     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
929     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
930     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
931     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
932     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

934  Intellectual Property

936     The IETF takes no position regarding the validity or scope of any
937     Intellectual Property Rights or other rights that might be claimed to
938     pertain to the implementation or use of the technology described in
939     this document or the extent to which any license under such rights
940     might or might not be available; nor does it represent that it has
941     made any independent effort to identify any such rights.  Information
942     on the procedures with respect to rights in RFC documents can be
943     found in BCP 78 and BCP 79.
945     Copies of IPR disclosures made to the IETF Secretariat and any
946     assurances of licenses to be made available, or the result of an
947     attempt made to obtain a general license or permission for the use of
948     such proprietary rights by implementers or users of this
949     specification can be obtained from the IETF on-line IPR repository at
950     http://www.ietf.org/ipr.

952     The IETF invites any interested party to bring to its attention any
953     copyrights, patents or patent applications, or other proprietary
954     rights that may cover technology that may be required to implement
955     this standard.  Please address the information to the IETF at
956     ietf-ipr@ietf.org.

958  Acknowledgment

960     Funding for the RFC Editor function is provided by the IETF
961     Administrative Support Activity (IASA).