idnits 2.17.1 draft-farinacci-pim-port-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 19. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 843. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 854. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 861. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 867. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 1, 2008) is 5927 days in the past. Is this intentional? 
Checking references for intended status: Experimental ---------------------------------------------------------------------------- ** Obsolete normative reference: RFC 761 (Obsoleted by RFC 793, RFC 7805) ** Obsolete normative reference: RFC 4601 (Obsoleted by RFC 7761) ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260) -- Duplicate reference: RFC4601, mentioned in 'HELLO-OPT', was also mentioned in 'RFC4601'. -- Obsolete informational reference (is this intentional?): RFC 4601 (ref. 'HELLO-OPT') (Obsoleted by RFC 7761) == Outdated reference: A later version (-10) exists of draft-ietf-l3vpn-2547bis-mcast-05 Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Dino Farinacci 3 Internet-Draft IJsbrand Wijnands 4 Intended status: Experimental Apoorva Karan 5 Expires: August 4, 2008 Arjen Boers 6 cisco Systems 7 Maria Napierala 8 AT&T Labs 9 February 1, 2008 11 A Reliable Transport Mechanism for PIM 12 draft-farinacci-pim-port-00.txt 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 
31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on August 4, 2008. 39 Copyright Notice 41 Copyright (C) The IETF Trust (2008). 43 Abstract 45 This draft describes how a reliable transport mechanism can be used 46 by the PIM protocol to optimize CPU and bandwidth resource 47 utilization by eliminating periodic Join/Prune message transmission. 48 This draft proposes a modular extension to PIM to use either the TCP 49 or SCTP transport protocol. 51 Table of Contents 53 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 54 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 4 55 1.2. Definitions . . . . . . . . . . . . . . . . . . . . . . . 4 56 2. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 5 57 3. New PIM Hello Options . . . . . . . . . . . . . . . . . . . . 6 58 3.1. PIM over the TCP Transport Protocol . . . . . . . . . . . 6 59 3.2. PIM over the SCTP Transport Protocol . . . . . . . . . . . 7 60 4. Establishing Transport Connections . . . . . . . . . . . . . . 8 61 4.1. TCP Connection Maintenance . . . . . . . . . . . . . . . . 9 62 4.2. Transitional Periods . . . . . . . . . . . . . . . . . . . 9 63 4.3. On-demand versus Pre-configured Connections . . . . . . . 10 64 4.4. Possible Hello Suppression Considerations . . . . . . . . 10 65 4.5. Avoiding a Pair of Connections between Neighbors . . . . . 11 66 5. Common Header Definition . . . . . . . . . . . . . . . . . . . 12 67 6. Join/Prune Processing . . . . . . . . . . . . . . . . . . . . 16 68 7. Outgoing Interface List Explicit Tracking . . . . . . . . . . 17 69 8. Multiple Instances and Address-Family Support . . . . . . . . 18 70 9. Miscellany . . . . . . . . . . . . . . . . . . . . . . . . . . 19 71 10. Security Considerations . . . . . . . . . . . . . . . . . . . 
20 72 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 73 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 22 74 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 75 13.1. Normative References . . . . . . . . . . . . . . . . . . . 23 76 13.2. Informative References . . . . . . . . . . . . . . . . . . 23 77 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24 78 Intellectual Property and Copyright Statements . . . . . . . . . . 25 80 1. Introduction 82 The goals of this specification are: 84 o To create a simple incremental mechanism to provide reliable PIM 85 message delivery in PIM version 2. 87 o The reliable transport mechanism will be used for Join-Prune 88 message transmission only. 90 o Can be used for link-local transmission of Join-Prune messages or 91 multi-hop for use in a multicast VPN environments. 93 o When a router supports this specification, it need not use the 94 reliable transport mechanism on every interface. That is, 95 negotiation on per interface basis (or MDT basis) will occur. 97 The explicit non-goals of this specification are: 99 o Changes to the PIM protocol machinery as defined in [RFC4601]. 100 The reliable transport mechanism will be used as a plugin layer so 101 the PIM component does not know it is really there. 103 o Provide support for both Datagram mode and Transport mode (see 104 Section 1.2 for definitions) on the same physical interface or 105 MDT. 107 This document will specify how periodic JP message transmission can 108 be eliminated by using TCP [RFC0761] or SCTP [RFC4960] as the 109 reliable transport mechanism for JP messages. 111 1.1. Requirements Notation 113 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 114 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 115 document are to be interpreted as described in [RFC2119]. 117 1.2. Definitions 119 PORT: Stands for PIM Over Reliable Transport. 
Which is the short 120 form for describing the mechanism in this specification where PIM 121 can use the TCP or SCTP transport protocol. 123 JP Message: An abbreviation for a Join-Prune message. 125 Periodic JP: A JP message sent periodically to refresh state. 127 Incremental JP: A JP message sent as a result of state creation or 128 deletion events. Also known as a triggered message. 130 Native JP: A JP message which is carried with an IP protocol type 131 of PIM. 133 Reliable JP: A JP message using TCP or SCTP for transport. 135 Datagram Mode: The current procedures PIM uses by encapsulating JP 136 messages in IP packets sent either triggered or periodically. 138 Transport Mode: Procedures used by PIM defined in this 139 specification for sending JP messages over the TCP or SCTP 140 transport layer. 142 MDT/PMSI: Used interchangeably in this document. An MDT tunnel is 143 one used between PE router to provide support for a Multicast VPN. 144 The new standards term for an MDT tunnel is a Provider-Network 145 Multicast Service Interface or PMSI. 147 2. Protocol Overview 149 PIM Over Reliable Transport (PORT) is a simple extension to PIMv2 for 150 refresh reduction of PIM JP messages. It involves sending 151 incremental rather than periodic JPs over a TCP/SCTP connection 152 between PIM neighbors. 154 PORT can be incrementally used on an interface between PORT capable 155 neighbors. Routers which are not PORT capable can continue to use 156 PIM in Datagram Mode. PORT capability is detected using a new PORT 157 Capable PIM Hello Option. 159 When PORT is used, only incremental JPs are sent from downstream 160 routers to upstream routers. As such, downstream routers do not 161 generate periodic JPs for routes which RPF to a PORT-capable 162 neighbor. 164 For Joins and Prunes, which are received over a TCP/SCTP connection, 165 the upstream router does not start or maintain timers on the outgoing 166 interface entry. 
Instead, it explicitly tracks downstream routers 167 which have expressed interest. An interface is deleted from the 168 outgoing interface list only when all downstream routers on the 169 interface no longer wish to receive traffic. 171 Because incremental JPs are sent over a TCP/SCTP connection, no Join 172 suppression or Prune-Override of incremental JPs is possible on 173 multi-access LANs. As a result, upstream routers that receive an 174 incremental Join or Prune that creates state explicitly track all 175 downstream nodes. Note that for point-to-point links there is no need 176 for explicitly tracking downstream nodes. 178 There is no change proposed for the PIM JP packet format. However, 179 for JPs sent over TCP/SCTP connections, no IP Header is included. 180 The message begins with the PIM common header, followed by the JP 181 message. See Section 5 for details on the common header. 183 3. New PIM Hello Options 185 3.1. PIM over the TCP Transport Protocol 187 Option Type: PIM-over-TCP Capable 189 0 1 2 3 190 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 191 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 192 | Type = 65006 | Length = X + 4 | 193 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 194 | TCP Connection ID AFI | Reserved | 195 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 196 | TCP Connection ID | 197 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 199 Allocated Hello Type values can be found in [HELLO-OPT]. 201 When a router is configured to use PIM over TCP on a given interface, 202 it MUST include the PORT Capable hello option in its Hello messages 203 for that interface. If a router is explicitly disabled from using JP 204 over TCP, it MUST NOT include the PORT Capable hello option in its 205 Hello messages. When the router cannot set up a TCP connection, it 206 will refrain from including this option.
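As an illustration only, the PIM-over-TCP Capable option shown above might be encoded as in the following sketch. It assumes 16-bit Type and Length fields (as used for PIM Hello options), a 16-bit AFI and a 16-bit Reserved field, and that X is the packed address length (4 bytes for IPv4, 16 bytes for IPv6). The function name and the example addresses are hypothetical and are not part of this specification.

```python
import ipaddress
import struct

PIM_OVER_TCP_CAPABLE = 65006  # Hello option Type from the diagram above
AFI_IPV4 = 1
AFI_IPV6 = 2


def encode_tcp_capable_option(connection_id: str) -> bytes:
    """Encode a PIM-over-TCP Capable Hello option (hypothetical helper).

    Assumed layout: Type (16 bits), Length (16 bits), AFI (16 bits),
    Reserved (16 bits), then the TCP Connection ID address.
    """
    addr = ipaddress.ip_address(connection_id)
    afi = AFI_IPV4 if addr.version == 4 else AFI_IPV6
    # Value part: AFI + Reserved (set to zero on transmission) + address.
    value = struct.pack("!HH", afi, 0) + addr.packed
    # Length covers the value part only, i.e. X + 4.
    return struct.pack("!HH", PIM_OVER_TCP_CAPABLE, len(value)) + value


option = encode_tcp_capable_option("192.0.2.1")
print(option.hex())
```

Under these assumptions the Length field carries 8 for an IPv4 Connection ID and 20 for an IPv6 Connection ID.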
208 This option is only used when an interface is point-to-point, a 209 segmented multi-access LAN, or an S-PMSI [MCAST-VPN]. In all other 210 cases, Datagram Mode is used. 212 An implementation may provide a configuration option to enable or 213 disable PORT functionality. We recommend that this capability be 214 disabled by default. 216 Length: The length in bytes of the value part of the Type/Length/Value 217 encoding, where X is 4 bytes if the IP AFI of value 1 is used and 16 218 bytes (128 bits) when the IPv6 AFI of 2 is used [AFI]. 220 TCP Connection ID AFI: The AFI value to describe the address-family 221 of the address of the TCP Connection ID field. 223 Reserved: Set to zero on transmission and ignored on receipt. 225 TCP Connection ID: An IP or IPv6 address used to establish the TCP 226 connection. When this field is 0, a mechanism outside the scope 227 of this spec is used to obtain the addresses used to establish the 228 TCP connection. 230 3.2. PIM over the SCTP Transport Protocol 232 Option Type: PIM-over-SCTP Capable 234 0 1 2 3 235 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 236 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 237 | Type = 65007 | Length = X + 4 | 238 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 239 | SCTP Connection ID AFI | Reserved | 240 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 241 | SCTP Connection ID | 242 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 244 Allocated Hello Type values can be found in [HELLO-OPT]. 246 When a router is configured to use PIM over SCTP on a given 247 interface, it MUST include the PORT Capable hello option in its Hello 248 messages for that interface. If a router is explicitly disabled from 249 using JP over SCTP, it MUST NOT include the PORT Capable hello option 250 in its Hello messages. When the router cannot set up an SCTP 251 connection, it will refrain from including this option.
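On the receiving side, a router could recognize either PORT Hello option with a decoder along the following lines. This is a sketch under the same layout assumptions as the diagrams above (16-bit Type, Length, AFI, and Reserved fields; address length of 4 bytes for AFI 1 and 16 bytes for AFI 2). The function name and the choice to return None for anything it cannot validate are illustrative assumptions.

```python
import ipaddress
import struct

OPTION_NAMES = {65006: "PIM-over-TCP Capable", 65007: "PIM-over-SCTP Capable"}
ADDRESS_LENGTH = {1: 4, 2: 16}  # AFI -> address length in bytes (assumed)


def decode_port_option(data: bytes):
    """Decode a PORT Hello option (hypothetical helper).

    Returns (option name, Connection ID address), with the address set to
    None when the Connection ID field is 0, or returns None when the bytes
    do not form a valid PORT option under the assumed layout.
    """
    if len(data) < 8:
        return None
    opt_type, length, afi, _reserved = struct.unpack("!HHHH", data[:8])
    if opt_type not in OPTION_NAMES or afi not in ADDRESS_LENGTH:
        return None
    addr_len = ADDRESS_LENGTH[afi]
    # Length must equal X + 4 and the buffer must hold the whole value part.
    if length != addr_len + 4 or len(data) < 4 + length:
        return None
    address = ipaddress.ip_address(data[8:8 + addr_len])
    if int(address) == 0:
        # Connection ID of 0: addresses come from a mechanism out of scope.
        return OPTION_NAMES[opt_type], None
    return OPTION_NAMES[opt_type], address


sample = struct.pack("!HHHH", 65007, 8, 1, 0) + bytes([192, 0, 2, 7])
print(decode_port_option(sample))
```

The Reserved field is read but not checked, matching the "ignored on receipt" rule.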
253 This option is only used when an interface is point-to-point or when 254 a multi-access LAN or MDT is segmented (also known as "Partitioned 255 MDTs") in a non-broadcast multi-access (NBMA) mode. In all other 256 cases, Datagram Mode is used. 258 An implementation may provide a configuration option to enable or 259 disable PORT functionality. We recommend that this capability be 260 disabled by default. 262 Length: The length in bytes of the value part of the Type/Length/Value 263 encoding, where X is 4 bytes if the IP AFI of value 1 is used and 16 264 bytes (128 bits) when the IPv6 AFI of 2 is used [AFI]. 266 SCTP Connection ID AFI: The AFI value to describe the address-family 267 of the address of the SCTP Connection ID field. 269 Reserved: Set to zero on transmission and ignored on receipt. 271 SCTP Connection ID: An IP or IPv6 address used to establish the 272 SCTP connection. When this field is 0, a mechanism outside the 273 scope of this spec is used to obtain the addresses used to 274 establish the SCTP connection. 276 4. Establishing Transport Connections 278 Since this specification describes using Transport on point-to-point 279 links or NBMA configured MDTs, a router knows when a Transport is 280 established with the neighbor. When the Transport connection is not 281 established, Datagram Mode is used. When the Transport connection 282 becomes established, Transport Mode is in effect and the router can 283 suppress sending periodic JPs. 285 When a router receives a Hello from a neighbor it has not previously 286 heard from, or when an existing neighbor's Hello includes a PORT-Capable 287 Option that it did not previously include, the router will 288 attempt to establish a Transport connection with the neighbor. When 289 the router is using TCP it will compare the IP address it uses to 290 send Hellos on the interface with the IP address the neighbor is 291 using to send Hellos. The router with the lower IP address will do 292 an active Transport open to the neighbor address.
The neighbor with the higher IP 293 address will do a passive Transport open. When the router 294 is using SCTP, the IP address comparison need not be done since the SCTP 295 protocol can handle call collision. 297 When a Transport connection goes down, Join or Prune state that was 298 sent over the Transport connection is still retained. The neighbor 299 should not be considered down until the neighbor timer has expired. 300 This allows routers to do a control-plane switchover without 301 disrupting the network. If a Transport connection is reestablished 302 before the neighbor timer expires, the previous state is intact and 303 any new JP messages sent cause state to be created or removed 304 (depending on whether it was a Join or Prune). If the neighbor timer does 305 expire, only the upstream router that has oif-list state for the 306 expired downstream neighbor will need to clear state. A downstream 307 router, when an upstream neighboring router has expired, will simply 308 RPF to a new neighbor where it would trigger JP messages like it 309 would today. A PIM router is required to clear its neighbor 310 table entry for a neighbor that has timed out due to neighbor holdtime 311 expiration. 313 When a router is in Datagram Mode with a neighbor and has been 314 sending periodic JP messages to it and then the Transport connection 315 has been established to the neighbor, there is no requirement for the 316 downstream router to send JP messages to the upstream neighbor. The 317 upstream router can keep the state maintained from the Datagram Mode 318 creation. However, when a router is in Transport Mode with a neighbor 319 and moves to Datagram Mode because the transport connection went down 320 (and several attempts to reestablish the transport connection fail), 321 the router cannot be sure that all the JP data was received by the 322 neighbor. Therefore, it is required to send a full set of JP 323 messages to refresh or re-create state in the upstream neighbor.
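The open-role selection and the mode-transition rule described in the paragraphs above can be sketched as follows. This is a non-normative illustration: the function names, the string labels for protocols and modes, and the decision to raise an error for equal or mixed-family addresses are assumptions made for the example. Addresses are compared numerically rather than as strings.

```python
import ipaddress


def transport_open_role(local_hello_addr: str, neighbor_hello_addr: str,
                        protocol: str) -> str:
    """Return "active" or "passive" for the Transport open (hypothetical)."""
    if protocol == "sctp":
        # No address comparison is needed: SCTP handles call collision.
        return "active"
    if protocol != "tcp":
        raise ValueError("protocol must be 'tcp' or 'sctp'")
    local = ipaddress.ip_address(local_hello_addr)
    neighbor = ipaddress.ip_address(neighbor_hello_addr)
    if local.version != neighbor.version or local == neighbor:
        raise ValueError("addresses must be distinct and of one family")
    # The router with the lower IP address does the active open.
    return "active" if local < neighbor else "passive"


def full_jp_refresh_required(previous_mode: str, new_mode: str) -> bool:
    """True only when falling back from Transport Mode to Datagram Mode."""
    return previous_mode == "transport" and new_mode == "datagram"


print(transport_open_role("10.0.0.9", "10.0.0.10", "tcp"))
print(full_jp_refresh_required("transport", "datagram"))
```

Moving from Datagram Mode to Transport Mode returns False here because, per the paragraph above, the upstream router can keep the state it learned in Datagram Mode.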
325 An upstream neighbor does have the responsibility of removing the 326 timer-activated timeout of an oif-list entry. When a Transport 327 connection is established, the timer-activated timeout is disabled. 328 When a Transport connection goes down, the timer-activated timeout 329 for an oif-list is enabled. Both the upstream and downstream routers 330 stay in sync based on the state of the Transport connection. If the 331 upstream router has timer-activated timeout on oif-lists, the 332 downstream router will be sending periodic JPs. Otherwise, the 333 downstream router suppresses sending periodic JPs because it assumes 334 the upstream router has disabled the timer-activated timeout of oif-list 335 entries the downstream router has previously joined. 337 4.1. TCP Connection Maintenance 339 TCP is designed to keep connections up indefinitely during a period 340 of network disconnection. If a PIM-over-TCP router fails, the TCP 341 connection may stay up until the neighbor actually reboots, and even 342 then it may continue to stay up until you actually try to send the 343 neighbor some information. This is particularly relevant to PIM, 344 since the flow of JPs might be in only one direction, and the 345 downstream neighbor might never get any indication via TCP that the 346 other end of the connection isn't really there. 348 Most applications using TCP want to detect when a neighbor is no 349 longer there, so that the associated application state can be 350 released. Also, one wants to clean up the TCP state, and not keep 351 half-open connections around indefinitely. This is accomplished by 352 using PIM Hellos and by not introducing an application-specific or 353 new PIM keep-alive message.
Therefore, when a GENID changes from a 354 received PIM Hello message, and a TCP connection is established or 355 attempting to be established, the local side will tear down the 356 connection and attempt to reopen a new one for the new instance of 357 the neighbor coming up. 359 When PORT capable routers come up and try to establish transport 360 connections with their neighbors, but cannot for some reason, after 3 361 attempts to do so, the router should go into datagram mode and not 362 advertise the PORT Hello option anymore. Operator intervention is 363 required to restart the process after the problem is found. 365 4.2. Transitional Periods 367 There may be transitional periods when a router receives, from a 368 given neighbor, both datagram JP messages and JP messages sent over a 369 transport connection. When this happens, a transport connection to a 370 particular neighbor is established, and as long as it remains 371 established, the router MUST ignore PIM messages sent in Datagram 372 Mode from that neighbor. Otherwise, the datagram messages could get 373 out of order with respect to the transport messages, and the router 374 could end up in an erroneous state of pruning joined state or joining 375 pruned state which it is unable to recover from as long as the 376 transport connection stays up. 378 4.3. On-demand versus Pre-configured Connections 380 Transport connections could be established when they are needed or 381 when a router interface to other PIM neighbors has come up. The 382 advantages of on-demand Transport connection establishment are the 383 reduction of router resources. Especially in the case where there is 384 no need for n^2 connections on a network interface or MDT tunnel. 385 The disadvantages are deciding what to do when a JP message needs to 386 be sent and a Transport connection is not established yet. 
An 387 implementation can either send a Datagram Mode JP or queue the JP to 388 be sent as a Transport Mode JP after the Transport connection is 389 established. 391 If a router interface has become operational and PIM neighbors are 392 learned from Hello messages, Transport connections may 393 be established at that time. The advantage is that a connection is ready to 394 transport data by the time a JP message needs to be sent. The 395 disadvantage is there can be more connections established than 396 needed. This can occur when there is a small set of RPF neighbors 397 for the active distribution trees compared to the total number of 398 neighbors. Even when Transport connections are pre-established 399 before they are needed, a connection can go down and an 400 implementation will have to deal with an on-demand situation. 402 Therefore, this specification recommends but does not mandate the use 403 of on-demand Transport connection establishment. 405 4.4. Possible Hello Suppression Considerations 407 This specification indicates that a Transport connection cannot be 408 established until a Hello message is received. One reason for this 409 is to determine if the PIM neighbor supports this specification and 410 the other is to determine the remote address to use to establish the 411 Transport connection. 413 There are cases where it is desirable to suppress entirely the 414 transmission of Hello messages. In this case, how to determine 415 whether the PIM neighbor supports this specification, and the 416 out-of-band (outside of the PIM protocol) method used to determine 417 the remote address for establishing the Transport connection, are 418 outside the scope of this document. 420 4.5. Avoiding a Pair of Connections between Neighbors 422 To ensure there are not two connections between a pair of PIM 423 neighbors, the following set of rules must be followed.
Let A and B 424 be two PIM neighbors where A's IP address is numerically smaller than 425 B's IP address, and each is known to the other as having a potential 426 PIM adjacency relationship. 428 At node A: 430 o If there is already an established TCP connection to B, on the 431 PIM-over-TCP port, then A MUST NOT attempt to establish a new 432 connection to B. Rather it uses the established connection to send 433 JPs to B. (This is independent of which node initiated the 434 connection.) 436 o If A has initiated a connection to B, but the connection is still 437 in the process of being established, then A MUST refuse any 438 connection on the PIM-over-TCP port from B. 440 o At any time when A does not have a connection to B which is either 441 established or in the process of being established, A MUST accept 442 connections from B. 444 At node B: 446 o If there is already an established TCP connection to A, on the 447 PIM-over-TCP port, then B MUST NOT attempt to establish a new 448 connection to A. Rather it uses the established connection to send 449 JPs to A. (This is independent of which node initiated the 450 connection.) 452 o If B has initiated a connection to A, but the connection is still 453 in the process of being established, then if A initiates a 454 connection to B, B MUST accept the connection initiated by A and 455 must release the connection which it (B) initiated. 457 5. Common Header Definition 459 It may be desirable for scaling purposes to send JP messages from 460 different PIM protocol instances over the same Transport 461 connection. Also, it may be desirable to have a set of JP messages 462 for one address-family sent over a Transport connection that is 463 established over a different address-family network layer. 465 To be able to do this we need a common header that is inserted and 466 parsed for each PIM JP message that is sent on a Transport connection.
467 This common header will provide both record boundary and demux points 468 when sending over a stream protocol like Transport. 470 Each JP message will have in front of it the following common header 471 in Type/Length/Value format. Multiple different TLV types can be 472 sent over the same Transport connection. 474 To make sure PIM JP messages are delivered as soon as the TCP 475 transport layer receives the JP buffer, the TCP Push flag will be set 476 in all outgoing JP messages sent over a TCP transport connection. 478 PIM messages will be sent using TCP port number TBD. When using SCTP 479 as the reliable transport, port number TBD will be used. See 480 Section 11 for IANA considerations. 482 If the buffer length of the received TLV message is less than what is 483 encoded in the TLV Length field, the entire TLV encoded message is 484 ignored and an error message is logged. Likewise, if the received 485 buffer length left to process at each record parsing level is less 486 than the JP Message Length, the rest of the message is malformed and 487 not processed. 489 Each JP message that has passed the length checks above, contained in 490 the TLV encoding, will be error checked individually. This includes 491 a bad PIM checksum, illegal type fields, or illegal addresses. If 492 any parsing errors occur in a single JP message, it is skipped over 493 and not processed but other JP message records in the TLV are still 494 parsed and processed. 496 The currently defined TLVs are: 498 IPv4 JP Message 500 0 1 2 3 501 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 502 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 503 | Type = 1 | Length = (12 * X) + Y | 504 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 505 | JP Message Length | Reserved |I-Type| 506 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 507 | Instance ID . . .
| 508 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 509 | . . . Instance ID | 510 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 511 | PIMv2 JP Message | 512 | . | 513 | . | 514 | . | 515 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 516 | JP Message Length | Reserved |I-Type| 517 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 518 | Instance ID . . . | 519 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 520 | . . . Instance ID | 521 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 522 | PIMv2 JP Message | 523 | . | 524 | . | 525 | . | 526 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 528 The IPv4 JP common header is used when a JP message is sent that has 529 all IPv4 encoded addresses in the PIM payload. 531 Length: In bytes for the value part of the Type/Length/Value 532 encoding. Where there are 12 bytes per JP message (where X above 533 is the number of JP messages contained) enclosed in one 534 transmission plus Y which is the sum of each "JP Message Length" 535 field that appears in the transmission. 537 I-Type: Defines the encoding and semantics of the Instance ID 538 field. This is not specified in this specification. 540 Instance ID: This can be a VPN-ID. This field could also be a BGP 541 Route Target (RT) or BGP Route Distinguisher (RD) as defined in 542 [RFC4364]. Not specified in this specification. 544 Reserved: Set to zero on transmission and ignored on receipt. 546 JP Message Length: The number of bytes that follow which make up 547 the PIMv2 JP message. 549 PIMv2 JP Message: PIMv2 Join/Prune message and payload with no IP 550 header in front of it. As you can see from the packet format 551 diagram, multiple JP messages can go into one TCP/SCTP stream from 552 the same or different Instance IDs. 
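The TLV layout and the two length checks described in this section can be illustrated with the following sketch. It assumes 16-bit Type and Length fields, a 16-bit JP Message Length, a 16-bit word holding Reserved and I-Type (treated as opaque here, since the I-Type encoding is not specified), and a 64-bit Instance ID, which together give the 12 bytes per JP message used in the Length formula. The function names are hypothetical, and the per-message checks (PIM checksum, type fields, addresses) are left out.

```python
import struct

TLV_HEADER = struct.Struct("!HH")      # Type, Length (assumed 16 bits each)
RECORD_HEADER = struct.Struct("!HHQ")  # JP Message Length, Reserved/I-Type,
                                       # Instance ID: 12 bytes in total


def build_jp_tlv(tlv_type: int, records) -> bytes:
    """Build one TLV from (instance_id, jp_message_bytes) pairs."""
    value = b"".join(
        RECORD_HEADER.pack(len(message), 0, instance_id) + message
        for instance_id, message in records
    )
    # len(value) equals (12 * X) + Y for X records with Y message bytes.
    return TLV_HEADER.pack(tlv_type, len(value)) + value


def parse_jp_tlv(buffer: bytes):
    """Return (tlv_type, records), or None if the whole TLV is ignored.

    Each record is (instance_id, reserved_itype_word, jp_message_bytes).
    """
    if len(buffer) < TLV_HEADER.size:
        return None
    tlv_type, tlv_length = TLV_HEADER.unpack_from(buffer, 0)
    value = buffer[TLV_HEADER.size:]
    if len(value) < tlv_length:
        return None  # buffer shorter than TLV Length: ignore entire TLV
    value = value[:tlv_length]
    records = []
    offset = 0
    while len(value) - offset >= RECORD_HEADER.size:
        jp_length, word, instance_id = RECORD_HEADER.unpack_from(value, offset)
        offset += RECORD_HEADER.size
        if len(value) - offset < jp_length:
            break  # rest of the message is malformed and not processed
        records.append((instance_id, word, value[offset:offset + jp_length]))
        offset += jp_length
    return tlv_type, records


tlv = build_jp_tlv(1, [(7, b"abcd"), (9, b"xy")])
print(parse_jp_tlv(tlv))
```

In this sketch the JP message bytes are placeholders; a real implementation would hand each record to the normal PIMv2 JP parser and skip only the records that fail their individual checks.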
554 IPv6 JP Message 556 0 1 2 3 557 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 558 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 559 | Type = 2 | Length = (12 * X) + Y | 560 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 561 | JP Message Length | Reserved |I-Type| 562 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 563 | Instance ID . . . | 564 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 565 | . . . Instance ID | 566 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 567 | PIMv2 JP Message | 568 | . | 569 | . | 570 | . | 571 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 572 | JP Message Length | Reserved |I-Type| 573 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 574 | Instance ID . . . | 575 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 576 | . . . Instance ID | 577 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 578 | PIMv2 JP Message | 579 | . | 580 | . | 581 | . | 582 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 584 The IPv6 JP common header is used when a JP message is sent that has 585 all IPv6 encoded addresses in the PIM payload. 587 Length: In bytes for the value part of the Type/Length/Value 588 encoding. Where there are 12 bytes per JP message (where X above 589 is the number of JP messages contained) enclosed in one 590 transmission plus Y which is the sum of each "JP Message Length" 591 field that appears in the transmission. 593 I-Type: Defines the encoding and semantics of the Instance ID 594 field. This is not specified in this specification. 596 Instance ID: This can be a VPN-ID, BGP Route Target (RT) or BGP 597 Route Distinguisher (RD). Not specified in this specification. 599 Reserved: Set to zero on transmission and ignored on receipt. 601 JP Message Length: The number of bytes that follow which make up 602 the PIMv2 JP message. 
604 PIMv2 JP Message: PIMv2 Join/Prune message and payload with no IP 605 header in front of it. As you can see from the packet format 606 diagram, multiple JP messages can go into one TCP/SCTP stream from 607 the same or different Instance IDs. 609 6. Join/Prune Processing 611 When a PORT neighbor transitions to using Transport Mode, the 612 downstream router sends JP messages for existing routes that RPF to 613 the neighbor over the Transport connection. In addition, periodic JP 614 messages are stopped and only incremental JPs are sent thereafter. 616 A router which has a Transport connection established MUST send and 617 receive JP messages over the Transport session to that given peer as 618 well as accept and process native JP messages as described in 619 [RFC4601]. 621 When a Transport connection is established for a newly discovered 622 neighbor, the downstream router triggers JP messages for its existing 623 state. This is to allow the upstream router to build state it may 624 not previously have had. If state had existed due to a Native JP, the 625 expiration timer would have been started. Now it can be stopped 626 because the state is being sent incrementally over the Transport 627 connection. 629 When a Transport connection goes down to a given neighbor, the 630 downstream router does not have to trigger native JP messages. It 631 can wait for its next periodic interval to send a native JP message. 632 When the upstream router receives the native JP message, it will 633 start the expiration timer for the oif associated with the state from 634 the JP message. 636 Note that since JP messages are sent over a Transport connection, no 637 Prune Override or Join Suppression is possible for these messages. 639 7.
Outgoing Interface List Explicit Tracking 641 Since this specification indicates the use of TCP/SCTP for PIM JP 642 messages over point-to-point or NBMA type links, explicit tracking 643 can be achieved by tracking only oif-list state and not per-neighbor 644 per oif-list sate. 646 By using explicit tracking of oifs, the router tracks all downstream 647 neighbors which have expressed interest in a route on a given 648 interface. The list of tracked routers is one of the checks used to 649 determine whether traffic needs to be forwarded on a given interface 650 or not. 652 For (*,G) and (S,G) routes, the router starts forwarding traffic on 653 an interface when a Join is received from a neighbor on such an 654 interface. This is tracking the oif to the neighbor. When the 655 neighbor sends a Prune, the interface is removed and forwarding of 656 traffic stops on the interface. 658 When all interfaces are removed from the oif-list, the route entry 659 can be removed. 661 For (S,G,R) routes, typically is tracking Prune state on the shared 662 tree. One at least one downstream neighbor sends a Prune over a 663 Transport connection, the (S,G,R) state is create with a empty 664 outgoing interface list. If a subsequent JP is received over a 665 Transport connection which has (*,G) in the join-list and does not 666 have (S,G,R) in the prune-list, the upstream router will add the 667 interface the JP message was received on to the oif-list. And oif- 668 list based explicit tracking will occur just like in the (*,G) and 669 (S,G) route case above. 671 The only difference in the (S,G,R) route case, is that when the 672 outgoing interface is pruned, the entry must stay in the route table 673 or else forwarding will occur on the interfaces for the (*,G) entry. 674 Therefore, explicit tracking for Prunes must be provided. Only when 675 the (S,G,R) oif-list interfaces match the interfaces in the (*,G) can 676 the (S,G,R) route be removed. 678 8. 
Multiple Instances and Address-Family Support 680 Multiple instances of the PIM protocol may be used to support 681 multiple VPNs or within a VPN to support multiple address families. 682 Multiple instances can cause a multiplier effect on the number of 683 router resources consumed. To be able to have an option to use 684 router resources more efficiently, muxing JP messages over fewer 685 Transport connections can be performed. 687 There are two ways this can be accomplished, one using a common 688 header format over a TCP connection and the other using multiple 689 streams over a single SCTP connection. 691 Using the Common Header format described previously in this 692 specification, using different TLVs, both IPv4 and IPv6 based JP 693 messages can be encoded within a Transport connection. Likewise, 694 within a TLV, multiple occurrences of JP messages can occur and are 695 tagged with an instance-ID so multiple JP messages for different VPNs 696 can use a single Transport connection. 698 When using SCTP multi-streaming, the common header is still used to 699 convey instance information but an SCTP association is used, on a 700 per-VPN basis, to send data concurrently for multiple instances. 701 When data is sent concurrently, head of line blocking, which can 702 occur when using TCP, is avoided. 704 9. Miscellany 706 No changes expected in processing of other PIM messages like PIM 707 Asserts, Grafts, Graft-Acks, Registers, and Register-Stops. This 708 goes for BSR and Auto-RP type messages as well. 710 This extension is applicable only to PIM-SM, PIM-SSM and Bidir-PIM. 711 It does not take requirements for PIM-DM into consideration. 713 10. Security Considerations 715 Transport connections can be authenticated using HMACs MD5 and SHA-1 716 similar to use in BGP [RFC4271] and MSDP [RFC3618]. 718 When using SCTP as the transport protocol, [RFC4895] can be used, on 719 a per SCTP association basis to authenticate PIM data. 721 11. 
IANA Considerations 723 This specification requests IANA to allocate a TCP port number and a 724 SCTP port number for the use of PIM-Over-Reliable-Transport. 726 12. Acknowledgments 728 The authors would like to give a special thank you and appreciation 729 to Nidhi Bhaskar for her initial design and early prototype of this 730 idea. 732 Appreciation goes to Randall Stewart for his authoritative review and 733 recommendation for using SCTP. 735 Thanks also goes to the following for their ideas and commentary 736 review of this specification, Mike McBride, Toerless Eckert, Yiqun 737 Cai, Albert Tian, Suresh Boddapati, and Nataraj Batchu. 739 A special thank you goes to Eric Rosen for his very detailed review 740 and commentary. Many of his comments are reflected as text in this 741 specification. 743 13. References 745 13.1. Normative References 747 [RFC0761] Postel, J., "DoD standard Transmission Control Protocol", 748 RFC 761, January 1980. 750 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 751 Requirement Levels", BCP 14, RFC 2119, March 1997. 753 [RFC3618] Fenner, B. and D. Meyer, "Multicast Source Discovery 754 Protocol (MSDP)", RFC 3618, October 2003. 756 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 757 Protocol 4 (BGP-4)", RFC 4271, January 2006. 759 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 760 Networks (VPNs)", RFC 4364, February 2006. 762 [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, 763 "Protocol Independent Multicast - Sparse Mode (PIM-SM): 764 Protocol Specification (Revised)", RFC 4601, August 2006. 766 [RFC4895] Tuexen, M., Stewart, R., Lei, P., and E. Rescorla, 767 "Authenticated Chunks for the Stream Control Transmission 768 Protocol (SCTP)", RFC 4895, August 2007. 770 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", 771 RFC 4960, September 2007. 773 13.2. 
Informative References 775 [AFI] IANA, "Address Family Indicators (AFIs)", ADDRESS FAMILY 776 NUMBERS http://www.iana.org/numbers.html, February 2007. 778 [HELLO-OPT] 779 IANA, "PIM Hello Options", PIM-HELLO-OPTIONS per 780 RFC4601 http://www.iana.org/assignments/pim-hello-options, 781 March 2007. 783 [MCAST-VPN] 784 Rosen and Aggarwal, "Multicast in MPLS/BGP VPNs", Internet 785 Draft draft-ietf-l3vpn-2547bis-mcast-05.txt, July 2007. 787 Authors' Addresses 789 Dino Farinacci 790 cisco Systems 791 Tasman Drive 792 San Jose, CA 95134 793 USA 795 Email: dino@cisco.com 797 IJsbrand Wijnands 798 cisco Systems 799 Tasman Drive 800 San Jose, CA 95134 801 USA 803 Email: ice@cisco.com 805 Apoorva Karan 806 cisco Systems 807 170 Tasman Drive 808 San Jose, CA 809 USA 811 Email: apoorva@cisco.com 813 Arjen Boers 814 cisco Systems 815 Tasman Drive 816 San Jose, CA 95134 817 USA 819 Email: aboers@cisco.com 821 Maria Napierala 822 AT&T Labs 823 200 Laurel Drive 824 Middletown, New Jersey 07748> 825 USA 827 Email: mnapierala@att.com 829 Full Copyright Statement 831 Copyright (C) The IETF Trust (2008). 833 This document is subject to the rights, licenses and restrictions 834 contained in BCP 78, and except as set forth therein, the authors 835 retain all their rights. 837 This document and the information contained herein are provided on an 838 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 839 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 840 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 841 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 842 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 843 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
845 Intellectual Property

847    The IETF takes no position regarding the validity or scope of any
848    Intellectual Property Rights or other rights that might be claimed to
849    pertain to the implementation or use of the technology described in
850    this document or the extent to which any license under such rights
851    might or might not be available; nor does it represent that it has
852    made any independent effort to identify any such rights. Information
853    on the procedures with respect to rights in RFC documents can be
854    found in BCP 78 and BCP 79.

856    Copies of IPR disclosures made to the IETF Secretariat and any
857    assurances of licenses to be made available, or the result of an
858    attempt made to obtain a general license or permission for the use of
859    such proprietary rights by implementers or users of this
860    specification can be obtained from the IETF on-line IPR repository at
861    http://www.ietf.org/ipr.

863    The IETF invites any interested party to bring to its attention any
864    copyrights, patents or patent applications, or other proprietary
865    rights that may cover technology that may be required to implement
866    this standard. Please address the information to the IETF at
867    ietf-ipr@ietf.org.

869 Acknowledgment

871    Funding for the RFC Editor function is provided by the IETF
872    Administrative Support Activity (IASA).