BGMP Working Group                                             D. Thaler
Internet Engineering Task Force                                Microsoft
INTERNET-DRAFT                                                 D. Estrin
November 22, 2000                                                USC/ISI
Expires May 2001                                                D. Meyer
                                                                   Cisco
                                                                 Editors

               Border Gateway Multicast Protocol (BGMP):
                        Protocol Specification
                      draft-ietf-bgmp-spec-02.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet Drafts as reference material or to cite
them other than as a "work in progress".

To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.

Abstract

This document describes BGMP, a protocol for inter-domain multicast
routing. BGMP builds shared trees for active multicast groups, and
allows receiver domains to build source-specific, inter-domain,
distribution branches where needed.
Building upon concepts from CBT and
PIM-SM, BGMP requires that each multicast group be associated with a
single root (in BGMP it is referred to as the root domain). BGMP
assumes that at any point in time, different ranges of the class D space

Draft                             BGMP                     November 2000

are associated (e.g., with MASC [MASC]) with different domains. Each of
these domains then becomes the root of the shared domain-trees for all
groups in its range. Multicast participants will generally receive
better multicast service if the session initiator's address allocator
selects addresses from its own domain's part of the space, thereby
causing the root domain to be local to at least one of the session
participants.

Copyright Notice

Copyright (C) The Internet Society (2000). All Rights Reserved.

1. Acknowledgements

   In addition to the editors, the following individuals have
   contributed to the design of BGMP: Cengiz Alaettinoglu, Tony
   Ballardie, Steve Casner, Steve Deering, Dino Farinacci, Bill Fenner,
   Mark Handley, Ahmed Helmy, Van Jacobson, and Satish Kumar.

   This document is the product of the IETF BGMP Working Group with Dave
   Thaler, Deborah Estrin, and David Meyer as editors.

   Rusty Eddy, Isidor Kouvelas, and Pavlin Radoslavov also provided
   valuable feedback on this document.

2. Purpose

   It has been suggested that inter-domain multicast is better supported
   with a rendezvous mechanism whereby members receive sources' data
   packets without any sort of global broadcast (e.g., DVMRP and PIM-DM
   broadcast initial data packets and MOSPF broadcasts membership
   information). CBT [CBT] and PIM-SM [PIMSM] use a shared group-tree,
   to which all members join and thereby hear from all sources (and to
   which non-members do not join and thereby hear from no sources).

   This document describes BGMP, a protocol for inter-domain multicast
   routing.
   BGMP builds shared trees for active multicast groups, and
   allows domains to build source-specific, inter-domain, distribution
   branches where needed. Building upon concepts from CBT and PIM-SM,
   BGMP requires that each global multicast group be associated with a
   single root. However, in BGMP, the root is an entire exchange or
   domain, rather than a single router.

   For non-source-specific groups, BGMP assumes that ranges of the class
   D space have been associated (e.g., with MASC [MASC]) with selected
   domains. Each such domain then becomes the root of the shared
   domain-trees for all groups in its range. An address allocator will
   generally achieve better distribution trees if it takes its multicast
   addresses from its own domain's part of the space, thereby causing
   the root domain to be local.

   BGMP uses TCP as its transport protocol. This eliminates the need to
   implement message fragmentation, retransmission, acknowledgement, and
   sequencing. BGMP uses TCP port 264 for establishing its connections.
   This port is distinct from BGP's port to provide protocol
   independence, and to facilitate distinguishing between protocol
   packets (e.g., by packet classifiers and diagnostic utilities).

   Two BGMP peers form a TCP connection between one another, and
   exchange messages to open and confirm the connection parameters.
   They then send incremental Join/Prune Updates as group memberships
   change. BGMP does not require periodic refresh of individual
   entries. KeepAlive messages are sent periodically to ensure the
   liveness of the connection. Notification messages are sent in
   response to errors or special conditions. If a connection encounters
   an error condition, a notification message is sent and the connection
   is closed if the error is a fatal one.

3. Terminology

   This document uses the following technical terms:

   Domain:
      A set of one or more contiguous links and zero or more routers
      surrounded by one or more multicast border routers. Note that this
      loose definition of domain also applies to an external link between
      two domains, as well as an exchange.

   Root Domain:
      When constructing a shared tree of domains for some group, one
      domain will be the "root" of the tree. The root domain receives
      data from each sender to the group, and functions as a rendezvous
      domain toward which member domains can send inter-domain joins, and
      to which sender domains can send data.

   Multicast RIB:
      The Routing Information Base, or routing table, used to calculate
      the "next-hop" towards a particular address for multicast traffic.

   Multicast IGP (M-IGP):
      A generic term for any multicast routing protocol used for tree
      construction within a domain. Typical examples of M-IGPs are:
      DVMRP, PIM-DM, PIM-SM, CBT, and MOSPF.

   EGP:
      A generic term for the interdomain unicast routing protocol in use.
      Typically, this will be some version of BGP which can support a
      Multicast RIB, such as MBGP [MBGP], containing both unicast and
      multicast address prefixes.

   Component:
      The portion of a border router associated with (and logically
      inside) a particular domain that runs the multicast IGP (M-IGP) for
      that domain, if any. Each border router thus has zero or more
      components inside routing domains. In addition, each border router
      with external links that do not fall inside any routing domain will
      have an inter-domain component that runs BGMP.

   External peer:
      A border router in another multicast AS (autonomous system, as used
      in BGP), to which a BGMP TCP-connection is open. Assuming MBGP is
      being used, a separate "eBGP" TCP-connection will also be open to
      the same peer.

   Internal peer:
      Another border router of the same multicast AS. A border router
      either speaks iBGP ("internal" BGP) directly to internal peers in a
      full mesh, or indirectly through a route reflector [REFLECT].

   Next-hop peer:
      The next-hop peer towards a given IP address is the next EGP router
      on the path to the given address, according to multicast RIB routes
      in the EGP's routing table (e.g., in MBGP, routes whose Subsequent
      Address Family Identifier field indicates that the route is valid
      for multicast traffic).

   Target:
      Either an EGP peer, or an M-IGP component.

   Tree State Table:
      This is a table of (S-prefix,G-prefix) entries (including
      (*,G-prefix) entries) that have been explicitly joined by a set of
      targets. Each entry has, in addition to the source and group
      addresses and masks, a list of targets that have explicitly
      requested data (on behalf of directly connected hosts or downstream
      routers). (S,G) entries also have an "SPT" bit.

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in
this document are to be interpreted as described in [RFC2119].

4. Protocol Overview

   BGMP maintains group-prefix state in response to messages from BGMP
   peers and notifications from M-IGP components. Group-shared trees
   are rooted at the domain advertising the group prefix covering those
   groups. When a receiver joins a specific group address, the border
   router towards the root domain generates a group-specific Join
   message, which is then forwarded Border-Router-by-Border-Router
   towards the root domain (see Figure 1). BGMP Join and Prune messages
   are sent over TCP connections between BGMP peers, and BGMP protocol
   state is refreshed by KEEPALIVE messages periodically sent over TCP.

   BGMP routers build group-specific bidirectional forwarding state as
   they process the BGMP Join messages.
   Bidirectional forwarding state
   means that packets received from any target are forwarded to all
   other targets in the target list without any RPF checks. No
   group-specific state or traffic exists in parts of the network where
   there are no members of that group.

   BGMP routers build source-specific unidirectional forwarding state,
   only where needed, to be compatible with source-specific trees (SPTs)
   used by some M-IGPs (e.g., DVMRP, PIM-DM, or PIM-SM), or to construct
   trees for source-specific groups. A domain that uses an SPT-based
   M-IGP may need to inject multicast packets from external sources via
   different border routers (to be compatible with the M-IGP RPF checks)
   which thus act as "surrogates". For example, in the Transit_1
   domain, data from Src_A arrives at BR12, but must be injected by
   BR11. A surrogate router may create a source-specific BGMP branch if
   no shared tree state exists. Note: stub domains with a single border
   router, such as Rcvr_Stub_7 in Figure 1, receive all multicast data
   packets through that router, to which all RPF checks point.
   Therefore, stub domains never build source-specific state.

                              Root_Domain
                   [BR91]--------------------------\
                      |                            |
                   [BR32]                       [BR41]
                 Transit_3                    Transit_4
                   [BR31]                  [BR42]   [BR43]
                      |                       |        |
                   [BR22]                  [BR52]   [BR53]
                 Transit_2                    Transit_5
                   [BR21]                       [BR51]
                      |                            |
                   [BR12]                       [BR61]
                 Transit_1[BR11]----------[BR62]Stub_6
                   [BR13]                       (Src_A)
                      |                         (Rcvr_D)
             -------------------
             |                 |
          [BR71]            [BR81]
        Rcvr_Stub_7     Src_only_Stub_8
         (Rcvr_C)           (Src_B)

   Figure 1: Example inter-domain topology. [BRXY] represents a BGMP
   border router. Transit_X is a transit domain network. *_Stub_X is a
   stub domain network.

   Data packets are forwarded based on a combination of BGMP and M-IGP
   rules. The router forwards to a set of targets according to a
   matching (S,G) BGMP tree state entry if it exists.
   If not found, the
   router checks for a matching (*,G) BGMP tree state entry. If neither
   is found, then the packet is sent natively to the next-hop EGP peer
   for G, according to the Multicast RIB (for example, in the case of a
   non-member sender such as Src_B in Figure 1). If a matching entry
   was found, the packet is forwarded to all other targets in the target
   list. In this way BGMP trees forward data in a bidirectional manner.
   If a target is an M-IGP component then forwarding is subject to the
   rules of that M-IGP protocol.

4.1. Design Rationale

   Several other protocols, or protocol proposals, build shared trees
   within domains [CBT, HPIM, PIM-SM]. The design choices made for BGMP
   result from our focus on Inter-Domain multicast in particular. The
   design choices made by CBT and PIM-SM are better suited to the
   wide-area intra-domain case. There are three major differences
   between BGMP and other shared-tree protocols:

   (1) Unidirectional vs. Bidirectional trees

   Bidirectional trees (using bidirectional forwarding state as
   described above) minimize third party dependence, which is essential
   in the inter-domain context. For example, in Figure 1, stub domains
   7 and 8 would like to exchange multicast packets without being
   dependent on the quality of connectivity of the root domain.
   However, unidirectional shared trees (i.e., those using RPF checks)
   have more aggressive loop prevention and share the same processing
   rules as source-specific entries, which are inherently
   unidirectional.

   The lack of third party dependence concerns in the INTRA domain case
   reduces the incentive to employ bidirectional trees. BGMP supports
   bidirectional trees because it has to, and because it can without
   excessive cost.
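
   The bidirectional forwarding rule described above (match (S,G), then
   (*,G), otherwise forward natively towards the root) can be sketched
   as follows. This is a minimal illustration, not part of the protocol
   definition. It assumes exact-match keys rather than longest-prefix
   matching, treats targets as opaque strings, and leaves out the packet
   acceptance checks and the SPT bit. The names TreeEntry, lookup, and
   forward are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Key is (source, group); a source of None stands for the wildcard in (*,G).
# Assumption: exact-match keys stand in for (S-prefix, G-prefix) matching.
Key = Tuple[Optional[str], str]


@dataclass
class TreeEntry:
    # Targets are opaque names here: external peers or M-IGP components.
    targets: Set[str] = field(default_factory=set)


def lookup(table: Dict[Key, TreeEntry], source: str, group: str) -> Optional[TreeEntry]:
    """Return the (S,G) entry if present, otherwise the (*,G) entry, otherwise None."""
    entry = table.get((source, group))
    if entry is None:
        entry = table.get((None, group))
    return entry


def forward(table: Dict[Key, TreeEntry], source: str, group: str,
            arrived_from: str, next_hop_for_group: str) -> List[str]:
    """Return the targets an accepted packet is forwarded to.

    With no matching state, the packet goes to the next-hop EGP peer for the
    group. With matching state, it goes to every target except the one it
    arrived from; no RPF check is applied (bidirectional forwarding).
    """
    entry = lookup(table, source, group)
    if entry is None:
        return [next_hop_for_group]
    return sorted(entry.targets - {arrived_from})
```

   Because the arrival target is simply excluded from the outgoing set,
   the same entry serves traffic flowing towards the root and away from
   it, which is what removes the dependence on the root domain.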

   (2) Source-specific distribution trees/branches

   In a departure from other shared tree protocols, source-specific BGMP
   state is built ONLY where (a) it is needed to pull the multicast
   traffic down to a BGMP router that has source-specific (S,G) state,
   and (b) that router is NOT already on the shared tree (i.e., has no
   (*,G) state), and (c) that router does not want to receive packets
   via encapsulation from a router which is on the shared tree. BGMP
   provides source-specific branches because most M-IGP protocols in use
   today build source-specific trees. BGMP's source-specific branches
   eliminate the unnecessary overhead of encapsulations for high data
   rate sources from the shared tree's ingress router to the surrogate
   injector (e.g., from BR12 to BR11 in Figure 1). Moreover, cases in
   which shared paths are significantly longer than SPT paths will also
   benefit.

   However, except for source-specific group distribution trees, we do
   not build source-specific inter-domain trees in general because (a)
   inter-domain connectivity is generally less rich than intra-domain
   connectivity, so shared distribution trees should have more
   acceptable path length and traffic concentration properties in the
   inter-domain context than in the intra-domain case, and (b) by
   having the shared tree state always take precedence over
   source-specific tree state, we avoid ambiguities that can otherwise
   arise.

   In summary, BGMP trees are, in a sense, a hybrid between CBT and
   PIM-SM trees.

   (3) Method of choosing root of group shared tree

   The choice of a group's shared-tree-root has implications for
   performance and policy. In the intra-domain case it can be assumed
   that all potential shared-tree roots (RPs/Cores) within the domain
   are equally suited to be the root for a group that is initiated
   within that domain.
   In the INTER-domain case, there is far more
   opportunity for unacceptably poor locality, and administrative
   control of a group's shared-tree root. Therefore, in the intra-domain
   case, other protocols treat all candidate roots (RPs or Cores) as
   equivalent and emphasize load sharing and stability to maximize
   performance. In the Inter-Domain case, not all roots are equivalent,
   and we adopt an approach whereby a group's root domain is not random
   but is subject to administrative and performance input.

5. Protocol Details

   In this section, we describe the detailed protocol that border
   routers perform. We assume that each border router conforms to the
   component-based model described in [INTEROP].

5.1. Interaction with the EGP

   The fundamental requirements imposed by BGMP are that:

   (1) For a given source-specific group and source, BGMP must be able
       to look up the next-hop towards the source in the Multicast RIB,
       and

   (2) For a given non-source-specific group, BGMP will map the group
       address to a nominal "root" address, and must be able to look up
       the next-hop towards that address in the Multicast RIB.

BGMP determines the nominal "root" address as follows. In IPv4, the
nominal root address is the group address itself. As a result, a
fundamental requirement for IPv4 imposed by BGMP on the design of an EGP
is that it be able to carry multicast prefixes. For example, a
multi-protocol BGP (MBGP) must be able to carry a multicast prefix in
the Unicast Network Layer Reachability Information (NLRI) field of the
UPDATE message (i.e., either an IPv4 class D prefix or an IPv6 prefix
with high-order octet equal to FF [IPv6AA]).
This capability is required
by BGMP in the implementation of bi-directional trees; BGMP must be able
to forward data and control packets to the next hop towards either a
unicast source S or a multicast group G (see section 5.2). It is also
required that the path attributes defined in [BGP] have the same
semantics whether they accompany unicast or multicast NLRI.

MBGP [MBGP] satisfies the requirement described above. MBGP defines the
optional non-transitive attributes Multiprotocol Reachable NLRI
(MP_REACH_NLRI) and Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI) to
carry sets of reachable or unreachable destinations, and the appropriate
next hop in the case of MP_REACH_NLRI. These attributes contain an
Address Family Identifier (AFI) field [RFC1700] which indicates the type
of NLRI carried in the attribute. In addition, the attribute carries
another field, the Subsequent Address Family Identifier, or SAFI, which
can be used to provide additional information about the type of NLRI.
For example, SAFI value two indicates that the NLRI is valid for
multicast forwarding. BGMP's requirement can be satisfied by allowing
the NLRI field of the MP_REACH_NLRI (or MP_UNREACH_NLRI) to carry a
multicast prefix in the Prefix field of the NLRI encoding.

Finally, while not required for correct BGMP operation, the design of an
EGP should also provide a mechanism that allows discrimination between
NLRI that is to be used for unicast forwarding and NLRI to be used for
multicast forwarding. This property is required to support
multicast-specific policy. As mentioned above, MBGP has this capability.

For IPv6, this requirement can be avoided if unicast prefix-based group
addresses [V6PREFIX] are used. In this case, the nominal root address
is the subnet-router anycast address, using the prefix embedded in the
group address.
Since this is a unicast address, no additional routes
are needed in the EGP for IPv6.

For example, if the IPv6 group address is
ff2e:0100:1234:5678:9abc:def0::123, then the unicast prefix is
1234:5678:9abc:def0::/64, and the nominal root address would be
1234:5678:9abc:def0::.

To handle non-prefix-based group addresses, multicast routes would still
be needed. However, supporting such addresses may not be a requirement.

5.2. Multicast Data Packet Processing

   For BGMP rules to be applied, an incoming packet must first be
   "accepted":

   o  If the packet arrived on an interface owned by an M-IGP, the M-IGP
      component determines whether the packet should be accepted or
      dropped according to its rules. If the packet is accepted, the
      packet is forwarded (or not forwarded) out any other interfaces
      owned by the same component, as specified by the M-IGP.

   o  If the packet was received over a point-to-point interface owned
      by BGMP, the packet is accepted.

   o  If the packet arrived on a multiaccess network interface owned by
      BGMP, the packet is accepted if the router is receiving data on a
      source-specific branch, or if it is the designated forwarder for
      the longest matching route for S or for the longest matching
      route for the nominal root of G.

   If the packet is accepted, then the router checks the tree state
   table for a matching (S,G) entry. If one is found, but the packet
   was not received from the next hop target towards S (if the entry's
   SPT bit is True), or was not received from the next hop target
   towards G (if the entry's SPT bit is False) then the packet is
   dropped and no further actions are taken. If no (S,G) entry was
   found, the router then checks for a matching (*,G) entry.

   If neither is found, then the packet is forwarded towards the
   next-hop peer for the nominal root of G, according to the Multicast
   RIB.
   If a matching entry was found, the packet is forwarded to all other
   targets in the target list.

   Forwarding to a target which is an M-IGP component means that the
   packet is forwarded out any interfaces owned by that component
   according to that component's multicast forwarding rules.

5.3. BGMP processing of Join and Prune messages and notifications

5.3.1. Receiving Joins

   When the BGMP component receives a (*,G) or (S,G) Join alert from
   another component, or a BGMP (S,G) or (*,G) Join message from an
   external peer, it searches the tree state table for a matching entry.
   If an entry is found, and that peer is already listed in the target
   list, then no further actions are taken.

   Otherwise, if no (*,G) or (S,G) entry was found, one is created. In
   the case of a (*,G), the target list is initialized to contain the
   next-hop peer towards the nominal root of G, if it is an external
   peer. If the peer is internal, the target list is initialized to
   contain the M-IGP component owning the next-hop interface. If there
   is no next-hop peer (because the nominal root of G is inside the
   domain), then the target list is initialized to contain the next-hop
   component. If an (S,G) entry exists for the same G for which the
   (*,G) Join is being processed, and the next-hop peers toward S and
   the nominal root of G are different, the BGMP router must first send
   an (S,G) Prune message toward the source and clear the SPT bit on the
   (S,G) entry, before activating the (*,G) entry.

   When creating (S,G) state, if the source is internal to the BGMP
   speaker's domain, a "Poison-Reverse" bit (PR-bit) is set. This bit
   indicates that the router may receive packets matching (S,G) anyway
   due to the BGMP speaker being a member of a domain on the path
   between S and the root domain.
   (Depending on the M-IGP protocol, it
   may in fact receive such packets anyway only if it is the best exit
   for the nominal root of G.)

   The target from which the Join was received is then added to the
   target list. The router then looks up S or the nominal root of G in
   the Multicast RIB to find the next-hop EGP peer. If the target list,
   not including the next-hop target towards G for a (*,G) entry,
   becomes non-null as a result, the next-hop EGP peer must be notified
   as follows:

   a) If the next-hop peer towards the nominal root of G (for a (*,G)
      entry) is an external peer, a BGMP (*,G) Join message is unicast
      to the external peer. If the next-hop peer towards S (for an
      (S,G) entry) is an external peer, and the router does NOT have any
      active (*,G) state for that group address G, a BGMP (S,G) Join
      message is unicast to the external peer. A BGMP (S,G) Join
      message is never sent to an external peer by a router that also
      contains active (*,G) state for the same group. If the next-hop
      peer towards S (for an (S,G) entry) is an external peer and the
      router DOES have active (*,G) state for that group G, the SPT bit
      is always set to False.

   b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join
      alert is sent to the M-IGP component owning the next-hop
      interface.

   c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent
      to the M-IGP component owning the next-hop interface.

   Finally, if an (S,G) Join is received from an internal peer, the
   peer should be stored with the M-IGP component target. If (S,G)
   state exists with the PR-bit set, and the next-hop towards the
   nominal root for G is through the M-IGP component, an (S,G)
   Poison-Reverse message is immediately sent to the internal peer.
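
   The core of the Join handling described in this section (entry
   lookup or creation, target-list update, and upstream notification
   per cases a) to c)) can be sketched as follows. This is a simplified
   illustration under stated assumptions: keys are exact (S,G) or (*,G)
   pairs, targets are opaque strings, and the SPT bit, the PR-bit, and
   the (S,G) Prune sent before activating a (*,G) entry are all left
   out. The names NextHop, Entry, receive_join, and the action labels
   are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[Optional[str], str]   # (source, group); source None means (*,G)
Action = Tuple[str, str]          # (kind, target), kind is a hypothetical label


@dataclass
class NextHop:
    target: str      # external peer name, or the M-IGP component owning the next-hop interface
    external: bool   # True only when the next-hop peer is an external BGMP peer


@dataclass
class Entry:
    upstream: str                                   # next-hop target for this entry
    targets: Set[str] = field(default_factory=set)  # the entry's target list


def receive_join(table: Dict[Key, Entry], key: Key, from_target: str,
                 next_hop: NextHop) -> List[Action]:
    """Process one Join (alert or message) and return the upstream notifications."""
    source, group = key
    entry = table.get(key)
    if entry is not None and from_target in entry.targets:
        return []                                   # already listed: nothing to do
    if entry is None:
        entry = Entry(upstream=next_hop.target)
        if source is None:
            # A new (*,G) entry starts with the next-hop target in its list.
            entry.targets.add(next_hop.target)
        table[key] = entry
    had_downstream = bool(entry.targets - {entry.upstream})
    entry.targets.add(from_target)
    has_downstream = bool(entry.targets - {entry.upstream})
    if had_downstream or not has_downstream:
        return []                                   # list did not just become non-null
    if not next_hop.external:
        # Cases b) and c): internal peer or no next-hop peer.
        return [("join_alert", next_hop.target)]
    if source is not None and (None, group) in table:
        # Case a): never send an (S,G) Join while (*,G) state exists.
        return []
    return [("join_message", next_hop.target)]
```

   Returning the notifications as data rather than sending them keeps
   the sketch independent of any transport; a real speaker would write
   the Join message to the peer's TCP connection instead.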

   If an (S,G) Join is received from an external peer, and (S,G)
   state exists with the PR-bit set, and the local BGMP speaker is
   the best exit for the nominal root of G, and the next-hop towards
   the nominal root for G is through the interface towards the
   external peer, an (S,G) Poison-Reverse message is immediately sent
   to the external peer.

5.3.2. Receiving Prune Notifications

   When the BGMP component receives a (*,G) or (S,G) Prune alert from
   another component, or a BGMP (*,G) or (S,G) Prune message from an
   external peer, it searches the tree state table for a matching entry.
   If no (S,G) entry was found for an (S,G) Prune, but (*,G) state
   exists, an (S,G) entry is created, with the target list copied from
   the (*,G) entry. If no matching entry exists, or if the component or
   peer is not listed in the target list, no further actions are taken.

   Otherwise, the component or peer is removed from the target list. If
   the target list becomes null as a result, the next-hop peer towards
   the nominal root of G (for a (*,G) entry), or towards S (for an (S,G)
   entry if and only if the BGMP router does NOT have any corresponding
   (*,G) entry), must be notified as follows.

   a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune
      message is unicast to it.

   b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune
      alert is sent to the M-IGP component owning the next-hop
      interface.

   c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent
      to the M-IGP component owning the next-hop interface.

5.3.3. Receiving Route Change Notifications

   When a border router receives a route for a new prefix in the
   multicast RIB, or an existing route for a prefix is withdrawn, a
   route change notification for that prefix must be sent to the BGMP
   component.
In addition, when the next hop peer (according to the 551 multicast RIB) changes, a route change notification for that prefix 552 must be sent to the BGMP component. 554 In addition, in IPv4 (only), an internal route for each class-D 555 prefix associated with the domain (if any) MUST be injected into the 556 multicast RIB in the EGP by the domain's border routers. 558 When a route for a new group prefix is learned, or an existing route 559 for a group prefix is withdrawn, or the next-hop peer for a group 560 prefix changes, a BGMP router updates all affected (*,G) target 561 lists. The router sends a (*,G) Join to the new next-hop target, and 562 a (*,G) Prune to the old next-hop target, as appropriate. In 563 addition, if any (S,G) state exists with the PR-bit set: 565 o If the BGMP speaker has just become the best exit for the nominal 566 root of G, an (S,G) Poison Reverse message with the PR-bit set is 567 sent as noted below. 569 o If the BGMP speaker was the best exit for the nominal root of G 570 and is no longer, an (S,G) Poison Reverse message with the PR-bit 571 clear is sent as noted below. 572 The (S,G) Poison-Reverse messages are sent to all external peers on 573 the next-hop interface towards the nominal root of G from which (S,G) 574 Joins have been received. 576 When an existing route for a source prefix is withdrawn, or the next- 577 hop peer for a source prefix changes, a BGMP router updates all 578 affected (S,G) target lists. The router sends a (S,G) Join to the 579 new next-hop target, and a (S,G) Prune to the old next-hop target, as 580 appropriate. 582 5.3.4. Receiving (S,G) Poison-Reverse messages 584 When a BGMP speaker receives an (S,G) Poison-Reverse message from a 585 peer, it sets the PR-bit on the (S,G) state to match the PR-bit in 586 the message, and looks up the next-hop towards the nominal root of G. 
587 If the next-hop target is an M-IGP component, it forwards the (S,G) 588 Poison Reverse message to all internal peers of that component from 590 Draft BGMP November 2000 592 which it has received (S,G) Joins. If the next-hop target is an 593 external peer on a given interface, it forwards the (S,G) Poison 594 Reverse message to all external peers on that interface. 596 When a BGMP speaker receives an (S,G) Poison-Reverse message from an 597 external peer, with the PR-bit set, and the speaker has received no 598 (S,G) Joins from any other peers (e.g., only from the M-IGP, or has 599 (S,G) state due to encapsulation as described in 5.4.1), it knows 600 that its own (S,G) Join is unnecessary, and should send an (S,G) 601 Prune. 603 When a BGMP speaker receives an (S,G) Poison-Reverse message from an 604 internal peer, with the PR-bit set, and the speaker is the best exit 605 for the nominal root of G, and has (S,G) prune state, an (S,G) Join 606 message is sent to cancel the prune state and the state is deleted. 608 5.4. Interaction with M-IGP components 610 When an M-IGP component on a border router first learns that there 611 are internally-reached members for a group G (whose scope is larger 612 than that domain), a (*,G) Join alert is sent to the BGMP component. 613 Similarly, when an M-IGP component on a border router learns that 614 there are no longer internally-reached members for a group G (whose 615 scope is larger than a single domain), a (*,G) Prune alert is sent to 616 the BGMP component. 618 At any time, any M-IGP domain MAY decide to join a source-specific 619 branch for some external source S and group G. When the M-IGP 620 component in the border router that is the next-hop router for a 621 particular source S learns that a receiver wishes to receive data 622 from S on a source-specific path, an (S,G) Join alert is sent to the 623 BGMP component. 
When it is learned that such receivers no longer 624 exist, an (S,G) Prune alert is sent to the BGMP component. Recall 625 that the BGMP component will generate external source-specific Joins 626 only where the source-specific branch does not coincide with the 627 shared tree distribution tree for that group. 629 Finally, we will require that the border router that is the next-hop 630 internal peer for a particular address S or the nominal root of G be 631 able to forward data for a matching tree state table entry to all 632 members within the domain. This requirement has implications on 633 specific M-IGPs as follows. 635 Draft BGMP November 2000 637 5.4.1. Interaction with DVMRP and PIM-DM 639 DVMRP and PIM-DM are both "broadcast and prune" protocols in which 640 every data packet must pass an RPF check against the packet's source 641 address, or be dropped. If the border router receiving packets from 642 an external source is the only BR to inject the route for the source 643 into the domain, then there are no problems. For example, this will 644 always be true for stub domains with a single border router (see 645 Figure 1). Otherwise, the border router receiving packets externally 646 is responsible for encapsulating the data to any other border routers 647 that must inject the data into the domain for RPF checks to succeed. 649 When an intended border router injector for a source receives 650 encapsulated packets from another border router in its domain, it 651 should create source-specific (S,G) BGMP state. Note that the border 652 router may be configured to do this on a data-rate triggered basis so 653 that the state is not created for very low data-rate/intermittent 654 sources. If source-specific state is created, then its incoming 655 interface points to the virtual encapsulation interface from the 656 border router that forwarded the packet, and it has an SPT flag that 657 is initialized to be False. 
659 When the (S,G) BGMP state is created, the BGMP component will in turn 660 send a BGMP (S,G) Join message to the next-hop external peer towards 661 S if there is no (*,G) state for that same group, G. The (S,G) BGMP 662 state will have the SPT bit set to False if (*,G) BGMP state is 663 present. 665 When the first data packet from S arrives from the external peer and 666 matches on the BGMP (S,G) state, and IF there is no (*,G) state, the 667 router sets the SPT flag to True, resets the incoming interface to 668 point to the external peer, and sends a BGMP (S,G) Prune message to 669 the border router that was encapsulating the packets (e.g., in Figure 670 1, BR11 sends the (Src_A,G) Prune to BR12). When the border router 671 with (*,G) state receives the prune for (S,G), it then deletes that 672 border router from its list of targets. 674 If the decapsulator receives an (S,G) Poison-Reverse message with the 675 PR-bit set, it will forward it to the encapsulator (which may again 676 forward it up the shared tree according to normal BGMP rules), and 677 both will delete their BGMP (S,G) state. 679 PIM-DM and DVMRP present an additional problem: no protocol 680 mechanism exists for joining and pruning entire groups; only joins 681 and prunes for individual sources are available. We therefore 685 require that some form of Domain-Wide Reports (DWRs) [DWR] be 686 available within such domains. Such messages provide the ability to 687 join and prune an entire group across the domain. One simple 688 heuristic to approximate DWRs is to assume that if there are any 689 internally-reached members, then at least one of them is a sender. 690 With this heuristic, the presence of any M-IGP (S,G) state for 691 internally-reached sources can be used instead. Sending a data 692 packet to a group is then equivalent to sending a DWR for the group. 694 5.4.2.
Interaction with PIM-SM 696 Protocols such as PIM-SM build unidirectional shared and source- 697 specific trees. As with DVMRP and PIM-DM, every data packet must 698 pass an RPF check against some group-specific or source-specific 699 address. 701 The fewest encapsulations/decapsulations will be done when the intra- 702 domain tree is rooted at the next-hop internal peer (which becomes 703 the RP) towards the nominal root of G, since in general that router 704 will receive the most packets from external sources. To achieve 705 this, each BGMP border router to a PIM-SM domain should send 706 Candidate-RP-Advertisements within the domain for those groups for 707 which it is the shared-domain tree ingress router. When the border 708 router that is the RP for a group G receives an external data packet, 709 it forwards the packet according to the M-IGP (i.e., PIM-SM) shared- 710 tree outgoing interface list. 712 Other border routers will receive data packets from external sources 713 that are farther down the bidirectional tree of domains. When a 714 border router that is not the RP receives an external packet for 715 which it does not have a source-specific entry, the border router 716 treats it like a local source by creating (S,G) state with a Register 717 flag set, based on normal PIM-SM rules; the Border router then 718 encapsulates the data packets in PIM-SM Registers and unicasts them 719 to the RP for the group. As explained above, the RP for the inter- 720 domain group will be one of the other border routers of the domain. 722 If a source's data rate is high enough, DRs within the PIM-SM domain 723 may switch to the shortest path tree. If the shortest path to an 724 external source is via the group's ingress router for the shared 725 tree, the new (S,G) state in the BGMP border router will not cause 726 BGMP (S,G) Joins because that border router will already have (*,G) 727 state. 
If, however, the shortest path to an external source is via 731 some other border router, that border router will create (S,G) BGMP 732 state in response to the M-IGP (S,G) Join alert. In this case, 733 because there is no local (*,G) state to suppress it, the border 734 router will send a BGMP (S,G) Join to the next-hop external peer 735 towards S, in order to pull the data down directly. (See BR11 in 736 Figure 1.) As in normal PIM-SM operation, those PIM-SM routers that 737 have (*,G) and (S,G) state pointing to different incoming interfaces 738 will prune that source off the shared tree. Therefore, all internal 739 interfaces may eventually be pruned off the internal shared tree. 741 After the border router sends a BGMP (S,G) Join, if its (S,G) state 742 has the PR-bit clear, an (S,G) Poison-Reverse message (with the PR-bit 743 clear) is sent to the ingress router for G. The ingress router then 744 creates (S,G) state if it does not already exist, and removes the next hop 745 towards the nominal root of G from the target list. 747 If the border router later receives an (S,G) Poison-Reverse message 748 with the PR-bit set, the Poison-Reverse message is forwarded to the 749 ingress router for G. The best-exit router then creates (S,G) state 750 if it does not already exist, and puts the next hop towards the 751 nominal root of G in the target list if not already present. 753 5.4.3. Interaction with CBT 755 CBT builds bidirectional shared trees but must address two points of 756 compatibility with BGMP. First, CBT cannot accommodate more than 757 one border router injecting a packet. Therefore, if a CBT domain 758 does have multiple external connections, the M-IGP components of the 759 border routers are responsible for ensuring that only one of them 760 will inject data from any given source. 762 Second, CBT cannot process source-specific Joins or Prunes.
Two 763 options thus exist for each CBT domain: 765 Option A: 766 The CBT component interprets an (S,G) Join alert as if it were a 767 (*,G) Join alert, as described in [INTEROP]. That is, if it is not 768 already on the core-tree for G, then it sends a CBT (*,G) JOIN- 769 REQUEST message towards the core for G. Similarly, when the CBT 770 component receives an (S,G) Prune alert, and the child interface 771 list for a group is NULL, then it sends a (*,G) QUIT_NOTIFICATION 772 towards the core for G. This option has the disadvantage of 773 pulling all data for the group G down to the CBT domain when no 774 members exist. 778 Option B: 779 The CBT domain does not propagate any source routes (i.e., non- 780 class D routes) to its external peers for the Multicast RIB 781 unless it is known that no other path exists to that prefix (e.g., 782 routes for prefixes internal to the domain or in a singly-homed 783 customer's domain may be propagated). This ensures that source- 784 specific joins are never received unless the source's data already 785 passes through the domain on the shared tree, in which case the 786 (S,G) Join need not be propagated anyway. BGMP border routers will 787 only send source-specific Joins or Prunes to an external peer if 788 that external peer advertises source-prefixes in the EGP. If a 789 BGMP-CBT border router does receive an (S,G) Join or Prune, that 790 border router should ignore the message. 792 To minimize en/de-capsulations, CBTv2 BRs may follow the same 793 scheme as described under PIM-SM above, in which each BR sends 794 Candidate-Core advertisements for those groups for which it is the 795 shared-tree ingress router. 797 5.4.4. Interaction with MOSPF 799 As with CBTv2, MOSPF cannot process source-specific Joins or Prunes, 800 and the same two options are available.
Therefore, an MOSPF domain 801 may either: 803 Option A: 804 send a Group-Membership-LSA for all of G in response to a (S,G) 805 Join alert, and "prematurely age" it out (when no other downstream 806 members exist) in response to an (S,G) Prune alert, OR 808 Option B: 809 not propagate any source routes (i.e., non-class D routes) to their 810 external peers for the Multicast RIB unless it is known that no 811 other path exists to that prefix (e.g., routes for prefixes 812 internal to the domain or in a singly-homed customer's domain may 813 be propagated) 815 5.5. Operation over Multi-access Networks 817 Multiaccess links require special handling to prevent duplicates. 818 The following mechanism enables BGMP to operate over multiaccess 819 links which do not run an M-IGP. This avoids broadcast-and-prune 820 behavior and does not require (S,G) state. 822 Draft BGMP November 2000 824 To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF 825 message to exchange "forwarder preference" values for each prefix. 826 The peer with the highest forwarder preference becomes the designated 827 forwarder, with ties broken by lowest BGMP Identifier. The 828 designated forwarder is the router responsible for forwarding packets 829 up the tree, and is the peer to which joins will be sent. 831 When BGMP first learns that a route exists in the multicast RIB whose 832 next-hop interface is NOT the multiaccess link, the BGMP router sends 833 a BGMP FWDR_PREF message for the prefix, to all BGMP peers on the 834 LAN. The FWDR_PREF message contains a "forwarder preference value" 835 for the local router, and the same value MUST be sent to all peers on 836 the LAN. Likewise, when the prefix is no longer reachable, a 837 FWDR_PREF of 0 is sent to all peers on the LAN. 839 Whenever a BGMP router calculates the next-hop peer towards a 840 particular address, and that peer is reached over a BGMP-owned 841 multiaccess LAN, the designated forwarder is used instead. 
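The election rule described above (highest forwarder preference wins, ties broken by lowest BGMP Identifier) can be illustrated with a short Python sketch. The function name and the dictionary input are hypothetical. Treating a preference of 0 as "not a candidate" is an assumption drawn from the statement that a FWDR_PREF of 0 is sent when the prefix is no longer reachable. Identifiers are taken to be IPv4 addresses compared as unsigned integers.

```python
import ipaddress


def elect_designated_forwarder(prefs):
    """Pick the designated forwarder for one prefix on a multiaccess LAN.

    prefs maps a BGMP Identifier (IPv4 dotted-quad string) to the
    forwarder preference that peer advertised in FWDR_PREF.
    Returns the winning identifier, or None if no peer is eligible.
    """
    # Assumption: preference 0 means the peer withdrew the prefix.
    candidates = {ident: pref for ident, pref in prefs.items() if pref > 0}
    if not candidates:
        return None
    # Highest preference first; lowest numeric identifier breaks ties.
    return min(
        candidates,
        key=lambda ident: (-candidates[ident],
                           int(ipaddress.ip_address(ident))),
    )
```

Because every peer on the LAN receives the same preference values, each router running this comparison independently arrives at the same forwarder.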
843 When a BGMP router receives a FWDR_PREF message from a peer, it looks 844 up the matching route in its multicast RIB, and calculates the new 845 designated forwarder. If the router has tree state entries whose 846 parent target was the old forwarder, it sends Joins to the new 847 forwarder and Prunes to the old forwarder. 849 When a BGMP router which is NOT the designated forwarder receives a 850 packet on the multiaccess link, it is silently dropped. 852 Finally, this mechanism prevents duplicates where full peering exists 853 on a "logical" link. Where full peering does not exist, steps must 854 be taken (outside of BGMP) to present separate logical interfaces to 855 BGMP, each of which is a link with full peering. This might entail, 856 for example, using different link-layer address mappings, doing 857 encapsulation, or changing the physical media. 859 5.6. Interaction between (S,G) state and G-routes 861 As discussed earlier, routers with (*,G) state will not propagate (S,G) 862 joins. However, a special case occurs when (S,G) state coincides with 863 the G-route (or route towards the nominal root of G). When this occurs, 864 care must be taken so that the data will reach the root domain without 865 causing duplicates or black holes. For this reason, (S,G) state on the 866 path between the source and the root domain is annotated as being 867 "poison-reversed". A PR-bit is kept for this purpose, which is updated 868 Draft BGMP November 2000 870 by (UN)POISON_REVERSE messages. 872 6. Interaction with address allocation 874 6.1. Requirements for BGMP components 876 For IPv4 (only), each border router must be able to determine (e.g., 877 from MASC [MASC]) which class-D prefixes (if any) belong to each 878 domain in which an M-IGP component resides, so that it can inject 879 routes for them into the routing table. 881 7. Transition Strategy 883 There have been significant barriers to multicast deployment in 884 Internet backbones. 
While many of the problems with the current 885 DVMRP backbone (MBONE) have been documented in [ISSUES], most of 886 these problems require longer term engineering solutions. However, 887 there is much that can be done with existing technologies to enable 888 deployment and put in place an architecture that will enable a smooth 889 transition to the next generation of inter-domain multicast routing 890 protocols (i.e., BGMP). This section proposes a near-term transition 891 strategy and architecture that is designed to be simple, risk- 892 neutral, and provide a smooth, incremental transition path to BGMP. 893 In addition, the transition architecture provides for improved 894 convergence properties, some initial policy control, and the 895 opportunity for providers to run either native or tunneled multicast 896 backbones and exchanges. 898 The transition strategy proposed here is to initially use MBGP [MBGP] 899 to provide the desired convergence and policy control properties, and 900 PIM-DM for multicast data forwarding. Once this architecture is in 901 place, backbones and exchanges can incrementally transition to BGMP 902 and domains running other M-IGPs may be incorporated more fully. 904 Since the current MBone uses a broadcast-and-prune backbone running 905 DVMRP, BGMP may view the entire MBone as a single multi-homed stub 906 domain (with a new AS number). The members-are-senders heuristic can 907 then be used initially to provide membership notifications within 908 this stub domain. 910 A BGMP backbone can then be formed by designating one or more neutral 912 Draft BGMP November 2000 914 PIM-DM domains (say, exchanges) as initial BGMP backbones. Each 915 exchange is then associated with a group prefix which is injected 916 into the Multicast RIB by all MBGP/BGMP border routers on that 917 exchange. 
919 Any domain which meets the following constraints may then transition 920 from a normal MBone-connected domain to one running BGMP: 922 (1) Must peer with another BGMP domain and participate in M-BGP to 923 propagate routes in the Multicast RIB. 925 (2) Must establish an internal (to the MBone AS) EGP (e.g., iBGP) 926 peer relationship with other border routers of the MBone "stub" 927 domain, as is done with unicast routing. We expect this to 928 eventually involve the use of one or more route reflectors 929 [REFLECT] inside the MBone domain. 931 (3) If the transition will partition the MBone "stub" domain, then it 932 must be insured that the MBone domain will be administratively 933 split into multiple domains, each with a different multicast AS 934 number. 936 Draft BGMP November 2000 938 7.1. Preventing transit through the MBone stub 940 We desire that two AS's which are mutually reachable through BGMP use 941 paths which do not pass through the MBone stub domain. This is 942 illustrated in Figure 2, where the MBone stub is AS 5, which is 943 multi-homed to both AS 3 and AS 4. Paths between sources and 944 destinations which have already transitioned to MBGP/BGMP should not 945 use AS 5 as transit unless no other path exists. 947 ----------------------\ /---------------------------- 948 | | 949 DVMRP /----\ | | /----\ IGP/iBGP 950 ..............| BR |+++++++++| BR |----------- 951 \----/ | E | \----/ 952 + | B | + AS 3 953 MBone + | G | + 954 + | P \-----+---------------------- 955 AS 5 iBGP + | + eBGP 956 + | /-----+---------------------- 957 + | | + 958 + | | + 959 DVMRP /----\ | | /----\ IGP/iBGP 960 ..............| BR |+++++++++| BR |----------- 961 \----/ | | \----/ 962 | | AS 4 963 | | 964 ----------------------/ \---------------------------- 966 Figure 2: Preventing Transit through MBone Stub 968 This requirement is easily solved using standard BGP policy 969 mechanisms. 
The MBone border routers should prefer EGP routes to 970 DVMRP routes, since DVMRP cannot tag routes as being external. Thus, 971 external routes may appear in the DVMRP routing table, but will not 972 be imported into the EGP since they will be overridden by iBGP 973 routes. 975 Other EGP routers should prefer routes whose ASpath does not contain 976 the well-known MBone AS number. This will insure that the route 977 through the MBone stub is not used unless no other path exists. For 978 safety, routes whose ASpath begins with the MBone AS should receive 979 the worst preference. 981 Draft BGMP November 2000 983 8. Message Formats 985 This section describes message formats used by BGMP. 987 Messages are sent over a reliable transport protocol connection. A 988 message is processed only after it is entirely received. The maximum 989 message size is 4096 octets. All implementations are required to 990 support this maximum message size. 992 All fields labelled "Reserved" below must be transmitted as 0, and 993 ignored upon receipt. 995 8.1. Message Header Format 997 Each message has a fixed-size (4-byte) header. There may or may not 998 be a data portion following the header, depending on the message 999 type. The layout of these fields is shown below: 1001 0 1 2 3 1002 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1003 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1004 | Length | Type | Reserved | 1005 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1007 Length: 1008 This 2-octet unsigned integer indicates the total length of the 1009 message, including the header, in octets. Thus, e.g., it allows 1010 one to locate in the transport-level stream the start of the next 1011 message. The value of the Length field must always be at least 4 1012 and no greater than 4096, and may be further constrained, depending 1013 on the message type. 
No "padding" of extra data after the message 1014 is allowed, so the Length field must have the smallest value 1015 required given the rest of the message. 1017 Type: 1018 This 1-octet unsigned integer indicates the type code of the 1019 message. The following type codes are defined: 1021 1 - OPEN 1022 2 - UPDATE 1023 3 - NOTIFICATION 1024 4 - KEEPALIVE 1026 Draft BGMP November 2000 1028 8.2. OPEN Message Format 1030 After a transport protocol connection is established, the first 1031 message sent by each side is an OPEN message. If the OPEN message is 1032 acceptable, a KEEPALIVE message confirming the OPEN is sent back. 1033 Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION 1034 messages may be exchanged. 1036 In addition to the fixed-size BGMP header, the OPEN message contains 1037 the following fields: 1039 0 1 2 3 1040 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1041 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1042 | Version | Rsvd| AddrFam | Hold Time | 1043 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1044 | BGMP Identifier (variable length) | 1045 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1046 | | 1047 + (Optional Parameters) | 1048 | | 1049 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1051 Version: 1052 This 1-octet unsigned integer indicates the protocol version number 1053 of the message. The current BGMP version number is 1. 1055 AddrFam: 1056 The IANA-assigned address family number of the BGMP Identifier. 1057 These include (among others): 1059 Number Description 1060 ------ ----------- 1061 1 IP (IP version 4) 1062 2 IPv6 (IP version 6) 1064 Hold Time: 1065 This 2-octet unsigned integer indicates the number of seconds that 1066 the sender proposes for the value of the Hold Timer. 
Upon receipt 1067 of an OPEN message, a BGMP speaker MUST calculate the value of the 1068 Hold Timer by using the smaller of its configured Hold Time and the 1069 Hold Time received in the OPEN message. The Hold Time MUST be 1073 either zero or at least three seconds. An implementation may 1074 reject connections on the basis of the Hold Time. The calculated 1075 value indicates the maximum number of seconds that may elapse 1076 between the receipt of successive KEEPALIVE and/or UPDATE messages 1077 by the sender. 1079 BGMP Identifier: 1080 This 4-octet (for IPv4) or 16-octet (IPv6) unsigned integer 1081 indicates the BGMP Identifier of the sender. A given BGMP speaker 1082 sets the value of its BGMP Identifier to a globally-unique value 1083 assigned to that BGMP speaker (e.g., an IPv4 address). The value 1084 of the BGMP Identifier is determined on startup and is the same for 1085 every BGMP session opened. 1087 Optional Parameters: 1088 This field may contain a list of optional parameters, where each 1089 parameter is encoded as a <Parameter Type, Parameter Length, 1090 Parameter Value> triplet. The combined length of all optional 1091 parameters can be derived from the Length field in the message 1092 header. 1094 0 1 1095 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1096 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... 1097 | Parm. Type | Parm. Length | Parameter Value (variable) 1098 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... 1100 Parameter Type is a one octet field that unambiguously identifies 1101 individual parameters. Parameter Length is a one octet field that 1102 contains the length of the Parameter Value field in octets. 1103 Parameter Value is a variable length field that is interpreted 1104 according to the value of the Parameter Type field. 1106 This document defines the following Optional Parameters: 1108 a) Authentication Information (Parameter Type 1): 1109 This optional parameter may be used to authenticate a BGMP peer.
1110 The Parameter Value field contains a 1-octet Authentication Code 1111 followed by a variable length Authentication Data. 1113 0 1 2 3 4 5 6 7 8 1117 +-+-+-+-+-+-+-+-+ 1118 | Auth. Code | 1119 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1120 | | 1121 | Authentication Data | 1122 | | 1123 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1125 Authentication Code: 1127 This 1-octet unsigned integer indicates the authentication 1128 mechanism being used. Whenever an authentication mechanism is 1129 specified for use within BGMP, two things must be included in 1130 the specification: 1132 - the value of the Authentication Code which indicates use of the 1133 mechanism, and - the form and meaning of the Authentication Data. 1135 Note that a separate authentication mechanism may be used in 1136 establishing the transport level connection. 1138 Authentication Data: 1140 This is a variable-length field whose form and meaning 1141 depend on the Authentication Code. 1143 The minimum length of the OPEN message is 12 octets (including 1144 message header). 1146 b) Capability Information (Parameter Type 2): 1147 This is an Optional Parameter that is used by a BGMP-speaker to 1148 convey to its peer the list of capabilities supported by the 1149 speaker. The parameter contains one or more triples <Capability Code, 1150 Capability Length, Capability Value>, where each triple is 1151 encoded as shown below: 1152 +------------------------------+ 1153 | Capability Code (1 octet) | 1154 +------------------------------+ 1155 | Capability Length (1 octet) | 1156 +------------------------------+ 1157 | Capability Value (variable) | 1158 +------------------------------+ 1159 Capability Code: 1163 Capability Code is a one octet field that unambiguously identifies 1164 individual capabilities. 1166 Capability Length: 1168 Capability Length is a one octet field that contains the length of 1169 the Capability Value field in octets.
1171 Capability Value: 1173 Capability Value is a variable length field that is interpreted 1174 according to the value of the Capability Code field. 1176 A particular capability, as identified by its Capability Code, may 1177 occur more than once within the Optional Parameter. 1179 This document reserves Capability Codes 128-255 for vendor-specific 1180 applications. 1182 This document reserves value 0. 1184 Capability Codes (other than those reserved for vendor specific use) 1185 are assigned only by the IETF consensus process and IESG approval. 1187 8.3. UPDATE Message Format 1189 UPDATE messages are used to transfer Join/Prune/FwdrPref information 1190 between BGMP peers. The UPDATE message always includes the fixed- 1191 size BGMP header, and one or more attributes as described below. 1193 The message format below allows compact encoding of (*,G) Joins and 1194 Prunes, while allowing the flexibility needed to do other updates 1195 such as (S,G) Joins and Prunes towards sources as well as on the 1196 shared tree. In the discussion below, an Encoded-Address-Prefix is 1197 of the form: 1198 0 1 2 3 1199 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1200 +-+-+-+-+-+-+-+-+ 1201 |EnTyp| AddrFam | 1202 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1203 | Address (variable length) | 1204 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1205 | Mask (variable length) | 1207 Draft BGMP November 2000 1209 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1211 EnTyp: 1212 0 - All 1's Mask. The Mask field is 0 bytes long. 1213 1 - Mask length included. The Mask field is 4 bytes long, and 1214 contains the mask length, in bits. 1215 2 - Full Mask included. The Mask field is the same length 1216 as the Address field, and contains the full bitmask. 1218 AddrFam: 1219 The IANA-assigned address family number of the encoded prefix. 1221 Address: 1222 The address associated with the given prefix to be encoded. 
The 1223 length is determined based on the Address Family. 1225 Mask: 1226 The mask associated with the given prefix. The format (or absence) 1227 of this field is determined by the EnTyp field. 1229 Each attribute is of the form: 1231 0 1 2 3 1232 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1233 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1234 | Length | Type | Data ... 1235 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1236 All attributes are 4-byte aligned. 1238 Length: 1239 The Length is the length of the entire attribute, including the 1240 length, type, and data fields. If other attributes are nested 1241 within the data field, the length includes the size of all such 1242 nested attributes. 1244 Type: 1246 Types 128-255 are reserved for "optional" attributes. If a 1247 required attribute is unrecognized, a NOTIFICATION will be sent and 1248 the connection will be closed if the error is a fatal one. 1249 Unrecognized optional attributes are simply ignored. 1251 0 - JOIN 1253 Draft BGMP November 2000 1255 1 - PRUNE 1256 2 - GROUP 1257 3 - SOURCE 1258 4 - FWDR_PREF 1259 5 - POISON_REVERSE 1261 a) JOIN (Type Code 0) 1263 The JOIN attribute indicates that all GROUP or SOURCE options 1264 nested immediately within the JOIN option should be joined. 1266 0 1 2 3 1267 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1268 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1269 | Length | Type=0 | Reserved | 1270 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1271 | Nested Attributes ... 1272 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1273 No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested 1274 within a JOIN attribute. 1276 b) PRUNE (Type Code 1) 1278 The PRUNE attribute indicates that all GROUP or SOURCE attributes 1279 nested immediately within the PRUNE attribute should be pruned. 
1281 0 1 2 3 1282 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1283 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1284 | Length | Type=1 | Reserved | 1285 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1286 | Nested Attributes ... 1287 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1288 No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested 1289 within a PRUNE attribute. 1291 c) GROUP (Type Code 2) 1293 The GROUP attribute identifies a given group-prefix. In addition, 1294 any attributes nested immediately within the GROUP attribute also 1295 apply to the given group-prefix. 1297 0 1 2 3 1298 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1299 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1303 | Length | Type=2 | | 1304 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1305 | | 1306 | Encoded-Address-Prefix | 1307 | | 1308 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1309 | Nested Attributes (optional) ... 1310 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1311 Encoded-Address-Prefix 1312 The multicast group prefix to be joined or 1313 pruned, 1314 in the format described above. 1315 Nested Attributes No GROUP, SOURCE, or FWDR_PREF attributes may 1316 be 1317 immediately nested within a GROUP attribute. 1319 d) SOURCE (Type Code 3): 1321 The SOURCE attribute identifies a given source-prefix. In 1322 addition, any attributes nested immediately within the SOURCE 1323 attribute also apply to the given source-prefix.
1325 The SOURCE attribute has the following format: 1327 0 1 2 3 1328 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1329 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1330 | Length | Type=3 | | 1331 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1332 | | 1333 | Encoded-Address-Prefix | 1334 | | 1335 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1336 | Nested Attributes (optional) ... 1337 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1338 Encoded-Address-Prefix 1339 The Source-prefix in the format described 1340 above. 1341 Nested Attributes No GROUP, SOURCE, or FWDR_PREF attributes may 1342 be 1343 immediately nested within a SOURCE attribute. 1345 e) FWDR_PREF (Type Code 4) 1347 The FWDR_PREF attribute provides a forwarder preference value for 1351 all GROUP or SOURCE attributes nested immediately within the 1352 FWDR_PREF attribute. It is used by a BGMP speaker to inform other 1353 BGMP speakers of the originating speaker's degree of preference for 1354 a given group or source prefix. Usage of this attribute is 1355 described in Section 5.5. 1357 0 1 2 3 1358 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1359 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1360 | Length | Type=4 | Reserved | 1361 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1362 | Preference Value | 1363 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1364 | Nested Attributes ... 1365 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1366 Preference Value A 32-bit non-negative integer. 1367 Nested Attributes No JOIN, PRUNE, or FWDR_PREF attributes may be 1368 immediately nested within a FWDR_PREF 1369 attribute.
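The attribute and Encoded-Address-Prefix layouts described above can be illustrated with a short Python sketch that builds a (*,G-prefix) Join, that is, a GROUP attribute nested in a JOIN attribute. This is a minimal illustration under stated assumptions, not a reference encoder. The assumptions are: EnTyp occupies the three high-order bits and AddrFam the five low-order bits of the first octet (a reading of the diagram), the mask length is carried in network byte order, alignment is met by zero padding, and the Length field counts that padding. The function names are hypothetical.

```python
import ipaddress
import struct

# Attribute type codes as listed in Section 8.3.
JOIN, PRUNE, GROUP, SOURCE, FWDR_PREF, POISON_REVERSE = range(6)

AF_IPV4 = 1  # IANA address family number for IPv4


def encode_prefix(prefix: str) -> bytes:
    """Encoded-Address-Prefix for an IPv4 prefix, EnTyp 1 (mask length included)."""
    net = ipaddress.IPv4Network(prefix)
    entyp = 1
    # Assumption: 3-bit EnTyp in the high-order bits, 5-bit AddrFam below it.
    first_octet = (entyp << 5) | AF_IPV4
    # EnTyp 1: the Mask field is 4 bytes long and holds the mask length in bits.
    return bytes([first_octet]) + net.network_address.packed + struct.pack("!I", net.prefixlen)


def encode_attribute(type_code: int, head: bytes, nested: bytes = b"") -> bytes:
    """Length (16 bits), Type (8 bits), then type-specific data and nested attributes."""
    fixed = b"\x00\x00" + bytes([type_code]) + head
    # Assumption: zero padding brings the fixed part to a 4-byte boundary.
    fixed += b"\x00" * (-len(fixed) % 4)
    # Length covers the whole attribute, including nested attributes.
    total = len(fixed) + len(nested)
    return struct.pack("!H", total) + fixed[2:] + nested


def join_group(prefix: str) -> bytes:
    """JOIN ( GROUP ): a (*,G-prefix) Join."""
    group = encode_attribute(GROUP, encode_prefix(prefix))
    return encode_attribute(JOIN, b"\x00", group)  # one Reserved octet after Type


if __name__ == "__main__":
    print(join_group("224.0.128.0/24").hex())
```

Under these assumptions the GROUP attribute for 224.0.128.0/24 is 12 octets long and the enclosing JOIN attribute is 16 octets long, so both satisfy the 4-byte alignment rule without any padding.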
1371 f) POISON_REVERSE (Type Code 5) 1373 The POISON_REVERSE attribute provides a "poison-reverse" (PR-bit) 1374 value for all SOURCE attributes nested immediately within the 1375 POISON_REVERSE attribute. It is used by a BGMP speaker to inform 1376 other BGMP speakers from which it has received (S,G) Joins that 1377 they are on the path of domains between the source and the root 1378 domain. 1380 0 1 2 3 1381 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1383 | Length | Type=5 | Reserved |P| 1384 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1385 | Nested Attributes ... 1386 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1387 P The PR-bit value. 1388 Nested Attributes No attributes defined in this document other than SOURCE 1389 may be immediately nested within a 1390 POISON_REVERSE 1391 attribute. 1395 8.4. Encoding examples 1397 Below are enumerated examples of how various updates are built using 1398 nested attributes, where A ( B ) denotes that attribute B is nested 1399 within attribute A. 1400 (*,G-prefix) Join: JOIN ( GROUP ) 1401 (*,G-prefix) Prune: PRUNE ( GROUP ) 1402 (S,G) Join towards S : GROUP ( JOIN ( SOURCE ) ) 1403 (S,G) Join cancelling prune towards root of G: GROUP ( JOIN ( SOURCE ) ) 1404 (S,G) Prune towards S: GROUP ( PRUNE ( SOURCE ) ) 1405 (S,G) Prune towards root of G: GROUP ( PRUNE ( SOURCE ) ) 1406 Switch from (*,G) to (S,G): PRUNE ( GROUP ( JOIN ( SOURCE ) ) ) 1407 Switch from (S,G) to (*,G): JOIN ( GROUP ) 1408 Initial (*,G) Join with S pruned: JOIN ( GROUP ( PRUNE ( SOURCE ) ) ) 1409 Forwarder preference announcement for G-prefix: FWDR_PREF ( GROUP ) 1410 Forwarder preference announcement for S-prefix: FWDR_PREF ( SOURCE ) 1412 8.5. KEEPALIVE Message Format 1414 BGMP does not use any transport protocol-based keep-alive mechanism 1415 to determine if peers are reachable.
Instead, KEEPALIVE messages are 1416 exchanged between peers often enough as not to cause the Hold Timer 1417 to expire. A reasonable maximum time between the last KEEPALIVE or 1418 UPDATE message sent, and the time at which a KEEPALIVE message is 1419 sent, would be one third of the Hold Time interval. KEEPALIVE 1420 messages MUST NOT be sent more frequently than one per second. An 1421 implementation MAY adjust the rate at which it sends KEEPALIVE 1422 messages as a function of the Hold Time interval. 1424 If the negotiated Hold Time interval is zero, then periodic KEEPALIVE 1425 messages MUST NOT be sent. 1427 A KEEPALIVE message consists of only a message header, and has a 1428 length of 4 octets. 1430 8.6. NOTIFICATION Message Format 1432 A NOTIFICATION message is sent when an error condition is detected. 1433 The BGMP connection is closed immediately after sending it if the 1434 error is a fatal one. 1436 In addition to the fixed-size BGMP header, the NOTIFICATION message 1437 contains the following fields: 1439 Draft BGMP November 2000 1441 0 1 2 3 1442 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1443 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1444 |O| Error code | Error subcode | Data | 1445 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1446 | | 1447 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1449 O-bit: 1450 Open-bit. If clear, the connection will be closed. 1451 If set, indicates the error is not fatal. 1453 Error Code: 1455 This 1-octet unsigned integer indicates the type of 1456 NOTIFICATION. 
The following Error Codes have been defined: 1458 Error Code Symbolic Name Reference 1460 1 Message Header Error Section 9.1 1462 2 OPEN Message Error Section 9.2 1464 3 UPDATE Message Error Section 9.3 1466 4 Hold Timer Expired Section 9.5 1468 5 Finite State Machine Error Section 9.6 1470 6 Cease Section 9.7 1472 Error subcode: 1474 This 1-octet unsigned integer provides more specific 1475 information about the nature of the reported error. Each 1476 Error 1477 Code may have one or more Error Subcodes associated with it. 1478 If no appropriate Error Subcode is defined, then a zero 1479 (Unspecific) value is used for the Error Subcode field. 1480 The notation (MC) below indicates the error is a fatal one 1481 and the O-bit must be clear. Non-fatal subcodes SHOULD 1482 be sent with the O-bit set. 1484 Message Header Error subcodes: 1486 Draft BGMP November 2000 1488 2 - Bad Message Length 1489 (MC) 1490 3 - Bad Message Type 1491 (MC) 1493 OPEN Message Error subcodes: 1495 1 - Unsupported Version 1496 (MC) 1497 4 - Unsupported Optional Parameter 1498 5 - Authentication Failure 1499 (MC) 1500 6 - Unacceptable Hold Time 1501 (MC) 1502 7 - Unsupported Capability 1503 (MC) 1505 UPDATE Message Error subcodes: 1507 1 - Malformed Attribute List 1508 (MC) 1509 2 - Unrecognized Attribute Type 1510 5 - Attribute Length Error 1511 (MC) 1512 10 - Invalid Address 1513 11 - Invalid Mask 1514 13 - Unrecognized Address Family 1515 Data: 1516 This variable-length field is used to diagnose the reason for the 1517 NOTIFICATION. The contents of the Data field depend upon the 1518 Error Code and Error Subcode. See Section 9 below for more 1519 details. 1521 Note that the length of the Data field can be determined from the 1522 message Length field by the formula: 1524 Message Length = 6 + Data Length 1526 The minimum length of the NOTIFICATION message is 6 octets 1527 (including message header). 1529 Draft BGMP November 2000 1531 9. 
BGMP Error Handling 1533 This section describes actions to be taken when errors are detected 1534 while processing BGMP messages. BGMP Error Handling is similar to 1535 that of BGP [BGP]. 1537 When any of the conditions described here are detected, a 1538 NOTIFICATION message with the indicated Error Code, Error Subcode, 1539 and Data fields is sent, and the BGMP connection is closed if the 1540 error is a fatal one. If no Error Subcode is specified, then a zero 1541 must be used. 1543 The phrase "the BGMP connection is closed" means that the transport 1544 protocol connection has been closed and that all resources for that 1545 BGMP connection have been deallocated. The remote peer is removed 1546 from the target list of all tree state entries. 1548 Unless specified explicitly, the Data field of the NOTIFICATION 1549 message that is sent to indicate an error is empty. 1551 9.1. Message Header error handling 1553 All errors detected while processing the Message Header are indicated 1554 by sending the NOTIFICATION message with Error Code Message Header 1555 Error. The Error Subcode elaborates on the specific nature of the 1556 error. 1558 If the Length field of the message header is less than 4 or greater 1559 than 4096, or if the Length field of an OPEN message is less than 1560 the minimum length of the OPEN message, or if the Length field of an 1561 UPDATE message is less than the minimum length of the UPDATE message, 1562 or if the Length field of a KEEPALIVE message is not equal to 4, then 1563 the Error Subcode is set to Bad Message Length. The Data field 1564 contains the erroneous Length field. 1566 If the Type field of the message header is not recognized, then the 1567 Error Subcode is set to Bad Message Type. The Data field contains 1568 the erroneous Type field. 1570 9.2. 
OPEN message error handling 1572 All errors detected while processing the OPEN message are indicated 1573 by sending the NOTIFICATION message with Error Code OPEN Message 1575 Draft BGMP November 2000 1577 Error. The Error Subcode elaborates on the specific nature of the 1578 error. 1580 If the version number contained in the Version field of the received 1581 OPEN message is not supported, then the Error Subcode is set to 1582 Unsupported Version Number. The Data field is a 2-octet unsigned 1583 integer, which indicates the largest locally supported version number 1584 less than the version the remote BGMP peer bid (as indicated in the 1585 received OPEN message). 1587 If the Hold Time field of the OPEN message is unacceptable, then the 1588 Error Subcode MUST be set to Unacceptable Hold Time. An 1589 implementation MUST reject Hold Time values of one or two seconds. 1590 An implementation MAY reject any proposed Hold Time. An 1591 implementation which accepts a Hold Time MUST use the negotiated 1592 value for the Hold Time. 1594 If one of the Optional Parameters in the OPEN message is not 1595 recognized, then the Error Subcode is set to Unsupported Optional 1596 Parameters. 1598 If the OPEN message carries Authentication Information (as an 1599 Optional Parameter), then the corresponding authentication procedure 1600 is invoked. If the authentication procedure (based on Authentication 1601 Code and Authentication Data) fails, then the Error Subcode is set to 1602 Authentication Failure. 1604 If the OPEN message indicates that the peer does not support a 1605 capability which the receiver requires, the receiver may send a 1606 NOTIFICATION message to the peer, and terminate peering. The Error 1607 Subcode in the message is set to Unsupported Capability. The Data 1608 field in the NOTIFICATION message lists the set of capabilities that 1609 cause the speaker to send the message. 
Each such capability is 1610 encoded the same way as it was encoded in the received OPEN message. 1612 9.3. UPDATE message error handling 1614 All errors detected while processing the UPDATE message are indicated 1615 by sending the NOTIFICATION message with Error Code UPDATE Message 1616 Error. The error subcode elaborates on the specific nature of the 1617 error. 1621 If any recognized attribute has an Attribute Length that conflicts with 1622 the expected length (based on the attribute type code), then the 1623 Error Subcode is set to Attribute Length Error. The Data field 1624 contains the erroneous attribute (type, length and value). 1626 If the Encoded-Address-Prefix field in some attribute is 1627 syntactically incorrect, then the Error Subcode is set to Invalid 1628 Prefix Field. 1630 If any other error is encountered when processing attributes (such as 1631 invalid nestings), then the Error Subcode is set to Malformed 1632 Attribute List, and the problematic attribute is included in the Data 1633 field. 1635 9.4. NOTIFICATION message error handling 1637 If a peer sends a NOTIFICATION message, and there is an error in that 1638 message, there is unfortunately no means of reporting this error via 1639 a subsequent NOTIFICATION message. Any such error, such as an 1640 unrecognized Error Code or Error Subcode, should be noticed, logged 1641 locally, and brought to the attention of the administration of the 1642 peer. The means to do this, however, lies outside the scope of this 1643 document. 1645 9.5. Hold Timer Expired error handling 1647 If a system does not receive successive KEEPALIVE and/or UPDATE 1648 and/or NOTIFICATION messages within the period specified in the Hold 1649 Time field of the OPEN message, then the NOTIFICATION message with 1650 Hold Timer Expired Error Code must be sent and the BGMP connection 1651 closed. 1653 9.6.
Finite State Machine error handling 1655 Any error detected by the BGMP Finite State Machine (e.g., receipt of 1656 an unexpected event) is indicated by sending the NOTIFICATION message 1657 with Error Code Finite State Machine Error. 1659 Draft BGMP November 2000 1661 9.7. Cease 1663 In absence of any fatal errors (that are indicated in this section), 1664 a BGMP peer may choose at any given time to close its BGMP connection 1665 by sending the NOTIFICATION message with Error Code Cease. However, 1666 the Cease NOTIFICATION message must not be used when a fatal error 1667 indicated by this section does exist. 1669 9.8. Connection collision detection 1671 If a pair of BGMP speakers try simultaneously to establish a TCP 1672 connection to each other, then two parallel connections between this 1673 pair of speakers might well be formed. We refer to this situation as 1674 connection collision. Clearly, one of these connections must be 1675 closed. 1677 Based on the value of the BGMP Identifier a convention is established 1678 for detecting which BGMP connection is to be preserved when a 1679 collision does occur. The convention is to compare the BGMP 1680 Identifiers of the peers involved in the collision and to retain only 1681 the connection initiated by the BGMP speaker with the higher-valued 1682 BGMP Identifier. 1684 Upon receipt of an OPEN message, the local system must examine all of 1685 its connections that are in the OpenConfirm state. A BGMP speaker 1686 may also examine connections in an OpenSent state if it knows the 1687 BGMP Identifier of the peer by means outside of the protocol. If 1688 among these connections there is a connection to a remote BGMP 1689 speaker whose BGMP Identifier equals the one in the OPEN message, 1690 then the local system performs the following collision resolution 1691 procedure: 1693 1. The BGMP Identifier of the local system is compared to the BGMP 1694 Identifier of the remote system (as specified in the OPEN message). 
1696 2. If the value of the local BGMP Identifier is less than the remote 1697 one, the local system closes the BGMP connection that already exists (the 1698 one that is already in the OpenConfirm state), and accepts the BGMP 1699 connection initiated by the remote system. 1701 3. Otherwise, the local system closes the newly created BGMP connection 1702 (the one associated with the newly received OPEN message), and 1703 continues to use the existing one (the one that is already in the 1704 OpenConfirm state). 1708 Comparing BGMP Identifiers is done by treating them as (4-octet long) 1709 unsigned integers. 1711 A connection collision with an existing BGMP connection that is in 1712 the Established state causes unconditional closing of the newly created 1713 connection. Note that a connection collision cannot be detected with 1714 connections that are in the Idle, Connect, or Active states. 1716 Closing the BGMP connection (that results from the collision 1717 resolution procedure) is accomplished by sending the NOTIFICATION 1718 message with the Error Code Cease. 1720 10. BGMP Version Negotiation 1722 BGMP speakers may negotiate the version of the protocol by making 1723 multiple attempts to open a BGMP connection, starting with the 1724 highest version number each supports. If an open attempt fails with 1725 an Error Code OPEN Message Error, and an Error Subcode Unsupported 1726 Version Number, then the BGMP speaker has available the version 1727 number it tried, the version number its peer tried, the version 1728 number passed by its peer in the NOTIFICATION message, and the 1729 version numbers that it supports. If the two peers do support one or 1730 more common versions, then this will allow them to rapidly determine 1731 the highest common version. In order to support BGMP version 1732 negotiation, future versions of BGMP must retain the format of the 1733 OPEN and NOTIFICATION messages. 1735 10.1.
BGMP Capability Negotiation 1737 When a BGMP speaker sends an OPEN message to its BGMP peer, the 1738 message may include an Optional Parameter, called Capabilities. The 1739 parameter lists the capabilities supported by the speaker. 1741 A BGMP speaker may use a particular capability when peering with 1742 another speaker only if both speakers support that capability. A 1743 BGMP speaker determines the capabilities supported by its peer by 1744 examining the list of capabilities present in the Capabilities 1745 Optional Parameter carried by the OPEN message that the speaker 1746 receives from the peer. 1748 Draft BGMP November 2000 1750 11. BGMP Finite State machine 1752 This section specifies BGMP operation in terms of a Finite State 1753 Machine (FSM). Following is a brief summary and overview of BGMP 1754 operations by state as determined by this FSM. 1756 Initially BGMP is in the Idle state. 1758 Idle state: 1760 In this state BGMP refuses all incoming BGMP connections. No 1761 resources are allocated to the peer. In response to the Start 1762 event (initiated by either system or operator) the local system 1763 initializes all BGMP resources, starts the ConnectRetry timer, 1764 initiates a transport connection to the other BGMP peer, while 1765 listening for a connection that may be initiated by the remote 1766 BGMP peer, and changes its state to Connect. The exact value of 1767 the ConnectRetry timer is a local matter, but should be 1768 sufficiently large to allow TCP initialization. 1770 If a BGMP speaker detects an error, it shuts down the connection 1771 and changes its state to Idle. Getting out of the Idle state 1772 requires generation of the Start event. If such an event is 1773 generated automatically, then persistent BGMP errors may result in 1774 persistent flapping of the speaker. 
To avoid such a condition it 1775 is recommended that Start events should not be generated 1776 immediately for a peer that was previously transitioned to Idle 1777 due to an error. For a peer that was previously transitioned to 1778 Idle due to an error, the time between consecutive generation of 1779 Start events, if such events are generated automatically, shall 1780 exponentially increase. The value of the initial timer shall be 60 1781 seconds. The time shall be doubled for each consecutive retry. 1783 Any other event received in the Idle state is ignored. 1785 Connect state: 1787 In this state BGMP is waiting for the transport protocol 1788 connection to be completed. 1790 If the transport protocol connection succeeds, the local system 1791 clears the ConnectRetry timer, completes initialization, sends an 1792 OPEN message to its peer, and changes its state to OpenSent. If 1793 the transport protocol connect fails (e.g., retransmission 1794 timeout), the local system restarts the ConnectRetry timer, 1796 Draft BGMP November 2000 1798 continues to listen for a connection that may be initiated by the 1799 remote BGMP peer, and changes its state to Active state. 1801 In response to the ConnectRetry timer expired event, the local 1802 system restarts the ConnectRetry timer, initiates a transport 1803 connection to the other BGMP peer, continues to listen for a 1804 connection that may be initiated by the remote BGMP peer, and 1805 stays in the Connect state. 1807 The Start event is ignored in the Connect state. 1809 In response to any other event (initiated by either system or 1810 operator), the local system releases all BGMP resources associated 1811 with this connection and changes its state to Idle. 1813 Active state: 1815 In this state BGMP is trying to acquire a peer by listening for an 1816 incoming transport protocol connection. 
1818 If the transport protocol connection succeeds, the local system 1819 clears the ConnectRetry timer, completes initialization, sends an 1820 OPEN message to its peer, sets its Hold Timer to a large value, 1821 and changes its state to OpenSent. A Hold Timer value of 4 1822 minutes is suggested. 1824 In response to the ConnectRetry timer expired event, the local 1825 system restarts the ConnectRetry timer, initiates a transport 1826 connection to the other BGMP peer, continues to listen for a 1827 connection that may be initiated by the remote BGMP peer, and 1828 changes its state to Connect. 1830 If the local system detects that a remote peer is trying to 1831 establish a BGMP connection to it, and the IP address of the remote 1832 peer is not an expected one, the local system restarts the 1833 ConnectRetry timer, rejects the attempted connection, continues to 1834 listen for a connection that may be initiated by the remote BGMP 1835 peer, and stays in the Active state. 1837 The Start event is ignored in the Active state. 1839 In response to any other event (initiated by either system or 1840 operator), the local system releases all BGMP resources associated 1841 with this connection and changes its state to Idle. 1845 OpenSent state: 1847 In this state BGMP waits for an OPEN message from its peer. When 1848 an OPEN message is received, all fields are checked for 1849 correctness. If the BGMP message header checking or OPEN message 1850 checking detects an error (see Section 9.2), or a connection 1851 collision (see Section 9.8), the local system sends a NOTIFICATION 1852 message and changes its state to Idle. 1854 If there are no errors in the OPEN message, BGMP sends a KEEPALIVE 1855 message and sets a KeepAlive timer. The Hold Timer, which was 1856 originally set to a large value (see above), is replaced with the 1857 negotiated Hold Time value (see Section 8.2).
If the negotiated 1858 Hold Time value is zero, then the Hold Timer and KeepAlive 1859 timer are not started. If the value of the Autonomous System 1860 field is the same as the local Autonomous System number, then the 1861 connection is an "internal" connection; otherwise, it is 1862 "external". Finally, the state is changed to OpenConfirm. 1864 If a disconnect notification is received from the underlying 1865 transport protocol, the local system closes the BGMP connection, 1866 restarts the ConnectRetry timer, continues to listen for a 1867 connection that may be initiated by the remote BGMP peer, and goes 1868 into the Active state. 1870 If the Hold Timer expires, the local system sends a NOTIFICATION 1871 message with Error Code Hold Timer Expired and changes its state 1872 to Idle. 1874 In response to the Stop event (initiated by either system or 1875 operator), the local system sends a NOTIFICATION message with Error 1876 Code Cease and changes its state to Idle. 1878 The Start event is ignored in the OpenSent state. 1880 In response to any other event, the local system sends a NOTIFICATION 1881 message with Error Code Finite State Machine Error and changes its 1882 state to Idle. 1884 Whenever BGMP changes its state from OpenSent to Idle, it closes 1885 the BGMP (and transport-level) connection and releases all 1886 resources associated with that connection. 1888 OpenConfirm state: 1892 In this state BGMP waits for a KEEPALIVE or NOTIFICATION message. 1894 If the local system receives a KEEPALIVE message, it changes its 1895 state to Established. 1897 If the Hold Timer expires before a KEEPALIVE message is received, 1898 the local system sends a NOTIFICATION message with Error Code Hold 1899 Timer Expired and changes its state to Idle. 1901 If the local system receives a NOTIFICATION message, it changes 1902 its state to Idle.
1904 If the KeepAlive timer expires, the local system sends a KEEPALIVE 1905 message and restarts its KeepAlive timer. 1907 If a disconnect notification is received from the underlying 1908 transport protocol, the local system changes its state to Idle. 1910 In response to the Stop event (initiated by either system or 1911 operator), the local system sends a NOTIFICATION message with Error 1912 Code Cease and changes its state to Idle. 1914 The Start event is ignored in the OpenConfirm state. 1916 In response to any other event, the local system sends a NOTIFICATION 1917 message with Error Code Finite State Machine Error and changes its 1918 state to Idle. 1920 Whenever BGMP changes its state from OpenConfirm to Idle, it 1921 closes the BGMP (and transport-level) connection and releases all 1922 resources associated with that connection. 1924 Established state: 1926 In the Established state BGMP can exchange UPDATE, NOTIFICATION, 1927 and KEEPALIVE messages with its peer. 1929 If the local system receives an UPDATE or KEEPALIVE message, it 1930 restarts its Hold Timer, if the negotiated Hold Time value is 1931 non-zero. 1933 If the local system receives a NOTIFICATION message, it changes 1934 its state to Idle. 1936 If the local system receives an UPDATE message and the UPDATE 1940 message error handling procedure (see Section 9.3) detects an 1941 error, the local system sends a NOTIFICATION message and changes 1942 its state to Idle. 1944 If a disconnect notification is received from the underlying 1945 transport protocol, the local system changes its state to Idle. 1947 If the Hold Timer expires, the local system sends a NOTIFICATION 1948 message with Error Code Hold Timer Expired and changes its state 1949 to Idle. 1951 If the KeepAlive timer expires, the local system sends a KEEPALIVE 1952 message and restarts its KeepAlive timer.
1954 Each time the local system sends a KEEPALIVE or UPDATE message, it 1955 restarts its KeepAlive timer, unless the negotiated Hold Time 1956 value is zero. 1958 In response to the Stop event (initiated by either system or 1959 operator), the local system sends a NOTIFICATION message with 1960 Error Code Cease and changes its state to Idle. 1962 The Start event is ignored in the Established state. 1964 In response to any other event, the local system sends 1965 NOTIFICATION message with Error Code Finite State Machine Error 1966 and changes its state to Idle. 1968 Whenever BGMP changes its state from Established to Idle, it 1969 closes the BGMP (and transport-level) connection, releases all 1970 resources associated with that connection, and deletes all routes 1971 derived from that connection. 1973 12. Security Considerations 1975 BGMP uses TCP sessions for all network communication between peers. TCP 1976 sessions may be secured through the use of IPsec [IPSEC]. 1978 13. Authors' Addresses 1980 Dave Thaler 1981 Department of Electrical Engineering and Computer Science 1983 Draft BGMP November 2000 1985 Microsoft 1986 One Microsoft Way 1987 Redmond, WA 98052 1988 EMail: dthaler@microsoft.com 1990 Deborah Estrin 1991 Computer Science Dept./ISI 1992 University of Southern California 1993 Los Angeles, CA 90089 1994 EMail: estrin@usc.edu 1996 David Meyer 1997 Cisco Systems 1998 San Jose, CA 1999 EMail: dmm@cisco.com 2001 14. References 2003 [BGP] 2004 Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 2005 1771, March 1995. 2007 [MBGP] 2008 Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol 2009 Extensions for BGP-4", RFC 2283, February 1998. 2011 [CBT] 2012 Ballardie, A., "Core Based Trees (CBT version 2) Multicast 2013 Routing", RFC 2189, September 1997. 2015 [DVMRP] 2016 Pusateri, T., "Distance Vector Multicast Routing Protocol", draft- 2017 ietf-idmr-dvmrp-v3-09.txt, March 2000. 
2019 [DWR] 2020 Fenner, W., "Domain-Wide Reports", draft-ietf-idmr-membership- 2021 reports-04.txt, August 1999. 2023 [INTEROP] 2024 Thaler, D., "Interoperability Rules for Multicast Routing 2025 Protocols", RFC 2715, October 1999. 2027 [IPSEC] 2028 Kent, S., and R. Atkinson, "Security Architecture for the Internet 2030 Draft BGMP November 2000 2032 Protocol", RFC 2401, November 1998. 2034 [IPv6AA] 2035 Hinden, R., and S. Deering, "IP Version 6 Addressing Architecture", 2036 RFC 2373, July 1998. 2038 [ISSUES] 2039 Meyer, D., "Some Issues for an Inter-domain Multicast Routing 2040 Protocol", draft-ietf-mboned-imrp-some-issues-02.txt, June 1997. 2042 [MASC] 2043 Estrin, D., Handley, M, and D. Thaler, "Multicast-Address-Set 2044 advertisement and Claim mechanism", draft-ietf-malloc-masc-05.txt, 2045 July 2000. 2047 [MOSPF] 2048 Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March 2049 1994. 2051 [PIMDM] 2052 Estrin, et al., "Protocol Independent Multicast-Dense Mode (PIM- 2053 DM): Protocol Specification", draft-ietf-idmr-pim-dm-spec-05.txt, 2054 May 1997. 2056 [PIMSM] 2057 Estrin, et al., "Protocol Independent Multicast-Sparse Mode (PIM- 2058 SM): Protocol Specification", RFC 2362, June 1998. 2060 [REFLECT] 2061 Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to 2062 full mesh IBGP", RFC 1966, June 1996. 2064 [RFC1700] 2065 S. J. Reynolds, J. Postel, "ASSIGNED NUMBERS", RFC 1700, October 2066 1994. 2068 [RFC2119] 2069 S. Bradner, "Key words for use in RFCs to Indicate Requirement 2070 Levels", BCP 14, RFC 2119, March 1997. 2072 [V6PREFIX] 2073 Haberman, B., and D. Thaler, "Unicast-Prefix-based IPv6 Multicast 2074 Addresses", draft-ietf-ipngwg-uni-based-mcast-00.txt, August 2000. 2076 Draft BGMP November 2000 2078 15. Full Copyright Statement 2080 Copyright (C) The Internet Society (2000). All Rights Reserved. 
2082 This document and translations of it may be copied and furnished to 2083 others, and derivative works that comment on or otherwise explain it or 2084 assist in its implementation may be prepared, copied, published and 2085 distributed, in whole or in part, without restriction of any kind, 2086 provided that the above copyright notice and this paragraph are included 2087 on all such copies and derivative works. However, this document itself 2088 may not be modified in any way, such as by removing the copyright notice 2089 or references to the Internet Society or other Internet organizations, 2090 except as needed for the purpose of developing Internet standards in 2091 which case the procedures for copyrights defined in the Internet 2092 Standards process must be followed, or as required to translate it into languages other than English. 2094 The limited permissions granted above are perpetual and will not be 2095 revoked by the Internet Society or its successors or assigns. 2097 This document and the information contained herein is provided on an "AS 2098 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK 2099 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT 2100 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT 2101 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR 2102 FITNESS FOR A PARTICULAR PURPOSE. 2104 Table of Contents 2106 1 Acknowledgements ................................................ 2 2107 2 Purpose ......................................................... 2 2108 3 Terminology ..................................................... 3 2109 4 Protocol Overview ............................................... 5 2110 4.1 Design Rationale .............................................. 6 2111 5 Protocol Details ................................................ 8 2112 5.1 Interaction with the EGP ...................................... 8 2113 5.2 Multicast Data Packet Processing ..............................
9 2114 5.3 BGMP processing of Join and Prune messages and notifications 2115 .............................................................. 10 2116 5.3.1 Receiving Joins ............................................. 10 2117 5.3.2 Receiving Prune Notifications ............................... 12 2118 5.3.3 Receiving Route Change Notifications ........................ 13 2119 5.3.4 Receiving (S,G) Poison-Reverse messages ..................... 13 2120 5.4 Interaction with M-IGP components ............................. 14 2121 5.4.1 Interaction with DVMRP and PIM-DM ........................... 15 2122 Draft BGMP November 2000 2124 5.4.2 Interaction with PIM-SM ..................................... 16 2125 5.4.3 Interaction with CBT ........................................ 17 2126 5.4.4 Interaction with MOSPF ...................................... 18 2127 5.5 Operation over Multi-access Networks .......................... 18 2128 5.6 Interaction between (S,G) state and G-routes .................. 19 2129 6 Interaction with address allocation ............................. 20 2130 6.1 Requirements for BGMP components .............................. 20 2131 7 Transition Strategy ............................................. 20 2132 7.1 Preventing transit through the MBone stub ..................... 22 2133 8 Message Formats ................................................. 23 2134 8.1 Message Header Format ......................................... 23 2135 8.2 OPEN Message Format ........................................... 24 2136 8.3 UPDATE Message Format ......................................... 27 2137 8.4 Encoding examples ............................................. 32 2138 8.5 KEEPALIVE Message Format ...................................... 32 2139 8.6 NOTIFICATION Message Format ................................... 32 2140 9 BGMP Error Handling ............................................. 35 2141 9.1 Message Header error handling ................................. 
35 2142 9.2 OPEN message error handling ................................... 35 2143 9.3 UPDATE message error handling ................................. 36 2144 9.4 NOTIFICATION message error handling ........................... 37 2145 9.5 Hold Timer Expired error handling ............................. 37 2146 9.6 Finite State Machine error handling ........................... 37 2147 9.7 Cease ......................................................... 38 2148 9.8 Connection collision detection ................................ 38 2149 10 BGMP Version Negotiation ....................................... 39 2150 10.1 BGMP Capability Negotiation .................................. 39 2151 11 BGMP Finite State machine ...................................... 40 2152 12 Security Considerations ........................................ 44 2153 13 Authors' Addresses ............................................. 44 2154 14 References ..................................................... 45 2155 15 Full Copyright Statement ....................................... 47