BGMP Working Group                                      D. Thaler
Internet Engineering Task Force                         Microsoft
INTERNET-DRAFT                                          D. Estrin
March 10, 2000                                          USC/ISI
Expires September 2000                                  D. Meyer
                                                        Cisco
                                                        Editors

              Border Gateway Multicast Protocol (BGMP):
                        Protocol Specification

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as a "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document describes BGMP, a protocol for inter-domain multicast routing. BGMP builds shared trees for active multicast groups, and allows receiver domains to build source-specific, inter-domain, distribution branches where needed. Building upon concepts from CBT and PIM-SM, BGMP requires that each multicast group be associated with a single root (in BGMP it is referred to as the root domain).
BGMP assumes that at any point in time, different ranges of the class D space are associated (e.g., with MASC [MASC]) with different domains. Each of these domains then becomes the root of the shared domain-trees for all groups in its range. Multicast participants will generally receive better multicast service if the session initiator's address allocator selects addresses from its own domain's part of the space, thereby causing the root domain to be local to at least one of the session participants.

Copyright Notice

Copyright (C) The Internet Society (2000). All Rights Reserved.

1. Acknowledgements

In addition to the editors, the following individuals have contributed to the design of BGMP: Cengiz Alaettinoglu, Tony Ballardie, Steve Casner, Steve Deering, Dino Farinacci, Bill Fenner, Mark Handley, Ahmed Helmy, Van Jacobson, and Satish Kumar.

This document is the product of the IETF BGMP Working Group with Dave Thaler, Deborah Estrin, and David Meyer as editors.

Rusty Eddy, Isidor Kouvelas, and Pavlin Radoslavov also provided valuable feedback on this document.

2. Purpose

It has been suggested that inter-domain multicast is better supported with a rendezvous mechanism whereby members receive sources' data packets without any sort of global broadcast (e.g., DVMRP and PIM-DM broadcast initial data packets and MOSPF broadcasts membership information). CBT [CBT] and PIM-SM [PIMSM] use a shared group-tree, which all members join and thereby hear from all sources (and which non-members do not join and thereby hear from no sources).

This document describes BGMP, a protocol for inter-domain multicast routing. BGMP builds shared trees for active multicast groups, and allows domains to build source-specific, inter-domain, distribution branches where needed.
Building upon concepts from CBT and PIM-SM, BGMP requires that each global multicast group be associated with a single root. However, in BGMP, the root is an entire exchange or domain, rather than a single router.

BGMP assumes that ranges of the class D space have been associated (e.g., with MASC [MASC]) with selected domains. Each such domain then becomes the root of the shared domain-trees for all groups in its range. An address allocator will generally achieve better distribution trees if it takes its multicast addresses from its own domain's part of the space, thereby causing the root domain to be local.

BGMP uses TCP as its transport protocol. This eliminates the need to implement message fragmentation, retransmission, acknowledgement, and sequencing. BGMP uses TCP port 264 for establishing its connections. This port is distinct from BGP's port to provide protocol independence, and to facilitate distinguishing between protocol packets (e.g., by packet classifiers, diagnostic utilities, etc.).

Two BGMP peers form a TCP connection with one another, and exchange messages to open and confirm the connection parameters. They then send incremental Join/Prune Updates as group memberships change. BGMP does not require periodic refresh of individual entries. KeepAlive messages are sent periodically to ensure the liveness of the connection. Notification messages are sent in response to errors or special conditions. If a connection encounters an error condition, a Notification message is sent, and the connection is closed if the error is fatal.

3. Terminology

This document uses the following technical terms:

Domain:
   A set of one or more contiguous links and zero or more routers surrounded by one or more multicast border routers. Note that this
Note that this 122 loose definition of domain also applies to an external link between 123 two domains, as well as an exchange. 125 Root Domain: 126 When constructing a shared tree of domains for some group, one 127 domain will be the "root" of the tree. The root domain receives 128 data from each sender to the group, and functions as a rendezvous 129 domain toward which member domains can send inter-domain joins, and 130 to which sender domains can send data. 132 Draft BGMP March 2000 134 Multicast RIB: 135 The Routing Information Base, or routing table, used to calculate 136 the "next-hop" towards a particular address for multicast traffic. 138 Multicast IGP (M-IGP): 139 A generic term for any multicast routing protocol used for tree 140 construction within a domain. Typical examples of M-IGPs are: 141 DVMRP, PIM-DM, PIM-SM, CBT, and MOSPF. 143 EGP: A generic term for the interdomain unicast routing protocol in use. 144 Typically, this will be some version of BGP which can support a 145 Multicast RIB, such as MBGP [MBGP], containing both unicast and 146 multicast address prefixes. 148 Component: 149 The portion of a border router associated with (and logically 150 inside) a particular domain that runs the multicast IGP (M-IGP) for 151 that domain, if any. Each border router thus has zero or more 152 components inside routing domains. In addition, each border router 153 with external links that do not fall inside any routing domain will 154 have an inter-domain component that runs BGMP. 156 External peer: 157 A border router in another multicast AS (autonomous system, as used 158 in BGP), to which a BGMP TCP-connection is open. Assuming MBGP is 159 being used, a separate "eBGP" TCP-connection will also be open to 160 the same peer. 162 Internal peer: 163 Another border router of the same multicast AS. A border router 164 either speaks iBGP ("internal" BGP) directly to internal peers in a 165 full mesh, or indirectly through a route reflector [REFLECT]. 
Next-hop peer:
   The next-hop peer towards a given IP address is the next EGP router on the path to the given address, according to multicast RIB routes in the EGP's routing table (e.g., in MBGP, routes whose Subsequent Address Family Identifier field indicates that the route is valid for multicast traffic).

Target:
   Either an EGP peer, or an M-IGP component.

Tree State Table:
   This is a table of (S-prefix,G-prefix) entries (including (*,G-prefix) entries) that have been explicitly joined by a set of targets. Each entry has, in addition to the source and group addresses and masks, a list of targets that have explicitly requested data (on behalf of directly connected hosts or downstream routers). (S,G) entries also have an "SPT" bit.

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be interpreted as described in [RFC2119].

4. Protocol Overview

BGMP maintains group-prefix state in response to messages from BGMP peers and notifications from M-IGP components. Group-shared trees are rooted at the domain advertising the group prefix covering those groups. When a receiver joins a specific group address, the border router towards the root domain generates a group-specific Join message, which is then forwarded Border-Router-by-Border-Router towards the root domain (see Figure 1). BGMP Join and Prune messages are sent over TCP connections between BGMP peers, and BGMP protocol state is refreshed by KEEPALIVE messages periodically sent over TCP.

BGMP routers build group-specific bidirectional forwarding state as they process the BGMP Join messages. Bidirectional forwarding state means that packets received from any target are forwarded to all other targets in the target list without any RPF checks. No group-specific state or traffic exists in parts of the network where there are no members of that group.

BGMP routers build source-specific unidirectional forwarding state, only where needed, to be compatible with source-specific trees (SPTs) used by some M-IGPs (e.g., DVMRP, PIM-DM, or PIM-SM). A domain that uses an SPT-based M-IGP may need to inject multicast packets from external sources via different border routers (to be compatible with the M-IGP RPF checks), which thus act as "surrogates". For example, in the Transit_1 domain, data from Src_A arrives at BR12, but must be injected by BR11. A surrogate router may create a source-specific BGMP branch if no shared tree state exists. Note: stub domains with a single border router, such as Rcvr_Stub_7 in Figure 1, receive all multicast data packets through that router, to which all RPF checks point. Therefore, such stub domains never build source-specific state.

            Root_Domain
               [BR91]--------------------------\
                 |                             |
               [BR32]                        [BR41]
              Transit_3                     Transit_4
               [BR31]                   [BR42]   [BR43]
                 |                        |        |
               [BR22]                   [BR52]   [BR53]
              Transit_2                     Transit_5
               [BR21]                        [BR51]
                 |                             |
               [BR12]                        [BR61]
      Transit_1[BR11]------------------------[BR62]Stub_6
               [BR13]                        (Src_A)
                 |                           (Rcvr_D)
        -------------------
        |                 |
      [BR71]            [BR81]
   Rcvr_Stub_7     Src_only_Stub_8
     (Rcvr_C)          (Src_B)

Figure 1: Example inter-domain topology. [BRXY] represents a BGMP border router. Transit_X is a transit domain network. *_Stub_X is a stub domain network.

Data packets are forwarded based on a combination of BGMP and M-IGP rules. The router forwards to a set of targets according to a matching (S,G) BGMP tree state entry if it exists. If not found, the router checks for a matching (*,G) BGMP tree state entry.
If neither is found, then the packet is sent natively to the next-hop EGP peer for G, according to the Multicast RIB (for example, in the case of a non-member sender such as Src_B in Figure 1). If a matching entry was found, the packet is forwarded to all other targets in the target list. In this way BGMP trees forward data in a bidirectional manner. If a target is an M-IGP component, then forwarding is subject to the rules of that M-IGP protocol.

4.1. Design Rationale

Several other protocols, or protocol proposals, build shared trees within domains [CBT, HPIM, PIMSM]. The design choices made for BGMP result from our focus on inter-domain multicast in particular. The design choices made by CBT and PIM-SM are better suited to the wide-area intra-domain case. There are three major differences between BGMP and other shared-tree protocols:

(1) Unidirectional vs. Bidirectional trees

Bidirectional trees (using bidirectional forwarding state as described above) minimize third-party dependence, which is essential in the inter-domain context. For example, in Figure 1, stub domains 7 and 8 would like to exchange multicast packets without being dependent on the quality of connectivity of the root domain. However, unidirectional shared trees (i.e., those using RPF checks) have more aggressive loop prevention and share the same processing rules as source-specific entries, which are inherently unidirectional.

The lack of third-party dependence concerns in the INTRA-domain case reduces the incentive to employ bidirectional trees. BGMP supports bidirectional trees because it has to, and because it can without excessive cost.
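The forwarding rules outlined at the start of Section 4, and specified in detail in Section 5.2, can be illustrated with a minimal, non-normative Python sketch. It assumes the packet has already been accepted, keys tree state entries by exact (S,G) and (*,G) addresses rather than by (S-prefix,G-prefix), and treats targets as opaque strings. The names TreeEntry, MulticastRIB, and forward are hypothetical. The sketch also shows the asymmetry discussed in (1): a (*,G) entry forwards to every other target without an RPF check, while an (S,G) entry first checks that the packet arrived from the expected next-hop target.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TreeEntry:
    # Targets (EGP peers or M-IGP components) that explicitly joined.
    targets: Set[str] = field(default_factory=set)
    # SPT bit; only meaningful for (S,G) entries.
    spt: bool = False


class MulticastRIB:
    """Longest-prefix-match table mapping prefixes to next-hop targets."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self.routes = [(ipaddress.ip_network(p), hop) for p, hop in routes.items()]

    def next_hop(self, addr: str) -> Optional[str]:
        ip = ipaddress.ip_address(addr)
        matches = [(net, hop) for net, hop in self.routes if ip in net]
        if not matches:
            return None
        # Prefer the most specific (longest) matching prefix.
        return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical tree state table keyed by (source, group); None means (*,G).
TreeTable = Dict[Tuple[Optional[str], str], TreeEntry]


def forward(tree: TreeTable, rib: MulticastRIB,
            src: str, grp: str, arrived_from: str) -> List[str]:
    """Return the targets that an accepted packet (src, grp) is sent to."""
    entry = tree.get((src, grp))
    if entry is not None:
        # (S,G) entry: drop unless the packet came from the next-hop target
        # towards S (SPT bit True) or towards G (SPT bit False).
        expected = rib.next_hop(src if entry.spt else grp)
        if arrived_from != expected:
            return []
    else:
        entry = tree.get((None, grp))
    if entry is None:
        # No matching state: send natively towards the next-hop peer for G.
        hop = rib.next_hop(grp)
        return [hop] if hop is not None else []
    # Forward to every other target in the target list.
    return sorted(t for t in entry.targets if t != arrived_from)


if __name__ == "__main__":
    tree: TreeTable = {
        (None, "224.1.1.1"): TreeEntry({"peerA", "peerB", "migp"}),
        ("10.0.0.5", "224.1.1.1"): TreeEntry({"peerC", "migp"}, spt=True),
    }
    rib = MulticastRIB({"224.0.0.0/4": "peerA", "10.0.0.0/8": "peerC"})
    print(forward(tree, rib, "192.0.2.9", "224.1.1.1", "migp"))  # ['peerA', 'peerB']
    print(forward(tree, rib, "10.0.0.5", "224.1.1.1", "peerC"))  # ['migp']
    print(forward(tree, rib, "10.0.0.5", "224.1.1.1", "peerA"))  # []
    print(forward(tree, rib, "192.0.2.9", "225.9.9.9", "migp"))  # ['peerA']
```

In the last call no tree state matches, so the packet follows the Multicast RIB towards G, as for the non-member sender Src_B in Figure 1. Forwarding into an M-IGP component target ("migp" here) would in practice be subject to that M-IGP's own rules, which the sketch does not model.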
(2) Source-specific distribution trees/branches

In a departure from other shared tree protocols, source-specific BGMP state is built ONLY where (a) it is needed to pull the multicast traffic down to a BGMP router that has source-specific (S,G) state, and (b) that router is NOT already on the shared tree (i.e., has no (*,G) state), and (c) that router does not want to receive packets via encapsulation from a router which is on the shared tree. BGMP provides source-specific branches because most M-IGP protocols in use today build source-specific trees. BGMP's source-specific branches eliminate the unnecessary overhead of encapsulations for high data rate sources from the shared tree's ingress router to the surrogate injector (e.g., from BR12 to BR11 in Figure 1). Moreover, cases in which shared paths are significantly longer than SPT paths will also benefit.

However, we do not build source-specific inter-domain trees in general because (a) inter-domain connectivity is generally less rich than intra-domain connectivity, so shared distribution trees should have more acceptable path length and traffic concentration properties in the inter-domain context than in the intra-domain case, and (b) by having the shared tree state always take precedence over source-specific tree state, we avoid ambiguities that can otherwise arise.

In summary, BGMP trees are, in a sense, a hybrid between CBT and PIM-SM trees.

(3) Method of choosing root of group shared tree

The choice of a group's shared-tree-root has implications for performance and policy. In the intra-domain case it can be assumed that all potential shared-tree roots (RPs/Cores) within the domain are equally suited to be the root for a group that is initiated within that domain.
In the INTER-domain case, there is far more opportunity for unacceptably poor locality, and far more need for administrative control of a group's shared-tree root. Therefore, in the intra-domain case, other protocols treat all candidate roots (RPs or Cores) as equivalent and emphasize load sharing and stability to maximize performance. In the inter-domain case, roots are not all equivalent, and we adopt an approach whereby a group's root domain is not random but is subject to administrative and performance input.

5. Protocol Details

In this section, we describe the detailed protocol that border routers perform. We assume that each border router conforms to the component-based model described in [INTEROP].

5.1. Interaction with the EGP

A fundamental requirement imposed by BGMP on the design of an EGP is that it be able to carry multicast prefixes. For example, a multiprotocol BGP (MBGP) must be able to carry a multicast prefix in the Unicast Network Layer Reachability Information (NLRI) field of the UPDATE message (i.e., either an IPv4 class D prefix or an IPv6 prefix with high-order octet equal to FF [IPv6AA]). This capability is required by BGMP in the implementation of bidirectional trees; BGMP must be able to forward data and control packets to the next hop towards either a unicast source S or a multicast group G (see Section 5.2). It is also required that the path attributes defined in [BGP] have the same semantics whether they accompany unicast or multicast NLRI.

MBGP [MBGP] satisfies the requirement described above. MBGP defines the optional non-transitive attributes Multiprotocol Reachable NLRI (MP_REACH_NLRI) and Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI) to carry sets of reachable or unreachable destinations, and the appropriate next hop in the case of MP_REACH_NLRI.
These attributes contain an Address Family Identifier (AFI) field [RFC1700], which indicates the type of NLRI carried in the attribute. In addition, the attribute carries another field, the Subsequent Address Family Identifier, or SAFI, which can be used to provide additional information about the type of NLRI. For example, SAFI value two indicates that the NLRI is valid for multicast forwarding. BGMP's requirement can be satisfied by allowing the NLRI field of the MP_REACH_NLRI (or MP_UNREACH_NLRI) to carry a multicast prefix in the Prefix field of the NLRI encoding.

Finally, while not required for correct BGMP operation, the design of an EGP should also provide a mechanism that allows discrimination between NLRI that is to be used for unicast forwarding and NLRI to be used for multicast forwarding. This property is required to support multicast-specific policy. As mentioned above, MBGP has this capability.

5.2. Multicast Data Packet Processing

For BGMP rules to be applied, an incoming packet must first be "accepted":

o If the packet arrived on an interface owned by an M-IGP, the M-IGP component determines whether the packet should be accepted or dropped according to its rules. If the packet is accepted, the packet is forwarded (or not forwarded) out any other interfaces owned by the same component, as specified by the M-IGP.

o If the packet was received over a point-to-point interface owned by BGMP, the packet is accepted.

o If the packet arrived on a multiaccess network interface owned by BGMP, the packet is accepted if the router is the designated forwarder for the longest matching route for S (if it is receiving data on a source-specific branch), or for the longest matching route for G.

If the packet is accepted, then the router checks the tree state table for a matching (S,G) entry.
If one is found, but the packet was not received from the next-hop target towards S (if the entry's SPT bit is True), or was not received from the next-hop target towards G (if the entry's SPT bit is False), then the packet is dropped and no further actions are taken. If no (S,G) entry was found, the router then checks for a matching (*,G) entry.

If neither is found, then the packet is forwarded towards the next-hop peer for G, according to the Multicast RIB. If a matching entry was found, the packet is forwarded to all other targets in the target list.

Forwarding to a target which is an M-IGP component means that the packet is forwarded out any interfaces owned by that component according to that component's multicast forwarding rules.

5.3. BGMP processing of Join and Prune messages and notifications

5.3.1. Receiving Joins

When the BGMP component receives a (*,G) or (S,G) Join alert from another component, or a BGMP (S,G) or (*,G) Join message from an external peer, it searches the tree state table for a matching entry. If an entry is found, and that peer is already listed in the target list, then no further actions are taken.

Otherwise, if no (*,G) or (S,G) entry was found, one is created. In the case of a (*,G), the target list is initialized to contain the next-hop peer towards G, if it is an external peer. If the peer is internal, the target list is initialized to contain the M-IGP component owning the next-hop interface. If there is no next-hop peer (because G is inside the domain), then the target list is initialized to contain the next-hop component. If an (S,G) entry
If an (S,G) entry 435 exists for the same G for which the (*,G) Join is being processed, 436 and the next-hop peers toward S and G are different, the BGMP router 437 must first send a (S,G) Prune message toward the source and clear the 438 SPT bit on the (S,G) entry, before activating the (*,G) entry. 440 When creating (S,G) state, if the source is internal to the BGMP 441 speaker's domain, a "Poison-Reverse" bit (PR-bit) is set. This bit 442 indicates that the router may receive packets matching (S,G) anyway 443 due to the BGMP speaker being a member of a domain on the path 444 between S and the root domain. (Depending on the M-IGP protocol, it 445 may in fact receive such packets anyway only if it is the best exit 446 for G.) 448 The target from which the Join was received is then added to the 449 target list. The router then looks up S or G in the Multicast RIB to 450 find the next-hop EGP peer. If the target list, not including the 451 next-hop target towards G for a (*,G) entry, becomes non-null as a 452 result, the next-hop EGP peer must be notified as follows: 454 a) If the next-hop peer towards G (for a (*,G) entry) is an external 455 peer, a BGMP (*,G) Join message is unicast to the external peer. 456 If the next-hop peer towards S (for an (S,G) entry) is an external 457 peer, and the router does NOT have any active (*,G) state for that 459 Draft BGMP March 2000 461 group address G, a BGMP (S,G) Join message is unicast to the 462 external peer. A BGMP (S,G) Join message is never sent to an 463 external peer by a router that also contains active (*,G) state 464 for the same group. If the next-hop peer towards S (for an (S,G 465 entry) is an external peer and the router DOES have active (*,G) 466 state for that group G, the SPT bit is always set to False. 468 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join 469 alert is sent to the M-IGP component owning the next-hop 470 interface. 
c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent to the M-IGP component owning the next-hop interface.

Finally, if an (S,G) Join is received from an internal peer, the peer should be stored with the M-IGP component target. If (S,G) state exists with the PR-bit set, and the next-hop towards G is through the M-IGP component, an (S,G) Poison-Reverse message is immediately sent to the internal peer.

If an (S,G) Join is received from an external peer, and (S,G) state exists with the PR-bit set, and the local BGMP speaker is the best exit for G, and the next-hop towards G is through the interface towards the external peer, an (S,G) Poison-Reverse message is immediately sent to the external peer.

5.3.2. Receiving Prune Notifications

When the BGMP component receives a (*,G) or (S,G) Prune alert from another component, or a BGMP (*,G) or (S,G) Prune message from an external peer, it searches the tree state table for a matching entry. If no (S,G) entry was found for an (S,G) Prune, but (*,G) state exists, an (S,G) entry is created, with the target list copied from the (*,G) entry. If no matching entry exists, or if the component or peer is not listed in the target list, no further actions are taken.

Otherwise, the component or peer is removed from the target list. If the target list becomes null as a result, the next-hop peer towards G (for a (*,G) entry), or towards S (for an (S,G) entry if and only if the BGMP router does NOT have any corresponding (*,G) entry), must be notified as follows:

a) If the next-hop peer is an external peer, a BGMP (*,G) or (S,G) Prune message is unicast to it.

b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune alert is sent to the M-IGP component owning the next-hop interface.
c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent to the M-IGP component owning the next-hop interface.

5.3.3. Receiving Route Change Notifications

When a border router receives a route for a new prefix in the multicast RIB, or an existing route for a prefix is withdrawn, a route change notification for that prefix must be sent to the BGMP component. In addition, when the next-hop peer (according to the multicast RIB) changes, a route change notification for that prefix must be sent to the BGMP component.

In addition, an internal route for each class-D prefix associated with the domain (if any) MUST be injected into the multicast RIB in the EGP by the domain's border routers.

When a route for a new group prefix is learned, or an existing route for a group prefix is withdrawn, or the next-hop peer for a group prefix changes, a BGMP router updates all affected (*,G) target lists. The router sends a (*,G) Join to the new next-hop target, and a (*,G) Prune to the old next-hop target, as appropriate. In addition, if any (S,G) state exists with the PR-bit set:

o If the BGMP speaker has just become the best exit for G, an (S,G) Poison-Reverse message with the PR-bit set is sent as noted below.

o If the BGMP speaker was the best exit for G and is no longer, an (S,G) Poison-Reverse message with the PR-bit clear is sent as noted below.

The (S,G) Poison-Reverse messages are sent to all external peers on the next-hop interface towards G from which (S,G) Joins have been received.

When an existing route for a source prefix is withdrawn, or the next-hop peer for a source prefix changes, a BGMP router updates all affected (S,G) target lists. The router sends an (S,G) Join to the new next-hop target, and an (S,G) Prune to the old next-hop target, as appropriate.

5.3.4. Receiving (S,G) Poison-Reverse messages

When a BGMP speaker receives an (S,G) Poison-Reverse message from a peer, it sets the PR-bit on the (S,G) state to match the PR-bit in the message, and looks up the next-hop towards G. If the next-hop target is an M-IGP component, it forwards the (S,G) Poison-Reverse message to all internal peers of that component from which it has received (S,G) Joins. If the next-hop target is an external peer on a given interface, it forwards the (S,G) Poison-Reverse message to all external peers on that interface.

When a BGMP speaker receives an (S,G) Poison-Reverse message from an external peer, with the PR-bit set, and the speaker has received no (S,G) Joins from any other peers (e.g., it has received Joins only from the M-IGP, or has (S,G) state due to encapsulation as described in Section 5.4.1), it knows that its own (S,G) Join is unnecessary, and should send an (S,G) Prune.

When a BGMP speaker receives an (S,G) Poison-Reverse message from an internal peer, with the PR-bit set, and the speaker is the best exit for G and has (S,G) prune state, an (S,G) Join message is sent to cancel the prune state and the state is deleted.

5.4. Interaction with M-IGP components

When an M-IGP component on a border router first learns that there are internally-reached members for a group G (whose scope is larger than that domain), a (*,G) Join alert is sent to the BGMP component. Similarly, when an M-IGP component on a border router learns that there are no longer internally-reached members for a group G (whose scope is larger than a single domain), a (*,G) Prune alert is sent to the BGMP component.

At any time, any M-IGP domain MAY decide to join a source-specific branch for some external source S and group G. When the M-IGP
When the M-IGP component in the border router that is the next-hop router for a particular source S learns that a receiver wishes to receive data from S on a source-specific path, an (S,G) Join alert is sent to the BGMP component. When it learns that such receivers no longer exist, an (S,G) Prune alert is sent to the BGMP component. Recall that the BGMP component will generate external source-specific Joins only where the source-specific branch does not coincide with the shared distribution tree for that group.

Finally, we require that the border router that is the next-hop internal peer for a particular address S or G be able to forward data for a matching tree state table entry to all members within the domain. This requirement has the following implications for specific M-IGPs.

5.4.1. Interaction with DVMRP and PIM-DM

DVMRP and PIM-DM are both "broadcast and prune" protocols in which every data packet must pass an RPF check against the packet's source address or be dropped. If the border router receiving packets from an external source is the only BR to inject the route for the source into the domain, then there are no problems. For example, this will always be true for stub domains with a single border router (see Figure 1). Otherwise, the border router receiving packets externally is responsible for encapsulating the data to any other border routers that must inject the data into the domain for RPF checks to succeed.

When an intended border router injector for a source receives encapsulated packets from another border router in its domain, it should create source-specific (S,G) BGMP state. Note that the border router may be configured to do this on a data-rate triggered basis, so that state is not created for very low data-rate or intermittent sources.
If source-specific state is created, its incoming interface points to the virtual encapsulation interface from the border router that forwarded the packet, and its SPT flag is initialized to False.

When the (S,G) BGMP state is created, the BGMP component in turn sends a BGMP (S,G) Join message to the next-hop external peer towards S if there is no (*,G) state for the same group G. The (S,G) BGMP state will have the SPT flag set to False if (*,G) BGMP state is present.

When the first data packet from S arrives from the external peer and matches the BGMP (S,G) state, and there is no (*,G) state, the router sets the SPT flag to True, resets the incoming interface to point to the external peer, and sends a BGMP (S,G) Prune message to the border router that was encapsulating the packets (e.g., in Figure 1, BR11 sends the (Src_A,G) Prune to BR12). When the border router with (*,G) state receives the Prune for (S,G), it deletes the sending border router from its list of targets.

If the decapsulator receives an (S,G) Poison-Reverse message with the PR-bit set, it forwards the message to the encapsulator (which may in turn forward it up the shared tree according to normal BGMP rules), and both delete their BGMP (S,G) state.

PIM-DM and DVMRP present an additional problem: no protocol mechanism exists for joining and pruning entire groups; only joins and prunes for individual sources are available. We therefore require that some form of Domain-Wide Reports (DWRs) [DWR] be available within such domains. Such messages provide the ability to join and prune an entire group across the domain. One simple heuristic to approximate DWRs is to assume that if there are any internally-reached members, then at least one of them is a sender.
With this heuristic, the presence of any M-IGP (S,G) state for internally-reached sources can be used instead. Sending a data packet to a group is then equivalent to sending a DWR for the group.

5.4.2. Interaction with PIM-SM

Protocols such as PIM-SM build unidirectional shared and source-specific trees. As with DVMRP and PIM-DM, every data packet must pass an RPF check against some group-specific or source-specific address.

The fewest encapsulations/decapsulations will be done when the intra-domain tree is rooted at the next-hop internal peer towards G (which becomes the RP), since in general that router will receive the most packets from external sources. To achieve this, each BGMP border router to a PIM-SM domain should send Candidate-RP-Advertisements within the domain for those groups for which it is the shared-tree ingress router. When the border router that is the RP for a group G receives an external data packet, it forwards the packet according to the M-IGP (i.e., PIM-SM) shared-tree outgoing interface list.

Other border routers will receive data packets from external sources that are farther down the bidirectional tree of domains. When a border router that is not the RP receives an external packet for which it does not have a source-specific entry, it treats the packet as if it came from a local source by creating (S,G) state with the Register flag set, following normal PIM-SM rules; the border router then encapsulates the data packets in PIM-SM Registers and unicasts them to the RP for the group. As explained above, the RP for the inter-domain group will be one of the other border routers of the domain.

If a source's data rate is high enough, DRs within the PIM-SM domain may switch to the shortest path tree.
If the shortest path to an external source is via the group's ingress router for the shared tree, the new (S,G) state in the BGMP border router will not cause BGMP (S,G) Joins, because that border router will already have (*,G) state. If, however, the shortest path to an external source is via some other border router, that border router will create (S,G) BGMP state in response to the M-IGP (S,G) Join alert. In this case, because there is no local (*,G) state to suppress it, the border router will send a BGMP (S,G) Join to the next-hop external peer towards S in order to pull the data down directly (see BR11 in Figure 1). As in normal PIM-SM operation, those PIM-SM routers that have (*,G) and (S,G) state pointing to different incoming interfaces will prune that source off the shared tree. Therefore, all internal interfaces may eventually be pruned off the internal shared tree.

After the border router sends a BGMP (S,G) Join, if its (S,G) state has the PR-bit clear, an (S,G) Poison-Reverse message (with the PR-bit clear) is sent to the ingress router for G. The ingress router then creates (S,G) state if it does not already exist, and removes the next hop towards G from the target list.

If the border router later receives an (S,G) Poison-Reverse message with the PR-bit set, the Poison-Reverse message is forwarded to the ingress router for G. The best-exit router then creates (S,G) state if it does not already exist, and puts the next hop towards G in the target list if it is not already present.

5.4.3. Interaction with CBT

CBT builds bidirectional shared trees but must address two points of compatibility with BGMP. First, CBT cannot accommodate more than one border router injecting a packet.
Therefore, if a CBT domain has multiple external connections, the M-IGP components of the border routers are responsible for ensuring that only one of them will inject data from any given source.

Second, CBT cannot process source-specific Joins or Prunes. Two options thus exist for each CBT domain:

Option A:
   The CBT component interprets an (S,G) Join alert as if it were a (*,G) Join alert, as described in [INTEROP]. That is, if it is not already on the core-tree for G, it sends a CBT (*,G) JOIN-REQUEST message towards the core for G. Similarly, when the CBT component receives an (S,G) Prune alert and the child interface list for the group is NULL, it sends a (*,G) QUIT_NOTIFICATION towards the core for G. This option has the disadvantage of pulling all data for the group G down to the CBT domain when no members exist.

Option B:
   The CBT domain does not propagate any source routes (i.e., non-class-D routes) to its external peers for the Multicast RIB unless it is known that no other path exists to that prefix (e.g., routes for prefixes internal to the domain or in a singly-homed customer's domain may be propagated). This ensures that source-specific Joins are never received unless the source's data already passes through the domain on the shared tree, in which case the (S,G) Join need not be propagated anyway. BGMP border routers will only send source-specific Joins or Prunes to an external peer if that external peer advertises source prefixes in the EGP. If a BGMP-CBT border router does receive an (S,G) Join or Prune, it should ignore the message.

To minimize encapsulations and decapsulations, CBTv2 BRs may follow the same scheme as described under PIM-SM above, sending Candidate-Core advertisements for those groups for which they are the shared-tree ingress router.

5.4.4. Interaction with MOSPF

As with CBTv2, MOSPF cannot process source-specific Joins or Prunes, and the same two options are available. Therefore, an MOSPF domain may either:

Option A:
   send a Group-Membership-LSA for all of G in response to an (S,G) Join alert, and "prematurely age" it out (when no other downstream members exist) in response to an (S,G) Prune alert, OR

Option B:
   not propagate any source routes (i.e., non-class-D routes) to its external peers for the Multicast RIB unless it is known that no other path exists to that prefix (e.g., routes for prefixes internal to the domain or in a singly-homed customer's domain may be propagated).

5.5. Operation over Multi-access Networks

Multiaccess links require special handling to prevent duplicates. The following mechanism enables BGMP to operate over multiaccess links that do not run an M-IGP. It avoids broadcast-and-prune behavior and does not require (S,G) state.

To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF message to exchange "forwarder preference" values for each prefix. The peer with the highest forwarder preference becomes the designated forwarder, with ties broken in favor of the lowest BGMP Identifier. The designated forwarder is the router responsible for forwarding packets up the tree, and is the peer to which Joins will be sent.

When BGMP first learns that a route exists in the multicast RIB whose next-hop interface is NOT the multiaccess link, the BGMP router sends a BGMP FWDR_PREF message for the prefix to all BGMP peers on the LAN. The FWDR_PREF message contains a "forwarder preference value" for the local router, and the same value MUST be sent to all peers on the LAN. Likewise, when the prefix is no longer reachable, a FWDR_PREF of 0 is sent to all peers on the LAN.
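The election rule above (highest forwarder preference wins, ties go to the lowest BGMP Identifier) can be illustrated with a short sketch. This is not part of the protocol definition: the per-prefix table shape, the function names `elect_forwarder` and `on_fwdr_pref`, the restriction to IPv4 BGMP Identifiers, and the treatment of a preference of 0 as "not eligible to forward" are all assumptions made for illustration.

```python
from ipaddress import IPv4Address
from typing import Dict, Optional, Tuple

# Hypothetical per-prefix table: BGMP Identifier -> last FWDR_PREF value heard
# on the LAN for that prefix (the local router's own value included).
PrefTable = Dict[IPv4Address, int]


def elect_forwarder(prefs: PrefTable) -> Optional[IPv4Address]:
    """Pick the designated forwarder for one prefix on a multiaccess LAN.

    Highest preference wins; ties go to the numerically lowest BGMP
    Identifier. Assumption: a preference of 0 (prefix unreachable) makes a
    router ineligible, so None is returned if nobody can forward.
    """
    eligible = [(pref, rid) for rid, pref in prefs.items() if pref > 0]
    if not eligible:
        return None
    # Negating the preference lets min() prefer the highest value first and
    # then the lowest identifier among routers with equal preference.
    _, winner = min(eligible, key=lambda e: (-e[0], int(e[1])))
    return winner


def on_fwdr_pref(prefs: PrefTable, sender: IPv4Address,
                 value: int) -> Tuple[Optional[IPv4Address], Optional[IPv4Address]]:
    """Record a received FWDR_PREF value and return (old, new) forwarder.

    When old != new, the caller would move tree state whose parent target
    was the old forwarder by sending Joins to the new one and Prunes to the
    old one.
    """
    old = elect_forwarder(prefs)
    prefs[sender] = value
    return old, elect_forwarder(prefs)


if __name__ == "__main__":
    lan = {IPv4Address("192.0.2.1"): 100,
           IPv4Address("192.0.2.2"): 100,
           IPv4Address("192.0.2.3"): 50}
    print(elect_forwarder(lan))  # 192.0.2.1 (tie at 100, lowest identifier)
    # 192.0.2.1 withdraws the prefix by announcing a preference of 0.
    print(on_fwdr_pref(lan, IPv4Address("192.0.2.1"), 0))
    # (IPv4Address('192.0.2.1'), IPv4Address('192.0.2.2'))
```

Identifiers are compared as integers rather than as dotted-quad strings on purpose: as strings, "192.0.2.10" would sort before "192.0.2.9".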
Whenever a BGMP router calculates the next-hop peer towards a particular address, and that peer is reached over a BGMP-owned multiaccess LAN, the designated forwarder is used instead.

When a BGMP router receives a FWDR_PREF message from a peer, it looks up the matching route in its multicast RIB and calculates the new designated forwarder. If the router has tree state entries whose parent target was the old forwarder, it sends Joins to the new forwarder and Prunes to the old forwarder.

When a BGMP router that is NOT the designated forwarder receives a packet on the multiaccess link, the packet is silently dropped.

Finally, this mechanism prevents duplicates only where full peering exists on a "logical" link. Where full peering does not exist, steps must be taken (outside of BGMP) to present separate logical interfaces to BGMP, each of which is a link with full peering. This might entail, for example, using different link-layer address mappings, doing encapsulation, or changing the physical media.

5.6. Interaction between (S,G) state and G-routes

As discussed earlier, routers with (*,G) state will not propagate (S,G) Joins. However, a special case occurs when (S,G) state coincides with the G-route. When this occurs, care must be taken so that the data reaches the root domain without causing duplicates or black holes. For this reason, (S,G) state on the path between the source and the root domain is annotated as being "poison-reversed". A PR-bit is kept for this purpose, which is updated by (UN)POISON_REVERSE messages.

6. Interaction with address allocation

6.1. Requirements for BGMP components

Each border router must be able to determine (e.g., from MASC [MASC]) which class-D prefixes (if any) belong to each domain in which an M-IGP component resides, so that it can inject routes for them into the routing table.

7. Transition Strategy

There have been significant barriers to multicast deployment in Internet backbones. While many of the problems with the current DVMRP backbone (MBone) have been documented in [ISSUES], most of these problems require longer-term engineering solutions. However, much can be done with existing technologies to enable deployment and to put in place an architecture that will allow a smooth transition to the next generation of inter-domain multicast routing protocols (i.e., BGMP). This section proposes a near-term transition strategy and architecture that is designed to be simple and risk-neutral, and to provide a smooth, incremental transition path to BGMP. In addition, the transition architecture provides improved convergence properties, some initial policy control, and the opportunity for providers to run either native or tunneled multicast backbones and exchanges.

The transition strategy proposed here is to initially use MBGP [MBGP] to provide the desired convergence and policy control properties, and PIM-DM for multicast data forwarding. Once this architecture is in place, backbones and exchanges can incrementally transition to BGMP, and domains running other M-IGPs may be incorporated more fully.

Since the current MBone uses a broadcast-and-prune backbone running DVMRP, BGMP may view the entire MBone as a single multi-homed stub domain (with a new AS number). The members-are-senders heuristic can then be used initially to provide membership notifications within this stub domain.

A BGMP backbone can then be formed by designating one or more neutral PIM-DM domains (say, exchanges) as initial BGMP backbones. Each exchange is then associated with a group prefix, which is injected into the Multicast RIB by all MBGP/BGMP border routers on that exchange.
Any domain that meets the following constraints may then transition from a normal MBone-connected domain to one running BGMP:

(1) It must peer with another BGMP domain and participate in MBGP to propagate routes in the Multicast RIB.

(2) It must establish an internal (to the MBone AS) EGP (e.g., iBGP) peer relationship with other border routers of the MBone "stub" domain, as is done with unicast routing. We expect this to eventually involve the use of one or more route reflectors [REFLECT] inside the MBone domain.

(3) If the transition will partition the MBone "stub" domain, it must be ensured that the MBone domain is administratively split into multiple domains, each with a different multicast AS number.

7.1. Preventing transit through the MBone stub

We desire that two ASes which are mutually reachable through BGMP use paths that do not pass through the MBone stub domain. This is illustrated in Figure 2, where the MBone stub is AS 5, which is multi-homed to both AS 3 and AS 4. Paths between sources and destinations that have already transitioned to MBGP/BGMP should not use AS 5 as transit unless no other path exists.

   ----------------------\   /----------------------------
                         |   |
         DVMRP   /----\  |   |  /----\   IGP/iBGP
   ..............| BR |+++++++++| BR |-----------
                 \----/  | E |  \----/
                    +    | B |     +       AS 3
         MBone      +    | G |     +
                    +    | P \-----+----------------------
      AS 5     iBGP +    |         +  eBGP
                    +    |   /-----+----------------------
                    +    |   |     +
                    +    |   |     +
         DVMRP   /----\  |   |  /----\   IGP/iBGP
   ..............| BR |+++++++++| BR |-----------
                 \----/  |   |  \----/
                         |   |             AS 4
                         |   |
   ----------------------/   \----------------------------

             Figure 2: Preventing Transit through MBone Stub

This requirement is easily met using standard BGP policy mechanisms.
The MBone border routers should prefer EGP routes to DVMRP routes, since DVMRP cannot tag routes as being external. Thus, external routes may appear in the DVMRP routing table but will not be imported into the EGP, since they will be overridden by iBGP routes.

Other EGP routers should prefer routes whose AS path does not contain the well-known MBone AS number. This ensures that the route through the MBone stub is not used unless no other path exists. For safety, routes whose AS path begins with the MBone AS should receive the worst preference.

8. Message Formats

This section describes the message formats used by BGMP.

Messages are sent over a reliable transport protocol connection. A message is processed only after it has been entirely received. The maximum message size is 4096 octets, and all implementations are required to support this maximum message size.

All fields labelled "Reserved" below must be transmitted as 0 and ignored upon receipt.

8.1. Message Header Format

Each message has a fixed-size (4-octet) header. There may or may not be a data portion following the header, depending on the message type. The layout of these fields is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Length:
   This 2-octet unsigned integer indicates the total length of the message, including the header, in octets. It allows, for example, the start of the next message to be located in the transport-level stream. The value of the Length field must always be at least 4 and no greater than 4096, and may be further constrained depending on the message type.
   No "padding" of extra data after the message is allowed, so the Length field must have the smallest value required given the rest of the message.

Type:
   This 1-octet unsigned integer indicates the type code of the message. The following type codes are defined:

      1 - OPEN
      2 - UPDATE
      3 - NOTIFICATION
      4 - KEEPALIVE

8.2. OPEN Message Format

After a transport protocol connection is established, the first message sent by each side is an OPEN message. If the OPEN message is acceptable, a KEEPALIVE message confirming the OPEN is sent back. Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION messages may be exchanged.

In addition to the fixed-size BGMP header, the OPEN message contains the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    | Rsvd| AddrFam |           Hold Time           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               BGMP Identifier (variable length)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                     (Optional Parameters)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version:
   This 1-octet unsigned integer indicates the protocol version number of the message. The current BGMP version number is 1.

AddrFam:
   The IANA-assigned address family number of the BGMP Identifier. These include (among others):

      Number   Description
      ------   -----------
        1      IP (IP version 4)
        2      IPv6 (IP version 6)

Hold Time:
   This 2-octet unsigned integer indicates the number of seconds that the sender proposes for the value of the Hold Timer.
   Upon receipt of an OPEN message, a BGMP speaker MUST calculate the value of the Hold Timer by using the smaller of its configured Hold Time and the Hold Time received in the OPEN message. The Hold Time MUST be either zero or at least three seconds. An implementation may reject connections on the basis of the Hold Time. The calculated value indicates the maximum number of seconds that may elapse between the receipt of successive KEEPALIVE and/or UPDATE messages from the sender.

BGMP Identifier:
   This 4-octet (for IPv4) or 16-octet (for IPv6) unsigned integer indicates the BGMP Identifier of the sender. A given BGMP speaker sets the value of its BGMP Identifier to a globally-unique value assigned to that BGMP speaker (e.g., an IPv4 address). The value of the BGMP Identifier is determined on startup and is the same for every BGMP session opened.

Optional Parameters:
   This field may contain a list of optional parameters, each encoded as a (Parameter Type, Parameter Length, Parameter Value) triplet. The combined length of all optional parameters can be derived from the Length field in the message header.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
   |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

   Parameter Type is a one-octet field that unambiguously identifies individual parameters. Parameter Length is a one-octet field that contains the length of the Parameter Value field in octets. Parameter Value is a variable-length field that is interpreted according to the value of the Parameter Type field.

This document defines the following Optional Parameters:

a) Authentication Information (Parameter Type 1):
   This optional parameter may be used to authenticate a BGMP peer.
   The Parameter Value field contains a 1-octet Authentication Code followed by variable-length Authentication Data.

    0 1 2 3 4 5 6 7 8
   +-+-+-+-+-+-+-+-+
   |  Auth. Code   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                   |
   |                Authentication Data                |
   |                                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Authentication Code:
      This 1-octet unsigned integer indicates the authentication mechanism being used. Whenever an authentication mechanism is specified for use within BGMP, two things must be included in the specification:

      - the value of the Authentication Code which indicates use of the mechanism, and
      - the form and meaning of the Authentication Data.

      Note that a separate authentication mechanism may be used in establishing the transport-level connection.

   Authentication Data:
      This is a variable-length field whose form and meaning depend on the Authentication Code.

The minimum length of the OPEN message is 12 octets (including the message header).

b) Capability Information (Parameter Type 2):
   This Optional Parameter is used by a BGMP speaker to convey to its peer the list of capabilities supported by the speaker. The parameter contains one or more (Capability Code, Capability Length, Capability Value) triples, each encoded as shown below:

   +------------------------------+
   | Capability Code (1 octet)    |
   +------------------------------+
   | Capability Length (1 octet)  |
   +------------------------------+
   | Capability Value (variable)  |
   +------------------------------+

   Capability Code:
      Capability Code is a one-octet field that unambiguously identifies individual capabilities.

   Capability Length:
      Capability Length is a one-octet field that contains the length of the Capability Value field in octets.
   Capability Value:
      Capability Value is a variable-length field that is interpreted according to the value of the Capability Code field.

   A particular capability, as identified by its Capability Code, may occur more than once within the Optional Parameter.

   This document reserves Capability Codes 128-255 for vendor-specific applications, and reserves Capability Code 0.

   Capability Codes (other than those reserved for vendor-specific use) are assigned only by the IETF consensus process and IESG approval.

8.3. UPDATE Message Format

UPDATE messages are used to transfer Join/Prune/FwdrPref information between BGMP peers. The UPDATE message always includes the fixed-size BGMP header and one or more attributes, as described below.

The message format below allows compact encoding of (*,G) Joins and Prunes, while providing the flexibility needed for other updates, such as (S,G) Joins and Prunes towards sources as well as on the shared tree. In the discussion below, an Encoded-Address-Prefix has the following form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |EnTyp| AddrFam |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Address (variable length)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mask (variable length)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

EnTyp:
   0 - All 1's Mask. The Mask field is 0 octets long.
   1 - Mask length included. The Mask field is 4 octets long and contains the mask length, in bits.
   2 - Full Mask included. The Mask field is the same length as the Address field and contains the full bitmask.

AddrFam:
   The IANA-assigned address family number of the encoded prefix.

Address:
   The address associated with the given prefix to be encoded.
   Its length is determined by the Address Family.

Mask:
   The mask associated with the given prefix. The format (or absence) of this field is determined by the EnTyp field.

Each attribute has the following form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |    Data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All attributes are 4-octet aligned.

Length:
   The length of the entire attribute, including the Length, Type, and Data fields. If other attributes are nested within the Data field, the length includes the size of all such nested attributes.

Type:
   Types 128-255 are reserved for "optional" attributes. If a required attribute is unrecognized, a NOTIFICATION will be sent, and the connection will be closed if the error is a fatal one. Unrecognized optional attributes are simply ignored. The following type codes are defined:

      0 - JOIN
      1 - PRUNE
      2 - GROUP
      3 - SOURCE
      4 - FWDR_PREF
      5 - POISON_REVERSE

a) JOIN (Type Code 0)

The JOIN attribute indicates that all GROUP or SOURCE attributes nested immediately within the JOIN attribute should be joined.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=0     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested within a JOIN attribute.

b) PRUNE (Type Code 1)

The PRUNE attribute indicates that all GROUP or SOURCE attributes nested immediately within the PRUNE attribute should be pruned.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=1     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested within a PRUNE attribute.

c) GROUP (Type Code 2)

The GROUP attribute identifies a given group prefix. In addition, any attributes nested immediately within the GROUP attribute also apply to the given group prefix.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=2     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Encoded-Address-Prefix
      The multicast group prefix to be joined or pruned, in the format described above.

   Nested Attributes
      No GROUP, SOURCE, or FWDR_PREF attributes may be immediately nested within a GROUP attribute.

d) SOURCE (Type Code 3)

The SOURCE attribute identifies a given source prefix. In addition, any attributes nested immediately within the SOURCE attribute also apply to the given source prefix.
      The SOURCE attribute has the following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |    Type=3     |               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
      |                                                               |
      |                    Encoded-Address-Prefix                     |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Nested Attributes (optional) ...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix:
         The source-prefix, in the format described above.

      Nested Attributes:
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a SOURCE attribute.

   e) FWDR_PREF (Type Code 4)

      The FWDR_PREF attribute provides a forwarder preference value for
      all GROUP or SOURCE attributes nested immediately within the
      FWDR_PREF attribute. It is used by a BGMP speaker to inform other
      BGMP speakers of the originating speaker's degree of preference
      for a given group or source prefix. Usage of this attribute is
      described in Section 5.5.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |    Type=4     |   Reserved    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Preference Value                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Nested Attributes ...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Preference Value:
         A 32-bit non-negative integer.

      Nested Attributes:
         No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
         nested within a FWDR_PREF attribute.
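   The following Python sketch shows how the Length rule and the
   "immediately nested" restrictions of attributes a) through e)
   combine when the attributes of an UPDATE are built. It is a minimal
   illustration under stated assumptions, not a reference encoder. It
   assumes a 2-octet Length and a 1-octet Type, sends Reserved octets as
   zero, and treats each Encoded-Address-Prefix as opaque bytes, since
   that format is not modeled here. It also rejects any attribute whose
   total length is not a multiple of 4 rather than padding it, because
   this section does not say how padding is applied. All helper names
   (encode_attr, join, group, and so on) are hypothetical.

```python
import struct

# Attribute Type codes as listed above.
JOIN, PRUNE, GROUP, SOURCE, FWDR_PREF, POISON_REVERSE = range(6)

# Types that may NOT be immediately nested inside each parent type,
# transcribed from the "Nested Attributes" rules of a) through e).
# POISON_REVERSE (SOURCE children only) is not modeled in this sketch.
FORBIDDEN_CHILDREN = {
    JOIN: {JOIN, PRUNE, FWDR_PREF},
    PRUNE: {JOIN, PRUNE, FWDR_PREF},
    GROUP: {GROUP, SOURCE, FWDR_PREF},
    SOURCE: {GROUP, SOURCE, FWDR_PREF},
    FWDR_PREF: {JOIN, PRUNE, FWDR_PREF},
}


def encode_attr(atype, fixed, children):
    """Encode one attribute as Length (2 octets, assumed), Type (1 octet),
    the type-specific fixed fields, then the already-encoded children.

    `children` holds (type, bytes) pairs as returned by the helpers below,
    so the nesting rules can be checked without re-parsing the bytes.
    """
    for child_type, _ in children:
        if child_type in FORBIDDEN_CHILDREN.get(atype, set()):
            raise ValueError(f"type {child_type} may not be nested in type {atype}")
    body = fixed + b"".join(data for _, data in children)
    length = 3 + len(body)  # Length covers header, data and nested attributes
    if length % 4 != 0:
        raise ValueError(f"attribute length {length} is not 4-byte aligned")
    return atype, struct.pack("!HB", length, atype) + body


def join(*children):
    return encode_attr(JOIN, b"\x00", children)  # one Reserved octet


def prune(*children):
    return encode_attr(PRUNE, b"\x00", children)  # one Reserved octet


def group(prefix, *children):
    return encode_attr(GROUP, prefix, children)


def source(prefix, *children):
    return encode_attr(SOURCE, prefix, children)


def fwdr_pref(value, *children):
    # One Reserved octet, then the 32-bit Preference Value.
    return encode_attr(FWDR_PREF, b"\x00" + struct.pack("!I", value), children)


# Hypothetical 5-octet placeholders standing in for real
# Encoded-Address-Prefix values (5 octets keeps 3 + 5 a multiple of 4).
G = bytes(5)
S = bytes(5)

if __name__ == "__main__":
    # (S,G) Join towards S, written GROUP ( JOIN ( SOURCE ) ).
    _, msg = group(G, join(source(S)))
    print(len(msg), msg.hex())
```

   Each helper returns the attribute's type together with its bytes, so
   a parent can enforce the nesting rules before wrapping its children.
   For example, group(G, source(S)) raises ValueError because a SOURCE
   attribute may not be immediately nested within a GROUP attribute.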
   f) POISON_REVERSE (Type Code 5)

      The POISON_REVERSE attribute provides a "poison-reverse" (PR-bit)
      value for all SOURCE attributes nested immediately within the
      POISON_REVERSE attribute. A BGMP speaker uses it to inform the
      BGMP speakers from which it has received (S,G) Joins that they
      are on the path of domains between the source and the root
      domain.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |    Type=5     |  Reserved   |P|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Nested Attributes ...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      P:
         The PR-bit value.

      Nested Attributes:
         No attribute defined in this document other than SOURCE may be
         immediately nested within a POISON_REVERSE attribute.

8.4. Encoding examples

   Below are examples of how various updates are built using nested
   attributes, where A ( B ) denotes that attribute B is nested within
   attribute A.

   (*,G-prefix) Join: JOIN ( GROUP )
   (*,G-prefix) Prune: PRUNE ( GROUP )
   (S,G) Join towards S: GROUP ( JOIN ( SOURCE ) )
   (S,G) Join cancelling prune towards G: GROUP ( JOIN ( SOURCE ) )
   (S,G) Prune towards S: GROUP ( PRUNE ( SOURCE ) )
   (S,G) Prune towards G: GROUP ( PRUNE ( SOURCE ) )
   Switch from (*,G) to (S,G): PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
   Switch from (S,G) to (*,G): JOIN ( GROUP )
   Initial (*,G) Join with S pruned: JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
   Forwarder preference announcement for G-prefix: FWDR_PREF ( GROUP )
   Forwarder preference announcement for S-prefix: FWDR_PREF ( SOURCE )

8.5. KEEPALIVE Message Format

   BGMP does not use any transport protocol-based keep-alive mechanism
   to determine if peers are reachable.
   Instead, KEEPALIVE messages are exchanged between peers often enough
   not to cause the Hold Timer to expire. A reasonable maximum time
   between the last KEEPALIVE or UPDATE message sent and the time at
   which a KEEPALIVE message is sent would be one third of the Hold
   Time interval. KEEPALIVE messages MUST NOT be sent more frequently
   than one per second. An implementation MAY adjust the rate at which
   it sends KEEPALIVE messages as a function of the Hold Time interval.

   If the negotiated Hold Time interval is zero, then periodic KEEPALIVE
   messages MUST NOT be sent.

   A KEEPALIVE message consists of only a message header and has a
   length of 4 octets.

8.6. NOTIFICATION Message Format

   A NOTIFICATION message is sent when an error condition is detected.
   If the error is a fatal one, the BGMP connection is closed
   immediately after the message is sent.

   In addition to the fixed-size BGMP header, the NOTIFICATION message
   contains the following fields:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |O| Error code  | Error subcode |             Data              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   O-bit:

      Open-bit. If set, it indicates that the error is not fatal. If
      clear, the connection will be closed.

   Error Code:

      This 1-octet unsigned integer indicates the type of NOTIFICATION.
      The following Error Codes have been defined:

         Error Code   Symbolic Name                Reference

             1        Message Header Error         Section 9.1
             2        OPEN Message Error           Section 9.2
             3        UPDATE Message Error         Section 9.3
             4        Hold Timer Expired           Section 9.5
             5        Finite State Machine Error   Section 9.6
             6        Cease                        Section 9.7

   Error Subcode:

      This 1-octet unsigned integer provides more specific information
      about the nature of the reported error. Each Error Code may have
      one or more Error Subcodes associated with it. If no appropriate
      Error Subcode is defined, then a zero (Unspecific) value is used
      for the Error Subcode field. The notation (MC) below indicates
      that the error is a fatal one and that the O-bit must be clear.
      Non-fatal subcodes SHOULD be sent with the O-bit set.

      Message Header Error subcodes:

          2 - Bad Message Length (MC)
          3 - Bad Message Type (MC)

      OPEN Message Error subcodes:

          1 - Unsupported Version Number (MC)
          4 - Unsupported Optional Parameter
          5 - Authentication Failure (MC)
          6 - Unacceptable Hold Time (MC)
          7 - Unsupported Capability (MC)

      UPDATE Message Error subcodes:

          1 - Malformed Attribute List (MC)
          2 - Unrecognized Attribute Type
          5 - Attribute Length Error (MC)
         10 - Invalid Address
         11 - Invalid Mask
         13 - Unrecognized Address Family

   Data:

      This variable-length field is used to diagnose the reason for the
      NOTIFICATION. The contents of the Data field depend upon the Error
      Code and Error Subcode. See Section 9 below for more details.

      Note that the length of the Data field can be determined from the
      message Length field by the formula:

         Message Length = 6 + Data Length

      The minimum length of the NOTIFICATION message is 6 octets
      (including the message header).

9. BGMP Error Handling

   This section describes actions to be taken when errors are detected
   while processing BGMP messages. BGMP error handling is similar to
   that of BGP [BGP].

   When any of the conditions described here are detected, a
   NOTIFICATION message with the indicated Error Code, Error Subcode,
   and Data fields is sent, and the BGMP connection is closed if the
   error is a fatal one. If no Error Subcode is specified, then a zero
   must be used.

   The phrase "the BGMP connection is closed" means that the transport
   protocol connection has been closed and that all resources for that
   BGMP connection have been deallocated. The remote peer is removed
   from the target list of all tree state entries.

   Unless specified explicitly, the Data field of the NOTIFICATION
   message that is sent to indicate an error is empty.

9.1. Message Header error handling

   All errors detected while processing the Message Header are indicated
   by sending the NOTIFICATION message with Error Code Message Header
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   If the Length field of the message header is less than 4 or greater
   than 4096, or if the Length field of an OPEN message is less than
   the minimum length of the OPEN message, or if the Length field of an
   UPDATE message is less than the minimum length of the UPDATE message,
   or if the Length field of a KEEPALIVE message is not equal to 4, then
   the Error Subcode is set to Bad Message Length. The Data field
   contains the erroneous Length field.

   If the Type field of the message header is not recognized, then the
   Error Subcode is set to Bad Message Type. The Data field contains
   the erroneous Type field.

9.2. OPEN message error handling

   All errors detected while processing the OPEN message are indicated
   by sending the NOTIFICATION message with Error Code OPEN Message
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   If the version number contained in the Version field of the received
   OPEN message is not supported, then the Error Subcode is set to
   Unsupported Version Number. The Data field is a 2-octet unsigned
   integer, which indicates the largest locally supported version number
   less than the version the remote BGMP peer bid (as indicated in the
   received OPEN message).

   If the Hold Time field of the OPEN message is unacceptable, then the
   Error Subcode MUST be set to Unacceptable Hold Time. An
   implementation MUST reject Hold Time values of one or two seconds.
   An implementation MAY reject any proposed Hold Time. An
   implementation that accepts a Hold Time MUST use the negotiated
   value for the Hold Time.

   If one of the Optional Parameters in the OPEN message is not
   recognized, then the Error Subcode is set to Unsupported Optional
   Parameter.

   If the OPEN message carries Authentication Information (as an
   Optional Parameter), then the corresponding authentication procedure
   is invoked. If the authentication procedure (based on Authentication
   Code and Authentication Data) fails, then the Error Subcode is set to
   Authentication Failure.

   If the OPEN message indicates that the peer does not support a
   capability that the receiver requires, the receiver may send a
   NOTIFICATION message to the peer and terminate peering. The Error
   Subcode in the message is set to Unsupported Capability. The Data
   field in the NOTIFICATION message lists the set of capabilities that
   cause the speaker to send the message.
   Each such capability is encoded the same way as it was encoded in the
   received OPEN message.

9.3. UPDATE message error handling

   All errors detected while processing the UPDATE message are indicated
   by sending the NOTIFICATION message with Error Code UPDATE Message
   Error. The Error Subcode elaborates on the specific nature of the
   error.

   If any recognized attribute has an Attribute Length that conflicts
   with the expected length (based on the attribute type code), then the
   Error Subcode is set to Attribute Length Error. The Data field
   contains the erroneous attribute (type, length, and value).

   If the Encoded-Address-Prefix field in some attribute is
   syntactically incorrect, then the Error Subcode is set to Invalid
   Prefix Field.

   If any other error is encountered when processing attributes (such
   as invalid nestings), then the Error Subcode is set to Malformed
   Attribute List, and the problematic attribute is included in the
   Data field.

9.4. NOTIFICATION message error handling

   If a peer sends a NOTIFICATION message, and there is an error in that
   message, there is unfortunately no means of reporting this error via
   a subsequent NOTIFICATION message. Any such error, such as an
   unrecognized Error Code or Error Subcode, should be noticed, logged
   locally, and brought to the attention of the administration of the
   peer. The means to do this, however, lies outside the scope of this
   document.

9.5. Hold Timer Expired error handling

   If a system does not receive successive KEEPALIVE, UPDATE, and/or
   NOTIFICATION messages within the period specified in the Hold Time
   field of the OPEN message, then a NOTIFICATION message with Error
   Code Hold Timer Expired must be sent and the BGMP connection closed.

9.6. Finite State Machine error handling

   Any error detected by the BGMP Finite State Machine (e.g., receipt of
   an unexpected event) is indicated by sending the NOTIFICATION message
   with Error Code Finite State Machine Error.

9.7. Cease

   In the absence of any fatal errors (that are indicated in this
   section), a BGMP peer may choose at any given time to close its BGMP
   connection by sending the NOTIFICATION message with Error Code Cease.
   However, the Cease NOTIFICATION message must not be used when a fatal
   error indicated by this section does exist.

9.8. Connection collision detection

   If a pair of BGMP speakers try simultaneously to establish a TCP
   connection to each other, then two parallel connections between this
   pair of speakers might well be formed. We refer to this situation as
   connection collision. Clearly, one of these connections must be
   closed.

   Based on the value of the BGMP Identifier, a convention is
   established for detecting which BGMP connection is to be preserved
   when a collision does occur. The convention is to compare the BGMP
   Identifiers of the peers involved in the collision and to retain only
   the connection initiated by the BGMP speaker with the higher-valued
   BGMP Identifier.

   Upon receipt of an OPEN message, the local system must examine all of
   its connections that are in the OpenConfirm state. A BGMP speaker
   may also examine connections in an OpenSent state if it knows the
   BGMP Identifier of the peer by means outside of the protocol. If
   among these connections there is a connection to a remote BGMP
   speaker whose BGMP Identifier equals the one in the OPEN message,
   then the local system performs the following collision resolution
   procedure:

   1. The BGMP Identifier of the local system is compared to the BGMP
      Identifier of the remote system (as specified in the OPEN
      message).

   2. If the value of the local BGMP Identifier is less than the remote
      one, the local system closes the BGMP connection that already
      exists (the one that is already in the OpenConfirm state) and
      accepts the BGMP connection initiated by the remote system.

   3. Otherwise, the local system closes the newly created BGMP
      connection (the one associated with the newly received OPEN
      message) and continues to use the existing one (the one that is
      already in the OpenConfirm state).

   Comparing BGMP Identifiers is done by treating them as (4-octet long)
   unsigned integers.

   A connection collision with an existing BGMP connection that is in
   the Established state causes unconditional closing of the newly
   created connection. Note that a connection collision cannot be
   detected with connections that are in the Idle, Connect, or Active
   states.

   Closing the BGMP connection (that results from the collision
   resolution procedure) is accomplished by sending the NOTIFICATION
   message with the Error Code Cease.

10. BGMP Version Negotiation

   BGMP speakers may negotiate the version of the protocol by making
   multiple attempts to open a BGMP connection, starting with the
   highest version number each supports. If an open attempt fails with
   an Error Code OPEN Message Error and an Error Subcode Unsupported
   Version Number, then the BGMP speaker has available the version
   number it tried, the version number its peer tried, the version
   number passed by its peer in the NOTIFICATION message, and the
   version numbers that it supports. If the two peers do support one or
   more common versions, then this will allow them to rapidly determine
   the highest common version. In order to support BGMP version
   negotiation, future versions of BGMP must retain the format of the
   OPEN and NOTIFICATION messages.

10.1. BGMP Capability Negotiation

   When a BGMP speaker sends an OPEN message to its BGMP peer, the
   message may include an Optional Parameter, called Capabilities. The
   parameter lists the capabilities supported by the speaker.

   A BGMP speaker may use a particular capability when peering with
   another speaker only if both speakers support that capability. A
   BGMP speaker determines the capabilities supported by its peer by
   examining the list of capabilities present in the Capabilities
   Optional Parameter carried by the OPEN message that the speaker
   receives from the peer.

11. BGMP Finite State machine

   This section specifies BGMP operation in terms of a Finite State
   Machine (FSM). The following is a brief summary and overview of BGMP
   operations by state as determined by this FSM.

   Initially BGMP is in the Idle state.

   Idle state:

      In this state BGMP refuses all incoming BGMP connections. No
      resources are allocated to the peer. In response to the Start
      event (initiated by either system or operator) the local system
      initializes all BGMP resources, starts the ConnectRetry timer,
      initiates a transport connection to the other BGMP peer while
      listening for a connection that may be initiated by the remote
      BGMP peer, and changes its state to Connect. The exact value of
      the ConnectRetry timer is a local matter, but it should be
      sufficiently large to allow TCP initialization.

      If a BGMP speaker detects an error, it shuts down the connection
      and changes its state to Idle. Getting out of the Idle state
      requires generation of the Start event. If such an event is
      generated automatically, then persistent BGMP errors may result in
      persistent flapping of the speaker. To avoid such a condition, it
      is recommended that Start events should not be generated
      immediately for a peer that was previously transitioned to Idle
      due to an error. For a peer that was previously transitioned to
      Idle due to an error, the time between consecutive generation of
      Start events, if such events are generated automatically, shall
      increase exponentially. The value of the initial timer shall be
      60 seconds. The time shall be doubled for each consecutive retry.

      Any other event received in the Idle state is ignored.

   Connect state:

      In this state BGMP is waiting for the transport protocol
      connection to be completed.

      If the transport protocol connection succeeds, the local system
      clears the ConnectRetry timer, completes initialization, sends an
      OPEN message to its peer, and changes its state to OpenSent. If
      the transport protocol connect fails (e.g., retransmission
      timeout), the local system restarts the ConnectRetry timer,
      continues to listen for a connection that may be initiated by the
      remote BGMP peer, and changes its state to Active.

      In response to the ConnectRetry timer expired event, the local
      system restarts the ConnectRetry timer, initiates a transport
      connection to the other BGMP peer, continues to listen for a
      connection that may be initiated by the remote BGMP peer, and
      stays in the Connect state.

      The Start event is ignored in the Connect state.

      In response to any other event (initiated by either system or
      operator), the local system releases all BGMP resources associated
      with this connection and changes its state to Idle.

   Active state:

      In this state BGMP is trying to acquire a peer by listening for an
      incoming transport protocol connection.
      If the transport protocol connection succeeds, the local system
      clears the ConnectRetry timer, completes initialization, sends an
      OPEN message to its peer, sets its Hold Timer to a large value,
      and changes its state to OpenSent. A Hold Timer value of 4
      minutes is suggested.

      In response to the ConnectRetry timer expired event, the local
      system restarts the ConnectRetry timer, initiates a transport
      connection to the other BGMP peer, continues to listen for a
      connection that may be initiated by the remote BGMP peer, and
      changes its state to Connect.

      If the local system detects that a remote peer is trying to
      establish a BGMP connection to it, and the IP address of the
      remote peer is not an expected one, the local system restarts the
      ConnectRetry timer, rejects the attempted connection, continues to
      listen for a connection that may be initiated by the remote BGMP
      peer, and stays in the Active state.

      The Start event is ignored in the Active state.

      In response to any other event (initiated by either system or
      operator), the local system releases all BGMP resources associated
      with this connection and changes its state to Idle.

   OpenSent state:

      In this state BGMP waits for an OPEN message from its peer. When
      an OPEN message is received, all fields are checked for
      correctness. If the BGMP message header checking or OPEN message
      checking detects an error (see Section 9.2), or a connection
      collision (see Section 9.8), the local system sends a NOTIFICATION
      message and changes its state to Idle.

      If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
      message and sets a KeepAlive timer. The Hold Timer, which was
      originally set to a large value (see above), is replaced with the
      negotiated Hold Time value (see Section 8.2).
      If the negotiated Hold Time value is zero, then the Hold Timer and
      the KeepAlive timer are not started. If the value of the
      Autonomous System field is the same as the local Autonomous System
      number, then the connection is an "internal" connection;
      otherwise, it is "external". Finally, the state is changed to
      OpenConfirm.

      If a disconnect notification is received from the underlying
      transport protocol, the local system closes the BGMP connection,
      restarts the ConnectRetry timer, continues listening for a
      connection that may be initiated by the remote BGMP peer, and goes
      into the Active state.

      If the Hold Timer expires, the local system sends a NOTIFICATION
      message with Error Code Hold Timer Expired and changes its state
      to Idle.

      In response to the Stop event (initiated by either system or
      operator), the local system sends a NOTIFICATION message with
      Error Code Cease and changes its state to Idle.

      The Start event is ignored in the OpenSent state.

      In response to any other event, the local system sends a
      NOTIFICATION message with Error Code Finite State Machine Error
      and changes its state to Idle.

      Whenever BGMP changes its state from OpenSent to Idle, it closes
      the BGMP (and transport-level) connection and releases all
      resources associated with that connection.

   OpenConfirm state:

      In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

      If the local system receives a KEEPALIVE message, it changes its
      state to Established.

      If the Hold Timer expires before a KEEPALIVE message is received,
      the local system sends a NOTIFICATION message with Error Code Hold
      Timer Expired and changes its state to Idle.

      If the local system receives a NOTIFICATION message, it changes
      its state to Idle.
      If the KeepAlive timer expires, the local system sends a KEEPALIVE
      message and restarts its KeepAlive timer.

      If a disconnect notification is received from the underlying
      transport protocol, the local system changes its state to Idle.

      In response to the Stop event (initiated by either system or
      operator), the local system sends a NOTIFICATION message with
      Error Code Cease and changes its state to Idle.

      The Start event is ignored in the OpenConfirm state.

      In response to any other event, the local system sends a
      NOTIFICATION message with Error Code Finite State Machine Error
      and changes its state to Idle.

      Whenever BGMP changes its state from OpenConfirm to Idle, it
      closes the BGMP (and transport-level) connection and releases all
      resources associated with that connection.

   Established state:

      In the Established state BGMP can exchange UPDATE, NOTIFICATION,
      and KEEPALIVE messages with its peer.

      If the local system receives an UPDATE or KEEPALIVE message, it
      restarts its Hold Timer, if the negotiated Hold Time value is
      non-zero.

      If the local system receives a NOTIFICATION message, it changes
      its state to Idle.

      If the local system receives an UPDATE message and the UPDATE
      message error handling procedure (see Section 9.3) detects an
      error, the local system sends a NOTIFICATION message and changes
      its state to Idle.

      If a disconnect notification is received from the underlying
      transport protocol, the local system changes its state to Idle.

      If the Hold Timer expires, the local system sends a NOTIFICATION
      message with Error Code Hold Timer Expired and changes its state
      to Idle.

      If the KeepAlive timer expires, the local system sends a KEEPALIVE
      message and restarts its KeepAlive timer.
      Each time the local system sends a KEEPALIVE or UPDATE message, it
      restarts its KeepAlive timer, unless the negotiated Hold Time
      value is zero.

      In response to the Stop event (initiated by either system or
      operator), the local system sends a NOTIFICATION message with
      Error Code Cease and changes its state to Idle.

      The Start event is ignored in the Established state.

      In response to any other event, the local system sends a
      NOTIFICATION message with Error Code Finite State Machine Error
      and changes its state to Idle.

      Whenever BGMP changes its state from Established to Idle, it
      closes the BGMP (and transport-level) connection, releases all
      resources associated with that connection, and deletes all routes
      derived from that connection.

12. Security Considerations

   BGMP uses TCP sessions for all network communication between peers.
   TCP sessions may be secured through the use of IPsec [IPSEC].

13. Authors' Addresses

   Dave Thaler
   Department of Electrical Engineering and Computer Science
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   EMail: dthaler@microsoft.com

   Deborah Estrin
   Computer Science Dept./ISI
   University of Southern California
   Los Angeles, CA 90089
   EMail: estrin@usc.edu

   David Meyer
   Cisco Systems
   San Jose, CA
   EMail: dmm@cisco.com

14. References

   [BGP]
      Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
      RFC 1771, March 1995.

   [MBGP]
      Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
      Extensions for BGP-4", RFC 2283, February 1998.

   [CBT]
      Ballardie, A., "Core Based Trees (CBT version 2) Multicast
      Routing", RFC 2189, September 1997.

   [DVMRP]
      Pusateri, T., "Distance Vector Multicast Routing Protocol",
      draft-ietf-idmr-dvmrp-v3-09.txt, March 2000.
   [DWR]
      Fenner, W., "Domain-Wide Reports",
      draft-ietf-idmr-membership-reports-04.txt, August 1999.

   [INTEROP]
      Thaler, D., "Interoperability Rules for Multicast Routing
      Protocols", RFC 2715, October 1999.

   [IPSEC]
      Kent, S., and R. Atkinson, "Security Architecture for the Internet
      Protocol", RFC 2401, November 1998.

   [IPv6AA]
      Hinden, R., and S. Deering, "IP Version 6 Addressing
      Architecture", RFC 2373, July 1998.

   [ISSUES]
      Meyer, D., "Some Issues for an Inter-domain Multicast Routing
      Protocol", draft-ietf-mboned-imrp-some-issues-02.txt, June 1997.

   [MASC]
      Estrin, D., Handley, M., and D. Thaler, "Multicast-Address-Set
      advertisement and Claim mechanism", draft-ietf-malloc-masc-05.txt,
      July 2000.

   [MOSPF]
      Moy, J., "Multicast Extensions to OSPF", RFC 1584, March 1994.

   [PIMDM]
      Estrin, D., et al., "Protocol Independent Multicast-Dense Mode
      (PIM-DM): Protocol Specification",
      draft-ietf-idmr-pim-dm-spec-05.txt, May 1997.

   [PIMSM]
      Estrin, D., et al., "Protocol Independent Multicast-Sparse Mode
      (PIM-SM): Protocol Specification", RFC 2362, June 1998.

   [REFLECT]
      Bates, T., and R. Chandra, "BGP Route Reflection: An alternative
      to full mesh IBGP", RFC 1966, June 1996.

   [RFC1700]
      Reynolds, J., and J. Postel, "Assigned Numbers", RFC 1700,
      October 1994.

   [RFC2119]
      Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997.

15. Full Copyright Statement

   Copyright (C) The Internet Society (2000). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Table of Contents

   1 Acknowledgements
   2 Purpose
   3 Terminology
   4 Protocol Overview
     4.1 Design Rationale
   5 Protocol Details
     5.1 Interaction with the EGP
     5.2 Multicast Data Packet Processing
     5.3 BGMP processing of Join and Prune messages and notifications
       5.3.1 Receiving Joins
       5.3.2 Receiving Prune Notifications
       5.3.3 Receiving Route Change Notifications
       5.3.4 Receiving (S,G) Poison-Reverse messages
     5.4 Interaction with M-IGP components
       5.4.1 Interaction with DVMRP and PIM-DM
       5.4.2 Interaction with PIM-SM
       5.4.3 Interaction with CBT
       5.4.4 Interaction with MOSPF
     5.5 Operation over Multi-access Networks
     5.6 Interaction between (S,G) state and G-routes
   6 Interaction with address allocation
     6.1 Requirements for BGMP components
   7 Transition Strategy
     7.1 Preventing transit through the MBone stub
   8 Message Formats
     8.1 Message Header Format
     8.2 OPEN Message Format
     8.3 UPDATE Message Format
     8.4 Encoding examples
     8.5 KEEPALIVE Message Format
     8.6 NOTIFICATION Message Format
   9 BGMP Error Handling
     9.1 Message Header error handling
     9.2 OPEN message error handling
     9.3 UPDATE message error handling
     9.4 NOTIFICATION message error handling
     9.5 Hold Timer Expired error handling
     9.6 Finite State Machine error handling
     9.7 Cease
     9.8 Connection collision detection
   10 BGMP Version Negotiation
     10.1 BGMP Capability Negotiation
   11 BGMP Finite State machine
   12 Security Considerations
   13 Authors' Addresses
   14 References
   15 Full Copyright Statement