idnits 2.17.1 draft-ietf-idmr-pim-arch-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-19) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** Expected the document's filename to be given on the first page, but didn't find any ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 24 longer pages, the longest (page 2) being 68 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. (A line matching the expected section header was found, but with an unexpected indentation: ' 1.2 Background' ) ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack an Authors' Addresses Section. ** There are 543 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? ** There are 8 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** There is 1 instance of lines with control characters in the document. ** The abstract seems to contain references ([15], [2], [3], [4], [18], [5], [19], [6], [7], [8], [9], [10], [11], [12], [13], [14], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 12 has weird spacing: '...ivation and...' == Line 17 has weird spacing: '... Drafts are ...' == Line 18 has weird spacing: '...cuments of t...' == Line 19 has weird spacing: '...ups may also ...' == Line 23 has weird spacing: '... Drafts may ...' == (538 more instances...) == Couldn't figure out when the document was first submitted -- there may comments or warnings related to the use of a disclaimer for pre-RFC5378 work that could not be issued because of this. Please check the Legal Provisions document at https://trustee.ietf.org/license-info to determine if you need the pre-RFC5378 disclaimer. -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 93 looks like a reference -- Missing reference section? '2' on line 306 looks like a reference -- Missing reference section? '3' on line 871 looks like a reference -- Missing reference section? '4' on line 76 looks like a reference -- Missing reference section? '5' on line 93 looks like a reference -- Missing reference section? '6' on line 306 looks like a reference -- Missing reference section? '7' on line 266 looks like a reference -- Missing reference section? '8' on line 266 looks like a reference -- Missing reference section? '9' on line 282 looks like a reference -- Missing reference section? '10' on line 294 looks like a reference -- Missing reference section? '11' on line 312 looks like a reference -- Missing reference section? '12' on line 335 looks like a reference -- Missing reference section? '13' on line 386 looks like a reference -- Missing reference section? '14' on line 409 looks like a reference -- Missing reference section? '15' on line 452 looks like a reference -- Missing reference section? '19' on line 871 looks like a reference -- Missing reference section? '18' on line 932 looks like a reference Summary: 16 errors (**), 0 flaws (~~), 9 warnings (==), 18 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Steven Deering (XEROX) 3 Internet Draft Deborah Estrin (USC) 4 Dino Farinacci (CISCO) 5 Mark Handley (UCL) 6 Ahmed Helmy (USC) 7 Van Jacobson (LBL) 8 Chinggung Liu (USC) 9 Puneet Sharma (USC) 10 David Thaler (UMICH) 11 Liming Wei (CISCO) 12 Protocol Independent Multicast-Sparse Mode (PIM-SM): Motivation and 13 Architecture 15 Status of This Memo 17 This document is an Internet Draft. Internet Drafts are working 18 documents of the Internet Engineering Task Force (IETF), its Areas, 19 and its Working Groups. (Note that other groups may also distribute 20 working documents as Internet Drafts). 22 Internet Drafts are draft documents valid for a maximum of six 23 months. Internet Drafts may be updated, replaced, or obsoleted by 24 other documents at any time. It is not appropriate to use Internet 25 Drafts as reference material or to cite them other than as a 26 ``working'' draft'' or ``work in progress.'' 28 Please check the I-D abstract listing contained in each Internet 29 Draft directory to learn the current status of this or any other 30 Internet Draft. 32 Abstract 34 Traditional multicast routing mechanisms (e.g. DVMRP and 35 MOSPF [1][2]) were intended for use within regions where groups are 36 widely represented or bandwidth is universally plentiful. When group 37 members, and senders to those group members, are distributed 38 sparsely across a wide area, these schemes are not efficient; data 39 packets or membership report information are periodically sent over 40 many links that do not lead to receivers or senders, respectively. 41 This characteristic lead the Internet community to investigate 42 multicast routing architectures that efficiently establish 43 distribution trees across wide-area internets, where many groups are 44 sparsely represented and where bandwidth is not uniformly plentiful 45 due to the distances and multiple administrations traversed. 46 Efficiency is evaluated in terms of the state, control message 47 processing, and data packet processing required across the entire 48 network in order to deliver data packets to the members of the group. 50 The Protocol Independent Multicast--Sparse Mode (PIM-SM) 51 architecture: 53 (a) maintains the traditional IP multicast service model of 54 receiver-initiated membership; 56 (b) uses explicit joins that propagate hop-by-hop from 57 members' directly connected routers toward the distribution 58 tree. 60 (c) builds a shared multicast distribution tree centered at a 61 Rendezvous Point, and then builds source-specific trees for 62 those sources whose data traffic warrants it. 64 (d) is not dependent on a specific unicast routing protocol; 65 and 67 (e) uses soft-state mechanisms to adapt to underlying network 68 conditions and group dynamics. 70 The robustness, flexibility, and scaling properties of this 71 architecture make it well suited to large heterogeneous inter- 72 networks. 74 This document motivates and describes the PIM-SM architecture. 75 Companion documents describe the detailed protocol mechanisms 76 for PIM-SM and PIM-DM, respectively [3][4]. 78 1 Introduction 80 This document describes an architecture for efficiently routing 81 to multicast groups that may span wide-area (and inter-domain) 82 internets. We refer to the approach as Protocol Independent 83 Multicast-Sparse Mode (PIM-SM) because it is not dependent on 84 any particular unicast routing protocol. Throughout this 85 document we will use the shorter term PIM, to mean PIM-SM. When 86 we are referring to the PIM Dense Mode protocol we will say 87 PIM-DM explicitly. 89 The most significant innovation in this architecture is the 90 efficient support of sparse, wide area groups. This sparse mode 91 (SM) of operation complements the traditional dense-mode 92 approach to multicast routing for campus networks, as developed 93 by Deering [5][6] and implemented in MOSPF and DVMRP [1][2]. 94 These traditional dense mode multicast schemes were intended for 95 use within regions where a group is widely represented or 96 bandwidth is universally plentiful. However, when group members, 97 and senders to those groups, are distributed sparsely across 98 a wide area, these schemes are not efficient; data packets (in 99 the case of DVMRP) or membership report information (in the case 100 of MOSPF) are occasionally sent over many links that do not 101 lead to receivers or senders, respectively. The purpose of this 102 work is to develop a multicast routing architecture that 103 efficiently establishes distribution trees even when members are 104 sparsely distributed. Efficiency is evaluated in terms of the 105 state, control message, and data packet overhead required across 106 the entire network in order to deliver data packets to the 107 members of the group. 109 1.1 Definition of Terms (Glossary): 110 Following is a list of terms and definitions 111 used throughout this document, in alphabetical order. 112 This is a subset of the glossary list that appears in the 113 protocol specification. 115 * Asserts. The process of choosing a single router to 116 forward multicast packets from a particular source onto a 117 particular LAN segment. The need for Asserts arises when 118 a LAN segment has multiple directly-connected routers with 119 routes to the source. 121 * Bootstrap router (BSR). A BSR is a dynamically elected 122 router within a PIM domain. It is responsible for 123 constructing the RP-Set and originating Bootstrap messages. 125 * Candidate-BSR (C-BSR). A C-BSR is a router configured to 126 participate in the BSR election and act as BSRs if elected. 128 * Dense-mode (DM). A generic term referring to a multicast 129 routing protocol that is optimized for dense groups. DVMRP, 130 MOSPF, and Dense-mode PIM are examples. 132 * Designated Router (DR). The DR is the highest IP 133 addressed PIM router on a multi-access LAN. Normally, the 134 DR sets up multicast route entries and sends corresponding 135 Join/Prune and Register messages on behalf of directly- 136 connected receivers and sources, respectively. The DR may 137 or may not be the same router as the IGMP Querier. The DR 138 may or may not be the long-term, last-hop router for the 139 group, or a particular source that is sending to the group; 140 a router on the LAN that has a lower metric route to the 141 data source, or to the group's RP, may take over that role. 143 * Incoming interface (iif). The iif of a multicast route 144 entry indicates the interface from which multicast data 145 packets are accepted for forwarding. The iif is initialized 146 when the entry is created. 148 * Join list. The Join list is one of two lists of IP unicast 149 addresses that is included in a Join/Prune message; each 150 address refers to a source or RP. It indicates those 151 sources or RPs to which downstream receiver(s) wish to 152 join. 154 * Last-hop router. The last-hop router is the router which 155 forwards multicast data packets to directly-connected 156 member hosts. In general the last-hop router is the DR for 157 the LAN. However, under various conditions described in 158 this document a parallel router connected to the same LAN 159 may take over as the last-hop router in place of the DR. 161 * Member. A host that desires to receive multicast datagrams 162 for a group. This host need not be a sender to the group. A 163 Member is synonymously called a Receiver. 165 * Outgoing interface (oif) list. Each multicast route 166 entry 168 has an oif list containing the outgoing interfaces to which 169 multicast packets matching that entry should be forwarded. 171 * Prune List. The Prune list is the second list of IP 172 unicast addresses that is included in a Join/Prune message. 173 It indicates those sources or RPs from which downstream 174 receiver(s) wish to prune. 176 * PIM Multicast Border Router (PMBR). A PMBR connects a 177 PIM domain to other multicast routing domain(s). 179 * Rendezvous Point (RP). Each multicast group has a 180 shared-tree via which receivers hear of sources. The RP is 181 the root of this per-group shared tree, called the RP-Tree. 182 Candidate-RPs are routers configured to participate as RPs 183 for some (or all) groups. 185 * RP-Set. The BSR for a PIM region constructs a set of RP 186 IP addresses based on Candidate-RP advertisements received. 187 The RP-Set information is distributed to all PIM routers in 188 a domain in a Bootstrap message. 190 * Reverse Path Forwarding (RPF). RPF is used to select the 191 appropriate incoming interface for a multicast route entry 192 . The RPF neighbor for an IP address X is the the next-hop 193 router used to forward packets toward X. The RPF interface 194 is the interface to that RPF neighbor. In the common case 195 this is the next hop used by the unicast routing protocol 196 for sending unicast packets toward X. For example, in cases 197 where unicast and multicast routes are not congruent, it 198 can be different. 200 * Route entry. A multicast route entry is state maintained 201 in a router along the distribution tree and is created, and 202 updated based on incoming control messages, and in some 203 cases data packets. The route entry may be different from 204 the forwarding entry; the latter is used to forward data 205 packets in real time. Typically a forwarding entry is not 206 created until data packets arrive, the forwarding entry's 207 iif and oif list are copied from the route entry, and the 208 forwarding entry may be flushed and recreated at will. 210 * Shared Tree (RP tree). The set of paths connecting all 211 receivers of a group to its RP is the RP tree. A receiver 212 on the RP tree receives packets from all sources of the 213 group, except those sources that were pruned off the RP 214 tree. 216 * Shortest path tree (SPT). The SPT is the multicast 217 distribution tree created by the merger of all of the 218 shortest paths that connect receivers to the source (as 219 determined by unicast routing). 221 * Source. A host that sends multicast datagrams to a group. 222 A Source is not required to be a member. A Source is 223 synonymously called a Sender. 225 * Sparse Mode (SM). Sparse mode PIM uses explicit 226 Join/Prune messages and Rendezvous points in place of Dense 227 Mode PIM's and DVMRP's broadcast and prune mechanism. 229 * Wildcard (WC) multicast route entry. Wildcard multicast 230 route entries are those entries that may be used to forward 231 packets for any source sending to the specified group. 232 Wildcard bits in the join list of a Join/Prune message 233 represent either a (*,G) or (*,*,RP) join; in the prune 234 list they represent a (*,G) prune. 236 * (S,G) route entry. (S,G) is a source-specific route 237 entry. It may be created in response to data packets, 238 Join/Prune messages, or Asserts. The (S,G) state in routers 239 creates a source-rooted, shortest path (or reverse shortest 240 path) distribution tree. (S,G)RPT bit entries are source- 241 specific entries on the shared RP-Tree; these entries are 242 used to prune particular sources off of the shared tree. 244 * (*,G) route entry. Group members join the shared RP-Tree 245 for a particular group. This tree is represented by (*,G) 246 multicast route entries along the shortest path branches 247 between the RP and the group members. 249 * (*,*,RP) route entry. PMBRs join toward all RPs 250 supporting non-local groups, within their PIM domain in 251 order to pull packets generated within the region out to 252 the borders of the region. The routers along the shortest 253 path branches between the RP(s) and the PMBRs keep (*,*,RP) 254 state and use it to determine how to deliver packets toward 255 the PMBRs if data packets arrive for which there is not a 256 longer match. 258 1.2 Background 260 In the traditional dense-mode IP multicast model, established by 261 Deering [6], a multicast address is assigned to the collection 262 of receivers for a multicast group. Senders simply use that 263 address as the destination address of a packet to reach all 264 members of the group. The separation of senders and receivers 265 allows any host, member or non-member, to send to a group. A 266 group membership protocol (IGMP) [7][8] is used for routers to 267 learn the existence of members on their directly attached 268 subnetworks. 269 This receiver-initiated join procedure has very good scaling 270 properties; as the group grows, it becomes more likely that a 271 new receiver will be able to splice onto a nearby branch of the 272 distribution tree. A multicast routing protocol, in the form of 273 an extension to existing unicast protocols (e.g. DVMRP, an 274 extension to a RIP-like distance-vector unicast protocol; or 275 MOSPF, an extension to the link-state unicast protocol OSPF), is 276 executed on routers to construct multicast packet delivery paths 277 and to accomplish multicast data packet forwarding. 279 In the case of link-state protocols, changes of group membership 280 on a subnetwork are detected by one of the routers directly 281 attached to that subnetwork, and that router broadcasts the 282 information to all other routers in the same routing domain [9]. 283 Each router maintains an up-to-date image of the domain's 284 topology through the unicast link-state routing protocol. Upon 285 receiving a multicast data packet, the router uses the topology 286 information and the group membership information to determine 287 the shortest-path tree (SPT) from the packet's source subnetwork 288 to its destination group members. Broadcasting of membership 289 information is one major factor preventing link-state multicast 290 from scaling to larger, wide-area, networks --- every router 291 must receive and store membership information for every group in 292 the domain. The other major factor is the processing cost of the 293 Dijkstra shortest-path-tree calculations performed to compute 294 the delivery trees for all active multicast sources [10] for all 295 groups, thus limiting its applicability on an internet-wide 296 basis. 298 Distance-vector multicast routing protocols construct multicast 299 distribution trees using variants of Reverse Path Forwarding 300 (RPF) [11]. When the first data packet is sent to a group from a 301 particular source subnetwork, and a router receiving this packet 302 has no knowledge about the group, the router forwards the 303 incoming packet out all interfaces except the incoming 304 interface. (Some schemes reduce the number of outgoing 305 interfaces further by using unicast routing protocol information 306 to keep track of child-parent information [6][2].) A special 307 mechanism is used to avoid forwarding of data packets to leaf 308 subnetworks with no members in that group (also known as 309 truncated broadcasting). Also if the arriving data packet does 310 not come through the interface that the router uses to send 311 packets to the source of the data packet, the data packet is 312 silently dropped; thus the term Reverse Path Forwarding [11]. 313 When a router attached to a leaf subnetwork, receives a data 314 packet addressed to a new group, if it finds no members present 315 on its attached subnetworks, it sends a prune message upstream 316 towards the source of the data packet. The prune messages prune 317 the tree branches not leading to group members, thus resulting 318 in a source-specific shortest-path tree with all leaves having 319 members. Pruned branches will ``grow back'' after a time-out 320 period; these branches will again be pruned if there are still 321 no multicast members and data packets are still being sent to 322 the group. 324 Compared with the total number of destinations within the 325 greater internet, the number of destinations having group 326 members of any particular wide-area group is likely to be 327 small. More importantly, bandwidth limitations, and therefore 328 data and control message overhead, should not be ignored in a 329 wide area context. In the case of distance-vector multicast 330 schemes, routers that are not on the multicast delivery tree 331 still have to carry the periodic truncated-broadcast of packets, 332 and process the subsequent pruning of branches for all active 333 groups. One particular distance-vector multicast protocol, 334 DVMRP, has been deployed in hundreds of regions connected by the 335 MBONE [12]. However, its occasional broadcasting behavior 336 severely limits its capability to scale to larger networks 337 supporting much larger numbers of groups, many of which are 338 sparse. 340 1.3 Extending multicast to the wide area: scaling issues 342 The scalability of a multicast protocol can be evaluated in 343 terms of its overhead growth with the size of the internet, 344 numbers of receivers or sources per group, number of groups, and 345 distribution of group receivers and senders. Overhead is 346 evaluated in terms of resources consumed in routers and links, 347 i.e., state, processing, and bandwidth. 349 Existing dense-mode link-state and distance-vector multicast 350 routing schemes have good scaling properties only when multicast 351 groups densely populate the network of interest, or when the 352 overhead of dense-mode operation is negligible relative to the 353 network resources. When most of the subnets or links in the 354 (inter)network have group members, then the bandwidth, storage 355 and processing overhead of broadcasting membership reports 356 (link-state), or data packets (distance-vector) is warranted, 357 since the information or data packets are needed in most parts 358 of the network anyway. The emphasis of our work is to develop 359 multicast protocols that will also efficiently support the 360 sparsely distributed groups that are likely to be most prevalent 361 in wide-area, multi-administration, inter-networks where 362 resources must be used more conservatively. 364 1.4 Overhead and tree types 366 [Figures are present only in the postscript version] 367 Fig. 1 Example of Multicast Trees 369 The examples in Figure 1 illustrate the inadequacies of dense- 370 mode mechanisms when supporting sparse, wide area groups. There 371 are three domains that communicate via an internet. There is a 372 member of a particular group, G, located in each of the domains. 373 There are no other members of this group currently active in the 374 internet. If a traditional IP multicast routing mechanism such 375 as DVMRP is used, then when a source in domain 376 A starts to send to the group, its data packets will be 377 broadcast throughout the entire internet. Subsequently all those 378 sites that do not have local members will send prune messages 379 and the distribution tree will stabilize to that illustrated 380 with bold lines in Figure 1(b). However, periodically, the 381 source's packets will be broadcast throughout the entire 382 internet when the pruned-off branches times out. 384 Thus far we have motivated our design by contrasting it to the 385 traditional dense-mode IP multicast routing protocols. The Core 386 Based Tree (CBT) protocol [13] was proposed to address similar 387 scaling problems in support of sparse-mode multicast. CBT uses a 388 single delivery tree for each group, rooted at one of a small 389 set of ``core'' routers and shared by all senders to the group. 390 CBT does not exhibit the occasional broadcasting or flooding 391 behavior of earlier protocols. However, CBT does so at the cost 392 of imposing a single shared tree for each multicast group. 394 If CBT were used to support the example group, then a core might 395 be defined in domain A, and the distribution tree illustrated in 396 Figure 1(c) would be established. This distribution tree would 397 also be used by sources sending from domains B and C. This would 398 result in concentration of all sources' traffic on the path 399 indicated with bold lines. We refer to this as traffic 400 concentration. This is a potentially significant issue with 401 any protocol that imposes a single shared tree per group. In 402 addition, the packets traveling from Y to Z will not travel 403 via the shortest path used by unicast packets between Y and Z. 405 [Figures are present only in the postscript version] 406 Fig. 2 Comparison of shortest-path trees and center-based tree 408 We need to know the kind of degradations a core-based tree can 409 incur in average networks. David Wall [14] proved that the bound 410 on maximum delay of an optimal core-based tree (which he called 411 a center-based tree) is 2 times the shortest-path delay. To 412 get a better understanding of how well optimal core-based trees 413 perform in average cases, we simulated an optimal core-based 414 tree algorithm over large number of different random graphs. We 415 measured the maximum delay within each group, and experimented 416 with graphs of different node degrees. We show the ratio of the 417 CBT maximum delay versus shortest-path tree maximum delay in 418 Figure 2(a). For each node degree, we tried 500 different 50- 419 node graphs with 10-member groups chosen randomly. It can be 420 seen that the maximum delays of core-based trees with optimal 421 core placement, are up to 1.4 times greater than shortest-path 422 trees. Note that although some error bars in the delay graph 423 extend below 1, there are no real data points below 1 --- the 424 distribution is not symmetric, for more details see [15]. For 425 interactive applications where low latency is critical, it is 426 desirable to use the shortest-path trees to avoid the longer 427 delays of an optimal core-based tree. 429 With respect to the potential traffic concentration problem, we 430 also conducted simulations in randomly generated 50-node 431 networks. In each network, there were 300 active groups all 432 having 40 members, of which 32 members were also senders. We 433 measured the number of traffic flows on each link of the 434 network, then recorded the maximum number within the network. 435 For each node degree between three and eight, 500 random 436 networks were generated, and the measured maximum number of 437 traffic flows were averaged. Figure 2(b) shows a plot of the 438 measurements in networks with different node degrees. This 439 experiment demonstrates situations in which CBT may exhibit 440 significantly greater traffic concentrations. 442 It is evident to us that both tree types have their advantages 443 and disadvantages. One type of tree may perform very well under 444 one class of conditions, while the other type may be better in 445 other situations. For example, shared trees may perform very 446 well for large numbers of low data rate sources (e.g., resource 447 discovery applications), while SPT(s) may be better suited for 448 high data rate sources (e.g., real time teleconferencing). It 449 would be ideal to flexibly support both types of trees within 450 one multicast architecture, so that the selection of tree types 451 becomes a configuration decision within a multicast protocol. A 452 more complete analysis of these tradeoffs can be found in [15]. 453 PIM is designed to address the two issues addressed above: to 454 avoid the overhead of broadcasting packets when group members 455 sparsely populate the internet, and to do so in a way that 456 supports good-quality distribution trees for heterogeneous 457 applications. 459 In PIM, a multicast router can choose to use shortest-path trees 460 or a group-shared tree. The last-hop routers of the receivers 461 can make this decision independently. A receiver could even 462 choose different types of trees for different sources. In 463 general, we recommend that routers be configured to join the 464 shortest path tree for a source when the source's data rate 465 exceeds a configured threshold. 467 The capability to support different tree types is the 468 fundamental difference between PIM and CBT. There are other 469 significant protocol engineering differences as well, the most 470 significant of which is PIM's use of soft state reliability 471 mechanisms. CBT uses explicit hop-by-hop mechanisms to achieve 472 reliable delivery of control messages. As described in the next 473 section, PIM uses periodic refreshes as its primary means of 474 reliability. This approach reduces the complexity of the 475 protocol and covers a wide range of protocol and network 476 failures in a single simple mechanism. Although soft-state 477 refreshing can introduce additional message protocol overhead, 478 we introduce the notion of scalable timers to address such 479 concerns. 481 1.5 Document organization 483 In the remainder of this document we enumerate the specific 484 design requirements for wide-area multicast routing (section 485 2), summarize the architectural components and functions 486 (section 3), enumerate several protocol engineering choices 487 made in the design of PIM protocols (section 4), and consider 488 the use of aggregation to address the scalability problem 489 (section 5). Protocol details can be found in [3]. 491 2 Requirements 493 We had several design objectives in mind when designing this 494 architecture: 496 * Sparse-Mode Regions 498 We define a sparse mode region as one in which 500 (a) the number of networks/domains with group members 501 present is significantly smaller than number of 502 networks/domains in the region as a whole; 504 (b) group members span an area that is too large/wide to 505 rely on scope control; and 507 (c) the region spanned by the group is not sufficiently 508 resource rich to ignore the overhead of traditional 509 schemes. 511 Groups in sparse-mode regions are not necessarily ``small''; 512 therefore we must support dynamic groups with large numbers 513 of participants (i.e. receivers and senders). 515 * High-Quality Data Distribution 517 We wish to support low-delay data distribution when needed 518 by the application. In particular, we avoid imposing a 519 single shared tree in which data packets are forwarded to 520 receivers along a common tree, independent of their source. 521 Source-specific trees are superior when 523 (a) multiple sources send data simultaneously and would 524 experience poor service when the traffic is all 525 concentrated on a single shared tree, or 527 (b) the path lengths between sources and destinations in 528 the shortest-path tree (SPTs) are significantly 529 shorter than in the shared tree. 531 * Routing Protocol Independence 533 The protocol should make use of existing unicast routing 534 functionality to adapt to topology changes, but at the same 535 time be independent of the particular protocol employed. 536 This independence has another advantage that the multicast 537 domain boundaries may extend beyond unicast domain 538 boundaries. This allows network designers to take into 539 consideration the multicast requirements and not to be 540 burdened with unicast topology restrictions. We accomplish 541 this by letting the multicast protocol make use of the 542 unicast routing tables, independent of how those tables are 543 computed. 545 * Interoperability with dense mode protocols 547 We require interoperability with traditional RPF and link- 548 state multicast routing, both intra-domain and inter- 549 domain. For example, the intra-domain portion of a 550 distribution tree may be established by some other IP 551 multicast protocol, and the inter-domain portion by PIM; or 552 vice versa. In some cases it will be necessary to impose 553 some additional protocol or configuration overhead in order 554 to interoperate with some intra-domain routing protocols. 556 * Robustness 558 The protocol should be able to gracefully adapt to routing 559 changes. We achieve this by 561 (a) using soft state refreshment mechanisms, 563 (b) avoiding a single point of failure by using an RP- 564 Set, and 566 (c) adapting along with (and based on) unicast routing 567 changes to deliver multicast service so long as 568 unicast packets are being serviced. 570 * Scalability 572 We provide mechanisms for scaling with group and network 573 size. These mechanisms address the forms of overhead: 574 control messages and state. Bandwidth consumed by data 575 packets is already minimized through the use of explicit- 576 join sparse mode. Control message overhead can also be 577 limited to a fixed percentage of the link bandwidth by 578 adjusting the frequency of periodic messages on a link by 579 link basis. This method of controlling overhead was 580 proposed by Van Jacobson. 582 State overhead can be managed in such a way that each 583 router can unilaterally choose its own tradeoff point 584 between the amount of state maintained and the amount of 585 bandwidth consumed by unneeded flooding of multicast 586 packets. 588 3 PIM Components and Functions: Overview 590 In this section we describe the architectural components of PIM. 591 The detailed protocol mechanisms are described in [3]. As 592 described, traditional multicast routing protocols were 593 optimized for densely distributed groups or uniformly 594 bandwidth-rich regions, and rely on data driven actions in all 595 network routers to establish efficient distribution trees. In 596 contrast, sparse-mode multicast constrains data distribution so 597 that packets reach only routers that are on the path to group 598 members. PIM differs from existing IP multicast schemes in two 599 fundamental ways: 601 * Routers with local (or downstream) members join a sparse- 602 mode PIM distribution tree by sending explicit Join/Prune 603 messages; in dense-mode IP multicast membership is assumed 604 and multicast data packets are sent until routers without 605 local (or downstream) members send explicit prune messages 606 to remove themselves from the distribution tree. 608 * Whereas dense-mode IP multicast tree construction is data 609 driven, sparse-mode PIM must use per-group Rendezvous 610 Point for receivers to ``meet'' new sources. Rendezvous 611 Points (RP) are used by senders to announce their existence 612 and by receivers to learn about new senders of a group. In 613 SM, the shared-tree join state is stored in anticipation of 614 data packets, whereas DM does not create state until a data 615 packet arrives. The source-specific trees and associate 616 state are data-driven in PIM, as in PIM-DM. 618 The shortest-path-tree state maintained in routers is roughly 619 the same type as the multicast routing information that is 620 currently maintained by routers running existing IP multicast 621 protocols such as MOSPF, i.e., source (S), multicast address 622 (G), outgoing interface set (oif), incoming interface (iif). 623 We refer to this information as the multicast routing 624 entry for (S,G). For all routers containing a (S,G) entry, their 625 oif's and iif together form a shortest-path tree rooted 626 at S. 628 An entry for a shared tree can match packets from any source for 629 its associated group if the packets come through the right 630 incoming interface, we denote such an entry (*,G). A (*,G) entry 631 keeps the same information a (S,G) entry keeps, except that it 632 saves the RP address in place of the source address. There is a 633 wildcard flag (WC-bit) indicating that this is a wild card 634 entry, and an RPT-bit indicating that this is a shared tree 635 entry. 637 [Figures are present only in the postscript version] 638 Fig. 3 How senders rendezvous with receivers 639 Figure 3 shows a simple scenario of a sender and a receiver 640 joining a multicast group via an RP. When the receiver wants to 641 join a multicast group, its last-hop PIM router ( A in fig 3) 642 sends a Join/Prune message towards the RP for the group. If the 643 last-hop router does not have RP information, it is considered 644 an error. Processing of this message by intermediate routers 645 sets up the multicast tree branch from the RP to the receiver. 646 When sources start sending to the multicast group, the 647 designated router ( D in fig 3) sends a PIM-Register message, 648 encapsulating the data packet, to the RP for that group. If the 649 source's data rate warrants a source-specific tree, the RP 650 responds by sending a Join/Prune message towards the source. 651 Processing of these messages by intermediate routers (there are 652 no intermediate routers between the RP and the source in fig 3) 653 sets up a packet delivery path from the source to the RP. 655 If source-specific distribution trees are desired (based on the 656 source's data rate or some other configuration parameter), the 657 last-hop PIM router for each member eventually joins the 658 source-rooted distribution tree for each source by sending a 659 Join/Prune message towards the source, including the source in 660 the Join list. After data packets are received on the new path, 661 router B in fig 3 sends a PIM-prune message towards the RP, 662 including the source S in the prune list. 663 B knows, by checking the incoming interface in it routing 664 table, that it is at a point where the shortest-path tree and 665 the RP tree branches diverge. A flag, called SPT-bit, is 666 included in (S,G) entries to indicate whether the transition 667 from shared tree to shortest-path tree has completed. This 668 minimizes the chance of losing data packets during the 669 transition. 671 Each PIM router must be able to map a multicast group address to 672 that group's RP (an IP address). To do so, an RP-Set is 673 distributed to all PIM routers within a region, and each router 674 runs the same hash function to map from group address to a 675 particular RP in the RP-Set. In this way all routers within a 676 PIM region map a particular group address to the same RP. The 677 RP-Set is constructed and distributed by a dynamically-elected 678 bootstrap router (BSR) within the region. Only a single RP is 679 active for a group at any one point in time, and the BSR is 680 responsible for keeping the RP-Set up to date. Therefore, all 681 candidate RPs within the region send periodic advertisements 682 (liveness indication) to the BSR. 684 PIM avoids explicit enumeration of receivers. In general, in 685 many existing and anticipated applications, the number of 686 receivers is much larger than the number of sources, and when 687 the number of sources is very large, the average data rate tends 688 to be lower (e.g. resource discovery). In any finite capacity 689 network there is an upper bound on the data rate that any 690 individual host can send or receive. Therefore there are 691 fundamental bounds on the number of high data rate sources that 692 can simultaneously send to the same group. However, there are no 693 such bounds on the number of low datarate sources that can 694 simultaneously send to the same group. If there are very large 695 numbers of sources sending to a group, but the sources' average 696 data rates are low, then it may be more efficient to support the 697 group with a shared tree instead which has less per-source 698 overhead; therefore we suggest triggering Shortest Path Tree 699 (SPT) Join/Prune messages only after the last hop router has 700 received a threshold datarate from the particular source. If 701 sources are low data rate, these Join/Prunes will not be 702 triggered and receivers will receive packets via the shared tree 703 instead and no source specific tree state will be constructed. 704 Issues of group-specific state proliferation and state 705 aggregation are discussed further in section 5. In summary, 706 data packets from the source will travel to the RP in Register 707 messages, and from the RP will travel to receivers via the 708 distribution paths established by the Join/Prune messages sent 709 upstream from receivers towards the RP. If the RP and receivers 710 initiate shortest path tree Join/Prunes, the sources data 711 packets will longest match on the source specific (S,G) state 712 instead of traveling via the RP distribution tree. Some data 713 packets will continue to travel from the sources to the RP in 714 order to reach new receivers. Similarly, receivers will continue 715 to receive some data packets via the RP tree in order to pick up 716 new senders. However, when source-specific tree distribution is 717 used, most data packets will arrive at receivers over a 718 shortest-path distribution tree. At times when group 719 participation is not changing, and all receivers have joined the 720 shortest path tree(s), the RP can inform source(s) to stop 721 sending data-encapsulating Register messages. 723 4 Protocol Engineering Design Features 725 In this section we describe engineering features embodied in the 726 PIM protocols: robustness, interaction with other multicast 727 protocols, and multicast service interfaces. 729 4.1 Robustness features 731 There are several areas in which PIM is designed for robustness. 733 4.1.1 Lost PIM messages 735 The protocol is fairly robust to lost control messages. If a 736 PIM-Register message gets lost then data packets will continue 737 to be encapsulated in subsequent PIM-Register messages until the 738 first hop router receives a Register-stop message message from 739 the RP. If a new Join/Prune message (carrying join information) 740 is lost over an off-tree link (i.e. a link that is not already 741 part of the mutlicast distribution tree), then for the remainder 742 of the refresh period, packets will not be forwarded on the new 743 path, causing join latency; or in the case of prune information, 744 packets will continue to be forwarded until the refresh is sent, 745 causing leave latency. 747 All outgoing-interface state that is cached is timed out after a 748 period equal to `3.5' times the refresh period (e.g., default of 749 210 seconds for the default 60 second refresh interval). As in 750 other multicast routing protocols, this longer timeout interval 751 allows individual packets to be lost without adversely affecting 752 the routing function. When a routing entry has no more outgoing 753 interfaces it is scheduled to be deleted some time later and a 754 prune can be sent upstream (if no prune is sent upstream the 755 upstream state will eventually time out anyway since no 756 Join/Prunes will be received to refresh the join state.) 757 Initially PIM messages are configured to be refreshed every 60 758 seconds. However, in the future a scalable timer mechanism will 759 be deployed in which the rate is a function of the amount of 760 state in a router and link bandwidth (i.e., for lower speed 761 links the rate will be slower and for higher speed links it may 762 be higher). 764 4.1.2 Multiple Rendezvous Points and RP failure scenarios 766 If only a single RP were available to be used for a multicast 767 group, group communication would be disrupted if the RP became 768 unreachable. Assigning a set of available RPs greatly increases 769 the robustness of the system. A small set of PIM routers within 770 a domain are configured to act as Candidate RPs (C-RPs), and 771 periodically send C-RP Advertisements to the elected BSR. At any 772 point in time only a single RP is active for a group. However, 773 when the BSR detects that a particular RP is no longer 774 reachable, the BSR deletes the unreachable RP(s) from the RP-Set 775 next distributed within the periodic Bootstrap message, and all 776 PIM routers within the region rehash affected groups (i.e., 777 those that were previously hashed to the now-unreachable RP). 779 4.2 Interaction with other multicast protocols 781 The basic difference between traditional IP multicast routing 782 and PIM is that the former is completely data driven; we will 783 refer to traditional IP multicast routing as ``dense mode'' for 784 the purposes of this discussion. Four important behavioral 785 differences result: 787 * Dense mode sends and stores explicit prune state in 788 response to unwanted data packets. Sparse mode requires 789 explicit joining; the default action is to not send data 790 packets where they have not been requested. 792 * Sparse mode stores shared-tree join state in anticipation 793 of data packets; Dense-mode routers do not store any state 794 until data packets are sent (i.e. for active data sources). 795 The difference is not very significant for active groups 796 (i.e., PIM would have one additional tree active); however 797 for idle groups dense mode has the advantage of having no 798 state at all, whereas PIM would have state for the one 799 shared-tree. 801 * Sparse mode relies on the concept of an RP for data to be 802 delivered to receivers who request to join the group. 803 Dense-mode groups do not require an RP; broadcast is used 804 as the rendezvous mechanism. 806 * Sparse mode relies on periodic refreshing of explicit 807 Join/Prune messages. Dense mode does not need to send prune 808 messages periodically because of its data driven nature. 810 In simplified terms, the cost of dense mode is the default 811 broadcast behavior and maintenance of prune state, whereas the 812 cost of sparse mode is the need for RPs and RP-tree state for 813 idle groups. If all members of a group are located within a 814 bandwidth-rich region, the group may be supported in a strictly 815 dense mode using scope control. However, such groups cannot 816 include any members beyond the indicated scope, without imposing 817 broadcast and prune overhead on the larger scope needed to reach 818 the remote receiver. PIM is designed to address the more general 819 problem of groups that are not a priori limited to intra-domain 820 membership and may therefore span domains. 822 In the case of multi-access LANs, some interesting issues arise 823 because of possibility of parallel routers forwarding duplicate 824 packets onto the LAN. In SM we must be particularly careful with 825 the operation of the RPtree because the RPF check that prevents 826 routing loops is dependent on information stored in the router, 827 and not based on the source address found in the packet header. 828 As a result it is conceivable that a packet could be routed in 829 elaborate loops because different routers are using different 830 criteria for accepting the packet. To solve this problem each 831 router on a multi-access LAN sends Assert messages when a data 832 packet from a source arrives on the outgoing interface for the 833 associated (S,G) or the (*,G) entry. All routers listen to 834 Assert messages, compare the metrics included therein, and only 835 one router remains the forwarder for that source to that LAN. 837 We also wish to interoperate with networks that do not have 838 routers modified to generate and interpret PIM Join/Prune 839 messages. We have to address two functions: pulling data out to 840 the dense-mode cloud, and importing data into the PIM region 841 from a dense mode region: 843 * In PIM, joining a distribution tree is not passive, routers 844 with local members must take explicit join action to 845 receive data packets. This creates problems when a dense- 846 mode region, wishes to interoperate with PIM. To do so, one 847 of two things must happen: 849 1 Either, PMBR's on the border between PIM and dense 850 mode regions join to all of the PIM region's RPs to 851 pull out all packets generated within the PIM region. 852 Or, 854 2 The PMBR on the border of a dense mode region must 855 receive some indication of membership within the dense 856 mode cloud, and must generate PIM explicit Join/Prune 857 messages to pull the data down to the dense mode 858 cloud. 860 The first of these two approaches is appropriate when the 861 PIM region is a stub or multihomed and is connected to a 862 dense mode backbone. The second of these two approaches is 863 appropriate when the dense mode region is connecting to a 864 PIM backbone. 866 * The PMBRs at the border between PIM and dense mode regions 867 must act as DRs for the sources external to the PIM-SM 868 domain. In other words the PMBR sets up source specific 869 state and sends Registers on behalf of external sources. 871 The details of these mechanisms are described in [3][19]. 873 4.3 Multicast service interface 875 The multicast interface for hosts is unchanged. Hosts need only 876 learn about and communicate their interest in joining to 877 multicast addresses. 879 5 Scaling and Aggregation 881 There are several motivations for aggregating source 882 information; the most important are PIM message size and the 883 amount of memory used for multicast routing entries. 885 One might consider using the highest level aggregate available 886 for an address when setting up the multicast routing entry. This 887 is optimal with respect to routing entry space. It is also 888 optimal with respect to PIM message size. However, PIM messages 889 will carry very coarse information and when the messages arrive 890 at routers closer to the source(s) where more specific routes 891 exist there will be a large fanout and PIM messages will travel 892 towards all members of the aggregate which would be inefficient 893 in most/many cases. 895 Traditional IP multicast routing (dense mode) does not have this 896 problem since prune messages can carry most fine grain 897 information which are triggered based on data packets. If the 898 prune messages are lost, subsequent data triggers the prune. On 899 the other hand, graft messages may be subject to the fan-out 900 problem. In this case, they are sent as far as the message 901 information takes it. The penalty is increased join latency. 903 If PIM is being used for inter-domain routing, and routers were 904 able to map from IP address to domain identifier, then one 905 possibility would be to use the domain level aggregate for a 906 source in PIM messages (Autonomous System (AS) numbers or 907 Routing Domain Identifiers (RDIs)). Then the PIM message would 908 travel to the PMBRs of the domain and the PMBRs can use the 909 internal multicast protocol's mechanism for propagating the join 910 within the domain (e.g. send appropriate link-state 911 advertisement in MOSPF or register a ``local member'' and do not 912 prune in the case of RPF). However this approach requires that 913 it is both possible and efficient to map from IP to domain 914 address when processing data packets, as well as control 915 packets. 917 We address the issues of control traffic and state scaling 918 separately below. The detailed mechanisms have not yet been 919 incorporated into the protocol specification as they are still 920 being designed. 922 5.1 Containing control traffic overhead 924 To control the bandwidth consumed by periodic control messages, 925 we adopt a technique proposed by one of the authors (Jacobson), 926 called scalable timers. The timers controlling periodic 927 refreshing of control messages are set such that the total 928 overhead is a small fixed percentage of the link bandwidth. 930 Eventually, PIM should use the scalable timer approach; this 931 approach was initially proposed by Van Jacobson and a detailed 932 design and analysis was reported in [18]. In this approach the 933 refresh interval is determined by the sender of the information. 934 The sender can adjust the frequency of control messages (and 935 therefore the timeout period at the control message receiver) 936 depending upon the amount of state that it has to communicate, 937 or refresh, over a particular link. It can thereby keep the 938 amount of control traffic to some small percentage of the link 939 bandwidth. In this case the receiver of the control messages may 940 infer the appropriate refresh interval based on measurement of 941 arriving control traffic, and set its timeout values 942 accordingly. 944 In the absence of more experimentation with scalable timer 945 mechanisms, the current PIM protocol specifies that the sender 946 of control messages communication hold-time values explicitly. 947 Therefore, a router tells its neighbors how long to keep it 948 reachable by advertising the holdtime in PIM-Hello messages. 949 Likewise, Join/Prune messages indicate how long state should be 950 kept. This allows the sender to change its frequency without the 951 receivers requiring any special configuration information. 953 5.2 Containing state overhead 955 PIM-SM maintains less source-specific state than do dense mode 956 protocols. The more important issue faced by all existing 957 multicast routing schemes is how to reduce the amount of group- 958 specific state. This remains an open area of investigation. 960 6 Conclusions 962 We have presented a solution to the problem of routing multicast 963 packets in large, wide-area internets. Our approach 965 (a) uses constrained, receiver-initiated, membership 966 advertisement for sparsely distributed multicast groups; 968 (b) supports both shared and shortest path tree types in one 969 protocol; 971 (c) does not depend on a particular unicast protocol; and 973 (d) uses soft state mechanisms to reliably and responsively 974 maintain multicast trees. 976 The architecture accommodates graceful and efficient adaptation 977 to varying types of multicast groups, and to different network 978 conditions. 980 7 Acknowledgments 982 Tony Ballardie, Scott Brim, Jon Crowcroft, Paul Francis, Lixia 983 Zhang and John Zwiebel provided detailed comments on previous 984 drafts. The authors of CBT and membership of the IDMR WG 985 provided many of the motivating ideas for this work and useful 986 feedback on design details. 988 References 989 1. J.Moy. Multicast extension to ospf. 990 Internet Draft, September 1992. 992 2. D.Waitzman S.Deering, C.Partridge. Distance vector multicast 993 routing protocol, nov 1988. RFC1075. 995 3. D.Estrin, D.Farinacci, A.Helmy, D.Thaler, S.Deering, M.Handley, 996 V.Jacobson, C.Liu, P.Sharma, and L.Wei. Protocol independent 997 multicast - sparse mode (pim-sm): Protocol specification. 998 Proposed Experimental RFC, September 1996. 1000 4. D.Estrin, D.Farinacci, A.Helmy, V.Jacobson, and L.Wei. Protocol 1001 independent multicast - dense mode (pim-dm): Protocol 1002 specification. Proposed Experimental RFC, September 1996. 1004 5. S.Deering and D.Cheriton. Multicast routing in datagram 1005 internetworks and extended lans. 1006 ACM Transactions on Computer Systems, pages 85--111, May 1990. 1008 6. S.Deering. Multicast Routing in a Datagram Internetwork. PhD 1009 thesis, Stanford University, 1991. 1011 7. S.Deering. Host extensions for ip multicasting, aug 1989. 1012 RFC1112. 1014 8. W.Fenner. Internet group management protocol, version 2. 1015 Internet Draft, May 1996. 1017 9. J.Moy. Ospf version 2, oct 1991. RFC1247. 1019 10. J.Moy. Mospf: Analysis and experience. 1020 Internet Draft, July 1993. 1022 11. Y.K. Dalal and R.M. Metcalfe. Reverse path forwarding of 1023 broadcast packets. 1024 Communications of the ACM, 21(12):1040--1048, 1978. 1026 12. Ron Frederick. Ietf audio videocast. 1027 Internet Society News, 1(4):19, 1993. 1029 13. A.J. Ballardie, P.F. Francis, and J.Crowcroft. Core based trees. 1030 In Proceedings of the ACM SIGCOMM, San Francisco, 1993. 1032 14. David Wall. Mechanisms for Broadcast and Selective Broadcast. 1033 PhD thesis, Stanford University, June 1980. Technical Report N0. 1034 190. 1036 15. L.Wei and D.Estrin. The trade-offs of multicast trees and 1037 algorithms. In Proceedings of the 1994 international 1038 conference on computer communications and networks, San 1039 Francisco, September 1994. 1041 16. S.Deering, D.Estrin, D.Farinacci, B.Fenner, V.Jacobson, 1042 M.Handley, D.Thaler, L.Wei, and A.Helmy. Interoperability 1043 mechanisms for pim-sm and dvmrp. 1044 Internet Draft, January 1996. 1046 17. D.Estrin, D.Farinacci, A.Helmy, D.Thaler, S.Deering, M.Handley, 1047 V.Jacobson, C.Liu, P.Sharma, and L.Wei. Pim multicast border 1048 router (pmbr) specification for connecting pim-sm domains to a 1049 dvmrp backbone. Internet Draft, September 1996. 1051 18. P.Sharma, D.Estrin, S.Floyd, and V.Jacobson. Scalable timers for 1052 soft state protocols. 1053 Infocom 97, June 1996. 1055 Addresses of Authors: 1057 Deborah Estrin 1058 Computer Science Dept/ISI 1059 University of Southern Calif. 1060 Los Angeles, CA 90089 1061 estrin@usc.edu 1063 Dino Farinacci 1064 Cisco Systems Inc. 1065 170 West Tasman Drive, 1066 San Jose, CA 95134 1067 dino@cisco.com 1069 Ahmed Helmy 1070 Computer Science Dept. 1071 University of Southern Calif. 1072 Los Angeles, CA 90089 1073 ahelmy@catarina.usc.edu 1075 David Thaler 1076 EECS Department 1077 University of Michigan 1078 Ann Arbor, MI 48109 1079 thalerd@eecs.umich.edu 1081 Stephen Deering 1082 Xerox PARC 1083 3333 Coyote Hill Road 1084 Palo Alto, CA 94304 1085 deering@parc.xerox.com 1087 Mark Handley 1088 Department of Computer Science 1089 University College London 1090 Gower Street 1091 London, WC1E 6BT 1092 UK 1093 m.handley@cs.ucl.ac.uk 1095 Van Jacobson 1096 Lawrence Berkeley Laboratory 1097 1 Cyclotron Road 1098 Berkeley, CA 94720 1099 van@ee.lbl.gov 1101 Ching-gung Liu 1102 Computer Science Dept. 1103 University of Southern Calif. 1104 Los Angeles, CA 90089 1105 charley@catarina.usc.edu 1107 Puneet Sharma 1108 Computer Science Dept. 1109 University of Southern Calif. 1110 Los Angeles, CA 90089 1111 puneet@catarina.usc.edu 1113 Liming Wei 1114 Cisco Systems Inc. 1115 170 West Tasman Drive, 1116 San Jose, CA 95134 1117 lwei@cisco.com