idnits 2.17.1 draft-briscoe-ama-00.txt: -(1016): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(1017): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-24) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** Bad filename characters: the document name given in the document, 'draft-briscoe-ama-00.ps,', contains other characters than digits, lowercase letters and dash. ** Missing revision: the document name given in the document, 'draft-briscoe-ama-00.ps,', does not give the document revision number ~~ Missing draftname component: the document name given in the document, 'draft-briscoe-ama-00.ps,', does not seem to contain all the document name components required ('draft' prefix, document source, document name, and revision) -- see https://www.ietf.org/id-info/guidelines#naming for more information. == Mismatching filename: the document gives the document name as 'draft-briscoe-ama-00.ps,', but the file name used is 'draft-briscoe-ama-00' == There are 10 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 1452 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 2 instances of too long lines in the document, the longest one being 1 character in excess of 72. ** There are 3 instances of lines with control characters in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 256 has weird spacing: '... addr offse...' == Line 477 has weird spacing: '...ount of trees...' == Line 503 has weird spacing: '... mask mask ...' == Line 506 has weird spacing: '... wm wd ...' == Line 535 has weird spacing: '...storage pro...' == (2 more instances...) == Couldn't figure out when the document was first submitted -- there may comments or warnings related to the use of a disclaimer for pre-RFC5378 work that could not be issued because of this. Please check the Legal Provisions document at https://trustee.ietf.org/license-info to determine if you need the pre-RFC5378 disclaimer. -- The document date (21 Nov 1997) is 9651 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '4' ** Downref: Normative reference to an Experimental RFC: RFC 1075 (ref. '5') -- Possible downref: Non-RFC (?) normative reference: ref. '6' -- Possible downref: Non-RFC (?) normative reference: ref. '7' ** Downref: Normative reference to an Historic RFC: RFC 2201 (ref. '8') ** Downref: Normative reference to an Historic RFC: RFC 2189 (ref. '9') -- Possible downref: Non-RFC (?) normative reference: ref. '10' -- Possible downref: Non-RFC (?) normative reference: ref. '11' -- Possible downref: Non-RFC (?) normative reference: ref. '12' -- Possible downref: Non-RFC (?) normative reference: ref. '13' -- Possible downref: Non-RFC (?) normative reference: ref. '14' -- Possible downref: Non-RFC (?) normative reference: ref. '15' -- Possible downref: Non-RFC (?) normative reference: ref. '16' -- Possible downref: Non-RFC (?) normative reference: ref. '17' ** Obsolete normative reference: RFC 1884 (ref. '18') (Obsoleted by RFC 2373) -- Possible downref: Non-RFC (?) normative reference: ref. '19' -- Possible downref: Non-RFC (?) normative reference: ref. '20' -- Possible downref: Non-RFC (?) normative reference: ref. '21' -- Possible downref: Non-RFC (?) normative reference: ref. '22' -- Possible downref: Non-RFC (?) normative reference: ref. '23' -- Possible downref: Non-RFC (?) normative reference: ref. '24' -- Possible downref: Non-RFC (?) normative reference: ref. '25' -- Possible downref: Non-RFC (?) normative reference: ref. '26' -- Possible downref: Non-RFC (?) normative reference: ref. '27' -- Possible downref: Non-RFC (?) normative reference: ref. '28' Summary: 15 errors (**), 1 flaw (~~), 11 warnings (==), 24 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET DRAFT R Briscoe 2 MALLOC(?) Working Group M Tatham 3 Expiration: 26 May 1998 BT 4 21 Nov 1997 6 End to End Aggregation of Multicast Addresses 8 draft-briscoe-ama-00.ps,.txt [28] 9 Note: lack of figures and copious subscripts make the .txt version 10 difficult to read 12 Status of this Memo 14 This document is an Internet-Draft. Internet-Drafts are working 15 documents of the Internet Engineering Task Force (IETF), its areas, and 16 its working groups. Note that other groups may also distribute working 17 documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference material 22 or to cite them other than as "work in progress." 24 To view the entire list of current Internet-Drafts, please check the 25 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 26 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), 27 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 28 ftp.isi.edu (US West Coast). 30 Abstract 32 This paper presents an approach for solving the inherent problem with 33 multicast routing scalability - by co-operation between end-systems and 34 the network. We introduce an extremely efficient, elegant way to name 35 arbitrary sized inter-meshed aggregations of multicast addresses. This 36 is done in such a way that it is easy to calculate how to change the 37 name to encompass many more related names. We describe how these 38 aggregate names could be used anywhere in place of the set of addresses 39 to which they refer, not by resolving them into multiple operations, but 40 by a single bulk action throughout the routing tree, and in session 41 descriptions potentially including those for reservations. Initial 42 aggregation in end-systems might only reduce the problem by an order of 43 magnitude, but it is believed that this will provide sufficient 44 structure for routers to be able to recognise further aggregation 45 potential. To improve the chances of router aggregation, address set 46 allocation schemes must fulfil certain criteria that are laid down in 47 this paper. 49 Keywords 51 Multicast, address, aggregation, end to end, scalable, Internet. 53 Contents 55 1. Introduction & Requirements 56 2. Scheme Description 57 2.1 Construction of an AMA 58 2.2 Probability of address aggregation 59 2.3 Further storage optimisation (for IPv4) 60 2.4 Life-cycle of an Aggregate Multicast Address 61 2.4.1 How an application would assign an AMA 62 2.4.2 How a sending application would understand an AMA 63 2.4.3 How a receiving host would join or leave an AMA 64 2.4.4 How a receiving application would understand an AMA 65 2.4.5 How a router would aggregate AMAs 66 2.4.6 How an application would expand or contract an AMA if 67 required 68 2.5 Router Implementation 69 2.6 Derivation of recommended values for constants 70 2.6.1 Statistics 71 2.7 Evolution 72 3. Related Work 73 4. Limitations and Further Work 74 5. Security Considerations 75 6. Conclusions 76 7. Acknowledgements 77 8. References 78 9. Authors' addresses 80 1. Introduction & Requirements 82 The addressing scheme used for Internet multicast has been recognised as 83 unscalable since its inception. Every multicast group has to have a 84 separate entry in the forwarding tables of every router on its path. 85 Because multicast groups, being logical entities, have no direct 86 relationship with physical topology, they cannot directly be matched to 87 the hierarchical design of the Internet. In this paper we present an 88 approach for solving this problem indirectly, in the proven tradition of 89 the Internet; end to end - by co-operation between end-systems and the 90 network. 92 The primary problem is that multicast addresses are fixed by session 93 initiators, but aggregation relies on a clustering pattern emerging from 94 the demography of the receivers. Further, the initiator usually doesn't 95 know who the receivers will be until after the addresses to be used have 96 been fixed. Therefore, any clustering beyond small-scale aggregation 97 within applications will have to be achieved on a longer time-scale by 98 session initiators predicting the likely demography of their session 99 (e.g. based on past experience of similar sessions). The job of these 100 predictive systems will be much simpler if they have some degree of pre- 101 existing aggregation to nucleate around, rather than having to 102 crystallise aggregation from complete chaos without any bootstrap 103 process. 105 A large class of applications utilises multiple multicast addresses 106 internally. Further, many such applications consist of varying sets of 107 multiple multicast groups that are all sub-sets of a bigger set (e.g. 108 news, stock-feeds, network games, virtual worlds, distributed 109 simulations, all applications using layered multicast [1]). Any solution 110 should ensure that such "nucleating applications" use aggregations of 111 addresses that can be identified as such. 113 Related work is described later. Most is in the area of straightforward 114 multicast address allocation, which is, perhaps surprisingly, far from 115 being sorted out. It is very difficult, indeed probably reckless, to 116 comment on proposals that have only been hinted at in the odd mailing 117 list posting, or conference presentation. However, it appears that most 118 schemes are assuming that aggregation will be possible if addresses are 119 allocated based on the topological position in the Internet of the root 120 of the multicast tree. Although this may well be the case, it is only 121 true if most multicast applications will tend to be used by groups of 122 users that happen to use the same ISP (Internet Service Provider), or 123 they use ISPs that have a common parent ISP. No evidence that this is 124 the case has been presented. In some cases the tree root even seems to 125 be (erroneously) assumed to be in the same place as the session 126 initiator (the party requesting the address). Certainly, nowhere does it 127 seem to be taken into account that the likely receiver topology is just 128 as important in determining whether multicast trees can be aggregated. 129 Some of the proposals that are available also seem to implicitly assume 130 that routing state will be aggregated in the same way unicast routing is 131 aggregated - by discarding the right-hand bits of addresses which have 132 common left-hand bit fields, despite there being no greater significance 133 to any bit in a multicast address. These criticisms may prove to be 134 unfounded when the work in question is published in detail. 136 The primary thesis of this paper is that it will never be possible to 137 aggregate multicast state in routers if there is no means to name 138 aggregations of multicast addresses. In unicast, an address prefix is an 139 aggregation name, but for multicast the prefix means nothing. A similar 140 concept is required, but with radically different properties and 141 requirements. 143 These names must take up significantly less space than would the list of 144 addresses themselves. Further, if only some sizes of aggregations can be 145 named, this will lead to wastage of addresses. Therefore, any solution 146 must offer a way to name arbitrary sized aggregations. Further, the 147 naming of aggregations must not make any one type of address more in 148 demand than others (e.g. addresses with trailing zeroes, or with 149 sequences of zeros) or encourage the hoarding of certain addresses. It 150 should also be possible to grow or shrink the size of the set identified 151 by the name, while avoiding clashes with addresses in use by other 152 sessions. An informal list of further requirements of address allocation 153 schemes, which are not directly relevant to the discussion here, is at 154 [26]. 156 These aggregation names should themselves be open to aggregation, 157 implying the naming should be recursive. Otherwise the small-scale 158 clustering that the "nucleating applications" will be able to engender, 159 will simply result in these small clusters collecting in routers deeper 160 into the network. Where two names might be aggregated, it should be 161 possible to test this possibility, based on the structure of the names, 162 rather than by trial and error or exhaustive expansion to the lists of 163 addresses that the names resolve to. 165 To encourage their use, these aggregation names should offer 166 convenience and efficiency for the programmer. Preferably they should 167 enable bulk joins to and leaves from sets of multicast addresses. The 168 "nucleating applications" would then naturally assist the network in 169 aggregating together multicast addresses, not out of any sense of 170 altruism, but because using an atomic name for an aggregation is 171 convenient for the programmer and efficient for the system it runs on. 172 This approach could be accused of moving away from random address 173 allocation and therefore possibly encouraging address hoarding. 174 Certainly, once a pattern emerges, the aim should be to bias address 175 allocation to re-inforce the pattern. However, this is still originally 176 based on random allocation, so shouldn't lead to hoarding as long as we 177 don't aim to re-inforce patterns to such an extent that they become 178 entrenched beyond their useful life. 180 2. Scheme Description 182 The proposed scheme for identifying aggregates of multicast addresses 183 builds directly on IP multicasting [2]. The scheme assigns a new meaning 184 to (a number that looks like) any existing multicast address, but only 185 when it is in in a context associated with (at least) two other 186 parameters that define aggregation size. Multiple sets of these 187 aggregation size parameters can be associated with the same single 188 multicast address to signify a list of aggregates of multicast addresses 189 affording further aggregation. These aggregated addresses would be 190 suitable replacements even for single discrete multicast addresses both 191 in hard multicast routing state on the router and soft state in messages 192 initiating and refreshing it, as well as in session descriptions. Thus, 193 in the context of sending data to or receiving data from a multicast 194 address, there is no change from the existing meaning for the address - 195 it doesn't represent an aggregate, just a single address. In the context 196 of joining or leaving a multicast group and in the context of describing 197 a session, the address means an aggregation (although possibly of size 198 1) if associated with size parameters. This switch of meaning dependent 199 on context is consistent with the fact that the aggregation parameters 200 aren't associated with sending and receiving, only with allocation, 201 joining, leaving and describing. 203 For conciseness, we shall refer to this tuple of multicast address and 204 aggregation parameters as an aggregated multicast address (AMA(i)). 205 Thus, it is intended that AMAs are used in future versions of Internet 206 Group Management Protocol (IGMP) [3, 4] and the various multicast 207 routing protocols (DVMRP [5, 6], CBT [7, 8, 9] & PIM [10, 11, 12]) to 208 replace multicast addresses wherever they are used. It would also be 209 highly advantageous for session directory, invitation and announcement 210 protocols (e.g. SDP [13], SAP [14] & SIP [15]) to evolve to use AMAs in 211 place of multicast addresses. Evolution from the current version of IGMP 212 and multicast routing protocols and from current session description 213 protocols is discussed under the "Evolution" section. 215 Further, it would be natural for evolving multicast address allocation 216 schemes ([20, 21]) to use AMAs to define allocations concisely. Just as 217 sets of related AMAs can be aggregated together into a single AMA, the 218 reverse is also possible. Thus parts of large allocations can be "sub- 219 allocated" to further parties if required. However, allocation of 220 multicast addresses (and hence AMAs) is not as straightforward as for 221 unicast addresses. Allocation issues are discussed under the section on 222 "Probability of address aggregation". 224 Firstly we shall define how an AMA is constructed and how its 225 construction affects its meaning. Then we shall follow through a typical 226 life-cycle of an AMA: 227 1. how an application would assign an AMA 228 2. how a sending application would understand an AMA 229 3. how a receiving host would join or leave an AMA 230 4. how a receiving application would understand an AMA 231 5. how a router would aggregate AMAs 232 6. how an application would expand or contract an AMA if required 234 2.1 Construction of an AMA 236 The construction of an AMA is based on two architectural principles: 237 1. to identify a range of addresses by starting from a base, using an 238 offset (d or v) to the minimum and a range value (n or s) to 239 identify the maximum 240 2. to use two levels of this structure - the first based on bit-field 241 widths to home in on a rough area using very few bits, the second 242 using actual values to achieve the arbitrary aggregation size 243 requirement 245 ----------2^d---------->--2^n------ 246 |-v-->-s-> 248 The motivation for having a maximum for the rougher range as well as the 249 fine-grained one is explained later. 251 The address-like component of an AMA is defined to be a concatenation of 252 bit-fields as shown (with recommended values of bit widths shown for 253 IPv4 - see discussion later): 255 Description: mcast mask remainder 256 addr offset 257 flags 258 Value: d r 259 Width: 4 wd wr 260 Recommended width: 4 24 261 <--><-->.<-------.--------.-------> 262 1110XXXX.XXXXXXXX.XXXXXXXX.XXXXXXXX 264 The three associated aggregation parameters are defined below. The 265 recommended values of bit widths are shown for IPv4, but later a scheme 266 for optimising these is proposed (see Further storage optimisation). 267 Introducing this now would just confuse the explanation: 269 Description: mask aggr'n aggr'n 270 width size offset 271 Value: m s t 272 Width: wm ws ws 273 Recommended width: 4 16 16 274 <--> <-------.-------> <-------.-------> 275 XXXX XXXXXXXX.XXXXXXXX XXXXXXXX.XXXXXXXX 277 The value in the "mask width" bit field, m, (recall this is associated 278 with an address, not part of it) determines the width, n, of a bit-mask 279 overlaid on the remainder (as described next), where: 281 n = m + m0 --------(1) 283 and where m0 is a constant integer (set for this scheme) that determines 284 the minimum size of an AMA (see later for discussion). It is recommended 285 that: 287 m0 = 0 289 The value in the mask offset field, d, determines the offset in bits 290 from the right-hand end of the address to the right-hand end of the 291 mask, for example, if m0 = 0: 293 d r m s 294 Example pseudo-dotted 13 . 221 . 147 . 93 3 5 295 decimal value: <--><-->.<-------.--------.-------> <--><-... 296 11101101.11011101.10010011.01011101 0011 297 Mask (XXX): 11101101.11011101.XXX10011.01011101 298 <----d-------- 299 <-n 301 Note: in these diagrams, the label "d" is used to denote where the value 302 of d is stored and the length of the offset set by this value. 304 If n + d > wr , the bit mask simply wraps round into the right-hand end 305 of the remainder, r. 307 The value within the masked bits, vmin (100b in this example) determines 308 the base or minimum address of the AMA(ii). 310 The value of the aggregate size, s, (recall this is associated with an 311 address, not part of it) determines the upper or maximum address of the 312 AMA by setting the maximum value in the masked bits, vmax, where: 314 vmax = vmin + s - 1 316 The AMA is defined as the set of addresses that result when v is varied 317 between its minimum and maximum values. For example, the AMA defined by 318 our example address and example aggregation size parameters follows: 320 dotted decimal 321 11101101.11011101.10010011.01011101 237.221.147.93 a0 322 11101101.11011101.10110011.01011101 237.221.179.93 a1 323 11101101.11011101.11010011.01011101 237.221.211.93 a2 324 11101101.11011101.11110011.01011101 237.221.243.93 a3 325 11101101.11011101.00010011.01011101 237.221. 19.93 a4 327 Note that if vmax > 2^n, v may wrap within the bit-field (without 328 carrying outside the bit-field). 330 Note that s=0 means no values of v at all (useful later). 331 Note that meanings of s > 2^n are reserved but undefined. ------(2) 333 For convenience, we might denote an AMA as so far described by the 334 tuple: 336 (a, m, s) 338 which for the example above would be: 340 (237.221.147.93, 3, 5) 342 Where two or more AMAs are disjoint sets that share a common a (even if 343 the masked bits are different) and m, they may be represented by the 344 more general form that is the correct full definition of an AMA: 346 (a, m, s0, [(t1, s1), ... (ti, si), ... ]) 348 where ti is an offset from vmin, and si is the size of the set of 349 addresses based on vmini where: 351 vmini = vmin + ti. 353 Thus, continuing the above example, all three of the following sets of 354 addresses or AMAs represent the same thing: 356 a0, a1, a3, a4 357 (a0, m, 2), (a3, m, 2) 358 (a0, m, 2, 3,2) 360 This allows re-use of one address by associating it with multiple 361 aggregate size pairs with minimal extra storage. 362 Further, as the sequence of pairs along the AMA is processed, 364 if ti >= 2^n, m is incremented until ti < 2^n --------(3) 366 m remains at its higher value for subsequent pairs, unless it is 367 incremented further later. 369 Again continuing the above example to illustrate this point, if we 370 denote a8 as the address related to a0 like so: 372 <-v-> dotted decimal 373 11101101.11011101.10010011.01011101 237.221.147.93 a0 374 11101101.11011100.10010011.01011101 237.220.147.93 a8 376 then these two sets represent the same thing: 378 (a0, 3, 2), (a3, 3, 2), (a8, 4, 7) 379 (a0, 3, 2, 3,2, 8,7) 381 The motivation for this particular design of scheme should start to 382 become apparent; AMAs can be fully manipulated without caring about 383 which multicast address was randomly chosen to start with as the base. 384 This ensures there is no more demand for any one multicast address than 385 another. vmin doesn't have to be all zeroes to ensure all 2^n 386 combinations of bits under the mask are used. This ensures that the task 387 of aggregating multicast addresses into AMAs and small AMAs into big 388 AMAs should be achievable with just bit-wise logic operations. 390 2.2 Probability of address aggregation 392 Fig 1. attempts to represent multicast trees of various topologies 393 across the Internet as various three-sided shapes. The Internet is 394 represented by the oval with end-systems around the perimeter. The root 395 of each tree is indicated by a dot. The receivers that have joined the 396 tree are spread across the side of the "triangle" opposite the tree 397 root. The shaded areas are an attempt to represent the aggregation 398 potential of a representative topology by representing an even density 399 of receivers around the edge and an even density of routers across the 400 oval, rather than being a representation of the physical density of 401 hosts and routers. Trees "a", "d" and "e" are core-based trees, with 402 tree roots within the network rather than on end-systems. Trees "b" and 403 "c" are source-based with roots on end-systems. It should be noted that 404 these trees are probably not typical, mainly because it is difficult in 405 such a diagram to draw trees that have receivers spread all over the 406 Internet. However the trees have been chosen to illustrate various 407 points. 409 {See .ps or [28] for fig} 410 Fig 1 - Various multicast tree topologies across the Internet 412 Although trees "a" and "d" have roots that could well be in the same 413 domain (autonomous system), their receiver sets have caused them to grow 414 in completely opposite directions so no aggregation would be possible. 415 Tree "e" is completely contained within tree "d" although their roots 416 could well be in completely unrelated domains. Thus it should be 417 possible to aggregate the routing information they need. Tree "a" is 418 nearly contained within tree "b", so it would be desirable to be able to 419 aggregate their routing at least where they overlap. Trees "a" and "c" 420 have a similar relationship to that between "a" and "b". Trees "b" and 421 "c" cross eachother at the network core so some aggregation may be 422 possible where the directions of the trees coincide on certain links. 423 There could even be occaisions where the routing of "a", "b" and "c" can 424 all be aggregated together. 426 It should now be possible to see that, although it is more likely that 427 two trees with close roots will have routing information that is 428 condusive to aggregation, this is by no means a necessary or sufficient 429 condition. In fact it appears that receiver distribution is a better 430 indication of aggregation potential. 432 A multicast address needs to be allocated when a session is initiated, 433 at which time the likely receiver topology may be difficult to predict. 434 For certain types of tree (e.g. source-based) it is much easier to be 435 certain where the root should be at this early stage. For core-based 436 trees, the positioning of the core is ideally dependent on the 437 prediction of the receiver topology (which can be sketchy, as we have 438 already said). 440 Receiver topology is difficult but not impossible to predict. This 441 usually reduces to a marketing-type problem. What is clear is that 442 address allocation schemes that are based exclusively on the position of 443 the root of the multicast tree (or worse the position of the session 444 initiator, who may not even be taking part once the session is going) 445 will not realise anything close to the full aggregation potential of any 446 conceivable topologies of multicast trees that are likely on the 447 Internet. A good scheme must allow addresses to be allocated to sessions 448 based on a combination of root and receiver topology. It is outside the 449 scope of this paper to solve how this allocation would be done, but it 450 is clear that a scheme that disallows a future solution to this problem 451 is a bad scheme. 453 The scheme proposed for AMAs allows just such a combination of 454 allocation between root and receiver topology. Taking the example AMA 455 used already, 457 d r 458 Example pseudo- 13 . 221 . 147 . 93 459 dotted decimal value: <--><-->.<-------.--------.-------> 460 11101101.11011101.10010011.01011101 461 |<----d-------- 462 | 463 d | f 464 root-based allocation: <-->. |<----.--> 465 receiver-based allocation: --------.--> <---- 466 |_________g_______________| 468 The 16 bits, labelled g, to the left of the offset (to which d points) 469 are those that the largest possible mask could cover so they might 470 conceivably all be aggregated together. Therefore these bits should be 471 allocated purely on likely receiver positioning. In the example this 472 field wraps after 11 bits with the remaining 5 bits being at the far 473 right hand end of the address. On the other hand, it is proposed that d 474 and f would be allocated based on the position within the Internet of 475 the root of their multicast tree (possibly combined with the likely 476 direction from the root from which most receivers would join, to take 477 account of trees like "d" and "e" above). f is defined as the 8 bits to 478 the right of the offset (wrapping round the length of r if necessary). 479 The mask can never cover f which is why it is available for allocation 480 based on the root position. 482 A simple short term solution for an address allocation scheme might be 483 to generate g, d and f from an algorithm seeded by a unicast address 484 prefix (or a set of a few representative prefixes) that represents the 485 majority of likely receivers and the unicast address of the tree root. 486 The latter would be necessary if the tree were source-based, but it may 487 be possible for the allocation service (or some other service) to 488 suggest the best tree root position if using core-based multicast. 490 2.3 Further storage optimisation (for IPv4) 492 Optimisations considered but rejected are recorded elsewhere [25]. 494 2.3.1 Mask width, m 496 In the contexts where AMAs are used, the first four bits of the address 497 are redundant, as a non-multicast address wouldn't make sense. m is the 498 natural candidate to store in place of this 1110 bit pattern which is 499 fixed for IPv4 (class D) addresses (but see below). When the AMA is read 500 from storage, m can be read into another variable and this bit pattern 501 can be re-instated. 503 Description: mask mask remainder 504 width offset 505 Value: m d r 506 Width: wm wd wr 507 Recommended width: 4 4 24 508 <--><-->.<-------.--------.-------> 509 XXXXXXXX.XXXXXXXX.XXXXXXXX.XXXXXXXX 511 However, there is no guarantee that the multicast address range will 512 always be confined to this fixed first four bits. For instance, class E 513 addresses (starting 1111) are currently reserved, and might conceivably 514 form an extension to the IPv4 multicast address space in the future. 516 Thus this optimisation is suggested but if it were adopted, the 517 implications for the future use of other ranges as an extension to the 518 multicast range would need serious consideration. However, as this is an 519 optimisation, implementations could be designed so that if other address 520 ranges ever became valid for multicast, the optimisation could be 521 removed. 523 2.3.2 Aggregation size, s 525 Many applications will have sessions with less than 16 multicast 526 addresses and most will have less than 256. Thus it seems wasteful to 527 use 16 bits for s (and t) when most of the time, most of their leading 528 bits will be zeroes. Also we must not forget that there will be many 529 sessions that only use one multicast address. 531 Therefore, it is recommended that ws be dependent on the value of m at 532 least in the environments indicated: 534 Rule applies in: 535 router storage protocol fields 536 If m = 0 (0000b), ws = 0 Y N? 537 If 0 < m <= 1 (0001b), ws = 2 N?? N 538 If 1 < m <= 3 (0011b), ws = 4 N? N 539 If 3 < m <= 7 (0111b), ws = 8 Y N? 540 If 7 < m <= 15 (1111b), ws = 16 Y Y 541 \--------(4) 543 The question marks indicate where the rule is open for discussion after 544 implementation experience. 546 It is recommended, at least for router storage, that m = 0 is used to 547 indicate an AMA that is actually just a single discrete multicast 548 address, where associating a size of 1 with it would be a waste of 549 space. 551 However, this appears to make the widest possible mask 15 bits, because 552 it precludes using m = 0 to mean n = 16. Although the need for a 16 bit 553 wide mask is likely to be rare, it is still possible to force the mask 554 to be this wide by exploiting the equivalence of the two AMAs in the 555 following example (due to the ability of t to increment m, given in 556 formula (3)): 558 (a, 16, s) (impossible value of m) 559 (a, 15, 1, >2^15, s) 561 The two conditions that lead to sub-byte storage requirements are not 562 recommended as they are likely to be more trouble than they are worth, 563 even for router storage. 565 It has also been assumed that saving a byte or two is not worth it for 566 protocols (like IGMP) as opposed to router storage. The hassle of a 567 conditional width field is probably not worth the small reduction in 568 message size that would result. 570 Note that the width of t would always follow the same rule as that for 571 s. That is: 573 wti = wsi 575 Note that in all cases (except m = 0) t can always be large enough, 576 within the available field-width, to be capable of incrementing m into 577 the next range, by formula (3). For example, these two AMAs are 578 equivalent : 580 (a, 8,0) 581 (a, m=7,s0=0, t1=128,s1=128) 583 In the second AMA, the starting mask width is 7, so s0, t1 and s1 are 8 584 bits wide. But, by formula (3), because t1 >= 2^7 (has the first bit 585 set) m is incremented to 8. 587 Note that if t forces m to increment into a range that would alter the 588 width of both t and s (formula (4)), the increase in width of t doesn't 589 happen until the next pair of t and s, if at all, while the s in the 590 current pair should be widened immediately. The reasoning, is that the 591 width of t mustn't be affected by its own value. 593 2.4 Life-cycle of an Aggregate Multicast Address 595 2.4.1 How an application would assign an AMA 597 First, the application initiating (or expanding) the session must decide 598 how many multicast addresses it needs (bearing in mind the scheme allows 599 it to expand or shrink from this initial decision later). This we will 600 call smax. (As discussed above, even if smax is 1, it may still make 601 sense to use AMAs.) 603 We assume some smart address allocation scheme has been worked out that 604 meets the requirements laid down in the section on the Probability of 605 address aggregation and may indeed be like the simple one outlined in 606 that section. It should be noted that a sub-optimal, tactical address 607 allocation service wouldn't stop this scheme working, but as the art of 608 allocation improved, aggregation results would improve. 610 The application (or session initiator) now predicts the likely receiver 611 topology for the session it is initiating. From this, it decides the 612 best position for the root of the multicast tree(s) that make up the 613 session (or some other service does this at its request). The 614 application would then pass all these constraining parameters, including 615 the size of AMA it required to the address allocation service. The 616 address allocation service would return an AMA of the required size. 617 To arrive at an AMA, it would internally (probably) allocate g based on 618 the receiver topology and d and f based on the root and receiver 619 topology given to it by the application. 621 Also, internally, it would have to calculate n such that 2^(n-1) < smax, 622 but 2^n >= smax . 624 In other words, it would decide integer n where no more than 2^n 625 multicast addresses are needed in the session. m follows rather 626 straightforwardly from formula (1). 628 For example, if the session needed 6 multicast addresses and m0 = 0, it 629 would use m = n = 3. 631 We shall assume the address allocated is the example address used above. 632 It then joins 6 multicast groups simultaneously by sending one message 633 to join the AMA: 635 (237.220.147.93, 3, 6) 637 If the allocation scheme is light-weight it may give no guarantees that 638 the any of the addresses haven't been simultaneously allocated to other 639 sessions. Therefore, the application may have to monitor whether any of 640 the multicasts are in use. If they are free, it holds them for itself 641 (as is done with discrete multicast addresses today), then goes ahead 642 and advertises the session (or invites users in) using this AMA. 644 In fact, if there is a finite possibility of simultaneous allocation, 645 the above is not necessarily the best opening strategy for grabbing a 646 set of free multicast addresses. Which strategy is best depends on the 647 size of the set required and the level of utilisation of the address 648 space (which will oscillate daily and increase into the future, 649 decreasing the probability of finding a free set in a single attempt). 650 This is a large enough problem space to become a topic for study in its 651 own right (see Further Work), so it is not gone into in any depth here, 652 save to make some broad generalisations. 654 If the probability of finding a free set of multicast addresses first 655 time is high, the obvious strategy would be to just try another address 656 set if part of the first was busy. If the allocation service couldn't 657 guarantee all its allocations were not in use, it would have to be 658 possible to receive a slightly different response to a repeated 659 identical request. 661 Otherwise it may be best to test a larger set than is required, then 662 drop the busy ones. AMAs make this easy, as it is often possible to 663 derive one smaller AMA from a larger one to deliberately avoid some of 664 the addresses. 666 Yet a third method would be to try a number of contiguous or overlapping 667 AMAs, then drop those that tested busy and amalgamate the remaining ones 668 into one AMA (again often but not always possible). 670 The comments in this section on allocation of AMAs for applications 671 apply equally well to allocation of AMAs to allocation sub-authorities 672 in an allocation hierarchy. However, it is made clear in the section on 673 Probability of aggregation that allocation hierarchies per se are not a 674 panacea where multicast is concerned. 676 2.4.2 How a sending application would understand an AMA 678 This section is included to remind the reader that sending to an AMA 679 probably isn't a valid activity. As explained earlier, the concept of an 680 AMA doesn't exist in the context of sending. If the sending part of an 681 application includes understanding of sessions it may well utilise AMAs 682 (this is also a sure sign it hasn't been written in a structured way!). 683 However, the raw sending aspects will be direct to discrete multicast 684 addresses, and shouldn't involve AMAs, unless they are modelled as an 685 array of multicast addresses. 687 It might be desirable to send the same information to multiple multicast 688 group addresses using only one socket and one flow of packets through 689 the network. If a clear need for this were identified, it would be 690 sensible to use the same addressing scheme as AMAs provide. However, no 691 clear need is immediately apparent to the author, so it would not be 692 sensible to update APIs for sending to multicast sockets and the router 693 code that disseminates multicast packets so that they act on this extra 694 parameter. 696 2.4.3 How a receiving host would join or leave an AMA 698 This is the other main use for AMAs besides in session initiation. An 699 application's request to join an AMA given in a session announcement or 700 invitation would be translated directly into a (future) IGMP call to 701 join that AMA as a single atomic action. If the AMA was a superset of 702 another AMA already having its soft-state joins regularly refreshed by 703 the stack, the stack would have to merge the two AMAs into one expanded 704 AMA (if it didn't the router would, before forwarding the joins). 706 Similarly, a call to leave an AMA would translate directly to a (future) 707 IGMP call to leave. If the call to leave an AMA resolved to a sub-set of 708 the multicast addresses previously joined, the router would be designed 709 to be able to handle contracting the size of the AMA it considered was 710 still of interest on that interface (or layer 2 address) down to the AMA 711 that mapped to the remaining addresses. The host stack would also have 712 to be able to contract its AMA for it to regularly refresh the soft- 713 state of the remaining joined addresses. 715 The stack would have to ensure the frequency of join refreshes remained 716 sufficient while it amalgamated or contracted down any out of phase AMA 717 refreshes. 719 AMAs can be used as a consistent computational type for addressing any 720 number of multicast addresses, whether the AMA resolves to many or just 721 a single multicast address. In fact, APIs (application programming 722 interfaces) could pre-empt the introduction of AMAs into the network, by 723 presenting AMAs to the programmer but having middleware or the stack 724 convert AMAs into sets of discrete multicast addresses until the network 725 is upgraded. However, it would be sensible for routers and protocols to 726 signify an AMA of size 1 by not storing or passing the aggregation size 727 parameters at all (see Further storage optimisation - this would help 728 backward compatibility too). Otherwise the efficiency benefits of the 729 scheme would be offset by the 25% increase in storage required for each 730 non-aggregated, isolated address. On the other hand, to hide AMAs from 731 those programmers not interested in sessions with multiple multicast 732 addresses, it may well be best to implement an API for AMAs by 733 overloading a similar one for discrete multicast addresses(iii). 735 2.4.4 How a receiving application would understand an AMA 737 All the comments above on the irrelevance (and possible relevance) of 738 AMAs to sending data apply equally to receiving. 740 2.4.5 How a router would aggregate AMAs 742 As a router received the equivalent of "join" and "leave" requests and 743 refreshed "joins" it would continually be looking for matches at two 744 levels: 746 1. on the one hand, the router would continually attempt to merge 747 incoming AMAs on a per interface basis to reduce the state being 748 held in its tables. 749 2. on the other, the router would attempt to aggregate the outgoing 750 "join" refresh messages it forwarded upstream on a per-interface 751 basis too. 753 AMAs can be aggregated at two levels (whether by a router, or by a 754 host). 755 1. where two or more AMAs share a common a (even if the masked bits 756 are different) and m, overlapping s ranges can be merged 757 2. where a whole set of combinations under a mask is present (s = 0), 758 m can be incremented and this can be represented as half a full 759 range at the wider level, which is then subject to further 760 aggregation using the first technique again 762 Where multicast trees cross eachother (e.g. "b" and "c" in Fig 1), it 763 would be necessary to "de-aggregate" routing at the point where the 764 routes to the two roots diverge. As stated before, because AMAs can be 765 split as easily as they can be amalgamated, this is not a problem. 766 The section on Router Implementation should be referred to for more 767 detail on these matters. 769 2.4.6 How an application would expand or contract an AMA if required 771 As long as the overall application session was utilising an AMA with s = 772 smax, any one receiver could vary vmin and s around to utilise any 773 subset AMA of the larger AMA set (useful for many simulation 774 applications). Thus contraction and re-expansion of individual joins is 775 straightforward. 777 To expand a session beyond the original smax but still with a single AMA 778 sometimes requires time for preparation of the ground and a degree of 779 luck. In all cases this is because one is attempting to acquire use of 780 extra multicast addresses without altering the addresses already being 781 used. This limits the sets of interesting multicast addresses to AMAs 782 that are related to the one in use. This is a limitation of the AMA 783 scheme when compared to schemes based on arbitrary masks such as [19] 784 (which has different limitations discussed under Related work). However, 785 this was a concious compromise to avoid the larger storage needed for 786 arbitrary masks as opposed to contiguous ones. 788 The following operations would (probably) be enacted by the address 789 allocation service in response to a request to increase the size of an 790 existing allocation (the uncertainty is because the detailed design of 791 such a service is outside the scope of this paper). 793 Firstly, if not already there, smax should be increased to 2^n and the 794 new addresses tested for prior allocation. 796 If enough addresses still aren't free, n can be incremented without 797 harm, then s can be doubled for every increment and the new addresses 798 tested. 800 If this doesn't find enough free addresses, keeping n incremented , vmin 801 can be changed while s is increased to try addresses related to the 802 current range which will still give the network a chance to aggregate 803 addresses as long as other receivers are co-operating within the same 804 rules. 806 If this doesn't work, one can look for a close but disjoint AMA (or 807 AMAs) and watch for a gap between the current AMA and the new one until 808 it can be closed by a superset AMA later. 810 Beyond that, the only solution is to give up and use more than one AMA, 811 but this is unlikely to be necessary. 813 2.5 Router implementation 815 The discussion in this section shouldn't be taken to imply that this is 816 the best way to implement multicast routing. It is simply necessary to 817 establish at high level that it would theoretically be possible to 818 modify multicast routing implementations to take advantage of AMAs. 820 Currently, the most general form of multicast routing table is a 821 database with (a usually large number of) rows for each unique 822 address/source pair as follows: 824 a, S, fI, [fOj, ... fOk,] misc 826 where the variables are defined as: 828 a : multicast address 829 S : source 830 fI : incoming interface 831 fOj ... fOk : list of outgoing interfaces to which to duplicate and 832 forward packets 833 misc : other miscellaneous information not relevant to this 834 paper (e.g. MTUs, prune state) 836 To take advantage of the aggregation and deaggregation potential of 837 AMAs, the most general modification to this would be for each row to 838 contain the following: 840 a, m, S, [(tm, sm, fIh,), ... (tn, sn, fIi,)], [(tp, sp, fOj,), ... 841 (tq, sq, fOk,)], misc 843 Here one row represents the routing for all the related AMAs that are 844 being aggregated and de-aggregated at this router with respect to one 845 source. We define related AMAs as AMAs that can be represented with the 846 same base address and mask, only differing in their offset and size. 848 This is illustrated in Fig 2. It should be noted that the three trees, 849 A1, A2 & A3 represent related AMAs, not discrete multicast trees 850 (although they could be AMAs of size one), because, as noted before, 851 routing is one of the contexts where AMAs can completely replace 852 discrete multicast addresses. Thus, for these related AMAs, only one set 853 of routing information would be passed in or out of each interface (e.g. 854 because all three AMAs "join" f4, only one routing message would enter 855 this interface for them all). Another way of defining related AMAs is to 856 say that it is possible to represent their union as a single AMA (we 857 will call this the super-AMA). The only distinction between the three 858 trees illustrated is that they represent the different sub-sets (of the 859 super-AMA) that share identical routing at this router. Each row such as 860 that presented above represents the complete routing definition for each 861 super-AMA at this router. 863 {See .ps or [28] for fig} 864 Fig 2 - Aggregation and de-aggregation of multicast trees 866 The routing row for this super-AMA is similar to the previous row 867 structure, except that: 868 � the multicast address field has subtly different semantics. It 869 doesn't mean a single address, but it is taken to mean the base 870 address with which all the offset/size pairs in the row are 871 associated - the base address of the super-AMA. 872 � the mask is included, which for IPv4, could be overlaid over the 873 front of the base address as described under Further storage 874 optimisation 875 � each outgoing interface is associated with an offset/size pair(s) 876 (t and s are as already defined under Construction of an AMA above) 877 which defines the outgoing AMA on that interface when associated 878 with the base address 879 � there may me more than one incoming interface associated with all 880 the related AMAs being de-aggregated at this router, each of which 881 is listed with an offset/size pair(s) as for the outgoing 882 interfaces. 884 For example, if A1, A2 & A3 in Fig 2 were the three AMAs respectively: 886 (a,m,17), (a,m,0, 31,42), (a,m,0, 20,37) 888 then the routing table entry for the super-AMA (a,m,42) with respect to 889 source S would be: 891 a, m, S, [(0, 17, f1,), (20, 42, f3,)], 892 [(0, 42, f4,), (0, 42, f5,), ((0, 17, 31, 42, f6,)], misc 894 When a packet arrives at the incoming interface, it's destination is 895 matched against the AMA associated with each outgoing interface. A match 896 causes it to be duplicated and forwarded out of that interface. 898 Obviously, the row structure above isn't the most efficient for routing 899 look-ups. It is more suited for routing updates. This suggests that a 900 read-optimised data-structure should be built from this write-optimised 901 structure and both held by the router, with the former being regularly 902 refreshed from the latter. This is similar to the way most unicast 903 routing is implemented. Whether there is one read-optimised table per 904 router or a number of sub-sets of the main table specific to each 905 interface is beyond the scope of this paper. 907 2.6 Derivation of recommended values for constants 909 The principles for choosing the constants for IPv4 have been laid out in 910 detail, so that they can be re-applied judiciously for IPv6 once (or 911 while) its multicast addressing scheme [18] is finalised. 913 wd = 4 914 It is an important design feature that the mask offset is contained 915 within the address, rather than added as another parameter outside the 916 address. This is because, as a packet arrives at the router, this offset 917 can be read to find the point at which any mask would be applied. This 918 should greatly speed the process of matching a packet against the AMAs 919 in the routing table. It also crams as much meaning into the address as 920 possible, reducing the extra state held in routers and passed in 921 protocols. The offset doesn't need to be varied as aggregation 922 progresses, so it makes sense for it to be a fixed value, and therefore 923 it might as well be part of the address. 925 For IPv4, a width of 4 for the offset was a compromise. Ideally the 926 width would have allowed any offset value across the remainder (28 - 927 wd). Note that, if the offset did allow the mask to start anywhere in 928 the remainder, this does not mean there would be no space in the 929 remainder that was guaranteed to always be outside the mask (e.g. for 930 address allocation related to the position of the tree root - see "How 931 an application would assign an AMA" above). This depends on the mask 932 width, not the offset size. Therefore, wd could have been set to 5 933 leaving the remainder 23 bits wide. However, this would only have used 934 half the value of the fifth bit, so 4 was chosen as a compromise between 935 availability of more descriptions for AMAs and utilisation of bits. It 936 should be noted that, because AMAs overlap in their use of addresses, 937 there are diminishing returns in having more ways to split the 938 description of sub-groups of the address space. Also a 5/23 split was 939 "binarily awkward" compared to a 4/24 split. 941 wm = 4 942 Some thought was given to limiting the mask to 3 bits wide. This would 943 have led to a maximum AMA size of 2^8, which would probably have been 944 adequate if just considering application aggregation. However, we hope 945 that higher level techniques will be found to enable considerable router 946 aggregation based on the foundations laid by AMAs. Limiting routers to 947 aggregation of just two orders of magnitude seemed short-sighted, when 948 using one more bit would raise the maximum aggregation limit to 2^16 949 (and align on a half-byte boundary). The importance of keeping the size 950 of m down, is that it is stored in the router for each AMA. 951 The bit mask is deliberately fixed on its right-hand end so that when m 952 is varied, the LSB of the mask stays in the same place otherwise large- 953 scale aggregation would be terribly hard. 955 ws = 0, 8 or 16 956 The discussion of this value's potential dependence on m is well 957 rehearsed under Further storage optimisation . The motivation for 958 keeping its size down is again router storage. It is expected that a 959 large number of values of s will be under 16 (at least in edge routers), 960 so it would be profligate to allocate a whole byte, let alone two just 961 because such bit-widths will be needed for higher level aggregation or 962 applications with more complex sessions. 964 m0 = 0 965 We could have made m0 = 1 but this would have led to an extra increment 966 operation every time the AMA was interpreted. Whether this is more 967 efficient than testing for m = 0 is debatable (machine instruction set 968 dependent). n = 0 could be made illegal rather than mapping it to 16, 969 which would only lose a few ridiculously large (?) aggregates. 971 Alternatively, we could even have made m0 = 2, on the grounds that AMAs 972 of size 2 are uninteresting, and AMAs of size 1 can be indicated by no 973 value of s. We decided against this on grounds of caution over lack of 974 elegance. If we put discontinuities in the maths, it makes it much more 975 difficult to build higher order mechanisms that are simple. 977 2.6.1 Statistics 979 Below are listed the number of distinct AMAs of a selection of sizes 980 (note they are not confined to powers of 2): 981 AMA size distinct AMAs 982 {left as an exercise for the reader or until the author gets round to 983 working out the formula...} 985 2.7 Evolution 987 It is a simple matter to convert an AMA to the list of addresses it 988 refers to. It is also possible to convert lists of addresses into AMAs 989 or lists of AMAs. 991 AMA-enabled hosts and routers shouldn't send or forward joins or leaves 992 to routers that don't understand AMAs, they should convert the AMAs to 993 lists of discrete multicast addresses. 995 This assumes routers can determine which version of a multicast routing 996 protocol their upstream router for each interface is using. This should 997 be straightforward as version stamped routing will also be arriving at 998 the interface down that link (multi-host links will cause complications 999 for some routing protocols). This also assumes hosts can determine which 1000 version of IGMP is being used by their upstream router which should be 1001 possible, but is probably difficult. 1003 Session descriptions would either have to use both forms for an interim 1004 period, or parallel descriptions in the two versions would have to be 1005 transmitted on different channels. 1007 IGMP, multicast routing protocols and the session description protocols 1008 could evolve independently as long as applications and AMA-enabled 1009 routers had the capability installed to convert AMAs into lists of 1010 discrete multicast addresses when necessary. 1012 3. Related Work 1014 The initial schemes for allocation of multicast addresses involve three 1015 distinct classes of address: 1016 � addresses that are permanently assigned for specific purposes [23] 1017 � address ranges that are permanently assigned for certain uses [23], 1018 from which single addresses are intended to be temporarily assigned 1019 � an address range reserved for assignment within an administrative 1020 scope which may therefore safely have multiple assignments within 1021 multiple administrative domains [17] 1023 For several years, multicast group addresses for public Mbone 1024 (experimental Internet multicast backbone overlay) sessions have been 1025 allocated using the sd (session directory) or sdr tools. In addition to 1026 advertising multicast sessions based on the SDP [13], sd and sdr 1027 encompass a proprietary mechanism for allocating multicast addresses to 1028 sessions from an IANA (Internet Assigned Numbers Authority) defined 1029 address range 224.2.128.0-224.2.255.255 [23]. The algorithm used has not 1030 been described publicly in detail, but is essentially random assignment 1031 from the available addresses to ensure very efficient use of the address 1032 space, but taking account of existing allocations to minimise the 1033 probability of collisions. There is also a collision detection algorithm 1034 included for the cases where an address is chosen simultaneously by 1035 multiple parties. 1037 Thus there is currently no standard mechanism for address allocation, 1038 but the author of the sd tools recently agreed to publish the address 1039 allocation mechanism from sdr as a separate protocol - Address 1040 Allocation Protocol (AAP) [20] - which could be incorporated into other 1041 address allocators. 1043 Microsoft created an alternative means for allocation of multicast 1044 addresses by extending their Dynamic Host Configuration Protocol, 1045 originally designed to allocate temporary unicast addresses. Multicast 1046 DHCP (M-DHCP) [22] will allocate multicast addresses, consistent with 1047 the requested Administrative Scope (Link Local, Organisation Local, 1048 Global, etc.) and with a `lease period', the effectiveness of which is 1049 unproven. This lease period can be renewed if desired. The M-DHCP 1050 authors recently removed all aspects of hierarchy from the M- 1051 DHCP protocol, to confine it to allocating multicast addresses to hosts 1052 initiating sessions, rather than also using it to allocate ranges of 1053 addresses down allocation hierarchies. This was in response to criticism 1054 over its static hierarchy and static allocation wasting the address 1055 space. All aspects of multicast address hierarchy, and allocation policy 1056 are expected to be handled by a separate protocol, such as AAP (see 1057 above). 1059 Where allocation of ranges of addresses to organisations is concerned 1060 there is concensus among the known proposals that these should not be 1061 static. Range allocations should be over one (longish) timescale with 1062 the allocation of addresses from within that range for the duration of 1063 individual sessions. The intention is to avoid long term "ownership" of 1064 addresses ranges by assuring organisations that they will be able to 1065 have addresses on demand as long as they co-operate with everyone else 1066 in returning unused address space. Thus the multicast address allocation 1067 proposals are distinct from unicast in the following aspects: 1068 � No fixed assignments of address ranges to IP domains - address 1069 ranges are claimed as needed 1070 � Random allocation of addresses to sessions within a domain 1071 � Allocations of addresses have finite lifetimes 1073 The MASC (Multicast Address Set Claim) [21] is one such proposal. This 1074 is work in progress but the outline of the mechanism as described by the 1075 authors is as follows: 1076 1. The ISP (Internet Service Provider) claims an address set (in 1077 general, the ISP's next level provider will be in control of address 1078 allocation to the ISP) 1079 2. Address ranges will be allocated for a set lifetime and in such a 1080 way that they may be aggregated. Allocation will take account of 1081 administratively scoped multicast addresses. 1082 3. The ISP then advertises this address range to other domains using 1083 BGP4++ [21] (or similar mechanism). All border routers will thus 1084 hear these announcements. It waits a certain time, however, (typ. 3 1085 days) before using the address range, to detect collisions 1086 4. Address allocation mechanisms (such as AAP or M-DHCP (see below)) 1087 will listen to announcements and allocate addresses to sessions 1088 appropriately. 1090 A possibility being investigated for MASC address range allocation is 1091 "Kampai style" addressing [19] which uses a non-continuous mask to 1092 define the allocated range so is more flexible when range sizes need to 1093 be arbitrarily increased or decreased. However, Kampai-style addressing 1094 allocates ranges in sizes that are powers of two and hence could be very 1095 wasteful where large ranges are concerned. Although there is brief 1096 consideration of a way to remove this restriction, it is admitted it 1097 will be more complex and hasn't been thought through. 1099 The BGMP (Border Gateway Multicast Protocol) draft [24], proposes a new 1100 architecture for Inter-Domain IP multicast: 1101 1. Address ranges are associated (at least temporarily) with domains, 1102 allocated using the MASC mechanism described above. 1103 2. These address ranges will be advertised globally. The proposal is 1104 that BGP4 will be used for this, given the proposed Multiprotocol 1105 Extensions [21]. Thus, multicast address ranges will be advertised 1106 in much the same way as unicast IPv4 NLRIs are advertised in BGP4 1107 now. They will be advertised at least a day in advance of use, to 1108 enable any clashes of address allocation to be resolved. 1109 3. A multicast session initiated within a domain will be allocated an 1110 address from within the domain's multicast address assignment. 1111 4. BGMP then assumes an inter-domain shared tree, for which the `root 1112 domain' (the focal domain of the shared tree) is the domain owning 1113 the address. 1115 Some alternative ideas generated while preparing this paper (but 1116 rejected in favour of the scheme presented) are recorded elsewhere [25] 1118 4. Limitations and Further Work 1120 The ability to do large-scale aggregation is based on hope that higher 1121 level organisation will be achieved. This scheme "feels" as if it is a 1122 good foundation for solutions to these problems, but the author can't 1123 give any guarantees without more work. This implies it will be necessary 1124 to simulate, build, test & refine. 1126 Aggregation of multicast addresses by this scheme will probably mean it 1127 is more efficient for routers to store multicast routing state keyed on 1128 interface than on multicast address. In other words, a table of 1129 aggregated multicast addresses would be held for each interface, rather 1130 than a table of interfaces for each address. The implications of this 1131 need investigation. 1133 The problem of how a router would efficiently look up each multicast 1134 packet in a table of AMAs has been deliberately left to one side. 1135 Because AMAs are logically similar to unicast address prefixes, similar 1136 techniques should be appropriate. This may not appear obvious, because 1137 an AMA is more obfuscated than a unicast routing prefix. Briefly, the 1138 first item to be extracted from an incoming packet would be d. This 1139 would point to where the mask started. The eight bits to the right of 1140 this mask would then be guaranteed to be unmasked (invariant), so d and 1141 this octet could be looked up in a Patricia trie or similar but more 1142 efficient data-structure [16]. The two potentially masked octets would 1143 then have to be built into another similar look-up table. Finding any 1144 one address in this structure would again be akin to the longest prefix 1145 (shortest mask) problem. 1147 It is possible that m is redundant, but it is believed that keeping it 1148 will make the aggregation maths a lot easier . 1150 Applicability of AMAs as a solution to reliable multicast 1151 clustering/layering needs assessment. 1153 Applicability of AMAs for aggregation in RSVP (reservation protocol) 1154 [27] when used for multicast needs assessment. 1156 Further work is needed on the potential for using AMAs to insulate 1157 upstream routers from high join/leave churn by introducing pessimistic 1158 inertia in the aggregation. The effect on leave latency (particularly 1159 where used for congestion control in layered multicast) would need 1160 careful study. 1162 The assertion that the weak capability for allocation growth (as 1163 compared to kampai-style addressing) is offset by more efficient storage 1164 needs more justification. 1166 It is believed that, due to topological realities, aggregation in the 1167 network will never approach the aggregation potential of applications 1168 that use sessions with multiple multicast addresses. This would be a 1169 strong argument against aggregation schemes that are not end to end, but 1170 this needs proving. 1172 Evolution from current versions of protocols needs more careful 1173 analysis. 1175 The design of an AMA scheme for IPv6 [18] needs to be done. 1176 ATM (asynchronous transfer mode) multicast may benefit from a scheme 1177 based on similar thinking? 1179 5. Security Considerations 1181 Phasing of AMA aggregation in routers must be designed carefully so that 1182 it is not possible for one user to join to an AMA that overlaps a 1183 neighbouring join, then leave the AMA and cause the neighbours to lose 1184 their join before their soft-state refresh re-instates it. 1186 It is possible that the aggregation of multicast addresses into sets for 1187 use in the description of complex sessions will cause service providers 1188 to hoard multicast addresses more than when they are allocated singly. 1189 The scheme has been carefully designed to avoid such a tendency on 1190 technical grounds, but predicting how selfish people adapt based on 1191 their understanding of a mechanism moves us into the realms of 1192 psychology where anything goes. In other words, address hoarding 1193 shouldn't be any worse than the situation was without this proposal as 1194 long as people don't misunderstand. 1196 This proposal is considered independent of all aspects of the security 1197 of (encrypted, tamper-proofed, authenticated etc.) multicasts. 1199 6. Conclusions 1201 A strong case has been made that, for multicast addresses to be open to 1202 aggregation, there must be a standard way to name arbitrary size groups 1203 of addresses. An efficient, elegant technique for doing this has been 1204 presented which gives equal value to each point of the multicast address 1205 space, thus preserving the principle of randomness designed to prevent 1206 address hoarding. These names have been called aggregated multicast 1207 addresses (AMAs). 1209 For IPv4, AMAs look like a 32 bit IP address coupled with, typically, an 1210 extra octet (but more generally with two octets) representing the size 1211 of the aggregation of addresses. The list of addresses that form the 1212 aggregation are defined by identifying a string of bits (of defined 1213 width) within the address, which are incremented as if they were a 1214 binary number in their own right until the aggregation size is reached. 1215 All but the first four bits of the address-like component represents the 1216 base address of the aggregation from which the incrementation starts. 1217 Because multicast addresses always(iv) start with the same four bits in 1218 IPv4 (1110b), the first four bits of an AMA can be used to store another 1219 value in transit and storage, but replaced by 1110b to derive the base 1220 multicast address of the aggregation when required. The value stored in 1221 the first four bits of an AMA represents the width of the field within 1222 the AMA which varies to define the list of addresses. Related but 1223 disjoint AMAs can also be represented efficiently by using the most 1224 general AMA form: an AMA followed by a sequence of alternating pairs of 1225 numbers which represent respectively a further jump from the based 1226 address then a further size of aggregation on top of this. 1228 In the scheme recommended for IPv4, potentially an arbitrary-sized 1229 aggregation of any size up to 2^16 - 1 multicast addresses could be 1230 represented by a field the size of an IPv4 address (4B) plus 2B (256kB 1231 reduced to 6B). However, it should be understood, that such ultra-large- 1232 scale aggregation has a low probability of happening without higher- 1233 level organisation of the address space by end-systems. First or second 1234 order aggregation will occur naturally under this scheme due to the 1235 large class of applications that build sessions from multiple multicast 1236 addresses. Medium-scale aggregation will be possible where routers can 1237 identify overlap or concatenation within multicast routing tables, which 1238 was not possible before without a way to describe aggregations of 1239 multicast addresses. This is because the AMA scheme is inherently 1240 recursive, so that it is possible to merge certain sets of AMAs into one 1241 AMA. To improve the chances of aggregation in multicast routing tables, 1242 address set allocation schemes must fulfil certain criteria that are 1243 laid down in this paper. 1245 The proposal is that AMAs will completely replace multicast addresses in 1246 contexts where aggregation makes sense, that is when describing, 1247 joining, leaving or updating the routing of multicast addresses. For 1248 sending data to and receiving data from multicast addresses, 1249 aggregation, and therefore AMAs, are not relevant. This implies 1250 protocols for describing multicast sessions (such as SDP, SAP, SIP, RSVP 1251 etc.) and protocols for updating the routing of multicast addresses 1252 (such as IGMP, DVMRP, CBT, PIM) will all need to be updated to handle 1253 AMAs in place of discrete multicast addresses. Independent evolution of 1254 all these protocols is considered to be reasonably straightforward. 1255 Although this represents a major round of protocol upgrades, all these 1256 protocols are experimental, and it is a commonly held view that 1257 multicast as it stands is not sufficiently scalable for wide-area 1258 deployment. 1260 It should be clarified that, although joining and leaving aggregates of 1261 multicast addresses can be achieved in single bulk operations, AMAs 1262 deliberately overlap in their use of individual addresses. Thus, 1263 allocation remains on a per address basis. In other words, when joining 1264 an AMA, it may be found that some of the addresses within it are in use 1265 if a strong address allocation scheme is not in use. An AMA allocation 1266 procedure is described in the text for mutating the AMA to cover an 1267 overlapping set of addresses to avoid those addresses in use while 1268 testing different addresses, until a full set of unused addresses is 1269 obtained without losing the addresses in the original AMA that were 1270 free. 1272 To summarise, we have presented a new paradigm(v) for multicasting, such 1273 that describing, joining, leaving and updating multicast routing can and 1274 should all be discussed in terms of aggregates of multicast addresses 1275 (even if they are of size unity) rather than discrete multicast 1276 addresses. We have introduced an efficient, elegant way to name such 1277 aggregates that preserves all the architectural principles on which 1278 multicast addressing is founded, and which will allow potentially large- 1279 scale aggregation of multicast addressing by both end-systems and 1280 routers in end to end co-operation. 1282 7. Acknowledgements 1284 Peter Bagnall, BT, for reviews and suggestions. 1286 8. References 1288 [1] S. McCanne, V. Jacobson, and M. Vetterli, (LBNL/UCB) "Receiver- 1289 driven Layered Multicast", In Proceedings of the ACM SIGCOMM'96, 1290 Stanford, California, pp.1-14. Aug 1996, 1291 1293 [2] S.E.Deering, "Multicast Routing in Internetworks and Extended LANs", 1294 In Proceedings of the ACM SIGCOMM'88, Stanford, California, Aug 1988 1296 [3] S.E.Deering (Stanford U.), "Host Extensions for IP Multicasting", 1297 IETF RFC, Aug 1989 1299 [4] W. Fenner (Xerox PARC), "Internet Group Management Protocol, Version 1300 2", IETF Internet Draft, 01 Nov 1997, 1302 [5] D. Waitzman, C. Partridge, S. Deering, "Distance Vector Multicast 1303 Routing Protocol" v1, IETF RFC, Nov 1988, 1305 [6] T. Pusateri (Juniper Networks), "Distance Vector Multicast Routing 1306 Protocol", IETF Internet Draft, 30 Oct 1997, 1309 [7] A.Ballardie, J.Crowcroft, "Core Based Trees: An Architecture for 1310 Scalable Inter-Domain Multicast Routing", In Proceedings of the ACM 1311 SIGCOMM'93, San Francisco, Sep 1993 1313 [8] A. Ballardie, "Core Based Trees (CBT) Multicast Routing 1314 Architecture", IETF RFC, Sep 1997, 1316 [9] A. Ballardie, "Core Based Trees (CBT version 2) Multicast Routing", 1317 IETF RFC, Sep 1997, 1319 [10] S.E.Deering, D.Estrin, D.Farinacci, V.Jacobsen, L.Ching-Gung and 1320 L.Wei, "An Architecture for Wide-Area Multicasting". In Proceedings of 1321 the ACM SIGCOMM'94, London, Sep 1994 1323 [11] Deering, S, et. al., "Protocol Independent Multicast Version 2, 1324 Dense Mode Specification", IETF Internet Draft, 28 May 1997 1327 [12] Estrin, D, et. al., "Protocol Independent Multicast Sparse-Mode 1328 (PIM-SM): Protocol Specification", IETF Internet Draft, 9 Sep 1997, 1329 1331 [13] M. Handley (ISI) and V. Jacobson (LBNL), "SDP: Session Description 1332 Protocol", IETF Internet Draft, 02 Sep 1997, 1335 [14] M. Handley (ISI), "SAP - Session Announcement Protocol", IETF 1336 Internet Draft (currently expired, see Multiparty Multimedia Session 1337 Control IETF working group charter for status 1338 ) 1340 [15] M. Handley (ISI), H. Schulzrinne (Columbia U.), E. Schooler 1341 (Caltech), "SIP: Session Initiation Protocol", IETF Internet Draft, 13 1342 Nov 1997, 1344 [16] M.Degermark, A.Brodnik, S.Carlsson, S.Pink, "Small Forwarding 1345 Tables for Fast Routing Look-ups", In Proceedings of the ACM SIGCOMM'97, 1346 Cannes, France, Sep 1997 1347 1349 [17] D. Meyer, University of Oregon, "Administratively Scoped IP 1350 Multicast", IETF Internet Draft, Nov 1997, 1353 [18] R. Hinden, Ipsilon Networks and S. Deering, Xerox PARC, "IP Version 1354 6 Addressing Architecture", IETF RFC 1356 [19] P. Tsuchiya, Bellcore, "Efficient and flexible Hierarchical Address 1357 Assignment", to be published - author's address: 1358 1360 [20] M. Handley (ISI), "AAP: Address Allocation Protocol" 1361 (presentation), In Proceedings of the 39th IETF (mmusic working group) 1362 1364 [21] D. Estrin (ISI), M. Handley (ISI) and D. Thaler (Merit), "MASC: 1365 Multicast Address Set Claim" (presentation), In Proceedings of the 39th 1366 IETF (idmr working group) 1369 [21] Tony Bates et al., Cisco Systems, Juniper Networks, "Multiprotocol 1370 Extensions for BGP-4", 26 Sep 1997, 1373 [22] Baiju V. Patel, Intel Corp, Munil Shah, Microsoft Corp. "Multicast 1374 address allocation extensions to the Dynamic Host Configuration 1375 Protocol", IETF Internet Draft, 24 Nov 1997, 1378 [23] Internet Assigned Numbers Authority, "Internet Multicast 1379 Addresses", 1382 [24] D. Thaler (U. Michigan), D. Estrin (USC/ISI) and D. Meyer (U. 1383 Oregon) , "Border Gateway Multicast Protocol (BGMP): Protocol 1384 Specification", IETF Internet Draft, 30 Oct 1997, 1387 [25] R. Briscoe, "Rejected ideas for End to End Aggregation of Multicast 1388 Addresses", 13 Nov 1997, 1389 1391 [26] M. Handley, "Multicast address allocation", in archive of postings 1392 to ipmulticast mailing list, 19 Jun 1997, 1393 1395 [27] L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala, "RSVP: 1396 A New Resource ReSerVation Protocol" In IEEE Network, Sep 1993, 1397 1399 [28] This paper itself, in HTML format 1400 1402 9. Authors' Addresses 1404 Bob Briscoe 1405 B54/74 BT Labs 1406 Martlesham Heath 1407 Ipswich, IP5 3RE 1408 England 1409 Phone: +44 1473 645196 1410 Fax: +44 1473 640929 1411 EMail: briscorj@boat.bt.com 1412 Home page: http://www.labs.bt.com/people/briscorj/ 1414 Dr Martin Tatham 1415 B29/129 BT Labs 1416 Martlesham Heath 1417 Ipswich, IP5 3RE 1418 England 1419 Phone: +44 1473 642498 1420 Fax: +44 1473 640709 1421 EMail: martin.tatham@bt-sys.bt.co.uk 1423 Notes 1425 i) Pronounced as in "If I 'ad an AMA" (American civil rights song in a 1426 cockney accent). 1428 ii) The multicast addresses that are within the bit mask defined by the 1429 width m, but not between vmin & vmax (defined by vmin and s) are not 1430 "wasted" by using an AMA. In the example used above, the addresses with 1431 v = 001, 010 & 011 are not reserved by the AMA. They can be used by a 1432 completely different application and different set of users. The bit 1433 mask is just a way of narrowing the context of the s parameter (to 1434 define when it wraps). It causes no direct effect on address space 1435 utilisation, but without it, it is believed aggregation would be a lot 1436 more difficult. 1438 iii) Application programs and even routers should never need to know the 1439 list of multicast addresses that an AMA resolves to (other than for 1440 evolution from today's protocols). They will not need to bit shift along 1441 multicast addresses to find the value of d, then bit shift d bits back 1442 from the end to find the mask etc. The AMA tuple can be used by 1443 applications in all cases in place of listing the set of addresses it 1444 resolves to {this statement needs proving, mind - I haven't worked 1445 through AMA aggregation, expansion and contraction maths yet, but we 1446 live in hope!}. 1448 iv) A discussion of the issues if this turns out to not always be the 1449 case is under the section on "Further storage optimisation". 1451 v) There are some places where the "p word" really is appropriate.