Internet-Draft                                        Grenville Armitage
                                                                Bellcore
                                                         July 12th, 1996

                   Issues affecting MARS Cluster Size

Status of this Memo

   This document was submitted to the IETF Internetworking over NBMA
   (ION) WG. Publication of this document does not imply acceptance by
   the ION WG of any ideas expressed within. Comments should be
   submitted to the ion@nexen.com mailing list.

   Distribution of this memo is unlimited.

   This memo is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on ds.internic.net (US East
   Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim) to learn the current status of any
   Internet-Draft.
Abstract

   IP multicast over ATM currently uses the MARS model [1] to manage
   the use of ATM pt-mpt SVCs for IP multicast packet forwarding. The
   scope of any given MARS service is the MARS Cluster - typically the
   same as an IPv4 Logical IP Subnet (LIS). Current IP/ATM networks
   are usually architected with unicast routing and forwarding issues
   dictating the sizes of individual LISes. However, as IP multicast
   is deployed as a service, LISes will only be able to grow as large
   as a MARS Cluster can. This document looks at the issues that will
   constrain MARS Cluster size, and why large scale IP over ATM
   networks might preferably be built with many small Clusters rather
   than a few large Clusters.

1. Introduction

   A MARS Cluster is the set of IP/ATM interfaces that are willing to
   engage in direct, ATM level pt-mpt SVCs to perform IP multicast
   packet forwarding [1]. Each IP/ATM interface (a MARS Client) must
   keep state information regarding the ATM addresses of each leaf
   node (recipient) of each pt-mpt SVC it has open. In addition, each
   MARS Client receives MARS_JOIN and MARS_LEAVE messages from the
   MARS whenever Clients around the Cluster need to update their
   pt-mpt SVCs for a given IP multicast group.

   The definition of Cluster 'size' can mean two things - the number
   of MARS Clients using a given MARS, and the geographic distribution
   of those MARS Clients. The number of MARS Clients in a Cluster
   affects the amount of state information any given client may need
   to store while managing outgoing pt-mpt SVCs. It also affects the
   average rate of JOIN/LEAVE traffic that is propagated by the MARS
   on ClusterControlVC, and the number of pt-mpt SVCs that may need
   modification each time a MARS_JOIN or MARS_LEAVE appears on
   ClusterControlVC.
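The per-Client state just described can be pictured as a simple data
structure. The following is an illustrative Python sketch, not part of
the MARS specification; the class, method names, and address format are
all hypothetical:

```python
# Illustrative sketch of the per-Client state described above: each
# MARS Client tracks, per IP multicast group it sends to, the ATM
# addresses of the leaf nodes on its open pt-mpt SVC. All names and
# the address format here are hypothetical, not from the MARS spec.

class MarsClientState:
    def __init__(self):
        # group address -> set of leaf-node ATM addresses on the SVC
        self.svc_leaves = {}

    def on_mars_join(self, group, atm_addr):
        """A MARS_JOIN seen on ClusterControlVC: if we transmit to
        this group, the new member becomes a leaf of our pt-mpt SVC
        (in a real Client this would trigger an ADD_PARTY)."""
        if group in self.svc_leaves:
            self.svc_leaves[group].add(atm_addr)

    def on_mars_leave(self, group, atm_addr):
        """A MARS_LEAVE: drop the departing member from our SVC
        (in a real Client this would trigger a DROP_PARTY)."""
        if group in self.svc_leaves:
            self.svc_leaves[group].discard(atm_addr)

state = MarsClientState()
state.svc_leaves["224.1.2.3"] = {"atm-addr-A"}
state.on_mars_join("224.1.2.3", "atm-addr-B")
print(sorted(state.svc_leaves["224.1.2.3"]))
```

The point of the sketch is that this state grows with both the number
of groups a Client sends to and the number of members per group, which
is why Cluster size feeds directly into per-Client memory consumption.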
   The geographic distribution of clients affects the latency between
   a client issuing a MARS_JOIN and its finally being added onto the
   pt-mpt SVCs of the other MARS Clients transmitting to the specified
   multicast group. (This latency is made up of both the time to
   propagate the MARS_JOIN, and the delay in the underlying ATM
   cloud's reaction to the subsequent ADD_PARTY messages.)

2. Limitations on state storage

   A Cluster should not contain more MARS Clients than the maximum
   number of leaf nodes supportable by the most limited member of the
   cluster.

   Two items are affected by this limitation:

      ClusterControlVC from the MARS. It has a leaf node per cluster
      member (MARS Client). This limitation applies only to the node
      supporting the MARS itself.

      Packet forwarding SVCs out of each MARS Client for each IP
      multicast group being sent to. The number of MARS Clients that
      may choose to be members of a given group may encompass every
      MARS Client in the cluster.

   Under UNI 3.0/3.1 the most obvious limit on the size of a cluster
   is the 2^15 leaf nodes that can be added to a pt-mpt SVC. However,
   in practice most ATM NICs (and probably switches) will impose a
   much lower limit - a function of how much per-leaf node state
   information they need to store (and are capable of storing) for
   pt-mpt SVCs.

   A MARS Client may impose its own state storage limitations, such
   that the combined memory consumption of a MARS Client and the ATM
   NIC in a given host limits the client to fewer leaf nodes than the
   ATM NIC alone might have been able to support.

   Limitations of the switch to which a MARS or MARS Client is
   directly attached may also impose a lower limit on leaf nodes than
   that of the MARS, MARS Client, or ATM NIC. Cluster size is limited
   by the most constraining of these limits.
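The "most constraining limit" rule above amounts to taking a minimum
over the per-member limits. A minimal sketch in Python, where all the
specific limit values are hypothetical examples rather than figures
from any real equipment:

```python
# Sketch of Section 2's rule: a Cluster can be no larger than the
# smallest effective leaf-node limit among its members. The numeric
# limits below are hypothetical examples.

UNI_LEAF_LIMIT = 2 ** 15  # UNI 3.0/3.1 pt-mpt SVC leaf-node ceiling

def max_cluster_size(member_limits):
    """Largest supportable Cluster given each member's effective
    leaf-node limit (NIC, Client software, and local switch combined).
    With no stated limits, only the UNI ceiling applies."""
    return min(member_limits, default=UNI_LEAF_LIMIT)

# Example: three hosts whose NIC/Client/switch combinations support
# differing numbers of leaf nodes per pt-mpt SVC.
limits = [min(4000, UNI_LEAF_LIMIT),   # host A: NIC-bound
          min(2500, UNI_LEAF_LIMIT),   # host B: Client software-bound
          min(8000, UNI_LEAF_LIMIT)]   # host C: switch-bound
print(max_cluster_size(limits))  # the most limited member wins
```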
   It may be possible to work around leaf node limits by distributing
   the leaf nodes across multiple pt-mpt SVCs operating in parallel.
   However, such an approach requires further study, and is unlikely
   to be a useful workaround for Client or NIC based limitations.

   A related observation is that the number of MARS Clients in a
   Cluster may be limited by the memory constraints of the MARS
   itself. The MARS is required to keep state on all the groups that
   every one of its MARS Clients has joined. For a given memory limit,
   the maximum number of MARS Clients must drop if the average number
   of groups joined per Client rises. Depending on the level of group
   memberships, this limitation may be more severe than pt-mpt leaf
   node limits.

3. Signaling load

   In any given cluster there will be an 'ambient' level of
   MARS_JOIN/LEAVE activity. What that level actually is depends on
   the types of multicast applications running on the majority of the
   hosts in the cluster. It is reasonable to assume that as the number
   of MARS Clients in a given cluster rises, so does the ambient level
   of MARS_JOIN/LEAVE activity that the MARS receives and propagates
   out on ClusterControlVC.

   The existence of MARS_JOIN/LEAVE traffic also has a consequential
   impact on signaling activity at the ATM level (across the UNI and
   {P}NNI boundaries). For groups that are VC Mesh supported, each
   MARS_JOIN or MARS_LEAVE propagated on ClusterControlVC will result
   in an ADD_PARTY or DROP_PARTY message sent across the UNIs of all
   MARS Clients that are transmitting to the given group. As a
   cluster's membership increases, so does the average number of MARS
   Clients that trigger ATM signaling activity in response to each
   MARS_JOIN or MARS_LEAVE.

   The size of a cluster needs to be chosen to provide some level of
   containment for this ambient level of MARS and UNI/NNI signaling.
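The fan-out described above can be sketched as simple arithmetic. The
rates used in the example are hypothetical, chosen only to show how
ambient MARS traffic multiplies into UNI signaling load:

```python
# Rough illustration of Section 3's point: for a VC Mesh supported
# group, one MARS_JOIN (or MARS_LEAVE) fans out into one ADD_PARTY
# (or DROP_PARTY) per Client currently transmitting to that group.
# All figures below are hypothetical examples.

def uni_messages_per_membership_change(active_senders):
    """Each sender to the group must modify its own pt-mpt SVC, so a
    single membership change triggers one UNI message per sender."""
    return active_senders

def ambient_uni_load(joins_leaves_per_sec, avg_active_senders):
    """ATM-level signaling rate implied by a given ambient rate of
    MARS_JOIN/LEAVE traffic on ClusterControlVC."""
    return joins_leaves_per_sec * avg_active_senders

# Example: 5 membership changes/sec with, on average, 40 Clients
# transmitting per group implies 200 UNI signaling events/sec.
print(ambient_uni_load(5, 40))  # 200
```

The multiplicative shape is the key point: growing the Cluster raises
both factors at once, which is why containment matters.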
   Some refinements to MARS Client behaviour may also be explored to
   smooth out UNI signaling transients. The MARS specification
   currently requires that revalidation of group memberships occurs
   only when the Client starts sending new packets to an invalidated
   group SVC. A Client could apply a similar algorithm to decide when
   it should issue ADD_PARTYs after seeing a MARS_JOIN - wait until it
   actually has a packet to send, send the packet, then initiate the
   ADD_PARTY. As a result, actively transmitting Clients would update
   their SVCs sooner than intermittently transmitting Clients. This
   requires careful implementation of the Client state machine.

4. Group change latencies

   The group change latency can be defined as the time it takes for
   all the senders to a group to have correctly updated their
   forwarding SVCs after a MARS_JOIN or MARS_LEAVE is received from
   the MARS. This is affected by both the number of Cluster members
   and the geographic distribution of Cluster members.

   The number of Cluster members affects the ATM level signaling load
   offered as soon as a MARS_JOIN or MARS_LEAVE is seen. If the load
   is high, the ATM cloud itself may suffer slow processing of the
   various SVC modifications being requested.

   Wide geographic distribution of Cluster members delays the
   propagation of MARS_JOIN/LEAVE and ATM UNI/NNI messages. The
   further apart various members are, the longer it takes for them to
   receive MARS_JOIN/LEAVE traffic on ClusterControlVC, and the longer
   it takes for the ATM network to react to ADD_PARTY and DROP_PARTY
   requests. If the long distance paths are populated by many ATM
   switches, propagation delays due to per-switch processing will add
   substantially to delays due to the speed of light.
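The two delay components identified above (MARS_JOIN delivery, then
ATM reaction accumulated per switch plus raw propagation delay) can be
combined into a toy model. Every delay figure here is a hypothetical
example, not a measurement:

```python
# Toy model of Section 4's group change latency: the time from a
# MARS_JOIN appearing on ClusterControlVC until the slowest sender
# has updated its pt-mpt SVC. All delay figures are hypothetical.

def group_change_latency_ms(join_propagation_ms,
                            switches_on_path,
                            per_switch_processing_ms,
                            propagation_delay_ms):
    """MARS_JOIN delivery plus ATM handling of the slowest sender's
    ADD_PARTY: per-switch processing accumulates along the path on
    top of raw (speed of light) propagation delay."""
    add_party_ms = (switches_on_path * per_switch_processing_ms
                    + propagation_delay_ms)
    return join_propagation_ms + add_party_ms

# Example: on a wide-area path of 12 switches at 5 ms processing
# each, switch processing (60 ms) dominates the 20 ms of raw
# propagation delay.
print(group_change_latency_ms(10, 12, 5, 20))  # 90
```

This is why the document stresses that per-switch processing on long
paths can matter more than distance itself.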
   Unfortunately, some of the mechanisms described in section 3 for
   smoothing out transient ATM signaling load have the consequence of
   increasing the group change latency (since their goal is for some
   of the senders to deliberately delay updating their forwarding
   SVCs).

   A related effect will also be felt by the MARS itself. The larger
   the MARS database, the longer it may take to process
   MARS_JOIN/LEAVE messages (which involve locating and updating
   individual group entries). Whilst this issue may not be important
   for conferencing applications (with group membership changes on a
   human time frame), high speed simulation environments may find
   such considerations important.

5. Large IP/ATM networks using Mrouters

   Building a large scale, multicast capable IP over ATM network is a
   tradeoff between Cluster sizes and numbers of Mrouters. For a
   given number of hosts across the entire IP/ATM network, as cluster
   sizes drop you need more of them. Clusters must be interconnected
   by Mrouters, so the number of Mrouters rises. (The actual rise in
   the number of Mrouters depends largely on the logical IP topology
   you choose to implement, since a single physical Mrouter may
   interconnect more than two Clusters at once.) It is a local
   deployment question as to what the optimal mix of Clusters and
   Mrouters will be.

   A constructive way to view conventional Mrouters is as aggregation
   points for signaling and data plane loads. An Mrouter hides group
   membership changes in one cluster from senders within other
   Clusters, and protects local group members from being swamped by
   SVCs from senders in other Clusters. MARS_JOIN/LEAVE traffic in
   one Cluster is hidden from the members of all other Clusters. (The
   consequential UNI signaling load is localized to the source
   Cluster too.)
   Group members in a cluster are fed packets from an SVC originating
   on the MARS Client residing in their local Mrouter, rather than
   terminating multiple SVCs originating on the actual senders in
   remote Clusters.

   As a side effect of the Mrouter's role in aggregating data path
   flows, it reduces the impact of SVC leaf-node limits. A
   hypothetical 10000 node Cluster could be broken into two 5000 node
   Clusters, or four 2500 node Clusters. In each case the individual
   Cluster members need only source pt-mpt SVCs with maximums of 5000
   or 2500 leaf nodes respectively.

6. Large IP/ATM networks using Cell Switch Routers (CSRs)

   A Cell Switch Router (CSR) may act as a conventional Mrouter, and
   provide all the benefits described in the previous section.
   However, one of the useful characteristics of the CSR is its
   ability to internally 'short-cut' cells from an incoming VCC to an
   outgoing VCC. Once the CSR has identified a flow of IP traffic,
   and associated it with an inbound and outbound VCC, it begins to
   function as an ATM cell level device rather than a packet level
   device. Even when operating in 'short-cut' mode the CSR is still
   able to protect Clusters from the MARS_JOIN/LEAVE activities of
   surrounding Clusters. From the perspective of the Clusters to
   which the CSR is directly attached, the CSR terminates and
   originates pt-mpt SVCs. It acts as the path out of a source
   Cluster, and the entry point into a target Cluster. It remains
   unnecessary for senders in one Cluster to issue ADD_PARTY or
   DROP_PARTY messages in response to group membership changes in
   other Clusters - the CSR tracks these changes, and updates the
   pt-mpt trees rooted on its own ATM ports as needed.
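The cluster-splitting arithmetic above is straightforward, but worth
making explicit. A minimal sketch (the host counts are the document's
own example; the even-split assumption is a simplification, and
Mrouter count, which depends on the chosen IP topology, is not
modeled):

```python
# Section 5's tradeoff in miniature: splitting a large Cluster
# reduces the worst-case leaf-node count each sender's pt-mpt SVC
# must support, at the cost of more Clusters to interconnect with
# Mrouters. Assumes an even split; topology is not modeled.
import math

def split_cluster(total_hosts, num_clusters):
    """Hosts per Cluster (hence worst-case leaf nodes per forwarding
    SVC) when a population is divided evenly into num_clusters."""
    return math.ceil(total_hosts / num_clusters)

# The document's example: 10000 hosts as 1, 2, or 4 Clusters gives
# worst-case SVCs of 10000, 5000, and 2500 leaf nodes respectively.
for n in (1, 2, 4):
    print(n, split_cluster(10000, n))
```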
   However, there is one significant point of difference from a
   conventional Mrouter - a simple CSR cannot aggregate the packet
   flows from multiple senders in one Cluster onto a single SVC into
   an adjacent Cluster. Within a Cluster with multiple sources, the
   CSR is a leaf node on an individual SVC per source (just like a
   conventional Mrouter). But if it chooses to 'short-cut' traffic at
   the cell level to group members in another Cluster, it must
   construct a separate forwarding SVC into the target cluster to
   match each VCC from each sender in the source Cluster. This
   requirement stems from the need to maintain AAL_SDU boundaries at
   the ultimate recipients - the group members in the target cluster.
   If the cells from individual senders in the source Cluster were
   FIFO merged onto a single outgoing SVC into the target Cluster,
   recipients in the target Cluster would have a hard time
   reconstructing individual AAL_SDUs from the interleaved cells.
   (This is mostly due to our use of AAL5. AAL3/4 could provide a
   solution using the MID field, although we would be limited to 2^10
   senders per Cluster and would introduce a MID management problem.)

   Interestingly, this problem can magnify the UNI signaling load
   offered within the target Cluster whenever a new group member
   arrives. If there are N senders in the source Cluster, the CSR
   will have built N identical pt-mpt SVCs out to the group members
   within the target Cluster. If a new MARS_JOIN is issued within the
   target Cluster, the CSR must issue N ADD_PARTYs to update its SVCs
   into the target Cluster. (Under similar circumstances a
   conventional Mrouter would have issued only one ADD_PARTY for its
   single SVC into the target Cluster.)

   A possible solution is for the CSR's underlying cell switching
   fabric to provide AAL_SDU-aware cell forwarding.
   If segmented AAL_SDUs arriving from the source Cluster could be
   buffered and forwarded in groups of cells representing entire
   AAL_SDUs, the CSR would need only a single SVC into the target
   Cluster. Its impact on the Clusters it was attached to would then
   be the same as that of a conventional Mrouter. (This does not
   necessarily imply full re-assembly followed by segmentation. It
   would be sufficient for the incoming cells to be buffered in
   sequence, and then fed onto the outbound SVC. The CSR's switch
   fabric would not be performing any AAL level checks other than
   detecting AAL_SDU boundaries.)

7. The impact of Multicast Servers (MCSs)

   The MCS has an intra-Cluster effect somewhat analogous to the
   inter-Cluster effect of the Mrouter. It aggregates AAL_SDU flows
   around the Cluster into a single pt-mpt SVC. This single pt-mpt
   SVC is the only one that needs to be updated when an intra-cluster
   group membership change occurs.

   The MCS also reduces the amount of MARS_JOIN/LEAVE traffic on
   ClusterControlVC - such messages for MCS supported groups are
   propagated out on ServerControlVC, thus interrupting only the
   (presumably smaller) set of MCSes attached to the MARS. One way to
   look at an MCS is as a stripped-down Mrouter, operating
   intra-Cluster and performing minimal (if any) forwarding decisions
   based on IP level information. Whether the use of MCSs allows you
   to deploy larger Clusters depends on the mix of MCS supported
   groups and VC Mesh supported groups within your Cluster.

8. Conclusion

   This short document has provided a high level overview of the
   parameters affecting the size of MARS Clusters within multicast
   capable IP/ATM networks.
   Limitations on the number of leaf nodes a pt-mpt SVC may support,
   the size of the MARS database, propagation delays of MARS and UNI
   messages, and the frequency of MARS and UNI control messages are
   all identified as issues that will constrain Clusters. Mrouters
   (either conventional or in Cell Switch Router form) were
   identified as useful aggregators of IP multicast traffic and
   signaling information. Large scale IP multicasting over ATM
   requires a combination of Mrouters and appropriately sized MARS
   Clusters.

Security Considerations

   Security considerations are not addressed in this document.

Acknowledgments

Author's Address

   Grenville Armitage
   Bellcore, 445 South Street
   Morristown, NJ, 07960
   USA
   Email: gja@thumper.bellcore.com
   Ph. +1 201 829 2635

References

   [1] G. Armitage, "Support for Multicast over UNI 3.0/3.1 based ATM
       Networks", Bellcore, Internet-Draft,
       draft-ietf-ipatm-ipmc-12.txt, February 1996.