Internet-Draft                                        Grenville Armitage
                                                                Bellcore
                                                         July 12th, 1996

                   Issues affecting MARS Cluster Size

Status of this Memo

   This document was submitted to the IETF Internetworking over NBMA
   (ION) WG. Publication of this document does not imply acceptance by
   the ION WG of any ideas expressed within. Comments should be
   submitted to the ion@nexen.com mailing list.

   Distribution of this memo is unlimited.

   This memo is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on ds.internic.net (US East
   Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim) to learn the current status of any
   Internet-Draft.
Abstract

   IP multicast over ATM currently uses the MARS model [1] to manage
   the use of ATM pt-mpt SVCs for IP multicast packet forwarding. The
   scope of any given MARS service is the MARS Cluster - typically the
   same as an IPv4 Logical IP Subnet (LIS). Current IP/ATM networks
   are usually architected with unicast routing and forwarding issues
   dictating the sizes of individual LISes. However, as IP multicast
   is deployed as a service, LISes will only be able to grow as large
   as a MARS Cluster can. This document looks at the issues that will
   constrain MARS Cluster size, and why large scale IP over ATM
   networks might preferably be built with many small Clusters rather
   than a few large Clusters.

1. Introduction

   A MARS Cluster is the set of IP/ATM interfaces that are willing to
   engage in direct, ATM level pt-mpt SVCs to perform IP multicast
   packet forwarding [1]. Each IP/ATM interface (a MARS Client) must
   keep state information regarding the ATM addresses of each leaf
   node (recipient) of each pt-mpt SVC it has open. In addition, each
   MARS Client receives MARS_JOIN and MARS_LEAVE messages from the
   MARS whenever Clients around the Cluster need to update their
   pt-mpt SVCs for a given IP multicast group.

   The definition of Cluster 'size' can mean two things - the number
   of MARS Clients using a given MARS, and the geographic distribution
   of those MARS Clients. The number of MARS Clients in a Cluster
   affects the amount of state information any given client may need
   to store while managing outgoing pt-mpt SVCs. It also affects the
   average rate of JOIN/LEAVE traffic that is propagated by the MARS
   on ClusterControlVC, and the number of pt-mpt SVCs that may need
   modification each time a MARS_JOIN or MARS_LEAVE appears on
   ClusterControlVC.
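The per-Client state just described can be pictured as a simple data
structure. The following is an illustrative Python sketch, not part of
the MARS specification; the class, method names, and address format are
all hypothetical:

```python
# Illustrative sketch of the per-Client state described above: each
# MARS Client tracks, per IP multicast group it sends to, the ATM
# addresses of the leaf nodes on its open pt-mpt SVC. All names and
# the address format here are hypothetical, not from the MARS spec.

class MarsClientState:
    def __init__(self):
        # group address -> set of leaf-node ATM addresses on the SVC
        self.svc_leaves = {}

    def on_mars_join(self, group, atm_addr):
        """A MARS_JOIN seen on ClusterControlVC: if we transmit to
        this group, the new member becomes a leaf of our pt-mpt SVC
        (in a real Client this would trigger an ADD_PARTY)."""
        if group in self.svc_leaves:
            self.svc_leaves[group].add(atm_addr)

    def on_mars_leave(self, group, atm_addr):
        """A MARS_LEAVE: drop the departing member from our SVC
        (in a real Client this would trigger a DROP_PARTY)."""
        if group in self.svc_leaves:
            self.svc_leaves[group].discard(atm_addr)

state = MarsClientState()
state.svc_leaves["224.1.2.3"] = {"atm-addr-A"}
state.on_mars_join("224.1.2.3", "atm-addr-B")
print(sorted(state.svc_leaves["224.1.2.3"]))
```

The point of the sketch is that this state grows with both the number
of groups a Client sends to and the number of members per group, which
is why Cluster size feeds directly into per-Client memory consumption.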
   The geographic distribution of clients affects the latency between
   a client issuing a MARS_JOIN and its finally being added onto the
   pt-mpt SVCs of the other MARS Clients transmitting to the specified
   multicast group. (This latency is made up of both the time to
   propagate the MARS_JOIN, and the delay in the underlying ATM
   cloud's reaction to the subsequent ADD_PARTY messages.)

2. Limitations on state storage

   A Cluster should not contain more MARS Clients than the maximum
   number of leaf nodes supportable by the most limited member of the
   cluster.

   Two items are affected by this limitation:

      ClusterControlVC from the MARS. It has a leaf node per cluster
      member (MARS Client). This limitation applies only to the node
      supporting the MARS itself.

      Packet forwarding SVCs out of each MARS Client for each IP
      multicast group being sent to. The number of MARS Clients that
      may choose to be members of a given group may encompass every
      MARS Client in the cluster.

   Under UNI 3.0/3.1 the most obvious limit on the size of a cluster
   is the 2^15 leaf nodes that can be added to a pt-mpt SVC. However,
   in practice most ATM NICs (and probably switches) will impose a
   much lower limit - a function of how much per-leaf node state
   information they need to store (and are capable of storing) for
   pt-mpt SVCs.

   A MARS Client may impose its own state storage limitations, such
   that the combined memory consumption of a MARS Client and the ATM
   NIC in a given host limits the client to fewer leaf nodes than the
   ATM NIC alone might have been able to support.

   Limitations of the switch to which a MARS or MARS Client is
   directly attached may also impose a lower limit on leaf nodes than
   that of the MARS, MARS Client, or ATM NIC. Cluster size is limited
   by the most constraining of these limits.
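The "most constraining limit" rule above amounts to taking a minimum
over the per-member limits. A minimal sketch in Python, where all the
specific limit values are hypothetical examples rather than figures
from any real equipment:

```python
# Sketch of Section 2's rule: a Cluster can be no larger than the
# smallest effective leaf-node limit among its members. The numeric
# limits below are hypothetical examples.

UNI_LEAF_LIMIT = 2 ** 15  # UNI 3.0/3.1 pt-mpt SVC leaf-node ceiling

def max_cluster_size(member_limits):
    """Largest supportable Cluster given each member's effective
    leaf-node limit (NIC, Client software, and local switch combined).
    With no stated limits, only the UNI ceiling applies."""
    return min(member_limits, default=UNI_LEAF_LIMIT)

# Example: three hosts whose NIC/Client/switch combinations support
# differing numbers of leaf nodes per pt-mpt SVC.
limits = [min(4000, UNI_LEAF_LIMIT),   # host A: NIC-bound
          min(2500, UNI_LEAF_LIMIT),   # host B: Client software-bound
          min(8000, UNI_LEAF_LIMIT)]   # host C: switch-bound
print(max_cluster_size(limits))  # the most limited member wins
```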
   It may be possible to work around leaf node limits by distributing
   the leaf nodes across multiple pt-mpt SVCs operating in parallel.
   However, such an approach requires further study, and is unlikely
   to be a useful workaround for Client or NIC based limitations.

   A related observation is that the number of MARS Clients in a
   Cluster may be limited by the memory constraints of the MARS
   itself. The MARS is required to keep state on all the groups that
   every one of its MARS Clients has joined. For a given memory limit,
   the maximum number of MARS Clients must drop if the average number
   of groups joined per Client rises. Depending on the level of group
   memberships, this limitation may be more severe than pt-mpt leaf
   node limits.

3. Signaling load

   In any given cluster there will be an 'ambient' level of
   MARS_JOIN/LEAVE activity. What that level actually is depends on
   the types of multicast applications running on the majority of the
   hosts in the cluster. It is reasonable to assume that as the number
   of MARS Clients in a given cluster rises, so does the ambient level
   of MARS_JOIN/LEAVE activity that the MARS receives and propagates
   out on ClusterControlVC.

   The existence of MARS_JOIN/LEAVE traffic also has a consequential
   impact on signaling activity at the ATM level (across the UNI and
   {P}NNI boundaries). For groups that are VC Mesh supported, each
   MARS_JOIN or MARS_LEAVE propagated on ClusterControlVC will result
   in an ADD_PARTY or DROP_PARTY message sent across the UNIs of all
   MARS Clients that are transmitting to the given group. As a
   cluster's membership increases, so does the average number of MARS
   Clients that trigger ATM signaling activity in response to each
   MARS_JOIN or MARS_LEAVE.

   The size of a cluster needs to be chosen to provide some level of
   containment for this ambient level of MARS and UNI/NNI signaling.
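The fan-out described above can be sketched as simple arithmetic. The
rates used in the example are hypothetical, chosen only to show how
ambient MARS traffic multiplies into UNI signaling load:

```python
# Rough illustration of Section 3's point: for a VC Mesh supported
# group, one MARS_JOIN (or MARS_LEAVE) fans out into one ADD_PARTY
# (or DROP_PARTY) per Client currently transmitting to that group.
# All figures below are hypothetical examples.

def uni_messages_per_membership_change(active_senders):
    """Each sender to the group must modify its own pt-mpt SVC, so a
    single membership change triggers one UNI message per sender."""
    return active_senders

def ambient_uni_load(joins_leaves_per_sec, avg_active_senders):
    """ATM-level signaling rate implied by a given ambient rate of
    MARS_JOIN/LEAVE traffic on ClusterControlVC."""
    return joins_leaves_per_sec * avg_active_senders

# Example: 5 membership changes/sec with, on average, 40 Clients
# transmitting per group implies 200 UNI signaling events/sec.
print(ambient_uni_load(5, 40))  # 200
```

The multiplicative shape is the key point: growing the Cluster raises
both factors at once, which is why containment matters.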
   Some refinements to MARS Client behaviour may also be explored to
   smooth out UNI signaling transients. The MARS specification
   currently requires that revalidation of group memberships occurs
   only when the Client starts sending new packets to an invalidated
   group SVC. A Client could apply a similar algorithm to decide when
   it should issue ADD_PARTYs after seeing a MARS_JOIN - wait until it
   actually has a packet to send, send the packet, then initiate the
   ADD_PARTY. As a result, actively transmitting Clients would update
   their SVCs sooner than intermittently transmitting Clients. This
   requires careful implementation of the Client state machine.

4. Group change latencies

   The group change latency can be defined as the time it takes for
   all the senders to a group to have correctly updated their
   forwarding SVCs after a MARS_JOIN or MARS_LEAVE is received from
   the MARS. This is affected by both the number of Cluster members
   and the geographic distribution of Cluster members.

   The number of Cluster members affects the ATM level signaling load
   offered as soon as a MARS_JOIN or MARS_LEAVE is seen. If the load
   is high, the ATM cloud itself may suffer slow processing of the
   various SVC modifications being requested.

   Wide geographic distribution of Cluster members delays the
   propagation of MARS_JOIN/LEAVE and ATM UNI/NNI messages. The
   further apart various members are, the longer it takes for them to
   receive MARS_JOIN/LEAVE traffic on ClusterControlVC, and the longer
   it takes for the ATM network to react to ADD_PARTY and DROP_PARTY
   requests. If the long distance paths are populated by many ATM
   switches, propagation delays due to per-switch processing will add
   substantially to delays due to the speed of light.
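The two delay components identified above (MARS_JOIN delivery, then
ATM reaction accumulated per switch plus raw propagation delay) can be
combined into a toy model. Every delay figure here is a hypothetical
example, not a measurement:

```python
# Toy model of Section 4's group change latency: the time from a
# MARS_JOIN appearing on ClusterControlVC until the slowest sender
# has updated its pt-mpt SVC. All delay figures are hypothetical.

def group_change_latency_ms(join_propagation_ms,
                            switches_on_path,
                            per_switch_processing_ms,
                            propagation_delay_ms):
    """MARS_JOIN delivery plus ATM handling of the slowest sender's
    ADD_PARTY: per-switch processing accumulates along the path on
    top of raw (speed of light) propagation delay."""
    add_party_ms = (switches_on_path * per_switch_processing_ms
                    + propagation_delay_ms)
    return join_propagation_ms + add_party_ms

# Example: on a wide-area path of 12 switches at 5 ms processing
# each, switch processing (60 ms) dominates the 20 ms of raw
# propagation delay.
print(group_change_latency_ms(10, 12, 5, 20))  # 90
```

This is why the document stresses that per-switch processing on long
paths can matter more than distance itself.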
   Unfortunately, some of the mechanisms described in section 3 for
   smoothing out transient ATM signaling load have the consequence of
   increasing the group change latency (since their goal is for some
   of the senders to deliberately delay updating their forwarding
   SVCs).

   A related effect will also be felt by the MARS itself. The larger
   the MARS database, the longer it may take to process
   MARS_JOIN/LEAVE messages (which involve locating and updating
   individual group entries). Whilst this issue may not be important
   for conferencing applications (with group membership changes on a
   human time frame), high speed simulation environments may find
   such considerations important.

5. Large IP/ATM networks using Mrouters

   Building a large scale, multicast capable IP over ATM network is a
   tradeoff between Cluster sizes and numbers of Mrouters. For a
   given number of hosts across the entire IP/ATM network, as cluster
   sizes drop you need more of them. Clusters must be interconnected
   by Mrouters, so the number of Mrouters rises. (The actual rise in
   the number of Mrouters depends largely on the logical IP topology
   you choose to implement, since a single physical Mrouter may
   interconnect more than two Clusters at once.) It is a local
   deployment question as to what the optimal mix of Clusters and
   Mrouters will be.

   A constructive way to view conventional Mrouters is as aggregation
   points for signaling and data plane loads. An Mrouter hides group
   membership changes in one cluster from senders within other
   Clusters, and protects local group members from being swamped by
   SVCs from senders in other Clusters. MARS_JOIN/LEAVE traffic in
   one Cluster is hidden from the members of all other Clusters. (The
   consequential UNI signaling load is localized to the source
   Cluster too.)
   Group members in a cluster are fed packets from an SVC originating
   on the MARS Client residing in their local Mrouter, rather than
   terminating multiple SVCs originating on the actual senders in
   remote Clusters.

   As a side effect of the Mrouter's role in aggregating data path
   flows, it reduces the impact of SVC leaf-node limits. A
   hypothetical 10000 node Cluster could be broken into two 5000 node
   Clusters, or four 2500 node Clusters. In each case the individual
   Cluster members need only source pt-mpt SVCs with maximums of 5000
   or 2500 leaf nodes respectively.

6. Large IP/ATM networks using Cell Switch Routers (CSRs)

   A Cell Switch Router (CSR) may act as a conventional Mrouter, and
   provide all the benefits described in the previous section.
   However, one of the useful characteristics of the CSR is its
   ability to internally 'short-cut' cells from an incoming VCC to an
   outgoing VCC. Once the CSR has identified a flow of IP traffic,
   and associated it with an inbound and outbound VCC, it begins to
   function as an ATM cell level device rather than a packet level
   device. Even when operating in 'short-cut' mode the CSR is still
   able to protect Clusters from the MARS_JOIN/LEAVE activities of
   surrounding Clusters. From the perspective of the Clusters to
   which the CSR is directly attached, the CSR terminates and
   originates pt-mpt SVCs. It acts as the path out of a source
   Cluster, and the entry point into a target Cluster. It remains
   unnecessary for senders in one Cluster to issue ADD_PARTY or
   DROP_PARTY messages in response to group membership changes in
   other Clusters - the CSR tracks these changes, and updates the
   pt-mpt trees rooted on its own ATM ports as needed.
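The cluster-splitting arithmetic above is straightforward, but worth
making explicit. A minimal sketch (the host counts are the document's
own example; the even-split assumption is a simplification, and
Mrouter count, which depends on the chosen IP topology, is not
modeled):

```python
# Section 5's tradeoff in miniature: splitting a large Cluster
# reduces the worst-case leaf-node count each sender's pt-mpt SVC
# must support, at the cost of more Clusters to interconnect with
# Mrouters. Assumes an even split; topology is not modeled.
import math

def split_cluster(total_hosts, num_clusters):
    """Hosts per Cluster (hence worst-case leaf nodes per forwarding
    SVC) when a population is divided evenly into num_clusters."""
    return math.ceil(total_hosts / num_clusters)

# The document's example: 10000 hosts as 1, 2, or 4 Clusters gives
# worst-case SVCs of 10000, 5000, and 2500 leaf nodes respectively.
for n in (1, 2, 4):
    print(n, split_cluster(10000, n))
```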
   However, there is one significant point of difference from a
   conventional Mrouter - a simple CSR cannot aggregate the packet
   flows from multiple senders in one Cluster onto a single SVC into
   an adjacent Cluster. Within a Cluster with multiple sources, the
   CSR is a leaf node on an individual SVC per source (just like a
   conventional Mrouter). But if it chooses to 'short-cut' traffic at
   the cell level to group members in another Cluster, it must
   construct a separate forwarding SVC into the target cluster to
   match each VCC from each sender in the source Cluster. This
   requirement stems from the need to maintain AAL_SDU boundaries at
   the ultimate recipients - the group members in the target cluster.
   If the cells from individual senders in the source Cluster were
   FIFO merged onto a single outgoing SVC into the target Cluster,
   recipients in the target Cluster would have a hard time
   reconstructing individual AAL_SDUs from the interleaved cells.
   (This is mostly due to our use of AAL5. AAL3/4 could provide a
   solution using the MID field, although we would be limited to 2^10
   senders per Cluster and would introduce a MID management problem.)

   Interestingly, this problem can magnify the UNI signaling load
   offered within the target Cluster whenever a new group member
   arrives. If there are N senders in the source Cluster, the CSR
   will have built N identical pt-mpt SVCs out to the group members
   within the target Cluster. If a new MARS_JOIN is issued within the
   target Cluster, the CSR must issue N ADD_PARTYs to update its SVCs
   into the target Cluster. (Under similar circumstances a
   conventional Mrouter would have issued only one ADD_PARTY for its
   single SVC into the target Cluster.)

   A possible solution is for the CSR's underlying cell switching
   fabric to provide AAL_SDU-aware cell forwarding.
   If segmented AAL_SDUs arriving from the source Cluster could be
   buffered and forwarded in groups of cells representing entire
   AAL_SDUs, the CSR would need only a single SVC into the target
   Cluster. Its impact on the Clusters it was attached to would then
   be the same as that of a conventional Mrouter. (This does not
   necessarily imply full re-assembly followed by segmentation. It
   would be sufficient for the incoming cells to be buffered in
   sequence, and then fed onto the outbound SVC. The CSR's switch
   fabric would not be performing any AAL level checks other than
   detecting AAL_SDU boundaries.)

7. The impact of Multicast Servers (MCSs)

   The MCS has an intra-Cluster effect somewhat analogous to the
   inter-Cluster effect of the Mrouter. It aggregates AAL_SDU flows
   around the Cluster into a single pt-mpt SVC. This single pt-mpt
   SVC is the only one that needs to be updated when an intra-cluster
   group membership change occurs.

   The MCS also reduces the amount of MARS_JOIN/LEAVE traffic on
   ClusterControlVC - such messages for MCS supported groups are
   propagated out on ServerControlVC, thus interrupting only the
   (presumably smaller) set of MCSes attached to the MARS. One way to
   look at an MCS is as a stripped-down Mrouter, operating
   intra-Cluster and performing minimal (if any) forwarding decisions
   based on IP level information. Whether the use of MCSs allows you
   to deploy larger Clusters depends on the mix of MCS supported
   groups and VC Mesh supported groups within your Cluster.

8. Conclusion

   This short document has provided a high level overview of the
   parameters affecting the size of MARS Clusters within multicast
   capable IP/ATM networks.
   Limitations on the number of leaf nodes a pt-mpt SVC may support,
   the size of the MARS database, propagation delays of MARS and UNI
   messages, and the frequency of MARS and UNI control messages are
   all identified as issues that will constrain Clusters. Mrouters
   (either conventional or in Cell Switch Router form) were
   identified as useful aggregators of IP multicast traffic and
   signaling information. Large scale IP multicasting over ATM
   requires a combination of Mrouters and appropriately sized MARS
   Clusters.

Security Considerations

   Security considerations are not addressed in this document.

Acknowledgments

Author's Address

   Grenville Armitage
   Bellcore, 445 South Street
   Morristown, NJ, 07960
   USA
   Email: gja@thumper.bellcore.com
   Ph. +1 201 829 2635

References

   [1] G. Armitage, "Support for Multicast over UNI 3.0/3.1 based ATM
       Networks", Bellcore, Internet-Draft,
       draft-ietf-ipatm-ipmc-12.txt, February 1996.