idnits 2.17.1 

draft-ietf-mboned-mrm-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 22 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 52 instances of too long lines in the document, the longest
     one being 2 characters in excess of 72.

  == There are 6 instances of lines with multicast IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use the 233.252.0.x range defined in RFC 5771


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The "Author's Address" (or "Authors' Addresses") section title is
     misspelled.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'AH' is mentioned on line 165, but not defined

  == Missing Reference: 'RFC1889' is mentioned on line 467, but not defined

  ** Obsolete undefined reference: RFC 1889 (Obsoleted by RFC 3550)

  == Unused Reference: 'UDP' is defined on line 1024, but no explicit
     reference was found in the text

  == Unused Reference: 'MD5' is defined on line 1030, but no explicit
     reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'KA98'


     Summary: 7 errors (**), 0 flaws (~~), 8 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	MBone Deployment Working Group                        Kevin Almeroth (ed)
2	Internet Engineering Task Force                                      UCSB
3	Internet Draft                                                 Liming Wei
4	October 1999                                          Siara Systems, Inc.
5	Expires:  April 2000                                       Dino Farinacci
6	                                                                    Cisco

8	                  Multicast Reachability Monitor (MRM)
9	                    <draft-ietf-mboned-mrm-00.txt>

11	Status of this Memo

13	   This document is an Internet-Draft and is in full conformance
14	   with all provisions of Section 10 of RFC2026.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as
19	   Internet-Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six
22	   months and may be updated, replaced, or obsoleted by other
23	   documents at any time.  It is inappropriate to use Internet-
24	   Drafts as reference material or to cite them other than as
25	   "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	Abstract

35	   MRM facilitates automated fault detection and fault isolation in a
36	   large multicast routing infrastructure. It is designed to alert a
37	   network administrator to multicast reachability problems in close
38	   to real time.

40	   MRM has two basic types of components: the MRM manager and MRM
41	   testers. This document specifies the protocol with which the two
42	   types of components communicate, the types of operations the testers
43	   perform, and the information an MRM manager can obtain.

45	Table of Contents

47	   Abstract

49	1. Introduction
50	   1.1 Partitioning Network Monitoring Tasks

52	2. Functions of the MRM Mechanism
53	   2.1 Fault Detection
54	   2.2 Fault Isolation
55	   2.3 The Protocol
56	      2.3.1 MRM Manager Requests
57	         2.3.1.1 MRM Manager Beacon Message
58	         2.3.1.2 Test Sender Requests (TSRs)
59	         2.3.1.3 Test Receiver Requests (TRRs)
60	      2.3.2 Status Reports

62	3. Use of MRM Well Known Addresses and Ports

64	4. Message Formats
65	   4.1 MRM Message Header
66	   4.2 MRM Manager Beacon Message
67	   4.3 Test Sender Request (TSR)
68	   4.4 Test Receiver Requests (TRR)
69	   4.5 Status Report to the MRM Manager
70	   4.6 MRM Test Packet
71	   4.7 MRM Request-Ack Messages

73	5. Authenticating MRM Messages
74	   5.1 Generating Authenticated Messages
75	   5.2 Receiving Authenticated Messages
76	   5.3 Key Management

78	6. Security Considerations

80	7. Different Approaches to Implement MRM

82	8. Example of an MRM Setup

84	9. Acknowledgment

86	10. Authors addresses

88	11. References

90	Appendix A - Change History
91	1. Introduction

93	   The Multicast Reachability Monitor (MRM) is a network fault detection
94	   and isolation mechanism for administering a multicast routing
95	   infrastructure. It is proposed in response to requests from network
96	   managers and users who need more systematic ways to get up-to-date
97	   multicast reachability status. For these purposes, existing tools are
98	   inefficient and inconvenient to use across large numbers of systems.
99	   The companion document [mrm-use] contains additional information on
100	   justification and usage guidelines for MRM.

102	   The design goals for MRM include:

104	   (1) Close to real-time detection and alarm of network problems,
105	       independent of user input;

107	   (2) Good coverage over the network, both in terms of the number of
108	       systems to be monitored, and the types of diagnostics to be
109	       performed;

111	   (3) Good extensibility and relative independence of other specific
112	       diagnostic tools and protocols (we borrow packet formats from
113	       RTPv2, but almost nothing else from the RTP protocol). This makes
114	       it easy to incorporate newer diagnostic tools as they become
115	       available.

117	1.1 Partitioning Network Monitoring Tasks

119	   Functionally, the task of monitoring a multicast domain can be
120	   divided into two subtasks:

122	   (1) Fault detection
123	   (2) Fault isolation

125	   In the fault detection phase, the participating MRM systems do not
126	   need much detail about the nature of the fault.  The mechanism can
127	   be very simple and brute force.  Data packets can be originated
128	   from designated locations in the network and reception conditions
129	   monitored from other locations.

131	   In the fault isolation phase, depending on the types of fault
132	   identified, the MRM manager can use appropriate tools to isolate
133	   the fault and, ideally, pinpoint its location or cause.

135	   The rest of this document is organized as follows. Section 2
136	   describes the MRM framework and details of the MRM protocol; Section
137	   3 describes the usage of the well known MRM addresses and ports;
138	   Section 4 specifies packet formats; Section 5 discusses the MRM
139	   authentication mechanisms; Section 6 discusses a few security issues;
140	   and Section 8 gives an example of MRM setup.

142	2. Functions of the MRM Mechanism

144	   An MRM based fault monitoring system consists of two types of
145	   components: (1) an MRM manager that configures tests, collects and
146	   presents fault information, and (2) MRM testers that source or sink
147	   test traffic. These components collaborate to accomplish the two
148	   functions of MRM: fault detection and fault isolation.

150	   The MRM testers can be any routing devices or trusted end hosts.
151	   They provide statistics about received data packets, to be used to
152	   derive the network reachability status. These data packets can be
153	   sourced by a router acting as an MRM tester, in response to a request
154	   from the MRM manager. A system originating MRM data packets for
155	   testing purposes is also called a Test Source (TS). The configured
156	   MRM testers that receive the test traffic and collect receiver
157	   statistics are also called Test Receivers (TRs).

159	   An MRM manager initiates configuration requests to the MRM testers
160	   and assigns the roles of TSs and TRs. The MRM manager informs the TSs
161	   and TRs of the types of monitoring or diagnostic tests to run. The
162	   MRM manager also specifies the type of reports the TRs should send.

164	   To guard against attacks on the MRM systems, the IPsec Authentication
165	   Header (AH) [AH] is used with the HMAC-MD5 transform as the standard
166	   authentication algorithm.  Authentication should always be enabled,
167	   especially when MRM is used to monitor production services.

169	   Note that this document only specifies the types of information an MRM
170	   manager can obtain, and the protocol used to acquire such
171	   information. How an MRM manager processes or presents the diagnostic
172	   information is an implementation issue.  An MRM manager can be as
173	   simple as a command line wrapper of requests with simple display
174	   functions, or it can be more sophisticated and incorporated as part
175	   of an operational network monitoring tool in daily use by a network
176	   operation center (NOC).

178	2.1 Fault Detection

180	   Multicast routing can behave abnormally in different ways. The
181	   following are a few common types of faults:

183	   (1) Topological disconnectivity

185	       The network topology for multicast routing is disconnected, for
186	       example when routes for a subset of the networks are missing
187	       from the topology table.

189	   (2) Black holes in forwarding path

191	       No multicast packets can get through to certain receivers, even
192	       though the network topology is perhaps intact. A possible cause
193	       could be disabled multicast forwarding. Another possibility is
194	       pruning errors, e.g. due to inconsistent actions and timer
195	       values on a multi-access LAN.

197	   (3) Excessive/persistent Losses

199	       Packets flow, but with excessive losses over an extended period
200	       of time. Possible causes include heavy congestion, line errors,
201	       or misuse of forwarding modes.

203	   (4) Excessive duplicates

205	       Packets arrive at the receivers, but with large numbers of
206	       duplicates.

208	   (5) Others

210	       Other types of fault can also be detected, e.g. non-pruners
211	       as a failure mode. A non-pruning neighbor can be a sink for all
212	       multicast traffic at all times, even if no receivers exist behind
213	       that neighbor. This is "outlawed" by the "MBONE-community" [jhawk].
214	       Detecting the existence of such a system in an inter-domain
215	       scenario, however, is not trivial.  We leave this task to the next
216	       iteration of MRM refinement.

218	2.2 Fault Isolation

220	   Fault isolation is initiated by the MRM manager. For different types
221	   of faults detected, various tools can be used to isolate the faults
222	   to small areas in the network. Currently, the tools available for
223	   this purpose include, but are not limited to, mtrace [MTRACE],
224	   MIB-based debugging tools, HTTP-based status report mechanisms, and
225	   remote execution mechanisms.

227	   When one tool is not sufficient, a combination of tools can be
228	   applied.  In general, MRM is designed to be flexible about the types
229	   of tools it can utilize.  Integrating the functionality of other
230	   tools into MRM is an implementation issue for the MRM manager.

232	2.3 The Protocol

234	   As stated above, the task of monitoring multicast reachability is
235	   accomplished by letting an MRM manager configure the MRM testers to
236	   perform fault detection and isolation tests. The MRM manager
237	   summarizes or displays the collected reports for the network
238	   operators, in an implementation specific way.

240	   The MRM manager keeps a list of tester addresses. The relevant
241	   routing devices are administratively configured as candidate MRM
242	   testers. These testers will become active TSs and TRs once they
243	   accept and process requests from an MRM manager.

245	   We chose to use RTPv2 encapsulation for the following MRM messages:
246	   fault report messages from TRs and optionally some test data packets.
247	   This is to allow re-use of existing RTP based reception mechanisms.
248	   Note that despite the use of the RTPv2 packet format, the design
249	   goals and rules for the MRM message exchange protocol are entirely
250	   different from those specified in RTP.

252	2.3.1 MRM Manager Requests

254	   An MRM manager sends Test Sender requests to the TSs, and Test
255	   Receiver requests to the TRs.

257	   The MRM manager optionally transmits periodic beacon requests
258	   to the well-known MRM multicast address MRM-ADDR (224.0.1.111)
259	   that all TSs and TRs listen to. This beacon mechanism has three
260	   purposes:

262	   (1) For the TSs and TRs to learn the liveness of the MRM manager;

264	   (2) As a medium to periodically refresh requests, in order for
265	       testers to recover lost MRM messages, configurations or state
266	       (e.g. across reboots).

268	   (3) To inform a large group of test participants that an MRM session
269	       has been changed or cancelled.

271	   The use of beacon messages by the manager is optional primarily
272	   because multicast connectivity between the manager and TSs and
273	   TRs may not exist.  As a result, while beacon messages may add
274	   robustness, they should not be relied on to provide critical
275	   functionality.  While the manager chooses whether or not to
276	   send beacon messages, TSs and TRs must be prepared to handle
277	   these messages.

279	   The MRM manager may send a request either to a unicast address
280	   or to the multicast address 224.0.1.111. When the message is sent
281	   via unreliable unicast transport (UDP), the recipient must send a
282	   positive acknowledgement after it has received that request.
283	   Unacknowledged request messages are retransmitted.

285	2.3.1.1 MRM Manager Beacon Message

287	   The MRM manager periodically transmits beacon messages to advertise its
288	   liveness to all MRM testers. This message is UDP-encapsulated.  The
289	   sender's timestamp can be used to calculate the jitter in delay
290	   between successive beacon messages.

292	   The recommended default Beacon message interval is 1 minute.  The MRM
293	   manager may piggyback the manager requests on the beacon messages.
294	   This potentially reduces the need to individually check and repair
295	   each tester's setup state, while still providing reliability
296	   through a soft-state refresh mechanism.
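
   The jitter calculation mentioned in this section can be sketched as
   follows. This is only an illustration under stated assumptions: the
   function name beacon_jitter_ms is hypothetical, jitter is taken to
   mean the change in transit delay between consecutive beacons, and
   wraparound of the 32-bit millisecond timestamp is ignored.

```python
def beacon_jitter_ms(beacons):
    """Return the delay jitter for each pair of consecutive beacons.

    `beacons` is a list of (sender_timestamp_ms, local_arrival_ms)
    tuples in arrival order. For each consecutive pair the jitter is
    the arrival spacing minus the sender timestamp spacing, i.e. the
    change in one-way transit delay (clock offsets cancel out).
    """
    jitters = []
    for (sent0, arrived0), (sent1, arrived1) in zip(beacons, beacons[1:]):
        jitters.append((arrived1 - arrived0) - (sent1 - sent0))
    return jitters


# Three beacons sent 60 s apart (the recommended default interval).
samples = [(0, 40), (60_000, 60_045), (120_000, 120_038)]
print(beacon_jitter_ms(samples))  # prints [5, -7]
```

   A tester could feed such values into whatever smoothing it prefers;
   the draft does not prescribe one.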

298	2.3.1.2 Test Sender Requests (TSRs)

300	   A Test Sender request is first unicast delivered to a TS, then
301	   refreshed through multicast delivery via the MRM beacon mechanism.
302	   A Test Sender request specifies one of the following two ways to
303	   generate test packets:

305	   (1) Local packet trigger. This request includes the following
306	       parameters:

308	       (a) interval between consecutive test packets;
309	       (b) format and length of test packets (e.g. RTP/UDP);
310	       (c) multicast address for the test group.

312	       If a TS accepts this local packet trigger, it will start sending
313	       periodic test packets, at intervals specified in the MRM request
314	       message. The IP address of the MRM manager will be used as the ID
315	       for all test packets originated by the TS under this request.  To
316	       detect loops and packet losses, all test packets also contain a
317	       monotonically increasing sequence number (if encapsulated in RTP,
318	       this would be the RTP sequence number).

320	   (2) Proxy packet trigger (see Section 5 for security impacts).

322	       This request lets a TS send a (sequence of) MRM test packet(s),
323	       using the IP source address provided by the manager request
324	       message.  This request contains all parameters a local packet
325	       trigger has, plus a proxy-source address.

327	       This request is useful for monitoring intra-domain multicast
328	       connectivity for external sources.  A proxy packet trigger can be
329	       used to inject packets into the local domain, pretending there is
330	       an active source external to the local domain. Inside the domain,
331	       as far as forwarding is concerned, these packets are
332	       indistinguishable from packets originated from a real external
333	       source.  For security reasons, proxy packet triggers should be
334	       enabled very carefully.

336	   TSR messages are also used to stop ongoing tests.  By re-sending
337	   the original TSR packet, but with a holdtime of zero, a test can
338	   be stopped.  NOTE:  TRR messages with a holdtime of zero should
339	   also be sent to each test receiver participating in the test.
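
   As a rough illustration of how the sequence numbers carried in test
   packets support loss and duplicate detection, consider the sketch
   below. The names generate_packets and SequenceTracker are
   hypothetical, packets are modelled as (ID, sequence number) pairs
   rather than real RTP/UDP datagrams, and sequence number wraparound
   is not handled.

```python
def generate_packets(manager_id, count, start_seq=0):
    """Yield (id, seq) pairs the way a TS might number its test packets.

    The manager's IP address is used as the ID, as described for the
    local packet trigger; sequence numbers increase by one per packet.
    """
    for i in range(count):
        yield (manager_id, start_seq + i)


class SequenceTracker:
    """Receiver-side bookkeeping for one test stream."""

    def __init__(self):
        self.seen = set()      # distinct sequence numbers received
        self.duplicates = 0    # packets whose number was already seen

    def observe(self, seq):
        if seq in self.seen:
            self.duplicates += 1
        else:
            self.seen.add(seq)

    def lost(self):
        """Count the gaps between the lowest and highest number seen."""
        if not self.seen:
            return 0
        expected = max(self.seen) - min(self.seen) + 1
        return expected - len(self.seen)


tracker = SequenceTracker()
for _, seq in generate_packets("192.0.2.1", 6):
    if seq != 3:               # pretend packet 3 was dropped in transit
        tracker.observe(seq)
tracker.observe(4)             # pretend packet 4 arrived twice
print(tracker.lost(), tracker.duplicates)  # prints 1 1
```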

341	2.3.1.3 Test Receiver Requests (TRRs)

343	   An MRM status request is first addressed to a unicast address of a
344	   TR, and subsequently should be carried in the MRM manager beacon
345	   messages sent to 224.0.1.111.

347	   Each such request carries a holdtime of the request, after which the
348	   TR can safely discard any information collected.  A TRR with a
349	   holdtime of zero implies that an ongoing test should be terminated.
350	   The TRR specifies how each TR should collect the reception data.

352	   The following are the request types for the TRs:

354	   (1) Monitor multicast group. This request has the following fields:

356	       (a) J-bit. If set, the TR will join the specified group, as if it
357	           were a host that is a member of that group.

359	           If a tester performed an IGMP join at the beginning of a
360	           test, it should withdraw the IGMP group membership when the
361	           MRM request expires.

363	           When a TR is instructed to join a data group of an existing
364	           application (e.g. a heartbeat [heartbeat] group), it is wise
365	           to assess the impact on the TR system if the data rate is
366	           non-trivial.

368	           Furthermore, the use of existing groups introduces uncertainty
369	           as to whether the source is actually transmitting.  Because
370	           TRs expect a constant flow of packets, using existing group
371	           traffic, which may be bursty, introduces uncertainty at the
372	           receiver as to whether traffic is flowing but is being lost
373	           or not being sent.

375	       (b) The address of the group to be monitored;

377	       (c) List of source addresses to record reception quality
378	           information;

380	       (d) Threshold description for triggering fault reports.

382	           This draft revision specifies only a packet-loss-based
383	           threshold.  A fault is detected if the packet loss percentage
384	           has reached the threshold during the specified time window for
385	           measurement. Once set, the width of this window is fixed, but
386	           the starting point (or left edge) of the window keeps moving
387	           forward.

389	           Reception quality data within the measurement window should be
390	           kept so that threshold calculations can be made continuously
391	           as the window moves forward in time.

393	       (e) Maximum and minimum delays for sending a fault report. The
394	           report is sent after a randomized delay between the minimum
395	           and the maximum value.

397	       (f) Type of error reports solicited. It is possible to specify an
398	           RTCP report (as if the test session uses RTP), or a native MRM
399	           report.  Currently, MRM only supports RTP-based reports.

401	   (2) Fault isolation request. This request is sent after a fault is
402	       detected and identified by the MRM manager. It specifies the tool
403	       and its associated parameters.

405	       Details about this request message will be added in a future
406	       revision of the MRM specification.

408	   (3) Poll for receiver statistics. This instructs the TR to report the
409	       statistics (historic data) it has collected via Status Reports.
410	       The TR will send Status Reports, even if the fault threshold has
411	       not been reached. Section 2.3.2 describes the status report
412	       mechanism in detail.
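
   The packet-loss threshold with a fixed-width, forward-moving
   measurement window described under item (1)(d) above could be kept
   along the following lines. This is a sketch, not the specified
   algorithm: the class name LossWindow is hypothetical, and it assumes
   the TR records one event per expected test packet (for example, by
   inferring losses from gaps in the sequence numbers).

```python
from collections import deque


class LossWindow:
    """Sliding measurement window over per-packet reception events."""

    def __init__(self, window_s, threshold_pct):
        self.window_s = window_s            # fixed window width, seconds
        self.threshold_pct = threshold_pct  # loss percentage that is a fault
        self.events = deque()               # (time_s, received) pairs

    def _expire(self, now_s):
        # Drop events that fell off the left edge of the window.
        while self.events and self.events[0][0] <= now_s - self.window_s:
            self.events.popleft()

    def record(self, now_s, received):
        self.events.append((now_s, received))
        self._expire(now_s)

    def loss_pct(self, now_s):
        self._expire(now_s)
        if not self.events:
            return 0.0
        lost = sum(1 for _, ok in self.events if not ok)
        return 100.0 * lost / len(self.events)

    def fault(self, now_s):
        # A fault is declared once the loss percentage reaches the threshold.
        return self.loss_pct(now_s) >= self.threshold_pct


window = LossWindow(window_s=10, threshold_pct=50)
for t in range(10):
    window.record(t, received=(t < 5))   # second half of the window is lost
print(window.fault(9))
```

   Keeping the raw events rather than a single counter is what lets the
   threshold be re-evaluated continuously as the window moves forward.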

414	   When large numbers of TRs are activated, a fault upstream in a
415	   tree may result in many TRs sending reports at the same time.  To
416	   address the issue of possible report implosion, each TR may use one
417	   of the following two strategies:

419	   (1) Report via unicast message. The MRM manager assigns a pre-
420	       determined report-delay (as part of the configuration design
421	       task) to each TR. Each TR, upon detecting a fault, will randomly
422	       delay the sending of its report based on the pre-set delay
423	       period. This would allow an MRM system to monitor networks with
424	       up to thousands of systems without unreasonable compromises in
425	       detection response times.

427	   (2) Each TR may be instructed to report the detected faults to the
428	       well-known MRM group address 224.0.1.111 using the RTCP format
429	       [RFC1889], backing off or suppressing its report when duplicate
430	       reports from other testers are seen.  If using this strategy, the
431	       manager should realize that using multicast to report a problem
432	       with multicast may not be particularly robust.

434	       This method allows the use of existing RTP-based monitoring tools
435	       in the initial deployment and experiments with MRM.  However, it
436	       will prevent the MRM manager from learning a complete list of
437	       receivers affected by a specific fault. When multicast routing is
438	       not working correctly, these reports may not be heard by the MRM
439	       manager, leaving faults undetected and unreported by the MRM
440	       manager.  It is recommended that all designs include at least a
441	       subset of TRs that (take turns to) unicast their reports.
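
   Both strategies above rely on spreading reports out in time. A
   minimal sketch of the randomized delay and of duplicate suppression
   follows. The function names and the (source, group) key used to
   recognize a duplicate report are assumptions made for illustration;
   the draft does not define how duplicates are matched.

```python
import random


def report_delay(min_delay_s, max_delay_s, rng=random):
    """Pick a report delay uniformly between the configured bounds."""
    if min_delay_s > max_delay_s:
        raise ValueError("minimum delay exceeds maximum delay")
    return rng.uniform(min_delay_s, max_delay_s)


def should_send(fault_key, heard_reports):
    """Suppress a multicast report if an equivalent one was already heard.

    `fault_key` is a hypothetical (source, group) tuple identifying the
    fault; `heard_reports` is the set of keys seen from other testers
    while this TR was waiting out its delay.
    """
    return fault_key not in heard_reports


fault = ("192.0.2.1", "233.252.0.1")
delay = report_delay(2.0, 10.0)
heard_while_waiting = {("192.0.2.1", "233.252.0.1")}
print(round(delay, 1), should_send(fault, heard_while_waiting))
```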

443	   There is ambiguity when the MRM manager hears no fault report from a
444	   certain TR. It could be due to fault-free network status, the crash
445	   of the TR, or problems in the transport mechanism between the TR and
446	   the MRM manager. Requiring each TR to frequently report its liveness
447	   and to only send unicast fault reports may work for a moderate number
448	   of testers, but may put undue burden on the network for larger
449	   numbers of testers.  A compromise is to report liveness only from
450	   a critical portion of the network and send unicast fault reports from
451	   a subset of the testers. The periodic liveness reports serve two
452	   purposes: (1) they provide evidence that the tester is still alive;
453	   (2) they indicate the condition of the tester functions. The
454	   request-ack messages are used as tester liveness reports.

456	   Note that the fault isolation phase does not necessarily require the
457	   MRM manager to send a Fault Isolation Request to a TR. For example,
458	   in a typical network today, a third-party mtrace issued by the MRM
459	   manager may be sufficient to identify the hop that is dropping
460	   packets excessively, if the tester is not completely blacked out.

462	2.3.2 Status Reports

464	   These reports are sent by the TRs to the MRM manager, in response to
465	   a status request.

467	   For now, we use RTP [RFC1889] "receiver report (RR)" packet format to
468	   carry receivers' status reports. It is expected that the MRM-native
469	   report format (to be defined in future draft revisions) will carry
470	   more useful information about the routing state and statistics.

472	   Please refer to RFC1889 for details on the packet formats. Here we
473	   define the few RTCP items used by MRM (loosely referred to as an RTP
474	   profile for MRM):

476	      SSRC (Synchronization source) of packet sender:
477	         IP address of the Test Sender.

479	      Extended highest sequence number received:
480	         Highest sequence number seen by the Test Receiver.

482	      Fraction lost:
483	         Percent loss of Test Sender data.

485	      Cumulative number of packets lost:
486	         Total number of RTP data packets from SSRC lost within this
487	         reception window period.

489	      Inter-arrival Jitter:
490	         Set to zero when sent, ignored when received.
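
   To make the field usage above concrete, the sketch below packs one
   RTCP receiver report block using the 24-byte layout from RFC 1889.
   Several points are assumptions rather than statements of this draft:
   the function name pack_report_block is hypothetical, the fraction
   lost is encoded as the RFC 1889 8-bit fixed-point value (lost * 256
   / expected) rather than a literal percentage, and the LSR and DLSR
   fields, which this section does not mention, are simply set to zero.

```python
import socket
import struct


def pack_report_block(ts_addr, lost, expected, cumulative_lost, highest_seq):
    """Pack a 24-byte RTCP RR report block for one Test Sender."""
    # SSRC: the IPv4 address of the Test Sender as a 32-bit integer.
    ssrc = struct.unpack("!I", socket.inet_aton(ts_addr))[0]
    # Fraction lost: 8-bit fixed point, clamped to the field width.
    if expected <= 0 or lost <= 0:
        fraction = 0
    else:
        fraction = min(255, (lost << 8) // expected)
    # The fraction shares a 32-bit word with the 24-bit cumulative count.
    word2 = (fraction << 24) | (cumulative_lost & 0xFFFFFF)
    jitter = 0  # set to zero when sent, per the profile above
    return struct.pack("!IIIIII", ssrc, word2,
                       highest_seq & 0xFFFFFFFF, jitter, 0, 0)


block = pack_report_block("192.0.2.1", lost=1, expected=4,
                          cumulative_lost=1, highest_seq=100)
print(len(block), block.hex())
```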

492	   When this report is UDP encapsulated and unicast addressed to the MRM
493	   manager, it is explicitly acknowledged. The acknowledgement packet
494	   contains the RTCP header portion of the original packet after the MRM
495	   header.

497	3. Use of MRM Well Known Addresses and Ports

499	   Once all TS and TR systems are configured, they join the well-known
500	   MRM control group MRM-ADDR (224.0.1.111) and listen to the well-known
501	   MRM UDP port MRM-MANAGER-PORT (679).

503	   The MRM beacon messages are periodically sent to 224.0.1.111 UDP
504	   port 679.
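
   A tester implemented on an end host might join the control group
   roughly as sketched below, using the standard sockets API. The
   constant and function names are illustrative, and
   open_beacon_socket is defined but not called here because binding
   to a port below 1024 normally requires elevated privileges.

```python
import socket

MRM_ADDR = "224.0.1.111"   # well-known MRM control group
MRM_MANAGER_PORT = 679     # well-known MRM UDP port


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq value expected by IP_ADD_MEMBERSHIP.

    It is the group address followed by the local interface address;
    0.0.0.0 lets the operating system choose the interface.
    """
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_beacon_socket(port=MRM_MANAGER_PORT):
    """Return a UDP socket that receives beacons sent to MRM-ADDR."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(MRM_ADDR))
    return sock


print(membership_request(MRM_ADDR).hex())
```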

506	4. Message Formats

508	   By default, MRM control messages are encapsulated inside UDP, and an
509	   IP authentication header (AH) [KA98] is inserted between the IP
510	   header and the UDP header, as shown below:

512	      +-----------+------+------------+------------+--------------+
513	      | IP Header |  AH  | UDP header | MRM header |  MRM payload |
514	      +-----------+------+------------+------------+--------------+

516	   The MRM status report in RTCP format is:

518	      +-----------+------+------------+------------------+------------+
519	      | IP Header |  AH  | UDP header | RTCP Rcvr Report | MRM header |
520	      +-----------+------+------------+------------------+------------+

522	   The MRM ACK packet format is:

524	      +-----------+------+------------+------------+-------------+
525	      | IP Header |  AH  | UDP header | MRM header | RTCP Header |
526	      +-----------+------+------------+------------+-------------+

528	   The inserted AH is reproduced below:

530	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
531	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
532	      | Next Header   |  Payload Len  |          RESERVED             |
533	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
534	      |                 Security Parameters Index (SPI)               |
535	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
536	      |                    Sequence Number                            |
537	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
538	      |                                                               |
539	      |                Authentication Data (variable)                 |
540	      |                                                               |
541	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

543	   As specified in [KA98], the following are the default values for the
544	   fields above:

546	      Next Header: 17, the value for UDP protocol.

548	      Payload Len: 5, when MD5 is used (total length is seven 32-bit words).

550	      RESERVED: 0 when sent, ignored when received.

552	      SPI: 0 - 50, when using configured MD5 keys

554	      Sequence Number: the sequence number

556	      Authentication Data: message digest
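
   The default field values above can be checked with a small packing
   sketch. It is deliberately simplified: the function name build_ah is
   hypothetical, and the digest is computed over whatever bytes the
   caller supplies, whereas a real AH implementation computes it over
   the IP packet with mutable fields zeroed as specified in [KA98]. The
   full 16-byte HMAC-MD5 output is used because that is what makes the
   header seven 32-bit words long, matching the stated Payload Len of 5.

```python
import hashlib
import hmac
import struct

NEXT_HEADER_UDP = 17


def build_ah(spi, seq, key, covered_bytes):
    """Return an AH with the default values listed in this section."""
    digest = hmac.new(key, covered_bytes, hashlib.md5).digest()  # 16 bytes
    total_words = (12 + len(digest)) // 4   # fixed part is 12 bytes
    payload_len = total_words - 2           # AH length in words, minus 2
    fixed = struct.pack("!BBHII", NEXT_HEADER_UDP, payload_len, 0, spi, seq)
    return fixed + digest


ah = build_ah(spi=1, seq=42, key=b"example-key", covered_bytes=b"mrm message")
print(len(ah) // 4, ah[1])  # prints 7 5
```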

558	4.1 MRM Message Header

560	   The MRM message header contains four 32-bit words.

562	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
563	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
564	      |Version| Type  |   Code        |           Holdtime            |
565	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
566	      |                    Target IP address                          |
567	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
568	      |M|    Reserved                 |      MRM message length       |
569	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
570	      |              Timestamp (in milliseconds)                      |
571	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

573	      Version:  4 bits
574	         This revision defines version 1 of MRM.

576	      Type: 4 bits
577	         The defined message types are:

579	         0 = Beacon       (from MRM manager to all testers)
580	         1 = TS Request   (from MRM manager to Test Senders)
581	         2 = TR Request   (from MRM manager to Test Receivers)
582	         3 = Status Response  (from TR to the MRM manager)
583	         4 = TS Request Ack   (from TS to MRM manager)
584	         5 = TR Request Ack   (from TR to MRM manager)
585	         6 = Status Response Ack  (from MRM manager to TR)

587	      Code: 8 bits
588	         Defined according to each packet type.

590	      Holdtime: 16 bits
591	         Maximum duration in seconds this message should be honored.

593	      Target IP address: 32 bits
594	         The unicast address of the intended recipient of this message.

596	      M: 1 bit
597	         0: last MRM request message in this packet.
598	         1: more MRM request messages follow in the same packet.
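
The header layout above can be sketched as a pack/parse pair. This is an illustration only: it assumes the conventional reading of the diagram (bit 0 is the most significant bit, fields in network byte order), and the names `pack_header`, `parse_header` and `MrmHeader` are hypothetical.

```python
import socket
import struct
from typing import NamedTuple

MRM_VERSION = 1
# ver/type, code, holdtime, target address, M + reserved, length, timestamp
HEADER_FMT = "!BBH4sHHI"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 16 bytes = 4 32-bit words


class MrmHeader(NamedTuple):
    version: int
    msg_type: int
    code: int
    holdtime: int
    target: str
    more: bool
    length: int
    timestamp_ms: int


def pack_header(msg_type: int, code: int, holdtime: int, target: str,
                more: bool, length: int, timestamp_ms: int) -> bytes:
    ver_type = (MRM_VERSION << 4) | (msg_type & 0x0F)
    flags = 0x8000 if more else 0          # M bit; reserved bits sent as 0
    return struct.pack(HEADER_FMT, ver_type, code, holdtime,
                       socket.inet_aton(target), flags, length,
                       timestamp_ms & 0xFFFFFFFF)


def parse_header(data: bytes) -> MrmHeader:
    ver_type, code, holdtime, target, flags, length, ts = \
        struct.unpack_from(HEADER_FMT, data)
    return MrmHeader(ver_type >> 4, ver_type & 0x0F, code, holdtime,
                     socket.inet_ntoa(target), bool(flags & 0x8000),
                     length, ts)
```

The address 192.0.2.1 used when trying this out is a documentation address, not one taken from this specification.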

600	   When multiple MRM messages are grouped into one packet, the IP/AH/UDP
601	   headers of the second and all subsequent MRM messages are omitted. The
602	   total length of the IP packet will reflect the sum of the lengths of
603	   all MRM messages in the packet.
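
A receiver could walk a grouped packet roughly as follows. The sketch assumes that the "MRM message length" field counts the whole message in bytes, including its own 16-byte MRM header; the text does not state this explicitly, so treat it as an assumption. The name `split_messages` is hypothetical.

```python
import struct
from typing import List

HEADER_LEN = 16          # 4 32-bit words
M_BIT = 0x8000           # first bit of the third header word


def split_messages(payload: bytes) -> List[bytes]:
    """Split the UDP payload into individual MRM messages using M and length."""
    messages = []
    offset = 0
    while offset + HEADER_LEN <= len(payload):
        # Third header word: M bit + reserved (16 bits), message length (16 bits)
        flags, length = struct.unpack_from("!HH", payload, offset + 8)
        if length < HEADER_LEN or offset + length > len(payload):
            raise ValueError("inconsistent MRM message length")
        messages.append(payload[offset:offset + length])
        offset += length
        if not flags & M_BIT:   # M = 0: last message in this packet
            break
    return messages
```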

605	4.2 MRM Manager Beacon Message

607	   This message is UDP encapsulated, addressed to UDP port MRM-MANAGER-
608	   PORT.  The outstanding Test Sender Requests and Test Receiver
609	   Requests are included in the beacon message. The individual MRM
610	   headers are included with these TSR/TRRs.

612	4.3 Test Sender Request (TSR)

614	   There are two code values for a TSR:

616	      0: Local packet trigger
617	      1: Proxy packet trigger

619	   NOTE:  A host-based implementation is not expected to provide
620	   proxy packet capability.

622	   Following the MRM message header are the fields for the source
623	   specification request:

625	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
626	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
627	      |   UDP port of test packets    |R| S | LEN |     Reserved      |
628	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
629	      |              Test group address                               |
630	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
631	      |              Inter-packet delay (millisecond)                 |
632	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
633	      |      Proxy source IP address (for proxy packet trigger)       |
634	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

636	      UDP port of test packets: 16 bits
637	         UDP port of test packets.

639	      R: 1 bit
640	         0: Tester will originate RTP/UDP encapsulated test packets
641	         1: Tester will originate another kind of packet (not used)

643	      S: 2 bits
644	         00: send on the targeted interface only
645	         01: send on all the multicast enabled interfaces
646	         10: send on test-send enabled interfaces
647	         11: Unused

649	      LEN: 3 bits (optional)
650	         Size of the packets to be sourced.  The length field represents
651	         a multiple of 16 bytes.  The range of possible packet sizes is
652	         16 bytes to 2048 bytes (2^7)*(16 bytes).  The LEN field is
653	         optional.  If ignored, test senders should send 16 byte packets.

655	      Reserved: 10 bits
656	         Set to zero when sent. Ignored when received.

658	      Inter-packet delay: 32 bits
659	         Number of milliseconds between consecutive test packets.

661	      Test group address: 32 bits
662	         Multicast address of the test group.

664	      Proxy source IP address: 32 bits
665	         IP address of the source to proxy packet for. This field
666	         exists only for a proxy packet trigger request.
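
The TSR fields that follow the MRM header could be packed as sketched below. The bit positions assume the conventional reading of the diagram (bit 0 is most significant), the proxy source word is appended only when a proxy address is given, and `pack_tsr` with its parameter names is hypothetical. The LEN value is passed through as a raw 3-bit code without interpreting the packet size.

```python
import socket
import struct
from typing import Optional


def pack_tsr(port: int, group: str, delay_ms: int, r: int = 0, s: int = 0,
             len_code: int = 0, proxy_source: Optional[str] = None) -> bytes:
    """Pack the TSR body: port, R/S/LEN flags, test group, inter-packet delay."""
    # R is bit 16, S bits 17-18, LEN bits 19-21 of the first word;
    # the remaining 10 reserved bits are sent as zero.
    flags = ((r & 0x1) << 15) | ((s & 0x3) << 13) | ((len_code & 0x7) << 10)
    body = struct.pack("!HH4sI", port, flags,
                       socket.inet_aton(group), delay_ms)
    if proxy_source is not None:       # present only for a proxy packet trigger
        body += socket.inet_aton(proxy_source)
    return body
```

With the values of the example in Section 8, `pack_tsr(16384, "239.255.255.2", 60000)` gives a 12-byte body; the port number 16384 is an arbitrary placeholder.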

668	4.4 Test Receiver Requests (TRR)

670	   The following are code values for status request messages:

672	      0: Monitor multicast group (Monitor request)
673	      1: Poll for receiver statistics (Poll request)
674	      2: Fault isolation request (not used in this revision)

676	   Message format for monitor and poll requests:

678	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
679	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
680	      |J|R|     Reserved              | Number of sources to monitor  |
681	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
682	      |Thres index (0)|  Pkt loss (%) | Reception window (seconds)    |
683	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
684	      |  Min report delay (seconds)   | Max report delay (seconds)    |
685	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
686	      |  Max startup delay (seconds)  |            Reserved           |
687	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
688	      |  UDP port of test packets     |  UDP port for status reports  |
689	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
690	      /          Threshold description block                          /
691	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
692	      |              Test group address                               |
693	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
694	      |              IP address of Source 1                           |
695	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
696	      |             Inter-Packet delay interval from source 1         |
697	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
698	      /                     ...                                       /
699	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
700	      |              IP address of Source n                           |
701	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
702	      |             Inter-Packet delay from source n                  |
703	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

705	      J: 1 bit
706	         0: Don't join the multicast group to be monitored.
707	         1: Join the multicast group to be monitored.

709	      R: 1 bit
710	         0: Fault report should be sent in RTCP format
711	         1: Fault report should be sent in native MRM format (not used).

713	      Reserved:
714	         Zeroed when sent, ignored when received.

716	      Number of sources to monitor: 16 bits
717	         The number of sources this target tester should monitor. When
718	         all sources for the test group are monitored, this field is
719	         set to 1, and the corresponding source address field is set
720	         to 0.0.0.0.

722	      Thres index: 8 bits
723	         Always 0. Index of the criterion for determining a threshold
724	         for a fault.  The value of this index determines the content
725	         for the "Threshold description Block".

727	      Pkt loss (%): 8 bits
728	         Percentage of packet loss. A criterion to determine whether a
729	         fault has occurred.

731	      Max report delay (seconds): 16 bits
732	         Maximum number of seconds within which a fault report must be
733	         sent after the fault is detected.

735	      Min report delay (seconds): 16 bits
736	         Minimum number of seconds to wait after a fault is detected
737	         before sending the fault report. A report should not be sent
738	         in less than this delay.

740	      Max startup delay (seconds): 16 bits
741	         Max number of seconds the TR can wait before the start of the
742	         test. The test is considered started if a test packet is
743	         received, or the "max startup delay" has passed after the
744	         receipt of this request.

746	      Reception window (seconds): 16 bits
747	         Number of seconds used for calculating packet loss percentage.

749	      UDP port of test packets: 16 bits
750	         UDP port used by the test data packets.

752	      UDP port of status report packets: 16 bits
753	         UDP port of status report packets.

755	      Threshold description block: 0 bit
756	         Variable length, depending on "Thres index". This revision only
757	         defines threshold index 0, with no threshold description block.

759	      Test group address: 32 bits
760	         The IP multicast address for the test group.

762	      IP address of source 1 .. n: 32 bits
763	         The IP address of the sources the targeted tester should monitor.
764	         When the address is 0.0.0.0, all sources to this group will be
765	         monitored.

767	      Inter-packet delay from source 1 .. n: 32 bits
768	         Intervals between consecutive packets from the source
769	         (milliseconds).
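
A monitor or poll request body could be assembled as in the sketch below. It assumes threshold index 0 (so no threshold description block is emitted), the conventional reading of the diagram with bit 0 as the most significant bit, and the R bit left at 0 (RTCP format). The function `pack_trr` and its parameter names are hypothetical.

```python
import socket
import struct
from typing import List, Tuple


def pack_trr(join: bool, loss_pct: int, window_s: int, min_delay_s: int,
             max_delay_s: int, startup_s: int, test_port: int,
             status_port: int, group: str,
             sources: List[Tuple[str, int]]) -> bytes:
    """Pack a TRR body; sources is a list of (address, inter-packet delay ms)."""
    flags = 0x8000 if join else 0            # J bit; R and reserved bits are 0
    body = struct.pack(
        "!HHBBHHHHHHH4s",
        flags, len(sources),
        0, loss_pct, window_s,               # Thres index is always 0
        min_delay_s, max_delay_s,
        startup_s, 0,                        # second half-word is reserved
        test_port, status_port,
        socket.inet_aton(group))
    for address, delay_ms in sources:
        body += struct.pack("!4sI", socket.inet_aton(address), delay_ms)
    return body
```

Monitoring all sources of a group would, per the field descriptions above, pass a single source entry with address 0.0.0.0.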

771	4.5 Status Report to the MRM Manager

773	   This MRM revision uses the reception report (RTCP) format based on
774	   Section 2.3.2. Future revisions will define MRM specific report
775	   formats.

777	4.6 MRM Test Packet

779	   MRM test packets are RTPv2/UDP encapsulated. The RTPv2 packet header
780	   is replicated below for ease of description.

782	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
783	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
784	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
785	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
786	      |                           timestamp                           |
787	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
788	      |           synchronization source (SSRC) identifier            |
789	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
790	      |                   IP address of MRM manager                   |
791	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

793	      CC:
794	         Set to 0 when sent, ignored when received.

796	      M:
797	         Set to 0 when sent, ignored when received.

799	      PT:
800	         Set to 0 when sent, ignored when received.

802	      Sequence number:
803	         Sequence number. Set to 0 when a tester is activated.

805	      Timestamp:
806	         System timestamp, in milliseconds.

808	      SSRC:
809	         IP address of the tester, or a configured 32-bit number that
810	         uniquely identifies the tester.
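
The 16-byte test packet header shown above could be built as follows. This is a sketch: it sets P and X to 0 in addition to the CC, M and PT values given above, derives the SSRC from the tester's IPv4 address (one of the two options listed), and uses the hypothetical names `ssrc_from_ip` and `pack_test_packet`.

```python
import socket
import struct


def ssrc_from_ip(tester_ip: str) -> int:
    """Use the tester's IPv4 address as the 32-bit SSRC."""
    return struct.unpack("!I", socket.inet_aton(tester_ip))[0]


def pack_test_packet(seq: int, timestamp_ms: int, ssrc: int,
                     manager_ip: str) -> bytes:
    """Pack the RTPv2 header followed by the MRM manager address."""
    first = 2 << 6            # V=2, P=0, X=0, CC=0
    second = 0                # M=0, PT=0
    return struct.pack("!BBHII4s", first, second,
                       seq & 0xFFFF,
                       timestamp_ms & 0xFFFFFFFF,
                       ssrc,
                       socket.inet_aton(manager_ip))
```

A tester would start with `seq = 0` when activated and increment it for each test packet; the addresses used when trying this out (for example 192.0.2.1) are placeholders.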

812	4.7 MRM Request-Ack Messages

814	   The Acknowledgement messages for the Test Sender Request and the
815	   Status Request provide guarantees that the requests are indeed
816	   received by the testers, instead of being lost.  The acknowledgement
817	   packets contain the MRM header and trailer for the respective
818	   messages, except that the message length and authentication data
819	   fields are recalculated.

821	5. Authenticating MRM Messages

823	   All MRM messages should be authenticated with the MD5 mechanism
824	   specified here. The fields in the messages are transmitted in the
825	   clear. Packets that fail the authentication check are discarded by
826	   the receivers.

828	5.1 Generating Authenticated Messages

830	   The sender of the MRM message decides which authentication key
831	   is used.

833	   (1) The MRM message length field is filled with the number of
834	       bytes in the message;

836	   (2) The rest of the message is composed;

838	   (3) The IPSEC AH is constructed;

840	   (4) The "authentication data" field is zeroed;

842	   (5) The MRM authentication Key (16 byte long) is appended
843	       to the MRM message.

845	   (6) The pad for the key is added. The digest is calculated and
846	       written into the "authentication data" field.

848	   The part with the MD5 secret is not transmitted.

850	5.2 Receiving Authenticated Messages

852	   The receiver performs the following steps when processing an incoming
853	   message:

855	   (1) The digest is stored away and the "authentication data"
856	       field zeroed;

858	   (2) It finds the key according to the value of "Key ID", and
859	       the key is appended and the packet properly padded;

861	   (3) A new digest is calculated.

863	   A message is discarded if the new digest is different from the one
864	   carried in the packet.
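
The generation steps of Section 5.1 and the receive steps above can be sketched together. Several details are assumptions rather than statements from this specification: the "pad for the key" is taken to be MD5's own internal padding, the digest is computed over exactly the bytes the caller passes in, and `auth_offset` (the position of the 16-byte "authentication data" field within those bytes) is a hypothetical parameter.

```python
import hashlib
import hmac

KEY_LEN = 16      # MRM authentication key length in bytes
DIGEST_LEN = 16   # MD5 digest length in bytes


def _digest(zeroed: bytes, key: bytes) -> bytes:
    # The key is appended for the calculation only; it is never transmitted.
    return hashlib.md5(zeroed + key).digest()


def sign(message: bytes, auth_offset: int, key: bytes) -> bytes:
    """Zero the authentication data field, then write the digest into it."""
    if len(key) != KEY_LEN:
        raise ValueError("MRM authentication key must be 16 bytes")
    buf = bytearray(message)
    buf[auth_offset:auth_offset + DIGEST_LEN] = bytes(DIGEST_LEN)
    buf[auth_offset:auth_offset + DIGEST_LEN] = _digest(bytes(buf), key)
    return bytes(buf)


def verify(message: bytes, auth_offset: int, key: bytes) -> bool:
    """Recompute the digest and compare it with the one carried in the packet."""
    received = message[auth_offset:auth_offset + DIGEST_LEN]
    buf = bytearray(message)
    buf[auth_offset:auth_offset + DIGEST_LEN] = bytes(DIGEST_LEN)
    return hmac.compare_digest(received, _digest(bytes(buf), key))
```

A receiver would discard the message whenever `verify` returns False. The constant-time comparison is a design choice of this sketch, not a requirement stated in the text.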

866	5.3 Key Management

868	   We expect to rely on manual key distribution in the initial stages.
869	   MRM should be able to use a standard secure key management
870	   mechanism when one becomes available.

872	6. Security Considerations

874	   The strength of the security mechanism here depends on the strength
875	   of the key and the MD5 algorithm.

877	   Insufficiently protected TSs and TRs (e.g. by weak keys) can be
878	   subject to attacks that can cause the TSs and TRs to take actions
879	   causing harm to the network.

881	7. Different Approaches to Implement MRM

883	   MRM was originally targeted at two types of users: network operation
884	   centers that provide production quality services, and network
885	   administrators who oversee semi-production or experimental multicast
886	   services. The former often rely on SNMP-based tools for management
887	   tasks and typically want all types of monitoring functionality to
888	   be wrapped into the same set of tools. The latter, who usually
889	   set the stage for production quality offerings, do not normally rely
890	   on SNMP-based tools and favor task-oriented tools.

892	   For this reason, this document specifies the native MRM messages and
893	   operations. A companion document will define the MRM MIB that can
894	   accomplish the majority of the native MRM tasks.

896	8. Example of an MRM Setup

898	   The example shown in this section is for illustration purposes only,
899	   and does not cover all possible functionalities of the MRM framework.
900	          .                                                .
901	  Neighbor.    T1                                T2        . Neighbor
902	  Domain  .  +----+           +----+           +-----+     . Domain
903	         ----| BR1|-----------| R2 |-----------| BR3 |--------
904	          .  +----+           +----+           +-----+     .
905	          .    |   .                              |        .
906	               |    .                             |
907	               |     .-----------------------.    |
908	               |                              .   |
909	             +----+                            +-----+
910	             | R4 |                            | R5  |
911	             +----+                            +-----+
912	                 .                              /
913	                  .           T3               /
914	                   .        +----+            /
915	                    --------| R6 |-----------/
916	                            +----+
917	                              |
918	                              |
919	                         -------------
920	                        | MRM Manager |
921	                         -------------

923	   The above is a simple topology used to demonstrate the use of various
924	   MRM features. Border routers BR1, BR3 and an internal router R6 are
925	   administratively configured as candidate MRM Testers. The MRM manager
926	   configures T1 to be a TS, and T2,T3 to be TRs. The following are the
927	   messages sent by the MRM components.

929	   (1) MRM manager sends Test Sender request (TSR) to T1.
930	       Req1 = {Local packet trigger,
931	               test packet interval = 60,000 (ms),
932	               RTP/UDP test packet  = TRUE,
933	               Test group           = 239.255.255.2}

935	       T1 acknowledges receipt of Req1.

937	   (2) MRM manager sends TR request Req2 to T2. Req2 has the following
938	       content:

940	          J-bit                         = TRUE,
941	          list of source addresses      = {T1's IP address},
942	          threshold for fault detection = {20% loss over 10 minutes},
943	          max delay for fault report    = 10 seconds,
944	          min delay for fault report    = 0 seconds,
945	          Test group                    = 239.255.255.2,

947	       T2 acknowledges receipt of Req2. Req2 is retransmitted if the
948	       retransmission timer expires.

950	   (3) MRM manager sends TR request Req3 to T3. Similar to Req2,
951	       except the target is T3, and,

953	          max delay for fault report    = 20 seconds,
954	          min delay for fault report    = 10 seconds

956	       Using different (min, max) report delays avoids report implosion
957	       at the MRM manager when a fault is detected by T2 and T3 at the
958	       same time.

960	   (4) MRM manager periodically sends beacon messages, carrying Req1 and
961	       Req2, Req3. The holdtime is set to the remaining lifetime of the
962	       original request.

964	   Assume T1 has a fault such that it can forward only 1% of all
965	   multicast packets, and that T2 and T3 detect the fault. T2 randomly
966	   delays between 0-10 seconds, and sends a fault report to the MRM
967	   manager.  The MRM manager acknowledges this report. T3 randomly
968	   delays between 10-20 seconds, and sends its fault report to the MRM
969	   manager, which is also acknowledged. This concludes the fault
970	   detection phase.
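
The detection and report timing in this example can be sketched as below. The helper names are hypothetical, and the uniform random choice of the report delay is an assumption; the text only says that each TR "randomly delays" within its (min, max) bounds.

```python
import random


def expected_packets(window_s: int, interval_ms: int) -> int:
    """Packets a TR expects from one source during the reception window."""
    return (window_s * 1000) // interval_ms


def loss_percent(expected: int, received: int) -> float:
    """Packet loss over the window as a percentage."""
    if expected <= 0:
        return 0.0
    lost = max(expected - received, 0)
    return 100.0 * lost / expected


def is_fault(expected: int, received: int, threshold_pct: int) -> bool:
    return loss_percent(expected, received) >= threshold_pct


def report_delay(min_delay_s: float, max_delay_s: float) -> float:
    """Pick a delay in [min, max]; disjoint ranges spread out the reports."""
    return random.uniform(min_delay_s, max_delay_s)
```

With Req1 and Req2 above, a 10 minute window and a 60,000 ms packet interval give `expected_packets(600, 60000) == 10`. If none of those packets arrive, the loss is 100%, which exceeds the 20% threshold. T2 would then wait `report_delay(0, 10)` seconds and T3 `report_delay(10, 20)` seconds before reporting.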

972	   In the fault isolation phase, assume the MRM manager sends a third
973	   party mtrace request to T2 or T3, and isolates the fault to between
974	   T1, R2 and T1, R4. The MRM manager can then issue an alarm to the
975	   network operator, with proper descriptions of the problem.

977	   The fault isolation phase might be more complicated for other types
978	   of fault. For example, if T1 has lost the ability to forward
979	   multicast packets completely, T2 and T3 would not have any multicast
980	   routing state or statistics for mtrace to work with, and some other
981	   mechanism would have to be used.

983	9. Acknowledgment

985	   We'd like to thank John Meylor, Beau Williamson, Stephen Deering,
986	   Ishan Wu, Louis Mamakos, Manoj Leelanivas, David Meyer, Bill Fenner
987	   and Dave Thaler for their comments and suggestions.  We'd especially
988	   like to thank TY Lin and Kamil Sarac for filling in missing details
989	   from the previous version of the specification.

991	10. Authors' Addresses

993	   Kevin Almeroth
994	   Department of Computer Science
995	   University of California
996	   Santa Barbara, CA 93106-5110
997	   almeroth@cs.ucsb.edu

999	   Liming Wei
1000	   Siara Systems, Inc.
1001	   300 Ferguson Drive
1002	   Mountain View, California 94043
1003	   lwei@siara.com

1005	   Dino Farinacci
1006	   cisco Systems, Inc.
1007	   170 West Tasman Drive
1008	   San Jose, CA 95134
1009	   dino@cisco.com

1011	11. References

1013	   [mtrace] Steven Casner, Bill Fenner et al. The mtrace tool.

1015	   [mrm-use] Kevin Almeroth, Liming Wei, "Justification and Use of MRM",
1016	            draft, Jan 15, 1999.

1018	   [aboba]  Bernard Aboba, "The Use of SNTP as a Multicast Heartbeat",
1019	            Draft, draft-ietf-mboned-sntp-heart-02.txt.

1021	   [ping]   Jon Postel, "Internet Control Message Protocol", RFC792,
1022	            Information Sciences Institute, 1981.

1024	   [UDP]    Jon Postel, "User Datagram Protocol", RFC768. Information
1025	            Sciences Institute.

1027	   [scope]  Dave Meyer, "Administratively Scoped IP Multicast",
1028	            Draft, draft-ietf-mboned-admin-ip-space-03.txt.

1030	   [MD5]    R. Rivest, "The MD5 Message-Digest Algorithm", RFC1321,
1031	            April, 1992

1033	   [KA98]   Stephen Kent, Randall Atkinson, "IP Authentication Header",
1034	            Draft, draft-ietf-ipsec-auth-header-07.txt, July 1998.

1036	Appendix A - Change History

1038	   October 1999 -- revisions since draft-ietf-mboned-mrm-00.txt

1040	   (1) Added a TS length field to allow test send packets to be
1041	       specified between 16 bytes and 2048 bytes in 16 byte
1042	       increments.

1044	   (2) Made usage of beacon messages by the manager optional.
1045	       Test agents are required to be able to process beacon
1046	       messages.

1048	   (3) Monitoring existing groups is relegated to a later version
1049	       because of the difficulty in monitoring the source to
1050	       determine if it is sending a packet.  When an MRM Test
1051	       Source is used, Test Receivers know when, how many, and
1052	       for how long packets will be sent.  If no packets are
1053	       received, the test receiver knows to report 100% loss.
1054	       This assumption is not possible when monitoring existing
1055	       groups.

1057	   (4) Added additional detail about packet formats and packet
1058	       handling procedures to reduce ambiguity.