Last Call: <draft-ietf-mboned-auto-multicast-14.txt> (Automatic Multicast Tunneling) to Proposed Standard

Magnus Westerlund <magnus.westerlund@ericsson.com> Wed, 26 December 2012 13:01 UTC

From: Magnus Westerlund <magnus.westerlund@ericsson.com>
To: "mboned@ietf.org" <mboned@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "iesg@ietf.org" <iesg@ietf.org>
Date: Wed, 26 Dec 2012 13:01:35 +0000
Cc: "tsv-dir@ietf.org" <tsv-dir@ietf.org>

Hi,

Sorry for being late with these IETF Last Call comments. I will partly blame the ADs for requesting this Transport Directorate review a bit late; the rest of the blame is all mine and the holidays'. Anyway, I do hope you will consider these issues and comments, as I believe I found some serious ones in addition to a number of clarifications that should be made.


Significant Issues:

1. Congestion Control
This is clearly a tunnel establishment protocol for carrying IP traffic. Normally the responsibility for congestion control would therefore lie with the tunnelled traffic. However, I would like to argue that this does not apply in this case, primarily due to the nature of the tunnelled traffic, i.e. multicast traffic, and secondarily due to limitations in the tunnel protocol.

Let's start with the second part. This protocol claims to support ASM but still doesn't provide an upstream delivery mechanism, i.e. an ASM receiver is not capable of sending as it should be. This rules out several existing congestion control mechanisms found in protocols supporting multicast. The first is using RTCP for congestion control in ASM [RFC3550]; the second is TCP-Friendly Multicast Congestion Control [RFC4654], which can be used in the RMT suite of protocols and which I know has been implemented in some NORM implementations. Thus only strictly receiver-based mechanisms, such as Wave and Equation Based Rate Control [RFC3738], are available in this context.

Turning to the nature of the traffic, many multicast usages are in fact deployed without any congestion control. This is based on the deploying entity controlling the scope of, and the authorization for, requesting multicast delivery. However, those restrictions do not apply to AMT delivery of multicast. If the gateway can reach the relay using unicast, it can have the multicast group delivered from the domain the relay is attached to. Thus, this protocol changes the deployment restrictions on which much non-congestion-controlled delivery is based. Instead, the non-congestion-controlled traffic can now be sent over an IP/UDP tunnel across the Internet, where neither relay nor gateway may have any knowledge about the path the traffic may take.

Based on this I would like to see two changes to this protocol specification. First, a section discussing the issue of congestion control. Secondly, I think this protocol should have an applicability statement limiting its deployment to restricted environments where the relay and gateway deployers can provision sufficient resources between the entities to avoid the multicast traffic affecting other traffic sharing the same bottlenecks in ways not allowed by the network provider.

2. Security

This protocol is frank about having limited security features and says this is similar to the IGMP and MLD protocols being used. However, I think this is a failure to properly consider the threat model. If one uses AMT over the general Internet, it will run in a network where those deploying the multicast and the relays no longer control the requirements on source address verification or the possibilities for traffic separation, as they can within the domains where multicast is currently deployed. The security vulnerabilities in IGMP and MLD are much more contained and controllable in a LAN environment where one has chosen to deploy multicast than in a relay exposing them to the whole Internet. Once more I think there are only two choices here.

A) Beef up the security to match the general Internet threat model, i.e. at a minimum provide a real model for gateway authentication using identities, not only return-routability-based verification.

B) Limit the applicability of AMT to managed environments and make it clear that the relay will need to limit which gateways are allowed to access the relay based on addressing.

Based on the first significant issue with congestion control, I expect that there is little point in doing A) unless one is also willing to beef up AMT to provide congestion control, which I think is not in line with the wishes of the protocol designers.
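
As a rough illustration of option B), a relay-side filter on gateway source addresses might look like the Python sketch below. The function name and the prefixes (documentation ranges) are hypothetical and not taken from the draft. Such a filter is only meaningful where spoofed source addresses are already kept out, which is exactly the managed-environment assumption.

```python
import ipaddress

# Hypothetical allow-list of gateway prefixes (documentation ranges, for illustration only).
ALLOWED_GATEWAY_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def gateway_allowed(src_addr, prefixes=ALLOWED_GATEWAY_PREFIXES):
    """Return True if the gateway's source address falls inside an allowed prefix."""
    addr = ipaddress.ip_address(src_addr)
    # An address never matches a network of the other IP version.
    return any(addr in net for net in prefixes)
```

A relay would apply such a check before answering a Request or accepting a Membership Update, and silently drop everything else.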

3. Use of Zero Checksum

The AMT specification enables the use of a zero UDP checksum with IPv6, i.e.
draft-ietf-6man-udpchecksums-06<http://datatracker.ietf.org/doc/draft-ietf-6man-udpchecksums/> and draft-ietf-6man-udpzero-08<http://datatracker.ietf.org/doc/draft-ietf-6man-udpzero/>. I have nothing against this in principle. However, I have noticed that AMT fails to properly address the failure modes of using a zero checksum. AMT is a typical example of a protocol that actually needs active verification, for each tunnel, that zero-checksum works. This is because AMT is clearly intended to be deployed with its Gateway part in end-hosts and in residential network devices or routers. This means the tunnel will pass through both firewalls and NATs on its path between the relay and the gateway. Unless these devices are upgraded to support zero-checksum in UDP for IPv6, the traffic may actually become black-holed. The most likely case is a simple firewall that has a rule for IPv6/UDP which doesn't allow a zero checksum, as it is against RFC 2460. Thus all the Multicast Data packets will disappear en route in the tunnel. There is no mechanism in the AMT protocol to detect this and negotiate with the relay so that it will not use the zero checksum for this tunnel.

This must be addressed as I see it. If not, AMT will be so brittle that it can't be used in a large number of its intended deployments.

4. MTU issues

This document totally fails to discuss the issue of MTU black-holing. As the IP multicast datagrams, as well as the encapsulated IGMP/MLD messages, can with the added tunnel overhead result in a sent packet that exceeds the MTU of the path, these packets could be black-holed. This can potentially result in very intermittent transport behavior for the tunnel. Thus, some discussion of how to handle the MTU issues in this context should be introduced.

I am willing to discuss methods here, but I guess several alternatives exist, and since both which one is most appropriate and the level of AMT support needed for each vary, I would like the protocol designers to take a first stab at resolving this.
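
To put rough numbers on the MTU problem, here is a small Python sketch of the encapsulation arithmetic. It assumes an outer IP header without options or extension headers, an 8-byte UDP header, and a 2-byte AMT Multicast Data header; the last figure is an assumption about the message format and should be checked against the draft. The function names are only for illustration.

```python
# Outer header sizes in bytes, assuming no IPv4 options / IPv6 extension headers.
OUTER_IP_HEADER = {4: 20, 6: 40}
UDP_HEADER = 8
AMT_DATA_HEADER = 2  # assumption: version/type octet plus one reserved octet


def encapsulated_size(inner_len, outer_ip_version=4):
    """Size on the wire of a tunnelled packet carrying an inner IP datagram."""
    return OUTER_IP_HEADER[outer_ip_version] + UDP_HEADER + AMT_DATA_HEADER + inner_len


def max_inner_datagram(path_mtu, outer_ip_version=4):
    """Largest inner IP datagram that fits the path MTU without fragmentation."""
    return path_mtu - OUTER_IP_HEADER[outer_ip_version] - UDP_HEADER - AMT_DATA_HEADER
```

Under these assumptions, a full 1500-byte multicast datagram becomes 1530 bytes over an IPv4 tunnel and 1550 bytes over an IPv6 tunnel, so any source that fills a 1500-byte MTU will exceed the same MTU on the tunnel path.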

Other Issues
======================================================================

A) The table of contents on page 2 should include more levels of headings. Most likely levels down to 4 are needed to make the TOC usable for finding content in the document.

B) The claimed ASM support
I would like to better understand how one can claim to support ASM when one has no upstream path to inject the ASM group participants' traffic. When one uses ASM, one normally does so for a reason, namely needing the possibility to inject packets into the group. These limitations need to be clarified.

C) Section 4:

This section indicates in its figures, the ones in Section 4.1.1 and Section 4.1.3.1, that the Router Mode IGMP/MLD functions are outside of AMT, which based on the requirements in Section 5.3.3.4 is not accurate. That implementation must be AMT-specific to maintain the mapping between AMT tunnels and group membership.

D) Not all figures have handles that can be used to reference them.

E) Section 4.1.5.1:


   Similarly, the selection of a unicast Relay address may be source-
   dependent, as a relay contacted by a gateway to supply multicast
   traffic must have native multicast connectivity to the traffic source


I find this statement confusing. There is no support in the protocol for including, in the discovery phase, the multicast group(s) that the gateway would like to receive.

F) AMT Gateway in home router.
One deployment scenario is that the AMT gateway is deployed in the home network router to provide access to multicast groups provided by the ISP. However, the startup procedures in this deployment are unclear. The text appears to indicate that one can have a gateway implementation that, as soon as the router boots, starts doing discovery and sends requests in order to have Queries to send into its internal local network. Other text appears to suggest waiting until some host actually requests to join a group. The protocol specification appears to do its best to leave a great deal of flexibility and will thus produce huge variance in the market.

G) Figure 3:
I find no discussion of Membership Updates to rejoin the groups after the tunnel has changed its source address as seen by the relay. I think this should have some discussion. Yes, it is reasonably clear that you will get traffic just by sending new membership updates over the new tunnel. However, the timing between the teardown and this membership update should be discussed. Figure 3 implies the update should be sent after the teardown, which I think is correct due to the traffic volumes to the NAT most likely causing the path change.

H) Figure in Section 4.2.2.2
Propose that the external side of the NAT should be marked as the one having the "e" addresses.

I) Seeing the figure in Section 4.2.2.3, I definitely noted the address collision issue. This is made somewhat clearer later on. But maybe a dedicated sub-section of Section 4 could discuss the general issue: multiple left-side hosts can have the same address as hosts behind other tunnel endpoints, and thus the Relay needs to hide this from upstream, accept it, and use the tunnel context to track the different hosts.

J) Section 4.2.2.3:


   To avoid placing an undue burden on the relay platform, the protocol
   specifically allows zero-valued UDP checksums on the multicast data
   messages.  This is not an issue in UDP over IPv4 as the UDP checksum
   field may be set to zero.  However, this is a problem for UDP over
   IPv6 as that protocol requires a valid, non-zero checksum in UDP
   datagrams [RFC2460].  Messages sent over IPv6 with a UDP checksum of
   zero may fail to reach the gateway.  This is a well known issue for
   UDP-based tunneling protocols that is described
   [I-D.ietf-6man-udpzero].  A recommended solution is described in
   [I-D.ietf-6man-udpchecksums].

I think this needs reformulating and I don't understand what is intended
with the last sentence.

K) Section 5.1.1.
"Destination UDP Port -  The IANA-assigned AMT port number."

I find it strange that the protocol mandates that all traffic is sent
to the IANA-assigned port. Why can't the protocol allow more flexible handling
of the destination port? I find only a single thing in the protocol that prevents
usage of another relay listener port: the Relay Advertisement would
need a port field in addition to the address.

L) Section 5.1.1.4
   A 32-bit random value generated by the gateway and echoed by the
   relay in a Relay Advertisement message.

Should the above make it clear that the value should preferably be a
cryptographically random value as defined in RFC 4086?

M) There is a lack of specification in Section 5.1.1 of what one does if the version
is different from 0. This is mentioned in Section 5.3.3.1, but not for gateways
and not for all message types.

N) Section 5.1.4.8:

   The Querier's Query Interval Code (QQIC) field in the general query
   is used by a relay to specify the time offset a gateway should use to
   schedule a new three-way handshake to refresh the group membership
   state within the relay (current time + Query Interval).

In several places it is not made clear that the QQIC and QRV are defined in
the external references for MLD and IGMP.

O) Section 5:
When specifying the bit-fields, please indicate the length of each field in the
text. This is an accessibility question. If you have impaired vision, interpreting
the field lengths in the figures correctly can be difficult.

P) Section 5.2.2.4:

This section defines which retransmission parameters one can potentially
configure. However, the section fails to define the acceptable maximum and minimum
values. Wrongly configured retransmission parameters can have a significant
negative impact on the network by causing bursts or unnecessary traffic.

Q) Section 5.2.3.3:
   The gateway may continue to receive Multicast Data messages long
   after the gateway sends a Membership Update message that deletes
   existing group subscriptions.

What is "long" in the above sentence? Are we talking about some known number of seconds,
a TCP MSL, i.e. 2 min?

R) Section 5.2.3.4.3
   A gateway MAY retransmit a Relay Discovery message if it does not
   receive a matching Relay Advertisement message within some timeout
   period.  If the gateway retransmits the message multiple times, the
   timeout period SHOULD be adjusted to provide an random exponential
   back-off.  The RECOMMENDED timeout is a random value in the range
   [initial_timeout, MIN(initial_timeout * 2^retry_count,
   maximum_timeout)], with a RECOMMENDED initial_timeout of 1 second and
   a RECOMMENDED maximum_timeout of 120 seconds (which is the
   recommended minimum NAT mapping timeout described in [RFC4787]).

I wonder if the above exponential back-off is really what is desired. As it randomly
picks the timeout between the initial timeout value and 2^retry_count*initial_timeout, it
will be both biased low and capable of producing timing intervals that don't
grow. If one desires a random timeout to avoid some clock synchronization effects,
I think a better algorithm is Td = MIN(initial_timeout * 2^retry_count,
   maximum_timeout), where the actual timeout is random*Td and random is a
random value from the uniform distribution over the interval [0.5,1.5]. This ensures both
that the timeout between two retransmissions is never less than Td/2 and that on average
it grows by a factor of two.
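
To make the comparison concrete, here is a minimal Python sketch of both timers. The function names are mine, and the 1-second and 120-second defaults are simply the values from the quoted text; nothing else here is taken from the draft.

```python
import random


def draft_timeout(retry_count, initial_timeout=1.0, maximum_timeout=120.0, rng=random):
    """Timeout as in the quoted text: uniform in [initial, min(initial * 2^n, max)]."""
    upper = min(initial_timeout * 2 ** retry_count, maximum_timeout)
    return rng.uniform(initial_timeout, upper)


def proposed_timeout(retry_count, initial_timeout=1.0, maximum_timeout=120.0, rng=random):
    """Proposed alternative: Td = min(initial * 2^n, max), scaled by uniform [0.5, 1.5]."""
    td = min(initial_timeout * 2 ** retry_count, maximum_timeout)
    return rng.uniform(0.5, 1.5) * td
```

With the draft's formula the lower bound stays at initial_timeout for every retry, so a late retry can still fire after one second. With the proposed formula the lower bound is Td/2 and the mean is Td, both of which double per retry until the cap is reached. Note that the proposed timeout can exceed maximum_timeout by up to 50%, which may or may not be acceptable.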

This section also does not define a minimum initial timeout value, or any method for
safely determining a more performant value from a safe initial value. Doing an RTT
measurement using the AMT control messages would require some extensions, but
could be a good way of determining a better initial value than 0.5 seconds, which
would be my recommendation for a default value.

S) Section 5.2.3.4.4

   If a gateway executes the relay discovery procedure at the start of
   each membership update cycle and the relay address returned in the
   latest Relay Advertisement message differs from the address returned
   in a previous Relay Advertisement message, then the gateway SHOULD
   send a Teardown message (if supported) to the old relay address,
   using information from the last Membership Query message received
   from that relay, as described in Section 5.2.3.7.  This behavior is
   illustrated in the following diagram.

This text and the figure after it do not appear to be consistent.
The figure implies a timer that isn't present in the above. The textual
description appears sensitive to flapping anycast routing. I think
the figure's indication of some higher timeout before redoing Relay discovery
appears much more robust.

T) Section 5.2.3.5.3
See R)
Also, this text appears redundant with the previous text. Maybe generalize this
into its own section, used for all messages needing retransmission.

U) Section 5.2.3.5.4
   Querier's Query Interval Code carried by the general-query.  A
   gateway MAY use a smaller timer duration if required to refresh a NAT
   mapping that would otherwise timeout.

Maybe the protocol would rather need a NAT keep-alive message to be sent
from the gateway to the relay. But maybe the Request/Query cycle is lightweight
enough that this works fine.

V) Section 5.2.3.6.1

o  Insert IGMP or MLD datagrams into a queue for transmission after
      it receives a Membership Query message.

What assumptions about queue depth exist in the above? Clearly the messages in this
queue should expire if they become too old.
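
One way to bound both depth and age is sketched below in Python. The class name, the depth of 16, and the 10-second maximum age are placeholders of mine and not values from the draft; the clock is injectable only to make the behavior easy to test.

```python
import time
from collections import deque


class PendingReports:
    """Bounded queue of IGMP/MLD datagrams awaiting a Membership Query."""

    def __init__(self, max_depth=16, max_age=10.0, clock=time.monotonic):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._queue = deque(maxlen=max_depth)
        self._max_age = max_age
        self._clock = clock

    def push(self, datagram):
        """Queue a datagram together with the time it was queued."""
        self._queue.append((self._clock(), datagram))

    def drain(self):
        """Return the datagrams that are still fresh and empty the queue."""
        now = self._clock()
        fresh = [d for queued_at, d in self._queue if now - queued_at <= self._max_age]
        self._queue.clear()
        return fresh
```

The gateway would call drain() when the Membership Query arrives and send only what comes back.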

X) Section 5.2.3.7
Gateway support for the Teardown message is OPTIONAL but RECOMMENDED.

The above is a very strange usage of RFC 2119 keywords. If you substitute the synonyms,
then maybe the error of writing it this way becomes clear:
Gateway MAY support the Teardown message but SHOULD.

Y) The usage of retransmission versus repetition is not always clear.
Some of the messages appear to simply need to be repeated QRV times at
some interval. Others should really be matched with an answer and, if none is received within
a timeout, retransmitted. Can these two cases be made clearer?

Z) Section 5.3.5
   The hash function RECOMMENDED for use in computing the Response MAC
   is the MD5 hash digest [RFC1321], though hash functions or keyed-hash
   functions of greater cryptographic strength may be used.

I think this points to a security vulnerability. I think it needs to be made clear
that the MAC MUST be keyed. If it is just a digest, then an attacker can calculate the
MAC and perform an off-path attack.

This should also be stated as a requirement in Section 6.1.
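
The difference between a plain digest and a keyed MAC can be sketched in a few lines of Python. The choice of inputs (gateway source address, source UDP port, request nonce), the use of HMAC-SHA-256, and the function names are assumptions made for illustration; only MD5 and the 48-bit field size come from the draft as discussed in this review.

```python
import hashlib
import hmac
import ipaddress
import struct


def _mac_input(src_addr, src_port, nonce):
    # Assumed inputs: source address, 16-bit source port, 32-bit request nonce.
    return ipaddress.ip_address(src_addr).packed + struct.pack("!HI", src_port, nonce)


def unkeyed_mac(src_addr, src_port, nonce):
    # Plain digest: an attacker spoofing the source knows every input,
    # so the value can be computed without ever hearing from the relay.
    return hashlib.md5(_mac_input(src_addr, src_port, nonce)).digest()[:6]


def keyed_mac(secret, src_addr, src_port, nonce):
    # Keyed MAC: the result depends on a secret that only the relay holds.
    # Truncated to 48 bits to match the Response MAC field size.
    data = _mac_input(src_addr, src_port, nonce)
    return hmac.new(secret, data, hashlib.sha256).digest()[:6]


def verify(secret, src_addr, src_port, nonce, received):
    # Constant-time comparison of the expected and received MAC.
    expected = keyed_mac(secret, src_addr, src_port, nonce)
    return hmac.compare_digest(expected, received)
```

With the unkeyed variant an off-path attacker can produce a valid Response MAC on its own; with the keyed variant it must first obtain a value from the relay, which is what the return-routability check relies on.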

AA) Section A.1:

Although this proposal has its advantages, I think it might also illustrate a
shortcoming. First of all, 48 bits is quite short for a MAC. I would prefer a
variable-length field.

Secondly, doesn't this actually create more material for an attacker to determine
the key used by the relay?

That is all I have found.

Cheers

Magnus