Re: [Int-area] SEAL - draft-templin-intearea-seal-05.txt
Hi Ralph,
On Sep 19, 2009, at 7:16 AM, Ralph Droms wrote:
The IESG has received draft-templin-intearea-seal-05.txt, "The
Subnetwork Encapsulation and Adaptation Layer (SEAL)" as an
Individual Submission. I need intarea review and comment on the
document as part of the review process. Please read and respond
with comments to the int-area at ietf.org mailing list. Thanks.
The SEAL draft (draft-templin-intarea-seal-05.txt) is a reasonably
well-written document with some interesting technical content.
However, I don't believe that it is suitable for publication as an
IETF Proposed Standard RFC for several reasons:
(1) This is a fairly substantial document that has not been produced by
an IETF WG, and I do not believe that it has received wide-enough
visibility or review to determine if there is IETF consensus to
publish this document as a standards-track document. If
standards-track RFC publication is the desired goal, I think it would
be better to hold a BOF to see if there is interest in forming a WG to
pursue this work.
(2) There are a number of significant technical issues with the
document that would (IMO) need to be addressed before it is ready for
publication. While this document might be a good starting point for
IETF work (if there are enough people who want to work on it), I do
not think it is even close to ready for standards-track RFC
publication in its current form.
I have included my review notes below, which raise several
significant issues with this document.
Margaret
>Abstract
>
> For the purpose of this document, subnetworks are defined as virtual
> topologies that span connected network regions bounded by
> encapsulating border nodes. These virtual topologies may span
> multiple IP and/or sub-IP layer forwarding hops, and can introduce
> failure modes due to packet duplication and/or links with diverse
> Maximum Transmission Units (MTUs). This document specifies a
> Subnetwork Encapsulation and Adaptation Layer (SEAL) that
> accommodates such virtual topologies over diverse underlying link
> technologies.
This abstract (and the title) refer to SEAL as a "layer". However, it
is unclear to me that SEAL acts as a "layer" in the typical sense that
the term is used in the IETF...
> First, the IPv4 header Identification field is only 16 bits in
> length, meaning that at most 2^16 packets pertaining to the same
> (source, destination, protocol, Identification)-tuple may be active
This doesn't follow... I think, perhaps, what this document intends to
say is that there may only be 2^16 packets with the same source IP,
destination IP and protocol three-tuple that all have a unique IP
ID value? This is better stated in one of the appendices.
> in the Internet at a given time. Due to the escalating deployment of
> high-speed links (e.g., 1Gbps Ethernet), however, this number may
> soon become too small by several orders of magnitude.
This claim should be substantiated with a reference or some sort of
data. Also, there should be some explanation included for why this is
an operational (or even theoretical) problem...
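A back-of-the-envelope sketch of the arithmetic behind the draft's claim, not data from the draft: it estimates how long a single sender at line rate takes to use every value of a 16-bit Identification field once. The function name and the packet sizes are illustrative assumptions, and link-layer framing overhead is ignored.

```python
def ip_id_wrap_seconds(link_bps, packet_bytes, id_bits=16):
    """Seconds until a sender at line rate has used every ID value once.

    Assumes one ID per packet, back-to-back packets of a fixed size,
    and no link-layer framing overhead (all simplifying assumptions).
    """
    packets_per_second = link_bps / (packet_bytes * 8)
    return (2 ** id_bits) / packets_per_second


if __name__ == "__main__":
    # Illustrative packet sizes on a 1 Gbps link.
    for size in (64, 576, 1500):
        print(size, "bytes:", round(ip_id_wrap_seconds(1e9, size), 3), "s")
```

Under these assumptions the ID space wraps in well under a second at 1 Gbps for 1500-byte packets. Whether that is an operational problem still depends on how long fragments can remain in flight or in a reassembly buffer, which is the part that needs a reference.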
> Furthermore,
> there are many well-known limitations pertaining to IPv4
> fragmentation and reassembly - even to the point that it has been
> deemed "harmful" in both classic and modern-day studies (cited
> above). In particular, IPv4 fragmentation raises issues ranging from
> minor annoyances (e.g., in-the-network router fragmentation) to the
> potential for major integrity issues (e.g., mis-association of the
> fragments of multiple IP packets during reassembly).
Reference?
> As a result of these perceived limitations, a fragmentation-avoiding
> technique for discovering the MTU of the forward path from a source
> to a destination node was devised through the deliberations of the
> Path MTU Discovery Working Group (PMTUDWG) during the late 1980's
> through early 1990's (see Appendix D). In this method, the source
> node provides explicit instructions to routers in the path to discard
> the packet and return an ICMP error message if an MTU restriction is
> encountered. However, this approach has several serious shortcomings
> that lead to an overall "brittleness".
Reference? Also, there was more recent work on PMTUD that doesn't
seem to be reflected in this paragraph. The transport area should be
asked to review this document, particularly the PMTUD WG if that
remains in some form.
> In particular, site border routers in the Internet are being
> configured more and more to discard ICMP error messages coming from
> the outside world. This is due in large part to the fact that
> malicious spoofing of error messages in the Internet is made simple
> since there is no way to authenticate the source of the messages.
Reference?
> Furthermore, when a source node that requires ICMP error message
> feedback when a packet is dropped due to an MTU restriction does not
> receive the messages, a path MTU-related black hole occurs. This
> means that the source will continue to send packets that are too
> large and never receive an indication from the network that they are
> being discarded.
My (incomplete) understanding is that this problem has been addressed
by new PMTUD work.
> The issues with both IPv4 fragmentation and this "classical" method
> of path MTU discovery are exacerbated further when IP-in-IP tunneling
> is used. For example, site border routers that are configured as
> ingress tunnel endpoints may be required to forward packets into the
> subnetwork on behalf of hundreds, thousands, or even more original
> sources located within the site.
This wording is awkward in a number of ways:
1) There is not typically one site border router in cases where tunnel
decapsulation is happening at the edge of a network, because typically
the tunnel encapsulation/decapsulation happens (at least logically)
outside of the network perimeter (i.e. outside the firewall).
2) I am not sure what it means to say that the ingress tunnel endpoint
forwards packets into the subnetwork on behalf of local sources... It
forwards those packets out to the Internet, not onto the subnetwork (at
least as I would use that term). It seems like this document is using
the term "subnetwork" to mean "tunnel"? If so, that is a bit
confusing given that subnet already has a well-defined meaning.
> If IPv4 fragmentation were used,
> this would quickly wrap the 16-bit Identification field and could
> lead to undetected data corruption.
Ummm... Why and how? I think I understand what this is driving at,
but it is not clearly explained. If the ingress tunnel endpoint is
using a single source address/destination address/protocol three-tuple
for a very large amount of traffic generated from a site, and if that
tunnel endpoint is performing fragmentation on _external_ packets to
make them fit in the tunnel, I can see how you might have accidental
ID overlaps.
However, it is more typical for tunnels to fragment the _inner_ IP
packet, causing reassembly to happen at the end nodes, and thus
avoiding the need to collapse many ID numbering spaces into one. For
precisely this reason.
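To make the "accidental ID overlap" scenario concrete, here is a minimal sketch of a reassembler that keys only on (source, destination, protocol, Identification). Everything in it is hypothetical: the tuple layout, the `reassemble` name, and the use of byte offsets instead of 8-byte fragment units are simplifications for illustration, not anything taken from the draft.

```python
from collections import defaultdict


def reassemble(fragments):
    """Naive reassembly keyed only on (src, dst, proto, ident).

    Each fragment is (src, dst, proto, ident, offset, more, payload),
    with offset in bytes (a simplification of real IPv4 offsets).
    Returns the list of reassembled payloads in completion order.
    """
    buffers = defaultdict(dict)
    done = []
    for src, dst, proto, ident, offset, more, payload in fragments:
        key = (src, dst, proto, ident)
        buf = buffers[key]
        buf[offset] = (more, payload)
        # Walk contiguous fragments from offset 0 looking for the last one.
        data, pos = b"", 0
        while pos in buf:
            frag_more, frag_payload = buf[pos]
            data += frag_payload
            pos += len(frag_payload)
            if not frag_more:
                done.append(data)
                del buffers[key]
                break
    return done


# Packets X and Y share one tunnel three-tuple and, after the 16-bit ID
# wraps, the same ident. X's second fragment is lost; Y's second fragment
# arrives before Y's first.
tunnel = ("10.0.0.1", "10.0.0.2", 4, 7)
arrivals = [
    tunnel + (0, True, b"AAAA"),   # X, first fragment
    tunnel + (4, False, b"bbbb"),  # Y, second fragment
    tunnel + (0, True, b"BBBB"),   # Y, first fragment (too late)
]
print(reassemble(arrivals))
```

The reassembler hands up a packet spliced from X and Y, and nothing at the IP layer notices. This only arises when many flows are collapsed into one ID space, which is why fragmenting the inner packet avoids it.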
> If classical IPv4 path MTU
> discovery were used instead, the site border router may be
> inconvenienced by excessive ICMP error messages coming from the
> subnetwork that may be either untrustworthy or insufficiently
> provisioned to allow translation into error messages to be returned
> to the original sources.
The passive voice hurts here, as it isn't clear who would be doing the
PMTUD... The original sending nodes? Or the ingress tunnel endpoint
itself?
The subnetwork "may ... be untrustworthy"? What does that mean? What
would be insufficiently provisioned?
I am not sure what problem is being described here.
> The situation is exacerbated further still by IPsec tunnels, since
> only the first IPv4 fragment of a fragmented packet contains the
> transport protocol selectors (e.g., the source and destination ports)
> required for identifying the correct security association rendering
> fragmentation useless under certain circumstances. Even worse, there
> may be no way for a site border router that configures an IPsec
> tunnel to transcribe the encrypted packet fragment contained in an
> ICMP error message into a suitable ICMP error message to return to
> the original source.
This is one of the classic reasons to avoid intermediate fragmentation.
> Due to these many limitations, a new approach to accommodate links
> with diverse MTUs is necessary.
I do not know that this is true... There are several tunneling
protocols (GRE, L2TP, etc.) that already handle these issues, and this
document has not discussed how they handle them, nor has it made any
case that they are being handled inadequately in current, widely
deployed tunneling solutions.
> This document introduces a Subnetwork Encapsulation and Adaptation
> Layer (SEAL) for tunnel-mode operation of IP over subnetworks that
> connect Ingress and Egress Tunnel Endpoints (ITEs/ETEs) of border
> nodes. It provides a standalone specification designed to be
> tailored to specific associated tunneling protocols such as VET
> [I-D.templin-intarea-vet], the Locator-Identifier Split Protocol
> (LISP) [I-D.ietf-lisp] and others.
I do not believe that the LISP WG has explicitly been asked to review
this document. However, this solution has been proposed on the LISP
WG mailing list... It did not receive widespread support, was
rejected by at least some LISP participants, and was ruled
out-of-scope for LISP discussion. So, it is a bit odd to see an AD-
sponsored submission that claims that this is tailored to LISP...
> SEAL encapsulation additionally includes a 2-bit version number.
> This document specifies SEAL protocol versions 0 and 1.
A two-bit version number where two of the versions are already assigned?
> ITE - Ingress Tunnel Endpoint
>
> ETE - Egress Tunnel Endpoint
These terms need references or actual definitions. There are several
terms in this section that just expand acronyms with no actual
definition.
> DF - the IPv4 header "Don't Fragment" flag
Reference?
> The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
> SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
> document, are to be interpreted as described in [RFC2119].
This document makes frequent use of lower-case requirements keywords
(such as "must") in cases where they appear to be (or could at least
be interpreted as) requirements. They should be uppercased, switched
to other words, or clarified.
> SEAL was motivated by the specific case of subnetwork abstraction for
> Mobile Ad hoc Networks (MANETs);
And yet the MANET group did not adopt SEAL? Why?
> however, the domain of applicability
> also extends to subnetwork abstractions of enterprise networks, ISP
> networks, SOHO networks, the interdomain routing core, and many
> others. In particular, SEAL is a natural complement to the
> enterprise network abstraction manifested through the VET mechanism
> [I-D.templin-intarea-vet], the RANGER architecture
> [I-D.templin-ranger][I-D.russert-rangers] and the LISP protocol
> [I-D.ietf-lisp]. The term "subnetwork" within this document is used
> synonymously with the term "enterprise" that appears in these
> references.
Again, LISP is mentioned despite the fact that there is no clear
consensus that this work complements LISP.
So, the term "enterprise" is yet another term for "tunnel"?
> SEAL introduces a minimal new sublayer for IPvX in IPvY encapsulation
> (e.g., as IPv4/SEAL/IPv6), and appears as a subnetwork encapsulation
> as seen by the inner IP layer. SEAL can also be used as a sublayer
> for encapsulating inner IPvX packets within outer IPvY/UDP headers
> (e.g., as IPv4/UDP/SEAL/IPv6) such as for the Teredo domain of
> applicability [RFC4380]. When it appears immediately after the outer
> IPv4 header, the SEAL header is processed exactly as for IPv6
> extension headers.
What does it mean that the SEAL header is processed "exactly as for IPv6
extension headers"? IPv4 does have an IP header extension mechanism
(IP options); why wasn't that used?
> network. SEAL-enabled ITEs insert a SEAL header during the
> encapsulation of inner IP packets in mid-layer and outer
> encapsulating headers and trailers. For example, an inner IPv6
> packet would appear as IPv4/*/SEAL/**/IPv6/**/* after the mid-layer
> '**' encapsulations, the SEAL header, and outer IPv4 and '*'
> encapsulations are added.
It is still quite unclear to me, after reading the document twice (and
up to here the third time), what the ** and * are supposed to
signify. Could these things be given names or explained more clearly
in the introduction, or something?
> o For tunnel-mode IPsec encapsulations, [RFC4301], the SEAL header
> is inserted between the {AH,ESP} header and outer IP headers as:
> IPvX/SEAL/{AH,ESP}/*/IPvY.
Ummm... It is my understanding that if the SEAL header comes after
the ESP header, it needs to be encrypted. But, the middle box that
is inserting the SEAL header doesn't have the credentials needed to
encrypt it. So, I think this example is invalid.
> For all other packets, the ITE admits the packet if it is no larger
> than the tunnel interface MTU; otherwise, it drops the packet and
> sends a PTB error message to the source with the MTU value set to the
> tunnel interface MTU. The message must contain as much of the
> invoking packet as possible without the entire message exceeding the
> minimum IP MTU (i.e., 576 bytes for IPv4 and 1280 bytes for IPv6).
I am not sure that all IPv4 stacks will handle a PTB message correctly
if it is returned in response to a packet that did not have the DF bit
set. It is likely that those nodes do not implement PMTUD and/or
won't have the newer code needed to read the MTU length from the ICMP
error. IMO, a survey of how existing stacks will reply to this message
in this situation should be conducted before we standardize a protocol
that depends on specific behavior in this case.
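For reference, the admission rule quoted above can be sketched as follows. This is a minimal illustration of the quoted paragraph only, not the draft's full ITE procedure. The function name, the return shape, and the header-length table are assumptions; the table assumes an error message with a bare 20-byte IPv4 or 40-byte IPv6 header plus an 8-byte ICMP header, with no options or extension headers.

```python
# Minimum IP MTU values stated in the quoted text.
MIN_IP_MTU = {4: 576, 6: 1280}

# Assumed overhead of the error message itself: IP header + ICMP header,
# with no IPv4 options and no IPv6 extension headers.
ERROR_HEADER_LEN = {4: 20 + 8, 6: 40 + 8}


def admit_or_ptb(packet, tunnel_mtu, ip_version):
    """Return ("admit", tunnel_mtu, b"") or ("ptb", tunnel_mtu, quoted).

    `quoted` is as much of the invoking packet as fits without the whole
    error message exceeding the minimum IP MTU for that IP version.
    """
    if len(packet) <= tunnel_mtu:
        return ("admit", tunnel_mtu, b"")
    room = MIN_IP_MTU[ip_version] - ERROR_HEADER_LEN[ip_version]
    return ("ptb", tunnel_mtu, packet[:room])


if __name__ == "__main__":
    kind, mtu, quoted = admit_or_ptb(b"\x00" * 1500, 1400, 4)
    print(kind, mtu, len(quoted))
```

Note that the sketch never consults the DF bit, which is exactly the concern: the rule as quoted sends a PTB regardless of whether the sender asked for one.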
> Note that this SEAL segmentation ignores the fact that the mid-layer
> packet may be unfragmentable outside of the subnetwork. This
> segmentation process is a mid-layer (not an IP layer) operation
> employed by the ITE to adapt the mid-layer packet to the subnetwork
> path characteristics, and the ETE will restore the packet to its
> original form during reassembly. Therefore, the fact that the packet
> may have been segmented within the subnetwork is not observable
> outside of the subnetwork.
I think there may be some significant problems with this approach.
While it is true that the exact same bits will be received on the other
end in normal circumstances, this has most of the same problems that
are present with any in-the-network fragmentation. In particular,
what I think of as the "unit of loss" is not maintained end-to-end. A
single packet from the source is broken into several pieces that may
be lost separately. In order to make up for one lost piece, the full
original packet needs to be retransmitted. This has very bad
properties for congestion, especially when large packets are used.
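The "unit of loss" point can be put in numbers with a small worked example. It assumes each segment is lost independently with the same probability, which is a simplification (real loss is often bursty), and the function name and loss figures are illustrative only.

```python
def delivery_probability(segment_loss, segments):
    """Probability that every segment of one original packet arrives.

    Assumes independent, identically distributed segment loss.
    """
    return (1.0 - segment_loss) ** segments


if __name__ == "__main__":
    loss = 0.01  # assumed 1% per-segment loss
    for n in (1, 2, 6):
        p = delivery_probability(loss, n)
        print(n, "segments: original packet lost", round((1 - p) * 100, 2), "%")
```

With 1% segment loss, a packet split into six segments is lost nearly 6% of the time, and each such loss causes the source to resend the full original packet rather than the one missing piece.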
> The ITE next encapsulates each segment in the requisite IP/* outer
> headers according to the specific encapsulation format (e.g.,
> [RFC2003], [RFC2473], [RFC4213], [RFC4380], etc.), except that it
> writes 'SEAL_PROTO' in the protocol field of the outer IP header
> (when simple IP encapsulation is used) or writes 'SEAL_DPORT' in the
> outer destination service port field (e.g., when IP/UDP encapsulation
> is used). The ITE finally sets the A bit as specified in Section
> 4.3.5 (if necessary), sets the packet identification values as
> specified in Section 4.3.6 and sends the packets as specified in
> Section 4.3.7.
It does not appear that SEAL includes any mechanism to allow per-flow
load balancing (for LAG/ECMP systems) when UDP is used in the
encapsulation. Since that is, as I understand it, the main reason
_why_ someone would want to use UDP encapsulation for a tunneling
protocol, this seems like a significant omission. If the IETF chooses
to go through with this proposal, the author should look at the UDP
source port mechanism in the LISP encapsulation for an example of how
to support LAG/ECMP per-flow load balancing by using different UDP
source ports for different flows.
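A minimal sketch of the general idea: derive the outer UDP source port from a hash of the inner flow's five-tuple, so that LAG/ECMP hashing on the outer header spreads flows while keeping each flow on one path. This is not LISP's specified algorithm. The function name, the choice of SHA-256 (a real data plane would use a much cheaper hash), and the use of the 49152-65535 dynamic port range are all assumptions for illustration.

```python
import hashlib
import ipaddress
import struct


def flow_source_port(src, dst, proto, sport, dport, low=49152, high=65535):
    """Map an inner five-tuple to a stable outer UDP source port.

    The same flow always yields the same port; different flows are
    spread across [low, high] (collisions are possible and harmless).
    """
    key = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + struct.pack("!BHH", proto, sport, dport)
    )
    digest = hashlib.sha256(key).digest()
    return low + int.from_bytes(digest[:4], "big") % (high - low + 1)


if __name__ == "__main__":
    print(flow_source_port("192.0.2.1", "198.51.100.7", 6, 40000, 443))
    print(flow_source_port("192.0.2.1", "198.51.100.7", 6, 40001, 443))
```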
> Note that when IPv6 is used as the outer IP encapsulation layer, the
> ITE must insert an IPv6 fragment header with an Identification value
> set as described in Section 4.3.6.
Middle boxes are not supposed to fragment IPv6 packets en route. There
should be no reason to do so, as all IPv6 nodes support PMTUD.
I am somewhat concerned about the descriptions of SEAL probes and
acknowledgments. Has any analysis or experimentation been done to
determine how much management (or other non-data) traffic will result
in a "typical" SEAL deployment?
> For tunnels specifically designed for the traversal of Network
> Address Translators (NATs) (e.g., Teredo [RFC4380]) and other
> middleboxes that may rewrite the outer IP ID field, the ITE instead
> writes least significant bits of the SEAL_ID in the ID field of the
> SEAL header and writes a random value in the Identification field in
> the outer IP header.
How is SEAL expected to detect if one of these boxes exists in the path?
> Since the ID field in the SEAL header is only
> 16 bits, however, the ITE must limit the rate at which it sends
> packets to avoid wrapping the ID field. Alternatively, the ITE and
> ETE can use SEAL-FS to obtain a larger ID field in the SEAL header
> (see Section 5.3.6). In either case, both the ITE and ETE must be
> aware of the manner in which the SEAL_ID is inserted.
How are they aware of this? Through manual configuration of both ends?
> The ITE should specifically process raw ICMPv4 Protocol Unreachable
> messages and ICMPv6 Parameter Problem messages with Code
> "Unrecognized Next Header type encountered" as a hint that the ETE
> does not implement the SEAL protocol.
And do what?
>4.4.1. Reassembly Buffer Requirements
>
> ETEs must be capable of performing IP-layer reassembly for SEAL
> protocol IP packets up to 2KB in length, and must also be capable of
> performing SEAL-layer reassembly for mid-layer packets up to (2KB -
> OHLEN). Hence, ETEs:
>
> o MUST configure a reassembly buffer size (i.e., a SEAL Maximum
> Reassembly Unit (S_MRU)) of at least 2KB
This seems small to me... Most host stacks I am familiar with can
reassemble a 4K packet, and such packets are frequently used for
block-based file sharing protocols, such as NFS (or at least they were
a few years ago).
It is not clear to me why the SEAL spec essentially reinvents ICMP
over UDP. The motivation for this should be better explained.
The IANA Considerations section is missing a registry for new SEAL
version numbers.
>10. Security Considerations
>
> Unlike IPv4 fragmentation, overlapping fragment attacks are not
> possible due to the requirement that SEAL segments be non-
> overlapping.
Include some further explanation of how this is enforced.
> The SEAL header is sent in-the-clear (outside of any IPsec/ESP
> encapsulations) the same as for the outer IPv4/* headers. As for
> IPv6 extension headers, the SEAL header is protected only by L2
> integrity checks and is not covered under any L3 integrity checks.
Analysis is needed to explain why the SEAL header is not vulnerable
due to this lack of protection.
The security section needs to analyze the SEAL control protocol and
explore the possible security issues with attackers modifying or
fabricating SEAL control messages.
Also, the security section is missing coverage of the general
security issues with tunnels including the ability to inject packets
into a remote network, etc.
> 2. Classical path MTU discovery requires that
> routers generate a PTB message for *all* packets lost due to an MTU
> restriction; this situation is exacerbated at high data rates and
> becomes severe for in-the-network tunnels that service many
> communicating end systems.
Reference? It is my understanding that ICMP error messages are
always rate limited.
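For readers unfamiliar with what such rate limiting looks like, here is a token-bucket sketch of the kind commonly used to cap ICMP error generation. It is a generic illustration, not the behavior of any particular router or stack; the class name, rate, and burst values are hypothetical, and the caller supplies the clock so the example is deterministic.

```python
class TokenBucket:
    """Allow at most `burst` messages at once, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if an error message may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = TokenBucket(rate_per_s=10, burst=2)
    # Three oversized packets arrive at the same instant: the third
    # gets no PTB, so not *all* drops produce an error message.
    print([limiter.allow(0.0) for _ in range(3)])
```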
> 5. Using SEAL, ETEs encapsulate ICMP error messages in an outer
> UDP/IP header such that packet-filtering network middleboxes will
> not filter them the same as for "raw" ICMP messages that may be
> generated by an attacker.
Most firewalls and filtering middle boxes will drop inbound messages
that don't correspond to recent outbound packets and/or all UDP
traffic that is not to an allowed port. So using UDP isn't a magic
bullet to get these messages through firewalls and NATs.
> When the SEAL-TE ITE has knowledge that it will traverse a subnetwork
> with non-negligible loss due to, e.g., interference, link errors,
> congestion, etc., it can solicit Reassembly Reports from the ETE
> periodically to discover missing segments for retransmission within a
> single round-trip time. However, retransmission of missing segments
> may require the ITE to maintain considerable state and may also
> result in considerable delay variance and packet reordering.
Given that the source will also be retransmitting, it seems like
having SEAL endpoints retransmit fragments could be a bad idea,
especially in the presence of congestion.
> Each link in the path over which a SEAL-TE tunnel is configured is
> responsible for verifying the integrity of packets that traverse the
> link. Typical links employ strong integrity checks for packet sizes
> that are no larger than the link MTU. Therefore, as long as the
> packet sizes supported by the underlying link are not violated, the
> SEAL-TE tunnel can reasonably expect that each SEAL segment will be
> correctly verified by the underlying link integrity checks.
>
> The SEAL-TE tunnel therefore need only concern itself with packet-
> splicing errors that can occur due to reassembly misassociations,
> i.e., when a segment from packet X is reassembled with segments from
> packet Y. The primary sources of such errors include software bugs
> and wrapping IP ID fields. Given that IP fragmentation and
> reassembly implementations are well-tested, and that SEAL
> segmentation and reassembly entails a very simplistic procedure, only
> the latter scenario bears further mention.
This isn't actually correct. According to research I've read (e.g. the
Stone-Partridge paper on this subject), there is a significant amount of
end-to-end packet corruption that occurs _between_ the links, in the
routers themselves. Because the link integrity check is performed on
the inbound interface and regenerated on the outbound interface, errors
that occur inside the router (from bus errors, or whatever) are not
detected and pass to the remote end. Research and recent discussion on
the LISP list would tend to indicate that these errors happen at a
rate on the order of 10^3 to 10^4 when traffic transits a multi-link
network.
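The mechanism described here can be demonstrated with a toy model: each link checks a CRC on receipt and the router regenerates it on transmit, so a bit flipped inside the router is covered by a fresh, valid link CRC and is caught only by an end-to-end check. All function names are hypothetical, and CRC-32 from zlib stands in for both the link check and the end-to-end check.

```python
import zlib


def link_send(payload):
    """Frame a payload with a link-layer CRC."""
    return payload, zlib.crc32(payload)


def link_receive(frame):
    """Verify the link-layer CRC and return the payload."""
    payload, crc = frame
    if zlib.crc32(payload) != crc:
        raise ValueError("link CRC failure")
    return payload


def router(frame, corrupt=False):
    """Check the inbound CRC, optionally damage the packet, re-frame it."""
    payload = link_receive(frame)  # inbound link check passes
    if corrupt:
        # Single bit flip inside the router (e.g., a bus or memory error).
        payload = bytes([payload[0] ^ 0x01]) + payload[1:]
    return link_send(payload)  # CRC regenerated over the damaged bytes


original = b"segment from packet X"
end_to_end = zlib.crc32(original)

# Both link checks succeed even though the router damaged the packet.
delivered = link_receive(router(link_send(original), corrupt=True))

assert delivered != original
assert zlib.crc32(delivered) != end_to_end  # only the end-to-end check notices
```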
I have a general question about the appendices. Are they intended
to be normative or merely informative? This should be clarified in the
specification.
>Appendix D. Historic Evolution of PMTUD
>
> (Taken from "Neighbor Affiliation Protocol for IPv6-over-(foo)-over-
> IPv4"; written 10/30/2002):
Why not just reference this document instead of quoting it?