[tcpm] Rewriting MSS option for NAT64
Hi,
In the BEHAVE wg we're working on NAT64, a way to make IPv6 clients
talk to IPv4 servers through a translator. It's a lot like NAT-PT but
with many of the issues better addressed. See:
http://tools.ietf.org/html/draft-bagnulo-behave-nat64-03
I've been working on text about packet sizes and fragmentation (see
the text at the end of the message for context), and Lars asked me to
ask for your input on this part:
The TCP MSS option [RFC 793] is used during the three-way handshake
by the two hosts involved to inform each other about the maximum TCP
segment size (assuming IP and TCP headers without options) that the
host can receive.
In practice, the MSS option is often used to make TCP work in the
presence of broken path MTU discovery.
To avoid unnecessary path MTU discovery cycles, a NAT64 SHOULD
rewrite the MSS option in SYN packets to the minimum of the original
MSS option, the NAT64's MTU on the IPv6 side - 60 and the NAT64's
MTU on the IPv4 side - 40. This applies to SYNs in both the IPv4-to-
IPv6 direction and the IPv6-to-IPv4 direction.
Since this is already very widely deployed in boxes that do stuff like
PPPoE that reduces the MTU on access networks, I'm assuming there is
no problem with this, especially since we're putting this into a
translator that breaks authentication etc anyway.
Iljitsch
Packet sizes
It's the job of the network layer to adapt to different maximum packet
sizes as packets move through the network. There are three mechanisms
that handle this: transport layer negotiations such as the TCP MSS
option, path MTU discovery and fragmentation. The difference between
the IPv4 and IPv6 header sizes requires some handling in a NAT64
translator, and there are complications because of the differences
between how IPv4 and IPv6 handle fragmentation, as well as the issue
of how to demultiplex fragmented IPv4 packets.
There are two approaches to path MTU discovery and fragmentation when
translating from IPv6 to IPv4:
1. Set DF to 0 in the translated packets. This avoids path MTU
discovery issues but leads to significant numbers of fragments.
2. Set DF to 1 in the translated packets. This supports path MTU
discovery on the IPv4 side so unnecessary fragments are avoided, but
it doesn't address the issue that IPv6 hosts are not required to
perform PMTUD when sending packets of 1280 bytes or smaller.
The choice made in this document is to support option 1 for packets
up to 1280 bytes, and option 2 for packets larger than 1280 bytes.
A NAT64 translator MUST have an MTU of at least 1280 on all of its
interfaces, both IPv4 and IPv6 interfaces.
TCP MSS option
The TCP MSS option [RFC 793] is used during the three-way handshake by
the two hosts involved to inform each other about the maximum TCP
segment size (assuming IP and TCP headers without options) that the
host can receive.
In practice, the MSS option is often used to make TCP work in the
presence of broken path MTU discovery.
To avoid unnecessary path MTU discovery cycles, a NAT64 SHOULD rewrite
the MSS option in SYN packets to the minimum of the original MSS
option, the NAT64's MTU on the IPv6 side - 60 and the NAT64's MTU on
the IPv4 side - 40. This applies to SYNs in both the IPv4-to-IPv6
direction and the IPv6-to-IPv4 direction.
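
The rewrite rule above can be sketched as follows. This is only an
illustration; the function and parameter names are made up for this
example and do not appear in the draft.

```python
def rewrite_mss(original_mss: int, mtu_ipv6: int, mtu_ipv4: int) -> int:
    """Return the MSS value to place in a translated SYN.

    60 = 40-byte IPv6 header + 20-byte TCP header (no options)
    40 = 20-byte IPv4 header + 20-byte TCP header (no options)
    """
    return min(original_mss, mtu_ipv6 - 60, mtu_ipv4 - 40)


if __name__ == "__main__":
    # Both sides at 1500: the IPv6 side is the limiting factor.
    print(rewrite_mss(1460, 1500, 1500))
    # A host that already advertises a small MSS is left alone.
    print(rewrite_mss(536, 1500, 1500))
```

Since the result is a minimum that includes the original value, the
rewrite can only lower the advertised MSS, never raise it.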
Path MTU discovery
The vast majority of both IPv4 and IPv6 hosts use path MTU discovery
[RFC 1191] [RFC 1981]. With IPv4, PMTUD can be enabled on a per-packet
basis by setting the DF bit to 1. With IPv6, there is no need for
PMTUD for packets up to 1280 bytes because all IPv6 hosts are required
to be able to receive 1280-byte packets without fragmentation. When
sending larger packets, IPv6 hosts implicitly use PMTUD.
IPv6-to-IPv4
If the NAT64 has the same MTUs on its IPv6 and IPv4 interfaces, it
will never have to generate "packet too big" messages for incoming
IPv6 packets because the translation from IPv6 to IPv4 reduces the
packet size by 20 bytes, more if the IPv6 packet has extension headers
that are removed during the translation, such as the fragment header.
If the MTU on the IPv6 side is larger than 1280 bytes and more than 20
bytes larger than the MTU on the IPv4 side, the NAT64 MUST generate
the appropriate "packet too big" messages on the IPv6 side.
To support PMTUD, for translated packets that are larger than 1260
bytes on the IPv4 side (1280 bytes IPv6 packets with 20 byte size
reduction through the translation), the DF bit is set to 1 in the
resulting IPv4 packet.
IPv4 routers may generate "packet too big" messages indicating a
supported MTU size smaller than 1280 bytes. In those cases, the IPv6
hosts will continue to send packets larger than what the IPv4 path MTU
can support. To allow packets to be delivered successfully in this
case, the DF bit is set to 0 in all translated packets smaller than or
equal to 1260 bytes, to allow these packets to be fragmented in the
IPv4 network.
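
As a sketch of the DF rule for the IPv6-to-IPv4 direction, assuming
the size is measured on the IPv4 side after translation (the names
below are illustrative, not from the draft):

```python
IPV6_MIN_MTU = 1280
HEADER_SIZE_DIFFERENCE = 20  # 40-byte IPv6 header minus 20-byte IPv4 header
DF_THRESHOLD = IPV6_MIN_MTU - HEADER_SIZE_DIFFERENCE  # 1260 bytes


def df_bit_for_translated_packet(ipv4_packet_size: int) -> int:
    """Return the DF bit for an IPv4 packet produced by translation.

    Packets larger than 1260 bytes get DF=1 so that PMTUD works on the
    IPv4 side; packets of 1260 bytes or less get DF=0 so that IPv4
    routers may fragment them.
    """
    return 1 if ipv4_packet_size > DF_THRESHOLD else 0
```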
Note: it is highly recommended for IPv4 hosts running services that
may be used by IPv6 clients through a NAT64 translator to use an MTU
size of at least 1260 bytes and to properly generate "packet too big"
messages.
When a NAT64 translates "packet too big" messages from IPv6 to IPv4,
it adjusts the advertised MTU to the minimum of the original
advertised MTU + 20, the NAT64's MTU on the IPv6 side + 20 and the
NAT64's MTU on the IPv4 side.
IPv4-to-IPv6
Because it may be necessary to include a fragmentation header or other
extension header, the NAT64 MUST be prepared to generate "packet too
big" messages for packets with the DF bit set to 1 received from the
IPv4 side, regardless of the MTU sizes on the IPv4 and IPv6
interfaces. If the packet is larger than can be transmitted on the
IPv6 side after translation, the NAT64 returns a "packet too big"
message indicating the maximum IPv4 packet size that would be
supported using the same translation as the current packet. This can
be calculated as IPv4-packet-size - (IPv6-packet-size - IPv6-total-
length) + 20.
When a NAT64 translates "packet too big" messages from IPv4 to IPv6,
it adjusts the advertised MTU to the minimum of the original
advertised MTU - 20, the NAT64's MTU on the IPv6 side and the NAT64's
MTU on the IPv4 side - 20. However, if the advertised MTU in "packet
too big" messages is smaller than 1260 bytes, the value put into the
translated "packet too big" message is 1280. This makes sure that the
IPv6 host will limit its packet sizes to 1280 bytes, so its packets
are subsequently translated into IPv4 packets with DF set to 0. (This
deviates from [RFC 2765].)
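
The following sketch transcribes the IPv4-to-IPv6 "packet too big"
adjustment exactly as worded in the paragraph above. The function and
parameter names are assumptions made for this example.

```python
def translate_ptb_mtu_ipv4_to_ipv6(advertised_mtu: int,
                                   mtu_ipv6: int,
                                   mtu_ipv4: int) -> int:
    """Return the MTU to put into the translated "packet too big"."""
    if advertised_mtu < 1260:
        # Pin the IPv6 host at 1280 bytes, so that its packets are
        # translated into IPv4 packets with DF set to 0.
        return 1280
    return min(advertised_mtu - 20, mtu_ipv6, mtu_ipv4 - 20)
```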
Fragmentation
Because NAT deviates from normal router behavior, the limitation that
IPv6 packets or IPv4 packets with DF set to 1 are not fragmented by
routers doesn't apply to a NAT64 translator. Where appropriate, these
packets are fragmented after translation as described below.
Demultiplexing
Because NAT64 provides a stateful many-to-one (perhaps even many-to-
many) translation, it is necessary to recognize which session a given
packet belongs to. For this, the TCP or UDP port numbers must be
known, but these only occur in the first fragment of a fragmented
packet. There are two possible ways to deal with this:
1. Reassemble the packet before translating it.
2. Create translation state for the fragments belonging to the same
packet so each fragment can be translated.
Strategy 2 is attractive in large installations because it requires
less storage and processing. However, it may still be necessary to
buffer fragments for some time, as the fragment containing the first
part of the packet (and with that, the port numbers) may not be the
first one to arrive.
Note: based on the assumptions that hosts generate fragments in-order
and that reordering must happen through parallel network links and
that the path between these parallel links and a NAT64 supports speeds
of at least 10 Mbps, there is a very high probability that two out-of-
order fragments making up a packet will arrive at the NAT64 within 50
to 100 milliseconds. Further assuming that fragmented traffic makes up
less than 10% of all traffic, this only requires a buffer of 6 to
12,500 fragments (50 ms at 10 Mbps to 100 ms at 10 Gbps).
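
The arithmetic behind these figures can be reproduced as below. Note
that the fragment size of 1000 bytes is an assumption: the text does
not state it, but it is the value that makes the quoted numbers come
out. The function name is illustrative.

```python
def fragment_buffer_size(link_bps: int, window_ms: int,
                         fragmented_percent: int = 10,
                         fragment_bytes: int = 1000) -> int:
    """Number of fragments arriving within the reordering window.

    fragment_bytes=1000 is an assumed average fragment size.
    """
    # Bits of fragmented traffic that arrive during the window.
    fragmented_bits = link_bps * window_ms * fragmented_percent // (1000 * 100)
    return fragmented_bits // (8 * fragment_bytes)


if __name__ == "__main__":
    print(fragment_buffer_size(10_000_000, 50))       # 50 ms at 10 Mbps
    print(fragment_buffer_size(10_000_000_000, 100))  # 100 ms at 10 Gbps
```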
In some cases, especially in the IPv6-to-IPv4 direction, there may
only be a single session matching the fragment's source and
destination addresses and protocol number. In these cases, it would be
possible to translate the fragments out-of-order. A NAT64 translator
MAY do this for TCP; however, it MUST NOT translate UDP packets before
the first fragment is available. The reason for this is that the
fragment could be part of a packet setting up a new session. With
TCP, however, session establishment packets don't carry data, so it's
extremely unlikely that they are fragmented. This is not the case with
UDP, and in the IPv4-to-IPv6 direction, a UDP packet may have a zero
checksum, which must be recalculated when translating to IPv6, for
which the entire packet must be available.
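
A minimal sketch of this decision, assuming the translator has already
counted the sessions that match the fragment's source address,
destination address and protocol number (the names are hypothetical):

```python
def may_translate_before_first_fragment(protocol: str,
                                        matching_sessions: int) -> bool:
    """May a non-first fragment be translated before the first arrives?"""
    if protocol == "udp":
        # MUST NOT: the fragment could belong to a packet that sets up
        # a new session, and a zero checksum needs the whole packet.
        return False
    # MAY for TCP, but only when the session is unambiguous.
    return protocol == "tcp" and matching_sessions == 1
```

In all other cases the fragment has to be buffered until the fragment
carrying the port numbers shows up.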
IPv6-to-IPv4
For all IPv4 packets that the NAT64 creates through translation, the
translator generates an ID value. This applies to all packets,
regardless of their size or the value of the DF field. A NAT64
translator MAY employ strategies to avoid reusing an ID value for a
certain source, destination, protocol tuple as long as possible. If
the IPv4 packets are fragments of an IPv6 packet, then state is
created that makes it possible for all the fragments to have the same
ID value on the IPv4 side.
[RFC 2765] specifies copying the lower bits from the IPv6 ID field in
a fragment header (if present) to the IPv4 ID field, but this runs the
risk of two IPv6 hosts talking to the same IPv4 destination through
the NAT64 using the same ID value.
Otherwise, when translating IPv6 packets with a fragmentation header,
the fragments are translated as per [RFC 2765].
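
One possible way to generate ID values is sketched below. It assumes a
plain 16-bit counter per (source, destination, protocol) tuple, which
is just one strategy for delaying reuse; the draft does not prescribe
any. The class and method names are made up, and state expiry is left
out.

```python
from typing import Dict, Tuple

FlowKey = Tuple[str, str, int]      # IPv4 source, destination, protocol
FragmentKey = Tuple[str, str, int]  # IPv6 source, destination, fragment ID


class Ipv4IdAllocator:
    def __init__(self) -> None:
        self._next_id: Dict[FlowKey, int] = {}
        self._fragment_ids: Dict[FragmentKey, int] = {}

    def new_id(self, src4: str, dst4: str, protocol: int) -> int:
        """Hand out IDs sequentially, so a value is reused only after
        all 65536 values for this tuple have been used."""
        key = (src4, dst4, protocol)
        value = self._next_id.get(key, 0)
        self._next_id[key] = (value + 1) & 0xFFFF
        return value

    def id_for_fragment(self, src6: str, dst6: str, ipv6_fragment_id: int,
                        src4: str, dst4: str, protocol: int) -> int:
        """Give all fragments of one IPv6 packet the same IPv4 ID."""
        key = (src6, dst6, ipv6_fragment_id)
        if key not in self._fragment_ids:
            self._fragment_ids[key] = self.new_id(src4, dst4, protocol)
        return self._fragment_ids[key]
```

Because the fragment state is keyed on the IPv6 source address, two
IPv6 hosts that happen to pick the same IPv6 ID end up with different
ID values on the IPv4 side.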
IPv4-to-IPv6
Because packets coming in on the IPv4 side may be larger than 1280
bytes after translation, a NAT64 MUST implement PMTUD on the IPv6
side. In other words, it must react to "packet too big" messages for
any IPv6 destination that it communicates with by limiting the size of
the packets that it sends to the advertised maximum.
In the case where, after translation from IPv4 to IPv6, a packet is
larger than a destination's PMTU, the NAT64 returns a "packet too big"
as outlined earlier in the case that the DF bit was set to 1 in the
IPv4 packet. If the DF bit was set to 0, the translator first
translates the IPv4 packet, and then fragments the resulting IPv6
packets using normal IPv6 fragmentation rules. The value in the ID
field is generated locally by the NAT64. If the IPv4 packet was a
fragment, state is created that allows the same ID value to be used
for all IPv6 packets or fragments that are part of the same original
IPv4 packet.
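
The handling of oversized packets in this direction comes down to a
three-way choice, sketched here with illustrative names. The size is
assumed to be that of the packet after translation to IPv6.

```python
def ipv4_to_ipv6_action(translated_size: int, path_mtu: int, df: int) -> str:
    """Decide what to do with a packet translated from IPv4 to IPv6."""
    if translated_size <= path_mtu:
        return "forward"
    if df == 1:
        # The sender asked not to fragment: report the problem instead.
        return "packet-too-big"
    # DF was 0: translate first, then apply IPv6 fragmentation.
    return "fragment"
```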