Re: HEX-UTF-8 vs. Unicode-escapes (was Re: [EAI] Re: utf-8-address syntax: ...)




Charles Lindsey wrote on 2/25/07 14:59 +0000:
On Sat, 24 Feb 2007 05:14:21 -0000, Chris Newman <Chris.Newman at Sun.COM>
wrote:
Frank Ellermann wrote on 2/23/07 14:11 +0100:
The input is already valid (one hopes) raw UTF-8,
why not simply use RFC 3987 style for it ?  I.e.
"percent-encoded" everywhere, not only for some
forbidden ASCII octets (CTL, SP, %, +, and =).

Or, for that matter, why not just use <xtext>? That would be no more of a mess than RFC 3461 already made of it. There is nothing in RFC 3461 that prevents the use of <xtext> for octets >=128, except that they didn't anticipate it. Existing software (and humans) that understand the addr-type rfc822 would then immediately understand addr-type utf-8-enc. <xtext> ain't (that) broken. Why fix it? There are already too many ways of encoding 8 bits into 7. Why invent yet another one?

Good question. I would have preferred to use xtext, but this text from RFC 3461 makes that not possible without too much risk of breakage:


  Due to limitations in the Delivery Status Notification format, the
  value of the original recipient address prior to encoding as "xtext"
  MUST consist entirely of printable (graphic and white space)
  characters from the US-ASCII [4] repertoire.  If an addr-type is
  defined for addresses which use characters outside of this
  repertoire, the specification for that addr-type MUST define the
  means of encoding those addresses in printable US-ASCII characters
  when are then encoded as xtext.

The problem is that xtext is a transfer encoding: it is removed when a traditional message/delivery-status part is generated, and today's deployed MTAs have a hard requirement that the result of removing xtext be 7-bit ASCII. So the choice is either to produce a non-xtext encoding for all the 8-bit characters and have two encodings present, or to design an encoding that obviates the need for xtext. The latter seemed cleaner to me, although the working group might want to consider using xtext for the 7-bit characters and something else for non-ASCII.
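To illustrate the constraint, here is a minimal Python sketch of xtext as RFC 3461 defines it (printable characters from "!" to "~" pass through, except "+" and "=", and every other octet becomes "+" followed by two uppercase hex digits). The function names and the example addresses are hypothetical, and the printable-ASCII check is only an approximation of the RFC's "graphic and white space" wording. It is not taken from any deployed MTA.

```python
def xtext_encode(data: bytes) -> str:
    """Encode octets as xtext: "!".."~" pass through, except "+" and "="."""
    out = []
    for b in data:
        if 33 <= b <= 126 and b not in (0x2B, 0x3D):  # 0x2B is "+", 0x3D is "="
            out.append(chr(b))
        else:
            out.append("+%02X" % b)  # "+" followed by two uppercase hex digits
    return "".join(out)


def xtext_decode(text: str) -> bytes:
    """Reverse xtext_encode. Sketch only: malformed input raises ValueError."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "+":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def is_printable_ascii(data: bytes) -> bool:
    """Approximate the RFC 3461 requirement as space (32) through "~" (126)."""
    return all(32 <= b <= 126 for b in data)


# Hypothetical addresses, for illustration only.
ascii_addr = b"a+b@example.com"
utf8_addr = "jos\u00e9@example.com".encode("utf-8")

for addr in (ascii_addr, utf8_addr):
    encoded = xtext_encode(addr)
    decoded = xtext_decode(encoded)
    print(encoded, is_printable_ascii(decoded))
```

Applying xtext directly to raw UTF-8 round-trips without loss, but the octets left after xtext removal are 8-bit, which is exactly what the quoted RFC 3461 text forbids in a traditional message/delivery-status part.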

So xtext was badly botched.

               - Chris


_______________________________________________
IMA mailing list
IMA at ietf.org
https://www1.ietf.org/mailman/listinfo/ima



