Re: [EAI] I-D Action:draft-ietf-eai-mailto-00.txt




Martin Duerst wrote:

>> Oops, I thought the idea was to finish mailto-bis *before*
>> tackling mailto-eai.
 
> Well, yes in theory, but in practice, it makes quite a bit
> of sense to have these activities overlap. See below for
> an example.

Maybe, but then let's please pick this list to hash out the
details; it is too confusing if it involves three lists (EAI,
URI, RFC822).

Here's how I see the mailto mess:

RFC 1738 had a very simple mailto: concept based on RFC 822
<rfc822-addr-spec> given as  encoded822addr = 1*xchar

<xchar> used to be <unreserved> or <reserved> or <escape>.
<escape> is what is today known as STD 66 <pct-encoded>.

Figuring out what <unreserved> and <reserved> were, and what
they are today, can be difficult.  But ",", ";", "?", "&", and
"=" were allowed in RFC 1738 mailto URIs.

RFC 2368 added a query part, introduced by "?", made of
"&"-separated name "=" value pairs.  So we can forget about
RFC 1738, noting only that "," and ";" are still allowed.

RFC 2368 also replaced <addr-spec> by <mailbox>.  But that
was an error, and mailto-bis will fix it.  

RFC 2368 used the infamous #-rule to form comma separated
lists of <mailbox>es.  For an 822-parser the #-rule is okay,
comma-separated optional elements with comments and folding
white space.  For a 2822 parser it's a nightmare (2822upd
needed a major change to emulate the #-rule as obs-cenity).

For a URI parser the #-rule with <mailbox> is FUBAR, and
mailto-bis tries to fix it.  But so far it does not: it uses
%2C instead of the "," (allowed since RFC 1738) to separate
<addr-spec> elements.  [ISSUE #1]

RFC 2368 states that "?", "&", and "=" are reserved.  That
is somewhat odd, as "&" and "=" only need to be reserved 
after the "?".  And "?" only needs to be reserved before
the query part (i.e. in the list of mailto addresses).  

Let's assume that was a simplification in RFC 2368, using
the same rules on both sides of the "?".  But mailto-bis
claims that ";" has to be percent-encoded, in addition to
"&" and "=".  It was always okay to use semicolon "as is",
so I guess that's wrong in 2c1.  [ISSUE #2]
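
To make the split concrete, here is a minimal Python sketch of
the reading argued above: "?" only ends the address list, "&"
and "=" only matter after it, "," separates addresses, and ";"
passes through untouched.  The function name parse_mailto is
made up for illustration; this is not code from any draft.

```python
from urllib.parse import unquote

def parse_mailto(uri):
    """Split a mailto URI into addresses and (hname, hvalue) pairs."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    # The first "?" ends the address list; anything after it is query.
    to_part, _, query = rest.partition("?")
    # Split on the literal "," before decoding, so a %2C stays
    # inside the address it belongs to.
    addresses = [unquote(a) for a in to_part.split(",") if a]
    headers = []
    for pair in query.split("&"):
        if not pair:
            continue
        # Only the first "=" separates hname from hvalue.
        name, _, value = pair.partition("=")
        headers.append((unquote(name), unquote(value)))
    return addresses, headers
```

With this reading, mailto:a@example.com,b@example.com?subject=x;y
yields two addresses and one subject with the value "x;y".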

2c2 states that <NO-WS-CTL> and <obs-local-part> are not
allowed, it should also state that <obs-domain> is not
allowed.  [ISSUE #3]  Using 2822upd instead of RFC 2822
would be clearer, you could then simply decree that all
2822upd obs-cenities are verboten.  You won't miss a bit
of these obscenities, guaranteed, the rest of mailto-bis
is interesting enough without this cruft.

2c3 is fine for the <domain> part and wrt comments also
in the <local-part> part.  IMO you can't say that white
space is generally not allowed in <local-part>, it only
has to be percent-encoded when found in <quoted-string>.
[ISSUE #4]
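
As an illustration of the <quoted-string> case, a small Python
sketch that percent-encodes a local part containing a space.
The helper name encode_addr_spec is hypothetical, and encoding
everything outside the unreserved set is a simplifying
assumption, not what any draft requires.

```python
from urllib.parse import quote

def encode_addr_spec(local_part, domain):
    # safe="" encodes everything except letters, digits and "_.-~",
    # so the DQUOTEs and the space of a quoted-string are escaped.
    return quote(local_part, safe="") + "@" + domain

print(encode_addr_spec('"john doe"', "example.com"))
```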

2c4 should go to an I18N section about mailto IRIs.  For
an ordinary URI consumer a percent-encoded <domain> with
UTF8-non-ascii makes no sense, and where it makes sense
it belongs to mailto-eai, not mailto-bis.  [ISSUE #5]

URI producers MUST NOT use percent-encoded UTF8-non-ascii
in a <local-part> or <domain> of a mailto URI.  There is
no such thing as a domain with UTF8-non-ascii, domains in
2822upd e-mail are limited to A-labels separated by dots.
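
For the <domain> side, a producer can convert to A-labels before
building the URI.  A minimal sketch, assuming Python's built-in
"idna" codec (IDNA 2003) is good enough for the purpose; the
function name is made up.

```python
def to_a_label_domain(domain):
    # The built-in "idna" codec applies ToASCII to each label.
    return domain.encode("idna").decode("ascii")

print(to_a_label_domain("b\u00fccher.example"))
```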

2c5 for the <local-part> has it right, percent-encoded
UTF8-non-ascii is reserved for mailto-EAI.  Simply shift
2c5 to the I18N section as well, with references to RFC 3987
and EAI for the details.  As is, 2c4 and 2c5 are confusing
for the purpose of explaining non-EAI mailto URLs.

The section about "body" doesn't mention that the charset
is unclear, actually a bug in RFC 2368.  Just saying that
2368 > 2277 implicitly means UTF-8 is shaky, at least it
has to be explicitly spelled out, and also noted in the
2368-diff.  [ISSUE #6]  Besides I fear that "body" is an
interoperability nightmare and poorly supported, so there
should be a warning that anything with more than a single
US ASCII line might fail miserably.

The "to" idea (to= parameter) in RFC 2368 was always odd,
it won't work for RFC 1738 clients ignoring a query part,
it's redundant wrt the purpose of mailto without to=, and
the example mailto:?to=... IMO violates the STD 66 syntax.
[ISSUE #7]  The NOT RECOMMENDED should be about using to=
at all, using only to= isn't possible under STD 66 rules.

I don't see why using the same header field name more than
once is allowed, with tons of prose explaining why it most
likely won't fly.  Just say MUST NOT and be done with it.

RFC 2368 didn't say that this is okay, and for most plausible
header fields in a mailto URL RFC 2822 allows at most one
occurrence.  [ISSUE #8]  Keywords are unlimited in 2822upd,
but I've never seen an e-mail with multiple keywords, let
alone a mailto URL using multiple keywords.  Netnews does
not allow multiple keywords.
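
A consumer enforcing such a MUST NOT would only need a check
along these lines.  This is a rough Python sketch with a made-up
function name; it compares hnames case-insensitively because
header field names are case-insensitive.

```python
from collections import Counter

def repeated_hnames(query):
    """Return the hnames that occur more than once in a query part."""
    names = [pair.partition("=")[0].lower()
             for pair in query.split("&") if pair]
    return sorted(n for n, count in Counter(names).items() if count > 1)

print(repeated_hnames("subject=a&Subject=b&cc=x@example.com"))
```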

RFC 2368 says Bcc is a bad idea, mailto-bis claims that it
is okay.  RFC 2368 got it right, Cc is good enough in URLs.
[ISSUE #9]  RFC 2368 gives more examples of bad ideas, but
the mailto-bis chapter about unsafe header fields gives not
a single example.  It also doesn't mention Reply-To as a "new"
known good idea (though an example later explains it).

Chapter 6 should be renamed to I18N considerations, adding
2c4 and 2c5 wholesale with references to RFC 3987, EAI, and
BCP 18 as mentioned above (issues #5 and #6).

Percent encoding UTF-8 as outlined in the "non-ascii" part
of chapter 2 (2nd point in the 2nd list) doesn't help, only
RFC 2047 can be expected to work (outside of EAI) with who
knows what legacy MUA using its very own idea of a "local"
charset.  RFC 2368 got this right:

| 8-bit characters in mailto URLs are forbidden. MIME 
| encoded words (as defined in [RFC2047]) are permitted
| in header values, but not for any part of a "body" hname.

There is no such thing as a mail header field using UTF-8
outside of EAI (message/global).  And mailto-bis prepares
a message/rfc822, not a message/global.  [ISSUE #10]  Only
for body= it's arguably possible to permit UTF-8, based on
the shaky 2368 > 2277 theory, IOW assuming an RFC 2368 bug.

Chapter 7.3 uses UTF-8 in header fields, that belongs into
mailto-eai (message/global), not mailto-bis.  The examples
in 7.3 using RFC 2047 are okay for a message/rfc822 header.
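
For illustration, the RFC 2047 route for a non-ASCII Subject
can be sketched in Python as below: build the encoded word
first, then percent-encode it for the URI, so the result stays
pure US-ASCII.  The function name subject_for_mailto is
hypothetical, and the sketch assumes a value short enough that
the encoded word is not folded.

```python
from email.header import Header, decode_header
from urllib.parse import quote, unquote

def subject_for_mailto(text):
    # RFC 2047 encoded word, e.g. =?utf-8?b?...?=
    encoded_word = Header(text, "utf-8").encode()
    # "=" and "?" are significant in the query part, so escape them too.
    return quote(encoded_word, safe="")

uri = "mailto:user@example.com?subject=" + subject_for_mailto("Gr\u00fc\u00dfe")

# Round trip: undo the percent-encoding, then decode the encoded word.
raw = unquote(uri.split("subject=")[1])
data, charset = decode_header(raw)[0]
print(data.decode(charset))
```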

> we still can use the time to move ahead and tease out issues.

Okay, ten issues, I hope I didn't forget anything already
reported for mailto-bis-04 and before.

> When drafting mailto-eai, I was looking at the utf8headers
> draft and decided that in order to get a syntax definition
> for the <internationalized <fallback>> construct, the best
> thing to pick was <mailbox>.

IMHO mailto-eai should not be completely different from the
old RFC 1738 idea of <addr-spec>, reinforced in mailto-bis and
reflecting common practice, no matter what RFC 2368 says:
<mailbox> was a mistake, as Paul, one of the authors, said.

> The first alternative might be to use
>    internationalized <fallback>
> or some such, but that might not fly because it would
> just be interpreted as
>    display_name <email_address>

Point.  After the long rant about legacy MUAs, and why they
won't get UTF-8 right, I cannot suddenly assume that they
would understand EAI syntax.  In fact EAI addresses cannot
work at all with legacy MUAs, unless the MUA is so stupid
that it never notices that it is trying something "unusual",
delegating all trivial error handling to the MSA <shudder />

But I still don't want the <mailbox> in utf8-headers, for
nested angle brackets you need something like...

  "<" utf8-addr-spec [ alt-address ] ">" 

...minus 2822 obs-cenities, and with a single space instead
of FWS, let alone 1*FWS.

 Frank

_______________________________________________
IMA mailing list
IMA at ietf.org
http://www.ietf.org/mailman/listinfo/ima


