[EAI] Thinking about requirements / downgrade
I'm going to take a couple steps back. Mostly I'm focusing on the local part of the address, and I think there's a solution to get us unstuck.
A lot of background. At a high level there are really only a few requirements:
* We need Unicode addresses (That's the point of EAI after all :)
* Many people still need a user friendly ASCII address, for the English side of a Japanese business card if nothing else.
* Some sort of "downgrade" must exist for backwards compatibility. I'm being liberal with the term. "Downgrade" could be a user trying Unicode and then retrying with ASCII if necessary, or just giving out ASCII because they know EAI won't work in some scenario.
1) We need Unicode addresses.
The Unicode address is solved reasonably well by UTF-8 and the existing RFCs. UTF-8 also solves a myriad of problems regarding code pages in the rest of the message, but that's not specifically an EAI issue.
2) Friendly ASCII addresses.
Friendly ASCII addresses are pretty much a non-issue since mailbox aliases are a common mail feature. (I get mail to shawnste & Shawn.Steele) I make the small assumption that aliasing would be extended to Unicode and that aliases will be more common in an EAI world. This probably primarily impacts mailbox administration tools.
3) Downgrade
So that leaves downgrade. Downgrade is necessary for a "transition" period, which is probably many years. Note that there's no technical requirement that downgrade use the friendly ASCII address mechanism. It is also worth mentioning that downgrade necessarily requires an EAI aware system (even if it's "just" human downgrade, the human has to be aware of the issue) to do the downgrade.
There are also various degrees of downgrade:
* Human-only downgrade with no automated mechanism. If my mail bounces I try a different address. Some big problems are that I may not know the other address, I may not know how to send From: an ASCII address, and some systems may accept mail they can't reply to.
* Partially automated downgrade. This could be something like From:, or the older headers or something else that doesn't provide the same experience in a mixed environment as in a legacy or EAI only environment. Eg: simple mail may work, but DLs (distribution lists), newsgroups, reply-to, or other cases could get tricky or fail.
Some big problems with partial downgrade are that it is partial, so users may think mail works, but get unexpected failures in edge cases. Also (depending on the degree of "partial"), it may fail if the fallback address(es) aren't known or configured correctly. Also some mail may appear to succeed but may not be able to be replied to.
* Fully automated downgrade. This would be some magic system that would downgrade all headers so that "everything" worked, including mailing lists, etc. In addition to the problem of discovering the downgrade address, we've run into numerous technical concerns and edge cases with the solutions investigated so far.
I've been flip-flopping on how much downgrade is necessary.
Human-only downgrade:
I think it's possible that a human-only mechanism may "work", although that may slow EAI adoption significantly. (If I don't think my EAI address'll work, then I'll only give out my ASCII address, then why bother with EAI?)
Partially automated downgrade:
Personally I don't see much point in partially automated downgrade (folks from Exchange & Outlook agree.) If my mail to you works, and your replies work, but everything breaks when I send to a DL or newsgroup, then it's almost worse.
Fully automated downgrade:
Fully automated downgrade would be cool, but we've pretty much proved that it's impossible. Mostly because pairing the addresses breaks down in several cases, or because the syntax to make the pairing explicit breaks downlevel syntax.
A possible solution:
A while back Mark suggested a solution that's been discussed before and that I rejected out of hand. Either I've had a paradigm shift in my thinking or I misunderstood Mark's earlier suggestion, but there is a way that allows fully automated downgrade without breaking any of the other ASCII address alias or UTF8SMTP behavior. It also only requires updating the EAI aware servers/clients. Unaware systems wouldn't see anything different, probably even mailing lists. Upgrade is even supported.
Mark's suggestion (or my modification of it) is something like this:
* Everyone that wants one gets a Unicode address, and a human friendly ASCII alias (if desired).
* When people exchange addresses in written form, they can use Unicode if they think the recipient is EAI aware, or they can use the ASCII address.
* Mail with EAI to EAI and the Unicode address works as spec'd for UTF8SMTP, etc. No extra fields are sent.
* When a human or form isn't expecting Unicode, the user shares the human-friendly ASCII alias. Then the ASCII to EAI server behaves as normal. No extra fields are sent.
* When an EAI server (or client) hits an EAI unaware system, the message is downgraded. Addresses are downgraded by using unmapped punycode, which is algorithmic, so it avoids all the address pairing problems. An EAI aware client app (like Outlook) could then upgrade the punycode if desired. Before downgrade a message is identical to the existing practice, just with UTF-8, and is non-breaking. After downgrade, a message is still identical to existing legacy behavior, so there's no breaking.
By using UNMAPPED punycode (raw ACE encoding), servers can control their own mappings (Turkish I or whatever's interesting to them). Presumably they'd decode back to Unicode, then do their mapping and routing. The problems that downgrade had with mailing lists are avoided because all addresses could be downgraded at any point. Additionally an ASCII-only mailbox user doesn't need an EAI aware server to use EAI because their EAI aware client could do the mapping.
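The algorithmic downgrade and upgrade of the local part can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "eai--" marker and all function names are hypothetical (no prefix has been chosen), only the local part is touched, the address is assumed to contain an "@", and Python's built-in "punycode" codec stands in for raw, unmapped ACE encoding.

```python
# Hypothetical marker for a downgraded local part. The only requirement
# stated so far is that it should differ from IDN's "xn--".
PREFIX = "eai--"


def downgrade_local(local: str) -> str:
    """Encode a Unicode local part as ASCII using raw (unmapped) Punycode."""
    if local.isascii():
        return local  # already ASCII, nothing to downgrade
    # The punycode codec applies no case folding or normalization,
    # so the original code points can be recovered exactly.
    return PREFIX + local.encode("punycode").decode("ascii")


def upgrade_local(local: str) -> str:
    """Recover the Unicode local part from a downgraded one."""
    if not local.startswith(PREFIX):
        return local  # not a downgraded form, leave it alone
    return local[len(PREFIX):].encode("ascii").decode("punycode")


def downgrade_address(addr: str) -> str:
    """Downgrade only the local part; the domain is left untouched."""
    local, sep, domain = addr.rpartition("@")
    return downgrade_local(local) + sep + domain


def upgrade_address(addr: str) -> str:
    """Upgrade only the local part; the domain is left untouched."""
    local, sep, domain = addr.rpartition("@")
    return upgrade_local(local) + sep + domain
```

Because both directions are purely algorithmic, any EAI aware hop can apply them without knowing the recipient's ASCII alias. One gap in this sketch: an ordinary ASCII local part that happens to begin with the marker would be misread as a downgraded address, so a real design would have to reserve or escape the marker.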
To me, the differences between this and previous punycode proposals (maybe I misunderstood them) are:
a) UTF-8 is preferred, and is the long term form, not punycode.
b) Human-readable ASCII aliases are a preferred method of exchange on the sticky note or whatnot. (Nobody's going to exchange a punycode address).
c) Punycode is only used for fallback, when UTF8SMTP or other EAI protocols aren't recognized.
d) It isn't IDN punycode, but just unmapped Unicode that the server has to decode and map.
The advantages are:
* Users get a Unicode address.
* Users still get a human friendly ASCII address as needed.
* There's a full downgrade capability (with upgrade if desired)
* Interoperability with legacy systems should be very high.
* No problems with pairing of the addresses.
* Works whenever an EAI-aware client/mailbox behaves. Intermediate systems don't impact it, and an EAI aware recipient client can have an EAI experience even if they don't have an EAI mailbox themselves.
* Only servers/clients upgraded to EAI need to be touched. Legacy systems in the middle will still behave. (I'm told by Exchange that it would take decades to upgrade everything).
* Mapping still happens by the mailbox server's rules.
* Algorithmic rules for downgrade mean that downgrade is always possible instead of requiring knowledge of the ASCII alias.
* Senders can downgrade (if the client is EAI aware) even with only a Unicode recipient address, even if their mail server is a legacy server.
The disadvantages are:
* Non-EAI aware recipients would receive punycode addresses instead of human friendly ASCII. This is mitigated by the fact that most clients display the display name.
* Punycode downgrade requires that EAI aware systems know the punycode mapping. This is mitigated by reducing the complexity of the other pairing schemes.
* EAI mailbox servers have to unmap the Punycode to Unicode before doing any mapping to find the mailbox. (Aliases might work in some cases, but you'd still want case mapping probably).
* For systems that don't need EAI (eg: many US mail servers) there's reduced incentive to upgrade to be EAI aware since an EAI aware client is sufficient. This could delay complete adoption. (It's also an advantage that could speed up early adoption.)
* Three addresses (Unicode, ASCII, Punycode) instead of 2 (though the Punycode is just an encoded form of the Unicode).
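The extra decode step on the mailbox server might look roughly like the following Python sketch. Everything named here is an assumption for illustration: the "eai--" marker, the alias table, and the choice of NFC plus case folding as the server's mapping policy. The point is only the order of operations: decode first, then apply whatever mapping the server itself prefers, then look up the mailbox.

```python
import unicodedata

PREFIX = "eai--"  # hypothetical marker for a downgraded local part

# Hypothetical alias table: Unicode address and friendly ASCII alias
# both resolve to the same mailbox.
MAILBOXES = {
    "b\u00fccher": "mbox-001",  # "bücher"
    "buecher": "mbox-001",
}


def server_mapping(local: str) -> str:
    """This server's own mapping policy (an assumption: NFC + case folding)."""
    return unicodedata.normalize("NFC", local).casefold()


def resolve(local: str):
    """Decode a downgraded local part if needed, then map and look it up."""
    if local.startswith(PREFIX):
        local = local[len(PREFIX):].encode("ascii").decode("punycode")
    return MAILBOXES.get(server_mapping(local))
```

A different server could swap in its own `server_mapping` (Turkish I handling, for instance) without any change to the downgrade encoding itself.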
Couple minor comments:
* As I said, the punycode should be unmapped ACE, not IDN form.
* To make that clear, I'd pick a different prefix, or some other signature mechanism. (My fear is that developers would accidentally call the IDN APIs and get inappropriate mappings).
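To illustrate the concern, here is a small Python comparison of raw Punycode against an IDN-style API. It relies on Python's built-in "punycode" and "idna" codecs (the latter applies nameprep, including case folding); the helper names are made up for this sketch.

```python
def raw_roundtrip(local: str) -> str:
    """Encode and decode with raw Punycode: no mapping is applied."""
    return local.encode("punycode").decode("punycode")


def idna_roundtrip(local: str) -> str:
    """Encode and decode with the IDN codec: nameprep mapping is applied."""
    return local.encode("idna").decode("idna")


name = "B\u00fccher"  # "Bücher", with a capital B

# Raw Punycode hands back exactly what went in.
print(raw_roundtrip(name) == name)

# The IDN path case-folds during encoding, so the capital B is lost
# before the mailbox server ever gets to apply its own rules.
print(idna_roundtrip(name) == name)
```

A distinct prefix would make it harder to feed a downgraded local part to the IDN routines by accident.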
I know we've discussed Punycode before, but I think the difference is that I'm suggesting it as a last-resort case, not as a replacement for UTF8SMTP, nor a replacement for the human friendly ASCII aliases.
Please note that I am very much against Punycode as a solution for EAI (eg: replacing UTF-8). By itself as a long-term solution Punycode has huge problems. I think it has a place as a "hack" to solve downgrade though.
If necessary to further discussion I could put this in draft form.
-Shawn
Note: Messages sent to this list are the opinions of the senders and do not imply endorsement by the IETF.