idnits 2.17.1 draft-levine-dns-mailbox-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 103: '... part MUST be interpreted and ass...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (September 20, 2015) is 3133 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Missing Reference: '0-9a-v' is mentioned on line 201, but not defined == Missing Reference: 'Bb' is mentioned on line 222, but not defined == Missing Reference: 'Oo' is mentioned on line 222, but not defined == Missing Reference: 'Yy' is mentioned on line 222, but not defined == Missing Reference: 'Rr' is mentioned on line 222, but not defined == Missing Reference: 'Ee' is mentioned on line 222, but not defined == Missing Reference: 'Tt' is mentioned on line 222, but not defined Summary: 2 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. Levine 3 Internet-Draft Taughannock Networks 4 Intended status: Experimental September 20, 2015 5 Expires: March 23, 2016 7 Encoding mailbox local-parts in the DNS 8 draft-levine-dns-mailbox-01 10 Abstract 12 Many applications would like to store per-mailbox information 13 securely in the DNS. Mapping mailbox local-parts into the DNS is a 14 difficult problem, due to the fuzzy matching that most mail systems 15 do, and the DNS design that only does exact matching. We propose 16 several experimental approaches that attempt to implement the 17 required fuzzy matching through DNS queries. 19 Status of This Memo 21 This Internet-Draft is submitted in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF). Note that other groups may also distribute 26 working documents as Internet-Drafts. The list of current Internet- 27 Drafts is at http://datatracker.ietf.org/drafts/current/. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 This Internet-Draft will expire on March 23, 2016. 36 Copyright Notice 38 Copyright (c) 2015 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents 43 (http://trustee.ietf.org/license-info) in effect on the date of 44 publication of this document. Please review these documents 45 carefully, as they describe your rights and restrictions with respect 46 to this document. 
Code Components extracted from this document must 47 include Simplified BSD License text as described in Section 4.e of 48 the Trust Legal Provisions and are provided without warranty as 49 described in the Simplified BSD License. 51 Table of Contents 53 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 54 1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Summary of the approaches . . . . . . . . . . . . . . . . . . 3 56 3. Literal bytes . . . . . . . . . . . . . . . . . . . . . . . . 3 57 4. Encoded bytes . . . . . . . . . . . . . . . . . . . . . . . . 4 58 4.1. Static or Dynamic name servers . . . . . . . . . . . . . 4 59 4.2. All names valid . . . . . . . . . . . . . . . . . . . . . 5 60 5. Encoded regular expressions . . . . . . . . . . . . . . . . . 5 61 5.1. Representing the DFA in the DNS . . . . . . . . . . . . . 6 62 5.2. Matching a local-part against a DFA . . . . . . . . . . . 7 63 6. Pointer to server . . . . . . . . . . . . . . . . . . . . . . 7 64 7. Scaling Issues . . . . . . . . . . . . . . . . . . . . . . . 8 65 8. Security Considerations . . . . . . . . . . . . . . . . . . . 8 66 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 9 67 9.1. Normative References . . . . . . . . . . . . . . . . . . 9 68 9.2. Informative References . . . . . . . . . . . . . . . . . 9 69 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 9 71 1. Introduction 73 E-mail mailboxes consist of a local-part (sometimes informally called 74 left hand side or LHS), an @-sign and a domain name. While the 75 domain name works like any other domain name, the local-part can 76 contain any ASCII characters, up to 64 characters long. Mailboxes in 77 Internationalized mail [RFC6532] can contain arbitrary UTF-8 78 characters in the local-part, not just ASCII. 
(The domain name also 79 can contain UTF-8 U-labels, but the process to translate U-labels to 80 ASCII A-labels for DNS resolution is well defined and is not further 81 addressed here.) The DNS protocol is 8-bit clean, other than ASCII 82 case folding, although some DNS provisioning software does not handle 83 characters outside the ASCII set very well. 85 Mail systems usually handle variant forms of local-parts. The most 86 common variants are ASCII upper and lower case, which are generally 87 treated as equivalent. But many other variants are possible. Some 88 systems allow and ignore "noise" characters such as dots, so local 89 parts johnsmith and John.Smith would be equivalent. Many systems 90 allow "extensions" such as john-ext or mary+ext where john or mary is 91 treated as the effective local-part, and the ext is passed to the 92 recipient for further handling. Yet other systems use an LDAP or 93 other directory to do approximate matching, so an address such as 94 john.smith might also match jsmith so long as there's no other 95 address that also matches. 97 [RFC5321] and its predecessors have always made it clear that only 98 the recipient MTA is allowed to interpret the local-part of an 99 address: 101 "... due to a long history of problems when intermediate hosts 102 have attempted to optimize transport by modifying them, the local- 103 part MUST be interpreted and assigned semantics only by the host 104 specified in the domain part of the address." (Sec 2.3.11.) 106 This presents a problem when attempting to map local-parts into the 107 DNS, since the DNS only handles exact matches, and clients cannot 108 make any assumptions about variants of local parts, and hence cannot 109 try to normalize variants to a "standard" version published in the 110 DNS. 112 This document suggests some approaches to shoehorn local-parts into 113 the DNS. 
Since none of them have, to our knowledge, been implemented 114 they are all presented as experiments, with the hope that people 115 implement them, see how well they work, and perhaps later select one 116 of them for standardization. 118 1.1. Definitions 120 The ABNF terms "mailbox" and "local-part" are used as in [RFC5321]. 122 2. Summary of the approaches 124 o Literal bytes: put the local-part directly into the DNS as a name 126 o Encoded bytes: encode the local-part into names consisting of 127 letters and digits 129 o Regex: encode the set of names as a Deterministic Finite Automaton 130 (DFA) corresponding to a regular expression that matches the valid 131 names. 133 o Pointer to server: securely identify an http server that will 134 handle the lookup. 136 3. Literal bytes 138 Since the DNS protocol is mostly 8-bit clean, one can put the local- 139 part into the DNS as is. The suggested separator is _lmailbox so the 140 address Bob.Smith@example.com would be represented as: 142 Bob\.Smith._lmailbox.example.com 143 (The \. is the master file convention for a literal dot in a name.) 144 The maximum length of a local-part is 64 characters, while a DNS name 145 component is limited to 63, but actual local-parts of 64 characters 146 are vanishingly rare, and systems with distinct mailboxes with names 147 that differ only in the 64th character even rarer. It also cannot 148 distinguish between upper and lower case ASCII characters, but MTAs 149 that do not treat them the same are also very rare. 151 This has the benefit of simplicity--the server can directly see 152 exactly what name the client is looking up. Its disadvantage is that 153 some provisioning software does not handle names well if they contain 154 characters outside the usual ASCII printing character set. Its other 155 characteristics are similar to those for encoded bytes, described 156 next. 158 4. 
Encoded bytes 160 To avoid problems with characters in DNS names, we can encode the 161 local-part with a simple reversible transformation that represents 162 names using the hostname subset of ASCII. To preserve lexical order, 163 which might be useful, take the local-part, pad it out to 64 bytes 164 with 0xff bytes, which are invalid both in ASCII and UTF-8, and break 165 the string into two 32 byte chunks. Then encode each chunk as 52 166 characters in a variant of base32, with each 5-bit section 167 represented as a character from the sequence 0-9a-v. Then use the 168 encoded low part, a dot, and the encoded high part as the end of the DNS 169 name. The suggested separator is _emailbox so the address 170 Bob.Smith@example.com would be represented as: 172 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvg. 173 89nm4bijdlkn8q7vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvg. 174 _emailbox.example.com 176 (The name is displayed on several lines to make it fit in the 177 margins, but the actual name is one long string delimited by dots.) 178 Since many local parts are 32 bytes or less, a simple optimization 179 would be to omit the low part if it's all encoded 0xff bytes. 181 4.1. Static or Dynamic name servers 183 A mail server with a small set of variants could export the names as 184 either literal or encoded bytes to be served by an ordinary 185 authoritative DNS server. A mail server with the more typical wide 186 range of variants could be lashed up to a special purpose DNS server 187 that recovers the local-part from the literal or encoded bytes, 188 figures out what key it corresponds to, and synthesizes a key 189 record, or NXDOMAIN if there isn't one. 191 4.2. All names valid 193 Synthesizing NXDOMAIN responses is likely to be hard, due to the 194 difficulty of figuring out what the valid addresses above and below it 195 are (or even worse, the NSEC3 hashes.) 
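As an illustration of the Section 4 encoding, the following Python sketch pads a local-part to 64 bytes with 0xff, splits it into two 32 byte chunks, and encodes each chunk with the 0-9a-v alphabet. The names encode_chunk and encoded_name are hypothetical, and the sketch assumes that the first 32 bytes form the "high part" and that the final 5-bit group is padded with zero bits. Under those assumptions the high part begins with 89nm4bijdlkn8q7 and both parts end in g, as in the Bob.Smith example.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # the 0-9a-v sequence


def encode_chunk(chunk):
    """Encode a 32 byte chunk as 52 characters, 5 bits per character."""
    if len(chunk) != 32:
        raise ValueError("chunk must be exactly 32 bytes")
    # 256 bits plus 4 zero padding bits gives 260 bits = 52 groups of 5.
    bits = int.from_bytes(chunk, "big") << 4
    return "".join(ALPHABET[(bits >> shift) & 31] for shift in range(255, -1, -5))


def encoded_name(local_part, domain):
    """Build the _emailbox name for a local-part (assumed layout: low.high)."""
    raw = local_part.encode("utf-8")
    if len(raw) > 64:
        raise ValueError("local-part longer than 64 bytes")
    padded = raw.ljust(64, b"\xff")          # pad with 0xff bytes
    high, low = padded[:32], padded[32:]     # assumption: first half is "high"
    return f"{encode_chunk(low)}.{encode_chunk(high)}._emailbox.{domain}"


if __name__ == "__main__":
    print(encoded_name("Bob.Smith", "example.com"))
```

The sketch does not fold case before encoding, so a server using it would still need its own rule for matching Bob.Smith against bob.smith.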
Also, a static zone with NSEC 196 is easily enumerated, which would leak the set of mailboxes in the 197 domain. 199 A dynamic server has the option of returning a record for every query 200 for a syntactically valid encoded name, i.e. anything that is two 201 names of 52 characters from the set [0-9a-v]. If there is no key for 202 the mailbox (which may mean the mailbox doesn't exist or that it does 203 exist but doesn't have a key), the key field in the record is zero 204 length. This makes dynamic DNSSEC somewhat easier, since the server 205 doesn't have to synthesize NXDOMAIN responses for valid encoded 206 names, and for other names it is straightforward to compute the 207 nearest possible encoded names. It also makes it unproductive to try 208 to enumerate the names in the domain. 210 5. Encoded regular expressions 212 Many variant local-parts are easily described using regular 213 expressions. For example, the local-parts matching "bobsmith" on a 214 system that ignores ASCII case distinctions and allows dots between 215 the characters would be described as 216 "[Bb].?[Oo].?[Bb].?[Ss].?[Mm].?[Ii].?[Tt].?[Hh]". The local-parts 217 for the address "bob" with optional + extensions would be 218 "[Bb][Oo][Bb](\+.*)?" For typical variant rules, it is 219 straightforward to generate the regular expressions, and even for 220 variants not easily described by patterns, it is possible to 221 enumerate distinct variants, e.g. 222 "([Bb][Oo][Bb]|[Bb][Oo][Bb][Bb][Yy]|[Rr][Oo][Bb][Ee][Rr][Tt])". 224 Regular expressions are equivalent to Deterministic Finite Automata 225 (DFA), often called state machines, and algorithms to translate 226 between them are well known. See, for example, chapter 3 of [ASU86]. 
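The first pattern above can be generated mechanically from a canonical local-part. The Python sketch below shows one way to do so, assuming the canonical name contains only ASCII letters and digits; variant_pattern is a hypothetical helper. It emits \.? between characters so that only a literal dot is optional, since in Python's re module (and most regex dialects) an unescaped dot matches any character.

```python
import re


def variant_pattern(name):
    """Regex for a name, ignoring ASCII case and allowing dots between characters.

    Assumes `name` consists of ASCII letters and digits only.
    """
    if not (name.isascii() and name.isalnum()):
        raise ValueError("sketch handles ASCII letters and digits only")
    classes = []
    for ch in name:
        if ch.isalpha():
            classes.append(f"[{ch.upper()}{ch.lower()}]")  # e.g. [Bb]
        else:
            classes.append(ch)                              # digits as-is
    return r"\.?".join(classes)


if __name__ == "__main__":
    pattern = variant_pattern("bobsmith")
    print(pattern)
    print(bool(re.fullmatch(pattern, "Bob.Smith")))
```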
227 Lexical analyzer generators such as lex [LESK75] take a collection of 228 regular expressions and translate them into a DFA that can be used to 229 match the regular expressions against input strings efficiently in a 230 single pass through the input string with one lookup per character in 231 the string. For Unicode text, one can either treat the string as a 232 sequence of Unicode characters, or a sequence of the octets in the 233 UTF-8 representation, and translate either into a DFA and a state 234 machine. In the discussion below we assume the machine matches the 235 octets, but the implementation using characters would be very similar. 237 This approach stores the state machine in the DNS, to allow DNS 238 clients to efficiently match valid local-parts against the regular 239 expression. The state machine in a DFA consists of a set of states, 240 conventionally identified by decimal numbers. Each state can be a 241 terminal state, which means that if the input is at the end of the 242 string, the regular expression has matched. The state also has a set 243 of transitions, pairs of (octet,state) that tell the DFA to switch to 244 the given state based on the next input octet. 246 To match an input string, the client starts at state zero, then uses 247 each octet in the input string (in this case the local-part) to 248 choose a next state. If at any stage the octet does not have a 249 corresponding next state, the match fails. If at the end of the 250 string, the final state is a terminal state, the match succeeded and 251 the terminal state identifies which regular expression it matched. 252 The DFA matcher here is considerably simpler than the one that lex 253 and similar programs use, since they repeatedly match expressions 254 against a long string of input to divide it into lexical tokens, 255 while in this application there is one input string that either 256 matches or not. 258 5.1. 
Representing the DFA in the DNS 260 Each state in the DFA is represented by a collection of DNS names and 261 records. We define a new DFA record that contains a single 16-bit 262 field, which is the state number of the next state. Most records are 263 of the form: 265 cc.ddd._rmailbox.example.com IN DFA 123 267 In this example, ddd is the current state number as a decimal number, 268 and cc is the hex value of the next octet. Non-terminal states have 269 a DFA record to identify the next state. Terminal states (which may 270 also be non-terminal states if one local-part is a prefix of another) 271 have key records such as SMIMEA. 273 For wildcard subexpressions, written as "." , the cc is a * DNS 274 wildcard. The DNS closest encloser rule allows states where a few 275 characters have specific matches, and everything else goes to a default 276 state, as in situations where a user calls out a few specific address 277 extensions, e.g. "bob-dnslist" and "bob-jokes" and every other 278 extension matches "bob-.*". This encoding makes the zone 279 considerably smaller than it would be if a record for every possible 280 octet value had to be stored separately. 282 Once the local-parts are compiled into the state machine records, 283 they are an ordinary DNS zone that can be served by an ordinary 284 authoritative server. 286 5.2. Matching a local-part against a DFA 288 Start by turning the local-part into a list of octets. For 289 traditional ASCII local-parts, the characters are the octets, for 290 internationalized local-parts the characters are Unicode characters, 291 which may be represented by several UTF-8 octets. Set the state 292 number to zero, which is by convention the initial state. 294 For each octet, create a DNS name using the hex code of the current 295 octet, the current state and _rmailbox.domain. If this is not the 296 last octet in the local-part, look up a DFA record to find the next 297 state. 
If the DFA record is found, use its value as the next state 298 and advance to the next octet. If there is no DFA record, stop, 299 there is no key for this name. 301 If this is the last octet of the local-part, look up whatever key 302 record is desired. If it's found, it's the key for the local-part. 303 If not, there is no key. 305 As a minor optimization, state number 65535 in a DFA record means a 306 trailing wildcard that matches the rest of the local-part. This 307 permits more efficient matching of the common extension idioms such 308 as "bob+.*" without having to iterate through the octets in the 309 extension. If a retrieved DFA record contains 65535, the name 310 matched so the client fetches the key record at the same name. 312 6. Pointer to server 314 Rather than trying to encode local-parts into the DNS, publish a 315 pointer to a per-domain web server that can provide the keys, 316 identified by URI RR [RFC7553]. Each key type will have to register 317 a new enumservice [RFC6117] type for naming the URI record, e.g.: 319 _smimecert._smtp.example.com URI 0 0 "https://keyserver.example.com/ 320 smimecerts" 322 The URI has to be https, with the name suitably verified by TLSA 323 certificates. To find a key, take the URI, add "?mailbox={escmbx}" 324 where {escmbx} is the full ASCII or UTF-8 mailbox name suitably hex 325 escaped for a URI, and fetch it. The server will either return a 326 result as application/pgp-keys or application/pkix-cert or other 327 appropriate type or a 4xx status if there is no key available. 329 This is certainly slower than a single DNS lookup, but it's 330 comparable to the sequence of lookups for the DFA encoding, and it's 331 about the same speed as the subsequent SMTP session to send a 332 message, so it's probably fast enough. 334 7. Scaling Issues 336 Mail systems vary from tiny home systems with a handful of users to 337 giant public systems with hundreds of millions of users. 
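For reference, the client-side procedure of Section 5.2 can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: a dictionary stands in for the DNS zone, lookup is a hypothetical stand-in for a real DNS query with a simplified form of wildcard fallback, and the sample zone (lower case "bob" with an optional + extension) and its "bob-key" value are invented for the example.

```python
TRAILING_WILDCARD = 65535  # state number meaning "rest of local-part matches"

# Hypothetical zone for "bob" and "bob+<anything>", keyed by (name, type).
ZONE = {
    ("62.0._rmailbox.example.com", "DFA"): 1,         # 'b' in state 0
    ("6f.1._rmailbox.example.com", "DFA"): 2,         # 'o' in state 1
    ("62.2._rmailbox.example.com", "DFA"): 3,         # 'b' in state 2
    ("62.2._rmailbox.example.com", "SMIMEA"): "bob-key",
    ("2b.3._rmailbox.example.com", "DFA"): TRAILING_WILDCARD,  # '+'
    ("2b.3._rmailbox.example.com", "SMIMEA"): "bob-key",
}


def lookup(zone, name, rtype):
    """Stand-in for a DNS query; tries the exact name, then a '*' label."""
    if (name, rtype) in zone:
        return zone[(name, rtype)]
    _, _, rest = name.partition(".")
    return zone.get(("*." + rest, rtype))  # simplified wildcard handling


def find_key(zone, local_part, domain, key_type="SMIMEA"):
    """Walk the DFA one octet at a time; return the key record or None."""
    octets = local_part.encode("utf-8")
    state = 0  # initial state by convention
    for i, octet in enumerate(octets):
        name = f"{octet:02x}.{state}._rmailbox.{domain}"
        if i == len(octets) - 1:
            return lookup(zone, name, key_type)   # last octet: fetch the key
        next_state = lookup(zone, name, "DFA")
        if next_state is None:
            return None                           # no transition, no key
        if next_state == TRAILING_WILDCARD:
            return lookup(zone, name, key_type)   # rest of local-part matches
        state = next_state
    return None  # empty local-part


if __name__ == "__main__":
    for lp in ("bob", "bob+lists", "bobby"):
        print(lp, find_key(ZONE, lp, "example.com"))
```

Each loop iteration corresponds to one DNS query, so a local-part of n octets costs at most n lookups, fewer when a trailing wildcard is reached.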
Signing and 338 publishing a zone with one key per user for a large mail system would 339 likely exceed the capacity of much DNS software. For comparison, the 340 largest signed zone as of mid-2015 is probably the .COM TLD, with 341 about 280 million records and 117 million names. Considering the 342 large size of key records, a zone with one key per user for a large 343 mail system could easily be an order of magnitude larger. Hence, any 344 approach that requires putting all of the keys into a static signed 345 zone is unlikely to be practical at scale. 347 With this in mind, the more promising approaches appear to be encoded 348 names (Section 4), which offer the possibility of responses 349 generated from the underlying database on the fly, or pointer to 350 server (Section 6) going directly to a web service. 352 8. Security Considerations 354 Some approaches may make it somewhat easier to extract valid local 355 parts for a domain. The All Names Valid option makes name searches 356 unproductive. 358 The regular expression representation is difficult to reverse 359 engineer. With NSEC records it's possible to recover the DFA and in 360 principle to translate it back into a large regular expression, but 361 there's no efficient way to take the regular expression and extract a 362 useful set of distinct names. (It's easy to enumerate lots of 363 variants of the same name, which is not useful to spammers since a 364 blast of mail to the same recipient is typically shut down in moments 365 by bulk counters.) 367 All of the usual attacks against DNS servers are likely to occur. 368 The usual techniques for mitigating them should work. Many queries 369 will cache poorly, but probably no worse than rDNS or DNSBL queries 370 do now. 372 If PGP or S/MIME keys are published in the DNS, it is unclear what 373 security assertions the publishing server is making about them. 
The 374 server would presumably be saying this is the key for mailbox so-and- 375 so, but S/MIME and PGP have historically tried to bind keys to users 376 or organizations, not just mailboxes. 378 9. References 380 9.1. Normative References 382 [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, 383 DOI 10.17487/RFC5321, October 2008. 386 [RFC6117] Hoeneisen, B., Mayrhofer, A., and J. Livingood, "IANA 387 Registration of Enumservices: Guide, Template, and IANA 388 Considerations", RFC 6117, DOI 10.17487/RFC6117, March 389 2011. 391 [RFC6532] Yang, A., Steele, S., and N. Freed, "Internationalized 392 Email Headers", RFC 6532, DOI 10.17487/RFC6532, February 393 2012. 395 [RFC7553] Faltstrom, P. and O. Kolkman, "The Uniform Resource 396 Identifier (URI) DNS Resource Record", RFC 7553, DOI 397 10.17487/RFC7553, June 2015. 400 9.2. Informative References 402 [ASU86] Aho, A., Sethi, R., and J. Ullman, "Compilers: Principles, 403 Techniques, and Tools", 1986. 405 [LESK75] Lesk, M., "Lex--A Lexical Analyzer Generator", CSTR 39, 406 DOI 10.1234/567.890, 1975. 409 Author's Address 411 John Levine 412 Taughannock Networks 413 PO Box 727 414 Trumansburg, NY 14886 416 Phone: +1 831 480 2300 417 Email: standards@taugh.com 418 URI: http://jl.ly