idnits 2.17.1

draft-newman-url-imap-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.)
     in this document.  Expected boilerplate is as follows today
     (2024-04-20) according to https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

        This Internet-Draft is submitted in full conformance with the
        provisions of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i),
     paragraph 2:

        Copyright (c) 2024 IETF Trust and the persons identified as the
        document authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i),
     paragraph 3:

        This document is subject to BCP 78 and the IETF Trust's Legal
        Provisions Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described
        in Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should
     appear on the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6
     months document validity.

  ** The document seems to lack a 1id_guidelines paragraph about the
     list of current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the
     list of Shadow Directories.

  == No 'Intended status' indicated for this document; assuming
     Proposed Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See
     Section 2.2 of https://www.ietf.org/id-info/checklist for how to
     handle the case when there are no actions for IANA.)

  ** The document seems to lack separate sections for
     Informative/Normative References.  All references will be assumed
     normative when checking for downward references.

  ** There are 18 instances of too long lines in the document, the
     longest one being 7 characters in excess of 72.

  ** The abstract seems to contain references ([IMAP4]), which it
     shouldn't.  Please replace those with straight textual mentions of
     the documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document seems to lack the recommended RFC 2119 boilerplate,
     even if it appears to use RFC 2119 keywords -- however, there's a
     paragraph with a matching beginning.  Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which
     the ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but
     may have content which was first submitted before 10 November
     2008.  If you have contacted all the original authors and they are
     all willing to grant the BCP78 rights to the IETF Trust, then this
     is fine, and you can ignore this comment.  If not, you may need to
     add the pre-RFC5378 disclaimer.  (See the Legal Provisions
     document at https://trustee.ietf.org/license-info for more
     information.)

  -- The document date (May 1997) is 9837 days in the past.  Is this
     intentional?
  -- Found something which looks like a code comment -- if you have
     code sections in the document, please surround them with
     '<CODE BEGINS>' and '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '256' on line 639

  -- Looks like a reference, but probably isn't: '6' on line 547

  ** Obsolete normative reference: RFC 1738 (ref. 'BASIC-URL')
     (Obsoleted by RFC 4248, RFC 4266)

  ** Obsolete normative reference: RFC 2060 (ref. 'IMAP4') (Obsoleted
     by RFC 3501)

  ** Obsolete normative reference: RFC 2068 (ref. 'HTTP') (Obsoleted by
     RFC 2616)

  ** Obsolete normative reference: RFC 822 (ref. 'IMAIL') (Obsoleted by
     RFC 2822)

  ** Obsolete normative reference: RFC 1808 (ref. 'REL-URL') (Obsoleted
     by RFC 3986)

  ** Obsolete normative reference: RFC 2044 (ref. 'UTF8') (Obsoleted by
     RFC 2279)

     Summary: 17 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments
     (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

Network Working Group                                          C. Newman
Internet Draft: IMAP URL Scheme                                 Innosoft
Document: draft-newman-url-imap-09.txt                          May 1997
                                                   Expires in six months

                            IMAP URL Scheme

Status of this memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.
   It is not appropriate to use Internet Drafts as reference material
   or to cite them other than as a ``working draft'' or ``work in
   progress''.

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu, or
   munnari.oz.au.

   A revised version of this draft document will be submitted to the
   RFC editor as a Proposed Standard for the Internet Community.
   Discussion and suggestions for improvement are requested.  This
   document will expire six months after publication.  Distribution of
   this draft is unlimited.

Abstract

   IMAP [IMAP4] is a rich protocol for accessing remote message
   stores.  It provides an ideal mechanism for accessing public
   mailing list archives as well as private and shared message stores.
   This document defines a URL scheme for referencing objects on an
   IMAP server.

1. Conventions used in this document

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [KEYWORDS].

2. IMAP scheme

   The IMAP URL scheme is used to designate IMAP servers, mailboxes,
   messages, MIME bodies [MIME], and search programs on Internet hosts
   accessible using the IMAP protocol.

   The IMAP URL follows the common Internet scheme syntax as defined
   in RFC 1738 [BASIC-URL] except that clear text passwords are not
   permitted.  If :<port> is omitted, the port defaults to 143.

   An IMAP URL takes one of the following forms:

       imap://<iserver>/
       imap://<iserver>/<enc_list_mailbox>;TYPE=<list_type>
       imap://<iserver>/<enc_mailbox>[uidvalidity][?<enc_search>]
       imap://<iserver>/<enc_mailbox>[uidvalidity]<iuid>[isection]

   The first form is used to refer to an IMAP server, the second form
   refers to a list of mailboxes, the third form refers to the
   contents of a mailbox or a set of messages resulting from a search,
   and the final form refers to a specific message or message part.
   Note that the syntax here is informal.  The authoritative formal
   syntax for IMAP URLs is defined in section 12.

3. IMAP User Name and Authentication Mechanism

   A user name and/or authentication mechanism may be supplied.  They
   are used in the "LOGIN" or "AUTHENTICATE" commands after making the
   connection to the IMAP server.  If no user name or authentication
   mechanism is supplied, the user name "anonymous" is used with the
   "LOGIN" command and the password is supplied as the Internet e-mail
   address of the end user accessing the resource.  If the URL doesn't
   supply a user name, the program interpreting the IMAP URL SHOULD
   request one from the user if necessary.

   An authentication mechanism can be expressed by adding
   ";AUTH=<enc_auth_type>" to the end of the user name.  When such an
   <enc_auth_type> is indicated, the client SHOULD request appropriate
   credentials from that mechanism and use the "AUTHENTICATE" command
   instead of the "LOGIN" command.  If no user name is specified, one
   SHOULD be obtained from the mechanism or requested from the user as
   appropriate.

   The string ";AUTH=*" indicates that the client SHOULD select an
   appropriate authentication mechanism.  It MAY use any mechanism
   listed in the CAPABILITY command or use an out of band security
   service resulting in a PREAUTH connection.  If no user name is
   specified and no appropriate authentication mechanisms are
   available, the client SHOULD fall back to anonymous login as
   described above.  This allows a URL which grants read-write access
   to authorized users, and read-only anonymous access to other users.
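
   The choice between "LOGIN" and "AUTHENTICATE" described in this
   section can be sketched in C as follows.  This is a minimal
   illustration, not part of the specification: the enum and the
   function choose_login_plan() are hypothetical names, and the sketch
   assumes the user name and ";AUTH=" value have already been split out
   of the URL and hex-decoded.  It also applies this section's rule
   that a user name with no mechanism is treated as ";AUTH=*".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; the names are not defined by this document. */
enum imap_login_plan {
    PLAN_LOGIN_ANONYMOUS,  /* LOGIN anonymous <e-mail address of end user> */
    PLAN_AUTHENTICATE,     /* AUTHENTICATE with the mechanism named in the URL */
    PLAN_CLIENT_CHOICE     /* ";AUTH=*": client selects a mechanism itself */
};

/*
 * user: decoded user name from the URL, or NULL if none was given.
 * auth: decoded text after ";AUTH=", or NULL if none was given.
 */
enum imap_login_plan choose_login_plan(const char *user, const char *auth)
{
    /* neither part supplied: anonymous LOGIN */
    if (user == NULL && auth == NULL)
        return PLAN_LOGIN_ANONYMOUS;

    /* explicit ";AUTH=*", or a user name alone (";AUTH=*" is assumed) */
    if (auth == NULL || strcmp(auth, "*") == 0)
        return PLAN_CLIENT_CHOICE;

    /* a specific mechanism was named */
    return PLAN_AUTHENTICATE;
}
```

   A client that gets PLAN_CLIENT_CHOICE with no user name and finds no
   suitable mechanism would then fall back to the anonymous case, as
   the text above describes.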

   If a user name is included with no authentication mechanism, then
   ";AUTH=*" is assumed.

   Since URLs can easily come from untrusted sources, care must be
   taken when resolving a URL which requires or requests any sort of
   authentication.  If authentication credentials are supplied to the
   wrong server, it may compromise the security of the user's account.
   The program resolving the URL should make sure it meets at least
   one of the following criteria in this case:

   (1) The URL comes from a trusted source, such as a referral server
   which the client has validated and trusts according to site policy.
   Note that user entry of the URL may or may not count as a trusted
   source, depending on the experience level of the user and site
   policy.
   (2) Explicit local site policy permits the client to connect to the
   server in the URL.  For example, if the client knows the site
   domain name, site policy may dictate that any hostname ending in
   that domain is trusted.
   (3) The user confirms that connecting to that domain name with the
   specified credentials and/or mechanism is permitted.
   (4) A mechanism is used which validates the server before passing
   potentially compromising client credentials.
   (5) An authentication mechanism is used which will not reveal
   information to the server which could be used to compromise future
   connections.

   URLs which do not include a user name must be treated with extra
   care, since they are more likely to compromise the user's primary
   account.  A URL containing ";AUTH=*" must also be treated with
   extra care since it might fall back on a weaker security mechanism.
   Finally, clients are discouraged from using a plain text password
   as a fallback with ";AUTH=*" unless the connection has strong
   encryption (e.g. a key length of greater than 56 bits).
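
   Criterion (2) above, trusting any hostname that ends in the site
   domain, could be checked along the following lines.  This is only a
   sketch under stated assumptions: host_in_site_domain() is a
   hypothetical helper, "example.com" in the usage note is a
   placeholder, and the requirement that the match fall on a label
   boundary is a precaution added here rather than something this
   document states.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if host is site_domain itself or ends
 * in "." followed by site_domain (ASCII case-insensitive), else 0.
 */
int host_in_site_domain(const char *host, const char *site_domain)
{
    size_t hlen = strlen(host);
    size_t dlen = strlen(site_domain);
    size_t i;

    if (dlen == 0 || hlen < dlen)
        return 0;

    /* compare the tail of host with site_domain, ignoring case */
    for (i = 0; i < dlen; ++i) {
        unsigned char a = (unsigned char) host[hlen - dlen + i];
        unsigned char b = (unsigned char) site_domain[i];
        if (tolower(a) != tolower(b))
            return 0;
    }

    /* the match must start at a label boundary, so that a host such as
       "evilexample.com" is not accepted for "example.com" */
    return hlen == dlen || host[hlen - dlen - 1] == '.';
}
```

   For instance, host_in_site_domain("mail.example.com", "example.com")
   accepts the host, while "example.com.attacker.net" is rejected
   because the site domain is not at the end of the name.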

   Note that if unsafe or reserved characters such as " " or ";" are
   present in the user name or authentication mechanism, they MUST be
   encoded as described in RFC 1738 [BASIC-URL].

4. IMAP server

   An IMAP URL referring to an IMAP server has the following form:

       imap://<iserver>/

   A program interpreting this URL would issue the standard set of
   commands it uses to present a view of the contents of an IMAP
   server.  This is likely to be semantically equivalent to one of the
   following URLs:

       imap://<iserver>/;TYPE=LIST
       imap://<iserver>/;TYPE=LSUB

   The program interpreting this URL SHOULD use the LSUB form if it
   supports mailbox subscriptions.

5. Lists of mailboxes

   An IMAP URL referring to a list of mailboxes has the following
   form:

       imap://<iserver>/<enc_list_mailbox>;TYPE=<list_type>

   The <list_type> may be either "LIST" or "LSUB", and is case
   insensitive.  The field ";TYPE=<list_type>" MUST be included.

   The <enc_list_mailbox> is any argument suitable for the
   list_mailbox field of the IMAP [IMAP4] LIST or LSUB commands.  The
   field <enc_list_mailbox> may be omitted, in which case the program
   interpreting the IMAP URL may use "*" or "%" as the
   <enc_list_mailbox>.  The program SHOULD use "%" if it supports a
   hierarchical view, otherwise it SHOULD use "*".

   Note that if unsafe or reserved characters such as " " or "%" are
   present in <enc_list_mailbox> they MUST be encoded as described in
   RFC 1738 [BASIC-URL].  If the character "/" is present in
   enc_list_mailbox, it SHOULD NOT be encoded.

6. Lists of messages

   An IMAP URL referring to a list of messages has the following form:

       imap://<iserver>/<enc_mailbox>[uidvalidity][?<enc_search>]

   The <enc_mailbox> field is used as the argument to the IMAP4
   "SELECT" command.  Note that if unsafe or reserved characters such
   as " ", ";", or "?" are present in <enc_mailbox> they MUST be
   encoded as described in RFC 1738 [BASIC-URL].  If the character "/"
   is present in enc_mailbox, it SHOULD NOT be encoded.

   The [uidvalidity] field is optional.
   If it is present, it MUST be the argument to the IMAP4 UIDVALIDITY
   status response at the time the URL was created.  This SHOULD be
   used by the program interpreting the IMAP URL to determine if the
   URL is stale.

   The [?<enc_search>] field is optional.  If it is not present, the
   contents of the mailbox SHOULD be presented by the program
   interpreting the URL.  If it is present, it SHOULD be used as the
   arguments following an IMAP4 SEARCH command with unsafe characters
   such as " " (which are likely to be present in the <enc_search>)
   encoded as described in RFC 1738 [BASIC-URL].

7. A specific message or message part

   An IMAP URL referring to a specific message or message part has the
   following form:

       imap://<iserver>/<enc_mailbox>[uidvalidity]<iuid>[isection]

   The <enc_mailbox> and [uidvalidity] are as defined above.

   If [uidvalidity] is present in this form, it SHOULD be used by the
   program interpreting the URL to determine if the URL is stale.

   The <iuid> refers to an IMAP4 message UID, and SHOULD be used as
   the <set> argument to the IMAP4 "UID FETCH" command.

   The [isection] field is optional.  If not present, the URL refers
   to the entire Internet message as returned by the IMAP command "UID
   FETCH <uid> BODY.PEEK[]".  If present, the URL refers to the object
   returned by a "UID FETCH <uid> BODY.PEEK[<section>]" command.  The
   type of the object may be determined with a "UID FETCH <uid>
   BODYSTRUCTURE" command and locating the appropriate part in the
   resulting BODYSTRUCTURE.  Note that unsafe characters in [isection]
   MUST be encoded as described in [BASIC-URL].

8. Relative IMAP URLs

   Relative IMAP URLs are permitted and are resolved according to the
   rules defined in RFC 1808 [REL-URL] with one exception.  In IMAP
   URLs, parameters are treated as part of the normal path with
   respect to relative URL resolution.  This is believed to be the
   behavior of the installed base and is likely to be documented in a
   future revision of the relative URL specification.

   The following observations are also important:

   The <iauth> grammar element is considered part of the user name for
   purposes of resolving relative IMAP URLs.  This means that unless a
   new login/server specification is included in the relative URL, the
   authentication mechanism is inherited from a base IMAP URL.

   URLs always use "/" as the hierarchy delimiter for the purpose of
   resolving paths in relative URLs.  IMAP4 permits the use of any
   hierarchy delimiter in mailbox names.  For this reason, relative
   mailbox paths will only work if the mailbox uses "/" as the
   hierarchy delimiter.  Relative URLs may be used on mailboxes which
   use other delimiters, but in that case, the entire mailbox name
   MUST be specified in the relative URL or inherited as a whole from
   the base URL.

   The base URL for a list of mailboxes or messages which was referred
   to by an IMAP URL is always the referring IMAP URL itself.  The
   base URL for a message or message part which was referred to by an
   IMAP URL may be more complicated to determine.  The program
   interpreting the relative URL will have to check the headers of the
   MIME entity and any enclosing MIME entities in order to locate the
   "Content-Base" and "Content-Location" headers.
   These headers are used to determine the base URL as defined in
   [HTTP].  For example, if the referring IMAP URL contains a
   "/;SECTION=1.2" parameter, then the MIME headers for section 1.2,
   for section 1, and for the enclosing message itself SHOULD be
   checked in that order for "Content-Base" or "Content-Location"
   headers.

9. Multinational Considerations

   IMAP4 [IMAP4] section 5.1.3 includes a convention for encoding
   non-US-ASCII characters in IMAP mailbox names.  Because this
   convention is private to IMAP, it is necessary to convert IMAP's
   encoding to one that can be more easily interpreted by a URL
   display program.  For this reason, IMAP's modified UTF-7 encoding
   for mailboxes MUST be converted to UTF-8 [UTF8].  Since 8-bit
   characters are not permitted in URLs, the UTF-8 characters are
   encoded as required by the URL specification [BASIC-URL].  Sample
   code is included in Appendix A to demonstrate this conversion.

10. Examples

   The following examples demonstrate how an IMAP4 client program
   might translate various IMAP4 URLs into a series of IMAP4 commands.
   Commands sent from the client to the server are prefixed with "C:",
   and responses sent from the server to the client are prefixed with
   "S:".

   The URL:

   Results in the following client commands:

      C: A001 LOGIN ANONYMOUS sheridan@babylon5.org
      C: A002 SELECT gray-council
      C: A003 UID FETCH 20 BODY.PEEK[]

   The URL:

   Results in the following client commands:

      C: A001 LOGIN MICHAEL zipper
      C: A002 LIST "" users.*

   The URL:

   Results in the following client commands:

      C: A001 LOGIN ANONYMOUS bester@psycop.psy.earth
      C: A002 SELECT ~peter/&ZeVnLIqe-/&U,BTFw-

   The URL:

   Results in the following client commands:

      C: A001 AUTHENTICATE KERBEROS_V4
      C: A002 SELECT gray-council
      C: A003 UID FETCH 20 BODY.PEEK[1.2]

   If the following relative URL is located in that body part:

      <;section=1.4>

   This could result in the following client commands:

      C: A004 UID FETCH 20 (BODY.PEEK[1.2.MIME]
            BODY.PEEK[1.MIME]
            BODY.PEEK[HEADER.FIELDS (Content-Base Content-Location)])
      C: A005 UID FETCH 20 BODY.PEEK[1.4]

   The URL:

   Could result in the following:

      C: A001 CAPABILITY
      S: * CAPABILITY IMAP4rev1 AUTH=GSSAPI
      S: A001 OK
      C: A002 AUTHENTICATE GSSAPI
      S: A002 OK user lennier authenticated
      C: A003 SELECT "gray council"
      ...
      C: A004 SEARCH SUBJECT shadows
      S: * SEARCH 8 10 13 14 15 16
      S: A004 OK SEARCH completed
      C: A005 FETCH 8,10,13:16 ALL
      ...

   NOTE: In this final example, the client has implementation
   dependent choices.  The authentication mechanism could be anything,
   including PREAUTH.  And the final FETCH command could fetch more or
   less information about the messages, depending on what it wishes to
   display to the user.

11. Security Considerations

   Security considerations discussed in the IMAP specification [IMAP4]
   and the URL specification [BASIC-URL] are relevant.  Security
   considerations related to authenticated URLs are discussed in
   section 3 of this document.

   Many email clients store the plain text password for later use
   after logging into an IMAP server.  Such clients MUST NOT use a
   stored password in response to an IMAP URL without explicit
   permission from the user to supply that password to the specified
   host name.

12. ABNF for IMAP URL scheme

   This uses ABNF as defined in RFC 822 [IMAIL].  Terminals from the
   BNF for IMAP [IMAP4] and URLs [BASIC-URL] are also used.  Strings
   are not case sensitive and free insertion of linear-white-space is
   not permitted.

   achar            = uchar / "&" / "=" / "~"
                           ; see [BASIC-URL] for "uchar" definition

   bchar            = achar / ":" / "@" / "/"

   enc_auth_type    = 1*achar
                           ; encoded version of [IMAP-AUTH] "auth_type"

   enc_list_mailbox = 1*bchar
                           ; encoded version of [IMAP4] "list_mailbox"

   enc_mailbox      = 1*bchar
                           ; encoded version of [IMAP4] "mailbox"

   enc_search       = 1*bchar
                           ; encoded version of search_program below

   enc_section      = 1*bchar
                           ; encoded version of section below

   enc_user         = 1*achar
                           ; encoded version of [IMAP4] "userid"

   imapurl          = "imap://" iserver "/" [ icommand ]

   iauth            = ";AUTH=" ( "*" / enc_auth_type )

   icommand         = imailboxlist / ipath / isearch

   imailboxlist     = [enc_list_mailbox] ";TYPE=" list_type

   ipath            = enc_mailbox [uidvalidity] iuid [isection]

   isearch          = enc_mailbox [ "?" enc_search ] [uidvalidity]

   isection         = "/;SECTION=" enc_section

   iserver          = [iuserauth "@"] hostport
                           ; See [BASIC-URL] for "hostport" definition

   iuid             = "/;UID=" nz_number
                           ; See [IMAP4] for "nz_number" definition

   iuserauth        = enc_user [iauth] / [enc_user] iauth

   list_type        = "LIST" / "LSUB"

   search_program   = ["CHARSET" SPACE astring SPACE] 1#search_key
                           ; IMAP4 literals may not be used
                           ; See [IMAP4] for "astring" and "search_key"

   section          = section_text / (nz_number *["." nz_number]
                          ["." (section_text / "MIME")])
                           ; See [IMAP4] for "section_text" and "nz_number"

   uidvalidity      = ";UIDVALIDITY=" nz_number
                           ; See [IMAP4] for "nz_number" definition

13. References

   [BASIC-URL] Berners-Lee, Masinter, McCahill, "Uniform Resource
   Locators (URL)", RFC 1738, CERN, Xerox Corporation, University of
   Minnesota, December 1994.

   [IMAP4] Crispin, M., "Internet Message Access Protocol - Version
   4rev1", RFC 2060, University of Washington, December 1996.

   [IMAP-AUTH] Myers, J., "IMAP4 Authentication Mechanism", RFC 1731,
   Carnegie-Mellon University, December 1994.

   [HTTP] Fielding, Gettys, Mogul, Frystyk, Berners-Lee, "Hypertext
   Transfer Protocol -- HTTP/1.1", RFC 2068, UC Irvine, DEC, MIT/LCS,
   January 1997.

   [IMAIL] Crocker, "Standard for the Format of ARPA Internet Text
   Messages", STD 11, RFC 822, University of Delaware, August 1982.

   [KEYWORDS] Bradner, "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, Harvard University, March 1997.

   [MIME] Freed, N., Borenstein, N., "Multipurpose Internet Mail
   Extensions", RFC 2045, Innosoft, First Virtual, November 1996.

   [REL-URL] Fielding, "Relative Uniform Resource Locators", RFC 1808,
   UC Irvine, June 1995.

   [UTF8] Yergeau, F. "UTF-8, a transformation format of Unicode and
   ISO 10646", RFC 2044, Alis Technologies, October 1996.

14. Author's Address

   Chris Newman
   Innosoft International, Inc.
   1050 East Garvey Ave. South
   West Covina, CA 91790 USA

   Email: chris.newman@innosoft.com

Appendix A.  Sample code

Here is sample C source code to convert between URL paths and IMAP
mailbox names, taking into account mapping between IMAP's modified UTF-7
[IMAP4] and hex-encoded UTF-8 which is more appropriate for URLs.  This
code has not been rigorously tested nor does it necessarily behave
reasonably with invalid input, but it should serve as a useful example.
This code just converts the mailbox portion of the URL and does not deal
with parameters, query or server components of the URL.

#include <ctype.h>
#include <string.h>

/* hexadecimal lookup table */
static char hex[] = "0123456789ABCDEF";

/* URL unsafe printable characters */
static char urlunsafe[] = " \"#%&+:;<=>?@[\\]^`{|}";

/* UTF7 modified base64 alphabet */
static char base64chars[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
#define UNDEFINED 64

/* UTF16 definitions */
#define UTF16MASK       0x03FFUL
#define UTF16SHIFT      10
#define UTF16BASE       0x10000UL
#define UTF16HIGHSTART  0xD800UL
#define UTF16HIGHEND    0xDBFFUL
#define UTF16LOSTART    0xDC00UL
#define UTF16LOEND      0xDFFFUL

/* Convert an IMAP mailbox to a URL path
 *  dst needs to have roughly 4 times the storage space of src
 *    Hex encoding can triple the size of the input
 *    UTF-7 can be slightly denser than UTF-8
 *     (worst case: 8 octets UTF-7 becomes 9 octets UTF-8)
 */
void MailboxToURL(char *dst, char *src)
{
    unsigned char c, i, bitcount;
    unsigned long ucs4, utf16, bitbuf;
    unsigned char base64[256], utf8[6];

    /* initialize modified base64 decoding table
     * (the terminating NUL of base64chars is not an alphabet member)
     */
    memset(base64, UNDEFINED, sizeof (base64));
    for (i = 0; i < sizeof (base64chars) - 1; ++i) {
        base64[(unsigned char) base64chars[i]] = i;
    }

    /* loop until end of string */
    while (*src != '\0') {
        c = *src++;
        /* deal with literal characters and &- */
        if (c != '&' || *src == '-') {
            if (c < ' ' || c > '~' || strchr(urlunsafe, c) != NULL) {
                /* hex encode if necessary */
                dst[0] = '%';
                dst[1] = hex[c >> 4];
                dst[2] = hex[c & 0x0f];
                dst += 3;
            } else {
                /* encode literally */
                *dst++ = c;
            }
            /* skip over the '-' if this is an &- sequence */
            if (c == '&') ++src;
        } else {
        /* convert modified UTF-7 -> UTF-16 -> UCS-4 -> UTF-8 -> HEX */
            bitbuf = 0;
            bitcount = 0;
            ucs4 = 0;
            while ((c = base64[(unsigned char) *src]) != UNDEFINED) {
                ++src;
                bitbuf = (bitbuf << 6) | c;
                bitcount += 6;
                /* enough bits for a UTF-16 character? */
                if (bitcount >= 16) {
                    bitcount -= 16;
                    utf16 = (bitcount ? bitbuf >> bitcount
                                      : bitbuf) & 0xffff;
                    /* convert UTF16 to UCS4 */
                    if (utf16 >= UTF16HIGHSTART
                        && utf16 <= UTF16HIGHEND) {
                        ucs4 = (utf16 & UTF16MASK) << UTF16SHIFT;
                        continue;
                    } else if (utf16 >= UTF16LOSTART
                               && utf16 <= UTF16LOEND) {
                        /* a surrogate pair encodes the code point
                         * minus 0x10000, so add the base back
                         */
                        ucs4 |= utf16 & UTF16MASK;
                        ucs4 += UTF16BASE;
                    } else {
                        ucs4 = utf16;
                    }
                    /* convert UTF-16 range of UCS4 to UTF-8 */
                    if (ucs4 <= 0x7fUL) {
                        utf8[0] = ucs4;
                        i = 1;
                    } else if (ucs4 <= 0x7ffUL) {
                        utf8[0] = 0xc0 | (ucs4 >> 6);
                        utf8[1] = 0x80 | (ucs4 & 0x3f);
                        i = 2;
                    } else if (ucs4 <= 0xffffUL) {
                        utf8[0] = 0xe0 | (ucs4 >> 12);
                        utf8[1] = 0x80 | ((ucs4 >> 6) & 0x3f);
                        utf8[2] = 0x80 | (ucs4 & 0x3f);
                        i = 3;
                    } else {
                        utf8[0] = 0xf0 | (ucs4 >> 18);
                        utf8[1] = 0x80 | ((ucs4 >> 12) & 0x3f);
                        utf8[2] = 0x80 | ((ucs4 >> 6) & 0x3f);
                        utf8[3] = 0x80 | (ucs4 & 0x3f);
                        i = 4;
                    }
                    /* convert utf8 to hex */
                    for (c = 0; c < i; ++c) {
                        dst[0] = '%';
                        dst[1] = hex[utf8[c] >> 4];
                        dst[2] = hex[utf8[c] & 0x0f];
                        dst += 3;
                    }
                }
            }
            /* skip over trailing '-' in modified UTF-7 encoding */
            if (*src == '-') ++src;
        }
    }
    /* terminate destination string */
    *dst = '\0';
}

/* Convert hex coded UTF-8 URL path to modified UTF-7 IMAP mailbox
 *  dst should be about twice the length of src to deal with non-hex
 *  coded URLs
 */
void URLtoMailbox(char *dst, char *src)
{
    unsigned int utf8pos, utf8total, i, c, utf7mode, bitstogo, utf16flag;
    unsigned long ucs4, bitbuf;
    unsigned char hextab[256];

    /* initialize hex lookup table */
    memset(hextab, 0, sizeof (hextab));
    for (i = 0; i < sizeof (hex) - 1; ++i) {
        hextab[(unsigned char) hex[i]] = i;
        if (isupper((unsigned char) hex[i])) {
            hextab[tolower((unsigned char) hex[i])] = i;
        }
    }

    utf7mode = 0;
    utf8total = 0;
    utf8pos = 0;
    bitstogo = 0;
    bitbuf = 0;
    ucs4 = 0;
    while ((c = (unsigned char) *src) != '\0') {
        ++src;
        /* undo hex-encoding */
        if (c == '%' && src[0] != '\0' && src[1] != '\0') {
            c = (hextab[(unsigned char) src[0]] << 4)
                | hextab[(unsigned char) src[1]];
            src += 2;
        }
        /* normal character? */
        if (c >= ' ' && c <= '~') {
            /* switch out of UTF-7 mode */
            if (utf7mode) {
                if (bitstogo) {
                    *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
                }
                *dst++ = '-';
                utf7mode = 0;
                /* leftover bits must not leak into the next UTF-7 run */
                bitstogo = 0;
            }
            *dst++ = c;
            /* encode '&' as '&-' */
            if (c == '&') {
                *dst++ = '-';
            }
            continue;
        }
        /* switch to UTF-7 mode */
        if (!utf7mode) {
            *dst++ = '&';
            utf7mode = 1;
        }
        /* Encode US-ASCII characters as themselves */
        if (c < 0x80) {
            ucs4 = c;
            utf8total = 1;
        } else if (utf8total) {
            /* save UTF8 bits into UCS4 */
            ucs4 = (ucs4 << 6) | (c & 0x3FUL);
            if (++utf8pos < utf8total) {
                continue;
            }
        } else {
            utf8pos = 1;
            if (c < 0xE0) {
                utf8total = 2;
                ucs4 = c & 0x1F;
            } else if (c < 0xF0) {
                utf8total = 3;
                ucs4 = c & 0x0F;
            } else {
                /* NOTE: can't convert UTF8 sequences longer than 4 */
                utf8total = 4;
                /* a 4-octet lead byte carries 3 payload bits */
                ucs4 = c & 0x07;
            }
            continue;
        }
        /* loop to split ucs4 into two utf16 chars if necessary */
        utf8total = 0;
        do {
            if (ucs4 >= UTF16BASE) {
                /* surrogates encode the code point minus 0x10000 */
                ucs4 -= UTF16BASE;
                bitbuf = (bitbuf << 16) | ((ucs4 >> UTF16SHIFT)
                                           + UTF16HIGHSTART);
                ucs4 = (ucs4 & UTF16MASK) + UTF16LOSTART;
                utf16flag = 1;
            } else {
                bitbuf = (bitbuf << 16) | ucs4;
                utf16flag = 0;
            }
            bitstogo += 16;
            /* spew out base64 */
            while (bitstogo >= 6) {
                bitstogo -= 6;
                *dst++ = base64chars[(bitstogo ? (bitbuf >> bitstogo)
                                               : bitbuf)
                                     & 0x3F];
            }
        } while (utf16flag);
    }
    /* if in UTF-7 mode, finish in ASCII */
    if (utf7mode) {
        if (bitstogo) {
            *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
        }
        *dst++ = '-';
    }
    /* tie off string */
    *dst = '\0';
}
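
The sample code above leaves the URL parameters to the caller.  As a
complement, the following minimal sketch shows one way the
";UIDVALIDITY=" and "/;UID=" parameters of section 12 might be pulled
out of a URL path before the mailbox portion is converted.  The
functions find_number_param() and mailbox_length() are hypothetical
helpers, not part of the sample code, and the path used in the usage
note is illustrative only.  As a simplification, the sketch matches
upper-case parameter names only (the grammar says strings are not case
sensitive) and performs no hex-decoding.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: looks for key (e.g. ";UIDVALIDITY=") in path and
 * parses the nz_number that follows it.  Returns 1 and stores the
 * number in *value if found, 0 otherwise.
 */
int find_number_param(const char *path, const char *key,
                      unsigned long *value)
{
    const char *p = strstr(path, key);

    if (p == NULL)
        return 0;
    p += strlen(key);
    /* nz_number starts with a non-zero digit */
    if (*p < '1' || *p > '9')
        return 0;
    *value = strtoul(p, NULL, 10);
    return 1;
}

/*
 * Hypothetical helper: length of the enc_mailbox portion of path, i.e.
 * everything before the first ";UIDVALIDITY=", "/;UID=" or "?".
 */
size_t mailbox_length(const char *path)
{
    static const char *stops[] = { ";UIDVALIDITY=", "/;UID=", "?" };
    size_t len = strlen(path);
    size_t i;

    for (i = 0; i < sizeof (stops) / sizeof (stops[0]); ++i) {
        const char *p = strstr(path, stops[i]);
        if (p != NULL && (size_t) (p - path) < len)
            len = (size_t) (p - path);
    }
    return len;
}
```

For a path such as "gray-council;UIDVALIDITY=385759045/;UID=20", the
first mailbox_length() octets are the encoded mailbox, which could then
be copied out and passed to URLtoMailbox().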