idnits 2.17.1 draft-newman-url-imap-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-23) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 17 instances of too long lines in the document, the longest one being 7 characters in excess of 72. ** The abstract seems to contain references ([IMAP4]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? RFC 2119 keyword, line 77: '...ogram interpreting the IMAP URL SHOULD...' RFC 2119 keyword, line 82: '...ated, the client SHOULD request approp...' RFC 2119 keyword, line 85: '... SHOULD be obtained from the mecha...' RFC 2119 keyword, line 88: '...cates that the client SHOULD select an...' RFC 2119 keyword, line 89: '...n mechanism. It MAY use any mechanism...' (23 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 1997) is 9840 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'IMAP4' on line 474 looks like a reference -- Missing reference section? 'KEYWORDS' on line 423 looks like a reference -- Missing reference section? 'MIME' on line 428 looks like a reference -- Missing reference section? 'BASIC-URL' on line 446 looks like a reference -- Missing reference section? 'REL-URL' on line 433 looks like a reference -- Missing reference section? 'HTTP' on line 412 looks like a reference -- Missing reference section? 'UTF8' on line 438 looks like a reference -- Missing reference section? 'IMAIL' on line 418 looks like a reference -- Missing reference section? 'IMAP-AUTH' on line 407 looks like a reference -- Missing reference section? '256' on line 604 looks like a reference -- Missing reference section? '6' on line 512 looks like a reference Summary: 11 errors (**), 0 flaws (~~), 1 warning (==), 14 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group C. Newman 3 Internet Draft: IMAP URL Scheme Innosoft 4 Document: draft-newman-url-imap-08.txt May 1997 5 Expires in six months 7 IMAP URL Scheme 9 Status of this memo 11 This document is an Internet Draft. Internet Drafts are working 12 documents of the Internet Engineering Task Force (IETF), its Areas, 13 and its Working Groups. 
Note that other groups may also distribute
14 working documents as Internet Drafts.

16 Internet Drafts are draft documents valid for a maximum of six
17 months. Internet Drafts may be updated, replaced, or obsoleted by
18 other documents at any time. It is not appropriate to use Internet
19 Drafts as reference material or to cite them other than as a
20 ``working draft'' or ``work in progress''.

22 To learn the current status of any Internet-Draft, please check the
23 1id-abstracts.txt listing contained in the Internet-Drafts Shadow
24 Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu, or
25 munnari.oz.au.

27 A revised version of this draft document will be submitted to the
28 RFC editor as a Proposed Standard for the Internet Community.
29 Discussion and suggestions for improvement are requested. This
30 document will expire six months after publication. Distribution of
31 this draft is unlimited.

33 Abstract

35 IMAP [IMAP4] is a rich protocol for accessing remote message
36 stores. It provides an ideal mechanism for accessing public
37 mailing list archives as well as private and shared message stores.
38 This document defines a URL scheme for referencing objects on an
39 IMAP server.

41 1. Conventions used in this document

43 The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
44 in this document are to be interpreted as defined in "Key words for
45 use in RFCs to Indicate Requirement Levels" [KEYWORDS].

47 2. IMAP scheme

49 The IMAP URL scheme is used to designate IMAP servers, mailboxes,
50 messages, MIME bodies [MIME], and search programs on Internet hosts
51 accessible using the IMAP protocol.

53 The IMAP URL follows the common Internet scheme syntax as defined
54 in RFC 1738 [BASIC-URL] except that clear text passwords are not
55 permitted. If :<port> is omitted, the port defaults to 143.

57 An IMAP URL takes one of the following forms:

59      imap://<iserver>/
60      imap://<iserver>/<enc_list_mailbox>;TYPE=<list_type>
61      imap://<iserver>/<enc_mailbox>[uidvalidity][?<enc_search>]
62      imap://<iserver>/<enc_mailbox>[uidvalidity]<iuid>[isection]

64 The first form is used to refer to an IMAP server, the second form
65 refers to a list of mailboxes, the third form refers to the
66 contents of a mailbox or a set of messages resulting from a search,
67 and the final form refers to a specific message or message part.

69 3. IMAP User Name and Authentication Mechanism

71 A user name and/or authentication mechanism may be supplied. They
72 are used in the "LOGIN" or "AUTHENTICATE" commands after making the
73 connection to the IMAP server. If no user name or authentication
74 mechanism is supplied, the user name "anonymous" is used with the
75 "LOGIN" command and the password is supplied as the Internet e-mail
76 address of the end user accessing the resource. If the URL
77 does not supply a user name, the program interpreting the IMAP URL SHOULD
78 request one from the user if necessary.

80 An authentication mechanism can be expressed by adding
81 ";AUTH=<enc_auth_type>" to the end of the user name. When such an
82 <enc_auth_type> is indicated, the client SHOULD request appropriate
83 credentials from that mechanism and use the "AUTHENTICATE" command
84 instead of the "LOGIN" command. If no user name is specified, one
85 SHOULD be obtained from the mechanism or requested from the user as
86 appropriate.

88 The string ";AUTH=*" indicates that the client SHOULD select an
89 appropriate authentication mechanism. It MAY use any mechanism
90 listed in the CAPABILITY command or use an out of band security
91 service resulting in a PREAUTH connection. If no user name is
92 specified and no appropriate authentication mechanisms are
93 available, the client SHOULD fall back to anonymous login as
94 described above. This allows a URL which grants read-write access
95 to authorized users, and read-only anonymous access to other users.

97 Note that if unsafe or reserved characters such as " " or ";" are
98 present in the user name or authentication mechanism, they MUST be
99 encoded as described in RFC 1738 [BASIC-URL].

101 4. IMAP server

103 An IMAP URL referring to an IMAP server has the following form:

105      imap://<iserver>/

107 A program interpreting this URL would issue the standard set of
108 commands it uses to present a view of the contents of an IMAP
109 server. It is likely to be semantically equivalent to one of the
110 following URLs:

112      imap://<iserver>/;TYPE=LIST
113      imap://<iserver>/;TYPE=LSUB

115 The program interpreting this URL SHOULD use the LSUB form if it
116 supports mailbox subscriptions.

118 5. Lists of mailboxes

120 An IMAP URL referring to a list of mailboxes has the following
121 form:

123      imap://<iserver>/<enc_list_mailbox>;TYPE=<list_type>

125 The <list_type> may be either "LIST" or "LSUB", and is case
126 insensitive. The field ";TYPE=<list_type>" MUST be included.

128 The <enc_list_mailbox> is any argument suitable for the
129 list_mailbox field of the IMAP [IMAP4] LIST or LSUB commands. The
130 field <enc_list_mailbox> may be omitted, in which case the program
131 interpreting the IMAP URL may use "*" or "%" as the
132 <enc_list_mailbox>. The program SHOULD use "%" if it supports a
133 hierarchical view, otherwise it SHOULD use "*".

135 Note that if unsafe or reserved characters such as " " or "%" are
136 present in <enc_list_mailbox> they MUST be encoded as described in
137 RFC 1738 [BASIC-URL]. If the character "/" is present in
138 enc_list_mailbox, it SHOULD NOT be encoded.

140 6. Lists of messages

142 An IMAP URL referring to a list of messages has the following form:

144      imap://<iserver>/<enc_mailbox>[uidvalidity][?<enc_search>]

146 The <enc_mailbox> field is used as the argument to the IMAP4
147 "SELECT" command. Note that if unsafe or reserved characters such
148 as " ", ";", or "?" are present in <enc_mailbox> they MUST be
149 encoded as described in RFC 1738 [BASIC-URL]. If the character "/"
150 is present in enc_mailbox, it SHOULD NOT be encoded.

152 The [uidvalidity] field is optional. If it is present, it MUST be
153 the argument to the IMAP4 UIDVALIDITY status response at the time
154 the URL was created. This SHOULD be used by the program
155 interpreting the IMAP URL to determine if the URL is stale.

157 The [?<enc_search>] field is optional.
If it is not present, the
158 contents of the mailbox SHOULD be presented by the program
159 interpreting the URL. If it is present, it SHOULD be used as the
160 arguments following an IMAP4 SEARCH command with unsafe characters
161 such as " " (which are likely to be present in the <enc_search>)
162 encoded as described in RFC 1738 [BASIC-URL].

164 7. A specific message or message part

166 An IMAP URL referring to a specific message or message part has the
167 following form:

169      imap://<iserver>/<enc_mailbox>[uidvalidity]<iuid>[isection]

171 The <enc_mailbox> and [uidvalidity] are as defined above.

173 If [uidvalidity] is present in this form, it SHOULD be used by the
174 program interpreting the URL to determine if the URL is stale.

176 The <iuid> refers to an IMAP4 message UID, and SHOULD be used as the
177 argument to the IMAP4 "UID FETCH" command.

179 The [isection] field is optional. If not present, the URL refers
180 to the entire Internet message as returned by the IMAP command "UID
181 FETCH <uid> BODY.PEEK[]". If present, the URL refers to the object
182 returned by a "UID FETCH <uid> BODY.PEEK[<section>]" command. The
183 type of the object may be determined with a "UID FETCH <uid>
184 BODYSTRUCTURE" command and locating the appropriate part in the
185 resulting BODYSTRUCTURE. Note that unsafe characters in [isection]
186 MUST be encoded as described in [BASIC-URL].

188 8. Relative IMAP URLs

190 Relative IMAP URLs are permitted and are resolved according to the
191 rules defined in RFC 1808 [REL-URL] with one exception. In IMAP
192 URLs, parameters are treated as part of the normal path with
193 respect to relative URL resolution. This is believed to be the
194 behavior of the installed base and is likely to be documented in a
195 future revision of the relative URL specification.

197 The following observations are also important:

199 The <iauth> grammar element is considered part of the user name for
200 purposes of resolving relative IMAP URLs. This means that unless a
201 new login/server specification is included in the relative URL, the
202 authentication mechanism is inherited from a base IMAP URL.

204 URLs always use "/" as the hierarchy delimiter for the purpose of
205 resolving paths in relative URLs. IMAP4 permits the use of any
206 hierarchy delimiter in mailbox names. For this reason, relative
207 mailbox paths will only work if the mailbox uses "/" as the
208 hierarchy delimiter. Relative URLs may be used on mailboxes which
209 use other delimiters, but in that case, the entire mailbox name
210 MUST be specified in the relative URL or inherited as a whole from
211 the base URL.

213 The base URL for a list of mailboxes or messages which was referred
214 to by an IMAP URL is always the referring IMAP URL itself. The
215 base URL for a message or message part which was referred to by an
216 IMAP URL may be more complicated to determine. The program
217 interpreting the relative URL will have to check the headers of the
218 MIME entity and any enclosing MIME entities in order to locate the
219 "Content-Base" and "Content-Location" headers.
These headers are
220 used to determine the base URL as defined in [HTTP]. For example,
221 if the referring IMAP URL contains a "/;SECTION=1.2" parameter,
222 then the MIME headers for section 1.2, for section 1, and for the
223 enclosing message itself SHOULD be checked in that order for
224 "Content-Base" or "Content-Location" headers.

226 9. Multinational Considerations

228 IMAP4 [IMAP4] section 5.1.3 includes a convention for encoding
229 non-US-ASCII characters in IMAP mailbox names. Because this
230 convention is private to IMAP, it is necessary to convert IMAP's
231 encoding to one that can be more easily interpreted by a URL
232 display program. For this reason, IMAP's modified UTF-7 encoding
233 for mailboxes MUST be converted to UTF-8 [UTF8]. Since 8-bit
234 characters are not permitted in URLs, the UTF-8 characters are
235 encoded as required by the URL specification [BASIC-URL]. Sample
236 code is included in Appendix A to demonstrate this conversion.

238 10. Examples

240 The following examples demonstrate how an IMAP4 client program
241 might translate various IMAP4 URLs into a series of IMAP4 commands.
242 Commands sent from the client to the server are prefixed with "C:",
243 and responses sent from the server to the client are prefixed with
244 "S:".
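As a rough, non-normative illustration of this translation, the sketch below formats the SELECT and UID FETCH commands for the message form of the URL. It assumes the mailbox, UID and section have already been parsed out of the URL and decoded. The function name format_fetch_commands, the fixed tags A002 and A003, and the decision to reject mailbox names that would need IMAP quoting are assumptions made for this sketch and are not defined by this draft. The section string is passed through without validation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not defined by this draft.  Formats the SELECT
 * and UID FETCH commands for a message URL whose components have
 * already been parsed and decoded.  section may be NULL when the URL
 * has no /;SECTION= part.  Returns the number of characters written,
 * or -1 if the mailbox is rejected or out is too small.
 */
int format_fetch_commands(char *out, size_t outlen, const char *mailbox,
                          unsigned long uid, const char *section)
{
    int n;

    assert(out != NULL && mailbox != NULL);

    /* Assumption: reject empty names and names containing characters
     * that would obviously require quoting or a literal. */
    if (mailbox[0] == '\0' || strpbrk(mailbox, " \r\n\"\\(){%*") != NULL)
        return -1;

    n = snprintf(out, outlen,
                 "A002 SELECT %s\r\n"
                 "A003 UID FETCH %lu BODY.PEEK[%s]\r\n",
                 mailbox, uid, section != NULL ? section : "");
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return n;
}
```

Called with the mailbox "gray-council", UID 20 and section "1.2", the buffer receives a SELECT of gray-council followed by UID FETCH 20 BODY.PEEK[1.2], each terminated by CRLF. A real client would also log in first and check UIDVALIDITY, which this sketch leaves out.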
246 The URL:

248

250 Results in the following client commands:

252
253      C: A001 LOGIN ANONYMOUS sheridan@babylon5.org
254      C: A002 SELECT gray-council
255
256      C: A003 UID FETCH 20 BODY.PEEK[]

258 The URL:

260

262 Results in the following client commands:

264
265
266      C: A001 LOGIN MICHAEL zipper
267      C: A002 LIST "" users.*

269 The URL:

271

273 Results in the following client commands:

275
276      C: A001 LOGIN ANONYMOUS bester@psycop.psy.earth
277      C: A002 SELECT ~peter/&ZeVnLIqe-/&U,BTFw-
278

280 The URL:

282

284 Results in the following client commands:

286
287      C: A001 AUTHENTICATE KERBEROS_V4
288
289      C: A002 SELECT gray-council
290      C: A003 UID FETCH 20 BODY.PEEK[1.2]

292 If the following relative URL is located in that body part:

294      <;section=1.4>

296 This could result in the following client commands:

298      C: A004 UID FETCH 20 (BODY.PEEK[1.2.MIME]
299            BODY.PEEK[1.MIME]
300            BODY.PEEK[HEADER.FIELDS (Content-Base Content-Location)])
301

303      C: A005 UID FETCH 20 BODY.PEEK[1.4]

305 The URL:

307

309 Could result in the following:

311
312      C: A001 CAPABILITY
313      S: * CAPABILITY IMAP4rev1 AUTH=GSSAPI
314      S: A001 OK
315      C: A002 AUTHENTICATE GSSAPI
316
317      S: A002 OK user lennier authenticated
318      C: A003 SELECT "gray council"
319      ...
320      C: A004 SEARCH SUBJECT shadows
321      S: * SEARCH 8 10 13 14 15 16
322      S: A004 OK SEARCH completed
323      C: A005 FETCH 8,10,13:16 ALL
324      ...

326 NOTE: In this final example, the client has implementation dependent
327 choices. The authentication mechanism could be anything, including
328 PREAUTH. And the final FETCH command could fetch more or less
329 information about the messages, depending on what it wishes to display
330 to the user.

332 11. ABNF for IMAP URL scheme

334 This uses ABNF as defined in RFC 822 [IMAIL]. Terminals from the
335 BNF for IMAP [IMAP4] and URLs [BASIC-URL] are also used. Strings
336 are not case sensitive and free insertion of linear-white-space is
337 not permitted.
339 achar            = uchar / "&" / "=" / "~"
340                      ; see [BASIC-URL] for "uchar" definition

342 bchar            = achar / ":" / "@" / "/"

344 enc_auth_type    = 1*achar
345                      ; encoded version of [IMAP-AUTH] "auth_type"

347 enc_list_mailbox = 1*bchar
348                      ; encoded version of [IMAP4] "list_mailbox"

350 enc_mailbox      = 1*bchar
351                      ; encoded version of [IMAP4] "mailbox"

353 enc_search       = 1*bchar
354                      ; encoded version of search_program below

356 enc_section      = 1*bchar
357                      ; encoded version of section below

359 enc_user         = *achar
360                      ; encoded version of [IMAP4] "userid"

362 imapurl          = "imap://" iserver "/" [ icommand ]

364 iauth            = ";AUTH=" ( "*" / enc_auth_type )

366 icommand         = imailboxlist / ipath / isearch

368 imailboxlist     = [enc_list_mailbox] ";TYPE=" list_type

370 ipath            = enc_mailbox [uidvalidity] iuid [isection]

372 isearch          = enc_mailbox [ "?" enc_search ] [uidvalidity]

374 isection         = "/;SECTION=" enc_section
375 iserver          = [enc_user [ iauth ] "@"] hostport
376                      ; See [BASIC-URL] for "hostport" definition

378 iuid             = "/;UID=" nz_number
379                      ; See [IMAP4] for "nz_number" definition

381 list_type        = "LIST" / "LSUB"

383 search_program   = ["CHARSET" SPACE astring SPACE] 1#search_key
384                      ; IMAP4 literals may not be used
385                      ; See [IMAP4] for "astring" and "search_key"

387 section          = section_text / (nz_number *["." nz_number]
388                      ["." (section_text / "MIME")])
389                      ; See [IMAP4] for "section_text" and "nz_number"

391 uidvalidity      = ";UIDVALIDITY=" nz_number
392                      ; See [IMAP4] for "nz_number" definition

394 12. References

396 [BASIC-URL] Berners-Lee, Masinter, McCahill, "Uniform Resource
397 Locators (URL)", RFC 1738, CERN, Xerox Corporation, University of
398 Minnesota, December 1994.

400

402 [IMAP4] Crispin, M., "Internet Message Access Protocol - Version
403 4rev1", RFC 2060, University of Washington, December 1996.

405

407 [IMAP-AUTH] Myers, J., "IMAP4 Authentication Mechanism", RFC 1731,
408 Carnegie-Mellon University, December 1994.
410

412 [HTTP] Fielding, Gettys, Mogul, Frystyk, Berners-Lee, "Hypertext
413 Transfer Protocol -- HTTP/1.1", RFC 2068, UC Irvine, DEC, MIT/LCS,
414 January 1997.

416

418 [IMAIL] Crocker, "Standard for the Format of ARPA Internet Text
419 Messages", STD 11, RFC 822, University of Delaware, August 1982.

421

423 [KEYWORDS] Bradner, "Key words for use in RFCs to Indicate
424 Requirement Levels", RFC 2119, Harvard University, March 1997.

426

428 [MIME] Freed, N., Borenstein, N., "Multipurpose Internet Mail
429 Extensions", RFC 2045, Innosoft, First Virtual, November 1996.

431

433 [REL-URL] Fielding, "Relative Uniform Resource Locators", RFC 1808,
434 UC Irvine, June 1995.

436

438 [UTF8] Yergeau, F. "UTF-8, a transformation format of Unicode and
439 ISO 10646", RFC 2044, Alis Technologies, October 1996.

441

443 13. Security Considerations

445 Security considerations discussed in the IMAP specification [IMAP4]
446 and the URL specification [BASIC-URL] are relevant.

448 Client authors SHOULD be careful when selecting an authentication
449 mechanism if ";AUTH=*" is specified without a user name. Clients
450 SHOULD NOT fall back to the "LOGIN" command with a user other than
451 "anonymous". A client which violates this rule is vulnerable to an
452 active attacker which spoofs the server and does not declare
453 support for any AUTHENTICATE mechanisms.

455 Many email clients store the plain text password for later use
456 after logging into an IMAP server. Such clients MUST NOT use a
457 stored password in response to an IMAP URL without explicit
458 permission from the user to supply that password to the specified
459 host name.

461 14. Author's Address

463 Chris Newman
464 Innosoft International, Inc.
465 1050 East Garvey Ave. South
466 West Covina, CA 91790 USA

468 Email: chris.newman@innosoft.com

470 Appendix A.
Sample code

472 Here is sample C source code to convert between URL paths and IMAP
473 mailbox names, taking into account mapping between IMAP's modified UTF-7
474 [IMAP4] and hex-encoded UTF-8 which is more appropriate for URLs. This
475 code has not been rigorously tested nor does it necessarily behave
476 reasonably with invalid input, but it should serve as a useful example.
477 This code just converts the mailbox portion of the URL and does not deal
478 with parameters, query or server components of the URL.

480 #include <ctype.h>
481 #include <string.h>

483 /* hexadecimal lookup table */
484 static char hex[] = "0123456789ABCDEF";

486 /* URL unsafe printable characters */
487 static char urlunsafe[] = " \"#%&+:;<=>?@[\\]^`{|}";

489 /* UTF7 modified base64 alphabet */
490 static char base64chars[] =
491   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
492 #define UNDEFINED 64

494 /* UTF16 definitions */
495 #define UTF16MASK       0x03FFUL
496 #define UTF16SHIFT      10
497 #define UTF16HIGHSTART  0xD800UL
498 #define UTF16HIGHEND    0xDBFFUL
499 #define UTF16LOSTART    0xDC00UL
500 #define UTF16LOEND      0xDFFFUL

502 /* Convert an IMAP mailbox to a URL path
503  *  dst needs to have roughly 4 times the storage space of src
504  *    Hex encoding can triple the size of the input
505  *    UTF-7 can be slightly denser than UTF-8
506  *     (worst case: 8 octets UTF-7 becomes 9 octets UTF-8)
507  */
508 void MailboxToURL(char *dst, char *src)
509 {
510     unsigned char c, i, bitcount;
511     unsigned long ucs4, utf16, bitbuf;
512     unsigned char base64[256], utf8[6];

514     /* initialize modified base64 decoding table */
515     memset(base64, UNDEFINED, sizeof (base64));
516     for (i = 0; i < sizeof (base64chars); ++i) {
517         base64[base64chars[i]] = i;

519     }

521     /* loop until end of string */
522     while (*src != '\0') {
523         c = *src++;
524         /* deal with literal characters and &- */
525         if (c != '&' || *src == '-') {
526             if (c < ' ' || c > '~' || strchr(urlunsafe, c) != NULL) {
527                 /* hex encode if necessary */
528                 dst[0] = '%';
529                 dst[1] = hex[c >> 4];
530                 dst[2] = hex[c & 0x0f];
531                 dst += 3;
532             } else {
533                 /* encode literally */
534                 *dst++ = c;
535             }
536             /* skip over the '-' if this is an &- sequence */
537             if (c == '&') ++src;
538         } else {
539             /* convert modified UTF-7 -> UTF-16 -> UCS-4 -> UTF-8 -> HEX */
540             bitbuf = 0;
541             bitcount = 0;
542             ucs4 = 0;
543             while ((c = base64[(unsigned char) *src]) != UNDEFINED) {
544                 ++src;
545                 bitbuf = (bitbuf << 6) | c;
546                 bitcount += 6;
547                 /* enough bits for a UTF-16 character? */
548                 if (bitcount >= 16) {
549                     bitcount -= 16;
550                     utf16 = (bitcount ? bitbuf >> bitcount : bitbuf) & 0xffff;
551                     /* convert UTF16 to UCS4 (surrogate pairs start at 0x10000) */
552                     if (utf16 >= UTF16HIGHSTART && utf16 <= UTF16HIGHEND) {
553                         ucs4 = ((utf16 & UTF16MASK) << UTF16SHIFT) + 0x10000UL;
554                         continue;
555                     } else if (utf16 >= UTF16LOSTART && utf16 <= UTF16LOEND) {
556                         ucs4 |= utf16 & UTF16MASK;
557                     } else {
558                         ucs4 = utf16;
559                     }
560                     /* convert UTF-16 range of UCS4 to UTF-8 */
561                     if (ucs4 <= 0x7fUL) {
562                         utf8[0] = ucs4;
563                         i = 1;
564                     } else if (ucs4 <= 0x7ffUL) {
565                         utf8[0] = 0xc0 | (ucs4 >> 6);
566                         utf8[1] = 0x80 | (ucs4 & 0x3f);
567                         i = 2;
568                     } else if (ucs4 <= 0xffffUL) {
569                         utf8[0] = 0xe0 | (ucs4 >> 12);
570                         utf8[1] = 0x80 | ((ucs4 >> 6) & 0x3f);
571                         utf8[2] = 0x80 | (ucs4 & 0x3f);
572                         i = 3;
573                     } else {
574                         utf8[0] = 0xf0 | (ucs4 >> 18);
575                         utf8[1] = 0x80 | ((ucs4 >> 12) & 0x3f);
576                         utf8[2] = 0x80 | ((ucs4 >> 6) & 0x3f);
577                         utf8[3] = 0x80 | (ucs4 & 0x3f);
578                         i = 4;
579                     }
580                     /* convert utf8 to hex */
581                     for (c = 0; c < i; ++c) {
582                         dst[0] = '%';
583                         dst[1] = hex[utf8[c] >> 4];
584                         dst[2] = hex[utf8[c] & 0x0f];
585                         dst += 3;
586                     }
587                 }
588             }
589             /* skip over trailing '-' in modified UTF-7 encoding */
590             if (*src == '-') ++src;
591         }
592     }
593     /* terminate destination string */
594     *dst = '\0';
595 }

597 /* Convert hex coded UTF-8 URL path to modified UTF-7 IMAP mailbox
598  *  dst should be about twice the length of src to deal with non-hex coded URLs
599  */
600 void URLtoMailbox(char *dst, char *src)
601 {
602     unsigned int utf8pos, utf8total, i, c, utf7mode, bitstogo, utf16flag;
603     unsigned long ucs4 = 0, bitbuf = 0;
604     unsigned char hextab[256];

606     /* initialize hex lookup table */
607     memset(hextab, 0, sizeof (hextab));
608     for (i = 0; i < sizeof (hex); ++i) {
609         hextab[hex[i]] = i;
610         if (isupper(hex[i])) hextab[tolower(hex[i])] = i;
611     }

613     utf7mode = 0;
614     utf8total = 0;
615     bitstogo = 0;
616     while ((c = (unsigned char) *src) != '\0') {
617         ++src;
618         /* undo hex-encoding */
619         if (c == '%' && src[0] != '\0' && src[1] != '\0') {
620             c = (hextab[(unsigned char) src[0]] << 4) | hextab[(unsigned char) src[1]];
621             src += 2;
622         }
623         /* normal character? */
624         if (c >= ' ' && c <= '~') {
625             /* switch out of UTF-7 mode, discarding any leftover bit count */
626             if (utf7mode) {
627                 if (bitstogo) {
628                     *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
629                 }
630                 *dst++ = '-';
631                 utf7mode = 0; bitstogo = 0;
632             }
633             *dst++ = c;
634             /* encode '&' as '&-' */
635             if (c == '&') {
636                 *dst++ = '-';
637             }
638             continue;
639         }
640         /* switch to UTF-7 mode */
641         if (!utf7mode) {
642             *dst++ = '&';
643             utf7mode = 1;
644         }
645         /* Encode US-ASCII characters as themselves */
646         if (c < 0x80) {
647             ucs4 = c;
648             utf8total = 1;
649         } else if (utf8total) {
650             /* save UTF8 bits into UCS4 */
651             ucs4 = (ucs4 << 6) | (c & 0x3FUL);
652             if (++utf8pos < utf8total) {
653                 continue;
654             }
655         } else {
656             utf8pos = 1;
657             if (c < 0xE0) {
658                 utf8total = 2;
659                 ucs4 = c & 0x1F;
660             } else if (c < 0xF0) {
661                 utf8total = 3;
662                 ucs4 = c & 0x0F;

664             } else {
665                 /* NOTE: can't convert UTF8 sequences longer than 4 */
666                 utf8total = 4;
667                 ucs4 = c & 0x07;
668             }
669             continue;
670         }
671         /* loop to split ucs4 into two utf16 chars if necessary */
672         utf8total = 0;
673         do {
674             if (ucs4 > 0xffffUL) {
675                 bitbuf = (bitbuf << 16) | (((ucs4 - 0x10000UL) >> UTF16SHIFT)
676                                            + UTF16HIGHSTART);
677                 ucs4 = (ucs4 & UTF16MASK) + UTF16LOSTART;
678                 utf16flag = 1;
679             } else {
680                 bitbuf = (bitbuf << 16) | ucs4;
681                 utf16flag = 0;
682             }
683             bitstogo += 16;
684             /* spew out base64 */
685             while (bitstogo >= 6) {
686                 bitstogo -= 6;
687                 *dst++ = base64chars[(bitstogo ? (bitbuf >> bitstogo) : bitbuf)
688                                      & 0x3F];
689             }
690         } while (utf16flag);
691     }
692     /* if in UTF-7 mode, finish in ASCII */
693     if (utf7mode) {
694         if (bitstogo) {
695             *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
696         }
697         *dst++ = '-';
698     }
699     /* tie off string */
700     *dst = '\0';
701 }