Network Working Group                                 Charles H. Lindsey
Internet-Draft                                  University of Manchester
                                                                May 2000

                   Signed Headers in Mail and Netnews

                   draft-lindsey-usefor-signed-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as "work
   in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The huge growth of Netnews/Usenet in recent years has been
   accompanied by many attempts to abuse the system by various forms
   of malpractice, particularly the forging of various headers,
   causing it to appear that articles came from parties other than
   those that actually injected them or conveyed some Approval that
   the real poster was not entitled to give. Insofar as Netnews is
   regularly gatewayed to and from Email systems, these problems also
   extend to the Email domain.

   This document provides a cryptographically secure means whereby it
   can be established beyond doubt that relevant headers of a Netnews
   article or an Email message have not been tampered with in
   transit, and that they were indeed originated by the person
   purporting to have done so. It seeks to supplement, rather than to
   supplant, the existing protocols for signing the bodies of
   articles and messages.

[This proposal arises from the activities of the Usenet Format Working
Group, which is charged with updating the Netnews standards. Comments
are invited, preferably sent to the mailing list of the Group at
usenet-format@landfield.com.]

1. Introduction

[Remarks enclosed in square brackets and aligned with the left margin,
such as this one, are not part of this draft, but are editorial notes to
explain matters amongst ourselves, or to point out alternatives, or to
indicate work yet to be done.]

1.1. Scope and Objectives

[This is a Draft of a Draft, for discussion within the USEFOR mailing
list until the best format for putting it forward has been decided on.
It also needs to be decided whether it should be aimed towards an
Experimental Protocol, the Standards track, or as an integral part of
[USEFOR]]

   "Netnews" is a set of protocols [USEFOR] that enables news "articles"
   to be broadcast to potentially-large audiences, using a flooding
   algorithm which propagates copies throughout a network of
   participating hosts. The huge growth in the use of this protocol in
   recent years has been accompanied by many attempts to abuse the
   system by causing it to appear that articles came from parties other
   than those that actually injected them, or that they had been posted
   with some Approval that the real poster was not entitled to give, or
   that they otherwise appeared to be different from what they actually
   were.
   The effects of such abuse are particularly acute in the case
   of "Control" articles which can cause newsgroups to be created or
   removed on hosts worldwide, or which can cause unauthorized deletion
   of articles already received and stored on such hosts. It is
   therefore considered essential to provide a cryptographically secure
   means whereby it can be established beyond doubt that the source and
   structure of articles are exactly as they purport to be.

   "Electronic Mail" is a system for routing "messages" [MESSFOR]
   between individual computer users, usually on a one-to-one basis. The
   formats of Email messages and News articles have deliberately been
   made to be similar, so that messages may be gatewayed to news systems
   and vice-versa. In order that the same protection may be provided
   end-to-end for articles passing through such gateways, the protocol
   described here has been designed so that it will also work in the
   Email environment. If it should be found to have further applications
   in the Email environment, then that would be an added bonus.

   An existing experimental protocol "pgpverify" [PGPVERIFY] is already
   in widespread use for authenticating Control messages for creating
   and removing newsgroups within Usenet, and has proven itself very
   successful in mitigating the effects of malicious attacks against the
   integrity of Usenet. This present proposal is largely based upon
   pgpverify; however, pgpverify is unsuitable for more widespread use
   as it stands because it is unable to cope with folded headers and
   with the changes that mail messages in particular are likely to
   undergo during transport. A second similar experimental protocol
   "pgpmoose" [PGPMOOSE] is also currently in use for protecting
   moderated newsgroups against unauthorized postings.

   There also exist protocols for the cryptographic signature of bodies
   of articles, notably S/Mime and PGP/Mime [RFC 2015], and it is
   moreover common to sign such bodies using PGP alone without the use
   of Mime [RFC 2045] et seq at all. However, these protocols cannot, by
   their nature, be used to sign headers. Moreover, since the signature
   is applied after any Content-Transfer-Encoding [RFC 2045], it may be
   impossible to verify the signature if the Content-Transfer-Encoding
   should be changed as the message passes through a succession of sites
   during transport. Nevertheless, this present proposal does not
   attempt to usurp those protocols, but merely provides the means to
   sign headers, both of complete messages and of headers embedded in
   Mime messages and multiparts.

[This document has been designed to fit on top of the drafts currently
in preparation for Email [MESSFOR] and for News [USEFOR]. It is
expected that at least the Email draft will have progressed to the RFC
stage by the time the present document is complete, at which time all
references to [MESSFOR] in the present text will be replaced by
references to that RFC. If it is thought wise to issue this document
before [USEFOR] is complete, then that reference will have to be to [RFC
1036] instead.]

1.2. Notations and Conventions

1.2.1. Requirements notation

   Certain words, when capitalized, are used to define the significance
   of individual requirements. The key words "MUST", "SHOULD", "MAY" and
   the same words followed by "NOT" are to be interpreted as described
   in [RFC 2119].

1.2.2. Syntactic notation

   This document uses the Augmented Backus Naur Form described in [RFC
   2234]. A discussion of this is outside the bounds of this document,
   but it is expected that implementors will be able to quickly
   understand it with reference to the defining document.

1.3. Overview

   This proposal makes provision for Signed headers to be included in
   news articles and in Mime messages and multiparts. A Signed header
   provides a cryptographic signature over a named set of other headers,
   including lower level headers contained in Mime messages and
   multiparts below the current level. Such signatures can give
   assurance to a recipient who verifies them that those headers have
   not been changed or added to in transit, and/or that the article was
   indeed sent by its purported originator.

   The bodies of articles, Mime messages and multiparts are not directly
   included in the Signature. Rather, the intention is that each such
   body part should have a Content-MD5 (or similar) header computed for
   it, and that header should then be included in the Signature instead.

   There is also provision for Verified headers which may be added by
   agents that have checked a Signed header. Verified headers may
   themselves be included in further Signed headers; this may be
   especially useful in the case of gateways which find it necessary to
   change an article in ways that invalidate an original signature.

   Every effort has been made to ensure that signatures remain
   verifiable in spite of all reasonable (and even unreasonable) changes
   to which they may be subjected in transit. These include changes to
   the Content-Transfer-Encoding of body parts (a principal reason for
   including them only via the Content-MD5 header), changes in the order
   of headers and of their layout, and encodings and re-encodings of
   unusual character sets. This is to be achieved by converting headers
   into a canonical form before they are signed. New headers, yet to be
   invented, need pose no problem, and there is no commitment to any
   particular character set (provided header-names remain in US-ASCII,
   as at present).
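
   As an illustration of the Content-MD5 approach mentioned in this
   overview, the following is a minimal sketch in Python and not part of
   the protocol. It assumes the body is already available as octets in
   its canonical form, before any Content-Transfer-Encoding is applied,
   as [RFC 1864] requires; the function name content_md5 is
   hypothetical.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 value for an already-canonical body.

    Assumption: `body` holds the octets of the body part before any
    Content-Transfer-Encoding has been applied.
    """
    digest = hashlib.md5(body).digest()  # 16-octet MD5 digest
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # The value depends only on the decoded octets, so re-encoding the
    # body for transport (base64, quoted-printable) does not change it.
    print("Content-MD5: " + content_md5(b"Hello, world.\r\n"))
```

   Because the hash is taken over the decoded content, a signature that
   covers only this header is unaffected by transports that change the
   encoding of the body.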

   Provision is made for different protocols which may be required in
   the future. However, this proposal defines just one, recommended
   protocol, and it is not desirable that other protocols should be
   defined unless and until serious deficiencies in the existing ones
   have been revealed.

2. Basic Structure of Authenticating Headers

   A Signed or a Verified header may appear in the headers of a news
   article or a mail message, or in the headers of a Mime multipart
   sub-part or of a Mime message/rfc822 object (or indeed of any similar
   Mime object yet to be invented). In all cases, the term "current
   level" encompasses the entire set of headers in that same object.
   Where the headers at the current level include a "Content-Type:
   multipart/*" or "Content-Type: message/*" header, lower-level
   headers can arise within its sub-parts.

2.1. Syntax of the Signed header

      Signed         = "Signed" ["-" DIGIT9] ":" 1*SP header-ref-list
                          1*( ";" header-parameter ) CRLF
      DIGIT9         = %x31-39 ; 1..9
      header-ref-list= header-ref *( [CFWS] "," [CFWS] header-ref )
      header-ref     = [ "+" / "-" ] ( field-name *( "/" 1*DIGIT )
                          / "mail-standard" / "news-standard" )
      field-name     ; see [MESSFOR]
      CFWS           ; see [MESSFOR]
      FWS            ; see [MESSFOR]
      header-parameter
                     = attribute "=" value
      attribute      = signed-token / x-token
      signed-token   = "protocol" / "key" / "sig" /

      value          = token / quoted-string
      x-token        = [CFWS] <The two characters "X-" or "x-" followed,
                          with no intervening white space, by any token>
                          [CFWS]
      token          = [CFWS] 1* [CFWS]
      tspecials      = "(" / ")" / "<" / ">" / "@" /
                       "," / ";" / ":" / "\" / DQUOTE /
                       "/" / "[" / "]" / "?" / "="
      quoted-string  ; see [MESSFOR]
      protocol-value = ietf-token / x-token
      ietf-token     =
      key-id-value   = token
      signature-value= DQUOTE [FWS] 1*( btext [FWS] ) DQUOTE
      btext          = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "="
                          ; base 64 chars

   The header-parameters MUST include a "protocol" parameter and a "sig"
   parameter, of which the "sig" parameter MUST be the last parameter
   and MUST NOT be followed by CFWS (though it MAY be followed by WS).

      NOTE: The requirement for an explicit SP after the ":" is to
      ensure compatibility with the syntax of Netnews [USEFOR]; it is
      not strictly necessary for Email.

   The use of a DIGIT9 in the Signed header allows for 10 distinct such
   headers at any one level. This is more than sufficient for the
   intended usage (it would be most unusual to get beyond Signed-2)
   whilst still permitting implementations to check header-names against
   a fixed list of valid names. There MUST NOT be more than one Signed
   header with no DIGIT9, or the same DIGIT9, within one set of headers.

   The header-ref-list indicates those header-refs, at or below the
   current level, which are covered by the signature. The ordering of
   this list is significant. A header-ref prefixed by a "+", or not
   prefixed at all, indicates a header-ref to be added to the list
   defined by those preceding it, and a header-ref prefixed by "-"
   indicates a header-ref to be removed from the header-refs defined by
   the list preceding it.

   Tokens are case-insensitive. "Foobar" is the preferred protocol
   defined by this proposal. It is desirable to keep the number of
   recognized protocols to an absolute minimum, and it is anticipated
   that further protocols would only be needed in the event that serious
   cryptographic deficiencies were to be found in the existing ones.
[Obviously, "foobar" is just a placeholder for whatever name is finally
chosen.]
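
   The way the "+" and "-" prefixes in a header-ref-list combine, as
   described above, can be sketched as follows. This is an illustrative
   Python sketch under stated assumptions, not a normative algorithm:
   the function name reduce_header_refs is hypothetical, comments and
   folding (CFWS) are not handled, names are compared
   case-insensitively, and the first occurrence of a duplicate is the
   one kept. The macro expansions are the "news-standard" and
   "mail-standard" lists defined in this section.

```python
# Macro expansions as listed in this section of the draft.
NEWS_STANDARD = [
    "Date", "Newsgroups", "Distribution", "Message-ID", "From",
    "Reply-To", "Followup-To", "References", "Subject", "Keywords",
    "Control", "Content-Type", "Content-ID",
]
MAIL_STANDARD = [
    "Date", "From", "Reply-To", "To", "Cc", "In-Reply-To",
    "References", "Subject", "Keywords", "Content-Type", "Content-ID",
]
MACROS = {"news-standard": NEWS_STANDARD, "mail-standard": MAIL_STANDARD}


def reduce_header_refs(header_ref_list: str) -> list:
    """Expand macros, apply "-" removals in order, and drop duplicates.

    Assumption: the input is a bare comma-separated list without
    comments; a real parser would have to accept CFWS around commas.
    """
    reduced = []
    for raw in header_ref_list.split(","):
        ref = raw.strip()
        if not ref:
            continue
        sign = "+"
        if ref[0] in "+-":
            sign, ref = ref[0], ref[1:].strip()
        for name in MACROS.get(ref.lower(), [ref]):
            key = name.lower()
            if sign == "-":
                # Remove the name from the part of the list built so far.
                reduced = [r for r in reduced if r.lower() != key]
            elif all(r.lower() != key for r in reduced):
                reduced.append(name)
    return reduced


if __name__ == "__main__":
    print(reduce_header_refs("news-standard, -Keywords, Approved"))
```

   Processing the list strictly from left to right is what makes its
   ordering significant: a "-" only affects header-refs that precede it.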

   The "key" parameter identifies the key used to generate the signature
   in a notation dependent upon the protocol (but commonly "0x" followed
   by hexadecimal digits). The CFWS following it MAY include a comment
   containing an identification of the person or entity which created
   the signature.

   The header-ref "news-standard" is a macro representing a set of
   common headers that SHOULD normally be included when signing the
   headers of a Netnews article, and is defined as the list

      Date, Newsgroups, Distribution, Message-ID, From, Reply-To,
      Followup-To, References, Subject, Keywords, Control, Content-Type,
      Content-ID

   The header-ref "mail-standard" performs the same function for mail
   messages, and is defined as the list

      Date, From, Reply-To, To, Cc, In-Reply-To, References, Subject,
      Keywords, Content-Type, Content-ID

      NOTE: Those lists have carefully excluded those headers (such as
      Sender and Content-Transfer-Encoding) which are liable to be
      added or altered by sites downstream from the one which
      generated the Signed header. If some header-ref in the list
      matches no header in the actual article, then it comprises an
      assertion that no such header was present when the article was
      signed. Headers which are routinely added to or altered as the
      article progresses through transports (such as Path, Received
      and Xref) SHOULD NOT be included in a header-ref-list, and
      neither should any header which appears twice in the set of
      headers. A header-ref prefixed by "-" may be used to exclude any
      header-ref from one of the standard lists.

2.2. Semantics of the Signed header

   Where the headers at the current level include a "Content-Type:
   multipart/*" or "Content-Type: message/*" header, lower-level headers
   within its sub-parts may be referenced as follows:

   (i)   A header-ref not postfixed by any "/ DIGIT"s references the
         header of that name, if any, at the current level. Header-refs
         are, for this purpose, considered as case-insensitive.

   (ii)  A header-ref of the form "XXXX/<n>" (or "XXXX/<n>/<m>..."),
         where <n> and <m> are numbers and the current level contains a
         "Content-Type: multipart/*" header, references the header that
         would be referenced by "XXXX" alone (or by "XXXX/<m>...") in the
         <n>th sub-part of that multipart, that sub-part now being
         regarded as the current level.

   (iii) A header-ref of the form "XXXX/1", where the current level
         contains a "Content-Type: message/rfc822" header (or any other
         message type which provides for its own set of headers),
         references the header that would be referenced by "XXXX" alone
         in that message object.

   (iv)  A header-ref that does not match up with multipart or message
         Content-Type headers as indicated above MUST NOT be used.

   (v)   For example "Content-MD5/3/2" references the Content-MD5 header
         of the second part of a multipart, which is itself the third
         part of a multipart established at the current level.

   A protocol, as established by this proposal or by any extension to
   it, comprises two parts: a "canonicalization algorithm" and a
   "cryptographic algorithm".

   The signature of a Signed header is constructed in accordance with a
   given header-ref-list as follows:

   1. A partial Signed header is constructed from that header-ref-list
      and such header-parameters (excluding "sig") as are required by
      the protocol, including at least a "protocol" parameter and, most
      likely, a "key" parameter identifying the cryptographic key used
      (possibly followed by a comment indicating the person or entity
      responsible), all followed by a CRLF.

   2. The header-ref-list is reduced by expanding the macros "mail-
      standard" and/or "news-standard", removing from the preceding part
      of the list any header-ref prefixed by a "-", and removing any
      duplicates.

   3. The partial Signed header followed by all the headers referenced
      by the reduced header-ref-list (being headers at the current level
      or encapsulated within multiparts at any lower level and taken in
      their order within the header-ref-list) are concatenated to
      produce a list of headers to be signed.

   4. The list of headers to be signed is subjected to the
      canonicalization algorithm of the protocol to produce a
      canonicalized list.

   5. The canonicalized list is subjected to the cryptographic algorithm
      of the protocol to produce an octet stream representing the
      signature.

   6. If the octet stream as produced by the cryptographic algorithm is
      not already in the form of base64 characters, it is now encoded in
      base64 [RFC 2045]. A "sig" parameter is appended to the partial
      Signed header, its value consisting of a quoted-string containing
      the base64-encoded octet stream, split into convenient lines by
      the insertion of FWS.

   7. The Signed header thus constructed is then incorporated into the
      set of headers at the current level.

   The signature of a Signed header is verified as follows:

   1. The "sig" parameter is removed from the Signed header to give a
      partial Signed header.

   2-4. The corresponding steps of the process that constructed the
      header are taken, producing a canonicalized list.

   5. The public key identified according to the "protocol" parameter is
      now used by the cryptographic algorithm of that protocol to verify
      the signature. This may result in a simple pass-fail, or it may
      return some indication of the privileges (such as the authority to
      issue certain news control messages or to manage some mailing
      list) enjoyed by the owner of that key.

   The purpose of a Signed header is solely to establish that the
   headers referenced in it were present in an article when that article
   passed through the hands of the person or entity that generated the
   signature (and hence that it did indeed pass through those hands). It
   SHOULD NOT be taken as an endorsement of whatever is contained in the
   body of the article. If the contents of the body require such
   endorsement, then the body SHOULD be signed separately, for example
   in accordance with PGP/Mime [RFC 2015].

   Signatures will typically be generated by the originators of articles
   (to prove the origin), by moderators of moderated newsgroups (to
   testify to their Approved header), by managers of mailing lists, and
   by gateways. They SHOULD NOT be generated by intermediate transports
   and relayers through which the article might pass. This is intended
   to be an end-to-end protocol, and signatures SHOULD ONLY be added
   when new, hitherto unsigned, information is added. Moreover, the set
   of headers included within the signature SHOULD be no more than is
   necessary to achieve the security desired.

      NOTE: It will be observed that no provision has been made to
      include the bodies of an article or of its sub-parts in the
      signature.
      If (as will indeed often be the case) it is required
      to attest that the body (or sub-part) dispatched along with the
      set of headers is the same as the body that was delivered at the
      far end, then the proper procedure is to construct a Content-MD5
      header [RFC 1864] for that body (or sub-part) and to include
      that Content-MD5 amongst the headers that are signed. Doing it
      this way confers three advantages:
      a) The Content-MD5 header is constructed in such a way that it
      is immune to changes of Content-Transfer-Encoding to which an
      article, or its sub-parts, may be subjected during transport.
      b) Given that many user agents already routinely construct a
      Content-MD5 header, and verify it on receipt (a practice much to
      be commended), it should be possible to generate a Signed header
      without an extra pass through the entire body (especially in the
      common case where there are no sub-parts). This applies
      particularly in the case of additional signatures by moderators
      or mailing list managers, who may not need to examine the body
      at all.
      c) If a Content-MD5 header should fail to verify (perhaps
      because of some transmission error) the verification of a Signed
      header might still succeed, giving the recipient at least some
      partial information as to where any problem might lie.

      NOTE: If, at some future time, a Content-SHA1 header (or any
      similar header based upon a different hashing algorithm) should
      be invented, it could equally well be used for this purpose.

2.3. Syntax of the Verified header

      Verified       = "Verified" ["-" DIGIT9] ":" 1*SP name-addr
                          *( ";" header-parameter ) CRLF
      name-addr      ;
      attribute      =/ verified-token
      verified-token = "signature" / "hashcheck"
      signature-value= "good" / "FAILED"
      hashcheck-value= DQUOTE ( "good" / "FAILED" )
                          FWS header-ref-list DQUOTE

   The use of a DIGIT9 in the Verified header allows for 10 distinct
   such headers in one article. Each Verified header MUST match some
   Signed header with the same DIGIT9 in that same set of headers. There
   MAY be more than one Verified header with the same DIGIT9 within one
   set of headers (but observe that it would not then be possible to
   include those headers in a further Signed header).

   Tokens used for attributes are case-insensitive. The only parameters
   defined by this proposal are the "signature" and "hashcheck"
   parameters. Other parameters permitted by the syntax are for the
   purpose of future extensions to this proposal, and should be ignored
   except as defined in such extensions. The absence of a "signature"
   parameter should be taken as indicating that the verification had
   succeeded. The "hashcheck" parameter is to indicate that a Content-
   MD5 (or similar) header identified in the header-ref-list had been
   verified, or not as the case may be.
[Do we also want a "confidence" parameter for the verifier to express
his certainty of the identity of the original Signer, and if so, what
notation to use?]

2.4. Semantics of the Verified header

   The Verified header is intended to be added to an article by an agent
   through which the article passes, and serves as an assertion that the
   corresponding Signed header has been cryptographically verified by
   the person or entity identified in the name-addr (or otherwise if the
   "FAILED" value is present).
   The addr-spec contained in that name-addr
   MUST be a valid email address by which that person or entity may
   be contacted. The original Signed header MUST NOT be removed from the
   article. The Verified header (supposing it is the only one present
   with that particular DIGIT9, if any) MAY itself be included in a
   further Signed header added at the same time.

      NOTE: The purpose of a Verified header is to save the ultimate
      recipient the trouble of verifying the cryptographic signature
      himself (which can be time consuming, and may require knowledge
      of public keys not in his possession). Such a verification, if
      performed close to the ultimate recipient (such as by the news
      or mail server to which he connects) could normally be regarded
      as adequate evidence of authenticity, even if not signed itself.
      It would be hard (certainly in the case of Netnews) for a
      malicious interloper to cause such a verification to appear
      bearing the identity of the local server of each ultimate
      recipient.

      NOTE: The Verified header is also useful in the case that a
      gateway (or a moderator) makes some change to an article that
      renders an original Signed header invalid. Such a gateway can
      therefore certify that the original form of the Signed header
      had been verified, and can then re-sign the article (including
      his added Verified header). Likewise, a site (such as the
      originator's own server) with a well known public key can verify
      and re-sign an article whose originator's public key may be less
      well known. However, Verified headers SHOULD NOT be added as
      routine by other intermediate sites.

   It is normally the business of the reading agent of the ultimate
   recipient to check the correctness of a Content-MD5 or similar
   header.
   Nevertheless, an earlier agent that has added a Verified
   header and also checked such a Content-MD5 header MAY so indicate by
   including a "hashcheck" parameter.

3. Protocol definition

3.1. Requirements for canonicalization algorithms

   It is a sad fact of life that those implementing agents for handling
   Netnews and Email cannot resist the temptation to "improve" articles
   passed through them by rewriting headers that are thought not to
   conform to some real or supposed standard. Experience shows that, in
   the majority of cases, such tinkering makes matters worse rather than
   better, and for that reason [USEFOR] and, to a lesser extent,
   [MESSFOR] and [SMTP] try to forbid it, especially when perpetrated by
   relaying and transport agents (there are arguments in favour of
   allowing injecting agents and other agents close to the originator to
   do some limited cleanups, especially where it is impractical to
   return the article to the originator for correction).

   Furthermore, in the case of Email it is often required for the
   transport protocols to modify articles en route, most notably when
   articles containing octets with the 8th bit set have to be passed
   through a channel that permits only 7bit.

   It is a further sad fact of life that agents which make such changes
   are not going to go away just because some standard says so.
   Therefore, the canonicalization algorithm SHOULD endeavour to enable
   the headers of articles to be signed and verified in accordance with
   this proposal in spite of such tinkerings, insofar as they can be
   anticipated. The following list indicates some common practices which
   are worth detecting and protecting against.

   o  Headers may be re-folded to fit within some preferred overall
      line length. This may result in the creation of whitespace where
      none existed before.
   o  Trailing whitespace may be removed, and line endings changed
      to/from CRLF.
541 o Header-names may be converted into some usual canonical form 542 (e.g. "Mime-Version" into "MIME-Version"). 543 o Phrases, or parts thereof, may be converted to or from quoted- 544 strings. 545 o Date-times may be rewritten in some preferred format, or into 546 some preferred timezone. 547 o Headers with non-ASCII characters may be converted to or from the 548 notation defined in [RFC 2047]. 549 Observe that there is no canonical way to do this conversion and it 550 is, moreover, frequently performed in contexts where it is not 551 strictly allowed. 552 [Other contributions to this list welcomed.] 554 Since the slightest change to a canonicalization algorithm will 555 render it inoperable with previous versions, such an algorithm MUST 556 NOT be changed once it has been defined by this proposal, or any 557 extension thereof. In the event of some inadequacy being found, it 558 would be necessary to devise and standardize a new algorithm, a task 559 not to be undertaken lightly. For this reason, canonicalization 560 algorithms SHOULD be designed to cope with the widest possible range 561 of headers, including those not yet invented. Therefore, they SHOULD 562 NOT, so far as possible, rely on the ability to parse any particular 563 header. 565 NOTE: A canonicalization algorithm is required simply to produce 566 an octet stream for submission to the cryptographic algorithm. 567 That stream does not have to be human readable, nor does it have 568 to be a syntactically-correct header, nor does it have to be 569 convertible back into the original header, or into any correct 570 header at all. Insofar as many original headers can, in 571 principle, be mapped into the same octet stream, this in no way 572 reduces the utility of the algorithm, even though it might 573 enable conspiracy theorists to imagine, and even implement, 574 various sorts of covert channels for use by malicious 575 interlopers. 577 3.2. 
The Foobar protocol 579 [Suggestions for a proper name on a postcard, please, to /dev/null for 580 now.] 582 The "foobar" protocol is comprised of a canonicalization algorithm 583 "foo" and a cryptographic algorithm "bar". 585 3.2.1. The Foo canonicalization algorithm 587 For the purposes of this algorithm, the headers Subject, Comments, 588 Organization and Summary, and all headers starting with "X-", are to 589 be considered "unstructured" and all other headers "structured" 590 (whether or not they were so described in any other standard). 591 Headers are considered to be constrained to the following syntax: 593 structured-header 594 = header-name ":" 595 1*SP structured-header-content CRLF 596 unstructured-header 597 = header-name ":" 598 1*SP unstructured-header-content CRLF 599 header-name = 1*name-character *( "-" 1*name-character ) 600 name-character= ALPHA / DIGIT 601 structured-header-content 602 = *structured-header-zone 603 unstructured-header-content 604 = unstructured-header-zone 605 structured-header-zone 606 = neutral-zone / quoted-zone / sharp-zone / 607 square-zone / comment-zone 608 unstructured-header-zone 609 = 1*( FWS / encoded-word / ) 610 neutral-zone = 1*( FWS / encoded-word / 611 ) 612 quoted-zone = DQUOTE *( FWS / 613 ) 614 DQUOTE 615 sharp-zone = "<" *( FWS / 616 "> ) ">" 617 square-zone = "[" *( FWS / 618 ) "]" 619 comment-zone = "(" *( FWS / encoded-word / comment-zone / 620 ) ")" 621 encoded-word = "=?" pure-token "?" pure-token "?" 622 1* "?=" 624 pure-token = 1* 627 o where '' means any octet other than those 628 representing the US-ASCII characters NULL, CR, LF, TAB and SP, 629 o where 'except unquoted "x"' means except any "x" not immediately 630 preceded by a "\" and thus constituting a quoted-pair, and 631 o where an encoded-word does not include "(" or ")" when in a 632 comment-zone, and does not include DQUOTE, "<", "[", or "(" when 633 in a neutral-zone. 
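As an illustration only, the classification rule and the header-name syntax given at the start of this section can be sketched in Python as follows. The names UNSTRUCTURED, is_valid_header_name and is_unstructured are hypothetical and not part of this proposal; the sketch assumes header-names are compared without regard to case.

```python
import re

# Headers that the foo algorithm treats as "unstructured"; every header
# whose name starts with "X-" is unstructured as well.
UNSTRUCTURED = {"subject", "comments", "organization", "summary"}

# header-name    = 1*name-character *( "-" 1*name-character )
# name-character = ALPHA / DIGIT
HEADER_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\Z")


def is_valid_header_name(name: str) -> bool:
    """True if the name fits the header-name syntax of this section."""
    return HEADER_NAME.match(name) is not None


def is_unstructured(name: str) -> bool:
    """True for Subject, Comments, Organization, Summary and X- headers."""
    lowered = name.lower()
    return lowered in UNSTRUCTURED or lowered.startswith("x-")
```

A name such as "Foo_Bar" is rejected by is_valid_header_name, which is the sense in which some header-names are excluded from this protocol.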
   Observe that certain header-names containing non-alphanumeric
   characters, and permitted by [MESSFOR] (though never used in
   practice), are excluded from this protocol. Moreover, it is not
   assumed that this protocol will work on any of the obsolete syntax
   defined by [MESSFOR].

      NOTE: All known Email and Netnews headers (and a lot more
      besides) are encompassed within this syntax. Observe that the
      various zones cannot possibly overlap, and that any encoded-word
      must be fully contained within its zone. All encoded-words
      permitted by [RFC 2047] (and more besides) are covered. The
      structure is easily parsed by a straightforward state machine
      (though the nesting of comment-zones is a nuisance, as is the
      impossibility of detecting whether a sequence beginning "=?" was
      really an encoded-word until the matching "?=" is reached).

   Each header to be included in the algorithm, which will in general
   consist of several lines (those after the first commencing with
   whitespace), is processed as follows:

   1. The header-name at the start of the header is converted to
      lowercase, and the colon is followed by exactly one SP, whatever
      whitespace (if any) originally followed it.

   2. Within each unstructured-header-zone and each comment-zone, each
      instance of FWS is replaced by a single SP; within each
      neutral-, quoted-, sharp- or square-zone, all instances of FWS are
      omitted (thus the header has now been unfolded into a single
      line). Any whitespace at the end of the header is removed, and it
      is ensured that the header ends with a single CRLF.

   3. The DQUOTEs (ASCII '"') enclosing each quoted-zone are removed
      (but not any quoted DQUOTE or any DQUOTE within other zones so
      that, in particular, they are not removed within msg-ids).

   4.
Any date-time occurring in a Date, Resent-Date or Expires header
      (but not in any other header) is converted into the number of
      seconds since the start of January 1st 1970 UTC, expressed as a
      decimal number without leading zeroes, and as more precisely
      defined by the POSIX mktime routine.

[Can someone give me a reference to the proper POSIX document?]

   5. Any encoded-word (where allowed by the above syntax, and whether
      or not its length is more than 75 characters) is replaced by the
      sequence of octets obtained by decoding it. Moreover, where two
      adjacent encoded-words are separated by whitespace, that
      whitespace is removed (see [RFC 2047]).

      NOTE: The decoding of encoded-words must take place last,
      because it could produce arbitrary sequences of octets (when
      decoding into UTF-16, for example) which might then be confused
      with US-ASCII characters such as DQUOTE, etc. Whitespace needs
      to be removed entirely from structured headers because it may
      have been introduced by folding in unexpected places en route,
      subsequent to the original signing.

   If, during signing, a header is found not to conform to the given
   syntax (in particular, if the closing delimiter of some zone is not
   found), then the signing MUST be aborted (and it MAY be aborted if
   the header is malformed for some other reason). When verifying a
   signature, however, an implementation MAY attempt to continue even
   when the final zone of a header has no closing delimiter.

      NOTE: If an internet mail message in the format defined by
      [MESSFOR] is converted into X.400 mail by a gateway conforming
      to [RFC 1327] and then back into internet mail, then it is
      likely that any signature made in accordance with this proposal
      will fail to verify. For example, comments in headers containing
      addresses (such as in From, Reply-To, etc.)
may be converted
      into phrases and moved in front of the addr-spec, or even
      removed entirely, and thus the canonicalized form of the message
      will have been changed. This old convention, for storing the
      Real Name of the person associated with the address in a
      following comment, is now deprecated by both [MESSFOR] and
      [USEFOR], but even where phrases are used for this purpose it is
      possible that other changes to the message will still render the
      signature unverifiable. Note that there is in any case no
      expectation that an internet mail message signed according to
      this proposal will ever be able to be verified once it has been
      passed permanently into an X.400 system, nor vice versa.

3.2.2. The Bar cryptographic algorithm

[OpenPGP is the obvious choice for this, since it is widely available
and is blessed by the IETF. My only reservation is that it comes with a
rather poor certification system as compared with, say, SPKI. So this
choice might yet have to be reviewed.]

   The stream of octets resulting from the canonicalization algorithm is
   signed, in binary mode (signature type 0x00), in accordance with
   OpenPGP [RFC 2440].

      NOTE: The signature is made in binary mode just in case any
      [RFC 2047] decoding into UTF-16 has produced octets which might
      be mistaken for isolated CR, LF or trailing SP characters, which
      are treated specially in PGP text mode.

   The output of the algorithm MUST be ASCII-armored [RFC 2440], but the
   Armor Header Line ("BEGIN PGP SIGNATURE"), the Armor Headers (e.g.
   "Version:"), the blank line following the Armor Headers, and the
   Armor Tail ("END PGP SIGNATURE") are to be omitted (thus yielding a
   sequence of base64 characters). Observe that these characters will
   include a CRC checksum, which SHOULD be on a separate line from the
   rest of the signature.
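As an illustration only, the omission of the armor framing described above might be sketched in Python as follows. The function name strip_armor is hypothetical and not part of this proposal. The sketch assumes well-formed armor in which a blank line follows the Armor Headers, as [RFC 2440] requires.

```python
def strip_armor(armored: str) -> str:
    """Return only the base64 lines, including the "=XXXX" CRC line.

    The Armor Header Line, the Armor Headers, the blank line after
    them and the Armor Tail are all dropped.
    """
    lines = [line.strip() for line in armored.strip().splitlines()]
    if len(lines) < 2 or not lines[0].startswith("-----BEGIN PGP SIGNATURE"):
        raise ValueError("Armor Header Line not found")
    if not lines[-1].startswith("-----END PGP SIGNATURE"):
        raise ValueError("Armor Tail not found")
    inner = lines[1:-1]
    if "" not in inner:
        raise ValueError("no blank line after the Armor Headers")
    # Everything up to and including the first blank line is framing.
    data = inner[inner.index("") + 1:]
    return "\n".join(line for line in data if line)
```

The CRC checksum stays on its own line in the result, in keeping with the SHOULD above.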
   The signature included within the ASCII armor MAY include
   certificates as evidence that the signing key has the necessary
   authorization to sign articles of that nature, but such usage is in
   general deprecated except between parties that have agreed otherwise
   or where, for some reason, an unusual signatory is signing and
   attaches a certificate from the usual signatory.

   The signature SHOULD use the DSA public-key algorithm and the SHA-1
   hashing algorithm, and be incorporated in a Version 4 Signature
   Packet in the new format. It MAY alternatively use the combination
   RSA/MD5 with Version 3 in the old format (for compatibility with PGP
   2.6.x) and it MAY use the combination RSA/SHA-1 with Version 4 in the
   new format. Verifiers MUST be able to verify all of these forms.

4. Applications

   It is anticipated that protocols for specific applications of the
   signature mechanisms described in this proposal will be devised,
   whether under the auspices of the IETF or otherwise. For example, the
   need to be able to verify the origin of Control messages for creating
   and removing newsgroups and for cancelling articles was a prime
   motivation for creating this proposal.

   It is up to each such application to specify appropriate mechanisms
   for establishing a Public Key Infrastructure suited to its purpose.
   Such an infrastructure would provide for the storing, distribution
   and authorization of the necessary public keys (and for revocations
   thereof). This proposal establishes no preferred mechanisms in this
   regard, except to draw attention to the possible usefulness of the
   Content-Type application/pgp-keys as defined in [RFC 2015].

5. Examples

[The MD5 hashes in the following are bogus, but I would expect to
include genuine ones in the final version. The signatures are genuine,
by my own key]

5.1.
Newgroup Control message

   A 'newgroup' control message in the format given in [USEFOR].

      Newsgroups: comp.foo
      From: "Charles Lindsey"
      Subject: cmsg newgroup comp.foo moderated
      Control: newgroup comp.foo moderated
      Approved: newgroups-request@isc.example
      Message-ID: <919190727.4918@isc.example>
      Date: Tue, 16 Feb 1999 18:45:27 -0000
      MIME-Version: 1.0
      Content-Type: multipart/mixed; boundary=88888888
      Signed: news-standard,+content-md5/1,+content-type/1,+content-md5/3,
         +content-type/3; protocol=foobar; key="0x2C15F1A9"
         (Charles Lindsey);
         sig="
         iQB8AwUAOLVOAK1e6k0sFfGpAQH5swMzBpEVYf0mhFg1r3ErtGSC1RS7iwHPalsJ
         3miSKIfK7GdBnNfVGg9feiTkYMv3aMpUGYRaxn6W1K5QxIQInU+KNbCWiPLrGPdS
         jW7gYe7vB3tBeXiOe7+6wPHmzUAlKiuRuNcfQrOYGg==
         =GGsm"

      This is a multipart message in MIME format.

      --88888888
      Content-Type: application/news-groupinfo
      Content-MD5: T7NtIdVqde62kheQuAHOaw==

      For your newsgroups file:
      comp.foo For Foo discussions (Moderated)
      --88888888
      Content-Type: text/plain

      comp.foo a moderated newsgroup which passed its vote for creation
      by 424:8 as reported in news.announce.newgroups on 10 Feb 99.

      --88888888
      Content-Type: application/news-transmission
      Content-MD5: +piSsoeNmdin5ukFQuFTlw==

      Newsgroups: comp.foo
      Path: not-for-relaying
      Distribution: local
      From: "Charles Lindsey"
      Message-ID: <919190727.4918/part2@isc.example>
      Date: Tue, 16 Feb 1999 18:45:27 -0000
      Subject: Charter for newsgroup comp.foo
      Approved: newgroups-request@isc.example

      The charter, culled from the call for votes:

      Comp.foo is a moderated newsgroup for discussing all manner of
      Foos.

      Moderation submission address:
      comp-foo@bar.example

      --88888888--

5.2.
Mail message re-signed by mailing list owner 837 received: from house.example by bar.example (8.8.8/AL/MJK-2.0) 838 id XAA10880; Sat, 13 Feb 1999 23:00:14 GMT 839 Resent-From: "Example Mail Server" 840 Precedence: list 841 Received: (from list@localhost) 842 by house.example (8.9.2/8.9.2) id OAA28279; 843 Sat, 13 Feb 1999 14:59:56 -0800 (PST) 844 From: <"[john]"@ 845 temple.example> (John Smith) 846 Organization: http://www.temple.example/john 847 Subject: Submission to mailing list 848 in connection with foo. 849 Message-ID: <19990213145946.20115@main.temple.example> 850 Date: Sat, 13 Feb 1999 22:59:46 +0000 851 Mime-Version: 1.0 852 Content-Type: text/plain; charset=us-ascii 853 Content-MD5: +piSsoeNmdin5ukFQuFTlw== 854 Signed: mail-standard,content-md5; 855 protocol=Foobar; key="0x2376C8BD" (John Smith); 856 sig=" 857 iQBVAwUAOLVRmGR/OLEjdsi9AQEIfQH+I9fB4+4cItsNX0fHq8KlT6ETKQUwnmZB 858 TBB3ygoa0n6fiSxMijoMR3SRfQqzGY5fMbOMlv1mMyxVcs74jpk8OQ== 859 =qRiE" 861 Verified: majordomo-request@com.example; signature=good; 862 hashcheck=content-md5 863 Signed-1: message-id,date,resent-from, 864 verified,signed; protocol=FOOBAR; key="0x2C15F1A9"; 865 sig=" 866 iQB8AwUAOLVs2a1e6k0sFfGpAQFGGwMxAeCoV6JIuruJky7j2TOhvILDgf6ZUZA5 867 B7okwUTK0omlWdBmc3jLb/8oVHhZCD1aEoejqLWsU1KbQYdn2MZuwA/yAaTDEpdM 868 DMXM1ui+G569BoyxKmUce9Je4hY6tq47e1ajQO8HRw== 869 =JXiU" 871 Text of John's message. 873 -- 874 John's signature. 
876 Passing the original form of this through the foo canonicalization 877 algorithm produces the following, in the case of the "Signed:" header 878 (observe lines folded for convenience of this document - the true 879 line endings indicated by "CRLF"): 881 signed: mail-standard,content-md5;protocol=Foobar;key=0x2376C8BD( 882 John Smith)CRLF 883 date: 918946786CRLF 884 from: <"[john]"@temple.example>(John Smith)CRLF 885 subject: Submission to mailing list in connection with foo.CRLF 886 content-type: text/plain;charset=us-asciiCRLF 887 content-md5: +piSsoeNmdin5ukFQuFTlw==CRLF 889 And here is the result of canonicalizing to produce the "Signed-1:" 890 header: 892 signed-1: message-id,date,resent-from,verified,signed;protocol=FO 893 OBAR;key=0x2C15F1A9CRLF 894 message-id: <19990213145946.20115@main.temple.example>CRLF 895 date: 918946786CRLF 896 resent-from: ExampleMailServerCRLF 897 verified: majordomo-request@com.example;signature=good;hashcheck= 898 content-md5CRLF 899 signed: mail-standard,content-md5;protocol=Foobar;key=0x2376C8BD( 900 John Smith);sig=iQBVAwUAOLVRmGR/OLEjdsi9AQEIfQH+I9fB4+4cItsNX0fHq 901 8KlT6ETKQUwnmZBTBB3ygoa0n6fiSxMijoMR3SRfQqzGY5fMbOMlv1mMyxVcs74jp 902 k8OQ===qRiECRLF 904 NOTE: the second signature signed only that which it had added 905 itself, plus sufficient of the original headers to identify the 906 original message. It did not need to scan the body to recompute 907 the MD5 hash, but effectively included it by signing the 908 original "Signed:" header. 910 6. Security 912 TBD 914 [What is there to say here?] 916 7. References 918 [MESSFOR] P. Resnick, "Internet Message Format Standard", draft- 919 ietf-drums-msg-fmt-07.txt, March 1998. 921 [PGPMOOSE] Greg Rose, [I need a URL for this], October 1995. 923 [PGPVERIFY] David Lawrence, 924 ftp://ftp.isc.org/pub/pgpcontrol/README.html. 926 [RFC 1036] M. Horton and R. Adams, "Standard for Interchange of 927 USENET Messages", RFC 1036, December 1987. 929 [RFC 1327] S. 
Hardcastle-Kille, "Mapping between X.400(1988) / ISO 930 10021 and RFC 822", RFC 1327, May 1992. 932 [RFC 1864] J. Myers and M. Rose, "The Content-MD5 Header Field", RFC 933 1864, October 1995. 935 [RFC 2015] M. Elkins, "MIME Security with Pretty Good Privacy (PGP)", 936 RFC 2015, October 1996. 938 [RFC 2045] N. Freed and N. Borenstein, "Multipurpose Internet Mail 939 Extensions (MIME) Part One: Format of Internet Message Bodies", 940 RFC 2045, November 1996. 942 [RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions) 943 Part Three: Message Header Extensions for Non-ASCII Text", RFC 944 2047, November 1996. 946 [RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate 947 Requirement Levels", RFC 2119, March 1997. 949 [RFC 2234] D. Crocker and P. Overell, "Augmented BNF for Syntax 950 Specifications: ABNF", RFC 2234, November 1997. 952 [RFC 2440] J. Callas, L. Donnerhacke, H. Finney, and R. Thayer, 953 "OpenPGP Message Format", RFC 2440, November 1998. 955 [SMTP] John C. Klensin and Dawn P. Mann, "Simple Mail Transfer 956 Protocol", draft-ietf-drums-smtpupd-*.txt. 958 [USEFOR] Charles H. Lindsey, "News Article Format", draft-ietf- 959 usefor-article-format-03.txt. 961 8. Acknowledgements 963 The author acknowledges the work of David Lawrence, as original 964 author of "pgpverify", for many of the ideas contained herein, and 965 also many contributions from members of the usenet-format mailing 966 list. 968 9. Contact Address 970 Charles. H. Lindsey 971 5 Clerewood Avenue 972 Heald Green 973 Cheadle 974 Cheshire SK8 3JU 975 United Kingdom 976 Phone: +44 161 437 4506 977 Email: chl@clw.cs.man.ac.uk 979 Comments on this draft should preferably be sent to the mailing list 980 of the Usenet Format Working Group at 982 usenet-format@landfield.com. 984 This draft expires six months after the date of publication (see Page 985 1) (i.e. in November 2000). 987 10. Intellectual Property Rights 989 [The usual texts from RFC 2026 to be inserted here.] 
991 Appendix A. Model implementation 993 The following is written in PERL, with full use made of facilities 994 provided by the Perl CPAN library. 996 Appendix A.1. The foo canonicalization 998 package Canon; 1000 use MIME::Words qw(decode_mimewords); 1001 use Date::Parse; 1002 use Exporter (); 1003 @ISA = qw(Exporter); 1004 @EXPORT = qw(canonicalize); 1006 %unstructureds = ('subject', 1, 'comments', 1, 'organization', 1, 1007 'summary', 1); 1008 %dates = ('date', 1, 'resent-date', 1, 'expires', 1); 1010 sub canonicalize { 1011 my $tag = lc shift; 1012 my $line = shift; 1013 my $signing = shift; # for more stringent checks when signing 1015 $is_structured = (not $unstructureds{$tag}) && $tag !~ m/^x-/o; 1016 $is_date = $dates{$tag}; 1017 @outlist = ($tag, ': '); 1018 $outptr = \@outlist; # will point to @encodelist during encoding 1019 $state = 0; # for the state machine 1020 $encoding = 0; # part of the state machine 1021 $pending = 0; # to remember the FWS between encoded-words 1023 do { 1024 # lexical split of $line into plain ($x) and next delimiter ($y) 1025 $line =~ m/(.*?) # anything except the following: 1026 ( \\\S # quoted-pair 1027 | [][)><("] # various bracket delimiters 1028 | =\?(?!=) | \?=\s+=\? | \?= # for encoded-words 1029 | \s*$ # trailing whitespace 1030 ) /sogx; 1031 $x = $1; $y = $2; 1033 # convert $x into canonical form 1034 if ($is_date && $state == 0) { 1035 $x =~ s/(\S*)\s+/$1 /sog; # reduce FWS to SP 1036 if ($x !~ m/^\s*$/) { # zone not empty 1037 if ($signing && $x !~ m/^\s? 1038 ((mon|tue|wed|thu|fri|sat|sun)\s?,\s?)? 1039 [0-9]{1,2}\s 1040 (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s 1041 [0-9]{4}\s 1042 [0-9]{2}:[0-9]{2}:[0-9]{2}\s 1043 [-+][0-9]{4}\s? 
1044 /oix) {die "Bad Date '", $x, "'\n"} 1045 if (not ($x = str2time($x))) {die "Bad Date '", $x, "'\n"} 1046 } 1047 } elsif ($is_structured && $state <= 0) { 1048 $x =~ s/(\S*)\s+/$1/sog; # eliminate FWS 1049 } else { # unstructured, or in a comment-zone 1050 $x =~ s/(\S*)\s+/$1 /sog; # reduce FWS to SP 1051 } 1052 push @$outptr, $x; 1054 # state machine to process $y 1055 if ($is_structured) { 1056 if ($state == 0) { # neutral-zone 1057 if ($y eq '"') 1058 {$state = -1; _end_encoding()} 1059 elsif ($y eq '<') 1060 {$state = -2; push @$outptr, $y; _end_encoding()} 1061 elsif ($y eq '[') 1062 {$state = -3; push @$outptr, $y; _end_encoding()} 1063 elsif ($y eq '(') 1064 {$state = 1; push @$outptr, $y; _end_encoding()} 1065 elsif ($y eq '=?') 1066 {_start_encoding(); push @$outptr, $y} 1067 elsif ($y =~ m/\?=/o) 1068 {push @$outptr, $y; _end_encoding()} 1069 elsif ($y =~ m/^[])>]$/o) { 1070 if ($signing) {die "Unbalanced '", $y, "'\n"} 1071 else {push @$outptr, $y} 1072 } 1073 else {$y =~ s/^\s*$/\r\n/o; push @$outptr, $y} 1074 # eliminate trailing WS; insert CRLF 1076 } else { 1077 if ($y =~ s/^\s*$/\r\n/o && $signing) 1078 {die "Unbalanced header ", $line} 1080 if ($state == -1) { # in quoted-zone 1081 if ($y eq '"') {$state = 0} 1082 else {push @$outptr, $y} 1083 } 1084 elsif ($state == -2) { # in sharp-zone 1085 if ($y eq '>') {$state = 0} 1086 push @$outptr, $y; 1087 } 1088 elsif ($state == -3) { # in square-zone 1089 if ($y eq ']') {$state = 0} 1090 push @$outptr, $y; 1091 } 1092 elsif ($state > 0) { # in comment-zone 1093 if ($y eq '(') 1094 {$state ++; push @$outptr, $y; _end_encoding()} 1095 elsif ($y eq ')') 1096 {$state --; push @$outptr, $y; _end_encoding()} 1097 elsif ($y eq '=?') 1098 {_start_encoding(); push @$outptr, $y} 1099 elsif ($y =~ m/\?=/o) 1100 {push @$outptr, $y; _end_encoding()} 1101 else {push @$outptr, $y} 1102 } 1103 } 1104 } else { # unstructured 1105 $y =~ s/^\s*$/\r\n/o; # eliminate trailing WS; insert CRLF 1106 if ($y eq '=?') 1107 
{_start_encoding(); push @$outptr, $y} 1108 elsif ($y =~ m/\?=/o) 1109 {push @$outptr, $y; _end_encoding()} 1110 else {push @$outptr, $y} 1111 } 1113 } until $y eq "\r\n"; 1114 if ($encoding) {_end_encoding()} 1115 $line = join('', @outlist); 1116 return $line; 1117 } 1119 sub _start_encoding { # entered at every '=?' 1120 @encodelist = (); 1121 $outptr = \@encodelist; # divert output during encoding 1122 $encoding = 1; 1123 } 1125 sub _end_encoding { # entered at every '?=' or unexpected delimiter 1126 my $token = "[^][()<>@,;:\"\?.=\x00-\x20\x7f-\xff]+"; 1127 my $encoded_text = "[^\?\x00-\x20\x7f-\xff]+"; 1128 if ($encoding) { 1129 $outptr = \@outlist; # cease output diversion 1130 if ($y =~ m/^\?=/o) { # '?=' as expected 1131 $encodelist[$#encodelist] = '?='; # in case it was '?=\s=?' 1132 $x = join('', @encodelist); 1133 if ($genuine = $x =~ m/^=\?$token\?$token\?$encoded_text\?=$/o) 1134 {$x = decode_mimewords($x)} # dies if it fails 1135 if ($is_structured && $state <= 0) { 1136 if ($genuine) {$x =~ s/\s//go} # eliminate FWS 1137 } else { 1138 if ($pending && not $genuine) {push @$outptr, ' '} 1139 } 1140 push @$outptr, $x; 1141 } else { # unexpected delimiter during encoding 1142 if ($pending && (not $is_structured || $state > 0)) { 1143 push @$outptr, ' '; 1144 } 1145 push @$outptr, @encodelist; 1146 } 1147 $encoding = 0; 1148 if ($pending = $y =~ m/^\?=\s+=\?/o) { 1149 _start_encoding(); 1150 push @$outptr, ('=?'); 1151 } 1152 } 1153 } 1155 Appendix A.2. 
Parsing of the Signed header 1157 # This module must be stored in Mail/Field/Signed.pm 1158 # relative to the other programs in the suite 1159 package Mail::Field::Signed; 1161 use strict; 1162 use vars qw(@ISA); 1163 use MIME::Field::ParamVal; 1164 use Carp; 1166 @ISA = qw(MIME::Field::ParamVal); 1168 INIT: { 1169 my $x = bless([]); 1171 $x->register('Signed'); 1172 $x->register('Signed_1'); 1173 $x->register('Signed_2'); 1174 $x->register('Signed_3'); 1175 $x->register('Signed_4'); 1176 $x->register('Signed_5'); 1177 $x->register('Signed_6'); 1178 $x->register('Signed_7'); 1179 $x->register('Signed_8'); 1180 $x->register('Signed_9'); 1182 } 1184 my @news_standard = qw(date newsgroups distribution message-id from 1185 reply-to followup-to references subject 1186 keywords control content-type content-id); 1187 my @mail_standard = qw(date from reply-to to cc in-reply-to 1188 references subject keywords content-type 1189 content-id); 1191 sub parse { 1192 my ($self, $string) = @_; 1193 my $clean_string = _skip_CFWS($string); 1194 $self->set($self->parse_params($clean_string)); 1195 $self->{string} = $string; 1196 $self->{header_refs} = (); 1197 do { 1198 if ($self->{_} =~ m/([-+]?[-\w]+(\/\d+)*)/og) { 1199 if ($1 eq "news-standard") 1200 {$self->_incorporate_header(@news_standard)} 1201 elsif ($1 eq "mail-standard") 1202 {$self->_incorporate_header(@mail_standard)} 1203 else 1204 {$self->_incorporate_header(($1))} 1205 } else { die "Bad header-ref-list", $string,"\n" } 1206 } while ($self->{_} =~ m/,/og); 1207 return $self; 1208 } 1210 sub stringify { 1211 my $self = shift; 1212 return $self->{string}; 1213 } 1215 sub header_refs { 1216 my $self = shift; 1217 @{$self->{header_refs}}; 1218 } 1220 sub _incorporate_header { 1221 my ($self, @additions) = @_; 1222 my $refs = \@{$self->{header_refs}}; 1223 foreach (@additions) { 1224 if (m/^-([-\w]+(\/\d+)*)/o) { 1225 # item to be removed from list 1226 for (my $i = 0; $i < @$refs; $i++) 1227 {if (@$refs[$i] eq $1) 
{splice(@$refs, $i, 1)} }  # remove only the matching entry
       } elsif (m/^\+?([-\w]+(\/\d+)*)/o) {
         # item to be added to list
         I: {
           for (my $i = 0; $i < @$refs; $i++)
             {if (@$refs[$i] eq $1) {last I} }
           push (@$refs, $1); # only if not already present
         }
       }

     }
   }

   sub _skip_CFWS {
     my $line = shift;
     my $count = 0;
     my @buf = ();
     while ($line =~ m/\G([^\s\("]*)\s*|\G(\()|\G(")/sog) {
       if (defined $1 && length $1) {push @buf, ($1)}
       elsif ($2) { # comment
         $count += 1;
         do {
           $line =~ m/\G[^()]*([()])/sog
             or die "Unclosed comment\n";
           $count += ($1 eq '(') ? +1 : -1;
         } until ($count == 0);
       } elsif ($3) { # quoted-string
         push @buf, ('"');
         # A bare do-block is not a loop, so "last" inside it would leave
         # the enclosing while; loop explicitly up to the closing DQUOTE.
         while (1) {
           $line =~ m/\G([^\"\s]+)|\G(\s+)|\G(")/sog
             or die "Unclosed quoted-string\n";
           if (defined $1) {push @buf, ($1)}
           elsif ($2) {push @buf, (' ')}
           elsif ($3) {push @buf, ('"'); last}
         }
       }
     }
     return join('', @buf);
   }

   1;

Appendix A.3. The Signing program

   use English;
   use Mail::Header;
   use Mail::Field;
   use Mail::Field::Signed;
   use MIME::Parser;
   use Canon;

   $signing = 1; # This is a program to sign headers

   # Read partial Signed header from file
   open SIGNED, "<".$ARGV[0]
     or die "Cannot open ", $ARGV[0], ": $!\n";
   $signed = new Mail::Header \*SIGNED;
   @names = $signed->tags;
   $tag = $names[0];
   if ($tag !~ m/^signed(-[1-9])?$/oi || $#names != 0)
     {die "Invalid SIGNED file ", $ARGV[0], "\n"}
   $line = Mail::Field->extract($tag, $signed);

   unless (lc($line->param('protocol')) eq 'foobar')
     {die "Unknown protocol ", $line->param('protocol'), "\n"}
   if ($line->param('sig'))
     {die "'sig' already present\n"}
   unless ($line->param('key'))
     {die "'key' missing\n"}

   $parser = new MIME::Parser output_to_core=>'ALL';
   $article = $parser->read(\*STDIN) or die "Malformed article\n";

   if ($article->head->count($tag))
     {die "Message already signed\n"}

   $tmp = "/tmp/sign-$$";
   open(FH, "> $tmp") or die "Cannot open $tmp:
$!\n";

   print FH canonicalize($tag, $line->stringify, $signing);
   foreach $ref ($line->header_refs) {
     _extract_header($article, $ref);
   }
   close(FH);

   sub _extract_header {
     my ($article, $ref) = @_;
     $ref =~ m/([-\w]+(\/\d+)*?)((\/(\d+))?)/o;
     if ($3) # $ref of the form "header/1"; call ourselves recursively
       {_extract_header($article->parts($5-1), $1)}
     else { # $ref is a header at the current level
       if ($article->head->count($1) > 1)
         {die "Cannot sign duplicated header ", $1, "\n"}
       elsif ($article->head->count($1) == 1) {
         print FH canonicalize($1, $article->head->get($1), $signing)
       }
     }
   }

   # The remainder of this code is dependent upon the particular
   # implementation of OpenPGP.

   $key = $line->param('key');
   $pgp =
     "pgps -fab +verbose=0 +textmode=off -u $key < $tmp 2>/dev/null |";
   open(FH, $pgp) or die "Cannot open pipe from pgp: $!\n";

   undef $INPUT_RECORD_SEPARATOR;
   $_ = <FH>; # The OpenPGP signature record
   unlink $tmp;
   s/^.*[^\w+\/=\n].*\n|^\n//mog; # remove non-base64 lines
   s/^/   /mog; # indent by 3 spaces
   s/\A/;\n sig="\n/mo; s/\Z/"/mo; # enclose in '; sig="..."'

   $article->head->add($tag, $line->stringify . $_);
   $article->print;

Appendix A.4.
The Verification program 1343 use English; 1344 use Mail::Header; 1345 use Mail::Field; 1346 use Mail::Field::Signed; 1347 use MIME::Parser; 1348 use Canon; 1350 $signing = 0; # This is a program to verify signed headers 1351 $parser = new MIME::Parser output_to_core=>'ALL'; 1352 $article = $parser->read(\*STDIN) or die "Malformed article\n"; 1354 $tag = $ARGV[0]; 1355 unless ($tag =~ m/^Signed(-[1-9])?/io) 1356 {die "Bad parameter ", $tag, "\n"} 1358 $line = Mail::Field->extract($tag, $article); 1359 unless ($line) 1360 {die $tag, " header not found\n"} 1361 unless (lc($line->param('protocol')) eq 'foobar') 1362 {die "Unknown protocol ", $line->param('protocol'), "\n"} 1363 unless ($line->param('key') and $line->param('sig')) 1364 {die "Malformed Signed header\n"} 1366 $tmp = "/tmp/sign-$$"; 1367 open(FH, "> $tmp") or die "Cannot open $tmp: $!\n"; 1369 $signed = $line->stringify; 1370 $signed =~ s/\s*;[^;]*\bsig\b[^;]*$//io; # remove "; sig=..." 1371 print FH canonicalize($tag, $signed, $signing); 1372 foreach $ref ($line->header_refs) { 1373 _extract_header($article, $ref); 1374 } 1375 close(FH); 1377 sub _extract_header { 1378 my ($article, $ref) = @_; 1379 $ref =~ m/([-\w]+(\/\d+)*?)((\/(\d+))?)/o; 1380 if ($3) # $ref of the form "header/1"; call ourselves recursively 1381 {_extract_header($article->parts($5-1), $1)} 1382 else { # $ref is a header at the current level 1383 if ($article->head->count($1) > 1) 1384 {die "Duplicated header ", $1, " signed\n"} 1385 elsif ($article->head->count($1) == 1) { 1386 print FH canonicalize($1, $article->head->get($1), $signing) 1387 } 1388 } 1389 } 1391 # The remainder of this code is dependent upon the particular 1392 # implementation of OpenPGP. 
   use IPC::Open2;
   $pgp = "pgpv -f --batchmode -o $tmp 2>&1";
   open2(\*PIPEOUT, \*PIPEIN, $pgp); # PIPEOUT reads from pgpv
   $armour = $line->param('sig');
   $armour =~ s/\s//sog;
   $armour =~ s/([\w+\/=]{64})/$1\n/sog;
   $armour =~ s/(=[\w+\/]{4}\Z)/\n$1/so;
   print PIPEIN "-----BEGIN PGP SIGNATURE-----\n",
                "Charset: noconv\n\n",
                $armour, "\n",
                "-----END PGP SIGNATURE-----\n";
   close(PIPEIN);
   undef $INPUT_RECORD_SEPARATOR;
   $result = <PIPEOUT>;
   unlink $tmp;

   $result =~ s/^This signature applies to another message\n//mo;
   $result =~ m/Key ID +([0-9a-fA-F]+)/iom;
   unless ("0x" . $1 eq $line->param('key')) {
     print "Signature was for key ", $line->param('key'),
           ", not for 0x", $1, "\n";
     $badsig = 1;
   }
   $badsig |= ($result !~ m/Good signature/iom);
   print $result;
   exit $badsig;

Appendix B. Test cases

   The following, believe it or not, is a valid email message. Note
   that there are various TABs and much trailing whitespace in it
   (assuming these come through to the published form of this document).

      Subject: Unstructured headers can contain unmatched (s and unescaped
       "s; (comments like this) and "quoted strings" are not
       treated specially.
      SUMMARY: Multiple spaces, tabs and foldings
       in unstructured headers are reduced to a single SP, and trailing
       whitespace (of which there is much in these examples)) is ignored.
1433 X-Header: All X headers are "treated "as unstructured") 1434 from: "Scooby Doo" (all FWS in 1435 structured headers is removed, except in comments) 1436 tO: "John (the Boss) Smith" , 1437 "Bill \"fingers\" 1438 Sykes" <"#*\"~"@twist.example> (Observe unescaped \( and escaped " 1439 within quoted strings, and (properly matched) parentheses within 1440 comments) 1441 rEPLY-tO:"#*\"~"@twist.example 1442 (Observe "s elided, since not in <...>) 1443 Message-ID: <"*\"~and-other-grunge)(]["@[127.0.0.1"Ugh!]> 1444 (Yes that is a legal msg-id, including the " in the domain-literal) 1445 Sender: foo@[127.0.0.1"Ugh!] (another " in a domain-literal) 1446 Cc: foo@[127.0.0.1(this is not], bar@[a comment)127.0.0.1], 1447 "=?utf-8?Q?not_an_encoded_word?=" 1448 <=?utf-8?Q?not_an_encoded_word?=@bar.example>, 1449 =?us-ascii?Q?Joe_D._Bloggs_=5Bwho=20else=5d?= , 1450 =?us-ascii?Q?C&A?=@bar.example (treated as an encoded-word even 1451 though, syntactically, it isn't) 1452 (in comment but =?is0-8859-1?Q?not(an_encoded-word?=)) 1453 (=?us-ascii?Q?encoded-word_split_into-?= 1454 =?us-ascii?b?cGFydHM=?=) 1455 Comments: An unstructured encoded word can have 1456 =?us-ascii?Q?any_characters_in_it_<>()[]"?= =?bogus_e.w?= 1457 Date: (pre comment) sAt, 13 fEb 1458 1999 14:59:56 -0800 (PST) 1459 Keywords: (various illegal constructs which nevertheless get through) 1460 \(Not a comment\), \" (naked quoted-pair), \ (not a quoted-SP) 1462 Comments: Various mismatches, which should be rejected. 1463 Foo: ) (naked \)) 1464 Bar: ((mismatched parens) 1465 Baz: <"mismatch" 1466 Fred: ["mismatch" 1467 Date: Sat, 13 Feb 1999 23:00:14 GMT 1468 Date: 29 Feb 2001 23:00:14 +0000 1470 The following is the result of applying the foo canonicalization to 1471 it (lines folded for convenience, as before, and blank lines inserted 1472 between headers for readability). 
1474 subject: Unstructured headers can contain unmatched (s and unesca 1475 ped "s; (comments like this) and "quoted strings" are not treated 1476 specially.CRLF 1478 summary: Multiple spaces, tabs and foldings in unstructured heade 1479 rs are reduced to a single SP, and trailing whitespace (of which 1480 there is much in these examples)) is ignored.CRLF 1482 x-header: All X headers are "treated "as unstructured")CRLF 1484 from: ScoobyDoo(all FWS in structured headers is 1485 removed, except in comments)CRLF 1487 to: John(theBoss)Smith,Bill\"fingers\"Sykes<"#*\ 1488 "~"@twist.example>(Observe unescaped \( and escaped " within quot 1489 ed strings, and (properly matched) parentheses within comments)CRLF 1491 reply-to: #*\"~@twist.example(Observe "s elided, since not in <.. 1492 .>)CRLF 1494 message-id: <"*\"~and-other-grunge)(]["@[127.0.0.1"Ugh!]>(Yes tha 1495 t is a legal msg-id, including the " in the domain-literal)CRLF 1497 sender: foo@[127.0.0.1"Ugh!](another " in a domain-literal)CRLF 1499 cc: foo@[127.0.0.1(thisisnot],bar@[acomment)127.0.0.1],=?utf-8?Q? 1500 not_an_encoded_word?=<=?utf-8?Q?not_an_encoded_word?=@bar.example 1501 >,JoeD.Bloggs[whoelse],C&A@bar.example(treated a 1502 s an encoded-word even though, syntactically, it isn't)(in commen 1503 t but =?is0-8859-1?Q?not(an_encoded-word?=))(encoded-word split i 1504 nto-parts)CRLF 1505 comments: An unstructured encoded word can have any characters in 1506 it <>()[]" =?bogus_e.w?=CRLF 1508 date: (pre comment)918946796(PST)CRLF 1510 keywords: (various illegal constructs which nevertheless get thro 1511 ugh)\(Notacomment\),\"(naked quoted-pair),\(not a quoted-SP)CRLF
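As an illustration only, the two simplest parts of the foo algorithm can be cross-checked against the canonical forms shown in section 5.2 and in this appendix with a few lines of Python. The function names canon_unstructured and canon_date_seconds are hypothetical and not part of this proposal. The first covers steps 1 and 2 for an unstructured header and assumes that the header contains no encoded-words (step 5 is not attempted). The second covers step 4 and assumes that the date-time carries no comments.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime


def canon_unstructured(name: str, value: str) -> str:
    """Steps 1 and 2 for an unstructured header without encoded-words."""
    # Each run of folding whitespace becomes a single SP; whitespace at
    # either end of the content is dropped.
    body = re.sub(r"[ \t\r\n]+", " ", value).strip(" \t\r\n")
    # Lowercase name, colon, exactly one SP, content, single CRLF.
    return f"{name.lower()}: {body}\r\n"


def canon_date_seconds(date_time: str) -> int:
    """Step 4: seconds since the start of January 1st 1970 UTC."""
    parsed = parsedate_to_datetime(date_time)
    if parsed.tzinfo is None:
        # A "-0000" zone is returned as a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Applied to the Date header of the example in section 5.2 ("Sat, 13 Feb 1999 22:59:46 +0000"), canon_date_seconds gives 918946786, the value shown in the canonical form there.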