Network Working Group Charles H. Lindsey Internet-Draft University of Manchester September 2001 Signed Headers in Mail and Netnews draft-lindsey-usefor-signed-01.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract The huge growth of Netnews/Usenet in recent years has been accompanied by many attempts to abuse the system by various forms of malpractice, particularly the forging of various headers, causing it to appear that articles came from parties other than those that actually injected them or conveyed some Approval that the real poster was not entitled to give. Insofar as Netnews is regularly gatwayed to and from Email systems, these problems also extend to the Email domain. This document provides a cryptographically secure means whereby it can be established beyond doubt that relevant headers of a Netnews article or an Email message have not been tampered with in transit, and that they were indeed originated by the person purporting to have done so. It seeks to supplement, rather than to supplant, the existing protocols for signing the bodies of articles and messages. [This proposal arises from the activities of the Usenet Format Working Group, which is charged with updating the Netnews standards. Comments are invited, preferably sent to the mailing list of the Group at usenet-format@landfield.com.] Lindsey [Page 1] Signed Headers in Mail and Netnews May 2000 Table of Contents 1. Introduction .................................................. 3 1.1. Scope and Objectives ...................................... 3 1.2. Notations and Conventions ................................. 4 1.2.1. Requirements notation ................................. 4 1.2.2. Syntactic notation .................................... 4 1.3. Overview .................................................. 4 2. Basic Structure of Authenticating Headers ..................... 5 2.1. Syntax of the Signed header ............................... 5 2.2. Semantics of the Signed header ............................ 7 2.3. Syntax of the Verified header ............................. 10 2.4. Semantics of the Verified header .......................... 10 3. Protocol definition ........................................... 11 3.1. Requirements for canonicalization algorithms .............. 11 3.2. The PGP-Head-1 protocol ................................... 12 3.2.1. The PGP-Head-1 canonicalization algorithm ............. 13 3.2.2. The PGP-Head-1 cryptographic algorithm ................ 15 3.2.3. The PGP-Head-1 Macro Definition ....................... 16 4. Applications .................................................. 16 5. Examples ...................................................... 17 5.1. Newgroup Control message .................................. 17 5.2. Mail message re-signed by mailing list owner .............. 18 6. Security ...................................................... 19 7. References .................................................... 20 8. Acknowledgements .............................................. 21 9. Contact Address ............................................... 21 10. Intellectual Property Rights ................................. 21 Appendix A - Model implementation ................................. 21 Appendix A.1 - The PGP-Head-1 canonicalization .................... 21 Appendix A.2 - Parsing of the Signed header ....................... 25 Appendix A.3 - The Signing program ................................ 28 Appendix A.4 - The Verification program ........................... 29 Appendix B - Test cases ........................................... 30 Appendix C - PGP Public Key ....................................... 32 Lindsey [Page 2] News Article Format July 2001 1. Introduction [Remarks enclosed in square brackets and aligned with the left margin, such as this one, are not part of this draft, but are editorial notes to explain matters amongst ourselves, or to point out alternatives, or to indicate work yet to be done.] 1.1. Scope and Objectives [This is a Draft of a Draft, for discussion within the USEFOR mailing list until the best format for putting it forward has been decided on. It remains to be decided whether it should be aimed towards an Experimental Protocol or the Standards track. ] "Netnews" is a set of protocols [USEFOR] that enables news "articles" to be broadcast to potentially-large audiences, using a flooding algorithm which propagates copies throughout a network of participating hosts. The huge growth in the use of this protocol in recent years has been accompanied by many attempts to abuse the system by causing it to appear that articles came from parties other than those that actually injected them, or that they had been posted with some Approval that the real poster was not entitled to give, or that they otherwise appeared to be different from what they actually were. The effects of such abuse are particularly accute in the case of "Control" articles which can cause newsgroups to be created or removed on hosts worldwide, or which can cause unauthorized deletion of articles already received and stored on such hosts. It is therefore considered essential to provide a cryptographically secure means whereby it can be established beyond doubt that the source and structure of articles are exactly as they purport to be. "Electronic Mail" is a system for routing "messages" [RFC 2822] between individual computer users, usually on a one-to-one basis. The formats of Email messages and News articles have deliberately been made to be similar, so that messages may be gatewayed to news systems and vice-versa. In order that the same protection may be provided end-to-end for articles passing through such gateways, the protocal described here has been designed so that it will also work in the Email environment. If it should be found to have further applications in the Email environment, then that would be an added bonus. An existing experimental protocol "pgpverify" [PGPVERIFY] is already in widespread use for authenticating Control messages for creating and removing newsgroups within Usenet, and has proven itself very successful in mitigating the effects of malicious attacks against the integrity of Usenet. This present proposal is largely based upon pgpverify; however, pgpverify is unsuitable for more widespread use as it stands because it is unable to cope with folded headers and with the changes that mail messages in particular are likely to undergo during transport. A second similar experimental protocol "pgpmoose" [PGPMOOSE] is also currently in use for protecting moderated newsgroups against unauthorized postings. Lindsey [Page 3] Signed Headers in Mail and Netnews May 2000 There also exist protocols for the cryptographic signature of bodies of articles, notably S/Mime and PGP/Mime [RFC 2015] and [RFC 2015bis] , and it is moreover common to sign such bodies using PGP alone without the use of Mime [RFC 2045] et seq at all. However, these protocols cannot, by their nature, be used to sign headers. Moreover, since the signature is applied after any Content-Transfer-Encoding [RFC 2045], it may be impossible to verify the signature if the Content-Transfer-Encoding should be changed as the message passes through a succession of sites during transport. Nevertheless, this present proposal does not attempt to usurp those protocols, but merely provides the means to sign headers, both of complete messages and of headers embedded in Mime messages and multiparts. [This document has been designed to fit on top of the draft currently in preparation for News [USEFOR]. If it is thought wise to issue this document before [USEFOR] is complete, then that reference will have to be to [RFC 1036] instead.] 1.2. Notations and Conventions 1.2.1. Requirements notation Certain words, when capitalized, are used to define the significance of individual requirements. The key words "MUST", "SHOULD", "MAY" and the same words followed by "NOT" are to be interpreted as described in [RFC 2119]. 1.2.2. Syntactic notation This document uses the Augmented Backus Naur Form described in [RFC 2234]. A discussion of this is outside the bounds of this document, but it is expected that implementors will be able to quickly understand it with reference to the defining document. 1.3. Overview This proposal makes provision for Signed headers to be included in news articles and in Mime messages and multiparts. A Signed header provides a cryptographic signature over a named set of other headers, including lower level headers contained in Mime messages and multiparts below the current level. Such signatures can give assurance to a recipient who verifies them that those headers have not been changed or added to in transit, and/or that the article was indeed sent by its purported originator. The bodies of articles, Mime messages and multiparts are not directly included in the Signature. Rather, the intention is that each such body part should have a Content-MD5 (or similar) header computed for it, and that header should then be included in the Signature instead. There is also provision for Verified headers which may be added by agents that have checked a Signed header. Verified headers may themselves be included in further Signed headers; this may be especially useful in the case of gateways which find it necessary to Lindsey [Page 4] Signed Headers in Mail and Netnews May 2000 change an article in ways that invalidate an original signature. Every effort has been made to ensure that signatures remain verifiable in spite of all reasonable (and even unreasonable) changes to which they may be subjected in transit. These include changes to the Content-Transfer-Encoding of body parts (a principle reason for including them only via the Content-MD5 header), changes in the order of headers and of their layout, and encodings and re-encodings of unusual character sets. This is to be achieved by converting headers into a canonical form before they are signed. New headers, yet to be invented, need provide no problem, and there is no commitment to any particular character set (provided field-names remain in US-ASCII, as at present). Provision is made for different protocols which may be required in the future. However, this proposal defines just one, recommended protocol, and it is not desirable that other protocols should be defined unless and until serious deficiencies in the existing ones have been revealed. 2. Basic Structure of Authenticating Headers A Signed or a Verified header may appear in the headers of a news article or a mail message, or in the headers of a Mime multipart sub-part or of a Mime message/rfc822 object (or indeed of any similar Mime object yet to be invented). In all cases, the term "current level" encompasses the entire set of headers in that same object. Where the headers at the current level include a "Content-Type: multipart/*" or "Content-Type: message/*" header, lower-level headers can arise within its sub-parts. Examples of Netnews articles and Email messages containing these headers may be found in section 5 below. 2.1. Syntax of the Signed header Signed = "Signed" ["-" DIGIT9] ":" 1*SP header-ref-list 1*( ";" header-parameter ) CRLF DIGIT9 = %x31-39 ; 1..9 header-ref-list= header-ref *( [CFWS] "," [CFWS] header-ref ) header-ref = [ "+" / "-" ] [ subpart-indicator ] field-name / [ subpart-indicator ] macro-name subpart-indicator = 1*( DIGIT9 *DIGIT ":" ) field-name ; see section [RFC 2822] CFWS ; see [RFC 2822] FWS ; see [RFC 2822] macro-name = '$' header-parameter = signed-header-parameter / other-header-parameter signed-header-parameter = signed-token "=" value Lindsey [Page 5] Signed Headers in Mail and Netnews May 2000 signed-token = [CFWS] ( "protocol" / "key" / "sig" ) [CFWS] / other-header-parameter = attribute "=" value attribute = iana-token / x-token iana-token = value = token / quoted-string x-token = [CFWS] "x-" token-core [CFWS] token = [CFWS] token-core [CFWS] token-core = 1* tspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / " "/" / "[" / "]" / "?" / "=" quoted-string ; see [RFC 2822] protocol-value = ietf-token / x-token ietf-token = key-id-value = token signature-value= [CFWS] DQUOTE [FWS] 1*( btext [FWS] ) DQUOTE [CFWS] btext = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "=" ; base 64 chars The header-parameters MUST include a "protocol" parameter and a "sig" parameter, of which the "sig" paramameter MUST be the last parameter and MUST NOT be followed by CFWS (though it MAY be followed by WS). Any other-header-parameter that is present SHOULD be ignored. NOTE: The requirement for an explicit SP after the ":" is to ensure compatibility with the syntax of Netnews [USEFOR]; it is not strictly necessary for Email. The use of a DIGIT9 in the Signed header allows for 10 distinct such headers at any one level. This is more than sufficient for the intended usage (it would be most unusual to get beyond Signed-2) whilst still permitting implementations to check field-names against a fixed list of valid names. There MUST NOT be more than one Signed header with no DIGIT9, or the same DIGIT9, within one set of headers. The header-ref-list indicates those header-refs, at or below the current level, which are covered by the signature. The ordering of this list is significant. A header-ref prefixed by a "+", or not prefixed at all, indicates a header-ref to be added to the list defined by those preceding it (unless already present therein), and a header-ref prefixed by "-" indicates a header-ref to be removed from the header-refs defined by the list preceding it. Any macro-names in the header-ref-list MUST be defined, in the definition of the protocol, to expand into a header-ref-list which does not itself contain any subpart-indicators or further macro-names. Lindsey [Page 6] Signed Headers in Mail and Netnews May 2000 NOTE: If some header-ref in the list matches no header in the actual article, then it comprises an assertion that no such header was present when the article was signed. Headers which are routinely added to or altered as the article progresses through transports (such as Path, Received and Xref, and even Sender and Content-Transfer-Encoding) SHOULD NOT be included in a header-ref-list, and neither should any header which appears twice in the set of headers. A header-ref prefixed by "-" may be used to exclude any header-ref arising from the expansion of a macro-name. Tokens are case-insensitive. "PGP-Head-1" (section 3.2) is the preferred protocol defined by this proposal. It is desirable to keep the number of recognized protocols to an absolute minimum, and it is anticipated that further protocols would only be needed in the event that serious cryptographic deficiencies were to be found in the existing ones. The "key" parameter identifies the key used to generate the signature in a notation dependent upon the protocol (but commonly "0x" followed by hexadecimal digits). The CFWS following it MAY include a comment containing an identification of the person or entity which owns that key. 2.2. Semantics of the Signed header Where the headers at the current level include a "Content-Type: multipart/*" or "Content-Type: message/*" header, lower-level headers within its sub-parts may be referenced as follows: (i) A header-ref not containing any subpart-indicator references the header of that name, if any, at the current level. Header-refs are, for this purpose, considered as case-insensitive. (ii) A header-ref of the form ":XXXX/" (or "::...XXXX"), where and are numbers and the current level contains a "Content-Type: multipart/*" header, references the header that would be referenced by "XXXX" alone (or by ":...XXXX") in the th sub-part of that multipart, that sub-part now being regarded as the current level. (iii)A header-ref of the form "1:XXXX", where the current level contains a "Content-Type: message/rfc822" header (or any other message type which provides for its own set of headers), references the header that would be referenced by "XXXX" alone in that message object. (iv) A header-ref that does not match up with multipart or message Content-Type headers as indicated above MUST NOT be used. (v) For example "3:2:Content-MD5" references the Content-MD5 header of the second part of a multipart, which is itself the third part of a multipart established at the current level. Lindsey [Page 7] Signed Headers in Mail and Netnews May 2000 A protocol, as established by this proposal or by any extension to it, comprises three parts: a "canonicalization algorithm", a "cryptographic algorithm", and possibly a "macro definition". The signature of a Signed header is constructed in accordance with a given header-ref-list as follows: 1. A partial Signed header is constructed from that header-ref-list and such header-parameters (excluding "sig") as are required by the protocol, including at least a "protocol" parameter and, most likely, a "key" parameter identifying the cryptographic key used (possibly followed by a comment indicating the person or entity responsible), all followed by a CRLF. 2. A reduced header-ref-list is obtained by first expanding any macro-names defined in the macro definition of the protocol, duplicating any subpart-indicator in front of each header-ref in the expansion; then any "+" prefixes are stripped, and finally, working from left to right, if a header-ref duplicates a preceding one the second copy is removed, and if a header-ref is prefixed by "-", that copy and any previous ones (without that prefix) are removed. 3. The partial Signed header (with its original header-ref-list) followed by all the headers referenced by the reduced header-ref- list (being headers at the current level or encapsulated within multiparts at any lower level and taken in their order within the reduced header-ref-list) are concatenated to produce a list of headers to be signed. 4. The list of headers to be signed is subjected to the canonicalization algorithm of the protocol to produce a canonicalized list. 5. The canonicalized list is subjected to the cryptographic algorithm of the protocol to produce a character stream representing the signature encoded in base64 [RFC 2045]. 6. A "sig" parameter is appended to the partial Signed header, its value consisting of a quoted-string containing the base64-encoded stream, split into convenient lines by the insertion of FWS. 7. The Signed header thus constructed is then incorporated into the set of headers at the current level. The signature of a Signed header is verified as follows: 1. The "sig" parameter is removed from the Signed header to give a partial Signed header. 2-4.The corresponding steps of the process that constructed the header are taken, producing a canonicalized list. Lindsey [Page 8] Signed Headers in Mail and Netnews May 2000 5. The public key identified according to the "protocol" parameter is now used by the cryptographic algorithm of that protocol to verify that canonicalized list. This may result in a simple pass-fail, or it may return some indication of the privileges (such as the authority to issue certain news control messages or to manage some mailing list) enjoyed by the owner of that key. The purpose of a Signed header is solely to establish that the headers referenced in it were present in an article when that article passed through the hands of the person or entity that generated the signature (and hence that it did indeed pass through those hands). It SHOULD NOT be taken as an endorsement of whatever is contained in the body of the article. If the contents of the body require such endorsement, then the body SHOULD be signed separately, for example in accordance with PGP/Mime [RFC 2015] and [RFC 2015bis]. Signatures will typically be generated by the originators of articles (to prove the origin), by moderators of moderated newsgroups (to testify to their Approved header), by managers of mailing lists, and occasionally by gateways. They SHOULD NOT be generated by intermediate transports and relayers through which the article might pass. This is intended to be an end-to-end protocol, and signatures SHOULD ONLY be added when new, hitherto unsigned, information is added. Moreover, the set of headers included within the signature SHOULD be no more than is necessary to achieve the security desired. NOTE: It will be observed that no provision has been made to include the bodies of an article or of its sub-parts in the signature. If (as will indeed often be the case) it is required to attest that the body (or sub-part) dispatched along with the set of headers is the same as the body that was delivered at the far end, then the proper procedure is to construct a Content-MD5 header [RFC 1864] for that body (or sub-part) and to include that Content-MD5 amongst the headers that are signed. Doing it this way confers three advantages: a) The Content-MD5 header is constructed in such a way that it is immune to changes of Content-Transfer-Encoding to which an article, or its sub-parts, may be subjected during transport. b) Given that many user agents already routinely construct a Content-MD5 header, and verify it on receipt (a practice much to be commended), it should be possible to generate a Signed header without an extra pass through the entire body (especially in the common case where there are no sub-parts). This applies particularly in the case of additional signatures by moderators or mailing list managers, who may not need to examine the body at all. c) If a Content-MD5 header should fail to verify (perhaps because of some transmission error) the verification of a Signed header might still succeed, giving the recipient at least some partial information as to where any problem might lie. NOTE: If, at some future time, a Content-SHA1 header (or any similar header based upon a different hashing algorithm) should be invented, it could equally well be used for this purpose. Lindsey [Page 9] Signed Headers in Mail and Netnews May 2000 2.3. Syntax of the Verified header Verified = "Verified" ["-" DIGIT9] ":" 1*SP name-addr *( ";" header-parameter ) CRLF name-addr ; see [RFC 2822] header-parameter =/ verified-header-parameter verified-header-parameter = signature-token "=" signature-value / hashcheck-token "=" hashcheck-value signature-token= [CFWS] "signature" [CFWS] hashcheck-token= [CFWS] "hashcheck" [CFWS] signature-value= [CFWS] ( "good" / "FAILED" ) [CFWS] hashcheck-value= [CFWS] DQUOTE ( "good" / "FAILED" ) FWS header-ref-list DQUOTE [CFWS] The Verified header is a "variant header" (i.e. it may be present or not, and in a different form, in different copies of the same article or message). The use of a DIGIT9 in the Verified header allows for 10 distinct such headers in one article. Each Verified header MUST match some Signed header with the same DIGIT9 in that same set of headers. There MAY be more than one Verified header with the same DIGIT9 within one set of headers (but observe that it would not then be possible to include those headers in a further Signed header). Tokens used for attributes are case-insensitive. The only parameters defined by this proposal are the "signature" and "hashcheck" parameters. Any other-header-parameter that is present SHOULD be ignored. The absence of a "signature" parameter should be taken as indicating that the verification had succeeded. The "hashcheck" parameter is to indicate that a Content-MD5 (or similar) header identified in the header-ref-list had been verified, or not as the case may be. [Do we also want a "confidence" parameter for the verifier to express his certainty of the identity of the original Signer, and if so, what notation to use?] 2.4. Semantics of the Verified header The Verified header is intended to be added to an article by an agent through which the article passes, and serves as an assertion that the corresponding Signed header has been cryptographically verified by the person or entity identified in the name-addr (or otherwise if the "FAILED" value is present). The addr-spec contained in that name- addr MUST be a valid email address by which that person or entity may be contacted. The original Signed header MUST NOT be removed from the article. The Verified header (supposing it is the only one present with that particular DIGIT9, if any) MAY itself be included in a further Signed header added at the same time. Lindsey [Page 10] Signed Headers in Mail and Netnews May 2000 NOTE: The purpose of a Verified header is to save the ultimate recipient the trouble of verifying the cryptographic signature himself (which can be time consuming, and may require knowledge of public keys not in his possession). Such a verification, if performed close to the ultimate recipient (such as by the news or mail server to which he connects) could normally be regarded as adequate evidence of authenticity, even if not signed itself. It would be hard (certainly in the case of Netnews) for a malicious interloper to cause such a verification to appear bearing the identity of the local server of each ultimate recipient. NOTE: The Verified header is also useful in the case that a gateway (or a moderator) makes some change to an article that renders an original Signed header invalid. Such a gateway can therefore certify that the original form of the Signed header had been verified, and can then re-sign the article (including the added Verified header). Likewise, a site (such as the originator's own server) with a well known public key can verify and resign an article whose originator's public key may be less well known. However, Verified headers SHOULD NOT be added as routine by other intermediate sites. It is normally the business of the reading agent of the ultimate recipient to check the correctness of a Content-MD5 or similar header. Nevertheless, an earlier agent that has added a Verified header and also checked such a Content-MD5 header MAY so indicate by including a "hashcheck" parameter. 3. Protocol definition 3.1. Requirements for canonicalization algorithms It is a sad fact of life that those implementing agents for handling Netnews and Email cannot resist the temptation to "improve" articles passed through them by rewriting headers that are thought not to conform to some real or supposed standard. Experience shows that, in the majority of cases, such tinkering makes matters worse rather than better, and for that reason [USEFOR] and, to a lesser extent, [RFC 2822] and [RFC 2821] try to forbid it, especially when perpetrated by relaying and transport agents (there are arguments in favour of allowing injecting agents and other agents close to the originator to do some limited cleanups, especially where it is impractical to return the article to the originator for correction). Furthermore, in the case of Email it is often required for the transport protocols to modify articles en route, most notably when articles containing octets with the 8th bit set have to be passed through a channel that permits only 7bit. It is a further sad fact of life that agents which make such changes are not going to go away just because some standard says so. Therefore, the canonicalization algorithm SHOULD endeavour to enable the headers of articles to be signed and verified in accordance with Lindsey [Page 11] Signed Headers in Mail and Netnews May 2000 this proposal in spite of such tinkerings, insofar as they can be anticipated. The following list indicates some common practices which are worth detecting and protecting against. o Headers may be re-folded to fit within some preferred overall line length. This may result in the creation of whitespace where none existed before. o Trailing whitespace may be removed, and line endings changed to/from CRLF. o Field-names may be converted into some usual canonical form (e.g. "Mime-Version" into "MIME-Version"). o Phrases, or parts thereof, may be converted to or from quoted- strings. o Date-times may be rewritten in some preferred format, or into some preferred timezone. o Headers with non-ASCII characters may be converted to or from the notation defined in [RFC 2047]. Observe that there is no canonical way to do this conversion and it is, moreover, frequently performed in contexts where it is not strictly allowed. [Other contributions to this list welcomed.] Since the slightest change to a canonicalization algorithm will render it inoperable with previous versions, such an algorithm MUST NOT be changed once it has been defined by this proposal, or any extension thereof. In the event of some inadequacy being found, it would be necessary to devise and standardize a new algorithm, a task not to be undertaken lightly. For this reason, canonicalization algorithms SHOULD be designed to cope with the widest possible range of headers, including those not yet invented. Therefore, they SHOULD NOT, so far as possible, rely on the ability to parse any particular header. NOTE: A canonicalization algorithm is required simply to produce an octet stream for submission to the cryptographic algorithm. That stream does not have to be human readable, nor does it have to be a syntactically-correct header, nor does it have to be convertible back into the original header, or into any correct header at all. Insofar as many original headers can, in principle, be mapped into the same octet stream, this in no way reduces the utility of the algorithm, even though it might enable conspiracy theorists to imagine, and even implement, various sorts of covert channels for use by malicious interlopers. 3.2. The PGP-Head-1 protocol The "The PGP-Head-1" protocol is comprised of a canonicalization algorithm, a cryptographic algorithm, and a macro definition. Lindsey [Page 12] Signed Headers in Mail and Netnews May 2000 3.2.1. The PGP-Head-1 canonicalization algorithm For the purposes of this algorithm, the headers Subject, Comments, Organization and Summary, and all headers starting with "X-", are to be considered "unstructured" and all other headers "structured" (whether or not they were so described in any other standard). Headers are considered to be constrained to the following syntax: structured-header = header-name ":" 1*SP structured-header-content CRLF unstructured-header = header-name ":" 1*SP unstructured-header-content CRLF header-name = 1*name-character *( "-" 1*name-character ) name-character= ALPHA / DIGIT structured-header-content = *structured-header-zone unstructured-header-content = unstructured-header-zone structured-header-zone = neutral-zone / quoted-zone / sharp-zone / square-zone / comment-zone unstructured-header-zone = *( FWS / encoded-word / ) neutral-zone = 1*( FWS / encoded-word / ) quoted-zone = DQUOTE *( FWS / ) DQUOTE sharp-zone = "<" *( FWS / "> ) ">" square-zone = "[" *( FWS / ) "]" comment-zone = "(" *( FWS / encoded-word / comment-zone / ) ")" encoded-word = "=?" pure-token "?" pure-token "?" 1* "?=" pure-token = 1* o where '' means any octet other than those representing the US-ASCII characters NULL, CR, LF, TAB and SP, o where 'except unquoted "x"' means except any "x" not immediately preceded by a "\" and thus constituting a quoted-pair, and o where an encoded-word does not include "(" or ")" when in a comment-zone, and does not include DQUOTE, "<", "[", or "(" when in a neutral-zone. Observe that certain field-names containing non-alphanumeric characters, and permitted by [RFC 2822] (though never used in practice) are excluded from this protocol. Moreover, it is not assumed that this protocol will work on any of the obsolete syntax defined by [RFC 2822]. Lindsey [Page 13] Signed Headers in Mail and Netnews May 2000 NOTE: All known Email and Netnews headers (and a lot more besides) are encompassed within this syntax. Observe that the various zones cannot possibly overlap, and that any encoded-word must be fully contained within its zone. All encoded-words permitted by [RFC 2047] (and more besides) are covered. The structure is easily parsed by a straightforward state machine (though the nesting of comment-zones is a nuisance, as is the impossibility to detect whether a sequence beginning "=?" was really an encoded-word until you get to the matching "?="). Each header to be included in the algorithm, which will in general consist of several lines (those after the first commencing with whitespace), is processed as follows: 1. The header-name at the start of the header is converted to lowercase, the whitespace following it (if any) is removed, and a single SP is inserted. 2. Any date-time occurring in a Date, Resent-Date or Expires header (but not in any other header) is converted to UTC and written as a date-time in the format 07 dec 2000 23:59:60 +0000 Note absence of day-of-week, leading zero included in day, month- name in lower case, year as four digits, seconds included, and timezone 0000 preceded by a "+" as opposed to a "-", and observe that any FWS will be removed in the next step, giving 07dec200023:59:60+0000 NOTE: Observe that the effect is to treat "31 Dec 2000 23:59:60 +0000" (which is a legitimate date-time as defined by [RFC 2822] ) as being different from "1 Jan 2001 00:00:00 +0000". 3. Within each unstructured-header-zone and each comment-zone, all instances of FWS are replaced by a single SP; within each neutral-, quote-, sharp- or square-zone, all instances of FWS are omitted (thus the header has now been unfolded into a single line). Any whitespace at the end of the header is removed, and it is ensured that the header ends with a single CRLF. 4. The DQUOTEs (ASCII '"') enclosing each quoted-zone are removed (but not any quoted DQUOTE or any DQUOTE within other zones so that, in particular, they are not removed within msg-ids). 5. Any encoded-word (where allowed by the above syntax, and whether or not its length is more than 75 characters) is replaced by the sequence of octets obtained by decoding it. Moreover, where two adjacent encoded-words are separated by whitespace, that whitespace is removed (see [RFC 2047]). NOTE: The decoding of encoded-words must take place last, because it could produce arbitrary sequences of octets (when decoding into UCS-16, for example) which might then be confused with US-ASCII characters such as DQUOTE, etc. Whitespace needs Lindsey [Page 14] Signed Headers in Mail and Netnews May 2000 to be removed entirely from structured headers because it is possible it may have been introduced by folding in unexpected places en route, subsequent to the original signing. Typical results of the use of this canonicalization algorithm can be found with the example in section 5.2 below. Test data for use with implementations of this algorithm, and containing many obscure cases, can be found in Appendix B. If, during signing, a header is found not to conform to the given syntax (in particular, if the closing delimiter of some zone is not found), then the signing MUST be aborted (and it MAY be aborted if the header is malformed for some other reason). When verifying a signature, however, an implementation MAY attempt to continue even when the final zone of a header has no closing delimiter. NOTE: If an internet mail message in the format defined by [RFC 2822] is converted into X.400 mail by a gateway conforming to [RFC 1327] and then back into internet mail, then it is likely that any signature made in accordance with this proposal will fail to verify. For example, comments in headers containing addresses (such as in From, Reply-To, etc.) may be converted into phrases and moved in front of the addr-spec, or even removed entirely, and thus the canonicalized form of the message will have been changed. This old convention, for storing the Real Name of the person associated with the address in a following comment, is now deprecated by both [RFC 2822] and [USEFOR], but even where phrases are used for this purpose it is possible that other changes to the message will still render the signature unverifiable. Note that there is in any case no expectation that an internet mail message signed according to this proposal will ever be able to be verified once it has been passed permanently into an X.400 system, nor vice versa. 3.2.2. The PGP-Head-1 cryptographic algorithm [Open PGP is the obvious choice for this, since it is widely available and is blessed by the IETF. My only reservation is that it comes with a rather poor certification system as compared with, say, SPKI. So this choice might yet have to be reviewed.] The stream of octets resulting from the canonicalization algorithm is signed, in binary mode (signature type 0x00), in accordance with Open PGP [RFC 2440]. NOTE: The signature is made in binary mode just in case any [RFC 2047] decoding into UCS-16 has produced octets which might be mistaken for isolated CR, LF or trailing SP characters, which are treated specially in PGP text mode, and also because the treatment of trailing whitespace differs between Open PGP and some earlier PGP implementations. Lindsey [Page 15] Signed Headers in Mail and Netnews May 2000 The output of the algorithm MUST be Ascii-armored [RFC 2440], but the Armor Header Line ("BEGIN PGP SIGNATURE"), the Armor Headers (e.g. "Version:"), the blank line following the Armor Headers, and the Armor Tail ("END PGP SIGNATURE") are to be omitted (thus yielding a sequence of base64 characters). Observe that these characters will include a CRC checksum, which SHOULD be on a separate line from the rest of the signature. The signature included within the Ascii-armor MAY include certificates as evidence that the signing key has the necessary authorization to sign articles of that nature, but such usage is in general deprecated except between parties that have agreed otherwise or where, for some reason, an unusual signatory is signing and attaches a certificate from the usual signatory. The signature SHOULD use the DSA public-key algorithm and the SHA-1 hashing algorithm, and be incorporated in a Version 4 Signature Packet in the new format. It MAY alternatively use the combination RSA/MD5 with Version 3 in the old format (for compatibility with PGP 2.6.x) and it MAY use the combination RSA/SHA-1 with Version 4 in the new format. Verifiers MUST be able to verify all of these forms. 3.2.3. The PGP-Head-1 Macro Definition Two macro-names are defined, "$news-standard" and "$mail-standard". "$news-standard" is a macro representing a set of common headers that SHOULD normally be included when signing the headers of a Netnews article, and is defined to expand into the header-ref-list Date, Newsgroups, Distribution, Message-ID, From, Reply-To, Followup-To, References, Subject, Keywords, Control, Content-Type, Content-ID "$mail-standard" performs the same function for mail messages, and is defined to expand into the header-ref-list Date, From, Reply-To, To, Cc, In-Reply-To, References, Subject, Keywords, Content-Type, Content-ID NOTE: Those lists have carefully excluded those headers (such as Sender and Content-Transfer-Encoding) which are liable to be added or altered by sites downstream from the one which generated the Signed header. 4. Applications It is anticipated that protocols for specific applications of the signature mechanisms described in this proposal will be devised, whether under the auspices of the IETF or otherwise. For example, the need to be able to verify the origin of Control messages for creating and removing newsgroups and for cancelling articles was a prime motivation for creating this proposal. Lindsey [Page 16] Signed Headers in Mail and Netnews May 2000 It is up to each such application to specify appropriate mechanisms for establishing a Public Key Infrastructure suited to its purpose. Such an infrastructure would provide for the storing, distribution and authorization of the necessary public keys (and for revocations thereof). This proposal establishes no preferred mechanisms in this regard, except to draw attention to the possible usefulness of the Content-Type application/pgp-keys as defined in [RFC 2015] and [RFC 2015bis]. 5. Examples 5.1. Newgroup Control message A 'newgroup' control message in the format given in [USEFOR]. Newsgroups: comp.foo From: "Charles Lindsey" Subject: cmsg newgroup comp.foo moderated Control: newgroup comp.foo moderated Approved: newgroups-request@isc.example Message-ID: <919190727.4918@isc.example> Date: Tue, 16 Feb 1999 18:45:27 -0000 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary=88888888 Signed: $news-standard,+1:content-md5,+1:content-type,+3:content-md5, +3:content-type; protocol=pgp-head-1; key="0xA336D40C" (DSS-example); sig=" iQA/AwUAO40EkyQRKsmjNtQMEQIwGgCfSYj8sgrRgRNQIwWKKnk+M9j0o+wAn2Mp fS2zwmmrA/KvCXyiTFsk35pr =8p5V" This is a multipart message in MIME format. --88888888 Content-Type: application/news-groupinfo Content-MD5: 68BGYb5+8KAVeqno7Et7Ug== For your newsgroups file: comp.foo For Foo discussions (Moderated) --88888888 Content-Type: text/plain comp.foo a moderated newsgroup which passed its vote for creation by 424:8 as reported in news.announce.newgroups on 10 Feb 99. --88888888 Lindsey [Page 17] Signed Headers in Mail and Netnews May 2000 Content-Type: application/news-transmission Content-MD5: cjeIxiGbPsrse1G/w9cfqQ== Newsgroups: comp.foo Path: not-for-relaying Distribution: local From: "Charles Lindsey" Message-ID: <919190727.4918$p=1@isc.example> Date: Tue, 16 Feb 1999 18:45:27 -0000 Subject: Charter for newsgroup com.foo Approved: newgroups-request@isc.example The charter, culled from the call for votes: Comp.foo is a moderated newsgroup for discussing all manner of Foos. Moderation submission address: comp-foo@bar.example --88888888-- 5.2. Mail message re-signed by mailing list owner received: from house.example by bar.example (8.8.8/AL/MJK-2.0) id XAA10880; Sat, 13 Feb 1999 23:00:14 GMT Resent-From: "Example Mail Server" Precedence: list Received: (from list@localhost) by house.example (8.9.2/8.9.2) id OAA28279; Sat, 13 Feb 1999 14:59:56 -0800 (PST) From: <"[john]"@ temple.example> (John Smith) Organization: http://www.temple.example/john Subject: Submission to mailing list in connection with foo. Message-ID: <19990213145946.20115@main.temple.example> Date: Sat, 13 Feb 1999 22:59:46 +0000 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-MD5: ayoAIdYN8PZqpOgij7VG2Q== Signed: $mail-standard,content-md5; protocol=PGP-Head-1; key="0xA336D40C" (DSS-example); sig=" iQA/AwUAO40E1yQRKsmjNtQMEQLvzQCgtNnWdN2lwYtFoajEen96111IMboAn2hV z9edcA/oc2F6ui8nIj/X5/UW =buij" Verified: majordomo-request@com.example; signature=good; hashcheck="good content-md5" Lindsey [Page 18] Signed Headers in Mail and Netnews May 2000 Signed-1: message-id,date,resent-from, verified,signed; protocol=PGP-HEAD-1; key="0xA336D40C"; sig=" iQA/AwUAO40GgCQRKsmjNtQMEQLsNACdFPk9gPtPq9qpWMLXlurvhBLqMbAAoLg0 uOVRa6sHqBo2bVf+P/7qy0bF =FyFy" Text of John's message. -- John's signature. Passing the original form of this through the PGP-Head-1 canonicalization algorithm produces the following, in the case of the "Signed:" header (observe lines folded for convenience of this document - the true line endings being indicated by "CRLF"): signed: $mail-standard,content-md5;protocol=PGP-Head-1;key=0xA336 D40C(DSS-example)CRLF date: 13feb199922:59:46+0000CRLF from: <"[john]"@temple.example>(John Smith)CRLF subject: Submission to mailing list in connection with foo.CRLF content-type: text/plain;charset=us-asciiCRLF content-md5: ayoAIdYN8PZqpOgij7VG2Q==CRLF And here is the result of canonicalizing to produce the "Signed-1:" header: signed-1: message-id,date,resent-from,verified,signed;protocol=PG P-HEAD-1;key=0xA336D40CCRLF message-id: <19990213145946.20115@main.temple.example>CRLF date: 13feb199922:59:46+0000CRLF resent-from: ExampleMailServerCRLF verified: majordomo-request@com.example;signature=good;hashcheck= goodcontent-md5CRLF signed: $mail-standard,content-md5;protocol=PGP-Head-1;key=0xA336 D40C(DSS-example);sig=iQA/AwUAO40E1yQRKsmjNtQMEQLvzQCgtNnWdN2lwYt FoajEen96111IMboAn2hVz9edcA/oc2F6ui8nIj/X5/UW=buijCRLF NOTE: the second signature signed only that which it had added itself, plus sufficient of the original headers to identify the original message. It did not need to scan the body to recompute the MD5 hash, but effectively included it by signing the original "Signed:" header. 6. Security TBD [What is there to say here?] Lindsey [Page 19] Signed Headers in Mail and Netnews May 2000 7. References [PGPMOOSE] Greg Rose, [I need a URL for this], October 1995. [PGPVERIFY] David Lawrence, . [RFC 1036] M. Horton and R. Adams, "Standard for Interchange of USENET Messages", RFC 1036, December 1987. [RFC 1327] S. Hardcastle-Kille, "Mapping between X.400(1988) / ISO 10021 and RFC 822", RFC 1327, May 1992. [RFC 1864] J. Myers and M. Rose, "The Content-MD5 Header Field", RFC 1864, October 1995. [RFC 2015] M. Elkins, "MIME Security with Pretty Good Privacy (PGP)", RFC 2015, October 1996. [RFC 2015bis] M. Elkins, D. Del Torto, R. Levien, and T. Roessler, "MIME Security with OpenPGP", draft-ietf-openpgp-mime-06.txt, April 2001. [RFC 2045] N. Freed and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. [RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996. [RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997. [RFC 2234] D. Crocker and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997. [RFC 2440] J. Callas, L. Donnerhacke, H. Finney, and R. Thayer, "OpenPGP Message Format", RFC 2440, November 1998. [RFC 2821] John C. Klensin and Dawn P. Mann, "Simple Mail Transfer Protocol", RFC 2821, April 2001. [RFC 2822] P. Resnick, "Internet Message Format", RFC 2822, April 2001. [USEFOR] Charles H. Lindsey, "News Article Format", draft-ietf- usefor-article-format-03.txt. Lindsey [Page 20] Signed Headers in Mail and Netnews May 2000 8. Acknowledgements The author acknowledges the work of David Lawrence, as original author of "pgpverify", for many of the ideas contained herein, and also many contributions from members of the usenet-format mailing list. 9. Contact Address Charles. H. Lindsey 5 Clerewood Avenue Heald Green Cheadle Cheshire SK8 3JU United Kingdom Phone: +44 161 436 6131 Email: chl@clw.cs.man.ac.uk Comments on this draft should preferably be sent to the mailing list of the Usenet Format Working Group at usenet-format@landfield.com. This draft expires six months after the date of publication (see Page 1) (i.e. in March 2002). 10. Intellectual Property Rights [The usual texts from RFC 2026 to be inserted here.] Appendix A - Model implementation The following is written in PERL, with full use made of facilities provided by the Perl CPAN library. Appendix A.1 - The PGP-Head-1 canonicalization package Canon; use MIME::Words qw(decode_mimewords); use Date::Parse; use Date::Format; use Time::Local; use Exporter (); @ISA = qw(Exporter); @EXPORT = qw(set_protocol $canonicalize %macros); my $protocol; my %unstructureds = (); my %dates = (); my @ph1_news_standard = qw(date newsgroups distribution message-id from reply-to followup-to references subject keywords control content-type Lindsey [Page 21] Signed Headers in Mail and Netnews May 2000 content-id); my @ph1_mail_standard = qw(date from reply-to to cc in-reply-to references subject keywords content-type content-id); sub set_protocol { $_ = shift; SWITCH: { # PGP-Head-1 protocol /^pgp-head-1$/o && do { %macros = ( 'news-standard' => \@ph1_news_standard, 'mail-standard' => \@ph1_mail_standard, ); %unstructureds = ('subject', 1, 'comments', 1, 'organization', 1, 'summary', 1); %dates = ('date', 1, 'resent-date', 1, 'expires', 1); $canonicalize = \&ph1_canon; $protocol = $_; last SWITCH; }; # other protocols go in here die "Unknown protocol $_\n"; } } sub ph1_canon { my $tag = lc shift; my $line = shift; my $signing = shift; # for more stringent checks when signing my ($ss,$mm,$hh,$day,$month,$year,$zone,$time,$dummy,@dateval); $is_structured = (not $unstructureds{$tag}) && $tag !~ m/^x-/o; $is_date = $dates{$tag}; @outlist = ($tag, ': '); $outptr = \@outlist; # will point to @encodelist during encoding $state = 0; # for the state machine $encoding = 0; # part of the state machine $pending = 0; # to remember the FWS between encoded-words do { # lexical split of $line into plain ($x) + next delimiter ($y) $line =~ m/(.*?) # anything except the following: ( \\\S # quoted-pair | [][)><("] # various bracket delimiters | =\?(?!=) | \?=\s+=\? | \?= # for encoded-words | \s*$ # trailing whitespace ) /sogx; $x = $1; $y = $2; Lindsey [Page 22] Signed Headers in Mail and Netnews May 2000 # convert $x into canonical form if ($is_date && $state == 0) { $x =~ s/(\S*)\s+/$1 /sog; # reduce FWS to SP if ($x !~ m/^\s*$/) { # zone not empty if ($signing && $x !~ m/^\s? ((mon|tue|wed|thu|fri|sat|sun)\s?,\s?)? [0-9]{1,2}\s (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s [0-9]{4}\s [0-9]{2}:[0-9]{2}:[0-9]{2}\s [-+][0-9]{4}\s? /oix) {die "Bad Date '", $x, "'\n"} if (not ( (($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($x)) && ($time = timegm(0,$mm,$hh,$day,$month,$year)) >= 0 # $ss not used, in case it is a leap second )) {die "Bad Date '", $x, "'\n"} ($dummy,$mm,$hh,$day,$month,$year) = gmtime($time - $zone); @dateval = ($ss,$mm,$hh,$day,$month,$year); # $ss restored $x = lc strftime("%d%h%Y%T+0000", @dateval); } } elsif ($is_structured && $state <= 0) { $x =~ s/(\S*)\s+/$1/sog; # eliminate FWS } else { # unstructured, or in a comment-zone $x =~ s/(\S*)\s+/$1 /sog; # reduce FWS to SP } push @$outptr, $x; # state machine to process $y if ($is_structured) { if ($state == 0) { # neutral-zone if ($y eq '"') {$state = -1; _end_encoding()} elsif ($y eq '<') {$state = -2; push @$outptr, $y; _end_encoding()} elsif ($y eq '[') {$state = -3; push @$outptr, $y; _end_encoding()} elsif ($y eq '(') {$state = 1; push @$outptr, $y; _end_encoding()} elsif ($y eq '=?') {_start_encoding(); push @$outptr, $y} elsif ($y =~ m/\?=/o) {push @$outptr, $y; _end_encoding()} elsif ($y =~ m/^[])>]$/o) { if ($signing) {die "Unbalanced '", $y, "'\n"} else {push @$outptr, $y} } else {$y =~ s/^\s*$/\r\n/o; push @$outptr, $y} # eliminate trailing WS; insert CRLF } else { # other zones if ($y =~ s/^\s*$/\r\n/o && $signing) {die "Unbalanced header ", $line} Lindsey [Page 23] Signed Headers in Mail and Netnews May 2000 if ($state == -1) { # in quoted-zone if ($y eq '"') {$state = 0} else {push @$outptr, $y} } elsif ($state == -2) { # in sharp-zone if ($y eq '>') {$state = 0} push @$outptr, $y; } elsif ($state == -3) { # in square-zone if ($y eq ']') {$state = 0} push @$outptr, $y; } elsif ($state > 0) { # in comment-zone if ($y eq '(') {$state ++; push @$outptr, $y; _end_encoding()} elsif ($y eq ')') {$state --; push @$outptr, $y; _end_encoding()} elsif ($y eq '=?') {_start_encoding(); push @$outptr, $y} elsif ($y =~ m/\?=/o) {push @$outptr, $y; _end_encoding()} else {push @$outptr, $y} } } } else { # unstructured $y =~ s/^\s*$/\r\n/o; # eliminate trailing WS; insert CRLF if ($y eq '=?') {_start_encoding(); push @$outptr, $y} elsif ($y =~ m/\?=/o) {push @$outptr, $y; _end_encoding()} else {push @$outptr, $y} } } until $y eq "\r\n"; if ($encoding) {_end_encoding()} $line = join('', @outlist); return $line; } sub _start_encoding { # entered at every '=?' @encodelist = (); $outptr = \@encodelist; # divert output during encoding $encoding = 1; } sub _end_encoding { # entered at every '?=' or unexpected delimiter my $token = "[^][()<>@,;:\"\?.=\x00-\x20\x7f-\xff]+"; my $encoded_text = "[^\?\x00-\x20\x7f-\xff]+"; if ($encoding) { $outptr = \@outlist; # cease output diversion if ($y =~ m/^\?=/o) { # '?=' as expected $encodelist[$#encodelist] = '?='; # in case it was '?=\s=?' $x = join('', @encodelist); $genuine = $x =~ m/^=\?$token\?$token\?$encoded_text\?=$/o; Lindsey [Page 24] Signed Headers in Mail and Netnews May 2000 if ($genuine) {$x = decode_mimewords($x)} # dies if it fails if ($is_structured && $state <= 0) { if ($genuine) {$x =~ s/\s//go} # eliminate FWS } else { if ($pending && not $genuine) {push @$outptr, ' '} } push @$outptr, $x; } else { # unexpected delimiter during encoding if ($pending && (not $is_structured || $state > 0)) { push @$outptr, ' '; } push @$outptr, @encodelist; } $encoding = 0; if ($pending = $y =~ m/^\?=\s+=\?/o) { _start_encoding(); push @$outptr, ('=?'); } } } Appendix A.2 - Parsing of the Signed header # This module must be stored in Mail/Field/Signed.pm # relative to the other programs in the suite package Mail::Field::Signed; use strict; use vars qw(@ISA); use MIME::Field::ParamVal; use Canon; use Carp; @ISA = qw(MIME::Field::ParamVal); INIT: { my $x = bless([]); $x->register('Signed'); $x->register('Signed_1'); $x->register('Signed_2'); $x->register('Signed_3'); $x->register('Signed_4'); $x->register('Signed_5'); $x->register('Signed_6'); $x->register('Signed_7'); $x->register('Signed_8'); $x->register('Signed_9'); } sub parse { my ($self, $string, $signing) = @_; my $clean_string = _skip_CFWS($string); Lindsey [Page 25] Signed Headers in Mail and Netnews May 2000 my $macro; $self->set($self->parse_params($clean_string)); $self->{string} = $string; $self->{header_refs} = (); set_protocol($self->protocol); do { if ($self->{_} =~ m/\G(((\d+:)*)(\$)|([-+]?)((\d+:)*))([-\w]+)(?!:)/og) { # ((---$2--)($4) (--$5-)(---$6--))(--$8--) if (defined $4) { if ($macro = $macros{$8}) {$self->_incorporate_header('', $2, @{$macro})} else { die "Unknown macro ", $8, "\n" } } else {$self->_incorporate_header($5, $6, ($8))} } else { die "Bad header-ref-list ", $string, "\n" } } while ($self->{_} =~ m/,/og); foreach ('protocol', 'key') { unless($self->param($_)) {die "$_ missing\n"}; } return $self; } sub tag { my $self = shift; return Mail::Field::tag($self); } sub protocol { my $self = shift; return lc($self->param('protocol')); } sub key { my $self = shift; return lc($self->param('key')); } sub sig { my $self = shift; return lc($self->param('sig')); } sub stringify { my $self = shift; return $self->{string}; } sub header_refs { my $self = shift; @{$self->{header_refs}}; } sub _incorporate_header { my ($self, $plusminus, $level, @additions) = @_; Lindsey [Page 26] Signed Headers in Mail and Netnews May 2000 my $refs = \@{$self->{header_refs}}; foreach (@additions) { if ($plusminus eq '-') { # item to be removed from list for (my $i = 0; $i < @$refs; $i++) {if (@$refs[$i] eq $level.$_) {splice(@$refs, $i, 1)} } } else { # item to be added to list I: { for (my $i = 0; $i < @$refs; $i++) {if (@$refs[$i] eq $level.$_) {last I} } push (@$refs, $level.$_); # only if not already present } } } } sub _skip_CFWS { my $line = shift; my $count = 0; my @buf = (); while ($line =~ m/\G([^\s\("]*)\s*|\G(\()|\G(")/sog) { # (---$1----) ($2) (3) if ($1) {push @buf, ($1)} elsif ($2) { # comment $count += 1; do { $line =~ m/\G[^()]*([()])/sog or die "Unclosed comment\n"; $count += ($1 eq '(') ? +1 : -1; } until ($count == 0); } elsif ($3) { # quoted-string push @buf, ('"'); do { $line =~ m/\G([^\"\s]+)|\G(\s+)|\G(")/sog; # (---$1---) (-$2) (3) if ($1) {push @buf, ($1)} elsif ($2) {push @buf, (' ')} elsif ($3) {push @buf, ('"'); last} } } } return join('', @buf); } sub extract_headers { my ($self, $article, $FH, $proc, $signing) = @_; my $ref; my $signed = $self->stringify; $signed =~ s/\s*;[^;]*\bsig\b[^;]*$//io; # remove "; sig=..." print $FH (&$proc($self->tag, $signed, $signing)); foreach $ref ($self->header_refs) { _extract_header($article, $ref, $FH, $proc, $signing); } Lindsey [Page 27] Signed Headers in Mail and Netnews May 2000 } sub _extract_header { my ($article, $ref, $FH, $proc, $signing) = @_; $ref =~ m/((\d+):((\d+:)*))?([-\w]+)/o; # ((-$2) (---$3--)) (--$5--) if ($1) # $ref of the form "1:header"; call ourself recursively {_extract_header($article->parts($2-1), $3.$5, $FH, $proc, $signing)} else { # $ref is a header at the current level if ($article->head->count($5) > 1) {die "Cannot sign duplicated header ", $5, "\n"} elsif ($article->head->count($5) == 1) { print $FH (&$proc($5, $article->head->get($5), $signing)); } } } 1; Appendix A.3 - The Signing program use English; use Mail::Header; use Mail::Field; use Mail::Field::Signed; use MIME::Parser; use Canon; $signing = 1; # This is a program to sign headers # Read partial Signed header from file open SIGNED, "<".$ARGV[0]; $signed = new Mail::Header \*SIGNED; @names = $signed->tags; $tag = $names[0]; if ($tag !~ m/^signed(-[1-9])?$/oi || $#names != 0) {die "Invalid SIGNED file ", $ARGV[0], "\n"} $line = Mail::Field->extract($tag, $signed); if ($line->sig) {die "'sig' already present\n"} $parser = new MIME::Parser output_to_core=>'ALL'; $parser->parse_nested_messages('NEST'); # special treatment for message/rfc822 $article = $parser->read(\*STDIN) or die "Malformed article\n"; if ($article->head->count($tag)) {die "Message already signed\n"} $tmp = "/tmp/sign-$$"; open(FH, "> $tmp") or die "Cannot open $tmp: $!\n"; $line->extract_headers($article, \*FH, $canonicalize, $signing); close(FH); Lindsey [Page 28] Signed Headers in Mail and Netnews May 2000 # The remainder of this code is dependent upon the particular # implementation of OpenPGP. $key = $line->param('key'); $pgp = "pgps -fab +verbose=0 +textmode=off -u $key <$tmp 2>/dev/null |"; open(FH, $pgp) or die "Cannot open pipe from pgp: $!\n"; undef $INPUT_RECORD_SEPARATOR; $_ = ; # The OpenPGP signature record unlink $tmp; s/^.*[^\w+\/=\n].*\n|^\n//mog; # remove non-base64 lines s/^/ /mog; # indent by 3 spaces s/\A/;\n sig="\n/mo; s/\Z/"/mo; # enclose in '; sig="..."' $article->head->add($tag, $line->stringify . $_); $article->print; Appendix A.4 - The Verification program use English; use Mail::Header; use Mail::Field; use Mail::Field::Signed; use MIME::Parser; use Canon; $signing = 0; # This is a program to verify signed headers $parser = new MIME::Parser output_to_core=>'ALL'; $parser->parse_nested_messages('NEST'); # special treatment for message/rfc822 $article = $parser->read(\*STDIN) or die "Malformed article\n"; $tag = $ARGV[0]; unless ($tag =~ m/^Signed(-[1-9])?/io) {die "Bad parameter ", $tag, "\n"} $line = Mail::Field->extract($tag, $article); unless ($line) {die $tag, " header not found\n"} unless ($line->param('sig')) {die "Malformed Signed header\n"} $tmp = "/tmp/sign-$$"; open(FH, "> $tmp") or die "Cannot open $tmp: $!\n"; $line->extract_headers($article, \*FH, $canonicalize, $signing); close(FH); # The remainder of this code is dependent upon the particular # implementation of OpenPGP. use IPC::Open2; $pgp = "pgpv -f --batchmode -o $tmp 2>&1"; open2(\*PIPEOUT, \*PIPEIN, $pgp); $armour = $line->param('sig'); Lindsey [Page 29] Signed Headers in Mail and Netnews May 2000 $armour =~ s/\s//sog; $armour =~ s/([\w+\/=]{64})/$1\n/sog; $armour =~ s/(=[\w+\/]{4}\Z)/\n$1/so; print PIPEIN "-----BEGIN PGP SIGNATURE-----\n", "Charset: noconv\n\n", $armour, "\n", "-----END PGP SIGNATURE-----\n"; close(PIPEIN); undef $INPUT_RECORD_SEPARATOR; $result = ; unlink $tmp; $result =~ s/^This signature applies to another message\n//mo; $result =~ m/Key ID +([0-9a-fA-F]+)/iom; unless ("0x" . $1 eq $line->param('key')) { print "Signature was for key ", $line->param('key'), ", not for 0x", $1, "\n"; $badsig = 1; } $badsig |= ($result !~ m/Good signature/iom); print $result; exit $badsig; Appendix B - Test cases The following, believe it or not, is a valid email message. Note that there were various TABs and much trailing whitespace in it (but they are unlikely to have survived through to the published form of this document). Subject: Unstructured headers can contain unmatched (s and unescaped "s; (comments like this) and "quoted strings" are not treated specially. SUMMARY: Multiple spaces, tabs and foldings in unstructured headers are reduced to a single SP, and trailing whitespace (of which there is much in these examples)) is ignored. X-Header: All X headers are "treated "as unstructured") from: "Scooby Doo" (all FWS in structured headers is removed, except in comments) tO: "John (the Boss) Smith" , "Bill \"fingers\" Sykes" <"#*\"~"@twist.example> (Observe unescaped \( and escaped " within quoted strings, and (properly matched) parentheses within comments) rEPLY-tO:"#*\"~"@twist.example (Observe "s elided, since not in <...>) Message-ID: <"*\"~and-other-grunge)(]["@[127.0.0.1"Ugh!]> (Yes that is a legal msg-id, including the " in the domain-literal) Sender: foo@[127.0.0.1"Ugh!] (another " in a domain-literal) Lindsey [Page 30] Signed Headers in Mail and Netnews May 2000 Cc: foo@[127.0.0.1(this is not], bar@[a comment)127.0.0.1], "=?utf-8?Q?not_an_encoded_word?=" <=?utf-8?Q?not_an_encoded_word?=@bar.example>, =?us-ascii?Q?Joe_D._Bloggs_=5Bwho=20else=5d?= , =?us-ascii?Q?C&A?=@bar.example (treated as an encoded-word even though, syntactically, it isn't) (in comment but =?is0-8859-1?Q?not(an_encoded-word?=)) (=?us-ascii?Q?encoded-word_split_into-?= =?us-ascii?b?cGFydHM=?=) Comments: An unstructured encoded word can have =?us-ascii?Q?any_characters_in_it_<>()[]"?= =?bogus_e.w?= Date: (pre comment) sAt, 13 fEb 1999 14:59:56 -0800 (PST) Keywords: (various illegal constructs which nevertheless get through) \(Not a comment\), \" (naked quoted-pair), \ (not a quoted-SP) Comments: Various mismatches, which should be rejected. Foo: ) (naked \)) Bar: ((mismatched parens) Baz: <"mismatch" Fred: ["mismatch" Date: Sat, 13 Feb 1999 23:00:14 GMT Date: 29 Feb 2001 23:00:14 +0000 The following is the result of applying the PGP-Head-1 canonicalization to it (lines folded for convenience, as before, and blank lines inserted between headers for readability). subject: Unstructured headers can contain unmatched (s and unesca ped "s; (comments like this) and "quoted strings" are not treated specially.CRLF summary: Multiple spaces, tabs and foldings in unstructured heade rs are reduced to a single SP, and trailing whitespace (of which there is much in these examples)) is ignored.CRLF x-header: All X headers are "treated "as unstructured")CRLF from: ScoobyDoo(all FWS in structured headers is removed, except in comments)CRLF to: John(theBoss)Smith,Bill\"fingers\"Sykes<"#*\ "~"@twist.example>(Observe unescaped \( and escaped " within quot ed strings, and (properly matched) parentheses within comments)CRLF reply-to: #*\"~@twist.example(Observe "s elided, since not in <... \&.>)CRLF message-id: <"\*"~and-other-grunge)(]["@[127.0.0.1"Ugh!]>(Yes tha t is a legal msg-id, including the " in the domain-literal)CRLF sender: foo@[127.0.0.1"Ugh!](another " in a domain-literal)CRLF Lindsey [Page 31] Signed Headers in Mail and Netnews May 2000 cc: foo@[127.0.0.1(thisisnot],bar@[acomment)127.0.0.1],=?utf-8?Q? not_an_encoded_word?=<=?utf-8?Q?not_an_encoded_word?=@bar.example >,JoeD.Bloggs[whoelse],C&A@bar.example(treated a s an encoded-word even though, syntactically, it isn't)(in commen t but =?is0-8859-1?Q?not(an_encoded-word?=))(encoded-word split i nto-parts)CRLF comments: An unstructured encoded word can have any characters in it <>()[]" =?bogus_e.w?=CRLF date: (pre comment)13feb199922:59:56+0000(PST)CRLF keywords: (various illegal constructs which nevertheless get thro ugh)\(Notacomment\),\"(naked quoted-pair),\(not a quoted-SP)CRLF Appendix C - PGP Public Key For the benefit of those who would like to experiment with the examples given in section 5, the following is the Public Key for 0xA336D40C (DSS-example). -----BEGIN PGP PUBLIC KEY BLOCK----- Version: PGPfreeware 5.0i for non-commercial use mQDiBDuB8NIRAgDeGDfg+ZlgHDZkkXDpeeaBJxIq9/pkuFL/6puw9j+k/JsKzLr9 ktqlFgkdnDyYbWm26lWAmjliZEeIyggBlxSlAKD/lbF/4JAJox/7xqW8fuSc9sPO AwIA0rQJ1TEhIztyUYB5j4D9V7pHKyhbdifFEf1MwrYsnluiejd5/K623J4wQr/m +zMzr7lnX6ZLPkITKgfgpjoAWQIAzg9BAYwHGVgjRg82MxxlP5737ihfa0yWMeVn KTU1mToKMaokGMrMnvuOjvu6GmgHdbfgaFXThrnuerN8rRqVP7QLRFNTLWV4YW1w bGWJAEsEEBECAAsFAjuB8NIECwMBAgAKCRAkESrJozbUDIgdAKCc4eqIbAFlOB6O rWv8CzMPBNo2ZACeKOD6mS+GrEgQkD+cW1MytHVjFTE= =Zl1B -----END PGP PUBLIC KEY BLOCK----- [Would it be useful to include descriptions of pgpverify and pgpmoose as additional appendices?] Lindsey [Page 32]