Network Working Group                                      J. C. Klensin
INTERNET DRAFT                                             July 15, 1992
Updates: RFC-821                               Expires: January 27, 1993

          SMTP Extensions for Transport of Enhanced Messages

Abstract

A series of extensions and clarifications are provided for the Simple Mail Transfer Protocol specified by RFC-821. In combination, they provide for the transport of "8 bit mail", i.e., data characters with all bits of the octets used for information, for more robust and efficient handling of large messages, and for an improved foundation for any future extensions to SMTP.

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

This document is a working draft as part of the development of an extension to the SMTP protocol. A subsequent version will be submitted to the RFC editor as a proposed standard. Distribution is unlimited. Comments are solicited and should be sent to the editor at Klensin@MIT.EDU or, preferably, by joining in discussions on the ietf-smtp mailing list (subscription requests to ietf-smtp-request@dimacs.rutgers.edu, postings to ietf-smtp@dimacs.rutgers.edu).

1. Introduction and Background

1.1 Introduction

RFC-821 [RFC821] defines a protocol, SMTP, to transfer mail reliably and efficiently. It is largely independent of the transmission subsystem used.
It requires only a reliable ordered data stream, of at least 7-bit units, that consists of "lines" and "characters". It also makes some implied assumptions about end-to-end virtual circuit connections as the primary model for transporting and delivering mail.

SMTP, as described in RFC-821, is restricted to the transport of data in 7-bit ASCII [ANSI-X3.4] encoding. The term "ASCII", as used in this document, refers to ANSI X3.4, and not to any national language variations on ISO 646; the use of "US" with "ASCII" is merely to add additional emphasis when that appears useful.

Strictly speaking, incorporation of any non-ASCII character encoding, whether 7 or 8 bits, or the assumption of a special interpretation for any control character other than ASCII, CR, and LF is an extension from RFC-821 that may not be compatible in subtle ways with existing conforming implementations. Such extensions require either changes to RFC-821 itself, or prior agreement among all parties and hosts which will transport or handle the mail. A strict reading of RFC-821 would permit the receiver of a message to assume that it contained only ASCII characters.

MIME [RFC1341], Multipurpose Internet Mail Extensions, provides for identifying the use and encoding of character sets other than [US] ASCII within a structured message body, using extended headers. Because MIME does not require an 8-bit transport mechanism, its use with 7-bit transport is likely to provide better interoperability than the use of an 8-bit transport mechanism in situations where mail must be passed through one or more unknown mail relays, gateways, or exploders between the sender and the receiver. At the same time, most electronic mail messages do not pass through such mechanisms, but are simple textual messages sent within a small, "local" community of users.
Within such local communities, sending characters represented using other than 7-bit coding with a transport mechanism that logically reflects the length of the character codes, without additional encoding, provides considerable simplification. Such a system has been much in demand for 8-bit characters. The consequences, within a community that has decided to use this protocol extension, of discovering that a receiving host will not accept transport of extended-length characters, are also not severe, since that problem will presumably rarely arise. Nonetheless, this document provides a framework for conversion of enhanced transport forms into the line- and character-oriented 7-bit form permitted by the original SMTP and outlines mechanisms by which the acceptability of 8-bit transport to a given server can be inferred without first opening a mail connection.

In addition to the issues of 8-bit transport and general extensibility, a number of trends, not least of which is the introduction of the extended "multipart/multimedia" design of MIME, have contributed to steady increases in the average and maximum sizes of messages that people wish to transport over electronic mail facilities. When hosts imposed limits on mail message sizes in the early days of the ARPANET, limits in the ranges of four, or even one, kilocharacters, were considered reasonable. The applications-level host requirements RFC [RFC1123] specified 64K characters as the minimum size at which it was reasonable to reject mail messages for excessive length.

Under RFC-821, there is no mechanism for rejecting a message as being too long without actually having that message transmitted. There is also no provision for checkpointing or otherwise salvaging the portion of the message transmitted before the size limit of the server is encountered. If all hosts accept messages of at least 64Kb, experimenting with longer limits may waste considerable bandwidth.
This may be a major consideration on slow or expensive links. This document provides a model for determining whether a large message will be accepted without actually transmitting most of it.

1.2. Background, History, and Context of this Draft

The strongest evidence for the importance of 8-bit transport is that many vendors and implementors already support it over the usual SMTP channels and many report that they have done so in response to intense customer pressure. Since the mechanisms that have been chosen have not been standardized, messages containing octets with the high bit set may "escape" the local environment. Difficulties of varying degrees of severity may arise when they do so, including information loss as characters are "bit stripped", which may be considered a severe violation of user expectations about reliable mail transport and delivery.

This document has two primary purposes. One is to specify a clear extension model for SMTP, so potential problems with further extensions can be avoided. The second, and the original goal of the working group, was to provide for 8-bit character-oriented transport via SMTP when that is deemed necessary. A critical secondary purpose is to standardize mechanisms and clarify procedures in ways that prevent destructive "escape" of improperly-identified 8-bit characters and potentially even more severe problems which could otherwise result from the transport of characters that comprise multiple octets or of data not organized into character form.

In other words, transport of 8-bit characters is occurring, will continue to occur, and is perceived as desirable under many circumstances. For it to coexist with older, more restricted, implementations, requires that it be used in a coordinated way and only when both parties are able and willing to use it. That, in turn, requires a clear mechanism as to how coordination will occur and agreement be verified. This protocol extension provides that mechanism.

2. Notation and terminology

There are several situations in this document in which the bit pattern associated with the code for a character is, in the event of possible ambiguity, more significant than the character itself. In those situations, the bit pattern is cited (in hexadecimal notation) as the value of the octet, and the referenced ASCII characters are then indicated in parentheses.

When characters, or character names, are mentioned, they are to be construed strictly in accord with ASCII, that is, from American National Standard ANSI X3.4-1986. However, for the purposes of this specification, the "international reference version" table in ISO 646 [ISO646] and that in ASCII are identical.

<> Discussion: ISO 646 has traditionally contained two character tables. One is called the "International Reference Version" (often referred to as ISO646/IRV) and has been identical to ASCII except for the substitution of "universal currency symbol" for "dollar sign". The other is called the "Basic Version" (often referred to as ISO646/BV). National Language Variants on ISO 646 (often referred to as ISO646/NLV-language) are derived from ISO646/BV by the substitution of national characters into positions that ISO646/BV designates as reserved for this purpose. So-called "invariant ISO 646" is a large subset of the non-reserved characters in ISO646/BV.

Except in a few situations where the distinction is important, the terms "8-bit characters", "8-bit text", and "8-bit transport" are used interchangeably to refer to messages that might contain octets with the high bit set to 1. As above, when the distinction is important, the term "octet" is used rather than "character" or "byte". While it provides a framework that could be extended to the logical transport of characters longer than 8 bits, this document does not specify or permit such transport.
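
<> Example: The working definition of "8-bit" given above (a message that might contain octets with the high bit set to 1) can be tested mechanically. The following Python fragment is a minimal sketch and not part of the specification; the function name has_high_bit is hypothetical and chosen only for illustration.

```python
def has_high_bit(data: bytes) -> bool:
    """Return True if any octet in data has its high bit (0x80) set.

    Hypothetical helper: a client could use a check like this to decide
    whether a message body is "8-bit" in the sense used by this document.
    """
    return any(octet & 0x80 for octet in data)


# A body of pure ASCII octets is not "8-bit"; one octet >= 0x80 makes it so.
print(has_high_bit(b"plain ASCII text\r\n"))
print(has_high_bit(b"caf\xe9\r\n"))
```

The check operates on octets, not characters, which matches the distinction drawn above between "octet" and "character".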
<> Discussion: Character codings in which individual characters occupy more than one octet (e.g., 16- or 32-bit character codes), may pose special problems for SMTP-style transport that 8-bit characters do not. In particular, with some possible codings, some of the octets of some characters might have the same bit patterns as, e.g., CR LF. Such character sets can always be handled by an encoding above the mail transport level, so that only conventional 7-bit or 8-bit characters are actually seen by the mail transport mechanisms. However, if they are to be transported in "native" form, transport extensions beyond those specified here will be required to ensure the unambiguous recognizability of CR, LF, and "." or to avoid the necessity of recognizing those characters.

For many years, we have used the term "gateway" in discussions of mail to refer to something that operates at the applications level, translating between different mail protocols or environments. Such gateways are quite distinct from gateways at the IP and routing level. The term "mail gateway" should perhaps be used consistently to avoid any possible confusion, but that confusion rarely occurs. A similar situation applies when we discuss "transport" in a mail context, referring to "mail transport" and not the underlying transport layer. With RFC-821, we have had a "mail transport" mechanism that logically deals only with 7 bit characters, even though most of the underlying transport layers deal in octets. This document extends the logical mail transport environment to full octet width, again largely independent of the underlying mechanisms. "Transport" in this document should always be read as "mail transport" and not in terms of how the network carries octets or packets.

This document uses the terms "byte" and "kilobytes" in several contexts. "Byte", as used here, is taken to be an 8-bit quantity, not one of variable size.
In particular, "kilobyte" is intended to be construed as 1024 octets, not as 1024 "characters". Similarly, "length" terminology in this document is always associated with bit and octet storage units and not with the lengths of strings in character units.

The terms "client" and "sender" are used together or interchangeably to indicate the source of a particular mail transaction and "server" is used to describe the target. This usage is consistent with that in RFC-821.

This document uses the term "mail transaction" to indicate a use of SMTP, with or without the extensions specified here, with the intent of sending mail. Mail transactions always start with a HELO or EHLO command with the intent of following it with a FROM command and one or more RCPT verbs. Use of VRFY, EXPN, or the new EHLO, SIZE, or EVFY commands without prior FROM commands in the SMTP session does not constitute a mail transaction. A mail transaction ends when a QUIT or RSET is sent. It may also be considered to end when the CRLF.CRLF that terminates the DATA command is received but, in that case, a second mail transaction in the same [connection] session will normally start with a FROM command rather than HELO or EHLO.

<> Discussion: From the standpoint of the sender/client, a mail transaction is a matter of intent. From the standpoint of the server, the state is "not a mail transaction" until a FROM command is received.

This protocol provides the framework for several features that require specification in additional RFCs before they can be validly used. For those features, which are clearly identified, this document provides only syntax and, in most cases, an overview of the characteristics that must be defined in feature-specific supplemental RFCs.

The major feature for which neither this document nor RFC-821 provides is a specification for the transport of information that is not structured into "characters" and "lines".
However, this document provides a framework around which such a definition might be developed in the future if that were desired. Language in this document that implies forms of enhanced transport other than that specified with the EMAL verb has been retained to allow for compatible extension for additional features.

Finally, this document contains many subsections that are identified with the terms "discussion", "implementation note", or "example". While not strictly part of the specification, these subsections provide context for the features, guidance for implementors, and other "folklore" about the intent of the working group that produced the specification to aid understanding and the generation of interoperable implementations.

3. Organization and summary

This document consequently contains ten major components, which follow:

(i) Definition of a new SMTP verb, EMAL FROM, as an alternative to MAIL FROM where 8 bit transport is desired.
(ii) Provision of a framework for additional transport type extensions via the addition of additional "FROM" verbs.
(iii) Definition of a new SMTP verb, EVFY, which can be used to determine whether an enhanced transport request is likely to be accepted for a particular address.
(iv) Definition of a new SMTP verb, EHLO, which can be used by a client as an alternative to HELO. Either HELO or EHLO may be used when the client intends to use enhanced mail facilities, but the server response to EHLO provides the client with a structured listing of the mail features supported by the server.
(v) Definition of a new SMTP verb, SIZE, which can be used by the client to inform the server of the approximate size of the data to be transferred. The semantics of this verb and its interaction with maximum message size limitations are also specified.
(vi) A discussion of the interaction of enhanced transport with message formats (i.e., RFC-822 material)
(vii) A description of enhancements to "trace" header fields to permit more efficient isolation of problems in today's more complex world.
(viii) Clarification, and additional specification, of the RFC-821 description of the application and semantics of the RSET command.
(ix) A discussion of failure and error conditions when server and client conform to this protocol.
(x) A discussion of the handling of failure conditions in general with specific discussion of the alternatives in the "unverified eight bit encountered" error. This problem might be encountered when the server conforms to this specification but the client does not.

Most of these sections impose requirements which are mandatory if the protocol specified here is implemented.

Under some circumstances, such as the management of large mailing lists or relays with large aggregate message traffic, the costs of opening a mail connection and determining whether the destination host will accept the enhanced features specified here may be considered excessive. Additional specifications will be needed to provide methods for making that determination using the domain name system or other tables or caching methods. This protocol does not require that those facilities be used, nor does the use of those facilities change the actual mail transport command sequence specified here. Advice as to when supplemental facilities are permitted or required to be used may appear in future applicability statements.

4. The EMAL FROM verb

The SMTP protocol, as specified in RFC-821, is extended to permit the use of a new verb that supplements the "handling" components of what we shall refer to as the "FROM" verbs. In other words, this specification adds "EMAL FROM", as defined below, to the "MAIL FROM", "SEND FROM", "SOML FROM", and "SAML FROM" forms specified in RFC-821.
This addition also provides an explicit extension model for future transport variations as needed. If this new verb is to be taken as an acronym, "E" should be read as "enhanced" or as "eight".

This is an extension in the traditional sense. An implementation MUST NOT be so constructed that it is possible for it to accept EMAL FROM in a context in which it would not accept MAIL FROM.

<> Discussion: Mailing list discussion seems to indicate that this should be explicitly stated. It is a statement for which conformance is easily tested "on the wire".

As specified in RFC-821, DATA is treated as introducing a stream of ASCII (and therefore 7-bit) characters, divided into lines that are delimited by the ASCII control characters CR followed by LF, with potential restrictions on line lengths, and terminated with the sequence "CR LF . CR LF".

If 8 bit transport is desired, the MAIL FROM verb is replaced by EMAL FROM. If the receiver does not recognize that verb (which will be the case with all SMTP servers that conform to RFC-821 alone), or will not accept enhanced mail features, it gives a fatal negative reply (500 if the verb is not recognized, 556 if the verb is recognized but not accepted). Such a reply would indicate that the sender MUST NOT send octets with the high bit turned on. Otherwise the receiver sends the positive 250 reply identical to that normally returned in response to MAIL FROM. The sender can then proceed with the rest of the mail transaction, sending a message containing 8-bit text after the DATA verb.

All SMTP command verbs, including the enhanced FROM verbs, are written in ASCII characters. Nothing in this specification provides for any character or coding other than those of ASCII to be used in SMTP transactions ("the envelope"). It does provide for such characters in the message body initiated by the DATA verb and terminated with CR LF . CR LF. The format of such messages is as specified in other RFCs, e.g., RFC-822 [RFC822] and MIME [RFC1342].

5. Further extensions to SMTP

5.1 Provisions for adding additional FROM forms.

Specifications may be written to add additional FROM forms which should then be registered with the Internet Assigned Numbers Authority (IANA) in accord with the provisions of section 5.3.

5.2 Provisions for other new verbs

The general approach used in this specification assumes that further extensions to support different forms of transport that preserve a character-and-line-oriented model (e.g., direct transport of character sets that must be handled in special ways due to multiple-octet encodings or transport-level data compression) will be handled by adding additional forms of the FROM command, as discussed above. Other transport arrangements (such as data "streaming" or transport of true binary data), if introduced, may require additional or variant commands and verbs. Such new verbs may be registered with IANA, as described below. In addition, those that are intended for general use should be documented in RFCs and submitted for standardization.

In order to permit experimentation, verbs starting with the character "X" are reserved for use between consenting systems by mutual agreement. Any command without a rigorous and public definition must be given a name starting in "X", and public (registered) values shall never begin with "X".

<> Discussion: All commands defined by RFC-821 and by this specification have precisely four characters in their first (or only) token. It is likely that some mail systems depend on this property, which should be preserved unless there is reason for doing otherwise.

5.3 Specifications and Registration Procedures

Even when new features or verbs are not intended for general use, it is undesirable that two different sets of systems use the same verb in different ways.
The introduction of the EHLO functionality (see section 7), which permits a client to interrogate a server about the features supported by the latter, may exacerbate the potential problems of identical verbs being used for different purposes, since a client may discover a server-supported feature when no prior agreement exists between the two hosts.

To avoid these problems, and to keep the specification of EHLO useful, unambiguous, and meaningful, any verbs used in SMTP processing must either be registered or must be explicitly private. Registration must be with the Internet Assigned Numbers Authority (IANA) and may occur in either of the following forms:

(i) Commands intended for general use: These commands should be developed and documented using IETF standards-track procedures. The RFCs or working drafts leading to them are expected to specify any and all special treatment that these new verbs imply for transport. Such documents must also specify any deviations or exceptions to the rules of section 9.1. If the transport extensions being proposed have implications for conversions at gateways, those conversions must be discussed and, preferably, completely specified. The verbs should be registered when serious testing begins, but not later than approval of the extension as a proposed standard.
(ii) Commands intended for use in special communities: the names of these commands should be registered, along with a short description of applicability, before the commands are placed into use.

In either case, verbs must be registered before any server announces them in the response to an EHLO inquiry. Completely private verbs -- those starting in "X" as discussed above -- do not require registration and will not be registered. Except for short-term experimentation, use of such verbs is discouraged.

6. EVFY command

A new command verb, EVFY, is defined, corresponding to VRFY (as defined in RFC-821) and with the same argument, but requesting information as to whether the address is acceptable for 8 bit transport. EVFY has the same reply codes as VRFY, but the successful 250 or 251 codes are returned only if enhanced transport will be accepted for that address. Code 556 must be returned if the address is acceptable, but enhanced transport will not be accepted for it. EVFY without an argument MUST be treated as a syntax error. Circumstances in which the address appears to be valid but is remote or cannot be exactly verified for other reasons, should be treated as specified for VRFY in RFC-1123 [RFC1123].

7. EHLO command

Clients may be able to act more intelligently if they can determine the characteristics and capabilities of servers to which they expect to send messages. It is desirable for a client to be able to determine which of the SMTP extensions defined herein are supported on a particular host. Similarly, a client might wish to be able to determine whether other optional features of RFC-821 such as SEND, SOML, or SAML are provided by a given server.

<> Discussion: Under some circumstances, it may be desirable to make some of these determinations in an out-of-band way. This protocol does not prohibit such mechanisms and anticipates at least one of them. See the discussion at the end of section 3 above.

A new command verb, EHLO, is defined. In order to minimize the number of commands issued, if the special capabilities of EHLO are needed, it is used as an alternative to "HELO", not as a separate command. If a server implementation provides support for any of the features specified in this document or subsequently defined as specified in section 5, it must accept and process EHLO. The argument syntax for EHLO is identical to that for HELO, i.e., the primary fully-qualified domain name of the sender-SMTP.
Except for verbs starting in "X" (see the discussion in section 5), all verbs supported in a particular server must be listed. In addition, LIMIT information must be provided as specified in section 7.3. Other than verbs starting in "X", no verb may be listed that is not specified in RFC-821 or this document, or registered as provided for in section 5. Verbs starting in "X" may be listed or not depending on the particular needs of the server or private agreements between server and client.

The term "supported", as used above, implies that the server provides meaningful support for the capability implied by the command, rather than just recognizing the verb. For example, SEND FROM is not "supported" if the command would be refused with all possible predicates or if there are no possible addresses (in RCPT TO) that will be accepted. The response is also expected to reflect actual system configuration and operation, e.g., if a server implementation provides support for VRFY, but the command is disabled for security reasons as provided for in RFC-1123, EHLO should not list that verb.

<> Discussion: If VRFY is meaningfully supported--i.e., the server expects to actually confirm accessibility of addresses--then it should be listed even if some (or under some circumstances most or all) addresses that the server supports cannot be confirmed in real time.

The information provided by EHLO will usually be static for most servers (at least once they are configured for a particular site). However, since it might change from session to session in some cases, clients should, in general, not cache the information between mail connections.

<> Discussion: There are several SMTP servers in use on the Internet that support and use verbs that are not specified in this document or in RFC-821. Since RFC-821 takes no position on command extensions, these may be conforming implementations.
This specification does take such a position (above and in section 5) and therefore has much stronger conformance implications than RFC-821. The intent with the EHLO verb and other enhanced capabilities is not to invalidate any existing RFC-821 implementations that are valid in the absence of this specification. It is, however, intended to provide an "all or nothing" approach: if enhanced capabilities are supported (e.g., EHLO is accepted at all) then all of the stricter requirements of this specification apply. In particular, if these enhanced capabilities are supported, then any (non "X") verbs that do not appear either here or in RFC-821 must be registered with IANA and reported by EHLO.

7.1 Server considerations

If an EHLO command is received the server must return a formatted message that consists of a multiple-line 255 reply, using the continuation convention specified in RFC-821. The first line of this response will be the primary fully-qualified domain name of the receiver-SMTP. Subsequent lines will consist of a verb supported by the server and a special LIMIT indicator with two values (see section 7.3, below). All verbs supported by the server must be included in the reply, including those specified in RFC-821, in this document, and in extensions provided for in section 5.

For example, a typical receiving server supporting this protocol might respond to an EHLO command with:

   255-foo.domain
   255-HELO
   255-MAIL FROM
   255-VRFY
   255-EXPN
   255-RCPT TO
   255-EHLO
   255-EMAL FROM
   255-EVFY
   255-SIZE
   255-RSET
   255-QUIT
   255-DATA
   255 LIMIT 64 3000

<> Discussion: 255 was chosen using the "positive completion" and "mail system" model of RFC-821. This is really a system response in context, rather than a purely informational (x1z) one. The terminal "5" is arbitrary, motivated by a desire to leave a little space after 251. Note that the normal response to HELO is 250, not 255.
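
<> Example: A client might split a multiple-line 255 reply of the kind shown above into the host name, the list of supported verbs, and the two LIMIT values. The following Python fragment is a minimal sketch under stated assumptions and is not part of the specification: the names EhloReply and parse_ehlo_reply are hypothetical, the reply is assumed to have already been read from the connection as a list of text lines without their CRLF terminators, and the LIMIT line is accepted at any position after the first line even though the example places it last.

```python
from typing import List, NamedTuple, Optional, Tuple


class EhloReply(NamedTuple):
    """Hypothetical container for the contents of a 255 reply."""
    host: str
    verbs: List[str]
    limit: Optional[Tuple[int, int]]  # the two LIMIT values, in kilo-octets


def parse_ehlo_reply(lines: List[str]) -> EhloReply:
    """Parse a multiple-line 255 reply into host, verbs, and LIMIT values."""
    if not lines:
        raise ValueError("empty reply")
    payloads = []
    for index, line in enumerate(lines):
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != "255":
            raise ValueError("not a 255 reply line: %r" % line)
        # RFC-821 continuation convention: "-" after the code on every
        # line except the last, which has a space instead.
        expected = " " if index == len(lines) - 1 else "-"
        if separator != expected:
            raise ValueError("bad continuation marker: %r" % line)
        payloads.append(text.strip())
    host = payloads[0]  # first line carries the receiver's domain name
    verbs = []
    limit = None
    for text in payloads[1:]:
        parts = text.split()
        if parts and parts[0] == "LIMIT":
            if len(parts) != 3:
                raise ValueError("LIMIT needs two values: %r" % text)
            limit = (int(parts[1]), int(parts[2]))
        else:
            verbs.append(text)
    return EhloReply(host, verbs, limit)
```

Applied to the example reply above, this sketch would yield the host "foo.domain", twelve verbs (including "EMAL FROM"), and the LIMIT pair (64, 3000). A client following section 7.2 could then refuse to send any verb not found in the resulting list.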
Servers supporting EHLO MUST NOT return "502 Command not implemented" for any verb that they list in their response to EHLO. Servers that support any enhanced verb (i.e., a verb that appears in this specification) that does not appear in RFC-821 MUST NOT return "500 Syntax error" in response to EHLO.

<> Discussion: The statements above are part of the process of binding all of the enhanced commands specified here together on a basis in which implementations are expected to support all of them together, rather than picking and choosing. At least one example of these rules could be stated as "EMAL support does not conform to this specification if the server that purports to support EMAL does not support EHLO; EHLO support does not conform if it does not return all of the "supported" verbs (as defined above); and an EHLO implementation does not conform if it returns any (non-"X") verb strings that are not in this specification, in RFC-821, or registered with IANA."

7.2 Sender (client) considerations

A client that chooses to send an EHLO command to a server will receive either host identification and a capability list or a "500 Syntax error" indication (if the EHLO verb is not supported). If a capability list is received, the client MUST NOT send verbs not listed. If a 500 reply is received, the client MUST NOT attempt to send EMAL or other verbs that do not appear in RFC-821. A client that chooses to not send an EHLO command may, in parallel with the RFC-821 model, attempt to use any desired facility and determine its availability based on the response codes.

<> Discussion: EHLO Response Caching. A client is permitted to send whatever commands it likes if it does not send EHLO. The worst that can happen is that it will get a rejection reply, and that can happen regardless.
With the exception of the size limits, the other capabilities can be thought of as optimizations--if you can use them in a particular situation, then things will work a little better, or smoother, or faster, or... And it will be a rare case indeed that a host will start supporting a given enhanced feature and then stop supporting it. So caching and consequently making a "wrong" inference is not, unlike picking up a bad address from a DNS cache, a threat to much of anything. It would be sensible (although not necessarily ideal) to operate on a basis that, if you don't like the answer you get, you don't cache it for very long at all; if you do like the answer, you keep it cached until such time as you get a rejection message.

<> Discussion: In practical terms, if a client sends "EHLO" to a server and receives a response starting in 5, it is explicitly justified in assuming that the server does not support 8-bit transport according to this specification. If it gets a sequence of 255 responses, it is explicitly justified in assuming that the server does support 8-bit transport as specified here, as well as the additional features (such as SIZE verification) that this specification requires be supported if EHLO is specified.

7.3 The LIMIT Reply of EHLO

As message sizes grow it becomes progressively more useful for connecting clients to know whether or not a server will be able to accept a message of a given size. The LIMIT reply of EHLO returns two values that are set by the administrator of the server to guide the client in the handling of large messages. The reply consists of the reply code 255, the keyword LIMIT, and the two values, as in the "255 LIMIT 64 3000" line of the example in section 7.1.

Both values are positive integer counts referring to message sizes (length during transport) in kilo-octets. The first value is the size below which the server will accept a message under normal conditions. The second value is the size above which the server will not accept a message. The second value must be greater than or equal to the first, which must, in turn, be greater than or equal to zero.
Messages with sizes between the lower and upper values may be accepted by the server, depending on available resources. <> Implementation note: Servers may, for various administrative reasons, not want to give out exact limits. In practice, limits may also depend on the designated recipient, with some users able to receive larger messages than others even on a given host. For these and other reasons, the two values should be taken as general guidance, but not as absolute figures. Note that existing provisions of RFC1123 imply minimum values of LIMIT 64 64 for Internet hosts. <> Discussion: A client can follow up this information by using the SIZE verb (see below) to determine if the server is willing to accept messages between the lower and upper values.

                    lower value        upper value
                         |                  |
    |--------------------<==================>--------------------|
     Server will accept       Ask Server       Server will reject

8. SIZE verb A new command verb, SIZE, is defined. Server implementations that provide support for the enhanced mail verbs must accept and process SIZE. SIZE does not specify an exact message length, but an upper bound, in kilobytes, on what will be transmitted as the "message", i.e., the number of octets to be placed on the wire after the DATA verb and up to the terminating ".CRLF" string. It is intended for server capability verification purposes and not as an alternative for delimiting the end of the message body. <> Discussion: This type of estimated size has two characteristics that exact sizes (byte or bit lengths) do not. First, it may be convenient to estimate this type of size from crude file system measures (e.g., "number of records" or "number of pages"), while a specific length may require careful examination of the data stream for, e.g. "." characters appearing at the beginning of the line.
Second, it is not unusual to change internal end of line conventions to SMTP CRLF, to remap character sets, or to perform encodings to different transport conventions dynamically, rather than storing the transport-encoded mail file prior to transport (see "client considerations", below, for additional discussion of this estimating process). Various widely-practiced transport behaviors (e.g., deletion of trailing blanks), while undesirable, also can distort exact sizes. It is possible with a crude upper-bound size to statistically estimate the effects of these transformations, while exact sizes require creation (or careful simulation) of the file to be transported and possibly simulation of the transport mechanism. The argument to SIZE is a numeral specifying the predicted maximum message length in kilobytes of the message that is part of the current mail transaction. A SIZE agreement (i.e., sending of the command by the client and positive reply by the server) extends only through the end of the next DATA statement. <> Discussion: Kilobytes, rather than bytes, were chosen to stress the fact that this is an estimate, rather than a precise value, and to prevent anyone from trying to infer end-of-message from it. The use of a maximum-kilobyte estimate is also intended to smooth over most of the differences among file systems in terms of representation of end of line, widths of characters, and so on. However, the intent is to have this estimate be of the only length that has a canonical meaning, that is, the number of octets actually being transported, rather than the length in either the sending or receiving file system. SIZE should be sent, if at all, after the FROM command and prior to any RCPT commands. SIZE is not meaningful outside a mail transaction; EHLO should be used to obtain similar information. 
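As an illustration of the estimating process discussed above, the following Python sketch counts the octets that would be placed on the wire for a message stored locally with single-character (LF) line delimiters, and converts an octet count into an upper-bound SIZE argument. It also shows the worst-case bound that section 8.2 describes (twice the number of characters in the file). It is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, a kilobyte is taken to be 1024 octets (this document does not say), and the terminating "." line is not counted.

```python
KILO = 1024  # assumption: this document does not define the size of a kilobyte


def transport_octets(local_text: bytes) -> int:
    """Octets sent after DATA for a message stored with LF line delimiters.

    Each LF becomes CR LF, and each line that starts with "." gains one
    extra ".". The terminating "." line is not counted (an assumption of
    this sketch). An unterminated final line is still sent with CR LF.
    """
    lines = local_text.split(b"\n")
    if lines[-1] == b"":
        lines.pop()                 # a trailing LF closes the last line
    total = 0
    for line in lines:
        total += len(line) + 2      # line content plus CR LF
        if line.startswith(b"."):
            total += 1              # doubled leading period
    return total


def worst_case_octets(local_length: int) -> int:
    """Worst-case bound for an LF-terminated file: twice its length."""
    return 2 * local_length


def size_argument(octets: int) -> int:
    """Round an octet count up to whole kilobytes for use with SIZE."""
    return -(-octets // KILO)       # ceiling division


message = b"Hello\n.signature follows\n"
print("SIZE", size_argument(transport_octets(message)))
print("SIZE", size_argument(worst_case_octets(len(message))), "(worst case)")
```

Rounding up keeps the value an upper bound, which matches the intent that overestimating is safer than underestimating. The worst-case bound only holds when every line in the local file, including the last, ends with the line delimiter.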
8.1 Server considerations If a SIZE command is received as part of a mail transaction, the server SHOULD make any of three types of replies: (1) An acceptance reply, normally "250 OK", indicating that a message of up to that size may be accepted. A server MUST NOT make this reply and then reject, for reasons under its control, a message whose transport size is less than the limit specified. (2) A temporary rejection reply, normally "452 insufficient system storage", indicating that a message of the specified size cannot be accepted at present, but that this is a temporary restriction. This response means that the requested size is acceptable to the server system at some times. (3) A fatal error reply, normally "552 message size too large", indicating that a message of the specified size is not acceptable to this host. <> Discussion: This distinction is made for those cases where the size limitation may be quite transient and consistent with the sender's requeuing the message for retry and delivery later. Examples of such limitations would be such traditional problems as "system disk full", but not "we expect a new system release next week that has higher limits". Of course, if it is feasible, it would be better for systems in these transient situations to accept the message and queue or store it locally, but these could be very large messages, at least in principle. <> Implementations of support for SIZE should use caution to insure that it does not become a conduit for denial-of-service attacks. Support for SIZE is discouraged outside mail transactions and no semantics are defined for it. Enhanced servers should reject it as a syntax error, or, preferably, with "503 SIZE not accepted without a FROM verb". 8.2 Sender (client) considerations As part of a mail transaction, a sender MAY send the SIZE command for messages whose expected length is below 64 kilobytes. 
Senders are encouraged to send the SIZE command or use EHLO or out-of-band information to verify normal capacities for messages whose expected length is larger than 64 kilobytes. Server rejection of the SIZE command as a syntax error (not permitted from enhanced servers) SHOULD be construed by the sender as "no information" and the sender should behave as it would have behaved had the SIZE verb not been sent. If the server accepts the SIZE command but rejects the particular size requested with a temporary or fatal reply code, the sender may either abandon the mail transaction (sending QUIT or RSET) or may continue with it. However, it is not intended that SIZE become a subject for iterative negotiation between sender and receiver; senders MUST NOT send a second SIZE command within the same mail transaction. <> Discussion: Nothing other than good sense prevents a client from wildly overestimating a SIZE and, for obvious reasons, overestimating is better than underestimating. Overestimated sizes may, of course, result in unnecessary rejections. It seems unreasonable to require that servers enforce the limits to which they earlier agreed, although most will presumably enforce some limit at or above the accepted size. This is consistent with the general model of SIZE as specifying a sloppy value. <> Discussion and implementation note: One of the difficulties in estimating the amount of data to be transmitted, and hence the value to be sent with SIZE, arises when the internal storage conventions of the originating host use a single character end of line convention, or some other marking or counting convention, rather than CR LF. In this situation, some implementations have historically created a file in Internet canonical mail transfer form, i.e., with doubled leading periods and CR LF line delimiters, while others have converted to the Internet form as lines are read in and actually transmitted. 
In the latter case, while the file size on the local host may be readily determined from the file system, the actual number of octets to be transmitted is not known until after all of them have been sent. If such an implementation does not wish to scan the file and count line delimiters for performance reasons, the worst-case estimate of SIZE for systems using single-character line delimiters is twice the number of characters in the file (expressed in kilo-octets). This worst case would be reached if either the file consists only of line delimiters or if it consists of alternating periods and line delimiters. Since the optimal value to be sent with SIZE for files with no line-starting periods is the internal length of the file plus the number of lines it contains (that is, adding one extra character per line), a considerably better estimate than the one for the worst case may be obtained by knowing the average or typical line length in the file and dividing it into the file size to obtain the number of lines. In some cases, such as those in which a message composing agent performs line wrapping and filling functions, typical line length information might be obtained from that agent. In others, a much better-than-worst-case estimate may be obtained statistically by sampling the lengths of lines in the file, preferably by probing at random or by examining lines at several different points in the file, or, if that is not feasible, by examining the first several lines. Similar logic applies when "lines" in the internal file system are denoted in some fashion that does not involve an end-of-line character sequence, e.g., by carrying character counts for each line. 8.3 Server replies to RCPT in an implementation supporting SIZE. A relay (or post office host) that can not accept a message of some specified size may provide the client with the next hop information, in the hope that the next hop is either the final destination or can relay a message of this size.
If this information is to be supplied, it should be provided via the message "559 Too big, deliver to user at ..." which parallels the RFC-821 message code 551. <> Example: Many sites implement (typically via DNS MX records) a single host as the normal receiver of all mail for the site or organization. This host may, however, have limited resources relative to overall demands on mail flow into the organization. On the other hand, particular users may have powerful workstations which do not have the same resource constraints, such that having large messages sent directly to them might permit larger messages to be delivered. <> Discussion: Server designers contemplating this strategy should be aware that few sending systems have the capability of dealing with 551 codes automatically; these codes typically cause messages to be rejected and "bounced" to the user. Presumably the new 559 code in this case will get much the same treatment. Even if a SIZE command is sent and accepted, a server is permitted to reject messages based on size for individual addresses (i.e., after receipt of RCPT TO) by responding to the delivery address with code 552. Since a client may send large messages without first sending SIZE, or may, in principle, send sizes larger than those specified, a server may reject a message as being too long if it exceeds a specified size (or if size is unspecified) as provided for in RFC-821, i.e., by returning a 552 (preferred) or 554 code after the data are received. 9. Interaction with the message format and headers. Both RFC-821 and 822 explicitly reference "ASCII" as the character code in which all text is written and with which it is interpreted. The introduction of an enhanced transport mechanism introduces a potential ambiguity, since, while there is only one ASCII, there are many character sets and mechanisms using 8-bit and longer coding. This has two implications: 9.1 Message format.
When sending a message using EMAL FROM, the message format MUST conform to MIME and, in particular, with its provisions for specifying message body types and character sets. Hence, message body-parts which contain 8-bit data may do so only in a fashion consistent with MIME. 9.2 Header character set. With the exception of the "trace" or "time stamp" fields specified in RFC-821 and 822 and elaborated upon below, this specification imposes no requirements on mail header fields other than those in 9.1 above. Trace fields must be entirely in ASCII, using the leading zero form specified in RFC-821 if 8 bit underlying transport is in use. <> Discussion: Additional requirements about other header fields do appear in RFC-822 and RFCs that supplement it. This specification neither relaxes nor increases those requirements. 10. Trace fields RFC-821 specifies that mail transport agents add time stamps as trace information to messages they are processing. RFC-822 specifies, in section 4.3 (especially 4.3.2), the format of these ("Received") fields for relayed messages. RFC-822 indicates that additional "via" and "with" values should be registered but none have been as of the date of this document. While the tracing information specified in these earlier documents has proven useful, the Internet and its mail handling has evolved so that an audit trail that only documents relay and delivery activities has become inadequate. In particular, messages may be converted from one character set to another, formats may be altered, and address strings may be changed at gateways. These transformations, and information about where and how they were performed, should be included in the audit trail. The extensions to the transport mechanism contemplated here involve further complications, since gateways may be called upon to convert between one transport format and another, an activity that may require significant analysis and transformation of the message itself. 
The principle of providing and maintaining trace and audit trail information is reaffirmed and extended. Any mail transport facility, including gateways within the Internet and gateways from other mail systems, that relays, converts, translates, or otherwise modifies an enhanced mail message MUST add one or more "Received" fields to the message to document these changes. Mail transport facilities that relay, convert, or translate traditional SMTP mail are encouraged to do so. The intent here is to insist that any change to a message as it passes through a transport, other than adding the Received line, be documented, and documented fairly explicitly. The list of "Received" parameters in RFC-822 is extended to include ["convert" atom "to" atom ["to" atom]...] These represent, respectively, the character set and/or transport form received by the relay or gateway and the character set and/or transport form produced by the relay or gateway. "ASCII" and "EBCDIC", the keyword "8-bit", all of the transport encodings permitted in MIME, the keywords "7-bit-MIME" and "8-bit-MIME" (designating MIME over 7 and 8 bit paths respectively), and the keyword "unknown" (discussed below) are explicitly permitted for use with "convert" and "to". <> Discussion: "MIME7" and "MIME8", while obvious and more attractive alternatives, almost guarantee future confusion with, e.g., "version seven of MIME". Servers providing "Received" lines of this sort are explicitly encouraged to supplement the atoms associated with "convert" and "to" with parenthesized comments that provide prose descriptions of decisions made and actions performed when those might be helpful in subsequent understanding or debugging. When structured messages are converted from one MIME format to another, or from another format to structured MIME messages, the conversion will typically occur on individual body parts, not homogeneously for the entire message. 
These cases should be documented using body part conversion trace fields embedded in the message according to MIME conventions. "to 7-bit-MIME" is to be used in conjunction with such per-body-part conversion trace fields, to indicate that such fields appear and that the specific conversion information appears in them. <> Discussion: It is assumed that "convert 8-bit to 7-bit-MIME" will appear only if the message entering the gateway was determined to be in 8-bit form, but was not compliant with this specification in terms of verification of capabilities or use of MIME formats. See section 13 below. "Convert 8-bit-MIME to 7-bit-MIME" or "convert 7-bit-MIME to 7-bit-MIME" would both indicate that conversion trace information appears on a per-body-part basis in the message body and implies the presence of such information. As is the case for "with", multiple "to" parameters may be specified in a single "Received" header to denote multiple transformations. <> Discussion: If the relevant atoms were registered, this permits, e.g., "convert ASCII to PostScript to G3Fax..." although, under most circumstances, the starting and ending conversions within a given host are really all that is required. In practice, a specification that detailed would normally appear as "convert 7-bit-MIME to 7-bit-MIME", with additional information specified on a per-body-part basis within the message. As provided elsewhere in this specification, servers may choose to accept messages or protocol negotiations that are invalid in one or more respects and transform them into an acceptable form (presumably using external information) rather than returning them. In these situations, at least some of the information about the format of the incoming message cannot be known with certainty or specified with registered keywords. "Convert unknown to..." 
should be used to denote this situation, and the clause should be supplemented with a comment that indicates what was assumed about the incoming message, what actions were applied to it, or both. <> While there may be other cases, it is explicitly intended that "convert unknown" be used when the incoming message is invalid in the opinion of the server and the server attempts to "fix" the message before relaying it or passing it through a gateway. A parenthesised comment should be used to describe the fix applied. For the purposes of the "with" parameter, the original protocol specified by RFC-821 should be designated by "smtp", as indicated there. If the extensions of this protocol are used, "esmtp" should be used. 11. RSET and related RFC-821 issues RFC-821 is not specific about exactly what the RSET verb resets. This has apparently not been a problem in the past because of the simplicity of the protocol. This enhanced protocol includes additional commands and state information, making a more precise definition desirable. The definition provided should not constrain any existing RFC-821 implementation since it is consistent with both the current practice and the only two plausible interpretations. RSET is to be interpreted by SMTP servers as clearing state information present in a session. In particular, it eliminates the effect of any prior FROM commands, any DATA, and any delivery addresses. It resets the server's state to "not a mail transaction" (see section 2). RSET has been interpreted by some SMTP servers as requiring that a new HELO command be sent after RSET is acknowledged. Other servers assume that the previous HELO is not reset. Servers SHOULD accept a HELO command subsequent to RSET without special comment, overriding a previous one if necessary. Servers MUST NOT require a HELO command after a RSET. <> Discussion: The description above summarizes the current situation with SMTP implementations based on a series of experiments. 
No implementations have been identified that reject a second HELO, but it would not be surprising to find one. While the SMTP protocol provides for multiple destination (RCPT) commands, other state-inducing commands (e.g., the choice of MAIL, EMAL, or SEND with FROM) provide exclusive information that it is not meaningful to specify more than once in a given mail transaction. If a second instance of a state-inducing command appears in a given mail transaction, the server MAY either accept it, overriding earlier information, or may reject it as an out-of-sequence command with a "503 bad sequence of commands" code. A client sending multiple of these commands within a mail transaction MUST be prepared to send a RSET and start over, or to send QUIT and abandon the session, if 503 is received in this case. Clients SHOULD, if possible, behave in a way that avoids this situation. <> Discussion: The issues above do not arise in the normal case of multiple successful message transmissions in the same session, since each successful message completion (i.e., server receipt of DATA, the message, CR LF . CR LF, and then sending a positive completion reply) results in terminating a mail transaction. Clients SHOULD NOT send RSET after receipt of a 250 response after DATA and the message; servers MUST reset their states after sending that 250 response and MUST NOT require clients to send RSET before the next xxxx FROM command, where "xxxx" is "MAIL", "SEND", "SOML", "SAML", "EMAL" or some future extension as specified in section 5.1. <> Discussion: This involves another nasty and intrusive bit of reality about which RFC-821 is vague. Where something as meaning-laden as an enhanced FROM verb is involved, we can't leave this to chance. The discussion above prohibits the "use the first and ignore all the rest" and the "pick one to believe at random" cases. Some SMTP servers have been observed experimentally to work in the "accept the last one" model outlined. 12. 
Failure and error conditions 12.1 RFC-821 behavior with unrecognized verbs. While it is not quite explicit, RFC-821 appears to expect that, if a verb is not recognized by the receiver, it will reject the command with a "permanent error", 5yz, code, presumably 500 (Syntax error). Similarly, it appears to specify that, if the sender receives such a code, it must either abandon the mail message (sending QUIT or RSET, presumably) or do something else involving the same or a different verb; it may not simply ignore the 5yz error code and pretend it was a 2yz (or 354) code. This specification depends on that behavioral model. Consistent with RFC-821, we specify that existing SMTP servers are to reply with a return code of 500 (Syntax error) when any unfamiliar verb is received. <> Discussion: The material above should probably have made it into RFC-1123, but some of the issues--particularly the fact that anyone could ever have believed that anything else (such as simply ignoring 5yz codes) was permitted--have emerged only in the process of the investigation leading to this specification. Nonetheless, this clarification is believed to be consistent with existing usage and implementations of SMTP. <> That belief has been reinforced by fairly extensive testing-by-probing of existing implementations. No implementations exhibited catastrophic failure upon receipt of an unknown verb and all of those probed responded to such verbs by returning a "500 Syntax error" response. At the same time, it is impossible to verify that unknown commands will not cause subtle state changes in servers. Consequently, SMTP clients SHOULD respond to a "syntax error" reply by sending RSET and starting over, rather than assuming the state of the remote machine. 12.2 Responses when EMAL is recognized. An SMTP server which does implement this specification may nonetheless respond to the EMAL verb or its variations with an error message.
The new code 556 is assigned to this purpose, to be construed as "enhanced transport not accepted" if it appears in response to EMAL FROM. Presumably this would occur only if the originator address (the parameter to EMAL FROM) was unacceptable for enhanced transport for some reason. 556 may also be returned in response to one or more of the RCPT commands if the refusal is destination-specific. More specifically, a receiving implementation that conforms to this specification MUST return 556 rather than 550 (or some other code) if it would accept 7-bit mail for a particular address but would not accept enhanced transport for it. Conversely, 550 must be returned when the recipient would be rejected in either case. <> Discussion: Ideally, a server that can accept a particular enhanced transport option at all should be able to accept it for any destination for which it accepts mail. In practice, that may sometimes not be the case. In addition, the general design of RFC-821 permits a server to decline to accept a particular piece of mail for any particular destination for any reason. Consequently, it is not possible to prohibit a server from accepting enhanced mail and subsequently rejecting a delivery address. Our design choices in the matter are limited to whether to permit RCPT TO to deliver 556 (indicating that the particular transport type is not acceptable) or whether it should be restricted to one of the existing (RFC-821) non-delivery codes. From the sender's point of view, one could appropriately deduce that a 556 error in response to EMAL or some future enhanced FROM verb indicates that enhanced transport is not accepted from the sending host. A 556 in response to a RCPT verb would indicate that enhanced transport is not accepted for that particular address.
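The sender-side reading of these codes can be sketched as follows. This is a minimal illustration in Python, assuming a client that tracks which command each reply answers; the Outcome names and the interpret function are hypothetical and do not come from this specification. Temporary (4yz) and other replies are deliberately left outside the sketch.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    ENHANCED_REFUSED_FOR_SENDER = "enhanced transport not accepted from this sender"
    ENHANCED_REFUSED_FOR_ADDRESS = "enhanced transport not accepted for this address"
    PERMANENT_ERROR = "permanent error; must not be ignored"
    OTHER = "not covered by this sketch"


def interpret(command: str, code: int) -> Outcome:
    """Classify a reply to an enhanced FROM command or to RCPT.

    `command` is "FROM" for EMAL FROM (or a future enhanced FROM verb)
    and "RCPT" for a delivery address.
    """
    if code in (250, 251):
        return Outcome.ACCEPTED
    if code == 556 and command == "FROM":
        return Outcome.ENHANCED_REFUSED_FOR_SENDER
    if code == 556 and command == "RCPT":
        # 7-bit mail would still be accepted for this address.
        return Outcome.ENHANCED_REFUSED_FOR_ADDRESS
    if 500 <= code <= 599:
        # Includes 550: the recipient is rejected in either case.
        return Outcome.PERMANENT_ERROR
    return Outcome.OTHER


for command, code in [("FROM", 556), ("RCPT", 556), ("RCPT", 550), ("RCPT", 250)]:
    print(command, code, "->", interpret(command, code).value)
```

Keeping 556 separate from the other 5yz codes lets a client that is able to fall back to 7-bit form do so for the affected sender or address only, instead of treating the reply as a general delivery failure.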
<> Discussion: Server designers should be aware that accepting enhanced transport (e.g., 8-bit EMAL FROM) for mail to a given destination and then bouncing it is likely to be disruptive to the general mail environment, especially if the originating system was prepared to send mail in 7-bit form if necessary. Consequently, it is desirable for servers to not accept such mail unless they can guarantee delivery if the address is otherwise acceptable. This implies that it is desirable for servers to be prepared to either cause conversion of the message to an acceptable 7-bit MIME form (e.g., send it to an appropriate gateway), or that they should have out-of-band information available to permit them to determine the feasibility of enhanced delivery to a final destination without first accepting the mail. Conversely, it implies that mail client systems and those setting up, e.g., DNS records for particular hosts, should endeavor to prevent rejections from arising. If enhanced transport is accepted, and there is a subsequent delivery failure that necessitates the generation of a notification message (see RFC-1123, section 5.3.3), the error message text itself should be prepared using only ASCII graphic characters. <> Discussion: If the notification message contains the original content text, that message will normally have to be returned using enhanced transport if it was received using enhanced transport. Other provisions of this specification imply that if this is not feasible (e.g., the notification message must be returned over a path that does not support enhanced transport), the server generating the notification must either be prepared to convert the message content in a loss-less way to a 7-bit form, or that it should not attempt to return the content.
If the specified enhanced transport verb is acceptable for the context specified in the mail transaction, then, when the DATA command is received, the server should return the same "354 Go ahead, terminating with CR LF . CR LF" message normally produced in response to that command. Any of the other codes returned by DATA may be returned also. 12.3 Sender action in response to fatal errors. The action to be taken by the sender if 500, 556, or any other 500-series code is returned is not specified by this specification other than in terms of the limitation imposed above that "something else" must be done. In other words, these codes MUST NOT be ignored, and octets with the high bit turned on (or extended-length characters) MUST NOT be transmitted unless an enhanced FROM command has been sent and acknowledged with a 250 code, AND a 250 or 251 reply has been received in response to at least one of the RCPT commands. The sender may, however, send a RSET and renegotiate the transfer after preparing to send data in a different form. The transformations permitted by the rules under "Interaction with message formats and headers" above are available to hosts providing intra-Internet gateway services between transport types. Of course the originating or destination system environments may make other transformations in messages appropriate to their knowledge of their own environments. <> Discussion and pointer: That paragraph contains the new version of the conversion weasel-words. With the understanding that alternate text would be gratefully welcomed, what it intends to do is to incorporate the Freed compromise. Restated very crudely into the authority model, the originating UA can do whatever it wants, and can delegate that authority to anything within its local system environment. 
The definition of "local system environment", the mechanism for delegation, and whether that delegation can be assumed within a local system environment are administrative questions beyond the scope of this (or any other) specification. Similarly, the ultimate destination UA can do whatever it wants, and can delegate that authority to anything within its local system environment (same definitions and qualifications). <> Nothing is said about gateways into non-Internet environments: While this further points out the importance of a "mail gateway requirements and guidelines" RFC, we have agreed that is a separate problem and one that we must avoid trying to solve here if we ever want to converge. The only conversions that are explicitly conformant to this specification involve gateways providing loss-less conversions between valid MIME formats, presumably by conversion to appropriate 7 bit transport formats and adding Content-Transfer-Encoding fields to reflect the result of the transformations. In particular, an enhanced mail gateway MUST NOT attempt to convert between character sets or transport encodings by discarding high-order bits or octets. Similarly, conversion from one character set to another requires knowledge of both character sets and, as such, is not a transport activity. 12.4 Mail relays, mail gateways, and this protocol. While it is not explicit in RFC-821, there is a general principle that mail transport facilities should not alter, or even inspect, the message itself. There is already a small exception to this in the requirement for receivers to add trace information and "time stamp" ("Received") lines (RFC-821, page 21; RFC-1123, section 5.2.8). Although this document respecifies the trace information, it is intended to avoid making further exceptions unless necessary and to be specific about those that are necessary.
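To make the prohibition in section 12.3 on discarding high-order bits concrete, the following Python sketch contrasts the forbidden transformation with a loss-less one. It is an illustration only: the function names are hypothetical, base64 is used simply because it is one of the transfer encodings MIME defines, and a real gateway would also have to emit CR LF line ends and label the body part according to MIME, neither of which is shown here.

```python
import base64


def strip_high_bits(data: bytes) -> bytes:
    """The lossy transformation that an enhanced mail gateway MUST NOT perform."""
    return bytes(octet & 0x7F for octet in data)


def encode_for_7bit(data: bytes) -> bytes:
    """Loss-less alternative: base64 output uses only 7-bit ASCII octets."""
    return base64.encodebytes(data)   # Python ends each output line with LF


def decode_from_7bit(encoded: bytes) -> bytes:
    """Recover the original octets exactly."""
    return base64.decodebytes(encoded)


original = b"caf\xe9"                 # the last octet has its high bit set
stripped = strip_high_bits(original)
encoded = encode_for_7bit(original)

print(stripped == strip_high_bits(b"cafi"))   # two distinct inputs now collide
print(all(octet < 0x80 for octet in encoded))
print(decode_from_7bit(encoded) == original)
```

Stripping maps different inputs to the same output, so the original cannot be recovered. The encoded form is larger but round-trips exactly.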
If a mail gateway is used to transform the message from an enhanced transport form to a 7-bit transport form, the resulting message MUST conform to the formats specified by MIME for 7-bit transport. This may require that it understand the content and structure of messages written in that format, since mechanical translation (e.g., character set encoding) of a message that uses extended-length characters in conjunction with MIME may not produce a resulting message that is compliant with that specification. <> Discussion/Translation into plain English: Nothing in this document permits nested encodings if MIME does not permit them. The responsibility for ensuring that a message transmitted with EMAL FROM is in MIME format falls largely on the originator. Section 13.2 discusses violations of this principle. 12.4.1 Review of present RFC-821 status and requirements. Under a number of circumstances, an RFC-821 SMTP sender implementation may be called upon to deliver mail, not to a final destination, but to an intermediary (relay or gateway site) or to the address of a mailing list exploder. The sender may have no way to know that it is dealing with an intermediary. An intermediate mail system may not be able to verify, e.g., addresses during the SMTP negotiation. RFC-1123 explicitly provides (section 5.2.7) for intermediate systems to return "ok" 250 codes for addresses that cannot be verified, only to send mail messages with error indications back when addresses fail after the SMTP connection is closed. Such failure could occur on the local host (e.g., local list expansion) or remotely (e.g., in a relay's SMTP processing with the next host in sequence). Consequently, a mechanism that we might describe as "whoops, that isn't really something that can be delivered as specified" already exists in many SMTP server implementations, especially those that operate as relays or mail gateways.
To the degree their implementation models require, clients must be prepared to deal with such delayed responses as well as immediate ones. And, as discussed above, returning messages to users as undeliverable is an acceptable (and normal) response to a receiver's rejection of enhanced forms of transport.

At the same time, mail gateways are permitted to accept one address from a sender for delivery and then carry out significant transformations of that address (and even the message) before passing it along to the actual delivery host, or the next host in sequence. While RFC-821 provides for altering host names (section 3.6) and RFC-1123 provides for header, address, and protocol modifications (section 5.3.7), nothing in any Internet standard protocol to date attempts to completely specify this behavior in the general case.

12.4.2 Relay behavior.

The basic model described above for RFC-821 is not changed by this protocol. While hosts that accept 8 bit messages for relaying may be prepared to "downgrade" such messages to seven bit transport, they cannot be required to do so. A receiver may reject a request for enhanced transport or for any specific transport type, regardless of whether the request comes directly from the originating host or some intermediary. A relay host that accepts enhanced transport of a particular type must be prepared for a host to which it attempts to pass the message to reject that option. Hosts may agree to enhanced transport and specific types and then "bounce" messages by mailing error indications as specified in RFC-1123 and above, just as they may accept mailbox designations and then bounce those messages.

13. Treatment of other important protocol violations

13.1 Receipt of 8 bit characters without prior verification

Since it is known that some Internet hosts now send 8-bit characters without performing the verification specified in this document (i.e., sending EMAL and getting a positive response), servers should be robust enough to avoid self-destruction if non-compliant behavior of this type is encountered.

As mentioned above, sending SMTPs MUST NOT transmit octets with the high bit non-zero without first successfully negotiating 8-bit transport with the receiver. Receivers are not required to enforce this requirement beyond the degree needed to prevent their destruction if this rule is violated. If a receiver encounters octets with the high bit set after a DATA command, it MUST select one of the following three alternatives to be conforming to this specification:

(i) It may reject the message with a 520 error code, indicating an attempt to send invalid data over the transmission channel. This message SHOULD NOT be sent until the terminating CR LF . CR LF is received.

(ii) It may deliver the message in 8 bit form if it knows that such delivery can be made reliably and without loss of information (if it is the destination MTA) and may transparently relay the message (using MAIL FROM) as received (if it is a relay MTA).

<> Discussion: This option has the effect of [almost] encouraging and permitting a strategy that is otherwise taken as explicitly non-conforming: the sending of 8 bit data over a conventional 821 connection with MAIL FROM. The context for this option should be carefully understood. To be in this situation, the delivery or relay host has received a non-conforming message from a non-conforming sender. This rule exists to permit the relay to forward the message "without making things any worse", i.e., to cause the same non-conforming message to be delivered to the destination as would have been delivered had the relay not been involved.
And it permits the final delivery SMTP server to do whatever it decides to do for (or to) its users, a situation that is impossible to restrict anyway.

(iii) If sufficient information is available to make the conversion and it has gateway capabilities, it may convert the message to a valid MIME form consistent with seven bit transport and forward or deliver the message in that form. This requires that the received 8 bit message be in MIME form in order that, e.g., character sets can be reliably determined or that the MTA has access to reliable out of band information about the character set(s) present in the message. MTAs MUST NOT attempt to guess at information not explicitly supplied in incoming messages in order to perform conversions of this type.

<> Discussion: The options above deliberately and explicitly prohibit the practice of relays "bit stripping" messages (i.e., zeroing the high-order bit) as a conversion method to 7 bit transport. This technique loses information; the severity of the information loss is a function of the actual message content and the perceptions of the user, but can be quite significant.

13.2. Receipt of non-MIME message bodies after receipt and acceptance of EMAL.

This protocol requires that a message transmitted after EMAL is used in a mail transaction conform to the MIME format (see section 9.1). A receiving SMTP server or relay is not required to detect failure to conform to this requirement. However, if the server does do so, it may reject the message and should use a 558 error code in the rejection. A gateway which is otherwise inspecting or modifying the message body assumes responsibility for the messages it forwards and, consequently, must either reject invalid messages or transform them into valid form without loss of information (paralleling the discussion in section 13.1).

14. Compliance summary

A server implementation supporting any of these verbs other than SIZE, must support all of them.
SIZE might plausibly be supported in implementations that do not support the other verbs (which does not make that implementation fully conform to this specification), but must be supported in implementations that support the others. A server must not require that EHLO precede the use of other verbs specified here.

A client may attempt to use any of these verbs, but must observe responses to ensure that the server verifies its willingness to accept them. Some of those responses constrain further action on the part of the client, as discussed above. For example, if the client asks for a capabilities list (via EHLO), it must not send commands that are not represented on the list received. Similarly, if a receiving SMTP rejects the EMAL FROM command, the client must not attempt to transport 8-bit information with the DATA command.

Nothing in this specification imposes any requirements that clients wait for responses to particular commands before issuing the next one(s) that are not imposed by RFC-821 or the logic of the commands themselves. However, several of the provisions above imply that a client must synchronize with verifications (affirmative responses) from the server before actually sending the message body.

15. References

[ANSI-X3.4] American National Standards Institute, "Coded Character Set--7-Bit American Standard Code for Information Interchange", ANSI X3.4-1986.

[Gianone] Gianone, Christine M., "A Kermit Protocol Extension for International Character Sets", Columbia University, 1990 (unpublished paper). Available via anonymous FTP from watsun.cc.columbia.edu:kermit/e/isok5.txt.

[RFC821] Postel, J. "Simple Mail Transfer Protocol", RFC-821, August 1982.

[RFC822] Crocker, D. "Standard for the Format of ARPA Internet Text Messages", RFC-822, August 1982.

[RFC1123] Braden, R. "Requirements for Internet Hosts -- Application and Support", RFC-1123, October 1989.

[RFC1341] Borenstein, N. and N. Freed. "MIME (Multipurpose Internet Mail Extensions): Mechanisms for specifying and describing the format of internet message bodies", RFC-1341, June 1992.

[ISO646] International Organization for Standardization. "International Standard--Information Processing--ISO 7-bit coded character set for information interchange", ISO 646:1983.

16. Acknowledgements

This document represents a synthesis of the ideas of many people and reactions to the ideas and proposals of others. Randall Atkinson, Craig Everhart, Risto Kankkunen, and Greg Vaudreuil contributed ideas and text sufficient to be considered co-authors. Other important suggestions, text, or encouragement came from Harald Alvestrand, Jim Conklin, Mark Crispin, Frank da Cruz, Dave Crocker, Ned Freed, 'Olafur Gudmundsson, Per Hedeland, Christian Huitema, Neil Katin, Eliot Lear, Harold A. Miller, Dan Oscarsson, Einar Stefferud, Rayan Zachariassen, and probably several others. Of course, none of these people are necessarily responsible for the combination of ideas represented here. Indeed, in some cases, the response to a particular criticism was to accept the problem identification but to include an entirely different solution from the one originally proposed.

17. Security considerations.

This RFC does not discuss security issues and is not believed to raise any security issues not endemic in electronic mail and present in fully conforming implementations of RFC-821. It does provide, via the EHLO verb and response, an announcement of system mail capabilities, but all of the information provided can be readily deduced by selective probing of the verbs required to transport and deliver mail. Similarly, as discussed above, capabilities such as those provided by the SIZE verb might be used for crude attempts at denial of service attacks, but, unless implementations are very weak, there is no problem here that has not always existed with SMTP.

18. Editor's Address

John C. Klensin
Department of Architecture
Room N52-457
Massachusetts Institute of Technology
Cambridge, MA 02139 USA
tel: 617 253 1355 (international: +1 617 253 1355)
fax: 617 491 6266 (international: +1 617 491 6266)
email: Klensin@MIT.EDU

------------------------------

Appendix A

New response codes (or response codes used in new ways) introduced in this document.

255 EHLO: Normal informational response
452 SIZE: insufficient system storage
503 SIZE: not accepted without a FROM verb
503 Redundant use of any state-inducing command: Use of, e.g., MAIL FROM twice in the same mail transaction, or both MAIL FROM and EMAL FROM.
520 DATA: Invalid data on transmission channel (8-bit data encountered after MAIL FROM command or non-MIME data after EMAL FROM).
552 SIZE: message size too large
552 RCPT: message size too large for this specific address.
556 EVFY: Address ok, enhanced transport not accepted.
556 EMAL: Enhanced transport not accepted.
556 RCPT: Enhanced transport not accepted (destination specific)
558 DATA: Invalid message format (i.e., not MIME) encountered after EMAL FROM command.
559 RCPT: user not local, please try... (if message can be delivered directly, but not to the originally-specified address)

**********

[Temporary] Appendix B:

Changes for the 15 July version arising from the Boston IETF and subsequent fine-tuning discussions. (Changes to draft-ietf-smtpext-8bittransport-05.txt)

-- Additional language for the expanded trace fields, including a discussion of bogus messages.
-- More specific language about what should be returned when EHLO is sent and how the response should be construed.
-- Change of title to remove "text-based".
-- Clarification of the role of "discussion" sections.
-- Several more typos and small clarifications.

Changes for the 24 June version arising from on-list discussion of the draft produced after the San Diego IETF.
(Changes to draft-ietf-smtpext-8bittransport-04.txt)

-- Reinstated general extension mechanism in section 5.
-- Cleaned up some section numbering and made several small editorial changes.
-- Tightened definition of EHLO with regard to features and future extensions.

Changes from the 12 March version arising from discussion at the March 1992 (San Diego) IETF and discussion subsequent to that meeting: (Changes to draft-ietf-smtpext-8bittransport-03.txt)

-- Consolidation of the "capabilities" notion (former CPBL verb) with an enhanced "hello" command (EHLO).
-- Added discussion of a possible 551 response to RCPT TO in the context of a "too large" message announced by SIZE (section 8.3).
-- Added some new text to the compliance summary (section 14).
-- Replacement of "negotiate" with the more precise "verify".
-- Removed several remaining textual artifacts of incomplete or rejected design ideas.
-- Removed open issue in EVFY and replaced all placeholders with text.
-- Removed explicit disclaimer that these features are not asserted to be necessary.
-- Changed language about failure/reporting of inability to deliver in 8-bit format to a Discussion and made explicit provision for non-return of content.
-- Replaced the "conformance summary" placeholder with text.
-- Defined the reply syntax from EHLO/CPBL and the associated size model.
-- Defined and clarified the use of SIZE.
-- Added new subsection to discuss the error handling when non-MIME message bodies are received with EMAL.
-- Fixed text to be less tedious about "ANSI X3.4".
-- Addition of brief discussion/explanation about the issues associated with "long" (multiple octet) characters.

Changes from the 22 November version (draft-ietf-smtpex-8bittransport-02.txt)

-- Fixed several small typographical errors.
-- Removed a few residual vestiges of very wide transport and envelope character set specifications.
-- Replaced several different types of references to what is now MIME with that designation.
-- Removed several server requirements for the 8->7 boundary, adopting the "conformance on the wire" and "separate requirements doc" approach agreed to on the mailing list. In particular, the "wretched solution" has been removed or, more exactly, downgraded to a discussion note as agreed upon on the mailing list.
-- Improved rules for server when unnegotiated 8-bit is encountered, per mailing list. These are a change in tone, but not a real change in requirements.
-- Removed "tentative decision" identifiers in all areas in which no disagreement has been expressed on the list, since these tentative agreements were discussed in Santa Fe.
-- Made the rule requiring all hosts that support EMAL to also support MAIL explicit as the result of mailing list discussion.
-- Provided a thinly-disguised forward pointer to the MXE proposal.

**********

[Temporary] Appendix C. Outstanding issues, RFC-ZZZZ-03. 12 March 1992.

** Items marked ** have been fixed or eliminated as problems in the first or second drafts of version 4. **

**(1) Should the remaining discussion paragraphs be retained somewhat longer or removed at this time? (general)

**(2) Is the extension mechanism for alternate "FROM" forms adequately specified, and, if not, how should it be specified? (Section 5, IESG/IANA issue).

**(3) Does EVFY need to distinguish between "cannot verify remote address" and "cannot verify remote address and whether 8-bit mail can be delivered to it"? (Section 6).

**(4) Syntax model for CPBL (Section 7.1)

**(5) (placeholder) Syntax for SIZE replies (Section 7.1)

**(6) (placeholder) Improved wording needed for the trace field requirement (Section 10)

**(7) Need a pair of Received keywords to replace 7/8-bit-MIME. (Section 10)

**(8) Granularity of "Received: ...convert...to" (Section 10)

**(9) Errors returned by different paths than messages were sent over and non-return of content. (Section 12.2)

**(10) (placeholder) The compliance summary is still a placeholder (Section 14).
---------------------

The following additional items, left over from November, will go away as indicated unless serious proposals appear early in the San Diego meeting or sooner.

(11) Open issue: The CPBL functionality gives us a way to explicitly specify how further extensions beyond those of this document (including "private" ones) might be tested. In addition to possibly the usual words about "X"s, we could *require* that the attempted use of any verb not specified in a standard or near-standard RFC must be preceded by the use of CPBL to verify that the server supports it. My bias that being explicit reduces later problems makes a small argument for including some text to this effect. Anyone who feels strongly one way or the other should speak up. Default decision: Defer (punt)

**(12) Extensions/mechanisms for formatted error messages when such messages are mailed back. There are really two separate problems here: an encapsulation model (MIME extension) for returning the content of 8-bit messages over 7-bit channels and a canonical representation and taxonomy for mailed error responses. Note that these are primarily MIME problems; RFC-ZZZZ mostly just needs to point to the solution. Also note that not solving the encapsulation problem implies non-return of content in some cases. Default decision: if agreement cannot be reached, the language in RFC-ZZZZ that permits non-return of content in some cases will be strengthened. The canonical message form problem is one we have been living with since before RFC-821 and is not on the critical path for RFC-ZZZZ.

**********