Applications Area Working Group S. Leonard Internet-Draft Penango, Inc. Intended Status: Informational September 22, 2014 Expires: March 26, 2015 The text/markdown Media Type draft-ietf-appsawg-text-markdown-02.txt Abstract This document registers the text/markdown media type for use with Markdown, a family of plain text formatting syntaxes that optionally can be converted to formal markup languages such as HTML. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Copyright Notice Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. [[TODO: add table of contents.]] Leonard Exp. March 26, 2015 [Page 1] Internet-Draft The text/markdown Media Type September 2014 1. Introduction 1.1. On Formats In computer systems, textual data is stored and processed using a continuum of techniques. 
On the one end is plain text: a linear sequence of characters in some character set (code), possibly interrupted by line breaks, page breaks, or other control characters. Plain text provides /some/ fixed facilities for formatting instructions, namely codes in the character set that have meanings other than "represent this character on the output medium"; however, these facilities are not particularly extensible. Compare with [RFC6838] Section 4.2.1. Applications may neuter the effects of these special characters by prohibiting them or by ignoring their dictated meanings, as is the case with how modern applications treat most control characters in US-ASCII. On this end, any text reader or editor that interprets the character set can be used to see or manipulate the text. If some characters are corrupted, the corruption is unlikely to affect the ability of a computer system to process the text (even if the human meaning is changed).

On the other end is binary format: a sequence of instructions intended for some computer application to interpret and act upon. Binary formats are flexible in that they can store non-textual data efficiently (perhaps storing no text at all, or only storing certain kinds of text for very specialized purposes). Binary formats require an application to be coded specifically to handle the format; no partial interoperability is possible. Furthermore, if even one byte or bit is corrupted in a binary format, the corruption may prevent an application from processing any of the data correctly.

Between these two extremes lies formatted text, i.e., text that includes non-textual information coded in a particular way that affects the interpretation of the text by computer programs. Formatted text is distinct from plain text and binary format in that the non-textual information is encoded into textual characters, which are assigned specialized meanings /not/ defined by the character set.
With a regular text editor and a standard keyboard (or other standard input mechanism), a user can enter these textual characters to express the non-textual meanings. For example, a character like "<" no longer means "LESS-THAN SIGN"; it means the start of a tag or element that affects the document in some way.

On the formal end of the spectrum is markup, a family of languages for annotating a document in such a way that the annotations are syntactically distinguishable from the text. Markup languages are (reasonably) well-specified and tend to follow (mostly) standardized syntax rules. Examples of markup languages include SGML, HTML, XML, and LaTeX. [[TODO: CITE.]] Standardized rules lead to interoperability between markup processors, but they impose a skill requirement on new (human) users of the language, who must learn these rules in order to do useful work. This imposition makes markup less accessible for non-technical users (i.e., users who are unwilling or unable to invest in the requisite skill development).

   informal     /---------formatted text----------\     formal
   <------v-------------v-------------v-----------------------v---->
     plain text informal markup formal markup           binary format
                   (Markdown) (HTML, XML, etc.)

     Figure 1: Degrees of Formality in Data Storage Formats for Text

On the informal end of the spectrum are lightweight markup languages. In comparison with formal markup like XML, lightweight markup uses simple syntax, and is designed to be easy for humans to enter with basic text editors. Markdown, the subject of this document, is an /informal/ plain text formatting syntax that is intentionally targeted at non-technical users (i.e., users upon whom little to no skill development is imposed) using unspecialized tools (i.e., text boxes). Jeff Atwood once described these informal markup languages as "humane" [HUMANE].

1.2. Markdown Design Philosophy

Markdown specifically is a family of syntaxes that are based on the original work of John Gruber with substantial contributions from Aaron Swartz, released in 2004 [MARKDOWN]. Since its release a number of web or web-facing applications have incorporated Markdown into their text entry systems, frequently with custom extensions. Fed up with the complexity and security pitfalls of formal markup languages (e.g., HTML5) and proprietary binary formats (e.g., commercial word processing software), yet unwilling to be confined to the restrictions of plain text, many users have turned to Markdown for document processing. Whole toolchains now exist to support Markdown for online and offline projects.

Informality is a bedrock premise of Gruber's design. Gruber created Markdown after disastrous experiences with strict XML and XHTML processing of syndicated feeds. In Mark Pilgrim's "thought experiment", several websites went down because one site included invalid XHTML in a blog post, which was automatically copied via trackbacks across other sites [DIN2MD]. These scenarios led Gruber to believe that clients (e.g., web browsers) SHOULD try to make sense of data that they receive, rather than rejecting data simply because it fails to adhere to strict, unforgiving standards. (In [DIN2MD], Gruber compared Postel's Law [RFC0793] with the XML standard, which says: "Once a fatal error is detected [...] the processor MUST NOT continue normal processing" [XML1.0-3].) As a result, there is no such thing as "invalid" Markdown; there is no standard demanding adherence to the Markdown syntax; there is no governing body that guides or impedes its development.
If the Markdown syntax does not result in the "right" output (defined as output that the author wants, not output that adheres to some dictated system of rules), Gruber's view is that the author either should keep on experimenting, or should change the processor to address the author's particular needs (see [MARKDOWN] Readme and [MD102b8] perldoc; see also [CATPICS]).

1.3. Uses of Markdown

Since its introduction in 2004, Markdown has enjoyed remarkable success. Markdown works for users for three key reasons. First, the markup instructions (in text) look similar to the markup that they represent; therefore the cognitive burden to learn the syntax is low. Second, the primary arbiter of the syntax's success is *running code*. The tool that converts the Markdown to a presentable format, and not a series of formal pronouncements by a standards body, is the basis for whether syntactic elements matter. Third, Markdown has become something of an Internet meme [INETMEME], in that Markdown gets received, reinterpreted, and reworked as additional communities encounter it.

There are communities that are using Markdown for scholarly writing [CITE], for screenplays [CITE], for mathematical formulae [CITE], and even for music annotation [CITE]. Clearly, a screenwriter has no use for specialized Markdown syntax for mathematicians; likewise, mathematicians do not need to identify characters or props in common ways. The overall gist is that all of these communities can take the common elements of Markdown (which are rooted in the common elements of HTML circa 2004) and build on them in ways that best fit their needs.

1.4. Uses of Labeling Markdown Content as text/markdown

To support identifying and conveying Markdown (as distinguished from plain text), this document defines a media type and parameters that indicate, in broad strokes, the author's intent on how to interpret the Markdown.
This registration draws particular inspiration from the text/troff registration [RFC4263]; troff is an informal plain text formatting syntax primarily intended for output to monospace line-oriented printers and screen devices. In that sense, Markdown is a kind of troff for modern computing.

The primary purpose of an Internet media type is to label "content" on the Internet, as distinct from "files". Content is any computer-readable format that can be represented as a primary sequence of octets, along with type-specific metadata (parameters) and type-agnostic metadata (protocol dependent). From this description, it is apparent that appending ".markdown" to the end of a filename is not a sufficient means to identify Markdown. Filenames are properties of files in file systems, but Markdown frequently exists in databases or content management systems (CMSes) where the file metaphor does not apply. One CMS [RAILFROG] uses media types to select appropriate processing, so a media type is necessary for the safe and interoperable use of Markdown.

Unlike complete HTML documents, Markdown as described in [MDSYNTAX] provides no means to include metadata in the content stream. Several derivative flavors have invented metadata incorporation schemes (e.g., [MULTIMD]), but these schemes only address specific use cases. In general, the metadata must be supplied via supplementary means in an encapsulating protocol, format, or convention. The relationship between the content and the metadata is not directly addressed by this specification; however, by identifying Markdown with a media type, Markdown content can participate as a first-class citizen with a wide spectrum of metadata schemes.

Finally, registering a media type through the IETF process is not trivial.
Markdown can no longer be considered a "vendor"-specific innovation, but the registration requirements even in the vendor tree have proven to be overly burdensome for most Markdown implementers. Moreover, registering hundreds of Markdown variants with distinct media types would impede interoperability: virtually all Markdown content can be processed by virtually any Markdown processor, with varying degrees of success. The goal of this specification is to reduce all of these burdens by having one media type that accommodates diversity and eases registration.

1.5. Requirements Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Example

The following is an example of Markdown as an e-mail attachment:

   MIME-Version: 1.0
   Content-Type: text/markdown; charset=UTF-8; flavor=Original;
    processor="Markdown.pl-1.0.2b8 --html4tags"
   Content-Disposition: attachment; filename=readme.md

   Sample HTML 4 Markdown
   =============

   This is some sample Markdown. [Hooray!][foo]
   (Remember that link names are not case-sensitive.)

   Bulleted Lists
   -------

   Here are some bulleted lists...

   * One Potato
   * Two Potato
   * Three Potato

   - One Tomato
   - Two Tomato
   - Three Tomato

   More Information
   -----------

   [.markdown, .md](http://daringfireball.net/projects/markdown/)
   has more information.

   [fOo]: http://example.com/some/foo/location
      'This Title Will Not Work with Markdown.pl-1.0.1'

3. Markdown Media Type Registration Application

This section provides the media type registration application for the text/markdown media type (see [RFC6838], Section 5.6).

Type name: text

Subtype name: markdown

Required parameters: charset. Per Section 4.2.1 of [RFC6838], charset is REQUIRED. There is no default value. UTF-8 is RECOMMENDED; however, neither [MDSYNTAX] nor popular implementations at the time of this registration actually require or assume any particular encoding. In fact, many Markdown processors can get along just fine by operating on character codes that lie in the Portable Character Set (i.e., printable US-ASCII), blissfully oblivious to coded values outside of that range.

Optional parameters: The following parameters reflect the author's intent regarding the content. A detailed specification can be found in Section 4.

   flavor: The variant, or "flavor" of the Markdown content, with optional rules (qualifiers). Default value: "Original".

   processor: A specific Markdown implementation, with optional arguments. Default value: none (receiver's choice).

   output-type: The Content-Type (Internet media type) of the output, with optional parameters. Default value: "text/html".

Encoding considerations: Text.

Security considerations:

Markdown interpreted as plain text is relatively harmless. A text editor need only display the text. The editor SHOULD take care to handle control characters appropriately, and to limit the effect of the Markdown to the text editing area itself; malicious Unicode-based Markdown could, for example, surreptitiously change the directionality of the text. An editor for normal text would already take these control characters into consideration, however.

Markdown interpreted as a precursor to other formats, such as HTML, carries all of the security considerations of the target formats. For example, HTML can contain instructions to execute scripts, redirect the user to other webpages, download remote content, and upload personally identifiable information. Markdown also can contain islands of formal markup, such as HTML.
These islands of formal markup may be passed as-is, transformed, or ignored (perhaps because the islands are conditional or incompatible) when the Markdown is interpreted into the target format. Since Markdown may have different interpretations depending on the tool and the environment, a better approach is to analyze (and sanitize or block) the output markup, rather than attempting to analyze the Markdown.

Specific security considerations apply to the optional parameters; for details, consult Section 4.

Interoperability considerations: Markdown flavors are designed to be broadly compatible with humans ("humane"), but not necessarily with each other. Therefore, syntax in one Markdown flavor may be ignored or treated differently in another flavor. The overall effect is a general degradation of the output, proportional to the quantity of flavor-specific Markdown used in the text. When it is desirable to reflect the author's intent in the output, stick with the flavor identified in the flavor parameter.

Published specification: This specification.

Applications that use this media type: Markdown conversion tools, Markdown WYSIWYG editors, and plain text editors and viewers; target markup processors indirectly use Markdown (e.g., web browsers for Markdown converted to HTML).

Additional information:

   Magic number(s): None
   File extension(s): .md, .markdown
   Macintosh file type code(s): TEXT

Person & email address to contact for further information: Sean Leonard

Restrictions on usage: None.

Author/Change controller: Sean Leonard

Intended usage: COMMON

Provisional registration? Yes

4. Optional Parameters

The following optional parameters can be used by an author to indicate the author's intent regarding how the Markdown ought to be processed. For security and accuracy, IANA registries will be created.
However, authors who wish to use custom values by private agreement may do so via an extension mechanism; all unregistered identifiers MUST start with an exclamation mark "!".

All identifiers are case-sensitive; receivers MUST compare for exact equality. Identifiers MUST NOT be registered if another registration differs only in the casing, as these registrations may cause confusion.

The following ABNF definitions are used in this section:

   EXTCHAR  =
   REXTCHAR =

   Figure X: ABNF Used in This Section

The discussion in this section presumes that the parameter values are discrete strings. When encoded in protocols such as MIME [RFC2045], however, the value strings MUST be escaped properly.

4.1. flavor

The flavor parameter indicates the Markdown variant in which the author composed the content. The overall intent of this parameter is to provide a facility for Markdown tools, such as graphical editors, to be able to broadly categorize the content and perform useful services such as syntax highlighting without resorting to executing the Markdown processor. Of course, actual recipients may use this information for any useful purpose, including picking and configuring an appropriate Markdown processor. The entire parameter is case-sensitive.

An IANA registry of flavors will be created as discussed in Section 5. A flavor identifier is composed of two or more Unicode characters excluding spaces (Zs category), control characters, the hyphen-minus "-", quotation marks """, and the plus sign "+"; however, ASCII characters alone SHOULD be used. Additionally, registered flavor identifiers MUST NOT begin with "!", the exclamation mark. By convention, flavor identifiers start with a capital letter (when using Roman characters), but this is not a requirement. Unregistered flavor identifiers MUST begin with "!" (followed by at least two additional characters).

When omitted, the default value is "Original".
Its meaning is covered in Section 5. Generators MUST NOT emit empty flavor parameters, but parsers MUST treat empty flavor parameters the same as if omitted.

The full ABNF of the flavor parameter is:

   flavor-param     = flavor *( *WSP rule ) *WSP

   flavor           = registered-fid / unregistered-fid

   registered-fid   = fid-char 1*("!" / fid-char)

   unregistered-fid = "!" 2*fid-char

   fid-char         = %d35-42 / %d44 / %d46-126 / REXTCHAR

   rule             = "+" (should-rule / any-rule)

   should-rule      = should-rule-char
                      [ *(should-rule-char / "_") should-rule-char ]

   any-rule         = 1*rule-char

   rule-char        = %d35-42 / %d44-126 / REXTCHAR

   Figure X: ABNF of the flavor parameter

4.1.1. flavor rules

[[TODO: consider. This section is mainly inspired from pandoc.]]

Most flavors are self-contained, with no options. However, some flavors have optional rules that may be applied with discretion. For those flavors where optional rules are an integral feature, the author MAY indicate that those extra rules be applied in a plus sign-delimited list. Because Markdown has no inherent concept of validity, authors SHOULD be aware that receivers are not required to honor these optional rules--the special characters in the Markdown content may well be interpreted as plain text, rather than Markdown markup. Generally speaking, defining a new (simple) flavor is preferable to defining a complex flavor with multiple optional rules.

A flavor rule identifier is composed of any sequence of Unicode characters excluding spaces (Zs category), control characters, quotation marks """, exclamation marks "!", and the plus sign "+"; however, lowercase ASCII letters and the underscore "_" alone SHOULD be used, where the underscore SHOULD NOT be at the beginning or end. The syntax for flavor rules derives in significant part from pandoc [PANDOC].
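
The flavor-param grammar above can be exercised with a short parser. The following Python sketch is illustrative only and is not part of the registration: it assumes ASCII-only identifiers (which this document says SHOULD be used) because it does not model REXTCHAR, it accepts any rule-char sequence as a rule without distinguishing should-rule from any-rule, and the function name parse_flavor and the sample values "Example" and "!lab" are hypothetical.

```python
import re

# ASCII portions of fid-char and rule-char from the ABNF above.
# Assumption: REXTCHAR (non-ASCII characters) is deliberately not modeled.
FID_CHAR = r"[\x23-\x2a\x2c\x2e-\x7e]"
RULE_CHAR = r"[\x23-\x2a\x2c-\x7e]"

FLAVOR_PARAM = re.compile(
    # unregistered-fid / registered-fid
    rf"(?P<flavor>!{FID_CHAR}{{2,}}|{FID_CHAR}(?:!|{FID_CHAR})+)"
    # *( *WSP rule ) *WSP
    rf"(?P<rules>(?:[ \t]*\+{RULE_CHAR}+)*)[ \t]*"
)


def parse_flavor(value):
    """Return (flavor, rules, is_registered) for a flavor parameter value."""
    if not value:
        # Empty is treated the same as omitted: the default is "Original".
        return ("Original", [], True)
    match = FLAVOR_PARAM.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid flavor parameter: {value!r}")
    flavor = match.group("flavor")
    rules = re.findall(rf"\+({RULE_CHAR}+)", match.group("rules"))
    # Unregistered (private-agreement) identifiers start with "!".
    return (flavor, rules, not flavor.startswith("!"))


if __name__ == "__main__":
    print(parse_flavor("Example+footnotes +pipe_tables"))
    print(parse_flavor("!lab"))
```

A receiver that does not recognize a returned rule can simply ignore it; the sketch only separates the identifier from its rules and makes no attempt to consult a registry.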
[[TODO: There are no requirements about exclamation marks for unregistered rules...flavor rules SHOULD be registered along with the flavor, but a receiver does not need to reject the flavor parameter simply because it does not recognize a rule...it can just ignore the rule.]]

4.2. processor

The processor parameter indicates the specific Markdown implementation that the author intends be used. The purpose of this parameter is to control the automatic processing of Markdown into some output format, but of course actual recipients may use this information for any useful purpose. The entire parameter is case-sensitive.

An IANA registry of processors will be created as discussed in Section 5. A processor identifier is composed of two or more Unicode characters excluding spaces (Zs category), control characters, the hyphen-minus "-", quotation marks """, the less-than sign "<", and the greater-than sign ">"; however, ASCII characters alone SHOULD be used. Additionally, registered processor identifiers MUST NOT begin with "!", the exclamation mark. Unregistered processor identifiers MUST begin with "!" (followed by at least two additional characters).

When omitted, the default value is to use whatever processor the receiver prefers. Generators MUST NOT emit empty processor parameters, but parsers MUST treat empty processor parameters the same as if omitted.

The full ABNF of the processor parameter is:

   processor-param  = processor [ "-" version ] *( 1*WSP argument ) *WSP

   processor        = registered-pid / unregistered-pid

   registered-pid   = pid-char 1*("!" / pid-char)

   unregistered-pid = "!" 2*pid-char

   version          = pid-char *("!" / pid-char)

   argument         = regular-argument / uri-argument

   regular-argument = 1*(regular-char / quoted-chars)

   pid-char         = %d35-44 / %d46-59 / %d61 / %d63-126 / REXTCHAR

   regular-char     = %d33 / %d35-59 / %d61 / %d63-126 / REXTCHAR

   quoted-chars     = DQUOTE *pqcontent DQUOTE

   pqcontent        = %d1-33 / %d35-127 / EXTCHAR / DQUOTE DQUOTE

   uri-argument     = "<" URI-reference ">" ; from [RFC3986]

   Figure X: processor parameter ABNF

4.2.1. processor version

For better precision, an author MAY include the processor version. The version is delimited from the processor identifier with a hyphen-minus "-"; the version string itself is an opaque string. Version strings (e.g., "2.0", "3.0.5") are registered and updated along with the processor registration. Updates to processor registrations SHOULD only add new versions when those new versions have a material difference on the interpretation of the Markdown content. If a processor has a version "2014.10" and a version "2014.11", for example, but "2014.11" only provides performance updates, then the processor registration SHOULD NOT separately register the "2014.11" version.

The repertoire of the version string is the same as the processor identifier (and like the processor identifier, ASCII characters alone SHOULD be used).

A receiver that recognizes the processor but not the processor version MAY use any version of the processor, preferably the latest version.

4.2.2. processor arguments

Processor arguments MAY be supplied for finer-grained control over how the processor behaves. Multiple arguments and URI references are supported.

4.2.2.1. Quoted Arguments

According to the ABNF above, arguments are delimited by whitespace. Quotation marks are used to support zero-length arguments, as well as whitespace or quotation marks in a single argument. If a quotation mark appears anywhere in the argument, the following text is considered quoted; two successive quotation marks "" mean one quotation mark. A single quotation mark ends the quoting.
Because of this rule, quotation marks do not have to appear at the termini of an argument; embedded quotation marks start (and end) quoting within a single argument. For example:

   a""b

means:

   ab

for the actual argument.

4.2.2.2. URI Reference Arguments

Certain processors can take supplementary content, such as metadata, from other resources. To support these workflows, an author MAY use the URI delimiters <> to signal a URI, such as cid: or mid: URLs [RFC2392] in the context of MIME messages. The URI MUST comply with [RFC3986], and MAY be a relative reference if the subject Markdown content has a base URI. The receiver is to interpret this as a request to retrieve the resource, and to supply that resource in a local reference form that the processor can use (e.g., via a temporary file).

The URI MUST be the entire argument; the URI cannot be combined with other text to constitute the argument (and the ABNF above supports this restriction). The reason for this restriction is security, so that a maliciously constructed argument string cannot resolve to some other file reference (such as parent directories like ../ or special files such as /dev/hd0). If the processor accepts URI strings directly, the string is to be supplied as a regular string without <> delimiters.

For security reasons, direct file references MUST NOT be included in the processor arguments. The prior paragraph notwithstanding, certain workflows may require file references. In such cases, file: URLs [RFC1738] (including relative references) are appropriate. The receiver SHOULD apply the same security and privacy analyses to file: URLs as it would to any other URI.

4.2.2.3. Appropriate Arguments and Security Considerations

Not all arguments are appropriate for inclusion in the processor parameter. Appropriate arguments are basically limited to those that affect the output markup, without side-effects.
Arguments MUST NOT identify input sources or output destinations. For example, if a processor normally reads Markdown input using the arguments "-i filename" or "< filename" (i.e., from standard input), those arguments MUST be omitted. Arguments that have no bearing on the output MUST be omitted as well, such as arguments that control verbosity of the processor (-v) or that cause side-effects (such as writing diagnostic messages to some other file). Of course, if warnings or errors are signaled within the output, arguments enabling that output MAY be used.

When in doubt, a receiver SHOULD omit arguments with unknown or undocumented effects, and MAY ignore author-supplied arguments entirely, but SHALL NOT reorder arguments. An author has very little assurance that a receiver will honor unregistered arguments. Consequently, the burden is squarely on processor registrants (Section 5.2) to document their arguments properly.

For security reasons, the parsed argument array (or a string unambiguously representing the delimited argument array) MUST be passed directly to the processor. Emitting the argument array as-is in a batch script (for example) may cause risky side effects, such as automatic substitutions, alias activation, or macro execution.

The arguments in this parameter MUST be encoded to preserve characters outside of US-ASCII, and to signal the required encoding to the receiver. When going between (system) processes, some implementations may interpret character codes based on locale environment variables. Therefore, it is not sufficient to pass arguments from this parameter "as-is" to the processor: the routine MUST change the locale or transform the arguments to an appropriate character encoding so that there is no ambiguity. Furthermore, the NUL character (%d0, U+0000) is not permitted because most common operating systems use that code point as a delimiter.
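
The quoting rules of Section 4.2.2.1 and the requirement to pass the parsed argument array directly to the processor can be illustrated together. The following Python sketch is a non-normative reading of the processor-param ABNF under stated assumptions: it does not validate pid-char or model REXTCHAR, it tags each argument as either "arg" or "uri" (a representation chosen here for illustration), and the names parse_processor, parse_arguments, and run_processor are hypothetical. Locale and character-encoding handling is left out.

```python
import subprocess

WSP = " \t"


def parse_arguments(rest):
    """Split the argument portion into ("arg", text) and ("uri", ref) items."""
    args, i, n = [], 0, len(rest)
    while i < n:
        if rest[i] in WSP:
            i += 1
            continue
        if rest[i] == "<":  # uri-argument: the URI must be the entire argument
            end = rest.find(">", i)
            if end == -1 or (end + 1 < n and rest[end + 1] not in WSP):
                raise ValueError("a URI must be the entire argument")
            args.append(("uri", rest[i + 1:end]))
            i = end + 1
            continue
        buf, quoted = [], False
        while i < n:
            c = rest[i]
            if quoted:
                if c == '"':
                    if i + 1 < n and rest[i + 1] == '"':
                        buf.append('"')  # two successive quotes mean one quote
                        i += 2
                        continue
                    quoted = False       # a single quote ends the quoting
                else:
                    buf.append(c)
            elif c == '"':
                quoted = True
            elif c in WSP:
                break                    # unquoted whitespace ends the argument
            elif c in "<>":
                raise ValueError("'<' and '>' are reserved for URI arguments")
            else:
                buf.append(c)
            i += 1
        if quoted:
            raise ValueError("unterminated quotation")
        args.append(("arg", "".join(buf)))
    return args


def parse_processor(value):
    """Return (processor, version_or_None, arguments) for a parameter value."""
    if not value:
        return (None, None, [])  # empty is the same as omitted
    if "\0" in value:
        raise ValueError("NUL is not permitted")
    cut = next((k for k, c in enumerate(value) if c in WSP), len(value))
    processor, sep, version = value[:cut].partition("-")
    if not processor or (sep and not version):
        raise ValueError("malformed processor identifier or version")
    return (processor, version if sep else None, parse_arguments(value[cut:]))


def run_processor(executable, arguments, markdown_bytes):
    """Hand the argument array to the processor directly, never via a shell."""
    argv = [executable]
    for kind, text in arguments:
        if kind == "uri":
            raise ValueError("fetch the URI and substitute a local reference first")
        argv.append(text)
    done = subprocess.run(argv, input=markdown_bytes, capture_output=True, check=True)
    return done.stdout
```

Passing a list to subprocess.run without a shell avoids the substitutions, alias activation, and macro execution that the preceding text warns about. How a receiver maps a processor identifier to a local executable, and which arguments it chooses to omit, is outside the scope of this sketch.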
4.2.3. Examples of processor parameters

[[TODO: provide examples.]]

4.3. output-type

The output-type parameter indicates the Internet media type (and parameters) of the output from the processor. When omitted, the default value is "text/html". Generators MUST NOT emit empty output-type parameters, but parsers MUST treat empty output-type parameters the same as if omitted.

The default value of text/html ought to be suitable for the majority of current purposes. However, Markdown is increasingly becoming integral to workflows where HTML is not the target output; examples range from TeX [CITE], to PDF [CITE], to OPML [CITE], and even to entire e-books [CITE].

Security provides a significant motivator for this parameter. Most Markdown processors emit byte (octet) streams; without a well-defined means for a Markdown processor to pass metadata onwards, it is perilous for post-processing to assume that the content is always HTML. A processor might emit PostScript (application/postscript) content, for example, in which case an HTML sanitizer would fail to excise dangerous instructions.

The value of output-type is an Internet media type with optional parameters. The syntax (including case sensitivity considerations) is the same as specified in [RFC2045] for the Content-Type header (with updates over time), namely:

   type "/" subtype *(";" parameter)
   ; Matching of media type and subtype
   ; is ALWAYS case-insensitive.

   Figure X: Content-Type ABNF (from [RFC2045])

The Internet media type in the output-type parameter MUST be observed. Processors or processor arguments that conflict with the output-type parameter MUST be re-chosen, ignored, or rejected.

Although arbitrary optional parameters may be passed along with the Internet media type, receivers are under no obligation to honor or interpret them in any particular way.
For example, the parameter value "text/plain; format=flowed; charset=ISO-2022-JP" obligates the receiver to output text/plain (and to treat the output as plain text--no sneaking in or labeling the output as HTML!). In contrast, such a parameter value obligates the receiver neither to follow [RFC3676] (for flowed output) nor to output ISO-2022-JP Japanese character encoding (see [RFC1468]).

Markdown implementations for all kinds of formats already exist, including formats that are not registered Internet media types, or that are inexpressible as Internet media types. For example, one Markdown processor for the mass media industry outputs formatted screenplays [CITE to fountain.io]: none of the applicable media types (application/pdf, text/html, or text/plain) adequately distinguishes this kind of output. Such distinctions SHOULD be made in the processor parameter (and to a lesser extent, the flavor parameter), underscoring that the primary concern of the output-type parameter is making technical and security-related decisions.

The output-type parameter does not distinguish between fragment content and whole-document content. A Markdown processor MAY (and typically will) output HTML or XHTML fragment content, without the preambles or postambles of a complete document. Receivers MUST be aware of this behavior and take appropriate precautions.

[[TODO: consider.]] The author may specify the output-type "text/markdown", which has a special meaning. "text/markdown" means that the author does not want to invoke Markdown processing at all: the receiver SHOULD view the Markdown source as-is. In this case, the processor choice has little practical effect because the Markdown is not actually processed, but other tools can use the flavor parameter (and secondarily if so inclined, the processor parameter) to perform useful services such as syntax highlighting.
This output-type is not the default because one generally assumes that Markdown is meant for composing rather than reading: readers expect to see the output format (or dual-display of the output and the Markdown). However, if authors are collaboratively editing a document or are discussing Markdown, "text/markdown" may make sense.

While the optional parameter output-type may be used recursively (as a sneaky way to stash the author's follow-on or secondary intent), receivers are not obligated to recognize it; optional parameters internal to output-type MAY be ignored.

5. IANA Considerations

IANA is asked to register the media type text/markdown in the Standards tree using the application provided in Section 2 of this document.

IANA is also asked to establish a subtype registry called "Markdown Parameters". The registry has two sub-registries: a registry of flavors and a registry of processors.

5.1. Registry of Flavors

Each entry in this registry shall consist of a flavor identifier and information about the flavor, as follows:

5.1.1. Flavor Template

   Identifier: [Identifier]

   Description: [Concise, prose description of the syntax, with emphasis on its purpose, the community that it addresses, and notable variations from [MDSYNTAX] or another flavor.]

   Documentation: [References to documentation.]

   Rules:
      {for each rule}
      Identifier: [Identifier]
      Description: [Concise, prose description of the rule.]
      Documentation: [References to documentation.]

   Responsible Parties:
      {for each party}
      ([type: individual, corporate, representative]) [Name] ...

   Currently Maintained? [Yes/No]

   Tools:
      {for each tool}
      Name: [Name]
      Version(s): [Significant version or versions that implement the flavor]
      Type: ["Processor" or some other type]
      Reference(s): ...
      Purpose: [Concise, prose description of the tool.]
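As an illustration only, the flavor template can be modelled as a small record type. The sketch below is in Python; the class names, field names, and the processor_tools helper are hypothetical renderings of the template labels, and the sample values are copied from the initial registration in Section 5.1.2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    identifier: str
    description: str
    documentation: str


@dataclass
class Tool:
    name: str
    versions: List[str]
    tool_type: str          # "Processor" or some other type
    references: List[str]
    purpose: str


@dataclass
class FlavorRegistration:
    identifier: str
    description: str
    documentation: str
    responsible_parties: List[str]
    currently_maintained: bool
    rules: List[Rule] = field(default_factory=list)
    tools: List[Tool] = field(default_factory=list)

    def processor_tools(self):
        """Names of listed tools whose type is "Processor" (hypothetical helper)."""
        return [t.name for t in self.tools if t.tool_type.lower() == "processor"]


# Values taken from the initial registration in Section 5.1.2.
original = FlavorRegistration(
    identifier="Original",
    description="Gruber's original Markdown syntax.",
    documentation="[MDSYNTAX]",
    responsible_parties=["(individual) John Gruber"],
    currently_maintained=False,
    tools=[
        Tool(
            name="Markdown.pl",
            versions=["1.0.1", "1.0.2b8"],
            tool_type="Processor",
            references=[],
            purpose="Converts Markdown to HTML or XHTML circa 2004.",
        )
    ],
)

print(original.processor_tools())
```

Keeping rules and tools as lists mirrors the "{for each rule}" and "{for each tool}" repetition in the template; the empty rules list corresponds to "Rules: None."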
A responsible party can be an individual author or maintainer, a corporate author or maintainer (plus an individual contact), or a representative of a community of interest dedicated to the Markdown syntax.

Multiple tools MAY be listed, but only one is necessary for a successful registration. If a tool is a Markdown processor, it MUST be registered; however, any Markdown-related tool (for example, graphical editors, emacs "major modes", web apps) is acceptable. The purpose of the tool requirement is to ensure that the flavor is actually used in practice.

5.1.2. Initial Registration

The registry shall have the following initial registration:

   Identifier: Original
   Description: Gruber's original Markdown syntax.
   Documentation: [MDSYNTAX]
   Rules: None.
   Responsible Parties: (individual) John Gruber
   Currently Maintained? No
   Tools:
      Name: Markdown.pl
      Version(s): 1.0.1, 1.0.2b8
      Type: Processor
      Reference(s):
      Purpose: Converts Markdown to HTML or XHTML circa 2004.

5.1.3. Reserved Identifiers

The flavors registry SHALL have the following identifiers RESERVED. No one is allowed to register them (or any case variations of them).

   Standard Common Markdown

5.1.4. Standard of Review

Registrations are made by a highly constrained Expert Review [RFC5226] that amounts more-or-less to First-Come, First-Served with sanity checking.

The designated expert SHALL review the flavor registration. The identifier MUST comply with the syntax specified in this document. Additionally, the identifier MUST NOT differ from other registered identifiers merely by case.

The description and documentation SHOULD provide sufficient guidance to an implementer to implement a tool to handle the flavor. The designated expert SHOULD warn the registrant if the description and documentation are inadequate; however, inadequacy (in the opinion of the designated expert) will not bar a registration.
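The case rule for identifiers lends itself to a mechanical check. The following Python sketch is an assumption-laden illustration: the function name identifier_is_available is hypothetical, the split of the reserved list into three separate identifiers is a guess at how the list in Section 5.1.3 is meant to be read, and validation of the identifier syntax (defined elsewhere in this document) is left out.

```python
def identifier_is_available(candidate, registered, reserved):
    """Return True if candidate collides with no registered or reserved identifier.

    Hypothetical helper: comparison ignores case, so an identifier that
    differs from an existing one merely by case is treated as taken.
    """
    taken = {name.lower() for name in registered}
    taken |= {name.lower() for name in reserved}
    return candidate.lower() not in taken


registered_flavors = ["Original"]                     # from Section 5.1.2
reserved_flavors = ["Standard", "Common", "Markdown"]  # assumed split of the reserved list

print(identifier_is_available("original", registered_flavors, reserved_flavors))
print(identifier_is_available("ExampleFlavor", registered_flavors, reserved_flavors))
```

The name "ExampleFlavor" is a made-up candidate used only to exercise the check.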
All references (including contact information) MUST be verified as functional at the time of the registration.

If rules are included in the registration, the rule identifiers MUST comply with the syntax specified in this document. The description and documentation of each rule SHOULD provide sufficient guidance to an implementer to implement a tool to handle the rule. The designated expert SHOULD warn the registrant if the description and documentation are inadequate; however, inadequacy (in the opinion of the designated expert) will not bar a registration.

The designated expert MUST determine that all tools listed in the registration are real implementations. If a tool is a Markdown processor, the processor MUST be registered in the Registry of Processors in Section 5.2. The designated expert MAY request that the registrant provide evidence that a tool actually works (for example, that it passes certain test suites); however, the failure of a tool to work according to the flavor registration will not bar a registration. (For example, not even Gruber's own Markdown.pl implementation complies with [MDSYNTAX]. C'est la vie!)

If a registration is being updated, the designated expert SHOULD verify that the updating registrant matches the contact information on the prior registration, and if not, that the updating registrant has authority from the prior registrant to update it. All fields may be updated except the Identifier, which is permanent: not even case may be changed.

5.2. Registry of Processors

Each entry in this registry SHALL consist of a processor identifier and information about the processor, as follows:

5.2.1. Processor Template

   Identifier: [Identifier]

   Description: [Concise, prose description of the processor, with emphasis on its purpose, the community that it addresses, and notable variations from [MDSYNTAX] or another flavor.]
   Documentation: [References to documentation.]

   Versions:
      {for each version}
      Identifier: [Identifier]
      Description: [Optional, concise, prose description of the version. "N/A" SHALL be used to indicate no description.]

   Arguments:
      {in general}
      Argument Ordering: [Concise, prose description of how arguments need to be ordered.]

      {for each argument}
      Argument Syntax: [Syntax here; multiple consecutive argument positions are allowed, separated by a single space. Use braces for variable information (add : for example input), <> for URI references, and .. for sequences of arguments, with # as a placeholder for the number of arguments or ..-.. to indicate the first character of the subsequent argument that ends the sequence, e.g.:
         -c
         --title {title: "The Rain in Spain"}
         --metadata
         --bullet-chars:{#} {char 1}..{char #}
         --verbs {verb: walk, run, sleep}..-..
      ]
      Description: [Concise, prose description of the argument.]
      Documentation: [References to documentation.]

   Output Type(s): [Internet media types, comma-separated (with optional LWSP)]

   Security Considerations: [Sufficient description of risks and other considerations; "N/A" or "None" responses are insufficient.]

   Responsible Parties:
      {for each party}
      ([type: individual, corporate, representative]) [Name] ...

   Currently Maintained? [Yes/No]

A responsible party can be an individual author or maintainer, a corporate author or maintainer (plus an individual contact), or a representative of a community of interest dedicated to the Markdown processor.

5.2.2. Initial Registration

The registry shall have the following initial registration:

   Identifier: Markdown.pl

   Description: Gruber's original Markdown processor, written in Perl. Requires Perl 5.6.0 or later. "Welcome to the 21st Century." Works with Movable Type 2.6+, Blosxom 2.0+, BBEdit 6.1+, and the command-line.
   Documentation: [MARKDOWN]

   Versions:
      Identifier: 1.0.1
      Description: The 2004-12-17 version.

      Identifier: 1.0.2b8
      Description: The 2007-05-09 version. Fixes many bugs and adds several new features; see VERSION HISTORY in Markdown.pl.

   Arguments:
      Argument Syntax: --html4tags
      Description: "Use the --html4tags command-line switch to produce HTML output from a Unix-style command line." Without this argument, Markdown.pl outputs XHTML style tags by default, e.g.: <br />. Even though XHTML style is the default, the output SHOULD be analyzed as text/html; the processor makes no attempt to make its output well-formed application/xhtml+xml (not surprising--see the design philosophy).
      Documentation: [MARKDOWN]

   Output Type: text/html

   Security Considerations: The security of this implementation has not been fully analyzed.

   Responsible Parties: (individual) John Gruber

   Currently Maintained? No [[TODO: maybe?]]

5.2.3. Reserved Identifiers

The processors registry SHALL have the following identifiers RESERVED. No one is allowed to register them (or any case variations of them).

   Standard Markdown md

5.2.4. Standard of Review

Registrations are First-Come, First-Served [RFC5226]. The checks prescribed by this section can be performed automatically.

The identifier MUST comply with the syntax specified in this document. Additionally, the identifier MUST NOT differ from other registered identifiers merely by case.

The description and documentation SHOULD provide sufficient guidance to an implementer to know how to invoke the processor and handle the output. All references (including contact information) MUST be verified as functional at the time of the registration.

If arguments are included in the registration, the Argument Syntax MUST comply with the template instructions in Section 5.2.1. Each description and documentation field SHOULD provide sufficient guidance to an implementer to know how to invoke the processor and handle the output.

The Security Considerations field is not optional; it MUST be provided.

If a registration is being updated, either the contact information MUST match the prior registration and be verified, or the prior registrant MUST confirm that the updating registrant has authority to update the registration. All fields may be updated except the Identifier, which is permanent: not even case may be changed.

6. Security Considerations

See the answer to the Security Considerations template questions in Section 2. Security considerations for the optional parameters are integrated throughout Section 4.

7. References

7.1. Normative References

[MARKDOWN] Gruber, J., "Daring Fireball: Markdown", December 2004.

[MDSYNTAX] Gruber, J., "Daring Fireball: Markdown Syntax Documentation", December 2004.

[RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, December 1994.

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

[RFC5226] Narten, T., and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 5226, May 2008.

[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, October 2008.

[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, January 2013.

7.2. Informative References

[HUMANE] Atwood, J., "Is HTML a Humane Markup Language?", May 2008.

[DIN2MD] Gruber, J., "Dive Into Markdown", March 2004.

[MD102b8] Gruber, J., "[ANN] Markdown.pl 1.0.2b8", May 2007.

[CATPICS] Gruber, J. and M. Arment, "The Talk Show: Ep. 88: 'Cat Pictures' (Side 1)", July 2014.

[INETMEME] Solon, O., "Richard Dawkins on the internet's hijacking of the word 'meme'", June 2013.

[MULTIMD] Penney, F., "MultiMarkdown", April 2014.

[PANDOC] MacFarlane, J., "Pandoc", 2014.

[RAILFROG] Railfrog Team, "Railfrog", April 2009.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC2392] Levinson, E., "Content-ID and Message-ID Uniform Resource Locators", RFC 2392, August 1998.

[RFC4263] Lilly, B., "Media Subtype Registration for Media Type text/troff", RFC 4263, January 2006.

[XML1.0-3] Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third Edition)", World Wide Web Consortium Recommendation REC-xml-20040204, February 2004.

[TODO] [[Add remaining references.]]

Appendix A. Change Log

This draft is a continuation from draft-ietf-appsawg-text-markdown-01.txt. These technical changes were made:

1. The entire document was reorganized: optional parameters now have their own section, and the Introduction section is divided into four subsections.

2. The Introduction section provides substantial background information, along with goals and use cases for both Markdown and the Internet media type registration.

3. The rules parameter was reverted to flavor, and flavor was beefed up.

4. The processor parameters were consolidated and simplified.

5. Dependencies on POSIX were removed.

6. The output-type parameter was added.

7. Unregistered identifiers can be used with their own ! syntax.

8. The IANA Considerations section was fleshed out in great detail, with emphasis on easing the registration process.

9. Security considerations were woven throughout the specification. Overall, most of the complexity in this specification comes directly from the security considerations. Those considerations are necessary since a lot of bad things can and will happen when HTML, URIs, and executable code get together.

10. Changed the example in Section 2 to use initially registered identifiers.

11. Added output-type="text/markdown" for recursive handling (i.e., don't process this Markdown, just show it like it is).
Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/