| draft-crockford-jsonorg-json-00.txt | draft-crockford-jsonorg-json-01.txt | |||
|---|---|---|---|---|
| JSON D. Crockford | JSON D. Crockford | |||
| Internet Draft JSON.org | Internet Draft JSON.org | |||
| draft-crockford-jsonorg-json-00.txt January, 2006 | draft-crockford-jsonorg-json-01.txt February, 2006 | |||
| Intended status: Informational | Intended status: Informational | |||
| Expires: June 10, 2006 | Expires: June 10, 2006 | |||
| JSON | JSON | |||
| Status of this Memo | Status of this Memo | |||
| This document may not be modified, and derivative works of it | This document may not be modified, and derivative works of it | |||
| may not be created, except to publish it as an RFC and to | may not be created, except to publish it as an RFC and to | |||
| translate it into languages other than English. | translate it into languages other than English. | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| skipping to change at line 45 | skipping to change at line 45 | |||
| This Internet Draft will expire on June 10, 2006. | This Internet Draft will expire on June 10, 2006. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2006). | Copyright (C) The Internet Society (2006). | |||
| Abstract | Abstract | |||
| JSON (JavaScript Object Notation) is a light-weight, text-based, | JSON (JavaScript Object Notation) is a light-weight, text-based, | |||
| language-independent, data interchange format. It was derived from | language-independent, data interchange format. It was derived from | |||
| ECMA 262 (The ECMAScript Programming Language Standard), Third | the ECMAScript Programming Language Standard. JSON defines a small | |||
| Edition. JSON defines a small set of formatting rules for the | set of formatting rules for the portable representation of structured | |||
| portable representation of structured data. | data. | |||
| Conventions used in this document | Conventions used in this document | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in RFC-2119. | document are to be interpreted as described in RFC-2119. | |||
| The syntax diagrams in this document are to be interpreted as | The syntax rules in this document are to be interpreted as | |||
| described in RFC-2234. | described in RFC-2234. | |||
| 1. Introduction | 1. Introduction | |||
| JSON, or JavaScript Object Notation, is a text format for the | JSON, or JavaScript Object Notation, is a text format for the | |||
| serialization of structured data. It is derived from the object | serialization of structured data. It is derived from the object | |||
| literals of JavaScript, as defined in ECMA 262 (The ECMAScript | literals of JavaScript, as defined in the ECMAScript | |||
| Programming Language Standard), Third Edition (1999). | Programming Language Standard [ECMA]. | |||
| JSON can represent four primitive types (strings, numbers, booleans, | JSON can represent four primitive types (strings, numbers, booleans, | |||
| and null) and two structured types (objects and arrays). | and null) and two structured types (objects and arrays). | |||
| A string is a sequence of zero or more Unicode characters. | A string is a sequence of zero or more Unicode characters. | |||
| An object is an unordered collection of zero or more name/value | An object is an unordered collection of zero or more name/value | |||
| pairs, where a name is a string, and a value is a string, number, | pairs, where a name is a string, and a value is a string, number, | |||
| boolean, null, object, or array. | boolean, null, object, or array. | |||
| An array is an ordered sequence of zero or more values. | An array is an ordered sequence of zero or more values. | |||
| The terms "object" and "array" come from the conventions of | The terms "object" and "array" come from the conventions of | |||
| JavaScript. | JavaScript. | |||
| JSON's design goals were to be minimal, portable, textual, and a | ||||
| subset of JavaScript. | ||||
| 2. JSON Grammar | 2. JSON Grammar | |||
| A JSON text is a sequence of tokens. The set of tokens includes six | A JSON text is a sequence of tokens. The set of tokens includes six | |||
| structural characters, strings, numbers, and three literal names. | structural characters, strings, numbers, and three literal names. | |||
| A JSON text is a serialized object or array. | ||||
| JSON-text = object / array | ||||
| These are the six structural characters: | These are the six structural characters: | |||
| <begin-object> = %x7B ; { left brace | begin-array = %x5B ; [ left square bracket | |||
| <end-object> = %x7D ; } right brace | begin-object = %x7B ; { left curly bracket | |||
| <begin-array> = %x5B ; [ left bracket | end-array = %x5D ; ] right square bracket | |||
| <end-array> = %x5D ; ] right brace | end-object = %x7D ; } right curly bracket | |||
| <name-separator> = %x3A ; : colon | name-separator = %x3A ; : colon | |||
| <value-separator> = %x2C ; , comma | value-separator = %x2C ; , comma | |||
| 2.1. Whitespace | 2.1. Whitespace | |||
| The tokens may be separated by any combination of these whitespace | The tokens MAY be separated by any combination of these | |||
| characters: | insignificant whitespace characters: | |||
| space U+0020 Space | space U+0020 Space | |||
| TAB U+0009 Horizontal tab | HT U+0009 Horizontal tab | |||
| LF U+000A Line feed or New line | LF U+000A Line feed or New line | |||
| CR U+000D Carriage return | CR U+000D Carriage return | |||
| Insignificant whitespace must not be placed within a | Insignificant whitespace MUST NOT be placed within a | |||
| multicharacter token (a literal name, number, or string). A space | multicharacter token (a literal name, number, or string). | |||
| character in a string is significant. | A space character in a string is significant. | |||
| 2.2. Values | 2.2. Values | |||
| A JSON value can be an object, array, number, or string, or one of | A JSON value MUST be an object, array, number, or string, or one of | |||
| the literal names true, false, or null. The literal names must be | the three literal names: | |||
| in lower case. No other literal names are allowed. | ||||
| <value> = <string> / <number> / <object> / <array> / | false null true | |||
| <true> / <false> / <null> | ||||
| <true> = %x74.72.75.65 ; true | The literal names MUST be in lower case. No other literal names | |||
| are allowed. | ||||
| <false> = %x66.61.6c.73.65 ; false | value = false / null / true / object / array / number / string | |||
| <null> = %x6e.75.6c.6c ; null | false = %x66.61.6c.73.65 ; false | |||
| null = %x6e.75.6c.6c ; null | ||||
| true = %x74.72.75.65 ; true | ||||
| 2.3. Objects | 2.3. Objects | |||
| An object structure is represented as a pair of curly braces | An object structure is represented as a pair of curly brackets | |||
| surrounding zero or more name/value pairs (or members). A name is | surrounding zero or more name/value pairs (or members). A name is | |||
| a string. A single colon comes after each name, separating the | a string. A single colon comes after each name, separating the | |||
| name from the value. A single comma separates a value from a | name from the value. A single comma separates a value from a | |||
| following name. | following name. | |||
| <object> = <begin-object> [ <member> | object = begin-object [ member *( value-separator member ) ] | |||
| *( <value-separator> <member> ) ] <end-object> | end-object | |||
| <member> = <string> <name-separator> <value> | member = string name-separator value | |||
| 2.4. Arrays | 2.4. Arrays | |||
| An array structure is represented as square brackets surrounding | An array structure is represented as square brackets surrounding | |||
| zero or more values (or elements). Elements are separated by | zero or more values (or elements). Elements are separated by | |||
| commas. | commas. | |||
| <array> = <begin-array> [ <value> | array = begin-array [ value *( value-separator value ) ] | |||
| *( <value-separator> <value> ) ] <end-array> | end-array | |||
| 2.5. Numbers | 2.5. Numbers | |||
| The representation of numbers is similar to that used in | The representation of numbers is similar to that used in most | |||
| programming languages. A number contains an integer component | programming languages. A number contains an integer component | |||
| (which may be prefixed with an optional minus sign (U+002D)), | which may be prefixed with an optional minus sign, which may be | |||
| which may be followed by a fraction part and/or an exponent part. | followed by a fraction part and/or an exponent part. | |||
| Octal and hex forms are not allowed. Leading zeros are not allowed | Octal and hex forms are not allowed. Leading zeros are not | |||
| as that could lead to confusion. | allowed. | |||
| A fraction part is a decimal point (U+002E) followed by one or | A fraction part is a decimal point followed by one or more digits. | |||
| more digits. | ||||
| An exponent part begins with the letter E in upper or lower case | An exponent part begins with the letter E in upper or lower case, | |||
| (U+0045 or U+0065), which may be followed by a plus (U+002B) or | which may be followed by a plus or minus sign. The E and optional | |||
| minus (U+002D). The E and optional sign are followed by one or | sign are followed by one or more digits. | |||
| more digits. | ||||
| Numeric values that cannot be represented as sequences of digits | Numeric values that cannot be represented as sequences of digits | |||
| (such as Infinity and NaN) are not permitted. | (such as Infinity and NaN) are not permitted. | |||
| <number> = [ "-" ] <int> [ <frac> ] [ <exp> ] | number = [ minus ] int [ frac ] [ exp ] | |||
| <int> = "0" / ( <digit1-9> *<digit> ) | decimal-point = %x2E ; . | |||
| <frac> = "." 1*<digit> | digit1-9 = %x31-39 ; 1-9 | |||
| <exp> = ( "e" / "E" ) [ "-" / "+" ] 1*<digit> | e = %x65 / %x45 ; e E | |||
| <digit> = "0" / "1" / "2" / "3" / "4" / | exp = e [ minus / plus ] 1*DIGIT | |||
| "5" / "6" / "7" / "8" / "9" | ||||
| <digit1-9> = "1" / "2" / "3" / "4" / | frac = decimal-point 1*DIGIT | |||
| "5" / "6" / "7" / "8" / "9" | ||||
| int = zero / ( digit1-9 *DIGIT ) | ||||
| minus = %x2D ; - | ||||
| plus = %x2B ; + | ||||
| zero = %x30 ; 0 | ||||
| 2.6. Strings | 2.6. Strings | |||
| The representation of strings is similar to conventions used in | The representation of strings is similar to conventions used in | |||
| the C family of programming languages. A string begins and ends | the C family of programming languages. A string begins and ends | |||
| with quotation marks (U+0022). All Unicode characters can be | with quotation marks. All Unicode characters may be placed within | |||
| placed within the quotation marks except for the characters which | the quotation marks except for the characters which must be | |||
| must be escaped: quotation mark (U+0022), reverse virgule | escaped: quotation mark, reverse solidus, and the control | |||
| (U+005C), and the control characters (U+0000 through U+001F). | characters (U+0000 through U+001F). | |||
| Any character may be escaped. If the character is in the Basic | Any character may be escaped. If the character is in the Basic | |||
| Multilingual Plane (U+0000 through U+FFFF) then it may be | Multilingual Plane (U+0000 through U+FFFF) then it may be | |||
| represented as a six-character sequence: a reverse virgule | represented as a six-character sequence: a reverse solidus | |||
| followed by the lower case letter u (U+0075) followed by four | followed by the lower case letter u followed by four hexadecimal | |||
| hexadecimal digits which encode the character's code point. The | digits which encode the character's code point. The hexadecimal | |||
| hexadecimal letters a through f can be in upper or lower case. So, | letters A through F can be in upper or lower case. So, for | |||
| for example, a string containing only a single reverse virgule | example, a string containing only a single reverse solidus | |||
| character may be represented as "\u005C". | character may be represented as "\u005C". | |||
| Alternatively, there are two-character sequence escape | Alternatively, there are two-character sequence escape | |||
| representations of some popular characters. So, for example, a | representations of some popular characters. So, for example, a | |||
| string containing only a single reverse virgule character may be | string containing only a single reverse solidus character may be | |||
| represented more compactly as "\\". | represented more compactly as "\\". | |||
| Short Long | ||||
| form form | ||||
| \" \u0022 quotation mark | ||||
| \\ \u005C reverse virgule or backslash | ||||
| \/ \u002F virgule or slash | ||||
| \b \u0008 backspace | ||||
| \f \u000C form feed | ||||
| \n \u000A line feed or new line | ||||
| \r \u000D carriage return | ||||
| \t \u0009 tab | ||||
| To escape an extended character that is not in the Basic | To escape an extended character that is not in the Basic | |||
| Multilingual Plane, then the character is represented as a | Multilingual Plane, then the character is represented as a | |||
| twelve-character sequence, encoding the UTF-16 surrogate pair. So, | twelve-character sequence, encoding the UTF-16 surrogate pair. | |||
| for example, a string containing only the G clef character | So, for example, a string containing only the G clef character | |||
| (U+1D11E) may be represented as "\uD834\uDD1E". | (U+1D11E) may be represented as "\uD834\uDD1E". | |||
| A space in a string is treated as a space character, not as | string = quotation-mark *char quotation-mark | |||
| insignificant whitespace. | ||||
| <string> = <quotation-mark> *<char> <quotation-mark> | char = unescaped / | |||
| escape ( | ||||
| %x22 / ; " quotation mark U+0022 | ||||
| %x5C / ; \ reverse solidus U+005C | ||||
| %x2F / ; / solidus U+002F | ||||
| %x62 / ; b backspace U+0008 | ||||
| %x66 / ; f form feed U+000C | ||||
| %x6E / ; n line feed U+000A | ||||
| %x72 / ; r carriage return U+000D | ||||
| %x74 / ; t tab U+0009 | ||||
| %x75 4HEXDIG ) ; uXXXX U+XXXX | ||||
| <quotation-mark> = %x22 ; " | escape = %x5C ; \ | |||
| <escape> = %x5C ; \ | quotation-mark = %x22 ; " | |||
| <char> = | unescaped = %x20-21 / %x23-5B / %x5D-10FFFF | |||
| <unescaped> / | ||||
| <escape> ( | ||||
| %x22 / ; " quotation mark | ||||
| %x5C / ; \ reverse virgule | ||||
| %x2F / ; / virgule | ||||
| %x62 / ; b backspace | ||||
| %x66 / ; f form feed | ||||
| %x6E / ; n line feed | ||||
| %x72 / ; r carriage return | ||||
| %x74 / ; t tab | ||||
| %x75 4<hex-digit> ) ; uXXXX | ||||
| <hex-digit> = <digit> / "a" / "b" / "c" / "d" / "e" / "f" / | 3. Encoding | |||
| "A" / "B" / "C" / "D" / "E" / "F" | ||||
| <unescaped> = %x20-21 / %x23-5B / %x5D-10FFFF | JSON text SHOULD be encoded in Unicode. The default encoding is | |||
| UTF-8. | ||||
| 3. Parsers | Since the first two characters of a JSON text will always be ASCII | |||
| characters, it is possible to determine if an octet stream is | ||||
| UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the | ||||
| pattern of nulls in the first four octets. | ||||
| A JSON parser transforms a JSON text into another representation. A | 00 00 00 xx UTF-32BE | |||
| 00 xx 00 xx UTF-16BE | ||||
| xx 00 00 00 UTF-32LE | ||||
| xx 00 xx 00 UTF-16LE | ||||
| xx xx xx xx UTF-8 | ||||
| 4. Parsers | ||||
| A JSON parser transforms a JSON text into another representation. A | ||||
| JSON parser MUST accept all texts that conform to the JSON grammar. | JSON parser MUST accept all texts that conform to the JSON grammar. | |||
| A JSON parser MAY accept non-JSON forms or extensions. | A JSON parser MAY accept non-JSON forms or extensions. | |||
| An implementation may set limits on the size of texts that it | An implementation may set limits on the size of texts that it | |||
| accepts. An implementation may set limits on the maximum depth of | accepts. An implementation may set limits on the maximum depth of | |||
| nesting. An implementation may set limits on the range of numbers. | nesting. An implementation may set limits on the range of numbers. | |||
| An implementation may set limits on the length and character contents | An implementation may set limits on the length and character contents | |||
| of strings. | of strings. | |||
| 4. Generators | 5. Generators | |||
| A JSON generator produces JSON text. The resulting text MUST strictly | A JSON generator produces JSON text. The resulting text MUST | |||
| conform to the JSON grammar. | strictly conform to the JSON grammar. | |||
| 5. IANA Considerations | 6. IANA Considerations | |||
| The MIME media type for JSON text is text/json. | The MIME media type for JSON text is text/json. | |||
| 6. Security Considerations | 7. Security Considerations | |||
| Since JSON is a subset of JavaScript, the eval() function (which | Generally there are security issues with scripting languages. JSON | |||
| compiles and executes a text) can be used as a JSON parser. This | is a subset of JavaScript, but it is a safe subset that excludes | |||
| should only be done if the text is known to be safe. A regular | assignment and invocation. | |||
| expression can be used to prove that the text contains only JSON | ||||
| tokens. A text containing only JSON tokens is safe to eval because | A JSON text can be safely passed into JavaScript's eval() function | |||
| the JSON subset of JavaScript is safe. | (which compiles and executes a string) if all of the characters not | |||
| enclosed in strings are in the set of characters which form JSON | ||||
| tokens. This can be quickly determined in JavaScript with two | ||||
| regular expressions and calls to the test and replace methods. | ||||
| var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test( | ||||
| text.replace(/"(\\.|[^"\\])*"/g, ''))) && | ||||
| eval('(' + text + ')'); | ||||
| 8. References | ||||
| 8.1 Normative References | ||||
| [ECMA] European Computer Manufacturers Association, "ECMAScript | ||||
| Language Specification 3rd Edition", December 1999, | ||||
| <http://www.ecma-international.org/publications/files/ | ||||
| ecma-st/ECMA-262.pdf>. | ||||
| [UNICODE] The Unicode Consortium, "The Unicode Standard | ||||
| Version 4.0", 2003, | ||||
| <http://www.unicode.org/versions/Unicode4.1.0/>. | ||||
| Author's Address | Author's Address | |||
| Douglas Crockford | Douglas Crockford | |||
| JSON.org | JSON.org | |||
| Contact Email: douglas@crockford.com | Contact Email: douglas@crockford.com | |||
| Intellectual Property Statement | Intellectual Property Statement | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| End of changes. 58 change blocks. | ||||
| 125 lines changed or deleted | 157 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
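The draft-01 grammar in section 2 (JSON-text, value, object, array, number, string) is small enough to check with a hand-written recursive-descent recognizer. The sketch below only answers "does this text conform?" and builds no values. The name `isJsonText` and its helpers are hypothetical, and accepting whitespace before and after the whole text is an assumption: the draft only says that whitespace may separate tokens.

```javascript
// Recognizer sketch for the draft-01 grammar. isJsonText is a hypothetical
// name. Assumption: whitespace is also accepted around the whole text.
function isJsonText(text) {
  let i = 0; // current position in text

  // number = [ minus ] int [ frac ] [ exp ]  (section 2.5)
  const NUMBER = /-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?/y;
  // string = quotation-mark *char quotation-mark  (section 2.6)
  const STRING = /"(?:[^"\\\u0000-\u001F]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/y;

  // Insignificant whitespace: space, HT, LF, CR (section 2.1).
  const ws = () => {
    while (i < text.length && " \t\n\r".includes(text[i])) i++;
  };

  // Match a sticky regular expression at position i and advance past it.
  const token = (re) => {
    re.lastIndex = i;
    if (re.exec(text) === null) return false;
    i = re.lastIndex;
    return true;
  };

  // The three lower-case literal names (section 2.2).
  const literal = () => {
    for (const word of ["false", "null", "true"]) {
      if (text.startsWith(word, i)) {
        i += word.length;
        return true;
      }
    }
    return false;
  };

  const value = () => {
    if (text[i] === "{") return object();
    if (text[i] === "[") return array();
    if (text[i] === '"') return token(STRING);
    return literal() || token(NUMBER);
  };

  // Shared loop for objects and arrays: elements joined by value-separator.
  const sequence = (close, element) => {
    i++; // skip begin-object or begin-array
    ws();
    if (text[i] === close) { i++; return true; } // empty object or array
    for (;;) {
      if (!element()) return false;
      ws();
      if (text[i] === ",") { i++; ws(); continue; }
      if (text[i] === close) { i++; return true; }
      return false;
    }
  };

  // member = string name-separator value  (section 2.3)
  const member = () => {
    if (!token(STRING)) return false;
    ws();
    if (text[i] !== ":") return false;
    i++;
    ws();
    return value();
  };

  const object = () => sequence("}", member);
  const array = () => sequence("]", value);

  // JSON-text = object / array
  ws();
  let ok;
  if (text[i] === "{") ok = object();
  else if (text[i] === "[") ok = array();
  else return false;
  ws();
  return ok && i === text.length;
}
```

Under this sketch `isJsonText('{"a": [1, -2.5e3, true, null]}')` is true, while `'"abc"'` (not an object or array), `'[012]'` (leading zero), and `'[1,]'` (trailing comma) are rejected. The number and string rules are regular, so they are matched with sticky regular expressions; only the nesting of objects and arrays needs recursion.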
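Section 2.6 requires escaping only the quotation mark, the reverse solidus, and the control characters U+0000 through U+001F; every other character may appear unescaped. A generator could apply that rule as in the following sketch, which uses the two-character short forms where the draft defines them and the six-character `\uXXXX` form for the remaining control characters. `quoteJsonString` is a hypothetical name, and leaving the solidus `/` unescaped is a choice the draft permits, not a requirement.

```javascript
// Sketch: quote a JavaScript string as a JSON string (section 2.6).
// quoteJsonString is a hypothetical name.
const SHORT_FORMS = new Map([
  ['"', '\\"'],   // quotation mark
  ["\\", "\\\\"], // reverse solidus
  ["\b", "\\b"],  // backspace
  ["\f", "\\f"],  // form feed
  ["\n", "\\n"],  // line feed
  ["\r", "\\r"],  // carriage return
  ["\t", "\\t"],  // tab
]);

function quoteJsonString(s) {
  let out = '"';
  for (const ch of s) {
    const code = ch.codePointAt(0);
    if (SHORT_FORMS.has(ch)) {
      out += SHORT_FORMS.get(ch); // two-character escape
    } else if (code < 0x20) {
      // other control characters: six-character \uXXXX escape
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    } else {
      out += ch; // unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
    }
  }
  return out + '"';
}

console.log(quoteJsonString('say "hi"\n\u0001')); // "say \"hi\"\n\u0001"
```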
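Section 2.6 escapes a character outside the Basic Multilingual Plane as a twelve-character sequence that encodes its UTF-16 surrogate pair, giving `"\uD834\uDD1E"` for the G clef (U+1D11E). The arithmetic is the standard UTF-16 split: subtract 0x10000, put the high 10 bits into a surrogate starting at 0xD800 and the low 10 bits into one starting at 0xDC00. `toSurrogateEscape` is a hypothetical helper, not part of any JSON API.

```javascript
// Sketch: escape a code point above U+FFFF as a UTF-16 surrogate pair.
// toSurrogateEscape is a hypothetical helper name.
function toSurrogateEscape(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point from U+10000 to U+10FFFF");
  }
  const v = codePoint - 0x10000;      // 20-bit value
  const high = 0xd800 + (v >> 10);    // high surrogate: top 10 bits
  const low = 0xdc00 + (v & 0x3ff);   // low surrogate: bottom 10 bits
  const hex = (n) => n.toString(16).toUpperCase().padStart(4, "0");
  return "\\u" + hex(high) + "\\u" + hex(low);
}

console.log(toSurrogateEscape(0x1d11e)); // \uD834\uDD1E (the G clef)
```

Wrapping the result in quotation marks gives a string that JavaScript's `JSON.parse` decodes back to the single character `String.fromCodePoint(0x1D11E)`.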
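The encoding table in section 3 of draft-01 works because a JSON text begins with two ASCII characters, so the position of zero octets among the first four identifies the encoding. A direct transcription might look like the sketch below. `detectJsonEncoding` is a hypothetical name, and falling back to UTF-8 (the draft's default) when fewer than four octets are available is an assumption the draft does not state.

```javascript
// Sketch of the null-octet table in section 3 of draft-01.
// detectJsonEncoding is a hypothetical name; bytes is a Uint8Array.
function detectJsonEncoding(bytes) {
  // Assumption: too short to apply the table, so use the default UTF-8.
  if (bytes.length < 4) return "UTF-8";
  const z = Array.from(bytes.subarray(0, 4), (b) => b === 0);
  if (z[0] && z[1] && z[2] && !z[3]) return "UTF-32BE";  // 00 00 00 xx
  if (z[0] && !z[1] && z[2] && !z[3]) return "UTF-16BE"; // 00 xx 00 xx
  if (!z[0] && z[1] && z[2] && z[3]) return "UTF-32LE";  // xx 00 00 00
  if (!z[0] && z[1] && !z[2] && z[3]) return "UTF-16LE"; // xx 00 xx 00
  return "UTF-8";                                        // xx xx xx xx
}

console.log(detectJsonEncoding(new TextEncoder().encode('{"a":1}'))); // UTF-8
```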