< draft-crockford-jsonorg-json-00.txt   draft-crockford-jsonorg-json-01.txt >
JSON D. Crockford JSON D. Crockford
Internet Draft JSON.org Internet Draft JSON.org
draft-crockford-jsonorg-json-00.txt January, 2006 draft-crockford-jsonorg-json-01.txt February, 2006
Intended status: Informational Intended status: Informational
Expires: June 10, 2006 Expires: June 10, 2006
JSON JSON
Status of this Memo Status of this Memo
This document may not be modified, and derivative works of it This document may not be modified, and derivative works of it
may not be created, except to publish it as an RFC and to may not be created, except to publish it as an RFC and to
translate it into languages other than English. translate it into languages other than English.
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
skipping to change at line 45 skipping to change at line 45
This Internet Draft will expire on June 10, 2006. This Internet Draft will expire on June 10, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). Copyright (C) The Internet Society (2006).
Abstract Abstract
JSON (JavaScript Object Notation) is a light-weight, text-based,
language-independent, data interchange format. It was derived from
< ECMA 262 (The ECMAScript Programming Language Standard), Third
< Edition. JSON defines a small set of formatting rules for the
< portable representation of structured data.
> the ECMAScript Programming Language Standard. JSON defines a small
> set of formatting rules for the portable representation of structured
> data.
Conventions used in this document Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119. document are to be interpreted as described in RFC-2119.
< The syntax diagrams in this document are to be interpreted as
> The syntax rules in this document are to be interpreted as
described in RFC-2234.
1. Introduction 1. Introduction
JSON, or JavaScript Object Notation, is a text format for the
serialization of structured data. It is derived from the object
< literals of JavaScript, as defined in ECMA 262 (The ECMAScript
< Programming Language Standard), Third Edition (1999).
> literals of JavaScript, as defined in the ECMAScript
> Programming Language Standard [ECMA].
JSON can represent four primitive types (strings, numbers, booleans, JSON can represent four primitive types (strings, numbers, booleans,
and null) and two structured types (objects and arrays). and null) and two structured types (objects and arrays).
A string is a sequence of zero or more Unicode characters. A string is a sequence of zero or more Unicode characters.
An object is an unordered collection of zero or more name/value An object is an unordered collection of zero or more name/value
pairs, where a name is a string, and a value is a string, number, pairs, where a name is a string, and a value is a string, number,
boolean, null, object, or array. boolean, null, object, or array.
An array is an ordered sequence of zero or more values. An array is an ordered sequence of zero or more values.
The terms "object" and "array" come from the conventions of The terms "object" and "array" come from the conventions of
JavaScript. JavaScript.
> JSON's design goals were to be minimal, portable, textual, and a
> subset of JavaScript.
2. JSON Grammar 2. JSON Grammar
A JSON text is a sequence of tokens. The set of tokens includes six A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names. structural characters, strings, numbers, and three literal names.
> A JSON text is a serialized object or array.
>    JSON-text = object / array
These are the six structural characters:
<    <begin-object>    = %x7B ; { left brace
<    <end-object>      = %x7D ; } right brace
<    <begin-array>     = %x5B ; [ left bracket
<    <end-array>       = %x5D ; ] right brace
<    <name-separator>  = %x3A ; : colon
<    <value-separator> = %x2C ; , comma
>    begin-array     = %x5B ; [ left square bracket
>    begin-object    = %x7B ; { left curly bracket
>    end-array       = %x5D ; ] right square bracket
>    end-object      = %x7D ; } right curly bracket
>    name-separator  = %x3A ; : colon
>    value-separator = %x2C ; , comma
2.1. Whitespace 2.1. Whitespace
< The tokens may be separated by any combination of these whitespace
< characters:
> The tokens MAY be separated by any combination of these
> insignificant whitespace characters:
   space  U+0020  Space
<  TAB    U+0009  Horizontal tab
>  HT     U+0009  Horizontal tab
   LF     U+000A  Line feed or New line
   CR     U+000D  Carriage return
< Insignificant whitespace must not be placed within a
< multicharacter token (a literal name, number, or string). A space
< character in a string is significant.
> Insignificant whitespace MUST NOT be placed within a
> multicharacter token (a literal name, number, or string).
> A space character in a string is significant.
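The whitespace rule admits exactly four characters. Below is a minimal sketch in JavaScript, the language of the draft's own eval() example, of a helper that a tokenizer might use to skip them. The function name `skipWhitespace` is hypothetical. The helper deliberately avoids `String.prototype.trim()`, which also removes characters that are not JSON whitespace, such as U+00A0 and U+FEFF.

```javascript
// Sketch: advance past JSON insignificant whitespace starting at index i.
// Only space (U+0020), HT (U+0009), LF (U+000A) and CR (U+000D) qualify.
function skipWhitespace(text, i) {
  while (i < text.length) {
    const c = text.charCodeAt(i);
    if (c !== 0x20 && c !== 0x09 && c !== 0x0a && c !== 0x0d) break;
    i++;
  }
  return i;
}

console.log(skipWhitespace(" \t\r\n[1]", 0)); // 4
console.log(skipWhitespace("\u00A0[1]", 0));  // 0: no-break space is not JSON whitespace
```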
2.2. Values 2.2. Values
< A JSON value can be a object, array, number, or string, or one of
< the literal names true, false, or null. The literal names must be
< in lower case. No other literal names are allowed.
<    <value> = <string> / <number> / <object> / <array> /
<              <true> / <false> / <null>
<    <true>  = %x74.72.75.65     ; true
<    <false> = %x66.61.6c.73.65  ; false
<    <null>  = %x6e.75.6c.6c     ; null
> A JSON value MUST be a object, array, number, or string, or one of
> the three literal names:
>    false null true
> The literal names MUST be in lower case. No other literal names
> are allowed.
>    value = false / null / true / object / array / number / string
>    false = %x66.61.6c.73.65  ; false
>    null  = %x6e.75.6c.6c     ; null
>    true  = %x74.72.75.65     ; true
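Each kind of value begins with a distinctive character, so a parser can pick the rule to apply by looking at a single character. The following is a JavaScript sketch under that reading of the grammar. The function name `valueKind` and its return strings are hypothetical. Literal names are compared exactly, so `True` or `NULL` is rejected, as the text requires.

```javascript
// Sketch: report which grammar rule a value starting at text[i] must follow.
// Returns null when no JSON value can start there. This only inspects the
// start of the value; checking the rest of the token is left to the caller.
function valueKind(text, i) {
  const c = text[i];
  if (c === "{") return "object";
  if (c === "[") return "array";
  if (c === '"') return "string";
  if (c === "-" || (c >= "0" && c <= "9")) return "number";
  // Literal names must match exactly, in lower case.
  for (const name of ["false", "null", "true"]) {
    if (text.startsWith(name, i)) return name;
  }
  return null;
}

console.log(valueKind("[1, 2]", 0));        // array
console.log(valueKind("-0.5", 0));          // number
console.log(valueKind("True", 0) === null); // true: literal names are lower case only
```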
2.3. Objects 2.3. Objects
< An object structure is represented as a pair of curly braces
> An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is
a string. A single colon comes after each name, separating the
name from the value. A single comma separates a value from a
following name.
<    <object> = <begin-object> [ <member>
<               *( <value-separator> <member> ) ] <end-object>
<    <member> = <string> <name-separator> <value>
>    object = begin-object [ member *( value-separator member ) ]
>             end-object
>    member = string name-separator value
2.4. Arrays 2.4. Arrays
An array structure is represented as square brackets surrounding An array structure is represented as square brackets surrounding
zero or more values (or elements). Elements are separated by zero or more values (or elements). Elements are separated by
commas. commas.
<    <array> = <begin-array> [ <value>
<              *( <value-separator> <value> ) ] <end-array>
>    array = begin-array [ value *( value-separator value ) ]
>            end-array
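Combined with the draft-01 rule that a JSON text is an object or an array, the object and array productions translate directly into a recursive-descent recognizer. The sketch below is an illustration, not a reference implementation. The name `isJsonText` is hypothetical. String and number tokens are matched with regular expressions transcribed from the string and number rules. The recognizer works on JavaScript's UTF-16 strings rather than on octets.

```javascript
// Sketch: recursive-descent recognizer for the draft-01 grammar.
// Returns true only if `text` is a complete JSON text (object or array).
function isJsonText(text) {
  // Sticky (y) regexes match exactly at lastIndex.
  const NUMBER = /-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?/y;
  const STRING = /"(?:[^"\\\u0000-\u001F]|\\["\\\/bfnrt]|\\u[0-9a-fA-F]{4})*"/y;
  let i = 0;

  const ws = () => {
    while (i < text.length && " \t\n\r".includes(text[i])) i++;
  };
  const token = (re) => {
    re.lastIndex = i;
    if (!re.test(text)) return false;
    i = re.lastIndex;
    return true;
  };
  const literal = (name) => {
    if (!text.startsWith(name, i)) return false;
    i += name.length;
    return true;
  };

  // open [ item *( value-separator item ) ] close
  function sequence(close, item) {
    i++; // consume begin-object or begin-array
    ws();
    if (text[i] === close) { i++; return true; }
    for (;;) {
      if (!item()) return false;
      ws();
      if (text[i] === ",") { i++; continue; }
      if (text[i] === close) { i++; return true; }
      return false;
    }
  }

  // member = string name-separator value
  function member() {
    ws();
    if (!token(STRING)) return false;
    ws();
    if (text[i] !== ":") return false;
    i++;
    return value();
  }

  function value() {
    ws();
    switch (text[i]) {
      case "{": return sequence("}", member);
      case "[": return sequence("]", value);
      case '"': return token(STRING);
      case "t": return literal("true");
      case "f": return literal("false");
      case "n": return literal("null");
      default:  return token(NUMBER);
    }
  }

  ws();
  if (text[i] !== "{" && text[i] !== "[") return false; // JSON-text = object / array
  if (!value()) return false;
  ws();
  return i === text.length;
}

console.log(isJsonText('{"a": [1, 2.5e3, true, null]}')); // true
console.log(isJsonText("[1, 2,]"));                        // false (trailing comma)
console.log(isJsonText('"just a string"'));                // false (not an object or array)
```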
2.5. Numbers 2.5. Numbers
< The representation of numbers is similar to that used in
< programming languages. A number contains an integer component
< (which may be prefixed with an optional minus sign (U+002D)),
< which may be followed by a fraction part and/or an exponent part.
> The representation of numbers is similar to that used in most
> programming languages. A number contains an integer component
> which may be prefixed with an optional minus sign, which may be
> followed by a fraction part and/or an exponent part.
< Octal and hex forms are not allowed. Leading zeros are not allowed
< as that could lead to confusion.
> Octal and hex forms are not allowed. Leading zeros are not
> allowed.
< A fraction part is a decimal point (U+002E) followed by one or
< more digits.
> A fraction part is a decimal point followed by one or more digits.
< An exponent part begins with the letter E in upper or lower case
< (U+0045 or U+0065), which may be followed by a plus (U+002B) or
< minus (U+002D). The E and optional sign are followed by one or
< more digits.
> An exponent part begins with the letter E in upper or lower case,
> which may be followed by a plus or minus sign. The E and optional
> sign are followed by one or more digits.
Numeric values that cannot be represented as sequences of digits Numeric values that cannot be represented as sequences of digits
(such as Infinity and NaN) are not permitted. (such as Infinity and NaN) are not permitted.
<    <number> = [ "-" ] <int> [ <frac> ] [ <exp> ]
<    <int> = "0" / ( <digit1-9> *<digit> )
<    <frac> = "." 1*<digit>
<    <exp> = ( "e" / "E" ) [ "-" / "+" ] 1*<digit>
<    <digit> = "0" / "1" / "2" / "3" / "4" /
<              "5" / "6" / "7" / "8" / "9"
<    <digit1-9> = "1" / "2" / "3" / "4" /
<                 "5" / "6" / "7" / "8" / "9"
>    number = [ minus ] int [ frac ] [ exp ]
>    decimal-point = %x2E       ; .
>    digit1-9 = %x31-39         ; 1-9
>    e = %x65 / %x45            ; e E
>    exp = e [ minus / plus ] 1*DIGIT
>    frac = decimal-point 1*DIGIT
>    int = zero / ( digit1-9 *DIGIT )
>    minus = %x2D               ; -
>    plus = %x2B                ; +
>    zero = %x30                ; 0
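The draft-01 `number` rule describes a regular language, so it can be transcribed into a single anchored regular expression. The following is a JavaScript sketch. The names `JSON_NUMBER` and `isJsonNumber` are hypothetical.

```javascript
// Sketch: the draft-01 number rule as an anchored regular expression.
//   number = [ minus ] int [ frac ] [ exp ]
//   int    = zero / ( digit1-9 *DIGIT )
//   frac   = decimal-point 1*DIGIT
//   exp    = e [ minus / plus ] 1*DIGIT
const JSON_NUMBER = /^-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?$/;

function isJsonNumber(s) {
  return JSON_NUMBER.test(s);
}

// Accepted: "0", "-0", "12.50", "6.02E+23", "1e-7"
// Rejected: "01" (leading zero), ".5" and "1." (bare decimal point),
//           "+1" (plus is only allowed in the exponent), "0x1F", "NaN", "Infinity"
for (const s of ["0", "-0", "12.50", "6.02E+23", "1e-7", "01", ".5", "1.", "+1", "0x1F", "NaN", "Infinity"]) {
  console.log(s, isJsonNumber(s));
}
```

JavaScript's `Number()` accepts several of the rejected forms (for example, `Number("0x1F")` is 31), so it is not a substitute for this check.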
2.6. Strings 2.6. Strings
The representation of strings is similar to conventions used in
the C family of programming languages. A string begins and ends
< with quotation marks (U+0022). All Unicode characters can be
< placed within the quotation marks except for the characters which
< must be escaped: quotation mark (U+0022), reverse virgule
< (U+005C), and the control characters (U+0000 through U+001F).
> with quotation marks. All Unicode characters may be placed within
> the quotation marks except for the characters which must be
> escaped: quotation mark, reverse solidus, and the control
> characters (U+0000 through U+001F).
Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF) then it may be
< represented as a six-character sequence: a reverse virgule
< followed by the lower case letter u (U+0075) followed by four
< hexadecimal digits which encode the character's code point. The
< hexadecimal letters a though f can be in upper or lower case. So,
< for example, a string containing only a single reverse virgule
> represented as a six-character sequence: a reverse solidus
> followed by the lower case letter u followed by four hexadecimal
> digits which encode the character's code point. The hexadecimal
> letters A though F can be in upper or lower case. So, for
> example, a string containing only a single reverse solidus
character may be represented as "\u005C".
Alternatively, there are two-character sequence escape
representations of some popular characters. So, for example, a
< string containing only a single reverse virgule character may be
> string containing only a single reverse solidus character may be
represented more compactly as "\\".
Short Long
form form
\" \u0022 quotation mark
\\ \u005C reverse virgule or backslash
\/ \u002F virgule or slash
\b \u0008 backspace
\f \u000C form feed
\n \u000A line feed or new line
\r \u000D carriage return
\t \u0009 tab
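The short and long forms in the table above decode to the same characters. The following is a JavaScript sketch of a decoder for the text between a string's quotation marks that handles both forms. The function name `decodeJsonStringBody` is hypothetical, and the sketch assumes the surrounding quotation marks have already been removed. JavaScript strings are sequences of UTF-16 code units, so a twelve-character surrogate-pair escape simply decodes to its two code units.

```javascript
// Sketch: decode the characters between the quotation marks of a JSON string.
// Throws SyntaxError for an unescaped quotation mark or control character,
// or for an escape sequence the grammar does not allow.
const SHORT_ESCAPES = new Map([
  ['"', '"'], ["\\", "\\"], ["/", "/"],
  ["b", "\b"], ["f", "\f"], ["n", "\n"], ["r", "\r"], ["t", "\t"],
]);

function decodeJsonStringBody(body) {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const c = body[i];
    if (c === '"' || body.charCodeAt(i) < 0x20) {
      throw new SyntaxError("character at " + i + " must be escaped");
    }
    if (c !== "\\") {
      out += c;
      continue;
    }
    const e = body[++i]; // character after the reverse solidus
    if (e === "u") {
      const hex = body.slice(i + 1, i + 5);
      if (!/^[0-9A-Fa-f]{4}$/.test(hex)) throw new SyntaxError("bad \\u escape at " + i);
      out += String.fromCharCode(parseInt(hex, 16));
      i += 4; // skip the four hexadecimal digits
    } else if (SHORT_ESCAPES.has(e)) {
      out += SHORT_ESCAPES.get(e);
    } else {
      throw new SyntaxError("bad escape at " + i);
    }
  }
  return out;
}

console.log(decodeJsonStringBody("\\\\") === decodeJsonStringBody("\\u005C")); // true
```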
To escape an extended character that is not in the Basic
Multilingual Plane, then the character is represented as a
twelve-character sequence, encoding the UTF-16 surrogate pair. So,
for example, a string containing only the G clef character
(U+1D11E) may be represented as "\uD834\uDD1E".
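The twelve-character form comes from the standard UTF-16 surrogate-pair computation. First subtract 0x10000 from the code point. Then split the remaining 20 bits into a high surrogate (0xD800 plus the top ten bits) and a low surrogate (0xDC00 plus the bottom ten bits). For U+1D11E the 20-bit value is 0xD11E, which gives 0xD834 and 0xDD1E. The following is a JavaScript sketch. The name `escapeSupplementary` is hypothetical.

```javascript
// Sketch: build the twelve-character escape for a code point outside the BMP.
function escapeSupplementary(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const v = codePoint - 0x10000;       // 20-bit value
  const high = 0xd800 + (v >> 10);     // top ten bits
  const low = 0xdc00 + (v & 0x3ff);    // bottom ten bits
  const hex4 = (n) => n.toString(16).toUpperCase().padStart(4, "0");
  return "\\u" + hex4(high) + "\\u" + hex4(low);
}

console.log(escapeSupplementary(0x1d11e)); // \uD834\uDD1E  (G clef)
```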
< A space in a string is treated as a space character, not as
< insignificant whitespace.
<    <string> = <quotation-mark> *<char> <quotation-mark>
<    <quotation-mark> = %x22 ; "
<    <escape> = %x5C ; \
<    <char> =
<          <unescaped> /
<          <escape> (
<              %x22 /              ; "    quotation mark
<              %x5C /              ; \    reverse virgule
<              %x2F /              ; /    virgule
<              %x62 /              ; b    backspace
<              %x66 /              ; f    form feed
<              %x6E /              ; n    line feed
<              %x72 /              ; r    carriage return
<              %x74 /              ; t    tab
<              %x75 4<hex-digit> ) ; uXXXX
<    <hex-digit> = <digit> / "a" / "b" / "c" / "d" / "e" / "f" /
<                  "A" / "B" / "C" / "D" / "E" / "F"
<    <unescaped> = %x20-21 / %x23-5B / %x5D-10FFFF
>    string = quotation-mark *char quotation-mark
>    char = unescaped /
>           escape (
>               %x22 /          ; "    quotation mark  U+0022
>               %x5C /          ; \    reverse solidus U+005C
>               %x2F /          ; /    solidus         U+002F
>               %x62 /          ; b    backspace       U+0008
>               %x66 /          ; f    form feed       U+000C
>               %x6E /          ; n    line feed       U+000A
>               %x72 /          ; r    carriage return U+000D
>               %x74 /          ; t    tab             U+0009
>               %x75 4HEXDIG )  ; uXXXX                U+XXXX
>    escape = %x5C              ; \
>    quotation-mark = %x22      ; "
>    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
> 3. Encoding
> JSON text SHOULD be encoded in Unicode. The default encoding is
> UTF-8.
< 3. Parsers
< A JSON parser transforms a JSON text into another representation. A
> Since the first two characters of a JSON text will always be ASCII
> characters, it is possible to determine if an octet stream is
> UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the
> pattern of nulls in the first four octets.
>    00 00 00 xx  UTF-32BE
>    00 xx 00 xx  UTF-16BE
>    xx 00 00 00  UTF-32LE
>    xx 00 xx 00  UTF-16LE
>    xx xx xx xx  UTF-8
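The null-pattern table can be applied directly to the first four octets of the input. The following is a JavaScript sketch that assumes the input arrives as a `Uint8Array`. The function name `detectJsonEncoding` is hypothetical, and the returned strings are simply the labels used in the table. The test relies on the draft-01 statement that a JSON text starts with two ASCII characters, and it does not look for a byte order mark.

```javascript
// Sketch: guess the encoding of a JSON text from the pattern of zero
// octets among its first four octets, following the table above.
function detectJsonEncoding(octets) {
  // Octets past the end of a very short input are treated as non-zero.
  const z = (k) => k < octets.length && octets[k] === 0;
  if (z(0) && z(1) && z(2) && !z(3)) return "UTF-32BE"; // 00 00 00 xx
  if (z(0) && !z(1) && z(2) && !z(3)) return "UTF-16BE"; // 00 xx 00 xx
  if (!z(0) && z(1) && z(2) && z(3)) return "UTF-32LE"; // xx 00 00 00
  if (!z(0) && z(1) && !z(2) && z(3)) return "UTF-16LE"; // xx 00 xx 00
  return "UTF-8";                                       // xx xx xx xx
}

const utf8 = new TextEncoder().encode('["a"]');
console.log(detectJsonEncoding(utf8));                            // UTF-8
console.log(detectJsonEncoding(Uint8Array.of(0x5b, 0, 0x5d, 0))); // UTF-16LE ("[]")
```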
> 4. Parsers
> A JSON parser transforms a JSON text into another representation. A
JSON parser MUST accept all texts that conform to the JSON grammar. JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions. A JSON parser MAY accept non-JSON forms or extensions.
An implementation may set limits on the size of texts that it An implementation may set limits on the size of texts that it
accepts. An implementation may set limits on the maximum depth of accepts. An implementation may set limits on the maximum depth of
nesting. An implementation may set limits on the range of numbers. nesting. An implementation may set limits on the range of numbers.
An implementation may set limits on the length and character contents An implementation may set limits on the length and character contents
of strings. of strings.
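The draft leaves the choice of limits to each implementation. The following JavaScript sketch shows one way to enforce a size limit and a nesting-depth limit before handing the text to the built-in `JSON.parse`. The function names and default limits are hypothetical. `JSON.parse` also accepts a bare number, string, or literal at the top level, which the draft-01 grammar does not. The text above allows this, since a parser "MAY accept non-JSON forms or extensions".

```javascript
// Sketch: enforce implementation-chosen limits before parsing.
// Depth is measured by counting brackets outside of strings.
function nestingDepth(text) {
  let depth = 0;
  let max = 0;
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inString) {
      if (c === "\\") i++;                // skip the escaped character
      else if (c === '"') inString = false;
    } else if (c === '"') {
      inString = true;
    } else if (c === "{" || c === "[") {
      max = Math.max(max, ++depth);
    } else if (c === "}" || c === "]") {
      depth--;
    }
  }
  return max;
}

// maxLength is measured in UTF-16 code units, not octets.
function parseWithLimits(text, maxLength = 1 << 20, maxDepth = 64) {
  if (text.length > maxLength) throw new RangeError("text exceeds " + maxLength + " characters");
  if (nestingDepth(text) > maxDepth) throw new RangeError("nesting exceeds depth " + maxDepth);
  return JSON.parse(text);
}

console.log(parseWithLimits('{"a": [[1]]}').a[0][0]); // 1
```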
< 4. Generators
> 5. Generators
A JSON generator produces JSON text. The resulting text MUST strictly
conform to the JSON grammar.
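A generator has to avoid producing anything outside the grammar. JavaScript's `JSON.stringify` is lenient here: it writes `NaN` and `Infinity` as `null` and silently drops members whose value is `undefined`. The JavaScript sketch below refuses such values instead. The function names are hypothetical, and the sketch reuses `JSON.stringify` only to escape strings.

```javascript
// Sketch: a strict generator that throws instead of substituting values.
function toStrictJson(value) {
  if (value === null) return "null";
  switch (typeof value) {
    case "boolean":
      return value ? "true" : "false";
    case "number":
      // Infinity and NaN cannot be written as sequences of digits.
      if (!Number.isFinite(value)) throw new TypeError("non-finite number: " + value);
      return String(value); // e.g. "1e+21", "0.5": both match the number rule
    case "string":
      return JSON.stringify(value); // assumption: reuse the built-in string escaper
    case "object":
      if (Array.isArray(value)) {
        // Array.from visits holes as undefined, so sparse arrays are rejected too.
        return "[" + Array.from(value, (v) => toStrictJson(v)).join(",") + "]";
      }
      return "{" + Object.keys(value)
        .map((k) => JSON.stringify(k) + ":" + toStrictJson(value[k]))
        .join(",") + "}";
    default:
      throw new TypeError("cannot represent a " + typeof value + " in JSON");
  }
}

// draft-01: a JSON text is an object or an array.
function generateJsonText(value) {
  if (value === null || typeof value !== "object") {
    throw new TypeError("a JSON text must be an object or array");
  }
  return toStrictJson(value);
}

console.log(generateJsonText({ a: [1, true, null, "x"] })); // {"a":[1,true,null,"x"]}
```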
< 5. IANA Considerations
> 6. IANA Considerations
The MIME media type for JSON text is text/json.
< 6. Security Considerations
> 7. Security Considerations
< Since JSON is a subset of JavaScript, the eval() function (which
< compiles and execute a text) can be used as a JSON parser. This
< should only done if the text is known to be safe. A regular
< expression can be used to prove that the text contains only JSON
< tokens. A text containing only JSON tokens is safe to eval because
< the JSON subset of JavaScript is safe.
> Generally there are security issues with scripting languages. JSON
> is a subset of JavaScript, but it is a safe subset that excludes
> assignment and invocation.
> A JSON text can be safely passed into JavaScript's eval() function
> (which compiles and executes a string) if all of the characters not
> enclosed in strings are in the set of characters which form JSON
> tokens. This can be quickly determined in JavaScript with two
> regular expressions and calls to the test and replace methods.
>    var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
>           text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
>       eval('(' + text + ')');
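The draft-01 expression can be wrapped in a function and exercised. In the sketch below, the name `jsonEvalOrFalse` is hypothetical, and the two regular expressions are copied from the text above. The check only guarantees that the text is made of JSON characters, not that it is valid JSON. Engines that implement ECMAScript 5 or later also provide `JSON.parse`, which parses JSON without evaluating it as code.

```javascript
// Sketch: the draft-01 check, wrapped in a function.
// 1. Remove every string literal from the text.
// 2. If any remaining character cannot appear in a JSON token, reject.
// 3. Only then hand the text to eval(); the parentheses make "{...}" an
//    expression rather than a block.
function jsonEvalOrFalse(text) {
  const withoutStrings = text.replace(/"(\\.|[^"\\])*"/g, "");
  const foreign = /[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(withoutStrings);
  return !foreign && eval("(" + text + ")");
}

console.log(jsonEvalOrFalse('{"cmd": "alert(1)"}').cmd); // alert(1)
console.log(jsonEvalOrFalse("alert(1)"));                 // false
```

For example, `[1,,2]` passes the character filter, and eval() turns it into a sparse array, whereas `JSON.parse("[1,,2]")` throws a `SyntaxError`.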
8. References
8.1 Normative References
[ECMA] European Computer Manufacturers Association, "ECMAScript
Language Specification 3rd Edition", December 1999,
<http://www.ecma-international.org/publications/files/
ecma-st/ECMA-262.pdf>.
[UNICODE] The Unicode Consortium, "The Unicode Standard
Version 4.0", 2003,
<http://www.unicode.org/versions/Unicode4.1.0/>.
Author's Address Author's Address
Douglas Crockford Douglas Crockford
JSON.org JSON.org
Contact Email: douglas@crockford.com Contact Email: douglas@crockford.com
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
End of changes. 58 change blocks.
125 lines changed or deleted 157 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/