idnits 2.17.1

draft-ietf-jsonpath-base-01.txt:
-(2): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(7): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1184): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding

Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

== There are 9 instances of lines with non-ascii characters in the document.

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

** There are 33 instances of too long lines in the document, the longest one being 6 characters in excess of 72.

== There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document.

Miscellaneous warnings:
----------------------------------------------------------------------------

== The copyright year in the IETF Trust and authors Copyright Line does not match the current year

-- The document date (8 July 2021) is 1016 days in the past. Is this intentional?

-- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.

Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

(See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs)

-- Looks like a reference, but probably isn't: '0' on line 238

-- Looks like a reference, but probably isn't: '1' on line 240

-- Looks like a reference, but probably isn't: '2' on line 391

-- Looks like a reference, but probably isn't: '3' on line 391

== Unused Reference: 'E4X-overview' is defined on line 1182, but no explicit reference was found in the text

Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 6 comments (--).

Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

2 JSONPath WG                                              S. Gössner, Ed.
3 Internet-Draft                                   Fachhochschule Dortmund
4 Intended status: Standards Track                      G. Normington, Ed.
5 Expires: 9 January 2022
6                                                          C. Bormann, Ed.
7                                                   Universität Bremen TZI
8                                                              8 July 2021

10                   JSONPath: Query expressions for JSON
11                       draft-ietf-jsonpath-base-01

13 Abstract

15 JSONPath defines a string syntax for identifying values within a
16 JavaScript Object Notation (JSON) document.

18 Contributing

20 This document picks up the popular JSONPath specification dated
21 2007-02-21 and provides a normative definition for it. In its
22 current state, it is a strawman document showing what needs to be
23 covered.

25 Comments and issues may be directed to this document's github
26 repository (https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-
27 jsonpath).

29 Status of This Memo

31 This Internet-Draft is submitted in full conformance with the
32 provisions of BCP 78 and BCP 79.

34 Internet-Drafts are working documents of the Internet Engineering
35 Task Force (IETF). Note that other groups may also distribute
36 working documents as Internet-Drafts. The list of current Internet-
37 Drafts is at https://datatracker.ietf.org/drafts/current/.

39 Internet-Drafts are draft documents valid for a maximum of six months
40 and may be updated, replaced, or obsoleted by other documents at any
41 time. It is inappropriate to use Internet-Drafts as reference
42 material or to cite them other than as "work in progress."

44 This Internet-Draft will expire on 9 January 2022.

46 Copyright Notice

48 Copyright (c) 2021 IETF Trust and the persons identified as the
49 document authors. All rights reserved.

51 This document is subject to BCP 78 and the IETF Trust's Legal
52 Provisions Relating to IETF Documents (https://trustee.ietf.org/
53 license-info) in effect on the date of publication of this document.
54 Please review these documents carefully, as they describe your rights
55 and restrictions with respect to this document. Code Components
56 extracted from this document must include Simplified BSD License text
57 as described in Section 4.e of the Trust Legal Provisions and are
58 provided without warranty as described in the Simplified BSD License.

60 Table of Contents

62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
63   1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3
64   1.2. Inspired by XPath . . . . . . . . . . . . . . . . . . . . 4
65   1.3. Overview of JSONPath Expressions . . . . . . . . . . . . 5
66 2. JSONPath Examples . . . . . . . . . . . . . . . . . . . . . . 8
67 3. JSONPath Syntax and Semantics . . . . . . . . . . . . . . . . 11
68   3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 11
69   3.2. Processing Model . . . . . . . . . . . . . . . . . . . . 11
70   3.3. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . 12
71   3.4. Semantics . . . . . . . . . . . . . . . . . . . . . . . . 12
72   3.5. Selectors . . . . . . . . . . . . . . . . . . . . . . . . 13
73     3.5.1. Root Selector . . . . . . . . . . . . . . . . . . . . 14
74     3.5.2. Dot Selector . . . . . . . . . . . . . . . . . . . . 14
75     3.5.3. Dot Wild Card Selector . . . . . . . . . . . . . . . 15
76     3.5.4. Index Selector . . . . . . . . . . . . . . . . . . . 15
77     3.5.5. Index Wild Card Selector . . . . . . . . . . . . . . 18
78     3.5.6. Array Slice Selector . . . . . . . . . . . . . . . . 18
79     3.5.7. Descendant Selector . . . . . . . . . . . . . . . . . 22
80     3.5.8. Union Selector . . . . . . . . . . . . . . . . . . . 22
81       3.5.8.1. Syntax . . . . . . . . . . . . . . . . . . . . . 22
82       3.5.8.2. Semantics . . . . . . . . . . . . . . . . . . . . 23
83     3.5.9. Filter Selector . . . . . . . . . . . . . . . . . . . 23
84       3.5.9.1. Syntax . . . . . . . . . . . . . . . . . . . . . 23
85       3.5.9.2. Semantics . . . . . . . . . . . . . . . . . . . . 25
86 4. Expression Language . . . . . . . . . . . . . . . . . . . . . 26
87 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27
88 6. Security Considerations . . . . . . . . . . . . . . . . . . . 27
89 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 27
90   7.1. Normative References . . . . . . . . . . . . . . . . . . 27
91   7.2. Informative References . . . . . . . . . . . . . . . . . 28
92 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 28
93 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 29
94 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 29

96 1. Introduction

98 This document picks up the popular JSONPath specification dated
99 2007-02-21 [JSONPath-orig] and provides a normative definition for
100 it. In its current state, it is a strawman document showing what
101 needs to be covered.

103 JSON is defined by [RFC8259].

105 JSONPath is not intended as a replacement, but as a more powerful
106 companion, to JSON Pointer [RFC6901]. [insert reference to section
107 where the relationship is detailed. The purposes of the two syntaxes
108 are different. Pointer is for isolating a single location within a
109 document. Path is a query syntax that can also be used to pull
110 multiple locations.]

112 1.1. Terminology

114 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
115 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
116 "OPTIONAL" in this document are to be interpreted as described in
117 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
118 capitals, as shown here.

120 The grammatical rules in this document are to be interpreted as ABNF,
121 as described in [RFC5234]. ABNF terminal values in this document
122 define Unicode code points rather than their UTF-8 encoding. For
123 example, the Unicode PLACE OF INTEREST SIGN (U+2318) would be defined
124 in ABNF as "%x2318".

126 The terminology of [RFC8259] applies except where clarified below.
127 The terms "Primitive" and "Structured" are used to group the types as
128 in Section 1 of [RFC8259]. Definitions for "Object", "Array",
129 "Number", and "String" remain unchanged. Importantly "object" and
130 "array" in particular do not take on a generic meaning, such as they
131 would in a general programming context.

133 Additional terms used in this specification are defined below.

135 Value: As per [RFC8259], a structure complying to the generic data
136    model of JSON, i.e., composed of components such as containers,
137    namely JSON objects and arrays, and atomic data, namely null,
138    true, false, numbers, and text strings.

140 Member: A name/value pair in an object. (Not itself a value.)
141 Name: The name in a name/value pair constituting a member. (Also
142    known as "key", "tag", or "label".) This is also used in
143    [RFC8259], but that specification does not formally define it. It
144    is included here for completeness.

146 Element: A value in an array. (Also used with a distinct meaning in
147    XML context for XML elements.)

149 Index: A non-negative integer that identifies a specific element in
150    an array.

152 Query: Short name for JSONPath expression.

154 Argument: Short name for the value a JSONPath expression is applied
155    to.

157 Node: The pair of a value along with its location within the
158    argument.

160 Root Node: The unique node whose value is the entire argument.

162 Nodelist: A list of nodes. The output of applying a query to an
163    argument is manifested as a list of nodes. While this list can be
164    represented in JSON, e.g. as an array, the nodelist is an abstract
165    concept unrelated to JSON values.

167 Normalized Path: A simple form of JSONPath expression that
168    identifies a node by providing a query that results in exactly
169    that node. Similar to, but syntactically different from, a JSON
170    Pointer [RFC6901].

172 For the purposes of this specification, a value as defined by
173 [RFC8259] is also viewed as a tree of nodes. Each node, in turn,
174 holds a value. Further nodes within each value are the elements of
175 arrays and the member values of objects and are themselves values.
176 (The type of the value held by a node may also be referred to as the
177 type of the node.)

179 A query is applied to an argument, and the output is a nodelist.

181 1.2. Inspired by XPath

183 A frequently emphasized advantage of XML is the availability of
184 powerful tools to analyse, transform and selectively extract data
185 from XML documents. [XPath] is one of these tools.

187 In 2007, the need for something solving the same class of problems
188 for the emerging JSON community became apparent, specifically for:

190 * Finding data interactively and extracting them out of [RFC8259]
191   JSON values without special scripting.

193 * Specifying the relevant parts of the JSON data in a request by a
194   client, so the server can reduce the amount of data in its
195   response, minimizing bandwidth usage.

197 So what does such a tool look like for JSON? When defining a
198 JSONPath, how should expressions look?

200 The XPath expression

202    /store/book[1]/title

204 looks like

206    x.store.book[0].title

208 or

210    x['store']['book'][0]['title']

212 in popular programming languages such as JavaScript, Python and PHP,
213 with a variable x holding the argument. Here we observe that such
214 languages already have a fundamentally XPath-like feature built in.

216 The JSONPath tool in question should:

218 * be naturally based on those language characteristics.

220 * cover only essential parts of XPath 1.0.

222 * be lightweight in code size and memory consumption.

224 * be runtime efficient.

226 1.3. Overview of JSONPath Expressions

228 JSONPath expressions always apply to a value in the same way as XPath
229 expressions are used in combination with an XML document. Since a
230 value is anonymous, JSONPath uses the abstract name "$" to refer to
231 the root node of the argument.

233 JSONPath expressions can use the _dot notation_

235    $.store.book[0].title

237 or the _bracket notation_
238    $['store']['book'][0]['title']

240 for paths input to a JSONPath processor. [1] Where a JSONPath
241 processor uses JSONPath expressions as output paths, these will
242 always be converted to Output Paths which employ the more general
243 _bracket notation_. [2] Bracket notation is more general than dot
244 notation and can serve as a canonical form when a JSONPath processor
245 uses JSONPath expressions as output paths.

247 JSONPath allows the wildcard symbol "*" for member names and array
248 indices. It borrows the descendant operator ".." from [E4X] and the
249 array slice syntax proposal "[start:end:step]" [SLICE] from
250 ECMASCRIPT 4.

252 JSONPath was originally designed to employ an _underlying scripting
253 language_ for computing expressions. The present specification
254 defines a simple expression language that is independent from any
255 scripting language in use on the platform.

257 JSONPath can use expressions, written in parentheses: "(<expr>)", as
258 an alternative to explicit names or indices as in:

260    $.store.book[(@.length-1)].title

262 The symbol "@" is used for the current node. Filter expressions are
263 supported via the syntax "?(<boolean expr>)" as in

265    $.store.book[?(@.price < 10)].title

267 Here is a complete overview and a side by side comparison of the
268 JSONPath syntax elements with their XPath counterparts.

270 +=======+==================+=====================================+
271 | XPath | JSONPath         | Description                         |
272 +=======+==================+=====================================+
273 | /     | $                | the root element/node               |
274 +-------+------------------+-------------------------------------+
275 | .     | @                | the current element/node            |
276 +-------+------------------+-------------------------------------+
277 | /     | "." or "[]"      | child operator                      |
278 +-------+------------------+-------------------------------------+
279 | ..    | n/a              | parent operator                     |
280 +-------+------------------+-------------------------------------+
281 | //    | ..               | nested descendants (JSONPath        |
282 |       |                  | borrows this syntax from E4X)       |
283 +-------+------------------+-------------------------------------+
284 | *     | *                | wildcard: All elements/nodes        |
285 |       |                  | regardless of their names           |
286 +-------+------------------+-------------------------------------+
287 | @     | n/a              | attribute access: JSON values do    |
288 |       |                  | not have attributes                 |
289 +-------+------------------+-------------------------------------+
290 | []    | []               | subscript operator: XPath uses it   |
291 |       |                  | to iterate over element collections |
292 |       |                  | and for predicates; native array    |
293 |       |                  | indexing as in JavaScript here      |
294 +-------+------------------+-------------------------------------+
295 | |     | [,]              | Union operator in XPath (results in |
296 |       |                  | a combination of node sets);        |
297 |       |                  | JSONPath allows alternate names or  |
298 |       |                  | array indices as a set              |
299 +-------+------------------+-------------------------------------+
300 | n/a   | [start:end:step] | array slice operator borrowed from  |
301 |       |                  | ES4                                 |
302 +-------+------------------+-------------------------------------+
303 | []    | ?()              | applies a filter (script)           |
304 |       |                  | expression                          |
305 +-------+------------------+-------------------------------------+
306 | n/a   | ()               | expression engine                   |
307 +-------+------------------+-------------------------------------+
308 | ()    | n/a              | grouping in XPath                   |
309 +-------+------------------+-------------------------------------+

311 Table 1: Overview over JSONPath, comparing to XPath

313 XPath has a lot more to offer (location paths in unabbreviated
314 syntax, operators and functions) than listed here. Moreover there is
315 a significant difference in how the subscript operator works in XPath
316 and JSONPath:

318 * Square brackets in XPath expressions always operate on the _node
319   set_ resulting from the previous path fragment. Indices always
320   start at 1.

322 * With JSONPath, square brackets operate on the _object_ or _array_
323   addressed by the previous path fragment. Array indices always
324   start at 0.

326 2. JSONPath Examples

328 This section provides some more examples for JSONPath expressions.
329 The examples are based on the simple JSON value shown in Figure 1,
330 which was patterned after a typical XML example representing a
331 bookstore (that also has bicycles).

333 { "store": {
334     "book": [
335       { "category": "reference",
336         "author": "Nigel Rees",
337         "title": "Sayings of the Century",
338         "price": 8.95
339       },
340       { "category": "fiction",
341         "author": "Evelyn Waugh",
342         "title": "Sword of Honour",
343         "price": 12.99
344       },
345       { "category": "fiction",
346         "author": "Herman Melville",
347         "title": "Moby Dick",
348         "isbn": "0-553-21311-3",
349         "price": 8.99
350       },
351       { "category": "fiction",
352         "author": "J. R. R. Tolkien",
353         "title": "The Lord of the Rings",
354         "isbn": "0-395-19395-8",
355         "price": 22.99
356       }
357     ],
358     "bicycle": {
359       "color": "red",
360       "price": 19.95
361     }
362   }
363 }

365 Figure 1: Example JSON value

367 The examples in Table 2 use the expression mechanism to obtain the
368 number of elements in an array, to test for the presence of a member
369 in an object, and to perform numeric comparisons of member values with
370 a constant.

372 +======================+=========================+=================+
373 | XPath                | JSONPath                | Result          |
374 +======================+=========================+=================+
375 | /store/book/author   | $.store.book[*].author  | the authors of  |
376 |                      |                         | all books in    |
377 |                      |                         | the store       |
378 +----------------------+-------------------------+-----------------+
379 | //author             | $..author               | all authors     |
380 +----------------------+-------------------------+-----------------+
381 | /store/*             | $.store.*               | all things in   |
382 |                      |                         | store, which    |
383 |                      |                         | are some books  |
384 |                      |                         | and a red       |
385 |                      |                         | bicycle         |
386 +----------------------+-------------------------+-----------------+
387 | /store//price        | $.store..price          | the prices of   |
388 |                      |                         | everything in   |
389 |                      |                         | the store       |
390 +----------------------+-------------------------+-----------------+
391 | //book[3]            | $..book[2]              | the third book  |
392 +----------------------+-------------------------+-----------------+
393 | //book[last()]       | "$..book[(@.length-1)]" | the last book   |
394 |                      | "$..book[-1]"           | in order        |
395 +----------------------+-------------------------+-----------------+
396 | //book[position()<3] | "$..book[0,1]"          | the first two   |
397 |                      | "$..book[:2]"           | books           |
398 +----------------------+-------------------------+-----------------+
399 | //book[isbn]         | $..book[?(@.isbn)]      | filter all      |
400 |                      |                         | books with isbn |
401 |                      |                         | number          |
402 +----------------------+-------------------------+-----------------+
403 | //book[price<10]     | $..book[?(@.price<10)]  | filter all      |
404 |                      |                         | books cheaper   |
405 |                      |                         | than 10         |
406 +----------------------+-------------------------+-----------------+
407 | //*                  | $..*                    | all elements in |
408 |                      |                         | XML document;   |
409 |                      |                         | all member      |
410 |                      |                         | values and      |
411 |                      |                         | array elements  |
412 |                      |                         | contained in    |
413 |                      |                         | input value     |
414 +----------------------+-------------------------+-----------------+

416 Table 2: Example JSONPath expressions applied to the example
417 JSON value

419 3. JSONPath Syntax and Semantics

421 3.1. Overview

423 A JSONPath query is a string which selects zero or more nodes of a
424 piece of JSON. A valid query conforms to the ABNF syntax defined by
425 this document.

427 A query MUST be encoded using UTF-8. To parse a query according to
428 the grammar in this document, its UTF-8 form SHOULD first be decoded
429 into Unicode code points as described in [RFC3629].

431 A string to be used as a JSONPath query needs to be _well-formed_ and
432 _valid_. A string is a well-formed JSONPath query if it conforms to
433 the syntax of JSONPath. A well-formed JSONPath query is valid if it
434 also fulfills all semantic requirements posed by this document.

436 The well-formedness and the validity of JSONPath queries are
437 independent of the value the query is applied to; no further errors
438 can be raised during application of the query to a value.

440 (Obviously, an implementation can still fail when executing a
441 JSONPath query, e.g., because of resource depletion, but this is not
442 modeled in the present specification.)

444 3.2. Processing Model

446 In this specification, the semantics of a JSONPath query are defined
447 in terms of a _processing model_. That model is not prescriptive of
448 the internal workings of an implementation: Implementations may wish
449 (or need) to design a different process that yields results that
450 conform to the model.

452 In the processing model, a valid query is executed against a value,
453 the _argument_, and produces a list of zero or more nodes of the
454 value.

456 The query is a sequence of zero or more _selectors_, each of which is
457 applied to the result of the previous selector and provides input to
458 the next selector. These results and inputs take the form of a
459 _nodelist_, i.e., a sequence of zero or more nodes.

461 The nodelist going into the first selector contains a single node,
462 the argument. The nodelist resulting from the last selector is
463 presented as the result of the query; depending on the specific API,
464 it might be presented as an array of the JSON values at the nodes, an
465 array of Output Paths referencing the nodes, or both -- or some other
466 representation as desired by the implementation. Note that the API
467 must be capable of presenting an empty nodelist as the result of the
468 query.

470 A selector performs its function on each of the nodes in its input
471 nodelist; during such a function execution, such a node is referred
472 to as the "current node". Each of these function executions produces
473 a nodelist, which are then concatenated into the result of the
474 selector.

476 The processing within a selector may execute nested queries, which
477 are in turn handled with the processing model defined here.
478 Typically, the argument to that query will be the current node of the
479 selector or a set of nodes subordinate to that current node.

481 3.3. Syntax

483 Syntactically, a JSONPath query consists of a root selector ("$"),
484 which stands for a nodelist that contains the root node of the
485 argument, followed by a possibly empty sequence of _selectors_.

487 json-path = root-selector *(dot-selector /
488                             dot-wild-selector /
489                             index-selector /
490                             index-wild-selector /
491                             union-selector /
492                             slice-selector /
493                             descendant-selector /
494                             filter-selector)

496 The syntax and semantics of each selector is defined below.

498 3.4. Semantics

500 The root selector "$" not only selects the root node of the argument,
501 but it also produces as output a list consisting of one node: the
502 argument itself.

504 A selector may select zero or more nodes for further processing. A
505 syntactically valid selector MUST NOT produce errors. This means
506 that some operations which might be considered erroneous, such as
507 indexing beyond the end of an array, simply result in fewer nodes
508 being selected.
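
As an illustration only, the processing model can be sketched in a few lines of Python. Everything named here (`child`, `index`, `wildcard`, `run_query`, and the representation of a node as a `(location, value)` tuple) is a hypothetical choice made for this sketch and is not defined by the specification. The point it tries to show is that each selector maps one node to a nodelist, that the per-node results are concatenated, and that a selector which does not apply returns an empty list instead of raising an error.

```python
# Hypothetical sketch of the processing model; names are not from the spec.
# A node is modelled as a (location, value) pair, a nodelist as a Python list.

def child(name):
    """Selector for one object member; selects nothing from non-objects."""
    def select(node):
        path, value = node
        if isinstance(value, dict) and name in value:
            return [(path + (name,), value[name])]
        return []  # not an object, or no such member: no error, no nodes
    return select


def index(i):
    """Selector for one array element; out-of-range indices select nothing."""
    def select(node):
        path, value = node
        if isinstance(value, list) and -len(value) <= i < len(value):
            k = i % len(value)  # map a negative index to its position
            return [(path + (k,), value[k])]
        return []
    return select


def wildcard(node):
    """Selector for all member values of an object or elements of an array."""
    path, value = node
    if isinstance(value, dict):
        return [(path + (k,), v) for k, v in value.items()]
    if isinstance(value, list):
        return [(path + (k,), v) for k, v in enumerate(value)]
    return []  # primitive value: nothing to select


def run_query(selectors, argument):
    """Apply each selector to every node and concatenate the results."""
    nodelist = [((), argument)]  # nodelist holding only the root node
    for select in selectors:
        nodelist = [out for node in nodelist for out in select(node)]
    return nodelist


argument = {"a": [{"b": 0}, {"b": 1}, {"c": 2}]}
values = [v for _, v in run_query([child("a"), wildcard, child("b")], argument)]
beyond = run_query([child("a"), index(7)], argument)
```

Here `values` holds the two member values found under `b`, and `beyond` is an empty nodelist because index 7 lies past the end of the three-element array.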

510 But a selector doesn't just act on a single node: a selector acts on
511 each of the nodes in its input nodelist and concatenates the
512 resultant nodelists to form the result nodelist of the selector.

514 For each node in the list, the selector selects zero or more nodes,
515 each of which is a descendant of the node or the node itself.

517 For instance, with the argument "{"a":[{"b":0},{"b":1},{"c":2}]}",
518 the query "$.a[*].b" selects the following list of nodes: "0", "1"
519 (denoted here by their value). Let's walk through this in detail.

521 The query consists of "$" followed by three selectors: ".a", "[*]",
522 and ".b".

524 Firstly, "$" selects the root node which is the argument. So the
525 result is a list consisting of just the root node.

527 Next, ".a" selects from any input node of type object and selects the
528 node of any member value of the input node corresponding to the
529 member name ""a"". The result is again a list of one node:
530 "[{"b":0},{"b":1},{"c":2}]".

532 Next, "[*]" selects from any input node which is an array and selects
533 all the elements of the input node. The result is a list of three
534 nodes: "{"b":0}", "{"b":1}", and "{"c":2}".

536 Finally, ".b" selects from any input node of type object with a
537 member name "b" and selects the node of the member value of the input
538 node corresponding to that name. The result is a list containing
539 "0", "1". This is the concatenation of three lists, two of length
540 one containing "0", "1", respectively, and one of length zero.

542 As a consequence of this approach, if any of the selectors selects no
543 nodes, then the whole query selects no nodes.

545 In what follows, the semantics of each selector are defined for each
546 type of node.

548 3.5. Selectors

550 A JSONPath query consists of a sequence of selectors. Valid
551 selectors are

553 * Root selector "$"

555 * Dot selector ".", used with object member names exclusively.

557 * Dot wild card selector ".*".

559 * Index selector "[<index>]", where "<index>" is either a (possibly
560   negative) array index or an object member name.

562 * Index wild card selector "[*]".

564 * Array slice selector "[<start>:<end>:<step>]", where "<start>",
565   "<end>", "<step>" are integer literals.

567 * Nested descendants selector "..".

569 * Union selector "[<sel1>,<sel2>,...,<selN>]", holding a comma
570   delimited list of index, index wild card, array slice, and filter
571   selectors.

573 * Filter selector "[?(<expr>)]"

575 * Current item selector "@"

577 3.5.1. Root Selector

579 Syntax

581 Every valid JSONPath query MUST begin with the root selector "$".

583 root-selector = "$"

585 Semantics

587 The Argument -- the root JSON value -- becomes the root node, which
588 is addressed by the root selector "$".

590 3.5.2. Dot Selector

592 Syntax

594 A dot selector starts with a dot "." followed by an object's member
595 name.

597 dot-selector    = "." dot-member-name
598 dot-member-name = name-first *name-char
599 name-first =
600       ALPHA /
601       "_" /           ; _
602       %x80-10FFFF     ; any non-ASCII Unicode character
603 name-char = DIGIT / name-first

605 DIGIT = %x30-39              ; 0-9
606 ALPHA = %x41-5A / %x61-7A    ; A-Z / a-z
607 Member names containing other characters than allowed by "dot-
608 selector" -- such as space ` ` and minus "-" characters -- MUST NOT
609 be used with the "dot-selector". (Such member names can be addressed
610 by the "index-selector" instead.)

612 Semantics

614 The "dot-selector" selects the node of the member value corresponding
615 to the member name from any JSON object. It selects no nodes from
616 any other JSON value.

618 Note that the "dot-selector" follows the philosophy of JSON strings
619 and is allowed to contain bit sequences that cannot encode Unicode
620 characters (a single unpaired UTF-16 surrogate, for example). The
621 behaviour of an implementation is undefined for member names which do
622 not encode Unicode characters.

624 3.5.3. Dot Wild Card Selector

626 Syntax

628 The dot wild card selector has the form ".*".

630 dot-wild-selector = "." "*"    ; dot followed by asterisk

632 Semantics

634 A "dot-wild-selector" acts as a wild card by selecting the nodes of
635 all member values of an object as well as all element nodes of an
636 array. Applying the "dot-wild-selector" to a primitive JSON value
637 (number, string, or true/false/null) selects no node.

639 3.5.4. Index Selector

641 Syntax

643 An index selector "[<index>]" addresses at most one object member
644 value or at most one array element value.

646 index-selector = "[" (quoted-member-name / element-index) "]"

648 Applying the "index-selector" to an object value, a "quoted-member-
649 name" string is required. JSONPath allows it to be enclosed in
650 _single_ or _double_ quotes.

652 quoted-member-name = string-literal

654 string-literal = %x22 *double-quoted %x22 /    ; "string"
655                  %x27 *single-quoted %x27      ; 'string'

657 double-quoted = unescaped /
658                 %x27 /        ; '
659                 ESC %x22 /    ; \"
660                 ESC escapable

662 single-quoted = unescaped /
663                 %x22 /        ; "
664                 ESC %x27 /    ; \'
665                 ESC escapable

667 ESC = %x5C    ; \ backslash

669 unescaped = %x20-21 /      ; s. RFC 8259
670             %x23-26 /      ; omit "
671             %x28-5B /      ; omit '
672             %x5D-10FFFF    ; omit \

674 escapable = ( %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
675               ; b /      ; BS backspace U+0008
676               ; t /      ; HT horizontal tab U+0009
677               ; n /      ; LF line feed U+000A
678               ; f /      ; FF form feed U+000C
679               ; r /      ; CR carriage return U+000D
680               "/" /      ; / slash (solidus)
681               "\" /      ; \ backslash (reverse solidus)
682               (%x75 hexchar)    ; uXXXX U+XXXX
683             )

685 hexchar = non-surrogate / (high-surrogate "\" %x75 low-surrogate)
686 non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
687                 ("D" %x30-37 2HEXDIG )
688 high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
689 low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG

691 HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

693 ; Task from 2021-06-15 interim: update ABNF later

695 Applying the "index-selector" to an array, a numerical "element-
696 index" is required. JSONPath allows it to be negative.

698 element-index = int    ; decimal integer

700 int = ["-"] ( "0" / (DIGIT1 *DIGIT) )    ; - optional
701 DIGIT1 = %x31-39                         ; 1-9 non-zero digit

703 Notes: 1. "double-quoted" strings follow JSON in [RFC8259]; "single-
704 quoted" strings follow an analogous pattern. 2. An "element-index"
705 is an integer (in base 10, as in JSON numbers). 3. As in JSON
706 numbers, the syntax does not allow octal-like integers with leading
707 zeros such as "01" or "-01".

709 Semantics

711 A "quoted-member-name" string MUST be converted to a member name by
712 removing the surrounding quotes and replacing each escape sequence
713 with its equivalent Unicode character, as in the table below:

715 +=================+===================+=============================+
716 | Escape Sequence | Unicode Character | Description                 |
717 +=================+===================+=============================+
718 | \b              | U+0008            | BS backspace                |
719 +-----------------+-------------------+-----------------------------+
720 | \t              | U+0009            | HT horizontal tab           |
721 +-----------------+-------------------+-----------------------------+
722 | \n              | U+000A            | LF line feed                |
723 +-----------------+-------------------+-----------------------------+
724 | \f              | U+000C            | FF form feed                |
725 +-----------------+-------------------+-----------------------------+
726 | \r              | U+000D            | CR carriage return          |
727 +-----------------+-------------------+-----------------------------+
728 | \"              | U+0022            | quotation mark              |
729 +-----------------+-------------------+-----------------------------+
730 | \'              | U+0027            | apostrophe                  |
731 +-----------------+-------------------+-----------------------------+
732 | \/              | U+002F            | slash (solidus)             |
733 +-----------------+-------------------+-----------------------------+
734 | \\              | U+005C            | backslash (reverse          |
735 |                 |                   | solidus)                    |
736 +-----------------+-------------------+-----------------------------+
737 | \uXXXX          | U+XXXX            | Unicode character           |
738 +-----------------+-------------------+-----------------------------+

740 Table 3: Escape Sequence Replacements

742 The "index-selector" applied with a "quoted-member-name" to an object
743 selects the node of the corresponding member value from it, if and
744 only if that object has a member with that name. Nothing is selected
745 from a value which is not an object.

747 Array indexing via "element-index" is a way of selecting a particular
748 array element using a zero-based index. For example, selector "[0]"
749 selects the first and selector "[4]" the fifth element of a
750 sufficiently long array.

752 A negative "element-index" counts from the array end. For example,
753 selector "[-1]" selects the last and selector "[-2]" selects the last
754 but one element of an array with at least two elements.

756 3.5.5. Index Wild Card Selector

758 Syntax

760 The index wild card selector has the form "[*]".

762 index-wild-selector = "[" "*" "]"    ; asterisk enclosed by brackets

764 Semantics

766 An "index-wild-selector" selects the nodes of all member values of an
767 object as well as of all elements of an array. Applying the "index-
768 wild-selector" to a primitive JSON value (such as a number, string,
769 or true/false/null) selects no node.

771 The "index-wild-selector" behaves identically to the "dot-wild-
772 selector".

774 3.5.6. Array Slice Selector

776 Syntax

778 The array slice selector has the form "[<start>:<end>:<step>]". It
779 selects elements starting at index "<start>", ending at -- but not
780 including -- "<end>", while incrementing by "step".

782 slice-selector = "[" slice-index "]"

784 slice-index = ws [start] ws ":" ws [end] [ws ":" ws [step] ws]

786 start = int    ; included in selection
787 end   = int    ; not included in selection
788 step  = int    ; default: 1

790 ws = *( %x20 /    ; Space
791         %x09 /    ; Horizontal tab
792         %x0A /    ; Line feed or New line
793         %x0D )    ; Carriage return

795 The "slice-selector" consists of three optional decimal integers
796 separated by colons.
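
The slice syntax above can be tried out with a small Python sketch. This is only an illustration under stated assumptions: the regular expression is a hand transcription of the `slice-selector`, `slice-index`, `int`, and `ws` rules, and the names `parse_slice`, `_INT`, `_WS`, and `_SLICE` are hypothetical. A missing parameter is reported as `None`; choosing defaults is left to the semantics.

```python
import re

# Hand transcription of the ABNF rules (an assumption of this sketch):
#   int = ["-"] ( "0" / (DIGIT1 *DIGIT) )
#   ws  = *( %x20 / %x09 / %x0A / %x0D )
#   slice-index = ws [start] ws ":" ws [end] [ws ":" ws [step] ws]
_INT = r"-?(?:0|[1-9][0-9]*)"
_WS = r"[ \t\n\r]*"
_SLICE = re.compile(
    rf"\[{_WS}({_INT})?{_WS}:{_WS}({_INT})?(?:{_WS}:{_WS}({_INT})?{_WS})?\]"
)


def parse_slice(text):
    """Return (start, end, step) with None for omitted parts, or None if
    the text does not match the slice-selector syntax."""
    match = _SLICE.fullmatch(text)
    if match is None:
        return None
    return tuple(int(g) if g is not None else None for g in match.groups())


simple = parse_slice("[1:3]")
reverse = parse_slice("[::-1]")
leading_zero = parse_slice("[01:2]")
```

Because `int` does not allow leading zeros, `"[01:2]"` is rejected, while `"[1:3]"` and `"[::-1]"` parse with the omitted parameters left as `None`.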

   Semantics

   The "slice-selector" was inspired by the slice operator of ECMAScript
   4 (ES4), which was deprecated in 2014, and that of Python.

   Informal Introduction

   This section is non-normative.

   Array indexing is a way of selecting a particular element of an array
   using a 0-based index. For example, the expression "[0]" selects the
   first element of a non-empty array.

   Negative indices count from the end of an array. For example, the
   expression "[-2]" selects the last but one element of an array with
   at least two elements.

   Array slicing is inspired by the behaviour of the
   "Array.prototype.slice" method of the JavaScript language as defined
   by the ECMA-262 standard [ECMA-262], with the addition of the "step"
   parameter, which is inspired by the Python slice expression.

   The array slice expression "[start:end:step]" selects elements at
   indices starting at "start", incrementing by "step", and ending with
   "end" (which is itself excluded). So, for example, the expression
   "[1:3]" (where "step" defaults to "1") selects elements with indices
   "1" and "2" (in that order), whereas "[1:5:2]" selects elements with
   indices "1" and "3".

   When "step" is negative, elements are selected in reverse order.
   Thus, for example, "[5:1:-2]" selects elements with indices "5" and
   "3", in that order, and "[::-1]" selects all the elements of an array
   in reverse order.

   When "step" is "0", no elements are selected. This is the one case
   that differs from the behaviour of Python, which raises an error when
   "step" is "0".

   The following section specifies the behaviour fully, without
   depending on JavaScript or Python behaviour.

   Detailed Semantics

   An array selector is either an array slice or an array index, which
   is defined in terms of an array slice.
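
   As a non-normative illustration, the slice semantics specified in
   this section (default values, Normalize, Bounds, and the selection
   loop) might be sketched in Python as follows. The function names
   "normalize", "bounds", "select_slice" and "select_index" are
   hypothetical, as is the choice to represent omitted parameters as
   None; the pseudocode in this section remains the definition.

```python
def normalize(i, length):
    """Map a possibly negative index onto a position counted from 0."""
    return i if i >= 0 else length + i


def bounds(start, end, step, length):
    """Derive the (lower, upper) slice bounds for the given direction."""
    n_start = normalize(start, length)
    n_end = normalize(end, length)
    if step >= 0:
        lower = min(max(n_start, 0), length)
        upper = min(max(n_end, 0), length)
    else:
        upper = min(max(n_start, -1), length - 1)
        lower = min(max(n_end, -1), length - 1)
    return lower, upper


def select_slice(array, start=None, end=None, step=None):
    """Return the elements selected by [start:end:step] as a list."""
    if not isinstance(array, list):
        return []  # nothing is selected from a non-array value
    length = len(array)
    if step is None:
        step = 1  # default step
    if step == 0:
        return []  # unlike Python, a zero step selects nothing
    # Defaults for start and end depend on the sign of step.
    if start is None:
        start = 0 if step >= 0 else length - 1
    if end is None:
        end = length if step >= 0 else -length - 1
    lower, upper = bounds(start, end, step, length)
    selected = []
    if step > 0:
        i = lower
        while i < upper:
            selected.append(array[i])
            i += step
    else:
        i = upper
        while lower < i:
            selected.append(array[i])
            i += step
    return selected


def select_index(array, i):
    """Array index [i], defined as the slice [i:Normalize(i, len)+1:1]."""
    if not isinstance(array, list):
        return []
    return select_slice(array, i, normalize(i, len(array)) + 1, 1)
```

   Applied to the seven-element array [0, 1, 2, 3, 4, 5, 6], this
   sketch yields [1, 3] for "[1:5:2]" and [5, 3] for "[5:1:-2]",
   matching the examples in the informal introduction.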

   A slice expression selects a subset of the elements of the input
   array, in the same order as the array or the reverse order, depending
   on the sign of the "step" parameter. It selects no nodes from a node
   which is not an array.

   A slice is defined by the two slice parameters, "start" and "end",
   and an iteration delta, "step". Each of these parameters is
   optional. "len" is the length of the input array.

   The default value for "step" is "1". The default values for "start"
   and "end" depend on the sign of "step", as follows:

   +===========+=========+==========+
   | Condition | start   | end      |
   +===========+=========+==========+
   | step >= 0 | 0       | len      |
   +-----------+---------+----------+
   | step < 0  | len - 1 | -len - 1 |
   +-----------+---------+----------+

      Table 4: Default array slice start and end values

   Slice expression parameters "start" and "end" are not directly usable
   as slice bounds and must first be normalized. Normalization for this
   purpose is defined as:

   FUNCTION Normalize(i, len):
     IF i >= 0 THEN
       RETURN i
     ELSE
       RETURN len + i
     END IF

   The result of the array indexing expression "[i]" applied to an array
   of length "len" is defined to be the result of the array slicing
   expression "[i:Normalize(i, len)+1:1]".

   Slice expression parameters "start" and "end" are used to derive
   slice bounds "lower" and "upper".
   The direction of the iteration, defined by the sign of "step",
   determines which of the parameters is the lower bound and which is
   the upper bound:

   FUNCTION Bounds(start, end, step, len):
     n_start = Normalize(start, len)
     n_end = Normalize(end, len)

     IF step >= 0 THEN
       lower = MIN(MAX(n_start, 0), len)
       upper = MIN(MAX(n_end, 0), len)
     ELSE
       upper = MIN(MAX(n_start, -1), len-1)
       lower = MIN(MAX(n_end, -1), len-1)
     END IF

     RETURN (lower, upper)

   The slice expression selects elements with indices between the lower
   and upper bounds. In the following pseudocode, the "a(i)" construct
   expresses the 0-based indexing operation on the underlying array.

   IF step > 0 THEN

     i = lower
     WHILE i < upper:
       SELECT a(i)
       i = i + step
     END WHILE

   ELSE IF step < 0 THEN

     i = upper
     WHILE lower < i:
       SELECT a(i)
       i = i + step
     END WHILE

   END IF

   When "step = 0", no elements are selected and the result array is
   empty.

   An implementation MUST raise an error if any of the slice expression
   parameters does not fit in the implementation's representation of an
   integer. If a successfully parsed slice expression is evaluated
   against an array whose size does not fit in the implementation's
   representation of an integer, the implementation MUST raise an error.

3.5.7.  Descendant Selector

   Syntax

   The descendant selector starts with a double dot ".." and can be
   followed by an object member name (similar to the "dot-selector"), by
   an "index-selector" acting on objects or arrays, or by a wild card.

   descendant-selector = ".." ( dot-member-name      /  ; ..
                                index-selector       /  ; ..[]
                                index-wild-selector  /  ; ..[*]
                                "*"                     ; ..*
                              )

   Semantics

   The "descendant-selector" is inspired by ECMAScript for XML (E4X).
   It selects the node and all its descendants.

3.5.8.  Union Selector

3.5.8.1.  Syntax

   The union selector is syntactically related to the "index-selector".
   It contains multiple, comma-separated entries.

   union-selector = "[" ws union-entry 1*(ws "," ws union-entry) ws "]"

   union-entry    =  ( quoted-member-name /
                       element-index /
                       slice-index
                     )

   Task (T1):  This, besides slice-index, is currently one of only two
      places in the document that mentions whitespace. Whitespace needs
      to be handled throughout the ABNF syntax. Room consensus at the
      2021-06-15 interim was that JSONPath generally is generous with
      allowing insignificant whitespace throughout. Minimizing the
      impact of the many whitespace insertion points by choosing a rule
      name such as "S" was mentioned. Some conventions will probably
      help with minimizing the number of places where S needs to be
      inserted.

3.5.8.2.  Semantics

   A union selects any node which is selected by at least one of the
   union selectors and selects the concatenation of the lists (in the
   order of the selectors) of nodes selected by the union elements.
   Note that any node selected by more than one of the union selectors
   appears that many times in the node list.

3.5.9.  Filter Selector

3.5.9.1.  Syntax

   The filter selector has the form "[?]". It works by iterating over
   structured values, i.e. arrays and objects.

   filter-selector = "[?" boolean-expr "]"

   During the iteration, each array element or object member is visited
   and its value -- accessible via the symbol "@" -- or one of its
   descendants -- uniquely defined by a relative path -- is tested
   against a boolean expression "boolean-expr".

   The current item is selected if and only if the result is "true".

   boolean-expr     = logical-expr
   logical-expr     = ([neg-op] primary-expr) / logical-or-expr
   neg-op           = "!"                               ; not operator
   primary-expr     = "(" logical-or-expr ")"
   logical-or-expr  = logical-and-expr *["||" logical-and-expr]
   logical-and-expr = comp-expr *["&&" comp-expr]

   comp-expr    = (rel-path-val /
                   json-path) [(comp-op comparable / ; comparison
                                regex-op regex /     ; RegEx test
                                in-op container )]   ; containment test
   comp-op      = "==" / "!=" /                      ; comparison ...
                  "<"  / ">"  /                      ; operators
                  "<=" / ">="
   regex-op     = "~="                               ; RegEx match
   in-op        = " in "                             ; in operator
   comparable   = number / string-literal /          ; primitive ...
                  true / false / null /              ; values only
                  rel-path-val /                     ; descendant value
                  json-path                          ; any value

   rel-path-val = "@" *(dot-selector / index-selector)

   container    =
   regex        =

   Notes:

   *  Parentheses can be used with "boolean-expr" for grouping. So
      filter selection syntax in the original proposal "[?()]" is
      naturally contained in the current lean syntax "[?]" as a
      special case.

   *  Comparisons are restricted to primitive values (such as number,
      string, "true", "false", "null"). Comparisons with complex values
      will fail, i.e. no selection occurs.

   *  Types are not implicitly converted in comparisons. So
      "13 == '13'" selects no node.

   *  A member or element value by itself is _falsy_ only if it does
      not exist. Otherwise it is _truthy_, resulting in its value. To
      be more specific, explicit comparisons are necessary. This
      existence test -- as an exception to the general rule -- also
      works with complex values.

   *  Regular expression tests can be applied to "string" values only.

   *  Containment tests work with arrays and objects.

   *  Explicit boolean type conversion is done by the not operator
      "neg-op".

   *  The behaviour of operators is consistent with the 'C'-family of
      programming languages.

3.5.9.2.  Semantics

   The "filter-selector" works with arrays and objects exclusively.
   Its result is then a list of _zero_, _one_, _multiple_ or _all_ of
   their element or member values. Applied to other value types, it
   selects nothing.

   The negation operator "neg-op" tests the _falsiness_ of values.

   +========+==========+========+=============================+
   | Type   | Negation | Result | Comment                     |
   +========+==========+========+=============================+
   | Number | !0       | true   | "false" for non-zero number |
   +--------+----------+--------+-----------------------------+
   | String | "!"""    | true   | "false" for non-empty       |
   |        | "!''"    |        | string                      |
   +--------+----------+--------+-----------------------------+
   | null   | !null    | true   | --                          |
   +--------+----------+--------+-----------------------------+
   | true   | !true    | false  | --                          |
   +--------+----------+--------+-----------------------------+
   | false  | !false   | true   | --                          |
   +--------+----------+--------+-----------------------------+
   | Object | "!{}"    | false  | always "false"              |
   |        | "!{a:0}" |        |                             |
   +--------+----------+--------+-----------------------------+
   | Array  | "![]"    | false  | always "false"              |
   |        | "![0]"   |        |                             |
   +--------+----------+--------+-----------------------------+

              Table 5: Test falsiness of JSON values

   Applying the negation operator twice ("!!") yields the _truthiness_
   of values.

   Some examples:

+====================+=======================+================+===========+
|JSON                | Query                 | Result         |Comment    |
+====================+=======================+================+===========+
|"{"a":1,"b":2}"     | $[?@]                 | "[1,2]"        |Same as    |
|"[2,3,4]"           |                       | "[2,3,4]"      |"$.*" or   |
|                    |                       |                |"$[*]"     |
+--------------------+-----------------------+----------------+-----------+
|./.                 | $[?@==2]              | "[2]"          |Select by  |
|                    |                       | "[2]"          |value.     |
+--------------------+-----------------------+----------------+-----------+
|{"a":{"b":{"c":{}}}}| "$[?@.b]"             |[{"b":{"c":{}}}]|Existence  |
|                    | "$[?@.b.c]"           |                |           |
+--------------------+-----------------------+----------------+-----------+
|{"key":false}       | "$[?index(@)=='key']" | "[false]"      |Select     |
|                    | "$[?index(@)==0]"     | "[]"           |object     |
|                    |                       |                |member     |
+--------------------+-----------------------+----------------+-----------+
|[3,4,5]             | "$[?index(@)==2]"     | "[5]"          |Select     |
|                    | "$[?index(@)==17]"    | "[]"           |array      |
|                    |                       |                |element    |
+--------------------+-----------------------+----------------+-----------+
|{"col":"red"}       | $[?@ in               | ["red"]        |Containment|
|                    |['red','green','blue']]|                |           |
+--------------------+-----------------------+----------------+-----------+
|{"a":{"b":{5},c:0}} | $[?@.b==5 && !@.c]    |[{"b":{5},c:0}] |Existence  |
+--------------------+-----------------------+----------------+-----------+

                                 Table 6

4.  Expression Language

   Task (T2):  Separate out expression language. For now, this
      section is a repository for ABNF taken from [RFC8259]. This needs
      to be deduplicated with definitions above.

   number        = [ minus ] jsint [ frac ] [ exp ]
   decimal-point = %x2E                       ; .
   digit1-9      = %x31-39                    ; 1-9
   e             = %x65 / %x45                ; e E
   exp           = e [ minus / plus ] 1*DIGIT
   frac          = decimal-point 1*DIGIT
   jsint         = zero / ( digit1-9 *DIGIT )
   minus         = %x2D                       ; -
   plus          = %x2B                       ; +
   zero          = %x30                       ; 0

   false         = %x66.61.6c.73.65           ; false
   null          = %x6e.75.6c.6c              ; null
   true          = %x74.72.75.65              ; true

5.  IANA Considerations

   TBD: Define a media type for JSONPath expressions.

6.  Security Considerations

   This section gives security considerations, as required by [RFC3552].

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017.

7.2.  Informative References

   [E4X]      ISO, "Information technology — ECMAScript for XML (E4X)
              specification", ISO/IEC 22537:2006, 2006.

   [E4X-overview]
              Adobe Systems Inc., The Mozilla Foundation, Opera Software
              ASA, and others, "Proposed ECMAScript 4 Edition — Language
              Overview", 2007.

   [ECMA-262] Ecma International, "ECMAScript Language Specification,
              Standard ECMA-262, Third Edition", December 1999.

   [JSONPath-orig]
              Gössner, S., "JSONPath — XPath for JSON", 21 February
              2007.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003.

   [RFC6901]  Bryan, P., Ed., Zyp, K., and M. Nottingham, Ed.,
              "JavaScript Object Notation (JSON) Pointer", RFC 6901,
              DOI 10.17487/RFC6901, April 2013.

   [SLICE]    "Slice notation", n.d.

   [XPath]    Berglund, A., Boag, S., Chamberlin, D., Fernandez, M.,
              Kay, M., Robie, J., and J. Simeon, "XML Path Language
              (XPath) 2.0 (Second Edition)", World Wide Web Consortium
              Recommendation REC-xpath20-20101214, 14 December 2010.

Acknowledgements

   This specification is based on Stefan Gössner's original online
   article defining JSONPath [JSONPath-orig].

   The books example was taken from
   http://coli.lili.uni-bielefeld.de/~andreas/Seminare/sommer02/books.xml
   -- a dead link now.

Contributors

   Marko Mikulicic
   InfluxData, Inc.
   Pisa
   Italy

   Email: mmikulicic@gmail.com

   Edward Surov
   TheSoul Publishing Ltd.
   Limassol
   Cyprus

   Email: esurov.tsp@gmail.com

Authors' Addresses

   Stefan Gössner (editor)
   Fachhochschule Dortmund
   Sonnenstraße 96
   D-44139 Dortmund
   Germany

   Email: stefan.goessner@fh-dortmund.de

   Glyn Normington (editor)
   Winchester
   United Kingdom

   Email: glyn.normington@gmail.com

   Carsten Bormann (editor)
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany

   Phone: +49-421-218-63921
   Email: cabo@tzi.org