Network Working Group                                    S. Soiland-Reyes
Internet-Draft                               The University of Manchester
Intended status: Informational                                 M. Caceres
Expires: July 27, 2018                                Mozilla Corporation
                                                         January 23, 2018


              The Archive and Packaging (arcp) URI scheme
                       draft-soilandreyes-arcp-00

Abstract

   This specification proposes the Archive and Packaging Pointer URI
   scheme "arcp".

   arcp URIs can be used to consume or reference hypermedia resources
   bundled inside a file archive or an application package, as well as
   to resolve URIs for archive resources within a programmatic
   framework.

   This URI scheme provides mechanisms to generate a unique base URI to
   represent the root of the archive, so that relative URI references in
   a bundled resource can be resolved within the archive without having
   to extract the archive content on the local file system.

   An arcp URI can be used for purposes of isolation (e.g. when
   consuming multiple archives), security constraints (avoiding "climb
   out" from the archive), or for externally identifying sub-resources
   referenced by hypermedia formats.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 27, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  Scheme syntax
     3.1.  Authority
     3.2.  Path
   4.  Scheme semantics
     4.1.  Authority semantics
     4.2.  Path semantics
     4.3.  Resolution protocol
     4.4.  Resolving from a .well-known endpoint
   5.  Encoding considerations
   6.  Interoperability considerations
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
     9.3.  URIs
   Appendix A.  Examples
     A.1.  Sharing using app names
     A.2.  Sandboxing
     A.3.  Origin-based
     A.4.  Hash-based
     A.5.  Archives that are not files
     A.6.  Linked Data containers which are not on the web
     A.7.  Resolution of packaged resources
   Appendix B.  Acknowledgements
   Authors' Addresses

1.  Introduction

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   For the purpose of this specification, an *archive* is a collection
   of sub-resources addressable by name or path.  This definition covers
   typical archive file formats like ".zip" or "tar.gz" and derived
   "+zip" media types [RFC6839], but also non-file resource packages
   like an LDP Container [W3C.REC-ldp-20150226], an installed Web App
   [W3C.WD-appmanifest-20180118], or a BagIt folder structure
   [I-D.draft-kunze-bagit-14].

   For brevity, the term _archive_ is used throughout this
   specification, although from the above it can also mean a
   _container_, _application_ or _package_.

2.  Background

   Mobile and Web Applications may bundle resources such as stylesheets
   with relative URI references to scripts, images and fonts.  Resolving
   such resources within URI handling frameworks may require generating
   absolute URIs and applying Same-Origin [RFC6454] security policies
   separately for each app.

   Applications that are accessing resources bundled inside an archive
   (e.g. "zip" or "tar.gz" file) can struggle to consume hypermedia
   content types that use relative URI references [RFC3986] such as
   "../css/", as it is challenging to determine the base URI in a
   consistent fashion.

   Frequently the archive must be unpacked locally to synthesize base
   URIs like "file:///tmp/a1b27ae03865/" to represent the root of the
   archive.  Such URIs are temporary, might not be globally unique, and
   could be vulnerable to attacks such as "climbing out" of the root
   directory.

   An archive containing multiple HTML or Linked Data resources, such as
   in a BagIt archive [I-D.draft-kunze-bagit-14], may be using relative
   URIs to cross-reference constituent files.

   Consumption of archives might be performed in memory or through a
   common framework, abstracting away any local file location.

   Consumption of an archive with a consistent base URI should be
   possible no matter from which location it was retrieved, or on which
   device it is inspected.

   When consuming multiple archives from untrusted sources it would be
   beneficial to have a Same Origin policy [RFC6454] so that relative
   hyperlinks can't escape the particular archive.

   The "file:" URI scheme [RFC8089] can be ill-suited for purposes such
   as the above, for which a location-independent, globally unique URI
   scheme is more flexible and secure.

3.  Scheme syntax

   The "arcp" URI scheme follows the [RFC3986] syntax for hierarchical
   URIs according to the following productions:

      URI           = scheme ":" arcp-specific [ "#" fragment ]

      scheme        = "arcp"

      arcp-specific = "//" arcp-authority [ path-absolute ] [ "?" query ]

   The "arcp-authority" component provides a unique identifier for the
   opened archive.  See Section 3.1 for details.

   The "path-absolute" component provides the absolute path of a
   resource (e.g. a file or directory) within the archive.  See
   Section 3.2 for details.

   The "query" component MAY be used, but its semantics are undefined by
   this specification.

   The "fragment" component MAY be used by implementations according to
   [RFC3986] and the implied media type [RFC2046] of the resource at the
   path.  This specification does not specify how to determine the media
   type.

3.1.  Authority

   The purpose of the "authority" component in an arcp URI is to build a
   unique base URI for a particular archive.  The authority is NOT
   intended to be resolvable without prior knowledge of the archive.
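
   As an illustration of how such base URIs might be generated and
   recognized, the following Python sketch builds base URIs for the
   "uuid," (v4 random and v5 location-based) and "name," authority forms
   described in this section and in Section 4.1, using the standard
   library's uuid module (CPython's uuid4() draws on os.urandom()).  It
   also classifies an authority by its prefix.  The function names are
   hypothetical, and the classifier only inspects the prefix; it does
   not validate the UUID, "alg-val" or "reg-name" that follows.

```python
import re
import uuid

PREFIXES = ("uuid", "ni", "name")


def random_base() -> str:
    """Random-based authority: a fresh v4 UUID (RFC 4122)."""
    return f"arcp://uuid,{uuid.uuid4()}/"


def location_base(url: str) -> str:
    """Location-based authority: v5 UUID of the archive URL in the
    RFC 4122 URL namespace 6ba7b811-9dad-11d1-80b4-00c04fd430c8."""
    return f"arcp://uuid,{uuid.uuid5(uuid.NAMESPACE_URL, url)}/"


def name_base(reg_name: str) -> str:
    """Name-based authority, e.g. for an app installed as app.example.com."""
    return f"arcp://name,{reg_name}/"


def authority_of(uri: str) -> str:
    """Return the authority component of an arcp URI."""
    m = re.match(r"(?i)arcp://([^/?#]*)", uri)
    if m is None:
        raise ValueError(f"not an arcp URI: {uri!r}")
    return m.group(1)


def authority_kind(uri: str) -> str:
    """Classify by prefix only: "uuid", "ni", "name", or "authority"
    for the generic RFC 3986 fallback production."""
    prefix, comma, _ = authority_of(uri).partition(",")
    return prefix if comma and prefix in PREFIXES else "authority"


if __name__ == "__main__":
    print(random_base())
    print(location_base("http://example.com/data.zip"))
    print(name_base("gallery.example.org"))
```

   Because location_base() is deterministic, two parties that know the
   same archive URL derive the same authority, whereas random_base()
   yields a new authority on every call.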

   The authority of an arcp URI MUST be valid according to these
   productions:

      arcp-authority = uuid / ni / name / authority
      uuid           = "uuid," UUID
      ni             = "ni," alg-val
      name           = "name," reg-name

   1.  The prefix "uuid," combines with the "UUID" production as defined
       in [RFC4122], e.g. "uuid,2a47c495-ac70-4ed1-850b-8800a57618cf".

   2.  The prefix "ni," combines with the "alg-val" production as
       defined in [RFC6920], e.g.
       "ni,sha-256;JCS7yveugE3UaZiHCs1XpRVfSHaewxAKka0o5q2osg8".

   3.  The prefix "name," combines with the "reg-name" production as
       defined in [RFC3986], e.g. "name,app.example.com".

   4.  The production "authority" matches its definition in [RFC3986].
       As this production necessarily also matches the prefixed
       productions above, those should be considered first before
       falling back to this production.

3.2.  Path

   The "path-absolute" component, if present, MUST match the production
   in [RFC3986] and provide the absolute path of a resource (e.g. a file
   or directory) within the archive.

   Archive media types vary in their constraints and possibilities for
   expressing paths; however, implementations SHOULD use "/" as the path
   separator for nested folders and files.

   It is RECOMMENDED to include the trailing "/" if the path is known to
   represent a directory.

4.  Scheme semantics

   This specification does not constrain what format might constitute
   an _archive_, nor does it require that the archive be retrievable as
   a single bytestream or file.

   Examples of retrievable archive media types include
   "application/zip", "application/vnd.android.package-archive",
   "application/x-tar", "application/x-gtar" and
   "application/x-7z-compressed".

   Examples of non-file archives include an LDP Container
   [W3C.REC-ldp-20150226], an installed Web App
   [W3C.WD-appmanifest-20180118], or a BagIt folder structure
   [I-D.draft-kunze-bagit-14].

4.1.  Authority semantics

   The _authority_ component identifies the archive itself.

   Implementations MAY assume that two arcp URIs with the same authority
   component relate to resources within the same archive, subject to
   limitations explained in this section.

   The authority prefix, if present, helps inform consumers which
   uniqueness constraints have been used when identifying the archive,
   without necessarily providing access to the archive.

   1.  If the prefix is "uuid," followed by a UUID [RFC4122], this
       indicates a unique archive identity.

   2.  If the prefix is "uuid," followed by a v4 UUID [RFC4122], this
       indicates uniqueness based on a random number generator.

       Implementations creating random-based authorities SHOULD generate
       the v4 random UUID using a suitable random number generator
       [RFC4086].

   3.  If the prefix is "uuid," followed by a v5 name-based UUID
       [RFC4122], this indicates uniqueness based on an existing archive
       location, typically a URL.

       Implementations creating location-based authorities from an
       archive's URL SHOULD generate the v5 UUID using the URL namespace
       "6ba7b811-9dad-11d1-80b4-00c04fd430c8" and the particular URL
       (see [RFC4122] section 4.3).

       Note that while implementations cannot resolve which location was
       used, they can confirm the name-based UUID if the location is
       otherwise known.

   4.  If the prefix is "ni," this indicates a unique archive identity
       based on a hash of the archive's bytestream or content.

       Implementations can assume that resources within an "ni" arcp URI
       remain static, although the implementation may use content
       negotiation or similar transformations.

       The checksum MUST be expressed according to the "alg-val"
       production of [RFC6920].

       Implementations creating hash-based authorities from an archive's
       bytestream SHOULD use "sha-256" without truncation.

   5.  If the prefix is "name," this indicates that the authority is an
       application or package name, typically as installed on a device
       or system.

       Implementations SHOULD assume that an unrecognized "name"
       authority is only unique within a particular installation, but
       MAY assume further uniqueness guarantees for names under their
       control.

       It is RECOMMENDED that implementations creating name-based
       authorities use DNS names under their control; for instance, an
       app installed as "app.example.com" can make an authority
       "name,app.example.com" to refer to its packaged resources, or
       "name,foo.app.example.com" to refer to a "foo" container
       distributed across all installations.

   The uniqueness properties are unspecified for arcp URIs whose
   authority does not match any of the prefixes defined in this
   specification.

4.2.  Path semantics

   The _path_ component of an arcp URI identifies individual resources
   within a particular archive, typically a _directory_ or _file_.

   o  If the _path_ is "/" (e.g.
      "arcp://uuid,833ebda2-f9a8-4462-b74a-4fcdc1a02d22/"), then the
      arcp URI identifies the archive itself, typically represented as a
      root directory or collection.

   o  If the path ends with "/" then the path represents a directory or
      collection.

   arcp URIs can be used to uniquely identify resources independently
   of the location of the archive, such as within an information system.

   Assuming an appropriate resolution mechanism which has knowledge of
   the corresponding archive, an arcp URI can also be used for
   resolution.

   Some archive formats might permit resources with the same (duplicate)
   path, in which case this specification does not define which
   particular entry is identified.

4.3.  Resolution protocol

   This specification does not define the protocol to resolve resources
   according to the arcp URI scheme.
   For instance, one implementation might rewrite arcp URIs to localized
   paths in a temporary directory, while another implementation might
   use an embedded HTTP server.

   It is envisioned that an implementation will have extracted or opened
   an archive in advance, and assigned it an appropriate authority
   according to Section 3.1.  Such an implementation can then resolve
   arcp URIs programmatically, e.g. by using in-memory access or mapping
   paths to the extracted archive on the local file system.

   Implementations that support resolving arcp URIs SHOULD:

   1.  Fail with the equivalent of _Not Found_ if the authority is
       unknown.

   2.  Fail with the equivalent of _Gone_ if the authority is known, but
       the content of the archive is no longer available.

   3.  Fail with the equivalent of _Not Found_ if the path does not map
       to a file or directory within the archive.

   4.  Return the corresponding (potentially decompressed) bytestream if
       the path maps to a file within the archive.

   5.  Return an appropriate directory listing if the path maps to a
       directory within the archive.

   6.  Return an appropriate directory listing of the archive's root
       directory if the path is "/".

   Not all archive formats or implementations will have the concept of
   a directory listing, in which case the implementation MAY fail such
   resolutions with the equivalent of _Not Implemented_.

   This specification does not define how an implementation can
   determine the media type of a file within an archive.  This could be
   expressed in secondary resources (such as a manifest), or be
   determined by file extensions or magic bytes.

   The media type "text/uri-list" [RFC2483] MAY be used to represent a
   directory listing, in which case it SHOULD contain only URIs that
   start with the arcp URI of the directory.

   Some archive formats might support resources which are neither
   directories nor regular files (e.g.
   device files, symbolic links).  This specification does not define
   the semantics of attempting to resolve such resources.

   This specification does not define how to change an archive or its
   content using arcp URIs.

4.4.  Resolving from a .well-known endpoint

   If the "authority" component of an arcp URI matches the "ni"
   production, an application MAY attempt to resolve the authority from
   any ".well-known/ni/" endpoint [RFC5785] as specified in [RFC6920]
   section 4, in order to retrieve the complete archive.  Applications
   SHOULD verify the checksum of the retrieved archive before resolving
   the individual path.

5.  Encoding considerations

   The productions for "UUID" and "alg-val" are restricted to URI-safe
   ASCII and should not require any encoding considerations.

   Care should be taken to %-encode the directory and file segments of
   "path-absolute" according to [RFC3986] (for URIs) or [RFC3987] (for
   IRIs).

   When used as part of an IRI, paths SHOULD be expressed using
   international Unicode characters instead of %-encoding as ASCII.

   Not all archive formats have an explicit character encoding specified
   for their paths.  If no such information is available for the archive
   format, implementations MAY assume that the path component is encoded
   with UTF-8 [RFC2279].

   Some archive formats have case-insensitive paths, in which case it is
   RECOMMENDED to preserve the casing as expressed in the archive.

6.  Interoperability considerations

   As multiple authorities are possible for the same archive
   (Section 3.1), and path interpretation might vary, there can be
   interoperability challenges when exchanging arcp URIs between
   implementations.  Some considerations:

   1.  Two implementations describe the same archive (e.g. stored in the
       same local file path), but use different random-based UUID
       authorities.
       The implementations may need to detect equality of the two UUIDs
       out of band.

   2.  Two implementations describe an archive retrieved from the same
       URL, with the same location-based UUID authority, but retrieved
       at different times.  The implementations might disagree about the
       content of the archive.

   3.  Two implementations describe an archive retrieved from the same
       URL, with the same location-based UUID authority, but retrieved
       using different content negotiation, resulting in different
       archive representations.  The implementations may disagree about
       path encoding, file name casing or hierarchy.

   4.  Two implementations describe the same archive bytestream using
       the hash-based authority, but they have used two different hash
       algorithms.  The implementations may need to negotiate a common
       hash algorithm.

   5.  Two implementations access the same archive, which contains file
       paths with Unicode characters, but extract it to two different
       file systems.  Limitations and conventions for file names in the
       local file system (such as Unicode normalization, case
       insensitivity, total path length) may result in the
       implementations having inconsistent or inaccessible paths.

7.  Security Considerations

   As when handling any content, extra care should be taken when
   consuming archives and arcp URIs from unknown sources.

   An archive could contain compressed files that expand to fill all
   available disk space.

   A maliciously crafted archive could contain paths with characters
   (e.g. backspace) which could make an arcp URI invalid or misleading
   if used unescaped.

   A maliciously crafted archive could contain paths (e.g. combining
   Unicode sequences) that cause the arcp URI to be very long, causing
   issues in information systems propagating said URI.
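
   As an illustration of the escaping concerns above and of the
   %-encoding described in Section 5, the following Python sketch builds
   an arcp URI from an archive entry name by %-encoding each path
   segment as UTF-8, so that characters such as backspace never appear
   unescaped.  The helper name arcp_uri() is hypothetical, the result is
   a URI rather than an IRI, and rejecting ".." segments or dropping
   empty and "." segments are choices of this sketch, not requirements
   of this specification.

```python
from urllib.parse import quote


def arcp_uri(base: str, entry_path: str) -> str:
    """Build an arcp URI for an archive entry (hypothetical helper).

    base is the arcp URI of the archive root and must end in "/".
    entry_path is the entry name as found in the archive, using "/" as
    the separator; a trailing "/" marks a directory.
    """
    if not base.endswith("/"):
        raise ValueError("base must be the archive root URI, ending in '/'")
    is_dir = entry_path.endswith("/")
    # Drop empty and "." segments, e.g. from "./doc.html" or "a//b".
    segments = [s for s in entry_path.split("/") if s not in ("", ".")]
    if ".." in segments:
        # Refuse names that would climb out of the archive root.
        raise ValueError(f"unsafe archive path: {entry_path!r}")
    # quote() encodes each segment as UTF-8 and %-encodes every character
    # outside the RFC 3986 unreserved set, so "%", spaces, non-ASCII
    # letters and control characters such as backspace (%08) are escaped.
    path = "/".join(quote(s, safe="") for s in segments)
    if is_dir and path:
        path += "/"
    return base + path


if __name__ == "__main__":
    root = "arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/"
    for name in ("./doc.html", "fonts/", "caf\u00e9/menu\b.html"):
        print(arcp_uri(root, name))
```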

   An archive might contain symbolic links that, if extracted to a local
   file system, might address files outside the archive's directory
   structure.  Implementations SHOULD detect such links and prevent
   outside access.

   A maliciously crafted arcp URI might contain "../" path segments,
   which if naively converted to a "file:///" URI might address files
   outside the archive's directory structure.  Implementations SHOULD
   perform Path Segment Normalization [RFC3986] before converting arcp
   URIs.

   In particular for IRIs, an archive might contain multiple paths with
   similar-looking characters or with different Unicode combining
   sequences, which could be used to mislead users.

   A URI hyperlink might use or guess an arcp URI authority to attempt
   to climb into a different archive for malicious purposes.
   Applications SHOULD employ Same Origin policy [RFC6454] checks if
   resolving cross-references is not desired.

   While a UUID or hash-based authority provides some level of
   information hiding of an archive's origin, this should not be relied
   upon for access control or anonymisation.  Implementors should keep
   in mind that such authority components in many cases can be
   predictably generated by third parties, for instance using dictionary
   attacks.

8.  IANA Considerations

   This specification requests that IANA registers the following URI
   scheme according to the provisions of [RFC7595].

   Scheme name:  arcp

   Status:  provisional

   Applications/protocols that use this scheme name:
      Hypermedia-consuming applications that handle archives or
      packages.

   Contact:  Stian Soiland-Reyes stain@apache.org [1]

   Change controller:  Stian Soiland-Reyes

9.  References

9.1.  Normative References

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              DOI 10.17487/RFC2046, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2279]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", RFC 2279, DOI 10.17487/RFC2279, January 1998.

   [RFC2483]  Mealling, M. and R. Daniel, "URI Resolution Services
              Necessary for URN Resolution", RFC 2483,
              DOI 10.17487/RFC2483, January 1999.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
              January 2005.

   [RFC4086]  Eastlake 3rd, D., Schiller, J., and S. Crocker,
              "Randomness Requirements for Security", BCP 106, RFC 4086,
              DOI 10.17487/RFC4086, June 2005.

   [RFC4122]  Leach, P., Mealling, M., and R. Salz, "A Universally
              Unique IDentifier (UUID) URN Namespace", RFC 4122,
              DOI 10.17487/RFC4122, July 2005.

   [RFC5785]  Nottingham, M. and E. Hammer-Lahav, "Defining Well-Known
              Uniform Resource Identifiers (URIs)", RFC 5785,
              DOI 10.17487/RFC5785, April 2010.

   [RFC6454]  Barth, A., "The Web Origin Concept", RFC 6454,
              DOI 10.17487/RFC6454, December 2011.

   [RFC6839]  Hansen, T. and A. Melnikov, "Additional Media Type
              Structured Syntax Suffixes", RFC 6839,
              DOI 10.17487/RFC6839, January 2013.

   [RFC6920]  Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
              Keranen, A., and P. Hallam-Baker, "Naming Things with
              Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013.

   [RFC7595]  Thaler, D., Ed., Hansen, T., and T. Hardie, "Guidelines
              and Registration Procedures for URI Schemes", BCP 35,
              RFC 7595, DOI 10.17487/RFC7595, June 2015.

   [RFC8089]  Kerwin, M., "The "file" URI Scheme", RFC 8089,
              DOI 10.17487/RFC8089, February 2017.

9.2.  Informative References

   [FirefoxOS]
              Mozilla Firefox, "Firefox OS security overview",
              MDN Mozilla Developer Network Web Docs, February 2017.

   [I-D.draft-kunze-bagit-14]
              Kunze, J., Littman, J., Madden, L., Summers, E., Boyko,
              A., and B. Vargas, "The BagIt File Packaging Format
              (V0.97)", draft-kunze-bagit-14 (work in progress),
              October 2016.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC6570]  Gregorio, J., Fielding, R., Hadley, M., Nottingham, M.,
              and D. Orchard, "URI Template", RFC 6570,
              DOI 10.17487/RFC6570, March 2012.

   [ROBundle]
              Soiland-Reyes, S., Gamble, M., and R. Haines, "Research
              Object Bundle 1.0", Zenodo report,
              DOI 10.5281/zenodo.12586, November 2014.

   [W3C.NOTE-app-uri-20150723]
              Caceres, M., "The app: URL Scheme", World Wide Web
              Consortium NOTE NOTE-app-uri-20150723, July 2015.

   [W3C.NOTE-widgets-uri-20120313]
              Caceres, M., "Widget URI scheme", World Wide Web
              Consortium NOTE NOTE-widgets-uri-20120313, March 2012.

   [W3C.REC-ldp-20150226]
              Speicher, S., Arwe, J., and A. Malhotra, "Linked Data
              Platform 1.0", World Wide Web Consortium Recommendation
              REC-ldp-20150226, February 2015.

   [W3C.WD-appmanifest-20180118]
              Caceres, M., Christiansen, K., Lamouri, M., Kostiainen,
              A., and R. Dolin, "Web App Manifest", World Wide Web
              Consortium WD WD-appmanifest-20180118, January 2018.

9.3.  URIs

   [1] mailto:stain@apache.org

Appendix A.  Examples

A.1.  Sharing using app names

   A photo gallery application on a mobile device uses arcp URIs for
   navigation between its UI states.  The gallery is secured so that
   other applications can't normally access its photos.

   The application is installed with the app name "gallery.example.org",
   as the vendor controls "example.org", making the corresponding
   name-based arcp URI:

      arcp://name,gallery.example.org/

   A user is at the application state which shows the newest photos as
   thumbnails:

      arcp://name,gallery.example.org/photos/?New

   The user selects a photo, rendered with metadata overlaid:

      arcp://name,gallery.example.org/photos/137

   The user requests to "share" the photo, selecting
   "messaging.example.com", which uses a common arcp URI framework on
   the device.

   The photo gallery registers with the device's arcp framework that the
   chosen "messaging.example.com" gets read permission to its
   "/photos/137" resource.

   The sharing function returns a URI Template [RFC6570]:

      arcp://name,messaging.example.com/share{;uri}{;redirect}

   Filling in the template, the gallery requests to pop up:

      arcp://name,messaging.example.com/share
        ;uri=arcp://name,gallery.example.org/photos/137
        ;redirect=arcp://name,gallery.example.org/photos/%3FNew

   The arcp framework checks its registration for
   "messaging.example.com" and finds the installed messaging
   application.  It checks that other apps are permitted to navigate to
   the messaging app's "/share" state.

   The messaging app is launched and navigates to its "sharing" UI,
   asking the user for a caption.

   The messaging app requests the arcp framework to retrieve
   "arcp://name,gallery.example.org/photos/137" using content
   negotiation for an "image/jpeg" representation.

   The arcp framework finds the installed photo gallery
   "gallery.example.org", and confirms the read permission.

   The photo gallery application returns a JPEG representation after
   retrieving the photo from its internal store.
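
   The template-filling step in this example uses the path-style
   parameter expansion ("{;var}") of [RFC6570].  The following Python
   sketch implements only that one operator, without prefix or explode
   modifiers, and the helper name expand_path_params() is hypothetical.
   Strict RFC 6570 expansion of this operator %-encodes reserved
   characters such as ":", "/", "," and "?" in the values, so its output
   differs in form from the line-wrapped rendering of the filled-in
   template shown above, in which ":" and "/" appear unencoded.

```python
from __future__ import annotations

import re
from urllib.parse import quote

# A path-style parameter expression such as "{;uri}" or "{;uri,redirect}".
_PATH_PARAMS = re.compile(r"\{;([^}]+)\}")


def expand_path_params(template: str, variables: dict[str, str]) -> str:
    """Expand only RFC 6570 "{;var}" expressions (hypothetical helper).

    Each defined variable becomes ";name=value" with the value %-encoded
    so that only unreserved characters stay literal; an empty value
    becomes ";name", and undefined variables are omitted.
    """

    def expand(match: re.Match) -> str:
        parts = []
        for name in match.group(1).split(","):
            if name not in variables:
                continue
            value = variables[name]
            if value == "":
                parts.append(";" + name)
            else:
                parts.append(";" + name + "=" + quote(value, safe=""))
        return "".join(parts)

    return _PATH_PARAMS.sub(expand, template)


if __name__ == "__main__":
    template = "arcp://name,messaging.example.com/share{;uri}{;redirect}"
    print(expand_path_params(template, {
        "uri": "arcp://name,gallery.example.org/photos/137",
        "redirect": "arcp://name,gallery.example.org/photos/?New",
    }))
```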

   After the messaging app has completed sharing the picture bytestream,
   it requests the UI framework to navigate to:

      arcp://name,gallery.example.org/photos/?New

   The UI returns to the original view in the photo gallery.

   If the messaging app had attempted to _retrieve_ the arcp URI

      arcp://name,gallery.example.org/photos/?New

   then it would be rejected by the arcp framework as permission was not
   granted.

   However, if such access had been granted, the gallery could return a
   "text/uri-list" of the newest photos:

      arcp://name,gallery.example.org/photos/137
      arcp://name,gallery.example.org/photos/138
      arcp://name,gallery.example.org/photos/139

   This example shows that although an arcp URI represents a resource,
   it can have different representations or UI states for different
   apps.

A.2.  Sandboxing

   A document store application has received a file "document.tar.gz"
   whose content will be checked for consistency.

   For sandboxing purposes it generates a UUID v4
   "32a423d6-52ab-47e3-a9cd-54f418a48571" using a pseudo-random
   generator.  The arcp base URI is thus:

      arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/

   The archive contains the files:

   o  "./doc.html" which links to "css/base.css"

   o  "./css/base.css" which links to "../fonts/Coolie.woff"

   o  "./fonts/Coolie.woff"

   The application generates the corresponding arcp URIs and uses those
   for URI resolutions to list resources and their hyperlinks:

      arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html
       -> arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css
      arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css
       -> arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/fonts/Coolie.woff
      arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/fonts/Coolie.woff

   The application is now confident that all hyperlinked files are
   indeed present in the archive.
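
   The hyperlink resolution in this example is ordinary reference
   resolution as in [RFC3986] Section 5.2, applied to the arcp base URI.
   Python's urllib.parse.urljoin() returns the reference unchanged for
   schemes not listed in its uses_relative table, such as "arcp", so the
   following sketch implements the RFC 3986 algorithm directly,
   including dot-segment removal.  The same_archive() check, which
   compares scheme and authority in the spirit of a Same Origin policy,
   is a hypothetical helper; this is a minimal illustration rather than
   a complete URI library.

```python
import re

# RFC 3986 Appendix B regular expression. Undefined components are None;
# the path is always a (possibly empty) string.
_URI_REF = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


def _split(ref):
    m = _URI_REF.match(ref)
    return m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4; ".." never climbs above the root."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./") or path == "/.":
            path = "/" + path[3:]
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()  # drop the last output segment, if any
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to out.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def resolve(base, ref):
    """Resolve ref against an absolute base URI (RFC 3986 section 5.2.2)."""
    b_scheme, b_auth, b_path, b_query, _ = _split(base)
    scheme, auth, path, query, fragment = _split(ref)
    if scheme is None:
        if auth is None:
            if path == "":
                path = b_path
                query = b_query if query is None else query
            else:
                if not path.startswith("/"):
                    # Merge with the base path (section 5.2.3).
                    if b_auth is not None and b_path == "":
                        path = "/" + path
                    else:
                        path = b_path[: b_path.rfind("/") + 1] + path
                path = remove_dot_segments(path)
            auth = b_auth
        else:
            path = remove_dot_segments(path)
        scheme = b_scheme
    else:
        path = remove_dot_segments(path)
    # Recompose the target URI (section 5.3).
    uri = scheme + ":" if scheme is not None else ""
    if auth is not None:
        uri += "//" + auth
    uri += path
    if query is not None:
        uri += "?" + query
    if fragment is not None:
        uri += "#" + fragment
    return uri


def same_archive(a, b):
    """Hypothetical origin-style check: same scheme and authority."""
    (sa, aa), (sb, ab) = _split(a)[:2], _split(b)[:2]
    return (sa or "").lower() == (sb or "").lower() and aa == ab
```

   With this sketch, resolving "../../../outside.txt" against the
   "doc.html" URI of this example stays within the same authority, which
   is why the application can treat the outcome as a _Not Found_ lookup
   inside the archive rather than an escape from it.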

   In its database, the application notes which "tar.gz" file
   corresponds to UUID "32a423d6-52ab-47e3-a9cd-54f418a48571".

   If the application had encountered a malicious hyperlink
   "../../../outside.txt" it would first resolve it to the absolute URI
   "arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/outside.txt" and
   conclude from the _Not Found_ error that the path "/outside.txt" was
   not present in the archive.

A.3.  Origin-based

   A web crawler is about to index the content of the URL
   "http://example.com/data.zip" and needs to generate absolute URIs as
   it continues crawling inside the individual resources of the archive.

   The application generates a UUID v5 based on the URL namespace
   "6ba7b811-9dad-11d1-80b4-00c04fd430c8" and the URL to the zip file:

      >>> import uuid
      >>> uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/data.zip")
      UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')

   Thus the location-based arcp URI for indexing the ZIP content is:

      arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/

   Listing all directories and files in the ZIP, the crawler finds the
   URIs:

      arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/
      arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/
      arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/flower.jpeg

   When the application encounters "http://example.com/data.zip" some
   time later, it can recalculate the same base arcp URI.  This time the
   ZIP file has been modified upstream and the crawler additionally
   finds:

      arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/cloud.jpeg

   If files had been removed from the updated ZIP file, the crawler
   could simply remove those from its database, as it used the same arcp
   base URI as in the last crawl.

A.4.  Hash-based

   An application where users can upload software distributions for
   virus checking needs to avoid duplication, as users tend to upload
   "foo-1.2.tar" multiple times.
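
   The hash-based authority in this example can be derived with Python's
   standard library, as in the following sketch.  It takes the
   untruncated "sha-256" digest of the archive bytestream, encodes it as
   unpadded "base64url" [RFC4648] as used by the "alg-val" production of
   [RFC6920], and prefixes "ni,".  The helper names are hypothetical, and
   verify_archive() is one possible way to perform the checksum
   verification recommended in Section 4.4 before resolving paths.

```python
import base64
import hashlib


def ni_authority_from_digest(digest: bytes) -> str:
    """Format a raw, untruncated SHA-256 digest as an arcp "ni," authority."""
    # RFC 6920 "alg-val" uses base64url (RFC 4648) without "=" padding.
    value = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "ni,sha-256;" + value


def ni_authority(archive_bytes: bytes) -> str:
    """Hash-based authority for a complete archive bytestream."""
    return ni_authority_from_digest(hashlib.sha256(archive_bytes).digest())


def verify_archive(authority: str, archive_bytes: bytes) -> bool:
    """Check retrieved bytes against a "ni,sha-256;..." authority."""
    if not authority.startswith("ni,sha-256;"):
        raise ValueError("this sketch only handles untruncated sha-256")
    return ni_authority(archive_bytes) == authority


if __name__ == "__main__":
    hex_digest = "17edf80f84d478e7c6d2c7a5cfb4442910e8e1778f91ec0f79062d8cbdef42cd"
    print(f"arcp://{ni_authority_from_digest(bytes.fromhex(hex_digest))}/")
```

   Applied to the hexadecimal checksum of this example,
   ni_authority_from_digest() yields the same "alg-val" value that the
   following steps arrive at.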
   The application calculates the "sha-256" checksum of the uploaded
   file, which in hexadecimal is:

      17edf80f84d478e7c6d2c7a5cfb4442910e8e1778f91ec0f79062d8cbdef42cd

   The "base64url" encoding [RFC4648] of the binary checksum, without
   "=" padding, is:

      F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0

   The corresponding "alg-val" authority is thus:

      sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0

   From this, the hash-based arcp base URI is:

      arcp://ni,sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/

   The application finds that its virus database already contains
   entries for:

      arcp://ni,sha-256;F-34D4TUeOfG0selz7REKRDo4XePkewPeQYtjL3vQs0/bin/evil

   and flags the upload as malicious without having to scan it again.

A.5.  Archives that are not files

   An application is relating BagIt archives [I-D.draft-kunze-bagit-14]
   that are stored on a shared file system as structured folders with
   manifests, rather than as individual archive files.

   The BagIt payload manifest "/gfs/bags/scan15/manifest-md5.txt" lists
   the files:

      49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/q172.png
      408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/q172.txt

   The application generates a random UUID v4
   "ff2d5a82-7142-4d3f-b8cc-3e662d6de756", which it adds to the bag
   metadata file "/gfs/bags/scan15/bag-info.txt":

      External-Identifier: ff2d5a82-7142-4d3f-b8cc-3e662d6de756

   It then generates arcp URIs for the files listed in the manifest:

      arcp://uuid,ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/q172.png
      arcp://uuid,ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/q172.txt

   When a different application on the same shared file system
   encounters these arcp URIs, it can match them to the correct bag
   folder by inspecting the "External-Identifier" metadata.

A.6.  Linked Data containers which are not on the web

   An application exposes in-memory objects of an Address Book as a
   Linked Data Platform container [W3C.REC-ldp-20150226], but addresses
   the container using arcp URIs instead of http URIs to avoid network
   exposure.

   The arcp URIs are used in conjunction with a generic LDP client
   library (developed for http), but connected to the application's URI
   resolution mechanism.

   The application generates a new random UUID v4
   "12f89f9c-e6ca-4032-ae73-46b68c2b415a" for the address book, and
   provides the corresponding arcp URI to the LDP client:

      arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/

   The LDP client resolves the container with content negotiation for
   the "text/turtle" media type, and receives:

      @base <arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/> .
      @prefix ldp: <http://www.w3.org/ns/ldp#> .
      @prefix dcterms: <http://purl.org/dc/terms/> .

      <>
          a ldp:BasicContainer;
          dcterms:title "Address book";
          ldp:contains <contact1>, <contact2> .

   The LDP client resolves the relative URIs to retrieve each of the
   contacts:

      arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/contact1
      arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/contact2

A.7.  Resolution of packaged resources

   A virtual file system driver on a mobile operating system has mounted
   several packaged applications for resolving common resources. An
   application requests the rendering framework to resolve a picture
   from "arcp://uuid,eb1edec9-d2eb-4736-a875-eb97b37c690e/img/logo.png"
   to show it within a user interface.

   The framework first checks that access to the authority
   "uuid,eb1edec9-d2eb-4736-a875-eb97b37c690e" is permitted by the Same
   Origin policies or permissions of the running application. It then
   matches the authority to the corresponding application package.

   The framework resolves "/img/logo.png" from within that package, and
   returns an image buffer it had already cached in memory.

Appendix B.  Acknowledgements

   This specification is inspired by two earlier URI scheme proposals
   from the W3C: "app" from [W3C.NOTE-app-uri-20150723] and "widget"
   from [W3C.NOTE-widgets-uri-20120313].

   The "app" URI scheme was used by packaged web apps in Mozilla's
   Firefox OS [FirefoxOS] and to identify resources in Research Object
   Bundles [ROBundle]. However, the W3C Notes did not progress further
   as W3C Recommendation track documents, and their URI schemes were
   never formally registered with IANA.

   While the focus of the previous proposals was to specify how to
   resolve resources from within a packaged application, this
   specification generalizes the URI scheme to support referencing and
   identifying resources within any archive, package or application,
   and adds flexibility in how resources can be resolved.

   The authors would like to thank Graham Klyne, Carsten Bormann, Roy T.
   Fielding, S Moonesamy and Julian Reschke for valuable feedback and
   suggestions.

Authors' Addresses

   Stian Soiland-Reyes
   The University of Manchester
   Oxford Road
   Manchester
   United Kingdom

   Email: stain@apache.org

   Marcos Caceres
   Mozilla Corporation

   Email: marcos@marcosc.com