idnits 2.17.1
draft-soilandreyes-arcp-03.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
No issues found here.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (February 05, 2018) is 2264 days in the past. Is this
intentional?
Checking references for intended status: Informational
----------------------------------------------------------------------------
-- Looks like a reference, but probably isn't: '1' on line 642
-- Looks like a reference, but probably isn't: '2' on line 644
-- Looks like a reference, but probably isn't: '3' on line 646
-- Looks like a reference, but probably isn't: '4' on line 648
-- Looks like a reference, but probably isn't: '5' on line 650
-- Looks like a reference, but probably isn't: '6' on line 652
-- Looks like a reference, but probably isn't: '7' on line 654
** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)
** Obsolete normative reference: RFC 5785 (Obsoleted by RFC 8615)
== Outdated reference: A later version (-17) exists of draft-kunze-bagit-14
Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group S. Soiland-Reyes
3 Internet-Draft The University of Manchester
4 Intended status: Informational M. Caceres
5 Expires: August 9, 2018 Mozilla Corporation
6 February 05, 2018
8 The Archive and Package (arcp) URI scheme
9 draft-soilandreyes-arcp-03
11 Abstract
13 This specification defines the Archive and Package URI scheme "arcp".
15 arcp URIs can be used to consume or reference hypermedia resources
16 bundled inside a file archive or an application package, as well as
17 to resolve URIs for archive resources within a programmatic
18 framework.
20 This URI scheme provides mechanisms to generate a unique base URI to
21 represent the root of the archive, so that relative URI references in
22 a bundled resource can be resolved within the archive without having
23 to extract the archive content on the local file system.
25 An arcp URI can be used for purposes of isolation (e.g. when
26 consuming multiple archives), security constraints (avoiding "climb
27 out" from the archive), or for externally identifying sub-resources
28 referenced by hypermedia formats.
30 Status of This Memo
32 This Internet-Draft is submitted in full conformance with the
33 provisions of BCP 78 and BCP 79.
35 Internet-Drafts are working documents of the Internet Engineering
36 Task Force (IETF). Note that other groups may also distribute
37 working documents as Internet-Drafts. The list of current Internet-
38 Drafts is at https://datatracker.ietf.org/drafts/current/.
40 Internet-Drafts are draft documents valid for a maximum of six months
41 and may be updated, replaced, or obsoleted by other documents at any
42 time. It is inappropriate to use Internet-Drafts as reference
43 material or to cite them other than as "work in progress."
45 This Internet-Draft will expire on August 9, 2018.
47 Copyright Notice
49 Copyright (c) 2018 IETF Trust and the persons identified as the
50 document authors. All rights reserved.
52 This document is subject to BCP 78 and the IETF Trust's Legal
53 Provisions Relating to IETF Documents
54 (https://trustee.ietf.org/license-info) in effect on the date of
55 publication of this document. Please review these documents
56 carefully, as they describe your rights and restrictions with respect
57 to this document. Code Components extracted from this document must
58 include Simplified BSD License text as described in Section 4.e of
59 the Trust Legal Provisions and are provided without warranty as
60 described in the Simplified BSD License.
62 Table of Contents
64 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
65 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4
66 3. Scheme syntax . . . . . . . . . . . . . . . . . . . . . . . . 4
67 3.1. Authority . . . . . . . . . . . . . . . . . . . . . . . . 5
68 3.2. Path . . . . . . . . . . . . . . . . . . . . . . . . . . 5
69 4. Scheme semantics . . . . . . . . . . . . . . . . . . . . . . 6
70 4.1. Authority semantics . . . . . . . . . . . . . . . . . . . 6
71 4.2. Path semantics . . . . . . . . . . . . . . . . . . . . . 7
72 4.3. Resolution protocol . . . . . . . . . . . . . . . . . . . 8
73 4.4. Resolving from a .well-known endpoint . . . . . . . . . . 9
74 5. Encoding considerations . . . . . . . . . . . . . . . . . . . 9
75 6. Interoperability considerations . . . . . . . . . . . . . . . 10
76 7. Security Considerations . . . . . . . . . . . . . . . . . . . 10
77 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
78 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 12
79 9.1. Normative References . . . . . . . . . . . . . . . . . . 12
80 9.2. Informative References . . . . . . . . . . . . . . . . . 13
81 9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 14
82 Appendix A. Examples . . . . . . . . . . . . . . . . . . . . . . 15
83 A.1. Sandboxing base URI . . . . . . . . . . . . . . . . . . . 15
84 A.2. Location-based . . . . . . . . . . . . . . . . . . . . . 15
85 A.3. Hash-based . . . . . . . . . . . . . . . . . . . . . . . 16
86 A.4. Archives that are not files . . . . . . . . . . . . . . . 17
87 A.5. Linked Data containers which are not on the web . . . . . 18
88 A.6. Resolution of packaged resources . . . . . . . . . . . . 18
89 A.7. Sharing using app names . . . . . . . . . . . . . . . . . 19
90 Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 20
91 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21
93 1. Introduction
95 Mobile and Web Applications may bundle resources such as stylesheets
96 with _relative URI references_ [RFC3986] (4.2 [1]) to scripts, images
97 and fonts. Resolving and parsing such resources within URI handling
98 frameworks may require generating absolute URIs and applying Same-
99 Origin [RFC6454] security policies separately for each app.
101 Software that is accessing resources bundled inside an archive (e.g.
102 "zip" or "tar.gz" file) can struggle to consume hypermedia content
103 types that use relative URI references such as "../css/", as it is
104 challenging to establish the _base URI_ [RFC3986] (5.1 [2]) in a
105 consistent fashion.
107 Frequently the archive might be unpacked locally, implying base URIs
108 like "file:///tmp/a1b27ae03865/" to represent the root of the
109 archive. Such URIs are temporary, might not be globally unique, and
110 could be vulnerable to attacks such as "climbing out" of the root
111 directory.
113 An archive containing multiple HTML or Linked Data resources, such as
114 in a BagIt archive [I-D.draft-kunze-bagit-14], may be using relative
115 URIs to cross-reference constituent files, making it challenging to
116 index or annotate such resources.
118 Consumption of an archive with a consistent base URI should be
119 possible no matter from which location it was retrieved, on which
120 device it is inspected, and with which mechanism the archive is
121 accessed (e.g. virtual file system).
123 When consuming multiple archives from untrusted sources it would be
124 beneficial to have a Same Origin policy [RFC6454] so that relative
125 hyperlinks can't escape the particular archive.
127 The "file:" URI scheme [RFC8089] can be ill-suited for purposes such
128 as above, while a location-independent URI scheme can be more
129 flexible, secure and globally unique.
131 This specification defines the Archive and Package URI scheme "arcp"
132 as an alternative way to address resources within an archive,
133 application or package.
135 For the purpose of this specification, an *archive* is a collection
136 of sub-resources addressable by name or path. This definition covers
137 typical archive file formats like ".zip" or ".tar.gz" and derived
138 "+zip" media types [RFC6839], but also non-file resource packages
139 like an LDP Container [W3C.REC-ldp-20150226], an installed Web App
141 [W3C.WD-appmanifest-20180118], or a BagIt folder structure
142 [I-D.draft-kunze-bagit-14].
144 For brevity, the term _archive_ is used throughout this
145 specification, although from the above it can also mean a
146 _container_, _application_, _aggregation_ or _package_.
148 The main purpose of arcp URIs is to provide consistent identifiers as
149 absolute URIs for nested resources. This specification does not
150 define a new network protocol; however, it suggests an abstract
151 resolution protocol that implementations can apply using existing
152 protocols or programming frameworks.
154 2. Requirements Language
156 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
157 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
158 document are to be interpreted as described in [RFC2119].
160 3. Scheme syntax
162 The "arcp" URI scheme follows the [RFC3986] syntax for hierarchical
163 URIs according to the following productions:
165 arcp-URI = arcp-scheme ":" arcp-specific [ "#" fragment ]
167 arcp-scheme = "arcp"
169 arcp-specific = "//" arcp-authority [ path-absolute ] [ "?" query ]
171 The "arcp-authority" component provides a unique identifier for the
172 opened archive. See Section 3.1 for details.
174 The "path-absolute" component provides the absolute path of a
175 resource (e.g. a file or directory) within the archive. See
176 Section 3.2 for details.
178 The "query" component MAY be used, but its semantics are undefined by
179 this specification.
181 The "fragment" component MAY be used by implementations according to
182 [RFC3986] and the implied media type [RFC2046] of the resource at the
183 path. This specification does not specify how to determine the media
184 type.
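As a non-normative illustration of the productions above, the following Python sketch splits an arcp URI into its components. It relies on the standard library's urllib.parse.urlsplit, which splits a "//" authority for any scheme. The names parse_arcp and ArcpParts are hypothetical. The sketch checks only the coarse structure and does not validate the authority or path productions.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ArcpParts(NamedTuple):
    authority: str
    path: str
    query: str
    fragment: str


def parse_arcp(uri: str) -> ArcpParts:
    """Split an arcp URI into its components (structure only)."""
    parts = urlsplit(uri)
    # urlsplit() lower-cases the scheme, so "ARCP://..." is accepted too.
    if parts.scheme != "arcp":
        raise ValueError(f"not an arcp URI: {uri!r}")
    # arcp-specific = "//" arcp-authority [ path-absolute ] [ "?" query ]
    if not uri[len("arcp:"):].startswith("//"):
        raise ValueError("an arcp URI must contain a '//' authority")
    # path-absolute may not begin with "//".
    if parts.path.startswith("//"):
        raise ValueError("path-absolute must not begin with '//'")
    return ArcpParts(parts.netloc, parts.path, parts.query, parts.fragment)
```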
186 3.1. Authority
188 The purpose of the "authority" component in an arcp URI is to build a
189 unique identifier for a particular archive. The authority is not
190 intended to be resolvable without prior knowledge of the archive.
192 The authority of an arcp URI MUST be valid according to these
193 productions:
195 arcp-authority = uuid / ni / name / authority
196 uuid = "uuid," UUID
197 ni = "ni," alg-val
198 name = "name," reg-name
200 1. The prefix "uuid," combines with the "UUID" production as defined
201 in [RFC4122], e.g. "uuid,2a47c495-ac70-4ed1-850b-8800a57618cf"
203 2. The prefix "ni," combines with the "alg-val" production as
204 defined in [RFC6920], e.g.
205 "ni,sha-256;JCS7yveugE3UaZiHCs1XpRVfSHaewxAKka0o5q2osg8"
207 3. The prefix "name," combines with the "reg-name" production as
208 defined in [RFC3986], e.g. "name,app.example.com". For arcp IRIs
209 [RFC3987], its "ireg-name" production applies instead of
210 "reg-name".
212 4. The production "authority" matches its definition in [RFC3986],
213 or "iauthority" for arcp IRIs [RFC3987]. As this production
214 necessarily also matches the prefixed productions above, those
215 should be considered first before falling back to this production.
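The ordering rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The regular expressions are hand-written approximations of the UUID [RFC4122], "alg-val" [RFC6920] and "reg-name" [RFC3986] productions. The function name classify_authority is hypothetical. A prefixed value that fails its production (e.g. "uuid,not-a-uuid") falls back to the generic "authority" production, whose uniqueness properties are unspecified.

```python
import re

UNRESERVED = r"A-Za-z0-9\-._~"
UUID_RE = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")
ALG_VAL_RE = re.compile(rf"[{UNRESERVED}]+;[{UNRESERVED}]+")
REG_NAME_RE = re.compile(rf"(?:[{UNRESERVED}!$&'()*+,;=]|%[0-9A-Fa-f]{{2}})*")


def classify_authority(authority: str) -> tuple:
    """Return (kind, value), kind being 'uuid', 'ni', 'name' or 'authority'."""
    prefixed = (("uuid,", UUID_RE), ("ni,", ALG_VAL_RE), ("name,", REG_NAME_RE))
    # Prefixed productions are tried first, because each of them would
    # also match the generic RFC 3986 "authority" production.
    for prefix, pattern in prefixed:
        if authority.startswith(prefix):
            value = authority[len(prefix):]
            if pattern.fullmatch(value):
                return prefix[:-1], value
    return "authority", authority
```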
217 3.2. Path
219 The "path-absolute" component, if present, MUST match the production
220 in [RFC3986], or "ipath-absolute" for arcp IRIs [RFC3987]. This
221 provides the absolute path of a resource (e.g. a file or directory)
222 within the archive.
224 Archive media types vary in constraints and possibilities on how to
225 express paths; however, implementations SHOULD use "/" as path
226 separator for nested folders and files.
228 It is RECOMMENDED to include the trailing "/" if it is known that the
229 path represents a directory.
231 4. Scheme semantics
233 This specification does not constrain what format might constitute an
234 _archive_, and neither does it require that the archive is
235 retrievable as a single bytestream or file.
237 Examples of retrievable archive media types include "application/zip",
238 "application/vnd.android.package-archive", "application/x-tar",
239 "application/x-gtar" and "application/x-7z-compressed".
241 Examples of non-file archives include an LDP Container
242 [W3C.REC-ldp-20150226], an installed Web App
243 [W3C.WD-appmanifest-20180118], or a BagIt folder structure
244 [I-D.draft-kunze-bagit-14].
246 4.1. Authority semantics
248 The _authority_ component identifies the archive itself.
250 Implementations MAY assume that two arcp URIs with the same authority
251 component relate to resources within the same archive, subject to
252 limitations explained in this section.
254 The authority prefix, if present, helps to inform consumers what
255 uniqueness constraints have been used when identifying the archive,
256 without necessarily providing access to the archive.
258 1. If the prefix is "uuid," followed by a UUID [RFC4122], this
259 indicates a unique archive identity. Applications MAY assume
260 that the corresponding "urn:uuid:" URI identifies the archive.
262 2. If the prefix is "uuid," followed by a v4 UUID [RFC4122] (4.4
263 [3]), this indicates uniqueness based on a random number
264 generator.
265 Implementations creating random-based authorities SHOULD generate
266 the v4 random UUID using a suitable random number generator
267 [RFC4086].
269 3. If the prefix is "uuid," followed by a v5 name-based UUID
270 [RFC4122] (4.3 [4]), this indicates uniqueness based on an
271 existing archive location, typically a URL.
272 Implementations creating location-based authorities SHOULD
273 generate the v5 UUID using the URL namespace
274 "6ba7b811-9dad-11d1-80b4-00c04fd430c8" and a retrievable archive URL. Note
275 that while implementations cannot resolve which location was
276 used, they can confirm the name-based UUID if the location is
277 otherwise known.
279 4. If the prefix is "ni," this indicates a unique archive identity
280 based on a hash of the archive's bytestream or content.
281 Implementations MAY assume that resources within an "ni" arcp
282 URI remain static, although the implementation may use content
283 negotiation or similar transformations.
284 The checksum MUST be expressed according to the "alg-val"
285 production in [RFC6920] (3 [5]). Implementations creating hash-
286 based authorities from an archive's bytestream SHOULD use the
287 hash method "sha-256" without truncation. Implementations MAY
288 assume that the corresponding "ni:" URI identifies the archive.
290 5. If the prefix is "name," this indicates that the authority is an
291 application or package name, typically as installed on a device
292 or system.
293 Implementations SHOULD assume that an unrecognised "name"
294 authority is only unique within a particular installation, but
295 MAY assume further uniqueness guarantees for names under their
296 control.
297 It is RECOMMENDED that implementations creating name-based
298 authorities use DNS names under their control, for instance an
299 app installed as "app.example.com" can make an authority
300 "name,app.example.com" to refer to its packaged resources, or
301 "name,foo.app.example.com" to refer to its dynamic container of
302 "foo" resources.
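The authority-creation rules above can be sketched with the Python standard library. This is a non-normative sketch. The function names are hypothetical. Python's uuid.NAMESPACE_URL is the namespace "6ba7b811-9dad-11d1-80b4-00c04fd430c8" named above. CPython's uuid4() draws its bits from os.urandom(), which is assumed here to be a suitable random source in the sense of [RFC4086].

```python
import base64
import hashlib
import uuid


def random_authority() -> str:
    """Random-based authority from a v4 UUID (uuid4() uses os.urandom())."""
    return f"uuid,{uuid.uuid4()}"


def location_authority(archive_url: str) -> str:
    """Location-based authority: v5 UUID in the URL namespace."""
    return f"uuid,{uuid.uuid5(uuid.NAMESPACE_URL, archive_url)}"


def hash_authority(archive_bytes: bytes) -> str:
    """Hash-based authority over the archive bytestream, untruncated sha-256."""
    digest = hashlib.sha256(archive_bytes).digest()
    # RFC 6920 values are base64url-encoded with the "=" padding removed.
    val = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni,sha-256;{val}"


def name_authority(dns_name: str) -> str:
    """Name-based authority, ideally a DNS name under the creator's control."""
    return f"name,{dns_name}"
```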
304 The uniqueness properties are *unspecified* for arcp URIs whose
305 authority does not match any of the prefixes defined in this
306 specification.
308 4.2. Path semantics
310 The _path_ component of an arcp URI identifies individual resources
311 within a particular archive, typically a _directory_ or _file_.
313 o If the _path_ is "/", then the arcp URI represents the archive
315 itself, typically represented as a root directory or collection.
317 o If the path ends with "/" then the path represents a directory or
318 collection.
320 The arcp URIs can be used for uniquely identifying resources within
321 an archive, such as in an information system considering multiple
322 archives.
324 Assuming an appropriate mechanism which has knowledge of the
325 corresponding archive, an arcp URI can also be used for resolution.
327 Some archive formats might permit resources with the same (duplicate)
328 path, in which case this specification does not define which
329 particular entry is described.
331 4.3. Resolution protocol
333 This specification does not define a network protocol to resolve
334 resources according to the arcp URI scheme. For instance, one
335 implementation might rewrite arcp URIs to localized paths in a
336 temporary directory, while another implementation might use an
337 embedded HTTP server.
339 It is envisioned that an implementation will have accessed an archive
340 in advance, and assigned it an appropriate authority according to
341 Section 3.1. Such an implementation can then resolve arcp URIs, e.g.
342 by using in-memory archive access or mapping arcp paths to the
343 local file system.
345 Implementations that support resolving arcp URIs SHOULD:
347 1. Fail with the equivalent of _Not Found_ if the authority is
348 unknown.
350 2. Fail with the equivalent of _Gone_ if the authority is known, but
351 the content of the archive is no longer available.
353 3. Fail with the equivalent of _Not Found_ if the path does not map
354 to a resource within the archive.
356 4. Return the corresponding (potentially uncompressed) bytestream if
357 the path maps to a file within the archive.
359 5. Return an appropriate directory listing if the path maps to a
360 directory within the archive.
362 6. Return an appropriate directory listing of the archive's root
363 directory if the path is "/".
365 Implementations MAY support other ways to resolve arcp URIs, e.g.
366 query parameters or content negotiation.
368 Not all archive formats or implementations will have the concept of a
369 directory listing, in which case the implementation MAY fail such
370 resolutions with the equivalent of "Not Implemented".
372 This specification does not define how an implementation can
373 determine the media type of a file within an archive. This could be
374 expressed in secondary resources (such as a manifest), or be
375 determined by file extensions or magic bytes.
377 The media type "text/uri-list" [RFC2483] MAY be used to represent a
378 directory listing, in which case it SHOULD contain only URIs with the
379 arcp URI of the directory as a common base.
381 Some archive formats might support resources which are neither
382 directories nor regular files (e.g. device files, symbolic links).
383 This specification does not define the semantics of attempting to
384 resolve such resources.
386 This specification does not define how to change an archive or its
387 content using arcp URIs.
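The resolution steps listed earlier in this section can be sketched as an in-memory resolver over ZIP archives that have already been assigned authorities. This is a non-normative sketch. The function name resolve_in_zip is hypothetical, and the HTTP status numbers only stand in for "Not Found" (404) and "Gone" (410). Media types are guessed from file extensions, which is one of the options mentioned above. Directory listings use "text/uri-list" [RFC2483] with CRLF line breaks and also include implicit directories that appear only as prefixes of member names.

```python
import mimetypes
import zipfile
from urllib.parse import quote, unquote


def resolve_in_zip(archives: dict, authority: str, path: str):
    """Return (status, media_type, body) for an arcp authority and path.

    `archives` maps each known authority to an open zipfile.ZipFile, or to
    None when the archive is known but its content is no longer available.
    """
    path = path or "/"                          # treat a missing path as the root
    if authority not in archives:
        return 404, None, b""                   # 1. unknown authority
    zf = archives[authority]
    if zf is None:
        return 410, None, b""                   # 2. known, but content gone
    name = unquote(path).lstrip("/")            # ZIP member names are relative
    names = zf.namelist()
    if not path.endswith("/"):
        if name not in names:
            return 404, None, b""               # 3. no such resource
        media_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        return 200, media_type, zf.read(name)   # 4. file content
    # 5./6. Directory listing of the immediate children of `name`.
    children = set()
    for member in names:
        if member.startswith(name) and member != name:
            head, sep, _ = member[len(name):].partition("/")
            children.add(name + head + sep)     # keep "/" on subdirectories
    if name and not children and name not in names:
        return 404, None, b""
    lines = [f"arcp://{authority}/{quote(child)}" for child in sorted(children)]
    return 200, "text/uri-list", "\r\n".join(lines).encode("utf-8")
```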
389 4.4. Resolving from a .well-known endpoint
391 If the "authority" component of an arcp URI matches the "ni"
392 production, an application MAY assume that the corresponding "ni:///" or
393 "nih:" URIs [RFC6920] identify the archive bytestream or content.
395 Applications MAY attempt to retrieve the corresponding archive from
396 any ".well-known/ni/" endpoint [RFC5785] as specified in [RFC6920] (4
397 [6]). Applications SHOULD verify the checksum of the retrieved
398 archive before resolving individual arcp paths.
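The mapping from an "alg-val" to an "ni" URI and to a ".well-known/ni/" URL, followed by checksum verification of the retrieved bytes, can be sketched as below. This is a non-normative sketch with hypothetical function names. It supports only "sha-256" and performs no network access, so the caller is assumed to fetch the URL with an HTTP client of its choice.

```python
import base64
import hashlib
from urllib.parse import urlunsplit


def split_alg_val(alg_val: str):
    alg, sep, val = alg_val.partition(";")
    if not (alg and sep and val):
        raise ValueError(f"not an alg-val: {alg_val!r}")
    return alg, val


def ni_uri(alg_val: str) -> str:
    """The "ni" URI [RFC6920] with an empty authority for the same hash."""
    return f"ni:///{alg_val}"


def well_known_url(scheme: str, host: str, alg_val: str) -> str:
    """The RFC 6920 .well-known form: /.well-known/ni/<alg>/<val>."""
    alg, val = split_alg_val(alg_val)
    return urlunsplit((scheme, host, f"/.well-known/ni/{alg}/{val}", "", ""))


def verify_sha256(archive_bytes: bytes, alg_val: str) -> bool:
    """Check retrieved archive bytes against the authority before use."""
    alg, val = split_alg_val(alg_val)
    if alg != "sha-256":
        raise ValueError(f"unsupported hash algorithm: {alg}")
    digest = hashlib.sha256(archive_bytes).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii") == val
```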
400 5. Encoding considerations
402 The productions for "uuid" and "ni" are restricted to URI-safe ASCII
403 and should not require any encoding considerations.
405 When arcp is used in IRIs [RFC3987], the "name" production permits
406 Unicode characters corresponding to its "ireg-name" production.
408 Care should be taken to %-encode the directory and file segments of
409 "path-absolute" according to [RFC3986] for URIs or "ipath-absolute"
410 [RFC3987] for IRIs.
412 Not all archive formats have an explicit character encoding specified
413 for their paths. If no such information is available for the archive
414 format, implementations MAY assume that the path component is encoded
415 with UTF-8 [RFC2279].
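As an illustration of these encoding rules, the sketch below turns a raw archive member name into a percent-encoded "path-absolute". It decodes the name as UTF-8 when the format gives no other information, then encodes each segment separately. The function name member_to_path is hypothetical. Sub-delimiters, ":" and "@" are left literal because RFC 3986 does not treat percent-encoded reserved characters as equivalent to their literal forms. Note that Python's zipfile decodes names without the ZIP UTF-8 flag as CP437 unless Python 3.11's metadata_encoding argument is given, which is why this sketch starts from bytes.

```python
from urllib.parse import quote

# Characters allowed literally in a path segment besides unreserved ones:
# RFC 3986 sub-delims plus ":" and "@". "/" is never safe inside a segment.
PCHAR_SAFE = "!$&'()*+,;=:@"


def member_to_path(member: bytes, encoding: str = "utf-8") -> str:
    """Turn a raw archive member name into an arcp path-absolute."""
    name = member.decode(encoding)              # assume UTF-8 if unspecified
    segments = name.split("/")                  # "/" separates nested folders
    return "/" + "/".join(quote(seg, safe=PCHAR_SAFE) for seg in segments)
```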
417 Some archive formats have case-insensitive paths, in which case it
418 is RECOMMENDED to preserve the casing as expressed in the archive.
420 6. Interoperability considerations
422 As multiple authorities are possible for the same archive
423 (Section 3.1), and path interpretation might vary, there can be
424 interoperability challenges when exchanging arcp URIs between
425 implementations. Some considerations:
427 1. Two implementations describe the same archive (e.g. stored in the
428 same local file path), but using different random-based UUID
429 authorities. The implementations may need to detect equality of
430 the two UUIDs out of band.
432 2. Two implementations describe an archive retrieved from the same
433 URL, with the same location-based UUID authority, but retrieved
434 at different times. The implementations might disagree about the
435 content of the archive.
437 3. Two implementations describe an archive retrieved from the same
438 URL, with the same location-based UUID authority, but retrieved
439 using different content negotiation resulting in different
440 archive formats. The implementations may disagree about path
441 encoding, file name casing or hierarchy.
443 4. Two implementations describe the same archive bytestream using
444 the hash-based authority, but they have used two different hash
445 algorithms. The implementations may need to negotiate a
446 common hash algorithm.
448 5. Two implementations access the same archive, which contains file
449 paths with Unicode characters, but extracted to two different
450 file systems. Limitations and conventions for file names in the
451 local file system (such as Unicode normalization, case
452 insensitivity, total path length) may result in the
453 implementations having inconsistent or inaccessible paths.
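For consideration 5 above, an implementation could detect in advance which paths would collide on a file system that normalizes Unicode and ignores case. This is a non-normative sketch. It uses the Unicode "canonical caseless match" key NFD(casefold(NFD(x))) as an assumed model of such a file system, and the function name colliding_paths is hypothetical. Real file systems differ in their exact rules.

```python
import unicodedata
from collections import defaultdict


def colliding_paths(paths):
    """Group paths that are equal under canonical caseless matching."""
    groups = defaultdict(list)
    for p in paths:
        key = unicodedata.normalize("NFD", unicodedata.normalize("NFD", p).casefold())
        groups[key].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```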
455 7. Security Considerations
457 As when handling any content, extra care should be taken when
458 consuming archives and arcp URIs from unknown sources.
460 Archives might contain malicious or inappropriate content or file
461 paths.
463 An archive could contain compressed files that expand to fill all
464 available disk space.
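One mitigation is to cap the number of decompressed bytes while reading, instead of trusting the sizes declared in the archive's directory. The following non-normative sketch does this for a ZIP member. The names read_member_limited and ArchiveTooLarge are hypothetical, and so is the 64 KiB chunk size.

```python
import zipfile


class ArchiveTooLarge(Exception):
    """Raised when a member decompresses beyond the allowed size."""


def read_member_limited(zf: zipfile.ZipFile, name: str, max_bytes: int) -> bytes:
    """Read one member, counting the bytes actually produced by decompression."""
    chunks, total = [], 0
    with zf.open(name) as stream:
        while True:
            chunk = stream.read(64 * 1024)
            if not chunk:
                break
            total += len(chunk)
            if total > max_bytes:
                raise ArchiveTooLarge(f"{name} expands beyond {max_bytes} bytes")
            chunks.append(chunk)
    return b"".join(chunks)
```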
466 A maliciously crafted archive could contain paths with characters
467 (e.g. backspace) which could make an arcp URI invalid or misleading
468 if used unescaped.
470 A maliciously crafted archive could contain paths with character
471 combinations (e.g. combined Unicode sequences, text orientation
472 change) that cause the arcp URI to be very long or disruptive when
473 rendered in a user interface.
475 An archive might contain symbolic links that, if extracted to a local
476 file system, might address files outside the archive's directory
477 structure. Implementations SHOULD detect such links and prevent
478 outside access.
480 A maliciously crafted arcp URI might contain "../" path segments,
481 which if naively converted to a "file:///" URI might address files
482 outside the archive's directory structure. Implementations SHOULD
483 perform Path Segment Normalization [RFC3986] before converting arcp
484 URIs.
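In addition to Path Segment Normalization, an implementation that maps arcp paths onto an extraction directory can check where each resulting path ends up after all "../" segments and symbolic links have been resolved. This is a non-normative sketch. The function name arcp_path_to_local is hypothetical, and it assumes a POSIX-style extraction directory.

```python
from pathlib import Path
from urllib.parse import unquote


def arcp_path_to_local(root: Path, arcp_path: str) -> Path:
    """Map an arcp path-absolute to a file under the extraction root.

    Path.resolve() follows symbolic links and collapses "..", so the
    containment check also catches links that point outside the archive.
    """
    root = root.resolve()
    candidate = (root / unquote(arcp_path).lstrip("/")).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes archive root: {arcp_path!r}")
    return candidate
```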
486 In particular for IRIs, an archive might contain multiple paths with
487 similar-looking characters or with different Unicode combining
488 sequences, which could be used to mislead users.
490 A URI hyperlink might use or guess an arcp URI authority to attempt
491 to climb into a different archive for malicious purposes.
492 Applications SHOULD employ Same Origin policy [RFC6454] checks if
493 resolving cross-references is not desired.
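A minimal same-archive check could compare scheme and authority before following a resolved hyperlink. This is a non-normative sketch with a hypothetical function name. It compares authorities exactly, which is deliberately conservative: for example, upper-case and lower-case spellings of the same UUID are treated as different archives.

```python
from urllib.parse import urlsplit


def same_archive(base_uri: str, target_uri: str) -> bool:
    """True only if target_uri is an arcp URI in the same archive as base_uri."""
    base, target = urlsplit(base_uri), urlsplit(target_uri)
    return base.scheme == target.scheme == "arcp" and base.netloc == target.netloc
```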
495 While a UUID or hash-based authority provides some level of
496 information hiding of an archive's origin, this should not be relied
497 upon for access control or anonymisation. Implementors should keep
498 in mind that such authority components in many cases can be
499 predictably generated by third-parties, for instance using dictionary
500 attacks.
502 8. IANA Considerations
504 This specification requests that IANA registers the following URI
505 scheme according to the provisions of [RFC7595].
507 Scheme name: arcp
509 Status: provisional
511 Applications/protocols that use this protocol: Hypermedia-consuming
512 applications that handle archives or packages.
514 Contact: Stian Soiland-Reyes stain@apache.org [7]
516 Change controller: Stian Soiland-Reyes
518 9. References
520 9.1. Normative References
522 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
523 Requirement Levels", BCP 14, RFC 2119,
524 DOI 10.17487/RFC2119, March 1997.
527 [RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO
528 10646", RFC 2279, DOI 10.17487/RFC2279, January 1998.
531 [RFC2483] Mealling, M. and R. Daniel, "URI Resolution Services
532 Necessary for URN Resolution", RFC 2483,
533 DOI 10.17487/RFC2483, January 1999.
536 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
537 Resource Identifier (URI): Generic Syntax", STD 66,
538 RFC 3986, DOI 10.17487/RFC3986, January 2005.
541 [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
542 Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
543 January 2005.
545 [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker,
546 "Randomness Requirements for Security", BCP 106, RFC 4086,
547 DOI 10.17487/RFC4086, June 2005.
550 [RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally
551 Unique IDentifier (UUID) URN Namespace", RFC 4122,
552 DOI 10.17487/RFC4122, July 2005.
555 [RFC5785] Nottingham, M. and E. Hammer-Lahav, "Defining Well-Known
556 Uniform Resource Identifiers (URIs)", RFC 5785,
557 DOI 10.17487/RFC5785, April 2010.
560 [RFC6454] Barth, A., "The Web Origin Concept", RFC 6454,
561 DOI 10.17487/RFC6454, December 2011.
564 [RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
565 Keranen, A., and P. Hallam-Baker, "Naming Things with
566 Hashes", RFC 6920, DOI 10.17487/RFC6920, April 2013.
569 [RFC7595] Thaler, D., Ed., Hansen, T., and T. Hardie, "Guidelines
570 and Registration Procedures for URI Schemes", BCP 35,
571 RFC 7595, DOI 10.17487/RFC7595, June 2015.
574 9.2. Informative References
576 [FirefoxOS]
577 Mozilla Firefox, "Firefox OS security overview",
578 MDN Mozilla Developer Network Web Docs, February 2017.
583 [I-D.draft-kunze-bagit-14]
584 Kunze, J., Littman, J., Madden, L., Summers, E., Boyko,
585 A., and B. Vargas, "The BagIt File Packaging Format
586 (V0.97)", draft-kunze-bagit-14 (work in progress), October
587 2016.
589 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
590 Extensions (MIME) Part Two: Media Types", RFC 2046,
591 DOI 10.17487/RFC2046, November 1996.
594 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
595 Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.
598 [RFC6570] Gregorio, J., Fielding, R., Hadley, M., Nottingham, M.,
599 and D. Orchard, "URI Template", RFC 6570,
600 DOI 10.17487/RFC6570, March 2012.
603 [RFC6839] Hansen, T. and A. Melnikov, "Additional Media Type
604 Structured Syntax Suffixes", RFC 6839,
605 DOI 10.17487/RFC6839, January 2013.
608 [RFC8089] Kerwin, M., "The "file" URI Scheme", RFC 8089,
609 DOI 10.17487/RFC8089, February 2017.
612 [ROBundle]
613 Soiland-Reyes, S., Gamble, M., and R. Haines, "Research
614 Object Bundle 1.0", Zenodo report,
615 DOI 10.5281/zenodo.12586, November 2014.
618 [W3C.NOTE-app-uri-20150723]
619 Caceres, M., "The app: URL Scheme", World Wide Web
620 Consortium NOTE NOTE-app-uri-20150723, July 2015.
623 [W3C.NOTE-widgets-uri-20120313]
624 Caceres, M., "Widget URI scheme", World Wide Web
625 Consortium NOTE NOTE-widgets-uri-20120313, March 2012.
628 [W3C.REC-ldp-20150226]
629 Speicher, S., Arwe, J., and A. Malhotra, "Linked Data
630 Platform 1.0", World Wide Web Consortium Recommendation
631 REC-ldp-20150226, February 2015.
634 [W3C.WD-appmanifest-20180118]
635 Caceres, M., Christiansen, K., Lamouri, M., Kostiainen,
636 A., and R. Dolin, "Web App Manifest", World Wide Web
637 Consortium WD WD-appmanifest-20180118, January 2018.
640 9.3. URIs
642 [1] https://tools.ietf.org/html/rfc3986#section-4.2
644 [2] https://tools.ietf.org/html/rfc3986#section-5.1
646 [3] https://tools.ietf.org/html/rfc4122#section-4.4
648 [4] https://tools.ietf.org/html/rfc4122#section-4.3
650 [5] https://tools.ietf.org/search/rfc6920#section-3
652 [6] https://tools.ietf.org/html/rfc6920#section-4
654 [7] mailto:stain@apache.org
656 Appendix A. Examples
658 A.1. Sandboxing base URI
660 A document store application has received a file "document.tar.gz"
661 whose content will be checked for consistency.
663 For sandboxing purposes it generates a UUID v4
664 "32a423d6-52ab-47e3-a9cd-54f418a48571" using a pseudo-random
665 generator. The arcp base URI is thus:
667 arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/
669 The archive contains the files:
671 o "./doc.html" which links to "css/base.css"
673 o "./css/base.css" which links to "../fonts/Foo.woff"
675 o "./fonts/Foo.woff"
677 The application generates the corresponding arcp URIs and uses those
678 for URI resolutions to list resources and their hyperlinks:
680 arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/doc.html
681 -> arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css
682 arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/css/base.css
683 -> arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/fonts/Foo.woff
684 arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/fonts/Foo.woff
686 The application is now confident that all hyperlinked files are
687 indeed present in the archive. In its database it notes which
688 "tar.gz" file corresponds to UUID
689 "32a423d6-52ab-47e3-a9cd-54f418a48571".
691 If the application had encountered a malicious hyperlink
692 "../../../outside.txt", it would first resolve it to the absolute URI
693 "arcp://uuid,32a423d6-52ab-47e3-a9cd-54f418a48571/outside.txt" and
694 conclude from the "Not Found" error that the path "/outside.txt" was
695 not present in the archive.
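The link resolution in this example can be reproduced with RFC 3986 reference resolution (section 5.2). This matters in practice because Python's urllib.parse.urljoin() only resolves schemes listed in urllib.parse.uses_relative, and that list does not include "arcp". The sketch below therefore implements "remove_dot_segments" (section 5.2.4) and the reference-resolution steps directly. It is non-normative. A known limitation is that urlsplit() cannot distinguish an empty query from an absent one.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """RFC 3986, section 5.2.4."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./") or path == "/.":
            path = "/" + path[3:]
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()                      # drop the last output segment
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with any leading "/") to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def resolve(base: str, ref: str) -> str:
    """Resolve `ref` against an absolute arcp `base` (RFC 3986, 5.2.2)."""
    b, r = urlsplit(base), urlsplit(ref)
    if r.scheme:
        return urlunsplit((r.scheme, r.netloc, remove_dot_segments(r.path), r.query, r.fragment))
    if r.netloc:
        return urlunsplit((b.scheme, r.netloc, remove_dot_segments(r.path), r.query, r.fragment))
    if not r.path:
        path, query = b.path, r.query or b.query
    elif r.path.startswith("/"):
        path, query = remove_dot_segments(r.path), r.query
    else:
        # Merge: replace the last segment of the base path with `ref`.
        merged = b.path[: b.path.rfind("/") + 1] + r.path if b.path else "/" + r.path
        path, query = remove_dot_segments(merged), r.query
    return urlunsplit((b.scheme, b.netloc, path, query, r.fragment))
```

With these definitions, the malicious hyperlink from "doc.html" resolves to the "/outside.txt" path inside the same archive. Looking up that path then yields the "Not Found" result described above.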
697 A.2. Location-based
699 A web crawler is about to index the content of the URL
700 "http://example.com/data.zip" and needs to generate absolute URIs as
701 it continues crawling inside the individual resources of the archive.
703 The application generates a UUID v5 based on the URL namespace
704 "6ba7b811-9dad-11d1-80b4-00c04fd430c8" and the URL to the zip file:
706 >>> uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/data.zip")
707 UUID('b7749d0b-0e47-5fc4-999d-f154abe68065')
709 Thus the location-based arcp URI for indexing the ZIP content is:
711 arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/
713 Listing all directories and files in the ZIP, the crawler finds the
714 URIs:
716 arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/
717 arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/
718 arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/flower.jpeg
720 When the application encounters "http://example.com/data.zip" some
721 time later it can recalculate the same base arcp URI. This time the
722 ZIP file has been modified upstream and the crawler finds
723 additionally:
725 arcp://uuid,b7749d0b-0e47-5fc4-999d-f154abe68065/pics/cloud.jpeg
727 If files had been removed from the updated ZIP file, the crawler can
728 simply remove those from its database, as it used the same arcp base
729 URI as in the last crawl.
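The crawler's behaviour can be sketched as follows. The sketch lists every arcp URI of a ZIP under the location-based base URI, including the root and any implicit parent directories. Comparing two crawls is then a set difference. This is non-normative. The function name list_arcp_uris is hypothetical, and the sketch assumes that ZIP member names are already "/"-separated text.

```python
import uuid
import zipfile
from urllib.parse import quote


def list_arcp_uris(archive_url: str, zf: zipfile.ZipFile) -> set:
    """All arcp URIs for a ZIP, including the root and parent directories."""
    base = f"arcp://uuid,{uuid.uuid5(uuid.NAMESPACE_URL, archive_url)}/"
    paths = {""}                                 # the archive root itself
    for name in zf.namelist():
        parts = name.split("/")
        for i in range(1, len(parts)):           # every parent directory
            paths.add("/".join(parts[:i]) + "/")
        if not name.endswith("/"):
            paths.add(name)
    return {base + quote(p) for p in paths}
```

Because the base URI depends only on the URL, "new - old" gives the added resources and "old - new" the removed ones.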
731 A.3. Hash-based
733 A repository where users can annotate content of open source software
734 distributions needs to avoid duplication, as users tend to upload
735 "foo-1.2.tar" multiple times.
737 The repository calculates the "sha-256" checksum of the uploaded
738 file, which in hexadecimal is:
740 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
742 The "base64url" encoding [RFC4648] of the binary version of the
743 checksum is:
745 f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
747 The corresponding "alg-val" authority [RFC6920] is thus:
749 sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
750 From this, the hash-based arcp base URI is:
752 arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/
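The encoding steps above are sketched below in Python. The function names are illustrative. The "=" padding is stripped from the base64url output, which matches the 43-character value shown above.

```python
import base64
import hashlib


def alg_val_from_hex(hex_digest: str) -> str:
    """Convert a hexadecimal sha-256 checksum into an "alg-val" string."""
    raw = bytes.fromhex(hex_digest)
    # base64url without "=" padding, as used in "ni" names.
    return "sha-256;" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def hash_based_base(data: bytes) -> str:
    """Return the hash-based arcp base URI for an archive's bytes."""
    return "arcp://ni," + alg_val_from_hex(hashlib.sha256(data).hexdigest()) + "/"


hex_digest = "7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069"
print(alg_val_from_hex(hex_digest))
# sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
```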
754 The repository adds annotations for detected source code files within
755 the archive.
757 A client is browsing the annotations and discovers:
759 arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c
761 The client constructs the corresponding "ni" URI [RFC6920]:
763 ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
765 To retrieve the archive from "repo.example.com", the client resolves
766 the corresponding ".well-known" URI [RFC5785]:
768 http://repo.example.com/.well-known/
769 ni/sha-256/f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk
771 After verifying the "sha-256" checksum of the retrieved archive, the
772 client reads the path "/src/luhn.c" from it.
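The client-side steps of this example are sketched below in Python, under several assumptions. The function names are hypothetical, and only "sha-256" is supported. The ni and ".well-known" forms follow RFC 6920, whose grammar ends the name at the hash value with no trailing "/".

```python
import base64
import hashlib
from urllib.parse import urlsplit


def split_hash_authority(arcp_uri: str) -> tuple[str, str, str]:
    """Split an arcp://ni,<alg>;<val>/<path> URI into (alg, val, path)."""
    parts = urlsplit(arcp_uri)
    prefix, _, alg_val = parts.netloc.partition(",")
    if parts.scheme != "arcp" or prefix != "ni":
        raise ValueError("not a hash-based arcp URI")
    alg, _, val = alg_val.partition(";")
    return alg, val, parts.path


def ni_uri(alg: str, val: str) -> str:
    return f"ni:///{alg};{val}"


def well_known_url(host: str, alg: str, val: str) -> str:
    # RFC 6920 maps ni names to http://<host>/.well-known/ni/<alg>/<val>
    return f"http://{host}/.well-known/ni/{alg}/{val}"


def verify(archive: bytes, alg: str, val: str) -> bool:
    """Check retrieved bytes against the hash value before reading paths."""
    if alg != "sha-256":
        raise ValueError(f"unsupported algorithm: {alg}")
    expected = base64.urlsafe_b64decode(val + "=" * (-len(val) % 4))
    return hashlib.sha256(archive).digest() == expected


uri = "arcp://ni,sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk/src/luhn.c"
alg, val, path = split_hash_authority(uri)
print(ni_uri(alg, val))
print(well_known_url("repo.example.com", alg, val))
print(path)  # /src/luhn.c
```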
774 A.4. Archives that are not files
776 An application is storing BagIt archives [I-D.draft-kunze-bagit-14]
777 on a shared file system, using structured "bag" folders and manifests
778 rather than individual archive files.
780 The BagIt payload manifest "/gfs/bags/scan15/manifest-md5.txt" lists
781 the files:
783 49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/q172.png
784 408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/q172.txt
786 The application generates a random UUID v4 "ff2d5a82-7142-4d3f-b8cc-
787 3e662d6de756" and adds the corresponding "urn:uuid" URI to the bag
788 metadata file "/gfs/bags/scan15/bag-info.txt":
790 External-Identifier: urn:uuid:ff2d5a82-7142-4d3f-b8cc-3e662d6de756
792 It then generates arcp URIs for the files listed in the manifest:
794 arcp://uuid,ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/q172.png
795 arcp://uuid,ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/q172.txt
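The following is a minimal sketch of generating these URIs from the manifest text. It assumes the manifest paths need no BagIt-specific decoding. The helper names are hypothetical. A fixed UUID replaces uuid.uuid4() so that the output matches the example.

```python
import uuid
from urllib.parse import quote

MANIFEST = """\
49afbd86a1ca9f34b677a3f09655eae9  data/27613-h/q172.png
408ad21d50cef31da4df6d9ed81b01a7  data/27613-h/q172.txt
"""


def manifest_paths(manifest_text: str) -> list[str]:
    """Return payload paths from "checksum path" manifest lines."""
    paths = []
    for line in manifest_text.splitlines():
        if line.strip():
            _checksum, path = line.split(maxsplit=1)
            paths.append(path)
    return paths


def arcp_uris_for_bag(bag_id: uuid.UUID, manifest_text: str) -> list[str]:
    base = f"arcp://uuid,{bag_id}/"
    # Percent-encode each path, keeping "/" as the segment separator.
    return [base + quote(p) for p in manifest_paths(manifest_text)]


bag_id = uuid.UUID("ff2d5a82-7142-4d3f-b8cc-3e662d6de756")  # normally uuid.uuid4()
print(f"External-Identifier: {bag_id.urn}")
for arcp_uri in arcp_uris_for_bag(bag_id, MANIFEST):
    print(arcp_uri)
```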
796 When a different application on the same shared file system encounters
797 these arcp URIs, it can match them to the correct bag folder by
798 inspecting the "External-Identifier" metadata.
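The matching step could look like the sketch below. It assumes that bags sit one level below a common root (here a temporary directory standing in for "/gfs/bags"), that "External-Identifier" is stored without a trailing "/", and that bag-info.txt uses only single-line "Label: value" entries. All function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def bag_uuid_from_arcp(uri: str) -> str:
    """Return the lowercased UUID from an arcp://uuid,... authority."""
    prefix, _, value = urlsplit(uri).netloc.partition(",")
    if prefix != "uuid":
        raise ValueError("not a uuid-based arcp URI")
    return value.lower()


def external_identifiers(bag_info_text: str) -> list[str]:
    """Collect External-Identifier values from "Label: value" lines."""
    values = []
    for line in bag_info_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "External-Identifier":
            values.append(value.strip())
    return values


def find_bag(uri: str, bags_root: Path) -> Optional[Path]:
    """Find the bag folder whose bag-info.txt names the URI's UUID."""
    wanted = "urn:uuid:" + bag_uuid_from_arcp(uri)
    for info in sorted(bags_root.glob("*/bag-info.txt")):
        ids = external_identifiers(info.read_text(encoding="utf-8"))
        if wanted in (i.lower() for i in ids):
            return info.parent
    return None


with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp, "scan15")
    expected.mkdir()
    (expected / "bag-info.txt").write_text(
        "External-Identifier: urn:uuid:ff2d5a82-7142-4d3f-b8cc-3e662d6de756\n",
        encoding="utf-8",
    )
    found = find_bag(
        "arcp://uuid,ff2d5a82-7142-4d3f-b8cc-3e662d6de756/data/27613-h/q172.png",
        Path(tmp),
    )
    print(found)
```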
800 A.5. Linked Data containers which are not on the web
802 An application exposes in-memory objects of an Address Book as a
803 Linked Data Platform container [W3C.REC-ldp-20150226], but addresses
804 the container using arcp URIs instead of http URIs to avoid network
805 exposure.
807 The arcp URIs are used in conjunction with a generic LDP client
808 library (developed for http), which is connected to the application's
809 own URI resolution mechanism.
811 The application generates a new random UUID v4 "12f89f9c-e6ca-
812 4032-ae73-46b68c2b415a" for the address book, and provides the
813 corresponding arcp URI to the LDP client:
815 arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/
817 The LDP client resolves the container with content negotiation for
818 the "text/turtle" media type, and receives:
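820 @base <arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/> .
821 @prefix ldp: <http://www.w3.org/ns/ldp#> .
822 @prefix dcterms: <http://purl.org/dc/terms/> .
824 <>
825 a ldp:BasicContainer;
826 dcterms:title "Address book";
827 ldp:contains <contact1>, <contact2> .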
829 The LDP client resolves the relative URIs to retrieve each of the
830 contacts:
832 arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/contact1
833 arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/contact2
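The following is a minimal sketch of connecting an http-style client to an in-process resolver. It makes several assumptions. The store, its contact bodies, and the resolve function are hypothetical. The ldp:contains references are listed by hand instead of being parsed from Turtle. "arcp" is added to urllib's scheme lists, because urljoin() only resolves relative references for schemes listed there.

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

# Assumption of this sketch: register "arcp" so urljoin() treats it like http.
for scheme_list in (uses_relative, uses_netloc):
    if "arcp" not in scheme_list:
        scheme_list.append("arcp")

BASE = "arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/"

CONTAINER = """\
@base <arcp://uuid,12f89f9c-e6ca-4032-ae73-46b68c2b415a/> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<> a ldp:BasicContainer;
   dcterms:title "Address book";
   ldp:contains <contact1>, <contact2> .
"""

# Hypothetical in-memory address book: URI -> {media type: body}.
STORE = {
    BASE: {"text/turtle": CONTAINER},
    BASE + "contact1": {"text/turtle": '<> <http://purl.org/dc/terms/title> "Alice" .'},
    BASE + "contact2": {"text/turtle": '<> <http://purl.org/dc/terms/title> "Bob" .'},
}


def resolve(uri: str, accept: str) -> tuple[str, str]:
    """In-process stand-in for an HTTP GET with minimal content negotiation."""
    representations = STORE.get(uri)
    if representations is None:
        raise LookupError(f"Not Found: {uri}")
    if accept not in representations:
        raise LookupError(f"Not Acceptable: {accept}")
    return accept, representations[accept]


media_type, body = resolve(BASE, "text/turtle")
contacts = [urljoin(BASE, ref) for ref in ("contact1", "contact2")]
for contact in contacts:
    print(contact, resolve(contact, "text/turtle")[1])
```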
835 A.6. Resolution of packaged resources
837 A virtual file system driver on a mobile operating system has mounted
838 several packaged applications for resolving common resources. An
839 application requests the rendering framework to resolve a picture to
840 show it within a user interface:
842 arcp://name,app.example.com/img/logo.png
845 The framework finds the corresponding application package, installed
846 as "app.example.com". It then checks that the running application is
847 permitted to access the authority "name,app.example.com" according to
848 the same-origin policy or the permissions granted to that application.
850 The framework resolves "/img/logo.png" from within that package, and
851 returns an image buffer that it had already cached in memory.
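The following sketches the framework's steps: parse the name-based authority, check same-origin or an explicit permission, and serve from a cache. Everything here is a hypothetical model. PACKAGES, PERMISSIONS, the calling app "viewer.example.net", and the helper names are illustrative and do not describe any real mobile operating system.

```python
from functools import lru_cache
from urllib.parse import urlsplit

# Hypothetical mounted packages: app name -> {path: resource bytes}.
PACKAGES = {"app.example.com": {"/img/logo.png": b"placeholder PNG bytes"}}

# Hypothetical permissions: running app -> other app names it may read.
PERMISSIONS = {"viewer.example.net": {"app.example.com"}}


def parse_name_uri(uri: str) -> tuple[str, str]:
    """Split arcp://name,<app>/<path> into (app name, path)."""
    parts = urlsplit(uri)
    prefix, _, name = parts.netloc.partition(",")
    if parts.scheme != "arcp" or prefix != "name":
        raise ValueError("not a name-based arcp URI")
    return name, parts.path or "/"


@lru_cache(maxsize=128)
def load_resource(name: str, path: str) -> bytes:
    # Stands in for reading and decoding a file from the mounted package.
    return PACKAGES[name][path]


def resolve_for(caller: str, uri: str) -> bytes:
    """Resolve uri on behalf of the running app `caller`."""
    name, path = parse_name_uri(uri)
    if name not in PACKAGES:
        raise LookupError(f"no package installed as {name!r}")
    # Same origin (caller == name) or an explicit permission is required.
    if caller != name and name not in PERMISSIONS.get(caller, set()):
        raise PermissionError(f"{caller} may not access {name}")
    return load_resource(name, path)


logo = resolve_for("viewer.example.net", "arcp://name,app.example.com/img/logo.png")
logo_again = resolve_for("viewer.example.net", "arcp://name,app.example.com/img/logo.png")
```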
853 A.7. Sharing using app names
855 A photo gallery application on a mobile device uses arcp URIs for
856 navigation between its UI states. The gallery is secured so that
857 other applications can't normally access its photos.
859 The application is installed under the app name "gallery.example.org",
860 as the vendor controls "example.org"; the corresponding name-based
861 arcp URI is:
863 arcp://name,gallery.example.org/
865 A user is at the application state which shows the newest photos as
866 thumbnails:
868 arcp://name,gallery.example.org/photos/?New
870 The user selects a photo, rendered with metadata overlaid:
872 arcp://name,gallery.example.org/photos/137
874 The user chooses to "share" the photo and selects
875 "messaging.example.com", which also uses the device's common arcp URI
876 framework.
878 The photo gallery registers with the device's arcp framework that the
879 chosen "messaging.example.com" should get read permission to its
880 "/photos/137" resource.
882 The sharing function returns a URI Template [RFC6570]:
884 arcp://name,messaging.example.com/share{;uri,redirect}
886 Filling in the template, the gallery requests the device to open:
888 arcp://name,messaging.example.com/share
889 ;uri=arcp%3A%2F%2Fname%2Cgallery.example.org%2Fphotos%2F137
890 ;redirect=arcp%3A%2F%2Fname%2Cgallery.example.org%2Fphotos%2F%3FNew
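The RFC 6570 Level 3 path-style parameter operator ";" produces ";uri=...;redirect=..." parameters from variables. Each defined variable becomes ";name=value", and an empty value becomes just ";name". Every character outside the unreserved set is percent-encoded, so the "?" and "/" of the gallery URIs cannot be mistaken for parts of the messaging URI. The following is a minimal sketch that assumes a single trailing "{;...}" expression. expand_path_params and parse_path_params are hypothetical helpers. A full RFC 6570 processor would also handle the other operators and modifiers.

```python
from urllib.parse import quote, unquote


def expand_path_params(prefix: str, **values: str) -> str:
    """Append an expanded RFC 6570 "{;name,...}" expression to prefix."""
    out = [prefix]
    for name, value in values.items():
        # quote(..., safe="") leaves only unreserved characters unencoded.
        out.append(f";{name}" + (f"={quote(value, safe='')}" if value else ""))
    return "".join(out)


def parse_path_params(uri: str) -> dict[str, str]:
    """Recover ";name=value" parameters (values contain no raw ";")."""
    _, *params = uri.split(";")
    result = {}
    for param in params:
        name, _, value = param.partition("=")
        result[name] = unquote(value)
    return result


shared = expand_path_params(
    "arcp://name,messaging.example.com/share",
    uri="arcp://name,gallery.example.org/photos/137",
    redirect="arcp://name,gallery.example.org/photos/?New",
)
print(shared)
print(parse_path_params(shared)["redirect"])
```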
892 The arcp framework checks its registration for
893 "messaging.example.com" and finds the installed messaging
894 application. It then checks that the messaging application allows
895 other apps to navigate to its "/share" state.
897 The messaging app is launched and navigates to its "sharing" UI,
898 asking the user for a caption.
900 The messaging app requests the arcp framework to retrieve the "uri"
901 resource, using content negotiation for an "image/jpeg"
902 representation.
904 The arcp framework finds the installed photo gallery
905 "gallery.example.org", and confirms the read permission.
907 The photo gallery application returns a scaled-down JPEG
908 representation after retrieving the photo from its internal store.
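The following is a minimal sketch of the per-resource read grants used in this example. The gallery registers a grant, and the framework checks it when the messaging app retrieves the photo. ArcpFramework and its methods are a hypothetical model, not an existing device API. Content negotiation and image scaling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ArcpFramework:
    """Hypothetical device-wide arcp framework with per-resource grants."""
    apps: dict[str, dict[str, bytes]] = field(default_factory=dict)
    grants: set[tuple[str, str, str]] = field(default_factory=set)

    def install(self, name: str, resources: dict[str, bytes]) -> None:
        self.apps[name] = resources

    def grant_read(self, owner: str, path: str, grantee: str) -> None:
        # Called by the owning app, e.g. the gallery when the user shares.
        self.grants.add((owner, path, grantee))

    def read(self, caller: str, owner: str, path: str) -> bytes:
        if owner not in self.apps:
            raise LookupError(f"no app installed as {owner!r}")
        if caller != owner and (owner, path, caller) not in self.grants:
            raise PermissionError(f"{caller} has no read grant for {path}")
        return self.apps[owner][path]


fw = ArcpFramework()
fw.install("gallery.example.org", {"/photos/137": b"<jpeg bytes>"})
fw.install("messaging.example.com", {})
fw.grant_read("gallery.example.org", "/photos/137", "messaging.example.com")
photo = fw.read("messaging.example.com", "gallery.example.org", "/photos/137")
```

Scoping each grant to a single path means that the messaging app can read "/photos/137" but no other photo in the gallery.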
910 After the messaging app has completed sharing the picture bytestream,
911 it requests the UI framework to navigate to the "redirect" state:
913 arcp://name,gallery.example.org/photos/?New
915 The UI returns to the original view in the photo gallery.
917 This example shows that a resource identified by an arcp URI can
918 have different representations or views in different apps.
920 Appendix B. Acknowledgements
922 This specification is inspired by two original URI scheme proposals
923 from W3C, "app" from [W3C.NOTE-app-uri-20150723] and "widget" from
924 [W3C.NOTE-widgets-uri-20120313].
926 The "app" URI scheme was used by packaged web apps in Mozilla's
927 Firefox OS [FirefoxOS] and to identify resources in Research Object
928 Bundles [ROBundle]; however, the W3C Notes did not progress further as
929 W3C Recommendation track documents, and their URI schemes were never
930 formally registered with IANA.
932 While the focus of the previous proposals was to specify how to
933 resolve resources from within a packaged application, this
934 specification generalizes the URI scheme to support referencing and
935 identifying resources within any archive, package or application, and
936 adds flexibility for how resources can be resolved.
938 The authors would like to thank Graham Klyne, Carsten Bormann, Roy T.
939 Fielding, S Moonesamy, Julian Reschke and Frank Ellermann for
940 valuable feedback and suggestions.
942 Authors' Addresses
944 Stian Soiland-Reyes
945 The University of Manchester
946 Oxford Road
947 Manchester
948 United Kingdom
950 Email: stain@apache.org
951 URI: http://orcid.org/0000-0001-9842-9718
953 Marcos Caceres
954 Mozilla Corporation
956 Email: marcos@marcosc.com
957 URI: http://marcosc.com/