idnits 2.17.1 

draft-pwid-urn-specification-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (November 4, 2018) is 1993 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                           E. Zierau, Ed.
3	Internet-Draft                                      Royal Danish Library
4	Intended status: Informational                          November 4, 2018
5	Expires: May 8, 2019

7	            A Persistent Web IDentifier (PWID) URN Namespace
8	                    draft-pwid-urn-specification-04

10	Abstract

12	   This document specifies a Uniform Resource Name (URN) for Persistent
13	   Web IDentifiers to web material in web archives using the 'pwid'
14	   namespace identifier.

16	   The main purpose of the standard is to support specification of
17	   references that are not covered by other reference techniques: to
18	   support references to material in web archives with restricted
19	   access.  Furthermore, it supports persistent technology agnostic
20	   references to web archives in general, in a form that can work as an
21	   algorithmic basis for finding web archive resources in general.  An
22	   additional important benefit is that it can be used in specifying web
23	   collections, which then can form a persistent computational basis for
24	   the extract of the archived collection parts.  Since the parts can be
25	   specified generally, this further allows collections to be specified
26	   with elements from one or more web archives.

28	   The PWID is designed for researchers and therefore it is designed as
29	   a general, global, sustainable, human-readable, technology agnostic,
30	   persistent and precise web reference for web materials in web
31	   archives.

33	Status of This Memo

35	   This Internet-Draft is submitted in full conformance with the
36	   provisions of BCP 78 and BCP 79.

38	   Internet-Drafts are working documents of the Internet Engineering
39	   Task Force (IETF).  Note that other groups may also distribute
40	   working documents as Internet-Drafts.  The list of current Internet-
41	   Drafts is at https://datatracker.ietf.org/drafts/current/.

43	   Internet-Drafts are draft documents valid for a maximum of six months
44	   and may be updated, replaced, or obsoleted by other documents at any
45	   time.  It is inappropriate to use Internet-Drafts as reference
46	   material or to cite them other than as "work in progress."

48	   This Internet-Draft will expire on May 8, 2019.

50	Copyright Notice

52	   Copyright (c) 2018 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (https://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

67	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
68	     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   5
69	   2.  Namespace Registration Template . . . . . . . . . . . . . . .   5
70	   3.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  20
71	   4.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  20
72	     4.1.  Normative References  . . . . . . . . . . . . . . . . . .  20
73	     4.2.  Informative References  . . . . . . . . . . . . . . . . .  20
74	   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  23

76	1.  Introduction

78	   The URN PWID is a supplement to existing reference standards, where
79	   the PWID will support references to web archives, including areas
80	   that are not supported today: support of references to material in
81	   web archives with restricted access.  Furthermore, it enables
82	   technology agnostic references to web archives in general, which can
83	   for instance be needed for references to web material that is
84	   dynamic (e.g. a news site) or a specific version of a web material
85	   (e.g. specific version of the DOI handbook).

87	   The URN PWID is a form that can work as an algorithmic basis for
88	   finding the resource.  This also enables basis for computation of
89	   archived web parts to a collection from one or more web archives.

91	   Furthermore, the PWID includes information about the resource which
92	   makes it possible to find alternative resources, in cases where the
93	   original precise resource has become unavailable.

95	   The PWID URN is designed to be a persistent reference that is
96	   general, global and technology agnostic in order to enhance its
97	   chances for being sustainable.  Furthermore, it is designed to be
98	   human-readable and able to be precise about what the web archive
99	   reference covers.  This design enables a PWID URN to:

101	   o  be used for technical solutions e.g. to make them resolvable

103	   o  cover references to all sorts of materials in web archives

105	   o  cover references to materials from all sorts of web archives

107	   The motivation for defining a PWID namespace is the growing challenge
108	   of references to archived web resources, which the PWID as a URN can
109	   assist in overcoming.  The standard is needed to address web
110	   materials with precision and persistency on par with
111	   traditional references for analogue material.  Furthermore,
112	   it is needed in order to address web archive resources that are not
113	   freely available online.  The PWID URN covers both referencing of web
114	   resources from research papers and definition of web collection/
115	   corpus.  In detail the challenges are:

117	   o  Citation guidelines generally do not cover general and persistent
118	      referencing techniques for web resources that are not registered
119	      by Persistent Identifier systems (like DOI [DOI]).  However, an
120	      increasing number of references point to resources that only exist
121	      on the web, e.g. blogs that turned out to have a historical
122	      impact.  In order to obtain persistency for a reference, the
123	      target needs to be stable.  As the live web is 'alive' and in
124	      constant change, persistency can only be obtained by referring to
125	      archived snapshots of the web.  The PWID URN is therefore focused
126	      on referencing archived web material in a technology agnostic way
127	      (research documented in [IPRES2016] and [ResawRef]).

129	   o  There are many new initiatives for web archive referencing; most
130	      of them are centralised solutions which offer harvest and
131	      referencing, but these cannot be used for existing materials in
132	      web archives.  Other initiatives only cover open web archives,
133	      which do not cover material in archives with restricted access
134	      and where there is a risk of imprecision if a resource in an
135	      alternative archive is the result of resolving such a resource.
136	      The PWID URN is needed in order to fill these gaps where other
137	      techniques are not sufficient.

139	   o  There are many different requirements for construction of
140	      collection definitions for web material besides precision and
141	      persistency.  Recent research has found that various legal and
142	      sustainability issues lead to a need for a collection to be
143	      defined by references to the web parts in the collection.  The
144	      PWID URN is needed in such definitions in order to fulfil these
145	      requirements and to enable a collection to cover web materials
146	      from more archives (research documented in [ResawColl]).

148	   The PWID is especially useful for web material where precision is in
149	   focus and/or there are references to materials from web archives
150	   requiring special grants in order to gain access.  The precision
151	   regards both pointing to the archive where it was found and validated
152	   against its purpose (other archived versions in other web archives
153	   may differ both regarding completeness and contents even within short
154	   time periods) as well as precision about what is actually referred by
155	   the reference (e.g. is it the page or the whole website).

157	   Furthermore the PWID is very useful in specification of contents of a
158	   web collection (also known as web corpus).  Definitions of web
159	   collections are often needed for extraction of data used in
160	   production of research results, e.g. for evaluations in the future.
161	   Current practices are not persistent as they often use some CDX
162	   version, which varies between implementations.

164	   Strict syntax is needed for the PWID reference in order to ensure
165	   that it can be used for computational purposes.  This is especially
166	   relevant for automatic extraction of parts from web collection
167	   definitions.  Furthermore, readers of research papers are today
168	   expecting to be able to access a referenced resource by clicking an
169	   actionable URI, therefore a similar facility will be expected for
170	   references to available archived web material, which strict syntax
171	   can make possible.  Examples of technical solutions that are enabled
172	   are:

174	   o  resolving of references and automatic extraction of web
175	      collection defined by PWID URNs [ResawRef] [ResawColl]

177	   o  Resolving of a PWID reference by resolving services.  As a start,
178	      there is work on a prototype that can work for the Danish web
179	      archive data and open web archives with standard patterns for the
180	      current technologies.  There may come different implementations
181	      for resolving which may rely on different protocols and
182	      application

184	   The purpose of the PWID is also to express a web archive reference as
185	   simply as possible while at the same time meeting requirements for
186	   sustainability, usability and scope.  Therefore, the PWID URN is
187	   focused on only having the minimum required information to make a
188	   precise identification of a resource in an arbitrary web archive.
189	   Recent research has found that this is obtained by the following
190	   information [ResawRef]:

192	   o  Identification of web archive
193	   o  Identification of source:

195	      *  Archived URI or identifier

197	      *  Archival timestamp

199	   o  Intended precision (page, part, subsite etc.)

201	   The PWID URN represents this information in a human readable way as
202	   well as a well-defined way that enables technical solutions to
203	   interpret the URN.

205	1.1.  Requirements Language

207	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
208	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
209	   document are to be interpreted as described in [RFC2119].

211	2.  Namespace Registration Template

213	   Namespace Identifier:

215	      PWID

217	   Version:

219	      4

221	   Date:

223	      2018-11-03

225	   Registrant:

227	      Eld Maj-Britt Olmuetz Zierau
228	      Royal Danish Library
229	      Soeren Kierkegaards Plads 1
230	      1219 Copenhagen
231	      Denmark
232	      ph: +45 9132 4690
233	      email: elzi@kb.dk

235	   Purpose:

237	      The URN PWID is a supplement to existing reference standards,
238	      where the PWID will support references to web archives, including
239	      areas that are not supported today: support of references to
240	      material in web archives with restricted access.  Furthermore it
241	      enables technology agnostic references to web archives in general,
242	      which can for instance be needed for references to web
243	      material that is dynamic (e.g. a news site) or a specific version
244	      of a web material (e.g. specific version of the DOI handbook).

246	      The URN PWID is a form that can work as an algorithmic basis for
247	      finding the resource.  This also enables basis for computation of
248	      archived web parts to a collection from one or more web archives.

250	      Furthermore, the PWID includes information about the resource
251	      which makes it possible to find alternative resources, in cases
252	      where the original precise resource has become unavailable.

254	      The PWID URN is designed to be a persistent reference that is
255	      general, global and technology agnostic in order to enhance its
256	      chances for being sustainable.  Furthermore, it is designed to be
257	      human-readable and able to be precise about what the web
258	      archive reference covers.  This design enables a PWID URN to:

260	      *  be used for technical solutions e.g. to make them resolvable

262	      *  cover references to all sorts of materials in web archives

264	      *  cover references to materials from all sorts of web archives

266	      The motivation for defining a PWID namespace is the growing
267	      challenge of references to archived web resources, which the PWID
268	      as a URN can assist in overcoming.  The standard is needed to
269	      address web materials with precision and persistency on
270	      par with traditional references for analogue
271	      material.  Furthermore, it is needed in order to address web
272	      archive resources that are not freely available online.  The PWID
273	      URN covers both referencing of web resources from research papers
274	      and definition of web collection/corpus.  In detail the challenges
275	      are:

277	      *  Citation guidelines generally do not cover general and
278	         persistent referencing techniques for web resources that are
279	         not registered by Persistent Identifier systems (like DOI
280	         [DOI]).  However, an increasing number of references point to
281	         resources that only exist on the web, e.g. blogs that turned
282	         out to have a historical impact.  In order to obtain
283	         persistency for a reference, the target needs to be stable.  As
284	         the live web is 'alive' and in constant change, persistency can
285	         only be obtained by referring to archived snapshots of the web.
286	         The PWID URN is therefore focused on referencing archived web
287	         material in a technology agnostic way (research documented in
288	         [IPRES2016] and [ResawRef]).

290	      *  There are many new initiatives for web archive referencing;
291	         most of them are centralised solutions which offer harvest and
292	         referencing, but these cannot be used for existing materials in
293	         web archives.  Other initiatives only cover open web archives,
294	         which do not cover material in archives with restricted
295	         access and where there is a risk of imprecision if a resource
296	         in an alternative archive is the result of resolving such a
297	         resource.  The PWID URN is needed in order to fill these gaps
298	         where other techniques are not sufficient.

300	      *  There are many different requirements for construction of
301	         collection definitions for web material besides precision and
302	         persistency.  Recent research has found that various legal and
303	         sustainability issues lead to a need for a collection to be
304	         defined by references to the web parts in the collection.  The
305	         PWID URN is needed in such definitions in order to fulfil these
306	         requirements and to enable a collection to cover web materials
307	         from more archives (research documented in [ResawColl]).

309	      The PWID is especially useful for web material where precision is
310	      in focus and/or there are references to materials from web
311	      archives requiring special grants in order to gain access.  The
312	      precision regards both a precise reference, where there can be
313	      no doubt that you have the correct web material, as well as
314	      precision about what is actually referred to by the reference
315	      (e.g. is it the page or the whole website).

317	      Furthermore, the PWID is very useful in specification of contents
318	      of a web collection (also known as web corpus).  Definitions of
319	      web collections are often needed for extraction of data used in
320	      production of research results, e.g. for evaluations in the
321	      future.  Current practices are not persistent as they often
322	      use some CDX version, which varies between implementations.

324	      Strict syntax is needed for the PWID reference in order to ensure
325	      that it can be used for computational purposes.  This is
326	      especially relevant for automatic extraction of parts from web
327	      collection definitions.  Furthermore, readers of research papers
328	      are today expecting to be able to access a referenced resource by
329	      clicking an actionable URI, therefore a similar facility will be
330	      expected for references to available archived web material, which
331	      strict syntax can make possible.  Examples of technical solutions
332	      that are enabled are:

334	      *  resolving of references and automatic extraction of web
335	         collection defined by PWID URNs [ResawRef] [ResawColl]

337	      *  Resolving of a PWID reference by resolving services.  As a
338	         start, there is work on a prototype that can work for the
339	         Danish web archive data and open web archives with standard
340	         patterns for the current technologies.  There may come
341	         different implementations for resolving which may rely on
342	         different protocols and application

344	      The purpose of the PWID is also to express a web archive reference
345	      as simply as possible while at the same time meeting requirements
346	      for sustainability, usability and scope.  Therefore, the PWID URN
347	      is focused on only having the minimum required information to make
348	      a precise identification of a resource in an arbitrary web
349	      archive.  Recent research has found that this is obtained by the
350	      following information [ResawRef]:

352	      *  Identification of web archive

354	      *  Identification of source:

356	         +  Archived URI or identifier

358	         +  Archival timestamp

360	      *  Intended precision (page, part, subsite etc.)

362	      The PWID URN represents this information in a human readable way
363	      as well as a well-defined way that enables technical solutions to
364	      interpret the URN.

366	   Syntax:

368	      The syntax of the PWID URN is specified below in Augmented Backus-
369	      Naur Form (ABNF) [RFC5234] and it conforms to URN syntax defined
370	      in [RFC8141].  The syntax definition of the PWID URN is:

372	           pwid-urn = "urn" ":" pwid-NID ":" pwid-NSS

374	           pwid-NID = "pwid"
375	           pwid-NSS = archive-id ":" archival-time ":" precision-spec
376	                                 ":" archived-item

378	           archive-id = 1*( unreserved )
379	           precision-spec = "part" / "page" / "subsite" / "site"
380	                    / "collection" / "recording" / "snapshot"
381	                    / "other"

383	           archived-item = URI / archived-item-id
384	           archived-item-id = 1*( unreserved )

386	      where

388	      *  'archival-time' is a UTC timestamp as described in the W3C
389	         profile of [ISO8601] [W3CDTF] (also defined in [RFC3339]), for
390	         example YYYY-MM-DDThh:mm:ssZ.  The 'archival-time' shall
391	         represent the timestamp that the web archive has recorded for
392	         the referenced archived URI.  The archival-time may be
393	         specified at any of the levels of granularity described in
394	         [W3CDTF], as long as it reflects exactly the granularity of the
395	         timestamp recorded in the web archive, which is in accordance
396	         with the WARC standard [ISO28500].

398	      *  'unreserved' is defined as in [RFC3986]

400	      *  'precision-spec' values are not case sensitive (i.e.  "PAGE" /
401	         "PART" / "PaGe" / ... are valid values as well.)

403	      *  'URI' is defined as in [RFC3986] but where occurrences of "[",
404	         "]", "?" and "#" are %-encoded in order not to clash with URN
405	         reserved characters [RFC8141]
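
As an illustration only, and not part of the specification, the syntax above could be checked with a regular expression along the following lines. The names PWID_RE, Pwid and parse_pwid are hypothetical. The timestamp pattern is an assumption that covers the W3CDTF granularities in UTC ("Z"), "urn" and "pwid" are matched case-insensitively as permitted by RFC 8141, and the archived-item is captured as-is without validating it against the URI grammar.

```python
import re
from typing import NamedTuple

# Assumption: W3CDTF granularities from year down to fractional seconds, UTC only.
_TIME = r"\d{4}(?:-\d{2}(?:-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?Z)?)?)?"
_PRECISION = "part|page|subsite|site|collection|recording|snapshot|other"

# Hypothetical pattern mirroring the pwid-urn rule; archived-item is not validated.
PWID_RE = re.compile(
    r"(?i:urn:pwid):"
    r"(?P<archive_id>[A-Za-z0-9._~-]+):"          # 1*( unreserved )
    rf"(?P<archival_time>{_TIME}):"
    rf"(?P<precision>(?i:{_PRECISION})):"         # not case sensitive
    r"(?P<archived_item>.+)"
)


class Pwid(NamedTuple):
    archive_id: str
    archival_time: str
    precision: str
    archived_item: str


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into its four parts, or raise ValueError."""
    m = PWID_RE.fullmatch(urn)
    if m is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    # precision-spec is normalised to lower case for easier comparison.
    return Pwid(m["archive_id"], m["archival_time"],
                m["precision"].lower(), m["archived_item"])
```

Because archive-id cannot contain ":", and the timestamp has a fixed shape, the remaining colons (for example in "http://") end up in archived-item without ambiguity.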

407	      The precision specification is expressing the intended precision
408	      of the reference.  For example, if the reference is to an html web
409	      element, this element can be interpreted in several ways:

411	      *  As just one web part
412	         Meaning the file containing the html, and precisely this file

414	      *  A web page
415	         Meaning that an application like Wayback shows the result in a
416	         browser, and calculates referenced web parts (display
417	         templates, images etc.) and uses these found web parts in the
418	         result.
419	         If the full reference only contains the PWID URN for the page,
420	         this may mean that the archived page can change look over time,
421	         e.g. in case that parts referred by the page did not exist at
422	         reference time, but are harvested at a later stage, or in
423	         case the web archive's algorithm for calculation of the
424	         referred web parts is changed and gives a different result.
425	         In order to make a precise reference to a picture in context of
426	         a web page, the most precise reference will be to provide the
427	         PWID URN for the page (with page precision) and the PWID URN
428	         for the image file part which contains the referred picture
429	         (with part precision)

431	      *  As a site or subsite
432	         Meaning that an application like Wayback shows the result in a
433	         browser showing the web page, and if there is restricted
434	         access according to the reference, the application also needs
435	         to make sure that all parts/pages belonging to the site/subsite
436	         are available.
437	         If the full reference only contains the PWID URN for the site/
438	         subsite, this may mean that the site/subsite can change its
439	         appearance over time, in the same way as for the web page
440	         described above.

442	      The precision specification needs to be part of a PWID URN in
443	      order to enable the person making the reference to express the
444	      precision described above.  Furthermore, this precision
445	      specification will make it possible for resolvers to display the
446	      referred source in a way that corresponds to the specification.

448	      Especially for web materials, there can be different ways to
449	      represent e.g. a web page, which provides different precision of
450	      the source as well.  The above examples with part, page, subsite
451	      and site are addressing the most common access via browser
452	      functionality like in Wayback.  However, there are also web
453	      archives that archive snapshots of the web pages for the archived
454	      URI.  A third option can be to produce a collection of archived
455	      URIs as basis for browser access instead of letting the web
456	      archive calculate sub items (which may change over time).  An
457	      example of the production of such a collection is provided in the
458	      section about assignment.  Lastly, a web page may be archived via
459	      a web recording.

461	      As a consequence of the above, the following precision-spec
462	      values are valid:

464	      *  part
465	         the single archived web part harvested as a file from the
466	         specified URI, e.g. a pdf, an html text, an image

468	      *  page
469	         the web page represented by the web page file (e.g. html)
470	         harvested from the specified URI, where this content is
471	         interpreted as a web page with all referred parts relevant to
472	         display the web page (but where referred parts must be
473	         calculated as described above), e.g. an html page with referred
474	         images

476	      *  subsite
477	         The referred web page (as described under 'page') where it is
478	         possible to browse to all references starting with the same
479	         path as the archived URI

481	      *  site
482	         The referred web page (as described under 'page') where it is
483	         possible to browse to all references in the domain specified in
484	         the archived URI

486	      *  collection
487	         Representation of a collection specification, where it is up to
488	         the web archive applications to find out how it is rendered
489	         (e.g. collection specification in the XML format enabling
490	         interpretation as in the example provided in [ResawColl])

492	      *  snapshot
493	         a snapshot (image) representation of web material, e.g. a web
494	         page

496	      *  recording
497	         Representation of a web recording specification where it is up
498	         to the web archive applications to find out how it is rendered
499	         (where interpretation could depend on file-suffix for the web
500	         recording), an example is web recording coded in a WARC file

502	      *  other
503	         This is a placeholder to allow reference of a resource of any
504	         kind with an assigned identifier (by the archive).  In all
505	         cases, it will be up to the application serving the web archive
506	         to interpret how this item should be rendered
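
The difference between 'subsite' and 'site' can be sketched in code. The following is a hypothetical helper, not something defined by this document: the function name in_scope is invented for illustration, "domain" is interpreted as the host name of the archived URI, and "same path" is interpreted as a simple prefix match on the URI path. A real resolver may well apply other rules.

```python
from urllib.parse import urlsplit


def in_scope(precision: str, archived_uri: str, candidate_uri: str) -> bool:
    """Return True if candidate_uri may be browsed to under the given precision.

    Assumptions (not from the specification): 'site' compares host names,
    'subsite' additionally requires the candidate path to start with the
    path of the archived URI.
    """
    level = precision.lower()  # precision-spec values are not case sensitive
    base, cand = urlsplit(archived_uri), urlsplit(candidate_uri)
    if level == "site":
        return cand.hostname == base.hostname
    if level == "subsite":
        return (cand.hostname == base.hostname
                and cand.path.startswith(base.path))
    raise ValueError("in_scope only illustrates 'site' and 'subsite'")
```

For an archived URI of http://www.dr.dk/nyheder/, a link to http://www.dr.dk/sporten/x.html would be in scope for 'site' but not for 'subsite' under these assumptions.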

508	   Assignment:

510	      The PWID URNs do not have to be assigned by an authority, as
511	      they are based on the information created at the time of
512	      archiving.  In other words: the PWID URNs are created
513	      independently, but following an algorithm which ensures that the
514	      referred item can be found if it is still available.  It also has
515	      the benefit that it includes information to look up alternative
516	      resources e.g. via Memento for some open web archives [MEMENTO] or
517	      via possibly coming web archive infrastructures.

519	      A PWID URN is created by finding the relevant information for the
520	      syntax parts of the PWID in the form:

522	           "urn:pwid:" archive-id ":" archival-time ":" precision-spec
523	                                  ":" archived-item

525	      The PWID URN for an archived item in hand can be constructed by
526	      exchanging the unspecified PWID parts with relevant information,
527	      as explained in the following:

529	      *  archive-id (identification of web archive):
530	         In this version of the standard, it is recommended to use the
531	         domain of the web archive as the identifier for the web archive
532	         (e.g. archive.org for Internet Archive's open web archive).
533	         This is recommended, since browsing of this domain page
534	         typically will lead to description of how to access the web
535	         archive, e.g. online or by applying for access grants.
536	         Furthermore, it is more precise than e.g.  the name of the
537	         archive, since there may be more than one installation of web
538	         archives in the same organisation, e.g.  archive.org and
539	         archive-it.org are both covered by Internet Archive.  When a
540	         registry of web archives is established it will be more
541	         precise and persistent to use the web archive identifier
542	         specified in this registry (e.g.  DKWA for the Danish web
543	         archive with domain netarkivet.dk)

545	      *  archival-time (archival timestamp):
546	         The archival time for the archived item in hand may be
547	         displayed along with the archived item, but implementations
548	         differ, so it is important to be aware of whether a
549	         more precise timestamp can be found, and that it is the correct
550	         timestamp that is used.  For many Wayback implementations the
551	         precise time can be found as part of the URI used for viewing
552	         the archived item, e.g. in the example of
553	         https://web.archive.org/web/20160122112029/http://www.dr.dk
554	         viewable by the Internet Archive's Wayback installation, the
555	         number 20160122112029 represents the archival time
556	         2016-01-22T11:20:29Z.  In other
557	         installations, the most precise time may be found in the URI
558	         from a search result leading to the resource (which usually
559	         redirects on basis of a call to the underlying archive index).
560	         Especially for web pages with frames, there may be cases where
561	         the actual time is not displayed with the source, since only
562	         the times for the contents of the frames are displayed.

564	      *  precision-spec (precision as represented page, part, site,
565	         snapshot etc.):
566	         The precision specification specifies how the user should view
567	         the referred item - either as a specific representation (with
568	         inherited precision) or by use of tools (e.g. browse web site
569	         based on calculations or browse on basis of collection of
570	         specific parts).

572	         Since the archived URI can have different forms indicated by
573	         the precision specification, this information may be used in
574	         resolution and location.
575	         The most imprecise types are the ones that involve
576	         calculation, i.e. page, site or subsite.  For items like an
577	         image that have no references to calculate, the precision is
578	         best described by part, since it also tells that it is a
579	         precise reference.

581	      *  archived-item (archived URI or identifier):
582	         The archived item will be the URI (or identifier assigned for a
583	         resource by the archive) of the displayed archived item in
584	         hand.
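
      As an illustration of the archival-time item above, the following
      is a minimal Python sketch of how the archival time could be read
      from a Wayback viewing URI.  It assumes that the URI contains a
      14-digit YYYYMMDDhhmmss path segment given in UTC, as in the
      web.archive.org example; the function name is hypothetical and
      other installations may need a different pattern.

```python
import re
from datetime import datetime, timezone

# Assumption: the viewing URI carries a 14-digit YYYYMMDDhhmmss path
# segment in UTC, as in the web.archive.org example in the text.
_TIMESTAMP_SEGMENT = re.compile(r"/(\d{14})/")


def wayback_uri_to_archival_time(viewing_uri: str) -> str:
    """Return the PWID archival-time (UTC) found in a Wayback viewing URI."""
    match = _TIMESTAMP_SEGMENT.search(viewing_uri)
    if match is None:
        raise ValueError("no 14-digit timestamp segment in: " + viewing_uri)
    # strptime also rejects impossible values such as month 13.
    moment = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return moment.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(wayback_uri_to_archival_time(
    "https://web.archive.org/web/20160122112029/http://www.dr.dk"))
# prints 2016-01-22T11:20:29Z
```

      The sketch does not handle viewing URIs whose timestamp segment
      carries additional characters, and it does not check that the
      timestamp is the most precise one available.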

586	      A much easier way to construct PWID URNs is to use tools that
587	      construct them.  Currently, there is also a prototype for a SOLR-
588	      Wayback tool (Source at https://github.com/netarchivesuite/
589	      solrwayback) [PWIDprovider], which can assist in finding the most
590	      precise reference to an archived web page.  This Wayback version
591	      can provide all PWID URNs belonging to a shown page (with the page
592	      PWID URN at the top).  For example, in netarkivet.dk, the archived
593	      URI for the web page http://www.susanlegetoej.dk/shop/handskedyr-
594	      siameser-killing-8681p.html archived 2008-11-29 01:19:16 UTC, has
595	      the following parts calculated by the SOLR-Wayback tool:

597	         urn:pwid:netarkivet.dk:2008-11-
598	         29T00:41:42Z:part:http://www.susanlegetoej.dk/images/ddcss/
599	         SK113_Master_NF.css

601	         urn:pwid:netarkivet.dk:2008-11-
602	         29T00:39:47Z:part:http://www.susanlegetoej.dk/shop/css/
603	         print.css

605	         urn:pwid:netarkivet.dk:2008-11-
606	         29T00:40:06Z:part:http://www.susanlegetoej.dk/images/ddcss/
607	         SK113_Basket_NF.css

609	         urn:pwid:netarkivet.dk:2008-11-
610	         29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/
611	         SK113_TopMenu_NF.css

613	         urn:pwid:netarkivet.dk:2008-11-
614	         29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/
615	         SK113_SearchPage_NF.css

617	         urn:pwid:netarkivet.dk:2008-11-
618	         29T00:40:35Z:part:http://www.susanlegetoej.dk/images/ddcss/
619	         SK113_Productmenu_NF.css

620	         urn:pwid:netarkivet.dk:2008-11-
621	         29T00:40:22Z:part:http://www.susanlegetoej.dk/images/ddcss/
622	         SK113_SpaceTop_NF.css

624	         urn:pwid:netarkivet.dk:2008-11-
625	         29T00:40:24Z:part:http://www.susanlegetoej.dk/images/ddcss/
626	         SK113_SpaceLeft_NF.css

628	         urn:pwid:netarkivet.dk:2008-11-
629	         29T00:40:23Z:part:http://www.susanlegetoej.dk/images/ddcss/
630	         SK113_SpaceBottom_NF.css

632	         urn:pwid:netarkivet.dk:2008-11-
633	         29T00:40:25Z:part:http://www.susanlegetoej.dk/images/ddcss/
634	         SK113_SpaceRight_NF.css

636	         urn:pwid:netarkivet.dk:2008-11-
637	         29T00:37:23Z:part:http://www.susanlegetoej.dk/images/ddcss/
638	         SK113_ProductInfo_NF.css

640	         urn:pwid:netarkivet.dk:2008-11-
641	         29T00:37:24Z:part:http://www.susanlegetoej.dk/Shop/js/
642	         Variants.js

644	         urn:pwid:netarkivet.dk:2009-03-
645	         03T11:53:00Z:part:http://www.susanlegetoej.dk/Shop/js/Media.js

647	         urn:pwid:netarkivet.dk:2009-03-
648	         03T11:53:02Z:part:http://www.susanlegetoej.dk/images/design/
649	         print.gif

651	         urn:pwid:netarkivet.dk:2009-03-
652	         03T11:54:19Z:part:http://www.susanlegetoej.dk/Shop/js/Scroll.js

654	         urn:pwid:netarkivet.dk:2009-03-
655	         03T11:54:09Z:part:http://www.susanlegetoej.dk/Shop/js/
656	         Shop5Common.js

658	         urn:pwid:netarkivet.dk:2006-11-
659	         20T20:16:03Z:part:http://www.susanlegetoej.dk/images/602551.jpg

661	   Security and Privacy:

663	      Security and privacy considerations are restricted to accessible
664	      web resources in web archives.  Resolution of PWID URNs will
665	      usually only be possible using the web archives' access tools.
666	      In such cases security and privacy are covered by these tools,
667	      since the information used for access has no security and
668	      privacy issues.  In the cases where resolution is made around
669	      the archives' access tools, a separate analysis should be
670	      made.

672	   Interoperability:

674	      This is covered by comments in the Syntax description:

676	      *  the PWID URN conforms to the URI standard as defined in RFC
677	         3986 [RFC3986] and the URN standard RFC 8141 [RFC8141]

679	      *  the 'archival-time' of the PWID URN conforms to the UTC
680	         timestamp described in the W3C profile of ISO 8601 [ISO8601]
681	         [W3CDTF] and accords with the WARC standard ISO 28500 [ISO28500].

683	      *  the 'archived-item' is either an assigned identifier (the URN
684	         standard RFC 8141 [RFC8141]) or a URI which conforms to the
685	         URI standard as defined in RFC 3986 [RFC3986], with %-encodings
686	         of "[", "]", "#", and "?" in order to conform to the URN
687	         standard RFC 8141 [RFC8141]
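
      The %-encoding mentioned for the 'archived-item' can be sketched
      in Python as follows.  This is only an illustration under the
      assumption that exactly the four characters listed above need to
      be replaced; the function name and the example.org URI are
      hypothetical.

```python
# The four characters named in the text, with their percent-encoded forms.
_PWID_ENCODINGS = {"[": "%5B", "]": "%5D", "#": "%23", "?": "%3F"}


def encode_archived_item(uri: str) -> str:
    """Percent-encode "[", "]", "#" and "?" in an archived URI."""
    # All other characters, including existing "%" escapes, are kept as-is.
    return "".join(_PWID_ENCODINGS.get(char, char) for char in uri)


print(encode_archived_item("http://example.org/a?b=[1]#top"))
# prints http://example.org/a%3Fb=%5B1%5D%23top
```

      Note that a URI which already contains one of these %-encodings
      cannot be told apart from an encoded one afterwards, so a
      resolver may have to try both forms.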

689	   Resolution:

691	      The information in a PWID URN can be used for locating a web
692	      archive resource, for any kind of web archive.  It includes the
693	      minimum information for web archive materials, which enables
694	      resolvability, manually or by a resolver.  Resolution of a PWID
695	      URN is the primary motivation for making a formal URN definition,
696	      instead of just a textual representation of the four needed parts
697	      of a PWID.

699	      Resolution (manually or automatically) is done based on the PWID
700	      parts:

702	      *  Web archive identification for web archive holding referred
703	         resource
704	         The identifier is either an identifier where location of the
705	         web archive can be found by looking up the identifier in a
706	         registry, or it is the domain name for the web archive, where
707	         browsing this domain typically will lead to a description of
708	         how to access the web archive, e.g. online or by applying for
709	         access grants

711	      *  Archived URI or identifier of archived item
712	         If the resource is an archived URI, this URI must be used in
713	         search for or construction of location of the resource.  If the
714	         resource is an identifier assigned to the resource (by the
715	         archive), it is this identifier that must be used in search for
716	         or construction of location of the resource

718	      *  Date and time associated with the archived item
719	         The archival date and time must be used in search for or
720	         construction of the location of the resource

722	      *  Precision of what is referred
723	         The precision can either contribute to the guidance of
724	         activating tools to view the referred item e.g. browse the
725	         referred item as a page on basis of computed closest past,
726	         browse the referred item on basis of parts specified in a
727	         collection, or view the referred item as a snapshot.  In the
728	         example of the snapshot, it also contains a specification of
729	         which resource to display

731	      In the following the different resolution techniques are explained
732	      (manual as well as via a service).

734	      An example of a PWID URN is:

736	         urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk

738	      has the information:

740	      *  archive.org
741	         Currently known identifier in form of the Internet Archive
742	         domain name for their open access web archive.  If Internet
743	         Archive registered their open web archive in an IANA web
744	         archive register, this identifier could currently be
745	         "web.archive.org/web/" for Wayback resolution, or it could be
746	         "archive.org/pwid/" if a PWID interface was created as
747	         described below

749	      *  2016-01-22T11:20:29Z
750	         UTC date and time associated with the archived URI

752	      *  page
753	         Clarification that the reference covers the full web page with
754	         all its inherited parts selected by the web archive

756	      *  http://www.dr.dk
757	         archived URI of item
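
      The split of the example into its four parts can be sketched in
      Python as follows.  Since both the archival time and the archived
      URI contain colons, a plain split on ":" is not enough.  The
      sketch assumes the full timestamp form YYYY-MM-DDThh:mm:ssZ and a
      precision written in lower-case letters; the names Pwid and
      parse_pwid are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class Pwid(NamedTuple):
    archive_id: str
    archival_time: datetime
    precision_spec: str
    archived_item: str


# Assumptions: full UTC timestamp and a lower-case precision token.
_PWID_PATTERN = re.compile(
    r"^urn:pwid:"
    r"(?P<archive>[^:]+):"
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z):"
    r"(?P<precision>[a-z]+):"
    r"(?P<item>.+)$"
)


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into archive, time, precision and archived item."""
    match = _PWID_PATTERN.match(urn)
    if match is None:
        raise ValueError("not a PWID URN of the assumed form: " + urn)
    when = datetime.strptime(match.group("time"), "%Y-%m-%dT%H:%M:%SZ")
    return Pwid(
        archive_id=match.group("archive"),
        archival_time=when.replace(tzinfo=timezone.utc),
        precision_spec=match.group("precision"),
        archived_item=match.group("item"),
    )


parts = parse_pwid(
    "urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk")
print(parts.archive_id, parts.precision_spec, parts.archived_item)
# prints archive.org page http://www.dr.dk
```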

759	      The current (2018) knowledge of Internet Archive's open access
760	      web interface is that it has the pattern:

762	         https://web.archive.org/web/<time>/<uri>

764	      If the web archive has registered an identifier for the web
765	      archive along with the prefix before <time> and <uri>, then this
766	      identifier can be used to manually (or automatically) deduce the
767	      prefix via this register.

769	      From this we can manually (or automatically) deduce an actual
770	      (current 2018) https access address for Internet Archive's Wayback
771	      application (where only the digits of the time are included):

773	         https://web.archive.org/web/20160122112029/http://www.dr.dk

775	      The same recipe can be used for other Wayback platforms for open
776	      web archives.
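
      The recipe above can be sketched in Python as follows.  This is
      an illustration only: the lookup table is a hypothetical stand-in
      for a register of access prefixes, only its archive.org entry is
      taken from the text, and the precision is ignored.  The sketch
      assumes the full timestamp form YYYY-MM-DDThh:mm:ssZ and leaves
      the archived URI exactly as written in the URN.

```python
import re

# Hypothetical stand-in for a register of access prefixes.  Only the
# archive.org entry comes from the text.
ACCESS_PREFIXES = {"archive.org": "https://web.archive.org/web/"}

_PWID_PATTERN = re.compile(
    r"^urn:pwid:"
    r"(?P<archive>[^:]+):"
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z):"
    r"(?P<precision>[a-z]+):"
    r"(?P<item>.+)$"
)


def pwid_to_wayback_uri(urn: str, prefixes=ACCESS_PREFIXES) -> str:
    """Build a <prefix><time>/<uri> access address from a PWID URN."""
    match = _PWID_PATTERN.match(urn)
    if match is None:
        raise ValueError("not a PWID URN of the assumed form: " + urn)
    archive = match.group("archive")
    if archive not in prefixes:
        raise LookupError("no known access prefix for: " + archive)
    # Keep only the digits of the archival time: YYYYMMDDhhmmss.
    digits = re.sub(r"\D", "", match.group("time"))
    return f"{prefixes[archive]}{digits}/{match.group('item')}"


print(pwid_to_wayback_uri(
    "urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk"))
# prints https://web.archive.org/web/20160122112029/http://www.dr.dk
```

      Other Wayback platforms would be covered by adding their prefixes
      to the table, provided they follow the same pattern.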

778	      Another manual resolution would be to find the resource by use of
779	      the specified web archive's search interface.  This will work for
780	      both open web archives and web archives with restricted access.

782	      It is also noteworthy that the information in the PWID can help in
783	      finding an alternative resource, in case the original referred
784	      resource is not available anymore.  The archived URI can be
785	      searched in other web archives, where the date and time can help
786	      to find the best match, e.g. via Memento (for some open web
787	      archives) or via possible future web archive infrastructures.

789	      Regarding the precision specification, there are not yet any
790	      implementations which support distinctive rendering depending on
791	      such a parameter, e.g. only providing html for an html page
792	      specified as part and the page with calculated elements if
793	      specified as page etc.  Therefore, the precision specification
794	      will initially be ignored by a resolution to a Wayback interface.

796	      A resolving service is currently available in form of code for a
797	      prototype which runs at the Royal Danish Library [PWIDresolver] and
798	      is planned to be more broadly available.  This service currently
799	      covers both the Danish web archive (with the proper rights) and
800	      open web archives with access services based on patterns
801	      including archive, archival time and archived URI.  In other
802	      words, for open web archives it covers conversion of PWID URNs
803	      for: archive.org, archive-it.org, arquivo.pt, bibalex.org,
804	      nationalarchives.gov.uk, stanford.edu and vefsafn.is.  For the
805	      Danish web archive with restricted access, the prototype works
806	      locally accessing the CDX of the library, and providing access via
807	      a local proxy to a restricted environment.  The source code for
808	      this prototype is available from
809	      https://github.com/netarchivesuite/NAS-research/releases/
810	      tag/0.0.6.

812	      Automatic access of a referenced web resource may work on the open
813	      web for open web archives or in restricted environments for web
814	      archives with restricted access.  There may be a need for varied
815	      operation depending on the available technology and applications,
816	      e.g.:

818	      *  Via locally installed browser plug-ins or applications forming
819	         http/https URIs as described above

821	      *  Via web research infrastructures
822	         This is a future solution scenario, as a web archive research
823	         infrastructure does not yet exist.  However, it is a likely
824	         future scenario, as it is currently being proposed in the RESAW
825	         community [RESAW]

827	   Documentation:

829	      None relevant

831	   Additional Information:

833	      The PWID was originally suggested as a URI, where the suggestion
834	      was based on joint research by a computer science researcher with
835	      knowledge of web archiving and researchers from the humanities
836	      (History and Literature).  This resulted in the paper "Persistent
837	      Web References - Best Practices and New Suggestions" [IPRES2016]
838	      from the iPres 2016 conference.  In this paper, the PWID is
839	      referred to as WPID.  However, one piece of feedback has been a
840	      concern that WPID was interpreted as a PID related to a PID
841	      system, e.g. as the DOI.  Although PID does not have a precise
842	      definition that makes it wrong to call it a "WPID", the danger is
843	      that it is confused with PID systems, which is not the intention.
844	      Consequently, this suggestion uses the name PWID instead.

846	      The comments on the drafted PWID URI ([DraftPwidUri]) have been
847	      that it seems to be a URN rather than a URI, which is the reason
848	      why it is now suggested as a URN.

850	      At the RESAW 2017 conference there were two related papers: One on
851	      referencing practices [ResawRef] and one on research data
852	      management practices [ResawColl].  This practice is also planned
853	      to be used for Danish web collections.

855	      Interest in this new PWID has already been shown.  There was
856	      a lot of response at iPRES.  Especially at the RESAW 2017
857	      conference, web researchers from digital humanities have expressed
858	      strong interest in the PWID, since it can fill a gap and make it
859	      possible for them to make all the references they need to make.

861	      Therefore, the ambition is to make the PWID URN namespace
862	      definition a constituent part of a standard being developed in the
863	      IETF or some other recognized standards body.

865	      At iPRES 2018, the PWID URN was presented in a digital poster,
866	      which attracted a lot of interest, and it won the "best
867	      poster" award [IPRES2018].  A more researcher-oriented version of
868	      this poster has been accepted to iDCC 2019.

870	   Revision Information:

872	      This is the fourth version of PWID as a URN, where remarks from
873	      the URN PWID reviews have been incorporated.  This largely covers
874	      the following:

876	      *  It has been made more clear that the PWID URN is a needed
877	         supplement to existing standards (especially in Abstract and
878	         Introduction of RFC, as well as Purpose of URN template)

880	      *  It has been made more clear that the PWID URN also can be used
881	         as basis for search of resources that has become unavailable
882	         (especially in the Introduction of RFC, as well as Purpose and
883	         Resolution sections of URN template)

885	      *  The Introduction section of the RFC and the Purpose section of
886	         the URN template have been aligned.

888	      *  'Coverage' has been renamed to 'precision' and it has been
889	         explained in much more detail (especially in the Syntax,
890	         Assignment and Resolution sections)

892	      *  Use of the term "ambiguity" has been rephrased in order to be
893	         more correct

895	      *  'archival-time' and 'URI' have been described in more detail
896	         and more correctly (in the Syntax section)

898	      *  Description of Assignment has been expanded to provide a more
899	         thorough and precise description (in the Assignment section)

901	      *  Description of Resolution has been expanded to provide a more
902	         thorough and precise description (in the Resolution section)

904	      *  The Interoperability descriptions have been adjusted to reflect
905	         the descriptions in the Syntax section (in the Interoperability
906	         section)

908	      Furthermore, the Security and Privacy section has been edited in
909	      order to become more clear, and the Additional Information section
910	      has been extended with mention of the prize-winning iPRES 2018
911	      poster and the coming iDCC 2019 poster.

913	3.  Acknowledgements

915	   A special thanks to Caroline Nyvang and Thomas Kromann who have
916	   contributed to the research identifying the minimum information
917	   required in a persistent web reference, and to Bolette Jurik who
918	   contributed with supplementary research concerning requirements for
919	   web collection/corpora definitions.  Thanks also to all who have
920	   contributed to this work through research and by reviewing this RFC.

922	4.  References

924	4.1.  Normative References

926	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
927	              Requirement Levels", BCP 14, RFC 2119,
928	              DOI 10.17487/RFC2119, March 1997,
929	              <https://www.rfc-editor.org/info/rfc2119>.

931	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
932	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
933	              <https://www.rfc-editor.org/info/rfc3339>.

935	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
936	              Resource Identifier (URI): Generic Syntax", STD 66,
937	              RFC 3986, DOI 10.17487/RFC3986, January 2005,
938	              <https://www.rfc-editor.org/info/rfc3986>.

940	   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
941	              Specifications: ABNF", STD 68, RFC 5234,
942	              DOI 10.17487/RFC5234, January 2008,
943	              <https://www.rfc-editor.org/info/rfc5234>.

945	   [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
946	              (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017,
947	              <https://www.rfc-editor.org/info/rfc8141>.

949	4.2.  Informative References

951	   [DOI]      International DOI Foundation, "The DOI System", 2016,
952	              <https://web.archive.org/web/20161020222635/
953	              https:/www.doi.org/>.

955	              urn:pwid:archive.org:2016-10-20T22:26:35Z:site:https://www.
956	              doi.org/

958	   [DraftPwidUri]
959	              Zierau, E., "DRAFT: Scheme Specification for the pwid URI,
960	              version 4", June 2018, <https://datatracker.ietf.org/doc/
961	              draft-pwid-uri-specification/>.

963	   [IPRES2016]
964	              Zierau, E., Nyvang, C., and T. Kromann, "Persistent Web
965	              References - Best Practices and New Suggestions", October
966	              2016, <http://www.ipres2016.ch/frontend/organizers/media/
967	              iPRES2016/_PDF/
968	              IPR16.Proceedings_4_Web_Broschuere_Link.pdf>.

970	              In: proceedings of the 13th International Conference on
971	              Preservation of Digital Objects (iPres) 2016, pp. 237-246

973	   [IPRES2018]
974	              Zierau, E., "Precise and Persistent Web Archive References
975	              - Status, context and expected progress of the PWID",
976	              September 2018.

978	              In: proceedings of the 15th International Conference on
979	              Preservation of Digital Objects (iPres) 2018

981	   [ISO28500]
982	              International Organization for Standardization,
983	              "Information and documentation -- WARC file format", 2017,
984	              <https://www.iso.org/standard/68004.html>.

986	   [ISO8601]  International Organization for Standardization, "Data
987	              elements and interchange formats -- Information
988	              interchange -- Representation of dates and times", 2004,
989	              <https://www.iso.org/standard/40874.html>.

991	   [MEMENTO]  Memento Development Group, "About the Memento Project",
992	              January 2015, <http://mementoweb.org/about/>.

994	              urn:pwid:archive.org:2018-11-
995	              01T15:26:28Z:page:http://mementoweb.org/about/

997	   [PWIDprovider]
998	              Royal Danish Library (Netarkivet), "SolrWayback 3.1",
999	              2018, <https://github.com/netarchivesuite/solrwayback>.

1001	              urn:pwid:archive.org:2018-06-
1002	              11T02:00:05Z:page:https://github.com/netarchivesuite/
1003	              solrwayback

1005	   [PWIDresolver]
1006	              Royal Danish Library (Netarkivet), "Date and Time Formats:
1007	              note submitted to the W3C. 15 September 1997", 2018,
1008	              <https://github.com/netarchivesuite/NAS-research/releases/
1009	              tag/0.0.6>.

1011	              urn:pwid:archive.org:2018-07-
1012	              16T06:53:51Z:page:https://github.com/netarchivesuite/NAS-
1013	              research/releases/tag/0.0.6

1015	   [RESAW]    The Resaw Community, "A Research infrastructure for the
1016	              Study of Archived Web materials", 2017,
1017	              <https://web.archive.org/web/20170529113150/
1018	              http://resaw.eu/>.

1020	              urn:pwid:archive.org:2017-05-
1021	              29T11:31:50Z:site:http://resaw.eu/

1023	   [ResawColl]
1024	              Jurik, B. and E. Zierau, "Data Management of Web archive
1025	              Research Data", 2017,
1026	              <https://archivedweb.blogs.sas.ac.uk/files/2017/06/
1027	              RESAW2017-JurikZierau-
1028	              Data_management_of_web_archive_research_data.pdf>.

1030	              In: proceedings of the RESAW 2017 Conference, DOI:
1031	              10.14296/resaw.0002

1033	   [ResawRef]
1034	              Nyvang, C., Kromann, T., and E. Zierau, "Capturing the Web
1035	              at Large - a Critique of Current Web Referencing
1036	              Practices", 2017,
1037	              <https://archivedweb.blogs.sas.ac.uk/files/2017/06/
1038	              RESAW2017-NyvangKromannZierau-
1039	              Capturing_the_web_at_large.pdf>.

1041	              In: proceedings of the RESAW 2017 Conference, DOI:
1042	              10.14296/resaw.0004

1044	   [W3CDTF]   W3C, "Date and Time Formats: note submitted to the W3C. 15
1045	              September 1997", 1997,
1046	              <http://www.w3.org/TR/NOTE-datetime>.

1048	              W3C profile of ISO 8601 urn:pwid:archive.org:2017-04-
1049	              03T03:37:42Z:page:http://www.w3.org/TR/NOTE-datetime

1051	Author's Address

1053	   Eld Maj-Britt Olmuetz Zierau (editor)
1054	   Royal Danish Library
1055	   Soeren Kierkegaards Plads 1
1056	   Copenhagen  1219
1057	   Denmark

1059	   Phone: +45 9132 4690
1060	   Email: elzi@kb.dk