idnits 2.17.1 

draft-pwid-urn-specification-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (July 16, 2018) is 2104 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                           E. Zierau, Ed.
3	Internet-Draft                                      Royal Danish Library
4	Intended status: Informational                             July 16, 2018
5	Expires: January 17, 2019

7	            A Persistent Web IDentifier (PWID) URN Namespace
8	                    draft-pwid-urn-specification-03

10	Abstract

12	   This document specifies a Uniform Resource Name (URN) for Persistent
13	   Web IDentifiers to web material in web archives using the 'pwid'
14	   namespace identifier.  The purpose of the standard is to support a
15	   general, exact referencing method, which includes support for
16	   references to archives with restricted access, for exact references
17	   to existing web material, and for exact specification of elements in
18	   a web corpus (possibly spanning over several web archives).  The PWID
19	   URN therefore offers a scheme to make references that are not
20	   currently supported.

22	   The PWID is designed for researchers and is therefore designed to give
23	   general, global, sustainable, humanly readable, technology agnostic,
24	   persistent and precise web references for web materials in web
25	   archives, in a way that can make them potentially resolvable.

27	Status of This Memo

29	   This Internet-Draft is submitted in full conformance with the
30	   provisions of BCP 78 and BCP 79.

32	   Internet-Drafts are working documents of the Internet Engineering
33	   Task Force (IETF).  Note that other groups may also distribute
34	   working documents as Internet-Drafts.  The list of current Internet-
35	   Drafts is at https://datatracker.ietf.org/drafts/current/.

37	   Internet-Drafts are draft documents valid for a maximum of six months
38	   and may be updated, replaced, or obsoleted by other documents at any
39	   time.  It is inappropriate to use Internet-Drafts as reference
40	   material or to cite them other than as "work in progress."

42	   This Internet-Draft will expire on January 17, 2019.

44	Copyright Notice

46	   Copyright (c) 2018 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents
51	   (https://trustee.ietf.org/license-info) in effect on the date of
52	   publication of this document.  Please review these documents
53	   carefully, as they describe your rights and restrictions with respect
54	   to this document.  Code Components extracted from this document must
55	   include Simplified BSD License text as described in Section 4.e of
56	   the Trust Legal Provisions and are provided without warranty as
57	   described in the Simplified BSD License.

59	Table of Contents

61	   1.  Introduction
62	     1.1.  Requirements Language
63	   2.  Namespace Registration Template
64	   3.  Acknowledgements
65	   4.  References
66	     4.1.  Normative References
67	     4.2.  Informative References
68	   Author's Address

70	1.  Introduction

72	   The purpose of the PWID URN is to represent general, global,
73	   sustainable, humanly readable, technology agnostic, persistent and
74	   precise web archive resource references in a way that:

76	   o  can be used for technical solutions e.g. to make them resolvable

78	   o  can cover references to all sorts of materials in web archives

80	   o  can cover references to materials from all sorts of web archives

82	   The motivation for defining a PWID namespace is the growing challenge
83	   of references to archived web resources, which the PWID as a URN can
84	   assist in overcoming.  The standard is needed to reference web
85	   materials with a precision and persistency on par with
86	   traditional references for analogue material.  Furthermore,
87	   it is needed in order to address web archive resources that are not
88	   freely available online.  The PWID URN covers both referencing of web
89	   resources from research papers and definition of web collection/
90	   corpus.  In detail the challenges are:

92	   o  Citation guidelines generally do not cover general and persistent
93	      referencing techniques for web resources that are not registered
94	      by Persistent Identifier systems (like DOI [DOI]).  However, an
95	      increasing number of references point to resources that only exist
96	      on the web, e.g. blogs that turned out to have a historical
97	      impact.  In order to obtain persistency for a reference, the
98	      target needs to be stable.  As the live web is 'alive' and in
99	      constant change, persistency can only be obtained by referring to
100	      archived snapshots of the web.  The PWID URN is therefore focused
101	      on referencing archived web material in a technology agnostic way
102	      (research documented in [IPRES] and [ResawRef]).

104	   o  There are many new initiatives for web archive referencing.  Most
105	      of them are centralised solutions which offer harvesting and
106	      referencing, but these cannot be used for existing materials in
107	      web archives.  Other initiatives only cover open web archives;
108	      they do not cover material in archives with restricted access,
109	      and there is a risk of imprecision if resolving a reference
110	      returns a resource from an alternative archive.
111	      The PWID URN is needed in order to fill these gaps where other
112	      techniques are not sufficient.

114	   o  There are many different requirements for construction of
115	      collection definitions for web material besides precision and
116	      persistency.  Recent research has found that various legal and
117	      sustainability issues lead to a need for a collection to be
118	      defined by references to the web parts in the collection.  The
119	      PWID URN is needed in such definitions in order to fulfil these
120	      requirements and to enable a collection to cover web materials
121	      from more archives (Research documented in [ResawColl]).

123	   The PWID is especially useful for web material where precision is in
124	   focus and/or there are references to materials from web archives
125	   requiring special grants in order to gain access.  The precision
126	   concerns both precise reference, where there can be no doubt
127	   that you have the correct web material, and precision about
128	   what is actually referred to by the reference (e.g. is it the page
129	   or the whole website).

131	   Furthermore the PWID is very useful in specification of contents of a
132	   web collection (also known as web corpus).  Definitions of web
133	   collections are often needed for extraction of data used in
134	   production of research results, e.g. for evaluations in the future.
135	   Current practices are not persistent as they often use some CDX
136	   version, which varies between implementations.

138	   For the sake of usability and sustainability, the definition of the
139	   PWID URN is focused on only having the minimum required information
140	   to make a precise identification of a resource in an arbitrary web
141	   archive.  Recent research has found that this is obtained by the
142	   following information [ResawRef]:

144	   o  Identification of web archive
145	   o  Identification of source:

147	      *  Archived URI or identifier

149	      *  Archival timestamp

151	   o  Intended coverage (page, part, subsite etc.)

153	   The PWID URN represents this information in an unambiguous way,
154	   thus enabling technical solutions to be built on this URN.

156	1.1.  Requirements Language

158	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
159	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
160	   document are to be interpreted as described in [RFC2119].

162	2.  Namespace Registration Template

164	   Namespace Identifier:

166	      PWID

168	   Version:

170	      3

172	   Date:

174	      2018-07-13

176	   Registrant:

178	      Eld Maj-Britt Olmuetz Zierau
179	      Royal Danish Library
180	      Soeren Kierkegaards Plads 1
181	      1219 Copenhagen
182	      Denmark
183	      ph: +45 9132 4690
184	      email: elzi@kb.dk

186	   Purpose:

188	      The purpose of the PWID URN is to represent general, global,
189	      sustainable, humanly readable, technology agnostic, persistent and
190	      precise web archive resource references in a way that:

192	      *  can be used for technical solutions e.g. to make them
193	         resolvable

195	      *  can cover references to all sorts of materials in web archives

197	      *  can cover references to materials from all sorts of web archives

199	      The motivation for defining a PWID namespace is the growing
200	      challenge of references to archived web resources, which the PWID
201	      as a URN can assist in overcoming.  The standard is needed to
202	      reference web materials with a precision and persistency on
203	      par with traditional references for analogue
204	      material.  Furthermore, it is needed in order to address web
205	      archive resources that are not freely available online.  This
206	      regards both referencing of web resources from research papers and
207	      definition of web collection/corpus.  In detail the challenges
208	      are:

210	      *  Citation guidelines generally do not cover general and
211	         persistent referencing techniques for web resources that are
212	         not registered by Persistent Identifier systems (like DOI
213	         [DOI]).  However, an increasing number of references point to
214	         resources that only exist on the web, e.g. blogs that turned
215	         out to have a historical impact.  In order to obtain
216	         persistency for a reference, the target needs to be stable.  As
217	         the live web is 'alive' and in constant change, persistency can
218	         only be obtained by referring to archived snapshots of the web.
219	         The PWID URN is therefore focused on referencing archived web
220	         material in a technology agnostic way (research documented in
221	         [IPRES] and [ResawRef]).

223	      *  There are many new initiatives for web archive referencing.
224	         Most of them are centralised solutions which offer harvesting and
225	         referencing, but these cannot be used for existing materials in
226	         web archives.  Other initiatives only cover open web archives;
227	         they do not cover material in archives with restricted
228	         access, and there is a risk of imprecision if resolving a
229	         reference returns a resource from an alternative archive.
230	         The PWID URN is needed in order to fill these gaps
231	         where other techniques are not sufficient.

233	      *  There are many different requirements for construction of
234	         collection definitions for web material besides precision and
235	         persistency.  Recent research has found that various legal and
236	         sustainability issues lead to a need for a collection to be
237	         defined by references to the web parts in the collection.  The
238	         PWID URN is needed in such definitions in order to fulfil these
239	         requirements and to enable a collection to cover web materials
240	         from more archives (research documented in [ResawColl]).

242	      The PWID is especially useful for web material where precision is
243	      in focus and/or there are references to materials from web
244	      archives requiring special grants in order to gain access.  The
245	      precision concerns both precise reference, where there can
246	      be no doubt that you have the correct web material, and
247	      precision about what is actually referred to by the reference
248	      (e.g. is it the page or the whole website).

250	      Furthermore the PWID is very useful in specification of contents
251	      of a web collection (also known as web corpus).  Definitions of
252	      web collections are often needed for extraction of data used in
253	      production of research results, e.g. for evaluations in the
254	      future.  Current practices are not persistent as they often
255	      use some CDX version, which varies between implementations.

257	      Strict unambiguous syntax is needed for the PWID reference in
258	      order to ensure that it can be used for computational purposes.
259	      This is relevant for web collection definitions, which will need a
260	      strict syntax in order to be a basis for automatic extraction.
261	      Furthermore, readers of research papers today expect to be
262	      able to access a referenced resource by clicking an actionable
263	      URI.  A similar facility will therefore be expected for references
264	      to available archived web material, which strict syntax can make
265	      possible.  Examples of technical solutions that are enabled by
266	      strict syntax are:

268	      *  Resolving of references and automatic extraction of web
269	         collections defined by PWID URNs [ResawRef] [ResawColl]

271	      *  Resolving of a PWID reference by resolving services.  As a
272	         start, there is work on a prototype that can work for the
273	         Danish web archive data and open web archives with standard
274	         patterns for the current technologies.  Different
275	         implementations for resolving may emerge, which may rely on
276	         different protocols and applications.

278	      The purpose of the PWID is also to express a web archive reference
279	      as simply as possible while at the same time meeting requirements
280	      for sustainability, usability and scope.  Therefore, the PWID URN
281	      is focused on only having the minimum required information to make
282	      a precise identification of a resource in an arbitrary web
283	      archive.  Recent research has found that this is obtained by the
284	      following information [ResawRef]:

286	      *  Identification of web archive
287	      *  Identification of source:

289	         +  Archived URI or identifier

291	         +  Archival timestamp

293	      *  Intended coverage (page, part, subsite etc.)

295	      The PWID URN represents this information in an unambiguous way,
296	      thus enabling technical solutions to be built on this URN.

298	   Syntax:

300	      The syntax of the PWID URN is specified below in Augmented Backus-
301	      Naur Form (ABNF) [RFC5234] and it conforms to URN syntax defined
302	      in RFC 8141 [RFC8141].  The syntax definition of the PWID URN is:

304	           pwid-urn = "urn" ":" pwid-NID ":" pwid-NSS

306	           pwid-NID = "pwid"
307	           pwid-NSS = archive-id ":" archival-time ":" coverage-spec
308	                               ":" archived-item

310	           archive-id = 1*( unreserved )

312	           archival-time = full-date datetime-delim full-pwid-time
313	           datetime-delim = "T"
314	           full-pwid-time = time-hour [":"] time-minute
315	                                     [":"] time-second "Z"

317	           coverage-spec = "part" / "page" / "subsite" / "site"
318	                    / "collection" / "recording" / "snapshot"
319	                    / "other"

321	           archived-item = URI / archived-item-id
322	           archived-item-id = 1*( unreserved )

324	      where

326	      *  'unreserved' is defined as in RFC 3986 [RFC3986]

328	      *  'coverage-spec' values are not case sensitive (i.e.  "PAGE" /
329	         "PART" / "PaGe" / ... are valid values as well.)

331	      *  'archival-time' is a UTC timestamp conforming to the W3C
332	         profile of ISO 8601 [ISO8601] (also defined in RFC 3339
333	         [RFC3339]), with a few exceptions.  It has to be a UTC timestamp
334	         in order to conform with web archiving practices, which always
335	         use UTC in order to avoid confusion.  The 'full-date' is
336	         defined as in RFC 3339 [RFC3339].  The 'archival-time' must
337	         represent the time specified in the archive, and can therefore
338	         be specified at any of the levels of granularity described
339	         in [W3CDTF] and in accordance with the WARC standard ISO 28500
340	         [ISO28500].

342	         In line with RFC 3339 [RFC3339] the "T" may alternatively be
343	         lower case "t".

345	         'time-hour', 'time-minute' and 'time-second' are defined as in
346	         RFC 3339 [RFC3339].

348	         In line with RFC 3339 [RFC3339] the "Z" may alternatively be
349	         lower case "z".

351	      *  'URI' is defined as in RFC 3986 [RFC3986]

353	      The 'coverage-spec' defines the type of archived item, serving as
354	      a clarification of what is referred to:

356	      *  part
357	         the single archived element, e.g. a PDF, an HTML text, an image

359	      *  page
360	         the full context as a page, e.g. an HTML page with referred
361	         images

363	      *  subsite
364	         the full context as a subsite within its domain, e.g. a
365	         document represented in a web structure

367	      *  site
368	         the full context as a site within its domain

370	      *  collection
371	         a collection/corpus definition, e.g. defined as described in
372	         [ResawColl]

374	      *  snapshot
375	         a snapshot (image) representation of web material, e.g. a web
376	         page

378	      *  recording
379	         a recording of a web browsing session

381	      *  other
382	         if something else
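
      As an illustration of the syntax above, the following minimal
      Python sketch splits a PWID URN into its four parts.  It is not
      part of the specification.  The names parse_pwid, Pwid and
      COVERAGE_SPECS are hypothetical.  The sketch follows the ABNF only
      (full date and time), does not validate the archived item as an
      RFC 3986 URI, and uses a regular expression rather than a plain
      split on ":" because both the timestamp and the archived URI may
      contain colons themselves.

```python
import re
from typing import NamedTuple

# The eight coverage-spec values listed in the syntax definition.
COVERAGE_SPECS = {
    "part", "page", "subsite", "site",
    "collection", "recording", "snapshot", "other",
}

# Assumption: a simplified pattern that mirrors the ABNF. The archived
# item is accepted as any non-empty string; it is not checked against
# the full URI grammar of RFC 3986.
_PWID_RE = re.compile(
    r"^urn:pwid:"
    r"(?P<archive>[A-Za-z0-9._~-]+):"                               # archive-id
    r"(?P<time>[0-9]{4}-[0-9]{2}-[0-9]{2}T"
    r"[0-9]{2}:?[0-9]{2}:?[0-9]{2}Z):"                              # archival-time
    r"(?P<coverage>[A-Za-z]+):"                                     # coverage-spec
    r"(?P<item>.+)$",                                               # archived-item
    re.IGNORECASE,  # "urn", "pwid", "T" and "Z" may be lower case
)


class Pwid(NamedTuple):
    archive_id: str
    archival_time: str
    coverage_spec: str
    archived_item: str


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into its parts, or raise ValueError."""
    match = _PWID_RE.match(urn)
    if match is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    coverage = match.group("coverage").lower()  # values are case insensitive
    if coverage not in COVERAGE_SPECS:
        raise ValueError(f"unknown coverage-spec: {coverage!r}")
    return Pwid(
        archive_id=match.group("archive"),
        archival_time=match.group("time").upper(),
        coverage_spec=coverage,
        archived_item=match.group("item"),
    )
```

      For example, parse_pwid applied to the URN
      urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk
      yields the archive "archive.org", the time "2016-01-22T11:20:29Z",
      the coverage "page" and the item "http://www.dr.dk".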

384	   Assignment:

386	      PWID URNs do not have to be assigned by an authority, as
387	      they are based on the information created at the time of
388	      archiving:

390	      *  Identification of web archive

392	      *  Identification of source:

394	         +  Archived URI or identifier

396	         +  Archival timestamp

398	      *  Intended coverage (page, part, subsite etc.)

400	      The rest of the PWID URN

402	      *  Intended coverage (page, part, subsite etc.)

404	      specifies what the user of the PWID URN wants to focus on,
405	      and may later be used to decide how a resource is displayed.  However,
406	      it is not part of the actual location of the resource.

408	      In other words: the PWID URNs are created independently, but
409	      following an algorithm that itself guarantees uniqueness.
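
      Because no assigning authority is involved, a PWID URN can be
      composed directly from the archive's own capture metadata.  The
      following Python sketch shows one possible way to do so; it is an
      illustration only.  The function name build_pwid and its
      parameters are hypothetical, and the sketch assumes that the
      capture time is available as a timezone-aware datetime, which it
      converts to UTC as the syntax requires.

```python
from datetime import datetime, timedelta, timezone

COVERAGE_SPECS = {
    "part", "page", "subsite", "site",
    "collection", "recording", "snapshot", "other",
}


def build_pwid(archive_id: str, archived_at: datetime,
               coverage_spec: str, archived_item: str) -> str:
    """Compose a PWID URN from capture metadata (illustrative only)."""
    if archived_at.tzinfo is None:
        # A naive datetime cannot be converted to UTC reliably.
        raise ValueError("archival time must be timezone-aware")
    coverage = coverage_spec.lower()
    if coverage not in COVERAGE_SPECS:
        raise ValueError(f"unknown coverage-spec: {coverage_spec!r}")
    # archival-time must be expressed in UTC and end in "Z".
    stamp = archived_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"urn:pwid:{archive_id}:{stamp}:{coverage}:{archived_item}"


# Usage: a capture recorded at 12:20:29 in a UTC+1 zone is emitted in UTC.
utc_plus_one = timezone(timedelta(hours=1))
example = build_pwid(
    "archive.org",
    datetime(2016, 1, 22, 12, 20, 29, tzinfo=utc_plus_one),
    "page",
    "http://www.dr.dk",
)
```

      Two parties that hold the same archive identifier, archived URI
      and archival timestamp arrive at the same URN independently, which
      is the property the text above relies on.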

411	      In this version of the standard, it is recommended to use the web
412	      domain as the identifier for the web archive.  This is
413	      recommended, since it currently implicitly provides information
414	      about the web archive.  Furthermore, it is more precise than e.g.
415	      the name of the archive, since there may be more than one
416	      installation of web archives in the same organisation, e.g.
417	      archive.org and archive-it.org are both covered by Internet
418	      Archive.

420	      Currently, there is also a prototype for a SOLR-Wayback tool
421	      (source at https://github.com/netarchivesuite/solrwayback)
422	      [PWIDprovider], which can assist in finding the most precise
423	      reference to an archived web page by providing all PWIDs belonging
424	      to it.  For example, the web page in archive: netarkivet.dk, archived URI:
425	      http://www.susanlegetoej.dk/shop/handskedyr-siameser-killing-
426	      8681p.html, archiving time: 2008-11-29 01:19:16 UTC,
427	      has the parts:

429	         urn:pwid:netarkivet.dk:2008-11-
430	         29T00:41:42Z:part:http://www.susanlegetoej.dk/images/ddcss/
431	         SK113_Master_NF.css

432	         urn:pwid:netarkivet.dk:2008-11-
433	         29T00:39:47Z:part:http://www.susanlegetoej.dk/shop/css/
434	         print.css

436	         urn:pwid:netarkivet.dk:2008-11-
437	         29T00:40:06Z:part:http://www.susanlegetoej.dk/images/ddcss/
438	         SK113_Basket_NF.css

440	         urn:pwid:netarkivet.dk:2008-11-
441	         29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/
442	         SK113_TopMenu_NF.css

444	         urn:pwid:netarkivet.dk:2008-11-
445	         29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/
446	         SK113_SearchPage_NF.css

448	         urn:pwid:netarkivet.dk:2008-11-
449	         29T00:40:35Z:part:http://www.susanlegetoej.dk/images/ddcss/
450	         SK113_Productmenu_NF.css

452	         urn:pwid:netarkivet.dk:2008-11-
453	         29T00:40:22Z:part:http://www.susanlegetoej.dk/images/ddcss/
454	         SK113_SpaceTop_NF.css

456	         urn:pwid:netarkivet.dk:2008-11-
457	         29T00:40:24Z:part:http://www.susanlegetoej.dk/images/ddcss/
458	         SK113_SpaceLeft_NF.css

460	         urn:pwid:netarkivet.dk:2008-11-
461	         29T00:40:23Z:part:http://www.susanlegetoej.dk/images/ddcss/
462	         SK113_SpaceBottom_NF.css

464	         urn:pwid:netarkivet.dk:2008-11-
465	         29T00:40:25Z:part:http://www.susanlegetoej.dk/images/ddcss/
466	         SK113_SpaceRight_NF.css

468	         urn:pwid:netarkivet.dk:2008-11-
469	         29T00:37:23Z:part:http://www.susanlegetoej.dk/images/ddcss/
470	         SK113_ProductInfo_NF.css

472	         urn:pwid:netarkivet.dk:2008-11-
473	         29T00:37:24Z:part:http://www.susanlegetoej.dk/Shop/js/
474	         Variants.js

476	         urn:pwid:netarkivet.dk:2009-03-
477	         03T11:53:00Z:part:http://www.susanlegetoej.dk/Shop/js/Media.js

478	         urn:pwid:netarkivet.dk:2009-03-
479	         03T11:53:02Z:part:http://www.susanlegetoej.dk/images/design/
480	         print.gif

482	         urn:pwid:netarkivet.dk:2009-03-
483	         03T11:54:19Z:part:http://www.susanlegetoej.dk/Shop/js/Scroll.js

485	         urn:pwid:netarkivet.dk:2009-03-
486	         03T11:54:09Z:part:http://www.susanlegetoej.dk/Shop/js/
487	         Shop5Common.js

489	         urn:pwid:netarkivet.dk:2006-11-
490	         20T20:16:03Z:part:http://www.susanlegetoej.dk/images/602551.jpg

492	      In the long term, a registry should be created that keeps track
493	      of identifiers of archives over time, since archives are likely to
494	      change names, merge etc. when talking about a 100-year period.

496	   Security and Privacy:

498	      Security and privacy considerations are restricted to accessible
499	      web resources in web archives.  If resolvers for PWID URNs are
500	      created, an analysis should be made of whether they can be
501	      restricted to the aforementioned registry of web archives.
502	      Security and privacy will then be a question of security and
503	      privacy considerations related to the web archive resources.

505	   Interoperability:

507	      This is covered by comments in the Syntax description:

509	      *  the PWID URN conforms to the URI standard defined as in RFC
510	         3986 [RFC3986] and the URN standard RFC 8141 [RFC8141]

512	      *  the 'archival-time' of the PWID URN conforms to the W3C
513	         profile of ISO 8601
514	         [ISO8601] (also defined in RFC 3339 [RFC3339]) and to the WARC
515	         standard ISO 28500 [ISO28500] using UTC dates only

517	      *  the 'archived-item' is a URI which conforms to the URI standard
518	         defined as in RFC 3986 [RFC3986]

520	   Resolution:

522	      The information in a PWID URN can be used for locating a web
523	      archive resource, for any kind of web archive.  It includes the
524	      minimum information for web archive materials, which enables
525	      resolvability, manually or by a resolver.  Resolution of a PWID URN
526	      is the primary motivation for making a formal URN definition,
527	      instead of just a textual representation of the four needed parts of
528	      a PWID.

530	      A resolving service is currently available in the form of code for a
531	      prototype which runs at the Royal Danish Library [PWIDresolver] and
532	      is planned to be made more broadly available; it can also be installed
533	      locally.  This service currently covers both the Danish web
534	      archives (with the proper rights) and open web archives with
535	      access services based on patterns including archive, archival
536	      time and archived URI.  In other words, for open web archives it
537	      covers conversion of PWIDs for: archive.org, archive-it.org,
538	      arquivo.pt, bibalex.org, nationalarchives.gov.uk, stanford.edu and
539	      vefsafn.is.  The source code for this prototype is available from
540	      https://github.com/netarchivesuite/NAS-research/releases/
541	      tag/0.0.6.

543	      Resolution (manually or automatically) is done based on the PWID
544	      parts:

546	      *  Web archive identification
547	         to find the archive holding the material

549	      *  Archived URI or identifier of item
550	         as part of identifying the material

552	      *  Date and time associated with the archived URI/item
553	         as part of precise identification of the material

555	      *  Coverage of what is referred
556	         as part of clarification of what the referred material covers
557	         (page, part etc.)

559	      In the following, the different resolution techniques are explained
560	      (manual as well as via a service).  An example of a PWID URN is:

562	         urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk

564	      This URN has the information:

566	      *  archive.org
567	         currently known identifier in the form of the Internet Archive
568	         domain name for their open access web archive

570	      *  2016-01-22T11:20:29Z
571	         UTC date and time associated with the archived URI

573	      *  page
574	         clarification that the reference covers the full web page with
575	         all its inherited parts selected by the web archive

577	      *  http://www.dr.dk
578	         archived URI of item

580	      With knowledge of the current (2017) Internet Archive open access
581	      web interface having the form:

583	         https://web.archive.org/web/<time>/<uri>

585	      we can manually (or technically) deduce an actual (current as of 2017)
586	      https access address:

588	         https://web.archive.org/web/20160122112029/http://www.dr.dk

590	      and regard the referred web part as the reference in the way that
591	      the coverage specifies, i.e. for a web page the value
592	      'part' would mean the HTML of the web page, 'page' would mean
593	      the result of the web archive rendering the referred HTML as a
594	      web page, etc.

596	      The same recipe can be used for other Wayback platforms - and
597	      possibly also other web archive access tool platforms, as the
598	      crucial information is date and URI, which are requested to be
599	      looked up in a specified archive.
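
      The manual recipe above can be sketched in Python as follows.
      This is an illustration under stated assumptions, not the
      prototype resolver: the function name pwid_to_wayback_url and the
      base parameter are hypothetical, the URL pattern is the 2017
      Internet Archive form quoted above, and the coverage value is
      ignored because that access pattern has no place for it.

```python
import re

# Simplified pattern following the PWID syntax: the date and time digits
# are captured separately so they can be joined into a 14-digit stamp.
_PWID_RE = re.compile(
    r"^urn:pwid:([^:]+):"
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T"
    r"([0-9]{2}):?([0-9]{2}):?([0-9]{2})Z:"
    r"([A-Za-z]+):(.+)$",
    re.IGNORECASE,
)


def pwid_to_wayback_url(urn: str,
                        base: str = "https://web.archive.org/web") -> str:
    """Turn a PWID URN into a Wayback-style access URL (illustrative)."""
    match = _PWID_RE.match(urn)
    if match is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    # Groups 2..7 hold year, month, day, hour, minute and second.
    timestamp = "".join(match.group(2, 3, 4, 5, 6, 7))
    archived_item = match.group(9)
    return f"{base}/{timestamp}/{archived_item}"
```

      Applied to the example URN, the sketch produces the same address
      as the manual deduction,
      https://web.archive.org/web/20160122112029/http://www.dr.dk.  For
      another Wayback installation, only the base value would change.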

601	      Note that this also includes access to archives that are only
602	      accessible via a local proxy to a restricted environment (which
603	      the current prototype does for references to the Danish
604	      Netarkivet).  Here the difference is that the archive information
605	      is used to identify the local environment used (possibly on-site)
606	      and then construct local http/https address based on knowledge
607	      from the local access installation.

609	      Automatic access of a referenced web resource may work on the open
610	      net for open web archives or in restricted environments for the web
611	      archives with restricted access.  There may be a need for varied
612	      operation depending on the available technology and applications,
613	      e.g.:

615	      *  Via locally installed browser plug-ins or applications forming
616	         http/https URIs:

618	         +  http/https URIs for standard web archive interfaces
619	            At this stage there are initiatives on streamlined and
620	            standardized APIs for web archive interfaces.  If
621	            such APIs are implemented generally, they may be used for
622	            resolving PWID URNs.  This could take the form (denoting
623	            PWID parts in <> using syntax names):

625	               https://<archive-id>/pwid?time=<archival-
626	               time>&coverage=<coverage-spec>&item=<archived-item>

628	            The example from the previous section would then resolve to

630	               https://archive.org/pwid?time=2016-01-22T11:20:29Z&covera
631	               ge=page&item=http://www.dr.dk

633	         +  http/https URIs for archive material for individual web
634	            archives
635	            Using the current open access http/https address pattern for
636	            the individual web archives, which for the example is

638	               https://web.archive.org/web/20160122112029/
639	               http://www.dr.dk

641	            This would require a registry of the different patterns for
642	            the individual web archives

644	      *  Via web research infrastructures
645	         This is a future solution scenario, as a web archive research
646	         infrastructure does not yet exist.  However, it is a likely
647	         future scenario, as it is currently being proposed in the RESAW
648	         community [RESAW].  The PWID URN resolving could in such cases
649	         be a question of starting a special application, as for the
650	         'mailto' scheme RFC 6068 [RFC6068].
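
      The two URI-forming options above can be combined in a small
      dispatcher, sketched here in Python.  Everything in it is an
      assumption for illustration: the registry ARCHIVE_PATTERNS and the
      function resolve are hypothetical names, the only registered
      pattern is the Internet Archive form quoted in this document, and
      the fallback uses the possible future API form shown above, which
      no archive is known to implement.

```python
from urllib.parse import urlencode

# Hypothetical registry mapping an archive-id to its access URL pattern.
# Only the archive.org pattern is taken from this document.
ARCHIVE_PATTERNS = {
    "archive.org": "https://web.archive.org/web/{compact_time}/{item}",
}


def resolve(archive_id: str, archival_time: str,
            coverage_spec: str, archived_item: str) -> str:
    """Form an http/https URI from already parsed PWID parts."""
    pattern = ARCHIVE_PATTERNS.get(archive_id.lower())
    if pattern is not None:
        # Archive-specific pattern: keep only the digits of the timestamp,
        # e.g. 2016-01-22T11:20:29Z becomes 20160122112029.
        compact = "".join(ch for ch in archival_time if ch.isdigit())
        return pattern.format(compact_time=compact, item=archived_item)
    # Fallback: the possible standard API form. ":" and "/" are left
    # unescaped so the result matches the example in the text; other
    # reserved characters in the item are percent-encoded.
    query = urlencode(
        {
            "time": archival_time,
            "coverage": coverage_spec.lower(),
            "item": archived_item,
        },
        safe=":/",
    )
    return f"https://{archive_id}/pwid?{query}"
```

      With the parts of the example URN, resolve returns the Wayback
      address for "archive.org" and an address of the form
      https://<archive-id>/pwid?time=...&coverage=...&item=... for an
      archive that is not in the registry.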

652	   Documentation:

654	      None relevant

656	   Additional Information:

658	      The PWID was originally suggested as a URI based on joint research
659	      between a computer science researcher with knowledge of web archiving
660	      and researchers from humanities subjects (History and Literature).
661	      This resulted in the paper "Persistent Web References - Best
662	      Practices and New Suggestions" [IPRES] from the iPres 2016
663	      conference.  In this paper the PWID is referred to as WPID.
664	      However, some of the feedback expressed a concern that WPID would be
665	      interpreted as a PID related to a PID system, e.g. the DOI.
666	      Although PID does not have a precise definition that makes it
667	      wrong to call it a "WPID", the danger is that it is confused with
668	      PID systems, which is not the intention.  Consequently, this
669	      proposal uses the name PWID instead.

671	      The comments on the drafted PWID URI ([DraftPwidUri]) have been
672	      that it seems to be a URN rather than a URI.  This is the reason
673	      why it is now suggested as a URN, although there is a danger that
674	      users of the reference style can be confused by the additional
675	      "urn:" prefix.

677	      At the RESAW 2017 conference there were two related papers: one
678	      on referencing practices [ResawRef] and one on research data
679	      management practices [ResawColl].  This practice is also planned
680	      to be used for Danish web collections.

682	      Interest in the PWID has already been shown.  There was a lot
683	      of response at iPRES, and especially at the RESAW 2017
684	      conference, web researchers from digital humanities expressed
685	      strong interest in the PWID, since it can fill a gap and make it
686	      possible for them to make all the references they need to make.
687	      Therefore, the ambition is to make the PWID URN namespace
688	      definition a constituent part of a standard being developed in the
689	      IETF or some other recognized standards body.  The PWID is also
690	      suggested in a textual form in a draft of the revision of the
691	      ISO 690 reference standard.

693	   Revision Information:

695	      This is the third version of PWID as a URN, where prototypes for
696	      resolving a PWID and getting PWIDs for a web page have been added
697	      and explained.  Furthermore, it has been made clearer where the
698	      PWID URN makes a difference, and "closed archives" has been
699	      changed to "archives with restricted access".

701	3.  Acknowledgements

703	   A special thanks to Caroline Nyvang and Thomas Kromann, who have
704	   contributed to the research identifying the minimum information
705	   required in a persistent web reference, and to Bolette Jurik, who
706	   contributed supplementary research concerning requirements for
707	   web collection/corpora definitions.  Thanks also to all who have
708	   contributed to this work with research and by reviewing this RFC.

710	4.  References

712	4.1.  Normative References

714	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
715	              Requirement Levels", BCP 14, RFC 2119,
716	              DOI 10.17487/RFC2119, March 1997,
717	              <https://www.rfc-editor.org/info/rfc2119>.

719	   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
720	              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
721	              <https://www.rfc-editor.org/info/rfc3339>.

723	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
724	              Resource Identifier (URI): Generic Syntax", STD 66,
725	              RFC 3986, DOI 10.17487/RFC3986, January 2005,
726	              <https://www.rfc-editor.org/info/rfc3986>.

728	   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
729	              Specifications: ABNF", STD 68, RFC 5234,
730	              DOI 10.17487/RFC5234, January 2008,
731	              <https://www.rfc-editor.org/info/rfc5234>.

733	   [RFC8141]  Saint-Andre, P. and J. Klensin, "Uniform Resource Names
734	              (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017,
735	              <https://www.rfc-editor.org/info/rfc8141>.

737	4.2.  Informative References

739	   [DOI]      International DOI Foundation, "The DOI System", 2016,
740	              <https://web.archive.org/web/20161020222635/
741	              https:/www.doi.org/>.

743	              urn:pwid:archive.org:2016-10-20T22:26:35:site:https://www.
744	              doi.org/

746	   [DraftPwidUri]
747	              Zierau, E., "DRAFT: Scheme Specification for the pwid URI,
748	              version 4", June 2018, <https://datatracker.ietf.org/doc/
749	              draft-pwid-uri-specification/>.

751	   [IPRES]    Zierau, E., Nyvang, C., and T. Kromann, "Persistent Web
752	              References - Best Practices and New Suggestions", October
753	              2016, <http://www.ipres2016.ch/frontend/organizers/media/
754	              iPRES2016/_PDF/
755	              IPR16.Proceedings_4_Web_Broschuere_Link.pdf>.

757	              In: proceedings of the 13th International Conference on
758	              Preservation of Digital Objects (iPres) 2016, pp. 237-246

760	   [ISO28500]
761	              International Organization for Standardization,
762	              "Information and documentation -- WARC file format", 2017,
763	              <https://www.iso.org/standard/68004.html>.

765	   [ISO8601]  International Organization for Standardization, "Data
766	              elements and interchange formats -- Information
767	              interchange -- Representation of dates and times", 2004,
768	              <https://www.iso.org/standard/40874.html>.

770	   [PWIDprovider]
771	              Royal Danish Library (Netarkivet), "SolrWayback 3.1",
772	              2018, <https://github.com/netarchivesuite/solrwayback>.

774	              urn:pwid:archive.org:2018-06-
775	              11T02:00:05Z:page:https://github.com/netarchivesuite/
776	              solrwayback

778	   [PWIDresolver]
779	              Royal Danish Library (Netarkivet), "Date and Time Formats:
780	              note submitted to the W3C. 15 September 1997", 2018,
781	              <https://github.com/netarchivesuite/NAS-research/releases/
782	              tag/0.0.6>.

784	              urn:pwid:archive.org:2018-07-
785	              16T06:53:51Z:page:https://github.com/netarchivesuite/NAS-
786	              research/releases/tag/0.0.6

788	   [RESAW]    The Resaw Community, "A Research infrastructure for the
789	              Study of Archived Web materials", 2017,
790	              <https://web.archive.org/web/20170529113150/
791	              http://resaw.eu/>.

793	              urn:pwid:archive.org:2017-05-29T11:31:50Z:site:http://
794	              resaw.eu/

796	   [ResawColl]
797	              Jurik, B. and E. Zierau, "Data Management of Web archive
798	              Research Data", 2017,
799	              <https://archivedweb.blogs.sas.ac.uk/files/2017/06/
800	              RESAW2017-JurikZierau-
801	              Data_management_of_web_archive_research_data.pdf>.

803	              In: proceedings of the RESAW 2017 Conference, DOI:
804	              10.14296/resaw.0002

806	   [ResawRef]
807	              Nyvang, C., Kromann, T., and E. Zierau, "Capturing the Web
808	              at Large - a Critique of Current Web Referencing
809	              Practices", 2017,
810	              <https://archivedweb.blogs.sas.ac.uk/files/2017/06/
811	              RESAW2017-NyvangKromannZierau-
812	              Capturing_the_web_at_large.pdf>.

814	              In: proceedings of the RESAW 2017 Conference, DOI:
815	              10.14296/resaw.0004

817	   [RFC6068]  Duerst, M., Masinter, L., and J. Zawinski, "The 'mailto'
818	              URI Scheme", RFC 6068, DOI 10.17487/RFC6068, October 2010,
819	              <https://www.rfc-editor.org/info/rfc6068>.

821	   [W3CDTF]   W3C, "Date and Time Formats: note submitted to the W3C. 15
822	              September 1997", 1997,
823	              <http://www.w3.org/TR/NOTE-datetime>.

825	              W3C profile of ISO 8601 urn:pwid:archive.org:2017-04-
826	              03T03:37:42Z:page:http://www.w3.org/TR/NOTE-datetime

828	Author's Address

830	   Eld Maj-Britt Olmuetz Zierau (editor)
831	   Royal Danish Library
832	   Soeren Kierkegaards Plads 1
833	   Copenhagen  1219
834	   Denmark

836	   Phone: +45 9132 4690
837	   Email: elzi@kb.dk