idnits 2.17.1 draft-pwid-urn-specification-09.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (September 5, 2019) is 1694 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- No issues found here. Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force E. Zierau, Ed. 
3 Internet-Draft Royal Danish Library 4 Intended status: Informational September 5, 2019 5 Expires: March 8, 2020 7 A Persistent Web IDentifier (PWID) URN Namespace 8 draft-pwid-urn-specification-09 10 Abstract 12 This document specifies a Uniform Resource Name (URN) for Persistent 13 Web IDentifiers for web material in web archives using the 'pwid' 14 namespace identifier. 16 The main purpose of the standard is to support specification of 17 references that are not covered by other reference techniques: to 18 support references to material in web archives with restricted 19 access. Furthermore, it supports persistent technology agnostic 20 references to web archives in general, in a form that can work as an 21 algorithmic basis for finding web archive resources in general. An 22 additional important benefit is that the standard can be used for 23 specifying web collections, which can then form a persistent 24 computational basis for the extract of the archived collection parts. 26 The PWID URN is designed to meet requirements for proper referencing 27 needed by researchers. Therefore, it is designed as general, global, 28 sustainable, humanly readable, technology agnostic, persistent and 29 precise web references for web materials in web archives. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at https://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on March 8, 2020. 
48 Copyright Notice 50 Copyright (c) 2019 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (https://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 66 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 67 2. Namespace Registration Template . . . . . . . . . . . . . . . 5 68 3. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 19 69 4. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 70 4.1. Normative References . . . . . . . . . . . . . . . . . . 19 71 4.2. Informative References . . . . . . . . . . . . . . . . . 20 72 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 22 74 1. Introduction 76 The PWID URN is a supplement to existing reference standards, where 77 the PWID URN will support references to web archives, including areas 78 that are not supported today: support of references to material in 79 web archives with restricted access. Furthermore, the PWID URN 80 enables technology agnostic references to web archives in general, 81 which can be needed, for instance for references to dynamic web 82 material with frequent updates (e.g. a news site) or a specific 83 version of a web material (e.g. specific version of the DOI 84 handbook). 86 The PWID URN is in a form which can work as an algorithmic basis for 87 finding the resource. 
This also enables computation of archived web 88 parts to a collection from one or more web archives, if the 89 collection parts are specified by PWID URNs. 91 Furthermore, the PWID URN includes information about the resource 92 which makes it possible to find alternative resources, in cases where 93 the original precise resource has become unavailable. 95 The PWID URN is designed to be a persistent reference that is 96 general, global and technology agnostic in order to enhance its 97 chances of being sustainable. Furthermore, it is designed to be 98 humanly readable and with an ability to specify precision about what 99 the referenced web archive resource covers. This design enables a 100 PWID URN to: 102 o be used in technical solutions, e.g. to make them resolvable 104 o cover references to materials from all sorts of web archives 106 The motivation for defining a PWID namespace is the growing 107 challenges of references to archived web resources, and the PWID as a 108 URN can assist in overcoming a lot of these challenges. The standard 109 is needed to address web materials meeting precision and persistency 110 issues on par with precision in traditional references for analogue 111 material. Furthermore, it is needed in order to address web archive 112 resources that are not freely available online. The PWID URN covers 113 both referencing of web resources from research papers and definition 114 of web collections/corpora. In detail the challenges are: 116 o Persistent Identifier systems (like DOI [DOI]) will only cover 117 registered resources. In general, citation guidelines do not 118 cover general and persistent referencing techniques for web 119 resources that are not registered. However, an increasing number 120 of references point to resources that only exist on the web, e.g. 121 blogs that turn out to have a historical impact. In order to 122 obtain persistency for a reference, the target needs to be stable. 
123 For non-registered web resources, the common rule is that the 124 resource will change, since the live web is constantly changing. 125 Persistency can only be obtained by referring to something stable, 126 i.e. an archived version of the resource from the web. The PWID 127 URN is therefore focused on referencing archived web material in a 128 technology agnostic way (research documented in [IPRES2016] and 129 [ResawRef]). 131 o References to materials that only exist in web archives (i.e. no 132 longer on the live web) are not well supported, especially not for 133 materials that only exist in archives with restricted access. 134 There are many new initiatives for web archive referencing, most 135 of which are centralized solutions offering harvesting and 136 referencing, but these cannot be used for materials that only 137 exist in web archives. The PWID URN can be used for all web 138 archives, including web archives with restricted access. 140 o One of the referencing initiatives for open web archives uses URLs 141 which depend on the current setup of the web archive's access 142 platform. These URLs are usually technology and placement 143 dependent, and therefore such a reference style is not suited for 144 references that are important to retrace for a long period. The 145 PWID URN can be used for such reference purposes, since it is 146 technology agnostic. 148 o Another referencing initiative, for open web archives, omits 149 specification of the web archive where the resource was found. 150 This strategy is used in order to open the possibility of using 151 alternatives from other archives. However, this also adds a risk 152 of imprecision since different archives tend to have different 153 versions even when harvesting at the same time. Therefore, such a 154 reference style is not suited for references where it is important 155 that the reference is precisely the verified reference. 
The PWID 156 URN can provide an exact reference for where the reference was 157 validated. Additionally, the PWID contains the information needed 158 to search for an alternative resource, if necessary. 160 o For web collections/corpora (possibly across different web 161 archives), recent research has found that various legal and 162 sustainability issues have led to a need for a collection definition 163 of references to their web parts. Furthermore, there is a need 164 for a similar persistent referencing for all parts for calculation 165 and sustainability reasons. So far, there has been no stable 166 standard for definition of such collection parts. The PWID URN 167 can be used for such definitions in order to fulfil these 168 requirements (research documented in [ResawColl]). 170 The PWID URN is especially useful for web material where precision is 171 in focus and/or there are references to materials from web archives 172 requiring special permissions in order to gain access. The precision 173 regards two aspects. Firstly, pointing out the archive where the 174 resource was found and validated against its purpose (other archived 175 versions in other web archives may differ both regarding completeness 176 and contents even within short time periods). Secondly, specifying 177 whether the referred resource is a web page or a part in form of one 178 file. 180 The possibility of specifying the part/file precision enables the 181 PWID URN to be used in specification of contents of a web collection. 182 Definitions of web collections are often needed for extraction of 183 data used in production of research results, e.g. for future 184 evaluations. Current practices are not persistent as they often use 185 some CDX version, which varies between implementations. 187 Strict syntax is needed for the PWID URN, in order to ensure that it 188 can act as a reference which can be used for computational purposes. 
189 This is especially relevant for automatic extraction of parts from 190 web collection definitions. Furthermore, today's readers of research 191 papers expect to be able to access a referenced resource by 192 clicking an actionable URI. A similar possibility will therefore be 193 expected for references to available archived web material, and this 194 is possible with a strict syntax. A prototype for resolving PWID 195 URNs has been developed for the Danish web archive data and open web 196 archives with standard patterns for the current technologies. 197 Implementations for resolution of PWID URNs for other web archives 198 may be developed. 200 The purpose of the PWID URN is also to express a web archive 201 reference as simply as possible and at the same time meet the 202 requirements for sustainability, usability and scope. Therefore, the 203 PWID URN is focused on having only the minimum required information 204 to make a precise identification of a resource in an arbitrary web 205 archive. Recent research has shown that this can be obtained by the 206 following information [ResawRef]: 208 o Identification of web archive 210 o Identification of source: 212 * Archived URI or identifier 214 * Archival timestamp 216 o Intended precision (page, part/file) 218 The PWID URN represents this information in a human readable way as 219 well as a well-defined way that enables technical solutions to 220 interpret the URN. 222 1.1. Requirements Language 224 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 225 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 226 document are to be interpreted as described in [RFC2119]. 228 2. 
Namespace Registration Template 230 Namespace Identifier: 232 PWID 234 Version: 236 1 238 Date: 240 2019-09-06 242 Registrant: 244 Eld Maj-Britt Olmuetz Zierau 245 Royal Danish Library 246 Soeren Kierkegaards Plads 1 247 1219 Copenhagen 248 Denmark 249 ph: +45 9132 4690 250 email: elzi@kb.dk 252 Purpose: 254 The PWID URN is a supplement to existing reference standards, 255 where the PWID URN will support references to web archives, 256 including areas that are not supported today: support of 257 references to material in web archives with restricted access. 258 Furthermore, the PWID URN enables technology agnostic references 259 to web archives in general, which can be needed, for instance for 260 references to dynamic web material with frequent updates (e.g. a 261 news site) or a specific version of a web material (e.g. specific 262 version of the DOI handbook). 264 The PWID URN is in a form which can work as an algorithmic basis 265 for finding the resource. This also enables computation of 266 archived web parts to a collection from one or more web archives, 267 if the collection parts are specified by PWID URNs. 269 Furthermore, the PWID URN includes information about the resource 270 which makes it possible to find alternative resources, in cases 271 where the original precise resource has become unavailable. 273 The PWID URN is designed to be a persistent reference that is 274 general, global and technology agnostic in order to enhance its 275 chances of being sustainable. Furthermore, it is designed to be 276 humanly readable and with an ability to specify precision about 277 what the referenced web archive resource covers. This design 278 enables a PWID URN to: 280 * be used in technical solutions, e.g. 
to make them resolvable 282 * cover references to materials from all sorts of web archives 284 The motivation for defining a PWID namespace is the growing 285 challenges of references to archived web resources, and the PWID 286 as a URN can assist in overcoming a lot of these challenges. The 287 standard is needed to address web materials meeting precision and 288 persistency issues on par with precision in traditional references 289 for analogue material. Furthermore, it is needed in order to 290 address web archive resources that are not freely available 291 online. The PWID URN covers both referencing of web resources 292 from research papers and definition of web collections/corpora. 293 In detail the challenges are: 295 * Persistent Identifier systems (like DOI [DOI]) will only cover 296 registered resources. In general, citation guidelines do not 297 cover general and persistent referencing techniques for web 298 resources that are not registered. However, an increasing 299 number of references point to resources that only exist on the 300 web, e.g. blogs that turn out to have a historical impact. In 301 order to obtain persistency for a reference, the target needs 302 to be stable. For non-registered web resources, the common 303 rule is that the resource will change, since the live web is 304 constantly changing. Persistency can only be obtained by 305 referring to something stable, i.e. an archived version of the 306 resource from the web. The PWID URN is therefore focused on 307 referencing archived web material in a technology agnostic way 308 (research documented in [IPRES2016] and [ResawRef]). 310 * References to materials that only exist in web archives (i.e. 311 no longer on the live web) are not well supported, especially 312 not for materials that only exist in archives with restricted 313 access. 
There are many new initiatives for web archive 314 referencing, most of which are centralized solutions offering 315 harvesting and referencing, but these cannot be used for 316 materials that only exist in web archives. The PWID URN can be 317 used for all web archives, including web archives with 318 restricted access. 320 * One of the referencing initiatives for open web archives uses 321 URLs which depend on the current setup of the web archive's 322 access platform. These URLs are usually technology and 323 placement dependent, and therefore such a reference style is 324 not suited for references that are important to retrace for a 325 long period. The PWID URN can be used for such reference 326 purposes, since it is technology agnostic. 328 * Another referencing initiative, for open web archives, 329 omits specification of the web archive where the resource 330 was found. This strategy is used in order to open the 331 possibility of using alternatives from other archives. 332 However, this also adds a risk of imprecision since different 333 archives tend to have different versions even when harvesting 334 at the same time. Therefore, such a reference style is not 335 suited for references where it is important that the reference 336 is precisely the verified reference. The PWID URN can provide 337 an exact reference for where the reference was validated. 338 Additionally, the PWID contains the information needed 339 to search for an alternative resource, if necessary. 341 * For web collections/corpora (possibly across different web 342 archives), recent research has found that various legal and 343 sustainability issues have led to a need for a collection 344 definition of references to their web parts. Furthermore, 345 there is a need for a similar persistent referencing for all 346 parts for calculation and sustainability reasons. So far, 347 there has been no stable standard for definition of such 348 collection parts. 
The PWID URN can be used for such 349 definitions in order to fulfil these requirements (research 350 documented in [ResawColl]). 352 The PWID URN is especially useful for web material where precision 353 is in focus and/or there are references to materials from web 354 archives requiring special permissions in order to gain access. 355 The precision regards two aspects. Firstly, pointing out the 356 archive where the resource was found and validated against its 357 purpose (other archived versions in other web archives may differ 358 both regarding completeness and contents even within short time 359 periods). Secondly, specifying whether the referred resource is a 360 web page or a part in form of one file. 362 The possibility of specifying the part/file precision enables the 363 PWID URN to be used in specification of contents of a web 364 collection. Definitions of web collections are often needed for 365 extraction of data used in production of research results, e.g. 366 for future evaluations. Current practices are not persistent as 367 they often use some CDX version, which varies between 368 implementations. 370 Strict syntax is needed for the PWID URN, in order to ensure that 371 it can act as a reference which can be used for computational 372 purposes. This is especially relevant for automatic extraction of 373 parts from web collection definitions. Furthermore, today's 374 readers of research papers expect to be able to access a 375 referenced resource by clicking an actionable URI. A 376 similar possibility will therefore be expected for references to available 377 archived web material, and this is possible with a strict syntax. 378 A prototype for resolving PWID URNs has been developed for the 379 Danish web archive data and open web archives with standard 380 patterns for the current technologies. Implementations for 381 resolution of PWID URNs for other web archives may be developed. 
383 The purpose of the PWID URN is also to express a web archive 384 reference as simply as possible and at the same time meet the 385 requirements for sustainability, usability and scope. Therefore, 386 the PWID URN is focused on having only the minimum required 387 information to make a precise identification of a resource in an 388 arbitrary web archive. Recent research has shown that this can 389 be obtained by the following information [ResawRef]: 391 * Identification of web archive 393 * Identification of source: 395 + Archived URI or identifier 397 + Archival timestamp 399 * Intended precision (page, part/file) 401 The PWID URN represents this information in a human readable way 402 as well as a well-defined way that enables technical solutions to 403 interpret the URN. 405 Syntax: 407 The syntax of the PWID URN is specified below in Augmented Backus- 408 Naur Form (ABNF) [RFC5234] and conforms to URN syntax defined in 409 [RFC8141]. The syntax definition of the PWID URN is: 411 pwid-urn = "urn:" pwid-NID ":" pwid-NSS 413 pwid-NID = "pwid" 414 pwid-NSS = archive-domain ":" archival-time ":" precision-spec 415 ":" archived-uri 417 archival-time = utc-date ["T" utc-time] "Z" 418 utc-date = utc-year "-" utc-month "-" utc-day 419 utc-year = 4DIGIT 420 utc-month = 2DIGIT ; 01-12 421 utc-day = 2DIGIT ; 01-28, 01-29, 01-30, 01-31 based on 422 ; month/year in UTC time 423 utc-time = utc-hour ":" utc-minute [":" utc-second [secfrac]] 424 utc-hour = 2DIGIT ; 00-23 425 utc-minute = 2DIGIT ; 00-59 426 utc-second = 2DIGIT ; 00-58, 00-59, 00-60 based on leap second 427 ; rules 428 secfrac = "." 1*9DIGIT 430 precision-spec = "part" / "page" 432 where 434 * All parts of the pwid-NSS are case insensitive, except for 435 cases where the archived-uri represents a URI with case 436 sensitive parts. According to [RFC8141] (section 3.1) this 437 means that PWID URNs in general are case insensitive, 438 except in cases where they include a case sensitive archived 439 URI. 
441 * 'archive-domain' is defined as in [RFC1034] (section 3.5). 442 The 'archive-domain' must identify the web archive by the 443 domain for the archive leading to descriptions of how to access 444 (or apply for access to) materials in the archive. (This way 445 of identifying the web archive is described in the 446 "Assignment" section and discussed in the "Additional 447 Information" section.) 449 * 'archival-time' is a UTC timestamp which conforms to the W3C 450 profile of [ISO8601] [W3CDTF] and to a subset of date-time 451 specified in [RFC3339] (except for allowing partial time 452 specification). 453 The 'archival-time' may be specified at any of the levels of 454 granularity, as long as it reflects exactly the granularity of 455 the timestamp recorded in the archive (which is in accordance 456 with the WARC standard [ISO28500]). 458 * 'DIGIT' is defined as in [RFC5234]. 460 * 'archived-uri' is defined as 'URI' in [RFC3986] but where 461 occurrences of "[", "]", "?", "#" and "%" are %-encoded in 462 order not to clash with URN reserved characters [RFC8141] as 463 well as having unambiguous use of "%". 464 The 'archived-uri' must be the URI for the archived source. 466 The precision specification expresses the intended precision 467 of the reference, which is needed for specification of 469 * precise coverage of the reference 470 e.g. to an html file, since the precise meaning of what the 471 reference covers can be very varied (the html file itself? the 472 web page it renders to?) or precise web parts of a collection 473 specification. 475 * the degree of precision of the reference with respect to what can 476 be viewed in the future 477 The html file itself will be the same. However, for web pages, 478 there is interpretation involved, which means the result of 479 rendering them in the web archive can change over time. This 480 may happen if the web archive's algorithm for calculating 481 which archived web parts to use for the web page changes. 
It may 482 also happen if the web page refers to parts which are added to 483 the web archive later, and therefore will give another 484 expression than the originally referenced expression. 486 The following valid precision-spec values exist: 488 * 'page' 489 Meaning that an application like Wayback calculates a resulting 490 web page based on calculated referenced web parts (display 491 templates, images etc.). For example, an html page displaying 492 an image will need both the html and the referred image. 494 * 'part' 495 Meaning the single archived file/web part harvested from the 496 specified URI. For references to web pages with html code 497 (i.e. pages where there is an option to "View page source"), 498 this will mean the actual file with the html code. It is 499 relevant to refer to web pages this way, in case it is part of 500 a collection specification or in case it is the html that is of 501 interest (e.g. JavaScript or hidden links that are not 502 visible when rendering the web page). 503 For all other types of files, the URI will identify a single file, 504 to be interpreted as a file. 506 Assignment: 508 The PWID URNs do not have to be assigned by an authority, as they 509 are based on the information created at the time of archiving. In 510 other words: a PWID URN is created independently, but following an 511 algorithm which ensures that the referred item can be found if it 512 is still available. A prerequisite for assignment of a PWID is 513 that the web archive can be identified (with a domain describing 514 the web archive) and that it has registered metadata about the 515 archived URI and the time of archiving (also discussed in section 516 "Additional Information"). 
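As an illustration of the assignment described in this section, the following is a minimal Python sketch that assembles a PWID URN from its four syntax parts. It is not part of the specification or of any existing tool: the function names are hypothetical, the 14-digit Wayback timestamp is assumed to be in UTC, and the time check only covers the shape given in the ABNF, not the value ranges (e.g. month 01-12).

```python
import re
from datetime import datetime

# Characters the Syntax section requires to be %-encoded in archived-uri.
# "%" is handled first so that the escapes introduced for the other
# characters are not themselves re-encoded.
_ESCAPES = (("%", "%25"), ("[", "%5B"), ("]", "%5D"), ("?", "%3F"), ("#", "%23"))

# Shape of archival-time per the ABNF: a date, an optional time of varying
# granularity, and a mandatory trailing "Z". Value ranges are not checked.
_ARCHIVAL_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d{1,9})?)?)?Z"
)


def encode_archived_uri(uri: str) -> str:
    """%-encode the five characters listed in the Syntax section."""
    for char, escape in _ESCAPES:
        uri = uri.replace(char, escape)
    return uri


def wayback_to_archival_time(stamp: str) -> str:
    """Convert a 14-digit Wayback timestamp (assumed UTC) to archival-time."""
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_pwid(archive_domain: str, archival_time: str,
               precision: str, archived_uri: str) -> str:
    """Assemble a PWID URN from its four syntax parts (hypothetical helper)."""
    if precision not in ("part", "page"):
        raise ValueError("precision-spec must be 'part' or 'page'")
    if not _ARCHIVAL_TIME.fullmatch(archival_time):
        raise ValueError("archival-time does not match the ABNF shape")
    return "urn:pwid:{}:{}:{}:{}".format(
        archive_domain, archival_time, precision,
        encode_archived_uri(archived_uri),
    )


if __name__ == "__main__":
    # Values taken from the Wayback URI example used in this section.
    time = wayback_to_archival_time("20160122100823")
    print(build_pwid("archive.org", time, "page", "https://www.dr.dk"))
```

The sketch deliberately leaves the choice of 'archive-domain' and the lookup of the most precise timestamp to the caller, since both depend on the web archive at hand.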
518 A PWID URN is created by finding the relevant information of the 519 syntax parts of the PWID: 521 "urn:pwid:" archive-domain ":" archival-time ":" precision-spec 522 ":" archived-uri 524 The PWID URN for an archived item at hand can be constructed by 525 exchanging the unspecified PWID parts with relevant information, 526 as explained in the following: 528 * archive-domain (identification of web archive): 529 This must be the domain of the web archive as identification of 530 the web archive (e.g. archive.org for Internet Archive's open 531 web archive and netarkivet.dk for the Danish web archive with 532 restricted access). Use of the web archive domain as an 533 identification of a web archive is chosen, since most web 534 archives have a web archive domain page that leads to a 535 description of how to access the web archive, e.g. by online 536 access or by applying for access grants. Furthermore, it is 537 more precise than e.g. the name of the archive, since there 538 may be more than one installation of web archives at the same 539 organization, e.g. archive.org and archive-it.org are both 540 covered by Internet Archive. 541 A more precise and persistent identification would require a 542 formal registry of web archives, but such a registry does not 543 yet exist. 545 * archival-time (archival timestamp): 546 The archival time for the archived item must be specified with 547 as much granularity as possible in order to make sure it 548 uniquely identifies the resource at hand. The archival time 549 may be displayed along with the archived item, but there are 550 different implementations. It is important to be aware of 551 whether a more precise timestamp can be found, and whether the 552 correct timestamp is used. In many Wayback implementations, 553 the precise timestamp can be found as part of the URI used for 554 viewing the archived item. 
For example, in the archive http URI 555 https://web.archive.org/web/20160122100823/https://www.dr.dk 556 for an archived resource viewable via the Internet Archive's 557 Wayback installation, the number 20160122100823 represents the 558 archival time 2016-01-22T10:08:23Z. In other installations, 559 the most precise timestamp may be found in the URI from a 560 search result leading to the resource (which usually redirects 561 on basis of a call to the underlying archive index). 562 Especially for web pages with frames, there may be cases where 563 the actual time is not displayed with the source, since only 564 the times for the contents of the frames are displayed. 566 * precision-spec (part or page): 567 The precision specification specifies how the referred item 568 should be regarded. A typical PWID URN reference in a paper 569 would be 'page', where a tool will be needed to render the web 570 page. Alternatively, the precision-spec can be 'part', which 571 is the most precise reference since it references a specific 572 file where no additional calculations are needed (e.g. as part 573 of a collection, a specific html file with hidden links or to 574 indicate that a single image is referenced). In order to see 575 whether a viewed browser page is a computed web page or a 576 single file, browsers have a function "View page source", which 577 is not activated for single files. 579 * archived-uri (archived URI): 580 The URI that was harvested by the web archive for the 581 referenced resource. 583 A much easier way to construct PWID URNs is to use tools that 584 construct them. Currently, there is also a prototype for a SOLR- 585 Wayback tool (Source at https://github.com/netarchivesuite/ 586 solrwayback) [PWIDprovider], which can assist in finding the most 587 precise reference to an archived web page. 
This Wayback version 588 can provide all PWID URNs belonging to a shown page, where the page 589 PWID URN is provided at the top of the PWID URN list with 'part' 590 precision. The page PWID URN can thus be obtained by replacing 591 'part' with 'page', or all provided PWID URNs can be taken and, e.g., 592 used in a collection definition. 594 Security and Privacy: 596 Security and privacy considerations are restricted to accessible 597 web resources in web archives. Resolution of PWID URNs will 598 usually only be possible using the web archives' access tools, 599 in which case security and privacy are covered by these tools. 602 It should be noted that an archived web page or part could be just 603 as dangerous as a "live" page or part; for instance, it could 604 include insecure scripts, malware, trackers, etc. Furthermore, an 605 archived page can in fact be more dangerous, because it could 606 include outdated scripts with known vulnerabilities that can never 607 be patched because the script is archived for all time in a 608 vulnerable state. 610 Interoperability: 612 This is covered by comments in the Syntax description: 614 * the PWID URN conforms to the URI standard defined in 615 [RFC3986] and the URN standard [RFC8141] 617 * the 'archival-time' of the PWID URN conforms to the UTC 618 timestamp as described in the W3C profile of ISO 8601 [ISO8601] 619 [W3CDTF] and is in accordance with the WARC standard ISO 28500 620 [ISO28500]. 622 * for 'archived-uri', this URI conforms to the URI standard 623 defined in [RFC3986], with %-encodings of "[", "]", "#", "?" 624 and "%" in order to conform to the URN standard [RFC8141] as 625 well as having unambiguous use of "%" 627 Resolution: 629 The information in a PWID URN can be used for locating a web 630 archive resource, for any kind of web archive. 
      It includes the minimum information for web archive materials,
      which enables resolvability, manually or by a resolver.
      Resolution of a PWID URN is the primary motivation for making a
      formal URN definition, instead of just a textual representation
      of the needed parts of a PWID.

      Resolution is done based on the PWID parts.  This can be done
      manually by using information from the PWID parts to look up the
      web archive and use the web archive's tools to search for the
      resource.  It can also be done automatically by using the
      information from the PWID parts to construct a URI to locate the
      archived resource on the internet (for online web archives) or
      on a local restricted network (for web archives with access
      restrictions).  The relevant information from the PWID parts is:

      *  Web archive domain for web archive holding referred resource
         The domain name for the web archive.  For the manual solution,
         this domain is used to find a description of how to access the
         web archive's materials.  For example, "archive.org" is the
         domain name leading to the Internet Archive's interface to
         their online web collection, and "netarkivet.dk" is the domain
         name leading to the website for the Danish web archive with
         information about how to apply for access permission to the
         web collections.  For an automatic solution, the domain will
         be used to identify how to calculate the pattern for the URI
         for the resource.

      *  Archived URI of archived resource
         For the manual solution, the archived URI of the resource
         must be used in the search for the resource.  For the
         automatic solution, it is used as a parameter for construction
         of the URI for the resource.

      *  Date and time associated with the archived item
         The archival date and time must be used in the manual search
         for the resource or as a parameter to automatic construction
         of the URI for the resource.

      *  Precision of what is referred
         The precision contributes to the guidance of how to view the
         referred item.  If the precision is 'page', the resource must
         be browsed using the web archive browsing tool, which computes
         all parts needed for browsing of the page.  If the precision
         is 'part', the "View page source" browser function can be used
         for pages to get the referred resource.  (If the resource is a
         single file, this option is not activated, since the full
         resource is already shown.)  The 'part' precision can also be
         an indicator to tools (e.g. a collection extraction tool) that
         they can fetch the contents by fetching the file pointed to.

      In the following, the different resolution techniques are
      explained (manual as well as via a service) using the following
      PWID URN as an example:

      urn:pwid:archive.org:2016-01-22T10:08:23Z:page:https://www.dr.dk

      In this example the information from the PWID URN parts is:

      *  "archive.org"
         Currently known identifier in the form of the Internet Archive
         domain name for their open access web archive.

      *  "2016-01-22T10:08:23Z"
         UTC date and time associated with the archived URI

      *  "page"
         Clarification that the reference covers the full web page with
         all its inherited parts selected by the web archive

      *  "https://www.dr.dk"
         Archived URI of the referenced resource

      A manual resolution technique would be to go through the
      following steps using the specified web archive's search
      interface (which will work for both open web archives and web
      archives with restricted access onsite):

      *  Browse the web archive domain "archive.org"
         In this case, the domain leads directly to a page where you
         can search for archived URIs (in other cases there may be a
         need for additional clicks to get to the search interface or
         to descriptions of how to apply for access).

      *  Enter the archived URI "https://www.dr.dk" in the search field
         and make a search, which will result in an overview of the
         different times that the resource was archived.

      *  Use the archival time "2016-01-22T10:08:23Z" to select the
         correct resource

      The "page" information is used to verify that the right
      precision level is reached.  If the precision-spec had been
      'part', it would require an extra step of selecting "View page
      source" on the resulting page.

      It is also noteworthy that the information in the PWID can help
      in finding an alternative resource, in case the originally
      referred resource is no longer available.  The archived URI can
      be searched for in other web archives, where the date and time
      can help to find the best match, e.g. via Memento [MEMENTO] (for
      some open web archives) or via possible future web archive
      infrastructures.

      Alternative resolution (automatic or manual) of this PWID URN
      can be deduced based on the current (2019) knowledge of the
      Internet Archive's open Wayback access web interface, which has
      the pattern:

      https://web.archive.org/web/