Internet Engineering Task Force                           E. Zierau, Ed.
Internet-Draft                                      Royal Danish Library
Intended status: Informational                             March 1, 2019
Expires: September 2, 2019


            A Persistent Web IDentifier (PWID) URN Namespace
                    draft-pwid-urn-specification-06

Abstract

   This document specifies a Uniform Resource Name (URN) for Persistent
   Web IDentifiers for web material in web archives using the 'pwid'
   namespace identifier.

   The main purpose of the standard is to support references that are
   not covered by other reference techniques, namely references to
   material in web archives with restricted access.  Furthermore, it
   supports persistent, technology-agnostic references to web archives
   in general, in a form that can work as an algorithmic basis for
   finding web archive resources.  An additional important benefit is
   that the standard can be used for specifying web collections, which
   can then form a persistent computational basis for extracting the
   archived collection parts.  Since these parts can be specified
   generally, collections can be specified with elements from one or
   more web archives.

   The PWID URN is designed to meet the requirements for proper
   referencing needed by researchers.  It is therefore designed to
   provide general, global, sustainable, human-readable,
   technology-agnostic, persistent and precise references for web
   materials in web archives.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 2, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Namespace Registration Template
   3.  Acknowledgements
   4.  References
     4.1.  Normative References
     4.2.  Informative References
   Author's Address

1.  Introduction

   The PWID URN is a supplement to existing reference standards.  It
   supports references to web archives, including an area that is not
   supported today: references to material in web archives with
   restricted access.  Furthermore, the PWID URN enables
   technology-agnostic references to web archives in general, which can
   be needed, for instance, for references to dynamic web material with
   frequent updates (e.g. a news site) or to a specific version of a web
   material (e.g.
   a specific version of the DOI handbook).

   The PWID URN is in a form that can work as an algorithmic basis for
   finding the resource.  This also makes it possible to compute the
   archived web parts of a collection drawn from one or more web
   archives, if the collection parts are specified by PWID URNs.

   Furthermore, the PWID URN includes information about the resource
   that makes it possible to find alternative resources in cases where
   the original, precise resource has become unavailable.

   The PWID URN is designed to be a persistent reference that is
   general, global and technology-agnostic in order to enhance its
   chances of being sustainable.  Furthermore, it is designed to be
   human-readable and able to specify precisely what the referenced web
   archive resource covers.  This design enables a PWID URN to:

   o  be used in technical solutions, e.g. to make PWID URNs resolvable

   o  cover references to all sorts of materials in web archives

   o  cover references to materials from all sorts of web archives

   The motivation for defining a PWID namespace is the growing challenge
   of referencing archived web resources, and the PWID as a URN can help
   overcome many of these challenges.  The standard is needed to give
   references to web materials a precision and persistency on par with
   traditional references to analogue material.  Furthermore, it is
   needed in order to address web archive resources that are not freely
   available online.  The PWID URN covers both referencing of web
   resources from research papers and definition of web
   collections/corpora.  In detail, the challenges are:

   o  Persistent Identifier systems (like DOI [DOI]) only cover
      registered resources.  In general, citation guidelines do not
      cover general and persistent referencing techniques for web
      resources that are not registered.
      However, an increasing number of references point to resources
      that only exist on the web, e.g. blogs that turn out to have a
      historical impact.  In order to obtain persistency for a
      reference, the target needs to be stable.  For non-registered web
      resources, the common rule is that the resource will change, since
      the live web is constantly changing.  Persistency can only be
      obtained by referring to something stable, i.e. an archived
      snapshot of the resource from the web.  The PWID URN is therefore
      focused on referencing archived web material in a
      technology-agnostic way (research documented in [IPRES2016] and
      [ResawRef]).

   o  References to materials that only exist in web archives (i.e. are
      no longer on the live web) are not well supported, especially not
      for materials that only exist in archives with restricted access.
      There are many new initiatives for web archive referencing, most
      of which are centralized solutions offering harvesting and
      referencing, but these cannot be used for materials that only
      exist in web archives.  The PWID URN can be used for all web
      archives, including web archives with restricted access.

   o  One of the referencing initiatives for open web archives uses URLs
      that depend on the current setup of the web archive's access
      platform.  These URLs usually depend on technology and placement,
      and such a reference style is therefore not suited for references
      that must remain retraceable over a long period.  The PWID URN can
      be used for such reference purposes, since it is
      technology-agnostic.

   o  Another referencing initiative for open web archives omits the
      specification of the web archive where the resource was found.
      This strategy is used in order to open the possibility of using
      alternatives from other archives.
      However, this also adds a risk of imprecision, since different
      archives tend to hold different versions even when harvesting at
      the same time.  Such a reference style is therefore not suited for
      references where it is important that the reference points to
      precisely the version that was verified.  The PWID URN can provide
      an exact reference to where the reference was validated.
      Additionally, the PWID contains the information needed to search
      for alternative resources, if needed.

   o  For references to web collections/corpora (possibly across
      different web archives), recent research has found that various
      legal and sustainability issues have led to a need for collection
      definitions consisting of references to their web parts.
      Furthermore, there is a need for similarly persistent referencing
      of all parts for computational and sustainability reasons.  So
      far, there has been no stable standard for the definition of such
      collection parts.  The PWID URN can be used for such definitions
      in order to fulfil these requirements (research documented in
      [ResawColl]).

   The PWID URN is especially useful for web material where precision is
   in focus and/or where references point to materials in web archives
   that require special permissions for access.  The precision concerns
   both pointing to the archive where the material was found and
   validated against its purpose (other archived versions in other web
   archives may differ both in completeness and in contents, even within
   short time periods) and precision about what is actually referred to
   (e.g. is it the page or the whole website).

   Furthermore, the PWID URN is very useful for specifying the contents
   of a web collection.  Definitions of web collections are often needed
   for extraction of data used in the production of research results,
   e.g. for future evaluations.
   Current practices are not persistent, as they often use some CDX
   version, which varies between implementations.

   A strict syntax is needed for the PWID URN in order to ensure that it
   can act as a reference that can be used for computational purposes.
   This is especially relevant for automatic extraction of parts from
   web collection definitions.  Furthermore, today's readers of research
   papers expect to be able to access a referenced resource by clicking
   an actionable URI; a similar possibility will therefore be expected
   for references to available archived web material, and a strict
   syntax makes this possible.  Examples of technical solutions that are
   enabled are:

   o  Resolving of a reference to a web collection and automatic
      extraction of the parts of a web collection defined by PWID URNs
      [ResawRef] [ResawColl]

   o  Resolving of a PWID URN by resolving services.  To begin with, a
      prototype has been developed for the Danish web archive data and
      for open web archives with standard patterns for the current
      technologies.  Implementations for resolution of PWID URNs for
      other web archives may be developed.

   The purpose of the PWID URN is also to express a web archive
   reference as simply as possible while meeting the requirements for
   sustainability, usability and scope.  Therefore, the PWID URN is
   focused on having only the minimum information required to make a
   precise identification of a resource in an arbitrary web archive.
   Recent research has shown that this can be obtained by the following
   information [ResawRef]:

   o  Identification of web archive

   o  Identification of source:

      *  Archived URI or identifier

      *  Archival timestamp

   o  Intended precision (page, part, subsite etc.)

   The PWID URN represents this information in a human-readable way as
   well as a well-defined way that enables technical solutions to
   interpret the URN.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Namespace Registration Template

   Namespace Identifier:

      PWID

   Version:

      6

   Date:

      2019-03-01

   Registrant:

      Eld Maj-Britt Olmuetz Zierau
      Royal Danish Library
      Soeren Kierkegaards Plads 1
      1219 Copenhagen
      Denmark
      ph: +45 9132 4690
      email: elzi@kb.dk

   Purpose:

      The PWID URN is a supplement to existing reference standards.  It
      supports references to web archives, including an area that is
      not supported today: references to material in web archives with
      restricted access.  Furthermore, the PWID URN enables
      technology-agnostic references to web archives in general, which
      can be needed, for instance, for references to dynamic web
      material with frequent updates (e.g. a news site) or to a
      specific version of a web material (e.g. a specific version of
      the DOI handbook).

      The PWID URN is in a form that can work as an algorithmic basis
      for finding the resource.  This also makes it possible to compute
      the archived web parts of a collection drawn from one or more web
      archives, if the collection parts are specified by PWID URNs.

      Furthermore, the PWID URN includes information about the resource
      that makes it possible to find alternative resources in cases
      where the original, precise resource has become unavailable.

      The PWID URN is designed to be a persistent reference that is
      general, global and technology-agnostic in order to enhance its
      chances of being sustainable.  Furthermore, it is designed to be
      human-readable and able to specify precisely what the referenced
      web archive resource covers.  This design enables a PWID URN to:

      *  be used in technical solutions, e.g.
         to make PWID URNs resolvable

      *  cover references to all sorts of materials in web archives

      *  cover references to materials from all sorts of web archives

      The motivation for defining a PWID namespace is the growing
      challenge of referencing archived web resources, and the PWID as
      a URN can help overcome many of these challenges.  The standard
      is needed to give references to web materials a precision and
      persistency on par with traditional references to analogue
      material.  Furthermore, it is needed in order to address web
      archive resources that are not freely available online.  The PWID
      URN covers both referencing of web resources from research papers
      and definition of web collections/corpora.  In detail, the
      challenges are:

      *  Persistent Identifier systems (like DOI [DOI]) only cover
         registered resources.  In general, citation guidelines do not
         cover general and persistent referencing techniques for web
         resources that are not registered.  However, an increasing
         number of references point to resources that only exist on the
         web, e.g. blogs that turn out to have a historical impact.  In
         order to obtain persistency for a reference, the target needs
         to be stable.  For non-registered web resources, the common
         rule is that the resource will change, since the live web is
         constantly changing.  Persistency can only be obtained by
         referring to something stable, i.e. an archived snapshot of
         the resource from the web.  The PWID URN is therefore focused
         on referencing archived web material in a technology-agnostic
         way (research documented in [IPRES2016] and [ResawRef]).

      *  References to materials that only exist in web archives (i.e.
         are no longer on the live web) are not well supported,
         especially not for materials that only exist in archives with
         restricted access.
         There are many new initiatives for web archive referencing,
         most of which are centralized solutions offering harvesting and
         referencing, but these cannot be used for materials that only
         exist in web archives.  The PWID URN can be used for all web
         archives, including web archives with restricted access.

      *  One of the referencing initiatives for open web archives uses
         URLs that depend on the current setup of the web archive's
         access platform.  These URLs usually depend on technology and
         placement, and such a reference style is therefore not suited
         for references that must remain retraceable over a long
         period.  The PWID URN can be used for such reference purposes,
         since it is technology-agnostic.

      *  Another referencing initiative for open web archives omits the
         specification of the web archive where the resource was found.
         This strategy is used in order to open the possibility of using
         alternatives from other archives.  However, this also adds a
         risk of imprecision, since different archives tend to hold
         different versions even when harvesting at the same time.  Such
         a reference style is therefore not suited for references where
         it is important that the reference points to precisely the
         version that was verified.  The PWID URN can provide an exact
         reference to where the reference was validated.  Additionally,
         the PWID contains the information needed to search for
         alternative resources, if needed.

      *  For references to web collections/corpora (possibly across
         different web archives), recent research has found that various
         legal and sustainability issues have led to a need for
         collection definitions consisting of references to their web
         parts.  Furthermore, there is a need for similarly persistent
         referencing of all parts for computational and sustainability
         reasons.  So far, there has been no stable standard for the
         definition of such collection parts.
         The PWID URN can be used for such definitions in order to
         fulfil these requirements (research documented in
         [ResawColl]).

      The PWID URN is especially useful for web material where
      precision is in focus and/or where references point to materials
      in web archives that require special permissions for access.  The
      precision concerns both pointing to the archive where the material
      was found and validated against its purpose (other archived
      versions in other web archives may differ both in completeness and
      in contents, even within short time periods) and precision about
      what is actually referred to (e.g. is it the page or the whole
      website).

      Furthermore, the PWID URN is very useful for specifying the
      contents of a web collection.  Definitions of web collections are
      often needed for extraction of data used in the production of
      research results, e.g. for future evaluations.  Current practices
      are not persistent, as they often use some CDX version, which
      varies between implementations.

      A strict syntax is needed for the PWID URN in order to ensure that
      it can act as a reference that can be used for computational
      purposes.  This is especially relevant for automatic extraction of
      parts from web collection definitions.  Furthermore, today's
      readers of research papers expect to be able to access a
      referenced resource by clicking an actionable URI; a similar
      possibility will therefore be expected for references to available
      archived web material, and a strict syntax makes this possible.
      Examples of technical solutions that are enabled are:

      *  Resolving of a reference to a web collection and automatic
         extraction of the parts of a web collection defined by PWID
         URNs [ResawRef] [ResawColl]

      *  Resolving of a PWID URN by resolving services.
         To begin with, a prototype has been developed for the Danish
         web archive data and for open web archives with standard
         patterns for the current technologies.  Implementations for
         resolution of PWID URNs for other web archives may be
         developed.

      The purpose of the PWID URN is also to express a web archive
      reference as simply as possible while meeting the requirements
      for sustainability, usability and scope.  Therefore, the PWID URN
      is focused on having only the minimum information required to
      make a precise identification of a resource in an arbitrary web
      archive.  Recent research has shown that this can be obtained by
      the following information [ResawRef]:

      *  Identification of web archive

      *  Identification of source:

         +  Archived URI or identifier

         +  Archival timestamp

      *  Intended precision (page, part, subsite etc.)

      The PWID URN represents this information in a human-readable way
      as well as a well-defined way that enables technical solutions to
      interpret the URN.

   Syntax:

      The syntax of the PWID URN is specified below in Augmented
      Backus-Naur Form (ABNF) [RFC5234] and conforms to the URN syntax
      defined in [RFC8141].  The syntax definition of the PWID URN is:

      pwid-urn = "urn" ":" pwid-NID ":" pwid-NSS

      pwid-NID = "pwid"
      pwid-NSS = archive-id ":" archival-time ":" precision-spec
                 ":" archived-item-id

      archive-id            = domain / "~" registered-archive-id
      registered-archive-id = 1*( unreserved )

      archival-time = utc-date ["T" utc-time] "Z"
      utc-date      = utc-year "-" utc-month "-" utc-day
      utc-year      = 4DIGIT
      utc-month     = 2DIGIT ; 01-12
      utc-day       = 2DIGIT ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year in UTC time
      utc-time      = utc-hour ":" utc-minute [":" utc-second [secfrac]]
      utc-hour      = 2DIGIT ; 00-23
      utc-minute    = 2DIGIT ; 00-59
      utc-second    = 2DIGIT ; 00-58, 00-59, 00-60 based on leap second
                             ; rules
      secfrac       = "." 1*9DIGIT

      precision-spec = "part" / "page" / "subsite" / "site"
                       / "collection" / "recording" / "snapshot"
                       / extension
      extension      = 1*ALPHA

      archived-item-id   = URI / "~" registered-item-id
      registered-item-id = 1*( unreserved )

      where

      *  All parts of the pwid-NSS are case-insensitive, except for the
         archived-item-id in cases where it is a URI with case-sensitive
         parts.  According to [RFC8141] (Section 3.1), this means that
         PWID URNs are in general case-insensitive, except in cases
         where they include a case-sensitive URI as archived-item-id.

      *  archive-id must either be the domain of the archive, which can
         lead to descriptions of how to access (or apply for access to)
         materials in the archive, or it must be a registered archive-id
         (registry still to be defined and created).  The two types of
         identifiers are distinguished by matching the first character
         against "~".  In case of a match, the rest of the identifier is
         a registered archive identifier, since the syntax requires such
         identifiers to be prefixed with "~", while no domain name is
         allowed to start with this character.

      *  'domain' is defined as in Section 3.5 of [RFC1034].

      *  'unreserved' is defined as in [RFC3986].

      *  'archival-time' is a UTC timestamp that conforms to the W3C
         profile of [ISO8601] [W3CDTF] and to a subset of date-time
         specified in [RFC3339] (except for allowing partial time
         specification).  The archival-time may be specified at any of
         the levels of granularity, as long as it reflects exactly the
         granularity of the timestamp recorded in the archive (which is
         in accordance with the WARC standard [ISO28500]).

      *  archived-item-id must either be the archived URI of the source
         or a registered item identifier.  The two types of identifiers
         are distinguished by matching the first character against "~".
         In case of a match, the rest of the identifier is a registered
         item identifier, since the syntax requires such identifiers to
         be prefixed with "~", while no URI is allowed to start with
         this character.

      *  'URI' is defined as in [RFC3986], but occurrences of "[", "]",
         "?", "#" and "%" are %-encoded in order not to clash with URN
         reserved characters [RFC8141] and to make the use of "%"
         unambiguous.

      The precision specification expresses the intended precision of
      the reference.  For example, if it refers to an html web element,
      this element can be interpreted in several ways:

      *  As one web part only
         Meaning the file containing the html, and precisely this file.

      *  As a web page
         Meaning that an application like Wayback shows a resulting web
         page in a browser based on calculated referenced web parts
         (display templates, images etc.).
         If the full reference contains only the PWID URN for the page,
         the archived page may change its appearance over time, e.g. if
         parts referred to by the page did not exist at reference time
         but are harvested at a later stage, or if the web archive's
         algorithm for calculating the referred web parts is changed and
         consequently returns a different result.
         Therefore, the most precise reference to a picture in the
         context of a web page would be to provide the PWID URN for the
         page (with page precision) and the PWID URN for the image file
         part that contains the referred picture (with part precision).

      *  As a site or subsite
         Meaning that an application like Wayback shows the result in a
         browser showing the web page.  If access is limited to the
         referenced part (the html page), then the application would
         also need to make sure that all parts/pages belonging to the
         site/subsite are available.
         If the full reference only contains the PWID URN for the
         site/subsite, the site/subsite may change its appearance over
         time in the same way as the web page described above.

      The precision specification needs to be part of a PWID URN in
      order to make the reference as precise as described above.
      Furthermore, this precision specification makes it possible for
      resolvers to display the referred source in a way that
      corresponds to the precision specification.

      There are different ways to represent e.g. a web page, which also
      provide different precision of the source.  The above examples
      with part, page, subsite and site address the most common access
      via browser functionality like in Wayback.  However, some web
      archives archive snapshots of the web pages for the archived URI.
      A third option is to produce a collection of archived URIs as a
      basis for browser access instead of letting the web archive
      calculate sub-items (which may change over time).  An example of
      the production of such a collection is provided in the section
      about assignment.  Lastly, a web page may be archived via a web
      recording.

      Because of the above, the following precision-spec values are
      valid:

      *  part
         The single archived web part harvested as a file from the
         specified URI, e.g. a pdf, an html text or an image.

      *  page
         The web page represented by the web page file (e.g. html)
         harvested from the specified URI, where its content is
         interpreted as a web page with all referred parts relevant to
         display the web page (but where referred parts must be
         calculated as described above), e.g.
         an html page with referred images.

      *  subsite
         The referred web page (as described under 'page') from which
         it is possible to browse to all references starting with the
         same path as the archived URI.

      *  site
         The referred web page (as described under 'page') from which
         it is possible to browse to all references in the domain
         specified in the archived URI.

      *  collection
         Representation of a collection specification, where the web
         archive applications will decide how it is rendered (e.g. a
         collection specification in the XML format enabling
         interpretation as in the example provided in [ResawColl]).

      *  snapshot
         A snapshot (image) representation of web material, e.g. a web
         page.

      *  recording
         Representation of a web recording specification, where the web
         archive applications will decide how it is rendered
         (interpretation could e.g. depend on the file suffix of the web
         recording); an example is a web recording encoded in a WARC
         file.

      The option of using an extension value is included to allow
      reference to a resource of any kind with an assigned identifier,
      even if it is not covered by the other values.  In all cases, it
      will be up to the application serving the web archive to
      interpret how this item should be rendered.

   Assignment:

      PWID URNs do not have to be assigned by an authority, as they are
      based on the information created at the time of archiving.  In
      other words, a PWID URN is created independently, but following
      an algorithm which ensures that the referred item can be found if
      it is still available.  A PWID URN also has the benefit that it
      includes the information needed to look for alternative
      resources, e.g. via Memento for some open web archives [MEMENTO]
      or via possible future web archive infrastructures.
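      Because the pwid-NSS fields are separated by ":" while both
      archival-time and a URI used as archived-item-id may themselves
      contain ":", a consumer cannot simply split the URN on every
      colon.  The following Python sketch shows one possible way to
      parse and compare PWID URNs under the Syntax rules, assuming that
      archive-id never contains ":" (true for domains and for
      unreserved characters) and that everything after the
      precision-spec is the archived-item-id.  It checks only the shape
      of archival-time, not month/day ranges or leap seconds, and the
      names parse_pwid and same_pwid are hypothetical, not part of any
      existing library.

```python
import re
from typing import NamedTuple, Optional

# precision-spec values listed in the Syntax section; any other run of
# letters is an "extension" value left to the archive application.
KNOWN_PRECISIONS = {"part", "page", "subsite", "site",
                    "collection", "recording", "snapshot"}

# archive-id: a domain or "~" + registered id; neither contains ":".
# archival-time: utc-date ["T" utc-time] "Z" (shape only, no range checks).
# precision-spec: letters only.
# archived-item-id: the remainder, which may itself contain ":".
PWID_RE = re.compile(
    r"urn:pwid:"
    r"(?P<archive_id>[^:]+):"
    r"(?P<archival_time>[0-9]{4}-[0-9]{2}-[0-9]{2}"
    r"(?:T[0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]{1,9})?)?)?Z):"
    r"(?P<precision>[A-Za-z]+):"
    r"(?P<item_id>.+)",
    re.IGNORECASE,
)


class Pwid(NamedTuple):
    archive_id: str
    archival_time: str
    precision: str
    item_id: str

    @property
    def is_extension(self) -> bool:
        """True if precision-spec is not one of the listed values."""
        return self.precision.lower() not in KNOWN_PRECISIONS

    def comparison_key(self) -> tuple:
        """Case-fold every field except archived-item-id.

        Simplification: the URI is compared exactly, although its scheme
        and host are case-insensitive under RFC 3986.
        """
        return (self.archive_id.lower(), self.archival_time.upper(),
                self.precision.lower(), self.item_id)


def parse_pwid(urn: str) -> Optional[Pwid]:
    """Return the four pwid-NSS parts, or None if the URN does not match."""
    m = PWID_RE.fullmatch(urn)
    return Pwid(**m.groupdict()) if m else None


def same_pwid(a: str, b: str) -> bool:
    """Compare two PWID URNs using the case rules of the Syntax section."""
    pa, pb = parse_pwid(a), parse_pwid(b)
    return (pa is not None and pb is not None
            and pa.comparison_key() == pb.comparison_key())


p = parse_pwid("urn:pwid:netarkivet.dk:2008-11-29T00:39:47Z:part:"
               "http://www.susanlegetoej.dk/shop/css/print.css")
print(p.archive_id, p.archival_time, p.precision, p.item_id)
# netarkivet.dk 2008-11-29T00:39:47Z part http://www.susanlegetoej.dk/shop/css/print.css
```

      Matching archival-time by its fixed shape is what lets the regular
      expression find the boundary before precision-spec even though
      utc-time contains colons; a stricter implementation would
      additionally validate the date and time ranges given in the ABNF
      comments.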
      A PWID URN is created by finding the relevant information for the
      syntax parts of the PWID:

         "urn:pwid:" archive-id ":" archival-time ":" precision-spec
         ":" archived-item-id

      The PWID URN for an archived item at hand can be constructed by
      replacing the unspecified PWID parts with the relevant
      information, as explained in the following:

      *  archive-id (identification of web archive):
         In this version of the standard, it is recommended to use the
         domain of the web archive as the identifier for the web archive
         (e.g. archive.org for Internet Archive's open web archive and
         netarkivet.dk for the Danish web archive with restricted
         access).  This is recommended, since browsing the domain page
         will typically lead to a description of how to access the web
         archive, e.g. by online access or by applying for access
         grants.  Furthermore, it is more precise than e.g. the name of
         the archive, since there may be more than one web archive
         installation at the same organization, e.g. archive.org and
         archive-it.org both belong to Internet Archive.
         When a registry of web archives is established, it will be more
         precise and persistent to use the web archive identifier
         specified in this registry (e.g. DKWA for the Danish web
         archive with the domain netarkivet.dk).  The syntax requires
         that such identifiers are prefixed with the character "~".

      *  archival-time (archival timestamp):
         The archival time for the archived item must be specified with
         as much granularity as possible in order to make sure that it
         uniquely identifies the resource at hand.  The archival time
         may be displayed along with the archived item, but
         implementations differ, so it is important to be aware of
         whether a more precise timestamp can be found, and whether the
         correct timestamp is used.
        In many Wayback implementations, the precise timestamp can be
        found as part of the URI used for viewing the archived item. For
        example, in the URI
        https://web.archive.org/web/20160122112029/http://www.dr.dk for
        an archived resource viewable via Internet Archive's Wayback
        installation, the number 20160122112029 represents the archival
        time 2016-01-22T11:20:29Z. In other installations, the most
        precise timestamp may be found in the URI from a search result
        leading to the resource (which usually redirects on the basis of
        a call to the underlying archive index). Especially for web pages
        with frames, there may be cases where the actual time is not
        displayed with the source, since only the times for the contents
        of the frames are displayed.

      * precision-spec (precision represented as page, part, site,
        snapshot etc.):
        The precision specification specifies how the user should view
        the referred item: either as a specific representation (with
        inherited precision) or by use of tools (e.g. browsing a web
        site based on calculations, or browsing on the basis of a
        collection of specific parts).
        Inherited precision is implicitly indicated by the precision
        specification through how the information is used in resolution
        and location. The most precise reference is part, e.g. for an
        image which can be located and accessed independently. Less
        precise references are those where other parts must be
        calculated in order to resolve and view them, e.g. page, site or
        subsite.

      * archived-item-id (archived URI or registered identifier):
        The archived item identifier will either be the archived URI of
        the displayed archived item at hand, or an identifier assigned
        to the resource by the archive. In the latter case, the syntax
        requires that such identifiers are prefixed with the character
        "~".
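The construction steps above can be sketched in Python. This is a minimal, non-normative sketch: all function names are hypothetical, the %-encoding follows the characters listed for 'archived-item-id' under Interoperability, and the Wayback helper only recognizes the URI shape shown in the archival-time example. Rejecting a ":" inside archive-id or precision-spec is an assumption of the sketch (to keep the URN splittable), not a rule stated in this draft.

```python
import re
from datetime import datetime

# %-encodings for an archived URI used as archived-item-id.
# "%" is handled first so the escapes added for the other characters
# are not encoded a second time.
_ESCAPES = (("%", "%25"), ("[", "%5B"), ("]", "%5D"), ("#", "%23"), ("?", "%3F"))

# Matches only the Wayback URI shape from the example above:
# https://web.archive.org/web/<14-digit timestamp>/<archived URI>
_WAYBACK_URI = re.compile(r"^https://web\.archive\.org/web/(\d{14})/(.+)$")


def wayback_to_pwid_time(wayback_ts: str) -> str:
    """Turn a 14-digit UTC Wayback timestamp into the archival-time form."""
    parsed = datetime.strptime(wayback_ts, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def encode_archived_uri(uri: str) -> str:
    """Apply the %-encodings listed for archived URIs."""
    for char, escape in _ESCAPES:
        uri = uri.replace(char, escape)
    return uri


def build_pwid(archive_id: str, archival_time: str,
               precision_spec: str, archived_item_id: str) -> str:
    # Sketch assumption: archive-id and precision-spec are non-empty and
    # colon-free, so the URN can be split into its parts again later.
    for name, value in (("archive-id", archive_id),
                        ("precision-spec", precision_spec)):
        if not value or ":" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    # Registered identifiers ("~" prefix) pass through unchanged;
    # archived URIs get the %-encodings.
    if not archived_item_id.startswith("~"):
        archived_item_id = encode_archived_uri(archived_item_id)
    return ":".join(("urn:pwid", archive_id, archival_time,
                     precision_spec, archived_item_id))


def pwid_from_wayback_uri(wayback_uri: str, precision_spec: str = "page") -> str:
    # Hypothetical helper: maps a web.archive.org Wayback URI to a PWID URN
    # with archive-id "archive.org", mirroring the example in this draft.
    match = _WAYBACK_URI.match(wayback_uri)
    if match is None:
        raise ValueError(f"not a recognized Wayback URI: {wayback_uri!r}")
    timestamp, archived_uri = match.groups()
    return build_pwid("archive.org", wayback_to_pwid_time(timestamp),
                      precision_spec, archived_uri)
```

For example, pwid_from_wayback_uri("https://web.archive.org/web/20160122112029/http://www.dr.dk") returns "urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk", the same PWID URN used as the example in the Resolution part of this draft.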
      A much easier way to construct PWID URNs is to use tools that
      construct them. Currently, there is also a prototype of a
      SOLR-Wayback tool (source at
      https://github.com/netarchivesuite/solrwayback) [PWIDprovider],
      which can assist in finding the most precise reference to an
      archived web page. This Wayback version can provide all PWID URNs
      belonging to a shown page (with the page PWID URN at the top). For
      example, in netarkivet.dk, the web page with the archived URI
      http://www.susanlegetoej.dk/shop/handskedyr-siameser-killing-8681p.html,
      archived 2008-11-29 01:19:16 UTC, has the following parts
      calculated by the SOLR-Wayback tool:

         urn:pwid:netarkivet.dk:2008-11-29T00:41:42Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Master_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:39:47Z:part:http://www.susanlegetoej.dk/shop/css/print.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:06Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Basket_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_TopMenu_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SearchPage_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:35Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Productmenu_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:22Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceTop_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:24Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceLeft_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:23Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceBottom_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:40:25Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceRight_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:37:23Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_ProductInfo_NF.css

         urn:pwid:netarkivet.dk:2008-11-29T00:37:24Z:part:http://www.susanlegetoej.dk/Shop/js/Variants.js

         urn:pwid:netarkivet.dk:2009-03-03T11:53:00Z:part:http://www.susanlegetoej.dk/Shop/js/Media.js

         urn:pwid:netarkivet.dk:2009-03-03T11:53:02Z:part:http://www.susanlegetoej.dk/images/design/print.gif

         urn:pwid:netarkivet.dk:2009-03-03T11:54:19Z:part:http://www.susanlegetoej.dk/Shop/js/Scroll.js

         urn:pwid:netarkivet.dk:2009-03-03T11:54:09Z:part:http://www.susanlegetoej.dk/Shop/js/Shop5Common.js

         urn:pwid:netarkivet.dk:2006-11-20T20:16:03Z:part:http://www.susanlegetoej.dk/images/602551.jpg

   Security and Privacy:

      Security and privacy considerations are restricted to accessible
      web resources in web archives. Resolution of PWID URNs will
      usually only be possible via the web archives' access tools, and
      in such cases security and privacy are covered by these tools,
      since the information used for access raises no security or
      privacy issues. In cases where resolution bypasses the archives'
      access tools, a separate analysis should be made.

   Interoperability:

      This is covered by comments in the Syntax description:

      * the PWID URN conforms to the URI standard [RFC3986] and the URN
        standard [RFC8141]

      * the 'archival-time' of the PWID URN conforms to the UTC
        timestamp as described in the W3C profile of ISO 8601 [ISO8601]
        [W3CDTF] and is in accordance with the WARC standard ISO 28500
        [ISO28500].

      * when a URI is used as the 'archived-item-id', this URI conforms
        to the URI standard [RFC3986], with %-encodings of "[", "]",
        "#", "?"
        and "%" in order to conform to the URN standard [RFC8141] and
        to make the use of "%" unambiguous

   Resolution:

      The information in a PWID URN can be used for locating a web
      archive resource in any kind of web archive. It includes the
      minimum information for web archive materials that enables
      resolvability, manually or by a resolver. Resolution of a PWID
      URN is the primary motivation for making a formal URN definition,
      instead of just a textual representation of the four needed parts
      of a PWID.

      Resolution (manually or automatically) is done based on the PWID
      parts:

      * Web archive identification for the web archive holding the
        referred resource
        If the identifier does not start with "~", then the identifier
        is the domain name of the web archive, where browsing this
        domain page will typically lead to a description of how to
        access the web archive. For example, "archive.org" is the domain
        name leading to the Internet Archive's interface to their online
        web collection, and "netarkivet.dk" is the domain name leading
        to the website for the Danish web archive with information about
        how to apply for access permission to the web collections. If
        the identifier starts with "~", the archive can be identified by
        looking up the identifier (the rest of the archive identifier)
        in a registry of archives. A future possibility is to have such
        a registry, which should hold archive identifiers along with
        their current location on the internet. Such a registry will be
        needed for persistent reference to the archive, since an archive
        may change its location and name, or archives may merge. There
        is work in progress to define such a registry, but no details
        yet.
      * Archived URI or registered identifier of the archived item
        If the identifier does not start with "~", then the identifier
        is an archived URI, and this URI must be used in the search for,
        or construction of, the location of the resource. If the
        identifier starts with "~", then the rest of the characters in
        the identifier constitute a registered identifier assigned to
        the resource (by the archive), and it is this identifier that
        must be used in the search for, or construction of, the
        location of the resource

      * Date and time associated with the archived item
        The archival date and time must be used in the search for, or
        construction of, the location of the resource

      * Precision of what is referred
        The precision can contribute to the guidance of activating
        tools to view the referred item, e.g. browsing the referred item
        as a page on the basis of the computed closest past, browsing
        the referred item on the basis of parts specified in a
        collection, or viewing the referred item as a snapshot. In the
        example of the snapshot, it also contains a specification of
        which resource to display

      In the following, the different resolution techniques are
      explained (manual as well as via a service).

      An example of a PWID URN is:

         urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk

      which has the information:

      * archive.org
        Currently known identifier in the form of the Internet Archive
        domain name for their open access web archive. If Internet
If Internet 854 Archive registered their open web archive in an IANA web 855 archive register, this identifier could currently be 856 "web.archive.org/web/" for Wayback resolution, or it could be 857 "archive.org/pwid/" if a PWID interface was created as 858 described below 860 * 2016-01-22T11:20:29Z 861 UTC date and time associated with the archived URI 863 * page 864 Clarification that the reference cover the full web page with 865 all its inherited parts selected by the web archive 867 * http://www.dr.dk 868 archived URI of item 870 Resolution of this URN PWID can be deduced based on the current 871 (2019) knowledge of Internet Archive's open Wayback access web 872 interface, which has the pattern: 874 https://web.archive.org/web/