idnits 2.17.1

draft-daigle-appidarch-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 2) being 60 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not match the current year

  == Line 240 has weird spacing: '...ociated with ...'

  == Line 434 has weird spacing: '...nes its scope...'

  == Line 464 has weird spacing: '...-- many resou...'

  -- The document date (March 2015) is 3330 days in the past. Is this intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: '1' is defined on line 542, but no explicit reference was found in the text

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

Internet Engineering Task Force                                L. Daigle
Internet-Draft                                                       TCE
Intended status: Informational                                March 2015
Expires: August 31, 2015


              Internet Application Identifier Architecture
                     draft-daigle-appidarch-00.txt

Abstract

   This document outlines a general architecture for Internet applications from the perspective of application identifiers. It surveys past approaches, draws out their common elements, and highlights common traps and roadblocks.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 31, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Basic components of an Application Identifier Architecture
   3.  Application Identifier Architectures in More Detail
     3.1.  System
     3.2.  Identifiers
     3.3.  Identified
   4.  Survey of existing work
     4.1.  Domain Name System
     4.2.  World Wide Web
     4.3.  Classic URIs
     4.4.  IP addresses
   5.  Common design choices and challenges
     5.1.  Identifiers
     5.2.  Identified
   6.  Issues in (mis)using identifiers
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
   Author's Address

1.  Introduction

   This document posits a high-level architecture for Internet application identifier systems. It also surveys IETF efforts to standardize Internet applications and services that use those identifier systems.

   Status of this revision: this is a very rough -00 document. The hope is that it stimulates enough thought and discussion to flesh out future documents.

2.  Basic components of an Application Identifier Architecture

   There are 3 basic components of an Application Identifier Architecture:

   o  System

   o  Identifiers

   o  Identified

   The System is the context in which identifiers are created and used, and in which they are intended to make sense. This context is usually transparent. It becomes visible when identifiers are used outside of it, which causes problems of varying severity over time. This is discussed in more detail below.

   Identifiers are typically strings of bits or characters. They may have multiple representations. For example, an IPv4 address may be written in dotted-decimal notation (192.0.2.1) or handled as a 32-bit integer.

   What is being Identified also depends on the System. It may be Internet hosts, services, documents, parts of documents, people or other actors from the physical world, their representatives in the Internet, etc.

3.  Application Identifier Architectures in More Detail

3.1.  System

   As noted above, the System is the context in which the identifiers make sense. In the Domain Name System, for example, the System initially consisted of the set of hosts attached to the Internet. It has since generalized to the set of addressable Internet services. These are organized into "domains". Each domain is operated under the control of a single entity, independently of other domains.

3.2.  Identifiers

   Identifiers may identify a thing that exists (content, a service, a location) or one that is posited to exist. Typical actions on identifiers include:

   o  "Minting" -- creation of an identifier, usually including its association with the identified thing

   o  Transformation -- changing the bits (characters) of the identifier according to some set of rules and/or to conform to some structure, e.g., converting between relative and absolute forms

   o  Comparison -- determining whether two different identifiers are the same, whether they identify the same thing, or whether they express a relationship between things
   o  Resolution -- using the identifier to access what it identifies

   o  Validation -- confirmation that the identifier conforms to the System's rules (syntax)

   o  Status check -- has the identifier been minted? Is it active within the System?

   o  Authentication -- confirmation that the association between the identifier and the identified thing is valid (as minted)

   o  Lookup -- some systems support lookup, i.e., finding identifier entries based on partial fragments, typically leading characters (bits)

   o  Search -- some systems support searching for identifiers based on fragments of the related data

   o  Subscribe -- subscribing to an identifier provides periodic updates on the state of the identifier and/or the identified thing

3.3.  Identified

   Identifiers can be associated with just about any level of concept, construct, network or software element, or entity in the physical world. The range of possible identified things is generally scoped by the System.

   From the perspective of application architectures, there are 4 levels of things that may be identified. Each level may have its own identifiers:

   o  Entity/resource -- the thing itself; for example, the published work "Moby Dick"

   o  Instance -- a specific copy of the thing, e.g., a copy of "Moby Dick"

   o  Properties -- the characteristics of the thing. The set of properties considered is generally constrained by the System.

   o  Relationship -- an identifier may identify something as "part-of" a larger entity.

   The following actions may be performed on the things identified:

   o  Assign properties -- associate values with properties of the identified item

   o  Get properties

      o  Intrinsic -- e.g., format, number of words

      o  Applied -- e.g., "director's cut"

   o  Publish -- put a copy somewhere

      o  First instance

      o  Cache/replica

   o  Get (a copy)

      o  Any copy

      o  Specific service

      o  Closest

      o  Cheapest

   o  Authenticate -- confirm (cryptographically) that the resource is genuinely the one expected, i.e., the one related to the identifier

   o  Comparison

      o  Equivalence

   o  Send -- the converse of "get"?

   o  Subscribe

   o  Search

4.  Survey of existing work

4.1.  Domain Name System

   The Domain Name System (DNS) was designed to provide identifiers that allow storage and retrieval of (sets of) properties associated with Internet hosts and services, both real and virtual.

   DNS identifiers are hierarchical, dot-separated labels, typically expressed as characters. Host names are a subset of domain names, restricted to letters, digits, and hyphens.

   o  "Minting" -- the authority for a domain can create labels within that domain. So-called "synthetic" domain labels are created dynamically.

   o  Transformation -- relative domain names can be expanded into fully qualified names within the context of a "search domain"

   o  Comparison -- domain names are compared label by label, octet by octet, except that ASCII letters are compared case-insensitively (IDNs are not considered here)

   o  Resolution -- DNS resolution means "DNS lookup": using the DNS infrastructure to retrieve the resource records associated with the domain name. Resolution can be tailored to retrieve particular types of resource records (e.g., A, AAAA, or MX records).

   o  Validation -- any string that conforms to DNS syntax may be considered valid.

   o  Status check -- DNS does not distinguish between "inactive" and "not minted". Either every label in the hierarchical domain name is accessible in an authoritative name server, in which case the domain name is "minted" and "active". Or the DNS returns a response indicating that the name does not exist. DNSSEC does not change this: its authenticated denial of existence proves that a name does not exist, not why.

   o  Authentication -- domain names are not authenticated (see below for DNSSEC).

   o  Lookup -- DNS resolution is a lookup of complete domain names; lookup by partial fragments is not supported.

   o  Search -- DNS does not support search. Wildcard records synthesize answers for names that do not otherwise exist, but they do not provide search.

   o  Subscribe -- N/A

   Domain names identify resource records (RRs). The RRs themselves describe Internet hosts, services, and other information stored in the DNS, but fundamentally a DNS identifier names a set of resource records.

   o  Entity/resource -- a set of resource records

   o  Instance -- copies of resource records may be stored in caching servers; there are no special identifiers to distinguish primary from cached results

   o  Properties -- N/A

   o  Relationship -- N/A

   The following actions may be performed on the things identified:

   o  Assign properties -- each resource record carries a time to live (TTL), and the zone's SOA record carries a serial number

   o  Get properties -- parsed as part of the response from the server.

   o  Publish -- publishing a DNS resource record amounts to updating a DNS zone file with the record.

   o  Get (a copy) -- resolve the domain name identifier

   o  Authenticate -- DNSSEC is used to authenticate that the resource records received in response to domain name resolution are as they were published.

   o  Comparison -- of RRs?

   o  Send -- N/A

   o  Search -- N/A

4.2.  World Wide Web

   The World Wide Web (WWW) is largely defined by HTTP. "Pages" written in HTML were the primary design target, though much content in a variety of formats is now delivered via HTTP. For the sake of discussion, this document treats WWW identifiers as HTTP-scheme URIs.
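   The HTTP URI operations discussed in this section can be made concrete with a short sketch. These are minting from a host name plus path components, transformation of relative references, and character-wise comparison. The following Python code is illustrative only and uses just the standard library's urllib.parse. The function names and example URIs are hypothetical. The normalization in normalize() is an assumed, deliberately partial choice, not a complete equivalence test: it lowercases the scheme and host and drops an explicit port 80.

```python
from urllib.parse import urljoin, urlsplit


def components(uri: str) -> dict:
    """Split an HTTP URI into the parts that minting and resolution rely on."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,     # urlsplit lowercases the scheme
        "host": parts.hostname,     # domain name of the web server, lowercased
        "port": parts.port,         # None when no explicit port is present
        "path": parts.path or "/",  # position in the site's "hierarchy"
    }


def to_absolute(base: str, ref: str) -> str:
    """Transformation: resolve a relative reference against a base URI."""
    return urljoin(base, ref)


def normalize(uri: str) -> str:
    """Hypothetical partial normalization: lowercase scheme and host, drop an
    explicit port 80, and default an empty path to "/". Userinfo and fragments
    are discarded, and IPv6 literals are not handled in this sketch."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    netloc = host if parts.port in (None, 80) else f"{host}:{parts.port}"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{netloc}{parts.path or '/'}{query}"


def same_uri(a: str, b: str) -> bool:
    """Comparison: equal after normalization. This says nothing about whether
    two *different* URIs lead to the same content."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    base = "http://www.example.com/docs/moby-dick.html"
    print(components(base))
    print(to_absolute(base, "chapter1.html"))  # http://www.example.com/docs/chapter1.html
    print(to_absolute(base, "../index.html"))  # http://www.example.com/index.html
    print(same_uri("HTTP://WWW.Example.com:80/docs/", "http://www.example.com/docs/"))  # True
    print(same_uri("http://www.example.com/docs/", "http://www.example.com/Docs/"))     # False
```

   The last comparison is False because the path component is case-sensitive in generic URI syntax. Whether a server treats the two paths as the same resource cannot be determined from the identifiers themselves.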
   o  "Minting" -- typically, HTTP URIs are not composed deliberately so much as assembled practically. They combine the domain name of the hosting web server with path components that reflect how the website is laid out hierarchically, which may or may not relate to an underlying file structure.

   o  Transformation -- HTTP URIs may be relative (to the current page in the "hierarchy", the current domain authority, etc.)

   o  Comparison -- HTTP URIs are not inherently comparable except by character-wise comparison or by determining relative relationships

   o  Resolution -- HTTP URIs are resolved by parsing the authority component from the URI, connecting to the server, and requesting the resource associated with the path part of the URI.

   o  Validation -- any string that conforms to HTTP URI syntax may be considered valid.

   o  Status check -- HTTP does not reliably distinguish between "inactive" and "not minted". Either the HTTP server is available and the resource is found on it, in which case the URI is "minted" and "active". Or the server (or path) is not found. HTTP does define a 410 (Gone) status code that a server may use to signal that a resource is no longer available, but servers are not required to use it.

   o  Authentication -- HTTP URIs are not authenticated (see below for authentication of servers).

   o  Lookup -- N/A

   o  Search -- HTTP does not support search (within a server?)

   o  Subscribe -- N/A

   HTTP URIs identify "web pages" or "resources".

   o  Entity/resource -- web content (a page)

   o  Instance -- copies of pages may be stored in caching servers; there are no special identifiers to distinguish primary from cached results

   o  Properties -- properties may be embedded within the HTML document, but there are no special identifiers to query or retrieve them. As part of HTTP, content characteristics such as format and language may be negotiated.

   o  Relationship -- relative URIs (?)

   The following actions may be performed on the things identified:

   o  Assign properties -- the web server may assign properties to a web page (e.g., through response headers such as Content-Type).
   o  Get properties -- parsed as part of the response from the server.

   o  Publish -- publishing a web page is done on the back end, out of band of the WWW system.

   o  Get (a copy) -- resolve the URI

   o  Authenticate -- certificates are used, via TLS (HTTPS), to authenticate that the server is authorized to operate for a given domain name. Individual pages are not authenticated.

   o  Comparison -- many web pages look alike, but there is no inherent way to claim that two web pages (documents) are "the same".

   o  Send -- N/A

   o  Search -- within the WWW there is no support for search; all search is provided by external systems.

4.3.  Classic URIs

   The advent of the WWW heralded a burst of standards development for applications and content on the Internet. Much work was done to elaborate a general system of identifiers supporting a broad range of application needs: Uniform Resource Identifiers (URIs) in the large. These encompassed Locators (URLs, identifiers of Internet "location") and Names (URNs, persistent, location-independent identifiers for resources). They also encompassed Characteristics (URCs, metadata about resources) and Agents (for composing actions).

   In the large, the classic perspective on URIs was that they would identify all resources referenced within Internet protocols: documents, services, media, components, classes, parameters, etc.

   o  "Minting" -- dependent on the URI scheme. The HTTP URI scheme, outlined above, is an example of dynamic URI creation. Some URN namespaces require more explicit minting of the identifiers they embed (e.g., ISBN URNs).

   o  Transformation -- dependent on the URI scheme. URNs were intended to be (authoritatively) transformed into URLs identifying the location of the desired resource at a given point in time.

   o  Comparison -- scheme-dependent; there is no URI-wide support for comparing URIs (except byte-wise equality).

   o  Resolution -- scheme-dependent. Some URI schemes do not have Internet-based resolution capabilities.

   o  Validation -- any string that conforms to URI syntax may be considered valid. Individual schemes may include validation services (e.g., out-of-band lookup services, built-in checksums, etc.).

   o  Status check -- generally, URIs do not distinguish between "inactive" and "not minted". Successful resolution implies that the URI was minted; unsuccessful resolution is ambiguous. Individual schemes may provide more refined methods of confirming status (SIP URIs?).

   o  Authentication -- there is no URI-wide support for URI authentication.

   o  Lookup -- N/A

   o  Search -- URIs are not inherently searchable.

   o  Subscribe -- N/A

   Classically, URIs identify Internet "resources". However, URIs are also used in contexts that are disjoint from the Internet (e.g., XML component identification). "Resource" is deliberately general and could mean documents, services, etc. Each URI scheme refines the scope of resources it is intended to identify.

   o  Entity/resource -- typically Internet content or a service, but see the comment above.

   o  Instance -- most URI schemes do not support identification of instances. However, a URN was intended to be resolvable to the locations of multiple instances of a given resource.

   o  Properties -- URIs may identify URCs (Uniform Resource Characteristics), which articulate the properties of a given resource. (URCs were never standardized.)

   o  Relationship -- relative URIs (?)

   The following actions may be performed on the things identified:

   o  Assign properties -- publish a URC

   o  Get properties -- retrieve a URC associated with the URI.

   o  Publish -- scheme-dependent

   o  Get (a copy) -- resolve the URI

   o  Authenticate -- certificates are used, within some URI schemes, to authenticate that the server is authorized to operate for a given domain name. Individual resources are not authenticated (unless, perhaps, a URC carries a checksum?).

   o  Comparison -- many resources look alike, but there is no inherent way to claim that two resources are "the same".

   o  Send -- N/A

   o  Search -- within the URI space there is no support for search; all search is provided by external systems.

4.4.  IP addresses

   To be added. What is interesting about IP addresses is the contrast between the context in which they are defined and the places where they turn up.

5.  Common design choices and challenges

   Creating a system involves several important and common design choices. Sometimes the answer is implicit in the overall design constraints of the system. At other times, considerable effort is required to refine specifications and make appropriate choices. Because these questions recur from system to system, understanding the design discussions of previous systems can be particularly helpful, so that those discussions are not repeated needlessly.

5.1.  Identifiers

   In defining identifiers for a system, is the intention that they:

   o  Name something -- the identifier will be associated with one entity (or instance of an entity), wherever that entity may be located.

   o  Locate something -- identify the location of an entity (at some point in time).

   o  Be smart or dumb identifiers -- "smart" identifiers have structures that can be parsed to determine something about the thing identified (e.g., the domain in which it is stored); "dumb" identifiers are opaque and must be resolved within the system.

   o  Have uniqueness -- is the binding between identifier and resource unique?

   o  Have scope -- is the uniqueness (or any other property) maintained only within some limited scope, or globally?
   o  Have permanence -- what level of permanence is expected of the identifier's relevance, i.e., the ability to use it and the binding between the identifier and the identified resource? Or are the identifiers transient?

5.2.  Identified

   The specifics of the identified resource need to be defined as well:

   o  Instances -- can there be multiple instances of a single resource? How can they be distinguished, and/or how can two instances be equated? This matters if one needs to cache instances or otherwise validate "local" copies.

   o  Scope of applicability -- what constitutes a "resource" in this system?

6.  Issues in (mis)using identifiers

   An example is the use of IP addresses outside the context of the routing system. In such uses, assumptions about their uniqueness and volatility may be improper.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   This document considers application identifier systems. Security is important to applications but is not specifically addressed here.

9.  References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Author's Address

   Leslie Daigle
   ThinkingCat Enterprises
   Leesburg, VA  20176
   US

   Email: ldaigle@thinkingcat.com