idnits 2.17.1 

draft-daigle-appidarch-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 2) being 60 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 240 has weird spacing: '...ociated  with ...'

  == Line 434 has weird spacing: '...nes its  scope...'

  == Line 464 has weird spacing: '...-- many  resou...'

  -- The document date (March 2015) is 3330 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: '1' is defined on line 542, but no explicit reference
     was found in the text


     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                L. Daigle
3	Internet-Draft                                                       TCE
4	Intended status: Informational                                March 2015
5	Expires: August 31, 2015

7	              Internet Application Identifier Architecture
8	                     draft-daigle-appidarch-00.txt

10	Abstract

12	   This document outlines a general architecture for Internet
13	   applications, from the perspective of application identifiers.
14	   It provides a survey of past approaches, drawing out common elements
15	   and highlighting common traps and roadblocks.

17	Status of this Memo

19	   This Internet-Draft is submitted in full conformance with the
20	   provisions of BCP 78 and BCP 79.

22	   Internet-Drafts are working documents of the Internet Engineering
23	   Task Force (IETF).  Note that other groups may also distribute
24	   working documents as Internet-Drafts.  The list of current Internet-
25	   Drafts is at http://datatracker.ietf.org/drafts/current/.

27	   Internet-Drafts are draft documents valid for a maximum of six months
28	   and may be updated, replaced, or obsoleted by other documents at any
29	   time.  It is inappropriate to use Internet-Drafts as reference
30	   material or to cite them other than as "work in progress."

32	   This Internet-Draft will expire on August 31, 2015.

34	Copyright Notice

36	   Copyright (c) 2015 IETF Trust and the persons identified as the
37	   document authors.  All rights reserved.

39	   This document is subject to BCP 78 and the IETF Trust's Legal
40	   Provisions Relating to IETF Documents (http://trustee.ietf.org/
41	   license-info) in effect on the date of publication of this document.
42	   Please review these documents carefully, as they describe your rights
43	   and restrictions with respect to this document.  Code Components
44	   extracted from this document must include Simplified BSD License text
45	   as described in Section 4.e of the Trust Legal Provisions and are
46	   provided without warranty as described in the Simplified BSD License.

48	Table of Contents

50	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
51	   2.  Basic components of Application Identifier Architecture  . . .  2
52	   3.  Application Identifier Architectures in More Detail  . . . . .  2
53	     3.1.  System . . . . . . . . . . . . . . . . . . . . . . . . . .  2
54	     3.2.  Identifiers  . . . . . . . . . . . . . . . . . . . . . . .  3
55	     3.3.  Identified . . . . . . . . . . . . . . . . . . . . . . . .  3
56	   4.  Survey of existing work  . . . . . . . . . . . . . . . . . . .  5
57	     4.1.  Domain Name System . . . . . . . . . . . . . . . . . . . .  5
58	     4.2.  World Wide Web . . . . . . . . . . . . . . . . . . . . . .  6
59	     4.3.  Classic URIs . . . . . . . . . . . . . . . . . . . . . . .  8
60	     4.4.  IP addresses . . . . . . . . . . . . . . . . . . . . . . . 10
61	   5.  Common design choices and challenges . . . . . . . . . . . . . 10
62	     5.1.  Identifiers  . . . . . . . . . . . . . . . . . . . . . . . 10
63	     5.2.  Identified . . . . . . . . . . . . . . . . . . . . . . . . 10
64	   6.  Issues in (mis)using identifiers . . . . . . . . . . . . . . . 11
65	   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 11
66	   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 11
67	   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
68	   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 11

70	1.  Introduction

72	   This document posits a high level architecture of Internet
73	   application identifier systems, as well as a survey of IETF efforts
74	   dealing in standardization of Internet applications and services
75	   using those identifier systems.

77	   Status of this revision:  this is a very drafty -00 document.  The
78	   hope and expectation is that it is enough to stimulate some thought
79	   and discussion to flesh out future documents.

81	2.  Basic components of Application Identifier Architecture

83	   There are 3 basic components of an Application Identifier
84	   Architecture:

86	   o  System

88	   o  Identifiers

90	   o  Identified

92	   The System is the context in which the identifiers are created, used
93	   and in which they are intended to make sense.  This is usually
94	   transparent, except when identifiers are used outside of this
95	   context, causing greater or lesser problems over time.  This is
96	   discussed in more detail, below.

98	   Identifiers are typically strings of bits or characters.    They may
99	   have multiple representations.

101	   The concept of what is being Identified is also dependent on the
102	   System -- whether it's Internet hosts, services, documents, parts of
103	   documents, people or other actors from the physical world, their
104	   representatives in the Internet, etc.

106	3.  Application Identifier Architectures in More Detail

108	3.1.  System

109	   As noted above, the System is the context in which the identifiers
110	   make sense.   In the Domain Name System, for example, the system
111	   initially consisted of the set of hosts attached to the Internet, and
112	   it has generalized to the set of addressable Internet services.
113	   These are organized into "domains", which are operated under the
114	   control of a single entity, and individual domains are completely
115	   independent of each other.

117	3.2.  Identifiers

119	   Identifiers may identify a thing that exists -- content, service,
120	   location -- or is posited to exist.  Typical actions on identifiers
121	   include:

123	   o  "Minting" --  creation of an identifier, usually including
124	      association with the identified thing

126	   o  Transformation -- changing the bits (characters) of the identifier
127	      by some set of rules and/or to conform to some structure; relative
128	      or absolute

130	   o  Comparison -- of identifiers.  Are 2 different identifiers the
131	      same?  Identifying the same thing?  Expressing relationship
132	      between things?

134	   o  Resolution -- using the identifier to access what it identifies

136	   o  Validation -- confirmation that the identifier conforms to the
137	      system's rules (syntax)

139	   o  Status check -- has the identifier been minted?  Is it active
140	      within the system?

142	   o  Authentication -- confirmation that the identifier association is
143	      valid (as minted)

145	   o  Lookup -- some systems support lookup: finding identifier
146	      entries based on partial fragments, typically leading characters
147	      (bits)

149	   o  Search -- some systems support searches for identifiers based on
150	      fragments of the related data

152	   o  Subscribe -- subscribing to an identifier allows you to get
153	      periodic updates as to state of the identifier/identified.
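
   As an illustration only, the actions listed above can be sketched as
   operations on a toy, in-memory System.  The class name, the syntax
   rule, and the example identifiers are all hypothetical and are not
   drawn from any existing identifier system.

```python
import re


class ToyIdentifierSystem:
    """Hypothetical in-memory System: identifiers are lowercase tokens."""

    SYNTAX = re.compile(r"[a-z][a-z0-9-]*")  # assumed syntax rule

    def __init__(self):
        self._bindings = {}  # identifier -> identified thing

    def validate(self, identifier):
        """Validation: does the string conform to the System's syntax?"""
        return self.SYNTAX.fullmatch(identifier) is not None

    def mint(self, identifier, thing):
        """Minting: create the identifier and bind it to the thing."""
        if not self.validate(identifier):
            raise ValueError(f"invalid syntax: {identifier!r}")
        if identifier in self._bindings:
            raise KeyError(f"already minted: {identifier!r}")
        self._bindings[identifier] = thing

    def is_minted(self, identifier):
        """Status check: has the identifier been minted?"""
        return identifier in self._bindings

    def resolve(self, identifier):
        """Resolution: use the identifier to access what it identifies."""
        return self._bindings[identifier]

    def same_thing(self, a, b):
        """Comparison: do two identifiers identify the same thing?"""
        return self.resolve(a) == self.resolve(b)

    def lookup(self, prefix):
        """Lookup: find identifier entries by leading characters."""
        return sorted(i for i in self._bindings if i.startswith(prefix))

    def search(self, fragment):
        """Search: find identifiers by fragments of the related data."""
        return sorted(
            i for i, thing in self._bindings.items() if fragment in str(thing)
        )


system = ToyIdentifierSystem()
system.mint("moby-dick", "Moby Dick, a novel by Herman Melville")
system.mint("moby-dick-1851", "Moby Dick, a novel by Herman Melville")
```

   Transformation, Authentication and Subscribe are omitted from the
   sketch because their shape depends heavily on the System in question.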

155	3.3.  Identified

157	   Identifiers can be associated with just about any level of concept,
158	   construct, network or software element, or entity in the physical
159	   world.   The range of possible identified things is generally scoped
160	   by the System.

162	   From the perspective of application architectures, there are 4 levels
163	   of things that may be identified, and may have individual
164	   identifiers:

166	   o  Entity/resource -- the thing itself.  For example, the published
167	      work "Moby Dick"

169	   o  Instance -- a specific copy of the thing, e.g.,  a copy of "Moby
170	      Dick".

172	   o  Properties -- the characteristics of the thing.  The set of
173	      properties discussed is generally constrained by the System.

175	   o  Relationship -- an identifier may identify something as "part-of"
176	      a larger entity.

178	   There are actions that may be performed on the things identified:

180	   o  Assign properties -- associate values with properties of
181	      identified item

183	   o  Get properties

185	      *  Intrinsic -- e.g., format, number of words

187	      *  Applied -- e.g., director's cut

189	   o  Publish -- put copy somewhere

191	      *  First instance

193	      *  Cache/replica

195	   o  Get (a copy)

197	      *  Any copy

199	      *  Specific service

201	      *  Closest

203	      *  Cheapest

205	   o  Authenticate -- confirm (cryptographically) that the resource is
206	      genuinely the one expected / related to identifier

208	   o  Comparison

210	      *  Equivalence

212	   o  Send --  a reflection of "get"?

214	   o  Subscribe

215	   o  Search
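
   One possible way to picture the distinction between an entity, its
   instances, and its properties is the following sketch.  The class
   names, the "example:" identifiers, the locations, and the cost field
   are assumptions made for illustration; the document does not define
   a data model.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A specific copy of an entity, published at some location."""
    identifier: str
    location: str
    cost: float  # hypothetical retrieval cost, used by the "cheapest" policy


@dataclass
class Entity:
    """The thing itself, with its properties and known instances."""
    identifier: str
    properties: dict = field(default_factory=dict)
    instances: list = field(default_factory=list)

    def assign_property(self, name, value):
        """Assign properties: associate a value with a property."""
        self.properties[name] = value

    def publish(self, instance):
        """Publish: put a copy somewhere (first instance or replica)."""
        self.instances.append(instance)

    def get_copy(self, policy="any"):
        """Get (a copy): any copy, or the cheapest one."""
        if not self.instances:
            raise LookupError("no instance has been published")
        if policy == "cheapest":
            return min(self.instances, key=lambda i: i.cost)
        return self.instances[0]


work = Entity("example:moby-dick")
work.assign_property("format", "text")
work.publish(Instance("example:moby-dick/copy-1", "library-a", cost=5.0))
work.publish(Instance("example:moby-dick/copy-2", "library-b", cost=2.0))
cheapest = work.get_copy("cheapest")
```

   The entity and each instance carry their own identifiers here; a
   System could equally decide that instances are not separately
   identifiable.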

217	4.  Survey of existing work

219	4.1.  Domain Name System

221	   The Domain Name System (DNS) was designed to provide identifiers to
222	   allow storage and retrieval of (sets of) properties associated with
223	   Internet hosts and services -- real and virtual.

225	   DNS identifiers are hierarchical, dot-separated labels, typically
226	   expressed as characters.  Host names are a subset of domain name
227	   identifiers, with some restrictions on the permissible characters.

229	   o  "Minting" -- the authority for a domain can create labels within
230	      that domain.  So-called "synthetic" domain labels are created
231	      dynamically.

233	   o  Transformation -- relative domain names can be understood within
234	      the context of a "search domain"

236	   o  Comparison -- domain names are matched octet by octet, with ASCII
237	      letters compared case-insensitively (IDNs are not considered here)

239	   o  Resolution --  DNS resolution means "DNS lookup" -- using the DNS
240	      infrastructure to retrieve resource records associated  with the
241	      domain name.  Resolution can be tailored to retrieve particular
242	      types of resource records (e.g., A records, or AAAA records, or MX
243	      records)

245	   o  Validation -- any string that conforms to DNS syntax may be
246	      considered valid.

248	   o  Status check --  DNS does not distinguish between "inactive" and
249	      "not minted".  That is, either every label in the hierarchical
250	      domain name is accessible in an authoritative domain server (in
251	      which case the domain name is "minted" and "active") or the DNS
252	      will respond that the name does not exist.  (Not true in
253	      DNSSEC?)

255	   o  Authentication -- domain names are not authenticated (see below
256	      for DNSSEC).

258	   o  Lookup -- DNS resolution is lookup of domain names

260	   o  Search -- DNS does not support search (wildcards?)

262	   o  Subscribe -- N/A
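
   A minimal sketch of three of these actions for host names follows:
   validation against the letter-digit-hyphen host name rule,
   comparison, and transformation of a relative name within a search
   domain.  The function names are hypothetical.  The comparison here
   additionally folds ASCII case, as DNS name matching does, and IDNs
   are not handled.

```python
import re

# Host name label: letters, digits, hyphen; no leading or trailing hyphen.
HOST_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def _strip_root(name):
    """Drop a single trailing dot (the root label), if present."""
    return name[:-1] if name.endswith(".") else name


def is_valid_hostname(name):
    """Validation: check overall length and the syntax of every label."""
    name = _strip_root(name)
    if not 0 < len(name) <= 253:
        return False
    return all(HOST_LABEL.fullmatch(label) for label in name.split("."))


def same_name(a, b):
    """Comparison: label by label, ignoring ASCII case and the root dot."""
    return _strip_root(a).lower().split(".") == _strip_root(b).lower().split(".")


def qualify(name, search_domain):
    """Transformation: expand a relative name within a search domain."""
    if name.endswith("."):
        return name  # already absolute
    return f"{name}.{_strip_root(search_domain)}."
```

   For example, qualify("www", "example.com") yields the absolute name
   "www.example.com.", while a name that already ends in a dot is left
   unchanged.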

264	   DNS identifies resource records.   The resource records are
265	   themselves descriptions of Internet hosts, services, and other
266	   information stored in the DNS, but fundamentally the DNS identifier
267	   is for resource records.

269	   o  Entity/resource -- a set of resource records

271	   o  Instance -- copies of resource records may be stored in caching
272	      servers; there are no special identifiers to distinguish primary
273	      or cached results

275	   o  Properties -- N/A

277	   o  Relationship -- N/A

279	   There are actions that may be performed on the things identified:

281	   o  Assign properties -- resource records have time to live (TTL) and
282	      serial numbers included

284	   o  Get properties -- parsed as part of the response from the server.

286	   o  Publish -- publishing a DNS resource record amounts to updating a
287	      DNS zone file with the record.

289	   o  Get (a copy) -- resolve the domain name identifier

291	   o  Authenticate -- DNSSEC is used to authenticate that the resource
292	      records/response received for domain name resolution are as they
293	      were published.

295	   o  Comparison -- of RRs?

297	   o  Send --  N/A

299	   o  Search -- N/A
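
   The role of the TTL in governing cached instances of resource
   records can be sketched as a small cache.  This is an illustrative
   toy, not a description of how any resolver is implemented; the class
   name and the injectable clock are assumptions.

```python
import time


class RecordCache:
    """Toy cache of resource record sets, expired by TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated
        self._store = {}     # (name, rtype) -> (expires_at, records)

    def put(self, name, rtype, records, ttl):
        """Store a copy of the records for at most ttl seconds."""
        self._store[(name.lower(), rtype)] = (self._clock() + ttl, list(records))

    def get(self, name, rtype):
        """Return the cached copy, or None if absent or expired."""
        key = (name.lower(), rtype)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, records = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return records


now = [0.0]  # fake clock so the example does not need to sleep
cache = RecordCache(clock=lambda: now[0])
cache.put("www.example.com", "A", ["192.0.2.1"], ttl=300)
fresh = cache.get("WWW.example.com", "A")
now[0] = 301.0
stale = cache.get("www.example.com", "A")
```

   Nothing in the cached answer identifies it as a copy: the same
   domain name is used whether the records come from the authoritative
   server or from the cache.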

301	4.2.  World Wide Web

303	   The World Wide Web (WWW) is largely defined by the HTTP protocol.
304	   "Pages" defined in HTML are the primary design target, though these
305	   days much content of varying formats is delivered via the HTTP
306	   protocol.  For the sake of discussion, we'll say that WWW identifiers
307	   are HTTP scheme URIs.

309	   o  "Minting" -- typically, HTTP URIs are not composed consciously, so
310	      much as assembled practically with components of the domain name
311	      hosting the web server and some path components based on how the
312	      website is laid out hierarchically (which may or may not relate to
313	      an underlying file structure)

315	   o  Transformation -- HTTP URIs may be relative (to current page in
316	      the "hierarchy", current domain authority etc)

318	   o  Comparison -- HTTP URIs are not inherently comparable except by
319	      character-wise comparison or determining relative relationships

321	   o  Resolution -- HTTP URIs are resolved by parsing the authority
322	      component from the URI, connecting to the server, and requesting
323	      the resource associated with the path part of the URI.

325	   o  Validation -- any string that conforms to HTTP URI syntax may be
326	      considered valid.

328	   o  Status check -- HTTP does not distinguish between "inactive" and
329	      "not minted".  That is, either the HTTP server is available and
330	      the resource is found on it (in which case the URI is "minted" and
331	      "active") or the server (or path) is not found.

333	   o  Authentication -- HTTP URIs are not authenticated (see below for
334	      authentication of servers).

336	   o  Lookup -- N/A

338	   o  Search -- HTTP does not support search (within server?)

340	   o  Subscribe -- N/A
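
   The transformation, resolution, and comparison entries above can be
   illustrated with Python's standard urllib.parse module.  The URIs
   are made-up examples under example.com, and the sketch stops short
   of connecting to any server.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page that the relative reference appears on.
base = "http://www.example.com/books/melville/index.html"

# Transformation: a relative reference becomes absolute against the base.
absolute = urljoin(base, "../hawthorne/")

# Resolution, first step: split out the authority and the path.
parts = urlsplit(absolute)
authority, path = parts.netloc, parts.path

# Comparison: character-wise equality misses URIs that differ only in the
# case of the scheme or host, even though they name the same server.
a = "http://www.example.com/books/"
b = "HTTP://WWW.EXAMPLE.COM/books/"
characterwise_equal = a == b
same_server = (urlsplit(a).scheme, urlsplit(a).hostname) == (
    urlsplit(b).scheme,
    urlsplit(b).hostname,
)
```

   Here the two URIs fail the character-wise test but agree once the
   scheme and host are case-folded, which is one reason comparison is
   listed as a weak point.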

342	   HTTP URIs identify "web pages" or "resources".

344	   o  Entity/resource -- web content (page)

346	   o  Instance -- copies of pages may be stored in caching servers;
347	      there are no special identifiers to distinguish primary or cached
348	      results

350	   o  Properties -- properties may be embedded within the HTML document,
351	      but there are no special identifiers to query/retrieve properties;
352	      as part of the HTTP protocol, capabilities may be negotiated

354	   o  Relationship -- relative URIs (?)

356	   There are actions that may be performed on the things identified:

358	   o  Assign properties -- the web server may assign properties to a web
359	      page.

361	   o  Get properties -- parsed as part of the response from the server.

363	   o  Publish -- publishing a web page is done on the backend, out of
364	      band of the WWW system

366	   o  Get (a copy) -- resolve the URI

367	   o  Authenticate -- certs are used, within HTTP, to authenticate that
368	      the server is empowered to operate for a given domain name.
369	      Individual pages are not authenticated.

371	   o  Comparison -- many web pages look alike -- but there is no
372	      inherent way to claim two web pages (documents) are "the same".

374	   o  Send --  N/A

376	   o  Search -- within the WWW there is no support for search (all
377	      search is achieved as an external system).

379	4.3.  Classic URIs

381	   The advent of the WWW heralded a burst of development of standards
382	   for applications and content on the Internet.  Much work was done to
383	   elaborate a general system of identifiers, supporting a broad range
384	   of application needs -- Uniform Resource Identifiers in the large,
385	   encompassing Locators (identifiers of Internet "location"), Names
386	   (persistent, location-independent identifiers for resources),
387	   Characteristics (metadata about resources), and Agents (for composing
388	   actions).

390	   In the large, the classic perspective on URIs was that they would
391	   identify all resources (documents, services, media, components,
392	   classes, parameters etc) that would be referenced within Internet
393	   protocols.

395	   o  "Minting" -- dependent on the URI scheme.  The HTTP URI scheme is
396	      outlined above as a dynamic URI creation example.   Some
397	      (namespaces of) URNs require more explicit minting of identifiers
398	      to be used in URNs (e.g., ISBN URNs).

400	   o  Transformation -- dependent on the URI scheme.   URNs were
401	      intended to be (authoritatively) transformed into URLs identifying
402	      the location of the desired resource at a given point in time.

404	   o  Comparison -- Scheme-dependent -- there is no URI-wide support for
405	      comparing URIs (except byte-wise equality).

407	   o  Resolution -- Scheme-dependent.  Some URI schemes do not have
408	      Internet-based resolution capabilities.

410	   o  Validation -- any string that conforms to URI syntax may be
411	      considered valid.  Individual schemes may include validation
412	      services (e.g., out of band lookup services, built-in checksums,
413	      etc).

415	   o  Status check -- Generally, URIs do not distinguish between
416	      "inactive" and "not minted".   That is, successful resolution
417	      implies minted, unsuccessful resolution is ambiguous.  Individual
418	      schemes may provide more refined methods of confirmation of status
419	      (SIP URIs?)

421	   o  Authentication -- There is no URI-wide support for URI
422	      authentication.

424	   o  Lookup -- N/A

426	   o  Search -- URIs are not inherently searchable

428	   o  Subscribe -- N/A
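
   As a sketch of scheme-dependent validation, the following applies a
   generic check (is there a scheme at all?) and then, for ISBN URNs
   only, the ISBN-13 check digit as an example of a built-in checksum.
   The function names are hypothetical, and accepting a hyphenated
   ISBN-13 directly after "urn:isbn:" is an assumption made for the
   example rather than a statement of the registered syntax.

```python
from urllib.parse import urlsplit


def isbn13_checksum_ok(isbn):
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    compact = isbn.replace("-", "")
    if len(compact) != 13 or not (compact.isascii() and compact.isdigit()):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(compact))
    return total % 10 == 0


def validate_uri(uri):
    """Generic syntax check first, then a scheme-specific check if known."""
    parts = urlsplit(uri)
    if not parts.scheme:
        return False
    if parts.scheme == "urn":
        namespace, _, specific = parts.path.partition(":")
        if namespace.lower() == "isbn":
            return isbn13_checksum_ok(specific)
    return True  # no scheme-specific rule known to this sketch
```

   A URI in a scheme the sketch knows nothing about passes on generic
   syntax alone, which mirrors the point that there is little URI-wide
   support beyond syntax.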

430	   Classically, URIs identify  Internet "resources".   However, URIs
431	   have been found in contexts that are disjoint from the Internet
432	   (e.g., XML component identification).  "Resource" is deliberately
433	   general -- could be documents, services, etc.   Each URI scheme
434	   refines its  scope of intended resources.

436	   o  Entity/resource -- typically Internet content or service, but see
437	      comment above.

439	   o  Instance -- Most URI schemes do not support identification of
440	      instances.  However, URNs are intended to identify locations of
441	      multiple instances of a given resource.

443	   o  Properties -- URIs may identify URCs (Uniform Resource
444	      Characteristics), an articulation of the properties of a given
445	      resource.  (URCs were never finally standardized.)

447	   o  Relationship -- relative URIs (?)

449	   There are actions that may be performed on the things identified:

451	   o  Assign properties -- publish a URC

453	   o  Get properties -- retrieve a URC associated with the URI.

455	   o  Publish -- scheme-dependent

457	   o  Get (a copy) -- resolve the URI

459	   o  Authenticate -- certs are used, within some URI schemes, to
460	      authenticate that the server is empowered to operate for a given
461	      domain name.  Individual resources are not authenticated.
462	      (Unless you have a URC with a checksum?)

464	   o  Comparison -- many  resources look alike -- but there is no
465	      inherent way to claim two resources are "the same".

467	   o  Send --  N/A

469	   o  Search -- within the URI space there is no support for search (all
470	      search is achieved as an external system).

472	4.4.  IP addresses

474	   To be added... the interesting thing about IP addresses is
475	   considering the context in which they are defined, and then
476	   contrasting that with the places they turn up.

478	5.  Common design choices and challenges

480	   In creating a system, there are important common design choices that
481	   need to be made.  Sometimes the answer is implicit within the overall
482	   design constraints of the system.  Other times, considerable effort
483	   is required to refine specifications and make appropriate choices.
484	   As noted, these are common design questions.  This is an area where
485	   understanding of previous systems' design discussions can be
486	   particularly helpful (in order not to repeat them needlessly).

488	5.1.  Identifiers

490	   In defining identifiers for a system, is the intention that they:

492	   o  Name something -- the identifier will be associated with one
493	      entity (or instance of an entity), wherever that entity may be
494	      located.

496	   o  Locate something -- identify the location of an entity (at some
497	      point in time).

499	   o  Are smart or dumb -- "smart" identifiers have
500	      structures that can be parsed to determine something about the
501	      thing identified (e.g., domain in which it is stored); "dumb"
502	      identifiers are opaque and must be resolved within the system.

504	   o  Have uniqueness -- is the identifier/resource binding unique?

506	   o  Have scope -- is the uniqueness (or other properties) only
507	      maintained within some limited scope, or is it global?

509	   o  Are permanent -- what is the expected level of permanence of the
510	      identifier's relevance (the ability to use it, the binding between
511	      the identifier and the identified resource)?  Or are they
512	      transient identifiers?
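
   The contrast between "smart" and "dumb" identifiers might be
   sketched as follows.  The "local@domain" structure, the function and
   class names, and the use of random UUIDs as opaque identifiers are
   all assumptions chosen for the example.

```python
import uuid


def parse_smart(identifier):
    """A "smart" identifier can be parsed without consulting the System."""
    local, _, domain = identifier.partition("@")
    return {"local": local, "domain": domain}


class OpaqueRegistry:
    """A "dumb" identifier is opaque and must be resolved within the System."""

    def __init__(self):
        self._entries = {}

    def mint(self, thing):
        """Mint a random identifier that reveals nothing about the thing."""
        identifier = uuid.uuid4().hex
        self._entries[identifier] = thing
        return identifier

    def resolve(self, identifier):
        return self._entries[identifier]


registry = OpaqueRegistry()
opaque = registry.mint("annual report")
```

   The smart form leaks structure (here, a domain) that users may come
   to rely on; the opaque form avoids that at the cost of requiring the
   registry for every operation.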

514	5.2.  Identified

516	   The specifics of the identified resource need to be defined, as well.

518	   o  Instances -- can there be multiple instances of a single resource?
519	      How can they be distinguished and/or how can two instances be
520	      equated?  This is important if one needs to be able to cache
521	      instances or otherwise validate "local" copies.

523	   o  Scope of applicability -- what constitutes a "resource" in this
524	      system?
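
   One possible definition of instance equivalence is byte-for-byte
   identity, checked by comparing digests.  This is only one choice
   among many (a System might instead treat different encodings of the
   same work as equivalent), and the function names are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a SHA-256 digest that stands in for the instance's content."""
    return hashlib.sha256(data).hexdigest()


def same_content(a: bytes, b: bytes) -> bool:
    """Equate two instances if their bytes are identical."""
    return content_digest(a) == content_digest(b)


origin_copy = b"Call me Ishmael."
cached_copy = b"Call me Ishmael."
altered_copy = b"Call me Ishmael!"
```

   A cache holding only the digest of the origin copy can validate a
   local copy without retrieving the original again.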

526	6.  Issues in (mis)using identifiers

528	   One example is using IP addresses outside the context of the routing
529	   system: assumptions about uniqueness and volatility may be improper.
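
   The uniqueness concern can be made concrete with Python's standard
   ipaddress module: many syntactically valid addresses are unique only
   within a limited scope.  The function name and the scope labels are
   assumptions for this sketch.

```python
import ipaddress


def describe_scope(address):
    """Classify an address by how far its uniqueness can be assumed."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "host-local"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "global"
    return "special-purpose"
```

   An application that stores "10.0.0.1" as if it identified one host
   everywhere is using the identifier outside the System in which it
   makes sense.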

531	7.  IANA Considerations

533	   This memo includes no request to IANA.

535	8.  Security Considerations

537	   This document is about considering applications systems.  Security is
538	   important to applications, but is not specifically called out here.

540	9.  References

542	   [1]        Bradner, S., "Key words for use in RFCs to Indicate
543	              Requirement Levels", BCP 14, RFC 2119, March 1997, <http:/
544	              /xml.resource.org/public/rfc/html/rfc2119.html>.

546	Author's Address

548	   Leslie Daigle
549	   ThinkingCat Enterprises
550	   Leesburg, VA 20176
551	   US

553	   Email: ldaigle@thinkingcat.com