idnits 2.17.1 

draft-amsuess-core-rd-replication-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 3 instances of too long lines in the document, the longest one
     being 4 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 02, 2018) is 2239 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 833

  == Outdated reference: A later version (-14) exists of
     draft-ietf-core-dynlink-04

  == Outdated reference: A later version (-28) exists of
     draft-ietf-core-resource-directory-13


     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	CoRE                                                          C. Amsuess
3	Internet-Draft                                            March 02, 2018
4	Intended status: Informational
5	Expires: September 3, 2018

7	                     Resource Directory Replication
8	                  draft-amsuess-core-rd-replication-01

10	Abstract

12	   Discovery of endpoints and resources in M2M applications over large
13	   networks is enabled by Resource Directories, but no special
14	   consideration has been given to how such directories can scale beyond
15	   what can be managed by a single device.

17	   This document explores different ways in which Resource Directories
18	   can be scaled up from single network to enterprise and global scale.
19	   It does not attempt to standardize any of those methods, but only to
20	   demonstrate the feasibility of such extensions and to provide
21	   terminology and exploratory groundwork for later documents.

23	Status of This Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at https://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on September 3, 2018.

40	Copyright Notice

42	   Copyright (c) 2018 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (https://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  Terminology
59	   3.  Goals of upscaling
60	     3.1.  Large numbers of registrations
61	     3.2.  Large number of requests
62	     3.3.  Redundancy
63	   4.  Approaches
64	     4.1.  Shared authority
65	     4.2.  Plain caching
66	     4.3.  RD-aware caching
67	       4.3.1.  Potential for improvement
68	     4.4.  Distinct registration points
69	       4.4.1.  Redundancy and handover
70	       4.4.2.  Loops between RDs and proxies
71	   5.  Proposed RD extensions
72	     5.1.  Provenance
73	     5.2.  Lifetime Age
74	   6.  Example scenarios
75	     6.1.  Redundant and replicated resource lookup (anycast)
76	     6.2.  Redundant and replicated resource lookup (distinct
77	           registration points)
78	       6.2.1.  Variation: Large number of registrations, localized
79	               queries
80	       6.2.2.  Variation: Combination with anycast
81	     6.3.  Anonymous global endpoint lookup
82	   7.  References
83	     7.1.  Informative References
84	     7.2.  URIs
85	   Author's Address

87	1.  Introduction

89	   [ See abstract for now. ]

91	   This document is being developed in a git based workflow.  Please see
92	   https://github.com/chrysn/resource-directory-replication [1] for more
93	   details and easy ways to contribute.

95	2.  Terminology

97	   This document assumes familiarity with [RFC7252] and
98	   [I-D.ietf-core-resource-directory] and uses terminology from those
99	   documents.

101	   Examples in which URI paths like "/rd" or "/rd-lookup/res" are used
102	   assume that those URIs have been obtained before by an RD Discovery
103	   process; these paths are only examples, and no implementation should
104	   make assumptions based on the literal paths.

106	3.  Goals of upscaling

108	   The following sections outline different reasons why a Resource
109	   Directory should be scaled beyond a single device.  Not all of them
110	   will necessarily apply to all use cases, and not all solution
111	   approaches might be suitable for all goals.

113	3.1.  Large numbers of registrations

115	   Even at 1kB of link data per registration, modern server hardware can
116	   easily keep the data of millions of registrations in RAM
117	   simultaneously.  Thus, the mere size of registration data is not
118	   expected to be a factor that requires scaling to multiple nodes.

120	   The traffic produced by millions of nodes renewing at the default 24h
121	   lifetime amounts to dozens of exchanges per second, which is doable
122	   with equal ease at central network equipment.
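
	   As a rough check of the figures above, the following sketch
	   computes the average renewal rate and the link data volume.  It
	   assumes that renewals are spread evenly over the lifetime and that
	   each renewal is one exchange; the function names are illustrative
	   only.

```python
def renewal_rate(registrations: int, lifetime_s: int = 86400) -> float:
    """Average renewals per second, assuming evenly spread renewals."""
    return registrations / lifetime_s


def link_data_bytes(registrations: int, bytes_per_registration: int = 1000) -> int:
    """Total link data held by the RD, at a fixed size per registration."""
    return registrations * bytes_per_registration


# Print rate and data volume (in GB) for a few population sizes.
for n in (1_000_000, 5_000_000):
    print(n, round(renewal_rate(n), 1), link_data_bytes(n) / 1e9)
```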

124	   However, if the directory has additional interaction with its
125	   registered nodes, for example because it provides proxying to
126	   registered endpoints, resources like file descriptors can be
127	   exhausted earlier, and the traffic load on the registration server
128	   grows with the traffic it is proxying for the endpoint.

130	3.2.  Large number of requests

132	   Not all approaches to constrained restful communication use the
133	   Resource Directory only in the setup stage; some might also
134	   utilize a Resource Directory in more day-to-day operation.

136	   [ TODO: get some numbers on how many requests a single RD can deal
137	   with. ]

139	3.3.  Redundancy

141	   With the RD as a central part of CoRE infrastructures, outages can
142	   affect a large number of users.

144	   A decentralized RD should be able to deal both with scheduled
145	   downtimes of hosts as well as unexpected outages of hosts or parts of
146	   the network, especially with network splits between the individual
147	   parts of the directory.

149	4.  Approaches

151	   In this section, two independent chains of approaches are presented:
152	   the "shared authority" approach (using anycast or DNS aliases), and
153	   proxy-based caching (in stages from using generic proxies to RD
154	   replication that bears only little resemblance to proxies).

156	   In the remainder of this document, the term "proxy" always refers to
157	   a device which a client can access as if it were a resource
158	   directory, and forwards the request to an actual RD.

160	   Elements from those chains can be mixed.

162	4.1.  Shared authority

164	   With this approach, a single host and port (or "authority" component
165	   in the generic URI syntax) is used for all interactions with the RD.

167	   This can be implemented using a host name pointing to different IP
168	   addresses simultaneously or depending on the requester's location,
169	   using IP anycast addresses or both.

171	   From the client's or proxy's point of view, all interaction happens
172	   with the same Origin Server.

174	   In this setup, the replication is hidden from the REST interactions,
175	   and takes place inside the RD server implementation or its database
176	   backend.

178	   Compared to the other approaches, this is more complex to set up when
179	   it involves managing anycast addresses: Running an IPv4 anycast
180	   network on Internet scale requires running an Autonomous System.  In
181	   either variation, all server instances are tightly coupled; they need
182	   shared administration and probably need to run the same software.

184	   The replication characteristics are largely inherited from the
185	   underlying backend.

187	   As registering endpoints only store the URI constructed from the
188	   Location-Path option to their registration request, registration
189	   updates can end up at any instance of the server, though they are
190	   likely to reach the same one as before most of the time.

192	   Spontaneous failure of individual nodes can interrupt endpoints'
193	   registrations in scenarios that do not use anycast addresses until
194	   the unusable addresses have left DNS caches.

196	4.2.  Plain caching

198	   Caching reverse proxies that are not particularly aware of a Resource
199	   Directory can be used to mitigate the effect of large numbers of
200	   requests on a single RD server.  In this approach, there exists a
201	   single central RD server instance, but proxies are placed in front of
202	   it to reduce its load.

204	   Caching is applicable only to the lookup interfaces; the POST requests
205	   used in registration and renewal are not cacheable.

207	   A prerequisite for successful caching is that fresh copies exist in
208	   the cache; this is likely to happen only if there are many identical
209	   requests to the Resource Directory.  The proxy can then serve cached
210	   copies, and might find it advantageous to observe frequent queries.

212	   The simplest way to set up such proxying is to have the proxies
213	   forward all requests to the central RD and to advertise only the
214	   proxies' addresses.

216	   Due to the discovery process of the RD, operators can also limit the
217	   proxies to the lookup interfaces and advertise the central server for
218	   registration purposes.  A sample exchange between a node and its
219	   6LoWPAN border router could be:

221	Req: GET coap://[fe80::1]/.well-known/core?rt=core.rd*

223	Res: 2.05 Content
224	<coap://central-rd.example.com/rd>;rt="core.rd",
225	<coap://europe3.proxy.rd.example.com/rd-lookup/res>;rt="core.rd-lookup-res",
226	<coap://europe3.proxy.rd.example.com/rd-lookup/ep>;rt="core.rd-lookup-ep"

228	   Special care should be taken when a reverse proxy is not accessed by
229	   the client under the same address as the origin server, as relative
230	   references change their meaning when served from there.  This can be
231	   ignored completely on the resource lookup interface (as long as the
232	   provenance extension is not used); ignoring it on the endpoint lookup
233	   interface gives the client "wrong" results, though that is likely to
234	   only matter to applications that use both the lookup and the
235	   registration interface, as Commissioning Tools could do.  Proxies
236	   can be configured to do content transcoding (cf.  [RFC8075]
237	   Section 6.5.2) to preserve the lookup responses' original meanings.

239	   This approach does not help at all with large numbers of
240	   registrations.  It can mitigate issues with large numbers of lookup
241	   requests, provided that many identical requests arrive at the proxy.
242	   The effect on the redundancy goal is negligible: The proxy can
243	   provide lookup results only for as long as the cache is fresh during
244	   a central server outage, which is 60 seconds unless the RD server
245	   says otherwise.

247	   This approach can be run with off-the-shelf RD servers and proxies.
248	   The only configuration required is for the proxy to have a forwarding
249	   address, and for the RD (or its announcer) to know which lookup
250	   addresses to advertise.

252	4.3.  RD-aware caching

254	   Similar to the above, specialized proxies can be employed that are
255	   aware that their target is an RD lookup address.

257	   The "plain caching" approach is limited in that it requires a small
258	   set of lookups to be frequently performed.  A proxy that is aware
259	   that the address it is forwarding to is of the Resource Type
260	   "core.rd-lookup-*" can utilize knowledge of how an RD works to serve
261	   more specialized requests as well from fresh generic content.

263	   For example, assume that the proxy frequently receives requests of
264	   the shape

266	Req: GET /rd-lookup/res?rt=core.s&rt=ex.temperature&ex.building=8341&title=X

268	   for arbitrary values of X.  Then it can use the following request to
269	   keep a fresh cache:

271	Req: GET coap://rd.example.com/rd-lookup/res?rt=core.s&rt=ex.temperature
272	    &ex.building=8341
273	Observe: 1

275	   and from that serve filtered responses to individual requests.
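
	   The filtering step can be sketched as follows.  This is a minimal
	   illustration, not a full RD filter: links are modelled as
	   (target, attributes) pairs, only exact attribute matches are
	   handled, and the addresses and titles are made up for the example.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, Dict[str, str]]  # (target URI, link attributes)


def filter_cached(cached: List[Link], **criteria: str) -> List[Link]:
    """Return the cached links whose attributes equal all given criteria."""
    return [
        link for link in cached
        if all(link[1].get(name) == value for name, value in criteria.items())
    ]


# Hypothetical content kept fresh by the observation on the generic query.
cache: List[Link] = [
    ("coap://[2001:db8:23::1]/sensors/temp",
     {"rt": "ex.temperature", "title": "Lobby"}),
    ("coap://[2001:db8:23::2]/sensors/temp",
     {"rt": "ex.temperature", "title": "Roof"}),
]

# A request with title=Roof is answered from the cache, without asking the RD.
roof_links = filter_cached(cache, title="Roof")
```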

277	   This method shares the advantages of plain caching, with reduced
278	   limitations but requiring specialized proxying software.  The
279	   software does not necessarily need more configuration: A general-
280	   purpose proxy is free to explore the origin server's ".well-known/
281	   core" information, and can decide to enable RD optimizations after
282	   discovering that the frequently accessed resources are of resource
283	   type "core.rd-lookup-*".

285	4.3.1.  Potential for improvement

287	   Observing a large lookup result is relatively inefficient as the
288	   complete document needs to be transferred when a change happens.
289	   Serializations of web links that are suitable for expressing small
290	   deltas are expected to be developed for PATCH operations on
291	   registration resources.  If those formats are compatible with
292	   observation, they can be applied directly.  Otherwise, the proxy can
293	   try to establish a "push" dynamic link ([I-D.ietf-core-dynlink]) to
294	   receive continuous PATCH updates on its resource.

296	   The applicability of the RD-aware approach is further limited to
297	   query parameters of which the proxy knows that they are not subject
298	   to lookup filtering on other entities than the queried one.  In the
299	   example above, were the variable part the "d" attribute (of
300	   endpoints, as opposed to the "title" of resources), the proxy could
301	   not do the filtering on its own because it would not have the required
302	   information.  Even the above example does not allow for fully
303	   accurate replication, as the endpoint _might_ register with a "title"
304	   endpoint attribute, even though no such attribute is specified right
305	   now.  Also, annotating the links in the endpoint lookup with
306	   information about which registration they belong to would help the
307	   proxy keep all the data around to solve more complex queries.  The
308	   provenance extension is proposed for that purpose.

310	   In its extreme form, the proxy can observe the complete lookup
311	   resources of the Resource Directory.  It can then answer all queries
312	   on its own based on the continuously fresh state transferred in the
313	   observations.  That form requires the RD to support the provenance
314	   extension.

316	   For such proxies, it can be suitable to configure them to use stale
317	   cache values for extended periods of time when the RD becomes
318	   intermittently unavailable.

320	4.4.  Distinct registration points

322	   Caching proxies that are aware of RD semantics could be extended to
323	   gather information from more than one Resource Directory.

325	   When executing queries, they would consider candidates from all
326	   configured upstream servers and report the union of the respective
327	   query results.  At this stage, it is highly recommended that content
328	   transcoding takes place.
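
	   A minimal sketch of building such a union, including a simple form
	   of content transcoding, is given below.  It assumes that upstream
	   results are already parsed into (target, attributes) pairs, that
	   relative targets are path-absolute references, and that two links
	   with the same absolute target are duplicates; all names are
	   hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, Dict[str, str]]  # (target URI or reference, attributes)


def authority_base(uri: str) -> str:
    """Return scheme and authority of a URI, e.g. 'coap://[2001:db8::1]'."""
    scheme, rest = uri.split("://", 1)
    return scheme + "://" + rest.split("/", 1)[0]


def transcode(link: Link, upstream_lookup_uri: str) -> Link:
    """Make a path-absolute target absolute against the upstream it came from."""
    target, attrs = link
    if target.startswith("/"):
        target = authority_base(upstream_lookup_uri) + target
    return (target, attrs)


def union_of_results(results: Dict[str, List[Link]]) -> List[Link]:
    """Merge per-upstream results (keyed by lookup URI), dropping duplicates."""
    merged: List[Link] = []
    seen = set()
    for upstream, links in results.items():
        for link in links:
            absolute = transcode(link, upstream)
            if absolute[0] not in seen:
                seen.add(absolute[0])
                merged.append(absolute)
    return merged
```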

330	   With this approach, many distinct registration URIs can be
331	   advertised, for example due to geographic proximity.

333	   Unlike the other proxying approaches, this helps with the "large
334	   number of registrations" goal.  If that number is unmanageable for
335	   single devices, proxies need not keep full copies of all the RDs'
336	   states but rather send out queries to all of their upstreams,
337	   behaving more like the "plain caching" proxies.  This multiplies the
338	   lookup traffic, but allows for huge numbers of registrations.  The
339	   problems of "too many lookups" versus "too many registrations" can be
340	   traded off against each other if the proxies keep parts of the RDs'
341	   states locally at hand while forwarding more exotic requests to all
342	   RDs.

344	4.4.1.  Redundancy and handover

346	   This approach also tackles the redundancy goal.  When an endpoint
347	   registers at its RD, the RD updates its endpoint and resource lookup
348	   results and includes the registration data until further notice (for
349	   correct operation, the "Lifetime Age" extension is useful).

351	   If at some point in time that RD server becomes unavailable, the
352	   proxies can keep the cached information around.  Before the lifetime
353	   expires, the endpoint will attempt to renew its registration and find
354	   that the RD is unavailable.  It will then go through discovery again,
355	   find the most recently advertised registration URI or pick another
356	   one out of a set and start a new registration there.

358	   If the lookup proxies do not evict the old (and soon-to-time-out)
359	   registration when the new one on a different RD with the same
360	   endpoint name and domain arrives, at worst there will be the same
361	   information twice from two registration resources available for
362	   lookup.
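
	   A lookup proxy that does evict the old registration could key its
	   table on endpoint name and domain.  The following is a sketch under
	   that assumption; the class and method names are illustrative and
	   not part of any specification.

```python
from typing import Dict, Optional, Tuple


class EndpointTable:
    """Keeps at most one registration per (endpoint name, domain) pair."""

    def __init__(self) -> None:
        self._by_name: Dict[Tuple[str, Optional[str]], str] = {}

    def add(self, reg_uri: str, ep: str, d: Optional[str] = None) -> Optional[str]:
        """Store a registration; return the URI of the one it evicted, if any."""
        previous = self._by_name.get((ep, d))
        self._by_name[(ep, d)] = reg_uri
        return previous if previous != reg_uri else None

    def lookup(self, ep: str, d: Optional[str] = None) -> Optional[str]:
        """Return the registration URI currently known for that name."""
        return self._by_name.get((ep, d))
```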

364	4.4.2.  Loops between RDs and proxies

366	   In this configuration, it can be tempting to run a Resource Directory
367	   and a lookup proxy (aimed at multiple resource directories) on the
368	   same host.

370	   [ It might be easier to recommend simply using different hosts, at
371	   least host names, in those cases, or anything else that allows direct
372	   and not publicly advertised access to the real RDs' lookups. ]

374	   In such a setup, other aggregating lookup proxies must take care to
375	   only select locally registered entries.  With the current filtering
376	   rules, observing the resources "/rd-lookup/ep?href=/*" and "/rd-
377	   lookup/res?provenance=/*" crudely provides that kind of filtering.

379	5.  Proposed RD extensions

381	5.1.  Provenance

383	   In order for an RD-aware proxy to serve resource lookup requests that
384	   filter on endpoint parameters, the proxy needs a way to tell which
385	   endpoint registration submitted that link.  That information might
386	   also be useful for other purposes.

388	   This introduces a new link attribute "provenance".  Its value is a
389	   URI reference as described by [RFC3986] Section 4.1.  The URI is to
390	   be interpreted by the same rules that apply to the "anchor"
391	   attribute, namely by resolving the reference relative to the
392	   requested document's URI.  The attribute should not be repeated, and
393	   in presence of multiple attributes, only the last should be
394	   considered.
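
	   The two rules above (resolution against the requested document's
	   URI, and the last attribute winning) can be sketched as follows.
	   The sketch assumes the attributes of a link are already parsed into
	   (name, value) pairs and handles only full URIs and path-absolute
	   references; resolution is done by hand so that it does not depend
	   on how a URI library treats the "coap" scheme.

```python
from typing import List, Optional, Tuple


def resolve(reference: str, document_uri: str) -> str:
    """Resolve a reference against the URI of the requested document."""
    if "://" in reference:
        return reference  # already a full URI
    if reference.startswith("/"):
        scheme, rest = document_uri.split("://", 1)
        return scheme + "://" + rest.split("/", 1)[0] + reference
    raise ValueError("only full URIs and path-absolute references are handled")


def provenance_of(attrs: List[Tuple[str, str]], document_uri: str) -> Optional[str]:
    """Return the resolved provenance of a link; the last attribute wins."""
    values = [value for name, value in attrs if name == "provenance"]
    return resolve(values[-1], document_uri) if values else None
```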

396	   [ TODO: If something link-format-ish comes up during the
397	   development of this document which allows setting base-hrefs in-line,
398	   evaluate whether it really makes sense to inherit anchor's rules or
399	   whether it's better to phrase it in a way that the requested base URI
400	   always counts. ]

402	   The URI given in the "provenance" attribute describes where the
403	   information in the link was obtained from.  An aggregator of links
404	   can thus declare its sources for each link.

406	   It is recommended that a Resource Directory adds the URI of the
407	   registration resource to resource lookups.  Thus, if an endpoint
408	   registers as

410	   Req: POST /rd?ep=node1
411	   Payload:
412	   </sensors/temp>;if="core.s"

414	   Res: 2.01 Created
415	   Location: /reg/1234

417	   then a lookup will add a provenance attribute:

419	   Req: GET /rd-lookup/res?if=core.s

421	   Res: 2.05 Content
422	   Payload:
423	   <coap://.../sensors/temp>;if="core.s";anchor="coap://...";
424	       provenance="/reg/1234"

426	   This is not an IANA consideration as there is no established registry
427	   of link attributes.

429	   By itself, the provenance attribute does not need to be registered in
430	   the RD Parameters Registry because it is just another link attribute.
431	   If it is desired that provenance information is only shown on request
432	   (eg. by RD-aware proxies), a parameter can be introduced there:

434	   o  Full name: Link provenance

436	   o  short: provenance

438	   o  Validity: URI

440	   o  Use: Resource lookup only

442	   o  Description: If "provenance" or any string starting with
443	      "provenance=" is given as one of the ampersand-delimited query
444	      arguments, the RD is instructed to add the provenance attribute to
445	      all looked up links; otherwise, the RD will not present them.  The
446	      filtering rules still apply: If there is a "=" sign in the query
447	      argument, only links with matching provenance will be reported.
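
	   How an RD might interpret that parameter is sketched below.  This
	   is an illustration only: percent-decoding is omitted, and value
	   matching is assumed to use the trailing-"*" prefix form seen in the
	   "/*" examples of this document.

```python
from typing import Optional, Tuple


def provenance_request(query: str) -> Tuple[bool, Optional[str]]:
    """Return (show_provenance, filter_value) for a lookup query string."""
    show, wanted = False, None
    for arg in query.split("&"):
        if arg == "provenance":
            show = True
        elif arg.startswith("provenance="):
            show, wanted = True, arg[len("provenance="):]
    return show, wanted


def matches(wanted: Optional[str], actual: str) -> bool:
    """Apply the filter value to a link's provenance as stored by the RD."""
    if wanted is None:
        return True  # attribute requested, but no filtering
    if wanted.endswith("*"):
        return actual.startswith(wanted[:-1])
    return actual == wanted
```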

449	5.2.  Lifetime Age

451	   The result of an endpoint lookup as a whole has inhomogeneous cache
452	   properties that would determine its Max-Age:

454	   o  The document can change at any time when a new endpoint registers.

456	   o  The document can change at any time when an endpoint deregisters.

458	   o  Each record can be expected to not change until its lifetime has
459	      expired.

461	   As currently specified, a lookup client has no way to tell where in
462	   its lifetime an endpoint is.  Therefore, a new link attribute is
463	   suggested that allows the RD to share that information:

465	   The new link attribute Lifetime Age (lt-age) is described for use in
466	   RD Endpoint Lookups.  Valid values are integers from 0 to the
467	   lifetime of the registration.  The value indicates how many seconds
468	   have passed since the endpoint last renewed its registration.

470	   Care has to be taken when replicating this value in caches, as the
471	   caching agent might be unaware of the attribute's semantics and not
472	   update it.  (This is unlike the Max-Age option, which a caching
473	   agent needs to understand and reduce accordingly when serving from
474	   the cache).  It should therefore only be used with responses that
475	   carry the default Max-Age of 60 or less.

477	   Clients that use the lookup interface (especially RD-aware proxies)
478	   are free to treat that record and its corresponding resource records
479	   as fresh until after the difference of lt and lt-age seconds have
480	   passed since the endpoint lookup result was obtained, especially if
481	   the origin server has become unavailable.
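
	   The freshness rule above amounts to the following arithmetic,
	   sketched here with hypothetical function names.  Times are in
	   seconds, and "seconds_since_lookup" is measured by the client from
	   the moment it obtained the endpoint lookup result.

```python
def remaining_freshness(lt: int, lt_age: int, seconds_since_lookup: float) -> float:
    """Seconds for which the record may still be treated as fresh."""
    return max(0.0, (lt - lt_age) - seconds_since_lookup)


def is_fresh(lt: int, lt_age: int, seconds_since_lookup: float) -> bool:
    """True while the difference of lt and lt-age has not yet passed."""
    return remaining_freshness(lt, lt_age, seconds_since_lookup) > 0
```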

483	   Security considerations: Given that this leaks information about the
484	   endpoint's communication patterns, it may be prudent for an RD only
485	   to reveal this information on a need-to-know basis.

487	6.  Example scenarios

489	6.1.  Redundant and replicated resource lookup (anycast)

491	   This scenario describes a setup where millions of devices register in
492	   a company-wide Resource Directory.

494	   The directory is scaled using the shared authority / anycast
495	   approach, and the RD implementation is backed by a NoSQL-style
496	   distributed database.

498	           /'''''''\______/'''''\__/''''''''\
499	        /-                                   -\
500	        |,           NoSQL database           |
501	          \,,,                           ,~''
502	              \_____/'''\__________/''''   \
503	           /             |                  \
504	     /''''''\        /''''''\                 /''''''\
505	     | RD-A |        | RD-B |                 | RD-C |
506	     \______/        \______/                 \______/
507	    /  |  | \        / | | | \                  | | |
508	   E   E  C  E       E E E E C                  C C C

510	   ("E" and "C" represent endpoints and lookup clients, respectively)

512	   Both endpoints and lookup clients receive the RD address
513	   "2001:db8::an1:ca57", which is announced to all devices on the
514	   network using the RDAO option in IPv6 Neighbor Discovery.  Packets to
515	   that address are routed by the network to the closest of the three RD
516	   instances A, B and C.  Discovery invariably looks like this:

518	   Req: GET coap://[2001:db8::an1:ca57]/.well-known/core?rt=core.rd*

520	   Res: 2.05 Content
521	   </rd>;rt="core.rd",
522	   </rd-lookup/res>;rt="core.rd-lookup-res",
523	   </rd-lookup/ep>;rt="core.rd-lookup-ep"

525	   An endpoint close to B would therefore register with

527	   Req: POST coap://[2001:db8::an1:ca57]/rd?ep=endpoint1&
528	       d=facility23.eu.example.com
529	   Payload:
530	   </sensors/temp>;if="core.s"

532	   Res: 2.01 Created
533	   Location: /reg/123e4567-e89b-12d3-a456-426655440000

535	   Any client could immediately see that the endpoint is registered by
536	   issuing

538	   Req: GET coap://[2001:db8::an1:ca57]/rd-lookup/ep?ep=endpoint1&
539	       d=facility23.eu.example.com

541	   Res: 2.05 Content
542	   Payload:
543	   </reg/123e4567-e89b-12d3-a456-426655440000>;ep="endpoint1";
544	       d="facility23.eu.example.com";con="coap://[2001:db8:23::1]"

546	   If at any point in time the RD instance B becomes unavailable, the
547	   registering endpoint's renewal requests will be routed to the next
548	   available instance, for example A.  That instance can update the
549	   shared database with renewed lifetime just as B would have done.

551	   How this performs under a net split depends on the database backend.
552	   Registration resources based on UUIDs were chosen in this example
553	   because those would allow the system to keep accepting new
554	   registrations even in a netsplit situation; the risk of the
555	   registration request not being idempotent towards a node that
556	   switches sides during such a split is considered acceptable.
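
	   Allocating such registration resources without coordination
	   between instances could look like the sketch below.  The use of
	   random (version 4) UUIDs and the "/reg/" path prefix are
	   assumptions made for the illustration.

```python
import uuid


def new_registration_location() -> str:
    """Pick a registration path that needs no coordination between instances."""
    return "/reg/" + str(uuid.uuid4())


# Two instances on different sides of a net split can both keep allocating.
location_a = new_registration_location()
location_b = new_registration_location()
```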

558	6.2.  Redundant and replicated resource lookup (distinct registration
559	      points)

561	   This scenario takes place in the same environment as the previous
562	   one.

564	   Rather than a shared database, distinct registration points are
565	   advertised.  The advertised registration points are called RD-A to
566	   RD-C; independent of them are lookup proxies LP-X to LP-Z.  Some of
567	   them run on the same hosts.

569	           /'''''''\______/'''''\__/''''''''\
570	        /-                                   -\
571	        |,                                    |
572	          \,,,                           ,~''
573	              \_____/'''\__________/''''   \
574	               |               |            \
575	     /''''''\  |     /''''''\  |  /''''''\   |  /''''''\
576	     | RD-A |--+     | RD-B |--+--| RD-C |   +--| LP-Z |
577	     | LP-X |  |     | LP-Y |  |  |      |   |  |      |
578	     \_____1/  |     \_____2/  |  \____3/    |  \_____4/
579	               |               |             |
580	         +--+--+            +--+--+          +--+
581	         E  E  C            E  E  E          C  C

583	   The lookup proxies in this scenario are constantly observing the
584	   "/rd-lookup/ep?href=/*" and "/rd-lookup/res?provenance=/*" resources
585	   of known RDs on other hosts, and might get updated internally with
586	   state from a co-hosted RD or observe that using an internal
587	   interface.  As there is no really suitable content format and
588	   observation mechanism for those yet, the exchanges are partially
589	   described in words here.

591	   RDAO announcements point to the nearest host (whose IP address ends
592	   with the numbers of the respective box in the figure), and hosts that
593	   do not serve both functions provide lookup as follows:

595	   Req: GET coap://[2001:db8:23::3]/.well-known/core?rt=core.rd*

597	   Res: 2.05 Content
598	   Payload:
599	   </rd>;rt="core.rd",
600	   <coap://[2001:db8:23::2]/rd-lookup/ep>;rt="core.rd-lookup-ep",
601	   <coap://[2001:db8:23::2]/rd-lookup/res>;rt="core.rd-lookup-res"

603	   When a client then registers as

605	   Req: POST coap://[2001:db8:23::3]/rd?ep=endpoint1&
606	       d=facility23.eu.example.com
607	   Payload:
608	   </sensors/temp>;if="core.s"

610	   Res: 2.01 Created
611	   Location: /reg/42
612	   the RD at 3 sends notifications to the observing lookup proxies X, Y
613	   and Z:

615	 Res: Patch Result
616	 Add one record: </reg/42>;ep="endpoint1";d="facility23.eu.example.com";
617	     con="coap://[2001:db8:23::1]";lt-age=0

619	   As soon as that is processed, clients can query LP-Z

621	   Req: GET coap://[2001:db8:4::4]/rd-lookup/ep?ep=endpoint1&
622	       d=facility23.eu.example.com

624	   Res: 2.05 Content
625	   Payload:
626	   <coap://[2001:db8:23::3]/reg/42>;ep="endpoint1";
627	       d="facility23.eu.example.com";con="coap://[2001:db8:23::1]"

629	   (Note that lt-age is elided to the client as per the security
630	   considerations for that information).

632	   When a net split happens that cuts LP-Z's site off the rest, it keeps
633	   that information available until the registration lifetime runs out.

635	   When RD-C unexpectedly becomes unavailable, endpoint1 fails to renew
636	   its registration.  It then starts the RD discovery process again,
637	   picks the next available RD (this time B) and gets a new registration
638	   from that.

640	   RD-B then sends an update to the proxies:

642	 Res: Patch Result
643	 Add one record: </reg/11>;ep="endpoint1";d="facility23.eu.example.com";
644	     con="coap://[2001:db8:23::1]";lt-age=0

646	   The proxies remove C's registration "/reg/42" based on the duplicate
647	   name and now answer requests like this:

649	   Req: GET /rd-lookup/ep?ep=endpoint1&d=facility23.eu.example.com

651	   Res: 2.05 Content
652	   Payload:
653	   <coap://[2001:db8:23::2]/reg/11>;ep="endpoint1";
654	       d="facility23.eu.example.com";con="coap://[2001:db8:23::1]"

656	   Req: GET /rd-lookup/res?if=core.s&d=facility23.eu.example.com

658	   Res: 2.05 Content
659	   Payload:
660	   <coap://[2001:db8:23::1]/sensors/temp>;if="core.s";
661	       anchor="coap://[2001:db8:23::1]/sensors/temp";
662	       provenance="coap://[2001:db8:23::2]/reg/11",
663	   ...
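
   The replacement of C's registration by B's can be sketched as a
   table keyed by the registration's name.  This is a minimal
   illustration only: the class and method names are hypothetical, and
   it assumes that "duplicate name" means an identical pair of "ep" and
   "d" values.

```python
# Hypothetical in-memory state of a lookup proxy; not defined by this
# document.  Assumption: a registration's name is its (ep, d) pair.
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    href: str  # absolute registration URI on the RD that issued it
    ep: str    # endpoint name
    d: str     # domain
    con: str   # registration context of the endpoint


class LookupProxyState:
    """Keeps at most one registration per (ep, d) pair."""

    def __init__(self):
        self._by_name = {}

    def add_record(self, reg):
        """Store reg and return the registration it displaced, if any."""
        key = (reg.ep, reg.d)
        displaced = self._by_name.get(key)
        self._by_name[key] = reg
        return displaced

    def lookup_ep(self, ep, d):
        """Return the current registration for (ep, d), or None."""
        return self._by_name.get((ep, d))


state = LookupProxyState()
old = Registration("coap://[2001:db8:23::3]/reg/42", "endpoint1",
                   "facility23.eu.example.com",
                   "coap://[2001:db8:23::1]")
new = Registration("coap://[2001:db8:23::2]/reg/11", "endpoint1",
                   "facility23.eu.example.com",
                   "coap://[2001:db8:23::1]")
state.add_record(old)
displaced = state.add_record(new)  # the record from RD-C is displaced
```

   After the second call, an endpoint lookup for "endpoint1" yields
   only the registration held by RD-B.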

665	6.2.1.  Variation: Large number of registrations, localized queries

667	   If the lookup proxies are not capable of keeping track of all the
668	   registered data, they can opt to forward requests to all the RDs
669	   instead.  In this example, queries are often localized (queries
670	   within a building are often limited to the same building), so LP-Y
671	   could decide to only keep two particular observations active to each
672	   RD:

674	   o  "/rd-lookup/ep?href=/*&d=facility23.eu.example.com"

676	   o  "/rd-lookup/res?provenance=/*&d=facility23.eu.example.com"

678	   With those observed, it could still accurately respond to the above
679	   queries without calling out to the other RDs.

681	   If a query came in as "/rd-lookup/res?if=core.s", it would still need
682	   to forward that query to all RDs to build an overview of all sensors
683	   in the network for the requester.
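
   The decision LP-Y has to make for each incoming query could look
   roughly like the following sketch.  The function name and the set of
   observed domains are assumptions made for illustration; wildcard
   values in "d" are not treated specially and simply cause the query
   to be forwarded.

```python
from urllib.parse import parse_qs

# Assumption: the domains for which this lookup proxy keeps its
# observations active, as in the LP-Y example above.
OBSERVED_DOMAINS = frozenset({"facility23.eu.example.com"})


def can_answer_locally(query_string, observed=OBSERVED_DOMAINS):
    """Return True if the query is limited to one observed domain.

    Any query without a "d" parameter, or with a "d" value that is not
    observed, has to be forwarded to all RDs instead.
    """
    domains = parse_qs(query_string).get("d", [])
    return len(domains) == 1 and domains[0] in observed


local = can_answer_locally("if=core.s&d=facility23.eu.example.com")
forwarded = not can_answer_locally("if=core.s")
```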

685	6.2.2.  Variation: Combination with anycast

687	   In a variation of this, all the RDs and LPs can use a shared anycast
688	   address.  They would then be advertised as in the anycast/NoSQL
689	   example.

691	   All RDs would need to be configured such that they encode their host
692	   name in their path (e.g. "/reg/rd-c/42").  Nodes must then have proxy
693	   forwarding rules set up such that

695	   o  "/rd" is served from the local RD if there is one, or forwarded to
696	      any (the closest) RD

698	   o  "/reg/*" requests are served if hosted locally, otherwise
699	      forwarded to the appropriate RD, or answered with a 5.04
700	      Gateway Timeout if that RD is no longer available

702	   o  Lookup requests are served from the local lookup proxy, or
703	      forwarded to the closest one on RD-only hosts.

705	   Such a setup is easier if all hosts provide both registration and
706	   lookup functionality.
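
   The forwarding rules above can be sketched as a single routing
   function.  This is an illustration under assumptions: the function
   and its parameters are hypothetical, the "closest" RD or lookup
   proxy is represented by a placeholder string, and the 4.04 answer
   for unknown paths is an addition not discussed in this document.

```python
def route(path, local_host, has_rd, has_lookup, reachable_hosts):
    """Return an (action, target) pair for a request path.

    local_host is the name this node encodes in its registration
    paths; reachable_hosts is the set of RD names it can forward to.
    """
    if path == "/rd":
        if has_rd:
            return ("serve", local_host)
        return ("forward", "closest-rd")
    if path.startswith("/reg/"):
        # "/reg/rd-c/42" splits into ["", "reg", "rd-c", "42"]
        parts = path.split("/")
        if len(parts) < 4 or not parts[2]:
            return ("error", "4.04")  # assumption: malformed path
        owner = parts[2]
        if has_rd and owner == local_host:
            return ("serve", local_host)
        if owner in reachable_hosts:
            return ("forward", owner)
        return ("error", "5.04")  # owning RD is not available
    if path.startswith("/rd-lookup/"):
        if has_lookup:
            return ("serve", local_host)
        return ("forward", "closest-lp")
    return ("error", "4.04")  # assumption: unknown resource


served = route("/reg/rd-c/42", "rd-c", True, True, {"rd-b"})
relayed = route("/reg/rd-c/42", "rd-b", True, True, {"rd-c"})
timed_out = route("/reg/rd-c/42", "rd-b", True, True, set())
```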

708	6.3.  Anonymous global endpoint lookup

710	   This scenario describes a way to provide connectivity into devices in
711	   difficult network situations based on identifiers of their
712	   cryptographic keys, the KID context identifiers of OSCORE.  A global
713	   network of untrusted Resource Directory servers is built, and the
714	   individual servers provide network relaying services for endpoints
715	   that operate behind NAT or firewalls.

717	   It assumes the existence of two other hypothetical mechanisms:

719	   o  The RD Parameter named "proxy".

721	      An endpoint can ask the RD to act as a reverse proxy for it by
722	      adding the "proxy" registration parameter; an RD that does
723	      proxying disregards the implicit "con" parameter and announces a
724	      name of its own instead.

726	   o  A URI scheme called "oscore".

728	      A URI of the form "oscore://VGhh...2aWNl/sensor/temp" refers to a
729	      resource "/sensor/temp" on any OSCORE-capable host with which the
730	      client has a key established under the KID context given by the
731	      base64 string in the authority component.

733	      To resolve the URI to a concrete protocol and socket, a form of
734	      Resource Directory assisted protocol negotiation is used.

736	   RD servers join a global pool of servers using a protocol that is not
737	   further described here, but could conceivably be based on distributed
738	   hash tables (DHTs).

740	   Endpoints register only with a key-derived name, and usually do not
741	   provide any links because those would be accessible only to
742	   authenticated requesters.

744	   They register at any of a set of preconfigured DNS names for finding
745	   a Resource Directory.  Those names resolve to any of the currently
746	   active RD servers, where geographic proximity could play a role in
747	   the choice of address returned.

749	   When the endpoint discovers the registration URI (for which it uses
750	   coap+tcp to make later proxying more stable), the server returns
751	   links to its explicit IP address:

753	   <coap+tcp://[2001:db8:1:2::3]/rd>;rt="core.rd",
754	   <coap+tcp://[2001:db8:1:2::3]/rd-lookup/ep>;rt="core.rd-lookup-ep"

756	   (This avoids conflict when the DNS assignment flips and a different
757	   host (on which the registration resource is unknown) is returned.
758	   Alternatively, the servers could use a unified scheme of registration
759	   resource naming like "/reg/${name}" or a UUID-based scheme.)

761	   The endpoint then registers:

763	   Req: POST coap+tcp://[2001:db8:1:2::3]/rd?proxy&ep=VGhhdCdzIHRoZSB\
764	       LZXlJZENvbnRleHQgdXNlZCB3aXRoIHRoaXMgZGV2aWNl
765	   Payload: empty

767	   Res: 2.01 Created
768	   Location: /reg/123

770	   When a client wants to talk to that registered server, its RD
771	   discovery process will yield another instance, which it then queries:

773	   Req: GET coap://[2001:db8:4:5::6]/rd-lookup/ep?ep=VGhhdCdzIHRoZSBL\
774	       ZXlJZENvbnRleHQgdXNlZCB3aXRoIHRoaXMgZGV2aWNl

776	   The server will look up the given ep name in the backing DHT, and
777	   forward the request right to the (precisely: any) RD server that has
778	   announced that ep value, which then answers:

780	   Res: 2.05 Content
781	   Payload:
782	   <coap+tcp://[2001:db8:1:2::3]/reg/123>;ep="VGhh...2aWNl";
783	       con="coap://[2001:db8:1:2::3]:10123";
784	       at="coap+tcp://[2001:db8:1:2::3]:10123"

786	   (This particular server uses multiple ports to tell traffic for
787	   different endpoints apart; it could just as well use a catch-all DNS
788	   record, do name based virtual hosting and announce
789	   "con="coap://reg123.server3.example.com"" instead.)

791	   The client will then use the discovered address to direct its OSCORE
792	   requests to, and the RD server will proxy for it.
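
   On the client side, the resolution steps might be sketched as
   follows.  Both helper functions are hypothetical, as is the "oscore"
   scheme itself; the sketch assumes that the authority component is
   used unchanged as the "ep" value, and that the client appends the
   resource path to an address taken from the lookup result.

```python
from urllib.parse import quote, urlsplit


def lookup_query_for(oscore_uri):
    """Split an "oscore" URI into an RD lookup query and a path."""
    parts = urlsplit(oscore_uri)
    if parts.scheme != "oscore":
        raise ValueError("not an oscore URI: " + oscore_uri)
    # netloc keeps the case of the base64 string intact
    query = "/rd-lookup/ep?ep=" + quote(parts.netloc, safe="")
    return query, parts.path


def request_target(base, path):
    """Combine an address from the lookup result with the path."""
    return base.rstrip("/") + path


# ep value as used in the registration example above
EP = ("VGhhdCdzIHRoZSBLZXlJZENvbnRleHQgdXNl"
      "ZCB3aXRoIHRoaXMgZGV2aWNl")

query, path = lookup_query_for("oscore://" + EP + "/sensor/temp")
target = request_target("coap+tcp://[2001:db8:1:2::3]:10123", path)
```

   The resulting target is the proxying RD server's address followed by
   "/sensor/temp"; the OSCORE protection itself is outside the scope of
   this sketch.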

794	   Note that while this setup _can_ serve as a generic RD and answer
795	   resource requests as well, it is doubtful whether there would be any
796	   interest in that, given that the data becomes public and that every
797	   request needs an "ep=" filter lest the network be flooded with
798	   requests.

800	7.  References

802	7.1.  Informative References

804	   [I-D.ietf-core-dynlink]
805	              Shelby, Z., Koster, M., Groves, C., Zhu, J., and B.
806	              Silverajan, "Dynamic Resource Linking for Constrained
807	              RESTful Environments", draft-ietf-core-dynlink-04 (work in
808	              progress), September 2017.

810	   [I-D.ietf-core-resource-directory]
811	              Shelby, Z., Koster, M., Bormann, C., Stok, P., and C.
812	              Amsuess, "CoRE Resource Directory", draft-ietf-core-
813	              resource-directory-13 (work in progress), March 2018.

815	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
816	              Resource Identifier (URI): Generic Syntax", STD 66,
817	              RFC 3986, DOI 10.17487/RFC3986, January 2005,
818	              <https://www.rfc-editor.org/info/rfc3986>.

820	   [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained
821	              Application Protocol (CoAP)", RFC 7252,
822	              DOI 10.17487/RFC7252, June 2014,
823	              <https://www.rfc-editor.org/info/rfc7252>.

825	   [RFC8075]  Castellani, A., Loreto, S., Rahman, A., Fossati, T., and
826	              E. Dijk, "Guidelines for Mapping Implementations: HTTP to
827	              the Constrained Application Protocol (CoAP)", RFC 8075,
828	              DOI 10.17487/RFC8075, February 2017,
829	              <https://www.rfc-editor.org/info/rfc8075>.

831	7.2.  URIs

833	   [1] https://github.com/chrysn/resource-directory-replication

835	Author's Address
836	   Christian Amsuess
837	   Hollandstr. 12/4
838	   1020
839	   Austria

841	   Phone: +43-664-9790639
842	   Email: christian@amsuess.com