2	INTERNET-DRAFT                                          Joann J. Ordille
3	draft-ietf-ids-inp-02.txt                 Bell Labs, Lucent Technologies
4	Expires in six months

6	                      Internet Nomenclator Project
7	                  Filename: draft-ietf-ids-inp-02.txt

9	Status of this Memo

11	      This document is an Internet-Draft.  Internet-Drafts are working
12	      documents of the Internet Engineering Task Force (IETF), its
13	      areas, and its working groups.  Note that other groups may also
14	      distribute working documents as Internet-Drafts.

16	      Internet-Drafts are draft documents valid for a maximum of six
17	      months and may be updated, replaced, or obsoleted by other
18	      documents at any time.  It is inappropriate to use Internet-
19	      Drafts as reference material or to cite them other than as ``work
20	      in progress.''

22	      To learn the current status of any Internet-Draft, please check
23	      the ``1id-abstracts.txt'' listing contained in the Internet-
24	      Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
25	      (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
26	      Coast), or ftp.isi.edu (US West Coast).

28	Abstract

30	    The goal of the Internet Nomenclator Project is to integrate the
31	    hundreds of publicly available CCSO servers from around the world.
32	    Each CCSO server has a database schema that is tailored to the needs
33	    of the organization that owns it.  The project is integrating the
34	    different database schemas into one query service.  The Internet
35	    Nomenclator Project will provide fast cross-server searches for
36	    locating people on the Internet.  It augments existing CCSO services
37	    by supplying schema integration, more extensive indexing, and two
38	    kinds of caching -- all this in a system that scales as the number
39	    of CCSO servers grows.  A key advantage of the system is that
40	    administrators can incorporate their CCSO servers into Nomenclator
41	    without changing the servers.  All Nomenclator needs is basic
42	    information about the server.

44	    This document provides an overview of the Nomenclator system,
45	    describes how to register a CCSO server in the Internet Nomenclator
46	    Project, and explains how to use the Nomenclator search engine to
47	    find people on the Internet.

49	    Distribution of this document is unlimited.  Comments should be sent
50	    to the author.

52	1.  Introduction

54	    Hundreds of organizations provide directory information through the
55	    CCSO name service protocol [3]. Although the organizations provide a
56	    wealth of information about people, finding any one person can be
57	    difficult because each organization's server is independent.  The
58	    different servers have different database schemas (attribute names
59	    and data formats).  The 300+ CCSO servers have more than 900
60	    different attributes to describe information about people. Very few
61	    common attributes exist.  Only name and email occur in more than 90%
62	    of the servers [4].  No special support exists for cross-server
63	    searches, so searching can be slow and expensive.

65	    The goal of the Internet Nomenclator Project is to provide fast,
66	    integrated access to the information in the CCSO servers.  The
67	    project is the first large-scale use of the Nomenclator system.
68	    Nomenclator is a more general system than a white pages directory
69	    service.  It is a scalable, extensible information system for the
70	    Internet.

72	    Nomenclator answers descriptive (i.e. relational) queries.  Users
73	    can locate information about people, organizations, hosts, services,
74	    publications, and other objects by describing their attributes.
75	    Nomenclator achieves fast descriptive query processing through an
76	    active catalog, and extensive meta-data and data caching.  The
77	    active catalog constrains the search space for a query by returning
78	    a list of data repositories where the answer to the query is likely
79	    to be found.  Meta-data and data caching keep frequently used query
80	    processing resources close to the user, thus reducing communication
81	    and processing costs.

83	    Through the Internet Nomenclator Project, users can query any CCSO
84	    server, regardless of its attribute names or data formats, by
85	    specifying the query to Nomenclator (see Figure 1).  Nomenclator
86	    provides a world view of the data in the different servers.  Users
87	    express their queries in this world view.  Nomenclator returns the
88	    answer immediately if it has been cached by a previous query. If
89	    not, Nomenclator uses its active catalog to constrain the query to
90	    the subset of relevant CCSO servers.  The speed of the query is
91	    increased, because only relevant servers are contacted. Nomenclator
92	    translates the global query into local queries for each relevant
93	    CCSO server.  It then translates the responses into the format of
94	    the world view.

96	    --------------------------------------------------------------------

98	                      +-------------+             +-------------+
99	                      |             |             |             |
100	          World View  |             | Local View  |             |
101	          Query       |             | Query       |  Relevant   |
102	          ----------->|             |------------>|             |
103	                      | Nomenclator |             |  CCSO       |
104	                      |             |             |             |
105	          <-----------|             |<------------|  Server     |
106	           World View |             |  Local View |             |
107	           Response   |             |  Response   |             |
108	                      +-------------+             +-------------+

110	                       Figure 1:  A Nomenclator Query

112	                   Nomenclator translates queries to and from
113	                   the language of the relevant CCSO servers.

115	    --------------------------------------------------------------------

117	    The Internet Nomenclator Project makes it easier for users to find a
118	    particular CCSO server, but it does not send all queries to that
119	    server.  When Nomenclator constrains the search for a query answer,
120	    it keeps irrelevant queries from ever reaching the server.
121	    When Nomenclator finds an answer in its cache, it keeps
122	    redundant queries from reaching the server.  The server becomes
123	    easier to find and use without experiencing the high loads caused by
124	    exhaustive and redundant searches.

126	    The Internet Nomenclator Project creates the foundation for a much
127	    broader heterogeneous directory service for the Internet.  The
128	    current version of Nomenclator provides integrated access to CCSO
129	    and relational database services. The Nomenclator System
130	    Architecture supports fast, integrated searches of any collection of
131	    heterogeneous directories.  The Internet Nomenclator Project can be
132	    enhanced to support additional name services, or provide integrated
133	    query services for other application domains. The project is
134	    starting with CCSO services, because the CCSO services are widely
135	    available and successful.

137	    Section 2 describes the Nomenclator system in more detail.  Section
138	    3 explains how to register a CCSO server as part of the project.
139	    Section 4 briefly describes how to use Nomenclator.  Section 5
140	    provides a summary.

142	2.  Nomenclator System

144	    Nomenclator is a scalable, extensible information system for the
145	    Internet. It supports descriptive (i.e. relational) queries.  Users
146	    locate information about people, organizations, hosts, services,
147	    publications, and other objects by describing their attributes.
148	    Nomenclator achieves fast descriptive query processing through an
149	    active catalog, and extensive meta-data and data caching.

151	    The active catalog constrains the search space for a query by
152	    returning a list of data repositories where the answer to the query
153	    is likely to be found.  Components of the catalog are distributed
154	    indices that isolate queries to parts of the network, and smart
155	    algorithms for limiting the search space by using semantic,
156	    syntactic, or structural constraints.  Meta-data caching improves
157	    performance by keeping frequently used characterizations of the
158	    search space close to the user, thus reducing active catalog
159	    communication and processing costs.  When searching for query
160	    responses, these techniques improve query performance by contacting
161	    only the data repositories likely to have actual responses,
162	    resulting in acceptable search times.

164	    Administrators make their data available in Nomenclator by supplying
165	    information about the location, format, contents, and protocols of
166	    their data repositories.  Experience with Nomenclator shows that
167	    gathering a small amount of information from data owners can have a
168	    substantial positive impact on the ability of users to retrieve
169	    information.  For example, each CCSO administrator provides a
170	    mapping from the local view of data (i.e. the local schema) at the
171	    CCSO server to Nomenclator's world view.  The administrator also
172	    supplies possible values for any attributes with small domains at
173	    the data repository (such as the 'city' or 'state_or_province'
174	    attributes).  With this information, Nomenclator can isolate queries
175	    to a small percentage of the CCSO data repositories, and provide an
176	    integrated view of their data.  Nomenclator provides tools that
177	    minimize the effort that administrators expend in characterizing
178	    their data repositories.  Nomenclator does not require
179	    administrators to change the format of their data or the access
180	    protocol for their database.

182	2.1 Components of a Nomenclator System

184	    A Nomenclator system consists of a distributed catalog service
185	    and a query resolver (see Figure 2).  The distributed catalog
186	    service gathers meta-data about data repositories and makes it
187	    available to the query resolver. Meta-data includes constraints on
188	    attribute values at a data repository, known patterns of data
189	    distribution across several data repositories, search and navigation
190	    techniques, schema and protocol translation techniques, and the
191	    differing schemas at data repositories.

193	    --------------------------------------------------------------------

195	                      +-------------+             +-------------+
196	                      |             |             |             |
197	          World View  |             |  Meta Data  |             |
198	          Query       |             |  Request    | Distributed |
199	          ----------->|   Query     | ----------->|             |
200	                      |   Resolver  |             |  Catalog    |
201	                      |             |             |             |
202	          <-----------|   (caches)  | <-----------|  Service    |
203	           World View |             |  Meta Data  |             |
204	           Response   |             |  Response   |             |
205	                      +-------------+             +-------------+

207	                    Figure 2: Components of a Nomenclator System

209	    --------------------------------------------------------------------

211	    Query resolvers at the user sites retrieve, use, cache, and re-use
212	    this meta-data in answering user queries.  The catalog is "active"
213	    in two ways. First, some meta-data moves from the distributed
214	    catalog service to each query resolver during query processing.
215	    Second, the query resolver uses the initial meta-data, in particular
216	    the search and navigation techniques, to generate additional meta-
217	    data that guides query processing.  Typically, one resolver process
218	    serves a few hundred users in an organization, so users can benefit
219	    from larger resolver caches.

221	    Query resolvers cache techniques for constraining the search space
222	    and the results of previously constrained searches (meta-data), and
223	    past query answers (data) to speed future query processing.  Meta-
224	    data and data caching tailor the query resolver to the specific
225	    needs of the users at the query site.  They also increase the scale
226	    of a Nomenclator system by reducing the load from repeated searches
227	    or queries on the distributed catalog service, data repositories,
228	    and communications network.

230	    The distributed catalog service is logically one network service,
231	    but it can be divided into pieces that are distributed and/or
232	    replicated.  Query resolvers access this distributed, replicated
233	    service using the same techniques that work for multiple data
234	    repositories.

236	    A Nomenclator system naturally includes many query resolvers.
237	    Resolvers are independent, but renewable, query agents that can be
238	    as powerful as the resources available at the user site.  Caching
239	    decreases the dependence of the resolver on the distributed catalog
240	    service for frequently used meta-data, and on data repositories for
241	    frequently used data.  Caching thus increases the number of users
242	    that can be supported and improves the local availability of the
243	    query service.

245	2.2 Meta-Data Techniques

247	    The active catalog structures the information space into a
248	    collection of relations about people, hosts, organizations, services
249	    and other objects. It collects meta-data for each relation and
250	    structures it into "access functions" for locating and retrieving
251	    data.  Access functions respond to the question: "Where is data to
252	    answer this query?"  There are two types of responses corresponding
253	    to the two types of access functions.  The first type of response
254	    is: "Look over there." "Catalog functions" return this response;
255	    they constrain the query search by limiting the data repositories
256	    contacted to those having data relevant to the query. Catalog
257	    functions return a referral to data access functions that will
258	    answer the query or to additional catalog functions to contact for
259	    more detailed information.  The second response to "Where?" is:
260	    "Here it is!" "Data access functions" return this response; they
261	    understand how to obtain query answers from specific data
262	    repositories.  They return tuples that answer the query.
263	    Nomenclator supplies access functions for common name services, such
264	    as the CCSO service, and organizations can write and supply access
265	    functions for data in their repositories.

267	    Access functions are implemented as remote or local services.
268	    Remote access functions are services that are available through a
269	    standard remote procedure call interface.  Local access functions
270	    are functions that are supplied with the query resolver.  Local
271	    access functions can be applied to a variety of indexing and data
272	    retrieval tasks by loading them with meta-data stored in the distributed
273	    catalog service.  Remote access functions are preferred over local
274	    ones when the resources of the query resolver are inadequate to
275	    support the access function.  The owners of data may also choose to
276	    supply remote access functions for privacy reasons if their access
277	    functions use proprietary information or algorithms.  Local
278	    functions are preferred whenever possible, because they are highly
279	    replicated in resolver caches.  They can reduce system and network
280	    load by bringing the resources of the active catalog directly to the
281	    users.

283	    Remote access functions are simple to add to Nomenclator and local
284	    access functions are simple to apply to new data repositories,
285	    because the active catalog provides "referrals" that describe the
286	    conditions for using access functions.  For simplicity, this
287	    document describes referral techniques for exact matching of query
288	    strings.  Extensions to these techniques in Nomenclator support
289	    matching query strings that contain wildcards or word-based matching
290	    of query strings in the style of the CCSO services.

292	    Each referral contains a template and a list of references to access
293	    functions.  The template is a conjunctive selection predicate that
294	    describes the scope of the access functions.  Conjunctive queries
295	    that are within the scope of the template can be answered with the
296	    referral.  When a template contains a wildcard value ("*") for an
297	    attribute, the attribute must be present in any queries that are
298	    processed by the referral.  The system applies the following rule:

300	      Query Coverage Rule:

302	      If the set of tuples satisfying the selection predicate in a query
303	      is covered by (is a subset of) the set of tuples satisfying the
304	      template, then the query can be answered by the access functions
305	      in the reference list of the referral.

307	    For example, the query below:

309	      select * from People where country = "US" and surname = "Ordille";

311	    is covered by the templates in Lines (1) through (3), but
312	    not by the templates in Lines (4) and (5):

314	      (1) country = "US" and surname = "*"

316	      (2) country = "US" and surname = "Ordille"

318	      (3) country = "US"

320	      (4) organization = "*"

322	      (5) country = "US" and surname = "Elliott"
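
    The Query Coverage Rule can be illustrated with a minimal Python
    sketch.  It assumes the exact-match reading used in this section and
    represents templates and queries as plain dictionaries; the function
    name "covers" and this representation are illustrative assumptions,
    not part of Nomenclator itself.

```python
WILDCARD = "*"

def covers(template, query):
    """Return True if the template covers the exact-match conjunctive query.

    Both arguments map attribute names to values.  Every attribute named in
    the template must appear in the query; a wildcard accepts any value,
    while a literal value must match exactly.
    """
    return all(
        attr in query and (value == WILDCARD or query[attr] == value)
        for attr, value in template.items()
    )

# The example query and the five templates from the text.
query = {"country": "US", "surname": "Ordille"}
templates = [
    {"country": "US", "surname": "*"},        # (1)
    {"country": "US", "surname": "Ordille"},  # (2)
    {"country": "US"},                        # (3)
    {"organization": "*"},                    # (4)
    {"country": "US", "surname": "Elliott"},  # (5)
]
results = [covers(t, query) for t in templates]
```

    Under these assumptions, the first three entries of "results" are
    true and the last two are false, matching Lines (1) through (5).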

324	    The referrals for a relation form a generalization/specialization
325	    graph called a "referral graph."  Referral graphs are a conceptual tool
326	    that guides the integration of different catalog functions into our
327	    system and that supplies a basis for catalog function construction
328	    and query processing.  A "referral graph" is a partial ordering of
329	    the referrals for a relation.  It is constructed using the
330	    subset/superset relationship: "S is a subset of G."  A referral S is
331	    a subset of referral G if the set of queries covered by the template
332	    of S is a subset of the set of queries covered by the template of G.
333	    S is considered a more specific referral than G; G is considered a
334	    more general referral than S.  For example, the subset relationship
335	    exists between the pairs of referrals with the templates listed
336	    below:

338	      (1) country = "US" and surname = "Ordille"
339	          is a subset of
340	          country = "US"

342	      (2) country = "US" and surname = "Ordille"
343	          is a subset of
344	          country = "US" and surname = "*"

346	      (3) country = "US" and surname = "*"
347	          is a subset of
348	          country = "US"

350	      (4) country = "US"
351	          is a subset of
352	          "empty template"

354	    but it does not exist between the pairs of referrals with the
355	    following templates:

357	      (5) country = "US"
358	          is not a subset of
359	          department = "CS"

361	      (6) country = "US" and name = "Ordille"
362	          is not a subset of
363	          country = "US" and name = "Elliott"

365	    In Lines (1) and (2), the more general referral covers more queries,
366	    because it covers queries that list different values for surname.
367	    In Line (3), the more general referral covers more queries, because
368	    it covers queries that do not constrain surname to a value.  In Line
369	    (4), the specific referral covers only those queries that constrain
370	    the country to "US" while the empty template covers all queries.
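
    The subset relationship between templates can be sketched in the
    same style.  The sketch below assumes exact-match templates held as
    dictionaries; "is_subset" is a hypothetical helper name, and the
    test it applies is one way to decide the relationship under that
    assumption rather than the algorithm Nomenclator necessarily uses.

```python
WILDCARD = "*"

def is_subset(s, g):
    """Return True if every query covered by template s is covered by g.

    Each attribute constrained by g must also be constrained by s, and s
    must be at least as tight: g holds a wildcard, or both hold the same
    literal value.
    """
    return all(
        attr in s and (value == WILDCARD or s[attr] == value)
        for attr, value in g.items()
    )

us = {"country": "US"}
us_any_surname = {"country": "US", "surname": "*"}
us_ordille = {"country": "US", "surname": "Ordille"}
empty_template = {}

pairs_that_hold = [
    (us_ordille, us),              # (1)
    (us_ordille, us_any_surname),  # (2)
    (us_any_surname, us),          # (3)
    (us, empty_template),          # (4)
]
pairs_that_fail = [
    (us, {"department": "CS"}),                    # (5)
    ({"country": "US", "name": "Ordille"},
     {"country": "US", "name": "Elliott"}),        # (6)
]
```

    Because the relationship is a partial ordering, some pairs are not
    comparable in either direction, as in Lines (5) and (6).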

372	    During query processing, wildcards in a template are replaced with
373	    the value of the corresponding attribute in the query.  For any
374	    query covered by two referrals S and G such that S is a subset of G,
375	    the set of tuples satisfying the template in S is covered by the set
376	    of tuples satisfying the template in G.  S is used to process the
377	    query, because it provides the more constrained (and faster) search
378	    space.  The referral S has a more constrained logical search space
379	    than G, because the set of tuples in the scope of S is no larger,
380	    and often smaller, than the set in the scope of G. Moreover, S has a
381	    more constrained physical search space than G, because the data
382	    repositories that must be contacted for answers to S must also be
383	    contacted for answers to G, but additional data repositories may
384	    need to be contacted to answer G.
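
    As a rough illustration, wildcard replacement and the preference
    for the more specific referral might look as follows in Python.  The
    helper names "instantiate" and "most_specific" and the dictionary
    representation of templates are assumptions made for this sketch.

```python
WILDCARD = "*"

def instantiate(template, query):
    """Replace each wildcard with the query's value for that attribute."""
    return {
        attr: (query[attr] if value == WILDCARD else value)
        for attr, value in template.items()
    }

def is_subset(s, g):
    """Return True if template s is at least as specific as template g."""
    return all(
        attr in s and (value == WILDCARD or s[attr] == value)
        for attr, value in g.items()
    )

def most_specific(templates):
    """Drop every template for which a strictly more specific one exists."""
    return [
        t for t in templates
        if not any(other != t and is_subset(other, t) for other in templates)
    ]

query = {"country": "US", "surname": "Ordille"}
covering = [{"country": "US"}, {"country": "US", "surname": "*"}]

chosen = most_specific(covering)
search_predicate = instantiate(chosen[0], query)
```

    Here the template with the surname wildcard is chosen over the one
    that constrains only the country, and its wildcard is then filled in
    from the query.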

386	    In constraining a query, a catalog function always produces a
387	    referral that is more specific than the referral containing the
388	    catalog function.  Wildcards ("*") in a template indicate which
389	    attribute values are used by the associated catalog function to
390	    generate a more specific referral.  In other words, catalog
391	    functions always follow the rule:

393	      Catalog Function Constrained Search Rule:

395	      Given a referral R with a template t and a catalog function cf,
396	      and a query q covered by t, the result of using cf to process q,
397	      cf(q), is a referral R' with template t' such that q is covered
398	      by t' and R' is more specific than R.

400	    Catalog functions make it possible to import a portion of the
401	    indices for the information space into the query resolver.  Since
402	    they generate referrals, the resolver can cache the most useful
403	    referrals for a relation and call the catalog function as needed to
404	    generate new referrals.
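
    A catalog function that obeys this rule can be sketched as below.
    The Referral class, the surname index, and the repository host names
    are all hypothetical and serve only to show the shape of cf(q); a
    real catalog function would draw on meta-data from the distributed
    catalog service.

```python
from dataclasses import dataclass

WILDCARD = "*"

@dataclass(frozen=True)
class Referral:
    template: dict     # attribute -> literal value or "*"
    references: tuple  # names of access functions or data repositories

def covers(template, query):
    """Return True if the template covers the exact-match query."""
    return all(
        attr in query and (value == WILDCARD or query[attr] == value)
        for attr, value in template.items()
    )

def is_subset(s, g):
    """Return True if template s is at least as specific as template g."""
    return all(
        attr in s and (value == WILDCARD or s[attr] == value)
        for attr, value in g.items()
    )

# Hypothetical index: which repositories hold entries for which surnames.
SURNAME_INDEX = {
    "Ordille": ("ccso.cs.example.edu",),
    "Elliott": ("ccso.example.org", "ccso.ee.example.edu"),
}

def surname_catalog_function(referral, query):
    """Sketch of cf(q): specialize a referral using the query's surname."""
    if not covers(referral.template, query):
        raise ValueError("query is outside the scope of the referral")
    specific = {
        attr: (query[attr] if value == WILDCARD else value)
        for attr, value in referral.template.items()
    }
    return Referral(specific, SURNAME_INDEX.get(query["surname"], ()))

general = Referral({"country": "US", "surname": "*"},
                   ("surname_catalog_function",))
query = {"country": "US", "surname": "Ordille"}
narrowed = surname_catalog_function(general, query)
```

    The returned referral still covers the query, and its template is a
    subset of the template of the referral that held the catalog
    function, so the resolver can cache it for later queries.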

406	    The resolver query processing algorithm obtains an initial set of
407	    referrals from the distributed catalog service.  It then navigates
408	    the referral graph, calling catalog functions as necessary to obtain
409	    additional referrals that narrow the search space. Sometimes, two
410	    referrals that cover the query have the relationship of general to
411	    specific to each other.  The resolver eliminates unnecessary access
412	    function processing by using only the most specific referral along
413	    each path of the referral graph.

415	    The search space for the query is initially set to all the data
416	    repositories in the relation.  As the resolver obtains referrals to
417	    sets of relevant data repositories (and their associated data access
418	    functions) it forms the intersection of the referrals to constrain
419	    the search space further.  The intersection of the referrals
420	    includes only those data repositories listed in all the referrals.
421	    Intersection combines independent paths through the referral graph
422	    to derive benefit from indices on different attributes.
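
    The intersection step amounts to ordinary set intersection.  The
    sketch below assumes each referral has already been reduced to the
    set of data repositories it names; the function name and the host
    names are illustrative only.

```python
def constrain_search_space(all_repositories, referrals):
    """Intersect the repository sets named by each referral.

    The search space starts as every repository in the relation and
    shrinks as each referral is applied.
    """
    space = set(all_repositories)
    for repositories in referrals:
        space &= set(repositories)
    return space

all_repositories = {"a.example.edu", "b.example.edu",
                    "c.example.org", "d.example.org"}
by_country = {"a.example.edu", "b.example.edu", "c.example.org"}
by_surname = {"b.example.edu", "d.example.org"}

space = constrain_search_space(all_repositories, [by_country, by_surname])
```

    Only the repository listed by both referrals remains, which shows
    how indices on different attributes combine to narrow the search.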

424	2.3 Meta-Data and Data Caching

426	    A Nomenclator query resolver caches the meta-data that result from
427	    calling catalog functions.  It also caches the responses for
428	    queries.  If the predicate of a new query is covered by the
429	    predicate of a previous query, Nomenclator calculates the response
430	    for the new query from the cached response of the old query.
431	    Nomenclator timestamps its cache entries to measure how current
432	    query responses are and to support selective cache refresh.  The
433	    timestamps are used to calculate a t-bound on query responses
434	    [5][1].  A t-bound is the time after which changes may have occurred
435	    to the data that are not reflected in the query response. It is the
436	    time of the oldest cache entry used to calculate the response.
437	    Nomenclator returns a t-bound with each query response.  Users can
438	    request more current data by asking for responses that are more
439	    recent than this t-bound. Making such a request flushes older items
440	    from the cache if more recent items are available.  Query resolvers
441	    calculate a minimum t-bound that is some refresh interval earlier
442	    than the current time.  Resolvers keep themselves current by
443	    replacing items in the cache that are earlier than the minimum t-
444	    bound.
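
    The caching behavior described in this section can be approximated
    by the following sketch.  It assumes exact-match conjunctive queries,
    cached tuples that carry every attribute a later query may test, and
    timestamps in seconds; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CachedAnswer:
    predicate: dict   # exact-match conjunctive query, attribute -> value
    tuples: list      # cached response tuples, each a dict
    timestamp: float  # time the entry was obtained

def answer_from_cache(cache, query):
    """Answer a query from an older, more general cached query, if any."""
    for entry in cache:
        # The old predicate covers the new one when each of its
        # conditions also appears in the new query.
        if all(query.get(a) == v for a, v in entry.predicate.items()):
            rows = [t for t in entry.tuples
                    if all(t.get(a) == v for a, v in query.items())]
            return rows, entry.timestamp  # timestamp is the t-bound here
    return None

def t_bound(entries_used):
    """The t-bound is the time of the oldest cache entry used."""
    return min(e.timestamp for e in entries_used)

def stale_entries(cache, now, refresh_interval):
    """Entries older than the minimum t-bound are due for replacement."""
    minimum_t_bound = now - refresh_interval
    return [e for e in cache if e.timestamp < minimum_t_bound]

cache = [
    CachedAnswer({"country": "US"},
                 [{"country": "US", "surname": "Ordille"},
                  {"country": "US", "surname": "Elliott"}],
                 1000.0),
    CachedAnswer({"country": "FR"}, [], 400.0),
]
hit = answer_from_cache(cache, {"country": "US", "surname": "Ordille"})
miss = answer_from_cache(cache, {"surname": "Ordille"})
old = stale_entries(cache, now=1500.0, refresh_interval=600.0)
```

    A query that is narrower than a cached one is answered by filtering
    the cached tuples, and the response inherits the timestamp of the
    entry it was computed from.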

446	2.4 Scale and Performance

448	    Three performance studies of active catalog and meta-data caching
449	    techniques are available [5].  The first study shows that the active
450	    catalog and meta-data caching can constrain the search effectively
451	    in a real environment, the X.500 name space.  The second study
452	    examined the performance of an active catalog and meta-data caching
453	    for single users on a local area network.  The experiments showed
454	    that the techniques to eliminate data repositories from the search
455	    space can dramatically improve response time.  Response times
456	    improve, because latency is reduced.  The reduction of latency in
457	    communications and processing is critical to large-scale descriptive
458	    query optimization.  The experiments also showed that an active
459	    catalog is the most significant contributor to better response time
460	    in a system with low load, and that meta-data caching serves to
461	    reduce the load on the system.  The third study used an analytical
462	    model to evaluate the performance and scaling of these techniques
463	    for a large Internet environment.  It showed that meta-data caching
464	    plays an essential role in scaling the distributed catalog service
465	    to millions of users.  It also showed that constraining the search
466	    space with an active catalog contributes significantly to scaling
467	    data repositories to millions of users.  Replication and data
468	    caching also contribute to the scale of the system in a large
469	    Internet environment.

471	3.  Registering a CCSO Server

473	    The Internet Nomenclator Project supports the following home page:

475	      http://cm.bell-labs.com/cs/what/nomenclator

477	    The home page provides a variety of information and services.

479	    Administrators can register their CCSO servers through services on
480	    this home page.  The registration service collects CCSO server
481	    location information, contact information for the administrator of
482	    the CCSO server, implicit and explicit constraints on entries in the
483	    server's database, and a mapping from the local schema of the CCSO
484	    server to the schema of the world view.

486	    The implicit and explicit constraints on the server's database are
487	    the fuel for Nomenclator's catalog functions.  The registration
488	    center currently collects constraints on organization name,
489	    department, city, state or province name, country, phone number,
490	    postal code, and email address.  These constraints are automatically
491	    incorporated into Nomenclator's distributed catalog service.  They
492	    are used by catalog functions in query resolvers to constrain
493	    searches to relevant CCSO servers.  For example, suppose a database
494	    contains information only about the computer science and electrical
495	    engineering departments at a French university.  The department,
496	    organization and country attributes are constrained.  Nomenclator
497	    uses these constraints to prevent queries about other departments,
498	    organizations or countries from being sent to this CCSO server.
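
    The pruning step described above can be sketched as follows.  This
    is a minimal illustration, not Nomenclator's catalog code: the
    ServerConstraints structure, the attribute names, and the host
    names are all assumptions made for the example, and the sketch
    handles only exact query values (no wildcards).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ServerConstraints:
    """Registered constraints for one CCSO server (hypothetical structure).

    Each constrained attribute maps to the set of values the server's
    database can contain.  Attributes absent from the map are unconstrained.
    """
    name: str
    allowed: Dict[str, Set[str]] = field(default_factory=dict)


def is_relevant(server: ServerConstraints, query: Dict[str, str]) -> bool:
    """Return False only when a query value contradicts a constraint."""
    for attribute, value in query.items():
        permitted = server.allowed.get(attribute)
        if permitted is not None and value.lower() not in permitted:
            return False
    return True


def relevant_servers(servers: List[ServerConstraints],
                     query: Dict[str, str]) -> List[str]:
    """Keep only the servers whose constraints are compatible with the query."""
    return [s.name for s in servers if is_relevant(s, query)]


# Illustrative data: the French university example from the text, plus an
# unconstrained server that can never be ruled out.  Host names are made up.
french_univ = ServerConstraints(
    name="ph.univ-example.fr",
    allowed={
        "department": {"computer science", "electrical engineering"},
        "organization": {"example university"},
        "country": {"france"},
    },
)
unconstrained = ServerConstraints(name="ph.example.org")
SERVERS = [french_univ, unconstrained]
```

    With this data, a query for the physics department is sent only to
    the unconstrained server, while a query that names no constrained
    attribute is sent to both.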

500	    The mapping from the local schema of the CCSO server to the schema
501	    of the world view allows Nomenclator to translate queries and
502	    responses for the CCSO server.  The registration center currently
503	    collects this mapping by requesting an example of how to translate a
504	    typical entry in the CCSO server into the world view schema and,
505	    optionally, an example of how to translate a canonical entry in the
506	    world view schema into the local schema of the CCSO server [4].
507	    These examples are then used to generate a mapping program that is
508	    stored in the distributed catalog service.  The CCSO data access
509	    function in the query resolver interprets these programs to
510	    translate queries and responses communicated with that CCSO server.
511	    We plan to release the mapping language to CCSO server
512	    administrators, so administrators can write and maintain the mapping
513	    for their servers.  We have experimented with more than 20 mapping
514	    programs.  They are seldom more than 50 lines, and are often
515	    shorter.  It typically takes one or two lines to map an attribute.
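
    The mapping language itself is not described in this document, so
    the following is only a sketch of the idea in Python, with roughly
    one line per mapped attribute.  The local field names, the world
    view attribute names, and the rule that unmapped attributes are
    dropped are assumptions made for illustration.

```python
# Hypothetical mapping for one CCSO server:
# local field name -> world view attribute name.
LOCAL_TO_WORLD = {
    "name": "name",
    "email": "email_address",
    "office_phone": "phone_number",
    "dept": "department",
}

# The reverse direction is derived so the two cannot drift apart.
WORLD_TO_LOCAL = {world: local for local, world in LOCAL_TO_WORLD.items()}


def to_world_view(local_entry):
    """Translate a response entry from the local schema to the world view.

    Local fields with no mapping are dropped (an assumption of this sketch).
    """
    return {
        LOCAL_TO_WORLD[field]: value
        for field, value in local_entry.items()
        if field in LOCAL_TO_WORLD
    }


def to_local_query(world_query):
    """Translate a world view query into the server's local schema.

    Attributes with no local counterpart are dropped (an assumption of
    this sketch).
    """
    return {
        WORLD_TO_LOCAL[attribute]: value
        for attribute, value in world_query.items()
        if attribute in WORLD_TO_LOCAL
    }
```

    For example, to_local_query({"department": "Computer Science"})
    yields {"dept": "Computer Science"} under this mapping.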

517	4.  Using Nomenclator

519	    The Internet Nomenclator Project currently provides a centralized
520	    query service on the Internet.  The project runs a Nomenclator query
521	    resolver that is accessible through its Web page (see the URL in
522	    Section 3) and the Simple Nomenclator Query Protocol (SNQP) [2].

524	    The service answers queries that are a conjunction of string values
525	    for attributes.  A variety of matching techniques are supported
526	    including exact string matching, matching with wildcards, and word-
527	    based matching in the style of the CCSO service.  Our web interface
528	    uses SNQP [2].  Programmers can create their own interfaces by
529	    using this protocol to communicate with the Nomenclator query
530	    resolver.  They will require the host name and port number for the
531	    query resolver, which they can obtain from the Nomenclator home
532	    page.
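
    To make the query model concrete, the sketch below evaluates a
    conjunctive query against a single entry using the three matching
    styles named above.  It does not implement SNQP or the resolver.
    The function names are hypothetical, wildcard matching borrows
    Python's shell-style patterns (which also treat "?" and "[...]"
    specially), and the word-based rule (every query word must appear
    as a word in the value) is an assumed approximation of CCSO
    behavior.

```python
from fnmatch import fnmatchcase


def exact_match(pattern, value):
    """Case-insensitive comparison of the whole string."""
    return pattern.lower() == value.lower()


def wildcard_match(pattern, value):
    """Shell-style wildcards: '*' matches any run of characters."""
    return fnmatchcase(value.lower(), pattern.lower())


def word_match(pattern, value):
    """True when every word of the pattern appears as a word in the value."""
    return set(pattern.lower().split()) <= set(value.lower().split())


MATCHERS = {"exact": exact_match, "wildcard": wildcard_match, "word": word_match}


def entry_matches(entry, query, mode="exact"):
    """A query is a conjunction: every attribute in it must match."""
    matcher = MATCHERS[mode]
    return all(
        attribute in entry and matcher(pattern, entry[attribute])
        for attribute, pattern in query.items()
    )
```

    An entry {"name": "Alice Dupont"} matches the word-based query
    {"name": "dupont"} but not the exact query {"name": "Dupont"}.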

534	    Subsequent phases of the project will provide enhanced services such
535	    as providing advice about the cost of queries and ways to constrain
536	    queries further to produce faster response times, and allowing users
537	    to request more current data.  We also plan to distribute query
538	    resolvers, so users can benefit from running query resolvers
539	    locally.  Local query resolvers reduce latency for the user, and
540	    distribute query processing load throughout the network.

542	5.  Summary

544	    The Internet Nomenclator Project augments existing CCSO services by
545	    supplying schema integration and fast cross-server searches.  The
546	    keys to speed in descriptive query processing are an active catalog
547	    and extensive meta-data and data caching.  The Nomenclator system is
548	    the result of research in distributed systems [5][6][7][4].  It can
549	    be extended to incorporate other name servers, besides the CCSO
550	    servers, and to address distributed search and retrieval challenges
551	    in other application domains.  In addition to providing a white
552	    pages service, the Internet Nomenclator Project will evaluate how an
553	    active catalog, meta-data caching and data caching perform in a
554	    very large global information system.  The ultimate goal of the
555	    project is to refine these techniques to provide the best possible
556	    global information systems.

558	6.  Security Considerations

560	    Security considerations are not discussed in this document.

562	7.  Acknowledgements

564	    Thanks to <<your name here!!>> for their comments on earlier drafts
565	    of this document.

567	8.  References

569	[1]         H. Garcia-Molina, G. Wiederhold. "Read-Only Transactions in
570	            a Distributed Database," ACM Transactions on Database
571	            Systems 7(2), pp. 209-234.  June 1982.

573	[2]         J. Elliott, J. Ordille. "The Simple Nomenclator Query
574	            Protocol (SNQP)," Internet Draft.
575	            <URL:ftp://ftp.internic.net/internet-drafts/draft-ietf-ids-
576	            snqp-02.txt>

578	[3]         R. Hedberg, S. Dorner, P. Pomes.  "The CCSO Nameserver (Ph)
579	            Architecture," Internet Draft.  December 1995.
580	            <URL:ftp://ftp.internic.net/internet-drafts/draft-ietf-ids-
581	            ph-00.txt>

583	[4]         A. Levy, J. Ordille. "An Experiment in Integrating Internet
584	            Information Sources," AAAI Fall Symposium on AI Applications
585	            in Knowledge Navigation and Retrieval, November 1995.
586	            <URL:http://cm.bell-labs.com/cm/cs/doc/95/11-01.ps.gz>

588	[5]         J. Ordille. "Descriptive Name Services for Large Internets,"
589	            Ph. D. Dissertation. University of Wisconsin. 1993.
590	            <URL:http://cm.bell-labs.com/cm/cs/doc/93/12-01.ps.gz>

592	[6]         J. Ordille, B. Miller. "Distributed Active Catalogs and
593	            Meta-Data Caching in Descriptive Name Services," Thirteenth
594	            International IEEE Conference on Distributed Computing
595	            Systems, pp. 120-129.  May 1993.
596	            <URL:http://cm.bell-labs.com/cm/cs/doc/93/5-01.ps.gz>

598	[7]         J. Ordille, B. Miller. "Nomenclator Descriptive Query
599	            Optimization in Large X.500 Environments," ACM SIGCOMM
600	            Symposium on Communications Architectures and Protocols,
601	            pp. 185-196, September 1991.
602	            <URL:http://cm.bell-labs.com/cm/cs/doc/91/9-01.ps.gz>

604	9.  Author's address:

606	    Joann J. Ordille
607	    Bell Labs, Lucent Technologies
608	    Computing Sciences Research Center
609	    700 Mountain Avenue, Rm 2C-301
610	    Murray Hill, NJ 07974  USA

612	    Email: joann@bell-labs.com

614	               This Internet Draft expires January 30, 1997.