Network Working Group                                 Roland Hedberg
Internet Draft                                      Bruce Greenblatt
<draft-ietf-find-cip-tagged-06.txt>
<draft-ietf-find-cip-tagged-07.txt>                       Ryan Moats
Expires in six months                                      Mark Wahl

     A Tagged Index Object for use in the Common Indexing Protocol

Status of this Memo

     This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
months.  Internet-Drafts may be updated, replaced, or made obsolete by
other documents at any time.  It is not appropriate to use  Internet-
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress".

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

     Distribution of this document is unlimited.

     Abstract

     This document defines a mechanism by which information servers can
exchange indices of information from their databases by making use of
the Common Indexing Protocol (CIP).  This document defines the structure
of the index information being exchanged, as well as the appropriate
meanings for the headers that are defined in the Common Indexing
Protocol.  It is assumed that the structures defined here can be used by
X.500 DSAs, LDAP servers, Whois++ servers, CSO Ph servers and many
others.

1. Introduction

     The Common Indexing Protocol (CIP) as defined in [1] proposes a
mechanism for distributing searches across several instances of a single
type of search engine with a view to creating a global directory.  CIP
provides a scalable, flexible scheme to tie individual databases into
distributed data warehouses that can scale gracefully with the growth of
the Internet.  CIP provides a mechanism for meeting these goals that is
independent of the access method that is used to access the actual data
that underlies the indices.  Separate from CIP is the definition of the
Index Object that is used to contain the information that is exchanged
among Index Servers.  One such Index Object that has already been
defined is the Centroid that is derived from the Whois++ protocol [2].

     The Centroid does not meet all of the requirements for the exchange
of index information amongst information servers.  For example, it does
not support the notion of incremental updates natively.  For information
servers that contain millions of records in their database, constant
exchange of complete dredges of the database is bandwidth intensive.
The Tagged Index Object is specifically designed to support the exchange
of index update information.  This design comes at the cost of an
increase in the size of the index object being exchanged.  The Centroid
is also not tailored to always be able to give boolean answers to
queries.  In the Centroid Model, "an index server will take a query in
standard Whois++ format, search its collections of centroids and other
forward information, determine which servers hold records which may fill
that query, and then notifies the user's client of the next servers to
contact to submit the query." [2] Thus, the exchange of Centroids
amongst index servers allows hints to be given about which information
server actually contains the information.  The Tagged Index Object
labels the various pieces of information with identifiers that tie the
individual object attributes back to an object as a whole.  This
"tagging" of information allows an index server to be more capable of
directing a specific query to the appropriate information server.
Again, this feature is added to the Tagged Index Object at the expense
of an increase in the size of the index object.

2. Background

     The Lightweight Directory Access Protocol (LDAP) is defined in [3],
and it defines a mechanism for accessing a collection of information
arranged hierarchically in such a way as to provide a globally
distributed database which is normally called the Directory Information
Tree (DIT).  Some distinguishing characteristics of LDAP servers are
that, normally, several servers cooperate to manage a common subtree
of the DIT.  LDAP servers are expected to respond to
requests that pertain to portions of the DIT for which they have data,
as well as for those portions for which they have no information in
their database. For example, the LDAP server for a portion of the DIT in
the United States (c=US) must be able to provide a response to a Search
operation that pertains to a portion of the DIT in Sweden (c=se).  Nor-
mally, the response given will be a referral to another LDAP server that
is expected to be more knowledgeable about the appropriate subtree.
However, there is no mechanism that currently enables these LDAP servers
to refer the LDAP client to the supposedly more knowledgeable server.
Typically, an LDAP (v3) server is configured with the name of exactly
one other LDAP server to which all LDAP clients are referred when their
requests fall outside the subtree of the DIT for which that LDAP server
has knowledge.  This specification defines a mechanism whereby LDAP
servers can exchange index information that will allow referrals to
point towards a clearly accurate destination.

     The X.500 series of recommendations defines the Directory
Information Shadowing Protocol (DISP) [4] which allows X.500 DSAs to
exchange actual information in the DIT.  Shadowing allows various
information from various portions of the DIT to be replicated amongst
participating DSAs.  The design point of DISP is optimized at the
exchange of entire portions of the DIT, whereas the design point of CIP
and the Tagged Index Object is optimized at the exchange of structural
index information about the DIT, and improving the performance of tree
navigation amongst various information servers.  The Tagged Index
Object is
more appropriate for the exchange of index information than is DISP.
DISP is more targeted at DIT distribution and fault tolerance.  DISP is
thus more appropriate for the exchange of the actual data in order to
spread the load amongst several information servers.  DISP is tailored
specifically to X.500 (and other hierarchical directory systems), while
the Tagged Index Object and CIP can be used in a wide variety of infor-
mation server environments.

     While DISP allows an individual directory server to collect
information about large parts of the DIT, it would require a huge
database to collect all of the replicas for a meaningful portion of the
DIT.  Furthermore, as X.525 states: "Before shadowing can occur, an
agreement, covering the conditions under which shadowing may occur is
required.  Although such agreements may be established in a variety of
ways, such as policy statements covering all DSAs within a given DMD
...", where a DMD is a Directory Management Domain.  This is due to the
case that the actual data in the DIT is being exchanged amongst DSAs
rather than only the information required to maintain an Index.  In
many environments such an agreement is not appropriate, and in order to
collect information for a meaningful portion of the DIT, a large number
of agreements may need to be arranged.

3. Object

     What is desired is to have an information server (or network of
information servers) that can quickly respond to real world requests,
like:

-    What is Tim Howes's email address?  This is much harder than:
     What is Tim Howes at Netscape's email address?

-    What is the X.509 certificate for Fred Smith at compuserve.com?
     One certainly doesn't want to search CompuServe's entire directory
     tree to find out this one piece of information.  I also don't want
     to have to shadow the entire CompuServe directory subtree onto my
     server.  If this request is being made because Fred is trying to
     log into my server, I'd certainly want to be able to respond to the
     BIND in real time.

-    Who are all of the people at Novell that have a title of
     programmer?

     All of these requests can reasonably be translated into LDAP,
Whois++, and other directory access protocol queries.  They can also be
serviced in a straightforward way by the user's home information
server if it has the appropriate reference information into the database
that contains the source data.  In this situation, the first server
would be able to "chain" the request on behalf of the user.
Alternatively, a precise referral could be returned.  If the home
information server wants to service (i.e., chain) the request based on
the index information that it has on hand, this servicing could be done
by any number of different means:

-    issuing LDAP operations to the remote directory server

-    issuing DSP operations to the remote directory server

-    issuing DAP operations to the remote directory server

-    issuing Whois++ operations to the remote Whois++ server

-     ...

4. The Tagged Index Object

     This section defines a Tagged Index Object that can be exchanged by
Information Servers using CIP.  While in many cases it is acceptable for
Information Servers to make use of the Centroid definition (from [2])
to exchange index information, the goals in defining a new construct
are multi-pronged:

-    When the Information Server receives a search request that warrants
     that a referral be returned, allow the server to return a referral
     that will point the client to a server that is most likely able to
     answer the request correctly.  False positive referrals (the search
     turns up hits in the index object that generate referrals to
     servers that don't hold the desired information) can be reduced,
     depending on the choice of attribute tokenization types that are
     used.

-    When the Information Server receives a search request that is not
     operating against local data, allow the Information Server itself
     to "chain" the request to the appropriate remote Information
     Server.  Note that LDAP itself does not define how chaining works,
     but X.500 does.  This seems very similar to the first "prong".

-    When a collection of Information Servers is operating against a
     large distributed directory, allow them to distribute index
     information amongst themselves (a la CIP) so that their own
     searches can be carried out with some degree of efficiency.

-    Potentially allow incremental updates that will then consume
     substantially less bandwidth than if full updates always had to be
     used.

4.1. The Agreement

     Before a Tagged Index Object can be exchanged, the organization
that administers the object supplier and the organization that
administers the object consumer must reach an agreement on how the
servers will communicate. This agreement contains the following:

-    "index-type": This specification describes the index type
     "x-tagged-index-1".

-    "dsi": An OID that uniquely identifies the subtree and scope.
     This field is not explicitly necessary, as it may not provide
     information beyond what is contained in the "base-uri" below.

-    "base-uri": One or more URIs that will form the base of any
     referrals created based on the index object that is governed by
     this agreement.  For example, in the LDAP URL format [8] the
     base-uri would specify (among other items): the LDAP host, the base
     object to which this index object refers (e.g. c=SE), and the scope
     of the index object (e.g. single container).

-    "supplier": The hostname and listening port number of the supplier
     server, as well as any alternative servers holding the same naming
     contexts, in case the supplier is unavailable.

-    "consumeraddr": This is a URI of the "mailto:" form, with the RFC
     822 email address of the consumer server.  Further versions of
     this draft may allow other forms of URI, so that the consumer may
     retrieve the update via the WWW, FTP or CIP.

-    "updateinterval": The maximum duration in seconds between
     occurrences of the supplier server generating an update.  If the
     consumer server has not received an update from the supplier server
     after waiting this long since the previous update, it is likely
     that the index information is now out of date.  A typical value for
     a server with frequent updates would be 604800 seconds, or every
     week.  Servers whose DITs are only modified annually could have a
     much longer update interval.

-    "attributeNamespace": Every set of index servers that together
     want to support a specific usage of indexes has to agree on which
     attribute names to use in the index objects. The participating
     directory servers also have to agree on the mapping from local
     attribute names to the attribute names used in the index. Since one
     specific index server might be involved in several such sets, it
     has to have some way to connect an update to the proper set of
     indexes. One possible solution to this would be to use different
     DSIs.

-    "consistencybase": How consistency of the index is maintained over
     incremental updates:

          "complete" - every change or delete concerning one object has
          to contain all tokens connected to that object. This method
          must be supported by any server that wants to comply with this
          standard.

          "tag" - starting at a full update, every incremental update
          referring back to this full update has to maintain state
          information regarding tags, such that an object within the
          original database is assigned the same tag number every time.
          This method is optional.

          "unique" - every object in the dataset has to have a unique
          value for a specific attribute in the index. An example of
          such an attribute could be the distinguishedName attribute.
          This method is also optional.

-    "securityoption": Whether and how the supplier server should sign
     and encrypt the update before sending it to the consumer server.
     Options for this version of the specification are:

          "none": the update is sent in plaintext

          "PGP/MIME": the update is digitally signed and encrypted using
          PGP [9]

          "S/MIME": the update is digitally signed and encrypted using
          S/MIME [10]

          "SSLv3": the update is digitally signed and encrypted using an
          SSLv3 connection [11]

          "Fortezza": the update is digitally signed and encrypted using
          Fortezza [5]

     It is recommended that the "PGP/MIME" option be used when
exchanging sensitive information across public networks, and both the
supplier and consumer have PGP keys. The "Fortezza" option is intended
for use in environments where security protocols are based on
Fortezza-compatible devices. The "S/MIME" option can be used when both
the supplier and consumer have RSA keys and can make use of the PKCS
protocols defined in the S/MIME specification. The "SSLv3" option can
be used when both the supplier and consumer have access to SSL
services, have server certificates, and can mutually authenticate each
other.  Should these be IANA registered things???

-    Security Credentials: The long-term cryptographic credentials used
     for key exchange and authentication of the consumer and supplier
     servers, if a security option was selected.  For "PGP/MIME", this
     will be the trusted public keys of both servers.  For "Fortezza",
     this will be the certificate paths of both servers to a common
     point of trust. For "S/MIME" and "SSLv3" these will be the
     certificates of the supplier and consumer.

     Note that if the index server maintains the information that would
appear in the agreement in a directory according to the definitions in
[7], then no real formal agreement between the two parties needs to be
put in place, and the information that is required for communication
between the two index servers is derived automatically from the
directory.
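
     As an illustration only, the agreement fields listed above could be
held in memory roughly as follows.  This is a minimal sketch under
stated assumptions: the class name, the method names and the "host:port"
form of the supplier value are hypothetical and are not defined by this
specification; the attributeNamespace and security credential items are
left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values taken from the agreement description above.
CONSISTENCY_BASES = {"complete", "tag", "unique"}
SECURITY_OPTIONS = {"none", "PGP/MIME", "S/MIME", "SSLv3", "Fortezza"}


@dataclass
class Agreement:
    """Hypothetical in-memory form of a supplier/consumer agreement."""
    supplier: str                       # assumed "host:port" notation
    consumeraddr: str                   # URI of the "mailto:" form
    base_uri: List[str] = field(default_factory=list)
    dsi: Optional[str] = None           # optional OID
    index_type: str = "x-tagged-index-1"
    updateinterval: int = 604800        # seconds; one week
    consistencybase: str = "complete"   # the only mandatory method
    securityoption: str = "none"

    def validate(self) -> None:
        """Raise ValueError if a field is outside the listed options."""
        if self.consistencybase not in CONSISTENCY_BASES:
            raise ValueError("unknown consistencybase")
        if self.securityoption not in SECURITY_OPTIONS:
            raise ValueError("unknown securityoption")
        if not self.consumeraddr.startswith("mailto:"):
            raise ValueError("consumeraddr must be a mailto: URI")
        if not self.base_uri:
            raise ValueError("at least one base-uri is required")

    def is_stale(self, last_update: int, now: int) -> bool:
        """True if more than updateinterval seconds have passed."""
        return now - last_update > self.updateinterval
```

     A consumer could call is_stale() with the time of the last update
it received to decide whether its copy of the index is likely out of
date.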

4.2. Content Type

     The update consists of a MIME object of type
application/cip-index-object.  The parameters are:

     "type": this has value "application/index.obj.tagged".

     "dsi": the DSI (if any) from the agreement.

     "base-uri": a set of URIs, separated by spaces. In each URI, the
     hostname/portno must be distinct, and based on the "supplier" part
     of the agreement.

     The payload is mostly textual data but may include bytes with the
high bit set.  The originating information server should set the con-
tent-transfer-encoding as appropriate for the information included in
the payload.

     This object may be encapsulated in a wrapper content (such as
multipart/signed) or be encrypted as part of the security procedures.
The resulting content can then be distributed, for example via
electronic mail.  For example:

     From: supplier@sup.com
     Date: Thu, 16 Jan 1997 13:50:37 -0500
     Message-Id: <199701161850.NAA29295@sup.com>
     To: consumer@consumer.com       <<-- from consumer server address
     Reply-to: supplier-admin@sup.com
     MIME-Version: 1.0
     Content-Type: application/index.obj.tagged;
       dsi=1.3.6.1.4.1.1466.85.85.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16;
       base-uri="ldap://sup.com/dc=sup,dc=com ldap://alt.com/dc=sup,dc=com"

     The payload is a series of CRLF-terminated lines. The payload only
includes characters from the printable US-ASCII subset of UTF-8.
Attribute values that occur outside of this subset are encoded as
defined below.  As more experience is gained with index objects and
UTF-8 data, a future version of this specification may allow for the
native transfer of UTF-8 data without requiring this special encoding.
No other character sets are permitted by this version of the
specification.  Some supplier servers may only be able to generate the
printable US-ASCII subset of UTF-8, but all consumer servers must be
able to handle the full range of Unicode characters when decoding the
attribute values (in the "attr-value" field in the BNF below).

4.3.  Tagged Index BNF

     The Tagged Index object has the following grammar, expressed in
modified BNF format:

index-object = 0*(io-part SEP) io-part
io-part      = header SEP schema-spec SEP index-info
header       = version-spec SEP update-type SEP this-update SEP
                last-update context-size name-space
version-spec = "version:" *SPACE "x-tagged-index-1"
update-type  = "updatetype:" *SPACE ( "total" |
               ( "incremental" [ *SPACE ( "tagbased" | "uniqueIDbased" ) ] ) )
this-update  = "thisupdate:" *SPACE TIMESTAMP
last-update  = [ "lastupdate:" *SPACE TIMESTAMP SEP ]
context-size = [ "contextsize:" *SPACE 1*DIGIT SEP ]
schema-spec  = "BEGIN IO-Schema" SEP 1*(schema-line SEP)
               "END IO-Schema"
schema-line  = attribute-name ":" token-type
token-type   = "FULL" | "TOKEN" | "RFC822" | "UUCP" | "DNS"
index-info   = full-index | incremental-index
full-index   = "BEGIN Index-Info" SEP 1*(index-block SEP)
               "END Index-Info"
incremental-index = 1*(add-block | delete-block | update-block)
add-block    = "BEGIN Add Block" SEP 1*(index-block SEP)
               "END Add Block"
delete-block = "BEGIN Delete Block" SEP 1*(index-block SEP)
               "END Delete Block"
update-block = "BEGIN Update Block" SEP
               0*(old-index-block SEP)
               1*(new-index-block SEP)
               "END Update Block"
old-index-block = "BEGIN Old" SEP 1*(index-block SEP)
               "END Old"
new-index-block = "BEGIN New" SEP 1*(index-block SEP)
               "END New"
index-block  = first-line 0*(SEP cont-line)
first-line   = attr-name ":" *SPACE taglist "/" attr-value
cont-line    = "-" taglist "/" attr-value
taglist      = tag 0*("," tag) | "*"
tag          = 1*DIGIT ["-" 1*DIGIT]
attr-value   = 1*(UTF8)
attr-name    = 1*(NAMECHAR)
UTF8         = ASCII | "%" HEX HEX
TIMESTAMP    = 1*DIGIT
ASCII        = DIGIT | UPPER | LOWER | OTHER
NAMECHAR     = DIGIT | UPPER | LOWER | "-" | ";" | "."
SPACE        = <ASCII space, hex 20>
SEP          = (CR LF) | LF
CR           = <ASCII CR, carriage return, hex 0D>
LF           = <ASCII LF, line feed, hex 0A>
HEX          = "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
DIGIT        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
               "8" | "9"

UPPER        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
               "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
               "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
               "Y" | "Z"
LOWER        = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
               "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
               "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
               "y" | "z"
OTHER        = "(" | ")" | "+" | "," | "-" | "." | "/" | ":" |
               "=" | "?" | "@" | ";" | "$" | "_" | "!" | "~" |
               "*" | "'" | "\" | """ | "#" | "&" | "<" | ">" |
               "[" | "]" | "^" | "`" | "{" | "|" | "}"

     Characters that are allowed to appear unescaped in attr-values are
the printable subset of (low) ASCII minus the "%" character, i.e. hex
21 through hex 7e inclusive with the exception of hex 25 (which is the
"%" character).  Any other octet in the UTF-8 encoding of a character
that appears in an attr-value must be escaped by using the "%"
character and two hex digits that encode the octet.  For example, the
UCS-2 sequence "A<NOT IDENTICAL TO><ALPHA>." (0041, 2262, 0391, 002E)
may be encoded in UTF-8 as follows:
   41 E2 89 A2 CE 91 2E

     If this character sequence appears in a Tagged Index Object
attr-value, then it is encoded as:
   41 25 65 32 25 38 39 25 61 32 25 63 65 25 39 31 2E

     When viewed as a character string the encoding appears as:
   "A%e2%89%a2%ce%91."
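
     The escaping rule described above can be sketched as follows.  This
is a minimal illustration, not a normative implementation; the function
names are hypothetical.  It assumes that every octet outside hex 21
through hex 7e, as well as "%" itself, is written as "%" followed by
two lowercase hex digits, which means that a space (hex 20) is escaped
too.

```python
def escape_attr_value(value: str) -> str:
    """Escape a Unicode string for use as an attr-value."""
    out = []
    for byte in value.encode("utf-8"):
        # Printable ASCII (hex 21..7e) except "%" (hex 25) passes through.
        if 0x21 <= byte <= 0x7E and byte != 0x25:
            out.append(chr(byte))
        else:
            out.append("%{:02x}".format(byte))
    return "".join(out)


def unescape_attr_value(escaped: str) -> str:
    """Reverse escape_attr_value; raises ValueError on malformed input."""
    raw = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            pair = escaped[i + 1:i + 3]
            if len(pair) != 2:
                raise ValueError("truncated escape sequence")
            # int() also accepts uppercase hex, which is more lenient
            # than the HEX rule in the grammar.
            raw.append(int(pair, 16))
            i += 3
        else:
            # Non-ASCII input characters are rejected here.
            raw.append(ord(escaped[i]))
            i += 1
    return raw.decode("utf-8")
```

     Applied to the example sequence above, escape_attr_value yields
the string "A%e2%89%a2%ce%91.", and unescape_attr_value restores the
original four characters.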

     The set of characters allowed to appear in the attr-name field is
limited to the set of characters used in LDAP and WHOIS++ attribute
names.  For other services that have attribute name character sets that
are larger than these, it is suggested that those services create a
profile that maps the names onto object identifiers, and that the
sequence of digits and periods be used by those services in creating
the attr-name fields for their Tagged Index Objects.

     It is worth mentioning that updates to a tagged index object MUST
be performed in the order specified by the tagged index object itself.
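
     To make the index-block production concrete, the following sketch
splits one block (a first-line followed by zero or more cont-lines)
into its attribute name and (taglist, value) pairs.  The function name
and the sample lines in the usage note are illustrative assumptions;
only the line layout comes from the grammar above.

```python
from typing import List, Tuple


def parse_index_block(lines: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse one index-block given as a list of lines without SEP."""
    if not lines:
        raise ValueError("an index-block needs a first-line")

    # first-line = attr-name ":" *SPACE taglist "/" attr-value
    attr_name, colon, rest = lines[0].partition(":")
    if not colon or not attr_name:
        raise ValueError("first-line must start with 'attr-name:'")
    entries = [_split_entry(rest.lstrip(" "))]

    # cont-line = "-" taglist "/" attr-value
    for line in lines[1:]:
        if not line.startswith("-"):
            raise ValueError("cont-line must start with '-'")
        entries.append(_split_entry(line[1:]))
    return attr_name, entries


def _split_entry(text: str) -> Tuple[str, str]:
    # A taglist never contains "/", so the first "/" ends the taglist;
    # any later "/" belongs to the attr-value.
    taglist, slash, value = text.partition("/")
    if not slash or not taglist or not value:
        raise ValueError("expected taglist '/' attr-value")
    return taglist, value
```

     For example, the hypothetical lines "cn: 1/Barbara", "-2/Bjorn"
and "-1-4/Jensen" parse to the name "cn" with the pairs ("1",
"Barbara"), ("2", "Bjorn") and ("1-4", "Jensen").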

4.3.1.  Header Descriptions

     The header section consists of one or more "header lines".  The
following header lines are defined:

     "version": This line must always be present, and have the value
     "x-tagged-index-1" for this version of the specification.

     "updatetype": This line must always be present.  It takes as the
     value either "total" or "incremental".  The first update sent by
     a supplier server to a consumer server for a DSI must be a "total"
     update.

     "thisupdate": This line must always be present. The value is the
     number of seconds from 00:00:00 UTC January 1, 1970 at which the
     supplier constructed this update.

     "lastupdate": This line must be present if the "updatetype" line
     has the value "incremental".  The value is the number of seconds
     from 00:00:00 UTC January 1, 1970 at which the supplier constructed
     the previous update sent to the consumer.  This field allows the
     consumer to determine if a previous update was missed.

     "contextsize": This line may be present at the supplier's option.
     The value is a number, which is the approximate total number of
     entries in the subtree.  This information is provided for statisti-
     cal purposes only.
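
     The presence rules for the header lines can be checked mechanically.
The sketch below is an assumption-laden illustration (the function name
and the dictionary result are hypothetical); it accepts the header lines
as a list of strings without line terminators and does not interpret
any line other than those described above.

```python
from typing import Dict, Iterable

REQUIRED = ("version", "updatetype", "thisupdate")
NUMERIC = ("thisupdate", "lastupdate", "contextsize")


def parse_header(lines: Iterable[str]) -> Dict[str, str]:
    """Return the header lines as a dict, enforcing the rules above."""
    header: Dict[str, str] = {}
    for line in lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("not a header line: %r" % line)
        header[name] = value.strip(" ")

    for name in REQUIRED:
        if name not in header:
            raise ValueError("missing header line: " + name)
    if header["version"] != "x-tagged-index-1":
        raise ValueError("unsupported version")

    # The first word selects the kind; an optional qualifier may follow.
    words = header["updatetype"].split(" ")
    if words[0] not in ("total", "incremental"):
        raise ValueError("updatetype must be total or incremental")
    if words[0] == "incremental" and "lastupdate" not in header:
        raise ValueError("incremental update requires lastupdate")

    for name in NUMERIC:
        if name in header:
            value = header[name]
            if not (value.isascii() and value.isdigit()):
                raise ValueError(name + " must be a decimal number")
    return header
```

     The timestamps are seconds since 00:00:00 UTC January 1, 1970, so
int(header["thisupdate"]) can be passed directly to the usual time
conversion routines.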

4.3.2.  Tokenization Types

     The Tagged Index Object inherits the "TOKEN" scheme for tokeniza-
tion as specified in [2].  In addition, there are several other tok-
enization schemes defined for the Tagged Index Object.  The following
table presents these schemes and what character(s) are used to delimit
tokens.

        Token Type      Tokenization Characters
        FULL            none
        TOKEN           white space, "@"
        RFC822          white space, ".", "@"
        UUCP            white space, "!"
        DNS             any character not a number, letter, or "-"
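
     The table can be read as a set of delimiter classes.  The
following sketch shows one way a supplier might split an attribute
value into tokens; it is illustrative only.  The function name is
hypothetical, "letter" is assumed to mean the ASCII letters A-Z and
a-z, and runs of adjacent delimiters are assumed to produce no empty
tokens.

```python
import re
from typing import List

# One regular expression per token type; FULL has no delimiters.
_DELIMITERS = {
    "TOKEN": r"[\s@]+",
    "RFC822": r"[\s.@]+",
    "UUCP": r"[\s!]+",
    "DNS": r"[^0-9A-Za-z-]+",
}


def tokenize(value: str, token_type: str) -> List[str]:
    """Split value into tokens according to the token type."""
    if token_type == "FULL":
        return [value]          # the whole value is a single token
    if token_type not in _DELIMITERS:
        raise ValueError("unknown token type: " + token_type)
    parts = re.split(_DELIMITERS[token_type], value)
    return [part for part in parts if part]  # drop empty strings
```

     With these assumptions, the value "Barbara J Jensen" under TOKEN
produces the three tokens "Barbara", "J" and "Jensen", while under FULL
it stays a single token.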

4.3.3.  Tag Conventions

     In the tag list, multiple consecutive tags may be shortened by
using "#-#".  For example, the list "3,4,5,6,7,8,9,10" may be shortened
to "3-10".  Tags are to be applied to the data on a per entry level.
Thus, if two index lines in the same index object contain the same tag,
then those two lines always refer back to the same "record" in the
directory.  In LDAP terminology, the two lines would refer back to the
same directory object.  Additionally, if two index lines in the same
index object contain different tags, then those two lines always refer
back to different records in the directory.  The meaning of '*' in the
tag position is that that specific token appears in every record in the
directory.

     The tag applied to the same underlying record in two separate
transmissions of a full-index may be different.  Thus, receiving index
servers should make no assumptions about the values of the tags across
index object boundaries.  If the receiving index server is
implemented in such a way that it maintains a structure similar to the
one that exists in the tagged index object with numbered tags attached
to various records, then these "internal" tags are distinct from the
tags that appear in the index object as created by the transmitting
index server.
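
     The "#-#" shorthand can be expanded and regenerated as sketched
below.  The function names are hypothetical, and returning None for "*"
is merely one possible way to represent "every record".

```python
from typing import Iterable, List, Optional


def expand_taglist(taglist: str) -> Optional[List[int]]:
    """Expand e.g. "1,3-5" to [1, 3, 4, 5]; "*" yields None."""
    if taglist == "*":
        return None
    tags = set()
    for part in taglist.split(","):
        if "-" in part:
            first, last = part.split("-")
            tags.update(range(int(first), int(last) + 1))
        else:
            tags.add(int(part))
    return sorted(tags)


def compress_tags(tags: Iterable[int]) -> str:
    """Collapse runs of consecutive tags into "first-last" ranges."""
    ordered = sorted(set(tags))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next tag is exactly one larger.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            parts.append(str(ordered[i]))
        else:
            parts.append("%d-%d" % (ordered[i], ordered[j]))
        i = j + 1
    return ",".join(parts)
```

     For the list used in the text, compress_tags([3, 4, 5, 6, 7, 8, 9,
10]) gives "3-10", and expand_taglist("3-10") gives the eight tags
back.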

4.4. Incremental Indexing

     The tagged index object format supports the ability of information
servers to distribute only delta index data, rather than distributing
total index information each time.  This scenario, known as incremental
indexing, supports three basic types of operations: add, delete and
replace.  If the incremental updatetype is specified in the tagged index
object, then the index object contains a snapshot of only the changes
that have been made since the index object specified in the lastupdate
header was distributed.  If the receiving index server did not receive
that index object, it should request a total index object.  If the CIP
protocol supports it, the index server may request the specific index
object that it missed.
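
     The check for a missed update can be sketched as follows.  This is
an assumption-based illustration: the function name and the three
result strings are hypothetical, and the consumer is assumed to store
the "thisupdate" value of the last index object it processed.

```python
from typing import Dict, Optional


def decide_action(header: Dict[str, str], last_seen: Optional[int]) -> str:
    """Decide what a consumer does with a received index object.

    header    -- header lines as a dict of strings
    last_seen -- "thisupdate" of the last object processed, or None
    """
    if header["updatetype"].startswith("total"):
        # A total update replaces whatever the consumer already holds.
        return "replace-index"
    # An incremental update only makes sense on top of the object named
    # by its lastupdate header.
    if last_seen is None or int(header["lastupdate"]) != last_seen:
        return "request-total"
    return "apply-changes"
```

     A consumer that receives an incremental update whose "lastupdate"
does not match its stored value would thus fall back to requesting a
total index object.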

     If the tagged index object contains an Add Block, then the lines in
the Add Block refer to new records that were added to the information
base of the transmitting index server.  It can be guaranteed that those
records did not exist in any previously received tagged index object,
and the receiving index server can insert this index information in the
index that it already maintains for the transmitting index server.

     If the receiving index server is maintaining internal tags, then a new
internal tag should be created for each tag in the Add Block.

     If the tagged index object contains a Delete Block, then the
structure of the Delete Block depends on how the consistency is
maintained:

- "completeRecord": all the tokens connected to the record that has
   been deleted have to be included.  The tag used to connect tokens in
   this message has no relation to tags used in previous tagged index
   objects.

- "uniqueIDBased": only the unique identifier has to be defined.

- "tagBased": all the tokens connected to the record have to be
   included, preceded by the tag used for this specific record in the
   preceding set made up of the last full update and the incremental
   updates that followed it.

     If the receiving index server has never received information for
the record referred to by a line in the Delete Block, then that line
should be ignored, with the proviso that the receiving index server has
more than likely "lost" some information previously distributed by the
transmitting index server.  If the receiving index server is
maintaining internal tags, then after processing the Delete Block, the
internal tag numbers may be reordered so as to not have "holes" in the
sequence.

     If the tagged index object contains an Update Block, then the lines
in the Update Block refer to records that were changed in the
information base of the transmitting index server.  Again, the specific
content of the block depends on how the consistency is maintained:

- "completeRecord": all the tokens representing the old version of the
   record as well as the new ones have to be included.

- "uniqueIDBased": the unique ID has to be included together with the
   tokens that have changed.

- "tagBased": only the changed tokens are included, but then both the
   old version, if there was one, and the new one, if there is one.

The Update Block also supports the idea of indexing new
attributes which that were not previously included in the tagged index
object.  For example, if the transmitting index server began including
index information on postal addresses, then it could include an Update
Block in the index object that included all of the index information on
postal addresses for all records in its information base, and indicate
that nothing else has changed.  If
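
     The three consistency rules for the Update Block can be sketched
in Python as follows.  This is only an illustration: the function names,
the record layout (a dictionary mapping attribute names to lists of
values), the use of "dn" as the unique ID, and the whitespace
tokenization of TOKEN attributes are all assumptions of this sketch and
are not defined by this document.

```python
def record_tokens(record, schema):
    """Flatten a record into a set of (attribute, token) pairs.

    Assumption: TOKEN attributes are split on whitespace and FULL
    attributes are kept as one token; unlisted attributes are skipped.
    """
    pairs = set()
    for attr, values in record.items():
        kind = schema.get(attr)
        for value in values:
            if kind == "TOKEN":
                pairs.update((attr, word) for word in value.split())
            elif kind == "FULL":
                pairs.add((attr, value))
    return pairs


def update_block_content(consistency, old, new, schema, id_attr="dn"):
    """Return (old_pairs, new_pairs) to send for one changed record."""
    old_pairs = record_tokens(old, schema)
    new_pairs = record_tokens(new, schema)
    if consistency == "completeRecord":
        # Everything from both versions of the record.
        return old_pairs, new_pairs
    changed_old = old_pairs - new_pairs
    changed_new = new_pairs - old_pairs
    if consistency == "tagBased":
        # Only what changed; the tag identifies the record.
        return changed_old, changed_new
    if consistency == "uniqueIDBased":
        # What changed, plus the unique ID (hypothetically the dn).
        old_id = {(id_attr, value) for value in old.get(id_attr, [])}
        new_id = {(id_attr, value) for value in new.get(id_attr, [])}
        return changed_old | old_id, changed_new | new_id
    raise ValueError("unknown consistency type: " + consistency)
```

     For a record whose only change is its title, the "tagBased" case
yields just the old and new title tokens, while "completeRecord" yields
every indexed token of both versions.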

5. Example

     In the following sections, the tagged index object is presented
for each different consistencybase type, using the following scenario.
The examples start with one full update, followed by a set of
incremental updates.  The underlying information is presented in the
LDIF [6] format.

5.1 The original database

  dn: cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
  objectclass: top
  objectclass: person
  objectclass: organizationalPerson
  cn: Barbara Jensen
  cn: Barbara J Jensen
  cn: Babs Jensen
  sn: Jensen
  uid: bjensen

  dn: cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
  objectclass: top
  objectclass: person
  objectclass: organizationalPerson
  cn: Bjorn Jensen
  sn: Jensen
  title: Accounting manager

  dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
  objectclass: top
  objectclass: person
  objectclass: organizationalPerson
  cn: Gern Jensen
  cn: Gern O Jensen
  sn: Jensen
  title: testpilot

  dn: cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
  objectclass: top
  objectclass: person
  objectclass: organizationalPerson
  cn: Horatio Jensen
  cn: Horatio N Jensen
  sn: Jensen
  title: testpilot

5.1.1 "Complete" consistency based full update

    version: x-tagged-index-1
    updatetype: total
    thisupdate: 855938804
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: TOKEN
    END IO-Schema
    BEGIN Index-Info
    cn: 1/Barbara
    -1/J
    -1/Babs
    -*/Jensen
    -2/Bjorn
    -3/Gern
    -3/O
    -4/Horatio
    -4/N
    sn: */Jensen
    title: 1/product
    -1-2/manager
    -1/accounting
    -3,4/testpilot
    END Index-Info
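
     As an illustration of how a receiver might read the Index-Info
lines shown above, the following Python sketch expands tag lists and
collects tokens per attribute.  The syntax it accepts ("*", single
tags, "low-high" ranges, comma-separated lists, and "-" continuation
lines) is inferred from the examples in this section; the function
names and the returned data layout are assumptions of this sketch.

```python
def parse_taglist(text):
    """Expand a tag list such as '1-2', '3,4' or '*'.

    Returns a set of ints, or the string '*' meaning "every record".
    """
    if text == "*":
        return "*"
    tags = set()
    for part in text.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            tags.update(range(int(low), int(high) + 1))
        else:
            tags.add(int(part))
    return tags


def parse_index_info(lines):
    """Return {attribute: [(tags, token), ...]} for the lines found
    between BEGIN Index-Info and END Index-Info."""
    index = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-"):
            # Continuation line: same attribute as the previous line.
            if current is None:
                raise ValueError("continuation line before any attribute")
            body = line[1:]
        else:
            current, body = line.split(": ", 1)
        taglist, token = body.split("/", 1)
        index.setdefault(current, []).append((parse_taglist(taglist), token))
    return index
```

     Splitting on the first "/" only means that a token which itself
contains a "/" is kept intact.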

5.1.2 "tag" consistency based full update

    version: x-tagged-index-1
    updatetype: total
    thisupdate: 855938804
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: TOKEN
    END IO-Schema
    BEGIN Index-Info
    cn: 1/Barbara
    -1/J
    -1/Babs
    -*/Jensen
    -2/Bjorn
    -3/Gern
    -3/O
    -4/Horatio
    -4/N
    sn: */Jensen
    title: 1/product
    -1-2/manager
    -1/accounting
    -3,4/testpilot
    END Index-Info

5.1.3 "unique" consistency based full update

    version: x-tagged-index-1
    updatetype: total
    thisupdate: 855938804
    BEGIN IO-Schema
    dn: FULL
    cn: TOKEN
    sn: FULL
    title: TOKEN
    END IO-Schema
    BEGIN Index-Info
    dn: 1/cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
    -2/cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
    -3/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
    -4/cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
    cn: 1/Barbara
    -1/J
    -1/Babs
    -*/Jensen
    -2/Bjorn
    -3/Gern
    -3/O
    -4/Horatio
    -4/N
    sn: */Jensen
    title: 1/product
    -1-2/manager
    -1/accounting
    -3,4/testpilot
    END Index-Info

5.2 First update

  Gern Jensen's entry above changes to:

  dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
  objectclass: top
  objectclass: person
  objectclass: organizationalPerson
  cn: Gern Jensen
  cn: Gern O Jensen
  sn: Jensen
  title: chiefpilot

5.2.1 First update using "complete"

    version: x-tagged-index-1
    updatetype: incremental
    lastupdate: 855940000
    thisupdate: 855938804
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: FULL
    END IO-Schema
    BEGIN Update Block
    BEGIN Old
    cn: 1/Gern
    cn: 1/O
    cn: 1/Jensen
    sn: 1/Jensen
    title: 1/testpilot
    END Old
    BEGIN New
    cn: 1/Gern
    cn: 1/O
    cn: 1/Jensen
    sn: 1/Jensen
    title: 1/chiefpilot
    END New
    END Update Block

5.2.2 First update using "tag" consistency

    version: x-tagged-index-1
    updatetype: incremental
    lastupdate: 855940000
    thisupdate: 855938804
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: FULL
    END IO-Schema
    BEGIN Update Block
    BEGIN Old
    title: 3/testpilot
    END Old
    BEGIN New
    title: 3/chiefpilot
    END New
    END Update Block

5.2.3 First update using "unique" ID's

    version: x-tagged-index-1
    updatetype: incremental
    lastupdate: 855940000
    thisupdate: 855938804
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: FULL
    END IO-Schema
    BEGIN Update Block
    BEGIN Old
    dn: 1/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
    title: 1/testpilot
    END Old
    BEGIN New
    dn: 1/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
    title: 1/chiefpilot
    END New
    END Update Block

5.3 Second update

   # Add a new entry
   dn: cn=Bo Didley, ou=Marketing, o=Ace Industry, c=US
   changetype: add
   objectclass: top
   objectclass: person
   objectclass: organizationalPerson
   cn: Bo Didley
   sn: Didley
   title: Policy Maker

   # Delete an existing entry
   dn: cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
   changetype: delete

   # Modify all other entries: adding an additional locality value
   dn: cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
   changetype: modify
   add: locality
   locality: New Jersey

   dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
   changetype: modify
   add: locality
   locality: New Orleans

   dn: cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
   changetype: modify
   add: locality
   locality: New Caledonia

5.3.1 "complete"

    version: x-tagged-index-1
    updatetype: incremental
    lastupdate: 855938804
    thisupdate: 855939525
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: FULL
    locality: TOKEN
    END IO-Schema
    BEGIN Add Block
    cn: 1/Bo
    -1/Didley
    sn: 1/Didley
    title: 1/Policy
    -1/maker
    locality: 1/New
    -1/York
    END Add Block
    BEGIN Delete Block
    cn: 1/Bjorn
    -1/Jensen
    sn: 1/Jensen
    title: 1/Accounting
    -1/Manager
    END Delete Block
    BEGIN Update Block
    BEGIN Old
    cn: 1/Barbara
    -1/J
    -1-3/Jensen
    -2/Gern
    -2/O
    -3/Horatio
    sn: 1-3/Jensen
    title: 1/Production
    -1/Manager
    -2/Testpilot
    -3/Chiefpilot
    END Old
    BEGIN New
    cn: 1/Barbara
    -1/J
    -1-3/Jensen
    -2/Gern
    -2/O
    -3/Horatio
    sn: 1-3/Jensen
    title: 1/Production
    -1/Manager
    -2/Testpilot
    -3/Chiefpilot
    locality: 1/Jersey
    -2/Orleans
    -3/Caledonia
    -1-3/New
    END New
    END Update Block

5.3.2 "tag"

    version: x-tagged-index-1
    updatetype: incremental
    lastupdate: 855938804
    thisupdate: 855939525
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: FULL
    locality: TOKEN
    END IO-Schema
    BEGIN Add Block
    cn: 5/Bo
    -5/Didley
    sn: 5/Didley
    title: 5/Policy
    -5/maker
    locality: 5/New
    -5/York
    END Add Block
    BEGIN Delete Block
    cn: 2/Bjorn
    -2/Jensen
    sn: 2/Jensen
    title: 2/Accounting
    -2/Manager
    END Delete Block
    BEGIN Update Block
    BEGIN New
    locality: 1/Jersey
    -2/Orleans
    -4/Caledonia
    -1,2,4/New
    END New
    END Update Block
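
     A receiving index server could apply such a "tag" consistency
update to an in-memory index roughly as sketched below.  The index
layout ({attribute: {token: set of tags}}), the function name, and the
exact, case-sensitive token matching are assumptions of this sketch; a
real implementation would follow the matching rules agreed for each
attribute.

```python
def apply_tag_update(index, add=(), delete=(), old=(), new=()):
    """Apply one incremental update to `index` in place.

    Each argument is an iterable of (attribute, tag, token) triples
    taken from the Add Block, the Delete Block, and the Old and New
    parts of the Update Block respectively.
    """
    # Removals first: deleted records and the old side of updates.
    for attr, tag, token in list(delete) + list(old):
        tags = index.get(attr, {}).get(token)
        if tags is not None:
            tags.discard(tag)
            if not tags:
                # No record carries this token any more.
                del index[attr][token]
    # Then additions: new records and the new side of updates.
    for attr, tag, token in list(add) + list(new):
        index.setdefault(attr, {}).setdefault(token, set()).add(tag)
    return index
```

     Because the tags of untouched records are never rewritten, the
tags used in later incremental updates continue to refer to the same
records.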

5.3.3 "unique"

    version: x-tagged-index-1
    updatetype: incremental
    lastupdate: 855938804
    thisupdate: 855939525
    BEGIN IO-Schema
    cn: TOKEN
    sn: FULL
    title: FULL
    locality: TOKEN
    END IO-Schema
    BEGIN Add Block
    dn: 1/cn=Bo Didley, ou=Marketing, o=Ace Industry, c=US
    cn: 1/Bo
    -1/Didley
    sn: 1/Didley
    title: 1/Policy
    -1/maker
    locality: 1/New
    -1/York
    END Add Block
    BEGIN Delete Block
    dn: 1/cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
    END Delete Block
    BEGIN Update Block
    BEGIN New
    dn: 1/cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
    -2/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
    -3/cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
    locality: 1/Jersey
    -2/Orleans
    -3/Caledonia
    -1-3/New
    END New
    END Update Block

6. Aggregation

6.1. Aggregation of Tagged Index Objects

     Aggregation of two tagged index objects is done by merging the two
lists of values and rewriting each tag list.  The tag list rewriting
process is done so that the resulting index object appears as if it came
from a single source.  Tags from one of the two tagged index objects are
"mapped" to the number space above that used by the other tagged index
object.  An index server that aggregates tagged index objects for export
MUST ensure that the export URL (i.e. the base-uri of the CIP object)
for the aggregate index object will route all queries that have "hits"
on the index object to that server (otherwise, query routing will not
succeed).
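
     The tag rewriting described above can be sketched in Python as
follows.  The index layout ({attribute: {token: set of tags}}) and the
function name are assumptions of this sketch; this document does not
prescribe how an aggregating server stores its indexes.

```python
def aggregate(first, second):
    """Merge two indexes of the form {attribute: {token: set_of_tags}}.

    Tags from `second` are shifted above the highest tag used in
    `first`, so that the result looks as if it came from one source.
    Returns the merged index and the offset that was applied.
    """
    offset = max(
        (tag
         for tokens in first.values()
         for tags in tokens.values()
         for tag in tags),
        default=0,
    )
    # Copy `first` so that the caller's data is left untouched.
    merged = {attr: {token: set(tags) for token, tags in tokens.items()}
              for attr, tokens in first.items()}
    for attr, tokens in second.items():
        target = merged.setdefault(attr, {})
        for token, tags in tokens.items():
            target.setdefault(token, set()).update(t + offset for t in tags)
    return merged, offset
```

     The returned offset lets the aggregating server remember which tag
range came from which source, which it may need when answering the
queries routed to it.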

7. Security Considerations

     This specification provides a protocol for transferring information
between two servers.  The actual information transferred may be
protected by laws in many countries, so care must be taken in the
methods used to tokenize the data in order to ensure that protected data
may not be reconstructed in full by the receiving server.  This protocol
does not have any inherent protection against spoofing or eavesdropping.
However, since this protocol is transported in MIME messages (as are all
CIP index objects), it inherits all of the security capabilities and
liabilities of other MIME messages.  Specifically, those wanting to
prevent eavesdropping or spoofing may use some of the various techniques
for signing and encrypting MIME messages.

     Information Server administrators must decide what portions of
their databases are appropriate for inclusion in the Tagged Index
Object.  For distribution of information outside of the enterprise,
information server developers are encouraged to allow for facilities
that hide the organizational structure when generating the Tagged Index
Object from the underlying information database.  To allow for the
secure transmission of Tagged Index Objects across the Internet, Index
Servers should make use of SSL when completing the connection.  In order
to strongly verify the identity of the peer index server on the other
side of the connection, SSL version 3 certificate exchange should be
implemented, and the identity in the peer's certificate verified with
the Public Key Infrastructure.  If electronic mail is used to exchange
the Tagged Index Objects, then a secure messaging facility, such as
PGP/MIME or S/MIME, should be used to sign or encrypt (or both) the
information.

8. References

[1]  J. Allen, M. Mealling, "The Architecture of the Common Indexing
     Protocol (CIP)", Internet Draft (work in progress), June 1997.

[2]  C. Weider, J. Fullton, S. Spero, "Architecture of the Whois++ Index
     Service", RFC 1913, February 1996.

[3]  M. Wahl, T. Howes, S. Kille, "Lightweight Directory Access Protocol
     (v3)", RFC 2251, December 1997.

[4]  ITU, "X.525 Information Technology - Open Systems Interconnection -
     The Directory: Replication", November 1993.

[5]  "FORTEZZA Application Implementors Guide for the FORTEZZA Crypto
     Card (Production Version)", Document #PD4002102-1.01, SPYRUS, 1995.

[6]  G. Good, "The LDAP Data Interchange Format (LDIF) - Technical
     Specification", Internet Draft (work in progress), November 1998.

[7]  R. Hedberg, "LDAPv2 client Vs the Index Mesh", Internet Draft (work
     in progress), November 1997.

[8]  T. Howes, M. Smith, "The LDAP URL Format", RFC 2255, December 1997.

[9]  M. Elkins, "MIME Security with Pretty Good Privacy (PGP)", RFC 2015,
     October 1996.

[10] Blake Ramsdell, "S/MIME Version 3 Message Specification", Internet
     Draft (work in progress), August 1998.

[11] C. Allen, T. Dierks, "The TLS Protocol Version 1.0", Internet
     Draft (work in progress), November 1997.

9.  Author's Addresses

    Roland Hedberg
    Catalogix
    Dalsveien 53
    0387 Oslo
    Norway
    Email: roland@catalogix.ac.se

    Bruce Greenblatt
    6841 Heaton Moor Drive
    San Jose, CA 95119
    USA
    Email: bruceg@innetix.com
    Phone: +1-408-224-5349

    Ryan Moats
    AT&T
    15621 Drexel Circle
    Omaha, NE 68135-2358
    USA
    Email: jayhawk@att.com
    Phone: +1 402 894-9456

    Mark Wahl
    Innosoft International, Inc.
    8911 Capital of Texas Hwy, Suite 4140
    Austin, TX 78759
    USA
    Email: Mark.Wahl@innosoft.com
    Phone: +1 626 919 3600

                           Table of Contents

1. Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . .   2
2. Background  . . . . . . . . . . . . . . . . . . . . . . . . . . .   2
3. Object  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   4
4. The Tagged Index Object . . . . . . . . . . . . . . . . . . . . .   5
4.1. The Agreement . . . . . . . . . . . . . . . . . . . . . . . . .   5
4.2. Content Type  . . . . . . . . . . . . . . . . . . . . . . . . .   7
4.3 Tagged Index BNF . . . . . . . . . . . . . . . . . . . . . . . .   8
4.3.1. Header Descriptions . . . . . . . . . . . . . . . . . . . . .  10
4.3.2. Tokenization types  . . . . . . . . . . . . . . . . . . . . .  11
4.3.3. Tag Conventions . . . . . . . . . . . . . . . . . . . . . . .  11
4.4. Incremental Indexing  . . . . . . . . . . . . . . . . . . . . .  11
5. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  13
5.1 The original database  . . . . . . . . . . . . . . . . . . . . .  13
5.1.1 "complete" consistency based full update . . . . . . . . . . .  14
5.1.2 "tag" consistency based full update  . . . . . . . . . . . . .  14
5.1.3 "unique" consistency based full update . . . . . . . . . . . .  15
5.2 First update . . . . . . . . . . . . . . . . . . . . . . . . . .  15
5.2.1 "complete" consistency based incremental update  . . . . . . .  16
5.2.2 "tag" consistency based incremental update   . . . . . . . . .  16
5.2.3 "unique" consistency based incremental update  . . . . . . . .  17
5.3 Second update  . . . . . . . . . . . . . . . . . . . . . . . . .  17
5.3.1 "complete" consistency based incremental update  . . . . . . .  18
5.3.2 "tag" consistency based incremental update . . . . . . . . . .  19
5.3.3 "unique" consistency based incremental update  . . . . . . . .  20
6. Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . .  20
6.1 Aggregation of Tagged Index Objects  . . . . . . . . . . . . . .  20
7. Security Considerations . . . . . . . . . . . . . . . . . . . . .  21
8. References  . . . . . . . . . . . . . . . . . . . . . . . . . . .  21
9. Author's Addresses  . . . . . . . . . . . . . . . . . . . . . . .  22