Network Working Group                                      Roland Hedberg
Internet Draft                                           Bruce Greenblatt
<draft-ietf-find-cip-tagged-07.txt>                            Ryan Moats
Expires in six months                                           Mark Wahl

     A Tagged Index Object for use in the Common Indexing Protocol

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months. Internet-Drafts may be updated, replaced, or made obsolete by other documents at any time. It is not appropriate to use Internet-Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This document defines a mechanism by which information servers can exchange indices of information from their databases by making use of the Common Indexing Protocol (CIP). This document defines the structure of the index information being exchanged, as well as the appropriate meanings for the headers that are defined in the Common Indexing Protocol. It is assumed that the structures defined here can be used by X.500 DSAs, LDAP servers, Whois++ servers, CSO Ph servers and many others.

1. Introduction

The Common Indexing Protocol (CIP) as defined in [1] proposes a mechanism for distributing searches across several instances of a single type of search engine to create a global directory.
CIP provides a scalable, flexible scheme to tie individual databases into distributed data warehouses that can scale gracefully with the growth of the Internet. CIP provides a mechanism for meeting these goals that is independent of the access method that is used to access the data that underlies the indices. Separate from CIP is the definition of the Index Object that is used to contain the information that is exchanged among Index Servers. One such Index Object that has already been defined is the Centroid that is derived from the Whois++ protocol [2].

The Centroid does not meet all the requirements for the exchange of index information amongst information servers. For example, it does not support the notion of incremental updates natively. For information servers that contain millions of records in their database, constant exchange of complete dredges of the database is bandwidth intensive. The Tagged Index Object is specifically designed to support the exchange of index update information. This design comes at the cost of an increase in the size of the index object being exchanged. The Centroid is also not tailored to always be able to give boolean answers to queries. In the Centroid Model, "an index server will take a query in standard Whois++ format, search its collections of centroids and other forward information, determine which servers hold records which may fill that query, and then notifies the user's client of the next servers to contact to submit the query." [2] Thus, the exchange of Centroids amongst index servers allows hints to be given about which information server actually contains the information. The Tagged Index Object labels the various pieces of information with identifiers that tie the individual object attributes back to an object as a whole. This "tagging" of information allows an index server to be more capable of directing a specific query to the appropriate information server.
Again, this feature is added to the Tagged Index Object at the expense of an increase in the size of the index object.

2. Background

The Lightweight Directory Access Protocol (LDAP) is defined in [3], and it defines a mechanism for accessing a collection of information arranged hierarchically in such a way as to provide a globally distributed database which is normally called the Directory Information Tree (DIT). A distinguishing characteristic of LDAP servers is that several servers normally cooperate to manage a common subtree of the DIT. LDAP servers are expected to respond to requests that pertain to portions of the DIT for which they have data, as well as for those portions for which they have no information in their database. For example, the LDAP server for a portion of the DIT in the United States (c=US) must be able to provide a response to a Search operation that pertains to a portion of the DIT in Sweden (c=SE). Normally, the response given will be a referral to another LDAP server that is expected to be more knowledgeable about the appropriate subtree. However, there is no mechanism that currently enables these LDAP servers to refer the LDAP client to the supposedly more knowledgeable server. Typically, an LDAP (v3) server is configured with the name of exactly one other LDAP server to which all LDAP clients are referred when their requests fall outside the subtree of the DIT for which that LDAP server has knowledge. This specification defines a mechanism whereby LDAP servers can exchange index information that will allow referrals to point towards a clearly accurate destination.

The X.500 series of recommendations defines the Directory Information Shadowing Protocol (DISP) [4], which allows X.500 DSAs to exchange information in the DIT. Shadowing allows information from various portions of the DIT to be replicated amongst participating DSAs.
The design point of DISP is the exchange of entire portions of the DIT, whereas the design point of CIP and the Tagged Index Object is the exchange of structural index information about the DIT and the improvement of tree navigation performance amongst various information servers. The Tagged Index Object is more appropriate for the exchange of index information than is DISP. DISP is more targeted at DIT distribution and fault tolerance. DISP is thus more appropriate for the exchange of the data itself in order to spread the load amongst several information servers. DISP is tailored specifically to X.500 (and other hierarchical directory systems), while the Tagged Index Object and CIP can be used in a wide variety of information server environments.

While DISP allows an individual directory server to collect information about large parts of the DIT, it would require a huge database to collect all the replicas for a significant portion of the DIT. Furthermore, as X.525 states: "Before shadowing can occur, an agreement, covering the conditions under which shadowing may occur is required. Although such agreements may be established in a variety of ways, such as policy statements covering all DSAs within a given DMD ...", where a DMD is a Directory Management Domain. This is because the data in the DIT itself is being exchanged amongst DSAs, rather than only the information required to maintain an Index. In many environments such an agreement is not appropriate, and to collect information for a meaningful portion of the DIT, many agreements may need to be arranged.

3. Object

What is desired is to have an information server (or network of information servers) that can quickly respond to real world requests, like:

- What is Tim Howes's email address? This is much harder than: What email address does Tim Howes at Netscape have?
- What is the X.509 certificate for Fred Smith at compuserve.com? One certainly doesn't want to search CompuServe's entire directory tree to find out this one piece of information. I also don't want to have to shadow the entire CompuServe directory subtree onto my server. If this request is being made because Fred is trying to log into my server, I'd certainly want to be able to respond to the BIND in real time.
- Who are all the people at Novell that have a title of programmer?

All these requests can reasonably be translated into LDAP, Whois++, or other directory access protocol queries. They can also be serviced in a straightforward way by the user's home information server if it has the appropriate reference information into the database that contains the source data. Here, the first server would be able to "chain" the request for the user. Alternatively, a precise referral could be returned. If the home information server wants to service (i.e., chain) the request based on the index information that it has on hand, this servicing could be done by several different means:

- issuing LDAP operations to the remote directory server
- issuing DSP operations to the remote directory server
- issuing DAP operations to the remote directory server
- issuing Whois++ operations to the remote Whois++ server
- ...

4. The Tagged Index Object

This section defines a Tagged Index Object that can be exchanged by Information Servers using CIP. While it is often acceptable for Information Servers to make use of the Centroid definition (from [2]) to exchange index information, the goals in defining a new construct are multi-pronged:

- When the Information Server receives a search request that warrants that a referral be returned, allow the server to return a referral that will point the client to a server that is most likely able to answer the request correctly.
  False positive referrals (the search turns up hits in the index object that generate referrals to servers that don't hold the desired information) can be reduced, depending on the choice of attribute tokenization types that are used.
- Potentially allow incremental updates that will then consume substantially less bandwidth than if full updates always had to be used.

4.1. The Agreement

Before a Tagged Index Object can be exchanged, the organization that administers the object supplier and the organization that administers the object consumer must reach an agreement on how the servers will communicate. This agreement contains the following:

- "index-type": This specification describes the index type "x-tagged-index-1".

- "dsi": An OID that uniquely identifies the subtree and scope. This field is not strictly necessary, as it may not provide information beyond what is contained in the "base-uri" below.

- "base-uri": One or more URIs that will form the base of any referrals created based on the index object that is governed by this agreement. For example, in the LDAP URL format [8] the base-uri would specify (among other items): the LDAP host, the base object to which this index object refers (e.g. c=SE), and the scope of the index object (e.g. single container).

- "supplier": The hostname and listening port number of the supplier server, as well as any alternative servers holding the same naming contexts, for use if the supplier is unavailable.

- "consumeraddr": This is a URI of the "mailto:" form, with the RFC 822 email address of the consumer server. Further versions of this draft allow other forms of URI, so that the consumer may retrieve the update via the WWW, FTP or CIP.

- "updateinterval": The maximum duration in seconds between occurrences of the supplier server generating an update. If the consumer server has not received an update from the supplier server after waiting this long since the previous update, it is likely that the index information is now out of date. A typical value for a server with frequent updates would be 604800 seconds, or every week. Servers whose DITs are only modified annually could have a much longer update interval.

- "attributeNamespace": Every set of index servers that together wants to support a specific usage of indexes has to agree on which attribute names to use in the index objects. The participating directory servers also have to agree on the mapping from local attribute names to the attribute names used in the index. Since one specific index server might be involved in several such sets, it has to have some way to connect an update to the proper set of indexes. One possible solution to this would be to use different DSIs.

- "consistencybase": How consistency of the index is maintained over incremental updates:

     "complete" - every change or delete concerning one object has to contain all tokens connected to that object. This method must be supported by any server that wants to comply with this standard.

     "tag" - starting at a full update, every incremental update referring back to this full update has to maintain state information regarding tags, such that an object within the original database is assigned the same tag number every time. This method is optional.

     "unique" - every object in the Dataset has to have a unique value for a specific attribute in the index. An example of such an attribute could be the distinguishedName attribute. This method is also optional.

- "securityoption": Whether and how the supplier server should sign and encrypt the update before sending it to the consumer server. Options for this version of the specification are:

     "none": the update is sent in plaintext
     "PGP/MIME": the update is digitally signed and encrypted using PGP [9]
     "S/MIME": the update is digitally signed and encrypted using S/MIME [10]
     "SSLv3": the update is digitally signed and encrypted using an SSLv3 connection [11]
     "Fortezza": the update is digitally signed and encrypted using Fortezza [5]

  It is recommended that the "PGP/MIME" option be used when exchanging sensitive information across public networks, provided both the supplier and consumer have PGP keys. The "Fortezza" option is intended for use in environments where security protocols are based on Fortezza-compatible devices. The "S/MIME" option can be used when both the supplier and consumer have RSA keys and can make use of the PKCS protocols defined in the S/MIME specification. The "SSLv3" option can be used when both the supplier and consumer have access to SSL services, have server certificates, and can mutually authenticate each other.

- Security Credentials: The long-term cryptographic credentials used for key exchange and authentication of the consumer and supplier servers, if a security option was selected. For "PGP/MIME", this will be the trusted public keys of both servers. For "Fortezza", this will be the certificate paths of both servers to a common point of trust. For "S/MIME" and "SSLv3" these will be the certificates of the supplier and consumer.
Note that if the index server maintains the information that would appear in the agreement in a directory according to the definitions in [7], then no real formal agreement between the two parties needs to be put in place, and the information that is required for communication between the two index servers is derived automatically from the directory.

4.2. Content Type

The update consists of a MIME object of type application/cip-index-object. The parameters are:

"type": this has value "application/index.obj.tagged".

"dsi": the DSI (if any) from the agreement.

"base-uri": a set of URIs, separated by spaces. In each URI, the hostname/portno must be distinct, and based on the "supplier" part of the agreement.

The payload is mostly textual data but may include bytes with the high bit set. The originating information server should set the content-transfer-encoding as appropriate for the information included in the payload.

This object may be encapsulated in a wrapper content (such as multipart/signed) or be encrypted as part of the security procedures. The resulting content can then be distributed, for example via electronic mail.

For example,

   From: supplier@sup.com
   Date: Thu, 16 Jan 1997 13:50:37 -0500
   Message-Id: <199701161850.NAA29295@sup.com>
   To: consumer@consumer.com     <-- from consumer server address
   Reply-to: supplier-admin@sup.com
   MIME-Version: 1.0
   Content-Type: application/index.obj.tagged;
      dsi=1.3.6.1.4.1.1466.85.85.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16;
      base-uri="ldap://sup.com/dc=sup,dc=com ldap://alt.com/dc=sup,dc=com"

The payload is a series of CRLF-terminated lines. The payload is UTF-8. Some supplier servers may only be able to generate the printable US-ASCII subset of UTF-8, but all consumer servers must be able to handle the full range of Unicode characters when decoding the attribute values (in the "attr-value" field in the BNF below).

4.3. Tagged Index BNF

The Tagged Index object has the following grammar, expressed in modified BNF format:

   index-object = 0*(io-part SEP) io-part
   io-part = header SEP schema-spec SEP index-info
   header = version-spec SEP update-type SEP this-update SEP
            last-update context-size name-space SEP
   version-spec = "version:" *SPACE "x-tagged-index-1"
   update-type = "updatetype:" *SPACE ( "total" |
               ( "incremental" [ *SPACE ( "tagbased" | "uniqueIDbased" ) ] ) )
   this-update = "thisupdate:" *SPACE TIMESTAMP
   last-update = [ "lastupdate:" *SPACE TIMESTAMP SEP ]
   context-size = [ "contextsize:" *SPACE 1*DIGIT SEP ]
   schema-spec = "BEGIN IO-Schema" SEP 1*(schema-line SEP)
               "END IO-Schema"
   schema-line = attribute-name ":" token-type
   token-type = "FULL" | "TOKEN" | "RFC822" | "UUCP" | "DNS"
   index-info = full-index | incremental-index
   full-index = "BEGIN Index-Info" SEP 1*(index-block SEP)
               "END Index-Info"
   incremental-index = 1*(add-block | delete-block | update-block)
   add-block = "BEGIN Add Block" SEP 1*(index-block SEP)
               "END Add Block"
   delete-block = "BEGIN Delete Block" SEP 1*(index-block SEP)
               "END Delete Block"
   update-block = "BEGIN Update Block" SEP
               0*(old-index-block SEP)
               1*(new-index-block SEP)
               "END Update Block"
   old-index-block = "BEGIN Old" SEP 1*(index-block SEP)
               "END Old"
   new-index-block = "BEGIN New" SEP 1*(index-block SEP)
               "END New"
   index-block = first-line 0*(SEP cont-line)
   first-line = attr-name ":" *SPACE taglist "/" attr-value
   cont-line = "-" taglist "/" attr-value
   taglist = tag 0*("," tag) | "*"
   tag = 1*DIGIT ["-" 1*DIGIT]
   attr-value = 1*(UTF8)
   attr-name = 1*(NAMECHAR)
   TIMESTAMP = 1*DIGIT
   NAMECHAR = DIGIT | UPPER | LOWER | "-" | ";" | "."
   SPACE = <ASCII space, %x20>;
   SEP = (CR LF) | LF
   CR = <ASCII CR, carriage return, %x0D>;
   LF = <ASCII LF, line feed, %x0A>;
   DIGIT = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
   UPPER = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
           "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
           "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
   LOWER = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
           "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
           "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   US-ASCII-SAFE = %x01-09 / %x0B-0C / %x0E-7F
                   ;; US-ASCII except CR, LF, NUL
   UTF8 = US-ASCII-SAFE / UTF8-1 / UTF8-2 / UTF8-3
          / UTF8-4 / UTF8-5
   UTF8-CONT = %x80-BF
   UTF8-1 = %xC0-DF UTF8-CONT
   UTF8-2 = %xE0-EF 2UTF8-CONT
   UTF8-3 = %xF0-F7 3UTF8-CONT
   UTF8-4 = %xF8-FB 4UTF8-CONT
   UTF8-5 = %xFC-FD 5UTF8-CONT

The set of characters allowed to appear in the attr-name field is limited to the set of characters used in LDAP and WHOIS++ attribute names. For other services that have attribute name character sets that are larger than these, those services should create a profile that maps the names onto object identifiers, and the sequence of digits and periods is used by those services in creating the attr-name fields for their Tagged Index Objects.

It is worth mentioning that updates to an index based on tagged index objects MUST be performed in the order specified by the tagged index object itself.

4.3.1. Header Descriptions

The header section consists of one or more "header lines". The following header lines are defined:

"version": This line must always be present, and have the value "x-tagged-index-1" for this version of the specification.

"updatetype": This line must always be present.
It takes as the value either "total" or "incremental". The first update sent by a supplier server to a consumer server for a DSI must be a "total" update.

"thisupdate": This line must always be present. The value is the number of seconds from 00:00:00 UTC January 1, 1970 at which the supplier constructed this update.

"lastupdate": This line must be present if the "updatetype" line has the value "incremental". The value is the number of seconds from 00:00:00 UTC January 1, 1970 at which the supplier constructed the previous update sent to the consumer. This field allows the consumer to determine if a previous update was missed.

"contextsize": This line may be present at the supplier's option. The value is a number, which is the approximate total number of entries in the subtree. This information is provided for statistical purposes only.

4.3.2. Tokenization Types

The Tagged Index Object inherits the "TOKEN" scheme for tokenization as specified in [2]. In addition, there are several other tokenization schemes defined for the Tagged Index Object. The following table presents these schemes and what character(s) are used to delimit tokens.

   Token Type    Tokenization Characters
   FULL          none
   TOKEN         white space, "@"
   RFC822        white space, ".", "@"
   UUCP          white space, "!"
   DNS           any character not a number, letter, or "-"

4.3.3. Tag Conventions

In the tag list, multiple consecutive tags may be shortened by using "#-#". For example, the list "3,4,5,6,7,8,9,10" may be shortened to "3-10".

Tags are to be applied to the data on a per entry level. Thus, if two index lines in the same index object contain the same tag, then those two lines always refer to the same "record" in the directory. In LDAP terminology, the two lines would refer to the same directory object.
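
The tokenization table and the tag range shorthand can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names tokenize and expand_taglist are hypothetical, the DNS row is read as "any character that is not a number, letter, or '-'", "letter" is taken to mean ASCII letters, and empty tokens between adjacent delimiters are dropped. None of these choices is mandated by this specification.

```python
import re

# Delimiter patterns per token type, following the table in section 4.3.2.
# FULL has no delimiters, so it is handled separately in tokenize().
_DELIMITERS = {
    "TOKEN": r"[\s@]+",
    "RFC822": r"[\s.@]+",
    "UUCP": r"[\s!]+",
    "DNS": r"[^0-9A-Za-z-]+",  # assumption: "letter" means ASCII A-Z, a-z
}


def tokenize(value, token_type):
    """Split an attribute value into tokens for the given token type."""
    if token_type == "FULL":
        return [value]
    # Assumption: empty tokens between adjacent delimiters are dropped.
    return [t for t in re.split(_DELIMITERS[token_type], value) if t]


def expand_taglist(taglist):
    """Expand "1,3-5" to [1, 3, 4, 5]; return None for "*" (every record)."""
    if taglist == "*":
        return None
    tags = []
    for part in taglist.split(","):
        low, _, high = part.partition("-")
        tags.extend(range(int(low), int(high or low) + 1))
    return tags
```

For instance, the same mail address yields different tokens under TOKEN and RFC822, because only RFC822 treats "." as a delimiter.
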
Additionally, if two index lines in the same index object contain different tags, then those two lines always refer to different records in the directory. The meaning of '*' in the tag position is that that specific token appears in every record in the directory. The tag applied to the same underlying record in two separate transmissions of a full-index may be different. Thus, receiving index servers should make no assumptions about the values of the tags across index object boundaries.

4.4. Incremental Indexing

The tagged index object format supports the ability of information servers to distribute only delta index data, rather than distributing total index information each time. This scenario, known as incremental indexing, supports three basic types of operations: add, delete and replace. If the incremental updatetype is specified in the tagged index object, then the index object contains a snapshot of only the changes that have been made since the index object specified in the lastupdate header was distributed. If the receiving index server did not receive that index object, it should request a total index object. If CIP supports it, the index server may request the specific index object that it missed.

If the tagged index object contains an Add Block, then the lines in the Add Block refer to new records that were added to the information base of the transmitting index server.
Those records are guaranteed not to have existed in any previously received tagged index object, so the receiving index server can insert this index information in the index that it already maintains for the transmitting index server.

If the tagged index object contains a Delete Block, then the structure of the Delete Block depends on how the consistency is maintained:

- "completeRecord": all the tokens connected to the record to be deleted have to be included. The tag used to connect tokens in this message has no relation to tags used in previously sent tagged index objects.
- "uniqueIDBased": only the unique identifier has to be defined.
- "tagBased": all the tokens connected to the record have to be included, preceded by the tag that was used for this specific record in the preceding set of updates, that is, the last full update and the incremental updates that followed it.
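
The three Delete Block variants differ mainly in how the consumer finds the record to remove. The following is one possible consumer-side reading, offered as a sketch rather than as behaviour defined by this specification. The names Index, delete_complete and delete_by_key are hypothetical, and the idea of storing one token set per record under a consumer-side key is an assumption made for the example.

```python
from typing import Dict, Hashable, Set, Tuple

Token = Tuple[str, str]             # (attribute name, token value)
Index = Dict[Hashable, Set[Token]]  # consumer-side key -> tokens of one record


def delete_complete(index: Index, tokens: Set[Token]) -> bool:
    """completeRecord: tags carry no history, so the record is located by
    comparing the full token set listed in the Delete Block."""
    # Assumption: if several stored records have identical token sets,
    # only the first match is removed.
    match = next((k for k, stored in index.items() if stored == tokens), None)
    if match is None:
        return False
    del index[match]
    return True


def delete_by_key(index: Index, key: Hashable) -> bool:
    """uniqueIDBased or tagBased: the record is located directly, either by
    its unique identifier (for example a dn) or by the supplier's tag."""
    return index.pop(key, None) is not None
```

With "tagBased" consistency the key would be the supplier's tag number, which only works because that mode requires tags to stay stable from the last full update onwards.
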
If the tagged index object contains an Update Block, then the lines in the Update Block refer to records that were changed in the information base of the transmitting index server. Again, the specific content of the block depends on how the consistency is maintained:

- "completeRecord": all the tokens representing the old version of the record, as well as the new ones, have to be included.
- "uniqueIDBased": the unique ID has to be included together with the tokens that have changed.
- "tagBased": only the changed tokens are included, but both the old version, if there was one, and the new one, if there is one.

The Update Block also supports the idea of indexing new attributes that were not previously included in the tagged index object. For example, if the transmitting index server began including index information on postal addresses, then it could include an Update Block in the index object that included all the index information on postal addresses for all records in its information base, and indicate that nothing else has changed.

5. Example

In the following sections, the tagged index object is shown for each consistencybase type, using the following scenario: the examples start with one full update, which is followed by a set of updates. The underlying information is presented in the LDIF [6] format.
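
As an aid to reading the examples, the index-block lines (first-line and cont-line in the grammar of section 4.3) can be parsed with a few lines of code. The sketch below is a hypothetical helper, not part of this specification: the names parse_index_blocks and tokens_by_record are invented for illustration, attribute values are assumed to be already decoded text, and records that carry only "*" tokens are not enumerated because the block itself gives no tag for them.

```python
import re
from collections import defaultdict

# first-line = attr-name ":" *SPACE taglist "/" attr-value
FIRST = re.compile(r"^([0-9A-Za-z;.\-]+):\x20*(\*|[0-9,\-]+)/(.+)$")
# cont-line = "-" taglist "/" attr-value
CONT = re.compile(r"^-(\*|[0-9,\-]+)/(.+)$")


def expand(taglist):
    """Expand a numeric taglist such as "1,3-4" into [1, 3, 4]."""
    tags = []
    for part in taglist.split(","):
        low, _, high = part.partition("-")
        tags.extend(range(int(low), int(high or low) + 1))
    return tags


def parse_index_blocks(lines):
    """Yield (attr_name, taglist, value) for each index line."""
    attr = None
    for line in lines:
        m = CONT.match(line)
        if m and attr is not None:
            yield attr, m.group(1), m.group(2)
            continue
        m = FIRST.match(line)
        if not m:
            raise ValueError("not an index line: %r" % line)
        attr = m.group(1)
        yield attr, m.group(2), m.group(3)


def tokens_by_record(lines):
    """Group (attr, token) pairs per tag; "*" tokens go to every tag seen."""
    per_tag = defaultdict(set)
    everywhere = set()
    for attr, taglist, value in parse_index_blocks(lines):
        if taglist == "*":
            everywhere.add((attr, value))
        else:
            for tag in expand(taglist):
                per_tag[tag].add((attr, value))
    return {tag: toks | everywhere for tag, toks in per_tag.items()}
```

Feeding the Index-Info lines of a full update to tokens_by_record gives, for each tag, the set of tokens a consumer would associate with that record.
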
5.1 The original database

dn: cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Barbara Jensen
cn: Barbara J Jensen
cn: Babs Jensen
sn: Jensen
uid: bjensen
telephonenumber: +1 408 555 1212
description: A big sailing fan.

dn: cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Bjorn Jensen
sn: Jensen
title: Accounting manager

dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Gern Jensen
cn: Gern O Jensen
sn: Jensen
title: testpilot

dn: cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Horatio Jensen
cn: Horatio N Jensen
sn: Jensen
title: testpilot

5.1.1 "complete" consistency based full update

version: x-tagged-index-1
updatetype: total
thisupdate: 855938804
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: TOKEN
END IO-Schema
BEGIN Index-Info
cn: 1/Barbara
-1/J
-1/Babs
-*/Jensen
-2/Bjorn
-3/Gern
-3/O
-4/Horatio
-4/N
sn: */Jensen
title: 1/product
-1-2/manager
-1/accounting
-3,4/testpilot
END Index-Info

5.1.2 "tag" consistency based full update

version: x-tagged-index-1
updatetype: total
thisupdate: 855938804
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: TOKEN
END IO-Schema
BEGIN Index-Info
cn: 1/Barbara
-1/J
-1/Babs
-*/Jensen
-2/Bjorn
-3/Gern
-3/O
-4/Horatio
-4/N
sn: */Jensen
title: 1/product
-1-2/manager
-1/accounting
-3,4/testpilot
END Index-Info

5.1.3 "unique" consistency based full update

version: x-tagged-index-1
updatetype: total
thisupdate: 855938804
BEGIN IO-Schema
dn: FULL
cn: TOKEN
sn: FULL
title: TOKEN
END IO-Schema
BEGIN Index-Info
dn: 1/cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
-2/cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
-3/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
-4/cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
cn: 1/Barbara
-1/J
-1/Babs
-*/Jensen
-2/Bjorn
-3/Gern
-3/O
-4/Horatio
-4/N
sn: */Jensen
title: 1/product
-1-2/manager
-1/accounting
-3,4/testpilot
END Index-Info

5.2 First update

Gern Jensen's entry above changes to:

dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Gern Jensen
cn: Gern O Jensen
sn: Jensen
title: chiefpilot

5.2.1 First update using "complete"

version: x-tagged-index-1
updatetype: incremental
lastupdate: 855940000
thisupdate: 855938804
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: FULL
END IO-Schema
BEGIN Update Block
BEGIN Old
cn: 1/Gern
cn: 1/O
cn: 1/Jensen
sn: 1/Jensen
title: 1/testpilot
END Old
BEGIN New
cn: 1/Gern
cn: 1/O
cn: 1/Jensen
sn: 1/Jensen
title: 1/chiefpilot
END New
END Update Block

5.2.2 First update using "tag" consistency

version: x-tagged-index-1
updatetype: incremental
lastupdate: 855940000
thisupdate: 855938804
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: FULL
END IO-Schema
BEGIN Update Block
BEGIN Old
title: 3/testpilot
END Old
BEGIN New
title: 3/chiefpilot
END New
END Update Block

5.2.3 First update using "unique" IDs

version: x-tagged-index-1
updatetype: incremental
lastupdate: 855940000
thisupdate: 855938804
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: FULL
END IO-Schema
BEGIN Update Block
BEGIN Old
dn: 1/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
title: 1/testpilot
END Old
BEGIN New
dn: 1/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
title: 1/chiefpilot
END New
END Update Block

5.3 Second update

# Add a new entry
dn: cn=Bo Didley, ou=Marketing, o=Ace Industry, c=US
changetype: add
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Bo Didley
sn: Didley
title: Policy Maker

# Delete an existing entry
dn: cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
changetype: delete

# Modify all other entries: adding an additional locality value
dn: cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
changetype: modify
add: locality
locality: New Jersey

dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
changetype: modify
add: locality
locality: New Orleans

dn: cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
changetype: modify
add: locality
locality: New Caledonia

5.3.1 "complete"

version: x-tagged-index-1
updatetype: incremental
lastupdate: 855938804
thisupdate: 855939525
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: FULL
locality: TOKEN
END IO-Schema
BEGIN Add Block
cn: 1/Bo
-1/Didley
sn: 1/Didley
title: 1/Policy
-1/maker
locality: 1/New
-1/York
END Add Block
BEGIN Delete Block
cn: 1/Bjorn
-1/Jensen
sn: 1/Jensen
title: 1/Accounting
-1/Manager
END Delete Block
BEGIN Update Block
BEGIN Old
cn: 1/Barbara
-1/J
-1-3/Jensen
-2/Gern
-2/O
-3/Horatio
sn: 1-3/Jensen
title: 1/Production
-1/Manager
-2/Testpilot
-3/Chiefpilot
END Old
BEGIN New
cn: 1/Barbara
-1/J
-1-3/Jensen
-2/Gern
-2/O
-3/Horatio
sn: 1-3/Jensen
title: 1/Production
-1/Manager
-2/Testpilot
-3/Chiefpilot
locality: 1/Jersey
-2/Orleans
-3/Caledonia
-1-3/New
END New
END Update Block

5.3.2 "tag"

version: x-tagged-index-1
updatetype: incremental
lastupdate: 855938804
thisupdate: 855939525
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: FULL
locality: TOKEN
END IO-Schema
BEGIN Add Block
cn: 5/Bo
-5/Didley
sn: 5/Didley
title: 5/Policy
-5/maker
locality: 5/New
-5/York
END Add Block
BEGIN Delete Block
cn: 2/Bjorn
-2/Jensen
sn: 2/Jensen
title: 2/Accounting
-2/Manager
END Delete Block
BEGIN Update Block
BEGIN New
locality: 1/Jersey
-2/Orleans
-4/Caledonia
-1,2,4/New
END New
END Update Block

5.3.3 "unique"

version: x-tagged-index-1
updatetype: incremental
lastupdate: 855938804
thisupdate: 855939525
BEGIN IO-Schema
cn: TOKEN
sn: FULL
title: FULL
locality: TOKEN
END IO-Schema
BEGIN Add Block
dn: 1/cn=Bo Didley, ou=Marketing, o=Ace Industry, c=US
cn: 1/Bo
-1/Didley
sn: 1/Didley
title: 1/Policy
-1/maker
locality: 1/New
-1/York
END Add Block
BEGIN Delete Block
dn: 1/cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
END Delete Block
BEGIN Update Block
BEGIN New
dn: 1/cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
-2/cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
-3/cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
locality: 1/Jersey
-2/Orleans
-3/Caledonia
-1-3/New
END New
END Update Block

6. Aggregation

6.1. Aggregation of Tagged Index Objects

Aggregation of two tagged index objects is done by merging the two lists of values and rewriting each tag list. The tag list rewriting process is done so that the resulting index object appears as if it came from a single source. Tags from one of the two tagged index objects are "mapped" to the number space above that used by the other tagged index object. An index server that aggregates tagged index objects for export MUST ensure that the export URL (i.e. the base-uri of the CIP object) for the aggregate index object will route all queries that have "hits" on the index object to that server (otherwise, query routing will not succeed).

7. Security Considerations

This specification provides a protocol for transferring information between two servers. The information transferred may be protected by laws in many countries, so care must be taken in the methods used to tokenize the data to ensure that protected data may not be reconstructed in full by the receiving server.

This protocol does not have any inherent protection against spoofing or eavesdropping. However, since this protocol is transported in MIME messages (as are all CIP index objects), it inherits all the security capabilities and liabilities of other MIME messages. Specifically, those wanting to prevent eavesdropping or spoofing may use some of the various techniques for signing and encrypting MIME messages.

Information Server administrators must decide what portions of their databases are appropriate for inclusion in the Tagged Index Object.
For distribution of information outside the enterprise, information server developers are encouraged to allow for facilities that hide the organizational structure when generating the Tagged Index Object from the underlying information database.

To allow for the secure transmission of Tagged Index Objects across the Internet, Index Servers should make use of SSL when completing the connection. In order to strongly verify the identity of the peer index server on the other side of the connection, SSL version 3 certificate exchange should be implemented, and the identity in the peer's certificate verified with the Public Key Infrastructure. If electronic mail is used to exchange the Tagged Index Objects, then a secure messaging facility, such as PGP/MIME or S/MIME, should be used to sign or encrypt (or both) the information.

8. References

[1] J. Allen, M. Mealling, "The Architecture of the Common Indexing Protocol (CIP)", Internet Draft (work in progress), June 1997.

[2] C. Weider, J. Fullton, S. Spero, "Architecture of the Whois++ Index Service", RFC 1913, February 1996.

[3] M. Wahl, T. Howes, S. Kille, "Lightweight Directory Access Protocol (v3)", RFC 2251, December 1997.

[4] ITU, "X.525 Information Technology - Open Systems Interconnection - The Directory: Replication", November 1993.

[5] "FORTEZZA Application Implementors Guide for the FORTEZZA Crypto Card (Production Version)", Document #PD4002102-1.01, SPYRUS, 1995.

[6] G. Good, "The LDAP Data Interchange Format (LDIF) - Technical Specification", Internet Draft (work in progress), November 1998.

[7] R. Hedberg, "LDAPv2 client Vs the Index Mesh", Internet Draft (work in progress), November 1997.

[8] T. Howes, M. Smith, "The LDAP URL Format", RFC 2255, December 1997.

[9] M. Elkins, "MIME Security with Pretty Good Privacy (PGP)", RFC 2015, October 1996.
[10] Blake Ramsdell, "S/MIME Version 3 Message Specification", Internet Draft (work in progress), August 1998.

[11] C. Allen, T. Dierks, "The TLS Protocol Version 1.0", Internet Draft (work in progress), November 1997.

9. Author's Addresses

Roland Hedberg
Catalogix
Dalsveien 53
0387 Oslo
Norway
Email: roland@catalogix.ac.se

Bruce Greenblatt
6841 Heaton Moor Drive
San Jose, CA 95119
USA
Email: bruceg@innetix.com
Phone: +1-408-224-5349

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA
EMail: jayhawk@att.com
Phone: +1 402 894-9456

Mark Wahl
Innosoft International, Inc.
8911 Capital of Texas Hwy, Suite 4140
Austin, TX 78759
USA
Phone: +1 626 919 3600
EMail: Mark.Wahl@innosoft.com

Table of Contents

1. Introduction
2. Background
3. Object
4. The Tagged Index Object
4.1. The Agreement
4.2. Content Type
4.3 Tagged Index BNF
4.3.1. Header Descriptions
4.3.2. Tokenization types
4.3.3. Tag Conventions
4.4. Incremental Indexing
5. Examples
5.1 The original database
5.1.1 "complete" consistency based full update
5.1.2 "tag" consistency based full update
5.1.3 "unique" consistency based full update
5.2 First update
5.2.1 "complete" consistency based incremental update
5.2.2 "tag" consistency based incremental update
5.2.3 "unique" consistency based incremental update
5.3 Second update
5.3.1 "complete" consistency based incremental update
5.3.2 "tag" consistency based incremental update
5.3.3 "unique" consistency based incremental update
6. Aggregation
6.1 Aggregation of Tagged Index Objects
7. Security Considerations
8. References
9. Author's Addresses