INTERNET-DRAFT                                              M. Mealling
Expires six months from June 1998               Network Solutions, Inc.
Intended category: Experimental
draft-mealling-human-friendly-identifier-arch-00.txt

       An Architecture for Supporting Human Friendly Identifiers

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as work in progress.

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document describes an architecture that satisfies the requirements for Human Friendly Identifiers (HFIs) as specified in [HFI-REQ]. Specifically, it describes the URI scheme "go" as an HFI encoding mechanism, a protocol for the resolution of HFIs, and a scalable and open infrastructure for resolving those HFIs.

1. The Architecture

This architecture borrows heavily from DNS, both in having local servers hold the data while the root holds only referrals, and in having an operational organization that reflects the current direction of DNS root management toward registrars and registries. There are five distinct parts of the architecture:

   The Root -- Due to the flatness of the HFI space, this service will be heavily loaded, so the data served from the root should be small. Like DNS, it should contain only referrals to locally maintained servers.
   This can also be thought of as a registry in the parlance of the current gTLD debate.

   Registrars -- There are two classes of registrars: qualified and unqualified. Qualified registrars offer a guaranteed level of service for the data presented by the service. The distinction between qualified and unqualified data is presented to the client through ranking: hits from qualified registrars are ranked higher than those from unqualified registrars. Unqualified registrars can be any entity, which allows anyone to write entries to the root. Qualified and unqualified entries are discussed in more detail in Section 4.

   Content Servers -- Since referrals are the only entries kept in the root, the actual data returned during resolution is retrieved from a separate server. This server can be maintained by a registrar or by the entity that requests the HFI.

   Local Servers -- Much like in DNS, these servers act as caches and contain data for use only by the local entity. They use the same protocol as the root and operate on either a chained or a referral basis, depending on their configuration.

   Clients -- The entire reason for most systems, clients are the part that actually sends queries and processes results.
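The division of labor above, in particular the rule that hits from qualified registrars are ranked above those from unqualified ones, can be illustrated with a toy in-memory root. This is a minimal sketch, not part of the specification: the names RootEntry and lookup are hypothetical, the HFI is compared as a plain string (ignoring its "go:" encoding), contexts are assumed to be pre-normalized strings compared by simple equality, and the "exact" and "substring" match types are the ones described under Match Semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RootEntry:
    """Hypothetical root record: the HFI, its contexts, and a referral only."""
    hfi: str          # identifier string as registered, e.g. "Nike"
    host: str         # Content Server host (the referral)
    port: int         # Content Server port (the referral)
    unique_id: str    # selects one object on that Content Server
    qualified: bool   # True if written by a qualified registrar
    contexts: Dict[str, str] = field(default_factory=dict)  # normalized codes


def lookup(entries: List[RootEntry], query_type: str, hfi: str,
           contexts: Dict[str, str]) -> List[RootEntry]:
    """Return matching referrals, with qualified entries ranked first."""
    # 1. Syntactic match on the exact Unicode values of the HFI strings.
    if query_type == "exact":
        hits = [e for e in entries if e.hfi == hfi]
    elif query_type == "substring":
        hits = [e for e in entries if hfi in e.hfi]
    else:
        raise ValueError(f"unsupported query type: {query_type}")
    # 2. Filter by user-supplied contexts using plain equality on values
    #    normalized outside the root (assumption: a missing context is a miss).
    hits = [e for e in hits
            if all(e.contexts.get(name) == value
                   for name, value in contexts.items())]
    # 3. Rank: qualified entries (key False) sort before unqualified ones
    #    (key True); sorted() is stable, so order within a group is kept.
    return sorted(hits, key=lambda e: not e.qualified)
```

Keeping the root's work to equality tests on short codes reflects the point that the root should not need, for example, a GIS back-end; any richer interpretation of a context would happen when its value is normalized, outside the root.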
   [Figure 1 diagram: Registrars write Qualified Entries, and end users write Unqualified Entries, to the Root (Registry), which holds HFIs, referrals and contexts. A Client sends a Resolution Request to a Department Level Server, which may hold local content; Root Queries travel upward through an Enterprise Level Server to the Root. Referrals come back toward the Client, which then sends a Content Request to the Content Server.]

                                Figure 1.

Data Model

The data model used by the architecture is fairly simple. The root contains only the identifier string, zero or more discriminating contexts, and enough information to refer the client to the host that holds the required data. Contexts are values specified by the registrant that distinguish a particular HFI from other HFIs with the same value. Potential contexts include geographic region, topic area/industry segment, popularity, or a unique identifier. It has not yet been determined which contexts, if any, are required.

The metadata returned to the client resides on Content Servers. The referral returned to the client contains a host/port tuple that identifies the Content Server (plus a unique-id, see Section 3.1.4). The data maintained there is encoded as an RDF [1] object that adheres to the RDF Schema specified in Appendix A. Since RDF allows multiple schemas, the maintainer of a Content Server can include community-specific information in the returned object. The client is only required to understand the schema in Appendix A.

Match Semantics

The first match is done on the HFI itself.
The user's query can specify simple syntactic matches at this point. Since the HFI is in Unicode, language-specific matches may also be possible; Unicode-specific match semantics are a topic of much discussion. Once one or more syntactic matches are made, the user-supplied contexts are matched against the result set.

Due to the expected size of and load on the root, contexts should be thought of as simple scalar values. For example, if geographic area is specified as a context, its values should be normalized outside of the root. This allows the root to do very simple, fast comparisons on normalized codes; the root should not be required to support a GIS back-end in order to understand geographic location.

Syntactic matches are matches based on the exact Unicode values of the HFI strings. These include exact matches and, where appropriate, substring matches. It is probably NOT possible to support soundex-style matches across such a large, multilingual dataset.

2. The "go" URI scheme

In order for an HFI to be used within the existing Internet and World Wide Web infrastructure, it must adhere to the syntax and semantics of Uniform Resource Identifiers [RFC2396]. The HFI requirement that identifiers be short suggests a URI scheme name that is small but recognizable. Thus, the scheme "go" is specified as the default method of expressing an HFI.

The "go" scheme contains a single element, which is the HFI itself. Since the HFI is required to be internationalized, the scheme must be able to handle any language or character set. This requirement suggests UTF-8 encoded Unicode, with any octets not allowed in a URI written as %-escapes (as in the examples below).

When displayed to the user, an HFI should not be shown in its URI-encoded form unless no other form is available. Instead, it should be shown according to the localization rules of the user.

As with URNs (and most URIs, for that matter), the "go" scheme is considered independent of its resolution method.
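The encoding rule for "go" URIs (UTF-8 first, then %-escaping) and the rule that users see the decoded form can be sketched with Python's standard urllib.parse.quote and unquote. This is a minimal sketch: the function names are hypothetical, and escaping every reserved character (safe="") is an assumption made so that an HFI such as "AC/DC" stays a single opaque element; the draft does not say which reserved characters, if any, may appear unescaped. The encoded forms it produces match the examples that follow.

```python
from urllib.parse import quote, unquote

SCHEME = "go:"


def encode_go_uri(displayed_hfi: str) -> str:
    """Return the encoded "go" URI for an HFI as the user would type it."""
    # quote() encodes the text as UTF-8 and %-escapes every octet outside
    # the unreserved set; safe="" (an assumption) escapes "/" as well.
    return SCHEME + quote(displayed_hfi, safe="")


def display_form(go_uri: str) -> str:
    """Return the human-readable HFI carried by an encoded "go" URI."""
    # URI scheme names are case-insensitive, so "GO:" is accepted too.
    if go_uri[:len(SCHEME)].lower() != SCHEME:
        raise ValueError(f"not a go URI: {go_uri!r}")
    # unquote() turns %HH escapes back into UTF-8 decoded text.
    return unquote(go_uri[len(SCHEME):])
```

A client following the display rule would show the output of display_form to the user and keep the encoded form for protocol messages.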
While a resolution protocol for "go" URIs is specified in this document, the reader should take care to realize that a "go" URI can and will be resolved by other protocols.

Example:

   Displayed Form             Encoded Form
   -------------------------  ---------------------------------
   go:Nike                    go:Nike
   go:Network Solutions       go:Network%20Solutions
   go:Martin J. Duerst        go:Martin%20J.%20D%C3%BCrst

NOTE: In the last example, the limits of this ASCII document do not allow the correct representation of Martin's last name in the displayed form, which contains a "u" with umlaut (U+00FC); the encoded form carries it as the UTF-8 octets %C3%BC.

3. The HFI resolution protocols

This architecture has several client-server interactions of differing flavors. The protocol between the qualified registrars and the registry is largely out of scope, since it is an operational matter that may have its own policy and security issues. The query protocol between the Client and the Local Servers should be identical to the query protocol used with the root, since there should not be any architectural difference between the two. The protocol between the Client and the Content Server can be any existing retrieval protocol; HTTP is an obvious candidate and is the one used in Section 3.2.

3.1 Client to Server Query Protocol (CSQP?)

For speed, the protocol should be simple and small. For a low barrier to adoption, it should not require a great deal of encoding. To balance these, the protocol uses UTF-8 encoded Unicode text. The interaction is simple: the client connects and issues a query, after which the server responds with one or more referrals. Since both the query and the response are atomic, the protocol can use either TCP or UDP as its transport. Over TCP it uses a simple text-based, line-oriented interaction, while over UDP it uses simple TFTP-style [RFC1350] packet reconstruction.

3.1.1 UDP Interaction

Specification of the exact UDP interaction should go here. See TFTP [RFC1350] for a good example of how it should be done.

3.1.2 TCP Interaction

Specification of the exact TCP interaction should go here.
This should be fairly easy, since it is simply the UDP version without block numbers or acknowledgments.

3.1.3 The Query

   /* Author's Temporary Comment: These formats are arbitrary and */
   /* thus can (and probably should) be changed.                  */

A query is made up of three elements: the query type, the HFI, and zero or more contexts. The query type and HFI occupy the first line, each context occupies a line of its own, and an empty line ends the query:

   query         = query-type " " hfi crlf *(context crlf) crlf
   query-type    = "substring" / "exact" / 1*alphadigit
   hfi           = <"go" scheme URI>
   context       = context-name ":" context-value
   context-name  = 1*alphadigit
   context-value = 1*alphadigit
   alphadigit    = alpha / digit / "_" / "-"
   alpha         = "a".."z" / "A".."Z"
   digit         = "0".."9"
   lf            = <US-ASCII LF, linefeed (10)>
   cr            = <US-ASCII CR, carriage return (13)>
   crlf          = cr lf

Examples:

   substring go:Nike
   location:us-ga-atlanta-lawrenceville
   industry:28

This example shows a substring query for the HFI "Nike" in the city of Lawrenceville (Georgia, US), where the entity is in International Trademark Class 28 ("Toys and sporting goods").

   exact go:Network%20Solutions
   location:us
   industry:38

This example shows an exact query for the HFI "Network Solutions" in the United States, where the entity is in International Trademark Class 38 ("Communication services").

3.1.4 The Response

   /* Author's Temporary Comment: These formats are arbitrary and */
   /* thus can (and probably should) be changed.                  */

A response is a simple list of hits, where each hit is a tuple of the actual HFI that matched, the domain name of the Content Server, the port on which to contact that host, and a unique id that the Content Server uses to ensure that the correct HFI is requested.
It is in the following format:

   response   = *hit
   hit        = hfi " " domain " " port " " unique-id crlf
   hfi        = <"go" scheme URI>
   domain     = <domain name of the Content Server>
   port       = "0".."65535"
   unique-id  = 1*(alphadigit / "." / "@")
   alphadigit = alpha / digit / "_" / "-"
   alpha      = "a".."z" / "A".."Z"
   digit      = "0".."9"
   lf         = <US-ASCII LF, linefeed (10)>
   cr         = <US-ASCII CR, carriage return (13)>
   crlf       = cr lf

Example:

   go:Network%20Solutions services.netsol.com 8080 01BDF839.D979BBA0@netsol.com

This example shows the HFI that matched ("Network Solutions"), the host to be contacted (services.netsol.com), the port (8080), and the unique-id (01BDF839.D979BBA0@netsol.com). The unique-id identifies the object to be retrieved from the Content Server; it is needed for cases where a Content Server maintains multiple objects that share the same HFI.

3.2 The Content Retrieval Protocol

The protocol for retrieving the actual RDF object is HTTP. The Content Server host is contacted on the given port and a path is requested: "/hfi/" followed by the unique-id found in the referral, %-escaped where necessary. The response from the server should be a text/xml or application/xml object that contains an RDF object following the specification in Appendix A.

Example: The user requests the HFI "go:Network%20Solutions" and is presented with the hit from the example above. The client then connects to services.netsol.com on port 8080 and, using HTTP, requests the resource "/hfi/01BDF839.D979BBA0%40netsol.com" (the "@" is escaped as %40). The response should be an application/xml or text/xml resource that contains the RDF object. All standard HTTP functions are valid.

4. Qualified vs Unqualified

Non-registrars are allowed to write unqualified entries to the root in order to serve the two communities targeted by HFIs: the business community and end users. Businesses want an HFI of higher quality that has some uniqueness to it; for them, trademark is extremely important. The end user is simply looking for a cool identifier to use with friends and online contacts.
Uniqueness and trademark status are unimportant here, whereas coolness and vanity are of the utmost importance.

For the system to serve both, the two types of entries need to be distinguishable. For example, the South Park cartoon character Cartman is an important trademark for Comedy Central. At the same time, South Park's popularity has led many online game players to use Cartman as the nickname of their online character. Both can use the identifier go:Cartman without any confusion as to which is Comedy Central's official Cartman HFI, since Comedy Central's entry, written through a qualified registrar, is ranked above the unqualified ones.

An additional benefit is that, because the root contains both kinds of entries, Comedy Central has a fairly easy way to check for infringers and, if it so desires, to find unqualified entries against which it may wish to pursue infringement litigation.

7. Author Contact Information

   Michael Mealling
   Network Solutions
   505 Huntmar Park Drive
   Herndon, VA 22070

   voice: (703) 742-0400
   fax:   (703) 742-9552
   email: michaelm@rwhois.net

Appendix A -- XML DTD for Content

This is just an example. I'm sure it will end up being a bit more elaborate than this.