idnits 2.17.1

draft-rfced-info-moats-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

          This Internet-Draft is submitted in full conformance with the
          provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

          Copyright (c) 2024 IETF Trust and the persons identified as the
          document authors. All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

          This document is subject to BCP 78 and the IETF Trust's Legal
          Provisions Relating to IETF Documents
          (https://trustee.ietf.org/license-info) in effect on the date of
          publication of this document. Please review these documents
          carefully, as they describe your rights and restrictions with
          respect to this document. Code Components extracted from this
          document must include Simplified BSD License text as described in
          Section 4.e of the Trust Legal Provisions and are provided without
          warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 3
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 4 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There is 1 instance of too long lines in the document, the longest one
     being 32 characters in excess of 72.

  ** There are 3 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([1]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 1998) is 9446 days in the past.
     Is this intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Downref: Normative reference to an Informational RFC: RFC 1107 (ref. '2')

  ** Downref: Normative reference to an Informational RFC: RFC 1430 (ref. '3')

  ** Downref: Normative reference to an Informational RFC: RFC 1588 (ref. '4')

  ** Downref: Normative reference to an Experimental RFC: RFC 2345 (ref. '5')

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'


     Summary: 16 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

INTERNET DRAFT             EXPIRES JAN 1999             INTERNET DRAFT
Internet-Draft                                              Ryan Moats
                                                            Rick Huber
Expires in six months                                             AT&T
                                                             June 1998

       Building Directories from DNS: Experiences from WWWSeeker

Status of This Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups. Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East
   Coast), or ftp.isi.edu (US West Coast).

Abstract

   There has been much discussion and several documents written about
   the need for an Internet Directory. Recently, this discussion has
   focussed on ways to discover an organization's domain name without
   relying on use of DNS as a directory service. This draft discusses
   lessons that were learned during InterNIC Directory and Database
   Services' development and operation of WWWSeeker, an application that
   finds a web site given information about the name and location of an
   organization. The back end database that drives this application was
   built from information obtained from domain registries via WHOIS and
   other protocols. We present this information to help future
   implementors to avoid some of the blind alleys that we have already
   explored. This work builds on the Netfind system that was created by
   Mike Schwartz and his team at the University of Colorado at Boulder
   [1].

1. Introduction

   Over time, there have been several RFCs [2, 3, 4] about approaches
   for providing Internet Directories. Many of the earlier documents
   discussed white pages directories that supply mappings from a
   person's name to their telephone number, email address, etc.

   More recently, there has been discussion of directories that map from
   a company name to a domain name or web site [5]. Many people are
   using DNS as a directory today to find this type of information about
   a given company. Typically when DNS is used, users guess the domain
   name of the company they are looking for and then prepend "www.".
   This makes it highly desirable for a company to have an easily
   guessable name.

   There are two major problems here. As the number of assigned names
   increases, it becomes more difficult to get an easily guessable name.
   Also, the TLD must be guessed as well as the name.
   While many users just guess ".COM" as the "default" TLD today, there
   are many two-letter country code top-level domains in current use as
   well as other gTLDs (.NET, .ORG, and possibly .EDU) with the prospect
   of additional gTLDs soon. As the number of TLDs in general use
   continues to increase, guessing gets more difficult every day.

   Between July 1996 and our shutdown in March 1998, the InterNIC
   Directory and Database Services project maintained the Netfind search
   engine [1] and the associated database that maps organization
   information to domain names and thus acts as the type of Internet
   directory that associates company names with domain names. We also
   built WWWSeeker, a system that used the Netfind database to find web
   sites associated with a given organization. The experience gained
   from maintaining and growing this database provides valuable insight
   into the issues of providing a directory service. We present it here
   to allow future implementors to avoid some of the blind alleys that
   we have already explored.

2. Directory Population

2.1 Using WHOIS to Populate the Directory

   One proposal for populating a directory is to use WHOIS to gather
   information about the organization that owns a domain. At the
   conclusion of the InterNIC Directory and Database Services project,
   our backend database contained about 2.9 million records that have
   data that could be retrieved via WHOIS. The entire database
   contained 3.25 million records, with the additional records coming
   from sources other than WHOIS.

   In our experience this information contains a significant number of
   factual and typographical errors and requires further examination and
   processing to improve its quality. Also, those TLDs that have
   registrars that support WHOIS typically only support WHOIS
   information for second level domains (e.g., ne.us) as opposed to
   lower level domains (e.g., windrose.omaha.ne.us).
   Further, there are TLDs without registrars, TLDs without WHOIS
   support, and still other TLDs that use other methods (HTTP, FTP,
   gopher) for providing organizational information. Based on our
   experience, an implementor of an Internet directory needs to support
   multiple protocols for directory population.

2.2 Using "Tree Walks" to Populate the Directory

   Another proposal is to use a variant of a "Tree Walk" to determine
   the domains that need to be added to the directory. Our experience
   is that this is neither a reasonable nor an efficient proposal for
   maintaining such a directory. Except for some infrequent and
   long-standing DNS surveys [6], DNS "tree walks" tend to be
   discouraged by the Internet community, especially given that the
   frequency of DNS changes would require a new tree walk monthly.
   Also, our experience has shown that data on allocated DNS domains
   can usually be retrieved via other faster and more efficient methods
   (FTP, HTTP, etc.).

   Since existing domains in the database may be verified via direct DNS
   lookups rather than a "tree walk," "tree walks" should be the choice
   of last resort for directory population.

3. Directory Updating: Full Rebuilds vs Incremental Updates

   Given the size of our database in April 1998 when it was last
   generated, a complete rebuild of the portion of the database that is
   available from WHOIS lookups would require between 11.6 million and
   14.5 million seconds of time. This estimate does not include other
   considerations that would increase the amount of time to rebuild the
   entire database.

   Whether this is feasible depends on the frequency of database updates
   provided. Because of the rate of growth of allocated domain names
   (150K-200K new allocated domains per month), we provided monthly
   updates of the database. To rebuild the database each month would
   require between 3 and 5 machines to be dedicated full time to the
   task.
   Instead, we checkpointed the allocated domain list and rebuilt
   on an incremental basis during one weekend of the month. This
   allowed us to complete the update on between 1 and 4 machines without
   full dedication over a couple of days. Further, by coupling
   incremental updates with periodic refresh of existing data (which can
   be done during another part of the month, and doesn't require full
   dedication of machine hardware), older records would be periodically
   updated when the underlying information changes. The tradeoff is
   timeliness and accuracy of data (some data in the database may be
   old) against hardware and processing costs.

4. Directory Presentation: Distributed vs Monolithic

   While a distributed directory is a desirable goal, we maintained our
   database as a monolithic structure. Given past growth, it is not
   clear at what point migrating to a distributed directory becomes
   actually necessary to support customer queries. Our last database
   contained over 3.25 million records in a flat ASCII file. Searching
   was done via a Perl script that searched an inverted tree (also
   produced by a Perl script). While admittedly primitive, this
   configuration supported over 200,000 database queries per month from
   our production servers.

   Increasing the database size only requires more disk space to hold
   the database and inverted tree. Of course, using database technology
   would probably improve performance and scalability, but we had not
   reached the point where this technology was required.

5. Acknowledgments

   The work described in this document was partially supported by the
   National Science Foundation under Cooperative Agreement NCR-9218179.

6. References

   Request For Comments (RFC) documents are available at
   http://info.internet.isi.edu/1/in-notes/rfc and from numerous mirror
   sites.

   [1]  M. F. Schwartz, C. Pu.
"Applying an Information 176 Gathering Architecture to Netfind: A White Pages 177 Tool for a Changing and Growing Internet," Univer- 178 sity of Colorado Technical Report CU-CS-656-93. 179 December 1993, revised July 1994. 180 182 [2] K. Sollins, Plan for Internet Directory Services, 183 RFC 1107, July 1989. 185 [3] S. Hardcastle-Kille, E. Huizer, V.Cerf, R. Hobby, 186 S. Kent, A Strategic Plan for Deploying an Internet 187 X.500 Directory Service, RFC 1430, February 1993. 189 [4] J. Postel & C. Anderson, White Pages Meeting 190 Report, RFC 1588, February 1994. 192 [5] J. Klensin, T. Wolf, G. Oglesby, Domain Names and 193 Company Name Retrieval, RFC 2345, May 1998. 195 [6] M. Lottor, "Network Wizards Internet Domain Sur- 196 vey," available from 197 http://www.nw.com/zone/WWW/top.html 199 7. Authors' addresses 201 Ryan Moats Rick Huber 202 AT&T AT&T 203 15621 Drexel Circle Room 1B-433, 101 Crawfords Corner Road 204 Omaha, NE 68135-2358 Holmdel, NJ 07733-3030 205 USA USA 207 EMail: jayhawk@att.com Email: rvh@att.com 209 INTERNET DRAFT EXPIRES JAN 1999 INTERNET DRAFT