Internet-Draft                                                Ryan Moats
draft-rfced-info-moats-03.txt                                 Rick Huber
Expires in six months                                               AT&T
                                                           December 1998

       Building Directories from DNS: Experiences from WWWSeeker
                Filename: draft-rfced-info-moats-03.txt

Status of This Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
   ftp.isi.edu (US West Coast).
Abstract

   There has been much discussion and several documents written about
   the need for an Internet Directory.  Recently, this discussion has
   focused on ways to discover an organization's domain name without
   relying on use of DNS as a directory service.  This draft discusses
   lessons that were learned during InterNIC Directory and Database
   Services' development and operation of WWWSeeker, an application that
   finds a web site given information about the name and location of an
   organization.  The back end database that drives this application was
   built from information obtained from domain registries via WHOIS and
   other protocols.  We present this information to help future
   implementors avoid some of the blind alleys that we have already
   explored.  This work builds on the Netfind system that was created by
   Mike Schwartz and his team at the University of Colorado at Boulder
   [1].

1. Introduction

   Over time, there have been several RFCs [2, 3, 4] about approaches
   for providing Internet Directories.  Many of the earlier documents
   discussed white pages directories that supply mappings from a
   person's name to their telephone number, email address, etc.

   More recently, there has been discussion of directories that map from
   a company name to a domain name or web site.  Many people are using
   DNS as a directory today to find this type of information about a
   given company.  Typically when DNS is used, users guess the domain
   name of the company they are looking for and then prepend "www.".
   This makes it highly desirable for a company to have an easily
   guessable name.

   There are two major problems here.  As the number of assigned names
   increases, it becomes more difficult to get an easily guessable name.
   Also, the TLD must be guessed as well as the name.
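The guessing strategy described above amounts to generating candidate
"www."-prefixed FQDNs from a company name across a set of TLDs.  A
minimal sketch follows; the name-squashing rule and the TLD list are
purely illustrative assumptions, not anything the InterNIC service did:

```python
# Sketch of the "guess the name, prepend www." strategy described in
# the text.  The company name and TLD list below are illustrative only.

def candidate_sites(company, tlds=(".com", ".net", ".org")):
    """Generate the www-prefixed FQDNs a user might guess."""
    name = company.lower().replace(" ", "")  # naive name squashing
    return ["www." + name + tld for tld in tlds]

print(candidate_sites("Example Widgets"))
```

As the text notes, the candidate list grows with every TLD in general
use, which is exactly why guessing scales poorly.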
   While many users just guess ".COM" as the "default" TLD today, there
   are many two-letter country code top-level domains in current use as
   well as other gTLDs (.NET, .ORG, and possibly .EDU), with the
   prospect of additional gTLDs in the future.  As the number of TLDs in
   general use increases, guessing gets more difficult.

   Between July 1996 and our shutdown in March 1998, the InterNIC
   Directory and Database Services project maintained the Netfind search
   engine [1] and the associated database that maps organization
   information to domain names.  This database thus acted as the type of
   Internet directory that associates company names with domain names.
   We also built WWWSeeker, a system that used the Netfind database to
   find web sites associated with a given organization.  The experience
   gained from maintaining and growing this database provides valuable
   insight into the issues of providing a directory service.  We present
   it here to allow future implementors to avoid some of the blind
   alleys that we have already explored.

2. Directory Population

2.1 What to do?

   There are two issues in populating a directory: finding all the
   domain names (building the skeleton) and associating those domains
   with entities (adding the meat).  These two issues are discussed
   below.

2.2 Building the skeleton

   In "building the skeleton," it is popular to suggest using a variant
   of a "tree walk" to determine the domains that need to be added to
   the directory.  Our experience is that this is neither a reasonable
   nor an efficient way to maintain such a directory.  Except for some
   infrequent and long-standing DNS surveys [5], DNS "tree walks" tend
   to be discouraged by the Internet community, especially given that
   the frequency of DNS changes would require a new tree walk monthly
   (if not more often).
   Instead, our experience has shown that data on allocated DNS domains
   can usually be retrieved in bulk fashion with FTP, HTTP, or Gopher
   (we have used each of these for particular TLDs).  This has the added
   advantage of both "building the skeleton" and "adding the meat" at
   the same time.  Our favorite method for finding a server that has
   allocated DNS domain information is to start with the list maintained
   at http://www.alldomains.com/countryindex.html and go from there.
   Before this list was available, it was necessary to hunt for a
   registry by trial and error.

   When maintaining the database, existing domains may be verified via
   direct DNS lookups rather than a "tree walk."  "Tree walks" should
   therefore be the choice of last resort for directory population, and
   bulk retrieval should be used whenever possible.

2.3 Adding the meat

   A possibility for populating a directory ("adding the meat") is to
   use an automated system that makes repeated queries using the WHOIS
   protocol to gather information about the organization that owns a
   domain.  The queries would be made against a WHOIS server located
   with the above method.  At the conclusion of the InterNIC Directory
   and Database Services project, our backend database contained about
   2.9 million records built from data that could be retrieved via
   WHOIS.  The entire database contained 3.25 million records, with the
   additional records coming from sources other than WHOIS.

   In our experience this information contains many factual and
   typographical errors and requires further examination and processing
   to improve its quality.  Further, TLD registrars that support WHOIS
   typically only support WHOIS information for second-level domains
   (e.g., ne.us) as opposed to lower-level domains (e.g.,
   windrose.omaha.ne.us).
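An automated WHOIS client of the sort described above can be sketched
as follows.  This is only an illustration: the server name in the usage
comment is an assumption, and, as the text notes, record formats vary
widely between registries, so any field parser is necessarily
best-effort:

```python
import socket

def whois_query(domain, server, port=43, timeout=10):
    """Send a WHOIS query (domain name followed by CRLF, per the
    classic protocol) and return the raw text of the response."""
    with socket.create_connection((server, port), timeout=timeout) as s:
        s.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1", "replace")

def extract_field(response, label):
    """Pull the first 'Label: value' line from a WHOIS response.
    Registries label fields differently, so this is best-effort."""
    for line in response.splitlines():
        if line.lower().startswith(label.lower() + ":"):
            return line.split(":", 1)[1].strip()
    return None

# Usage (requires network access; the server name is illustrative):
#   text = whois_query("example.com", "whois.internic.net")
#   org = extract_field(text, "Registrant")
```

A production harvester would add retries, rate limiting, and per-TLD
parsing rules, since (as noted above) some TLDs have no WHOIS service
at all.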
   Also, there are TLDs without registrars, TLDs without WHOIS support,
   and still other TLDs that use other methods (HTTP, FTP, Gopher) for
   providing organizational information.  Based on our experience, an
   implementor of an Internet directory needs to support multiple
   protocols for directory population.  An automated WHOIS search tool
   is necessary, but it isn't enough.

3. Directory Updating: Full Rebuilds vs. Incremental Updates

   Given the size of our database in April 1998 when it was last
   generated, a complete rebuild of the database that is available from
   WHOIS lookups would require between 134.2 and 167.8 days just for
   WHOIS lookups from a Sun SPARCstation 20.  This estimate does not
   include other considerations (for example, inverting the token tree
   required about 24 hours of processing time on a Sun SPARCstation 20)
   that would increase the amount of time needed to rebuild the entire
   database.

   Whether this is feasible depends on the frequency of database updates
   provided.  Because of the rate of growth of allocated domain names
   (150K-200K new allocated domains per month in early 1998), we
   provided monthly updates of the database.  To rebuild the database
   each month (based on the above time estimate) would require between 3
   and 5 machines to be dedicated full time (independent of machine
   architecture).  Instead, we checkpointed the allocated domain list
   and rebuilt on an incremental basis during one weekend of the month.
   This allowed us to complete the update on between 1 and 4 machines (3
   Sun SPARCstation 20s and a dual-processor SPARCserver 690) without
   full dedication over a couple of days.
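The checkpoint-and-increment scheme described above amounts to diffing
the current allocated-domain list against the last checkpoint, so that
only the changed entries (rather than the whole database) need fresh
WHOIS lookups.  A minimal sketch, with illustrative domain names:

```python
def incremental_update(checkpoint, current):
    """Compare a checkpointed domain list against the current one.
    Returns (to_add, to_retire): only these entries need new lookups,
    instead of rebuilding the entire database."""
    old, new = set(checkpoint), set(current)
    return sorted(new - old), sorted(old - new)

added, removed = incremental_update(
    ["alpha.com", "beta.org"],      # last month's checkpoint
    ["alpha.com", "gamma.net"],     # current allocated-domain list
)
print(added, removed)
```

This captures the tradeoff the text describes: records untouched by the
diff keep their possibly stale data until a periodic refresh revisits
them.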
   Further, by coupling incremental updates with a periodic refresh of
   existing data (which can be done during another part of the month and
   doesn't require full dedication of machine hardware), older records
   would be periodically updated when the underlying information
   changes.  The tradeoff is timeliness and accuracy of data (some data
   in the database may be old) against hardware and processing costs.

4. Directory Presentation: Distributed vs. Monolithic

   While a distributed directory is a desirable goal, we maintained our
   database as a monolithic structure.  Given past growth, it is not
   clear at what point migrating to a distributed directory becomes
   necessary to support customer queries.  Our last database contained
   over 3.25 million records in a flat ASCII file.  Searching was done
   via a Perl script against an inverted tree (also produced by a Perl
   script).  While admittedly primitive, this configuration supported
   over 200,000 database queries per month from our production servers.

   Increasing the database size only requires more disk space to hold
   the database and the inverted tree.  Of course, using database
   technology would probably improve performance and scalability, but we
   had not reached the point where this technology was required.

5. Security

   The underlying data for the type of directory discussed in this
   document is already generally available through WHOIS, DNS, and other
   standard interfaces.  No new information is made available by using
   these techniques, though many types of search become much easier.  To
   the extent that easier access to this data makes it easier to find
   specific sites or machines to attack, security may be decreased.

   The protocols discussed here do not have built-in security features.
   If one source machine is spoofed while the directory data is being
   gathered, substantial amounts of incorrect and misleading data could
   be pulled into the directory and spread to a wider audience.

   In general, building a directory from registry data will not open any
   new security holes, since the data is already available to the
   public.  Existing security and accuracy problems with the data
   sources are, however, likely to be amplified.

6. Acknowledgments

   The work described in this document was partially supported by the
   National Science Foundation under Cooperative Agreement NCR-9218179.

7. References

   Request For Comments (RFC) documents are available at
   http://info.internet.isi.edu/1/in-notes/rfc and from numerous mirror
   sites.

   [1] Schwartz, M. F. and C. Pu, "Applying an Information Gathering
       Architecture to Netfind: A White Pages Tool for a Changing and
       Growing Internet", University of Colorado Technical Report
       CU-CS-656-93, December 1993, revised July 1994.

   [2] Sollins, K., "A Plan for Internet Directory Services", RFC 1107,
       July 1989.

   [3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R., and S.
       Kent, "A Strategic Plan for Deploying an Internet X.500
       Directory Service", RFC 1430, February 1993.

   [4] Postel, J. and C. Anderson, "White Pages Meeting Report",
       RFC 1588, February 1994.