| < draft-rfced-info-moats-02.txt | draft-rfced-info-moats-03.txt > | |||
|---|---|---|---|---|
| Internet-Draft Ryan Moats | Internet-Draft Ryan Moats | |||
| draft-rfced-info-moats-02.txt Rick Huber | draft-rfced-info-moats-03.txt Rick Huber | |||
| Expires in six months AT&T | Expires in six months AT&T | |||
| October 1998 | December 1998 | |||
| Building Directories from DNS: Experiences from WWWSeeker | Building Directories from DNS: Experiences from WWWSeeker | |||
| Filename: draft-rfced-info-moats-02.txt | Filename: draft-rfced-info-moats-03.txt | |||
| Status of This Memo | Status of This Memo | |||
| This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its | documents of the Internet Engineering Task Force (IETF), its | |||
| areas, and its working groups. Note that other groups may also | areas, and its working groups. Note that other groups may also | |||
| distribute working documents as Internet-Drafts. | distribute working documents as Internet-Drafts. | |||
| Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
| months and may be updated, replaced, or obsoleted by other | months and may be updated, replaced, or obsoleted by other | |||
| skipping to change at page 1, line 34 ¶ | skipping to change at page 1, line 34 ¶ | |||
| To learn the current status of any Internet-Draft, please check | To learn the current status of any Internet-Draft, please check | |||
| the ``1id-abstracts.txt'' listing contained in the Internet- | the ``1id-abstracts.txt'' listing contained in the Internet- | |||
| Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net | Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net | |||
| (Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East | (Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East | |||
| Coast), or ftp.isi.edu (US West Coast). | Coast), or ftp.isi.edu (US West Coast). | |||
| Abstract | Abstract | |||
| There has been much discussion and several documents written about | There has been much discussion and several documents written about | |||
| the need for an Internet Directory. Recently, this discussion has | the need for an Internet Directory. Recently, this discussion has | |||
| focussed on ways to discover an organization's domain name without | focused on ways to discover an organization's domain name without | |||
| relying on use of DNS as a directory service. This draft discusses | relying on use of DNS as a directory service. This draft discusses | |||
| lessons that were learned during InterNIC Directory and Database | lessons that were learned during InterNIC Directory and Database | |||
| Services' development and operation of WWWSeeker, an application that | Services' development and operation of WWWSeeker, an application that | |||
| finds a web site given information about the name and location of an | finds a web site given information about the name and location of an | |||
| organization. The back end database that drives this application was | organization. The back end database that drives this application was | |||
| built from information obtained from domain registries via WHOIS and | built from information obtained from domain registries via WHOIS and | |||
| other protocols. We present this information to help future | other protocols. We present this information to help future | |||
| implementors avoid some of the blind alleys that we have already | implementors avoid some of the blind alleys that we have already | |||
| explored. This work builds on the Netfind system that was created by | explored. This work builds on the Netfind system that was created by | |||
| Mike Schwartz and his team at the University of Colorado at Boulder | Mike Schwartz and his team at the University of Colorado at Boulder | |||
| [1]. | [1]. | |||
| INTERNET DRAFT Building Directories from DNS: Experiences from | INTERNET DRAFT Building Directories from DNS: Experiences from | |||
| WWWSeeker October 1998 | WWWSeeker December 1998 | |||
| 1. Introduction | 1. Introduction | |||
| Over time, there have been several RFCs [2, 3, 4] about approaches | Over time, there have been several RFCs [2, 3, 4] about approaches | |||
| for providing Internet Directories. Many of the earlier documents | for providing Internet Directories. Many of the earlier documents | |||
| discussed white pages directories that supply mappings from a | discussed white pages directories that supply mappings from a | |||
| person's name to their telephone number, email address, etc. | person's name to their telephone number, email address, etc. | |||
| More recently, there has been discussion of directories that map from | More recently, there has been discussion of directories that map from | |||
| a company name to a domain name or web site. Many people are using | a company name to a domain name or web site. Many people are using | |||
| skipping to change at page 3, line 5 ¶ | skipping to change at page 3, line 5 ¶ | |||
| 2. Directory Population | 2. Directory Population | |||
| 2.1 What to do? | 2.1 What to do? | |||
| There are two issues in populating a directory: finding all the | There are two issues in populating a directory: finding all the | |||
| domain names (building the skeleton) and associating those domains | domain names (building the skeleton) and associating those domains | |||
| with entities (adding the meat). These two issues are discussed | with entities (adding the meat). These two issues are discussed | |||
| below: | below: | |||
| 2.2 Building the skeleton | 2.2 Building the skeleton | |||
| In "building the skeleton," it is popular to suggest using a variant | In "building the skeleton," it is popular to suggest using a variant | |||
| of a "tree walk" to determine the domains that need to be added to | of a "tree walk" to determine the domains that need to be added to | |||
| the directory. Our experience is that this is neither a reasonable | the directory. Our experience is that this is neither a reasonable | |||
| nor an efficient proposal for maintaining such a directory. Except | nor an efficient proposal for maintaining such a directory. Except | |||
| for some infrequent and long-standing DNS surveys [5], DNS "tree | for some infrequent and long-standing DNS surveys [5], DNS "tree | |||
| walks" tend to be discouraged by the Internet community, especially | walks" tend to be discouraged by the Internet community, especially | |||
| given that the frequency of DNS changes would require a new tree walk | given that the frequency of DNS changes would require a new tree walk | |||
| monthly (if not more often). Instead, our experience has shown that | monthly (if not more often). Instead, our experience has shown that | |||
| data on allocated DNS domains can usually be retrieved in bulk | data on allocated DNS domains can usually be retrieved in bulk | |||
| fashion with FTP, HTTP, or Gopher (we have used each of these for | fashion with FTP, HTTP, or Gopher (we have used each of these for | |||
| particular TLDs). This has the added advantage of both "building the | particular TLDs). This has the added advantage of both "building the | |||
| skeleton" and "adding the meat" at the same time. | skeleton" and "adding the meat" at the same time. Our favorite | |||
| method for finding a server that has allocated DNS domain information | ||||
| is to start with the list maintained at | ||||
| http://www.alldomains.com/countryindex.html and go from there. | ||||
| Before this was available, it was necessary to hunt for a registry | ||||
| using trial and error. | ||||
| When maintaining the database, existing domains may be verified via | When maintaining the database, existing domains may be verified via | |||
| direct DNS lookups rather than a "tree walk." "Tree walks" should | direct DNS lookups rather than a "tree walk." "Tree walks" should | |||
| therefore be the choice of last resort for directory population, and | therefore be the choice of last resort for directory population, and | |||
| bulk retrieval should be used whenever possible. | bulk retrieval should be used whenever possible. | |||
| 2.3 Adding the meat | 2.3 Adding the meat | |||
| A possibility for populating a directory ("adding the meat") is to | A possibility for populating a directory ("adding the meat") is to | |||
| use an automated system (like a spider) that uses the WHOIS protocol | use an automated system that makes repeated queries using the WHOIS | |||
| to gather information about the organization that owns a domain. At | protocol to gather information about the organization that owns a | |||
| the conclusion of the InterNIC Directory and Database Services | domain. The queries would be made against a WHOIS server located | |||
| project, our backend database contained about 2.9 million records | with the above method. At the conclusion of the InterNIC Directory | |||
| built from data that could be retrieved via WHOIS. The entire | and Database Services project, our backend database contained about | |||
| database contained 3.25 million records, with the additional records | 2.9 million records built from data that could be retrieved via | |||
| coming from sources other than WHOIS. | WHOIS. The entire database contained 3.25 million records, with the | |||
| additional records coming from sources other than WHOIS. | ||||
| In our experience this information contains many factual and | In our experience this information contains many factual and | |||
| typographical errors and requires further examination and processing | typographical errors and requires further examination and processing | |||
| to improve its quality. Further, TLD registrars that support WHOIS | to improve its quality. Further, TLD registrars that support WHOIS | |||
| typically only support WHOIS information for second level domains | typically only support WHOIS information for second level domains | |||
| (e.g. ne.us) as opposed to lower level domains (e.g. | (e.g. ne.us) as opposed to lower level domains (e.g. | |||
| windrose.omaha.ne.us). Also, there are TLDs without registrars, TLDs | windrose.omaha.ne.us). Also, there are TLDs without registrars, TLDs | |||
| without WHOIS support, and still other TLDs that use other methods | without WHOIS support, and still other TLDs that use other methods | |||
| (HTTP, FTP, gopher) for providing organizational information. Based | (HTTP, FTP, gopher) for providing organizational information. Based | |||
| on our experience, an implementor of an internet directory needs to | on our experience, an implementor of an internet directory needs to | |||
| support multiple protocols for directory population. A WHOIS spider | support multiple protocols for directory population. An automated | |||
| is necessary, but isn't enough. | WHOIS search tool is necessary, but isn't enough. | |||
| 3. Directory Updating: Full Rebuilds vs Incremental Updates | 3. Directory Updating: Full Rebuilds vs Incremental Updates | |||
| Given the size of our database in April 1998 when it was last | Given the size of our database in April 1998 when it was last | |||
| generated, a complete rebuild of the database that is available from | generated, a complete rebuild of the database that is available from | |||
| WHOIS lookups would require between 134.2 and 167.8 days just for | ||||
| WHOIS lookups from a Sun SPARCstation 20. This estimate does not | ||||
| include other considerations (for example, inverting the token tree | ||||
| required about 24 hours processing time on a Sun SPARCstation 20) | ||||
| WHOIS lookups would require between 11.6 million and 14.5 million | that would increase the amount of time to rebuild the entire | |||
| seconds of time just for WHOIS lookups from a Sun SPARCstation 20. | database. | |||
| This estimate does not include other considerations (for example, | ||||
| inverting the token tree required about 24 hours processing time on a | ||||
| Sun SPARCstation 20) that would increase the amount of time to | ||||
| rebuild the entire database. | ||||
| Whether this is feasible depends on the frequency of database updates | Whether this is feasible depends on the frequency of database updates | |||
| provided. Because of the rate of growth of allocated domain names | provided. Because of the rate of growth of allocated domain names | |||
| (150K-200K new allocated domains per month in early 1998), we | (150K-200K new allocated domains per month in early 1998), we | |||
| provided monthly updates of the database. To rebuild the database | provided monthly updates of the database. To rebuild the database | |||
| each month (based on the above time estimate) would require between 3 | each month (based on the above time estimate) would require between 3 | |||
| and 5 machines to be dedicated full time (independent of machine | and 5 machines to be dedicated full time (independent of machine | |||
| architecture). Instead, we checkpointed the allocated domain list | architecture). Instead, we checkpointed the allocated domain list | |||
| and rebuilt on an incremental basis during one weekend of the month. | and rebuilt on an incremental basis during one weekend of the month. | |||
| This allowed us to complete the update on between 1 and 4 machines (3 | This allowed us to complete the update on between 1 and 4 machines (3 | |||
| skipping to change at page 4, line 46 ¶ | skipping to change at page 4, line 50 ¶ | |||
| database as a monolithic structure. Given past growth, it is not | database as a monolithic structure. Given past growth, it is not | |||
| clear at what point migrating to a distributed directory becomes | clear at what point migrating to a distributed directory becomes | |||
| actually necessary to support customer queries. Our last database | actually necessary to support customer queries. Our last database | |||
| contained over 3.25 million records in a flat ASCII file. Searching | contained over 3.25 million records in a flat ASCII file. Searching | |||
| was done via a Perl script searching an inverted tree (also produced by a | was done via a Perl script searching an inverted tree (also produced by a | |||
| Perl script). While admittedly primitive, this configuration | Perl script). While admittedly primitive, this configuration | |||
| supported over 200,000 database queries per month from our production | supported over 200,000 database queries per month from our production | |||
| servers. | servers. | |||
| Increasing the database size only requires more disk space to hold | Increasing the database size only requires more disk space to hold | |||
| the database and inverted tree. Of course, using database technology | the database and inverted tree. Of course, using database technology | |||
| would probably improve performance and scalability, but we had not | would probably improve performance and scalability, but we had not | |||
| reached the point where this technology was required. | reached the point where this technology was required. | |||
| 5. Security | 5. Security | |||
| The underlying data for the type of directory discussed in this | The underlying data for the type of directory discussed in this | |||
| document is already generally available through WHOIS, DNS, and other | document is already generally available through WHOIS, DNS, and other | |||
| standard interfaces. No new information is made available by using | standard interfaces. No new information is made available by using | |||
| these techniques though many types of search become much easier. To | these techniques though many types of search become much easier. To | |||
| the extent that easier access to this data makes it easier to find | the extent that easier access to this data makes it easier to find | |||
| specific sites or machines to attack, security may be decreased. | specific sites or machines to attack, security may be decreased. | |||
| The protocols discussed here do not have built-in security features. | The protocols discussed here do not have built-in security features. | |||
| If one source machine is spoofed while the directory data is being | If one source machine is spoofed while the directory data is being | |||
| gathered, substantial amounts of incorrect and misleading data could | gathered, substantial amounts of incorrect and misleading data could | |||
| be pulled into the directory and be spread to a wider audience. | be pulled into the directory and be spread to a wider audience. | |||
| skipping to change at page 5, line 41 ¶ | skipping to change at page 5, line 44 ¶ | |||
| Request For Comments (RFC) documents are available at | Request For Comments (RFC) documents are available at | |||
| http://info.internet.isi.edu/1/in-notes/rfc and from numerous mirror | http://info.internet.isi.edu/1/in-notes/rfc and from numerous mirror | |||
| sites. | sites. | |||
| [1] M. F. Schwartz, C. Pu. "Applying an Information | [1] M. F. Schwartz, C. Pu. "Applying an Information | |||
| Gathering Architecture to Netfind: A White Pages | Gathering Architecture to Netfind: A White Pages | |||
| Tool for a Changing and Growing Internet," Univer- | Tool for a Changing and Growing Internet," Univer- | |||
| sity of Colorado Technical Report CU-CS-656-93. | sity of Colorado Technical Report CU-CS-656-93. | |||
| December 1993, revised July 1994. | December 1993, revised July 1994. | |||
| <URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind.Gathering | <URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind | |||
| .txt.Z> | ||||
| [2] K. Sollins, Plan for Internet Directory Services, | [2] K. Sollins, Plan for Internet Directory Services, | |||
| RFC 1107, July 1989. | RFC 1107, July 1989. | |||
| [3] S. Hardcastle-Kille, E. Huizer, V. Cerf, R. Hobby, | [3] S. Hardcastle-Kille, E. Huizer, V. Cerf, R. Hobby, | |||
| S. Kent, A Strategic Plan for Deploying an Internet | S. Kent, A Strategic Plan for Deploying an Internet | |||
| X.500 Directory Service, RFC 1430, February 1993. | X.500 Directory Service, RFC 1430, February 1993. | |||
| [4] J. Postel & C. Anderson, White Pages Meeting | [4] J. Postel & C. Anderson, White Pages Meeting | |||
| Report, RFC 1588, February 1994. | Report, RFC 1588, February 1994. | |||
| [5] M. Lottor, "Network Wizards Internet Domain Sur- | [5] M. Lottor, "Network Wizards Internet Domain Sur- | |||
| vey," available from | vey," available from | |||
| http://www.nw.com/zone/WWW/top.html | http://www.nw.com/zone/WWW/top.html | |||
| 8. Authors' addresses | 8. Authors' addresses | |||
| Ryan Moats Rick Huber | Ryan Moats Rick Huber | |||
| AT&T AT&T | AT&T AT&T | |||
| 15621 Drexel Circle Room C3-3B30, 200 Laurel Ave. South | 15621 Drexel Circle Room C3-3B30, 200 Laurel Ave. South | |||
| Omaha, NE 68135-2358 Middletown, NJ 07748 | Omaha, NE 68135-2358 Middletown, NJ 07748 | |||
| USA USA | USA USA | |||
| EMail: jayhawk@att.com Email: rvh@att.com | EMail: jayhawk@att.com Email: rvh@att.com | |||
| End of changes. 16 change blocks. | ||||
| 38 lines changed or deleted | 41 lines changed or added | |||
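
The rebuild-time change in section 3 (seconds in -02, days in -03) is a unit conversion, and it can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the drafts. The per-lookup cost is a derived illustration that assumes one WHOIS lookup per WHOIS-derived record, which the drafts do not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted in the drafts.
low_seconds = 11_600_000    # lower rebuild estimate from -02, in seconds
high_seconds = 14_500_000   # upper rebuild estimate from -02, in seconds
whois_records = 2_900_000   # records built from WHOIS data


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


low_days = seconds_to_days(low_seconds)    # roughly 134.26
high_days = seconds_to_days(high_seconds)  # roughly 167.82

# Assumption (not stated in the drafts): one lookup per WHOIS-derived record.
low_per_lookup = low_seconds / whois_records    # seconds per lookup
high_per_lookup = high_seconds / whois_records  # seconds per lookup

print(f"rebuild time: {low_days:.2f} to {high_days:.2f} days")
print(f"implied cost: {low_per_lookup:.1f} to {high_per_lookup:.1f} s per lookup")
```

The lower bound works out to slightly more than the 134.2 days given in -03, which suggests the draft truncated rather than rounded the figure.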
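
Section 2.3 describes an automated tool that makes repeated WHOIS queries but gives no client code. The following is a minimal sketch of a single query, assuming the usual WHOIS exchange: open a TCP connection to port 43, send one query line ending in CRLF, and read until the server closes the connection. The function names and the server name in the usage comment are hypothetical and do not come from the drafts. Reply formats differ between registries, so the sketch returns the raw text and does not parse it.

```python
import socket

WHOIS_PORT = 43  # well-known TCP port for WHOIS


def build_query(domain: str) -> bytes:
    """Encode one WHOIS query line, terminated by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois_lookup(server: str, domain: str, port: int = WHOIS_PORT,
                 timeout: float = 10.0) -> str:
    """Send one query to `server` and return the raw text reply."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # Registry replies are not guaranteed to be ASCII; latin-1 never fails.
    return b"".join(chunks).decode("latin-1")


# Usage (hypothetical server name, requires network access):
#   text = whois_lookup("whois.example.net", "example.com")
```

A real population run would also need rate limiting and per-registry parsing, which fits the drafts' observation that an automated WHOIS tool is necessary but not sufficient.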