idnits 2.17.1 draft-google-self-published-geofeeds-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 7 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 1 instance of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 == There are 3 instances of lines with non-RFC3849-compliant IPv6 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (May 27, 2019) is 1795 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. 
Checking references for intended status: Experimental ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '1' on line 881 -- Looks like a reference, but probably isn't: '2' on line 885 -- Looks like a reference, but probably isn't: '3' on line 776 -- Looks like a reference, but probably isn't: '4' on line 778 -- Looks like a reference, but probably isn't: '5' on line 780 -- Looks like a reference, but probably isn't: '6' on line 782 -- Looks like a reference, but probably isn't: '7' on line 784 -- Looks like a reference, but probably isn't: '8' on line 786 -- Looks like a reference, but probably isn't: '0' on line 942 ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 2818 (Obsoleted by RFC 9110) -- Obsolete informational reference (is this intentional?): RFC 4408 (Obsoleted by RFC 7208) -- Obsolete informational reference (is this intentional?): RFC 4627 (Obsoleted by RFC 7158, RFC 7159) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 14 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group E. Kline 3 Internet-Draft Loon LLC 4 Intended status: Experimental K. Duleba 5 Expires: November 28, 2019 Z. Szamonek 6 S. Moser 7 Google Switzerland GmbH 8 W. Kumari 9 Google 10 May 27, 2019 12 A Format for Self-published IP Geolocation Feeds 13 draft-google-self-published-geofeeds-04 15 Abstract 17 This document records a format whereby a network operator can publish 18 a mapping of IP address prefixes to simplified geolocation 19 information, colloquially termed a geolocation "feed". Interested 20 parties can poll and parse these feeds to update or merge with other 21 geolocation data sources and procedures. 
This format intentionally 22 only allows specifying coarse level location. 24 Some technical organizations operating networks that move from one 25 conference location to the next have already experimentally published 26 small geolocation feeds. 28 This document describes a currently deployed format. At least one 29 consumer (Google) has incorporated these feeds into a geolocation 30 data pipeline, and a significant number of ISPs are using it to 31 inform them where their prefixes should be geolocated. 33 [RFC Ed - Please remove publication: The IETF Meeting network 34 currently publishes a feed in this format at: 35 https://noc.ietf.org/geo/google.csv -- this has significantly cut 36 down on the number of "Gah! Why does the network believe I'm in 37 Montreal, that was last meeting! How am I supposed to find a pub?!" 38 complaints. A number of other meeting networks, including RIPE and 39 ICANN publish this information as well, see below. ] 41 [ Ed note: Text inside square brackets ([]) is additional background 42 information, answers to frequently asked questions, general musings, 43 etc. They will be removed before publication.] 45 [ This document is being collaborated on in Github at: 46 https://github.com/google/self-published-geo . The most recent 47 version of the document, open issues, etc should all be available 48 here. The authors (gratefully) accept pull requests ] 50 Status of This Memo 52 This Internet-Draft is submitted in full conformance with the 53 provisions of BCP 78 and BCP 79. 55 Internet-Drafts are working documents of the Internet Engineering 56 Task Force (IETF). Note that other groups may also distribute 57 working documents as Internet-Drafts. The list of current Internet- 58 Drafts is at http://datatracker.ietf.org/drafts/current/. 60 Internet-Drafts are draft documents valid for a maximum of six months 61 and may be updated, replaced, or obsoleted by other documents at any 62 time. 
It is inappropriate to use Internet-Drafts as reference 63 material or to cite them other than as "work in progress." 65 This Internet-Draft will expire on November 28, 2019. 67 Copyright Notice 69 Copyright (c) 2019 IETF Trust and the persons identified as the 70 document authors. All rights reserved. 72 This document is subject to BCP 78 and the IETF Trust's Legal 73 Provisions Relating to IETF Documents 74 (http://trustee.ietf.org/license-info) in effect on the date of 75 publication of this document. Please review these documents 76 carefully, as they describe your rights and restrictions with respect 77 to this document. Code Components extracted from this document must 78 include Simplified BSD License text as described in Section 4.e of 79 the Trust Legal Provisions and are provided without warranty as 80 described in the Simplified BSD License. 82 Table of Contents 84 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 85 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . 3 86 1.2. Requirements notation . . . . . . . . . . . . . . . . . . 4 87 1.3. Implications of publication . . . . . . . . . . . . . . . 4 88 2. Self-published IP geolocation feeds . . . . . . . . . . . . . 4 89 2.1. Specification . . . . . . . . . . . . . . . . . . . . . . 5 90 2.1.1. Geolocation feed individual entry fields . . . . . . 5 91 2.1.1.1. IP Prefix . . . . . . . . . . . . . . . . . . . . 5 92 2.1.1.2. Country . . . . . . . . . . . . . . . . . . . . . 5 93 2.1.1.3. Region . . . . . . . . . . . . . . . . . . . . . 5 94 2.1.1.4. City . . . . . . . . . . . . . . . . . . . . . . 6 95 2.1.1.5. Postal code . . . . . . . . . . . . . . . . . . . 6 96 2.1.2. Prefixes with no geolocation information . . . . . . 6 97 2.1.3. Additional parsing requirements . . . . . . . . . . . 7 98 2.1.4. Looking up an IP address . . . . . . . . . . . . . . 7 99 2.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . 7 100 2.3. Proposed extensions . . . . . . . . . . . . . . 
. . . . . 8 101 2.3.1. Delegation size . . . . . . . . . . . . . . . . . . . 9 102 2.3.2. Alternate format . . . . . . . . . . . . . . . . . . 9 103 3. Consuming self-published IP geolocation feeds . . . . . . . . 9 104 3.1. Feed integrity . . . . . . . . . . . . . . . . . . . . . 10 105 3.2. Verification of authority . . . . . . . . . . . . . . . . 10 106 3.3. Verification of accuracy . . . . . . . . . . . . . . . . 10 107 3.4. Refreshing feed information . . . . . . . . . . . . . . . 10 108 4. Privacy Considerations . . . . . . . . . . . . . . . . . . . 11 109 5. Relation to other work . . . . . . . . . . . . . . . . . . . 11 110 6. Security Considerations . . . . . . . . . . . . . . . . . . . 12 111 7. Finding self-published IP geolocation feeds . . . . . . . . . 12 112 7.1. Ad hoc 'well known' URIs . . . . . . . . . . . . . . . . 12 113 7.2. Using public databases of network authority . . . . . . . 13 114 7.3. Using 'reverse' DNS with NAPTR records . . . . . . . . . 13 115 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 14 116 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 117 9.1. Normative References . . . . . . . . . . . . . . . . . . 15 118 9.2. Informative References . . . . . . . . . . . . . . . . . 15 119 9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 17 120 Appendix A. Sample Python validation code . . . . . . . . . . . 18 121 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24 123 1. Introduction 125 1.1. Motivation 127 Providers of services over the Internet have grown to depend on best- 128 effort geolocation information to improve the user experience. 129 Locality information can aid in directing traffic to the nearest 130 serving location, inferring likely native language, and providing 131 additional context for services involving search queries. 
133 When an ISP, for example, changes the location where an IP prefix is 134 deployed, services which make use of geolocation information may 135 begin to suffer degraded performance. This can lead to customer 136 complaints, possibly to the ISP directly. Dissemination of correct 137 geolocation data is complicated by the lack of any centralized means 138 to coordinate and communicate geolocation information to all 139 interested consumers of the data. 141 This document records a format whereby a network operator (an ISP, an 142 enterprise, or any organization which deems the geolocation of its IP 143 prefixes to be of concern) can publish a mapping of IP address 144 prefixes to simplified geolocation information, colloquially termed a 145 "geolocation feed". Interested parties can poll and parse these 146 feeds to update or merge with other geolocation data sources and 147 procedures. 149 This document describes a currently deployed format. At least one 150 consumer (Google) has incorporated these feeds into a geolocation 151 data pipeline, and a significant number of ISPs are using it to 152 inform them where their prefixes should be geolocated. 154 1.2. Requirements notation 156 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 157 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 158 document are to be interpreted as described in RFC2119. 160 1.3. Implications of publication 162 This document describes both a format and a mechanism for publishing 163 data, with the implication that the owner of the data wishes it to be 164 public. Any privacy risk is bounded by the format, and feed 165 publishers MAY omit any location field to further protect privacy 166 (see Section 2.1 for details about which fields exactly may be 167 omitted). Feed publishers assume the responsibility of determining 168 which data should be made public. 
170 This proposal does not incorporate a mechanism to communicate 171 acceptable use policies for self-published data. Publication itself 172 is inferred as a desire by the publisher for the data to be usefully 173 consumed, similar to the publication of information like host names, 174 cryptographic keys, and SPF records [RFC4408] in the DNS. 176 2. Self-published IP geolocation feeds 178 The format described here was developed to address the need of 179 network operators to rapidly and usefully share geolocation 180 information changes. Originally, there arose a specific case where 181 regional operators found it desirable to publish location changes 182 rather than wait for geolocation algorithms to "learn" about them. 183 Later, technical conferences which frequently use the same network 184 prefixes advertised from different conference locations experimented 185 by publishing geolocation feeds, updated in advance of network 186 location changes, in order to better serve conference attendees. 188 At its simplest, the mechanism consists of a network operator 189 publishing a file (the "geolocation feed"), which contains several 190 text entries, one per line. Each entry is keyed by a unique (within 191 the feed) IP prefix (or single IP address) followed by a sequence of 192 network locality attributes to be ascribed to the given prefix. 194 2.1. Specification 196 For operational simplicity, every feed should contain data about all 197 IP addresses the provider wants to publish. Alternatives, like 198 publishing only entries for IP addresses whose geolocation data has 199 changed or differ from current observed geolocation behavior "at 200 large", are likely to be too operationally complex. 202 Feeds MUST use UTF-8 [RFC3629] character encoding. Text after a '#' 203 character is treated as a comment only and ignored. Blank lines are 204 similarly ignored. 206 Feeds MUST be in comma separated values format as described in 207 [RFC4180]. 
Each feed entry is a text line of the form: 209 ip_prefix,country,region,city,postal_code 211 The IP prefix field is REQUIRED; all others are OPTIONAL (can be 212 empty), though the requisite minimum number of commas SHOULD be 213 present. 215 2.1.1. Geolocation feed individual entry fields 217 2.1.1.1. IP Prefix 219 REQUIRED. Each IP prefix field MUST be either a single IP address or 220 an IP prefix in CIDR notation in conformance with section 3.1 [1] of 221 [RFC4632] for IPv4 or section 2.3 [2] of [RFC4291] for IPv6. 223 Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and 224 "2001:db8::1" and "2001:db8::/32" for IPv6. 226 2.1.1.2. Country 228 OPTIONAL. The country field, if non-empty, MUST be a 2-letter ISO 229 country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2]. 230 Parsers SHOULD treat this field case-insensitively. 232 Examples include "US" for the United States, "JP" for Japan, and "PL" 233 for Poland. 235 2.1.1.3. Region 237 OPTIONAL. The region field, if non-empty, MUST be an ISO region code 238 conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this 239 field case-insensitively. 241 Examples include "ID-RI" for the Riau province of Indonesia and "NG- 242 RI" for the Rivers province in Nigeria. 244 2.1.1.4. City 246 OPTIONAL. The city field, if non-empty, SHOULD be free UTF-8 text, 247 excluding the comma (',') character. 249 Examples include "Dublin", "New York", and "Sao Paulo" (specifically 250 "S" followed by 0xc3, 0xa3, and "o Paulo"). 252 2.1.1.5. Postal code 254 OPTIONAL, DEPRECATED. The postal code field, if non-empty, SHOULD be 255 free UTF-8 text, excluding the comma (',') character. The use of 256 this field is deprecated; consumers of feeds should be able to parse 257 feeds containing this field, but new feeds SHOULD NOT include this 258 field, due to the granularity of this information. See Section 4 259 for additional discussion. 261 Examples include "106-6126" (in Minato ward, Tokyo, Japan). 263 2.1.2.
Prefixes with no geolocation information 265 Feed publishers may indicate that some IP prefixes should not have 266 any associated geolocation information. It may be that some prefixes 267 under their administrative control are reserved, not yet allocated or 268 deployed, or are in the process of being redeployed elsewhere and 269 existing geolocation information can, from the perspective of the 270 publisher, safely be discarded. 272 This special case can be indicated by explicitly leaving blank all 273 fields which specify any degree of geolocation information. For 274 example: 276 127.0.0.0/8,,,, 277 224.0.0.0/4,,,, 278 240.0.0.0/4,,,, 280 Historically, the user-assigned country identifier of "ZZ" has been 281 used for this same purpose. This is not necessarily preferred, and 282 no specific interpretation of any of the other user-assigned country 283 codes is currently defined. 285 2.1.3. Additional parsing requirements 287 Feed entries missing required fields, or having a required field 288 which fails to parse correctly, MUST be discarded. It is RECOMMENDED 289 that such entries also be logged for further administrative review. 291 While publishers SHOULD follow [RFC5952] style for IPv6 prefix 292 fields, consumers MUST nevertheless accept all valid string 293 representations. 295 Duplicate IP address or prefix entries MUST be considered an error, 296 and consumer implementations SHOULD log the repeated entries for 297 further administrative review. Publishers SHOULD take measures to 298 ensure there is one and only one entry per IP address and prefix. 300 Feed entries with non-empty optional fields which fail to parse, 301 either in part or in full, SHOULD be discarded. It is RECOMMENDED 302 that they also be logged for further administrative review. 304 For compatibility with future additional fields, a parser MUST ignore 305 any fields beyond those it expects.
The data from fields which are 306 expected and which parse successfully MUST still be considered valid. 308 2.1.4. Looking up an IP address 310 Multiple entries which constitute nested prefixes are permitted. 311 Consumers SHOULD consider the entry with the longest matching prefix 312 (i.e. the "most specific") to be the best matching entry for a given 313 IP address. 315 2.2. Examples 317 Example entries using different IP address formats and describing 318 locations at country, region, and city granularity level, 319 respectively: 321 192.0.2.0/25,US,US-AL,, 322 192.0.2.5,US,US-AL,Alabaster, 323 192.0.2.128/25,PL,PL-MZ,, 324 2001:db8::/32,PL,,, 325 2001:db8:cafe::/48,PL,PL-MZ,, 327 The IETF network publishes geolocation information for the meeting 328 prefixes, and generally just comments out the last meeting information 329 and appends the new meeting information. [GEO_IETF] at the time 330 of this writing contains: 332 # IETF 104, March 2019 - Prague, CZ. 333 # Note that Prague changed from CZ-PR to CZ-10 2016-11-15 334 # https://www.iso.org/obp/ui/#iso:code:3166:CZ 335 130.129.0.0/16,CZ,CZ-10,Prague, 336 2001:df8::/32,CZ,CZ-10,Prague, 337 31.133.128.0/18,CZ,CZ-10,Prague, 338 31.130.224.0/20,CZ,CZ-10,Prague, 339 2001:67c:1230::/46,CZ,CZ-10,Prague, 340 2001:67c:370::/48,CZ,CZ-10,Prague, 342 Experimentally, RIPE has published geolocation information for their 343 conference network prefixes, which change location in accordance with 344 each new event. [GEO_RIPE_NCC] at the time of writing contains: 346 193.0.24.0/21,IS,IS-1,Reykjavik, 347 2001:67c:64::/48,IS,IS-1,Reykjavik, 349 Similarly, ICANN has published geolocation information for their 350 portable conference network prefixes. [GEO_ICANN] at the time of 351 writing contains: 353 199.91.192.0/21,ES,ES-CT,Barcelona 354 2620:f:8000::/48,ES,ES-CT,Barcelona 356 A longer example is the [GEO_Google] Google Corp Geofeed, which lists 357 the geolocation information for Google corporate offices.
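The parsing rules of Section 2.1 and the longest-prefix lookup of Section 2.1.4 can be illustrated with a minimal Python sketch, applied to the first set of example entries above. It is not the validation code of Appendix A. The function names parse_feed and lookup are hypothetical, and several choices are assumptions of this sketch rather than requirements of the format: prefixes with host bits set are treated as parse failures, a duplicate prefix raises an exception, country and region are normalized to upper case, and ISO 3166 codes are not checked against any table.

```python
import csv
import ipaddress


def parse_feed(text):
    """Parse geofeed text into {network: (country, region, city, postal)}.

    Hypothetical helper: discards entries whose prefix fails to parse and
    raises ValueError on duplicate prefixes (one way to treat them as errors).
    """
    entries = {}
    for raw_line in text.splitlines():
        # Text after '#' is a comment; blank lines are ignored.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in next(csv.reader([line]))]
        try:
            # A bare address becomes a /32 (IPv4) or /128 (IPv6) network.
            network = ipaddress.ip_network(fields[0])
        except ValueError:
            continue  # required field failed to parse: discard the entry
        if network in entries:
            raise ValueError(f"duplicate prefix: {network}")
        # Pad missing optional fields; ignore fields beyond the expected five.
        country, region, city, postal = (fields[1:5] + [""] * 4)[:4]
        entries[network] = (country.upper(), region.upper(), city, postal)
    return entries


def lookup(entries, address):
    """Return the entry with the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [n for n in entries if n.version == ip.version and ip in n]
    if not matches:
        return None
    return entries[max(matches, key=lambda n: n.prefixlen)]


FEED = """\
192.0.2.0/25,US,US-AL,,
192.0.2.5,US,US-AL,Alabaster,
192.0.2.128/25,PL,PL-MZ,,
2001:db8::/32,PL,,,
2001:db8:cafe::/48,PL,PL-MZ,,
"""

if __name__ == "__main__":
    feed = parse_feed(FEED)
    for addr in ("192.0.2.5", "192.0.2.6", "2001:db8:cafe::1"):
        print(addr, lookup(feed, addr))
```

A linear scan over all entries keeps the sketch short; a consumer handling large feeds would more likely use a radix tree or a similar longest-prefix-match structure.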
359 Furthermore, it is worth noting that the geolocation data of SixXS 360 users, already available at whois.sixxs.net, is now also accessible 361 in the format described here (see [GEO_SIXXS]). This can be 362 particularly useful where tunnel broker networks [RFC3053] are 363 concerned as: 365 o the geolocation attributes of users with neighboring prefixes can 366 be quite different and therefore not easily aggregated, and 368 o attempting to learn this data by statistical analysis can be 369 complicated by the likely low number of samples for any given 370 user, making satisfactory statistical confidence difficult to 371 achieve. 373 2.3. Proposed extensions 375 Already some discussions have resulted in proposed extensions. While 376 the purpose of this document is principally to record existing 377 implementation details, it may be that there is a larger desire to 378 publish other "network attributes" in a similar manner. One such 379 network attribute, "delegation size", is not currently implemented, 380 but the state of the proposed extension is recorded here to 381 demonstrate the flexibility required of parser implementations. 383 The following have been only informally discussed and are not in use 384 at the time of writing. 386 2.3.1. Delegation size 388 OPTIONAL. A publisher may optionally communicate the average 389 delegated prefix size for subnetworks within the IP prefix of this 390 entry. For a network operator this can be used to help consumers 391 distinguish IP prefixes among various use types such as residential 392 prefixes, allocations to businesses, or data center customer 393 allocations. 395 Non-empty strings MUST be of the form required for CIDR notation 396 suffixes, i.e. "/" followed by the integer prefix length of the 397 expected allocation to the subnetworks from within the entry's 398 prefix. 
In the absence of data to the contrary, it is common to 399 assume that leaf networks may be delegated a prefix ranging from /24 400 to /32 in IPv4 and /48 to /64 in IPv6. Default assumptions about 401 delegation size are left to the consumer's implementation. 403 Examples for IPv6 include "/48", "/56", "/60", and "/64". 405 2.3.2. Alternate format 407 In order to more flexibly support future extensions, use of a more 408 expressive feed format has been suggested. Use of JavaScript Object 409 Notation (JSON, [RFC4627]), specifically, has been discussed. 410 However, at the time of writing no such specification nor 411 implementation exists. 413 The authors are planning on writing a new document describing such a 414 new format. The current document describes a currently deployed and 415 used format. 417 3. Consuming self-published IP geolocation feeds 419 Consumers MAY treat published feed data as a hint only and MAY choose 420 to prefer other sources of geolocation information for any given IP 421 prefix. Regardless of a consumer's stance with respect to a given 422 published feed, there are some points of note for sensibly and 423 effectively consuming published feeds. 425 3.1. Feed integrity 427 The integrity of published information SHOULD be protected by 428 securing the means of publication, for example by using HTTP over TLS 429 [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving 430 geolocation feeds in a manner that guarantees integrity of the feed. 432 3.2. Verification of authority 434 Consumers of self-published IP geolocation feeds SHOULD perform some 435 form of verification that the publisher is in fact authoritative for 436 the addresses in the feed. The actual means of verification is 437 likely dependent upon the way in which the feed is discovered. Ad 438 hoc shared URIs, for example, will likely require an ad hoc 439 verification process. 
Future automated means of feed discovery 440 SHOULD have an accompanying automated means of verification. 442 A consumer MUST only trust geolocation information for IP addresses 443 or prefixes for which the publisher has been verified as 444 administratively authoritative. All other geolocation feed entries 445 MUST be ignored and SHOULD be logged for further administrative 446 review. 448 3.3. Verification of accuracy 450 Errors and inaccuracies may occur at many levels, and publication and 451 consumption of geolocation data are no exceptions. To the extent 452 practical, consumers SHOULD take steps to verify the accuracy of 453 published locality. Verification methodology, resolution of 454 discrepancies, and preference for alternative sources of data are 455 left to the discretion of the feed consumer. 457 Consumers SHOULD decide on discrepancy thresholds and SHOULD flag for 458 administrative review feed entries which exceed set thresholds. 460 3.4. Refreshing feed information 462 As a publisher can change geolocation data at any time and without 463 notification, consumers SHOULD implement mechanisms to periodically 464 refresh local copies of feed data. In the absence of any other 465 refresh timing information, it is RECOMMENDED that consumers 466 refresh feeds no less often than weekly. 468 For feeds available via HTTPS (or HTTP), the publisher MAY 469 communicate refresh timing information by means of the standard HTTP 470 expiration model (section 13.2 [3] of [RFC2616]). Specifically, 471 publishers can include either an Expires header [4] or a Cache- 472 Control header [5] specifying the max-age. Where practical, 473 consumers SHOULD refresh feed information before the expiry time is 474 reached. 476 4. Privacy Considerations 478 Publishers of geolocation feeds are advised to fully consider 479 any and all privacy implications of the disclosure of such 480 information for the users of the described networks prior to 481 publication.
A thorough comprehension of the security considerations 482 [6] of a chosen geolocation policy is highly recommended, including 483 an understanding of some of the limitations of information obscurity 484 [7] (see also [RFC6772]). 486 As noted in Section 2.1, each location field in an entry is optional, 487 in order to support expressing only the level of specificity which 488 the publisher has deemed acceptable. There is no requirement that 489 the level of specificity be consistent across all entries within a 490 feed. In particular, the Postal Code field (Section 2.1.1.5) can 491 provide very specific geolocation, sometimes within a building. Such 492 specific Postal Code values MUST NOT be published in geo feeds 493 without the express consent of the parties being located. 495 5. Relation to other work 497 While not originally done in conjunction with the [GEOPRIV] working 498 group, Richard Barnes observed that this work is nevertheless 499 consistent with that which the group has defined, both for address 500 format and for privacy. The data elements in geolocation feeds are 501 equivalent to the following XML structure (see [RFC5139]):
503 <civicAddress>
504   <country>country</country>
505   <A1>region</A1>
506   <A3>city</A3>
507   <PC>postal_code</PC>
508 </civicAddress>
510 Providing geolocation information to this granularity is equivalent 511 to the following privacy policy (see the definition of the 512 'building' [8] level of disclosure):
514 <ruleset>
515   <rule>
516     <conditions/>
517     <actions/>
518     <transformations>
519       <provide-location profile="civic-transformation">
520         <provide-civic>building</provide-civic>
521       </provide-location>
522     </transformations>
523   </rule>
524 </ruleset>
526 6. Security Considerations 528 As there is no true security in the obscurity of the location of any 529 given IP address, self-publication of this data fundamentally opens 530 no new attack vectors. For publishers, self-published data merely 531 increases the ease with which such location data might be exploited. 533 For consumers, feed retrieval processes may receive input from 534 potentially hostile sources (e.g. in the event of hijacked traffic). 535 As such, proper input validation and defense measures MUST be taken.
537 Similarly, consumers who do not perform sufficient verification of 538 published data bear the same risks as from other forms of geolocation 539 configuration errors. 541 7. Finding self-published IP geolocation feeds 543 The issue of finding, and later verifying, geolocation feeds is not 544 formally specified in this document. At this time, only ad hoc feed 545 discovery and verification has a modicum of established practice (see 546 below). Regardless, both the ad hoc mechanics and a few proposed but 547 not yet implemented alternatives are discussed. 549 7.1. Ad hoc 'well known' URIs 551 To date, geolocation feeds have been shared informally in the form of 552 HTTPS URIs exchanged in email threads. The two example URIs 553 documented above describe networks that change locations 554 periodically, the operators and operational practices of which are 555 well known within their respective technical communities. 557 The contents of the feeds are verified by a similarly ad hoc process 558 including: 560 o personal knowledge of the parties involved in the exchange, and 561 o comparison of feed-advertised prefixes with the BGP-advertised 562 prefixes of Autonomous System Numbers known to be operated by the 563 publishers. 565 Ad hoc mechanisms, while useful for early experimentation by 566 producers and consumers, are unlikely to be adequate for long-term, 567 widespread use by multiple parties. Future versions of any such 568 self-published geolocation feed mechanism SHOULD address scalability 569 concerns by defining a means for automated discovery and verification 570 of operational authority of advertised prefixes. 572 7.2. Using public databases of network authority 574 One possibility for enabling automation would be publication of feed 575 URIs as a well-known attribute in public databases of network 576 authority, e.g. the WHOIS service ([RFC3912]) operated by RIRs. 
577 Verification may be performed if the same or similarly authoritative 578 service provides the identical feed URI for queries for each CIDR 579 prefix in the geolocation feed. 581 The burden of serving this data to all interested consumers, 582 especially the load imposed by any verification process, is not yet 583 known. The anticipation of additional operational burden on the 584 public resource of record (the database of network authority) is 585 however a noted concern. 587 7.3. Using 'reverse' DNS with NAPTR records 589 Another possibility for automating the location and verification of a 590 geolocation feed is to incorporate feed URIs into the DNS, 591 specifically the in-addr.arpa and ip6.arpa portions of the DNS 592 hierarchy. A suitably formatted query for a NAPTR ([RFC3403]) 593 record, or more specifically a U-NAPTR ([RFC4848]) record, could 594 yield a transformation to a geolocation feed URI. 596 For example, assuming a purely theoretical service name of 597 "x-geofeed", a 'reverse' DNS zone might contain a record of the form: 599 ;; order pref flags 600 IN NAPTR 200 10 "u" "x-geofeed" ( ; service 601 ; regexp 602 "!.*!https://example.com/ipgeo.csv!" 603 "" ; replacement 604 ) 606 Attempts to locate the geolocation feed for a given IP address would 607 begin by querying directly for a NAPTR record associated with the 608 address's PTR-style name. For example, 192.0.2.4 and 2001:db8::6 609 would cause a NAPTR record request to be issued for "4.2.0.192.in- 610 addr.arpa" and "6.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d 611 .0.1.0.0.2.ip6.arpa", respectively. 613 If no such record exists, one further NAPTR query for the fully 614 qualified domain name of the SOA record in the authority section of 615 the response to the previous query would be performed ("2.0.192.in- 616 addr.arpa" and "d.0.1.0.0.2.ip6.arpa" in the examples above). 
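The PTR-style names used as the starting point for these NAPTR queries can be derived mechanically from an address. The following is a minimal Python sketch using the standard library "ipaddress" module; the function name naptr_query_name is hypothetical, and the sketch only builds the query name. It does not perform any DNS lookup, and the "x-geofeed" service name remains purely theoretical.

```python
import ipaddress


def naptr_query_name(address):
    """Return the PTR-style name at which a NAPTR lookup would begin.

    Hypothetical helper: relies on the standard library's reverse_pointer
    attribute, which yields the in-addr.arpa or ip6.arpa name of an address.
    """
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    # The two addresses used in the example above.
    for addr in ("192.0.2.4", "2001:db8::6"):
        print(addr, "->", naptr_query_name(addr))
```

For an IPv6 address the resulting name always contains 32 nibble labels followed by "ip6.arpa", regardless of how the address was abbreviated in its textual form.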
   If one or more NAPTR records exist for the full PTR-style name but
   none of them are for the required service name (e.g. "x-geofeed"),
   then likely no SOA will be returned as a hint for subsequent queries.
   In this case, implementations would need to first explicitly query
   for an SOA record for the full PTR-style name, and then query for a
   NAPTR record of the SOA in the response (assuming it differs from the
   previously queried name).

   Any successfully located feed URIs could then be processed as
   outlined by this document.

   Verification of the contents of a feed would proceed in essentially
   the same way.  CIDR prefixes may be verified by constructing a query
   for any single address (at random) within the prefix and proceeding
   as above.  While not strictly provably correct (in cases where a
   publisher has delegated some portion of the advertised prefix but not
   excluded it from its feed), it may nevertheless suffice for
   operational purposes, especially if a low-impact on-going
   verification of observed client IP addresses is implemented, to
   (eventually) catch any oversights.

   This mode is untested and may prove impractical.  However, the
   operational burden is more closely located with those wishing to bear
   it, i.e. the publishers who would likely handle serving in-addr.arpa
   and ip6.arpa for the IP prefixes under their authority.

8.  Acknowledgements

   The authors would like to express their gratitude to reviewers and
   early implementers, including but not limited to Mikael Abrahamsson,
   Ray Bellis, John Bond, Alissa Cooper, Andras Erdei, Marco Hogewoning,
   Mike Joseph, Maciej Kuzniar, Menno Schepers, Justyna Sidorska, Pim
   van Pelt, and Bjoern A. Zeeb.  Richard L. Barnes in particular
   contributed substantial review, text, and advice.

9.  References

9.1.  Normative References

   [ISO.3166.1alpha2]
              International Organization for Standardization, "ISO
              3166-1 decoding table".

   [ISO.3166.2]
              International Organization for Standardization, "ISO
              3166-2:2007".

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616,
              DOI 10.17487/RFC2616, June 1999.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [RFC4180]  Shafranovich, Y., "Common Format and MIME Type for Comma-
              Separated Values (CSV) Files", RFC 4180,
              DOI 10.17487/RFC4180, October 2005.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006.

   [RFC4632]  Fuller, V. and T. Li, "Classless Inter-domain Routing
              (CIDR): The Internet Address Assignment and Aggregation
              Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August
              2006.

9.2.  Informative References

   [GEO_Google]
              Google, LLC, "Google Corp Geofeed".

   [GEO_ICANN]
              Internet Corporation For Assigned Names and Numbers,
              "ICANN Meeting Geolocation Data".

   [GEO_IETF]
              Kumari, A., "IETF Meeting Network Geolocation Data".

   [GEO_RIPE_NCC]
              Schepers, M., "RIPE NCC Meeting Geolocation Data".

   [GEO_SIXXS]
              van Pelt, P., "SixXS Geolocation Data".

   [GEOPRIV]  Internet Engineering Task Force, "IETF geopriv Working
              Group".

   [IPADDR_PY]
              Shields, M. and P. Moody, "Python IP address manipulation
              library".

   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818,
              DOI 10.17487/RFC2818, May 2000.

   [RFC3053]  Durand, A., Fasano, P., Guardini, I., and D. Lento, "IPv6
              Tunnel Broker", RFC 3053, DOI 10.17487/RFC3053, January
              2001.

   [RFC3403]  Mealling, M., "Dynamic Delegation Discovery System (DDDS)
              Part Three: The Domain Name System (DNS) Database",
              RFC 3403, DOI 10.17487/RFC3403, October 2002.

   [RFC3912]  Daigle, L., "WHOIS Protocol Specification", RFC 3912,
              DOI 10.17487/RFC3912, September 2004.

   [RFC4408]  Wong, M. and W. Schlitt, "Sender Policy Framework (SPF)
              for Authorizing Use of Domains in E-Mail, Version 1",
              RFC 4408, DOI 10.17487/RFC4408, April 2006.

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627,
              DOI 10.17487/RFC4627, July 2006.

   [RFC4848]  Daigle, L., "Domain-Based Application Service Location
              Using URIs and the Dynamic Delegation Discovery Service
              (DDDS)", RFC 4848, DOI 10.17487/RFC4848, April 2007.

   [RFC5139]  Thomson, M. and J. Winterbottom, "Revised Civic Location
              Format for Presence Information Data Format Location
              Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139,
              February 2008.

   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
              Address Text Representation", RFC 5952,
              DOI 10.17487/RFC5952, August 2010.

   [RFC6772]  Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J.,
              Polk, J., Morris, J., and M. Thomson, "Geolocation Policy:
              A Document Format for Expressing Privacy Preferences for
              Location Information", RFC 6772, DOI 10.17487/RFC6772,
              January 2013.

9.3.  URIs

   [1] http://tools.ietf.org/html/rfc4632#section-3.1

   [2] http://tools.ietf.org/html/rfc4291#section-2.3

   [3] http://tools.ietf.org/html/rfc2616#section-13.2

   [4] http://tools.ietf.org/html/rfc2616#section-14.21

   [5] http://tools.ietf.org/html/rfc2616#section-14.9

   [6] http://tools.ietf.org/html/rfc6772#section-13

   [7] http://tools.ietf.org/html/rfc6772#section-13.5

   [8] http://tools.ietf.org/html/rfc6772#section-6.5.1

Appendix A.  Sample Python validation code

   Included here is a simple format validator in Python for self-
   published ipgeo feeds.  This tool reads CSV data in the self-
   published ipgeo feed format from the standard input and performs
   basic validation.  It is intended for use by feed publishers before
   launching a feed.  Note that this validator does not verify the
   uniqueness of every IP prefix entry within the feed as a whole, but
   only verifies the syntax of each single line from within the feed.  A
   complete validator MUST also ensure IP prefix uniqueness.

   The main source file "ipgeo_feed_validator.py" follows.  It requires
   use of the open source ipaddr Python library for IP address and CIDR
   parsing and validation [IPADDR_PY].

   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as authors of
   # the code.  All rights reserved.  Redistribution and use in source and
   # binary forms, with or without modification, is permitted pursuant to,
   # and subject to the license terms contained in, the Simplified BSD
   # License set forth in Section 4.c of the IETF Trust's Legal Provisions
   # Relating to IETF Documents (http://trustee.ietf.org/license-info).

   """Simple format validator for self-published ipgeo feeds.

   This tool reads CSV data in the self-published ipgeo feed format from
   the standard input and performs basic validation.  It is intended for
   use by feed publishers before launching a feed.
   """

   import csv
   import ipaddr
   import re
   import sys

   class IPGeoFeedValidator(object):
     def __init__(self):
       self.prefixes = {}
       self.line_number = 0
       self.output_log = {}
       self.SetOutputStream(sys.stderr)

     def Validate(self, feed):
       """Check validity of an IPGeo feed.

       Args:
         feed: iterable with feed lines
       """

       for line in feed:
         self._ValidateLine(line)

     def SetOutputStream(self, logfile):
       """Controls where the output messages go to (STDERR by default).

       Use None to disable logging.

       Args:
         logfile: a file object (e.g., sys.stdout or sys.stderr) or None.
       """
       self.output_stream = logfile

     def CountErrors(self, severity):
       """How many ERRORs or WARNINGs were generated."""
       return len(self.output_log.get(severity, []))

     ############################################################
     def _ValidateLine(self, line):
       line = line.rstrip('\r\n')
       self.line_number += 1
       # Drop any trailing comment; only the text before '#' is validated.
       self.line = line.split('#')[0]
       self.is_correct_line = True

       # Use the comment-stripped text so comment-only lines are skipped.
       if self._ShouldIgnoreLine(self.line):
         return

       fields = [field for field in csv.reader([self.line])][0]

       self._ValidateFields(fields)
       self._FlushOutputStream()

     def _ShouldIgnoreLine(self, line):
       line = line.strip()
       return len(line) == 0

     ############################################################
     def _ValidateFields(self, fields):
       assert(len(fields) > 0)

       is_correct = self._IsIPAddressOrPrefixCorrect(fields[0])

       if len(fields) > 1:
         if not self._IsCountryCode2Correct(fields[1]):
           is_correct = False

       if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
         is_correct = False

       if len(fields) != 5:
         self._ReportWarning('5 fields were expected (got %d).'
                             % len(fields))

     ############################################################
     def _IsIPAddressOrPrefixCorrect(self, field):
       if '/' in field:
         return self._IsCIDRCorrect(field)
       return self._IsIPAddressCorrect(field)

     def _IsCIDRCorrect(self, cidr):
       try:
         ipprefix = ipaddr.IPNetwork(cidr)
         if ipprefix.network._ip != ipprefix._ip:
           self._ReportError('Incorrect IP Network.')
           return False
         if ipprefix.is_private:
           self._ReportError('IP Address must not be private.')
           return False
       except Exception:
         self._ReportError('Incorrect IP Network.')
         return False
       return True

     def _IsIPAddressCorrect(self, ipaddress):
       try:
         ip = ipaddr.IPAddress(ipaddress)
       except Exception:
         self._ReportError('Incorrect IP Address.')
         return False
       if ip.is_private:
         self._ReportError('IP Address must not be private.')
         return False
       return True

     ############################################################
     def _IsCountryCode2Correct(self, country_code_2):
       if len(country_code_2) == 0:
         return True
       if len(country_code_2) != 2 or not country_code_2.isalpha():
         self._ReportError(
             'Country code must be in the ISO 3166-1 alpha 2 format.')
         return False

       return True

     def _IsRegionCodeCorrect(self, region_code):
       if len(region_code) == 0:
         return True
       if '-' not in region_code:
         self._ReportError('Region code must be in the ISO 3166-2 format.')
         return False

       parts = region_code.split('-')
       if not self._IsCountryCode2Correct(parts[0]):
         return False
       return True

     ############################################################
     def _ReportError(self, message):
       self._ReportWithSeverity('ERROR', message)

     def _ReportWarning(self, message):
       self._ReportWithSeverity('WARNING', message)

     def _ReportWithSeverity(self, severity, message):
       self.is_correct_line = False
       output_line = '%s: %s\n' % (severity, message)

       if severity not in self.output_log:
         self.output_log[severity] = []
       self.output_log[severity].append(output_line)

       if self.output_stream is not None:
         self.output_stream.write(output_line)

     def _FlushOutputStream(self):
       if self.is_correct_line: return
       if self.output_stream is None: return

       self.output_stream.write('line %d: %s\n\n'
                                % (self.line_number, self.line))

   ############################################################
   def main():
     feed_validator = IPGeoFeedValidator()
     feed_validator.Validate(sys.stdin)

     if feed_validator.CountErrors('ERROR'):
       sys.exit(1)

   if __name__ == '__main__':
     main()

   A unit test file, "ipgeo_feed_validator_test.py" is provided as well.
   It provides basic test coverage of the code above, though does not
   test correct handling of non-ASCII UTF-8 strings.

   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as authors of
   # the code.  All rights reserved.  Redistribution and use in source and
   # binary forms, with or without modification, is permitted pursuant to,
   # and subject to the license terms contained in, the Simplified BSD
   # License set forth in Section 4.c of the IETF Trust's Legal Provisions
   # Relating to IETF Documents (http://trustee.ietf.org/license-info).

   import sys
   from ipgeo_feed_validator import IPGeoFeedValidator

   class IPGeoFeedValidatorTest(object):
     def __init__(self):
       self.validator = IPGeoFeedValidator()
       self.validator.SetOutputStream(None)
       self.successes = 0
       self.failures = 0

     def Run(self):
       # Arguments: feed line, expected ERROR count, expected WARNING count.
       self.TestFeedLine('# asdf', 0, 0)
       self.TestFeedLine(' ', 0, 0)
       self.TestFeedLine('', 0, 0)

       self.TestFeedLine('asdf', 1, 1)
       self.TestFeedLine('asdf,US,,,', 1, 0)
       self.TestFeedLine('aaaa::,US,,,', 0, 0)
       self.TestFeedLine('zzzz::,US', 1, 1)
       self.TestFeedLine(',US,,,', 1, 0)
       self.TestFeedLine('55.66.77', 1, 1)
       self.TestFeedLine('55.66.77.888', 1, 1)
       self.TestFeedLine('55.66.77.asdf', 1, 1)

       self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
       self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

       self.TestFeedLine('55.66.77.88,PL', 0, 1)
       self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
       self.TestFeedLine('55.66.77.88,99,,,', 1, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
       self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
                         '1600 Amphitheatre Parkway', 0, 1)

       self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
       self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

       self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
       self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
       self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
       self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
       self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
       self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

       print('%d tests passed, %d failed' % (self.successes, self.failures))

     def IsOutputLogCorrectAtSeverity(self, severity, expected_msg_count):
       msg_count = self.validator.CountErrors(severity)

       if msg_count != expected_msg_count:
         print('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' % (
             self.validator.line, expected_msg_count, severity, msg_count,
             str(self.validator.output_log[severity])))
         return False
       return True

     def IsOutputLogCorrect(self, new_errors, new_warnings):
       retval = True

       if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
         retval = False
       if not self.IsOutputLogCorrectAtSeverity('WARNING', new_warnings):
         retval = False

       return retval

     def TestFeedLine(self, line, error_count, warning_count):
       self.validator.output_log['WARNING'] = []
       self.validator.output_log['ERROR'] = []
       self.validator._ValidateLine(line)

       if not self.IsOutputLogCorrect(error_count, warning_count):
         self.failures += 1
         return False

       self.successes += 1
       return True

   if __name__ == '__main__':
     IPGeoFeedValidatorTest().Run()

Authors' Addresses

   Erik Kline
   Loon LLC
   1600 Amphitheatre Parkway
   Mountain View, California  94043
   United States of America

   Email: ek@loon.com


   Krzysztof Duleba
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: kduleba@google.com


   Zoltan Szamonek
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: zszami@google.com


   Stefan Moser
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: smoser@google.com


   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   US

   Email: warren@kumari.net