idnits 2.17.1

draft-google-self-published-geofeeds-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  == There are 3 instances of lines with non-RFC3849-compliant IPv6 addresses
     in the document.  If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (August 16, 2019) is 1707 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.
  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Looks like a reference, but probably isn't: '1' on line 678

  -- Looks like a reference, but probably isn't: '2' on line 680

  -- Looks like a reference, but probably isn't: '3' on line 682

  -- Looks like a reference, but probably isn't: '4' on line 684

  -- Looks like a reference, but probably isn't: '5' on line 686

  -- Looks like a reference, but probably isn't: '6' on line 688

  -- Looks like a reference, but probably isn't: '7' on line 690

  -- Looks like a reference, but probably isn't: '8' on line 692

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 2818
     (Obsoleted by RFC 9110)

  -- Obsolete informational reference (is this intentional?): RFC 4408
     (Obsoleted by RFC 7208)

  -- Obsolete informational reference (is this intentional?): RFC 4627
     (Obsoleted by RFC 7158, RFC 7159)

     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                           E. Kline
Internet-Draft                                                  Loon LLC
Intended status: Informational                                 K. Duleba
Expires: February 17, 2020                                   Z. Szamonek
                                                                S. Moser
                                                 Google Switzerland GmbH
                                                               W. Kumari
                                                                  Google
                                                         August 16, 2019

            A Format for Self-published IP Geolocation Feeds
                draft-google-self-published-geofeeds-05

Abstract

   This document records a format whereby a network operator can publish
   a mapping of IP address prefixes to simplified geolocation
   information, colloquially termed a geolocation "feed".  Interested
   parties can poll and parse these feeds to update or merge with other
   geolocation data sources and procedures.
   This format intentionally only allows specifying coarse-level
   location.

   Some technical organizations operating networks that move from one
   conference location to the next have already experimentally published
   small geolocation feeds.

   This document describes a currently deployed format.  At least one
   consumer (Google) has incorporated these feeds into a geolocation
   data pipeline, and a significant number of ISPs are using it to
   inform them where their prefixes should be geolocated.

   [RFC Ed - Please remove before publication: The IETF Meeting network
   currently publishes a feed in this format at:
   https://noc.ietf.org/geo/google.csv -- this has significantly cut
   down on the number of "Gah!  Why does the network believe I'm in
   Montreal, that was last meeting!  How am I supposed to find a pub?!"
   complaints.  A number of other meeting networks, including RIPE and
   ICANN, publish this information as well; see below. ]

   [ Ed note: Text inside square brackets ([]) is additional background
   information, answers to frequently asked questions, general musings,
   etc.  It will be removed before publication.]

   [ This document is being collaborated on in GitHub at:
   https://github.com/google/self-published-geo .  The most recent
   version of the document, open issues, etc. should all be available
   there.  The authors (gratefully) accept pull requests. ]

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 17, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Requirements notation
     1.3.  Implications of publication
   2.  Self-published IP geolocation feeds
     2.1.  Specification
       2.1.1.  Geolocation feed individual entry fields
         2.1.1.1.  IP Prefix
         2.1.1.2.  Country
         2.1.1.3.  Region
         2.1.1.4.  City
         2.1.1.5.  Postal code
       2.1.2.  Prefixes with no geolocation information
       2.1.3.  Additional parsing requirements
       2.1.4.  Looking up an IP address
     2.2.  Examples
   3.  Consuming self-published IP geolocation feeds
     3.1.  Feed integrity
     3.2.  Verification of authority
     3.3.  Verification of accuracy
     3.4.  Refreshing feed information
   4.  Privacy Considerations
   5.  Relation to other work
   6.  Security Considerations
   7.  Planned future work
   8.  Finding self-published IP geolocation feeds
     8.1.  Ad hoc 'well known' URIs
     8.2.  Other mechanisms
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
     11.3.  URIs
   Appendix A.  Sample Python validation code
   Authors' Addresses

1.  Introduction

1.1.  Motivation

   Providers of services over the Internet have grown to depend on best-
   effort geolocation information to improve the user experience.
   Locality information can aid in directing traffic to the nearest
   serving location, inferring likely native language, and providing
   additional context for services involving search queries.

   When an ISP, for example, changes the location where an IP prefix is
   deployed, services which make use of geolocation information may
   begin to suffer degraded performance.  This can lead to customer
   complaints, possibly to the ISP directly.
   Dissemination of correct geolocation data is complicated by the lack
   of any centralized means to coordinate and communicate geolocation
   information to all interested consumers of the data.

   This document records a format whereby a network operator (an ISP, an
   enterprise, or any organization which deems the geolocation of its IP
   prefixes to be of concern) can publish a mapping of IP address
   prefixes to simplified geolocation information, colloquially termed a
   "geolocation feed".  Interested parties can poll and parse these
   feeds to update or merge with other geolocation data sources and
   procedures.

   This document describes a currently deployed format.  At least one
   consumer (Google) has incorporated these feeds into a geolocation
   data pipeline, and a significant number of ISPs are using it to
   inform them where their prefixes should be geolocated.

1.2.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119.

1.3.  Implications of publication

   This document describes both a format and a mechanism for publishing
   data, with the implication that the owner of the data wishes it to be
   public.  Any privacy risk is bounded by the format, and feed
   publishers MAY omit any location field to further protect privacy
   (see Section 2.1 for details about exactly which fields may be
   omitted).  Feed publishers assume the responsibility of determining
   which data should be made public.

   This proposal does not incorporate a mechanism to communicate
   acceptable use policies for self-published data.  Publication itself
   is inferred as a desire by the publisher for the data to be usefully
   consumed, similar to the publication of information like host names,
   cryptographic keys, and SPF records [RFC4408] in the DNS.

2.  Self-published IP geolocation feeds

   The format described here was developed to address the need of
   network operators to rapidly and usefully share geolocation
   information changes.  Originally, there arose a specific case where
   regional operators found it desirable to publish location changes
   rather than wait for geolocation algorithms to "learn" about them.
   Later, technical conferences which frequently use the same network
   prefixes advertised from different conference locations experimented
   by publishing geolocation feeds, updated in advance of network
   location changes, in order to better serve conference attendees.

   At its simplest, the mechanism consists of a network operator
   publishing a file (the "geolocation feed"), which contains several
   text entries, one per line.  Each entry is keyed by a unique (within
   the feed) IP prefix (or single IP address) followed by a sequence of
   network locality attributes to be ascribed to the given prefix.

2.1.  Specification

   For operational simplicity, every feed should contain data about all
   IP addresses the provider wants to publish.  Alternatives, like
   publishing only entries for IP addresses whose geolocation data has
   changed or differs from current observed geolocation behavior "at
   large", are likely to be too operationally complex.

   Feeds MUST use UTF-8 [RFC3629] character encoding.  Text after a '#'
   character is treated as a comment only and ignored.  Blank lines are
   similarly ignored.

   Feed lines that are not comments MUST be in comma separated value
   (CSV) format as described in [RFC4180].  Each feed entry is a text
   line of the form:

      ip_prefix,country,region,city,postal_code

   The IP prefix field is REQUIRED; all others are OPTIONAL (can be
   empty), though the requisite minimum number of commas SHOULD be
   present.

2.1.1.  Geolocation feed individual entry fields

2.1.1.1.  IP Prefix

   REQUIRED.
   Each IP prefix field MUST be either a single IP address or
   an IP prefix in CIDR notation in conformance with section 3.1 [1] of
   [RFC4632] for IPv4 or section 2.3 [2] of [RFC4291] for IPv6.

   Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
   "2001:db8::1" and "2001:db8::/32" for IPv6.

2.1.1.2.  Country

   OPTIONAL.  The country field, if non-empty, MUST be a 2-letter ISO
   country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2].
   Parsers SHOULD treat this field case-insensitively.

   Examples include "US" for the United States, "JP" for Japan, and "PL"
   for Poland.

2.1.1.3.  Region

   OPTIONAL.  The region field, if non-empty, MUST be an ISO region code
   conforming to ISO 3166-2 [ISO.3166.2].  Parsers SHOULD treat this
   field case-insensitively.

   Examples include "ID-RI" for the Riau province of Indonesia and "NG-
   RI" for the Rivers province in Nigeria.

2.1.1.4.  City

   OPTIONAL.  The city field, if non-empty, SHOULD be free UTF-8 text,
   excluding the comma (',') character.

   Examples include "Dublin", "New York", and "Sao Paulo" (specifically
   "S" followed by 0xc3, 0xa3, and "o Paulo").

2.1.1.5.  Postal code

   OPTIONAL, DEPRECATED.  The postal code field, if non-empty, SHOULD be
   free UTF-8 text, excluding the comma (',') character.  The use of
   this field is deprecated; consumers of feeds should be able to parse
   feeds containing this field, but new feeds SHOULD NOT include it,
   due to the granularity of this information.  See Section 4 for
   additional discussion.

   Examples include "106-6126" (in Minato ward, Tokyo, Japan).

2.1.2.  Prefixes with no geolocation information

   Feed publishers may indicate that some IP prefixes should not have
   any associated geolocation information.
   It may be that some prefixes
   under their administrative control are reserved, not yet allocated or
   deployed, or are in the process of being redeployed elsewhere and
   existing geolocation information can, from the perspective of the
   publisher, safely be discarded.

   This special case can be indicated by explicitly leaving blank all
   fields which specify any degree of geolocation information.  For
   example:

      192.0.2.0/24,,,,
      2001:db8:1::/48,,,,
      2001:db8:2::/48,,,,

   Historically, the user-assigned country identifier of "ZZ" has been
   used for this same purpose.  This is not necessarily preferred, and
   no specific interpretation of any of the other user-assigned country
   codes is currently defined.

2.1.3.  Additional parsing requirements

   Feed entries missing required fields, or having a required field
   which fails to parse correctly, MUST be discarded.  It is RECOMMENDED
   that such entries also be logged for further administrative review.

   While publishers SHOULD follow [RFC5952] style for IPv6 prefix
   fields, consumers MUST nevertheless accept all valid string
   representations.

   Duplicate IP address or prefix entries MUST be considered an error,
   and consumer implementations SHOULD log the repeated entries for
   further administrative review.  Publishers SHOULD take measures to
   ensure there is one and only one entry per IP address and prefix.

   Feed entries with non-empty optional fields which fail to parse,
   either in part or in full, SHOULD be discarded.  It is RECOMMENDED
   that they also be logged for further administrative review.

   For compatibility with future additional fields, a parser MUST ignore
   any fields beyond those it expects.  The data from fields which are
   expected and which parse successfully MUST still be considered valid.

2.1.4.  Looking up an IP address

   Multiple entries which constitute nested prefixes are permitted.
   Consumers SHOULD consider the entry with the longest matching prefix
   (i.e. the "most specific") to be the best matching entry for a given
   IP address.

2.2.  Examples

   Example entries using different IP address formats and describing
   locations at country, region, and city granularity level,
   respectively:

      192.0.2.0/25,US,US-AL,,
      192.0.2.5,US,US-AL,Alabaster,
      192.0.2.128/25,PL,PL-MZ,,
      2001:db8::/32,PL,,,
      2001:db8:cafe::/48,PL,PL-MZ,,

   The IETF network publishes geolocation information for the meeting
   prefixes, and generally just comments out the last meeting's
   information and appends the new meeting's information.  [GEO_IETF]
   at the time of this writing contains:

      # IETF 104, March 2019 - Prague, CZ.
      # Note that Prague changed from CZ-PR to CZ-10 2016-11-15
      # https://www.iso.org/obp/ui/#iso:code:3166:CZ
      130.129.0.0/16,CZ,CZ-10,Prague,
      2001:df8::/32,CZ,CZ-10,Prague,
      31.133.128.0/18,CZ,CZ-10,Prague,
      31.130.224.0/20,CZ,CZ-10,Prague,
      2001:67c:1230::/46,CZ,CZ-10,Prague,
      2001:67c:370::/48,CZ,CZ-10,Prague,

   Experimentally, RIPE has published geolocation information for their
   conference network prefixes, which change location in accordance with
   each new event.  [GEO_RIPE_NCC] at the time of writing contains:

      193.0.24.0/21,IS,IS-1,Reykjavik,
      2001:67c:64::/48,IS,IS-1,Reykjavik,

   Similarly, ICANN has published geolocation information for their
   portable conference network prefixes.  [GEO_ICANN] at the time of
   writing contains:

      199.91.192.0/21,ES,ES-CT,Barcelona
      2620:f:8000::/48,ES,ES-CT,Barcelona

   A longer example is the [GEO_Google] Google Corp Geofeed, which lists
   the geolocation information for Google corporate offices.

   Furthermore, it is worth noting that the geolocation data of SixXS
   users, already available at whois.sixxs.net, is now also accessible
   in the format described here (see [GEO_SIXXS]).
   This can be
   particularly useful where tunnel broker networks [RFC3053] are
   concerned as:

   o  the geolocation attributes of users with neighboring prefixes can
      be quite different and therefore not easily aggregated, and

   o  attempting to learn this data by statistical analysis can be
      complicated by the likely low number of samples for any given
      user, making satisfactory statistical confidence difficult to
      achieve.

3.  Consuming self-published IP geolocation feeds

   Consumers MAY treat published feed data as a hint only and MAY choose
   to prefer other sources of geolocation information for any given IP
   prefix.  Regardless of a consumer's stance with respect to a given
   published feed, there are some points of note for sensibly and
   effectively consuming published feeds.

3.1.  Feed integrity

   The integrity of published information SHOULD be protected by
   securing the means of publication, for example by using HTTP over TLS
   [RFC2818].  Whenever possible, consumers SHOULD prefer retrieving
   geolocation feeds in a manner that guarantees integrity of the feed.

3.2.  Verification of authority

   Consumers of self-published IP geolocation feeds SHOULD perform some
   form of verification that the publisher is in fact authoritative for
   the addresses in the feed.  The actual means of verification is
   likely dependent upon the way in which the feed is discovered.  Ad
   hoc shared URIs, for example, will likely require an ad hoc
   verification process.  Future automated means of feed discovery
   SHOULD have an accompanying automated means of verification.

   A consumer MUST only trust geolocation information for IP addresses
   or prefixes for which the publisher has been verified as
   administratively authoritative.  All other geolocation feed entries
   MUST be ignored and SHOULD be logged for further administrative
   review.

3.3.  Verification of accuracy

   Errors and inaccuracies may occur at many levels, and publication and
   consumption of geolocation data are no exceptions.  To the extent
   practical, consumers SHOULD take steps to verify the accuracy of
   published locality.  Verification methodology, resolution of
   discrepancies, and preference for alternative sources of data are
   left to the discretion of the feed consumer.

   Consumers SHOULD decide on discrepancy thresholds and SHOULD flag for
   administrative review feed entries which exceed set thresholds.

3.4.  Refreshing feed information

   As a publisher can change geolocation data at any time and without
   notification, consumers SHOULD implement mechanisms to periodically
   refresh local copies of feed data.  In the absence of any other
   refresh timing information, consumers SHOULD refresh feeds no less
   often than weekly.

   For feeds available via HTTPS (or HTTP), the publisher MAY
   communicate refresh timing information by means of the standard HTTP
   expiration model (section 13.2 [3] of [RFC2616]).  Specifically,
   publishers can include either an Expires header [4] or a Cache-
   Control header [5] specifying the max-age.  Where practical,
   consumers SHOULD refresh feed information before the expiry time is
   reached.

4.  Privacy Considerations

   Publishers of geolocation feeds are advised to have fully considered
   any and all privacy implications of the disclosure of such
   information for the users of the described networks prior to
   publication.  A thorough comprehension of the security considerations
   [6] of a chosen geolocation policy is highly recommended, including
   an understanding of some of the limitations of information obscurity
   [7] (see also [RFC6772]).
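
   Returning to the refresh guidance of Section 3.4, the following is a
   minimal Python sketch, not part of the deployed format, of how a
   consumer might choose its next refresh time.  The function name
   next_refresh, the headers dictionary, and the DEFAULT_REFRESH
   constant are illustrative assumptions.  A max-age directive is given
   precedence over Expires, as in the HTTP expiration model, and the
   weekly fallback applies only when neither is usable.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Fallback when the publisher sends no usable timing information.
DEFAULT_REFRESH = timedelta(days=7)


def next_refresh(headers, fetched_at):
    """Return the time by which the feed should be fetched again.

    headers: mapping of HTTP response header names to values
             (hypothetical input; any HTTP client could supply it).
    fetched_at: timezone-aware datetime of the last successful fetch.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Cache-Control max-age overrides Expires when both are present.
    match = re.search(r"max-age\s*=\s*(\d+)",
                      lowered.get("cache-control", ""), re.IGNORECASE)
    if match:
        return fetched_at + timedelta(seconds=int(match.group(1)))

    if "expires" in lowered:
        try:
            expiry = parsedate_to_datetime(lowered["expires"])
        except (TypeError, ValueError):
            # Unparsable date: fall back to the weekly default.
            return fetched_at + DEFAULT_REFRESH
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        return expiry

    return fetched_at + DEFAULT_REFRESH
```

   A consumer would schedule its next retrieval no later than the
   returned time.  Whether to also cap very long publisher-supplied
   lifetimes is a policy choice this sketch leaves open.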

   As noted in Section 2.1, each location field in an entry is optional,
   in order to support expressing only the level of specificity which
   the publisher has deemed acceptable.  There is no requirement that
   the level of specificity be consistent across all entries within a
   feed.  In particular, the Postal Code field (Section 2.1.1.5) can
   provide very specific geolocation, sometimes within a building.  Such
   specific Postal Code values MUST NOT be published in geo feeds
   without the express consent of the parties being located.

5.  Relation to other work

   While not originally done in conjunction with the [GEOPRIV] working
   group, Richard Barnes observed that this work is nevertheless
   consistent with that which the group has defined, both for address
   format and for privacy.  The data elements in geolocation feeds are
   equivalent to the following XML structure (vis. [RFC5139]):

      country
      region
      city
      postal_code

   Providing geolocation information to this granularity is equivalent
   to the following privacy policy (vis. the definition of the
   'building' [8] level of disclosure):

      building

6.  Security Considerations

   As there is no true security in the obscurity of the location of any
   given IP address, self-publication of this data fundamentally opens
   no new attack vectors.  For publishers, self-published data merely
   increases the ease with which such location data might be exploited.

   For consumers, feed retrieval processes may receive input from
   potentially hostile sources (e.g. in the event of hijacked traffic).
   As such, proper input validation and defense measures MUST be taken.

   Similarly, consumers who do not perform sufficient verification of
   published data bear the same risks as from other forms of geolocation
   configuration errors.
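
   As one hedged illustration of the input validation called for above,
   the per-line rules of Sections 2.1 and 2.1.3 and the longest-match
   lookup of Section 2.1.4 might be sketched in Python as follows.  This
   is not the validator of Appendix A: it uses the standard-library
   ipaddress module rather than the ipaddr library, and the names Entry,
   parse_entry, load_feed, and lookup are hypothetical.  The country and
   region checks are syntactic only, and keeping the first of two
   duplicate prefixes is a choice made for this sketch.

```python
import csv
import ipaddress
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Syntactic checks only: these do not confirm that a code is assigned.
COUNTRY_RE = re.compile(r"[A-Za-z]{2}")
REGION_RE = re.compile(r"[A-Za-z]{2}-[A-Za-z0-9]{1,3}")


class Entry(NamedTuple):
    network: Network
    country: str
    region: str
    city: str


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one feed line.

    Returns None for blank or comment-only lines and raises ValueError
    for entries that should be discarded.
    """
    text = line.split("#", 1)[0].strip()  # text after '#' is a comment
    if not text:
        return None
    fields = [field.strip() for field in next(csv.reader([text]))]
    # Ignore fields beyond the five defined ones; pad any that are missing.
    prefix, country, region, city, _postal = (fields + [""] * 5)[:5]
    # Accepts a single address or a CIDR prefix.  Rejecting prefixes with
    # host bits set is the module default and an assumption of this sketch.
    network = ipaddress.ip_network(prefix)
    if country and not COUNTRY_RE.fullmatch(country):
        raise ValueError("bad country code: %r" % country)
    if region and not REGION_RE.fullmatch(region):
        raise ValueError("bad region code: %r" % region)
    return Entry(network, country.upper(), region.upper(), city)


def load_feed(lines: Iterable[str]) -> Tuple[List[Entry], List[str]]:
    """Return (entries, problems); problems are messages kept for review."""
    entries = {}
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            entry = parse_entry(line)
        except (ValueError, csv.Error) as err:
            problems.append("line %d discarded: %s" % (number, err))
            continue
        if entry is None:
            continue
        if entry.network in entries:
            # Duplicates are an error; the first occurrence is kept here.
            problems.append("line %d duplicates %s" % (number, entry.network))
            continue
        entries[entry.network] = entry
    return list(entries.values()), problems


def lookup(entries: Iterable[Entry], address: str) -> Optional[Entry]:
    """Return the longest-prefix (most specific) match, or None."""
    addr = ipaddress.ip_address(address)
    matches = [e for e in entries
               if e.network.version == addr.version and addr in e.network]
    return max(matches, key=lambda e: e.network.prefixlen, default=None)
```

   Given the example entries of Section 2.2, lookup(entries,
   "192.0.2.5") would select the single-address entry rather than the
   enclosing 192.0.2.0/25 entry.  The linear scan keeps the sketch
   short; a consumer with a large feed would likely prefer a prefix
   tree.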

   Validation of a feed's contents includes verifying that the publisher
   is authoritative for the IP prefixes included in the feed.  Failure
   to verify IP prefix authority would, for example, allow ISP Bob to
   make geolocation statements about IP space held by ISP Alice.  At
   this time only out-of-band verification methods are implemented (i.e.
   an ISP's feed may be verified against publicly available IP
   allocation data).

7.  Planned future work

   In order to more flexibly support future extensions, use of a more
   expressive feed format has been suggested.  Use of JavaScript Object
   Notation (JSON, [RFC4627]), specifically, has been discussed.
   However, at the time of writing no such specification nor
   implementation exists.  Nevertheless, work on extensions is deferred
   until a more suitable format has been selected.

   The authors are planning on writing a document describing such a new
   format.  The current document describes a currently deployed and used
   format.

8.  Finding self-published IP geolocation feeds

   The issue of finding, and later verifying, geolocation feeds is not
   formally specified in this document.  At this time, only ad hoc feed
   discovery and verification has a modicum of established practice (see
   below); discussion of other mechanisms has been removed for clarity.

8.1.  Ad hoc 'well known' URIs

   To date, geolocation feeds have been shared informally in the form of
   HTTPS URIs exchanged in email threads.  The two example URIs
   documented above describe networks that change locations
   periodically, the operators and operational practices of which are
   well known within their respective technical communities.
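
   The rule in Section 3.2 that only entries for verified prefixes may
   be trusted can be sketched as a containment check.  This is a
   minimal Python illustration under stated assumptions: the function
   name split_by_authority is hypothetical, and the authorized list
   stands in for whatever out-of-band source (for example, public
   allocation data) a consumer has already used to establish which
   prefixes the publisher holds.  How that list is obtained is outside
   the scope of the sketch.

```python
import ipaddress


def split_by_authority(feed_prefixes, authorized_prefixes):
    """Split feed prefixes into (trusted, ignored) lists.

    A feed prefix is trusted only if it lies entirely within one of the
    prefixes the publisher has been verified as authoritative for.
    """
    authorized = [ipaddress.ip_network(p) for p in authorized_prefixes]
    trusted, ignored = [], []
    for text in feed_prefixes:
        network = ipaddress.ip_network(text)
        covered = any(
            # subnet_of() requires both networks to be the same IP version.
            network.version == parent.version and network.subnet_of(parent)
            for parent in authorized
        )
        (trusted if covered else ignored).append(text)
    return trusted, ignored
```

   Entries in the ignored list would not be used for geolocation and
   would be logged for administrative review.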

   The contents of the feeds are verified by a similarly ad hoc process
   including:

   o  personal knowledge of the parties involved in the exchange, and

   o  comparison of feed-advertised prefixes with the BGP-advertised
      prefixes of Autonomous System Numbers known to be operated by the
      publishers.

   Ad hoc mechanisms, while useful for early experimentation by
   producers and consumers, are unlikely to be adequate for long-term,
   widespread use by multiple parties.  Future versions of any such
   self-published geolocation feed mechanism SHOULD address scalability
   concerns by defining a means for automated discovery and verification
   of operational authority of advertised prefixes.

8.2.  Other mechanisms

   Previous versions of this document referenced use of the WHOIS
   service ([RFC3912]) operated by RIRs as well as possible DNS-based
   schemes to discover and validate geofeeds.  To the authors'
   knowledge, support for such mechanisms has never been implemented,
   and this speculative text has been removed to avoid ambiguity.

9.  IANA Considerations

   This document makes no requests of the IANA.

10.  Acknowledgements

   The authors would like to express their gratitude to reviewers and
   early implementers, including but not limited to Mikael Abrahamsson,
   Ray Bellis, John Bond, Alissa Cooper, Andras Erdei, Marco Hogewoning,
   Mike Joseph, Maciej Kuzniar, Menno Schepers, Justyna Sidorska, Pim
   van Pelt, and Bjoern A. Zeeb.  Richard L. Barnes and Andy Newton in
   particular contributed substantial review, text, and advice.

11.  References

11.1.  Normative References

   [ISO.3166.1alpha2]
              International Organization for Standardization, "ISO
              3166-1 decoding table", .

   [ISO.3166.2]
              International Organization for Standardization, "ISO
              3166-2:2007", .

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T.
Berners-Lee, "Hypertext 586 Transfer Protocol -- HTTP/1.1", RFC 2616, 587 DOI 10.17487/RFC2616, June 1999, . 590 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 591 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 592 2003, . 594 [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing 595 Architecture", RFC 4291, DOI 10.17487/RFC4291, February 596 2006, . 598 [RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing 599 (CIDR): The Internet Address Assignment and Aggregation 600 Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August 601 2006, . 603 11.2. Informative References 605 [GEO_Google] 606 Google, LLC, "Google Corp Geofeed", 607 . 609 [GEO_ICANN] 610 Internet Corporation For Assigned Names and Numbers, 611 "ICANN Meeting Geolocation Data", 612 . 614 [GEO_IETF] 615 Kumari, A., "IETF Meeting Network Geolocation Data", 616 . 618 [GEO_RIPE_NCC] 619 Schepers, M., "RIPE NCC Meeting Geolocation Data", 620 . 622 [GEO_SIXXS] 623 van Pelt, P., "SixXS Geolocation Data", 624 . 626 [GEOPRIV] Internet Engineering Task Force, "IETF geopriv Working 627 Group", . 629 [IPADDR_PY] 630 Shields, M. and P. Moody, "Python IP address manipulation 631 library", . 633 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, 634 DOI 10.17487/RFC2818, May 2000, . 637 [RFC3053] Durand, A., Fasano, P., Guardini, I., and D. Lento, "IPv6 638 Tunnel Broker", RFC 3053, DOI 10.17487/RFC3053, January 639 2001, . 641 [RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912, 642 DOI 10.17487/RFC3912, September 2004, . 645 [RFC4180] Shafranovich, Y., "Common Format and MIME Type for Comma- 646 Separated Values (CSV) Files", RFC 4180, 647 DOI 10.17487/RFC4180, October 2005, . 650 [RFC4408] Wong, M. and W. Schlitt, "Sender Policy Framework (SPF) 651 for Authorizing Use of Domains in E-Mail, Version 1", 652 RFC 4408, DOI 10.17487/RFC4408, April 2006, 653 . 
655 [RFC4627] Crockford, D., "The application/json Media Type for 656 JavaScript Object Notation (JSON)", RFC 4627, 657 DOI 10.17487/RFC4627, July 2006, . 660 [RFC5139] Thomson, M. and J. Winterbottom, "Revised Civic Location 661 Format for Presence Information Data Format Location 662 Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139, 663 February 2008, . 665 [RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6 666 Address Text Representation", RFC 5952, 667 DOI 10.17487/RFC5952, August 2010, . 670 [RFC6772] Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J., 671 Polk, J., Morris, J., and M. Thomson, "Geolocation Policy: 672 A Document Format for Expressing Privacy Preferences for 673 Location Information", RFC 6772, DOI 10.17487/RFC6772, 674 January 2013, . 676 11.3. URIs 678 [1] http://tools.ietf.org/html/rfc4632#section-3.1 680 [2] http://tools.ietf.org/html/rfc4291#section-2.3 682 [3] http://tools.ietf.org/html/rfc2616#section-13.2 684 [4] http://tools.ietf.org/html/rfc2616#section-14.21 686 [5] http://tools.ietf.org/html/rfc2616#section-14.9 688 [6] http://tools.ietf.org/html/rfc6772#section-13 690 [7] http://tools.ietf.org/html/rfc6772#section-13.5 692 [8] http://tools.ietf.org/html/rfc6772#section-6.5.1 694 Appendix A. Sample Python validation code 696 Included here is a simple format validator in Python for self- 697 published ipgeo feeds. This tool reads CSV data in the self- 698 published ipgeo feed format from the standard input and performs 699 basic validation. It is intended for use by feed publishers before 700 launching a feed. Note that this validator does not verify the 701 uniqueness of every IP prefix entry within the feed as a whole, but 702 only verifies the syntax of each single line from within the feed. A 703 complete validator MUST also ensure IP prefix uniqueness. 705 The main source file "ipgeo_feed_validator.py" follows. 
   It requires use of the open source ipaddr Python library for IP
   address and CIDR parsing and validation [IPADDR_PY].

#!/usr/bin/python
#
# Copyright (c) 2012 IETF Trust and the persons identified as authors of
# the code. All rights reserved. Redistribution and use in source and
# binary forms, with or without modification, is permitted pursuant to,
# and subject to the license terms contained in, the Simplified BSD
# License set forth in Section 4.c of the IETF Trust's Legal Provisions
# Relating to IETF Documents (http://trustee.ietf.org/license-info).

"""Simple format validator for self-published ipgeo feeds.

This tool reads CSV data in the self-published ipgeo feed format from
the standard input and performs basic validation. It is intended for
use by feed publishers before launching a feed.
"""

import csv
import ipaddr
import re
import sys

class IPGeoFeedValidator(object):
  def __init__(self):
    self.prefixes = {}
    self.line_number = 0
    self.output_log = {}
    self.SetOutputStream(sys.stderr)

  def Validate(self, feed):
    """Check validity of an IPGeo feed.

    Args:
      feed: iterable with feed lines
    """

    for line in feed:
      self._ValidateLine(line)

  def SetOutputStream(self, logfile):
    """Controls where the output messages go (STDERR by default).

    Use None to disable logging.

    Args:
      logfile: a file object (e.g., sys.stdout or sys.stderr) or None.
    """
    self.output_stream = logfile

  def CountErrors(self, severity):
    """How many ERRORs or WARNINGs were generated."""
    return len(self.output_log.get(severity, []))

  ############################################################
  def _ValidateLine(self, line):
    line = line.rstrip('\r\n')
    self.line_number += 1
    self.line = line.split('#')[0]
    self.is_correct_line = True

    if self._ShouldIgnoreLine(line):
      return

    fields = [field for field in csv.reader([line])][0]

    self._ValidateFields(fields)
    self._FlushOutputStream()

  def _ShouldIgnoreLine(self, line):
    # Blank lines and comment lines (starting with '#') are skipped.
    line = line.strip()
    if line.startswith('#'):
      return True
    return len(line) == 0

  ############################################################
  def _ValidateFields(self, fields):
    assert(len(fields) > 0)

    is_correct = self._IsIPAddressOrPrefixCorrect(fields[0])

    if len(fields) > 1:
      if not self._IsCountryCode2Correct(fields[1]):
        is_correct = False

    if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
      is_correct = False

    if len(fields) != 5:
      self._ReportWarning('5 fields were expected (got %d).'
                          % len(fields))

  ############################################################
  def _IsIPAddressOrPrefixCorrect(self, field):
    if '/' in field:
      return self._IsCIDRCorrect(field)
    return self._IsIPAddressCorrect(field)

  def _IsCIDRCorrect(self, cidr):
    try:
      ipprefix = ipaddr.IPNetwork(cidr)
      if ipprefix.network._ip != ipprefix._ip:
        self._ReportError('Incorrect IP Network.')
        return False
      if ipprefix.is_private:
        self._ReportError('IP Address must not be private.')
        return False
    except:
      self._ReportError('Incorrect IP Network.')
      return False
    return True

  def _IsIPAddressCorrect(self, ipaddress):
    try:
      ip = ipaddr.IPAddress(ipaddress)
    except:
      self._ReportError('Incorrect IP Address.')
      return False
    if ip.is_private:
      self._ReportError('IP Address must not be private.')
      return False
    return True

  ############################################################
  def _IsCountryCode2Correct(self, country_code_2):
    if len(country_code_2) == 0:
      return True
    if len(country_code_2) != 2 or not country_code_2.isalpha():
      self._ReportError(
          'Country code must be in the ISO 3166-1 alpha 2 format.')
      return False
    return True

  def _IsRegionCodeCorrect(self, region_code):
    if len(region_code) == 0:
      return True
    if '-' not in region_code:
      self._ReportError('Region code must be in the ISO 3166-2 format.')
      return False

    parts = region_code.split('-')
    if not self._IsCountryCode2Correct(parts[0]):
      return False
    return True

  ############################################################
  def _ReportError(self, message):
    self._ReportWithSeverity('ERROR', message)

  def _ReportWarning(self, message):
    self._ReportWithSeverity('WARNING', message)

  def _ReportWithSeverity(self, severity, message):
    self.is_correct_line = False
    output_line = '%s: %s\n' % (severity, message)

    if severity not in self.output_log:
      self.output_log[severity] = []
    self.output_log[severity].append(output_line)

    if self.output_stream is not None:
      self.output_stream.write(output_line)

  def _FlushOutputStream(self):
    if self.is_correct_line: return
    if self.output_stream is None: return

    self.output_stream.write('line %d: %s\n\n'
                             % (self.line_number, self.line))

############################################################
def main():
  feed_validator = IPGeoFeedValidator()
  feed_validator.Validate(sys.stdin)

  if feed_validator.CountErrors('ERROR'):
    sys.exit(1)

if __name__ == '__main__':
  main()

   A unit test file, "ipgeo_feed_validator_test.py", is provided as
   well. It provides basic test coverage of the code above, though it
   does not test correct handling of non-ASCII UTF-8 strings.

#!/usr/bin/python
#
# Copyright (c) 2012 IETF Trust and the persons identified as authors of
# the code. All rights reserved. Redistribution and use in source and
# binary forms, with or without modification, is permitted pursuant to,
# and subject to the license terms contained in, the Simplified BSD
# License set forth in Section 4.c of the IETF Trust's Legal Provisions
# Relating to IETF Documents (http://trustee.ietf.org/license-info).
import sys
from ipgeo_feed_validator import IPGeoFeedValidator

class IPGeoFeedValidatorTest(object):
  def __init__(self):
    self.validator = IPGeoFeedValidator()
    self.validator.SetOutputStream(None)
    self.successes = 0
    self.failures = 0

  def Run(self):
    # Each call passes the feed line, then the expected number of
    # errors, then the expected number of warnings.
    self.TestFeedLine('# asdf', 0, 0)
    self.TestFeedLine(' ', 0, 0)
    self.TestFeedLine('', 0, 0)

    self.TestFeedLine('asdf', 1, 1)
    self.TestFeedLine('asdf,US,,,', 1, 0)
    self.TestFeedLine('aaaa::,US,,,', 0, 0)
    self.TestFeedLine('zzzz::,US', 1, 1)
    self.TestFeedLine(',US,,,', 1, 0)
    self.TestFeedLine('55.66.77', 1, 1)
    self.TestFeedLine('55.66.77.888', 1, 1)
    self.TestFeedLine('55.66.77.asdf', 1, 1)

    self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
    self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

    self.TestFeedLine('55.66.77.88,PL', 0, 1)
    self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
    self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
    self.TestFeedLine('55.66.77.88,99,,,', 1, 0)

    self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
    self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
    self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

    self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
    self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', 0, 0)
    self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
                      '1600 Amphitheatre Parkway', 0, 1)

    self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
    self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
    self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
    self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
    self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

    self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
    self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
    self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
    self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
    self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
    self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
    self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
    self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

    print('%d tests passed, %d failed' % (self.successes, self.failures))

  def IsOutputLogCorrectAtSeverity(self, severity, expected_msg_count):
    msg_count = self.validator.CountErrors(severity)

    if msg_count != expected_msg_count:
      print('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' % (
          self.validator.line, expected_msg_count, severity, msg_count,
          str(self.validator.output_log[severity])))
      return False
    return True

  def IsOutputLogCorrect(self, new_errors, new_warnings):
    retval = True

    if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
      retval = False
    if not self.IsOutputLogCorrectAtSeverity('WARNING', new_warnings):
      retval = False

    return retval

  # The expected error count comes first, then the expected warning
  # count, matching the argument order of IsOutputLogCorrect.
  def TestFeedLine(self, line, error_count, warning_count):
    self.validator.output_log['WARNING'] = []
    self.validator.output_log['ERROR'] = []
    self.validator._ValidateLine(line)

    if not self.IsOutputLogCorrect(error_count, warning_count):
      self.failures += 1
      return False

    self.successes += 1
    return True

if __name__ == '__main__':
  IPGeoFeedValidatorTest().Run()

Authors' Addresses

   Erik Kline
   Loon LLC
   1600 Amphitheatre Parkway
   Mountain View, California 94043
   United States of America

   Email: ek@loon.com

   Krzysztof Duleba
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich 8002
   Switzerland

   Email: kduleba@google.com

   Zoltan Szamonek
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich 8002
   Switzerland

   Email: zszami@google.com

   Stefan Moser
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich 8002
   Switzerland

   Email: smoser@google.com

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   US

   Email: warren@kumari.net