Network Working Group                                          E. Kline
Internet-Draft                                                 Loon LLC
Intended status: Informational                                K. Duleba
Expires: May 7, 2020                                        Z. Szamonek
                                                               S. Moser
                                                Google Switzerland GmbH
                                                              W. Kumari
                                                                 Google
                                                       November 4, 2019


            A Format for Self-published IP Geolocation Feeds
                draft-google-self-published-geofeeds-06

Abstract

This document records a format whereby a network operator can publish a mapping of IP address prefixes to simplified geolocation information, colloquially termed a geolocation "feed". Interested parties can poll and parse these feeds to update or merge with other geolocation data sources and procedures.
This format intentionally only allows specifying coarse-level location.

Some technical organizations operating networks that move from one conference location to the next have already experimentally published small geolocation feeds.

This document describes a currently deployed format. At least one consumer (Google) has incorporated these feeds into a geolocation data pipeline, and a significant number of ISPs are using it to inform consumers where their prefixes should be geolocated.

[RFC Ed - Please remove before publication: The IETF Meeting network currently publishes a feed in this format at: https://noc.ietf.org/geo/google.csv -- this has significantly cut down on the number of "Gah! Why does the network believe I'm in Montreal, that was last meeting! How am I supposed to find a pub?!" complaints. A number of other meeting networks, including RIPE and ICANN, publish this information as well; see below.]

[Ed note: Text inside square brackets ([]) is additional background information, answers to frequently asked questions, general musings, etc. Such notes will be removed before publication.]

[This document is being collaborated on in GitHub at: https://github.com/google/self-published-geo . The most recent version of the document, open issues, etc. should all be available there. The authors (gratefully) accept pull requests.]

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 7, 2020.

Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Motivation
     1.2. Requirements notation
     1.3. Assumptions about publication
   2. Self-published IP geolocation feeds
     2.1. Specification
       2.1.1. Geolocation feed individual entry fields
         2.1.1.1. IP Prefix
         2.1.1.2. Country
         2.1.1.3. Region
         2.1.1.4. City
         2.1.1.5. Postal code
       2.1.2. Prefixes with no geolocation information
       2.1.3. Additional parsing requirements
     2.2. Examples
   3. Consuming self-published IP geolocation feeds
     3.1. Feed integrity
     3.2. Verification of authority
     3.3. Verification of accuracy
     3.4. Refreshing feed information
   4. Privacy Considerations
   5. Relation to other work
   6. Security Considerations
   7. Planned future work
   8. Finding self-published IP geolocation feeds
     8.1. Ad hoc 'well known' URIs
     8.2. Other mechanisms
   9. IANA Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
     11.3. URIs
   Appendix A. Sample Python validation code
   Authors' Addresses

1. Introduction

1.1. Motivation

Providers of services over the Internet have grown to depend on best-effort geolocation information to improve the user experience. Locality information can aid in directing traffic to the nearest serving location, inferring likely native language, and providing additional context for services involving search queries.

When an ISP, for example, changes the location where an IP prefix is deployed, services which make use of geolocation information may begin to suffer degraded performance. This can lead to customer complaints, possibly to the ISP directly.
Dissemination of correct geolocation data is complicated by the lack of any centralized means to coordinate and communicate geolocation information to all interested consumers of the data.

This document records a format whereby a network operator (an ISP, an enterprise, or any organization which deems the geolocation of its IP prefixes to be of concern) can publish a mapping of IP address prefixes to simplified geolocation information, colloquially termed a "geolocation feed". Interested parties can poll and parse these feeds to update or merge with other geolocation data sources and procedures.

This document describes a currently deployed format. At least one consumer (Google) has incorporated these feeds into a geolocation data pipeline, and a significant number of ISPs are using it to inform consumers where their prefixes should be geolocated.

1.2. Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

As this is an informational document about a data format and set of operational practices presently in use, requirements notation captures the design goals of the authors and implementors.

1.3. Assumptions about publication

This document describes both a format and a mechanism for publishing data, with the assumption that the network operator to whom operational responsibility has been delegated for any published data wishes it to be public. Any privacy risk is bounded by the format, and feed publishers MAY omit prefixes or any location field associated with a given prefix to further protect privacy (see Section 2.1 for details about exactly which fields may be omitted).
Feed publishers assume the responsibility of determining which data should be made public.

This proposal does not incorporate a mechanism to communicate acceptable use policies for self-published data. Publication itself is taken as a desire by the publisher for the data to be usefully consumed, similar to the publication of information like host names, cryptographic keys, and SPF records [RFC4408] in the DNS.

2. Self-published IP geolocation feeds

The format described here was developed to address the need of network operators to rapidly and usefully share geolocation information changes. Originally, there arose a specific case where regional operators found it desirable to publish location changes rather than wait for geolocation algorithms to "learn" about them. Later, technical conferences which frequently use the same network prefixes advertised from different conference locations experimented by publishing geolocation feeds, updated in advance of network location changes, in order to better serve conference attendees.

At its simplest, the mechanism consists of a network operator publishing a file (the "geolocation feed") which contains several text entries, one per line. Each entry is keyed by a unique (within the feed) IP prefix (or single IP address) followed by a sequence of network locality attributes to be ascribed to the given prefix.

2.1. Specification

For operational simplicity, every feed should contain data about all IP addresses the provider wants to publish. Alternatives, like publishing only entries for IP addresses whose geolocation data has changed or differs from current observed geolocation behavior "at large", are likely to be too operationally complex.

Feeds MUST use UTF-8 [RFC3629] character encoding. Text after a '#' character is treated as a comment only and ignored. Blank lines are similarly ignored.
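Read literally, the comment and blank-line rules above amount to a small pre-filter that runs before any CSV parsing. The sketch below is Python 3 using only the standard library; the function name `significant_lines` is hypothetical. It treats a '#' anywhere on a line, even inside a quoted CSV field, as the start of a comment; that is one reading of the rule above, not something this document says explicitly about quoting.

```python
from typing import Iterable, Iterator, Tuple


def significant_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for each feed line that may hold an entry.

    Everything from the first '#' onward is dropped as a comment, and
    lines that are empty afterwards (including whitespace-only lines)
    are skipped. Line numbers are 1-based so they can go into review logs.
    """
    for number, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()
        if text:
            yield number, text


if __name__ == "__main__":
    feed = (
        "# IETF106 (Singapore) - November 2019 - Singapore, SG\n"
        "130.129.0.0/16,SG,SG-01,Singapore,\n"
        "\n"
        "2001:df8::/32,SG,SG-01,Singapore,  # trailing comment\n"
    )
    for number, text in significant_lines(feed.splitlines()):
        print(number, text)
```

Only lines 2 and 4 of the sample are yielded; the comment line and the blank line are dropped, and the trailing comment on line 4 is removed before the entry is handed on.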
Feed lines that are not comments MUST be in comma-separated value (CSV) format as described in [RFC4180]. Each feed entry is a text line of the form:

   ip_prefix,country,region,city,postal_code

The IP prefix field is REQUIRED; all others are OPTIONAL (can be empty), though the requisite minimum number of commas SHOULD be present.

2.1.1. Geolocation feed individual entry fields

2.1.1.1. IP Prefix

REQUIRED. Each IP prefix field MUST be either a single IP address or an IP prefix in CIDR notation in conformance with section 3.1 [1] of [RFC4632] for IPv4 or section 2.3 [2] of [RFC4291] for IPv6.

Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and "2001:db8::1" and "2001:db8::/32" for IPv6.

2.1.1.2. Country

OPTIONAL. The country field, if non-empty, MUST be a two-letter ISO country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2]. Parsers SHOULD treat this field case-insensitively.

Examples include "US" for the United States, "JP" for Japan, and "PL" for Poland.

2.1.1.3. Region

OPTIONAL. The region field, if non-empty, MUST be an ISO region code conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this field case-insensitively.

Examples include "ID-RI" for the Riau province of Indonesia and "NG-RI" for the Rivers province in Nigeria.

2.1.1.4. City

OPTIONAL. The city field, if non-empty, SHOULD be free UTF-8 text, excluding the comma (',') character.

Examples include "Dublin", "New York", and "São Paulo" (specifically "S" followed by 0xc3, 0xa3, and "o Paulo").

2.1.1.5. Postal code

OPTIONAL, DEPRECATED. The postal code field, if non-empty, SHOULD be free UTF-8 text, excluding the comma (',') character. The use of this field is deprecated; consumers of feeds should be able to parse feeds containing this field, but new feeds SHOULD NOT include it, due to the granularity of this information.
See Section 4 for additional discussion.

Examples include "106-6126" (in Minato ward, Tokyo, Japan).

2.1.2. Prefixes with no geolocation information

Feed publishers may indicate that some IP prefixes should not have any associated geolocation information. It may be that some prefixes under their administrative control are reserved, not yet allocated or deployed, or are in the process of being redeployed elsewhere, and existing geolocation information can, from the perspective of the publisher, safely be discarded.

This special case can be indicated by explicitly leaving blank all fields which specify any degree of geolocation information. For example:

   192.0.2.0/24,,,,
   2001:db8:1::/48,,,,
   2001:db8:2::/48,,,,

Historically, the user-assigned country identifier "ZZ" had been used for this same purpose. This is not necessarily preferred, and no specific interpretation of any of the other user-assigned country codes is currently defined.

2.1.3. Additional parsing requirements

Feed entries missing required fields, or having a required field which fails to parse correctly, MUST be discarded. It is RECOMMENDED that such entries also be logged for further administrative review.

While publishers SHOULD follow [RFC5952] style for IPv6 prefix fields, consumers MUST nevertheless accept all valid string representations.

Duplicate IP address or prefix entries MUST be considered an error, and consumer implementations SHOULD log the repeated entries for further administrative review. Publishers SHOULD take measures to ensure there is one and only one entry per IP address and prefix.

Feed entries with non-empty optional fields which fail to parse, either in part or in full, SHOULD be discarded. It is RECOMMENDED that they also be logged for further administrative review.
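A consumer applying the requirements of Sections 2.1.2 and 2.1.3 might load a feed roughly as follows. This is a Python 3 sketch that uses the standard-library `ipaddress` module in place of the `ipaddr` library relied on in Appendix A. The function names, the choice to keep the first of two duplicate entries, and the rejection of prefixes with host bits set are assumptions of the sketch rather than requirements stated here. The country and region checks are syntax-only and do not consult the ISO 3166 lists.

```python
import csv
import ipaddress
import logging
import re
from typing import Dict, Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Syntax-only checks; they do not confirm that a code is actually assigned.
COUNTRY_RE = re.compile(r"[A-Za-z]{2}")
REGION_RE = re.compile(r"[A-Za-z]{2}-[A-Za-z0-9]{1,3}")


def load_entries(lines: Iterable[str]) -> Dict[Network, List[str]]:
    """Parse feed lines into {network: [country, region, city, postal_code]}."""
    entries: Dict[Network, List[str]] = {}
    for number, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not text:
            continue
        row = next(csv.reader([text]))
        try:
            # Any valid text form is accepted, RFC 5952 style or not;
            # strict=True additionally rejects prefixes with host bits set.
            network = ipaddress.ip_network(row[0].strip(), strict=True)
        except ValueError:
            logging.warning("line %d: bad IP prefix %r, discarded", number, row[0])
            continue
        # Keep the four known location fields, pad missing ones, and
        # ignore any extra trailing fields.
        fields = [field.strip() for field in row[1:5]]
        fields += [""] * (4 - len(fields))
        country, region = fields[0], fields[1]
        if (country and not COUNTRY_RE.fullmatch(country)) or (
            region and not REGION_RE.fullmatch(region)
        ):
            logging.warning("line %d: bad country/region, discarded", number)
            continue
        if network in entries:
            # Equal networks compare equal whatever their text spelling.
            logging.warning("line %d: duplicate entry for %s", number, network)
            continue
        entries[network] = fields
    return entries


def has_no_location(fields: List[str]) -> bool:
    """True for entries such as '192.0.2.0/24,,,,' (Section 2.1.2)."""
    return not any(fields)


if __name__ == "__main__":
    feed = [
        "192.0.2.0/24,,,,",
        "2001:db8::/32,PL,,,",
        "2001:0DB8:0:0::/32,US,,,",  # same prefix as above, other spelling
        "192.0.2.1/24,US,,,",        # host bits set
        "2001:db8:1::/48,USA,,,",    # three-letter country code
    ]
    for net, fields in load_entries(feed).items():
        print(net, fields, has_no_location(fields))
```

With the sample lines, only the `192.0.2.0/24` and `2001:db8::/32` entries are kept; the other three are logged and discarded, and `has_no_location` reports the first as carrying no geolocation information.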
For compatibility with future additional fields, a parser MUST ignore any fields beyond those it expects. The data from fields which are expected and which parse successfully MUST still be considered valid.

Multiple entries which constitute nested prefixes are permitted. Consumers SHOULD consider the entry with the longest matching prefix (i.e., the "most specific") to be the best matching entry for a given IP address.

2.2. Examples

Example entries using different IP address formats and describing locations at country, region, and city granularity levels:

   192.0.2.0/25,US,US-AL,,
   192.0.2.5,US,US-AL,Alabaster,
   192.0.2.128/25,PL,PL-MZ,,
   2001:db8::/32,PL,,,
   2001:db8:cafe::/48,PL,PL-MZ,,

The IETF network publishes geolocation information for the meeting prefixes, and generally just comments out the last meeting information and appends the new meeting information. The [GEO_IETF] feed at the time of this writing contains:

   # IETF106 (Singapore) - November 2019 - Singapore, SG
   130.129.0.0/16,SG,SG-01,Singapore,
   2001:df8::/32,SG,SG-01,Singapore,
   31.133.128.0/18,SG,SG-01,Singapore,
   31.130.224.0/20,SG,SG-01,Singapore,
   2001:67c:1230::/46,SG,SG-01,Singapore,
   2001:67c:370::/48,SG,SG-01,Singapore,

Experimentally, RIPE has published geolocation information for their conference network prefixes, which change location in accordance with each new event. [GEO_RIPE_NCC] at the time of writing contains:

   193.0.24.0/21,NL,NL-ZH,Rotterdam,
   2001:67c:64::/48,NL,NL-ZH,Rotterdam,

Similarly, ICANN has published geolocation information for their portable conference network prefixes. [GEO_ICANN] at the time of writing contains:

   199.91.192.0/21,MA,MA-07,Marrakech
   2620:f:8000::/48,MA,MA-07,Marrakech

A longer example is the [GEO_Google] Google Corp Geofeed, which lists the geolocation information for Google corporate offices.
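The nested-prefix rule from Section 2.1.3 can be tried on the first example in this section. The Python 3 sketch below uses only the standard library; `build_table` and `best_match` are hypothetical names. It scans every prefix linearly, which is fine for a feed this small; a consumer handling hundreds of thousands of prefixes would more likely use a radix tree or a similar structure.

```python
import csv
import ipaddress
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# The first example from Section 2.2 (no comments or quoted fields).
EXAMPLE_FEED = [
    "192.0.2.0/25,US,US-AL,,",
    "192.0.2.5,US,US-AL,Alabaster,",
    "192.0.2.128/25,PL,PL-MZ,,",
    "2001:db8::/32,PL,,,",
    "2001:db8:cafe::/48,PL,PL-MZ,,",
]


def build_table(lines: List[str]) -> Dict[Network, List[str]]:
    """Map each prefix to its [country, region, city, postal_code] fields."""
    return {ipaddress.ip_network(row[0]): row[1:5] for row in csv.reader(lines)}


def best_match(table: Dict[Network, List[str]], address: str) -> Optional[Network]:
    """Return the longest (most specific) prefix containing address, or None."""
    ip = ipaddress.ip_address(address)
    # 'ip in net' is simply False when the IP versions differ.
    candidates = [net for net in table if ip in net]
    return max(candidates, key=lambda net: net.prefixlen, default=None)


if __name__ == "__main__":
    table = build_table(EXAMPLE_FEED)
    for address in ("192.0.2.5", "192.0.2.77", "2001:db8:cafe::1", "198.51.100.1"):
        match = best_match(table, address)
        print(address, match, table[match] if match is not None else None)
```

Here 192.0.2.5 falls inside both 192.0.2.0/25 and the single-address entry, and the /32 wins, giving the city "Alabaster"; 192.0.2.77 falls back to the region-level /25, 2001:db8:cafe::1 resolves to the /48 rather than the /32, and 198.51.100.1 matches nothing.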
At the time of writing, Google processes approximately 400 feeds comprising more than 750,000 IPv4 and IPv6 prefixes.

3. Consuming self-published IP geolocation feeds

Consumers MAY treat published feed data as a hint only and MAY choose to prefer other sources of geolocation information for any given IP prefix. Regardless of a consumer's stance with respect to a given published feed, there are some points of note for sensibly and effectively consuming published feeds.

3.1. Feed integrity

The integrity of published information SHOULD be protected by securing the means of publication, for example by using HTTP over TLS [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving geolocation feeds in a manner that guarantees integrity of the feed.

3.2. Verification of authority

Consumers of self-published IP geolocation feeds SHOULD perform some form of verification that the publisher is in fact authoritative for the addresses in the feed. The actual means of verification is likely dependent upon the way in which the feed is discovered. Ad hoc shared URIs, for example, will likely require an ad hoc verification process. Future automated means of feed discovery SHOULD have an accompanying automated means of verification.

A consumer should only trust geolocation information for IP addresses or prefixes for which the publisher has been verified as administratively authoritative. All other geolocation feed entries should be ignored and logged for further administrative review.

3.3. Verification of accuracy

Errors and inaccuracies may occur at many levels, and publication and consumption of geolocation data are no exceptions. To the extent practical, consumers SHOULD take steps to verify the accuracy of published locality.
Verification methodology, resolution of discrepancies, and preference for alternative sources of data are left to the discretion of the feed consumer.

Consumers SHOULD decide on discrepancy thresholds and SHOULD flag feed entries which exceed the set thresholds for administrative review.

3.4. Refreshing feed information

As a publisher can change geolocation data at any time and without notification, consumers SHOULD implement mechanisms to periodically refresh local copies of feed data. In the absence of any other refresh timing information, consumers SHOULD refresh feeds no less often than weekly.

For feeds available via HTTPS (or HTTP), the publisher MAY communicate refresh timing information by means of the standard HTTP expiration model (section 13.2 [3] of [RFC2616]). Specifically, publishers can include either an Expires header [4] or a Cache-Control header [5] specifying the max-age. Where practical, consumers SHOULD refresh feed information before the expiry time is reached.

4. Privacy Considerations

Publishers of geolocation feeds are advised to have fully considered any and all privacy implications of the disclosure of such information for the users of the described networks prior to publication. A thorough comprehension of the security considerations [6] of a chosen geolocation policy is highly recommended, including an understanding of some of the limitations of information obscurity [7] (see also [RFC6772]).

As noted in Section 2.1, each location field in an entry is optional, in order to support expressing only the level of specificity which the publisher has deemed acceptable. There is no requirement that the level of specificity be consistent across all entries within a feed. In particular, the Postal Code field (Section 2.1.1.5) can provide very specific geolocation, sometimes within a building.
Such specific Postal Code values MUST NOT be published in geolocation feeds without the express consent of the parties being located.

5. Relation to other work

While not originally done in conjunction with the [GEOPRIV] working group, Richard Barnes observed that this work is nevertheless consistent with that which the group has defined, both for address format and for privacy. The data elements in geolocation feeds are equivalent to the following XML structure (viz. [RFC5139]):

      country
      region
      city
      postal_code

Providing geolocation information to this granularity is equivalent to the following privacy policy (viz. the definition of the 'building' [8] level of disclosure):

      building

6. Security Considerations

As there is no true security in the obscurity of the location of any given IP address, self-publication of this data fundamentally opens no new attack vectors. For publishers, self-published data may increase the ease with which such location data might be exploited (it can, for example, make it easier to discover which prefixes are populated with customers, as distinct from prefixes not generally in use).

For consumers, feed retrieval processes may receive input from potentially hostile sources (e.g., in the event of hijacked traffic). As such, proper input validation and defense measures MUST be taken.

Similarly, consumers who do not perform sufficient verification of published data bear the same risks as from other forms of geolocation configuration errors.

Validation of a feed's contents includes verifying that the publisher is authoritative for the IP prefixes included in the feed. Failure to verify IP prefix authority would, for example, allow ISP Bob to make geolocation statements about IP space held by ISP Alice. At this time only out-of-band verification methods are implemented (i.e.,
an ISP's feed may be verified against publicly available IP allocation data).

7. Planned future work

In order to more flexibly support future extensions, use of a more expressive feed format has been suggested. Use of JavaScript Object Notation (JSON, [RFC4627]), specifically, has been discussed. However, at the time of writing neither such a specification nor an implementation exists. Work on extensions is deferred until a more suitable format has been selected.

The authors are planning on writing a document describing such a new format. The current document describes a currently deployed and used format.

8. Finding self-published IP geolocation feeds

The issue of finding, and later verifying, geolocation feeds is not formally specified in this document. At this time, only ad hoc feed discovery and verification has a modicum of established practice (see below); discussion of other mechanisms has been removed for clarity.

8.1. Ad hoc 'well known' URIs

To date, geolocation feeds have been shared informally in the form of HTTPS URIs exchanged in email threads. The conference-network feeds documented in Section 2.2 describe networks that change locations periodically, the operators and operational practices of which are well known within their respective technical communities.

The contents of the feeds are verified by a similarly ad hoc process including:

   o  personal knowledge of the parties involved in the exchange, and

   o  comparison of feed-advertised prefixes with the BGP-advertised prefixes of Autonomous System Numbers known to be operated by the publishers.

Ad hoc mechanisms, while useful for early experimentation by producers and consumers, are unlikely to be adequate for long-term, widespread use by multiple parties.
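The second check in the list above, comparing feed prefixes with the address space a publisher is known to route, can be sketched as a containment test; following Section 3.2, prefixes outside that space are set aside to be ignored and logged rather than trusted. This Python 3.7+ sketch (standard-library `ipaddress`) takes the authorized prefixes as a plain list. How a consumer derives that list from BGP or allocation data is outside its scope, and the function name and sample data are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def split_by_authority(
    feed_prefixes: Iterable[Network], authorized: Iterable[Network]
) -> Tuple[List[Network], List[Network]]:
    """Partition feed prefixes into (covered, uncovered).

    A feed prefix counts as covered only if it lies entirely inside one
    of the authorized prefixes; partial overlap counts as uncovered.
    """
    authorized = list(authorized)
    covered: List[Network] = []
    uncovered: List[Network] = []
    for prefix in feed_prefixes:
        # subnet_of() raises TypeError across IP versions, so the
        # version comparison must come first.
        if any(prefix.version == auth.version and prefix.subnet_of(auth)
               for auth in authorized):
            covered.append(prefix)
        else:
            uncovered.append(prefix)
    return covered, uncovered


if __name__ == "__main__":
    net = ipaddress.ip_network
    # Hypothetical space the publisher is known to originate in BGP.
    authorized = [net("192.0.2.0/24"), net("2001:db8::/32")]
    feed = [net("192.0.2.128/25"), net("2001:db8:cafe::/48"),
            net("198.51.100.0/24")]
    covered, uncovered = split_by_authority(feed, authorized)
    print("trusted:", covered)
    print("ignore and log:", uncovered)
```

Treating a feed prefix that only partly overlaps the authorized space as uncovered is a conservative choice; a consumer could instead trim such entries to the covered portion.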
Future versions of any such self-published geolocation feed mechanism SHOULD address scalability concerns by defining a means for automated discovery and verification of operational authority of advertised prefixes.

8.2. Other mechanisms

Previous versions of this document referenced use of the WHOIS service ([RFC3912]) operated by RIRs as well as possible DNS-based schemes to discover and validate geofeeds. To the authors' knowledge, support for such mechanisms has never been implemented, and this speculative text has been removed to avoid ambiguity.

9. IANA Considerations

This document makes no requests of the IANA.

10. Acknowledgements

The authors would like to express their gratitude to reviewers and early implementers, including but not limited to Mikael Abrahamsson, Andrew Alston, Ray Bellis, John Bond, Alissa Cooper, Andras Erdei, Stephen Farrell, Marco Hogewoning, Mike Joseph, Maciej Kuzniar, Menno Schepers, Justyna Sidorska, Pim van Pelt, and Bjoern A. Zeeb.

Richard L. Barnes and Andy Newton in particular contributed substantial review, text, and advice.

11. References

11.1. Normative References

[ISO.3166.1alpha2] ISO 3166 Maintenance Agency, "ISO 3166-1 decoding table".

[ISO.3166.2] ISO 3166 Maintenance Agency, "ISO 3166-2:2007".

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, DOI 10.17487/RFC2616, June 1999.

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2003.

[RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", RFC 4291, DOI 10.17487/RFC4291, February 2006.

[RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August 2006.

11.2. Informative References

[GEO_Google] Google, LLC, "Google Corp Geofeed".

[GEO_ICANN] Internet Corporation For Assigned Names and Numbers, "ICANN Meeting Geolocation Data".

[GEO_IETF] Kumari, A., "IETF Meeting Network Geolocation Data".

[GEO_RIPE_NCC] Schepers, M., "RIPE NCC Meeting Geolocation Data".

[GEOPRIV] Internet Engineering Task Force, "IETF geopriv Working Group".

[IPADDR_PY] Shields, M. and P. Moody, "Python IP address manipulation library".

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, DOI 10.17487/RFC2818, May 2000.

[RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912, DOI 10.17487/RFC3912, September 2004.

[RFC4180] Shafranovich, Y., "Common Format and MIME Type for Comma-Separated Values (CSV) Files", RFC 4180, DOI 10.17487/RFC4180, October 2005.

[RFC4408] Wong, M. and W. Schlitt, "Sender Policy Framework (SPF) for Authorizing Use of Domains in E-Mail, Version 1", RFC 4408, DOI 10.17487/RFC4408, April 2006.

[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, DOI 10.17487/RFC4627, July 2006.

[RFC5139] Thomson, M. and J. Winterbottom, "Revised Civic Location Format for Presence Information Data Format Location Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139, February 2008.

[RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6 Address Text Representation", RFC 5952, DOI 10.17487/RFC5952, August 2010.
[RFC6772] Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J., Polk, J., Morris, J., and M. Thomson, "Geolocation Policy: A Document Format for Expressing Privacy Preferences for Location Information", RFC 6772, DOI 10.17487/RFC6772, January 2013.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

11.3. URIs

[1] http://tools.ietf.org/html/rfc4632#section-3.1

[2] http://tools.ietf.org/html/rfc4291#section-2.3

[3] http://tools.ietf.org/html/rfc2616#section-13.2

[4] http://tools.ietf.org/html/rfc2616#section-14.21

[5] http://tools.ietf.org/html/rfc2616#section-14.9

[6] http://tools.ietf.org/html/rfc6772#section-13

[7] http://tools.ietf.org/html/rfc6772#section-13.5

[8] http://tools.ietf.org/html/rfc6772#section-6.5.1

Appendix A. Sample Python validation code

Included here is a simple format validator in Python for self-published ipgeo feeds. This tool reads CSV data in the self-published ipgeo feed format from the standard input and performs basic validation. It is intended for use by feed publishers before launching a feed. Note that this validator does not verify the uniqueness of every IP prefix entry within the feed as a whole, but only verifies the syntax of each single line from within the feed. A complete validator MUST also ensure IP prefix uniqueness.

The main source file "ipgeo_feed_validator.py" follows. It requires use of the open source ipaddr Python library for IP address and CIDR parsing and validation [IPADDR_PY].

#!/usr/bin/python
#
# Copyright (c) 2012 IETF Trust and the persons identified as authors of
# the code. All rights reserved.
Redistribution and use in source and 709 # binary forms, with or without modification, is permitted pursuant to, 710 # and subject to the license terms contained in, the Simplified BSD 711 # License set forth in Section 4.c of the IETF Trust's Legal Provisions 712 # Relating to IETF Documents (http://trustee.ietf.org/license-info). 714 """Simple format validator for self-published ipgeo feeds. 716 This tool reads CSV data in the self-published ipgeo feed format from 717 the standard input and performs basic validation. It is intended for 718 use by feed publishers before launching a feed. 719 """ 721 import csv 722 import ipaddr 723 import re 724 import sys 726 class IPGeoFeedValidator(object): 727 def __init__(self): 728 self.prefixes = {} 729 self.line_number = 0 730 self.output_log = {} 731 self.SetOutputStream(sys.stderr) 733 def Validate(self, feed): 734 """Check validity of an IPGeo feed. 736 Args: 737 feed: iterable with feed lines 738 """ 740 for line in feed: 741 self._ValidateLine(line) 743 def SetOutputStream(self, logfile): 744 """Controls where the output messages go do (STDERR by default). 746 Use None to disable logging. 748 Args: 749 logfile: a file object (e.g., sys.stdout or sys.stderr) or None. 
750 """ 751 self.output_stream = logfile 753 def CountErrors(self, severity): 754 """How many ERRORs or WARNINGs were generated.""" 755 return len(self.output_log.get(severity, [])) 757 ############################################################ 758 def _ValidateLine(self, line): 759 line = line.rstrip('\r\n') 760 self.line_number += 1 761 self.line = line.split('#')[0] 762 self.is_correct_line = True 764 if self._ShouldIgnoreLine(line): 766 return 768 fields = [field for field in csv.reader([line])][0] 770 self._ValidateFields(fields) 771 self._FlushOutputStream() 773 def _ShouldIgnoreLine(self, line): 774 line = line.strip() 775 return len(line) == 0 777 ############################################################ 778 def _ValidateFields(self, fields): 779 assert(len(fields) > 0) 781 is_correct = self._IsIPAddressOrPrefixCorrect(fields[0]) 783 if len(fields) > 1: 784 if not self._IsCountryCode2Correct(fields[1]): 785 is_correct = False 787 if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]): 788 is_correct = False 790 if len(fields) != 5: 791 self._ReportWarning('5 fields were expected (got %d).' 
792 % len(fields)) 794 ############################################################ 795 def _IsIPAddressOrPrefixCorrect(self, field): 796 if '/' in field: 797 return self._IsCIDRCorrect(field) 798 return self._IsIPAddressCorrect(field) 800 def _IsCIDRCorrect(self, cidr): 801 try: 802 ipprefix = ipaddr.IPNetwork(cidr) 803 if ipprefix.network._ip != ipprefix._ip: 804 self._ReportError('Incorrect IP Network.') 805 return False 806 if ipprefix.is_private: 807 self._ReportError('IP Address must not be private.') 808 return False 809 except: 810 self._ReportError('Incorrect IP Network.') 811 return False 812 return True 814 def _IsIPAddressCorrect(self, ipaddress): 815 try: 816 ip = ipaddr.IPAddress(ipaddress) 817 except: 818 self._ReportError('Incorrect IP Address.') 819 return False 820 if ip.is_private: 821 self._ReportError('IP Address must not be private.') 822 return False 823 return True 825 ############################################################ 826 def _IsCountryCode2Correct(self, country_code_2): 827 if len(country_code_2) == 0: 828 return True 829 if len(country_code_2) != 2 or not country_code_2.isalpha(): 830 self._ReportError( 831 'Country code must be in the ISO 3166-1 alpha 2 format.') 832 return False 833 return True 835 def _IsRegionCodeCorrect(self, region_code): 836 if len(region_code) == 0: 837 return True 838 if '-' not in region_code: 839 self._ReportError('Region code must be in the ISO 3166-2 format.') 840 return False 842 parts = region_code.split('-') 843 if not self._IsCountryCode2Correct(parts[0]): 844 return False 845 return True 847 ############################################################ 848 def _ReportError(self, message): 849 self._ReportWithSeverity('ERROR', message) 851 def _ReportWarning(self, message): 852 self._ReportWithSeverity('WARNING', message) 854 def _ReportWithSeverity(self, severity, message): 855 self.is_correct_line = False 856 output_line = '%s: %s\n' % (severity, message) 858 if severity not in self.output_log: 
859 self.output_log[severity] = [] 860 self.output_log[severity].append(output_line) 861 if self.output_stream is not None: 862 self.output_stream.write(output_line) 864 def _FlushOutputStream(self): 865 if self.is_correct_line: return 866 if self.output_stream is None: return 868 self.output_stream.write('line %d: %s\n\n' 869 % (self.line_number, self.line)) 871 ############################################################ 872 def main(): 873 feed_validator = IPGeoFeedValidator() 874 feed_validator.Validate(sys.stdin) 876 if feed_validator.CountErrors('ERROR'): 877 sys.exit(1) 879 if __name__ == '__main__': 880 main() 882 A unit test file, "ipgeo_feed_validator_test.py" is provided as well. 883 It provides basic test coverage of the code above, though does not 884 test correct handling of non-ASCII UTF-8 strings. 886 #!/usr/bin/python 887 # 888 # Copyright (c) 2012 IETF Trust and the persons identified as authors of 889 # the code. All rights reserved. Redistribution and use in source and 890 # binary forms, with or without modification, is permitted pursuant to, 891 # and subject to the license terms contained in, the Simplified BSD 892 # License set forth in Section 4.c of the IETF Trust's Legal Provisions 893 # Relating to IETF Documents (http://trustee.ietf.org/license-info). 
895 import sys 896 from ipgeo_feed_validator import IPGeoFeedValidator 898 class IPGeoFeedValidatorTest(object): 899 def __init__(self): 900 self.validator = IPGeoFeedValidator() 901 self.validator.SetOutputStream(None) 902 self.successes = 0 903 self.failures = 0 905 def Run(self): 906 self.TestFeedLine('# asdf', 0, 0) 907 self.TestFeedLine(' ', 0, 0) 908 self.TestFeedLine('', 0, 0) 910 self.TestFeedLine('asdf', 1, 1) 911 self.TestFeedLine('asdf,US,,,', 1, 0) 912 self.TestFeedLine('aaaa::,US,,,', 0, 0) 913 self.TestFeedLine('zzzz::,US', 1, 1) 914 self.TestFeedLine(',US,,,', 1, 0) 915 self.TestFeedLine('55.66.77', 1, 1) 916 self.TestFeedLine('55.66.77.888', 1, 1) 917 self.TestFeedLine('55.66.77.asdf', 1, 1) 919 self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0) 920 self.TestFeedLine('2001:db8:cafe::/48', 0, 1) 922 self.TestFeedLine('55.66.77.88,PL', 0, 1) 923 self.TestFeedLine('55.66.77.88,PL,,,', 0, 0) 924 self.TestFeedLine('55.66.77.88,,,,', 0, 0) 925 self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0) 926 self.TestFeedLine('55.66.77.88,US,,,', 0, 0) 927 self.TestFeedLine('55.66.77.88,USA,,,', 1, 0) 928 self.TestFeedLine('55.66.77.88,99,,,', 1, 0) 930 self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0) 931 self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0) 932 self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0) 934 self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0) 935 self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', 0, 0) 936 self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,' 937 '1600 Ampthitheatre Parkway', 0, 1) 939 self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0) 940 self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0) 941 self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0) 942 self.TestFeedLine('55.66.77/24,US,,,', 1, 0) 943 self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0) 945 self.TestFeedLine('172.15.30.1,US,,,', 0, 0) 946 self.TestFeedLine('172.28.30.1,US,,,', 1, 0) 947 self.TestFeedLine('192.167.100.1,US,,,', 0, 0) 948 
self.TestFeedLine('192.168.100.1,US,,,', 1, 0) 949 self.TestFeedLine('10.0.5.9,US,,,', 1, 0) 950 self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0) 951 self.TestFeedLine('fc00::/48,PL,,,', 1, 0) 952 self.TestFeedLine('fe00::/48,PL,,,', 0, 0) 954 print '%d tests passed, %d failed' % (self.successes, self.failures) 956 def IsOutputLogCorrectAtSeverity(self, severity, expected_msg_count): 957 msg_count = self.validator.CountErrors(severity) 959 if msg_count != expected_msg_count: 960 print 'TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' % ( 961 self.validator.line, expected_msg_count, severity, msg_count, 962 str(self.validator.output_log[severity])) 963 return False 964 return True 966 def IsOutputLogCorrect(self, new_errors, new_warnings): 967 retval = True 969 if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors): 970 retval = False 971 if not self.IsOutputLogCorrectAtSeverity('WARNING', new_warnings): 972 retval = False 974 return retval 976 def TestFeedLine(self, line, warning_count, error_count): 977 self.validator.output_log['WARNING'] = [] 978 self.validator.output_log['ERROR'] = [] 979 self.validator._ValidateLine(line) 981 if not self.IsOutputLogCorrect(warning_count, error_count): 982 self.failures += 1 983 return False 985 self.successes += 1 986 return True 988 if __name__ == '__main__': 989 IPGeoFeedValidatorTest().Run() 990 992 Authors' Addresses 994 Erik Kline 995 Loon LLC 996 1600 Amphitheatre Parkway 997 Mountain View, California 94043 998 United States of America 1000 Email: ek@loon.com 1001 Krzysztof Duleba 1002 Google Switzerland GmbH 1003 Brandschenkestrasse 110 1004 Zuerich 8002 1005 Switzerland 1007 Email: kduleba@google.com 1009 Zoltan Szamonek 1010 Google Switzerland GmbH 1011 Brandschenkestrasse 110 1012 Zuerich 8002 1013 Switzerland 1015 Email: zszami@google.com 1017 Stefan Moser 1018 Google Switzerland GmbH 1019 Brandschenkestrasse 110 1020 Zuerich 8002 1021 Switzerland 1023 Email: smoser@google.com 1025 Warren Kumari 1026 Google 
1027 1600 Amphitheatre Parkway 1028 Mountain View, CA 94043 1029 US 1031 Email: warren@kumari.net
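Appendix A states that its sample validator checks only the syntax of individual lines and that a complete validator MUST also ensure IP prefix uniqueness. The following is a minimal sketch of such a feed-wide check. It is not part of the published validator and rests on several assumptions. "Uniqueness" is taken to mean that no two entries name the same prefix once normalized, with a bare address treated as a /32 or /128 prefix. The sketch uses the Python 3 standard-library ipaddress module in place of ipaddr. The function name find_duplicate_prefixes is hypothetical. The sample feed uses documentation prefixes.

```python
import csv
import ipaddress


def find_duplicate_prefixes(feed_lines):
    """Report feed entries whose IP prefix repeats an earlier entry.

    Returns a list of (first_line, repeat_line, prefix) tuples with
    1-based line numbers. Sketch only: uses the standard-library
    ipaddress module rather than the ipaddr library.
    """
    seen = {}  # normalized network -> line number of its first occurrence
    duplicates = []
    for line_number, raw_line in enumerate(feed_lines, start=1):
        # Like the sample validator, treat everything after '#' as a comment.
        data = raw_line.split('#', 1)[0].strip()
        if not data:
            continue
        first_field = next(csv.reader([data]))[0].strip()
        try:
            # A bare address becomes a /32 or /128 network; strict=True
            # rejects prefixes with host bits set (e.g. 192.0.2.1/24).
            network = ipaddress.ip_network(first_field, strict=True)
        except ValueError:
            # Malformed entries are left to the per-line syntax validator.
            continue
        if network in seen:
            duplicates.append((seen[network], line_number, str(network)))
        else:
            seen[network] = line_number
    return duplicates


sample_feed = [
    '# sample feed using documentation prefixes',
    '2001:db8:cafe::/48,PL,PL-MZ,,02-784',
    '192.0.2.0/24,US,US-CA,Mountain View,',
    '192.0.2.0/24,US,US-CA,Sunnyvale,',
    '2001:DB8:CAFE::/48,PL,,,',
]
print(find_duplicate_prefixes(sample_feed))
# [(3, 4, '192.0.2.0/24'), (2, 5, '2001:db8:cafe::/48')]
```

Reporting the line of the first occurrence alongside the repeat lets a publisher decide which entry to keep. Because prefixes are compared as parsed network objects, differently written forms of the same prefix (such as upper- and lowercase IPv6) are caught. Whether overlapping but distinct prefixes, such as 192.0.2.0/24 and 192.0.2.0/25, are acceptable is a separate question that this sketch does not address.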