Network Working Group                                           E. Kline
Internet-Draft                                                  Loon LLC
Intended status: Informational                                 K. Duleba
Expires: August 10, 2020                                     Z. Szamonek
                                                                S. Moser
                                                 Google Switzerland GmbH
                                                               W. Kumari
                                                                  Google
                                                        February 7, 2020


            A Format for Self-published IP Geolocation Feeds
                draft-google-self-published-geofeeds-09

Abstract

   This document records a format whereby a network operator can publish a mapping of IP address prefixes to simplified geolocation information, colloquially termed a geolocation "feed".
   Interested parties can poll and parse these feeds to update or merge with other geolocation data sources and procedures. This format intentionally allows only coarse-level locations to be specified.

   Some technical organizations operating networks that move from one conference location to the next have already experimentally published small geolocation feeds.

   This document describes a currently deployed format. At least one consumer (Google) has incorporated these feeds into a geolocation data pipeline, and a significant number of ISPs are using the format to tell it where their prefixes should be geolocated.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 10, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Motivation
     1.2. Requirements Notation
     1.3. Assumptions About Publication
   2. Self-Published IP Geolocation Feeds
     2.1. Specification
       2.1.1. Geolocation Feed Individual Entry Fields
         2.1.1.1. IP Prefix
         2.1.1.2. Alpha2code (previously: 'country')
         2.1.1.3. Region
         2.1.1.4. City
         2.1.1.5. Postal Code
       2.1.2. Prefixes With No Geolocation Information
       2.1.3. Additional Parsing Requirements
     2.2. Examples
   3. Consuming Self-Published IP Geolocation Feeds
     3.1. Feed Integrity
     3.2. Verification of Authority
     3.3. Verification of Accuracy
     3.4. Refreshing Feed Information
   4. Privacy Considerations
   5. Relation to Other Work
   6. Security Considerations
   7. Planned Future Work
   8. Finding Self-Published IP Geolocation Feeds
     8.1. Ad Hoc 'Well Known' URIs
     8.2. Other Mechanisms
   9. IANA Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
     11.3. URIs
   Appendix A. Sample Python Validation Code
   Authors' Addresses

1. Introduction

1.1. Motivation

   Providers of services over the Internet have grown to depend on best-effort geolocation information to improve the user experience. Locality information can aid in directing traffic to the nearest serving location, inferring likely native language, and providing additional context for services involving search queries.

   When an ISP, for example, changes the location where an IP prefix is deployed, services which make use of geolocation information may begin to suffer degraded performance. This can lead to customer complaints, possibly to the ISP directly. Dissemination of correct geolocation data is complicated by the lack of any centralized means to coordinate and communicate geolocation information to all interested consumers of the data.

   This document records a format whereby a network operator (an ISP, an enterprise, or any organization which deems the geolocation of its IP prefixes to be of concern) can publish a mapping of IP address prefixes to simplified geolocation information, colloquially termed a "geolocation feed". Interested parties can poll and parse these feeds to update or merge with other geolocation data sources and procedures.

   This document describes a currently deployed format.
   At least one consumer (Google) has incorporated these feeds into a geolocation data pipeline, and a significant number of ISPs are using the format to tell it where their prefixes should be geolocated.

1.2. Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

   As this is an informational document about a data format and set of operational practices presently in use, requirements notation captures the design goals of the authors and implementors.

1.3. Assumptions About Publication

   This document describes both a format and a mechanism for publishing data, with the assumption that the network operator to whom operational responsibility has been delegated for any published data wishes it to be public. Any privacy risk is bounded by the format, and feed publishers MAY omit prefixes or any location field associated with a given prefix to further protect privacy (see Section 2.1 for details about exactly which fields may be omitted). Feed publishers assume the responsibility of determining which data should be made public.

   This document does not incorporate a mechanism to communicate acceptable use policies for self-published data. Publication itself is inferred as a desire by the publisher for the data to be usefully consumed, similar to the publication of information like host names, cryptographic keys, and SPF records [RFC4408] in the DNS.

2. Self-Published IP Geolocation Feeds

   The format described here was developed to address the need for network operators to rapidly and usefully share geolocation information changes.
   Originally, there arose a specific case where regional operators found it desirable to publish location changes rather than wait for geolocation algorithms to "learn" about them. Later, technical conferences which frequently use the same network prefixes advertised from different conference locations experimented by publishing geolocation feeds, updated in advance of network location changes, in order to better serve conference attendees.

   At its simplest, the mechanism consists of a network operator publishing a file (the "geolocation feed") which contains several text entries, one per line. Each entry is keyed by a unique (within the feed) IP prefix (or single IP address) followed by a sequence of network locality attributes to be ascribed to the given prefix.

2.1. Specification

   For operational simplicity, every feed should contain data about all IP addresses the provider wants to publish. Alternatives, like publishing only entries for IP addresses whose geolocation data has changed or that differ from currently observed geolocation behavior "at large", are likely to be too operationally complex.

   Feeds MUST use UTF-8 [RFC3629] character encoding. Lines are delimited by a line break (CRLF, as specified in [RFC4180]), and blank lines are ignored. Text from a '#' character to the end of the current line is treated as a comment only and is similarly ignored (note that this does not strictly follow [RFC4180], which has no support for comments).

   Feed lines that are not comments MUST be in comma-separated value (CSV) format as described in [RFC4180]. Each feed entry is a text line of the form:

      ip_prefix,alpha2code,region,city,postal_code

   The IP prefix field is REQUIRED; all others are OPTIONAL (can be empty), though the requisite minimum number of commas SHOULD be present.

2.1.1. Geolocation Feed Individual Entry Fields

2.1.1.1. IP Prefix

   REQUIRED. Each IP prefix field MUST be either a single IP address or an IP prefix in CIDR notation in conformance with section 3.1 [1] of [RFC4632] for IPv4 or section 2.3 [2] of [RFC4291] for IPv6.

   Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and "2001:db8::1" and "2001:db8::/32" for IPv6.

2.1.1.2. Alpha2code (previously: 'country')

   OPTIONAL. The alpha2code field, if non-empty, MUST be a two-letter ISO country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2]. Parsers SHOULD treat this field case-insensitively.

   Earlier versions of this document called this field "country", and it may still be referred to as such in existing tools and interfaces.

   Parsers MAY additionally support other two-letter codes outside the ISO 3166-1 alpha 2 codes. For example, two-letter codes from the "Exceptionally reserved codes" [3] set may appear in this field, e.g. "UK" [4] or "EU" [5].

   Examples include "US" for the United States, "JP" for Japan, and "PL" for Poland.

2.1.1.3. Region

   OPTIONAL. The region field, if non-empty, MUST be an ISO region code conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this field case-insensitively.

   Examples include "ID-RI" for the Riau province of Indonesia and "NG-RI" for the Rivers province in Nigeria.

2.1.1.4. City

   OPTIONAL. The city field, if non-empty, SHOULD be free UTF-8 text, excluding the comma (',') character.

   Examples include "Dublin", "New York", and "Sao Paulo" (specifically "S" followed by 0xc3, 0xa3, and "o Paulo").

2.1.1.5. Postal Code

   OPTIONAL, DEPRECATED. The postal code field, if non-empty, SHOULD be free UTF-8 text, excluding the comma (',') character. The use of this field is deprecated; consumers of feeds should be able to parse feeds containing this field, but new feeds SHOULD NOT include it, due to the granularity of this information.
   See Section 4 for additional discussion.

   Examples include "106-6126" (in Minato ward, Tokyo, Japan).

2.1.2. Prefixes With No Geolocation Information

   Feed publishers may indicate that some IP prefixes should not have any associated geolocation information. It may be that some prefixes under their administrative control are reserved, not yet allocated or deployed, or in the process of being redeployed elsewhere, and existing geolocation information can, from the perspective of the publisher, safely be discarded.

   This special case can be indicated by explicitly leaving blank all fields which specify any degree of geolocation information. For example:

      192.0.2.0/24,,,,
      2001:db8:1::/48,,,,
      2001:db8:2::/48,,,,

   Historically, the user-assigned alpha2code identifier of "ZZ" has been used for this same purpose. This is not necessarily preferred, and no specific interpretation of any of the other user-assigned alpha2code codes is currently defined.

2.1.3. Additional Parsing Requirements

   Feed entries missing an IP address or prefix field, or having an IP address or prefix field which fails to parse correctly, MUST be discarded.

   While publishers SHOULD follow [RFC5952] style for IPv6 prefix fields, consumers MUST nevertheless accept all valid string representations.

   Duplicate IP address or prefix entries MUST be considered an error, and consumer implementations SHOULD log the repeated entries for further administrative review. Publishers SHOULD take measures to ensure there is one and only one entry per IP address and prefix.

   Multiple entries which constitute nested prefixes are permitted. Consumers SHOULD consider the entry with the longest matching prefix (i.e. the "most specific") to be the best matching entry for a given IP address.
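   The parsing requirements above (comment stripping, CSV fields, prefix validation, duplicate detection, and longest-prefix selection) can be sketched in Python 3 as follows. This is a minimal illustration under stated assumptions, not a complete validator: it uses the standard library's ipaddress module rather than the ipaddr library used in Appendix A, the function names parse_feed and lookup are hypothetical, it does not validate the alpha2code or region values, and keeping the first of two duplicate entries is a choice of this sketch rather than a requirement of this document.

```python
import csv
import ipaddress


def parse_feed(text):
    """Parse geofeed text (hypothetical helper, not a complete validator).

    Returns (entries, errors): entries maps each ipaddress network to an
    (alpha2code, region, city) tuple; errors lists (line_number, reason)
    pairs for entries that were discarded.
    """
    entries = {}
    errors = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Text from '#' to the end of the line is a comment.
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue  # blank and comment-only lines are ignored
        fields = next(csv.reader([line]))
        fields += [''] * (5 - len(fields))  # tolerate missing trailing commas
        try:
            # A bare address becomes a /32 or /128 network, every valid IPv6
            # spelling normalizes to the same key, and strict parsing rejects
            # prefixes with host bits set (e.g. 192.0.2.1/24).
            network = ipaddress.ip_network(fields[0].strip())
        except ValueError:
            errors.append((number, 'unparseable prefix'))
            continue  # entries with a bad prefix MUST be discarded
        if network in entries:
            errors.append((number, 'duplicate prefix'))
            continue  # assumption: keep the first occurrence
        alpha2code, region, city = (f.strip() for f in fields[1:4])
        # alpha2code and region are case-insensitive, so normalize them.
        entries[network] = (alpha2code.upper(), region.upper(), city)
    return entries, errors


def lookup(entries, address):
    """Return the most specific (longest-prefix) entry covering address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in entries
               if net.version == addr.version and addr in net]
    if not matches:
        return None
    return entries[max(matches, key=lambda net: net.prefixlen)]
```

   With the entries from Section 2.2, lookup(entries, "192.0.2.5") selects the 192.0.2.5 entry (city "Alabaster") over the covering 192.0.2.0/25 entry, because the single address parses as a /32. A linear scan keeps the sketch short; a consumer handling hundreds of thousands of prefixes would more likely use a radix tree.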
   Feed entries with non-empty optional fields which fail to parse, either in part or in full, SHOULD be discarded. It is RECOMMENDED that they also be logged for further administrative review.

   For compatibility with future additional fields, a parser MUST ignore any fields beyond those it expects. The data from fields which are expected and which parse successfully MUST still be considered valid. Per Section 7, no extensions to this format are in use, nor are any anticipated.

2.2. Examples

   Example entries using different IP address formats and describing locations at alpha2code ("country code"), region, and city granularity levels:

      192.0.2.0/25,US,US-AL,,
      192.0.2.5,US,US-AL,Alabaster,
      192.0.2.128/25,PL,PL-MZ,,
      2001:db8::/32,PL,,,
      2001:db8:cafe::/48,PL,PL-MZ,,

   The IETF network publishes geolocation information for the meeting prefixes, and generally just comments out the previous meeting's information and appends the new meeting's information. [GEO_IETF] at the time of this writing contains:

      # IETF106 (Singapore) - November 2019 - Singapore, SG
      130.129.0.0/16,SG,SG-01,Singapore,
      2001:df8::/32,SG,SG-01,Singapore,
      31.133.128.0/18,SG,SG-01,Singapore,
      31.130.224.0/20,SG,SG-01,Singapore,
      2001:67c:1230::/46,SG,SG-01,Singapore,
      2001:67c:370::/48,SG,SG-01,Singapore,

   Experimentally, RIPE has published geolocation information for their conference network prefixes, which change location in accordance with each new event. [GEO_RIPE_NCC] at the time of writing contains:

      193.0.24.0/21,NL,NL-ZH,Rotterdam,
      2001:67c:64::/48,NL,NL-ZH,Rotterdam,

   Similarly, ICANN has published geolocation information for their portable conference network prefixes.
   [GEO_ICANN] at the time of writing contains:

      199.91.192.0/21,MA,MA-07,Marrakech
      2620:f:8000::/48,MA,MA-07,Marrakech

   A longer example is the [GEO_Google] Google Corp Geofeed, which lists the geolocation information for Google corporate offices.

   At the time of writing, Google processes approximately 400 feeds comprising more than 750,000 IPv4 and IPv6 prefixes.

3. Consuming Self-Published IP Geolocation Feeds

   Consumers MAY treat published feed data as a hint only and MAY choose to prefer other sources of geolocation information for any given IP prefix. Regardless of a consumer's stance with respect to a given published feed, there are some points of note for sensibly and effectively consuming published feeds.

3.1. Feed Integrity

   The integrity of published information SHOULD be protected by securing the means of publication, for example by using HTTP over TLS [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving geolocation feeds in a manner that guarantees integrity of the feed.

3.2. Verification of Authority

   Consumers of self-published IP geolocation feeds SHOULD perform some form of verification that the publisher is in fact authoritative for the addresses in the feed. The actual means of verification is likely dependent upon the way in which the feed is discovered. Ad hoc shared URIs, for example, will likely require an ad hoc verification process. Future automated means of feed discovery SHOULD have an accompanying automated means of verification.

   A consumer should only trust geolocation information for IP addresses or prefixes for which the publisher has been verified as administratively authoritative. All other geolocation feed entries should be ignored and logged for further administrative review.

3.3. Verification of Accuracy

   Errors and inaccuracies may occur at many levels, and publication and consumption of geolocation data are no exception. To the extent practical, consumers SHOULD take steps to verify the accuracy of published locality. Verification methodology, resolution of discrepancies, and preference for alternative sources of data are left to the discretion of the feed consumer.

   Consumers SHOULD decide on discrepancy thresholds and SHOULD flag for administrative review feed entries which exceed set thresholds.

3.4. Refreshing Feed Information

   As a publisher can change geolocation data at any time and without notification, consumers SHOULD implement mechanisms to periodically refresh local copies of feed data. In the absence of any other refresh timing information, consumers SHOULD refresh feeds no less often than weekly, and no more often than is likely to cause issues for the publisher.

   For feeds available via HTTPS (or HTTP), the publisher MAY communicate refresh timing information by means of the standard HTTP expiration model (section 13.2 [6] of [RFC2616]). Specifically, publishers can include either an Expires header [7] or a Cache-Control header [8] specifying the max-age. Where practical, consumers SHOULD refresh feed information before the expiry time is reached.

4. Privacy Considerations

   Publishers of geolocation feeds are advised to fully consider any and all privacy implications of the disclosure of such information for the users of the described networks prior to publication. A thorough comprehension of the security considerations [9] of a chosen geolocation policy is highly recommended, including an understanding of some of the limitations of information obscurity [10] (see also [RFC6772]).
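   One way a publisher might act on this advice is to coarsen entries before publishing them, so that the feed never carries more detail than has been deemed acceptable. The following Python sketch is illustrative only: the coarsen_entry function, the LEVELS names, and the idea of a single maximum level per call are assumptions of this example, not part of the format. It keeps the prefix, blanks every location field finer than the chosen level, always blanks the deprecated postal code, and drops any fields beyond the fifth.

```python
import csv
import io

# Hypothetical granularity names, coarsest first (not defined by the format).
LEVELS = ('alpha2code', 'region', 'city')


def coarsen_entry(line, max_level='region'):
    """Return a feed line with location fields finer than max_level blanked."""
    fields = next(csv.reader([line]))
    fields += [''] * (5 - len(fields))  # pad entries missing trailing commas
    keep = LEVELS.index(max_level) + 1  # number of location fields to retain
    location = fields[1:4][:keep] + [''] * (3 - keep)
    out = io.StringIO()
    # The postal code (fifth field) is always emitted empty.
    csv.writer(out).writerow([fields[0]] + location + [''])
    return out.getvalue().rstrip('\r\n')
```

   For example, coarsen_entry('192.0.2.5,US,US-AL,Alabaster,12345') returns '192.0.2.5,US,US-AL,,' (the postal code here is a placeholder). Because every field remains present, the output keeps the minimum number of commas recommended in Section 2.1.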
   As noted in Section 2.1, each location field in an entry is optional, in order to support expressing only the level of specificity which the publisher has deemed acceptable. There is no requirement that the level of specificity be consistent across all entries within a feed. In particular, the Postal Code field (Section 2.1.1.5) can provide very specific geolocation, sometimes within a building. Such specific Postal Code values MUST NOT be published in geofeeds without the express consent of the parties being located.

   Operators who publish geolocation information are strongly encouraged to inform affected users/customers of this fact and of the potential privacy-related consequences and trade-offs.

5. Relation to Other Work

   While not originally done in conjunction with the [GEOPRIV] working group, Richard Barnes observed that this work is nevertheless consistent with that which the group has defined, both for address format and for privacy. The data elements in geolocation feeds are equivalent to the following XML structure (see [RFC5139]):

      country
      region
      city
      postal_code

   Providing geolocation information to this granularity is equivalent to the following privacy policy (see the definition of the 'building' [11] level of disclosure):

      building

6. Security Considerations

   As there is no true security in the obscurity of the location of any given IP address, self-publication of this data fundamentally opens no new attack vectors. For publishers, self-published data may increase the ease with which such location data might be exploited (it can, for example, make it easy to discover which prefixes are populated with customers, as distinct from prefixes not generally in use).

   For consumers, feed retrieval processes may receive input from potentially hostile sources (e.g.
   in the event of hijacked traffic). As such, proper input validation and defense measures MUST be taken (see the discussion in Section 3.1).

   Similarly, consumers who do not perform sufficient verification of published data bear the same risks as from other forms of geolocation configuration errors (see the discussion in Section 3.2 and Section 3.3).

   Validation of a feed's contents includes verifying that the publisher is authoritative for the IP prefixes included in the feed. Failure to verify IP prefix authority would, for example, allow ISP Bob to make geolocation statements about IP space held by ISP Alice. At this time only out-of-band verification methods are implemented (e.g. an ISP's feed may be verified against publicly available IP allocation data).

7. Planned Future Work

   In order to more flexibly support future extensions, use of a more expressive feed format has been suggested. Use of JavaScript Object Notation (JSON, [RFC4627]), specifically, has been discussed. However, at the time of writing, neither a specification nor an implementation of such a format exists, and work on extensions is deferred until a more suitable format has been selected.

   The authors are planning on writing a document describing such a new format. This document describes a currently deployed and used format. Given the extremely limited extensibility of the present format, no extensions to it are anticipated. Extensibility requirements are instead expected to be integral to the development of a new format.

8. Finding Self-Published IP Geolocation Feeds

   The issue of finding, and later verifying, geolocation feeds is not formally specified in this document. At this time, only ad hoc feed discovery and verification has a modicum of established practice (see below); discussion of other mechanisms has been removed for clarity.

8.1. Ad Hoc 'Well Known' URIs

   To date, geolocation feeds have been shared informally in the form of HTTPS URIs exchanged in email threads. Three of the example URIs referenced in this document ([GEO_IETF], [GEO_RIPE_NCC], [GEO_ICANN]) describe networks that change locations periodically, the operators and operational practices of which are well known within their respective technical communities.

   The contents of the feeds are verified by a similarly ad hoc process including:

   o  personal knowledge of the parties involved in the exchange, and

   o  comparison of feed-advertised prefixes with the BGP-advertised prefixes of Autonomous System Numbers known to be operated by the publishers.

   Ad hoc mechanisms, while useful for early experimentation by producers and consumers, are unlikely to be adequate for long-term, widespread use by multiple parties. Future versions of any such self-published geolocation feed mechanism SHOULD address scalability concerns by defining a means for automated discovery and verification of operational authority of advertised prefixes.

8.2. Other Mechanisms

   Previous versions of this document referenced use of the WHOIS service ([RFC3912]) operated by RIRs as well as possible DNS-based schemes to discover and validate geofeeds. To the authors' knowledge, support for such mechanisms has never been implemented, and this speculative text has been removed to avoid ambiguity.

9. IANA Considerations

   This document makes no requests of the IANA.

10. Acknowledgements

   The authors would like to express their gratitude to reviewers and early implementers, including but not limited to Mikael Abrahamsson, Andrew Alston, Ray Bellis, John Bond, Alissa Cooper, Andras Erdei, Stephen Farrell, Marco Hogewoning, Mike Joseph, Maciej Kuzniar, George Michaelson, Menno Schepers, Justyna Sidorska, Pim van Pelt, and Bjoern A. Zeeb.

   Richard L.
   Barnes and Andy Newton in particular contributed substantial review, text, and advice.

11. References

11.1. Normative References

   [ISO.3166.1alpha2] International Organization for Standardization, "ISO 3166-1 decoding table", .

   [ISO.3166.2] International Organization for Standardization, "ISO 3166-2:2007", .

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, DOI 10.17487/RFC2616, June 1999, .

   [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2003, .

   [RFC4180] Shafranovich, Y., "Common Format and MIME Type for Comma-Separated Values (CSV) Files", RFC 4180, DOI 10.17487/RFC4180, October 2005, .

   [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", RFC 4291, DOI 10.17487/RFC4291, February 2006, .

   [RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August 2006, .

   [RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6 Address Text Representation", RFC 5952, DOI 10.17487/RFC5952, August 2010, .

11.2. Informative References

   [GEO_Google] Google, LLC, "Google Corp Geofeed", .

   [GEO_ICANN] Internet Corporation For Assigned Names and Numbers, "ICANN Meeting Geolocation Data", .

   [GEO_IETF] Kumari, A., "IETF Meeting Network Geolocation Data", .

   [GEO_RIPE_NCC] Schepers, M., "RIPE NCC Meeting Geolocation Data", .

   [GEOPRIV] Internet Engineering Task Force, "IETF geopriv Working Group", .

   [IPADDR_PY] Shields, M. and P. Moody, "Python IP address manipulation library", .
642 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 643 Requirement Levels", BCP 14, RFC 2119, 644 DOI 10.17487/RFC2119, March 1997, 645 . 647 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, 648 DOI 10.17487/RFC2818, May 2000, 649 . 651 [RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912, 652 DOI 10.17487/RFC3912, September 2004, 653 . 655 [RFC4408] Wong, M. and W. Schlitt, "Sender Policy Framework (SPF) 656 for Authorizing Use of Domains in E-Mail, Version 1", 657 RFC 4408, DOI 10.17487/RFC4408, April 2006, 658 . 660 [RFC4627] Crockford, D., "The application/json Media Type for 661 JavaScript Object Notation (JSON)", RFC 4627, 662 DOI 10.17487/RFC4627, July 2006, 663 . 665 [RFC5139] Thomson, M. and J. Winterbottom, "Revised Civic Location 666 Format for Presence Information Data Format Location 667 Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139, 668 February 2008, . 670 [RFC6772] Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J., 671 Polk, J., Morris, J., and M. Thomson, "Geolocation Policy: 672 A Document Format for Expressing Privacy Preferences for 673 Location Information", RFC 6772, DOI 10.17487/RFC6772, 674 January 2013, . 676 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 677 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 678 May 2017, . 680 11.3. URIs 682 [1] http://tools.ietf.org/html/rfc4632#section-3.1 684 [2] http://tools.ietf.org/html/rfc4291#section-2.3 686 [3] https://www.iso.org/glossary-for-iso-3166.html 688 [4] https://www.iso.org/obp/ui/#iso:code:3166:UK 690 [5] https://www.iso.org/obp/ui/#iso:code:3166:EU 692 [6] http://tools.ietf.org/html/rfc2616#section-13.2 694 [7] http://tools.ietf.org/html/rfc2616#section-14.21 696 [8] http://tools.ietf.org/html/rfc2616#section-14.9 698 [9] http://tools.ietf.org/html/rfc6772#section-13 700 [10] http://tools.ietf.org/html/rfc6772#section-13.5 702 [11] http://tools.ietf.org/html/rfc6772#section-6.5.1 704 Appendix A. 
Sample Python Validation Code

   Included here is a simple format validator in Python for
   self-published ipgeo feeds.  This tool reads CSV data in the
   self-published ipgeo feed format from the standard input and performs
   basic validation.  It is intended for use by feed publishers before
   launching a feed.  Note that this validator checks only the syntax of
   each individual line; it does not verify that every IP prefix entry
   is unique within the feed as a whole.  A complete validator MUST also
   ensure IP prefix uniqueness.

   The main source file "ipgeo_feed_validator.py" follows.  It requires
   the open-source ipaddr Python library [IPADDR_PY] for IP address and
   CIDR parsing and validation.

   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as authors of
   # the code. All rights reserved. Redistribution and use in source and
   # binary forms, with or without modification, is permitted pursuant to,
   # and subject to the license terms contained in, the Simplified BSD
   # License set forth in Section 4.c of the IETF Trust's Legal Provisions
   # Relating to IETF Documents (http://trustee.ietf.org/license-info).

   """Simple format validator for self-published ipgeo feeds.

   This tool reads CSV data in the self-published ipgeo feed format from
   the standard input and performs basic validation. It is intended for
   use by feed publishers before launching a feed.
   """

   import csv
   import ipaddr
   import sys

   class IPGeoFeedValidator(object):
     def __init__(self):
       self.prefixes = {}
       self.line_number = 0
       self.output_log = {}
       self.SetOutputStream(sys.stderr)

     def Validate(self, feed):
       """Check validity of an IPGeo feed.

       Args:
         feed: iterable with feed lines
       """

       for line in feed:
         self._ValidateLine(line)

     def SetOutputStream(self, logfile):
       """Controls where output messages go (STDERR by default).

       Use None to disable logging.

       Args:
         logfile: a file object (e.g., sys.stdout or sys.stderr) or None.
       """
       self.output_stream = logfile

     def CountErrors(self, severity):
       """How many ERRORs or WARNINGs were generated."""
       return len(self.output_log.get(severity, []))

     ############################################################
     def _ValidateLine(self, line):
       line = line.rstrip('\r\n')
       self.line_number += 1
       # Keep only the part of the line before any '#' comment.
       self.line = line.split('#')[0]
       self.is_correct_line = True

       if self._ShouldIgnoreLine(line):
         return

       # Parse the comment-free text, not the raw line.
       fields = next(csv.reader([self.line]))

       self._ValidateFields(fields)
       self._FlushOutputStream()

     def _ShouldIgnoreLine(self, line):
       line = line.strip()
       if line.startswith('#'):
         return True
       return len(line) == 0

     ############################################################
     def _ValidateFields(self, fields):
       assert len(fields) > 0

       is_correct = self._IsIPAddressOrPrefixCorrect(fields[0])

       if len(fields) > 1:
         if not self._IsAlpha2CodeCorrect(fields[1]):
           is_correct = False

       if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
         is_correct = False

       if len(fields) != 5:
         self._ReportWarning('5 fields were expected (got %d).'
                             % len(fields))

     ############################################################
     def _IsIPAddressOrPrefixCorrect(self, field):
       if '/' in field:
         return self._IsCIDRCorrect(field)
       return self._IsIPAddressCorrect(field)

     def _IsCIDRCorrect(self, cidr):
       try:
         ipprefix = ipaddr.IPNetwork(cidr)
       except ValueError:
         self._ReportError('Incorrect IP Network.')
         return False
       # Reject prefixes with host bits set (e.g., 192.0.2.1/24).
       if ipprefix.network._ip != ipprefix._ip:
         self._ReportError('Incorrect IP Network.')
         return False
       if ipprefix.is_private:
         self._ReportError('IP Address must not be private.')
         return False
       return True

     def _IsIPAddressCorrect(self, ipaddress):
       try:
         ip = ipaddr.IPAddress(ipaddress)
       except ValueError:
         self._ReportError('Incorrect IP Address.')
         return False
       if ip.is_private:
         self._ReportError('IP Address must not be private.')
         return False
       return True

     ############################################################
     def _IsAlpha2CodeCorrect(self, alpha2code):
       if len(alpha2code) == 0:
         return True
       if len(alpha2code) != 2 or not alpha2code.isalpha():
         self._ReportError(
             'Alpha 2 code must be in the ISO 3166-1 alpha 2 format.')
         return False
       return True

     def _IsRegionCodeCorrect(self, region_code):
       if len(region_code) == 0:
         return True

       if '-' not in region_code:
         self._ReportError('Region code must be in the ISO 3166-2 format.')
         return False

       parts = region_code.split('-')
       if not self._IsAlpha2CodeCorrect(parts[0]):
         return False
       return True

     ############################################################
     def _ReportError(self, message):
       self._ReportWithSeverity('ERROR', message)

     def _ReportWarning(self, message):
       self._ReportWithSeverity('WARNING', message)

     def _ReportWithSeverity(self, severity, message):
       self.is_correct_line = False
       output_line = '%s: %s\n' % (severity, message)

       if severity not in self.output_log:
         self.output_log[severity] = []
       self.output_log[severity].append(output_line)

       if self.output_stream is not None:
         self.output_stream.write(output_line)

     def _FlushOutputStream(self):
       if self.is_correct_line:
         return
       if self.output_stream is None:
         return

       self.output_stream.write('line %d: %s\n\n'
                                % (self.line_number, self.line))

   ############################################################
   def main():
     feed_validator = IPGeoFeedValidator()
     feed_validator.Validate(sys.stdin)

     if feed_validator.CountErrors('ERROR'):
       sys.exit(1)

   if __name__ == '__main__':
     main()

   A unit test file, "ipgeo_feed_validator_test.py", is provided as well.
   It provides basic test coverage of the code above, though it does not
   test correct handling of non-ASCII UTF-8 strings.

   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as authors of
   # the code. All rights reserved. Redistribution and use in source and
   # binary forms, with or without modification, is permitted pursuant to,
   # and subject to the license terms contained in, the Simplified BSD
   # License set forth in Section 4.c of the IETF Trust's Legal Provisions
   # Relating to IETF Documents (http://trustee.ietf.org/license-info).

   import sys
   from ipgeo_feed_validator import IPGeoFeedValidator

   class IPGeoFeedValidatorTest(object):
     def __init__(self):
       self.validator = IPGeoFeedValidator()
       self.validator.SetOutputStream(None)
       self.successes = 0
       self.failures = 0

     def Run(self):
       self.TestFeedLine('# asdf', 0, 0)
       self.TestFeedLine(' ', 0, 0)
       self.TestFeedLine('', 0, 0)

       self.TestFeedLine('asdf', 1, 1)
       self.TestFeedLine('asdf,US,,,', 1, 0)
       self.TestFeedLine('aaaa::,US,,,', 0, 0)
       self.TestFeedLine('zzzz::,US', 1, 1)
       self.TestFeedLine(',US,,,', 1, 0)
       self.TestFeedLine('55.66.77', 1, 1)
       self.TestFeedLine('55.66.77.888', 1, 1)
       self.TestFeedLine('55.66.77.asdf', 1, 1)

       self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
       self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

       self.TestFeedLine('55.66.77.88,PL', 0, 1)
       self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
       self.TestFeedLine('55.66.77.88,99,,,', 1, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
       self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
                         '1600 Amphitheatre Parkway', 0, 1)

       self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
       self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

       self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
       self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
       self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
       self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
       self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
       self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

       print('%d tests passed, %d failed'
             % (self.successes, self.failures))

     def IsOutputLogCorrectAtSeverity(self, severity, expected_msg_count):
       msg_count = self.validator.CountErrors(severity)

       if msg_count != expected_msg_count:
         print('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' % (
             self.validator.line, expected_msg_count, severity, msg_count,
             str(self.validator.output_log[severity])))
         return False
       return True

     def IsOutputLogCorrect(self, new_errors, new_warnings):
       retval = True

       if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
         retval = False
       if not self.IsOutputLogCorrectAtSeverity('WARNING', new_warnings):
         retval = False

       return retval

     def TestFeedLine(self, line, error_count, warning_count):
       # Reset the log so that only messages for this line are counted.
       self.validator.output_log['WARNING'] = []
       self.validator.output_log['ERROR'] = []
       self.validator._ValidateLine(line)

       if not self.IsOutputLogCorrect(error_count, warning_count):
         self.failures += 1
         return False

       self.successes += 1
       return True

   if __name__ == '__main__':
     IPGeoFeedValidatorTest().Run()


Authors' Addresses

   Erik Kline
   Loon LLC
   1600 Amphitheatre Parkway
   Mountain View, California  94043
   United States of America

   Email: ek@loon.com


   Krzysztof Duleba
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: kduleba@google.com


   Zoltan Szamonek
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: zszami@google.com


   Stefan Moser
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: smoser@google.com


   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   US

   Email: warren@kumari.net