idnits 2.17.1 draft-google-self-published-geofeeds-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- -- The document has an IETF Trust Provisions (28 Dec 2009) Section 6.c(i) Publication Limitation clause. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 1 instance of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 == There are 2 instances of lines with non-RFC3849-compliant IPv6 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Couldn't figure out when the document was first submitted -- there may comments or warnings related to the use of a disclaimer for pre-RFC5378 work that could not be issued because of this. Please check the Legal Provisions document at https://trustee.ietf.org/license-info to determine if you need the pre-RFC5378 disclaimer. 
-- The document date (July 29, 2013) is 3895 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '0' on line 844 -- Looks like a reference, but probably isn't: '1' on line 785 -- Looks like a reference, but probably isn't: '2' on line 788 ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 2818 (Obsoleted by RFC 9110) -- Obsolete informational reference (is this intentional?): RFC 4408 (Obsoleted by RFC 7208) -- Obsolete informational reference (is this intentional?): RFC 4627 (Obsoleted by RFC 7158, RFC 7159) Summary: 2 errors (**), 0 flaws (~~), 6 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group E. Kline 3 Internet-Draft Google Japan 4 Intended status: Informational K. Duleba 5 Expires: January 30, 2014 Z. Szamonek 6 Google Switzerland GmbH 7 July 29, 2013 9 Self-published IP Geolocation Data 10 draft-google-self-published-geofeeds-02 12 Abstract 14 This document records a format whereby a network operator can publish 15 a mapping of IP address ranges to simplified geolocation information, 16 colloquially termed a geolocation "feed". Interested parties can 17 poll and parse these feeds to update or merge with other geolocation 18 data sources and procedures. 20 Some technical organizations operating networks that move from one 21 conference location to the next have already experimentally published 22 small geolocation feeds. 
At least one consumer (Google) has 23 incorporated these ad hoc feeds into a geolocation data pipeline. 25 Status of this Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. This document may not be modified, 29 and derivative works of it may not be created, except to format it 30 for publication as an RFC or to translate it into languages other 31 than English. 33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at http://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on January 30, 2014. 45 Copyright Notice 47 Copyright (c) 2013 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 63 1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 3 64 1.2. Requirements notation . . . . . . . . . . . . . . . . . . 3 65 1.3. Implications of publication . . . . . . . . . . . . . . . 
3 66 2. Self-published IP geolocation feeds . . . . . . . . . . . . . 4 67 2.1. Specification . . . . . . . . . . . . . . . . . . . . . . 4 68 2.1.1. Geolocation feed individual entry fields . . . . . . . 5 69 2.1.2. Prefixes with no geolocation information . . . . . . . 6 70 2.1.3. Additional parsing requirements . . . . . . . . . . . 6 71 2.1.4. Looking up an IP address . . . . . . . . . . . . . . . 7 72 2.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 7 73 2.3. Proposed extensions . . . . . . . . . . . . . . . . . . . 8 74 2.3.1. Delegation size . . . . . . . . . . . . . . . . . . . 8 75 2.3.2. Alternate format . . . . . . . . . . . . . . . . . . . 8 76 3. Finding self-published IP geolocation feeds . . . . . . . . . 8 77 3.1. Ad hoc 'well known' URIs . . . . . . . . . . . . . . . . . 9 78 3.2. Using public databases of network authority . . . . . . . 9 79 3.3. Using 'reverse' DNS with NAPTR records . . . . . . . . . . 9 80 4. Consuming self-published IP geolocation feeds . . . . . . . . 11 81 4.1. Feed integrity . . . . . . . . . . . . . . . . . . . . . . 11 82 4.2. Verification of authority . . . . . . . . . . . . . . . . 11 83 4.3. Verification of accuracy . . . . . . . . . . . . . . . . . 11 84 4.4. Refreshing feed information . . . . . . . . . . . . . . . 11 85 5. Security Considerations . . . . . . . . . . . . . . . . . . . 12 86 6. Privacy Considerations . . . . . . . . . . . . . . . . . . . . 12 87 7. Relation to other work . . . . . . . . . . . . . . . . . . . . 13 88 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 89 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 90 9.1. Normative References . . . . . . . . . . . . . . . . . . . 13 91 9.2. Informative References . . . . . . . . . . . . . . . . . . 14 92 Appendix A. Sample Python validation code . . . . . . . . . . . . 15 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 95 1. Introduction 97 1.1. 
Motivation 99 Providers of services over the Internet have grown to depend on best- 100 effort geolocation information to improve the user experience. 101 Locality information can aid in directing traffic to the nearest 102 serving location, inferring likely native language, and providing 103 additional context for services involving search queries. 105 When an ISP, for example, changes the location where an IP prefix is 106 deployed, services which make use of geolocation information may 107 begin to suffer degraded performance. This can lead to customer 108 complaints, possibly to the ISP directly. Dissemination of correct 109 geolocation data is complicated by the lack of any centralized means 110 to coordinate and communicate geolocation information to all 111 interested consumers of the data. 113 This document records a format whereby a network operator (an ISP, an 114 enterprise, or any organization which deems the geolocation of its IP 115 prefixes to be of concern) can publish a mapping of IP address ranges 116 to simplified geolocation information, colloquially termed a 117 "geolocation feed". Interested parties can poll and parse these 118 feeds to update or merge with other geolocation data sources and 119 procedures. 121 Some technical organizations operating networks that move from one 122 conference location to the next have already experimentally published 123 small geolocation feeds. At least one consumer (Google) has 124 incorporated these ad hoc feeds into a geolocation data pipeline. 126 1.2. Requirements notation 128 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 129 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 130 document are to be interpreted as described in [RFC2119]. 132 1.3. Implications of publication 134 This document describes both a format and a mechanism for publishing 135 data, with the implication that the owner of the data wishes it to be 136 public. 
Any privacy risk is bounded by the format, and data 137 publishers MAY omit certain fields to further protect privacy (see 138 Section 2.1 for details about which fields exactly may be omitted). 139 Feed publishers assume the responsibility of determining which data 140 should be made public. 142 This proposal does not incorporate a mechanism to communicate 143 acceptable use policies for self-published data. Publication itself 144 is inferred as a desire by the publisher for the data to be usefully 145 consumed, similar to the publication of information like host names, 146 cryptographic keys, and SPF records [RFC4408] in the DNS. 148 2. Self-published IP geolocation feeds 150 The format described here was developed to address the need of 151 network operators to rapidly and usefully share geolocation 152 information changes. Originally, there arose a specific case where 153 regional operators found it desirable to publish location changes 154 rather than wait for geolocation algorithms to "learn" about them. 155 Later, technical conferences which frequently use the same network 156 prefixes advertised from different conference locations experimented 157 by publishing geolocation feeds, updated in advance of network 158 location changes, in order to better serve conference attendees. 160 At its simplest, the mechanism consists of a network operator 161 publishing a file (the "geolocation feed"), which contains several 162 text entries, one per line. Each entry is keyed by a unique (within 163 the feed) IP prefix (or single IP address) followed by a sequence of 164 network locality attributes to be ascribed to the given prefix. 166 2.1. Specification 168 For operational simplicity, every feed should contain data about all 169 IP addresses the provider wants to publish. 
Alternatives, like 170 publishing only entries for IP addresses whose geolocation data has 171 changed or differs from currently observed geolocation behavior "at 172 large", are likely to be too operationally complex. 174 Feeds MUST use UTF-8 [RFC3629] character encoding. Text after a '#' 175 character is treated as a comment only and ignored. Blank lines are 176 similarly ignored. 178 Feeds MUST be in comma-separated values format as described in 179 [RFC4180]. Each feed entry is a text line of the form: 181 ip_range,country,region,city,postal_code 183 The IP range field is REQUIRED; all others are OPTIONAL (can be 184 empty), though the requisite minimum number of commas SHOULD be 185 present. 187 2.1.1. Geolocation feed individual entry fields 189 2.1.1.1. IP Range 191 REQUIRED. Each IP range field MUST be either a single IP address or 192 an IP prefix in CIDR notation in conformance with section 3.1 of 193 [RFC4632] for IPv4 or section 2.3 of [RFC4291] for IPv6. 195 Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and 196 "2001:db8::1" and "2001:db8::/32" for IPv6. 198 2.1.1.2. Country 200 OPTIONAL. The country field, if non-empty, MUST be a 2-letter ISO 201 country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2]. 202 Parsers SHOULD treat this field case-insensitively. 204 Examples include "US" for the United States, "JP" for Japan, and "PL" 205 for Poland. 207 2.1.1.3. Region 209 OPTIONAL. The region field, if non-empty, MUST be an ISO region code 210 conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this 211 field case-insensitively. 213 Examples include "ID-RI" for the Riau province of Indonesia and 214 "NG-RI" for the Rivers province in Nigeria. 216 2.1.1.4. City 218 OPTIONAL. The city field, if non-empty, SHOULD be free UTF-8 text, 219 excluding the comma (',') character. 221 Examples include "Dublin", "New York", and "Sao Paulo" (specifically 222 "S" followed by 0xc3, 0xa3, and "o Paulo"). 224 2.1.1.5. Postal code 226 OPTIONAL. 
The postal code field, if non-empty, SHOULD be free UTF-8 227 text, excluding the comma (',') character. See Section 6 for some 228 discussion of when this field must not be populated. 230 An example is "106-6126" (in Minato ward, Tokyo, Japan). 232 2.1.2. Prefixes with no geolocation information 234 Feed publishers may indicate that some IP prefixes should not have 235 any associated geolocation information. It may be that some prefixes 236 under their administrative control are reserved, not yet allocated or 237 deployed, or are in the process of being redeployed elsewhere and 238 existing geolocation information can, from the perspective of the 239 publisher, safely be discarded. 241 This special case can be indicated by explicitly leaving blank all 242 fields which specify any degree of geolocation information. For 243 example: 245 127.0.0.0/8,,,, 246 224.0.0.0/4,,,, 247 240.0.0.0/4,,,, 249 Historically, the user-assigned country identifier of "ZZ" has been 250 used for this same purpose. This is not necessarily preferred, and 251 no specific interpretation of any of the other user-assigned country 252 codes is currently defined. 254 2.1.3. Additional parsing requirements 256 Feed entries missing required fields, or having a required field 257 which fails to parse correctly, MUST be discarded. It is RECOMMENDED 258 that such entries also be logged for further administrative review. 260 While publishers SHOULD follow [RFC5952] style for IPv6 prefix 261 fields, consumers MUST nevertheless accept all valid string 262 representations. 264 Duplicate IP address or prefix entries MUST be considered an error, 265 and consumer implementations SHOULD log the repeated entries for 266 further administrative review. Publishers SHOULD take measures to 267 ensure there is one and only one entry per IP address and prefix. 269 Feed entries with non-empty optional fields which fail to parse, 270 either in part or in full, SHOULD be discarded. 
It is RECOMMENDED 271 that they also be logged for further administrative review. 273 For compatibility with future additional fields a parser MUST ignore 274 any fields beyond those it expects. The data from fields which are 275 expected and which parse successfully MUST still be considered valid. 277 2.1.4. Looking up an IP address 279 Multiple entries which constitute nested prefixes are permitted. 280 Consumers SHOULD consider the entry with the longest matching prefix 281 (i.e. the "most specific") to be the best matching entry for a given 282 IP address. 284 2.2. Examples 286 Example entries using different IP address formats and describing 287 locations at country, region, city and postal code granularity level, 288 respectively: 290 192.0.2.0/25,US,US-AL,, 291 192.0.2.5,US,US-AL,Alabaster, 292 192.0.2.128/25,PL,PL-MZ,,02-784 293 2001:db8::/32,PL,,, 294 2001:db8:cafe::/48,PL,PL-MZ,,02-784 296 Experimentally, RIPE has published geolocation information for their 297 conference network prefixes, which change location in accordance with 298 each new event. [GEO_RIPE_NCC] at the time of writing contains: 300 193.0.24.0/21,IE,IE-D,Dublin, 301 2001:67c:64::/48,IE,IE-D,Dublin, 303 Similarly, ICANN has published geolocation information for their 304 portable conference network prefixes. [GEO_ICANN] at the time of 305 writing contains: 307 199.91.192.0/21,US,US-CA,Los Angeles, 308 2620:f:8000::/48,US,US-CA,Los Angeles, 310 Furthermore, it is worth noting that the geolocation data of SixXS 311 users, already available at whois.sixxs.net, is now also accessible 312 in the format described here (see [GEO_SIXXS]). 
This can be 313 particularly useful where tunnel broker networks [RFC3053] are 314 concerned as: 316 o the geolocation attributes of users with neighboring prefixes can 317 be quite different and therefore not easily aggregated, and 319 o attempting to learn this data by statistical analysis can be 320 complicated by the likely low number of samples for any given 321 user, making satisfactory statistical confidence difficult to 322 achieve. 324 2.3. Proposed extensions 326 Already some discussions have resulted in proposed extensions. While 327 the purpose of this document is principally to record existing 328 implementation details, it may be that there is a larger desire to 329 publish other "network attributes" in a similar manner. One such 330 network attribute, "delegation size", is not currently implemented 331 but the state of the proposed extension is recorded here to 332 demonstrate the flexibility required of parser implementations. 334 The following have been only informally discussed and are not in use 335 at the time of writing. 337 2.3.1. Delegation size 339 OPTIONAL. A publisher may optionally communicate the average 340 delegated prefix size for subnetworks within the IP prefix of this 341 entry. For a network operator this can be used to help consumers 342 distinguish IP prefixes among various use types such as residential 343 prefixes, allocations to businesses, or data center customer 344 allocations. 346 Non-empty strings MUST be of the form required for CIDR notation 347 suffixes, i.e. "/" followed by the integer prefix length of the 348 expected allocation to the subnetworks from within the entry's 349 prefix. In the absence of data to the contrary, it is common to 350 assume that leaf networks may be delegated a prefix ranging from /24 351 to /32 in IPv4 and /48 to /64 in IPv6. Default assumptions about 352 delegation size are left to the consumer's implementation. 354 Examples for IPv6 include "/48", "/56", "/60", and "/64". 356 2.3.2. 
Alternate format 358 In order to more flexibly support future extensions, use of a more 359 expressive feed format has been suggested. Use of JavaScript Object 360 Notation (JSON, [RFC4627]), specifically, has been discussed. 361 However, at the time of writing no such specification nor 362 implementation exists. 364 3. Finding self-published IP geolocation feeds 366 The issue of finding, and later verifying, geolocation feeds is not 367 formally specified in this document. At this time, only ad hoc feed 368 discovery and verification has a modicum of established practice (see 369 below). Regardless, both the ad hoc mechanics and a few proposed but 370 not yet implemented alternatives are discussed. 372 3.1. Ad hoc 'well known' URIs 374 To date, geolocation feeds have been shared informally in the form of 375 HTTPS URIs exchanged in email threads. The two example URIs 376 documented above describe networks that change locations 377 periodically, the operators and operational practices of which are 378 well known within their respective technical communities. 380 The contents of the feeds are verified by a similarly ad hoc process 381 including: 383 o personal knowledge of the parties involved in the exchange, and 385 o comparison of feed-advertised prefixes with the BGP-advertised 386 prefixes of Autonomous System Numbers known to be operated by the 387 publishers. 389 Ad hoc mechanisms, while useful for early experimentation by 390 producers and consumers, are unlikely to be adequate for long-term, 391 widespread use by multiple parties. Future versions of any such 392 self-published geolocation feed mechanism SHOULD address scalability 393 concerns by defining a means for automated discovery and verification 394 of operational authority of advertised prefixes. 396 3.2. 
Using public databases of network authority 398 One possibility for enabling automation would be publication of feed 399 URIs as a well-known attribute in public databases of network 400 authority, e.g. the WHOIS service ([RFC3912]) operated by RIRs. 401 Verification may be performed if the same or similarly authoritative 402 service provides the identical feed URI for queries for each CIDR 403 prefix in the geolocation feed. 405 The burden of serving this data to all interested consumers, 406 especially the load imposed by any verification process, is not yet 407 known. The anticipation of additional operational burden on the 408 public resource of record (the database of network authority) is 409 however a noted concern. 411 3.3. Using 'reverse' DNS with NAPTR records 413 Another possibility for automating the location and verification of a 414 geolocation feed is to incorporate feed URIs into the DNS, 415 specifically the in-addr.arpa and ip6.arpa portions of the DNS 416 hierarchy. A suitably formatted query for a NAPTR ([RFC3403]) 417 record, or more specifically a U-NAPTR ([RFC4848]) record, could 418 yield a transformation to a geolocation feed URI. 420 For example, assuming a purely theoretical service name of 421 "x-geofeed", a 'reverse' DNS zone might contain a record of the form: 423 ;; order pref flags 424 IN NAPTR 200 10 "u" "x-geofeed" ( ; service 425 ; regexp 426 "!.*!https://example.com/ipgeo.csv!" 427 "" ; replacement 428 ) 430 Attempts to locate the geolocation feed for a given IP address would 431 begin by querying directly for a NAPTR record associated with the 432 address's PTR-style name. For example, 192.0.2.4 and 2001:db8::6 433 would cause a NAPTR record request to be issued for "4.2.0.192.in- 434 addr.arpa" and "6.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d 435 .0.1.0.0.2.ip6.arpa", respectively. 
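As a non-normative illustration of the PTR-style name construction described above, the following Python 3 sketch derives the name under which a NAPTR query would be issued for a given address. It assumes the standard library "ipaddress" module rather than the ipaddr library used in Appendix A, and the helper name ptr_style_name is hypothetical. Issuing the NAPTR query itself is not shown.

```python
import ipaddress


def ptr_style_name(address):
    """Return the 'reverse' DNS name for an IPv4 or IPv6 address string.

    Hypothetical helper for illustration only; it is not part of the
    feed format described in this document.
    """
    # reverse_pointer reverses the octets (IPv4) or nibbles (IPv6) and
    # appends in-addr.arpa or ip6.arpa respectively.
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    # The two example addresses used in the surrounding text.
    for addr in ("192.0.2.4", "2001:db8::6"):
        print(addr, "->", ptr_style_name(addr))
```

A consumer would pass the returned name to whatever DNS library it uses to request NAPTR records; that choice is left open here.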
437 If no such record exists one further NAPTR query for the fully 438 qualified domain name of the SOA record in the authority section of 439 the response to the previous query would be performed ("2.0.192.in- 440 addr.arpa" and "d.0.1.0.0.2.ip6.arpa" in the examples above). 442 If one or more NAPTR records exist for the full PTR-style name but 443 none of them are for the required service name (e.g. "x-geofeed"), 444 then likely no SOA will be returned as a hint for subsequent queries. 445 In this case implementations would need to first explicitly query for 446 an SOA record for the full PTR-style name, and then query for a NAPTR 447 record of the SOA in the response (assuming it differs from the 448 previously queried name). 450 Any successfully located feed URIs could then be processed as 451 outlined by this document. 453 Verification of the contents of a feed would proceed in essentially 454 the same way. CIDR prefixes may be verified by constructing a query 455 for any single address (at random) within the prefix and proceeding 456 as above. While not strictly provably correct (in cases where a 457 publisher has delegated some portion of the advertised prefix but not 458 excluded it from its feed), it may nevertheless suffice for 459 operational purposes, especially if a low-impact on-going 460 verification of observed client IP addresses is implemented, to 461 (eventually) catch any oversights. 463 This mode is untested and may prove impractical. However, the 464 operational burden is more closely located with those wishing and 465 willing to bear it, i.e. the publishers who would likely handle 466 serving in-addr.arpa and ip6.arpa for the IP prefixes under their 467 authority. 469 4. Consuming self-published IP geolocation feeds 471 Consumers MAY treat published feed data as a hint only and MAY choose 472 to prefer other sources of geolocation information for any given IP 473 range. 
Regardless of a consumer's stance with respect to a given 474 published feed, there are some points of note for sensibly and 475 effectively consuming published feeds. 477 4.1. Feed integrity 479 The integrity of published information SHOULD be protected by 480 securing the means of publication, for example by using HTTP over TLS 481 [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving 482 geolocation feeds in a manner that guarantees integrity of the feed. 484 4.2. Verification of authority 486 Consumers of self-published IP geolocation feeds SHOULD perform some 487 form of verification that the publisher is in fact authoritative for 488 the addresses in the feed. The actual means of verification is 489 likely dependent upon the way in which the feed is discovered. Ad 490 hoc shared URIs, for example, will likely require an ad hoc 491 verification process. Future automated means of feed discovery 492 SHOULD have an accompanying automated means of verification. 494 A consumer MUST only trust geolocation information for IP addresses 495 or ranges for which the publisher has been verified as 496 administratively authoritative. All other geolocation feed entries 497 MUST be ignored and SHOULD be logged for further administrative 498 review. 500 4.3. Verification of accuracy 502 Errors and inaccuracies may occur at many levels, and publication and 503 consumption of geolocation data are no exceptions. To the extent 504 practical consumers SHOULD take steps to verify the accuracy of 505 published locality. Verification methodology, resolution of 506 discrepancies, and preference for alternative sources of data are 507 left to the discretion of the feed consumer. 509 Consumers SHOULD decide on discrepancy thresholds and SHOULD flag for 510 administrative review feed entries which exceed set thresholds. 512 4.4. 
Refreshing feed information 514 As a publisher can change geolocation data at any time and without 515 notification, consumers SHOULD implement mechanisms to periodically 516 refresh local copies of feed data. In the absence of any other 517 refresh timing information, consumers SHOULD 518 refresh feeds no less often than weekly. 520 For feeds available via HTTPS (or HTTP), the publisher MAY 521 communicate refresh timing information by means of the standard HTTP 522 expiration model (section 13.2 of [RFC2616]). Specifically, 523 publishers can include either an Expires header or a Cache-Control 524 header specifying the max-age. Where practical, consumers SHOULD 525 refresh feed information before the expiry time is reached. 527 5. Security Considerations 529 As there is no true security in the obscurity of the location of any 530 given IP address, self-publication of this data fundamentally opens 531 no new attack vectors. For publishers, self-published data merely 532 increases the ease with which such location data might be exploited. 534 For consumers, feed retrieval processes may receive input from 535 potentially hostile sources (e.g. in the event of hijacked traffic). 536 As such, proper input validation and defense measures MUST be taken. 538 Similarly, consumers who do not perform sufficient verification of 539 published data bear the same risks as from other forms of geolocation 540 configuration errors. 542 6. Privacy Considerations 544 Publishers of geolocation feeds are advised to consider fully 545 any and all privacy implications of the disclosure of such 546 information for the users of the described networks prior to 547 publication. A thorough comprehension of the security considerations 548 of a chosen geolocation policy is highly recommended, including an 549 understanding of some of the limitations of information obscurity 550 (see also [RFC6772]). 
552 As noted in Section 2.1, each location field in an entry is optional, 553 in order to support expressing only the level of specificity which 554 the publisher has deemed acceptable. There is no requirement that 555 the level of specificity be consistent across all entries within a 556 feed. In particular, the Postal Code field (Section 2.1.1.5) can 557 provide very specific geolocation, sometimes within a building. Such 558 specific Postal Code values MUST NOT be published in geo feeds 559 without the consent of the parties being located. 561 7. Relation to other work 563 While not originally done in conjunction with the [GEOPRIV] working 564 group, Richard Barnes observed that this work is nevertheless 565 consistent with that which the group has defined, both for address 566 format and for privacy. The data elements in geolocation feeds are 567 equivalent to the following XML structure (vis. [RFC5139]): 569 570 country 571 region 572 city 573 postal_code 574 576 Providing geolocation information to this granularity is equivalent 577 to the following privacy policy (vis. the definition of the 578 'building' level of disclosure): 580 581 582 583 584 585 586 building 587 588 589 590 592 8. Acknowledgements 594 The authors would like to express their gratitude to reviewers and 595 early implementers, including but not limited to Mikael Abrahamsson, 596 Ray Bellis, John Bond, Alissa Cooper, Andras Erdei, Marco Hogewoning, 597 Mike Joseph, Warren Kumari, Menno Schepers, Justyna Sidorska, Pim van 598 Pelt, and Bjoern A. Zeeb. Richard L. Barnes in particular 599 contributed substantial review, text, and advice. 601 9. References 603 9.1. Normative References 605 [ISO.3166.1alpha2] 606 International Organization for Standardization, "ISO 607 3166-1 decoding table", . 610 [ISO.3166.2] 611 International Organization for Standardization, "ISO 3166- 612 2:2007", . 
615 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 616 Requirement Levels", BCP 14, RFC 2119, March 1997. 618 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 619 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 620 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 622 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 623 10646", STD 63, RFC 3629, November 2003. 625 [RFC4180] Shafranovich, Y., "Common Format and MIME Type for Comma- 626 Separated Values (CSV) Files", RFC 4180, October 2005. 628 [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing 629 Architecture", RFC 4291, February 2006. 631 [RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing 632 (CIDR): The Internet Address Assignment and Aggregation 633 Plan", BCP 122, RFC 4632, August 2006. 635 9.2. Informative References 637 [GEOPRIV] Internet Engineering Task Force, "IETF geopriv Working 638 Group", . 640 [GEO_ICANN] 641 Internet Corporation For Assigned Names and Numbers, 642 "ICANN Meeting Geolocation Data", 643 . 645 [GEO_RIPE_NCC] 646 Schepers, M., "RIPE NCC Meeting Geolocation Data", 647 . 649 [GEO_SIXXS] 650 van Pelt, P., "SixXS Geolocation Data", 651 . 653 [IPADDR_PY] 654 Shields, M. and P. Moody, "Python IP address manipulation 655 library", . 657 [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. 659 [RFC3053] Durand, A., Fasano, P., Guardini, I., and D. Lento, "IPv6 660 Tunnel Broker", RFC 3053, January 2001. 662 [RFC3403] Mealling, M., "Dynamic Delegation Discovery System (DDDS) 663 Part Three: The Domain Name System (DNS) Database", 664 RFC 3403, October 2002. 666 [RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912, 667 September 2004. 669 [RFC4408] Wong, M. and W. Schlitt, "Sender Policy Framework (SPF) 670 for Authorizing Use of Domains in E-Mail, Version 1", 671 RFC 4408, April 2006. 

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627, July 2006.

   [RFC4848]  Daigle, L., "Domain-Based Application Service Location
              Using URIs and the Dynamic Delegation Discovery Service
              (DDDS)", RFC 4848, April 2007.

   [RFC5139]  Thomson, M. and J. Winterbottom, "Revised Civic Location
              Format for Presence Information Data Format Location
              Object (PIDF-LO)", RFC 5139, February 2008.

   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
              Address Text Representation", RFC 5952, August 2010.

   [RFC6772]  Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J.,
              Morris, J., and M. Thomson, "Geolocation Policy: A
              Document Format for Expressing Privacy Preferences for
              Location Information", RFC 6772, January 2013.

Appendix A.  Sample Python validation code

   Included here is a simple format validator in Python for self-
   published ipgeo feeds.  This tool reads CSV data in the self-
   published ipgeo feed format from the standard input and performs
   basic validation.  It is intended for use by feed publishers before
   launching a feed.  Note that this validator does not verify the
   uniqueness of every IP prefix entry within the feed as a whole, but
   only verifies the syntax of each single line from within the feed.  A
   complete validator MUST also ensure IP prefix uniqueness.

   The main source file "ipgeo_feed_validator.py" follows.  It requires
   use of the open source ipaddr Python library for IP address and CIDR
   parsing and validation [IPADDR_PY].

   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as authors of
   # the code.  All rights reserved.
   # Redistribution and use in source and
   # binary forms, with or without modification, is permitted pursuant to,
   # and subject to the license terms contained in, the Simplified BSD
   # License set forth in Section 4.c of the IETF Trust's Legal Provisions
   # Relating to IETF Documents (http://trustee.ietf.org/license-info).

   """Simple format validator for self-published ipgeo feeds.

   This tool reads CSV data in the self-published ipgeo feed format from
   the standard input and performs basic validation.  It is intended for
   use by feed publishers before launching a feed.
   """

   import csv
   import ipaddr
   import re
   import sys

   class IPGeoFeedValidator(object):
     def __init__(self):
       self.ranges = {}
       self.line_number = 0
       self.output_log = {}
       self.SetOutputStream(sys.stderr)

     def Validate(self, feed):
       """Check validity of an IPGeo feed.

       Args:
         feed: iterable with feed lines
       """

       for line in feed:
         self._ValidateLine(line)

     def SetOutputStream(self, logfile):
       """Controls where the output messages go to (STDERR by default).

       Use None to disable logging.

       Args:
         logfile: a file object (e.g., sys.stdout or sys.stderr) or None.
       """
       self.output_stream = logfile

     def CountErrors(self, severity):
       """How many ERRORs or WARNINGs were generated."""
       return len(self.output_log.get(severity, []))

     ############################################################
     def _ValidateLine(self, line):
       line = line.rstrip('\r\n')
       self.line_number += 1
       self.line = line
       self.is_correct_line = True

       if self._ShouldIgnoreLine(line):
         return

       fields = [field for field in csv.reader([line])][0]

       self._ValidateFields(fields)
       self._FlushOutputStream()

     def _ShouldIgnoreLine(self, line):
       line = line.strip()
       return len(line) == 0 or line.startswith('#')

     ############################################################
     def _ValidateFields(self, fields):
       assert(len(fields) > 0)

       is_correct = self._IsIPAddressOrRangeCorrect(fields[0])

       if len(fields) > 1:
         if not self._IsCountryCode2Correct(fields[1]):
           is_correct = False

       if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
         is_correct = False

       if len(fields) != 5:
         self._ReportWarning('5 fields were expected (got %d).'
                             % len(fields))

     ############################################################
     def _IsIPAddressOrRangeCorrect(self, field):
       if '/' in field:
         return self._IsCIDRCorrect(field)
       return self._IsIPAddressCorrect(field)

     def _IsCIDRCorrect(self, cidr):
       try:
         iprange = ipaddr.IPNetwork(cidr)
         if iprange.network._ip != iprange._ip:
           self._ReportError('Incorrect IP Network.')
           return False
         if iprange.is_private:
           self._ReportError('IP Address must not be private.')
           return False
       except:
         self._ReportError('Incorrect IP Network.')
         return False
       return True

     def _IsIPAddressCorrect(self, ipaddress):
       try:
         ip = ipaddr.IPAddress(ipaddress)
       except:
         self._ReportError('Incorrect IP Address.')
         return False
       if ip.is_private:
         self._ReportError('IP Address must not be private.')
         return False
       return True

     ############################################################
     def _IsCountryCode2Correct(self, country_code_2):
       if len(country_code_2) == 0:
         return True
       if len(country_code_2) != 2 or not country_code_2.isalpha():
         self._ReportError(
             'Country code must be in the ISO 3166-1 alpha 2 format.')
         return False
       return True

     def _IsRegionCodeCorrect(self, region_code):
       if len(region_code) == 0:
         return True
       if '-' not in region_code:
         self._ReportError('Region code must be in the ISO 3166-2 format.')
         return False

       parts = region_code.split('-')
       if not self._IsCountryCode2Correct(parts[0]):
         return False
       return True

     ############################################################
     def _ReportError(self, message):
       self._ReportWithSeverity('ERROR', message)

     def _ReportWarning(self, message):
       self._ReportWithSeverity('WARNING', message)

     def _ReportWithSeverity(self, severity, message):
       self.is_correct_line = False
       output_line = '%s: %s\n' % (severity, message)

       if severity not in self.output_log:
         self.output_log[severity] = []
       self.output_log[severity].append(output_line)

       if self.output_stream is not None:
         self.output_stream.write(output_line)

     def _FlushOutputStream(self):
       if self.is_correct_line: return
       if self.output_stream is None: return

       self.output_stream.write('line %d: %s\n\n'
                                % (self.line_number, self.line))

   ############################################################
   def main():
     feed_validator = IPGeoFeedValidator()
     feed_validator.Validate(sys.stdin)

     if feed_validator.CountErrors('ERROR'):
       sys.exit(1)

   if __name__ == '__main__':
     main()

   A unit test file, "ipgeo_feed_validator_test.py", is provided as well.
   It provides basic test coverage of the code above, though it does not
   test correct handling of non-ASCII UTF-8 strings.

   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as authors of
   # the code.  All rights reserved.  Redistribution and use in source and
   # binary forms, with or without modification, is permitted pursuant to,
   # and subject to the license terms contained in, the Simplified BSD
   # License set forth in Section 4.c of the IETF Trust's Legal Provisions
   # Relating to IETF Documents (http://trustee.ietf.org/license-info).

   import sys
   from ipgeo_feed_validator import IPGeoFeedValidator

   class IPGeoFeedValidatorTest(object):
     def __init__(self):
       self.validator = IPGeoFeedValidator()
       self.validator.SetOutputStream(None)
       self.successes = 0
       self.failures = 0

     def Run(self):
       self.TestFeedLine('# asdf', 0, 0)
       self.TestFeedLine('   ', 0, 0)
       self.TestFeedLine('', 0, 0)

       self.TestFeedLine('asdf', 1, 1)
       self.TestFeedLine('asdf,US,,,', 1, 0)
       self.TestFeedLine('aaaa::,US,,,', 0, 0)
       self.TestFeedLine('zzzz::,US', 1, 1)
       self.TestFeedLine(',US,,,', 1, 0)
       self.TestFeedLine('55.66.77', 1, 1)
       self.TestFeedLine('55.66.77.888', 1, 1)
       self.TestFeedLine('55.66.77.asdf', 1, 1)

       self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
       self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

       self.TestFeedLine('55.66.77.88,PL', 0, 1)
       self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
       self.TestFeedLine('55.66.77.88,99,,,', 1, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
       self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
                         '1600 Amphitheatre Parkway', 0, 1)

       self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
       self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

       self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
       self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
       self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
       self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
       self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
       self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

       print('%d tests passed, %d failed' % (self.successes, self.failures))

     def IsOutputLogCorrectAtSeverity(self, severity, expected_msg_count):
       msg_count = self.validator.CountErrors(severity)

       if msg_count != expected_msg_count:
         print('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' % (
             self.validator.line, expected_msg_count, severity, msg_count,
             str(self.validator.output_log[severity])))
         return False
       return True

     def IsOutputLogCorrect(self, new_errors, new_warnings):
       retval = True

       if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
         retval = False
       if not self.IsOutputLogCorrectAtSeverity('WARNING', new_warnings):
         retval = False

       return retval

     def TestFeedLine(self, line, error_count, warning_count):
       """Validate one line and compare the ERROR and WARNING counts."""
       self.validator.output_log['WARNING'] = []
       self.validator.output_log['ERROR'] = []
       self.validator._ValidateLine(line)

       if not self.IsOutputLogCorrect(error_count, warning_count):
         self.failures += 1
         return False

       self.successes += 1
       return True

   if __name__ == '__main__':
     IPGeoFeedValidatorTest().Run()

Authors' Addresses

   Erik Kline
   Google Japan
   Roppongi 6-10-1, 26th Floor
   Minato, Tokyo  106-6126
   Japan

   Phone: +81 03 6384 9000
   Email: ek@google.com


   Krzysztof Duleba
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: kduleba@google.com


   Zoltan Szamonek
   Google Switzerland GmbH
   Brandschenkestrasse 110
   Zuerich  8002
   Switzerland

   Email: zszami@google.com
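
Supplementary sketch: IP prefix uniqueness

   Appendix A notes that the sample validator checks only the syntax of
   each line and that a complete validator must also ensure IP prefix
   uniqueness.  The following is a minimal sketch of one way such a
   check might look; it is not part of the validator above.  It assumes
   Python 3 and uses the standard library "ipaddress" module in place of
   the ipaddr library [IPADDR_PY].  The function name
   find_duplicate_prefixes is hypothetical, and the sketch treats only
   exactly equal prefixes as duplicates (a bare address is compared as
   its /32 or /128 prefix); overlapping prefixes are not reported.

```python
import csv
import ipaddress


def find_duplicate_prefixes(lines):
    """Return (line_number, first_line_number, prefix) for repeated prefixes.

    Hypothetical helper, not part of the validator in Appendix A.
    """
    first_seen = {}   # network object -> line number of first occurrence
    duplicates = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines and comments, as the per-line validator does.
        if not stripped or stripped.startswith('#'):
            continue
        fields = next(csv.reader([stripped]))
        try:
            # A bare address becomes a /32 (IPv4) or /128 (IPv6) network.
            network = ipaddress.ip_network(fields[0].strip(), strict=True)
        except ValueError:
            # Malformed prefixes are left to the per-line syntax check.
            continue
        if network in first_seen:
            duplicates.append(
                (line_number, first_seen[network], str(network)))
        else:
            first_seen[network] = line_number
    return duplicates


if __name__ == '__main__':
    sample_feed = [
        '# example feed using documentation prefixes',
        '192.0.2.0/24,US,,,',
        '2001:db8::/32,PL,,,',
        '192.0.2.0/24,CA,,,',
    ]
    for line_number, first_line, prefix in find_duplicate_prefixes(sample_feed):
        print('line %d: prefix %s already seen on line %d'
              % (line_number, prefix, first_line))
```

   A publisher could run such a check over the whole feed after the
   per-line validation has passed, and treat any reported duplicate as
   an error.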