Network Working Group                                           E. Kline
Internet-Draft                                              Google Japan
Intended status: Informational                                 K. Duleba
Expires: November 16, 2013                                   Z. Szamonek
                                                 Google Switzerland GmbH
                                                            May 15, 2013


                   Self-published IP Geolocation Data
                 draft-google-self-published-geofeeds-01

Abstract

   This document records a format whereby a network operator can publish
   a mapping of IP address ranges to simplified geolocation information,
   colloquially termed a geolocation "feed". Interested parties can
   poll and parse these feeds to update or merge with other geolocation
   data sources and procedures.

   Some technical organizations operating networks that move from one
   conference location to the next have already experimentally published
   small geolocation feeds. At least one consumer (Google) has
   incorporated these ad hoc feeds into a geolocation data pipeline.
Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to format it
   for publication as an RFC or to translate it into languages other
   than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 16, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Requirements notation
     1.3.  Implications of publication
   2.  Self-published IP geolocation feeds
     2.1.  Specification
       2.1.1.  Geolocation feed individual entry fields
       2.1.2.  Prefixes with no geolocation information
       2.1.3.  Additional parsing requirements
       2.1.4.  Looking up an IP address
     2.2.  Examples
     2.3.  Proposed extensions
       2.3.1.  Delegation size
       2.3.2.  Alternate format
   3.  Finding self-published IP geolocation feeds
     3.1.  Ad hoc 'well known' URIs
     3.2.  Using public databases of network authority
     3.3.  Using 'reverse' DNS with NAPTR records
   4.  Consuming self-published IP geolocation feeds
     4.1.  Feed integrity
     4.2.  Verification of authority
     4.3.  Verification of accuracy
     4.4.  Refreshing feed information
   5.  Security Considerations
   6.  Privacy Considerations
   7.  Relation to other work
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Sample Python validation code
   Authors' Addresses

1.  Introduction

1.1.  Motivation

   Providers of services over the Internet have grown to depend on
   best-effort geolocation information to improve the user experience.
   Locality information can aid in directing traffic to the nearest
   serving location, inferring likely native language, and providing
   additional context for services involving search queries.

   When an ISP, for example, changes the location where an IP prefix is
   deployed, services which make use of geolocation information may
   begin to suffer degraded performance. This can lead to customer
   complaints, possibly to the ISP directly. Dissemination of correct
   geolocation data is complicated by the lack of any centralized means
   to coordinate and communicate geolocation information to all
   interested consumers of the data.

   This document records a format whereby a network operator (an ISP, an
   enterprise, or any organization which deems the geolocation of its IP
   prefixes to be of concern) can publish a mapping of IP address ranges
   to simplified geolocation information, colloquially termed a
   "geolocation feed". Interested parties can poll and parse these
   feeds to update or merge with other geolocation data sources and
   procedures.

   Some technical organizations operating networks that move from one
   conference location to the next have already experimentally published
   small geolocation feeds. At least one consumer (Google) has
   incorporated these ad hoc feeds into a geolocation data pipeline.

1.2.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.3.  Implications of publication

   This document describes both a format and a mechanism for publishing
   data, with the implication that the owner of the data wishes it to be
   public. Any privacy risk is bounded by the format, and data
   publishers MAY omit certain fields to further protect privacy (see
   Section 2.1 for details about exactly which fields may be omitted).
   Feed publishers assume the responsibility of determining which data
   should be made public.

   This proposal does not incorporate a mechanism to communicate
   acceptable use policies for self-published data. Publication itself
   is taken as an indication that the publisher wishes the data to be
   usefully consumed, similar to the publication of information like
   host names, cryptographic keys, and SPF records [RFC4408] in the DNS.

2.  Self-published IP geolocation feeds

   The format described here was developed to address the need of
   network operators to rapidly and usefully share geolocation
   information changes. Originally, there arose a specific case where
   regional operators found it desirable to publish location changes
   rather than wait for geolocation algorithms to "learn" about them.
   Later, technical conferences which frequently use the same network
   prefixes advertised from different conference locations experimented
   by publishing geolocation feeds, updated in advance of network
   location changes, in order to better serve conference attendees.

   At its simplest, the mechanism consists of a network operator
   publishing a file (the "geolocation feed"), which contains several
   text entries, one per line. Each entry is keyed by a unique (within
   the feed) IP prefix (or single IP address) followed by a sequence of
   network locality attributes to be ascribed to the given prefix.

2.1.  Specification

   For operational simplicity, every feed should contain data about all
   IP addresses the provider wants to publish. Alternatives, like
   publishing only entries for IP addresses whose geolocation data has
   changed or differs from current observed geolocation behavior "at
   large", are likely to be too operationally complex.

   Feeds MUST use UTF-8 [RFC3629] character encoding. Text after a '#'
   character is treated as a comment only and ignored. Blank lines are
   similarly ignored.

   Feeds MUST be in comma separated values format as described in
   [RFC4180]. Each feed entry is a text line of the form:

      ip_range,country,region,city,postal_code

   The IP range field is REQUIRED; all others are OPTIONAL (can be
   empty), though the requisite minimum number of commas SHOULD be
   present.

2.1.1.  Geolocation feed individual entry fields

2.1.1.1.  IP Range

   REQUIRED. Each IP range field MUST be either a single IP address or
   an IP prefix in CIDR notation in conformance with section 3.1 of
   [RFC4632] for IPv4 or section 2.3 of [RFC4291] for IPv6.

   Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
   "2001:db8::1" and "2001:db8::/32" for IPv6.

2.1.1.2.  Country

   OPTIONAL. The country field, if non-empty, MUST be a 2-letter ISO
   country code conforming to ISO 3166-1 alpha-2 [ISO.3166.1alpha2].
   Parsers SHOULD treat this field case-insensitively.

   Examples include "US" for the United States, "JP" for Japan, and "PL"
   for Poland.

2.1.1.3.  Region

   OPTIONAL. The region field, if non-empty, MUST be an ISO region code
   conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this
   field case-insensitively.

   Examples include "ID-RI" for the Riau province of Indonesia and
   "NG-RI" for Rivers State in Nigeria.

2.1.1.4.  City

   OPTIONAL. The city field, if non-empty, SHOULD be free UTF-8 text,
   excluding the comma (',') character.

   Examples include "Dublin", "New York", and "Sao Paulo" (specifically
   "S" followed by 0xc3, 0xa3, and "o Paulo").

2.1.1.5.  Postal code

   OPTIONAL. The postal code field, if non-empty, SHOULD be free UTF-8
   text, excluding the comma (',') character. See Section 6 for some
   discussion of when this field must not be populated.

   An example is "106-6126" (in Minato ward, Tokyo, Japan).

2.1.2.  Prefixes with no geolocation information

   Feed publishers may indicate that some IP prefixes should not have
   any associated geolocation information. It may be that some prefixes
   under their administrative control are reserved, not yet allocated or
   deployed, or are in the process of being redeployed elsewhere and
   existing geolocation information can, from the perspective of the
   publisher, safely be discarded.

   This special case can be indicated by explicitly leaving blank all
   fields which specify any degree of geolocation information. For
   example:

      127.0.0.0/8,,,,
      224.0.0.0/4,,,,
      240.0.0.0/4,,,,

   Historically, the user-assigned country identifier of "ZZ" had been
   used for this same purpose. This is not necessarily preferred, and
   no specific interpretation of any of the other user-assigned country
   codes is currently defined.

2.1.3.  Additional parsing requirements

   Feed entries missing required fields, or having a required field
   which fails to parse correctly, MUST be discarded. It is RECOMMENDED
   that such entries also be logged for further administrative review.

   While publishers SHOULD follow [RFC5952] style for IPv6 prefix
   fields, consumers MUST nevertheless accept all valid string
   representations.

   Duplicate IP address or prefix entries MUST be considered an error,
   and consumer implementations SHOULD log the repeated entries for
   further administrative review. Publishers SHOULD take measures to
   ensure there is one and only one entry per IP address and prefix.

   Feed entries with non-empty optional fields which fail to parse,
   either in part or in full, SHOULD be discarded. It is RECOMMENDED
   that they also be logged for further administrative review.

   For compatibility with future additional fields, a parser MUST ignore
   any fields beyond those it expects. The data from fields which are
   expected and which parse successfully MUST still be considered valid.

2.1.4.  Looking up an IP address

   Multiple entries which constitute nested prefixes are permitted.
   Consumers SHOULD consider the entry with the longest matching prefix
   (i.e. the "most specific") to be the best matching entry for a given
   IP address.

2.2.  Examples

   Example entries using different IP address formats and describing
   locations at country, region, city and postal code granularity level,
   respectively:

      192.0.2.0/25,US,US-AL,,
      192.0.2.5,US,US-AL,Alabaster,
      192.0.2.128/25,PL,PL-MZ,,02-784
      2001:db8::/32,PL,,,
      2001:db8:cafe::/48,PL,PL-MZ,,02-784

   Experimentally, RIPE has published geolocation information for their
   conference network prefixes, which change location in accordance with
   each new event. [GEO_RIPE_NCC] at the time of writing contains:

      193.0.24.0/21,IE,IE-D,Dublin,
      2001:67c:64::/48,IE,IE-D,Dublin,

   Similarly, ICANN has published geolocation information for their
   portable conference network prefixes. [GEO_ICANN] at the time of
   writing contains:

      199.91.192.0/21,US,US-CA,Los Angeles,
      2620:f:8000::/48,US,US-CA,Los Angeles,

2.3.  Proposed extensions

   Some discussions have already resulted in proposed extensions. While
   the purpose of this document is principally to record existing
   implementation details, it may be that there is a larger desire to
   publish other "network attributes" in a similar manner. One such
   network attribute, "delegation size", is not currently implemented
   but the state of the proposed extension is recorded here to
   demonstrate the flexibility required of parser implementations.
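
   As a non-normative illustration of the parser flexibility described
   above, the following Python 3 sketch applies the parsing rules of
   Section 2.1.3 and the longest-match lookup of Section 2.1.4. It is
   not the validator from Appendix A. It assumes the standard library
   "ipaddress" module rather than the "ipaddr" library used there, and
   the names parse_feed and lookup are hypothetical. Keeping the first
   of two duplicate entries, rejecting prefixes with host bits set, and
   checking only the shape of the country code are choices of this
   sketch, not requirements of the format.

```python
import csv
import ipaddress


def parse_feed(lines):
    """Return (entries, rejected) for an iterable of feed lines.

    entries maps each network to (country, region, city, postal_code).
    rejected lists (line_number, reason) pairs for administrative review.
    """
    entries = {}
    rejected = []
    for number, raw in enumerate(lines, start=1):
        # Text after '#' is a comment; blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in next(csv.reader([line]))]
        try:
            # A bare address becomes a /32 or /128 network. Treating
            # set host bits as malformed is an assumption of this sketch.
            network = ipaddress.ip_network(fields[0], strict=True)
        except ValueError:
            rejected.append((number, "bad ip_range"))
            continue
        if network in entries:
            # Duplicates are an error; this sketch keeps the first entry.
            rejected.append((number, "duplicate ip_range"))
            continue
        # Pad short entries and ignore any fields beyond the known four,
        # so that future extension fields do not break parsing.
        country, region, city, postal = (fields[1:5] + [""] * 4)[:4]
        country, region = country.upper(), region.upper()
        # Shape check only: two ASCII letters, not a real ISO 3166-1 lookup.
        if country and not (len(country) == 2 and country.isascii()
                            and country.isalpha()):
            rejected.append((number, "bad country"))
            continue
        entries[network] = (country, region, city, postal)
    return entries, rejected


def lookup(entries, address):
    """Return (network, location) for the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in entries
               if net.version == ip.version and ip in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return best, entries[best]
```

   With the example entries of Section 2.2 loaded, a lookup of 192.0.2.5
   would select the single-address entry over the enclosing
   192.0.2.0/25 entry. The linear scan is adequate for feeds of a few
   entries; a consumer merging many feeds would more likely use a
   prefix trie.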
320 The following have been only informally discussed and are not in use 321 at the time of writing. 323 2.3.1. Delegation size 325 OPTIONAL. A publisher may optionally communicate the average 326 delegated prefix size for subnetworks within the IP prefix of this 327 entry. For a network operator this can be used to help consumers 328 distinguish IP prefixes among various use types such as residential 329 prefixes, allocations to businesses, or data center customer 330 allocations. 332 Non-empty strings MUST be of the form required for CIDR notation 333 suffixes, i.e. "/" followed by the integer prefix length of the 334 expected allocation to the subnetworks from within the entry's 335 prefix. In the absence of data to the contrary, it is common to 336 assume that leaf networks may be delegated a prefix ranging from /24 337 to /32 in IPv4 and /48 to /64 in IPv6. Default assumptions about 338 delegation size are left to the consumer's implementation. 340 Examples for IPv6 include "/48", "/56", "/60", and "/64". 342 2.3.2. Alternate format 344 In order to more flexibly support future extensions, use of a more 345 expressive feed format has been suggested. Use of JavaScript Object 346 Notation (JSON, [RFC4627]), specifically, has been discussed. 347 However, at the time of writing no such specification nor 348 implementation exists. 350 3. Finding self-published IP geolocation feeds 352 The issue of finding, and later verifying, geolocation feeds is not 353 formally specified in this document. At this time, only ad hoc feed 354 discovery and verification has a modicum of established practice (see 355 below). Regardless, both the ad hoc mechanics and a few proposed but 356 not yet implemented alternatives are discussed. 358 3.1. Ad hoc 'well known' URIs 360 To date, geolocation feeds have been shared informally in the form of 361 HTTPS URIs exchanged in email threads. 
The two example URIs 362 documented above describe networks that change locations 363 periodically, the operators and operational practices of which are 364 well known within their respective technical communities. 366 The contents of the feeds are verified by a similarly ad hoc process 367 including: 369 o personal knowledge of the parties involved in the exchange, and 371 o comparison of feed-advertised prefixes with the BGP-advertised 372 prefixes of Autonomous System Numbers known to be operated by the 373 publishers. 375 Ad hoc mechanisms, while useful for early experimentation by 376 producers and consumers, are unlikely to be adequate for long-term, 377 widespread use by multiple parties. Future versions of any such 378 self-published geolocation feed mechanism SHOULD address scalability 379 concerns by defining a means for automated discovery and verification 380 of operational authority of advertised prefixes. 382 3.2. Using public databases of network authority 384 One possibility for enabling automation would be publication of feed 385 URIs as a well-known attribute in public databases of network 386 authority, e.g. the WHOIS service ([RFC3912]) operated by RIRs. 387 Verification may be performed if the same or similarly authoritative 388 service provides the identical feed URI for queries for each CIDR 389 prefix in the geolocation feed. 391 The burden of serving this data to all interested consumers, 392 especially the load imposed by any verification process, is not yet 393 known. The anticipation of additional operational burden on the 394 public resource of record (the database of network authority) is 395 however a noted concern. 397 3.3. Using 'reverse' DNS with NAPTR records 399 Another possibility for automating the location and verification of a 400 geolocation feed is to incorporate feed URIs into the DNS, 401 specifically the in-addr.arpa and ip6.arpa portions of the DNS 402 hierarchy. 
A suitably formatted query for a NAPTR ([RFC3403]) 403 record, or more specifically a U-NAPTR ([RFC4848]) record, could 404 yield a transformation to a geolocation feed URI. 406 For example, assuming a purely theoretical service name of 407 "x-geofeed", a 'reverse' DNS zone might contain a record of the form: 409 ;; order pref flags 410 IN NAPTR 200 10 "u" "x-geofeed" ( ; service 411 ; regexp 412 "!.*!https://example.com/ipgeo.csv!" 413 "" ; replacement 414 ) 416 Attempts to locate the geolocation feed for a given IP address would 417 begin by querying directly for a NAPTR record associated with the 418 address's PTR-style name. For example, 192.0.2.4 and 2001:db8::6 419 would cause a NAPTR record request to be issued for "4.2.0.192.in- 420 addr.arpa" and "6.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d 421 .0.1.0.0.2.ip6.arpa", respectively. 423 If no such record exists one further query for the fully qualified 424 domain name of the SOA record in the authority section of the 425 response to the previous query would be performed ("2.0.192.in- 426 addr.arpa" and "d.0.1.0.0.2.ip6.arpa" in the examples above). 427 Successfully located feed URIs could then be processed as outlined by 428 this document. 430 Verification of the contents of a feed would proceed in essentially 431 the same way. CIDR prefixes may be verified by constructing a query 432 for any single address (at random) within the prefix and proceeding 433 as above. While not strictly provably correct (in cases where a 434 publisher has delegated some portion of the advertised prefix but not 435 excluded it from its feed), it may nevertheless suffice for 436 operational purposes, especially if a low-impact on-going 437 verification of observed client IP addresses is implemented, to 438 (eventually) catch any oversights. 440 This mode is untested and may prove impractical. However, the 441 operational burden is more closely located with those wishing and 442 willing to bear it, i.e. 
the publishers who would likely handle 443 serving in-addr.arpa and ip6.arpa for the IP prefixes under their 444 authority. 446 4. Consuming self-published IP geolocation feeds 448 Consumers MAY treat published feed data as a hint only and MAY choose 449 to prefer other sources of geolocation information for any given IP 450 range. Regardless of a consumer's stance with respect to a given 451 published feed, there are some points of note for sensibly and 452 effectively consuming published feeds. 454 4.1. Feed integrity 456 The integrity of published information SHOULD be protected by 457 securing the means of publication, for example by using HTTP over TLS 458 [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving 459 geolocation feeds in a manner that guarantees integrity of the feed. 461 4.2. Verification of authority 463 Consumers of self-published IP geolocation feeds SHOULD perform some 464 form of verification that the publisher is in fact authoritative for 465 the addresses in the feed. The actual means of verification is 466 likely dependent upon the way in which the feed is discovered. Ad 467 hoc shared URIs, for example, will likely require an ad hoc 468 verification process. Future automated means of feed discovery 469 SHOULD have an accompanying automated means of verification. 471 A consumer MUST only trust geolocation information for IP addresses 472 or ranges for which the publisher has been verified as 473 administratively authoritative. All other geolocation feed entries 474 MUST be ignored and SHOULD be logged for further administrative 475 review. 477 4.3. Verification of accuracy 479 Errors and inaccuracies may occur at many levels, and publication and 480 consumption of geolocation data are no exceptions. To the extent 481 practical consumers SHOULD take steps to verify the accuracy of 482 published locality. 
Verification methodology, resolution of 483 discrepancies, and preference for alternative sources of data are 484 left to the discretion of the feed consumer. 486 Consumers SHOULD decide on discrepancy thresholds and SHOULD flag for 487 administrative review feed entries which exceed set thresholds. 489 4.4. Refreshing feed information 491 As a publisher can change geolocation data at any time and without 492 notification consumers SHOULD implement mechanisms to periodically 493 refresh local copies of feed data. In the absence of any other 494 refresh timing information it is recommended that consumers SHOULD 495 refresh feeds no less often than weekly. 497 For feeds available via HTTPS (or HTTP), the publisher MAY 498 communicate refresh timing information by means of the standard HTTP 499 expiration model (section 13.2 of [RFC2616]). Specifically, 500 publishers can include either an Expires header or a Cache-Control 501 header specifying the max-age. Where practical, consumers SHOULD 502 refresh feed information before the expiry time is reached. 504 5. Security Considerations 506 As there is no true security in the obscurity of the location of any 507 given IP address, self-publication of this data fundamentally opens 508 no new attack vectors. For publishers, self-published data merely 509 increases the ease with which such location data might be exploited. 511 For consumers, feed retrieval processes may receive input from 512 potentially hostile sources (e.g. in the event of hijacked traffic). 513 As such, proper input validation and defense measures MUST be taken. 515 Similarly, consumers who do not perform sufficient verification of 516 published data bear the same risks as from other forms of geolocation 517 configuration errors. 519 6. 
Privacy Considerations 521 Publishers of geolocation feeds are advised to have fully considered 522 any and all privacy implications of the disclosure of such 523 information for the users of the described networks prior to 524 publication. A thorough comprehension of the security considerations 525 of a chosen geolocation policy is highly recommended, including an 526 understanding of some of the limitations of information obscurity. 528 As noted in Section 2.1, each location field in an entry is optional, 529 in order to support expressing only the level of specificity which 530 the publisher has deemed acceptable. There is no requirement that 531 the level of specificity be consistent across all entries within a 532 feed. In particular, the Postal Code field (Section 2.1.1.5) can 533 provide very specific geolocation, sometimes within a building. Such 534 specific Postal Code values MUST NOT be published in geo feeds 535 without the consent of the parties being located. 537 7. Relation to other work 539 While not originally done in conjunction with the [GEOPRIV] working 540 group, Richard Barnes observed that this work is nevertheless 541 consistent with that which the group has defined, both for address 542 format and for privacy. The data elements in geolocation feeds are 543 equivalent to the following XML structure (vis. [RFC5139]): 545 546 country 547 region 548 city 549 postal_code 550 552 Providing geolocation information to this granularity is equivalent 553 to the following privacy policy (vis. the definition of the 554 'building' level of disclosure): 556 557 558 559 560 561 562 building 563 564 565 566 568 8. Acknowledgements 570 The authors would like to express their gratitude to reviewers and 571 early implementers, including but not limited to Mikael Abrahamsson, 572 John Bond, Alissa Cooper, Andras Erdei, Marco Hogewoning, Mike 573 Joseph, Warren Kumari, Menno Schepers, Justyna Sidorska, and Bjoern 574 A. Zeeb. Richard L. 
Barnes in particular contributed substantial
575     review, text, and advice.

577  9.  References

579  9.1.  Normative References

581     [ISO.3166.1alpha2]
582                International Organization for Standardization, "ISO
583                3166-1 decoding table", .

586     [ISO.3166.2]
587                International Organization for Standardization, "ISO 3166-
588                2:2007", .

591     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
592                Requirement Levels", BCP 14, RFC 2119, March 1997.

594     [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
595                Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
596                Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

598     [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
599                10646", STD 63, RFC 3629, November 2003.

601     [RFC4180]  Shafranovich, Y., "Common Format and MIME Type for Comma-
602                Separated Values (CSV) Files", RFC 4180, October 2005.

604     [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
605                Architecture", RFC 4291, February 2006.

607     [RFC4632]  Fuller, V. and T. Li, "Classless Inter-domain Routing
608                (CIDR): The Internet Address Assignment and Aggregation
609                Plan", BCP 122, RFC 4632, August 2006.

611  9.2.  Informative References

613     [GEOPRIV]  Internet Engineering Task Force, "IETF geopriv Working
614                Group", .

616     [GEO_ICANN]
617                Internet Corporation For Assigned Names and Numbers,
618                "ICANN Meeting Geolocation Data",
619                .

621     [GEO_RIPE_NCC]
622                Schepers, M., "RIPE NCC Meeting Geolocation Data",
623                .

625     [IPADDR_PY]
626                Shields, M. and P. Moody, "Python IP address manipulation
627                library", .

629     [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

631     [RFC3403]  Mealling, M., "Dynamic Delegation Discovery System (DDDS)
632                Part Three: The Domain Name System (DNS) Database",
633                RFC 3403, October 2002.

635     [RFC3912]  Daigle, L., "WHOIS Protocol Specification", RFC 3912,
636                September 2004.

638     [RFC4408]  Wong, M. and W. Schlitt, "Sender Policy Framework (SPF)
639                for Authorizing Use of Domains in E-Mail, Version 1",
640                RFC 4408, April 2006.

642     [RFC4627]  Crockford, D., "The application/json Media Type for
643                JavaScript Object Notation (JSON)", RFC 4627, July 2006.

645     [RFC4848]  Daigle, L., "Domain-Based Application Service Location
646                Using URIs and the Dynamic Delegation Discovery Service
647                (DDDS)", RFC 4848, April 2007.

649     [RFC5139]  Thomson, M. and J. Winterbottom, "Revised Civic Location
650                Format for Presence Information Data Format Location
651                Object (PIDF-LO)", RFC 5139, February 2008.

653     [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
654                Address Text Representation", RFC 5952, August 2010.

656     [RFC6772]  Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J.,
657                Morris, J., and M. Thomson, "Geolocation Policy: A
658                Document Format for Expressing Privacy Preferences for
659                Location Information", RFC 6772, January 2013.

661  Appendix A.  Sample Python validation code

663     Included here is a simple format validator in Python for self-
664     published ipgeo feeds. This tool reads CSV data in the self-
665     published ipgeo feed format from the standard input and performs
666     basic validation. It is intended for use by feed publishers before
667     launching a feed. Note that this validator does not verify the
668     uniqueness of every IP prefix entry within the feed as a whole, but
669     only verifies the syntax of each single line from within the feed. A
670     complete validator MUST also ensure IP prefix uniqueness.

672     The main source file "ipgeo_feed_validator.py" follows. It requires
673     use of the open source ipaddr Python library for IP address and CIDR
674     parsing and validation [IPADDR_PY].

676     #!/usr/bin/python
677     #
678     # Copyright (c) 2012 IETF Trust and the persons identified as authors of
679     # the code. All rights reserved. Redistribution and use in source and
680     # binary forms, with or without modification, is permitted pursuant to,
681     # and subject to the license terms contained in, the Simplified BSD
682     # License set forth in Section 4.c of the IETF Trust's Legal Provisions
683     # Relating to IETF Documents (http://trustee.ietf.org/license-info).

685     """Simple format validator for self-published ipgeo feeds.

687     This tool reads CSV data in the self-published ipgeo feed format from
688     the standard input and performs basic validation. It is intended for
689     use by feed publishers before launching a feed.
690     """

692     import csv
693     import ipaddr
694     import re
695     import sys

697     class IPGeoFeedValidator(object):
698       def __init__(self):
699         self.ranges = {}
700         self.line_number = 0
701         self.output_log = {}
702         self.SetOutputStream(sys.stderr)

704       def Validate(self, feed):
705         """Check validity of an IPGeo feed.

707         Args:
708           feed: iterable with feed lines
709         """

711         for line in feed:
712           self._ValidateLine(line)

714       def SetOutputStream(self, logfile):
715         """Controls where the output messages go to (STDERR by default).

717         Use None to disable logging.

719         Args:
720           logfile: a file object (e.g., sys.stdout or sys.stderr) or None.
721         """
722         self.output_stream = logfile

724       def CountErrors(self, severity):
725         """How many ERRORs or WARNINGs were generated."""
726         return len(self.output_log.get(severity, []))

728       ############################################################
729       def _ValidateLine(self, line):
730         line = line.rstrip('\r\n')
731         self.line_number += 1
732         self.line = line
733         self.is_correct_line = True

735         if self._ShouldIgnoreLine(line):
736           return

738         fields = [field for field in csv.reader([line])][0]

740         self._ValidateFields(fields)
741         self._FlushOutputStream()

743       def _ShouldIgnoreLine(self, line):
744         line = line.strip()
745         return len(line) == 0 or line.startswith('#')

747       ############################################################
748       def _ValidateFields(self, fields):
749         assert(len(fields) > 0)

751         is_correct = self._IsIPAddressOrRangeCorrect(fields[0])

753         if len(fields) > 1:
754           if not self._IsCountryCode2Correct(fields[1]):
755             is_correct = False

757         if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
758           is_correct = False

760         if len(fields) != 5:
761           self._ReportWarning('5 fields were expected (got %d).'
762                               % len(fields))

764       ############################################################
765       def _IsIPAddressOrRangeCorrect(self, field):
766         if '/' in field:
767           return self._IsCIDRCorrect(field)
768         return self._IsIPAddressCorrect(field)

770       def _IsCIDRCorrect(self, cidr):
771         try:
772           iprange = ipaddr.IPNetwork(cidr)
773           if iprange.network._ip != iprange._ip:
774             self._ReportError('Incorrect IP Network.')
775             return False
776           if iprange.is_private:
777             self._ReportError('IP Address must not be private.')
778             return False
779         except:
780           self._ReportError('Incorrect IP Network.')
781           return False
782         return True

784       def _IsIPAddressCorrect(self, ipaddress):
785         try:
786           ip = ipaddr.IPAddress(ipaddress)
787         except:
788           self._ReportError('Incorrect IP Address.')
789           return False
790         if ip.is_private:
791           self._ReportError('IP Address must not be private.')
792           return False
793         return True

795       ############################################################
796       def _IsCountryCode2Correct(self, country_code_2):
797         if len(country_code_2) == 0:
798           return True
799         if len(country_code_2) != 2 or not country_code_2.isalpha():
800           self._ReportError(
801               'Country code must be in the ISO 3166-1 alpha 2 format.')
802           return False
803         return True

805       def _IsRegionCodeCorrect(self, region_code):
806         if len(region_code) == 0:
807           return True
808         if '-' not in region_code:
809           self._ReportError('Region code must be in the ISO 3166-2 format.')
810           return False

812         parts = region_code.split('-')
813         if not self._IsCountryCode2Correct(parts[0]):
814           return False
815         return True

817       ############################################################
818       def _ReportError(self, message):
819         self._ReportWithSeverity('ERROR', message)

821       def _ReportWarning(self, message):
822         self._ReportWithSeverity('WARNING', message)

824       def _ReportWithSeverity(self, severity, message):
825         self.is_correct_line = False
826         output_line = '%s: %s\n' % (severity, message)

828         if severity not in self.output_log:
829           self.output_log[severity] = []
830         self.output_log[severity].append(output_line)

832         if self.output_stream is not None:
833           self.output_stream.write(output_line)

835       def _FlushOutputStream(self):
836         if self.is_correct_line: return
837         if self.output_stream is None: return

839         self.output_stream.write('line %d: %s\n\n'
840                                  % (self.line_number, self.line))

842     ############################################################
843     def main():
844       feed_validator = IPGeoFeedValidator()
845       feed_validator.Validate(sys.stdin)

847       if feed_validator.CountErrors('ERROR'):
848         sys.exit(1)

850     if __name__ == '__main__':
851       main()

853     A unit test file, "ipgeo_feed_validator_test.py", is provided as well.
854     It provides basic test coverage of the code above, though it does not
855     test correct handling of non-ASCII UTF-8 strings.

857     #!/usr/bin/python
858     #
859     # Copyright (c) 2012 IETF Trust and the persons identified as authors of
860     # the code. All rights reserved. Redistribution and use in source and
861     # binary forms, with or without modification, is permitted pursuant to,
862     # and subject to the license terms contained in, the Simplified BSD
863     # License set forth in Section 4.c of the IETF Trust's Legal Provisions
864     # Relating to IETF Documents (http://trustee.ietf.org/license-info).
866     import sys
867     from ipgeo_feed_validator import IPGeoFeedValidator

869     class IPGeoFeedValidatorTest(object):
870       def __init__(self):
871         self.validator = IPGeoFeedValidator()
872         self.validator.SetOutputStream(None)
873         self.successes = 0
874         self.failures = 0

876       def Run(self):
877         self.TestFeedLine('# asdf', 0, 0)
878         self.TestFeedLine(' ', 0, 0)
879         self.TestFeedLine('', 0, 0)

881         self.TestFeedLine('asdf', 1, 1)
882         self.TestFeedLine('asdf,US,,,', 1, 0)
883         self.TestFeedLine('aaaa::,US,,,', 0, 0)
884         self.TestFeedLine('zzzz::,US', 1, 1)
885         self.TestFeedLine(',US,,,', 1, 0)
886         self.TestFeedLine('55.66.77', 1, 1)
887         self.TestFeedLine('55.66.77.888', 1, 1)
888         self.TestFeedLine('55.66.77.asdf', 1, 1)

890         self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
891         self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

893         self.TestFeedLine('55.66.77.88,PL', 0, 1)
894         self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
895         self.TestFeedLine('55.66.77.88,,,,', 0, 0)
896         self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
897         self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
898         self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
899         self.TestFeedLine('55.66.77.88,99,,,', 1, 0)

901         self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
902         self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
903         self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

905         self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
906         self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', 0, 0)
907         self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
908                           '1600 Amphitheatre Parkway', 0, 1)

910         self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
911         self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
912         self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
913         self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
914         self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

916         self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
917         self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
918         self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
919         self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
920         self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
921         self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
922         self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
923         self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

925         print '%d tests passed, %d failed' % (self.successes, self.failures)

927       def IsOutputLogCorrectAtSeverity(self, severity, expected_msg_count):
928         msg_count = self.validator.CountErrors(severity)

930         if msg_count != expected_msg_count:

932           print 'TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' % (
933               self.validator.line, expected_msg_count, severity, msg_count,
934               str(self.validator.output_log[severity]))
935           return False
936         return True

938       def IsOutputLogCorrect(self, new_errors, new_warnings):
939         retval = True

941         if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
942           retval = False
943         if not self.IsOutputLogCorrectAtSeverity('WARNING', new_warnings):
944           retval = False

946         return retval

948       def TestFeedLine(self, line, error_count, warning_count):
949         self.validator.output_log['WARNING'] = []
950         self.validator.output_log['ERROR'] = []
951         self.validator._ValidateLine(line)

953         if not self.IsOutputLogCorrect(error_count, warning_count):
954           self.failures += 1
955           return False

957         self.successes += 1
958         return True

960     if __name__ == '__main__':
961       IPGeoFeedValidatorTest().Run()

963  Authors' Addresses

965     Erik Kline
966     Google Japan
967     Roppongi 6-10-1, 26th Floor
968     Minato, Tokyo 106-6126
969     Japan

971     Phone: +81 03 6384 9000
972     Email: ek@google.com

973     Krzysztof Duleba
974     Google Switzerland GmbH
975     Brandschenkestrasse 110
976     Zuerich 8002
977     Switzerland

979     Email: kduleba@google.com

981     Zoltan Szamonek
982     Google Switzerland GmbH
983     Brandschenkestrasse 110
984     Zuerich 8002
985     Switzerland

987     Email: zszami@google.com
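
Editorial note on Appendix A: the appendix states that a complete
validator MUST also ensure IP prefix uniqueness, which the sample
validator does not check. The following is a minimal sketch of such a
check, not part of the authors' code. It assumes Python 3 and the
standard-library "ipaddress" module rather than the ipaddr library
used above, and it assumes that "uniqueness" means no two entries
normalize to exactly the same prefix (overlapping but unequal
prefixes are not flagged). The function name find_duplicate_prefixes
is hypothetical.

```python
import csv
import ipaddress


def find_duplicate_prefixes(feed_lines):
    """Return (line_number, prefix) pairs for prefixes seen on an earlier line.

    Hypothetical helper for illustration; not part of the draft's validator.
    """
    seen = {}        # normalized network -> line number of first occurrence
    duplicates = []
    for line_number, line in enumerate(feed_lines, start=1):
        stripped = line.strip()
        # Skip blank lines and comments, as the sample validator does.
        if not stripped or stripped.startswith('#'):
            continue
        fields = next(csv.reader([stripped]))
        try:
            # A bare address is normalized to a /32 (IPv4) or /128 (IPv6).
            network = ipaddress.ip_network(fields[0].strip(), strict=True)
        except ValueError:
            # Syntax errors are left to the per-line validator.
            continue
        if network in seen:
            duplicates.append((line_number, str(network)))
        else:
            seen[network] = line_number
    return duplicates


sample = [
    '# comment',
    '2001:db8:cafe::/48,PL,PL-MZ,,',
    '192.0.2.0/24,US,US-CA,Mountain View,',
    '192.0.2.0/24,US,,,',
    '192.0.2.5,US,,,',
    '192.0.2.5/32,US,,,',
]
print(find_duplicate_prefixes(sample))
```

Because prefixes are normalized before comparison, a bare address and
its explicit /32 form are treated as the same entry. A publisher could
run a check like this alongside the per-line validator before
launching a feed.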