A Format for Self-published IP Geolocation Feeds

Google Japan
Roppongi 6-10-1, 26th Floor
Minato, Tokyo 106-6126
Japan
+81 03 6384 9000
ek@google.com

Google Switzerland GmbH
Brandschenkestrasse 110
8002 Zürich
Switzerland
kduleba@google.com

Google Switzerland GmbH
Brandschenkestrasse 110
8002 Zürich
Switzerland
zszami@google.com

This document records a format whereby a network operator can publish
a mapping of IP address prefixes to simplified geolocation information,
colloquially termed a geolocation "feed". Interested parties can poll
and parse these feeds to update or merge with other geolocation data
sources and procedures. This format intentionally only allows specifying
coarse level location.

Some technical organizations operating networks that move from one
conference location to the next have already experimentally published
small geolocation feeds. At least one consumer (Google) has incorporated
these ad hoc feeds into a geolocation data pipeline, and is using them to
allow ISPs to indicate where their prefixes are located.

[RFC Ed - Please remove before publication: The IETF Meeting network
currently publishes a feed in this format at:
https://noc.ietf.org/geo/google.csv -- this has significantly cut down
on the number of "Gah! Why does the network believe I'm in Montreal,
that was last meeting! How am I supposed to find a pub?!"
complaints. A number of other meeting networks, including RIPE and ICANN,
publish this information as well; see below.]

[Ed note: Text inside square brackets ([]) is additional background
information, answers to frequently asked questions, general musings,
etc. It will be removed before publication.]

[This document is being developed collaboratively on GitHub at:
https://github.com/google/self-published-geo . The most recent version
of the document, open issues, etc. should all be available there. The
authors (gratefully) accept pull requests.]

Providers of services over the Internet have grown to depend on
best-effort geolocation information to improve the user experience.
Locality information can aid in directing traffic to the nearest
serving location, inferring likely native language, and providing
additional context for services involving search queries.

When an ISP, for example, changes the location where an IP prefix
is deployed, services which make use of geolocation information may
begin to suffer degraded performance. This can lead to customer
complaints, possibly to the ISP directly. Dissemination of correct
geolocation data is complicated by the lack of any centralized means
to coordinate and communicate geolocation information to all
interested consumers of the data.

This document records a format whereby a network operator (an ISP,
an enterprise, or any organization which deems the geolocation of its
IP prefixes to be of concern) can publish a mapping of IP address
prefixes to simplified geolocation information, colloquially termed a
"geolocation feed". Interested parties can poll and parse these feeds
to update or merge with other geolocation data sources and
procedures.

Some technical organizations operating networks that move from one
conference location to the next have already experimentally published
small geolocation feeds. At least one consumer (Google) has
incorporated these ad hoc feeds into a geolocation data pipeline.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

This document describes both a format and a mechanism for
publishing data, with the implication that the owner of the data
wishes it to be public. Any privacy risk is bounded by the format, and
feed publishers MAY omit any location field to further protect privacy
(see the field descriptions below for details about exactly which fields may
be omitted). Feed publishers assume the responsibility of determining
which data should be made public.

This proposal does not incorporate a mechanism to communicate
acceptable use policies for self-published data. Publication itself is
inferred as a desire by the publisher for the data to be usefully
consumed, similar to the publication of information like host names,
cryptographic keys, and SPF records in the
DNS.

The format described here was developed to address the need of
network operators to rapidly and usefully share geolocation information
changes. Originally, there arose a specific case where regional
operators found it desirable to publish location changes rather than
wait for geolocation algorithms to "learn" about them. Later, technical
conferences which frequently use the same network prefixes advertised
from different conference locations experimented by publishing
geolocation feeds, updated in advance of network location changes, in
order to better serve conference attendees.

At its simplest, the mechanism consists of a network operator
publishing a file (the "geolocation feed"), which contains several text
entries, one per line. Each entry is keyed by a unique (within the feed)
IP prefix (or single IP address) followed by a sequence of network
locality attributes to be ascribed to the given prefix.

For operational simplicity, every feed should contain data about
all IP addresses the provider wants to publish. Alternatives, like
publishing only entries for IP addresses whose geolocation data has
changed or differs from currently observed geolocation behavior "at
large", are likely to be too operationally complex.

Feeds MUST use UTF-8 character encoding.
Text after a '#' character is treated as a comment only and ignored.
Blank lines are similarly ignored.

Feeds MUST be in comma separated values format as described in
. Each feed entry is a text line of the form:
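
As an illustration only, the following Python sketch splits one entry into named fields. It assumes that the fields appear in the order in which they are described in this document (IP prefix, country, region, city, postal code). The names FIELDS and parse_entry are hypothetical, and the standard ipaddress module stands in for whatever address library an implementation actually uses.

```python
import csv
import ipaddress

# Assumed field order, following the order of the field descriptions.
FIELDS = ("ip_prefix", "country", "region", "city", "postal_code")


def parse_entry(line):
    """Parse one feed line into a dict; return None for blank or comment lines."""
    # Text after '#' is a comment and is ignored.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    row = next(csv.reader([line]))
    # Ignore fields beyond those expected, and pad omitted optional fields.
    row = [field.strip() for field in row[:len(FIELDS)]]
    row += [""] * (len(FIELDS) - len(row))
    entry = dict(zip(FIELDS, row))
    # Country and region codes are compared case-insensitively.
    entry["country"] = entry["country"].upper()
    entry["region"] = entry["region"].upper()
    # The prefix is required. ip_network raises ValueError on bad input,
    # including (by default) prefixes with host bits set.
    entry["ip_prefix"] = ipaddress.ip_network(entry["ip_prefix"])
    return entry
```

A caller would typically wrap parse_entry in a try/except for ValueError, discarding and logging any entry whose prefix does not parse.
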
The IP prefix field is REQUIRED, all others are OPTIONAL (can be
empty), though the requisite minimum number of commas SHOULD be
present.

REQUIRED. Each IP prefix field MUST be either a single IP
address or an IP prefix in CIDR notation in conformance with section
3.1 of for IPv4 or section
2.3 of for IPv6.

Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
"2001:db8::1" and "2001:db8::/32" for IPv6.

OPTIONAL. The country field, if non-empty, MUST be a 2 letter
ISO country code conforming to ISO 3166-1 alpha 2. Parsers SHOULD treat this field
case-insensitively.

Examples include "US" for the United States, "JP" for Japan,
and "PL" for Poland.

OPTIONAL. The region field, if non-empty, MUST be an ISO region
code conforming to ISO 3166-2. Parsers
SHOULD treat this field case-insensitively.

Examples include "ID-RI" for the Riau province of Indonesia and
"NG-RI" for Rivers State in Nigeria.

OPTIONAL. The city field, if non-empty, SHOULD be free UTF-8
text, excluding the comma (',') character.

Examples include "Dublin", "New York", and "São Paulo"
(specifically "S" followed by 0xc3, 0xa3, and "o Paulo").

OPTIONAL. The postal code field, if non-empty, SHOULD be free
UTF-8 text, excluding the comma (',') character. See the privacy discussion below for when this field must not
be populated.

Examples include "106-6126" (in Minato ward, Tokyo, Japan).

Feed publishers may indicate that some IP prefixes should not
have any associated geolocation information. It may be that some
prefixes under their administrative control are reserved, not yet
allocated or deployed, or are in the process of being redeployed
elsewhere and existing geolocation information can, from the
perspective of the publisher, safely be discarded.

This special case can be indicated by explicitly leaving blank
all fields which specify any degree of geolocation information. For
example, an entry such as "192.0.2.0/24,,,," gives the prefix and leaves every location field empty.

Historically, the user-assigned country identifier of "ZZ" had been
used for this same purpose. This is not necessarily preferred, and
no specific interpretation of any of the other user-assigned country
codes is currently defined.

Feed entries missing required fields, or having a required field
which fails to parse correctly MUST be discarded. It is RECOMMENDED
that such entries also be logged for further administrative
review.

While publishers SHOULD follow style for
IPv6 prefix fields, consumers MUST nevertheless accept all valid
string representations.

Duplicate IP address or prefix entries MUST be considered an
error, and consumer implementations SHOULD log the repeated entries
for further administrative review. Publishers SHOULD take measures
to ensure there is one and only one entry per IP address and
prefix.

Feed entries with non-empty optional fields which fail to parse,
either in part or in full, SHOULD be discarded. It is RECOMMENDED
that they also be logged for further administrative review.

For compatibility with future additional fields, a parser MUST
ignore any fields beyond those it expects. The data from fields
which are expected and which parse successfully MUST still be
considered valid.

Multiple entries which constitute nested prefixes are permitted.
Consumers SHOULD consider the entry with the longest matching prefix
(i.e. the "most specific") to be the best matching entry for a given
IP address.

Example entries using different IP address formats and describing
locations at country, region, city and postal code granularity level,
respectively:

The IETF network publishes geolocation information for the meeting
prefixes, and generally just comments out the last meeting's information
and appends the new meeting's information. The IETF feed
at the time of this writing contains:

Experimentally, RIPE has published geolocation information for
their conference network prefixes, which change location in accordance
with each new event. The RIPE feed at the time of
writing contains:

Similarly, ICANN has published geolocation information for their
portable conference network prefixes. The ICANN feed at
the time of writing contains:

A longer example is the Google Corp
Geofeed, which lists the geolocation information for Google corporate
offices.

Furthermore, it is worth noting that the geolocation data of SixXS
users, already available at whois.sixxs.net, is now also accessible in
the format described here (see the SixXS Geolocation Data reference). This can
be particularly useful where tunnel broker networks are concerned, as
(1) the geolocation attributes of users with neighboring prefixes
can be quite different and therefore not easily aggregated, and
(2) attempting to learn this data by statistical analysis can be
complicated by the likely low number of samples for any given
user, making satisfactory statistical confidence difficult to
achieve.

Already some discussions have resulted in proposed extensions.
While the purpose of this document is principally to record existing
implementation details, it may be that there is a larger desire to
publish other "network attributes" in a similar manner. One such
network attribute, "delegation size", is not currently implemented but
the state of the proposed extension is recorded here to demonstrate
the flexibility required of parser implementations.

The following have been only informally discussed and are not in
use at the time of writing.

OPTIONAL. A publisher may optionally communicate the average
delegated prefix size for subnetworks within the IP prefix of this
entry. For a network operator this can be used to help consumers
distinguish IP prefixes among various use types such as residential
prefixes, allocations to businesses, or data center customer
allocations.

Non-empty strings MUST be of the form required for CIDR notation
suffixes, i.e. "/" followed by the integer prefix length of the
expected allocation to the subnetworks from within the entry's
prefix. In the absence of data to the contrary, it is common to
assume that leaf networks may be delegated a prefix ranging from /24
to /32 in IPv4 and /48 to /64 in IPv6. Default assumptions about
delegation size are left to the consumer's implementation.

Examples for IPv6 include "/48", "/56", "/60", and "/64".

In order to more flexibly support future extensions, use of a
more expressive feed format has been suggested. Use of JavaScript
Object Notation (JSON), specifically, has
been discussed. However, at the time of writing neither a
specification nor an implementation exists.

Consumers MAY treat published feed data as a hint only and MAY choose
to prefer other sources of geolocation information for any given IP
prefix. Regardless of a consumer's stance with respect to a given
published feed, there are some points of note for sensibly and
effectively consuming published feeds.

The integrity of published information SHOULD be protected by
securing the means of publication, for example by using HTTP over TLS.
Whenever possible, consumers SHOULD prefer
retrieving geolocation feeds in a manner that guarantees integrity of
the feed.

Consumers of self-published IP geolocation feeds SHOULD perform
some form of verification that the publisher is in fact authoritative
for the addresses in the feed. The actual means of verification is
likely dependent upon the way in which the feed is discovered. Ad hoc
shared URIs, for example, will likely require an ad hoc verification
process. Future automated means of feed discovery SHOULD have an
accompanying automated means of verification.

A consumer MUST only trust geolocation information for IP addresses
or prefixes for which the publisher has been verified as
administratively authoritative. All other geolocation feed entries
MUST be ignored and SHOULD be logged for further administrative
review.

Errors and inaccuracies may occur at many levels, and publication
and consumption of geolocation data are no exceptions. To the extent
practical, consumers SHOULD take steps to verify the accuracy of
published locality. Verification methodology, resolution of
discrepancies, and preference for alternative sources of data are left
to the discretion of the feed consumer.

Consumers SHOULD decide on discrepancy thresholds and SHOULD flag
for administrative review feed entries which exceed set
thresholds.

As a publisher can change geolocation data at any time and without
notification, consumers SHOULD implement mechanisms to periodically
refresh local copies of feed data. In the absence of any other refresh
timing information, consumers SHOULD refresh
feeds at least weekly.

For feeds available via HTTPS (or HTTP), the publisher MAY
communicate refresh timing information by means of the standard HTTP
expiration model (section 13.2 of ). Specifically, publishers can
include either an Expires header or a Cache-Control header specifying
the max-age. Where practical, consumers
SHOULD refresh feed information before the expiry time is reached.

Publishers of geolocation feeds are advised to have fully considered
any and all privacy implications of the disclosure of such information
for the users of the described networks prior to publication. A thorough
comprehension of the security
considerations of a chosen geolocation policy is highly
recommended, including an understanding of some of the limitations of
information obscurity.

As noted above, each location field in an entry is
optional, in order to support expressing only the level of specificity
which the publisher has deemed acceptable. There is no requirement that
the level of specificity be consistent across all entries within a feed.
In particular, the Postal Code field can
provide very specific geolocation, sometimes within a building. Such
specific Postal Code values MUST NOT be published in geo feeds without
the consent of the parties being located.

While not originally done in conjunction with the IETF geopriv working
group, Richard Barnes observed that this work
is nevertheless consistent with that which the group has defined, both
for address format and for privacy. The data elements in geolocation
feeds are equivalent to the following XML structure (vis. ):

Providing geolocation information to this granularity is equivalent
to the following privacy policy (vis. the definition of the
'building' level of disclosure):

As there is no true security in the obscurity of the location of any
given IP address, self-publication of this data fundamentally opens no
new attack vectors. For publishers, self-published data merely increases
the ease with which such location data might be exploited.

For consumers, feed retrieval processes may receive input from
potentially hostile sources (e.g. in the event of hijacked traffic). As
such, proper input validation and defense measures MUST be taken.

Similarly, consumers who do not perform sufficient verification of
published data bear the same risks as from other forms of geolocation
configuration errors.

The issue of finding, and later verifying, geolocation feeds is not
formally specified in this document. At this time, only ad hoc feed
discovery and verification have a modicum of established practice (see
below). Regardless, both the ad hoc mechanics and a few proposed but not
yet implemented alternatives are discussed.

To date, geolocation feeds have been shared informally in the form
of HTTPS URIs exchanged in email threads. The two example URIs
documented above describe networks that change locations periodically,
the operators and operational practices of which are well known within
their respective technical communities.

The contents of the feeds are verified by a similarly ad hoc
process including (1) personal knowledge of the parties involved in the
exchange, and (2) comparison of feed-advertised prefixes with the BGP-advertised
prefixes of Autonomous System Numbers known to be operated by the
publishers.

Ad hoc mechanisms, while useful for early experimentation by
producers and consumers, are unlikely to be adequate for long-term,
widespread use by multiple parties. Future versions of any such
self-published geolocation feed mechanism SHOULD address scalability
concerns by defining a means for automated discovery and verification
of operational authority of advertised prefixes.

One possibility for enabling automation would be publication of
feed URIs as a well-known attribute in public databases of network
authority, e.g. the WHOIS service operated
by RIRs. Verification may be performed if the same or similarly
authoritative service provides the identical feed URI for queries for
each CIDR prefix in the geolocation feed.

The burden of serving this data to all interested consumers,
especially the load imposed by any verification process, is not yet
known. The anticipation of additional operational burden on the public
resource of record (the database of network authority) is however a
noted concern.

Another possibility for automating the location and verification of
a geolocation feed is to incorporate feed URIs into the DNS,
specifically the in-addr.arpa and ip6.arpa portions of the DNS
hierarchy. A suitably formatted query for a NAPTR record, or more
specifically a U-NAPTR record, could yield a transformation to a
geolocation feed URI.

For example, assuming a purely theoretical service name of
"x-geofeed", a 'reverse' DNS zone might contain a record of the form:
Attempts to locate the geolocation feed for a given IP address
would begin by querying directly for a NAPTR record associated with
the address's PTR-style name. For example, 192.0.2.4 and 2001:db8::6
would cause a NAPTR record request to be issued for
"4.2.0.192.in-addr.arpa" and
"6.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa",
respectively.

If no such record exists, one further NAPTR query for the fully
qualified domain name of the SOA record in the authority section of
the response to the previous query would be performed
("2.0.192.in-addr.arpa" and "d.0.1.0.0.2.ip6.arpa" in the examples
above).

If one or more NAPTR records exist for the full PTR-style name but
none of them are for the required service name (e.g. "x-geofeed"),
then likely no SOA will be returned as a hint for subsequent queries.
In this case implementations would need to first explicitly query for
an SOA record for the full PTR-style name, and then query for a NAPTR
record of the SOA in the response (assuming it differs from the
previously queried name).

Any successfully located feed URIs could then be processed as
outlined by this document.

Verification of the contents of a feed would proceed in essentially
the same way. CIDR prefixes may be verified by constructing a query
for any single address (at random) within the prefix and proceeding as
above. While not strictly provably correct (in cases where a publisher
has delegated some portion of the advertised prefix but not excluded
it from its feed), it may nevertheless suffice for operational
purposes, especially if a low-impact on-going verification of observed
client IP addresses is implemented, to (eventually) catch any
oversights.

This mode is untested and may prove impractical. However, the
operational burden is more closely located with those wishing and
willing to bear it, i.e. the publishers who would likely handle
serving in-addr.arpa and ip6.arpa for the IP prefixes under their
authority.

The authors would like to express their gratitude to reviewers and
early implementers, including but not limited to Mikael Abrahamsson, Ray
Bellis, John Bond, Alissa Cooper, Andras Erdei, Marco Hogewoning, Mike
Joseph, Warren Kumari, Menno Schepers, Justyna Sidorska, Pim van Pelt,
and Bjoern A. Zeeb. Richard L. Barnes in particular contributed
substantial review, text, and advice.

ISO 3166-1 decoding table. International Organization for Standardization.
ISO 3166-2:2007. International Organization for Standardization.
IETF Meeting Network Geolocation Data. Internet Engineering Task Force (IETF) NOC.
RIPE NCC Meeting Geolocation Data. Réseaux IP Européens Network Coordination Centre.
ICANN Meeting Geolocation Data. Internet Corporation For Assigned Names and Numbers.
Google Corp Geofeed. Google, LLC.
SixXS Geolocation Data. SixXS IPv6 Deployment and Tunnel Broker.
IETF geopriv Working Group. Internet Engineering Task Force.
Python IP address manipulation library. Google Inc.

Included here is a simple format validator in Python for
self-published ipgeo feeds. This tool reads CSV data in the
self-published ipgeo feed format from the standard input and performs
basic validation. It is intended for use by feed publishers before
launching a feed. Note that this validator does not verify the
uniqueness of every IP prefix entry within the feed as a whole, but only
verifies the syntax of each single line from within the feed. A complete
validator MUST also ensure IP prefix uniqueness.

The main source file "ipgeo_feed_validator.py" follows. It requires
use of the open source ipaddr Python library for IP address and CIDR
parsing and validation.

A unit test file, "ipgeo_feed_validator_test.py", is provided as well.
It provides basic test coverage of the code above, though does not test
correct handling of non-ASCII UTF-8 strings.
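
The validator described above checks one line at a time. As a rough sketch of the feed-wide uniqueness check that a complete validator would also need, the following uses Python's standard ipaddress module rather than the ipaddr library named above. The function name find_duplicate_prefixes is hypothetical, and treating "192.0.2.1" and "192.0.2.1/32" as the same entry is an assumption of this sketch, not a rule stated in this document.

```python
import ipaddress
from collections import Counter


def find_duplicate_prefixes(lines):
    """Return the set of prefixes that appear more than once in a feed."""
    counts = Counter()
    for line in lines:
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        prefix_text = line.split(",", 1)[0].strip()
        try:
            # Normalising through ip_network makes a bare address and
            # its /32 (or /128) form compare equal.
            prefix = ipaddress.ip_network(prefix_text)
        except ValueError:
            # Syntax errors are left to the per-line validator.
            continue
        counts[prefix] += 1
    return {prefix for prefix, n in counts.items() if n > 1}
```

A publisher could run such a check over the whole feed before launch and refuse to publish if the returned set is non-empty.
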