LMAP                                                          J. Seedorf
Internet-Draft                                                       NEC
Intended status: Informational                                D. Goergen
Expires: April 24, 2014                                         R. State
                                                University of Luxembourg
                                                              V. Gurbani
                                               Bell Labs, Alcatel-Lucent
                                                              E. Marocco
                                                          Telecom Italia
                                                        October 21, 2013


                     ALTO for Querying LMAP Results
                       draft-seedorf-lmap-alto-02

Abstract

   In the context of Large-Scale Measurement of Broadband Performance
   (LMAP), measurement results are currently made available to the
   public either at the finest granularity level (e.g. as a list of
   results of all individual tests), or in a very high-level
   human-readable format (e.g. as PDF reports). This document argues
   that there is a need for an intermediate way to provide access to
   large-scale network measurement results, flexible enough to enable
   querying of specific and possibly aggregated data. The
   Application-Layer Traffic Optimization (ALTO) Protocol, designed to
   provide applications with network information, seems a good
   candidate to fulfill such a role. Finally, we describe our
   methodology for analyzing the United States Federal Communications
   Commission's (FCC) Measuring Broadband America (MBA) dataset to
   derive the topology and cost maps required for consumption by an
   ALTO server.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 24, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Example Use Cases
   3.  Advantages of using ALTO
   4.  Examples
     4.1.  Download speeds
       4.1.1.  Network map
       4.1.2.  Cost map
   5.  Discussion of Useful ALTO Extensions
   6.  Case study: Analyzing a large-scale dataset
     6.1.  Challenges in data analysis
     6.2.  Geo-locating the units
   7.  Security considerations
   8.  IANA considerations
   9.  Conclusion
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Acknowledgment
   Authors' Addresses

1.  Introduction

   Recently, there has been discussion on standardizing protocols that
   would allow measurements of broadband performance on a large scale
   (LMAP [I-D.schulzrinne-lmap-requirements]). In principle, the vision
   is that "user networks gather data, either on their own initiative or
   instructed by a measurement controller, and then upload the
   measurement results to a designated measurement server."

   Apart from protocols that can be used to gather measurement data and
   to upload such data to dedicated servers, there is also a need for
   protocols to retrieve (potentially aggregated) measurement results
   for a certain network (or part of a network), possibly in an
   automated way. Currently, two extremes are used to provide access to
   large-scale measurement results. On the one hand, highly aggregated
   results for certain networks may be made available in the form of
   PDFs of figures. Such presentations may be suitable for certain use
   cases, but they do not allow a user (or an entity such as a service
   provider) to select specific criteria and then create corresponding
   results. On the other hand, complete and detailed results may be
   made available in the form of comma-separated values (CSV) files.
   Such data sets typically include the complete results measured at a
   very fine-grained level and usually imply large file sizes. They
   are very useful, e.g. for the scientific community, because they
   make it possible to run complex data analytics algorithms or queries
   over the results.

   Considering the two extremes discussed above, this document argues
   that there is a need for an intermediate way to provide access to
   large-scale network measurement results: it must be possible to query
   for specific, possibly aggregated, results in a flexible way.
   Otherwise, entities interested in measurement results either cannot
   select what kind of result aggregation they desire, or must always
   fetch large amounts of detailed results and process these huge
   datasets themselves. The need for a flexible mechanism to query for
   dedicated, partial results becomes evident when considering use cases
   where a service provider or a process wants to use certain
   measurement results in an automated fashion. For instance, consider
   a video streaming service provider that wants to know, for a given
   end-user request, the average download speed offered by the end
   user's access provider in the end user's region (e.g. to
   optimize/parametrize its HTTP adaptive streaming service). Or
   consider a website that is interested in retrieving average
   connectivity speeds for users depending on access provider, region,
   or type of contract (e.g. to be able to adapt web content on a
   per-request basis according to such statistics).

   This document argues that use cases such as those described above
   may enhance the value of measurements of broadband performance on a
   large scale (LMAP), provided that it is possible to query for
   selected results in an automated fashion. Therefore, in order to
   facilitate such use cases, a protocol is needed that makes it
   possible to query LMAP measurement results while specifying
   parameters that narrow down the particular data (i.e. measurement
   results) the issuer of the query is interested in. This document
   argues that ALTO [RFC5693] [I-D.ietf-alto-protocol] could be a
   suitable candidate for such a flexible LMAP result query protocol.

2.  Example Use Cases

   To motivate the usefulness of ALTO for querying LMAP results,
   consider some key use cases:

   o  Video Streaming Service Provider: For HTTP adaptive streaming, it
      may be very useful to be able to query for average measurement
      values regarding a particular end user's access network provider.
      For instance, consider a video streaming service provider that
      queries LMAP measurement results to retrieve, for a given end-user
      request, the average download speed offered by the end user's
      access provider in the end user's region. Such data could help
      the service provider to optimize/parametrize its HTTP adaptive
      streaming service.

   o  Website Front End Optimization: A website might be interested in
      statistics about average connectivity types or download speeds for
      a given end-user request in order to dynamically adapt
      HTML/CSS/JavaScript content depending on such information
      (sometimes referred to as "Front End Optimization"). For
      instance, image compression may or may not be employed depending
      on the average connectivity type/speed of a user in a given region
      or with a given access network provider.

   o  Display estimation of service quality or total download time to
      users: A web service could use statistics about average download
      speeds for a given ISP and/or region to estimate
      Quality-of-Service for provided services (e.g. to indicate to the
      user what Quality-of-Experience to expect when clicking on a given
      link) or to estimate (and display to the user) the total download
      time for given content.

   o  Troubleshooting: In general, any service on the Internet may be
      interested in LMAP data for troubleshooting. In case a service
      does not work as expected (e.g. low throughput, high packet loss,
      ...), it may be of value for the service provider to retrieve
      (fairly) recent measurement data regarding the host that is
      requesting the service.

   o  TBD: add more use cases

3.  Advantages of using ALTO

   The ALTO protocol [I-D.ietf-alto-protocol] specifies a very
   lightweight JSON-based encoding for network information and can play
   an important role in querying the measurement results for the use
   cases described in Section 2.

   ALTO is built on two abstractions that are useful here. The first is
   the abstraction of the physical network topology into an aggregated,
   logical topology. In this abstract topological view, referred to
   as a "network map", individual hosts are aggregated into a
   well-defined network location identifier called a PID. Hosts can be
   aggregated into a PID according to identifying characteristics such
   as geographical location, serving ISP, network mask, nominal access
   speed, or any mix of them. The "network map" abstraction is
   essential for exporting network information in a scalable and
   privacy-preserving way.

   The second abstraction that is useful for LMAP is the notion of a
   "cost map". Each PID identified in the network map can, in a sense,
   become a vertex in a cost map, and each edge joining adjacent
   vertices can have an associated cost. The cost can be defined by the
   measurement server and can indicate routing hops, the financial cost
   of sending data over the link, the available bandwidth on the link
   (with congested links showing increasingly smaller values), or a
   user-defined cost attribute that allows arbitrary reasoning.

   The ALTO protocol defines several basic services based on these
   abstractions, and additional ones can easily be defined as
   extensions.

   There are other advantages to using ALTO as well. The protocol is
   defined as a set of REST APIs on top of HTTP. The data carried by
   the protocol is encoded as JSON.
   Queries can be performed by clients locally after downloading the
   entire topological and cost maps, or clients can send filtered
   requests to the ALTO server such that the ALTO server performs the
   required computation and returns the results. The protocol supports
   a set of atomic constraints related to equality that can be used to
   filter results and obtain only the subset of interest to the query.

   Additionally, protocol extensions that could also be useful for the
   LMAP usage scenario (e.g. extensions for incremental updates, for
   asynchronous change notifications, and for encoding of multiple costs
   within the same cost map) have been proposed and are currently being
   discussed in the ALTO WG.

4.  Examples

   [NOTE: syntax most certainly wrong!]

4.1.  Download speeds

   This section shows, as an example, how average download speeds
   measured in a given time interval can be reported. The aggregation
   approach in this case is based on ISP and geographical location. Two
   types of data are reported in this example:

   o  data collected from measurements against specific endpoints (e.g.
      active measurements);

   o  data collected from all measurements (e.g. passive measurements).

4.1.1.  Network map

     {
       "meta" : {},
       "data" : {
         "map-vtag" : "1266506139",
         "map" : {
           "ISP1-GEO1" : {
             "ipv4" : [ "10.1.0.0/16", "172.20.0.0/16" ]
           },
           "ISP2-GEO1" : {
             "ipv4" : [ "10.2.0.0/17" ]
           },
           "ISP3-GEO1" : {
             "ipv4" : [ "10.3.0.0/16" ]
           },
           "ISP2-GEO2" : {
             "ipv4" : [ "10.2.128.0/17" ]
           },
           "ISP4-GEO2" : {
             "ipv4" : [ "10.4.0.0/16" ]
           },

           .
           .
           .

           "MSMNT-CL1" : {
             "ipv4" : [ "192.168.0.0/30" ]
           },
           "TOTAL" : {
             "ipv4" : [ "0.0.0.0/0" ]
           }
         }
       }
     }

4.1.2.  Cost map

     {
       "meta" : {},
       "data" : {
         "cost-mode" : "numerical",
         "cost-type" : "avg-dl-speed",
         "map-vtag" : "1266506139",
         "time-interval" : "2629740",
         "map" : {
           "ISP1-GEO1": { "MSMNT-CL1" : 13.2,
                          "TOTAL"     : 10.2 },
           "ISP2-GEO1": { "MSMNT-CL1" : 11.4,
                          "TOTAL"     : 12.3 },
           "ISP3-GEO1": { "MSMNT-CL1" : 13.2,
                          "TOTAL"     : 10.2 },
           .
           .
           .

         }
       }
     }

5.  Discussion of Useful ALTO Extensions

   The base ALTO Protocol as specified in [I-D.ietf-alto-protocol] can
   in principle be used to enable a more flexible way to provide access
   to large-scale network measurement results, as discussed in the
   previous sections of this document. However, certain extensions to
   the base ALTO Protocol that have recently been proposed in the ALTO
   WG would better support the use cases discussed in Section 2:

   o  Server-initiated Notifications: In [I-D.marocco-alto-ws], it has
      been proposed to enhance the ALTO protocol such that servers can
      notify clients about newly available ALTO maps. In the context of
      this document, this extension would allow applications to be
      notified when certain new LMAP measurements are available, such as
      new measurement results on average download speeds. These new
      results could then be downloaded and used immediately by
      applications.

   o  Incremental Updates: In [I-D.schwan-alto-incr-updates], it has
      been proposed to enhance the ALTO protocol with incremental
      updates, such that clients can retrieve partial updates for ALTO
      maps instead of always downloading a full ALTO map (even when only
      a small fraction of the ALTO map has changed compared to a
      previous version). When ALTO is used for querying LMAP results,
      the corresponding ALTO maps may potentially be quite large (e.g.
      when a web service queries for particular, detailed results
      regarding a whole ISP).
      In this case, incremental ALTO updates would be a very useful
      mechanism for applications to retrieve updates of ALTO maps, as
      less data would need to be transmitted.

6.  Case study: Analyzing a large-scale dataset

   Measuring broadband performance is increasingly important as
   communications continue to move towards the Internet. Internet
   service providers (ISPs), national agencies, and other entities
   gather broadband data and may provide some, or all, of the dataset to
   the public for analysis. As we argue above, two extremes are
   prevalent for presenting large-scale data. One is in the form of
   charts, figures, or summarized reports amenable to easy and quick
   consumption. The other extreme is releasing raw data in the
   form of large files containing tables formatted as values separated
   by a delimiter. While the former is indispensable for acquiring a
   summary view of the dataset, it does not suffice for additional
   analysis beyond what is presented. Conversely, the problem with the
   latter option (raw files) is that the unsuspecting user perusing them
   is lost in the deluge of data.

   We argue that the ALTO protocol [I-D.ietf-alto-protocol] may offer a
   reasonable middle ground between the two extremes. A necessary
   prerequisite for using ALTO is abstracting the network information
   into a form that is suitable for consumption by the protocol. The
   implication of using ALTO is that data from any large-scale
   measurement effort must first be distilled into two maps: a topology
   map and a cost map. Further analysis and ad-hoc queries can
   subsequently be performed on the normalized dataset.

   In the United States, the Federal Communications Commission (FCC) has
   embarked on a nationwide performance study of residential wireline
   broadband service [fcc].
   Our aim is to use the raw datasets from this study for analysis and
   to create a topology map and a cost map from them. ALTO queries
   aimed at these maps will enable users and interested parties to
   fulfill the use cases listed in Section 2.

6.1.  Challenges in data analysis

   The FCC Measuring Broadband America (MBA) study consisted of 7,782
   volunteers spread across the United States with adequate geographic
   diversity. Volunteers opted in to the study; however, each
   volunteer remained anonymous. An opaque integral number (unit_id)
   represented a subscriber in the raw dataset. This unit_id remains
   constant for the duration of the study and uniquely identifies a
   volunteer subscriber, even if the subscriber switches ISPs. The
   methodology used is described in more detail in [fcc].

   The dataset consisted of 12 tables, each table corresponding to the
   data drawn from a certain performance test. For the analysis we
   present in this document, we focus on the "curr_dns" table, which
   contains the time taken for the ISP's recursive DNS resolver to
   return a DNS A RR for a popular website domain name. This test was
   run approximately every hour in a 24-hour period and produced about
   75-78 million records per month. This resulted in a typical file
   size in the range of 6-7 GBytes per month. We note that the
   "curr_dns" table is one of the smaller tables in the dataset.

   The first challenge, therefore, was to secure computing resources
   commensurate with a dataset consisting of millions of records spread
   across gigabyte-sized files. To analyze this volume of data we used
   a canonical Map-Reduce computational paradigm on a Hadoop cluster
   (more details on the methodology are outlined in Section 6.2).

   A second, more pressing challenge was to identify the geographic
   location of the unit_ids generating the data.
   In order to derive a topological map and impose costs on the links,
   it is important to know the physical locations of the unit_ids that
   contributed the measurements. However, in the MBA dataset, the
   population is anonymized and the individual subscriber reporting the
   measurement data is simply referred to by an opaque integral number.
   Therefore, an important task was to use the information in the
   public tables to reveal a coarse location of the subscriber.

   We outline the methodology we used to do so in the next section. We
   stress that this methodology does not identify the specific location
   of a subscriber, who still remains anonymous. Instead, it simply
   locates the subscriber in a larger metropolitan region. This level
   of granularity suffices for our work.

6.2.  Geo-locating the units

   To geo-locate the units, we simply note that broadband subscriber
   devices are likely to be configured using DHCP by their ISP. Besides
   assigning an IP address to the subscriber device, DHCP also provides
   the DNS name servers that the subscriber device uses for DNS
   queries. In most installations, these DNS name servers are located
   in close physical proximity to the subscriber device. The FCC
   technical appendix states that the DNS resolution tests were
   targeted directly at the ISP's recursive resolvers, to avoid the
   effects of caching and of users configuring their devices to bypass
   the ISP's DNS resolvers. Therefore, a reasonable approximation of a
   subscriber's geo-location could be the geographic location of the
   DNS name server serving the subscriber. We use this heuristic to
   geo-locate a subscriber.

   Thus our first, very simple filter consisted of obtaining a
   mapping from a unit_id (representing a subscriber) to one or more DNS
   name servers that the unit_id is sending DNS requests to.
   It turned out that while this was a necessary condition for
   advancing, it was not a sufficient one. The raw data needed to be
   further processed to reduce inconsistencies and remove outliers. A
   number of interesting artifacts were uncovered during further
   processing of the data. These artifacts informed the selection of
   the unit_ids for further analysis.

   The artifacts are documented below.

   o  A handful of unit_ids were geo-located in areas outside the
      contiguous United States, such as Ukraine, Poland, or the United
      Kingdom. We theorize that the subscribers corresponding to these
      unit_ids had simply configured their devices to use alternate DNS
      servers, probably located outside the United States. We removed
      these records before conducting our analysis.

   o  We also observed a reasonable number of non-ISP DNS resolvers,
      especially Google's 8.8.8.8 and 8.8.4.4 and OpenDNS's
      208.67.222.222 and 208.67.220.220. These four public DNS servers
      are geo-located in California. We removed these records to ensure
      that the specific location that these resolvers represented was
      not oversampled.

   o  We noticed that a large number of unit_ids were being geo-located
      in Potwin, Kansas. Intrigued as to why there appeared to be a
      large population of Internet users located in a small rural
      community in Kansas, we investigated further. It appears that
      Potwin, Kansas is close to the geographical center of the United
      States and that a number of ISPs have chosen to establish data
      centers in or around the Potwin area. These ISPs generally locate
      their primary or secondary DNS name servers in Potwin-area data
      centers, which would account for the popularity of Potwin as an
      Internet destination.
      We continue to investigate how to minimize the impact of such
      natural aggregation points, which, if not accounted for, will skew
      our results in an unwarranted direction.

   o  We observed some unit_ids changing ISPs during the observation
      period. This is a normal occurrence, and to the extent that the
      unit_id is geo-located in the same geographical area after the
      change in ISP, we do not exclude such unit_ids from further
      analysis.

   Subsequent filters extracted the stable unit_ids from our dataset.
   In order to determine which unit_ids are stable, i.e., remain
   constant with respect to their geographic location over the
   observation period from January to December 2012, we extracted for
   each unit_id the IP address of each DNS name server it consulted. We
   obtained this by applying the Map-Reduce paradigm to the DNS
   dataset: for each unit_id we extracted the DNS servers it queried,
   yielding the set of distinct DNS servers accessed by that unit_id.
   This was repeated for each month of the observation period. The
   resulting sets were cleaned of private IP addresses and the other
   artifacts discussed above. The cleaned set consisted of about 8000
   distinct unit_ids.

   In order to determine the stability of each unit_id, we summed up
   the occurrences of IP addresses over the whole observation period,
   which was separated into monthly files. If the IP address of a DNS
   server occurred 12 times, this meant that the unit_id always
   accessed the same DNS server and therefore remained stable over the
   observation period. The resulting stable unit_ids, around 1500,
   will be used for further analysis. Assuming a 99% confidence level
   and a margin of error of +/- 3 points, we require a sample of 1494
   unit_ids. With our stable set of 1500 unit_ids, we are now
   positioned to perform further analysis on the dataset to create the
   full topology and cost maps.
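
   As an illustration of the stability check and of the sample-size
   figure above, the following Python sketch shows one possible
   reading; it is not the code used for the study. It assumes that the
   monthly Map-Reduce output is available as one dictionary per month,
   mapping a unit_id to the set of resolver IP addresses seen for it
   (the names monthly_resolvers, stable_units, and required_sample_size
   are hypothetical). The sample-size function uses Cochran's formula
   with a finite population correction; z = 2.58, p = 0.5, and a
   population of 7,782 (the panel size from Section 6.1) are
   assumptions, and with them the result comes out at roughly 1494.

```python
from collections import Counter


def stable_units(monthly_resolvers, months_required=12):
    """Return the unit_ids that used one and the same resolver every month.

    monthly_resolvers: list with one dict per month, each mapping a
    unit_id to the set of resolver IP addresses seen for it that month
    (hypothetical layout, assumed already cleaned of private addresses
    and public resolvers).
    """
    months_seen = {}  # unit_id -> Counter(resolver IP -> months observed)
    for month in monthly_resolvers:
        for unit_id, resolvers in month.items():
            # set() guards against counting an IP twice within one month
            months_seen.setdefault(unit_id, Counter()).update(set(resolvers))
    return {
        unit_id
        for unit_id, counter in months_seen.items()
        if any(count >= months_required for count in counter.values())
    }


def required_sample_size(population, z=2.58, margin=0.03, p=0.5):
    """Cochran's sample size with finite population correction (assumed)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    return n0 / (1 + (n0 - 1) / population)    # shrink for finite population


if __name__ == "__main__":
    # Toy data: unit 1 keeps its resolver, unit 2 switches in the last month.
    months = [{1: {"192.0.2.53"}, 2: {"198.51.100.7"}} for _ in range(11)]
    months.append({1: {"192.0.2.53"}, 2: {"203.0.113.9"}})
    print(stable_units(months))               # {1}
    print(round(required_sample_size(7782)))  # 1494
```

   Requiring a single resolver address in all twelve monthly sets is
   the strictest interpretation of "stable"; a looser variant could
   compare the geo-located city of the resolvers instead of their IP
   addresses.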

   Table 1 presents a sample of the geographic location data that we
   have uncovered for unit_ids. A complete list of identified units
   superimposed on the geographical map of the United States is
   available at http://cdb.io/13UOHgD.

        +---------+-----------------+--------------------------+
        | Unit ID | City, State     | Latitude/Longitude       |
        +---------+-----------------+--------------------------+
        | 872     | Morganville, NJ | 40.35950089,-74.26280212 |
        |         |                 |                          |
        | 885     | Madison, WI     | 43.07310104,-89.40119934 |
        |         |                 |                          |
        | 898     | Foley, AL       | 30.40660095,-87.68360138 |
        |         |                 |                          |
        | 7969    | Manteca, CA     | 37.79740143,-121.2160034 |
        |         |                 |                          |
        | 8024    | Quincy, MA      | 42.25289917,-71.00229645 |
        +---------+-----------------+--------------------------+

                    Sample unit identification tuples

                                 Table 1

7.  Security considerations

   No security mechanisms are invalidated by our analysis in
   Section 6. All of our analysis was performed on publicly available
   data. However, we do note that some privacy may have been lost as a
   result of our analysis. In the raw dataset, the unit identifiers are
   opaque numbers with no immediate correlation with a geographic
   location. After our analysis, while the unit identifiers still
   remain opaque, they are nonetheless correlated with a specific,
   though coarse, geographic location.

8.  IANA considerations

   This document does not contain any IANA considerations.

9.  Conclusion

   This document argues that, compared to existing solutions, there may
   be a need for a more flexible way to provide access to large-scale
   network measurement results. Further, the document argues that the
   ALTO protocol is a good candidate to enable querying for specific,
   possibly aggregated, measurement results in a flexible way. Examples
   are given of what such a flexible query mechanism for large-scale
   measurement results could look like based on ALTO.

   With respect to the case study in Section 6, identification of the
   geographic location of the unit_ids generating the performance data
   is essential in order to continue the work. We have presented a
   methodology and some early results in identifying a geographic
   location. This location, although coarse, suffices for our future
   work, which will consist of further data mining and analysis to
   create appropriate ALTO network and cost maps.

10.  References

10.1.  Normative References

   [RFC5693]  Seedorf, J. and E. Burger, "Application-Layer Traffic
              Optimization (ALTO) Problem Statement", RFC 5693,
              October 2009.

10.2.  Informative References

   [I-D.ietf-alto-protocol]
              Alimi, R., Penno, R., and Y. Yang, "ALTO Protocol",
              draft-ietf-alto-protocol-20 (work in progress),
              October 2013.

   [I-D.marocco-alto-ws]
              Marocco, E. and J. Seedorf, "WebSocket-based
              server-to-client notifications for the Application-Layer
              Traffic Optimization (ALTO) Protocol",
              draft-marocco-alto-ws-01 (work in progress), July 2012.

   [I-D.schulzrinne-lmap-requirements]
              Schulzrinne, H., Johnston, W., and J. Miller, "Large-Scale
              Measurement of Broadband Performance: Use Cases,
              Architecture and Protocol Requirements",
              draft-schulzrinne-lmap-requirements-00 (work in progress),
              September 2012.

   [I-D.schwan-alto-incr-updates]
              Schwan, N. and B. Roome, "ALTO Incremental Updates",
              draft-schwan-alto-incr-updates-02 (work in progress),
              July 2012.

   [fcc]      United States Federal Communications Commission,
              "Measuring Broadband America", Accessed July 12, 2013,
              http://www.fcc.gov/measuring-broadband-america.

Appendix A.  Acknowledgment

   Jan Seedorf is partially supported by the mPlane project (mPlane: an
   Intelligent Measurement Plane for Future Network and Application
   Management), a research project supported by the European Commission
   under its 7th Framework Program (contract no.
   318627). The views and conclusions contained herein are those of the
   authors and should not be interpreted as necessarily representing the
   official policies or endorsements, either expressed or implied, of
   the mPlane project or the European Commission.

Authors' Addresses

   Jan Seedorf
   NEC
   Kurfuerstenanlage 36
   Heidelberg  69115
   Germany

   Phone: +49 6221 4342 221
   Fax:   +49 6221 4342 155
   Email: seedorf@neclab.eu


   David Goergen
   University of Luxembourg

   Email: david.goergen@uni.lu


   Radu State
   University of Luxembourg

   Email: radu.state@uni.lu


   Vijay K. Gurbani
   Bell Labs, Alcatel-Lucent

   Email: vkg@bell-labs.com


   Enrico Marocco
   Telecom Italia
   Via G. Reiss Romoli, 274
   Turin  10148
   Italy

   Email: enrico.marocco@telecomitalia.it