Network Working Group                                             Y. Cui
Internet-Draft                                                    Z. Lai
Intended status: Informational                                    L. Sun
Expires: May 5, 2016                                 Tsinghua University
                                                        November 2, 2015


                Internet Storage Sync: Problem Statement
                        draft-cui-iss-problem-03

Abstract

   Internet storage services have become more and more popular. They
   attract a huge number of users and produce a significant share of
   Internet traffic. Most existing Internet storage services make use
   of proprietary sync protocols with different capabilities to achieve
   the data sync. However, a single Internet storage service using its
   proprietary sync protocols has intrinsic limitations on service
   usability and network performance. This document outlines the
   related problems caused by using proprietary sync protocols and
   missing key capabilities. It also shows a demand for designing a
   standard sync protocol to achieve better usability and sync
   performance.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 5, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Concepts
   3.  Architecture of Internet Storage Service
   4.  Problems
     4.1.  Complicated Support for APIs
     4.2.  Unavailable Cross-service Sync
     4.3.  Multiple Similar Clients
     4.4.  Protocol Capability Configurations and Implementations
       4.4.1.  Chunking and Deduplication
       4.4.2.  Chunking and Delta-encoding
       4.4.3.  Bundling
     4.5.  Sync Protocols in Mobile and Wireless Environments
     4.6.  Unsatisfactory Concurrent Work Ability
   5.  Advantages of Standard Sync Protocol
   6.  Understanding of Sync Protocol
   7.  Related Work in IETF
   8.  Security Considerations (TBD)
   9.  Acknowledgements
   10. Informative References
   Authors' Addresses

1.  Introduction

   Internet storage services provide a convenient way for users to
   synchronize local files or folders with remote servers.
   In recent years, Internet storage services have gained tremendous
   popularity and now account for a large amount of Internet traffic.
   This high public interest also pushes various providers to enter the
   Internet storage market. Services like Dropbox, Google Drive,
   OneDrive and Box are becoming pervasive in people's routines.
   Dropbox, typically considered one of the leading providers,
   announced in June 2015 that it had more than 400 million registered
   users [users], and this number is expected to keep growing.
   Internet storage services enable users to access, operate on and
   share their data from anywhere, on any device, at any time and with
   any connectivity. Internet storage services also provide powerful
   APIs which allow third-party applications to offload the burden of
   data storage and management to the server. By aggregating users'
   files or application data on the server, Internet storage services
   are becoming the "data entrance" for personal users.

   The sync protocol is the key design consideration of an Internet
   storage service. The sync protocol can be equipped with several
   capabilities to optimize storage usage and speed up data
   transmission. Existing Internet storage services employ their own
   proprietary sync protocols to store user data on, and retrieve it
   from, the remote servers. However, using proprietary sync protocols
   with different capabilities in different Internet storage services
   has intrinsic limitations on service usability and network
   performance.

   Multi-service usability: Users may use multiple Internet storage
   services for the diversity of performance and functionality. In
   addition, because an Internet storage service has full access to
   user data, that data is at risk when the service is attacked or when
   authorities require the provider to disclose it. Some enterprise
   users may want to use their own network-based storage service.
   Furthermore, it is complicated for developers to use different APIs
   to integrate their applications with Internet storage services.
   Proprietary protocols also make it impossible for a user of one
   Internet storage service to synchronize data with users of another
   service. Moreover, to use multiple services, a user may have to
   install a series of client applications with similar functionality,
   which wastes local resources and degrades the user experience.

   Missing or misusing capabilities: Previous work shows that existing
   Internet storage services have different capability configurations
   and implementations. These capabilities are closely related to each
   other and help to synchronize user data efficiently. However, most
   of the storage services are found to lack key capabilities or to
   configure them poorly, which may result in unexpected sync failures
   and sync inefficiency. How to reasonably design and implement
   capabilities in the sync protocol has become a critical problem for
   the providers.

   To address the problems mentioned above, an open and standard sync
   protocol is required. In addition, this standard sync protocol is
   expected to support the useful capabilities in order to avoid
   unexpected sync failures and improve network performance.

   This document outlines the problems that arise in existing Internet
   storage services with various proprietary sync protocols. Section 2
   lists the terminology and related concepts of Internet storage
   services. Section 3 introduces the architecture of existing Internet
   storage services. Section 4 describes the main problems and issues
   that need to be considered. Section 5 explains the advantages of
   using an open and standard sync protocol. Section 6 shows a
   high-level understanding of the sync protocol. Section 7 identifies
   the differences between ISS and related work in IETF (i.e. WebDAV).

2.  Terminology and Concepts

   Data synchronization (sync): A primary technique for Internet storage
   services. It enables the client to automatically propagate local
   file changes to the remote servers through network communications.

   Client: An application which is installed at the user side (e.g. on
   multiple terminals). It enables users to access and use an Internet
   storage service.

   Control server: The entity that is responsible for authenticating
   users, managing metadata and notifying the client of changes. It
   stores the authentication and metadata information of users.

   Data storage server: The entity that stores the synchronized files of
   users.

   Control data: The control information exchanged with the control
   server to carry out the data sync process. Typical control data
   includes metadata (e.g. hashes for chunks) and authentication
   information.

   Content data: The original data of the local file, often in the form
   of small chunks.

   Sync protocol: A communication protocol between the client and the
   remote servers to achieve data sync. It contains a control flow and
   a data flow. Existing sync protocols are typically built on HTTP or
   HTTPS.

   o  Control flow: This flow is for the client and the control server
      to exchange control data.

   o  Data flow: This flow is for transmitting content data between the
      client and the data storage servers.

   Sync efficiency: A performance metric that indicates how quickly
   changes can be synchronized to the Internet and with how little
   traffic overhead.

   Useful capabilities to improve sync efficiency:

   o  Chunking: Split a large file into small chunks.

   o  Bundling: Transmit multiple small chunks as a single big chunk.

   o  Deduplication: Avoid retransmission of content that already
      exists on the Internet.

   o  Delta-encoding: Only synchronize modified data.

   o  Compression: Compress data before transmission.

3.  Architecture of Internet Storage Service

   The architecture of most Internet storage services is generally
   composed of three major components: client, control server and data
   storage server. The whole architecture is shown in Figure 1.

                           * * * * * * * *
              * * * * * * *               * * * * * * *
             *                INTERNET                 *
            *  +------------+        +------------+     *
         ------|  Control   |        | +------------+   *
         |  *  |   server   |        | |Data storage|========
         |  *  +------------+        + |  servers   |   *   |
         |   *                         +------------+  *    |
         |    * * * * * * *               * * * * * * *     |
    Control Flow           * * * * * * * *              Data Flow
         |                                                  |
         |                                                  |
         |                    +--------+                    |
         ---------------------| Client |=====================
                              +--------+

                               Figure 1

   The sync protocol lets the three components communicate with each
   other. The control server is responsible for storing all the control
   data, including authentication information and metadata. Once
   changes are made to synchronized files, the control server notifies
   the clients. The other type of data, content data, is stored in the
   form of chunks on the data storage servers, which have no knowledge
   of sources, users or relationships with other data chunks. As a
   result, a complete user file is split into small chunks, and those
   chunks may be stored on several different data storage servers.
   These two types of servers are separate logical entities and are
   usually deployed in different locations. Every time the client
   synchronizes a local file to the Internet, it needs to exchange
   control data and content data with different types of servers in
   different flows.

4.  Problems

   Existing popular Internet storage services, including Dropbox,
   OneDrive and Google Drive, use their own proprietary sync protocols
   to achieve the data sync. Reliance on differing proprietary
   protocols is generally considered detrimental to the development of
   Internet services.
   This section describes current problems for Internet storage
   services caused by their sync protocols. We summarize six specific
   problems from three different aspects: service usability, protocol
   capabilities and concurrent work ability. As discussed in Section 1,
   users often use multiple storage services for reasons of
   performance, reliability and security. Usability across multiple
   services is still limited because sync protocols are proprietary.
   Section 4.1, Section 4.2 and Section 4.3 describe the problems which
   are concerned with usability. Moreover, previous work and
   measurements have revealed that most sync protocols lack key service
   capabilities or do not configure them well, which significantly
   degrades network performance, especially in mobile and wireless
   environments. Section 4.4 and Section 4.5 illustrate the problems of
   current protocol capabilities. In addition, the unsatisfactory
   concurrent work ability is described in Section 4.6.

4.1.  Complicated Support for APIs

   Popular Internet storage services provide APIs that extend access to
   the content management features in client software for use in
   third-party applications. In practice, these APIs take care of
   synchronizing data with Internet storage servers in a familiar,
   system-like way. Behind the scenes, the APIs synchronize changes to
   the server and automatically notify the client when changes are made
   on other devices. These APIs can also include more advanced
   features, e.g. revision or restoration of files, to make the client
   work better. Different providers offer different APIs to developers,
   and these APIs differ in style and features in order to support
   different platforms (e.g. Windows and Android).

   Third-party applications prefer to combine multiple Internet storage
   services to achieve better performance, reliability and security.
   However, developers who want to use multiple storage services need
   to learn the APIs of every service provider in order to design and
   implement their own clients. Although there are already some
   successful third-party clients that support multiple services (e.g.
   ExpanDrive [ExpanDrive], IFTTT [IFTTT]), it is not easy for
   developers to learn and apply so many different APIs to develop and
   maintain their third-party clients.

4.2.  Unavailable Cross-service Sync

   Synchronizing is one of the most important functions provided by
   Internet storage services. With this function, files on the Internet
   can easily be shared and manipulated by different people and groups.
   Anyone who is permitted to read and download a file is able to
   modify it and upload new versions of it to the Internet.

   However, this synchronizing function works well only within a single
   service. Users of the same Internet storage service can easily share
   (i.e. download) files and perform coordinated operations on them.
   Sync among different Internet storage services, in contrast, is not
   available. For example, if a Dropbox user currently wants to work on
   a cooperative file with a Google Drive user, the only option is to
   share the file by sending an open HTTP link to it. After clicking on
   that link, the Google Drive user can only read and download the
   shared file through HTTP. The Google Drive user cannot modify and
   update the shared file, since Dropbox and Google Drive use two
   different proprietary sync protocols.
   This is because the cooperative file is stored on Dropbox servers. A
   Google Drive client cannot download or upload the file through
   Dropbox's sync protocol because it does not implement that protocol.
   The use of different proprietary sync protocols by different
   services is what makes cross-service sync unavailable.

4.3.  Multiple Similar Clients

   The emergence of more and more Internet storage services provides
   users with a wide range of choices for storing their local files
   remotely. As with other Internet applications, users are not
   restricted to using only one of those services. In fact, they tend
   to have multiple accounts for different Internet storage services
   and use them simultaneously. One important reason is that users seek
   the best functionality for each task. For example, Dropbox is better
   at file processing, OneDrive offers better interoperability and
   compatibility with Microsoft Office, and Google Drive handles mail
   attachments better. To obtain all the desired functions and
   features, a simple way is to register for and use all the desired
   Internet storage services. Furthermore, people may simply need
   multiple Internet storage services for larger storage space and
   higher reliability.

   However, using different Internet storage services means that users
   have to install multiple similar client applications. Since almost
   all commercial Internet storage services have their own proprietary
   sync protocols and corresponding client applications, installing and
   running multiple similar client applications degrades the user
   experience and also increases the complexity of synchronizing files
   with different providers' servers on the Internet. For instance,
   users usually have to repeat the same operations in order to upload
   the same file to their different service accounts.

4.4.  Protocol Capability Configurations and Implementations

   Data sync is not a simple remote file transfer process. It can
   implement several capabilities to optimize data storage usage and
   speed up data transmission. There are five well-known capabilities
   that can be employed by Internet storage services to improve sync
   efficiency and reliability: chunking, bundling, deduplication,
   delta-encoding and compression. All these capabilities aim to
   synchronize user data efficiently via Internet communications.

   However, the investigation in [Benchmarking] shows that different
   Internet storage services have different capability configurations
   and implementations, and most existing Internet storage services do
   not implement all five capabilities in their sync protocols. The
   lack of such capabilities does affect sync efficiency. Table 1, from
   [QuickSync], shows the capability implementations of four popular
   Internet storage services (i.e. Dropbox, GoogleDrive, OneDrive and
   Seafile) on Windows OS.

+----------------+------------+------------+------------+------------+
| Capabilities   | Dropbox    | GoogleDrive| OneDrive   | Seafile    |
+----------------+------------+------------+------------+------------+
| Chunking       | 4 MB       | 8 MB       | Variable   | Variable   |
+----------------+------------+------------+------------+------------+
| Bundling       | Yes        | No         | No         | No         |
+----------------+------------+------------+------------+------------+
| Deduplication  | Yes        | No         | No         | Yes        |
+----------------+------------+------------+------------+------------+
| Delta-encoding | Yes        | No         | No         | No         |
+----------------+------------+------------+------------+------------+
| Compression    | Yes        | Yes        | No         | No         |
+----------------+------------+------------+------------+------------+

                               Table 1

   Measurements in [QuickSync] also reveal that these key capabilities
   significantly affect sync performance. Most of them should be
   implemented and well configured to achieve efficient data sync. The
   remaining part of this subsection lists the problems caused by
   insufficient or unreasonably configured capabilities.

4.4.1.  Chunking and Deduplication

   Chunking is the most widely implemented capability. It simplifies
   transmission recovery when the sync of a large file is interrupted.
   Different implementations of chunking have different chunking
   schemes (i.e. dynamic chunking or static chunking) and chunk sizes.
   Chunking is closely related to deduplication, since deduplication is
   performed at chunk granularity. Typically, a smaller chunk size and
   a dynamic chunking scheme (e.g. Content Defined Chunking) are better
   for detecting and eliminating redundancy. However, the ability to
   detect more redundancy does not always translate into better sync
   efficiency, since it introduces more computation overhead (i.e.
   finding more redundancy needs more CPU time). An aggressive dynamic
   chunking scheme (e.g. Content Defined Chunking) performs better in a
   high-delay (i.e. high RTT) environment, while a fixed-size scheme
   performs well in good network conditions. A trade-off between
   computation time and transmission time needs to be considered to
   achieve effective chunking. A better chunking strategy may be
   network-aware, which means that the sync should be able to employ an
   appropriate chunking strategy according to its current network
   condition.

4.4.2.  Chunking and Delta-encoding

   Delta-encoding is an algorithm that can be used to find the portions
   in which two files differ and thereby achieve incremental sync.
   However, not all Internet storage services implement delta-encoding.
   One possible reason is that most delta-encoding algorithms work at
   file granularity, while Internet storage services often split files
   into chunks and manage them as chunks in order to save storage space
   and reduce cost. Naively piecing together all chunks to reconstruct
   the whole file in order to achieve incremental sync would waste
   massive intra-cluster bandwidth. Therefore, some Internet storage
   services, e.g. Dropbox, implement delta-encoding at chunk
   granularity. The delta-encoding is performed between two chunks, one
   from the original version and one from the modified version, that
   are paired according to their offset from the beginning of the file.
   If a service uses the fixed-size chunking method, some types of
   modifications, e.g. inserting some new data at the head of a file,
   may leave the two chunks used for delta-encoding with very little
   similarity. In this circumstance, delta-encoding is unable to reveal
   the delta between the original and modified file, so the incremental
   sync fails.
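
   The offset problem described above can be illustrated with a minimal
   Python sketch. It is not taken from any of the measured services:
   the helper names (fixed_chunks, content_defined_chunks,
   shared_fraction), the 4 KB chunk size, the SHA-256 chunk hash and
   the toy boundary rule are all assumptions made for illustration. The
   sketch inserts a few bytes at the head of a file and measures what
   fraction of the new chunks already exist among the old chunks.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int) -> list:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Toy content-defined chunking (hypothetical rule, illustration only).

    A chunk ends after a position whose trailing `window` bytes sum to a
    value with all `mask` bits zero, so boundaries depend on content
    rather than on the offset from the start of the file.
    """
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        if i - start >= window and sum(data[i - window:i]) & mask == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: list, new: list) -> float:
    """Fraction of new chunks whose hash already exists among old chunks."""
    known = {hashlib.sha256(c).digest() for c in old}
    hits = sum(hashlib.sha256(c).digest() in known for c in new)
    return hits / len(new)


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(64 * 1024))
modified = b"HEADER" + original  # insert new data at the head of the file

fixed_ratio = shared_fraction(fixed_chunks(original, 4096),
                              fixed_chunks(modified, 4096))
cdc_ratio = shared_fraction(content_defined_chunks(original),
                            content_defined_chunks(modified))
print(f"fixed-size reuse: {fixed_ratio:.2f}, content-defined reuse: {cdc_ratio:.2f}")
```

   With fixed-size chunks, every boundary shifts by the length of the
   inserted data, so no new chunk hash matches an old one and the chunk
   found at a given offset no longer holds the content it held before.
   With content-defined boundaries, the chunks realign shortly after
   the insertion point, at the price of extra computation per byte.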
   To solve the problem, we need to design an improved delta-encoding
   algorithm with appropriate chunking that keeps incremental sync
   available in various scenarios.

4.4.3.  Bundling

   Small files are more likely to be modified and synchronized
   frequently. For example, people usually collaborate on a number of
   small files (e.g. a project's source code often consists of multiple
   small files). In a high-delay environment, synchronizing a large
   number of small files is not efficient. One reason is that most
   existing Internet storage services employ a sequential
   acknowledgement mechanism. Under this mechanism, the next chunk may
   be transmitted only after the acknowledgement of the previous chunk
   has been received. The sequential acknowledgement mechanism wastes
   the limited bandwidth, since the TCP connection is idle for a long
   time. Bundling small files together and employing a delayed
   acknowledgement mechanism can make full use of the limited
   bandwidth, so that the overall sync time and traffic overhead can be
   significantly decreased.

4.5.  Sync Protocols in Mobile and Wireless Environments

   The increasing number of mobile terminals introduces the requirement
   of synchronizing data on any device, via any connectivity, at any
   time and anywhere. A change made to the data on the desktop must be
   automatically transferred to the user's mobile phone or other mobile
   devices. Based on the measurements from [Look_at_Mobile_Cloud], the
   problem of missing capabilities is more severe in mobile Internet
   storage services. The root causes are twofold:

   First of all, mobile devices have limited storage and computation
   ability. It is hard to implement all five useful capabilities
   discussed previously on a mobile client, since implementing those
   capabilities brings extra overhead (Table 2 shows the capability
   implementations on Android OS). The measurement results from
   [Look_at_Mobile_Cloud] show that none of the existing mobile
   Internet storage services implements all five key capabilities, and
   only very few of the capabilities can be found on a mobile Internet
   storage client. That explains why most Internet storage services
   waste limited bandwidth, produce a large amount of useless traffic
   and suffer long sync times in the mobile environment. How to
   implement all the desired capabilities with lower storage and
   computation requirements is a critical problem that needs to be
   addressed.

+----------------+------------+------------+------------+------------+
| Capabilities   | Dropbox    | GoogleDrive| OneDrive   | Seafile    |
+----------------+------------+------------+------------+------------+
| Chunking       | 4 MB       | 260 KB     | 1 MB       | No         |
+----------------+------------+------------+------------+------------+
| Bundling       | No         | No         | No         | No         |
+----------------+------------+------------+------------+------------+
| Deduplication  | Yes        | No         | No         | No         |
+----------------+------------+------------+------------+------------+
| Delta-encoding | No         | No         | No         | No         |
+----------------+------------+------------+------------+------------+
| Compression    | No         | No         | No         | No         |
+----------------+------------+------------+------------+------------+

                               Table 2

   Secondly, sync protocols cannot handle network disruptions caused by
   unstable network connections well.
   For example, some services fail to resume sync if the data
   transmission is interrupted, or incur too much additional recovery
   overhead when an exception occurs. A well-designed sync protocol
   that guarantees reliability and efficiency in mobile or wireless
   networks is expected.

4.6.  Unsatisfactory Concurrent Work Ability

   With the popularity of Internet storage services, collaborative work
   is becoming an important feature of such services. This feature is
   especially important and convenient for a team or an organization,
   since participants can easily retrieve and edit the target file on
   the Internet. Currently, such collaborative work ability is still
   unsatisfactory: some common and frequent operations may lead to
   redundant file versions. More specifically, parallel updates from
   different end users may result in a version conflict. If two or more
   users are editing the same file concurrently, it is hard to update
   the file correctly. To ensure that every participant's modification
   is considered, the typical way is to lock the file and allow other
   participants to create different versions of the same file. To
   obtain a final version, participants have to negotiate with each
   other about their modifications (versions) and merge the final
   version manually. This reduces work efficiency, since people have to
   spend a lot of time and effort on managing redundant versions and
   merging a final version.

   Ideally, when different people are working on the same file, each
   client would automatically create a separate local version for its
   user. After the users have finished and uploaded their versions,
   the server would automatically merge the different versions into a
   final version without any human involvement.
   Furthermore, an even better solution is real-time editing as
   provided by [GoogleDocs], where multiple people can edit the same
   file and see each other's cursors and operations in real time. Such
   an ability would improve collaborative work but is really
   challenging to support when designing a protocol.

5.  Advantages of Standard Sync Protocol

   An open and standard sync protocol between client and server can
   effectively address some of the problems mentioned above. The sync
   protocol consists of two types of flows: control flow and data flow.
   The control flow runs between the client and the control server. It
   is intended for user authentication, metadata management and the
   active notification of data changes. The data flow runs between the
   client and the data storage servers and is used only for
   transmitting actual file data (in the form of numerous chunks). The
   combination of control flow and data flow enables the whole data
   sync. According to the analysis of the problems above, the key
   capabilities could be supported as optional features in the sync
   protocol, and it would be better if the protocol were network-aware.
   The rest of this section lists the advantages of employing an open
   and standard sync protocol.

   First, with a standard sync protocol, a third-party client that
   supports multiple Internet storage services is easy to implement,
   since the APIs provided by different providers would be unnecessary
   or at least simplified. This would attract more people and
   organizations to develop and implement their own clients (it may
   even be possible for users to implement their own clients). As a
   result, users would no longer need multiple clients for multiple
   services, and their user experience would improve. Furthermore,
   competition in the (third-party) client market would increase, which
   is beneficial for the users.
   They would be able to choose their clients flexibly, and frequent
   client updates would give users more functions and a better user
   experience.

   Another advantage of having a standard sync protocol is that sync
   among different services becomes available or at least possible to
   achieve. If two different services both employ the standard sync
   protocol, their users could synchronize files with each other using
   the same standard sync protocol (rather than a basic HTTP download).
   In this way, users of different services could share files and
   perform coordinated operations on their local files.

   Using a standard sync protocol also makes it easy to improve
   Internet storage services. Compared with the existing proprietary
   formats, a standard sync protocol is totally open and designed by
   many contributors. People are welcome to revise and improve the
   standard protocol. We believe that both users and providers will
   benefit a lot from such a standard sync protocol.

6.  Understanding of Sync Protocol

  Client                 Control Server           Data Storage Server
     |                          |                          |
     |---meta data, auth info-->|                          |
     |<-------start sync--------|                          |
     |     sync preparation     |                          |
     |                          |                          |
     |--------------------store/retrieve------------------>|
     |<--------------------ok/content----------------------|
     |                         ...                         |
     |--------------------store/retrieve------------------>|
     |<--------------------ok/content----------------------|
     |                  data transmission                  |
     |                          |                          |
     |---meta data, ver info--->|                          |
     |<-----conclude sync-------|                          |
     |       sync finish        |                          |
     |                          |                          |

                               Figure 2

   Figure 2 shows a preliminary and high-level understanding of the
   sync protocol. The whole sync process can be divided into three
   stages: sync preparation, data transmission and sync finish. In the
   first stage, the client should exchange its metadata and
   authentication information with the control server to initiate a
   sync process.
   During this stage, capabilities such as network-aware chunking and
   deduplication should be performed.  In the second stage, data
   transmission, the client sends chunks to or retrieves chunks from
   the data storage servers.  To speed up the data sync and make it
   more reliable, capabilities such as bundling and delta encoding
   could be employed.  In the final stage, sync finish, the client
   sends its metadata again so that the control server can check it
   and conclude the sync process.  Version information is also
   exchanged at this point for version control.  It follows that the
   control flow and the data flow are closely coupled: neither can
   work without the other.

7.  Related Work in IETF

   WebDAV ([RFC4918]) provides an alternative way to exchange local
   data with remote web servers.  It can be regarded as a previous
   IETF effort on file collections, authoring, and versioning over
   HTTP.  WebDAV mainly focuses on authoring and versioning of
   distributed web content: it extends HTTP to enable users to
   collaboratively edit and manage files on remote servers.  ISS, in
   contrast, will focus on data sync.  A potentially major difference
   between data sync and distributed authoring/versioning is the
   frequency of data transmission.  In data sync, the client
   automatically exchanges data with remote servers whenever there is
   a change.  In practice, every 'save' operation on a file triggers
   a data sync process.  Such frequent data transmission generates a
   large amount of network traffic and makes the design of sync
   protocols challenging.  A possible solution is to make use of the
   well-known service capabilities described above and to make the
   protocol network-aware to some extent.
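
   As an illustration only, the following Python sketch shows how one
   such capability, chunk-level deduplication, could limit the traffic
   caused by frequent 'save' operations while following the three
   stages of Figure 2.  It is a minimal sketch under stated
   assumptions, not a proposed protocol: the class and function names
   (ControlServer, DataStorageServer, sync_file, restore_file), the
   fixed 4-byte chunk size, and the use of SHA-256 as a chunk
   identifier are all hypothetical choices made for this example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to trace


def split_chunks(data, size=CHUNK_SIZE):
    """Split file content into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk):
    """Identify a chunk by its SHA-256 digest (an assumption of this sketch)."""
    return hashlib.sha256(chunk).hexdigest()


class DataStorageServer:
    """Hypothetical data-flow endpoint: stores chunk content by identifier."""

    def __init__(self):
        self.chunks = {}
        self.bytes_received = 0

    def store(self, cid, chunk):
        self.chunks[cid] = chunk
        self.bytes_received += len(chunk)


class ControlServer:
    """Hypothetical control-flow endpoint: keeps per-file chunk lists."""

    def __init__(self, storage):
        self.storage = storage
        self.files = {}

    def start_sync(self, chunk_ids):
        # Sync preparation: report which chunks are not stored yet.
        return {cid for cid in chunk_ids if cid not in self.storage.chunks}

    def conclude_sync(self, name, chunk_ids):
        # Sync finish: record the new version's ordered chunk list.
        self.files[name] = list(chunk_ids)


def sync_file(name, data, control, storage):
    """Run the three stages for one file; return the payload bytes sent."""
    chunks = split_chunks(data)
    ids = [chunk_id(c) for c in chunks]
    missing = control.start_sync(ids)          # stage 1: sync preparation
    sent = 0
    for cid, chunk in zip(ids, chunks):        # stage 2: data transmission
        if cid in missing:
            storage.store(cid, chunk)
            missing.discard(cid)  # a chunk repeated in the file is sent once
            sent += len(chunk)
    control.conclude_sync(name, ids)           # stage 3: sync finish
    return sent


def restore_file(name, control, storage):
    """Rebuild a file from its recorded chunk list."""
    return b"".join(storage.chunks[cid] for cid in control.files[name])


storage = DataStorageServer()
control = ControlServer(storage)
first = sync_file("notes.txt", b"AAAABBBBCCCC", control, storage)
second = sync_file("notes.txt", b"AAAABBBBDDDD", control, storage)
print(first, second)  # 12 4: the second save sends only the changed chunk
```

   In this sketch the second save transmits 4 bytes instead of 12
   because two of the three chunks are already stored.  A network-aware
   variant might choose the chunk size according to current network
   conditions rather than using a constant.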
   The ISS protocol suite could build on WebDAV or on basic HTTP.

8.  Security Considerations (TBD)

   TBD

9.  Acknowledgements

   The authors would like to thank Barry Leiba, Mark Nottingham, Julian
   Reschke, Marc Blanchet, Mike Bishop, Haibin Song, Philip Hallam
   Baker, Michiel de Jong and Ted Lemon for their valuable comments and
   contributions to this work.

10.  Informative References

   [Batched]  Li, Z., Wilson, C., Jiang, Z., Liu, Y., Zhao, B., Jin, C.,
              Zhang, Z., and Y. Dai, "Efficient Batched Synchronization
              in Dropbox-Like Cloud Storage Services", Middleware,
              2013.

   [Benchmarking]
              Drago, I., Bocchi, E., Mellia, M., Slatman, H., and A.
              Pras, "Benchmarking Personal Cloud Storage", IMC, 2013.

   [ExpanDrive]
              "ExpanDrive", .

   [GoogleDocs]
              "Google Docs",
              .

   [IFTTT]    "IFTTT", .

   [Inside_Dropbox]
              Drago, I., Mellia, M., Munafo, M., Sperotto, A., Sadre,
              R., and A. Pras, "Inside Dropbox: Understanding Personal
              Cloud Storage Services", IMC, 2012.

   [Look_at_Mobile_Cloud]
              Cui, Y., Lai, Z., and N. Dai, "A First Look at Mobile
              Cloud Storage Services: Architecture, Experimentation and
              Challenge", IEEE Network, 2015.

   [QuickSync]
              Cui, Y., Lai, Z., Wang, X., Dai, N., and C. Miao,
              "QuickSync: Improving Synchronization Efficiency for
              Mobile Cloud Storage Services", MOBICOM, 2015.

   [RFC4918]  Dusseault, L., Ed., "HTTP Extensions for Web Distributed
              Authoring and Versioning (WebDAV)", RFC 4918,
              DOI 10.17487/RFC4918, June 2007,
              .

   [rsync]    "rsync", .

   [Towards]  Li, Z., Jin, C., Xu, T., Wilson, C., Liu, Y., Cheng, L.,
              Liu, Y., Dai, Y., and Z. Zhang, "Towards Network-level
              Efficiency for Cloud Storage Services", IMC, 2014.

   [users]    "400 million strong", .

Authors' Addresses

   Yong Cui
   Tsinghua University
   Beijing 100084
   P.R.China

   Phone: +86-10-6260-3059
   Email: yong@csnet1.cs.tsinghua.edu.cn


   Zeqi Lai
   Tsinghua University
   Beijing 100084
   P.R.China

   Phone: +86-10-6278-5822
   Email: uestclzq@gmail.com


   Linhui Sun
   Tsinghua University
   Beijing 100084
   P.R.China

   Phone: +86-10-6278-5822
   Email: lh.sunlinh@gmail.com