idnits 2.17.1
draft-cui-iss-problem-03.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
** The document seems to lack an IANA Considerations section. (See Section
2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
when there are no actions for IANA.)
** There are 28 instances of too long lines in the document, the longest
one being 3 characters in excess of 72.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
-- The document date (November 2, 2015) is 3096 days in the past. Is this
intentional?
Checking references for intended status: Informational
----------------------------------------------------------------------------
== Unused Reference: 'Batched' is defined on line 650, but no explicit
reference was found in the text
== Unused Reference: 'Towards' is defined on line 690, but no explicit
reference was found in the text
Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group Y. Cui
3 Internet-Draft Z. Lai
4 Intended status: Informational L. Sun
5 Expires: May 5, 2016 Tsinghua University
6 November 2, 2015
8 Internet Storage Sync: Problem Statement
9 draft-cui-iss-problem-03
11 Abstract
13 Internet storage services have become more and more popular. They
14 attract a huge number of users and produce a significant share of
15 Internet traffic. Most existing Internet storage services make use
16 of proprietary sync protocols with different capabilities to achieve
17 the data sync. However, a single Internet storage service using its
18 proprietary sync protocols has intrinsic limitations on service
19 usability and network performance. This document outlines the
20 related problems caused by using proprietary sync protocols and
21 missing key capabilities. It also shows a demand for designing a
22 standard sync protocol to achieve better usability and sync
23 performance.
25 Status of This Memo
27 This Internet-Draft is submitted in full conformance with the
28 provisions of BCP 78 and BCP 79.
30 Internet-Drafts are working documents of the Internet Engineering
31 Task Force (IETF). Note that other groups may also distribute
32 working documents as Internet-Drafts. The list of current Internet-
33 Drafts is at http://datatracker.ietf.org/drafts/current/.
35 Internet-Drafts are draft documents valid for a maximum of six months
36 and may be updated, replaced, or obsoleted by other documents at any
37 time. It is inappropriate to use Internet-Drafts as reference
38 material or to cite them other than as "work in progress."
40 This Internet-Draft will expire on May 5, 2016.
42 Copyright Notice
44 Copyright (c) 2015 IETF Trust and the persons identified as the
45 document authors. All rights reserved.
47 This document is subject to BCP 78 and the IETF Trust's Legal
48 Provisions Relating to IETF Documents
49 (http://trustee.ietf.org/license-info) in effect on the date of
50 publication of this document. Please review these documents
51 carefully, as they describe your rights and restrictions with respect
52 to this document. Code Components extracted from this document must
53 include Simplified BSD License text as described in Section 4.e of
54 the Trust Legal Provisions and are provided without warranty as
55 described in the Simplified BSD License.
57 Table of Contents
59 1. Introduction
60 2. Terminology and Concepts
61 3. Architecture of Internet Storage Service
62 4. Problems
63 4.1. Complicated Support for APIs
64 4.2. Unavailable Cross-service Sync
65 4.3. Multiple Similar Clients
66 4.4. Protocol Capability Configurations and Implementations
67 4.4.1. Chunking and Deduplication
68 4.4.2. Chunking and Delta-encoding
69 4.4.3. Bundling
70 4.5. Sync Protocols in Mobile and Wireless Environments
71 4.6. Unsatisfactory Concurrent Work Ability
72 5. Advantages of Standard Sync Protocol
73 6. Understanding of Sync Protocol
74 7. Related Work in IETF
75 8. Security Considerations (TBD)
76 9. Acknowledgements
77 10. Informative References
78 Authors' Addresses
80 1. Introduction
82 Internet storage services provide a convenient way for users to
83 synchronize local files or folders with remote servers. In recent
84 years, Internet storage services have gained tremendous popularity
85 and accounted for a large amount of Internet traffic. This high
86 public interest also pushes various providers to enter the Internet
87 storage market. Services like Dropbox, Google Drive, OneDrive and
88 Box are becoming pervasive in people's routines. Dropbox, typically
89 considered one of the leading providers, announced in June 2015 that
90 it had more than 400 million registered users [users], and this
91 number is expected to keep growing. Internet storage
92 services enable users to access, operate on and share their data
93 from anywhere, on any device, at any time and with any connectivity.
94 Internet storage services also provide powerful APIs which allow
95 third-party applications to offload the burden of data storage and
96 management to the server. By aggregating users' files or application
97 data in the server, Internet storage services are becoming the "data
98 entrance" for personal users.
100 The sync protocol is a key design consideration of Internet storage
101 services. It can be equipped with several
102 capabilities to optimize storage usage and speed up data
103 transmission. Existing Internet storage services employ their own
104 proprietary sync protocols to store and retrieve user data on the
105 remote servers. However, using proprietary sync protocols with
106 different capabilities in different Internet storage services has
107 intrinsic limitations on service usability and network performance.
109 Multi-service usability: Users may use multiple Internet storage
110 services for the diversity of performance and functionality. In
111 addition, because an Internet storage service has full access to
112 user data, that data is at risk when the service is attacked or when
113 authorities require the provider to expose it. Some
114 enterprise users may want to use their own network-based storage
115 service. Furthermore, it is complicated for developers to use
116 different APIs to integrate their applications with Internet storage
117 services. Proprietary protocols also make it impossible for a user
118 of one Internet storage service to synchronize data with users of
119 another. Moreover, to use multiple services a user may have to
120 install several client applications with similar functionality,
121 which wastes local resources and degrades the user experience.
123 Missing or misused capabilities: Previous work shows that existing
124 Internet storage services have different capability configurations
125 and implementations. These capabilities are closely related to each
126 other and help to synchronize user data efficiently. However, most
127 storage services are found to lack key capabilities or to configure
128 them poorly, which may result in
129 unexpected sync failures and sync inefficiency. How to reasonably
130 design and implement capabilities in the sync protocol has indeed
131 become a critical problem for the providers.
133 To address the problems mentioned above, an open and standard sync
134 protocol is required. In addition, this standard sync protocol is
135 expected to support the useful capabilities to avoid unexpected sync
136 failures and improve network performance.
138 This document outlines the problems that arise in existing Internet
139 storage services with various proprietary sync protocols. Section 2
140 lists the terminology and related concepts of Internet storage
141 services. Section 3 introduces the architecture of existing Internet
142 storage services. Section 4 describes the main problems and issues
143 that need to be considered. Section 5 explains the advantages of
144 using an open and standard sync protocol. Section 6 shows a high-level
145 understanding of the sync protocol. Section 7 identifies the
146 differences between ISS and related work in IETF (i.e. WebDAV).
148 2. Terminology and Concepts
150 Data synchronization (sync): A primary technique for Internet storage
151 services. It enables the client to automatically update local file
152 changes to the remote servers through network communications.
154 Client: An application installed on the user side (e.g. on
155 multiple terminals). It enables users to access and use the
156 Internet storage service.
158 Control server: The entity that takes the responsibility of
159 authenticating users, managing metadata information and also
160 notifying changes to the client. It stores authentication and
161 metadata information of users.
163 Data storage server: The entity that stores the synchronized files of
164 users.
166 Control data: The control information exchanged with the control
167 server to carry out the data sync process. Typical control data
168 includes metadata (e.g. hashes for chunks) and authentication
169 information.
171 Content data: The original data of the local file, often in the form
172 of small chunks.
174 Sync protocol: A communication protocol between the client and remote
175 servers to achieve data sync. It contains a control flow and a data
176 flow. Existing sync protocols are typically built on HTTP or HTTPS.
178 o Control flow: This flow is for client and control server to
179 exchange control data.
181 o Data flow: This flow is for transmitting content data between
182 client and data storage servers.
184 Sync efficiency: A performance metric that indicates how fast the
185 changes can be synchronized to the Internet with the lowest traffic
186 overhead.
188 Useful capabilities to improve sync efficiency:
190 o Chunking: Split a large file into small chunks.
192 o Bundling: Transmit multiple small chunks as a single big chunk.
194 o Deduplication: Avoid retransmitting content that already exists on
195 the Internet storage servers.
197 o Delta-encoding: Only synchronize modified data.
199 o Compression: Compress data before transmission.
201 3. Architecture of Internet Storage Service
203 The architecture of most Internet storage services is generally
204 composed of three major components: client, control server and data
205 storage server. The overall architecture is shown in Figure 1.
207                         * * * * * * * *
208           * * * * * * *                 * * * * * * *
209           *                INTERNET                 *
210           * +------------+        +------------+    *
211       ------|  Control   |        | +------------+  *
212       |   * |   server   |        | |Data storage|========
213       |   * +------------+        + |  servers   |  *    |
214       |   *                         +------------+  *    |
215       |   * * * * * * *                 * * * * * * *    |
216 Control Flow            * * * * * * * *              Data Flow
217       |                                                  |
218       |                                                  |
219       |                    +--------+                    |
220       ---------------------| Client |=====================
221                            +--------+
223 Figure 1
225 With the help of the sync protocol, the three components can
226 communicate with each other. The control server is responsible for
227 storing all the control data, including authentication information
228 and metadata. Once changes are made to synchronized
229 files, the control server notifies the clients. The other
230 type of data, content data, is stored in the form of chunks on the
231 data storage servers, which have no knowledge of sources, users or
232 relationships with other data chunks. As a result, a complete user
233 file is split into small chunks, and those chunks may be stored
234 on several different data storage servers. These two types of
235 servers are separate logical entities and are usually deployed in
236 different locations. Every time the client synchronizes a local file
237 to the Internet, it needs to exchange control data and content data
238 with different types of servers in different flows.
240 4. Problems
242 Existing popular Internet storage services, including Dropbox,
243 OneDrive and Google Drive, use their own proprietary sync
244 protocols to achieve data sync. The use of different proprietary
245 protocols is generally considered detrimental to the
246 development of Internet services. This section describes current
247 problems for Internet storage services caused by their sync
248 protocols. We summarize six specific problems from three different
249 aspects: service usability, protocol capabilities and concurrent work
250 ability. As discussed in Section 1, users often prefer to use multiple
251 storage services for reasons of performance, reliability
252 and security. Usability across multiple services is still
253 lacking to some extent due to the proprietary format of sync
254 protocols. Section 4.1, Section 4.2 and Section 4.3 describe the
255 problems concerned with usability. Moreover, previous
256 work and measurements have revealed that most sync protocols
257 lack key capabilities or configure them
258 poorly, which significantly degrades network performance,
259 especially in mobile and wireless environments. Section 4.4 and
260 Section 4.5 illustrate the problems of current protocol capabilities.
261 In addition, the unsatisfactory concurrent work ability is described
262 in Section 4.6.
264 4.1. Complicated Support for APIs
266 Popular Internet storage services provide APIs that extend access to
267 the content management features in client software for use in third-
268 party applications. In practice, these APIs take care of
269 synchronizing data with Internet storage servers through a familiar,
270 system-like interface. Behind the scenes, the APIs synchronize changes
271 to the server and automatically notify the client when changes are
272 made on other devices. These APIs can also include more advanced
273 features or functions, e.g. revision or restoration of files, to make
274 the client work better. Different providers offer different APIs
275 to developers, and their APIs have different styles and
276 features in order to support different platforms (e.g. Windows and
277 Android).
279 Third-party applications prefer to combine multiple Internet storage
280 services to achieve better performance,
281 reliability and security. However, developers who want to
282 use multiple storage services need to learn the APIs of all
283 service providers in order to design and implement their own clients.
284 Although there are already some successful third-party clients
285 that support multiple services (e.g. ExpanDrive [ExpanDrive], IFTTT
286 [IFTTT]), it is not easy for developers to learn and apply so
287 many different APIs to develop and maintain their third-party
288 clients.
290 4.2. Unavailable Cross-service Sync
292 Synchronizing is one of the most important functions provided by
293 Internet storage services. With this function, files on the
294 Internet can be easily shared and manipulated by different people
295 and groups. Anyone who is permitted to read and download the file is
296 able to modify and upload new versions of this file to the Internet.
298 However, this synchronizing function works well only inside a
299 single service. Users of the same Internet storage service can
300 easily share (i.e. download) files and perform coordinated
301 operations on them. Sync among different Internet storage
302 services, by contrast, is not available. For example, if a Dropbox
303 user wants to work on a cooperative file with a Google Drive user,
304 he can currently only share the file by sending an open HTTP link
305 to it. After clicking on that link, the Google Drive user can read
306 and download the file through HTTP, but cannot modify and update
307 it. This is because the cooperative file is stored on Dropbox
308 servers, and Dropbox and Google Drive use two different
309 proprietary sync protocols. A Google Drive client cannot download
310 or upload the file through Dropbox's sync protocol since it has no
311 knowledge of that protocol. The use of different proprietary sync
312 protocols by different services thus makes cross-service sync
313 unavailable.
318 4.3. Multiple Similar Clients
320 The emergence of more and more Internet storage services provides
321 users with a wide range of choices for storing their local files
322 remotely. As with other Internet applications, users are not
323 restricted to using only one of those services. In fact, they tend to
324 have multiple accounts for different Internet storage services and
325 use them simultaneously. One important reason is that users
326 are always pursuing better functionality. For example, Dropbox is
327 better at file processing, OneDrive is better at interoperability
328 and compatibility with Microsoft Office, while Google Drive
329 performs better with mail attachments. To enable all the desired
330 functions and features, a simple way is to register and use all the
331 desired Internet storage services. Furthermore, people may simply
332 need multiple Internet storage services for larger storage space and
333 higher reliability.
335 However, using different Internet storage services results in a
336 problem: users have to install multiple similar client
337 applications. Since almost all commercial Internet storage services
338 have their own proprietary sync protocols and corresponding client
339 applications, installing and running multiple similar client
340 applications degrades the user experience and also increases the
341 complexity of synchronizing files with different providers' servers
342 on the Internet. For instance, users usually have to repeat the same
343 operations in order to upload the same file to their different
344 service accounts.
346 4.4. Protocol Capability Configurations and Implementations
348 Data sync is not a simple remote file transfer process; it can
349 implement several capabilities to optimize data storage usage and
350 speed up data transmission. There are five well-known
351 capabilities that can be employed by Internet storage services to
352 improve sync efficiency and reliability: chunking, bundling,
353 deduplication, delta-encoding and compression. All these
354 capabilities aim to synchronize user data efficiently
355 via Internet communications.
357 However, the investigation in [Benchmarking] shows that different
358 Internet storage services have different capability configurations
359 and implementations, and most existing Internet storage services do
360 not implement all five capabilities in their sync protocols. Lack
361 of such capabilities does affect sync efficiency. Table 1 from
362 [QuickSync] shows the capability implementations of four
363 popular Internet storage services (i.e. Dropbox, GoogleDrive,
364 OneDrive and Seafile) on Windows OS.
366 +----------------+-------------+-------------+-------------+-------------+
367 | Capabilities | Dropbox | GoogleDrive | OneDrive | Seafile |
368 | | | | | |
369 +----------------+-------------+-------------+-------------+-------------+
370 | Chunking | 4MB | 8MB | Variable | Variable |
371 +----------------+-------------+-------------+-------------+-------------+
372 | Bundling | Yes | No | No | No |
373 +----------------+-------------+-------------+-------------+-------------+
374 | Deduplication | Yes | No | No | Yes |
375 +----------------+-------------+-------------+-------------+-------------+
376 | Delta-encoding | Yes | No | No | No |
377 +----------------+-------------+-------------+-------------+-------------+
378 | Compression | Yes | Yes | No | No |
379 +----------------+-------------+-------------+-------------+-------------+
380 Table 1
382 Measurements and studies from [QuickSync] also reveal that these key
383 capabilities significantly affect sync performance. Most of them
384 should be implemented and well configured to achieve efficient data
385 sync. The remainder of this subsection lists the problems caused by
386 missing or unreasonably configured capabilities.
388 4.4.1. Chunking and Deduplication
390 Chunking is the most widely implemented capability. It simplifies
391 transmission recovery when the sync of a large file is
392 interrupted. Different implementations of chunking have different
393 chunking schemes (i.e. dynamic chunking or static chunking) and chunk
394 sizes. Chunking is closely related to deduplication since
395 deduplication is performed at chunk granularity. Typically, a
396 smaller chunk size and a dynamic chunking scheme (e.g. Content Defined
397 Chunking) are better for detecting and eliminating redundancy.
398 However, the ability to detect more redundancy does not always mean
399 better sync efficiency since it introduces more computation
400 overhead (i.e. finding more redundancy needs more CPU time).
401 An aggressive dynamic chunking scheme (e.g. Content Defined Chunking)
402 performs better in a high-delay (i.e. high RTT) environment, while a
403 fixed-size scheme performs well in good network conditions. A trade-
404 off between computation time and transmission time needs to be
405 considered to achieve effective chunking. A better chunking
406 strategy may be network-aware, which means the sync should be able to
407 employ an appropriate chunking strategy according to its current
408 network condition.
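
The relationship between chunking and deduplication can be illustrated
with a minimal Python sketch. It assumes fixed-size chunking, SHA-256
chunk digests and an in-memory set standing in for the server-side
chunk index. The names fixed_chunks, chunks_to_upload and server_index
are hypothetical and are not taken from any existing service.

```python
import hashlib


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunks_to_upload(data: bytes, size: int, known: set[str]) -> list[bytes]:
    """Return only the chunks whose digest is not yet in the index.

    'known' is a stand-in for the server-side chunk index (an assumption
    of this sketch); it is updated as new chunks are scheduled.
    """
    pending = []
    for chunk in fixed_chunks(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known:
            known.add(digest)
            pending.append(chunk)
    return pending


server_index: set[str] = set()
original = bytes(range(256)) * 4          # 1024 bytes, repeating content

first = chunks_to_upload(original, 64, server_index)    # first sync
second = chunks_to_upload(original, 64, server_index)   # sync again
```

In this toy input the 16 chunks carry only 4 distinct contents, so the
first sync schedules 4 chunks and the second sync schedules none.
Smaller chunks would expose more such redundancy at the cost of more
hashing work, which is the trade-off described above.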
410 4.4.2. Chunking and Delta-encoding
412 Delta-encoding is an algorithm that can be used to find the differing
413 portions of two files and achieve incremental sync. However, not all
414 Internet storage services implement delta-encoding. One possible
415 reason is that most delta-encoding algorithms work at the granularity
416 of files, while Internet storage services often split files into
417 chunks to save storage space and thus reduce cost.
418 Naively piecing together all chunks to reconstruct the
419 whole file to achieve incremental sync would waste massive intra-
420 cluster bandwidth. Therefore, some Internet storage services, e.g.
421 Dropbox, implement delta-encoding at the chunk granularity. The
422 delta-encoding is performed between two chunks, one in the original
423 and one in the modified version, paired according to the chunk offset
424 from the beginning of the file. If a service uses a fixed-size
425 chunking method, some types of modifications, e.g. inserting new data
426 at the head of a file, may leave the two chunks being compared
427 with very little similarity. In this circumstance,
428 delta-encoding is unable to reveal the delta between the original and
429 modified file, so the incremental sync fails. To solve this
430 problem, an improved delta-encoding algorithm with
431 appropriate chunking is needed that keeps incremental sync available
432 in various scenarios.
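
The failure case above can be reproduced with a small Python sketch.
It is not the algorithm of any particular service: it simply pairs
fixed-size chunks by offset and measures their similarity with the
standard difflib module. The input is a deliberately chosen worst
case (exactly one chunk of new data inserted at the head), and the
helper names are hypothetical.

```python
from difflib import SequenceMatcher


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def similarity(a: bytes, b: bytes) -> float:
    """Share of matching content between two byte strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def aligned_similarities(old: bytes, new: bytes, size: int) -> list[float]:
    """Compare chunks pairwise by offset, as an offset-aligned scheme would."""
    pairs = zip(fixed_chunks(old, size), fixed_chunks(new, size))
    return [similarity(a, b) for a, b in pairs]


CHUNK = 64
old_file = bytes(range(256))            # four chunks with disjoint byte values
new_file = b"\xff" * CHUNK + old_file   # one chunk of new data at the head

per_chunk = aligned_similarities(old_file, new_file, CHUNK)
whole_file = similarity(old_file, new_file)
```

Every offset-aligned pair has a similarity of 0.0, so a chunk-level
delta finds nothing to reuse, although the whole-file comparison
shows that the old content is still present in full (a ratio of
about 0.89). A chunking scheme whose boundaries follow the content
rather than the offset is one way to avoid this mismatch.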
434 4.4.3. Bundling
436 Small files are more likely to be modified and synchronized
437 frequently. For example, people usually collaborate on a number of
438 small files (e.g. a project's source code usually consists of multiple
439 small files). In a high-delay environment, synchronizing a large
440 number of small files is not efficient. One reason is that most
441 existing Internet storage services employ a sequential
442 acknowledgement mechanism. Under this mechanism, the next chunk
443 may be transmitted only after the previous chunk's
444 acknowledgement has been received. The sequential acknowledgement
445 mechanism wastes the limited bandwidth since the TCP connection is
446 idle for a long time. Bundling small files together and
447 employing a delayed acknowledgement mechanism can make fuller
448 use of the limited bandwidth, so that the total sync time and traffic
449 overhead can be significantly decreased.
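
The cost of sequential acknowledgement can be estimated with a
back-of-the-envelope model, sketched here in Python. The model is an
assumption of this example: each sequentially acknowledged chunk costs
one round-trip time plus its transmission time, while a bundle pays
the round trip once. It ignores TCP slow start, loss and protocol
headers, and the numbers are illustrative rather than measured.

```python
def sequential_sync_time(sizes: list[int], rtt: float, bandwidth: float) -> float:
    """Each chunk waits for the previous acknowledgement: one RTT per chunk."""
    return sum(rtt + size / bandwidth for size in sizes)


def bundled_sync_time(sizes: list[int], rtt: float, bandwidth: float) -> float:
    """All chunks travel as one bundle: a single RTT for the whole transfer."""
    return rtt + sum(sizes) / bandwidth


small_files = [10_000] * 100    # 100 files of 10 kB each (illustrative)
RTT = 0.2                       # seconds
BANDWIDTH = 1_000_000           # bytes per second

sequential = sequential_sync_time(small_files, RTT, BANDWIDTH)
bundled = bundled_sync_time(small_files, RTT, BANDWIDTH)
```

Under these assumptions the sequential transfer takes about 21
seconds, of which 20 seconds are spent waiting for acknowledgements,
while the bundled transfer takes about 1.2 seconds.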
451 4.5. Sync Protocols in Mobile and Wireless Environments
453 The increasing number of mobile terminals introduces the requirement
454 of synchronizing data on any device via any connectivity at any time
455 and anywhere. A change made to the data on the desktop is
456 required to be automatically transferred to the user's mobile phone
457 or other mobile devices. Based on the measurements from
458 [Look_at_Mobile_Cloud], the problem of missing capabilities is more
459 severe for mobile Internet storage services. The
460 root causes are twofold:
462 First of all, mobile devices have limited storage and computation
463 ability. It is hard to implement all five useful
464 capabilities discussed previously on a mobile client since the
465 implementation of those capabilities brings extra overhead
466 (Table 2 shows the capability implementations on Android OS).
467 The measurement results from [Look_at_Mobile_Cloud] show that none
468 of the existing mobile Internet storage services implements all five
469 key capabilities and only very few of them can be found on a mobile
470 Internet storage client. That explains why most Internet storage
471 services waste limited bandwidth, produce a large amount of useless
472 traffic and suffer long sync times in the mobile environment. How to
473 implement all the desired capabilities with lower requirements for
474 storage and computation resources is a critical problem that needs to
475 be addressed.
476 +----------------+-------------+-------------+-------------+-------------+
477 | Capabilities | Dropbox | GoogleDrive | OneDrive | Seafile |
478 | | | | | |
479 +----------------+-------------+-------------+-------------+-------------+
480 | Chunking | 4MB | 260K | 1MB | No |
481 +----------------+-------------+-------------+-------------+-------------+
482 | Bundling | No | No | No | No |
483 +----------------+-------------+-------------+-------------+-------------+
484 | Deduplication | Yes | No | No | No |
485 +----------------+-------------+-------------+-------------+-------------+
486 | Delta-encoding | No | No | No | No |
487 +----------------+-------------+-------------+-------------+-------------+
488 | Compression | No | No | No | No |
489 +----------------+-------------+-------------+-------------+-------------+
490 Table 2
492 Secondly, existing sync protocols do not handle network disruptions
493 caused by unstable network connections well. For example, some
494 services fail to resume sync if the data transmission is interrupted,
495 or incur too much additional recovery overhead when an exception
496 happens. A well-designed sync protocol that guarantees reliability and
497 efficiency in mobile or wireless networks is expected.
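
One way to limit recovery overhead is to track which chunks have been
acknowledged and to resume from the first unacknowledged chunk after
a disruption. The following Python sketch illustrates the idea under
stated assumptions: send is a hypothetical callable that returns once
a chunk is acknowledged and raises ConnectionError when the link
drops, and FlakyLink is a toy transport used only for demonstration.

```python
from typing import Callable


def sync_with_resume(chunks: list[bytes],
                     send: Callable[[int, bytes], None],
                     max_attempts: int = 5) -> int:
    """Send chunks in order; after a disruption, resume from the first
    unacknowledged chunk. Returns the number of connection attempts used."""
    acked = 0                              # chunks[0:acked] are confirmed
    for attempt in range(1, max_attempts + 1):
        try:
            for index in range(acked, len(chunks)):
                send(index, chunks[index])  # returns once acknowledged
                acked = index + 1
            return attempt
        except ConnectionError:
            continue                       # reconnect, resume at chunks[acked]
    raise ConnectionError("sync did not complete within max_attempts")


class FlakyLink:
    """Toy transport that drops the connection once, on its third send."""

    def __init__(self) -> None:
        self.calls = 0
        self.delivered: list[int] = []

    def send(self, index: int, chunk: bytes) -> None:
        self.calls += 1
        if self.calls == 3:
            raise ConnectionError("link lost")
        self.delivered.append(index)


link = FlakyLink()
attempts = sync_with_resume([b"a", b"b", b"c", b"d", b"e"], link.send)
```

In this run the third send fails, so the sync needs two attempts and
six send calls in total. Only the interrupted chunk is retransmitted,
instead of restarting the whole file.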
499 4.6. Unsatisfactory Concurrent Work Ability
501 With the popularity of Internet storage services, collaborative work
502 is becoming an important feature of such services. This feature is
503 especially important and convenient for a team or an
504 organization since participants can easily retrieve and edit the
505 target file on the Internet. Currently, such collaborative work
506 ability is still unsatisfactory: some common and frequent
507 operations may lead to redundant file versions. More specifically,
508 parallel updates from different end users may result in a version
509 conflict. If two or more users are editing the same file
510 concurrently, it is hard to update the file correctly. To
511 ensure every participant's modification is considered, the
512 typical way is to lock the file and allow other participants to
513 create different versions of the same file. To obtain a final
514 version, participants have to negotiate with each other about their
515 modifications (versions) and merge the final version manually. This
516 reduces work efficiency since people have to
517 spend a lot of time and effort on managing redundant versions and
518 merging a final version.
520 A desirable concurrent work ability is that, when different people are
521 working on the same file, the client automatically creates
522 exclusive versions for its users locally. After they finish
523 and upload to the server, the server automatically merges the
524 different versions into a final version without any human
525 involvement. Furthermore, an even better solution is what
526 [GoogleDocs] does, which provides actual real-time editing. Multiple
527 people can edit the same file and see each other's
528 cursors and real-time operations. Such an ability does help to
529 improve collaborative work but is really challenging when
530 designing a protocol.
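
As an illustration of automatic server-side merging, the following
Python sketch performs a restricted three-way merge. It is a
simplification chosen for this example and not a proposed design: it
assumes that both users edit lines in place, so the base version and
both edited versions have the same number of lines. The function name
merge_lines and the sample data are hypothetical.

```python
def merge_lines(base: list[str], mine: list[str],
                theirs: list[str]) -> tuple[list[str], list[int]]:
    """Three-way merge of in-place line edits.

    Returns the merged lines and the indices of lines that both sides
    changed differently (conflicts that still need human attention).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: list[str] = []
    conflicts: list[int] = []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)        # same edit on both sides, or only mine
        elif m == b:
            merged.append(t)        # only theirs changed
        else:
            merged.append(b)        # both changed differently: keep base
            conflicts.append(i)
    return merged, conflicts


base = ["title", "intro", "body"]
alice = ["Title", "intro", "body"]         # edits line 0
bob = ["title", "intro", "Body text"]      # edits line 2

merged, conflicts = merge_lines(base, alice, bob)
```

Here the two edits touch different lines, so the merge yields
["Title", "intro", "Body text"] with no conflicts. If both users
changed the same line in different ways, that line index would be
reported as a conflict, which is the case that real-time editing
systems resolve with more elaborate techniques.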
532 5. Advantages of Standard Sync Protocol
534 An open and standard sync protocol between client and server can
535 effectively address some problems mentioned above. The sync protocol
536 consists of two types of flows: control flow and data flow. Control
537 flow is between client and control server. It is intended for user
538 authentication, metadata management and also the active notification
539 of data changes. Data flow is between client and data storage
540 servers, which is only for transmitting actual file data (in the form
541 of numerous chunks). The combination of control flow and data flow
542 enables the whole data sync. According to the analysis of problems
543 above, the key capabilities could be supported as optional features
544 in the sync protocol and it would be better if the protocol is
545 network-aware. The rest of this section lists the advantages of
546 employing an open and standard sync protocol.
548 First, with a standard sync protocol, a third-party
549 client that supports multiple Internet storage services is easy to
550 implement since the APIs provided by different providers would be
551 unnecessary or at least simplified. This would attract more
552 people or organizations to develop and implement their own clients
553 (sometimes it is even possible for users themselves to implement their
554 own clients). As a result, users would no longer need multiple clients
555 for multiple services, and their user experience would improve.
556 Furthermore, competition in the (third-party) client market would
557 increase, which is beneficial for users. They would be able to
558 choose their clients flexibly, and frequent client updates would
559 give users more functions and a better user experience.
561 Another advantage of a standard sync protocol is that sync
562 among different services becomes available or at least possible to
563 achieve. If two different services both employ the standard sync
564 protocol, their users could synchronize files with each other using
565 the same standard sync protocol (not the basic HTTP download any
566 more). In this way, users from different services could achieve
567 sharing and coordinated operations on their local files.
569 Using a standard sync protocol also makes it easier to improve
570 Internet storage services. Unlike the existing proprietary formats,
571 a standard sync protocol is fully open and designed by many
572 contributors. People are welcome to revise and improve the standard
573 protocol. We believe that both users and providers will benefit
574 from such a standard sync protocol.
576 6. Understanding of Sync Protocol
578 Client                 Control Server           Data Storage Server
579    |                          |                          |
580    |---meta data, auth info-->|                          |
581    |<-------start sync--------|                          |
582    |     sync preparation     |                          |
583    |                          |                          |
584    |--------------------store/retrieve------------------>|
585    |<--------------------ok/content----------------------|
586    |                         ...                         |
587    |--------------------store/retrieve------------------>|
588    |<--------------------ok/content----------------------|
589    |                  data transmission                  |
590    |                          |                          |
591    |---meta data, ver info--->|                          |
592    |<-----conclude sync-------|                          |
593    |       sync finish        |                          |
594    |                          |                          |
596                           Figure 2
598 Figure 2 shows a preliminary, high-level view of the sync protocol.
599 The whole sync process can be divided into three stages: sync
600 preparation, data transmission and sync finish. In the first
601 stage, the client exchanges its metadata and authentication
602 information with the control server to initiate a sync process.
603 Capabilities such as network-aware chunking and deduplication
604 should be performed during this stage. In the second stage, data
605 transmission, the client sends chunks to, or retrieves chunks from,
606 the data storage servers. To speed up the data sync and make it
607 more reliable, capabilities such as bundling and delta-encoding
608 could be employed. In the last stage, sync finish, the client sends
609 its metadata again so that the control server can check it and
610 conclude the sync process. Version information is also exchanged
611 for version control. It follows that control flow and data
612 flow are closely related and that neither can work without the
613 other.
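
The three stages above can be illustrated with a small in-memory
sketch of an upload. It rests on several assumptions that are not part
of this document: chunks have a tiny fixed size (so the chunking is
not network-aware), deduplication compares SHA-256 digests, the
authentication token "secret" is a placeholder, and the class and
function names (ControlServer, DataServer, sync_file) are hypothetical.
Bundling and delta-encoding are left out.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed size for illustration; not network-aware


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ControlServer:
    """Handles control flow: authentication, metadata and versions."""

    def __init__(self) -> None:
        self.known_chunks: set[str] = set()
        self.versions: dict[str, int] = {}

    def start_sync(self, token: str, digests: list[str]) -> list[str]:
        if token != "secret":  # placeholder credential check
            raise PermissionError("authentication failed")
        # Deduplication: request only chunks that are not stored yet.
        unique = dict.fromkeys(digests)
        return [d for d in unique if d not in self.known_chunks]

    def conclude_sync(self, name: str, digests: list[str]) -> int:
        # Record the final metadata and bump the file version.
        self.known_chunks.update(digests)
        self.versions[name] = self.versions.get(name, 0) + 1
        return self.versions[name]


class DataServer:
    """Handles data flow: stores chunks keyed by digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def store(self, digest: str, chunk: bytes) -> str:
        self.chunks[digest] = chunk
        return "ok"


def sync_file(name, data, token, control, storage):
    """Upload one file; return (chunks sent, new version)."""
    # Stage 1: sync preparation (metadata, auth info, dedup).
    chunks = split_chunks(data)
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_digest = dict(zip(digests, chunks))
    missing = control.start_sync(token, digests)
    # Stage 2: data transmission (only the missing chunks).
    for digest in missing:
        storage.store(digest, by_digest[digest])
    # Stage 3: sync finish (metadata again, version info).
    version = control.conclude_sync(name, digests)
    return len(missing), version
```

In this sketch, syncing b"aaaabbbbaaaa" sends two chunks because the
repeated "aaaa" chunk is deduplicated, and a later sync of
b"aaaabbbbcccc" sends only the one new chunk. Neither server alone can
complete a sync, which matches the observation that control flow and
data flow depend on each other.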
615 7. Related Work in IETF
617 WebDAV ([RFC4918]) provides an alternative way to exchange local data
618 with remote web servers. It can be treated as a previous IETF effort
619 on file collections, authoring and versioning over HTTP. WebDAV
620 mainly focuses on authoring and versioning for distributed web
621 content. The WebDAV protocol extends HTTP to enable users to
622 collaboratively edit and manage files on remote servers. WebDAV
623 thus focuses on distributed work (authoring and versioning), while
624 ISS will focus on data sync. A potential major difference between
625 data sync and distributed authoring/versioning is the frequency of
626 data transmission. In data sync, the client automatically
627 exchanges data with remote servers whenever there are changes. In
628 practice, every time a user saves a file, the client triggers a
629 data sync process. Such frequent data transmission causes a large
630 amount of network traffic, which introduces challenges to the
631 design of sync protocols. A possible solution is to make use of
632 the well-known service capabilities and to make the
633 protocol network-aware to some extent. The ISS protocol suite
634 could build on the WebDAV protocol or on the basic HTTP
635 protocol.
637 8. Security Considerations (TBD)
639 TBD
641 9. Acknowledgements
643 The authors would like to thank Barry Leiba, Mark Nottingham, Julian
644 Reschke, Marc Blanchet, Mike Bishop, Haibin Song, Philip Hallam
645 Baker, Michiel de Jong and Ted Lemon for their valuable comments and
646 contributions to this work.
648 10. Informative References
650 [Batched] Li, Z., Wilson, C., Jiang, Z., Liu, Y., Zhao, B., Jin, C.,
651 Zhang, Z., and Y. Dai, "Efficient Batched Synchronization
652 in Dropbox-Like Cloud Storage Services", Middleware,
653 2013.
655 [Benchmarking]
656 Drago, I., Bocchi, E., Mellia, M., Slatman, H., and A.
657 Pras, "Benchmarking Personal Cloud Storage", IMC, 2013.
659 [ExpanDrive]
660 "ExpanDrive", .
662 [GoogleDocs]
663 "Google Docs",
664 .
666 [IFTTT] "IFTTT", .
668 [Inside_Dropbox]
669 Drago, I., Mellia, M., Munafo, M., Sperotto, A., Sadre,
670 R., and A. Pras, "Inside Dropbox: Understanding Personal
671 Cloud Storage Services", IMC, 2012.
673 [Look_at_Mobile_Cloud]
674 Cui, Y., Lai, Z., and N. Dai, "A First Look at Mobile
675 Cloud Storage Services: Architecture, Experimentation and
676 Challenge", IEEE Network, 2015.
678 [QuickSync]
679 Cui, Y., Lai, Z., Wang, X., Dai, N., and C. Miao,
680 "QuickSync: Improving Synchronization Efficiency for
681 Mobile Cloud Storage Services", MOBICOM, 2015.
683 [RFC4918] Dusseault, L., Ed., "HTTP Extensions for Web Distributed
684 Authoring and Versioning (WebDAV)", RFC 4918,
685 DOI 10.17487/RFC4918, June 2007,
686 .
688 [rsync] "rsync", .
690 [Towards] Li, Z., Jin, C., Xu, T., Wilson, C., Liu, Y., Cheng, L.,
691 Liu, Y., Dai, Y., and Z. Zhang, "Towards Network-level
692 Efficiency for Cloud Storage Services", IMC, 2014.
694 [users] "400 million strong", .
697 Authors' Addresses
699 Yong Cui
700 Tsinghua University
701 Beijing 100084
702 P.R.China
704 Phone: +86-10-6260-3059
705 Email: yong@csnet1.cs.tsinghua.edu.cn
706 Zeqi Lai
707 Tsinghua University
708 Beijing 100084
709 P.R.China
711 Phone: +86-10-6278-5822
712 Email: uestclzq@gmail.com
714 Linhui Sun
715 Tsinghua University
716 Beijing 100084
717 P.R.China
719 Phone: +86-10-6278-5822
720 Email: lh.sunlinh@gmail.com