idnits 2.17.1 

draft-cui-iss-problem-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 28 instances of too long lines in the document, the longest
     one being 3 characters in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 2, 2015) is 3096 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'Batched' is defined on line 650, but no explicit
     reference was found in the text

  == Unused Reference: 'Towards' is defined on line 690, but no explicit
     reference was found in the text


     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                             Y. Cui
3	Internet-Draft                                                    Z. Lai
4	Intended status: Informational                                    L. Sun
5	Expires: May 5, 2016                                 Tsinghua University
6	                                                        November 2, 2015

8	                Internet Storage Sync: Problem Statement
9	                        draft-cui-iss-problem-03

11	Abstract

13	   Internet storage services have become more and more popular.  They
14	   attract a huge number of users and produce a significant share of
15	   Internet traffic.  Most existing Internet storage services make use
16	   of proprietary sync protocols with different capabilities to achieve
17	   the data sync.  However, a single Internet storage service using its
18	   proprietary sync protocols has intrinsic limitations on service
19	   usability and network performance.  This document outlines the
20	   related problems caused by using proprietary sync protocols and
21	   missing key capabilities.  It also shows a demand for designing a
22	   standard sync protocol to achieve better usability and sync
23	   performance.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at http://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on May 5, 2016.

42	Copyright Notice

44	   Copyright (c) 2015 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
60	   2.  Terminology and Concepts  . . . . . . . . . . . . . . . . . .   4
61	   3.  Architecture of Internet Storage Service  . . . . . . . . . .   5
62	   4.  Problems  . . . . . . . . . . . . . . . . . . . . . . . . . .   6
63	     4.1.  Complicated Support for APIs  . . . . . . . . . . . . . .   6
64	     4.2.  Unavailable Cross-service Sync  . . . . . . . . . . . . .   7
65	     4.3.  Multiple Similar Clients  . . . . . . . . . . . . . . . .   7
66	     4.4.  Protocol Capability Configurations and Implementations  .   8
67	       4.4.1.  Chunking and Deduplication  . . . . . . . . . . . . .   9
68	       4.4.2.  Chunking and Delta-encoding . . . . . . . . . . . . .   9
69	       4.4.3.  Bundling  . . . . . . . . . . . . . . . . . . . . . .  10
70	     4.5.  Sync Protocols in Mobile and Wireless Environments  . . .  10
71	     4.6.  Unsatisfactory Concurrent Work Ability  . . . . . . . . .  11
72	   5.  Advantages of Standard Sync Protocol  . . . . . . . . . . . .  12
73	   6.  Understanding of Sync Protocol  . . . . . . . . . . . . . . .  13
74	   7.  Related Work in IETF  . . . . . . . . . . . . . . . . . . . .  14
75	   8.  Security Considerations (TBD) . . . . . . . . . . . . . . . .  14
76	   9.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  14
77	   10. Informative References  . . . . . . . . . . . . . . . . . . .  14
78	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  15

80	1.  Introduction

82	   Internet storage services provide a convenient way for users to
83	   synchronize local files or folders with remote servers.  In recent
84	   years, Internet storage services have gained tremendous popularity
85	   and accounted for a large amount of Internet traffic.  This high
86	   public interest also pushes various providers to enter the Internet
87	   storage market.  Services like Dropbox, Google Drive, OneDrive and
88	   Box are becoming pervasive in people's routine.  Dropbox, typically
89	   considered one of the leading providers, announced in June 2015
90	   that it had more than 400 million registered users [users], and
91	   this number is expected to keep growing.  Internet storage
92	   services enable users to access, modify and share their data from
93	   anywhere, on any device, at any time and over any connectivity.
94	   Internet storage services also provide powerful APIs which allow
95	   third-party applications to offload the burden of data storage and
96	   management to the server.  By aggregating users' files or application
97	   data in the server, Internet storage services are becoming the "data
98	   entrance" for personal users.

100	   The sync protocol is a key design consideration of Internet storage
101	   services.  The sync protocol can be equipped with several
102	   capabilities to optimize the storage usage and speed up data
103	   transmission.  Existing Internet storage services employ their
104	   proprietary sync protocols to store/retrieve user data to/from the
105	   remote servers.  However, using proprietary sync protocols with
106	   different capabilities in different Internet storage services has
107	   intrinsic limitations on service usability and network performance.

109	   Multi-service usability: Users may use multiple Internet storage
110	   services for the diversity of performance and functionality.  In
111	   addition, because an Internet storage service has full access to user
112	   data, that data is at risk when the service is attacked or when
113	   authorities require the provider to disclose it.  Some enterprise
114	   users may want to use their own network-based storage service.
115	   Furthermore, it is complicated for developers to integrate their
116	   applications with Internet storage services that expose different
117	   APIs.  Proprietary protocols also prevent a user of one Internet
118	   storage service from synchronizing data with users of another
119	   service.  Moreover, to use multiple services, a user may have to
120	   install several client applications with similar functionality,
121	   which wastes local resources and degrades the user experience.

123	   Missing or misusing capabilities: Previous work shows that existing
124	   Internet storage services have different capability configurations
125	   and implementations.  These capabilities are closely related to each
126	   other and help to efficiently synchronize user data.  However, most
127	   storage services are found to lack key capabilities or to configure
128	   them unreasonably, which may result in unexpected sync failures and
129	   sync inefficiency.  How to reasonably design and implement
130	   capabilities in the sync protocol has indeed become a critical
131	   problem for the providers.

133	   To address the problems mentioned above, an open and standard sync
134	   protocol is required.  In addition, this standard sync protocol is
135	   expected to support the useful capabilities to avoid unexpected sync
136	   failures and improve network performance.

138	   This document outlines the problems that arise in existing Internet
139	   storage services with various proprietary sync protocols.  Section 2
140	   lists the terminology and related concepts of Internet storage
141	   services.  Section 3 introduces the architecture of existing Internet
142	   storage services.  Section 4 describes the main problems and issues
143	   that need to be considered.  Section 5 explains the advantages of
144	   using an open and standard sync protocol.  Section 6 gives a
145	   high-level understanding of the sync protocol.  Section 7 identifies
146	   the differences between ISS and related work in IETF (i.e.  WebDAV).

148	2.  Terminology and Concepts

150	   Data synchronization (sync): A primary technique for Internet storage
151	   services.  It enables the client to automatically update local file
152	   changes to the remote servers through network communications.

154	   Client: An application which is installed at the user side (i.e. on
155	   multiple terminals).  It enables users to access and experience
156	   Internet storage service.

158	   Control server: The entity that takes the responsibility of
159	   authenticating users, managing metadata information and also
160	   notifying changes to the client.  It stores authentication and
161	   metadata information of users.

163	   Data storage server: The entity that stores the synchronized files of
164	   users.

166	   Control data: The control information exchanged with the control
167	   server to fulfil the data sync process.  Typical control data
168	   includes metadata (e.g. hashes for chunks) and authentication
169	   information.

171	   Content data: The original data of the local file, often in forms of
172	   small chunks.

174	   Sync protocol: A communication protocol between client and remote
175	   servers to achieve data sync.  It contains control flow and data
176	   flow.  Existing sync protocols are typically built on HTTP or HTTPS.

178	   o  Control flow: This flow is for client and control server to
179	      exchange control data.

181	   o  Data flow: This flow is for transmitting content data between
182	      client and data storage servers.

184	   Sync efficiency: A performance metric that indicates how fast the
185	   changes can be synchronized to the Internet with the lowest traffic
186	   overhead.

188	   Useful capabilities to improve sync efficiency:

190	   o  Chunking: Split a large file into small chunks.

192	   o  Bundling: Transmit multiple small chunks as a single big chunk.

194	   o  Deduplication: Avoid retransmission of existing content on the
195	      Internet.

197	   o  Delta-encoding: Only synchronize modified data.

199	   o  Compression: Compress data before transmission.
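
   As a rough illustration of how three of these capabilities fit
   together, the following Python sketch applies fixed-size chunking,
   hash-based deduplication and compression to a byte string.  The
   function names, the 4-byte chunk size and the use of SHA-256 and
   zlib are assumptions made for this example only; they do not
   describe any existing service.

```python
import hashlib
import zlib


def chunk(data: bytes, size: int) -> list[bytes]:
    """Chunking: split data into fixed-size pieces (last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_upload(data: bytes, known_hashes: set[str],
                size: int = 4) -> list[tuple[str, bytes]]:
    """Return (hash, compressed chunk) pairs for chunks not yet stored.

    `known_hashes` stands for the chunk hashes the server already
    holds (a hypothetical input, e.g. learned through control data).
    """
    to_send = []
    seen = set(known_hashes)
    for piece in chunk(data, size):
        digest = hashlib.sha256(piece).hexdigest()
        if digest in seen:
            continue  # Deduplication: this content is already stored.
        seen.add(digest)
        to_send.append((digest, zlib.compress(piece)))  # Compression.
    return to_send
```

   In this sketch a chunk that repeats inside the same file, or that
   the server already holds, is never transmitted a second time.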

201	3.  Architecture of Internet Storage Service

203	   The architecture of most Internet storage services is generally
204	   composed of three major components: client, control server and data
205	   storage server.  The whole architecture is shown in Figure 1.

207	                           * * * * * * * *
208	              * * * * * * *               * * * * * * *
209	            *                                 INTERNET  *
210	            *  +------------+        +------------+     *
211	         ------|   Control  |        | +------------+    *
212	        |  *   |   server   |        | |Data storage|========
213	        |   *  +------------+        + |   servers  |   *    |
214	        |   *                          +------------+   *    |
215	        |     * * * * * * *                * * * * * * *     |
216	   Control Flow            * * * * * * * *               Data Flow
217	        |                                                    |
218	        |                                                    |
219	        |                     +--------+                     |
220	         ---------------------| Client |=====================
221	                              +--------+

223	                               Figure 1

225	   With the help of the sync protocol, the three components can
226	   communicate with each other.  The control server is responsible for
227	   storing all the control data, including authentication information
228	   and metadata.  Once changes are made to synchronized files, the
229	   control server notifies the clients.  The other type of data,
230	   content data, is stored in the form of chunks on the data storage
231	   servers with no knowledge of sources, users or relationships with
232	   other data chunks.  As a result, a complete user file is split into
233	   small chunks, and those chunks may be stored on several different
234	   data storage servers.  These two types of servers are separate
235	   logical entities and are usually deployed in different locations.
236	   Every time the client synchronizes a local file to the Internet, it
237	   needs to exchange control data and content data with different
238	   types of servers in different flows.
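
   The separation of the two flows can be sketched in Python as
   follows.  This is a minimal in-memory model under stated
   assumptions, not a description of any deployed service: the class
   and function names, the round-robin placement of chunks and the
   4-byte chunk size are all hypothetical.

```python
import hashlib


class ControlServer:
    """Hypothetical control server: keeps per-file metadata only."""

    def __init__(self):
        self.metadata = {}  # file name -> ordered list of chunk hashes

    def commit(self, name, chunk_hashes):
        self.metadata[name] = list(chunk_hashes)


class DataStorageServer:
    """Hypothetical data storage server: chunks keyed by hash."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> bytes; no file or user info

    def put(self, digest, chunk):
        self.chunks[digest] = chunk


def sync_file(name, data, control, stores, chunk_size=4):
    """Upload one file: chunks to the stores, metadata to control."""
    hashes = []
    starts = range(0, len(data), chunk_size)
    for index, start in enumerate(starts):
        piece = data[start:start + chunk_size]
        digest = hashlib.sha256(piece).hexdigest()
        # Data flow: chunks may land on different storage servers.
        stores[index % len(stores)].put(digest, piece)
        hashes.append(digest)
    # Control flow: only the chunk list goes to the control server.
    control.commit(name, hashes)


def restore_file(name, control, stores):
    """Rebuild a file from the control server's chunk list."""
    merged = {}
    for store in stores:
        merged.update(store.chunks)
    return b"".join(merged[d] for d in control.metadata[name])
```

   Note that a data storage server in this model cannot reassemble a
   file on its own, since the chunk order is known only to the control
   server.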

240	4.  Problems

242	   Existing popular Internet storage services, including Dropbox,
243	   OneDrive and GoogleDrive, use their own proprietary sync protocols
244	   to achieve the data sync.  The use of different proprietary
245	   protocols is generally considered not to be beneficial to the
246	   development of Internet services.  This section describes current
247	   problems for Internet storage services caused by their sync
248	   protocols.  We summarize six specific problems from three different
249	   aspects: service usability, protocol capabilities and concurrent work
250	   ability.  As discussed in Section 1, users prefer to use multiple
251	   storage services for reasons of performance, reliability and
252	   security.  Service usability across multiple services is still
253	   lacking to some extent due to the proprietary nature of sync
254	   protocols.  Section 4.1, Section 4.2 and Section 4.3 describe the
255	   problems concerned with usability.  Moreover, previous work and
256	   measurements have revealed that most sync protocols lack key
257	   service capabilities or do not configure them well, which
258	   significantly degrades network performance, especially in mobile
259	   and wireless environments.  Section 4.4 and Section 4.5 illustrate
260	   the problems of current protocol capabilities.  In addition, the
261	   unsatisfactory concurrent work ability is described in
262	   Section 4.6.

264	4.1.  Complicated Support for APIs

266	   Popular Internet storage services provide APIs that extend access to
267	   the content management features in client software for use in third-
268	   party applications.  In practice, these APIs take care of
269	   synchronizing data with Internet storage servers in a familiar
270	   system-like way.  Behind the scenes, the APIs synchronize changes to
271	   the server and automatically notify the client when changes are made
272	   on other devices.  These APIs can also include more advanced
273	   features or functions, e.g. revision or restoration of files, to
274	   make the client work better.  Different providers offer developers
275	   different APIs, and these APIs have different styles and features
276	   in order to support different platforms (e.g.  Windows and
277	   Android).

279	   Third-party applications prefer to combine multiple Internet storage
280	   services into their applications to achieve better performance,
281	   reliability and security.  However, developers who want to use
282	   multiple storage services need to learn the APIs of all the service
283	   providers in order to design and implement their own clients.
284	   Although there are already some successful third-party clients
285	   that support multiple services (e.g.  ExpanDrive [ExpanDrive], IFTTT
286	   [IFTTT]), it is not easy for developers to learn and apply so
287	   many different APIs to develop and maintain their third-party
288	   clients.

290	4.2.  Unavailable Cross-service Sync

292	   Synchronizing is one of the most important functions provided by
293	   Internet storage services.  With this function provided, files in the
294	   Internet could be easily shared and manipulated by different people
295	   and groups.  Anyone who is permitted to read and download the file is
296	   able to modify and upload new versions of this file to the Internet.

298	   However, this synchronizing function works well only inside a
299	   single service.  Users of the same Internet storage service can
300	   easily share (i.e. download) and coordinate operations on their
301	   files.  Synchronization among different Internet storage services,
302	   in contrast, is not available.  For example, if a Dropbox user
303	   currently wants to work on a cooperative file with a Google Drive
304	   user, he is only able to share this file by sending an open HTTP
305	   link to it.  After clicking on that link, the Google Drive user can
306	   only read and download the file through HTTP.  He cannot modify and
307	   update the shared file, because the cooperative file is stored on
308	   Dropbox servers and Dropbox and Google Drive use two different
309	   proprietary sync protocols.  A Google Drive client cannot download
310	   or upload the file through Dropbox's sync protocol since it has no
311	   knowledge of that protocol.  The use of different proprietary sync
312	   protocols by different services results in this unavailability.

318	4.3.  Multiple Similar Clients

320	   The emergence of more and more Internet storage services provides
321	   users with a wide range of choices for storing their local files
322	   remotely.  As with other Internet applications, users are not
323	   restricted to using only one of those services.  In fact, they tend
324	   to have multiple accounts for different Internet storage services
325	   and use them simultaneously.  One important reason is that users
326	   are always pursuing better functionality.  For example, Dropbox is
327	   better at file processing, OneDrive is better at interoperability
328	   and compatibility with Microsoft Office, while GoogleDrive performs
329	   better with mail attachments.  To enable all the desired
330	   functions and features, a simple way is to register and use all the
331	   desired Internet storage services.  Furthermore, people may simply
332	   need multiple Internet storage services for larger storage space and
333	   higher reliability.

335	   However, using different Internet storage services results in a
336	   problem: users have to install multiple similar client
337	   applications.  Since almost all commercial Internet storage services
338	   have their own proprietary sync protocols and corresponding client
339	   applications, installing and running multiple similar client
340	   applications degrades the user experience and also increases the
341	   complexity of synchronizing files with different providers' servers
342	   on the Internet.  For instance, users usually have to repeat the
343	   same operations in order to upload the same file to their different
344	   service accounts.

346	4.4.  Protocol Capability Configurations and Implementations

348	   Data sync is not a simple remote file transfer process; it can
349	   implement several capabilities to optimize the data storage usage and
350	   speed up data transmissions.  There are five well-known
351	   capabilities that can be employed by Internet storage services to
352	   improve the sync efficiency and reliability: chunking, bundling,
353	   deduplication, delta-encoding and compression.  All these
354	   capabilities aim to help synchronize user data efficiently
355	   via Internet communications.

357	   However, the investigation in [Benchmarking] shows that different
358	   Internet storage services have different capability configurations
359	   and implementations, and most existing Internet storage services do
360	   not implement all five capabilities in their sync protocols.  Lack
361	   of such capabilities does affect the sync efficiency.  Table 1, from
362	   [QuickSync], shows the capability implementations of four popular
363	   Internet storage services (i.e.  Dropbox, GoogleDrive, OneDrive and
364	   Seafile) on Windows OS.

366	 +----------------+-------------+-------------+-------------+-------------+
367	 |  Capabilities  |   Dropbox   | GoogleDrive |   OneDrive  |   Seafile   |
368	 |                |             |             |             |             |
369	 +----------------+-------------+-------------+-------------+-------------+
370	 |    Chunking    |     4MB     |     8MB     |   Variable  |   Variable  |
371	 +----------------+-------------+-------------+-------------+-------------+
372	 |    Bundling    |     Yes     |      No     |      No     |      No     |
373	 +----------------+-------------+-------------+-------------+-------------+
374	 |  Deduplication |     Yes     |      No     |      No     |     Yes     |
375	 +----------------+-------------+-------------+-------------+-------------+
376	 | Delta-encoding |     Yes     |      No     |      No     |      No     |
377	 +----------------+-------------+-------------+-------------+-------------+
378	 |   Compression  |     Yes     |     Yes     |      No     |      No     |
379	 +----------------+-------------+-------------+-------------+-------------+
380	                                   Table 1

382	   Measurements and studies from [QuickSync] also reveal that those key
383	   capabilities significantly affect the sync performance.  Most of them
384	   should be implemented and well configured to achieve efficient data
385	   sync.  The remaining part of this subsection lists the problems
386	   caused by insufficient or unreasonably configured capabilities.

388	4.4.1.  Chunking and Deduplication

390	   Chunking is the most widely implemented capability; it simplifies
391	   transmission recovery when the sync of a large file is
392	   interrupted.  Different implementations of chunking have different
393	   chunking schemes (i.e. dynamic chunking or static chunking) and chunk
394	   sizes.  Chunking is closely related to deduplication since
395	   deduplication is performed at the chunk granularity.  Typically, a
396	   smaller chunk size and a dynamic chunking scheme (e.g.  Content
397	   Defined Chunking) are better for detecting and eliminating
398	   redundancy.  However, the ability to detect more redundancy does not
399	   always translate into better sync efficiency since it introduces
400	   more computation overhead (i.e. finding more redundancy needs more
401	   CPU time).  An aggressive dynamic chunking scheme (e.g.  Content
402	   Defined Chunking) performs better in a high-delay (i.e. high RTT)
403	   environment, while a fixed-size scheme performs well in good network
404	   conditions.  A trade-off between computation time and transmission
405	   time needs to be considered to achieve effective chunking.  A better
406	   chunking strategy may be network-aware, which means the sync should
407	   be able to employ an appropriate chunking strategy according to its
408	   current network conditions.
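
   The difference between static and dynamic chunking with respect to
   deduplication can be illustrated with a small Python sketch.  The
   boundary rule used here (cut where the sum of the last three bytes
   is a multiple of eight) is a toy stand-in chosen for readability;
   real Content Defined Chunking schemes typically use a rolling hash.
   All names and parameters are assumptions made for this example.

```python
def fixed_chunks(data: bytes, size: int = 16) -> list[bytes]:
    """Static chunking: cut every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, window: int = 3,
               divisor: int = 8) -> list[bytes]:
    """Toy content-defined chunking.

    A chunk ends wherever the sum of the last `window` bytes is a
    multiple of `divisor`, so boundaries follow content, not offset.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if sum(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def reusable(old: list[bytes], new: list[bytes]) -> int:
    """Number of distinct chunks shared by the two versions."""
    return len(set(old) & set(new))


original = bytes(range(200))
modified = b"X" + original  # one byte inserted at the head

fixed_shared = reusable(fixed_chunks(original), fixed_chunks(modified))
cdc_shared = reusable(cdc_chunks(original), cdc_chunks(modified))
print(fixed_shared, cdc_shared)
```

   For this constructed input, the fixed-size scheme shares no chunk
   between the two versions, whereas the content-defined scheme shares
   24 of the 25 original chunks.  The price is that the dynamic scheme
   inspects a window at every byte position, which is the extra
   computation overhead mentioned above.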

410	4.4.2.  Chunking and Delta-encoding

412	   Delta-encoding is an algorithm that can be used to find the portions
413	   in which two files differ and thereby achieve incremental sync.
414	   However, not all Internet storage services implement delta-encoding.
415	   One possible reason is that most delta-encoding algorithms work at
416	   the granularity of files, while Internet storage services often
417	   split files into chunks in order to save storage space and thus
418	   reduce cost.  Naively piecing together all chunks to reconstruct the
419	   whole file to achieve incremental sync would waste massive intra-
420	   cluster bandwidth.  Therefore, some Internet storage services, e.g.
421	   Dropbox, implement delta-encoding at the chunk granularity.  The
422	   delta-encoding is performed between two chunks, one from the
423	   original and one from the modified version, that have the same
424	   offset from the beginning of the file.  If a service uses a
425	   fixed-size chunking method, some types of modifications, e.g.
426	   inserting new data at the head of a file, may leave the two chunks
427	   used for delta-encoding with very little similarity.  In this
428	   circumstance, delta-encoding is unable to reveal the delta between
429	   the original and modified file, so the incremental sync fails.  To
430	   solve the problem, we need to design an improved delta-encoding
431	   algorithm with appropriate chunking that makes incremental sync
432	   available in various scenarios.
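
   The failure case described above can be reproduced with a short
   Python sketch.  It uses difflib.SequenceMatcher from the Python
   standard library as a stand-in for a delta-encoding algorithm; this
   is an assumption made for illustration and not the algorithm used
   by any particular service.  The helper names and the 16-byte chunk
   size are likewise hypothetical.

```python
from difflib import SequenceMatcher


def unmatched_bytes(old: bytes, new: bytes) -> int:
    """Bytes of `new` not covered by blocks shared with `old`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    blocks = matcher.get_matching_blocks()
    return len(new) - sum(block.size for block in blocks)


def chunked_delta(old: bytes, new: bytes, size: int) -> int:
    """Delta size when chunks are paired by offset, as in Section 4.4.2."""
    total = 0
    for start in range(0, len(new), size):
        old_chunk = old[start:start + size]  # empty past end of old
        new_chunk = new[start:start + size]
        total += unmatched_bytes(old_chunk, new_chunk)
    return total


old_file = bytes(range(64))
inserted = bytes(range(100, 116))  # 16 new bytes at the head
new_file = inserted + old_file

file_level = unmatched_bytes(old_file, new_file)
chunk_level = chunked_delta(old_file, new_file, size=16)
print(file_level, chunk_level)
```

   In this constructed case the file-level comparison finds that only
   the 16 inserted bytes are new, while the offset-paired chunk-level
   comparison finds no similarity at all and would retransmit all 80
   bytes of the modified file.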

434	4.4.3.  Bundling

436	   Small files are more likely to be modified and synchronized
437	   frequently.  For example, people usually collaborate on a number of
438	   small files (e.g. a project's source code usually consists of many
439	   small files).  In a high-delay environment, synchronizing a large
440	   number of small files is not efficient.  One reason is that most
441	   existing Internet storage services employ a sequential
442	   acknowledgement mechanism.  Under this mechanism, the next chunk
443	   cannot be transmitted until the previous chunk's
444	   acknowledgement has been received.  The sequential acknowledgement
445	   mechanism wastes the limited bandwidth since the TCP connection is
446	   idle for a long time.  Bundling small files together and employing
447	   a delayed acknowledgement mechanism can make fuller use of the
448	   limited bandwidth, so that the whole sync time and traffic overhead
449	   can be significantly decreased.
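
   A back-of-the-envelope model makes the cost of sequential
   acknowledgement visible.  The Python sketch below is a deliberate
   simplification: it ignores TCP slow start, per-request headers and
   server processing time, and the function and parameter names are
   assumptions made for this example.

```python
def sequential_sync_time(n_files: int, file_bytes: int,
                         rtt_s: float, bandwidth_bps: float) -> float:
    """Each file is sent only after the previous one is acknowledged."""
    transfer = file_bytes * 8 / bandwidth_bps
    return n_files * (transfer + rtt_s)


def bundled_sync_time(n_files: int, file_bytes: int,
                      rtt_s: float, bandwidth_bps: float) -> float:
    """All files travel as one bundle with a single acknowledgement."""
    transfer = n_files * file_bytes * 8 / bandwidth_bps
    return transfer + rtt_s


# 100 files of 10,000 bytes over an 8 Mbit/s link with a 200 ms RTT.
args = (100, 10_000, 0.2, 8_000_000)
slow = sequential_sync_time(*args)
fast = bundled_sync_time(*args)
print(round(slow, 2), round(fast, 2))
```

   Under these assumed numbers the sequential scheme needs about 21
   seconds, of which 20 seconds are spent waiting for
   acknowledgements, while the bundled transfer needs about 1.2
   seconds.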

451	4.5.  Sync Protocols in Mobile and Wireless Environments

453	   The increasing number of mobile terminals introduces the requirement
454	   of synchronizing data on any device via any connectivity at any time
455	   and anywhere.  A change made to the data on the desktop is
456	   required to be automatically transferred to the user's mobile phone
457	   or other mobile devices.  Based on the measurements from
458	   [Look_at_Mobile_Cloud], the problem of missing capabilities is more
459	   severe for mobile Internet storage services.  The root causes are
460	   twofold:

462	   First of all, mobile devices have limited storage and computation
463	   ability, so it is hard to implement all five useful capabilities
464	   discussed previously on a mobile client, since implementing those
465	   capabilities brings extra overhead (Table 2 shows the capability
466	   implementations on Android OS).  The measurement results from
467	   [Look_at_Mobile_Cloud] show that none of the existing mobile
468	   Internet storage services implements all five key capabilities, and
469	   only very few of the capabilities can be found on a mobile Internet
470	   storage client.  That explains why most Internet storage services
471	   waste limited bandwidth, produce large amounts of useless traffic
472	   and suffer long sync times in the mobile environment.  How to
473	   implement all the desired capabilities with lower storage and
474	   computation requirements is a critical problem to be addressed.

476	 +----------------+-------------+-------------+-------------+-------------+
477	 |  Capabilities  |   Dropbox   | GoogleDrive |   OneDrive  |   Seafile   |
478	 |                |             |             |             |             |
479	 +----------------+-------------+-------------+-------------+-------------+
480	 |    Chunking    |     4MB     |     260K    |     1MB     |      No     |
481	 +----------------+-------------+-------------+-------------+-------------+
482	 |    Bundling    |      No     |      No     |      No     |      No     |
483	 +----------------+-------------+-------------+-------------+-------------+
484	 |  Deduplication |     Yes     |      No     |      No     |      No     |
485	 +----------------+-------------+-------------+-------------+-------------+
486	 | Delta-encoding |      No     |      No     |      No     |      No     |
487	 +----------------+-------------+-------------+-------------+-------------+
488	 |   Compression  |      No     |      No     |      No     |      No     |
489	 +----------------+-------------+-------------+-------------+-------------+
490	                                   Table 2

492	   Secondly, sync protocols cannot handle network disruptions caused by
493	   unstable network connections well.  For example, some services fail
494	   to resume sync if the data transmission is interrupted, or incur too
495	   much additional recovery overhead when an exception happens.  A
496	   well-designed sync protocol that guarantees reliability and
497	   efficiency in mobile or wireless networks is expected.
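
   One way to keep recovery overhead low is to remember which chunks
   the server has acknowledged and to resume from there after a
   disruption.  The following Python sketch shows the idea with a
   hypothetical in-memory server that drops the connection once; the
   class, the function names and the retry limit are assumptions made
   for this example.

```python
class FlakyServer:
    """Hypothetical server that loses the link after `fail_after` puts."""

    def __init__(self, fail_after):
        self.chunks = {}
        self.puts = 0
        self.fail_after = fail_after

    def put(self, index, chunk):
        if self.puts == self.fail_after:
            self.fail_after = None  # Simulate a single disruption.
            raise ConnectionError("link lost")
        self.puts += 1
        self.chunks[index] = chunk


def upload(chunks, server, acked):
    """Send only chunks that have not been acknowledged yet."""
    for index, chunk in enumerate(chunks):
        if index in acked:
            continue
        server.put(index, chunk)
        acked.add(index)  # Record the acknowledgement.


def upload_with_resume(chunks, server, max_attempts=3):
    """Retry after a disruption without resending acknowledged chunks."""
    acked = set()
    for _ in range(max_attempts):
        try:
            upload(chunks, server, acked)
            return acked
        except ConnectionError:
            continue  # Resume from the last acknowledged chunk.
    raise ConnectionError("gave up after repeated disruptions")
```

   With four chunks and a disruption after the second one, the server
   in this sketch receives exactly four successful transfers in total,
   i.e. nothing that was already acknowledged is sent again.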

499	4.6.  Unsatisfactory Concurrent Work Ability

501	   With the popularity of Internet storage services, collaborative work
502	   is becoming an important feature of such services.  This feature is
503	   especially important and convenient for a team or an
504	   organization since participants can easily retrieve and edit the
505	   target file on the Internet.  Currently, such collaborative work
506	   ability is still unsatisfactory: some common and frequent
507	   operations may lead to redundant file versions.  More specifically,
508	   parallel updates from different end users may result in a version
509	   conflict.  If two or more users are editing the same file
510	   concurrently, it is hard to update the file correctly.  To
511	   ensure that every participant's modification is considered, the
512	   typical way is to lock the file and let other participants
513	   create different versions of the same file.  To obtain a final
514	   version, participants have to negotiate with each other about their
515	   modifications (versions) and merge the final version manually.  This
516	   reduces work efficiency since people have to spend a lot of
517	   time and effort on managing redundant versions and merging a final
518	   version.

520	   A desirable concurrent work ability is as follows: when different
521	   people are working on the same file, each client automatically
522	   creates an exclusive version for its user locally.  After the
523	   users finish and upload their versions, the server automatically
524	   merges the different versions into a final version without any
525	   human involvement.  An even better solution is what [GoogleDocs]
526	   does, which provides actual real-time editing: multiple people can
527	   edit the same file and see each other's cursors and operations in
528	   real time.  Such an ability does improve collaborative work, but
529	   supporting it is really challenging when designing a
530	   protocol.
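
   To make the idea of server-side automatic merging concrete, the
   following is a minimal sketch of a line-wise three-way merge.  It
   assumes that both users started from the same base version and
   only edited lines in place (no insertions or deletions), which is
   a strong simplification.  The function name merge_three_way is
   hypothetical, and this is not the method used by any particular
   service.

```python
from typing import List, Tuple


def merge_three_way(base: List[str], ours: List[str],
                    theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Merge two edited versions of the same base, line by line.

    Returns the merged lines and the indices of conflicting lines.
    A line conflicts when both versions changed it differently; the
    base line is kept there so that a human can resolve it later.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: List[str] = []
    conflicts: List[int] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)      # same content on both sides
        elif o == b:
            merged.append(t)      # only the other user changed it
        elif t == b:
            merged.append(o)      # only this user changed it
        else:
            merged.append(b)      # both changed it differently
            conflicts.append(index)
    return merged, conflicts
```

   When two users change different lines, the merge completes without
   human involvement.  When both change the same line, the sketch
   reports the line index instead of silently choosing one side.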

532	5.  Advantages of Standard Sync Protocol

534	   An open and standard sync protocol between client and server can
535	   effectively address some of the problems mentioned above.  The sync
536	   protocol consists of two types of flows: control flow and data flow.
537	   Control flow runs between the client and the control server.  It is
538	   intended for user authentication, metadata management and the active
539	   notification of data changes.  Data flow runs between the client and
540	   the data storage servers, and is used only for transmitting actual
541	   file data (in the form of numerous chunks).  Together, control flow
542	   and data flow carry out the whole data sync.  According to the
543	   analysis of the problems above, the key capabilities could be
544	   supported as optional features in the sync protocol, and it would be
545	   better if the protocol were network-aware.  The rest of this section
546	   lists the advantages of employing an open and standard sync protocol.
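
   The split between control flow and data flow can be illustrated
   with a small sketch.  It assumes fixed-size chunking and SHA-256
   chunk digests; neither is specified by this document.  The names
   build_manifest and reassemble are hypothetical.  The metadata
   would travel on the control flow, while the chunk contents would
   travel on the data flow.

```python
import hashlib
from typing import Dict, List, Tuple


def build_manifest(data: bytes,
                   chunk_size: int) -> Tuple[dict, Dict[str, bytes]]:
    """Return (metadata, chunks) for one file.

    metadata: what a client could send to the control server.
    chunks:   digest -> content, what it could send to storage.
    """
    order: List[str] = []
    chunks: Dict[str, bytes] = {}
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        digest = hashlib.sha256(piece).hexdigest()
        order.append(digest)
        chunks[digest] = piece  # identical chunks are kept only once
    metadata = {"size": len(data), "chunk_size": chunk_size,
                "chunks": order}
    return metadata, chunks


def reassemble(metadata: dict, chunks: Dict[str, bytes]) -> bytes:
    """Rebuild the file from its metadata and the stored chunks."""
    return b"".join(chunks[digest] for digest in metadata["chunks"])
```

   Because the metadata lists chunks by digest, a repeated chunk is
   referenced twice in the metadata but needs to be transmitted only
   once on the data flow.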

548	   First, with a standard sync protocol, a third-party client that
549	   supports multiple Internet storage services is easy to implement,
550	   since the APIs provided by different providers would be unnecessary
551	   or at least simplified.  This would attract more people and
552	   organizations to develop and implement their own clients (it may
553	   even be possible for users to implement clients themselves).  As a
554	   result, users would no longer need multiple clients for multiple
555	   services, and their user experience would improve.  Furthermore,
556	   competition in the (third-party) client market would increase,
557	   which benefits users.  They would be able to choose their clients
558	   flexibly, and frequent client updates would give them more
559	   functions and a better user experience.

561	   Another advantage of a standard sync protocol is that sync among
562	   different services becomes available, or at least possible to
563	   achieve.  If two different services both employ the standard sync
564	   protocol, their users could synchronize files with each other using
565	   the same standard sync protocol (rather than basic HTTP download).
566	   In this way, users of different services could share files and
567	   perform coordinated operations on their local files.

569	   Using a standard sync protocol also makes it easier to improve
570	   Internet storage services.  In contrast to the existing proprietary
571	   formats, a standard sync protocol is fully open and designed by many
572	   contributors.  People are welcome to revise and improve the standard
573	   protocol.  We believe that both users and providers will benefit
574	   greatly from such a standard sync protocol.

576	6.  Understanding of Sync Protocol

578	    Client                 Control Server           Data Storage Server
579	       |                          |                          |
580	       |---meta data, auth info-->|                          |
581	       |<-------start sync--------|                          |
582	       |     sync preparation     |                          |
583	       |                          |                          |
584	       |--------------------store/retrieve------------------>|
585	       |<--------------------ok/content----------------------|
586	       |                         ...                         |
587	       |--------------------store/retrieve------------------>|
588	       |<--------------------ok/content----------------------|
589	       |                   data transmission                 |
590	       |                          |                          |
591	       |---meta data, ver info--->|                          |
592	       |<-----conclude sync-------|                          |
593	       |        sync finish       |                          |
594	       |                          |                          |

596	                               Figure 2

598	   Figure 2 shows a preliminary, high-level understanding of the sync
599	   protocol.  The whole sync process can be divided into three stages:
600	   sync preparation, data transmission and sync finish.  In the first
601	   stage, the client exchanges its metadata and authentication
602	   information with the control server to initiate a sync process.
603	   During this stage, capabilities such as network-aware chunking and
604	   deduplication should be performed.  In the second stage, data
605	   transmission, the client sends/retrieves chunks to/from the data
606	   storage servers.  To speed up the data sync and make it more
607	   reliable, capabilities such as bundling and delta-encoding could be
608	   employed.  When the sync finishes (i.e., the sync finish stage), the
609	   client sends its metadata again so that the control server can check
610	   and conclude the sync process.  Some version information is also
611	   exchanged for version control.  From this understanding we can
612	   conclude that the control flow and the data flow are closely related
613	   and cannot work without each other.
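
   The three stages of Figure 2 can be sketched with in-memory
   stand-ins for the two servers.  This is only an illustration under
   stated assumptions: fixed-size chunks, SHA-256 digests as chunk
   identifiers, a shared token for authentication, and a control
   server that can see which chunks the storage server holds.  All
   class, method and parameter names are hypothetical and are not
   part of any defined protocol.

```python
import hashlib
from typing import Dict, List


class DataStorageServer:
    """Hypothetical chunk store addressed by chunk digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def store(self, digest: str, chunk: bytes) -> str:
        self.chunks[digest] = chunk
        return "ok"


class ControlServer:
    """Hypothetical control server: auth, metadata, versions."""

    def __init__(self, storage: DataStorageServer, token: str) -> None:
        self.storage = storage
        self.token = token
        self.files: Dict[str, List[str]] = {}
        self.versions: Dict[str, int] = {}

    def start_sync(self, token: str, digests: List[str]) -> List[str]:
        # Stage 1: authenticate, then deduplicate against known chunks.
        if token != self.token:
            raise PermissionError("authentication failed")
        missing: List[str] = []
        for digest in digests:
            if digest not in self.storage.chunks and digest not in missing:
                missing.append(digest)
        return missing

    def conclude_sync(self, name: str, digests: List[str]) -> int:
        # Stage 3: check that every chunk arrived, then bump the version.
        if any(d not in self.storage.chunks for d in digests):
            raise RuntimeError("sync incomplete: chunks missing")
        self.files[name] = list(digests)
        self.versions[name] = self.versions.get(name, 0) + 1
        return self.versions[name]


def sync_file(name: str, data: bytes, token: str,
              control: ControlServer, storage: DataStorageServer,
              chunk_size: int = 4) -> Dict[str, int]:
    """Run sync preparation, data transmission and sync finish."""
    pieces = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)]
    digests = [hashlib.sha256(p).hexdigest() for p in pieces]
    by_digest = dict(zip(digests, pieces))
    missing = control.start_sync(token, digests)    # sync preparation
    for digest in missing:                          # data transmission
        storage.store(digest, by_digest[digest])
    version = control.conclude_sync(name, digests)  # sync finish
    return {"uploaded": len(missing), "version": version}
```

   In this sketch the data transmission stage cannot start without
   the list of missing chunks from the control server, and the sync
   cannot be concluded until the data storage server holds every
   chunk, which mirrors the dependency between the two flows.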

615	7.  Related Work in IETF

617	   WebDAV ([RFC4918]) provides an alternative way to exchange local data
618	   with remote web servers.  It can be regarded as a previous IETF
619	   effort on file collections, authoring and versioning over HTTP.
620	   WebDAV mainly focuses on authoring and versioning for distributed
621	   web content.  The WebDAV protocol extends HTTP to enable users to
622	   collaboratively edit and manage files on remote servers.  WebDAV
623	   focuses on distributed work (authoring and versioning), while ISS
624	   will focus on data sync.  A potential major difference between data
625	   sync and distributed authoring/versioning is the frequency of data
626	   transmission.  In data sync, the client automatically exchanges
627	   data with remote servers whenever there are any changes.  In
628	   practice, every time a user performs a 'save' operation on a file,
629	   the client triggers a data sync process.  Such frequent data
630	   transmission causes a large amount of network traffic, which
631	   introduces challenges to the design of sync protocols.  A possible
632	   solution is to make use of the well-known service capabilities and
633	   to make the protocol network-aware to some extent.  The ISS
634	   protocol suite could build on the WebDAV protocol or on basic
635	   HTTP.

637	8.  Security Considerations (TBD)

639	   TBD

641	9.  Acknowledgements

643	   The authors would like to thank Barry Leiba, Mark Nottingham, Julian
644	   Reschke, Marc Blanchet, Mike Bishop, Haibin Song, Philip Hallam
645	   Baker, Michiel de Jong and Ted Lemon for their valuable comments and
646	   contributions to this work.

648	10.  Informative References

650	   [Batched]  Li, Z., Wilson, C., Jiang, Z., Liu, Y., Zhao, B., Jin, C.,
651	              Zhang, Z., and Y. Dai, "Efficient Batched Synchronization
652	              in Dropbox-Like Cloud Storage Services", Middleware,
653	              2013.

655	   [Benchmarking]
656	              Drago, I., Bocchi, E., Mellia, M., Slatman, H., and A.
657	              Pras, "Benchmarking Personal Cloud Storage", IMC, 2013.

659	   [ExpanDrive]
660	              "ExpanDrive", <http://www.expandrive.com/>.

662	   [GoogleDocs]
663	              "Google Docs",
664	              <http://www.google.com/intl/en/docs/about/>.

666	   [IFTTT]    "IFTTT", <https://ifttt.com/>.

668	   [Inside_Dropbox]
669	              Drago, I., Mellia, M., Munafo, M., Sperotto, A., Sadre,
670	              R., and A. Pras, "Inside Dropbox: Understanding Personal
671	              Cloud Storage Services", IMC, 2012.

673	   [Look_at_Mobile_Cloud]
674	              Cui, Y., Lai, Z., and N. Dai, "A First Look at Mobile
675	              Cloud Storage Services: Architecture, Experimentation and
676	              Challenge", IEEE Network, 2015.

678	   [QuickSync]
679	              Cui, Y., Lai, Z., Wang, X., Dai, N., and C. Miao,
680	              "QuickSync: Improving Synchronization Efficiency for
681	              Mobile Cloud Storage Services", MOBICOM, 2015.

683	   [RFC4918]  Dusseault, L., Ed., "HTTP Extensions for Web Distributed
684	              Authoring and Versioning (WebDAV)", RFC 4918,
685	              DOI 10.17487/RFC4918, June 2007,
686	              <http://www.rfc-editor.org/info/rfc4918>.

688	   [rsync]    "rsync", <https://rsync.samba.org/>.

690	   [Towards]  Li, Z., Jin, C., Xu, T., Wilson, C., Liu, Y., Cheng, L.,
691	              Liu, Y., Dai, Y., and Z. Zhang, "Towards Network-level
692	              Efficiency for Cloud Storage Services", IMC, 2014.

694	   [users]    "400 million strong", <https://blogs.dropbox.com/
695	              dropbox/2015/06/400-million-users/>.

697	Authors' Addresses

699	   Yong Cui
700	   Tsinghua University
701	   Beijing  100084
702	   P.R.China

704	   Phone: +86-10-6260-3059
705	   Email: yong@csnet1.cs.tsinghua.edu.cn
706	   Zeqi Lai
707	   Tsinghua University
708	   Beijing  100084
709	   P.R.China

711	   Phone: +86-10-6278-5822
712	   Email: uestclzq@gmail.com

714	   Linhui Sun
715	   Tsinghua University
716	   Beijing  100084
717	   P.R.China

719	   Phone: +86-10-6278-5822
720	   Email: lh.sunlinh@gmail.com