ISS - 2015-11-03 1520-1650
=======================
Chairs: Yong Cui and David Black
Scribes: Linhui Sun
----------------------------------------
Chairs' Slides:
Yong Cui: This BOF will focus on the problem statement, and then we will have an open discussion. ISS is not a WG-forming BOF.
David Black: We are looking for open discussion about 1) what's going on here, 2) what the IETF cares about, and 3) what you think the IETF ought to do in this area and why.
Yong Cui: People are encouraged to ask clarification questions during the problem statement presentation; other open questions will be left to the open discussion phase.
----------------------------------------
Internet Storage Sync - Problem Statement (Zeqi Lai):
< https://www.ietf.org/proceedings/94/slides/slides-94-iss-1.pdf >
Ted Lemon: Seems like you are really focused on the problem of synchronizing individual large files. Would you say that's true?
Zeqi Lai: Maybe both large and small files.
Ted Lemon: There seem to be two different pieces to this problem. One of them is synchronizing files; the other part, which is also important, is how to track changes to the files. I don't think you really cover that.
Ted Lemon: Rsync isn't an example of something that I think works very well, but it addresses the problem of keeping a copy of a file tree on a client in sync with the same file tree on a server. Do you consider that part of this problem?
Ted Lemon: I'm not suggesting we should adopt rsync. I'm just asking if you consider the problem that rsync solves to be part of the problem you want to solve.
Barry Leiba: Not speaking as AD. The way I look at this is that it is intended to be a protocol to do the transfers and synchronizations, but not to figure out what needs to be synchronized. That's up to the client and server; there is no reason for that to interoperate. Do you guys agree with that?
Zeqi Lai: Yes.
Ted Lemon: I don't mean to say that you should necessarily change just because I want something different than you do. What I want is both things, and for me the most interesting and important is synchronizing an entire set of files.
Joe Hildebrand: Do you see the list of file names, the directory structure, and the nesting of that information as part of what you are currently synchronizing in the protocol?
Zeqi Lai: That should be considered in the sync protocol.
Joe Hildebrand: Including things like the executable bits and potentially all the other metadata about the files?
Zeqi Lai: Yes.
Patrick Linskey: Have you thought about the concerns around end-to-end encryption and privacy in this context?
Zeqi Lai: Security and privacy are quite important. But in our previous exploration we did not focus on the security issues; we could discuss this in this BOF.
Patrick Linskey: Adding crypto to a file sync protocol significantly complicates deduplication, etc. For the IETF, it's important to think about those concerns. Those struggles may make it nearly impossible to achieve some of the goals, but on the other hand totally solve some of the other concerns.
David Black: If the encryption is done well, it defeats deduplication, because deduplication is evidence that encryption has not been done well.
Patrick Linskey: There might be a couple of narrow spaces where you can avoid that. That's a belief, not a conviction.
-------------------------------------------
Open discussion:
Gihan Dias: As for deduplication and encryption, I do believe there are solutions to that. There are two issues we need to address. As Ted mentioned, it is not so useful just to sync individual files, but rather to sync file systems across multiple devices (laptops, phones, etc.). Many clients and many servers need to be synced.
David Black: Do you mean the scope of the problem isn't just the data sync, and that the interoperability problem we need to solve covers file name tree sync as well as data sync?
Gihan Dias: Yes, and the other thing is that we need to look at multiple clients and possibly multiple servers. Multiple servers are also important.
David Black: Are we talking about multiple clients and multiple servers that interoperate one-to-one, or about a common shared view of data where one client contacts multiple servers? And are we also talking about concurrent editing, where multiple clients talk to the same server?
Gihan Dias: There is a lot of stuff here. At least the first case would be my data stored in many places and accessed from many places.
Linhui Sun: Could use a notification flow to notify all the devices when changes are made elsewhere, and then solicit metadata exchange among multiple devices and servers. This has been considered in current storage services and we will also consider it.
Gihan Dias: The second issue is cost. If we are paying per megabyte or gigabyte, it becomes pretty expensive. The network needs to be aware of cost as well. I would like to treat that as one of the requirements of the system.
Ted Lemon: To expand a little bit on the multiple clients and multiple servers thing, what I want is a bunch of data stores and a mechanism for synchronizing them, understanding what the latest version on any given data store is, and trying to come up with the most up-to-date version. Edits to the data store should be a separate thing from syncs. IMAP doesn't do well in this respect. If we can avoid the idea that the client and server are necessarily different, what we produce would be more generally useful.
David Black: How general do you want to go? How far do we have to go in the direction of ownership to figure out who makes updates and what the latest version is?
Ted Lemon: The thing that I want to avoid is Google Drive erasing changes that I've made. In other words, when someone makes changes on the master, I don't want it to come back and squash the changes that I've made on another device. I want the synchronization process to be based on the delta between when we last communicated and now.
Patrick Linskey: A couple of other important considerations in standardizing this domain would be notions of groups, both from the standpoint of units of sharing and in terms of membership.
David Black: Groups of what?
Patrick Linskey: Groups of files, groups of objects and groups of people.
Patrick Linskey: Getting too far into specifying Dropbox or anything like that is probably the wrong choice for the IETF. Really the question here is what primitives all the different systems could be sharing. In the collaboration space, groups of people are more important than hierarchy of structure. I want basic hierarchies and rich groups of people. It would be challenging and probably inappropriate to standardize that problem domain; it is really critical to look at what kind of small pieces we would be able to optimize.
David Black: What do you think should be out of scope of the problem domain?
Patrick Linskey: Those problem domains would be better than that problem domain, and those problem domains would be anything that an end user would be happy with.
David Black: Sounds like you are advocating that we should standardize mechanisms and primitives, but that the complete user experience is probably out of scope?
Patrick Linskey: Yes.
Mike Bishop: From Microsoft but not on that team. We talked about the same entity owning the code on both sides, so I don't need to standardize how the app client talks to the app server, since the same company is writing the code on both sides. It's their business; let's stay out of that. I don't know that there's going to be an advantage to those vendors (e.g. Dropbox, OneDrive, etc.)
in having a standardized protocol when they already have their own implementations and they're only trying to interact with their own code.
David Black: Do you mean a common client is unreasonable, and everyone who builds his own service is going to build his own client?
Mike Bishop: Not saying everyone is going to, but everyone has. If one of your goals is a common client, that common client would have to offer enough advantages to get everyone to switch to it. Not sure that's achievable.
Mike Bishop: A lot of diverse clients are accessing the files. Where we may benefit from standardizing is in how other apps (e.g. Office) talk to the clients, instead of an app having to incorporate different SDKs. Standardizing the access protocol would probably be more useful.
Mark Donnelly: One thing the IETF could do to be helpful is to standardize some way of sharing between servers. Files on Dropbox could be shared to Google Drive. Could work towards some way to allow collaboration between servers.
David Black: Why should a Dropbox CEO implement this to let data out?
Mark Donnelly: You are not letting the data out but letting the data in.
Linhui Sun: Enterprises wanting their own private storage services should be another requirement. And we should find some other interesting points to bring vendors in, not just ask them to implement this directly. Better collaboration with other apps (e.g. Office, Gmail, etc.) may help.
David Black: Also leads to storage services that are country specific or region specific.
Murray Kucherawy: The research work here is impressive. If we cannot find a good reason to standardize, the IRTF should take a look at this and further the work. One of the two of us should pick up something here.
Nathan Owens: Brought up an open source project called Syncthing.
Richard Barnes: It's not worth doing work in the IETF that is not going to be deployed.
If we cannot hear folks in the room say they want to build this thing or need this thing, not sure where to go; maybe the IRTF is a good idea.
?: Is there a large player in this space that would like to disrupt its own business? The answer is no. It's easy to convince the CEO to give you a nice API to let data in to their storage, but letting it out is of course the question. But that bridge has already been crossed. Need to talk about what makes enterprises want to do this, often driven by security, etc. The IRTF is not appropriate for this work; we should just say yes or no.
Joe Hildebrand: Thinking about how we do backup. Could imagine interesting new products here from existing backup companies that may benefit from splitting things into multiple different stores.
Ted Lemon: Regardless of whether Dropbox wants it, I still want it.
Patrick Linskey: Advertising the saag WG on Thursday for key sharing issues, which are also a key part of this problem domain. And mechanisms to do file-format-aware synchronization are interesting. Possibly content types. Some problems are addressable with knowledge of the file format.
David Black: There are several techniques based on rolling hashes discussed in the slides.
Patrick Linskey: That could also address some conflict situations that arise in different synchronizations.
Yong Cui: Do you think that is a little bit related to Google Docs?
Patrick Linskey: Absolutely.
Rich Salz: This is likely to be where we had one, now we have a plus one. Maybe there is one customer who would use it, but unless the industry buys in, we are really skeptical. Let's just talk about it and move on.
?: Maybe the protocol is not that good for Dropbox, but more for personal file servers hosted on your access router. With IPv6 and permanent connectivity, why not just host your stuff yourself?
Haibin Song: Have you considered network parameters to make the sync more network friendly?
Yong Cui: It could be related, but so far it has not been.
Xiaowei Qin: Uploading is important.
Kireeti Kompella: Have heard lots of voices saying that this is an interesting problem but have not heard why the IETF should solve it. Another aspect that might get in the way is that people may have lots of IPR here. The choice would be between going with a weaker standard protocol or being willing to give up my IPR.
Gihan Dias: A possible way is to provide a framework into which you could plug different algorithms. Details of how we sync could differ from place to place. To respond to the colleague from Akamai, I believe it's a good opportunity for CDN providers.
Eric Rescorla: Efficiency should not be the focus; the IETF is a standards body that cannot make contributions in efficiency. Interoperability is an interesting discussion.
Patrick Linskey: Potential IETF value in this domain is standardization around extending PUT (or maybe GET) to allow ranges to be extensible.
Joe Hildebrand: Wouldn't necessarily be ranges, could be something else. Interesting.
Igor Gashinsky: Seriously confused why this is in the IETF, since making things more efficient belongs in research, there are serious IPR issues, etc.
Cullen Jennings: First, it's not about efficiency. From an implementer's point of view, when building a product to integrate well with Dropbox, people have to implement it to integrate with all the major services. All the major services have different APIs; it would be better to have one API since they are doing the same thing. The value here is about interoperability.
Igor Gashinsky: We have tried to build an interoperable API with messaging services; they are all super customized and super service specific. Don't know anybody who will want to standardize it.
David Black: How much of that problem is an API problem, how much is a library problem and how much is a protocol problem?
Cullen Jennings: First, I do not see a difference between API and protocol. And it's not a library problem.
Back to the previous comment, there are different requirements and complicated things here. People have different uses, and those cut across all the services. It seems optimistic to me that a group of people here are talking about a common API or protocol.
Ted Lemon: How to sync a big collection of files is interesting, and totally agree that sync efficiency is not that interesting.
Zeqi Lai: Standardization may help small companies build their own private cloud systems. Just think about email.
David: Curious about OpenStack, which met here last week. Maybe that's the place.
David Black: A lot of the components there are control plane components. This work has a great deal to do with moving data, which may not be suitable for OpenStack.
Eric Rescorla: It's fantastic to use the same client to sync with every possible service. But the way this game is played is that people who have big storage services have to be interested in doing that. There are two possible stories that would make this useful. 1) Lots of people want to build a new set of systems and they would use that, not the big guys, or there are some large use cases. 2) People who work in this space have an interest.
Patrick Linskey: What if the IETF were to standardize BitTorrent?
David Black: Should wonder what the use cases are. BitTorrent is more a distribution protocol than a sync protocol.
Patrick Linskey: One example use case ties into OpenStack. A second is allowing data centers to use BitTorrent internally for propagation of new files.
Igor Gashinsky: It's a cool research problem but doesn't need to be a standard protocol.
Joe Hildebrand: The next thing those people did was come up with a protocol that works as a superset of all of them. And that's what we are using right now to exchange background in this meeting. The slack.
Nathan Owens: Maybe take something like Syncthing or BitTorrent Sync into an IETF standard; once they take off, there might be some incentive for big players to adopt the standard.
David Black: Does anyone else want to comment on the question slide?
?: The next step is to bring in the people who would actually have a business case for doing this.
David Black: Another next step is to make progress on use cases and applicability.
?: I would not focus on the use cases; I would focus on the people who actually have an interest in this. And also support doing something with BitTorrent.
Cullen Jennings: Talked to the BitTorrent guys before; they were not keen on IETF standardization at that time.
-------------------------------------------
Final words:
Barry Leiba: Comfortable with the discussion. The discussion has been useful but I don't think it has led to any near-term work. As people said, the main thing is to see if we can get the people who provide the services to come here, work with us, and say they want to deploy it. Unless we have that, there is no point in doing a cool protocol that no one is going to deploy.
Joe Hildebrand: Not speaking for IAB but with IAB hat on. Agree with Barry that it would be useful to bring in others who would be interested in implementing this or who have use cases.
Barry Leiba: There is a mailing list, storagesync@ietf.org, to continue the discussion.