
Re: [nfsv4] pnfs: efficient file-layout striping proposal



I just returned from vacation and am trying to catch up on all the issues (whew...)
Quoting from the IETF 66 summary:
" Issue #69: File-layout striping structure
- No real disagreement
- Issue was raised that these simple/complex device IDs might be
useful in a generic sense (across all layouts); will revisit this
once we have it defined for files
- Garth will update proposal with specifics and repost"


I'm hesitant to comment on the file-layout striping structure as I wasn't at IETF, but I have one initial impression. Creating layouts that stripe data in a pseudo-random (read: algorithmic) manner can be useful, but its representation as a series of data servers, i.e., the output of the algorithm, seems to eliminate the benefits of algorithmic layouts: they are inherently compact and efficient. Why use an algorithm on just the server? Having clients and servers share a single algorithm saves space, increases efficiency, reduces complexity, and allows their representation to consist of an algorithm id and a few attributes (input variables).

I definitely support layouts that allow underlying file systems to generate customizable layouts, but if all we are talking about is a simple algorithm to increase parallelism for some applications by randomizing the data servers, why not just put a useful algorithm in the spec and refer to it via an algorithm id? Clients and servers would then use the same algorithm to perform reads and writes. As time goes on, we could add more layout generation algorithms as needed. (notice I did not mention code shipping anywhere :) )
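To make the "algorithm id plus a few attributes" idea concrete, here is a minimal sketch of how a client and a server might both map a file offset to a data server from the same compact description. Everything in it is an assumption for illustration: the ids ALG_ROUND_ROBIN and ALG_PSEUDO_RANDOM, the AlgorithmicLayout fields, and the integer mixing step are hypothetical and are not taken from any draft.

```python
from dataclasses import dataclass

# Hypothetical algorithm ids; nothing like these is defined in the pNFS draft.
ALG_ROUND_ROBIN = 1
ALG_PSEUDO_RANDOM = 2


@dataclass(frozen=True)
class AlgorithmicLayout:
    algorithm_id: int   # which shared algorithm to run
    num_servers: int    # input: number of data servers
    stripe_unit: int    # input: bytes per stripe unit
    seed: int = 0       # input: only used by ALG_PSEUDO_RANDOM


def data_server_for(layout: AlgorithmicLayout, offset: int) -> int:
    """Return the index of the data server holding the byte at `offset`."""
    stripe = offset // layout.stripe_unit
    if layout.algorithm_id == ALG_ROUND_ROBIN:
        return stripe % layout.num_servers
    if layout.algorithm_id == ALG_PSEUDO_RANDOM:
        # A fixed integer mix so every implementation computes the same
        # placement; Python's built-in hash() is deliberately avoided.
        x = (stripe * 2654435761 + layout.seed) & 0xFFFFFFFF
        x ^= x >> 16
        return x % layout.num_servers
    raise ValueError(f"unknown algorithm id {layout.algorithm_id}")
```

The layout here is four small integers regardless of file size, whereas the enumerated form grows with the length of the stripe pattern. The mixing step makes no claim about load balance; a real spec would have to pick and pin down the algorithm.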

To summarize, I believe client/server shared layout algorithms are the ideal theoretical solution, but implementing this efficiently with the existing number of platforms and implementations is very hard. Can we use the spec to help? Thoughts?

Dean Hildebrand

Garth Goodson wrote:
Sorry been out of internet contact for a few days.  Some comments inline:

Marc Eshel wrote:
Now that I finally understand the proposal it sounds more reasonable, and maybe I can make some suggestions to improve it. So the idea is to represent a repeated pattern by just referencing a COMPLEX device id. This will also work great for Device Equivalence. The alternate devices are usually the same, so alternates for device 1 are 11, 12, 13. We don't have to repeat the alternates in every layout and we don't have to save them on the client more than once. It should probably be another array in the COMPLEX device type.

This is included in one of my last cases (near the bottom of the proposal). I was proposing that we change the SIMPLE type to be an array of device ids which gives the device equivalence. We could also add an additional type, but I'm not sure that would be any more useful than extending the SIMPLE type.


There is another problem that I raised in the past that should be solved with these new device types. We are going to use much more of the name space to represent those different device types, so we need a way to reset (recall) those mappings. I know that one alternative is to change pnfs_deviceid4 from 32 bits to 64 bits, but this will add back to the space that we are trying to reduce. So what we need is an option on CB_LAYOUTRECALL that will tell the client to clear its device list cache.
Marc.


Let's talk about this at the IETF meeting. I'm not sure we are going to use more of the name space, since COMPLEX ids comprise multiple devices we may actually use fewer ids. However, if the mappings change often, we will use more, although I'm not sure we will use more than in the case we have now --- even now if a single device ID in the mapping changes a new ID must be used, this would be no different.

-Garth


"Noveck, Dave" <Dave.Noveck at netapp.com> wrote on 07/08/2006 08:26:09 AM:


If the client gets a COMPLEX device which is an array of pnfs_deviceid4
then what is the id of the quasi-device?

It is the device id that was in the layout which you used to fetch the device info, which in turn contains the array of pnfs_deviceid4's.


how do you distinguish between quasi-device and real device,

The device info for files is a discriminated union where there are separate cases for SIMPLE and COMPLEX.


are they sharing the same name (id) space?

Yes. This is principally to avoid additional ops. By making these
things devices, we allow DEVINFO to be used and the discriminated
union supports multiple types of devices or device-like aggregations. The big benefit is when many files make reference to the same device
aggregations.
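As an illustration of the answers above, the following sketch models the device info as a two-case union living in a single id space and expands a COMPLEX id into the SIMPLE devices it names. The class names, the `address` field, and the `table` dictionary (standing in for definitions already fetched with DEVINFO) are assumptions for the example, and whether a COMPLEX device may contain another COMPLEX device is not settled in this thread, so the sketch rejects that case.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class SimpleDevice:
    address: str            # placeholder for the data server's address info


@dataclass
class ComplexDevice:
    dev_list: List[int]     # array of pnfs_deviceid4 values


DeviceInfo = Union[SimpleDevice, ComplexDevice]


def resolve(dev_id: int, table: Dict[int, DeviceInfo]) -> List[SimpleDevice]:
    """Expand a device id into the ordered list of SIMPLE devices it names."""
    info = table[dev_id]
    if isinstance(info, SimpleDevice):
        return [info]
    # COMPLEX: every member id is looked up in the same id space.
    devices = []
    for member in info.dev_list:
        member_info = table[member]
        if not isinstance(member_info, SimpleDevice):
            raise ValueError("nested COMPLEX devices are not handled in this sketch")
        devices.append(member_info)
    return devices
```

Many layouts can carry the same COMPLEX id, so the expanded list only has to be described once per client rather than once per file.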


-----Original Message-----
From: Marc Eshel [mailto:eshel at almaden.ibm.com]
Sent: Friday, July 07, 2006 4:54 PM
To: Noveck, Dave
Cc: Goodson, Garth; nfsv4 at ietf.org
Subject: RE: [nfsv4] pnfs: efficient file-layout striping proposal


Yes, we can continue to talk in Montreal but it would be useful to
clarify your proposal as much as possible before the meeting. [ more text in-line ]
"Noveck, Dave" <Dave.Noveck at netapp.com> wrote on 07/07/2006 11:51:32 AM:



Marc Eshel:

Where does the client hang the shared parts of the layout? How long should the client keep them around?

This is where storing these quasi-devices means the complexity impact is about zero. You do exactly what you do with device definitions.
You keep them in some sort of hash by low-order dev-id bits and you LRU them.
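A minimal sketch of such a client-side cache follows, assuming a power-of-two number of buckets selected by the low-order bits of the device id and a small LRU list per bucket. The `DeviceCache` class and the `fetch` callback (standing in for whatever operation retrieves a device definition from the server) are hypothetical names, and per-bucket LRU is a simplification of a global LRU.

```python
from collections import OrderedDict
from typing import Callable, List


class DeviceCache:
    """Device definitions hashed by low-order device-id bits, LRU per bucket."""

    def __init__(self, fetch: Callable[[int], object],
                 buckets: int = 16, per_bucket: int = 4) -> None:
        if buckets <= 0 or buckets & (buckets - 1):
            raise ValueError("buckets must be a power of two")
        self._fetch = fetch              # hypothetical stand-in for a server fetch
        self._mask = buckets - 1         # selects the low-order id bits
        self._per_bucket = per_bucket
        self._buckets: List[OrderedDict] = [OrderedDict() for _ in range(buckets)]

    def get(self, dev_id: int) -> object:
        bucket = self._buckets[dev_id & self._mask]
        if dev_id in bucket:
            bucket.move_to_end(dev_id)   # mark as most recently used
            return bucket[dev_id]
        info = self._fetch(dev_id)       # miss: the client fetches the definition
        bucket[dev_id] = info
        if len(bucket) > self._per_bucket:
            bucket.popitem(last=False)   # evict the least recently used entry
        return info
```

Nothing in the cache distinguishes a real device from a quasi-device; an evicted entry is simply fetched again the next time a layout refers to its id.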



How does the server know that the client still holds the shared part?
He doesn't and he doesn't have to. This works exactly the same as device definitions which are shared among layouts. The server gives an id without knowing whether the client has or has not fetched the definition corresponding to a given id. It is up to the client to fetch it if it doesn't have it.


This makes a little more sense, but I still don't see how you get these
new quasi-devices from the original proposal. Maybe you can explain using
the new proposed struct



union nfsv4_file_layout_device4 switch (file_layout_device_type) {
    case SIMPLE:                                 <--- NEW
        nfsv4_file_layout_simple_device4 dev;    <--- NEW
    case COMPLEX:                                <--- NEW
        pnfs_deviceid4 dev_list<>;               <--- NEW
    default:
        void;
};

If the client gets a COMPLEX device which is an array of pnfs_deviceid4
then what is the id of the quasi-device? How do you distinguish between
quasi-device and real device? Are they sharing the same name (id) space?


Just trying to understand better the proposal.
Marc.
_______________________________________________
nfsv4 mailing list
nfsv4 at ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4


-- Dean Hildebrand Ph.D. Candidate University of Michigan

