Re: [nfsv4] question about pNFS.



Yi,

The server strategy for granting overlapping write layouts
and preventing corruption due to concurrent writing of the
file depends on the layout type and other parameters (e.g.
the object RAID level for pnfs-obj).

For example, with the files layout or with a striped (RAID-0)
file over the objects layout, if the clients write *only*
the application buffers, i.e. they do not read around them
to pad and align to page/buffer size, then the clients should
not step on each other's data, since writing to any data server
in this case is no different from writing to the metadata
server.
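
To make the RAID-0 case concrete, here is a minimal sketch of how a
byte range maps onto data servers under simple round-robin striping.
The function names, the 64 KiB stripe unit and the four data servers
are assumptions chosen for illustration; they are not taken from any
layout specification or implementation.

```python
def raid0_segments(offset, length, stripe_unit=65536, num_servers=4):
    """Split a file byte range into (server, server_offset, nbytes) pieces,
    assuming plain round-robin RAID-0 striping (illustrative parameters)."""
    segments = []
    end = offset + length
    while offset < end:
        stripe_no = offset // stripe_unit      # which stripe unit of the file
        within = offset % stripe_unit          # position inside that unit
        server = stripe_no % num_servers       # round-robin placement
        server_offset = (stripe_no // num_servers) * stripe_unit + within
        nbytes = min(stripe_unit - within, end - offset)
        segments.append((server, server_offset, nbytes))
        offset += nbytes
    return segments


def segments_overlap(a, b):
    """True if any piece of a and any piece of b hit the same bytes
    on the same data server."""
    return any(
        sa == sb and oa < ob + nb and ob < oa + na
        for sa, oa, na in a
        for sb, ob, nb in b
    )


client_a = raid0_segments(0, 100000)        # client A writes bytes [0, 100000)
client_b = raid0_segments(100000, 100000)   # client B writes [100000, 200000)
print(segments_overlap(client_a, client_b))  # disjoint file ranges
```

Under these assumptions, disjoint file ranges always land on disjoint
data server ranges, even when both clients write to the same data
server, which is why no extra coordination is needed as long as the
clients write only their own buffers.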

Other cases require serialization of writes that is out of
scope for the client.  For example, with objects RAID, the
whole stripe needs to be locked while writing to it,
therefore the server will not grant more than one outstanding
layout for any specific stripe.
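
As a rough sketch of that policy (not how any particular server
implements it), the metadata server can be pictured as keeping one
holder per stripe and refusing, or first recalling, any conflicting
write layout. The class name, its methods and the stripe width below
are hypothetical.

```python
class StripeLayoutTable:
    """Toy model: at most one client holds a write layout on each stripe."""

    def __init__(self, stripe_width):
        self.stripe_width = stripe_width  # bytes of file data per full stripe
        self.holder = {}                  # stripe number -> client id

    def stripes(self, offset, length):
        # Stripe numbers covered by the byte range (length must be > 0).
        first = offset // self.stripe_width
        last = (offset + length - 1) // self.stripe_width
        return range(first, last + 1)

    def conflicts(self, client, offset, length):
        # Other clients whose layouts would have to be recalled first.
        return sorted({
            self.holder[s]
            for s in self.stripes(offset, length)
            if s in self.holder and self.holder[s] != client
        })

    def grant(self, client, offset, length):
        # Grant only if no other client holds any stripe in the range.
        if self.conflicts(client, offset, length):
            return False
        for s in self.stripes(offset, length):
            self.holder[s] = client
        return True

    def release(self, client, offset, length):
        # Called when the client returns its layout for the range.
        for s in self.stripes(offset, length):
            if self.holder.get(s) == client:
                del self.holder[s]


table = StripeLayoutTable(stripe_width=4 * 65536)
print(table.grant("A", 0, 100))        # A gets stripe 0
print(table.grant("B", 1000, 100))     # B wants the same stripe
print(table.conflicts("B", 1000, 100)) # whose layout must be recalled
```

Note that the two byte ranges above do not overlap at all; the conflict
arises only because they fall in the same stripe.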

Another example could be the block layout, for which the server
provisionally allocates blocks for newly written data.
In this case, false sharing may result in data corruption if
two clients are concurrently given write layouts overlapping
the same block and the server is unable to perform LAYOUTCOMMIT
for extents that are not block aligned.
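
The false sharing condition can be sketched as a simple check: two
writes whose byte ranges are disjoint but which still touch a common
block. The helper names and the 4 KiB block size are assumptions for
illustration only.

```python
def blocks_touched(offset, length, block_size=4096):
    """Set of block numbers covered by a byte range (length must be > 0)."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return set(range(first, last + 1))


def falsely_shared_blocks(range_a, range_b, block_size=4096):
    """Blocks touched by both (offset, length) ranges although the ranges
    share no bytes. Returns an empty set when the bytes really overlap,
    since that is true sharing rather than false sharing."""
    (off_a, len_a), (off_b, len_b) = range_a, range_b
    bytes_overlap = off_a < off_b + len_b and off_b < off_a + len_a
    if bytes_overlap:
        return set()
    return (blocks_touched(off_a, len_a, block_size)
            & blocks_touched(off_b, len_b, block_size))


# Client A writes bytes [0, 1000), client B writes bytes [1000, 2000).
print(falsely_shared_blocks((0, 1000), (1000, 1000)))
# Block-aligned writes do not share a block.
print(falsely_shared_blocks((0, 4096), (4096, 4096)))
```

A server that can only commit whole blocks would have to avoid handing
out concurrent write layouts for any pair of ranges for which this
check returns a non-empty set.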

Benny

On Jun. 26, 2008, 15:02 +0300, "易乐天" <lonat.front at gmail.com> wrote:
>    Hmm... I do not mind adding it to the mailing list, but I think
> something I considered (the second method) is wrong...
>
>    In the situation I referred to, two clients just write a shared file
> concurrently, and these two clients write the same range of the file
> (maybe the same data) by turns, with no lock and no data reservation.
> The server has two methods to grant layouts. The first is the same as
> the method described in the pNFS draft. The second means the server can
> grant layouts referring to the same range (maybe a block or an object of
> a file) to both clients, and this may lead to the bytes of one client
> not being written.
> 
>  
> 2008/6/24 Benny Halevy <bhalevy at panasas.com>:
> 
>     Hi Yi,
> 
>     I'm not sure I understand exactly the workload you described,
>     so I'd appreciate it if you could give a more concrete example
>     describing when the different clients are writing
>     and to which regions of the file.
> 
>     Also, this question is really appropriate for the nfsv4
>     mailing list, so do you mind if I add it to the Cc on
>     my reply?
> 
>     Benny
> 
>     On Jun. 24, 2008, 9:39 +0300, "易乐天" <lonat.front at gmail.com> wrote:
>     > Hi Benny,
>     >       I have been reading the pNFS draft these days, and a problem
>     > about data caching still puzzles me.
>     >       Your explanation of range locks may be good enough for
>     > sophisticated applications that use range lock requests to access
>     > the same file in parallel (several processes accessing the same
>     > file). However, there may be some naive applications that use no
>     > range locks. In this scenario, as you said, two clients just write
>     > a shared file concurrently, and these two clients write the file by
>     > turns. Of course, here neither client uses data reservations or
>     > range locks. In this situation, for various other reasons, I think
>     > the two processes may see "inconsistent" file contents after
>     > closing.
>     >       What puzzles me is which method the MDS should apply in the
>     > case above:
>     >      1. Recall and grant layouts of this file to the clients
>     > frequently. This may benefit from "layout prefetch". For example,
>     > client A accesses the first 1G range of FILE, and B accesses the
>     > second 1G range. The server can grant layouts for both ranges to
>     > these clients and eliminate the clients' references to the server.
>     >      2. No layout caching and no data caching in any client. This
>     > may be a conservative method.
>     >
>     >
>      
>     > 2008/6/2, Benny Halevy <bhalevy at panasas.com>:
>     >
>     >     Basically, the application lock requests turn into
>     >     NFSv4 byte range locks that are sent to the metadata server.
>     >     Note that NFSv4 has fine grain locks, not only coarse grain
>     >     share reservations.
>     >
>     >     With clustered back-end file systems like Panasas, GPFS, or GFS2
>     >     these might turn into DLM calls in the back-end cluster.
>     >
>     >     If the clients just read and write concurrently from a shared
>     >     file it is up to the back-end file system to coordinate the
>     >     access.  This is done by recalling and granting layouts
>     >     appropriately when needed, depending on the file's striping
>     >     pattern and on the back-end file system's architecture.
>     >     So essentially this is no less efficient than any other
>     >     clustered file system.  Files striped with RAID-0 need
>     >     no synchronization between pnfs clients.  Back-end
>     >     filesystems like GPFS or GFS2, however, need to utilize
>     >     the DLM "behind the scenes" to coordinate their internal
>     >     access to shared storage.  Panasas, on the other hand, uses
>     >     object-based storage in a partitioned storage architecture
>     >     and needs no global synchronization between the OSDs.  If the
>     >     file is striped in RAID-5, like we typically do, we do
>     >     coordinate access by recalling layouts so that no two pnfs
>     >     clients can write to the same stripe at the same time.  If
>     >     each client writes a big enough region this is very effective
>     >     and we can reach multi-gigabyte per second throughput even
>     >     with RAID-5.  For smaller writes, and if the clients'
>     >     regions do not overlap, we'd typically recommend
>     >     striping with RAID-10 or RAID-0 to get even better
>     >     performance (at the expense of space (RAID-10) or reliability
>     >     (RAID-0)).
>     >
>     >     Benny
>     >
>     >     On Jun. 02, 2008, 10:31 +0300, "易乐天" <lonat.front at gmail.com> wrote:
>     >     > Hi Benny,
>     >     >     According to the goals of pNFS, clients should not only
>     >     > access shared storage but also support high-end scientific
>     >     > computing. However, because of the limits of the naive locking
>     >     > in NFS, for example LEASE (a coarse-grained lock that can only
>     >     > protect an entire file), it cannot support the concurrent
>     >     > access that exists in many HPC applications. Some cluster
>     >     > file systems such as Lustre, GFS, and GPFS use a Distributed
>     >     > Lock Manager to achieve it.
>     >     >     So I am afraid that pNFS cannot do the same work as
>     >     > efficiently as the file systems above if no DLM is added to it.
>     >     > What do you think?
>     >     >
>     >     >
>     >     > 2008/5/28 易乐天 <lonat.front at gmail.com>:
>     >     >
>     >     >> Hi Benny,
>     >     >>   Thank you for replying to this question; your reply helps me
>     >     >> understand pNFS more deeply.
>     >     >>
>     >     >> 2008/5/27 Benny Halevy <bhalevy at panasas.com>:
>     >     >>
>     >     >>   易乐天 wrote:
>     >     >>>> Hello Halevy,
>     >     >>>>    I am interested in pNFS, but I have a question about it.
>     >     >>>>   Except for the pluggable layout driver that can support the
>     >     >>>> "FILE, BLOCK, OBJECT" models, I cannot find other advantages
>     >     >>>> or attractive aspects of pNFS when comparing it with other
>     >     >>>> cluster file systems such as Lustre, Storage Tank, and PanFS.
>     >     >>>> With these file systems, users can select and obtain any of
>     >     >>>> the models (FILE, BLOCK, OBJECT) according to their hardware
>     >     >>>> environment.
>     >     >>>>   Can you explain to me what characteristics pNFS has
>     >     >>>> compared with these cluster file systems?
>     >     >>>>   Thank you!
>     >     >>>>
>     >     >>>> --
>     >     >>>> Best regards.
>     >     >>>>
>     >     >>>> Yi LeTian
>     >     >>>>
>     >     >>>> School of Computer Science, Changsha, Hunan, China
>     >     >>>>
>     >     >>>> Email: lonat.front at gmail.com
>     >     >>> The main advantage of pNFS compared to any proprietary
>     >     >>> solution is that it is an IETF standard that's layered on
>     >     >>> top of other standard storage protocols.  For customers this
>     >     >>> means that they can potentially deploy a best-of-breed
>     >     >>> solution, choosing from multiple vendors.
>     >     >>>
>     >     >>> From the technical perspective only, we tried to combine
>     >     >>> features already present in several existing and similar
>     >     >>> architectures like Panasas, Lustre, EMC high road, etc.,
>     >     >>> and not necessarily improve on them.  One thing that is in
>     >     >>> the standard and may be missing from some existing clustered
>     >     >>> file systems is the propagation of device information that
>     >     >>> allows the client to dynamically discover and mount the
>     >     >>> storage devices without having to configure them on each
>     >     >>> and every client.
>     >     >>>
>     >     >>> Also, using NFSv4.1 infrastructure, NFSv4.1 sessions provide
>     >     >>> "exactly-once-semantics" and recovery features that may be
>     >     >>> missing from some existing file systems.
>     >     >>>
>     >     >>> I hope this answers your question.
>     >     >>>
>     >     >>> Benny
>     >     >>>
>     >     >>>
>     >     >>>
>     >     >>
>     >     >> --
>     >     >> Best regards.
>     >     >>
>     >     >> Yi LeTian
>     >     >>
>     >     >> School of Computer Science,  Changsha, Hunan, China
>     >     >>
>     >     >>
>     >     >> Email: lonat.front at gmail.com
>     >     >>
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>     >
>      
>     > --
>     > Best regards.
>     >
>     > Yi LeTian
>     >
>     > School of Computer Science, National University of Defense Technology,
>     > Changsha, Hunan, China
>     >
>     > Tel: 13467509790
>      
>     > Email: lonat.front at gmail.com
> 
>      
> 
> 
> 
> 
> -- 
> Best regards.
> 
> Yi LeTian
> 
> School of Computer Science, National University of Defense Technology,
> Changsha, Hunan, China
> 
> Tel: 13467509790
> Email: lonat.front at gmail.com