Internet Engineering Task Force                             C. Yang, Ed.
Internet-Draft                           Y. Liu, Y. Wang & S.Y. Pan, Ed.
Intended status: Standards Track   South China University of Technology
Expires: November 28, 2020                                       C. Chen
                                                                  Inspur
                                                                 G. Chen
                                                                    GSTA
                                                                  Y. Wei
                                                                  Huawei
                                                            May 27, 2020

                   A Massive Data Migration Framework
             draft-yangcan-ietf-data-migration-standards-04

Abstract

   This document describes a standardized framework for implementing
   massive data migration between traditional databases and big-data
   platforms in the cloud via the Internet, especially for instances
   of the Hadoop data architecture.  The main goal of the framework is
   to provide concise and friendly interfaces so that users can more
   easily and quickly migrate massive data from a relational database
   to a distributed platform under a variety of requirements, in order
   to make full use of distributed storage resources and distributed
   computing capability and so relieve the storage and computing
   bottlenecks of traditional enterprise-level applications.  This
   document covers the fundamental architecture, data element
   specifications, operations, and interfaces related to massive data
   migration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 28, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Terminology
   3.  Specific Framework Implementation Standards
     3.1.  System Architecture Diagram
     3.2.  Source and Target of Migration
       3.2.1.  The Data Sources of Migration
       3.2.2.  The Connection Testing of Relational Data Sources
       3.2.3.  The Target Storage Container of Data Migration
       3.2.4.  Specifying the Target Cloud Platform
       3.2.5.  Data Migration to Third-Party Web Applications
     3.3.  Type of Migrated Database
     3.4.  Scale of Migrated Table
       3.4.1.  Full Table Migration
       3.4.2.  Single Table Migration
       3.4.3.  Multi-Table Migration
     3.5.  Split-by
       3.5.1.  Single Column
       3.5.2.  Multiple Columns
       3.5.3.  Non-linear Segmentation
     3.6.  Conditional Query Migration
     3.7.  Dynamic Detection of Data Redundancy
     3.8.  Data Migration with Compression
     3.9.  Updating Mode of Data Migration
       3.9.1.  Appending Migration
       3.9.2.  Overwriting the Import
     3.10. The Encryption and Decryption of Data Migration
     3.11. Incremental Migration
     3.12. Real-Time Synchronization Migration
     3.13. The Direct Mode of Data Migration
     3.14. The Storage Format of Data Files
     3.15. The Number of Map Tasks
     3.16. Selecting the Elements of a Table to Be Migrated
     3.17. Visualization of Migration
       3.17.1.  Dataset Visualization
       3.17.2.  Visualization of Data Migration Progress
     3.18. Smart Analysis of Migration
     3.19. Task Scheduling
     3.20. The Alarm of Task Error
     3.21. Data Export From Cloud to RDBMS
       3.21.1.  Data Export Diagram
       3.21.2.  Full Export
       3.21.3.  Partial Export
     3.22. The Merger of Data
     3.23. Column Separator
     3.24. Record Line Separator
     3.25. The Mode of Payment
     3.26. Web Shell for Migration
       3.26.1.  Linux Web Shell
       3.26.2.  HBase Shell
       3.26.3.  Hive Shell
       3.26.4.  Hadoop Shell
       3.26.5.  Spark Shell
       3.26.6.  Spark Shell Programming Language
   4.  Security Considerations
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
     6.3.  URL References
   Authors' Addresses

1.  Introduction

   With the widespread popularization of cloud computing and big-data
   technology, the scale of data is increasing rapidly, and the demand
   for distributed computing is more significant than before.  For a
   long time, a majority of companies have used relational databases
   to store and manage their data, and a great amount of structured
   data still exists in legacy systems and accumulates as the business
   develops.  With the daily growth of data size, storage bottlenecks
   and degraded analysis and processing performance have become
   serious problems that need to be solved in enterprise-level
   applications worldwide.  A distributed platform refers to a
   software platform that builds data storage, data analysis, and
   computation on a cluster of multiple hosts.  Its core architecture
   involves distributed storage and distributed computing.  In terms
   of storage, it is theoretically possible to expand capacity
   indefinitely, and storage can be dynamically expanded horizontally
   as data grows.  In terms of computing, key computing frameworks
   such as MapReduce can be used to perform parallel computing on
   large-scale datasets to improve the efficiency of massive data
   processing.  Therefore, when the data size exceeds the storage
   capacity of a single system or the computation exceeds the
   computing capacity of a stand-alone system, massive data can be
   migrated to a distributed platform.  The resource sharing and
   collaborative computing provided by a distributed platform can well
   solve large-scale data processing problems.
   This document focuses on putting forward a standard for
   implementing a big-data migration framework through web access via
   the Internet, and considers how to help users more easily and
   quickly migrate massive data from a traditional relational database
   to a cloud platform under multiple requirements.  By using the
   distributed storage and distributed computing technologies of the
   cloud platform, the framework overcomes the storage bottleneck and
   the low data analysis and processing performance of relational
   databases.  Because it is accessed through the web, the framework
   supports an open working state and promotes worldwide applications
   of data migration.

   Note: It is also permissible to implement this framework in a
   non-web environment.

2.  Definitions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

   The following definitions are for terms used in the context of this
   document.

   o  "DMOW": "Data Migration on Web"; that is, data migration based
      on the web.

   o  "Cloud Computing": A pay-per-use model that provides available,
      convenient, on-demand network access to a configurable, shared
      pool of computing resources (including networks, servers,
      storage, application software, and services).  These resources
      can be provisioned quickly, with little administrative effort or
      interaction with service providers.

   o  "Big Data": A collection of data that cannot be captured,
      managed, and processed using conventional software tools within
      a certain time frame.  It is a massive, fast-growing, and
      diversified information asset that requires new processing modes
      to provide stronger decision-making power, insight, and process
      optimization capabilities.

   o  "Data Migration": The data migration described in this document
      is the data transfer process between a relational database and a
      cloud platform.

   o  "Data Storage": Data recorded in some format on the computer's
      internal or external storage media.

   o  "Data Cleansing": The process of re-examining and verifying
      data.  Its purpose is to remove duplicate information, correct
      existing errors, and provide data consistency.

   o  "Extraction-Transformation-Loading (ETL)": The processing of a
      user database or data warehouse in which data is extracted from
      various data sources, converted into data that meets the needs
      of the business, and finally loaded into the database.

   o  "Distributed Platform": A software platform that builds data
      storage, data analysis, and computation on a cluster of multiple
      hosts.

   o  "Distributed File System": A file system in which the physical
      storage resources are not directly connected to the local node
      but are instead distributed over a group of machine nodes
      connected by a high-speed internal network; these machine nodes
      together form a cluster.

   o  "Distributed Computing": A computer science discipline that
      studies how to divide a problem that requires a very large
      amount of computing power into many small parts that are
      processed in coordination by many independent computers to reach
      the final result.
   o  "Apache Hadoop": An open-source distributed system
      infrastructure that can be used to develop distributed programs
      for large-scale data computation and storage.

   o  "Apache HBase": An open-source, non-relational, distributed
      database, used with the Hadoop framework.

   o  "Apache Hive": A data warehouse infrastructure built on Hadoop.
      It can be used for data extraction-transformation-loading (ETL)
      and provides a mechanism to store, query, and analyze
      large-scale data stored in Hadoop.

   o  "HDFS": The Hadoop distributed file system, designed to run on
      general-purpose hardware.

   o  "MapReduce": A programming model for parallel computing on
      large-scale datasets (greater than 1 TB).

   o  "Spark": A fast and versatile computing engine designed for
      large-scale data processing.

   o  "MongoDB": A database based on distributed file storage,
      designed to provide scalable, high-performance data storage
      solutions for web applications.

   o  "GHTs": The index table of granule header information for the
      information granules to be sent, kept at the sender.

   o  "GDBs": The granule information database for the information
      granules to be sent, kept at the sender.

   o  "GHTr": The index table of granule header information for the
      information granules received, kept at the receiver.

   o  "GDBr": The granule information database for the information
      granules received, kept at the receiver.

3.  Specific Framework Implementation Standards

   The main goal of this data migration framework is to help companies
   migrate their massive data stored in relational databases to cloud
   platforms through web access.  We propose a series of rules and
   constraints on the implementation of the framework, by which users
   can conduct massive data migration from a multi-demand perspective.

   Note: The cloud platforms mentioned in this document refer to the
   Hadoop platform by default.  All standards on the operations and
   the environment of the framework refer to the web state by default.

3.1.  System Architecture Diagram

   Figure 1 shows the working diagram of the framework.

   +---------+          +----------------+
   |         |   (1)    |   WebServer    |
   | Browser |--------->|                |----------------------
   |         |          |  +-----------+ |                     |
   +---------+          |  |   DMOW    | |                     |
                        |  +-----------+ |                     |
                        +----------------+                     |
                                                               |(2)
                                                               |
                                                               |
   +-------------+            +-----------------------+        |
   |             |    (3)     |                       |        |
   | Data Source |----------->|    Cloud Platform     |        |
   |             |            | +------------------+  |<-------
   +-------------+            | | Migration Engine |  |
                              | +------------------+  |
                              +-----------------------+

                    Figure 1: Reference Architecture

   The workflow of the framework is as follows:

   Step (1) in the figure means that users submit the requisition of
   data migration to DMOW through the browser (the requisition
   includes data source information, target cloud platform
   information, and related migration parameter settings);

   Step (2) in the figure means that DMOW submits the user's request
   information of data migration to the cloud platform's migration
   engine;

   Step (3) in the figure means that the migration engine performs
   data migration tasks based on the migration requests it receives,
   migrating data from the relational database to the cloud platform.
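   As a non-normative illustration of the requisition carried in steps
   (1) and (2), the following Python sketch shows one possible shape
   of such a request.  All field names and values here are
   illustrative assumptions, not a schema defined by this document.

      # A minimal sketch of a DMOW migration requisition.  Every
      # field name below is a hypothetical example, not a normative
      # schema.
      migration_request = {
          "source": {                     # relational data source
              "type": "MYSQL",            # SQLSERVER | MYSQL | ORACLE
              "host": "db.example.com",
              "port": 3306,
              "database": "sales",
              "user": "migrator",
              "password": "********",
          },
          "target": {                     # target cloud platform
              "platform": "hadoop-cluster-1",
              "container": "HDFS",        # HDFS | HBASE | HIVE
              "path": "/warehouse/sales",
          },
          "parameters": {                 # related migration settings
              "tables": ["orders"],
              "split_by": "order_id",
              "num_map_tasks": 4,
              "compression": "GZIP",      # GZIP | BZIP2 | none
          },
      }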
3.2.  Source and Target of Migration

3.2.1.  The Data Sources of Migration

   This framework MUST support data migration between relational
   databases and cloud platforms on the web, and MUST meet the
   following requirements:

   1.  The framework supports connecting to data sources in relational
       databases.  The relational database MUST be at least one of the
       following:

       *  SQLSERVER

       *  MYSQL

       *  ORACLE

   2.  This framework MUST support the dynamic perception of data
       information in relational databases under a normal connection;
       in other words:

       *  It MUST support dynamic awareness of all tables in a
          relational database;

       *  It MUST support dynamic awareness of all columns
          corresponding to all tables in a relational database.

3.2.2.  The Connection Testing of Relational Data Sources

   Before conducting data migration, the framework MUST support
   testing the connection to the data sources that will be migrated,
   and then deciding whether to migrate.  A minimal sketch of such a
   test follows.
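   The following Python sketch illustrates one way to perform this
   connection test, assuming a MySQL source and the third-party
   "pymysql" package; the host and credential values are illustrative
   assumptions.

      # A sketch of the connection test in Section 3.2.2, assuming a
      # MySQL source and the third-party "pymysql" package.
      import pymysql

      def test_connection(host, port, user, password, database):
          """Return True if the relational data source is reachable."""
          try:
              conn = pymysql.connect(host=host, port=port, user=user,
                                     password=password,
                                     database=database,
                                     connect_timeout=5)
              conn.close()
              return True
          except pymysql.MySQLError:
              return False

      if test_connection("db.example.com", 3306, "migrator",
                         "********", "sales"):
          print("Source reachable; migration may proceed.")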
3.2.3.  The Target Storage Container of Data Migration

   This framework MUST allow users to migrate large amounts of data
   from a relational database to at least two of the following types
   of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.2.4.  Specifying the Target Cloud Platform

   This framework MUST allow an authorized user to specify the target
   cloud platform to which the data will be migrated.

3.2.5.  Data Migration to Third-Party Web Applications

   This framework SHALL support the migration of large amounts of data
   from relational databases to one or multiple data containers of
   third-party Web applications.  The target storage containers of the
   third-party Web application systems can be:

   o  MONGODB

   o  MYSQL

   o  SQLSERVER

   o  ORACLE

3.3.  Type of Migrated Database

   This framework needs to meet the following requirements:

   o  It MAY support migrating the entire relational database to the
      cloud platform;

   o  It MAY support homogeneous migration (for example, migration
      from ORACLE to ORACLE);

   o  It MAY support heterogeneous migration between different
      databases (for example, migration from ORACLE to SQLServer);

   o  It SHALL support migration to the MONGODB database;

   o  It is OPTIONAL to support, if the migration process is
      interrupted, automatically restarting the migration process and
      continuing the migration from where it left off.  Additionally,
      the framework needs to be able to inform the user of such an
      abnormal interruption in the following ways:

      *  It MUST support popping up an alert box on the screen of the
         user;

      *  It SHALL support notifying users by email;

      *  It is OPTIONAL to notify users by an instant messenger such
         as WeChat or QQ.

3.4.  Scale of Migrated Table

3.4.1.  Full Table Migration

   This framework MUST support the migration of all tables in a
   relational database to at least two types of target storage
   containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.2.  Single Table Migration

   This framework MUST allow users to specify a single table in a
   relational database and migrate it to at least two types of target
   storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.3.  Multi-Table Migration

   This framework MUST allow users to specify multiple tables in a
   relational database and migrate them to at least two types of
   target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.  Split-by

   This framework needs to meet the following requirements on
   split-by.

3.5.1.  Single Column

   1.  The framework MUST allow the user to specify a single column of
       the data table (usually the table's primary key), then slice
       the data in the table into multiple parallel tasks based on
       this column, and migrate the sliced data to one or more of the
       following target data containers respectively:

       *  HDFS

       *  HBASE

       *  HIVE

       The specification of the data table column can be based on the
       following methods:

       +  Users can specify freely;

       +  Users can specify linearly;

       +  Users can select an appropriate column for the segmentation
          based on the information entropy of the selected column
          data.

   2.  The framework SHALL allow the user to query the boundaries of
       the specified column in the split-by, then slice the data into
       multiple parallel tasks and migrate the data to one or more of
       the following target data containers (see the sketch after
       this list):

       *  HDFS

       *  HBASE

       *  HIVE
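   To illustrate the boundary-based slicing in item 2, the following
   Python sketch divides a numeric split-by column into equal-width
   ranges, one per parallel task.  The function, table, and column
   names are hypothetical; real tools (for example, Apache Sqoop
   [sqoop]) perform a similar computation.

      # A sketch of boundary-based slicing for a numeric split-by
      # column.  "cursor" is any DB-API cursor on the source database.
      def split_ranges(cursor, table, column, num_tasks):
          """Return (lo, hi) bounds, one pair per parallel task."""
          cursor.execute("SELECT MIN(%s), MAX(%s) FROM %s"
                         % (column, column, table))
          lo, hi = cursor.fetchone()
          width = (hi - lo + 1) / float(num_tasks)
          ranges = []
          for i in range(num_tasks):
              start = lo + int(i * width)
              end = min(lo + int((i + 1) * width) - 1, hi)
              ranges.append((start, end))
          return ranges

      # Each (start, end) pair then backs one parallel task, e.g.:
      #   SELECT * FROM orders WHERE order_id BETWEEN start AND end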
3.5.2.  Multiple Columns

   This framework MAY allow the user to specify multiple columns of
   the data table to slice the data linearly into multiple parallel
   tasks and then migrate the data to one or more of the following
   target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.3.  Non-linear Segmentation

   It is OPTIONAL for this framework to support non-linear intelligent
   segmentation of the data of one or more columns and then migrate
   the data to one or more of the target data containers listed below.

   Non-linear intelligent segmentation refers to:

      *  Adaptive segmentation based on the distribution (density) of
         the values of numerical columns;

      *  Adaptive segmentation based on the distribution of entropy
         over subsegments of a column;

      *  Adaptive segmentation based on a neural network predictor.

   The target data containers include:

      *  HDFS

      *  HBASE

      *  HIVE

3.6.  Conditional Query Migration

   This framework SHALL allow users to specify query conditions, then
   query out the corresponding data records and migrate them.

3.7.  Dynamic Detection of Data Redundancy

   It is OPTIONAL for the framework to allow users to add data
   redundancy labels and a label communication mechanism so that
   redundant data can be detected dynamically during data migration to
   achieve non-redundant migration.

   The specific requirements are as follows:

   o  The framework SHALL be able to perform deep granulation
      processing on the piece of data content to be sent.  This means
      the content segment to be sent is further divided into
      smaller-sized data sub-blocks.

   o  The framework SHALL be able to perform feature calculation and
      form a granule header for each of the decomposed granules; the
      granule header information includes, but is not limited to, the
      granule feature value, granule data fingerprint, unique granule
      ID, granule generation time, source address, and destination
      address.

   o  The framework SHALL be able to inspect the granule header
      information to determine the transmission status of each
      decomposed information granule: if the current information
      granule to be sent is already present at the receiving end, the
      content of the granule is not sent; otherwise, the current
      granule is sent out (see the sketch after this list).

   o  The framework SHALL be able to set a cache at the sending port
      to hold the granule header information index table (GHTs) and
      the granule information database (GDBs) for the information
      granules; the receiver SHALL be able to set a cache to hold the
      granule header information index table (GHTr) and the granule
      information database (GDBr) of the information granules that
      have been successfully received.

   o  After all the fragments of the data have been transferred, the
      framework SHALL be able to reassemble all the fragments and
      store the data on the receiving disk.

   o  The framework SHALL be able to set a granule encoder at the
      sending port, which is responsible for encoding and compressing
      the information granule content generated by the granule
      resolver.  The encoder generates a coded version of the
      corresponding information granule and calculates the header of
      the compressed information granule, then performs transmission
      processing by sending the granule header, detecting redundant
      granules, synthesizing the granules, and accurately detecting
      redundant granules.

   o  The framework SHALL be able to set a granule decoder at the
      receiving port, which is responsible for decoding the encoded,
      compressed granule content at the receiving port and merging it
      with the granule synthesizer, whether the content comes from the
      sending-port cache or the receiving-port cache.
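   The following Python sketch illustrates the fingerprint check
   described above: the sender hashes each granule, consults an index
   of fingerprints already held by the receiver, and transmits only
   unseen granules.  The 4 KiB granule size and the in-memory set
   standing in for the GHTr are assumptions, not a normative wire
   format.

      # A sketch of redundancy detection by granule fingerprint.
      import hashlib

      GRANULE_SIZE = 4096   # assumed granule size, not normative

      def granulate(data):
          """Split a content segment into fixed-size granules."""
          for i in range(0, len(data), GRANULE_SIZE):
              yield data[i:i + GRANULE_SIZE]

      def send_non_redundant(data, receiver_index, send):
          """Send only granules the receiver does not already hold."""
          for granule in granulate(data):
              fingerprint = hashlib.sha256(granule).hexdigest()
              if fingerprint not in receiver_index:   # GHTr lookup
                  send(granule)
                  receiver_index.add(fingerprint)     # update index

      # Usage: send_non_redundant(payload, set(), transmit)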
3.8.  Data Migration with Compression

   During the data migration process, the data is not compressed by
   default.  This framework MUST support at least one of the following
   data compression encoding formats, allowing the user to compress
   the data and migrate it:

   o  GZIP

   o  BZIP2

3.9.  Updating Mode of Data Migration

3.9.1.  Appending Migration

   This framework SHALL support migrating appended data to existing
   datasets in HDFS.

3.9.2.  Overwriting the Import

   When importing data into HIVE, the framework SHALL support
   overwriting the original dataset and saving the result.

3.10.  The Encryption and Decryption of Data Migration

   This framework needs to meet the following requirements:

   o  It MAY support data encryption at the source, in which case the
      received data should be decrypted and stored on the target
      platform;

   o  It MUST support authentication when reading the source data of a
      migration;

   o  It SHALL support the verification of identity and permission
      when accessing the target platform of a data migration;

   o  During the process of data migration, it SHOULD support data
      consistency;

   o  During the process of data migration, it MUST support data
      integrity.

3.11.  Incremental Migration

   The framework SHOULD support incremental migration of table records
   in a relational database, and it MUST allow the user to specify a
   field value as "last_value" in the table in order to characterize
   the row-record increment.  The framework SHOULD then migrate those
   records in the table whose field value is greater than the
   specified "last_value" and afterwards update the last_value.  A
   minimal sketch follows.
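   The following Python sketch illustrates one incremental pass over a
   source table.  The cursor, table, and column names are illustrative
   assumptions; persisting the returned high-water mark is left to the
   implementation.

      # A sketch of incremental migration on a "last_value" column.
      # "cursor" is a DB-API cursor on the source; "emit" forwards
      # one row to the cloud platform.
      def incremental_migrate(cursor, table, check_column, last_value,
                              emit):
          """Migrate rows newer than last_value; return the new mark."""
          # Determine the new high-water mark first, then migrate the
          # half-open interval (last_value, new_last].
          cursor.execute("SELECT MAX(%s) FROM %s"
                         % (check_column, table))
          new_last = cursor.fetchone()[0]
          if new_last is None or new_last <= last_value:
              return last_value            # nothing new to migrate
          cursor.execute(
              "SELECT * FROM %s WHERE %s > %%s AND %s <= %%s"
              % (table, check_column, check_column),
              (last_value, new_last))
          for row in cursor.fetchall():
              emit(row)                    # append to the target
          return new_last                  # next run's last_value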
3.12.  Real-Time Synchronization Migration

   The framework SHALL support real-time synchronous migration of
   updated data and incremental data from a relational database to one
   or more of the following target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.13.  The Direct Mode of Data Migration

   This framework MUST support data migration in direct mode, which
   can increase the data migration rate.

   Note: This mode is supported only for MYSQL and POSTGRESQL.

3.14.  The Storage Format of Data Files

   This framework MUST allow saving the migrated data in at least one
   of the following data file formats:

   o  SEQUENCE

   o  TEXTFILE

   o  AVRO

3.15.  The Number of Map Tasks

   This framework MUST allow the user to specify a number of map tasks
   so that a corresponding number of map tasks is started to migrate
   large amounts of data in parallel.

3.16.  Selecting the Elements of a Table to Be Migrated

   o  The specification of columns

      This framework MUST allow the user to specify the data of one or
      multiple columns of a table to be migrated.

   o  The specification of rows

      This framework SHOULD allow the user to specify the range of
      rows of a table to be migrated.

   o  The combination of the specification of columns and rows

      This framework MAY optionally allow the user to specify the
      range of both rows and columns of a table to be migrated.

3.17.  Visualization of Migration

3.17.1.  Dataset Visualization

   After the framework has migrated the data in the relational
   database, it MUST support the visualization of the dataset in the
   cloud platform.

3.17.2.  Visualization of Data Migration Progress

   The framework SHOULD support dynamically showing the progress to
   users in a graphical mode while migrating.

3.18.  Smart Analysis of Migration

   The framework MAY provide automated migration proposals to
   facilitate the user's estimation of migration workload and costs.

3.19.  Task Scheduling

   The framework SHALL allow the user to set various migration
   parameters (such as the number of map tasks, the storage format of
   data files, the type of data compression, and so on) and the task
   execution time, and then schedule the offline/online migration
   tasks.

3.20.  The Alarm of Task Error

   When a task fails, the framework MUST at least support notifying
   stakeholders in a predefined way.

3.21.  Data Export From Cloud to RDBMS

3.21.1.  Data Export Diagram

   Figure 2 shows the framework's working diagram for exporting data.

   +---------+          +----------------+
   |         |   (1)    |   WebServer    |
   | Browser |--------->|                |----------------------
   |         |          |  +-----------+ |                     |
   +---------+          |  |   DMOW    | |                     |
                        |  +-----------+ |                     |
                        +----------------+                     |
                                                               |(2)
                                                               |
                                                               |
   +-------------+            +-----------------------+        |
   |             |    (3)     |                       |        |
   | Data Source |<-----------|    Cloud Platform     |        |
   |             |            | +------------------+  |<-------
   +-------------+            | | Migration Engine |  |
                              | +------------------+  |
                              +-----------------------+

                      Figure 2: Reference Diagram

   The workflow of exporting data through the framework is as follows:

   Step (1) in the figure means that users submit the requisition of
   data migration to DMOW through the browser (the requisition
   includes cloud platform information, the information of the target
   relational database, and related migration parameter settings);

   Step (2) in the figure means that DMOW submits the user's request
   information of data migration to the cloud platform's migration
   engine;

   Step (3) in the figure means that the migration engine performs
   data migration tasks based on the migration requests it receives,
   migrating data from the cloud platform to the relational database.

3.21.2.  Full Export

   The framework MUST at least support exporting data from HDFS to one
   of the following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HBASE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HIVE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

3.21.3.  Partial Export

   The framework SHALL allow the user to specify a range of keys on
   the cloud platform and export the elements in the specified range
   to a relational database.  It SHALL also support exporting only a
   subset of columns.  A minimal sketch follows.
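   The following Python sketch illustrates a partial export: only the
   rows in a key range, restricted to a subset of columns, are copied
   to the relational target.  The helper functions and names are
   hypothetical assumptions.

      # A sketch of a partial export (Section 3.21.3): rows in the
      # key range [key_lo, key_hi] are exported with a subset of
      # columns.
      def export_partial(read_rows, insert_row, columns,
                         key_lo, key_hi):
          """Copy rows in the key range, keeping only `columns`."""
          for row in read_rows(key_lo, key_hi):  # scan cloud dataset
              subset = {name: row[name] for name in columns}
              insert_row(subset)                 # INSERT into RDBMS

      # Usage with hypothetical helpers:
      #   export_partial(hbase_scan, mysql_insert,
      #                  ["order_id", "amount"], 1000, 1999)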
3.22.  The Merger of Data

   The framework SHALL support merging data from different directories
   in HDFS and storing the result in a specified directory.

3.23.  Column Separator

   The framework MUST allow the user to specify the separator between
   fields in the migration process.

3.24.  Record Line Separator

   The framework MUST allow the user to specify the separator between
   record lines after the migration is complete.

3.25.  The Mode of Payment

   1.  One-way payment mode

       *  By default in the framework, users SHALL pay for downloading
          data from the cloud platform; uploading data from a
          relational database to the cloud platform is free;

       *  Alternatively, users SHALL pay for uploading data from a
          relational database to the cloud platform; downloading data
          from the cloud is free.

   2.  Two-way payment mode

       In the framework, users SHALL pay a fee for data migration in
       both directions between the relational database and the cloud
       platform.

3.26.  Web Shell for Migration

   The framework provides the following character-interface shells,
   operated through web access.

3.26.1.  Linux Web Shell

   The framework SHALL support a Linux shell through web access, which
   allows users to execute basic Linux commands for configuration
   management of the migrated data on the web.

3.26.2.  HBase Shell

   The framework SHALL support an HBase shell through web access,
   which allows users to perform basic operations such as adding,
   deleting, and querying on the data migrated to HBase through the
   web shell.

3.26.3.  Hive Shell

   The framework SHALL support a Hive shell through web access, which
   allows users to perform basic operations such as adding, deleting,
   and querying on the data migrated to Hive through the web shell.

3.26.4.  Hadoop Shell

   The framework SHALL support the Hadoop shell through web access so
   that users can perform basic Hadoop command operations through the
   web shell.

3.26.5.  Spark Shell

   The framework SHALL support a Spark shell through web access and
   provide an interactive way to analyze and process the data in the
   cloud platform.

3.26.6.  Spark Shell Programming Language

   In the Spark web shell, the framework SHALL support at least one of
   the following programming languages:

   o  Scala

   o  Java

   o  Python

4.  Security Considerations

   The framework SHOULD support securing the data migration process.
   During data migration, it SHOULD support encrypting the data before
   transmission and then decrypting it for storage at the target after
   the transfer is complete.  At the same time, it MUST support
   authentication when reading the migration source data, and it SHALL
   support the verification of identity and permission when accessing
   the target platform.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  References

6.1.  Normative References

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996,
              <https://www.rfc-editor.org/info/rfc2026>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999,
              <https://www.rfc-editor.org/info/rfc2578>.

6.2.  Informative References

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999,
              <https://www.rfc-editor.org/info/rfc2629>.

   [RFC4710]  Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-
              time Application Quality-of-Service Monitoring (RAQMON)
              Framework", RFC 4710, DOI 10.17487/RFC4710, October
              2006, <https://www.rfc-editor.org/info/rfc4710>.

   [RFC5694]  Camarillo, G., Ed. and IAB, "Peer-to-Peer (P2P)
              Architecture: Definition, Taxonomies, Examples, and
              Applicability", RFC 5694, DOI 10.17487/RFC5694, November
              2009, <https://www.rfc-editor.org/info/rfc5694>.

6.3.  URL References

   [hadoop]   The Apache Software Foundation,
              "http://hadoop.apache.org/".

   [hbase]    The Apache Software Foundation,
              "http://hbase.apache.org/".

   [hive]     The Apache Software Foundation,
              "http://hive.apache.org/".

   [idguidelines]
              IETF Internet Drafts editor,
              "http://www.ietf.org/ietf/1id-guidelines.txt".

   [idnits]   IETF Internet Drafts editor,
              "http://www.ietf.org/ID-Checklist.html".

   [ietf]     IETF Tools Team, "http://tools.ietf.org".

   [ops]      the IETF OPS Area, "http://www.ops.ietf.org".

   [spark]    The Apache Software Foundation,
              "http://spark.apache.org/".

   [sqoop]    The Apache Software Foundation,
              "http://sqoop.apache.org/".

   [xml2rfc]  XML2RFC tools and documentation,
              "http://xml.resource.org".

Authors' Addresses

   Can Yang (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Phone: +86 18602029601
   Email: cscyang@scut.edu.cn

   Yu Liu, Ying Wang & ShiYing Pan (editors)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Email: 201820132798@scut.edu.cn

   Cong Chen
   Inspur
   163 Pingyun Road
   Guangzhou, Tianhe District
   P.R. China

   Email: chen_cong@inspur.com

   Ge Chen
   GSTA
   No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
   Guangzhou, Tianhe District
   P.R. China

   Email: cheng@gsta.com

   Yukai Wei
   Huawei
   Putian Huawei Base
   Shenzhen, Longgang District
   P.R. China

   Email: weiyukai@huawei.com