Internet Engineering Task Force                             C. Yang, Ed.
Internet-Draft                                               Y. Liu, Ed.
Intended status: Standards Track    South China University of Technology
Expires: December 1, 2019                                        C. Chen
                                                                  Inspur
                                                                 G. Chen
                                                                    GSTA
                                                                  Y. Wei
                                                                  Huawei
                                                            May 30, 2019

                   A Massive Data Migration Framework
            draft-yangcan-ietf-data-migration-standards-02

Abstract

   This document describes a standardized framework for implementing
   massive data migration between traditional databases and big-data
   platforms on the cloud via the Internet, especially for instances
   of the Hadoop data architecture.  The main goal of the framework is
   to provide concise and friendly interfaces so that users can
   migrate massive data from a relational database to a distributed
   platform more easily and quickly, under a variety of requirements,
   in order to make full use of distributed storage resources and
   distributed computing capability and thereby relieve the storage
   and computing bottlenecks of traditional enterprise-level
   applications.  This document covers the fundamental architecture,
   data element specifications, operations, and interfaces related to
   massive data migration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on December 1, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Terminology
   3.  Specific Framework Implementation Standards
     3.1.  System Architecture Diagram
     3.2.  Source and Target of Migration
       3.2.1.  The Data Sources of Migration
       3.2.2.  The Connection Testing of Relational Data Sources
       3.2.3.  The Target Storage Container of Data Migration
       3.2.4.  Specifying the Target Cloud Platform
       3.2.5.  Data Migration to Third-Party Web Applications
     3.3.  Type of Migrated Database
     3.4.  Scale of Migrated Table
       3.4.1.  Full Table Migration
       3.4.2.  Single Table Migration
       3.4.3.  Multi-Table Migration
     3.5.  Split-by
       3.5.1.  Single Column
       3.5.2.  Multiple Columns
       3.5.3.  Non-linear Segmentation
     3.6.  Conditional Query Migration
     3.7.  Dynamic Detection of Data Redundancy
     3.8.  Data Migration with Compression
     3.9.  Updating Mode of Data Migration
       3.9.1.  Appending Migration
       3.9.2.  Overwriting the Import
     3.10. The Encryption and Decryption of Data Migration
     3.11. Incremental Migration
     3.12. Real-Time Synchronization Migration
     3.13. The Direct Mode of Data Migration
     3.14. The Storage Format of Data Files
     3.15. The Number of Map Tasks
     3.16. The Selection of Elements in a Table to Be Migrated
     3.17. Visualization of Migration
       3.17.1.  Dataset Visualization
       3.17.2.  Visualization of Data Migration Progress
     3.18. Smart Analysis of Migration
     3.19. Task Scheduling
     3.20. The Alarm of Task Error
     3.21. Data Export from Cloud to RDBMS
       3.21.1.  Data Export Diagram
       3.21.2.  Full Export
       3.21.3.  Partial Export
     3.22. The Merger of Data
     3.23. Column Separator
     3.24. Record Line Separator
     3.25. The Mode of Payment
     3.26. Web Shell for Migration
       3.26.1.  Linux Web Shell
       3.26.2.  HBase Shell
       3.26.3.  Hive Shell
       3.26.4.  Hadoop Shell
       3.26.5.  Spark Shell
       3.26.6.  Spark Shell Programming Language
   4.  Security Considerations
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
     6.3.  URL References
   Authors' Addresses

1.  Introduction

   With the widespread popularization of cloud computing and big-data
   technology, the scale of data is increasing rapidly, and the
   requirements for distributed computing are more significant than
   before.  For a long time, a majority of companies have used
   relational databases to store and manage their data, and a great
   amount of structured data still exists in legacy systems and keeps
   accumulating as the business develops.  With the daily growth of
   data size, the storage bottleneck and the degradation of analysis
   and processing performance have become serious problems that need
   to be solved in global enterprise-level applications.  A
   distributed platform refers to a software platform that builds
   data storage, data analysis, and computation on a cluster of
   multiple hosts; its core architecture involves distributed storage
   and distributed computing.  In terms of storage, capacity can in
   theory be expanded indefinitely, and storage can be dynamically
   expanded horizontally as data grows.  In terms of computing, key
   computing frameworks such as MapReduce can be used to perform
   parallel computing on large-scale datasets to improve the
   efficiency of massive data processing.  Therefore, when the data
   size exceeds the storage capacity of a single system or the
   computation exceeds the computing capacity of a stand-alone
   system, massive data can be migrated to a distributed platform.
   The resource sharing and collaborative computing provided by a
   distributed platform can solve large-scale data processing
   problems well.
   This document focuses on putting forward a standard for
   implementing a big-data migration framework accessed on the web
   via the Internet, and it considers how to help users migrate
   massive data from a traditional relational database to a cloud
   platform more easily and quickly under multiple requirements.
   Using the distributed storage and distributed computing
   technologies highlighted by the cloud platform, the framework
   relieves the storage bottleneck of relational databases and their
   low data analysis and processing performance.  Because it is
   accessed through the web, the framework supports an open working
   state and promotes global applications of data migration.

   Note: It is also permissible to implement this framework in a
   non-web environment.

2.  Definitions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC 2119
   [RFC2119].

   The following definitions are for terms used in the context of
   this document.

   o  "DMOW": Short for "Data Migration on Web"; it means web-based
      data migration.

   o  "Cloud Computing": A pay-per-use model that provides available,
      convenient, on-demand network access to a configurable shared
      pool of computing resources (including networks, servers,
      storage, application software, and services); these resources
      can be provisioned quickly, with little administrative effort
      or little interaction with service providers.

   o  "Big Data": A collection of data that cannot be captured,
      managed, and processed with conventional software tools within
      a certain time frame.  It is a massive, fast-growing, and
      diversified information asset that requires new processing
      modes to deliver stronger decision-making power, insight, and
      process-optimization capabilities.

   o  "Data Migration": In this document, the data transfer process
      between a relational database and a cloud platform.

   o  "Data Storage": Data recorded in some format on the computer's
      internal or external storage media.

   o  "Data Cleansing": A process of re-examining and verifying data,
      whose purpose is to remove duplicate information, correct
      existing errors, and provide data consistency.

   o  "Extraction-Transformation-Loading (ETL)": The processing used
      to populate a database or data warehouse: data is extracted
      from various data sources, converted into data that meets the
      needs of the business, and finally loaded into the database.

   o  "Distributed Platform": A software platform that builds data
      storage, data analysis, and computation on a cluster of
      multiple hosts.

   o  "Distributed File System": A file system whose physical storage
      resources are not directly attached to the local node but are
      instead distributed over a group of machine nodes connected by
      a high-speed internal network; these machine nodes together
      form a cluster.

   o  "Distributed Computing": A computer science discipline that
      studies how to divide a problem requiring a very large amount
      of computing power into many small parts that are processed by
      many independent computers in coordination to obtain the final
      result.
   o  "Apache Hadoop": An open-source distributed system
      infrastructure that can be used to develop distributed programs
      for large-scale data computation and storage.

   o  "Apache HBase": An open-source, non-relational, distributed
      database, used with the Hadoop framework.

   o  "Apache Hive": A data warehouse infrastructure built on Hadoop.
      It can be used for data extraction-transformation-loading (ETL)
      and provides a mechanism to store, query, and analyze
      large-scale data stored in Hadoop.

   o  "HDFS": The Hadoop distributed file system, designed to run on
      general-purpose hardware.

   o  "MapReduce": A programming model for parallel computing on
      large-scale datasets (greater than 1 TB).

   o  "Spark": A fast and versatile computing engine designed for
      large-scale data processing.

   o  "MongoDB": A database based on distributed file storage,
      designed to provide scalable, high-performance data storage
      solutions for web applications.

3.  Specific Framework Implementation Standards

   The main goal of this data migration framework is to help
   companies migrate massive data stored in relational databases to
   cloud platforms through web access.  We propose a series of rules
   and constraints on the implementation of the framework, by which
   users can conduct massive data migration from a multi-demand
   perspective.

   Note: The cloud platforms mentioned in this document refer to the
   Hadoop platform by default.  All standards on the operations and
   the environment of the framework refer to the web state by
   default.

3.1.  System Architecture Diagram

   Figure 1 shows the working diagram of the framework.

   +---------+         +----------------+
   |         |   (1)   |    WebServer   |
   | Browser |-------->|                |----------------------+
   |         |         |  +-----------+ |                      |
   +---------+         |  |   DMOW    | |                      |
                       |  +-----------+ |                      |
                       +----------------+                      |
                                                               |(2)
                                                               |
                                                               |
   +-------------+          +-----------------------+          |
   |             |   (3)    |                       |          |
   | Data Source |--------->|     Cloud Platform    |          |
   |             |          |  +-----------------+  |<---------+
   +-------------+          |  | Migration Engine|  |
                            |  +-----------------+  |
                            +-----------------------+

                    Figure 1: Reference Architecture

   The workflow of the framework is as follows:

      Step (1) in the figure means that users submit a requisition
      for data migration to DMOW through a browser (the requisition
      includes data source information, target cloud platform
      information, and related migration parameter settings);

      Step (2) in the figure means that DMOW submits the user's data
      migration request to the cloud platform's migration engine;

      Step (3) in the figure means that the migration engine performs
      data migration tasks based on the migration requests it
      receives, migrating data from the relational database to the
      cloud platform.
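   This document does not mandate a particular migration engine for
   step (3).  As an informative illustration only, Apache Sqoop
   [sqoop], already listed in the URL references, is one engine that
   can carry out such requests; the minimal sketch below shows the
   kind of engine-level command a DMOW implementation might issue.
   The JDBC URL, credentials, table name, and HDFS path are
   hypothetical.

      # Illustrative sketch only: one possible engine command for
      # step (3).  All names and addresses are hypothetical.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --target-dir /user/alice/orders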
3.2.  Source and Target of Migration

3.2.1.  The Data Sources of Migration

   This framework MUST support data migration between relational
   databases and cloud platforms on the web, and it MUST meet the
   following requirements:

   1.  The framework supports connecting to data sources in
       relational databases.  The relational database MUST be at
       least one of the following:

       *  SQLSERVER

       *  MYSQL

       *  ORACLE

   2.  This framework MUST support the dynamic perception of data
       information in relational databases under a normal connection;
       in other words:

       *  It MUST support dynamic awareness of all tables in a
          relational database;

       *  It MUST support dynamic awareness of all columns
          corresponding to all tables in a relational database;

3.2.2.  The Connection Testing of Relational Data Sources

   Before conducting data migration, the framework MUST support
   testing the connection to the data sources to be migrated and then
   deciding whether to migrate (an illustrative connection-test
   command is sketched at the end of Section 3.2).

3.2.3.  The Target Storage Container of Data Migration

   This framework MUST allow users to migrate large amounts of data
   from a relational database to at least two of the following types
   of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.2.4.  Specifying the Target Cloud Platform

   This framework MUST allow an authorized user to specify the target
   cloud platform to which the data will be migrated.

3.2.5.  Data Migration to Third-Party Web Applications

   This framework SHALL support the migration of large amounts of
   data from relational databases to one or multiple data containers
   of third-party web applications.  The target storage containers of
   the third-party web application systems can be:

   o  MONGODB

   o  MYSQL

   o  SQLSERVER

   o  ORACLE
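   As an informative sketch of the connection testing described in
   Section 3.2.2, again assuming Sqoop [sqoop] as the engine: an
   implementation might probe a source by evaluating a trivial query
   before scheduling the migration.  Host, database, and account
   names are hypothetical.

      # Hedged sketch: probe a relational source before migrating.
      # "sqoop eval" runs one SQL statement against the source; a
      # successful round trip indicates the connection is usable.
      sqoop eval \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --query "SELECT 1"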
3.3.  Type of Migrated Database

   This framework needs to meet the following requirements:

   o  It MAY support migrating an entire relational database to the
      cloud platform;

   o  It MAY support homogeneous migration (for example, migration
      from ORACLE to ORACLE);

   o  It MAY support heterogeneous migration between different
      databases (for example, migration from ORACLE to SQLSERVER);

   o  It SHALL support migration to the MONGODB database;

   o  It is OPTIONAL that, if the migration process is interrupted,
      the framework supports automatically restarting the migration
      process and continuing the migration from where it left off.
      Additionally, the framework needs to be able to inform the user
      of such an abnormal interruption in the following ways:

      *  It MUST support popping up an alert box on the user's
         screen;

      *  It SHALL support notifying users by email;

      *  It is OPTIONAL to notify users via an instant messenger such
         as WeChat or QQ;

3.4.  Scale of Migrated Table

3.4.1.  Full Table Migration

   This framework MUST support the migration of all tables in a
   relational database to at least two of the following types of
   target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.2.  Single Table Migration

   This framework MUST allow users to specify a single table in a
   relational database and migrate it to at least two of the
   following types of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.3.  Multi-Table Migration

   This framework MUST allow users to specify multiple tables in a
   relational database and migrate them to at least two of the
   following types of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE
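   An informative sketch of the table scales in Section 3.4, assuming
   Sqoop [sqoop] as the engine; database, table, and path names are
   hypothetical.  Sqoop's import-all-tables tool covers full
   migration, while a per-table import covers single-table migration;
   a DMOW implementation could realize multi-table migration by
   issuing one such import per user-selected table.

      # Full migration: import every table of the source database
      # into HDFS, one directory per table under the warehouse dir.
      sqoop import-all-tables \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --warehouse-dir /user/alice/sales

      # Single-table migration: import one named table.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --target-dir /user/alice/orders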
3.5.  Split-by

   This framework needs to meet the following requirements on
   split-by (an illustrative split-by command is sketched at the end
   of this section).

3.5.1.  Single Column

   1.  The framework MUST allow the user to specify a single column
       of the data table (usually the table's primary key), then
       slice the data in the table into multiple parallel tasks based
       on this column, and migrate the sliced data to one or more of
       the following target data containers respectively:

       *  HDFS

       *  HBASE

       *  HIVE

       The specification of the data table column can be based on the
       following methods:

       +  Users can specify it freely;

       +  Users can specify it linearly;

       +  Users can select an appropriate column for the segmentation
          based on the information entropy of the selected column's
          data;

   2.  The framework SHALL allow the user to query the boundaries of
       the column specified in the split-by, then slice the data into
       multiple parallel tasks and migrate the data to one or more of
       the following target data containers:

       *  HDFS

       *  HBASE

       *  HIVE

3.5.2.  Multiple Columns

   This framework MAY allow the user to specify multiple columns in
   the data table to slice the data linearly into multiple parallel
   tasks and then migrate the data to one or more of the following
   target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.3.  Non-linear Segmentation

   It is OPTIONAL for this framework to support non-linear
   intelligent segmentation of data on one or more columns and then
   migrate the data to one or more of the target data containers
   listed below.

   The non-linear intelligent segmentations refer to:

      *  Adaptive segmentation based on the distribution (density) of
         the values of numerical columns;

      *  Adaptive segmentation based on the distribution of entropy
         of subsegments of a column;

      *  Adaptive segmentation based on a neural network predictor;

   The target data containers include:

      *  HDFS

      *  HBASE

      *  HIVE
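   A hedged sketch of single-column split-by (Section 3.5.1) with
   Sqoop [sqoop] as the engine; all names are hypothetical.  The
   specified column slices the table into parallel tasks, and an
   explicit boundary query (item 2 of Section 3.5.1) supplies the
   column's minimum and maximum values.  The multi-column and
   non-linear segmentations of Sections 3.5.2 and 3.5.3 would require
   engine extensions beyond this sketch.

      # Slice the table on order_id and migrate the slices in
      # parallel; the boundary query determines the split range.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --split-by order_id \
          --boundary-query "SELECT MIN(order_id), MAX(order_id) FROM orders" \
          --target-dir /user/alice/orders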
3.6.  Conditional Query Migration

   This framework SHALL allow users to specify query conditions, then
   query out the corresponding data records and migrate them (see the
   sketch at the end of Section 3.8).

3.7.  Dynamic Detection of Data Redundancy

   It is OPTIONAL for the framework to allow users to add data
   redundancy labels and label communication mechanisms so that it
   detects redundant data dynamically during data migration to
   achieve redundancy-free migration.

   The specific requirements are as follows:

   o  The framework SHALL be able to perform deep granulation
      processing on the piece of data content to be sent; that is,
      the content segment to be sent is further divided into
      smaller-sized data sub-blocks.

   o  The framework SHALL be able to perform feature calculation and
      form a grain header for each of the decomposed granules; the
      granular header information includes, but is not limited to,
      the grain feature amount, grain data fingerprint, unique grain
      ID number, granule generation time, source address, and
      destination address.

   o  The framework SHALL be able to inspect the granular header
      information to determine the transmission status of each
      decomposed information granule; if the current information
      granule to be sent is already present at the receiving end, the
      content of the granule is not sent.  Otherwise, the current
      granule is sent out.

   o  After all the fragments of the data have been transferred, the
      framework SHALL be able to reassemble all the fragments and
      store the data on the receiving disk.

3.8.  Data Migration with Compression

   During the data migration process, the data is not compressed by
   default.  This framework MUST support at least one of the
   following data compression encoding formats, allowing the user to
   compress the data and migrate it:

   o  GZIP

   o  BZIP2
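   A hedged sketch of Sections 3.6 and 3.8 with Sqoop [sqoop] as the
   engine; names and the predicate are hypothetical.  The WHERE
   clause restricts migration to the records matching the user's
   query condition, and the codec selects compressed migration (GZIP
   is the engine's default codec when compression is enabled without
   an explicit codec).

      # Migrate only the matching records, compressed with BZIP2.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --where "status = 'PAID'" \
          --compress \
          --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
          --target-dir /user/alice/orders_paid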
3.9.  Updating Mode of Data Migration

3.9.1.  Appending Migration

   This framework SHALL support migration that appends data to
   existing datasets in HDFS (both updating modes are sketched at the
   end of Section 3.11).

3.9.2.  Overwriting the Import

   When importing data into HIVE, the framework SHALL support
   overwriting the original dataset and saving the result.

3.10.  The Encryption and Decryption of Data Migration

   This framework needs to meet the following requirements:

   o  It MAY support data encryption at the source, in which case the
      received data should be decrypted and stored on the target
      platform;

   o  It MUST support authentication when fetching the source data of
      a migration;

   o  It SHALL support the verification of identity and permissions
      when accessing the target platform of data migration;

   o  During the process of data migration, it SHOULD support data
      consistency;

   o  During the process of data migration, it MUST support data
      integrity;

3.11.  Incremental Migration

   The framework SHOULD support incremental migration of table
   records in a relational database, and it MUST allow the user to
   specify a field value as the "last_value" of a table field in
   order to characterize the row-record increment.  The framework
   SHOULD then migrate those records in the table whose field value
   is greater than the specified "last_value" and afterwards update
   the "last_value".
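   A hedged sketch of Sections 3.9 and 3.11 with Sqoop [sqoop] as the
   engine; names and values are hypothetical.  Rows whose check
   column exceeds the recorded "last_value" are appended to the
   existing dataset, and the engine reports the new "last_value"
   after the run so the framework can store it for the next cycle.

      # Incremental, appending migration (Sections 3.9.1 and 3.11):
      # fetch only rows with order_id greater than the last_value.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --incremental append \
          --check-column order_id \
          --last-value 100000 \
          --target-dir /user/alice/orders

      # Overwriting import into HIVE (Section 3.9.2): replace the
      # previously imported dataset for this table.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --hive-import --hive-overwrite --hive-table orders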
3.12.  Real-Time Synchronization Migration

   The framework SHALL support real-time synchronous migration of
   updated data and incremental data from a relational database to
   one or more of the following target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.13.  The Direct Mode of Data Migration

   This framework MUST support data migration in direct mode, which
   can increase the data migration rate.

   Note: This mode is supported only for MYSQL and POSTGRESQL.

3.14.  The Storage Format of Data Files

   This framework MUST allow saving the migrated data in at least one
   of the following data file formats:

   o  SEQUENCE

   o  TEXTFILE

   o  AVRO

3.15.  The Number of Map Tasks

   This framework MUST allow the user to specify a number of map
   tasks so that a corresponding number of map tasks is started to
   migrate large amounts of data in parallel.

   A hedged sketch of Sections 3.13 through 3.15 with Sqoop [sqoop]
   as the engine; names are hypothetical.  Direct mode uses the
   database's native dump path (MySQL and PostgreSQL only, matching
   the note in Section 3.13), and the mapper count sets the degree of
   parallelism.

      # Direct-mode migration with eight parallel map tasks.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --direct \
          --num-mappers 8 \
          --target-dir /user/alice/orders

      # Without --direct, a storage format (Section 3.14) can be
      # chosen with one of:
      #   --as-sequencefile  --as-textfile  --as-avrodatafile

3.16.  The Selection of Elements in a Table to Be Migrated

   o  The specification of columns

      This framework MUST support the user specifying the data of one
      or multiple columns in a table to be migrated.

   o  The specification of rows

      This framework SHOULD support the user specifying the range of
      rows in a table to be migrated.

   o  The combination of the specification of columns and rows

      This framework MAY optionally support the user specifying both
      the range of rows and the columns in a table to be migrated.

3.17.  Visualization of Migration

3.17.1.  Dataset Visualization

   After the framework has migrated the data in the relational
   database, it MUST support visualization of the dataset on the
   cloud platform.

3.17.2.  Visualization of Data Migration Progress

   The framework SHOULD support dynamically showing the progress to
   users in a graphical mode while migrating.

3.18.  Smart Analysis of Migration

   The framework MAY provide automated migration proposals to
   facilitate the user's estimation of migration workload and costs.

3.19.  Task Scheduling

   The framework SHALL support the user setting various migration
   parameters (such as the number of map tasks, the storage format of
   data files, the type of data compression, and so on) and the task
   execution time, and then scheduling the offline/online migration
   tasks accordingly.

3.20.  The Alarm of Task Error

   When a task fails, the framework MUST at least support notifying
   stakeholders in a predefined way.

3.21.  Data Export from Cloud to RDBMS

3.21.1.  Data Export Diagram

   Figure 2 shows the framework's working diagram for exporting data.

   +---------+         +----------------+
   |         |   (1)   |    WebServer   |
   | Browser |-------->|                |----------------------+
   |         |         |  +-----------+ |                      |
   +---------+         |  |   DMOW    | |                      |
                       |  +-----------+ |                      |
                       +----------------+                      |
                                                               |(2)
                                                               |
                                                               |
   +-------------+          +-----------------------+          |
   |             |   (3)    |                       |          |
   | Data Source |<---------|     Cloud Platform    |          |
   |             |          |  +-----------------+  |<---------+
   +-------------+          |  | Migration Engine|  |
                            |  +-----------------+  |
                            +-----------------------+

                      Figure 2: Reference Diagram

   The workflow of exporting data through the framework is as
   follows:

      Step (1) in the figure means that users submit a requisition
      for data migration to DMOW through a browser (the requisition
      includes cloud platform information, the information of the
      target relational database, and related migration parameter
      settings);

      Step (2) in the figure means that DMOW submits the user's data
      migration request to the cloud platform's migration engine;

      Step (3) in the figure means that the migration engine performs
      data migration tasks based on the migration requests it
      receives, migrating data from the cloud platform to the
      relational database.

3.21.2.  Full Export

   The framework MUST at least support exporting data from HDFS to
   one of the following relational databases (an illustrative export
   command is sketched at the end of Section 3.22):

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HBASE to one of
   the following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HIVE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

3.21.3.  Partial Export

   The framework SHALL allow the user to specify a range of keys on
   the cloud platform and export the elements in the specified range
   to a relational database, as well as export into a subset of
   columns.

3.22.  The Merger of Data

   The framework SHALL support merging data from different
   directories in HDFS and storing the result in a specified
   directory.

   A hedged sketch of Sections 3.21 and 3.22 with Sqoop [sqoop] as
   the engine; table, path, and artifact names are hypothetical.
   Export copies an HDFS dataset back into a relational table, and a
   column list restricts the export to a subset of columns
   (Section 3.21.3).

      # Export an HDFS dataset into a relational table, restricted
      # to a subset of columns.
      sqoop export \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders_backup \
          --columns "order_id,amount,status" \
          --export-dir /user/alice/orders

      # Merging datasets (Section 3.22): combine an incremental
      # dataset with a base dataset, keeping the newest row per key.
      # The jar and class are record artifacts generated by a prior
      # import's code generation step.
      sqoop merge \
          --new-data /user/alice/orders_inc \
          --onto /user/alice/orders \
          --target-dir /user/alice/orders_merged \
          --merge-key order_id \
          --jar-file orders.jar --class-name orders

3.23.  Column Separator

   The framework MUST allow the user to specify the separator between
   fields during the migration process.

3.24.  Record Line Separator

   The framework MUST allow the user to specify the separator between
   record lines after the migration is complete.

3.25.  The Mode of Payment

   1.  One-way payment mode

       *  By default, users SHALL pay for downloading data from the
          cloud platform, while uploading data from a relational
          database to the cloud platform is free;

       *  Alternatively, users SHALL pay for uploading data from a
          relational database to the cloud platform, while
          downloading data from the cloud is free;

   2.  Two-way payment mode

       In this mode, users SHALL pay a fee for data migration in both
       directions between the relational database and the cloud
       platform;

3.26.  Web Shell for Migration

   The framework provides the following character-interface shells
   operated through web access.

3.26.1.  Linux Web Shell

   The framework SHALL support a Linux shell through web access,
   which allows users to run basic Linux commands for the
   configuration management of the migrated data on the web.

3.26.2.  HBase Shell

   The framework SHALL support the HBase shell through web access,
   which allows users to perform basic operations such as adding,
   deleting, and modifying the data migrated to HBase through the
   web shell.

3.26.3.  Hive Shell

   The framework SHALL support the Hive shell through web access,
   which allows users to perform basic operations such as adding,
   deleting, and modifying the data migrated to Hive through the web
   shell.

3.26.4.  Hadoop Shell

   The framework SHALL support the Hadoop shell through web access so
   that users can run basic Hadoop commands through the web shell.

3.26.5.  Spark Shell

   The framework SHALL support the Spark shell through web access and
   provide an interactive way to analyze and process the data on the
   cloud platform.

3.26.6.  Spark Shell Programming Language

   In the Spark web shell, the framework SHALL support at least one
   of the following programming languages:

   o  Scala

   o  Java

   o  Python

4.  Security Considerations

   The framework SHOULD support securing the data migration process.
   During data migration, it should support encrypting the data
   before transmission and then decrypting it for storage at the
   target after the transfer is complete.  At the same time, it must
   support authentication when fetching the source data of a
   migration, and it shall support the verification of identity and
   permissions when accessing the target platform.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  References

6.1.  Normative References

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996,
              <https://www.rfc-editor.org/info/rfc2026>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999,
              <https://www.rfc-editor.org/info/rfc2578>.

6.2.  Informative References

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999,
              <https://www.rfc-editor.org/info/rfc2629>.

   [RFC4710]  Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-
              time Application Quality-of-Service Monitoring (RAQMON)
              Framework", RFC 4710, DOI 10.17487/RFC4710, October
              2006, <https://www.rfc-editor.org/info/rfc4710>.

   [RFC5694]  Camarillo, G., Ed. and IAB, "Peer-to-Peer (P2P)
              Architecture: Definition, Taxonomies, Examples, and
              Applicability", RFC 5694, DOI 10.17487/RFC5694, November
              2009, <https://www.rfc-editor.org/info/rfc5694>.

6.3.  URL References

   [hadoop]   The Apache Software Foundation,
              "http://hadoop.apache.org/".

   [hbase]    The Apache Software Foundation,
              "http://hbase.apache.org/".

   [hive]     The Apache Software Foundation,
              "http://hive.apache.org/".

   [idguidelines]
              IETF Internet Drafts editor,
              "http://www.ietf.org/ietf/1id-guidelines.txt".

   [idnits]   IETF Internet Drafts editor,
              "http://www.ietf.org/ID-Checklist.html".

   [ietf]     IETF Tools Team, "http://tools.ietf.org".

   [ops]      the IETF OPS Area, "http://www.ops.ietf.org".

   [spark]    The Apache Software Foundation,
              "http://spark.apache.org/".

   [sqoop]    The Apache Software Foundation,
              "http://sqoop.apache.org/".

   [xml2rfc]  XML2RFC tools and documentation,
              "http://xml.resource.org".

Authors' Addresses

   Can Yang (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Phone: +86 18602029601
   Email: cscyang@scut.edu.cn

   Yu Liu (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Email: 201621032214@scut.edu.cn

   Cong Chen
   Inspur
   163 Pingyun Road
   Guangzhou, Tianhe District
   P.R. China

   Email: chen_cong@inspur.com

   Ge Chen
   GSTA
   No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
   Guangzhou, Tianhe District
   P.R. China

   Email: cheng@gsta.com

   Yukai Wei
   Huawei
   Putian Huawei Base
   Shenzhen, Longgang District
   P.R. China

   Email: weiyukai@huawei.com