Internet Engineering Task Force                             C. Yang, Ed.
Internet-Draft                                               Y. Liu, Ed.
Intended status: Standards Track    South China University of Technology
Expires: December 1, 2019                                        C. Chen
                                                                  Inspur
                                                                 G. Chen
                                                                    GSTA
                                                                  Y. Wei
                                                                  Huawei
                                                            May 30, 2019

                   A Massive Data Migration Framework
            draft-yangcan-ietf-data-migration-standards-02

Abstract

   This document describes a standardized framework for implementing
   massive data migration between traditional databases and big-data
   platforms on the cloud via the Internet, especially for instances
   of the Hadoop data architecture.  The main goal of the framework is
   to provide concise and friendly interfaces so that users can
   migrate massive data from a relational database to a distributed
   platform more easily and quickly, under a variety of requirements,
   in order to make full use of distributed storage resources and
   distributed computing capability and thereby relieve the storage
   and computing bottlenecks of traditional enterprise-level
   applications.  This document covers the fundamental architecture,
   data element specifications, operations, and interfaces related to
   massive data migration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on December 1, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Terminology
   3.  Specific Framework Implementation Standards
     3.1.  System Architecture Diagram
     3.2.  Source and Target of Migration
       3.2.1.  The Data Sources of Migration
       3.2.2.  The Connection Testing of Relational Data Sources
       3.2.3.  The Target Storage Container of Data Migration
       3.2.4.  Specifying the Target Cloud Platform
       3.2.5.  Data Migration to Third-Party Web Applications
     3.3.  Type of Migrated Database
     3.4.  Scale of Migrated Table
       3.4.1.  Full Table Migration
       3.4.2.  Single Table Migration
       3.4.3.  Multi-Table Migration
     3.5.  Split-by
       3.5.1.  Single Column
       3.5.2.  Multiple Columns
       3.5.3.  Non-linear Segmentation
     3.6.  Conditional Query Migration
     3.7.  Dynamic Detection of Data Redundancy
     3.8.  Data Migration with Compression
     3.9.  Updating Mode of Data Migration
       3.9.1.  Appending Migration
       3.9.2.  Overwriting the Import
     3.10. The Encryption and Decryption of Data Migration
     3.11. Incremental Migration
     3.12. Real-Time Synchronization Migration
     3.13. The Direct Mode of Data Migration
     3.14. The Storage Format of Data Files
     3.15. The Number of Map Tasks
     3.16. The Selection of Elements in a Table to Be Migrated
     3.17. Visualization of Migration
       3.17.1.  Dataset Visualization
       3.17.2.  Visualization of Data Migration Progress
     3.18. Smart Analysis of Migration
     3.19. Task Scheduling
     3.20. The Alarm of Task Error
     3.21. Data Export from Cloud to RDBMS
       3.21.1.  Data Export Diagram
       3.21.2.  Full Export
       3.21.3.  Partial Export
     3.22. The Merger of Data
     3.23. Column Separator
     3.24. Record Line Separator
     3.25. The Mode of Payment
     3.26. Web Shell for Migration
       3.26.1.  Linux Web Shell
       3.26.2.  HBase Shell
       3.26.3.  Hive Shell
       3.26.4.  Hadoop Shell
       3.26.5.  Spark Shell
       3.26.6.  Spark Shell Programming Language
   4.  Security Considerations
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
     6.3.  URL References
   Authors' Addresses

1.  Introduction

   With the widespread popularization of cloud computing and big-data
   technology, the scale of data is increasing rapidly, and the
   requirements for distributed computing are more significant than
   before.  For a long time, a majority of companies have used
   relational databases to store and manage their data, and a great
   amount of structured data still exists in legacy systems and keeps
   accumulating as the business develops.  With the daily growth of
   data size, the storage bottleneck and the degradation of analysis
   and processing performance have become serious problems that need
   to be solved in global enterprise-level applications.  A
   distributed platform refers to a software platform that builds
   data storage, data analysis, and computation on a cluster of
   multiple hosts; its core architecture involves distributed storage
   and distributed computing.  In terms of storage, capacity can in
   theory be expanded indefinitely, and storage can be dynamically
   expanded horizontally as data grows.  In terms of computing, key
   computing frameworks such as MapReduce can be used to perform
   parallel computing on large-scale datasets to improve the
   efficiency of massive data processing.  Therefore, when the data
   size exceeds the storage capacity of a single system or the
   computation exceeds the computing capacity of a stand-alone
   system, massive data can be migrated to a distributed platform.
   The resource sharing and collaborative computing provided by a
   distributed platform can solve large-scale data processing
   problems well.
   This document focuses on putting forward a standard for
   implementing a big-data migration framework accessed on the web
   via the Internet, and it considers how to help users migrate
   massive data from a traditional relational database to a cloud
   platform more easily and quickly under multiple requirements.
   Using the distributed storage and distributed computing
   technologies highlighted by the cloud platform, the framework
   relieves the storage bottleneck of relational databases and their
   low data analysis and processing performance.  Because it is
   accessed through the web, the framework supports an open working
   state and promotes global applications of data migration.

   Note: It is also permissible to implement this framework in a
   non-web environment.

2.  Definitions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
   in this document are to be interpreted as described in RFC 2119
   [RFC2119].

   The following definitions are for terms used in the context of
   this document.

   o  "DMOW": Short for "Data Migration on Web"; it means web-based
      data migration.

   o  "Cloud Computing": A pay-per-use model that provides available,
      convenient, on-demand network access to a configurable shared
      pool of computing resources (including networks, servers,
      storage, application software, and services); these resources
      can be provisioned quickly, with little administrative effort
      or little interaction with service providers.

   o  "Big Data": A collection of data that cannot be captured,
      managed, and processed with conventional software tools within
      a certain time frame.  It is a massive, fast-growing, and
      diversified information asset that requires new processing
      modes to deliver stronger decision-making power, insight, and
      process-optimization capabilities.

   o  "Data Migration": In this document, the data transfer process
      between a relational database and a cloud platform.

   o  "Data Storage": Data recorded in some format on the computer's
      internal or external storage media.

   o  "Data Cleansing": A process of re-examining and verifying data,
      whose purpose is to remove duplicate information, correct
      existing errors, and provide data consistency.

   o  "Extraction-Transformation-Loading (ETL)": The processing used
      to populate a database or data warehouse: data is extracted
      from various data sources, converted into data that meets the
      needs of the business, and finally loaded into the database.

   o  "Distributed Platform": A software platform that builds data
      storage, data analysis, and computation on a cluster of
      multiple hosts.

   o  "Distributed File System": A file system whose physical storage
      resources are not directly attached to the local node but are
      instead distributed over a group of machine nodes connected by
      a high-speed internal network; these machine nodes together
      form a cluster.

   o  "Distributed Computing": A computer science discipline that
      studies how to divide a problem requiring a very large amount
      of computing power into many small parts that are processed by
      many independent computers in coordination to obtain the final
      result.
   o  "Apache Hadoop": An open-source distributed system
      infrastructure that can be used to develop distributed programs
      for large-scale data computation and storage.

   o  "Apache HBase": An open-source, non-relational, distributed
      database, used with the Hadoop framework.

   o  "Apache Hive": A data warehouse infrastructure built on Hadoop.
      It can be used for data extraction-transformation-loading (ETL)
      and provides a mechanism to store, query, and analyze
      large-scale data stored in Hadoop.

   o  "HDFS": The Hadoop distributed file system, designed to run on
      general-purpose hardware.

   o  "MapReduce": A programming model for parallel computing on
      large-scale datasets (greater than 1 TB).

   o  "Spark": A fast and versatile computing engine designed for
      large-scale data processing.

   o  "MongoDB": A database based on distributed file storage,
      designed to provide scalable, high-performance data storage
      solutions for web applications.

3.  Specific Framework Implementation Standards

   The main goal of this data migration framework is to help
   companies migrate massive data stored in relational databases to
   cloud platforms through web access.  We propose a series of rules
   and constraints on the implementation of the framework, by which
   users can conduct massive data migration from a multi-demand
   perspective.

   Note: The cloud platforms mentioned in this document refer to the
   Hadoop platform by default.  All standards on the operations and
   the environment of the framework refer to the web state by
   default.

3.1.  System Architecture Diagram

   Figure 1 shows the working diagram of the framework.

   +---------+         +----------------+
   |         |   (1)   |    WebServer   |
   | Browser |-------->|                |----------------------+
   |         |         |  +-----------+ |                      |
   +---------+         |  |   DMOW    | |                      |
                       |  +-----------+ |                      |
                       +----------------+                      |
                                                               |(2)
                                                               |
                                                               |
   +-------------+          +-----------------------+          |
   |             |   (3)    |                       |          |
   | Data Source |--------->|     Cloud Platform    |          |
   |             |          |  +-----------------+  |<---------+
   +-------------+          |  | Migration Engine|  |
                            |  +-----------------+  |
                            +-----------------------+

                    Figure 1: Reference Architecture

   The workflow of the framework is as follows:

      Step (1) in the figure means that users submit a requisition
      for data migration to DMOW through a browser (the requisition
      includes data source information, target cloud platform
      information, and related migration parameter settings);

      Step (2) in the figure means that DMOW submits the user's data
      migration request to the cloud platform's migration engine;

      Step (3) in the figure means that the migration engine performs
      data migration tasks based on the migration requests it
      receives, migrating data from the relational database to the
      cloud platform.
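   This document does not mandate a particular migration engine for
   step (3).  As an informative illustration only, Apache Sqoop
   [sqoop], already listed in the URL references, is one engine that
   can carry out such requests; the minimal sketch below shows the
   kind of engine-level command a DMOW implementation might issue.
   The JDBC URL, credentials, table name, and HDFS path are
   hypothetical.

      # Illustrative sketch only: one possible engine command for
      # step (3).  All names and addresses are hypothetical.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --target-dir /user/alice/orders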
3.2.  Source and Target of Migration

3.2.1.  The Data Sources of Migration

   This framework MUST support data migration between relational
   databases and cloud platforms on the web, and it MUST meet the
   following requirements:

   1.  The framework supports connecting to data sources in
       relational databases.  The relational database MUST be at
       least one of the following:

       *  SQLSERVER

       *  MYSQL

       *  ORACLE

   2.  This framework MUST support the dynamic perception of data
       information in relational databases under a normal connection;
       in other words:

       *  It MUST support dynamic awareness of all tables in a
          relational database;

       *  It MUST support dynamic awareness of all columns
          corresponding to all tables in a relational database;

3.2.2.  The Connection Testing of Relational Data Sources

   Before conducting data migration, the framework MUST support
   testing the connection to the data sources to be migrated and then
   deciding whether to migrate (an illustrative connection-test
   command is sketched at the end of Section 3.2).

3.2.3.  The Target Storage Container of Data Migration

   This framework MUST allow users to migrate large amounts of data
   from a relational database to at least two of the following types
   of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.2.4.  Specifying the Target Cloud Platform

   This framework MUST allow an authorized user to specify the target
   cloud platform to which the data will be migrated.

3.2.5.  Data Migration to Third-Party Web Applications

   This framework SHALL support the migration of large amounts of
   data from relational databases to one or multiple data containers
   of third-party web applications.  The target storage containers of
   the third-party web application systems can be:

   o  MONGODB

   o  MYSQL

   o  SQLSERVER

   o  ORACLE
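   As an informative sketch of the connection testing described in
   Section 3.2.2, again assuming Sqoop [sqoop] as the engine: an
   implementation might probe a source by evaluating a trivial query
   before scheduling the migration.  Host, database, and account
   names are hypothetical.

      # Hedged sketch: probe a relational source before migrating.
      # "sqoop eval" runs one SQL statement against the source; a
      # successful round trip indicates the connection is usable.
      sqoop eval \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --query "SELECT 1"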
3.3.  Type of Migrated Database

   This framework needs to meet the following requirements:

   o  It MAY support migrating an entire relational database to the
      cloud platform;

   o  It MAY support homogeneous migration (for example, migration
      from ORACLE to ORACLE);

   o  It MAY support heterogeneous migration between different
      databases (for example, migration from ORACLE to SQLSERVER);

   o  It SHALL support migration to the MONGODB database;

   o  It is OPTIONAL that, if the migration process is interrupted,
      the framework supports automatically restarting the migration
      process and continuing the migration from where it left off.
      Additionally, the framework needs to be able to inform the user
      of such an abnormal interruption in the following ways:

      *  It MUST support popping up an alert box on the user's
         screen;

      *  It SHALL support notifying users by email;

      *  It is OPTIONAL to notify users via an instant messenger such
         as WeChat or QQ;

3.4.  Scale of Migrated Table

3.4.1.  Full Table Migration

   This framework MUST support the migration of all tables in a
   relational database to at least two of the following types of
   target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.2.  Single Table Migration

   This framework MUST allow users to specify a single table in a
   relational database and migrate it to at least two of the
   following types of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.3.  Multi-Table Migration

   This framework MUST allow users to specify multiple tables in a
   relational database and migrate them to at least two of the
   following types of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE
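   An informative sketch of the table scales in Section 3.4, assuming
   Sqoop [sqoop] as the engine; database, table, and path names are
   hypothetical.  Sqoop's import-all-tables tool covers full
   migration, while a per-table import covers single-table migration;
   a DMOW implementation could realize multi-table migration by
   issuing one such import per user-selected table.

      # Full migration: import every table of the source database
      # into HDFS, one directory per table under the warehouse dir.
      sqoop import-all-tables \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --warehouse-dir /user/alice/sales

      # Single-table migration: import one named table.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --target-dir /user/alice/orders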
3.5.  Split-by

   This framework needs to meet the following requirements on
   split-by (an illustrative split-by command is sketched at the end
   of this section).

3.5.1.  Single Column

   1.  The framework MUST allow the user to specify a single column
       of the data table (usually the table's primary key), then
       slice the data in the table into multiple parallel tasks based
       on this column, and migrate the sliced data to one or more of
       the following target data containers respectively:

       *  HDFS

       *  HBASE

       *  HIVE

       The specification of the data table column can be based on the
       following methods:

       +  Users can specify it freely;

       +  Users can specify it linearly;

       +  Users can select an appropriate column for the segmentation
          based on the information entropy of the selected column's
          data;

   2.  The framework SHALL allow the user to query the boundaries of
       the column specified in the split-by, then slice the data into
       multiple parallel tasks and migrate the data to one or more of
       the following target data containers:

       *  HDFS

       *  HBASE

       *  HIVE

3.5.2.  Multiple Columns

   This framework MAY allow the user to specify multiple columns in
   the data table to slice the data linearly into multiple parallel
   tasks and then migrate the data to one or more of the following
   target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.3.  Non-linear Segmentation

   It is OPTIONAL for this framework to support non-linear
   intelligent segmentation of data on one or more columns and then
   migrate the data to one or more of the target data containers
   listed below.

   The non-linear intelligent segmentations refer to:

      *  Adaptive segmentation based on the distribution (density) of
         the values of numerical columns;

      *  Adaptive segmentation based on the distribution of entropy
         of subsegments of a column;

      *  Adaptive segmentation based on a neural network predictor;

   The target data containers include:

      *  HDFS

      *  HBASE

      *  HIVE
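   A hedged sketch of single-column split-by (Section 3.5.1) with
   Sqoop [sqoop] as the engine; all names are hypothetical.  The
   specified column slices the table into parallel tasks, and an
   explicit boundary query (item 2 of Section 3.5.1) supplies the
   column's minimum and maximum values.  The multi-column and
   non-linear segmentations of Sections 3.5.2 and 3.5.3 would require
   engine extensions beyond this sketch.

      # Slice the table on order_id and migrate the slices in
      # parallel; the boundary query determines the split range.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --split-by order_id \
          --boundary-query "SELECT MIN(order_id), MAX(order_id) FROM orders" \
          --target-dir /user/alice/orders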
3.6.  Conditional Query Migration

   This framework SHALL allow users to specify query conditions, then
   query out the corresponding data records and migrate them (see the
   sketch at the end of Section 3.8).

3.7.  Dynamic Detection of Data Redundancy

   It is OPTIONAL for the framework to allow users to add data
   redundancy labels and label communication mechanisms so that it
   detects redundant data dynamically during data migration to
   achieve redundancy-free migration.

   The specific requirements are as follows:

   o  The framework SHALL be able to perform deep granulation
      processing on the piece of data content to be sent; that is,
      the content segment to be sent is further divided into
      smaller-sized data sub-blocks.

   o  The framework SHALL be able to perform feature calculation and
      form a grain header for each of the decomposed granules; the
      granular header information includes, but is not limited to,
      the grain feature amount, grain data fingerprint, unique grain
      ID number, granule generation time, source address, and
      destination address.

   o  The framework SHALL be able to inspect the granular header
      information to determine the transmission status of each
      decomposed information granule; if the current information
      granule to be sent is already present at the receiving end, the
      content of the granule is not sent.  Otherwise, the current
      granule is sent out.

   o  After all the fragments of the data have been transferred, the
      framework SHALL be able to reassemble all the fragments and
      store the data on the receiving disk.

3.8.  Data Migration with Compression

   During the data migration process, the data is not compressed by
   default.  This framework MUST support at least one of the
   following data compression encoding formats, allowing the user to
   compress the data and migrate it:

   o  GZIP

   o  BZIP2
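   A hedged sketch of Sections 3.6 and 3.8 with Sqoop [sqoop] as the
   engine; names and the predicate are hypothetical.  The WHERE
   clause restricts migration to the records matching the user's
   query condition, and the codec selects compressed migration (GZIP
   is the engine's default codec when compression is enabled without
   an explicit codec).

      # Migrate only the matching records, compressed with BZIP2.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --where "status = 'PAID'" \
          --compress \
          --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
          --target-dir /user/alice/orders_paid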
3.9.  Updating Mode of Data Migration

3.9.1.  Appending Migration

   This framework SHALL support migration that appends data to
   existing datasets in HDFS (both updating modes are sketched at the
   end of Section 3.11).

3.9.2.  Overwriting the Import

   When importing data into HIVE, the framework SHALL support
   overwriting the original dataset and saving the result.

3.10.  The Encryption and Decryption of Data Migration

   This framework needs to meet the following requirements:

   o  It MAY support data encryption at the source, in which case the
      received data should be decrypted and stored on the target
      platform;

   o  It MUST support authentication when fetching the source data of
      a migration;

   o  It SHALL support the verification of identity and permissions
      when accessing the target platform of data migration;

   o  During the process of data migration, it SHOULD support data
      consistency;

   o  During the process of data migration, it MUST support data
      integrity;

3.11.  Incremental Migration

   The framework SHOULD support incremental migration of table
   records in a relational database, and it MUST allow the user to
   specify a field value as the "last_value" of a table field in
   order to characterize the row-record increment.  The framework
   SHOULD then migrate those records in the table whose field value
   is greater than the specified "last_value" and afterwards update
   the "last_value".
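   A hedged sketch of Sections 3.9 and 3.11 with Sqoop [sqoop] as the
   engine; names and values are hypothetical.  Rows whose check
   column exceeds the recorded "last_value" are appended to the
   existing dataset, and the engine reports the new "last_value"
   after the run so the framework can store it for the next cycle.

      # Incremental, appending migration (Sections 3.9.1 and 3.11):
      # fetch only rows with order_id greater than the last_value.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --incremental append \
          --check-column order_id \
          --last-value 100000 \
          --target-dir /user/alice/orders

      # Overwriting import into HIVE (Section 3.9.2): replace the
      # previously imported dataset for this table.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --hive-import --hive-overwrite --hive-table orders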
3.12.  Real-Time Synchronization Migration

   The framework SHALL support real-time synchronous migration of
   updated data and incremental data from a relational database to
   one or more of the following target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.13.  The Direct Mode of Data Migration

   This framework MUST support data migration in direct mode, which
   can increase the data migration rate.

   Note: This mode is supported only for MYSQL and POSTGRESQL.

3.14.  The Storage Format of Data Files

   This framework MUST allow saving the migrated data in at least one
   of the following data file formats:

   o  SEQUENCE

   o  TEXTFILE

   o  AVRO

3.15.  The Number of Map Tasks

   This framework MUST allow the user to specify a number of map
   tasks so that a corresponding number of map tasks is started to
   migrate large amounts of data in parallel.

   A hedged sketch of Sections 3.13 through 3.15 with Sqoop [sqoop]
   as the engine; names are hypothetical.  Direct mode uses the
   database's native dump path (MySQL and PostgreSQL only, matching
   the note in Section 3.13), and the mapper count sets the degree of
   parallelism.

      # Direct-mode migration with eight parallel map tasks.
      sqoop import \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders \
          --direct \
          --num-mappers 8 \
          --target-dir /user/alice/orders

      # Without --direct, a storage format (Section 3.14) can be
      # chosen with one of:
      #   --as-sequencefile  --as-textfile  --as-avrodatafile

3.16.  The Selection of Elements in a Table to Be Migrated

   o  The specification of columns

      This framework MUST support the user specifying the data of one
      or multiple columns in a table to be migrated.

   o  The specification of rows

      This framework SHOULD support the user specifying the range of
      rows in a table to be migrated.

   o  The combination of the specification of columns and rows

      This framework MAY optionally support the user specifying both
      the range of rows and the columns in a table to be migrated.

3.17.  Visualization of Migration

3.17.1.  Dataset Visualization

   After the framework has migrated the data in the relational
   database, it MUST support visualization of the dataset on the
   cloud platform.

3.17.2.  Visualization of Data Migration Progress

   The framework SHOULD support dynamically showing the progress to
   users in a graphical mode while migrating.

3.18.  Smart Analysis of Migration

   The framework MAY provide automated migration proposals to
   facilitate the user's estimation of migration workload and costs.

3.19.  Task Scheduling

   The framework SHALL support the user setting various migration
   parameters (such as the number of map tasks, the storage format of
   data files, the type of data compression, and so on) and the task
   execution time, and then scheduling the offline/online migration
   tasks accordingly.

3.20.  The Alarm of Task Error

   When a task fails, the framework MUST at least support notifying
   stakeholders in a predefined way.

3.21.  Data Export from Cloud to RDBMS

3.21.1.  Data Export Diagram

   Figure 2 shows the framework's working diagram for exporting data.

   +---------+         +----------------+
   |         |   (1)   |    WebServer   |
   | Browser |-------->|                |----------------------+
   |         |         |  +-----------+ |                      |
   +---------+         |  |   DMOW    | |                      |
                       |  +-----------+ |                      |
                       +----------------+                      |
                                                               |(2)
                                                               |
                                                               |
   +-------------+          +-----------------------+          |
   |             |   (3)    |                       |          |
   | Data Source |<---------|     Cloud Platform    |          |
   |             |          |  +-----------------+  |<---------+
   +-------------+          |  | Migration Engine|  |
                            |  +-----------------+  |
                            +-----------------------+

                      Figure 2: Reference Diagram

   The workflow of exporting data through the framework is as
   follows:

      Step (1) in the figure means that users submit a requisition
      for data migration to DMOW through a browser (the requisition
      includes cloud platform information, the information of the
      target relational database, and related migration parameter
      settings);

      Step (2) in the figure means that DMOW submits the user's data
      migration request to the cloud platform's migration engine;

      Step (3) in the figure means that the migration engine performs
      data migration tasks based on the migration requests it
      receives, migrating data from the cloud platform to the
      relational database.

3.21.2.  Full Export

   The framework MUST at least support exporting data from HDFS to
   one of the following relational databases (an illustrative export
   command is sketched at the end of Section 3.22):

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HBASE to one of
   the following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HIVE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

3.21.3.  Partial Export

   The framework SHALL allow the user to specify a range of keys on
   the cloud platform and export the elements in the specified range
   to a relational database, as well as export into a subset of
   columns.

3.22.  The Merger of Data

   The framework SHALL support merging data from different
   directories in HDFS and storing the result in a specified
   directory.

   A hedged sketch of Sections 3.21 and 3.22 with Sqoop [sqoop] as
   the engine; table, path, and artifact names are hypothetical.
   Export copies an HDFS dataset back into a relational table, and a
   column list restricts the export to a subset of columns
   (Section 3.21.3).

      # Export an HDFS dataset into a relational table, restricted
      # to a subset of columns.
      sqoop export \
          --connect jdbc:mysql://db.example.com:3306/sales \
          --username alice -P \
          --table orders_backup \
          --columns "order_id,amount,status" \
          --export-dir /user/alice/orders

      # Merging datasets (Section 3.22): combine an incremental
      # dataset with a base dataset, keeping the newest row per key.
      # The jar and class are record artifacts generated by a prior
      # import's code generation step.
      sqoop merge \
          --new-data /user/alice/orders_inc \
          --onto /user/alice/orders \
          --target-dir /user/alice/orders_merged \
          --merge-key order_id \
          --jar-file orders.jar --class-name orders

3.23.  Column Separator

   The framework MUST allow the user to specify the separator between
   fields during the migration process.

3.24.  Record Line Separator

   The framework MUST allow the user to specify the separator between
   record lines after the migration is complete.

3.25.  The Mode of Payment

   1.  One-way payment mode

       *  By default, users SHALL pay for downloading data from the
          cloud platform, while uploading data from a relational
          database to the cloud platform is free;

       *  Alternatively, users SHALL pay for uploading data from a
          relational database to the cloud platform, while
          downloading data from the cloud is free;

   2.  Two-way payment mode

       In this mode, users SHALL pay a fee for data migration in both
       directions between the relational database and the cloud
       platform;

3.26.  Web Shell for Migration

   The framework provides the following character-interface shells
   operated through web access.

3.26.1.  Linux Web Shell

   The framework SHALL support a Linux shell through web access,
   which allows users to run basic Linux commands for the
   configuration management of the migrated data on the web.

3.26.2.  HBase Shell

   The framework SHALL support the HBase shell through web access,
   which allows users to perform basic operations such as adding,
   deleting, and modifying the data migrated to HBase through the
   web shell.

3.26.3.  Hive Shell

   The framework SHALL support the Hive shell through web access,
   which allows users to perform basic operations such as adding,
   deleting, and modifying the data migrated to Hive through the web
   shell.

3.26.4.  Hadoop Shell

   The framework SHALL support the Hadoop shell through web access so
   that users can run basic Hadoop commands through the web shell.

3.26.5.  Spark Shell

   The framework SHALL support the Spark shell through web access and
   provide an interactive way to analyze and process the data on the
   cloud platform.

3.26.6.  Spark Shell Programming Language

   In the Spark web shell, the framework SHALL support at least one
   of the following programming languages:

   o  Scala

   o  Java

   o  Python

4.  Security Considerations

   The framework SHOULD support securing the data migration process.
   During data migration, it should support encrypting the data
   before transmission and then decrypting it for storage at the
   target after the transfer is complete.  At the same time, it must
   support authentication when fetching the source data of a
   migration, and it shall support the verification of identity and
   permissions when accessing the target platform.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  References

6.1.  Normative References

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996,
              <https://www.rfc-editor.org/info/rfc2026>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999,
              <https://www.rfc-editor.org/info/rfc2578>.

6.2.  Informative References

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999,
              <https://www.rfc-editor.org/info/rfc2629>.

   [RFC4710]  Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-
              time Application Quality-of-Service Monitoring (RAQMON)
              Framework", RFC 4710, DOI 10.17487/RFC4710, October
              2006, <https://www.rfc-editor.org/info/rfc4710>.

   [RFC5694]  Camarillo, G., Ed. and IAB, "Peer-to-Peer (P2P)
              Architecture: Definition, Taxonomies, Examples, and
              Applicability", RFC 5694, DOI 10.17487/RFC5694, November
              2009, <https://www.rfc-editor.org/info/rfc5694>.

6.3.  URL References

   [hadoop]   The Apache Software Foundation,
              "http://hadoop.apache.org/".

   [hbase]    The Apache Software Foundation,
              "http://hbase.apache.org/".

   [hive]     The Apache Software Foundation,
              "http://hive.apache.org/".

   [idguidelines]
              IETF Internet Drafts editor,
              "http://www.ietf.org/ietf/1id-guidelines.txt".

   [idnits]   IETF Internet Drafts editor,
              "http://www.ietf.org/ID-Checklist.html".

   [ietf]     IETF Tools Team, "http://tools.ietf.org".

   [ops]      the IETF OPS Area, "http://www.ops.ietf.org".

   [spark]    The Apache Software Foundation,
              "http://spark.apache.org/".

   [sqoop]    The Apache Software Foundation,
              "http://sqoop.apache.org/".

   [xml2rfc]  XML2RFC tools and documentation,
              "http://xml.resource.org".

Authors' Addresses

   Can Yang (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Phone: +86 18602029601
   Email: cscyang@scut.edu.cn

   Yu Liu (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Email: 201621032214@scut.edu.cn

   Cong Chen
   Inspur
   163 Pingyun Road
   Guangzhou, Tianhe District
   P.R. China

   Email: chen_cong@inspur.com

   Ge Chen
   GSTA
   No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
   Guangzhou, Tianhe District
   P.R. China

   Email: cheng@gsta.com

   Yukai Wei
   Huawei
   Putian Huawei Base
   Shenzhen, Longgang District
   P.R. China

   Email: weiyukai@huawei.com