Internet Engineering Task Force                             C. Yang, Ed.
Internet-Draft                           Y. Liu, Y. Wang & S.Y. Pan, Ed.
Intended status: Standards Track   South China University of Technology
Expires: November 28, 2020                                       C. Chen
                                                                  Inspur
                                                                 G. Chen
                                                                    GSTA
                                                                  Y. Wei
                                                                  Huawei
                                                            May 27, 2020

                   A Massive Data Migration Framework
             draft-yangcan-ietf-data-migration-standards-04

Abstract

   This document describes a standardized framework for implementing
   massive data migration between traditional databases and big-data
   platforms in the cloud via the Internet, especially for instances
   of the Hadoop data architecture.  The main goal of the framework is
   to provide concise and friendly interfaces so that users can more
   easily and quickly migrate massive data from a relational database
   to a distributed platform under a variety of requirements, in order
   to make full use of distributed storage resources and distributed
   computing capability and so relieve the storage and computing
   bottlenecks of traditional enterprise-level applications.  This
   document covers the fundamental architecture, data element
   specifications, operations, and interfaces related to massive data
   migration.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 28, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Terminology
   3.  Specific Framework Implementation Standards
     3.1.  System Architecture Diagram
     3.2.  Source and Target of Migration
       3.2.1.  The Data Sources of Migration
       3.2.2.  The Connection Testing of Relational Data Sources
       3.2.3.  The Target Storage Container of Data Migration
       3.2.4.  Specifying the Target Cloud Platform
       3.2.5.  Data Migration to Third-Party Web Applications
     3.3.  Type of Migrated Database
     3.4.  Scale of Migrated Table
       3.4.1.  Full Table Migration
       3.4.2.  Single Table Migration
       3.4.3.  Multi-Table Migration
     3.5.  Split-by
       3.5.1.  Single Column
       3.5.2.  Multiple Columns
       3.5.3.  Non-linear Segmentation
     3.6.  Conditional Query Migration
     3.7.  Dynamic Detection of Data Redundancy
     3.8.  Data Migration with Compression
     3.9.  Updating Mode of Data Migration
       3.9.1.  Appending Migration
       3.9.2.  Overwriting the Import
     3.10. The Encryption and Decryption of Data Migration
     3.11. Incremental Migration
     3.12. Real-Time Synchronization Migration
     3.13. The Direct Mode of Data Migration
     3.14. The Storage Format of Data Files
     3.15. The Number of Map Tasks
     3.16. Selecting the Elements of a Table to Be Migrated
     3.17. Visualization of Migration
       3.17.1.  Dataset Visualization
       3.17.2.  Visualization of Data Migration Progress
     3.18. Smart Analysis of Migration
     3.19. Task Scheduling
     3.20. The Alarm of Task Error
     3.21. Data Export From Cloud to RDBMS
       3.21.1.  Data Export Diagram
       3.21.2.  Full Export
       3.21.3.  Partial Export
     3.22. The Merger of Data
     3.23. Column Separator
     3.24. Record Line Separator
     3.25. The Mode of Payment
     3.26. Web Shell for Migration
       3.26.1.  Linux Web Shell
       3.26.2.  HBase Shell
       3.26.3.  Hive Shell
       3.26.4.  Hadoop Shell
       3.26.5.  Spark Shell
       3.26.6.  Spark Shell Programming Language
   4.  Security Considerations
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
     6.3.  URL References
   Authors' Addresses

1.  Introduction

   With the widespread popularization of cloud computing and big-data
   technology, the scale of data is increasing rapidly, and the demand
   for distributed computing is more significant than before.  For a
   long time, a majority of companies have used relational databases
   to store and manage their data, and a great amount of structured
   data still exists in legacy systems and accumulates as the business
   develops.  With the daily growth of data size, storage bottlenecks
   and degraded analysis and processing performance have become
   serious problems that need to be solved in enterprise-level
   applications worldwide.  A distributed platform refers to a
   software platform that builds data storage, data analysis, and
   computation on a cluster of multiple hosts.  Its core architecture
   involves distributed storage and distributed computing.  In terms
   of storage, it is theoretically possible to expand capacity
   indefinitely, and storage can be dynamically expanded horizontally
   as data grows.  In terms of computing, key computing frameworks
   such as MapReduce can be used to perform parallel computing on
   large-scale datasets to improve the efficiency of massive data
   processing.  Therefore, when the data size exceeds the storage
   capacity of a single system or the computation exceeds the
   computing capacity of a stand-alone system, massive data can be
   migrated to a distributed platform.  The resource sharing and
   collaborative computing provided by a distributed platform can well
   solve large-scale data processing problems.
   This document focuses on putting forward a standard for
   implementing a big-data migration framework through web access via
   the Internet, and considers how to help users more easily and
   quickly migrate massive data from a traditional relational database
   to a cloud platform under multiple requirements.  By using the
   distributed storage and distributed computing technologies of the
   cloud platform, the framework overcomes the storage bottleneck and
   the low data analysis and processing performance of relational
   databases.  Because it is accessed through the web, the framework
   supports an open working state and promotes worldwide applications
   of data migration.

   Note: It is also permissible to implement this framework in a
   non-web environment.

2.  Definitions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

   The following definitions are for terms used in the context of this
   document.

   o  "DMOW": "Data Migration on Web"; that is, data migration based
      on the web.

   o  "Cloud Computing": A pay-per-use model that provides available,
      convenient, on-demand network access to a configurable, shared
      pool of computing resources (including networks, servers,
      storage, application software, and services).  These resources
      can be provisioned quickly, with little administrative effort or
      interaction with service providers.

   o  "Big Data": A collection of data that cannot be captured,
      managed, and processed using conventional software tools within
      a certain time frame.  It is a massive, fast-growing, and
      diversified information asset that requires new processing modes
      to provide stronger decision-making power, insight, and process
      optimization capabilities.

   o  "Data Migration": The data migration described in this document
      is the data transfer process between a relational database and a
      cloud platform.

   o  "Data Storage": Data recorded in some format on the computer's
      internal or external storage media.

   o  "Data Cleansing": The process of re-examining and verifying
      data.  Its purpose is to remove duplicate information, correct
      existing errors, and provide data consistency.

   o  "Extraction-Transformation-Loading (ETL)": The processing of a
      user database or data warehouse in which data is extracted from
      various data sources, converted into data that meets the needs
      of the business, and finally loaded into the database.

   o  "Distributed Platform": A software platform that builds data
      storage, data analysis, and computation on a cluster of multiple
      hosts.

   o  "Distributed File System": A file system in which the physical
      storage resources are not directly connected to the local node
      but are instead distributed over a group of machine nodes
      connected by a high-speed internal network; these machine nodes
      together form a cluster.

   o  "Distributed Computing": A computer science discipline that
      studies how to divide a problem that requires a very large
      amount of computing power into many small parts that are
      processed in coordination by many independent computers to reach
      the final result.
   o  "Apache Hadoop": An open-source distributed system
      infrastructure that can be used to develop distributed programs
      for large-scale data computation and storage.

   o  "Apache HBase": An open-source, non-relational, distributed
      database, used with the Hadoop framework.

   o  "Apache Hive": A data warehouse infrastructure built on Hadoop.
      It can be used for data extraction-transformation-loading (ETL)
      and provides a mechanism to store, query, and analyze
      large-scale data stored in Hadoop.

   o  "HDFS": The Hadoop distributed file system, designed to run on
      general-purpose hardware.

   o  "MapReduce": A programming model for parallel computing on
      large-scale datasets (greater than 1 TB).

   o  "Spark": A fast and versatile computing engine designed for
      large-scale data processing.

   o  "MongoDB": A database based on distributed file storage,
      designed to provide scalable, high-performance data storage
      solutions for web applications.

   o  "GHTs": The index table of granule header information for the
      information granules to be sent, kept at the sender.

   o  "GDBs": The granule information database for the information
      granules to be sent, kept at the sender.

   o  "GHTr": The index table of granule header information for the
      information granules received, kept at the receiver.

   o  "GDBr": The granule information database for the information
      granules received, kept at the receiver.

3.  Specific Framework Implementation Standards

   The main goal of this data migration framework is to help companies
   migrate their massive data stored in relational databases to cloud
   platforms through web access.  We propose a series of rules and
   constraints on the implementation of the framework, by which users
   can conduct massive data migration from a multi-demand perspective.

   Note: The cloud platforms mentioned in this document refer to the
   Hadoop platform by default.  All standards on the operations and
   the environment of the framework refer to the web state by default.

3.1.  System Architecture Diagram

   Figure 1 shows the working diagram of the framework.

   +---------+          +----------------+
   |         |   (1)    |   WebServer    |
   | Browser |--------->|                |----------------------
   |         |          |  +-----------+ |                     |
   +---------+          |  |   DMOW    | |                     |
                        |  +-----------+ |                     |
                        +----------------+                     |
                                                               |(2)
                                                               |
                                                               |
   +-------------+            +-----------------------+        |
   |             |    (3)     |                       |        |
   | Data Source |----------->|    Cloud Platform     |        |
   |             |            | +------------------+  |<-------
   +-------------+            | | Migration Engine |  |
                              | +------------------+  |
                              +-----------------------+

                    Figure 1: Reference Architecture

   The workflow of the framework is as follows:

   Step (1) in the figure means that users submit the requisition of
   data migration to DMOW through the browser (the requisition
   includes data source information, target cloud platform
   information, and related migration parameter settings);

   Step (2) in the figure means that DMOW submits the user's request
   information of data migration to the cloud platform's migration
   engine;

   Step (3) in the figure means that the migration engine performs
   data migration tasks based on the migration requests it receives,
   migrating data from the relational database to the cloud platform.
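   As a non-normative illustration of the requisition carried in steps
   (1) and (2), the following Python sketch shows one possible shape
   of such a request.  All field names and values here are
   illustrative assumptions, not a schema defined by this document.

      # A minimal sketch of a DMOW migration requisition.  Every
      # field name below is a hypothetical example, not a normative
      # schema.
      migration_request = {
          "source": {                     # relational data source
              "type": "MYSQL",            # SQLSERVER | MYSQL | ORACLE
              "host": "db.example.com",
              "port": 3306,
              "database": "sales",
              "user": "migrator",
              "password": "********",
          },
          "target": {                     # target cloud platform
              "platform": "hadoop-cluster-1",
              "container": "HDFS",        # HDFS | HBASE | HIVE
              "path": "/warehouse/sales",
          },
          "parameters": {                 # related migration settings
              "tables": ["orders"],
              "split_by": "order_id",
              "num_map_tasks": 4,
              "compression": "GZIP",      # GZIP | BZIP2 | none
          },
      }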
3.2.  Source and Target of Migration

3.2.1.  The Data Sources of Migration

   This framework MUST support data migration between relational
   databases and cloud platforms on the web, and MUST meet the
   following requirements:

   1.  The framework supports connecting to data sources in relational
       databases.  The relational database MUST be at least one of the
       following:

       *  SQLSERVER

       *  MYSQL

       *  ORACLE

   2.  This framework MUST support the dynamic perception of data
       information in relational databases under a normal connection;
       in other words:

       *  It MUST support dynamic awareness of all tables in a
          relational database;

       *  It MUST support dynamic awareness of all columns
          corresponding to all tables in a relational database.

3.2.2.  The Connection Testing of Relational Data Sources

   Before conducting data migration, the framework MUST support
   testing the connection to the data sources that will be migrated,
   and then deciding whether to migrate.  A minimal sketch of such a
   test follows.
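   The following Python sketch illustrates one way to perform this
   connection test, assuming a MySQL source and the third-party
   "pymysql" package; the host and credential values are illustrative
   assumptions.

      # A sketch of the connection test in Section 3.2.2, assuming a
      # MySQL source and the third-party "pymysql" package.
      import pymysql

      def test_connection(host, port, user, password, database):
          """Return True if the relational data source is reachable."""
          try:
              conn = pymysql.connect(host=host, port=port, user=user,
                                     password=password,
                                     database=database,
                                     connect_timeout=5)
              conn.close()
              return True
          except pymysql.MySQLError:
              return False

      if test_connection("db.example.com", 3306, "migrator",
                         "********", "sales"):
          print("Source reachable; migration may proceed.")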
3.2.3.  The Target Storage Container of Data Migration

   This framework MUST allow users to migrate large amounts of data
   from a relational database to at least two of the following types
   of target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.2.4.  Specifying the Target Cloud Platform

   This framework MUST allow an authorized user to specify the target
   cloud platform to which the data will be migrated.

3.2.5.  Data Migration to Third-Party Web Applications

   This framework SHALL support the migration of large amounts of data
   from relational databases to one or multiple data containers of
   third-party Web applications.  The target storage containers of the
   third-party Web application systems can be:

   o  MONGODB

   o  MYSQL

   o  SQLSERVER

   o  ORACLE

3.3.  Type of Migrated Database

   This framework needs to meet the following requirements:

   o  It MAY support migrating the entire relational database to the
      cloud platform;

   o  It MAY support homogeneous migration (for example, migration
      from ORACLE to ORACLE);

   o  It MAY support heterogeneous migration between different
      databases (for example, migration from ORACLE to SQLServer);

   o  It SHALL support migration to the MONGODB database;

   o  It is OPTIONAL to support, if the migration process is
      interrupted, automatically restarting the migration process and
      continuing the migration from where it left off.  Additionally,
      the framework needs to be able to inform the user of such an
      abnormal interruption in the following ways:

      *  It MUST support popping up an alert box on the screen of the
         user;

      *  It SHALL support notifying users by email;

      *  It is OPTIONAL to notify users by an instant messenger such
         as WeChat or QQ.

3.4.  Scale of Migrated Table

3.4.1.  Full Table Migration

   This framework MUST support the migration of all tables in a
   relational database to at least two types of target storage
   containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.2.  Single Table Migration

   This framework MUST allow users to specify a single table in a
   relational database and migrate it to at least two types of target
   storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.4.3.  Multi-Table Migration

   This framework MUST allow users to specify multiple tables in a
   relational database and migrate them to at least two types of
   target storage containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.  Split-by

   This framework needs to meet the following requirements on
   split-by.

3.5.1.  Single Column

   1.  The framework MUST allow the user to specify a single column of
       the data table (usually the table's primary key), then slice
       the data in the table into multiple parallel tasks based on
       this column, and migrate the sliced data to one or more of the
       following target data containers respectively:

       *  HDFS

       *  HBASE

       *  HIVE

       The specification of the data table column can be based on the
       following methods:

       +  Users can specify freely;

       +  Users can specify linearly;

       +  Users can select an appropriate column for the segmentation
          based on the information entropy of the selected column
          data.

   2.  The framework SHALL allow the user to query the boundaries of
       the specified column in the split-by, then slice the data into
       multiple parallel tasks and migrate the data to one or more of
       the following target data containers (see the sketch after
       this list):

       *  HDFS

       *  HBASE

       *  HIVE
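   To illustrate the boundary-based slicing in item 2, the following
   Python sketch divides a numeric split-by column into equal-width
   ranges, one per parallel task.  The function, table, and column
   names are hypothetical; real tools (for example, Apache Sqoop
   [sqoop]) perform a similar computation.

      # A sketch of boundary-based slicing for a numeric split-by
      # column.  "cursor" is any DB-API cursor on the source database.
      def split_ranges(cursor, table, column, num_tasks):
          """Return (lo, hi) bounds, one pair per parallel task."""
          cursor.execute("SELECT MIN(%s), MAX(%s) FROM %s"
                         % (column, column, table))
          lo, hi = cursor.fetchone()
          width = (hi - lo + 1) / float(num_tasks)
          ranges = []
          for i in range(num_tasks):
              start = lo + int(i * width)
              end = min(lo + int((i + 1) * width) - 1, hi)
              ranges.append((start, end))
          return ranges

      # Each (start, end) pair then backs one parallel task, e.g.:
      #   SELECT * FROM orders WHERE order_id BETWEEN start AND end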
3.5.2.  Multiple Columns

   This framework MAY allow the user to specify multiple columns of
   the data table to slice the data linearly into multiple parallel
   tasks and then migrate the data to one or more of the following
   target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.5.3.  Non-linear Segmentation

   It is OPTIONAL for this framework to support non-linear intelligent
   segmentation of the data of one or more columns and then migrate
   the data to one or more of the target data containers listed below.

   Non-linear intelligent segmentation refers to:

      *  Adaptive segmentation based on the distribution (density) of
         the values of numerical columns;

      *  Adaptive segmentation based on the distribution of entropy
         over subsegments of a column;

      *  Adaptive segmentation based on a neural network predictor.

   The target data containers include:

      *  HDFS

      *  HBASE

      *  HIVE

3.6.  Conditional Query Migration

   This framework SHALL allow users to specify query conditions, then
   query out the corresponding data records and migrate them.

3.7.  Dynamic Detection of Data Redundancy

   It is OPTIONAL for the framework to allow users to add data
   redundancy labels and a label communication mechanism so that
   redundant data can be detected dynamically during data migration to
   achieve non-redundant migration.

   The specific requirements are as follows:

   o  The framework SHALL be able to perform deep granulation
      processing on the piece of data content to be sent.  This means
      the content segment to be sent is further divided into
      smaller-sized data sub-blocks.

   o  The framework SHALL be able to perform feature calculation and
      form a granule header for each of the decomposed granules; the
      granule header information includes, but is not limited to, the
      granule feature value, granule data fingerprint, unique granule
      ID, granule generation time, source address, and destination
      address.

   o  The framework SHALL be able to inspect the granule header
      information to determine the transmission status of each
      decomposed information granule: if the current information
      granule to be sent is already present at the receiving end, the
      content of the granule is not sent; otherwise, the current
      granule is sent out (see the sketch after this list).

   o  The framework SHALL be able to set a cache at the sending port
      to hold the granule header information index table (GHTs) and
      the granule information database (GDBs) for the information
      granules; the receiver SHALL be able to set a cache to hold the
      granule header information index table (GHTr) and the granule
      information database (GDBr) of the information granules that
      have been successfully received.

   o  After all the fragments of the data have been transferred, the
      framework SHALL be able to reassemble all the fragments and
      store the data on the receiving disk.

   o  The framework SHALL be able to set a granule encoder at the
      sending port, which is responsible for encoding and compressing
      the information granule content generated by the granule
      resolver.  The encoder generates a coded version of the
      corresponding information granule and calculates the header of
      the compressed information granule, then performs transmission
      processing by sending the granule header, detecting redundant
      granules, synthesizing the granules, and accurately detecting
      redundant granules.

   o  The framework SHALL be able to set a granule decoder at the
      receiving port, which is responsible for decoding the encoded,
      compressed granule content at the receiving port and merging it
      with the granule synthesizer, whether the content comes from the
      sending-port cache or the receiving-port cache.
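   The following Python sketch illustrates the fingerprint check
   described above: the sender hashes each granule, consults an index
   of fingerprints already held by the receiver, and transmits only
   unseen granules.  The 4 KiB granule size and the in-memory set
   standing in for the GHTr are assumptions, not a normative wire
   format.

      # A sketch of redundancy detection by granule fingerprint.
      import hashlib

      GRANULE_SIZE = 4096   # assumed granule size, not normative

      def granulate(data):
          """Split a content segment into fixed-size granules."""
          for i in range(0, len(data), GRANULE_SIZE):
              yield data[i:i + GRANULE_SIZE]

      def send_non_redundant(data, receiver_index, send):
          """Send only granules the receiver does not already hold."""
          for granule in granulate(data):
              fingerprint = hashlib.sha256(granule).hexdigest()
              if fingerprint not in receiver_index:   # GHTr lookup
                  send(granule)
                  receiver_index.add(fingerprint)     # update index

      # Usage: send_non_redundant(payload, set(), transmit)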
3.8.  Data Migration with Compression

   During the data migration process, the data is not compressed by
   default.  This framework MUST support at least one of the following
   data compression encoding formats, allowing the user to compress
   the data and migrate it:

   o  GZIP

   o  BZIP2

3.9.  Updating Mode of Data Migration

3.9.1.  Appending Migration

   This framework SHALL support migrating appended data to existing
   datasets in HDFS.

3.9.2.  Overwriting the Import

   When importing data into HIVE, the framework SHALL support
   overwriting the original dataset and saving the result.

3.10.  The Encryption and Decryption of Data Migration

   This framework needs to meet the following requirements:

   o  It MAY support data encryption at the source, in which case the
      received data should be decrypted and stored on the target
      platform;

   o  It MUST support authentication when reading the source data of a
      migration;

   o  It SHALL support the verification of identity and permission
      when accessing the target platform of a data migration;

   o  During the process of data migration, it SHOULD support data
      consistency;

   o  During the process of data migration, it MUST support data
      integrity.

3.11.  Incremental Migration

   The framework SHOULD support incremental migration of table records
   in a relational database, and it MUST allow the user to specify a
   field value as "last_value" in the table in order to characterize
   the row-record increment.  The framework SHOULD then migrate those
   records in the table whose field value is greater than the
   specified "last_value" and afterwards update the last_value.  A
   minimal sketch follows.
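   The following Python sketch illustrates one incremental pass over a
   source table.  The cursor, table, and column names are illustrative
   assumptions; persisting the returned high-water mark is left to the
   implementation.

      # A sketch of incremental migration on a "last_value" column.
      # "cursor" is a DB-API cursor on the source; "emit" forwards
      # one row to the cloud platform.
      def incremental_migrate(cursor, table, check_column, last_value,
                              emit):
          """Migrate rows newer than last_value; return the new mark."""
          # Determine the new high-water mark first, then migrate the
          # half-open interval (last_value, new_last].
          cursor.execute("SELECT MAX(%s) FROM %s"
                         % (check_column, table))
          new_last = cursor.fetchone()[0]
          if new_last is None or new_last <= last_value:
              return last_value            # nothing new to migrate
          cursor.execute(
              "SELECT * FROM %s WHERE %s > %%s AND %s <= %%s"
              % (table, check_column, check_column),
              (last_value, new_last))
          for row in cursor.fetchall():
              emit(row)                    # append to the target
          return new_last                  # next run's last_value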
3.12.  Real-Time Synchronization Migration

   The framework SHALL support real-time synchronous migration of
   updated data and incremental data from a relational database to one
   or more of the following target data containers:

   o  HDFS

   o  HBASE

   o  HIVE

3.13.  The Direct Mode of Data Migration

   This framework MUST support data migration in direct mode, which
   can increase the data migration rate.

   Note: This mode is supported only for MYSQL and POSTGRESQL.

3.14.  The Storage Format of Data Files

   This framework MUST allow saving the migrated data in at least one
   of the following data file formats:

   o  SEQUENCE

   o  TEXTFILE

   o  AVRO

3.15.  The Number of Map Tasks

   This framework MUST allow the user to specify a number of map tasks
   so that a corresponding number of map tasks is started to migrate
   large amounts of data in parallel.

3.16.  Selecting the Elements of a Table to Be Migrated

   o  The specification of columns

      This framework MUST allow the user to specify the data of one or
      multiple columns of a table to be migrated.

   o  The specification of rows

      This framework SHOULD allow the user to specify the range of
      rows of a table to be migrated.

   o  The combination of the specification of columns and rows

      This framework MAY optionally allow the user to specify the
      range of both rows and columns of a table to be migrated.

3.17.  Visualization of Migration

3.17.1.  Dataset Visualization

   After the framework has migrated the data in the relational
   database, it MUST support the visualization of the dataset in the
   cloud platform.

3.17.2.  Visualization of Data Migration Progress

   The framework SHOULD support dynamically showing the progress to
   users in a graphical mode while migrating.

3.18.  Smart Analysis of Migration

   The framework MAY provide automated migration proposals to
   facilitate the user's estimation of migration workload and costs.

3.19.  Task Scheduling

   The framework SHALL allow the user to set various migration
   parameters (such as the number of map tasks, the storage format of
   data files, the type of data compression, and so on) and the task
   execution time, and then schedule the offline/online migration
   tasks.

3.20.  The Alarm of Task Error

   When a task fails, the framework MUST at least support notifying
   stakeholders in a predefined way.

3.21.  Data Export From Cloud to RDBMS

3.21.1.  Data Export Diagram

   Figure 2 shows the framework's working diagram for exporting data.

   +---------+          +----------------+
   |         |   (1)    |   WebServer    |
   | Browser |--------->|                |----------------------
   |         |          |  +-----------+ |                     |
   +---------+          |  |   DMOW    | |                     |
                        |  +-----------+ |                     |
                        +----------------+                     |
                                                               |(2)
                                                               |
                                                               |
   +-------------+            +-----------------------+        |
   |             |    (3)     |                       |        |
   | Data Source |<-----------|    Cloud Platform     |        |
   |             |            | +------------------+  |<-------
   +-------------+            | | Migration Engine |  |
                              | +------------------+  |
                              +-----------------------+

                      Figure 2: Reference Diagram

   The workflow of exporting data through the framework is as follows:

   Step (1) in the figure means that users submit the requisition of
   data migration to DMOW through the browser (the requisition
   includes cloud platform information, the information of the target
   relational database, and related migration parameter settings);

   Step (2) in the figure means that DMOW submits the user's request
   information of data migration to the cloud platform's migration
   engine;

   Step (3) in the figure means that the migration engine performs
   data migration tasks based on the migration requests it receives,
   migrating data from the cloud platform to the relational database.

3.21.2.  Full Export

   The framework MUST at least support exporting data from HDFS to one
   of the following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HBASE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

   The framework SHALL support exporting data from HIVE to one of the
   following relational databases:

   o  SQLSERVER

   o  MYSQL

   o  ORACLE

3.21.3.  Partial Export

   The framework SHALL allow the user to specify a range of keys on
   the cloud platform and export the elements in the specified range
   to a relational database.  It SHALL also support exporting only a
   subset of columns.  A minimal sketch follows.
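   The following Python sketch illustrates a partial export: only the
   rows in a key range, restricted to a subset of columns, are copied
   to the relational target.  The helper functions and names are
   hypothetical assumptions.

      # A sketch of a partial export (Section 3.21.3): rows in the
      # key range [key_lo, key_hi] are exported with a subset of
      # columns.
      def export_partial(read_rows, insert_row, columns,
                         key_lo, key_hi):
          """Copy rows in the key range, keeping only `columns`."""
          for row in read_rows(key_lo, key_hi):  # scan cloud dataset
              subset = {name: row[name] for name in columns}
              insert_row(subset)                 # INSERT into RDBMS

      # Usage with hypothetical helpers:
      #   export_partial(hbase_scan, mysql_insert,
      #                  ["order_id", "amount"], 1000, 1999)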
3.22.  The Merger of Data

   The framework SHALL support merging data from different directories
   in HDFS and storing the result in a specified directory.

3.23.  Column Separator

   The framework MUST allow the user to specify the separator between
   fields in the migration process.

3.24.  Record Line Separator

   The framework MUST allow the user to specify the separator between
   record lines after the migration is complete.

3.25.  The Mode of Payment

   1.  One-way payment mode

       *  By default in the framework, users SHALL pay for downloading
          data from the cloud platform; uploading data from a
          relational database to the cloud platform is free;

       *  Alternatively, users SHALL pay for uploading data from a
          relational database to the cloud platform; downloading data
          from the cloud is free.

   2.  Two-way payment mode

       In the framework, users SHALL pay a fee for data migration in
       both directions between the relational database and the cloud
       platform.

3.26.  Web Shell for Migration

   The framework provides the following character-interface shells,
   operated through web access.

3.26.1.  Linux Web Shell

   The framework SHALL support a Linux shell through web access, which
   allows users to execute basic Linux commands for configuration
   management of the migrated data on the web.

3.26.2.  HBase Shell

   The framework SHALL support an HBase shell through web access,
   which allows users to perform basic operations such as adding,
   deleting, and querying on the data migrated to HBase through the
   web shell.

3.26.3.  Hive Shell

   The framework SHALL support a Hive shell through web access, which
   allows users to perform basic operations such as adding, deleting,
   and querying on the data migrated to Hive through the web shell.

3.26.4.  Hadoop Shell

   The framework SHALL support the Hadoop shell through web access so
   that users can perform basic Hadoop command operations through the
   web shell.

3.26.5.  Spark Shell

   The framework SHALL support a Spark shell through web access and
   provide an interactive way to analyze and process the data in the
   cloud platform.

3.26.6.  Spark Shell Programming Language

   In the Spark web shell, the framework SHALL support at least one of
   the following programming languages:

   o  Scala

   o  Java

   o  Python

4.  Security Considerations

   The framework SHOULD support securing the data migration process.
   During data migration, it SHOULD support encrypting the data before
   transmission and then decrypting it for storage at the target after
   the transfer is complete.  At the same time, it MUST support
   authentication when reading the migration source data, and it SHALL
   support the verification of identity and permission when accessing
   the target platform.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  References

6.1.  Normative References

   [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996,
              <https://www.rfc-editor.org/info/rfc2026>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J.
              Schoenwaelder, Ed., "Structure of Management Information
              Version 2 (SMIv2)", STD 58, RFC 2578,
              DOI 10.17487/RFC2578, April 1999,
              <https://www.rfc-editor.org/info/rfc2578>.

6.2.  Informative References

   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
              DOI 10.17487/RFC2629, June 1999,
              <https://www.rfc-editor.org/info/rfc2629>.

   [RFC4710]  Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-
              time Application Quality-of-Service Monitoring (RAQMON)
              Framework", RFC 4710, DOI 10.17487/RFC4710, October
              2006, <https://www.rfc-editor.org/info/rfc4710>.

   [RFC5694]  Camarillo, G., Ed. and IAB, "Peer-to-Peer (P2P)
              Architecture: Definition, Taxonomies, Examples, and
              Applicability", RFC 5694, DOI 10.17487/RFC5694, November
              2009, <https://www.rfc-editor.org/info/rfc5694>.

6.3.  URL References

   [hadoop]   The Apache Software Foundation,
              "http://hadoop.apache.org/".

   [hbase]    The Apache Software Foundation,
              "http://hbase.apache.org/".

   [hive]     The Apache Software Foundation,
              "http://hive.apache.org/".

   [idguidelines]
              IETF Internet Drafts editor,
              "http://www.ietf.org/ietf/1id-guidelines.txt".

   [idnits]   IETF Internet Drafts editor,
              "http://www.ietf.org/ID-Checklist.html".

   [ietf]     IETF Tools Team, "http://tools.ietf.org".

   [ops]      the IETF OPS Area, "http://www.ops.ietf.org".

   [spark]    The Apache Software Foundation,
              "http://spark.apache.org/".

   [sqoop]    The Apache Software Foundation,
              "http://sqoop.apache.org/".

   [xml2rfc]  XML2RFC tools and documentation,
              "http://xml.resource.org".

Authors' Addresses

   Can Yang (editor)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Phone: +86 18602029601
   Email: cscyang@scut.edu.cn

   Yu Liu, Ying Wang & ShiYing Pan (editors)
   South China University of Technology
   382 Zhonghuan Road East
   Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Email: 201820132798@scut.edu.cn

   Cong Chen
   Inspur
   163 Pingyun Road
   Guangzhou, Tianhe District
   P.R. China

   Email: chen_cong@inspur.com

   Ge Chen
   GSTA
   No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
   Guangzhou, Tianhe District
   P.R. China

   Email: cheng@gsta.com

   Yukai Wei
   Huawei
   Putian Huawei Base
   Shenzhen, Longgang District
   P.R. China

   Email: weiyukai@huawei.com