Internet Engineering Task Force                             C. Yang, Ed.
Internet-Draft                             Y. Liu, Y. Wang & S. Pan, Ed.
Intended status: Standards Track    South China University of Technology
Expires: November 28, 2020                                       C. Chen
                                                                   Inspur
                                                                  G. Chen
                                                                     GSTA
                                                                   Y. Wei
                                                                   Huawei
                                                             May 27, 2020

                    A Massive Data Migration Framework
             draft-yangcan-ietf-data-migration-standards-04

Abstract

This document describes a standardized framework for migrating massive
data between traditional databases and big-data platforms on the cloud
via the Internet, in particular for a Hadoop-based data architecture.
The main goal of the framework is to provide concise and user-friendly
interfaces so that users can more easily and quickly migrate massive
data from a relational database to a distributed platform under a
variety of requirements, in order to make full use of distributed
storage resources and distributed computing capability and thereby
relieve the storage and computing bottlenecks of traditional
enterprise-level applications.  This document covers the fundamental
architecture, data element specifications, operations, and interfaces
related to massive data migration.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF).  Note that other groups may also distribute working
documents as Internet-Drafts.  The list of current Internet-Drafts is
at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

This Internet-Draft will expire on November 28, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document.  Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions and Terminology
   3.  Specific Framework Implementation Standards
     3.1.  System Architecture Diagram
     3.2.  Source and Target of Migration
       3.2.1.  The Data Sources of Migration
       3.2.2.  The Connection Testing of Relational Data Sources
       3.2.3.  The Target Storage Container of Data Migration
       3.2.4.  Specifying Target Cloud Platform
       3.2.5.  Data Migration to Third-party Web Applications
     3.3.  Type of Migrated Database
     3.4.  Scale of Migrated Table
       3.4.1.  Full Table Migration
       3.4.2.  Single Table Migration
       3.4.3.  Multi-table Migration
     3.5.  Split-by
       3.5.1.  Single Column
       3.5.2.  Multiple Column
       3.5.3.  Non-linear Segmentation
     3.6.  Conditional Query Migration
     3.7.  Dynamic Detection of Data Redundancy
     3.8.  Data Migration with Compression
     3.9.  Updating Mode of Data Migration
       3.9.1.  Appending Migration
       3.9.2.  Overwriting the Import
     3.10. The Encryption and Decryption of Data Migration
     3.11. Incremental Migration
     3.12. Real-Time Synchronization Migration
     3.13. The Direct Mode of Data Migration
     3.14. The Storage Format of Data Files
     3.15. The Number of Map Tasks
     3.16. The Selection of Elements in a Table to Be Migrated
     3.17. Visualization of Migration
       3.17.1.  Dataset Visualization
       3.17.2.  Visualization of Data Migration Progress
     3.18. Smart Analysis of Migration
     3.19. Task Scheduling
     3.20. The Alarm of Task Error
     3.21. Data Export From Cloud to RDBMS
       3.21.1.  Data Export Diagram
       3.21.2.  Full Export
       3.21.3.  Partial Export
     3.22. The Merger of Data
     3.23. Column Separator
     3.24. Record Line Separator
     3.25. The Mode of Payment
     3.26. Web Shell for Migration
       3.26.1.  Linux Web Shell
       3.26.2.  HBase Shell
       3.26.3.  Hive Shell
       3.26.4.  Hadoop Shell
       3.26.5.  Spark Shell
       3.26.6.  Spark Shell Programming Language
   4.  Security Considerations
   5.  IANA Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
     6.3.  URL References
   Authors' Addresses

1.  Introduction

With the widespread adoption of cloud computing and big-data
technology, the scale of data is increasing rapidly, and the demand for
distributed computing is more significant than before.  For a long
time, a majority of companies have used relational databases to store
and manage their data, and a great amount of structured data still
resides in legacy systems and keeps accumulating as the business grows.
With the daily growth of data size, the storage bottleneck and the
performance degradation experienced when analyzing and processing data
have become serious problems that need to be solved in global
enterprise-level applications.

A distributed platform refers to a software platform that builds data
storage, data analysis, and computation on a cluster of multiple hosts.
Its core architecture involves distributed storage and distributed
computing.  In terms of storage, it is theoretically possible to expand
capacity indefinitely, and storage can be dynamically expanded
horizontally as data grows.  In terms of computing, key computing
frameworks such as MapReduce can be used to perform parallel computing
on large-scale datasets to improve the efficiency of massive data
processing.  Therefore, when the data size exceeds the storage capacity
of a single system or the computation exceeds the computing capacity of
a stand-alone system, massive data can be migrated to a distributed
platform.  The resource sharing and collaborative computing provided by
a distributed platform can solve large-scale data processing problems
well.

This document focuses on putting forward a standard for implementing a
big-data migration framework accessed over the web via the Internet,
and considers how to help users more easily and quickly migrate massive
data from a traditional relational database to a cloud platform under
multiple requirements.  Using the distributed storage and distributed
computing technologies highlighted by the cloud platform, the framework
solves the storage bottleneck and the low data-analysis and processing
performance of relational databases.  Because it is accessed through
the web, the framework supports an open working state and promotes
global applications of data migration.  Note: it is also permissible to
implement this framework in a non-web manner.

2.  Definitions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

The following definitions are for terms used in the context of this
document.

o "DMOW": The abbreviation of "Data Migration on Web"; it denotes data
  migration based on the web.

o "Cloud Computing": A pay-per-use model that provides available,
  convenient, on-demand network access to a shared pool of configurable
  computing resources (including networks, servers, storage,
  application software, and services); these resources can be
  provisioned quickly, with little administrative effort or interaction
  with service providers.

o "Big Data": A collection of data that cannot be captured, managed,
  and processed with conventional software tools within a certain time
  frame; that is, massive, fast-growing, and diversified information
  assets that require new processing modes to provide stronger
  decision-making power, insight, and process-optimization
  capabilities.

o "Data Migration": The data migration described in this document
  refers to the data transfer process between a relational database and
  a cloud platform.

o "Data Storage": Data recorded in some format on a computer's internal
  or external storage media.

o "Data Cleansing": A process of re-examining and verifying data.
  Its purpose is to remove duplicate information, correct existing
  errors, and provide data consistency.

o "Extraction-Transformation-Loading (ETL)": The process of populating
  a database or data warehouse: data is extracted from various data
  sources, transformed into data that meets the needs of the business,
  and finally loaded into the target database.

o "Distributed Platform": A software platform that builds data storage,
  data analysis, and computation on a cluster of multiple hosts.

o "Distributed File System": A file system in which the physical
  storage resources are not directly connected to the local node.
  Instead, they are distributed over a group of machine nodes connected
  by a high-speed internal network; these machine nodes together form a
  cluster.

o "Distributed Computing": A computer science discipline that studies
  how to divide a problem that requires a very large amount of
  computing power into many small parts, which are processed in a
  coordinated way by many independent computers to obtain the final
  result.

o "Apache Hadoop": An open-source distributed system infrastructure
  that can be used to develop distributed programs for large-scale data
  processing and storage.

o "Apache HBase": An open-source, non-relational, distributed database
  used with the Hadoop framework.

o "Apache Hive": A data warehouse infrastructure built on Hadoop.  It
  can be used for data extraction-transformation-loading (ETL) and
  provides a mechanism to store, query, and analyze large-scale data
  stored in Hadoop.

o "HDFS": The Hadoop distributed file system, designed to run on
  general-purpose hardware.

o "MapReduce": A programming model for parallel computing over
  large-scale data sets (greater than 1 TB).

o "Spark": A fast and versatile computing engine designed for
  large-scale data processing.

o "MongoDB": A database based on distributed file storage, designed to
  provide scalable, high-performance data storage solutions for web
  applications.

o "GHTs": The granular header information index table kept at the
  sender for the information granules it sends.

o "GDBs": The granular information database kept at the sender for the
  information granules it sends.

o "GHTr": The granular header information index table kept at the
  receiver for the information granules it has received.

o "GDBr": The granular information database kept at the receiver for
  the information granules it has received.

3.  Specific Framework Implementation Standards

The main goal of this data migration framework is to help companies
migrate their massive data stored in relational databases to cloud
platforms through web access.  We propose a series of rules and
constraints on the implementation of the framework, by which users can
conduct massive data migration from a multi-demand perspective.

Note: The cloud platforms mentioned in this document refer to the
Hadoop platform by default.  All standards on the operations and the
environment of the framework refer to the web state by default.

3.1.  System Architecture Diagram

Figure 1 shows the working diagram of the framework.
   +---------+            +----------------+
   |         |    (1)     |   WebServer    |
   | Browser |----------->|                |--------------------+
   |         |            |  +-----------+ |                    |
   +---------+            |  |   DMOW    | |                    |
                          |  +-----------+ |                    |
                          +----------------+                    |
                                  |(2)                          |
                                  |                             |
   +-------------+        +-----------------------+             |
   |             |  (3)   |                       |             |
   | Data Source |------->|    Cloud Platform     |             |
   |             |        |  +-----------------+  |<------------+
   +-------------+        |  | Migration Engine|  |
                          |  +-----------------+  |
                          +-----------------------+

                   Figure 1: Reference Architecture

The workflow of the framework is as follows:

Step (1) in the figure means that users submit a data migration request
to DMOW through a browser (the request includes data source
information, target cloud platform information, and related migration
parameter settings);

Step (2) in the figure means that DMOW submits the user's data
migration request to the cloud platform's migration engine;

Step (3) in the figure means that the migration engine performs data
migration tasks based on the migration requests it receives, migrating
data from the relational database to the cloud platform.

3.2.  Source and Target of Migration

3.2.1.  The Data Sources of Migration

This framework MUST support data migration between relational databases
and cloud platforms on the web, and MUST meet the following
requirements:

1. The framework supports connecting to data sources in relational
   databases.  The relational database MUST be at least one of the
   following:

   * SQLSERVER
   * MYSQL
   * ORACLE

2. This framework MUST support the dynamic perception of data
   information in relational databases under a normal connection; in
   other words:

   * It MUST support dynamic awareness of all tables in a relational
     database;

   * It MUST support dynamic awareness of all columns corresponding to
     all tables in a relational database;

3.2.2.  The Connection Testing of Relational Data Sources

Before conducting data migration, the framework MUST support testing
the connection to the data sources that will be migrated, and then
decide whether to migrate.  (A non-normative sketch of such a
connection test is given at the end of Section 3.2.)

3.2.3.  The Target Storage Container of Data Migration

This framework MUST allow users to migrate large amounts of data from a
relational database to at least two of the following types of target
storage containers:

o HDFS
o HBASE
o HIVE

3.2.4.  Specifying Target Cloud Platform

This framework MUST allow an authorized user to specify the target
cloud platform to which the data will be migrated.

3.2.5.  Data Migration to Third-party Web Applications

This framework SHALL support the migration of large amounts of data
from relational databases to one or multiple data containers of
third-party Web applications.  The target storage containers of the
third-party Web application systems can be:

o MONGODB
o MYSQL
o SQLSERVER
o ORACLE
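The following non-normative Python sketch illustrates one possible way
to implement the connection test of Section 3.2.2 together with the
dynamic table and column awareness of Section 3.2.1 for a MYSQL data
source.  The pymysql driver, the connection parameters, and the
information_schema query are illustrative assumptions, not part of this
specification.

   # Non-normative sketch: connection test and dynamic schema discovery
   # for a MYSQL data source.  The pymysql driver and the connection
   # parameters are illustrative assumptions only.
   import pymysql

   def test_connection_and_discover(host, port, user, password, database):
       """Return {table_name: [column_names]} if the source is reachable."""
       conn = pymysql.connect(host=host, port=port, user=user,
                              password=password, database=database)
       try:
           with conn.cursor() as cur:
               cur.execute("SELECT 1")          # trivial connection test
               # Dynamic awareness of all tables and their columns.
               cur.execute(
                   "SELECT table_name, column_name "
                   "FROM information_schema.columns "
                   "WHERE table_schema = %s "
                   "ORDER BY table_name, ordinal_position", (database,))
               schema = {}
               for table, column in cur.fetchall():
                   schema.setdefault(table, []).append(column)
               return schema
       finally:
           conn.close()

A DMOW implementation could run such a probe before accepting a
migration request and report any failure back to the user's browser.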
3.3.  Type of Migrated Database

This framework needs to meet the following requirements:

o It MAY support migrating the entire relational database to the cloud
  platform;

o It MAY support homogeneous migration (for example, migration from
  ORACLE to ORACLE);

o It MAY support heterogeneous migration between different databases
  (for example, migration from ORACLE to SQLSERVER);

o It SHALL support migration to the MONGODB database;

o It is OPTIONAL to support, if the migration process is interrupted,
  automatically restarting the migration process and continuing the
  migration from where it left off;

  Additionally, the framework needs to be able to inform the user of
  such an abnormal interruption in the following ways:

  * It MUST support popping up an alert box on the screen of the user;

  * It SHALL support notifying users by email;

  * It is OPTIONAL to notify users via an instant messenger such as
    WeChat or QQ;

3.4.  Scale of Migrated Table

3.4.1.  Full Table Migration

This framework MUST support the migration of all tables in a relational
database to at least two of the following types of target storage
containers:

o HDFS
o HBASE
o HIVE

3.4.2.  Single Table Migration

This framework MUST allow users to specify a single table in a
relational database and migrate it to at least two of the following
types of target storage containers:

o HDFS
o HBASE
o HIVE

3.4.3.  Multi-table Migration

This framework MUST allow users to specify multiple tables in a
relational database and migrate them to at least two of the following
types of target storage containers:

o HDFS
o HBASE
o HIVE

3.5.  Split-by

This framework needs to meet the following requirements on split-by.
(A non-normative sketch of linear single-column splitting is given at
the end of Section 3.5.2.)

3.5.1.  Single Column

1. The framework MUST allow the user to specify a single column of the
   data table (usually the table's primary key), then slice the data in
   the table into multiple parallel tasks based on this column, and
   migrate the sliced data to one or more of the following target data
   containers:

   * HDFS
   * HBASE
   * HIVE

   The data table column can be specified in the following ways:

   + Users can specify it freely;

   + Users can specify it linearly;

   + Users can select an appropriate column for the segmentation based
     on the information entropy of the selected column data;

2. The framework SHALL allow the user to query the boundaries of the
   specified split-by column, then slice the data into multiple
   parallel tasks and migrate the data to one or more of the following
   target data containers:

   * HDFS
   * HBASE
   * HIVE

3.5.2.  Multiple Column

This framework MAY allow the user to specify multiple columns in the
data table to slice the data linearly into multiple parallel tasks and
then migrate the data to one or more of the following target data
containers:

o HDFS
o HBASE
o HIVE
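The following non-normative Python sketch illustrates the linear,
single-column split-by behaviour of Section 3.5.1: the boundaries of
the split column are divided into contiguous ranges, and each range
becomes the WHERE clause of one parallel migration task.  The table
name "orders" and column name "id" are examples only.

   # Non-normative sketch: linear split-by on a single numeric column.
   # Each returned WHERE clause describes the slice handled by one
   # parallel migration task; "id" and "orders" are example names only.
   def linear_split_conditions(column, lo, hi, num_tasks):
       """Divide [lo, hi] into num_tasks contiguous, non-overlapping ranges."""
       step = (hi - lo + 1) / num_tasks
       conditions = []
       for i in range(num_tasks):
           lower = int(lo + i * step)
           upper = int(lo + (i + 1) * step) - 1 if i < num_tasks - 1 else hi
           conditions.append(f"{column} >= {lower} AND {column} <= {upper}")
       return conditions

   # Example: split table "orders" on column "id" whose boundaries
   # (SELECT MIN(id), MAX(id) FROM orders) are 1 and 1_000_000.
   for cond in linear_split_conditions("id", 1, 1_000_000, 4):
       print(f"SELECT * FROM orders WHERE {cond}")

The boundary query in requirement 2 of Section 3.5.1 would supply the
"lo" and "hi" values above before the parallel tasks are launched.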
3.5.3.  Non-linear Segmentation

It is OPTIONAL for this framework to support non-linear intelligent
segmentation of the data over one or more columns and then migrate the
data to one or more of the target data containers listed below.

The non-linear intelligent segmentations refer to:

* Adaptive segmentation based on the distribution (density) of the
  values of numerical columns;

* Adaptive segmentation based on the distribution of entropy of
  subsegments of a column;

* Adaptive segmentation based on a neural network predictor;

The target data containers include:

* HDFS
* HBASE
* HIVE

3.6.  Conditional Query Migration

This framework SHALL allow users to specify query conditions, then
query out the corresponding data records and migrate them.

3.7.  Dynamic Detection of Data Redundancy

It is OPTIONAL for the framework to allow users to add data redundancy
labels and label communication mechanisms so that it can detect
redundant data dynamically during data migration and achieve
non-redundant migration.  The specific requirements are as follows (a
non-normative sketch follows this list):

o The framework SHALL be able to perform deep granulation processing on
  the piece of data content to be sent; that is, the content segment to
  be sent is further divided into smaller-sized data sub-blocks
  (information granules).

o The framework SHALL be able to perform feature calculation and form a
  granule head for each of the decomposed granules; the granular header
  information includes, but is not limited to, the granule feature
  amount, granule data fingerprint, unique granule ID number, granule
  generation time, source address, and destination address.

o The framework SHALL be able to examine the granular header
  information to determine the transmission status of each decomposed
  information granule: if the current information granule to be sent is
  already present at the receiving end, the content of the granule is
  not sent; otherwise, the current granule will be sent out.

o The framework SHALL be able to set a cache at the sending port to
  hold the granular header information index table (GHTs) and the
  granular information database (GDBs) for the information granules;
  the receiver SHALL be able to set a cache to hold the granular header
  information index table (GHTr) and the granular information database
  (GDBr) for the information granules that have been successfully
  received.

o After all the fragments of the data have been transferred, the
  framework SHALL be able to reassemble all the fragments and store the
  data on the receiving disk.

o The framework SHALL be able to set a granular encoder at the sending
  port, which is responsible for encoding and compressing the
  information granule content generated by the granular resolver.  The
  encoder generates a coded version of the corresponding information
  granule and calculates the granule head of the compressed information
  granule, then performs transmission processing in the manner of
  sending the granule head, detecting redundant granules, synthesizing
  the granules, and accurately detecting redundant granules.

o The framework SHALL be able to set a granular decoder at the
  receiving port, which is responsible for decoding the encoded,
  compressed granule content at the receiving port and merging it with
  the granule synthesizer, whether it comes from the sending port cache
  or the receiving port cache.
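The following non-normative Python sketch illustrates the granule-based
redundancy detection of Section 3.7: the sender splits a content block
into granules, fingerprints each one, and transmits only the granules
whose fingerprints are not already recorded in the receiver-side index
(GHTr).  The fixed 64 KiB granule size and the SHA-256 fingerprint are
illustrative assumptions; an implementation may granulate and encode
adaptively as described above.

   # Non-normative sketch: granule-level redundancy detection.
   # Fixed 64 KiB granules and SHA-256 fingerprints are assumptions
   # made for illustration only.
   import hashlib

   GRANULE_SIZE = 64 * 1024

   def granulate(block: bytes):
       """Split a content block into granules and fingerprint each one."""
       for offset in range(0, len(block), GRANULE_SIZE):
           granule = block[offset:offset + GRANULE_SIZE]
           yield hashlib.sha256(granule).hexdigest(), granule

   def send_block(block: bytes, ghtr: set, transmit):
       """Send only granules whose fingerprints the receiver does not hold."""
       for fingerprint, granule in granulate(block):
           if fingerprint in ghtr:
               transmit(fingerprint, None)       # head only, content skipped
           else:
               transmit(fingerprint, granule)    # new granule, send content
               ghtr.add(fingerprint)             # receiver now caches it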
3.8.  Data Migration with Compression

During the data migration process, the data is not compressed by
default.  This framework MUST support at least one of the following
data compression encoding formats, allowing the user to compress the
data while migrating it:

o GZIP
o BZIP2

3.9.  Updating Mode of Data Migration

3.9.1.  Appending Migration

This framework SHALL support appending migrated data to existing
datasets in HDFS.

3.9.2.  Overwriting the Import

When importing data into HIVE, the framework SHALL support overwriting
the original dataset and saving the result.

3.10.  The Encryption and Decryption of Data Migration

This framework needs to meet the following requirements:

o It MAY support data encryption at the source; the received data
  should then be decrypted and stored on the target platform;

o It MUST support authentication when reading the source data of a
  migration;

o It SHALL support the verification of identity and permission when
  accessing the target platform of the data migration;

o During the process of data migration, it SHOULD support data
  consistency;

o During the process of data migration, it MUST support data integrity;

3.11.  Incremental Migration

The framework SHOULD support incremental migration of table records in
a relational database, and it MUST allow the user to specify a field
value, called "last_value", for a column of the table in order to
characterize the row record increment.  The framework SHOULD then
migrate those records in the table whose field value is greater than
the specified "last_value", and afterwards update the "last_value".
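The following non-normative Python sketch shows one way to realize the
incremental migration of Section 3.11: only rows whose check column
exceeds the stored "last_value" are selected, and "last_value" is
advanced after the batch has been migrated.  The table name, the check
column, the "%s" parameter style, and the write_to_cloud callback are
illustrative assumptions.

   # Non-normative sketch: incremental migration driven by "last_value".
   # "cursor" is a DB-API cursor on the source database; write_to_cloud
   # is whatever appends a record to HDFS/HBASE/HIVE.
   def incremental_migrate(cursor, table, check_column, last_value,
                           write_to_cloud):
       """Migrate rows newer than last_value; return the new last_value."""
       cursor.execute(
           f"SELECT * FROM {table} "
           f"WHERE {check_column} > %s ORDER BY {check_column}",
           (last_value,))
       columns = [d[0] for d in cursor.description]
       idx = columns.index(check_column)
       new_last_value = last_value
       for row in cursor:
           write_to_cloud(row)              # append the record to the target
           new_last_value = row[idx]        # rows arrive in check-column order
       return new_last_value

The returned value would be persisted by the framework and supplied as
"last_value" the next time the incremental task is scheduled.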
3.12.  Real-Time Synchronization Migration

The framework SHALL support real-time synchronous migration of updated
data and incremental data from a relational database to one or many of
the following target data containers:

o HDFS
o HBASE
o HIVE

3.13.  The Direct Mode of Data Migration

This framework MUST support data migration in direct mode, which can
increase the data migration rate.  Note: this mode is supported only
for MYSQL and POSTGRESQL.

3.14.  The Storage Format of Data Files

This framework MUST allow saving the migrated data in at least one of
the following data file formats:

o SEQUENCE
o TEXTFILE
o AVRO

3.15.  The Number of Map Tasks

This framework MUST allow the user to specify the number of map tasks,
so that a corresponding number of map tasks is started to migrate large
amounts of data in parallel.

3.16.  The Selection of Elements in a Table to Be Migrated

o The specification of columns

  This framework MUST support the user specifying the data of one or
  multiple columns in a table to be migrated.

o The specification of rows

  This framework SHOULD support the user specifying the range of rows
  in a table to be migrated.

o The combination of the specification of columns and rows

  This framework MAY optionally support the user specifying the range
  of both rows and columns in a table to be migrated.

3.17.  Visualization of Migration

3.17.1.  Dataset Visualization

After the framework has migrated the data in the relational database,
it MUST support the visualization of the dataset in the cloud platform.

3.17.2.  Visualization of Data Migration Progress

The framework SHOULD support dynamically showing the migration progress
to users in a graphical mode while migrating.

3.18.  Smart Analysis of Migration

The framework MAY provide automated migration proposals to facilitate
the user's estimation of migration workload and costs.

3.19.  Task Scheduling

The framework SHALL support the user setting various migration
parameters (such as the number of map tasks, the storage format of data
files, the type of data compression, and so on) and the task execution
time, and then scheduling the off-line/online migration tasks
accordingly.

3.20.  The Alarm of Task Error

When a task fails, the framework MUST at least support notifying
stakeholders in a predefined way.

3.21.  Data Export From Cloud to RDBMS

3.21.1.  Data Export Diagram

Figure 2 shows the framework's working diagram of exporting data.

   +---------+            +----------------+
   |         |    (1)     |   WebServer    |
   | Browser |----------->|                |--------------------+
   |         |            |  +-----------+ |                    |
   +---------+            |  |   DMOW    | |                    |
                          |  +-----------+ |                    |
                          +----------------+                    |
                                  |(2)                          |
                                  |                             |
   +-------------+        +-----------------------+             |
   |             |  (3)   |                       |             |
   | Data Source |<-------|    Cloud Platform     |             |
   |             |        |  +-----------------+  |<------------+
   +-------------+        |  | Migration Engine|  |
                          |  +-----------------+  |
                          +-----------------------+

                      Figure 2: Reference Diagram

The workflow of exporting data through the framework is as follows:

Step (1) in the figure means that users submit the export request to
DMOW through a browser (the request includes cloud platform
information, the information of the target relational database, and
related migration parameter settings);

Step (2) in the figure means that DMOW submits the user's request to
the cloud platform's migration engine;

Step (3) in the figure means that the migration engine performs data
migration tasks based on the requests it receives, migrating data from
the cloud platform to the relational database.

3.21.2.  Full Export

The framework MUST at least support exporting data from HDFS to one of
the following relational databases:

o SQLSERVER
o MYSQL
o ORACLE

The framework SHALL support exporting data from HBASE to one of the
following relational databases:

o SQLSERVER
o MYSQL
o ORACLE

The framework SHALL support exporting data from HIVE to one of the
following relational databases:

o SQLSERVER
o MYSQL
o ORACLE

3.21.3.  Partial Export

The framework SHALL allow the user to specify a range of keys on the
cloud platform and export the elements in the specified range to a
relational database.  The framework SHALL also support exporting into a
subset of columns.
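The following non-normative Python sketch illustrates the full export
of Section 3.21.2 for delimited text records that have already been
fetched from an HDFS directory to local files.  The local path, the
MYSQL target table, and the comma separator are illustrative
assumptions (the separator would normally be the one chosen under
Section 3.23).

   # Non-normative sketch: full export of delimited records, previously
   # copied out of an HDFS directory, into a MYSQL table.  Paths, table
   # name, and separator are examples only.
   import csv
   import glob
   import pymysql

   def export_to_mysql(local_dir, conn, table, num_columns, sep=","):
       """Insert every record found in local_dir/part-* into the table."""
       placeholders = ", ".join(["%s"] * num_columns)
       sql = f"INSERT INTO {table} VALUES ({placeholders})"
       with conn.cursor() as cur:
           for path in sorted(glob.glob(f"{local_dir}/part-*")):
               with open(path, newline="") as f:
                   cur.executemany(sql, list(csv.reader(f, delimiter=sep)))
       conn.commit()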
3.22.  The Merger of Data

The framework SHALL support merging data located in different
directories in HDFS and storing the result in a specified directory.

3.23.  Column Separator

The framework MUST allow the user to specify the separator used between
fields in the migration process.

3.24.  Record Line Separator

The framework MUST allow the user to specify the separator used between
record lines after the migration is complete.

3.25.  The Mode of Payment

1. One-way payment mode

   * In the framework, by default, users SHALL pay for downloading data
     from the cloud platform; it is free to upload data from a
     relational database to a cloud platform;

   * Alternatively, users SHALL pay for uploading data from a
     relational database to a cloud platform; it is free to download
     data from the cloud;

2. Two-way payment mode

   In the framework, users SHALL pay a fee for data migration in both
   directions between the relational database and the cloud platform;

3.26.  Web Shell for Migration

The framework provides the following character-interface shells,
operated through web access.

3.26.1.  Linux Web Shell

The framework SHALL support a Linux shell through web access, which
allows users to execute basic Linux commands for the configuration
management of the migrated data on the web.

3.26.2.  HBase Shell

The framework SHALL support the hbase shell through web access, which
allows users to perform basic operations, such as adding and deleting,
on the data migrated to hbase through the web shell.

3.26.3.  Hive Shell

The framework SHALL support the hive shell through web access, which
allows users to perform basic operations, such as adding and deleting,
on the data migrated to hive through the web shell.

3.26.4.  Hadoop Shell

The framework SHALL support the Hadoop shell through web access so that
users can perform basic Hadoop command operations through the web
shell.

3.26.5.  Spark Shell

The framework SHALL support the spark shell through web access and
provide an interactive way to analyze and process the data in the cloud
platform.

3.26.6.  Spark Shell Programming Language

In the spark web shell, the framework SHALL support at least one of the
following programming languages:

o Scala
o Java
o Python
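As a non-normative illustration of the interactive analysis that the
Spark web shell of Sections 3.26.5 and 3.26.6 is meant to enable, the
following lines could be entered in a Python (pyspark) session; the
HDFS path is an example only, and the SparkContext "sc" is provided by
the shell itself.

   # Non-normative sketch: inspecting a migrated dataset from a pyspark
   # web shell session.  The HDFS path is an example only.
   lines = sc.textFile("hdfs:///user/dmow/orders/part-*")
   print(lines.count())     # number of migrated records
   print(lines.take(5))     # peek at the first few records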
4.  Security Considerations

The framework SHOULD provide for the security of the data migration
process.  During the data migration process, it should support
encrypting the data before transmission and then decrypting it for
storage in the target after the transfer is complete.  At the same
time, it must support authentication when reading the migration source
data, and it shall support the verification of identity and permission
when accessing the target platform.

5.  IANA Considerations

This memo includes no request to IANA.

6.  References

6.1.  Normative References

[RFC2026]  Bradner, S., "The Internet Standards Process -- Revision 3",
           BCP 9, RFC 2026, DOI 10.17487/RFC2026, October 1996.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC2578]  McCloghrie, K., Ed., Perkins, D., Ed., and J. Schoenwaelder,
           Ed., "Structure of Management Information Version 2
           (SMIv2)", STD 58, RFC 2578, DOI 10.17487/RFC2578, April
           1999.

6.2.  Informative References

[RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
           DOI 10.17487/RFC2629, June 1999.

[RFC4710]  Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-time
           Application Quality-of-Service Monitoring (RAQMON)
           Framework", RFC 4710, DOI 10.17487/RFC4710, October 2006.

[RFC5694]  Camarillo, G., Ed. and IAB, "Peer-to-Peer (P2P)
           Architecture: Definition, Taxonomies, Examples, and
           Applicability", RFC 5694, DOI 10.17487/RFC5694, November
           2009.

6.3.  URL References

[hadoop]   The Apache Software Foundation, "http://hadoop.apache.org/".

[hbase]    The Apache Software Foundation, "http://hbase.apache.org/".

[hive]     The Apache Software Foundation, "http://hive.apache.org/".

[idguidelines]
           IETF Internet Drafts editor,
           "http://www.ietf.org/ietf/1id-guidelines.txt".

[idnits]   IETF Internet Drafts editor,
           "http://www.ietf.org/ID-Checklist.html".

[ietf]     IETF Tools Team, "http://tools.ietf.org".

[ops]      the IETF OPS Area, "http://www.ops.ietf.org".

[spark]    The Apache Software Foundation, "http://spark.apache.org/".

[sqoop]    The Apache Software Foundation, "http://sqoop.apache.org/".

[xml2rfc]  XML2RFC tools and documentation, "http://xml.resource.org".

Authors' Addresses

   Can Yang (editor)
   South China University of Technology
   382 Zhonghuan Road East, Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Phone: +86 18602029601
   Email: cscyang@scut.edu.cn


   Yu Liu, Ying Wang & ShiYing Pan (editors)
   South China University of Technology
   382 Zhonghuan Road East, Guangzhou Higher Education Mega Centre
   Guangzhou, Panyu District
   P.R. China

   Email: 201820132798@scut.edu.cn


   Cong Chen
   Inspur
   163 Pingyun Road
   Guangzhou, Tianhe District
   P.R. China

   Email: chen_cong@inspur.com


   Ge Chen
   GSTA
   No. 109 Zhongshan Road West, Guangdong Telecom Technology Building
   Guangzhou, Tianhe District
   P.R. China

   Email: cheng@gsta.com


   Yukai Wei
   Huawei
   Putian Huawei Base
   Shenzhen, Longgang District
   P.R. China

   Email: weiyukai@huawei.com