Internet Working Group                                          YP. Chen
Internet-Draft                                                    H. Xia
Intended status: Informational                                  ZM. Wang
Expires: May 29, 2018                                            P. Yang
                                                                CW. Tang
           Shaanxi Key Laboratory of Network Data Intelligent Processing
                        Xi'an University of Posts and Telecommunications
                                                       November 25, 2017

             A Unified Description Method for Data Service
                      draft-chen-ds-description-00

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts. The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 29, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   The rapid development of the Internet has driven more and more
   enterprises and individuals to encapsulate operations on key data
   entities as what we call data services (DS). Because these
   enterprises and individuals work in different fields, their data
   service descriptions are semantically heterogeneous. In this
   document, we propose a more principled approach to the problem of
   heterogeneous data services on the Web. We start by pre-processing
   data service description documents. We then propose a unified
   description language model for data services, the Unified
   Description Language for Data Service (UDL4DS).

Table of Contents

   1.  Introduction
     1.1.  Background
   2.  Conventions Used in This Document
   3.  Data Service Description
     3.1.  Data Service Overview
     3.2.  Data Service Preprocessing
       3.2.1.  Data Service Acquisition
       3.2.2.  Feature Word Extraction for Data Service
     3.3.  Data Service Classification
     3.4.  Data Service Description Language Design
       3.4.1.  Semantic Annotation of Data Service
     3.5.  Data Service Description Model
   4.  Security Considerations
   5.  IANA Considerations
   6.  Conclusions
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   8.  Acknowledgments
   Authors' Addresses

1.  Introduction

1.1.  Background

   With the development of the Internet and cloud computing, data has
   been generated in many forms. Because this data is described with
   different standards and technologies on the Web, there is no common
   data model or access method, which makes it difficult to share
   information across heterogeneous data sources. To address this
   problem, large amounts of heterogeneous data are published on the
   Internet in the form of services, providing data services to service
   users.

   In essence, a data service uses network service protocols and
   standards such as the Hypertext Transfer Protocol (HTTP), the Web
   Services Description Language (WSDL), the Extensible Markup Language
   (XML), the Simple Object Access Protocol (SOAP), and Universal
   Description, Discovery and Integration (UDDI) to encapsulate
   heterogeneous data sources on the Internet, exposing them through an
   agent or an access interface and providing data services to users.
   However, as data in more and more fields is encapsulated as
   services, data services are becoming increasingly common, and the
   requirements placed on them keep rising. During the release and
   invocation of data services, the following critical problems of data
   service description arise:

      Data service publishers come from different industries or
      fields, so there are no unified data standards and norms, and
      data service descriptions are semantically heterogeneous.

      With the development of data services and the increasing
      complexity of the demands of service consumers, a single service
      cannot accurately and quickly satisfy complex demands. How to
      integrate these data services effectively to meet the actual
      demands of customers has become an urgent problem.

      The methods for classifying and semantically annotating data
      services are not good enough.

   In this document, we propose a data service description language
   model named UDL4DS, based on XML Schema. It includes the
   classification of data services, the construction of a domain
   ontology, and semantic annotation, and it aims to resolve the
   semantic heterogeneity between data services in different fields. In
   addition, the key elements of the language model are described in
   XML Schema to form a common specification for the unified
   description of data services.

2.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying significance described in RFC 2119.

3.  Data Service Description

   At present, data service descriptions are generally based on the XML
   specification and describe the access interface and other
   information of a data service. As user needs keep changing, the
   description of data services is moving from the syntactic level to
   the semantic level, which addresses the difficulty computers have in
   understanding the semantics of a description and allows the best
   data service to be provided more quickly and intelligently. However,
   data service providers come from different industries, each with its
   own standards for the services it publishes. As a result, their
   services neither share data nor interoperate with each other.

   In this document, by classifying data services and addressing the
   fact that they are published in different fields, we propose a
   unified description language for data service (UDL4DS) based on the
   XML Schema specification. We achieve a unified description in two
   ways. On the one hand, we propose a new semantic annotation method
   for data services, based on the domain ontology library, to resolve
   the semantic heterogeneity between data services. On the other hand,
   we design a unified description language model and describe data
   services according to the designed description language.

3.1.  Data Service Overview

   The meaning of "data service" differs widely between fields.
   Manu MR and Richard Manning believe that the data service layer
   applies a service-oriented architecture (SOA) and plays an important
   role in data integration. Carey M J believes that a data service is
   a software service that provides a unified data model and various
   access operations on data resources. WS. Zhang believes that a data
   service is an XML access interface: a Web Service that can access a
   database and return the result set in XML format. Zhang Peng
   believes that a data service only encapsulates the data resources in
   an information system: before and after invocation, a data service
   does not change the state of the outside world, and it has no
   business logic of its own, following the principles of Web
   architecture [W3C.REC-webarch-20041215].

   A data service directly encapsulates the data of the underlying data
   source and opens an access interface for data service requesters to
   invoke, which reduces the cost of updating and maintaining the
   system. It also lets users easily discover and transparently access
   the data in the data source. For these reasons, data services are
   becoming an increasingly popular way to encapsulate data.

3.2.  Data Service Preprocessing

   Data services exist on the Web in the form of XML-based
   descriptions. A service requester accesses a published data service
   by calling the open interface of the data service publisher.
   However, data service publishers belong to different industries or
   fields, and their service descriptions are semantically
   heterogeneous. As a result, data service requesters cannot
   accurately and quickly access the data service that best satisfies
   their needs.

   To make data services easier to discover and invoke, we preprocess
   them. We analyze the basic information in each data service
   description document and extract the attribute values of its key
   tags to obtain a feature word text that represents the data service.
   We then classify data services by their feature word text, assign
   each one to the field it belongs to, and provide keywords that
   represent the data service. Figure 1 illustrates the preprocessing
   of data services.

      +----------------------------------+
      |               Web                |
      |                                  |
      +----------------------------------+ <--------
      |                                  |     |
      |  WSDL Document for Data Service  |   Obtain
      |                                  |     |
      +----------------------------------+ <--------
      | Feature Vector of WSDL Document  |   Extract
      |                                  |     |
      +----------------------------------+ <--------
      |                                  |     |
      |     Domain Ontology Library      |  Construct
      |                                  |     |
      +----------------------------------+ <--------
      |                                  |     |
      |      Subset of Data Service      |  Classify
      |         (Weather) (News)         |     |
      +----------------------------------+ <--------

                Figure 1: Data Service preprocessing

3.2.1.  Data Service Acquisition

   In this document, we mainly study data services described by WSDL.
   We find that, on the Web, WSDL description documents appear in WSDL
   and ASMX forms. We obtain these kinds of data services by writing a
   crawler. First, we define a matching rule according to our own
   needs. Second, starting from a given URL, we crawl the Web for
   documents that match the rule. Finally, we end the crawl when the
   number of crawled documents reaches the set threshold. Figure 2
   shows the process of crawling.
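
   The three crawling steps described above can be sketched in Python
   as follows. This is a minimal illustration under stated assumptions,
   not the authors' crawler: the in-memory PAGES dictionary stands in
   for real HTTP fetches, and the matching rule (a link ending in
   ".wsdl" or "?wsdl") and all URLs are hypothetical.

```python
import re
from collections import deque

# Hypothetical in-memory "Web": URL -> page text. A real crawler would
# fetch each page over HTTP instead of reading this dictionary.
PAGES = {
    "http://example.com/": (
        '<a href="http://example.com/a">a</a> '
        '<a href="http://example.com/weather.asmx?WSDL">weather</a>'
    ),
    "http://example.com/a": (
        '<a href="http://example.com/news.wsdl">news</a> '
        '<a href="http://example.com/">home</a>'
    ),
}

# Regular expression that extracts Web links from a page.
LINK_RE = re.compile(r'href="([^"]+)"')

# Assumed matching rule: a link points at a WSDL description document
# if it ends in ".wsdl" or "?wsdl" (case-insensitive).
WSDL_RE = re.compile(r"(\.wsdl|\?wsdl)$", re.IGNORECASE)


def crawl(seed, threshold, pages=PAGES):
    """Collect WSDL links breadth-first from seed until threshold is reached."""
    queue = deque([seed])   # link queue of pages still to visit
    seen = {seed}           # every link already queued or matched
    matched = []            # WSDL description documents found so far
    while queue and len(matched) < threshold:
        url = queue.popleft()
        for link in LINK_RE.findall(pages.get(url, "")):
            if link in seen:
                continue
            seen.add(link)
            if WSDL_RE.search(link):
                matched.append(link)
                if len(matched) >= threshold:
                    break
            else:
                queue.append(link)
    return matched
```

   Calling crawl("http://example.com/", 10) on the sample pages visits
   both pages and returns the two matching links, while a threshold of
   1 stops the crawl after the first match.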

            +------------------------+
            |          URL           |
            +------------------------+
            |                        |
            |   Regular Expression   |
            |                        |
      +---> +------------------------+ <--+
      |     |    Extract Web link    |    |
      |     |                        |    |
      |     +------------------------+    |
      |     |                        |    +-----------+
      |     |       Link queue       |    |   WSDL    |
      |     |                        |    |Description|
      |     +------------------------+    | Document  |
      |     |                        |    |  (Match)  |
      |     |     Length of link     |    +-----------+
      |Less |                        |
      +---> +------------------------+
            |          End           |
            |         (more)         |
            +------------------------+

                 Figure 2: The process of Crawling

3.2.2.  Feature Word Extraction for Data Service

   Each data service corresponds to a WSDL description document that
   describes the basic information of the data service, such as what
   the data service does, where it is located, and how to invoke it. To
   represent a data service more simply, we extract some of the more
   representative tags in the description document as attributes of the
   document. For example, (WSDL: service) gives the name of the data
   service, and (WSDL: operation) describes the functions the data
   service can perform. For a data service "Weather Service" with the
   method name "Get Weather By IP", the names make clear that the
   service obtains the weather information of the city or region that
   an IP address represents.

   Each element in the WSDL description document has a specific
   meaning. To extract the attributes that uniquely represent the data
   service, the elements in the document need to be parsed. In this
   document, the contents of the name attributes of the (WSDL: service)
   and (WSDL: operation) tags are extracted as the document's unique
   attribute values.

3.3.  Data Service Classification

   At present, ontology construction generally consists of requirements
   analysis, information collection, terminology recognition, formal
   coding, and assessment, as shown in Figure 3.4. Many ontology
   libraries have been built following these steps, but because fields
   and projects differ, constructing an ontology base must take into
   account not only the general process but also the actual situation.

   To construct a domain ontology suitable for this study, we cluster
   the feature words of the WSDL description documents of the obtained
   data services and construct a Vector Space Model (VSM) over all
   feature words. That is, the feature words of each WSDL description
   document form one column of a word-document matrix D, which
   represents the N WSDL documents and makes it easy to calculate the
   weight of each feature word in any feature word document. Based on
   the prototype model of the domain ontology, we model the ontology in
   the OWL ontology description language, combining the result of
   clustering the feature words of the WSDL documents with the K-center
   algorithm, domain information, and the tool developed by Stanford
   University.

   We classify data services based on the domain ontology in three
   steps. First, we parse the obtained WSDL description document of a
   data service, extract the feature word document that represents the
   basic information of the data service, and construct the feature
   word vector according to the vector space model. Second, we use
   WordNet to calculate the semantic distance between the feature word
   vector and the vector formed by the domain ontology. Finally, we
   select an appropriate dividing line to assign the document to its
   field.
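
   The three classification steps above can be sketched in Python as
   follows. This is an illustration under stated assumptions rather
   than the method of this document: plain cosine similarity over word
   counts is swapped in for the WordNet-based semantic distance, and
   the sample WSDL text, the DOMAINS vocabularies, and the
   dividing_line value of 0.2 are all hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of WSDL 1.1 elements, in ElementTree's "{uri}tag" notation.
WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"

# Hypothetical, heavily shortened WSDL document used only for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="WeatherPort">
    <operation name="GetWeatherByIP"/>
  </portType>
  <service name="WeatherService"/>
</definitions>"""

# Assumed domain vectors (DV); a real system would derive these from
# the domain ontology library instead of hard-coding them.
DOMAINS = {
    "Weather": Counter(["weather", "temperature", "forecast", "city"]),
    "News": Counter(["news", "headline", "article"]),
}


def feature_words(wsdl_text):
    """Step 1: split the name attributes of service/operation into words."""
    root = ET.fromstring(wsdl_text)
    words = []
    for tag in ("service", "operation"):
        for elem in root.iter(WSDL_NS + tag):
            name = elem.get("name", "")
            # Split camel-case names such as "GetWeatherByIP".
            parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
            words.extend(part.lower() for part in parts)
    return words


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(wsdl_text, domains=DOMAINS, dividing_line=0.2):
    """Steps 2 and 3: pick the most similar domain, or None if too far."""
    sfv = Counter(feature_words(wsdl_text))  # service feature vector
    scores = {name: cosine(sfv, dv) for name, dv in domains.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= dividing_line else None
```

   With the sample document, classify(SAMPLE_WSDL) assigns the service
   to the "Weather" domain because "weather" is the only word shared
   with a domain vector. A WordNet-based distance would additionally
   relate words that differ in spelling but are close in meaning, which
   this count-based stand-in cannot do.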

   Extracting the feature words of a data service and constructing the
   feature word vector space model generates a data service feature
   vector (SFV). To make it easier to calculate the similarity between
   the feature word vector of a data service and a domain, the domain
   ontology can be generalized to a domain vector (DV). We can then
   decide which field a data service belongs to according to the
   similarity between the two vectors.

3.4.  Data Service Description Language Design

   In this section, we first improve the formula for calculating the
   similarity of feature words in the WSDL description document. Then,
   we present an approach to calculating similarity based on the domain
   ontology to complete the semantic processing of data services. On
   the basis of semantic annotation, we propose a unified description
   language model for data services and complete the design of the
   description language.

3.4.1.  Semantic Annotation of Data Service

   To describe data services uniformly, it is necessary to resolve the
   semantic differences between heterogeneous data services. In this
   document, we propose a new semantic annotation method for data
   services that uses the domain ontology library constructed above.
   Semantic annotation of data services resolves the semantic
   differences between heterogeneous data services.

   The idea of this method is as follows. First, we extract feature
   words from the WSDL description document of a data service to form a
   feature word set that represents the description document. Second,
   we cluster the feature word set using the K-center algorithm and
   construct the domain ontology library by combining the result with
   domain information. Finally, we calculate the weight of each feature
   word using the domain ontology, and the set of feature words and
   their weights is stored according to the ontology vector space model
   (VSM).
   The WSDL documents containing these feature words are associated
   with the corresponding feature words, which forms the mapping
   between data service description documents and domain ontology
   concepts.

   Because an ontology is a detailed description of the related
   concepts, the concept attributes, the hierarchies of concepts, and
   their constraints in a field, semantic annotation of data services
   based on a domain ontology can not only reflect the relationships
   between service description documents and their semantic relevance
   to categories, but also expose the implicit semantic information of
   data service description documents. In this way, data service
   description documents are semantically related to one another, which
   addresses the problem of heterogeneous data services, provides more
   accurate and comprehensive data services, and lays the foundation
   for the unified description of data services.

3.5.  Data Service Description Model

   At present, the description methods and standards of the data
   services published on the Web differ. To enable the sharing of
   heterogeneous service resources, it is necessary to resolve the
   semantic heterogeneity between data service resources, so that they
   have a unified semantic description and so that the service access
   mechanism can be determined automatically when a service is
   executed.

   In this document, we present a unified data service description
   language model (UDL4DS). Figure 3 illustrates the UDL4DS model.
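
   As a rough illustration of how a description following this model
   might be assembled, the Python sketch below builds an XML document
   from the element names that appear in Figure 3 (including the
   spelling "ClassfyTime" used there). It is only a sketch under
   assumptions: the root element name "DS", the nesting of the three
   groups under it, the helper name build_description, and all sample
   values are hypothetical, and the text does not define the contents
   of DSExecute and JDSExecute, so they are treated as opaque strings.

```python
import xml.etree.ElementTree as ET


def build_description(ds_id, ds_name, ds_execute, jds_execute,
                      classify_name, classify_method, classify_time):
    """Assemble a UDL4DS-style description from the Figure 3 element names."""
    root = ET.Element("DS")  # assumed root element name

    # Execution information: values are kept as opaque strings.
    execute = ET.SubElement(root, "Execute")
    ET.SubElement(execute, "DSExecute").text = ds_execute
    ET.SubElement(execute, "JDSExecute").text = jds_execute

    # Basic information: identifier and name of the data service.
    base = ET.SubElement(root, "BaseInfo")
    ET.SubElement(base, "DSID").text = ds_id
    ET.SubElement(base, "DSName").text = ds_name

    # Semantic information: result of classifying the data service.
    semantic = ET.SubElement(root, "Semantic")
    ET.SubElement(semantic, "ClassifyName").text = classify_name
    ET.SubElement(semantic, "ClassifyMethod").text = classify_method
    ET.SubElement(semantic, "ClassfyTime").text = classify_time

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
example = build_description(
    ds_id="ds-001",
    ds_name="WeatherService",
    ds_execute="GetWeatherByIP",
    jds_execute="GetWeatherByIP",
    classify_name="Weather",
    classify_method="domain-ontology",
    classify_time="2017-11-25",
)
```

   Keeping the three groups as separate child elements mirrors the
   split into execution, basic, and semantic information in the model,
   so a consumer can read the classification result without touching
   the execution details.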

             +-----------+      +--------------------------------+
        +--> | Execution | <--- |Execute                         |
        |    |Information|      | DSExecute /DSExecute           |
        |    +-----------+      | JDSExecute /JDSExecute         |
        |                       |/Execute                        |
        |                       +--------------------------------+
   +----+
   |    |    +-----------+      +--------------------------------+
   | DS +--> |   Basic   | <--- |BaseInfo                        |
   |    |    |Information|      | DSID /DSID                     |
   +----+    +-----------+      | DSName /DSName                 |
        |                       |/BaseInfo                       |
        |                       +--------------------------------+
        |
        |    +-----------+      +--------------------------------+
        +--> | Semantic  | <--- |Semantic                        |
             |Information|      | ClassifyName /ClassifyName     |
             +-----------+      | ClassifyMethod /ClassifyMethod |
                                | ClassfyTime /ClassfyTime       |
                                |Semantic                        |
                                +--------------------------------+

                   Figure 3: UDL4DS Language Model

4.  Security Considerations

   This document mainly focuses on the unified description of
   heterogeneous data services that are described in existing WSDL
   documents. It does not yet comprehensively cover other forms of data
   service or heterogeneous data sources such as text or web page data.

5.  IANA Considerations

   There are no IANA considerations related to this document.

6.  Conclusions

   This document proposes a unified description method for
   heterogeneous data services, which allows data services to be shared
   to meet the complex needs of users. First, we preprocess data
   service description documents. Second, we propose a unified
   description language model for data services, the Unified
   Description Language for Data Service (UDL4DS). Finally, we
   implement a Web-based description system for data services.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

7.2.  Informative References

   [W3C.REC-webarch-20041215]
              Jacobs, I. and N. Walsh, "Architecture of the World Wide
              Web, Volume One", World Wide Web Consortium
              Recommendation REC-webarch-20041215, December 2004.

8.  Acknowledgments

   Thanks to H. Wang for comments and suggestions.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   YP Chen
   Shaanxi Key Laboratory of Network Data Intelligent Processing
   Xi'an University of Posts and Telecommunications
   China

   Email: CHENYP@XUPT.edu.cn

   H Xia
   Shaanxi Key Laboratory of Network Data Intelligent Processing
   Xi'an University of Posts and Telecommunications
   China

   Email: XIAHONG@XUPT.edu.cn

   ZM Wang
   Shaanxi Key Laboratory of Network Data Intelligent Processing
   Xi'an University of Posts and Telecommunications
   China

   Email: ZMWANG@XUPT.edu.cn

   P Yang
   Xi'an University of Posts and Telecommunications
   China

   Email: YANGPING@163.com

   CW Tang
   Xi'an University of Posts and Telecommunications
   China

   Email: 1316904833@qq.com