idnits 2.17.1 

draft-chen-ds-description-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (November 25, 2017) is 2334 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

1	Internet Working Group                                      YP. Chen
2	Internet-Draft                                                H. Xia

4	Intended status: Informational                              ZM. Wang
5	Expires: May 29, 2018                                        P. Yang
6	                                                            CW. Tang
7	       Shaanxi Key Laboratory of Network Data Intelligent Processing
8	                    Xi'an University of Posts and Telecommunications
9	                                                   November 25, 2017
10	INTERNET-DRAFT
11	A Unified Description Method for Data Service
12	draft-chen-ds-description-00

14	Status of this Memo
15	This Internet-Draft is submitted in full conformance with the
16	provisions of BCP 78 and BCP 79.

18	Internet-Drafts are working documents of the Internet Engineering
19	Task Force (IETF). Note that other groups may also distribute
20	working documents as Internet-Drafts. The list of current Internet-
21	Drafts is at https://datatracker.ietf.org/drafts/current/.

23	Internet-Drafts are draft documents valid for a maximum of six
24	months and may be updated, replaced, or obsoleted by other documents
25	at any time. It is inappropriate to use Internet-Drafts as reference
26	material or to cite them other than as "work in progress."

28	This Internet-Draft will expire on May 29, 2018.

30	Copyright Notice

32	Copyright (c) 2017 IETF Trust and the persons identified as the
33	document authors. All rights reserved.

35	This document is subject to BCP 78 and the IETF Trust's Legal
36	Provisions Relating to IETF Documents
37	(http://trustee.ietf.org/license-info) in effect on the date of
38	publication of this document. Please review these documents
39	carefully, as they describe your rights and restrictions with respect
40	to this document. Code Components extracted from this document must
41	include Simplified BSD License text as described in Section 4.e of
42	the Trust Legal Provisions and are provided without warranty as
43	described in the Simplified BSD License.

45	Abstract
46	The rapid development of the Internet has led more and more
47	enterprises and individuals to encapsulate operations on key data
48	entities as what we call data services (DS). Because providers
49	work in different fields, the descriptions of their data services
50	are semantically heterogeneous. In this document, we propose a
51	more principled approach to the problem of heterogeneous data
52	services on the Web. We start by pre-processing the data service
53	description documents. We then propose a unified description
54	language model for data services, the Unified Description Language
55	for Data Service (UDL4DS).

57	Table of Contents

59	1. Introduction
60	   1.1. Background
61	2. Conventions Used in This Document
62	3. Data Service Description
63	   3.1. Data Service Overview
64	   3.2. Data Service Preprocessing
65	      3.2.1. Data Service Acquisition
66	      3.2.2. Feature Word Extraction for Data Service
67	   3.3. Data Service Classification
68	   3.4. Data Service Description Language Design
69	      3.4.1. Semantic Annotation of Data Service
70	   3.5. Data Service Description Model
71	4. Security Considerations
72	5. IANA Considerations
73	6. Conclusions
74	7. References
75	   7.1. Normative References
76	   7.2. Informative References
77	8. Acknowledgments
78	Authors' Addresses

80	1. Introduction
81	1.1.  Background
82	With the development of the Internet and cloud computing, data and
83	information have been generated in many forms. Because data
84	services on the Web use different description standards and
85	technologies, there is no common data model or access method, which
86	makes it difficult to share information across heterogeneous data
87	sources. To address this problem, a large amount of heterogeneous
88	data is published on the Internet in the form of services, so that
89	data services are provided to service users.

91	The essence of a data service is to use network service protocols
92	and standards such as the Hypertext Transfer Protocol (HTTP), the
93	Web Services Description Language (WSDL), the Extensible Markup
94	Language (XML), the Simple Object Access Protocol (SOAP), and
95	Universal Description, Discovery and Integration (UDDI) to
96	encapsulate heterogeneous data sources on the Internet, opening an
97	agent or access interface that provides data services to users.
98	However, as data in various fields is continuously encapsulated as
99	services, data services are used more and more frequently, and the
100	requirements placed on them keep rising. In the process of
101	releasing and invoking data services, the following critical
102	problems of data service description arise:

104	The publishers of existing data services come from different
105	industries or fields, so unified data standards and norms are
106	lacking, which results in semantically heterogeneous descriptions
107	of data services.

109	With the development of data services and the increasing
110	complexity of the demands made by service consumers, a single
111	service cannot satisfy complex demands accurately and quickly.
112	How to integrate these data services effectively so that they
113	meet the actual demands of customers has become an urgent
114	problem.

116	The existing methods for classifying and semantically annotating
117	data services are not good enough.

118	In this document, we propose a data service description language
119	model named UDL4DS, based on XML Schema. It covers classification
120	of data services, construction of a domain ontology, and semantic
121	annotation, to resolve the semantic heterogeneity between data
122	services in different fields. In addition, an XML Schema
123	description of the key elements of the language model is designed
124	as a common specification for describing data services uniformly.

126	2. Conventions Used in This Document
127	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
128	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
129	document are to be interpreted as described in RFC 2119 [RFC2119].

131	In this document, these words will appear with that interpretation
132	only when in ALL CAPS. Lower case uses of these words are not to be
133	interpreted as carrying significance described in RFC 2119.

135	3. Data Service Description
136	At present, data service descriptions are generally based on the
137	XML specification and describe the access interface and other
138	information of a data service. As user needs keep changing, the
139	description of data services is moving from the syntactic level to
140	the semantic level. This addresses the difficulty computers have
141	in understanding the meaning of a description, and it allows the
142	best data service to be provided more quickly and intelligently.
143	However, data service providers come from different industries,
144	each with its own standards for the services it publishes. As a
145	result, services neither share anything nor interoperate with one
146	another.

148	In this document, by classifying data services and addressing the
149	fact that they are published in different fields, we propose a
150	unified description language for data service (UDL4DS) based on
151	the XML Schema specification. We achieve a unified description in
152	two ways. On the one hand, we propose a new semantic annotation
153	method for data services, based on a domain ontology library, to
154	resolve the semantic heterogeneity between data services. On the
155	other hand, we design a unified description language model and
156	describe data services according to the designed language.

158	3.1. Data Service Overview
159	The meaning of "data service" differs widely between fields.
160	Manu MR and Richard Manning hold that the data service layer
161	applies an SOA architecture and plays an important role in data
162	integration. Carey M J holds that a data service is a software
163	service that provides a unified data model and various access
164	operations on data resources. WS. Zhang holds that a data service
165	is a Web Service with an XML access interface that can access a
166	database and return a result set in XML format. Zhang Peng holds
167	that a data service only encapsulates data resources in an
168	information system: before and after invocation, a data service
169	does not change the state of the outside world, and it has no
170	business logic of its own. This follows the principles of the Web
171	architecture [W3C.REC-webarch-20041215].

173	A data service directly encapsulates the data of the underlying
174	data source and opens an access interface for data service
175	requesters to invoke, which reduces the cost of updating and
176	maintaining the system. In addition, it helps users discover the
177	data in a data source easily and access it transparently.
178	Therefore, data services are becoming an increasingly popular way
179	to encapsulate data.

181	3.2. Data Service Preprocessing
182	Data services exist on the Web in the form of XML-based
183	descriptions. A service requester accesses a published data service
184	by calling the open interface of the data service publisher.
185	However, data service publishers belong to different industries or
186	fields, and their service descriptions are semantically
187	heterogeneous. As a result, requesters cannot accurately and
188	quickly access the data service that best satisfies their needs.

190	To make data services easier to discover and invoke, we preprocess
191	them as follows. We analyze the basic information in the data
192	service description document and extract the attribute values of
193	the key tags in that document, which yields a feature word text
194	that represents the data service. We then classify data services
195	by their feature word text, assign each service to the field to
196	which it belongs, and provide keywords that represent the data
197	service. Figure 1 illustrates the preprocessing of data
198	services.

200	+----------------------------------+
201	|             Web                  |
202	|                                  |
203	+----------------------------------+ <--------
204	|                                  |    |
205	| WSDL Document for Data Service   |  Obtain
206	|                                  |    |
207	+----------------------------------+ <--------
208	| Feature Vector of WSDL Document  |   Extract
209	|                                  |    |
210	+----------------------------------+ <--------
211	|                                  |    |
212	|    Domain Ontology Library       | Construct
213	|                                  |    |
214	+----------------------------------+ <--------
215	|                                  |    |
216	|       Subset of Data Service     | Classify
217	|      (Weather) (News)            |    |
218	+----------------------------------+ <--------

220	              Figure 1: Data Service preprocessing
221	3.2.1. Data Service Acquisition
222	In this document, we mainly study data services described by
223	WSDL. We find that, on the Web, such description documents
224	take the form of WSDL or ASMX files. We obtain these data
225	services by writing a crawler. First, we set a matching rule
226	according to our own needs. Second, starting from a given URL,
227	we crawl the Web for documents that match the rule. Finally,
228	the crawl ends once the number of crawled documents reaches
229	the threshold that was set. Figure 2 shows the process of
230	crawling.

232	      +------------------------+
233	      |           URL          |
234	      +------------------------+
235	      |                        |
236	      |     Regular Expression |
237	      |                        |
238	+---> +------------------------+ <--+
239	|     |       Extract Web link |    |
240	|     |                        |    |
241	|     +------------------------+    |
242	|     |                        |    +-----------+
243	|     |        Link queue      |    |    WSDL   |
244	|     |                        |    |Description|
245	|     +------------------------+    |Document   |
246	|     |                        |    |  (Match)  |
247	|     |       Length of link   |    +-----------+
248	|Less |                        |
249	+---> +------------------------+
250	      |          End           |
251	      |         (more)         |
252	      +------------------------+

254	               Figure 2: The process of Crawling
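
The crawling process in Figure 2 can be sketched in Python as
follows. This is a minimal sketch under stated assumptions, not the
crawler used by the authors: the function name crawl, the injected
fetch callable, both regular expressions, and the example URLs are
hypothetical, and the rule shown (URLs ending in .wsdl, .asmx, or
?wsdl) is only one plausible matching rule.

```python
import re
from collections import deque
from typing import Callable, List

# Hypothetical patterns: one extracts links, one is the crawl "rule".
LINK_RE = re.compile(r'href="([^"]+)"')
WSDL_RE = re.compile(r'(\?wsdl|\.wsdl|\.asmx)$', re.IGNORECASE)


def crawl(seed: str, fetch: Callable[[str], str], threshold: int) -> List[str]:
    """Breadth-first crawl from seed; stop once threshold matches are found."""
    queue = deque([seed])          # the link queue of Figure 2
    seen = {seed}
    matched: List[str] = []
    while queue and len(matched) < threshold:
        url = queue.popleft()
        if WSDL_RE.search(url):
            matched.append(url)    # a WSDL description document (match)
            continue
        for link in LINK_RE.findall(fetch(url)):
            if link not in seen:   # avoid re-queueing visited links
                seen.add(link)
                queue.append(link)
    return matched


# Tiny in-memory "Web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.org/":
        '<a href="http://example.org/a">a</a>'
        '<a href="http://example.org/weather.asmx?WSDL">w</a>',
    "http://example.org/a":
        '<a href="http://example.org/news.wsdl">n</a>'
        '<a href="http://example.org/">home</a>',
}
found = crawl("http://example.org/", lambda u: FAKE_WEB.get(u, ""), 10)
first_only = crawl("http://example.org/", lambda u: FAKE_WEB.get(u, ""), 1)
```

Passing fetch as a parameter keeps the sketch independent of any
particular HTTP library; a real crawler would supply a function that
downloads the page at the given URL.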

256	3.2.2. Feature Word Extraction for Data Service
257	Each data service corresponds to a WSDL description document that
258	describes the basic information of the data service, such as "What
259	does the data service do", "Where is the data service", and "How to
260	invoke the data service". To represent a data service more simply
261	and accurately, we extract some of the more representative tags in
262	the data service description document as attributes of the
263	document. For example, the (WSDL: service) tag gives the name of
264	the data service, and the (WSDL: operation) tag describes the
265	functions that the data service can perform. As an illustration,
266	for a data service named "Weather Service", the method name "Get
267	Weather By IP" clearly indicates that the data service obtains
268	the weather information of the city or region that a given IP
269	address represents.

271	Each element in the WSDL description document carries a certain
272	meaning. To extract the attributes that uniquely represent the
273	data service, the elements in the document need to be parsed. In
274	this document, the contents of the name attribute of the (WSDL:
275	service) and (WSDL: operation) tags are extracted as the
276	description document's unique attribute values.
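
The extraction step described above might look like the following
Python sketch. It assumes WSDL 1.1 documents (namespace
http://schemas.xmlsoap.org/wsdl/). The function name feature_words
and the rule for splitting names such as "GetWeatherByIP" into
lower-case words are illustrative assumptions, since the text does
not specify how names are tokenized.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"   # WSDL 1.1 namespace

# Assumed tokenization: acronyms, capitalized words, and digit runs.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def feature_words(wsdl_text: str) -> List[str]:
    """Collect name attributes of wsdl:service and wsdl:operation tags."""
    root = ET.fromstring(wsdl_text)
    names: List[str] = []
    for tag in ("service", "operation"):
        for elem in root.iter("{%s}%s" % (WSDL_NS, tag)):
            name = elem.get("name")
            # An operation usually appears in both portType and binding,
            # so each distinct name is kept only once.
            if name and name not in names:
                names.append(name)
    words: List[str] = []
    for name in names:
        words.extend(w.lower() for w in WORD_RE.findall(name))
    return words


SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Weather">
  <portType name="WeatherPort">
    <operation name="GetWeatherByIP"/>
  </portType>
  <binding name="WeatherBinding" type="WeatherPort">
    <operation name="GetWeatherByIP"/>
  </binding>
  <service name="WeatherService"/>
</definitions>
"""
words = feature_words(SAMPLE_WSDL)
```

For the sample document, the service name contributes "weather" and
"service", and the operation name contributes "get", "weather", "by",
and "ip".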

278	3.3. Data Service Classification
279	At present, ontology construction generally consists of
280	requirements analysis, information collection, terminology
281	recognition, formal coding, and assessment, as shown in Figure 3.4.
282	Many ontology libraries have been built by following these steps.
283	However, because fields and projects differ, constructing an
284	ontology base requires considering not only the general process
285	but also the actual situation.

287	To construct a domain ontology suitable for this study, we cluster
288	the feature words of the WSDL description documents of the obtained
289	data services and build a Vector Space Model (VSM) over all feature
290	words. That is, each feature word of the WSDL description documents
291	forms one column of a word-document matrix D, which represents the
292	N WSDL documents and makes it easy to calculate the weight of each
293	feature word in any feature word document.
294	Based on a prototype model of the domain ontology, the ontology is
295	modeled in the OWL ontology description language, combining the
296	result of clustering the feature words of the WSDL documents with
297	the K-center algorithm, domain information, and the tool developed
298	by Stanford University.
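
As an illustration of the word-document matrix D, the Python sketch
below builds one row per WSDL document and one column per feature
word. The text does not state which weighting formula is used, so
TF-IDF is assumed here purely for illustration, and the function
name word_document_matrix is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def word_document_matrix(
    docs: List[List[str]],
) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, D), where D[i][j] weights word j in document i."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for doc in docs for w in set(doc))
    matrix: List[List[float]] = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1      # guard against empty documents
        # Assumed weight: term frequency times inverse document frequency.
        matrix.append(
            [(tf[w] / total) * math.log(n / df[w]) for w in vocab]
        )
    return vocab, matrix


docs = [["get", "weather", "weather"], ["get", "news"]]
vocab, D = word_document_matrix(docs)
```

With this weighting, a word that occurs in every document (here
"get") receives weight zero, while words specific to one document
receive positive weights.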

300	We implement the classification of data services based on the
301	domain ontology in three steps. First, we parse the obtained WSDL
302	description document of a data service, extract the feature word
303	document that represents the basic information of the data service,
304	and construct the feature word vector according to the vector space
305	model. Second, we use WordNet to calculate the semantic distance
306	between the feature word vector and the vector formed from the
307	domain ontology. Finally, we select an appropriate threshold and
308	assign the document to the field to which it belongs.

310	Extracting the feature words of a data service and constructing the
311	feature word vector space model generates a data service feature
312	vector (SFV). To make it easier to calculate the similarity between
313	the feature word vector of a data service and a domain, the domain
314	ontology can be generalized to a domain vector (DV). We can then
315	determine the field to which a data service belongs according to
316	the similarity between the two vectors.
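
The comparison of an SFV with the DV of each domain can be sketched
as below. Note the substitution: the text uses a WordNet-based
semantic distance, whereas this sketch uses plain cosine similarity
so that it stays self-contained. The names cosine and classify, the
threshold value 0.3, and the example vectors are all hypothetical.

```python
import math
from typing import Dict, Optional

Vector = Dict[str, float]   # sparse vector: feature word -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(
    sfv: Vector, domains: Dict[str, Vector], threshold: float = 0.3
) -> Optional[str]:
    """Return the most similar domain, or None if below the threshold."""
    if not domains:
        return None
    best = max(domains, key=lambda name: cosine(sfv, domains[name]))
    return best if cosine(sfv, domains[best]) >= threshold else None


DOMAINS = {
    "weather": {"weather": 1.0, "forecast": 1.0, "city": 1.0},
    "news": {"news": 1.0, "headline": 1.0},
}
weather_field = classify({"get": 1.0, "weather": 2.0, "ip": 1.0}, DOMAINS)
unknown_field = classify({"stock": 1.0}, DOMAINS)
```

A service whose vector shares no words with any domain vector falls
below the threshold and is left unclassified.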

318	3.4. Data Service Description Language Design
319	In this section, we first improve the formula for calculating the
320	similarity of feature words in the WSDL description document. Then,
321	we present an approach for calculating similarity based on the
322	domain ontology to complete the semantic processing of data
323	services. On the basis of semantic annotation, we propose a unified
324	description language model for data services and complete the
325	design of the description language.

327	3.4.1. Semantic Annotation of Data Service
328	To describe data services uniformly, it is necessary to resolve
329	the semantic differences between heterogeneous data services. In
330	this document, we propose a new semantic annotation method for
331	data services that builds on the domain ontology library
332	constructed above. Semantic annotation of data services resolves
333	the semantic differences between heterogeneous data services.

335	The idea of this method is as follows. First, we extract feature
336	words from the WSDL description document of a data service to form
337	a feature word set that represents the description document.
338	Second, we cluster the feature word set using the K-center
339	algorithm and construct the domain ontology library by combining
340	the result with domain information. Finally, we calculate the
341	weight of each feature word with the help of the domain ontology,
342	and store the feature words and their weights according to the
343	ontology vector space model (VSM). Each WSDL document is associated
344	with the feature words it contains, which forms the mapping between
345	data service description documents and domain ontology concepts.
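
The final mapping step can be illustrated with a small Python
sketch. It assumes that the domain ontology has already been reduced
to a table from concept names to sets of feature words, and it
weights a concept by the share of a document's feature words that
fall under it. Both the table layout and this weighting are
assumptions made for illustration, and the name annotate is
hypothetical.

```python
from typing import Dict, List, Set


def annotate(
    docs: Dict[str, List[str]], ontology: Dict[str, Set[str]]
) -> Dict[str, Dict[str, float]]:
    """Map each description document to weighted ontology concepts."""
    annotations: Dict[str, Dict[str, float]] = {}
    for doc_id, words in docs.items():
        weights: Dict[str, float] = {}
        for concept, terms in ontology.items():
            hits = sum(1 for w in words if w in terms)
            if hits:
                # Assumed weight: fraction of the document's feature
                # words that belong to this concept.
                weights[concept] = hits / len(words)
        annotations[doc_id] = weights
    return annotations


ONTOLOGY = {
    "Weather": {"weather", "forecast"},
    "Network": {"ip"},
    "News": {"news", "headline"},
}
DOCS = {"weather.wsdl": ["get", "weather", "by", "ip"]}
mapping = annotate(DOCS, ONTOLOGY)
```

In this example the document "weather.wsdl" is linked to the
concepts "Weather" and "Network" and not to "News".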

347	Because an ontology is a detailed description of the related
348	concepts in a field, their attributes, and the constraints on the
349	concepts at various levels of the hierarchy, semantic annotation of
350	data services based on a domain ontology not only reflects the
351	relationship between service description documents and the semantic
352	relevance of categories, but also exposes the implicit semantic
353	information of those documents. In this way, semantic relationships
354	are established among data service description documents, which
355	helps to solve the problem of heterogeneous data services, to
356	provide more accurate and comprehensive data services, and to lay
357	the foundation for a unified description of data services.

359	3.5. Data Service Description Model
360	At present, the data service description methods and standards
361	published on the Web differ from one another. To enable the sharing
362	of heterogeneous service resources, it is necessary to resolve the
363	semantic heterogeneity between data service resources, so that
364	they have a unified semantic description at service description
365	time and the service access mechanism can be determined
366	automatically at service execution time.

368	In this document, we present a unified data service description
369	language model (UDL4DS). Figure 3 illustrates the UDL4DS model.

371	      +--> +-----------+ <--+-------------------------+
372	      |    | Execution |    |                         |
373	      |    |Information|    |Execute                  |
374	      |    +-----------+    | DSExecute   /DSExecute  |
375	+--+--+                     | JDSExecute  /JDSExecute |
376	|  |                        |                         |
377	|  |                        |/Execute                 |
378	|  |                        +-------------------------+
379	|DS|+----> +-----------+ <--+--------------------+
380	|  |       |   Basic   |    |BaseInfo            |
381	|  |       |Information|    | DSID   /DSID       |
382	|  |       +-----------+    | DSName   /DSName   |
383	|  |                        |                    |
384	+--+--+                     |/BaseInfo           |
385	      |                     +--------------------+
386	      |
387	      |
388	+-----+ +-----------+ <-+-------------------------------+
389	|       | Semantic  |   |Semantic                       |
390	|       |Information|   |ClassifyName   /ClassifyName   |
391	+-----> +-----------+   |ClassifyMethod /ClassifyMethod |
392	                        |ClassfyTime   /ClassfyTime     |
393	                        |/Semantic                      |
394	                        +-------------------------------+

396	              Figure 3: UDL4DS Language Model
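
To make the model in Figure 3 more concrete, the following Python
sketch serializes one UDL4DS description as XML. Only the element
names are taken from Figure 3 (including the spelling ClassfyTime).
The root element name "DS", the function name build_udl4ds, and all
example values are assumptions, since the draft does not give a
complete instance document or define the content of each element.

```python
import xml.etree.ElementTree as ET


def build_udl4ds(
    ds_id: str,
    ds_name: str,
    ds_execute: str,
    jds_execute: str,
    classify_name: str,
    classify_method: str,
    classify_time: str,
) -> str:
    """Build a UDL4DS description with the three parts of Figure 3."""
    root = ET.Element("DS")                    # root name is assumed

    execute = ET.SubElement(root, "Execute")   # execution information
    ET.SubElement(execute, "DSExecute").text = ds_execute
    ET.SubElement(execute, "JDSExecute").text = jds_execute

    base = ET.SubElement(root, "BaseInfo")     # basic information
    ET.SubElement(base, "DSID").text = ds_id
    ET.SubElement(base, "DSName").text = ds_name

    semantic = ET.SubElement(root, "Semantic")  # semantic information
    ET.SubElement(semantic, "ClassifyName").text = classify_name
    ET.SubElement(semantic, "ClassifyMethod").text = classify_method
    # Spelled as in Figure 3.
    ET.SubElement(semantic, "ClassfyTime").text = classify_time

    return ET.tostring(root, encoding="unicode")


# Placeholder values chosen only to exercise the sketch.
xml_text = build_udl4ds(
    ds_id="ds-001",
    ds_name="WeatherService",
    ds_execute="GetWeatherByIP",
    jds_execute="GetWeatherByIP",
    classify_name="Weather",
    classify_method="ontology-similarity",
    classify_time="2017-11-25",
)
parsed = ET.fromstring(xml_text)
```

Keeping execution, basic, and semantic information in separate
child elements mirrors the three boxes of Figure 3.
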
397	4. Security Considerations
398	In this document, we mainly focus on the unified description of
399	heterogeneous data services that are described in existing WSDL
400	documents. However, the study is not yet comprehensive enough to
401	cover heterogeneous data sources such as text or web page data, or
402	other forms of data services.

404	5. IANA Considerations
405	There are no IANA considerations related to this document.

407	6. Conclusions
408	This document proposes a unified description method for
409	heterogeneous data services, which allows data services to be
410	shared in order to meet the complex needs of users. First, we
411	pre-process the data service description documents. Second, we
412	propose a unified description language model for data services,
413	the Unified Description Language for Data Service (UDL4DS).
414	Finally, we implement a Web-based data service description system.

416	7. References

418	7.1. Normative References

420	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
421	Requirement Levels", BCP 14, RFC 2119,
422	DOI 10.17487/RFC2119, March 1997.

424	7.2. Informative References

426	[W3C.REC-webarch-20041215]
427	  Jacobs, I. and N. Walsh, "Architecture of the World Wide Web,
428	  Volume One", World Wide Web Consortium Recommendation
429	  REC-webarch-20041215, December 2004.

432	8. Acknowledgments
433	Thanks for comments and suggestions provided by H. Wang.

435	This document was prepared using 2-Word-v2.0.template.dot.

437	Authors' Addresses
438	YP Chen
439	Shaanxi Key Laboratory of Network Data Intelligent Processing
440	Xi'an University of Posts and Telecommunications
441	China

443	Email: CHENYP@XUPT.edu.cn

445	H Xia
446	Shaanxi Key Laboratory of Network Data Intelligent Processing
447	Xi'an University of Posts and Telecommunications
448	China

450	Email: XIAHONG@XUPT.edu.cn

452	ZM Wang
453	Shaanxi Key Laboratory of Network Data Intelligent Processing
454	Xi'an University of Posts and Telecommunications
455	China

457	Email: ZMWANG@XUPT.edu.cn

459	P Yang
460	Xi'an University of Posts and Telecommunications
461	China

463	Email: YANGPING@163.com

465	CW Tang
466	Xi'an University of Posts and Telecommunications
467	China

469	Email: 1316904833@qq.com