Network Working Group                                        Juha Hakala
Internet-Draft                               Helsinki University Library
Category: Informational                                      3 July 2002
draft-hakala-istc-00.txt
Expires: 3 January 2003

Using International Standard Text Work Codes as Uniform Resource Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

This Internet-Draft will expire on 3 January 2003.

Abstract

This document discusses how International Standard Text Work Codes (ISTCs; persistent and unique identifiers for textual works) can be supported within the URN framework and the syntax for URNs defined in RFC 2141 [Moats]. The analysis is in part based on the ideas expressed in RFC 2288 [Lynch], which analysed the use of ISSN, ISBN and SICI as URNs. Chapter 5 contains a URN namespace registration request modelled according to the template in RFC 2611 [Daigle et al.].

1. Introduction

As part of the validation process for the development of URNs, the IETF working group agreed that it is important to demonstrate that the current URN syntax proposal can accommodate existing identifiers from well-established namespaces. One such infrastructure for assigning and managing names comes from the bibliographic community. Bibliographic identifiers function as names for objects that exist both in print and, increasingly, in electronic formats. RFC 2288 [Lynch]
investigated the feasibility of using three identifiers (ISBN, ISSN and SICI) as URNs.

As a result of a recent proliferation of manifestations of works (various printed and electronic versions of books, for instance), ISO has decided to develop a set of identifiers for works. These standards include the International Standard Audiovisual Number (ISAN), the International Standard Musical Work Code (ISWC) and the International Standard Text Work Code (ISTC) [ISO].

The ISTC identifies works (such as Brave New World by Aldous Huxley) and their expressions (such as a translation of Brave New World into Finnish). Manifestations, like the first edition of Brave New World by Chatto & Windus, London 1932, will never receive an ISTC but, this one being a novel, an ISBN. The ISTC and ISTC metadata will be efficient tools for bringing together all related works and expressions (like all translations of Brave New World) and all manifestations any work or expression may have.

The ISTC is an emerging ISO standard which will reach the status of a Draft International Standard by summer 2002. As of this writing it seems quite likely that the standard will be approved after the six-month voting period in early 2003. Major changes to the syntax or to the maintenance organisation of the standard are very unlikely.

RFC 2288 does not analyse, nor was it the aim of its authors to analyse, how ISTC-based URNs can actually be resolved. This text will specify one solution to this question. There may be other complementary resolution services in addition to the one described here.

Generally, the difficulty of designing a URN resolution service depends on two factors:

* Is the identifier dumb, or does it provide a hint on where to find a resolution service?

* How many potential resolution services are there?

The ISBN (International Standard Book Number) is a good example of an intelligent identifier.
Analysis of the ISBN will reveal not only the region where the ISBN has been assigned, but also the publisher of the book. Resolution of ISBN-based URNs can be decentralised to national bibliography databases, maintained by the national libraries. If the ISBN were a dumb identifier, this would be impossible.

The International Standard Serial Number (ISSN) is a dumb identifier. It does not have a publisher identifier; serials published by a certain company get seemingly random ISSNs. Although ISSNs are allocated to regional agencies in blocks, which gives the system some "intelligence", a resolution service should not rely on these blocks (there are just too many of them, and their number is increasing all the time) but should use the global ISSN database. It contains a bibliographic description of every periodical that has received an ISSN; by June 2002 the database contained about one million bibliographic records. Thus, it is easy to resolve ISSN-based URNs even though the identifier itself does not help in locating the resolution service.

Like the ISBN, the ISTC will be an intelligent identifier (see below for a description of its syntax). On the other hand, it will be similar to the ISSN system in that there will be a global ISTC database, containing every ISTC assigned in the world, and related metadata. Since ISTCs can and will be given to textual works retrospectively, this database, maintained by the ISTC Registration Authority, will relatively soon become very large. However, at least some ISTC Regional Agencies, which will take care of ISTC assignment in their own regions (mainly geographical, but they may also be subject-driven), will send their data in batch mode to the ISTC register. Therefore there is a need to complement the ISTC resolution done in the global ISTC database with regional resolution services.
The resulting system is a two-level cascade, where the bibliographic data related to the ISTC will be available either from the global database or from a database maintained by the Regional Agency that assigned the ISTC. A Regional Agency may be, for instance, a national library which has generated work-related metadata and ISTCs from a traditional, manifestation-centred national bibliography.

The registration request for acquiring a Namespace Identifier (NID) "ISTC" for International Standard Text Work Codes has been written by Helsinki University Library - The National Library of Finland on behalf of the International Organization for Standardization (ISO). The request is included in chapter 5 of this text.

The document at hand is part of a global co-operation of the national libraries to foster identification of electronic documents in general and utilisation of URNs in particular. This work is co-ordinated by a working group established by the Conference of Directors of National Libraries (CDNL), and supported by the Conference of the European National Librarians (CENL) Working Group on Networking Standards.

We have used the URN Namespace Identifier "ISTC" for the International Standard Text Work Codes in the examples below.

2. Identification vs. Resolution

ISTCs identify works, that is, abstract entities which are embodied as physical manifestations. An ISTC resolution service will only deliver a bibliographic record related to the work or expression. In the bibliographic record there may be links to other ISTC records describing related works and expressions, or to manifestations of the work.

The manifestations of textual works identified by ISTCs may be printed or electronic. In the latter case, a user may be able to retrieve all manifestations related to the work.

3.
International Standard Text Work Code

3.1 Overview

The ISO International Standard Text Work Code (ISTC) standard defines a 16-character hexadecimal code that provides unique identification of textual works. As of this writing the ISTC is specified in committee draft (CD) 21047, revised on 15 May 2002. In this CD, comments given to the first committee draft have been taken into account, and the ISTC Working Group decided to publish the text as a Draft International Standard. Changes to the syntax or management of the ISTC at this stage are highly unlikely.

An ISTC consists of four segments, all of which are required:

- registration agency element;
- year element;
- work element;
- check digit.

ISO CD 21047 provides the following example:

   ISTC 0A9 2002 12B4A105 7

When an ISTC is displayed in written form the letters ISTC shall precede it. The segments should be separated by a hyphen or a space.

The registration agency element shall consist of three hexadecimal digits. The code (in the above example, 0A9) represents the Registration agency which assigned the ISTC.

The year element (in the example, 2002) shall consist of the four digits representing the year in which the ISTC was allocated.

The work element shall consist of eight hexadecimal digits. The work element shall be assigned by an ISTC Registration agency appointed by the Registration authority for ISO 21047.

The check digit shall be calculated on a MOD 16-3 system defined in accordance with ISO 7064.

ISTC Registration agencies must provide metadata for each work they have identified. This metadata will be collected into the global ISTC register maintained by the ISTC Registration authority. The data may be updated on-line or in batch mode. Duplicates are removed from the database with the help of a duplicate check algorithm.

According to ISO CD 21047, ISTCs can be applied retrospectively to old works. In such cases, work metadata will usually be generated from existing manifestation-level metadata.
Some projects have already analysed the feasibility of this process with satisfactory results.

ISTC numbers are assigned by Registration agencies, which receive their agency element codes from the Registration authority. The three hexadecimal digits of the agency element allow for 4096 such codes at any time; the codes may be re-used over time since agencies can be identified with the combination of the agency element and the year. However, 4096 registration agency elements will be sufficient for quite a long time (the ISSN system has about 70 regional agencies, the ISBN system about 160).

Given the relative complexity of ISTC codes and the very large number of textual works that need identification, the recommended practice is to automate the ISTC creation process. In any Registration agency the agency element will never change, and changes in the year element are easy to track. The work element can be used rather freely, as long as the same identifier is never assigned twice. Since calculation of the check digit can also be easily automated, ISTC assignment can without difficulty be made a fully automatic process.

3.2 Encoding Considerations and Lexical Equivalence

Since an ISTC consists of hexadecimal characters, there is no need for special encoding. However, when an ISTC is used as a URN, the string "ISTC" preceding the identifier should be omitted and any spaces separating the ISTC elements should be replaced by hyphens.

In order to determine whether two ISTCs are lexically equivalent it is necessary to remove all spaces and hyphens from the ISTC strings before comparing them.

3.3 Resolution of ISTC-based URNs

An efficient and global resolution service for ISTCs can be accomplished by using the global ISTC register. This database will, according to the current plans of the proposed Registration authority, go into production in January 2003. From this system, the ISTC data may be copied to one or several systems used for public access.
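
The encoding and lexical equivalence rules of section 3.2 can be illustrated with a minimal sketch. It is not part of ISO CD 21047 and checks syntax only; the check digit is not verified. The function names, the case folding of hexadecimal letters, the treatment of the check digit as a single hexadecimal digit, and the full URN form "urn:ISTC:..." (inferred from the RFC 2141 pattern of NID followed by namespace-specific string) are all assumptions made for illustration.

```python
import re

# Segment layout from section 3.1: agency (3 hex digits), year (4 digits),
# work (8 hex digits), check digit (assumed here to be one hex digit).
ISTC_PATTERN = re.compile(r"([0-9A-F]{3})([0-9]{4})([0-9A-F]{8})([0-9A-F])")


def normalize_istc(text):
    """Return the 16 bare characters of an ISTC.

    Drops a leading "ISTC" label or "urn:ISTC:" prefix and removes all
    spaces and hyphens. Upper-casing is an assumption of this sketch.
    """
    s = text.strip().upper()
    if s.startswith("URN:ISTC:"):
        s = s[len("URN:ISTC:"):]
    elif s.startswith("ISTC"):
        s = s[len("ISTC"):]
    s = re.sub(r"[\s-]+", "", s)
    if ISTC_PATTERN.fullmatch(s) is None:
        raise ValueError(f"not a syntactically valid ISTC: {text!r}")
    return s


def istc_to_urn(text):
    """Build a hyphen-separated URN from a displayed ISTC (hypothetical helper)."""
    bare = normalize_istc(text)
    agency, year, work, check = ISTC_PATTERN.fullmatch(bare).groups()
    return f"urn:ISTC:{agency}-{year}-{work}-{check}"


def lexically_equivalent(a, b):
    """Compare two ISTC strings after removing label, spaces and hyphens."""
    return normalize_istc(a) == normalize_istc(b)
```

With the example from ISO CD 21047, istc_to_urn("ISTC 0A9 2002 12B4A105 7") yields "urn:ISTC:0A9-2002-12B4A105-7", and the space-separated and hyphen-separated forms compare as equivalent.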

An ISTC can be used as a search key for retrieving the bibliographic record of the work from the databases containing ISTC data. This record may contain ISTCs pointing to other works, or other identifiers such as ISBNs, DOIs or SICIs identifying manifestations (books or articles) of the textual work.

With the help of the registration agency element and the year element it is possible to locate the ISTC register (for instance, a traditional national bibliography database enriched with work metadata) of the Registration agency that assigned the ISTC. Expanding the resolution of ISTC-based URNs into these databases will bring two additional benefits. First, since the global ISTC register is maintained in batch mode, it (and databases dependent on it) may not contain the newest ISTCs assigned by the Registration agencies. Second, access to the systems containing global ISTC data may be for a fee only, while the regional agencies may allow free access to their local ISTC registers.

Typical users of the system will be authors and publishers seeking information about (published or non-published) works, librarians wishing to copy catalogue metadata related to a given work, and patrons who wish to track all manifestations of a work or expression related to it.

3.4 Additional considerations

Since the number of ISTC resolution services will eventually be high (theoretical maximum 4096 + 1 "live" systems), encoding all services into the URN Resolution Discovery Service, and maintaining this data, may become a bottleneck.

The ISTC system may become very large, as it is intended to cover all textual works, including novels, short stories and articles. Such a system may eventually become extremely popular. It is important that there will be multiple databases containing all, or at least most, of the ISTC metadata in existence.

4. Security Considerations

This document proposes means of encoding and using International Standard Text Work Codes within the URN framework.
This document does not discuss resolution except at a generic level; thus questions of secure or authenticated resolution mechanisms in the ISTC registers are out of scope. This text does not address means of validating the integrity or authenticating the source or provenance of URNs that contain ISTCs. Issues regarding intellectual property rights associated with bibliographic data related to the ISTC or other work identifiers are also beyond the scope of this document, as are questions about rights to the databases that might be used to construct resolvers.

5. Namespace registration

URN Namespace ID Registration for the International Standard Text Work Code (ISTC)

Namespace ID:

   ISTC

   ISTC will become an established acronym for International Standard Text Work Codes; assigning this NID to any other system would cause considerable confusion.

Registration Information:

   Version: 1
   Date: 2002-07-03

Declared registrant of the namespace:

   Name: International ISTC Agency / Albert Simmonds
   E-mail: simmonda@oclc.org
   Affiliation: OCLC Online Computer Library Center, Inc.
   Address: OCLC, 6565 Frantz Road, Dublin, OH 43017-3395, USA

Declaration of syntactic structure:

   Each ISTC consists of four segments, all of which are required:

   - registration agency element;
   - year element;
   - work element;
   - check digit.

   When an ISTC is displayed in written form the letters ISTC shall precede it. The segments should be separated by a hyphen or a space.

   The registration agency element shall consist of three hexadecimal digits. The code (in the example below, 0A9) represents the Registration agency which assigned the ISTC.

   The year element (in the example, 2002) shall consist of the four digits representing the year in which the ISTC was allocated.

   The work element shall consist of eight hexadecimal digits. The work element shall be assigned by an ISTC Registration agency appointed by the Registration authority for ISO 21047.

   The check digit shall be calculated on a MOD 16-3 system defined in accordance with ISO 7064.

   Example: 0A9-2002-12B4A105-7

   ISTC codes can be generated and parsed by computer programs.

Relevant ancillary documentation:

   The ISTC is an emerging ISO standard defined by ISO CD 21047 (revised 2002-05-15). The Draft International Standard version of the ISTC will be published during summer 2002, and it is expected that the ISTC will be approved as an ISO standard in early 2003, after the six-month DIS comment period. No major changes to the syntax of the ISTC or its maintenance organisation are likely.

Identifier uniqueness considerations:

   ISTC codes will always be unique. However, two or more different ISTCs may identify the same work if multiple Registration agencies deal with the same resources, or if a single agency deals with the same work twice. The duplicate control algorithm of the ISTC Registration authority is intended to remove duplicates arriving from the agencies, and every agency should have sufficient control mechanisms in place to avoid duplicate registration of works.

Identifier persistence considerations:

   Once assigned, an ISTC will never change. The same ISTC will not be used again for another textual work.

Process of identifier assignment:

   ISTCs will be assigned by the Registration agencies. Typically an author, his or her agent, or a publisher will apply for an ISTC. It is also possible to generate ISTCs retrospectively from existing manifestations (published books and articles). This process has to be controlled well in order to avoid duplicate registration of works. One possibility is to generate work data in national bibliographic databases, and to limit the generation of work records to domestic works only.

   The Registration authority will govern the ISTC assignment process at the global level. The global ISTC register will enable duplicate control of the identified works.

   ISTCs can, and should, be generated by automated means.

Process for identifier resolution:

   Resolution will take place as defined in chapter 3.3. The first step is to check the ISTC register or another database containing all, or most, of the ISTC metadata. If there is no match, it is possible to use the registration agency element (and possibly the year element) as a hint for finding the Registration agency that assigned the ISTC, and the resolution service maintained by it.

   ISTCs will always resolve into the work metadata. Manifestations of the work (such as electronic versions of a book) may or may not be linked to the ISTC metadata. ISTC metadata may also contain links to related works and expressions.

Rules for Lexical Equivalence:

   Spaces and hyphens in the ISTC string are lexically equivalent. The string "ISTC" at the beginning of the string must be ignored in the comparison.

Conformance with URN Syntax:

   An ISTC consists of hexadecimal digits and is therefore compliant with the requirements of the URN syntax as defined in [Moats].

Validation mechanism:

   The validity of an ISTC string can be checked with the modulus 16-3 check digit.

Scope:

   Global.

6. References

[Daigle et al.] Daigle, L., van Gulik, D., Iannella, R. and Faltstrom, P., URN Namespace Definition Mechanisms, RFC 2611, June 1999.

[ISO] Information and documentation - International Standard Text Code (ISTC), ISO/CD 21047, May 2002.

[Lynch] Lynch, C., Using Existing Bibliographic Identifiers as Uniform Resource Names, RFC 2288, February 1998.

[Moats] Moats, R., URN Syntax, RFC 2141, May 1997.

7. Author's Address

Juha Hakala
Helsinki University Library - The National Library of Finland
P.O. Box 26
FIN-00014 Helsinki University
FINLAND

E-mail: juha.hakala@helsinki.fi

8. Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.