idnits 2.17.1

draft-kunze-dchtml-02.txt:

  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1035 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([DC1], [SWISH-E], [PERL],
     [ISEARCH], [GLIMPSE], [HARVEST]), which it shouldn't. Please replace
     those with straight textual mentions of the documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 626 has weird spacing: '...for the purpo...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '9' on line 880

  -- Looks like a reference, but probably isn't: '5' on line 880

  -- Looks like a reference, but probably isn't: '4' on line 880

  -- Looks like a reference, but probably isn't: '3' on line 880

  -- Looks like a reference, but probably isn't: '0' on line 928

  == Unused Reference: 'AAT' is defined on line 964, but no explicit
     reference was found in the text

  == Unused Reference: 'ISO639-2' is defined on line 999, but no explicit
     reference was found in the text

  == Unused Reference: 'ISO8601' is defined on line 1003, but no explicit
     reference was found in the text

  == Unused Reference: 'TGN' is defined on line 1023, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AAT'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'AC'

  ** Obsolete normative reference: RFC 2413 (ref. 'DC1') (Obsoleted by RFC
     5013)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCHOME'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCPROJECTS'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCT1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GLIMPSE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HARVEST'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISEARCH'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO639-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO8601'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MARC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PERL'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'RDF'

  ** Obsolete normative reference: RFC 1766 (Obsoleted by RFC 3066, RFC 3282)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SWISH-E'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'TGN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'WTN8601'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'XML'

     Summary: 10 errors (**), 0 flaws (~~), 7 warnings (==), 25 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Internet-Draft (Informational)                                  J. Kunze
draft-kunze-dchtml-02.txt
15 September 1999                                            Dublin Core
Expires 15 March 2000                                Metadata Initiative

                 Encoding Dublin Core Metadata in HTML

     (ftp://ftp.ietf.org/internet-drafts/draft-kunze-dchtml-02.txt)

1. Status of this Document

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.
It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to
jak@ckm.ucsf.edu or to the dc-general@mailbase.ac.uk discussion list.

2. Abstract

The Dublin Core [DC1] is a small set of metadata elements for describing
information resources. This document explains how these elements are
expressed using the META and LINK tags of HTML [HTML4.0]. A sequence of
metadata elements embedded in an HTML file is taken to be a description
of that file. Examples illustrate conventions allowing interoperation
with current software that indexes, displays, and manipulates metadata,
such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH],
etc., and the Perl [PERL] scripts in the appendix.

3. HTML, Dublin Core, and Non-Dublin Core Metadata

The Dublin Core (DC) metadata initiative [DCHOME] has produced a small
set of resource description categories [DC1], or elements of metadata
(literally, data about data). Metadata elements are typically small
relative to the resource they describe and may, if the resource format
permits, be embedded in it.
Two such formats are the Hypertext Markup
Language (HTML) and the Extensible Markup Language (XML); HTML is
currently in wide use, but once standardized, XML [XML] in conjunction
with the Resource Description Framework [RDF] promises a significantly
more expressive means of encoding metadata. The [RDF] specification
actually describes a way to use RDF within an HTML document by adhering
to an abbreviated syntax.

This document explains how to encode metadata using HTML 4.0 [HTML4.0].
It is not concerned with element semantics, which are defined elsewhere.
For illustrative purposes, some element semantics are alluded to, but
in no way should semantics appearing here be considered definitive.

The HTML encoding allows elements of DC metadata to be interspersed with
non-DC elements (provided such mixing is consistent with rules governing
use of those non-DC elements). A DC element is indicated by the prefix
"DC", and a non-DC element by another prefix; for example, the prefix
"AC" is used with elements from the A-Core [AC].

4. The META Tag

The META tag of HTML is designed to encode a named metadata element.
Each element describes a given aspect of a document or other information
resource. For example, this tagged metadata element,

    <meta name    = "DC.Creator"
          content = "Simpson, Homer">

says that Homer Simpson is the Creator, where the element named Creator
is defined in the DC element set. In the more general form,

    <meta name    = "PREFIX.ELEMENT_NAME"
          content = "ELEMENT_VALUE">

the capitalized words are meant to be replaced in actual descriptions;
thus in the example,

    ELEMENT_NAME    was:  Creator
    ELEMENT_VALUE   was:  Simpson, Homer
    and PREFIX      was:  DC

Within a META tag the first letter of a Dublin Core element name is
capitalized. DC places no restriction on alphabetic case in an element
value and any number of META tagged elements may appear together, in any
order. More than one DC element with the same name may appear, and each
DC element is optional.
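
The PREFIX.ELEMENT_NAME convention just described is simple enough to
generate mechanically. The following is a minimal Python sketch, not part
of the original draft: the helper names meta_tag and describe are
hypothetical, and the compact one-line tag layout is only one of the
layouts HTML permits.

```python
from html import escape


def meta_tag(prefix: str, element: str, value: str) -> str:
    """Return one META tag for PREFIX.ELEMENT_NAME with the given value."""
    # Capitalize only the first letter of the element name, as the text asks.
    name = f"{prefix}.{element[:1].upper()}{element[1:]}"
    # quote=True turns a double quote into &quot; so that it cannot be
    # mistaken for the delimiter that ends the attribute value.
    return (f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(value, quote=True)}">')


def describe(triples) -> str:
    """Render (prefix, element, value) triples as META tags, one per line.

    Repeated element names are allowed and every element is optional, so
    the triples are simply emitted in the order given.
    """
    return "\n".join(meta_tag(p, e, v) for p, e, v in triples)


if __name__ == "__main__":
    print(describe([("DC", "Creator", "Simpson, Homer")]))
```

Escaping the value is the one design choice worth noting: an element
value may contain characters that HTML treats specially, and leaving them
raw would corrupt the surrounding tag.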
The next example is a book description with two
authors, two titles, and no other metadata.

    [four META tags: markup missing from this copy]

The prefix "DC" precedes each Dublin Core element encoded with META,
and it is separated by a period (.) from the element name following it.
Each non-DC element should be encoded with a prefix that can be used to
trace its origin and definition; the linkage between prefix and element
definition is made with the LINK tag, as explained in the next section.
Non-DC elements, such as Email from the A-Core [AC], may appear together
with DC elements, as in

    [three META tags: markup missing from this copy]

This example also shows how some special characters may be encoded.
The author name in the first element contains a diacritic encoded as an
HTML character entity reference -- in this case an accented letter E.
Similarly, the last line contains two double-quote characters encoded
so as to avoid being interpreted as element content delimiters.

5. The LINK Tag

The LINK tag of HTML may be used to associate an element name prefix
with the reference definition of the element set that it identifies.
A sequence of META tags describing a resource is incomplete without
one such LINK tag for each different prefix appearing in the sequence.
The previous example could be considered complete with the addition of
these two LINK tags:

    [two LINK tags: markup missing from this copy]

In general, the association takes the form

    <link rel     = "schema.PREFIX"
          href    = "LOCATION_OF_DEFINITION">

where, in actual descriptions, PREFIX is to be replaced by the prefix
and LOCATION_OF_DEFINITION by the URL or URN of the defining document.
When embedded in the HEAD part of an HTML file, a sequence of LINK and
META tags describes the information in the surrounding HTML file itself.
Here is a complete HTML file with its own embedded description.

    <html>
    <head>
    <title> A Dirge </title>
    [LINK tag for the DC prefix: markup missing from this copy]
    <meta name    = "DC.Title"
          content = "A Dirge">
    <meta name    = "DC.Creator"
          content = "Shelley, Percy Bysshe">
    <meta name    = "DC.Type"
          content = "poem">
    <meta name    = "DC.Date"
          content = "1820">
    <meta name    = "DC.Format"
          content = "text/html">
    <meta name    = "DC.Language"
          content = "en">
    </head>
    <body><pre>
            Rough wind, that moanest loud
              Grief too sad for song;
            Wild wind, when sullen cloud
              Knells all the night long;
            Sad storm, whose tears are vain,
            Bare woods, whose branches strain,
            Deep caves and dreary main, -
              Wail, for the world's wrong!
    </pre></body>
    </html>

6. Encoding Recommendations

HTML allows more flexibility in principle and in practice than is
recommended here for encoding metadata. Limited flexibility encourages
easy development of software for extracting and processing metadata.
At this early evolutionary stage of internet metadata, easy prototyping
and experimentation hasten the development of useful standards.

Adherence is therefore recommended to the tagging style exemplified in
this document as regards prefix and element name capitalization,
double-quoting (") of attribute values, and not starting more than one
META tag on a line. There is much room for flexibility, but choosing
a style and sticking with it will likely make metadata manipulation and
editing easier. The following META tags adhere to the recommendations
and carry identical metadata in three different styles:

    [three META tags: markup missing from this copy]

Use of these recommendations is known to result in metadata that may
be harvested, indexed, and manipulated by popular, freely available
software packages such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE],
[HARVEST], and [ISEARCH], among others. These conventions also work
with the metadata processing scripts appearing in the appendix, as well
as with most of the [DCPROJECTS] applications referenced from the
[DCHOME] site. Software support for the LINK tag and qualifier
conventions (see the next section) is not currently widespread.

Ordering of metadata elements is not preserved in general. Writers
of software for metadata indexing and display should try to preserve
relative ordering among META tagged elements having the same name (e.g.,
among multiple authors); however, metadata providers and searchers have
no guarantee that ordering will be preserved in metadata that passes
through unknown systems.

7. Dublin Core in Real Descriptions

In actual resource description it is often necessary to qualify Dublin
Core elements to add nuances of meaning. While neither the general
principles nor the specific semantics of DC qualifiers are within scope
of this document, everyday uses of the qualifier syntax are illustrated
to lend realism to later examples. Without further explanation, the
three ways in which the optional qualifier syntax is currently (subject
to change) used to supplement the META tag may be summarized as follows:

    [three general META tag forms: markup missing from this copy]

Accordingly, a posthumous work in Spanish might be described with

    [five META tags: markup missing from this copy]

Note that the qualifier syntax and label suffixes (which follow an
element name and a period) used in examples in this document merely
reflect current trends in the HTML encoding of qualifiers. Use of this
syntax and these suffixes is neither a standard nor a recommendation.

8. Encoding Dublin Core Elements

This section consists of very simple Dublin Core encoding examples,
arranged by element.
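
Software that reads such descriptions has to separate the prefix, the
element name, and any qualifier from each META tag. The following is a
minimal Python sketch of that step, not part of the original draft. It
assumes that a label suffix is whatever follows the element name and a
period, and it reads the same name, content, scheme, and lang attributes
that the first Perl script in the appendix reads. The class and function
names, and the sample suffix "Alt", are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META elements of the form PREFIX.ELEMENT_NAME[.SUFFIX]."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case
        # and have character entity references already decoded.
        if tag != "meta":
            return
        a = dict(attrs)
        parts = (a.get("name") or "").split(".")
        if len(parts) < 2:
            return  # not in PREFIX.ELEMENT_NAME form, so not handled here
        self.elements.append({
            "prefix": parts[0],
            "element": parts[1],
            "suffix": ".".join(parts[2:]) or None,  # optional label suffix
            "scheme": a.get("scheme"),
            "lang": a.get("lang"),
            "content": a.get("content"),
        })


def parse_meta(html_text):
    """Return the META elements found in html_text, in document order."""
    collector = MetaCollector()
    collector.feed(html_text)
    return collector.elements


if __name__ == "__main__":
    sample = '<META NAME="DC.Title.Alt" LANG="es" CONTENT="Obra p&oacute;stuma">'
    print(parse_meta(sample))
```

Because the elements are returned in document order, relative ordering
among elements with the same name is preserved, which is what section 6
asks of software writers.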

    [The META tag examples that follow each heading below are missing
     from this copy.]

Title  (name given to the resource)
-----

Creator  (entity that created the content)
-------

Subject  (topic or keyword)
-------

Description  (account, summary, or abstract of the content)
-----------

Publisher  (entity that made the resource available)
---------

Contributor  (other entity that made a contribution)
-----------

Date  (of an event in the life of the resource; [WTN8601] recommended)
----

Type  (nature, genre, or category; [DCT1] recommended)
----

Format  (physical or digital data format, plus optional dimensions)
------

Identifier  (of the resource)
----------

Source  (reference to the resource's origin)
------

Language  (of the content of the resource; [RFC1766] recommended)
--------

Relation  (reference to a related resource)
--------

Coverage  (extent or scope of the content)
--------

Rights  (text or identifier of a rights management statement)
------

9. Security Considerations

The syntax rules for encoding Dublin Core metadata in HTML that are
documented here pose no direct risk to computers and networks. People
can use these rules to encode metadata that is inaccurate or even
deliberately misleading (creating mischief in the form of "index spam");
however, this reflects a general pattern of HTML META tag abuse that is
not limited to the encoding of metadata from the Dublin Core set.
Even traditional metadata encoding schemes (e.g., [MARC]) are not immune to
inaccuracy, although they are generally followed in environments where
production quality greatly exceeds that of the average Web site.

Systems that process metadata encoded with META tags need to consider
issues related to its accuracy and validity as part of their design and
implementation, and users of such systems need to consider the design
and implementation assumptions. Various approaches may be relevant for
certain applications, such as adding statements of metadata provenance,
signing of metadata with digital signatures, and automating certain
aspects of metadata creation; but these are far outside the scope of
this document and the underlying META tag syntax that it describes.

10. Copyright Notice

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.

The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.

11. Appendix -- Perl Scripts that Manipulate HTML Encoded Metadata

This section contains two simple programs that work with versions 4 and
5 of the Perl [PERL] scripting language interpreter. They may be taken
and freely adapted for local organizational needs, research proposals,
venture capital bids, etc. A variety of applications are within easy
reach of implementors that choose to build on these scripts.

Script 1: Metadata Format Conversion
-------------------------------------

Here is a simple Perl script that correctly recognizes every example of
metadata encoding in this document. It shows how a modest scripting
effort can produce a utility that converts metadata from one format to
another. Minor changes are sufficient to support a number of output
formats.

    #!/depot/bin/perl
    #
    # This simple perl script extracts metadata embedded in an HTML file
    # and outputs it in an alternate format.  Issues warning about missing
    # element name or value.
    #
    # Handles mixed case tags and attribute values, one per line or spanning
    # several lines.  Also handles a quoted string spanning multiple lines.
    # No error checking.  Does not tolerate more than one "<meta" per line.

    print "@(urc;\n";
    while (<>) {
        next if (! /<meta/i);           # skip lines with no META tag
        ($meta) = /(<meta.*$)/i;        # reconstructed: keep text from the tag on
        if (! />/i) {                   # tag continues on following lines
            while (<>) {
                $meta .= $_;
                last if (/>/);
            }
        }
        $name = $meta =~ /name\s*=\s*"([^"]*)"/i
            ? $1 : "MISSING ELEMENT NAME";
        $content = $meta =~ /content\s*=\s*"([^"]*)"/i
            ? $1 : "MISSING ELEMENT VALUE";
        ($scheme) = $meta =~ /scheme\s*=\s*"([^"]*)"/i;
        ($lang) = $meta =~ /lang\s*=\s*"([^"]*)"/i;

        if ($lang || $scheme) {
            $mod = " ($lang";
            if (! $scheme)
                { $mod .= ")"; }
            elsif (! $lang)
                { $mod .= "$scheme)" }
            else
                { $mod .= ", $scheme)"; }
        }
        else
            { $mod = ""; }

        print " @|$name$mod; $content\n";
    }
    print "@)urc;\n";
    # ---- end of Perl script ----

When the conversion script is run on the metadata file example from
the LINK tag section (section 5), it produces the following output.

    @(urc;
     @|DC.Title; A Dirge
     @|DC.Creator; Shelley, Percy Bysshe
     @|DC.Type; poem
     @|DC.Date; 1820
     @|DC.Format; text/html
     @|DC.Language; en
    @)urc;

Script 2: Automated Metadata Creation
--------------------------------------

The creation and maintenance of high-quality metadata can be extremely
expensive without automation to assist in processes such as supplying
pre-set or computed defaults, validating syntax, verifying value ranges,
spell checking, etc. Considerable relief could be had from a script
that reduced an individual provider's metadata burden to just the title
of each document. Below is such a script. It lets the provider of an
HTML document abbreviate an entire embedded resource description using
a single HTML comment statement that looks like

    <!--metablock Nutritional Allocation Increase -->

Our script processes this statement specially as a kind of "metadata
block" declaration with attached title. The general form is

    <!--metablock TITLE_OF_DOCUMENT -->

This statement works much like a "Web server-side include" in that the
script replaces it with a fully-specified block of metadata and triggers
other replacements.
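
The include-like replacement can be pictured in a few lines. The
following is a minimal Python sketch, not part of the original draft and
not a port of the Perl script given later. It assumes that the
declaration is an HTML comment whose first word is "metablock", and it
handles only the (--mbtitle) variable, which is the form the appendix's
own script uses for the title. The function and constant names are
hypothetical.

```python
import re

# Assumed declaration syntax: an HTML comment whose first word is
# "metablock", followed by the title, which may span several lines.
METABLOCK = re.compile(r"<!--metablock\s+(.*?)\s*-->", re.DOTALL)


def expand_metablock(document: str, template: str) -> str:
    """Replace the first metablock declaration with the template text and
    substitute the title variable in both template and document."""
    match = METABLOCK.search(document)
    if match is None:
        return document  # no declaration, nothing to expand
    # Collapse line breaks and runs of spaces inside a multi-line title.
    title = " ".join(match.group(1).split())
    before = document[:match.start()]
    after = document[match.end():]
    return (before + template + after).replace("(--mbtitle)", title)


if __name__ == "__main__":
    doc = ("<head>\n<!--metablock Nutritional Allocation\n Increase -->\n"
           "</head>\nRE: (--mbtitle)\n")
    print(expand_metablock(doc, "<title>(--mbtitle)</title>"))
```

Substituting after the template has been spliced in means one pass covers
both the metadata block and the document body, so the title is typed only
once.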
Once installed, the script can output HTML files
suitable for integration into one's production Web server procedures.

The individual provider keeps a separate "template" file of infrequently
changing pre-set values for metadata elements. If the provider's needs
are simple enough, the only element values besides the title that differ
from one document to the next may be generated automatically. Using the
script, values may be referenced as variables from within the template
or within the document. Our variable references have the form
"(--mbVARNAME)", and here is what they look like inside a template:

    <title> (--mbtitle) </title>
    [LINK and META tags: markup missing from this copy]

The above template represents the metadata block that will describe the
document once the variable references are replaced with real values.
By the conventions of our script, the following variables will be
replaced in both the template and in the document:

    (--mbfilesize)      size of the final output file
    (--mbtitle)         title of the document
    (--mblanguage)      language of the document
    (--mbbaseURL)       beginning part of document identifier
    (--mbfilename)      last part (minus .html) of identifier
    (--mbfilemodtime)   last modification date of the document

Here's an example HTML file to run the script on.

    [opening HTML markup, including the metadata block declaration:
     missing from this copy]

    From:  Acting Shift Supervisor
    To:    Plant Control Personnel
    RE:    (--mbtitle)
    Date:  (--mbfilemodtime)

    Pursuant to directive DOH:10.2001/405aec of article B-2022,
    subsection 48.2.4.4.1c regarding staff morale and employee
    productivity standards, the current allocation of doughnut
    acquisition funds shall be increased effective immediately.

    [closing HTML markup: missing from this copy]

Note that because replacement occurs throughout the document, the
provider need only enter the title once instead of twice (normally the
title must be entered once in the HTML head and again in the HTML body).
After running the script, the above file is transformed into this:

    <title> Nutritional Allocation Increase </title>
    [LINK and META tags: markup missing from this copy]

    From:  Acting Shift Supervisor
    To:    Plant Control Personnel
    RE:    Nutritional Allocation Increase
    Date:  1999-03-08

    Pursuant to directive DOH:10.2001/405aec of article B-2022,
    subsection 48.2.4.4.1c regarding staff morale and employee
    productivity standards, the current allocation of doughnut
    acquisition funds shall be increased effective immediately.

    [closing HTML markup: missing from this copy]

Here is the script that accomplishes this transformation.

    #!/depot/bin/perl
    #
    # This Perl script processes metadata block declarations of the form
    # <!--metablock TITLE_OF_DOCUMENT --> and variable references of the
    # form (--mbVARNAME), replacing them with full metadata blocks and
    # variable values, respectively.  Requires a "template" file.
    # Outputs an HTML file.
    #
    # Invoke this script with a single filename argument, "foo".  It creates
    # an output file "foo.html" using a temporary working file "foo.work".
    # The size of foo.work is measured after variable replacement, and is
    # later inserted into the file in such a way that the file's size does
    # not change in the process.  Has little or no error checking.

    $infile = shift;
    open(IN, "< $infile")
        or die("Could not open input file \"$infile\"");
    $workfile = "$infile.work";
    unlink($workfile);
    open(WORK, "+> $workfile")
        or die("Could not open work file \"$workfile\"");

    @offsets = ();              # records locations for late size replacement
    $title = "";                # gets the title during metablock processing
    $language = "en";           # pre-set language here (not in the template)
    $baseURL = "http://moes.bar.com/doh";   # pre-set base URL here also
    $filename = "$infile.html"; # final output filename
    $filesize = "(--mbfilesize)";           # replaced late (separate pass)

    ($year, $month, $day) = (localtime( (stat IN) [9] ))[5, 4, 3];
    $filemodtime = sprintf "%s-%02s-%02s", 1900 + $year, 1 + $month, $day;

    sub putout {            # outputs current line with variable replacement
        if (! /\(--mb/) {
            print WORK;
            return;
        }
        if (/\(--mbfilesize\)/)             # remember where it was
            { push @offsets, tell WORK; }   # but don't replace yet
        s/\(--mbtitle\)/$title/g;
        s/\(--mblanguage\)/$language/g;
        s/\(--mbbaseURL\)/$baseURL/g;
        s/\(--mbfilename\)/$filename/g;
        s/\(--mbfilemodtime\)/$filemodtime/g;
        print WORK;
    }

    while (<IN>) {                          # main loop for input file
        # The next seven lines are reconstructed; the original text
        # between "<!--" and "-->" is missing from this copy.
        if (! /(.*)<!--metablock\s*(.*)/) {
            &putout;
            next;
        }
        $title = $2;                        # text after the declaration word
        $_ = $1;                            # text before the declaration
        &putout;
        if ($title =~ s/\s*-->(.*)//) {
            $remainder = $1;
        }
        else {
            while (<IN>) {
                # Test before appending so that the closing line is not
                # added to the title twice.
                last if (/(.*)\s*-->(.*)/);
                $title .= $_;
            }
            $title .= $1;
            $remainder = $2;
        }
        open(TPLATE, "< template")
            or die("Could not open template file");
        while (<TPLATE>)                    # subloop for template file
            { &putout; }
        close(TPLATE);
        $_ = $remainder;
        &putout;
    }
    close(IN);

    # Now replace filesize variables without altering total byte count.
    select( (select(WORK), $| = 1) [0] );   # first flush output so we
    if (($size = -s WORK) < 100000)         # can get final file size
        { $scale = 0; }                     # and set scale factor or
    else {          # compute it, keeping width of size field low
        for ($scale = 0; $size >= 1000; $scale++)
            { $size /= 1024; }
    }
    $filesize = sprintf "%7.7s %sbytes",
        $size, (" ", "K", "M", "G", "T", "P") [$scale];

    foreach $pos (@offsets) {       # loop through saved size locations
        seek WORK, $pos, 0;         # read the line found there
        $_ = <WORK>;
        # $filesize must be exactly as wide as "(--mbfilesize)"
        s/\(--mbfilesize\)/$filesize/g;
        seek WORK, $pos, 0;         # rewrite it with replacement
        print WORK;
    }

    close(WORK);
    rename($workfile, "$filename")
        or die("Could not rename \"$workfile\" to \"$filename\"");
    # ---- end of Perl script ----

12. Author's Address

John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA 94143-0840, USA
Email: jak@ckm.ucsf.edu
Fax: +1 415-476-4653

13. References

[AAT]     Art and Architecture Thesaurus, Getty Information Institute,
          http://www.gii.getty.edu/vocabulary/aat.html

[AC]      The A-Core: Metadata about Content Metadata, (in progress)
          http://metadata.net/ac/draft-iannella-admin-01.txt

[DC1]     RFC 2413, Dublin Core Metadata for Resource Discovery,
          September 1998, ftp://ftp.isi.edu/in-notes/rfc2413.txt

[DCHOME]  Dublin Core Initiative Home Page,
          http://purl.org/DC/

[DCPROJECTS]
          Projects Using Dublin Core Metadata,
          http://purl.org/DC/projects/index.htm

[DCT1]    Dublin Core Type List 1, DC Type Working Group, March 1999,
          http://www.loc.gov/marc/typelist.html

[freeWAIS-sf2.0]
          The enhanced freeWAIS distribution, February 1999,
          http://ls6-www.cs.uni-dortmund.de/ir/projects/freeWAIS-sf/

[GLIMPSE] Glimpse Home Page,
          http://glimpse.cs.arizona.edu/

[HARVEST] Harvest Web Indexing,
          http://www.tardis.ed.ac.uk/harvest/

[HTML4.0] Hypertext Markup Language 4.0 Specification, April 1998,
          http://www.w3.org/TR/REC-html40/

[ISEARCH] Isearch Resources Page,
          http://www.etymon.com/Isearch/

[ISO639-2]
          Code for the representation of names of languages, 1996,
          http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html

[ISO8601] ISO 8601:1988(E), Data elements and interchange formats --
          Information interchange -- Representation of dates and times,
          International Organization for Standardization, June 1988.
          http://www.iso.ch/markete/8601.pdf

[MARC]    USMARC Format for Bibliographic Data, US Library of Congress,
          http://lcweb.loc.gov/marc/marc.html

[PERL]    L. Wall, T. Christiansen, R. Schwartz, Programming Perl,
          Second Edition, O'Reilly, 1996.

[RDF]     Resource Description Framework Model and Syntax Specification,
          February 1999, http://www.w3.org/TR/REC-rdf-syntax/

[RFC1766] RFC 1766, Tags for the Identification of Languages,
          http://ds.internic.net/rfc/rfc1766.txt

[SWISH-E] Simple Web Indexing System for Humans - Enhanced,
          http://sunsite.Berkeley.EDU/SWISH-E/

[TGN]     Thesaurus of Geographic Names, Getty Information Institute,
          http://www.gii.getty.edu/tgn_browser/

[WTN8601] W3C Technical Note - Profile of ISO 8601 Date and Time Formats
          http://www.w3.org/TR/NOTE-datetime

[XML]     Extensible Markup Language (XML),
          http://www.w3.org/TR/REC-xml