INTERNET DRAFT                               Phillip M. Hallam-Baker, W3C
Expires in six months                        email:
                                             Brian Behlendorf
                                             email:
                                             21st February 1996

                        Extended Log File Format

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of the Internet Engineering Task Force (IETF), its areas and its working groups. Note that other groups may also distribute working information as Internet drafts.

Internet drafts are draft documents valid for a maximum of six months and can be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet draft, please check the "1id-abstracts.txt" listing contained in the Internet drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West coast). Further information about the IETF can be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to the HTTP working group (HTTP-WG) of the Internet Engineering Task Force (IETF); see http://www.ics.uci.edu/pub/ietf/http/.

This note is also available as a World Wide Web Consortium Working Draft WD-logfile-960221, archived at http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html

------------------------------------------------------------------------------

Extended Log File Format
W3C Working Draft _WD-logfile-960221_

This version:
    http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html
Latest version:
    http://www.w3.org/pub/WWW/TR/WD-logfile.html
Authors:
    Phillip M. Hallam-Baker
    Brian Behlendorf

------------------------------------------------------------------------------

Status of this document

This is a W3C Working Draft for review by W3C members and other interested parties.
It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them other than as "work in progress". A list of current W3C working drafts can be found at: http://www.w3.org/pub/WWW/TR

Note: since working drafts are subject to frequent change, you are advised to reference the above URL, rather than the URLs for working drafts themselves.

Abstract

An improved format for Web server log files is presented. The format is extensible, permitting a wider range of data to be captured. This proposal is motivated by the need to capture a wider range of data for demographic analysis and also by the needs of proxy caches.

Introduction

Most Web servers offer the option to store logfiles in either the common log format or a proprietary format. The common log file format is supported by the majority of analysis tools, but the information recorded about each server transaction is fixed. In many cases it is desirable to record more information. Sites sensitive to personal data issues may wish to omit the recording of certain data. In addition, ambiguities arise in analysing the common log file format, since field separator characters may in some cases occur within fields.

The extended log file format is designed to meet the following needs:

  * Permit control over the data recorded.
  * Support the needs of proxies, clients and servers in a common format.
  * Provide robust handling of character escaping issues.
  * Allow exchange of demographic data.
  * Allow summary data to be expressed.

The log file format described permits customized logfiles to be recorded in a format readable by generic analysis tools. A header specifying the data types recorded is written out at the start of each log.

This work is in part motivated by the need to support collection of demographic data.
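The Introduction notes that common log file format entries become ambiguous because field separator characters can occur inside fields. The Python sketch below makes that concrete by splitting a sample common log format entry in two ways. The sample lines and the helpers `naive_split` and `quote_aware_split` are hypothetical and exist only for this illustration. They do not describe any particular analyser. Splitting on whitespace alone produces ten pieces instead of the seven common log format fields, because the date and the request line both contain spaces. Reading `[...]` and `"..."` as single fields recovers the seven fields, but this works only as long as the request line contains no double quote of its own.

```python
# Hypothetical illustration: why whitespace-separated CLF entries are ambiguous.

# A well-formed common log format entry:
# host ident authuser [date] "request" status bytes
CLF_LINE = '127.0.0.1 - frank [10/Oct/1995:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

# A malformed request line that itself contains a double quote.
BAD_LINE = '127.0.0.1 - - [10/Oct/1995:13:55:36 -0700] "GET /a"b HTTP/1.0" 200 2326'


def naive_split(line: str) -> list[str]:
    """Split purely on runs of whitespace."""
    return line.split()


def quote_aware_split(line: str) -> list[str]:
    """Split on whitespace, but read [...] and "..." as single fields.

    The closing delimiter is simply the next matching character, so a
    quote inside a quoted field ends that field early.
    """
    fields: list[str] = []
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1  # skip a separator character
        elif line[i] in '["':
            close = "]" if line[i] == "[" else '"'
            j = line.index(close, i + 1)  # raises ValueError if unterminated
            fields.append(line[i + 1 : j])
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace():
                j += 1
            fields.append(line[i:j])
            i = j
    return fields


if __name__ == "__main__":
    print(len(naive_split(CLF_LINE)))        # 10 pieces, not 7 fields
    print(quote_aware_split(CLF_LINE))       # the 7 CLF fields
    print(len(quote_aware_split(BAD_LINE)))  # 9: the split is now wrong
```

For the malformed line, `quote_aware_split` returns nine fields, with the request cut into `GET /a` and a stray `b`. The entry itself gives an analyser no way to know which split was intended. This appears to be the problem that the goal of "robust handling of character escaping issues" listed above is meant to address.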
This work is discussed at greater length in companion drafts describing session identifier URIs [Hallam96a] and more consistent proxy behaviour [Hallam96b].

Format

An extended log file contains a sequence of _lines_ containing ASCII characters terminated by either the sequence CR or CRLF. Log file generators should follow the line termination convention for the platform on which they are executed. Analysers should accept either form. Each line may contain either a _directive_ or an _entry_.

Entries consist of a sequence of _fields_ relating to a single HTTP transaction. Fields are separated by whitespace; the use of tab characters for this purpose is encouraged. If a field is unused in a particular entry, a dash "-" marks the omitted field.

Directives record information about the logging process itself.

The following directives are defined:

Version: __.__
    The version of the extended log file format used. This draft defines version 1.0.
Fields: [__...]
    Specifies the fields recorded in the log.
Software: _string_
    Identifies the software which generated the log.
Start-Date: __ _