IETF media feature registration WG                        Graham Klyne
Internet draft                              Integralis Technology Ltd.
                                                         11 March 1998
                                            Expires: 11 September 1998


             An algebra for describing media feature sets

Status of this memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress''.

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Copyright (C) 1998, The Internet Society

Abstract

A number of Internet application protocols need to provide content negotiation for the resources with which they interact [1]. A framework for such negotiation is described in [2]. Part of this framework is a way to describe the range of media features which can be handled by the sender, recipient or document transmission format of a message. A format for a vocabulary of individual media features and procedures for registering media features are presented in [3].

This document describes an algebra which can be used to define feature sets which are formed from combinations and relations involving individual media features. Such feature sets are used to describe the media feature handling capabilities of message senders, recipients and file formats.

This document does not set out to specify a syntax for defining feature sets.

Table of contents

1. Introduction
   1.1 Structure of this document
   1.2 Discussion of this document
   1.3 Amendment history
   1.4 Unfinished business
2. Terminology and definitions
3. Media feature values
   3.1 Complexity of feature algebra
   3.2 Sufficiency of simple types
      3.2.1 Unstructured data types
      3.2.2 Cartesian product
      3.2.3 Discriminated union
      3.2.4 Array
      3.2.5 Powerset
      3.2.6 Sequence
4. Feature set predicates
   4.1 An algebra for data file format selection
      4.1.1 Describing file format features
         4.1.1.1 Feature ranges
         4.1.1.2 Feature combinations
      4.1.2 Content, sender and recipient capabilities
   4.2 Conclusion and proposal
5. Other issues
   5.1 Some thoughts on describing preferences
6. Security considerations
7. Copyright
8. Acknowledgements
9. References
10. Author's address

1. Introduction

A number of Internet application protocols need to provide content negotiation for the resources with which they interact [1]. A framework for such negotiation is described in [2].
A part of this framework is a way to describe the range of media features which can be handled by the sender, recipient or document transmission format of a message.

Descriptions of media feature capabilities need to be based upon some underlying vocabulary of individual media features. A format for such a vocabulary and procedures for registering media features are presented in [3].

This document defines an algebra which can be used to describe feature sets which are formed from combinations and relations involving individual media features. Such feature sets are used to describe the media handling capabilities of message senders, recipients and file formats.

The feature set algebra is built around the principle of using feature set predicates as mathematical relations which define constraints on feature handling capabilities. The idea is that the same form of feature set expression can be used to describe sender, receiver and file format capabilities. This has been loosely modelled on the way that the Prolog programming language uses Horn Clauses to describe a set of result values.

This document does not attempt to describe a concrete syntax for the algebra. Examples are given using notation drawn from the C and Prolog programming languages.

1.1 Structure of this document

The main part of this draft addresses four main areas:

Section 2 introduces and references some terms which are used with special meaning.

Section 3 discusses constraints on the data types allowed for individual media feature values.

Section 4 introduces and describes the algebra used to construct feature set descriptions with expressions containing media features.

Section 5 introduces other related issues which are not covered by the feature set algebra.

1.2 Discussion of this document

Discussion of this document should take place on the content negotiation and media feature registration mailing list hosted by the Internet Mail Consortium (IMC).

Please send comments regarding this document to:

    ietf-medfree@imc.org

To subscribe to this list, send a message with the body 'subscribe' to "ietf-medfree-request@imc.org".

To see what has gone on before you subscribed, please see the mailing list archive at:

    http://www.imc.org/ietf-medfree/

1.3 Amendment history

00a  11-Mar-1998  Document initially created.

1.4 Unfinished business

. Array values: are they needed? (section 3.2.4)
. Feature set predicates: clean up description
. Other issues: are there more?
. Security considerations: are there any?

2. Terminology and definitions

Feature collection
    A collection of different media features and associated values. This might be viewed as describing a specific rendering of a specific instance of a document or resource by a specific recipient.

Feature set
    A set of zero, one or more feature collections.

Feature set predicate
    A function of an arbitrary feature collection value which returns a Boolean result. A TRUE result is taken to mean that the corresponding feature collection belongs to some set of media feature handling capabilities defined by the predicate.

Other terms used in this draft are defined in [2].

3. Media feature values

This document assumes that individual media feature values are simple atomic values:

. Boolean values
. Enumerated values
. Numeric values

More complex media feature values might be accommodated, but they (a) would be undesirable because they would complicate the algebra, and (b) are not necessary. These statements are justified in the following sub-sections.

3.1 Complexity of feature algebra

Statement (a) above is justified as follows: predicates constructed as expressions containing media feature values must ultimately resolve to a logical combination of feature value tests.

A full range of simple tests for all of the data types listed above can be performed based on just two fundamental operations: equality and less-than. All other meaningful tests can be constructed as predicates incorporating these two basic tests. For example:

    ( a != b ) iff !( a == b )
    ( a <= b ) iff !( b < a )
    ( a > b )  iff  ( b < a )
    ( a >= b ) iff !( a < b )

If additional (composite) data types are introduced, then additional operators must be introduced to test their component parts: adding just one further comparison operator to the two above increases the number of fundamental operators by 50%.

3.2 Sufficiency of simple types

To justify statement (b), let us first review the range of composite data types that might reasonably be considered.

In 1972, a paper "Notes on data structuring" by C. A. R. Hoare was published in the book "Structured Programming" [4]. This was an early formalization of data types used in programming languages, and its content has formed a sufficient basis for describing the data types in almost every programming language developed since. This gives good grounds to believe that the type framework is also sufficient for media features.

The data types covered by this paper are:

. Unstructured data types (integer, real, enumeration, ordered enumeration, subranges).
. Cartesian product (e.g. C 'struct').
. Discriminated union (e.g. C 'union').
. Array.
. Powerset (e.g. Pascal 'SET OF').
. Sequence (e.g. C string, Pascal 'FILE OF').

To demonstrate sufficiency of simple types for media features we must show that the feature-set defining properties of these composite types can be captured using predicates on the simple types described previously.

3.2.1 Unstructured data types

The unstructured data types listed above correspond closely to, and can be represented by, the proposed simple value types for media features.

3.2.2 Cartesian product

A cartesian product value (e.g. resolution=[x,y]) is easily captured as a collection of two or more separately named media features (e.g. x-resolution=x, y-resolution=y).

3.2.3 Discriminated union

A discriminated union value is an either/or type of choice. For example, a given workstation might be able to display 16K colours at 1024x768 resolution, OR 256 colours at 1280x1024 resolution. These possibilities are captured by a logical-OR of predicates:

    ( ( x-resolution <= 1024 ) &&
      ( y-resolution <= 768 ) &&
      ( colours <= 16384 ) ) ||
    ( ( x-resolution <= 1280 ) &&
      ( y-resolution <= 1024 ) &&
      ( colours <= 256 ) )

3.2.4 Array

An array represents a mapping from one data type to another. For example, the availability of pens in a pen plotter might be represented by an array which maps a pen number to a colour.

If the array index which forms the basis for defining a feature set is assumed to be a constant, then each member can be designated by a feature name which incorporates the index value. For example: pen-1=black, pen-2=red, etc.

Another example where an array might describe a media feature is a colour palette: an array is used to associate a colour value (in terms of RGB or some other colour model) with a colour index value. In this case it is possible to envisage a requirement for a particular colour to be loaded in the palette without any knowledge of the index which maps to it.
In this case, the colour might be treated as a named Boolean attribute: if TRUE then that colour is deemed to be available in the palette.

Feature selection based on a variable array index is more difficult, but it is believed that this is not a required capability for media selection.

[[I cannot think of any example of feature selection which involves a variable index into an array. If such a feature is presented, an array type could be added to the set of allowable media feature types, and an array selection operator added to the algebra.]]

3.2.5 Powerset

A powerset is a collection of zero, one or more values from some base set of values. A colour palette may be viewed as a powerset of colour values, or the fonts available in a printer as a powerset of all available fonts.

A powerset is very easily represented by a separate Boolean-valued feature for each member of the base set. The value TRUE indicates that the corresponding value is a member of the powerset value.

3.2.6 Sequence

A sequence is a list of values from some base set of values, which are accessed sequentially.

A sequence can be modelled by an array if one assumes integer index values starting at (say) 1 and incrementing by 1 for each successive element of the sequence. Other variants of a sequence can be similarly modelled by an array.

Thus, the considerations described above relating to array values can be considered as also applying (in part) to sequence values. That is, if arrays are deemed to be adequately handled, then sequence values too can be handled.

4. Feature set predicates

[[This section may be incomplete and is certainly not polished.
It consists mainly of a reproduction of the proposals previously posted as messages to the 'conneg' mailing list.]]

This section develops a model for data file selection based on relational set definition and selection from the resulting set, using a subset of the Prolog programming language [5] as a descriptive notation for this purpose.

NOTE: The use of Prolog as a syntax for feature description is NOT being proposed; rather, the Prolog-like notation is used to develop the semantics of an algebra. Once the semantics have been developed, they can be mapped to some convenient syntax.

For the purposes of developing this algebra, examples are drawn from the media features described in [6], which in summary are:

    pix-x=n       (Image size, in pixels)
    pix-y=m
    res-x=n       (Image resolution, pixels per inch)
    res-y=m
    UA-media=     screen|stationary|transparency|envelope|
                  continuous-long
    papersize=    na-letter|iso-A4|iso-B4|iso-A3|na-legal
    color=n       (Colour depth in bits)
    grey=n        (Grey scale depth in bits)

4.1 An algebra for data file format selection

The basic idea proposed here is that a feature capability of the original content, sender, data file format or recipient is represented as a predicate of a feature set. Under universal quantification (i.e. selecting all possible values that satisfy it), a predicate indicates a range of feature sets.

This idea is inherent in Prolog clause notation, which is used in the example below to describe a predicate 'acceptable_file_format(File)' which yields a set of possible file transfer formats using other predicates which indicate the file formats available to the sender and the feature capabilities of the file format, original content and recipient:

    acceptable_file_format(File) :-
        sender_available_file_format(File),
        match_format(File).

    match_format(File) :-
        pix_x(File,Px), content_pix_x(Px), recipient_pix_x(Px),
        pix_y(File,Py), content_pix_y(Py), recipient_pix_y(Py),
        res_x(File,Rx), content_res_x(Rx), recipient_res_x(Rx),
        res_y(File,Ry), content_res_y(Ry), recipient_res_y(Ry),
        colour(File,C), content_colour(C), recipient_colour(C),
        grey(File,G), content_grey(G), recipient_grey(G),
        ua_media(File,M), content_ua_media(M), recipient_ua_media(M),
        papersize(File,P), content_papersize(P), recipient_papersize(P).

Essentially, this selects a set of file transfer formats from those available ('sender_available_file_format'), choosing any whose feature capabilities have a non-empty intersection with the feature capabilities of the original content and the recipient.

4.1.1 Describing file format features

The above framework suggests that a file format is described by a set of feature values. As an abstract theory this works fine, but for practical use it has a couple of problems:

(a) describing features with a large number of possible values

(b) describing features which are supported only in specific combinations

A typical case of (a) would be where a feature (e.g. size of image in pixels) can take any value from a range. To present and test each value separately is not a practical proposition, even if it were possible. (A guide here as to what constitutes a practical approach is to make a judgement about the feasibility of writing the corresponding Prolog program.)

A typical case of (b) would be where different values for certain features can occur only in combinations (e.g. allowable combinations of resolution and colour depth on a given video display). If the features are treated independently as suggested by the framework above, all possible combinations would be allowed, rather than the specifically allowable combinations.
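
As an informal illustration only, the selection framework of section 4.1 and problem (b) above can be sketched in executable form. The sketch below is written in Python purely for convenience; like the Prolog-like notation, it is not a proposed syntax. It makes a simplifying assumption that is not in the framework itself: each sender format is a single collection of feature values, and the content and recipient each supply one independent test per feature. All names and values are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption for this sketch: a format is one collection of feature values,
# and content/recipient capabilities are one independent test per feature.
FeatureCollection = Dict[str, int]
Constraints = Dict[str, Callable[[int], bool]]


def match_format(fmt: FeatureCollection, content: Constraints,
                 recipient: Constraints) -> bool:
    """Analogue of 'match_format': every feature value of the format must be
    accepted by both the content test and the recipient test for that feature."""
    return all(content[name](value) and recipient[name](value)
               for name, value in fmt.items())


def acceptable_file_formats(sender_formats: List[FeatureCollection],
                            content: Constraints,
                            recipient: Constraints) -> List[FeatureCollection]:
    """Analogue of 'acceptable_file_format': filter the sender's formats."""
    return [fmt for fmt in sender_formats
            if match_format(fmt, content, recipient)]


# Hypothetical data: the recipient display really supports only 640x480 and
# 1024x768, but its capabilities are stated one feature at a time.
any_value = lambda v: True
content = {"pix-x": any_value, "pix-y": any_value}
recipient = {
    "pix-x": lambda v: v in (640, 1024),
    "pix-y": lambda v: v in (480, 768),
}
sender_formats = [
    {"pix-x": 640, "pix-y": 480},
    {"pix-x": 640, "pix-y": 768},   # not an actual mode of the display
    {"pix-x": 800, "pix-y": 600},
]

selected = acceptable_file_formats(sender_formats, content, recipient)
```

With this data, 'selected' contains the 640x480 format and also the 640x768 format, although the hypothetical display has no such mode. Independent per-feature tests admit every combination of individually acceptable values, which is exactly problem (b).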

4.1.1.1 Feature ranges

The first issue can be addressed by considering the type of value which can represent the allowed features of a data file format.

The features of a specific data file are represented as values from an enumeration (e.g. ua_media, papersize), or as numeric values (integer or rational). The description of allowable file format features needs to represent all the allowable values.

The Prolog clauses used above to describe file format features already allow for multiple enumerated values. Each acts as a mathematical relation to select a subset of the set of file values allowed by the preceding predicates.

Section 3 of this document describes proposed media feature value types.

For numeric feature values, a sequence of two numbers to represent a closed interval is suggested, where either value may be replaced by an empty list to indicate no limiting value. Thus:

    [m,n]  => { x : m <= x <= n }
    [m,[]] => { x : m <= x }
    [[],n] => { x : x <= n }

The following Prolog would be used to describe such range matching:

    feature_match(_,[[],[]]).
    feature_match(X,[L,[]]) :- L =< X.
    feature_match(X,[[],H]) :- X =< H.
    feature_match(X,[L,H])  :- L =< X, X =< H.
    feature_match(X,X).

(This example stretches standard Prolog, which does not support non-integer numbers. The final clause allows 'feature_match' to deal with equality matching for the normal enumerated value case.)

4.1.1.2 Feature combinations

Representing allowed combinations of features is trickier. I can see two possible approaches:

(a) use additional predicates to impose relationships between features.

    Thus, if x- and y-resolutions were to be constrained to square or semi-square aspect ratios, the following predicates might be added to the feature set description:

        ( feature_match(Rx,Ry) ;
          feature_match(Rx,2*Ry) ;
          feature_match(2*Rx,Ry) ),
        feature_match(Rx,[72,600]),
        feature_match(Ry,[72,600])

    (where the last two constraints might be imposed by the 'res_x' and 'res_y' predicates).

    Another example might be:

        ( ( feature_match(Px,640),  feature_match(Py,480) ) ;
          ( feature_match(Px,800),  feature_match(Py,600) ) ;
          ( feature_match(Px,1024), feature_match(Py,768) ) )

    (This is based on the predicates 'pix_x(File,Px)', 'pix_y(File,Py)', 'res_x(File,Rx)' and 'res_y(File,Ry)' from the initial framework above.)

(b) another approach might be to allow meta-features which are groupings of other features.

    Applying this to the above examples would replace:

        pix_x(File,Px), pix_y(File,Py), res_x(File,Rx), res_y(File,Ry),

    with:

        pix(File,[Px,Py]), res(File,[Rx,Ry])

    where:

        pix(File,[640, 480]).
        pix(File,[800, 600]).
        pix(File,[1024,768]).

        res(File,[Rx,Ry]) :-
            feature_match(Rx,[72,600]),
            feature_match(Ry,[72,600]),
            ( feature_match(Rx,Ry) ;
              feature_match(Rx,2*Ry) ;
              feature_match(2*Rx,Ry) ).

On closer examination, these two options turn out to be pretty much the same thing: a requirement to impose additional constraint predicates on a file feature set. They differ only in where the predicates are applied.

This all suggests that file format capabilities can be described by feature set predicates: arbitrary logical expressions using AND, OR, NOT logical combining operators, and media feature value matching.

4.1.2 Content, sender and recipient capabilities

It has already been suggested that these are represented as predicates on the feature set of a particular data file.
Having also shown that these same predicates can represent constraints on feature combinations, we proceed directly to a proposal in which everything is represented by predicates.

4.2 Conclusion and proposal

My conclusion is that data file features, original content features, sender features and recipient features (and user features) should all be represented as predicates.

A key insight, which points to this conclusion, is that a collection of feature values can be viewed as describing a specific document actually rendered by a specific recipient. The capabilities that we wish to describe, be they sender, file format, recipient or other capabilities, are sets of such feature collections, with the potential to ultimately render using any of the feature value collections in the set.

This raises a terminology problem, because the term "feature set" has been used to mean both a collection of specific feature values and a range of possible feature values. Hence the more restricted definitions of "feature collection" and "feature set" given in the terminology section of this document.

Original content, data files and recipients (and users) all embody the potential capability to deal with a "feature set".

One of the aims of content negotiation is to select an available data file format (availability being circumscribed by the original content and sender capabilities) whose feature set intersection with the recipient feature set is non-empty. (The further issue of preference is deferred for later consideration.)

The concept of a mathematical relation as a subset defined by a predicate can be used to define feature sets, using universal quantification (i.e. using the predicate to select from some notional universe of all possible feature collections).
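
The following is a minimal sketch of this idea, again in Python for convenience rather than as a proposed syntax. It treats a feature set predicate as a Boolean function of a feature collection, builds predicates with AND, OR and NOT combinators plus range and equality matching, and tests whether the intersection of two feature sets is non-empty. The finite "universe" enumerated here is an assumption made so that the sketch can run; the universe described above is notional. All function names and feature values are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

FeatureCollection = Dict[str, int]
Predicate = Callable[[FeatureCollection], bool]


def in_range(name: str, low: Optional[int], high: Optional[int]) -> Predicate:
    """Closed-interval match; None stands in for the empty list '[]'
    (no limiting value) of the range notation in section 4.1.1.1."""
    def test(fc: FeatureCollection) -> bool:
        x = fc[name]
        return (low is None or low <= x) and (high is None or x <= high)
    return test


def equals(name: str, value: int) -> Predicate:
    return lambda fc: fc[name] == value


def all_of(*preds: Predicate) -> Predicate:   # logical AND
    return lambda fc: all(p(fc) for p in preds)


def any_of(*preds: Predicate) -> Predicate:   # logical OR
    return lambda fc: any(p(fc) for p in preds)


def negate(pred: Predicate) -> Predicate:     # logical NOT
    return lambda fc: not pred(fc)


def feature_set(pred: Predicate,
                universe: List[FeatureCollection]) -> List[FeatureCollection]:
    """The feature set defined by a predicate: every collection in the
    universe for which the predicate is TRUE."""
    return [fc for fc in universe if pred(fc)]


# Assumed finite universe of feature collections (16 in total).
universe = [{"pix-x": x, "pix-y": y}
            for x, y in product((640, 800, 1024, 1280), (480, 600, 768, 1024))]

# Hypothetical file format: any image up to 800x600 pixels.
file_format = all_of(in_range("pix-x", None, 800), in_range("pix-y", None, 600))

# Hypothetical recipient: exactly three supported size combinations.
recipient = any_of(
    all_of(equals("pix-x", 640), equals("pix-y", 480)),
    all_of(equals("pix-x", 800), equals("pix-y", 600)),
    all_of(equals("pix-x", 1024), equals("pix-y", 768)),
)

# Intersection of two feature sets is the AND of their predicates.
common = feature_set(all_of(file_format, recipient), universe)
negotiable = len(common) > 0
```

Here the intersection holds the 640x480 and 800x600 collections, so the format is usable by the recipient. Because intersection is simply the conjunction of predicates, the same combinators serve for content, sender, file format and recipient descriptions alike.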

Thus, a common framework of predicates can be used to represent the feature capabilities of original content, data file formats, recipients and any other participating entity which may impose constraints on the usable feature sets.

Within this framework, I believe it is sufficient to represent individual feature values as enumerated values or numeric ranges. The thesis in section 3 of this document, and a study of [6], indicate that more complex media feature values can be handled by predicates.

5. Other issues

5.1 Some thoughts on describing preferences

The general problem of describing preferences between feature sets is very much more complex than describing allowable feature sets. Before any real progress can be made, some simplifying assumptions are required.

At the end of the day, it is possible that any preference selection mechanism is at best a hint which must be subject to override by operator input.

It has been suggested that numeric q-factors, as used in some HTTP negotiations, are misleading and are really just a way of ranking feature sets.

The problem appears to be very multidimensional: there may be preferences implied by the original content, the recipient system or the receiving user. In addition, each of the different features adds a further dimension of possible preference.

Mathematically, the set of all feature collections and a fully general ordering relation of "preference" could be viewed as yielding a partially ordered set. Simplifying assumptions should, I believe, be aimed at making this into a fully ordered set, so that an ordering relation is defined for every pair of feature collections.

Given some simplifying assumptions, the approach suggested for using predicates to select allowable data formats might be extended to preferences. One might then view a predicate as a restricted preference (i.e. preference compared with no data transfer).

6. Security considerations

[[Does this introduce any security considerations which are not already covered in [1,2,3]? I suspect not.]]

7. Copyright

Copyright (C) The Internet Society 1998. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

8. Acknowledgements

My thanks to Larry Masinter for demonstrating to me the breadth of the media feature issue, and encouraging me to air my early ideas.

Early discussions of early ideas on the IETF-HTTP and IETF-FAX discussion lists led to useful inputs from Koen Holtman, Larry Masinter, Ted Hardie and Dan Wing.

The debate was later moved to the IETF conneg WG mailing list, where Al Gilman was particularly helpful in helping me to refine these ideas for a feature set algebra.

9. References

[1] "Scenarios for the Delivery of Negotiated Content"
    T. Hardie, NASA Network Information Center
    Internet draft: Work in progress, November 1997.

[2] "Requirements for protocol-independent content negotiation"
    G. Klyne, Integralis Ltd.
    Internet draft: Work in progress, March 1998.

[3] "Content feature tag registration procedures"
    Koen Holtman, TUE
    Andrew Mutz, Hewlett-Packard
    Ted Hardie, NASA
    Internet draft: Work in progress, November 1997.

[4] "Notes on data structuring"
    C. A. R. Hoare, in "Structured Programming"
    Academic Press, APIC Studies in Data Processing No. 8
    ISBN 0-12-200550-3 / 0-12-200556-2
    1972.

[5] "Programming in Prolog" (2nd edition)
    W. F. Clocksin and C. S. Mellish, Springer Verlag
    ISBN 3-540-15011-0 / 0-387-15011-0
    1984.

[6] "Media Features for Display, Print, and Fax"
    Larry Masinter, Xerox PARC
    Koen Holtman, TUE
    Andrew Mutz, Hewlett-Packard
    Dan Wing, Cisco Systems
    Internet draft: Work in progress, January 1998.

10. Author's address

Graham Klyne
Integralis Technology Ltd
Brewery Court
43-45 High Street
Theale
Reading, RG7 5AH
United Kingdom

Telephone: +44 118 930 6060
Facsimile: +44 118 930 2143
E-mail: GK@ACM.ORG