Minutes of the Web User Privacy: Expectations & Threats (webpriv) BOF
Chaired by: Larry Masinter (firstname.lastname@example.org)
User Services Area director: Joyce Reynolds (email@example.com)
Mailing list: firstname.lastname@example.org. (to join: email@example.com; a majordomo list)
Minutes taken by Craig R.P. Heath (firstname.lastname@example.org) with slight amendments by April Marine (email@example.com) in square brackets to reflect comments from the list. Questions should go to April Marine.
The scope of the BOF was to discuss current privacy issues on the WWW, and whether there is something the IETF can do to help, by way of user education or guidelines for spec/protocol writers.
I. Privacy Issues
Hit Metering - information on users' browsing habits can be gathered by monitoring page hits; if a cache is used (either in a proxy server or in the user's client), the ability to monitor is moved, but not necessarily avoided. Currently, privacy issues are not considered in implementations, and avoiding hit metering may be counter to what content providers want. [Privacy issues are considered in RFC 2227, but the fear was expressed that some content providers may desire more information than simple hit-metering provides and would therefore not use the facilities in RFC 2227.]
State Management (cookies) - the cookie mechanism is [an extension to] the HTTP protocol. Along with returning the requested page, the server can set a cookie on the client. The client should return the cookie to the server the next time the page/server is accessed. There is an issue here with regard to European privacy laws, which are stricter than those in the USA. Users need to be aware of what is going on. There was a discussion of "unverifiable transactions". Content providers may link to images from third-party sites, and the images may be accompanied by cookies. [One way] the third party can discover where the image was referenced from [is] by looking at the "referrer" field in the HTTP header. If another content provider links to the same third party, the cookie set during the visit to the first content provider will be returned, allowing a history of the sites visited to be deduced. This situation is typically encountered with sites using a third-party advertising agent, e.g., Altavista's use of Doubleclick. [A] purpose of the cookies is, for example, to limit the number of times a particular ad is displayed. It was pointed out that the referrer field can be disabled in Netscape by editing a config file. It was questioned whether changing this mechanism would help, or whether advertisers would just find some other way of achieving the same goal (the content providers are already implicitly cooperating with the advertisers). It was pointed out that SET has persistent info that can be retrieved in certain circumstances. Cookies are a mixed blessing: combined with personal information, they can be used to tailor the view an individual is presented with. Users should be able to turn cookies on and off. There was a suggestion that "certified cookies" could be provided, giving some assurance of the use to which the cookie will be put. This would need to apply to plug-ins, etc. as well; it is very important for the user to understand how information is to be used.
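The third-party tracking pattern described above can be sketched as a small simulation. This is an illustrative sketch only, not how Doubleclick or any real ad server works: the class names, the host "ads.example", and the example URLs are all hypothetical. It assumes the browser keeps one cookie per host and sends the embedding page's URL in the referrer field (spelled "Referer" in the HTTP header itself).

```python
import uuid
from collections import defaultdict


class AdServer:
    """Hypothetical third-party ad server whose images are embedded on many sites."""

    def __init__(self):
        # Maps each cookie ID to the pages that embedded this server's images.
        self.history = defaultdict(list)

    def serve_image(self, cookie, referer):
        # No cookie yet: issue a fresh random ID (returned to the client as a cookie).
        if cookie is None:
            cookie = uuid.uuid4().hex
        # The referrer (None if the browser suppresses it) names the embedding page.
        self.history[cookie].append(referer)
        return cookie


class Browser:
    """Hypothetical minimal client: one cookie per host, returned on every request."""

    def __init__(self, send_referer=True):
        self.jar = {}
        self.send_referer = send_referer

    def view_page(self, page_url, ad_server, ad_host="ads.example"):
        # Viewing the page triggers a request for the embedded third-party image.
        cookie = self.jar.get(ad_host)
        referer = page_url if self.send_referer else None
        self.jar[ad_host] = ad_server.serve_image(cookie, referer)


ads = AdServer()
user = Browser()
user.view_page("http://search.example/?q=privacy", ads)
user.view_page("http://news.example/story", ads)
print(ads.history[user.jar["ads.example"]])
# ['http://search.example/?q=privacy', 'http://news.example/story']
```

With `Browser(send_referer=False)`, corresponding to disabling the referrer field, the server still links both visits under one cookie but records `[None, None]`: it can count visits (enough to limit how often an ad is shown) but no longer learns which sites were visited.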
The expectation in Asia is that 90% of the cost of an Internet connection will be met by advertising, so there is a danger of privacy issues hurting advertisers. It was pointed out that the aims of advertisers are moving from simply displaying the advert to closing the transaction. A straw poll indicated that some of the audience never register with web sites, and some register using pseudonyms.
[It should be noted that the effect of privacy laws and the fact that they differ in different parts of the world is a general issue for the discussion and is not limited to the topic of cookies.]
W3C Platform for Privacy Preferences Project (P3P) - http://www.w3.org/P3P/ - privacy information can be encoded in the W3C metadata format, and everyone can define their own privacy policy, similar to the PICS model for content labelling. Servers announce their intentions for processing of personal information in metadata, allowing the user to accept or reject them. Several drafts are available from the web site, and there are two working groups on implementation. User preferences can be configured to cover domains from a single site, through a group of sites, up to all sites. Configurations can be exchanged via a URL. The privacy assertions would include cookie processing, making the cookie issue academic. All other processing of personal info would be included, e.g., use of user name/serial number in URLs. A decision must be made to trust the accuracy of the privacy assertions; this is essentially the standard trojan horse problem. Privacy assertions can be signed by a third party as assurance, but they don't have to be. The user policy can specify whether assertions must be signed to be trusted.
Other Privacy Issues - Misuse of indexed mailto: URLs for spam was raised as an issue. It was suggested that privacy problems are social, not technological, and the basic issue is trust. If a working group is formed, it would need strong links with W3C, also with any working group emerging from the spam BOF. Users need "informed consent" - both informational and technical. The basis of the P3P model is in real-world trust in institutions; institutions can vouch for others.
II. Operational Issues
NASA received a Freedom of Information Act request for full access logs for all web sites (~30M/day). The same person has requested this from all federal sites. Agencies are not allowed to ask why the information is wanted. There is a concern that the requester will be able to use the information to build a "click trail" for users accessing the sites. "Anonymising" the log was not permissible. As a result, it is now policy to keep raw logs for only 30 days (including backups); summary reports have their own archival policies. Businesses also have issues with "legal discovery" and may need to take a similar approach. It was pointed out that logs are necessary for characterisation of usage; if logs must be discarded, it will be necessary to do the analysis "as you go."
With the greater availability of information, it is getting harder to effectively anonymise logs - "De-anonymising" has been demonstrated with medical records, for example.
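The click-trail and de-anonymisation concerns above can be illustrated with a short sketch. It is hypothetical throughout: the log format is reduced to (client address, timestamp, path), the addresses come from documentation-only ranges, and the naive "anonymisation" simply replaces each address with a truncated SHA-256 hash. Because the same address always maps to the same pseudonym, trails survive intact, and a single externally known visit is enough to pick out one user's whole trail. (Unsalted hashes of IPv4 addresses can also be reversed by enumerating all 2^32 addresses.)

```python
import hashlib
from collections import defaultdict

# Hypothetical, simplified access-log records: (client address, timestamp, URL path).
LOG = [
    ("192.0.2.10", 1000, "/index.html"),
    ("198.51.100.7", 1005, "/index.html"),
    ("192.0.2.10", 1010, "/missions/mars.html"),
    ("192.0.2.10", 1030, "/jobs/apply.html"),
    ("198.51.100.7", 1040, "/images/earth.jpg"),
]


def click_trails(records):
    """Group requests by client, in time order, to reconstruct each user's click trail."""
    trails = defaultdict(list)
    for client, _ts, path in sorted(records, key=lambda r: r[1]):
        trails[client].append(path)
    return dict(trails)


def pseudonymise(records):
    """Naive scheme: replace each address with a truncated SHA-256 hash of it."""
    return [(hashlib.sha256(c.encode()).hexdigest()[:12], ts, p) for c, ts, p in records]


def reidentify(records, known_time, known_path):
    """Return the pseudonym whose record matches one visit known from outside the log."""
    for client, ts, path in records:
        if ts == known_time and path == known_path:
            return client
    return None


anon = pseudonymise(LOG)
trails = click_trails(anon)
who = reidentify(anon, 1030, "/jobs/apply.html")
print(trails[who])  # ['/index.html', '/missions/mars.html', '/jobs/apply.html']
```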
III. Plans for Working Group
The User Services (USV) area is not just concerned with end-users, but with all levels of users. Its charter includes the production of guidelines and books. There seems to be a reasonable level of interest; a possible overlap with the RUN (Responsible Use of the Network) working group was noted. The working group could be a combined effort with other areas, in particular the Security area. Although the end product would not be a new protocol, expertise from the technical community is needed. The choice of area (USV, SEC, etc.) is less important than the composition of the working group itself.
IV. Potential Inputs
There is a paper analyzing the effects of the European Union privacy directive on US trade - inaction on this may result in difficulties.
April Marine (firstname.lastname@example.org) has volunteered to chair the group, at least until the next meeting.
Erik Bataller (email@example.com) has volunteered to gather information on user experiences (similar to the NASA experience above).
Ted Hardie (firstname.lastname@example.org) has set up this list (email@example.com).