idnits 2.17.1

draft-sparks-genarea-mailarch-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (Dec 16, 2011) is 4514 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-12) exists of
     draft-ietf-eai-5738bis-02

  -- Obsolete informational reference (is this intentional?): RFC 3501
     (Obsoleted by RFC 9051)

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                          R. Sparks
Internet-Draft                                                   Tekelec
Intended status: Informational                              Dec 16, 2011
Expires: June 18, 2012

         IETF Email List Archiving and Search Tool Requirements
                    draft-sparks-genarea-mailarch-03

Abstract

   The IETF makes heavy use of email lists to conduct its work.
   Participants frequently need to search and browse the archives of
   these lists, and have asked for improved search capabilities.  The
   current archive mechanism could also be made more efficient.
   This memo captures the requirements for improved email list
   archiving and searching systems.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 18, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  List Search and Archive Requirements
     2.1.  Search and Browsing
     2.2.  IMAP access
     2.3.  Archiving Active Lists
     2.4.  Importing Messages from Other Archives
     2.5.  Exporting messages from the Archives
     2.6.  Redundancy
     2.7.  Archive Administration
     2.8.  Transition Requirements
   3.  Internationalized Address Considerations
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgements
   7.  Changelog
     7.1.  01 to 02
     7.2.  00 to 01
   8.  Informative References
   Author's Address

1.  Introduction

   The IETF makes heavy use of email lists to conduct its work.
   Participants frequently need to search the archives of these lists,
   and have asked for improved search capabilities, particularly when
   the search needs to cover a large period of time, or cross several
   lists.  For instance, document editors, shepherds, working group
   chairs, and area directors may need to review all discussion of a
   particular draft.  That discussion may be spread across the working
   group list, one or more directorate lists, and the IETF general list.
   Occasionally, work impacts multiple groups, possibly in different
   areas, and the search must cover additional working group lists.  The
   current tools for performing these searches require several manually
   coordinated steps, which are error prone.  Without a local copy of
   the archive (which may not be complete), searching most working group
   lists requires brute force effort, aided possibly by web search
   engines.
   More advanced search capabilities have been constructed for a
   limited subset of the available lists and are exposed in the "Email
   Archives Quick Search" section of the main IETF website.  While these
   tools are of great assistance, there is still significant need for
   improvement.  The current archive mechanism could also be made more
   efficient.  The current practices involve duplicate stores (for the
   web and ftp interfaces), which impacts storage and replication, and
   is subject to inconsistency.  This memo captures the requirements for
   improved email list archiving and searching systems.

   Discussion of this memo should take place on the ietf@ietf.org
   mailing list.

2.  List Search and Archive Requirements

2.1.  Search and Browsing

   o  The system must provide a web interface for searching and browsing
      archived messages.

   o  The system must allow browsing the entire archive of a given list
      by thread or by date.

   o  The system must allow browsing the results of a search by thread
      or by date.

         Both threading based on Message-Id/References/In-Reply-To and
         threading based on the same subject line (modulo short prefixes
         like re: and fwd:) should be taken into account.

   o  The system must allow searching across any subset of the archived
      lists (one list, a selection of lists, or all lists).

   o  The system must allow searching of any combination (using AND and
      OR operators) of the following attributes.  Richer search
      capabilities are highly desirable.

      -  string occurring in sender name

      -  date range

      -  string occurring in Subject

      -  string occurring in message body

      -  string occurring in message header (in particular, exact match
         of Message-Id)

            For instance, it would be nice to search the entire archive
            for instances of a message with a given Message-ID with a
            URL like

   o  Individual messages must be representable by a long-term stable
      URI that can be shared between users.
      That is, the URI must be suitable for reference in an email
      message.

      -  It would be preferable for that URI to appear in an Archived-At
         header field in the message [RFC5064].

   o  Searches should be representable by a URI that can be shared
      between users

      -  Such URIs should be long-term stable.

      -  The search may be re-executed when the URI is referenced.  It
         is acceptable for the same URI to produce different results if
         accessed at different times (for example, reflecting additional
         messages that match the search criteria).

2.2.  IMAP access

   Many participants would prefer to access the list archives using IMAP
   [RFC3501].  Providing this access while meeting the following
   requirements will likely require an IMAP server with specialized
   capabilities.

   o  The system should expose the archive using an IMAP interface, with
      each list represented as a mailbox.

   o  This interface must work with standard IMAP clients.

   o  The interface should allow users to each have their own read/
      unread marks for messages.  Allowing other annotation is
      desirable.

      -  If this requires the user to log in, the system should use
         datatracker login credentials

   o  The interface must have server-side searching enabled, and should
      support multiple simultaneous extensive searches.

2.3.  Archiving Active Lists

   o  The archive system must accept messages handled by various mail
      list manager packages.

      -  Lists hosted on the IETF systems are served by mailman
         [mailman].

      -  Lists hosted at other organizations may use other packages.

         *  The archive system must accept messages through subscribing
            to such an external list.

         *  The archive system may support other mechanisms for
            accepting messages into the archive

2.4.  Importing Messages from Other Archives

   Lists hosted at other systems are sometimes moved to the IETF
   servers, and their archive is moved with them.
   The archiving system must be able to import these archives.

   o  At a minimum the archive system must be able to import mbox
      formatted archives [RFC4155][mbox].

   o  The archive system should be able to import maildir and maildir-
      like (the key characteristic being one-message-per-file) formatted
      archives [maildir].

   o  It is acceptable to use a separate utility to convert between
      these formats before import as long as the conversion is lossless

2.5.  Exporting messages from the Archives

   o  The archive system must support exporting messages in the mbox
      format

   o  The archive system should support exporting messages in maildir
      format

   o  The archive system must support exporting the entire archive of a
      given list

   o  The archive system must support exporting all messages from a
      given list within a given date range

   o  The archive system should allow exporting the results of any
      supported search query

2.6.  Redundancy

   o  The systems must facilitate providing archive, search, and browse
      functions through geographically distributed servers

      -  The systems must support a single active and single standby
         server.  This reflects the current operating configuration and
         is expected to be the initial deployment model.

      -  The systems should support a single active and multiple standby
         servers.

      -  The systems should support multiple active servers for the
         search and browse functions.  Multiple active archive servers
         are not a requirement.

      -  The amount of data replication between servers should be on the
         order of the size of any new/changed messages in the archives.

         *  It is acceptable for replication to be part of the archival
            system itself (such as using the replication mechanisms from
            an underlying database).
         *  It is acceptable to rely on replication of the underlying
            filesystem objects (using rsync of one or more directory
            trees for example), but only if the objects in the
            underlying filesystem are formatted such that the size of
            the replication data is on the order of the size of any new/
            changed messages in the archives.

2.7.  Archive Administration

   o  The archive system must support adding and removing lists to be
      archived

   o  The system must allow the administrator to add messages to and
      delete messages from an archived list.  The system should log such
      actions.

   o  The system must allow the administrator to delete messages from an
      archived list

2.8.  Transition Requirements

   There are many existing archived messages containing embedded links
   into the existing MHonArc mail archive.  These links must continue to
   work, but should reach the message as archived in the new system.

3.  Internationalized Address Considerations

   The archive and search functions should anticipate internationalized
   email addresses as discussed in the following three documents
   [I-D.ietf-eai-rfc5335bis] [I-D.ietf-eai-rfc5336bis]
   [I-D.ietf-eai-5738bis].  There is no firm requirement at this time.

4.  Security Considerations

   Creating a new tool for searching and archiving IETF email lists does
   not affect the security of the Internet in any significant fashion.

5.  IANA Considerations

   This document has no actions for IANA.

6.  Acknowledgements

   The Tools Development team provided input into this initial
   brainstorm.  Text suggestions from Alexey Melnikov, Pete Resnick, and
   S. Moonesamy have been incorporated.

7.  Changelog

7.1.  01 to 02

   1.  Added request for the Archived-At header field.

   2.  Pointed to the EAI work in progress and in the RFC Editor queue.

   3.  Corrected several typos

7.2.  00 to 01

   1.
       Requested ability to import maildir-like archives, not just
       maildir proper

   2.  Added a section requesting IMAP access to the archive.

8.  Informative References

   [I-D.ietf-eai-5738bis]
              Resnick, P., Newman, C., and S. Shen, "IMAP Support for
              UTF-8", draft-ietf-eai-5738bis-02 (work in progress),
              December 2011.

   [I-D.ietf-eai-rfc5335bis]
              Yang, A., Steele, S., and N. Freed, "Internationalized
              Email Headers", draft-ietf-eai-rfc5335bis-13 (work in
              progress), October 2011.

   [I-D.ietf-eai-rfc5336bis]
              Yao, J. and W. MAO, "SMTP Extension for Internationalized
              Email", draft-ietf-eai-rfc5336bis-16 (work in progress),
              November 2011.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, March 2003.

   [RFC4155]  Hall, E., "The application/mbox Media Type", RFC 4155,
              September 2005.

   [RFC5064]  Duerst, M., "The Archived-At Message Header Field",
              RFC 5064, December 2007.

   [maildir]  "Maildir", .

   [mailman]  "Mailman", .

   [mbox]     "Mbox", .

Author's Address

   Robert Sparks
   Tekelec
   17210 Campbell Road
   Suite 250
   Dallas, Texas  75254-4203
   USA

   Email: RjS@nostrum.com