idnits 2.17.1

draft-sparks-genarea-mailarch-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (Aug 22, 2012) is 4259 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-12) exists of
     draft-ietf-eai-5738bis-07

  == Outdated reference: A later version (-08) exists of
     draft-sparks-genarea-imaparch-01

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                          R. Sparks
Internet-Draft                                                   Tekelec
Intended status: Informational                              Aug 22, 2012
Expires: February 23, 2013


     IETF Email List Archiving, Web-based Browsing and Search Tool
                              Requirements
                    draft-sparks-genarea-mailarch-06

Abstract

   The IETF makes heavy use of email lists to conduct its work.
   Participants frequently need to search and browse the archives of
   these lists, and have asked for improved search capabilities.  The
   current archive mechanism could also be made more efficient.
   This memo captures the requirements for improved email list archiving
   and searching systems.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 23, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  List Search and Archive Requirements
     2.1.  Search and Browsing
     2.2.  Archiving Active Lists
     2.3.  Importing Messages from Other Archives
     2.4.  Exporting messages from the Archives
     2.5.  Redundancy
     2.6.  Archive Administration
     2.7.  Transition Requirements
   3.  Internationalized Address Considerations
   4.  IMAP Access
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Changelog
     8.1.  05 to 06
     8.2.  04 to 05
     8.3.  03 to 04
     8.4.  02 to 03
     8.5.  01 to 02
     8.6.  00 to 01
   9.  Informative References
   Author's Address

1.  Introduction

   The IETF makes heavy use of email lists to conduct its work.
   Participants frequently need to search the archives of these lists,
   and have asked for improved search capabilities, particularly when
   the search needs to cover a large period of time, or cross several
   lists.  For instance, document editors, shepherds, working group
   chairs, and area directors may need to review all discussion of a
   particular draft.  That discussion may be spread across the working
   group list, one or more directorate lists, and the IETF general list.
   Occasionally, work impacts multiple groups, possibly in different
   areas, and the search must cover additional working group lists.

   The current tools for performing these searches require several
   manually coordinated steps, which are error prone.
   Without a local copy of the archive (which may not be complete),
   searching most working group lists requires brute force effort,
   aided possibly by web search engines.

   More advanced search capabilities have been constructed for a limited
   subset of the available lists and are exposed in the "Email Archives
   Quick Search" section of the main IETF website.  While these tools
   are of great assistance, there is still significant need for
   improvement.

   The current archive mechanism could also be made more efficient.  The
   current practices involve duplicate stores (for the web and ftp
   interfaces), which impacts storage and replication, and is subject to
   inconsistency.

   This memo captures the requirements for improved email list archiving
   and searching systems.

2.  List Search and Archive Requirements

2.1.  Search and Browsing

   o  The system must provide a web interface for search and browsing
      archived messages.

   o  The system must allow browsing the entire archive of a given list
      by thread or by date.

   o  The system must allow browsing the results of a search by thread
      or by date.

         Both threading based on Message-Id/References/In-Reply-To and
         threading based on same subject line (modulo short prefixes
         like re: and fwd:) should be taken into account.

   o  The system must allow searching across any subset of the archived
      lists (one list, a selection of lists, or all lists).

   o  The system must allow searching of any combination (using AND, OR,
      and NOT operators) of the following attributes.  Richer search
      capabilities are highly desirable.

      -  string occurring in sender name or email address

      -  date range

      -  string occurring in Subject

      -  string occurring in message body

      -  string occurring in message header (in particular, exact match
         of Message-Id)

            For instance, it would be nice to search the entire archive
            for instances of a message with a given Message-ID with a
            URL like

   o  Individual messages must be representable by a long-term stable
      URI that can be shared between users.  That is, the URI must be
      suitable for reference in an email message.

      -  It would be preferable for that URI to appear in an Archived-At
         header field in the message [RFC5064].

   o  Searches should be representable by a URI that can be shared
      between users

      -  Such URIs should be long-term stable.

      -  The search may be re-executed when the URI is referenced.  It
         is acceptable for the same URI to produce different results if
         accessed at different times or by different people (for
         example, by reflecting additional messages that may match the
         search criteria, or reflecting changes in access authorization
         to lists with restricted archives.)

   o  When the system requires credentials, it must use the
      datatracker's authentication system.

      -  While the vast majority of archived lists have an open access
         policy, some archived lists have restricted archives

      -  The system must not require credentials for browsing or
         searching lists with open archives.  (But it is acceptable for
         a user to browse or search such lists while logged in).

      -  The system must make it possible to limit access to a
         restricted archive based on login credentials.

      -  Messages from restricted archives must be distinguishable from
         messages from unrestricted archives in any search results.

2.2.  Archiving Active Lists

   o  The archive system must accept messages handled by various mail
      list manager packages.

      -  Lists hosted on the IETF systems are served by mailman
         [mailman].

      -  Lists hosted at other organizations may use other packages.

         *  The archive system must accept messages through subscribing
            to such an external list.

         *  The archive system may support other mechanisms for
            accepting messages into the archive

2.3.  Importing Messages from Other Archives

   Lists hosted at other systems are sometimes moved to the IETF
   servers, and their archive is moved with them.  The archiving system
   must be able to import these archives.

   o  At a minimum the archive system must be able to import mbox
      formatted archives [RFC4155][mbox].

   o  The archive system should be able to import maildir and
      maildir-like (the key characteristic being one-message-per-file)
      formatted archives [maildir].

   o  It is acceptable to use a separate utility to convert between
      these formats before import as long as the conversion is lossless.

2.4.  Exporting messages from the Archives

   o  The archive system must support exporting messages in the mbox
      format

   o  The archive system should support exporting messages in maildir
      format

   o  The archive system must support exporting the entire archive of a
      given list

   o  The archive system must support exporting all messages from a
      given list within a given date range

   o  The archive system should allow exporting the results of any
      supported search query

2.5.  Redundancy

   o  The systems must facilitate providing archive, search, and browse
      functions through geographically distributed servers

      -  The systems must support a single active and single standby
         server.  This reflects the current operating configuration and
         is expected to be the initial deployment model.

      -  The systems should support a single active and multiple standby
         servers.

      -  The systems should support multiple active servers for the
         search and browse functions.
         Support for multiple active archive servers is not a
         requirement.

      -  The amount of traffic generated to ensure data replication
         between servers should be on the order of the size of any new/
         changed messages in the archives.

         *  It is acceptable for replication to be part of the archival
            system itself (such as using the replication mechanisms from
            an underlying database).

         *  It is acceptable to rely on replication of the underlying
            filesystem objects (using rsync of one or more directory
            trees for example), but only if the objects in the
            underlying filesystem are formatted such that the size of
            the replication data is on the order of the size of any new/
            changed messages in the archives.

2.6.  Archive Administration

   o  The archive system must support adding and removing lists to be
      archived

   o  The system must allow the administrator to add messages to and
      delete messages from an archived list.  The system should log such
      actions.

2.7.  Transition Requirements

   There are many existing archived messages containing embedded links
   into the existing MHonArc mail archive.  These links must continue to
   work, but should reach the message as archived in the new system.

3.  Internationalized Address Considerations

   The archive and search functions should anticipate internationalized
   email addresses as discussed in the following three documents
   [I-D.ietf-eai-rfc5335bis] [I-D.ietf-eai-rfc5336bis]
   [I-D.ietf-eai-5738bis].  There is no firm requirement at this time.

4.  IMAP Access

   Requirements for allowing access to the archives using IMAP are
   captured in [I-D.sparks-genarea-imaparch].

5.  Security Considerations

   Creating a new tool for searching and archiving IETF email lists does
   not affect the security of the Internet in any significant fashion.

6.  IANA Considerations

   This document has no actions for IANA.

7.  Acknowledgements

   The Tools Development team provided input into the initial
   brainstorm.  Text suggestions from Alexey Melnikov, Pete Resnick,
   S. Moonesamy, Francis Dupont, and Murray Kucherawy have been
   incorporated.

8.  Changelog

8.1.  05 to 06

   1.  Incorporated comments and nits from the GenArt and AppsDir
       reviewers.

   2.  Separated the Introduction's first paragraph into several for
       readability.

   3.  Added NOT to the search operators

   4.  Deleted the second instance of a repeated requirement to allow
       administrators to delete messages from an archive.

   5.  Clarified that search results could change along with changes in
       authorization of the searcher.

   6.  Added a requirement that messages from restricted archives be
       distinguishable from messages from unrestricted archives in
       search results.

   7.  Added a reference to the imaparch document.

8.2.  04 to 05

   1.  Added requirements to enable controlled access to restricted
       archives based on credentials, and that the datatracker's
       credentials must be used.

8.3.  03 to 04

   1.  Split IMAP access to the archive into its own document so that it
       can be pursued as an independent project.

8.4.  02 to 03

   1.  Expanded motivation to the Introduction.

8.5.  01 to 02

   1.  Added request for the Archived-At header field.

   2.  Pointed to the EAI work in progress and in the RFC Editor queue.

   3.  Corrected several typos

8.6.  00 to 01

   1.  Requested ability to import maildir-like archives, not just
       maildir proper

   2.  Added a section requesting IMAP access to the archive.

9.  Informative References

   [I-D.ietf-eai-5738bis]
              Resnick, P., Newman, C., and S. Shen, "IMAP Support for
              UTF-8", draft-ietf-eai-5738bis-07 (work in progress),
              August 2012.

   [I-D.ietf-eai-rfc5335bis]
              Yang, A., Steele, S., and N.
              Freed, "Internationalized Email Headers",
              draft-ietf-eai-rfc5335bis-13 (work in progress),
              October 2011.

   [I-D.ietf-eai-rfc5336bis]
              Yao, J. and W. MAO, "SMTP Extension for Internationalized
              Email", draft-ietf-eai-rfc5336bis-16 (work in progress),
              November 2011.

   [I-D.sparks-genarea-imaparch]
              Sparks, R., "IMAP Access to IETF Email List Archives",
              draft-sparks-genarea-imaparch-01 (work in progress),
              February 2012.

   [RFC4155]  Hall, E., "The application/mbox Media Type", RFC 4155,
              September 2005.

   [RFC5064]  Duerst, M., "The Archived-At Message Header Field",
              RFC 5064, December 2007.

   [maildir]  "Maildir", .

   [mailman]  "Mailman", .

   [mbox]     "Mbox", .

Author's Address

   Robert Sparks
   Tekelec
   17210 Campbell Road
   Suite 250
   Dallas, Texas  75254-4203
   USA

   Email: RjS@nostrum.com
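
   The following non-normative sketch illustrates one way the threading
   requirement in Section 2.1 might be approached: messages are grouped
   when they are linked through Message-Id/References/In-Reply-To, or
   when they share a subject line after short prefixes such as re: and
   fwd: are removed.  The names normalize_subject and build_threads are
   hypothetical and are not taken from any existing archive tool.  The
   union-find grouping is one possible technique, not something this
   memo requires, and the sample messages are invented for the example.

```python
import re
from collections import defaultdict
from email import message_from_string

# Matches a run of short reply/forward prefixes such as "Re:" or "Fwd:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)
# Matches one angle-bracketed message identifier inside a header value.
_MSGID = re.compile(r"<[^<>\s]+>")


def normalize_subject(subject):
    """Drop leading re:/fwd: prefixes and fold case (hypothetical helper)."""
    return _PREFIX.sub("", subject or "").strip().lower()


def build_threads(messages):
    """Group parsed messages into threads (illustrative sketch only).

    Two messages land in the same thread when they are connected through
    Message-Id/References/In-Reply-To, or share a normalized subject.
    """
    parent = {}

    def find(key):
        # Union-find lookup with path halving.
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    keyed = []
    for position, msg in enumerate(messages):
        ids = _MSGID.findall(msg.get("Message-Id", ""))
        # Messages lacking a Message-Id get a synthetic, unique key.
        own = ids[0] if ids else "no-id:%d" % position
        find(own)
        for header in ("References", "In-Reply-To"):
            for ref in _MSGID.findall(msg.get(header, "")):
                union(own, ref)
        subject = normalize_subject(msg.get("Subject", ""))
        if subject:
            union(own, "subject:" + subject)
        keyed.append((own, msg))

    threads = defaultdict(list)
    for own, msg in keyed:
        threads[find(own)].append(msg)
    return list(threads.values())


if __name__ == "__main__":
    # Invented sample messages, used only to exercise the sketch.
    samples = [
        "Message-Id: <a@example.org>\nSubject: Archive tool\n\nfirst",
        "Message-Id: <b@example.org>\nIn-Reply-To: <a@example.org>\n"
        "Subject: Re: Archive tool\n\nreply",
    ]
    parsed = [message_from_string(text) for text in samples]
    for thread in build_threads(parsed):
        print([m["Message-Id"] for m in thread])
```

   Merging on subject alone can join unrelated conversations that happen
   to reuse a subject line, so a deployed system would likely bound
   subject-based merging (for example, by list or by time window).  Such
   bounds are a design choice outside the scope of this sketch.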