Network Working Group                                          R. Sparks
Internet-Draft                                                   Tekelec
Intended status: Informational                              Aug 30, 2012
Expires: March 3, 2013

     IETF Email List Archiving, Web-based Browsing and Search Tool
                              Requirements
                    draft-sparks-genarea-mailarch-07

Abstract

   The IETF makes heavy use of email lists to conduct its work.
   Participants frequently need to search and browse the archives of
   these lists, and have asked for improved search capabilities.  The
   current archive mechanism could also be made more efficient.
   This memo captures the requirements for improved email list archiving
   and searching systems.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 3, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  List Search and Archive Requirements
     2.1.  Search and Browsing
     2.2.  Archiving Active Lists
     2.3.  Importing Messages from Other Archives
     2.4.  Exporting messages from the Archives
     2.5.  Redundancy
     2.6.  Archive Administration
     2.7.  Transition Requirements
   3.  Internationalized Address Considerations
   4.  IMAP Access
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Changelog
     8.1.  06 to 07
     8.2.  05 to 06
     8.3.  04 to 05
     8.4.  03 to 04
     8.5.  02 to 03
     8.6.  01 to 02
     8.7.  00 to 01
   9.  Informative References
   Author's Address

1.  Introduction

   The IETF makes heavy use of email lists to conduct its work.
   Participants frequently need to search the archives of these lists,
   and have asked for improved search capabilities, particularly when
   the search needs to cover a large period of time, or cross several
   lists.  For instance, document editors, shepherds, working group
   chairs, and area directors may need to review all discussion of a
   particular draft.  That discussion may be spread across the working
   group list, one or more directorate lists, and the IETF general list.
   Occasionally, work impacts multiple groups, possibly in different
   areas, and the search must cover additional working group lists.

   The current tools for performing these searches require several
   manually coordinated steps, which are error-prone.  Without a local
   copy of the archive (which may not be complete), searching most
   working group lists requires brute-force effort, aided possibly by
   web search engines.

   More advanced search capabilities have been constructed for a limited
   subset of the available lists and are exposed in the "Email Archives
   Quick Search" section of the main IETF website.  While these tools
   are of great assistance, there is still significant need for
   improvement.

   The current archive mechanism could also be made more efficient.  The
   current practices involve duplicate stores (for the web and ftp
   interfaces), which impacts storage and replication, and is subject to
   inconsistency.

   This memo captures the requirements for improved email list archiving
   and searching systems.

2.  List Search and Archive Requirements

2.1.  Search and Browsing

   o  The system must provide a web interface for searching and browsing
      archived messages.

   o  The system must allow browsing the entire archive of a given list
      by thread or by date.

   o  The system must allow browsing the results of a search by thread
      or by date.

         Both threading based on Message-ID/References/In-Reply-To and
         threading based on the same subject line (modulo short prefixes
         like re: and fwd:) should be taken into account.

   o  The system must allow searching across any subset of the archived
      lists (one list, a selection of lists, or all lists).

   o  The system must allow searching of any combination (using AND, OR,
      and NOT operators) of the following attributes.  Richer search
      capabilities are highly desirable.

      -  string occurring in sender name or email address

      -  date range

      -  string occurring in Subject

      -  string occurring in message body

      -  string occurring in message header (in particular, exact match
         of Message-ID)

            For instance, it would be nice to search the entire archive
            for instances of a message with a given Message-ID with a
            URL like

   o  Individual messages must be representable by a long-term stable
      URI that can be shared between users.  That is, the URI must be
      suitable for reference in an email message.

      -  It would be preferable for that URI to appear in an Archived-At
         header field in the message [RFC5064].

   o  Searches should be representable by a URI that can be shared
      between users.

      -  Such URIs should be long-term stable.

      -  The search may be re-executed when the URI is referenced.  It
         is acceptable for the same URI to produce different results if
         accessed at different times or by different people (for
         example, by reflecting additional messages that may match the
         search criteria, or reflecting changes in access authorization
         to lists with restricted archives).

   o  When the system requires credentials, it must use the
      datatracker's authentication system.

      -  While the vast majority of archived lists have an open access
         policy, some archived lists have restricted archives.

      -  The system must not require credentials for browsing or
         searching lists with open archives.  (But it is acceptable for
         a user to browse or search such lists while logged in.)

      -  The system must make it possible to limit access to a
         restricted archive based on login credentials.

      -  Messages from restricted archives must be distinguishable from
         messages from unrestricted archives in any search results.

2.2.  Archiving Active Lists

   o  The archive system must accept messages handled by various mail
      list manager packages.

      -  Lists hosted on the IETF systems are served by mailman
         [mailman].

      -  Lists hosted at other organizations may use other packages.

         *  The archive system must accept messages through subscribing
            to such an external list.

         *  The archive system may support other mechanisms for
            accepting messages into the archive.

2.3.  Importing Messages from Other Archives

   Lists hosted at other systems are sometimes moved to the IETF
   servers, and their archive is moved with them.  The archiving system
   must be able to import these archives.

   o  At a minimum the archive system must be able to import mbox
      formatted archives [RFC4155][mbox].

   o  The archive system should be able to import maildir and
      maildir-like (the key characteristic being one-message-per-file)
      formatted archives [maildir].

   o  It is acceptable to use a separate utility to convert between
      these formats before import as long as the conversion is lossless.

2.4.  Exporting messages from the Archives

   The archive system must allow both users and administrators to export
   messages.

   o  The archive system must support exporting messages in the mbox
      format.

   o  The archive system should support exporting messages in maildir
      format.

   o  The archive system must support exporting the entire archive of a
      given list.

   o  The archive system must support exporting all messages from a
      given list within a given date range.

   o  The archive system should allow exporting the results of any
      supported search query.

2.5.  Redundancy

   o  The systems must facilitate providing archive, search, and browse
      functions through geographically distributed servers.

      -  The systems must support a single active and single standby
         server.  This reflects the current operating configuration and
         is expected to be the initial deployment model.

      -  The systems should support a single active and multiple standby
         servers.

      -  The systems should support multiple active servers for the
         search and browse functions.  Support for multiple active
         archive servers is not a requirement.

      -  The amount of traffic generated to ensure data replication
         between servers should be on the order of the size of any
         new/changed messages in the archives.

         *  It is acceptable for replication to be part of the archival
            system itself (such as using the replication mechanisms from
            an underlying database).

         *  It is acceptable to rely on replication of the underlying
            filesystem objects (using rsync of one or more directory
            trees for example), but only if the objects in the
            underlying filesystem are formatted such that the size of
            the replication data is on the order of the size of any
            new/changed messages in the archives.

2.6.  Archive Administration

   o  The archive system must support adding and removing lists to be
      archived.

   o  The system must allow the administrator to add messages to and
      delete messages from an archived list.  The system should log such
      actions.

2.7.  Transition Requirements

   There are many existing archived messages containing embedded links
   into the existing MHonArc mail archive.  These links must continue to
   work, but should reach the message as archived in the new system.

3.  Internationalized Address Considerations

   The archive and search functions should anticipate internationalized
   email addresses as discussed in the following three documents
   [I-D.ietf-eai-rfc5335bis] [I-D.ietf-eai-rfc5336bis]
   [I-D.ietf-eai-5738bis].  There is no firm requirement at this time.

4.  IMAP Access

   Requirements for allowing access to the archives using IMAP are
   captured in [I-D.sparks-genarea-imaparch].  The archive system must
   anticipate integrating with a system that provides IMAP access.

5.  Security Considerations

   Creating a new tool for searching and archiving IETF email lists does
   not affect the security of the Internet in any significant fashion.

   Searching can be I/O and CPU intensive.  The implementors of this
   tool should consider the potential for maliciously crafted searches
   attempting to consume all available resources.  Similarly, the
   implementors should consider the potential for denial of service
   attacks through making many connections to the browsing system or
   rapid navigation within it.

   Preserving the integrity of the archives is important.  The
   implementors should ensure that administrative access is
   appropriately authenticated, and that message paths into the archive
   are appropriately configured to avoid unauthorized message insertion.

6.  IANA Considerations

   This document has no actions for IANA.

7.  Acknowledgements

   The Tools Development team provided input into the initial
   brainstorm.  Text suggestions from Alexey Melnikov, Pete Resnick, S.
   Moonesamy, Francis Dupont, and Murray Kucherawy have been
   incorporated.

8.  Changelog

   RFC Editor - please remove this section when formatting this document
   as an RFC.

8.1.  06 to 07

   1.  Additions to the Security Considerations section reflecting IESG
       discussion.

8.2.  05 to 06

   1.  Incorporated comments and nits from the GenArt and AppsDir
       reviewers.

   2.  Separated the Introduction's first paragraph into several for
       readability.

   3.  Added NOT to the search operators.

   4.  Deleted the second instance of a repeated requirement to allow
       administrators to delete messages from an archive.

   5.  Clarified that search results could change along with changes in
       authorization of the searcher.

   6.  Added a requirement that messages from restricted archives be
       distinguishable from messages from unrestricted archives in
       search results.

   7.  Added a reference to the imaparch document.

8.3.  04 to 05

   1.  Added requirements to enable controlled access to restricted
       archives based on credentials, and that the datatracker's
       credentials must be used.

8.4.  03 to 04

   1.  Split IMAP access to the archive into its own document so that it
       can be pursued as an independent project.

8.5.  02 to 03

   1.  Expanded motivation to the Introduction.

8.6.  01 to 02

   1.  Added request for the Archived-At header field.

   2.  Pointed to the EAI work in progress and in the RFC Editor queue.

   3.  Corrected several typos.

8.7.  00 to 01

   1.  Requested ability to import maildir-like archives, not just
       maildir proper.

   2.  Added a section requesting IMAP access to the archive.

9.  Informative References

   [I-D.ietf-eai-5738bis]
              Resnick, P., Newman, C., and S. Shen, "IMAP Support for
              UTF-8", draft-ietf-eai-5738bis-07 (work in progress),
              August 2012.

   [I-D.ietf-eai-rfc5335bis]
              Yang, A., Steele, S., and N. Freed, "Internationalized
              Email Headers", draft-ietf-eai-rfc5335bis-13 (work in
              progress), October 2011.

   [I-D.ietf-eai-rfc5336bis]
              Yao, J. and W. Mao, "SMTP Extension for Internationalized
              Email", draft-ietf-eai-rfc5336bis-16 (work in progress),
              November 2011.

   [I-D.sparks-genarea-imaparch]
              Sparks, R., "IMAP Access to IETF Email List Archives",
              draft-sparks-genarea-imaparch-02 (work in progress),
              August 2012.

   [RFC4155]  Hall, E., "The application/mbox Media Type", RFC 4155,
              September 2005.

   [RFC5064]  Duerst, M., "The Archived-At Message Header Field",
              RFC 5064, December 2007.

   [maildir]  "Maildir", .

   [mailman]  "Mailman", .

   [mbox]     "Mbox", .

Author's Address

   Robert Sparks
   Tekelec
   17210 Campbell Road
   Suite 250
   Dallas, Texas 75254-4203
   USA

   Email: RjS@nostrum.com
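
Illustrative note (non-normative)

   Section 2.1 asks that threading take into account messages that share
   a subject line "modulo short prefixes like re: and fwd:".  The sketch
   below shows one possible way an implementation might derive a
   grouping key from a Subject value.  It is only an illustration under
   stated assumptions: the names subject_key and group_by_subject are
   hypothetical, the prefix set (re:, fw:, fwd:) is an assumption since
   this memo does not enumerate the prefixes, and it covers only the
   subject-based fallback, not threading by Message-ID, References, or
   In-Reply-To.

```python
import re
from collections import defaultdict

# Assumption: only "re:", "fw:" and "fwd:" count as short prefixes.
# The memo names re: and fwd: as examples and does not give a full list.
_PREFIX = re.compile(r"^\s*(?:re|fwd?)\s*:\s*", re.IGNORECASE)


def subject_key(subject: str) -> str:
    """Return a normalized subject usable as a thread-grouping key."""
    previous = None
    current = subject
    # Strip stacked prefixes one at a time ("Re: Fwd: Re: ...").
    while current != previous:
        previous = current
        current = _PREFIX.sub("", current, count=1)
    # Collapse runs of whitespace and ignore case when comparing.
    return " ".join(current.split()).lower()


def group_by_subject(subjects):
    """Group raw Subject values by their normalized key."""
    groups = defaultdict(list)
    for raw in subjects:
        groups[subject_key(raw)].append(raw)
    return dict(groups)
```

   A deployment would likely combine this key with header-based
   threading and use the subject match only when the reference headers
   are missing.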