Network Working Group M. Crispin
Request for Comments: 5256 Panda Programming
Category: Standards Track K. Murchison
Carnegie Mellon University
June 2008
Network Working Group M. Crispin
Request for Comments: 5256 Panda Programming
Category: Standards Track K. Murchison
Carnegie Mellon University
June 2008
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
A server that implements the SORT and/or THREAD extensions MUST collate strings in accordance with the requirements of I18NLEVEL=1, as described in [IMAP-I18N], and SHOULD implement and advertise the I18NLEVEL=1 extension. Alternatively, a server MAY implement I18NLEVEL=2 (or higher) and comply with the rules of that level.
Discussion: The SORT and THREAD extensions predate [IMAP-I18N] by several years. At the time of this writing, all known server implementations of SORT and THREAD comply with the rules of I18NLEVEL=1, but do not necessarily advertise it. As discussed in [IMAP-I18N] section 4.5, all server implementations should eventually be updated to comply with the I18NLEVEL=2 extension.
Subject sorting and threading use the "base subject", which has specific subject artifacts removed. Due to the complexity of these artifacts, the formal syntax for the subject extraction rules is ambiguous. The following procedure is followed to determine the "base subject", using the [ABNF] formal syntax rules described in section 5:
There is also a UID SORT command that returns unique identifiers instead of message sequence numbers. Note that there are separate searching criteria for message sequence numbers and UIDs; thus, the arguments to UID SORT are interpreted the same as in SORT. This is analogous to the behavior of UID SEARCH, as opposed to UID COPY, UID FETCH, or UID STORE.
REVERSE Followed by another sort criterion, has the effect of that criterion but in reverse (descending) order. Note: REVERSE only reverses a single criterion, and does not affect the implicit "sequence number" sort criterion if all other criteria are identical. Consequently, a sort of REVERSE SUBJECT is not the same as a reverse ordering of a SUBJECT sort. This can be avoided by use of additional criteria, e.g., SUBJECT DATE vs. REVERSE SUBJECT REVERSE DATE. In general, however, it's better (and faster, if the client has a "reverse current ordering" command) to reverse the results in the client instead of issuing a new SORT.
Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
S: * SORT 2 84 882
S: A282 OK SORT completed
C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
S: * SORT 5 3 4 1 2
S: A283 OK SORT completed
C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
S: * SORT
S: A284 OK SORT completed
Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
S: * SORT 2 84 882
S: A282 OK SORT completed
C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
S: * SORT 5 3 4 1 2
S: A283 OK SORT completed
C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
S: * SORT
S: A284 OK SORT completed
There is also a UID THREAD command that returns unique identifiers instead of message sequence numbers. Note that there are separate searching criteria for message sequence numbers and UIDs; thus the arguments to UID THREAD are interpreted the same as in THREAD. This is analogous to the behavior of UID SEARCH, as opposed to UID COPY, UID FETCH, or UID STORE.
The THREAD command first searches the mailbox for messages that match the given searching criteria using the charset argument for the interpretation of strings in the searching criteria. It then returns the matching messages in an untagged THREAD response, threaded according to the specified threading algorithm.
The ORDEREDSUBJECT threading algorithm is also referred to as "poor man's threading". The searched messages are sorted by base subject and then by the sent date. The messages are then split into separate threads, with each thread containing messages with the same base subject text. Finally, the threads are sorted by the sent date of the first message in the thread.
The top level or "root" in ORDEREDSUBJECT threading contains the first message of every thread. All messages in the root are siblings of each other. The second message of a thread is the child of the first message, and subsequent messages of the thread are siblings of the second message and hence children of the message at the root. Hence, there are no grandchildren in ORDEREDSUBJECT threading.
The REFERENCES threading algorithm threads the searched messages by grouping them together in parent/child relationships based on which messages are replies to others. The parent/child relationships are built using two methods: reconstructing a message's ancestry using the references contained within it; and checking the original (not base) subject of a message to see if it is a reply to (or forward of) another message.
Note: "Message ID" in the following description refers to a normalized form of the msg-id in [RFC2822]. The actual text in RFC 2822 may use quoting, resulting in multiple ways of expressing the same Message ID. Implementations of the REFERENCES threading algorithm MUST normalize any msg-id in order to avoid false non-matches due to differences in quoting.
Note: Although [RFC2822] permits multiple Message IDs in the In-Reply-To header, in actual practice this discipline has not been followed. For example, In-Reply-To headers have been observed with message addresses after the Message ID, and there are no good heuristics for software to determine the difference. This is not a problem with the References header, however.
(A) Using the Message IDs in the message's references, link the corresponding messages (those whose Message-ID header line contains the given reference Message ID) together as parent/child. Make the first reference the parent of the second (and the second a child of the first), the second the parent of the third (and the third a child of the second), etc. The following rules govern the creation of these links:
If a message already has a parent, don't change the existing link. This is done because the References header line may have been truncated by a Mail User Agent (MUA). As a result, there is no guarantee that the messages corresponding to adjacent Message IDs in the References header line are parent and child.
(B) Create a parent/child link between the last reference (or NIL if there are no references) and the current message. If the current message already has a parent, it is probably the result of a truncated References header line, so break the current parent/child link before creating the new correct one. As in step 1.A,
(6) Traverse the messages under the root and sort each set of siblings by sent date as described in section 2.2. Traverse the messages in such a way that the "youngest" set of siblings are sorted first, and the "oldest" set of siblings are sorted last (grandchildren are sorted before children, etc). In the case of a dummy message (which can only occur with top-level siblings), use its first child for sorting.
Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)(170)(171)
(173)(174 (175)(176)(178)(181)(180))(179)(177
(183)(182)(188)(184)(185)(186)(187)(189))(190)
(191)(192)(193)(194 195)(196 (197)(198))(199)
(200 202)(201)(203)(204)(205)(206 207)(208)
S: A283 OK THREAD completed
C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
S: * THREAD
S: A284 OK THREAD completed
C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)((170)(179))
(171)(173)((174)(175)(176)(178)(181)(180))
((177)(183)(182)(188 (184)(189))(185 186)(187))
(190)(191)(192)(193)((194)(195 196))(197 198)
(199)(200 202)(201)(203)(204)(205 206 207)(208)
S: A285 OK THREAD completed
Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)(170)(171)
(173)(174 (175)(176)(178)(181)(180))(179)(177
(183)(182)(188)(184)(185)(186)(187)(189))(190)
(191)(192)(193)(194 195)(196 (197)(198))(199)
(200 202)(201)(203)(204)(205)(206 207)(208)
S: A283 OK THREAD completed
C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
S: * THREAD
S: A284 OK THREAD completed
C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)((170)(179))
(171)(173)((174)(175)(176)(178)(181)(180))
((177)(183)(182)(188 (184)(189))(185 186)(187))
(190)(191)(192)(193)((194)(195 196))(197 198)
(199)(200 202)(201)(203)(204)(205 206 207)(208)
S: A285 OK THREAD completed
Thread members consist of zero or more message numbers, delimited by spaces, indicating successive parent and child. This continues until the thread splits into multiple sub-threads, at which point, the thread nests into multiple sub-threads with the first member of each sub-thread being siblings at this level. There is no limit to the nesting of threads.
The first thread consists only of message 2. The second thread consists of the messages 3 (parent) and 6 (child), after which it splits into two sub-threads; the first of which contains messages 4 (child of 6, sibling of 44) and 23 (child of 4), and the second of which contains messages 44 (child of 6, sibling of 4), 7 (child of 44), and 96 (child of 7). Since some later messages are parents of earlier messages, the messages were probably moved from some other mailbox at different times.
subj-middle = *subj-blob (subj-base / subj-fwd)
; last subj-blob is subj-base if subj-base would
; otherwise be empty
subj-middle = *subj-blob (subj-base / subj-fwd)
; last subj-blob is subj-base if subj-base would
; otherwise be empty
The SORT and THREAD extensions do not raise any security considerations that are not present in the base [IMAP] protocol, and these issues are discussed in [IMAP]. Nevertheless, it is important to remember that [IMAP] protocol transactions, including message data, are sent in the clear over the network unless protection from snooping is negotiated, either by the use of STARTTLS, privacy protection in AUTHENTICATE, or some other protection mechanism.
As stated in the introduction, the rules of I18NLEVEL=1 as described in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD extensions MUST collate strings according to the i;unicode-casemap collation described in [UNICASEMAP]. Servers SHOULD also advertise the I18NLEVEL=1 extension. Alternatively, a server MAY implement I18NLEVEL=2 (or higher) and comply with the rules of that level.
Instead, note that [RFC2822] section 3.6.5 recommends that "re:" (from the Latin "res", meaning "in the matter of") be used to identify a reply. Although it is evident that, from the multiple forms of token to identify a forwarded message, there is considerable variation found in the wild, the variations are (still) manageable. Consequently, it is suggested that "re:" and one of the variations of the tokens for a forward supported by the base subject extraction rules be adopted for Internet mail messages, since doing so makes it a simple display-time task to localize the token language for the user.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.