Message format for More Messaging Interop (MIMI)

Internet-Draft	MIMI Msg Format	October 2022
Rosenberg & Jennings	Expires 27 April 2023

Abstract

This document defines a semantic model and format for the inter-provider exchange of chat messages. This format is focused on interoperability, while providing extensibility for additional content downstream. It supports the common messaging features present in chat systems today, including threading, reactions, images, GIFs, videos, and delivery and read receipts.

2. Chat Resource Semantic Model

A chat resource (often called a chat or chat room) represents message-based communication between two or more users. When there are exactly two users, it is referred to as a 1-1 chat. When there are more than two users, it is referred to as a group chat. Each chat resource is identified by a tuple consisting of a version 4 UUID and a DNS name. The UUID uniquely identifies the chat resource and is called the chat ID. The DNS name identifies the provider in which the chat resource lives. We refer to this provider as the owner.
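The identifier tuple described above can be sketched in Python as follows. This is a minimal illustration, not part of the format: the names ChatResourceId and chat_kind are hypothetical, and the sample values are taken from the example message in Section 3.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatResourceId:
    """Hypothetical holder for the (chat ID, owner DNS name) tuple."""
    chat_id: uuid.UUID  # the chat ID
    provider: str       # DNS name of the owning provider

    def __post_init__(self):
        # The text calls for a version 4 UUID, so reject anything else.
        if self.chat_id.version != 4:
            raise ValueError("chat ID must be a version 4 UUID")
        if not self.provider:
            raise ValueError("provider DNS name must not be empty")


def chat_kind(member_count: int) -> str:
    """Classify a chat by member count, using the terms defined in the text."""
    if member_count < 2:
        raise ValueError("a chat involves two or more users")
    return "1-1 chat" if member_count == 2 else "group chat"


example = ChatResourceId(
    uuid.UUID("72c659b7-d1f7-46ab-ae73-2339e3839036"), "whatsapp.com"
)
```

Making the class frozen keeps the identifier hashable, so it can serve as a dictionary key when a provider tracks several chats.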

In some chat systems, there can only be a single instance of a 1-1 chat between a given pair of users. MIMI is agnostic to this choice and reflects whatever policy the owner has in place.

A chat has a set of properties. In this version, only a single property is defined - the display name.

The chat also maintains the current list of members. Each member is represented by their identity, which can be mapped to the keying material used to decrypt messages using MLS [I-D.ietf-mls-protocol].

The chat, of course, has a sequence of messages. Each message has a type, and the set of valid types is extensible. Messages are immutable once posted. If a message is edited or deleted, this is handled by sending a new message which is an edit or deletion of the prior message. The types defined here are: content (in which the user sends text, image, video or audio), edit (in which the user is modifying a prior message), delete (in which the user is deleting a prior message), reaction (in which the user is reacting to a prior message), and create thread (in which the user is creating a thread about a prior message). The content message type includes the format of the content as a MIME type (e.g., text/plain).

All messages include a reference to a prior message. For reactions, edits, deletes and threads, this reference is to the specific message being reacted to, edited, deleted, or used as the start of a thread. For a content message, the reference indicates the most recent message in the chat known to the user when they posted the message. This facilitates message sequencing operations.
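As a rough sketch of this model, the message types and the two meanings of the reference could be expressed in Python as below. The class and function names are hypothetical. The string "create thread" is an assumed spelling, since this document does not define the wire value for that type, and allowing the reference to be absent for the very first message in a chat is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MessageType(Enum):
    CONTENT = "content"
    EDIT = "edit"
    DELETE = "delete"
    REACTION = "reaction"
    CREATE_THREAD = "create thread"  # assumed spelling, not defined by the draft


@dataclass(frozen=True)  # frozen mirrors "messages are immutable once posted"
class Message:
    id: str
    sender: str
    type: MessageType
    # ID of the prior message; None only for the first message (assumption).
    reference_id: Optional[str]
    format: Optional[str] = None  # MIME type, used by content messages
    body: Optional[str] = None


def reference_meaning(msg: Message) -> str:
    """Describe what the reference points at for a given message type."""
    if msg.type is MessageType.CONTENT:
        return "most recent message known to the sender"
    return "target of this " + msg.type.value


reaction = Message("m2", "+14085551212", MessageType.REACTION, "m1")
```

Because the dataclass is frozen, an attempt to assign to a field of an existing message raises an error, so an edit has to be modeled as a new Message of type EDIT.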

There is also a message type for modifying the chat properties. This message contains the property name and its new value. Since the display name is the only property currently defined, the value would be the text of the chat's display name.

(TBD: need to sort out messages related to group membership changes, and whether they are part of this protocol or just use MLS in some way.)

All messages include the identity of the user that generated the message. These identities must match the identities known to the MLS Authentication Service (AS).

(TBD: how to convey the keyID needed to decrypt, which is needed outside of the payload that is encrypted?)

All messages also include the chat resource (ID and provider DNS name). This makes each MIMI message completely self-contained, and usable without any additional context outside of the message itself.

When a user posts a message to the chat, the message is end-to-end (e2e) encrypted. This means that the server, and its provider, cannot decrypt the content. Thus, MIMI messages are considered opaque to the server. The server stores these messages and notes the timestamp at which each message is received. This timestamp is used to facilitate synchronization of messages between the source of truth and any domains which are holding replicas. The synchronization is performed by having the providers of the participants issue a subscription using I-D.nandakumar-mimi-transport, requesting all messages since a specific timestamp.
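The server-side behavior described above can be sketched in Python as a store that never looks inside the ciphertext and only records the receive time. The class OpaqueMessageStore and its methods are hypothetical. The sketch assumes receive timestamps never go backwards, and it treats "since" as inclusive of the requested timestamp. It does not model the subscription protocol itself.

```python
import bisect
from typing import List, Tuple


class OpaqueMessageStore:
    """Hypothetical store of encrypted messages, kept in arrival order."""

    def __init__(self) -> None:
        self._received_at: List[float] = []
        self._ciphertexts: List[bytes] = []

    def store(self, ciphertext: bytes, received_at: float) -> None:
        # The server cannot decrypt the payload; it only notes when it arrived.
        # Assumption: timestamps are non-decreasing, so the list stays sorted.
        if self._received_at and received_at < self._received_at[-1]:
            raise ValueError("receive timestamps must not go backwards")
        self._received_at.append(received_at)
        self._ciphertexts.append(ciphertext)

    def since(self, timestamp: float) -> List[Tuple[float, bytes]]:
        """Return all (receive time, ciphertext) pairs at or after timestamp."""
        start = bisect.bisect_left(self._received_at, timestamp)
        return list(zip(self._received_at[start:], self._ciphertexts[start:]))


store = OpaqueMessageStore()
store.store(b"ciphertext-a", 1.0)
store.store(b"ciphertext-b", 5.0)
store.store(b"ciphertext-c", 8.0)
replica_batch = store.since(5.0)
```

A replica that last synchronized at time 5.0 would receive the second and third messages here and nothing earlier.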

Different chat systems have different rules about whether or not a new user, added to the chat, has access to historic messages in the chat that were posted prior to joining. This specification leaves that choice to the policy of the owner of the chat, and supports models where history is provided, and where it is not provided. In cases where it is not provided, when a user is added to a chat at time T, they would have access to all content posted from time T onwards. This would work by having their provider subscribe to all messages starting at time T. In cases where history is required, the provider would request messages starting from some time prior to T, probably as the user scrolls backwards through the chat.
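The choice of subscription start time under the two history policies can be illustrated with a small Python function. The function name and its parameters are hypothetical, and timestamps are plain numbers only for the sake of the sketch.

```python
from typing import Optional


def subscription_start(joined_at: float, history_provided: bool,
                       scrollback_to: Optional[float] = None) -> float:
    """Return the timestamp a provider would subscribe from for a new member."""
    if not history_provided:
        # No history: the user sees content posted from time T onwards.
        return joined_at
    if scrollback_to is None:
        # History is allowed but the user has not scrolled back yet.
        return joined_at
    # History is allowed: start from the earlier of the two times.
    return min(scrollback_to, joined_at)
```

For example, with a join time of 100 and a scrollback request for time 50, the result is 100 when history is not provided and 50 when it is.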

Consequently, a key property of the system is that, for any value of T, a provider can subscribe to messages sent since time T and pass them to the end client, which can decrypt them and "execute" them in sequence. That sequence produces a valid rendering of the chat history that is not missing information. For this to be true, reactions, threads, edits and deletes must also include the original content to which they apply. Consider the case where a message is posted at time T-5, and then another user posts a reaction at time T+3. A new user is invited to the group chat at time T. If they subscribe to receive all messages sent since time T, they will get the reaction at time T+3, but not the original content which is being reacted to. Thus, the reaction needs to include the content to which it applies.
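The scenario above can be sketched as a small replay loop in Python. This is only an illustration under stated assumptions: the function render and the message IDs "m1" and "m2" are hypothetical, the field names follow the example in Section 3, and placing "text" at the top level of a content message is an assumption, since the example only shows it inside a reference. Edits, deletes and threads are left out for brevity.

```python
from typing import Dict, List


def render(messages: List[dict]) -> Dict[str, dict]:
    """Replay decrypted messages in order and build a view keyed by message ID."""
    view: Dict[str, dict] = {}
    for msg in messages:
        if msg["type"] == "content":
            view[msg["ID"]] = {"text": msg["text"], "reactions": []}
        elif msg["type"] == "reaction":
            ref = msg["reference"]
            # The embedded reference carries the original content, so the
            # target can be shown even if it was never received directly.
            entry = view.setdefault(
                ref["ID"], {"text": ref["text"], "reactions": []}
            )
            entry["reactions"].append(msg["reaction"]["unicode"])
    return view


# The new user subscribed at time T and only received the reaction sent at T+3.
reaction_at_t_plus_3 = {
    "ID": "m2",
    "type": "reaction",
    "reaction": {"unicode": "U+2764"},
    "reference": {"ID": "m1", "type": "content", "text": "posted at T-5"},
}
view = render([reaction_at_t_plus_3])
```

Even though message "m1" was never delivered to this user, the resulting view contains its text together with the reaction.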

TODO: need to add timestamps, think about whether these are client generated and thus included in the signatures or server side; does MLS say something about this?

3. MIMI Message Syntax

MIMI messages are structured as JSON, which is the current syntax du jour for representing extensible data on the Internet. The older CPIM format [RFC3862], while originally specified as an interoperable format for instant messaging, is dated at this point and is missing many of the fields needed.

The following is an example message in JSON format:

{
  "ID" : "6845db7f-95b4-4f60-9a65-820f222e444a",

  "chat" : {
    "ID" : "72c659b7-d1f7-46ab-ae73-2339e3839036",
    "provider": "whatsapp.com"
  },

  "sender" : "+14085551212",

  "type" : "reaction",

  "reaction" : {
    "unicode" : "U+2764"
  },

  "reference" : {
    "ID" : "959489b0-40ab-4baf-b187-5795b8757c67",

    "sender" : "+17329876543",
    "type" : "content",
    "format" : "text/plain",

    "text" : "Sure, I will join you guys *l8r*",

    "refersTo" : "473db0ec-7950-4c38-8de2-189ea9ac132b"

  }

}

The "ID" field indicates the identity of the message. The "chat" structure includes the chat resource ID and its associated provider. The "sender" here is an E.164 number which refers to the sender of this message. This example message is of type "reaction". For each type, there is always a structure which has information specific to this type. In the case of a reaction, this is a "reaction" structure that has a single field - the Unicode character that represents the reaction. In this case, it is U+2764, which is a heart.
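A receiving client needs to turn the "unicode" value into a displayable character. A minimal Python sketch follows, assuming the value is always written as "U+" followed by hexadecimal digits as in the example; the function name reaction_char is hypothetical.

```python
def reaction_char(code_point: str) -> str:
    """Convert a "U+XXXX" string into the corresponding character."""
    if not code_point.upper().startswith("U+"):
        raise ValueError("expected a value of the form U+XXXX")
    # int(..., 16) parses the hexadecimal digits; chr() maps to the character.
    return chr(int(code_point[2:], 16))


heart = reaction_char("U+2764")
```

Here heart holds the single character HEAVY BLACK HEART, which is the heart mentioned above.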

Most importantly, the message contains a "reference" structure, which is the message to which the reaction applies. The reference always includes the ID, sender, type and content of the referenced message. Here, it is a text message from a different user, "+17329876543". That message, in turn, was typed at a time when message "473db0ec-7950-4c38-8de2-189ea9ac132b" was the most recent one in the UI of that user.
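The fields described above can be checked with a short Python sketch that parses the example message and reports anything missing. The function missing_fields is hypothetical, and the set of checked fields is only what the prose and the example suggest, not a normative schema.

```python
import json

EXAMPLE = """
{
  "ID": "6845db7f-95b4-4f60-9a65-820f222e444a",
  "chat": {"ID": "72c659b7-d1f7-46ab-ae73-2339e3839036",
           "provider": "whatsapp.com"},
  "sender": "+14085551212",
  "type": "reaction",
  "reaction": {"unicode": "U+2764"},
  "reference": {
    "ID": "959489b0-40ab-4baf-b187-5795b8757c67",
    "sender": "+17329876543",
    "type": "content",
    "format": "text/plain",
    "text": "Sure, I will join you guys *l8r*",
    "refersTo": "473db0ec-7950-4c38-8de2-189ea9ac132b"
  }
}
"""


def missing_fields(msg: dict) -> list:
    """List fields the text describes as always present but that are absent."""
    top = ("ID", "chat", "sender", "type", "reference")
    missing = [k for k in top if k not in msg]
    missing += ["chat." + k for k in ("ID", "provider")
                if k not in msg.get("chat", {})]
    missing += ["reference." + k for k in ("ID", "sender", "type")
                if k not in msg.get("reference", {})]
    # A reaction carries its type-specific "reaction" structure.
    if msg.get("type") == "reaction" and "reaction" not in msg:
        missing.append("reaction")
    return missing


message = json.loads(EXAMPLE)
problems = missing_fields(message)
```

For the example message, problems is an empty list. Removing "sender" from the reference would make the function report "reference.sender".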

In this use case, had there been reactions to this message prior to the user joining the group, and history was not provided, the new user would not see all of the reactions - they would only see those reactions sent after they joined the chat. However, the new user would at least see the message to which each such reaction was applied, even though that message itself may have been sent before the user joined the group.

For text content, Markdown is used to enable basic formatting. A limited subset of Markdown will be supported (details TBD).

Threads are not permitted to have subthreads.

Link previews are problematic and require further discussion. There are two options - previews generated at the sender, and previews generated at the receiver. If the preview is generated at the receiver, it is a significant security issue, since it triggers the receiver to fetch a URL that they did not explicitly click on. When generated at the sender, previews potentially reveal private information about the page which would only be shown to the sender, not the receiver (think: sending a link to your bank). My view is that they should be sender generated in MIMI, but fetched without cookies.

4. Normative References

[I-D.ietf-mls-architecture]: Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., Kwon, A., and A. Duric, "The Messaging Layer Security (MLS) Architecture", Work in Progress, Internet-Draft, draft-ietf-mls-architecture-09, 19 August 2022, <https://www.ietf.org/archive/id/draft-ietf-mls-architecture-09.txt>.
[I-D.ietf-mls-protocol]: Barnes, R., Beurdouche, B., Robert, R., Millican, J., Omara, E., and K. Cohn-Gordon, "The Messaging Layer Security (MLS) Protocol", Work in Progress, Internet-Draft, draft-ietf-mls-protocol-16, 11 July 2022, <https://www.ietf.org/archive/id/draft-ietf-mls-protocol-16.txt>.
[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3862]: Klyne, G. and D. Atkins, "Common Presence and Instant Messaging (CPIM): Message Format", RFC 3862, DOI 10.17487/RFC3862, August 2004, <https://www.rfc-editor.org/info/rfc3862>.
[RFC6120]: Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, DOI 10.17487/RFC6120, March 2011, <https://www.rfc-editor.org/info/rfc6120>.
[RFC6914]: Rosenberg, J., "SIMPLE Made Simple: An Overview of the IETF Specifications for Instant Messaging and Presence Using the Session Initiation Protocol (SIP)", RFC 6914, DOI 10.17487/RFC6914, April 2013, <https://www.rfc-editor.org/info/rfc6914>.
