Internet-Draft MIMI Msg Format October 2022
Rosenberg & Jennings Expires 27 April 2023
Workgroup:
Mimi
Internet-Draft:
draft-rosenberg-mimi-msg-format-00
Published:
Intended Status:
Standards Track
Expires:
Authors:
J. Rosenberg
Five9
C. Jennings
Cisco

Message format for More Messaging Interop (MIMI)

Abstract

This document defines a semantic model and format for the inter-provider exchange of chat messages. This format is focused on interoperability, while providing extensibility for additional content downstream. It supports the common messaging features present in chat systems today, including threading, reactions, images, GIFs, videos, and delivery and read receipts.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 27 April 2023.

1. Introduction

The More Instant Messaging Interoperability (MIMI) working group will specify the minimal set of mechanisms required to make modern Internet messaging applications interoperable. Over time, messaging applications have achieved widespread use, their feature sets have broadened, and their adoption of end-to-end encryption (E2EE) has grown, but the lack of interoperability between these services continues to create a suboptimal user experience. The standards produced by the MIMI working group will allow for E2EE messaging applications for both consumer and enterprise to interoperate without undermining the security guarantees that they provide.

For security purposes, end-to-end message encryption using MLS [I-D.ietf-mls-architecture] will be used. MLS provides encryption of opaque blobs of message content, but does not specify the content format itself. This specification is meant to fill that gap, providing the semantics of a messaging system and the syntax of messages exchanged between providers.

2. Chat Resource Semantic Model

A chat resource (often called a chat or chat room) represents message-based communication between two or more users. When there are two users, it is referred to as a 1-1 chat. When there are more than two users, it is referred to as a group chat. Each chat resource is identified by a tuple consisting of a version 4 UUID and a DNS name. The UUID uniquely identifies the chat resource and is called the chat ID. The DNS name identifies the provider in which the chat resource lives. We refer to this provider as the owner.
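
As an illustration only, the identifier tuple can be sketched in Python as follows. The class name ChatResourceId and the helper parse_chat are hypothetical and not defined by this document; the "ID" and "provider" keys are borrowed from the example message in Section 3. Lowercasing the provider is a choice made for this sketch, on the grounds that DNS names compare case-insensitively.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatResourceId:
    """Hypothetical holder for the (chat ID, provider) tuple."""
    chat_id: uuid.UUID   # version 4 UUID naming the chat resource
    provider: str        # DNS name of the owning provider

    def __post_init__(self):
        # Reject identifiers that are not version 4 UUIDs.
        if self.chat_id.version != 4:
            raise ValueError("chat ID must be a version 4 UUID")
        if not self.provider:
            raise ValueError("provider DNS name must not be empty")


def parse_chat(obj: dict) -> ChatResourceId:
    """Build the tuple from a JSON-style mapping with "ID" and "provider"."""
    # Assumption of this sketch: normalize the DNS name to lowercase.
    return ChatResourceId(uuid.UUID(obj["ID"]), obj["provider"].lower())


chat = parse_chat({"ID": "72c659b7-d1f7-46ab-ae73-2339e3839036",
                   "provider": "whatsapp.com"})
```

Because the class is frozen and compares by value, two tuples naming the same chat at the same owner are equal and can be used as dictionary keys.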

In some chat systems, there can only be a single instance of a 1-1 chat between a pair of users. MIMI is agnostic to this choice, and reflects whatever policy the owner has in place.

A chat has a set of properties. In this version, only a single property is defined - the display name.

The chat also maintains the current list of members. Each member is represented by their identity, which can be mapped to the keying material used to decrypt messages using MLS [I-D.ietf-mls-protocol].

The chat also has a sequence of messages. Each message has a type, and the set of valid types is extensible. Messages are immutable once posted. If a message is edited or deleted, this is handled by sending a new message which is an edit or deletion of the prior message. The types defined here are: content (in which the user sends text, image, video or audio), edit (in which the user is modifying a prior message), delete (in which the user is deleting a prior message), reaction (in which the user is reacting to a prior message), and create thread (in which the user is creating a thread about a prior message). The content message type includes the format of the content as a MIME type (e.g., text/plain). All messages include a reference to a prior message. For reactions, edits, deletes and threads, this reference is to the specific message being reacted to, edited, deleted or used as the start of a thread. For a content message, the reference indicates the most recent message in the chat known to the user when they posted the message. This facilitates message sequencing operations.
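
The following Python sketch illustrates how a client might fold an immutable message log into the state it displays. It is not a normative algorithm. The Message and Rendered classes, the render function, and the short type name "thread" (standing in for create thread) are assumptions made for this illustration, and the reference is simplified to a bare message ID.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)          # frozen: messages are immutable once posted
class Message:
    id: str
    type: str                    # "content", "edit", "delete", "reaction", "thread"
    sender: str
    reference: Optional[str]     # ID of the prior message this one refers to
    body: str = ""               # text for content/edit, emoji for reaction


@dataclass
class Rendered:
    text: str
    deleted: bool = False
    reactions: list = field(default_factory=list)
    has_thread: bool = False


def render(log):
    """Fold an ordered message log into the state a client would display."""
    view = {}
    for m in log:
        if m.type == "content":
            view[m.id] = Rendered(text=m.body)
            continue
        target = view.get(m.reference)
        if target is None:
            continue             # referenced message unknown to this client
        if m.type == "edit":
            target.text = m.body
        elif m.type == "delete":
            target.deleted = True
        elif m.type == "reaction":
            target.reactions.append(m.body)
        elif m.type == "thread":
            target.has_thread = True
        # Unknown types are skipped, since the type set is extensible.
    return view


log = [
    Message("m1", "content", "alice", None, "Lunch at noon?"),
    Message("m2", "content", "bob", "m1", "Sure"),
    Message("m3", "edit", "alice", "m1", "Lunch at 12:30?"),
    Message("m4", "reaction", "bob", "m1", "\u2764"),
    Message("m5", "delete", "bob", "m2"),
]
view = render(log)
```

The edit and the delete change only the rendered view. The original entries in the log are left untouched, which is the sense in which messages are immutable.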

There is also a message type for modifying the chat properties. This message contains the property name and its value. In this case, it would be text for the display name of the chat.

(TBD: need to sort out messages related to group membership changes and whether they are part of this protocol or just use MLS in some way.)

All messages include the identity of the user that generated the message. These identities must match the identities known to the MLS Authentication Service (AS).

(TBD: how to convey the keyID needed to decrypt, which is needed outside of the payload that is encrypted?)

All messages also include the chat resource (ID and provider DNS name). This makes each MIMI message completely self-contained, and usable without any additional context outside of the message itself.

When a user posts a message to the chat, the message is end-to-end encrypted. This means that the server, and the provider operating it, cannot decrypt the content. Thus, MIMI messages are opaque to the server. The server stores these messages and notes the timestamp at which each message is received. This timestamp is used to facilitate synchronization of messages between the source of truth and any domains which are holding replicas. The synchronization is performed by having the providers of the participants issue subscriptions using I-D.nandakumar-mimi-transport, requesting all messages since a specific timestamp.
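
A minimal Python sketch of the storage side of this behavior is shown below. It does not model the subscription protocol of I-D.nandakumar-mimi-transport. The class OpaqueStore and its methods are hypothetical, and treating "since T" as inclusive of T is an assumption of this sketch.

```python
import bisect


class OpaqueStore:
    """Hypothetical in-memory stand-in for a provider's message store."""

    def __init__(self):
        self._stamps = []    # receive timestamps, kept in ascending order
        self._blobs = []     # ciphertext; the server never inspects it

    def receive(self, ciphertext: bytes, now: float) -> None:
        # Insert so that both lists stay sorted by receive time.
        i = bisect.bisect_right(self._stamps, now)
        self._stamps.insert(i, now)
        self._blobs.insert(i, ciphertext)

    def since(self, t: float):
        """Return (timestamp, ciphertext) pairs received at or after t."""
        i = bisect.bisect_left(self._stamps, t)
        return list(zip(self._stamps[i:], self._blobs[i:]))


store = OpaqueStore()
store.receive(b"c1", 100.0)
store.receive(b"c2", 105.0)
store.receive(b"c3", 108.0)
replica = store.since(105.0)   # what a subscribing provider would be sent
```

The store orders and selects messages purely by receive timestamp, so it never needs to decrypt or parse the payload.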

Different chat systems have different rules about whether or not a new user, added to the chat, has access to historic messages in the chat that were posted prior to joining. This specification leaves that choice to the policy of the owner of the chat, and supports models where history is provided, and where it is not provided. In cases where it is not provided, when a user is added to a chat at time T, they would have access to all content posted from time T onwards. This would work by having their provider subscribe to all messages starting at time T. In cases where history is required, the provider would request messages starting from some time prior to T, probably as the user scrolls backwards through the chat.

Consequently, a key property of the system is that, for any value of T, a provider can subscribe to messages sent since time T and pass them to the end client, which can decrypt them and "execute" them in sequence. That sequence produces a valid rendering of the chat history that is not missing information. For this to be true, reactions, threads, edits and deletes must also include the original content to which they apply. Consider the case where a message is posted at time T-5, and then another user posts a reaction at time T+3. A new user is invited to the group chat at time T. If they subscribe to receive all messages sent since time T, they will get the reaction at time T+3, but not the original content which is being reacted to. Thus, the reaction needs to include the content to which it applies.
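
The T-5 / T+3 scenario above can be sketched in Python as follows. The function render_since and the dictionary layout of the view are hypothetical. The message fields follow the example in Section 3, with shortened IDs and sender names invented for readability.

```python
def render_since(log, t):
    """Replay messages received at or after t, seeding missing originals."""
    view = {}
    for stamp, msg in log:
        if stamp < t:
            continue                       # not delivered to this subscriber
        if msg["type"] == "content":
            view[msg["ID"]] = {"text": msg["text"], "reactions": []}
        elif msg["type"] == "reaction":
            ref = msg["reference"]
            # The embedded copy stands in for content posted before t.
            entry = view.setdefault(
                ref["ID"], {"text": ref["text"], "reactions": []})
            entry["reactions"].append(msg["reaction"]["unicode"])
    return view


T = 1000
original = {"ID": "a", "type": "content", "sender": "u1", "text": "hello"}
reaction = {"ID": "b", "type": "reaction", "sender": "u2",
            "reaction": {"unicode": "U+2764"},
            "reference": {"ID": "a", "sender": "u1",
                          "type": "content", "text": "hello"}}
log = [(T - 5, original), (T + 3, reaction)]

late_joiner = render_since(log, T)        # joined at T, no history
full_history = render_since(log, T - 5)   # history provided
```

In this sketch both subscribers end up with the same rendering of message "a", because the reaction carries a copy of the content it refers to.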

TODO: need to add timestamps, think about whether these are client generated and thus included in the signatures or server side; does MLS say something about this?

3. MIMI Message Syntax

MIMI messages are structured as JSON, which is the current syntax du jour for representing extensible data on the Internet. The old CPIM format [RFC3862], while originally specified as an interoperable format for instant messaging, is dated at this point and is missing many of the fields needed.

The following is an example message in JSON format:

{
  "ID" : "6845db7f-95b4-4f60-9a65-820f222e444a",

  "chat" : {
    "ID" : "72c659b7-d1f7-46ab-ae73-2339e3839036",
    "provider": "whatsapp.com"
  },

  "sender" : "+14085551212",

  "type" : "reaction",

  "reaction" : {
    "unicode" : "U+2764"
  },

  "reference" : {
    "ID" : "959489b0-40ab-4baf-b187-5795b8757c67",

    "sender" : "+17329876543",
    "type" : "content",
    "format" : "text/plain",

    "text" : "Sure, I will join you guys *l8r*",

    "refersTo" : "473db0ec-7950-4c38-8de2-189ea9ac132b"

  }

}

The "ID" field indicates the identity of the message. The "chat" structure includes the chat resource ID and its associated provider. The "sender" here is an E.164 number which refers to the sender of this message. This example message is of type "reaction". For each type, there is always a structure which has information specific to this type. In the case of a reaction, this is a "reaction" structure that has a single field, the Unicode character that represents the reaction. In this case, it is U+2764, which is a heart.

Most importantly, the message contains a reference structure, which is the message to which the reaction applies. The reference always includes the ID, sender, type and content of the referenced message. Here, it is a text message from a different user, "+17329876543". That message, in turn, was typed at a time when message "473db0ec-7950-4c38-8de2-189ea9ac132b" was the most recent one in the UI of this user.
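
As a non-normative illustration, a receiver could parse and check a reaction message along the following lines in Python. The functions check_reaction and reaction_char are hypothetical, and the set of fields treated as required is an assumption drawn from the description above, not a schema defined by this document. The sample text reuses the values of the example message.

```python
import json

RAW = """
{
  "ID": "6845db7f-95b4-4f60-9a65-820f222e444a",
  "chat": {"ID": "72c659b7-d1f7-46ab-ae73-2339e3839036",
           "provider": "whatsapp.com"},
  "sender": "+14085551212",
  "type": "reaction",
  "reaction": {"unicode": "U+2764"},
  "reference": {
    "ID": "959489b0-40ab-4baf-b187-5795b8757c67",
    "sender": "+17329876543",
    "type": "content",
    "format": "text/plain",
    "text": "Sure, I will join you guys *l8r*",
    "refersTo": "473db0ec-7950-4c38-8de2-189ea9ac132b"
  }
}
"""


def check_reaction(raw):
    """Parse a reaction message and verify the fields the text describes."""
    msg = json.loads(raw)
    for key in ("ID", "chat", "sender", "type", "reaction", "reference"):
        if key not in msg:
            raise ValueError("missing field: " + key)
    if msg["type"] != "reaction":
        raise ValueError("not a reaction message")
    for key in ("ID", "sender", "type"):
        if key not in msg["reference"]:
            raise ValueError("reference is missing: " + key)
    return msg


def reaction_char(msg):
    """Turn the "U+XXXX" notation into the character it names."""
    code = msg["reaction"]["unicode"]
    if not code.startswith("U+"):
        raise ValueError("expected U+XXXX notation")
    return chr(int(code[2:], 16))


msg = check_reaction(RAW)
heart = reaction_char(msg)
```

Here heart holds the single character U+2764, which a client would draw next to the referenced text message.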

In this use case, had there been reactions to this message prior to the user joining the group, and history was not provided, the new user would not see all of the reactions; they would see only those reactions sent after they joined the chat. However, the new user would at least see the message to which the reaction was applied, even though that message itself may have been sent before the user joined the group.

For text content, markdown is used to enable basic formatting. A limited subset of markdown will be supported (details TBD).

Threads are not permitted to have subthreads.

Link previews are problematic and require further discussion. There are two options: previews generated at the sender, and previews generated at the receiver. If the preview is generated at the receiver, it is a significant security issue, since it triggers the receiver to fetch a URL that they did not explicitly click on. When generated at the sender, previews potentially reveal private information about the page which would only be shown to the sender, not the receiver (think: sending a link to your bank). My view is that they should be sender generated in MIMI, but without cookies.

4. Normative References

[I-D.ietf-mls-architecture]
Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., Kwon, A., and A. Duric, "The Messaging Layer Security (MLS) Architecture", Work in Progress, Internet-Draft, draft-ietf-mls-architecture-09, <https://www.ietf.org/archive/id/draft-ietf-mls-architecture-09.txt>.
[I-D.ietf-mls-protocol]
Barnes, R., Beurdouche, B., Robert, R., Millican, J., Omara, E., and K. Cohn-Gordon, "The Messaging Layer Security (MLS) Protocol", Work in Progress, Internet-Draft, draft-ietf-mls-protocol-16, <https://www.ietf.org/archive/id/draft-ietf-mls-protocol-16.txt>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC3862]
Klyne, G. and D. Atkins, "Common Presence and Instant Messaging (CPIM): Message Format", RFC 3862, DOI 10.17487/RFC3862, <https://www.rfc-editor.org/info/rfc3862>.
[RFC6120]
Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, DOI 10.17487/RFC6120, <https://www.rfc-editor.org/info/rfc6120>.
[RFC6914]
Rosenberg, J., "SIMPLE Made Simple: An Overview of the IETF Specifications for Instant Messaging and Presence Using the Session Initiation Protocol (SIP)", RFC 6914, DOI 10.17487/RFC6914, <https://www.rfc-editor.org/info/rfc6914>.

Authors' Addresses

Jonathan Rosenberg
Five9
Cullen Jennings
Cisco