Add hash of attachments to the pre-message to not download cached ones

Goals

When the user receives a meme, sticker, group image, profile image or webxdc that they already possess, whether from earlier in the same chat, from a different chat or from another of their profiles, their client should reuse the existing file instead of downloading it again.

Indicating a hash in the pre-message would conserve bandwidth on the chatmail server and on all devices of the recipient, because we would not need to download those post-messages whose content we already possess.

Present status

Deduplication within a single profile is already implemented: downloaded attachments are stored only once because each file is named after the hash of its content.
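
To illustrate what "named after the hash of its content" gives us, here is a minimal sketch of content-addressed storage. The function name `store_blob` and the choice of SHA-256 are assumptions made for this example, not the actual Delta Chat implementation.

```python
import hashlib
from pathlib import Path


def store_blob(blob_dir: Path, content: bytes) -> Path:
    """Store content under a file name derived from its hash.

    Assumption for this sketch: SHA-256 hex digest as the file name.
    """
    digest = hashlib.sha256(content).hexdigest()
    path = blob_dir / digest
    # Identical content maps to the same name, so it is written only once.
    if not path.exists():
        path.write_bytes(content)
    return path
```

Storing the same sticker twice returns the same path and leaves a single file on disk, which is the deduplication described above.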

The pre-message only includes the name and size of the file (and view type) as of now.

Tracking privacy

A threatened user may be given the option to disable this feature altogether. We may use an HMAC keyed by a nonce instead of a simple hash, so that a party cooperating with our chatmail server cannot enumerate well-known hashes of unencrypted content to track our local cache. If the sending party already possesses the whole unencrypted content, they could still probe our cache, but this would also prove that the probing party is itself in possession of the content.
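
A rough sketch of the nonce-keyed HMAC idea follows. The names `make_tag`, `announce` and `find_cached`, the dictionary layout and the use of HMAC-SHA256 with a 32-byte nonce are all assumptions for illustration; the real pre-message format is not defined here.

```python
import hashlib
import hmac
import os
from typing import List, Optional


def make_tag(content: bytes, nonce: bytes) -> bytes:
    """HMAC-SHA256 of the content, keyed by a per-message nonce."""
    return hmac.new(nonce, content, hashlib.sha256).digest()


def announce(content: bytes) -> dict:
    """Sender side: a fresh nonce per message, so the tag is never reused."""
    nonce = os.urandom(32)
    return {"nonce": nonce, "tag": make_tag(content, nonce)}


def find_cached(pre_message: dict, cache: List[bytes]) -> Optional[bytes]:
    """Recipient side: recompute the tag over each cached blob."""
    for blob in cache:
        candidate = make_tag(blob, pre_message["nonce"])
        if hmac.compare_digest(candidate, pre_message["tag"]):
            return blob
    return None
```

Because the tag depends on the nonce, it cannot be looked up in a precomputed table of well-known hashes. The trade-off in this sketch is that the recipient has to recompute the tag over its cached files instead of doing a single index lookup.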

The threat is of lesser concern to most users for two reasons. First, downloading each attachment is already optional and can be gated by a manual decision depending on the source user, chat, type, name and file size. Second, an attempt to probe a large number of files becomes obvious, and the sender would be promptly blocked. Attachments could similarly be forced upon someone whose public key is known, by sending them the file from a different account before probing, which again shows that a cache hit should not be used as incriminating evidence. In general, a user receiving an attachment that they deem dangerous to possess should delete it as soon as possible, either manually or by setting an automatic cleanup timer.

It should be treated as a hash mismatch if the attachment size indicated in the pre-message differs from the size of the attachment in the post-message (or from that of the cached file corresponding to the given hash). This could make probing more expensive and more noticeable, as full-size content would need to be attached for each attempt. Such malicious attachments should be fetched (otherwise we disclose that we know the size corresponding to the hash), but never rendered.
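
The size rule above could be sketched as a small decision function. `Action` and `plan` are hypothetical names, and `cached_size` stands for the size of the cached file matching the announced hash, or `None` if there is no such file.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    USE_CACHE = "use the cached file, skip the download"
    DOWNLOAD = "download and render as usual"
    DOWNLOAD_NO_RENDER = "download, but never render"


def plan(announced_size: int, cached_size: Optional[int]) -> Action:
    """Decide what to do with a pre-message that announces a hash and a size."""
    if cached_size is None:
        # Nothing in the cache matches the announced hash.
        return Action.DOWNLOAD
    if cached_size != announced_size:
        # Treated like a hash mismatch: still fetch, so that skipping the
        # download does not reveal that we know the real size.
        return Action.DOWNLOAD_NO_RENDER
    return Action.USE_CACHE
```

For example, `plan(1000, 1000)` reuses the cache, while `plan(1, 1000)` fetches the attachment without rendering it.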

Cleanup

In the case of threatened users and potentially dangerous attachments, deleted attachments must also be purged from the cache immediately. In other cases, it can be beneficial to maintain at least the list of hashes (or the content itself if using an HMAC keyed by a nonce) of all such attachments for a determined period of time (e.g., 1 year). If someone sends such an attachment again, we could either render it from the cache (if the content was kept), or indicate to the user that they have already viewed and deleted this content, so they might not want to download and render it again if they would like to conserve bandwidth.
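
As an illustration of keeping only the list of hashes for a while, here is a minimal in-memory sketch. The class name `Tombstones` and its methods are made up for this example; a real client would persist this in its database.

```python
import time
from typing import Dict, Optional

RETENTION_SECONDS = 365 * 24 * 3600  # the "e.g., 1 year" from the text


class Tombstones:
    """Remembers hashes of deleted attachments for a limited time."""

    def __init__(self) -> None:
        self._deleted_at: Dict[str, float] = {}

    def record_deletion(self, digest: str, now: Optional[float] = None) -> None:
        self._deleted_at[digest] = time.time() if now is None else now

    def was_deleted(self, digest: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        deleted_at = self._deleted_at.get(digest)
        if deleted_at is None:
            return False
        if now - deleted_at > RETENTION_SECONDS:
            # The retention period is over: forget the hash entirely.
            del self._deleted_at[digest]
            return False
        return True
```

When a pre-message announces a hash for which `was_deleted` returns `True`, the client could tell the user that they already viewed and deleted this content before they decide to download it.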

Block lists

If we used a static hash (instead of an HMAC), the user would have the opportunity to subscribe to block lists that enumerate content which they would not like to download automatically. Subscribing to many block lists would mitigate some of the concerns about tracking, because an observer would be less certain whether the user failed to download the given post-message attachment due to a manual choice, due to the state of their cache or due to an entry in a block list they subscribe to.
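
The following sketch shows why block lists need a static hash. The helper names and the use of SHA-256 hex digests as block list entries are assumptions for illustration only.

```python
import hashlib
import hmac
import os
from typing import Iterable, Set


def static_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def keyed_tag(content: bytes, nonce: bytes) -> str:
    return hmac.new(nonce, content, hashlib.sha256).hexdigest()


def should_auto_download(announced: str, block_lists: Iterable[Set[str]]) -> bool:
    """Skip the automatic download if any subscribed block list has the value."""
    return not any(announced in block_list for block_list in block_lists)
```

A block list containing `static_hash(b"spam")` matches every message that announces the same static hash. A value from `keyed_tag(b"spam", os.urandom(32))` is different for every message, so a published list cannot match it.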

Bait & switch security

Remember to ignore new post-message attachments whose hash does not match the one claimed in the pre-message. Validating the content against the announced hash is required to mitigate future vulnerabilities related to sending different messages to different members of a group.

An attacker can prime one or more targets by sending them a benign attachment with a given hash in a different chat or even in private. Then they can send a different, abusive attachment in the shared group, claiming that it corresponds to the hash of the earlier benign attachment. The client of each target might then render the benign attachment from its cache, while the other members would be shown the malicious attachment instead.
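
The check that defeats this could look roughly like the sketch below. `verify_post_message` is a hypothetical name, and a static SHA-256 hex digest is assumed as the announced hash.

```python
import hashlib
import hmac


def verify_post_message(announced_hash: str, announced_size: int, content: bytes) -> bool:
    """Accept the downloaded attachment only if it matches the pre-message."""
    if len(content) != announced_size:
        return False
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(actual, announced_hash)
```

In the scenario above, the attacker announces the hash of the benign file but attaches the abusive one. Members who download it get `False` from this check and ignore the attachment, so nobody is shown content that differs from what the primed targets see.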

I think Delta Chat already does some deduplication when sharing media between chats, but a common hash index would be useful to deduplicate any attachments you receive.

I agree it’s a good idea to deduplicate attachments when you receive them but I’m not convinced it’s necessary to include the hash in the pre-message.

If you are going to directly calculate the hash of the attachment anyway, I don’t see the benefit of including the hash in the pre-message for the purpose of deduplication because you can just use the hash you directly calculate for that. And if the message is already signed, I don’t see why an additional check for message integrity is needed, but maybe I’m missing something.

One must consider the privacy implications of this. Not downloading the attachment basically tells the server “I have previously received a file that is attached to this message”, which resembles Delta Chat turning into a CSAM scanner.

@WofWca I have elaborated in the top post in the Tracking privacy and Block lists sections.

@deltahat This has a different purpose. Sorry for the unfortunate wording, I’ve replaced deduplication with caching & download at key places. Let me elaborate in the top post in the Goals and Bait & switch security sections.

It is much clearer now with the new wording.

By the way, big fan of the name & avatar.
