FIP: Direct Casts #99

CassOnMars (Maintainer) · 2023-06-20T16:38:59Z

Abstract

Direct Casts have been field tested and battle-hardened, and are in a good position to be moved to hubs. Retaining the same (if not improved) privacy and secrecy guarantees requires many significant considerations, which this draft discusses.

Problem

Direct Casts have been a feature of Warpcast (the client and backend) for some time. However, supporting an implementation on hubs and supporting group messaging would each require some rework, and the designs overlap heavily, which makes this a convenient point to address both. See Rationale for deeper insight into the problems being addressed.

Specification

To achieve per-device-per-application messaging but ensure users can choose to revoke access to a given device or application for subsequent messages, this proposal adopts the use of the asynchronous variant of the Triple-Ratchet Protocol. To ensure this specification is self-contained, we describe it fully within Appendix A.

To support Direct Casts, we will need to add two new stores to support their corresponding message types: Keys and DirectCasts.

Key Store

The key store would introduce the new type KeyBody to the MessageData:

message MessageData {
  ....
  oneof body {
    ...
    KeyBody key_body = 15;
    ....
  }
}

and add the new MessageTypes:

enum MessageType {
  ...
  MESSAGE_TYPE_KEY_ADD = 13;                 
  MESSAGE_TYPE_KEY_REMOVE = 14;   
  ...
}

This would need the addition of a CRDT 2P-Set for Keys:

Add a Key CRDT which validates and accepts KeyAdd and KeyRemove messages. The CRDT also ensures that the message m passes these validations:

m.signer must be a valid Signer in the add-set of the Signer CRDT for m.data.fid

A conflict occurs if two messages have the same values for m.data.fid, m.data.body.device_id and m.data.body.type. Conflicts are resolved with the following rules:

If m.data.timestamp is distinct, discard the message with the lower timestamp.
If m.data.timestamp is identical and m.data.type is distinct, discard the KeyAdd message.
If m.data.timestamp and m.data.type are identical, discard the message with the lowest lexicographical order.
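
The three rules above can be sketched as a small merge function. This is a minimal illustration rather than hub code: the `KeyMessage` class and its field names are hypothetical, and it assumes the lexicographical tie-break compares message hashes, which the rules above do not state explicitly.

```python
from dataclasses import dataclass

KEY_ADD = "MESSAGE_TYPE_KEY_ADD"
KEY_REMOVE = "MESSAGE_TYPE_KEY_REMOVE"


@dataclass(frozen=True)
class KeyMessage:
    """Hypothetical flattened view of a Key message."""
    fid: int          # m.data.fid
    device_id: int    # m.data.body.device_id
    key_type: str     # m.data.body.type, e.g. "idk"
    msg_type: str     # m.data.type: KEY_ADD or KEY_REMOVE
    timestamp: int    # m.data.timestamp
    msg_hash: bytes   # assumption: the tie-break compares message hashes


def conflict_key(m: KeyMessage):
    """Two messages conflict when this tuple is equal."""
    return (m.fid, m.device_id, m.key_type)


def resolve(a: KeyMessage, b: KeyMessage) -> KeyMessage:
    """Return the message that survives a conflict between a and b."""
    if conflict_key(a) != conflict_key(b):
        raise ValueError("messages do not conflict")
    # Rule 1: distinct timestamps, discard the lower one.
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Rule 2: same timestamp, distinct types, discard the KeyAdd.
    if a.msg_type != b.msg_type:
        return a if a.msg_type == KEY_REMOVE else b
    # Rule 3: same timestamp and type, discard the lower lexicographical order.
    return a if a.msg_hash >= b.msg_hash else b
```

Because the result depends only on the two messages and not on arrival order, two hubs that see the same pair in different orders keep the same message.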

The Key CRDT has a per-user size limit of 200.

Cutoff

Keys explicitly support cutoffs: the message version is bound to a cutoff date. One year after the release of Keys, these messages will automatically cease to be supported and will be purged, unless the cutoff is extended in a new FIP.

KeyBody Message

message KeyBody {
  string type = 1; // 4 bytes max
  uint32 device_id = 2; // a unique identifier to a user's device, can be monotonically incrementing
  PublicKey public_key = 3; // a unique public key per type, per device, per signer
  Signature signature = 4; // a signature over the raw bytes of the public key, optional for some types
}

PublicKey Message

message PublicKey {
  PublicKeyType type = 1;
  bytes public_key_bytes = 2; // the compressed (typically affine) representation of a public key
  bytes public_key_proof = 3; // a proof of knowledge of the private key
}

enum PublicKeyType {
  PUBLIC_KEY_TYPE_UNSPECIFIED = 0;
  PUBLIC_KEY_TYPE_XED25519 = 1; // follows Signal's XEd25519 specification
  PUBLIC_KEY_TYPE_X25519_RING = 2; // follows Spontaneous Anonymous Group signatures
}

Signature Message

message Signature {
  SignatureType type = 1;
  bytes signature = 2; // the signature, in DER bytes (if applicable)
  bytes signer = 3; // the compressed (typically affine) representation of the signing public key
}

enum SignatureType {
  SIGNATURE_TYPE_UNSPECIFIED = 0;
  SIGNATURE_TYPE_K256 = 1; // secp256k1
  SIGNATURE_TYPE_XED25519 = 2; // follows Signal's XEd25519 specification
  SIGNATURE_TYPE_X25519_RING = 3; // follows Spontaneous Anonymous Group signatures
}

The structure of Key messages implies a hierarchy: the signature on a key is expected to come from a longer-lived key. An identity key is signed by the user’s signer or wallet key, and a signed pre-key is signed by the device’s identity key. For one-time pre-keys, the key’s signature is omitted, as the message itself is signed by the user’s signer for authenticity.

Validations

A Key message m must pass the following validations:

m.signature_scheme must be SIGNATURE_SCHEME_ED25519
m.data.body must be KeyBody
m.data.body.type must be ≤ 4 bytes
m.data.body.public_key must be a valid PublicKey
m.data.body.signature, if present, must be a valid Signature

A PublicKey p must pass the following validations:

p.type must be a valid PublicKeyType and not set to PUBLIC_KEY_TYPE_UNSPECIFIED
p.public_key_bytes must be in a compressed representation if available for the key type
p.public_key_proof must be a succinct proof of knowledge of the corresponding private key (for EC keys, see ZKPoK-DL in Appendix A)

A Signature s must pass the following validations:

s.type must be a valid SignatureType and not set to SIGNATURE_TYPE_UNSPECIFIED
s.signature must be a valid DER-encoded signature, where DER encoding applies to the signature type
s.signer must be in a compressed representation if available for the key type

A KeyAdd message m is valid only if it passes these validations:

m.data.type must be MESSAGE_TYPE_KEY_ADD

A KeyRemove message m is valid only if it passes these validations:

m.data.type must be MESSAGE_TYPE_KEY_REMOVE
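
As a rough illustration of the structural part of these validations, the sketch below checks a KeyBody before any cryptographic verification takes place. The Python classes are hypothetical mirrors of the protobuf messages above, and the proof and signature checks are reduced to presence and type checks; verifying the proof of knowledge and the signature themselves is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Enum values copied from the PublicKeyType and SignatureType definitions above.
VALID_PUBLIC_KEY_TYPES = {1, 2}    # XED25519, X25519_RING (0 is UNSPECIFIED)
VALID_SIGNATURE_TYPES = {1, 2, 3}  # K256, XED25519, X25519_RING (0 is UNSPECIFIED)
MAX_KEY_TYPE_BYTES = 4


@dataclass
class PublicKey:
    type: int
    public_key_bytes: bytes
    public_key_proof: bytes


@dataclass
class Signature:
    type: int
    signature: bytes
    signer: bytes


@dataclass
class KeyBody:
    type: str
    device_id: int
    public_key: PublicKey
    signature: Optional[Signature] = None  # omitted for some key types


def structural_errors(body: KeyBody) -> List[str]:
    """Collect structural validation failures; an empty list means none found."""
    errors = []
    if len(body.type.encode("utf-8")) > MAX_KEY_TYPE_BYTES:
        errors.append("type exceeds 4 bytes")
    if body.public_key.type not in VALID_PUBLIC_KEY_TYPES:
        errors.append("public key type is unspecified or unknown")
    if not body.public_key.public_key_proof:
        # Stand-in only: a real hub would verify the proof, not just its presence.
        errors.append("missing proof of knowledge")
    if body.signature is not None and body.signature.type not in VALID_SIGNATURE_TYPES:
        errors.append("signature type is unspecified or unknown")
    return errors
```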

Key APIs

Hubs will be extended to support the following APIs:

// Returns a single Key message of a specific type for a given fid and device
rpc GetKey(KeyRequest) returns (Message);

// Returns all Keys for a given fid, with optional parameters to filter and sort them
rpc GetKeysByFid(KeysByFidRequest) returns (MessagesResponse);

// Returns all Keys for a given fid's device, with optional parameters to filter and sort them
rpc GetKeysByDevice(KeysByDeviceRequest) returns (MessagesResponse);

// A convenience alias on GetKeysByFid
rpc GetAllKeyMessagesByFid(FidRequest) returns (MessagesResponse);

message KeyRequest {
  uint64 fid = 1;
  uint32 device_id = 2;
  string key_type = 3;
}

message KeysByFidRequest {
  uint64 fid = 1;
  optional string key_type = 2;
  optional uint32 page_size = 3;
  optional bytes page_token = 4;
  optional bool reverse = 5;
}

message KeysByDeviceRequest {
  uint64 fid = 1;
  optional uint32 device_id = 2;
  optional string key_type = 3;
  optional uint32 page_size = 4;
  optional bytes page_token = 5;
  optional bool reverse = 6;
}

Key types expected for use in Direct Casts are:

"idk" - Identity Key
"spk" - Signed Pre-Key
"ibk" - Inbox Key

DirectCast Store

Direct Casts are a bit unusual compared to other types – they can be treated as message types, but they cannot be validated like other message types. We add the DirectCastRequestBody and DirectCastBody types to the MessageData body:

message MessageData {
  ....
  oneof body {
    ...
    DirectCastRequestBody direct_cast_request_body = 16;
    DirectCastBody direct_cast_body = 17;
    ....
  }
}

and add the new MessageTypes:

enum MessageType {
  ...
  MESSAGE_TYPE_DIRECT_CAST_REQUEST_ADD = 15;
  MESSAGE_TYPE_DIRECT_CAST_ADD = 16;
  ...
}

This would need the addition of CRDT Grow-only Sets for Direct Cast Requests and Direct Casts:

Add a Direct Cast Request CRDT which validates and accepts DirectCastRequestAdd and DirectCastRequestRemove messages. The CRDT also ensures that the message m passes these validations:

m.signer must be empty, deeper data validation will ensure integrity.

These requests are conflict free implicitly – a conflicting message cannot be produced.

DirectCastRequest Message

message DirectCastRequestBody {
  bytes inbox_id = 1;
  PublicKey ephemeral_key = 2;
  Ciphertext envelope = 3;
  Signature signature = 4; // a signature over the raw ciphertext bytes
}

message Ciphertext {
  bytes iv = 1;
  bytes ciphertext = 2;
  bytes associated_data = 3;
}

DirectCast Message

message DirectCastBody {
  string conversation_id = 1;
  Ciphertext header = 2;
  Ciphertext message = 3;
  Signature signature = 4; // a signature over the raw ciphertext bytes
}

DirectCast message types are fully enveloped, using ring signatures to ensure the message is a viable candidate for delivery. Hubs cannot fully validate these messages, but they can at least guarantee that the sender has the right to send to the recipient.

Validation

A Direct Cast Request differs from a Direct Cast Message both in intention and in a few subtler details.

m.signature_scheme must be SIGNATURE_SCHEME_NONE
m.data.body must match the requisite type, being DirectCastRequestBody for requests, DirectCastBody for ongoing conversations
m.data.body.signature.type must be SIGNATURE_TYPE_X25519_RING. At the time of the request, the associated signature must be composed of the recipient’s inbox key plus some or all of the inbox keys of the recipient’s mutual follows (depending on the degree of privacy the user wishes to obtain), or of the recipient’s inbox key plus a sufficiently random selection of other users’ inbox keys if the recipient does not have the mutual follows required to receive messages. For ongoing conversations, the chosen inbox keys should not be changed unless the recipient has changed their inbox key, although for enhanced privacy this may not be an ideal time to immediately change them.
For DirectCasts, the specified m.data.fid on the message should be 0, but for DirectCastRequests it should be set to the recipient’s fid.

The DirectCastRequest CRDT has a per-user size limit of 200. The right size limit for the DirectCast CRDT is harder to ascertain: DCs are inboxed uniquely per conversation and cannot be traced to a given user from hub-visible information. Ideally, they should be limited per conversation to a reasonable number of pending messages, as they should be purged by the client after receipt.

Direct Cast APIs

// Returns all DirectCastRequests for a given fid, with optional parameters to filter and sort them
rpc GetDirectCastRequestsByFid(DirectCastRequestsByFidRequest) returns (MessagesResponse);

// A convenience alias on GetDirectCastRequestsByFid
rpc GetAllDirectCastRequestsMessagesByFid(DirectCastRequestsByFidRequest) returns (MessagesResponse);

// Returns all DirectCasts for a given conversation id, with optional parameters to filter and sort them
rpc GetDirectCastsByConversationId(DirectCastsByConversationIdRequest) returns (MessagesResponse);

message DirectCastRequestsByFidRequest {
  uint64 fid = 1;
  optional uint32 page_size = 2;
  optional bytes page_token = 3;
  optional bool reverse = 4;
}

message DirectCastsByConversationIdRequest {
  string conversation_id = 1;
  optional uint32 page_size = 2;
  optional bytes page_token = 3;
  optional bool reverse = 4;
}

These RPCs provide a very simple view of how encrypted messaging should work; clients must necessarily be more involved in interpreting and processing messages. Because this opinionated stance is required to ensure security and secrecy properties, the required client-side behaviors must also be elaborated to make this document whole.

Client Behaviors

The JavaScript client library (and additional libraries in other languages) will need to be updated not only to handle the full scope of both the Double and Triple-Ratchet protocols, but also to properly use the RPCs to perform the requisite behaviors that ensure full end-to-end privacy.

Initiating a Conversation

A client must prepare a DirectCastRequest message which contains the following contents:

Request
- Ephemeral Public Key
- Envelope Ciphertext
  - Initialization Message Header
    - Sender’s Device/Signer Bundles
      - Identity Public Key
      - Signed Public Pre-Key
      - Ephemeral Public Key
  - Message Ciphertext
    - Topic id
    - Optional message
- Ring Signature

Preparing an initialization message requires initiating a Triple or Double-Ratchet as a sender, taking the resulting RatchetMessage and the relevant Device key bundles, and then encrypting the entire envelope with a symmetric key derived from a Diffie-Hellman exchange between the recipient’s inbox key and an ephemeral key. The sender then produces a ring signature over the envelope ciphertext, using the collection of public keys that most significantly decreases the identifiability of the sender. For the initial version, which maintains compatibility with the existing behavior where only mutual follows can send messages, this collection would be drawn from the inbox keys of the recipient’s mutual follows. If a sender does not care about a brief period of time in which hubs are aware they are initiating a conversation with the recipient (although still blind to the contents), then the ring can simply be of size 2, containing only the sender’s and recipient’s inbox keys.

Continuing the Conversation

The ongoing conversation continues at the DirectCast message level, where the sender's declared topic may receive messages from the recipient. The recipient's first message should contain a corresponding topic id for the initiating sender to reply on. Messages simply follow either Triple or Double-Ratchet. Critically, a maximum message size must be defined, and messages must be padded to that size prior to encryption, so that ciphertext size cannot be used to analyze message contents. Participants that did not initiate the conversation should pick a random set of public inbox keys, including their own, to form their ring signatures.
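
One way to meet the fixed-size requirement is a length-prefixed padding scheme applied before encryption. The sketch below is only an illustration: `MAX_MESSAGE_SIZE`, the 4-byte big-endian length prefix, and zero-byte filler are all assumptions, since this proposal does not yet fix a maximum size or a padding format.

```python
import struct

MAX_MESSAGE_SIZE = 1024  # hypothetical value; the proposal does not define one
LENGTH_PREFIX = 4        # bytes used to record the real message length


def pad(message: bytes, size: int = MAX_MESSAGE_SIZE) -> bytes:
    """Pad a plaintext to exactly `size` bytes before it is encrypted."""
    capacity = size - LENGTH_PREFIX
    if len(message) > capacity:
        raise ValueError("message exceeds the maximum message size")
    filler = b"\x00" * (capacity - len(message))
    return struct.pack(">I", len(message)) + message + filler


def unpad(padded: bytes) -> bytes:
    """Recover the original plaintext after decryption."""
    (length,) = struct.unpack(">I", padded[:LENGTH_PREFIX])
    if length > len(padded) - LENGTH_PREFIX:
        raise ValueError("corrupt length prefix")
    return padded[LENGTH_PREFIX:LENGTH_PREFIX + length]
```

With this shape, a two-byte message and a near-maximum message produce plaintexts of identical length, so the resulting ciphertexts are indistinguishable by size.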

Rationale

At a high level, Warpcast currently uses an X25519-based implementation of Double-Ratchet without header encryption, where the identity key is signed by the Farcaster wallet key and the signed pre-key is signed by the identity key. This configuration, while imperfect, fits well with the experimental nature of Pareto-optimal feature support, and given the nature of React Native's native bridge, such an experiment was well warranted to shake out any design alterations required to compensate for RN's quirks.

Managing these keys utilizes a simple key store, which is write-access controlled to the user, but read-accessible by all. Messages utilize a simple store with seven day expiration, and are weakly-correlative with the participants, an ability which is in line with the existing unencrypted header data.

Moving this to a decentralized context requires some important lifts:

Metadata must be encrypted
Key storage needs to be adapted to work in a hub model (supporting identity keys produced by the signer to support application-specific messaging)
Inboxes for conversations should be:
- Ephemeral
- Queue-based (on fetch, the message is dequeued/erased)
- Per-device-per-application
- Non-correlative (conversation identifiers should be non-linkable to participants)
- Ideally, robustly non-correlative (this can be added after the fact, such as mixnet-based approaches)

Further elaboration of these problems follows:

General Problems for E2EE Messengers

To contextualize considerations, we will limit discussion of the problem space to E2EE messengers which offer “Signal-level encryption”.

Unwinding Security Guarantees

Many messengers which claim to offer “Signal-level encryption” make compromises to the original properties that made Signal a world-class direct messaging tool. In particular, these properties are:

Forward Secrecy
Backward Secrecy
Repudiation
Replay Protection
Out of Order Messaging

The latter two are trivial to implement in a centralized messenger, or in a messenger with basic E2EE properties. The Double-Ratchet Protocol was particularly clever because it provided those two properties safely while retaining the other three. Here’s why that’s hard:

A simple system providing Forward Secrecy would derive each symmetric key from the prior one. This is enough to enable repudiation and replay protection, but out of order messaging and backward secrecy are not assured: mutual key agreement means both parties may use the same key if they are not fully in sync, and once a symmetric key has been compromised, the subsequent keys are derivable.
A simple system providing both Forward Secrecy and Backward Secrecy would derive fresh symmetric keys from new material (often a Diffie-Hellman exchange) for every message. This again enables repudiation and replay protection, but out of order messaging is still not assured. Where the previous construction failed because the protocol does not guarantee that keys are unique for each message, this one has no protocol-level guarantee of which message should follow the previous one.
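
The first of these constructions can be illustrated with a symmetric KDF chain. The sketch below uses HMAC-SHA256 with the single-byte constants 0x01 and 0x02, following the recommendation in Signal's Double Ratchet specification; the function names are hypothetical and this is not the Warpcast implementation.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key: bytes):
    """Advance the chain once, returning (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(initial_chain_key: bytes, count: int):
    """Derive `count` successive message keys from a starting chain key."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = kdf_chain_step(chain_key)
        keys.append(message_key)
    return keys
```

Since HMAC is one-way, an attacker who obtains a later chain key cannot recover earlier message keys (forward secrecy). The same attacker can, however, call `kdf_chain_step` repeatedly to obtain every subsequent key, which is the missing backward secrecy described above.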

These first two properties, very succinctly, require frequent key rotation, the introduction of fresh cryptographic material combined with existing state, and, most critically, the deletion of keys after use. Such constructions frequently get unwound by a compulsory “portability” feature (although portability is not impossible to provide without breaking these guarantees).

Central Problem

Any kind of single-key-based encryption scheme for this information unwinds these security guarantees, and if a user cannot opt out of this, the application of the Double-Ratchet protocol or similar approaches is effectively security theatre.

Specific Problems in Decentralization

Degrees of User Unmasking

Correlation attacks are extremely easy to perform on decentralized (whether federated or P2P) networks with public data, even when the user-identifying data is encrypted. Typical “non-attributable” encrypted messenger inboxing gives each user a dedicated “rendezvous point” inbox: a sender obtains key info about a recipient, then sends an encrypted message to that rendezvous point naming a new non-correlated (frequently random, but otherwise un-derivable) topic on which to continue the conversation (bidirectional topics). Optionally, a second phase follows in which the recipient confirms the topic or creates a second topic, yielding unidirectional topic queues.

Unmasking Bidirectional Topic Users

If both parties are connecting to the same node, this is a trivial exercise: associate the IP address of the senders with the topic and other activities (publishing new keys, updating rendezvous topic, etc.). For an intermediary node, this is more difficult, but ultimately achievable: finding associated events published from the nodes tied to a given user will eventually unmask that user. If the messaging is not permissioned, this can be made trivial with large broadcast sweeps of new topic requests, since the timing of a client fetching messages alongside handling the topic request would be enough.

Unmasking Unidirectional Topic Users

Some decentralized messaging protocols take the position that wholly separate topics are sufficient to prevent correlation. This helps but is unfortunately insufficient: with same-node considerations the effort is again trivial, and for intermediary nodes simply sniffing messages, the same timing analysis applies.

Additional Pattern Analysis

Assuming the timing analysis problems above are solved, a tangential discovery mechanism persists. Many “Signal-like” messaging clients encrypt messages using a variant of AES, a block cipher that operates on 128-bit blocks of data. In padded block modes this results in ciphertext sizes always divisible by 16 bytes (or, in the case of some variants, also including a tag which is frequently appended to the ciphertext). Because message sizes are non-uniform, the ciphertext size can be used as a fingerprint. Additionally, the outboxes must contain valid ring signatures that do not identify the participants.
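
To make the size fingerprint concrete, the sketch below computes what a passive observer would see for messages of different lengths. It assumes PKCS#7 padding, as commonly paired with AES-CBC; other modes leak length even more directly, since their ciphertext length equals the plaintext length plus any tag. The function names are hypothetical.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_padded_length(plaintext_length: int) -> int:
    """Length after PKCS#7 padding, which always adds between 1 and 16 bytes."""
    return (plaintext_length // BLOCK_SIZE + 1) * BLOCK_SIZE


def observed_sizes(messages):
    """Ciphertext body sizes visible to an observer, one per message."""
    return [pkcs7_padded_length(len(m.encode("utf-8"))) for m in messages]
```

A short greeting and a multi-sentence reply land in different 16-byte buckets, so an observer can tell them apart and track conversational rhythm without decrypting anything.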

Central Problem

Stated plainly, a solution to this problem should prevent analysis at the same-node level from successfully unmasking users and correlating conversations.

Degrees of Repudiability

Repudiability is the property in which a user has plausible deniability of the authenticity of a message at any time beyond when it was initially sent. That is, when in conversation with another user, the recipient can be assured the message came from the sender, but after the fact, it is impossible to prove the message originated from the sender. This condition is one of the essential features of Signal, and many messaging applications/protocols which profess to be Signal-like fail to uphold this consideration, sometimes intentionally, frequently unintentionally.

Non-Ephemerality

In a distributed systems context, it is immediately obvious how ephemerality could harm deliverability and create a poor user experience. However, under the naive approach of not supporting ephemerality, repudiability becomes infeasible: a rogue node could simply watch and timestamp all traffic, which would assert provenance of the traffic and eliminate the ability to plausibly deny the origination of a message.

Distribution as Storage

Many networks attempt to solve for long-term storage of user inboxes by simply retaining the messages in a given topic as a set. This is immediately problematic because it implicitly destroys repudiability. To prevent this, moving a message into secondary storage on a given network is often considered. This helps, provided said storage is accessible only to the end user and not to intermediaries; unfortunately, many implementations do leave it accessible to intermediaries.

Central Problem

Message queues must be effectively unidirectional, but also guaranteed to be ephemeral, with any associated long-term storage fulfilled either off network or on network in a wholly oblivious way.

Appendix A: Asynchronous Triple-Ratchet Protocol

The asynchronous variant of the Triple-Ratchet protocol is an extension to the Double-Ratchet protocol, utilizing an asynchronous DKG ratchet to provide a “room key” as the counterparty receiver key plugged into the Double-Ratchet algorithm’s Diffie-Hellman ratchet. The DKG ratchet remains secure in the malicious majority model, provided there exists a trusted PKI authority, which can be assumed via the signing key hierarchy of authorized keys issued from an Ethereum wallet holder.

Terminology

Term	Symbol	Definition
Party Count	$n$	The number of parties involved in a multiparty scheme.
Threshold	$t$	The number of parties required to reach cryptographic quorum.
Prime Field	$\mathbb{Z}_q$	A field (https://en.wikipedia.org/wiki/Field_(mathematics)) where the order is prime.
Field Element	$\{a, b, \ldots\} \in \mathbb{Z}_q$	Member elements of a field, such as elliptic curve points.
Group	$\mathbb{G}$	A cyclic group of order $q$.
Generator	$G$	A chosen field element under a cyclic group which forms a subgroup.
Discrete Logarithm Assumption	DLA	The assumption that computing discrete logarithms in a group is computationally infeasible. This assumption only applies in certain conditions, such as special classes of elliptic curves over finite fields.
Decisional Diffie-Hellman	DDH	The computational hardness assumption that, given the generator raised to each of two randomly sampled scalars, the generator raised to the product of those scalars is indistinguishable from a randomly sampled group element.
Node Identifier	$i$, $j$, $k$	The identifier of a given party ($i$) or the identifier of a party relative to the current party ($j$, $k$).
Identity Key	$IDK_i$, $sidk_i$	The unique permanent key pair associated with a given party, with the public key denoted as $IDK_i$, secret key as $sidk_i$.
Signed Pre-Key	$SPK_i$, $sspk_i$	The unique short-lived key pair associated with a given party that is signed by the party’s identity key, with the public key denoted as $SPK_i$, secret key as $sspk_i$.
One Time Pre-Key	$OPK_i$, $sopk_i$	A single-use short-lived key pair associated with a given party, for enhanced entropy in X3DH, with the public key denoted as $OPK_i$, secret key as $sopk_i$.
Ephemeral Key	$EK_i$, $sek_i$	The unique ephemeral key pair associated with a given party that is used in a DH ratcheting scheme, with the public key denoted as $EK_i$, secret key as $sek_i$.
Security Parameter	$κ$	A value which describes the security attribute imposed under a given process, typically in terms of bits – e.g. a hash function’s output size described as or truncated to $κ$.
Verifiable Encryption	VE	A cryptographic process which encrypts a given payload such that any party may verify the authenticity (to a chosen security parameter $κ$) of its contents without revealing the contents thereof, via a verifiable attribute that is publicly disclosable (e.g. a ciphertext containing a private key corresponding to a public key).
Encryption Key	$ECK_i$, $seck_i$	The unique short-lived key pair associated with a given party that is signed by the party’s identity key, used for verifiable encryption purposes, with the public key denoted as $ECK_i$, secret key as $seck_i$.
DKG Secret	$sk$, $sk_i$	The logical secret key of the distributed key generation process ($sk$) and the individual shares of that secret held by each party $sk_i$.
DKG Public Key	$PK$, $PK_i$	The public key output of the DKG process ($PK$) and the individual sharings of points ($PK_i$); interpolating at the y-intercept ($x = 0$) with any subset of $t$ of these sharings yields $PK$.
Polynomial	$p_i$, $p_{i,j}$	A randomly sampled polynomial of degree $t-1$ by each party ($p_i$), with evaluations for another party relative to the current party ($p_{i,j}$).
Hash function	$H$	A cryptographic hash function suitable for use as a random oracle.
Zero Knowledge Proof of Knowledge (ZKPoK)	$z$	A proof which allows a verifier to confirm the prover possesses knowledge without revealing the knowledge.
Commitment	$c$	A binding value which confirms a separate value, yet keeps the value hidden until a later phase in which that value is revealed. This is useful when parties need to collectively agree to values which will later be combined, but if any one party revealed their value ahead of others, a malicious party could leverage information from that value and allow them to choose a value that may manipulate the outcome of the combined value in a way to gain an advantage or complete control (Rushing Adversary model).

Preliminary Info

To fully understand how Triple-Ratchet is implemented, the individual pieces involved are described below.

ZKPoK-DL (Schnorr Proof)

In this variant of ZKPoK, we are computing a value relative to a threshold sharing of a logical secret key. Further description of that sharing in and of itself is provided in the Secret Sharing and Distributed Key Generation sections, so for this section, simply assume the threshold sharing of the secret and public key ($sk_i$, $PK_i$) to be already created.

Prove

Generate a new random scalar, $r_i$, as the private key of an EC keypair ($r_i$, $R_i = r_i \cdot G$), matching the same curve parameters as the DKG key.
To make this process non-interactive, we will apply the Fiat-Shamir heuristic by hashing the serialized threshold sharing’s public key concatenated with the random public point: ($ch_i = H(PK_i||R_i)$)
To calculate the ZKPoK, we take the threshold secret, multiply it against the integer representation of the challenge, and add the random scalar: ($z_i = sk_i \cdot ch_i + r_i$)
We finally commit to this ZKPoK by taking the hash of the serialized random public point concatenated with the ZKPoK: ($c_i=H(R_i||z_i)$)
This commitment is to be broadcast in the DKG process ahead of the threshold sharing of the public key.

Verify

Once all commitments have been obtained, the ZKPoKs, random public points, and threshold sharings of the public key are released. With these values, verify:

Reproduce the challenge by hashing the concatenation of the serialized threshold sharing’s public key with the random public point: ($ch_j = H(PK_j||R_j)$)
Multiply the challenge scalar against the threshold sharing public key, then add the random public point to this point, which should equal the scalar multiplication of the ZKPoK against the generator of the curve: ($Z_j = ch_j \cdot PK_j + R_j$)
Multiply the ZKPoK against the generator of the curve and confirm this value equals the previously calculated value, and abort if this does not match: $z_j \cdot G = Z_j$
Take the hash of the serialized random public point concatenated with the ZKPoK, and confirm this matches the commitment, and abort if this does not match: $c_j = H(R_j||z_j)$
If the values matched, verification has succeeded.
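
The prove and verify steps can be exercised end to end with a toy example. To stay dependency-free, the sketch below swaps the elliptic curve for a small multiplicative Schnorr group, so $sk \cdot G$ becomes `pow(G, sk, P)` and point addition becomes multiplication modulo `P`. The group parameters, the 2-byte serialization, and the use of SHA-256 as $H$ are all assumptions made for illustration, and the group is far too small to be secure.

```python
import hashlib
import secrets

# Toy Schnorr group (NOT secure): P = 2Q + 1, and G generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4


def _encode(*values: int) -> bytes:
    """Hypothetical serialization: each value as 2 big-endian bytes, concatenated."""
    return b"".join(v.to_bytes(2, "big") for v in values)


def challenge(pk: int, r_point: int) -> int:
    """ch = H(PK || R), reduced into the scalar field."""
    return int.from_bytes(hashlib.sha256(_encode(pk, r_point)).digest(), "big") % Q


def commitment(r_point: int, z: int) -> bytes:
    """c = H(R || z), broadcast before R and z are revealed."""
    return hashlib.sha256(_encode(r_point, z)).digest()


def prove(sk: int):
    """Return (PK, R, z, c) for the secret share sk."""
    pk = pow(G, sk, P)
    r = secrets.randbelow(Q - 1) + 1      # random nonce in [1, Q - 1]
    r_point = pow(G, r, P)
    z = (sk * challenge(pk, r_point) + r) % Q
    return pk, r_point, z, commitment(r_point, z)


def verify(pk: int, r_point: int, z: int, c: bytes) -> bool:
    """Check z*G == ch*PK + R (written multiplicatively) and the commitment."""
    expected = (pow(pk, challenge(pk, r_point), P) * r_point) % P
    return pow(G, z, P) == expected and commitment(r_point, z) == c
```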

Secret Sharing

Shamir’s Secret Sharing (SSS)

Shamir’s Secret Sharing is a technique for encoding a secret in the form of a constant of a randomly sampled polynomial, then distributing evaluations of that polynomial to each participant such that the threshold number of participants in the scheme could perform Lagrange interpolation to reproduce the constant:

Given a threshold of three participants, construct a $t-1$ degree polynomial, randomly sampling coefficients ($A, B$) from the finite field, setting the constant $C$ as the secret:

$f(x) = Ax^2+Bx+C$

The dealer of these secret shares then evaluates the polynomial where $x$ is the identifier of the participant. Notably, $x$ cannot equal zero, as that would simply hand the participant the secret; likewise, $x$ cannot be the order of the group, since $x \bmod q = 0$ where $x = q$.

The dealer distributes these samples to each participant, and when recombining, the participants calculate:
$C = f(0) = \sum_{j=0}^{t-1} y_j \prod_{\substack{m=0 \\ m \neq j}}^{t-1} \frac{x_m}{x_m - x_j}$

Because mathematics notation is laboriously terse, here is the algorithm (assume operations are modulo $q$, don’t use this code outright):

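// Lagrange interpolation at x = 0. Participant identifiers are 1..threshold,
// so thresholdShare[j] holds the evaluation f(j + 1). All arithmetic is
// modulo q; the division stands for multiplication by the modular inverse.
var reconstructedSecret = 0;

for (var j = 0; j < threshold; j++)
{
    var numerator = 1;
    var denominator = 1;

    for (var k = 0; k < threshold; k++)
    {
        if (j != k)
        {
            // x_k = k + 1 and x_j = j + 1, so x_k - x_j = k - j
            numerator = numerator * (k + 1);
            denominator = denominator * (k - j);
        }
    }

    var reconstructedFragment = thresholdShare[j] * (numerator / denominator);

    reconstructedSecret += reconstructedFragment;
}

return reconstructedSecret;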
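
For readers who want something executable, here is a minimal Python sketch of the same split and reconstruct steps with the modular arithmetic written out. The modulus `Q` (the Mersenne prime $2^{127} - 1$) and the function names are illustrative choices rather than parameters from this proposal, and, as with the pseudocode, this is not meant for production use.

```python
import secrets

Q = 2**127 - 1  # a Mersenne prime, standing in for the group order q


def split(secret: int, threshold: int, total: int):
    """Return `total` shares (x, f(x)) of a random degree threshold-1 polynomial."""
    coefficients = [secret % Q] + [secrets.randbelow(Q) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, e, Q) for e, c in enumerate(coefficients)) % Q

    # Identifiers start at 1: x = 0 would hand out the secret itself.
    return [(x, f(x)) for x in range(1, total + 1)]


def reconstruct(shares) -> int:
    """Lagrange interpolation at x = 0 over any `threshold` distinct shares."""
    secret = 0
    for j, (x_j, y_j) in enumerate(shares):
        numerator, denominator = 1, 1
        for m, (x_m, _) in enumerate(shares):
            if m != j:
                numerator = numerator * x_m % Q
                denominator = denominator * (x_m - x_j) % Q
        # Division modulo Q is multiplication by the modular inverse.
        secret = (secret + y_j * numerator * pow(denominator, -1, Q)) % Q
    return secret
```

Any subset of `threshold` shares works, not only the first ones: `reconstruct([shares[0], shares[2], shares[4]])` recovers the same secret as `reconstruct(shares[:3])` for a 3-of-5 split.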

Feldman Verifiable Secret Sharing (FVSS)

Feldman Verifiable Secret Sharing is the same premise, but with a twist – how do we know that the shares produced by the dealer are correct? We could have all participants converge after distribution to verify any combination of threshold shares succeed in producing the same value, but oftentimes in secret sharing schemes the recipients of the shares are not in the same proximity at the time of construction, often for security purposes. When we need to trust that the secret sharing scheme was legitimate, we amend the sharing protocol to include a proof:

When computing the shares from the secret, take the secret as an exponent to the group (in the case of elliptic curve secrets, this means use the secret as a scalar to the generator $G$: $s \cdot G = P$), and do the same to all the shares: $s_0 \cdot G = P_0, s_1 \cdot G = P_1, ...$

Distribute the public values with the secrets to each participant, and distribute $P$. All participants may do Lagrange interpolation of the polynomial with the public values of the shares (Shamir in the Exponent) like before, this time iterating through all participants. The resulting output, if the dealer did not cheat, will equal $P$ for all combinations of threshold participants. If the dealer did cheat, some or all of the combinations will not equal $P$. Again, because this is somewhat terse, the algorithm is as follows:

// `curve` represents a known set of curve parameters
// Point at infinity is the `0` equivalent when operating over
//   EC field elements
var publicKey = curve.Infinity;

// Check every window of `_threshold` consecutive points
for (var i = 0; i <= this._total - this._threshold; i++)
{
    var reconstructedSum = curve.Infinity;

    for (var j = 0; j < this._threshold; j++)
    {
        var coefficientNumerator = 1;
        var coefficientDenominator = 1;

        for (var k = 0; k < this._threshold; k++)
        {
            if (j != k)
            {
                coefficientNumerator = coefficientNumerator * (i + k);
                coefficientDenominator = coefficientDenominator * (k - j);
            }
        }

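        // The division is performed in the scalar field, i.e. multiply
        //   by the modular inverse of the denominator
        var reconstructedFragment = points[i + j].Multiply(
            coefficientNumerator / coefficientDenominator
        );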

        if (reconstructedSum.IsInfinity)
        {
            reconstructedSum = reconstructedFragment;
        }
        else
        {
            reconstructedSum = reconstructedSum.Add(
                reconstructedFragment
            );
        }
    }

    if (publicKey.IsInfinity)
    {
        publicKey = reconstructedSum.Normalize();
    }
    else if (!publicKey.Equals(reconstructedSum.Normalize()))
    {
        return null;
    }
}

return publicKey;

Because the DLA applies to the class of elliptic curves in prime fields that we are working in, it is computationally infeasible to reconstruct $s$ given $P$.
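
As a rough illustration of the verification loop above, the following Python sketch uses a small multiplicative group modulo a prime in place of an elliptic curve, so $s \cdot G$ becomes `pow(G, s, P)` and point addition becomes multiplication modulo `P`. The parameters `P`, `Q`, `G`, the share x-coordinates `1..total`, and all function names are assumptions made for the example rather than part of this proposal, and the group is far too small for real use.

```python
import random

# Toy group standing in for the curve (assumption): P = 2*Q + 1, and
# G generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4

def make_shares(secret, threshold, total, rng):
    """Evaluate a random degree-(threshold-1) polynomial at x = 1..total."""
    coeffs = [secret] + [rng.randrange(Q) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, e, Q) for e, c in enumerate(coeffs)) % Q)
            for x in range(1, total + 1)]

def lagrange_at_zero(xs, j):
    """Lagrange coefficient of xs[j] for interpolation at x = 0, modulo Q."""
    num, den = 1, 1
    for k, xk in enumerate(xs):
        if k != j:
            num = num * xk % Q
            den = den * (xk - xs[j]) % Q
    return num * pow(den, -1, Q) % Q

def recombine_in_exponent(points):
    """Shamir in the exponent: combine (x, G^share) pairs into G^secret."""
    xs = [x for x, _ in points]
    acc = 1
    for j, (_, pt) in enumerate(points):
        acc = acc * pow(pt, lagrange_at_zero(xs, j), P) % P
    return acc

def verify_dealer(public_points, threshold, expected):
    """Check each window of `threshold` consecutive points against `expected`."""
    return all(
        recombine_in_exponent(public_points[i:i + threshold]) == expected
        for i in range(len(public_points) - threshold + 1)
    )

rng = random.Random(7)
shares = make_shares(123, threshold=3, total=5, rng=rng)
public_points = [(x, pow(G, s, P)) for x, s in shares]
print(verify_dealer(public_points, 3, pow(G, 123, P)))
```

If the dealer hands out even one share that does not lie on the polynomial, at least one window fails to recombine to the published value and the check returns `False`.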

Distributed Key Generation (DKG)

Given the above information, we now have enough tools at our disposal to create a Distributed Key Generation algorithm. Each party will do the following in rounds of communication (over a secure channel, which we already have via Double-Ratchet 🙂):

Individually perform a local SSS, and send evaluations of the polynomial to each party.
These evaluations are fragments, which each party will sum to their scalar secret key $sk_i$, and public point ($sk_i \cdot G = PK_i$). Each party now calculates a ZKPoK-DL against these values, and sends the commitment to all parties.
Once all parties have received the commitments, each party will send their public point ($PK_i$), the random public point from ZKPoK-DL ($R_i$), and the ZKPoK itself ($z$).
Upon receiving these values, each party verifies the ZKPoK-DL outputs. If any is invalid, the party aborts; if all are valid, it performs PVSS’ Shamir in the Exponent recombination of the public points (notably, the secret key shares are never distributed, because we do not need to expose this data to other parties). Each party may then transmit a single bit of information confirming success or failure, or the reconstructed public key, if useful (for example, in a trusted PKI environment).
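
The summation at the heart of these rounds can be sketched as follows. This is a minimal illustration only: the ZKPoK-DL commitment and reveal rounds are left out, a small multiplicative group modulo a prime stands in for the curve, and the names `deal`, `run_dkg`, and the parameters `P`, `Q`, `G` are hypothetical.

```python
import random
from itertools import combinations

P, Q, G = 2039, 1019, 4  # toy group standing in for the curve (assumption)

def deal(threshold, total, rng):
    """One party's local SSS: returns (secret, {x: fragment for party x})."""
    coeffs = [rng.randrange(Q) for _ in range(threshold)]
    frags = {x: sum(c * pow(x, e, Q) for e, c in enumerate(coeffs)) % Q
             for x in range(1, total + 1)}
    return coeffs[0], frags

def lagrange_at_zero(xs, xj):
    """Lagrange coefficient of xj for interpolation at x = 0, modulo Q."""
    num, den = 1, 1
    for xk in xs:
        if xk != xj:
            num = num * xk % Q
            den = den * (xk - xj) % Q
    return num * pow(den, -1, Q) % Q

def recombine_in_exponent(points):
    """points: {x: G^sk_x} from a threshold of parties -> G^(joint secret)."""
    xs = list(points)
    acc = 1
    for x, pt in points.items():
        acc = acc * pow(pt, lagrange_at_zero(xs, x), P) % P
    return acc

def run_dkg(threshold, total, rng):
    # Round 1: every party deals fragments of its own random polynomial.
    deals = [deal(threshold, total, rng) for _ in range(total)]
    # Each party sums the fragments addressed to it to obtain sk_x.
    secret_keys = {x: sum(frags[x] for _, frags in deals) % Q
                   for x in range(1, total + 1)}
    public_points = {x: pow(G, sk, P) for x, sk in secret_keys.items()}
    # Computed here only to check the sketch; no real party learns this.
    joint_secret = sum(secret for secret, _ in deals) % Q
    return secret_keys, public_points, joint_secret

secret_keys, public_points, joint_secret = run_dkg(2, 3, random.Random(1))
for subset in combinations(public_points, 2):
    group_key = recombine_in_exponent({x: public_points[x] for x in subset})
    print(subset, group_key == pow(G, joint_secret, P))
```

Every threshold-sized subset of public points recombines to the same group public key, even though no party ever holds the joint secret.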

Threshold Diffie-Hellman (Threshold DH)

Note that in the DKG’s second round, we used the generator $G$ to produce the public point sharings $PK_i$. Assume DKG has been performed. Any threshold number of parties may now perform Distributed Diffie-Hellman:

Given a target ephemeral public key ($EK_j$) to perform DH against, calculate $sk_i \cdot EK_j = DH_i$. The creator of the ephemeral public key may optionally elect to calculate their own sharing of $DH_i$ to make this process require fewer participants. All threshold parties broadcast their sharing ($DH_i$).
Perform the Shamir in the Exponent recombination, which will result in the output agreed key $DH$. Taking the $x$-coordinate of the affine representation of this point, you now have the output value of normal Diffie-Hellman.
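
A minimal sketch of the two steps above, again using a toy multiplicative group in place of the curve. The shares, the ephemeral secret, and the helper names are made up for the example; because the toy group has no affine points, the final step of taking the $x$-coordinate is not shown.

```python
P, Q, G = 2039, 1019, 4  # toy group standing in for the curve (assumption)

def lagrange_at_zero(xs, xj):
    """Lagrange coefficient of xj for interpolation at x = 0, modulo Q."""
    num, den = 1, 1
    for xk in xs:
        if xk != xj:
            num = num * xk % Q
            den = den * (xk - xj) % Q
    return num * pow(den, -1, Q) % Q

def partial_dh(sk_i, ek_point):
    """DH_i = sk_i * EK_j, written multiplicatively."""
    return pow(ek_point, sk_i, P)

def combine_dh(partials):
    """partials: {x: DH_x} from a threshold of parties -> agreed DH value."""
    xs = list(partials)
    acc = 1
    for x, dh in partials.items():
        acc = acc * pow(dh, lagrange_at_zero(xs, x), P) % P
    return acc

# Shares of the group secret 77 on the line f(x) = 77 + 5x (threshold 2).
shares = {1: 82, 2: 87, 3: 92}
group_public = pow(G, 77, P)

ek_secret = 321                      # sender's ephemeral scalar
ek_point = pow(G, ek_secret, P)      # EK_j, published by the sender

partials = {x: partial_dh(shares[x], ek_point) for x in (1, 3)}
receivers_view = combine_dh(partials)
sender_view = pow(group_public, ek_secret, P)
print(receivers_view == sender_view)
```

Any two of the three parties arrive at the same value the sender computes from the group public key, without reconstructing the group secret.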

Verifiable Encryption

There exist many verifiable encryption schemes. For our needs, consider a DKG scenario in which some parties are offline after round 3. We have verifiability outputs, but they are encrypted in transmission so that only the intended party member can retrieve them. We could instead expose the values in the clear, so that individual parties cannot be misled by the outputs when they later come online, but this places trust in the hubs not to manipulate the contents. Alternatively, if some amended protocol produces a jagged broadcast model where the outputs of steps 2 and 3 are combined and stashed, a rogue hub could collude with an attacker to invalidate the security of the ZKPoK by allowing the attacker to amend their stashed values and conduct a rushing adversary attack.

Simple Encryption

Leveraging the DLA, we can encrypt the point value by multiplying the secret key share ($sk_i$) against the public Encryption Key of the recipient ($ECK_j$). The recipient can calculate the modular inverse of their secret Encryption Key ($seck_j^{-1}$) and multiply it against the encrypted value, producing the resulting public point share ($PK_i$). Augmenting the DKG process to transmit this output instead of $PK_i$ would allow the safe transmission of the public point share as a message header, however other parties would be unable to verify that this encryption was correct.
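
The blinding and unblinding described here can be sketched in a few lines. The toy group and the concrete scalars below are assumptions chosen for illustration; in the group used here, multiplying a point by a scalar is written as `pow(point, scalar, P)`.

```python
P, Q, G = 2039, 1019, 4  # toy group standing in for the curve (assumption)

def encrypt_point_share(sk_i, eck_j):
    """Sender side: sk_i * ECK_j, i.e. PK_i blinded by the recipient's key."""
    return pow(eck_j, sk_i, P)

def decrypt_point_share(ciphertext, seck_j):
    """Recipient side: multiply by seck_j^{-1} (mod Q) to recover PK_i."""
    return pow(ciphertext, pow(seck_j, -1, Q), P)

sk_i = 345                       # sender's secret key share
seck_j = 678                     # recipient's secret Encryption Key
eck_j = pow(G, seck_j, P)        # recipient's public Encryption Key

ciphertext = encrypt_point_share(sk_i, eck_j)
recovered = decrypt_point_share(ciphertext, seck_j)
print(recovered == pow(G, sk_i, P))
```

The inverse is taken modulo the group order `Q`, not modulo `P`, because scalars live in the exponent.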

Simple Verifiable Encryption

Leveraging a simple verifiable encryption scheme a la cut-and-choose techniques, we can offer some variant to allow other parties to verify the encryption was done correctly, but further elaboration is less important for the scope of this document.

Triple-Ratchet Protocol

Problem

The problem with the above DKG approach is that the setup phase requires all parties to not only be online at time of key generation, but that they must also all perform each round before any may proceed to the next. While it is feasible in some scenarios to allow a small collection of users to conduct DKG to fit into the “counterparty” receiver side of the DH ratchet with the expectation that they will remain online (typically, this would be in a decentralized service in which at least a quorum of node operators leave their nodes always online), this is inviable for highly asynchronous communicators who may only have one party online at a time.

Relaxing the Rushing Adversary Threat?

This friction can be reduced by simultaneously reducing the security of the protocol through removal of the ZKPoK requirement, and thus participants merely maintain a cache of polynomials/evaluations ahead of time. Losing the security against Rushing Adversaries may not be problematic for a group chat (indeed, Double-Ratchet itself does not care as the DH ratchet itself is quite susceptible to this attack).

Utilizing an epoch-based approach

We could simply allow users to emit one-time use polynomial fragment bundles in step one into a logical set of queues, each encrypted to the individual peers, and upon an epoch, all parties dequeue the latest polynomial fragments, perform their summations, and distribute public points (omitting the ZKPoK process) so that local Shamir in the exponent can be performed to derive the new public key. This can now be our process as well for the “room key” of the Diffie-Hellman ratchet. This poses two problems, with solutions that break the original strength of Double-Ratchet:

A sender, upon first encrypted message send has now revealed to a quorum the shared sending chain key for that sender. The receivers can now forge any message during this epoch from that user. Solutions like MLS resolve the message forgery problem through a combination of trust-oriented components (authentication to the server, authentication in the message), but this removes repudiability.
It is possible that a malicious client could emit one-time use polynomial fragment bundles that produce different public keys to different parties, bifurcating the room key. The protocol could be amended to not allow a new epoch to proceed until all parties have emitted public points and all parties have collectively agreed on a key, but that brings us back into a relatively synchronous pattern, and not all parties are online (and some may even be lost). This would force a re-initiation, which makes the entire experience slow.

A Grab-bag of Polynomials

Revisiting the initial setup phase of DKG, we already have the following available to us:

A list of the participants involved
A secure peer-to-peer channel
Relevant keys to the participants ($IDK_i$, $SPK_i$, $ECK_i$)

Let’s iterate over a theoretical asynchronous DKG protocol which allows for ratcheting of the room key in a way that any party can reasonably verify no bifurcations may occur, nor permit message forgeries (this requires consensus in a P2P model, which does not apply here), without losing the Forward Secrecy, Post-Compromise Secrecy or Repudiation/Deniability properties of Double-Ratchet.

Asynchronous Verifiable DKG Ratchet

Polynomial Verifiable Sharing (PVS)

Recall that in PVSS, we had performed Lagrange interpolation over the set of terms raised to the exponent of the generator (Shamir-in-the-exponent). Performing the same for the fragments produced by any given party can provide a meaningful verifiable sharing mechanism, extending from the simple EC additive homomorphism.

Given a series of fragment ($frag_{i,j}$) scalars I wish to send to each party, I can calculate the following:

A publicly verifiable fragment raised to the exponent of the generator: $FRAG_{i,j} = frag_{i,j} \cdot G$
The secret of the polynomial, raised to the exponent of the generator: $S = s \cdot G$

These fragments are distributed to all parties, who confirm:

It does not resolve anyone’s polynomial fragment to the point at infinity with the negation of the previous fragment and the addition of the new.
All fragments shared recombine in the exponent to S.
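
One reading of these two checks is sketched below. The toy group, the function `verify_pvs`, and the sample fragments are all hypothetical; in particular, the first check is interpreted as "replacing the dealer's old fragment with the new one must not leave any party's public share at the identity element", which is an assumption about the intended meaning.

```python
P, Q, G = 2039, 1019, 4  # toy group standing in for the curve (assumption)

def lagrange_at_zero(xs, xj):
    """Lagrange coefficient of xj for interpolation at x = 0, modulo Q."""
    num, den = 1, 1
    for xk in xs:
        if xk != xj:
            num = num * xk % Q
            den = den * (xk - xj) % Q
    return num * pow(den, -1, Q) % Q

def recombine_in_exponent(points):
    """points: {x: group element} -> interpolated element at x = 0."""
    xs = list(points)
    acc = 1
    for x, pt in points.items():
        acc = acc * pow(pt, lagrange_at_zero(xs, x), P) % P
    return acc

def verify_pvs(new_frags, claimed_s, current_pks, old_frags, threshold):
    """All arguments are public group elements keyed by party index x."""
    # Check 1: swapping old fragment for new must not cancel a party's
    # share (the identity element 1 plays the role of the point at infinity).
    for x, new_pt in new_frags.items():
        updated = current_pks[x] * pow(old_frags[x], -1, P) * new_pt % P
        if updated == 1:
            return False
    # Check 2: every window of `threshold` fragments recombines to S.
    xs = sorted(new_frags)
    for i in range(len(xs) - threshold + 1):
        window = {x: new_frags[x] for x in xs[i:i + threshold]}
        if recombine_in_exponent(window) != claimed_s:
            return False
    return True

def lift(values):
    """Raise scalars to the exponent of the generator."""
    return {x: pow(G, v, P) for x, v in values.items()}

current = lift({1: 120, 2: 130, 3: 140})   # each party's current PK_x
old = lift({1: 17, 2: 24, 3: 31})          # dealer's previous fragments
new = lift({1: 25, 2: 30, 3: 35})          # new fragments, f(x) = 20 + 5x
print(verify_pvs(new, pow(G, 20, P), current, old, threshold=2))
```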

Asynchronous DKG Ratchet

Now that we have a means to ensure no party can produce invalid fragments, we can adopt a new ratcheting scheme for DKG. Each party will (upon their need to ratchet):

Individually perform a local FVSS with PVS, and enqueue the output bundles to the respective recipients.
When a new bundle is needed from a given party, the other participants will dequeue the bundle, and perform the verification process of PVS. Upon confirmation of verification, each party will recalculate the new shared polynomial, substituting the old fragment with the new to obtain a new $sk_i$, and each party will send their public point ($PK_i$).
Each party then performs PVSS’ Shamir-in-the-Exponent recombination of the public points.
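
The fragment substitution in the second step can be illustrated with fixed numbers. Everything here is an assumption for the sake of the example: the toy group, the helper `substitute_fragment`, and the three small polynomials. The queueing and the PVS verification are not shown.

```python
P, Q, G = 2039, 1019, 4  # toy group standing in for the curve (assumption)

def lagrange_at_zero(xs, xj):
    """Lagrange coefficient of xj for interpolation at x = 0, modulo Q."""
    num, den = 1, 1
    for xk in xs:
        if xk != xj:
            num = num * xk % Q
            den = den * (xk - xj) % Q
    return num * pow(den, -1, Q) % Q

def recombine_in_exponent(points):
    """points: {x: PK_x} from a threshold of parties -> group public key."""
    xs = list(points)
    acc = 1
    for x, pt in points.items():
        acc = acc * pow(pt, lagrange_at_zero(xs, x), P) % P
    return acc

def substitute_fragment(sk_i, old_frag, new_frag):
    """Swap one dealer's old contribution for the new one inside sk_i."""
    return (sk_i - old_frag + new_frag) % Q

others = {1: 103, 2: 106, 3: 109}      # 100 + 3x: everyone else's sum
old_frags = {1: 17, 2: 24, 3: 31}      # 10 + 7x: ratcheting party, before
new_frags = {1: 25, 2: 30, 3: 35}      # 20 + 5x: ratcheting party, after

sk_before = {x: (others[x] + old_frags[x]) % Q for x in others}
sk_after = {x: substitute_fragment(sk_before[x], old_frags[x], new_frags[x])
            for x in others}
pk_after = {x: pow(G, sk, P) for x, sk in sk_after.items()}

new_room_key = recombine_in_exponent({x: pk_after[x] for x in (1, 2)})
print(new_room_key == pow(G, 100 + 20, P))
```

Only the ratcheting party's contribution changes (from 10 to 20 in this example), so the other parties' polynomials carry over untouched.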

Diffie-Hellman Ratchet

The Diffie-Hellman Ratchet thus remains roughly the same as before, with the Alice role as any sender, the ADKG ratchet as the receiver, and agreement becomes two phase:

The threshold number of receivers submit their public points produced by the sender Ephemeral Public Key point multiplied by the receiver’s private scalar: $PEK_i = sk_i \cdot EK_j$. Each receiver will add a new ADKG ratchet output bundle to the output queues.
The threshold number of parties perform Shamir-in-the-Exponent recombination to produce the DH agreement key. This value is fed into the KDF ratchet to produce the receiving chain key. The ADKG ratchet gets bumped by the first receiver to have submitted a public point – this can be done deterministically.

KDF Ratchet

The KDF Ratchet also remains roughly the same, except the initial session key (which becomes the initial root key) is derived from the following:

$DH1 = DH(idk_i, PK)$
$DH2 = DH(spk_i, PK)$
$DH3 = DH(ek_i, PK)$
$SK = HKDF(salt, (domain||DH1||DH2||DH3), info)$, where $salt$ is a uniquely different $salt$ from the Double-Ratchet $salt$ of 32 0x00 bytes (we can go with 32 0x01 bytes, it just needs to be unique so it is distinct from a Double-Ratchet session derivation). The $domain$ separator remains the same as we are still dealing with ed25519 keys, and $info$ also remains the same as the application is still Farcaster.
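
A sketch of this derivation using only the Python standard library follows. HKDF is written out per RFC 5869 with SHA-256; the 32-byte output length and the choice of SHA-256 are assumptions, while the `0xFF` domain separator, the `'farcaster'` info string, and the two salts follow the values given in this document. The DH inputs are placeholder bytes.

```python
import hashlib
import hmac

def hkdf_sha256(salt, ikm, info, length=32):
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

DOMAIN = b"\xff" * 32               # domain separator prefix (Appendix C)
DOUBLE_RATCHET_SALT = b"\x00" * 32
TRIPLE_RATCHET_SALT = b"\x01" * 32  # distinct salt suggested above
INFO = b"farcaster"

def derive_session_key(dh1, dh2, dh3, salt=TRIPLE_RATCHET_SALT):
    """SK = HKDF(salt, domain || DH1 || DH2 || DH3, info)."""
    return hkdf_sha256(salt, DOMAIN + dh1 + dh2 + dh3, INFO)

# Placeholder DH outputs; real values come from DH(idk_i, PK) and so on.
dh1, dh2, dh3 = b"\x11" * 32, b"\x22" * 32, b"\x33" * 32
triple_sk = derive_session_key(dh1, dh2, dh3)
double_sk = derive_session_key(dh1, dh2, dh3, salt=DOUBLE_RATCHET_SALT)
print(triple_sk.hex())
print(triple_sk != double_sk)
```

Changing only the salt yields an unrelated key from the same DH inputs, which is the separation between Double-Ratchet and Triple-Ratchet sessions that the distinct salt is meant to provide.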

When admitting a new party into the conversation, the current root key must be shared along with all other public values.

The receiver chain key ratchet is the same as before, except a unique receiver ratchet must be maintained for every active party, as their sending chain key will differ. For every DKG rotation, this receiver chain resets, so it is not a high cost.

Security Properties

Forward Secrecy

Because the secrets from previous ratchets are discarded as in Double-Ratchet, we retain the Forward Secrecy property in that previous messages are not decryptable by adversaries who obtain access to the current keys.

Post-Compromise Secrecy

In the event an attacker gains access to the current keys, new key information added in subsequent ratchet rounds enables post-compromise secrecy when the attacker has lost access to key material.

Repudiation

Because the sender’s chain key becomes the sender-oriented recipient chain key for decryption, once the round of messages has concluded, it is no longer distinguishable that the sender had sent a given message, or if the previous messages were forged, granting repudiation.

Appendix B: Breaking the Rogue Hub Link

Given strong analytics, two conversation IDs in rapid exchange could be linked, leading to unmasking of senders. To defeat this analysis, hubs can employ a mixnet – RPM in the single party prover scheme or multi-party scheme can sufficiently defeat this kind of analysis, the latter being more ideal as the former still requires a degree of trust that the connection is not being surveilled.

Appendix C: Existing Behaviors

The flow to construct the root key and sending chain key (and thus subsequent DH ratcheting to produce the next root key and alternating with the receiving chain key) is as follows:

flowchart TD
    A[Wallet Key] -->|Signs| B[Identity Key]
    B -->|Signs| C[Signed Pre-Key]
    B --- E[ ]:::empty
    C --- E[ ]:::empty
    D[Ephemeral Key] --- E[ ]:::empty
    E -->|X3DH w/Counterparty, SHA-256 pairwise| SS[Session Key]
    DS["[0xFF...FF] (32 bytes)"] ---|domain separator prefix| F[ ]:::empty
    SS --- F[ ]:::empty
    F -->|secret| SK(HKDF)
    SSalt["[0x00...00] (32 bytes)"] -->|salt| SK
    SI["'farcaster'"] -->|info| SK
    SK --> SKey[Session Key]
    EK[Ephemeral Key] --- G[ ]:::empty
    SPK[Signed Pre-Key] --- G[ ]:::empty
    G -->|DH w/Counterparty, SHA-256| IK[Input Key]
    IK -->|secret| HKDF(HKDF)
    SKey -->|salt| HKDF
    AN["'farcaster'"] -->|info| HKDF
    HKDF --> RK[Root Key]
    HKDF --> SCK[Sending Chain Key]
    classDef empty width:0px,height:0px

This will be modified in the hub-based approach to utilize different domain separators when in double-ratchet vs triple-ratchet modes.

Additionally, the use of header encryption (and thus header key generation) and ring signatures for checking message validity are also present, which were not in the Warpcast API approach.

Appendix D: Additional Curves

Ideally, we will want to move to stronger curves in the near future as attacks on lower bit strength curves become more tenable. For greatest compatibility with existing features/behaviors, Curve448 is a strong drop-in replacement, and would only require the additional PublicKeyTypes:

  PUBLIC_KEY_TYPE_XED448 = 3; // follows Signal's XEd448 specification
  PUBLIC_KEY_TYPE_X448_RING = 4; // follows Spontaneous Anonymous Group signatures

and SignatureTypes:

  SIGNATURE_TYPE_XED448 = 4; // follows Signal's XEd448 specification
  SIGNATURE_TYPE_X448_RING = 5; // follows Spontaneous Anonymous Group signatures

alexpaden · 2023-06-22T01:16:48Z

alexpaden
Jun 22, 2023

This reads really nicely thanks

0 replies

gabrielayuso · 2023-06-24T13:41:30Z

gabrielayuso
Jun 24, 2023
Maintainer

Such a well written and detailed proposal.
My first meta question is whether it makes sense for the Farcaster protocol to get into the DM game at this point.

Wouldn't it make sense to rely on other protocols such as XMTP for that purpose?

Instead of investing resources to do this in FC directly, figure out the best way for FC and its clients to integrate with XMTP and other protocols? If they aren't up to par then bring up these concerns with them so they can improve.

2 replies

CassOnMars Jun 25, 2023
Maintainer Author

Wouldn't it make sense to rely on other protocols such as XMTP for that purpose?

Fundamentally, XMTP's goals are very different from the goals of the Direct Casts FIP for Farcaster – the Rationale section above defines some of the reasoning without specifying particular protocols, but since this was specifically mentioned, I'll elaborate on the most important distinctions with respect to those concerns. I want to be abundantly clear, this is not a critique of XMTP or their team's efforts – they have built a great product, but their goals and priorities are simply different from what this FIP is describing.

Security Properties

XMTP messages are encrypted, but this does not eliminate traffic analysis or even basic linkability
To illustrate, let's look at a snippet of the Litepaper:

Messages encrypted with this type of key contain unencrypted headers indicating the public keys of both parties in the conversation. Invitation messages offer cryptographic deniability as only the participants can verify that the keys used to encrypt the message match the identities listed in the headers.

Let's consider the rogue node condition: a node that by all appearances follows the protocol correctly, but despite the data it returns and its conformance to the protocol, in reality does not actually delete messages, and logs all traffic, messages, and topic information it can see.

If the node is the one directly receiving calls from the participants, then unmasking is trivial: simply watching the traffic to invite topics and subsequent conversation removes all doubt. But where things are more concerning is the misuse of X3DH with respect to the stated assumptions: With Alice sending a message to Bob's private invite topic, Bob sending a message to Alice's invite topic, both parties contain unencrypted headers (which is of course necessary to some extent) containing the public key bundles required to perform X3DH and thus decrypt the initialization message. This results in two important observations:

Not Deniable: Deniability comes from the ability of participants to forge messages such that ex parte examination cannot discern if this message truly originated from the person in question – can this be forged? Yes, but conditional acceptance of the message, timing, and subsequent topic creations/messages removes all doubt towards the veracity. (see (2) No Ephemerality in the next section)
Misuse of Key Material: The purpose of X3DH in the Double-Ratchet protocol is to establish a session key which is chained with additional material over the life of the session to create new root keys and sending/receiving chain keys, and is one of the critical components in providing perfect forward secrecy – the admixture of existing key material that is out of band with the continued iteration of contributed key material between each party's Diffie-Hellman ratchet. Using the session key directly to encrypt the invite message now establishes a secondary use – a malicious node holding onto invite messages now can silently collect direct uses of session keys. In designing secure cryptosystems, it is crucial that keys are delineated by purpose, and without a proof of security for their use in multiple purposes, such use should be assumed to be unsafe. This is enumerated in multiple standards – NIST (see Recommendations for Key Management: Part 1 - General) in particular very cleanly separates symmetric keys that are used for further derivation (a master derivation key) versus direct use in encryption/decryption.

But, for the sake of exercise, let's assume the session key reuse does not have a sophisticated attack and users do not care about being identified. There are further concerns.

XMTP messages are encrypted, but the properties of Double-Ratchet are completely unwound by its storage mechanism
As the current implementation stands, there are two significant issues:

The master encryption key for all private key material and user-controlled data is derived from a signature from the user's wallet: To ensure portability, this signature is not bound to any process that isolates to a domain, all trust is contingent on the user to not accidentally sign this message when requested by a malicious application. To your point about my bringing concerns to the team, I did: https://github.com/orgs/xmtp/discussions/11#discussioncomment-3533137 To their credit, they are working on a new version to resolve this concern, but at this time the existing NetworkKeyManager is still enabled by default, and critically must be in order to support inbox portability.
No ephemerality: Even when (1) is resolved, all private key material is shared between devices/applications of the user. Key material sharing between devices provides convenience to the end user to ensure that messages are accessible no matter the device or application, but it is a tradeoff that is not explained to the end user: the tradeoff is that even ephemeral keys are no longer truly ephemeral, meaning the PFS properties of Double-Ratchet are completely eliminated by design. Because of this, the use of Double-Ratchet only provides assurance of order in out of order sending, and only lightly – devices/applications will need to be aware of exactly the current state, or they can send different messages using the same sequence number/key.

Features

Secure Multi-device: To perform communication from multiple devices/applications while retaining the PFS and deniability properties of Double-Ratchet, each device/application pair will need to have their own set of keys to manage. The design of Triple-Ratchet utilizes a variation of already-established/proven distributed key generation (DKG) protocols, which enables more than two explicit sets of key bundles (be it devices or users) to mutually agree to relevant key material. This confers one more important benefit:
Secure Group Conversations: The use of elliptic curve DKG enables a plug-and-play counterparty side to the DH ratchet (i.e. when labeling the initiator as Alice, the recipient as Bob, Bob becomes the DKG side), as DKG's use of Shamir in the exponent can substitute the sender's ephemeral public point in place of the generator, a behavior used in many existing multi-party encryption schemes (see https://www.shoup.net/papers/thresh1.pdf for a review of TDH1 and TDH2). At the time of writing, XMTP does not yet support group conversations at the protocol level.
Deniability at all Levels: The use of ring signatures at the initialization topic level is important as it unlinks the initialization message (which should be encrypted following Double-Ratchet or Triple-Ratchet) from the identity of the sender initiating the conversation. Senders can optionally remove that anonymity by using a ring signature set size of only their key, but for those who prefer privacy, the anonymity set can be large enough to ensure the sender is unidentifiable to anyone but the recipient. This also feeds into one other important design consideration: when users configure their account such that only those they have mutual links with (i.e. both follow each other) may message them, this does not necessarily reduce the anonymity of the ring signatures; it only dictates that the ring signatures' public key set is strictly contained within the set of mutual follows of that recipient, allowing enforcement of the setting at the protocol level.

Important Differences
As mentioned above, there are tradeoffs in the more secure design that add friction to the end user experience – most critically, devices/applications may only see messages during the time in which they had keys available. This means that when adding a device or application, the user will only see messages on this pair for the duration in which those keys are active. This is great for security, as you can revoke an entire device, or an entire application, or both, and be assured that their access to future messages is removed. This however has zero inbox portability – any attempt at delivering portability of inboxes has to be an out-of-band solution. This is more ideal for security anyway, as it can operate over messages without delivering the key material and allow specific dictation of which messages are accessible, but consequently it is higher friction, as the device with access to the messages must necessarily be involved in this transportation.

gabrielayuso Jul 6, 2023
Maintainer

Thank you for the detailed explanation Cassie, this is very helpful.

I guess my meta point is not about the merits of this design but about prioritization overall.
Then again, this is still in the idea stage and perhaps it's premature to think about how this fits in the overall roadmap.
