title: "The Mastic VDAF" abbrev: "Mastic" category: info
docname: draft-mouris-cfrg-mastic-latest submissiontype: IRTF number: date: consensus: true v: 3 area: "IRTF" workgroup: "Crypto Forum" keyword:
- Internet-Draft venue: group: "Crypto Forum" type: "Research Group" mail: "cfrg@ietf.org" arch: "https://mailarchive.ietf.org/arch/search/?email_list=cfrg" github: "jimouris/draft-mouris-cfrg-mastic"
ins: H. Davis
fullname: Hannah Davis
organization: Seagate
email: hannah.e.davis@seagate.com
- ins: D. Mouris fullname: Dimitris Mouris organization: Nillion email: dimitris@nillion.com
- ins: C. Patton fullname: Christopher Patton organization: Cloudflare email: chrispatton+ietf@gmail.com
- ins: N. G. Tsoutsos fullname: Nektarios G. Tsoutsos organization: University of Delaware email: tsoutsos@udel.edu
fullname: Pratik Sarkar
organization: Supra Research
email: pratik93@bu.edu
- fullname: David Cook org: ISRG email: dcook@divviup.org
normative:
informative:
BGI15: title: "Function Secret Sharing" author: - ins: E. Boyle - ins: N. Gilboa - ins: Y. Ishai date: 2015 seriesinfo: EUROCRYPT 2015 target: https://www.iacr.org/archive/eurocrypt2015/90560300/90560300.pdf
CP22: title: "Lightweight, Maliciously Secure Verifiable Function Secret Sharing" author: - ins: L. de Castro - ins: A. Polychroniadou date : 2022 seriesinfo: EUROCRYPT 2022 target: https://iacr.org/cryptodb/data/paper.php?pubkey=31935
MST24: title: "PLASMA: Private, Lightweight Aggregated Statistics against Malicious Adversaries" author: - ins: D. Mouris - ins: P. Sarkar - ins: N. G. Tsoutsos date : 2024 seriesinfo: PETS 2024 target: https://ia.cr/2023/080
MPDST25: title: "Mastic: Private Weighted Heavy-Hitters and Attribute-Based Metrics" author: - ins: D. Mouris - ins: C. Patton - ins: H. Davis - ins: P. Sarkar - ins: N.G. Tsoutsos date: 2025 seriesinfo: PETS 2025 target: https://ia.cr/2024/221
RZCGP24: title: "Private Analytics via Streaming, Sketching, and Silently Verifiable Proofs" author: - ins: M. Rathee - ins: Y. Zhang - ins: H. Corrigan-Gibbs - ins: R.A. Popa date: 2024 seriesinfo: IEEE S&P 2024 target: https://eprint.iacr.org/2024/666
W3C23: title: "Network Error Logging" author: - ins: W3C Working Group date: 2023 target: https://www.w3.org/TR/network-error-logging
--- abstract
This document describes Mastic, a two-party VDAF for the following secure aggregation task: each client holds an input and an associated weight, and the data collector wants to aggregate the weights of all clients whose inputs begin with a prefix chosen by the data collector. This functionality enables two classes of applications. First, it allows grouping metrics by client attributes without revealing which clients have which attributes. Second, it solves the weighted heavy hitters problem, where the goal is to compute the subset of inputs that have the highest total weight.
--- middle
(RFC EDITOR: Remove this paragraph.) The source for this draft and the reference code can be found at https://github.com/jimouris/draft-mouris-cfrg-mastic.
The private "heavy hitters" problem is to compute the most popular input strings generated by clients without learning the inputs themselves. For example, a browser vendor might want to know which websites are visited most frequently without learning which clients visited which websites.
This problem can be solved by combining a binary search with a subroutine solving the simpler "prefix histogram" problem. The goal of this problem is to count how many of the input strings begin with each of a sequence of candidate prefixes. This problem can be solved using a Verifiable Distributed Aggregation Function, or VDAF {{!VDAF=I-D.draft-irtf-cfrg-vdaf-12}}.
The Poplar1 VDAF specified in {{Section 8 of !VDAF}} describes how to distribute this computation amongst two aggregation servers such that, as long as one server is honest, no individual's input is observed in the clear. At the same time, Poplar1 allows the servers to detect and remove any invalid measurements that would otherwise corrupt the computation.
This document describes Mastic {{MPDST25}}, a VDAF for the following, more general functionality: each client holds an input and an associated weight, and the data collector's goal is, for each candidate prefix, to aggregate the weights of all clients whose inputs have the prefix in common. This functionality gives rise to two types of applications:
- "weighted heavy hitters": Rather than compute the most frequent inputs, as in plain heavy hitters, the goal here is to compute the set of inputs with the highest total weight. For example, a browser vendor might want to know which web pages have the highest average load time, perhaps indicating a performance issue in the browser. Because weighted heavy hitters is more general, Mastic can be used as a drop-in replacement for Poplar1. It is also more efficient, requiring just one round of communication for preparation ({{Section 5.2 of !VDAF}}) compared to Poplar1's two rounds.

- "attribute-based metrics": The Prio3 VDAF ({{Section 7 of !VDAF}}) can be used for a variety of aggregation tasks, ranging from simple summary statistics, like average or standard deviation, to more sophisticated representations of data, like histograms or linear regression. In many situations, it is desirable to group such metrics by client attributes such as geolocation or user agent ({{Section 10.1.5 of ?RFC9110}}). Mastic provides this functionality without revealing any client's attribute to the aggregation servers or data collector.
The main component of Mastic is the Verifiable Incremental Distributed Point Function (VIDPF) of {{MST24}}. VIDPF extends IDPF ({{Section 8.3 of !VDAF}}), the main building block of Poplar1. Both IDPF and VIDPF are a form of function secret sharing {{BGI15}}, where a client generates shares of a secret function `F` such that each server can compute shares of `F(X)` for a chosen `X`. In our case, the function being shared is associated with a secret input string `alpha` and weight `beta` for which `F(X) = beta` for every prefix `X` of `alpha` and `F(X) = 0` for every `X` that is not a prefix of `alpha`. The scheme is verifiable in the sense that, for any two candidate prefixes of the same length, the servers can verify that at most one of them evaluates to `beta` and the other(s) evaluate(s) to `0`.
Mastic combines VIDPF with a method for checking that `beta` itself is a valid weight. For example, if the weights represent page load times, it is important to make sure each weight is within a sensible range, say within seconds rather than hours or days. Otherwise, misbehaving clients would be able to poison the computation by reporting out-of-range values. This range check is accomplished with the Fully Linear Proof (FLP) system of {{Section 7.3 of !VDAF}}. An FLP allows properties of secret-shared data to be validated without revealing the data itself. In Mastic, the client generates an FLP of its `beta`'s validity; when the servers are ready to evaluate the VIDPF, they first compute shares of `beta` and verify the FLP, which is itself secret shared. Then the VIDPF ensures that the non-`0` output of the point function is the same for each evaluation.
This document specifies VIDPF in {{vidpf}} and the composition of VIDPF and FLP into Mastic in {{mastic}}. The appendix includes supplementary material:
- {{motivation}} discusses some use cases that motivated Mastic's functionality.

- {{additional-modes}} describes extensions and optimizations for Mastic, including a batched "preparation" ({{Section 5.2 of !VDAF}}) mode of operation that reduces communication cost, and a 3-party variant of the protocol that ensures robustness against poisoning attacks in the presence of one malicious aggregation server.
# Conventions and Definitions

{::boilerplate bcp14-tagged}
This document uses the same conventions and definitions as {{Section 2 of !VDAF}}. The following functions are as defined therein:
gen_rand(len: int) -> bytes
- TODO List all functions we use in alphabetical order.
This document uses the following terms as defined in {{!VDAF}}: "Aggregator", "Client", "Collector", "aggregate result", "aggregate share", "aggregation parameter", "batch", "input share", "measurement", "output share", "prep message", "prep share", and "report".
Mastic uses finite fields as specified in {{Section 6.1 of !VDAF}}. We usually denote a finite field by `F` and its Python class object, a subclass of `Field`, as `field: type[F]`.
An instance of Mastic is determined by a desired bit-length of the input, denoted `BITS`, and a validity circuit, an instance of `Valid` as defined in {{Section 7.3.2 of !VDAF}}. The validity circuit is used to instantiate the FLP and defines the type of the weights generated by Clients and the type of the total weight for each prefix computed by the Collector.
The Client's measurement has two components: the input string `alpha: tuple[bool, ...]` of length `BITS` and its weight. The weight's type is denoted by `W`. We use `beta: list[F]` to denote the encoded weight, obtained by invoking the FLP's encoder ({{Section 7.1.1 of !VDAF}}).
The aggregate result has type `list[R]`, where `R` is likewise a type defined by the validity circuit. Each element of this list corresponds to the total weight for one of the candidate prefixes.
Finally, Mastic uses eXtendable Output Functions (XOFs) as specified in {{Section 6.2 of !VDAF}}. Each invocation of an XOF is initialized with a domain separation tag set to `b'mastic' + to_le_bytes(VERSION, 4) + byte(usage) + ctx`, where `ctx` is the application context string (see {{Section 4.1 of !VDAF}}) and `usage` is an integer in the range `[0, 255)`. The length of `ctx` MUST be in the range `[0, 2^16 - 11)`.
NOTE This range was computed by taking the maximum size of the domain separation tag supported by both XofFixedKeyAes128 and XofTurboShake128 and subtracting the length of the prefix.
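The code in this document refers to a helper `dst()` for constructing this tag. The following is a minimal sketch, assuming `VERSION` denotes the version of this document and `to_le_bytes()` and `byte()` are as defined in {{!VDAF}}:

~~~ python
def dst(ctx: bytes, usage: int) -> bytes:
    '''
    Sketch: construct the domain separation tag from the application
    context string `ctx` and the usage constant `usage`.
    '''
    if usage not in range(255):
        raise ValueError("usage out of range")
    if len(ctx) >= 2 ** 16 - 11:
        raise ValueError("ctx too long")
    return b'mastic' + to_le_bytes(VERSION, 4) + byte(usage) + ctx
~~~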
NOTE This specification is based on {{MST24}}, which in turn draws on ideas from {{CP22}}. We don't yet have a concrete security analysis of the complete construction. Some details are likely to change as a result of such analysis.
# The Verifiable Incremental Distributed Point Function (VIDPF) {#vidpf}

| Parameter | Description | Value |
|:----------|:------------|:------|
| `KEY_SIZE: int` | the size of each VIDPF key | `XofFixedKeyAes128.SEED_SIZE` ({{Section 6.2.2 of !VDAF}}) |
| `NONCE_SIZE: int` | the size of the VIDPF nonce | `KEY_SIZE` |
| `RAND_SIZE: int` | the number of random bytes consumed by `gen()` | `2 * KEY_SIZE` |
| `BITS: int` | bit length of the input string `alpha` | set by constructor |
| `VALUE_LEN: int` | length of `beta` | set by constructor |
| `field: type[F]` | class object for the field ({{Section 6.1 of !VDAF}}) | set by constructor |
{: #vidpf-params title="VIDPF parameters."}
This section specifies the Verifiable Incremental Distributed Point Function (VIDPF) of {{MST24}}. Its parameters are summarized in {{vidpf-params}}.
VIDPF is a function secret sharing scheme {{BGI15}} for functions `F` for which:

- `F(X) = [field(1)] + beta` if `X` is a prefix of `alpha`
- `F(X) = field.zeros(VALUE_LEN+1)` if `X` is not a prefix of `alpha`

where `alpha` and `beta` are the input and encoded weight of a Client. The scheme is designed to allow each Aggregator to compute a share of `F(X)` for any candidate prefix `X` without revealing any information about `alpha` or `beta`. Furthermore, the output shares can be aggregated locally, allowing each Aggregator to compute a share of the total weight for all inputs that have `X` as a prefix.
Along with the encoded weight `beta`, the output payload is prefixed with a counter, `field(1)`, so that the total weight also includes the prefix count. This allows the Collector to take the count into account when decoding the aggregate result for each prefix, as required by {{Section 7.1.1 of !VDAF}}.
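For concreteness, the function being secret shared has the following plaintext semantics (a reference sketch only; the scheme never computes `F` in the clear):

~~~ python
def f(self,
      alpha: tuple[bool, ...],
      beta: list[F],
      x: tuple[bool, ...]) -> list[F]:
    '''
    Reference semantics of `F` (sketch): prefixes of `alpha` map to
    the counter followed by the encoded weight; every other input
    maps to the zero vector.
    '''
    if x == alpha[:len(x)]:  # `x` is a prefix of `alpha`
        return [self.field(1)] + beta
    return self.field.zeros(self.VALUE_LEN + 1)
~~~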
The Aggregators evaluate a Client's VIDPF on a sequence of candidate prefixes. Imagine arranging these prefixes in a binary tree, where each path from the root corresponds to a prefix `X` and each node corresponds to a payload `F(X)`. We refer to this as the "prefix tree".
When the Aggregators evaluate a Client's VIDPF, they verify three properties of the prefix tree:
- One-hotness: at every level of the tree, at most one node has a non-zero payload.

- Path consistency: each payload is equal to the sum of the payloads of its children. If one-hotness holds, then this ensures the payload is equal to `[field(1)] + beta` for each node along the `alpha` path.

- Counter consistency: the counter of the non-zero payload is equal to `field(1)`.
VIDPF comprises two algorithms.
The key generation algorithm defined in {{vidpf-key-gen}} takes in a pair `(alpha, beta)` and a nonce and outputs secret shares of `F`. The shares take the form of a pair of "keys", one for each Aggregator, and a sequence of "correction words" sent to both Aggregators. We define correction words in the next section.
The key evaluation algorithm defined in {{vidpf-key-eval}} takes in the correction words, the Aggregator's key, the sequence of candidate prefixes, and the nonce associated with the Client's report. It outputs secret shares of `beta`, the share of the payload for each prefix, and a byte string known as the "evaluation proof". To verify one-hotness, path consistency, and counter consistency, the Aggregators check that the proofs they computed are equal.
## Key Generation {#vidpf-key-gen}

The VIDPF-key generation algorithm run by each Client is listed below. The specification invokes auxiliary functions defined in {{vidpf-aux}}. Its inputs are the input string `alpha`, the encoded weight `beta`, the application context string `ctx` ({{Section 4.1 of !VDAF}}), a public nonce of length `NONCE_SIZE`, and the randomness consumed by the algorithm, of length `RAND_SIZE`. Its outputs are the public sequence of "correction words", one for each level of the tree, and the secret keys, one for each Aggregator.
TODO Give a high-level overview of how IDPF works, in particular the seed/control-bit invariant for each level. Define `CorrectionWord`, explain the role of correction words, and define node proofs, which are unique to VIDPF.

TODO Specify bounds on the inputs, namely that the nonce has to be random. (This is inherited from IDPF security considerations.)

TODO Explain functional differences between VIDPF and IDPF in {{Section 8 of !VDAF}}. Namely, there is no distinction between inner and leaf nodes, and the payload is supposed to be the same at each level.
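Pending that overview, the following type alias sketches the shape of a correction word as assembled by `gen()` below (the component names are illustrative):

~~~ python
# One correction word per level of the tree, comprising the seed
# correction, the left and right control-bit corrections, the payload
# correction, and the node-proof correction. (`F` denotes the field
# element type.)
CorrectionWord = tuple[
    bytes,       # seed_cw
    list[bool],  # ctrl_cw, one bool per child
    list[F],     # w_cw
    bytes,       # proof_cw
]
~~~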
def gen(self,
alpha: tuple[bool, ...],
beta: list[F],
ctx: bytes,
nonce: bytes,
rand: bytes,
) -> tuple[list[CorrectionWord], list[bytes]]:
'''
The VIDPF key generation algorithm.
Returns the public share (i.e., the correction word for each
level of the tree) and two keys, one for each aggregator.
Implementation note: for clarity, this algorithm has not been
written in a manner that is side-channel resistant. To avoid
leaking `alpha` via a side-channel, implementations should avoid
branching or indexing into arrays in a data-dependent manner.
'''
if len(alpha) != self.BITS:
raise ValueError("alpha out of range")
if len(beta) != self.VALUE_LEN:
raise ValueError("incorrect beta length")
if len(nonce) != self.NONCE_SIZE:
raise ValueError("incorrect nonce size")
if len(rand) != self.RAND_SIZE:
raise ValueError("randomness has incorrect length")
keys = [rand[:self.KEY_SIZE], rand[self.KEY_SIZE:]]
# [MST24, Fig. 15]: s0^0, s1^0, t0^0, t1^0
seed = keys.copy()
ctrl = [False, True]
correction_words = []
for i in range(self.BITS):
idx = PrefixTreeIndex(alpha[:i+1])
bit = alpha[i]
# [MST24]: if x = 0 then keep <- L, lose <- R
#
        # Implementation note: the value of `bit` is
        # `alpha`-dependent.
(keep, lose) = (1, 0) if bit else (0, 1)
        # Extend: compute the left and right children at the current
# level of the tree. During evaluation, one of these children
# will be selected as the next seed and control bit.
#
# [MST24]: s_0^L || s_0^R || t_0^L || t_0^R
# s_1^L || s_1^R || t_1^L || t_1^R
(s0, t0) = self.extend(seed[0], ctx, nonce)
(s1, t1) = self.extend(seed[1], ctx, nonce)
# Compute the correction words for this level's seed and
# control bit. Our goal is to maintain the following
# invariant, after correction:
#
# * If evaluation is on path, then each aggregator will
# compute a different seed and their control bits will be
# secret shares of one.
#
# * If evaluation is off path, then the aggregators will
# compute the same seed and their control bits will be
# shares of zero.
#
# Implementation note: the index `lose` is `alpha`-dependent.
seed_cw = xor(s0[lose], s1[lose])
ctrl_cw = [
t0[0] ^ t1[0] ^ (not bit), # [MST24]: t_c^L
t0[1] ^ t1[1] ^ bit, # [MST24]: t_c^R
]
# Correct.
#
# Implementation note: the index `keep` is `alpha`-dependent,
# as is `ctrl`.
if ctrl[0]:
s0[keep] = xor(s0[keep], seed_cw)
t0[keep] ^= ctrl_cw[keep]
if ctrl[1]:
s1[keep] = xor(s1[keep], seed_cw)
t1[keep] ^= ctrl_cw[keep]
# Convert.
(seed[0], w0) = self.convert(s0[keep], ctx, nonce)
(seed[1], w1) = self.convert(s1[keep], ctx, nonce)
ctrl[0] = t0[keep] # [MST24]: t0'
ctrl[1] = t1[keep] # [MST24]: t1'
# Compute the correction word for this level's payload.
#
# Implementation note: `ctrl` is `alpha`-dependent.
w_cw = vec_add(vec_sub([self.field(1)] + beta, w0), w1)
if ctrl[1]:
w_cw = vec_neg(w_cw)
# Compute the correction word for this level's node proof. If
        # evaluation is on path, then exactly one of the aggregators
# will correct their node proof, causing them to compute the
# same node value. If evaluation is off path, then both will
# correct or neither will; and since they compute the same
# seed, they will again compute the same value.
proof_cw = xor(
self.node_proof(seed[0], ctx, idx),
self.node_proof(seed[1], ctx, idx),
)
correction_words.append((seed_cw, ctrl_cw, w_cw, proof_cw))
return (correction_words, keys)
## Key Evaluation {#vidpf-key-eval}

The VIDPF-key evaluation algorithm is listed below. See {{vidpf-aux}} for deferred auxiliary functions. Its inputs are the Aggregator's ID (either `0` or `1`), the correction words, the Aggregator's key, the level of the tree, the sequence of prefixes, and the nonce associated with the report. Its outputs are the Aggregator's share of `beta`, the sequence of output shares for each prefix, and the evaluation proof.
TODO Provide an overview and define `PrefixTreeIndex` and `PrefixTreeEntry`. Explain how the evaluation proof is constructed.
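Pending that overview, the sketch below shows one plausible shape for these two types, inferred from how they are used by `eval()` and `eval_next()`; the exact encoding is illustrative:

~~~ python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrefixTreeIndex:
    '''Sketch: index of a node in the prefix tree.'''
    path: tuple[bool, ...]

    def left_child(self) -> 'PrefixTreeIndex':
        return PrefixTreeIndex(self.path + (False,))

    def right_child(self) -> 'PrefixTreeIndex':
        return PrefixTreeIndex(self.path + (True,))

    def level(self) -> int:
        return len(self.path) - 1

    def encode(self) -> bytes:
        # Pack the path into bytes, most significant bit first.
        encoded = bytearray((len(self.path) + 7) // 8)
        for (i, bit) in enumerate(self.path):
            encoded[i // 8] |= bit << (7 - i % 8)
        return bytes(encoded)

@dataclass
class PrefixTreeEntry:
    '''Sketch: per-node state computed during evaluation.'''
    seed: bytes
    ctrl: bool
    w: list  # payload share, of type list[F]

    @classmethod
    def root(cls, key: bytes, ctrl: bool) -> 'PrefixTreeEntry':
        return cls(key, ctrl, [])
~~~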
def eval(self,
agg_id: int,
correction_words: list[CorrectionWord],
key: bytes,
level: int,
prefixes: tuple[tuple[bool, ...], ...],
ctx: bytes,
nonce: bytes,
) -> tuple[list[F], list[list[F]], bytes]:
"""
The VIDPF key evaluation algorithm.
Return the aggregator's share of `beta`, its output share for
each prefix, and its evaluation proof.
"""
if agg_id not in range(2):
raise ValueError("invalid aggregator ID")
if len(correction_words) != self.BITS:
raise ValueError("corrections words has incorrect length")
if level not in range(self.BITS):
raise ValueError("level too deep")
for prefix in prefixes:
if len(prefix) != level + 1:
raise ValueError("prefix with incorrect length")
if len(set(prefixes)) != len(prefixes):
raise ValueError("candidate prefixes are non-unique")
# Evaluate our share of the prefix tree and compute the path
# proof.
#
# Implementation note: we can save computation by storing
# `prefix_tree_share` across `eval()` calls for the same report.
prefix_tree_share: dict[PrefixTreeIndex, PrefixTreeEntry] = {}
root = PrefixTreeEntry.root(key, bool(agg_id))
onehot_proof = ONEHOT_PROOF_INIT
for i in range(level+1):
for prefix in prefixes:
# Compute the entry for `prefix`. Set `idx` to the
# index of its parent.
idx = PrefixTreeIndex(prefix[:i])
node = prefix_tree_share.setdefault(idx, root)
for child_idx in [idx.left_child(), idx.right_child()]:
# Compute the entry for `prefix` and its sibling. The
# sibling is used for the counter and payload checks.
if not prefix_tree_share.get(child_idx):
(child, onehot_proof) = self.eval_next(
node,
onehot_proof,
correction_words[i],
ctx,
nonce,
child_idx,
)
prefix_tree_share[child_idx] = child
# Compute the aggregator's share of `beta`.
w0 = prefix_tree_share[PrefixTreeIndex((False,))].w
w1 = prefix_tree_share[PrefixTreeIndex((True,))].w
beta_share = vec_add(w0, w1)[1:]
if agg_id == 1:
beta_share = vec_neg(beta_share)
# Counter check: check that the first element of the payload is
# equal to 1.
#
# Each aggregator holds an additive share of the counter, so we
# have aggregator 1 negate its share and add 1 so that they both
# compute the same value for `counter`.
counter_check = self.field.encode_vec(
[w0[0] + w1[0] + self.field(agg_id)])
# Payload check: for each node, check that the payload is equal
# to the sum of its children.
payload_check = b''
for prefix in prefixes:
for i in range(level):
idx = PrefixTreeIndex(prefix[:i+1])
w = prefix_tree_share[idx].w
w0 = prefix_tree_share[idx.left_child()].w
w1 = prefix_tree_share[idx.right_child()].w
payload_check += self.field.encode_vec(
vec_sub(w, vec_add(w0, w1)))
# Compute the Aggregator's output share.
out_share = []
for prefix in prefixes:
idx = PrefixTreeIndex(prefix)
w = prefix_tree_share[idx].w
out_share.append(w if agg_id == 0 else vec_neg(w))
# Compute the evaluation proof. If both aggregators compute the
# same value, then they agree on the onehot proof, the counter,
# and the payload.
proof = eval_proof(
ctx, onehot_proof, counter_check, payload_check)
return (beta_share, out_share, proof)
def eval_next(self,
node: PrefixTreeEntry,
onehot_proof: bytes,
correction_word: CorrectionWord,
ctx: bytes,
nonce: bytes,
idx: PrefixTreeIndex,
) -> tuple[PrefixTreeEntry, bytes]:
"""
Extend a node in the tree, select and correct one of its
children, then convert it into a payload and the next seed.
"""
(seed_cw, ctrl_cw, w_cw, proof_cw) = correction_word
keep = int(idx.path[-1])
# Extend.
#
# [MST24, Fig. 17]: (s^L, s^R), (t^L, t^R) = PRG(s^{i-1})
(s, t) = self.extend(node.seed, ctx, nonce)
# Correct.
#
    # Implementation note: avoid branching on the value of the control
    # bits, as their values may be leaked by a side channel.
if node.ctrl:
s[keep] = xor(s[keep], seed_cw)
t[keep] ^= ctrl_cw[keep]
# Convert and correct the payload.
#
# Implementation note: the conditional addition should be
# replaced with a constant-time select in practice in order to
# reduce leakage via timing side channels.
(next_seed, w) = self.convert(s[keep], ctx, nonce)
next_ctrl = t[keep] # [MST24]: s^i, W^i, t'^i
if next_ctrl:
w = vec_add(w, w_cw)
# Compute and correct the node proof and update the onehot proof.
# Each update resembles a step of Merkle-Damgard compression. The
# main difference is that we XOR each block (i.e., corrected node
# proof) with the previous hash (or IV) rather than compress.
#
# corrected node proof
# |
# |
# v
# current +-----+ +------+ +-----+ updated
# proof --+-->| XOR |---->| Hash |---->| XOR |----> proof
# | +-----+ +------+ +-----+
# | ^
# | |
# +-------------------------------+
#
    # [MST24]: \tilde\pi = H_1(x^{<=i} || s^i)
    #          \pi = \tilde\pi \xor
    #              H_2(\pi \xor (\tilde\pi \xor t^i \cdot cs^i))
#
# Implementation note: avoid branching on the control bit here.
node_proof = self.node_proof(next_seed, ctx, idx)
if next_ctrl:
node_proof = xor(node_proof, proof_cw)
onehot_proof = xor(
onehot_proof, hash_proof(ctx, xor(onehot_proof, node_proof)))
return (PrefixTreeEntry(next_seed, next_ctrl, w), onehot_proof)
## Auxiliary Functions {#vidpf-aux}

def extend(self,
seed: bytes,
ctx: bytes,
nonce: bytes,
) -> tuple[list[bytes], Ctrl]:
'''
Extend a seed into the seed and control bits for its left and
right children in the VIDPF tree.
'''
xof = XofFixedKeyAes128(seed, dst(ctx, USAGE_EXTEND), nonce)
s = [
bytearray(xof.next(self.KEY_SIZE)),
bytearray(xof.next(self.KEY_SIZE)),
]
    # Use the least significant bit of each seed as its control bit,
    # and then zero it out. This gives effectively 127 bits of
    # security, but reduces the number of AES calls needed by 1/3.
t = [bool(s[0][0] & 1), bool(s[1][0] & 1)]
s[0][0] &= 0xFE
s[1][0] &= 0xFE
return ([bytes(s[0]), bytes(s[1])], t)
def convert(self,
seed: bytes,
ctx: bytes,
nonce: bytes,
) -> tuple[bytes, list[F]]:
'''
Convert a selected seed into a payload and the seed for the next
level.
'''
xof = XofFixedKeyAes128(seed, dst(ctx, USAGE_CONVERT), nonce)
next_seed = xof.next(XofFixedKeyAes128.SEED_SIZE)
payload = xof.next_vec(self.field, 1+self.VALUE_LEN)
return (next_seed, payload)
def node_proof(self,
seed: bytes,
ctx: bytes,
idx: PrefixTreeIndex) -> bytes:
'''
Compute the proof for this node.
'''
binder = \
to_le_bytes(self.BITS, 2) + \
to_le_bytes(idx.level(), 2) + \
idx.encode()
xof = XofTurboShake128(seed,
dst(ctx, USAGE_NODE_PROOF),
binder)
return xof.next(PROOF_SIZE)
def hash_proof(ctx: bytes, proof: bytes) -> bytes:
xof = XofTurboShake128(b'',
dst(ctx, USAGE_ONEHOT_PROOF_HASH),
proof)
return xof.next(PROOF_SIZE)
def eval_proof(ctx: bytes,
onehot_proof: bytes,
counter_check: bytes,
payload_check: bytes) -> bytes:
binder = onehot_proof + counter_check + payload_check
xof = XofTurboShake128(b'', dst(ctx, USAGE_EVAL_PROOF), binder)
return xof.next(PROOF_SIZE)
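To illustrate how these pieces fit together, the following hedged sketch runs key generation and evaluation end to end; it assumes a `Vidpf` class exposing the parameters of {{vidpf-params}}, a constructor taking the field and the `BITS` and `VALUE_LEN` parameters, and the field `Field64` of {{!VDAF}}:

~~~ python
# Sketch: share (alpha, beta), evaluate both shares at level 0, and
# check that the shares recombine as expected.
vidpf = Vidpf(Field64, 2, 1)  # BITS=2, VALUE_LEN=1 (assumed ctor)
ctx = b'example application'
nonce = gen_rand(vidpf.NONCE_SIZE)
alpha = (False, True)
beta = [Field64(7)]

(correction_words, keys) = vidpf.gen(
    alpha, beta, ctx, nonce, gen_rand(vidpf.RAND_SIZE))

prefixes = ((False,), (True,))  # both candidate prefixes at level 0
(beta_share_0, out_0, proof_0) = vidpf.eval(
    0, correction_words, keys[0], 0, prefixes, ctx, nonce)
(beta_share_1, out_1, proof_1) = vidpf.eval(
    1, correction_words, keys[1], 0, prefixes, ctx, nonce)

assert proof_0 == proof_1  # honest evaluation verifies
assert vec_add(beta_share_0, beta_share_1) == beta
# `alpha` begins with 0, so the first prefix is on path and the
# second is off path:
assert vec_add(out_0[0], out_1[0]) == [Field64(1)] + beta
assert vec_add(out_0[1], out_1[1]) == Field64.zeros(2)
~~~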
# Mastic {#mastic}

Mastic combines the VIDPF from {{vidpf}} with the FLP from {{Section 7.3 of !VDAF}}. An instance of Mastic is determined by a choice of length of the input, denoted `BITS`, and a validity circuit, an instance of `Valid` as defined in {{Section 7.3.2 of !VDAF}}.
The validity circuit determines the type of the weights submitted by Clients, denoted by `W`; the type of the total weight computed by the Collector, denoted by `R`; and the field ({{Section 6.1 of !VDAF}}) in which the weights are encoded and aggregated. The field is denoted by `F`.
The VIDPF is instantiated with bit length `BITS`, value length `valid.MEAS_LEN`, and field `valid.field`, where `valid` is the validity circuit. We denote this instance of the VIDPF by `vidpf`.
In the remainder, we write `xof` as shorthand for `XofTurboShake128` ({{Section 6.2.1 of !VDAF}}).
Mastic's implementation of the VDAF interface ({{Section 5 of !VDAF}}) is specified in the following sections. {{mastic-aux}} defines some auxiliary functions referenced in the sharding and preparation sections.
## Sharding

The sharding algorithm takes in the measurement (the input and weight), the nonce, and the sharding randomness. The size of the nonce is `16` bytes; the size of the randomness is `vidpf.RAND_SIZE + 2 * xof.SEED_SIZE`, plus an additional `xof.SEED_SIZE` if the validity circuit takes joint randomness as input.
The public share, denoted `MasticPublicShare`, is a tuple comprised of the list of correction words generated by the VIDPF key generation algorithm and the FLP "joint randomness parts" (defined below) used during preparation to compute the joint randomness.
The contents of each input share, denoted `MasticInputShare`, depend on the Aggregator who receives it. We refer to the first Aggregator as the "Leader"; we refer to the second Aggregator as the "Helper". The components of the input share are:

- The Aggregator's VIDPF key share.

- An optional FLP proof share, a vector of field elements. This is set for the Leader only.

- An optional seed. This is always set for the Helper, who uses it to derive its FLP proof share. It is set for the Leader only if the circuit uses joint randomness.
The behavior of the sharding algorithm depends on whether the circuit requires joint randomness:
def shard(self,
ctx: bytes,
measurement: tuple[tuple[bool, ...], W],
nonce: bytes,
rand: bytes,
) -> tuple[MasticPublicShare, list[MasticInputShare]]:
if self.flp.JOINT_RAND_LEN > 0:
return self.shard_with_joint_rand(
ctx, measurement, nonce, rand)
return self.shard_without_joint_rand(
ctx, measurement, nonce, rand)
When no FLP joint randomness is required, sharding involves the following steps:

- Encode the weight as `beta`.
- Generate the VIDPF correction words and keys for `alpha` and `beta`.
- Generate the FLP proof of `beta`'s validity.
- Compute the Leader's share of the proof.
The complete algorithm is listed below:
def shard_without_joint_rand(
self,
ctx: bytes,
measurement: tuple[tuple[bool, ...], W],
nonce: bytes,
rand: bytes,
) -> tuple[MasticPublicShare, list[MasticInputShare]]:
(vidpf_rand, rand) = front(self.vidpf.RAND_SIZE, rand)
(prove_rand_seed, rand) = front(self.xof.SEED_SIZE, rand)
(helper_seed, rand) = front(self.xof.SEED_SIZE, rand)
(alpha, weight) = measurement
beta = self.flp.encode(weight)
# Generate VIDPF keys.
(correction_words, keys) = \
self.vidpf.gen(alpha, beta, ctx, nonce, vidpf_rand)
# Generate FLP and split it into shares.
prove_rand = self.prove_rand(ctx, prove_rand_seed)
proof = self.flp.prove(beta, prove_rand, [])
helper_proof_share = self.helper_proof_share(ctx, helper_seed)
leader_proof_share = vec_sub(proof, helper_proof_share)
public_share = (correction_words, None)
input_shares = [
(keys[0], leader_proof_share, None),
(keys[1], None, helper_seed),
]
return (public_share, input_shares)
When FLP joint randomness is required, the Client must compute it from the shares of `beta` sent to each Aggregator:

- Encode the weight as `beta`.
- Generate the VIDPF correction words and keys for `alpha` and `beta`.
- Compute each Aggregator's FLP joint randomness part by hashing its share of `beta` with its FLP seed, its VIDPF key, and the VIDPF correction words.
- Compute the FLP joint randomness seed by hashing the joint randomness parts.
- Derive the FLP joint randomness from the joint randomness seed.
- Generate the FLP proof of `beta`'s validity using the derived joint randomness.
- Compute the Leader's share of the proof.
The joint randomness is also needed to verify the FLP and must therefore be recomputed during preparation ({{preparation}}). The Client includes the parts in the public share for this purpose.
The complete algorithm is listed below:
def shard_with_joint_rand(
self,
ctx: bytes,
measurement: tuple[tuple[bool, ...], W],
nonce: bytes,
rand: bytes,
) -> tuple[MasticPublicShare, list[MasticInputShare]]:
(vidpf_rand, rand) = front(self.vidpf.RAND_SIZE, rand)
(prove_rand_seed, rand) = front(self.xof.SEED_SIZE, rand)
(leader_seed, rand) = front(self.xof.SEED_SIZE, rand)
(helper_seed, rand) = front(self.xof.SEED_SIZE, rand)
(alpha, weight) = measurement
beta = self.flp.encode(weight)
# Generate VIDPF keys.
(correction_words, keys) = \
self.vidpf.gen(alpha, beta, ctx, nonce, vidpf_rand)
# Generate FLP joint randomness.
joint_rand_parts = [
self.joint_rand_part(
ctx, 0, leader_seed, keys[0], correction_words, nonce),
self.joint_rand_part(
ctx, 1, helper_seed, keys[1], correction_words, nonce),
]
joint_rand = self.joint_rand(
ctx, self.joint_rand_seed(ctx, joint_rand_parts))
# Generate FLP and split it into shares.
prove_rand = self.prove_rand(ctx, prove_rand_seed)
proof = self.flp.prove(beta, prove_rand, joint_rand)
helper_proof_share = self.helper_proof_share(ctx, helper_seed)
leader_proof_share = vec_sub(proof, helper_proof_share)
public_share = (correction_words, joint_rand_parts)
input_shares = [
(keys[0], leader_proof_share, leader_seed),
(keys[1], None, cast(Optional[bytes], helper_seed)),
]
return (public_share, input_shares)
## Preparation {#preparation}

Each Aggregator initializes preparation with: the verification key shared by both Aggregators; its own ID, either `0` for the Leader or `1` for the Helper; the aggregation parameter; the report's nonce; the public share sent to each Aggregator; and the Aggregator's own input share.
The aggregation parameter has the following components:
- the level of the VIDPF being evaluated
- the sequence of VIDPF prefixes being evaluated
- an indication of whether to verify the FLP
The FLP is verified exactly once, the first time the report is aggregated. See {{agg-param-validity}}.
The outputs of the initialization algorithm include the Aggregator's prep state, denoted `MasticPrepState`, and its outbound prep share, denoted `MasticPrepShare`. The prep share includes the Aggregator's FLP verifier share, joint randomness part, and VIDPF proof. These are combined into the prep message in the next step.
Preparation initialization involves the following steps:

- Evaluate the VIDPF share on the sequence of prefixes, obtaining our share of each corresponding payload. This step also produces our share of `beta`, to be used to verify the FLP, and the VIDPF proof.

- If applicable, run the FLP query-generation algorithm on our share of `beta` and our proof share to obtain our FLP verifier share. If joint randomness is required, then compute our joint randomness part and derive the joint randomness seed using our co-Aggregator's part provided by the Client. Note that the Client may have provided the wrong part, so we need to check that the seed was computed correctly before completing preparation.

- Truncate each payload share according to the FLP encoding scheme and flatten the results into a single vector of field elements. This constitutes Mastic's output share.
The complete algorithm is listed below:
def prep_init(
self,
verify_key: bytes,
ctx: bytes,
agg_id: int,
agg_param: MasticAggParam,
nonce: bytes,
public_share: MasticPublicShare,
input_share: MasticInputShare,
) -> tuple[MasticPrepState, MasticPrepShare]:
(level, prefixes, do_weight_check) = agg_param
(key, proof_share, seed) = \
self.expand_input_share(ctx, agg_id, input_share)
(correction_words, joint_rand_parts) = public_share
# Evaluate the VIDPF.
(beta_share, out_share, eval_proof) = self.vidpf.eval(
agg_id,
correction_words,
key,
level,
prefixes,
ctx,
nonce,
)
# Query the FLP if applicable.
joint_rand_part = None
joint_rand_seed = None
verifier_share = None
if do_weight_check:
query_rand = self.query_rand(verify_key, ctx, nonce, level)
joint_rand = []
if self.flp.JOINT_RAND_LEN > 0:
assert seed is not None
assert joint_rand_parts is not None
joint_rand_part = self.joint_rand_part(
ctx, agg_id, seed, key, correction_words, nonce)
joint_rand_parts[agg_id] = joint_rand_part
joint_rand_seed = self.joint_rand_seed(
ctx, joint_rand_parts)
joint_rand = self.joint_rand(
ctx, self.joint_rand_seed(ctx, joint_rand_parts))
verifier_share = self.flp.query(
beta_share,
proof_share,
query_rand,
joint_rand,
2,
)
# Concatenate the output shares into one aggregatable output,
# applying the FLP truncation algorithm on each FLP measurement
# share.
truncated_out_share = []
for val_share in out_share:
truncated_out_share += [val_share[0]] + \
self.flp.truncate(val_share[1:])
prep_state = (truncated_out_share, joint_rand_seed)
prep_share = (eval_proof, verifier_share, joint_rand_part)
return (prep_state, prep_share)
Next, the Aggregators' prep shares are combined into the prep message, denoted `MasticPrepMessage`:

- Check that both Aggregators computed the same VIDPF proof. If so, then it is presumed that the output share is one-hot, has path consistency, and has counter consistency as defined in {{vidpf}}.

- If applicable, combine the FLP verifier shares into the FLP verifier and run the FLP decision algorithm. If successful, then it is presumed that the weight is valid.

- If applicable, compute the FLP joint randomness seed from the parts.
The prep message consists of the optional joint randomness seed. The complete algorithm is listed below:
def prep_shares_to_prep(
self,
ctx: bytes,
agg_param: MasticAggParam,
prep_shares: list[MasticPrepShare],
) -> MasticPrepMessage:
(_level, _prefixes, do_weight_check) = agg_param
if len(prep_shares) != 2:
raise ValueError('unexpected number of prep shares')
(eval_proof_0,
verifier_share_0,
joint_rand_part_0) = prep_shares[0]
(eval_proof_1,
verifier_share_1,
joint_rand_part_1) = prep_shares[1]
# Verify the VIDPF output.
if eval_proof_0 != eval_proof_1:
raise Exception('VIDPF verification failed')
if not do_weight_check:
return None
if verifier_share_0 is None or verifier_share_1 is None:
raise ValueError('expected FLP verifier shares')
# Verify the FLP.
verifier = vec_add(verifier_share_0, verifier_share_1)
if not self.flp.decide(verifier):
raise Exception('FLP verification failed')
if self.flp.JOINT_RAND_LEN == 0:
return None
if joint_rand_part_0 is None or joint_rand_part_1 is None:
raise ValueError('expected FLP joint randomness parts')
# Confirm the FLP joint randomness was computed properly.
prep_msg = self.joint_rand_seed(ctx, [
joint_rand_part_0,
joint_rand_part_1,
])
return prep_msg
Finally, each Aggregator completes preparation by checking that the true FLP
joint randomness seed is equal to the value they computed in the initialization
step, prep_init()
. This is only done if a weight check was required by the
aggregation parameter and joint randomness was required by the FLP:
def prep_next(self,
_ctx: bytes,
prep_state: MasticPrepState,
prep_msg: MasticPrepMessage,
) -> list[F]:
(truncated_out_share, joint_rand_seed) = prep_state
if joint_rand_seed is not None:
if prep_msg is None:
raise ValueError('expected joint rand confirmation')
if prep_msg != joint_rand_seed:
raise Exception('joint rand confirmation failed')
return truncated_out_share
## Validity of Aggregation Parameters {#agg-param-validity}

To guarantee secure execution of Mastic, care must be taken in choosing the VIDPF prefixes and whether to verify the FLP. In particular, it is only safe to consume the FLP once; and it is only safe to evaluate the VIDPF at most once at any given level of the tree.
NOTE By "safe" we mean "covered by the analysis of {{MPDST25}}". It could be that we have a little more wiggle room, but we're not certain. If we find matching attacks, we should mention them in {{security-considerations}}.
We further restrict aggregation by requiring that the level strictly increases at each step:
def is_valid(self,
agg_param: MasticAggParam,
previous_agg_params: list[MasticAggParam],
) -> bool:
(level, _prefixes, do_weight_check) = agg_param
# Check that the weight check is done exactly once.
weight_checked = \
(do_weight_check and len(previous_agg_params) == 0) or \
(not do_weight_check and
any(agg_param[2] for agg_param in previous_agg_params))
# Check that the level is strictly increasing.
level_increased = len(previous_agg_params) == 0 or \
level > previous_agg_params[-1][0]
return weight_checked and level_increased
## Aggregation

Each output share consists of the truncated payload for each VIDPF prefix, flattened into a single vector. Aggregation involves simply adding these up:
def agg_init(self, agg_param: MasticAggParam) -> list[F]:
(_level, prefixes, _do_weight_check) = agg_param
agg = self.field.zeros(len(prefixes)*(1+self.flp.OUTPUT_LEN))
return agg
def agg_update(self,
agg_param: MasticAggParam,
agg_share: list[F],
out_share: list[F]) -> list[F]:
return vec_add(agg_share, out_share)
def merge(self,
agg_param: MasticAggParam,
agg_shares: list[list[F]]) -> list[F]:
(_level, prefixes, _do_weight_check) = agg_param
agg = self.agg_init(agg_param)
for agg_share in agg_shares:
agg = vec_add(agg, agg_share)
return cast(list[F], agg)
## Unsharding

The aggregate result consists of a list of total weights, each corresponding to one of the prefixes. To compute it:

- Add up the aggregate shares.

- For each prefix, decode the corresponding vector chunk using the FLP's decoding algorithm ({{Section 7.1.1 of !VDAF}}). This requires the prefix count, which is also encoded by the chunk.
The complete algorithm is listed below:
def unshard(self,
agg_param: MasticAggParam,
agg_shares: list[list[F]],
_num_measurements: int,
) -> list[R]:
agg = self.merge(agg_param, agg_shares)
agg_result = []
while len(agg) > 0:
(chunk, agg) = front(self.flp.OUTPUT_LEN + 1, agg)
meas_count = chunk[0].int()
agg_result.append(self.flp.decode(chunk[1:], meas_count))
return agg_result
## Auxiliary Functions {#mastic-aux}

def expand_input_share(
    self,
    ctx: bytes,
    agg_id: int,
    input_share: MasticInputShare,
) -> tuple[bytes, list[F], Optional[bytes]]:
    if agg_id == 0:
        (key, proof_share, seed) = input_share
        assert proof_share is not None
    else:
        (key, _leader_proof_share, seed) = input_share
        assert seed is not None
        proof_share = self.helper_proof_share(ctx, seed)
    return (key, proof_share, seed)

def helper_proof_share(self, ctx: bytes, seed: bytes) -> list[F]:
    return self.xof.expand_into_vec(
        self.field,
        seed,
        dst(ctx, USAGE_PROOF_SHARE),
        b'',
        self.flp.PROOF_LEN,
    )

def prove_rand(self, ctx: bytes, seed: bytes) -> list[F]:
    return self.xof.expand_into_vec(
        self.field,
        seed,
        dst(ctx, USAGE_PROVE_RAND),
        b'',
        self.flp.PROVE_RAND_LEN,
    )

def joint_rand_part(
    self,
    ctx: bytes,
    agg_id: int,
    seed: bytes,
    key: bytes,
    correction_words: list[CorrectionWord],
    nonce: bytes,
) -> bytes:
    pub = self.vidpf.encode_public_share(correction_words)
    return self.xof.derive_seed(
        seed,
        dst(ctx, USAGE_JOINT_RAND_PART),
        byte(agg_id) + nonce + key + pub,
    )

def joint_rand_seed(self, ctx: bytes, parts: list[bytes]) -> bytes:
    return self.xof.derive_seed(
        zeros(self.xof.SEED_SIZE),
        dst(ctx, USAGE_JOINT_RAND_SEED),
        concat(parts),
    )

def joint_rand(self, ctx: bytes, seed: bytes) -> list[F]:
    return self.xof.expand_into_vec(
        self.field,
        seed,
        dst(ctx, USAGE_JOINT_RAND),
        b'',
        self.flp.JOINT_RAND_LEN,
    )

def query_rand(
    self,
    verify_key: bytes,
    ctx: bytes,
    nonce: bytes,
    level: int,
) -> list[F]:
    return self.xof.expand_into_vec(
        self.field,
        verify_key,
        dst(ctx, USAGE_QUERY_RAND),
        nonce + to_le_bytes(level, 2),
        self.flp.QUERY_RAND_LEN,
    )
# Security Considerations {#security-considerations}

Mastic inherits its security considerations from {{Section 9 of !VDAF}}. A security analysis of Mastic is provided in {{MPDST25}}.
TODO Contrast with Poplar1, especially {{Section 9.4.2 of !VDAF}} ("Safe Usage of IDPF Outputs"). In particular it's perfectly safe to use Mastic's intermediate outputs.
TODO
--- back
{:numbered="false"}
TODO
# Motivation {#motivation}
{:numbered="false"}
The design of Mastic is informed primarily by two use cases, which we describe here.
## Network Error Logging {#NEL}
{:numbered="false"}
Network Error Logging (NEL) is a mechanism used by web browsers to report errors that occur while attempting to establish a connection to a server {{W3C23}}. Some of these errors are visible to the server, but not all: failures in DNS, TCP, TLS, and HTTP can occur without the server having any visibility into the issue. A small number of connection errors is expected, even under normal operating conditions; but a sudden, substantial increase in errors may be an indication of an outage, or a configuration issue impacting millions of users. Without a reporting mechanism like NEL, these events would only manifest in the server's telemetry as a drop in overall traffic.
NEL is particularly important for content delivery networks that handle HTTP traffic for a large number of websites (typically millions). A content delivery network acts as a reverse proxy between clients and origin servers that provides a layer of caching and security services, such as DDoS protection.
Reports are comprised of the URL the client attempted to navigate to (e.g., "https://example.com"), the type of error that occurred, and metadata related to the attempt, such as the time that elapsed between when the connection attempt began and the error was observed (e.g., Section 7 of {{W3C23}}). Clients may also report successful connection attempts to give the server a sense of the error rate. The exact client behavior is determined by the reporting policy specified by the server (see Section 5.1 of {{W3C23}}).
NEL data is privacy-sensitive for two reasons. First, it exposes information that the server would not otherwise have access to, which can be abused to probe the client's network configuration as described in Section 9 of {{W3C23}}. Second, for operational reasons, the reporting endpoint may be organizationally separated from the server (i.e., run on different cloud infrastructures), leading to an increased risk of the client's browsing history being exposed (e.g., in a data breach).
MPC helps mitigate these risks by revealing to the endpoint only the information it needs to fulfill its service level objectives. This means, of course, we must be satisfied with limited functionality. Fortunately, Mastic allows us to preserve the most important functionality of NEL while minimizing privacy loss.
Mastic can be applied to a simplified version of NEL where each client reports a tuple `(dom, err)` consisting of a domain name dom (e.g., "example.com") and a value err that represents an error (e.g., "dns.unreachable") or an indication that no error occurred (e.g., "ok"). Notably, this can easily be extended in Mastic to represent more elaborate metrics, e.g., where each weight includes the time it took the browser to report the error (so that the aggregate includes the average error reporting time), the user agent (browser type and version), and so on. However, our main goal is to understand 1) the distribution of errors and 2) which domains are impacted.
We expect there to be a large number of distinct domain names (millions in the case of content delivery networks) and only a small number of error variants (the NEL spec {{W3C23}} defines 30 variants). The following Mastic parameters are suitable for this application.
Each input would encode the domain `dom`, using a number of bits sufficient to uniquely represent most of the domains; and each weight would represent the error variant `err`. To compute the distribution of errors, we would encode each error variant as a distinct bucket of a histogram so that `[1, 0, 0, ...]` represents "ok", `[0, 1, 0, ...]` represents "dns.unreachable", and so on. (See Section 6 of {{W3C23}}.) This is similar to Prio3Histogram ({{Section 7 of !VDAF}}).
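For example, a hypothetical encoding along these lines (the variant list is illustrative):

~~~ python
# Hypothetical histogram encoding of NEL error variants.
ERROR_VARIANTS = ['ok', 'dns.unreachable', 'dns.name_not_resolved']

def encode_error_weight(err: str) -> list[int]:
    '''One-hot vector with a 1 in the bucket for `err`.'''
    weight = [0] * len(ERROR_VARIANTS)
    weight[ERROR_VARIANTS.index(err)] = 1
    return weight

assert encode_error_weight('dns.unreachable') == [0, 1, 0]
~~~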
## Attribute-Based Browser Telemetry {#attribute-based-telemetry}
{:numbered="false"}
Web browsers collect telemetry generated by users as they navigate the web to gain insights into trends that guide product decisions. In many cases, Prio3 ({{Section 7 of !VDAF}}) can be used to privately aggregate this telemetry. However, this comes at the cost of flexibility.
For example, Prio3 can be used to collect page load metrics from browsers for a list of known popular sites (e.g., "example.com"). The purpose of these metrics is to detect if changes to these sites cause regressions that might be correlated with an increased average load time or error rate. A subtle but important requirement for this system is the ability to break down the metrics by client attributes. Suppose, for example, that we want to aggregate by 1) the software version and 2) the client's location.
Mastic provides a simple solution to this problem. For the sake of presentation, we consider a simplified use case (the same approach can be applied to any aggregation task for which Prio3 ({{Section 7 of !VDAF}}) is suitable). Each client reports a tuple `(ver, loc, site, time)` where: `ver` is a string representing the client's software version (e.g., "Browser/122.0"); `loc` is a string encoding its country code (e.g., "GR", "US", "IN", etc.); `site` is one of a fixed set of sites (e.g., "example.com", "example.org", etc.); and `time` is the load time of the site in seconds. The version and location are included in the Mastic input; the site and load time are encoded by the corresponding weight. Notably, this is just one example of what Mastic can do; the same idea can be applied to other types of metrics.
Compared to the private NEL application in {{NEL}}, the number of possible inputs here is relatively small: there are fewer than 200 country codes and a handful of browser versions in wide use at any given time. This means the Aggregators can enumerate a set of inputs of interest and evaluate them immediately. Consider the following parameters for Mastic, in its attribute-based metrics mode of operation ({{attribute-based-metrics}}):
- Attributes: Two-letter country codes can easily be encoded in 2 bytes. Likewise, the number of distinct browser versions is easily less than 2^16, so 2 bytes are sufficient. Therefore, each attribute can be encoded with just `32` bits.

- Values: Similar to private NEL, each weight is a `0`-vector except for a single `1` representing a bucket in a histogram. We represent `(site, time)` as a histogram bucket as follows. First, we quantize time (in seconds) into one of four buckets: `[0, 0.1)`, `[0.1, 1)`, `[1, 5)`, and `[5, inf)`. Let `0 <= t < 4` denote the time bucket for `time`. Next, suppose we wish to track metrics for `25` sites. Let `0 <= s < 25` denote the index of `site` in this list. Then the index of the `1` is `s * 4 + t`, which is unique for each `(site, time)` pair; see the sketch after this list.
# Additional Modes of Operation {#additional-modes}
{:numbered="false"}

## Weighted Heavy Hitters
{:numbered="false"}
TODO See {{NEL}} for a motivating application and `example_weighted_heavy_hitters_mode()` in the reference implementation for an end-to-end example.
The primary use case for Mastic is a variant of the heavy-hitters problem, in which the prefix counts are replaced with a notion of weight that is specific to some application. For example, when measuring the performance of an ad campaign, it is useful to learn not only which ads led to purchases, but how much money was spent.
To support this use case, we view the Client's `alpha` value as its measurement and the `beta` value as the measurement's "weight". The range of valid values for `beta` is therefore determined by the FLP with which Mastic is instantiated. Concretely, validity of `beta` is expressed by a validity circuit ({{Section 7.3.2 of !VDAF}}).
To compute the weighted heavy hitters, the Collector and Aggregators proceed as described in {{Section 8 of !VDAF}}, except that the threshold represents a minimum weight rather than a minimum count. In addition:

- The Aggregators MUST perform the range check (i.e., verify the FLP) at the first round of aggregation and remove any invalid reports before proceeding.

- The level at which the reports are aggregated MUST be strictly increasing.
## Weighted Heavy Hitters with Different Thresholds
{:numbered="false"}
NOTE For an end-to-end example, see `example_weighted_heavy_hitters_mode_with_different_thresholds()` in the reference implementation.
So far, we have assumed that there is a single threshold for determining which prefixes are "heavy". However, we can easily extend this to use different thresholds for different prefixes. There are use cases where prefixes starting with "000" may be significantly more popular than prefixes starting with "111". Setting a low threshold may result in an overwhelmingly large set of heavy hitters starting with "000", while setting a high threshold might prune everything starting with "111". Consider the following examples:
- Popular URLs: `a.example.com` receives a massive amount of traffic, whereas `b.example.com` may have lower traffic. To identify heavy-hitting search queries on `a.example.com`, the Aggregators should set a high threshold, while queries with different domain prefixes may require lower thresholds to be considered popular.

- E-commerce: Grocery items are essential and have a high volume of sales. In contrast, electronics, though popular, usually come at a higher price than groceries. Meanwhile, luxury items command significantly higher prices but generally experience lower sales volumes. To identify heavy-hitting grocery items on an e-commerce website, Aggregators could use a different threshold for each of these categories, set so that only the top-selling grocery items qualify as heavy hitters while electronics and luxury items are also considered heavy hitters within their own categories.
To tackle this, Mastic can allow different prefixes to have different thresholds. When a specific prefix does not have an associated threshold, we first check whether any of its own prefixes has a specified threshold; otherwise, we use a default threshold. For example, if the Aggregators have set the thresholds to `{"000": 10, "111": 2, "default": 5}` and they search for prefix "01", then threshold 5 should be used. However, if the Aggregators search for prefix "11101", then threshold 2 should be used (as sketched below).
## Attribute-Based Metrics {#attribute-based-metrics}
{:numbered="false"}
NOTE See {{attribute-based-telemetry}} for a motivating application and `example_attribute_based_metrics_mode()` in the reference implementation for an end-to-end example.
In this mode of operation, we take the `beta` value to be the Client's measurement and `alpha` to be an arbitrary "attribute". For a given sequence of attributes, the goal of the Collector is to aggregate the measurements that share the same attribute. This provides functionality similar to Prio3 {{!VDAF}}, except that the aggregate is partitioned by Clients who share some property. For example, the attribute might encode the Client's user agent {{?RFC9110}}.
Mastic requires each `alpha` to have the same length (`Vidpf.BITS`). Thus, it is necessary for each application to choose a scheme for encoding attributes as fixed-length strings. The following scheme is RECOMMENDED. Choose a cryptographically secure hash function, such as SHA256 {{?SHS=DOI.10.6028/NIST.FIPS.180-4}}, compute the hash of the Client's input string, and interpret each bit of the hash as a bit of the VIDPF index.
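A sketch of the RECOMMENDED scheme, using SHA-256 from Python's standard library (the attribute string and `bits` value are illustrative):

~~~ python
import hashlib

def encode_attribute(attribute: str, bits: int) -> tuple[bool, ...]:
    '''
    Sketch: hash the attribute and use the first `bits` bits of the
    digest as the VIDPF input (`bits` at most 256 for SHA-256).
    '''
    digest = hashlib.sha256(attribute.encode()).digest()
    return tuple(
        bool((digest[i // 8] >> (7 - i % 8)) & 1) for i in range(bits))

alpha = encode_attribute('Browser/122.0', 256)
~~~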
TODO Are we comfortable recommending truncating the hash? Collisions aren't so bad, since the Client can just lie about `alpha` anyway. The main thing is to pick a value for `BITS` that is large enough to avoid accidental collisions.
The Aggregators MAY aggregate a report any number of times, but:

- They MUST perform the range check (i.e., verify the FLP) the first time the reports are aggregated and remove any invalid reports before aggregating again.

- The aggregation parameter MUST specify the last level of the VIDPF tree (i.e., `level` MUST be `Vidpf.BITS-1`).
TODO Figure out if these requirements are strict enough. We may need to tighten aggregation parameter validity if we find out that aggregating at the same level more than once is not safe.
## Proof Aggregation
{:numbered="false"}
TODO Take "silently verifiable proofs" from {{RZCGP24}} into account here, which allows us to aggregate the FLPs as well.
The total communication cost of using Mastic (or Poplar1 {{!VDAF}}) for heavy hitters is `O(num_measurements * Vidpf.BITS)` bits exchanged between the Aggregators, where `num_measurements` is the number of reports being aggregated. For plain heavy hitters, this can be reduced to `O(Vidpf.BITS)` in the best case.
The idea is to take advantage of the feature of VIDPF evaluation whereby the Aggregators compute identical VIDPF proofs if and only if the report is valid. This allows the proofs themselves to be aggregated: if each report in a batch of reports is valid, then the hash of their proofs will be equal as well; on the other hand, if one report is invalid, then the hash of the proofs will not be equal.
To facilitate isolation of the invalid report(s), the proof strings are arranged into a Merkle tree. During aggregation, the Aggregators interactively traverse the tree to detect the subtree(s) containing invalid reports and remove them from the batch.
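A minimal sketch of the flat (non-Merkle) version of this check, with SHA-256 standing in for the hash:

~~~ python
import hashlib

def batch_proof_digest(eval_proofs: list[bytes]) -> bytes:
    '''
    Sketch: compress the per-report evaluation proofs into a single
    digest. If every report in the batch is valid, both Aggregators
    compute the same digest; a mismatch indicates at least one
    invalid report, which the Merkle-tree variant can then isolate.
    '''
    h = hashlib.sha256()
    for proof in eval_proofs:
        h.update(proof)
    return h.digest()
~~~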
TODO Decide if we should spell this out in greater detail. This feature is not compatible with {{?DAP=I-D.draft-ietf-ppm-dap-07}}; if we wanted to extend DAP to support this, then we'd need to specify the wire format of the messages exchanged between the Aggregators.
In the worst case, isolating invalid reports requires `O(num_measurements * Vidpf.BITS)` bits of communication and `Vidpf.BITS` many rounds of communication between the Aggregators. However, this behavior would only be observed under attack conditions in which the vast majority of Clients are malicious.
In the simple case where the `beta` value is a constant (e.g., 1), we can replace the FLP check with a simpler check, since FLPs are not compatible with proof aggregation the way VIDPFs are. In order to perform the range check without FLPs, we use an extension of VIDPF described by {{MST24}}. The high-level idea is that the Aggregators can evaluate the empty string and verify that they have shares of the constant `beta`. Next, as described in {{vidpf}}, we use the "one-hot verifiability" and "path verifiability" checks to verify that each level is non-zero at only a single point and that the same constant `beta` is propagated down the tree correctly. Note that this trick is not suitable for weighted heavy hitters, since it requires that each `beta` value is the same constant (e.g., 1).
TODO Proof aggregation could work with plain Mastic, but we would need to check the FLPs at the first round of aggregation, so the best-case communication cost would be `O(num_measurements + Vidpf.BITS)`. This would be OK, but we would still want to support a mode for plain heavy hitters that is as good as we can get. One idea is to always do the PLASMA `0`/`1` check alongside the FLP. This would be useful for another reason: usually, FLP decoding requires `num_measurements` as a parameter. We currently don't support this because we don't have a pure counter as part of the VIDPF output.
## Robustness Against a Malicious Aggregator
{:numbered="false"}
Next, we describe an enhancement that allows Mastic to achieve robustness in the presence of a malicious Aggregator. Two-party Mastic (like Poplar1) is susceptible to additive attacks by a malicious Aggregator: if one of the Aggregators acts maliciously, it can arbitrarily shift the aggregate result (simply by adding to its own aggregate shares) without the honest Aggregator noticing.

We can solve this problem in Mastic by using a technique from {{MST24}} that lifts the two-party, semi-honest secure PLASMA to the three-party, maliciously secure setting. Rather than having two Aggregators as in the previous setting, this flavor involves three Aggregators, where every pair of Aggregators communicates over a different channel. In essence, each pair of Aggregators runs one session of the VDAF with unique randomness but on the same Client measurement. The following changes are necessary:
- The Client needs to generate three pairs of VIDPF keys, all corresponding to the same `alpha` and `beta` values. We label the keys by session as follows:
  - Session 0 (between Aggregators 0 and 1): `key_01, key_10`
  - Session 1 (between Aggregators 1 and 2): `key_12, key_21`
  - Session 2 (between Aggregators 2 and 0): `key_20, key_02`

  A pair of Aggregators cannot check that the Client input is consistent across two sessions without the involvement of the third Aggregator. To address this, we let two of the Aggregators (i.e., Aggregators 0 and 1) run all three sessions so that they can check that the Client input is consistent across the three sessions. The third Aggregator (i.e., Aggregator 2) is involved as an attestor in two of the sessions. The check involves field addition and subtraction followed by hash comparisons.
- Session 0 (between Aggregators 0 and 1):
- The Client sends the following keys to the Aggregators:
  - Aggregator 0 receives: `key_01`, `key_02`, and `key_21`
  - Aggregator 1 receives: `key_10`, `key_12`, and `key_20`
  - Aggregator 2 receives: `key_21` and `key_20`
- The Aggregators need to verify that the Client's input is consistent across the different sessions (i.e., that all the keys correspond to the same `alpha` and `beta` values). Aggregators 0 and 1 check that:
  - Their output shares of Session 0 minus their output shares of Session 1 are shares of zero.
  - Their output shares of Session 1 minus their output shares of Session 2 are shares of zero.

  The subtraction is a local operation, and verifying that two Aggregators possess a sharing of zero requires exchanging one hash, as sketched below.
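The following hedged sketch shows the share-of-zero check for one pair of sessions, with SHA-256 standing in for the hash; `field`, `vec_sub()`, and `vec_neg()` are as in the main specification:

~~~ python
import hashlib

def zero_check_commitment(agg_id: int,
                          session_a_share: list[F],
                          session_b_share: list[F]) -> bytes:
    '''
    Sketch: Aggregators 0 and 1 each commit to the difference of
    their output shares from two sessions. If the differences are
    additive shares of zero, then one is the negation of the other,
    so Aggregator 1 negates its difference before hashing; the two
    commitments match exactly when the shares sum to zero.
    '''
    diff = vec_sub(session_a_share, session_b_share)
    if agg_id == 1:
        diff = vec_neg(diff)
    return hashlib.sha256(field.encode_vec(diff)).digest()
~~~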
Using a third Aggregator, we can lift the security of Mastic from the semi-honest setting to malicious security. While more complex to implement than two-party Mastic, this mode achieves both privacy and robustness against a malicious Aggregator.
{:numbered="false"}
TODO