Chapter status: ✅ in good shape ✅

Related puzzles: Puzzle 1

Commitment Schemes

This section contains a brief introduction to commitment schemes, focusing on Pedersen commitments.

Generalities
Pedersen Commitments
ElGamal Commitments
Commitments and Hash Functions
Further Resources

Generalities

A commitment scheme involves two parties, a committer (or prover) and a verifier. It allows the committer to send to the verifier a commitment $C$ to some secret value $m$ and later on to open this commitment to reveal $m$. The commitment $C$ should not reveal any information about the message $m$ (hiding property) and the committer should not be able to open a commitment in two distinct ways (binding property).

For a physical analogy, imagine the committer as writing the message $m$ on a piece of paper and placing it in an opaque, unbreakable, locked box which he sends to the verifier. At this point, the verifier cannot learn anything about the message and the committer cannot modify the message. Later, when the committer wants to open the commitment, he sends the key opening the box to the verifier to reveal the message.

Syntax

More formally, a commitment scheme consists of three algorithms (the exact syntax can vary slightly in the literature):

a probabilistic setup algorithm $Setup$ which on input the security parameter $1^{λ}$ returns public parameters¹ $par$ which in particular specify a message space $M_{par}$ (in the following, we simply denote the message space $M$, leaving the dependency on $par$ implicit);
a probabilistic commitment algorithm $Commit$ which on input parameters $par$ and a message $m \in M$ returns a commitment $C$ and a decommitment² $D$;
a deterministic verification algorithm $Verif$ which on input parameters $par$, a commitment $C$, a message $m \in M$, and a decommitment $D$, returns 1 if the decommitment is valid for $(par, C, m)$ and 0 otherwise.

Quite often, the decommitment $D$ simply consists of the random coins $r$ used by the commitment algorithm, and the verification algorithm simply recomputes the commitment given $m$ and $r$ and compares the result with $C$. When this is the case, we say that the commitment scheme has canonical verification and, overloading the notation, we let $Commit(par, m; r)$ denote the function explicitly taking the random coins $r$ of the commitment algorithm as input and returning the commitment $C$ (leaving the decommitment $D = r$ implicit in that case).

Note that what we just defined here is the syntax for a non-interactive commitment scheme, where the $Setup$ algorithm is run once and for all and committing consists of a single message sent by the prover to the verifier. There exist more complex commitment schemes where committing requires some interaction between the prover and the verifier.

Correctness requires that for every security parameter $λ,$ the following game capturing the nominal execution of algorithms returns true with probability 1:

$\begin{array}{l} par \leftarrow Setup(1^{λ}) \\ m \leftarrow_{\$} M \\ (C, D) \leftarrow Commit(par, m) \\ b \leftarrow Verif(par, C, m, D) \\ \text{assert } (b = 1) \end{array}$

Security

A commitment scheme should satisfy two security properties informally defined as follows:

hiding: the commitment $C$ should not reveal any information about the secret value $m$ to the verifier,
binding: the committer should not be able to open the commitment in two different ways.

Let us formalize these two properties more precisely, starting with hiding, defined by the following game:

$\begin{array}{l} \underline{\text{Game HIDING:}} \\ b \leftarrow_{\$} \{0, 1\} \\ par \leftarrow Setup(1^{λ}) \\ b' \leftarrow A^{Commit}(par) \\ \text{assert } (b = b') \\ \\ \underline{\text{Oracle } Commit(m_{0}, m_{1}) \text{:}} \\ \text{assert } (m_{0} \in M) \\ \text{assert } (m_{1} \in M) \\ (C, D) \leftarrow Commit(par, m_{b}) \\ \text{return } C \end{array}$

In some cases, it might be necessary to check additional conditions on the messages $m_{0}$ and $m_{1}$ queried to oracle $Commit$ (e.g., when $M$ consists of bit strings of various lengths and $Commit$ does not hide the message length, $m_{0}$ and $m_{1}$ should have the same length).

Binding is defined by the following game:

$\begin{array}{l} \underline{\text{Game BINDING:}} \\ par \leftarrow Setup(1^{λ}) \\ (C, m, D, m', D') \leftarrow A(par) \\ b \leftarrow Verif(par, C, m, D) \\ b' \leftarrow Verif(par, C, m', D') \\ \text{assert } (m \neq m') \\ \text{assert } (b = 1) \\ \text{assert } (b' = 1) \end{array}$

For some commitment schemes, one of these two properties holds statistically (i.e., cannot be broken with non-negligible advantage even by a computationally unbounded adversary) or even perfectly. However, a commitment scheme cannot be both statistically hiding and statistically binding at the same time. Hence, at best, a commitment scheme can be either statistically hiding and computationally binding or computationally hiding and statistically binding.

Homomorphic Commitments

Informally, a commitment scheme is homomorphic if the message space $M$ equipped with some binary operation $⋆$ forms a group and, given two commitments $C_{1}$ and $C_{2}$ to $m_{1}$ and $m_{2}$ respectively, anyone can compute a commitment $C$ that the committer can open to $m_{1} ⋆ m_{2}$.

More formally, a commitment scheme is homomorphic (with respect to group operation $⋆$) if there exist two algorithms $HomCom$ and $HomDecom$ such that

$HomCom$ takes parameters $par$ and two commitments $C_{1}$ and $C_{2}$ and returns a commitment $C$;
$HomDecom$ takes parameters $par$ and two decommitments $D_{1}$ and $D_{2}$ and returns a decommitment $D$;
for any security parameter $λ$, the following game returns true with probability 1: $\begin{array}{l} par \leftarrow Setup(1^{λ}) \\ m_{1}, m_{2} \leftarrow_{\$} M \\ (C_{1}, D_{1}) \leftarrow Commit(par, m_{1}) \\ (C_{2}, D_{2}) \leftarrow Commit(par, m_{2}) \\ C \leftarrow HomCom(par, C_{1}, C_{2}) \\ D \leftarrow HomDecom(par, D_{1}, D_{2}) \\ b \leftarrow Verif(par, C, m_{1} ⋆ m_{2}, D) \\ \text{assert } (b = 1) \end{array}$

Algorithms $HomCom$ and $HomDecom$ are often quite simple (e.g., when the commitment and decommitment spaces also have a group structure, they simply consist in applying the corresponding group operation to $C_{1}$ and $C_{2}$ or $D_{1}$ and $D_{2}$ respectively).

Pedersen Commitments

Description and Security

The Pedersen commitment scheme, initially introduced in [Ped91], is widely used, in particular to build zero-knowledge proof systems. It is specified as follows. Let $GroupSetup$ be a group setup algorithm. Then:

the setup algorithm $Setup$, on input $1^{λ}$, runs $(\mathbb{G}, p) \leftarrow GroupSetup(1^{λ})$, draws two random generators $G$ and $H$ of $\mathbb{G}$, and returns parameters $par = (\mathbb{G}, p, G, H)$; the message space is $M = \mathbb{Z}_{p}$;
the commitment algorithm $Commit$, on input parameters $par = (\mathbb{G}, p, G, H)$ and a message $m \in \mathbb{Z}_{p}$, draws $r \leftarrow_{\$} \mathbb{Z}_{p}$ and returns a commitment $C = mG + rH$ and a decommitment $D = r$;
the verification algorithm $Verif$, on input parameters $par = (\mathbb{G}, p, G, H)$, a commitment $C \in \mathbb{G}$, a message $m \in \mathbb{Z}_{p}$, and a decommitment $D = r \in \mathbb{Z}_{p}$, returns 1 if $mG + rH = C$ and 0 otherwise.
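The three algorithms above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: instead of an elliptic curve it uses a tiny multiplicative group (the squares modulo the safe prime 2039, which form a group of prime order 1019), so the additive expression $mG + rH$ becomes `G**m * H**r mod Q`. The constants `Q`, `P`, `G`, `H` and the function names are hypothetical choices for this sketch and offer no security.

```python
import secrets

# Toy parameters (assumption, NOT secure): Q = 2*P + 1 with P prime.
# The squares modulo Q form a cyclic group of prime order P.
Q, P = 2039, 1019
G, H = 4, 9  # 2^2 and 3^2: two generators of the order-P subgroup


def commit(m: int) -> tuple[int, int]:
    """Return (C, r) where C = G^m * H^r mod Q (mG + rH in additive notation)."""
    r = secrets.randbelow(P)  # uniformly random element of Z_P
    c = pow(G, m % P, Q) * pow(H, r, Q) % Q
    return c, r


def verify(c: int, m: int, r: int) -> bool:
    """Canonical verification: recompute the commitment and compare with C."""
    return c == pow(G, m % P, Q) * pow(H, r % P, Q) % Q


c, r = commit(42)
print(verify(c, 42, r))  # True
print(verify(c, 43, r))  # False: same r, different message
```

In this toy group the discrete logarithm of `H` in base `G` can be found by brute force, so the sketch only illustrates the data flow, not the binding property.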

Theorem 18.1. The Pedersen commitment scheme is perfectly hiding, computationally binding under the discrete logarithm assumption, and homomorphic with respect to addition over $\mathbb{Z}_{p}$.

Proof

Let us sketch the proof of each property:

perfectly hiding: as $r$ is uniformly random in $\mathbb{Z}_{p}$ and $H$ is a generator, $rH$ is uniformly random in $\mathbb{G}$; hence, for any message $m$, $C = mG + rH$ is uniformly random in $\mathbb{G}$ and does not reveal any information about $m$;
computationally binding: assume an adversary can output two message/decommitment pairs $(m, r)$ and $(m', r')$ with $m \neq m'$ for the same commitment $C$; then $(m - m') G = (r' - r) H$, and since $m - m' \neq 0$ and $G$ and $H$ are generators of $\mathbb{G}$, we have $r' - r \neq 0$; hence $H = (m - m')(r' - r)^{-1} G$, which yields the discrete logarithm $(m - m')(r' - r)^{-1} \bmod p$ of $H$ in base $G$;
additively homomorphic: given two commitments $C_{1} = m_{1} G + r_{1} H$ and $C_{2} = m_{2} G + r_{2} H$, anyone can compute $C := C_{1} + C_{2} = (m_{1} + m_{2}) G + (r_{1} + r_{2}) H$, and the committer can compute $r_{1} + r_{2}$, which is a valid decommitment for $C$ and message $m_{1} + m_{2}$.
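The additive homomorphism can be checked numerically. The sketch below assumes the same kind of toy multiplicative group as a stand-in for the real group (squares modulo 2039, of prime order 1019), so adding commitments becomes multiplying them modulo `Q`. All constants and names are hypothetical and insecure.

```python
import secrets

# Toy parameters (assumption, NOT secure): squares mod Q, group of prime order P.
Q, P = 2039, 1019
G, H = 4, 9


def commit(m: int, r: int) -> int:
    """Commit(par, m; r) = G^m * H^r mod Q (mG + rH in additive notation)."""
    return pow(G, m % P, Q) * pow(H, r % P, Q) % Q


m1, m2 = 300, 900
r1, r2 = secrets.randbelow(P), secrets.randbelow(P)
c1, c2 = commit(m1, r1), commit(m2, r2)

c = c1 * c2 % Q      # HomCom: apply the group operation to C1 and C2
r = (r1 + r2) % P    # HomDecom: add the decommitments in Z_P

# The combined commitment opens to m1 + m2 (mod P) with randomness r1 + r2.
print(c == commit((m1 + m2) % P, r))  # True
```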

A Note About the Setup

Importantly, the setup algorithm should ensure that nobody knows the discrete logarithm of $H$ in base $G$. In particular, it is not safe to allow the committer to choose the public parameters: if the committer knows the discrete logarithm of $H$ in base $G$, then the scheme is not binding anymore. Say the committer knows $h$ such that $H = hG$ (note that $h \neq 0$ as $H$ is a generator). Then it can send $C = cG$ as commitment for some $c$ of its choice; later, it can open this commitment to any value $m \in \mathbb{Z}_{p}$ it wants by computing $r = h^{-1} (c - m) \bmod p$ and sending $m$ together with the decommitment $D = r$: it satisfies $mG + rH = (m + rh) G = cG = C$.
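The equivocation attack, and the converse fact that two openings of the same commitment reveal $h$, can be reproduced in a short sketch. It assumes a toy multiplicative group (squares modulo 2039, of prime order 1019) in place of the real group, and hypothetical values for the trapdoor `h` and the exponent `c_exp`.

```python
# Toy parameters (assumption, NOT secure): squares mod Q, group of prime order P.
Q, P, G = 2039, 1019, 4

h = 777            # trapdoor: a cheating committer picked H = hG itself
H = pow(G, h, Q)


def commit(m: int, r: int) -> int:
    """G^m * H^r mod Q, i.e. mG + rH in additive notation."""
    return pow(G, m % P, Q) * pow(H, r % P, Q) % Q


c_exp = 123
C = pow(G, c_exp, Q)   # the committer sends C = cG


def equivocate(m: int) -> int:
    """Return r = h^{-1} (c - m) mod P, so that mG + rH = C."""
    return pow(h, -1, P) * (c_exp - m) % P


r10, r20 = equivocate(10), equivocate(20)
print(commit(10, r10) == C, commit(20, r20) == C)  # True True

# Conversely, two openings of C yield the discrete logarithm of H:
# (m - m') G = (r' - r) H  =>  h = (m - m') (r' - r)^{-1} mod P.
recovered = (10 - 20) * pow((r20 - r10) % P, -1, P) % P
print(recovered == h)  # True
```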

For this reason, Pedersen's scheme is sometimes referred to as a trapdoor (or equivocal) commitment scheme, which can be useful in security proofs but should also make us wary. However, there are secure ways to select the commitment key without a trusted third party, such as using a hash-to-group function (a.k.a. hash-to-curve function when $\mathbb{G}$ is based on an elliptic curve) applied to some NUMS (nothing-up-my-sleeve) input. Hence, even though Pedersen commitments do not require a trusted setup, one should always verify that parameters were correctly generated. For a real-world example where this trapdoor property could have been used, see Section VI of [HLPT20] about the Scytl/SwissPost e-voting system.
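As a rough illustration of the NUMS idea, the sketch below derives generators by hashing a public label into a toy group. Everything here is an assumption made for illustration: the group is the set of squares modulo the safe prime 2039 (prime order 1019), the labels are hypothetical, and "hash, reduce, then square" is a simple stand-in rather than a standardized hash-to-curve construction.

```python
import hashlib

# Toy parameters (assumption, NOT secure): squares mod Q, group of prime order P.
Q, P = 2039, 1019


def hash_to_group(label: bytes) -> int:
    """Map a public label to a generator of the order-P subgroup of squares."""
    counter = 0
    while True:
        digest = hashlib.sha256(label + counter.to_bytes(4, "big")).digest()
        x = int.from_bytes(digest, "big") % Q
        y = x * x % Q          # squaring lands in the subgroup of squares
        if y not in (0, 1):    # reject 0 and the identity element
            return y
        counter += 1


# Anyone can recompute G and H from the labels and check the parameters.
G = hash_to_group(b"toy-pedersen/G")
H = hash_to_group(b"toy-pedersen/H")
print(pow(G, P, Q) == 1 and pow(H, P, Q) == 1)  # True: both lie in the subgroup
```

Since the derivation is deterministic and public, a verifier can rerun it to confirm that the parameters were generated as claimed. In a group this small, two labels can occasionally map to the same element, which a real instantiation would have to rule out.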

Variants

The Pedersen commitment scheme can be generalized to messages which are vectors $m = (m_{0}, \dots, m_{n - 1}) \in (\mathbb{Z}_{p})^{n}$: the parameters are extended to $par = (\mathbb{G}, p, G_{0}, \dots, G_{n - 1}, H)$ where $G_{0}, \dots, G_{n - 1}$, and $H$ are uniformly random and independent generators of $\mathbb{G}$, and the commitment for message $m = (m_{0}, \dots, m_{n - 1})$ with randomness $r$ is $C := \sum_{i = 0}^{n - 1} m_{i} G_{i} + rH$. This is usually called the generalized Pedersen commitment scheme, or sometimes the Pedersen vector commitment scheme, although this is somewhat of a misnomer as it does not have all the properties required from a vector commitment scheme [CF13]. As for the basic variant, it can be shown to be perfectly hiding, computationally binding under the DL assumption, and homomorphic with respect to addition over $(\mathbb{Z}_{p})^{n}$.

The "random" part of the commitment $rH$ is sometimes omitted, in which case the commitment algorithm becomes deterministic and the commitment is simply $C := \sum_{i = 0}^{n - 1} m_{i} G_{i}$. In that case, the scheme is still computationally binding under the DL assumption (and even perfectly binding for $n = 1$ as the commitment function is bijective), but it is not hiding anymore (given two messages $m_{0}$ and $m_{1}$ and a commitment $C$ to $m_{b}$ for some random bit $b$, one can recover $b$ by simply computing the commitments corresponding to $m_{0}$ and $m_{1}$ respectively and comparing them with $C$). For this reason, it is sometimes referred to as the non-hiding Pedersen commitment scheme. It is however preimage-resistant under the DL assumption, meaning that given a random commitment $C \in \mathbb{G}$, it is hard to compute a message $m$ such that $Commit(par, m) = C$.
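The distinguishing attack against the deterministic variant is easy to sketch. The code assumes a toy multiplicative group (squares modulo 2039, of prime order 1019) and three hypothetical generators; `commit_det` and `guess_bit` are names invented for this example.

```python
# Toy parameters (assumption, NOT secure): squares mod Q, group of prime order P.
Q, P = 2039, 1019
GENS = [4, 9, 25]  # G_0, G_1, G_2: the squares of 2, 3 and 5


def commit_det(msg: list[int]) -> int:
    """Deterministic commitment: product of G_i^{m_i}, i.e. sum of m_i G_i."""
    c = 1
    for m_i, g_i in zip(msg, GENS):
        c = c * pow(g_i, m_i % P, Q) % Q
    return c


def guess_bit(c: int, m0: list[int], m1: list[int]) -> int:
    """HIDING adversary: recompute the commitment to m0 and compare with C."""
    return 0 if c == commit_det(m0) else 1


m0, m1 = [1, 0, 0], [0, 1, 0]
for b in (0, 1):
    challenge = commit_det([m0, m1][b])
    print(guess_bit(challenge, m0, m1) == b)  # True both times
```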

ElGamal Commitments

The Pedersen commitment scheme has a relative known as the ElGamal commitment scheme, where the commitment key $ck$ is $(\mathbb{G}, p, G, H)$ as for Pedersen and the commitment for message $m \in \mathbb{Z}_{p}$ with randomness $r \leftarrow_{\$} \mathbb{Z}_{p}$ is the pair $(C_{1}, C_{2}) = (rG, mG + rH)$. (Note the similarity with ElGamal encryption w.r.t. public key $H$.) This scheme is perfectly binding, computationally hiding under the DDH assumption, and additively homomorphic.

If the message is encoded as a group element $M \in \mathbb{G}$ and the commitment computed as $(C_{1}, C_{2}) = (rG, M + rH)$, the scheme has a trapdoor property allowing anyone with knowledge of the discrete logarithm $h$ of $H$ in base $G$ to extract the message $M$ (by "decrypting" the commitment as in ElGamal encryption, i.e., computing $M = C_{2} - h C_{1}$).
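Both forms of the ElGamal commitment, and the trapdoor extraction, can be sketched as follows. The code assumes a toy multiplicative group (squares modulo 2039, of prime order 1019), so $(rG, mG + rH)$ becomes `(G**r, G**m * H**r)` and $C_{2} - h C_{1}$ becomes `C2 * C1**(-h)`. The trapdoor value and function names are hypothetical.

```python
import secrets

# Toy parameters (assumption, NOT secure): squares mod Q, group of prime order P.
Q, P, G = 2039, 1019, 4
h = 321            # trapdoor, fixed here only so that extraction can be shown
H = pow(G, h, Q)


def commit(m: int, r: int) -> tuple[int, int]:
    """(C1, C2) = (rG, mG + rH) for a message m in Z_P."""
    return pow(G, r, Q), pow(G, m % P, Q) * pow(H, r, Q) % Q


def commit_elem(M: int, r: int) -> tuple[int, int]:
    """(C1, C2) = (rG, M + rH) for a message M encoded as a group element."""
    return pow(G, r, Q), M * pow(H, r, Q) % Q


def extract(c1: int, c2: int) -> int:
    """Trapdoor extraction M = C2 - h*C1; C1 has order dividing P."""
    return c2 * pow(c1, P - h, Q) % Q


M = pow(G, 55, Q)
c1, c2 = commit_elem(M, secrets.randbelow(P))
print(extract(c1, c2) == M)  # True

# Additive homomorphism: component-wise group operation.
a, b = commit(3, 10), commit(4, 20)
print((a[0] * b[0] % Q, a[1] * b[1] % Q) == commit(7, 30))  # True
```

Here $C_{1}$ determines $r$ and $C_{2}$ then determines the message, which is why the scheme is perfectly binding.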

Commitments and Hash Functions

There is a strong connection between commitment schemes and collision-resistant hash functions.

First, let us consider the following strengthening of the binding property: we say that a commitment scheme is strongly binding if it is hard to find a commitment $C$ and two distinct message-decommitment pairs $(m, D)$ and $(m', D')$ such that $Verif(par, C, m, D) = 1$ and $Verif(par, C, m', D') = 1$. That is, the adversary also wins when the messages $m$ and $m'$ are equal but the decommitments $D$ and $D'$ are different.

Given a hash function family $H_{par} : \{0, 1\}^{*} \to \{0, 1\}^{2λ}$ indexed by some parameter $par$, one can define a simple commitment scheme with $Commit(par, m; r) := H_{par}(m ∥ r)$ where $r \leftarrow_{\$} \{0, 1\}^{λ}$. It can be shown to be (computationally) strongly binding assuming the family $(H_{par})$ is collision-resistant. On the other hand, there is no reason for this scheme to be hiding in general ($H_{par}$ could, for example, reveal the first bit of the message, which would allow distinguishing commitments to two messages with distinct first bits). It is however easily seen to be (computationally) hiding in the random oracle model.
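A minimal sketch of this hash-based commitment, assuming SHA-256 as a stand-in for $H_{par}$ (so $2λ = 256$ and $λ = 128$). The names `commit`, `verify`, and `LAMBDA_BYTES` are hypothetical; since $r$ has fixed length, the encoding $m ∥ r$ is unambiguous.

```python
import hashlib
import secrets

LAMBDA_BYTES = 16  # lambda = 128 bits; SHA-256 outputs 2*lambda = 256 bits


def commit(m: bytes) -> tuple[bytes, bytes]:
    """Return (C, r) with C = SHA-256(m || r) and r uniform in {0,1}^lambda."""
    r = secrets.token_bytes(LAMBDA_BYTES)
    return hashlib.sha256(m + r).digest(), r


def verify(c: bytes, m: bytes, r: bytes) -> bool:
    """Canonical verification: recompute the hash and compare with C."""
    return len(r) == LAMBDA_BYTES and hashlib.sha256(m + r).digest() == c


c, r = commit(b"my secret bid: 100")
print(verify(c, b"my secret bid: 100", r))  # True
print(verify(c, b"my secret bid: 999", r))  # False
```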

Conversely, it is straightforward to derive a collision-resistant hash function family from a strongly binding commitment scheme.

Proposition 18.1. Consider a commitment scheme $Π$ with a $Commit$ function taking parameters $par$, a message $m \in M$, and explicit random coins $r \in R$. If $Π$ is strongly binding, then the function family $H_{par} : (m, r) \mapsto Commit(par, m; r)$ is collision-resistant.

Proof

Assume that $H$ is not collision-resistant and that there is an adversary which on input $par$ returns $(m, r) \neq (m', r')$ such that $H_{par}(m, r) = H_{par}(m', r')$. Let $C := H_{par}(m, r) = H_{par}(m', r')$. Then $Verif(par, C, m, r) = Verif(par, C, m', r') = 1$, hence this adversary can be used to break strong binding of $Π$.

Note that the assumption that $Π$ is binding is not sufficient: it could be easy to find $(m, r) \neq (m', r')$ such that $Commit(par, m; r) = Commit(par, m'; r')$ but with $m = m'$, which would break collision-resistance but not binding.

It is not hard to see that the Pedersen commitment scheme is actually strongly binding, which directly gives an algebraic family of collision-resistant hash functions usually called Pedersen hashing. A specific instance of the family is specified by a tuple of parameters $par = (\mathbb{G}, p, G_{0}, \dots, G_{n - 1})$ where $n \geq 2$, $\mathbb{G}$ is a cyclic group of order $p$, and $G_{0}, \dots, G_{n - 1}$ are generators chosen in a way such that nobody knows any discrete logarithm relation between them. Then $H_{par}$ has domain $(\mathbb{Z}_{p})^{n}$, range $\mathbb{G}$, and is defined by $H_{par}(m_{0}, \dots, m_{n - 1}) = \sum_{i = 0}^{n - 1} m_{i} G_{i}$. (Note that in the context of hashing, there is no distinction between the "message" and the "randomness" as in the context of commitment schemes.)

This family of hash functions is collision-resistant assuming the discrete logarithm problem is hard.

Variants are possible: for example, if inputs are bit strings of length exactly $L$, one can split the input $m$ into chunks of $w$ consecutive bits with $2^{w} \leq p$ and $L = nw$, convert the $i$-th chunk into an integer $m_{i}$, and let $H_{par}(m) = \sum_{i = 0}^{n - 1} m_{i} G_{i}$.
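The chunked variant can be sketched with $w = 8$ and $n = 4$, so that inputs are exactly $L = 32$ bits (4 bytes) and each byte is one chunk. As an assumption for illustration only, the group is the set of squares modulo 2039 (prime order 1019, so $2^{8} \leq p$ holds) and the generators are small hypothetical constants. In such a tiny group collisions are trivial to find, so this shows the computation only.

```python
# Toy parameters (assumption, NOT secure): squares mod Q, group of prime order P.
Q, P = 2039, 1019
W = 8                  # chunk width in bits; 2**W = 256 <= P
GENS = [4, 9, 25, 49]  # G_0..G_3: the squares of 2, 3, 5 and 7


def pedersen_hash(data: bytes) -> int:
    """Hash an input of exactly n*W bits as the product of G_i^{m_i} mod Q
    (the sum of m_i G_i in additive notation), one byte per chunk."""
    if len(data) != len(GENS):
        raise ValueError(f"input must be exactly {len(GENS) * W} bits long")
    acc = 1
    for m_i, g_i in zip(data, GENS):  # iterating over bytes yields integers
        acc = acc * pow(g_i, m_i, Q) % Q
    return acc


print(pedersen_hash(bytes([1, 2, 0, 0])))  # 324 = 4^1 * 9^2
```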

Further Resources

For more background on commitment schemes, see for example this article by Damgård and Nielsen and this lecture by Dodis.

1: The name can vary; these parameters are sometimes called a commitment key or a public key.

2: Again, the name can vary and it is sometimes called an opening or a hint.
