09.04.2022
Formal Verification of ZK Constraint Systems
Delendum Research & Kestrel Institute
Morgan Thomas and Ventali Tan
Thanks to Eric McCarthy from Kestrel Institute for helping with some sections of this document, and to Grigore Rosu (UIUC), Shashank Agrawal (Delendum), Daniel Lubarov (Delendum), Tim Carstens (Risc Zero), Bolton Bailey (UIUC) and Alessandro Coglio (Kestrel) for many helpful suggestions.
Table of Contents
- Leading Problem
- Techniques
- Formal Verification for ZK Circuits
- Synthesizing Formally Verified Programs
- The State of Current Progress
- Future Research and Development Directions
Leading Problem
How do we formally verify that an arithmetic circuit, as used by a zero knowledge proof system, has the desired characteristics, such as soundness, completeness and zero knowledge [1]?
Soundness of a proving system means that it does not prove statements that are false. Similarly, soundness of the circuit used by a proof system means that the circuit is unsatisfiable when applied to a statement that is not true.
Completeness of a proving system means that it is able to prove all statements that are true. Similarly, completeness of the circuit used by a proof system means that the circuit is satisfiable when applied to a statement that is true.
Soundness and completeness of a circuit, taken together, are equivalent to the statement that the relation denoted by the circuit is the intended relation.
A proof system is zero knowledge if and only if a proof does not reveal any additional information beyond the statement being proven (i.e., the public inputs to the proving process). Any facts straightforwardly implied by the statement being proven are necessarily also implied by the proof, but additional facts not implied by the statement should not be revealed. Additional "non-public" witness data, which may be considered as secret inputs, must not be revealed.
A sound and complete circuit is satisfied by exactly the statements which are true in the intended semantics. If the underlying proving system is sound and complete, and if the circuit is correct, then the overall system provides a sound and complete means of proving the intended statements. We can therefore separate the leading problem into two subproblems: verifying the desired characteristics of the underlying proving system, and verifying the soundness and completeness of the circuit. In other words, we verify that the circuit denotes the intended relation and that the proving system behaves as intended when applied to the circuit.
Our main focus here is on the second subproblem, that of verifying the circuits of a system while assuming the soundness and completeness of the underlying proving system [2].
Zero knowledge proof systems often use a mathematical constraint system such as R1CS or AIR to encode a computation. The zero knowledge proof is a probabilistic cryptographic proof that the computation was done correctly. Formal verification of a circuit used in a zero knowledge proof requires (1) a formal specification of the relation the circuit is intended to encode, (2) a formal model of the semantics of the circuit, (3) the circuit itself, (4) the theorem prover, and (5) the mechanized proof of a theorem relating (1), (2) and (3).
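As a toy illustration of items (1) through (3), here is a minimal sketch (entirely our own example, not taken from any particular proving system) of an R1CS instance for the relation z = x * y, together with a satisfiability check over a small prime field. A real system would use a cryptographically large field, and the constraints would be produced by a circuit compiler rather than written by hand.

```python
# Minimal sketch (our own toy example): an R1CS instance for the relation z = x * y
# over a small prime field, plus a check that a witness vector satisfies it.
P = 97  # toy prime modulus; real systems use cryptographically large fields

def dot(row, w):
    return sum(a * b for a, b in zip(row, w)) % P

def r1cs_satisfied(A, B, C, w):
    """Every constraint i must satisfy (A_i . w) * (B_i . w) = (C_i . w) mod P."""
    return all(dot(a, w) * dot(b, w) % P == dot(c, w) for a, b, c in zip(A, B, C))

# Witness vector layout: w = (1, x, y, z); one constraint encodes x * y = z.
A = [[0, 1, 0, 0]]
B = [[0, 0, 1, 0]]
C = [[0, 0, 0, 1]]

assert r1cs_satisfied(A, B, C, [1, 6, 7, 42])      # 6 * 7 = 42: satisfied
assert not r1cs_satisfied(A, B, C, [1, 6, 7, 41])  # 41 is not 6 * 7: not satisfied
```

In this picture, the formal specification (item 1) would state that the constraint system is satisfiable exactly when z = x * y, the semantics of R1CS (item 2) is what the satisfiability check above formalizes, and the mechanized proof (item 5) would establish the specification as a theorem rather than testing a handful of cases.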
Techniques
- To prove a general statement with certainty, it must be a purely mathematical statement which can be known to be true by reasoning alone, without needing to perform any tests. The reasoning must start from consistent assumptions (axioms) and proceed by truth-preserving inferences.
- Example: given that a bachelor is defined as an unmarried man, and a married man by definition has a spouse, it can be proven with certainty that no bachelor has a spouse.
- A proof theory (sometimes also called a logic or metalogic) codifies a consistent set of rules for performing truth-preserving inference. Proof theories can be used to construct proofs manually, with pencil and paper.
- Example: sequent calculus for classical logic
- Computer programs called proof assistants reduce labor and mitigate the risk of human error by helping the user to construct machine-checkable proofs (a minimal example follows this list).
- Examples: Coq, Agda, Isabelle, ACL2, PVS, Imandra, and Lean
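To give a flavor of what a machine-checkable proof looks like, here is a minimal sketch in Lean 4 of the bachelor example above. The formalization (a Person record carrying only a marital status flag, and the names used) is our own simplification for illustration.

```lean
-- Minimal sketch (Lean 4, our own toy formalization of the bachelor example):
-- a person is modeled only by marital status, "bachelor" means unmarried,
-- and we prove that no bachelor is married (i.e., no bachelor has a spouse).
structure Person where
  married : Bool

def bachelor (p : Person) : Prop := p.married = false

theorem bachelor_not_married (p : Person) (h : bachelor p) : p.married ≠ true := by
  have hfalse : p.married = false := h  -- unfold the definition of bachelor
  intro hm                              -- suppose, for contradiction, p.married = true
  rw [hfalse] at hm                     -- then hm : false = true
  exact Bool.noConfusion hm             -- impossible: distinct Bool constructors
```

The point is not the triviality of the statement but that every step is checked mechanically; the same workflow scales, with far more effort, to statements about circuits and proving systems.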
Using a proof assistant
- To use a proof assistant to prove a statement about a program, there are two main approaches:
- Write the program in the proof assistant language and apply the proving facilities to the program directly.
- Use a transpiler [3] to turn a program written in another language into an object which the proof assistant can reason about. This is called post hoc verification.
- Of these approaches, (1) seems preferable for the greater confidence provided by the proof being about exactly the (source) program being executed, as opposed to output of a transpiler which is assumed to have the same meaning as the source program.
- What motivates approach (2) is when (for whatever reason) the proof assistant language is not suitable as a language for developing the application [4].
Without using a proof assistant
- There are also ways to prove a statement about a program without (directly) using a proof assistant:
- Use a verified compiler (e.g. CompCert), which turns a source program into an object program which provably has certain properties by virtue of (proven) facts about the verified compiler.
- Note that there is a distinction between a verified compiler and a verifying compiler. CompCert itself is proven to generate binaries that are always semantically equivalent to the C input. A verifying compiler generates a binary along with a proof of correctness that the binary is semantically equivalent to the source.
- When a program is compiled by a verifying or verified compiler, we say that the compiler output is correct by construction (provided that the input is correct).
- Verified compilers do not detect or fix bugs in the input source code; however, they also do not introduce any new bugs during compilation.
- Use an automatic proof search algorithm, which takes as input statements to be proven and outputs proofs of those statements if those statements are true and the proof search algorithm finds proofs.
- Use a static analyzer, which takes as input a program and automatically checks for various kinds of issues using predetermined algorithms.
- All of these approaches have limitations:
- A verified compiler is limited in what statements it can prove about the resulting program: typically, just that the resulting program has the same meaning or behavior as the source program.
- An automatic proof search algorithm is limited in what statements it can prove by the sophistication of the algorithm and the computational power applied to it. Also, due to the undecidability of first-order logic [5], there cannot exist a proof search algorithm which would find a proof of any given true statement.
- A static analyzer is generally not capable of reasoning about the meaning of a program to see if it's correct; it is only able to recognize patterns which always or often indicate some kind of issue.
Formal Verification for ZK Circuits
- Formal verification for probabilistic proof systems, inclusive of ZK proof systems, encompasses two main problem spaces:
- Proving the intended properties of a general-purpose proving system, such as soundness, completeness, and zero knowledge.
- E.g., Bulletproofs, Halo 2, etc.
- Proving the intended properties of a circuit, namely that it correctly encodes the intended relation.
- Problem (1) is generally considered very difficult and has not yet been carried out for any significant proving system.
- Problem (2) can be done with a formal specification of the relation and a model of the circuit semantics. Usually it requires proving functional correctness of functions defined within the relation as well as the existence of witness variables for every argument to the function.
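For intuition, here is a minimal sketch (entirely our own toy example) of what problem (2) involves for a tiny range-check circuit: the specification says the public input x is less than 4, the circuit introduces two private witness bits, and an exhaustive check over a small field confirms that the circuit is satisfiable exactly on the inputs the specification allows. Completeness corresponds to exhibiting witness values for every allowed input; soundness corresponds to no witness existing otherwise. A formal verification would replace the exhaustive check with a theorem over the actual field.

```python
# Minimal sketch (our own toy example): checking a tiny "range check" circuit
# against its specification by exhaustive search over a small prime field.
P = 97  # toy prime modulus

def spec(x):
    """Intended relation: the public input x is less than 4."""
    return x < 4

def circuit(x, b0, b1):
    """Constraint-level encoding, with private witness bits b0 and b1:
    b0 and b1 must be boolean, and x must equal b0 + 2*b1 (mod P)."""
    return (b0 * (b0 - 1) % P == 0 and
            b1 * (b1 - 1) % P == 0 and
            (b0 + 2 * b1) % P == x % P)

for x in range(P):
    satisfiable = any(circuit(x, b0, b1) for b0 in range(P) for b1 in range(P))
    # Completeness: every x allowed by the spec has a satisfying witness.
    # Soundness: no x rejected by the spec has a satisfying witness.
    assert satisfiable == spec(x)
```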
Note: There are different ways to axiomatize a problem to prove it. Some categories are denotational semantics, axiomatic semantics, and operational semantics.
Operational semantics is particularly useful for proving things about programs. For example, in the ACL2 theorem prover, a common approach to formalizing programming language semantics is an interpretive operational semantics, i.e. a high-level interpreter, written in the logic of the theorem prover, that formalizes the possible forms of the states of computation and describes how the execution of the constructs of the programming language (e.g. expressions and statements) manipulate the computation states.
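As a rough illustration of the shape of an interpretive operational semantics, here is a minimal sketch in Python (our own toy example; in ACL2 the interpreter would be written in the logic of the theorem prover itself and would model a real language, not this toy one). A computation state is an environment mapping variable names to values, and the interpreter describes how expressions and statements manipulate that state.

```python
# Minimal sketch (our own toy example): an interpretive operational semantics
# for a tiny language of expressions and statements.
# A computation state is an environment mapping variable names to integers.

def eval_expr(expr, env):
    """Expressions: ("const", n) | ("var", name) | ("add", e1, e2) | ("mul", e1, e2)."""
    tag = expr[0]
    if tag == "const":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "add":
        return eval_expr(expr[1], env) + eval_expr(expr[2], env)
    if tag == "mul":
        return eval_expr(expr[1], env) * eval_expr(expr[2], env)
    raise ValueError(f"unknown expression: {expr!r}")

def exec_stmt(stmt, env):
    """Statements: ("assign", name, expr) | ("seq", s1, s2). Returns the new state."""
    tag = stmt[0]
    if tag == "assign":
        new_env = dict(env)
        new_env[stmt[1]] = eval_expr(stmt[2], env)
        return new_env
    if tag == "seq":
        return exec_stmt(stmt[2], exec_stmt(stmt[1], env))
    raise ValueError(f"unknown statement: {stmt!r}")

# The program "y := x * x; z := y + 1", executed from the state {x: 3}.
program = ("seq",
           ("assign", "y", ("mul", ("var", "x"), ("var", "x"))),
           ("assign", "z", ("add", ("var", "y"), ("const", 1))))
assert exec_stmt(program, {"x": 3}) == {"x": 3, "y": 9, "z": 10}
```

Theorems about programs in the modeled language are then theorems about this interpreter applied to those programs.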
Denotational design
Denotational design provides a helpful way of thinking about both problem spaces (general and application-specific).
Denotational design also provides a methodology for defining the requirements of a system, such as a zero-knowledge proving system, in such a way that the requirements can be expressed and proven in a formal system.
- A circuit denotes a set: namely, the set of public inputs (i.e., statements) for which the circuit is satisfiable (i.e., the statement is true)
- For example, consider a hash verification circuit (a concrete sketch appears at the end of this subsection)
- A public input is a pair (x, h) where x is some data and h is a hash
- A public input (x, h) expresses the statement that h is the hash of x
- The circuit is (or should be) satisfiable exactly when h is the hash of x
- The goal of circuit verification is to prove that the circuit denotes the intended relation.
- The goal of general purpose proving system verification is to prove that it has the intended properties with respect to the denotational semantics of circuits:
- Soundness means that if the verifier accepts a proof, then with high probability, the public input used to generate the proof (i.e., the statement being proven) is in the set denoted by the circuit (i.e., the statement is true)
- Completeness means that if a public input (i.e., a statement) is in the set denoted by the circuit (i.e., the statement is true), then the proving algorithm successfully outputs a proof which the verifier accepts
Figure 1: denotational design
*A formal version of this diagram is in the Sigma^1_1 arithmetization paper ((288) on page 47). This is a commutative diagram. Soundness and completeness of a circuit compiler are both expressed by the statement that this diagram commutes, meaning that all paths through it between the same objects are equal, or in other words, the denotation of the compiled circuit is always equal to the denotation of the spec.
If you know that your circuit denotes the relation you intend, and you know that your general purpose proof system is sound and complete in the above senses, then you know that your application-specific proving system (i.e., the circuit plus the general proof system) has the intended soundness and completeness properties for that application.
This suggests that, given a formally verified general-purpose proving system, and a verified compiler from statements to circuits, one can solve the problem of proving correctness of application-specific proving systems without application-specific correctness proofs.
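To make the hash verification example and the notion of denotation concrete, here is a minimal sketch, entirely our own toy example: the "hash" is just squaring modulo a small prime (standing in for a real hash function such as SHA-256 or Poseidon), the denotation of the circuit is computed by brute force as the set of public inputs (x, h) on which it is satisfiable, and circuit verification amounts to checking that this set equals the intended relation.

```python
# Minimal sketch (our own toy example): the denotation of a circuit as the set of
# public inputs on which it is satisfiable, compared against the intended relation.
P = 97  # toy prime modulus

def toy_hash(x):
    return (x * x) % P  # stand-in for a real hash function

def circuit(x, h):
    """A (trivially small) hash verification "circuit": one constraint x*x = h mod P.
    The public input is the pair (x, h); this toy example needs no private witness."""
    return (x * x - h) % P == 0

# Denotation of the circuit: the set of public inputs for which it is satisfiable.
denotation = {(x, h) for x in range(P) for h in range(P) if circuit(x, h)}

# Intended relation: the pairs (x, h) such that h is the hash of x.
intended = {(x, toy_hash(x)) for x in range(P)}

# Circuit verification: the circuit denotes exactly the intended relation.
assert denotation == intended
```

Given a sound and complete general-purpose proving system layered on top of such a verified circuit, a verifier that accepts a proof for a public input (x, h) can conclude, with high probability, that (x, h) lies in this set, which is exactly the application-specific soundness property described above.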
Synthesizing Formally Verified Programs
It is often a challenge to bridge the gap between the theorem we can prove and the code we can execute. For example, we may be able to prove a theorem about a spec, and code that spec, but then not be able to prove that the code implements the spec. Or, we may be able to prove a theorem about some code but not be able to compile that code to efficient machine code. Or, we may be able to do that, but be unable to prove that the machine code has the same semantics as the source code.
Here is a summary of some of the ways in which the ecosystem supports correct-by-construction synthesis of efficient code:
- The proof assistants Coq and Agda provide for extraction to other languages to be compiled to machine code; this may leave something to be desired in terms of the efficiency of the resulting machine code.
- The language ATS provides proof facilities and purports to allow for programming with the efficiency of C and C++.
- There are various means for transpiling code written in a mainstream language such as C or Haskell into a proof assistant, which allows for theorems to be proven about the extracted model of the source program.
- You can synthesize an efficient binary program using Coq (e.g., using Fiat or CertiCoq).
- The proof assistant ACL2 defines a subset of Common Lisp with a full formal logic [6]. When a definition is executable, it can be compiled into efficient code, and because the language is a formal logic, you can define and prove theorems about the code.
- There is a verifying compiler project, ATC, from ACL2 to C.
- Imandra defines a subset of OCaml with a full formal logic and a theorem prover.
Current limitations of formal methods on modern hardware
- In the context of modern computing, most computationally intensive tasks deal with vector math and other embarrassingly parallel problems which are done most efficiently on specialized hardware such as GPUs, FPGAs, and ASICs.
- This is generally true of the problem of constructing proofs in probabilistic proof systems. Provers for these proof systems would be most efficient if implemented on specialized hardware, but in practice, they are usually implemented on CPUs, due to the greater ease of programming on CPUs and the greater availability of those skill sets in the labor market.
- For creating a formally verified implementation of a probabilistic proof system which executes efficiently, it seems that the right goal is not to optimize for speed of execution on a CPU, but to target specialized hardware such as FPGAs, GPUs, or ASICs.
- Unfortunately, tools for compiling formally verified programs to run on FPGAs, GPUs, or ASICs are more or less nonexistent as far as we know.
The State of Current Progress
- Decades of research exists on formal verification strategies for arithmetic circuits in the context of hardware verification.
- See, e.g., Drechsler et al (2022) and Yu et al (2016)
- This work has limited industrial applications, e.g., the AAMP5 (see Kern and Greenstreet (1997), page 43).
- This line of research is not directly applicable to formal verification of arithmetic circuits for zk-SNARKs, because arithmetic circuits in hardware and arithmetic circuits in zk-SNARKs are not quite the same things.
- Decades of research exists on proving soundness, completeness, and zero knowledge for probabilistic proof systems.
- This work has generally been carried out informally, but provides a starting point for work on proving the same properties formally.
- Examples:
- Groth16
- Bulletproofs
- PLONK
[State of current progress section continues with details about various organizations and their work...]
Future Research and Development Directions
A lot of work needs to be done. There is not enough emphasis placed on formal verification in the security industry.
Based on the observations and arguments presented in this blog post, we think the following will be some interesting directions for future research and development:
- Build foundations for formally verifying zero knowledge proof systems:
- Generalizable proof techniques for proving the desired properties formally
- Reusable verified abstractions for proof systems, e.g., a polynomial commitment scheme library
- Improve specification languages and verified translators between specification languages
- Understand how to create formally verified programs to run on vectorized hardware, e.g., FPGAs, GPUs, and/or ASICs
- Can we formally verify systems that are designed to automatically make ZK circuits more efficient?
- For example: systems that choose a different circuit so that the setup MPC is more parallelizable, or that allow a prover who learns part of the witness in advance to partially evaluate a circuit and use this information to compute proofs faster
- Use K to prove statements about ZK constraint systems
- Define the semantics of circom/cairo in K
- Use Rust semantics defined in K to prove properties of arkworks-rs programs
If you're interested in further discussions on this topic or working together on this subject, please consider joining our group chat or reaching out to me at ventali@delendum.xyz.
Terminology:
If you write a specification of a computation in a high-level formal language and compile it to a constraint system using a verified or verifying compiler, that is called correct by construction. If you take an existing constraint system and you try to prove properties about it (up to and including soundness and completeness) with respect to a specification in a high-level formal language, that is called post-hoc verification.
Footnotes
- [1] We do not elaborate on the zero knowledge property in this article and will write another article on that. To describe it, you need to get into the concept of private inputs. (The example of correct computation of a hash from a hidden plaintext is a good one for understanding this.) The zero knowledge property is how hard it is to determine the private inputs from the public information. You must encode some hard-to-invert function of a private input in the circuit if you want it to have the zero-knowledge property. The zero-knowledge property is generally a probabilistic measure. However, if the computation is too simple and the output is known, it might become possible to figure out the input with certainty.
- [2] It is worth noting that this problem can be solved in a manner that is independent of the ambient ZK system: it is really just a problem about systems of polynomial equations.
- [3] The transpiler idea makes the most sense when there is a formally defined DSL that is close to the new language you want to define, and then you can transpile from your new language to that DSL. However, that transpiler should really be formally verified as well.
- [4] In our experience of formal verification for systems that have some cryptographic component, we have noticed a few different things:
It's extremely difficult to formally reason about the security properties of the "whole system". The system design is usually very complex, with a lot of dependencies of different types. Even the cryptographic component, which could be pretty small and simple at a high level (say encryption, MAC, etc.), may be implemented in a very complex way, with support for failures, recovery, backup, etc. As a result, even modeling the entirety of just the crypto component in a formal way is quite challenging, so any formal conclusions we draw end up being about a very small part of the whole system. Our methods have not yet developed to a point where we know how to reason about the whole system in formal proofs.
Proof assistant languages – at least for crypto protocols – so far have limited expressivity. There are limited tools that can directly prove theorems about a crypto protocol written in C or even Rust. In general, developers of protocols need to handle multiple threads and timing issues, and those are hard to formalize. E.g., Gallina (for Coq) doesn't have such constructs. From that point of view, Gallina is not very developer-friendly, and Rust using threads is. Future efforts to reason formally about concurrent software systems may benefit from formalization of existing abstractions for reasoning about such systems, such as process calculi, for example the π-calculus.
- [5] Originally, we put "due to Gödel's incompleteness theorem". Alessandro suggested we change this to "the undecidability of first-order logic", because the non-existence of such a proof search algorithm requires less than Gödel's incompleteness theorem: it is due to the undecidability of first-order logic (which is complete according to Gödel's completeness theorem). Second-order and higher-order logics are then incomplete, in addition to being undecidable.
- [6] Common Lisp can be compiled into quite efficient code, very close to C or Rust. (However, not as efficient as code running on GPUs. Also, concurrent programming, while available in Common Lisp, is not part of the subset in ACL2 for which a formal logic is defined.) If you write a formal specification that is executable, though, then it can usually be compiled into fairly efficient code, all within the language of the theorem prover.