Bonsol: Verifiable Compute for Solana

At Anagram Build, we spend the majority of our time researching novel crypto primitives and applying them in specific products. One of our recent research projects took us into the realm of Verifiable Compute (VC), and our team leveraged this research to create a new open source system called Bonsol. We chose this area of research given the emergence of effective use cases that VC enables and the concerted efforts across various L1s to optimize VC's cost-effectiveness and scalability.

In this blog, we have two goals.

  • First, we want to make sure you come away with a better understanding of VC as a concept and the possible products it enables in the Solana ecosystem. 
  • Second, we want to introduce you to our latest creation, Bonsol.

What is Verifiable Compute, anyway?

The term ‘Verifiable Compute’ may not appear in the investment decks of bull market startups, but the term ‘Zero Knowledge’ does. So, what do these terms mean?

Verifiable Compute (VC) means running a specific workload in a way that produces an attestation of its execution, which can be publicly verified without running the computation again. Zero Knowledge (ZK) means being able to prove a statement about data or a computation without revealing all of the data or the inputs to the computation. The two terms are often conflated in practice, and ZK is somewhat of a misnomer: it is really about selecting which information must be made public in order to prove a statement. VC is the more accurate term, and it is the overall goal in many existing distributed system architectures.

How does VC help us build better crypto products?

So, why do we want to add VC or ZK systems on top of platforms like Solana and Ethereum? The answer is largely about security for the developer. The developer of a system acts as a mediator between the end users' trust in a black box and the technical functions that make that trust objectively valid. By utilizing ZK/VC techniques, the developer can reduce the attack surface of the products they are building: VC systems shift the locus of trust to the proving system and the compute workload being proven. This is similar to the trust inversion that occurs when moving from a typical web2 client/server approach to a web3 blockchain approach, where trust shifts from relying on a company's promises to trusting the open-source code and the cryptographic systems of the network. There are no true zero-trust systems from the user's perspective, and I'd argue that to end users, it all seems like a black box.

For example, by using a ZK login system, a developer takes on far less liability: instead of maintaining a secure credential database and the infrastructure around it, they only need a system that verifies certain cryptographic properties hold. VC techniques are being applied in many places where consensus is needed, so that the only thing required to reach consensus is that the mathematics are valid.

While there are many promising examples of using VC and ZK in the wild, many currently rely on in-progress development at all levels of the crypto software stack to become fast and efficient enough for production use.

As part of our work here at Anagram, we have the opportunity to speak with a multitude of crypto founders and developers to understand where the current state of the crypto software stack is slowing product innovation. These conversations have surfaced an interesting trend: a cohort of projects is actively moving on-chain product logic off-chain, either because it has become too expensive or because they need more exotic business logic. These developers end up looking for systems and tools to balance the on- and off-chain parts of products that are becoming more and more powerful. This is where VC becomes a critical part of the path forward, connecting the on- and off-chain worlds through trustless, verifiable methods.

Let’s go! So how do VC/ZK systems work today?

VC and ZK functions are now mainly performed on alternative compute layers (aka rollups, sidechains, relays, oracles, or coprocessors) available via a callback into the smart contract runtime. To enable this workflow, many of the L1 chains have efforts in progress to provide shortcuts outside of the smart contract runtime (e.g., syscalls or precompiles) that let them do things that would otherwise be too expensive on-chain.

There are a few common modes of current VC systems. I'll mention the top four that I'm aware of. In all but the last case, the ZK proving happens off-chain, but it's where and when the proofs are verified that gives each of these modes its edge.

Fully On-chain Verification

For VC and ZK proving systems that can produce small proofs, such as Groth16 or some PLONK varieties, the proof is submitted on-chain and verified on-chain using previously deployed code. Such systems are now very common, and the best way to try this out is with Circom and a Groth16 verifier on Solana or EVM. The drawback is that these proof systems are quite slow, and they usually require learning a new language. To verify a 256-bit hash in Circom, you need to deal manually with each of those 256 bits. There are many libraries that let you simply call hash functions, but behind the scenes they reimplement those functions in Circom code. These systems are great when the ZK/VC element of your use case is small and you need the proof to be valid before some other deterministic action is taken. Bonsol currently falls in this first category.
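For contrast, the zkVM approach that Bonsol builds on (introduced below) lets you call an ordinary hash library instead of wiring bits by hand. Here is a minimal sketch of a RISC Zero guest program, assuming the `risc0_zkvm` and `sha2` crates; the exact macro and API surface vary by Risc0 version, and none of this is Bonsol's actual interface:

```rust
// Sketch of a RISC Zero guest: hash a private input and commit the digest.
// Illustrative only; Risc0 API details vary by version.
#![no_main]

use risc0_zkvm::guest::env;
use sha2::{Digest, Sha256};

risc0_zkvm::guest::entry!(main);

fn main() {
    // Read a private input from the host; the proof never reveals it.
    let secret: Vec<u8> = env::read();

    // Call an ordinary hash library -- no bit-level circuit wiring needed.
    let digest = Sha256::digest(&secret);

    // Commit the digest to the public journal; an on-chain verifier sees
    // only this value plus a proof that the guest computed it honestly.
    env::commit(&digest.to_vec());
}
```

The equivalent Circom circuit would need the SHA-256 compression function expressed constraint by constraint, which is exactly the overhead described above.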

Off-chain Verification

The proof is submitted to the chain so that all parties can see that it's there, and off-chain compute is later used to verify it. In this mode, you can support any proving system, but since verification does not happen on-chain, you don't get the same determinism for whatever action depends on the proof submission. This is good for systems with some sort of challenge window, where parties can "naysay" and try to prove that the proof is incorrect.

Verification Networks

The proof is submitted to a verification network, and that verification network acts as an oracle to call the smart contract. You get the determinism, but you also need to trust the verification network. 

Synchronous On-chain Verification

The fourth and final mode is quite different: here, proving and verifying happen all at once, completely on-chain. The L1, or a smart contract on the L1, actually runs a ZK scheme over user inputs, allowing execution to be proven over private data. There aren't many widespread examples of this in the wild, and the things you can do with it are usually limited to basic mathematical operations.

Recap

All four of these modes are being tested in various chain ecosystems, and we will see which new patterns emerge and which becomes dominant. On Solana, for example, there isn't a clear winner, and the VC and ZK landscape is very early. The most popular method across many chains, including Solana, is the first mode. Fully on-chain verification is the gold standard, but as discussed, it comes with drawbacks: mainly latency, plus limits on what your circuits can do. As we dive into Bonsol, you will see that it follows the first mode with a slight twist.


Introducing Bonsol!

Enter Bonsol, a new Solana-native VC system that we at Anagram built and open sourced. Bonsol allows a developer to create a verifiable executable over private and public data and integrate the results into Solana smart contracts. The project builds on the popular RISC Zero (Risc0) toolchain.

This project was inspired by a question asked by many of the projects we work with on a weekly basis: "How can I do this thing with private data and prove it on-chain?" While the "thing" differed in each case, the underlying desire was the same: to minimize their centralized dependencies.

Before we dive into the details of the system, let’s kick this off by illustrating the power of Bonsol with two separate use cases.

Scenario one

A Dapp that allows users to buy raffle tickets for pools of various tokens. The pools are "decanted" once a day from a global pool in such a way that the amount of money in the pool (the amount of each token) is hidden. Users can buy access to increasingly specific ranges of the token amounts in the pool. But there is a catch: once a user buys a range, it becomes public for all users at the same time. The user must then decide whether to buy a ticket: they can decide it's not worth the purchase, or they can secure a stake in the pool by buying one.

Bonsol comes into play when the pool is created and when the user pays for the range to become known. When the pool is created/decanted, the ZK program takes in the private inputs of the amount of each token to decant. The types of tokens are known inputs, and the pool address is a known input. This proof is a proof of random selection from the global pool into the current pool. The proof contains a commitment to the balances as well. The contract on-chain will receive this proof, verify it, and hold on to the commitments such that when the pool is finally closed and the balances are sent from the global pool to the raffle ticket owners, they can verify that the token amounts were not changed since the random selection at the beginning of the pool. 

When a user buys an "opening" of the hidden token balance ranges, the ZK program takes the actual token balances as private inputs and produces a range of values that is committed along with the proof. A public input of this ZK program is the previously committed pool-creation proof and its outputs. This way, the whole system is verified: the prior proof must be validated inside the range proof, and the token balances must hash to the same value committed in the first proof. The range proof is also committed to the chain and, as noted above, makes the range visible to all participants.
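To make the chaining concrete, here is a minimal sketch in Rust of the commitment logic described above. All names are hypothetical, and a real guest program would emit the range through the proof's public outputs rather than return it:

```rust
use sha2::{Digest, Sha256};

/// Commit to the hidden token balances (hypothetical helper).
/// Only the digest is published on-chain; the balances stay private.
fn commit_balances(balances: &[u64]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    for b in balances {
        hasher.update(b.to_le_bytes());
    }
    hasher.finalize().into()
}

/// Range-opening step: derive a public range over the private balances
/// while re-checking the commitment from the pool-creation proof.
fn open_range(balances: &[u64], prior_commitment: [u8; 32]) -> (u64, u64) {
    // The private inputs must hash to the value committed in the first
    // proof, otherwise the chain of proofs is broken.
    assert_eq!(commit_balances(balances), prior_commitment);

    let total: u64 = balances.iter().sum();
    // Emit a coarse public range around the private total (illustrative).
    let width = 1_000;
    let lower = total / width * width;
    (lower, lower + width)
}
```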

While there are many ways to build this sort of raffle system, the properties of Bonsol make it easy to place very little trust in the entity running the raffle. It also highlights the interplay between Solana and the VC system: the Solana program (smart contract) plays a crucial role in brokering that trust, because it verifies the proofs and only then allows the next action to be taken.

Scenario two

Bonsol allows developers to create a primitive that is used by other systems. Bonsol contains the notion of deployments, where a developer can create some ZK program and deploy it to Bonsol operators. The Bonsol network operators currently have some basic ways to evaluate if an execution request for one of the ZK programs will be economically advantageous. They can see some basic information about how much compute the ZK program will take, the input sizes, and the tip that the requester is offering. A developer can deploy a primitive that they think many other Dapps will want to use. 

In the configuration for a ZK program, the developer specifies the order and type of required inputs. The developer can also release an InputSet that preconfigures some or all of those inputs. By preconfiguring inputs, a developer can create primitives that help users verify computation over very large datasets, as sketched below.
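As a rough illustration, an input configuration might look something like the following. This is a hypothetical shape, not Bonsol's actual deployment schema:

```rust
/// Hypothetical shape of a ZK program's input configuration.
/// Bonsol's real deployment format differs; this only illustrates the idea.
enum InputType {
    PublicData,  // supplied in the execution request, visible on-chain
    PrivateData, // fetched by the claiming prover, never revealed
    PublicProof, // a previously committed proof used as an input
}

struct InputSlot {
    position: usize,
    input_type: InputType,
    // Some(..) when the deployer preconfigures this slot as part of an
    // InputSet (e.g., a pointer to a large historical dataset).
    preconfigured: Option<Vec<u8>>,
}
```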

For example, let's say that a developer creates a system that, given an NFT, can prove on-chain that its chain of ownership transfers included a specific set of wallets. The developer can ship a preconfigured input set containing a bunch of historical transaction information, and the ZK program searches through that set to find the matching owners. This is a contrived example and could be done in a myriad of ways.

Consider another example: a developer writes a ZK program that can verify that a signature comes from a specific keypair, or from a hierarchical set of keypairs, without revealing the public keys of those authoritative keypairs. Say this is useful to many other Dapps, and they use this ZK program; the protocol then gives the author a small usage tip. Because performance is critical, developers are incentivized to make their programs fast so that operators want to run them, and devs seeking to rip off another dev's work will need to change the program in some way to be able to deploy it, since the content of the ZK program is validated. Any operation added to the ZK program will affect its performance, and while it's definitely not foolproof, this may help ensure developers are rewarded for innovation.

Bonsol Architecture

These use cases help describe what Bonsol is useful for, but let's take a look at its current architecture, incentive model, and execution flow.

The above image describes the flow: a user needs to perform some kind of verifiable compute, usually via a dapp that requires it before letting the user take some action. This takes the form of an Execution Request containing information about the ZK program being executed, the inputs or input sets, the time within which the compute must be proven, and the tip (which is how the Relays get paid). The request gets picked up by the Relays, which race to decide whether they want to claim the execution and start proving. Based on its capabilities, a relay operator may choose to pass because the tip isn't worth it or the ZK program or inputs are too big. If a relay decides to perform the computation, it must execute a claim. If it is the first to get the claim, its proof will be accepted until a certain time; if it fails to produce the proof in time, other nodes can claim the execution. In order to claim, the Relay must put up some stake (currently hard-coded to tip / 2) that will be slashed if it fails to produce a correct proof.
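A minimal sketch of the claim-and-stake rule just described, with hypothetical types and names (the real on-chain program structures this differently):

```rust
/// Hypothetical claim check for an execution request.
/// Stake is currently tip / 2, slashed on failure (per the flow above).
struct ExecutionRequest {
    tip: u64,
    max_block_height: u64,      // deadline by which the proof must land
    claimant: Option<[u8; 32]>, // relay that currently holds the claim
}

fn try_claim(
    req: &mut ExecutionRequest,
    relay: [u8; 32],
    stake: u64,
    current_block: u64,
) -> Result<(), &'static str> {
    if current_block > req.max_block_height {
        return Err("execution request expired");
    }
    // A claim is only available if nobody holds it (a timed-out prior
    // claimant would have `claimant` reset elsewhere).
    if req.claimant.is_some() {
        return Err("already claimed");
    }
    // Stake requirement: hard-coded to half the tip for now.
    if stake < req.tip / 2 {
        return Err("insufficient stake");
    }
    req.claimant = Some(relay);
    Ok(())
}
```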

Bonsol was built on the thesis that more compute will move to a layer where it is attested to and verified on-chain, and that Solana will soon be a “go-to” chain for VC and ZK. Solana's fast transactions, cheap compute, and burgeoning user base make it an excellent place to test these ideas.

Was this easy to build? Heck no!

That isn't to say there weren't challenges building Bonsol. In order to take the Risc0 proof and verify it on Solana, we need to make it smaller, but we can't simply shrink it without sacrificing the security properties of the proof. So we use Circom to wrap the Risc0 STARK, which can be around 200 KB, in a Groth16 proof, which always ends up being 256 bytes. Fortunately, Risc0 provided some nascent tooling for this, but it adds a lot of overhead and dependencies to the system.

As we started to build out Bonsol and use the existing tooling for wrapping the STARK with the SNARK, we sought ways to reduce dependencies and increase speed. Circom allows compilation of Circom code into C++ or Wasm. We first tried to compile the Circom circuit into a wasmu file produced via LLVM, which is the most efficient way to make the Groth16 tooling portable while keeping it fast. We chose Wasm for its portability, since the C++ code depends on the x86 CPU architecture, meaning new MacBooks or Arm-based servers wouldn't be able to use it.

But this became a dead end on the timeline we had to work with. Most of our product research experiments are time-boxed until they prove their worth, so we had 2-4 weeks of dev time to test the idea, and the LLVM Wasm compiler just could not handle the generated Wasm code. We tried many optimization flags and ways of getting the LLVM compiler working as a wasmer plugin to precompile this code, but we were unsuccessful; with more work, we could have gotten past this. Because the Circom circuit is around 1.5 million lines of code, you can imagine how huge the resulting Wasm became. We then turned our sights to creating a bridge between the C++ and our Rust relay code base. This also met a quick defeat, as the C++ contained some x86-specific assembly code that we did not want to fiddle with. To get the system out to the public, we ended up bootstrapping it in a way that makes use of the C++ code but removes some of the dependencies.

In the future, we would like to expand on another line of optimization we were working on: taking the C++ code and compiling it into an execution graph. The C++ artifacts from the Circom compilation are mostly modular arithmetic over a finite field with a very large prime modulus. This approach showed promising results for smaller and simpler C++ artifacts, but more work is needed to make it function with the Risc0 system: the generated C++ code is around 7 million lines, and the graph generator seems to hit stack size limits; raising those limits produces other faults that we have not had time to diagnose. Even though some of these avenues did not pan out, we were able to make contributions to OSS projects and hope that at some point those contributions will be upstreamed.

The next set of challenges is more in the design space. An essential part of this system is having private inputs. Those inputs need to come from somewhere, and due to time constraints we weren't able to add a fancy MPC encryption system that keeps the private inputs inside the system in a closed loop. To address this need and unblock developers, we added the notion of a private input server: it validates, via a signature over a payload, that the requester is the current claimant of the proof, and then serves the private inputs to them. As we extend Bonsol, we plan to implement an MPC threshold decryption system by which the Relay nodes can allow the claimant to decrypt the private inputs. All of this discussion about private inputs brings us to a design evolution we plan to make available in the Bonsol repo: Bonsolace, a simpler system that lets you as a developer easily prove these ZK programs on your own infrastructure. Instead of farming the work out to the prover network, you can prove it yourself and verify it on the same contract that the prover network uses. This is aimed at very high value private data use cases where access to the private data must be minimized at all costs.
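Here is a sketch of the check the private input server performs before serving inputs, assuming the `ed25519_dalek` crate; the payload format and state lookup are hypothetical:

```rust
use ed25519_dalek::{Signature, Verifier, VerifyingKey};

/// Hypothetical check run by the private input server before serving
/// inputs: the requester signs a payload, and the signer must be the
/// relay that currently holds the claim on this execution.
fn authorize_input_request(
    current_claimant: &VerifyingKey, // read from the on-chain claim state
    payload: &[u8],                  // e.g. execution id plus a fresh nonce
    signature: &Signature,
) -> bool {
    current_claimant.verify(payload, signature).is_ok()
}
```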

One last note about Bonsol that we haven't yet seen elsewhere with Risc0: we force a commitment (a hash) over the input data into the ZK program, and we actually check on the contract that the inputs the prover committed to match what the user expected and sent into the system. This comes at some cost, but without it the prover could cheat and run the ZK program over inputs the user did not specify.

The rest of the Bonsol development fell into normal Solana development, though we intentionally tried out some new ideas there. On the smart contract, we use flatbuffers as the only serialization system. This is a somewhat novel technique that we would like to see developed into a framework, because it lends itself nicely to auto-generating cross-platform SDKs. One final note: Bonsol currently needs a precompile to work most efficiently. This precompile is slated to land in Solana 1.18; until then, we are working to see if teams are interested in this research, and we are looking beyond Bonsol into other technologies.
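To make the input commitment check above concrete, here is a minimal sketch with hypothetical names; the real contract reads the committed digest from the proof's public outputs:

```rust
use sha2::{Digest, Sha256};

/// Hypothetical on-chain check: the digest the prover committed inside
/// the ZK program (surfaced in the proof's public outputs) must match a
/// digest recomputed from the inputs the user originally submitted.
fn inputs_match(user_inputs: &[Vec<u8>], committed_digest: &[u8; 32]) -> bool {
    let mut hasher = Sha256::new();
    for input in user_inputs {
        hasher.update(input);
    }
    let expected: [u8; 32] = hasher.finalize().into();
    &expected == committed_digest
}
```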

Wrapping Up 

Beyond Bonsol, the Anagram build team is looking deeply into many areas of the VC domain. We are tracking projects like Jolt, zkLLVM, Spartan2, and Binius, along with companies working in the Fully Homomorphic Encryption (FHE) space. If you don't know what that is, stay tuned; we will cover it at some point.

Please check out the Bonsol repository and open an issue for examples you need or ways you want to extend it. It's a very early project, and you have the chance to make your mark.

If you are working on interesting VC projects, we encourage you to apply here for the Anagram EIR program, where, alongside the Anagram team, you will be able to test your thesis, build a company, and tackle the biggest possible problems. Feel free to contribute or ask any questions.

anagram.xyz

twitter (x.com)

blog

Become an EIR