EOS and Block Producer Verification



EOS is secured by 21 producing nodes, with standby nodes always at the ready to step into a producing role. Critically, it is assumed that each node runs on independent server infrastructure.

In EOS right now, block producers are trading their pay for votes so that they might achieve a higher block producer (BP) ranking. This is welcomed by some community members seeking passive returns on their token holdings. But sacrificing block producer pay to buy votes means that there is less money to maintain and upgrade vital network infrastructure. Moreover, standby block producers do not need to demonstrate their ability to produce blocks in order to receive standby rewards. Only once they are elevated into the top 21 are standby nodes required to show that they are capable of producing blocks for the network.

EOS by default uses approval voting, whereby each token can vote for up to thirty producers. Approval voting increases the risk of cartel formation when voter kickbacks are allowed. Voting pools have gradually emerged that appear to control multiple nodes. These pools can offer the highest rewards and have been consolidating their control over the network, while many independent producers are being relegated to ever lower ranks.
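To make the cartel risk concrete, here is a minimal sketch of an approval-vote tally. It is a simplification for illustration only: the function names and the sample stakes are hypothetical, and it ignores any extra vote-weighting rules the real system contract applies. The one detail taken from the text is the cap of thirty approvals per voter.

```python
from collections import Counter

MAX_APPROVALS = 30  # from the article: each token can vote for up to thirty producers


def tally(votes):
    """Sum staked weight per producer.

    `votes` is a list of (staked_tokens, [producer names]) pairs.
    Under approval voting the full stake counts toward every approved producer.
    """
    totals = Counter()
    for stake, approved in votes:
        unique = set(approved)
        if len(unique) > MAX_APPROVALS:
            raise ValueError("a voter may approve at most 30 producers")
        for producer in unique:
            totals[producer] += stake
    return totals


def ranking(votes):
    """Producers ordered by total weight, ties broken by name."""
    totals = tally(votes)
    return sorted(totals, key=lambda p: (-totals[p], p))


# Hypothetical example: one pool approves three nodes it controls,
# while two independent producers each have their own smaller backing.
votes = [
    (100, ["pool_a1", "pool_a2", "pool_a3"]),
    (60, ["indie_b1"]),
    (50, ["indie_b2"]),
]
print(ranking(votes))
```

Because a single stake is counted in full for every approved producer, the pool's 100 tokens place all three of its nodes above both independents at no extra cost.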

With these dynamics in play, block producers and the tokens supporting them are incentivized to squeeze production costs in order to offer greater voter returns - regardless of the damage that these incentives appear to be doing to the token price itself. Some voter kickbacks already pay returns higher than the inflation rate paid for block production itself! Go figure. Consequently, there is justifiable concern that some nodes may be tempted to share hosting infrastructure, or that some paid standby nodes may have no infrastructure backing their candidacy at all. Incentives could lead to a race to the bottom.

Here then is a proposal that hopes to address these concerns.

Requirement

  • Provide a mechanism to test that all block producer nodes and paid block producer standby candidates are running independent server infrastructure
  • The mechanism should not add any overhead or costs to honest block producers
  • The mechanism should not compromise the security of the chain or affect its normal operation

Solution

It is proposed that an extensible architecture be built to periodically and in a synchronised way challenge random sets of standby block producers to provide objective proofs regarding their hosting infrastructure.

The architecture offered provides only for the scheduling of the challenges and the recording of the results. It leaves open the question of what remedial action should be taken in the event of a producer being found to be in violation of the block producer agreement.

Rotate In/Rotate Out

One way to test for block producer readiness, currently being advocated, is to randomly select a standby producer and rotate that producer into a producing role for some number of rounds. While this would prove that the candidate does indeed have an operational server backing it, it does not prove that the server is hosted on independent infrastructure.

The solution below builds on this in the reverse sense. It will be useful to periodically rotate producing nodes out of the top 21 so that they can be included in the proofs of independent infrastructure described.

Challenge Contracts

Challenges are defined under a common interface and implemented as wasm code. Each challenge is distributed via the code property associated with a standard EOS account. A challenge can be included in the challenge schedule once it has been activated by an agreed procedure, perhaps a 15/21 block producer multi-signature, should the chain's governance not have been captured already by a cartel :)

The wasm interface has a simple entry point and is provided with context and state data sufficient to run the challenge. Challenges will be strictly time-bound and have a maximum possible duration.
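As a rough illustration of what such an interface might look like, here is a minimal sketch in Python rather than wasm. All of the names (ChallengeContext, ChallengeResult, run_challenge) are hypothetical and not part of any existing EOS API. The sketch only measures elapsed time after the fact; a real host would have to enforce the limit by interrupting the challenge.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChallengeContext:
    """Context handed to a challenge (hypothetical field names)."""
    block_height: int      # block height at which the challenge runs
    state_digest: bytes    # stand-in for the state data the challenge needs
    max_seconds: float     # maximum allowed duration


@dataclass(frozen=True)
class ChallengeResult:
    proof: bytes
    elapsed: float
    on_time: bool


def run_challenge(entry_point: Callable[[ChallengeContext], bytes],
                  ctx: ChallengeContext) -> ChallengeResult:
    """Call the challenge's single entry point and record whether it met the time bound."""
    start = time.monotonic()
    proof = entry_point(ctx)
    elapsed = time.monotonic() - start
    return ChallengeResult(proof, elapsed, elapsed <= ctx.max_seconds)


# A trivial challenge that just echoes the state digest back as its "proof".
def echo_challenge(ctx: ChallengeContext) -> bytes:
    return ctx.state_digest


result = run_challenge(echo_challenge, ChallengeContext(1000, b"digest", 5.0))
print(result.on_time)
```

Keeping the entry point to a single function of the context makes it easy to register new challenges without touching the scheduling code.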

Challenge Coordinator System Contract

A smart contract run under the system contract periodically selects n random standby block producers to participate in one of the registered challenges. Each challengee is scheduled to run the challenge at a configurable block height. The challenge is executed as follows:

  • The system raises an event in the participating node when the block that specifies the challenge is seen.
  • The participating node enters a suspended state so that it cannot be promoted into a producing role. If the node is currently in a producing role, then upon entering a suspended state it will temporarily resign from block production and a non-suspended standby will take its place for the duration of the challenge.
  • The chain is paused at the targeted block height within the participating node.
  • The challenge code is executed and results submitted to the coordinator contract
  • The node is then resumed
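The flow above can be sketched as follows. This is only an illustration under stated assumptions: the article leaves the selection method open, so the hash-based selection from a shared seed (for example a recent block id) is my own placeholder, and the function names and the dictionary used to model a node are hypothetical.

```python
import hashlib


def select_challengees(standbys, seed: bytes, n: int):
    """Pick n standbys deterministically by ranking sha256(seed + name).

    Assumption: `seed` is a value every node agrees on and nobody can predict in advance.
    """
    ranked = sorted(standbys, key=lambda name: hashlib.sha256(seed + name.encode()).digest())
    return ranked[:n]


def schedule(standbys, seed: bytes, n: int, start_height: int):
    """Assign each selected standby its own block height: start, start+1, ..."""
    chosen = select_challengees(standbys, seed, n)
    return {name: start_height + i for i, name in enumerate(chosen)}


def execute_on_node(node: dict, height: int, challenge, submit) -> None:
    """Suspend the node, pause at the target height, run the challenge, submit, resume."""
    node["suspended"] = True      # cannot be promoted into a producing role
    node["paused_at"] = height    # local chain paused at the targeted block height
    try:
        submit(node["name"], height, challenge(node, height))
    finally:
        node["paused_at"] = None  # the node is resumed even if the challenge fails
        node["suspended"] = False


standbys = [f"standby{i}" for i in range(1, 11)]
plan = schedule(standbys, seed=b"recent-block-id", n=3, start_height=5000)
results = []
for name, height in plan.items():
    node = {"name": name, "suspended": False, "paused_at": None}
    execute_on_node(node, height, lambda nd, h: f"proof@{h}", lambda *r: results.append(r))
print(len(results))
```

Because the selection depends only on the seed and the candidate names, every validating node can recompute the same schedule and check that the right producers were challenged.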

Example Challenge - Proof of RAM

This example is provided for illustrative purposes only. It has some holes in it as it stands! Nonetheless it provides a suitable starting point for investigating the types of challenges that would be needed to achieve the desired goals.

Description

Participating nodes must calculate a PoW-style difficulty challenge across the chain's RAM state at the given block height. For example, 7 block producers are selected to participate in the challenge. They are each assigned a different block height at which to perform the challenge, e.g. n+0, n+1, ..., n+6. Each challengee must compute a cryptographic proof across the RAM state at its prescribed block height. The proof will target a difficulty of ten seconds, and it will need to be parallel resistant so that multiple CPUs do not improve the targeted completion time. The results are recorded on chain and can be cross-checked by any validating node.
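To give a feel for the shape of such a proof, here is a minimal sketch that chains SHA-256 over the RAM state so that each step depends on the previous one. This is not a solution to the parallel-resistance problem: it is a plain sequential hash chain that I am substituting for illustration, it is not memory-hard, and verifying it costs as much as computing it. The function names, the chunk size and the rounds parameter (a crude knob for tuning towards the ten-second target) are all assumptions.

```python
import hashlib


def proof_of_ram(ram_state: bytes, block_height: int, account: str,
                 rounds: int = 3, chunk_size: int = 64) -> str:
    """Hash-chain over the whole RAM state, seeded by the challengee and block height.

    Each digest feeds into the next, so the chunks cannot be hashed independently.
    """
    h = hashlib.sha256(f"{account}:{block_height}".encode()).digest()
    for _ in range(rounds):
        for offset in range(0, len(ram_state), chunk_size):
            h = hashlib.sha256(h + ram_state[offset:offset + chunk_size]).digest()
    return h.hex()


def verify_proof(ram_state: bytes, block_height: int, account: str, claimed: str,
                 rounds: int = 3, chunk_size: int = 64) -> bool:
    """Any validating node holding the same RAM state can recompute and compare."""
    return proof_of_ram(ram_state, block_height, account, rounds, chunk_size) == claimed


# Toy RAM state; a real one would be the chain's RAM at the prescribed height.
state = bytes(range(256)) * 4
proof = proof_of_ram(state, 100, "bpaccount1")
print(verify_proof(state, 100, "bpaccount1", proof))
```

Seeding the chain with the account name and block height means two challengees cannot reuse each other's result, even if they hold the same RAM state.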

Commentary

If the same infrastructure is running multiple nodes, then only the first node will be able to complete the proof and submit the results within the expected time window of 12 seconds or so. If the node is running on a shared server, the server might copy the RAM state for each of the nodes that it is hosting and run the computation over the RAM copy, or it might outsource the computation altogether. However, in either case the node would need to have access to the correct RAM state at each of the given block heights, and memory equal to n times the RAM size for at least the duration of the challenge window. Nodes trying to game the challenge would also have some difficulty in making the requisite code changes to the base software in order to monitor for and interrupt the standard scheduling of the challenge system.
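The timing argument can be checked with a little arithmetic. The sketch below assumes a single shared host that can only work on one proof at a time (which is what a truly parallel-resistant proof would force), challenges staggered one block apart, and a block interval of 0.5 seconds. The function name and parameters are hypothetical; the ten-second difficulty and twelve-second window come from the text.

```python
def nodes_on_time(num_nodes: int, proof_seconds: float = 10.0,
                  window_seconds: float = 12.0, block_interval: float = 0.5) -> int:
    """Count how many nodes sharing one host finish within their own window."""
    on_time = 0
    host_free_at = 0.0
    for i in range(num_nodes):
        start = i * block_interval                 # node i is challenged at block n+i
        finish = max(host_free_at, start) + proof_seconds
        host_free_at = finish                      # host is busy until this proof is done
        if finish - start <= window_seconds:
            on_time += 1
    return on_time


print(nodes_on_time(7))                     # ten-second proofs on one shared host
print(nodes_on_time(7, proof_seconds=1.0))  # proofs that are too easy
```

With ten-second proofs only the first of seven co-hosted nodes meets its window, while with one-second proofs all seven do, which is why the difficulty has to be close to the window length for the challenge to mean anything.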

The main problem with the proposed challenge is in finding a proof over memory state that is parallel resistant. A thread about the problem is in the link below but it is a bit out of my league unfortunately :) I'll take the course haha

A discussion about parallel resistant proof of work schemes

Discussion

Some have argued that this is not a real concern for the EOS network or that we should rather focus our efforts towards other areas first. I would respond that nodes having independent infrastructure is paramount to network security. It is really step 1. Blockchain is built on cryptographic/objective proofs and to ask anything less of the infrastructure hosting it defeats the purpose. There can be no bearer instruments without a secure chain.

While the requirement as stated is to prove that the producers are running on independent servers, it may be sufficient from a gaming perspective to prove only that the nodes are suffering a near equivalent financial cost. But this is not a given either. There are potentially other nefarious behaviours, aside from cost cutting alone, that running nodes on shared infrastructure might support. For example, nodes could coordinate transaction delays in order to front-run order books on a DEX.

Once the architecture described is in place then I would expect the community to offer many interesting challenges. New proofs are being found all the time. Collecting data about block producer operations will have value in its own right and may point to optimisations beyond merely policing the block producer agreement. It could allow for block producer rotations to improve geodiversity. Challenges for geolocation, bandwidth and history come to mind.

The proposed architecture is quite complex and would need to be built into the protocol itself, so I would expect that B1 (Block.one) would have to be on board to implement it. Probably a long shot, that. But if nothing else, I hope this article promotes some discussion about these important issues.

Many questions remain. How for example are the random sets of block producers selected? How many should be in each challenge set? And of course what are the remedial actions? If an offending block producer is found, should the tokens supporting that producer suffer some consequence? I think so.

A validation scheme like the one described is sorely needed for DPoS chains to improve their credibility in the blockchain space. I sincerely hope that B1 are working on such solutions to these problems already.

Thanks in advance for any critiques in the comments below!