Consensus in Exonum¶
Generally, a consensus algorithm is a process of obtaining an agreed result by a group of participants. In Exonum the consensus algorithm is used to agree on the list of transactions in blocks added to the blockchain. The other goal of the algorithm is to ensure that the results of the transaction execution are interpreted in the same way by all nodes in the blockchain network.
The consensus algorithm in Exonum uses some ideas from the algorithm proposed in Tendermint, but has several distinguishing characteristics as compared to it and other consensus algorithms for blockchains.
Assumptions¶
The Exonum consensus algorithm assumes that the consensus participants can be identified. Thus, the algorithm fits permissioned blockchains, which Exonum is oriented towards, rather than permissionless ones.
Not all the nodes in the blockchain network may be actively involved in the consensus algorithm. Rather, there is a special role for active consensus participants – validators or validator nodes. For example, in a consortium blockchain validators could be controlled by the companies participating in the consortium.
The consensus algorithm must operate in the presence of faults, i.e., when participants in the network may behave abnormally. The Exonum consensus algorithm assumes the worst; it operates under the assumption that any individual node or even a group of nodes in the blockchain network can crash or be compromised by a resourceful adversary (say, a hacker or a corrupt administrator). This threat model is known in computer science as Byzantine faults; correspondingly, the Exonum consensus algorithm is Byzantine fault tolerant (BFT).
From the computer science perspective, the Exonum consensus algorithm takes usual assumptions:
- Validator nodes are assumed to be partially synchronous, i.e., their computation performances do not differ much
- The network is partially synchronous, too. That is, all messages are delivered in the finite time which, however, is unknown in advance
- Each validator has access to a local stopwatch to determine time intervals. On the other hand, there is no global synchronized time in the system
- Validators can be identified with the public-key cryptography; correspondingly, the communication among validators is authenticated
The same assumptions are used in PBFT (the most well-known BFT consensus) and its successors.
Algorithm Overview¶
Consensus algorithm in Exonum proceeds in iterations (epochs). At each epoch, the nodes make a decision as to the transactions in the next block and their effect on the blockchain state. As an alternative to accepting a new block, the nodes may decide to skip block creation for the epoch; this outcome is called block skip.
The reasoning for having block skips is outlined at the end of this article.
The process of reaching consensus at epoch E
consists of several rounds,
numbered from 1. The first round starts once the
validator commits the outcome of epoch E - 1
. The onsets of rounds are
determined by the following timetable: rounds start with time intervals
that linearly increase with the round number.
As there is no global time,
rounds may start at different time for different validators.
When the round number R
comes, the previous rounds are not completed.
That is, round R
means that the validator can process messages
related to a round with a number no greater than R
.
The current state of a validator can be described as a tuple (E, R)
.
The R
part may differ among validators, but the epoch E
is usually the
same.
If a specific validator is lagging (e.g., during its initial synchronization,
or if it was switched off for some time), its epoch may be lower.
In this case, the validator can request missing blocks from
other validators and full nodes in order to quickly synchronize with the rest
of the network.
Strawman Version¶
To put it very simply, rounds proceed as follows:
- Each round has a leader node. The round leader offers a proposal for the next block or block skip and broadcasts it across the network. The logic of selecting the leader node is described in a separate algorithm.
- Validators may vote for the proposal by broadcasting a prevote message. A prevote means that the validator has been able to parse the proposal and has all transactions specified in it.
- After the validator has collected enough prevotes from a supermajority of other validators, it applies transactions specified in the voted proposal, and broadcasts a precommit message. This message contains the result of the proposal execution in the form of a new state hash. The precommit expresses that the sender is ready to commit the corresponding proposed block to the blockchain, but needs to see what other validators have to say on the matter just to be sure.
- Finally, if the validator has collected a supermajority of precommits with the same state hash for the same proposal, the proposed block or block skip is committed to the blockchain.
Non-Strawman Version¶
Note
In the following description, +2/3 means more than two thirds of the validators, and -1/3 means less than one third.
The algorithm above is overly simplified:
- The validator may receive messages in any order because of network delays. For example, the validator may receive a prevote or precommit for a block proposal that the validator does not know
- There can be validators acting not according to the consensus algorithm. Validators may be offline, or they can be corrupted by an adversary. To formalize this assumption, it is assumed that -1/3 validators at any moment of time may be acting arbitrarily. Such validators are called Byzantine in computer science; all other validators are honest
The 3-phase consensus (proposals, prevotes and precommits) described above is there to make the consensus algorithm operational under these conditions. More precisely, the algorithm is required to maintain safety and liveness:
- Safety means that once a single honest validator has committed a block or a block skip, no other honest validator will ever commit any other outcome at the same epoch
- Liveness means that honest validators keep deciding on an epoch outcome from time to time
Locks¶
Byzantine validators may send different messages to different validators. To maintain safety under these conditions, the Exonum consensus algorithm uses the concept of locks.
A validator that has collected a +2/3 prevotes for some proposal locks on that proposal. The locked validator does not vote for any other proposal except for the proposal on which it is locked. When a new round starts, the locked validator immediately sends a prevote indicating that it is locked on a certain proposal. Other validators may request prevotes that led to the lock from the locked validator, if they do not have them locally (these prevotes are known as proof of lock).
Example
Validator A gets prevotes from validators B and C, and they do not get prevotes from each other because of connection problems. Then validators B and C can request each other’s prevotes from validator A.
Locks can be changed: if A locked on a proposal and during next round all other validators locked on the next proposal, A would update its lock eventually.
Requests¶
As consensus messages may be lost or come out of order, the Exonum consensus uses the requests mechanism to obtain unknown information from other validators. A request is sent by a validator to its peer if the peer has information of interest, which is unknown to the validator, and which has been discovered during the previous communication with the peer.
Example
A request is sent if a node receives a consensus message from an epoch greater than the local epoch. The peer is supposed to respond with the contents of the epoch outcome (block with transactions or a block skip), together with a proof of authenticity (i.e., precommits of +2/3 validators).
There are requests for all consensus messages: proposals, prevotes, and precommits. As consensus messages are authenticated with digital signatures, they can be sent directly in response to requests.
Node States Overview¶
The order of states in the proposed algorithm is as follows:
Commit -> (Round)+ -> Commit -> ...
On the timeline, these states look the following way (for one of the validator nodes):
Commit: | E | | E+1 | ...
Round1: | | R1 | | R1 ...
Round2: | | R2 | | R2 ...
Round3: | | R3 | | R3 ...
Round4: | | R4 | | R4 ...
...
------------------------------------------------------------------> Time
Note that rounds have a fixed start time but they do not have a definite end
time (they end when the next block is received). This differs from common
behavior of partially synchronous consensus algorithms, in which rounds have
a definite conclusion (i.e., messages generated during round R
must be processed only during round R
).
Network Communication¶
Messages¶
The consensus algorithm makes use of several types of messages. All messages are authenticated with the help of public-key digital signatures, so that the sender of the message is unambiguously known and cannot be forged. Furthermore, use of digital signatures (instead of, say, HMACs) ensures that messages can be freely retransmitted across the network. Moreover, this can be done by load balancers that have no idea whatsoever as to the content of messages.
Propose¶
Propose
message is a set of transactions proposed by the round leader
for inclusion into the next block.
Instead of whole transactions, Propose
messages include only transactions
hashes.
A validator that received a Propose
message can request missing transactions
from its peers.
If all validators behave correctly, Propose
is sent only by the leader node
of the round.
Prevote¶
Prevote
is a vote for a Propose
message. Prevote
indicates that a
validator
has a correctly formed Propose
and all the transactions specified in it.
Prevote
is broadcast to all validators.
Precommit¶
Precommit
is a message expressing readiness to include a certain proposal
as the next block into the blockchain. Precommit
is broadcast to all
validators.
Status¶
Status
is an information message about the current node state (in particular,
its blockchain height and epoch). It is sent with a periodicity determined
by the status_timeout
global configuration parameter.
BlockResponse¶
A BlockResponse
message contains a block with transaction or a block skip,
and a set of Precommit
messages authenticating it.
BlockResponse
messages are sent upon request.
Connect¶
A node sends a Connect
message to all addresses from the list of its known
peers during initialization. The message tells the peer to connect to the
address specified in this message. Each of the receiving nodes respond
by their own Connect
messages.
Request Messages¶
There are request messages for transactions, Propose
and Prevote
messages,
and blocks. The generation and processing rules for these messages are fairly
obvious.
Example
A ProposeRequest
message is generated if a validator receives a consensus
message (Prevote
or Precommit
) that refers to the Propose
message,
which
is unknown to the validator. A receiver of a ProposeRequest
message sends
the requested Propose
in response.
Distinguishing Features¶
In comparison with other BFT algorithms, the consensus algorithm in Exonum has the following distinctive features.
Block Skips¶
Warning
Block skips are considered an experimental feature and need to be explicitly switched on. (See node docs for more details.) In best Rust traditions, you don’t pay for security / safety if you don’t use this feature.
Skipping block creation under certain conditions allows to save on storage and processing power of the nodes. Unlike normal blocks, block skips are not recorded in the blockchain forever; instead, the nodes store only the latest skip if it corresponds to the newest blockchain height. If a node accepts a normal block, it erases the skip.
If there are no transactions in the network for a prolonged period of time, block skips provide tangible reduction in storage space compared to constantly creating empty blocks.
Note
A curious reader may wonder why the consensus algorithm does not just halt when there are no transactions. The answers lies in the partial network synchronicity assumption underpinning the consensus algorithm. This assumption means that a situation where nodes do not exchange messages is indistinguishable with the situation where nodes are not connected! Thus, block skips provide the “heartbeat” for the network.
Another reason to use block skips is that they allow to actualize proofs regarding the blockchain state. Since block skips are stored similarly to empty blocks, it is possible to tie a Merkle proof to the newest skip instead of the newest normal block.
Customizable Proposal Logic¶
Warning
Just like block skips, customizing proposal logic is experimental and need to be explicitly switched on. Consult node docs for more details.
The positive effect of block skips is enhanced by the ability of nodes to customize the logic how they form block proposals. For example, nodes may whitelist / blacklist transactions by the sender, prioritize transactions by the addressed service or other parameters, implement more flexible transaction limits, and so on.
Unbounded Rounds¶
Rounds have a fixed start time but they do not have a definite end time (a round ends only when the next block is received). This helps decrease delays when the network connection among validators is unstable.
Assume that consensus messages from a certain round need to be processed within the round. If the state of the network deteriorates, the network might not manage to accept the proposal until the end of the round. Then in the next round the entire process of nominating a proposal and voting for it must begin again. The timeout of the next round should be increased so that the block could be accepted during the new round timeout with a poor network connectivity. The need to repeat anew the work that has already been done and increase in the timeout would lead to additional delays in accepting the block proposal.
In contrast to the case discussed in the previous paragraph, the absence of a fixed round end in Exonum allows the system to accept the proposal with a minimum necessary delay.
Work Split¶
Propose
messages include only transactions hashes. (Transactions are included
directly into BlockResponse
messages.) Furthermore, transactions execution is
delayed;
transactions are applied only at the moment when a node locks on a Propose
.
Delayed transactions processing reduces the negative impact of malicious nodes on the system throughput and latency. Indeed, it splits transactions processing among the stages of the algorithm:
- On the prevote stage, validators only ensure that a list of transactions
included in the proposal is correct (the validator checks that all the
transactions in
the
Propose
are already stored by this node. Correctness of a transaction is verified when the transaction is received; nodes do not store incorrect transactions.) - On the precommit stage, validators apply the transactions to the current blockchain state
- On the commit stage, validators ensure that they achieved the same state after applying the transactions in the proposal
If a Byzantine validator sends out proposals with a different transactions
order to different validators, the validators do not need to spend time
checking the order and applying the transactions on the prevote stage.
A different transactions order will be detected when comparing the
propose_hash
received
in the prevote messages from other validators and the propose_hash
received
in the proposal message.
Thus, the split of work helps reduce the negative impact of Byzantine nodes on the overall system performance.
Requests Algorithm¶
Requests algorithm allows a validator to restore any consensus info from other validators. This has a positive effect on system liveness.