Consensus Algorithm Specification¶
This article contains the specification of the consensus algorithm in Exonum.
Consensus algorithm in Exonum is the process of reaching an agreement about the order of transactions and the result of their execution in presence of Byzantine faults and partially synchronous network. While executing the consensus algorithm, nodes exchange consensus messages authenticated with public-key crypto. These messages are processed via a message queue and determine transitions of nodes among states. The output of the consensus algorithm is a sequence of blocks of transactions, which is guaranteed to be identical for all honest nodes in the network.
Tip
See algorithm overview and list of assumptions for more details.
Definitions¶
The definitions from the general description of the consensus algorithm are used. In particular, +2/3 means more than two thirds of the validators, and -1/3 means less than one third.
Epochs¶
The consensus algorithm occurs in epochs. During each epoch nodes can agree on one of the following types of outcomes:
-
Apply a block of transactions. The nodes need to agree on the ordering of transactions, their effect on the blockchain state and errors that occurred during execution. Besides transactions, hooks for active services are executed before and after transactions.
-
Skip block creation on this epoch, instead approving a block skip. Block skips are similar to empty blocks, but they do not entail executing any calls. Thus, the blockchain state is unaffected after a block skip (or any number of successive block skips).
An epoch is represented by non-negative integer, which is incremented by one each time a decision is made by the network. Note that this integer may differ from the blockchain height; the epoch is always greater or equal to the height.
Applying a block increases the blockchain height by one. The block is permanently recorded into the node storage. In contrast, applying a block skip does not increase the blockchain height. The node stores only the latest block skip, overwriting it if a skip is approved at the same blockchain height and the greater epoch. If the network approves a normal block, a stored block skip (if any) is erased by the node.
Rounds¶
The consensus algorithm proceeds in rounds for each epoch. Rounds are numbered from 1 and are not synchronized among nodes.
The onsets of rounds are determined by the following timetable.
The first round starts after committing a normal block or a block skip
at the previous epoch.
The second round starts within first_round_timeout
interval,
The value of which can be configured via
consensus configuration.
All further round timeouts are calculated according to the following formula:
first_round_timeout + (r - 1) * round_timeout_increase
Here, round_timeout_increase
is 10% of the first_round_timeout
.
Thus, the duration of rounds gradually increases. This provides the network with more time every round to make a decision on a new block.
Pool of Unconfirmed Transactions¶
Each node has a set of transactions that have not yet been added to the blockchain. This set is called pool of unconfirmed transactions. In general, the pools of unconfirmed transactions are different for different nodes. If necessary, the nodes can request unknown transactions from other nodes.
Proof-of-Lock¶
A set of +2/3 Prevote
messages for the same proposal from the nodes at the
current round and epoch is called Proof-of-Lock (PoL). Nodes
store PoL as a part of the node state. The node can have no more than one
stored PoL.
A PoL is greater than the recorded one (has a higher priority), in the following cases:
- There is no PoL recorded.
- The recorded PoL corresponds to a proposal with a smaller round.
Thus, PoLs are partially ordered. A node must replace the stored PoL with a greater PoL if it is collected by the node during message processing.
Configuration Parameters¶
-
max_propose_timeout
Initial proposal timeout after a new block is committed to the blockchain. -
propose_timeout_threshold
If the amount of transactions in the pool of unconfirmed transactions of a node is larger than thepropose_timeout_threshold
value, the node switches frommax_propose_timeout
tomin_propose_timeout
, i.e. generates blocks faster, and vice versa. -
min_propose_timeout
The proposal timeout for the case when the amount of transactions in the pool of unconfirmed transactions is larger thanpropose_timeout_threshold
value. -
first_round_timeout
Interval between the first and the second rounds of the consensus algorithm. This parameter is used to estimate timeouts for further consensus rounds. The estimation formula isfirst_round_timeout + (r - 1) * round_timeout_increase
, whereround_timeout_increase
is 10% of thefirst_round_timeout
. -
status_timeout
Interval betweenStatus
message broadcasts.
Tip
These parameters are a part of the global configuration. They can be adjusted with the help of the supervisor service.
Node State Variables¶
-
current_height
Current blockchain height (i.e., the number of blocks in it). -
current_epoch
Current epoch of the consensus algorithm. -
queued
Queue for consensus messages (Propose
,Prevote
,Precommit
) from a future epoch or round. -
proposes
Hash map with known block proposals. -
locked_round
Round when the node has locked on a proposal. 0 if the node is not locked. -
current_round
Current round (1-based). -
locked_propose
Propose
on which the node is locked. May be undefined. -
state_hash
Hash of the blockchain state.
Consensus Messages¶
The consensus algorithm uses the following types of messages:
Propose
,
Prevote
,
Precommit
,
Status
,
BlockResponse
.
The following fields are present in all messages:
-
validator_id
Index of a validator in thevalidators
list in the global configuration. -
epoch
Epoch of the consensus algorithm to which the message is related. -
round
Round to which the message is related. -
hash
Hash of the message.
Propose
and BlockResponse
messages have the following additional field:
prev_hash
Hash of the previous block in the blockchain.
Prevote
messages have the following additional field:
locked_round
Round in which the author of the message has locked on the proposal which is referenced by the message. If the author is not locked on any proposal, thelocked_round
field is 0.
Note
A node that is locked on a proposal must send Prevote
s only
for the proposal it is locked on. Thus, locked_round
in Prevote
s sent
by a node is always equal to locked_round
from its state.
Prevote
and Precommit
messages have the following additional fields:
propose_hash
Hash of thePropose
message corresponding to this message.
Precommit
messages have the following additional fields:
state_hash
Hash of the blockchain state after the execution of all transactions in thePropose
referenced by thePrecommit
.time
Local time of the validator duringPrecommit
generation. This field does not influence the validity of thePrecommit
; rather, it is used by light clients to check whether responses from full nodes correspond to the current state of the blockchain.
Algorithm Stages¶
The algorithm proceeds in stages, transitions among which are triggered by incoming messages and timeouts.
- Full proposal
Occurs when the node gets complete info about some proposal and all the transactions from the proposal. - Availability of +2/3
Prevote
s
Occurs when the node collects +2/3Prevote
messages from the same round for the same known proposal. - Lock
Occurs when the node replaces the stored PoL (or collects its first PoL for thecurrent_epoch
). - Commit
Occurs when the node collects +2/3Precommit
messages for the same round for the same known proposal. Corresponds to the Commit node state.
The steps performed at each stage are described below.
Message Processing¶
Nodes use a message queue based on the tokio
library for message
processing. Incoming requests and consensus messages are placed in the queue
when they are received. The same queue is used for processing timeouts. Timeouts
are implemented as messages looped to the node itself.
Messages from the next epoch (i.e., current_epoch + 1
) or from a future
round are placed into a separate queue (queued
).
Deserialization¶
- Check the message against the serialization format.
- If any problems during deserialization are detected, ignore the message as something that a node cannot correctly interpret.
- If verification is successful, proceed to Consensus messages processing or Transaction processing, depending on whether the message is a transaction.
Transaction Processing¶
- If the transaction is already committed, ignore it.
- If the transaction is already in the pool of unconfirmed transactions, ignore it.
- Add the transaction to the pool of unconfirmed transactions.
-
For every known proposal
propose
where this transaction is included:- Exclude the hash of this transaction from the list of unknown transactions
for the
propose
. - If the number of unknown transactions for the proposal becomes zero,
proceed to Full proposal state for
propose
.
- Exclude the hash of this transaction from the list of unknown transactions
for the
Consensus Messages Processing¶
-
Do not process the message if it belongs to a future round or epoch. In this case:
- If the message refers to the epoch
current_epoch + 1
, add the message to thequeued
queue. - If the message is related to a future epoch and updates the knowledge of the node about the current epoch of the message author, save this information according to the requests algorithm.
- If the message refers to the epoch
-
If the message refers to a past epoch, ignore it.
-
If the message refers to the current epoch and any round not higher than the current one, then:
- Check that the
validator_id
specified in the message is less than the total number of validators. - Check the message signature against the public key of the validator with
the
validator_id
index.
- Check that the
-
If verification is successful, proceed to the message processing according to its type.
Note
Consensus messages can lead to sending request(s) based on the information in the message. Requests are used to obtain information unknown to the node, but known to its peers. This part of message processing is described in the Requests article and is only marginally touched upon here.
Propose¶
Arguments: propose
.
- If
propose.hash
is already present in theproposes
hash map (i.e., the message has been processed previously), ignore the message. - Check
propose.prev_hash
correctness. - Check that the specified validator is the leader for the given round.
- Check that the proposal does not contain previously committed transactions
(
Propose
messages contain only hashes of transactions, so absence of hashes in the table of committed transactions is checked). - Add the proposal to the
proposes
hash map. - Request missing information based on the message.
- If all transactions in the proposal are known, go to Full proposal.
Prevote¶
Arguments: prevote
.
- Add
prevote
to the list of knownPrevote
messages for the given proposal inprevote.round
. -
If:
- the node has formed +2/3
Prevote
messages for the same round andpropose_hash
locked_round < prevote.round
- the node knows a
Propose
message referenced by thisprevote
- the node knows all the transactions from the
Propose
- the node has formed +2/3
-
Then proceed to Availability of +2/3
Prevote
s for the referencedPropose
message inprevote.round
.
Precommit¶
Arguments: precommit
.
- Add the message to the list of known
Precommit
s forpropose_hash
in this round with the givenstate_hash
. -
If:
- the node has formed +2/3
Precommit
s for the same round,propose_hash
andstate_hash
- the node knows the
Propose
referenced bypropose_hash
- the node knows all the transactions in this
Propose
- the node has formed +2/3
-
Then:
- Execute the proposal, if it has not yet been executed.
- Check that
state_hash
of the node coincides with thestate_hash
in thePrecommit
s. If not, stop working and signal about an unrecoverable error. - Proceed to Commit for this block.
-
Else:
BlockResponse¶
Arguments: block
.
Note
BlockResponse
messages are requested by validators if they see a
consensus message belonging to a future blockchain height or epoch.
BlockResponse
messages are not a part of an ordinary consensus
message workflow for nodes at the latest epoch.
-
Check the
BlockResponse
message:- The key in the
to
field must match the key of the node. block.prev_hash
must match the hash of the latest committed block.- If the returned value is a normal block, its height must equal
the current height of the node. If the returned value is a block skip,
its height must equal the previous height (i.e.,
current_height - 1
) and its epoch must be greater thancurrent_epoch
. - The number of
Precommit
messages from different validators must be sufficient to reach consensus. - All
Precommit
messages must be correct.
- The key in the
-
If the checks are successful, then check all transactions in the block for correctness. If some transactions are incorrect, stop working and signal about an unrecoverable error.
-
Execute all transactions. If the hash of the blockchain state after the execution diverges from that in the
Block
message, stop working and signal about an unrecoverable error. -
Add the block to the blockchain and move to a new epoch. Set the value of the variable
locked_round
at the new epoch to0
.
Timeout Processing¶
Round Timeout¶
- If the timeout does not match the current epoch and round, skip further timeout processing.
- Add a timeout for the
N
th round with the lengthfirst_round_timeout * (1 + (N - 1) * q)
, whereq = 0.1
is the relative increase in a timeout after each round. - Process all messages from
queued
that have become relevant (their round and epoch coincide with the current ones). - If the node has a saved PoL, send a
Prevote
forlocked_propose
in the new round, and proceed to Availability of +2/3Prevote
s. - Else, if the node is a leader, form and send
Propose
andPrevote
messages (after expiration ofpropose_timeout
, if the node has just moved to a new epoch).
Status Timeout¶
- If the epoch of the node has not increased since the timeout was set, then
broadcast a
Status
message to all peers. - Add a timeout for the next
Status
broadcast (its length is specified bystatus_timeout
).
Stage Processing¶
Full Proposal¶
Arguments: propose
, all transactions in which are known.
- If the node does not have a saved PoL, send a
Prevote
message in the round to which the proposal belongs. -
For each round
r
in the interval[max(locked_round + 1, propose.round), current_round]
:- If the node has +2/3
Prevote
s forPropose
inr
, then proceed to Availability of +2/3Prevote
s forpropose
inr
.
- If the node has +2/3
-
For each round
r
in the interval[propose.round, current_round]
:-
If +2/3
Precommit
s are available forpropose
inr
and with the samestate_hash
, then:- Execute the proposal, if it has not yet been executed.
- Check that the node’s
state_hash
after applying transactions inpropose
coincides with thestate_hash
in the aforementioned +2/3Precommit
s. If not, stop working and signal about an unrecoverable error. - Proceed to Commit for this block.
-
Availability of +2/3 Prevote
s¶
Arguments: shared propose_hash
and round
of the collected +2/3
Prevote
s.
- If
locked_round
of the node is less thanprevote.round
and the hash of the lockedPropose
message is the same aspropose_hash
in the collectedPrevote
s, then proceed to Lock for the latterPropose
message.
Lock¶
-
For each round
r
in the interval[locked_round, current_round]
:- If the node has not sent
Prevote
inr
, send it forlocked_propose
with thelocked_round
specified as thelocked_round
in the node state. - If the node has formed +2/3
Prevote
s inr
, then changelocked_round
tocurrent_round
,locked_propose
topropose.hash
(propose
corresponds to +2/3Prevote
s inr
). -
If the node did not send
Prevote
for any other proposal exceptlocked_propose
in subsequent rounds afterlocked_round
, then:- Execute the proposal, if it has not yet been executed.
- Send
Precommit
forlocked_propose
incurrent_round
. - If the node has +2/3
Precommit
s for the same round with the samepropose_hash
andstate_hash
, then proceed to Commit.
- If the node has not sent
Commit¶
Arguments: block or block skip (i.e., a proposal with all known transactions,
and the state_hash
resulting from the execution of all transactions in the
proposal).
- Store the block.
- Push all the transactions from the block to the table of committed transactions.
- Increment
current_epoch
. If the block is a normal block, incrementcurrent_height
. - Set the value of the variable
locked_round
to0
at the new epoch. - Delete all transactions of the committed block from the pool of unconfirmed transactions.
- If the node is the leader, form and send
Propose
andPrevote
messages afterpropose_timeout
expiration. - Process all messages from the
queued
that have become relevant (their round and epoch coincide with the current ones). - Add a timeout for the next round.
Properties¶
Note
Formal proof of the following properties is proven in a separate white paper.
Warning
The white paper does not consider block skips. While adding other possible consensus outcomes does not influence safety and liveness arguments, it could adversely affect chain quality.
If:
- Digital signature scheme used in the algorithm is secure (i.e., signatures cannot be forged)
- Network is partially synchronous (i.e., all messages are delivered in finite, but a priori unknown time)
- Less than 1/3 of validators act Byzantine (i.e., in an arbitrary way, including being offline, having arbitrary hardware and/or software issues or being compromised, possibly in a coordinated effort to break the system)
Then the algorithm described above has the following properties:
-
Safety
If an honest node adds a block to the blockchain, then no other honest node can add a different block, confirmed with +2/3Precommit
messages, to the blockchain at the same epoch. -
Liveness
At any point in time, a block will be committed eventually by an honest node in the future. In other words, the transaction processing won’t stall. -
Weak form of chain quality
1 block out of anyF + 1
(whereF
is one third of the validators) sequentially committed blocks is guaranteed to be proposed by non-Byzantine validators. This can provide a certain degree of censorship resistance (any correct transaction broadcasted to every validator will be committed eventually).