Consensus, message delivery and time

13 October 2017

A fundamental problem in distributed computing is how to achieve system reliability in the presence of faulty processes. Blockchain consensus is no exception. Simply speaking, the problem is how to add (commit) a new block (which is the same for all or most of the validators) to the blockchain via validators' voting, when some validators may be non-responsive or compromised.

Naturally, blocks shouldn't be defined as constant (for example, empty) – they should depend on incoming information (for example, new transactions from users). Hence, validators must send and process messages among themselves to arrive at a common block. Understanding the network model (that is, assumptions about how messages are delivered among validators) is therefore crucial for solving the distributed consensus problem.

Networks and Time

Three assumptions about the network model are the most popular:

Asynchronous system: Any message may be delivered at any time after it is sent, or not delivered at all. In particular, messages from validator A to validator B may arrive in any order.

Partially synchronous system: Some unknown time T exists, such that each message is delivered faster than T.

Synchronous system: Each message is delivered faster than a known finite latency T.
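The difference between the three models can be sketched as message delay distributions. The following toy simulation is our own illustration (the function names and distributions are not part of any protocol); it only makes the definitions above concrete:

```python
import random

KNOWN_BOUND = 1.0  # the known latency T of the synchronous model

def sample_delay(model, rng):
    """Sample one message's delivery delay (in seconds) under a network model."""
    if model == "synchronous":
        # Every message arrives within the known bound T.
        return rng.uniform(0.0, KNOWN_BOUND)
    if model == "partially_synchronous":
        # A finite bound T exists, but the algorithm does not know its value.
        unknown_bound = rng.uniform(1.0, 100.0)
        return rng.uniform(0.0, unknown_bound)
    if model == "asynchronous":
        # No bound at all: arbitrary delay, or outright loss (None).
        return None if rng.random() < 0.1 else rng.expovariate(0.01)
    raise ValueError(f"unknown model: {model}")

rng = random.Random(2017)
# Only the synchronous model lets an algorithm rely on a known timeout.
assert all(sample_delay("synchronous", rng) <= KNOWN_BOUND for _ in range(100))
```

Note that a deterministic algorithm can only exploit the bound T when it knows T (synchronous model); in the partially synchronous model it must work for any finite, unknown bound.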

Why consider different assumptions for message delivery? Because each assumption has completely different mathematical properties. For example, it is shown in [2] that no correct deterministic consensus algorithm exists in an asynchronous system if even one validator may be faulty. This observation is quite natural: if messages may never be delivered, validators cannot get any information from each other. And since consensus needs to produce results depending on external input, consensus becomes impossible. (We recommend discussion [3] for a better understanding of [2].)

Thus, if one wants to construct a deterministic consensus algorithm, an upper bound for message delivery is needed. Both partially synchronous and synchronous network models make such assumptions. The partially synchronous model is more general, and is more realistic for relatively small validator networks – it is hardly possible to predict network delays in such networks beforehand. In large public networks, such as Bitcoin, the synchronous model is more reasonable, as an adversary’s power to disrupt network operation is substantially reduced. (As a side note, Bitcoin does not require low network latency T due to relatively large block times.)

Hence, the partially synchronous network model seems adequate for permissioned blockchains, and it is the one that Exonum uses. In the case of Byzantine faults (roughly speaking, allowing for validators to be hacked or otherwise compromised), this model enables the system to continue committing transactions while less than 1/3 of validators are compromised. More precisely, if a system has 3f + 1 validators, no more than f of them may be compromised at any time. This is a fundamental theoretical bound; it is rigorously proven that no consensus algorithm can withstand more Byzantine faults. The consensus algorithm used in Exonum reaches this bound.
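The arithmetic of the 3f + 1 bound can be expressed in a couple of lines (a sketch with our own helper names, not Exonum's API):

```python
def max_byzantine(n):
    """Largest f such that n >= 3*f + 1: the classical Byzantine fault bound."""
    return (n - 1) // 3

def quorum(n):
    """Votes forming a Byzantine quorum: 2*f + 1 out of n validators."""
    return 2 * max_byzantine(n) + 1

# For example, 4 validators tolerate 1 compromised one (quorum of 3),
# and 10 validators tolerate 3 (quorum of 7).
assert max_byzantine(4) == 1 and quorum(4) == 3
assert max_byzantine(10) == 3 and quorum(10) == 7
```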

All for Clients

Partial synchronicity assumes that there is no global time agreed upon by validators. However, validators may still use local timers to measure relative timespans, for example, timeouts for certain actions. On the one hand, this construction eliminates vulnerabilities associated with time (for example, attacks on the Network Time Protocol). On the other hand, eliminating time from consensus creates a problem for lightweight clients.

A client would like to know whether information returned from a full node is up-to-date. If there is no way to determine this, a node could simply return stale information (say, from a month ago), which would be otherwise valid, but highly misleading to the client. To deal with this problem, Exonum consensus mandates including validators' local timestamps in the Precommit messages they sign; these messages are sent to clients to prove information authenticity. The client calculates the median timestamp from the received Precommits; it is guaranteed to be Byzantine-resistant. To see why this is true, observe that the client receives Precommits from at least 2f + 1 validators; by the assumption above, no more than f of these messages are faulty (thus, at least f + 1 are honest). Hence, there is an honest validator with a timestamp lower than or equal to the median and an honest one with a timestamp greater than or equal to the median. For this reason, the median is reasonably accurate.
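The median argument can be checked numerically. Below is a minimal sketch of the rule described above (the function name is ours; this is not Exonum's client code): even if all f faulty validators report wildly wrong timestamps, the median of a 2f + 1 quorum stays bracketed by honest timestamps.

```python
import statistics

def block_timestamp(precommit_times, f):
    """Median of timestamps taken from at least 2f + 1 Precommit messages."""
    assert len(precommit_times) >= 2 * f + 1, "need a Byzantine quorum"
    return statistics.median(precommit_times)

# f = 2, quorum of 5 Precommits. Honest validators report 100..102;
# two faulty ones report 0 and 10**9. The median (101) still falls
# inside the honest range, so the faulty timestamps cannot mislead the client.
assert block_timestamp([0, 100, 101, 102, 10**9], f=2) == 101
```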

What’s Next?

The described approach is what Exonum uses right now, but it is not without its quirks. For one, the median time is not guaranteed to increase with the block height. A possible solution is to make clock synchronization a part of the consensus algorithm; this is the well-studied Byzantine clock synchronization problem [4] with known solutions [5]. For some practical tasks (such as using time in contracting), third-party trusted timestamping authorities may be appropriate. One such authority could be the Bitcoin blockchain (or another public blockchain). Although their timestamps are not very accurate, public blockchains with proof-of-work consensus have measurable (and, in the case of Bitcoin, very high) attack costs, so they could be used together with other timestamping methods for maximum security.

  1. Cynthia Dwork, Nancy Lynch, Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM (JACM), Volume 35, Issue 2, April 1988, pages 288–323.
  2. Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM), Volume 32, Issue 2, April 1985, pages 374–382.
  4. Leslie Lamport, Michael Melliar-Smith. Byzantine Clock Synchronization. Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing (PODC '84), 1984.
  5. Ariel Daliot, Danny Dolev, Hanna Parnas. Linear-time Self-stabilizing Byzantine Clock Synchronization. Proceedings of the 7th International Conference on Principles of Distributed Systems (OPODIS '03, La Martinique, France), December 2003.