Leader Election algorithm: RoundRobin

Introduction

We suggest that readers familiarize themselves with the main consensus flow before reading this post.

As we already know, blocks for each height are created from block proposals. Each validator generates those proposals separately. The network selects one of the proposals for the height H and creates a new block from it.
We want to create an algorithm with the following properties:

  1. There is some order among the different block proposals. We need such order to choose one of the generated proposals deterministically.
  2. Such order should change for each height H. Even if a malefactor takes control over the nodes that have the maximum priority right now, they will lose that priority at the next blockchain height.

The RoundRobin algorithm satisfies these prerequisites. Here is its short description:

  1. We define some order over the nodes for each height.
  2. The time between block acceptances is divided into rounds.
  3. In the first round, every node waits for the block_proposal from node #1.
  4. If no correct proposal is received, the whole network moves to round #2 and waits for the block_proposal from node #2, and so on.
  5. After round #N, node #1 gets the priority back.

At height H + 1 the node order shifts by one: we have priorities 1, 2, ..., N - 1, N at height 12345 and priorities 2, 3, ..., N, 1 at height 12346.

rounds

We call the node with the maximum priority the round leader.
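
To make this rule concrete, here is a minimal sketch of basic RoundRobin leader selection (the function name and 0-based numbering are ours for illustration, not the actual implementation):

def basic_round_robin_leader(height, round_number, n_validators):
    # Plain RoundRobin: go through nodes round by round, and shift the
    # whole order by one position at each new height.
    return (height + round_number) % n_validators

# For example, with 4 validators the leader of round 0 at height 12345 is
# node (12345 + 0) % 4, and at height 12346 the whole order shifts by one.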

Algorithm enhancement

Desired properties

That was the general idea. Now let us consider what characteristics we want this algorithm to have.

  1. (required) The algorithm should be common for all nodes; that is, all nodes that have the actual assets-blockchain state should obtain an identical node order after executing Round Robin.
  2. (required) The algorithm should guarantee that a block generated by an honest validator will appear in the near future. This feature is called censorship resistance: there should be no Byzantine strategy after which only Byzantine blocks are accepted. Note that with plain RoundRobin we will usually accept blocks from node #1, as it creates its block proposal earlier than the others; this remark will help us further.
  3. (required) In particular, the algorithm should depend only on factors that are not under the influence of any single node (or any predefined set of nodes). For example, a correct algorithm should not depend on the prevblockhash, because that hash is directly defined by the leader node of the previous height. That is, the previous leader could generate a block hash that gives higher priority to malicious nodes at the next block height.
  4. (desirable) The order of nodes should differ from height to height. Due to this property, Byzantine nodes would not follow one another in a series but would be randomly interspersed with honest nodes.
  5. (desirable) The algorithm should not give preference to any nodes (or artificially decrease the priority of other nodes). Otherwise, some nodes become more valuable targets for malefactors.
  6. (desirable) Round Robin orders should be calculated strictly after the previous block has been accepted, and not earlier. Otherwise, a malefactor gets a lot of time to develop a strategy for some future height.

Actual modifications

Let us make some modifications to the basic RoundRobin algorithm.
Assume that we have F Byzantine validators and N = 3 * F + 1 validators in total; H is the current height of the assets-blockchain.

  1. Every node whose block proposal is accepted by the network (that is, the node becomes the author of the newly accepted block) moves to the locked state for F blocks. During the next F blocks, it has no right to create new block proposals. The node behaves as usual in all other activities: it votes for new blocks, signs messages, etc. Such exclusion provides censorship resistance (required property 2): even if a malefactor takes control over F nodes, they will not be the authors of every future block. After every Byzantine node has moved to the locked state, the next block author is guaranteed to be an honest node.
    Locking authors: remove the authors of the block proposals accepted at the previous F heights. Thus, Round Robin at each height includes M = N - F = 2 * F + 1 validators. Let us re-enumerate them as 0, 1, ..., M - 1 according to their base numbers.
  2. To satisfy desirable property 4, we may shuffle this list pseudo-randomly. The shuffling should be deterministic, but the position of each node should be uniformly distributed over the list.
    To do so, we take a permutation over these M validators. The permutation number is calculated as T = Hash(H) mod M!. This calculation provides a uniform distribution of orders, so Byzantine validators are randomly distributed within the order at height H; a toy sketch of this calculation follows right after this list.
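
To tie these modifications together, here is a toy sketch of the resulting order for a single height (the function name and the brute-force enumeration of permutations are ours for illustration; the experiment code below uses an explicit k-th permutation routine instead):

from hashlib import sha256
from math import factorial
from itertools import permutations

def order_for_height(height, validators, locked):
    # Exclude the authors of the previous F blocks, then pick the
    # T = Hash(H) mod M! permutation of the remaining M validators.
    allowed = [v for v in validators if v not in locked]
    t = int(sha256(height.to_bytes(4, 'big')).hexdigest(), 16) % factorial(len(allowed))
    return list(permutations(allowed))[t]

# Example: N = 7 validators, the authors of the two previous blocks (3 and 5)
# are locked, so the order is a pseudo-random permutation of the other five.
print(order_for_height(12345, list(range(7)), {3, 5}))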

Experiments

We ran the experiments with the following versions of Python and packages:

from sys import version  
from math import factorial  
from hashlib import sha256  
import numpy as np  
from matplotlib import pyplot as pp  
from matplotlib.colors import LinearSegmentedColormap, hex2color  
import seaborn  
import scipy

%pylab inline
%config InlineBackend.figure_format = 'retina'
pylab.rcParams['figure.figsize'] = (6.0, 4.0)  
Populating the interactive namespace from numpy and matplotlib
print('python version: ', version)  
print('numpy version: ', np.__version__)  
print('scipy version: ', scipy.__version__)  
python version:  3.5.2 |Anaconda custom (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
numpy version:  1.11.3
scipy version:  0.19.0

Let's run some experiments for the first 10,000,000 blocks.

Quick nodes

As we remember, usually a block proposal from the leader of round #1 is accepted. That is, if every node is quick enough, the distribution of nodes over position #1 (after shuffling) should be uniform.

First, let's define some useful functions:

def kth_permutation(lst, k):  # return the k-th (lexicographic) permutation of the given list
    lst = lst[:]
    n = len(lst)
    res = []
    bign = factorial(n)
    for ic in range(n - 1, -1, -1):
        bign = bign // (ic + 1)   # number of permutations of the remaining elements
        elem_pos = k // bign      # integer division avoids float precision loss for large k
        res.append(lst.pop(elem_pos))
        k = k % bign
    return res

def to_cmap(colors):  
    step = 1 / (len(colors) - 1)
    cmap_dict = {'red': [],
                 'green': [],
                 'blue': []}
    pos = 0.0
    for ic, color_hex in enumerate(colors):
        color = hex2color(color_hex)
        if ic == len(colors) - 1:
            pos = 1.0

        cmap_dict['red'].append((pos, color[0], color[0]))
        cmap_dict['green'].append((pos, color[1], color[1]))
        cmap_dict['blue'].append((pos, color[2], color[2]))
        pos += step
    return LinearSegmentedColormap('exonum',segmentdata=cmap_dict)    

EXONUM_CMAP = to_cmap(['#5c6700', '#bccc29', '#f5ff9a'])  
EXONUM_COLOR = '#d0e01b'

np.random.seed(514230)  

We will work with 16 nodes, the first 5 of which are Byzantine.

N = 16  
F = 5  
n_blocks = 10000000  

Let us look at the distribution of nodes over positions in the permutation if all nodes are quick enough and the block_proposal from node #1 is always accepted:

# the core of the shuffling: derive the permutation number from the block height
def get_permutation_number(h, m_fac):  
    hashed_height = int(sha256(h.to_bytes(4, byteorder='big')).hexdigest(), base=16)
    k = hashed_height % m_fac  # T = Hash(H) mod M!
    return k

def generate_blocks(n_blocks, N, F, get_block_author_from_shuffle):  
    mat_counters = np.zeros(shape=(N, N - F),dtype=np.int32)
    block_authors = np.zeros(shape = (n_blocks,), dtype=np.int32)

    # The list of authors of the previous F blocks.
    # Actually it is a queue: we append to the end and pop elements from the head
    locked_validators = [] 
    # As we have (N - F) validators for each block, then (N - F)! permutations could happen
    m_fac = factorial(N - F)  


    for h in range(n_blocks):
        allowed_validators = [_ for _ in range(N) if _ not in locked_validators] 
        k = get_permutation_number(h, m_fac)
        shuffled_validators = kth_permutation(allowed_validators, k)
        for ic, new in enumerate(shuffled_validators[:N-F]):
            mat_counters[new, ic] += 1
        new_author = get_block_author_from_shuffle(shuffled_validators)
        block_authors[h] = new_author
        locked_validators.append(new_author)  # lock new author
        locked_validators = locked_validators[-F:]  # release the oldest locked validator

    return mat_counters, block_authors

# We assume the block proposal from the first node in the shuffled order is always accepted
mat_counters, block_authors = generate_blocks(n_blocks, N, F, lambda shuffled: shuffled[0])  

How are validator positions distributed after shuffling?

def plot_heatmap(mat_counters):  
    ax = pp.axes()
    seaborn.heatmap(mat_counters, ax=ax, cmap=EXONUM_CMAP)
    ax.set_xlabel('Position after shuffling')
    ax.set_ylabel('Validator id')
    pp.show()

plot_heatmap(mat_counters)  

png

The node positions look randomly distributed. Let's check whether there are any outliers, i.e., a validator X that falls into the P-th position too often or too rarely:

def plot_frequencies_hist(mat_counters):  
    ax = pp.axes()    
    pp.hist(mat_counters.reshape(-1), bins=20, color=EXONUM_COLOR)
    ax.set_xlabel('Count of matches: node vs position')
    ax.set_ylabel('Frequency')
    ax.set_title('How often a node falls into position')
    print('frequencies mean: {0}, frequencies std: {1:.2f}'.format(
            int(np.mean(mat_counters)),
            np.std(mat_counters)))

plot_frequencies_hist(mat_counters)  
frequencies mean: 625000, frequencies std: 835.97

png

Frequencies are centered around 625,000 (each height places 11 of the 16 nodes into 11 positions, so each node-position pair is hit on average 10,000,000 × 11 / (16 × 11) = 625,000 times), and the deviation is low. It looks like there are no significant outliers.

Off-topic. Is it distributed normally?

Looks like it isn't, let's check it:

p_value = scipy.stats.shapiro(mat_counters.reshape(-1))[1]  
print('p-value: {0:.4f}'.format(p_value))  
p-value: 0.7377

The values of mat_counters are sums of series of length 10,000,000 of atomic variables, where an atomic random variable equals 1 if node X falls into position Y at a given height, and 0 otherwise.
According to the Central Limit Theorem, when independent random variables are added, their sum tends toward a normal distribution even if the original variables themselves are not normally distributed.
However, our problem is that the atomic variables are not independent: at each height H, exactly one node falls into each position, so each column of mat_counters receives exactly one non-zero atomic value per height.

Now, how often does each node fall into the very first position (and become a block author)?

def block_authors_hist(block_authors):  
    authors_values, authors_counts = np.unique(block_authors, return_counts = True)
    ax = pp.axes()
    pp.bar(authors_values, authors_counts, color=EXONUM_COLOR)
    ax.set_xlabel('Validator id')
    ax.set_ylabel('frequency')
    ax.set_title('How often the validator becomes a block author')

block_authors_hist(block_authors)  

png

Indeed, it is uniform, just as we expected.

Now, remember that validators 0-4 are the Byzantine ones. How long should we wait until a block from an honest validator is accepted?

honest_blocks_percent = np.sum(np.array(block_authors) >= F) / n_blocks  
print('honest blocks: {0:.2f}%, Byzantine blocks: {1:.2f}%'.format(honest_blocks_percent * 100,  
                                                                (1 - honest_blocks_percent) * 100))
honest blocks: 68.76%, Byzantine blocks: 31.24%
def calculate_time_to_wait(block_authors):  
    # throw out the last X blocks because there are no "honest" blocks after them
    time_to_wait = np.zeros(shape = (n_blocks,), dtype=np.int32)
    block_with_honest_blocks_later = -1
    for h in range(n_blocks):
        for ic in range(1, n_blocks - h):
            if block_authors[h + ic] >= F:  # this block author is honest
                time_to_wait[h] = ic
                block_with_honest_blocks_later = h
                break
    print('We threw out the last {0} blocks'.format(n_blocks - block_with_honest_blocks_later - 1))
    print('time to wait for the first 10 blocks: ', time_to_wait[:10])
    return time_to_wait[:block_with_honest_blocks_later + 1]

def plot_time_to_wait_hist(time_to_wait):  
    wait_values, wait_counts = np.unique(time_to_wait, return_counts=True)
    ax = pp.axes()
    ind=wait_values
    width=0.8
    pp.bar(ind, wait_counts, width, color=EXONUM_COLOR)
    ax.set_xticks(ind + width / 2)
    ax.set_xticklabels(wait_values)
    ax.set_title('How long should we wait until honest block is accepted')
    ax.set_xlabel('Blocks to wait')
    ax.set_ylabel('Frequency')
    pp.show()

print('first ten authors: ', block_authors[:10])  
time_to_wait = calculate_time_to_wait(block_authors)  
plot_time_to_wait_hist(time_to_wait)  
first ten authors:  [ 0  1  2  3  4  6 14 10  5  1]
We threw out the last 2 blocks
time to wait for the first 10 blocks:  [5 4 3 2 1 1 1 1 2 1]

png

As we see, usually the next block comes from an honest validator. The worst situation is when the current block is honest and F Byzantine blocks follow it. Then we have to wait F + 1 blocks to get an honest one.

Honest nodes are slow; Byzantine nodes are fast

Now imagine that, somehow, Byzantine nodes generate block proposals much faster than honest nodes, while honest nodes are quite slow: it takes an honest node 5 rounds to generate a block proposal, whereas Byzantine nodes send their proposals immediately.

def block_author_slow_honest_nodes(shuffled):  
    honest_slow = 5  # how many rounds it takes an honest node to generate a block proposal
    # Walk through the shuffled order: the first Byzantine node encountered proposes immediately;
    # if none is found within honest_slow positions, the leader (position #0) finally proposes.
    for ic, validator in enumerate(shuffled):
        if validator < F:
            return validator
        if ic >= honest_slow:
            return shuffled[0]

mat_counters, block_authors = generate_blocks(n_blocks, N, F, block_author_slow_honest_nodes)  
plot_heatmap(mat_counters)  

png

It is interesting: the Byzantine nodes appear in the list of possible block authors more rarely than the honest nodes. That happens because the Byzantine nodes become block authors much more often, so they usually sit in "prison" :)

plot_frequencies_hist(mat_counters)  
frequencies mean: 625000, frequencies std: 225896.63

png

honest_blocks_percent = np.sum(np.array(block_authors) >= F) / n_blocks  
print('honest blocks: {0:.2f}%, Byzantine blocks: {1:.2f}%'.format(honest_blocks_percent * 100,  
                                                                (1 - honest_blocks_percent) * 100))
honest blocks: 31.89%, Byzantine blocks: 68.11%

But despite the fact that the honest nodes are so slow, they still generate a significant share of blocks.

block_authors_hist(block_authors)  

png

Byzantine nodes create many more blocks than honest nodes. But we still have enough correct blocks.

print('first ten authors: ', block_authors[:10])  
time_to_wait = calculate_time_to_wait(block_authors)  
plot_time_to_wait_hist(time_to_wait)  
first ten authors:  [ 0  1  2  3  4  6 14 10  2  1]
We threw out the last 4 blocks
time to wait for the first 10 blocks:  [5 4 3 2 1 1 1 5 4 3]

png

And we still need to wait at most F + 1 blocks for an honest block.
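
As a quick sanity check (ours, not part of the original experiment), the censorship-resistance bound can be verified directly on the simulated data:

# After F consecutive Byzantine authors all Byzantine nodes are locked,
# so the next author must be honest; hence the wait never exceeds F + 1.
assert time_to_wait.max() <= F + 1
print('maximum observed wait: ', time_to_wait.max())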

Conclusion

We use the basic RoundRobin algorithm for choosing the round leader with two modifications.

  1. Only M = N - F validators can propose blocks for the new height. If a validator was the author of one of the last F blocks, it cannot generate block proposals for some time.
  2. At each height, we shuffle the list of allowed validators. That is, every allowed node has the same chance to end up at the beginning of the priority queue or at the end.

These modifications give us the following properties:

  1. The algorithm is common for all nodes.
  2. The algorithm guarantees that within the next F + 1 blocks there will be a block generated by an honest node (i.e., censorship resistance).
  3. Nodes priorities do not depend on Byzantine node behavior.
  4. Byzantine nodes do not follow one after the other in a series but are randomly interspersed with honest nodes.

We still want to bring the following properties to the leader selection procedure. We will enhance the algorithm further.

  1. The algorithm should not give preference to any nodes (or artificially decrease the priority of other nodes).
  2. Round Robin orders should be calculated strictly after the previous block has been accepted, and not earlier.