Exonum Data Model
This page describes Exonum data storage principles, from the database engine used (RocksDB), to the abstractions that are used in client applications.
- Exonum table types lists supported types of data storage collections. Tables represent the highest abstraction level for data storage
- Low-level storage explains how tables are persisted using RocksDB
- View layer describes the wrapper over the DB engine that ensures atomicity of blocks and transactions
- List of system tables contains tables used directly by the Exonum core
- Indexing gives an insight how indices over structured data can be built in Exonum
Tables (aka indices) perform the same role as in relational database management systems (RDBMSs). Every table stores records of a specific type. However, unlike RDBMS tables, all Exonum tables internally are implemented as wrappers around key-value stores. Both keys and values in the wrapped stores are persisted as byte sequences. Exonum does not natively support operations (matching, grouping, sorting, etc.) over separate value fields, as it is the case with other key-value storages.
Key Sorting and Iterators
Exonum tables implement iterators over stored items (or keys, values, and key-value pairs in the case of maps). Such iterators use key ordering of the underlying key-value storage to determine the iteration order. Namely, keys are lexicographically ordered over their binary serializations; this ordering coincides with that used in RocksDB.
MapIndex implements a key-value store aka a map. It has
the following functionality:
- Get, set and remove value by key
- Check if a specific key is present in the map
- Iterate over the key-value pairs in the lexicographic key order
- Iterate over keys in the lexicographic key order
- Iterate over values in the lexicographic key order
- Clear the map (i.e., remove all stored key-value pairs)
ListIndex represents an array list.
The following operations are supported:
- Get and set a list item by index
- Append an item to the list
- Pop or poll the last item from the list
- Get the list length
- Check if the list is empty
- Iterate over index-item pairs ordered by index
- Insert a sequence of items from an iterator
- Truncate the list to the specified length
- Clear the list (i.e., remove all stored items from the list)
ListIndex does not support inserting items in the middle of the
list or removing items by index
(although it is still possible to implement these operations manually).
ListIndex saves its items with 8-byte unsigned item
indices as keys, serialized in big-endian form (to support proper
The list length is saved in this map with a
zero-length byte sequence as a key.
SparseListIndex represents a
ListIndex that may
contain "gaps". It provides the possibility to delete elements not only from
the end of the list, but from any part thereof. Such deletions do not break
the order of the indices inside the list.
The remaining functionality of the
SparseListIndex is the same as for
ValueSetIndex represents a hash set.
The following operations are implemented:
- Add and remove set elements
- Check if an element is already present using the element itself or its hash
- Iterate over stored elements in the lexicographic order of their hashes
- Iterate over hashes of elements in the lexicographic order
- Clear the set (i.e., remove all elements)
The hash used in
ValueSetIndex is calculated using the
All built-in types implementing
StorageValue compute this hash as SHA-256
of the binary serialization of a type instance.
ValueSetIndex uses element hashes as keys,
and elements themselves as corresponding values.
KeySetIndex represents a set.
The following procedures are implemented:
- Add and remove set elements
- Check if a specific element is in the set
- Iterate over elements in the lexicographic order
- Clear the set (i.e., remove all stored elements)
Internally, the element is used as a key, and its value is always empty.
KeySetIndex vs ValueSetIndex
ValueSetIndex uses a hash as a key,
KeySetIndex puts an entire binary
serialization of an element into the key.
KeySetIndexdoes not have an additional overhead on hashing set elements.
KeySetIndexshould not be used when set elements are relatively big; only small elements should be stored in it (such as integers, small strings, small tuples). On the other hand, the
ValueSetIndexmore easily handles storing big and complex elements.
KeySetIndexintroduces a lexicographical order over stored elements, while the
ValueSetIndexorders elements arbitrarily due to hash function properties.
Entry represents an index that contains only one element.
The following operations are implemented:
- Get, set and remove value
- Check if the value is present
Merkelized indices represent a list and a map with additional features. Such indices can create the proofs of existence or absence for stored data items.
When a light client requests data from an Exonum full node, the proof can be built and sent along with the actual data. Having block headers and this proof, the client may check that received data was really authorized by the validators without having to replicate the entire blockchain contents.
ProofListIndex implements a Merkle
tree, which is a Merkelized version of an
array list. It implements the same methods as
ListIndex, and adds an
additional feature: based on Merkle trees,
ProofListIndex allows efficiently
creating compact proofs of existence for the list items.
The following additional procedures are implemented:
- Get the height of the Merkle tree. As the tree is balanced (though may be not
full), its height is close to
log2of the list length
- Get the value of the tree root (i.e., the hash of the entire Merkle tree)
- Build a proof of existence for an item at a specific position
- Build a proof of existence for items at a specific contiguous index range
ProofListIndex is append-only; it does not allow deleting list items.
The only way to delete an item from a
ProofListIndex is clearing it.
ListIndex, list items are stored with 8-byte keys. However,
ProofListIndex also persists all intermediate nodes of the Merkle tree
built on top of the list, in order to quickly build proofs and recalculate
the Merkle tree after operations on the list.
ProofMapIndex is a Merkelized version of a map
based on Merkle Patricia tree.
It implements the same methods as the
MapIndex, adding the ability to
create proofs of existence for its key-value pairs, or proofs of absence
if a key is absent in the map. The following additional
procedures are supported:
- Get the root node’s value
- Build a proof for the requested key. Tree proves either key existence (and its value), or key absence
Exonum uses third-party database engines to persist blockchain state
locally. To use the particular database, a minimal
interface should be implemented for it:
- Get a value by a column family name and a key
- Put a new value at the specified column family / key (insert or update the saved one)
- Delete a key-value pair by column family name / key
All the tables functionality is reduced to these atomic call types.
Currently the main database engine is RocksDB.
Values from different tables are stored in column families in the low-level storage, wherein the keys are represented as a byte sequence, and values are serialized according to Protobuf serialization format. A single column family may store data for more than one table (see table groups below). Keys of a specific table are mapped to the low-level storage keys in a deterministic manner using table identifiers.
Every table is uniquely identified by a compound identifier, which is used to map table keys into a column family and its keys in the underlying low-level storage. A table identifier consists of 2 parts:
- String name, which is mapped 1-to-1 to a column family.
The name may contain uppercase and lowercase Latin letters, digits,
_, and periods
.. By convention, table names in services should start with the service name and a period. For example, the only table in the Cryptocurrency Tutorial is named
cryptocurrencyis the service name, and
walletsis the own name of the table.
- Optional prefix presented as a sequence of bytes (
Vec<u8>in Rust terms).
If the prefix is present, the column family identified by the table name stores a group of tables, rather than a single table. In this case, prefixes are used to distinguish tables within the group.
key at the table with name
exchange.crypto and prefix
0x42 0x54 0x43 in ASCII) matches key
0x42 0x54 0x43 | key in the column family in RocksDB named
It is strongly advised not to admit a situation when a table prefix in a table group starts with another table prefix in the same group. Such cases may cause unpredictable collisions between logically different keys and elements. As a possible way to avoid this, prefixes within the group may have a fixed byte size.
Exonum introduces additional layer over database to handle transaction and block atomicity.
Patch is a set of serial changes that should be applied to the low-level storage atomically. A patch may include two types of operations: put a value addressed by a key, or delete a value by a key.
Snapshot fixes the storage state at the moment of snapshot creation and provides a read-only API to it. Even if the storage state is updated, the snapshot still refers to the old table content.
Forks implement the same interfaces as the database underneath, transparently wrapping the real data storage state, and some additional changes. Every fork is based on the storage snapshot. From the outer point of view, the changes are eagerly applied to the data storage; however, these changes are stored directly in the fork and may be easily rolled back. Moreover, there may be different forks of the same database snapshot.
Forks are used during transaction and block processing.
A fork is successively passed
to each transaction in the block to accumulate changes produced by the
in a patch.
If one of transactions in the block quits with an unhandled exception (i.e.,
execution, its changes are promptly rolled back, so that execution of the
transactions continues normally.
The core maintains tables that are used for core blockchain
functionality. All of them have
Represents a map from the transaction hash into a raw transaction structure.
Keeps execution results for all accepted transactions, indexed by transaction hashes.
Stores the set of hashes of the known transactions that have not been committed yet.
Caches the number of entries in
Keeps the block height and the tx position inside the block for every transaction hash.
Stores the block object for every block height.
Saves the block hash that has the requested height.
Group of tables keyed by a block height. Each table keeps a list of transactions for the specific block.
Group of tables keyed by a block hash. Each table stores a list of precommits of the validators for the specific block.
Stores the configuration content in JSON format, using its hash as a key.
Builds an index to quickly get a configuration that should activate at the specific height.
An accessory table used to calculate the "aggregation" of the root hashes of the individual service tables. In effect is sums up the state of various entities scattered across distinct services and their tables.
Unlike relational databases, Exonum does not support indices over fields of table elements as a first-class entity. However, it is possible to create additional tables with indexing semantics and update their content together with the tables being indexed.
The system table
block_transactions stores a list of transactions
for every block.
transactions_locations is an auxiliary table that
an index to quickly lookup
block_transactions by a transaction hash.