Bitcoin and Ethereum: what happens on nodes that do not mine, and what will happen to them next?


Introduction

The prospects of blockchain systems are discussed frequently these days: in the future, it is said, blockchain will replace classical payment systems such as Visa or Mastercard, and perhaps even radically change jurisprudence thanks to the capabilities of “smart” contracts. Despite all the expectations, however, a full-fledged, comprehensive payment system on the blockchain has not yet been created. Payment for real goods and services with cryptocurrencies is usually carried out where some restrictions are imposed on the use of classical payment methods, and a significant share of cryptocurrency transactions is speculative in nature.

There are, of course, many factors hindering the development of blockchain systems; they can be technical, economic, political or even psychological in nature. This article discusses only some of the technical limitations of the two most popular blockchain systems, Bitcoin and Ethereum.

It is assumed that you are already familiar with the basic operating principles of these systems. If some terms are unclear, explanations can be found in the book Mastering Bitcoin by Andreas Antonopoulos (a Russian translation is available online) and in an article on the principles of Ethereum on Geektimes.

It is also worth noting that recently, on the wave of blockchain's popularity, some distributed data stores have begun to be called blockchains even though they are not blockchains in the original sense of the term and, as a rule, do not combine the proper level of security with openness and independence. In this article, by blockchain systems we will understand distributed storage that meets the following requirements (a minimal sketch of the underlying chain structure follows the list):
  • Data is stored in a single chain of blocks. Over short periods of time, several branches (forks) may appear at the end of the chain, but subsequently only one of the branches is considered valid.
  • Data is stored in a decentralized manner; all nodes are equal and independent. The system has no single owner who can manage it unilaterally, and any change to the system can only happen if the majority of nodes accept it.
  • Any user can view any data stored in the system and use it to check the correctness of added blocks and, consequently, the correctness of the data itself.
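
Here is such a sketch in Go (types are invented for illustration; no real client works exactly like this). Each block commits to its predecessor's hash, so changing old data invalidates every later block, which is what makes the third requirement checkable:

package main

import (
	"crypto/sha256"
	"fmt"
)

// Block is a deliberately minimal stand-in for a real block header.
type Block struct {
	PrevHash [32]byte // hash of the previous block
	Data     string   // stand-in for the block's transactions
}

// hash commits to both the payload and the link to the predecessor.
func hash(b Block) [32]byte {
	return sha256.Sum256(append(b.PrevHash[:], b.Data...))
}

func main() {
	genesis := Block{Data: "genesis"}
	next := Block{PrevHash: hash(genesis), Data: "tx set #1"}
	// Any node can recompute the links and detect tampering.
	fmt.Println(hash(next) == hash(Block{PrevHash: hash(genesis), Data: "tx set #1"})) // true
}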

Prospects for accelerating block mining

In modern blockchain systems, transactions are processed much more slowly than in conventional payment systems. The most popular systems today, primarily Bitcoin and Ethereum, are built on the proof-of-work principle. To confirm a transaction in such a system, you must first wait until the transaction passes through the queue and is recorded in the next block. Then you must wait some more, until enough computational work has been done on top of that block to guarantee that no one can repeat all of this work on their own in order to modify the added data. In Bitcoin, a transaction is usually considered confirmed once 6 further blocks have been added after the block containing it; this takes approximately one hour. For Ethereum there is no consensus on what counts as reliable confirmation: some wallets wait 5 blocks (about a minute), while some exchanges may require several hundred blocks.

Some exchanges can carry out internal cryptocurrency transactions much faster, in a matter of seconds. But these transactions are internal: they are not entered into the shared blockchain but stored in the database of the particular exchange. Information about such transfers appears in the main blockchain only when funds are deposited to or withdrawn from the exchange. Therefore, in what follows, we will not consider internal exchange transactions to be blockchain transactions.

For small purchases, the seller's risks are low, so there is often no need to wait for confirmation; it is enough to know that the transaction has entered the network and been queued for addition. But if we consider the system as a whole, the main element limiting network throughput is still the mining of blocks, since all sent transactions must sooner or later be added to blocks. The difficulty of mining has no real physical basis; it is set artificially to ensure the amount of work needed for reliable confirmation. In Bitcoin, the difficulty is chosen so that the average time to add a new block is approximately 10 minutes: if blocks start to be mined faster, the difficulty increases; if slower, it decreases. Blocks are added more frequently in Ethereum: a new block appears approximately every 15 seconds.
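
For reference, here is a rough Go sketch of Bitcoin-style retargeting (the interval and clamping factor follow the published protocol rules; the function itself is illustrative, not consensus code):

package main

import "fmt"

const (
	retargetInterval = 2016                       // blocks between difficulty adjustments
	expectedSeconds  = retargetInterval * 10 * 60 // two weeks at 10 minutes per block
)

// newDifficulty rescales difficulty by expected/actual time,
// clamped to a factor of 4 per adjustment, as in the real protocol.
func newDifficulty(old, actualSeconds float64) float64 {
	ratio := float64(expectedSeconds) / actualSeconds
	if ratio > 4 {
		ratio = 4
	} else if ratio < 0.25 {
		ratio = 0.25
	}
	return old * ratio
}

func main() {
	// Blocks arrived twice as fast as planned: difficulty doubles.
	fmt.Println(newDifficulty(1.0, expectedSeconds/2)) // 2
}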

If necessary, the frequency of adding new blocks could be increased by reducing the mining difficulty, or mining could be abandoned altogether in favor of the proof-of-stake principle. In both cases, however, the probability of finding new blocks simultaneously increases, leading to more chain branches (forks) and the corresponding inconveniences. The critical block interval here is the characteristic time it takes a block to propagate through the network. Below this threshold, miners would too often learn about a newly mined block only after mining their own (possibly containing a different set of transactions) and sending it to the network. Almost every added block would then be accompanied by a branch, making normal operation of the network impossible. The propagation time between different continents can be roughly estimated at 100 ms. Thus, the rate of adding blocks in Bitcoin can be increased by no more than $10 \cdot 60 \cdot 1\;000 / 100 = 6\;000$ times, and in Ethereum by no more than $15 \cdot 1\;000 / 100 = 150$ times.
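
The same arithmetic as a trivial Go snippet (the 100 ms figure is the rough estimate used above):

package main

import "fmt"

func main() {
	const propagationMs = 100 // rough intercontinental propagation time
	fmt.Println("Bitcoin: ", 10*60*1000/propagationMs, "x") // 6000 x
	fmt.Println("Ethereum:", 15*1000/propagationMs, "x")    // 150 x
}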

Features of storing the transaction database on full nodes

The number of processed transactions can be increased not only by mining new blocks more frequently but also by increasing their size. In this case, too, certain restrictions appear sooner or later; one of the simplest is the amount of disk space used. Bitcoin and Ethereum handle their data differently, so it makes sense to consider them separately.

Bitcoin

The Bitcoin blockchain does not store account balances; it stores only information about transfers. Therefore, to obtain a user's current balance, or simply to make sure the user has enough funds for a particular transaction, you need to find in the database the transfers in which that user received funds and verify that those funds were not spent by subsequent transfers. Consequently, to be able to verify the authenticity of Bitcoin transactions, you need to store on your device the entire transaction history from the moment the system was created.
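
A simplified Go sketch of this model (types are invented for illustration; a real node keeps an indexed set of unspent outputs rather than scanning the whole history):

package main

import "fmt"

// Output is a transaction output addressed to some key.
type Output struct {
	Address string
	Amount  int64 // in satoshis
	Spent   bool  // true once referenced by a later transaction's input
}

// balance is not stored anywhere; it is derived as the sum of
// outputs to an address that no later transaction has spent.
func balance(outputs []Output, addr string) int64 {
	var sum int64
	for _, o := range outputs {
		if o.Address == addr && !o.Spent {
			sum += o.Amount
		}
	}
	return sum
}

func main() {
	outs := []Output{
		{"alice", 5000000000, false},
		{"alice", 1000000000, true}, // already spent
		{"bob", 3000000000, false},
	}
	fmt.Println(balance(outs, "alice")) // 5000000000
}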

Currently, the size of the Bitcoin blockchain exceeds 150 GB and is growing at a rate of approximately 50 GB per year. It is worth noting that this growth is driven not so much by the arrival of new users as by the transactions of existing ones. Since information about new transactions is added to the store every day and information about old transactions is never deleted, the volume of the database can only grow.

Not every user is prepared to allocate several hundred gigabytes of disk space on a personal computer to store the blockchain. And these are not just files stored on disk, but a database that, when new blocks are added, rebuilds part of its data to keep storage efficient. Bitcoin clients most often use LevelDB, a NoSQL database built on LSM trees.

Transaction verification can be optimized by keeping a separate database of unspent transaction outputs (UTXO), since this is the data of greatest interest. In theory, once the authenticity of the UTXO set has been verified, information about old transactions can be removed from the common database. However, the remaining database will still occupy more than a gigabyte of disk space, and the intensity of disk access will decrease only slightly: the database will indeed be an order of magnitude smaller, but in addition to inserting transactions, it will now also have to delete spent outputs. Thus, maintaining a so-called full node causes noticeable inconvenience for ordinary users, and a significant share of full Bitcoin nodes is believed to already reside in data centers.

Storing the complete database is not required for wallet software to operate and send transactions. A “light” node, which requests the data it needs from other nodes on the network, is usually sufficient; this is how many wallet programs and mobile clients work. However, this approach does not provide absolute reliability: you have to trust data received over the network. By requesting additional, redundant data from various sources, reliability can be increased, but it remains probabilistic in nature. The same applies to anonymity: by requesting data about particular transactions, a user effectively reveals his area of interest; requesting redundant data can mask it, but again only with a certain probability. Users' refusal to maintain full nodes is also unfavorable for the network as a whole, since it becomes easier for someone to accumulate a critical share of the full nodes, which allows influencing transaction processing.
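
A sketch of the redundant-query idea (Peer and fetch are hypothetical stand-ins for a real network layer; the point is only that confidence grows with the number of independent sources but never reaches certainty):

package main

import "fmt"

// Peer is a hypothetical handle to a remote node.
type Peer struct{ answer string }

// fetch stands in for a network query about a transaction.
func fetch(p Peer, txID string) string { return p.answer }

// majority asks several peers the same question and accepts
// the most common answer.
func majority(peers []Peer, txID string) string {
	counts := map[string]int{}
	best := ""
	for _, p := range peers {
		a := fetch(p, txID)
		counts[a]++
		if counts[a] > counts[best] {
			best = a
		}
	}
	return best
}

func main() {
	peers := []Peer{{"confirmed"}, {"confirmed"}, {"unknown"}}
	fmt.Println(majority(peers, "tx123")) // confirmed
}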

Ethereum

Ethereum has the concept of an account state. The blockchain stores changes to such states; Ethereum nodes keep the current account states and, as new blocks arrive, apply the changes to them, thereby keeping them up to date (a minimal sketch of this model follows the list below). To ensure reliability, there is no need to store old states: the latest current state, plus confidence that it is correct, is sufficient. Thanks to this, Ethereum has not two but three types of nodes:
  • Archive nodes. Such nodes sequentially process all transactions and store the entire history of states, which makes it possible to recreate the full history of operations, in particular the execution history of smart contracts, at any time. Such nodes are typically used for debugging and statistical analysis. They may also be needed to explain and justify actions resulting from smart contract calls. In addition, a complete database is required to carry out mining.
  • Full nodes. Full nodes also execute all incoming transactions, but store only the current states, not the entire history of operations. Using such nodes, you can safely make transfers (there is a guarantee that the available information about the balances of other participants is reliable), but there is no way to view the transaction history. It makes sense to run such nodes on desktop PCs for making transfers.
  • Light nodes. Like Bitcoin light nodes, these store a minimal set of information and request data from the network as needed. Using such nodes is not completely safe, but they place significantly lower demands on user computers and can run on mobile devices.
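
A minimal sketch of the account-state model (types invented for illustration; real Ethereum state also tracks nonces, contract code and storage):

package main

import (
	"errors"
	"fmt"
)

// State maps addresses to balances; only the current state is kept.
type State map[string]uint64

// apply mutates the current state in place. No history is retained,
// which is exactly why a full node can discard old states.
func apply(s State, from, to string, amount uint64) error {
	if s[from] < amount {
		return errors.New("insufficient balance")
	}
	s[from] -= amount
	s[to] += amount
	return nil
}

func main() {
	s := State{"alice": 100, "bob": 0}
	if err := apply(s, "alice", "bob", 40); err == nil {
		fmt.Println(s) // map[alice:60 bob:40]
	}
}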

For more information on types of synchronization, see the article on dev.to.

The problems of Ethereum archive nodes are similar to those of Bitcoin full nodes: the database runs to hundreds of gigabytes and, if the current growth rate continues, by the end of 2018 the complete database will no longer fit on one standard hard drive or solid-state drive.

But, as mentioned above, to safely conduct transactions, a full Ethereum node with a “trimmed” database, which currently occupies “only” 40 GB, is sufficient.

In practice, however, things are not so simple. Experiments with Geth (Go Ethereum), one of the most popular Ethereum client implementations, showed in January 2018 that in about two weeks of operation the database grew from 40 GB to 80 GB. Further investigation revealed that after synchronization completes, this client switches to archive-node mode: information about outdated account states is not removed from the database, so its size keeps growing and can fairly quickly exceed several hundred gigabytes (see Figure 3). The developers are aware of the problem, but the ticket was closed almost six months ago without any fix. The only practical recommendation is to delete the database and download it again in "fast" synchronization mode (which may take several days). The freshly loaded database is indeed smaller, but once synchronization completes, the client again switches to full synchronization mode and the database again begins to grow rapidly.
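
For reference, at the time of writing this workaround looked roughly as follows (flags as in the Geth 1.8 line; consult the documentation for your version):

# Delete the grown database for the default data directory.
geth removedb

# Re-synchronize in fast mode; --cache (in MB) reduces disk pressure.
geth --syncmode fast --cache 1024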

It seems that one of the most popular Ethereum clients is not designed for long-term use in a secure mode, but is aimed only at miners, who need the complete database to operate, and at speculators, who are interested only in the price of the cryptocurrency and not in the safety and reliability of the corresponding software.

Ethereum Virtual Machine

As you know, Ethereum is not only a means of making payments but also a distributed virtual machine that executes “smart” contracts: simple programs run on a special virtual machine. Given that every full or archive node must sequentially execute all incoming calls to such contracts, it is reasonable to assume that the further development of Ethereum is limited precisely by the ability of modern computers to process a large number of such calls.

You can evaluate the growth in the computational cost of processing smart contracts by downloading the full database. Among other things, the complete database stores the code of all contracts, the complete history of their states, and the arguments with which they were called. This makes it possible to replay all contract calls and obtain traces of their execution as operations of the Ethereum Virtual Machine (EVM). The list of EVM instructions can be found, for example, on GitHub.
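
On a node with the required data, individual calls can be replayed through Geth's debug API (a real but unstable interface; the exact output format varies between versions, and the transaction hash below is a placeholder). In the attached console:

> debug.traceTransaction("0x...", {}) // hash of any mined transaction
{
  gas: 21000,
  failed: false,
  structLogs: [ { op: "PUSH1", ... }, { op: "MSTORE", ... }, ... ]
}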

By analyzing these traces, you can estimate the number of operations performed when each block is added. The most resource-demanding operations appear to be hash-function computations, memory operations, and the service operations associated with calling other contracts.

As can be seen from the graphs, at the moment the Ethereum virtual machine does not require many resources: approximately one hundred hash functions are computed per second (given a mining rate of 4 blocks per minute), along with several hundred memory accesses of various types and on the order of 10 system calls. Even allowing for tenfold growth over a year, problems with data storage will arise much earlier (see the previous sections).

Ethereum Profiling

Although the Ethereum virtual machine should not consume many resources, the Geth client loads the CPU almost fully and uses quite a lot of memory and disk. The Ethereum database contains a large number of records, so working with it can require substantial resources. The frequency of disk accesses can be estimated roughly. When processing a transaction, you need to check that the sender's funds are recorded in the blockchain, which requires a lookup in the on-disk database. Given the logarithmic search time and the overhead associated with the structure of LSM trees, one database lookup can be estimated at about 100 random disk accesses. Each block contains approximately 100 transactions (see the previous section), and approximately 4 blocks are added per minute. Thus, just to verify transactions, every minute you need to make about $4 \cdot 100 \cdot 100 \approx 10^4\text{–}10^5$ random disk accesses.

To verify this estimate and identify the most computationally expensive part of the client, it was profiled using the execution-trace recording facility provided by the developers. The recorded trace was analyzed with the standard Go tooling (go tool trace).
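
A trace of this kind can be captured roughly as follows (debug.startGoTrace and debug.stopGoTrace are part of Geth's debug API; the file name and timing are illustrative):

// In the attached geth console:
> debug.startGoTrace("trace.out")
> debug.stopGoTrace() // call after the desired measurement window

Then analyze the recording with the standard Go toolchain:

$ go tool trace trace.out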

The diagram below shows the measured execution time of various goroutines (the Go analogue of threads) while the Geth client runs after full synchronization has completed. The measurement lasted 60 seconds.

The diagram shows that most of the time is taken by the loader, which imports blocks, validates transactions, makes the corresponding contract calls on the virtual machine, and updates the corresponding states in the database. Given the small amount of data (4 blocks of approximately 100 transactions per minute), the loading itself should not take much time; as it turned out, neither does the virtual machine. A more detailed analysis of the profile showed that during the measurement interval (1 minute), about 100,000 system calls were made, most of them accesses to LevelDB database files located on the hard drive (so the estimate given above proved correct). The average time between adjacent file accesses is therefore $60\;000\;000\ \mu s \,/\, 100\;000\ \text{calls} = 600\ \mu s$, which is comparable in order of magnitude to the random access time of a hard disk. From this, we can conclude that disk access is the bottleneck in this case; the assumption is also confirmed experimentally by the consistently high disk load during client operation.

The situation with rpc.ServerRequest (the goroutine in second place by execution time) is similar: a fairly large number of system calls, at the moment of which the top of the stack looks like this:

syscall.Pread:1372
os.(*File).pread:238
os.(*File).ReadAt:120
github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/table.(*Reader).readRawBlock:564
github.com/ethereum/go-ethereum/vendor/github.com/syndtr/goleveldb/leveldb/table.(*Reader).readFilterBlock:653
* * *

In third place by execution time is database maintenance, leveldb.tCompaction. Next comes the overhead associated with garbage collection in Go. The remaining goroutines take negligible time to execute.

Thus, when the complete database is stored on a hard drive (not an SSD), the limiting factor is the operation of the database, or rather file I/O. To ensure efficient operation of a full node, it is therefore advisable to use solid-state drives, which in theory can speed up the database by 2-3 orders of magnitude. The load on the database can be reduced further by using fast synchronization: the database is then an order of magnitude smaller (tens of gigabytes instead of hundreds), so updating it should be faster, partly because fewer disk requests are needed. The results of profiling the Geth client while adding blocks after fast synchronization, with the database hosted on an SSD, are shown in the diagram below. The measurement lasted 10 minutes.

Profiling showed that the miner.(*worker).update goroutine takes the most time. At first glance this is surprising, since mining had been disabled by an explicit call to the corresponding API function before profiling began. A more detailed analysis of the profiling results and the Geth source code showed that, even with mining disabled, the miner continues to process incoming transactions and add them to its database. So no real mining takes place, and most of the miner's work comes down to database operations and the system calls needed to access the corresponding files.

Database maintenance takes second place in terms of execution time, followed by other tasks that use the database to one degree or another.

Interestingly, hashing (the (*hasher).hashChildren goroutine) takes on average 30 times less time than the miner's work with the transaction database. Moreover, while the miner always runs within a single goroutine, goroutines for computing hashes are created and destroyed each time, with all the associated overhead; several hundred such goroutines are created and destroyed per second. Thus, as the database gets faster and the number of added transactions grows, hash-function computation is likely to become one of the main bottlenecks.
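
A micro-benchmark sketch of that scheduling overhead (illustrative only; absolute numbers depend heavily on the machine, and the hashing itself is identical in both variants):

package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

func main() {
	data := make([]byte, 1024)
	const n = 100000

	// Variant 1: a short-lived goroutine per hash, as in the pattern
	// observed in the profile.
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			sha256.Sum256(data)
			wg.Done()
		}()
	}
	wg.Wait()
	perGoroutine := time.Since(start)

	// Variant 2: the same work in one long-lived goroutine.
	start = time.Now()
	for i := 0; i < n; i++ {
		sha256.Sum256(data)
	}
	inline := time.Since(start)

	fmt.Println("goroutine per hash:", perGoroutine, " single goroutine:", inline)
}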

Hashing is used not only to confirm and verify blocks in proof-of-work systems; it also underlies the storage of data in blocks and the verification of their integrity. The problem of computing hash functions will therefore also arise in systems based on proof-of-stake. A solution can be the use of accelerators built on general-purpose graphics cards or on the specialized coprocessors and chips already used today for mining in the Bitcoin system.

Conclusion

If we set aside mining as an artificially created difficulty, the biggest performance bottleneck in modern blockchain systems is the database. If the growth rate of the number of transactions does not decrease, then by the end of 2018 the complete block chains of the most popular blockchain systems will probably no longer fit on one hard drive. One of the most popular Ethereum clients still has no proper support for running a full node in pruned mode, and the Bitcoin system does not explicitly provide for such modes.

When working with databases in blockchain systems, not only disk capacity matters but also disk speed. Already now, when data is stored on conventional hard drives, a significant part of the time is spent reading and writing it. It is likely that full nodes will soon be able to run only on solid-state drives. In the future, a move to SSD arrays and to more complex databases that achieve higher performance through parallel work with a large number of drives will probably be required.

As for technical limitations not related to the database, the biggest one is likely to be the computation of hash functions. This problem can be solved by using graphics cards or other specialized devices.

Therefore, many users who currently maintain full nodes on their personal computers may soon have to abandon them in favor of “light” and less secure clients. The number of full nodes in the network will then shrink, and most of them will likely end up in the data centers of large companies that can afford high-performance storage, servers with genuinely large amounts of RAM to optimize database operation, and specialized accelerators for computing hash functions. Whether large companies will unite into a common network or support several different systems independently of each other, and how exactly work with such systems will be organized, is still unclear. It is obvious, however, that such systems will no longer be as decentralized and independent as, for example, today's Bitcoin.

Another possible approach is distributed storage of individual parts of the blockchain on user devices. In this case, users would need significantly less disk space, but network load would increase, since most of the data would be requested from other users. A sufficient level of security can be achieved only with significant redundancy of the requested data and a sufficient number of independent sources. To organize interaction between nodes storing different parts of the database, special router nodes may be required (perhaps structured similarly to BitTorrent trackers), which would also reduce the decentralization of the network.

In any case, as the number of transactions grows, the structure of blockchain systems will change. What exactly these changes will be, and how they will affect the safety, reliability and independence of such systems, time will tell.

Literature

  1. Andreas M. Antonopoulos. Mastering Bitcoin.
  2. Preethi Kasireddy. How does Ethereum work, anyway? (translation)
  3. The Role of Bitcoin Nodes: Do Full Nodes Running in Data Centers Benefit the Bitcoin Network?
  4. The Ethereum-blockchain size will not exceed 1TB anytime soon.
 