In the fifth part of his essay “Possible futures of the Ethereum protocol,” dedicated to The Purge, Buterin drew attention to two main “weaknesses”:
1. Storing historical data: every transaction and every account ever created stays stored on the network permanently, so the amount of data each client must download when synchronizing keeps growing.
2. Protocol complexity: adding new features is much easier than removing old ones, so complexity accumulates in the code over time.
“For the long-term sustainability of Ethereum, these tendencies must be counteracted, reducing the complexity and volume of the network. At the same time, it is important to preserve permanence, which is a distinctive feature of blockchains,” Buterin noted.
Currently, a full Ethereum node requires about 1.1 TB of disk space for the execution client, plus a few hundred gigabytes more for the consensus client.
One of Buterin's proposals is for each node to store only a portion of the historical data, reducing the storage burden on individual nodes. The Purge aims to make it easier to run clients on ordinary PCs. In Buterin's example, if the network has 100,000 nodes and each stores a random 10% of historical data, every piece of data is held by about 10,000 nodes on average, so the full history remains available across the network.
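The arithmetic behind the 100,000-node, 10% example can be checked with a short sketch. It assumes each node picks its 10% independently and uniformly at random; the function names and the chunk-based model of "history" are hypothetical simplifications, not part of any Ethereum client. Expected replication is simply `nodes × fraction`, and the chance that no node holds a given chunk is `(1 − fraction) ** nodes`, computed here in log10 form because the raw value underflows a float.

```python
import math
import random


def expected_replicas(num_nodes: int, fraction: float) -> float:
    # Each node independently holds a given chunk with probability `fraction`.
    return num_nodes * fraction


def log10_loss_probability(num_nodes: int, fraction: float) -> float:
    # P(no node stores a given chunk) = (1 - fraction) ** num_nodes.
    # Returned as a base-10 exponent to avoid floating-point underflow.
    return num_nodes * math.log10(1 - fraction)


def simulate(num_nodes: int, num_chunks: int, fraction: float, seed: int = 0) -> list:
    # Hypothetical model: history is split into `num_chunks` equal chunks and
    # every node stores a uniformly random subset of `fraction` of them.
    rng = random.Random(seed)
    per_node = round(num_chunks * fraction)
    counts = [0] * num_chunks
    for _ in range(num_nodes):
        for chunk in rng.sample(range(num_chunks), per_node):
            counts[chunk] += 1
    return counts


if __name__ == "__main__":
    print(expected_replicas(100_000, 0.1))
    print(log10_loss_probability(100_000, 0.1))
    counts = simulate(num_nodes=1_000, num_chunks=200, fraction=0.1)
    print(min(counts), max(counts))
```

The simulation is scaled down so it runs quickly; with 1,000 nodes each chunk lands on roughly 100 nodes, matching `expected_replicas(1_000, 0.1)`.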
Buterin estimates that about 800 GB of the execution client's 1.1 TB is historical data and the rest is state data. To shrink the historical portion, he proposed limiting how long nodes retain history and having each node store only part of it, while keeping the ability to retrieve the full history on request.
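By the figures above, roughly 1.1 TB − 800 GB ≈ 300 GB of the execution client's footprint is state, so limiting retention targets the larger share. A retention period combined with on-request recovery can be sketched as follows. This is a hypothetical illustration, not a client implementation: the `HistoryStore` class, the block-count retention window, and the `fetch_remote` callback standing in for a request to other nodes are all assumptions.

```python
from collections import OrderedDict
from typing import Callable


class HistoryStore:
    """Hypothetical local store that expires blocks older than a retention window."""

    def __init__(self, retention_blocks: int, fetch_remote: Callable[[int], bytes]):
        self.retention_blocks = retention_blocks
        self.fetch_remote = fetch_remote  # assumed stand-in for asking other nodes
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.head = -1

    def add_block(self, number: int, body: bytes) -> None:
        self.blocks[number] = body
        self.head = max(self.head, number)
        cutoff = self.head - self.retention_blocks
        # Blocks are assumed to arrive in order, so the oldest sit at the front.
        while self.blocks and next(iter(self.blocks)) <= cutoff:
            self.blocks.popitem(last=False)

    def get_block(self, number: int) -> bytes:
        # Serve recent history locally; fall back to the network for expired blocks.
        if number in self.blocks:
            return self.blocks[number]
        return self.fetch_remote(number)
```

With a window of 3 blocks, adding blocks 0 through 9 leaves only 7, 8 and 9 on disk; a request for block 2 goes to `fetch_remote` instead of failing.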
Turning to protocol complexity, Buterin emphasized that removing old features inevitably sacrifices some backward compatibility. He acknowledged that there is no single way to simplify the protocol: the work breaks down into many small tasks, each requiring its own approach.
Some improvements, such as removing legacy transaction types and Beacon Chain committee mechanisms, could be implemented relatively easily. However, in Buterin's view, changes to core components built into the EVM would require significantly more analysis and technical work.
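To show what a "legacy transaction type" means at the byte level, here is a minimal sketch based on the EIP-2718 typed-envelope rule: a typed transaction starts with a type byte from 0x00 to 0x7f, while a legacy transaction is a bare RLP list whose first byte is 0xc0 or higher. The function name and labels are hypothetical, and the sketch only inspects the first byte; it does not decode or validate the payload.

```python
# Type bytes for the typed transactions covered here (EIP-2718 envelope).
KNOWN_TYPES = {
    0x01: "EIP-2930 (access list)",
    0x02: "EIP-1559 (dynamic fee)",
    0x03: "EIP-4844 (blob)",
}


def classify_tx(raw: bytes) -> str:
    """Classify a raw transaction by its first byte only (hypothetical helper)."""
    if not raw:
        raise ValueError("empty transaction")
    first = raw[0]
    if first >= 0xC0:
        # RLP list prefix: a pre-EIP-2718 legacy transaction.
        return "legacy (pre-EIP-2718)"
    if first <= 0x7F:
        return KNOWN_TYPES.get(first, f"typed 0x{first:02x}")
    # Bytes 0x80..0xbf are neither a valid type byte nor an RLP list prefix.
    raise ValueError("not a valid transaction envelope")
```

Keeping the legacy branch alive alongside the typed envelope is an example of the kind of backward-compatibility code a cleanup would aim to remove.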
In Part 4 of his essay, Buterin had already discussed ways to reduce the amount of network state data using technologies such as Verkle trees.