TL;DR: I would like to share our recent research paper on chaos engineering, "Chaos Engineering of Ethereum Blockchain Clients", which can be found here:
https://arxiv.org/abs/2111.00221. Comments are welcome!
The Ethereum blockchain is the operational backbone of major decentralized finance platforms. As such, it is expected to be exceptionally reliable. During the talk, I would like to introduce our research work on applying chaos engineering to Ethereum clients for resilience assessment. Our research prototype, ChaosETH, works in three steps. First, it monitors Ethereum clients to determine their normal behavior. Then, it injects system call invocation errors into the Ethereum clients and observes the resulting behavior under perturbation. Finally, it compares the behavior recorded before, during, and after perturbation to assess the impact of the injected system call invocation errors.
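To make the three phases concrete, here is a minimal Python sketch of a before/during/after comparison. It is only an illustration and not ChaosETH's actual code: the `ErrorModel` fields, the simulated metric, the 80% degradation on a failed call, and the 10% tolerance are all assumptions made for this example, and the "client" is a random number generator rather than a real Ethereum node.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class ErrorModel:
    """Hypothetical description of one injected error (names are illustrative)."""
    syscall: str      # e.g. "read"
    error_code: str   # e.g. "EIO"
    rate: float       # probability that one invocation is made to fail


def sample_metric(rng, error_model=None, n=60):
    """Simulate n readings of one application-level metric.

    Assumption for this sketch: a failed system call invocation
    degrades the reading by 80%. A real setup would read the metric
    from the monitored client instead.
    """
    samples = []
    for _ in range(n):
        value = rng.gauss(100.0, 5.0)
        if error_model is not None and rng.random() < error_model.rate:
            value *= 0.2
        samples.append(value)
    return samples


def deviates(baseline, observed, tolerance=0.1):
    """True if the observed mean differs from the baseline mean by more than tolerance."""
    base_mean = statistics.mean(baseline)
    return abs(statistics.mean(observed) - base_mean) > tolerance * base_mean


def run_experiment(error_model, seed=0):
    """Record the metric before, during, and after injection, then compare."""
    rng = random.Random(seed)
    before = sample_metric(rng)                # phase 1: normal behavior
    during = sample_metric(rng, error_model)   # phase 2: under perturbation
    after = sample_metric(rng)                 # phase 3: injection stopped
    return {
        "impacted_during": deviates(before, during),
        "recovered_after": not deviates(before, after),
    }


if __name__ == "__main__":
    print(run_experiment(ErrorModel("read", "EIO", rate=0.6)))
```

In this toy setup, a result where `impacted_during` is false would correspond to the "full resilience" end of the spectrum for that metric, while a crash would show up as the metric never returning to its baseline after the injection stops.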
The experiments are performed on the two most popular Ethereum client implementations: GoEthereum and OpenEthereum. We experiment with 22 different types of system call invocation errors. We assess their impact on the Ethereum clients with respect to 15 application-level metrics. Our results reveal a broad spectrum of resilience characteristics of Ethereum clients in the presence of system call invocation errors, ranging from direct crashes to full resilience. The experiments clearly demonstrate the feasibility of applying chaos engineering principles to blockchains.
Although the paper focuses on assessing the resilience of Ethereum clients, its two underlying ideas, applying chaos engineering at the system call invocation level and benchmarking resilience with chaos engineering, can be applied to a broader set of software systems, such as cloud-native applications.