In the presentation below, Dominik Muhs, Senior Security Engineer from ConsenSys Diligence, provides an in-depth analysis of the components that make up an oracle system. Dominik presents this talk on behalf of Shayan Eskandari, CTO of Ether Capital.
Below are some important concepts to note from his presentation.
What is an oracle?
An oracle is a piece of blockchain infrastructure that connects blockchains to real-world data. Oracles help to fetch and verify data from multiple external sources. This data is then delivered to smart contracts, allowing them to be executed based on off-chain data.
The Oracle Problem
Blockchains and smart contracts are useful and exciting because they are decentralized, immutable, and censorship-resistant. However, blockchains cannot natively exchange data with external systems, and external data is required for almost all use cases of smart contracts. Some of these use cases include stablecoins, prediction markets, identity, randomness, decentralised exchanges (DEXs), dynamic non-fungible tokens (NFTs), and, of course, price feeds. Such systems require confidence in the origin of the real-world data.
At the same time, a blockchain’s isolation from real-world data is the very quality that gives it its security. The oracle problem can thus be defined as the conflict between the trustless execution of smart contracts and their reliance on third-party oracles for external data. If oracles are centralised or corruptible, they become a single point of failure, potentially defeating the entire purpose of blockchain and smart contract technology.
Oracle’s Modular Workflow
Prior to writing the research paper, “SoK: Oracles from the Ground Truth to Market Manipulation”, Shayan and his colleagues came across the oracle problem while attempting to build a decentralised derivatives market as an academic exercise. Decentralized derivative markets are peer-to-peer marketplaces where derivatives are traded. In traditional finance, derivatives are contracts (or securities) that derive their value from the performance of an underlying asset. Some common types of derivatives include futures, options, and swaps. DeFi derivatives differ from their traditional counterparts in that they do not require a broker or a third party to ensure the contract is fulfilled. Instead, the terms can be programmed into smart contracts and automatically settled on-chain once the conditions have been met. Some examples of decentralized derivative markets include Synthetix, dYdX, and Vega Protocol.
The main purpose of the paper is to divide smart contract oracle systems into separate parts or “modules”. Studying each module individually makes it much easier to analyse oracle design principles, possible attacks, and ways they can be mitigated. The modules are as follows:
- Ground Truth
- Data Sources
- Data Feeders
- Selection of Data Feeders
- Aggregation
- Dispute Phase
Ground Truth – The first challenge of oracle design starts with a philosophical question: What is the truth? Is the truth subjective or objective?
The “ground truth” refers to the exact piece of information that oracles would put on-chain. An example would be the real price of Ethereum. This could be an aggregated price from different data sources, such as a centralised exchange (CEX) like Gemini, or a decentralised exchange (DEX) like Uniswap.
In the ground truth module, oracle designers try to find the best way of collecting raw data from various data sources and aggregating it to achieve a reasonable level of data quality. The data must be distilled into a form that other systems can consume, and the process must be deterministic, meaning that no randomness is involved.
Data Sources – Data sources are passive entities that store and measure the representation of the aforementioned ground truth. According to Dominik, examples of common types of data sources include:
- Databases & APIs: Centralised exchange prices
- Sensors: Blockchain-based supply chain systems
- Humans: Prediction markets
- Smart Contracts: On-chain markets, such as Uniswap as the oracle data source
- A combination of the above is also possible.
Data sources are thus the quantifiers of the ground truth. For instance, if the ground truth is the price of Ethereum, then the prices reported by a centralised exchange are one data source.
Data Feeders – Data Feeders are responsible for transferring data from external or off-chain data sources to on-chain oracles. As such, it is crucial that they meet a set of security requirements, such as source authentication and confidentiality.
Data feeders themselves do not verify the reliability of the data. Instead, the external data is pushed through the feeders, which carry out security checks such as source authentication. How a source is authenticated depends on the feeder. In his presentation, Dominik cites the example of a feeder connected to a REST API (application programming interface) over HTTPS, where TLSNotary is used to produce an attestation that the HTTP data being gathered is correct. Alternatively, a Trusted Execution Environment (TEE) such as Intel SGX can be used. A TEE is an environment for executing code that is isolated from the main operating system. It enables end-to-end security and provides a trusted environment where data is stored, processed, and protected. However, in both cases, trust is placed in third parties such as TLSNotary or the Intel chip manufacturer.
Freeloading Attacks and other Threats
Another security provision that data feeders need to meet is confidentiality, which prevents common attacks on oracle systems such as “mirroring attacks” and “freeloading attacks”. Freeloading attacks occur when a data feeder copies data provided by other feeders or by a public off-chain source without verifying that the data is accurate. This way, the data feeder is able to maximise profits while minimising the work required to get the result.
Mirroring attacks are a type of Sybil attack that can complement freeloading attacks. A Sybil attack is a type of attack in which a bad actor creates a large number of pseudonymous identities and uses them to gain a disproportionately large influence over the entire system. In the case of oracles, an attacker can simply collect data from a single centralized data source and copy the result across multiple nodes. This would result in the final data point largely relying on that single data source.
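The effect of mirroring can be illustrated with a few lines of Python. This is only a sketch under assumed conditions: the oracle is taken to use a plain median over one report per node, and all prices and node counts are hypothetical numbers chosen for illustration.

```python
from statistics import median

def aggregate(reports):
    """Median over all submitted reports, one report per node (assumed design)."""
    return median(reports)

# Hypothetical reports from five independent, honest nodes.
honest = [2000.0, 2001.0, 1999.0, 2002.0, 1998.0]

# An attacker reads one (skewed) source and mirrors it across six Sybil nodes.
mirrored = [2500.0] * 6

print("honest nodes only:  ", aggregate(honest))
print("with mirrored nodes:", aggregate(honest + mirrored))
```

Once the mirrored nodes outnumber the honest ones, the aggregated value is simply whatever the single copied source reports, even though the oracle appears to have eleven feeders.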
According to Dominik, commit-reveal schemes can be used to prevent such attacks, as they ensure that no data feeder can “peek” at another feeder’s submission before making its own. This article provides more information about commit-reveal schemes on Ethereum. Non-repudiation can also guarantee that a data feeder’s submission into the system cannot be manipulated or denied. This can be accomplished through cryptographic signature schemes, which account for the possibility of corruption in the data submission process.
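A commit-reveal round can be sketched as follows. This is a minimal off-chain illustration, not the scheme of any particular oracle: the function names are hypothetical, and SHA-256 stands in for whatever hash function the on-chain contract would actually use (on Ethereum this would typically be Keccak-256).

```python
import hashlib
import hmac
import secrets

def make_commitment(value: str, salt: bytes) -> str:
    """Commit phase: publish only the hash of (salt + value)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

def verify_reveal(commitment: str, value: str, salt: bytes) -> bool:
    """Reveal phase: recompute the hash and compare it with the stored commitment."""
    return hmac.compare_digest(commitment, make_commitment(value, salt))

# Commit phase: the feeder keeps value and salt secret and publishes the commitment.
salt = secrets.token_bytes(32)  # random salt prevents guessing the value by brute force
commitment = make_commitment("2000.15", salt)

# Reveal phase: after all commitments are in, the feeder discloses value and salt.
print(verify_reveal(commitment, "2000.15", salt))  # matches the commitment
print(verify_reveal(commitment, "2000.16", salt))  # a changed value does not match
```

Because other feeders only see the hash during the commit phase, they cannot copy the value, and because the hash is fixed, the original feeder cannot change its answer after seeing others reveal.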
Selection of data feeders – Aggregating all the data provided by data feeders would be very expensive if data feeders must be rewarded for every contribution made. However, according to Dominik, the resilience of the oracle system is known to increase with the number of data feeders inside the network. To form a community of data feeders, a selection process has to take place. There are several ways to select data feeders for an oracle network. In his presentation, Dominik elaborates on the different selection processes being used by oracles like Chainlink, MakerDAO, and BAND Protocol, as well as Tellor’s staked proof-of-work process from 2019.
Important to note: Tellor no longer uses Proof-of-Work. Tellor’s latest upgrade, named Tellor 360, continues to use a staking model, where reporters stake $1500 or $150 worth of TRB Tokens on Ethereum or Polygon respectively. Reporters providing good data get tips and rewards, while dishonest reporters risk losing their stake. To find out more, read this blog post and watch this space! More details on the Tellor 360 upgrade are coming soon.
Dominik also explained some of the shortcomings of these processes, including the example of B Protocol using a flash loan to sway the Maker V2 governance vote in order to add their own oracle to the data feeder allow list. Read more about this in this article by our Media Partner, BeInCrypto.
Aggregation – Aggregation refers to the process of combining the selected data feeds into one single output. End users must be able to extract value from the data feeders. This can be done by applying different methods of statistics, such as calculating the mean or median of the given data. There are advantages and drawbacks to each statistical computation, and oracle designers must carefully select which to utilise.
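The trade-off between mean and median can be seen in a short Python sketch. The report values below are hypothetical and chosen only to show how a single faulty or malicious feed affects each statistic.

```python
from statistics import mean, median

# Hypothetical price reports; the last feeder is faulty or malicious.
reports = [1999.0, 2000.0, 2001.0, 2002.0, 9000.0]

print("mean:  ", mean(reports))    # pulled far away from the honest cluster
print("median:", median(reports))  # stays inside the honest cluster
```

The mean uses every report and is therefore sensitive to a single extreme value, while the median ignores outliers as long as fewer than half of the reports are corrupted.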
Dispute phase – There are several ways of disputing incorrect data and punishing the providers of incorrect data. Any dispute needs a point of reference, namely the data that is accepted as correct. Projects may make use of an economic staking game in which participants stake a certain amount of tokens to challenge an incoming piece of data. An outcome could be treated as tentative at first and confirmed only once a dispute window has passed. If a piece of data is successfully disputed, the provider of the bad data could be removed, have their staked tokens confiscated, or have their reputation score slashed. Augur, a decentralised prediction market platform, and Asteria, which aims to build an options-based infrastructure for DeFi, are examples of protocols that dispute data this way.
Another way to dispute data would be through a game of arbitrage. For example, when there is incorrect price data in a decentralised market, an arbitrage opportunity arises from the difference between the reported prices. In such cases, a miner could execute the arbitrage trade. Miners are incentivised to do so because of the additional profit that can be made, also known as Miner Extractable Value (MEV). MEV refers to the additional value miners can extract by “jumping the queue” or reordering transactions within a block to complete trades.
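The staking game with a dispute window can be sketched in Python as below. Everything here is an illustrative assumption rather than the mechanism of Augur or Asteria: the window length, the tolerance, the `Report` fields, and the idea that a trusted `reference_value` is available at resolution time (real protocols typically reach it through voting or escalation).

```python
from dataclasses import dataclass

DISPUTE_WINDOW = 3600  # seconds; hypothetical value

@dataclass
class Report:
    reporter: str
    value: float
    stake: float
    submitted_at: int
    disputed: bool = False

def is_confirmed(report: Report, now: int) -> bool:
    """A report is only final once the window has passed without a successful dispute."""
    return (not report.disputed) and now >= report.submitted_at + DISPUTE_WINDOW

def resolve_dispute(report: Report, challenger_stake: float,
                    reference_value: float, now: int, tolerance: float = 0.01):
    """Slash whichever side turns out to be wrong; returns (outcome, amount lost)."""
    if now >= report.submitted_at + DISPUTE_WINDOW:
        raise ValueError("dispute window has closed")
    deviation = abs(report.value - reference_value) / reference_value
    if deviation > tolerance:
        report.disputed = True
        lost = report.stake
        report.stake = 0.0  # reporter's stake is confiscated
        return ("reporter_slashed", lost)
    return ("challenger_slashed", challenger_stake)  # unfounded challenge costs the challenger

bad = Report(reporter="alice", value=2500.0, stake=100.0, submitted_at=0)
print(resolve_dispute(bad, challenger_stake=50.0, reference_value=2000.0, now=10))
print(is_confirmed(bad, now=4000))
```

Requiring the challenger to stake as well discourages frivolous disputes, since an unfounded challenge costs the challenger rather than the reporter.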
The Dark Forest Problem
Towards the end of his presentation, Dominik refers to the “Dark Forest” problem. “Dark Forest” is a term popularized by Dan Robinson and Georgios Konstantopoulos of the investment firm Paradigm in this blog post. The term was inspired by the science fiction book “The Dark Forest”, in which the dark forest is an environment where detection leads to certain death at the hands of predators. In the post, the authors liken the Ethereum mempool to a “dark forest”. A mempool is a database of all pending transactions. Once a transaction is confirmed, it is removed from the mempool. According to the “dark forest” concept, any pending transaction detected in the Ethereum mempool is vulnerable to attacks by arbitrage bots.
Where do we go from here?
According to Dominik, certain aspects of oracle system design, such as staking, must be strengthened. Oracles use their governance token for staking, the mechanism whereby users pledge their tokens to support a network’s improved operation in exchange for token rewards. This provides an additional layer of security to the oracle system by giving it more guarantees. For this to be effective, however, the token’s market capitalisation must remain significant. If the token’s value crashes, the entire oracle system could be in danger of collapsing, and staking would prove counterproductive. The tokens must also be fairly and evenly distributed for the token value to maintain its integrity, which would improve the effectiveness of staking.
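The dependence of staking security on token value can be made concrete with some simple arithmetic. The model below is a deliberate simplification and not taken from the presentation: it assumes an attacker needs a fixed fraction of the staked tokens, that tokens can be bought at the current price, and that all figures are hypothetical.

```python
def cost_to_control(total_staked_tokens: float, token_price: float,
                    threshold: float = 0.5) -> float:
    """Rough cost of acquiring the assumed controlling share of the stake."""
    return total_staked_tokens * token_price * threshold

def is_economically_secure(total_staked_tokens: float, token_price: float,
                           value_secured: float, threshold: float = 0.5) -> bool:
    """Staking only deters attacks while corruption costs more than it can earn."""
    return cost_to_control(total_staked_tokens, token_price, threshold) > value_secured

# Hypothetical oracle: 1,000,000 tokens staked, securing $5,000,000 of contracts.
print(is_economically_secure(1_000_000, 20.0, 5_000_000))  # token at $20
print(is_economically_secure(1_000_000, 2.0, 5_000_000))   # token crashes to $2
```

In this simplified model, the same amount of staked tokens stops being a deterrent once the token price falls far enough that buying control costs less than the value the oracle secures.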
Moving forward, the oracle industry also needs to examine potential attacks closely and develop strategies to prevent each of them. More detailed information on oracle manipulation can be found in this GitHub article by ConsenSys Diligence.
The Blockchain Oracle Summit was the world’s first conference to focus solely on the importance of oracle systems and their design. Leading oracle builders and researchers from across the industry gathered in Berlin to take deep dives into their work to address the biggest challenges faced.