Firedancer - Industry Intel | Revelo Intel

Solana Firedancer

Published: March 05, 2024

In the following report, we delve into Firedancer, exploring its value proposition and the role it will play in realizing the vision outlined in Solana’s original whitepaper. For many, its release in production will mark the first steps towards Solana 2.0, pushing the boundaries of both hardware and software to the limit, addressing the challenges of existing validator clients, and redefining the network’s scalability, reliability, and efficiency.

Key Takeaways

Introduction

The story of Solana begins in 2017 when Anatoly Yakovenko published the original whitepaper describing Proof of History. This set out the vision to build a blockchain that could overcome the scalability problems of existing solutions. It was designed as a high-performance, low-latency platform – a blockchain where you could build trading engines poised to rival the Nasdaq. 

The original idea laid out by Anatoly and the founding team was that, provided software did not get in the way of hardware, the aggregate network performance of a blockchain could grow linearly with hardware advancements. This follows Moore’s law, the observation that the number of transistors in an integrated circuit doubles about every two years with minimal rise in cost. Assuming optimal software, Solana’s performance metrics could double every two years without any further upgrades.

Source: Medium
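
The doubling assumption above can be written as a one-line formula. The following is a minimal sketch: the function name and the 50,000 TPS starting point are illustrative assumptions, and real-world scaling depends on far more than transistor counts.

```python
def projected_capacity(base_tps: float, years: float,
                       doubling_period_years: float = 2.0) -> float:
    """Capacity after `years` if it doubles every `doubling_period_years`."""
    return base_tps * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Illustrative starting point only; not a measured figure.
    for years in (0, 2, 4, 6):
        print(years, projected_capacity(50_000, years))
```

Under these assumptions, a network that starts at 50,000 TPS would reach 100,000 TPS after two years and 400,000 TPS after six, with no change other than faster hardware.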

However, even if performance can scale with hardware advancements, efficient node communication is crucial: without it, network bandwidth becomes the bottleneck and the network slows down regardless of how fast the hardware is.

Current Solana validator clients have limitations that affect their performance and scalability, and the lack of client diversity affects network reliability. Most of these limitations are software-based rather than the result of constraints at the hardware level. Kevin J Bowers, a Jump High-Performance Systems researcher leading the efforts on Firedancer, is known for the phrase “the speed of light is too slow”. Backed by decades of experience developing high-performance global trading systems, Jump seeks to improve the reliability of Solana by building an independent validator client called Firedancer.

Source: Solana

Firedancer is not a short-term patch designed by just looking at the rearview mirror. It is meant to push the bounds of computing to offer scale and reliability, running at the edge of physical and theoretical limits.

Towards Solana 2.0

In conventional development, operations are viewed in terms of data space, not physical space. This perspective is inevitably constrained by physical limitations like the speed of light, resulting in suboptimal hardware utilization and slower systems. 

Source: Revelo Intel

Because traditional development practices often ignore these physical limitations, hardware ends up underutilized. In other words, system scalability isn’t restricted by hardware but by inefficient software. As the saying goes, “faster hardware is a bad first solution to slow software”.

Jump Trading faced these constraints head-on, evolving from conventional trading systems to a highly customized technology stack. This transformation addressed the unique challenges of global-scale trading, where physical proximity to exchanges, heterogeneous market requirements, and stringent operational standards are critical.

Unlike traditional supercomputers, Jump’s systems had to be prompt, transparently fault-tolerant, and incrementally upgradable. They needed to function in a diverse, competitive, and heavily regulated environment, with strict data retention and accountability standards. Merely adding more hardware to Solana won’t suffice. Instead, the optimization of the software is the key, and Firedancer propels it closer to realizing these objectives. 

For reference, a single validator running Firedancer processed 1.2M TPS in a live demo in November 2022, although most estimates point to 600k TPS as a more realistic number. Either figure is 12 to 24 times greater than Solana’s current theoretical limit of 50k TPS, and far exceeds the number of transactions that leading service providers like Visa typically process (~1.5k – 2k TPS).

Source: Twitter

Firedancer introduces a paradigm shift in the architecture and functionality of Solana’s validator client. It’s not just about building a validator client; it’s about constructing one that’s highly reliable and performance-oriented, enhancing the already rapid and efficient Solana blockchain.

Source: Frictionless Capital

With Firedancer, Jump is spearheading what many refer to as Solana 2.0. By leveraging years of experience in a high-frequency trading environment, adding this upcoming independent validator client to Solana’s toolkit will help to make the network more reliable, performant, and robust. 

Source: Revelo Intel

Solana’s exceptional speed and recent solutions to downtime have set it apart. Yet, there’s potential for further enhancements. Firedancer aims to be the top-performing validator client across blockchains, improving Solana’s client diversity and addressing its current over-reliance on a few clients.





Source: Github

Setting The Stage

Validators participate in Solana’s Proof of Stake consensus mechanism to validate transactions and propose new blocks to be added to the chain.

Source: Revelo Intel

This way, Solana ($SOL) token holders can participate in consensus and earn staking rewards by staking their $SOL tokens with a validator node.

Source: Revelo Intel

Proof of Stake (PoS) provides both positive and negative incentives. Participants need a financial stake as a security deposit. If a validator breaks the rules, a part of their stake is “slashed” as a penalty. Otherwise, they are compensated with staking rewards.

To do their work and connect with other nodes in the network, validators require the use of validator clients, which are applications that enable them to participate in consensus. This is the software that allows for validating and confirming transactions to ensure that they adhere to the network’s rules. 

Different validator clients can contain different code, but all of them must implement the same logic. This is important because if there is a bug or failure in one of the clients, nodes running other clients will be able to keep the network online and prevent it from experiencing downtime. 

Source: Revelo Intel

Increasing client diversity makes the network more resilient to bugs or vulnerabilities in the code of any one validator client. As an example, in 2016, attackers found a bug in Ethereum’s Geth client and exploited it in a series of DDoS (Distributed Denial of Service) attacks on the network. The mitigation was for node operators to switch rapidly to another client, keeping the network alive while the vulnerability was being patched.

Current State of Validator Clients on Solana

Solana currently has ~1,649 validators, making it one of the largest PoS networks by node count, as well as one of the most distributed, as measured by the Nakamoto Coefficient. The current validator clients on Solana are the Solana Labs client and the Jito Labs client, holding 68% and 32% of the active stake respectively. Even though the growth of the Jito Labs client has played a key role in diversifying the network during 2023, there is still work to be done.

The Jito client is a fork of the Solana Labs client, with the difference that it introduces a pseudo-mempool (since Solana does not have one) to optimize for MEV. This allows validators to search through unconfirmed or pending transactions in the mempool, bundle them together optimally, and submit them to the Jito Block Engine.

Source: Revelo Intel

However, the fact that it is a fork implies that both clients share many components developed in the original codebase. Therefore, if there was any undiscovered vulnerability or bug in the original Solana Labs client, both clients would be impacted. 

Moving forward, in addition to the two existing clients (Solana Labs and Jito) and Firedancer, there are two independent teams working on new clients. This makes for a total of five validator clients: three in Rust (Solana Labs, Jito, and Agave by Anza), one in C (Firedancer), and one in Zig (Sig by Syndica).

Why Another Validator Client?

Having another validator client like Firedancer is important for Solana as it enhances network resilience and performance. Diverse validator clients reduce the risk of systemic failures, as different clients may have varied responses to the same network issues. 

Source: Solana

Diversity in the software of validator clients ensures that a single bug or vulnerability in one client doesn’t compromise the entire network. Additionally, specialized clients like Firedancer can bring performance improvements tailored to Solana’s unique requirements, further strengthening the network’s efficiency and reliability.

Source: Revelo Intel

When there are multiple diverse clients operating on a network, they are less likely to share the same vulnerabilities. If one client fails due to a specific bug or attack, other clients with different architectures may remain unaffected, ensuring the continuity and stability of the network. 

Source: Revelo Intel

Another problem is that the current clients (Solana Labs and Jito) run as a single process, making it very difficult to change the code once it is running in production. For instance, validators running these clients cannot apply security upgrades on the fly; they are forced to shut down first. More broadly, all distributed systems, be it a centralized payment processor, a database, or a blockchain, have one thing in common: a single implementation bug can take the entire system down.

Besides, no amount of testing can prevent every possible bug. This is why supporting multiple clients in production is so important, especially when it comes to a blockchain that aims to experience zero downtime and be antifragile while being pushed to its limits.

An ideal case could involve 4 clients developed by independent teams and using different programming languages, each holding 25% of the active stake. This way, no single implementation would hold more than 33% of the active stake, the share above which a failure in one client could by itself stall consensus, helping to prevent single points of failure.

Source: Revelo Intel
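
The one-third rule can be checked mechanically. The sketch below assumes consensus needs a supermajority of more than two-thirds of the stake, so any client holding more than one-third can stall the network if it fails. The function name is hypothetical, and the stake figures are the 68% / 32% split and the 4 x 25% ideal case quoted in this report.

```python
from fractions import Fraction

# Assumption: a client holding more than 1/3 of the stake can halt
# consensus on its own if it fails.
HALT_THRESHOLD = Fraction(1, 3)


def clients_over_threshold(stake_share: dict) -> list:
    """Return the clients whose stake share exceeds the halt threshold."""
    return sorted(name for name, share in stake_share.items()
                  if share > HALT_THRESHOLD)


current = {"Solana Labs": Fraction(68, 100), "Jito": Fraction(32, 100)}
ideal = {f"client_{i}": Fraction(1, 4) for i in range(1, 5)}

if __name__ == "__main__":
    print("current:", clients_over_threshold(current))
    print("ideal:", clients_over_threshold(ideal))
```

With the current split, only the Solana Labs client is above the threshold on paper. Because Jito is a fork of the same codebase, a bug in shared code could in practice affect both clients at once.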

Besides improving performance, Jump is building a new validator client for Solana to address the network’s reliability. For instance, Solana has already experienced 4 incidents in the past where block production halted due to a problem in the software module affecting consensus. These incidents were often resolved by coordinating validators through Discord and required manual fixes. 

Once Firedancer is in production, it will be much easier to apply the same optimizations across all the other clients. For instance, Jito is working on JitoDancer, which will implement the necessary optimizations to work alongside Firedancer. 

Firedancer

Firedancer is a rewrite of Solana’s current validator client in the C programming language. Because it does not reuse the original codebase, implementation bugs in that codebase are not carried over.

It is being developed by Jump from the ground up. By leveraging their expertise in high-frequency trading, Jump is one of the most capable teams for building such a high-performance, low-latency, and fault-tolerant system. 

Source: Revelo Intel

The code is open-source, making it possible for anyone to audit and learn about the optimizations being developed. Kevin Bowers is currently leading a team with a proven track record of pushing the boundaries of technology in high-performance environments. Their expertise in optimizing data orchestration, customizing technology stacks, and scaling systems globally is unmatched.

Like Jump, Solana requires a robust, adaptable, and efficient infrastructure to support its growing ecosystem. The introduction of Firedancer, an independent second validator, is a step towards this goal. Firedancer is designed to be globally scalable, incrementally evolvable, fault-tolerant, and accessible to a broad community. It’s being developed openly, inviting collaboration and feedback from the Solana community.

Source: Revelo Intel

In addition to that, the client remains interoperable with the Solana Labs and Jito Labs clients, despite them being written in Rust. The Firedancer client will also work with existing hardware used by Solana validators, not forcing them to upgrade to new machines. With Firedancer, Solana will scale with bandwidth and hardware.

The Firedancer vision is simple: each component must be optimized for maximum performance so that the client will operate at a capacity only limited by the validator’s hardware. Contrast this to the current state, where the performance of validators is limited by software inefficiencies. 

The Firedancer Stack

Firedancer was built from the ground up by rewriting three major functional components of the existing Solana Labs client in C, optimizing each individual piece to bring the client to its hardware limit.

Source: Revelo Intel

Essentially, Firedancer is more than a validator; it’s a testament to Jump’s belief in pushing boundaries and its capacity to innovate at the intersection of technology and finance – aiming to optimize transaction throughput and minimize costs.

Source: Revelo Intel

By optimizing these critical components, Firedancer ensures that Solana’s scaling is primarily limited by the hardware that clients are running, not the software itself. This aligns with the original vision of Solana laid out by Anatoly years ago: to design an efficient way for nodes to communicate, so bandwidth was no longer a bottleneck.

In the world of high-performance computing, everything must be optimized eventually. This approach aligns with Moore’s Law, which states that the number of transistors on a microchip doubles approximately every two years, leading to increased computing power and decreased costs per transistor. 

Firedancer’s scaling is closely related to the number of transistors in a microchip: as transistors shrink, more of them fit in the same physical space on the chip’s surface. Hence, chip designers can add more cores, resulting in higher data throughput for Solana. This makes it possible for the network to scale with bandwidth and hardware improvements whilst maintaining a sufficient level of decentralization.

Source: Revelo Intel

In the context of Solana, having more processor cores means the ability to execute more transactions in parallel. This leads to significantly higher data throughput, as multiple transactions can be processed simultaneously without waiting for each other.

Early simulations show that, as the number of cores increases, the bandwidth increases almost linearly. In other words, doubling the number of cores in the hardware of a validator would result in almost double the number of transactions being received in the same period of time by that validator. 
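
As a rough illustration of "almost linear" scaling, the sketch below models each additional core as contributing a fixed fraction of a full core's throughput. The function name, the per-core rate, and the 0.95 efficiency factor are all hypothetical values chosen for the example, not measurements from Firedancer.

```python
def estimated_throughput(cores: int, per_core_tps: float,
                         efficiency: float = 1.0) -> float:
    """Estimate throughput when each extra core adds `efficiency` of a core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # The first core runs at full rate; each extra core adds a scaled share.
    return per_core_tps * (1 + (cores - 1) * efficiency)


if __name__ == "__main__":
    # Hypothetical numbers: 100 TPS per core, 95% scaling efficiency.
    eight = estimated_throughput(8, 100.0, efficiency=0.95)
    sixteen = estimated_throughput(16, 100.0, efficiency=0.95)
    print(f"8 cores: {eight:.0f}, 16 cores: {sixteen:.0f}, "
          f"ratio: {sixteen / eight:.2f}")
```

With an efficiency of 1.0 the model is exactly linear. Values slightly below 1.0 give the "almost double" behavior described above when the core count doubles.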

Modular Architecture

Firedancer’s modular architecture sets it apart from existing clients. This modularity refers to its internal structure, aiming for efficiency and reliability in network validation processes. This should not be conflated with the broader blockchain architecture debate, which focuses on how blockchain networks themselves are structured and operate. 

The latter involves discussions on the division of labor within a blockchain ecosystem, such as separating consensus, execution, and data availability layers, a concept not directly related to Firedancer’s design principles or functionality. 

Source: Revelo Intel

Firedancer uses a tile-based architecture where each tile is an independent Linux C process with some memory. In comparison, the current Rust client operates as a single process. 

In this context, a process refers to an instance of a running program. For example, modern operating systems work by allocating memory and resources to different processes, each handling a specific task.

Source: Revelo Intel

Tiles, as independent processes, offer several advantages over a single process system. They isolate failures by managing the validator state in different workspaces, ensuring that an issue in one tile doesn’t affect the entire system. 

This isolation also simplifies maintenance and upgrades, as individual tiles can be modified or replaced without halting the entire system. For instance, each tile can pick up processing where it left off during a restart or upgrade.

Additionally, tiles enable better resource utilization, as each can operate on separate CPU cores, preventing resource contention and optimizing performance. These factors make tiles more robust, flexible, and efficient compared to a single-process approach.
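
The failure-isolation property can be demonstrated with ordinary operating system processes. The sketch below is a toy model, not Firedancer's implementation: the tile names are hypothetical, each "tile" is a tiny Python program started as its own process, and the tiles run one after another here, whereas real tiles run concurrently.

```python
import subprocess
import sys

# Hypothetical tiles: each entry is a tiny program run as its own OS process.
# The names and the simulated crash are illustrative only.
TILE_PROGRAMS = {
    "tile_a": "print('tile_a ok')",
    "tile_b": "import sys; sys.exit(3)",  # simulates a tile that crashes
    "tile_c": "print('tile_c ok')",
}


def run_tile(name: str) -> int:
    """Start one tile as a separate process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", TILE_PROGRAMS[name]],
        capture_output=True,
        text=True,
    )
    return result.returncode


def run_all_tiles() -> dict:
    """Run every tile; a failure in one does not stop the others."""
    return {name: run_tile(name) for name in TILE_PROGRAMS}


if __name__ == "__main__":
    for name, code in run_all_tiles().items():
        status = "ok" if code == 0 else f"failed (exit code {code})"
        print(f"{name}: {status}")
```

The crash in `tile_b` is reported as a non-zero exit code, while the supervising process and the other tiles keep running. In a single-process design, the equivalent failure could bring down the whole program.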

Early tests have shown a 10-100x improvement over the Solana Labs client, achieving more than 1M TPS per CPU core during testing.

Source: Revelo Intel

Security Model

One of the tradeoffs of C with respect to Rust is that C does not have the memory safety guarantees that Rust provides natively. To limit the damage a memory-safety bug could cause, Jump is following a practice called “OS sandboxing”, which refers to isolating the different processes (tiles) from the operating system.

Following the Principle of Least Privilege, tiles can only access the system resources and perform the system calls that they strictly need to perform their job. In fact, the well-studied field of memory safety in C has pushed the team towards adopting an adversarial mindset and minimizing the attack surface as much as possible. If tiles were not isolated, an attacker could cause damage to the entire network by successfully exploiting the same vulnerability on multiple validators.

This practice acknowledges that there is no such thing as “perfect code”, and that a bug or security vulnerability will inevitably be introduced at some point. To protect against these scenarios, Jump follows a Defense in Depth strategy, such that if an attacker manages to compromise a piece of code, the scope of the compromise will be limited, and the threat will not affect other pieces of the tech stack.
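
The Principle of Least Privilege can be pictured as a per-tile allowlist of system calls. The sketch below is a toy model in user space with hypothetical tile names and allowlists. Real enforcement happens in the kernel (on Linux, system call filtering is commonly done with seccomp), and the mechanism Firedancer uses is not described in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TilePolicy:
    """Hypothetical policy: the only system calls a tile may make."""
    name: str
    allowed_syscalls: frozenset

    def permits(self, syscall: str) -> bool:
        return syscall in self.allowed_syscalls


# Illustrative allowlists; not Firedancer's actual configuration.
POLICIES = {
    "network_tile": TilePolicy("network_tile", frozenset({"recvfrom", "sendto"})),
    "compute_tile": TilePolicy("compute_tile", frozenset()),  # no syscalls needed
}


def check(tile: str, syscall: str) -> bool:
    """Return True if allowed; raise PermissionError otherwise."""
    policy = POLICIES[tile]
    if not policy.permits(syscall):
        raise PermissionError(f"{tile} may not call {syscall}")
    return True


if __name__ == "__main__":
    print(check("network_tile", "sendto"))
    try:
        check("compute_tile", "open")
    except PermissionError as err:
        print("blocked:", err)
```

The point of the design is that a compromised tile can only do what its allowlist permits. A tile that never needs to open files, for example, cannot be used to read a key from disk.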

The Road Ahead

Undoubtedly, the release of Firedancer to production is a large milestone for Solana, and many refer to it as Solana 2.0. That being said, there are still challenges ahead from a security perspective.

Despite all the benefits that come with it, Firedancer still has to replicate the behavior of existing clients closely. Otherwise, there is a chance that consensus bugs could be introduced due to incompatibility.

One solution to address these concerns is to split the stake across both clients, aiming to keep Firedancer under 33% of the total stake for an extended period of time.

Most importantly, the codebase cannot be developed in isolation. This is why Jump continues to test Firedancer’s behavior and performance against the existing Labs client.
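
One common way to check that a rewrite matches a reference implementation is differential testing: feed both the same inputs and flag any disagreement. The sketch below uses a made-up validity rule and hypothetical function names purely for illustration. It is not Solana's actual transaction logic, nor a description of Jump's test suite.

```python
from itertools import product


def reference_is_valid(balance: int, amount: int, fee: int) -> bool:
    """Toy rule: a positive amount that, with the fee, fits in the balance."""
    return amount > 0 and amount + fee <= balance


def rewrite_is_valid(balance: int, amount: int, fee: int) -> bool:
    """Independent rewrite of the same rule with different code."""
    if amount <= 0:
        return False
    return balance - amount - fee >= 0


def buggy_rewrite_is_valid(balance: int, amount: int, fee: int) -> bool:
    """Rewrite with an off-by-one bug: rejects spending the exact balance."""
    return amount > 0 and amount + fee < balance


def find_mismatches(impl_a, impl_b, max_value: int = 5) -> list:
    """Compare two implementations on every input in a small grid."""
    cases = product(range(max_value + 1), repeat=3)
    return [case for case in cases if impl_a(*case) != impl_b(*case)]


if __name__ == "__main__":
    print(len(find_mismatches(reference_is_valid, rewrite_is_valid)))
    print(find_mismatches(reference_is_valid, buggy_rewrite_is_valid)[:3])
```

The correct rewrite agrees with the reference on every input, while the buggy one disagrees whenever the amount plus fee equals the balance. In a consensus setting, a disagreement of this kind is exactly the sort of incompatibility that could split validators running different clients.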

Another challenge that Jump is facing right now is the lack of specification and standardized documentation. On the positive side of things, this is something that will change. 

New implementations will need to be extra careful when aiming to keep up with the speed of development without introducing new bugs. Therefore, Jump has made it one of their priorities to improve the documentation of Solana. 

Once Jump provides the actual specification documents to define the Solana protocol, anyone should be able to create a Solana validator just by looking at the documentation, not the validator code. 

Not only will this help Firedancer stay compatible over time, but this comprehensive documentation will also play a key role for other independent teams to build their own clients. Ultimately, even if the network can leverage the decades of expertise of Jump, the end goal is for Solana to become an open standard that is governed by a diverse set of community contributors. 

Conclusion

In summary, the introduction of Firedancer, even if it is just an independent validator client for Solana, represents a significant milestone for the network. 

Firedancer aims to enhance Solana’s reliability, performance, and diversity of clients, addressing the current over-reliance on a few clients. 

This initiative aligns with Solana’s vision to design an efficient way for nodes to communicate, scaling with hardware improvements rather than being limited by software inefficiencies.

References

Solana Whitepaper

Solana Validators

Solana Labs client

Jito Labs client

Firedancer client

Tinydancer – Solana’s first light client

Jump vs the speed of light

Helius: What is Firedancer?

Firedancer Docs

Disclosures 

Revelo Intel has never had a commercial relationship with Solana or Jump Crypto, and this report was not paid for or commissioned in any way.

Members of the Revelo Intel team, including those directly involved in the analysis above, may have positions in the tokens discussed.

This content is provided for educational purposes only and does not constitute financial or investment advice. You should do your own research and only invest what you can afford to lose. Revelo Intel is a research platform and not an investment or financial advisor.