Building a Cloud Database From Scratch: Why We Moved From C++ to Rust
Discover the motivations behind RisingWave's transition from C++ to Rust when building their cloud database from scratch. This blog delves into the reasons behind the switch, highlighting the benefits and advantages of using Rust for enhanced performance, reliability, and security.
RisingWave is a cloud-native streaming database. The idea behind the system is to reduce the complexity and cost of building real-time applications in the cloud.
When we started building RisingWave in early 2021, we wrote it in C++. The founding team consisted of several seasoned C++ engineers with 10+ years of relevant experience. So, using C++ was a no-brainer. The first few months of development seemed smooth. We were in full gear building the most incredible database of a new era, dreaming of how RisingWave could shake the modern data stack. We were on a quest for greater effectiveness.
But as more and more engineers joined us, some shortcomings of C++ came back to bite us: unreadable coding style, memory leaks, segmentation faults, and more. We started to question ourselves: is C++ the right language for us to write a new database system? After around seven months of development and a full month of debate, we finally made the hard decision to move from C++ to Rust.
What did the decision mean? It meant we, a team of 10+ seasoned engineers, had to rewrite the entire system from scratch! And our seven months of effort would go to waste. It was an insane decision for an early-stage startup. Note that time is almost everything in the furiously competitive world of tech startups.
After making the decision, we spent around two months removing our C++ codebase entirely and rewriting it in Rust. We deleted 276,406 lines of code in total. Ten months have passed since then. Thanks to the decision, RisingWave is sailing through, with its source code accessible to everyone under Apache License 2.0. More than 60 contributors have joined us in developing the cloud-native streaming database. We are proud that RisingWave survived the rewrite; we are pleased that RisingWave gained over 1,600 stars on GitHub within just a month!
The Rust community keeps growing rapidly, and many engineers may be considering whether to (re)write their project in Rust, just as we did 10 months ago. We would love to share our thoughts on how we made the decision: what made us move to Rust, and what kind of pitfalls we confronted.
Now, let's first look back at what went wrong with C++.
Implement RisingWave with C++: the good, the bad, and the ugly
C/C++ is undoubtedly one of the most popular programming languages for building database systems. Most well-known database systems, including MySQL, PostgreSQL, Oracle, and IBM Db2, are written in C/C++. It is still a viable, vital, and relevant language. Choosing C++ to build a brand-new database system is not a wrong decision, but that doesn't mean C++ is the best choice, especially for an early-stage startup that aims at innovating a large-scale database system from scratch. To understand the reason, let's review the good, the bad, and the ugly parts of this battle-tested programming language.
- C++ offers developers the opportunity to develop high-performance programs. It provides fine-grained control over both memory and computation without the overhead of automatic garbage collection. Moreover, C++ code is compiled to native machine code that runs directly on the hardware instead of relying on an interpreter or a language runtime.
- C++ has proved to be a feasible language for system programming. Plenty of databases are built in C/C++. Decision makers can therefore feel confident that choosing C++ is never a bad idea.
- C++ offers a lot of flexibility to programmers, but it comes at a price. It is extremely easy to introduce bugs, some of them highly non-trivial, and it is very hard to debug C++ programs, especially concurrent ones.
- Dependency management can be a hassle. Although there are some tools, for example, CMake, to automatically configure the compilation of C++ projects, developers still need to manually configure and install the libraries they depend on.
- The C++ standard library lacks support for some modern programming tools, for example, native coroutine support. As a result, developers must rely on many community projects, most of which lack long-term support.
- Quality assurance is challenging. C++ supports so many features that different developers can code C++ in drastically different styles. As more developers with diverse backgrounds joined our team, we could not maintain readability. Furthermore, bugs in C++ code are non-trivial to identify; hence reviewing C++ code can become daunting.
Why choose Rust over C++
Since C++ is not a bad choice for building a database system, why did we rewrite the whole codebase? Was it for the same reason as many others: because it is cool? The answer is no; we decided to move to Rust after careful consideration.
A streaming database is typically used for mission-critical tasks that are incredibly latency-sensitive. Hence, we can only build RisingWave in a language that:
- guarantees zero-cost abstraction so that we won't have performance capped
- doesn't require runtime garbage collection so that we can keep the latency spikes potentially caused by memory management under our control
We cannot compromise on these two essential requirements for cutting-edge performance.
With these goals in mind, we thus chose Rust over C++. Both languages offer developers zero-cost abstraction and complete control of memory management. Rust, in our view, is a much better choice for relieving the developer's mental load and paving the way for efficient and large-scale collaboration. Here are the top four reasons:
- Rust is safe. Rust guarantees memory safety and thread safety at compile time by introducing ownership rules. It goes beyond RAII, a common memory management mechanism used in C++. There are two advantages. The first is self-evident: once the Rust compiler accepts our program, safe code cannot cause segmentation faults or data races at runtime, which would otherwise necessitate dozens of hours of debugging, especially in a codebase that is highly concurrent and primarily asynchronous. The second is more subtle: because the compiler rules out whole classes of faults, far fewer intricately intertwined code fragments can be responsible for a given bug. Reproducing bugs also becomes substantially easier with the help of deterministic execution (we'll have more on this in future blogs).
- Rust is easy to use. C++ is based on the philosophy of giving developers the greatest degree of freedom. But, sometimes, it backfires. For example, in C++, a template is expanded at compile time, and only then does the compiler check whether every operation can actually be invoked on the particular type. In Rust, traits constrain the methods that a concrete type can invoke, so the compiler can check the validity of the type at the call site instead of running expansion. This difference makes C++ template error messages more obscure, and they often take a seasoned C++ veteran to decipher. Another example is the widespread abuse of implicit conversion in C++. Implicit conversion may help you write less code, but things are more likely to go wrong, and when they do go wrong, the error is "implicit" and harder to debug. Check out the Google C++ style guide: implicit conversion brings more benefit than confusion only when its usage is properly restricted, especially in a large codebase.
- Rust is easy to learn. For seasoned C++ programmers, Rust is easy to learn. When they first start out, Rust learners usually spend most of their time making sense of ownership and lifetimes. Even if they don't explicitly express these concepts in code, experienced C++ engineers always keep these two concepts in mind when programming in C++. Rust can be challenging for beginners. But our interns proved otherwise. They picked up Rust within one or two weeks, even with no prior Rust/C++ expertise, because there are fewer implicit conversion and overload resolution rules to remember. And reviewing basic Rust code is a breeze for our colleagues. Now, we spend far less time reviewing beginners' Rust code than C++ code.
- Unsafe Rust is manageable. Due to the conservative nature of Rust's static analyzer, we may encounter situations where only unsafe Rust can make the impossible possible. A classic example is creating a self-referential type. Or we may need unsafe to gain extra performance, e.g., by directly manipulating bits in a compacted memory representation. In either case, it is reasonable for skeptics to ask: will this make the codebase vulnerable? For RisingWave, empirically, no. The two major use cases are the LRU cache and Bitmap, which take up fewer than 200 lines out of 170,000 lines of code in total. Adopting this approach of first coding in Safe Rust and only resorting to Unsafe when there is concrete evidence and solid arguments is now the secret to our good night's sleep.
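To make the last point concrete, here is a minimal sketch of how a small amount of unsafe code can be fenced off behind a safe API. It is not RisingWave's actual Bitmap: the type `Bitmap` and its methods `new`, `len`, `set`, `get`, and `count_ones` are hypothetical names chosen for illustration. The idea is that every `unsafe` block sits next to the check that justifies it, so reviewers only need to audit a few lines.

```rust
/// A fixed-size bitmap packed into 64-bit words.
/// Hypothetical illustration, not RisingWave's implementation.
pub struct Bitmap {
    words: Vec<u64>,
    len: usize,
}

impl Bitmap {
    /// Creates a bitmap of `len` bits, all cleared.
    pub fn new(len: usize) -> Self {
        // Round up so that every bit index below `len` has a backing word.
        Bitmap { words: vec![0; (len + 63) / 64], len }
    }

    /// Number of bits in the bitmap.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Sets or clears the bit at `idx`. Panics if `idx` is out of range.
    pub fn set(&mut self, idx: usize, value: bool) {
        assert!(idx < self.len, "bit index out of range");
        // SAFETY: idx < len implies idx / 64 < words.len(), because
        // words.len() == (len + 63) / 64.
        let word = unsafe { self.words.get_unchecked_mut(idx / 64) };
        let mask = 1u64 << (idx % 64);
        if value {
            *word |= mask;
        } else {
            *word &= !mask;
        }
    }

    /// Returns the bit at `idx`. Panics if `idx` is out of range.
    pub fn get(&self, idx: usize) -> bool {
        assert!(idx < self.len, "bit index out of range");
        // SAFETY: same bound argument as in `set`.
        let word = unsafe { *self.words.get_unchecked(idx / 64) };
        (word >> (idx % 64)) & 1 == 1
    }

    /// Counts the bits that are set.
    pub fn count_ones(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```

Callers only ever see the safe methods, so an out-of-range index becomes a panic rather than undefined behavior. Whether skipping the bounds check is worth it at all is exactly the kind of question that should be answered with measurements before reaching for unsafe.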
Here comes the dark side
While Rust meets most of our requirements, we are also fully aware of the dark sides:
- Fragmented async ecosystem: Because we did not settle on an async runtime at the outset, we spent months removing futures-rs and async-std before finally switching to tokio-rs.
- Cumbersome error handling: To get a backtrace, we need to capture and store it on our error types manually.
- Insufficient support of AsyncIterator: Without stable native support for generators and async fn in traits, we used third-party libraries to achieve the same goal. However, these libraries allocate extra Boxes compared to the pending standard implementation, ultimately lowering performance. Also, using macros from these libraries prevents IDEs from working properly, making development less programmer-friendly.
- Practical limitations of Generic Associated Type (GAT): GAT is the foundation for many existing/pending features, e.g., static/dynamic async fn in traits. However, complete support for GAT involves complex technical issues that may take longer than expected to resolve. Until then, we have to use various tricks to bypass limitations or live with suboptimal solutions.
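As an illustration of the error-handling point, the sketch below shows what capturing a backtrace by hand can look like. It assumes a toolchain where `std::backtrace::Backtrace` is stable (Rust 1.65 or later); the names `StorageError` and `read_block` are hypothetical and do not come from the RisingWave codebase.

```rust
use std::backtrace::Backtrace;
use std::fmt;

/// Hypothetical error type that carries its own backtrace.
#[derive(Debug)]
pub struct StorageError {
    message: String,
    backtrace: Backtrace,
}

impl StorageError {
    /// Builds the error and records where it was created.
    pub fn new(message: impl Into<String>) -> Self {
        StorageError {
            message: message.into(),
            // force_capture() ignores the RUST_BACKTRACE environment
            // variable; Backtrace::capture() would respect it instead.
            backtrace: Backtrace::force_capture(),
        }
    }

    /// Gives callers access to the stored backtrace, e.g. for logging.
    pub fn backtrace(&self) -> &Backtrace {
        &self.backtrace
    }
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "storage error: {}", self.message)
    }
}

impl std::error::Error for StorageError {}

/// Hypothetical fallible operation used to show the error in action.
pub fn read_block(id: u64) -> Result<Vec<u8>, StorageError> {
    if id == 0 {
        return Err(StorageError::new("block 0 is reserved"));
    }
    Ok(vec![0u8; 16])
}
```

The boilerplate is small for a single type, but every error type in a large codebase needs the same field, constructor logic, and accessor, which is where the manual approach becomes cumbersome.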
Nevertheless, with so many talented engineers in our team, we find that, overall, Rust improves productivity and code quality significantly while keeping the negative impact under control.
Learning from our experience
This blog is not about convincing every database development team to abandon their entire C++ codebase and rewrite their system in Rust from scratch. Instead, its primary purpose is to tell people why we made such a decision. Rewriting the entire codebase is no fun; it is excruciating for a startup where wasting time is suicidal. In fact, despite the apparent benefits brought by Rust, we probably wouldn't have made this tough decision without the following key factors:
- We were at that time refactoring our codebase to adapt to our new system architecture, and rewriting (at least a portion of) the codebase became inevitable.
- We had a few Rust enthusiasts (Rustaceans!) in our team, and they kept evangelizing Rust to other engineers and convinced the entire team that rewriting in Rust was a practical option.
- We expanded our engineering team rapidly, and more engineers joined us in the summer of 2021, which significantly accelerated the codebase rewriting.
Rust is a cool programming language, and everyone should try it. But don’t rewrite your project simply because it is cool to do so. If you are considering whether to rewrite your production-level project in Rust, then please ask yourself the following questions:
- Will low-level programming, performance, memory safety, and package management become a concern for your project?
- Do you have any Rust experts who can help avoid potential pitfalls?
- How long will it take to rewrite this project?
- Will you miss any critical deadlines because of the rewriting?
- Do you have in-house training programs on Rust?
You can decide after carefully weighing your answers to these questions. Again, Rust (or any other language) will never determine the destiny of your project. But making a wise choice may save you hundreds or even thousands of man-months.