Background


Let’s begin by thinking about the projects you are working on. How willing would you be to:

  • Run a full unit test suite of your project locally?
  • Start a cluster and run a full integration test suite locally?
  • Run performance testing (e.g., benchmarks) of your changes?


... when you’ve changed some code and are preparing to submit a PR?

Back when we started the RisingWave project, developers constantly complained about slow tests and a poor testing experience. The pain was real!

Unit tests were super slow. 
We had a workspace of 10+ crates and 700 unit test cases. Running the full suite with cargo test took a long time, not to mention the extra time spent compiling with code coverage enabled.

Integration tests were very hard. 
RisingWave has been a cloud-native streaming database from day one, and a minimal RisingWave cluster requires a compute node, a meta node, and a frontend node, each started with many parameters and configs. We therefore had a long bash script to start a cluster, and we had to wait 30 seconds before we could run integration tests. Sometimes, if developers forgot to stop components they had started earlier, it would take several minutes to figure out why the e2e tests yielded no output. As more and more components (e.g., multiple compute nodes, MinIO, Kafka) were added to the integration test suite, few developers were willing to start clusters by themselves. We would simply submit a PR with our fingers crossed, hoping the change would pass the e2e tests on CI. Furthermore, every time we changed the command-line interface, we had to hack the startup scripts and notify other developers so that everyone could still set up RisingWave correctly on their local machines.

At that time, CI failure rates were high. There was no way of detecting whether a component was correctly set up (e.g., by gRPC health check) — we just used sleep 5s. Sometimes the e2e tests failed simply because a TCP port was closed or a service was not fully initialized.

Figure: A RisingWave cluster needs to maintain a set of compute nodes, a meta node, and a frontend node.

Performance tests were painful. 
Developers had to do everything manually: compile RisingWave, send binaries to EC2, craft command-line arguments by hand, check that every service had started, configure the Grafana dashboard, run the benchmarks, and finally clean everything up. As a result, (nearly) no one ran performance tests, so no one knew whether their PRs would affect performance.


RiseDev


I got tired of these problems, so I initiated RiseDev, the RisingWave developers’ tool, at the end of December 2021. I initially thought about naming it RiseLAB, as developers would do a lot of “experiments” in the “lab” of “RisingWave.” Nevertheless, to avoid name conflicts with some well-known brands, we later renamed it to RiseDev before open-sourcing it.

Figure: We introduced RiseLAB in December 2021.

Two weeks after RiseDev was introduced, all developers at Singularity Data were using it for daily development. It’s super easy to use: with a single command, ./risedev d, it starts a RisingWave cluster within 4 seconds. It also offers several other options to tweak, but we won’t go deeper into the internals of RiseDev today.

Figure: With RiseDev, we can start a RisingWave cluster within 4 seconds.

Now let's see how RiseDev simplifies the development process and improves the developer experience.


Fast Unit Tests


Rust developers generally use cargo test to invoke unit tests. Let’s recap how unit tests are compiled and run with cargo test:

  • All crates within the workspace will be compiled with the special cfg test, so that code inside #[cfg(test)] will be compiled.
  • Each crate (library) will produce a binary linked with libtest, which executes unit tests.
  • cargo test will invoke each unit test binary one by one to complete the unit test process.
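As a hedged illustration of the first two steps (not RisingWave's actual code — the function and module here are made up), a crate's test-only code looks like this:

```rust
// Minimal sketch: the `tests` module is compiled only when the `test` cfg is
// set (i.e., during `cargo test`), and each `#[test]` fn becomes one case in
// the crate's libtest-linked test binary.
pub fn double(x: i32) -> i32 {
    x * 2
}

#[cfg(test)]
mod tests {
    use super::double;

    #[test]
    fn doubles() {
        assert_eq!(double(21), 42);
    }
}

fn main() {
    // A normal (non-test) build compiles only this path; the tests module
    // above is stripped out entirely.
    println!("double(21) = {}", double(21));
}
```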


The problem lies in the last step — those unit test binaries can actually run in parallel! That’s where cargo-nextest comes into play: it runs unit tests from different crates in parallel to improve test speed. Furthermore, it integrates well with libtest-mimic. We have some file-based tests, where the test harness scans test files and generates unit tests dynamically using libtest-mimic, and nextest supports such use cases nicely. After migrating to nextest, we could run all test cases within 10 seconds:

Figure: All the tests are done within 10 seconds.

One issue with nextest at that time was the lack of coverage tools. Luckily, I found cargo-llvm-cov and sent a PR to add integration with nextest. Now, with a single command cargo llvm-cov nextest, we can run unit tests with coverage reports using nextest. This solution also improves build speed compared with cargo tarpaulin, the coverage tool we used before.


Easy Cluster Setup


The core functionality of RiseDev is composing and starting a test cluster. RiseDev implements a flexible YAML configuration engine that enables developers to compose the cluster however they want. Let’s take a look at the usage examples below.

By default, RiseDev starts RisingWave in in-memory mode: a meta node storing metadata in memory, a compute node storing state in memory, and a frontend node. The cluster configuration is stored in risedev.yml at the root of the GitHub repository. The default config looks like this:

risedev:
  default:
    - use: meta-node
    - use: compute-node
    - use: frontend

And with a single command, ./risedev d, all these components start one by one. RiseDev ensures each component is fully initialized before starting the next one by running health checks, so it never brings up a misconfigured cluster.

✅ tmux: session risedev
✅ prepare: all previous services have been stopped
✅ meta-node-5690: api grpc://127.0.0.1:5690/, dashboard http://127.0.0.1:5691/
✅ compute-node-5688: api grpc://127.0.0.1:5688/
✅ frontend-4566: api postgres://127.0.0.1:4566/
✅ playground: done bootstrapping with config default
---- summary of startup time ----
meta-node-5690: 0.83s
compute-node-5688: 1.37s
frontend-4566: 0.83s
-------------------------------
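The health checks above replace the old sleep 5s approach. A minimal sketch of the idea in Rust (not RiseDev's actual implementation — the function name and polling interval are made up):

```rust
use std::net::TcpStream;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll a TCP address until it accepts connections or the timeout expires.
/// Returns true once the service's port is open, false on timeout.
fn wait_for_port(addr: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if TcpStream::connect(addr).is_ok() {
            return true;
        }
        // Back off briefly instead of busy-looping.
        sleep(Duration::from_millis(200));
    }
    false
}

fn main() {
    // Demo: bind a local listener so the check succeeds immediately.
    let listener = std::net::TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap().to_string();
    assert!(wait_for_port(&addr, Duration::from_secs(5)));
    println!("service at {addr} is ready");
}
```

A gRPC health check works the same way, except that after the TCP connection succeeds it also calls the service's health endpoint to confirm the process is fully initialized, not merely listening.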

RiseDev uses tmux to spawn processes in the background, which gives developers maximum control over every component. Developers can attach to the tmux session and manage each running process within the cluster.

Now, when a developer wants to persist state store data on disk, they can modify risedev.yml like this:

risedev:
  default:
    - use: minio
    - use: meta-node
    - use: compute-node
    - use: frontend
    - use: compactor

That’s where the magic happens — RiseDev automatically wires up available services. Now that we have minio in the cluster profile, RiseDev automatically picks it up when starting the compute node.

✅ tmux: session risedev
✅ prepare: all previous services have been stopped
✅ minio: api http://127.0.0.1:9301/, console http://127.0.0.1:9400/
✅ meta-node-5690: api grpc://127.0.0.1:5690/, dashboard http://127.0.0.1:5691/
✅ compute-node-5688: api grpc://127.0.0.1:5688/
✅ frontend-4566: api postgres://127.0.0.1:4566/
✅ compactor-6660: compactor 127.0.0.1:6660
✅ playground: done bootstrapping with config default
---- summary of startup time ----
minio: 0.38s
meta-node-5690: 0.82s
compute-node-5688: 1.41s
frontend-4566: 0.83s
compactor-6660: 0.69s
-------------------------------

When starting the services, RiseDev automatically creates buckets and configures policies for MinIO. And when starting the compute node, RiseDev adds the command-line argument --state-store hummock+minio://... so that the compute node uses MinIO for storing data. In the end, a two-line change enables data persistence for RisingWave!
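The service-discovery behavior can be sketched roughly like this (a hedged illustration: the struct, function, fallback value, and URI layout beyond the hummock+minio:// scheme are made up for this example, not RisingWave's real format):

```rust
// Hypothetical sketch of how a config engine could pick the state store:
// if a MinIO service is present in the profile, point the compute node at
// it; otherwise fall back to in-memory state.
struct MinioService {
    address: String,
    port: u16,
}

fn state_store_arg(minio: Option<&MinioService>) -> String {
    match minio {
        // Everything after the scheme is a placeholder, not the real URI format.
        Some(m) => format!("--state-store hummock+minio://{}:{}", m.address, m.port),
        None => "--state-store in-memory".to_string(),
    }
}

fn main() {
    let minio = MinioService { address: "127.0.0.1".into(), port: 9301 };
    println!("{}", state_store_arg(Some(&minio)));
    println!("{}", state_store_arg(None));
}
```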

It’s common for developers to want to test a 3-node cluster and ensure their changes work in a distributed environment. RiseDev supports this, too, with the same flexible YAML config engine.

risedev:
  default:
    - use: minio
    - use: meta-node
    - use: compute-node
      port: 5687
    - use: compute-node
      port: 5688
    - use: compute-node
      port: 5689
    - use: frontend
    - use: compactor

And when developers want to test RisingWave with external data sources like Kafka and view metrics in Grafana, they can simply run ./risedev d full. RiseDev automatically configures everything! Just navigate to localhost:3001 and we can find the metrics we want:

Figure: Developers can find all the metrics at localhost:3001.

Sounds cool, right? After starting the cluster, we can now invoke sqllogictest, a SQL testing framework originating from the SQLite project, to run integration tests. We’re using sqllogictest-rs, a reimplementation of sqllogictest in the Rust programming language maintained by the RisingLightDB community. We’ve contributed a lot of features to make it easier and more convenient to use.
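For reference, a minimal sqllogictest case looks like this (a sketch of the standard slt syntax; the table and values are made up): a statement block asserts that a SQL statement succeeds, and a query block checks a query's result against the expected output after the ---- separator.

```
statement ok
create table t (v int)

statement ok
insert into t values (1), (2)

query I
select sum(v) from t
----
3
```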

Putting it all together, RiseDev provides a flexible configuration framework for starting a RisingWave cluster. It runs health checks, configures services, and generates command-line arguments for all components. With minor changes, we can compose a RisingWave cluster that fits our needs and run integration tests!


Painless Performance Tests


When developing RisingWave, we care a lot about performance — how to ingest data faster, how to read from and write to S3 more efficiently, etc. But for a long time, we had no idea how each commit would affect performance. After some investigation, we set up standard RisingWave AMIs on EC2 with all the required tools installed. Developers only need to create an EC2 instance from the template, and they can immediately start benchmarking their changes with RiseDev — no need to install the Rust compiler or other dependencies themselves.

Figure: Developers can directly create an EC2 instance from the templates.

But single-node setups are only good for quick verification — all components share the same disk and CPUs, and the dataset size is generally small. What we eventually need is a fully distributed RisingWave setup with a large amount of data to benchmark against.

We’ve developed two separate tools to set up a test environment and deploy RisingWave.


Setting up Benchmark Environment in One Command

Thanks to Terraform, we can describe our benchmark infrastructure on AWS as HCL code. Inside Singularity Data, we have a private Terraform module that starts an EC2 instance for each RisingWave component, connects them in one VPC, and sets up an S3 bucket and ECR (Elastic Container Registry). Every EC2 instance gets an IAM role with the necessary permissions to access each experiment’s S3 bucket and ECR without any manual configuration.


Deploying RisingWave

RiseDev can already start a cluster locally, but can we deploy it to a remote cluster? The answer is YES! RiseDev supports generating docker-compose configs and deploying RisingWave using docker-compose. It reads the deployment information from Terraform and generates a docker-compose config for each EC2 instance. For example, we can instruct RiseDev to deploy one compute node to the EC2 instance named rw-compute-0:

    - use: compute-node
      id: compute-node-0
      listen-address: "0.0.0.0"
      address: ${dns-host:rw-compute-0}

And with ./risedev compose-deploy <profile>, RiseDev will generate a docker-compose config for that node. Then, we can execute ./risedev apply-compose-deploy to initiate the deployment process. Within one minute, developers can get their cluster up and running!

Figure: Developers can get their cluster up and running within one minute.

Summary

RiseDev, the developers’ tool for RisingWave, has transformed how we build RisingWave. Developers can easily use it to run local tests and performance tests, and it automates starting and deploying a full RisingWave cluster. It has made RisingWave’s development process much smoother and more efficient.

There is much more to explore in RisingWave and RiseDev, like Jaeger tracing, automatic Grafana dashboard generation, and deterministic testing to accelerate time-based tests. We can’t cover it all in one article. We’ve been constantly improving the development experience so that developers find it easy to work on the RisingWave project.



*Note 1: RiseDev deployment is only used for testing and should not be used in production environments. Please check out our docs on how to use RisingWave for testing purposes and how to deploy RisingWave on Kubernetes. Stay tuned for RisingWave Cloud in our roadmap.

*Note 2: We have not published any official benchmark numbers of RisingWave — the performance varies under different use cases, and you should try it yourself!


Chi Zhang

Community Contributor
