How Randomized SQL Testing Can Help Detect Bugs?
Learn about our pursuit for the ultimate SQL solution — SQLSmith in Rust.
Learn about our pursuit for the ultimate SQL solution — SQLSmith in Rust.
We chose to build our own SQLSmith in Rust. It wasn't a mere whim. We tested various alternatives, analyzing their capabilities and adaptability to our needs. While some solutions showed promise, SQLSmith in Rust consistently emerged as the right solution.
SQLSmith is a tool used for automated SQL query generation and testing. It is designed to explore the capabilities and limitations of database systems by generating random valid SQL queries and executing them against a target database. When a bug is discovered, the query causes the database to crash or produce unexpected errors.
Before implementing our own SQLSmith, Risingwave also tried another approach powered by AFL++. It generates random binary data and checks it using Risingwave's frontend to ensure syntactic and semantic correctness. The frontend consists of the following components:
Join
has two inputs: a table or a subquery.select count(user) from foo
, foo
is a table that contains multiple columns. The binder must verify the column user's existence by consulting the database catalog. It also helps detect the compatibility of data types and related operations. For instance, + 8 seconds
cannot be applied to a non-timestamp data type.
However, random binary data generated by a general-purpose fuzzer is unlikely to pass the parser or even the lexer. This means that testing the binder, optimizer, and execution engine becomes difficult. Using a domain-specific fuzzy testing tool is much more efficient.
There are a few open-source SQLSmith implementations available:
However, there are reasons that prevent us from directly using them to test RisingWave:
Therefore, we chose to build our own SQLSmith in Rust.
We run SQLSmith frontend tests per pull request and a snapshot of generated queries from SQLSmith. This has helped us catch many bugs in the four components mentioned above: Risingwave's frontend, execution engine, and storage engine.
Initially, we ran SQLSmith with a different random seed for each pull request, as we believed that generating a different set of queries each time maximizes code coverage. However, in the early stages of development, we encountered numerous important and unimportant bugs. Our CI/CD pipeline blocks any pull request that fails tests from being merged. Thus, developers were under pressure to fix even lower-priority bugs, i.e., queries that are too complex for users to write in real life.
Additionally, queries generated for each pull request were simply forgotten once the test passed. This is not ideal, as the same bugs detected by SQLSmith may reappear in the future. It would be better to collect all the failed test cases and periodically test them to avoid repeating the same mistakes.
To address this issue, we generate a snapshot of SQL queries to maintain a stable test set.
The snapshot approach allows us to prune the query set. We can remove queries that trigger unimportant bugs or invalid queries that were previously not rejected by Risingwave. Once a bug is detected, it will fail our run pre-generated queries
workflow, preventing the pull request from being merged.
If a bug is deemed unimportant or actually an invalid query, we can remove it from the active test set. However, we still store the failing set of queries so that we can test and reproduce these errors.
Meanwhile, we generate a fresh set of queries weekly to ensure comprehensive coverage.
At a high level, SQLSmith generates the following queries:
CREATE TABLE, CREATE MATERIALIZED VIEW
.INSERT, UPDATE, DELETE
.SELECT
.CREATE MATERIALIZED VIEW
.SET RW_ENABLE_TWO_PHASE_AGG TO FALSE
, to influence the optimizer's query plan output with/without certain optimizations.DDL statements allow us to run batch queries or create new materialized views based on the tables we create, enabling us to test our batch and stream engines.
DML statements ensure that data is processed through batch and stream engines.
Session variables allow us to test the new behavior of the database after modifying these variables. These variables are usually set per query, as they often modify the behavior of query plan optimization.
The query generation process follows a top-down recursive approach, closely following the SQL Abstract Syntax Tree. This process is applied to batch and stream queries, selectively disabling certain SQL features not supported by the stream engine.
Here is a general example of how a complex query is generated:
WITH
clause.SELECT
statement.ORDER BY
clause.
For the set expression, we generate it through the following steps:
FROM
expression, including any necessary joins. SELECT
query.SELECT
items.
When generating the list of SELECT
items, the process first chooses a type and then calls gen_expr with that type. This can result in the generation of:
All of this help ensure that various types of SQL statements are well-tested and reliable.
If you're interested in further details, you can read our developer docs.
SQLSmith has already found nearly 40 bugs in Risingwave across the frontend, execution engines, and storage engine. If interested, you can check out the details by searching on GitHub.
Here are some interesting bugs we found:
NULL
in places we require boolean
SQLSmith is useful for ensuring test coverage of different CAST
s from one data type to another. There are many combinations of types, which are too cumbersome for people to write test cases manually. It managed to uncover this NULL-casting issue caused by implicit_cast
not being enforced correctly in the binder.
2. bug: risingwave-streaming-actor’ panicked at ’mem-table operation inconsistent
SQLSmith discovered a bug with FULL JOIN
when the primary key is null. It exposed a limitation in our optimizer when dealing with streaming queries.
3. bug: two phase stream agg panicked at Option::unwrap()
SQLSmith detected a bug with two_phase_stream_agg
that occurs during plan generation.
💡 What is Two-Phase Aggregation?
For group-by aggregation, Risingwave’s streaming engine can choose between two query plans:
The second approach favors low-cardinality data, i.e., columns with few unique values, as the reduction in data can be significant.
4. bug (stream): mv-on-mv: vnode 98 should not be accessed by this table
SQLSmith found a bug with the computation of the hash key due to an overflow of the interval type during execution.
Other implementations of SQLSmith typically need an oracle of truth to test correctness. However, in Risingwave, we can largely mitigate this inconvenience. Risingwave has both a streaming engine and a batch engine, each implemented by a different set of operators optimized for their respective use cases, resulting in drastically different computation logics:
Therefore, we generate and test the same query using batch and streaming computation engines. Finally, we compare the results to deduce any incorrectness. While the results from both engines can be wrong, the chances of them being wrong in the same way are minimal.
SQLSmith can be further enhanced to test connectors, such as reading data in a specific format from upstream sources. Taking an example from Risingwave's documentation (https://www.risingwave.dev/docs/current/create-source-kafka/#examples), Risingwave supports ingesting data from Kafka in six different data formats: Avro, Upsert Avro, JSON, Upsert JSON, Protobuf, and CSV. Consequently, Risingwave has six different code paths for parsing each data format.
SQLSmith is supposed to generate randomized valid or invalid input data with different data types, such as numeric
or jsonb
, to ensure that the parser can correctly parse valid data and reject ill-formed data.
Often, after detecting a failed query, it takes a significant amount of time to locate the cause of the bug because the queries generated by SQLSmith are very complex and not designed to trigger bugs intentionally.
As a result, we have to apply a binary-search-like algorithm manually:
This is a time-consuming process that could be automated. Discussions about this can be found on Risingwave's GitHub.
Conclusion
Without SQLSmith, we would have to generate test cases manually, which is inefficient. Although we could incorporate existing test cases from other databases, they may not be compatible with Risingwave’s syntax, requiring manual adjustments. Moreover, both approaches explore a search space that is much smaller than what SQLSmith is capable of. It has become evident to us that investing time into SQLSmith uncovers many more bugs than manual labor alone. SQLSmith has significantly increased our confidence in delivering new features and making changes to code or SQL syntax.
ZHI (Martin) LIU
Product Manager
Noel Kwan
Software Engineer