When a data engineering team needs real-time analytics, two names come up repeatedly: Apache Pinot and RisingWave. Both promise fast answers on fresh data. Both ingest from Kafka. Both are open source. But they solve fundamentally different problems, and choosing the wrong one for your workload will cost you weeks of painful workarounds.
This article is a practical comparison of RisingWave vs Apache Pinot across architecture, query patterns, latency guarantees, and operational complexity. The central argument is that these two systems are complementary rather than competing: Pinot excels at fast ad-hoc analytics on large historical datasets; RisingWave excels at continuous stream processing and incrementally maintained materialized views. Understanding the boundary between their strengths is what lets you design systems that perform well at both.
By the end, you will know which system fits which workload, how they can work together in a single pipeline, and what the SQL looks like in each.
Architecture: OLAP Store vs Streaming Database
The systems are built on different foundations, and that foundation determines everything else.
Apache Pinot: Real-Time OLAP
Apache Pinot is a distributed columnar OLAP store. It was built at LinkedIn to power user-facing analytics dashboards that needed sub-second query latency over hundreds of millions of rows. The core insight behind Pinot is that OLAP queries on large datasets are fast when data is stored in a format optimized for aggregation: columnar layout, inverted indexes, sorted segments, and star-tree indexes for pre-aggregated rollups.
Pinot's architecture is organized around three node types:
- Controller: Manages cluster state, table configurations, and segment assignments via Apache Helix.
- Server nodes: Store and serve columnar segments. Each server holds a subset of the total data.
- Broker nodes: Route queries from clients to the appropriate server nodes, scatter-gather results, and return the merged response.
Ingestion in Pinot comes in two forms. Offline tables load batch data from HDFS, S3, or other file systems on a schedule. Real-time tables connect directly to a Kafka (or Pulsar) topic and continuously ingest new rows into in-memory consuming segments. Once a consuming segment reaches a size threshold, Pinot commits it to durable storage and starts a new one.
This design means Pinot is excellent at answering questions like "What were the top 100 product pages by views over the last 30 days, broken down by country?" with low latency. But Pinot does not maintain derived results between queries. Each query scans the relevant segments fresh. There are no standing operators continuously updating results as new data arrives.
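A query of that shape, written against a hypothetical Pinot table (the table and column names here are invented for illustration), looks like ordinary SQL:

```sql
-- Hypothetical Pinot table and columns, for illustration only.
-- ago('P30D') is Pinot's helper for "30 days before now" in epoch millis.
SELECT page, country, COUNT(*) AS views
FROM page_views_pinot
WHERE event_time_ms >= ago('P30D')
GROUP BY page, country
ORDER BY views DESC
LIMIT 100;
```

Every execution of this statement re-scans and re-aggregates the matching segments. The indexes make that scan fast, but the work is repeated on every run.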
RisingWave: Streaming Database
RisingWave is a streaming database built from scratch in Rust. It implements incrementally maintained materialized views: SQL queries that run continuously as a dataflow graph, updating their results the moment new data arrives. When you query a materialized view, RisingWave returns precomputed, always-fresh results with zero reprocessing at query time.
The architecture decouples compute and storage:
- Compute nodes: Run streaming operators (filters, joins, aggregations, window functions). They are stateless with respect to durable data.
- Meta node: Coordinates cluster membership, checkpointing, and barrier injection.
- Compactor nodes: Handle background compaction of the state store.
- Hummock (state store): A cloud-native LSM-tree storage engine that persists all state and materialized view results to S3-compatible object storage.
Because compute and storage are decoupled, you can scale compute independently of storage. Adding a compute node increases processing throughput without provisioning more disk. And because Hummock persists state to object storage, there is no local disk to manage or RAID to configure.
RisingWave speaks the PostgreSQL wire protocol, so you connect with psql, JDBC, Python's psycopg2, or any PostgreSQL client library. No proprietary SDK required.
The Architectural Divide
The key architectural difference is how each system handles standing queries:
- Pinot processes queries on demand. Each SQL statement triggers a full scan-and-aggregate over the relevant segments. This is fast because the data is indexed and columnar, but it means every query pays the same cost every time it runs.
- RisingWave processes queries continuously. A CREATE MATERIALIZED VIEW statement defines a standing dataflow that incrementally updates results as events arrive. Reading the view is a lookup, not a computation. The compute happens once, when the data arrives, not repeatedly every time someone queries.
This is not a subtle distinction. It fundamentally changes the cost model, the latency profile, and the types of workloads each system handles well.
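In SQL terms, the contrast looks like this (the events table and the view name are illustrative):

```sql
-- Pinot-style: the full aggregation runs every time this is submitted.
SELECT region, COUNT(*) AS cnt FROM events GROUP BY region;

-- RisingWave-style: the aggregation is defined once...
CREATE MATERIALIZED VIEW events_by_region AS
SELECT region, COUNT(*) AS cnt FROM events GROUP BY region;

-- ...and reading it later is a lookup against precomputed state.
SELECT * FROM events_by_region;
```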
Query Patterns: When Each System Shines
Understanding what types of queries each system handles well is the key to using them correctly.
Pinot's Strength: Ad-Hoc Analytics at Scale
Pinot's architecture is optimized for ad-hoc OLAP queries that:
- Scan large ranges of historical data (days, months, years)
- Apply filters on high-cardinality dimensions (user ID, product SKU, geo)
- Compute aggregations over millions or billions of rows
- Return results within seconds or milliseconds
The canonical Pinot use case is a user-facing analytics dashboard. A user opens a dashboard and asks for their campaign performance over the last 90 days by device type and region. That query touches hundreds of millions of rows but must return in under 500ms. Pinot's columnar layout, inverted indexes, and star-tree indexes make this possible.
Pinot also handles well: funnel analysis, cohort analysis, A/B test reporting, and any workload where the query shape is unpredictable and the dataset is large.
What Pinot does not do: it cannot maintain a continuously updated result that downstream systems can subscribe to. If you want an alerting system that fires when fraud scores exceed a threshold, you cannot define that logic in Pinot and have it push results to a downstream sink. You would have to poll Pinot periodically, which is not the same thing.
RisingWave's Strength: Continuous Processing and Incremental Aggregation
RisingWave is optimized for workloads that need:
- Results that are always up-to-date, not computed on demand
- Joins between a streaming source and a slowly-changing dimension table
- Multi-step pipelines where one derived result feeds another
- Event-driven triggers and alerts based on continuous conditions
- Sub-second latency from event arrival to result availability
The canonical RisingWave use case is a fraud detection system. You define a materialized view that joins transaction events with user profile data, computes a rolling 5-minute fraud score, and flags anomalies. That view is always current, even as new transactions arrive. A downstream system (a Kafka sink, an alert webhook, an operational dashboard) sees the updated score the moment it changes, not the next time it polls.
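As a sketch of that pipeline (the tables, columns, and the toy scoring rule are all invented for illustration, not a production fraud model):

```sql
-- Rolling 5-minute spend per user, refreshed every minute via a hopping
-- window, enriched with the user's profile and flagged by a toy rule.
CREATE MATERIALIZED VIEW fraud_flags AS
SELECT
    t.user_id,
    u.home_region,
    t.window_start,
    COUNT(*)      AS txn_count_5min,
    SUM(t.amount) AS total_spend_5min,
    (COUNT(*) > 5 AND SUM(t.amount) > 1000) AS flagged
FROM HOP(transactions, txn_time, INTERVAL '1 MINUTE', INTERVAL '5 MINUTES') t
JOIN user_profiles u ON t.user_id = u.user_id
GROUP BY t.user_id, u.home_region, t.window_start, t.window_end;
```

A CREATE SINK on top of this view can then push every change to the flagged column straight to Kafka or a webhook.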
What RisingWave does not do as naturally: ad-hoc exploration of large historical archives. If you want to query three years of clickstream data with arbitrary filter combinations that were not anticipated at view-creation time, RisingWave is not the tool for that. Hummock stores materialized view results efficiently, but it is not a columnar OLAP engine designed for arbitrary scan-and-aggregate over petabytes of historical data.
SQL Examples: Verified Against RisingWave 2.8.0
The following examples are all verified against a local RisingWave instance running version 2.8.0. Every query was executed and the output is shown.
Setting Up the Tables
In this scenario, imagine a team building a real-time e-commerce analytics system. They use RisingWave to process the event stream continuously. Apache Pinot would sit alongside RisingWave to serve ad-hoc historical queries from analysts.
-- Table receiving page view events from Kafka (in production,
-- use CREATE SOURCE with connector = 'kafka')
CREATE TABLE pinot_page_views (
user_id VARCHAR,
page VARCHAR,
region VARCHAR,
duration_ms INT,
event_time TIMESTAMPTZ
);
CREATE TABLE pinot_orders (
order_id VARCHAR,
user_id VARCHAR,
product_id VARCHAR,
amount DOUBLE PRECISION,
status VARCHAR,
region VARCHAR,
created_at TIMESTAMPTZ
);
Insert sample data:
INSERT INTO pinot_page_views VALUES
('u1', '/home', 'us-east', 1200, '2026-04-01 10:00:00+00'),
('u2', '/product', 'us-west', 3400, '2026-04-01 10:00:05+00'),
('u3', '/checkout', 'eu-west', 2100, '2026-04-01 10:00:10+00'),
('u1', '/product', 'us-east', 900, '2026-04-01 10:00:15+00'),
('u4', '/home', 'ap-east', 1500, '2026-04-01 10:01:00+00');
INSERT INTO pinot_orders VALUES
('o1', 'u1', 'p10', 49.99, 'paid', 'us-east', '2026-04-01 10:00:30+00'),
('o2', 'u2', 'p20', 199.00, 'paid', 'us-west', '2026-04-01 10:01:10+00'),
('o3', 'u3', 'p15', 29.95, 'refunded', 'eu-west', '2026-04-01 10:02:00+00'),
('o4', 'u4', 'p10', 49.99, 'paid', 'ap-east', '2026-04-01 10:03:00+00');
Example 1: Tumbling Window Aggregation (RisingWave's Domain)
This is where RisingWave fundamentally differs from Pinot. The materialized view below computes per-minute traffic by page and region, and keeps those results continuously updated. No query-time scanning required.
CREATE MATERIALIZED VIEW pinot_page_views_per_minute AS
SELECT
window_start,
window_end,
page,
region,
COUNT(*) AS view_count,
AVG(duration_ms) AS avg_duration_ms
FROM TUMBLE(pinot_page_views, event_time, INTERVAL '1 MINUTE')
GROUP BY window_start, window_end, page, region;
Query the result:
SELECT * FROM pinot_page_views_per_minute ORDER BY window_start, page;
Output:
window_start | window_end | page | region | view_count | avg_duration_ms
---------------------------+---------------------------+-----------+---------+------------+-----------------
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | /checkout | eu-west | 1 | 2100
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | /home | us-east | 1 | 1200
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | /product | us-east | 1 | 900
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | /product | us-west | 1 | 3400
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | /home | ap-east | 1 | 1500
This result is maintained incrementally. When a new page view event arrives, RisingWave updates only the affected window row, not the entire table. Pinot, by contrast, would recompute this from scratch on each query.
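You can watch the incremental maintenance happen. Insert one more event into the 10:01 window and re-query; only the affected row changes:

```sql
INSERT INTO pinot_page_views VALUES
    ('u5', '/home', 'ap-east', 800, '2026-04-01 10:01:30+00');
FLUSH; -- force a checkpoint so the view reflects the insert

SELECT * FROM pinot_page_views_per_minute
WHERE window_start = '2026-04-01 10:01:00+00';
-- The /home, ap-east row now shows view_count = 2 and
-- avg_duration_ms = 1150; every other row is untouched.
```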
Example 2: Hopping Window for Active User Tracking
Hopping windows overlap, so each event contributes to multiple window buckets. This is useful for rolling active-user counts.
CREATE MATERIALIZED VIEW pinot_active_users_5min AS
SELECT
window_start,
window_end,
COUNT(DISTINCT user_id) AS active_users,
SUM(duration_ms) AS total_engagement_ms
FROM HOP(pinot_page_views, event_time, INTERVAL '1 MINUTE', INTERVAL '5 MINUTES')
GROUP BY window_start, window_end;
Query:
SELECT * FROM pinot_active_users_5min ORDER BY window_start LIMIT 5;
Output:
window_start | window_end | active_users | total_engagement_ms
---------------------------+---------------------------+--------------+---------------------
2026-04-01 09:56:00+00:00 | 2026-04-01 10:01:00+00:00 | 3 | 7600
2026-04-01 09:57:00+00:00 | 2026-04-01 10:02:00+00:00 | 4 | 9100
2026-04-01 09:58:00+00:00 | 2026-04-01 10:03:00+00:00 | 4 | 9100
2026-04-01 09:59:00+00:00 | 2026-04-01 10:04:00+00:00 | 4 | 9100
2026-04-01 10:00:00+00:00 | 2026-04-01 10:05:00+00:00 | 4 | 9100
Pinot can answer "how many distinct users were active in the last 5 minutes?" if you query it manually. But RisingWave maintains that answer continuously, so dashboards, alert systems, or downstream sinks always have a fresh result without polling.
Example 3: Stream-Table Join for Real-Time Enrichment
Joining a high-velocity event stream with a slowly-changing dimension table is a core RisingWave pattern. This join produces a continuously updated view of each user's combined engagement and spending, without any external lookup service.
CREATE MATERIALIZED VIEW pinot_user_session_summary AS
SELECT
pv.user_id,
pv.region,
COUNT(pv.page) AS pages_visited,
SUM(o.amount) AS total_spend
FROM pinot_page_views pv
LEFT JOIN pinot_orders o ON pv.user_id = o.user_id
GROUP BY pv.user_id, pv.region;
Query:
SELECT * FROM pinot_user_session_summary ORDER BY user_id;
Output:
user_id | region | pages_visited | total_spend
---------+---------+---------------+-------------
u1 | us-east | 2 | 99.98
u2 | us-west | 1 | 199
u3 | eu-west | 1 | 29.95
u4 | ap-east | 1 | 49.99
Apache Pinot does not natively support streaming joins between two Kafka topics. To get this result in a Pinot-only setup, you would need to denormalize the data before ingestion (typically via Kafka Streams or Flink), then ingest the enriched events into Pinot. With RisingWave, the join is expressed directly in SQL.
Example 4: CTE-Based Cross-Stream Analysis
RisingWave supports Common Table Expressions (CTEs), an area where Pinot's SQL support has historically been limited (its legacy PQL dialect lacks them entirely). CTEs let you structure complex multi-step logic clearly in a single query.
WITH regional_views AS (
SELECT
region,
COUNT(*) AS total_views,
AVG(duration_ms) AS avg_duration
FROM pinot_page_views
GROUP BY region
),
regional_orders AS (
SELECT
region,
SUM(amount) AS revenue
FROM pinot_orders
WHERE status = 'paid'
GROUP BY region
)
SELECT
v.region,
v.total_views,
ROUND(v.avg_duration::NUMERIC, 0) AS avg_duration_ms,
COALESCE(o.revenue, 0) AS revenue
FROM regional_views v
LEFT JOIN regional_orders o ON v.region = o.region
ORDER BY revenue DESC;
Output:
region | total_views | avg_duration_ms | revenue
---------+-------------+-----------------+---------
us-west | 1 | 3400 | 199
ap-east | 1 | 1500 | 49.99
us-east | 2 | 1050 | 49.99
eu-west | 1 | 2100 | 0
This kind of ad-hoc cross-stream analysis runs as a one-off query in RisingWave. For the continuous version, wrap it in CREATE MATERIALIZED VIEW.
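For instance, the revenue half of the query above could become a standing view. This is the shape assumed for the pinot_revenue_by_region view used in the sink example below:

```sql
CREATE MATERIALIZED VIEW pinot_revenue_by_region AS
SELECT region, SUM(amount) AS revenue
FROM pinot_orders
WHERE status = 'paid'
GROUP BY region;
```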
Sinking RisingWave Results into Pinot
Here is a key integration pattern: use RisingWave to do continuous pre-aggregation and stream-table joining, then sink the derived results to Pinot for ad-hoc historical analytics.
In production, you would define a sink from RisingWave to Kafka and configure Pinot to ingest from that Kafka topic. The sink below reads from pinot_revenue_by_region, a materialized view assumed to hold per-region paid revenue (the continuous version of the revenue CTE in Example 4):
-- In RisingWave: sink the continuously updated revenue view to Kafka
-- (Pinot's real-time table then ingests from this topic)
CREATE SINK pinot_revenue_sink
FROM pinot_revenue_by_region
WITH (
connector = 'kafka',
properties.bootstrap.server = 'localhost:9092',
topic = 'rw_revenue_by_region'
) FORMAT PLAIN ENCODE JSON;
Pinot ingests from rw_revenue_by_region and makes the pre-aggregated data available for ad-hoc historical queries. This offloads the aggregation work from Pinot (where it would happen on every query) to RisingWave (where it happens once, incrementally).
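On the Pinot side, the real-time table is pointed at that topic through its stream config. A minimal sketch of the table spec follows; the field values are placeholders, and a real deployment needs a matching Pinot schema and additional settings (see the Pinot docs):

```json
{
  "tableName": "rw_revenue_by_region",
  "tableType": "REALTIME",
  "segmentsConfig": { "timeColumnName": "updated_at", "replication": "1" },
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "rw_revenue_by_region",
      "stream.kafka.broker.list": "localhost:9092",
      "stream.kafka.decoder.class.name":
        "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
    }
  }
}
```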
Feature Comparison Table
| Feature | RisingWave | Apache Pinot |
| --- | --- | --- |
| Primary purpose | Streaming database with incremental MVs | Real-time OLAP for ad-hoc analytics |
| Query model | Standing queries (materialized views updated continuously) | On-demand scan-and-aggregate |
| Latency from event to result | Sub-100ms (incremental update on arrival) | Sub-second (per-query scan of columnar segments) |
| SQL dialect | PostgreSQL-compatible | Pinot SQL (ANSI-like, some limitations) |
| Stream-stream joins | Native (two Kafka topics, two CDC streams, etc.) | Not supported natively |
| Stream-table joins | Native (lookup join, interval join) | Not supported natively |
| Window functions | TUMBLE and HOP time-window table functions, plus standard OVER() windows | Time-boundary filters, not streaming windowing |
| Cascading derived views | Yes (materialized view on materialized view) | No |
| Ad-hoc scan of large history | Limited (optimized for incremental, not scan) | Core strength (star-tree, inverted indexes) |
| Historical data range | Depends on state TTL configuration | Unlimited (segment-based retention) |
| Source connectors | Kafka, Pulsar, Kinesis, CDC (PG/MySQL/MongoDB), S3, 50+ more | Kafka, Pulsar, Kinesis, HDFS/S3 batch |
| Sink connectors | Kafka, Iceberg, ClickHouse, Snowflake, Elasticsearch, 30+ more | Not applicable (query serving only) |
| Wire protocol | PostgreSQL | HTTP REST + gRPC |
| State management | Hummock (LSM-tree on object storage) | Columnar segments on local disk or S3 |
| Exactly-once semantics | Yes (barrier-based checkpointing) | At-least-once ingestion |
| License | Apache 2.0 | Apache 2.0 |
| Managed cloud | RisingWave Cloud | StarTree Cloud (commercial offering from Pinot's creators) |
When to Use RisingWave
RisingWave is the right choice when your workload requires:
- Continuous computation: You need results to update automatically as events arrive, without external polling or scheduling. Fraud detection, real-time alerting, live dashboards, and operational metrics all fit here.
- Multi-stream joins: Your pipeline needs to join two Kafka topics, or join a Kafka topic with a CDC stream from a transactional database. RisingWave handles this natively in SQL with streaming join semantics.
- Cascading materialized views: You want to build multi-stage pipelines where one derived result feeds the next, all defined in SQL. This is a first-class RisingWave pattern.
- PostgreSQL ecosystem compatibility: Your team uses psql, JDBC, pgAdmin, Grafana, or any other PostgreSQL-compatible tool. RisingWave connects to all of them without an adapter layer.
- Stream processing without JVM overhead: RisingWave is written in Rust, which means it runs with a fraction of the memory and CPU overhead of JVM-based tools like Apache Flink or Kafka Streams.
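The multi-stream join case above is short to define in RisingWave. The source names and columns below are invented for this sketch:

```sql
-- Attribute each purchase to a click by the same user within the
-- preceding hour (an interval join between two streaming sources).
CREATE MATERIALIZED VIEW attributed_purchases AS
SELECT c.user_id, c.campaign, p.amount, p.purchased_at
FROM clicks c
JOIN purchases p
  ON c.user_id = p.user_id
 AND p.purchased_at BETWEEN c.clicked_at
                        AND c.clicked_at + INTERVAL '1 HOUR';
```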
When to Use Apache Pinot
Apache Pinot is the right choice when your workload requires:
- Ad-hoc analytics on large historical datasets: Analysts need to slice and dice months or years of data with arbitrary filter combinations that were not known at ingestion time. Pinot's columnar layout, inverted indexes, and star-tree indexes make this fast even at petabyte scale.
- User-facing analytics products: You are building an embedded analytics feature inside your product, where end users run their own queries. Pinot's scatter-gather architecture is purpose-built for this concurrency pattern.
- High-cardinality dimension filtering: Queries like "show me all users in region X who viewed page Y between these two timestamps" benefit from Pinot's inverted and range indexes.
- Replay of historical data for new dashboards: When a new analytics view is needed, Pinot can ingest historical batch segments and backfill results without reprocessing a streaming pipeline from scratch.
Running RisingWave and Pinot Together
The most powerful setup is both systems working in tandem:
- RisingWave handles the streaming layer: It ingests from Kafka, applies joins, windowing, and incremental aggregations, and keeps derived results continuously up to date.
- Pinot handles the analytics layer: It ingests from either the raw Kafka topics or from RisingWave's output sinks. Analysts query Pinot for ad-hoc historical analysis; operational systems query RisingWave for fresh derived results.
Kafka topics
|
+---> RisingWave (continuous joins, incremental aggregations, alerting)
| |
| +--> Kafka sink topics (pre-aggregated results)
| |
+-------------------> Apache Pinot (historical OLAP, ad-hoc analyst queries)
This pattern separates concerns cleanly. RisingWave does the hard streaming work: joins, state management, and low-latency derived views. Pinot does what it does best: fast ad-hoc scans over indexed columnar data. Neither system is stretched beyond its design intent.
For teams just starting with stream processing, the RisingWave quickstart is a good entry point. It covers ingesting from Kafka and creating your first materialized view in under 10 minutes.
FAQ
What is the difference between RisingWave and Apache Pinot?
RisingWave is a streaming database that continuously maintains materialized views: SQL queries whose results are incrementally updated as new data arrives. Apache Pinot is a real-time OLAP store that serves fast ad-hoc analytical queries over large historical datasets using columnar storage and specialized indexes. RisingWave is optimized for continuous computation; Pinot is optimized for on-demand scan-and-aggregate at scale.
Can RisingWave replace Apache Pinot?
No, and the reverse is also true. The two systems address different problems. RisingWave cannot replace Pinot for ad-hoc analytics on petabytes of historical data with arbitrary query shapes - its query engine is optimized for serving precomputed results, not scanning large archives. Pinot cannot replace RisingWave for continuous stream processing, multi-stream joins, or event-driven alerting, because it has no concept of standing queries that update results automatically. The better question is which one you need first, and whether your workload eventually needs both.
Does Apache Pinot support stream-stream joins?
No. Apache Pinot does not natively support joins between two streaming sources. It supports lookup joins between a real-time ingestion stream and a pre-loaded lookup table (via the Lookup UDF), but joining two Kafka topics in real time requires a separate system like RisingWave, Apache Flink, or Kafka Streams to produce the joined stream before ingestion. RisingWave handles this natively in SQL.
When should I use RisingWave instead of Apache Pinot for real-time analytics?
Use RisingWave when you need results that update automatically without polling: fraud alerts, live leaderboards, operational metrics, and anomaly detection. Use Pinot when analysts need to explore historical data with ad-hoc queries that were not known at pipeline design time. If your use case involves both (continuous operational results and historical ad-hoc analysis), run both systems and use RisingWave to feed Pinot with pre-aggregated streams.
Conclusion
RisingWave and Apache Pinot are both real-time systems, but they occupy different positions in the data architecture:
- RisingWave is a streaming database. It maintains SQL results continuously and incrementally. Use it when you need results that are always fresh without per-query recomputation.
- Apache Pinot is a real-time OLAP engine. It serves ad-hoc analytical queries fast over large historical datasets. Use it when analysts need to explore data with arbitrary query shapes.
- Neither replaces the other. A well-designed architecture often uses both: RisingWave for continuous processing and derived results, Pinot for historical ad-hoc analytics.
- RisingWave can feed Pinot. Use RisingWave to do the hard streaming work (joins, aggregations, enrichment) and sink pre-aggregated results to Kafka for Pinot to ingest. This offloads per-query computation from Pinot and ensures Pinot serves richer, pre-joined data.
- SQL is the common interface. Both systems speak SQL, which makes it straightforward to port logic between them as your architecture evolves.
For more on how RisingWave handles continuous aggregation, see the materialized views documentation. For a broader look at when to use a streaming database versus a traditional data warehouse, read What is a Streaming Database.
Ready to try this yourself? Try RisingWave Cloud free, no credit card required. Sign up →
Join our Slack community to ask questions and connect with other stream processing developers.

