You can build real-time Grafana dashboards from Kafka streams without Flink. Connect Kafka to RisingWave (a PostgreSQL-compatible streaming database), define your metrics as materialized views in SQL, and point Grafana at RisingWave using the standard PostgreSQL data source. Every dashboard panel stays current in seconds with no JVM cluster to operate.
Why the Flink Path to Grafana Is Painful
The canonical real-time dashboard stack before RisingWave looked like this:
Kafka --> Flink (Java jobs) --> PostgreSQL / ClickHouse --> Grafana
On paper, the architecture makes sense. In practice, the Flink layer introduces substantial operational complexity that teams often underestimate:
- A Flink production deployment means a JobManager, multiple TaskManagers, ZooKeeper for HA, and local SSDs for RocksDB state backends. You are operating a distributed JVM cluster.
- Every new dashboard metric requires a code change, a Maven build, a JAR deployment, and a restart with a savepoint.
- Flink checkpoints must be tuned to avoid out-of-memory failures under backpressure. When they fail at 2 AM, someone gets paged.
- Flink does not serve queries. You still need a separate PostgreSQL or ClickHouse instance to hold processed results so Grafana can read them.
The result is a four-component system where the complexity is proportional to the number of streaming jobs you run.
RisingWave collapses Flink plus the serving database into a single process that speaks PostgreSQL. You write SQL to define how Kafka events become dashboard metrics. Grafana connects to RisingWave exactly the same way it connects to PostgreSQL. No Java code. No cluster manager. No checkpoint tuning.
Architecture: Kafka to Grafana in Three Components
Kafka Topics
|
v
RisingWave (streaming database)
- Reads Kafka natively
- Maintains materialized views incrementally
- Serves queries via PostgreSQL wire protocol
|
v
Grafana
- PostgreSQL data source
- Queries materialized views
- Sub-second query latency
The key difference from the Flink stack: RisingWave is both the stream processor and the query serving layer. Grafana queries the same system that processes the stream. You do not need a separate sink database.
How RisingWave Keeps Dashboards Fresh
When Kafka produces a new message, RisingWave applies it incrementally to every downstream materialized view. Only the affected rows update. When Grafana runs its next refresh query, it reads the already-computed result from the materialized view. The query returns in milliseconds regardless of how much raw event history exists in Kafka.
This is fundamentally different from batch recomputation. A Grafana panel backed by a materialized view does not trigger a full table scan every refresh interval. The compute happened when the Kafka message arrived.
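A minimal sketch of the model, using a hypothetical page_views table (the real schema for this walkthrough is defined in the next section):
-- Hypothetical table and view, for illustration only.
CREATE TABLE page_views (user_id VARCHAR, view_time TIMESTAMPTZ);

CREATE MATERIALIZED VIEW view_count AS
SELECT COUNT(*) AS total_views FROM page_views;

INSERT INTO page_views VALUES ('u1', NOW());

-- The count was maintained incrementally when the row arrived;
-- this read does no aggregation work.
SELECT total_views FROM view_count;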
Setting Up the Data Model
For this walkthrough, we use two tables that simulate real Kafka ingestion. In production you replace the CREATE TABLE statements with CREATE SOURCE pointing at your Kafka brokers (shown at the end of this section).
Events Table
Application events: purchases, page views, search queries, with HTTP status codes and revenue fields.
CREATE TABLE grafana_events (
event_id VARCHAR,
user_id VARCHAR,
event_type VARCHAR,
service VARCHAR,
status_code INT,
region VARCHAR,
revenue_cents BIGINT,
event_time TIMESTAMPTZ
);
Metrics Table
Service-level metrics emitted by your infrastructure: latency, CPU, memory.
CREATE TABLE grafana_metrics (
metric_id VARCHAR,
service VARCHAR,
metric_name VARCHAR,
value DOUBLE PRECISION,
host VARCHAR,
metric_time TIMESTAMPTZ
);
Insert some sample rows to work with:
INSERT INTO grafana_events VALUES
('e001', 'u101', 'purchase', 'checkout', 200, 'us-east', 4999, '2026-04-01 10:00:05+00'),
('e002', 'u102', 'purchase', 'checkout', 200, 'us-west', 12999, '2026-04-01 10:00:10+00'),
('e003', 'u103', 'pageview', 'frontend', 200, 'us-east', 0, '2026-04-01 10:00:15+00'),
('e004', 'u104', 'purchase', 'checkout', 500, 'eu-west', 0, '2026-04-01 10:00:20+00'),
('e005', 'u105', 'purchase', 'checkout', 200, 'us-east', 8999, '2026-04-01 10:00:25+00'),
('e006', 'u106', 'pageview', 'frontend', 200, 'eu-west', 0, '2026-04-01 10:00:30+00'),
('e007', 'u101', 'purchase', 'checkout', 500, 'us-east', 0, '2026-04-01 10:01:05+00'),
('e008', 'u108', 'purchase', 'checkout', 200, 'us-west', 3499, '2026-04-01 10:01:10+00'),
('e009', 'u109', 'search', 'frontend', 200, 'ap-east', 0, '2026-04-01 10:01:15+00'),
('e010', 'u110', 'purchase', 'checkout', 200, 'us-east', 15999, '2026-04-01 10:01:20+00');
INSERT INTO grafana_metrics VALUES
('m001', 'checkout', 'latency_ms', 45.2, 'host-1', '2026-04-01 10:00:05+00'),
('m002', 'checkout', 'latency_ms', 52.1, 'host-2', '2026-04-01 10:00:10+00'),
('m003', 'frontend', 'latency_ms', 12.3, 'host-3', '2026-04-01 10:00:15+00'),
('m004', 'checkout', 'latency_ms', 890.5, 'host-1', '2026-04-01 10:00:20+00'),
('m005', 'frontend', 'latency_ms', 14.7, 'host-3', '2026-04-01 10:00:25+00'),
('m006', 'checkout', 'latency_ms', 48.9, 'host-2', '2026-04-01 10:01:05+00'),
('m007', 'frontend', 'latency_ms', 11.2, 'host-3', '2026-04-01 10:01:10+00'),
('m008', 'checkout', 'latency_ms', 61.3, 'host-1', '2026-04-01 10:01:15+00');
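A quick count confirms the sample rows landed:
SELECT COUNT(*) FROM grafana_events;   -- expect 10
SELECT COUNT(*) FROM grafana_metrics;  -- expect 8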
Connecting Kafka in Production
Replace the CREATE TABLE above with a CREATE SOURCE that points at your Kafka cluster:
CREATE SOURCE grafana_events_kafka (
event_id VARCHAR,
user_id VARCHAR,
event_type VARCHAR,
service VARCHAR,
status_code INT,
region VARCHAR,
revenue_cents BIGINT,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'application-events',
properties.bootstrap.server = 'broker1:9092,broker2:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
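The metrics table follows the same pattern. A sketch, assuming your infrastructure emits to a topic named service-metrics (a placeholder name):
CREATE SOURCE grafana_metrics_kafka (
    metric_id VARCHAR,
    service VARCHAR,
    metric_name VARCHAR,
    value DOUBLE PRECISION,
    host VARCHAR,
    metric_time TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'service-metrics',  -- placeholder topic name
    properties.bootstrap.server = 'broker1:9092,broker2:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;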
RisingWave's Kafka connector handles offset tracking internally. No consumer group configuration, no external offset storage, no hand-rolled offset commits. See the RisingWave Kafka source documentation for the full connector reference.
Building the Dashboard Materialized Views
Each section below creates one materialized view that backs a Grafana panel. All views use the TUMBLE window function, which slices a continuous event stream into fixed-size time buckets.
Panel 1: Revenue Per Minute by Region
A time-series panel showing how much revenue each region generates per minute, with successful orders counted separately from failed ones.
CREATE MATERIALIZED VIEW grafana_revenue_per_minute AS
SELECT
window_start,
window_end,
region,
COUNT(*) FILTER (WHERE status_code = 200 AND event_type = 'purchase') AS successful_orders,
ROUND(SUM(revenue_cents) FILTER (WHERE status_code = 200)::NUMERIC / 100, 2) AS revenue_usd
FROM TUMBLE(grafana_events, event_time, INTERVAL '1 MINUTE')
GROUP BY window_start, window_end, region;
SELECT * FROM grafana_revenue_per_minute ORDER BY window_start, region;
window_start | window_end | region | successful_orders | revenue_usd
---------------------------+---------------------------+---------+-------------------+-------------
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | eu-west | 0 | 0.00
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | us-east | 2 | 139.98
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | us-west | 1 | 129.99
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | ap-east | 0 | 0.00
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | us-east | 1 | 159.99
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | us-west | 1 | 34.99
(6 rows)
In Grafana, configure this panel as a time-series chart with window_start as the time column, revenue_usd as the value, and region as the series label. Set the panel's time range to "Last 1 hour" and the refresh to 10s.
Panel 2: Error Rate Per Service
An error rate graph lets your on-call team see immediately which service is producing 5xx responses and by how much.
CREATE MATERIALIZED VIEW grafana_error_rate_per_service AS
SELECT
window_start,
window_end,
service,
COUNT(*) AS total_requests,
COUNT(*) FILTER (WHERE status_code >= 500) AS error_count,
ROUND(
100.0 * COUNT(*) FILTER (WHERE status_code >= 500) / COUNT(*),
2
) AS error_rate_pct
FROM TUMBLE(grafana_events, event_time, INTERVAL '1 MINUTE')
GROUP BY window_start, window_end, service;
SELECT * FROM grafana_error_rate_per_service ORDER BY window_start, service;
window_start | window_end | service | total_requests | error_count | error_rate_pct
---------------------------+---------------------------+----------+----------------+-------------+----------------
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | checkout | 4 | 1 | 25.00
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | frontend | 2 | 0 | 0.00
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | checkout | 3 | 1 | 33.33
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | frontend | 1 | 0 | 0.00
(4 rows)
Add a Grafana alert threshold at 5% error rate. Grafana reads directly from the materialized view at query time, so the alert evaluates pre-computed results rather than scanning raw events.
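The panel's query is ordinary SQL over the view, using Grafana's time-range macros (covered in the connection section below):
SELECT
    window_start AS time,
    service,
    error_rate_pct
FROM grafana_error_rate_per_service
WHERE window_start >= $__timeFrom()::timestamptz
  AND window_start < $__timeTo()::timestamptz
ORDER BY window_start;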
Panel 3: Active Users in 5-Minute Windows
Unique active users per region gives a sense of traffic distribution. A 5-minute tumbling window smooths short spikes while staying responsive to real shifts.
CREATE MATERIALIZED VIEW grafana_active_users_5min AS
SELECT
window_start,
window_end,
region,
COUNT(DISTINCT user_id) AS active_users
FROM TUMBLE(grafana_events, event_time, INTERVAL '5 MINUTES')
GROUP BY window_start, window_end, region;
SELECT * FROM grafana_active_users_5min ORDER BY window_start, region;
window_start | window_end | region | active_users
---------------------------+---------------------------+---------+--------------
2026-04-01 10:00:00+00:00 | 2026-04-01 10:05:00+00:00 | ap-east | 1
2026-04-01 10:00:00+00:00 | 2026-04-01 10:05:00+00:00 | eu-west | 2
2026-04-01 10:00:00+00:00 | 2026-04-01 10:05:00+00:00 | us-east | 4
2026-04-01 10:00:00+00:00 | 2026-04-01 10:05:00+00:00 | us-west | 2
(4 rows)
Panel 4: P99 Latency Per Service
Latency percentiles expose tail behavior that averages hide. RisingWave supports PERCENTILE_CONT inside windowed aggregations.
CREATE MATERIALIZED VIEW grafana_latency_p99 AS
SELECT
window_start,
window_end,
service,
ROUND(AVG(value)::NUMERIC, 1) AS avg_latency_ms,
ROUND(PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY value)::NUMERIC, 1) AS p99_latency_ms,
COUNT(*) AS sample_count
FROM TUMBLE(grafana_metrics, metric_time, INTERVAL '1 MINUTE')
WHERE metric_name = 'latency_ms'
GROUP BY window_start, window_end, service;
SELECT * FROM grafana_latency_p99 ORDER BY window_start, service;
window_start | window_end | service | avg_latency_ms | p99_latency_ms | sample_count
---------------------------+---------------------------+----------+----------------+----------------+--------------
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | checkout | 329.3 | 873.7 | 3
2026-04-01 10:00:00+00:00 | 2026-04-01 10:01:00+00:00 | frontend | 13.5 | 14.7 | 2
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | checkout | 55.1 | 61.2 | 2
2026-04-01 10:01:00+00:00 | 2026-04-01 10:02:00+00:00 | frontend | 11.2 | 11.2 | 1
(4 rows)
Notice the checkout service's P99 in the first window: 873.7 ms versus an average of 329.3 ms. That gap signals a latency spike that an average-only panel would have masked.
Connecting Grafana to RisingWave
RisingWave speaks the PostgreSQL wire protocol. Grafana's built-in PostgreSQL data source works without any plugin.
In Grafana:
- Go to Connections > Data sources > Add data source.
- Select PostgreSQL.
- Fill in the connection fields:
| Field | Value |
| --- | --- |
| Host | your-risingwave-host:4566 |
| Database | dev |
| User | root |
| Password | (leave empty for local; set for cloud deployments) |
| TLS/SSL mode | disable (local) or require (cloud) |
| PostgreSQL version | 13 |
- Click Save & Test. Grafana should report a successful connection.
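Before wiring panels, you can sanity-check the connection from Grafana's Explore view or any PostgreSQL client:
-- Confirm the views are visible over the wire protocol.
SHOW MATERIALIZED VIEWS;
SELECT COUNT(*) FROM grafana_revenue_per_minute;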
Once connected, each Grafana panel's query tab looks like standard SQL. For the revenue panel:
SELECT
window_start AS time,
region,
revenue_usd
FROM grafana_revenue_per_minute
WHERE window_start >= $__timeFrom()::timestamptz
AND window_start < $__timeTo()::timestamptz
ORDER BY window_start;
Grafana's $__timeFrom() and $__timeTo() macros inject the dashboard's selected time range as bound parameters. Because window_start is already indexed inside the materialized view's storage, this query typically returns in under 10 milliseconds even across weeks of data.
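The other panels follow the same shape. For the P99 latency panel, for example:
SELECT
    window_start AS time,
    service,
    avg_latency_ms,
    p99_latency_ms
FROM grafana_latency_p99
WHERE window_start >= $__timeFrom()::timestamptz
  AND window_start < $__timeTo()::timestamptz
ORDER BY window_start;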
Operational Comparison: Flink vs. RisingWave for Grafana Dashboards
The table below compares the operational profile of each approach for a team running 10-15 dashboard panels backed by Kafka streams.
| Dimension | Apache Flink | RisingWave |
| --- | --- | --- |
| Infrastructure to operate | JobManager, TaskManagers (JVM), ZooKeeper, separate sink DB | Single RisingWave cluster (or cloud) |
| Language to write metrics | Java or Scala (+ Flink SQL for some cases) | Standard SQL |
| Adding a new dashboard metric | Code change, build, deploy JAR, restart with savepoint | CREATE MATERIALIZED VIEW |
| Grafana connection | Via sink DB (PostgreSQL, ClickHouse, etc.) | Direct PostgreSQL connection |
| State management | RocksDB, tuned per job | Automatic (stored on S3) |
| Failure recovery | Restore from checkpoint, replay Kafka | Automatic from internal checkpoint |
| Benchmark evidence | Baseline | 22 of 27 Nexmark queries faster than Flink |
The operational simplicity advantage compounds as you add more dashboard panels. In Flink, each new metric is a new job with its own lifecycle, checkpoint, and potential failure mode. In RisingWave, it is a SQL statement.
For a deeper look at the cost dimension, see Flink vs RisingWave: Total Cost of Ownership.
When to Keep Flink
RisingWave is the right choice when your primary need is real-time aggregations, joins, and windowed metrics served to dashboards or applications. Flink may remain the better option when:
- You need custom Java logic that cannot be expressed in SQL or Python UDFs.
- You have complex event processing (CEP) patterns that require Flink's PatternStream API.
- You are building large-scale ML training pipelines where Flink's DataStream API integrates with your existing Java ML stack.
- Your team has deep Flink expertise and the cluster is already operating smoothly.
For most dashboard use cases (revenue tracking, error rate monitoring, latency percentiles, active user counts), SQL is sufficient and RisingWave's operational simplicity pays for itself quickly.
See When to Use Flink vs a Streaming Database for a broader decision framework.
Scaling and Production Considerations
Kafka consumer parallelism: RisingWave automatically parallelizes Kafka consumption across its compute nodes. You do not configure consumer group partitions manually. When you scale RisingWave horizontally, Kafka partition assignment redistributes automatically.
Backpressure handling: RisingWave applies backpressure natively. If a materialized view computation falls behind, ingestion slows at the Kafka consumer, preventing out-of-memory conditions. No external tuning is required.
Late-arriving events: Tumbling windows in RisingWave close when the system watermark advances past the window boundary. Events that arrive late are still attributed to the correct window as long as they fall within the configured watermark delay. Configure the delay per table or source with a WATERMARK clause; a five-second delay is shown here as an example value (watermarks on tables require APPEND ONLY):
CREATE TABLE grafana_events (
    event_id VARCHAR,
    user_id VARCHAR,
    event_type VARCHAR,
    service VARCHAR,
    status_code INT,
    region VARCHAR,
    revenue_cents BIGINT,
    event_time TIMESTAMPTZ,
    WATERMARK FOR event_time AS event_time - INTERVAL '5 SECONDS'
) APPEND ONLY;
For a full treatment of watermarks and late data, see the RisingWave watermark documentation.
Dashboard refresh rate: Grafana refresh intervals of 5-10 seconds are well within RisingWave's capabilities for most workloads. Materialized view queries return pre-computed results, so Grafana does not trigger stream recomputation on each refresh. You can safely set aggressive refresh intervals without straining RisingWave.
Cleanup
If you were following along locally, remove the objects created in this tutorial:
DROP MATERIALIZED VIEW IF EXISTS grafana_latency_p99;
DROP MATERIALIZED VIEW IF EXISTS grafana_active_users_5min;
DROP MATERIALIZED VIEW IF EXISTS grafana_error_rate_per_service;
DROP MATERIALIZED VIEW IF EXISTS grafana_revenue_per_minute;
DROP TABLE IF EXISTS grafana_events;
DROP TABLE IF EXISTS grafana_metrics;
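If you also created the Kafka sources from the production section, drop those as well:
DROP SOURCE IF EXISTS grafana_events_kafka;
DROP SOURCE IF EXISTS grafana_metrics_kafka;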
FAQ
Can Grafana connect to RisingWave without any special plugin?
Yes. RisingWave implements the PostgreSQL wire protocol (version 3.0). Grafana's built-in PostgreSQL data source connects to RisingWave on port 4566 without any additional plugin or driver. Set the PostgreSQL version to 13 in the data source configuration.
How fresh is the data in my Grafana panels?
Data in Grafana panels reflects events that have been processed by RisingWave's materialized views. Under normal conditions, materialized views update within 1-2 seconds of a Kafka message being produced. The end-to-end latency from Kafka producer to Grafana panel is typically 2-5 seconds, including Kafka replication and RisingWave's checkpoint interval.
Do I still need Flink if I already use it for other jobs?
Not for the dashboard use case. You can run RisingWave alongside an existing Flink deployment and migrate dashboard workloads incrementally. Teams often start by moving high-churn dashboard metrics (where Flink's JAR deploy cycle is most painful) and leave complex event processing jobs in Flink until they are confident in RisingWave's behavior for their workload. See Migrating from Apache Flink to RisingWave for a step-by-step migration playbook.
What happens to Grafana if RisingWave restarts?
RisingWave persists materialized view state to S3. After a restart, it replays recent Kafka offsets to bring views up to date before accepting queries. This typically takes 5-30 seconds depending on checkpoint interval and how far behind the views are. During this window, Grafana panels may show stale data or return errors. For production deployments, configure RisingWave in cluster mode with multiple replicas to eliminate single-node restarts.