Managing 100,000 industrial IoT sensors with streaming SQL means ingesting millions of telemetry events per minute, maintaining live aggregates per sensor and per fleet, and delivering sub-second query latency—all from a single streaming database. RisingWave, a PostgreSQL-compatible streaming database, is built for exactly this scale.
Why Scale Is a First-Class IoT Problem
A modern industrial operation—an oil refinery, a utility grid, a smart city network—can easily deploy tens of thousands of sensors. Each sensor publishes readings every second or every few seconds. At 100,000 sensors reporting once per second, that's 100,000 events per second, or 360 million events per hour.
Traditional analytics architectures struggle at this scale. A single-node time-series database saturates at a fraction of this throughput. A batch pipeline introduces minutes of latency. A streaming platform like Flink requires Java expertise and complex job management.
Industrial operators need a system that can grow with their sensor fleet, maintain sub-second query latency regardless of scale, and be operated by teams that know SQL—not distributed systems engineers.
How Streaming SQL Solves This
RisingWave is a distributed streaming database. It partitions both ingestion and computation across multiple nodes. Sources shard across Kafka partitions, and materialized views compute in parallel across the cluster. Adding nodes scales both throughput and state capacity linearly.
The SQL interface remains constant regardless of scale. A query that works on 1,000 sensors works identically on 1,000,000 sensors—the system handles parallelism internally.
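As a preview, the kind of ad-hoc dashboard query this enables is ordinary PostgreSQL-style SQL against a materialized view. The sketch below assumes the zone_metrics_1min view built in Step 3; the facility id is illustrative:
-- Latest per-zone rollups for one facility; the SQL is identical at any fleet size
SELECT zone, metric, fleet_avg, p95_value, active_sensors
FROM zone_metrics_1min
WHERE facility_id = 'facility-a'
ORDER BY window_start DESC
LIMIT 20;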
Step-by-Step Tutorial
Step 1: Connect Your Data Source
Partition your sensor stream across multiple Kafka topics by facility or sensor group to maximize parallelism:
-- Source for facility A sensors
CREATE SOURCE sensors_facility_a (
sensor_id VARCHAR,
facility_id VARCHAR,
zone VARCHAR,
metric VARCHAR,
value DOUBLE PRECISION,
quality SMALLINT,
event_time TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'sensors-facility-a',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
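-- The unified view below also references sensors_facility_b and sensors_facility_c.
-- They follow the same shape as facility A; here is facility B as a sketch, assuming
-- a 'sensors-facility-b' topic (create facility C the same way):
CREATE SOURCE sensors_facility_b (
sensor_id VARCHAR,
facility_id VARCHAR,
zone VARCHAR,
metric VARCHAR,
value DOUBLE PRECISION,
quality SMALLINT,
event_time TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'sensors-facility-b',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;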
-- Unified view across all facility sources
CREATE MATERIALIZED VIEW all_sensor_events AS
SELECT * FROM sensors_facility_a
UNION ALL
SELECT * FROM sensors_facility_b
UNION ALL
SELECT * FROM sensors_facility_c;
For a single unified topic with high partition count, RisingWave parallelizes ingestion automatically:
CREATE SOURCE all_sensors (
sensor_id VARCHAR,
facility_id VARCHAR,
zone VARCHAR,
metric VARCHAR,
value DOUBLE PRECISION,
quality SMALLINT,
event_time TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'all-industrial-sensors',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
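-- Optional sanity check (a sketch; the ingest_check name is ours and is not used by
-- later steps): a throwaway materialized view counting events per facility confirms
-- that data is flowing from Kafka.
CREATE MATERIALIZED VIEW ingest_check AS
SELECT facility_id, COUNT(*) AS events_seen, MAX(event_time) AS latest_event
FROM all_sensors
GROUP BY facility_id;

-- Query it interactively; drop it with DROP MATERIALIZED VIEW when no longer needed.
SELECT * FROM ingest_check ORDER BY facility_id;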
Step 2: Build the Real-Time View
At 100,000 sensors, the current-state view must be incremental to avoid full recomputation on every event. RisingWave's materialized views handle this automatically:
CREATE MATERIALIZED VIEW sensor_current_state AS
SELECT DISTINCT ON (sensor_id, metric)
sensor_id,
facility_id,
zone,
metric,
value,
quality,
event_time AS last_seen
FROM all_sensors
WHERE quality > 0
ORDER BY sensor_id, metric, event_time DESC;
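-- Example lookup against the current-state view (the sensor id is illustrative):
SELECT metric, value, quality, last_seen
FROM sensor_current_state
WHERE sensor_id = 'S-000123';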
Track sensor liveness—identify sensors that haven't reported within the expected interval:
CREATE MATERIALIZED VIEW sensor_liveness AS
SELECT
sensor_id,
facility_id,
zone,
MAX(event_time) AS last_seen,
NOW() - MAX(event_time) AS silence_duration,
COUNT(*) FILTER (WHERE quality > 0) * 1.0
/ NULLIF(COUNT(*), 0) AS quality_rate
FROM all_sensors
GROUP BY sensor_id, facility_id, zone;
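-- Example: the 20 sensors with the worst recent data quality (the 0.95 cutoff is illustrative)
SELECT sensor_id, facility_id, zone, quality_rate, last_seen
FROM sensor_liveness
WHERE quality_rate < 0.95
ORDER BY quality_rate
LIMIT 20;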
Step 3: Window-Based Aggregations
Compute per-zone 1-minute rollups to support fleet-level dashboards without querying 100,000 individual series:
CREATE MATERIALIZED VIEW zone_metrics_1min AS
SELECT
facility_id,
zone,
metric,
window_start,
window_end,
COUNT(DISTINCT sensor_id) AS active_sensors,
AVG(value) AS fleet_avg,
MIN(value) AS fleet_min,
MAX(value) AS fleet_max,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) AS median_value,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) AS p95_value,
COUNT(*) FILTER (WHERE quality = 0) AS bad_quality_count
FROM TUMBLE(all_sensors, event_time, INTERVAL '1 MINUTE')
GROUP BY facility_id, zone, metric, window_start, window_end;
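-- Example dashboard query: the last 15 minutes of temperature rollups for one zone
-- (facility, zone, and metric values are illustrative)
SELECT window_start, fleet_avg, fleet_min, fleet_max, p95_value, bad_quality_count
FROM zone_metrics_1min
WHERE facility_id = 'facility-a'
  AND zone = 'compressor-deck'
  AND metric = 'temperature'
  AND window_start >= NOW() - INTERVAL '15 minutes'
ORDER BY window_start DESC;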
Compute per-facility health summaries:
CREATE MATERIALIZED VIEW facility_health_1min AS
SELECT
facility_id,
window_start,
window_end,
COUNT(DISTINCT sensor_id) AS total_sensors,
COUNT(DISTINCT sensor_id) FILTER (
WHERE quality > 0
) AS healthy_sensors,
ROUND(
100.0 * COUNT(DISTINCT sensor_id) FILTER (WHERE quality > 0)
/ NULLIF(COUNT(DISTINCT sensor_id), 0), 2
) AS health_pct
FROM TUMBLE(all_sensors, event_time, INTERVAL '1 MINUTE')
GROUP BY facility_id, window_start, window_end;
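-- Example: facilities whose recent health percentage dipped below 99 (threshold illustrative)
SELECT facility_id, window_start, total_sensors, healthy_sensors, health_pct
FROM facility_health_1min
WHERE health_pct < 99
ORDER BY window_start DESC, health_pct
LIMIT 50;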
Step 4: Alerts and Sinks
At scale, alert routing is as important as alert detection, so each alert row carries its facility, zone, and a severity label for downstream routing.
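Threshold-based alerts join each reading against a sensor_thresholds table, a regular RisingWave table. A minimal sketch of its shape (the column names are assumptions chosen to match the join below; adapt them to your schema):
CREATE TABLE sensor_thresholds (
sensor_id VARCHAR,
metric VARCHAR,
high_threshold DOUBLE PRECISION,
PRIMARY KEY (sensor_id, metric)
);
With thresholds in place, the alert view flags breaches and routes by facility and severity: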
CREATE MATERIALIZED VIEW fleet_alerts AS
SELECT
s.sensor_id,
s.facility_id,
s.zone,
s.metric,
s.value,
s.event_time,
'THRESHOLD_EXCEEDED' AS alert_type,
'HIGH' AS severity
FROM all_sensors s
JOIN sensor_thresholds t
ON s.sensor_id = t.sensor_id
AND s.metric = t.metric
WHERE s.quality > 0
AND s.value > t.high_threshold;
CREATE MATERIALIZED VIEW silent_sensor_alerts AS
SELECT
sensor_id,
facility_id,
zone,
last_seen,
silence_duration,
'SENSOR_SILENT' AS alert_type
FROM sensor_liveness
WHERE silence_duration > INTERVAL '5 MINUTES';
CREATE SINK fleet_alerts_sink AS
SELECT sensor_id, facility_id, zone, metric, alert_type, severity,
event_time AS ts
FROM fleet_alerts
UNION ALL
SELECT sensor_id, facility_id, zone, 'liveness' AS metric,
alert_type, 'WARNING' AS severity, last_seen AS ts
FROM silent_sensor_alerts
WITH (
connector = 'kafka',
topic = 'fleet-alerts',
properties.bootstrap.server = 'kafka:9092',
-- the liveness input is an aggregation, so the combined query is not append-only;
-- force append-only emission for a plain-encoded Kafka sink
force_append_only = 'true'
) FORMAT PLAIN ENCODE JSON;
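-- Quick check that the sink is registered; downstream consumers (paging, ticketing,
-- dashboards) then read the 'fleet-alerts' Kafka topic directly.
SHOW SINKS;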
Comparison Table
| Capability | Single-Node TSDB | Distributed TSDB | Streaming SQL (RisingWave) |
|---|---|---|---|
| Max sensor scale | ~10K | ~1M | ~1M+ |
| Query language | InfluxQL / PromQL | Vendor-specific | PostgreSQL SQL |
| Fleet aggregations | Manual grouping | Limited | Native GROUP BY |
| Alert latency | Seconds | Seconds | Sub-second |
| Cross-metric joins | No | No | Full SQL JOINs |
| Operational complexity | Low | High | Low-Medium |
FAQ
How does RisingWave scale horizontally for 100,000 sensors?
RisingWave distributes both source ingestion and materialized view computation across worker nodes. You can scale the cluster by adding nodes, and the system rebalances automatically. Kafka partitioning provides the parallelism at the ingestion layer.
How do I manage thresholds for 100,000 different sensors?
Store thresholds in a regular table created with CREATE TABLE (RisingWave's persistent, updatable storage) and join it in your alerts materialized view, as in Step 4. Update thresholds with standard INSERT/UPDATE/DELETE statements, and the view automatically reflects the new rules.
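For example (the sensor id and values are illustrative):
-- Seed a threshold; the fleet_alerts view picks it up incrementally
INSERT INTO sensor_thresholds (sensor_id, metric, high_threshold)
VALUES ('S-000123', 'temperature', 85.0);

-- Tighten it later with plain SQL; no view or pipeline changes required
UPDATE sensor_thresholds
SET high_threshold = 80.0
WHERE sensor_id = 'S-000123' AND metric = 'temperature';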
What is the recommended Kafka partition count for this scale?
A general rule of thumb is one Kafka partition per 5,000-10,000 events per second of expected throughput; at one reading per sensor per second, that works out to roughly 5,000-10,000 sensors per partition, so a 100,000-sensor fleet would start at around 10-20 partitions. Keep RisingWave's source parallelism aligned with the partition count, then profile your specific workload and scale accordingly.
Key Takeaways
- RisingWave scales horizontally to handle millions of sensor events per second, distributing ingestion and computation across nodes while maintaining a single SQL interface.
- Zone-level and facility-level rollup views aggregate 100,000 sensor series into manageable dashboard metrics with sub-second latency.
- Sensor liveness tracking detects silent devices automatically using a materialized view over MAX(event_time).
- Threshold-based alerts join against a configurable threshold table, enabling fleet-wide alert management without code changes.

