How to Stream Time-Series Sensor Data into Real-Time Analytics

Streaming time-series sensor data into real-time analytics means ingesting continuous device readings—temperature, pressure, voltage—and querying aggregates with sub-second latency. RisingWave, a PostgreSQL-compatible streaming database, makes this possible with pure SQL and no extra infrastructure.

Why Time-Series Sensor Streaming Matters

Industrial and consumer IoT deployments generate millions of sensor readings every second. Batch pipelines that load data into a data warehouse and run hourly queries cannot detect a failing motor, a temperature spike, or a pressure anomaly in time to act. By the time the batch job finishes, the damage is done.

Real-time streaming analytics closes that gap. Instead of waiting for data to land in cold storage, you run continuous queries directly on the event stream. Operators see live gauges. Alerts fire in milliseconds. Predictive models consume fresh feature vectors instead of yesterday's data.

The challenge has traditionally been complexity: you needed Kafka, Flink or Spark Streaming, a time-series database, and a separate query layer stitched together with custom code. RisingWave collapses that stack into a single PostgreSQL-compatible interface.

How Streaming SQL Solves This

RisingWave's core primitive is the incrementally maintained materialized view. Every time a new sensor reading arrives, the database updates only the affected rows in the view—no full recomputation. You query the view with standard SQL and always get a fresh answer, with sub-second latency even at millions of events per second.

Key primitives for time-series work:

  • CREATE SOURCE — connect to Kafka topics carrying sensor JSON or Avro payloads
  • TUMBLE / HOP windows — fixed and sliding time-bucket aggregations
  • CREATE MATERIALIZED VIEW — incrementally maintained query results
  • CREATE SINK — push alerts or aggregates downstream to Kafka, Postgres, or other systems

Step-by-Step Tutorial

Step 1: Connect Your Data Source

Sensor devices publish readings to a Kafka topic. Create a source that maps the JSON schema to SQL columns.

CREATE SOURCE sensor_readings (
    device_id    VARCHAR,
    sensor_type  VARCHAR,
    value        DOUBLE PRECISION,
    unit         VARCHAR,
    event_time   TIMESTAMPTZ,
    location     VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'sensor-readings',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

Once the source is created, RisingWave tails the Kafka topic continuously. No polling required.
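
A source keeps no data of its own—it only tails the topic. If you also want the raw readings persisted inside RisingWave for ad-hoc historical queries, the same connector settings work with CREATE TABLE instead of CREATE SOURCE. A sketch, reusing the column list above (the table name is illustrative):

CREATE TABLE sensor_readings_raw (
    device_id    VARCHAR,
    sensor_type  VARCHAR,
    value        DOUBLE PRECISION,
    unit         VARCHAR,
    event_time   TIMESTAMPTZ,
    location     VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'sensor-readings',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;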

Step 2: Build the Real-Time View

Create a materialized view that keeps the latest reading per device. RisingWave maintains this as an incremental Top-N computation: the ROW_NUMBER() ranking and its rn = 1 filter belong in a single statement, and each new Kafka message updates only the affected device group.

CREATE MATERIALIZED VIEW current_device_state AS
SELECT device_id, sensor_type, value, unit, location, event_time
FROM (
    SELECT
        device_id,
        sensor_type,
        value,
        unit,
        location,
        event_time,
        ROW_NUMBER() OVER (
            PARTITION BY device_id, sensor_type
            ORDER BY event_time DESC
        ) AS rn
    FROM sensor_readings
) ranked
WHERE rn = 1;

Your dashboard queries current_device_state like a regular Postgres table and always sees the freshest data.
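
For example, a dashboard poll for a single device is an ordinary Postgres-style query (the device ID 'pump-17' is illustrative):

SELECT sensor_type, value, unit, event_time
FROM current_device_state
WHERE device_id = 'pump-17'
ORDER BY sensor_type;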

Step 3: Window-Based Aggregations

Tumble windows compute per-minute statistics across the entire fleet. Each row covers one fixed, non-overlapping one-minute bucket, and the view updates incrementally as readings arrive.

CREATE MATERIALIZED VIEW sensor_stats_1min AS
SELECT
    device_id,
    sensor_type,
    window_start,
    window_end,
    COUNT(*)                        AS reading_count,
    AVG(value)                      AS avg_value,
    MIN(value)                      AS min_value,
    MAX(value)                      AS max_value,
    STDDEV(value)                   AS stddev_value
FROM TUMBLE(sensor_readings, event_time, INTERVAL '1 MINUTE')
GROUP BY device_id, sensor_type, window_start, window_end;

For a sliding 5-minute window that updates every minute, use HOP:

CREATE MATERIALIZED VIEW sensor_stats_5min_hop AS
SELECT
    device_id,
    sensor_type,
    window_start,
    window_end,
    AVG(value)  AS avg_value,
    MAX(value)  AS max_value
FROM HOP(sensor_readings, event_time, INTERVAL '1 MINUTE', INTERVAL '5 MINUTES')
GROUP BY device_id, sensor_type, window_start, window_end;
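
The aggregate views are themselves queryable with plain SQL. As one sketch, this ad-hoc query lists the temperature sensors with the highest variability over recent one-minute windows (the 30-minute lookback is an arbitrary choice):

SELECT device_id, MAX(stddev_value) AS peak_stddev
FROM sensor_stats_1min
WHERE sensor_type = 'temperature'
  AND window_start > NOW() - INTERVAL '30 MINUTES'
GROUP BY device_id
ORDER BY peak_stddev DESC
LIMIT 10;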

Step 4: Alerts and Sinks

Define threshold-based alerts as a materialized view, then sink them to a Kafka topic for downstream notification systems.

CREATE MATERIALIZED VIEW temperature_alerts AS
SELECT
    device_id,
    location,
    value        AS temperature,
    event_time,
    'HIGH_TEMP'  AS alert_type
FROM sensor_readings
WHERE sensor_type = 'temperature'
  AND value > 85.0;

CREATE SINK temperature_alerts_sink
FROM temperature_alerts
WITH (
    connector = 'kafka',
    topic = 'sensor-alerts',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

Alerts reach downstream consumers with millisecond latency from the moment the reading arrives.

Comparison Table

Capability         | Batch Pipeline          | Streaming SQL (RisingWave)
-------------------|-------------------------|--------------------------------
End-to-end latency | Minutes to hours        | Sub-second
Infrastructure     | Spark + DWH + scheduler | Single streaming database
SQL complexity     | Standard SQL            | Standard SQL + window functions
Alert freshness    | Next batch run          | Milliseconds
Schema changes     | Pipeline redeployment   | ALTER TABLE / new views
Backfill           | Manual re-run           | Automatic from Kafka offsets

FAQ

How does RisingWave handle late-arriving sensor data?

RisingWave supports watermark-based event-time processing. You declare a watermark column on your source and RisingWave waits for a configurable delay before closing windows, accommodating network jitter from edge devices.
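
As a sketch, the source from Step 1 can declare a watermark that tolerates ten seconds of lateness (the ten-second delay is an assumption to tune per deployment):

CREATE SOURCE sensor_readings (
    device_id    VARCHAR,
    sensor_type  VARCHAR,
    value        DOUBLE PRECISION,
    unit         VARCHAR,
    event_time   TIMESTAMPTZ,
    location     VARCHAR,
    WATERMARK FOR event_time AS event_time - INTERVAL '10 SECONDS'
) WITH (
    connector = 'kafka',
    topic = 'sensor-readings',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;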

Can I query historical data alongside live streams?

Yes. Use a CREATE TABLE to store historical data and join it with your streaming source in a materialized view. RisingWave's temporal join syntax handles point-in-time lookups efficiently.
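
A minimal sketch: a regular table of device metadata (the table name and columns are hypothetical) joined into the stream, so every reading is enriched as it arrives:

CREATE TABLE device_metadata (
    device_id     VARCHAR PRIMARY KEY,
    manufacturer  VARCHAR,
    install_date  DATE
);

CREATE MATERIALIZED VIEW enriched_readings AS
SELECT r.device_id, r.sensor_type, r.value, r.event_time,
       m.manufacturer, m.install_date
FROM sensor_readings r
JOIN device_metadata m ON r.device_id = m.device_id;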

What Kafka message formats are supported?

RisingWave supports JSON, Avro (with Schema Registry), Protobuf, and CSV for Kafka sources. For binary protocols from industrial sensors, use a Kafka Connect transformer or a lightweight edge agent to normalize payloads before they reach Kafka.

Key Takeaways

  • RisingWave connects directly to Kafka and maintains materialized views incrementally, delivering sub-second query latency over continuous sensor streams.
  • TUMBLE and HOP window functions compute per-minute and sliding statistics with no custom aggregation code.
  • Threshold alerts defined as materialized views automatically propagate to Kafka sinks, enabling real-time notification pipelines with millisecond freshness.
  • The entire pipeline—ingest, transform, aggregate, alert—is expressed in standard SQL, eliminating the need for Flink or Spark.
