Renewable energy forecasting with streaming SQL means ingesting live weather observations, satellite irradiance data, and turbine/panel telemetry into a PostgreSQL-compatible streaming database, then maintaining continuously updated forecast adjustment views that narrow the gap between predicted and actual generation without waiting for batch model retraining cycles.
Why This Matters for Energy Operators
Solar and wind generation forecasts are never perfectly accurate, but the error grows dramatically when the forecast model is not being corrected with live observations. A solar forecast made at 6 AM using yesterday's weather model can be 20–40% off by noon if cloud cover has shifted unexpectedly. Wind ramp events—sudden large changes in wind speed—are routinely missed by day-ahead forecasts and can leave grid operators balancing hundreds of megawatts of unexpected shortfall or surplus.
The standard approach is to rerun the forecast model every hour or every 15 minutes using the latest NWP (Numerical Weather Prediction) data. This helps, but there is still a gap: the model ingestion pipeline takes time, and the result is a snapshot that begins aging immediately.
Streaming SQL adds a complementary layer: a continuously updated bias-correction and nowcasting system that adjusts the baseline forecast using real-time sensor readings without rerunning the full physics model.
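The core of that idea fits in one statement. As a minimal sketch (assuming a hypothetical recent_bias view holding each site's average forecast error over the last hour, alongside the baseline generation_forecasts relation), the correction is a join-and-scale:

```sql
-- Hypothetical sketch: scale the baseline forecast by the
-- error ratio observed over the most recent hour per site.
CREATE MATERIALIZED VIEW corrected_nowcast_sketch AS
SELECT
    f.site_id,
    f.forecast_kw,
    f.forecast_kw * (1 + b.avg_error_pct / 100) AS corrected_kw
FROM generation_forecasts f
JOIN recent_bias b ON f.site_id = b.site_id;
```

The steps below build the real versions of these pieces.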
How Streaming SQL Works for Energy Data
RisingWave ingests multiple data streams simultaneously—weather station observations, satellite-derived irradiance, and turbine/inverter telemetry—and maintains materialized views that compute rolling forecast error, irradiance bias, and expected generation based on current conditions. The forecast adjustment layer runs as SQL, not a Python notebook or a batch Spark job.
The result: operations teams get a nowcast that is continuously refined as conditions change, visible through any Postgres-compatible dashboard.
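For example, any Postgres client or BI tool can query the live views directly (forecast_bias is defined in Step 2 below; this is an ordinary ad-hoc query, not a special API):

```sql
-- From psql or a Postgres-compatible dashboard connection:
-- show the most recent forecast errors per site.
SELECT site_id, period, error_kw, error_pct
FROM forecast_bias
ORDER BY period DESC
LIMIT 10;
```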
Building the System: Step by Step
Step 1: Connect the Data Source
Ingest weather observations and generation telemetry from Kafka:
-- Weather station observations
CREATE SOURCE weather_observations (
station_id VARCHAR,
site_id VARCHAR,
temperature_c DOUBLE PRECISION,
wind_speed_ms DOUBLE PRECISION,
wind_direction DOUBLE PRECISION,
ghi_wm2 DOUBLE PRECISION, -- Global Horizontal Irradiance
cloud_cover_pct DOUBLE PRECISION,
obs_ts TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'weather.observations',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
-- Generation telemetry (inverters / turbines)
CREATE SOURCE generation_telemetry (
asset_id VARCHAR,
site_id VARCHAR,
actual_output_kw DOUBLE PRECISION,
availability_pct DOUBLE PRECISION,
tel_ts TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'generation.telemetry',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
Step 2: Build Real-Time Aggregations
Compute site-level actual generation and rolling bias against the static hourly forecast:
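The bias view below joins against a generation_forecasts relation populated by the hourly model run. One illustrative way to provide it (topic name and schema are assumptions; it could equally be a regular table loaded by the forecast pipeline) is another Kafka source:

```sql
-- Baseline hourly forecasts from the NWP pipeline (illustrative schema)
CREATE SOURCE generation_forecasts (
    site_id VARCHAR,
    forecast_kw DOUBLE PRECISION,
    forecast_period_start TIMESTAMPTZ,
    forecast_period_end TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'forecasts.hourly',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```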
-- Actual generation aggregated per site over 15-minute windows
CREATE MATERIALIZED VIEW site_actual_generation AS
SELECT
site_id,
window_start,
window_end,
SUM(actual_output_kw) AS total_actual_kw,
AVG(availability_pct) AS avg_availability
FROM TUMBLE(generation_telemetry, tel_ts, INTERVAL '15' MINUTE)
GROUP BY site_id, window_start, window_end;
-- Rolling forecast error: join actuals against the hourly forecast table
CREATE MATERIALIZED VIEW forecast_bias AS
SELECT
a.site_id,
a.window_end AS period,
a.total_actual_kw,
f.forecast_kw,
a.total_actual_kw - f.forecast_kw AS error_kw,
ROUND(((a.total_actual_kw - f.forecast_kw) / NULLIF(f.forecast_kw, 0) * 100)::NUMERIC, 2) AS error_pct
FROM site_actual_generation a
JOIN generation_forecasts f
ON a.site_id = f.site_id
AND a.window_end BETWEEN f.forecast_period_start AND f.forecast_period_end;
Step 3: Detect Anomalies and Generate Alerts
Alert on large forecast deviations and ramp events:
CREATE MATERIALIZED VIEW generation_alerts AS
WITH rolling_avg AS (
SELECT
site_id,
window_end,
total_actual_kw,
AVG(total_actual_kw) OVER (
PARTITION BY site_id
ORDER BY window_end
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS rolling_avg_kw
FROM site_actual_generation
)
SELECT
r.site_id,
r.window_end,
r.total_actual_kw,
r.rolling_avg_kw,
b.error_pct,
CASE
WHEN ABS(b.error_pct) > 30 THEN 'LARGE_FORECAST_ERROR'
WHEN ABS(r.total_actual_kw - r.rolling_avg_kw) > 0.3 * r.rolling_avg_kw
THEN 'RAMP_EVENT'
END AS alert_type
FROM rolling_avg r
JOIN forecast_bias b ON r.site_id = b.site_id AND r.window_end = b.period
WHERE ABS(b.error_pct) > 30
OR ABS(r.total_actual_kw - r.rolling_avg_kw) > 0.3 * r.rolling_avg_kw;
Step 4: Integrate with SCADA/EMS Downstream
Send generation alerts to grid control and publish adjusted nowcasts for scheduling:
CREATE SINK generation_alert_sink
FROM generation_alerts
WITH (
connector = 'kafka',
topic = 'renewables.alerts',
properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
CREATE SINK nowcast_sink
FROM forecast_bias
WITH (
connector = 'kafka',
topic = 'renewables.nowcast',
properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
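The nowcast_sink above publishes raw bias rows. A bias-corrected nowcast view could be sunk instead; one way to compute it from the views already defined (the four-period, i.e. one-hour, smoothing window is an illustrative choice) is:

```sql
-- Adjust the baseline forecast by the recent average error ratio per site.
CREATE MATERIALIZED VIEW adjusted_nowcast AS
SELECT
    site_id,
    period,
    forecast_kw,
    forecast_kw * (1 + AVG(error_pct) OVER (
        PARTITION BY site_id
        ORDER BY period
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
    ) / 100) AS adjusted_kw
FROM forecast_bias;
```

A sink defined FROM adjusted_nowcast would then publish the corrected figures for scheduling.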
Comparison: Batch vs Streaming
| Capability | Batch Forecasting | Streaming SQL |
| --- | --- | --- |
| Forecast refresh | Hourly model rerun | Continuous bias correction |
| Ramp event detection | Missed between cycles | Sub-minute detection |
| Actual vs. forecast gap | Visible after batch close | Live, per 15-min window |
| Multi-site aggregation | Overnight reports | Always-current view |
| Weather data integration | Batch NWP ingestion | Real-time observation JOIN |
| Alert latency | Minutes to hours | Seconds |
| Infrastructure | Python + Airflow + data lake | Single streaming database |
FAQ
Does this replace the NWP-based forecast model? No—it complements it. The physics model provides the baseline forecast based on meteorological simulation. The streaming SQL layer provides real-time bias correction and ramp detection using live observations that the model cannot see until its next run cycle.
How do we incorporate satellite irradiance data that arrives every 15 minutes? Create an additional Kafka source for the satellite feed and join it against weather_observations, or use it in a separate irradiance-correction materialized view. RisingWave handles multiple sources with different arrival cadences natively.
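A hedged sketch of that approach (topic name, schema, and the choice of a 15-minute comparison window are assumptions):

```sql
-- Satellite-derived irradiance (illustrative schema)
CREATE SOURCE satellite_irradiance (
    site_id VARCHAR,
    sat_ghi_wm2 DOUBLE PRECISION,
    sat_ts TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'satellite.irradiance',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- Compare satellite vs. ground-station irradiance per 15-minute window,
-- aggregating each feed before joining so the averages stay independent.
CREATE MATERIALIZED VIEW irradiance_correction AS
WITH sat AS (
    SELECT site_id, window_start, AVG(sat_ghi_wm2) AS avg_sat_ghi
    FROM TUMBLE(satellite_irradiance, sat_ts, INTERVAL '15' MINUTE)
    GROUP BY site_id, window_start
), ground AS (
    SELECT site_id, window_start, AVG(ghi_wm2) AS avg_ground_ghi
    FROM TUMBLE(weather_observations, obs_ts, INTERVAL '15' MINUTE)
    GROUP BY site_id, window_start
)
SELECT
    s.site_id,
    s.window_start,
    s.avg_sat_ghi - g.avg_ground_ghi AS sat_minus_ground_wm2
FROM sat s
JOIN ground g
    ON s.site_id = g.site_id AND s.window_start = g.window_start;
```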
Can we backfill historical forecast errors to train future model corrections? Materialized views reflect the current state of the stream rather than a queryable history, but the underlying Kafka topics retain raw data according to their retention policy. Replaying those topics, or sinking forecast_bias into a long-term store, enables backfill and error analysis with standard SQL.
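For example, an ad-hoc query over the bias view (or over retained bias rows downstream) can profile systematic error by hour of day, which is a common input to model-correction work:

```sql
-- Mean absolute forecast error per site and hour of day.
SELECT
    site_id,
    EXTRACT(HOUR FROM period) AS hour_of_day,
    AVG(ABS(error_pct)) AS mean_abs_error_pct
FROM forecast_bias
GROUP BY site_id, EXTRACT(HOUR FROM period)
ORDER BY site_id, hour_of_day;
```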
Key Takeaways
- Streaming SQL provides a real-time bias-correction and nowcasting layer that continuously narrows the gap between static model forecasts and live generation reality.
- Ramp events—large rapid changes in renewable output—are detected within minutes rather than discovered at the next forecast cycle.
- Materialized views over multiple data streams (weather, generation, forecasts) run as SQL joins without custom application code.
- The Postgres interface means forecast bias data is directly accessible to scheduling tools, energy management systems, and BI dashboards.
- The architecture is additive: it layers on top of existing NWP-based forecasting without replacing it.