Real-time game telemetry processing with RisingWave means ingesting millions of client-reported performance events — frame rates, load times, crash signals, network latency — and continuously aggregating them into materialized views that surface regressions, device-specific issues, and geographic bottlenecks within seconds of their occurrence.
Telemetry Is Only Valuable When It's Fresh
Game telemetry answers the most important operational question in live-service gaming: is the game actually working well for players right now? Frame rate drops, loading screen freezes, network desync events, and memory crashes are invisible from the server side unless clients are sending telemetry and that telemetry is being processed continuously.
The industry standard has been to batch-process telemetry overnight in a data warehouse. This catches long-term trends but misses the acute incidents that matter most: a patch that tanks GPU performance on a popular device, a server region routing change that degrades latency for 20% of players, or a memory leak that starts crashing clients 45 minutes into a session.
With RisingWave, telemetry flows from Kafka into materialized views that are always current. Your on-call engineer's dashboard shows the real frame rate distribution as of 10 seconds ago — not yesterday.
Setting Up the Telemetry Source
Client telemetry should be batched on-device and flushed to Kafka every 30 seconds rather than sent per event. Define the source in RisingWave:
CREATE SOURCE game_telemetry (
client_id VARCHAR,
player_id BIGINT,
session_id VARCHAR,
platform VARCHAR,
device_model VARCHAR,
os_version VARCHAR,
game_version VARCHAR,
region VARCHAR,
metric_type VARCHAR,
metric_value FLOAT,
level_id VARCHAR,
recorded_at TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'game.telemetry.client',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
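On the client side, each 30-second batch is a set of JSON records matching the source schema above. A minimal Python sketch of the payload side (field names mirror the source definition; the actual Kafka producer call depends on your client library and is shown only as a comment):

```python
import json
from datetime import datetime, timezone

def build_event(client_id, player_id, session_id, platform,
                device_model, os_version, game_version, region,
                metric_type, metric_value, level_id):
    """Build one telemetry record matching the game_telemetry source schema."""
    return {
        "client_id": client_id,
        "player_id": player_id,
        "session_id": session_id,
        "platform": platform,
        "device_model": device_model,
        "os_version": os_version,
        "game_version": game_version,
        "region": region,
        "metric_type": metric_type,
        "metric_value": float(metric_value),
        "level_id": level_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

def encode_batch(events):
    """Serialize a 30-second batch: one JSON object per Kafka message."""
    return [json.dumps(e).encode("utf-8") for e in events]

# for msg in encode_batch(batch):
#     producer.send('game.telemetry.client', value=msg)  # library-specific
```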
Live Performance Percentiles
Raw averages hide the tail latency that ruins player experience. Use window aggregations to compute percentile distributions:
CREATE MATERIALIZED VIEW telemetry_performance AS
SELECT
window_start,
window_end,
game_version,
platform,
region,
metric_type,
COUNT(*) AS sample_count,
AVG(metric_value) AS avg_value,
MIN(metric_value) AS min_value,
MAX(metric_value) AS max_value,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY metric_value) AS p50,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY metric_value) AS p90,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY metric_value) AS p99
FROM TUMBLE(game_telemetry, recorded_at, INTERVAL '1 minute')
GROUP BY window_start, window_end, game_version, platform, region, metric_type;
For frame-rate telemetry (metric_type = 'fps'), a p99 below 30 on a high-end device after a patch release is a regression signal. For load times (metric_type = 'load_ms'), a rising p90 indicates a specific level or asset pack is causing problems.
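Downstream dashboards and alert jobs query the view with ordinary SQL. For example, a hypothetical check for windows where p99 frame rate has collapsed on a given build (the version string and threshold are illustrative):

```sql
SELECT window_start, platform, region, sample_count, p99
FROM telemetry_performance
WHERE metric_type = 'fps'
  AND game_version = '2.4.1'  -- illustrative version string
  AND p99 < 30
ORDER BY window_start DESC
LIMIT 20;
```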
Detecting Crash Clusters
Crash events in telemetry often cluster around specific game versions, levels, or device models. Build a materialized view that surfaces crash hotspots:
CREATE MATERIALIZED VIEW crash_cluster_detection AS
SELECT
window_start,
window_end,
game_version,
platform,
device_model,
level_id,
COUNT(*) FILTER (WHERE metric_type = 'crash') AS crash_count,
COUNT(DISTINCT player_id) FILTER (WHERE metric_type = 'crash') AS affected_players,
COUNT(DISTINCT session_id) AS total_sessions,
ROUND(
COUNT(*) FILTER (WHERE metric_type = 'crash')::DECIMAL /
NULLIF(COUNT(DISTINCT session_id), 0) * 100, 2
) AS crash_rate_pct
FROM HOP(game_telemetry, recorded_at, INTERVAL '5 minutes', INTERVAL '30 minutes')
GROUP BY window_start, window_end, game_version, platform, device_model, level_id
HAVING COUNT(*) FILTER (WHERE metric_type = 'crash') >= 5;
When crash_rate_pct exceeds 1% for a specific (game_version, device_model, level_id) combination, your alerting pipeline finds out within minutes, not the next morning: the 30-minute window gives the rate enough samples to be meaningful, while the 5-minute hop refreshes results continuously.
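The crash-rate arithmetic in the view can be sanity-checked offline. A small Python equivalent of the FILTER/NULLIF logic (illustrative, not RisingWave code):

```python
def crash_rate_pct(events):
    """events: (metric_type, session_id) tuples for one window/group.

    Mirrors the SQL:
    COUNT(*) FILTER (WHERE crash) / NULLIF(COUNT(DISTINCT session_id), 0) * 100
    """
    crashes = sum(1 for metric_type, _ in events if metric_type == "crash")
    sessions = len({sid for _, sid in events})
    if sessions == 0:  # NULLIF guard: no sessions means the rate is undefined
        return None
    return round(crashes / sessions * 100, 2)
```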
Sinking Telemetry Aggregates to the Data Warehouse
Real-time monitoring handles incident response. Long-term trend analysis requires archival. Sink telemetry aggregates to both a live dashboard and a data warehouse:
CREATE SINK telemetry_to_iceberg
FROM telemetry_performance
WITH (
connector = 'iceberg',
type = 'append-only',
force_append_only = 'true',
catalog.type = 'storage',
warehouse.path = 's3://game-data-lake/telemetry',
database.name = 'game_telemetry',
table.name = 'performance_minutely'
);
Because the windowed aggregation can update rows while a window is still open, force_append_only = 'true' tells the sink to keep only inserts and discard intermediate updates. The Iceberg sink writes aggregated telemetry to object storage in a format queryable by Spark, Trino, or Athena for historical trend analysis, while RisingWave continues serving the live dashboard.
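Once archived, the same aggregates are available to batch engines. A hypothetical Trino query against the sunk table (the `iceberg` catalog name and table path depend on your configuration):

```sql
-- Trino, assuming an 'iceberg' catalog pointed at the same warehouse path
SELECT game_version, metric_type,
       approx_percentile(p99, 0.5) AS median_p99
FROM iceberg.game_telemetry.performance_minutely
WHERE window_start >= current_timestamp - INTERVAL '30' DAY
GROUP BY game_version, metric_type
ORDER BY game_version;
```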
Comparison: Telemetry Processing Architectures
| Approach | Freshness | Percentile Support | Crash Detection Speed | Storage Cost |
| --- | --- | --- | --- | --- |
| Nightly warehouse ETL | 24 hours | Yes (offline) | Next day | Low |
| APM tools (Datadog, Instana) | 1 minute | Approximate | Minutes | High |
| Custom Flink pipeline | Seconds | Custom | Seconds | Medium |
| RisingWave streaming SQL | Sub-second | Exact | Sub-minute | Low |
Version Regression Detection
During a staggered rollout, the current and previous game versions report telemetry side by side. Compare them within the same time window to detect regressions automatically:
CREATE MATERIALIZED VIEW version_regression_check AS
SELECT
curr.window_start,
curr.game_version AS current_version,
prev.game_version AS previous_version,
curr.platform,
curr.region,
curr.metric_type,
curr.p99 AS current_p99,
prev.p99 AS prev_p99,
ROUND((curr.p99 - prev.p99) / NULLIF(prev.p99, 0) * 100, 2) AS pct_change
FROM telemetry_performance curr
JOIN telemetry_performance prev
ON curr.window_start = prev.window_start
AND curr.platform = prev.platform
AND curr.region = prev.region
AND curr.metric_type = prev.metric_type
AND curr.game_version != prev.game_version
WHERE curr.sample_count > 1000
AND prev.sample_count > 1000;
A pct_change that moves more than 10% in the wrong direction (frame rates falling, load times rising) triggers an automated alert to the engineering team.
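The pct_change arithmetic and alert rule can be unit-tested outside the database. A Python sketch (the 10% threshold and the per-metric direction logic are illustrative):

```python
def pct_change(curr, prev):
    """ROUND((curr - prev) / NULLIF(prev, 0) * 100, 2) with the same NULL semantics."""
    if prev == 0:
        return None
    return round((curr - prev) / prev * 100, 2)

def flags_regression(metric_type, change, threshold=10.0):
    """Illustrative alert rule: frame rate falling or latency-style metric rising."""
    if change is None:
        return False
    if metric_type == "fps":
        return change <= -threshold   # lower frame rates are worse
    return change >= threshold        # higher load times / latency are worse
```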
FAQ
Q: How do I handle telemetry from players with poor network connections who send delayed events?
A: RisingWave supports configurable watermarks per source. Set the watermark delay to accommodate late arrivals (e.g., 2 minutes), and windowed aggregations will wait for late data before finalizing window results.
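Concretely, the watermark is declared in the source definition. A sketch reusing the schema from earlier (the 2-minute delay is illustrative; tune it to your clients' delivery characteristics):

```sql
CREATE SOURCE game_telemetry (
    -- ... same columns as the source definition above ...
    recorded_at TIMESTAMPTZ,
    WATERMARK FOR recorded_at AS recorded_at - INTERVAL '2 minutes'
)
WITH (
    connector = 'kafka',
    topic = 'game.telemetry.client',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```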
Q: Can I correlate client telemetry with server-side performance metrics?
A: Yes. Create a second source ingesting server metrics from a separate Kafka topic and join the two sources in a materialized view on session_id or region to correlate client-perceived performance with server load.
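As a sketch, assuming a server-side topic named game.telemetry.server carrying region-level CPU load (the topic, columns, and 'net_latency_ms' metric name are assumptions for illustration):

```sql
CREATE SOURCE server_metrics (
    region VARCHAR,
    cpu_load FLOAT,
    recorded_at TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'game.telemetry.server',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

CREATE MATERIALIZED VIEW client_vs_server AS
SELECT
    c.window_start,
    c.region,
    c.avg_client_latency_ms,
    s.avg_cpu_load
FROM (
    SELECT window_start, region, AVG(metric_value) AS avg_client_latency_ms
    FROM TUMBLE(game_telemetry, recorded_at, INTERVAL '1 minute')
    WHERE metric_type = 'net_latency_ms'  -- assumed metric name
    GROUP BY window_start, region
) c
JOIN (
    SELECT window_start, region, AVG(cpu_load) AS avg_cpu_load
    FROM TUMBLE(server_metrics, recorded_at, INTERVAL '1 minute')
    GROUP BY window_start, region
) s ON c.window_start = s.window_start AND c.region = s.region;
```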
Q: What is the recommended Kafka topic partitioning strategy for high-volume telemetry?
A: Partition by player_id or client_id to ensure ordered processing per player. Use at least as many partitions as RisingWave compute nodes to maximize parallelism.
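A hash-based key-to-partition mapping keeps each player's events on one partition. A Python sketch of the idea (illustrative; Kafka's default partitioner uses murmur2 on the key bytes, not MD5):

```python
import hashlib

def partition_for(player_id, num_partitions):
    """Stable partition assignment so every event for a player lands on the
    same partition, preserving per-player ordering."""
    digest = hashlib.md5(str(player_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

In practice you get this behavior for free by setting the Kafka message key to the player or client ID and letting the producer's default partitioner do the hashing.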
Q: How do I prevent outlier devices (rooted phones, jailbroken consoles) from skewing percentiles?
A: Add a filtering step in the materialized view. Exclude values outside a plausible range for the metric type: WHERE metric_value BETWEEN 1 AND 300 for FPS, WHERE metric_value BETWEEN 100 AND 60000 for load time milliseconds.
Q: Is RisingWave suitable for telemetry from mobile games with 50 million daily active users?
A: Yes. Even if all 50 million players were online simultaneously, one event per 30 seconds per client works out to roughly 1.7 million events per second; in practice only a fraction of DAU is online at any moment, so the sustained rate is far lower and well within the capacity of an appropriately sized RisingWave cluster.
Know What's Happening in Every Player's Game
Telemetry without real-time processing is just expensive storage. With RisingWave, every frame rate reading, load time measurement, and crash report becomes actionable intelligence the moment it arrives.
Begin at https://docs.risingwave.com/get-started and discuss telemetry patterns with other engineers at https://risingwave.com/slack.

