Every year, more engineering teams hit the same wall: their batch pipelines cannot keep up with the speed their business demands. Nightly ETL jobs that once felt adequate now create blind spots where fraud goes undetected, dashboards show stale numbers, and customers wait for data that should be instant.
Streaming databases offer a way out. A streaming database is a system that continuously ingests, processes, and serves data in real time using SQL, combining the roles of a stream processor and a database into a single layer. Instead of stitching together Kafka, Flink, Redis, and PostgreSQL, teams write SQL queries that stay up to date as new data arrives.
But does this actually work in production? In this article, we examine 10 real-world case studies from companies running streaming databases at scale. These span quantitative trading, industrial IoT, entity resolution, gaming infrastructure, fraud prevention, and more. Each case study covers the problem, the architecture, and the measurable results.
1. Metabit Trading: Real-Time Risk Monitoring with 95% Cost Reduction
Industry: Quantitative Finance
Scale: Billions of dollars in managed assets, tens of thousands of QPS
Metabit Trading is a quantitative investment firm managing over $1 billion in assets, with team members from Stanford, CMU, Facebook, and Google. Their trading systems generate enormous volumes of data that need continuous monitoring for risk control and compliance.
The Problem
Metabit originally used an OLAP database (referred to as "System X" in their published case study) for real-time monitoring. The system had three critical limitations:
- Query concurrency capped at ~100 QPS - a single expensive query could exhaust cluster resources
- Only eventual consistency - joins across tables were risky because data could be stale
- Horizontal scaling degraded performance - inter-shard communication created bottlenecks as the cluster grew
For a trading firm where milliseconds matter, these constraints were unacceptable.
The Architecture
Metabit deployed RisingWave as their streaming database. The data flow is straightforward: trading machines push business data through Kafka, RisingWave creates materialized views on top of Kafka source tables to calculate trading metrics under different aggregation conditions, compares them with thresholds, and generates alert results. RisingWave writes alert data back to Kafka, and the alert service listens to the Kafka stream and sends notifications to monitoring personnel.
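A pipeline of this shape can be sketched in three RisingWave-style SQL statements. The topic names, fields, and threshold below are illustrative placeholders, not Metabit's actual schema:

```sql
-- Ingest trading events from Kafka (illustrative topic and schema).
CREATE SOURCE trade_events (
    account_id VARCHAR,
    symbol     VARCHAR,
    notional   DOUBLE PRECISION,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'trades',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Incrementally maintained per-account exposure.
CREATE MATERIALIZED VIEW account_exposure AS
SELECT account_id,
       SUM(notional) AS exposure
FROM trade_events
GROUP BY account_id;

-- Emit alerts back to Kafka whenever exposure crosses a threshold.
CREATE SINK risk_alerts AS
SELECT account_id, exposure
FROM account_exposure
WHERE exposure > 1000000
WITH (
    connector = 'kafka',
    topic = 'risk-alerts',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'account_id'
) FORMAT UPSERT ENCODE JSON;
```

Because the materialized view is maintained incrementally, the sink emits an alert as soon as a new trade pushes an account over the threshold, without rescanning history.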
Results
- 95% cost reduction compared to the original OLAP system without materialized views
- 70% cost reduction compared to an optimized version of the OLAP system
- Sub-second alert latency, down from minute-level monitoring
- Fewer than 10 CPU cores required, versus hundreds under the previous system
- Strong consistency guarantees through RisingWave's barrier-based distributed consistency mechanism
The key technical advantage was incremental computation. Instead of re-scanning the entire dataset for every alert check, RisingWave processes only new or modified data, which is why a handful of cores can replace hundreds.
2. Siemens (via Hivemind): Industrial IoT with Hours-to-Seconds Latency
Industry: Industrial Manufacturing / IoT
Scale: Thousands of field devices and sensors
Siemens, one of the world's largest industrial conglomerates, needed to process sensor data from thousands of field devices across manufacturing facilities. The data pipeline was critical for quality monitoring, anomaly detection, and operational efficiency.
The Problem
Siemens relied on nightly batch jobs for data synchronization and cleaning. This created a cascade of issues:
- Processing pipelines were long and fragile
- Data latency measured in hours meant operators could not react to issues in real time
- Complex script-based cleaning logic was expensive to maintain
- Dedicated scheduling clusters and intermediate data landing layers drove up infrastructure costs
The Architecture
Hivemind Technologies, a data infrastructure company, built a streaming Medallion architecture using RisingWave for the Siemens deployment. The architecture has three layers:
- Bronze Layer: Raw data ingestion via Kafka with no pre-processing, preserving maximum fidelity for traceability and auditing
- Silver Layer: Real-time data transformation using SQL rules instead of batch scripts, including field name normalization, unit conversion, field enrichment, and invalid data filtering
- Gold Layer: Real-time aggregation via materialized views, delivering results simultaneously to dashboards, data lakes, Kafka topics, and BI systems
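Expressed as RisingWave-style SQL, the three layers might look like the following sketch. The field names, units, and validity bounds are hypothetical, not Siemens' actual rules:

```sql
-- Bronze: raw sensor readings from Kafka, ingested as-is.
CREATE SOURCE raw_sensor_readings (
    DeviceID   VARCHAR,
    temp_f     DOUBLE PRECISION,
    reading_ts TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'sensors-raw',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Silver: normalize field names, convert units, filter invalid rows.
CREATE MATERIALIZED VIEW sensor_readings_clean AS
SELECT DeviceID                   AS device_id,
       (temp_f - 32) * 5.0 / 9.0 AS temp_celsius,
       reading_ts
FROM raw_sensor_readings
WHERE temp_f IS NOT NULL
  AND temp_f BETWEEN -40 AND 500;

-- Gold: real-time aggregation over the cleaned stream.
CREATE MATERIALIZED VIEW line_temperature_1min AS
SELECT device_id,
       window_start,
       AVG(temp_celsius) AS avg_temp
FROM TUMBLE(sensor_readings_clean, reading_ts, INTERVAL '1 minute')
GROUP BY device_id, window_start;
```

Each layer is a view over the one below it, so a correction to a silver-layer rule propagates to the gold layer automatically.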
Results
- Latency dropped from hours to seconds
- Infrastructure savings exceeded 50% by eliminating scheduling clusters and intermediate staging layers
- Cleaning logic moved from complex script stacks to SQL rules, drastically reducing maintenance overhead
- Real-time decisions became possible based on live materialized views rather than stale batch reports
This deployment represents a paradigm shift: from offline to real-time, from brittle scripts to declarative SQL, and from rigid scheduling to continuous streaming.
3. GDU Labs: Real-Time Entity Resolution at Scale
Industry: Data Infrastructure / Identity Resolution
Scale: Millions of fragmented records unified into verified profiles
GDU Labs builds identity resolution technology that turns fragmented data from multiple sources into verified, unified profiles. When your customer data lives in a dozen different systems with inconsistent naming, duplicate entries, and conflicting attributes, entity resolution is what stitches it all together.
The Problem
Entity resolution is computationally intensive. Matching and merging records across sources requires continuous comparison, scoring, and deduplication. Batch-based approaches meant that profile updates were always stale: a customer who updated their information in one system would not see the change reflected across the platform until the next batch run.
GDU Labs needed a streaming foundation that could deliver fresh, reliable data at scale, making real-time updates a core part of their product experience rather than an afterthought.
The Architecture
GDU Labs chose RisingWave and took it from prototype to a production-grade infrastructure dependency over two years. The streaming database handles continuous ingestion of data changes from multiple sources, applies matching and transformation logic in SQL, and maintains always-current materialized views that represent the unified profile state.
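GDU Labs has not published its matching rules, but the general shape of SQL-based entity resolution over continuously ingested sources can be sketched like this. The tables, keys, and the exact-email match rule are deliberate simplifications; production systems typically use probabilistic scoring rather than exact matches:

```sql
-- Assume crm_contacts and billing_accounts are kept current by
-- upstream CDC sources (hypothetical schemas).

-- Candidate matches: records from two systems sharing a normalized email.
CREATE MATERIALIZED VIEW matched_profiles AS
SELECT c.contact_id,
       b.account_id,
       LOWER(c.email) AS match_key,
       c.full_name,
       b.billing_name
FROM crm_contacts c
JOIN billing_accounts b
  ON LOWER(c.email) = LOWER(b.email);

-- Unified profile state: one row per match key, always current.
CREATE MATERIALIZED VIEW unified_profiles AS
SELECT match_key,
       COUNT(*)       AS source_records,
       MAX(full_name) AS display_name
FROM matched_profiles
GROUP BY match_key;
```

When a record changes in either upstream system, only the affected match keys are recomputed, which is what keeps profiles fresh without batch runs.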
Results
- Real-time profile updates instead of batch-delayed synchronization
- Production-grade reliability after two years of continuous operation
- SQL-based matching logic that is easier to maintain and iterate than custom application code
- Scalable architecture that grows with data volume without requiring a re-architecture
GDU Labs was the first company featured in RisingWave's Customer Spotlight series, speaking publicly about how streaming data became mission-critical for their product.
4. Tencent: QoS Infrastructure Across Tens of Thousands of Machines
Industry: Cloud Computing / Gaming
Scale: Tens of thousands of machines, high-throughput metrics
Tencent Cloud operates infrastructure at a scale few organizations ever reach. Their Quality of Service (QoS) framework needs to monitor metrics from tens of thousands of machines, detect anomalies in real time, and dynamically allocate resources to maintain service quality for hundreds of millions of users.
The Problem
Tencent's infrastructure engineering team originally built their QoS system using Apache Kafka for data streaming, Apache Flink for stateful stream processing with lookup joins, and MySQL as both an external sink and operational database. The system used an event-driven state machine formulated in SQL.
As scale increased, three problems emerged:
- Flink's TPS performance degraded as scalability demands grew, particularly with lookup joins
- Debugging complex SQL involving multiway joins and nested subqueries was extremely difficult within Flink's architecture
- RocksDB storage costs escalated - Flink persists streaming state to block storage, so storage spend grew in direct proportion to data volume
The Architecture
Tencent replaced Flink with RisingWave, taking advantage of its unified architecture that eliminates the need for external data sinking. RisingWave's PostgreSQL compatibility enabled straightforward SQL development and query debugging. The system integrated with Tencent's Kubernetes-based container platform (TKE) and object storage (TOS) for optimized cluster management.
Results
- Substantially higher TPS than the Flink-based system (described in Tencent's writeup as a logarithmic-level improvement)
- Consolidated maintenance from complex distributed operations to deterministic SQL-based workflows
- Optimized compute and storage costs through intelligent task scheduling
- Robust fault tolerance with built-in load balancing across distributed environments
The Tencent case is significant because it shows a direct comparison: a team that had already built a working system in Flink chose to migrate to a streaming database because of performance, debuggability, and cost advantages.
5. DoorDash: Hundreds of Billions of Events Per Day
Industry: Food Delivery / Logistics
Scale: Hundreds of billions of events per day, 99.99% delivery rate
DoorDash is one of the largest food delivery platforms in the United States, processing an enormous volume of real-time events related to orders, driver locations, restaurant status, and customer interactions.
The Problem
As DoorDash scaled, they needed a real-time event processing system that could handle hundreds of billions of events per day while maintaining sub-second latency for critical operational decisions like driver dispatch and ETA calculation.
The Architecture
DoorDash built its real-time processing platform on Apache Kafka and Apache Flink. Kafka handles event ingestion and routing, while Flink processes streams and writes results to S3 for loading into Snowflake. The team shifted from relying on managed AWS services to open-source frameworks that gave them more control over performance tuning.
Results
- Hundreds of billions of events processed per day
- 99.99% delivery rate for event processing
- Sub-second latency for critical operational decisions
- Scalable architecture that evolved from managed services to open-source frameworks as scale demanded
DoorDash's case illustrates the scale at which streaming architectures operate in production. Their journey also highlights a common pattern: starting with managed services and migrating to open-source tools as the team's requirements outgrow what managed offerings can deliver.
6. Riskified: Migrating from ksqlDB to Flink for Fraud Prevention
Industry: E-commerce Fraud Prevention
Scale: Real-time fraud scoring across millions of transactions
Riskified provides fraud prevention for e-commerce merchants, making instant approve/decline decisions on orders. Their streaming infrastructure directly impacts revenue: false positives mean rejected legitimate customers, and false negatives mean chargebacks.
The Problem
Riskified initially deployed Confluent's ksqlDB for their streaming SQL needs. Once in production, critical limitations surfaced:
- Schema evolution was broken - ksqlDB did not automatically include new fields, and fixing it required dropping and recreating streams, disrupting offsets and production pipelines
- No resource isolation - streaming query resources were shared, so one expensive task could impact all other tasks on the same node
- Type conversion issues - ksqlDB internally converted Avro Enum types to Strings and interpreted fields as nullable, causing deserialization errors in downstream consumers
The Architecture
Riskified migrated to AWS Managed Service for Apache Flink. They built a single, flexible Flink application where end-users can modify input topics, SQL processing logic, and output destinations through runtime properties. Kafka remained the input and output layer.
Results
- Schema evolution support - new fields are handled gracefully without pipeline disruption
- Job-level resource isolation - each streaming job runs independently
- Automatic scaling and built-in monitoring through the managed service
- Self-serve streaming SQL for internal teams, reducing the bottleneck on the platform team
This case study is particularly instructive because it shows the real-world limitations of ksqlDB in production and why teams migrate to more capable streaming systems.
7. Coreflux: Real-Time Production Line Monitoring in Manufacturing
Industry: Manufacturing / IoT
Scale: Thousands of sensors across production lines
Coreflux is an IoT platform company that provides edge-to-cloud connectivity for manufacturing environments. Their partnership with RisingWave targets a common manufacturing challenge: turning raw sensor data into actionable production insights.
The Problem
Modern production lines generate a constant stream of sensor data, including bottle presence detection, filling status, temperature, pressure, and vibration readings. Traditional approaches store this data in a time-series database and run periodic queries, but this introduces latency that prevents real-time anomaly detection and process optimization.
The Architecture
Sensors on production lines collect operational data, serialize it into JSON, and publish to Coreflux's MQTT broker. In the cloud, RisingWave subscribes to the relevant MQTT topics, ingesting real-time data for processing. Materialized views continuously compute KPIs, detect anomalies, and track production efficiency metrics.
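A sketch of this MQTT-to-materialized-view flow, with hypothetical broker address, topic, and sensor fields:

```sql
-- Subscribe to an MQTT topic of production-line sensor JSON.
CREATE SOURCE line_sensors (
    station_id     VARCHAR,
    bottle_present BOOLEAN,
    fill_level     DOUBLE PRECISION,
    temperature    DOUBLE PRECISION,
    reading_ts     TIMESTAMP
) WITH (
    connector = 'mqtt',
    url = 'tcp://broker.example.com:1883',
    topic = 'plant/line1/sensors'
) FORMAT PLAIN ENCODE JSON;

-- Continuously computed KPI: underfilled bottles per station per minute.
CREATE MATERIALIZED VIEW fill_anomalies_1min AS
SELECT station_id,
       window_start,
       COUNT(*) FILTER (WHERE bottle_present AND fill_level < 0.95)
           AS underfilled
FROM TUMBLE(line_sensors, reading_ts, INTERVAL '1 minute')
GROUP BY station_id, window_start;
```

The anomaly rule is plain SQL, which is what makes it maintainable by manufacturing engineers rather than stream processing specialists.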
Results
- Real-time production monitoring replacing periodic batch queries
- MQTT-native ingestion eliminating the need for a separate Kafka layer in edge-to-cloud scenarios
- SQL-based anomaly detection accessible to manufacturing engineers without stream processing expertise
- Seamless edge-to-cloud pipeline combining Coreflux's IoT capabilities with RisingWave's stream processing
This deployment is notable because it demonstrates streaming databases in the OT (Operational Technology) world, not just IT. Manufacturing engineers who know SQL can define monitoring rules without learning Java or Flink APIs.
8. Alibaba: Billions of Events Per Second During Singles' Day
Industry: E-commerce
Scale: Billions of events per second during peak shopping events
Alibaba, the world's largest e-commerce company by GMV, operates Blink, its internal fork of Apache Flink (whose improvements have since been contributed back to the Flink project), at extraordinary scale. Their Singles' Day shopping festival generates peak loads that few other events in the world can match.
The Problem
During Singles' Day, Alibaba needs to process billions of events per second for real-time inventory management, fraud detection, personalized recommendations, and dynamic pricing. Traditional batch systems simply cannot operate at this speed during peak events.
The Architecture
Alibaba runs Blink (their Flink fork) as a managed service for both internal teams and cloud customers. The system handles real-time event processing, windowed aggregations, and complex event pattern matching across their entire e-commerce ecosystem.
Results
- Billions of events per second processed during peak loads
- Real-time inventory updates preventing overselling during flash sales
- Dynamic pricing and recommendations adapting to user behavior in real time
- Production-hardened fork of Flink that has been validated at scales most organizations will never reach
Alibaba's deployment is the extreme end of the spectrum. It demonstrates that streaming architectures are not just for startups or simple use cases; they operate at the largest scales in production.
9. Netflix: Real-Time Personalization Across 200+ Million Subscribers
Industry: Media / Entertainment
Scale: 200+ million subscribers, trillions of events per day
Netflix's recommendation engine is one of the most sophisticated real-time data systems in the world. Every interaction, including views, pauses, scrolls, searches, and ratings, feeds into a real-time pipeline that personalizes what each subscriber sees.
The Problem
With over 200 million subscribers generating trillions of daily events, Netflix needs to process user interactions in real time to update recommendations, personalize thumbnails, and optimize content delivery. Batch processing creates a lag between user behavior and personalization, degrading the experience.
The Architecture
Netflix built a multi-layered real-time data infrastructure that has gone through four major innovation phases. The stack includes Apache Kafka for event ingestion, Apache Flink for stream processing, and custom internal tools for managing thousands of streaming jobs. The team operates Keystone, their real-time event processing platform, alongside specialized systems for A/B testing, content delivery optimization, and subscriber-level feature computation.
Results
- Trillions of events processed daily across the subscriber base
- Real-time recommendation updates that respond to user behavior within seconds
- Thousands of concurrent streaming jobs managed through internal tooling
- Continuous personalization driving measurable improvements in engagement and retention
Netflix's case shows that streaming is not optional for modern personalization at scale. Their entire product experience depends on processing events as they happen.
10. Capital One: Real-Time Fraud Detection in Banking
Industry: Financial Services / Banking
Scale: Millions of transactions per day with real-time decisioning
Capital One, one of the largest banks in the United States, built a context-specific fraud detection system that analyzes events in real time to flag potential fraud as it happens, rather than through post-facto batch analysis.
The Problem
Large financial institutions can handle on the order of 1.3 million transactions per second during peak periods. Traditional rule-based fraud detection built on batch processing catches fraudulent transactions only hours or days after they occur, by which time recovery is often impossible.
The Architecture
Capital One's system ingests transaction events in real time and applies a combination of rule-based and machine learning-based detection. The streaming infrastructure processes transactions as they arrive, enriching them with contextual data (location, device fingerprint, merchant category, transaction history) and scoring them against fraud models before the transaction authorization window closes.
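Capital One has not published its schemas, but the enrich-then-score pattern described here is commonly written as a streaming join. Everything below, including the toy scoring rule, is illustrative; production systems score against ML models rather than CASE expressions:

```sql
-- Hypothetical relations: a transaction stream plus reference data
-- (merchants) and a precomputed per-card history view (card_history).
CREATE MATERIALIZED VIEW scored_transactions AS
SELECT t.txn_id,
       t.card_id,
       t.amount,
       m.category AS merchant_category,
       h.avg_amount_30d,
       -- Toy rule-based score for illustration only.
       CASE
         WHEN t.amount > 10 * h.avg_amount_30d THEN 'high_risk'
         WHEN m.category = 'gambling' AND t.amount > 500 THEN 'review'
         ELSE 'normal'
       END AS risk_flag
FROM transactions t
JOIN merchants    m ON t.merchant_id = m.merchant_id
JOIN card_history h ON t.card_id = h.card_id;
```

The point of the streaming join is that enrichment happens as each transaction arrives, so the score is ready inside the authorization window.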
Results
- Real-time fraud scoring before transaction authorization completes
- Context-enriched decisioning using streaming joins across multiple data sources
- Reduced false positive rates compared to batch-based rule systems
- Immediate response capability that stops fraud in progress rather than detecting it after the fact
Capital One's deployment demonstrates that even heavily regulated industries with conservative technology cultures are adopting streaming architectures for mission-critical workloads.
Patterns Across Production Deployments
Looking across these 10 case studies, several patterns emerge:
Common Migration Triggers
- Latency gaps - Batch pipelines running hourly or nightly while the business needs seconds or milliseconds
- Cost escalation - OLAP databases or Flink clusters becoming expensive as data volumes grow
- Operational complexity - Maintaining separate systems for ingestion, processing, storage, and serving
- Debugging difficulty - Complex stream processing code in Java or Scala that is hard to troubleshoot
Architecture Choices
| Approach | Used By | Strengths | Limitations |
| --- | --- | --- | --- |
| Streaming database (RisingWave) | Metabit, Siemens, GDU Labs, Tencent, Coreflux | SQL interface, built-in storage, lowest operational complexity | Newer ecosystem |
| Apache Flink | DoorDash, Alibaba, Netflix, Riskified | Mature ecosystem, massive scale proven | Requires Java/Scala expertise, separate storage needed |
| ksqlDB | Riskified (migrated away) | Kafka-native, simple setup | Schema evolution issues, no resource isolation, limited analytics |
Key Metrics Companies Report
- Cost reductions of 50-95% when moving from OLAP or batch systems to streaming databases
- Latency improvements from hours to seconds (or seconds to milliseconds)
- Team productivity gains when switching from Java-based stream processing to SQL-based approaches
- Infrastructure simplification by eliminating scheduling clusters, intermediate storage, and cache layers
What Types of Companies Should Use a Streaming Database in Production?
A streaming database fits best when your team needs real-time data processing but does not want to operate a complex stack of separate tools. Companies that benefit most share these characteristics:
- Their core product depends on data freshness measured in seconds, not hours
- They have SQL-proficient engineers who do not want to write and maintain Java or Scala stream processing code
- They need to serve query results directly from the streaming layer, not just process data and write it elsewhere
- They are hitting cost or complexity limits with their current OLAP or Flink-based architecture
If you are processing fewer than a few hundred events per second and a 5-minute delay is acceptable, a batch pipeline or a lightweight change data capture setup is likely sufficient. Streaming databases solve the problem that appears when the business demands both low latency and the ability to query current state using SQL.
How Do Streaming Databases Compare to Apache Flink in Production?
Apache Flink is the most widely deployed stream processing framework, used by Alibaba, Netflix, DoorDash, and many others at massive scale. Streaming databases like RisingWave differ in three key ways:
- Built-in storage - RisingWave stores results in materialized views that you can query directly. Flink requires a separate database or cache to serve query results.
- PostgreSQL-compatible SQL - RisingWave uses standard SQL that works with existing PostgreSQL tools and drivers. Flink SQL exists but has a different syntax and operational model.
- Simpler operations - A streaming database is a single system to deploy, monitor, and scale. Flink requires managing a cluster plus external state stores plus output databases.
The tradeoff: Flink has a more mature ecosystem, broader connector support, and is proven at the extreme scales of Alibaba and Netflix. Streaming databases are a better fit for teams that prioritize operational simplicity and SQL-first development.
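"Built-in storage" has a concrete consequence: the serving path is an ordinary SQL query against a materialized view, issued from any PostgreSQL client (psql, JDBC, psycopg). The view and columns below are hypothetical:

```sql
-- Reads current, incrementally maintained results; no separate
-- serving database or cache layer is involved.
SELECT customer_id, lifetime_value
FROM customer_ltv
ORDER BY lifetime_value DESC
LIMIT 10;
```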
For a deeper comparison, see our stream processing systems analysis.
What Are the Biggest Risks of Running a Streaming Database in Production?
Based on the case studies above, teams report three primary risk areas:
- Ecosystem maturity - Streaming databases are a newer category than Flink or Kafka Streams. Connector coverage and third-party tooling are growing but not yet at parity.
- Operational knowledge - Fewer engineers have production experience with streaming databases than with Flink or Spark Streaming. Teams may need to invest in training.
- Edge cases at extreme scale - While companies like Tencent run streaming databases across tens of thousands of machines, the largest proven deployments (billions of events per second) are still on Flink.
Companies like Metabit, Siemens, GDU Labs, and Tencent have demonstrated that these risks are manageable, particularly when the alternative is maintaining a complex multi-system architecture that creates its own reliability and cost risks.
How Do You Get Started with a Streaming Database?
The lowest-friction starting point is identifying one batch pipeline or scheduled query that your team wishes were real time. Common first use cases include:
- Real-time dashboards that currently refresh on a schedule
- Alerting or monitoring systems that poll a database
- Feature computation for ML models that uses stale data
- CDC-based synchronization between databases that runs in batch
Convert that single use case to a streaming materialized view. If the results are good, expand from there. Most of the companies in this article started with a single use case before expanding streaming databases across their organization.
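As a sketch of what one such conversion looks like, assuming a hypothetical orders stream already defined as a source:

```sql
-- Before: a dashboard query re-executed on a schedule.
--   SELECT region, SUM(amount) FROM orders
--   WHERE order_time > NOW() - INTERVAL '1 day'
--   GROUP BY region;

-- After: the same logic as a continuously maintained materialized view.
CREATE MATERIALIZED VIEW sales_by_region_daily AS
SELECT region,
       window_start,
       SUM(amount) AS total_sales
FROM TUMBLE(orders, order_time, INTERVAL '1 day')
GROUP BY region, window_start;
```

The dashboard then reads `SELECT * FROM sales_by_region_daily` and always sees current numbers, instead of triggering a full scan on every refresh.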
Conclusion
Streaming databases are moving from early adoption to mainstream production use. The 10 case studies in this article span quantitative trading, industrial IoT, identity resolution, cloud infrastructure, food delivery, fraud prevention, manufacturing, e-commerce, media, and banking. The common thread: every company reduced latency, simplified operations, or cut costs, and most achieved all three.
Key takeaways:
- Metabit cut monitoring costs by 95% and reduced latency to sub-second with fewer than 10 CPU cores
- Siemens dropped data latency from hours to seconds and saved over 50% on infrastructure
- Tencent reported major TPS improvements over their Flink-based system after migrating
- Riskified's migration from ksqlDB highlights real-world limitations of simpler streaming SQL tools
- SQL-based streaming consistently reduces operational complexity compared to Java/Scala-based stream processing
The streaming database category is still young, but these production deployments prove the architecture works at scale across diverse industries.
Ready to try a streaming database in production? Try RisingWave Cloud free, with no credit card required. Sign up here.
Join our Slack community to ask questions and connect with other stream processing developers.

