Every year, more engineering teams hit the same wall: their batch pipelines cannot keep up with the speed their business demands. Nightly ETL jobs that once felt adequate now create blind spots where fraud goes undetected, dashboards show stale numbers, and customers wait for data that should be instant.
Streaming databases offer a way out. A streaming database is a system that continuously ingests, processes, and serves data in real time using SQL, combining the roles of a stream processor and a database into a single layer. Instead of stitching together Kafka, Flink, Redis, and PostgreSQL, teams write SQL queries that stay up to date as new data arrives.
But does this actually work in production? In this article, we examine 10 real-world case studies from companies running streaming databases at scale. These span quantitative trading, industrial IoT, entity resolution, gaming infrastructure, fraud prevention, and more. Each case study covers the problem, the architecture, and the measurable results.
1. Metabit Trading: Real-Time Risk Monitoring with 95% Cost Reduction
Industry: Quantitative Finance
Scale: Billions of dollars in managed assets, tens of thousands of QPS
Metabit Trading is a quantitative investment firm managing over $1 billion in assets, with team members from Stanford, CMU, Facebook, and Google. Their trading systems generate enormous volumes of data that need continuous monitoring for risk control and compliance.
The Problem
Metabit originally used an OLAP database (referred to as "System X" in their published case study) for real-time monitoring. The system had three critical limitations:
- Query concurrency capped at ~100 QPS - a single expensive query could exhaust cluster resources
- Only eventual consistency - joins across tables were risky because data could be stale
- Horizontal scaling degraded performance - inter-shard communication created bottlenecks as the cluster grew
For a trading firm where milliseconds matter, these constraints were unacceptable.
The Architecture
Metabit deployed RisingWave as their streaming database. The data flow is straightforward: trading machines push business data through Kafka, RisingWave creates materialized views on top of Kafka source tables to calculate trading metrics under different aggregation conditions, compares them with thresholds, and generates alert results. RisingWave writes alert data back to Kafka, and the alert service listens to the Kafka stream and sends notifications to monitoring personnel.
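A pipeline of this shape can be sketched in three RisingWave-style SQL statements. The topic names, fields, and threshold below are illustrative placeholders, not Metabit's actual schema:

```sql
-- Ingest trading events from Kafka (illustrative topic and schema).
CREATE SOURCE trade_events (
    account_id VARCHAR,
    symbol     VARCHAR,
    notional   DOUBLE PRECISION,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'trades',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Incrementally maintained per-account exposure.
CREATE MATERIALIZED VIEW account_exposure AS
SELECT account_id,
       SUM(notional) AS exposure
FROM trade_events
GROUP BY account_id;

-- Emit alerts back to Kafka whenever exposure crosses a threshold.
CREATE SINK risk_alerts AS
SELECT account_id, exposure
FROM account_exposure
WHERE exposure > 1000000
WITH (
    connector = 'kafka',
    topic = 'risk-alerts',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'account_id'
) FORMAT UPSERT ENCODE JSON;
```

Because the materialized view is maintained incrementally, the sink emits an alert as soon as a new trade pushes an account over the threshold, without rescanning history.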
Results
- 95% cost reduction compared to the original OLAP system without materialized views
- 70% cost reduction compared to an optimized version of the OLAP system
- Sub-second alert latency, down from minute-level monitoring
- Fewer than 10 CPU cores required, versus hundreds under the previous system
- Strong consistency guarantees through RisingWave's barrier-based distributed consistency mechanism
The key technical advantage was incremental computation. Instead of re-scanning the entire dataset for every alert check, RisingWave processes only new or modified data, which is why a handful of cores can replace hundreds.
2. Siemens (via Hivemind): Industrial IoT with Hours-to-Seconds Latency
Industry: Industrial Manufacturing / IoT
Scale: Thousands of field devices and sensors
Siemens, one of the world's largest industrial conglomerates, needed to process sensor data from thousands of field devices across manufacturing facilities. The data pipeline was critical for quality monitoring, anomaly detection, and operational efficiency.
The Problem
Siemens relied on nightly batch jobs for data synchronization and cleaning. This created a cascade of issues:
- Processing pipelines were long and fragile
- Data latency measured in hours meant operators could not react to issues in real time
- Complex script-based cleaning logic was expensive to maintain
- Dedicated scheduling clusters and intermediate data landing layers drove up infrastructure costs
The Architecture
Hivemind Technologies, a data infrastructure company, built a streaming Medallion architecture using RisingWave for the Siemens deployment. The architecture has three layers:
- Bronze Layer: Raw data ingestion via Kafka with no pre-processing, preserving maximum fidelity for traceability and auditing
- Silver Layer: Real-time data transformation using SQL rules instead of batch scripts, including field name normalization, unit conversion, field enrichment, and invalid data filtering
- Gold Layer: Real-time aggregation via materialized views, delivering results simultaneously to dashboards, data lakes, Kafka topics, and BI systems
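Expressed as RisingWave-style SQL, the three layers might look like the following sketch. The field names, units, and validity bounds are hypothetical, not Siemens' actual rules:

```sql
-- Bronze: raw sensor readings from Kafka, ingested as-is.
CREATE SOURCE raw_sensor_readings (
    DeviceID   VARCHAR,
    temp_f     DOUBLE PRECISION,
    reading_ts TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'sensors-raw',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Silver: normalize field names, convert units, filter invalid rows.
CREATE MATERIALIZED VIEW sensor_readings_clean AS
SELECT DeviceID                   AS device_id,
       (temp_f - 32) * 5.0 / 9.0 AS temp_celsius,
       reading_ts
FROM raw_sensor_readings
WHERE temp_f IS NOT NULL
  AND temp_f BETWEEN -40 AND 500;

-- Gold: real-time aggregation over the cleaned stream.
CREATE MATERIALIZED VIEW line_temperature_1min AS
SELECT device_id,
       window_start,
       AVG(temp_celsius) AS avg_temp
FROM TUMBLE(sensor_readings_clean, reading_ts, INTERVAL '1 minute')
GROUP BY device_id, window_start;
```

Each layer is a view over the one below it, so a correction to a silver-layer rule propagates to the gold layer automatically.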
Results
- Latency dropped from hours to seconds
- Infrastructure savings exceeded 50% by eliminating scheduling clusters and intermediate staging layers
- Cleaning logic moved from complex script stacks to SQL rules, drastically reducing maintenance overhead
- Real-time decisions became possible based on live materialized views rather than stale batch reports
This deployment represents a paradigm shift: from offline to real-time, from brittle scripts to declarative SQL, and from rigid scheduling to continuous streaming.
3. GDU Labs: Real-Time Entity Resolution at Scale
Industry: Data Infrastructure / Identity Resolution
Scale: Millions of fragmented records unified into verified profiles
GDU Labs builds identity resolution technology that turns fragmented data from multiple sources into verified, unified profiles. When your customer data lives in a dozen different systems with inconsistent naming, duplicate entries, and conflicting attributes, entity resolution is what stitches it all together.
The Problem
Entity resolution is computationally intensive. Matching and merging records across sources requires continuous comparison, scoring, and deduplication. Batch-based approaches meant that profile updates were always stale: a customer who updated their information in one system would not see the change reflected across the platform until the next batch run.
GDU Labs needed a streaming foundation that could deliver fresh, reliable data at scale, making real-time updates a core part of their product experience rather than an afterthought.
The Architecture
GDU Labs chose RisingWave and took it from prototype to a production-grade infrastructure dependency over two years. The streaming database handles continuous ingestion of data changes from multiple sources, applies matching and transformation logic in SQL, and maintains always-current materialized views that represent the unified profile state.
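GDU Labs has not published its matching rules, but the general shape of SQL-based entity resolution over continuously ingested sources can be sketched like this. The tables, keys, and the exact-email match rule are deliberate simplifications; production systems typically use probabilistic scoring rather than exact matches:

```sql
-- Assume crm_contacts and billing_accounts are kept current by
-- upstream CDC sources (hypothetical schemas).

-- Candidate matches: records from two systems sharing a normalized email.
CREATE MATERIALIZED VIEW matched_profiles AS
SELECT c.contact_id,
       b.account_id,
       LOWER(c.email) AS match_key,
       c.full_name,
       b.billing_name
FROM crm_contacts c
JOIN billing_accounts b
  ON LOWER(c.email) = LOWER(b.email);

-- Unified profile state: one row per match key, always current.
CREATE MATERIALIZED VIEW unified_profiles AS
SELECT match_key,
       COUNT(*)       AS source_records,
       MAX(full_name) AS display_name
FROM matched_profiles
GROUP BY match_key;
```

When a record changes in either upstream system, only the affected match keys are recomputed, which is what keeps profiles fresh without batch runs.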
Results
- Real-time profile updates instead of batch-delayed synchronization
- Production-grade reliability after two years of continuous operation
- SQL-based matching logic that is easier to maintain and iterate than custom application code
- Scalable architecture that grows with data volume without requiring a re-architecture
GDU Labs was the first company featured in RisingWave's Customer Spotlight series, speaking publicly about how streaming data became mission-critical for their product.
4. Tencent: QoS Infrastructure Across Tens of Thousands of Machines
Industry: Cloud Computing / Gaming
Scale: Tens of thousands of machines, high-throughput metrics
Tencent Cloud operates infrastructure at a scale few organizations ever reach. Their Quality of Service (QoS) framework needs to monitor metrics from tens of thousands of machines, detect anomalies in real time, and dynamically allocate resources to maintain service quality for hundreds of millions of users.
The Problem
Tencent's infrastructure engineering team originally built their QoS system using Apache Kafka for data streaming, Apache Flink for stateful stream processing with lookup joins, and MySQL as both an external sink and operational database. The system used an event-driven state machine formulated in SQL.
As scale increased, three problems emerged:
- Flink's TPS performance degraded as scalability demands grew, particularly with lookup joins
- Debugging complex SQL involving multiway joins and nested subqueries was extremely difficult within Flink's architecture
- RocksDB storage costs escalated - Flink persists streaming state to block storage, so storage spend grew in direct proportion to data volume
The Architecture
Tencent replaced Flink with RisingWave, taking advantage of its unified architecture that eliminates the need for external data sinking. RisingWave's PostgreSQL compatibility enabled straightforward SQL development and query debugging. The system integrated with Tencent's Kubernetes-based container platform (TKE) and object storage (TOS) for optimized cluster management.
Results
- Substantially higher TPS than the Flink-based system (described in Tencent's writeup as a logarithmic-level improvement)
- Consolidated maintenance from complex distributed operations to deterministic SQL-based workflows
- Optimized compute and storage costs through intelligent task scheduling
- Robust fault tolerance with built-in load balancing across distributed environments
The Tencent case is significant because it shows a direct comparison: a team that had already built a working system in Flink chose to migrate to a streaming database because of performance, debuggability, and cost advantages.
5. DoorDash: Hundreds of Billions of Events Per Day
Industry: Food Delivery / Logistics
Scale: Hundreds of billions of events per day, 99.99% delivery rate
DoorDash is one of the largest food delivery platforms in the United States, processing an enormous volume of real-time events related to orders, driver locations, restaurant status, and customer interactions.
The Problem
As DoorDash scaled, they needed a real-time event processing system that could handle hundreds of billions of events per day while maintaining sub-second latency for critical operational decisions like driver dispatch and ETA calculation.
The Architecture
DoorDash built its real-time processing platform on Apache Kafka and Apache Flink. Kafka handles event ingestion and routing, while Flink processes streams and writes results to S3 for loading into Snowflake. The team shifted from relying on managed AWS services to open-source frameworks that gave them more control over performance tuning.
Results
- Hundreds of billions of events processed per day
- 99.99% delivery rate for event processing
- Sub-second latency for critical operational decisions
- Scalable architecture that evolved from managed services to open-source frameworks as scale demanded
DoorDash's case illustrates the scale at which streaming architectures operate in production. Their journey also highlights a common pattern: starting with managed services and migrating to open-source tools as the team's requirements outgrow what managed offerings can deliver.
6. Riskified: Migrating from ksqlDB to Flink for Fraud Prevention
Industry: E-commerce Fraud Prevention
Scale: Real-time fraud scoring across millions of transactions
Riskified provides fraud prevention for e-commerce merchants, making instant approve/decline decisions on orders. Their streaming infrastructure directly impacts revenue: false positives mean rejected legitimate customers, and false negatives mean chargebacks.
The Problem
Riskified initially deployed Confluent's ksqlDB for their streaming SQL needs. Once in production, critical limitations surfaced:
- Schema evolution was broken - ksqlDB did not automatically include new fields, and fixing it required dropping and recreating streams, disrupting offsets and production pipelines
- No resource isolation - streaming query resources were shared, so one expensive task could impact all other tasks on the same node
- Type conversion issues - ksqlDB internally converted Avro Enum types to Strings and interpreted fields as nullable, causing deserialization errors in downstream consumers
The Architecture
Riskified migrated to AWS Managed Service for Apache Flink. They built a single, flexible Flink application where end-users can modify input topics, SQL processing logic, and output destinations through runtime properties. Kafka remained the input and output layer.
Results
- Schema evolution support - new fields are handled gracefully without pipeline disruption
- Job-level resource isolation - each streaming job runs independently
- Automatic scaling and built-in monitoring through the managed service
- Self-serve streaming SQL for internal teams, reducing the bottleneck on the platform team
This case study is particularly instructive because it shows the real-world limitations of ksqlDB in production and why teams migrate to more capable streaming systems.
7. Coreflux: Real-Time Production Line Monitoring in Manufacturing
Industry: Manufacturing / IoT
Scale: Thousands of sensors across production lines
Coreflux is an IoT platform company that provides edge-to-cloud connectivity for manufacturing environments. Their partnership with RisingWave targets a common manufacturing challenge: turning raw sensor data into actionable production insights.
The Problem
Modern production lines generate a constant stream of sensor data, including bottle presence detection, filling status, temperature, pressure, and vibration readings. Traditional approaches store this data in a time-series database and run periodic queries, but this introduces latency that prevents real-time anomaly detection and process optimization.
The Architecture
Sensors on production lines collect operational data, serialize it into JSON, and publish to Coreflux's MQTT broker. In the cloud, RisingWave subscribes to the relevant MQTT topics, ingesting real-time data for processing. Materialized views continuously compute KPIs, detect anomalies, and track production efficiency metrics.
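A sketch of this MQTT-to-materialized-view flow, with hypothetical broker address, topic, and sensor fields:

```sql
-- Subscribe to an MQTT topic of production-line sensor JSON.
CREATE SOURCE line_sensors (
    station_id     VARCHAR,
    bottle_present BOOLEAN,
    fill_level     DOUBLE PRECISION,
    temperature    DOUBLE PRECISION,
    reading_ts     TIMESTAMP
) WITH (
    connector = 'mqtt',
    url = 'tcp://broker.example.com:1883',
    topic = 'plant/line1/sensors'
) FORMAT PLAIN ENCODE JSON;

-- Continuously computed KPI: underfilled bottles per station per minute.
CREATE MATERIALIZED VIEW fill_anomalies_1min AS
SELECT station_id,
       window_start,
       COUNT(*) FILTER (WHERE bottle_present AND fill_level < 0.95)
           AS underfilled
FROM TUMBLE(line_sensors, reading_ts, INTERVAL '1 minute')
GROUP BY station_id, window_start;
```

The anomaly rule is plain SQL, which is what makes it maintainable by manufacturing engineers rather than stream processing specialists.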
Results
- Real-time production monitoring replacing periodic batch queries
- MQTT-native ingestion eliminating the need for a separate Kafka layer in edge-to-cloud scenarios
- SQL-based anomaly detection accessible to manufacturing engineers without stream processing expertise
- Seamless edge-to-cloud pipeline combining Coreflux's IoT capabilities with RisingWave's stream processing
This deployment is notable because it demonstrates streaming databases in the OT (Operational Technology) world, not just IT. Manufacturing engineers who know SQL can define monitoring rules without learning Java or Flink APIs.
8. Alibaba: Billions of Events Per Second During Singles' Day
Industry: E-commerce
Scale: Billions of events per second during peak shopping events
Alibaba, the world's largest e-commerce company by GMV, operates Blink, its internal fork of Apache Flink (whose improvements have since been contributed back to the Flink project), at extraordinary scale. Their Singles' Day shopping festival generates peak loads that few other events in the world can match.
The Problem
During Singles' Day, Alibaba needs to process billions of events per second for real-time inventory management, fraud detection, personalized recommendations, and dynamic pricing. Traditional batch systems simply cannot operate at this speed during peak events.
The Architecture
Alibaba runs Blink (their Flink fork) as a managed service for both internal teams and cloud customers. The system handles real-time event processing, windowed aggregations, and complex event pattern matching across their entire e-commerce ecosystem.
Results
- Billions of events per second processed during peak loads
- Real-time inventory updates preventing overselling during flash sales
- Dynamic pricing and recommendations adapting to user behavior in real time
- Production-hardened fork of Flink that has been validated at scales most organizations will never reach
Alibaba's deployment is the extreme end of the spectrum. It demonstrates that streaming architectures are not just for startups or simple use cases; they operate at the largest scales in production.
9. Netflix: Real-Time Personalization Across 200+ Million Subscribers
Industry: Media / Entertainment
Scale: 200+ million subscribers, trillions of events per day
Netflix's recommendation engine is one of the most sophisticated real-time data systems in the world. Every interaction, including views, pauses, scrolls, searches, and ratings, feeds into a real-time pipeline that personalizes what each subscriber sees.
The Problem
With over 200 million subscribers generating trillions of daily events, Netflix needs to process user interactions in real time to update recommendations, personalize thumbnails, and optimize content delivery. Batch processing creates a lag between user behavior and personalization, degrading the experience.
The Architecture
Netflix built a multi-layered real-time data infrastructure that has gone through four major innovation phases. The stack includes Apache Kafka for event ingestion, Apache Flink for stream processing, and custom internal tools for managing thousands of streaming jobs. The team operates Keystone, their real-time event processing platform, alongside specialized systems for A/B testing, content delivery optimization, and subscriber-level feature computation.
Results
- Trillions of events processed daily across the subscriber base
- Real-time recommendation updates that respond to user behavior within seconds
- Thousands of concurrent streaming jobs managed through internal tooling
- Continuous personalization driving measurable improvements in engagement and retention
Netflix's case shows that streaming is not optional for modern personalization at scale. Their entire product experience depends on processing events as they happen.
10. Capital One: Real-Time Fraud Detection in Banking
Industry: Financial Services / Banking
Scale: Millions of transactions per day with real-time decisioning
Capital One, one of the largest banks in the United States, built a context-specific fraud detection system that analyzes events in real time to flag potential fraud as it happens, rather than through post-facto batch analysis.
The Problem
Large financial institutions can handle on the order of 1.3 million transactions per second during peak periods. Traditional rule-based fraud detection built on batch processing catches fraudulent transactions only hours or days after they occur, by which time recovery is often impossible.
The Architecture
Capital One's system ingests transaction events in real time and applies a combination of rule-based and machine learning-based detection. The streaming infrastructure processes transactions as they arrive, enriching them with contextual data (location, device fingerprint, merchant category, transaction history) and scoring them against fraud models before the transaction authorization window closes.
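Capital One has not published its schemas, but the enrich-then-score pattern described here is commonly written as a streaming join. Everything below, including the toy scoring rule, is illustrative; production systems score against ML models rather than CASE expressions:

```sql
-- Hypothetical relations: a transaction stream plus reference data
-- (merchants) and a precomputed per-card history view (card_history).
CREATE MATERIALIZED VIEW scored_transactions AS
SELECT t.txn_id,
       t.card_id,
       t.amount,
       m.category AS merchant_category,
       h.avg_amount_30d,
       -- Toy rule-based score for illustration only.
       CASE
         WHEN t.amount > 10 * h.avg_amount_30d THEN 'high_risk'
         WHEN m.category = 'gambling' AND t.amount > 500 THEN 'review'
         ELSE 'normal'
       END AS risk_flag
FROM transactions t
JOIN merchants    m ON t.merchant_id = m.merchant_id
JOIN card_history h ON t.card_id = h.card_id;
```

The point of the streaming join is that enrichment happens as each transaction arrives, so the score is ready inside the authorization window.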
Results
- Real-time fraud scoring before transaction authorization completes
- Context-enriched decisioning using streaming joins across multiple data sources
- Reduced false positive rates compared to batch-based rule systems
- Immediate response capability that stops fraud in progress rather than detecting it after the fact
Capital One's deployment demonstrates that even heavily regulated industries with conservative technology cultures are adopting streaming architectures for mission-critical workloads.
Patterns Across Production Deployments
Looking across these 10 case studies, several patterns emerge:
Common Migration Triggers
- Latency gaps - Batch pipelines running hourly or nightly while the business needs seconds or milliseconds
- Cost escalation - OLAP databases or Flink clusters becoming expensive as data volumes grow
- Operational complexity - Maintaining separate systems for ingestion, processing, storage, and serving
- Debugging difficulty - Complex stream processing code in Java or Scala that is hard to troubleshoot
Architecture Choices
| Approach | Used By | Strengths | Limitations |
| --- | --- | --- | --- |
| Streaming database (RisingWave) | Metabit, Siemens, GDU Labs, Tencent, Coreflux | SQL interface, built-in storage, lowest operational complexity | Newer ecosystem |
| Apache Flink | DoorDash, Alibaba, Netflix, Riskified | Mature ecosystem, massive scale proven | Requires Java/Scala expertise, separate storage needed |
| ksqlDB | Riskified (migrated away) | Kafka-native, simple setup | Schema evolution issues, no resource isolation, limited analytics |
Key Metrics Companies Report
- Cost reductions of 50-95% when moving from OLAP or batch systems to streaming databases
- Latency improvements from hours to seconds (or seconds to milliseconds)
- Team productivity gains when switching from Java-based stream processing to SQL-based approaches
- Infrastructure simplification by eliminating scheduling clusters, intermediate storage, and cache layers
What Types of Companies Should Use a Streaming Database in Production?
A streaming database fits best when your team needs real-time data processing but does not want to operate a complex stack of separate tools. Companies that benefit most share these characteristics:
- Their core product depends on data freshness measured in seconds, not hours
- They have SQL-proficient engineers who do not want to write and maintain Java or Scala stream processing code
- They need to serve query results directly from the streaming layer, not just process data and write it elsewhere
- They are hitting cost or complexity limits with their current OLAP or Flink-based architecture
If you are processing fewer than a few hundred events per second and a 5-minute delay is acceptable, a batch pipeline or a lightweight change data capture setup is likely sufficient. Streaming databases solve the problem that appears when the business demands both low latency and the ability to query current state using SQL.
How Do Streaming Databases Compare to Apache Flink in Production?
Apache Flink is the most widely deployed stream processing framework, used by Alibaba, Netflix, DoorDash, and many others at massive scale. Streaming databases like RisingWave differ in three key ways:
- Built-in storage - RisingWave stores results in materialized views that you can query directly. Flink requires a separate database or cache to serve query results.
- PostgreSQL-compatible SQL - RisingWave uses standard SQL that works with existing PostgreSQL tools and drivers. Flink SQL exists but has a different syntax and operational model.
- Simpler operations - A streaming database is a single system to deploy, monitor, and scale. Flink requires managing a cluster plus external state stores plus output databases.
The tradeoff: Flink has a more mature ecosystem, broader connector support, and is proven at the extreme scales of Alibaba and Netflix. Streaming databases are a better fit for teams that prioritize operational simplicity and SQL-first development.
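"Built-in storage" has a concrete consequence: the serving path is an ordinary SQL query against a materialized view, issued from any PostgreSQL client (psql, JDBC, psycopg). The view and columns below are hypothetical:

```sql
-- Reads current, incrementally maintained results; no separate
-- serving database or cache layer is involved.
SELECT customer_id, lifetime_value
FROM customer_ltv
ORDER BY lifetime_value DESC
LIMIT 10;
```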
For a deeper comparison, see our stream processing systems analysis.
What Are the Biggest Risks of Running a Streaming Database in Production?
Based on the case studies above, teams report three primary risk areas:
- Ecosystem maturity - Streaming databases are a newer category than Flink or Kafka Streams. Connector coverage and third-party tooling are growing but not yet at parity.
- Operational knowledge - Fewer engineers have production experience with streaming databases than with Flink or Spark Streaming. Teams may need to invest in training.
- Edge cases at extreme scale - While companies like Tencent run streaming databases across tens of thousands of machines, the largest proven deployments (billions of events per second) are still on Flink.
Companies like Metabit, Siemens, GDU Labs, and Tencent have demonstrated that these risks are manageable, particularly when the alternative is maintaining a complex multi-system architecture that creates its own reliability and cost risks.
How Do You Get Started with a Streaming Database?
The lowest-friction starting point is identifying one batch pipeline or scheduled query that your team wishes were real time. Common first use cases include:
- Real-time dashboards that currently refresh on a schedule
- Alerting or monitoring systems that poll a database
- Feature computation for ML models that uses stale data
- CDC-based synchronization between databases that runs in batch
Convert that single use case to a streaming materialized view. If the results are good, expand from there. Most of the companies in this article started with a single use case before expanding streaming databases across their organization.
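As a sketch of what one such conversion looks like, assuming a hypothetical orders stream already defined as a source:

```sql
-- Before: a dashboard query re-executed on a schedule.
--   SELECT region, SUM(amount) FROM orders
--   WHERE order_time > NOW() - INTERVAL '1 day'
--   GROUP BY region;

-- After: the same logic as a continuously maintained materialized view.
CREATE MATERIALIZED VIEW sales_by_region_daily AS
SELECT region,
       window_start,
       SUM(amount) AS total_sales
FROM TUMBLE(orders, order_time, INTERVAL '1 day')
GROUP BY region, window_start;
```

The dashboard then reads `SELECT * FROM sales_by_region_daily` and always sees current numbers, instead of triggering a full scan on every refresh.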
Conclusion
Streaming databases are moving from early adoption to mainstream production use. The 10 case studies in this article span quantitative trading, industrial IoT, identity resolution, cloud infrastructure, food delivery, fraud prevention, manufacturing, e-commerce, media, and banking. The common thread: every company reduced latency, simplified operations, or cut costs, and most achieved all three.
Key takeaways:
- Metabit cut monitoring costs by 95% and reduced latency to sub-second with fewer than 10 CPU cores
- Siemens dropped data latency from hours to seconds and saved over 50% on infrastructure
- Tencent reported major TPS improvements over their Flink-based system after migrating
- Riskified's migration from ksqlDB highlights real-world limitations of simpler streaming SQL tools
- SQL-based streaming consistently reduces operational complexity compared to Java/Scala-based stream processing
The streaming database category is still young, but these production deployments prove the architecture works at scale across diverse industries.
Ready to try a streaming database in production? Try RisingWave Cloud free, with no credit card required. Sign up here.
Join our Slack community to ask questions and connect with other stream processing developers.

