
Customer Story
GDU Labs is a data platform company that unifies fragmented identity data into verified profiles. With RisingWave in production for 2 years, they process 3.5 billion events to maintain 750 million entities with sub-second freshness.
The Challenge
GDU Labs aggregates person, employment, and organization data from multiple sources into unified entities. Their previous system relied on scheduled batch refreshes, causing 45-minute to hour-long delays before updates appeared. Customers who saw fresh data on LinkedIn but stale data in GDU Labs' platform lost trust in the product.
They had systems running in BigQuery and PostgreSQL, lacked clear data lineage, and needed to reduce “time to first happiness” for user onboarding. They evaluated Materialize and considered building the pipeline with async background jobs, but RisingWave's PostgreSQL compatibility and SQL abstraction won them over.
The Solution
GDU Labs runs RisingWave as a critical production component between PostgreSQL and downstream sinks including OpenSearch, BigQuery, and application databases. They process 3.5 billion events across 1.7 billion data layers, materializing them into 750 million unified entities.
Their multi-cloud architecture spans GCP (data flow, Pub/Sub) and AWS (PostgreSQL, OpenSearch). As a Ruby on Rails shop, they use Active Record with RisingWave thanks to its PostgreSQL compatibility. Their DAG contains roughly 50 to 60 relations, and a comprehensive observability pipeline feeds metrics into Datadog for alerting and correlation.
Zero-downtime DAG changes using the table sync pattern GDU Labs pioneered, now documented in RisingWave docs.
Handle join amplification on hot keys without blocking the entire pipeline during high-cardinality operations.
Efficient primary key lookups that power fast queries across 750 million entities without full table scans.
Managed rollouts during business hours instead of risky off-hours deployments with unpredictable downtime.
Denormalized OpenSearch documents materialized in RisingWave, ready for downstream search indexing.
Real-time change tracking that notifies downstream consumers the instant data is updated in the pipeline.
Buffered external delivery to OpenSearch, BigQuery, and PostgreSQL without coupling sink performance to processing.
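The idea behind buffered external delivery, where a slow sink does not stall upstream processing, can be illustrated with a small sketch. This is not RisingWave's implementation or GDU Labs' code; it is a generic bounded-buffer pattern, and the names `BufferedSink`, `deliver`, `max_buffered`, `batch_size`, and `run_demo` are hypothetical and chosen for illustration only.

```python
import queue
import threading
import time
from typing import Callable, List


class BufferedSink:
    """Hypothetical sketch: decouple a producer from a slow external sink."""

    def __init__(self, deliver: Callable[[List[dict]], None],
                 max_buffered: int = 10_000, batch_size: int = 500) -> None:
        self._deliver = deliver          # function that writes one batch to the sink
        self._batch_size = batch_size
        # Bounded queue: the producer blocks only once the buffer is full.
        self._queue = queue.Queue(maxsize=max_buffered)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, row: dict) -> None:
        """Enqueue a row; returns immediately unless the buffer is full."""
        self._queue.put(row)

    def close(self) -> None:
        """Signal end of input and wait until all buffered rows are delivered."""
        self._queue.put(None)            # sentinel marking the end of the stream
        self._worker.join()

    def _drain(self) -> None:
        batch: List[dict] = []
        while True:
            item = self._queue.get()
            if item is None:
                break
            batch.append(item)
            # Flush when the batch is full or nothing else is waiting.
            if len(batch) >= self._batch_size or self._queue.empty():
                self._deliver(batch)
                batch = []
        if batch:                        # flush rows left over at shutdown
            self._deliver(batch)


def run_demo() -> List[dict]:
    """Push 50 rows through a deliberately slow sink and return what arrived."""
    delivered: List[dict] = []

    def slow_deliver(batch: List[dict]) -> None:
        time.sleep(0.01)                 # simulate a slow external system
        delivered.extend(batch)

    sink = BufferedSink(slow_deliver, max_buffered=100, batch_size=10)
    for i in range(50):
        sink.write({"entity_id": i})     # producer is not slowed by the sink
    sink.close()
    return delivered
```

In this sketch the producer only waits when the buffer fills up, so short slowdowns in the external system are absorbed by the queue rather than propagated upstream. A single worker thread keeps delivery in the order rows were written.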
Results
After 2 years in production, GDU Labs processes 3.5 billion events with peak throughput of 15,000 rows/second. Data updates that previously took 45 minutes or more now appear with sub-second latency. They're planning to scale 3x to 1.5-2 billion entities over the next 12 months.
| Metric | Before | After (RisingWave) |
|---|---|---|
| Data freshness | 45 min - 1 hour | Sub-second |
| Peak throughput | Limited | 15,000 rows/s |
| Entities managed | Smaller scale | 750 million |
| Schema changes | Downtime, off-hours | Zero-downtime, business hours |
| Downstream integrations | Built manually | Native sinks |
| Planned scale | Current | 3x growth |