Customer Story

GDU Labs: 3.5 Billion Events Powering 750 Million Verified Profiles

GDU Labs is a data platform company that unifies fragmented identity data into verified profiles. With RisingWave in production for 2 years, they process 3.5 billion events to maintain 750 million entities with sub-second freshness.

3.5B Events Processed
750M Unified Entities
15K/s Peak Throughput
<1s Update Latency

The Challenge

What challenge did GDU Labs face with batch data processing?

GDU Labs aggregates person, employment, and organization data from multiple sources into unified entities. Their previous system relied on scheduled batch refreshes, causing 45-minute to hour-long delays before updates appeared. Customers who saw fresh data on LinkedIn but stale data in GDU Labs' platform lost trust in the product.

They had systems running in BigQuery and PostgreSQL, lacked clear data lineage, and needed to reduce “time to first happiness” for user onboarding. They evaluated Materialize and considered building the pipeline themselves with async background jobs, but RisingWave's PostgreSQL compatibility and SQL abstraction won them over.

“Our prototype when we did test RisingWave, it just worked. It delivered on exactly what the website said. The abstraction they provided, they nailed it.”
Alex Robin
Co-Founder & CTO, GDU Labs

The Solution

How does GDU Labs use RisingWave in production?

GDU Labs runs RisingWave as a critical production component between PostgreSQL and downstream sinks including OpenSearch, BigQuery, and application databases. They process 3.5 billion events across 1.7 billion data layers, materializing them into 750 million unified entities.

Their multi-cloud architecture spans GCP (data flow, Pub/Sub) and AWS (PostgreSQL, OpenSearch). As a Ruby on Rails shop, they query RisingWave through Active Record, which works because RisingWave is PostgreSQL-compatible. Their DAG contains approximately 50-60 relations, and a comprehensive observability pipeline feeds metrics into Datadog for alerting and correlation.

Table syncs

Zero-downtime DAG changes using the table sync pattern GDU Labs pioneered, now documented in RisingWave docs.

Unaligned joins

Handle join amplification on hot keys without blocking the entire pipeline during high-cardinality operations.

Indexes

Efficient primary key lookups that power fast queries across 750 million entities without full table scans.

Backfill control

Managed rollouts during business hours instead of risky off-hours deployments with unpredictable downtime.

Wide tables

Denormalized OpenSearch documents materialized in RisingWave, ready for downstream search indexing.

Subscriptions

Real-time change tracking that notifies downstream consumers the instant data is updated in the pipeline.

Decoupled sinks

Buffered external delivery to OpenSearch, BigQuery, and PostgreSQL without coupling sink performance to processing.
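
To make the decoupled-sinks idea above concrete, here is a minimal conceptual sketch in Python. It is not RisingWave's implementation or API. The `BufferedSink` class, its methods, and the `entity_id` field are hypothetical names invented for illustration. The sketch only shows the general pattern: the processing step appends rows to a buffer and moves on, while a separate delivery step drains the buffer in batches at whatever pace the external system allows.

```python
from collections import deque


class BufferedSink:
    """Hypothetical wrapper: processing writes to a buffer, delivery drains it separately."""

    def __init__(self, deliver, batch_size=2):
        self.buffer = deque()
        self.deliver = deliver        # callable that sends one batch to the external system
        self.batch_size = batch_size

    def write(self, row):
        # Called by the processing step; it never waits on the external system.
        self.buffer.append(row)

    def flush_once(self):
        # Called by the delivery step; sends at most one batch and returns its size.
        n = min(self.batch_size, len(self.buffer))
        batch = [self.buffer.popleft() for _ in range(n)]
        if batch:
            self.deliver(batch)
        return len(batch)


delivered = []  # stands in for OpenSearch, BigQuery, or PostgreSQL
sink = BufferedSink(deliver=delivered.append, batch_size=2)

# Processing emits five rows without waiting for any delivery to happen.
for i in range(5):
    sink.write({"entity_id": i})
pending_before = len(sink.buffer)

# Delivery catches up later, one batch at a time, until the buffer is empty.
while sink.flush_once():
    pass
```

In this toy version a slow sink only makes the buffer grow; it does not slow down `write`. A production system would also need durability and backpressure limits, which the sketch leaves out.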

“You make an update to one source, and before you're even able to click refresh, the data is updated. It was magical.”
Alex Robin
Co-Founder & CTO, GDU Labs

Results

What results did GDU Labs achieve?

After 2 years in production, GDU Labs processes 3.5 billion events with a peak throughput of 15,000 rows/second. Data updates that previously took 45 minutes to an hour now appear with sub-second latency. They plan to scale 3x to 1.5-2 billion entities over the next 12 months.
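
As a rough back-of-the-envelope check on these figures, the short Python sketch below derives a few ratios from the numbers quoted in this story. The variable names are illustrative. The only assumption beyond the text is treating "sub-second" as an upper bound of 1 second, so the freshness ratio is a lower bound.

```python
# Figures quoted in the story.
events = 3_500_000_000          # events processed
entities = 750_000_000          # unified entities today
old_delay_s = 45 * 60           # previous best-case delay: 45 minutes
new_delay_s = 1                 # assumption: "sub-second" treated as at most 1 s

# Average number of events behind each unified entity.
events_per_entity = events / entities

# Lower bound on the freshness improvement (the old delay could reach an hour).
min_freshness_gain = old_delay_s / new_delay_s

# Growth factors implied by the 1.5-2 billion entity target.
growth_low = 1_500_000_000 / entities
growth_high = 2_000_000_000 / entities

print(f"{events_per_entity:.2f} events per entity")
print(f"at least {min_freshness_gain:.0f}x fresher")
print(f"planned growth: {growth_low:.1f}x to {growth_high:.1f}x")
```

On these numbers, each entity is backed by between four and five events on average, and the stated entity target corresponds to roughly 2x to 2.7x the current count.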

| Metric | Before | After (RisingWave) |
|---|---|---|
| Data freshness | 45 min - 1 hour | Sub-second |
| Peak throughput | Limited | 15,000 rows/s |
| Entities managed | Smaller scale | 750 million |
| Schema changes | Downtime, off-hours | Zero-downtime, business hours |
| Downstream integrations | Built manually | Native sinks |
| Planned scale | Current | 3x growth |
