A fintech data lakehouse with RisingWave and Apache Iceberg means using RisingWave to process streaming financial events in real time, then sinking the results as Iceberg tables on object storage—giving you sub-second operational queries on RisingWave and full historical analytics on Iceberg, without duplicating your data pipeline.
Why Fintech Needs a Lakehouse Architecture
Fintech companies face a bifurcated data problem. On one side: real-time operational needs—fraud scores, live balances, active KYC flags—that require sub-second query latency. On the other side: historical analytics, regulatory reporting, and ML feature stores that require querying months or years of data efficiently.
Traditional architectures solve this with two separate systems: a streaming database for real-time and a data warehouse for historical. This creates the classic data engineering headache: dual pipelines, data inconsistency, and high operational overhead.
The lakehouse pattern—pioneered by Delta Lake and Apache Iceberg—proposes a single storage layer (object storage such as S3) that supports both operational and analytical workloads. RisingWave's native Iceberg sink connects the streaming and batch worlds: streaming data flows through RisingWave's materialized views and is then written to Iceberg tables that Spark, Trino, Snowflake, or any other Iceberg-compatible engine can query.
Setting Up the Streaming Layer
```sql
-- Payment events from core banking CDC
CREATE SOURCE payment_events (
    payment_id VARCHAR,
    sender_account VARCHAR,
    receiver_account VARCHAR,
    amount DECIMAL(18,2),
    currency VARCHAR(3),
    payment_type VARCHAR,   -- 'ACH', 'WIRE', 'RTP', 'CARD'
    status VARCHAR,         -- 'INITIATED', 'SETTLED', 'FAILED', 'RETURNED'
    initiated_at TIMESTAMPTZ,
    settled_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'payment-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```
```sql
-- Fraud scores from ML inference service
CREATE SOURCE fraud_scores (
    payment_id VARCHAR,
    fraud_probability DECIMAL(5,4),
    model_version VARCHAR,
    scored_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'fraud-scores',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```
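Kafka-backed sources in RisingWave are not directly queryable with batch SELECTs, so a quick way to verify that events are actually flowing is a throwaway materialized view. The view name below is purely illustrative:

```sql
-- Hypothetical smoke test: create a disposable materialized view
-- over the source, confirm rows arrive, then drop it.
CREATE MATERIALIZED VIEW payment_events_probe AS
SELECT payment_id, status, initiated_at
FROM payment_events;

-- After a few seconds of ingestion:
SELECT COUNT(*) FROM payment_events_probe;

-- Clean up once ingestion is confirmed:
DROP MATERIALIZED VIEW payment_events_probe;
```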
Computing Materialized Views for Real-Time Serving
```sql
-- Real-time payment analytics: hourly settlement metrics
CREATE MATERIALIZED VIEW hourly_payment_metrics AS
SELECT
    window_start,
    window_end,
    payment_type,
    currency,
    COUNT(*) AS total_payments,
    COUNT(*) FILTER (WHERE status = 'SETTLED') AS settled_count,
    COUNT(*) FILTER (WHERE status = 'FAILED') AS failed_count,
    COUNT(*) FILTER (WHERE status = 'RETURNED') AS returned_count,
    SUM(amount) FILTER (WHERE status = 'SETTLED') AS settled_volume,
    AVG(amount) AS avg_payment_amount,
    AVG(EXTRACT(EPOCH FROM (settled_at - initiated_at))) AS avg_settlement_seconds
FROM TUMBLE(payment_events, initiated_at, INTERVAL '1 HOUR')
GROUP BY window_start, window_end, payment_type, currency;
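Because the view is maintained incrementally, dashboards can hit it with plain SQL and get fresh numbers without re-scanning the stream. A hypothetical serving query (the 24-hour window and rate calculation are illustrative, not from the original pipeline):

```sql
-- Hypothetical dashboard query: settlement rate by payment type
-- over the last 24 hours, served from the materialized view.
SELECT
    payment_type,
    SUM(settled_count)::DECIMAL / NULLIF(SUM(total_payments), 0) AS settlement_rate,
    SUM(settled_volume) AS total_settled_volume
FROM hourly_payment_metrics
WHERE window_start >= NOW() - INTERVAL '24 HOURS'
GROUP BY payment_type;
```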
-- Enriched payment stream joining fraud scores
CREATE MATERIALIZED VIEW enriched_payments AS
SELECT
    pe.payment_id,
    pe.sender_account,
    pe.receiver_account,
    pe.amount,
    pe.currency,
    pe.payment_type,
    pe.status,
    pe.initiated_at,
    pe.settled_at,
    fs.fraud_probability,
    fs.model_version,
    CASE
        WHEN fs.fraud_probability >= 0.8 THEN 'HIGH'
        WHEN fs.fraud_probability >= 0.4 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS fraud_risk_tier
FROM payment_events pe
LEFT JOIN fraud_scores fs ON pe.payment_id = fs.payment_id;
```
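This enriched view can back an operational review queue directly. A hypothetical query (the threshold on status and the LIMIT are illustrative choices, not part of the pipeline above):

```sql
-- Hypothetical fraud-ops query: unsettled payments with the
-- highest fraud scores, for manual review.
SELECT payment_id, sender_account, amount, currency, fraud_probability
FROM enriched_payments
WHERE fraud_risk_tier = 'HIGH'
  AND status = 'INITIATED'
ORDER BY fraud_probability DESC
LIMIT 100;
```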
Sinking to Apache Iceberg
This is where the lakehouse architecture connects. RisingWave writes materialized view results directly to Iceberg tables on S3:
```sql
-- Sink enriched payments to Iceberg for historical analytics.
-- The LEFT JOIN above emits updates when a fraud score arrives after
-- the payment event, so the sink must be an upsert keyed on payment_id
-- rather than append-only.
CREATE SINK enriched_payments_iceberg
FROM enriched_payments
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'payment_id',
    warehouse.path = 's3://fintech-lakehouse/warehouse',
    catalog.name = 'fintech',
    catalog.type = 'storage',
    database.name = 'payments',
    table.name = 'enriched_payments'
);
```
```sql
-- Sink hourly metrics to Iceberg for BI and reporting.
-- Windowed aggregates are revised as events arrive within a window,
-- so this sink is also an upsert, keyed on the grouping columns.
CREATE SINK hourly_metrics_iceberg
FROM hourly_payment_metrics
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start,window_end,payment_type,currency',
    warehouse.path = 's3://fintech-lakehouse/warehouse',
    catalog.name = 'fintech',
    catalog.type = 'storage',
    database.name = 'payments',
    table.name = 'hourly_payment_metrics'
);
```
Once data lands in Iceberg, it is immediately queryable by Spark, Trino, Snowflake External Tables, Databricks, and any other Iceberg-compatible engine—without any ETL job or data copy.
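For example, an analyst could run a monthly volume rollup from Trino. The query below is a sketch: it assumes a Trino Iceberg catalog named `iceberg` has been configured against the same warehouse, which is deployment-specific.

```sql
-- Hypothetical Trino query over the Iceberg table written by RisingWave,
-- assuming a Trino catalog named 'iceberg' points at the same warehouse.
SELECT
    date_trunc('month', initiated_at) AS month,
    payment_type,
    SUM(amount) AS settled_volume
FROM iceberg.payments.enriched_payments
WHERE status = 'SETTLED'
GROUP BY 1, 2
ORDER BY 1, 2;
```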
Regulatory Reporting View
```sql
-- Daily payment summary for regulatory reporting (queryable in RisingWave)
CREATE MATERIALIZED VIEW daily_regulatory_summary AS
SELECT
    window_start::DATE AS report_date,
    payment_type,
    currency,
    COUNT(*) AS total_count,
    SUM(amount) AS total_amount,
    COUNT(*) FILTER (WHERE fraud_risk_tier = 'HIGH') AS high_risk_count,
    SUM(amount) FILTER (WHERE fraud_risk_tier = 'HIGH') AS high_risk_amount
FROM TUMBLE(enriched_payments, initiated_at, INTERVAL '1 DAY')
GROUP BY window_start, payment_type, currency;
```
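A report extract against this view might look like the following sketch (pulling the previous day's figures; the date arithmetic follows PostgreSQL conventions, which RisingWave generally adopts):

```sql
-- Hypothetical daily extract: yesterday's summary for filing.
SELECT *
FROM daily_regulatory_summary
WHERE report_date = CURRENT_DATE - INTERVAL '1 DAY'
ORDER BY payment_type, currency;
```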
Lakehouse Architecture Comparison
| Dimension | Traditional (Streaming DB + Warehouse) | Lakehouse (RisingWave + Iceberg) |
| --- | --- | --- |
| Pipeline count | Two separate pipelines | Single streaming pipeline |
| Real-time latency | Sub-second (streaming DB) | Sub-second (RisingWave views) |
| Historical query engine | Data warehouse (proprietary) | Any Iceberg engine (open) |
| Data consistency | Risk of divergence | Single source of truth |
| Storage cost | High (duplicated storage) | Low (S3 object storage) |
| Schema evolution | Requires migration in both systems | Iceberg handles schema evolution natively |
| Operational overhead | Two systems to manage | One streaming layer + open storage |
FAQ
Q: What Iceberg catalog types does RisingWave support?
A: RisingWave supports storage-based catalogs (metadata tracked directly under the warehouse path on object storage) as well as REST, JDBC, Hive Metastore, and AWS Glue catalog configurations. For production deployments, AWS Glue or a REST catalog service (like Nessie or Polaris) is recommended for better metadata management.
Q: How does schema evolution work when we add new fields to payment events?
A: Iceberg natively supports adding nullable columns without rewriting existing data. On the RisingWave side, update the source schema and the materialized view definition. New columns appear in Iceberg with null values for historical rows. This is one of Iceberg's major advantages over Parquet-on-S3 approaches.
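As a sketch of what that looks like in practice (the `channel` column is a hypothetical new field):

```sql
-- Hypothetical evolution: add an optional 'channel' field.
-- 1. Evolve the RisingWave source; events without the field
--    simply yield NULL for the new column.
ALTER SOURCE payment_events ADD COLUMN channel VARCHAR;

-- 2. Recreate the materialized view and sink to propagate the column;
--    Iceberg then adds it as a nullable column without rewriting
--    existing data files.
```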
Q: Can we use RisingWave to backfill Iceberg from Kafka history?
A: Yes. Set scan.startup.mode = 'earliest' on your Kafka source to replay from the beginning of the topic's retention window. RisingWave will process all historical events, update its materialized views, and sink everything to Iceberg in one operation.
Q: How does this handle GDPR right-to-erasure on Iceberg data?
A: Iceberg supports row-level deletes via its DELETE statement (in V2 format). Use Apache Spark or Trino to issue targeted deletes on Iceberg tables for specific customer IDs. This rewrites only the affected data files, not the entire table.
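A minimal sketch of such an erasure job, issued from Spark SQL or Trino (the `'<subject-account-id>'` literal is a placeholder for the data subject's account identifier):

```sql
-- Hypothetical GDPR erasure against the Iceberg table
-- (requires Iceberg format version 2 for row-level deletes).
DELETE FROM fintech.payments.enriched_payments
WHERE sender_account = '<subject-account-id>'
   OR receiver_account = '<subject-account-id>';
```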
Q: What is the typical write latency from RisingWave to Iceberg?
A: RisingWave batches Iceberg writes and commits new snapshots periodically (configurable, typically every 60 seconds). This means Iceberg tables are slightly behind RisingWave's materialized views. For real-time queries, read from RisingWave; for historical analytics, read from Iceberg.
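At the time of writing, RisingWave exposes this cadence via a `commit_checkpoint_interval` sink option (the number of RisingWave checkpoints between Iceberg commits); check the current sink documentation for your version. A sketch:

```sql
-- Hypothetical tuning: commit an Iceberg snapshot every 10 checkpoints
-- instead of every checkpoint, producing fewer, larger snapshots.
CREATE SINK enriched_payments_iceberg
FROM enriched_payments
WITH (
    connector = 'iceberg',
    -- ... same warehouse/catalog/table options as above ...
    commit_checkpoint_interval = 10
);
```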
Get Started
- Set up your first streaming lakehouse with the RisingWave documentation.
- Join the RisingWave Slack to discuss Iceberg integration patterns with the community.

