Cost Optimization: Iceberg on S3 with RisingWave vs Traditional Warehouses

Replacing a traditional cloud data warehouse with Apache Iceberg on S3 plus RisingWave typically cuts streaming analytics costs by 60–80%. Most of the savings come from compute, which scales down when idle instead of billing around the clock; on the storage side, S3 Standard ($0.023/GB/month, roughly Snowflake's $23/TB capacity rate) pulls ahead once Parquet with Zstd compression shrinks the footprint 3–5x. The tradeoff is operational overhead, which RisingWave's managed cloud offering minimizes.

The Cost Problem with Traditional Warehouses

Cloud data warehouses like Snowflake, BigQuery, and Redshift bundle compute and storage together. Even when your workload is idle at 3 AM, you pay for reserved capacity. Streaming workloads make this worse: continuous ingestion means compute is never truly idle, so you pay peak rates around the clock.

Consider a mid-size analytics team processing 500 GB/day of streaming events. On Snowflake, that requires a continuously running Small warehouse (2 credits/hour): at roughly $3/credit, 2 × 24 × 30 ≈ 1,440 credits comes to about $4,320/month, plus storage, for approximately $4,500/month. The same workload on Iceberg + S3 + RisingWave runs for under $800/month.

Architecture for Cost-Optimized Streaming Analytics

The key insight is separating streaming compute (RisingWave) from historical storage (Iceberg on S3). RisingWave handles continuous transformation and the last few hours of hot data. Iceberg stores the cold tier cheaply, queryable on demand.

Step 1: Capture Changes from Your Database

-- Ingest a single Postgres table via logical replication.
-- CDC tables need a primary key; 'id' stands in for the upstream key,
-- and the column list mirrors the upstream table's schema.
CREATE TABLE transactions_cdc (
    id          BIGINT PRIMARY KEY,
    merchant_id BIGINT,
    currency    VARCHAR,
    amount      NUMERIC,
    status      VARCHAR,
    created_at  TIMESTAMPTZ
) WITH (
    connector     = 'postgres-cdc',
    hostname      = 'prod-db.internal',
    port          = '5432',
    username      = 'cdc_reader',
    password      = 'secret',
    database.name = 'payments',
    schema.name   = 'public',
    table.name    = 'transactions',
    slot.name     = 'rw_txn_slot'
);

Step 2: Aggregate in RisingWave Before Writing to S3

Writing aggregated data to Iceberg instead of raw events is the single biggest cost lever. A 100:1 aggregation ratio means 100x fewer S3 PUT requests and 100x less storage.

CREATE MATERIALIZED VIEW daily_transaction_summary AS
SELECT
    DATE_TRUNC('day', created_at)  AS day,
    merchant_id,
    currency,
    COUNT(*)                        AS tx_count,
    SUM(amount)                     AS total_volume,
    AVG(amount)                     AS avg_tx_value,
    SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed_count
FROM transactions_cdc
GROUP BY
    DATE_TRUNC('day', created_at),
    merchant_id,
    currency;

Step 3: Sink to Iceberg on S3

CREATE SINK daily_summary_iceberg AS
SELECT * FROM daily_transaction_summary
WITH (
    connector       = 'iceberg',
    type            = 'upsert',
    -- Upsert sinks require a primary key matching the aggregation grain.
    primary_key     = 'day,merchant_id,currency',
    catalog.type    = 'rest',
    catalog.uri     = 'http://iceberg-catalog:8181',
    warehouse.path  = 's3://payments-lakehouse/warehouse',
    s3.region       = 'us-east-1',
    database.name   = 'payments',
    table.name      = 'daily_summary'
);

Cost Comparison: Iceberg + RisingWave vs Traditional Warehouses

The table below models a workload: 500 GB/day ingest, 10 TB total data, 50 concurrent analytics queries per day, running 24/7.

Component         | Snowflake  | BigQuery   | Redshift   | Iceberg + RisingWave
------------------|------------|------------|------------|--------------------------------
Storage (10 TB)   | $230/mo    | $200/mo    | $250/mo    | $230/mo ($46–77/mo compressed)
Streaming Compute | $2,400/mo  | $1,800/mo  | $1,200/mo  | $300/mo
Query Compute     | Included   | $500/mo    | Included   | $150/mo (on-demand)
Catalog/Metadata  | Included   | Included   | Included   | $20/mo (Nessie/Polaris)
Total (est.)      | ~$4,500/mo | ~$3,200/mo | ~$2,800/mo | ~$800/mo

Estimates based on list pricing as of Q1 2026. Actual costs vary by workload pattern and negotiated discounts.

Optimizing S3 Costs Within the Iceberg Stack

Even within the Iceberg + RisingWave stack, several optimizations reduce costs further.

Parquet compression: Iceberg stores data as Parquet by default. Enabling Zstd compression reduces storage by 40–60% over uncompressed formats with minimal CPU overhead.
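When the table is managed through an engine such as Spark, the codec is an Iceberg table property. A sketch, assuming the catalog and table names from the sink above:

```sql
-- Spark SQL: switch the Parquet codec for all future writes to this table.
ALTER TABLE lakehouse.payments.daily_summary
SET TBLPROPERTIES (
    'write.parquet.compression-codec' = 'zstd',
    'write.parquet.compression-level' = '3'  -- higher level = smaller files, more CPU
);
```

Existing files keep their old codec until the next compaction rewrites them.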

Partition pruning: Well-designed partitions dramatically reduce S3 GET requests during queries. Partitioning a payments table by (year, month, merchant_id) means most queries touch only a small fraction of the stored files.
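In Iceberg, that layout is expressed with partition transforms at table creation. A sketch in Spark SQL, with catalog and column names assumed to match the sink above (months(day) covers both the year and month components):

```sql
-- Queries that filter on day and merchant_id scan only matching partitions.
CREATE TABLE lakehouse.payments.daily_summary (
    day          DATE,
    merchant_id  BIGINT,
    currency     STRING,
    tx_count     BIGINT,
    total_volume DECIMAL(38, 2),
    avg_tx_value DECIMAL(38, 2),
    failed_count BIGINT
) USING iceberg
PARTITIONED BY (months(day), merchant_id);
```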

File compaction: Streaming writes produce many small files. Run Iceberg's rewriteDataFiles on a schedule (daily or weekly) to merge small files into optimally-sized 128 MB–512 MB Parquet files. This reduces both storage overhead and scan costs.
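With a Spark-attached catalog, compaction is a stored procedure call; a sketch, with catalog and table names assumed:

```sql
-- Merge small streaming files into ~256 MB Parquet files.
CALL lakehouse.system.rewrite_data_files(
    table   => 'payments.daily_summary',
    options => map('target-file-size-bytes', '268435456')  -- 256 MB
);
```

Schedule this from an orchestrator (Airflow, cron) alongside snapshot expiration.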

Checkpoint interval tuning: Longer commit intervals mean fewer, larger Iceberg files from each write. For non-latency-sensitive sinks, raise the sink's commit_checkpoint_interval option, which counts the number of RisingWave checkpoints between Iceberg commits, so the sink commits every few minutes instead of on every checkpoint, sharply reducing file count.
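Applied to the sink from Step 3, that looks like the following sketch. commit_checkpoint_interval counts checkpoints, so 300 checkpoints at RisingWave's default one-second barrier interval gives roughly a five-minute commit cadence:

```sql
-- Same sink as Step 3, but committing to Iceberg every 300 checkpoints.
CREATE SINK daily_summary_iceberg AS
SELECT * FROM daily_transaction_summary
WITH (
    connector                  = 'iceberg',
    type                       = 'upsert',
    primary_key                = 'day,merchant_id,currency',
    catalog.type               = 'rest',
    catalog.uri                = 'http://iceberg-catalog:8181',
    warehouse.path             = 's3://payments-lakehouse/warehouse',
    s3.region                  = 'us-east-1',
    database.name              = 'payments',
    table.name                 = 'daily_summary',
    commit_checkpoint_interval = 300  -- ~5 min at 1 s/checkpoint
);
```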

When Traditional Warehouses Still Win

The Iceberg + RisingWave approach requires operational investment. Traditional warehouses make sense when:

  • Your team has no infrastructure expertise and needs a fully-managed SaaS experience
  • Query concurrency exceeds 500 simultaneous users (Snowflake's elastic scaling excels here)
  • You need sub-second query latency on ad hoc SQL without pre-built materialized views
  • Compliance requirements mandate a specific vendor's certifications

For most data engineering teams processing streaming data at scale, the cost savings of the open lakehouse approach outweigh the operational complexity—especially with RisingWave's managed cloud option.

FAQ

Q: What is the actual S3 cost for 10 TB of Iceberg data? A: S3 Standard storage costs $0.023/GB/month. Ten terabytes costs approximately $230/month—but Parquet with Zstd compression typically achieves 3–5x compression, reducing the effective cost to $46–$77/month.

Q: Does RisingWave have a managed cloud offering? A: Yes. RisingWave Cloud offers a fully managed service with automatic scaling, backup, and monitoring. Pricing is consumption-based, which is particularly cost-effective for streaming workloads with variable load.

Q: How do I handle the cost of the Iceberg catalog? A: Open-source catalogs like Apache Polaris (incubating) and Project Nessie are free to self-host. AWS Glue Catalog is $1/month per 100,000 table versions. For most workloads, catalog costs are negligible compared to compute and storage.

Q: What are the hidden costs of managing Iceberg yourself? A: Engineering time for compaction, snapshot expiration, and catalog maintenance is the main overhead—typically 2–4 hours/week for a production deployment. This is the real cost to factor into your ROI calculation.

Q: Can I query both RisingWave streaming data and Iceberg historical data in one query? A: Yes, starting with RisingWave v2.8. You can join live materialized view data with historical Iceberg tables in a single SQL query, eliminating the need for a separate query engine for most use cases.
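A hedged sketch of such a query, assuming the historical Iceberg table is first exposed to RisingWave as a batch-readable source (connection options mirror the sink above):

```sql
-- Map the historical Iceberg table into RisingWave for batch reads.
CREATE SOURCE daily_summary_history
WITH (
    connector      = 'iceberg',
    catalog.type   = 'rest',
    catalog.uri    = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://payments-lakehouse/warehouse',
    database.name  = 'payments',
    table.name     = 'daily_summary'
);

-- Compare live per-merchant volume against the 30-day historical average.
SELECT
    live.merchant_id,
    live.total_volume      AS today_volume,
    AVG(hist.total_volume) AS avg_30d_volume
FROM daily_transaction_summary AS live
JOIN daily_summary_history     AS hist
  ON hist.merchant_id = live.merchant_id
WHERE hist.day >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY live.merchant_id, live.total_volume;
```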

Get Started

See how much your team can save by migrating to an open lakehouse. Start with the RisingWave quickstart guide and join the Slack community to discuss your specific cost optimization strategy.
