One-line deployment with a REST catalog, automatic compaction, and an integrated ingestion engine, interoperable with any Iceberg-compatible query engine. Finally, you can run Iceberg just like Postgres.
Why We Built This
We started adopting Apache Iceberg internally almost three years ago. At that time, our goal was simple: store analytical data efficiently while avoiding vendor lock-in. But the ecosystem was still immature. There were barely any good catalog services, and we had to manage compaction ourselves by maintaining a Spark cluster.
Three years later, as a streaming lakehouse vendor, we are excited to see how far the Iceberg ecosystem has come. Nearly every company we talk to has either adopted Iceberg or is evaluating it. Yet one major frustration remains: onboarding is still too complicated. To run Iceberg in production, users need to handle the following components:
Catalog services
Compaction services
Ingestion engine (ETL/ELT, batch/streaming)
Interoperability across all query engines
Among current offerings, Amazon S3 Tables might be the only product that provides a seamless experience from onboarding to scaling. However, it is a black box that cannot be self-hosted and is hard to integrate with existing ecosystems. Snowflake-managed Iceberg tables are also restricted because you must be a Snowflake customer, and advanced features such as equality deletes are not fully supported.
We experienced the same pain. So we decided to make Iceberg truly open, self-hostable, and easy to adopt for everyone.
In the rest of this article, I will show how we achieved that with RisingWave and Lakekeeper, and how you can deploy the full stack using Docker or Kubernetes.
How It Works
We are the creators of RisingWave. Historically, RisingWave has been known for its stream processing capabilities that allow users to build real-time pipelines to continuously ingest, transform, and deliver data to downstream systems.
In early 2025, we released the Iceberg Table Engine, a feature that allows users to directly create tables stored in the Iceberg format. At that time, RisingWave offered a built-in JDBC catalog, but users still needed to host their own REST catalogs and perform compaction using external engines. It was not convenient.
To fix this, we introduced two major improvements:
Lakekeeper as the default REST catalog. Lakekeeper is one of the most popular open-source REST catalogs for Iceberg and has been adopted by many enterprises. RisingWave now includes built-in integration with Lakekeeper so that users can connect their own query engines to the same Iceberg tables without extra setup.
Lightweight compaction engine inside RisingWave. Traditional compaction tools such as Spark are difficult to maintain and not efficient for this specific purpose. We built a new Iceberg compaction engine based on DataFusion, a popular query engine in the data community, optimized for compaction efficiency.
With these two components combined, users can now run a fully managed Iceberg environment with a single command.
Note: this is not just a demo. It is a production-ready product that can serve as the backbone of your data infrastructure.
Getting Started
Let’s go through a simple, end-to-end setup using Docker on your local machine.
You will deploy the stack, create an Iceberg table in RisingWave, and query the same table from DuckDB.
Installation & Environment Setup
Deploy Lakekeeper for a streaming lakehouse using Docker (local testing) or Helm (production).
Prerequisites
- Docker Desktop (running), or a Kubernetes cluster with kubectl and Helm installed.
- The PostgreSQL psql client (to run SQL against RisingWave).
Option A — Local (Docker)
Use this for quick local runs.
- Clone the RisingWave repository and switch to the docker directory
git clone https://github.com/risingwavelabs/risingwave.git
cd risingwave/docker
- Launch the environment
This spins up all required containers (a RisingWave cluster plus the Lakekeeper REST catalog) and prepares the environment for creating Iceberg-native tables in RisingWave with the Lakekeeper catalog.
docker compose -f docker-compose-with-lakekeeper.yml up -d
- What you get
A running RisingWave cluster + the Lakekeeper REST catalog
A default warehouse: risingwave-warehouse
Lakekeeper UI at http://localhost:8181/ui/warehouse
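To confirm everything is up before moving on, you can list the containers started by the Compose file:
docker compose -f docker-compose-with-lakekeeper.yml ps
All services should report a running state.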
Option B — Production (Kubernetes/Helm)
Use this for a durable, production-style setup.
Ensure you have a working Kubernetes cluster, kubectl, and Helm installed. Then follow the RisingWave documentation to deploy Lakekeeper via Helm (same components, production-ready parameters).
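For reference, a minimal Helm sketch is shown below. The repository URL is RisingWave's public chart repo; the release name and values.yaml are placeholders, and the values needed to enable Lakekeeper are described in the RisingWave docs.
helm repo add risingwavelabs https://risingwavelabs.github.io/helm-charts/
helm repo update
helm install risingwave risingwavelabs/risingwave -f values.yaml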
That’s it: Docker for local development, Helm for cluster environments.
Step 1: Lakekeeper bootstrap
After deploying Lakekeeper using the above Docker Compose setup, initialize it once using the management API:
curl -X POST http://127.0.0.1:8181/management/v1/bootstrap \
-H 'Content-Type: application/json' \
-d '{"accept-terms-of-use": true}'
Provision a warehouse backed by your S3-compatible storage (here we use MinIO):
curl -X POST http://127.0.0.1:8181/management/v1/warehouse \
-H 'Content-Type: application/json' \
-d '{
"warehouse-name": "risingwave-warehouse",
"delete-profile": { "type": "hard" },
"storage-credential": {
"type": "s3",
"credential-type": "access-key",
"aws-access-key-id": "hummockadmin",
"aws-secret-access-key": "hummockadmin"
},
"storage-profile": {
"type": "s3",
"bucket": "hummock001",
"region": "us-east-1",
"flavor": "s3-compat",
"endpoint": "http://minio-0:9301",
"path-style-access": true,
"sts-enabled": false,
"key-prefix": "risingwave-lakekeeper"
}
}'
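To confirm the warehouse was registered, open the Lakekeeper UI at http://localhost:8181/ui/warehouse, or list warehouses through the management API (the exact parameters may differ between Lakekeeper versions):
curl http://127.0.0.1:8181/management/v1/warehouse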
Step 2: Connect RisingWave to the REST catalog
Open a SQL shell to RisingWave:
psql -h localhost -p 4566 -d dev -U root
Create a connection pointing to the Lakekeeper REST catalog:
CREATE CONNECTION lakekeeper_catalog_conn
WITH (
type = 'iceberg',
catalog.type = 'rest',
catalog.uri = 'http://lakekeeper:8181/catalog/',
warehouse.path = 'risingwave-warehouse',
s3.access.key = 'hummockadmin',
s3.secret.key = 'hummockadmin',
s3.path.style.access = 'true',
s3.endpoint = 'http://minio-0:9301',
s3.region = 'us-east-1'
);
Set it as the default Iceberg connection:
SET iceberg_engine_connection = 'public.lakekeeper_catalog_conn';
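As a quick sanity check, you can list the connection you just created (assuming your RisingWave version supports SHOW CONNECTIONS):
SHOW CONNECTIONS;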
Step 3: Create a Native Iceberg Table
Here is an example table for banking transactions. RisingWave will write Iceberg files to your object store and register them in the REST catalog automatically.
CREATE TABLE bank_financial_transactions (
txn_id INT PRIMARY KEY, -- unique transaction id
account_id INT, -- customer/account identifier
txn_timestamp TIMESTAMP, -- when the transaction occurred
amount DECIMAL, -- money amount
currency VARCHAR, -- e.g., USD
txn_type VARCHAR, -- CREDIT or DEBIT
merchant VARCHAR, -- merchant/payee if applicable
category VARCHAR, -- e.g., utilities, groceries
txn_status VARCHAR -- PENDING, POSTED, REVERSED
)
WITH (commit_checkpoint_interval = 1)
ENGINE = iceberg;
Commit interval note
The approximate commit time ≈ barrier_interval_ms × checkpoint_frequency × commit_checkpoint_interval. Tune this to balance latency against file sizes.
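For example, if barrier_interval_ms = 1000 and checkpoint_frequency = 10, then commit_checkpoint_interval = 1 yields a commit roughly every 1000 ms × 10 × 1 = 10 seconds; raising it to 6 stretches that to about one minute and produces fewer, larger files per commit.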
Step 4: Insert and Query Data
Insert five sample transactions for account 50123, then flush:
INSERT INTO bank_financial_transactions (
txn_id, account_id, txn_timestamp, amount, currency, txn_type, merchant, category, txn_status
) VALUES
(1001, 50123, '2025-10-03 09:15:23', 3250.00, 'USD', 'CREDIT', 'Acme Corp Payroll', 'income', 'POSTED'),
(1002, 50123, '2025-10-04 12:48:10', 86.37, 'USD', 'DEBIT', 'Whole Foods Market', 'groceries', 'POSTED'),
(1003, 50123, '2025-10-05 08:21:44', 54.19, 'USD', 'DEBIT', 'Starbucks', 'dining', 'POSTED'),
(1004, 50123, '2025-10-06 18:02:05', 62.50, 'USD', 'DEBIT', 'Chevron', 'fuel', 'POSTED'),
(1005, 50123, '2025-10-07 21:36:59', 129.99, 'USD', 'DEBIT', 'Amazon.com', 'shopping', 'PENDING');
FLUSH;
Query the table:
SELECT * FROM bank_financial_transactions;
txn_id | account_id | txn_timestamp | amount | currency | txn_type | merchant | category | txn_status
-------+------------+------------------------+--------+----------+----------+-----------------------+------------+-----------
1001 | 50123 | 2025-10-03 09:15:23 | 3250.00| USD | CREDIT | Acme Corp Payroll | income | POSTED
1002 | 50123 | 2025-10-04 12:48:10 | 86.37| USD | DEBIT | Whole Foods Market | groceries | POSTED
1003 | 50123 | 2025-10-05 08:21:44 | 54.19| USD | DEBIT | Starbucks | dining | POSTED
1004 | 50123 | 2025-10-06 18:02:05 | 62.50| USD | DEBIT | Chevron | fuel | POSTED
1005 | 50123 | 2025-10-07 21:36:59 | 129.99| USD | DEBIT | Amazon.com | shopping | PENDING
(5 rows)
You have just ingested and queried an Iceberg table as easily as a regular database table.
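In a real deployment you would rarely INSERT by hand: the same table can be fed continuously by RisingWave's streaming ingestion. The sketch below is illustrative only; it assumes a hypothetical Kafka topic bank_txns with JSON messages matching the table's columns and a broker reachable at kafka:9092. Adjust names and connection properties to your environment.
-- Hypothetical Kafka source; topic, broker, and startup mode are placeholders
CREATE SOURCE kafka_bank_txns (
    txn_id INT,
    account_id INT,
    txn_timestamp TIMESTAMP,
    amount DECIMAL,
    currency VARCHAR,
    txn_type VARCHAR,
    merchant VARCHAR,
    category VARCHAR,
    txn_status VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'bank_txns',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Continuously route the stream into the Iceberg table created above
CREATE SINK bank_txns_to_iceberg INTO bank_financial_transactions
AS SELECT * FROM kafka_bank_txns;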
Step 5: Query from DuckDB
Install and launch DuckDB, attach the same REST catalog, and query the same table.
You will see identical results, confirming interoperability across open engines.
Install DuckDB CLI:
curl https://install.duckdb.org | sh
Launch it:
~/.duckdb/cli/latest/duckdb
Add a host-level DNS override so DuckDB can resolve minio-0 to 127.0.0.1. This lets DuckDB running outside Compose access MinIO at http://minio-0:9301 without changing endpoints.
echo "127.0.0.1 minio-0" | sudo tee -a /etc/hosts
Configure S3 settings and attach the same REST catalog:
SET s3_region = 'us-east-1';
SET s3_endpoint = 'http://minio-0:9301';
SET s3_access_key_id = 'hummockadmin';
SET s3_secret_access_key = 'hummockadmin';
SET s3_url_style = 'path';
SET s3_use_ssl = false;
ATTACH 'risingwave-warehouse' AS lakekeeper_catalog (
TYPE ICEBERG,
ENDPOINT 'http://127.0.0.1:8181/catalog/',
AUTHORIZATION_TYPE 'none'
);
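Optionally, list what the attached catalog exposes before querying:
SHOW ALL TABLES;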
Query the exact same table in DuckDB:
SELECT * FROM lakekeeper_catalog.public.bank_financial_transactions;
┌─────────┬────────────┬──────────────────────┬─────────┬──────────┬──────────┬──────────────────────┬───────────┬────────────┐
│ txn_id │ account_id │ txn_timestamp │ amount │ currency │ txn_type │ merchant │ category │ txn_status │
│ int32 │ int32 │ timestamp │ decimal │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────┼────────────┼──────────────────────┼─────────┼──────────┼──────────┼──────────────────────┼───────────┼────────────┤
│ 1001 │ 50123 │ 2025-10-03 09:15:23 │ 3250.00│ USD │ CREDIT │ Acme Corp Payroll │ income │ POSTED │
│ 1002 │ 50123 │ 2025-10-04 12:48:10 │ 86.37│ USD │ DEBIT │ Whole Foods Market │ groceries │ POSTED │
│ 1003 │ 50123 │ 2025-10-05 08:21:44 │ 54.19│ USD │ DEBIT │ Starbucks │ dining │ POSTED │
│ 1004 │ 50123 │ 2025-10-06 18:02:05 │ 62.50│ USD │ DEBIT │ Chevron │ fuel │ POSTED │
│ 1005 │ 50123 │ 2025-10-07 21:36:59 │ 129.99│ USD │ DEBIT │ Amazon.com │ shopping │ PENDING │
└─────────┴────────────┴──────────────────────┴─────────┴──────────┴──────────┴──────────────────────┴───────────┴────────────┘
Result: DuckDB reads the table registered in Lakekeeper and written by RisingWave. That’s the power of an open table format and an open REST catalog.
Advanced features you get out of the box
Time travel
Query past states of the table:
-- By timestamp
SELECT *
FROM bank_financial_transactions
FOR SYSTEM_TIME AS OF TIMESTAMPTZ '2025-10-07 22:00:00';
-- By snapshot ID
SELECT *
FROM bank_financial_transactions
FOR SYSTEM_VERSION AS OF 1234567890;
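The snapshot ID above is a placeholder; real snapshot IDs come from the table's Iceberg snapshot history, which you can look up with any engine or tool that exposes Iceberg table metadata.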
Partitioning strategies
Best practice is to choose a partition key that is a prefix of the primary key. If you want to partition by value_date and account_id, define the PK accordingly:
-- Partitioned variant of the table
CREATE TABLE bank_financial_transactions_p (
value_date DATE,
account_id INT,
txn_id INT,
txn_timestamp TIMESTAMP,
amount DECIMAL,
currency VARCHAR,
txn_type VARCHAR,
channel VARCHAR,
merchant VARCHAR,
description VARCHAR,
category VARCHAR,
txn_status VARCHAR,
counterparty_acct VARCHAR,
reference_id VARCHAR,
PRIMARY KEY (value_date, account_id, txn_id)
)
WITH (
commit_checkpoint_interval = 1,
partition_by = 'value_date, bucket(16, account_id)'
)
ENGINE = iceberg;
Rule: The partition key must be a prefix of the primary key.
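With this layout, filters on the leading partition columns can prune data files. For example, a query like the following (illustrative) only needs to touch the partitions for that date and account bucket:
SELECT txn_id, amount, txn_status
FROM bank_financial_transactions_p
WHERE value_date = DATE '2025-10-07'
  AND account_id = 50123;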
Native compaction controls
Enable per table to keep file counts under control in streaming:
CREATE TABLE bank_financial_transactions_compacted (
txn_id INT PRIMARY KEY,
account_id INT,
txn_timestamp TIMESTAMP,
value_date DATE,
amount DECIMAL,
currency VARCHAR,
txn_type VARCHAR,
channel VARCHAR,
merchant VARCHAR,
description VARCHAR,
category VARCHAR,
txn_status VARCHAR,
counterparty_acct VARCHAR,
reference_id VARCHAR
)
WITH (
enable_compaction = true, -- run background compaction for this table
compaction_interval_sec = 3600, -- compact roughly once per hour
enable_snapshot_expiration = true,
snapshot_expiration_max_age_millis = 604800000, -- expire snapshots older than 7 days
snapshot_expiration_retain_last = 5, -- but always keep the last 5 snapshots
snapshot_expiration_clear_expired_files = true, -- remove data files no longer referenced
snapshot_expiration_clear_expired_meta_data = true -- remove metadata for expired snapshots
)
ENGINE = iceberg;
Compaction requires a dedicated Iceberg compactor.
Configuration, limits, and best practices
Commit intervals
Tune commit cadence when creating the table (example shown earlier).
Approximate commit time:
time = barrier_interval_ms × checkpoint_frequency × commit_checkpoint_interval
Limitations
Limited DDL: Some schema changes may require recreating the table.
Single writer: For tables created with this engine, only RisingWave should write to them to ensure consistency.
Best practices
Use the hosted catalog for the simplest setup, or Lakekeeper REST as shown here.
Tune commit intervals to balance file size and latency.
In AWS, consider S3 Tables for native compaction; use Lakekeeper as an open source REST alternative.
Design partitioning early.
Monitor small file growth and enable compaction as needed.
Conclusion
If you want Iceberg to feel like a database without sacrificing openness, RisingWave delivers: open Iceberg tables managed by RisingWave and cataloged by Lakekeeper (REST), SQL-first ingestion, automatic compaction, time travel, and engine portability. Query the same data from DuckDB, Spark, or Trino with no copies.