Full Control of Your Lakehouse: RisingWave's Iceberg REST Catalog Support

RisingWave's integration with Apache Iceberg took a significant step forward with the introduction of REST catalog support in v2.5. This enhancement allows you to connect RisingWave to any modern Iceberg catalog service through a standardized API. We are especially excited to highlight the seamless integration with Lakekeeper, a new, open-source, self-hosted Iceberg REST catalog.

This combination empowers you to build a truly open and flexible streaming lakehouse, giving you full control over your metadata without vendor lock-in.

Why a REST Catalog Matters

An Iceberg catalog is the central nervous system of your lakehouse, tracking table schemas, snapshots, and data file locations. The Iceberg REST Catalog specification has become the modern standard for interoperability, allowing different processing engines like Spark, Flink, and RisingWave to communicate with a single, shared metadata service. This API-driven approach ensures a consistent and unified view of your data across your entire stack.

Take Control with Lakekeeper: A Self-Hosted Catalog

To help you leverage this new REST capability, we're highlighting Lakekeeper, a fast and lightweight open-source Iceberg REST catalog written in Rust. Instead of relying on a managed (and often costly) cloud service like AWS Glue, you can deploy Lakekeeper in your own environment using Docker or Kubernetes. This gives you the benefits of a modern catalog while maintaining complete control over your infrastructure and costs.

Benefits of Using RisingWave with Lakekeeper

  • Full Control & No Vendor Lock-in: By self-hosting your catalog with Lakekeeper, you own your metadata. This prevents dependency on proprietary cloud services and gives you the freedom to choose the best tools for your stack.

  • Open and Interoperable: The integration is built on the official Iceberg REST protocol. This means any tool that speaks this standard can interact with the tables you create and manage with RisingWave.

  • Simplified, Modern Stack: Lakekeeper is a lightweight, single binary that is easy to deploy and manage, making a production-ready streaming lakehouse more accessible than ever.

  • Unified Streaming and Management: Use RisingWave to ingest data, perform real-time transformations, and sink results directly into Iceberg tables managed by your Lakekeeper instance—all within a cohesive, SQL-driven workflow.

How to Get Started: A Practical Guide

First, make sure Lakekeeper and your S3-compatible storage (such as MinIO) are both running. The easiest way to do this is to use the docker-compose-with-lakekeeper.yml file from the RisingWave repository, which handles the setup for you.

RisingWave offers two distinct ways to interact with an Iceberg REST catalog based on your goal.

Use Case 1: Sinking Data into an Iceberg Table

This approach is for streaming data from a RisingWave source or materialized view into an Iceberg table. All the necessary connection parameters for the catalog and storage are defined directly within the CREATE SINK statement.

CREATE SINK users_sink
FROM user_profiles_stream
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',

    -- Catalog Configuration
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181', -- Your Lakekeeper endpoint

    -- Warehouse and Table info
    warehouse.path = 's3://warehouse/',
    database.name = 'users',
    table.name = 'profiles',

    -- S3 Configuration
    s3.endpoint = 'http://minio:9301',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin'
);
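
The sink above assumes that a relation named user_profiles_stream already exists. The following is a minimal sketch of one way to set that up and then check the result. The column list, the sample row, and the profiles_readback source name are assumptions made for illustration, and the connection values simply mirror the sink definition above.

```sql
-- Hypothetical upstream table; create it BEFORE running CREATE SINK users_sink.
-- Column names other than user_id are assumptions for this sketch.
CREATE TABLE user_profiles_stream (
    user_id INT PRIMARY KEY,
    user_name VARCHAR,
    updated_at TIMESTAMP
);

-- After the sink exists, write a sample row and force it through.
INSERT INTO user_profiles_stream VALUES (1, 'alice', '2025-01-01 00:00:00');
FLUSH;

-- Confirm the sink is registered.
SHOW SINKS;

-- Optionally read the Iceberg table back through an Iceberg source
-- (hypothetical name) that points at the same catalog and table.
CREATE SOURCE profiles_readback
WITH (
    connector = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181',
    warehouse.path = 's3://warehouse/',
    database.name = 'users',
    table.name = 'profiles',
    s3.endpoint = 'http://minio:9301',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin'
);

SELECT * FROM profiles_readback LIMIT 10;
```

Rows only show up in the read-back query after the sink has committed a snapshot to Iceberg, so a short delay between the insert and the query is expected.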

Use Case 2: Creating and Managing Native Iceberg Tables

This approach is for Iceberg tables that RisingWave manages natively: you create, write to, and query them directly with SQL. It requires a reusable CONNECTION object, which you create first.

Step 1: Create a Reusable Connection

Define a CONNECTION object that stores the details for your Iceberg catalog and warehouse. This makes your configuration clean and reusable.

CREATE CONNECTION lakekeeper_catalog_conn
WITH (
  type = 'iceberg',
  catalog.type = 'rest',
  catalog.uri = 'http://lakekeeper:8181/catalog/', -- URI of your Lakekeeper service
  warehouse.path = 'my-warehouse',
  s3.endpoint = 'http://minio:9301',
  s3.access.key = 'minioadmin',
  s3.secret.key = 'minioadmin',
  s3.region = 'us-east-1'
);
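
As a rough sketch of what "reusable" can look like in practice, the statements below list the connection and then reference it from a sink instead of repeating the catalog and S3 settings. The sink name users_sink_via_conn is hypothetical, and the `connection` option on an Iceberg sink is an assumption about your RisingWave version; check the documentation for the release you run before relying on it.

```sql
-- Confirm that lakekeeper_catalog_conn was created.
SHOW CONNECTIONS;

-- Assumption: an Iceberg sink can take its catalog and storage settings
-- from a named connection via the `connection` option.
CREATE SINK users_sink_via_conn
FROM user_profiles_stream
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    connection = lakekeeper_catalog_conn,
    database.name = 'users',
    table.name = 'profiles'
);
```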

Step 2: Set the Connection as Active

Activate the connection to make it the default for all native Iceberg table operations in your session or for the entire system.

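-- Session level: use the connection for native Iceberg tables in the current session only
SET iceberg_engine_connection = 'public.lakekeeper_catalog_conn';

-- System level: make the connection the default for new sessions across the system
ALTER SYSTEM SET iceberg_engine_connection = 'public.lakekeeper_catalog_conn';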
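
To check which connection is currently active, you can read the setting back. This is a small sketch that assumes the parameter is exposed through the standard SHOW command for session variables.

```sql
-- Print the connection currently used for native Iceberg tables.
SHOW iceberg_engine_connection;

-- Or list every session variable and look for iceberg_engine_connection.
SHOW ALL;
```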

Step 3: Create a Native Iceberg Table

Now, create your Iceberg table. The statement needs no catalog or storage options because they are all inherited from the active CONNECTION; the ENGINE = iceberg clause tells RisingWave to store the table in Iceberg format.

CREATE TABLE users (
   user_id INT,
   user_name VARCHAR
) ENGINE = iceberg;
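
Once the table exists, it behaves like any other RisingWave table. A minimal sketch of writing to it and reading it back follows; the sample rows are made up for illustration.

```sql
-- Write a couple of sample rows into the native Iceberg table.
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
FLUSH;

-- Query the table directly from RisingWave.
SELECT user_id, user_name
FROM users
ORDER BY user_id;
```

Because the table is registered in the Lakekeeper catalog, other engines that speak the Iceberg REST protocol should be able to read it after RisingWave commits the data, though how soon it becomes visible depends on the commit interval in your deployment.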

Conclusion

The support for Iceberg REST catalogs in RisingWave, especially when paired with a self-hosted solution like Lakekeeper, marks a significant milestone. It offers a powerful, flexible, and cost-effective path toward building a modern streaming lakehouse. This feature empowers you to take full ownership of your data architecture, break free from vendor lock-in, and embrace the interoperability of the open data ecosystem. We can't wait to see what you build.

