Iceberg REST Catalog Integration with RisingWave

The Iceberg REST Catalog specification provides a vendor-neutral HTTP API for managing Iceberg table metadata. RisingWave integrates with any compliant REST catalog — including the open-source Iceberg REST server, Project Nessie, Tabular, AWS Glue (via adapter), and Polaris — using the catalog.type = 'rest' parameter in CREATE SINK.

What Is the Iceberg REST Catalog?

Before REST catalogs, each Iceberg engine implemented its own catalog drivers: a Hive metastore driver, a Glue driver, a JDBC driver. Each had slightly different behaviors, and adding a new engine meant writing a new driver for every catalog type.

The Iceberg REST Catalog specification (part of the Apache Iceberg project) defines a standardized HTTP API for catalog operations:

  • GET /v1/namespaces — list namespaces
  • POST /v1/namespaces/{namespace}/tables — create a table
  • GET /v1/namespaces/{namespace}/tables/{table} — load table metadata
  • POST /v1/namespaces/{namespace}/tables/{table}/metrics — report metrics
  • POST /v1/transactions/commit — atomic multi-table commits
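Here is a minimal Python sketch of a read-only client for two of these calls, using only the standard library. The class name RestCatalogClient and its methods are hypothetical. The sketch ignores the optional {prefix} path segment a server may advertise, multi-level namespaces, and error handling.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


class RestCatalogClient:
    """Hypothetical minimal client for two read-only REST catalog calls."""

    def __init__(self, base_uri: str, token: Optional[str] = None):
        self.base_uri = base_uri.rstrip("/")
        self.token = token

    def _get(self, path: str) -> dict:
        req = urllib.request.Request(self.base_uri + path, method="GET")
        if self.token:
            # Bearer auth, as used by OAuth2-protected catalogs.
            req.add_header("Authorization", f"Bearer {self.token}")
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def list_namespaces(self) -> list:
        # GET /v1/namespaces -> {"namespaces": [["iot"], ["production"]], ...}
        return self._get("/v1/namespaces")["namespaces"]

    def load_table(self, namespace: str, table: str) -> dict:
        # GET /v1/namespaces/{namespace}/tables/{table} -> table metadata result
        ns, tbl = quote(namespace, safe=""), quote(table, safe="")
        return self._get(f"/v1/namespaces/{ns}/tables/{tbl}")
```

Against the local catalog deployed below, `RestCatalogClient("http://localhost:8181").list_namespaces()` would return the namespace list as nested lists, one list of levels per namespace.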

Any engine that implements the REST client protocol (RisingWave, Trino, Spark, Flink) can work with any compliant server. Adding an engine therefore means writing one catalog client instead of one driver per catalog type.

Catalog Options Comparison

| Catalog | Open source | Multi-engine | Cloud managed | Auth support |
|---|---|---|---|---|
| Iceberg REST server | Yes | Yes | No | OAuth2, basic |
| Project Nessie | Yes | Yes | No | Bearer token |
| AWS Glue (via REST adapter) | Partial | Yes | Yes | IAM/SigV4 |
| Tabular | No | Yes | Yes | OAuth2 |
| Polaris (Snowflake) | Yes | Yes | Yes | OAuth2 |
| Gravitino (Apache) | Yes | Yes | No | OAuth2, basic |

Deploying the Open-Source REST Catalog

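For local development, use the tabulario/iceberg-rest Docker image, which wraps the Iceberg reference REST catalog. Note that the image is published by Tabular, not by the Apache Iceberg project. Review its catalog backend settings before using it for self-hosted production: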

# docker-compose.yml
services:
  iceberg-catalog:
    image: tabulario/iceberg-rest:latest
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AWS_REGION: us-east-1
      CATALOG_WAREHOUSE: s3://my-bucket/warehouse
      CATALOG_IO__IMPL: org.apache.iceberg.aws.s3.S3FileIO
      CATALOG_S3_ENDPOINT: https://s3.us-east-1.amazonaws.com
    ports:
      - "8181:8181"
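The image turns the CATALOG_* variables into catalog properties. The sketch below shows that naming convention. It assumes the conversion the image applies: strip the CATALOG_ prefix, map a double underscore to a dash and a single underscore to a dot, then lowercase the result. If a property does not take effect, check the image's README.

```python
def env_to_catalog_property(name: str, prefix: str = "CATALOG_") -> str:
    """Map a CATALOG_* environment variable name to an Iceberg catalog property."""
    if not name.startswith(prefix):
        raise ValueError(f"{name} does not start with {prefix}")
    key = name[len(prefix):]
    # Order matters: replace "__" before "_" so IO__IMPL becomes io-impl, not io..impl.
    return key.replace("__", "-").replace("_", ".").lower()


for var in ["CATALOG_WAREHOUSE", "CATALOG_IO__IMPL", "CATALOG_S3_ENDPOINT"]:
    print(var, "->", env_to_catalog_property(var))
# CATALOG_WAREHOUSE -> warehouse
# CATALOG_IO__IMPL -> io-impl
# CATALOG_S3_ENDPOINT -> s3.endpoint
```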

Once running, verify with:

curl http://localhost:8181/v1/config
# Returns: {"defaults":{},"overrides":{}}
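The REST spec gives the defaults and overrides maps in this response a fixed precedence. Defaults apply before the client's own properties, and overrides apply after them. Here is a minimal sketch of that merge. The property names in the example are illustrative only.

```python
import json


def effective_config(config_response: dict, client_props: dict) -> dict:
    """Merge a GET /v1/config response with client-side catalog properties."""
    merged = dict(config_response.get("defaults", {}))   # lowest precedence
    merged.update(client_props)                          # client settings
    merged.update(config_response.get("overrides", {}))  # server has the last word
    return merged


response = json.loads('{"defaults": {"example.mode": "a"}, "overrides": {"prefix": "my-catalog"}}')
print(effective_config(response, {"example.mode": "b", "prefix": "other"}))
# {'example.mode': 'b', 'prefix': 'my-catalog'}
```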

Configuring RisingWave Sinks

A typical RisingWave integration pattern is to aggregate a stream in a materialized view and then sink the view to Iceberg through the REST catalog:

-- Aggregate IoT data from Kafka
CREATE SOURCE temperature_events (
    device_id   VARCHAR,
    location    VARCHAR,
    celsius     DOUBLE PRECISION,
    recorded_at TIMESTAMPTZ
)
WITH (
    connector        = 'kafka',
    topic            = 'iot.temperature',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
)
FORMAT PLAIN ENCODE JSON;

CREATE MATERIALIZED VIEW hourly_temperature AS
SELECT
    device_id,
    location,
    window_start,
    window_end,
    AVG(celsius)   AS avg_celsius,
    MIN(celsius)   AS min_celsius,
    MAX(celsius)   AS max_celsius,
    COUNT(*)       AS sample_count
FROM TUMBLE(temperature_events, recorded_at, INTERVAL '1 HOUR')
GROUP BY device_id, location, window_start, window_end;

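-- Sink to Iceberg via REST catalog. The windowed aggregates in
-- hourly_temperature are updated as new events arrive, so the sink is
-- upsert, keyed on the GROUP BY columns.
CREATE SINK temperature_sink AS
SELECT * FROM hourly_temperature
WITH (
    connector      = 'iceberg',
    type           = 'upsert',
    primary_key    = 'device_id,location,window_start,window_end',
    catalog.type   = 'rest',
    catalog.uri    = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.region      = 'us-east-1',
    database.name  = 'iot',
    table.name     = 'hourly_temperature',
    create_table_if_not_exists = 'true'
);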
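The hourly_temperature view assigns each event to a fixed, non-overlapping one-hour window and aggregates per device, location and window. The plain-Python sketch below performs the same computation over an in-memory list. It assumes windows aligned to the Unix epoch, which makes hourly windows line up with UTC clock hours. It also shows why the view's rows change over time: every additional event for an hour recomputes that hour's aggregate row.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, size: timedelta = HOUR) -> datetime:
    # Floor the timestamp to the start of its epoch-aligned window.
    return ts - (ts - EPOCH) % size


def hourly_temperature(events):
    """events: iterable of (device_id, location, celsius, recorded_at)."""
    readings = defaultdict(list)
    for device_id, location, celsius, recorded_at in events:
        start = window_start(recorded_at)
        readings[(device_id, location, start, start + HOUR)].append(celsius)
    return {
        key: {
            "avg_celsius": sum(vals) / len(vals),
            "min_celsius": min(vals),
            "max_celsius": max(vals),
            "sample_count": len(vals),
        }
        for key, vals in readings.items()
    }
```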

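With create_table_if_not_exists = 'true', RisingWave creates a missing table by calling the catalog's POST /v1/namespaces/iot/tables endpoint, with the table name in the request body. It then commits each new snapshot through POST /v1/namespaces/iot/tables/hourly_temperature, and the catalog validates and applies each metadata change atomically.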
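The table-creation call carries a CreateTableRequest whose schema uses Iceberg's JSON type names. The sketch below builds such a body for the hourly_temperature columns. Its type mapping covers only the column types in this example and is an assumption about how RisingWave maps them, not RisingWave's actual code. Field IDs are assigned sequentially here, and the server may reassign them when it creates the table.

```python
import json

# Assumed mapping, covering only the column types used in this example.
ICEBERG_TYPES = {
    "VARCHAR": "string",
    "DOUBLE PRECISION": "double",
    "BIGINT": "long",
    "TIMESTAMPTZ": "timestamptz",
}


def create_table_request(name: str, columns: list) -> dict:
    """Build a CreateTableRequest body for POST /v1/namespaces/{namespace}/tables."""
    fields = [
        {"id": i, "name": col, "required": False, "type": ICEBERG_TYPES[sql_type]}
        for i, (col, sql_type) in enumerate(columns, start=1)
    ]
    return {"name": name, "schema": {"type": "struct", "schema-id": 0, "fields": fields}}


body = create_table_request("hourly_temperature", [
    ("device_id", "VARCHAR"),
    ("location", "VARCHAR"),
    ("window_start", "TIMESTAMPTZ"),
    ("window_end", "TIMESTAMPTZ"),
    ("avg_celsius", "DOUBLE PRECISION"),
    ("min_celsius", "DOUBLE PRECISION"),
    ("max_celsius", "DOUBLE PRECISION"),
    ("sample_count", "BIGINT"),  # COUNT(*) returns BIGINT
])
payload = json.dumps(body)  # body for POST /v1/namespaces/iot/tables
```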

Authentication Configuration

Most production catalogs require authentication. RisingWave authenticates REST catalog connections with OAuth2 in one of two ways. It can exchange a client credential (client_id:client_secret) for an access token, or it can use a pre-issued bearer token directly:

-- OAuth2 client-credentials flow (e.g., Tabular, Polaris)
CREATE SINK secure_sink AS
SELECT * FROM my_mv
WITH (
    connector          = 'iceberg',
    type               = 'upsert',
    primary_key        = 'id',
    catalog.type       = 'rest',
    catalog.uri        = 'https://catalog.tabular.io/ws/my-workspace',
    catalog.credential = '<client-id>:<client-secret>',  -- or: catalog.token = '<bearer-token>'
    warehouse.path     = 's3://my-bucket/warehouse',
    s3.region          = 'us-east-1',
    database.name      = 'production',
    table.name         = 'events'
);
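With a client credential, the client first exchanges it for an access token. The sketch below builds the form body for that client-credentials token request. It assumes the Iceberg client conventions: the credential is split at the first colon into client ID and secret (a value with no colon is the secret alone), and the default scope is catalog. The token endpoint is the spec's POST /v1/oauth/tokens unless the catalog delegates to an external OAuth2 server. Polaris deployments typically request the scope PRINCIPAL_ROLE:ALL instead.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def parse_credential(credential: str) -> Tuple[Optional[str], str]:
    """Split 'client_id:client_secret' at the first colon; no colon means secret only."""
    client_id, sep, secret = credential.partition(":")
    if not sep:
        return None, credential
    return client_id, secret


def token_request_body(credential: str, scope: str = "catalog") -> str:
    """Form-encoded body for an OAuth2 client-credentials token request."""
    client_id, secret = parse_credential(credential)
    form = {"grant_type": "client_credentials"}
    if client_id is not None:
        form["client_id"] = client_id
    form["client_secret"] = secret
    form["scope"] = scope
    return urlencode(form)


print(token_request_body("my-client:s3cret"))
# grant_type=client_credentials&client_id=my-client&client_secret=s3cret&scope=catalog
```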

For AWS Glue via the REST catalog adapter, configure IAM-based authentication through environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) rather than inline credentials.

Namespace Management

Iceberg REST catalogs organize tables into namespaces (equivalent to databases/schemas). Create and manage namespaces via the REST API before creating tables:

# Create a namespace
curl -X POST http://iceberg-catalog:8181/v1/namespaces \
  -H "Content-Type: application/json" \
  -d '{"namespace": ["production"], "properties": {"owner": "data-team"}}'

# List tables in a namespace
curl http://iceberg-catalog:8181/v1/namespaces/production/tables

RisingWave's database.name maps to the top-level namespace. Multi-level namespaces (e.g., production.analytics) require the catalog to support nested namespaces.
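For a namespace with more than one level, the REST spec joins the levels with the unit separator byte 0x1F in URL paths, which appears percent-encoded as %1F. Request bodies instead carry the levels as a JSON array. The sketch below shows both forms, and the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote

UNIT_SEPARATOR = "\x1f"


def namespace_path(levels: list) -> str:
    """Encode namespace levels for a URL path segment, e.g. production%1Fanalytics."""
    return quote(UNIT_SEPARATOR.join(levels), safe="")


def list_tables_url(base_uri: str, levels: list) -> str:
    return f"{base_uri.rstrip('/')}/v1/namespaces/{namespace_path(levels)}/tables"


def create_namespace_body(levels: list, properties: Optional[dict] = None) -> dict:
    """JSON body for POST /v1/namespaces: levels stay a list, not a joined string."""
    return {"namespace": list(levels), "properties": dict(properties or {})}


print(list_tables_url("http://iceberg-catalog:8181", ["production", "analytics"]))
# http://iceberg-catalog:8181/v1/namespaces/production%1Fanalytics/tables
```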

Using Nessie for Git-like Catalog Branching

Project Nessie adds Git-like branching to the Iceberg catalog — you can create a feature branch of your catalog, test schema changes in isolation, and merge back to main. RisingWave works with Nessie via its REST-compatible API:

-- Write to a Nessie catalog through its Iceberg REST endpoint
-- (this targets Nessie's default branch)
CREATE SINK nessie_sink AS
SELECT * FROM my_mv
WITH (
    connector      = 'iceberg',
    type           = 'append-only',
    catalog.type   = 'rest',
    catalog.uri    = 'http://nessie:19120/iceberg',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.region      = 'us-east-1',
    database.name  = 'analytics',
    table.name     = 'events'
);

Nessie's branching model is particularly valuable for testing schema migrations: run ALTER TABLE on a branch, validate with Trino, then merge to main without affecting the production stream.

Catalog Health and Observability

RisingWave's system catalog lists each sink with its connector, its sink type and the SQL that defined it:

-- List Iceberg sinks and their definitions
SELECT name, connector, sink_type, definition
FROM rw_catalog.rw_sinks
WHERE connector = 'iceberg';

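On the catalog side, use the server's own monitoring. Nessie, for example, runs on Quarkus and exposes Prometheus-format metrics:

# Nessie metrics (Prometheus format; port and metric names vary by Nessie version)
curl http://nessie:19120/q/metrics | grep -i iceberg

# The spec's POST /v1/namespaces/{namespace}/tables/{table}/metrics endpoint works
# the other way: engines send scan and commit reports to the catalog through it.
# It is not a read API, so a GET on it will not return table metrics.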

FAQ

Q: What is the difference between catalog.type = 'rest' and catalog.type = 'storage' in RisingWave? A: The REST catalog uses an external HTTP service for metadata management and supports multi-writer concurrency control. The storage catalog stores metadata files alongside data files on S3 and is simpler but does not support multi-writer safety.

Q: Can I use the same REST catalog for both RisingWave and Spark? A: Yes. This is the primary value proposition of the REST catalog spec. Both engines implement the same client-side catalog protocol; the server handles concurrency.
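The server enforces concurrency through commit requirements. A writer that built its snapshot on top of snapshot N sends an assert-ref-snapshot-id requirement for the main branch. If another engine committed first, the requirement fails and the server answers 409 Conflict, so the writer must refresh and retry. Below is a toy in-memory model of that check. It is not a catalog server, and the class names are hypothetical.

```python
from typing import Optional


class CommitConflict(Exception):
    """Stands in for the 409 Conflict a REST catalog returns on a failed requirement."""


class ToyTable:
    def __init__(self):
        self.main: Optional[int] = None  # current snapshot id of the "main" ref
        self._next_id = 1

    def commit(self, requirements: list) -> int:
        # Validate every requirement before applying anything (all-or-nothing).
        for req in requirements:
            if req["type"] == "assert-ref-snapshot-id" and req["ref"] == "main":
                if req["snapshot-id"] != self.main:
                    raise CommitConflict(f"main is {self.main}, expected {req['snapshot-id']}")
        self.main = self._next_id  # stands in for add-snapshot + set-snapshot-ref
        self._next_id += 1
        return self.main


def expect(base: Optional[int]) -> list:
    return [{"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": base}]


table = ToyTable()
base = table.main                        # RisingWave and Spark both read main = None
print(table.commit(expect(base)))        # first writer wins: 1
try:
    table.commit(expect(base))           # second writer is now stale
except CommitConflict as err:
    print("conflict:", err)              # conflict: main is 1, expected None
print(table.commit(expect(table.main)))  # retry on top of snapshot 1: 2
```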

Q: How do I configure TLS for the REST catalog connection? A: Specify an HTTPS catalog.uri. RisingWave respects the system trust store for certificate validation. For self-signed certificates in development, configure the JVM trust store or use a reverse proxy with a valid certificate.

Q: Does the REST catalog support table-level access control? A: It depends on the server implementation. Polaris and Tabular offer fine-grained RBAC at the table level via OAuth2 scopes. The open-source reference implementation has basic namespace-level access control.

Q: What happens to the REST catalog if it goes down while RisingWave is writing? A: RisingWave buffers writes and retries catalog commits with exponential backoff. Data files are already on S3 when the commit is attempted, so no data is lost. Catalog downtime increases write latency but does not lose data.
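The retry behavior in this answer follows the usual exponential-backoff pattern. The sketch below is a generic version of that pattern, not RisingWave's code. The commit callable, the use of ConnectionError to model an unreachable catalog, and the delay parameters are all assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def commit_with_backoff(
    commit: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `commit` on ConnectionError, doubling the delay each time (capped, jittered)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return commit()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; the caller decides how to recover
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
```

Injecting `sleep` keeps the function testable. Passing `sleep=lambda s: None` runs the retry loop instantly.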
