Introduction
You have three engines querying your Iceberg tables: RisingWave for streaming writes, Trino for interactive queries, and Spark for batch compaction. Each engine needs to know where your tables live, what their schemas look like, and which snapshots are current. Without a shared catalog, you end up with configuration scattered across engine configs, environment variables, and deployment scripts.
The Iceberg REST catalog solves this by providing a single HTTP API that all engines use to discover and manage Iceberg tables. It is the modern replacement for Hive Metastore, designed specifically for Iceberg's needs rather than adapted from Hive's legacy catalog model.
This guide walks through setting up an Iceberg REST catalog from scratch and connecting it to RisingWave for streaming sink pipelines. You will learn the architecture, configuration options, authentication setup, and how to verify the entire stack works end to end. All SQL examples target RisingWave v2.3.
Why Choose an Iceberg REST Catalog Over Hive Metastore?
If you have used Hive Metastore with Iceberg, you know it works. But it comes with significant operational overhead and architectural limitations that become more painful as your Iceberg deployment grows.
Problems with Hive Metastore for Iceberg
Heavyweight dependency: Hive Metastore requires a JVM process, a relational database (MySQL or PostgreSQL) for its backend, and Hadoop libraries. This is a lot of infrastructure to manage just for table metadata.
Schema mismatch: Hive Metastore was designed for Hive tables, not Iceberg tables. Iceberg stores its own metadata in JSON files on object storage. The Hive Metastore entry for an Iceberg table is essentially a pointer to the metadata location, making the Hive schema largely redundant.
Limited Iceberg features: Hive Metastore does not natively support Iceberg features like multi-table transactions, view definitions, or server-side access control policies. These features require catalog-level support that Hive Metastore was never designed to provide.
Thrift protocol: Hive Metastore uses Apache Thrift for communication, which is harder to debug, monitor, and secure compared to HTTP/REST APIs. You cannot test the catalog with curl or inspect requests in standard HTTP monitoring tools.
Advantages of the REST Catalog
HTTP-native: Standard REST API over HTTP/HTTPS. You can test endpoints with curl, monitor traffic with standard tools, and route through API gateways.
Lightweight: No JVM, no Hadoop dependencies. A REST catalog server can run as a single container with minimal resource requirements.
Iceberg-native: The REST catalog spec was designed specifically for Iceberg. It supports namespace management, table creation with partition specs, server-side access control, and vended credentials.
Multi-engine by design: Every major query engine supports the REST catalog protocol. Configuring a new engine is a matter of pointing it to a URL, not installing Hive client libraries.
Vended credentials: The REST catalog can issue temporary, scoped storage credentials to clients, eliminating the need to distribute long-lived S3 keys to every engine.
| Aspect | Hive Metastore | REST Catalog |
|--------|----------------|--------------|
| Protocol | Thrift (binary) | HTTP/REST (JSON) |
| Dependencies | JVM, Hadoop, RDBMS | Container, optional RDBMS |
| Iceberg feature support | Partial | Full |
| Credential management | Client-side | Vended (server-side) |
| Debugging | Thrift inspection | curl, HTTP logging |
| Multi-engine support | Requires Hive client | HTTP client only |
How Do You Set Up an Iceberg REST Catalog?
Several implementations of the Iceberg REST catalog spec are available. This guide uses a Docker Compose setup that works for development and small production deployments.
Architecture Overview
A typical REST catalog deployment includes three components:
- REST catalog server: Implements the Iceberg REST API. Stores table metadata in a backend database.
- Backend database (PostgreSQL): Stores catalog metadata (namespace definitions, table locations, current metadata pointers).
- Object storage (S3 or MinIO): Stores Iceberg data files, metadata files, and manifests.
Step 1: Docker Compose Configuration
Create a docker-compose.yml file with all three components:
```yaml
version: '3.8'

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: iceberg
      POSTGRES_PASSWORD: iceberg_password
      POSTGRES_DB: iceberg_catalog
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  minio:
    image: minio/minio:latest
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password123
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_data:/data

  rest-catalog:
    image: tabulario/iceberg-rest:1.6.1
    environment:
      CATALOG_WAREHOUSE: s3a://warehouse/iceberg
      CATALOG_IO__IMPL: org.apache.iceberg.aws.s3.S3FileIO
      CATALOG_S3_ENDPOINT: http://minio:9000
      CATALOG_S3_PATH__STYLE__ACCESS: "true"
      AWS_ACCESS_KEY_ID: admin
      AWS_SECRET_ACCESS_KEY: password123
      AWS_REGION: us-east-1
      CATALOG_JDBC_URI: jdbc:postgresql://postgres:5432/iceberg_catalog
      CATALOG_JDBC_USER: iceberg
      CATALOG_JDBC_PASSWORD: iceberg_password
    ports:
      - "8181:8181"
    depends_on:
      - postgres
      - minio

volumes:
  postgres_data:
  minio_data:
```
Step 2: Start the Services
```bash
# Start MinIO first so we can create the warehouse bucket
docker compose up -d minio
sleep 5

# Create the warehouse bucket
docker compose exec minio mc alias set local http://localhost:9000 admin password123
docker compose exec minio mc mb local/warehouse

# Start all services
docker compose up -d
```
Step 3: Verify the REST Catalog Is Running
Test the catalog endpoint:
```bash
# List namespaces (should return an empty list initially)
curl -s http://localhost:8181/v1/namespaces | jq .

# Expected output:
# { "namespaces": [] }
```
Create a namespace:
```bash
curl -s -X POST http://localhost:8181/v1/namespaces \
  -H "Content-Type: application/json" \
  -d '{"namespace": ["analytics"]}' | jq .

# Expected output:
# {
#   "namespace": ["analytics"],
#   "properties": {}
# }
```
The REST catalog is now running and ready to manage Iceberg tables.
How Do You Configure Authentication for the REST Catalog?
Production deployments need authentication. The Iceberg REST catalog spec supports two primary authentication methods.
OAuth2 Authentication
OAuth2 is the recommended authentication method for production. The catalog server validates bearer tokens issued by your identity provider.
Configure the REST catalog server with an OAuth2 provider:
```yaml
rest-catalog:
  environment:
    CATALOG_OAUTH2_ENABLED: "true"
    CATALOG_OAUTH2_SERVER_URI: https://auth.example.com/oauth2/token
    CATALOG_OAUTH2_CREDENTIAL: "client_id:client_secret"
    CATALOG_OAUTH2_SCOPE: "catalog"
```
Client engines then authenticate by obtaining a token and passing it in requests:
```sql
-- RisingWave sink with OAuth2 authentication
CREATE SINK orders_sink FROM orders_mv
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.name = 'production',
    catalog.uri = 'https://catalog.example.com',
    catalog.credential = 'client_id:client_secret',
    catalog.oauth2_server_uri = 'https://auth.example.com/oauth2/token',
    catalog.scope = 'catalog:read,catalog:write',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'orders',
    s3.access.key = 'AKIAEXAMPLE',
    s3.secret.key = 'secret',
    s3.region = 'us-east-1'
);
```
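Under the hood, the engine exchanges `catalog.credential` at `catalog.oauth2_server_uri` for a bearer token using the standard OAuth2 client_credentials grant, then attaches that token to every catalog request. The sketch below shows the shape of that token request using the placeholder URL and credentials from the config above; it builds the request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values matching the sink definition above
TOKEN_URI = "https://auth.example.com/oauth2/token"
CREDENTIAL = "client_id:client_secret"

client_id, client_secret = CREDENTIAL.split(":", 1)

# Standard OAuth2 client_credentials form body
body = urlencode({
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "scope": "catalog:read,catalog:write",
}).encode()

req = Request(
    TOKEN_URI,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
# A real client would now send this request, read access_token from the
# JSON response, and attach "Authorization: Bearer <token>" to every
# catalog call. Here we only construct the request.
print(req.get_method(), req.get_full_url())
```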
Token-Based Authentication
For simpler setups, you can use static bearer tokens:
```sql
CREATE SINK orders_sink FROM orders_mv
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.name = 'staging',
    catalog.uri = 'http://rest-catalog:8181',
    catalog.token = 'your-bearer-token-here',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'orders',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'admin',
    s3.secret.key = 'password123',
    s3.region = 'us-east-1'
);
```
Vended Credentials
One of the REST catalog's most valuable features is vended credentials. Instead of distributing S3 access keys to every engine, the catalog server issues temporary, scoped credentials when a client requests table access.
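Mechanically, a client opts into credential vending by sending the `X-Iceberg-Access-Delegation` header on the load-table call defined in the Iceberg REST catalog spec; the server's response then carries short-lived storage credentials in its config map. A minimal sketch of the request shape, with a placeholder hostname and token (the request is built but not sent):

```python
from urllib.request import Request

CATALOG_URI = "https://catalog.example.com"  # placeholder
TOKEN = "bearer-token-from-oauth2"           # placeholder

# Load-table endpoint from the Iceberg REST catalog spec:
# GET /v1/namespaces/{namespace}/tables/{table}
req = Request(
    f"{CATALOG_URI}/v1/namespaces/analytics/tables/orders",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        # Opt in to server-vended, short-lived storage credentials
        "X-Iceberg-Access-Delegation": "vended-credentials",
    },
)
# The load-table response's "config" map then carries temporary keys
# (e.g. s3.access-key-id, s3.secret-access-key, s3.session-token)
# scoped to this table's storage location.
print(req.get_full_url())
```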
In RisingWave, enable vended credentials with:
```sql
CREATE SINK orders_sink FROM orders_mv
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.name = 'production',
    catalog.uri = 'https://catalog.example.com',
    catalog.credential = 'client_id:client_secret',
    catalog.oauth2_server_uri = 'https://auth.example.com/oauth2/token',
    vended_credentials = 'true',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'orders',
    s3.region = 'us-east-1'
);
```
With vended_credentials = 'true', RisingWave requests temporary S3 credentials from the REST catalog server instead of using locally configured keys. This improves security by centralizing credential management and reducing the blast radius of a compromised client.
How Do You Connect RisingWave to the REST Catalog?
With the REST catalog running, let's set up a complete streaming pipeline from RisingWave to Iceberg.
Step 1: Create Source Data in RisingWave
```sql
-- Create a table to receive streaming events
CREATE TABLE sensor_readings (
    sensor_id VARCHAR,
    location VARCHAR,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION,
    reading_time TIMESTAMP
);
```
Step 2: Create a Materialized View for Aggregation
```sql
-- Compute 5-minute averages per sensor location
CREATE MATERIALIZED VIEW sensor_averages AS
SELECT
    location,
    window_start,
    window_end,
    COUNT(*) AS reading_count,
    AVG(temperature) AS avg_temperature,
    AVG(humidity) AS avg_humidity,
    MIN(temperature) AS min_temperature,
    MAX(temperature) AS max_temperature
FROM TUMBLE(sensor_readings, reading_time, INTERVAL '5 minutes')
GROUP BY location, window_start, window_end;
```
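TUMBLE assigns each reading to exactly one fixed-size, non-overlapping window. The bucketing semantics can be sketched like this (an illustration of the windowing logic only, not RisingWave's implementation):

```python
from datetime import datetime, timedelta

def tumble_window(ts: datetime, size: timedelta) -> tuple[datetime, datetime]:
    """Return the [window_start, window_end) bucket containing ts."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % size  # position of ts inside its window
    start = ts - offset
    return start, start + size

# A 10:03 reading lands in the [10:00, 10:05) window
start, end = tumble_window(datetime(2026, 3, 29, 10, 3, 0), timedelta(minutes=5))
print(start, end)  # 2026-03-29 10:00:00 2026-03-29 10:05:00
```

This is why the expected output later in this guide shows the 10:00–10:04 readings grouped separately from the 10:05–10:07 ones.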
Step 3: Create the Iceberg Sink with REST Catalog
```sql
CREATE SINK sensor_iceberg_sink FROM sensor_averages
WITH (
    connector = 'iceberg',
    type = 'append-only',
    force_append_only = 'true',
    catalog.type = 'rest',
    catalog.name = 'my_catalog',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'sensor_averages',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'admin',
    s3.secret.key = 'password123',
    s3.region = 'us-east-1',
    create_table_if_not_exists = 'true',
    partition_by = 'day(window_start), truncate(5, location)'
);
```
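The partition_by clause uses Iceberg partition transforms: day() buckets rows by calendar date (stored as days since the Unix epoch), and truncate(5, location) groups string values by their first five characters. A rough sketch of the semantics, for illustration only:

```python
from datetime import date, datetime

def day_transform(ts: datetime) -> int:
    """Iceberg's day transform: days since the Unix epoch."""
    return (ts.date() - date(1970, 1, 1)).days

def truncate_transform(width: int, value: str) -> str:
    """Iceberg's truncate transform for strings: first `width` characters."""
    return value[:width]

print(truncate_transform(5, "warehouse-A"))  # wareh
print(truncate_transform(5, "warehouse-B"))  # wareh -- same partition!
print(day_transform(datetime(2026, 3, 29, 10, 0)))
```

Note the pitfall in this example: truncate(5, ...) maps every warehouse-* location into a single 'wareh' partition, so choose a width that actually separates your values.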
Step 4: Insert Test Data and Verify
```sql
-- Insert sample sensor readings
INSERT INTO sensor_readings VALUES
    ('S001', 'warehouse-A', 22.5, 45.0, '2026-03-29 10:00:00'),
    ('S002', 'warehouse-A', 23.1, 44.5, '2026-03-29 10:01:00'),
    ('S003', 'warehouse-B', 18.7, 55.2, '2026-03-29 10:02:00'),
    ('S004', 'warehouse-B', 19.2, 54.8, '2026-03-29 10:03:00'),
    ('S005', 'warehouse-A', 22.8, 45.5, '2026-03-29 10:04:00'),
    ('S006', 'warehouse-C', 25.3, 38.0, '2026-03-29 10:05:00'),
    ('S007', 'warehouse-C', 25.8, 37.5, '2026-03-29 10:06:00'),
    ('S008', 'warehouse-A', 23.5, 44.0, '2026-03-29 10:07:00');
```
Verify the materialized view is computing results:
```sql
SELECT * FROM sensor_averages;
```
Expected output:
```
 location    | window_start            | window_end              | reading_count | avg_temperature | avg_humidity | min_temperature | max_temperature
-------------+-------------------------+-------------------------+---------------+-----------------+--------------+-----------------+----------------
 warehouse-A | 2026-03-29 10:00:00.000 | 2026-03-29 10:05:00.000 |             3 |            22.8 |         45.0 |            22.5 |            23.1
 warehouse-B | 2026-03-29 10:00:00.000 | 2026-03-29 10:05:00.000 |             2 |           18.95 |         55.0 |            18.7 |            19.2
 warehouse-A | 2026-03-29 10:05:00.000 | 2026-03-29 10:10:00.000 |             1 |            23.5 |         44.0 |            23.5 |            23.5
 warehouse-C | 2026-03-29 10:05:00.000 | 2026-03-29 10:10:00.000 |             2 |           25.55 |        37.75 |            25.3 |            25.8
```
After the commit interval elapses, verify that data appears in Iceberg by querying from Trino or Spark:
```sql
-- From Trino connected to the same REST catalog
SELECT * FROM iceberg.analytics.sensor_averages
WHERE window_start >= TIMESTAMP '2026-03-29 10:00:00';
```
For complete documentation on RisingWave's Iceberg catalog configuration, see the Iceberg catalog configuration reference.
How Do You Connect Multiple Engines to the Same REST Catalog?
One of the primary benefits of a REST catalog is multi-engine access. Here is how to configure Spark and Trino alongside RisingWave.
Spark Configuration
Add the REST catalog to your Spark session:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("iceberg-analytics") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", "http://rest-catalog:8181") \
    .config("spark.sql.catalog.my_catalog.warehouse", "s3a://warehouse/iceberg") \
    .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.catalog.my_catalog.s3.endpoint", "http://minio:9000") \
    .getOrCreate()

# Query the table that RisingWave is streaming to
spark.sql("SELECT * FROM my_catalog.analytics.sensor_averages").show()
```
Trino Configuration
Add the REST catalog as a Trino connector in etc/catalog/iceberg.properties:
```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://rest-catalog:8181
iceberg.rest-catalog.warehouse=s3a://warehouse/iceberg
fs.native-s3.enabled=true
s3.endpoint=http://minio:9000
s3.aws-access-key=admin
s3.aws-secret-key=password123
s3.region=us-east-1
s3.path-style-access=true
```
Now all three engines share the same catalog. RisingWave streams data in, Spark handles batch compaction and maintenance, and Trino serves interactive queries, all pointing to the same set of Iceberg tables through one REST catalog endpoint.
Why Multi-Engine Access Matters for Streaming
In a streaming architecture, different engines serve different roles:
- RisingWave: Continuous ingestion and transformation. Writes streaming results to Iceberg tables.
- Spark: Periodic maintenance jobs (compaction, snapshot expiration, schema evolution).
- Trino/DuckDB: Ad-hoc analytics and dashboard queries on the fresh data RisingWave produces.
The REST catalog is the coordination point. When RisingWave commits a new snapshot, Trino immediately sees the new data because both engines read from the same catalog metadata. No syncing scripts, no cache invalidation, no manual refreshes.
For more on building a streaming lakehouse architecture, see the RisingWave lakehouse overview.
What Are Common Troubleshooting Steps?
Connection Refused Errors
If RisingWave cannot connect to the REST catalog:
- Verify the catalog is running: `curl http://rest-catalog:8181/v1/config`
- Check network connectivity between RisingWave and the catalog (Docker network, Kubernetes service DNS)
- Ensure the `catalog.uri` in your CREATE SINK uses the correct hostname and port
Authentication Failures
For OAuth2 authentication issues:
- Test the token endpoint directly: `curl -X POST https://auth.example.com/oauth2/token -d 'grant_type=client_credentials&client_id=...&client_secret=...'`
- Verify the scope includes the required permissions
- Check that the catalog server's OAuth2 configuration matches the client credentials
Table Not Found Errors
If RisingWave reports that the target table does not exist:
- Verify the namespace exists: `curl http://rest-catalog:8181/v1/namespaces`
- Check that the `database.name` and `table.name` parameters match the catalog's namespace and table
- If using `create_table_if_not_exists = 'true'`, ensure the namespace already exists (RisingWave creates the table but not the namespace)
S3 Permission Errors
If writes fail with access denied:
- Verify the S3 credentials have write access to the warehouse path
- If using vended credentials, check that the REST catalog server has the permissions to issue temporary credentials
- For MinIO, ensure the bucket exists and the access policy allows the configured user
FAQ
What is an Iceberg REST catalog?
An Iceberg REST catalog is an HTTP-based service that implements the Apache Iceberg REST catalog specification for managing table metadata. It provides a standard API for creating, listing, loading, and dropping Iceberg tables across multiple query engines. Unlike Hive Metastore, it was purpose-built for Iceberg and supports features like vended credentials, server-side access control, and namespace management over simple HTTP/JSON requests.
Why is the REST catalog better than Hive Metastore for Iceberg?
The REST catalog is purpose-built for Iceberg and avoids the heavyweight dependencies (JVM, Hadoop, Thrift) that Hive Metastore requires. It uses standard HTTP/JSON, making it easier to debug, monitor, and secure. It also supports Iceberg-native features like vended credentials (server-issued temporary storage credentials), multi-table transactions, and fine-grained access control that Hive Metastore was not designed to handle.
How does RisingWave connect to an Iceberg REST catalog?
RisingWave connects to an Iceberg REST catalog through the CREATE SINK statement with catalog.type = 'rest' and catalog.uri pointing to the catalog's HTTP endpoint. It supports OAuth2 authentication via catalog.credential and catalog.oauth2_server_uri, token-based authentication via catalog.token, and vended credentials via vended_credentials = 'true'. RisingWave reads and writes table metadata through the REST API, coordinating seamlessly with other engines using the same catalog.
Can multiple engines write to the same Iceberg table through the REST catalog?
Yes, but with caveats. Iceberg supports concurrent writers through optimistic concurrency control at the catalog level. Each writer attempts to commit a new snapshot, and the catalog rejects commits that conflict. For streaming pipelines, it is best to have one writer (such as RisingWave) per table and use other engines (Spark, Trino) for reads and maintenance. This avoids commit conflicts and simplifies pipeline operations.
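The optimistic concurrency control described above behaves like a compare-and-swap on the table's current metadata pointer: each commit names the metadata file it expects to replace, and the catalog rejects the commit if another writer got there first. A toy simulation of that protocol (illustrative only, not the real catalog API):

```python
class ToyCatalog:
    """Toy model of catalog-level optimistic concurrency control."""

    def __init__(self, initial_metadata: str):
        self.current = initial_metadata

    def commit(self, expected: str, new: str) -> bool:
        """Swap the metadata pointer iff nobody committed in between."""
        if self.current != expected:
            return False  # conflict: caller must re-read and retry
        self.current = new
        return True

catalog = ToyCatalog("metadata/v1.json")

# Two writers both read v1, then race to commit.
assert catalog.commit("metadata/v1.json", "metadata/v2-risingwave.json")  # first commit wins
assert not catalog.commit("metadata/v1.json", "metadata/v2-spark.json")   # stale base: rejected
# The losing writer re-reads the current pointer and retries on top of it.
assert catalog.commit("metadata/v2-risingwave.json", "metadata/v3-spark.json")
```

Retrying a streaming writer's commits on every conflict adds latency, which is why the single-writer-per-table pattern above is the practical default.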
Conclusion
The Iceberg REST catalog is the recommended way to manage Iceberg tables in streaming architectures. Here are the key points:
- REST over Hive Metastore: HTTP-native, lightweight, purpose-built for Iceberg. No JVM or Hadoop dependencies.
- Simple setup: A Docker Compose stack with three containers (catalog server, PostgreSQL, MinIO) gets you running in minutes.
- RisingWave integration: Connect with
catalog.type = 'rest'andcatalog.uri. Supports OAuth2, token auth, and vended credentials. - Multi-engine access: RisingWave, Spark, and Trino all point to the same REST catalog endpoint for coordinated table management.
- Production-ready auth: OAuth2 with vended credentials centralizes security and eliminates the need to distribute storage keys.
Start with the Docker Compose setup in this guide for development, then move to a managed REST catalog service (like Tabular, Lakekeeper, or your cloud provider's offering) for production workloads.
Ready to stream data to Iceberg with a REST catalog? Get started with RisingWave in 5 minutes with the Quickstart guide.
Join our Slack community to ask questions and connect with other stream processing developers.

