Introduction
You have three engines querying your Iceberg tables: RisingWave for streaming writes, Trino for interactive queries, and Spark for batch compaction. Each engine needs to know where your tables live, what their schemas look like, and which snapshots are current. Without a shared catalog, you end up with configuration scattered across engine configs, environment variables, and deployment scripts.
The Iceberg REST catalog solves this by providing a single HTTP API that all engines use to discover and manage Iceberg tables. It is the modern replacement for Hive Metastore, designed specifically for Iceberg's needs rather than adapted from Hive's legacy catalog model.
This guide walks through setting up an Iceberg REST catalog from scratch and connecting it to RisingWave for streaming sink pipelines. You will learn the architecture, configuration options, authentication setup, and how to verify the entire stack works end to end. All SQL examples target RisingWave v2.3.
Why Choose an Iceberg REST Catalog Over Hive Metastore?
If you have used Hive Metastore with Iceberg, you know it works. But it comes with significant operational overhead and architectural limitations that become more painful as your Iceberg deployment grows.
Problems with Hive Metastore for Iceberg
Heavyweight dependency: Hive Metastore requires a JVM process, a relational database (MySQL or PostgreSQL) for its backend, and Hadoop libraries. This is a lot of infrastructure to manage just for table metadata.
Schema mismatch: Hive Metastore was designed for Hive tables, not Iceberg tables. Iceberg stores its own metadata in JSON files on object storage. The Hive Metastore entry for an Iceberg table is essentially a pointer to the metadata location, making the Hive schema largely redundant.
Limited Iceberg features: Hive Metastore does not natively support Iceberg features like multi-table transactions, view definitions, or server-side access control policies. These features require catalog-level support that Hive Metastore was never designed to provide.
Thrift protocol: Hive Metastore uses Apache Thrift for communication, which is harder to debug, monitor, and secure compared to HTTP/REST APIs. You cannot test the catalog with curl or inspect requests in standard HTTP monitoring tools.
Advantages of the REST Catalog
HTTP-native: Standard REST API over HTTP/HTTPS. You can test endpoints with curl, monitor traffic with standard tools, and route through API gateways.
Lightweight: No JVM, no Hadoop dependencies. A REST catalog server can run as a single container with minimal resource requirements.
Iceberg-native: The REST catalog spec was designed specifically for Iceberg. It supports namespace management, table creation with partition specs, server-side access control, and vended credentials.
Multi-engine by design: Every major query engine supports the REST catalog protocol. Configuring a new engine is a matter of pointing it to a URL, not installing Hive client libraries.
Vended credentials: The REST catalog can issue temporary, scoped storage credentials to clients, eliminating the need to distribute long-lived S3 keys to every engine.
| Aspect | Hive Metastore | REST Catalog |
|--------|----------------|--------------|
| Protocol | Thrift (binary) | HTTP/REST (JSON) |
| Dependencies | JVM, Hadoop, RDBMS | Container, optional RDBMS |
| Iceberg feature support | Partial | Full |
| Credential management | Client-side | Vended (server-side) |
| Debugging | Thrift inspection | curl, HTTP logging |
| Multi-engine support | Requires Hive client | HTTP client only |
How Do You Set Up an Iceberg REST Catalog?
Several implementations of the Iceberg REST catalog spec are available. This guide uses a Docker Compose setup that works for development and small production deployments.
Architecture Overview
A typical REST catalog deployment includes three components:
- REST catalog server: Implements the Iceberg REST API. Stores table metadata in a backend database.
- Backend database (PostgreSQL): Stores catalog metadata (namespace definitions, table locations, current metadata pointers).
- Object storage (S3 or MinIO): Stores Iceberg data files, metadata files, and manifests.
Step 1: Docker Compose Configuration
Create a docker-compose.yml file with all three components:
```yaml
version: '3.8'

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: iceberg
      POSTGRES_PASSWORD: iceberg_password
      POSTGRES_DB: iceberg_catalog
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  minio:
    image: minio/minio:latest
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password123
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_data:/data

  rest-catalog:
    image: tabulario/iceberg-rest:1.6.1
    environment:
      CATALOG_WAREHOUSE: s3a://warehouse/iceberg
      CATALOG_IO__IMPL: org.apache.iceberg.aws.s3.S3FileIO
      CATALOG_S3_ENDPOINT: http://minio:9000
      CATALOG_S3_PATH__STYLE__ACCESS: "true"
      AWS_ACCESS_KEY_ID: admin
      AWS_SECRET_ACCESS_KEY: password123
      AWS_REGION: us-east-1
      CATALOG_JDBC_URI: jdbc:postgresql://postgres:5432/iceberg_catalog
      CATALOG_JDBC_USER: iceberg
      CATALOG_JDBC_PASSWORD: iceberg_password
    ports:
      - "8181:8181"
    depends_on:
      - postgres
      - minio

volumes:
  postgres_data:
  minio_data:
```
Step 2: Start the Services
```bash
# Start MinIO first so we can create the warehouse bucket
docker compose up -d minio
sleep 5

# Create the warehouse bucket
docker compose exec minio mc alias set local http://localhost:9000 admin password123
docker compose exec minio mc mb local/warehouse

# Start all services
docker compose up -d
```
Step 3: Verify the REST Catalog Is Running
Test the catalog endpoint:
```bash
# List namespaces (should return an empty list initially)
curl -s http://localhost:8181/v1/namespaces | jq .

# Expected output:
# { "namespaces": [] }
```
Create a namespace:
```bash
curl -s -X POST http://localhost:8181/v1/namespaces \
  -H "Content-Type: application/json" \
  -d '{"namespace": ["analytics"]}' | jq .

# Expected output:
# {
#   "namespace": ["analytics"],
#   "properties": {}
# }
```
The REST catalog is now running and ready to manage Iceberg tables.
How Do You Configure Authentication for the REST Catalog?
Production deployments need authentication. The Iceberg REST catalog spec supports two primary authentication methods.
OAuth2 Authentication
OAuth2 is the recommended authentication method for production. The catalog server validates bearer tokens issued by your identity provider.
Configure the REST catalog server with an OAuth2 provider:
```yaml
rest-catalog:
  environment:
    CATALOG_OAUTH2_ENABLED: "true"
    CATALOG_OAUTH2_SERVER_URI: https://auth.example.com/oauth2/token
    CATALOG_OAUTH2_CREDENTIAL: "client_id:client_secret"
    CATALOG_OAUTH2_SCOPE: "catalog"
```
Client engines then authenticate by obtaining a token and passing it in requests:
```sql
-- RisingWave sink with OAuth2 authentication
CREATE SINK orders_sink FROM orders_mv
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.name = 'production',
    catalog.uri = 'https://catalog.example.com',
    catalog.credential = 'client_id:client_secret',
    catalog.oauth2_server_uri = 'https://auth.example.com/oauth2/token',
    catalog.scope = 'catalog:read,catalog:write',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'orders',
    s3.access.key = 'AKIAEXAMPLE',
    s3.secret.key = 'secret',
    s3.region = 'us-east-1'
);
```
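Under the hood, the engine exchanges `catalog.credential` at `catalog.oauth2_server_uri` for a bearer token using the standard OAuth2 client_credentials grant, then attaches that token to every catalog request. The sketch below shows the shape of that token request using the placeholder URL and credentials from the config above; it builds the request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values matching the sink definition above
TOKEN_URI = "https://auth.example.com/oauth2/token"
CREDENTIAL = "client_id:client_secret"

client_id, client_secret = CREDENTIAL.split(":", 1)

# Standard OAuth2 client_credentials form body
body = urlencode({
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "scope": "catalog:read,catalog:write",
}).encode()

req = Request(
    TOKEN_URI,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
# A real client would now send this request, read access_token from the
# JSON response, and attach "Authorization: Bearer <token>" to every
# catalog call. Here we only construct the request.
print(req.get_method(), req.get_full_url())
```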
Token-Based Authentication
For simpler setups, you can use static bearer tokens:
```sql
CREATE SINK orders_sink FROM orders_mv
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.name = 'staging',
    catalog.uri = 'http://rest-catalog:8181',
    catalog.token = 'your-bearer-token-here',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'orders',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'admin',
    s3.secret.key = 'password123',
    s3.region = 'us-east-1'
);
```
Vended Credentials
One of the REST catalog's most valuable features is vended credentials. Instead of distributing S3 access keys to every engine, the catalog server issues temporary, scoped credentials when a client requests table access.
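Mechanically, a client opts into credential vending by sending the `X-Iceberg-Access-Delegation` header on the load-table call defined in the Iceberg REST catalog spec; the server's response then carries short-lived storage credentials in its config map. A minimal sketch of the request shape, with a placeholder hostname and token (the request is built but not sent):

```python
from urllib.request import Request

CATALOG_URI = "https://catalog.example.com"  # placeholder
TOKEN = "bearer-token-from-oauth2"           # placeholder

# Load-table endpoint from the Iceberg REST catalog spec:
# GET /v1/namespaces/{namespace}/tables/{table}
req = Request(
    f"{CATALOG_URI}/v1/namespaces/analytics/tables/orders",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        # Opt in to server-vended, short-lived storage credentials
        "X-Iceberg-Access-Delegation": "vended-credentials",
    },
)
# The load-table response's "config" map then carries temporary keys
# (e.g. s3.access-key-id, s3.secret-access-key, s3.session-token)
# scoped to this table's storage location.
print(req.get_full_url())
```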
In RisingWave, enable vended credentials with:
```sql
CREATE SINK orders_sink FROM orders_mv
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.name = 'production',
    catalog.uri = 'https://catalog.example.com',
    catalog.credential = 'client_id:client_secret',
    catalog.oauth2_server_uri = 'https://auth.example.com/oauth2/token',
    vended_credentials = 'true',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'orders',
    s3.region = 'us-east-1'
);
```
With vended_credentials = 'true', RisingWave requests temporary S3 credentials from the REST catalog server instead of using locally configured keys. This improves security by centralizing credential management and reducing the blast radius of a compromised client.
How Do You Connect RisingWave to the REST Catalog?
With the REST catalog running, let's set up a complete streaming pipeline from RisingWave to Iceberg.
Step 1: Create Source Data in RisingWave
```sql
-- Create a table to receive streaming events
CREATE TABLE sensor_readings (
    sensor_id VARCHAR,
    location VARCHAR,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION,
    reading_time TIMESTAMP
);
```
Step 2: Create a Materialized View for Aggregation
```sql
-- Compute 5-minute averages per sensor location
CREATE MATERIALIZED VIEW sensor_averages AS
SELECT
    location,
    window_start,
    window_end,
    COUNT(*) AS reading_count,
    AVG(temperature) AS avg_temperature,
    AVG(humidity) AS avg_humidity,
    MIN(temperature) AS min_temperature,
    MAX(temperature) AS max_temperature
FROM TUMBLE(sensor_readings, reading_time, INTERVAL '5 minutes')
GROUP BY location, window_start, window_end;
```
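TUMBLE assigns each reading to exactly one fixed-size, non-overlapping window. The bucketing semantics can be sketched like this (an illustration of the windowing logic only, not RisingWave's implementation):

```python
from datetime import datetime, timedelta

def tumble_window(ts: datetime, size: timedelta) -> tuple[datetime, datetime]:
    """Return the [window_start, window_end) bucket containing ts."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % size  # position of ts inside its window
    start = ts - offset
    return start, start + size

# A 10:03 reading lands in the [10:00, 10:05) window
start, end = tumble_window(datetime(2026, 3, 29, 10, 3, 0), timedelta(minutes=5))
print(start, end)  # 2026-03-29 10:00:00 2026-03-29 10:05:00
```

This is why the expected output later in this guide shows the 10:00–10:04 readings grouped separately from the 10:05–10:07 ones.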
Step 3: Create the Iceberg Sink with REST Catalog
```sql
CREATE SINK sensor_iceberg_sink FROM sensor_averages
WITH (
    connector = 'iceberg',
    type = 'append-only',
    force_append_only = 'true',
    catalog.type = 'rest',
    catalog.name = 'my_catalog',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3a://warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'sensor_averages',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'admin',
    s3.secret.key = 'password123',
    s3.region = 'us-east-1',
    create_table_if_not_exists = 'true',
    partition_by = 'day(window_start), truncate(5, location)'
);
```
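The partition_by clause uses Iceberg partition transforms: day() buckets rows by calendar date (stored as days since the Unix epoch), and truncate(5, location) groups string values by their first five characters. A rough sketch of the semantics, for illustration only:

```python
from datetime import date, datetime

def day_transform(ts: datetime) -> int:
    """Iceberg's day transform: days since the Unix epoch."""
    return (ts.date() - date(1970, 1, 1)).days

def truncate_transform(width: int, value: str) -> str:
    """Iceberg's truncate transform for strings: first `width` characters."""
    return value[:width]

print(truncate_transform(5, "warehouse-A"))  # wareh
print(truncate_transform(5, "warehouse-B"))  # wareh -- same partition!
print(day_transform(datetime(2026, 3, 29, 10, 0)))
```

Note the pitfall in this example: truncate(5, ...) maps every warehouse-* location into a single 'wareh' partition, so choose a width that actually separates your values.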
Step 4: Insert Test Data and Verify
```sql
-- Insert sample sensor readings
INSERT INTO sensor_readings VALUES
    ('S001', 'warehouse-A', 22.5, 45.0, '2026-03-29 10:00:00'),
    ('S002', 'warehouse-A', 23.1, 44.5, '2026-03-29 10:01:00'),
    ('S003', 'warehouse-B', 18.7, 55.2, '2026-03-29 10:02:00'),
    ('S004', 'warehouse-B', 19.2, 54.8, '2026-03-29 10:03:00'),
    ('S005', 'warehouse-A', 22.8, 45.5, '2026-03-29 10:04:00'),
    ('S006', 'warehouse-C', 25.3, 38.0, '2026-03-29 10:05:00'),
    ('S007', 'warehouse-C', 25.8, 37.5, '2026-03-29 10:06:00'),
    ('S008', 'warehouse-A', 23.5, 44.0, '2026-03-29 10:07:00');
```
Verify the materialized view is computing results:
```sql
SELECT * FROM sensor_averages;
```
Expected output:
```
 location    | window_start            | window_end              | reading_count | avg_temperature | avg_humidity | min_temperature | max_temperature
-------------+-------------------------+-------------------------+---------------+-----------------+--------------+-----------------+----------------
 warehouse-A | 2026-03-29 10:00:00.000 | 2026-03-29 10:05:00.000 |             3 |            22.8 |         45.0 |            22.5 |            23.1
 warehouse-B | 2026-03-29 10:00:00.000 | 2026-03-29 10:05:00.000 |             2 |           18.95 |         55.0 |            18.7 |            19.2
 warehouse-A | 2026-03-29 10:05:00.000 | 2026-03-29 10:10:00.000 |             1 |            23.5 |         44.0 |            23.5 |            23.5
 warehouse-C | 2026-03-29 10:05:00.000 | 2026-03-29 10:10:00.000 |             2 |           25.55 |        37.75 |            25.3 |            25.8
```
After the commit interval elapses, verify that data appears in Iceberg by querying from Trino or Spark:
```sql
-- From Trino connected to the same REST catalog
SELECT * FROM iceberg.analytics.sensor_averages
WHERE window_start >= TIMESTAMP '2026-03-29 10:00:00';
```
For complete documentation on RisingWave's Iceberg catalog configuration, see the Iceberg catalog configuration reference.
How Do You Connect Multiple Engines to the Same REST Catalog?
One of the primary benefits of a REST catalog is multi-engine access. Here is how to configure Spark and Trino alongside RisingWave.
Spark Configuration
Add the REST catalog to your Spark session:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("iceberg-analytics") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", "http://rest-catalog:8181") \
    .config("spark.sql.catalog.my_catalog.warehouse", "s3a://warehouse/iceberg") \
    .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.catalog.my_catalog.s3.endpoint", "http://minio:9000") \
    .getOrCreate()

# Query the table that RisingWave is streaming to
spark.sql("SELECT * FROM my_catalog.analytics.sensor_averages").show()
```
Trino Configuration
Add the REST catalog as a Trino connector in etc/catalog/iceberg.properties:
```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://rest-catalog:8181
iceberg.rest-catalog.warehouse=s3a://warehouse/iceberg
fs.native-s3.enabled=true
s3.endpoint=http://minio:9000
s3.aws-access-key=admin
s3.aws-secret-key=password123
s3.region=us-east-1
s3.path-style-access=true
```
Now all three engines share the same catalog. RisingWave streams data in, Spark handles batch compaction and maintenance, and Trino serves interactive queries, all pointing to the same set of Iceberg tables through one REST catalog endpoint.
Why Multi-Engine Access Matters for Streaming
In a streaming architecture, different engines serve different roles:
- RisingWave: Continuous ingestion and transformation. Writes streaming results to Iceberg tables.
- Spark: Periodic maintenance jobs (compaction, snapshot expiration, schema evolution).
- Trino/DuckDB: Ad-hoc analytics and dashboard queries on the fresh data RisingWave produces.
The REST catalog is the coordination point. When RisingWave commits a new snapshot, Trino immediately sees the new data because both engines read from the same catalog metadata. No syncing scripts, no cache invalidation, no manual refreshes.
For more on building a streaming lakehouse architecture, see the RisingWave lakehouse overview.
What Are Common Troubleshooting Steps?
Connection Refused Errors
If RisingWave cannot connect to the REST catalog:
- Verify the catalog is running: `curl http://rest-catalog:8181/v1/config`
- Check network connectivity between RisingWave and the catalog (Docker network, Kubernetes service DNS)
- Ensure the `catalog.uri` in your CREATE SINK uses the correct hostname and port
Authentication Failures
For OAuth2 authentication issues:
- Test the token endpoint directly: `curl -X POST https://auth.example.com/oauth2/token -d 'grant_type=client_credentials&client_id=...&client_secret=...'`
- Verify the scope includes the required permissions
- Check that the catalog server's OAuth2 configuration matches the client credentials
Table Not Found Errors
If RisingWave reports that the target table does not exist:
- Verify the namespace exists: `curl http://rest-catalog:8181/v1/namespaces`
- Check that the `database.name` and `table.name` parameters match the catalog's namespace and table
- If using `create_table_if_not_exists = 'true'`, ensure the namespace already exists (RisingWave creates the table but not the namespace)
S3 Permission Errors
If writes fail with access denied:
- Verify the S3 credentials have write access to the warehouse path
- If using vended credentials, check that the REST catalog server has the permissions to issue temporary credentials
- For MinIO, ensure the bucket exists and the access policy allows the configured user
FAQ
What is an Iceberg REST catalog?
An Iceberg REST catalog is an HTTP-based service that implements the Apache Iceberg REST catalog specification for managing table metadata. It provides a standard API for creating, listing, loading, and dropping Iceberg tables across multiple query engines. Unlike Hive Metastore, it was purpose-built for Iceberg and supports features like vended credentials, server-side access control, and namespace management over simple HTTP/JSON requests.
Why is the REST catalog better than Hive Metastore for Iceberg?
The REST catalog is purpose-built for Iceberg and avoids the heavyweight dependencies (JVM, Hadoop, Thrift) that Hive Metastore requires. It uses standard HTTP/JSON, making it easier to debug, monitor, and secure. It also supports Iceberg-native features like vended credentials (server-issued temporary storage credentials), multi-table transactions, and fine-grained access control that Hive Metastore was not designed to handle.
How does RisingWave connect to an Iceberg REST catalog?
RisingWave connects to an Iceberg REST catalog through the CREATE SINK statement with catalog.type = 'rest' and catalog.uri pointing to the catalog's HTTP endpoint. It supports OAuth2 authentication via catalog.credential and catalog.oauth2_server_uri, token-based authentication via catalog.token, and vended credentials via vended_credentials = 'true'. RisingWave reads and writes table metadata through the REST API, coordinating seamlessly with other engines using the same catalog.
Can multiple engines write to the same Iceberg table through the REST catalog?
Yes, but with caveats. Iceberg supports concurrent writers through optimistic concurrency control at the catalog level. Each writer attempts to commit a new snapshot, and the catalog rejects commits that conflict. For streaming pipelines, it is best to have one writer (such as RisingWave) per table and use other engines (Spark, Trino) for reads and maintenance. This avoids commit conflicts and simplifies pipeline operations.
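The optimistic concurrency control described above behaves like a compare-and-swap on the table's current metadata pointer: each commit names the metadata file it expects to replace, and the catalog rejects the commit if another writer got there first. A toy simulation of that protocol (illustrative only, not the real catalog API):

```python
class ToyCatalog:
    """Toy model of catalog-level optimistic concurrency control."""

    def __init__(self, initial_metadata: str):
        self.current = initial_metadata

    def commit(self, expected: str, new: str) -> bool:
        """Swap the metadata pointer iff nobody committed in between."""
        if self.current != expected:
            return False  # conflict: caller must re-read and retry
        self.current = new
        return True

catalog = ToyCatalog("metadata/v1.json")

# Two writers both read v1, then race to commit.
assert catalog.commit("metadata/v1.json", "metadata/v2-risingwave.json")  # first commit wins
assert not catalog.commit("metadata/v1.json", "metadata/v2-spark.json")   # stale base: rejected
# The losing writer re-reads the current pointer and retries on top of it.
assert catalog.commit("metadata/v2-risingwave.json", "metadata/v3-spark.json")
```

Retrying a streaming writer's commits on every conflict adds latency, which is why the single-writer-per-table pattern above is the practical default.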
Conclusion
The Iceberg REST catalog is the recommended way to manage Iceberg tables in streaming architectures. Here are the key points:
- REST over Hive Metastore: HTTP-native, lightweight, purpose-built for Iceberg. No JVM or Hadoop dependencies.
- Simple setup: A Docker Compose stack with three containers (catalog server, PostgreSQL, MinIO) gets you running in minutes.
- RisingWave integration: Connect with
catalog.type = 'rest'andcatalog.uri. Supports OAuth2, token auth, and vended credentials. - Multi-engine access: RisingWave, Spark, and Trino all point to the same REST catalog endpoint for coordinated table management.
- Production-ready auth: OAuth2 with vended credentials centralizes security and eliminates the need to distribute storage keys.
Start with the Docker Compose setup in this guide for development, then move to a managed REST catalog service (like Tabular, Lakekeeper, or your cloud provider's offering) for production workloads.
Ready to stream data to Iceberg with a REST catalog? Get started with RisingWave in 5 minutes with the Quickstart guide.
Join our Slack community to ask questions and connect with other stream processing developers.

