Querying Apache Iceberg with Trino, DuckDB, and Spark

Apache Iceberg tables can be queried by multiple engines at the same time: Trino for interactive SQL, DuckDB for local analytics, Spark for large-scale processing. Because the table format is open, no single engine owns the data, and that multi-engine support is Iceberg's key advantage over proprietary warehouse formats.

Engine Comparison

Engine	Best For	Latency	Scale	Setup
Trino	Interactive SQL, dashboards	Seconds	Large clusters	Medium
DuckDB	Local analytics, notebooks	Sub-second (small data)	Single machine	Easy
Spark	Large-scale ETL, ML	Minutes	Massive clusters	Complex
Snowflake	Managed analytics	Seconds	Elastic	Easy
BigQuery	GCP-native analytics	Seconds	Serverless	Easy

Trino + Iceberg

-- Configure Iceberg catalog in Trino
-- trino/catalog/iceberg.properties:
-- connector.name=iceberg
-- iceberg.catalog.type=rest
-- iceberg.rest-catalog.uri=http://catalog:8181

SELECT event_date, COUNT(*) AS events, AVG(duration) AS avg_duration
FROM iceberg.analytics.events
WHERE event_date > DATE '2026-03-01'
GROUP BY event_date
ORDER BY event_date;

DuckDB + Iceberg

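-- DuckDB reads Iceberg tables directly from S3 through the iceberg extension.
-- S3 access also needs the httpfs extension and credentials (e.g. CREATE SECRET).
INSTALL iceberg; LOAD iceberg;
INSTALL httpfs; LOAD httpfs;
SELECT * FROM iceberg_scan('s3://lakehouse/warehouse/analytics/events');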
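
For notebook use, the same scan can be driven from Python. The sketch below is one possible shape, not a reference implementation. It assumes the `duckdb` Python package is installed and that the table has the `event_date` and `duration` columns used in the Trino query earlier. The helper names `events_per_day_sql` and `run_in_duckdb` are hypothetical, and S3 credential setup is left out.

```python
from datetime import date


def events_per_day_sql(table_path: str, since: date) -> str:
    """Build a daily aggregate over an Iceberg table read with iceberg_scan."""
    # Values are inlined for readability; table_path and since are treated as
    # trusted constants here, not user input.
    return (
        "SELECT event_date, COUNT(*) AS events, AVG(duration) AS avg_duration\n"
        f"FROM iceberg_scan('{table_path}')\n"
        f"WHERE event_date > DATE '{since.isoformat()}'\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date"
    )


def run_in_duckdb(sql: str):
    """Run the query in an in-memory DuckDB session and return all rows."""
    import duckdb  # third-party dependency (assumption): pip install duckdb

    con = duckdb.connect()
    # httpfs provides s3:// access; credentials must be configured separately.
    for ext in ("httpfs", "iceberg"):
        con.execute(f"INSTALL {ext}")
        con.execute(f"LOAD {ext}")
    return con.execute(sql).fetchall()


query = events_per_day_sql(
    "s3://lakehouse/warehouse/analytics/events", date(2026, 3, 1)
)
print(query)
# rows = run_in_duckdb(query)  # needs network access and S3 credentials
```

Keeping the query builder separate from the execution step means the SQL can be inspected or reused in another engine without a DuckDB session.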

The Multi-Engine Pattern

RisingWave ──→ Iceberg ←── Trino (dashboards)
                       ←── DuckDB (ad-hoc analysis)
                       ←── Spark (ML training)
                       ←── Snowflake (BI reporting)

Write once (via RisingWave), read from any engine.
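
Spark can join the same pattern by pointing at the catalog the other engines use. The following is a minimal sketch, assuming the REST catalog at `http://catalog:8181` from the Trino configuration, a Spark catalog named `iceberg` (an arbitrary choice), and that `pyspark` plus the matching Iceberg Spark runtime jar are available. The helper names `iceberg_rest_catalog_conf` and `build_session` are hypothetical.

```python
def iceberg_rest_catalog_conf(catalog: str, uri: str) -> dict:
    """Return Spark settings that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog name and tells Spark how to reach it.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }


def build_session(conf: dict):
    """Create a SparkSession with the given settings applied."""
    from pyspark.sql import SparkSession  # third-party (assumption): pip install pyspark

    builder = SparkSession.builder.appName("iceberg-reader")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = iceberg_rest_catalog_conf("iceberg", "http://catalog:8181")
for key, value in conf.items():
    print(f"{key}={value}")
# spark = build_session(conf)  # also needs the Iceberg Spark runtime jar
# spark.sql("SELECT COUNT(*) FROM iceberg.analytics.events").show()
```

With the catalog registered under the same name, the table is addressed as `iceberg.analytics.events`, matching the Trino query.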

Frequently Asked Questions

Which query engine should I use with Iceberg?

Trino for interactive SQL and dashboards. DuckDB for local/notebook analysis. Spark for large-scale ETL and ML. Snowflake/BigQuery if you're already on those platforms.

Can I query the same Iceberg table from multiple engines?

Yes. That's Iceberg's core value. Multiple engines can read the same table at the same time, and snapshot isolation means each query sees one consistent version of the table even while a writer commits new data. No data copying or format conversion is needed.
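
To make the snapshot isolation idea concrete, here is a toy model in plain Python. It is only an illustration of the concept, not how Iceberg is implemented: real tables track manifests and metadata files and rely on an atomic pointer swap in the catalog. The `ToyTable` and `Snapshot` classes and the file names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # immutable set of files making up this table version


@dataclass
class ToyTable:
    snapshots: dict = field(default_factory=dict)
    current_id: int = 0

    def __post_init__(self):
        # Start with an empty table version.
        self.snapshots[0] = Snapshot(0, ())

    def commit_append(self, new_files):
        """Writer: build a new snapshot, then move the current pointer to it."""
        current = self.snapshots[self.current_id]
        new_id = self.current_id + 1
        self.snapshots[new_id] = Snapshot(
            new_id, current.data_files + tuple(new_files)
        )
        self.current_id = new_id
        return new_id

    def begin_read(self):
        """Reader: pin whichever snapshot is current when the query starts."""
        return self.snapshots[self.current_id]


table = ToyTable()
table.commit_append(["events-0001.parquet"])

dashboard_query = table.begin_read()          # e.g. a Trino query starts
table.commit_append(["events-0002.parquet"])  # the writer commits mid-query
notebook_query = table.begin_read()           # e.g. a DuckDB query starts later

print(dashboard_query.data_files)  # ('events-0001.parquet',)
print(notebook_query.data_files)   # ('events-0001.parquet', 'events-0002.parquet')
```

The earlier reader keeps its pinned file list after the second commit, which is why concurrent engines never observe a half-written table.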