Iceberg vs Parquet: Table Format vs File Format

Apache Iceberg and Parquet are often confused, but they solve different problems. Parquet is a file format: it defines how data is laid out in columnar form within a single file. Iceberg is a table format: it defines how many data files (typically Parquet) are organized into one table with ACID transactions, schema evolution, and time travel.

How They Relate

Iceberg Table
└── metadata.json (table schema, snapshots, partition spec)
    └── manifest-list-1.avro (one per snapshot; list of manifest files)
        └── manifest-1.avro (list of data files + column stats)
            ├── data-001.parquet  ← Parquet file
            ├── data-002.parquet  ← Parquet file
            └── data-003.parquet  ← Parquet file

Iceberg uses Parquet as its data file format (by default; ORC and Avro are also supported). The two are complementary, not competitors.
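
As a rough illustration of the hierarchy above, the following Python sketch models the chain from table metadata down to data files. It is a toy in-memory model, not the Iceberg API: the class names (DataFile, Manifest, Snapshot, TableMetadata) and the files_as_of helper are hypothetical, and real Iceberg stores these structures as JSON and Avro files rather than Python objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    path: str          # e.g. a Parquet file in object storage
    record_count: int


@dataclass(frozen=True)
class Manifest:
    data_files: tuple[DataFile, ...]   # a manifest lists data files


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifests: tuple[Manifest, ...]    # stands in for the manifest list


@dataclass
class TableMetadata:
    snapshots: list[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def files_as_of(self, snapshot_id: int) -> list[str]:
        """Resolve the data files visible in one snapshot (time travel)."""
        snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        return [f.path for m in snap.manifests for f in m.data_files]


m1 = Manifest((DataFile("data-001.parquet", 100), DataFile("data-002.parquet", 80)))
m2 = Manifest((DataFile("data-003.parquet", 50),))

table = TableMetadata()
table.snapshots.append(Snapshot(1, (m1,)))       # first commit
table.snapshots.append(Snapshot(2, (m1, m2)))    # append reuses m1, adds m2

print(table.files_as_of(1))   # files visible in the older snapshot
print(table.files_as_of(2))   # files visible in the current snapshot
```

The point of the sketch is that an append does not rewrite existing Parquet files or manifests. A new snapshot simply references the old manifest plus a new one, which is why older snapshots stay readable.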

Comparison

Aspect	Parquet	Apache Iceberg
Level	File format	Table format
Scope	Single file	Multiple files = one table
ACID	❌	✅
Schema	Per-file	Table-wide, evolvable
Partitioning	❌ (only by directory convention, e.g. Hive-style paths)	✅ (hidden, evolvable)
Time travel	❌	✅ (snapshots)
Concurrent writes	❌	✅ (optimistic concurrency)
Query planning	List files, then read each footer for row-group stats	Partition pruning + per-file column stats from manifests
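
To make the query planning row more concrete, here is a minimal sketch of how per-file statistics let a planner skip files without opening them. The FileStats record, the plan_scan function, and the sample partition values and ID ranges are all assumptions made up for illustration. They mimic the kind of information an Iceberg manifest carries, not its actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    partition: str   # hypothetical partition value, e.g. a day
    min_id: int      # lower bound of column "id" in this file
    max_id: int      # upper bound of column "id" in this file


def plan_scan(files, partition, id_value):
    """Return paths of files that could contain rows matching the filter."""
    selected = []
    for f in files:
        if f.partition != partition:
            continue                      # partition pruning
        if not (f.min_id <= id_value <= f.max_id):
            continue                      # column-stats pruning
        selected.append(f.path)
    return selected


manifest = [
    FileStats("data-001.parquet", "2024-01-01", 1, 1000),
    FileStats("data-002.parquet", "2024-01-01", 1001, 2000),
    FileStats("data-003.parquet", "2024-01-02", 1, 1500),
]

# Filter: partition = '2024-01-01' AND id = 1200
to_read = plan_scan(manifest, partition="2024-01-01", id_value=1200)
print(to_read)
```

With these sample values only data-002.parquet survives: data-003.parquet is dropped by partition and data-001.parquet by its ID range. With raw Parquet, an engine would typically have to list the directory and read each file's footer to reach the same conclusion.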

Frequently Asked Questions

Can I use Iceberg without Parquet?

Yes. Iceberg supports ORC and Avro as alternative data file formats, selected per table through the write.format.default property, but Parquet is the default and the most commonly used. In practice, most Iceberg tables are Iceberg metadata plus Parquet data files.

Should I use raw Parquet files or Iceberg?

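Use Iceberg. A directory of raw Parquet files on S3 has no transactions, no table-level schema management, and no metadata that lets an engine plan a query without listing and opening files. Iceberg adds these at the cost of a small amount of extra metadata. There is no good reason to use raw Parquet for long-lived analytical tables in 2026.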
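
To show what "transactions" means here, the sketch below imitates optimistic concurrency: each writer prepares its change against the version it read, and the commit succeeds only if that version is still current. The Catalog class, CommitConflict exception, and append_file helper are hypothetical names for this example. A lock stands in for the atomic pointer swap that a real catalog provides.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class Catalog:
    """Holds the pointer to the current table version (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.files = ()

    def read(self):
        # Read version and file list together so they are consistent.
        with self._lock:
            return self.version, self.files

    def compare_and_swap(self, expected_version, new_files):
        # Succeeds only if nobody committed since expected_version was read.
        with self._lock:
            if self.version != expected_version:
                raise CommitConflict(
                    f"expected v{expected_version}, found v{self.version}"
                )
            self.version += 1
            self.files = new_files


def append_file(catalog, path, max_retries=3):
    """Append one data file, retrying on conflict with fresh state."""
    for _ in range(max_retries):
        base_version, base_files = catalog.read()
        try:
            catalog.compare_and_swap(base_version, base_files + (path,))
            return base_version + 1
        except CommitConflict:
            continue   # another writer won; re-read and try again
    raise CommitConflict("gave up after retries")


catalog = Catalog()
v0, files0 = catalog.read()                              # writers A and B both read v0
catalog.compare_and_swap(v0, files0 + ("a.parquet",))    # A commits first

try:
    catalog.compare_and_swap(v0, files0 + ("b.parquet",))  # B's view is stale
    conflict = False
except CommitConflict:
    conflict = True

final_version = append_file(catalog, "b.parquet")        # B retries on fresh state
print(conflict, final_version, catalog.files)
```

Writer B's stale commit is rejected rather than silently overwriting A's file, and the retry lands on top of A's change. A plain directory of Parquet files has no equivalent check, so two concurrent writers can leave readers with a partial or inconsistent view.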