Iceberg vs Parquet: Table Format vs File Format
Apache Iceberg and Parquet are often confused, but they solve different problems. Parquet is a file format: it defines how data is laid out in columnar form within a single file. Iceberg is a table format: it defines how many data files (usually Parquet) are tracked as one table with ACID transactions, schema evolution, and time travel.
How They Relate
Iceberg Table
├── metadata.json (table schema, snapshots, partition spec)
└── manifest-list-1.avro (list of manifest files)
    └── manifest-1.avro (list of data files + column stats)
        ├── data-001.parquet ← Parquet file
        ├── data-002.parquet ← Parquet file
        └── data-003.parquet ← Parquet file
Iceberg USES Parquet as its data file format. They are complementary, not competitors.
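The hierarchy above can be sketched as a toy in-memory model. This is a simplified illustration and not Iceberg's actual API or file layout: the class names (`TableMetadata`, `Snapshot`, `Manifest`) and the `data_files` helper are hypothetical, and real manifests carry far more detail. It shows how a reader resolves a snapshot to a concrete set of Parquet files, and why reading an older snapshot (time travel) is just a different starting pointer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Manifest:
    """Toy stand-in for a manifest file: a list of data file paths."""
    data_files: list


@dataclass
class Snapshot:
    """Toy stand-in for a snapshot: its manifest list names the manifests to read."""
    snapshot_id: int
    manifest_list: list


@dataclass
class TableMetadata:
    """Toy stand-in for metadata.json: all snapshots plus a pointer to the current one."""
    snapshots: dict = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None


def data_files(table: TableMetadata, snapshot_id: Optional[int] = None) -> list:
    """Walk metadata -> manifest list -> manifests -> data files."""
    sid = table.current_snapshot_id if snapshot_id is None else snapshot_id
    if sid is None:
        return []  # empty table: no snapshot has been committed yet
    snapshot = table.snapshots[sid]
    return [path for manifest in snapshot.manifest_list for path in manifest.data_files]


# Snapshot 1 sees two files; snapshot 2 reuses that manifest and adds one more file.
m1 = Manifest(["data-001.parquet", "data-002.parquet"])
m2 = Manifest(["data-003.parquet"])
table = TableMetadata(
    snapshots={1: Snapshot(1, [m1]), 2: Snapshot(2, [m1, m2])},
    current_snapshot_id=2,
)

current_files = data_files(table)               # reads the latest snapshot
older_files = data_files(table, snapshot_id=1)  # "time travel" to snapshot 1
```

Note that the Parquet files themselves never change in this model; a new snapshot only adds metadata that points at a different set of files.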
Comparison
| Aspect | Parquet | Apache Iceberg |
| Level | File format | Table format |
| Scope | Single file | Multiple files = one table |
| ACID | ❌ | ✅ |
| Schema | Per-file | Table-wide, evolvable |
| Partitioning | ❌ (not part of the format; directory naming conventions only) | ✅ (hidden, evolvable) |
| Time travel | ❌ | ✅ (snapshots) |
| Concurrent writes | ❌ | ✅ (optimistic concurrency) |
| Query planning | List and open every file (footer stats only) | Manifest column stats + partition pruning |
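
To make the query planning row concrete, here is a minimal sketch of file pruning with min/max column statistics. It is an illustration under stated assumptions, not Iceberg's planner: `ManifestEntry`, `plan_scan`, and the `event_date` column the bounds describe are all hypothetical. The point is that the planner can decide which files to skip from metadata alone, without opening any Parquet file.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ManifestEntry:
    """Toy manifest entry: a file path plus min/max bounds for a hypothetical event_date column."""
    path: str
    min_date: date
    max_date: date


def plan_scan(entries, lo, hi):
    """Keep only files whose [min_date, max_date] range overlaps the query range [lo, hi]."""
    return [e.path for e in entries if e.min_date <= hi and e.max_date >= lo]


manifest = [
    ManifestEntry("data-001.parquet", date(2026, 1, 1), date(2026, 1, 31)),
    ManifestEntry("data-002.parquet", date(2026, 2, 1), date(2026, 2, 28)),
    ManifestEntry("data-003.parquet", date(2026, 3, 1), date(2026, 3, 31)),
]

# Query: event_date BETWEEN 2026-02-10 AND 2026-03-05
selected = plan_scan(manifest, date(2026, 2, 10), date(2026, 3, 5))
# selected == ["data-002.parquet", "data-003.parquet"]; data-001 is never opened
```

With raw Parquet, comparable statistics exist only inside each file's footer, so an engine has to list the directory and read every footer before it can skip anything.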
Frequently Asked Questions
Can I use Iceberg without Parquet?
Yes. Iceberg also supports ORC and Avro as data file formats, but Parquet is the default and the most commonly used. In practice, most Iceberg tables are Iceberg metadata plus Parquet data files.
Should I use raw Parquet files or Iceberg?
Use Iceberg. Raw Parquet on S3 lacks transactions, schema management, and efficient query planning. Iceberg adds these for the cost of a few small metadata files per commit and a catalog to track the current table state. For analytical tables in 2026, there is rarely a good reason to use raw Parquet.
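The transaction guarantee mentioned above rests on optimistic concurrency: a writer prepares its change against the version it read, then publishes it with an atomic swap that fails if another writer committed first. The sketch below models that idea with a hypothetical in-memory `ToyCatalog`; the class, its `swap` method, and `append_file` are assumptions made for illustration and do not correspond to any real Iceberg catalog API.

```python
import threading


class CommitConflict(Exception):
    """Raised when the table changed after the writer read it."""


class ToyCatalog:
    """Hypothetical in-memory catalog tracking one table's current version and file list."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.files = ()

    def load(self):
        with self._lock:
            return self.version, self.files

    def swap(self, expected_version, new_files):
        """Atomically publish new_files only if nobody committed in between."""
        with self._lock:
            if self.version != expected_version:
                raise CommitConflict(
                    f"expected v{expected_version}, found v{self.version}"
                )
            self.version += 1
            self.files = tuple(new_files)
            return self.version


def append_file(catalog, path, max_retries=3):
    """Optimistic append: read, build the new state, try to swap, retry on conflict."""
    for _ in range(max_retries):
        version, files = catalog.load()
        try:
            return catalog.swap(version, files + (path,))
        except CommitConflict:
            continue  # another writer won; re-read and reapply the change
    raise CommitConflict("gave up after repeated conflicts")


catalog = ToyCatalog()
append_file(catalog, "data-001.parquet")

# Simulate a stale writer: it reads the table, then another writer commits first.
stale_version, stale_files = catalog.load()
append_file(catalog, "data-002.parquet")
try:
    catalog.swap(stale_version, stale_files + ("data-003.parquet",))
    conflict_detected = False
except CommitConflict:
    conflict_detected = True

# Retrying from fresh state succeeds and keeps both writers' files.
final_version = append_file(catalog, "data-003.parquet")
```

A directory of raw Parquet files has no equivalent of the `swap` step, which is why two concurrent writers can leave readers with a partial or inconsistent view.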

