Implementation status
This page summarizes the features supported by different Parquet implementations.
Note: If you find out-of-date information, please help us improve the accuracy of this page by opening an issue or submitting a pull request.
Legend
The value in each box means:
- ✅: supported. A footnote is added when support is partial.
- ❌: not supported
- (R): only read support
- (W): only write support
- (blank): no data
Implementations
- arrow (C++)
- parquet-java (Java)
- arrow-go (Go)
- arrow-rs (Rust)
- cudf (cuDF C++)
- hyparquet (JavaScript)
- duckdb (C++)
Physical types
Physical types are defined by the enum Type in parquet.thrift.
| Data Type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
|---|---|---|---|---|---|---|---|
| BOOLEAN | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| INT32 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| INT64 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| INT96 (1) | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | (R) |
| FLOAT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DOUBLE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BYTE_ARRAY | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| FIXED_LEN_BYTE_ARRAY | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
(1) This type is deprecated, but as of 2024 it is still common in newly produced Parquet files.
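Because INT96 still appears in many files, readers sometimes need to convert it by hand. The sketch below assumes the timestamp convention commonly used by writers such as Impala and Spark, which this page does not itself describe: 12 little-endian bytes holding the nanoseconds within the day followed by a Julian day number. The function name `decode_int96_timestamp` is hypothetical, and the value is assumed to be UTC-normalized.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 value as a UTC datetime.

    Assumed layout (not stated on this page): an 8-byte little-endian
    count of nanoseconds within the day, then a 4-byte little-endian
    Julian day number.
    """
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)
```

Python's `datetime` cannot hold nanoseconds, so this sketch truncates to microseconds; a caller that needs full precision would keep `nanos_of_day` as an integer instead.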
Logical types
Logical types are defined by the union LogicalType in parquet.thrift and described in LogicalTypes.md.
| Data Type | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
|---|---|---|---|---|---|---|---|
| STRING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ENUM | ❌ | ✅ | ✅ | ✅(1) | ❌ | ✅ | ✅ |
| UUID | ❌ | ✅ | ✅ | ✅(1) | ❌ | ✅ | ✅ |
| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DECIMAL (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DECIMAL (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | (R) |
| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| FLOAT16 | ✅ | ✅(1) | ✅ | ✅ | ✅ | ✅ | ✅ |
| DATE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| TIME (INT32) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| TIME (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| TIMESTAMP (INT64) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| INTERVAL | ✅ | ✅(1) | ✅ | ✅ | ❌ | ✅ | ✅ |
| JSON | ✅ | ✅(1) | ✅ | ✅(1) | ❌ | ✅ | ✅ |
| BSON | ❌ | ✅(1) | ✅ | ✅(1) | ❌ | ❌ | ❌ |
| VARIANT | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | |
| GEOMETRY | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| GEOGRAPHY | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| LIST | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
| MAP | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
| UNKNOWN (always null) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
(1) Supported only via the annotated physical type.
Encodings
Encodings are defined by the enum Encoding in parquet.thrift and described in Encodings.md.
| Encoding | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
|---|---|---|---|---|---|---|---|
| PLAIN | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| PLAIN_DICTIONARY | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | (R) |
| RLE_DICTIONARY | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RLE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BIT_PACKED (deprecated) | ✅ | ✅ | ✅ | ❌(1) | (R) | (R) | ❌ |
| DELTA_BINARY_PACKED | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
| DELTA_LENGTH_BYTE_ARRAY | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
| DELTA_BYTE_ARRAY | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
| BYTE_STREAM_SPLIT | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
(1) Partial read support: only level data with a bit width of 0 can be read.
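To make the "bit width of 0" case in the footnote concrete, here is a minimal sketch of how a level bit width can be derived. It assumes the width is the smallest number of bits able to represent the column's maximum repetition or definition level; the function name `level_bit_width` is hypothetical.

```python
def level_bit_width(max_level: int) -> int:
    """Smallest number of bits that can represent values 0..max_level."""
    if max_level < 0:
        raise ValueError("max_level must be non-negative")
    # int.bit_length() is 0 for 0, 1 for 1, 2 for 2..3, 3 for 4..7, and so on.
    return max_level.bit_length()


# A column whose maximum level is 0 needs 0 bits per level, so there is no
# level data to decode at all.
print(level_bit_width(0))
print(level_bit_width(3))
```

Under this assumption, a width of 0 arises only when the maximum level is 0, which is why a reader can accept such pages without implementing the encoding itself.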
Compression Codecs
Compression codecs are defined by the enum CompressionCodec in parquet.thrift and described in Compression.md.
| Compression | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
|---|---|---|---|---|---|---|---|
| UNCOMPRESSED | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BROTLI | ✅ | ✅ | ✅ | ✅ | (R) | (R) | ✅ |
| GZIP | ✅ | ✅ | ✅ | ✅ | (R) | (R) | ✅ |
| LZ4 (deprecated) | ✅ | ❌ | ❌ | ✅ | ❌ | (R) | ❌ |
| LZ4_RAW | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
| LZO | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SNAPPY | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ZSTD | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | ✅ |
Other format-level features
| Feature | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
|---|---|---|---|---|---|---|---|
| xxHash-based bloom filters | (R) | ✅ | ✅ | ✅ | (R) | ✅ | |
| Bloom filter length (1) | (R) | ✅ | ✅ | ✅ | (R) | ✅ | |
| Statistics min_value, max_value | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Page index | ✅ | ✅ | ✅ | ✅ | ✅ | (R) | (R) |
| Page CRC32 checksum | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | (R) |
| Modular encryption | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅(2) |
| Size statistics (3) | ✅ | ✅ | (R) | ✅ | ✅ | (R) | |
| Data Page V2 (4) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
(1) In parquet.thrift: ColumnMetaData->bloom_filter_length
(2) Partial support
(3) In parquet.thrift: ColumnMetaData->size_statistics
(4) In parquet.thrift: DataPageHeaderV2
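As an illustration of the page CRC32 checksum row, the following sketch checks a page body against a stored checksum. It assumes the checksum is the standard CRC-32 (the variant used by gzip, which Python exposes as `zlib.crc32`) computed over the page bytes as written to disk, excluding the page header, and that the stored value is a signed 32-bit integer as Thrift `i32` fields are. The function name `page_crc_matches` is hypothetical.

```python
import zlib


def page_crc_matches(page_bytes: bytes, stored_crc: int) -> bool:
    """Return True if the CRC-32 of page_bytes equals the stored checksum.

    stored_crc may be negative because a Thrift i32 is signed, so both
    sides are compared as unsigned 32-bit values.
    """
    computed = zlib.crc32(page_bytes) & 0xFFFFFFFF
    return computed == (stored_crc & 0xFFFFFFFF)
```

Masking both values avoids a false mismatch when the high bit of the checksum is set and the stored value was deserialized as a negative number.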
High-level data APIs for Parquet feature usage
| Feature | arrow | parquet-java | arrow-go | arrow-rs | cudf | hyparquet | duckdb |
|---|---|---|---|---|---|---|---|
| External column data (1) | ✅ | ✅ | ❌ | ❌ | (W) | ✅ | ❌ |
| Row group "Sorting column" metadata (2) | ✅ | ❌ | ✅ | ✅ | (W) | ❌ | (R) |
| Row group pruning using statistics | ❌ | ✅ | ✅(3) | ✅ | ✅ | ❌ | ✅ |
| Row group pruning using bloom filter | ❌ | ✅ | ✅(3) | ✅ | ✅ | ❌ | ✅ |
| Reading select columns only | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Page pruning using statistics | ❌ | ✅ | ✅(3) | ✅ | ❌ | ❌ | ❌ |
(1) In parquet.thrift: ColumnChunk->file_path
(2) In parquet.thrift: RowGroup->sorting_columns
(3) Partial support
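The idea behind row group pruning using statistics can be sketched in a few lines. This is an illustration only, not the API of any implementation listed above: `RowGroupStats` and `prune_row_groups` are hypothetical names, the column is assumed to hold integers, and null counts and missing statistics are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """min_value / max_value statistics of one column in one row group."""
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Return indexes of row groups that may contain values in [lo, hi].

    A row group is skipped when its [min_value, max_value] range cannot
    overlap the requested range, so it never has to be read or decoded.
    """
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


groups = [
    RowGroupStats(index=0, min_value=1, max_value=10),
    RowGroupStats(index=1, min_value=11, max_value=20),
    RowGroupStats(index=2, min_value=21, max_value=30),
]
# Only row group 1 can hold values between 12 and 18.
print(prune_row_groups(groups, 12, 18))  # [1]
```

Pruning is conservative: a row group that survives may still contain no matching rows, so the predicate must be applied again to the rows that are actually read.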