Metadata

There are two types of metadata: file metadata and page header metadata.

All Thrift structures are serialized using the TCompactProtocol. The full definition of these structures is given in the Parquet Thrift definition.
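
To give a feel for what TCompactProtocol looks like on the wire, the sketch below decodes its two integer building blocks: unsigned variable-length integers (7 payload bits per byte, with the high bit marking continuation) and the zigzag mapping used for signed values. The function names `read_uvarint` and `zigzag_decode` are illustrative and are not part of any Parquet or Thrift library; a real reader would rely on a Thrift implementation rather than hand-rolled decoding.

```python
from typing import Tuple


def read_uvarint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned varint starting at pos.

    Returns (value, position_after_varint).
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry payload, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of the varint.
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def zigzag_decode(n: int) -> int:
    """Map an unsigned zigzag value back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


if __name__ == "__main__":
    value, next_pos = read_uvarint(b"\xac\x02")
    print(value, next_pos, zigzag_decode(value))
```

Signed Thrift integers are zigzag-encoded first and then written as varints, so small magnitudes of either sign stay short.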

File metadata

In the diagram below, file metadata is described by the FileMetaData structure. This file metadata provides offset and size information useful when navigating the Parquet file.

Parquet Metadata format
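
As a rough illustration of how a reader reaches the file metadata, the sketch below locates the serialized FileMetaData inside an in-memory file image. It relies on the Parquet file layout, in which the file ends with a 4-byte little-endian footer length followed by the `PAR1` magic bytes; the function name `locate_footer` is hypothetical, and the sketch covers only unencrypted-footer files. Decoding the returned bytes into a FileMetaData structure would still require a Thrift compact-protocol reader.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"


def locate_footer(data: bytes) -> Tuple[int, int]:
    """Return (start_offset, length) of the serialized file metadata.

    Assumes `data` holds a complete Parquet file with a plaintext footer.
    """
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")

    # The 4 bytes before the trailing magic give the footer length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_len


if __name__ == "__main__":
    # Synthetic bytes for illustration only, not a valid Parquet file body.
    footer = b"\x15\x02\x00"
    blob = MAGIC + b"pagedata" + footer + struct.pack("<I", len(footer)) + MAGIC
    start, length = locate_footer(blob)
    print(start, length)
```

Once the metadata is decoded, the offsets and sizes it records let a reader seek directly to the parts of the file it needs instead of scanning from the beginning.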

Page header metadata (PageHeader and children in the diagram) is stored in-line with the page data, and is used in the reading and decoding of data.

Parquet PageHeader format
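
Because each page header sits directly in front of its page data, a reader can walk a column chunk by alternating between decoding a header and skipping the number of bytes that header announces (the `compressed_page_size` field of PageHeader). The sketch below shows only that control flow. The `decode_header` callable is a hypothetical stand-in for a real Thrift decoder, and the one-byte header used in the demo is a toy format, not the Parquet encoding.

```python
from typing import Callable, Iterator, Tuple

# Hypothetical decoder signature: given the buffer and an offset, return
# (compressed_page_size, header_length_in_bytes).
HeaderDecoder = Callable[[bytes, int], Tuple[int, int]]


def iter_pages(chunk: bytes, decode_header: HeaderDecoder) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_offset, page_bytes) for each page in a column chunk."""
    pos = 0
    while pos < len(chunk):
        page_size, header_len = decode_header(chunk, pos)
        data_start = pos + header_len
        data_end = data_start + page_size
        if data_end > len(chunk):
            raise ValueError("page extends past end of column chunk")
        yield pos, chunk[data_start:data_end]
        # The next header begins immediately after this page's data.
        pos = data_end


def toy_decoder(buf: bytes, pos: int) -> Tuple[int, int]:
    """Toy header: a single byte holding the page size (illustration only)."""
    return buf[pos], 1


if __name__ == "__main__":
    chunk = b"\x03abc\x02de"
    for offset, page in iter_pages(chunk, toy_decoder):
        print(offset, page)
```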

Last modified March 5, 2025: Update Metadata Diagrams (#106) (5ab1cc6)