0010 - File Format

Status

DRAFT

Context

Choosing the file format component of the storage engine. Candidate under consideration:

  • Parquet

Let's start with the project's own description:

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet is available in multiple languages including Java, C++, Python, etc.

Source: [^1]
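
To make "column-oriented" and "encoding schemes" in the description above more concrete, here is a minimal sketch in plain Python. It is an illustration only, not Parquet's on-disk format or any Parquet library API: the sample records and the helper names `to_columns` and `run_length_encode` are hypothetical. Run-length encoding is among the encodings Parquet supports, but the real implementation is considerably more involved.

```python
from itertools import groupby

# Hypothetical sample records, invented for this illustration.
rows = [
    {"id": 1, "country": "VN", "amount": 10.0},
    {"id": 2, "country": "VN", "amount": 12.5},
    {"id": 3, "country": "VN", "amount": 7.0},
    {"id": 4, "country": "US", "amount": 3.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Values of one column sit next to each other, so repeats compress well.
encoded_country = run_length_encode(columns["country"])

# An aggregate over one column only needs that column's values.
total_amount = sum(columns["amount"])
```

In a columnar layout, a query such as "sum of `amount`" reads a single column rather than every full record, and adjacent values of the same type and similar content tend to encode compactly. Both properties are relevant when evaluating a file format for the storage engine.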

https://parquet.apache.org/docs/overview/motivation/

https://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-and-parquet