Internals of ParquetDB

Welcome to the documentation of ParquetDB's internals. This section takes an in-depth look at how ParquetDB works under the hood and covers the following topics:

  • The integration of PyArrow and its role in high-performance data processing.

  • The data flow within ParquetDB: how various input formats are standardized and preprocessed.

  • The internal structure and handling of Parquet files, including file layout and metadata management.

  • Advanced topics such as schema evolution and normalization.

Contents