parquetdb.core.parquetdb.LoadConfig

class LoadConfig(batch_size: int = 131072, batch_readahead: int = 16, fragment_readahead: int = 4, fragment_scan_options: FragmentScanOptions | None = None, use_threads: bool = True, memory_pool: MemoryPool | None = None)

Configuration for loading data, controlling batch size, readahead, threading, and memory usage during scans.

Variables:
  • batch_size (int) – The number of rows to process in each batch. Default: 131,072

  • batch_readahead (int) – The number of batches to read ahead in a file. Default: 16

  • fragment_readahead (int) – The number of files to read ahead, improving IO utilization at the cost of RAM usage. Default: 4

  • fragment_scan_options (Optional[pa.dataset.FragmentScanOptions]) – Options specific to a particular scan and fragment type, potentially changing across scans. Default: None

  • use_threads (bool) – If True, scans use maximum parallelism, as determined by the number of available CPU cores. Default: True

  • memory_pool (Optional[pa.MemoryPool]) – The memory pool for allocations. Uses the system’s default memory pool if not specified. Default: None
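The fields above mirror the keyword arguments accepted by PyArrow's dataset scanner, so a `LoadConfig` can be unpacked directly into a scan. The sketch below is a minimal stand-in dataclass with the documented defaults (not the library's actual source); the `ds.dataset(...)` call in the comment assumes a hypothetical `my_data/` directory and requires `pyarrow` to be installed.

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional

@dataclass
class LoadConfig:
    """Sketch mirroring the documented fields and defaults."""
    batch_size: int = 131_072
    batch_readahead: int = 16
    fragment_readahead: int = 4
    fragment_scan_options: Optional[Any] = None
    use_threads: bool = True
    memory_pool: Optional[Any] = None

# Lower the batch size and disable threading, e.g. to cap memory use.
config = LoadConfig(batch_size=65_536, use_threads=False)

# asdict() yields kwargs matching pyarrow.dataset.Dataset.scanner(...):
scan_kwargs = asdict(config)
# import pyarrow.dataset as ds
# table = ds.dataset("my_data/").scanner(**scan_kwargs).to_table()
```

Because every field has a default, `LoadConfig()` with no arguments reproduces the defaults listed above, and only the settings you want to tune need to be passed explicitly.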

__init__(batch_size: int = 131072, batch_readahead: int = 16, fragment_readahead: int = 4, fragment_scan_options: FragmentScanOptions | None = None, use_threads: bool = True, memory_pool: MemoryPool | None = None) → None

Methods

__init__([batch_size, batch_readahead, ...])

Attributes

batch_readahead

batch_size

fragment_readahead

fragment_scan_options

memory_pool

use_threads