parquetdb.core.parquetdb.LoadConfig¶
- class LoadConfig(batch_size: int = 131072, batch_readahead: int = 16, fragment_readahead: int = 4, fragment_scan_options: FragmentScanOptions | None = None, use_threads: bool = True, memory_pool: MemoryPool | None = None)¶
Configuration for loading data, controlling batch size, readahead, threading, and memory usage.
- Variables:
batch_size (int) – The number of rows to process in each batch. Default: 131,072
batch_readahead (int) – The number of batches to read ahead in a file. Default: 16
fragment_readahead (int) – The number of files to read ahead, improving IO utilization at the cost of RAM usage. Default: 4
fragment_scan_options (Optional[pa.dataset.FragmentScanOptions]) – Options specific to a particular scan and fragment type, potentially changing across scans. Default: None
use_threads (bool) – Whether to use maximum parallelism determined by available CPU cores. Default: True
memory_pool (Optional[pa.MemoryPool]) – The memory pool for allocations. Uses the system’s default memory pool if not specified. Default: None
- __init__(batch_size: int = 131072, batch_readahead: int = 16, fragment_readahead: int = 4, fragment_scan_options: FragmentScanOptions | None = None, use_threads: bool = True, memory_pool: MemoryPool | None = None) → None¶
Methods
__init__([batch_size, batch_readahead, ...])
Attributes
batch_readahead
batch_size
fragment_readahead
fragment_scan_options
memory_pool
use_threads
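The fields above mirror the keyword arguments of PyArrow's dataset scanner. As a minimal, self-contained sketch (using only the standard library, so the `LoadConfig` below is an illustrative mirror of the class documented here, not an import from `parquetdb`), a config object like this can be unpacked directly into scanner keyword arguments:

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional

@dataclass
class LoadConfig:
    # Illustrative mirror of parquetdb.core.parquetdb.LoadConfig;
    # defaults copied from the signature above.
    batch_size: int = 131072
    batch_readahead: int = 16
    fragment_readahead: int = 4
    fragment_scan_options: Optional[Any] = None
    use_threads: bool = True
    memory_pool: Optional[Any] = None

# Override a couple of knobs; the rest keep their defaults.
config = LoadConfig(batch_size=65536, use_threads=False)

# asdict() yields a plain dict, suitable for unpacking into a scan call,
# e.g. dataset.to_batches(**asdict(config)) with a PyArrow dataset.
print(asdict(config)["batch_size"])
```

Lowering `batch_size` or `fragment_readahead` trades throughput for a smaller peak memory footprint, which is the main reason to override the defaults.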