API Reference

Core Module

class chunking_pandas.core.ChunkingExperiment(input_file: str, output_file: str, file_format: FileFormat = FileFormat.CSV, auto_run: bool = True, n_chunks: int = 4, chunking_strategy: str = 'rows', save_chunks: bool = False, n_workers: int | None = None)

Bases: object

get_optimal_chunk_size(data_size: int) → int

Calculate optimal chunk size based on data size and available resources.
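This reference does not say how get_optimal_chunk_size weighs data size against available resources. The following is a minimal sketch of one plausible heuristic. It assumes that "resources" means the CPU count, or the n_workers constructor argument when given. The function name optimal_chunk_size and the min_chunk floor are hypothetical and are not part of chunking_pandas.

```python
import math
import os
from typing import Optional


def optimal_chunk_size(
    data_size: int,
    n_workers: Optional[int] = None,
    min_chunk: int = 1_000,
) -> int:
    """Hypothetical heuristic, not the library's formula.

    Spread data_size items evenly across the workers, use at least
    min_chunk items per chunk, cap the result at data_size, and never
    return less than 1.
    """
    # Fall back to the CPU count (or 1 if it is unknown) when n_workers is None.
    workers = n_workers or os.cpu_count() or 1
    per_worker = math.ceil(data_size / workers)
    return max(1, min(data_size, max(per_worker, min_chunk)))
```

For example, 10,000 rows on 4 workers gives chunks of 2,500. With 2,000 rows on 4 workers, the floor raises the chunk size to 1,000. The floor keeps very small inputs from being split into many tiny chunks, where per-chunk overhead would dominate.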

process_chunks(strategy: ChunkingStrategy) → List[DataFrame] | List[ndarray]

Split the input data into chunks according to strategy, with support for parallel processing.
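The sketch below illustrates row-wise and column-wise chunking with parallel processing in plain pandas. The helpers split_rows, split_columns and process_in_parallel are hypothetical. The real process_chunks may split differently and may use processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

import pandas as pd


def split_rows(df: pd.DataFrame, n_chunks: int) -> List[pd.DataFrame]:
    """Split df into at most n_chunks contiguous row blocks of near-equal size."""
    size = max(1, -(-len(df) // n_chunks))  # ceiling division, at least 1
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


def split_columns(df: pd.DataFrame, n_chunks: int) -> List[pd.DataFrame]:
    """Split df into at most n_chunks groups of adjacent columns."""
    size = max(1, -(-df.shape[1] // n_chunks))
    return [df.iloc[:, i:i + size] for i in range(0, df.shape[1], size)]


def process_in_parallel(
    chunks: List[pd.DataFrame],
    func: Callable[[pd.DataFrame], pd.DataFrame],
    n_workers: Optional[int] = None,
) -> List[pd.DataFrame]:
    """Apply func to every chunk on a thread pool; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(func, chunks))
```

With a 10-row DataFrame and n_chunks=4, split_rows yields chunks of 3, 3, 3 and 1 rows because the chunk size is rounded up. A thread pool keeps the sketch simple and avoids pickling DataFrames between processes. For CPU-bound pandas work, a process pool may scale better.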

class chunking_pandas.core.ChunkingStrategy(value)

Bases: str, Enum

Enumeration of the supported chunking strategies.

BLOCKS = 'blocks'
COLUMNS = 'columns'
DYNAMIC = 'dynamic'
NO_CHUNKS = 'None'
PARALLEL_BLOCKS = 'parallel_blocks'
PARALLEL_ROWS = 'parallel_rows'
ROWS = 'rows'
TOKENS = 'tokens'
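
ChunkingStrategy derives from both str and Enum, so its members behave like strings. The sketch below mirrors the documented members locally and does not import chunking_pandas. It shows how a plain string such as the constructor's chunking_strategy='rows' maps to a member. It also shows why NO_CHUNKS must be looked up with the string 'None' rather than the None object. Whether ChunkingExperiment performs this conversion internally is an assumption.

```python
from enum import Enum


class ChunkingStrategy(str, Enum):
    """Local mirror of the documented members (illustration only)."""

    BLOCKS = "blocks"
    COLUMNS = "columns"
    DYNAMIC = "dynamic"
    NO_CHUNKS = "None"
    PARALLEL_BLOCKS = "parallel_blocks"
    PARALLEL_ROWS = "parallel_rows"
    ROWS = "rows"
    TOKENS = "tokens"


# Members compare equal to their plain string values.
assert ChunkingStrategy.ROWS == "rows"

# A plain string, such as chunking_strategy='rows', converts by value lookup.
strategy = ChunkingStrategy("rows")
assert strategy is ChunkingStrategy.ROWS

# NO_CHUNKS is keyed by the string 'None', not by the None object.
assert ChunkingStrategy("None") is ChunkingStrategy.NO_CHUNKS
try:
    ChunkingStrategy(None)
except ValueError as exc:
    print(exc)  # the None object is not a valid member value
```
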

class chunking_pandas.core.FileFormat(value)

Bases: str, Enum

Enumeration of the supported file formats.

CSV = 'csv'
JSON = 'json'
NUMPY = 'numpy'
PARQUET = 'parquet'
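This reference lists the formats but not how each one is read. The following is a minimal sketch of format dispatch. It assumes each member maps to the standard pandas or NumPy reader. READERS and load are hypothetical names, and pairing NUMPY with np.load is a guess. That guess is consistent with process_chunks returning either DataFrames or ndarrays.

```python
from enum import Enum
from typing import Callable, Dict, Union

import numpy as np
import pandas as pd


class FileFormat(str, Enum):
    """Local mirror of the documented members (illustration only)."""

    CSV = "csv"
    JSON = "json"
    NUMPY = "numpy"
    PARQUET = "parquet"


# Hypothetical mapping from each format to a standard pandas/NumPy reader.
READERS: Dict[FileFormat, Callable[[str], Union[pd.DataFrame, np.ndarray]]] = {
    FileFormat.CSV: pd.read_csv,
    FileFormat.JSON: pd.read_json,
    FileFormat.NUMPY: np.load,            # a .npy file loads as an ndarray
    FileFormat.PARQUET: pd.read_parquet,  # requires pyarrow or fastparquet
}


def load(path: str, file_format: Union[FileFormat, str] = FileFormat.CSV) -> Union[pd.DataFrame, np.ndarray]:
    """Read path with the reader registered for file_format (member or plain string)."""
    return READERS[FileFormat(file_format)](path)
```

Because FileFormat is a str enum, load(path, "parquet") and load(path, FileFormat.PARQUET) select the same reader.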

Gradio Interface