API Reference

Core Module

class chunking_pandas.core.ChunkingExperiment(input_file: str, output_file: str, file_format: FileFormat = FileFormat.CSV, auto_run: bool = True, n_chunks: int = 4, chunking_strategy: str = 'rows', save_chunks: bool = False, n_workers: int | None = None, monitor_performance: bool = False)

Bases: object

get_metrics() → Dict[str, ChunkingMetrics]

Return performance metrics for all operations.
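
The returned mapping can be inspected like any dictionary. The sketch below does not import chunking_pandas; it uses a local stand-in dataclass, MetricsStandIn, that mirrors the fields documented for ChunkingMetrics. The helper summarize, the dictionary key, and all values are hypothetical, so treat this as an illustration of the shape of the data, not as actual library output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetricsStandIn:
    """Local stand-in mirroring the documented ChunkingMetrics fields."""
    processing_time: float
    memory_usage: float
    chunk_sizes: List[int]
    strategy: str
    total_chunks: int


def summarize(metrics: Dict[str, MetricsStandIn]) -> Dict[str, float]:
    """Return the mean chunk size for each recorded operation."""
    return {
        name: sum(m.chunk_sizes) / m.total_chunks
        for name, m in metrics.items()
        if m.total_chunks > 0  # skip entries with no chunks to avoid dividing by zero
    }


# Hypothetical key and values; this reference does not state the units
# of processing_time or memory_usage.
example = {
    "rows": MetricsStandIn(0.12, 3.5, [25, 25, 25, 25], "rows", 4),
}
print(summarize(example))
```

With the real library, the same loop would presumably run over the result of get_metrics() on a ChunkingExperiment instance.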

get_optimal_chunk_size(data_size: int) → int

Calculate optimal chunk size based on data size and available resources.

process_chunks(strategy: ChunkingStrategy) → List[DataFrame] | List[ndarray]

Process input data into chunks with parallel support and performance monitoring.
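
To make the idea of row-wise chunking concrete, here is a minimal sketch that splits a sequence of rows into n_chunks contiguous pieces of nearly equal size. It is an assumption about what a 'rows' strategy means conceptually, not the library's implementation: split_rows is a hypothetical helper, and it works on plain Python lists instead of DataFrames or arrays.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Split rows into n_chunks contiguous chunks whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(rows), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


chunks = split_rows(list(range(10)), 4)
print([len(c) for c in chunks])
```

If there are fewer rows than n_chunks, the trailing chunks come back empty; a real implementation may prefer to drop them.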

class chunking_pandas.core.ChunkingMetrics(processing_time: float, memory_usage: float, chunk_sizes: List[int], strategy: str, total_chunks: int)

Bases: object

Store metrics about chunking operations.

chunk_sizes: List[int]
memory_usage: float
processing_time: float
strategy: str
total_chunks: int

class chunking_pandas.core.ChunkingStrategy(value)

Bases: str, Enum

Enumeration of the supported chunking strategies.

BLOCKS = 'blocks'
COLUMNS = 'columns'
DYNAMIC = 'dynamic'
NO_CHUNKS = 'None'
PARALLEL_BLOCKS = 'parallel_blocks'
PARALLEL_ROWS = 'parallel_rows'
ROWS = 'rows'
TOKENS = 'tokens'

class chunking_pandas.core.FileFormat(value)

Bases: str, Enum

Enumeration of the supported file formats.

CSV = 'csv'
JSON = 'json'
NUMPY = 'numpy'
PARQUET = 'parquet'
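
Because FileFormat and ChunkingStrategy derive from both str and Enum, their members can be looked up by value and compared directly with plain strings. The sketch below shows this with a local stand-in, FileFormatStandIn, that copies the values listed above; it does not import chunking_pandas, and parse_format is a hypothetical helper, not part of the documented API.

```python
from enum import Enum


class FileFormatStandIn(str, Enum):
    """Local stand-in with the same values as the documented FileFormat."""
    CSV = "csv"
    JSON = "json"
    NUMPY = "numpy"
    PARQUET = "parquet"


def parse_format(text: str) -> FileFormatStandIn:
    """Map user input such as ' CSV ' to a member; raises ValueError if unknown."""
    return FileFormatStandIn(text.strip().lower())


fmt = parse_format(" Parquet ")
print(fmt is FileFormatStandIn.PARQUET)  # lookup by value returns the member itself
print(fmt == "parquet")                  # str mix-in: members compare equal to their value
print(fmt.value)
```

The same pattern applies to ChunkingStrategy. Note that the value of NO_CHUNKS is the string 'None', not the Python None object.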

Gradio Interface

chunking_pandas.interface.create_interface() → Interface

Create and configure the Gradio interface.

chunking_pandas.interface.get_sample_data_path() → Path

Get the absolute path to the sample data.

chunking_pandas.interface.launch_interface(share: bool = False, port: int = 7860)

Launch the Gradio interface.

chunking_pandas.interface.process_file(input_file, output_filename: str, file_format: str, chunking_strategy: str, n_chunks: int) → tuple[update, update]

Process a file using ChunkingExperiment.