API Reference
Core Module
- class chunking_pandas.core.ChunkingExperiment(input_file: str, output_file: str, file_format: FileFormat = FileFormat.CSV, auto_run: bool = True, n_chunks: int = 4, chunking_strategy: str = 'rows', save_chunks: bool = False, n_workers: int | None = None, monitor_performance: bool = False)
Bases: object
- get_metrics() -> Dict[str, ChunkingMetrics]
Return performance metrics for all operations.
- get_optimal_chunk_size(data_size: int) -> int
Calculate optimal chunk size based on data size and available resources.
- process_chunks(strategy: ChunkingStrategy) -> List[DataFrame] | List[ndarray]
Process input data into chunks with parallel support and performance monitoring.
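As a rough illustration of what row-based chunking into n_chunks parts can look like, here is a minimal standard-library sketch. It is not the library's implementation: the helper name split_rows and the rule of giving the first chunks one extra row are assumptions made for this example, and plain lists stand in for DataFrame rows.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Hypothetical helper: split rows into n_chunks contiguous, near-equal parts."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(len(rows), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row (an assumed policy).
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


# Ten rows split with the documented default of n_chunks=4.
chunks = split_rows(list(range(10)), n_chunks=4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```

Contiguous splitting keeps the original row order, so concatenating the chunks reproduces the input.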
- class chunking_pandas.core.ChunkingMetrics(processing_time: float, memory_usage: float, chunk_sizes: List[int], strategy: str, total_chunks: int)
Bases: object
Store metrics about chunking operations.
- chunk_sizes: List[int]
- memory_usage: float
- processing_time: float
- strategy: str
- total_chunks: int
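To show how a metrics record with these five fields might be consumed, the sketch below uses a stand-in dataclass rather than the library class. MetricsSketch and summarize are hypothetical names, the values are made up, and the units of processing_time and memory_usage are not stated in this reference. The dictionary mirrors the Dict[str, ChunkingMetrics] shape documented for get_metrics(); keying it by strategy name is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetricsSketch:
    """Stand-in with the same fields as the documented ChunkingMetrics."""
    processing_time: float
    memory_usage: float
    chunk_sizes: List[int]
    strategy: str
    total_chunks: int


def summarize(metrics: Dict[str, MetricsSketch]) -> Dict[str, float]:
    """Hypothetical helper: mean chunk size per recorded operation."""
    return {
        name: sum(m.chunk_sizes) / m.total_chunks
        for name, m in metrics.items()
        if m.total_chunks > 0  # skip empty records to avoid division by zero
    }


# Illustrative values only; none of these numbers come from the library.
metrics = {
    "rows": MetricsSketch(
        processing_time=0.12,
        memory_usage=3.5,
        chunk_sizes=[3, 3, 2, 2],
        strategy="rows",
        total_chunks=4,
    )
}
print(summarize(metrics))  # {'rows': 2.5}
```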
Gradio Interface
- chunking_pandas.interface.create_interface() -> Interface
Create and configure the Gradio interface.
- chunking_pandas.interface.get_sample_data_path() -> Path
Get the absolute path to sample data.