docs for lmcat v0.2.0


lmcat

A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.

Features

Installation

Install from PyPI:

pip install lmcat

or install with support for counting tokens:

pip install lmcat[tokenizers]

Usage

Basic usage - concatenate the current directory:

# Concatenate the current directory
python -m lmcat

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg

The output will include a directory tree and the contents of each non-ignored file.

Command Line Options

Configuration

lmcat is best configured via a tool.lmcat section in pyproject.toml (an lmcat.toml or lmcat.json file also works):

[tool.lmcat]
# Tree formatting
tree_divider = "│   "    # Vertical lines in tree
tree_indent = " "        # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"  # or "whitespace-split"
tree_only = false   # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"

Development

Setup

  1. Clone the repository:

git clone https://github.com/mivanit/lmcat
cd lmcat

  2. Set up the development environment:

make setup

Development Commands

The project uses make for common development tasks:

Run make help to see all available commands.

Running Tests

make test

For verbose output:

VERBOSE=1 make test

Roadmap

Submodules

API Documentation


lmcat


def main() -> None

Main entry point for the script.


lmcat.file_stats


class TokenizerWrapper:

Tokenizer wrapper: stores the tokenizer name and provides an n_tokens method. Falls back to splitting on whitespace (whitespace-split) when no tokenizer is available.

TokenizerWrapper(name: str = 'whitespace-split')

def n_tokens(self, text: str) -> int

Return the number of tokens in text.
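The whitespace-split fallback can be sketched in plain Python. This is a minimal illustration of the counting rule, not lmcat's actual implementation; the class name is reused here only for clarity:

```python
class TokenizerWrapper:
    """Minimal sketch: count tokens, falling back to whitespace splitting."""

    def __init__(self, name: str = "whitespace-split") -> None:
        self.name = name

    def n_tokens(self, text: str) -> int:
        # Fallback path: a "token" is any maximal run of non-whitespace.
        return len(text.split())

tok = TokenizerWrapper()
print(tok.n_tokens("def main() -> None:"))  # 4
```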

class FileStats:

Statistics for a single file.

FileStats(lines: int, chars: int, tokens: Optional[int] = None)

def from_file(cls, path: pathlib.Path, tokenizer: lmcat.file_stats.TokenizerWrapper) -> lmcat.file_stats.FileStats

Get statistics for a single file.
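What from_file computes can be sketched as follows. This is an illustrative stand-in that uses the whitespace-split token count described above, not the library's code:

```python
import pathlib
import tempfile
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileStats:
    """Sketch: line, character, and optional token counts for one file."""
    lines: int
    chars: int
    tokens: Optional[int] = None

    @classmethod
    def from_file(cls, path: pathlib.Path) -> "FileStats":
        text = path.read_text(encoding="utf-8")
        # Token count uses the whitespace-split fallback for illustration.
        return cls(lines=len(text.splitlines()), chars=len(text), tokens=len(text.split()))

with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "example.py"
    p.write_text("print('hello')\nprint('world')\n")
    stats = FileStats.from_file(p)
    print(stats)  # FileStats(lines=2, chars=30, tokens=2)
```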

class TreeEntry(typing.NamedTuple):

Entry in the tree output with optional stats.

TreeEntry(line: str, stats: Optional[lmcat.file_stats.FileStats] = None)

line: alias for field number 0
stats: alias for field number 1


lmcat.lmcat


class LMCatConfig(muutils.json_serialize.serializable_dataclass.SerializableDataclass):


Configuration dataclass for lmcat

LMCatConfig

(
    *,
    content_divider: str = '``````',
    tree_only: bool = False,
    ignore_patterns: list[str] = <factory>,
    ignore_patterns_files: list[pathlib.Path] = <factory>,
    plugins_file: pathlib.Path | None = None,
    allow_plugins: bool = False,
    glob_process: dict[str, str] = <factory>,
    decider_process: dict[str, str] = <factory>,
    on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip'] = 'except',
    tokenizer: str = 'gpt2',
    tree_divider: str = '│   ',
    tree_file_divider: str = '├── ',
    tree_indent: str = ' ',
    output: str | None = None
)

tokenizer: the tokenizer used for tokenizing the output; gpt2 by default. The name is passed to tokenizers.Tokenizer.from_pretrained(). If a tokenizer is specified but the tokenizers package is not installed, an exception is raised; the whitespace-split fallback avoids this when no real tokenizer is requested.
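The load-or-fallback behaviour might look roughly like this. This is a hedged sketch; the function name and the decision to fall back silently (rather than raise, as lmcat does when a real tokenizer was requested) are assumptions for illustration:

```python
def resolve_tokenizer(name: str = "gpt2"):
    """Sketch: return a callable that counts tokens, preferring the
    `tokenizers` package and falling back to whitespace splitting."""
    if name == "whitespace-split":
        return lambda text: len(text.split())
    try:
        from tokenizers import Tokenizer  # optional dependency
    except ImportError:
        # lmcat raises here when a real tokenizer was requested;
        # this sketch falls back instead.
        return lambda text: len(text.split())
    tok = Tokenizer.from_pretrained(name)
    return lambda text: len(tok.encode(text).ids)

count = resolve_tokenizer("whitespace-split")
print(count("a b c"))  # 3
```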

def get_tokenizer_obj(self) -> lmcat.file_stats.TokenizerWrapper

Get the tokenizer object.

def get_processing_pipeline(self) -> lmcat.processing_pipeline.ProcessingPipeline

Get the processing pipeline object.

def read(cls, root_dir: pathlib.Path) -> lmcat.lmcat.LMCatConfig

Attempt to read config from pyproject.toml, lmcat.toml, or lmcat.json.

def serialize(self) -> dict[str, typing.Any]

Returns the instance as a dict; implemented via the @serializable_dataclass decorator.

def load(cls, data: Union[dict[str, Any], ~T]) -> Type[~T]

Takes an appropriately structured dict and returns an instance of the class; implemented via the @serializable_dataclass decorator.

def validate_fields_types(
    self: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except
) -> bool

Validate the types of all fields on a SerializableDataclass; calls SerializableDataclass__validate_field_type for each field.


class IgnoreHandler:

Handles all ignore pattern matching using igittigitt.

IgnoreHandler(root_dir: pathlib.Path, config: lmcat.lmcat.LMCatConfig)

def is_ignored(self, path: pathlib.Path) -> bool

Check if a path should be ignored.
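The idea behind ignore matching can be sketched with fnmatch. lmcat itself delegates to igittigitt, which handles the full .gitignore syntax (negation, directory rules, anchoring); this reduced version only illustrates the concept:

```python
import fnmatch
import pathlib

def is_ignored(path: pathlib.Path, patterns: list[str]) -> bool:
    """Sketch: gitignore-style matching reduced to fnmatch.
    Matches either the file name or the full path string."""
    return any(
        fnmatch.fnmatch(path.name, pat) or fnmatch.fnmatch(str(path), pat)
        for pat in patterns
    )

print(is_ignored(pathlib.Path("build/app.log"), ["*.tmp", "*.log"]))  # True
print(is_ignored(pathlib.Path("src/main.py"), ["*.tmp", "*.log"]))    # False
```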

def sorted_entries(directory: pathlib.Path) -> list[pathlib.Path]

Return directory contents sorted: directories first, then files.
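The directories-first ordering can be sketched with a sort key (an illustrative implementation under the stated ordering, not necessarily lmcat's):

```python
import pathlib
import tempfile

def sorted_entries(directory: pathlib.Path) -> list[pathlib.Path]:
    """Sketch: directories before files (False sorts before True),
    each group alphabetical by name."""
    return sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name))

with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "zeta").mkdir()
    (root / "alpha.txt").touch()
    (root / "beta").mkdir()
    names = [p.name for p in sorted_entries(root)]
    print(names)  # ['beta', 'zeta', 'alpha.txt']
```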

def walk_dir(
    directory: pathlib.Path,
    ignore_handler: lmcat.lmcat.IgnoreHandler,
    config: lmcat.lmcat.LMCatConfig,
    tokenizer: lmcat.file_stats.TokenizerWrapper,
    prefix: str = ''
) -> tuple[list[lmcat.file_stats.TreeEntry], list[pathlib.Path]]

Recursively walk a directory, building tree lines and collecting file paths.

def format_tree_with_stats(
    entries: list[lmcat.file_stats.TreeEntry],
    show_tokens: bool = False
) -> list[str]

Format tree entries with aligned statistics.

def walk_and_collect(
    root_dir: pathlib.Path,
    config: lmcat.lmcat.LMCatConfig
) -> tuple[list[str], list[pathlib.Path]]

Walk the filesystem from root_dir and gather the tree listing plus file paths.

def assemble_summary(root_dir: pathlib.Path, config: lmcat.lmcat.LMCatConfig) -> str

Assemble and return the summary output.

def main() -> None

Main entry point for the script.


lmcat.processing_pipeline


def load_plugins(plugins_file: pathlib.Path) -> None

Load plugins from a Python file.

class ProcessingPipeline:

Manages the processing pipeline for files.

ProcessingPipeline(
    plugins_file: pathlib.Path | None,
    decider_process_keys: dict[str, str],
    glob_process_keys: dict[str, str],
    on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip']
)

def get_processors_for_path(self, path: pathlib.Path) -> list[typing.Callable[[pathlib.Path], str]]

Get all applicable processors for a given path.

def process_file(self, path: pathlib.Path) -> tuple[str, str | None]

Process a file through the pipeline.


lmcat.processors


def register_processor(func: Callable[[pathlib.Path], str]) -> Callable[[pathlib.Path], str]

Register a function as a path processor.

def register_decider(func: Callable[[pathlib.Path], bool]) -> Callable[[pathlib.Path], bool]

Register a function as a decider.
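A decorator-based registry of this shape lets config strings such as "ipynb_to_md" be resolved back to callables. This is a sketch of the pattern; the registry name and example processor are invented for illustration:

```python
import pathlib
from typing import Callable

# Illustrative registry: maps processor names to functions.
PROCESSORS: dict[str, Callable[[pathlib.Path], str]] = {}

def register_processor(func: Callable[[pathlib.Path], str]) -> Callable[[pathlib.Path], str]:
    """Record the function under its own name, then return it unchanged
    so it can still be called directly."""
    PROCESSORS[func.__name__] = func
    return func

@register_processor
def shout(path: pathlib.Path) -> str:
    # Hypothetical processor: upper-case the file name.
    return path.name.upper()

print(PROCESSORS["shout"](pathlib.Path("notes.md")))  # NOTES.MD
```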

def is_over_10kb(path: pathlib.Path) -> bool

Check if the file is over 10KB.

def is_documentation(path: pathlib.Path) -> bool

Check if the file is documentation.

def remove_comments(path: pathlib.Path) -> str

Remove single-line comments from code.

def compress_whitespace(path: pathlib.Path) -> str

Compress multiple whitespace characters into single spaces.
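The described behaviour can be sketched with a single regex substitution (an illustrative stand-in, not necessarily the library's exact code):

```python
import pathlib
import re
import tempfile

def compress_whitespace(path: pathlib.Path) -> str:
    """Sketch: collapse any run of whitespace (spaces, tabs,
    newlines) into a single space."""
    return re.sub(r"\s+", " ", path.read_text())

with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "x.txt"
    p.write_text("a   b\n\n\tc")
    result = compress_whitespace(p)
    print(result)  # a b c
```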

def to_relative_path(path: pathlib.Path) -> str

Return the path to the file as a string.

def ipynb_to_md(path: pathlib.Path) -> str

Convert an IPython notebook to markdown.

def makefile_recipes(path: pathlib.Path) -> str

Process a Makefile to show only target descriptions and basic structure.

Preserves:
- Comments above .PHONY targets, up to the first empty line
- The .PHONY line and the target line
- The first line after the target if it starts with @echo

def csv_preview_5_lines(path: pathlib.Path) -> str

Preview the first few lines of a CSV file (up to 5).

Reads only the first 1024 bytes and splits them into lines. Does not attempt to parse CSV structure.
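The read-a-bounded-prefix approach can be sketched as follows (an illustrative version with the limits exposed as parameters; not the library's exact code):

```python
import pathlib
import tempfile

def csv_preview(path: pathlib.Path, max_lines: int = 5, max_bytes: int = 1024) -> str:
    """Sketch: show up to max_lines lines taken from the first
    max_bytes bytes of the file; no CSV parsing is attempted."""
    with path.open("rb") as f:
        head = f.read(max_bytes).decode("utf-8", errors="replace")
    return "\n".join(head.splitlines()[:max_lines])

with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "data.csv"
    p.write_text("a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n11,12\n")
    preview = csv_preview(p)
    print(preview)  # first five lines only
```

Reading a fixed byte budget rather than whole lines keeps the preview cheap even for very large files, at the cost of possibly truncating the last previewed line.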