API Reference
This document details all classes and methods provided by the Ollama Toolkit client.
OllamaClient
The main class for interacting with the Ollama Toolkit.
from ollama_toolkit import OllamaClient

# Initialize with the default settings
client = OllamaClient(
    base_url="http://localhost:11434/",  # Ollama server address
    timeout=300,                         # request timeout in seconds
    max_retries=3,                       # retry attempts for failed requests
    retry_delay=1.0,                     # delay between retries in seconds
    cache_enabled=False,                 # cache API responses
    cache_ttl=300.0                      # cache time-to-live in seconds
)
Constructor Parameters
base_url (str): The base URL of the Ollama Toolkit server. Default: "http://localhost:11434/"
timeout (int): Default timeout for API requests in seconds. Default: 300
max_retries (int): Maximum number of retry attempts for failed requests. Default: 3
retry_delay (float): Delay between retry attempts in seconds. Default: 1.0
cache_enabled (bool): Whether to cache API responses. Default: False
cache_ttl (float): Cache time-to-live in seconds. Default: 300.0 (5 minutes)
Core Methods
generate / agenerate — Text generation (synchronous / asynchronous)
chat / achat — Chat completion (synchronous / asynchronous)
create_embedding / acreate_embedding — Embedding vector creation (synchronous / asynchronous)
list_models / pull_model / delete_model — Model lifecycle management
get_version / check_ollama_installed / ensure_ollama_running — System verification
For maximum compatibility with autodoc systems, each function uses a Google-style docstring format with a brief summary, argument descriptions, and return values.
Text Generation
generate
Generate a completion for the given prompt.
response = client.generate(
    model="llama2",
    prompt="Explain quantum computing in simple terms",
    options={"temperature": 0.7},  # sampling temperature
    stream=False
)
Parameters:
model (str): The model name to use for generation
prompt (str): The prompt to generate a response for
options (dict, optional): Additional model parameters
stream (bool, optional): Whether to stream the response. Default: False
Returns:
If stream=False: A dictionary containing the response
If stream=True: An iterator yielding response chunks
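When stream=True, the returned iterator can be consumed in a simple loop. A minimal sketch, assuming each chunk is a dictionary exposing a "response" text fragment and a "done" flag; verify the exact chunk keys against your server's output:

for chunk in client.generate(
    model="llama2",
    prompt="Explain quantum computing in simple terms",
    stream=True
):
    # "response" and "done" are assumed chunk keys; check your server output
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break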
Options Dictionary Parameters:
temperature (float): Controls randomness in generation. Higher values (e.g., 0.8) make output more random, lower values (e.g., 0.2) make it more deterministic
top_p (float): Controls diversity via nucleus sampling. Default: 1.0
top_k (int): Limits token selection to the top K options. Default: -1 (disabled)
max_tokens (int): Maximum number of tokens to generate. Default varies by model
presence_penalty (float): Penalty for token repetition. Default: 0.0
frequency_penalty (float): Penalty based on token frequency in the text. Default: 0.0
stop (list): Sequences where the API will stop generating further tokens
seed (int): Random number seed for reproducible outputs
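For illustration, several of these options can be combined in one dictionary and passed to generate; the values below are arbitrary examples, not recommended defaults:

options = {
    "temperature": 0.2,   # mostly deterministic output
    "top_p": 0.9,         # nucleus sampling cutoff
    "max_tokens": 256,    # cap the completion length
    "stop": ["\n\n"],     # stop at the first blank line
    "seed": 42            # reproducible sampling
}

response = client.generate(
    model="llama2",
    prompt="Define entropy in one paragraph.",
    options=options
)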
agenerate
Asynchronous version of generate. (Note: this method is planned but not fully implemented in the current version.)
response = await client.agenerate(
    model="llama2",
    prompt="Explain quantum computing in simple terms",
    options={"temperature": 0.7},
    stream=False
)
Returns:
If stream=False: A dictionary containing the response
If stream=True: An async iterator yielding response chunks
Chat Completion
chat
Generate a chat response for the given messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about neural networks."}
]

response = client.chat(
    model="llama2",
    messages=messages,
    options={"temperature": 0.7},
    stream=False
)
Parameters:
model (str): The model name to use for generation
messages (list): List of message dictionaries with 'role' and 'content' keys
options (dict, optional): Additional model parameters
stream (bool, optional): Whether to stream the response. Default: True
Returns:
If stream=False: A dictionary containing the response
If stream=True: An iterator yielding response chunks
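A streaming chat loop can be written much like the streaming generate loop. This sketch assumes each chunk carries a nested "message" dictionary with a "content" fragment and a "done" flag; confirm the chunk shape against your server's responses:

for chunk in client.chat(model="llama2", messages=messages, stream=True):
    # Assumed chunk shape: {"message": {"content": ...}, "done": ...}
    print(chunk.get("message", {}).get("content", ""), end="", flush=True)
    if chunk.get("done"):
        break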
Message Object Structure:
role (str): The role of the message - "system", "user", "assistant", or "tool"
content (str): The content of the message
images (list, optional): For multimodal models, a list of image data
tool_calls (list, optional): For function calling, a list of tool call objects
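As an illustration, a multimodal user message might look like the sketch below. The exact image payload format (for example, base64-encoded data versus file paths) depends on the model and server version, so treat the "images" value as a placeholder:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "What is shown in this picture?",
        # Placeholder: the expected image encoding depends on the model/server
        "images": ["<base64-encoded image data>"]
    }
]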
achat
Asynchronous version of chat. (Note: this method is planned but not fully implemented in the current version.)
response = await client.achat(
    model="llama2",
    messages=messages,
    options={"temperature": 0.7},
    stream=False
)
Embeddings
create_embedding
Create an embedding vector for the given text.
embedding = client.create_embedding(
    model="nomic-embed-text",
    prompt="This is a sample text for embedding."
)
Parameters:
model (str): The model name to use for embedding
prompt (str): The text to create an embedding for
options (dict, optional): Additional model parameters
Returns:
A dictionary containing the embedding vector
Response Structure:
model (str): The model name
embedding (list): The embedding vector (array of floats)
total_duration (int): Time spent generating embeddings (nanoseconds)
prompt_eval_count (int): Number of tokens processed
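Since the response exposes the vector under the "embedding" key, it can be used directly for similarity computations. A minimal sketch comparing two texts with cosine similarity:

import math

a = client.create_embedding(model="nomic-embed-text", prompt="The cat sat on the mat.")["embedding"]
b = client.create_embedding(model="nomic-embed-text", prompt="A kitten rested on the rug.")["embedding"]

# Cosine similarity between the two embedding vectors
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
print(f"Cosine similarity: {dot / (norm_a * norm_b):.3f}")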
acreate_embedding
Asynchronous version of create_embedding. (Note: this method is planned but not fully implemented in the current version.)
embedding = await client.acreate_embedding(
    model="nomic-embed-text",
    prompt="This is a sample text for embedding."
)
batch_embeddings
Create embeddings for multiple prompts efficiently. (Note: This method is planned but not fully implemented in the current version)
embeddings = client.batch_embeddings(
    model="nomic-embed-text",
    prompts=["Text one", "Text two", "Text three"]
)
Parameters:
model (str): The model name to use for embedding
prompts (list): List of texts to create embeddings for
options (dict, optional): Additional model parameters
Returns:
A list of dictionaries containing embeddings
Model Management
list_models
List available models.
models = client.list_models()
Returns:
A dictionary containing the list of available models
Response Structure:
models (list): A list of model objects, each containing:
    name (str): The model name
    size (int): Size in bytes
    modified_at (str): Timestamp of last modification
    digest (str): Model digest
    details (object): Additional model details
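For example, the response can be iterated to print a short summary of each installed model (field access mirrors the structure described above):

models = client.list_models()
for model in models.get("models", []):
    size_gb = model.get("size", 0) / 1e9
    print(f"{model['name']}: {size_gb:.1f} GB, modified {model.get('modified_at')}")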
get_model_info
Get information about a specific model. (Note: This method is planned but not fully implemented in the current version)
model_info = client.get_model_info("llama2")
Parameters:
model (str): The model name
Returns:
A dictionary containing model information
pull_model
Pull a model from the Ollama registry.
# Non-streaming
result = client.pull_model("llama2", stream=False)

# Streaming with progress updates
for update in client.pull_model("llama2", stream=True):
    print(f"Progress: {update.get('status')}")
Parameters:
model (str): The model name to pull
stream (bool, optional): Whether to stream the download progress. Default: False
Returns:
If stream=False: A dictionary with the status
If stream=True: An iterator of status updates
Stream Update Structure:
status (str): Status message like "downloading", "processing", "success"
completed (int): Bytes downloaded (present during downloading)
total (int): Total bytes to download (present during downloading)
digest (str): Model digest (present during downloading)
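The completed and total fields can be combined into a simple progress readout while a pull is running. A sketch, guarding against updates that omit those fields:

for update in client.pull_model("llama2", stream=True):
    completed = update.get("completed")
    total = update.get("total")
    if completed is not None and total:
        print(f"\r{update.get('status')}: {100 * completed / total:.1f}%", end="")
    else:
        print(f"\n{update.get('status')}")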
delete_model
Delete a model.
success = client.delete_model("llama2")
Parameters:
model (str): The model name to delete
Returns:
A boolean indicating success or failure
copy_model
Copy a model to a new name. (Note: This method is planned but not fully implemented in the current version)
result = client.copy_model("llama2", "my-llama2-copy")
Parameters:
source (str): The source model name
destination (str): The destination model name
Returns:
A dictionary containing the status
Miscellaneous
get_version
Get the Ollama version.
version = client.get_version()
print(f"Ollama version: {version['version']}")
Returns:
A dictionary containing version information
aget_version
Asynchronous version of get_version.
version = await client.aget_version()
_messages_to_prompt
Convert chat messages to a unified prompt string for models that don’t support chat format.
prompt = client._messages_to_prompt(messages)
Parameters:
messages (list): List of message dictionaries
Returns:
A formatted prompt string
_is_likely_embedding_model
Check if a model is likely to be an embedding-only model based on name patterns.
is_embedding = client._is_likely_embedding_model("nomic-embed-text")
Parameters:
model_name (str): Name of the model to check
Returns:
True if the model is likely embedding-only, False otherwise
Eidosian Note
We strive for recursive excellence:
Recursive Refinement: Each function evolves toward perfection through rigorous testing and user feedback
Humor as Cognitive Leverage: Error messages enlighten through carefully calibrated wit
Structure as Control: Every parameter and return type forms a precise architectural blueprint
Velocity as Intelligence: Functions optimized for lightning-fast execution without sacrificing depth
Exception Classes
The package provides precisely engineered exception types for clear error handling:
OllamaAPIError: Base exception class for all Ollama Toolkit errors
ConnectionError: Raised when connection to the API fails
TimeoutError: Raised when an API request times out
ModelNotFoundError: Raised when a requested model is not found
ServerError: Raised when the API server returns a 5xx error
InvalidRequestError: Raised when the API server returns a 4xx error
StreamingError: Raised when there is an error during streaming responses
ParseError: Raised when there is an error parsing API responses
AuthenticationError: Raised when authentication fails
EndpointNotFoundError: Raised when an API endpoint is not found
ModelCompatibilityError: Raised when a model doesn't support an operation
StreamingTimeoutError: Raised when a streaming response times out
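A typical call site wraps client methods in targeted except clauses, falling back to the base class for anything unexpected. The import path below is an assumption; adjust it to wherever your installation exposes these classes:

# Assumed import path; adjust to match your installation
from ollama_toolkit.exceptions import (
    ConnectionError,
    ModelNotFoundError,
    OllamaAPIError,
)

try:
    response = client.generate(model="llama2", prompt="Hello")
except ModelNotFoundError:
    print("Model not found - try client.pull_model('llama2') first.")
except ConnectionError:
    print("Could not reach the Ollama server - is it running?")
except OllamaAPIError as exc:
    print(f"API error: {exc}")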
Utility Functions
The package provides several utility functions in ollama_toolkit.utils.common:
Display Functions
print_header(title): Print a formatted header
print_success(message): Print a success message in green
print_error(message): Print an error message in red
print_warning(message): Print a warning message in yellow
print_info(message): Print an information message in blue
print_json(data): Print formatted JSON data
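These helpers can be combined for simple console feedback. A brief sketch (the client object is assumed to be an existing OllamaClient instance):

from ollama_toolkit.utils.common import print_error, print_header, print_success

print_header("Ollama server check")
try:
    version = client.get_version()
    print_success(f"Server reachable, version {version.get('version')}")
except Exception as exc:
    print_error(f"Could not reach server: {exc}")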
API Utilities
make_api_request(method, endpoint, data=None, base_url=DEFAULT_OLLAMA_API_URL, timeout=300): Make a synchronous API request
async_make_api_request(method, endpoint, data=None, base_url=DEFAULT_OLLAMA_API_URL, timeout=300): Make an asynchronous API request
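A minimal sketch of a direct request using the synchronous helper. The endpoint path mirrors the Ollama HTTP API, and the shape of the returned value (parsed dictionary versus raw response object) should be confirmed against your installed version:

from ollama_toolkit.utils.common import make_api_request

# Endpoint path is illustrative; the return type depends on the helper's implementation
result = make_api_request("GET", "/api/version")
print(result)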
Ollama Management
check_ollama_installed(): Check if Ollama is installed on the system
check_ollama_running(): Check if the Ollama server is running
install_ollama(): Attempt to install Ollama
ensure_ollama_running(): Ensure Ollama is installed and running
format_traceback(e): Format an exception traceback for logging or display
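A typical startup sequence checks for an installation before creating a client. A sketch, assuming check_ollama_installed() returns a truthy value when Ollama is present:

from ollama_toolkit import OllamaClient
from ollama_toolkit.utils.common import check_ollama_installed, ensure_ollama_running

if not check_ollama_installed():
    print("Ollama is not installed on this system.")
else:
    ensure_ollama_running()  # start the server if it is not already running
    client = OllamaClient()
    print(client.get_version())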
Model Constants
The ollama_toolkit.utils.model_constants module provides:
DEFAULT_CHAT_MODEL: Default model for chat completions
BACKUP_CHAT_MODEL: Fallback model for chat completions
DEFAULT_EMBEDDING_MODEL: Default model for embeddings
BACKUP_EMBEDDING_MODEL: Fallback model for embeddings
resolve_model_alias(alias): Convert a model alias to the actual model name
get_fallback_model(model_name): Get the appropriate fallback model
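For example, the constants and helpers can be used to pick a concrete model name with a configured fallback; the resolved values depend on your toolkit configuration:

from ollama_toolkit.utils.model_constants import (
    DEFAULT_CHAT_MODEL,
    get_fallback_model,
    resolve_model_alias,
)

model = resolve_model_alias("llama2")   # expand an alias to a concrete model name
fallback = get_fallback_model(model)    # configured fallback for that model
print(f"Default chat model: {DEFAULT_CHAT_MODEL}")
print(f"Using {model}, falling back to {fallback} if unavailable")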
Refer to the examples in /examples for real-world usage.