Getting Started
Welcome to Zuma! 👋 This guide explains what Zuma is and how to use it in your Python projects.
Zuma is a Python framework that helps you build and manage workflow pipelines. Think of it like a recipe book: you define a series of steps (like a recipe's instructions), and Zuma executes them in the right order and handles any errors along the way.
What is a Workflow?
A workflow in Zuma is a set of instructions that must be followed in a specific order. For example, imagine you're building a data processing application that needs to:
- Download data from a server
- Clean and format the data
- Save the results to a database
Each of these tasks would be a "step" in your workflow, and Zuma helps you organize and run these steps efficiently.
Installation
You can install Zuma using your preferred Python package manager:
# Using pip
pip install zuma-workflow
# Using Poetry
poetry add zuma-workflow
# Using Pipenv
pipenv install zuma-workflow
# Using uv
uv add zuma-workflow
Note on Virtual Environments
It's recommended to always use virtual environments when working with Python projects. Each
package manager handles virtual environments differently, but all of the above commands will
work within a virtual environment.
Verifying Installation
To verify that Zuma is installed correctly, open a Python shell and try importing it:
from zuma import ZumaWorkflow
print("Zuma is installed successfully!")
Core Concepts
Let's understand the main building blocks of Zuma. Don't worry if it seems complex at first; we'll break it down with examples!
1. Workflow (ZumaWorkflow)
A workflow is the main container that holds all your steps. Think of it as a project manager that:
- Keeps track of all the steps
- Makes sure they run in the right order
- Handles any problems that come up
- Shares information between steps
from zuma import ZumaWorkflow
# Create a simple workflow
workflow = ZumaWorkflow(
    name="My First Workflow",  # Give your workflow a descriptive name
    steps=[
        # We'll add steps here
    ]
)
graph TD
A((Start)) --> B[Step 1]
B --> C[Step 2]
C --> D[Step 3]
D --> E((End))
style A fill:#f4b860,stroke:#f4b860
style B fill:transparent,stroke:#f4b860
style C fill:transparent,stroke:#f4b860
style D fill:transparent,stroke:#f4b860
style E fill:#6ee7b7,stroke:#6ee7b7
The diagram above shows a simple linear workflow where steps execute one after another.
2. Steps (ZumaActionStep)
Steps are the individual tasks in your workflow. Each step:
- Has a specific job to do
- Can access data from previous steps
- Can pass data to next steps
- Can handle errors if something goes wrong
from zuma import ZumaActionStep
class DownloadDataStep(ZumaActionStep):
    """A step that downloads data from somewhere"""
    async def execute(self, context):
        # This is where you put your step's logic
        print(f"[{self.name}] Downloading data...")
        # Simulate downloading data
        data = {"user": "john", "age": 30}
        # Return data for next steps to use
        return {"downloaded_data": data}
class ProcessDataStep(ZumaActionStep):
    """A step that processes the downloaded data"""
    async def execute(self, context):
        # Get data from previous step
        data = context.get("downloaded_data")
        # Process the data
        processed_data = {
            "name": data["user"].upper(),
            "age_in_months": data["age"] * 12
        }
        return {"processed_data": processed_data}
3. Context
The context is like a shared notebook that all steps can read from and write to. It helps steps communicate with each other by passing data along.
For example, if Step A downloads some data and Step B needs to process it:
# Step A puts data in the context
async def execute(self, context):
    data = download_something()
    return {"my_data": data}  # This goes into the context
# Step B reads data from the context
async def execute(self, context):
    data = context.get("my_data")  # Get data from Step A
    result = process_data(data)
    return {"processed": result}
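The hand-off between Step A and Step B can be pictured with plain dictionaries. The sketch below is not Zuma's implementation: it assumes the runner merges each step's returned dictionary into the shared context, and `step_a`, `step_b`, and `run_steps` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

Context = Dict[str, Any]
Step = Callable[[Context], Awaitable[Context]]


async def step_a(context: Context) -> Context:
    # Pretend to download something and publish it under "my_data".
    return {"my_data": [3, 1, 2]}


async def step_b(context: Context) -> Context:
    # Read what step_a published and publish a processed version.
    return {"processed": sorted(context["my_data"])}


async def run_steps(steps: List[Step], initial: Optional[Context] = None) -> Context:
    # Assumption: each step's return value is merged into the shared context.
    context: Context = dict(initial or {})
    for step in steps:
        context.update(await step(context))
    return context


final_context = asyncio.run(run_steps([step_a, step_b]))
print(final_context)
```

Because every step only reads from and returns dictionaries, each one can be tested on its own by passing in a hand-built context.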
Basic Usage
Let's put everything together and create your first complete workflow! We'll build a simple workflow that processes user data.
Step 1: Create Your Steps
from zuma import ZumaWorkflow, ZumaActionStep, ZumaRunner
import asyncio
class FetchUserStep(ZumaActionStep):
    """Gets user data (simulated)"""
    async def execute(self, context):
        # Simulate fetching user data
        print("Fetching user data...")
        await asyncio.sleep(1)  # Simulate network delay
        user_data = {
            "name": "Alice",
            "age": 25,
            "city": "New York"
        }
        return {"user": user_data}
class ValidateUserStep(ZumaActionStep):
    """Validates the user data"""
    async def execute(self, context):
        user = context.get("user")
        # Simple validation
        if not user.get("name"):
            raise ValueError("User must have a name!")
        if not isinstance(user.get("age"), int):
            raise ValueError("Age must be a number!")
        return {"validated": True}
class ProcessUserStep(ZumaActionStep):
    """Processes the validated user data"""
    async def execute(self, context):
        user = context.get("user")
        # Do some processing
        processed_data = {
            "full_name": user["name"].upper(),
            "age_group": "adult" if user["age"] >= 18 else "minor",
            "location": user["city"]
        }
        return {"processed_user": processed_data}
Step 2: Create and Run the Workflow
# Create the workflow
workflow = ZumaWorkflow(
    "User Processing Workflow",
    steps=[
        FetchUserStep("Fetch User"),
        ValidateUserStep("Validate User"),
        ProcessUserStep("Process User")
    ]
)
# Create a runner
runner = ZumaRunner()
# Run the workflow
async def main():
    result = await runner.run_workflow(workflow)
    print("\nWorkflow completed!")
    print("Final result:", result)
# Run it!
if __name__ == "__main__":
    asyncio.run(main())
Step 3: Understanding the Output
When you run this workflow, you'll see output like:
Fetching user data...
[Fetch User] ✓ Completed
[Validate User] ✓ Completed
[Process User] ✓ Completed
Workflow completed!
Final result: {
'user': {'name': 'Alice', 'age': 25, 'city': 'New York'},
'validated': True,
'processed_user': {
'full_name': 'ALICE',
'age_group': 'adult',
'location': 'New York'
}
}
Workflow Components
Zuma provides several special components to handle common workflow patterns. Let's look at each one:
1. Parallel Steps (ZumaParallelAction)
When you have multiple steps that can run at the same time (like processing different files), use ZumaParallelAction:
from zuma import ZumaParallelAction
# Define steps that can run in parallel
# (ProcessCSVStep, ProcessJSONStep and ProcessXMLStep are your own ZumaActionStep subclasses)
parallel_steps = ZumaParallelAction(
    "Process Files",
    steps=[
        ProcessCSVStep("Process CSV"),
        ProcessJSONStep("Process JSON"),
        ProcessXMLStep("Process XML")
    ],
    max_concurrency=2  # Run at most 2 steps at a time
)
flowchart TD
S((Start)) --> A[Process CSV]
S --> B[Process JSON]
S --> C[Process XML]
A --> E((End))
B --> E
C --> E
style S fill:#f4b860,stroke:#f4b860
style E fill:#6ee7b7,stroke:#6ee7b7
style A fill:#000000,stroke:#f4b860,color:#ffffff
style B fill:#000000,stroke:#f4b860,color:#ffffff
style C fill:#000000,stroke:#f4b860,color:#ffffff
The diagram above shows how parallel steps work. The workflow:
- Splits into multiple parallel paths
- Executes steps concurrently (up to max_concurrency)
- Waits for all steps to complete before continuing
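The "up to max_concurrency" behaviour described above is commonly built on a semaphore. The following is a minimal asyncio sketch of that idea, not Zuma's internals; `ConcurrencyTracker` and `run_limited` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List


class ConcurrencyTracker:
    """Records how many jobs were running at the same moment."""

    def __init__(self) -> None:
        self.running = 0
        self.peak = 0

    async def job(self, name: str) -> str:
        self.running += 1
        self.peak = max(self.peak, self.running)
        await asyncio.sleep(0.01)  # Stand-in for real work
        self.running -= 1
        return name


async def run_limited(
    jobs: List[Callable[[], Awaitable[str]]], max_concurrency: int
) -> List[str]:
    # The semaphore lets at most `max_concurrency` jobs past at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        async with semaphore:
            return await job()

    # gather() waits for every job and keeps results in input order.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))


tracker = ConcurrencyTracker()
names = ["csv", "json", "xml"]
jobs = [lambda n=n: tracker.job(n) for n in names]
results = asyncio.run(run_limited(jobs, max_concurrency=2))
print(results, "peak concurrency:", tracker.peak)
```

With three jobs and a limit of two, the third job starts only after one of the first two has released the semaphore.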
2. Conditional Steps (ZumaConditionalStep)
Sometimes you need different steps based on certain conditions. Use ZumaConditionalStep for this:
from zuma import ZumaConditionalStep  # Assumes a top-level export, like the other components
def check_data_size(context):
    """Decide which processing path to take"""
    data_size = len(context.get("data", []))
    return data_size > 1000
# Create a conditional step
processing_step = ZumaConditionalStep(
    "Choose Processing Path",
    condition=check_data_size,
    true_component=BatchProcessStep("Batch Process"),  # For large data
    false_component=SimpleProcessStep("Simple Process")  # For small data
)
flowchart TD
S((Start)) --> D{Size > 1000?}
D -->|Yes| B[Batch Process]
D -->|No| P[Simple Process]
B --> E((End))
P --> E
style S fill:#f4b860,stroke:#f4b860
style E fill:#6ee7b7,stroke:#6ee7b7
style D fill:#000000,stroke:#3b82f6,color:#ffffff
style B fill:#000000,stroke:#f4b860,color:#ffffff
style P fill:#000000,stroke:#f4b860,color:#ffffff
The diagram above illustrates conditional workflow branching:
- A condition is evaluated (data size in this case)
- Based on the result, one of two paths is taken
- Both paths eventually merge back to continue the workflow
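Because the condition is an ordinary function of the context, it can be exercised with plain dictionaries before it is wired into a workflow. The sketch below repeats `check_data_size` and assumes the context behaves like a dict; `choose_branch` is a hypothetical helper that only mirrors the true/false choice.

```python
from typing import Any, Dict


def check_data_size(context: Dict[str, Any]) -> bool:
    """Return True when the context holds more than 1000 items under "data"."""
    return len(context.get("data", [])) > 1000


def choose_branch(context: Dict[str, Any]) -> str:
    # Hypothetical helper: mirrors the true_component / false_component choice.
    return "Batch Process" if check_data_size(context) else "Simple Process"


print(choose_branch({"data": list(range(1001))}))  # Just over the threshold
print(choose_branch({"data": list(range(1000))}))  # Exactly 1000 is not "large"
print(choose_branch({}))                           # Missing key counts as empty
```

Note the boundary: the comparison is strict, so exactly 1000 items takes the small-data path.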
3. Error Handling
Zuma helps you handle errors gracefully. You can:
- Retry failed steps
- Provide fallback steps
- Continue workflow even if some steps fail
# Assumes ZumaExecutionError is exported from the top-level package
from zuma import ZumaActionStep, ZumaExecutionError, ZumaWorkflow
class RetryableStep(ZumaActionStep):
    """A step that might fail but can retry"""
    def __init__(self, name):
        super().__init__(
            name=name,
            retries=3,  # Try up to 3 times
            retry_delay=1.0  # Wait 1 second between retries
        )
    async def execute(self, context):
        try:
            result = await some_risky_operation()  # Placeholder for your own coroutine
            return {"data": result}
        except Exception as e:
            # Chain the original exception so its traceback is preserved
            raise ZumaExecutionError(f"Operation failed: {e}") from e
# Create workflow with error handling
workflow = ZumaWorkflow(
    "Fault Tolerant Workflow",
    steps=[RetryableStep("Risky Step")],
    continue_on_failure=True  # Continue even if steps fail
)
graph TD
Start((Start)) --> Attempt1[Attempt 1]
Attempt1 -->|Success| Continue[Continue]
Attempt1 -->|Fail| Attempt2[Attempt 2]
Attempt2 -->|Success| Continue
Attempt2 -->|Fail| Attempt3[Attempt 3]
Attempt3 -->|Success| Continue
Attempt3 -->|Fail| HandleError[Handle Error]
HandleError --> Continue
Continue --> End((End))
style Start fill:#f4b860,stroke:#f4b860
style End fill:#6ee7b7,stroke:#6ee7b7
style HandleError fill:transparent,stroke:#ef4444
style Attempt1 fill:transparent,stroke:#f4b860
style Attempt2 fill:transparent,stroke:#f4b860
style Attempt3 fill:transparent,stroke:#f4b860
style Continue fill:transparent,stroke:#f4b860
The diagram above shows how error handling works:
- Each step can be configured to retry on failure
- After max retries, error handling logic is triggered
- The workflow can continue even after failures if configured
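The retry flow in the diagram can be sketched as a small loop. This is an illustration under stated assumptions, not Zuma's retry code: it treats the limit as a total of three attempts, as the diagram suggests, and `run_with_attempts` and `FlakyOperation` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def run_with_attempts(
    operation: Callable[[], Awaitable[T]], max_attempts: int, retry_delay: float
) -> T:
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                # Wait before the next attempt, but not after the last one.
                await asyncio.sleep(retry_delay)
    # All attempts failed: surface one error that keeps the last cause.
    raise RuntimeError(f"failed after {max_attempts} attempts") from last_error


class FlakyOperation:
    """Fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError(f"attempt {self.calls} failed")
        return "ok"


flaky = FlakyOperation(failures=2)
outcome = asyncio.run(run_with_attempts(flaky, max_attempts=3, retry_delay=0.01))
print(outcome, "after", flaky.calls, "calls")
```

Here the operation fails twice and succeeds on the third attempt, which corresponds to the Attempt 3 "Success" edge in the diagram.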
Advanced Features
1. Step Dependencies
You can specify that certain steps depend on others:
class DependentStep(ZumaActionStep):
    """A step that needs data from specific previous steps"""
    def __init__(self, name):
        super().__init__(
            name=name,
            required_contexts=["user_data", "preferences"]  # Names of required data
        )
    async def execute(self, context):
        # This will only run if both user_data and preferences exist in context
        user = context.get("user_data")
        prefs = context.get("preferences")
        return {"result": process_user_with_prefs(user, prefs)}
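A requirement of this kind amounts to a presence check on context keys. The sketch below shows such a check in isolation; it assumes a key counts as satisfied whenever it exists in the context, and `missing_context_keys` is a hypothetical helper, not part of Zuma.

```python
from typing import Any, Dict, Iterable, List


def missing_context_keys(context: Dict[str, Any], required: Iterable[str]) -> List[str]:
    # Report every required key that no earlier step has produced yet.
    return [key for key in required if key not in context]


required = ["user_data", "preferences"]
print(missing_context_keys({"user_data": {"name": "Alice"}}, required))
print(missing_context_keys({"user_data": {}, "preferences": {}}, required))
```

Running a check like this at the start of a test makes it obvious which upstream step failed to supply its data.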
2. Progress Tracking
Monitor your workflow's progress with built-in tracking:
class TrackableStep(ZumaActionStep):
    async def execute(self, context):
        total_items = 100
        for i in range(total_items):
            # Update progress
            self.update_progress(
                completed=i + 1,
                total=total_items,
                message=f"Processing item {i + 1}/{total_items}"
            )
            await process_item(i)
        return {"completed": True}
3. Custom Context Processors
Transform data between steps automatically:
from zuma import ZumaContextProcessor
class DataNormalizer(ZumaContextProcessor):
    """Normalizes data between steps"""
    def process(self, context):
        if "user_data" in context:
            # Convert all string values to lowercase
            data = context["user_data"]
            normalized = {
                k: v.lower() if isinstance(v, str) else v
                for k, v in data.items()
            }
            context["user_data"] = normalized
        return context
# Use the processor in your workflow
workflow = ZumaWorkflow(
    "Normalized Workflow",
    steps=[...],
    context_processors=[DataNormalizer()]
)
Best Practices
Here are comprehensive guidelines to help you write better Zuma workflows:
1. Step Design
- Single Responsibility: Keep steps focused on one specific task. This makes them easier to test, maintain, and reuse.
- Descriptive Naming: Use clear, action-oriented names for steps and workflows (e.g., 'ValidateUserData', 'ProcessPayment').
- Documentation: Add comprehensive docstrings that explain:
  - What the step does
  - Required input context
  - Expected output
  - Possible errors
- Type Hints: Use Python type hints to make your code more maintainable and catch type-related errors early.
2. Error Handling
- Anticipate Failures: Always consider what can go wrong and handle edge cases appropriately.
- Retry Strategy: Configure retry settings based on the operation:
  - Network operations: Multiple retries with exponential backoff
  - Database operations: Short retry window with linear backoff
  - CPU-bound operations: Minimal retries to avoid resource waste
- Error Messages: Provide detailed error messages that help diagnose issues.
- Resource Cleanup: Implement proper cleanup in case of failures.
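The difference between the exponential and linear backoff mentioned under Retry Strategy is easiest to see as delay schedules. The helpers below are a generic sketch with hypothetical names and default values; the document only shows a fixed `retry_delay`, so how (or whether) Zuma supports these schedules is not assumed here.

```python
from typing import List


def exponential_backoff(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    # Delay doubles with each attempt (1-based) and never exceeds `cap`.
    return min(cap, base * 2 ** (attempt - 1))


def linear_backoff(attempt: int, step: float = 0.25) -> float:
    # Delay grows by a constant `step` with each attempt (1-based).
    return step * attempt


def schedule(delay_fn, attempts: int) -> List[float]:
    return [delay_fn(attempt) for attempt in range(1, attempts + 1)]


print("exponential:", schedule(exponential_backoff, 4))
print("linear:     ", schedule(linear_backoff, 4))
```

Exponential schedules back off quickly from an overloaded remote service, while linear schedules keep the total wait short and predictable.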
3. Performance
- Parallel Processing: Use ZumaParallelAction for independent operations.
- Resource Management:
  - Use connection pooling for databases
  - Implement proper caching strategies
  - Release resources promptly
- Batch Processing: Process large datasets in chunks to manage memory usage.
- Progress Tracking: Implement progress tracking for long-running operations.
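For the Batch Processing guideline, a small generator is often enough to process a large dataset in chunks. This is a plain-Python sketch; `chunked` is a hypothetical helper, not a Zuma function.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls the next `size` items without loading the rest.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


for batch in chunked(range(7), 3):
    print(batch)
```

Only one chunk is held in memory at a time, so the same helper works for generators and file iterators as well as lists.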
4. Testing
- Unit Tests: Write comprehensive tests for individual steps:
  import pytest
  @pytest.mark.asyncio  # Requires the pytest-asyncio plugin
  async def test_user_processor():
      # Arrange
      processor = UserProcessor("Test Processor")
      context = {"user_data": {"name": "John", "age": 30}}
      # Act
      result = await processor.execute(context)
      # Assert
      assert "processed_user" in result
      assert result["processed_user"]["name"] == "john"
- Integration Tests: Test complete workflows with realistic data.
- Mock External Services: Use mocking for external dependencies.
- Error Scenarios: Test error handling and recovery mechanisms.
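For the Mock External Services guideline, the standard library's `unittest.mock.AsyncMock` can stand in for an async client. In the sketch below, `fetch_and_summarize` and the client's `get_user` method are hypothetical; the point is only that the step logic receives its dependency as an argument so a mock can replace it.

```python
import asyncio
from typing import Any, Dict
from unittest.mock import AsyncMock


async def fetch_and_summarize(client: Any, user_id: int) -> Dict[str, str]:
    # The logic under test: call the external service, then shape the result.
    user = await client.get_user(user_id)
    return {"summary": f"{user['name']} ({user['age']})"}


# Replace the real service with a mock that returns canned data.
client = AsyncMock()
client.get_user.return_value = {"name": "John", "age": 30}

result = asyncio.run(fetch_and_summarize(client, 42))
client.get_user.assert_awaited_once_with(42)
print(result)
```

The same pattern applies inside a step's `execute` method when the client is passed in through the constructor or the context.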
5. Monitoring and Logging
- Structured Logging: Use consistent log formats:
  self.logger.info("Processing user data", extra={
      "user_id": user.id,
      "step": self.name,
      "batch_size": self.batch_size
  })
- Metrics Collection: Track important metrics:
  - Step execution time
  - Success/failure rates
  - Resource usage
  - Throughput
- Alerting: Set up appropriate alerting for critical failures.
- Debugging: Include sufficient context in logs for debugging.
API Reference
Core Components
ZumaWorkflow
Main container for organizing workflow steps.
class ZumaWorkflow:
    def __init__(
        self,
        name: str,
        steps: List[ZumaComponent],
        continue_on_failure: bool = False,
        context_processors: List[ZumaContextProcessor] = None,
        description: str = None
    ):
        """Initialize a workflow.
        Args:
            name: Workflow name
            steps: List of workflow steps
            continue_on_failure: Whether to continue if a step fails
            context_processors: List of context processors
            description: Optional workflow description
        """
Attributes:
- name: Unique identifier for the workflow
- steps: List of workflow components to execute
- continue_on_failure: If True, continue executing remaining steps when a step fails
- context_processors: Optional processors for modifying context between steps
ZumaActionStep
Base class for implementing workflow steps.
class ZumaActionStep:
    def __init__(
        self,
        name: str,
        description: str = None,
        retries: int = 0,
        retry_delay: float = 1.0,
        timeout: float = None,
        required_contexts: List[str] = None
    ):
        """Initialize an action step.
        Args:
            name: Step name
            description: Optional step description
            retries: Number of retry attempts
            retry_delay: Delay between retries in seconds
            timeout: Maximum execution time in seconds
            required_contexts: List of required context keys
        """
    async def execute(
        self,
        context: Dict[str, Any],
        dependencies: Dict[str, Any],
        **kwargs
    ) -> Dict[str, Any]:
        """Execute the step logic.
        Override this method in your custom steps.
        """
        raise NotImplementedError()
Key Methods:
- execute(context, dependencies, **kwargs): Main execution method to override in custom steps
- update_progress(completed, total, message=None): Update step progress during execution
- on_retry(attempt: int, error: Exception): Called before each retry attempt
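The `timeout` argument above is described as a maximum execution time in seconds. In asyncio code such a limit is typically enforced with `asyncio.wait_for`; the sketch below shows that general technique and makes no claim about how Zuma applies it. `run_with_timeout`, `fast`, and `slow` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_timeout(operation: Callable[[], Awaitable[str]], timeout: float) -> str:
    try:
        # wait_for cancels the operation if it runs longer than `timeout`.
        return await asyncio.wait_for(operation(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


async def fast() -> str:
    return "finished"


async def slow() -> str:
    await asyncio.sleep(1.0)  # Longer than the timeout used below
    return "finished"


fast_outcome = asyncio.run(run_with_timeout(fast, timeout=0.5))
slow_outcome = asyncio.run(run_with_timeout(slow, timeout=0.05))
print(fast_outcome, "/", slow_outcome)
```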
ZumaParallelAction
Execute multiple steps concurrently.
class ZumaParallelAction:
    def __init__(
        self,
        name: str,
        steps: List[ZumaComponent],
        max_concurrency: int = None,
        fail_fast: bool = True,
        description: str = None
    ):
        """Initialize parallel execution.
        Args:
            name: Action name
            steps: List of steps to execute in parallel
            max_concurrency: Maximum concurrent executions
            fail_fast: Stop all steps if one fails
            description: Optional description
        """
ZumaConditionalStep
Conditional branching in workflows.
class ZumaConditionalStep:
    def __init__(
        self,
        name: str,
        condition: Callable[[Dict[str, Any]], bool],
        true_component: ZumaComponent,
        false_component: ZumaComponent = None,
        description: str = None
    ):
        """Initialize conditional step.
        Args:
            name: Step name
            condition: Function that returns True/False
            true_component: Component to execute if True
            false_component: Optional component if False
            description: Optional description
        """
ZumaRunner
Executes and manages workflows.
class ZumaRunner:
    async def run_workflow(
        self,
        workflow: ZumaWorkflow,
        context: Dict[str, Any] = None,
        generate_diagram: bool = False,
        diagram_output: str = None
    ) -> ZumaResult:
        """Execute a workflow.
        Args:
            workflow: Workflow to execute
            context: Initial context data
            generate_diagram: Create visualization
            diagram_output: Diagram output path
        """
Key Methods:
- run_workflow(workflow, context, **kwargs): Execute a workflow with optional visualization
- create_workflow_diagram(result, output_file): Generate workflow visualization
- print_execution_summary(result): Print workflow execution details
Extension Points
ZumaContextProcessor
Custom context modification between steps.
class ZumaContextProcessor:
    def process(self, context: Dict[str, Any]) -> Dict[str, Any]:
        """Process and modify the context.
        Override this method to implement custom processing.
        Args:
            context: Current workflow context
        Returns:
            Modified context dictionary
        """
        return context
ZumaExecutionError
Custom error type for workflow execution failures.
class ZumaExecutionError(Exception):
    def __init__(
        self,
        message: str,
        error_code: str = None,
        details: Dict[str, Any] = None
    ):
        """Initialize execution error.
        Args:
            message: Error message
            error_code: Optional error code
            details: Additional error details
        """
Results and Status
ZumaResult
Contains workflow execution results and metadata.
Attributes:
- status: Current execution status (PENDING, RUNNING, SUCCESS, FAILED)
- context: Final workflow context
- error: Error information if failed
- duration: Total execution time in seconds
- metadata: Additional execution metadata
ZumaStatus
Enumeration of possible execution states.
class ZumaStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"
Visualization
Diagram Generation
Generate visual workflow representations.
Features:
- Clear visualization of workflow steps and relationships
- Support for retry mechanisms visualization
  - Main workflow path at top
  - Retry attempts branching below
  - Success paths rejoining main flow
- Error handling visualization
- Visual representation of parallel processing
- Dark theme support for better readability
- Automatic diagram generation during workflow execution