Quickstart

This page gives a good introduction to getting started with Recipe. It assumes you already have Recipe installed. If you do not, head over to Installing Recipe.

First, make sure that:

Let’s get started with some simple examples.

Creating a Shelf

A Shelf is a place to store SQL fragments. In Recipe these are called Ingredients. Ingredients may contain columns that should be part of the SELECT portion of a query, filters that are part of the WHERE clause of a query, group_bys that contribute to a query’s GROUP BY, and havings which add HAVING limits to a query.

It’s a safe bet that you won’t have to construct an Ingredient with all these parts directly, because Recipe contains convenience classes that help you build the most common SQL fragments. The two most common Ingredient subclasses are Dimensions, which supply both a column and a grouping on that column, and Metrics, which supply a column aggregation.

A Shelf acts like a dictionary. The keys are strings and the values are Ingredients. Each key is a shortcut name for its ingredient. Here’s an example.

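from recipe import *
from sqlalchemy import func

# Census is assumed to be a SQLAlchemy model mapped to the census table.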
shelf = Shelf({
    'age': WtdAvgMetric(Census.age, Census.pop2000),
    'population': Metric(func.sum(Census.pop2000)),
    'state': Dimension(Census.state)
})

This is a shelf with two metrics (a weighted average of age, and the sum of population) and a dimension which lets you group on US State names.
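
To make the idea of "SQL fragments" concrete, here is a minimal sketch in plain Python of how named fragments with columns, filters, group_bys and havings could be combined into one query. The Fragment class, the build_sql function and the toy_shelf dictionary are hypothetical names invented for this illustration. They are not part of Recipe, and Recipe's own implementation may work quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical stand-in for an Ingredient; not Recipe's real class."""
    columns: list = field(default_factory=list)    # SELECT expressions
    filters: list = field(default_factory=list)    # WHERE conditions
    group_bys: list = field(default_factory=list)  # GROUP BY expressions
    havings: list = field(default_factory=list)    # HAVING conditions


def build_sql(table, fragments):
    """Combine the parts of several fragments into one SELECT statement."""
    columns = [c for f in fragments for c in f.columns]
    filters = [c for f in fragments for c in f.filters]
    group_bys = [c for f in fragments for c in f.group_bys]
    havings = [c for f in fragments for c in f.havings]

    sql = "SELECT " + ", ".join(columns) + " FROM " + table
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if group_bys:
        sql += " GROUP BY " + ", ".join(group_bys)
    if havings:
        sql += " HAVING " + " AND ".join(havings)
    return sql


# A dictionary of named fragments, loosely mirroring a Shelf.
toy_shelf = {
    # Dimension-like: a column plus a grouping on that column.
    "state": Fragment(columns=["state"], group_bys=["state"]),
    # Metric-like: a column aggregation.
    "population": Fragment(columns=["sum(pop2000) AS population"]),
    # Filter-like: only a WHERE condition.
    "v_states": Fragment(filters=["state LIKE 'V%'"]),
}

sql = build_sql(
    "census",
    [toy_shelf["state"], toy_shelf["population"], toy_shelf["v_states"]],
)
print(sql)
```

The dimension-like fragment contributes to both SELECT and GROUP BY, while the metric-like fragment contributes only an aggregated column. This matches the description of Dimensions and Metrics above.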

Using the Shelf to build a Recipe

Now that you have the shelf, you can build a recipe. Given the shelf we just created, we can create a recipe that gives us the average age for each state in the United States.

# Session is assumed to be a SQLAlchemy session factory bound to your database.
recipe = Recipe(shelf=shelf, session=Session())\
    .dimensions('state')\
    .metrics('age')
print(recipe.to_sql())
print(recipe.dataset.csv)

The output will be:

SELECT census.state AS state,
       CAST(sum(census.age * census.pop2000) AS FLOAT) / (coalesce(CAST(sum(census.pop2000) AS FLOAT), 0.0) + 1e-09) AS age
FROM census
GROUP BY census.state

state,age,state_id
Alabama,36.27787892421841,Alabama
Alaska,31.947384766048568,Alaska
Arizona,35.37065466080318,Arizona
Arkansas,36.63745110262778,Arkansas
...
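
The weighted-average expression in the generated SQL can be checked by hand. The sketch below runs the same expression against an in-memory SQLite table using Python's standard sqlite3 module. The four rows are invented for the illustration and are not real census figures. The small 1e-09 term in the denominator appears to guard against division by zero when a group has no population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, age INTEGER, pop2000 INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [
        # Invented sample rows: (state, age, population at that age)
        ("Vermont", 30, 100),
        ("Vermont", 40, 300),
        ("Virginia", 20, 500),
        ("Virginia", 50, 500),
    ],
)

# Same weighted-average expression as in the generated SQL above.
query = """
SELECT state,
       CAST(sum(age * pop2000) AS FLOAT)
           / (coalesce(CAST(sum(pop2000) AS FLOAT), 0.0) + 1e-09) AS age
FROM census
GROUP BY state
ORDER BY state
"""
rows = conn.execute(query).fetchall()
for state, age in rows:
    print(state, round(age, 2))
```

For Vermont the result is (30 * 100 + 40 * 300) / 400 = 37.5, and for Virginia it is (20 * 500 + 50 * 500) / 1000 = 35.0, up to the negligible effect of the 1e-09 term.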

Recipes can be built interactively by building on previous recipes. Consider the recipe we just created. We can add a filter to it:

recipe = recipe.filters(Census.state.like('V%'))
print(recipe.to_sql())
print(recipe.dataset.csv)

The output will be:

SELECT census.state AS state,
       CAST(sum(census.age * census.pop2000) AS FLOAT) / (coalesce(CAST(sum(census.pop2000) AS FLOAT), 0.0) + 1e-09) AS age
FROM census
WHERE census.state LIKE 'V%'
GROUP BY census.state

state,age,state_id
Vermont,37.0597968760254,Vermont
Virginia,35.83989223366432,Virginia
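
The chained style used above, where each call returns something you can keep building on, can be sketched with a small fluent builder. ToyRecipe is a hypothetical class written for this illustration. In this sketch every call returns a new object and leaves the earlier one unchanged, which is a design assumption and not a statement about how Recipe itself behaves.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyRecipe:
    """Hypothetical fluent builder; not Recipe's real class."""
    dimensions_: tuple = ()
    metrics_: tuple = ()
    filters_: tuple = ()

    def dimensions(self, *names):
        # Return a copy with the extra dimension names appended.
        return replace(self, dimensions_=self.dimensions_ + names)

    def metrics(self, *names):
        # Return a copy with the extra metric names appended.
        return replace(self, metrics_=self.metrics_ + names)

    def filters(self, *conditions):
        # Return a copy with the extra filter conditions appended.
        return replace(self, filters_=self.filters_ + conditions)


base = ToyRecipe().dimensions("state").metrics("age")
filtered = base.filters("state LIKE 'V%'")

print(base.filters_)      # the earlier object is unchanged in this sketch
print(filtered.filters_)  # the new object carries the added filter
```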

Basic parts of a recipe

dimension, metrics, order_by, having

Note that a recipe contains data from a single table.

Viewing the data from your Recipe

recipe.dataset.xxxx, iterating over recipe.all; dimensions have a separate _id property

More about Ingredients

Types of Ingredients

List of ingredients

Dimension

Dimensions are groupings that exist in your data.

# A simple dimension
shelf['state'] = Dimension(Census.state)

IdValueDimension

IdValueDimensions support separate properties for ids and values. Consider a table of employees with an employee_id and a full_name. If you had two employees with the same name you need to be able to distinguish between them.

# Support an id and a label
shelf = Shelf({
  'employee': IdValueDimension(Employee.id, Employee.full_name)
})

The id is accessible as employee_id in each row and their full name is available as employee.
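
The reason for keeping the id next to the name can be seen with a small SQLite example using Python's standard sqlite3 module. The employee table, its sales column and the rows are invented for this illustration, and the queries are hand-written sketches, not the SQL that Recipe generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, full_name TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    # Two different people share the name "Pat Lee".
    [(1, "Pat Lee", 10), (2, "Pat Lee", 25), (3, "Sam Roy", 5)],
)

# Grouping on the name alone merges the two people named Pat Lee.
by_name = conn.execute(
    "SELECT full_name AS employee, sum(sales) FROM employee "
    "GROUP BY full_name ORDER BY full_name"
).fetchall()

# Grouping on both id and name keeps them apart.
by_id_and_name = conn.execute(
    "SELECT id AS employee_id, full_name AS employee, sum(sales) FROM employee "
    "GROUP BY id, full_name ORDER BY id"
).fetchall()

print(by_name)
print(by_id_and_name)
```

The first query returns two rows, with the sales of both Pat Lees added together. The second returns three rows, one per employee.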

LookupDimension

A LookupDimension maps values in your data to descriptive names. The _id property of your dimension contains the original value.

# Convert M/F into Male/Female
shelf = Shelf({
    'gender': LookupDimension(Census.sex,
                              {'M': 'Male', 'F': 'Female'},
                              default='Unknown')
})

If you use the gender dimension, there will be a gender_id in each row that will be “M” or “F” and a gender in each row that will be “Male” or “Female”.
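
The mapping itself can be sketched in plain Python as a dictionary lookup with a default. This only illustrates the resulting gender_id and gender values. Whether Recipe applies the lookup in Python or in SQL is not covered here, and the raw value "X" is invented to show the default.

```python
lookup = {"M": "Male", "F": "Female"}
default = "Unknown"

# "X" is an invented raw value with no entry in the lookup.
raw_values = ["M", "F", "X"]

rows = [
    # Keep the original value as gender_id and the mapped label as gender.
    {"gender_id": raw, "gender": lookup.get(raw, default)}
    for raw in raw_values
]
for row in rows:
    print(row)
```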

Metric

DivideMetric

WtdAvgMetric

SumIfMetric

CountIfMetric

Filter

Having

Formatters

Building filters

Ingredient.build_filter

Storing extra attributes in meta

Using Extensions

This part of the documentation serves to give you an idea of things that are otherwise hard to extract from the API Documentation.

And now for something completely different.

What are extensions for?

Automatic Filtering

AutomaticFilter

Summarizing over Dimensions

SummarizeOverRecipe

Merging multiple tables

BlendRecipe

Adding comparison data

CompareRecipe

Anonymizing data

Anonymize

Advanced Features

Database connections

Caching

Running recipes in parallel with RecipePool


Now, go check out the API Documentation or begin Recipe Development.