Quickstart¶
This page is an introduction to getting started with Recipe. It assumes you already have Recipe installed. If you do not, head over to Installing Recipe.
First, make sure that:
- Recipe is installed
- Recipe is up-to-date
Let’s get started with some simple examples.
Creating a Shelf¶
A Shelf is a place to store SQL fragments. In Recipe these are called
Ingredients. Ingredients may contain columns that should be part of the
SELECT portion of a query, filters that are part of the WHERE clause of a
query, group_bys that contribute to a query’s GROUP BY, and havings, which
add HAVING limits to a query.
It’s a safe bet that you won’t have to construct an Ingredient
with all these parts directly because Recipe contains convenience classes
that help you build the most common SQL fragments. The two most common
Ingredient subclasses are Dimensions, which supply both a column and a
grouping on that column, and Metrics, which supply a column aggregation.
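To make the four parts concrete, here is a toy model in plain Python. It is not Recipe's implementation: Fragment and build_sql are hypothetical names invented for this sketch. Each fragment carries optional columns, filters, group_bys and havings, and the helper stitches them into one statement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    """Hypothetical stand-in for an Ingredient: four lists of SQL text."""
    columns: List[str] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)
    group_bys: List[str] = field(default_factory=list)
    havings: List[str] = field(default_factory=list)


def build_sql(table: str, fragments: List[Fragment]) -> str:
    """Combine the parts of every fragment into a single SELECT statement."""
    columns = [c for f in fragments for c in f.columns]
    filters = [c for f in fragments for c in f.filters]
    group_bys = [c for f in fragments for c in f.group_bys]
    havings = [c for f in fragments for c in f.havings]

    sql = "SELECT " + ", ".join(columns) + " FROM " + table
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if group_bys:
        sql += " GROUP BY " + ", ".join(group_bys)
    if havings:
        sql += " HAVING " + " AND ".join(havings)
    return sql


# A dimension-like fragment supplies a column and a grouping on it.
state = Fragment(columns=["census.state AS state"], group_bys=["census.state"])
# A metric-like fragment supplies an aggregated column only.
population = Fragment(columns=["sum(census.pop2000) AS population"])
# Filter-like and having-like fragments add no columns at all.
v_states = Fragment(filters=["census.state LIKE 'V%'"])
big_states = Fragment(havings=["sum(census.pop2000) > 1000000"])

toy_sql = build_sql("census", [state, population, v_states, big_states])
print(toy_sql)
```

The point of the sketch is only that each kind of fragment contributes to a different clause, so fragments can be mixed freely and the query is assembled at the end.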
A Shelf acts like a dictionary: the keys are strings that serve as shortcut names, and the values are Ingredients. Here’s an example.
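from sqlalchemy import func
from recipe import *

# Census is assumed to be a SQLAlchemy-mapped class for the census table.
shelf = Shelf({
    'age': WtdAvgMetric(Census.age, Census.pop2000),
    'population': Metric(func.sum(Census.pop2000)),
    'state': Dimension(Census.state)
})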
This is a shelf with two metrics (a weighted average of age, and the sum of population) and a dimension which lets you group on US State names.
Using the Shelf to build a Recipe¶
Now that you have the shelf, you can build a recipe. Given the shelf we just created, we can create a recipe that gives us the average age for each state in the United States.
# Session is assumed to be a configured SQLAlchemy session factory.
recipe = Recipe(shelf=shelf, session=Session())\
    .dimensions('state')\
    .metrics('age')
print(recipe.to_sql())
print(recipe.dataset.csv)
The output will be:
SELECT census.state AS state,
       CAST(sum(census.age * census.pop2000) AS FLOAT) / (coalesce(CAST(sum(census.pop2000) AS FLOAT), 0.0) + 1e-09) AS age
FROM census
GROUP BY census.state
state,age,state_id
Alabama,36.27787892421841,Alabama
Alaska,31.947384766048568,Alaska
Arizona,35.37065466080318,Arizona
Arkansas,36.63745110262778,Arkansas
...
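The expression in the age column is a population-weighted average: the sum of age times population divided by the total population, with a tiny constant added to the denominator so that an empty group does not divide by zero. The sketch below runs the same expression with Python's built-in sqlite3 module instead of Recipe. The rows are made up for illustration and are not the census data used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, age INTEGER, pop2000 INTEGER)")
# Made-up rows for illustration only; not real census figures.
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [
        ("Vermont", 10, 100),
        ("Vermont", 50, 300),
        ("Virginia", 20, 500),
        ("Virginia", 40, 500),
    ],
)

query = """
SELECT census.state AS state,
       CAST(sum(census.age * census.pop2000) AS FLOAT)
       / (coalesce(CAST(sum(census.pop2000) AS FLOAT), 0.0) + 1e-09) AS age
FROM census
GROUP BY census.state
ORDER BY census.state
"""
weighted = {state: age for state, age in conn.execute(query)}

# Vermont: (10*100 + 50*300) / 400 is about 40, not the unweighted mean of 30.
print(weighted)
conn.close()
```

Because the larger population group pulls the result toward its age, the weighted figure can differ noticeably from a plain average of the age column.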
Recipes can be built interactively by building on previous recipes. Consider the recipe we just created. We can add a filter to it:
recipe = recipe.filters(Census.state.like('V%'))
print(recipe.to_sql())
print(recipe.dataset.csv)
The output will be:
SELECT census.state AS state,
       CAST(sum(census.age * census.pop2000) AS FLOAT) / (coalesce(CAST(sum(census.pop2000) AS FLOAT), 0.0) + 1e-09) AS age
FROM census
WHERE census.state LIKE 'V%'
GROUP BY census.state
state,age,state_id
Vermont,37.0597968760254,Vermont
Virginia,35.83989223366432,Virginia
Basic parts of a recipe
dimension, metrics, order_by, having
Note that a recipe contains data from a single table.
Viewing the data from your Recipe¶
- recipe.dataset.xxxx
- iterating over recipe.all
- dimensions have a separate _id property
More about Ingredients¶
Types of Ingredients¶
List of ingredients
Dimension¶
Dimensions are groupings that exist in your data.
# A simple dimension
shelf['state'] = Dimension(Census.state)
IdValueDimension¶
IdValueDimensions support separate properties for ids and values. Consider a
table of employees with an employee_id and a full_name. If you had two
employees with the same name, you would need to be able to distinguish
between them.
# Support an id and a label
shelf = Shelf({
    'employee': IdValueDimension(Employee.id, Employee.full_name)
})
The id is accessible as employee_id in each row and their full name is
available as employee.
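To see why the id matters, the following sketch uses Python's built-in sqlite3 module rather than Recipe itself. The employee table, its sales column and its rows are hypothetical. Grouping on the name alone merges two different people called Pat Lee, while grouping on the id as well keeps them apart, which is the behaviour the id/value pairing is meant to give you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, full_name TEXT, sales INTEGER)")
# Hypothetical rows: two different employees share the name "Pat Lee".
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Pat Lee", 10), (2, "Pat Lee", 5), (3, "Sam Roy", 7)],
)

# Grouping on the label alone collapses both Pat Lees into one row.
by_name = conn.execute(
    "SELECT full_name AS employee, sum(sales) AS sales "
    "FROM employee GROUP BY full_name ORDER BY full_name"
).fetchall()

# Grouping on the id as well keeps one row per person.
by_id = conn.execute(
    "SELECT id AS employee_id, full_name AS employee, sum(sales) AS sales "
    "FROM employee GROUP BY id, full_name ORDER BY id"
).fetchall()

print(by_name)
print(by_id)
conn.close()
```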
LookupDimension¶
A LookupDimension maps values in your data to descriptive names. The _id
property of your dimension contains the original value.
# Convert M/F into Male/Female
shelf = Shelf({
    'gender': LookupDimension(Census.sex, {'M': 'Male',
                                           'F': 'Female'}, default='Unknown')
})
If you use the gender dimension, there will be a gender_id in each row
that will be “M” or “F” and a gender in each row that will be “Male” or
“Female”.
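The mapping step itself can be pictured in plain Python. This is only a sketch of the idea, not Recipe's code: apply_lookup is a hypothetical helper that takes a raw value and returns the gender_id and gender pair described above, and it assumes that values missing from the lookup fall back to the default.

```python
from typing import Dict, Tuple

LOOKUP: Dict[str, str] = {"M": "Male", "F": "Female"}


def apply_lookup(raw: str, lookup: Dict[str, str], default: str) -> Tuple[str, str]:
    """Return (original value, descriptive name) for one raw value."""
    return raw, lookup.get(raw, default)


# "X" is not in the lookup, so it receives the default label.
rows = [apply_lookup(value, LOOKUP, default="Unknown") for value in ["M", "F", "X"]]
for gender_id, gender in rows:
    print(gender_id, gender)
```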
Metric¶
DivideMetric¶
WtdAvgMetric¶
SumIfMetric¶
CountIfMetric¶
Filter¶
Having¶
Formatters¶
Building filters¶
Ingredient.build_filter
Storing extra attributes in meta¶
Using Extensions¶
This part of the documentation serves to give you an idea of features that are otherwise hard to extract from the API Documentation.
And now for something completely different.
What are extensions for?
Automatic Filtering¶
AutomaticFilter
Summarizing over Dimensions¶
SummarizeOverRecipe
Merging multiple tables¶
BlendRecipe
Adding comparison data¶
CompareRecipe
Anonymizing data¶
Anonymize
Advanced Features¶
Database connections¶
Caching¶
Running recipes in parallel with RecipePool¶
Now, go check out the API Documentation or begin Recipe Development.