The SQLAlchemy SQL Toolkit and Object Relational Mapper is a comprehensive set of tools for working with databases and Python. It has several distinct areas of functionality which can be used individually or combined together. Its major API components, all public-facing, are illustrated below:
+-----------------------------------------------------------+ | Object Relational Mapper (ORM) | | [tutorial] [docs] | +-----------------------------------------------------------+ +---------+ +------------------------------------+ +--------+ | | | SQL Expression Language | | | | | | [tutorial] [docs] | | | | | +------------------------------------+ | | | +-----------------------+ +--------------+ | | Dialect/Execution | | Schema Management | | [docs] | | [docs] | +---------------------------------+ +-----------------------+ +----------------------+ +----------------------------------+ | Connection Pooling | | Types | | [docs] | | [docs] | +----------------------+ +----------------------------------+
Above, the two most significant front-facing portions of SQLAlchemy are the Object Relational Mapper and the SQL Expression Language. These are two separate toolkits, one building off the other. SQL Expressions can be used independently of the ORM. When using the ORM, the SQL Expression language is used to establish object-relational configurations as well as in querying.
back to section topMetaData
and Table
objects; reading database schemas into your application, creating and dropping tables, constraints, defaults, sequences, indexes.
Installing SQLAlchemy from scratch is most easily achieved with setuptools. (setuptools installation). Just run this from the command-line:
# easy_install SQLAlchemy
This command will download the latest version of SQLAlchemy from the Python Cheese Shop and install it to your system.
Otherwise, you can install from the distribution using the setup.py
script:
# python setup.py install
SQLAlchemy is designed to operate with a DB-API implementation built for a particular database, and includes support for the most popular databases:
This documentation covers SQLAlchemy version 0.5. If you're working on a system that already has SQLAlchemy installed, check the version from your Python prompt like this:
>>> import sqlalchemy >>> sqlalchemy.__version__ 0.5.0
Notes on what's changed from 0.4 to 0.5 is available on the SQLAlchemy wiki at 05Migration.
back to section topIn this tutorial we will cover a basic SQLAlchemy object-relational mapping scenario, where we store and retrieve Python objects from a database representation. The tutorial is in doctest format, meaning each >>>
line represents something you can type at a Python command prompt, and the following text represents the expected return value.
A quick check to verify that we are on at least version 0.5 of SQLAlchemy:
>>> import sqlalchemy >>> sqlalchemy.__version__ 0.5.0
For this tutorial we will use an in-memory-only SQLite database. To connect we use create_engine()
:
>>> from sqlalchemy import create_engine >>> engine = create_engine('sqlite:///:memory:', echo=True)
The echo
flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python's standard logging
module. With it enabled, we'll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False
. This tutorial will format the SQL behind a popup window so it doesn't get in our way; just click the "SQL" links to see whats being generated.
Next we want to tell SQLAlchemy about our tables. We will start with just a single table called users
, which will store records for the end-users using our application (lets assume it's a website). We define our tables within a catalog called MetaData
, using the Table
construct, which is used in a manner similar to SQL's CREATE TABLE syntax:
>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey >>> metadata = MetaData() >>> users_table = Table('users', metadata, ... Column('id', Integer, primary_key=True), ... Column('name', String), ... Column('fullname', String), ... Column('password', String) ... )
All about how to define Table
objects, as well as how to load their definition from an existing database (known as reflection), is described in Database Meta Data.
Next, we can issue CREATE TABLE statements derived from our table metadata, by calling create_all()
and passing it the engine
instance which points to our database. This will check for the presence of a table first before creating, so it's safe to call multiple times:
sql>>> metadata.create_all(engine)
Users familiar with the syntax of CREATE TABLE may notice that the VARCHAR columns were generated without a length; on SQLite, this is a valid datatype, but on most databases it's not allowed. So if running this tutorial on a database such as Postgres or MySQL, and you wish to use SQLAlchemy to generate the tables, a "length" may be provided to the String
type as below:
Column('name', String(50))
The length field on String
, as well as similar precision/scale fields available on Integer
, Numeric
, etc. are not referenced by SQLAlchemy other than when creating tables.
While the Table
object defines information about our database, it does not say anything about the definition or behavior of the business objects used by our application; SQLAlchemy views this as a separate concern. To correspond to our users
table, let's create a rudimentary User
class. It only need subclass Python's built-in object
class (i.e. it's a new style class):
>>> class User(object): ... def __init__(self, name, fullname, password): ... self.name = name ... self.fullname = fullname ... self.password = password ... ... def __repr__(self): ... return "<User('%s','%s', '%s')>" % (self.name, self.fullname, self.password)
The class has an __init__()
and a __repr__()
method for convenience. These methods are both entirely optional, and can be of any form. SQLAlchemy never calls __init__()
directly.
With our users_table
and User
class, we now want to map the two together. That's where the SQLAlchemy ORM package comes in. We'll use the mapper
function to create a mapping between users_table
and User
:
>>> from sqlalchemy.orm import mapper >>> mapper(User, users_table) <Mapper at 0x...; User>
The mapper()
function creates a new Mapper
object and stores it away for future reference, associated with our class. Let's now create and inspect a User
object:
>>> ed_user = User('ed', 'Ed Jones', 'edspassword') >>> ed_user.name 'ed' >>> ed_user.password 'edspassword' >>> str(ed_user.id) 'None'
The id
attribute, which while not defined by our __init__()
method, exists due to the id
column present within the users_table
object. By default, the mapper
creates class attributes for all columns present within the Table
. These class attributes exist as Python descriptors, and define instrumentation for the mapped class. The functionality of this instrumentation is very rich and includes the ability to track modifications and automatically load new data from the database when needed.
Since we have not yet told SQLAlchemy to persist Ed Jones
within the database, its id is None
. When we persist the object later, this attribute will be populated with a newly generated value.
The preceding approach to configuration involving a Table
, user-defined class, and mapper()
call illustrate classical SQLAlchemy usage, which values the highest separation of concerns possible. A large number of applications don't require this degree of separation, and for those SQLAlchemy offers an alternate "shorthand" configurational style called declarative. For many applications, this is the only style of configuration needed. Our above example using this style is as follows:
>>> from sqlalchemy.ext.declarative import declarative_base >>> Base = declarative_base() >>> class User(Base): ... __tablename__ = 'users' ... ... id = Column(Integer, primary_key=True) ... name = Column(String) ... fullname = Column(String) ... password = Column(String) ... ... def __init__(self, name, fullname, password): ... self.name = name ... self.fullname = fullname ... self.password = password ... ... def __repr__(self): ... return "<User('%s','%s', '%s')>" % (self.name, self.fullname, self.password)
Above, the declarative_base()
function defines a new class which we name Base
, from which all of our ORM-enabled classes will derive. Note that we define Column
objects with no "name" field, since it's inferred from the given attribute name.
The underlying Table
object created by our declarative_base()
version of User
is accessible via the __table__
attribute:
>>> users_table = User.__table__
and the owning MetaData
object is available as well:
>>> metadata = Base.metadata
Yet another "declarative" method is available for SQLAlchemy as a third party library called Elixir. This is a full-featured configurational product which also includes many higher level mapping configurations built in. Like declarative, once classes and mappings are defined, ORM usage is the same as with a classical SQLAlchemy configuration.
back to section topWe're now ready to start talking to the database. The ORM's "handle" to the database is the Session
. When we first set up the application, at the same level as our create_engine()
statement, we define a Session
class which will serve as a factory for new Session
objects:
>>> from sqlalchemy.orm import sessionmaker >>> Session = sessionmaker(bind=engine)
In the case where your application does not yet have an Engine
when you define your module-level objects, just set it up like this:
>>> Session = sessionmaker()
Later, when you create your engine with create_engine()
, connect it to the Session
using configure()
:
>>> Session.configure(bind=engine) # once engine is available
This custom-made Session
class will create new Session
objects which are bound to our database. Other transactional characteristics may be defined when calling sessionmaker()
as well; these are described in a later chapter. Then, whenever you need to have a conversation with the database, you instantiate a Session
:
>>> session = Session()
The above Session
is associated with our SQLite engine
, but it hasn't opened any connections yet. When it's first used, it retrieves a connection from a pool of connections maintained by the engine
, and holds onto it until we commit all changes and/or close the session object.
To persist our User
object, we add()
it to our Session
:
>>> ed_user = User('ed', 'Ed Jones', 'edspassword') >>> session.add(ed_user)
At this point, the instance is pending; no SQL has yet been issued. The Session
will issue the SQL to persist Ed Jones
as soon as is needed, using a process known as a flush. If we query the database for Ed Jones
, all pending information will first be flushed, and the query is issued afterwards.
For example, below we create a new Query
object which loads instances of User
. We "filter by" the name
attribute of ed
, and indicate that we'd like only the first result in the full list of rows. A User
instance is returned which is equivalent to that which we've added:
sql>>> our_user = session.query(User).filter_by(name='ed').first()
>>> our_user <User('ed','Ed Jones', 'edspassword')>
In fact, the Session
has identified that the row returned is the same row as one already represented within its internal map of objects, so we actually got back the identical instance as that which we just added:
>>> ed_user is our_user True
The ORM concept at work here is known as an identity map and ensures that all operations upon a particular row within a Session
operate upon the same set of data. Once an object with a particular primary key is present in the Session
, all SQL queries on that Session
will always return the same Python object for that particular primary key; it also will raise an error if an attempt is made to place a second, already-persisted object with the same primary key within the session.
We can add more User
objects at once using add_all()
:
>>> session.add_all([ ... User('wendy', 'Wendy Williams', 'foobar'), ... User('mary', 'Mary Contrary', 'xxg527'), ... User('fred', 'Fred Flinstone', 'blah')])
Also, Ed has already decided his password isn't too secure, so lets change it:
>>> ed_user.password = 'f8s7ccs'
The Session
is paying attention. It knows, for example, that Ed Jones
has been modified:
>>> session.dirty IdentitySet([<User('ed','Ed Jones', 'f8s7ccs')>])
and that three new User
objects are pending:
>>> session.new IdentitySet([<User('wendy','Wendy Williams', 'foobar')>, <User('mary','Mary Contrary', 'xxg527')>, <User('fred','Fred Flinstone', 'blah')>])
We tell the Session
that we'd like to issue all remaining changes to the database and commit the transaction, which has been in progress throughout. We do this via commit()
:
sql>>> session.commit()
commit()
flushes whatever remaining changes remain to the database, and commits the transaction. The connection resources referenced by the session are now returned to the connection pool. Subsequent operations with this session will occur in a new transaction, which will again re-acquire connection resources when first needed.
If we look at Ed's id
attribute, which earlier was None
, it now has a value:
sql>>> ed_user.id
1
After the Session
inserts new rows in the database, all newly generated identifiers and database-generated defaults become available on the instance, either immediately or via load-on-first-access. In this case, the entire row was re-loaded on access because a new transaction was begun after we issued commit()
. SQLAlchemy by default refreshes data from a previous transaction the first time it's accessed within a new transaction, so that the most recent state is available. The level of reloading is configurable as is described in the chapter on Sessions.
Since the Session
works within a transaction, we can roll back changes made too. Let's make two changes that we'll revert; ed_user
's user name gets set to Edwardo
:
>>> ed_user.name = 'Edwardo'
and we'll add another erroneous user, fake_user
:
>>> fake_user = User('fakeuser', 'Invalid', '12345') >>> session.add(fake_user)
Querying the session, we can see that they're flushed into the current transaction:
sql>>> session.query(User).filter(User.name.in_(['Edwardo', 'fakeuser'])).all()
[<User('Edwardo','Ed Jones', 'f8s7ccs')>, <User('fakeuser','Invalid', '12345')>]
Rolling back, we can see that ed_user
's name is back to ed
, and fake_user
has been kicked out of the session:
issuing a SELECT illustrates the changes made to the database:
sql>>> session.query(User).filter(User.name.in_(['ed', 'fakeuser'])).all()
[<User('ed','Ed Jones', 'f8s7ccs')>]
A Query
is created using the query()
function on Session
. This function takes a variable number of arguments, which can be any combination of classes and class-instrumented descriptors. Below, we indicate a Query
which loads User
instances. When evaluated in an iterative context, the list of User
objects present is returned:
sql>>> for instance in session.query(User).order_by(User.id): ... print instance.name, instance.fullname
ed Ed Jones wendy Wendy Williams mary Mary Contrary fred Fred Flinstone
The Query
also accepts ORM-instrumented descriptors as arguments. Any time multiple class entities or column-based entities are expressed as arguments to the query()
function, the return result is expressed as tuples:
sql>>> for name, fullname in session.query(User.name, User.fullname): ... print name, fullname
ed Ed Jones wendy Wendy Williams mary Mary Contrary fred Fred Flinstone
The tuples returned by Query
are named tuples, and can be treated much like an ordinary Python object. The names are the same as the attribute's name for an attribute, and the class name for a class:
sql>>> for row in session.query(User, User.name).all(): ... print row.User, row.name
<User('ed','Ed Jones', 'f8s7ccs')> ed <User('wendy','Wendy Williams', 'foobar')> wendy <User('mary','Mary Contrary', 'xxg527')> mary <User('fred','Fred Flinstone', 'blah')> fred
You can control the names using the label()
construct for scalar attributes and aliased()
for class constructs:
>>> from sqlalchemy.orm import aliased >>> user_alias = aliased(User, name='user_alias') sql>>> for row in session.query(user_alias, user_alias.name.label('name_label')).all(): ... print row.user_alias, row.name_label
Basic operations with Query
include issuing LIMIT and OFFSET, most conveniently using Python array slices and typically in conjunction with ORDER BY:
sql>>> for u in session.query(User).order_by(User.id)[1:3]: ... print u
<User('wendy','Wendy Williams', 'foobar')> <User('mary','Mary Contrary', 'xxg527')>
and filtering results, which is accomplished either with filter_by()
, which uses keyword arguments:
sql>>> for name, in session.query(User.name).filter_by(fullname='Ed Jones'): ... print name
ed
...or filter()
, which uses more flexible SQL expression language constructs. These allow you to use regular Python operators with the class-level attributes on your mapped class:
sql>>> for name, in session.query(User.name).filter(User.fullname=='Ed Jones'): ... print name
ed
The Query
object is fully generative, meaning that most method calls return a new Query
object upon which further criteria may be added. For example, to query for users named "ed" with a full name of "Ed Jones", you can call filter()
twice, which joins criteria using AND
:
sql>>> for user in session.query(User).filter(User.name=='ed').filter(User.fullname=='Ed Jones'): ... print user
<User('ed','Ed Jones', 'f8s7ccs')>
Here's a rundown of some of the most common operators used in filter()
:
equals
query.filter(User.name == 'ed')
not equals
query.filter(User.name != 'ed')
LIKE
query.filter(User.name.like('%ed%'))
IN
query.filter(User.name.in_(['ed', 'wendy', 'jack']))
IS NULL
filter(User.name == None)
AND
from sqlalchemy import and_ filter(and_(User.name == 'ed', User.fullname == 'Ed Jones')) # or call filter()/filter_by() multiple times filter(User.name == 'ed').filter(User.fullname == 'Ed Jones')
OR
from sqlalchemy import or_ filter(or_(User.name == 'ed', User.name == 'wendy'))
match
query.filter(User.name.match('wendy'))
The contents of the match parameter are database backend specific.
The all()
, one()
, and first()
methods of Query
immediately issue SQL and return a non-iterator value. all()
returns a list:
>>> query = session.query(User).filter(User.name.like('%ed')).order_by(User.id) sql>>> query.all()
[<User('ed','Ed Jones', 'f8s7ccs')>, <User('fred','Fred Flinstone', 'blah')>]
first()
applies a limit of one and returns the first result as a scalar:
sql>>> query.first()
<User('ed','Ed Jones', 'f8s7ccs')>
one()
, applies a limit of two, and if not exactly one row returned, raises an error:
sql>>> try: ... user = query.one() ... except Exception, e: ... print e
Multiple rows were found for one() sql>>> try: ... user = query.filter(User.id == 99).one() ... except Exception, e: ... print e
No row was found for one()
Literal strings can be used flexibly with Query
. Most methods accept strings in addition to SQLAlchemy clause constructs. For example, filter()
and order_by()
:
sql>>> for user in session.query(User).filter("id<224").order_by("id").all(): ... print user.name
ed wendy mary fred
Bind parameters can be specified with string-based SQL, using a colon. To specify the values, use the params()
method:
sql>>> session.query(User).filter("id<:value and name=:name").\ ... params(value=224, name='fred').order_by(User.id).one()
<User('fred','Fred Flinstone', 'blah')>
To use an entirely string-based statement, using from_statement()
; just ensure that the columns clause of the statement contains the column names normally used by the mapper (below illustrated using an asterisk):
sql>>> session.query(User).from_statement("SELECT * FROM users where name=:name").params(name='ed').all()
[<User('ed','Ed Jones', 'f8s7ccs')>]
Now let's consider a second table to be dealt with. Users in our system also can store any number of email addresses associated with their username. This implies a basic one to many association from the users_table
to a new table which stores email addresses, which we will call addresses
. Using declarative, we define this table along with its mapped class, Address
:
>>> from sqlalchemy import ForeignKey >>> from sqlalchemy.orm import relation, backref >>> class Address(Base): ... __tablename__ = 'addresses' ... id = Column(Integer, primary_key=True) ... email_address = Column(String, nullable=False) ... user_id = Column(Integer, ForeignKey('users.id')) ... ... user = relation(User, backref=backref('addresses', order_by=id)) ... ... def __init__(self, email_address): ... self.email_address = email_address ... ... def __repr__(self): ... return "<Address('%s')>" % self.email_address
The above class introduces a foreign key constraint which references the users
table. This defines for SQLAlchemy the relationship between the two tables at the database level. The relationship between the User
and Address
classes is defined separately using the relation()
function, which defines an attribute user
to be placed on the Address
class, as well as an addresses
collection to be placed on the User
class. Such a relation is known as a bidirectional relationship. Because of the placement of the foreign key, from Address
to User
it is many to one, and from User
to Address
it is one to many. SQLAlchemy is automatically aware of many-to-one/one-to-many based on foreign keys.
The relation()
function is extremely flexible, and could just have easily been defined on the User
class:
class User(Base): .... addresses = relation(Address, order_by=Address.id, backref="user")
We are also free to not define a backref, and to define the relation()
only on one class and not the other. It is also possible to define two separate relation()
s for either direction, which is generally safe for many-to-one and one-to-many relations, but not for many-to-many relations.
When using the declarative
extension, relation()
gives us the option to use strings for most arguments that concern the target class, in the case that the target class has not yet been defined. This only works in conjunction with declarative
:
class User(Base): .... addresses = relation("Address", order_by="Address.id", backref="user")
When declarative
is not in use, you typically define your mapper()
well after the target classes and Table
objects have been defined, so string expressions are not needed.
We'll need to create the addresses
table in the database, so we will issue another CREATE from our metadata, which will skip over tables which have already been created:
sql>>> metadata.create_all(engine)
Now when we create a User
, a blank addresses
collection will be present. By default, the collection is a Python list. Other collection types, such as sets and dictionaries, are available as well:
>>> jack = User('jack', 'Jack Bean', 'gjffdd') >>> jack.addresses []
We are free to add Address
objects on our User
object. In this case we just assign a full list directly:
>>> jack.addresses = [Address(email_address='jack@google.com'), Address(email_address='j25@yahoo.com')]
When using a bidirectional relationship, elements added in one direction automatically become visible in the other direction. This is the basic behavior of the backref keyword, which maintains the relationship purely in memory, without using any SQL:
>>> jack.addresses[1] <Address('j25@yahoo.com')> >>> jack.addresses[1].user <User('jack','Jack Bean', 'gjffdd')>
Let's add and commit Jack Bean
to the database. jack
as well as the two Address
members in his addresses
collection are both added to the session at once, using a process known as cascading:
>>> session.add(jack) sql>>> session.commit()
Querying for Jack, we get just Jack back. No SQL is yet issued for Jack's addresses:
sql>>> jack = session.query(User).filter_by(name='jack').one()
>>> jack <User('jack','Jack Bean', 'gjffdd')>
Let's look at the addresses
collection. Watch the SQL:
sql>>> jack.addresses
[<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
When we accessed the addresses
collection, SQL was suddenly issued. This is an example of a lazy loading relation. The addresses
collection is now loaded and behaves just like an ordinary list.
If you want to reduce the number of queries (dramatically, in many cases), we can apply an eager load to the query operation. With the same query, we may apply an option to the query, indicating that we'd like addresses
to load "eagerly". SQLAlchemy then constructs an outer join between the users
and addresses
tables, and loads them at once, populating the addresses
collection on each User
object if it's not already populated:
>>> from sqlalchemy.orm import eagerload sql>>> jack = session.query(User).options(eagerload('addresses')).filter_by(name='jack').one()
>>> jack <User('jack','Jack Bean', 'gjffdd')> >>> jack.addresses [<Address('jack@google.com')>, <Address('j25@yahoo.com')>]
SQLAlchemy has the ability to control exactly which attributes and how many levels deep should be joined together in a single SQL query. More information on this feature is available in Relation Configuration.
back to section topWhile the eager load created a JOIN specifically to populate a collection, we can also work explicitly with joins in many ways. For example, to construct a simple inner join between User
and Address
, we can just filter()
their related columns together. Below we load the User
and Address
entities at once using this method:
sql>>> for u, a in session.query(User, Address).filter(User.id==Address.user_id).\ ... filter(Address.email_address=='jack@google.com').all(): ... print u, a
<User('jack','Jack Bean', 'gjffdd')> <Address('jack@google.com')>
Or we can make a real JOIN construct; one way to do so is to use the ORM join()
function, and tell Query
to "select from" this join:
>>> from sqlalchemy.orm import join sql>>> session.query(User).select_from(join(User, Address)).\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
join()
knows how to join between User
and Address
because there's only one foreign key between them. If there were no foreign keys, or several, join()
would require a third argument indicating the ON clause of the join, in one of the following forms:
join(User, Address, User.id==Address.user_id) # explicit condition join(User, Address, User.addresses) # specify relation from left to right join(User, Address, 'addresses') # same, using a string
The functionality of join()
is also available generatively from Query
itself using Query.join
. This is most easily used with just the "ON" clause portion of the join, such as:
sql>>> session.query(User).join(User.addresses).\ ... filter(Address.email_address=='jack@google.com').all()
[<User('jack','Jack Bean', 'gjffdd')>]
To explicitly specify the target of the join, use tuples to form an argument list similar to the standalone join. This becomes more important when using aliases and similar constructs:
session.query(User).join((Address, User.addresses))
Multiple joins can be created by passing a list of arguments:
session.query(Foo).join(Foo.bars, Bar.bats, (Bat, 'widgets'))
The above would produce SQL something like foo JOIN bars ON <onclause> JOIN bats ON <onclause> JOIN widgets ON <onclause>
.
When querying across multiple tables, if the same table needs to be referenced more than once, SQL typically requires that the table be aliased with another name, so that it can be distinguished against other occurences of that table. The Query
supports this most expicitly using the aliased
construct. Below we join to the Address
entity twice, to locate a user who has two distinct email addresses at the same time:
>>> from sqlalchemy.orm import aliased >>> adalias1 = aliased(Address) >>> adalias2 = aliased(Address) sql>>> for username, email1, email2 in session.query(User.name, adalias1.email_address, adalias2.email_address).\ ... join((adalias1, User.addresses), (adalias2, User.addresses)).\ ... filter(adalias1.email_address=='jack@google.com').\ ... filter(adalias2.email_address=='j25@yahoo.com'): ... print username, email1, email2
jack jack@google.com j25@yahoo.com
The Query
is suitable for generating statements which can be used as subqueries. Suppose we wanted to load User
objects along with a count of how many Address
records each user has. The best way to generate SQL like this is to get the count of addresses grouped by user ids, and JOIN to the parent. In this case we use a LEFT OUTER JOIN so that we get rows back for those users who don't have any addresses, e.g.:
SELECT users.*, adr_count.address_count FROM users LEFT OUTER JOIN (SELECT user_id, count(*) AS address_count FROM addresses GROUP BY user_id) AS adr_count ON users.id=adr_count.user_id
Using the Query
, we build a statement like this from the inside out. The statement
accessor returns a SQL expression representing the statement generated by a particular Query
- this is an instance of a select()
construct, which are described in SQL Expression Language Tutorial:
>>> from sqlalchemy.sql import func >>> stmt = session.query(Address.user_id, func.count('*').label('address_count')).group_by(Address.user_id).subquery()
The func
keyword generates SQL functions, and the subquery()
method on Query
produces a SQL expression construct representing a SELECT statement embedded within an alias (it's actually shorthand for query.statement.alias()
).
Once we have our statement, it behaves like a Table
construct, such as the one we created for users
at the start of this tutorial. The columns on the statement are accessible through an attribute called c
:
sql>>> for u, count in session.query(User, stmt.c.address_count).\ ... outerjoin((stmt, User.id==stmt.c.user_id)).order_by(User.id): ... print u, count
<User('ed','Ed Jones', 'f8s7ccs')> None <User('wendy','Wendy Williams', 'foobar')> None <User('mary','Mary Contrary', 'xxg527')> None <User('fred','Fred Flinstone', 'blah')> None <User('jack','Jack Bean', 'gjffdd')> 2
The EXISTS keyword in SQL is a boolean operator which returns True if the given expression contains any rows. It may be used in many scenarios in place of joins, and is also useful for locating rows which do not have a corresponding row in a related table.
There is an explicit EXISTS construct, which looks like this:
>>> from sqlalchemy.sql import exists >>> stmt = exists().where(Address.user_id==User.id) sql>>> for name, in session.query(User.name).filter(stmt): ... print name
jack
The Query
features several operators which make usage of EXISTS automatically. Above, the statement can be expressed along the User.addresses
relation using any()
:
sql>>> for name, in session.query(User.name).filter(User.addresses.any()): ... print name
jack
any()
takes criterion as well, to limit the rows matched:
sql>>> for name, in session.query(User.name).filter(User.addresses.any(Address.email_address.like('%google%'))): ... print name
jack
has()
is the same operator as any()
for many-to-one relations (note the ~
operator here too, which means "NOT"):
sql>>> session.query(Address).filter(~Address.user.has(User.name=='jack')).all()
[]
Here's all the operators which build on relations:
equals (used for many-to-one)
query.filter(Address.user == someuser)
not equals (used for many-to-one)
query.filter(Address.user != someuser)
IS NULL (used for many-to-one)
query.filter(Address.user == None)
contains (used for one-to-many and many-to-many collections)
query.filter(User.addresses.contains(someaddress))
any (used for one-to-many and many-to-many collections)
query.filter(User.addresses.any(Address.email_address == 'bar')) # also takes keyword arguments: query.filter(User.addresses.any(email_address='bar'))
has (used for many-to-one)
query.filter(Address.user.has(name='ed'))
with_parent (used for any relation)
session.query(Address).with_parent(someuser, 'addresses')
Let's try to delete jack
and see how that goes. We'll mark as deleted in the session, then we'll issue a count
query to see that no rows remain:
>>> session.delete(jack) sql>>> session.query(User).filter_by(name='jack').count()
0
So far, so good. How about Jack's Address
objects ?
sql>>> session.query(Address).filter( ... Address.email_address.in_(['jack@google.com', 'j25@yahoo.com']) ... ).count()
2
Uh oh, they're still there ! Analyzing the flush SQL, we can see that the user_id
column of each address was set to NULL, but the rows weren't deleted. SQLAlchemy doesn't assume that deletes cascade, you have to tell it to do so.
We will configure cascade options on the User.addresses
relation to change the behavior. While SQLAlchemy allows you to add new attributes and relations to mappings at any point in time, in this case the existing relation needs to be removed, so we need to tear down the mappings completely and start again. This is not a typical operation and is here just for illustrative purposes.
Removing all ORM state is as follows:
>>> session.close() # roll back and close the transaction >>> from sqlalchemy.orm import clear_mappers >>> clear_mappers() # clear mappers
Below, we use mapper()
to reconfigure an ORM mapping for User
and Address
, on our existing but currently un-mapped classes. The User.addresses
relation now has delete, delete-orphan
cascade on it, which indicates that DELETE operations will cascade to attached Address
objects as well as Address
objects which are removed from their parent:
>>> mapper(User, users_table, properties={ ... 'addresses':relation(Address, backref='user', cascade="all, delete, delete-orphan") ... }) <Mapper at 0x...; User> >>> addresses_table = Address.__table__ >>> mapper(Address, addresses_table) <Mapper at 0x...; Address>
Now when we load Jack (below using get()
, which loads by primary key), removing an address from his addresses
collection will result in that Address
being deleted:
# load Jack by primary key sql>>> jack = session.query(User).get(5)
# remove one Address (lazy load fires off) sql>>> del jack.addresses[1]
# only one address remains sql>>> session.query(Address).filter( ... Address.email_address.in_(['jack@google.com', 'j25@yahoo.com']) ... ).count()
1
Deleting Jack will delete both Jack and his remaining Address
:
>>> session.delete(jack) sql>>> session.query(User).filter_by(name='jack').count()
0 sql>>> session.query(Address).filter( ... Address.email_address.in_(['jack@google.com', 'j25@yahoo.com']) ... ).count()
0
We're moving into the bonus round here, but lets show off a many-to-many relationship. We'll sneak in some other features too, just to take a tour. We'll make our application a blog application, where users can write BlogPost
s, which have Keywords
associated with them.
The declarative setup is as follows:
>>> from sqlalchemy import Text >>> # association table >>> post_keywords = Table('post_keywords', metadata, ... Column('post_id', Integer, ForeignKey('posts.id')), ... Column('keyword_id', Integer, ForeignKey('keywords.id')) ... ) >>> class BlogPost(Base): ... __tablename__ = 'posts' ... ... id = Column(Integer, primary_key=True) ... user_id = Column(Integer, ForeignKey('users.id')) ... headline = Column(String(255), nullable=False) ... body = Column(Text) ... ... # many to many BlogPost<->Keyword ... keywords = relation('Keyword', secondary=post_keywords, backref='posts') ... ... def __init__(self, headline, body, author): ... self.author = author ... self.headline = headline ... self.body = body ... ... def __repr__(self): ... return "BlogPost(%r, %r, %r)" % (self.headline, self.body, self.author) >>> class Keyword(Base): ... __tablename__ = 'keywords' ... ... id = Column(Integer, primary_key=True) ... keyword = Column(String(50), nullable=False, unique=True) ... ... def __init__(self, keyword): ... self.keyword = keyword
Above, the many-to-many relation above is BlogPost.keywords
. The defining feature of a many to many relation is the secondary
keyword argument which references a Table
object representing the association table. This table only contains columns which reference the two sides of the relation; if it has any other columns, such as its own primary key, or foreign keys to other tables, SQLAlchemy requires a different usage pattern called the "association object", described at Association Object.
The many-to-many relation is also bi-directional using the backref
keyword. This is the one case where usage of backref
is generally required, since if a separate posts
relation were added to the Keyword
entity, both relations would independently add and remove rows from the post_keywords
table and produce conflicts.
We would also like our BlogPost
class to have an author
field. We will add this as another bidirectional relationship, except one issue we'll have is that a single user might have lots of blog posts. When we access User.posts
, we'd like to be able to filter results further so as not to load the entire collection. For this we use a setting accepted by relation()
called lazy='dynamic'
, which configures an alternate loader strategy on the attribute. To use it on the "reverse" side of a relation()
, we use the backref()
function:
>>> from sqlalchemy.orm import backref >>> # "dynamic" loading relation to User >>> BlogPost.author = relation(User, backref=backref('posts', lazy='dynamic'))
Create new tables:
sql>>> metadata.create_all(engine)
Usage is not too different from what we've been doing. Let's give Wendy some blog posts:
sql>>> wendy = session.query(User).filter_by(name='wendy').one()
>>> post = BlogPost("Wendy's Blog Post", "This is a test", wendy) >>> session.add(post)
We're storing keywords uniquely in the database, but we know that we don't have any yet, so we can just create them:
>>> post.keywords.append(Keyword('wendy')) >>> post.keywords.append(Keyword('firstpost'))
We can now look up all blog posts with the keyword 'firstpost'. We'll use the any
operator to locate "blog posts where any of its keywords has the keyword string 'firstpost'":
sql>>> session.query(BlogPost).filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]
If we want to look up just Wendy's posts, we can tell the query to narrow down to her as a parent:
sql>>> session.query(BlogPost).filter(BlogPost.author==wendy).\ ... filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]
Or we can use Wendy's own posts
relation, which is a "dynamic" relation, to query straight from there:
sql>>> wendy.posts.filter(BlogPost.keywords.any(keyword='firstpost')).all()
[BlogPost("Wendy's Blog Post", 'This is a test', <User('wendy','Wendy Williams', 'foobar')>)]
Generated Documentation for Query: class Query(object)
ORM Generated Docs: module sqlalchemy.orm
Further information on mapping setups are in Mapper Configuration.
Further information on working with Sessions: Using the Session.
back to section topThis tutorial will cover SQLAlchemy SQL Expressions, which are Python constructs that represent SQL statements. The tutorial is in doctest format, meaning each >>>
line represents something you can type at a Python command prompt, and the following text represents the expected return value. The tutorial has no prerequisites.
A quick check to verify that we are on at least version 0.5 of SQLAlchemy:
>>> import sqlalchemy >>> sqlalchemy.__version__ 0.5.0
For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine()
:
>>> from sqlalchemy import create_engine >>> engine = create_engine('sqlite:///:memory:', echo=True)
The echo
flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python's standard logging
module. With it enabled, we'll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False
. This tutorial will format the SQL behind a popup window so it doesn't get in our way; just click the "SQL" links to see whats being generated.
The SQL Expression Language constructs its expressions in most cases against table columns. In SQLAlchemy, a column is most often represented by an object called Column
, and in all cases a Column
is associated with a Table
. A collection of Table
objects and their associated child objects is referred to as database metadata. In this tutorial we will explicitly lay out several Table
objects, but note that SA can also "import" whole sets of Table
objects automatically from an existing database (this process is called table reflection).
We define our tables all within a catalog called MetaData
, using the Table
construct, which resembles regular SQL CREATE TABLE statements. We'll make two tables, one of which represents "users" in an application, and another which represents zero or more "email addreses" for each row in the "users" table:
>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey >>> metadata = MetaData() >>> users = Table('users', metadata, ... Column('id', Integer, primary_key=True), ... Column('name', String), ... Column('fullname', String), ... ) >>> addresses = Table('addresses', metadata, ... Column('id', Integer, primary_key=True), ... Column('user_id', None, ForeignKey('users.id')), ... Column('email_address', String, nullable=False) ... )
All about how to define Table
objects, as well as how to create them from an existing database automatically, is described in Database Meta Data.
Next, to tell the MetaData
we'd actually like to create our selection of tables for real inside the SQLite database, we use create_all()
, passing it the engine
instance which points to our database. This will check for the presence of each table first before creating, so it's safe to call multiple times:
sql>>> metadata.create_all(engine)
Users familiar with the syntax of CREATE TABLE may notice that the VARCHAR columns were generated without a length; on SQLite, this is a valid datatype, but on most databases it's not allowed. So if running this tutorial on a database such as Postgres or MySQL, and you wish to use SQLAlchemy to generate the tables, a "length" may be provided to the String
type as below:
Column('name', String(50))
The length field on String
, as well as similar fields available on Integer
, Numeric
, etc. are not referenced by SQLAlchemy other than when creating tables.
The first SQL expression we'll create is the Insert
construct, which represents an INSERT statement. This is typically created relative to its target table:
>>> ins = users.insert()
To see a sample of the SQL this construct produces, use the str()
function:
>>> str(ins) 'INSERT INTO users (id, name, fullname) VALUES (:id, :name, :fullname)'
Notice above that the INSERT statement names every column in the users
table. This can be limited by using the values
keyword, which establishes the VALUES clause of the INSERT explicitly:
>>> ins = users.insert(values={'name':'jack', 'fullname':'Jack Jones'}) >>> str(ins) 'INSERT INTO users (name, fullname) VALUES (:name, :fullname)'
Above, while the values
keyword limited the VALUES clause to just two columns, the actual data we placed in values
didn't get rendered into the string; instead we got named bind parameters. As it turns out, our data is stored within our Insert
construct, but it typically only comes out when the statement is actually executed; since the data consists of literal values, SQLAlchemy automatically generates bind parameters for them. We can peek at this data for now by looking at the compiled form of the statement:
>>> ins.compile().params {'fullname': 'Jack Jones', 'name': 'jack'}
The interesting part of an Insert
is executing it. In this tutorial, we will generally focus on the most explicit method of executing a SQL construct, and later touch upon some "shortcut" ways to do it. The engine
object we created is a repository for database connections capable of issuing SQL to the database. To acquire a connection, we use the connect()
method:
>>> conn = engine.connect() >>> conn <sqlalchemy.engine.base.Connection object at 0x...>
The Connection
object represents an actively checked out DBAPI connection resource. Lets feed it our Insert
object and see what happens:
>>> result = conn.execute(ins)
So the INSERT statement was now issued to the database. Although we got positional "qmark" bind parameters instead of "named" bind parameters in the output. How come ? Because when executed, the Connection
used the SQLite dialect to help generate the statement; when we use the str()
function, the statement isn't aware of this dialect, and falls back onto a default which uses named parameters. We can view this manually as follows:
>>> ins.bind = engine >>> str(ins) 'INSERT INTO users (name, fullname) VALUES (?, ?)'
What about the result
variable we got when we called execute()
? As the SQLAlchemy Connection
object references a DBAPI connection, the result, known as a ResultProxy
object, is analogous to the DBAPI cursor object. In the case of an INSERT, we can get important information from it, such as the primary key values which were generated from our statement:
>>> result.last_inserted_ids() [1]
The value of 1
was automatically generated by SQLite, but only because we did not specify the id
column in our Insert
statement; otherwise, our explicit value would have been used. In either case, SQLAlchemy always knows how to get at a newly generated primary key value, even though the method of generating them is different across different databases; each databases' Dialect
knows the specific steps needed to determine the correct value (or values; note that last_inserted_ids()
returns a list so that it supports composite primary keys).
Our insert example above was intentionally a little drawn out to show some various behaviors of expression language constructs. In the usual case, an Insert
statement is usually compiled against the parameters sent to the execute()
method on Connection
, so that there's no need to use the values
keyword with Insert
. Lets create a generic Insert
statement again and use it in the "normal" way:
>>> ins = users.insert() >>> conn.execute(ins, id=2, name='wendy', fullname='Wendy Williams')
<sqlalchemy.engine.base.ResultProxy object at 0x...>
Above, because we specified all three columns in the the execute()
method, the compiled Insert
included all three columns. The Insert
statement is compiled at execution time based on the parameters we specified; if we specified fewer parameters, the Insert
would have fewer entries in its VALUES clause.
To issue many inserts using DBAPI's executemany()
method, we can send in a list of dictionaries each containing a distinct set of parameters to be inserted, as we do here to add some email addresses:
>>> conn.execute(addresses.insert(), [ ... {'user_id': 1, 'email_address' : 'jack@yahoo.com'}, ... {'user_id': 1, 'email_address' : 'jack@msn.com'}, ... {'user_id': 2, 'email_address' : 'www@www.org'}, ... {'user_id': 2, 'email_address' : 'wendy@aol.com'}, ... ])
<sqlalchemy.engine.base.ResultProxy object at 0x...>
Above, we again relied upon SQLite's automatic generation of primary key identifiers for each addresses
row.
When executing multiple sets of parameters, each dictionary must have the same set of keys; i.e. you cant have fewer keys in some dictionaries than others. This is because the Insert
statement is compiled against the first dictionary in the list, and it's assumed that all subsequent argument dictionaries are compatible with that statement.
We're executing our Insert
using a Connection
. There's two options that allow you to not have to deal with the connection part. You can execute in the connectionless style, using the engine, which opens and closes a connection for you:
sql>>> result = engine.execute(users.insert(), name='fred', fullname="Fred Flintstone")
and you can save even more steps than that, if you connect the Engine
to the MetaData
object we created earlier. When this is done, all SQL expressions which involve tables within the MetaData
object will be automatically bound to the Engine
. In this case, we call it implicit execution:
>>> metadata.bind = engine sql>>> result = users.insert().execute(name="mary", fullname="Mary Contrary")
When the MetaData
is bound, statements will also compile against the engine's dialect. Since a lot of the examples here assume the default dialect, we'll detach the engine from the metadata which we just attached:
>>> metadata.bind = None
Detailed examples of connectionless and implicit execution are available in the "Engines" chapter: Connectionless Execution, Implicit Execution.
back to section topWe began with inserts just so that our test database had some data in it. The more interesting part of the data is selecting it ! We'll cover UPDATE and DELETE statements later. The primary construct used to generate SELECT statements is the select()
function:
>>> from sqlalchemy.sql import select >>> s = select([users]) >>> result = conn.execute(s)
Above, we issued a basic select()
call, placing the users
table within the COLUMNS clause of the select, and then executing. SQLAlchemy expanded the users
table into the set of each of its columns, and also generated a FROM clause for us. The result returned is again a ResultProxy
object, which acts much like a DBAPI cursor, including methods such as fetchone()
and fetchall()
. The easiest way to get rows from it is to just iterate:
>>> for row in result: ... print row (1, u'jack', u'Jack Jones') (2, u'wendy', u'Wendy Williams') (3, u'fred', u'Fred Flintstone') (4, u'mary', u'Mary Contrary')
Above, we see that printing each row produces a simple tuple-like result. We have more options at accessing the data in each row. One very common way is through dictionary access, using the string names of columns:
sql>>> result = conn.execute(s)
>>> row = result.fetchone() >>> print "name:", row['name'], "; fullname:", row['fullname'] name: jack ; fullname: Jack Jones
Integer indexes work as well:
>>> row = result.fetchone() >>> print "name:", row[1], "; fullname:", row[2] name: wendy ; fullname: Wendy Williams
But another way, whose usefulness will become apparent later on, is to use the Column
objects directly as keys:
sql>>> for row in conn.execute(s): ... print "name:", row[users.c.name], "; fullname:", row[users.c.fullname]
name: jack ; fullname: Jack Jones name: wendy ; fullname: Wendy Williams name: fred ; fullname: Fred Flintstone name: mary ; fullname: Mary Contrary
Result sets which have pending rows remaining should be explicitly closed before discarding. While the resources referenced by the ResultProxy
will be closed when the object is garbage collected, it's better to make it explicit as some database APIs are very picky about such things:
>>> result.close()
If we'd like to more carefully control the columns which are placed in the COLUMNS clause of the select, we reference individual Column
objects from our Table
. These are available as named attributes off the c
attribute of the Table
object:
>>> s = select([users.c.name, users.c.fullname]) sql>>> result = conn.execute(s)
>>> for row in result: ... print row (u'jack', u'Jack Jones') (u'wendy', u'Wendy Williams') (u'fred', u'Fred Flintstone') (u'mary', u'Mary Contrary')
Lets observe something interesting about the FROM clause. Whereas the generated statement contains two distinct sections, a "SELECT columns" part and a "FROM table" part, our select()
construct only has a list containing columns. How does this work ? Let's try putting two tables into our select()
statement:
sql>>> for row in conn.execute(select([users, addresses])): ... print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com') (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com') (1, u'jack', u'Jack Jones', 3, 2, u'www@www.org') (1, u'jack', u'Jack Jones', 4, 2, u'wendy@aol.com') (2, u'wendy', u'Wendy Williams', 1, 1, u'jack@yahoo.com') (2, u'wendy', u'Wendy Williams', 2, 1, u'jack@msn.com') (2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org') (2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com') (3, u'fred', u'Fred Flintstone', 1, 1, u'jack@yahoo.com') (3, u'fred', u'Fred Flintstone', 2, 1, u'jack@msn.com') (3, u'fred', u'Fred Flintstone', 3, 2, u'www@www.org') (3, u'fred', u'Fred Flintstone', 4, 2, u'wendy@aol.com') (4, u'mary', u'Mary Contrary', 1, 1, u'jack@yahoo.com') (4, u'mary', u'Mary Contrary', 2, 1, u'jack@msn.com') (4, u'mary', u'Mary Contrary', 3, 2, u'www@www.org') (4, u'mary', u'Mary Contrary', 4, 2, u'wendy@aol.com')
It placed both tables into the FROM clause. But also, it made a real mess. Those who are familiar with SQL joins know that this is a Cartesian product; each row from the users
table is produced against each row from the addresses
table. So to put some sanity into this statement, we need a WHERE clause. Which brings us to the second argument of select()
:
>>> s = select([users, addresses], users.c.id==addresses.c.user_id) sql>>> for row in conn.execute(s): ... print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com') (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com') (2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org') (2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com')
So that looks a lot better, we added an expression to our select()
which had the effect of adding WHERE users.id = addresses.user_id
to our statement, and our results were managed down so that the join of users
and addresses
rows made sense. But let's look at that expression? It's using just a Python equality operator between two different Column
objects. It should be clear that something is up. Saying 1==1
produces True
, and 1==2
produces False
, not a WHERE clause. So lets see exactly what that expression is doing:
>>> users.c.id==addresses.c.user_id <sqlalchemy.sql.expression._BinaryExpression object at 0x...>
Wow, surprise ! This is neither a True
nor a False
. Well what is it ?
>>> str(users.c.id==addresses.c.user_id) 'users.id = addresses.user_id'
As you can see, the ==
operator is producing an object that is very much like the Insert
and select()
objects we've made so far, thanks to Python's __eq__()
builtin; you call str()
on it and it produces SQL. By now, one can that everything we are working with is ultimately the same type of object. SQLAlchemy terms the base class of all of these expressions as sqlalchemy.sql.ClauseElement
.
Since we've stumbled upon SQLAlchemy's operator paradigm, let's go through some of its capabilities. We've seen how to equate two columns to each other:
>>> print users.c.id==addresses.c.user_id users.id = addresses.user_id
If we use a literal value (a literal meaning, not a SQLAlchemy clause object), we get a bind parameter:
>>> print users.c.id==7 users.id = :id_1
The 7
literal is embedded in ClauseElement
; we can use the same trick we did with the Insert
object to see it:
>>> (users.c.id==7).compile().params {'id_1': 7}
Most Python operators, as it turns out, produce a SQL expression here, like equals, not equals, etc.:
>>> print users.c.id != 7 users.id != :id_1 >>> # None converts to IS NULL >>> print users.c.name == None users.name IS NULL >>> # reverse works too >>> print 'fred' > users.c.name users.name < :name_1
If we add two integer columns together, we get an addition expression:
>>> print users.c.id + addresses.c.id users.id + addresses.id
Interestingly, the type of the Column
is important ! If we use +
with two string based columns (recall we put types like Integer
and String
on our Column
objects at the beginning), we get something different:
>>> print users.c.name + users.c.fullname users.name || users.fullname
Where ||
is the string concatenation operator used on most databases. But not all of them. MySQL users, fear not:
>>> print (users.c.name + users.c.fullname).compile(bind=create_engine('mysql://')) concat(users.name, users.fullname)
The above illustrates the SQL that's generated for an Engine
that's connected to a MySQL database; the ||
operator now compiles as MySQL's concat()
function.
If you have come across an operator which really isn't available, you can always use the op()
method; this generates whatever operator you need:
>>> print users.c.name.op('tiddlywinks')('foo') users.name tiddlywinks :name_1
We'd like to show off some of our operators inside of select()
constructs. But we need to lump them together a little more, so let's first introduce some conjunctions. Conjunctions are those little words like AND and OR that put things together. We'll also hit upon NOT. AND, OR and NOT can work from the corresponding functions SQLAlchemy provides (notice we also throw in a LIKE):
>>> from sqlalchemy.sql import and_, or_, not_ >>> print and_(users.c.name.like('j%'), users.c.id==addresses.c.user_id, ... or_(addresses.c.email_address=='wendy@aol.com', addresses.c.email_address=='jack@yahoo.com'), ... not_(users.c.id>5)) users.name LIKE :name_1 AND users.id = addresses.user_id AND (addresses.email_address = :email_address_1 OR addresses.email_address = :email_address_2) AND users.id <= :id_1
And you can also use the re-jiggered bitwise AND, OR and NOT operators, although because of Python operator precedence you have to watch your parenthesis:
>>> print users.c.name.like('j%') & (users.c.id==addresses.c.user_id) & \ ... ((addresses.c.email_address=='wendy@aol.com') | (addresses.c.email_address=='jack@yahoo.com')) \ ... & ~(users.c.id>5) users.name LIKE :name_1 AND users.id = addresses.user_id AND (addresses.email_address = :email_address_1 OR addresses.email_address = :email_address_2) AND users.id <= :id_1
So with all of this vocabulary, let's select all users who have an email address at AOL or MSN, whose name starts with a letter between "m" and "z", and we'll also generate a column containing their full name combined with their email address. We will add two new constructs to this statement, between()
and label()
. between()
produces a BETWEEN clause, and label()
is used in a column expression to produce labels using the AS
keyword; it's recommended when selecting from expressions that otherwise would not have a name:
>>> s = select([(users.c.fullname + ", " + addresses.c.email_address).label('title')], ... and_( ... users.c.id==addresses.c.user_id, ... users.c.name.between('m', 'z'), ... or_( ... addresses.c.email_address.like('%@aol.com'), ... addresses.c.email_address.like('%@msn.com') ... ) ... ) ... ) >>> print conn.execute(s).fetchall() SELECT users.fullname || ? || addresses.email_address AS title FROM users, addresses WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND (addresses.email_address LIKE ? OR addresses.email_address LIKE ?) [', ', 'm', 'z', '%@aol.com', '%@msn.com'] [(u'Wendy Williams, wendy@aol.com',)]
Once again, SQLAlchemy figured out the FROM clause for our statement. In fact it will determine the FROM clause based on all of its other bits; the columns clause, the whereclause, and also some other elements which we haven't covered yet, which include ORDER BY, GROUP BY, and HAVING.
back to section topOur last example really became a handful to type. Going from what one understands to be a textual SQL expression into a Python construct which groups components together in a programmatic style can be hard. That's why SQLAlchemy lets you just use strings too. The text()
construct represents any textual statement. To use bind parameters with text()
, always use the named colon format. Such as below, we create a text()
and execute it, feeding in the bind parameters to the execute()
method:
>>> from sqlalchemy.sql import text >>> s = text("""SELECT users.fullname || ', ' || addresses.email_address AS title ... FROM users, addresses ... WHERE users.id = addresses.user_id AND users.name BETWEEN :x AND :y AND ... (addresses.email_address LIKE :e1 OR addresses.email_address LIKE :e2) ... """) sql>>> print conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall()
[(u'Wendy Williams, wendy@aol.com',)]
To gain a "hybrid" approach, any of SA's SQL constructs can have text freely intermingled wherever you like - the text()
construct can be placed within any other ClauseElement
construct, and when used in a non-operator context, a direct string may be placed which converts to text()
automatically. Below we combine the usage of text()
and strings with our constructed select()
object, by using the select()
object to structure the statement, and the text()
/strings to provide all the content within the structure. For this example, SQLAlchemy is not given any Column
or Table
objects in any of its expressions, so it cannot generate a FROM clause. So we also give it the from_obj
keyword argument, which is a list of ClauseElements
(or strings) to be placed within the FROM clause:
>>> s = select([text("users.fullname || ', ' || addresses.email_address AS title")], ... and_( ... "users.id = addresses.user_id", ... "users.name BETWEEN 'm' AND 'z'", ... "(addresses.email_address LIKE :x OR addresses.email_address LIKE :y)" ... ), ... from_obj=['users', 'addresses'] ... ) sql>>> print conn.execute(s, x='%@aol.com', y='%@msn.com').fetchall()
[(u'Wendy Williams, wendy@aol.com',)]
Going from constructed SQL to text, we lose some capabilities. We lose the capability for SQLAlchemy to compile our expression to a specific target database; above, our expression won't work with MySQL since it has no ||
construct. It also becomes more tedious for SQLAlchemy to be made aware of the datatypes in use; for example, if our bind parameters required UTF-8 encoding before going in, or conversion from a Python datetime
into a string (as is required with SQLite), we would have to add extra information to our text()
construct. Similar issues arise on the result set side, where SQLAlchemy also performs type-specific data conversion in some cases; still more information can be added to text()
to work around this. But what we really lose from our statement is the ability to manipulate it, transform it, and analyze it. These features are critical when using the ORM, which makes heavy usage of relational transformations. To show off what we mean, we'll first introduce the ALIAS construct and the JOIN construct, just so we have some juicier bits to play with.
The alias corresponds to a "renamed" version of a table or arbitrary relation, which occurs anytime you say "SELECT .. FROM sometable AS someothername". The AS
creates a new name for the table. Aliases are super important in SQL as they allow you to reference the same table more than once. Scenarios where you need to do this include when you self-join a table to itself, or more commonly when you need to join from a parent table to a child table multiple times. For example, we know that our user jack
has two email addresses. How can we locate jack based on the combination of those two addresses? We need to join twice to it. Let's construct two distinct aliases for the addresses
table and join:
>>> a1 = addresses.alias('a1') >>> a2 = addresses.alias('a2') >>> s = select([users], and_( ... users.c.id==a1.c.user_id, ... users.c.id==a2.c.user_id, ... a1.c.email_address=='jack@msn.com', ... a2.c.email_address=='jack@yahoo.com' ... )) sql>>> print conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]
Easy enough. One thing that we're going for with the SQL Expression Language is the melding of programmatic behavior with SQL generation. Coming up with names like a1
and a2
is messy; we really didn't need to use those names anywhere, it's just the database that needed them. Plus, we might write some code that uses alias objects that came from several different places, and it's difficult to ensure that they all have unique names. So instead, we just let SQLAlchemy make the names for us, using "anonymous" aliases:
>>> a1 = addresses.alias() >>> a2 = addresses.alias() >>> s = select([users], and_( ... users.c.id==a1.c.user_id, ... users.c.id==a2.c.user_id, ... a1.c.email_address=='jack@msn.com', ... a2.c.email_address=='jack@yahoo.com' ... )) sql>>> print conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]
One super-huge advantage of anonymous aliases is that not only did we not have to guess up a random name, but we can also be guaranteed that the above SQL string is deterministically generated to be the same every time. This is important for databases such as Oracle which cache compiled "query plans" for their statements, and need to see the same SQL string in order to make use of it.
Aliases can of course be used for anything which you can SELECT from, including SELECT statements themselves. We can self-join the users
table back to the select()
we've created by making an alias of the entire statement. The correlate(None)
directive is to avoid SQLAlchemy's attempt to "correlate" the inner users
table with the outer one:
>>> a1 = s.correlate(None).alias() >>> s = select([users.c.name], users.c.id==a1.c.id) sql>>> print conn.execute(s).fetchall()
[(u'jack',)]
We're halfway along to being able to construct any SELECT expression. The next cornerstone of the SELECT is the JOIN expression. We've already been doing joins in our examples, by just placing two tables in either the columns clause or the where clause of the select()
construct. But if we want to make a real "JOIN" or "OUTERJOIN" construct, we use the join()
and outerjoin()
methods, most commonly accessed from the left table in the join:
>>> print users.join(addresses) users JOIN addresses ON users.id = addresses.user_id
The alert reader will see more surprises; SQLAlchemy figured out how to JOIN the two tables ! The ON condition of the join, as it's called, was automatically generated based on the ForeignKey
object which we placed on the addresses
table way at the beginning of this tutorial. Already the join()
construct is looking like a much better way to join tables.
Of course you can join on whatever expression you want, such as if we want to join on all users who use the same name in their email address as their username:
>>> print users.join(addresses, addresses.c.email_address.like(users.c.name + '%')) users JOIN addresses ON addresses.email_address LIKE users.name || :name_1
When we create a select()
construct, SQLAlchemy looks around at the tables we've mentioned and then places them in the FROM clause of the statement. When we use JOINs however, we know what FROM clause we want, so here we make usage of the from_obj
keyword argument:
>>> s = select([users.c.fullname], from_obj=[ ... users.join(addresses, addresses.c.email_address.like(users.c.name + '%')) ... ]) sql>>> print conn.execute(s).fetchall()
[(u'Jack Jones',), (u'Jack Jones',), (u'Wendy Williams',)]
The outerjoin()
function just creates LEFT OUTER JOIN
constructs. It's used just like join()
:
>>> s = select([users.c.fullname], from_obj=[users.outerjoin(addresses)]) >>> print s SELECT users.fullname FROM users LEFT OUTER JOIN addresses ON users.id = addresses.user_id
That's the output outerjoin()
produces, unless, of course, you're stuck in a gig using Oracle prior to version 9, and you've set up your engine (which would be using OracleDialect
) to use Oracle-specific SQL:
>>> from sqlalchemy.databases.oracle import OracleDialect >>> print s.compile(dialect=OracleDialect(use_ansi=False)) SELECT users.fullname FROM users, addresses WHERE users.id = addresses.user_id(+)
If you don't know what that SQL means, don't worry ! The secret tribe of Oracle DBAs don't want their black magic being found out ;).
back to section topWe've now gained the ability to construct very sophisticated statements. We can use all kinds of operators, table constructs, text, joins, and aliases. The point of all of this, as mentioned earlier, is not that it's an "easier" or "better" way to write SQL than just writing a SQL statement yourself; the point is that it's better for writing programmatically generated SQL which can be morphed and adapted as needed in automated scenarios.
To support this, the select()
construct we've been working with supports piecemeal construction, in addition to the "all at once" method we've been doing. Suppose you're writing a search function, which receives criterion and then must construct a select from it. To accomplish this, upon each criterion encountered, you apply "generative" criterion to an existing select()
construct with new elements, one at a time. We start with a basic select()
constructed with the shortcut method available on the users
table:
>>> query = users.select() >>> print query SELECT users.id, users.name, users.fullname FROM users
We encounter search criterion of "name='jack'". So we apply WHERE criterion stating such:
>>> query = query.where(users.c.name=='jack')
Next, we encounter that they'd like the results in descending order by full name. We apply ORDER BY, using an extra modifier desc
:
>>> query = query.order_by(users.c.fullname.desc())
We also come across that they'd like only users who have an address at MSN. A quick way to tack this on is by using an EXISTS clause, which we correlate to the users
table in the enclosing SELECT:
>>> from sqlalchemy.sql import exists >>> query = query.where( ... exists([addresses.c.id], ... and_(addresses.c.user_id==users.c.id, addresses.c.email_address.like('%@msn.com')) ... ).correlate(users))
And finally, the application also wants to see the listing of email addresses at once; so to save queries, we outerjoin the addresses
table (using an outer join so that users with no addresses come back as well; since we're programmatic, we might not have kept track that we used an EXISTS clause against the addresses
table too...). Additionally, since the users
and addresses
table both have a column named id
, let's isolate their names from each other in the COLUMNS clause by using labels:
>>> query = query.column(addresses).select_from(users.outerjoin(addresses)).apply_labels()
Let's bake for .0001 seconds and see what rises:
>>> conn.execute(query).fetchall()
[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]
So we started small, added one little thing at a time, and at the end we have a huge statement..which actually works. Now let's do one more thing; the searching function wants to add another email_address
criterion on, however it doesn't want to construct an alias of the addresses
table; suppose many parts of the application are written to deal specifically with the addresses
table, and to change all those functions to support receiving an arbitrary alias of the address would be cumbersome. We can actually convert the addresses
table within the existing statement to be an alias of itself, using replace_selectable()
:
>>> a1 = addresses.alias() >>> query = query.replace_selectable(addresses, a1) >>> print query
One more thing though, with automatic labeling applied as well as anonymous aliasing, how do we retrieve the columns from the rows for this thing ? The label for the email_addresses
column is now the generated name addresses_1_email_address
; and in another statement might be something different ! This is where accessing by result columns by Column
object becomes very useful:
sql>>> for row in conn.execute(query): ... print "Name:", row[users.c.name], "; Email Address", row[a1.c.email_address]
Name: jack ; Email Address jack@yahoo.com Name: jack ; Email Address jack@msn.com
The above example, by its end, got significantly more intense than the typical end-user constructed SQL will usually be. However when writing higher-level tools such as ORMs, they become more significant. SQLAlchemy's ORM relies very heavily on techniques like this.
back to section topThe concepts of creating SQL expressions have been introduced. What's left are more variants of the same themes. So now we'll catalog the rest of the important things we'll need to know.
Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The database dialect converts to the appropriate named or positional style, as here where it converts to positional for SQLite:
>>> from sqlalchemy.sql import bindparam >>> s = users.select(users.c.name==bindparam('username')) sql>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]
Another important aspect of bind parameters is that they may be assigned a type. The type of the bind parameter will determine its behavior within expressions and also how the data bound to it is processed before being sent off to the database:
>>> s = users.select(users.c.name.like(bindparam('username', type_=String) + text("'%'"))) sql>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]
Bind parameters of the same name can also be used multiple times, where only a single named value is needed in the execute parameters:
>>> s = select([users, addresses], ... users.c.name.like(bindparam('name', type_=String) + text("'%'")) | ... addresses.c.email_address.like(bindparam('name', type_=String) + text("'@%'")), ... from_obj=[users.outerjoin(addresses)]) sql>>> conn.execute(s, name='jack').fetchall()
[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]
SQL functions are created using the func
keyword, which generates functions using attribute access:
>>> from sqlalchemy.sql import func >>> print func.now() now() >>> print func.concat('x', 'y') concat(:param_1, :param_2)
Certain functions are marked as "ANSI" functions, which mean they don't get the parenthesis added after them, such as CURRENT_TIMESTAMP:
>>> print func.current_timestamp() CURRENT_TIMESTAMP
Functions are most typically used in the columns clause of a select statement, and can also be labeled as well as given a type. Labeling a function is recommended so that the result can be targeted in a result row based on a string name, and assigning it a type is required when you need result-set processing to occur, such as for Unicode conversion and date conversions. Below, we use the result function scalar()
to just read the first column of the first row and then close the result; the label, even though present, is not important in this case:
>>> print conn.execute( ... select([func.max(addresses.c.email_address, type_=String).label('maxemail')]) ... ).scalar()
www@www.org
Databases such as Postgres and Oracle which support functions that return whole result sets can be assembled into selectable units, which can be used in statements. Such as, a database function calculate()
which takes the parameters x
and y
, and returns three columns which we'd like to name q
, z
and r
, we can construct using "lexical" column objects as well as bind parameters:
>>> from sqlalchemy.sql import column >>> calculate = select([column('q'), column('z'), column('r')], ... from_obj=[func.calculate(bindparam('x'), bindparam('y'))]) >>> print select([users], users.c.id > calculate.c.z) SELECT users.id, users.name, users.fullname FROM users, (SELECT q, z, r FROM calculate(:x, :y)) WHERE users.id > z
If we wanted to use our calculate
statement twice with different bind parameters, the unique_params()
function will create copies for us, and mark the bind parameters as "unique" so that conflicting names are isolated. Note we also make two separate aliases of our selectable:
>>> s = select([users], users.c.id.between( ... calculate.alias('c1').unique_params(x=17, y=45).c.z, ... calculate.alias('c2').unique_params(x=5, y=12).c.z)) >>> print s SELECT users.id, users.name, users.fullname FROM users, (SELECT q, z, r FROM calculate(:x_1, :y_1)) AS c1, (SELECT q, z, r FROM calculate(:x_2, :y_2)) AS c2 WHERE users.id BETWEEN c1.z AND c2.z >>> s.compile().params {'x_2': 5, 'y_2': 12, 'y_1': 45, 'x_1': 17}
Unions come in two flavors, UNION and UNION ALL, which are available via module level functions:
>>> from sqlalchemy.sql import union >>> u = union( ... addresses.select(addresses.c.email_address=='foo@bar.com'), ... addresses.select(addresses.c.email_address.like('%@yahoo.com')), ... ).order_by(addresses.c.email_address) sql>>> print conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com')]
Also available, though not supported on all databases, are intersect()
, intersect_all()
, except_()
, and except_all()
:
>>> from sqlalchemy.sql import except_ >>> u = except_( ... addresses.select(addresses.c.email_address.like('%@%.com')), ... addresses.select(addresses.c.email_address.like('%@msn.com')) ... ) sql>>> print conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com'), (4, 2, u'wendy@aol.com')]
To embed a SELECT in a column expression, use as_scalar()
:
sql>>> print conn.execute(select([ ... users.c.name, ... select([func.count(addresses.c.id)], users.c.id==addresses.c.user_id).as_scalar() ... ])).fetchall()
[(u'jack', 2), (u'wendy', 2), (u'fred', 0), (u'mary', 0)]
Alternatively, applying a label()
to a select evaluates it as a scalar as well:
sql>>> print conn.execute(select([ ... users.c.name, ... select([func.count(addresses.c.id)], users.c.id==addresses.c.user_id).label('address_count') ... ])).fetchall()
[(u'jack', 2), (u'wendy', 2), (u'fred', 0), (u'mary', 0)]
Notice in the examples on "scalar selects", the FROM clause of each embedded select did not contain the users
table in its FROM clause. This is because SQLAlchemy automatically attempts to correlate embedded FROM objects to that of an enclosing query. To disable this, or to specify explicit FROM clauses to be correlated, use correlate()
:
>>> s = select([users.c.name], users.c.id==select([users.c.id]).correlate(None)) >>> print s SELECT users.name FROM users WHERE users.id = (SELECT users.id FROM users) >>> s = select([users.c.name, addresses.c.email_address], users.c.id== ... select([users.c.id], users.c.id==addresses.c.user_id).correlate(addresses) ... ) >>> print s SELECT users.name, addresses.email_address FROM users, addresses WHERE users.id = (SELECT users.id FROM users WHERE users.id = addresses.user_id)
The select()
function can take keyword arguments order_by
, group_by
(as well as having
), limit
, and offset
. There's also distinct=True
. These are all also available as generative functions. order_by()
expressions can use the modifiers asc()
or desc()
to indicate ascending or descending.
>>> s = select([addresses.c.user_id, func.count(addresses.c.id)]).\ ... group_by(addresses.c.user_id).having(func.count(addresses.c.id)>1) >>> print conn.execute(s).fetchall()
[(1, 2), (2, 2)] >>> s = select([addresses.c.email_address, addresses.c.id]).distinct().\ ... order_by(addresses.c.email_address.desc(), addresses.c.id) >>> conn.execute(s).fetchall()
[(u'www@www.org', 3), (u'wendy@aol.com', 4), (u'jack@yahoo.com', 1), (u'jack@msn.com', 2)] >>> s = select([addresses]).offset(1).limit(1) >>> print conn.execute(s).fetchall()
[(2, 1, u'jack@msn.com')]
Finally, we're back to UPDATE. Updates work a lot like INSERTS, except there is an additional WHERE clause that can be specified.
>>> # change 'jack' to 'ed' sql>>> conn.execute(users.update(users.c.name=='jack'), name='ed')
<sqlalchemy.engine.base.ResultProxy object at 0x...> >>> # use bind parameters >>> u = users.update(users.c.name==bindparam('oldname'), values={'name':bindparam('newname')}) sql>>> conn.execute(u, oldname='jack', newname='ed')
<sqlalchemy.engine.base.ResultProxy object at 0x...> >>> # update a column to an expression sql>>> conn.execute(users.update(values={users.c.fullname:"Fullname: " + users.c.name}))
<sqlalchemy.engine.base.ResultProxy object at 0x...>
A correlated update lets you update a table using selection from another table, or the same table:
>>> s = select([addresses.c.email_address], addresses.c.user_id==users.c.id).limit(1) sql>>> conn.execute(users.update(values={users.c.fullname:s}))
<sqlalchemy.engine.base.ResultProxy object at 0x...>
Finally, a delete. Easy enough:
sql>>> conn.execute(addresses.delete())
<sqlalchemy.engine.base.ResultProxy object at 0x...> sql>>> conn.execute(users.delete(users.c.name > 'm'))
<sqlalchemy.engine.base.ResultProxy object at 0x...>
The best place to get every possible name you can use in constructed SQL is the Generated Documentation.
Table Metadata Reference: Database Meta Data
Engine/Connection/Execution Reference: Database Engines
SQL Types: The Types System
back to section topThis section references most major configurational patterns involving the mapper() and relation() functions. It assumes you've worked through the Object Relational Tutorial and know how to construct and use rudimentary mappers and relations.
Full API documentation for the ORM:
Options for the mapper()
function:
The default behavior of a mapper
is to assemble all the columns in the mapped Table
into mapped object attributes. This behavior can be modified in several ways, as well as enhanced by SQL expressions.
To load only a part of the columns referenced by a table as attributes, use the include_properties
and exclude_properties
arguments:
mapper(User, users_table, include_properties=['user_id', 'user_name']) mapper(Address, addresses_table, exclude_properties=['street', 'city', 'state', 'zip'])
To change the name of the attribute mapped to a particular column, place the Column
object in the properties
dictionary with the desired key:
mapper(User, users_table, properties={ 'id': users_table.c.user_id, 'name': users_table.c.user_name, })
To change the names of all attributes using a prefix, use the column_prefix
option. This is useful for classes which wish to add their own property
accessors:
mapper(User, users_table, column_prefix='_')
The above will place attribute names such as _user_id
, _user_name
, _password
etc. on the mapped User
class.
To place multiple columns which are known to be "synonymous" based on foreign key relationship or join condition into the same mapped attribute, put them together using a list, as below where we map to a Join
:
# join users and addresses usersaddresses = sql.join(users_table, addresses_table, \ users_table.c.user_id == addresses_table.c.user_id) mapper(User, usersaddresses, properties={ 'id':[users_table.c.user_id, addresses_table.c.user_id], })
This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it's not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.
book_excerpts = Table('books', db, Column('book_id', Integer, primary_key=True), Column('title', String(200), nullable=False), Column('summary', String(2000)), Column('excerpt', String), Column('photo', Binary) ) class Book(object): pass # define a mapper that will load each of 'excerpt' and 'photo' in # separate, individual-row SELECT statements when each attribute # is first referenced on the individual object instance mapper(Book, book_excerpts, properties={ 'excerpt': deferred(book_excerpts.c.excerpt), 'photo': deferred(book_excerpts.c.photo) })
Deferred columns can be placed into groups so that they load together:
book_excerpts = Table('books', db, Column('book_id', Integer, primary_key=True), Column('title', String(200), nullable=False), Column('summary', String(2000)), Column('excerpt', String), Column('photo1', Binary), Column('photo2', Binary), Column('photo3', Binary) ) class Book(object): pass # define a mapper with a 'photos' deferred group. when one photo is referenced, # all three photos will be loaded in one SELECT statement. The 'excerpt' will # be loaded separately when it is first referenced. mapper(Book, book_excerpts, properties = { 'excerpt': deferred(book_excerpts.c.excerpt), 'photo1': deferred(book_excerpts.c.photo1, group='photos'), 'photo2': deferred(book_excerpts.c.photo2, group='photos'), 'photo3': deferred(book_excerpts.c.photo3, group='photos') })
You can defer or undefer columns at the Query
level using the defer
and undefer
options:
query = session.query(Book) query.options(defer('summary')).all() query.options(undefer('excerpt')).all()
And an entire "deferred group", i.e. which uses the group
keyword argument to deferred()
, can be undeferred using undefer_group()
, sending in the group name:
query = session.query(Book) query.options(undefer_group('photos')).all()
To add a SQL clause composed of local or external columns as a read-only, mapped column attribute, use the column_property()
function. Any scalar-returning ClauseElement
may be used, as long as it has a name
attribute; usually, you'll want to call label()
to give it a specific name:
mapper(User, users_table, properties={ 'fullname': column_property( (users_table.c.firstname + " " + users_table.c.lastname).label('fullname') ) })
Correlated subqueries may be used as well:
mapper(User, users_table, properties={ 'address_count': column_property( select( [func.count(addresses_table.c.address_id)], addresses_table.c.user_id==users_table.c.user_id ).label('address_count') ) })
A quick way to add a "validation" routine to an attribute is to use the @validates
decorator. This is a shortcut for using the class Validator(AttributeExtension) attribute extension with individual column or relation based attributes. An attribute validator can raise an exception, halting the process of mutating the attribute's value, or can change the given value into something different. Validators, like all attribute extensions, are only called by normal userland code; they are not issued when the ORM is populating the object.
addresses_table = Table('addresses', metadata, Column('id', Integer, primary_key=True), Column('email', String) ) class EmailAddress(object): @validates('email') def validate_email(self, key, address): assert '@' in address return address mapper(EmailAddress, addresses_table)
Validators also receive collection events, when items are added to a collection:
class User(object): @validates('addresses') def validate_address(self, key, address): assert '@' in address.email return address
A more comprehensive way to produce modified behavior for an attribute is to use descriptors. These are commonly used in Python using the property()
function. The standard SQLAlchemy technique for descriptors is to create a plain descriptor, and to have it read/write from a mapped attribute with a different name. To have the descriptor named the same as a column, map the column under a different name, i.e.:
class EmailAddress(object): def _set_email(self, email): self._email = email def _get_email(self): return self._email email = property(_get_email, _set_email) mapper(MyAddress, addresses_table, properties={ '_email': addresses_table.c.email })
However, the approach above is not complete. While our EmailAddress
object will shuttle the value through the email
descriptor and into the _email
mapped attribute, the class level EmailAddress.email
attribute does not have the usual expression semantics usable with Query
. To provide these, we instead use the synonym()
function as follows:
mapper(EmailAddress, addresses_table, properties={ 'email': synonym('_email', map_column=True) })
The email
attribute is now usable in the same way as any other mapped attribute, including filter expressions, get/set operations, etc.:
address = sess.query(EmailAddress).filter(EmailAddress.email == 'some address').one() address.email = 'some other address' sess.flush() q = sess.query(EmailAddress).filter_by(email='some other address')
If the mapped class does not provide a property, the synonym()
construct will create a default getter/setter object automatically.
Sets of columns can be associated with a single datatype. The ORM treats the group of columns like a single column which accepts and returns objects using the custom datatype you provide. In this example, we'll create a table vertices
which stores a pair of x/y coordinates, and a custom datatype Point
which is a composite type of an x and y column:
vertices = Table('vertices', metadata, Column('id', Integer, primary_key=True), Column('x1', Integer), Column('y1', Integer), Column('x2', Integer), Column('y2', Integer), )
The requirements for the custom datatype class are that it have a constructor which accepts positional arguments corresponding to its column format, and also provides a method __composite_values__()
which returns the state of the object as a list or tuple, in order of its column-based attributes. It also should supply adequate __eq__()
and __ne__()
methods which test the equality of two instances, and may optionally provide a __set_composite_values__
method which is used to set internal state in some cases (typically when default values have been generated during a flush):
class Point(object): def __init__(self, x, y): self.x = x self.y = y def __composite_values__(self): return [self.x, self.y] def __set_composite_values__(self, x, y): self.x = x self.y = y def __eq__(self, other): return other.x == self.x and other.y == self.y def __ne__(self, other): return not self.__eq__(other)
If __set_composite_values__()
is not provided, the names of the mapped columns are taken as the names of attributes on the object, and setattr()
is used to set data.
Setting up the mapping uses the composite()
function:
class Vertex(object): pass mapper(Vertex, vertices, properties={ 'start': composite(Point, vertices.c.x1, vertices.c.y1), 'end': composite(Point, vertices.c.x2, vertices.c.y2) })
We can now use the Vertex
instances as well as querying as though the start
and end
attributes are regular scalar attributes:
sess = Session() v = Vertex(Point(3, 4), Point(5, 6)) sess.save(v) v2 = sess.query(Vertex).filter(Vertex.start == Point(3, 4))
The "equals" comparison operation by default produces an AND of all corresponding columns equated to one another. If you'd like to override this, or define the behavior of other SQL operators for your new type, the composite()
function accepts an extension object of type sqlalchemy.orm.PropComparator
:
from sqlalchemy.orm import PropComparator from sqlalchemy import sql class PointComparator(PropComparator): def __gt__(self, other): """define the 'greater than' operation""" return sql.and_(*[a>b for a, b in zip(self.prop.columns, other.__composite_values__())]) maper(Vertex, vertices, properties={ 'start': composite(Point, vertices.c.x1, vertices.c.y1, comparator=PointComparator), 'end': composite(Point, vertices.c.x2, vertices.c.y2, comparator=PointComparator) })
As of version 0.5, the ORM does not generate ordering for any query unless explicitly configured.
The "default" ordering for a collection, which applies to list-based collections, can be configured using the order_by
keyword argument on relation()
:
mapper(Address, addresses_table) # order address objects by address id mapper(User, users_table, properties={ 'addresses': relation(Address, order_by=addresses_table.c.address_id) })
Note that when using eager loaders with relations, the tables used by the eager load's join are anonymously aliased. You can only order by these columns if you specify it at the relation()
level. To control ordering at the query level based on a related table, you join()
to that relation, then order by it:
session.query(User).join('addresses').order_by(Address.street)
Ordering for rows loaded through Query
is usually specified using the order_by()
generative method. There is also an option to set a default ordering for Queries which are against a single mapped entity and where there was no explicit order_by()
stated, which is the order_by
keyword argument to mapper()
:
# order by a column mapper(User, users_table, order_by=users_table.c.user_id) # order by multiple items mapper(User, users_table, order_by=[users_table.c.user_id, users_table.c.user_name.desc()])
Above, a Query
issued for the User
class will use the value of the mapper's order_by
setting if the Query
itself has no ordering specified.
SQLAlchemy supports three forms of inheritance: single table inheritance, where several types of classes are stored in one table, concrete table inheritance, where each type of class is stored in its own table, and joined table inheritance, where the parent/child classes are stored in their own tables that are joined together in a select. Whereas support for single and joined table inheritance is strong, concrete table inheritance is a less common scenario with some particular problems so is not quite as flexible.
When mappers are configured in an inheritance relationship, SQLAlchemy has the ability to load elements "polymorphically", meaning that a single query can return objects of multiple types.
For the following sections, assume this class relationship:
class Employee(object): def __init__(self, name): self.name = name def __repr__(self): return self.__class__.__name__ + " " + self.name class Manager(Employee): def __init__(self, name, manager_data): self.name = name self.manager_data = manager_data def __repr__(self): return self.__class__.__name__ + " " + self.name + " " + self.manager_data class Engineer(Employee): def __init__(self, name, engineer_info): self.name = name self.engineer_info = engineer_info def __repr__(self): return self.__class__.__name__ + " " + self.name + " " + self.engineer_info
In joined table inheritance, each class along a particular classes' list of parents is represented by a unique table. The total set of attributes for a particular instance is represented as a join along all tables in its inheritance path. Here, we first define a table to represent the Employee
class. This table will contain a primary key column (or columns), and a column for each attribute that's represented by Employee
. In this case it's just name
:
employees = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('type', String(30), nullable=False) )
The table also has a column called type
. It is strongly advised in both single- and joined- table inheritance scenarios that the root table contains a column whose sole purpose is that of the discriminator; it stores a value which indicates the type of object represented within the row. The column may be of any desired datatype. While there are some "tricks" to work around the requirement that there be a discriminator column, they are more complicated to configure when one wishes to load polymorphically.
Next we define individual tables for each of Engineer
and Manager
, which each contain columns that represent the attributes unique to the subclass they represent. Each table also must contain a primary key column (or columns), and in most cases a foreign key reference to the parent table. It is standard practice that the same column is used for both of these roles, and that the column is also named the same as that of the parent table. However this is optional in SQLAlchemy; separate columns may be used for primary key and parent-relation, the column may be named differently than that of the parent, and even a custom join condition can be specified between parent and child tables instead of using a foreign key. In joined table inheritance, the primary key of an instance is always represented by the primary key of the base table only (new in SQLAlchemy 0.4).
engineers = Table('engineers', metadata, Column('employee_id', Integer, ForeignKey('employees.employee_id'), primary_key=True), Column('engineer_info', String(50)), ) managers = Table('managers', metadata, Column('employee_id', Integer, ForeignKey('employees.employee_id'), primary_key=True), Column('manager_data', String(50)), )
We then configure mappers as usual, except we use some additional arguments to indicate the inheritance relationship, the polymorphic discriminator column, and the polymorphic identity of each class; this is the value that will be stored in the polymorphic discriminator column.
mapper(Employee, employees, polymorphic_on=employees.c.type, polymorphic_identity='employee') mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer') mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')
And that's it. Querying against Employee
will return a combination of Employee
, Engineer
and Manager
objects.
The Query
object includes some helper functionality when dealing with joined-table inheritance mappings. These are the with_polymorphic()
and of_type()
methods, both of which are introduced in version 0.4.4.
The with_polymorphic()
method affects the specific subclass tables which the Query selects from. Normally, a query such as this:
session.query(Employee).filter(Employee.name=='ed')
Selects only from the employees
table. The criterion we use in filter()
and other methods will generate WHERE criterion against this table. What if we wanted to load Employee
objects but also wanted to use criterion against Engineer
? We could just query against the Engineer
class instead. But, if we were using criterion which filters among more than one subclass (subclasses which do not inherit directly from one to the other), we'd like to select from an outer join of all those tables. The with_polymorphic()
method can tell Query
which joined-table subclasses we want to select for:
session.query(Employee).with_polymorphic(Engineer).filter(Engineer.engineer_info=='some info')
Even without criterion, the with_polymorphic()
method has the added advantage that instances are loaded from all of their tables in one result set. Such as, to optimize the loading of all Employee
objects, with_polymorphic()
accepts '*'
as a wildcard indicating that all subclass tables should be joined:
session.query(Employee).with_polymorphic('*').all()
with_polymorphic()
is an effective query-level alternative to the existing select_table
option available on mapper()
.
Next is a way to join along relation
paths while narrowing the criterion to specific subclasses. Suppose the employees
table represents a collection of employees which are associated with a Company
object. We'll add a company_id
column to the employees
table and a new table companies
:
companies = Table('companies', metadata, Column('company_id', Integer, primary_key=True), Column('name', String(50)) ) employees = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('type', String(30), nullable=False), Column('company_id', Integer, ForeignKey('companies.company_id')) ) class Company(object): pass mapper(Company, companies, properties={ 'employees': relation(Employee) })
If we wanted to join from Company
to not just Employee
but specifically Engineers
, using the join()
method or any()
or has()
operators will by default create a join from companies
to employees
, without including engineers
or managers
in the mix. If we wish to have criterion which is specifically against the Engineer
class, we can tell those methods to join or subquery against the full set of tables representing the subclass using the of_type()
opertator:
session.query(Company).join(Company.employees.of_type(Engineer)).filter(Engineer.engineer_info=='someinfo')
A longhand notation, introduced in 0.4.3, is also available, which involves spelling out the full target selectable within a 2-tuple:
session.query(Company).join(('employees', employees.join(engineers))).filter(Engineer.engineer_info=='someinfo')
The second notation allows more flexibility, such as joining to any group of subclass tables:
session.query(Company).join(('employees', employees.outerjoin(engineers).outerjoin(managers))).\ filter(or_(Engineer.engineer_info=='someinfo', Manager.manager_data=='somedata'))
The any()
and has()
operators also can be used with of_type()
when the embedded criterion is in terms of a subclass:
session.query(Company).filter(Company.employees.of_type(Engineer).any(Engineer.engineer_info=='someinfo')).all()
Note that the any()
and has()
are both shorthand for a correlated EXISTS query. To build one by hand looks like:
session.query(Company).filter( exists([1], and_(Engineer.engineer_info=='someinfo', employees.c.company_id==companies.c.company_id), from_obj=employees.join(engineers) ) ).all()
The EXISTS subquery above selects from the join of employees
to engineers
, and also specifies criterion which correlates the EXISTS subselect back to the parent companies
table.
When loading fresh from the database, the joined-table setup above will query from the parent table first, then for each row will issue a second query to the child table. For example, for a load of five rows with Employee
id 3, Manager
ids 1 and 5 and Engineer
ids 2 and 4, will produce queries along the lines of this example:
session.query(Employee).all()
The above query works well for a get()
operation, since it limits the queries to only the tables directly involved in fetching a single instance. For instances which are already present in the session, the secondary table load is not needed. However, the above loading style is not efficient for loading large groups of objects, as it incurs separate queries for each parent row.
One way to reduce the number of "secondary" loads of child rows is to "defer" them, using polymorphic_fetch='deferred'
:
mapper(Employee, employees, polymorphic_on=employees.c.type, \ polymorphic_identity='employee', polymorphic_fetch='deferred') mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer') mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')
The above configuration queries in the same manner as earlier, except the load of each "secondary" table occurs only when attributes referencing those columns are first referenced on the loaded instance. This style of loading is very efficient for cases where large selects of items occur, but a detailed "drill down" of extra inherited properties is less common.
More commonly, an all-at-once load may be achieved by constructing a query which combines all three tables together. The easiest way to do this as of version 0.4.4 is to use the with_polymorphic()
query method which will automatically join in the classes desired:
query = session.query(Employee).with_polymorphic([Engineer, Manager])
Which produces a query like the following:
query.all()
with_polymorphic()
accepts a single class or mapper, a list of classes/mappers, or the string '*'
to indicate all subclasses. It also accepts a second argument selectable
which replaces the automatic join creation and instead selects directly from the selectable given. This can allow polymorphic loads from a variety of inheritance schemes including concrete tables, if the appropriate unions are constructed.
Similar behavior as provided by with_polymorphic()
can be configured at the mapper level so that any user-defined query is used by default in order to load instances. The select_table
argument references an arbitrary selectable which the mapper will use for load operations (it has no impact on save operations). Any selectable can be used for this, such as a UNION of tables. For joined table inheritance, the easiest method is to use OUTER JOIN:
join = employees.outerjoin(engineers).outerjoin(managers) mapper(Employee, employees, polymorphic_on=employees.c.type, \ polymorphic_identity='employee', select_table=join) mapper(Engineer, engineers, inherits=Employee, polymorphic_identity='engineer') mapper(Manager, managers, inherits=Employee, polymorphic_identity='manager')
The above mapping will produce a query similar to that of with_polymorphic('*')
for every query of Employee
objects.
When select_table
is used, with_polymorphic()
still overrides its usage at the query level. For example, if select_table
were configured to load from a join of multiple tables, using with_polymorphic(Employee)
will limit the list of tables selected from to just the base table (as always, tables which don't get loaded in the first pass will be loaded on an as-needed basis).
Single table inheritance is where the attributes of the base class as well as all subclasses are represented within a single table. A column is present in the table for every attribute mapped to the base class and all subclasses; the columns which correspond to a single subclass are nullable. This configuration looks much like joined-table inheritance except there's only one table. In this case, a type
column is required, as there would be no other way to discriminate between classes. The table is specified in the base mapper only; for the inheriting classes, leave their table
parameter blank:
employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('manager_data', String(50)), Column('engineer_info', String(50)), Column('type', String(20), nullable=False) ) employee_mapper = mapper(Employee, employees_table, \ polymorphic_on=employees_table.c.type, polymorphic_identity='employee') manager_mapper = mapper(Manager, inherits=employee_mapper, polymorphic_identity='manager') engineer_mapper = mapper(Engineer, inherits=employee_mapper, polymorphic_identity='engineer')
Note that the mappers for the derived classes Manager and Engineer omit the specification of their associated table, as it is inherited from the employee_mapper. Omitting the table specification for derived mappers in single-table inheritance is required.
back to section topThis form of inheritance maps each class to a distinct table, as below:
employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), ) managers_table = Table('managers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('manager_data', String(50)), ) engineers_table = Table('engineers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('engineer_info', String(50)), )
Notice in this case there is no type
column. If polymorphic loading is not required, there's no advantage to using inherits
here; you just define a separate mapper for each class.
mapper(Employee, employees_table) mapper(Manager, managers_table) mapper(Engineer, engineers_table)
To load polymorphically, the select_table
argument is currently required. In this case we must construct a UNION of all three tables. SQLAlchemy includes a helper function to create these called polymorphic_union
, which will map all the different columns into a structure of selects with the same numbers and names of columns, and also generate a virtual type
column for each subselect:
pjoin = polymorphic_union({ 'employee': employees_table, 'manager': managers_table, 'engineer': engineers_table }, 'type', 'pjoin') employee_mapper = mapper(Employee, employees_table, select_table=pjoin, \ polymorphic_on=pjoin.c.type, polymorphic_identity='employee') manager_mapper = mapper(Manager, managers_table, inherits=employee_mapper, \ concrete=True, polymorphic_identity='manager') engineer_mapper = mapper(Engineer, engineers_table, inherits=employee_mapper, \ concrete=True, polymorphic_identity='engineer')
Upon select, the polymorphic union produces a query like this:
session.query(Employee).all()
Both joined-table and single table inheritance scenarios produce mappings which are usable in relation() functions; that is, it's possible to map a parent object to a child object which is polymorphic. Similarly, inheriting mappers can have relation()
s of their own at any level, which are inherited to each child class. The only requirement for relations is that there is a table relationship between parent and child. An example is the following modification to the joined table inheritance example, which sets a bi-directional relationship between Employee
and Company
:
employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('company_id', Integer, ForeignKey('companies.company_id')) ) companies = Table('companies', metadata, Column('company_id', Integer, primary_key=True), Column('name', String(50))) class Company(object): pass mapper(Company, companies, properties={ 'employees': relation(Employee, backref='company') })
SQLAlchemy has a lot of experience in this area; the optimized "outer join" approach can be used freely for parent and child relationships, eager loads are fully useable, query aliasing and other tricks are fully supported as well.
In a concrete inheritance scenario, mapping relation()
s is more difficult since the distinct classes do not share a table. In this case, you can establish a relationship from parent to child if a join condition can be constructed from parent to child, if each child table contains a foreign key to the parent:
companies = Table('companies', metadata, Column('id', Integer, primary_key=True), Column('name', String(50))) employees_table = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('company_id', Integer, ForeignKey('companies.id')) ) managers_table = Table('managers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('manager_data', String(50)), Column('company_id', Integer, ForeignKey('companies.id')) ) engineers_table = Table('engineers', metadata, Column('employee_id', Integer, primary_key=True), Column('name', String(50)), Column('engineer_info', String(50)), Column('company_id', Integer, ForeignKey('companies.id')) ) mapper(Employee, employees_table, select_table=pjoin, polymorphic_on=pjoin.c.type, polymorphic_identity='employee') mapper(Manager, managers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='manager') mapper(Engineer, engineers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='engineer') mapper(Company, companies, properties={ 'employees': relation(Employee) })
Let's crank it up and try loading with an eager load:
session.query(Company).options(eagerload('employees')).all()
The big limitation with concrete table inheritance is that relation()s placed on each concrete mapper do not propagate to child mappers. If you want to have the same relation()s set up on all concrete mappers, they must be configured manually on each.
back to section topMappers can be constructed against arbitrary relational units (called Selectables
) as well as plain Tables
. For example, The join
keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.
# a class class AddressUser(object): pass # define a Join j = join(users_table, addresses_table) # map to it - the identity of an AddressUser object will be # based on (user_id, address_id) since those are the primary keys involved mapper(AddressUser, j, properties={ 'user_id': [users_table.c.user_id, addresses_table.c.user_id] })
A second example:
# many-to-many join on an association table j = join(users_table, userkeywords, users_table.c.user_id==userkeywords.c.user_id).join(keywords, userkeywords.c.keyword_id==keywords.c.keyword_id) # a class class KeywordUser(object): pass # map to it - the identity of a KeywordUser object will be # (user_id, keyword_id) since those are the primary keys involved mapper(KeywordUser, j, properties={ 'user_id': [users_table.c.user_id, userkeywords.c.user_id], 'keyword_id': [userkeywords.c.keyword_id, keywords.c.keyword_id] })
In both examples above, "composite" columns were added as properties to the mappers; these are aggregations of multiple columns into one mapper property, which instructs the mapper to keep both of those columns set at the same value.
back to section topSimilar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:
s = select([customers, func.count(orders).label('order_count'), func.max(orders.price).label('highest_order')], customers.c.customer_id==orders.c.customer_id, group_by=[c for c in customers.c] ).alias('somealias') class Customer(object): pass mapper(Customer, s)
Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary key columns of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
back to section topThe first mapper created for a certain class is known as that class's "primary mapper." Other mappers can be created as well on the "load side" - these are called secondary mappers. This is a mapper that must be constructed with the keyword argument non_primary=True
, and represents a load-only mapper. Objects that are loaded with a secondary mapper will have their save operation processed by the primary mapper. It is also invalid to add new relation()
s to a non-primary mapper. To use this mapper with the Session, specify it to the query
method:
example:
{python}
# primary mapper
mapper(User, users_table)
# make a secondary mapper to load User against a join
othermapper = mapper(User, users_table.join(someothertable), non_primary=True)
# select
result = session.query(othermapper).select()
The "non primary mapper" is a rarely needed feature of SQLAlchemy; in most cases, the Query
object can produce any kind of query that's desired. It's recommended that a straight Query
be used in place of a non-primary mapper unless the mapper approach is absolutely needed. Current use cases for the "non primary mapper" are when you want to map the class to a particular select statement or view to which additional query criterion can be added, and for when the particular mapped select statement or view is to be placed in a relation()
of a parent mapper.
Versions of SQLAlchemy previous to 0.5 included another mapper flag called "entity_name", as of version 0.5.0 this feature has been removed (it never worked very well).
back to section topMapping imposes no restrictions or requirements on the constructor
(__init__
) method for the class. You are free to require any
arguments for the function that you wish, assign attributes to the
instance that are unknown to the ORM, and generally do anything else
you would normally do when writing a constructor for a Python class.
The SQLAlchemy ORM does not call __init__
when recreating objects
from database rows. The ORM's process is somewhat akin to the Python
standard library's pickle
module, invoking the low level __new__
method and then quietly restoring attributes directly on the instance
rather than calling __init__
.
If you need to do some setup on database-loaded instances before
they're ready to use, you can use the @reconstructor
decorator to
tag a method as the ORM counterpart to __init__
. SQLAlchemy will
call this method with no arguments every time it loads or reconstructs
one of your instances. This is useful for recreating transient
properties that are normally assigned in your __init__
.
from sqlalchemy import orm class MyMappedClass(object): def __init__(self, data): self.data = data # we need stuff on all instances, but not in the database. self.stuff = [] @orm.reconstructor def init_on_load(self): self.stuff = []
When obj = MyMappedClass()
is executed, Python calls the __init__
method as normal and the data
argument is required. When instances
are loaded during a Query
operation as in
query(MyMappedClass).one()
, init_on_load
is called instead.
Any method may be tagged as the reconstructor
, even the __init__
method. SQLAlchemy will call the reconstructor method with no
arguments. Scalar (non-collection) database-mapped attributes of the
instance will be available for use within the function.
Eagerly-loaded collections are generally not yet available and will
usually only contain the first element. ORM state changes made to
objects at this stage will not be recorded for the next flush()
operation, so the activity within a reconstructor should be
conservative.
While the ORM does not call your __init__
method, it will modify the
class's __init__
slightly. The method is lightly wrapped to act as
a trigger for the ORM, allowing mappers to be compiled automatically
and will fire a init_instance
event that MapperExtension
s may
listen for. MapperExtension
s can also listen for a
reconstruct_instance
event, analagous to the reconstructor
decorator above.
Mappers can have functionality augmented or replaced at many points in its execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs. Methods that are not overridden return the special value sqlalchemy.orm.EXT_CONTINUE
to allow processing to continue to the next MapperExtension or simply proceed normally if there are no more extensions.
API documentation for MapperExtension: class MapperExtension(object)
To use MapperExtension, make your own subclass of it and just send it off to a mapper:
m = mapper(User, users_table, extension=MyExtension())
Multiple extensions will be chained together and processed in order; they are specified as a list:
m = mapper(User, users_table, extension=[ext1, ext2, ext3])
The full list of options for the relation()
function:
A quick walkthrough of the basic relational patterns.
A one to many relationship places a foreign key in the child table referencing the parent. SQLAlchemy creates the relationship as a collection on the parent object containing instances of the child object.
parent_table = Table('parent', metadata, Column('id', Integer, primary_key=True)) child_table = Table('child', metadata, Column('id', Integer, primary_key=True), Column('parent_id', Integer, ForeignKey('parent.id'))) class Parent(object): pass class Child(object): pass mapper(Parent, parent_table, properties={ 'children': relation(Child) }) mapper(Child, child_table)
To establish a bi-directional relationship in one-to-many, where the "reverse" side is a many to one, specify the backref
option:
mapper(Parent, parent_table, properties={ 'children': relation(Child, backref='parent') }) mapper(Child, child_table)
Child
will get a parent
attribute with many-to-one semantics.
Many to one places a foreign key in the parent table referencing the child. The mapping setup is identical to one-to-many, however SQLAlchemy creates the relationship as a scalar attribute on the parent object referencing a single instance of the child object.
parent_table = Table('parent', metadata, Column('id', Integer, primary_key=True), Column('child_id', Integer, ForeignKey('child.id'))) child_table = Table('child', metadata, Column('id', Integer, primary_key=True), ) class Parent(object): pass class Child(object): pass mapper(Parent, parent_table, properties={ 'child': relation(Child) }) mapper(Child, child_table)
Backref behavior is available here as well, where backref="parents"
will place a one-to-many collection on the Child
class.
One To One is essentially a bi-directional relationship with a scalar attribute on both sides. To achieve this, the uselist=False
flag indicates the placement of a scalar attribute instead of a collection on the "many" side of the relationship. To convert one-to-many into one-to-one:
mapper(Parent, parent_table, properties={ 'child': relation(Child, uselist=False, backref='parent') })
Or to turn many-to-one into one-to-one:
mapper(Parent, parent_table, properties={ 'child': relation(Child, backref=backref('parent', uselist=False)) })
Many to Many adds an association table between two classes. The association table is indicated by the secondary
argument to relation()
.
left_table = Table('left', metadata, Column('id', Integer, primary_key=True)) right_table = Table('right', metadata, Column('id', Integer, primary_key=True)) association_table = Table('association', metadata, Column('left_id', Integer, ForeignKey('left.id')), Column('right_id', Integer, ForeignKey('right.id')), ) mapper(Parent, left_table, properties={ 'children': relation(Child, secondary=association_table) }) mapper(Child, right_table)
For a bi-directional relationship, both sides of the relation contain a collection by default, which can be modified on either side via the uselist
flag to be scalar. The backref
keyword will automatically use the same secondary
argument for the reverse relation:
mapper(Parent, left_table, properties={ 'children': relation(Child, secondary=association_table, backref='parents') })
The association object pattern is a variant on many-to-many: it specifically is used when your association table contains additional columns beyond those which are foreign keys to the left and right tables. Instead of using the secondary
argument, you map a new class directly to the association table. The left side of the relation references the association object via one-to-many, and the association class references the right side via many-to-one.
left_table = Table('left', metadata, Column('id', Integer, primary_key=True)) right_table = Table('right', metadata, Column('id', Integer, primary_key=True)) association_table = Table('association', metadata, Column('left_id', Integer, ForeignKey('left.id'), primary_key=True), Column('right_id', Integer, ForeignKey('right.id'), primary_key=True), Column('data', String(50)) ) mapper(Parent, left_table, properties={ 'children':relation(Association) }) mapper(Association, association_table, properties={ 'child':relation(Child) }) mapper(Child, right_table)
The bi-directional version adds backrefs to both relations:
mapper(Parent, left_table, properties={ 'children':relation(Association, backref="parent") }) mapper(Association, association_table, properties={ 'child':relation(Child, backref="parent_assocs") }) mapper(Child, right_table)
Working with the association pattern in its direct form requires that child objects are associated with an association instance before being appended to the parent; similarly, access from parent to child goes through the association object:
# create parent, append a child via association p = Parent() a = Association() a.child = Child() p.children.append(a) # iterate through child objects via association, including association # attributes for assoc in p.children: print assoc.data print assoc.child
To enhance the association object pattern such that direct access to the Association
object is optional, SQLAlchemy provides the associationproxy.
Important Note: it is strongly advised that the secondary
table argument not be combined with the Association Object pattern, unless the relation()
which contains the secondary
argument is marked viewonly=True
. Otherwise, SQLAlchemy may persist conflicting data to the underlying association table since it is represented by two conflicting mappings. The Association Proxy pattern should be favored in the case where access to the underlying association data is only sometimes needed.
The adjacency list pattern is a common relational pattern whereby a table contains a foreign key reference to itself. This is the most common and simple way to represent hierarchical data in flat tables. The other way is the "nested sets" model, sometimes called "modified preorder". Despite what many online articles say about modified preorder, the adjacency list model is probably the most appropriate pattern for the large majority of hierarchical storage needs, for reasons of concurrency, reduced complexity, and that modified preorder has little advantage over an application which can fully load subtrees into the application space.
SQLAlchemy commonly refers to an adjacency list relation as a self-referential mapper. In this example, we'll work with a single table called treenodes
to represent a tree structure:
nodes = Table('treenodes', metadata, Column('id', Integer, primary_key=True), Column('parent_id', Integer, ForeignKey('treenodes.id')), Column('data', String(50)), )
A graph such as the following:
root --+---> child1 +---> child2 --+--> subchild1 | +--> subchild2 +---> child3
Would be represented with data such as:
id parent_id data --- ------- ---- 1 NULL root 2 1 child1 3 1 child2 4 3 subchild1 5 3 subchild2 6 1 child3
SQLAlchemy's mapper()
configuration for a self-referential one-to-many relationship is exactly like a "normal" one-to-many relationship. When SQLAlchemy encounters the foreign key relation from treenodes
to treenodes
, it assumes one-to-many unless told otherwise:
# entity class class Node(object): pass mapper(Node, nodes, properties={ 'children': relation(Node) })
To create a many-to-one relationship from child to parent, an extra indicator of the "remote side" is added, which contains the Column
object or objects indicating the remote side of the relation:
mapper(Node, nodes, properties={ 'parent': relation(Node, remote_side=[nodes.c.id]) })
And the bi-directional version combines both:
mapper(Node, nodes, properties={ 'children': relation(Node, backref=backref('parent', remote_side=[nodes.c.id])) })
There are several examples included with SQLAlchemy illustrating self-referential strategies; these include basic_tree.py and optimized_al.py, the latter of which illustrates how to persist and search XML documents in conjunction with ElementTree.
Querying self-referential structures is done in the same way as any other query in SQLAlchemy, such as below, we query for any node whose data
attrbibute stores the value child2
:
# get all nodes named 'child2' sess.query(Node).filter(Node.data=='child2')
On the subject of joins, i.e. those described in Querying with Joins, self-referential structures require the usage of aliases so that the same table can be referenced multiple times within the FROM clause of the query. Aliasing can be done either manually using the nodes
Table
object as a source of aliases:
# get all nodes named 'subchild1' with a parent named 'child2' nodealias = nodes.alias() sqlsess.query(Node).filter(Node.data=='subchild1').\ filter(and_(Node.parent_id==nodealias.c.id, nodealias.c.data=='child2')).all()
or automatically, using join()
with aliased=True
:
# get all nodes named 'subchild1' with a parent named 'child2' sqlsess.query(Node).filter(Node.data=='subchild1').\ join('parent', aliased=True).filter(Node.data=='child2').all()
To add criterion to multiple points along a longer join, use from_joinpoint=True
:
# get all nodes named 'subchild1' with a parent named 'child2' and a grandparent 'root' sqlsess.query(Node).filter(Node.data=='subchild1').\ join('parent', aliased=True).filter(Node.data=='child2').\ join('parent', aliased=True, from_joinpoint=True).filter(Node.data=='root').all()
Eager loading of relations occurs using joins or outerjoins from parent to child table during a normal query operation, such that the parent and its child collection can be populated from a single SQL statement. SQLAlchemy's eager loading uses aliased tables in all cases when joining to related items, so it is compatible with self-referential joining. However, to use eager loading with a self-referential relation, SQLAlchemy needs to be told how many levels deep it should join; otherwise the eager load will not take place. This depth setting is configured via join_depth
:
mapper(Node, nodes, properties={ 'children': relation(Node, lazy=False, join_depth=2) }) sqlsession.query(Node).all()
The relation()
function uses the foreign key relationship between the parent and child tables to formulate the primary join condition between parent and child; in the case of a many-to-many relationship it also formulates the secondary join condition. If you are working with a Table
which has no ForeignKey
objects on it (which can be the case when using reflected tables with MySQL), or if the join condition cannot be expressed by a simple foreign key relationship, use the primaryjoin
and possibly secondaryjoin
conditions to create the appropriate relationship.
In this example we create a relation boston_addresses
which will only load the user addresses with a city of "Boston":
class User(object): pass class Address(object): pass mapper(Address, addresses_table) mapper(User, users_table, properties={ 'boston_addresses': relation(Address, primaryjoin= and_(users_table.c.user_id==addresses_table.c.user_id, addresses_table.c.city=='Boston')) })
Many to many relationships can be customized by one or both of primaryjoin
and secondaryjoin
, shown below with just the default many-to-many relationship explicitly set:
class User(object): pass class Keyword(object): pass mapper(Keyword, keywords_table) mapper(User, users_table, properties={ 'keywords': relation(Keyword, secondary=userkeywords_table, primaryjoin=users_table.c.user_id==userkeywords_table.c.user_id, secondaryjoin=userkeywords_table.c.keyword_id==keywords_table.c.keyword_id ) })
When using primaryjoin
and secondaryjoin
, SQLAlchemy also needs to be aware of which columns in the relation reference the other. In most cases, a Table
construct will have ForeignKey
constructs which take care of this; however, in the case of reflected tables on a database that does not report FKs (like MySQL ISAM) or when using join conditions on columns that don't have foreign keys, the relation()
needs to be told specifically which columns are "foreign" using the foreign_keys
collection:
mapper(Address, addresses_table) mapper(User, users_table, properties={ 'addresses': relation(Address, primaryjoin= users_table.c.user_id==addresses_table.c.user_id, foreign_keys=[addresses_table.c.user_id]) })
Very ambitious custom join conditions may fail to be directly persistable, and in some cases may not even load correctly. To remove the persistence part of the equation, use the flag viewonly=True
on the relation()
, which establishes it as a read-only attribute (data written to the collection will be ignored on flush()). However, in extreme cases, consider using a regular Python property in conjunction with Query
as follows:
class User(object): def _get_addresses(self): return object_session(self).query(Address).with_parent(self).filter(...).all() addresses = property(_get_addresses)
Theres no restriction on how many times you can relate from parent to child. SQLAlchemy can usually figure out what you want, particularly if the join conditions are straightforward. Below we add a newyork_addresses
attribute to complement the boston_addresses
attribute:
mapper(User, users_table, properties={ 'boston_addresses': relation(Address, primaryjoin= and_(users_table.c.user_id==addresses_table.c.user_id, addresses_table.c.city=='Boston')), 'newyork_addresses': relation(Address, primaryjoin= and_(users_table.c.user_id==addresses_table.c.user_id, addresses_table.c.city=='New York')), })
Mapping a one-to-many or many-to-many relationship results in a collection of values accessible through an attribute on the parent instance. By default, this collection is a list
:
mapper(Parent, properties={ children = relation(Child) }) parent = Parent() parent.children.append(Child()) print parent.children[0]
Collections are not limited to lists. Sets, mutable sequences and almost any other Python object that can act as a container can be used in place of the default list.
# use a set mapper(Parent, properties={ children = relation(Child, collection_class=set) }) parent = Parent() child = Child() parent.children.add(child) assert child in parent.children
You can use your own types for collections as well. For most cases, simply inherit from list
or set
and add the custom behavior.
Collections in SQLAlchemy are transparently instrumented. Instrumentation means that normal operations on the collection are tracked and result in changes being written to the database at flush time. Additionally, collection operations can fire events which indicate some secondary operation must take place. Examples of a secondary operation include saving the child item in the parent's Session
(i.e. the save-update
cascade), as well as synchronizing the state of a bi-directional relationship (i.e. a backref
).
The collections package understands the basic interface of lists, sets and dicts and will automatically apply instrumentation to those built-in types and their subclasses. Object-derived types that implement a basic collection interface are detected and instrumented via duck-typing:
class ListLike(object): def __init__(self): self.data = [] def append(self, item): self.data.append(item) def remove(self, item): self.data.remove(item) def extend(self, items): self.data.extend(items) def __iter__(self): return iter(self.data) def foo(self): return 'foo'
append
, remove
, and extend
are known list-like methods, and will be instrumented automatically. __iter__
is not a mutator method and won't be instrumented, and foo
won't be either.
Duck-typing (i.e. guesswork) isn't rock-solid, of course, so you can be explicit about the interface you are implementing by providing an __emulates__
class attribute:
class SetLike(object): __emulates__ = set def __init__(self): self.data = set() def append(self, item): self.data.add(item) def remove(self, item): self.data.remove(item) def __iter__(self): return iter(self.data)
This class looks list-like because of append
, but __emulates__
forces it to set-like. remove
is known to be part of the set interface and will be instrumented.
But this class won't work quite yet: a little glue is needed to adapt it for use by SQLAlchemy. The ORM needs to know which methods to use to append, remove and iterate over members of the collection. When using a type like list
or set
, the appropriate methods are well-known and used automatically when present. This set-like class does not provide the expected add
method, so we must supply an explicit mapping for the ORM via a decorator.
Decorators can be used to tag the individual methods the ORM needs to manage collections. Use them when your class doesn't quite meet the regular interface for its container type, or you simply would like to use a different method to get the job done.
from sqlalchemy.orm.collections import collection class SetLike(object): __emulates__ = set def __init__(self): self.data = set() @collection.appender def append(self, item): self.data.add(item) def remove(self, item): self.data.remove(item) def __iter__(self): return iter(self.data)
And that's all that's needed to complete the example. SQLAlchemy will add instances via the append
method. remove
and __iter__
are the default methods for sets and will be used for removing and iteration. Default methods can be changed as well:
from sqlalchemy.orm.collections import collection class MyList(list): @collection.remover def zark(self, item): # do something special... @collection.iterator def hey_use_this_instead_for_iteration(self): # ...
There is no requirement to be list-, or set-like at all. Collection classes can be any shape, so long as they have the append, remove and iterate interface marked for SQLAlchemy's use. Append and remove methods will be called with a mapped entity as the single argument, and iterator methods are called with no arguments and must return an iterator.
back to section topA dict
can be used as a collection, but a keying strategy is needed to map entities loaded by the ORM to key, value pairs. The collections package provides several built-in types for dictionary-based collections:
from sqlalchemy.orm.collections import column_mapped_collection, attribute_mapped_collection, mapped_collection mapper(Item, items_table, properties={ # key by column 'notes': relation(Note, collection_class=column_mapped_collection(notes_table.c.keyword)), # or named attribute 'notes2': relation(Note, collection_class=attribute_mapped_collection('keyword')), # or any callable 'notes3': relation(Note, collection_class=mapped_collection(lambda entity: entity.a + entity.b)) }) # ... item = Item() item.notes['color'] = Note('color', 'blue') print item.notes['color']
These functions each provide a dict
subclass with decorated set
and remove
methods and the keying strategy of your choice.
The collections.MappedCollection class can be used as a base class for your custom types or as a mix-in to quickly add dict
collection support to other classes. It uses a keying function to delegate to __setitem__
and __delitem__
:
from sqlalchemy.util import OrderedDict from sqlalchemy.orm.collections import MappedCollection class NodeMap(OrderedDict, MappedCollection): """Holds 'Node' objects, keyed by the 'name' attribute with insert order maintained.""" def __init__(self, *args, **kw): MappedCollection.__init__(self, keyfunc=lambda node: node.name) OrderedDict.__init__(self, *args, **kw)
The ORM understands the dict
interface just like lists and sets, and will automatically instrument all dict-like methods if you choose to subclass dict
or provide dict-like collection behavior in a duck-typed class. You must decorate appender and remover methods, however- there are no compatible methods in the basic dictionary interface for SQLAlchemy to use by default. Iteration will go through itervalues()
unless otherwise decorated.
Many custom types and existing library classes can be used as a entity collection type as-is without further ado. However, it is important to note that the instrumentation process will modify the type, adding decorators around methods automatically.
The decorations are lightweight and no-op outside of relations, but they do add unneeded overhead when triggered elsewhere. When using a library class as a collection, it can be good practice to use the "trivial subclass" trick to restrict the decorations to just your usage in relations. For example:
class MyAwesomeList(some.great.library.AwesomeList): pass # ... relation(..., collection_class=MyAwesomeList)
The ORM uses this approach for built-ins, quietly substituting a trivial subclass when a list
, set
or dict
is used directly.
The collections package provides additional decorators and support for authoring custom types. See the package documentation for more information and discussion of advanced usage and Python 2.3-compatible decoration options.
back to section topIn the Object Relational Tutorial, we introduced the concept of Eager Loading. We used an option
in conjunction with the Query
object in order to indicate that a relation should be loaded at the same time as the parent, within a single SQL query:
sql>>> jack = session.query(User).options(eagerload('addresses')).filter_by(name='jack').all()
By default, all relations are lazy loading. The scalar or collection attribute associated with a relation()
contains a trigger which fires the first time the attribute is accessed, which issues a SQL call at that point:
sql>>> jack.addresses
[<Address(u'jack@google.com')>, <Address(u'j25@yahoo.com')>]
The default loader strategy for any relation()
is configured by the lazy
keyword argument, which defaults to True
. Below we set it as False
so that the children
relation is eager loading:
# eager load 'children' attribute mapper(Parent, parent_table, properties={ 'children': relation(Child, lazy=False) })
The loader strategy can be changed from lazy to eager as well as eager to lazy using the eagerload()
and lazyload()
query options:
# set children to load lazily session.query(Parent).options(lazyload('children')).all() # set children to load eagerly session.query(Parent).options(eagerload('children')).all()
To reference a relation that is deeper than one level, separate the names by periods:
session.query(Parent).options(eagerload('foo.bar.bat')).all()
When using dot-separated names with eagerload()
, option applies only to the actual attribute named, and not its ancestors. For example, suppose a mapping from A
to B
to C
, where the relations, named atob
and btoc
, are both lazy-loading. A statement like the following:
session.query(A).options(eagerload('atob.btoc')).all()
will load only A
objects to start. When the atob
attribute on each A
is accessed, the returned B
objects will eagerly load their C
objects.
Therefore, to modify the eager load to load both atob
as well as btoc
, place eagerloads for both:
session.query(A).options(eagerload('atob'), eagerload('atob.btoc')).all()
or more simply just use eagerload_all()
:
session.query(A).options(eagerload_all('atob.btoc')).all()
There are two other loader strategies available, dynamic loading and no loading; these are described in Working with Large Collections.
When full statement loads are used with Query
, the user defined SQL is used verbatim and the Query
does not play any role in generating it. In this scenario, if eager loading is desired, the Query
should be informed as to what collections should also be loaded from the result set. Similarly, Queries which compile their statement in the usual way may also have user-defined joins built in which are synonymous with what eager loading would normally produce, and it improves performance to utilize those same JOINs for both purposes, instead of allowing the eager load mechanism to generate essentially the same JOIN redundantly. Yet another use case for such a feature is a Query which returns instances with a filtered view of their collections loaded, in which case the default eager load mechanisms need to be bypassed.
The single option Query
provides to control this is the contains_eager()
option, which specifies the path of a single relationship to be eagerly loaded. Like all relation-oriented options, it takes a string or Python descriptor as an argument. Below it's used with a from_statement
load:
# mapping is the users->addresses mapping mapper(User, users_table, properties={ 'addresses': relation(Address, addresses_table) }) # define a query on USERS with an outer join to ADDRESSES statement = users_table.outerjoin(addresses_table).select().apply_labels() # construct a Query object which expects the "addresses" results query = session.query(User).options(contains_eager('addresses')) # get results normally r = query.from_statement(statement)
It works just as well with an inline Query.join()
or Query.outerjoin()
:
session.query(User).outerjoin(User.addresses).options(contains_eager(User.addresses)).all()
If the "eager" portion of the statement is "aliased", the alias
keyword argument to contains_eager()
may be used to indicate it. This is a string alias name or reference to an actual Alias
object:
# use an alias of the Address entity adalias = aliased(Address) # construct a Query object which expects the "addresses" results query = session.query(User).outerjoin((adalias, User.addresses)).options(contains_eager(User.addresses, alias=adalias)) # get results normally sqlr = query.all()
The path given as the argument to contains_eager()
needs to be a full path from the starting entity. For example if we were loading Users->orders->Order->items->Item
, the string version would look like:
query(User).options(contains_eager('orders', 'items'))
The descriptor version like:
query(User).options(contains_eager(User.orders, Order.items))
A variant on contains_eager()
is the contains_alias()
option, which is used in the rare case that the parent object is loaded from an alias within a user-defined SELECT statement:
# define an aliased UNION called 'ulist' statement = users.select(users.c.user_id==7).union(users.select(users.c.user_id>7)).alias('ulist') # add on an eager load of "addresses" statement = statement.outerjoin(addresses).select().apply_labels() # create query, indicating "ulist" is an alias for the main table, "addresses" property should # be eager loaded query = session.query(User).options(contains_alias('ulist'), contains_eager('addresses')) # results r = query.from_statement(statement)
The default behavior of relation()
is to fully load the collection of items in, as according to the loading strategy of the relation. Additionally, the Session by default only knows how to delete objects which are actually present within the session. When a parent instance is marked for deletion and flushed, the Session loads its full list of child items in so that they may either be deleted as well, or have their foreign key value set to null; this is to avoid constraint violations. For large collections of child items, there are several strategies to bypass full loading of child items both at load time as well as deletion time.
The most useful by far is the dynamic_loader()
relation. This is a variant of relation()
which returns a Query
object in place of a collection when accessed. filter()
criterion may be applied as well as limits and offsets, either explicitly or via array slices:
mapper(User, users_table, properties={ 'posts': dynamic_loader(Post) }) jack = session.query(User).get(id) # filter Jack's blog posts posts = jack.posts.filter(Post.headline=='this is a post') # apply array slices posts = jack.posts[5:20]
The dynamic relation supports limited write operations, via the append()
and remove()
methods. Since the read side of the dynamic relation always queries the database, changes to the underlying collection will not be visible until the data has been flushed:
oldpost = jack.posts.filter(Post.headline=='old post').one() jack.posts.remove(oldpost) jack.posts.append(Post('new post'))
To place a dynamic relation on a backref, use lazy='dynamic'
:
mapper(Post, posts_table, properties={ 'user': relation(User, backref=backref('posts', lazy='dynamic')) })
Note that eager/lazy loading options cannot be used in conjunction dynamic relations at this time.
back to section topThe opposite of the dynamic relation is simply "noload", specified using lazy=None
:
mapper(MyClass, table, properties={ 'children': relation(MyOtherClass, lazy=None) })
Above, the children
collection is fully writeable, and changes to it will be persisted to the database as well as locally available for reading at the time they are added. However when instances of MyClass
are freshly loaded from the database, the children
collection stays empty.
Use passive_deletes=True
to disable child object loading on a DELETE operation, in conjunction with "ON DELETE (CASCADE|SET NULL)" on your database to automatically cascade deletes to child objects. Note that "ON DELETE" is not supported on SQLite, and requires InnoDB
tables when using MySQL:
mytable = Table('mytable', meta, Column('id', Integer, primary_key=True), ) myothertable = Table('myothertable', meta, Column('id', Integer, primary_key=True), Column('parent_id', Integer), ForeignKeyConstraint(['parent_id'], ['mytable.id'], ondelete="CASCADE"), ) mapper(MyOtherClass, myothertable) mapper(MyClass, mytable, properties={ 'children': relation(MyOtherClass, cascade="all, delete-orphan", passive_deletes=True) })
When passive_deletes
is applied, the children
relation will not be loaded into memory when an instance of MyClass
is marked for deletion. The cascade="all, delete-orphan"
will take effect for instances of MyOtherClass
which are currently present in the session; however for instances of MyOtherClass
which are not loaded, SQLAlchemy assumes that "ON DELETE CASCADE" rules will ensure that those rows are deleted by the database and that no foreign key violation will occur.
As of SQLAlchemy 0.4.2, the primary key attributes of an instance can be changed freely, and will be persisted upon flush. When the primary key of an entity changes, related items which reference the primary key must also be updated as well. For databases which enforce referential integrity, it's required to use the database's ON UPDATE CASCADE functionality in order to propagate primary key changes. For those which don't, the passive_cascades
flag can be set to False
which instructs SQLAlchemy to issue UPDATE statements individually. The passive_cascades
flag can also be False
in conjunction with ON UPDATE CASCADE functionality, although in that case it issues UPDATE statements unnecessarily.
A typical mutable primary key setup might look like:
users = Table('users', metadata, Column('username', String(50), primary_key=True), Column('fullname', String(100))) addresses = Table('addresses', metadata, Column('email', String(50), primary_key=True), Column('username', String(50), ForeignKey('users.username', onupdate="cascade"))) class User(object): pass class Address(object): pass mapper(User, users, properties={ 'addresses': relation(Address, passive_updates=False) }) mapper(Address, addresses)
passive_updates is set to True
by default. Foreign key references to non-primary key columns are supported as well.
The Mapper is the entrypoint to the configurational API of the SQLAlchemy object relational mapper. But the primary object one works with when using the ORM is the Session.
In the most general sense, the Session
establishes all conversations with the database and represents a "holding zone" for all the mapped instances which you've loaded or created during its lifespan. It implements the Unit of Work pattern, which means it keeps track of all changes which occur, and is capable of flushing those changes to the database as appropriate. Another important facet of the Session
is that it's also maintaining unique copies of each instance, where "unique" means "only one object with a particular primary key" - this pattern is called the Identity Map.
Beyond that, the Session
implements an interface which lets you move objects in or out of the session in a variety of ways, it provides the entryway to a Query
object which is used to query the database for data, and it also provides a transactional context for SQL operations which rides on top of the transactional capabilities of Engine
and Connection
objects.
Session
is a regular Python class which can be directly instantiated. However, to standardize how sessions are configured and acquired, the sessionmaker()
function is normally used to create a top level Session
configuration which can then be used throughout an application without the need to repeat the configurational arguments.
The usage of sessionmaker()
is illustrated below:
from sqlalchemy.orm import sessionmaker # create a configured "Session" class Session = sessionmaker(bind=some_engine) # create a Session sess = Session() # work with sess myobject = MyObject('foo', 'bar') sess.add(myobject) sess.commit() # close when finished sess.close()
Above, the sessionmaker
call creates a class for us, which we assign to the name Session
. This class is a subclass of the actual sqlalchemy.orm.session.Session
class, which will instantiate with a particular bound engine.
When you write your application, place the call to sessionmaker()
somewhere global, and then make your new Session
class available to the rest of your application.
In our previous example regarding sessionmaker()
, we specified a bind
for a particular Engine
. If we'd like to construct a sessionmaker()
without an engine available and bind it later on, or to specify other options to an existing sessionmaker()
, we may use the configure()
method:
# configure Session class with desired options Session = sessionmaker() # later, we create the engine engine = create_engine('postgres://...') # associate it with our custom Session class Session.configure(bind=engine) # work with the session sess = Session()
It's actually entirely optional to bind a Session to an engine. If the underlying mapped Table
objects use "bound" metadata, the Session
will make use of the bound engine instead (or will even use multiple engines if multiple binds are present within the mapped tables). "Bound" metadata is described at Binding MetaData to an Engine or Connection.
The Session
also has the ability to be bound to multiple engines explicitly. Descriptions of these scenarios are described in Partitioning Strategies.
The Session
can also be explicitly bound to an individual database Connection
. Reasons for doing this may include to join a Session
with an ongoing transaction local to a specific Connection
object, or to bypass connection pooling by just having connections persistently checked out and associated with distinct, long running sessions:
# global application scope. create Session class, engine Session = sessionmaker() engine = create_engine('postgres://...') ... # local scope, such as within a controller function # connect to the database connection = engine.connect() # bind an individual Session to the connection sess = Session(bind=connection)
As an alternative to sessionmaker()
, create_session()
is a function which calls the normal Session
constructor directly. All arguments are passed through and the new Session
object is returned:
session = create_session(bind=myengine, autocommit=True, autoflush=False)
Configurational arguments accepted by sessionmaker()
and create_session()
are the same as that of the Session
class itself, and are described at sessionmaker().
It's helpful to know the states which an instance can have within a session:
Transient - an instance that's not in a session, and is not saved to the database; i.e. it has no database identity. The only relationship such an object has to the ORM is that its class has a mapper()
associated with it.
Pending - when you add()
a transient instance, it becomes pending. It still wasn't actually flushed to the database yet, but it will be when the next flush occurs.
Persistent - An instance which is present in the session and has a record in the database. You get persistent instances by either flushing so that the pending instances become persistent, or by querying the database for existing instances (or moving persistent instances from other sessions into your local session).
Detached - an instance which has a record in the database, but is not in any session. Theres nothing wrong with this, and you can use objects normally when they're detached, except they will not be able to issue any SQL in order to load collections or attributes which are not yet loaded, or were marked as "expired".
Knowing these states is important, since the Session
tries to be strict about ambiguous operations (such as trying to save the same object to two different sessions at the same time).
When do I make a sessionmaker
?
Just one time, somewhere in your application's global scope. It should be looked upon as part of your application's configuration. If your application has three .py files in a package, you could, for example, place the sessionmaker
line in your __init__.py
file; from that point on your other modules say "from mypackage import Session". That way, everyone else just uses Session()
, and the configuration of that session is controlled by that central point.
If your application starts up, does imports, but does not know what database it's going to be connecting to, you can bind the Session
at the "class" level to the engine later on, using configure()
.
In the examples in this section, we will frequently show the sessionmaker
being created right above the line where we actually invoke Session()
. But that's just for example's sake ! In reality, the sessionmaker
would be somewhere at the module level, and your individual Session()
calls would be sprinkled all throughout your app, such as in a web application within each controller method.
When do I make a Session
?
You typically invoke Session()
when you first need to talk to your database, and want to save some objects or load some existing ones. Then, you work with it, save your changes, and then dispose of it....or at the very least close()
it. It's not a "global" kind of object, and should be handled more like a "local variable", as it's generally not safe to use with concurrent threads. Sessions are very inexpensive to make, and don't use any resources whatsoever until they are first used...so create some !
There is also a pattern whereby you're using a contextual session, this is described later in Contextual/Thread-local Sessions. In this pattern, a helper object is maintaining a Session
for you, most commonly one that is local to the current thread (and sometimes also local to an application instance). SQLAlchemy has worked this pattern out such that it still looks like you're creating a new session as you need one...so in that case, it's still a guaranteed win to just say Session()
whenever you want a session.
Is the Session a cache ?
Yeee...no. It's somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn't do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar')
, even if Foo(name='bar')
is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It's only when you say query.get({some primary key})
that the Session
doesn't have to issue a query.
Additionally, the Session stores object instances using a weak reference by default. This also defeats the purpose of using the Session as a cache, unless the weak_identity_map
flag is set to False
.
The Session
is not designed to be a global object from which everyone consults as a "registry" of objects. That is the job of a second level cache. A good library for implementing second level caching is Memcached. It is possible to "sort of" use the Session
in this manner, if you set it to be non-transactional and it never flushes any SQL, but it's not a terrific solution, since if concurrent threads load the same objects at the same time, you may have multiple copies of the same objects present in collections.
How can I get the Session
for a certain object ?
Use the object_session()
classmethod available on Session
:
session = Session.object_session(someobject)
Is the session threadsafe ?
Nope. It has no thread synchronization of any kind built in, and particularly when you do a flush operation, it definitely is not open to concurrent threads accessing it, because it holds onto a single database connection at that point. If you use a session which is non-transactional for read operations only, it's still not thread-"safe", but you also wont get any catastrophic failures either, since it opens and closes connections on an as-needed basis; it's just that different threads might load the same objects independently of each other, but only one will wind up in the identity map (however, the other one might still live in a collection somewhere).
But the bigger point here is, you should not want to use the session with multiple concurrent threads. That would be like having everyone at a restaurant all eat from the same plate. The session is a local "workspace" that you use for a specific set of tasks; you don't want to, or need to, share that session with other threads who are doing some other task. If, on the other hand, there are other threads participating in the same task you are, such as in a desktop graphical application, then you would be sharing the session with those threads, but you also will have implemented a proper locking scheme (or your graphical framework does) so that those threads do not collide.
The query()
function takes one or more entities and returns a new Query
object which will issue mapper queries within the context of this Session. An entity is defined as a mapped class, a Mapper
object, an orm-enabled descriptor, or an AliasedClass
object.
# query from a class session.query(User).filter_by(name='ed').all() # query with multiple classes, returns tuples session.query(User, Address).join('addresses').filter_by(name='ed').all() # query using orm-enabled descriptors session.query(User.name, User.fullname).all() # query from a mapper user_mapper = class_mapper(User) session.query(user_mapper)
When Query
returns results, each object instantiated is stored within the identity map. When a row matches an object which is already present, the same object is returned. In the latter case, whether or not the row is populated onto an existing object depends upon whether the attributes of the instance have been expired or not. As of 0.5, a default-configured Session
automatically expires all instances along transaction boundaries, so that with a normally isolated transaction, there shouldn't be any issue of instances representing data which is stale with regards to the current transaction.
add()
is used to place instances in the session. For transient (i.e. brand new) instances, this will have the effect of an INSERT taking place for those instances upon the next flush. For instances which are persistent (i.e. were loaded by this session), they are already present and do not need to be added. Instances which are detached (i.e. have been removed from a session) may be re-associated with a session using this method:
user1 = User(name='user1') user2 = User(name='user2') session.add(user1) session.add(user2) session.commit() # write changes to the database
To add a list of items to the session at once, use add_all()
:
session.add_all([item1, item2, item3])
The add()
operation cascades along the save-update
cascade. For more details see the section Cascades.
merge()
reconciles the current state of an instance and its associated children with existing data in the database, and returns a copy of the instance associated with the session. Usage is as follows:
merged_object = session.merge(existing_object)
When given an instance, it follows these steps:
merge
cascade. Note that all changes present on the given instance, including changes to collections, are merged.
With merge()
, the given instance is not placed within the session, and can be associated with a different session or detached. merge()
is very useful for taking the state of any kind of object structure without regard for its origins or current session associations and placing that state within a session. Here's two examples:
merge()
to save it to the database, ensuring that the data within the file is used to formulate the primary key of each element of the structure. Later, when the file has changed, the same process can be re-run, producing a slightly different object structure, which can then be merged()
in again, and the Session
will automatically update the database to reflect those changes.
merge()
is frequently used by applications which implement their own second level caches. This refers to an application which uses an in memory dictionary, or an tool like Memcached to store objects over long running spans of time. When such an object needs to exist within a Session
, merge()
is a good choice since it leaves the original cached object untouched. For this use case, merge provides a keyword option called dont_load=True
. When this boolean flag is set to True
, merge()
will not issue any SQL to reconcile the given object against the current state of the database, thereby reducing query overhead. The limitation is that the given object and all of its children may not contain any pending changes, and it's also of course possible that newer information in the database will not be present on the merged object, since no load is issued.
The delete
method places an instance into the Session's list of objects to be marked as deleted:
# mark two objects to be deleted session.delete(obj1) session.delete(obj2) # commit (or flush) session.commit()
The big gotcha with delete()
is that nothing is removed from collections. Such as, if a User
has a collection of three Addresses
, deleting an Address
will not remove it from user.addresses
:
>>> address = user.addresses[1] >>> session.delete(address) >>> session.flush() >>> address in user.addresses True
The solution is to use proper cascading:
mapper(User, users_table, properties={ 'addresses':relation(Address, cascade="all, delete, delete-orphan") }) del user.addresses[1] session.flush()
When the Session
is used with its default configuration, the flush step is nearly always done transparently. Specifically, the flush occurs before any individual Query
is issued, as well as within the commit()
call before the transaction is committed. It also occurs before a SAVEPOINT is issued when begin_nested()
is used. The "flush-on-Query" aspect of the behavior can be disabled by constructing sessionmaker()
with the flag autoflush=False
.
Regardless of the autoflush setting, a flush can always be forced by issuing flush()
:
session.flush()
flush()
also supports the ability to flush a subset of objects which are present in the session, by passing a list of objects:
# saves only user1 and address2. all other modified # objects remain present in the session. session.flush([user1, address2])
This second form of flush should be used carefully as it currently does not cascade, meaning that it will not necessarily affect other objects directly associated with the objects given.
The flush process always occurs within a transaction, even if the Session
has been configured with autocommit=True
, a setting that disables the session's persistent transactional state. If no transaction is present, flush()
creates its own transaction and commits it. Any failures during flush will always result in a rollback of whatever transaction is present.
commit()
is used to commit the current transaction. It always issues flush()
beforehand to flush any remaining state to the database; this is independent of the "autoflush" setting. If no transaction is present, it raises an error. Note that the default behavior of the Session
is that a transaction is always present; this behavior can be disabled by setting autocommit=True
. In autocommit mode, a transaction can be initiated by calling the begin()
method.
Another behavior of commit()
is that by default it expires the state of all instances present after the commit is complete. This is so that when the instances are next accessed, either through attribute access or by them being present in a Query
result set, they receive the most recent state. To disable this behavior, configure sessionmaker()
with expire_on_commit=False
.
Normally, instances loaded into the Session
are never changed by subsequent queries; the assumption is that the current transaction is isolated so the state most recently loaded is correct as long as the transaction continues. Setting autocommit=True
works against this model to some degree since the Session
behaves in exactly the same way with regard to attribute state, except no transaction is present.
rollback()
rolls back the current transaction. With a default configured session, the post-rollback state of the session is as follows:
Session
within the lifespan of the transaction are expunged, corresponding to their INSERT statement being rolled back. The state of their attributes remains unchanged.
With that state understood, the Session
may safely continue usage after a rollback occurs (note that this is a new feature as of version 0.5).
When a flush()
fails, typically for reasons like primary key, foreign key, or "not nullable" constraint violations, a rollback()
is issued automatically (it's currently not possible for a flush to continue after a partial failure). However, the flush process always uses its own transactional demarcator called a subtransaction, which is described more fully in the docstrings for Session
. What it means here is that even though the database transaction has been rolled back, the end user must still issue rollback()
to fully reset the state of the Session
.
Expunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:
session.expunge(obj1)
To remove all items, call session.expunge_all()
(this method was formerly known as clear()
).
The close()
method issues a expunge_all()
, and releases any transactional/connection resources. When connections are returned to the connection pool, transactional state is rolled back as well.
To assist with the Session's "sticky" behavior of instances which are present, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired" which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized, eager relationships will be repopulated. Any changes marked on the object are discarded:
# immediately re-load attributes on obj1, obj2 session.refresh(obj1) session.refresh(obj2) # expire objects obj1, obj2, attributes will be reloaded # on the next access: session.expire(obj1) session.expire(obj2)
refresh()
and expire()
also support being passed a list of individual attribute names in which to be refreshed. These names can reference any attribute, column-based or relation based:
# immediately re-load the attributes 'hello', 'world' on obj1, obj2 session.refresh(obj1, ['hello', 'world']) session.refresh(obj2, ['hello', 'world']) # expire the attributes 'hello', 'world' objects obj1, obj2, attributes will be reloaded # on the next access: session.expire(obj1, ['hello', 'world']) session.expire(obj2, ['hello', 'world'])
The full contents of the session may be expired at once using expire_all()
:
session.expire_all()
refresh()
and expire()
are usually not needed when working with a default-configured Session
. The usual need is when an UPDATE or DELETE has been issued manually within the transaction using Session.execute()
.
The Session
itself acts somewhat like a set-like collection. All items present may be accessed using the iterator interface:
for obj in session: print obj
And presence may be tested for using regular "contains" semantics:
if obj in session: print "Object is present"
The session is also keeping track of all newly created (i.e. pending) objects, all objects which have had changes since they were last loaded or saved (i.e. "dirty"), and everything that's been marked as deleted.
# pending objects recently added to the Session session.new # persistent objects which currently have changes detected # (this collection is now created on the fly each time the property is called) session.dirty # persistent objects that have been marked as deleted via session.delete(obj) session.deleted
Note that objects within the session are by default weakly referenced. This means that when they are dereferenced in the outside application, they fall out of scope from within the Session
as well and are subject to garbage collection by the Python interpreter. The exceptions to this include objects which are pending, objects which are marked as deleted, or persistent objects which have pending changes on them. After a full flush, these collections are all empty, and all objects are again weakly referenced. To disable the weak referencing behavior and force all objects within the session to remain until explicitly expunged, configure sessionmaker()
with the weak_identity_map=False
setting.
Mappers support the concept of configurable cascade behavior on relation()
s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all
, delete
, save-update
, refresh-expire
, merge
, expunge
, and delete-orphan
.
Cascading is configured by setting the cascade
keyword argument on a relation()
:
mapper(Order, order_table, properties={ 'items' : relation(Item, items_table, cascade="all, delete-orphan"), 'customer' : relation(User, users_table, user_orders_table, cascade="save-update"), })
The above mapper specifies two relations, items
and customer
. The items
relationship specifies "all, delete-orphan" as its cascade
value, indicating that all add
, merge
, expunge
, refresh
delete
and expire
operations performed on a parent Order
instance should also be performed on the child Item
instances attached to it. The delete-orphan
cascade value additionally indicates that if an Item
instance is no longer associated with an Order
, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order
and an Item
object.
The customer
relationship specifies only the "save-update" cascade value, indicating most operations will not be cascaded from a parent Order
instance to a child User
instance except for the add()
operation. "save-update" cascade indicates that an add()
on the parent will casade to all child items, and also that items added to a parent which is already present in the sessio will also be added.
The default value for cascade
on relation()
s is save-update, merge
.
The Session
manages transactions across all engines associated with it. As the Session
receives requests to execute SQL statements using a particular Engine
or Connection
, it adds each individual Engine
encountered to its transactional state and maintains an open connection for each one (note that a simple application normally has just one Engine
). At commit time, all unflushed data is flushed, and each individual transaction is committed. If the underlying databases support two-phase semantics, this may be used by the Session as well if two-phase transactions are enabled.
Normal operation ends the transactional state using the rolback()
or commit()
methods. After either is called, the Session
starts a new transaction.
Session = sessionmaker() sess = Session() try: item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo' # commit- will immediately go into a new transaction afterwards sess.commit() except: # rollback - will immediately go into a new transaction afterwards. sess.rollback()
A session which is configured with autocommit=True
may be placed into a transaction using begin()
. With an autocommit=True
session that's been placed into a transaction using begin()
, the session releases all connection resources after a commit()
or rollback()
and remains transaction-less (with the exception of flushes) until the next begin()
call:
Session = sessionmaker(autocommit=True) sess = Session() sess.begin() try: item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo' sess.commit() except: sess.rollback() raise
The begin()
method also returns a transactional token which is compatible with the Python 2.6 with
statement:
Session = sessionmaker(autocommit=True) sess = Session() with sess.begin(): item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo'
SAVEPOINT transactions, if supported by the underlying engine, may be delineated using the begin_nested()
method:
Session = sessionmaker() sess = Session() sess.add(u1) sess.add(u2) sess.begin_nested() # establish a savepoint sess.add(u3) sess.rollback() # rolls back u3, keeps u1 and u2 sess.commit() # commits u1 and u2
begin_nested()
may be called any number of times, which will issue a new SAVEPOINT with a unique identifier for each call. For each begin_nested()
call, a corresponding rollback()
or commit()
must be issued.
When begin_nested()
is called, a flush()
is unconditionally issued (regardless of the autoflush
setting). This is so that when a rollback()
occurs, the full state of the session is expired, thus causing all subsequent attribute/instance access to reference the full state of the Session
right before begin_nested()
was called.
Finally, for MySQL, Postgres, and soon Oracle as well, the session can be instructed to use two-phase commit semantics. This will coordinate the commiting of transactions across databases so that the transaction is either committed or rolled back in all databases. You can also prepare()
the session for interacting with transactions not managed by SQLAlchemy. To use two phase transactions set the flag twophase=True
on the session:
engine1 = create_engine('postgres://db1') engine2 = create_engine('postgres://db2') Session = sessionmaker(twophase=True) # bind User operations to engine 1, Account operations to engine 2 Session.configure(binds={User:engine1, Account:engine2}) sess = Session() # .... work with accounts and users # commit. session will issue a flush to all DBs, and a prepare step to all DBs, # before committing both transactions sess.commit()
This feature allows the value of a database column to be set to a SQL expression instead of a literal value. It's especially useful for atomic updates, calling stored procedures, etc. All you do is assign an expression to an attribute:
class SomeClass(object): pass mapper(SomeClass, some_table) someobject = session.query(SomeClass).get(5) # set 'value' attribute to a SQL expression adding one someobject.value = some_table.c.value + 1 # issues "UPDATE some_table SET value=value+1" session.commit()
This technique works both for INSERT and UPDATE statements. After the flush/commit operation, the value
attribute on someobject
above is expired, so that when next accessed the newly generated value will be loaded from the database.
SQL expressions and strings can be executed via the Session
within its transactional context. This is most easily accomplished using the execute()
method, which returns a ResultProxy
in the same manner as an Engine
or Connection
:
Session = sessionmaker(bind=engine) sess = Session() # execute a string statement result = sess.execute("select * from table where id=:id", {'id':7}) # execute a SQL expression construct result = sess.execute(select([mytable]).where(mytable.c.id==7))
The current Connection
held by the Session
is accessible using the connection()
method:
connection = sess.connection()
The examples above deal with a Session
that's bound to a single Engine
or Connection
. To execute statements using a Session
which is bound either to multiple engines, or none at all (i.e. relies upon bound metadata), both execute()
and connection()
accept a mapper
keyword argument, which is passed a mapped class or Mapper
instance, which is used to locate the proper context for the desired engine:
Session = sessionmaker() sess = Session() # need to specify mapper or class when executing result = sess.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass) result = sess.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass) connection = sess.connection(MyMappedClass)
If a Connection
is being used which is already in a transactional state (i.e. has a Transaction
), a Session
can be made to participate within that transaction by just binding the Session
to that Connection
:
Session = sessionmaker() # non-ORM connection + transaction conn = engine.connect() trans = conn.begin() # create a Session, bind to the connection sess = Session(bind=conn) # ... work with session sess.commit() # commit the session sess.close() # close it out, prohibit further actions trans.commit() # commit the actual transaction
Note that above, we issue a commit()
both on the Session
as well as the Transaction
. This is an example of where we take advantage of Connection
's ability to maintain subtransactions, or nested begin/commit pairs. The Session
is used exactly as though it were managing the transaction on its own; its commit()
method issues its flush()
, and commits the subtransaction. The subsequent transaction the Session
starts after commit will not begin until it's next used. Above we issue a close()
to prevent this from occuring. Finally, the actual transaction is committed using Transaction.commit()
.
When using the threadlocal
engine context, the process above is simplified; the Session
uses the same connection/transaction as everyone else in the current thread, whether or not you explicitly bind it:
engine = create_engine('postgres://mydb', strategy="threadlocal") engine.begin() sess = Session() # session takes place in the transaction like everyone else # ... go nuts engine.commit() # commit the transaction
A common need in applications, particularly those built around web frameworks, is the ability to "share" a Session
object among disparate parts of an application, without needing to pass the object explicitly to all method and function calls. What you're really looking for is some kind of "global" session object, or at least "global" to all the parts of an application which are tasked with servicing the current request. For this pattern, SQLAlchemy provides the ability to enhance the Session
class generated by sessionmaker()
to provide auto-contextualizing support. This means that whenever you create a Session
instance with its constructor, you get an existing Session
object which is bound to some "context". By default, this context is the current thread. This feature is what previously was accomplished using the sessioncontext
SQLAlchemy extension.
The scoped_session()
function wraps around the sessionmaker()
function, and produces an object which behaves the same as the Session
subclass returned by sessionmaker()
:
from sqlalchemy.orm import scoped_session, sessionmaker Session = scoped_session(sessionmaker())
However, when you instantiate this Session
"class", in reality the object is pulled from a threadlocal variable, or if it doesn't exist yet, it's created using the underlying class generated by sessionmaker()
:
>>> # call Session() the first time. the new Session instance is created. >>> sess = Session() >>> # later, in the same application thread, someone else calls Session() >>> sess2 = Session() >>> # the two Session objects are *the same* object >>> sess is sess2 True
Since the Session()
constructor now returns the same Session
object every time within the current thread, the object returned by scoped_session()
also implements most of the Session
methods and properties at the "class" level, such that you don't even need to instantiate Session()
:
# create some objects u1 = User() u2 = User() # save to the contextual session, without instantiating Session.add(u1) Session.add(u2) # view the "new" attribute assert u1 in Session.new # commit changes Session.commit()
The contextual session may be disposed of by calling Session.remove()
:
# remove current contextual session Session.remove()
After remove()
is called, the next operation with the contextual session will start a new Session
for the current thread.
A (really, really) common question is when does the contextual session get created, when does it get disposed ? We'll consider a typical lifespan as used in a web application:
Web Server Web Framework User-defined Controller Call -------------- -------------- ------------------------------ web request -> call controller -> # call Session(). this establishes a new, # contextual Session. sess = Session() # load some objects, save some changes objects = sess.query(MyClass).all() # some other code calls Session, it's the # same contextual session as "sess" sess2 = Session() sess2.add(foo) sess2.commit() # generate content to be returned return generate_content() Session.remove() <- web response <-
The above example illustrates an explicit call to Session.remove()
. This has the effect such that each web request starts fresh with a brand new session. When integrating with a web framework, there's actually many options on how to proceed for this step, particularly as of version 0.5:
Session
is thrown away, all of its transactional/connection resources are closed out, everything within it is explicitly gone. A new Session
will be used on the next request.
remove()
, in that all objects are explicitly expunged and all transactional/connection resources closed, except the actual Session
object hangs around. It doesn't make too much difference here unless the start of the web request would like to pass specific options to the initial construction of Session()
, such as a specific Engine
to bind to.
Session
are weakly referenced anyway so its likely that it will be empty on the next request in any case.
rollback()
ensures that no transactional state remains and expires all data, in the case that the request was aborted and did not roll back itself.
Generated docstrings for scoped_session()
back to section topthis section is TODO
Vertical partitioning places different kinds of objects, or different tables, across multiple databases.
engine1 = create_engine('postgres://db1') engine2 = create_engine('postgres://db2') Session = sessionmaker(twophase=True) # bind User operations to engine 1, Account operations to engine 2 Session.configure(binds={User:engine1, Account:engine2}) sess = Session()
Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases.
See the "sharding" example in attribute_shard.py
back to section topExtending the session can be achieved through subclassing as well as through a simple extension class, which resembles the style of Extending Mapper called SessionExtension. See the docstrings for more information on this class' methods.
Basic usage is similar to MapperExtension
:
class MySessionExtension(SessionExtension): def before_commit(self, session): print "before commit!" Session = sessionmaker(extension=MySessionExtension())
or with create_session()
:
sess = create_session(extension=MySessionExtension())
The same SessionExtension
instance can be used with any number of sessions.
The Engine is the starting point for any SQLAlchemy application. It's "home base" for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database and DBAPI combination.
The general structure is this:
+-----------+ __________ /---| Pool |---\ (__________) +-------------+ / +-----------+ \ +--------+ | | connect() <--| Engine |---x x----| DBAPI |---| database | +-------------+ \ +-----------+ / +--------+ | | \---| Dialect |---/ |__________| +-----------+ (__________)
Where above, a sqlalchemy.engine.Engine references both a sqlalchemy.engine.Dialect and sqlalchemy.pool.Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.
Creating an engine is just a matter of issuing a single call, create_engine()
:
engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')
The above engine invokes the postgres
dialect and a connection pool which references localhost:5432
.
The engine can be used directly to issue SQL to the database. The most generic way is to use connections, which you get via the connect()
method:
connection = engine.connect() result = connection.execute("select username from users") for row in result: print "username:", row['username'] connection.close()
The connection is an instance of sqlalchemy.engine.Connection, which is a proxy object for an actual DBAPI connection. The returned result is an instance of sqlalchemy.engine.ResultProxy, which acts very much like a DBAPI cursor.
When you say engine.connect()
, a new Connection
object is created, and a DBAPI connection is retrieved from the connection pool. Later, when you call connection.close()
, the DBAPI connection is returned to the pool; nothing is actually "closed" from the perspective of the database.
To execute some SQL more quickly, you can skip the Connection
part and just say:
result = engine.execute("select username from users") for row in result: print "username:", row['username'] result.close()
Where above, the execute()
method on the Engine
does the connect()
part for you, and returns the ResultProxy
directly. The actual Connection
is inside the ResultProxy
, waiting for you to finish reading the result. In this case, when you close()
the ResultProxy
, the underlying Connection
is closed, which returns the DBAPI connection to the pool.
To summarize the above two examples, when you use a Connection
object, it's known as explicit execution. When you don't see the Connection
object, but you still use the execute()
method on the Engine
, it's called explicit, connectionless execution. A third variant of execution also exists called implicit execution; this will be described later.
The Engine
and Connection
can do a lot more than what we illustrated above; SQL strings are only its most rudimentary function. Later chapters will describe how "constructed SQL" expressions can be used with engines; in many cases, you don't have to deal with the Engine
at all after it's created. The Object Relational Mapper (ORM), an optional feature of SQLAlchemy, also uses the Engine
in order to get at connections; that's also a case where you can often create the engine once, and then forget about it.
Recall that the Dialect
is used to describe how to talk to a specific kind of database. Dialects are included with SQLAlchemy for SQLite, Postgres, MySQL, MS-SQL, Firebird, Informix, and Oracle; these can each be seen as a Python module present in the sqlalchemy.databases
package. Each dialect requires the appropriate DBAPI drivers to be installed separately.
Downloads for each DBAPI at the time of this writing are as follows:
The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. Its a good place to check for issues with specific databases. Database Notes
back to section topSQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:
driver://username:password@host:port/database
Available drivernames are sqlite
, mysql
, postgres
, oracle
, mssql
, and firebird
. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine()
function:
# postgres pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase') # sqlite (note the four slashes for an absolute path) sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt') sqlite_db = create_engine('sqlite:///relative/path/to/database.txt') sqlite_db = create_engine('sqlite://') # in-memory database sqlite_db = create_engine('sqlite://:memory:') # the same # mysql mysql_db = create_engine('mysql://localhost/foo') # oracle via TNS name oracle_db = create_engine('oracle://scott:tiger@dsn') # oracle will feed host/port/SID into cx_oracle.makedsn oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
The Engine
will ask the connection pool for a connection when the connect()
or execute()
methods are called. The default connection pool, QueuePool
, as well as the default connection pool used with SQLite, SingletonThreadPool
, will open connections to the database on an as-needed basis. As concurrent statements are executed, QueuePool
will grow its pool of connections to a default size of five, and will allow a default "overflow" of ten. Since the Engine
is essentially "home base" for the connection pool, it follows that you should keep a single Engine
per database established within an application, rather than creating a new one for each connection.
Custom arguments used when issuing the connect()
call to the underlying DBAPI may be issued in three distinct ways. String-based arguments can be passed directly from the URL string as query arguments:
db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')
If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.
create_engine
also takes an argument connect_args
which is an additional dictionary that will be passed to connect()
. This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:
db = create_engine('postgres://scott:tiger@localhost/test', connect_args = {'argument1':17, 'argument2':'bar'})
The most customizable connection method of all is to pass a creator
argument, which specifies a callable that returns a DBAPI connection:
def connect(): return psycopg.connect(user='scott', host='localhost') db = create_engine('postgres://', creator=connect)
Keyword options can also be specified to create_engine()
, following the string URL as follows:
db = create_engine('postgres://...', encoding='latin1', echo=True)
A list of all standard options, as well as several that are used by particular database dialects, is as follows:
True
alongside convert_unicode=True
, asserts that incoming string bind parameters are instances of unicode
, otherwise raises an error. Only takes effect when convert_unicode==True
. This flag is also available on the String
type and its descendants. New in 0.4.2. connect()
method as additional keyword arguments.
Unicode
column type instead, described in The Types System.
echo
attribute of Engine
can be modified at any time to turn logging on and off. If set to the string "debug"
, result rows will be printed to the standard output as well. This flag ultimately controls a Python logger; see Configuring Logging at the end of this chapter for information on how to configure logging directly.
Unicode
type object.
sqlalchemy.pool.Pool
, such as a QueuePool
instance. If non-None, this pool will be used directly as the underlying connection pool for the engine, bypassing whatever connection parameters are present in the URL argument. For information on constructing connection pools manually, see Connection Pooling.
sqlalchemy.pool.Pool
subclass, which will be used to create a connection pool instance using the connection parameters given in the URL. Note this differs from pool
in that you don't actually instantiate the pool in this case, you just indicate what type of pool to be used.
QueuePool
.
QueuePool
as well as SingletonThreadPool
.
QueuePool
.
TLEngine
class that provides a modified connection scope for connectionless executions. Connectionless execution as well as further detail on this setting are described in Connectionless Execution, Implicit Execution.
threaded
parameter of the connection indicating thread-safe usage. cx_Oracle docs indicate setting this flag to False
will speed performance by 10-15%. While this defaults to False
in cx_Oracle, SQLAlchemy defaults it to True
, preferring stability over early optimization.
column1(+)=column2
must be used in order to achieve a LEFT OUTER JOIN.
Recall from the beginning of this section that the Engine provides a connect()
method which returns a Connection
object. Connection
is a proxy object which maintains a reference to a DBAPI connection instance. The close()
method on Connection
does not actually close the DBAPI connection, but instead returns it to the connection pool referenced by the Engine
. Connection
will also automatically return its resources to the connection pool when the object is garbage collected, i.e. its __del__()
method is called. When using the standard C implementation of Python, this method is usually called immediately as soon as the object is dereferenced. With other Python implementations such as Jython, this is not so guaranteed.
The execute()
methods on both Engine
and Connection
can also receive SQL clause constructs as well:
connection = engine.connect() result = connection.execute(select([table1], table1.c.col1==5)) for row in result: print row['col1'], row['col2'] connection.close()
The above SQL construct is known as a select()
. The full range of SQL constructs available are described in SQL Expression Language Tutorial.
Both Connection
and Engine
fulfill an interface known as Connectable
which specifies common functionality between the two objects, namely being able to call connect()
to return a Connection
object (Connection
just returns itself), and being able to call execute()
to get a result set. Following this, most SQLAlchemy functions and objects which accept an Engine
as a parameter or attribute with which to execute SQL will also accept a Connection
. As of SQLAlchemy 0.3.9, this argument is named bind
.
engine = create_engine('sqlite:///:memory:') # specify some Table metadata metadata = MetaData() table = Table('sometable', metadata, Column('col1', Integer)) # create the table with the Engine table.create(bind=engine) # drop the table with a Connection off the Engine connection = engine.connect() table.drop(bind=connection)
Connection facts:
connection.close()
) whenever the connection is not in use. If your application has a need for management of multiple connections or is otherwise long running (this includes all web applications, threaded or not), don't hold a single connection open at the module level.
The Connection
object provides a begin()
method which returns a Transaction
object. This object is usually used within a try/except clause so that it is guaranteed to rollback()
or commit()
:
trans = connection.begin() try: r1 = connection.execute(table1.select()) connection.execute(table1.insert(), col1=7, col2='this is some data') trans.commit() except: trans.rollback() raise
The Transaction
object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.
# method_a starts a transaction and calls method_b def method_a(connection): trans = connection.begin() # open a transaction try: method_b(connection) trans.commit() # transaction is committed here except: trans.rollback() # this rolls back the transaction unconditionally raise # method_b also starts a transaction def method_b(connection): trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction try: connection.execute("insert into mytable values ('bat', 'lala')") connection.execute(mytable.insert(), col1='bat', col2='lala') trans.commit() # transaction is not committed yet except: trans.rollback() # this rolls back the transaction unconditionally raise # open a Connection and call method_a conn = engine.connect() method_a(conn) conn.close()
Above, method_a
is called first, which calls connection.begin()
. Then it calls method_b
. When method_b
calls connection.begin()
, it just increments a counter that is decremented when it calls commit()
. If either method_a
or method_b
calls rollback()
, the whole transaction is rolled back. The transaction is not committed until method_a
calls the commit()
method. This "nesting" behavior allows the creation of functions which "guarantee" that a transaction will be used if one was not already available, but will automatically participate in an enclosing transaction if one exists.
Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in unitofwork_transaction.
Transaction Facts:
The above transaction example illustrates how to use Transaction
so that several executions can take part in the same transaction. What happens when we issue an INSERT, UPDATE or DELETE call without using Transaction
? The answer is autocommit. While many DBAPIs implement a flag called autocommit
, the current SQLAlchemy behavior is such that it implements its own autocommit. This is achieved by detecting statements which represent data-changing operations, i.e. INSERT, UPDATE, DELETE, etc., and then issuing a COMMIT automatically if no transaction is in progress. The detection is based on compiled statement attributes, or in the case of a text-only statement via regular expressions.
conn = engine.connect() conn.execute("INSERT INTO users VALUES (1, 'john')") # autocommits
Recall from the first section we mentioned executing with and without a Connection
. Connectionless
execution refers to calling the execute()
method on an object which is not a Connection
, which could be on the Engine
itself, or could be a constructed SQL object. When we say "implicit", we mean that we are calling the execute()
method on an object which is neither a Connection
nor an Engine
object; this can only be used with constructed SQL objects which have their own execute()
method, and can be "bound" to an Engine
. A description of "constructed SQL objects" may be found in SQL Expression Language Tutorial.
A summary of all three methods follows below. First, assume the usage of the following MetaData
and Table
objects; while we haven't yet introduced these concepts, for now you only need to know that we are representing a database table, and are creating an "executable" SQL construct which issues a statement to the database. These objects are described in Database Meta Data.
meta = MetaData() users_table = Table('users', meta, Column('id', Integer, primary_key=True), Column('name', String(50)) )
Explicit execution delivers the SQL text or constructed SQL expression to the execute()
method of Connection
:
engine = create_engine('sqlite:///file.db') connection = engine.connect() result = connection.execute(users_table.select()) for row in result: # .... connection.close()
Explicit, connectionless execution delivers the expression to the execute()
method of Engine
:
engine = create_engine('sqlite:///file.db') result = engine.execute(users_table.select()) for row in result: # .... result.close()
Implicit execution is also connectionless, and calls the execute()
method on the expression itself, utilizing the fact that either an Engine
or Connection
has been bound to the expression object (binding is discussed further in the next section, Database Meta Data):
engine = create_engine('sqlite:///file.db') meta.bind = engine result = users_table.select().execute() for row in result: # .... result.close()
In both "connectionless" examples, the Connection
is created behind the scenes; the ResultProxy
returned by the execute()
call references the Connection
used to issue the SQL statement. When we issue close()
on the ResultProxy
, or if the result set object falls out of scope and is garbage collected, the underlying Connection
is closed for us, resulting in the DBAPI connection being returned to the pool.
With connectionless execution, each returned ResultProxy
object references its own distinct DBAPI connection object. This means that multiple executions will result in multiple DBAPI connections being used at the same time; the example below illustrates this:
db = create_engine('mysql://localhost/test') # execute one statement and receive results. r1 now references a DBAPI connection resource. r1 = db.execute("select * from table1") # execute a second statement and receive results. r2 now references a *second* DBAPI connection resource. r2 = db.execute("select * from table2") for row in r1: ... for row in r2: ... # release connection 1 r1.close() # release connection 2 r2.close()
Where above, we have two result sets in scope at the same time, therefore we have two distinct DBAPI connections, both separately checked out from the connection pool, in scope at the same time.
An option exists to create_engine()
called strategy="threadlocal"
, which changes this behavior. When this option is used, the Engine
which is returned by create_engine()
is a special subclass of engine called TLEngine
. This engine, when it creates the Connection
used by a connectionless execution, checks a threadlocal variable for an existing DBAPI connection that was already checked out from the pool, within the current thread. If one exists, it uses that one.
The usage of "threadlocal" modifies the underlying behavior of our example above, as follows:
db = create_engine('mysql://localhost/test', strategy='threadlocal') # execute one statement and receive results. r1 now references a DBAPI connection resource. r1 = db.execute("select * from table1") # execute a second statement and receive results. r2 now references the *same* resource as r1 r2 = db.execute("select * from table2") for row in r1: ... for row in r2: ... # close r1. the connection is still held by r2. r1.close() # close r2. with no more references to the underlying connection resources, they # are returned to the pool. r2.close()
Where above, we again have two result sets in scope at the same time, but because they are present in the same thread, there is only one DBAPI connection in use.
While the above distinction may not seem like much, it has several potentially desirable effects. One is that you can in some cases reduce the number of concurrent connections checked out from the connection pool, in the case that a ResultProxy
is still opened and a second statement is issued. A second advantage is that by limiting the number of checked out connections in a thread to just one, you eliminate the issue of deadlocks within a single thread, such as when connection A locks a table, and connection B attempts to read from the same table in the same thread, it will "deadlock" on waiting for connection A to release its lock; the threadlocal
strategy eliminates this possibility.
A third advantage to the threadlocal
strategy is that it allows the Transaction
object to be used in combination with connectionless execution. Recall from the section on transactions, that the Transaction
is returned by the begin()
method on a Connection
; all statements which wish to participate in this transaction must be executed by the same Connection
, thereby forcing the usage of an explicit connection. However, the TLEngine
provides a Transaction
that is local to the current thread; using it, one can issue many "connectionless" statements within a thread and they will all automatically partake in the current transaction, as in the example below:
# get a TLEngine engine = create_engine('mysql://localhost/test', strategy='threadlocal') engine.begin() try: engine.execute("insert into users values (?, ?)", 1, "john") users.update(users.c.user_id==5).execute(name='ed') engine.commit() except: engine.rollback()
Notice that no Connection
needed to be used; the begin()
method on TLEngine
(which note is not available on the regular Engine
) created a Transaction
as well as a Connection
, and held onto both in a context corresponding to the current thread. Each execute()
call made use of the same connection, allowing them all to participate in the same transaction.
Complex application flows can take advantage of the "threadlocal" strategy in order to allow many disparate parts of an application to take place in the same transaction automatically. The example below demonstrates several forms of "connectionless execution" as well as some specialized explicit ones:
engine = create_engine('mysql://localhost/test', strategy='threadlocal') def dosomethingimplicit(): table1.execute("some sql") table1.execute("some other sql") def dosomethingelse(): table2.execute("some sql") conn = engine.contextual_connect() # do stuff with conn conn.execute("some other sql") conn.close() def dosomethingtransactional(): conn = engine.contextual_connect() trans = conn.begin() # do stuff trans.commit() engine.begin() try: dosomethingimplicit() dosomethingelse() dosomethingtransactional() engine.commit() except: engine.rollback()
In the above example, the program calls three functions dosomethingimplicit()
, dosomethingelse()
and dosomethingtransactional()
. All three functions use either connectionless execution, or a special function contextual_connect()
which we will describe in a moment. These two styles of execution both indicate that all executions will use the same connection object. Additionally, the method dosomethingtransactional()
begins and commits its own Transaction
. But only one transaction is used, too; it's controlled completely by the engine.begin()
/engine.commit()
calls at the bottom. Recall that Transaction
supports "nesting" behavior, whereby transactions begun on a Connection
which already has a transaction open, will "nest" into the enclosing transaction. Since the transaction opened in dosomethingtransactional()
occurs using the same connection which already has a transaction begun, it "nests" into that transaction and therefore has no effect on the actual transaction scope (unless it calls rollback()
).
Some of the functions in the above example make use of a method called engine.contextual_connect()
. This method is available on both Engine
as well as TLEngine
, and returns the Connection
that applies to the current connection context. When using the TLEngine
, this is just another term for the "thread local connection" that is being used for all connectionless executions. When using just the regular Engine
(i.e. the "default" strategy), contextual_connect()
is synonymous with connect()
. Below we illustrate that two connections opened via contextual_connect()
at the same time, both reference the same underlying DBAPI connection:
# threadlocal strategy db = create_engine('mysql://localhost/test', strategy='threadlocal') conn1 = db.contextual_connect() conn2 = db.contextual_connect() >>> conn1.connection is conn2.connection True
The basic idea of contextual_connect()
is that it's the "connection used by connectionless execution". It's different from the connect()
method in that connect()
is always used when handling an explicit Connection
, which will always reference distinct DBAPI connection. Using connect()
in combination with TLEngine
allows one to "circumvent" the current thread local context, as in this example where a single statement issues data to the database externally to the current transaction:
engine.begin() engine.execute("insert into users values (?, ?)", 1, "john") connection = engine.connect() connection.execute(users.update(users.c.user_id==5).execute(name='ed')) engine.rollback()
In the above example, a thread-local transaction is begun, but is later rolled back. The statement insert into users values (?, ?)
is executed without using a connection, therefore uses the thread-local transaction. So its data is rolled back when the transaction is rolled back. However, the users.update()
statement is executed using a distinct Connection
returned by the engine.connect()
method, so it therefore is not part of the threadlocal transaction; it autocommits immediately.
As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo
and echo_pool
flags that are present on create_engine()
, as well as the echo_uow
flag used on Session
, all interact with regular loggers.
This section assumes familiarity with the above linked logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy
namespace, as used by logging.getLogger('sqlalchemy')
. When logging has been configured (i.e. such as via logging.basicConfig()
), the general namespace of SA loggers that can be turned on is as follows:
sqlalchemy.engine
- controls SQL echoing. set to logging.INFO
for SQL query output, logging.DEBUG
for query + result set output.
sqlalchemy.pool
- controls connection pool logging. set to logging.INFO
or lower to log connection pool checkouts/checkins.
sqlalchemy.orm
- controls logging of various ORM functions. set to logging.INFO
for configurational logging as well as unit of work dumps, logging.DEBUG
for extensive logging during query and flush() operations. Subcategories of sqlalchemy.orm
include:sqlalchemy.orm.attributes
- logs certain instrumented attribute operations, such as triggered callables
sqlalchemy.orm.mapper
- logs Mapper configuration and operations
sqlalchemy.orm.unitofwork
- logs flush() operations, including dependency sort graphs and other operations
sqlalchemy.orm.strategies
- logs relation loader operations (i.e. lazy and eager loads)
sqlalchemy.orm.sync
- logs synchronization of attributes from parent to child instances during a flush()
For example, to log SQL queries as well as unit of work debugging:
import logging logging.basicConfig() logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO) logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)
By default, the log level is set to logging.ERROR
within the entire sqlalchemy
namespace so that no log operations occur, even within an application that has logging enabled otherwise.
The echo
flags present as keyword arguments to create_engine()
and others as well as the echo
property on Engine
, when set to True
, will first attempt to ensure that logging is enabled. Unfortunately, the logging
module provides no way of determining if output has already been configured (note we are referring to if a logging configuration has been set up, not just that the logging level is set). For this reason, any echo=True
flags will result in a call to logging.basicConfig()
using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration has the affect of being configured in addition to any existing logger configurations. Therefore, when using Python logging, ensure all echo flags are set to False at all times, to avoid getting duplicate log lines.
The core of SQLAlchemy's query and object mapping operations are supported by database metadata, which is comprised of Python objects that describe tables and other schema-level objects. These objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema
. There is also support for reflection of some entities, which means you only specify the name of the entities and they are recreated from the database automatically.
A collection of metadata entities is stored in an object aptly named MetaData
:
from sqlalchemy import * metadata = MetaData()
To represent a Table, use the Table
class:
users = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('email_address', String(60), key='email'), Column('password', String(20), nullable = False) ) user_prefs = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False), Column('pref_name', String(40), nullable=False), Column('pref_value', String(100)) )
The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types
as well as the global sqlalchemy
namespace.
Foreign keys are most easily specified by the ForeignKey
object within a Column
object. For a composite foreign key, i.e. a foreign key that contains multiple columns referencing multiple columns to a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:
# a table with a composite primary key invoices = Table('invoices', metadata, Column('invoice_id', Integer, primary_key=True), Column('ref_num', Integer, primary_key=True), Column('description', String(60), nullable=False) ) # a table with a composite foreign key referencing the parent table invoice_items = Table('invoice_items', metadata, Column('item_id', Integer, primary_key=True), Column('item_name', String(60), nullable=False), Column('invoice_id', Integer, nullable=False), Column('ref_num', Integer, nullable=False), ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num']) )
Above, the invoice_items
table will have ForeignKey
objects automatically added to the invoice_id
and ref_num
Column
objects as a result of the additional ForeignKeyConstraint
object.
The MetaData
object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:
>>> for t in metadata.table_iterator(reverse=False): ... print t.name users user_prefs
And Table
provides an interface to the table's properties as well as that of its columns:
employees = Table('employees', metadata, Column('employee_id', Integer, primary_key=True), Column('employee_name', String(60), nullable=False, key='name'), Column('employee_dept', Integer, ForeignKey("departments.department_id")) ) # access the column "EMPLOYEE_ID": employees.columns.employee_id # or just employees.c.employee_id # via string employees.c['employee_id'] # iterate through all columns for c in employees.c: # ... # get the table's primary key columns for primary_key in employees.primary_key: # ... # get the table's foreign key objects: for fkey in employees.foreign_keys: # ... # access the table's MetaData: employees.metadata # access the table's bound Engine or Connection, if its MetaData is bound: employees.bind # access a column's name, type, nullable, primary key, foreign key employees.c.employee_id.name employees.c.employee_id.type employees.c.employee_id.nullable employees.c.employee_id.primary_key employees.c.employee_dept.foreign_key # get the "key" of a column, which defaults to its name, but can # be any user-defined string: employees.c.name.key # access a column's table: employees.c.employee_id.table is employees >>> True # get the table related by a foreign key fcolumn = employees.c.employee_dept.foreign_key.column.table
A MetaData
object can be associated with an Engine
or an individual Connection
; this process is called binding. The term used to describe "an engine or a connection" is often referred to as a connectable. Binding allows the MetaData
and the elements which it contains to perform operations against the database directly, using the connection resources to which it's bound. Common operations which are made more convenient through binding include being able to generate SQL constructs which know how to execute themselves, creating Table
objects which query the database for their column and constraint information, and issuing CREATE or DROP statements.
To bind MetaData
to an Engine
, use the bind
attribute:
engine = create_engine('sqlite://', **kwargs) # create MetaData meta = MetaData() # bind to an engine meta.bind = engine
Once this is done, the MetaData
and its contained Table
objects can access the database directly:
meta.create_all() # issue CREATE statements for all tables # describe a table called 'users', query the database for its columns users_table = Table('users', meta, autoload=True) # generate a SELECT statement and execute result = users_table.select().execute()
Note that the feature of binding engines is completely optional. All of the operations which take advantage of "bound" MetaData
also can be given an Engine
or Connection
explicitly with which to perform the operation. The equivalent "non-bound" of the above would be:
meta.create_all(engine) # issue CREATE statements for all tables # describe a table called 'users', query the database for its columns users_table = Table('users', meta, autoload=True, autoload_with=engine) # generate a SELECT statement and execute result = engine.execute(users_table.select())
A Table
object can be created without specifying any of its contained attributes, using the argument autoload=True
in conjunction with the table's name and possibly its schema (if not the databases "default" schema). (You can also specify a list or set of column names to autoload as the kwarg include_columns, if you only want to load a subset of the columns in the actual database.) This will issue the appropriate queries to the database in order to locate all properties of the table required for SQLAlchemy to use it effectively, including its column names and datatypes, foreign and primary key constraints, and in some cases its default-value generating attributes. To use autoload=True
, the table's MetaData
object need be bound to an Engine
or Connection
, or alternatively the autoload_with=<some connectable>
argument can be passed. Below we illustrate autoloading a table and then iterating through the names of its columns:
>>> messages = Table('messages', meta, autoload=True) >>> [c.name for c in messages.columns] ['message_id', 'message_name', 'date']
Note that if a reflected table has a foreign key referencing another table, the related Table
object will be automatically created within the MetaData
object if it does not exist already. Below, suppose table shopping_cart_items
references a table shopping_carts
. After reflecting, the shopping carts
table is present:
>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload=True) >>> 'shopping_carts' in meta.tables: True
To get direct access to 'shopping_carts', simply instantiate it via the Table
constructor. Table
uses a special contructor that will return the already created Table
instance if it's already present:
shopping_carts = Table('shopping_carts', meta)
Of course, it's a good idea to use autoload=True
with the above table regardless. This is so that if it hadn't been loaded already, the operation will load the table. The autoload operation only occurs for the table if it hasn't already been loaded; once loaded, new calls to Table
will not re-issue any reflection queries.
Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.
>>> mytable = Table('mytable', meta, ... Column('id', Integer, primary_key=True), # override reflected 'id' to have primary key ... Column('mydata', Unicode(50)), # override reflected 'mydata' to be Unicode ... autoload=True)
Some databases support the concept of multiple schemas. A Table
can reference this by specifying the schema
keyword argument:
financial_info = Table('financial_info', meta, Column('id', Integer, primary_key=True), Column('value', String(100), nullable=False), schema='remote_banks' )
Within the MetaData
collection, this table will be identified by the combination of financial_info
and remote_banks
. If another table called financial_info
is referenced without the remote_banks
schema, it will refer to a different Table
. ForeignKey
objects can reference columns in this table using the form remote_banks.financial_info.id
.
ON UPDATE
and ON DELETE
clauses to a table create are specified within the ForeignKeyConstraint
object, using the onupdate
and ondelete
keyword arguments:
foobar = Table('foobar', meta, Column('id', Integer, primary_key=True), Column('lala', String(40)), ForeignKeyConstraint(['lala'],['hoho.lala'], onupdate="CASCADE", ondelete="CASCADE"))
Note that these clauses are not supported on SQLite, and require InnoDB
tables when used with MySQL. They may also not be supported on other databases.
Tables
may support database-specific options, such as MySQL's engine
option that can specify "MyISAM", "InnoDB", and other backends for the table:
addresses = Table('engine_email_addresses', meta, Column('address_id', Integer, primary_key = True), Column('remote_user_id', Integer, ForeignKey(users.c.user_id)), Column('email_address', String(20)), mysql_engine='InnoDB' )
Creating and dropping individual tables can be done via the create()
and drop()
methods of Table
; these methods take an optional bind
parameter which references an Engine
or a Connection
. If not supplied, the Engine
bound to the MetaData
will be used, else an error is raised:
meta = MetaData() meta.bind = 'sqlite:///:memory:' employees = Table('employees', meta, Column('employee_id', Integer, primary_key=True), Column('employee_name', String(60), nullable=False, key='name'), Column('employee_dept', Integer, ForeignKey("departments.department_id")) ) sqlemployees.create()
drop()
method:
sqlemployees.drop(bind=e)
The create()
and drop()
methods also support an optional keyword argument checkfirst
which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:
employees.create(bind=e, checkfirst=True) employees.drop(checkfirst=False)
Entire groups of Tables can be created and dropped directly from the MetaData
object with create_all()
and drop_all()
. These methods always check for the existence of each table before creating or dropping. Each method takes an optional bind
keyword argument which can reference an Engine
or a Connection
. If no engine is specified, the underlying bound Engine
, if any, is used:
engine = create_engine('sqlite:///:memory:') metadata = MetaData() users = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('email_address', String(60), key='email'), Column('password', String(20), nullable = False) ) user_prefs = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False), Column('pref_name', String(40), nullable=False), Column('pref_value', String(100)) ) sqlmetadata.create_all(bind=engine)
SQLAlchemy includes several constructs which provide default values provided during INSERT and UPDATE statements. The defaults may be provided as Python constants, Python functions, or SQL expressions, and the SQL expressions themselves may be "pre-executed", executed inline within the insert/update statement itself, or can be created as a SQL level "default" placed on the table definition itself. A "default" value by definition is only invoked if no explicit value is passed into the INSERT or UPDATE statement.
The "default" keyword argument on Column can reference a Python value or callable which is invoked at the time of an insert:
# a function which counts upwards i = 0 def mydefault(): global i i += 1 return i t = Table("mytable", meta, # function-based default Column('id', Integer, primary_key=True, default=mydefault), # a scalar default Column('key', String(10), default="default") )
Similarly, the "onupdate" keyword does the same thing for update statements:
import datetime t = Table("mytable", meta, Column('id', Integer, primary_key=True), # define 'last_updated' to be populated with datetime.now() Column('last_updated', DateTime, onupdate=datetime.datetime.now), )
The "default" and "onupdate" keywords may also be passed SQL expressions, including select statements or direct function calls:
t = Table("mytable", meta, Column('id', Integer, primary_key=True), # define 'create_date' to default to now() Column('create_date', DateTime, default=func.now()), # define 'key' to pull its default from the 'keyvalues' table Column('key', String(20), default=keyvalues.select(keyvalues.c.type='type1', limit=1)) # define 'last_modified' to use the current_timestamp SQL function on update Column('last_modified', DateTime, onupdate=func.current_timestamp()) )
The above SQL functions are usually executed "inline" with the INSERT or UPDATE statement being executed. In some cases, the function is "pre-executed" and its result pre-fetched explicitly. This happens under the following circumstances:
the column is a primary key column
the database dialect does not support a usable cursor.lastrowid
accessor (or equivalent); this currently includes Postgres, Oracle, and Firebird.
the statement is a single execution, i.e. only supplies one set of parameters and doesn't use "executemany" behavior
the inline=True
flag is not set on the Insert()
or Update()
construct.
For a statement execution which is not an executemany, the returned ResultProxy
will contain a collection accessible via result.postfetch_cols()
which contains a list of all Column
objects which had an inline-executed default. Similarly, all parameters which were bound to the statement, including all Python and SQL expressions which were pre-executed, are present in the last_inserted_params()
or last_updated_params()
collections on ResultProxy
. The last_inserted_ids()
collection contains a list of primary key values for the row inserted.
A variant on a SQL expression default is the server_default
, which gets placed in the CREATE TABLE statement during a create()
operation:
t = Table('test', meta, Column('abc', String(20), server_default='abc'), Column('created_at', DateTime, server_default=text("sysdate")) )
A create call for the above table will produce:
CREATE TABLE test ( abc varchar(20) default 'abc', created_at datetime default sysdate )
The behavior of server_default
is similar to that of a regular SQL default; if it's placed on a primary key column for a database which doesn't have a way to "postfetch" the ID, and the statement is not "inlined", the SQL expression is pre-executed; otherwise, SQLAlchemy lets the default fire off on the database side normally.
Columns with values set by a database trigger or other external process may be called out with a marker:
t = Table('test', meta, Column('abc', String(20), server_default=FetchedValue()) Column('def', String(20), server_onupdate=FetchedValue()) )
These markers do not emit a default
clause when the table is created, however they do set the same internal flags as a static server_default
clause, providing hints to higher-level tools that a "post-fetch" of these rows should be performed after an insert or update.
A table with a sequence looks like:
table = Table("cartitems", meta, Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True), Column("description", String(40)), Column("createdate", DateTime()) )
The Sequence
object works a lot like the default
keyword on Column
, except that it only takes effect on a database which supports sequences. When used with a database that does not support sequences, the Sequence
object has no effect; therefore it's safe to place on a table which is used against multiple database backends. The same rules for pre- and inline execution apply.
When the Sequence
is associated with a table, CREATE and DROP statements issued for that table will also issue CREATE/DROP for the sequence object as well, thus "bundling" the sequence object with its parent table.
The flag optional=True
on Sequence
will produce a sequence that is only used on databases which have no "autoincrementing" capability. For example, Postgres supports primary key generation using the SERIAL keyword, whereas Oracle has no such capability. Therefore, a Sequence
placed on a primary key column with optional=True
will only be used with an Oracle backend but not Postgres.
A sequence can also be executed standalone, using an Engine
or Connection
, returning its next value in a database-independent fashion:
seq = Sequence('some_sequence') nextid = connection.execute(seq)
Unique constraints can be created anonymously on a single column using the unique
keyword on Column
. Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint
table-level construct.
meta = MetaData() mytable = Table('mytable', meta, # per-column anonymous unique constraint Column('col1', Integer, unique=True), Column('col2', Integer), Column('col3', Integer), # explicit/composite unique constraint. 'name' is optional. UniqueConstraint('col2', 'col3', name='uix_1') )
Check constraints can be named or unnamed and can be created at the Column or Table level, using the CheckConstraint
construct. The text of the check constraint is passed directly through to the database, so there is limited "database independent" behavior. Column level check constraints generally should only refer to the column to which they are placed, while table level constraints can refer to any columns in the table.
Note that some databases do not actively support check constraints such as MySQL and SQLite.
meta = MetaData() mytable = Table('mytable', meta, # per-column CHECK constraint Column('col1', Integer, CheckConstraint('col1>5')), Column('col2', Integer), Column('col3', Integer), # table level CHECK constraint. 'name' is optional. CheckConstraint('col2 > col3 + 5', name='check1') )
Indexes can be created anonymously (using an auto-generated name "ix_index
keyword on Column
, which also modifies the usage of unique
to apply the uniqueness to the index itself, instead of adding a separate UNIQUE constraint. For indexes with specific names or which encompass more than one column, use the Index
construct, which requires a name.
Note that the Index
construct is created externally to the table which it corresponds, using Column
objects and not strings.
meta = MetaData() mytable = Table('mytable', meta, # an indexed column, with index "ix_mytable_col1" Column('col1', Integer, index=True), # a uniquely indexed column with index "ix_mytable_col2" Column('col2', Integer, index=True, unique=True), Column('col3', Integer), Column('col4', Integer), Column('col5', Integer), Column('col6', Integer), ) # place an index on col3, col4 Index('idx_col34', mytable.c.col3, mytable.c.col4) # place a unique index on col5, col6 Index('myindex', mytable.c.col5, mytable.c.col6, unique=True)
The Index
objects will be created along with the CREATE statements for the table itself. An index can also be created on its own independently of the table:
# create a table sometable.create() # define an index i = Index('someindex', sometable.c.col5) # create the index, will use the table's bound connectable if the `bind` keyword argument not specified i.create()
A Table
object created against a specific MetaData
object can be re-created against a new MetaData using the tometadata
method:
# create two metadata meta1 = MetaData('sqlite:///querytest.db') meta2 = MetaData() # load 'users' from the sqlite engine users_table = Table('users', meta1, autoload=True) # create the same Table object for the plain metadata users_table_2 = users_table.tometadata(meta2)
The package sqlalchemy.types
defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.
SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes. Types are usually used when defining tables, and can be left as a class or instantiated, for example:
mytable = Table('mytable', metadata, Column('myid', Integer, primary_key=True), Column('data', String(30)), Column('info', Unicode(100)), Column('value', Number(7,4)) )
Following is a rundown of the standard types.
This type is the base type for all string and character types, such as Unicode
, TEXT
, CLOB
, etc. By default it generates a VARCHAR in DDL. It includes an argument length
, which indicates the length in characters of the type, as well as convert_unicode
and assert_unicode
, which are booleans. length
will be used as the length argument when generating DDL. If length
is omitted, the String
type resolves into the TEXT
type.
convert_unicode=True
indicates that incoming strings, if they are Python unicode
strings, will be encoded into a raw bytestring using the encoding
attribute of the dialect (defaults to utf-8
). Similarly, raw bytestrings coming back from the database will be decoded into unicode
objects on the way back.
assert_unicode
is set to None
by default. When True
, it indicates that incoming bind parameters will be checked that they are in fact unicode
objects, else an error is raised. A value of 'warn'
instead raises a warning. Setting it to None
indicates that the dialect-level convert_unicode
setting should take place, whereas setting it to False
disables it unconditionally (this flag is new as of version 0.4.2).
Both convert_unicode
and assert_unicode
may be set at the engine level as flags to create_engine()
.
The Unicode
type is shorthand for String
with convert_unicode=True
and assert_unicode='warn'
. When writing a Unicode-aware application, it is strongly recommended that this type is used, and that only Unicode strings are used in the application. By "Unicode string" we mean a string with a u, i.e. u'hello'
. Otherwise, particularly when using the ORM, data will be converted to Unicode when it returns from the database, but local data which was generated locally will not be in Unicode format, which can create confusion.
These are the "unbounded" versions of String
and Unicode
. They have no "length" parameter, and generate a column type of TEXT or CLOB.
Numeric types return decimal.Decimal
objects by default. The flag asdecimal=False
may be specified which enables the type to pass data straight through. Numeric also takes "precision" and "scale" arguments which are used when CREATE TABLE is issued.
Float types return Python floats. Float also takes a "precision" argument which is used when CREATE TABLE is issued.
back to section topDate and time types return objects from the Python datetime
module. Most DBAPIs have built in support for the datetime module, with the noted exception of SQLite. In the case of SQLite, date and time types are stored as strings which are then converted back to datetime objects when rows are returned.
The Interval type deals with datetime.timedelta
objects. In Postgres, the native INTERVAL type is used; for others, the value is stored as a date which is relative to the "epoch" (Jan. 1, 1970).
The Binary type generates BLOB or BYTEA when tables are created, and also converts incoming values using the Binary
callable provided by each DBAPI.
Boolean typically uses BOOLEAN or SMALLINT on the CREATE TABLE side, and returns Python True
or False
.
PickleType builds upon the Binary type to apply Python's pickle.dumps()
to incoming objects, and pickle.loads()
on the way out, allowing any pickleable Python object to be stored as a serialized binary field.
These are subclasses of the generic types and include:
class FLOAT(Numeric) class TEXT(String) class DECIMAL(Numeric) class INT(Integer) INTEGER = INT class TIMESTAMP(DateTime) class DATETIME(DateTime) class CLOB(String) class VARCHAR(String) class CHAR(String) class BLOB(Binary) class BOOLEAN(Boolean)
The idea behind the SQL-specific types is that a CREATE TABLE statement would generate the exact type specified.
Each dialect has its own set of types, many of which are available only within that dialect. For example, MySQL has a BigInteger
type and Postgres has an Inet
type. To use these, import them from the module explicitly:
from sqlalchemy.databases.mysql import MSEnum, MSBigInteger table = Table('foo', meta, Column('enumerates', MSEnum('a', 'b', 'c')), Column('id', MSBigInteger) )
Or some postgres types:
from sqlalchemy.databases.postgres import PGInet, PGArray table = Table('foo', meta, Column('ipaddress', PGInet), Column('elements', PGArray(str)) # PGArray is available in 0.4, and takes a type argument )
User-defined types can be created which can augment the bind parameter and result processing capabilities of the built in types. This is usually achieved using the TypeDecorator
class, which "decorates" the behavior of any existing type. As of version 0.4.2, the new process_bind_param()
and process_result_value()
methods should be used:
import sqlalchemy.types as types class MyType(types.TypeDecorator): """a type that decorates Unicode, prefixes values with "PREFIX:" on the way in and strips it off on the way out.""" impl = types.Unicode def process_bind_param(self, value, dialect): return "PREFIX:" + value def process_result_value(self, value, dialect): return value[7:] def copy(self): return MyType(self.impl.length)
TypeDecorator
should generally be used for any user-defined type which redefines the behavior of another type, including other TypeDecorator
subclasses such as PickleType
, and the new process_...()
methods described above should be used.
To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine
:
import sqlalchemy.types as types class MyType(types.TypeEngine): def __init__(self, precision = 8): self.precision = precision def get_col_spec(self): return "MYTYPE(%s)" % self.precision def bind_processor(self, dialect): def process(value): return value return process def result_processor(self, dialect): def process(value): return value return process def convert_result_value(self, value, dialect): return value
Once you make your type, it's immediately useable:
table = Table('foo', meta, Column('id', Integer, primary_key=True), Column('data', MyType(16)) )
This section describes the connection pool module of SQLAlchemy. The Pool
object it provides is normally embedded within an Engine
instance. For most cases, explicit access to the pool module is not required. However, the Pool
object can be used on its own, without the rest of SA, to manage DBAPI connections; this section describes that usage. Also, this section will describe in more detail how to customize the pooling strategy used by an Engine
.
At the base of any database helper library is a system of efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools typically are configured to maintain a certain "size", which represents how many connections can be used simultaneously without resorting to creating more newly-established connections.
Any DBAPI module can be "proxied" through the connection pool using the following technique (note that the usage of 'psycopg2' is just an example; substitute whatever DBAPI module you'd like):
import sqlalchemy.pool as pool import psycopg2 as psycopg psycopg = pool.manage(psycopg) # then connect normally connection = psycopg.connect(database='test', username='scott', password='tiger')
This produces a sqlalchemy.pool.DBProxy
object which supports the same connect()
function as the original DBAPI module. Upon connection, a connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool
) that corresponds to the exact connection arguments sent to the connect()
function.
The connection proxy supports all of the methods on the original connection object, most of which are proxied via __getattr__()
. The close()
method will return the connection to the pool, and the cursor()
method will return a proxied cursor object. Both the connection proxy and the cursor proxy will also return the underlying connection to the pool after they have both been garbage collected, which is detected via the __del__()
method.
Additionally, when connections are returned to the pool, a rollback()
is issued on the connection unconditionally. This is to release any locks still held by the connection that may have resulted from normal activity.
By default, the connect()
method will return the same connection that is already checked out in the current thread. This allows a particular connection to be used in a given thread without needing to pass it around between functions. To disable this behavior, specify use_threadlocal=False
to the manage()
function.
For all types of Pool construction, which includes the "transparent proxy" described in the previous section, using an Engine
via create_engine()
, or constructing a pool through direct class instantiation, the options are generally the same. Additional options may be available based on the specific subclass of Pool
being used.
For a description of all pool classes, see the generated documentation.
Common options include:
echo_pool
.
create_engine()
, corresponding to the "plain" or
"threadlocal" connection strategy.
QueuePool options include:
Besides using the transparent proxy, instances of sqlalchemy.pool.Pool
can be created directly. Constructing your own pool involves passing a callable used to create a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes some initialization commands to start.
import sqlalchemy.pool as pool import psycopg2 def getconn(): c = psycopg2.connect(username='ed', host='127.0.0.1', dbname='test') # execute an initialization function on the connection before returning c.cursor.execute("setup_encodings()") return c p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)
Or with SingletonThreadPool:
import sqlalchemy.pool as pool import sqlite def getconn(): return sqlite.connect(filename='myfile.db') # SQLite connections require the SingletonThreadPool p = pool.SingletonThreadPool(getconn)
SQLAlchemy has a variety of extensions available which provide extra functionality to SA, either via explicit usage or by augmenting the core behavior. Several of these extensions are designed to work together.
Author: Mike Bayer
Version: 0.4.4 or greater
declarative
intends to be a fully featured replacement for the very old activemapper
extension. Its goal is to redefine the organization of class, Table
, and mapper()
constructs such that they can all be defined "at once" underneath a class declaration. Unlike activemapper
, it does not redefine normal SQLAlchemy configurational semantics - regular Column
, relation()
and other schema or ORM constructs are used in almost all cases.
declarative
is a so-called "micro declarative layer"; it does not generate table or column names and requires almost as fully verbose a configuration as that of straight tables and mappers. As an alternative, the Elixir project is a full community-supported declarative layer for SQLAlchemy, and is recommended for its active-record-like semantics, its convention-based configuration, and plugin capabilities.
SQLAlchemy object-relational configuration involves the usage of Table, mapper(), and class objects to define the three areas of configuration. declarative moves these three types of configuration underneath the individual mapped class. Regular SQLAlchemy schema and ORM constructs are used in most cases:
from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class SomeClass(Base): __tablename__ = 'some_table' id = Column('id', Integer, primary_key=True) name = Column('name', String(50))
Above, the declarative_base
callable produces a new base class from which all mapped classes inherit from. When the class definition is
completed, a new Table
and mapper()
have been generated, accessible via the __table__
and __mapper__
attributes on the
SomeClass
class.
You may omit the names from the Column definitions. Declarative will fill them in for you:
class SomeClass(Base): __tablename__ = 'some_table' id = Column(Integer, primary_key=True) name = Column(String(50))
Attributes may be added to the class after its construction, and they will be added to the underlying Table
and mapper()
definitions as
appropriate:
SomeClass.data = Column('data', Unicode) SomeClass.related = relation(RelatedInfo)
Classes which are mapped explicitly using mapper()
can interact freely with declarative classes.
The declarative_base
base class contains a MetaData
object where newly defined Table
objects are collected. This is accessed via the metadata
class level accessor, so to create tables we can say:
engine = create_engine('sqlite://') Base.metadata.create_all(engine)
The Engine
created above may also be directly associated with the declarative base class using the bind
keyword argument, where it will be associated with the underlying MetaData
object and allow SQL operations involving that metadata and its tables to make use of that engine automatically:
Base = declarative_base(bind=create_engine('sqlite://'))
Or, as MetaData
allows, at any time using the bind
attribute:
Base.metadata.bind = create_engine('sqlite://')
The declarative_base
can also receive a pre-created MetaData
object, which allows a declarative setup to be associated with an already existing traditional collection of Table
objects:
mymetadata = MetaData() Base = declarative_base(metadata=mymetadata)
Relations to other classes are done in the usual way, with the added feature that the class specified to relation()
may be a string name. The
"class registry" associated with Base
is used at mapper compilation time to resolve the name into the actual class object, which is expected to
have been defined once the mapper configuration is used:
class User(Base): __tablename__ = 'users' id = Column('id', Integer, primary_key=True) name = Column('name', String(50)) addresses = relation("Address", backref="user") class Address(Base): __tablename__ = 'addresses' id = Column('id', Integer, primary_key=True) email = Column('email', String(50)) user_id = Column('user_id', Integer, ForeignKey('users.id'))
Column constructs, since they are just that, are immediately usable, as below where we define a primary join condition on the Address
class
using them:
class Address(Base) __tablename__ = 'addresses' id = Column('id', Integer, primary_key=True) email = Column('email', String(50)) user_id = Column('user_id', Integer, ForeignKey('users.id')) user = relation(User, primaryjoin=user_id==User.id)
In addition to the main argument for relation
, other arguments
which depend upon the columns present on an as-yet undefined class
may also be specified as strings. These strings are evaluated as
Python expressions. The full namespace available within this
evaluation includes all classes mapped for this declarative base,
as well as the contents of the sqlalchemy
package, including
expression functions like desc
and func
:
class User(Base): # .... addresses = relation("Address", order_by="desc(Address.email)", primaryjoin="Address.user_id==User.id")
As an alternative to string-based attributes, attributes may also be defined after all classes have been created. Just add them to the target class after the fact:
User.addresses = relation(Address, primaryjoin=Address.user_id==User.id)
Synonyms are one area where declarative
needs to slightly change the usual SQLAlchemy configurational syntax. To define a
getter/setter which proxies to an underlying attribute, use synonym
with the instruments
argument:
class MyClass(Base): __tablename__ = 'sometable' _attr = Column('attr', String) def _get_attr(self): return self._some_attr def _set_attr(self, attr) self._some_attr = attr attr = synonym('_attr', instruments=property(_get_attr, _set_attr))
The above synonym is then usable as an instance attribute as well as a class-level expression construct:
x = MyClass() x.attr = "some value" session.query(MyClass).filter(MyClass.attr == 'some other value').all()
The synonym_for
decorator can accomplish the same task:
class MyClass(Base): __tablename__ = 'sometable' _attr = Column('attr', String) @synonym_for('_attr') @property def attr(self): return self._some_attr
Similarly, comparable_using
is a front end for the comparable_property
ORM function:
class MyClass(Base): __tablename__ = 'sometable' name = Column('name', String) @comparable_using(MyUpperCaseComparator) @property def uc_name(self): return self.name.upper()
As an alternative to __tablename__
, a direct Table
construct may be used. The Column
objects, which in this case require their names, will be added to the mapping just like a regular mapping to a table:
class MyClass(Base): __table__ = Table('my_table', Base.metadata, Column('id', Integer, primary_key=True), Column('name', String(50)) )
Other table-based attributes include __table_args__
, which is
either a dictionary as in:
class MyClass(Base) __tablename__ = 'sometable' __table_args__ = {'mysql_engine':'InnoDB'}
or a dictionary-containing tuple in the form
(arg1, arg2, ..., {kwarg1:value, ...})
, as in:
class MyClass(Base) __tablename__ = 'sometable' __table_args__ = (ForeignKeyConstraint(['id'], ['remote_table.id']), {'autoload':True})
Mapper arguments are specified using the __mapper_args__
class variable. Note that the column objects declared on the class are immediately
usable, as in this joined-table inheritance example:
class Person(Base): __tablename__ = 'people' id = Column('id', Integer, primary_key=True) discriminator = Column('type', String(50)) __mapper_args__ = {'polymorphic_on':discriminator} class Engineer(Person): __tablename__ = 'engineers' __mapper_args__ = {'polymorphic_identity':'engineer'} id = Column('id', Integer, ForeignKey('people.id'), primary_key=True) primary_language = Column('primary_language', String(50))
For single-table inheritance, the __tablename__
and __table__
class variables are optional on a class when the class inherits from another
mapped class.
As a convenience feature, the declarative_base()
sets a default constructor on classes which takes keyword arguments, and assigns them to the
named attributes:
e = Engineer(primary_language='python')
Note that declarative
has no integration built in with sessions, and is only intended as an optional syntax for the regular usage of mappers
and Table objects. A typical application setup using scoped_session
might look like:
engine = create_engine('postgres://scott:tiger@localhost/test') Session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine)) Base = declarative_base()
Mapped instances then make usage of Session
in the usual way.
Author: Mike Bayer and Jason Kirtland
Version: 0.3.1 or greater
associationproxy
is used to create a simplified, read/write view of a relationship. It can be used to cherry-pick fields from a collection of related objects or to greatly simplify access to associated objects in an association relationship.
Consider this "association object" mapping:
users_table = Table('users', metadata, Column('id', Integer, primary_key=True), Column('name', String(64)), ) keywords_table = Table('keywords', metadata, Column('id', Integer, primary_key=True), Column('keyword', String(64)) ) userkeywords_table = Table('userkeywords', metadata, Column('user_id', Integer, ForeignKey("users.id"), primary_key=True), Column('keyword_id', Integer, ForeignKey("keywords.id"), primary_key=True) ) class User(object): def __init__(self, name): self.name = name class Keyword(object): def __init__(self, keyword): self.keyword = keyword mapper(User, users_table, properties={ 'kw': relation(Keyword, secondary=userkeywords_table) }) mapper(Keyword, keywords_table)
Above are three simple tables, modeling users, keywords and a many-to-many relationship between the two. These Keyword
objects are little more than a container for a name, and accessing them via the relation is awkward:
user = User('jek') user.kw.append(Keyword('cheese inspector')) print user.kw # [<__main__.Keyword object at 0xb791ea0c>] print user.kw[0].keyword # 'cheese inspector' print [keyword.keyword for keyword in u._keywords] # ['cheese inspector']
With association_proxy
you have a "view" of the relation that contains just the .keyword
of the related objects. The proxy is a Python property, and unlike the mapper relation, is defined in your class:
from sqlalchemy.ext.associationproxy import association_proxy class User(object): def __init__(self, name): self.name = name # proxy the 'keyword' attribute from the 'kw' relation keywords = association_proxy('kw', 'keyword') # ... >>> user.kw [<__main__.Keyword object at 0xb791ea0c>] >>> user.keywords ['cheese inspector'] >>> user.keywords.append('snack ninja') >>> user.keywords ['cheese inspector', 'snack ninja'] >>> user.kw [<__main__.Keyword object at 0x9272a4c>, <__main__.Keyword object at 0xb7b396ec>]
The proxy is read/write. New associated objects are created on demand when values are added to the proxy, and modifying or removing an entry through the proxy also affects the underlying collection.
creator
function can be used to create instances instead.
Above, the Keyword.__init__
takes a single argument keyword
, which maps conveniently to the value being set through the proxy. A creator
function could have been used instead if more flexiblity was required.
Because the proxies are backed a regular relation collection, all of the usual hooks and patterns for using collections are still in effect. The most convenient behavior is the automatic setting of "parent"-type relationships on assignment. In the example above, nothing special had to be done to associate the Keyword to the User. Simply adding it to the collection is sufficient.
back to section topAssociation proxies are also useful for keeping association objects out the way during regular use. For example, the userkeywords
table might have a bunch of auditing columns that need to get updated when changes are made- columns that are updated but seldom, if ever, accessed in your application. A proxy can provide a very natural access pattern for the relation.
from sqlalchemy.ext.associationproxy import association_proxy # users_table and keywords_table tables as above, then: userkeywords_table = Table('userkeywords', metadata, Column('user_id', Integer, ForeignKey("users.id"), primary_key=True), Column('keyword_id', Integer, ForeignKey("keywords.id"), primary_key=True), # add some auditing columns Column('updated_at', DateTime, default=datetime.now), Column('updated_by', Integer, default=get_current_uid, onupdate=get_current_uid), ) def _create_uk_by_keyword(keyword): """A creator function.""" return UserKeyword(keyword=keyword) class User(object): def __init__(self, name): self.name = name keywords = association_proxy('user_keywords', 'keyword', creator=_create_uk_by_keyword) class Keyword(object): def __init__(self, keyword): self.keyword = keyword def __repr__(self): return 'Keyword(%s)' % repr(self.keyword) class UserKeyword(object): def __init__(self, user=None, keyword=None): self.user = user self.keyword = keyword mapper(User, users_table, properties={ 'user_keywords': relation(UserKeyword) }) mapper(Keyword, keywords_table) mapper(UserKeyword, userkeywords_table, properties={ 'user': relation(User), 'keyword': relation(Keyword), }) user = User('log') kw1 = Keyword('new_from_blammo') # Adding a Keyword requires creating a UserKeyword association object user.user_keywords.append(UserKeyword(user, kw1)) # And accessing Keywords requires traversing UserKeywords print user.user_keywords[0] # <__main__.UserKeyword object at 0xb79bbbec> print user.user_keywords[0].keyword # Keyword('new_from_blammo') # Lots of work. # It's much easier to go through the association proxy! for kw in (Keyword('its_big'), Keyword('its_heavy'), Keyword('its_wood')): user.keywords.append(kw) print user.keywords # [Keyword('new_from_blammo'), Keyword('its_big'), Keyword('its_heavy'), Keyword('its_wood')]
stocks = Table("stocks", meta, Column('symbol', String(10), primary_key=True), Column('description', String(100), nullable=False), Column('last_price', Numeric) ) brokers = Table("brokers", meta, Column('id', Integer,primary_key=True), Column('name', String(100), nullable=False) ) holdings = Table("holdings", meta, Column('broker_id', Integer, ForeignKey('brokers.id'), primary_key=True), Column('symbol', String(10), ForeignKey('stocks.symbol'), primary_key=True), Column('shares', Integer) )
Above are three tables, modeling stocks, their brokers and the number of shares of a stock held by each broker. This situation is quite different from the association example above. shares
is a property of the relation, an important one that we need to use all the time.
For this example, it would be very convenient if Broker
objects had a dictionary collection that mapped Stock
instances to the shares held for each. That's easy.
from sqlalchemy.ext.associationproxy import association_proxy from sqlalchemy.orm.collections import attribute_mapped_collection def _create_holding(stock, shares): """A creator function, constructs Holdings from Stock and share quantity.""" return Holding(stock=stock, shares=shares) class Broker(object): def __init__(self, name): self.name = name holdings = association_proxy('by_stock', 'shares', creator=_create_holding) class Stock(object): def __init__(self, symbol, description=None): self.symbol = symbol self.description = description self.last_price = 0 class Holding(object): def __init__(self, broker=None, stock=None, shares=0): self.broker = broker self.stock = stock self.shares = shares mapper(Stock, stocks_table) mapper(Broker, brokers_table, properties={ 'by_stock': relation(Holding, collection_class=attribute_mapped_collection('stock')) }) mapper(Holding, holdings_table, properties={ 'stock': relation(Stock), 'broker': relation(Broker) })
Above, we've set up the 'by_stock' relation collection to act as a dictionary, using the .stock
property of each Holding as a key.
Populating and accessing that dictionary manually is slightly inconvenient because of the complexity of the Holdings association object:
stock = Stock('ZZK') broker = Broker('paj') broker.holdings[stock] = Holding(broker, stock, 10) print broker.holdings[stock].shares # 10
The by_stock
proxy we've added to the Broker
class hides the details of the Holding
while also giving access to .shares
:
for stock in (Stock('JEK'), Stock('STPZ')): broker.holdings[stock] = 123 for stock, shares in broker.holdings.items(): print stock, shares # lets take a peek at that holdings_table after committing changes to the db print list(holdings_table.select().execute()) # [(1, 'ZZK', 10), (1, 'JEK', 123), (1, 'STEPZ', 123)]
Further examples can be found in the examples/
directory in the SQLAlchemy distribution.
The association_proxy
convenience function is not present in SQLAlchemy versions 0.3.1 through 0.3.7, instead instantiate the class directly:
from sqlalchemy.ext.associationproxy import AssociationProxy class Article(object): keywords = AssociationProxy('keyword_associations', 'keyword')
Author: Jason Kirtland
orderinglist
is a helper for mutable ordered relations. It will intercept
list operations performed on a relation collection and automatically
synchronize changes in list position with an attribute on the related objects.
(See advdatamapping_properties_entitycollections for more information on the general pattern.)
Example: Two tables that store slides in a presentation. Each slide has a number of bullet points, displayed in order by the 'position' column on the bullets table. These bullets can be inserted and re-ordered by your end users, and you need to update the 'position' column of all affected rows when changes are made.
slides_table = Table('Slides', metadata, Column('id', Integer, primary_key=True), Column('name', String)) bullets_table = Table('Bullets', metadata, Column('id', Integer, primary_key=True), Column('slide_id', Integer, ForeignKey('Slides.id')), Column('position', Integer), Column('text', String)) class Slide(object): pass class Bullet(object): pass mapper(Slide, slides_table, properties={ 'bullets': relation(Bullet, order_by=[bullets_table.c.position]) }) mapper(Bullet, bullets_table)
The standard relation mapping will produce a list-like attribute on each Slide containing all related Bullets, but coping with changes in ordering is totally your responsibility. If you insert a Bullet into that list, there is no magic- it won't have a position attribute unless you assign it it one, and you'll need to manually renumber all the subsequent Bullets in the list to accommodate the insert.
An orderinglist
can automate this and manage the 'position' attribute on all
related bullets for you.
mapper(Slide, slides_table, properties={ 'bullets': relation(Bullet, collection_class=ordering_list('position'), order_by=[bullets_table.c.position]) }) mapper(Bullet, bullets_table) s = Slide() s.bullets.append(Bullet()) s.bullets.append(Bullet()) s.bullets[1].position >>> 1 s.bullets.insert(1, Bullet()) s.bullets[2].position >>> 2
Use the ordering_list
function to set up the collection_class
on relations
(as in the mapper example above). This implementation depends on the list
starting in the proper order, so be SURE to put an order_by on your relation.
ordering_list
takes the name of the related object's ordering attribute as
an argument. By default, the zero-based integer index of the object's
position in the ordering_list
is synchronized with the ordering attribute:
index 0 will get position 0, index 1 position 1, etc. To start numbering at 1
or some other integer, provide count_from=1
.
Ordering values are not limited to incrementing integers. Almost any scheme
can implemented by supplying a custom ordering_func
that maps a Python list
index to any value you require. See the module
documentation for more
information, and also check out the unit tests for examples of stepped
numbering, alphabetical and Fibonacci numbering.
Author: Jonathan Ellis
SqlSoup creates mapped classes on the fly from tables, which are automatically reflected from the database based on name. It is essentially a nicer version of the "row data gateway" pattern.
>>> from sqlalchemy.ext.sqlsoup import SqlSoup >>> soup = SqlSoup('sqlite:///') >>> db.users.select(order_by=[db.users.c.name]) [MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1), MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]
Full SqlSoup documentation is on the SQLAlchemy Wiki.
back to section topThis is the MIT license: http://www.opensource.org/licenses/mit-license.php
Copyright (c) 2005, 2006, 2007, 2008 Michael Bayer and contributors. SQLAlchemy is a trademark of Michael Bayer.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
back to section top