Data Mapping
Version: 0.1.6 Last Updated: 04/12/06 12:42:40
Basic Data Mapping

Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.

The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.

When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "commits" the current transactional context (known as a Unit of Work). The __init__() method of the object is also decorated to communicate changes when new instances of the object are created.
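The change-tracking accessors described above can be illustrated in plain Python. This is only a rough sketch of the idea, not SQLAlchemy's implementation: the names TrackedAttribute, attach_tracking and the "_modified" set are hypothetical and exist only for this example.

```python
class TrackedAttribute(object):
    """Descriptor that stores a value and records the attribute as modified."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        # remember which attributes changed since the last "commit"
        instance.__dict__.setdefault("_modified", set()).add(self.name)


def attach_tracking(cls, column_names):
    """Hypothetical helper: install one tracking descriptor per column name."""
    for name in column_names:
        setattr(cls, name, TrackedAttribute(name))
    return cls


class User(object):
    pass


attach_tracking(User, ["user_id", "user_name", "password"])

u = User()
u.user_name = "fred"
modified = sorted(u.__dict__["_modified"])
print(modified)
```

Only user_name was assigned, so it is the only name recorded; a commit step could consult that set to decide what to write.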

The Mapper also provides the interface by which instances of the object are loaded from the database. The primary method for this is its select() method, which takes arguments similar to those of a sqlalchemy.sql.Select object. Unlike a Select, however, it executes immediately and returns results instead of awaiting an execute() call, and it returns a list of objects instead of a cursor-like object.

The three elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.

Synopsis

This is the simplest form of a full "round trip": creating table metadata, creating a class, mapping the class to the table, getting some results, and saving changes. The following sections dig deeper into the capabilities available for each concept.

from sqlalchemy import *

# engine
engine = create_engine("sqlite://mydb.db")

# table metadata
users = Table('users', engine, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# class definition 
class User(object):
    pass

# create a mapper
usermapper = mapper(User, users)

# select
user = usermapper.select_by(user_name='fred')[0]

# modify
user.user_name = 'fred jones'

# commit - saves everything that changed
objectstore.commit()

Attaching Mappers to their Class

For convenience's sake, the Mapper can be attached as an attribute on the class itself as well:

User.mapper = mapper(User, users)

userlist = User.mapper.select_by(user_id=12)

There is also a full-blown "monkeypatch" function that creates a primary mapper, attaches the above mapper class property, and also the methods get, get_by, select, select_by, selectone, selectfirst, commit, expire, refresh, expunge and delete:

# "assign" a mapper to the User class/users table
assign_mapper(User, users)

# methods are attached to the class for selecting
userlist = User.select_by(user_id=12)

myuser = User.get(1)

# mark an object as deleted for the next commit
myuser.delete()

# commit the changes on a specific object
myotheruser.commit()

Other methods of associating mappers and finder methods with their corresponding classes, such as via common base classes or mixins, can be devised as well. SQLAlchemy does not aim to dictate application architecture and will always allow the broadest variety of architectural patterns, but may include more helper objects and suggested architectures in the future.

Overriding Properties

A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. Currently, the easiest way to do this in SQLAlchemy is the way it would be done in any Python program: define your attribute with a different name, such as "_attribute", and use a property to get/set its value. The mapper just needs to be told of the special name:

class MyClass(object):
    def _set_email(self, email):
        self._email = email
    def _get_email(self):
        return self._email
    email = property(_get_email, _set_email)

m = mapper(MyClass, mytable, properties = {
        # map the '_email' attribute to the "email" column
        # on the table
        '_email': mytable.c.email
})

In a later release, SQLAlchemy will also allow _get_email and _set_email to be attached directly to the "email" property created by the mapper, and will also allow this association to occur via decorators.
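As a small standalone illustration of what the property override makes possible, the sketch below normalises the value on the way in. It involves no mapper at all; the normalisation rule (strip and lowercase) is an assumption chosen only for the example.

```python
class MyClass(object):
    def _get_email(self):
        return self._email

    def _set_email(self, email):
        # custom behaviour: normalise before storing in the "_email" slot,
        # which is the attribute a mapper would be told about
        self._email = email.strip().lower()

    email = property(_get_email, _set_email)


obj = MyClass()
obj.email = "  Fred@Example.COM "
print(obj.email)
```

Application code reads and writes "email", while the stored attribute remains "_email".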

Selecting from a Mapper

There are a variety of ways to select from a mapper. These range from minimalist to explicit. Below is a synopsis of these methods:

# select_by, using property names or column names as keys
# the keys are grouped together by an AND operator
result = mapper.select_by(name='john', street='123 green street')

# select_by can also combine SQL criterion with key/value properties
result = mapper.select_by(users.c.user_name=='john', 
        addresses.c.zip_code=='12345', street='123 green street')

# get_by, which takes the same arguments as select_by
# returns a single scalar result or None if no results
user = mapper.get_by(id=12)

# "dynamic" versions of select_by and get_by - everything past the 
# "select_by_" or "get_by_" is used as the key, and the function argument
# as the value
result = mapper.select_by_name('fred')
u = mapper.get_by_name('fred')

# get an object directly from its primary key.  this will bypass the SQL
# call if the object has already been loaded
u = mapper.get(15)

# get an object that has a composite primary key of three columns.
# the order of the arguments matches that of the table meta data.
myobj = mapper.get(27, 3, 'receipts')

# using a WHERE criterion
result = mapper.select(or_(users.c.user_name == 'john', users.c.user_name=='fred'))

# using a WHERE criterion to get a scalar
u = mapper.selectfirst(users.c.user_name=='john')

# selectone() is a stricter version of selectfirst() which
# will raise an exception if there is not exactly one row
u = mapper.selectone(users.c.user_name=='john')

# using a full select object
result = mapper.select(users.select(users.c.user_name=='john'))

# using straight text  
result = mapper.select_text("select * from users where user_name='fred'")

# or using a "text" object
result = mapper.select(text("select * from users where user_name='fred'", engine=engine))

Some of the examples above illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:

User.mapper = mapper(User, users)

userlist = User.mapper.select(User.c.user_id==12)
Saving Objects

When objects corresponding to mapped classes are created or manipulated, all changes are logged by a package called sqlalchemy.mapping.objectstore. The changes are then written to the database when an application calls objectstore.commit(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The commit() operation uses a transaction as well, and will also perform "concurrency checking" to ensure that the proper number of rows were in fact affected (not supported with the current MySQL drivers). Transactional resources are used effectively in all cases; the unit of work handles all the details.

The Unit of Work is a powerful tool, and has some important concepts that must be understood in order to use it effectively. While this section illustrates rudimentary Unit of Work usage, you are strongly encouraged to consult the Unit of Work section for a full description of all its operations, including session control, deletion, and developmental guidelines.

When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes, and its __init__() method is also decorated to mark new objects as "new".

User.mapper = mapper(User, users)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User      
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# load a third User from the database
myuser3 = User.mapper.select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones'

# save all changes
objectstore.commit()

In the examples above, we defined a User class with basically no properties or methods. There's no particular reason it has to be this way; the class can explicitly set up whatever properties it wants, whether or not they will be managed by the mapper. It can also specify a constructor, with the restriction that the constructor is able to function with no arguments being passed to it (this restriction can be lifted with some extra parameters to the mapper; more on that later):

class User(object):
    def __init__(self, user_name = None, password = None):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (repr(self.user_id), 
            repr(self.user_name), repr(self.password))
User.mapper = mapper(User, users)

u = User('john', 'foo')
objectstore.commit()
>>> u
User id 1 name 'john' password 'foo'

Recent versions of SQLAlchemy place only the modified attributes of an object into the UPDATE statements generated upon commit. This conserves database traffic and also allows correct interaction with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isn't loaded until referenced by the application.
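The idea of writing only the changed columns can be sketched with the standard sqlite3 module. The build_update function below is hypothetical and is not part of SQLAlchemy; it simply shows how a set of modified attribute names could drive the SET clause.

```python
import sqlite3


def build_update(table, pk_name, pk_value, row, modified):
    """Return (sql, params) updating only the columns named in `modified`."""
    columns = sorted(modified)
    if not columns:
        return None  # nothing changed, so no UPDATE is needed
    assignments = ", ".join("%s=:%s" % (c, c) for c in columns)
    sql = "UPDATE %s SET %s WHERE %s = :pk" % (table, assignments, pk_name)
    params = dict((c, row[c]) for c in columns)
    params["pk"] = pk_value
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, password TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'fred', 'secret')")

# in-memory state of the object; only user_name was changed
row = {"user_name": "fred jones", "password": "secret"}
sql, params = build_update("users", "user_id", 1, row, {"user_name"})
conn.execute(sql, params)

result = conn.execute(
    "SELECT user_name, password FROM users WHERE user_id = 1"
).fetchone()
print(sql)
```

The password column never appears in the statement, so a column that was never loaded would not be overwritten.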

Defining and Using Relationships

So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function provided by the mapper module. So with our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:

from sqlalchemy import *
engine = create_engine('sqlite://filename=mydb')

# define user table
users = Table('users', engine, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses = Table('addresses', engine,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column into a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, this ForeignKey object is the default method by which the relationship is determined (although if you didn't define ForeignKeys, or you want to specify explicit relationship columns, that is available as well).

So then let's define two classes, the familiar User class, as well as an Address class:

class User(object):
    def __init__(self, user_name = None, password = None):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street=None, city=None, state=None, zip=None):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip

And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties which is a dictionary relating the name of an object property to a database relationship, in this case a relation object against a newly defined mapper for the Address class:

User.mapper = mapper(User, users, properties = {
                    'addresses' : relation(mapper(Address, addresses))
                }
              )

Let's do some operations with these classes and see what happens:

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

objectstore.commit()
INSERT INTO users (user_name, password) VALUES (:user_name, :password)
{'password': 'hihilala', 'user_name': 'jane'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'big city', 'state': 'UT', 'street': '123 anywhere street', 'user_id':1, 'zip': '76543'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'some other city', 'state': 'OK', 'street': '1 Park Place', 'user_id':1, 'zip': '83923'}

A lot just happened there! The Mapper object figured out how to relate rows in the addresses table to the users table, and also upon commit had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have all their new primary and foreign keys populated.
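One common way to derive such an insert order is a topological sort over the foreign key dependencies. The sketch below uses the standard graphlib module (Python 3.9 or later); it is an illustration of the general technique only, and this document does not state that SQLAlchemy computes the order this way. The depends_on dictionary is a hand-written stand-in for the table metadata.

```python
from graphlib import TopologicalSorter

# table name -> set of tables it references through a foreign key
depends_on = {
    "users": set(),
    "addresses": {"users"},
}

# referenced tables come first, so parent rows exist before child rows
insert_order = list(TopologicalSorter(depends_on).static_order())

# deletes run the other way round, children before parents
delete_order = list(reversed(insert_order))

print(insert_order)
print(delete_order)
```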

Also notice that when we created a Mapper on the User class which defined an 'addresses' relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property accessor function, which returns an instance of sqlalchemy.util.HistoryArraySet, which fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

objectstore.commit()
UPDATE addresses SET user_id=:user_id
WHERE addresses.address_id = :addresses_address_id
[{'user_id': None, 'addresses_address_id': 2}]
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}
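
The bookkeeping behind such a list can be sketched in a few lines of plain Python. HistoryList below is a hypothetical stand-in written for this example, not the real sqlalchemy.util.HistoryArraySet, and it only handles append() and deletion by integer index.

```python
class HistoryList(list):
    """List that records which items were appended or removed."""

    def __init__(self, items=()):
        super().__init__()
        self.added = []
        self.deleted = []
        for item in items:
            list.append(self, item)  # items loaded up front are not "added"

    def append(self, item):
        # keep membership unique, based on in-memory identity
        if any(existing is item for existing in self):
            return
        list.append(self, item)
        self.added.append(item)

    def __delitem__(self, index):
        # integer indexes only in this sketch
        item = self[index]
        list.__delitem__(self, index)
        if any(a is item for a in self.added):
            # never persisted, so just forget the pending addition
            self.added = [a for a in self.added if a is not item]
        else:
            self.deleted.append(item)


addresses = HistoryList(["123 anywhere street", "1 Park Place"])
del addresses[1]
addresses.append("27 New Place")

print(addresses.added)
print(addresses.deleted)
```

At commit time, the "added" entries would become INSERT statements and the "deleted" entries would become the UPDATE (or, for a private relation, DELETE) statements.
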
Useful Feature: Private Relations

So the one address that was removed from the list was updated to have a user_id of None, and a new address row was inserted to correspond to the new Address added to the User. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the private=True parameter of relation:

User.mapper = mapper(User, users, properties = {
                    'addresses' : relation(mapper(Address, addresses), private=True)
                }
              )
del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

objectstore.commit()
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}
DELETE FROM addresses WHERE addresses.address_id = :address_id
[{'address_id': 2}]

In this case, with the private flag set, the element that was removed from the addresses list was also removed from the database. Specifying the private flag on a relation indicates to the Mapper that the related objects exist only as children of the parent object, and should otherwise be deleted.

Useful Feature: Backreferences

By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, even without any database queries being executed. Below, the User mapper is created with an "addresses" property, and the corresponding Address mapper receives a "backreference" to the User object via the property name "user":

Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties = {
                'addresses' : relation(Address.mapper, backref='user')
            }
          )

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True
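
How two ends of a relationship can stay in sync without any SQL can be sketched in plain Python. The classes below are stand-ins written for this example (AddressList in particular is hypothetical); they show the general mechanism only, not how SQLAlchemy implements backref.

```python
class AddressList(list):
    """List owned by a User; appending also sets the address's user."""

    def __init__(self, owner):
        super().__init__()
        self.owner = owner

    def append(self, address):
        list.append(self, address)
        address.user = self.owner  # keep the other side in sync


class User(object):
    def __init__(self, name):
        self.name = name
        self.addresses = AddressList(self)


class Address(object):
    def __init__(self, street):
        self.street = street
        self._user = None

    @property
    def user(self):
        return self._user

    @user.setter
    def user(self, new_user):
        old_user = self._user
        if old_user is new_user:
            return
        self._user = new_user
        # bypass AddressList.append here to avoid re-entering this setter
        if old_user is not None and self in old_user.addresses:
            list.remove(old_user.addresses, self)
        if new_user is not None and self not in new_user.addresses:
            list.append(new_user.addresses, self)


u = User("fred")
a1 = Address("123 anywhere street")
a2 = Address("1 Park Place")

u.addresses.append(a1)  # sets a1.user
a2.user = u             # appends a2 to u.addresses

print(u.addresses == [a1, a2])
```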

The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:

Address.mapper = mapper(Address, addresses)

User.mapper = mapper(User, users, properties = {
                'addresses' : relation(Address.mapper, 
                    backref=backref('user', lazy=False, private=True))
            }
          )
Creating Relationships Automatically with cascade_mappers

The mapper package has a helper function cascade_mappers() which can simplify the task of linking several mappers together. Given a list of classes and/or mappers, it identifies the foreign key relationships between the given mappers or corresponding class mappers, and creates relation() objects representing those relationships, including a backreference. It also attempts to find the "secondary" table in a many-to-many relationship. The name of each relation is a lowercase version of the related class name. In the case of one-to-many or many-to-many, the name is "pluralized", which currently is based on the English language (i.e. an 's' or 'es' added to it):

# create two mappers.  the 'users' and 'addresses' tables have a foreign key
# relationship
mapper1 = mapper(User, users)
mapper2 = mapper(Address, addresses)

# cascade the two mappers together (can also specify User, Address as the arguments)
cascade_mappers(mapper1, mapper2)

# two new object instances
u = User('user1')
a = Address('test')

# "addresses" and "user" property are automatically added
u.addresses.append(a)
print a.user
Selecting from Relationships: Lazy Load

We've seen how the relation specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that the related property should have a Lazy Loader attached when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, invokes a second SQL query to load the child objects of the parent.

# define a mapper
User.mapper = mapper(User, users, properties = {
              'addresses' : relation(mapper(Address, addresses), private=True)
            })

# select users where user_name is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = User.mapper.select_by(user_name='jane')[0]

# iterate through the User object's addresses.  this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)

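The two-step loading behaviour of a lazy loader can be sketched with the standard sqlite3 module. This is an illustration of the concept under simplified assumptions, not SQLAlchemy code: LazyUser, run() and the queries log are hypothetical names introduced here to make the timing of each SQL statement visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY, user_id INTEGER, street TEXT);
    INSERT INTO users VALUES (1, 'jane');
    INSERT INTO addresses VALUES (1, 1, '123 anywhere street');
    INSERT INTO addresses VALUES (2, 1, '1 Park Place');
""")

queries = []  # log of SQL statements, to show when each one runs


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


class LazyUser(object):
    def __init__(self, user_id, user_name):
        self.user_id = user_id
        self.user_name = user_name
        self._addresses = None  # not loaded yet

    @property
    def addresses(self):
        if self._addresses is None:
            rows = run(
                "SELECT street FROM addresses WHERE user_id = ? "
                "ORDER BY address_id",
                (self.user_id,),
            )
            self._addresses = [street for (street,) in rows]
        return self._addresses


row = run("SELECT user_id, user_name FROM users WHERE user_name = ?", ("jane",))[0]
user = LazyUser(*row)
queries_before = len(queries)  # only the parent query has run so far
streets = user.addresses       # first access triggers the second query
queries_after = len(queries)
again = user.addresses         # already loaded; no further SQL
```
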
Useful Feature: Creating Joins via select_by

In mappers that have relationships, the select_by method and its cousins include special functionality that can be used to create joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of one of its relationships:

l = User.mapper.select_by(street='123 Green Street')

The above example is shorthand for:

l = User.mapper.select(and_(
         Address.c.user_id==User.c.user_id, 
         Address.c.street=='123 Green Street')
   )
How to Refresh the List?

Once the child list of Address objects is loaded, it is done loading for the lifetime of the object instance. Changes to the list will not be interfered with by subsequent loads, and upon commit those changes will be saved. Similarly, if a new User object is created and child Address objects added, a subsequent select operation which happens to touch upon that User instance, will also not affect the child list, since it is already loaded.

The question of when the mapper actually gets brand new objects from the database, versus when it assumes the in-memory version is fine the way it is, is a matter of transactional scope. This is described in more detail in the Unit of Work section; for now, it should be noted that the total storage of all newly created and selected objects, within the scope of the current thread, can be reset by releasing or otherwise disregarding all current object instances, and calling:

objectstore.clear()

This operation will clear out all currently mapped object instances, and subsequent select statements will load fresh copies from the database.

To operate upon a single object, just use the remove function:

# (this function coming soon)
objectstore.remove(myobject)
Selecting from Relationships: Eager Load

With just a single parameter "lazy=False" specified to the relation object, the parent and child SQL queries can be joined together.

Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties = {
                'addresses' : relation(Address.mapper, lazy=False)
            }
          )

user = User.mapper.get_by(user_name='jane')
for a in user.addresses:  
    print repr(a)

Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
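The way an identity map folds the redundant rows of a join back into distinct objects can be sketched with the standard sqlite3 module. This is a simplified illustration, not the mapper's actual code: plain dictionaries stand in for mapped instances, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY, user_id INTEGER, street TEXT);
    INSERT INTO users VALUES (1, 'jane');
    INSERT INTO users VALUES (2, 'fred');
    INSERT INTO addresses VALUES (1, 1, '123 anywhere street');
    INSERT INTO addresses VALUES (2, 1, '1 Park Place');
    INSERT INTO addresses VALUES (3, 2, '27 New Place');
""")

# one joined query returns a row per (user, address) pair
rows = conn.execute("""
    SELECT users.user_id, users.user_name,
           addresses.address_id, addresses.street
    FROM users LEFT OUTER JOIN addresses
         ON users.user_id = addresses.user_id
    ORDER BY users.user_id, addresses.address_id
""").fetchall()

identity_map = {}  # (table, primary key) -> the single object for that row
result = []
for user_id, user_name, address_id, street in rows:
    key = ("users", user_id)
    user = identity_map.get(key)
    if user is None:
        # first time this primary key is seen: create the object once
        user = {"user_id": user_id, "user_name": user_name, "addresses": []}
        identity_map[key] = user
        result.append(user)
    if address_id is not None:
        user["addresses"].append(street)

print(len(rows), len(result))
```

Three joined rows collapse into two user objects, each carrying its own list of addresses.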

The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:

userlist = User.mapper.select_by(street='123 Green Street')

The join implied by passing the "street" parameter is converted into an "aliasized" clause by the eager loader, so that it does not conflict with the join used to eager load the child address objects.

Switching Lazy/Eager, No Load

The options method of mapper provides an easy way to get alternate forms of a mapper from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():

# user mapper with lazy addresses
User.mapper = mapper(User, users, properties = {
             'addresses' : relation(mapper(Address, addresses))
         }
)

# make an eager loader                    
eagermapper = User.mapper.options(eagerload('addresses'))
u = eagermapper.select()

# make another mapper that won't load the addresses at all
plainmapper = User.mapper.options(noload('addresses'))

# multiple options can be specified
mymapper = oldmapper.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
mymapper = oldmapper.options(eagerload('orders.items'))
One to One/Many to One

The above examples focused on the "one-to-many" relationship. Other forms of relationship are just as easy to define, as the relation function can usually figure out what you want:

# a table to store a user's preferences for a site
prefs = Table('user_prefs', engine,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table gets 'preference_id' column added
users = Table('users', engine, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# class definition for preferences
class UserPrefs(object):
    pass
UserPrefs.mapper = mapper(UserPrefs, prefs)

# address mapper
Address.mapper = mapper(Address, addresses)

# make a new mapper referencing everything.
m = mapper(User, users, properties = dict(
    addresses = relation(Address.mapper, lazy=True, private=True),
    preferences = relation(UserPrefs.mapper, lazy=False, private=True),
))

# select
user = m.get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'
user.addresses.append(Address('freddy@hi.org'))

# commit
objectstore.commit()
Many to Many

The relation function handles a basic many-to-many relationship when you specify the association table:

articles = Table('articles', engine,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords = Table('keywords', engine,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords = Table('article_keywords', engine,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

# class definitions
class Keyword(object):
    def __init__(self, name = None):
        self.keyword_name = name

class Article(object):
    pass

# define a mapper that does many-to-many on the 'itemkeywords' association 
# table
Article.mapper = mapper(Article, articles, properties = dict(
    keywords = relation(mapper(Keyword, keywords), itemkeywords, lazy=False)
    )
)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
objectstore.commit()

# select articles based on a keyword.  select_by will handle the extra joins.
# (the result is named "alist" so it does not shadow the "articles" table)
alist = Article.mapper.select_by(keyword_name='politics')

# modify
a = alist[0]
del a.keywords[:]
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# commit.  individual INSERT/DELETE operations will take place only for the list
# elements that changed.
objectstore.commit()
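
The "extra joins" that a keyword lookup needs across an association table can be written out by hand with the standard sqlite3 module. The SQL below is an illustrative equivalent under assumed sample data; it is not the statement SQLAlchemy generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, headline TEXT);
    CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword_name TEXT);
    CREATE TABLE article_keywords (article_id INTEGER, keyword_id INTEGER);
    INSERT INTO articles VALUES (1, 'a headline');
    INSERT INTO articles VALUES (2, 'another headline');
    INSERT INTO keywords VALUES (1, 'politics');
    INSERT INTO keywords VALUES (2, 'entertainment');
    INSERT INTO article_keywords VALUES (1, 1);
    INSERT INTO article_keywords VALUES (1, 2);
    INSERT INTO article_keywords VALUES (2, 2);
""")

# articles -> association table -> keywords, filtered on the keyword name
sql = """
    SELECT articles.article_id, articles.headline
    FROM articles
    JOIN article_keywords
         ON articles.article_id = article_keywords.article_id
    JOIN keywords
         ON keywords.keyword_id = article_keywords.keyword_id
    WHERE keywords.keyword_name = ?
    ORDER BY articles.article_id
"""

politics = conn.execute(sql, ("politics",)).fetchall()
entertainment = conn.execute(sql, ("entertainment",)).fetchall()
print(politics)
```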
Association Object

Many to Many can also be done with an association object, that adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument which points to the class or mapper representing the other side of the association.

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords = Table('article_keywords', engine,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

# define an association class
class KeywordAssociation(object):
    pass

# mapper for KeywordAssociation
# specify "primary key" columns manually
KeywordAssociation.mapper = mapper(KeywordAssociation, itemkeywords,
    primary_key = [itemkeywords.c.article_id, itemkeywords.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy = False), # uses primary Keyword mapper
        'user' : relation(User, lazy = True) # uses primary User mapper
    }
)

# mappers for Users, Keywords
User.mapper = mapper(User, users)
Keyword.mapper = mapper(Keyword, keywords)

# define the mapper. 
m = mapper(Article, articles, properties={
    'keywords':relation(KeywordAssociation.mapper, lazy=False, association=Keyword)
    }
)

# bonus step - well, we do want to load the users in one shot, 
# so modify the mapper via an option.
# this returns a new mapper with the option switched on.
m2 = m.options(eagerload('keywords.user'))

# select by keyword again
alist = m2.select_by(keyword_name='jacks_stories')

# user is available
for a in alist:
    for k in a.keywords:
        if k.keyword.keyword_name == 'jacks_stories':
            print k.user.user_name