ProfileCollection

class biskit.ProfileCollection(profiles=None, infos=None)[source]

Bases: object

Manage profiles (arrays or lists of values) for Trajectory frames or atoms/residues in PDBModel. ProfileCollection resembles a 2-dimensional array where the first axis (let’s say row) is accessed by a string key rather than an index. Each row has an additional (meta)info dictionary assigned to it. The take() and concat() methods operate on the columns, i.e. they are applied to all profiles simultaneously.

By default, profiles of numbers are stored and returned as numpy.array and all others are stored and returned as ordinary list. This behaviour can be modified with the option asarray of ProfileCollection.set(). Using both lists and arrays is a compromise between the efficiency of Numeric arrays and two problems of the old Numeric module: (1) arrays of objects could not be unpickled (Numeric bug) and (2) arrays of strings would end up as 2-D arrays of char. The ‘isarray’ entry of a profile’s info dictionary tells whether the profile is stored as array or as list. (Since we have now replaced Numeric by numpy, we can probably switch to the exclusive use of numpy arrays.)

ProfileCollection p can be used like a dictionary of lists::

len( p )            -> number of profiles (== len( p.profiles ) )
p['prof1']          -> list/array with values of profile 'prof1'
del p['prof1']      -> remove a profile
p['prof1'] = [..]   -> add/override a profile without additional infos

'prof1' in p        -> True, if collection contains key 'prof1'
for k in p:         -> iterate over profile keys
for k in p.iteritems():  -> iterate over key:profile pairs

Each profile key also has a dictionary of meta infos assigned to it (see getInfo(), setInfo(), p.infos). These can be accessed like:

p['prof1','date']   -> date of creation of profile named 'prof1'
p.getInfo('prof1')  -> returns all info records
p['prof1','comment'] = 'first prof'  -> add/change single info value
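
The key convention described above (a plain string addresses a profile, a (name, infoKey) tuple addresses a single info record) can be mimicked with a small stand-in class. This is only a sketch of the indexing idea, not Biskit's implementation; the class name `MiniProfiles` and its internals are hypothetical.

```python
class MiniProfiles:
    """Toy stand-in: profiles keyed by name, each with an info dict."""

    def __init__(self):
        self.profiles = {}   # name -> list of values
        self.infos = {}      # name -> dict of meta infos

    def __getitem__(self, key):
        # a (name, infoKey) tuple addresses a single info record
        if isinstance(key, tuple):
            name, info_key = key
            return self.infos[name][info_key]
        return self.profiles[key]

    def __setitem__(self, key, value):
        if isinstance(key, tuple):
            name, info_key = key
            self.infos[name][info_key] = value
        else:
            self.profiles[key] = value
            self.infos.setdefault(key, {})   # every profile gets an info dict

    def __len__(self):
        return len(self.profiles)

    def __contains__(self, name):
        return name in self.profiles


p = MiniProfiles()
p['prof1'] = [1.0, 2.0, 3.0]
p['prof1', 'comment'] = 'first prof'
```

The real class additionally accepts integer keys (returning a CrossView), which this sketch leaves out.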

ProfileCollections can also be viewed from the side (along columns): CrossViews provide a virtual dictionary with the values of all profiles at a certain position. Changes to the dictionary will change the value in the underlying profile and vice versa, for example:

atom = p[10]        -> CrossView{'prof1' : 42.0, 'name' : 'CA', ... }
atom['prof1'] = 33  -> same as p['prof1'][10] = 33
p[10]['prof1']= 33  -> still the same but much slower, use instead...
p['prof1'][10]= 33  -> doesn't invoke CrossView and is thus faster

p[0] == p[-1]  -> True if the values of all profiles are identical at
                  first and last position

for a in p.iterCrossViews():  -> iterates over CrossView dictionaries
p.toCrossViews()              -> list of CrossViews for all positions

For read-only access, normal dictionaries are faster than CrossViews:

for d in p.iterDicts():       -> iterate over normal dictionaries
p.toDicts()         -> list of normal (but disconnected) dictionaries

Adding a profile to a ProfileCollection will also ‘magically’ add an additional key to all existing CrossViews of it. CrossViews become invalid if their parent collection is garbage collected or re-ordered.

Note: The creation of many CrossViews hampers performance. We provide the ProfileCollection.iterCrossViews iterator to loop over these pseudo dictionaries for convenience, but direct iteration over the elements of a profile (array or list) is about 100 times faster. If you need to repeatedly read from many dictionaries, consider using ProfileCollection.toDicts and cache the resulting normal (disconnected) dictionaries:

cache = p.toDicts() # 'list( p.iterDicts() )' is equivalent but slower
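
The difference between a connected view and a disconnected dictionary can be illustrated with plain Python. The following is a minimal sketch under the assumption that profiles are held in a dict of lists; `MiniView` and `to_dicts` are hypothetical names and do not reproduce Biskit's CrossView class.

```python
class MiniView:
    """Toy 'cross view': a dict-like window on one position of all profiles."""

    def __init__(self, profiles, index):
        self._profiles = profiles   # shared reference, not a copy
        self._index = index

    def __getitem__(self, name):
        return self._profiles[name][self._index]

    def __setitem__(self, name, value):
        # writes go straight through to the underlying profile
        self._profiles[name][self._index] = value

    def keys(self):
        return self._profiles.keys()


def to_dicts(profiles):
    """Disconnected snapshot: one ordinary dict per position."""
    names = list(profiles)
    n = len(profiles[names[0]]) if names else 0
    return [{k: profiles[k][i] for k in names} for i in range(n)]


profiles = {'prof1': [42.0, 7.0], 'name': ['CA', 'CB']}
atom = MiniView(profiles, 0)
atom['prof1'] = 33                 # changes profiles['prof1'][0]
cache = to_dicts(profiles)         # snapshot taken after the change
cache[0]['prof1'] = -1             # does not touch profiles
profiles['charge'] = [0.1, 0.2]    # a new profile is visible through the view
```

The view costs an extra lookup per access, which is in line with the advice to cache ordinary dictionaries for read-only work.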

Note

Profile arrays of float or int are automatically converted to arrays of type Float32 or Int32. This is a safety measure because we have stumbled over problems when transferring pickled objects between 32 and 64 bit machines. With the transition to numpy, this may not be needed any longer.

See also: ProfileCollection.__picklesave_array

Methods Overview

__init__ Initialize self.
array_or_list Convert to array or list depending on asarray option
clear Delete all:: clear() -> None; delete all profiles and infos.
clone Clone (deepcopy) profiles:: clone() -> ProfileCollection (or sub-class)
compress Extract using a mask:: p.compress( mask ) <==> p.take( N.nonzero( mask ) )
concat Concatenate all profiles in this with corresponding profiles in the given ProfileCollection(s).
expand Expand profile to have a value also for masked positions.
get get( profKey, [default] ) -> list of values OR get( (profKey, infoKey), [default] ) -> single value of info dict
getInfo Use:: getInfo( name ) -> dict with meta infos about profile:
hasNoneProfile Check whether any profile is None, which means it is waiting to be updated from a source ProfileCollection.
has_key
isChanged Return True if any of the profiles is tagged as ‘changed’; keys ([ str ] OR str) restricts the check to these profiles (default: None -> means all).
items Get list of tuples of profile names and profiles:: p.items() -> [ (key1, [any]), (key2, [any]), ..) ]
iterCrossViews Iterate over values of all profiles as CrossView ‘dictionaries’ indexed by profile name, for example: >>> for atom in p.iterCrossViews(): …
iterDicts Iterate over (copies of) values of all profiles as normal dictionaries indexed by profile name, for example: >>> for atom in p.iterDicts(): …
iteritems Iterate over (key : profile) pairs: >>> for key, profile in p.iteritems(): …
keys
killViews Deactivate any CrossView instances referring to this ProfileCollection.
plot Plot one or more profiles using Biggles:: plot( name1, [name2, ..],[arg1=x, arg2=y]) -> biggles.FramedPlot
plotArray Plot several profiles as a panel of separate plots.
plotHistogram Parameters: bins (int) number of bins (10); ynormalize (bool) normalize histograms to area 1.0 (False); xnormalize (bool) adapt bin range to min and max of all profiles (True); xrange ((float, float)) min and max of bin range (None); steps (bool) draw histogram steps (True).
profLength Length of profile:: profLength() -> int; length of first non-None profile or default (0)
profile2mask Convert profile into a mask based on the max and min cutoff values.
remove Remove profile OR info values of profile:: remove( profKey ) -> 1|0, 1 if complete entry has been removed remove( profKey, infoKey ) -> 1|0, 1 if single info value was removed
set Add/override a profile.
setInfo Add/Override infos about a given profile:: e.g.
setMany setMany( dict, [infoDict] ) Add/Override many profiles
take Take from profiles using provided indices:: take( indices ) -> ProfileCollection with extract of all profiles
toCrossViews Return a list of CrossView pseudo dictionaries.
toDicts Return (copies of) values of all profiles as a list of normal dictionaries.
update Merge other ProfileCollection into this one, replacing existing profiles and info values.
updateMissing Merge other ProfileCollection into this one but do not override existing profiles and info records.
values Get list of all profiles (arrays or lists of values):: p.values() -> [ [any], [any], …
version Class version.

ProfileCollection Method & Attribute Details

__init__(profiles=None, infos=None)[source]

Initialize self. See help(type(self)) for accurate signature.

version()[source]

Class version.

Returns: class version number
Return type: str
values()[source]
Get list of all profiles (arrays or lists of values)::
p.values() -> [ [any], [any], … ]
Returns: list of lists or arrays
Return type: [ list/array ]
hasNoneProfile()[source]

Check whether any profile is None, which means it is waiting to be updated from a source ProfileCollection. This method is written such that it does not trigger the updating mechanism.

Return type: bool

items()[source]
Get list of tuples of profile names and profiles::
p.items() -> [ (key1, [any]), (key2, [any]), ..) ]
Returns: list of tuples of profile names and profiles
Return type: [ ( str, list/array ) ]
iteritems()[source]
Iterate over (key : profile) pairs:
>>> for key, profile in p.iteritems():
...
iterCrossViews()[source]

Iterate over values of all profiles as CrossView ‘dictionaries’ indexed by profile name, for example:

>>> for atom in p.iterCrossViews():
...     print(atom['name'], atom['residue_name'])

The CrossViews remain connected to the profiles and can be used to change values in many profiles simultaneously. Consider using the somewhat faster ProfileCollection.iterDicts if this is not needed and speed is critical.

Returns: CrossView instances behaving like dictionaries
Return type: iterator over [ CrossView ]
iterDicts()[source]

Iterate over (copies of) values of all profiles as normal dictionaries indexed by profile name, for example:

>>> for atom in p.iterDicts():
...     print(atom['name'], atom['residue_name'])
Returns: dictionaries
Return type: iterator over { ‘key1’:v1, ‘key2’:v2 }
toCrossViews()[source]
Returns: list of CrossView pseudo dictionaries
Return type: [ CrossView ]
toDicts()[source]
Returns: (copies of) values of all profiles as normal dictionaries
Return type: [ dict ]
array_or_list(prof, asarray)[source]

Convert to array or list depending on asarray option

Beware: empty lists will be upgraded to empty Float arrays.

Parameters:
  • prof (list OR array) – profile
  • asarray (2|1|0) – 1.. autodetect type, 0.. force list, 2.. force array
Returns: profile
Return type: list OR array
Raises: ProfileError

expand(prof, mask, default)[source]

Expand profile to have a value also for masked positions.

Parameters:
  • prof (list OR array) – input profile
  • mask ([int]) – atom mask
  • default (any) – default value
Returns: profile
Return type: list OR array
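
As an illustration of what expanding a profile means, here is a minimal pure-Python sketch. It assumes list profiles and a 0|1 mask whose number of ones equals the profile length; it is not Biskit's code, and the `ValueError` stands in for whatever error the library raises.

```python
def expand(prof, mask, default=None):
    """Spread the values of prof over the positions where mask is 1.

    Positions where mask is 0 receive the default value.
    """
    if sum(mask) != len(prof):
        raise ValueError('mask must contain exactly len(prof) ones')
    values = iter(prof)
    return [next(values) if m else default for m in mask]


expanded = expand([1.5, 2.5], [0, 1, 0, 1], default=0.0)
```

Here `expanded` has the length of the mask, with the two original values placed at the unmasked positions.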

set(name, prof, mask=None, default=None, asarray=1, comment=None, **moreInfo)[source]

Add/override a profile. None is allowed as special purpose value, in which case all other parameters are ignored. Otherwise, the three info records ‘version’, ‘changed’ and ‘isarray’ are always modified but can be overridden by key=value pairs to this function.

Parameters:
  • name (str) – profile name (i.e. key)
  • prof ([any] OR None) – list of values OR None
  • mask ([int]) – list 1 x N_items of 0|1; if there are fewer values than items, provide a mask with 0 for the missing values, N.sum(mask)==len(prof)
  • default (any) – value for masked items (default: None for lists, 0 for arrays)
  • asarray (0|1|2) – store as list (0), as array (2) or store numbers as array but everything else as list (1) (default: 1)
  • comment (str) – goes into info[name][‘comment’]
  • moreInfo (key=value) – additional key-value pairs for info[name]
Raises:
  • ProfileError – if length of prof != length of other profiles
  • ProfileError – if mask is given but N.sum(mask) != len(prof)
setInfo(name, **args)[source]
Add/Override infos about a given profile::
e.g. setInfo('relASA', comment='new', params={'bin':'whatif'})
Raises: ProfileError – if no profile is found with |name|
setMany(profileDict, infos={})[source]

setMany( dict, [infoDict] ) Add/Override many profiles

Parameters:
  • profileDict (dict) – dict with name:profile pairs
  • infos (dict of dict) – info dicts for each profile, indexed by name
get(profKey[, default]) → list of values[source]

OR get( (profKey, infoKey), [default] ) -> single value of info dict

Parameters:
  • name (str OR (str, str)) – profile key or profile and info key
  • default (any) – default result if no profile is found, if None and no profile is found, raise exception
Raises: ProfileError – if no profile is found with |name|

getInfo(name)[source]
Use::
getInfo( name ) -> dict with meta infos about profile:

Guaranteed infos: ‘version’->str, ‘comment’->str, ‘changed’->1|0

Parameters: name (str) – profile name
Returns: dict with infos about profile
Return type: dict
Raises: ProfileError – if no profile is found with |name|
profile2mask(profName, cutoff_min=None, cutoff_max=None)[source]

Convert profile into a mask based on the max and min cutoff values.

Parameters:
  • profName (str) – profile name
  • cutoff_min (float) – lower limit
  • cutoff_max (float) – upper limit
Returns: mask len( get(profName) ) x 1|0
Return type: [1|0]
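
A cutoff-based mask can be sketched in a few lines. The function below is a hypothetical stand-alone helper, not Biskit's implementation. In particular, treating both cutoffs as strict (exclusive) bounds and treating None as "no limit" are assumptions that the text above does not confirm.

```python
def profile_to_mask(values, cutoff_min=None, cutoff_max=None):
    """Return a list of 1|0, 1 where cutoff_min < value < cutoff_max.

    Assumption: both bounds are exclusive; None disables a bound.
    """
    mask = []
    for v in values:
        above = cutoff_min is None or v > cutoff_min
        below = cutoff_max is None or v < cutoff_max
        mask.append(1 if (above and below) else 0)
    return mask


mask = profile_to_mask([0.1, 0.5, 0.9], cutoff_min=0.2, cutoff_max=0.8)
```

The resulting mask has the same length as the profile and can be handed to a compress-style operation.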

take(indices, *initArgs, **initKw)[source]
Take from profiles using provided indices::
take( indices ) -> ProfileCollection with extract of all profiles

Any additional parameters are passed to the constructor of the new instance.

Parameters: indices ([int]) – list of indices
Returns: new profile from indices
Return type: ProfileCollection (or sub-class)
Raises: ProfileError – if the take operation fails
compress(cond)[source]
Extract using a mask::
p.compress( mask ) <==> p.take( N.nonzero( mask ) )
Parameters: cond (array or list of int) – mask with 1 for the positions to keep
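
The equivalence between compress and take stated above can be checked with a small stand-in that applies the operation to every profile at once. This sketch assumes profiles are kept in a plain dict of lists; the function names mirror the methods but are free functions written for illustration only.

```python
def take(profiles, indices):
    """Extract the given positions from every profile."""
    return {name: [prof[i] for i in indices]
            for name, prof in profiles.items()}


def compress(profiles, mask):
    """Keep the positions where mask is non-zero."""
    indices = [i for i, m in enumerate(mask) if m]   # positions of the ones
    return take(profiles, indices)


profiles = {'name': ['N', 'CA', 'C', 'O'],
            'charge': [0.1, 0.2, 0.3, 0.4]}
sub = compress(profiles, [0, 1, 0, 1])
```

Both operations act on columns, so all profiles in the result keep the same length.
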
remove(*key)[source]
Remove profile OR info values of profile::
remove( profKey ) -> 1|0, 1 if complete entry has been removed
remove( profKey, infoKey ) -> 1|0, 1 if single info value was removed
Parameters: key (str OR str, str) – profile name OR name, infoKey
Returns: success status
Return type: 1|0
concat(*profiles)[source]

Concatenate all profiles in this with corresponding profiles in the given ProfileCollection(s). Profiles that are not found in all ProfileCollections are skipped:

p0.concat( p1 [, p2, ..]) -> single ProfileCollection with the
same number of profiles as p0 but with the length of p0+p1+p2..
Parameters: profiles (ProfileCollection(s)) – profile(s) to concatenate
Returns: concatenated profile(s)
Return type: ProfileCollection / subclass
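
Column-wise concatenation can be sketched as follows. The sketch follows the sentence that profiles not found in all collections are skipped; it works on plain dicts of lists and ignores info records, so it is an illustration rather than Biskit's implementation.

```python
def concat(first, *others):
    """Join equally named profiles end to end; skip names missing anywhere."""
    result = {}
    for name, prof in first.items():
        if all(name in other for other in others):
            joined = list(prof)            # copy, leave the input untouched
            for other in others:
                joined.extend(other[name])
            result[name] = joined
    return result


p0 = {'mass': [12.0, 14.0], 'name': ['C', 'N']}
p1 = {'mass': [16.0]}
merged = concat(p0, p1)
```

Only 'mass' survives here because 'name' is absent from `p1`.
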
update(other, stickyChanged=1, mask=None)[source]

Merge other ProfileCollection into this one, replacing existing profiles and info values. This is the obvious translation of dict.update(). The changed flag of each profile is set to 1 if:

  1. an existing profile is overridden with different values
  2. the profile is marked ‘changed’ in the other collection

The two ProfileCollections should have the same dimension in terms of atoms, that is p1.profLength() == p2.profLength(). If this is not the case, it is possible to ‘mask’ atoms in p1 that are missing in p2. That means the target ProfileCollection can have more atoms than the other collection but not vice versa.

Example::
p1.profLength() == 10
p2.profLength() == 5
p1.update( p2, mask=[0,1,0,1,0,1,0,1,0,1] )

…would assign the atom values of the shorter collection p2 to every second atom of the longer collection p1. If p2 has more items (atoms) per profile than p1, this would not work. In this case p2 first needs to be compressed to the same shape as p1:

p1.profLength() == 5
p2.profLength() == 10
p2 = p2.compress( [0,1,0,1,0,1,0,1,0,1] )
p1.update( p2 )
Parameters:
  • other (ProfileCollection) – profile
  • stickyChanged (0|1) – mark all profiles ‘changed’ that are marked ‘changed’ in the other collection (default: 1)
  • mask ([int]) – 1 x N_atoms N.array of 0|1; if the other collection has fewer atoms than this one, mark those positions with 0 that exist only in this collection (N.sum(mask)==other.profLength())
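
The masked update from the example above (a 5-item profile written into every second position of a 10-item profile) can be sketched for a single profile. This is a hypothetical helper on plain lists, not the library's code; the `ValueError` is a placeholder for the library's own error handling.

```python
def masked_update(target, source, mask):
    """Write source values into the target positions where mask is 1."""
    if len(mask) != len(target):
        raise ValueError('mask must have one entry per target position')
    if sum(mask) != len(source):
        raise ValueError('mask must contain exactly len(source) ones')
    values = iter(source)
    return [next(values) if m else old for old, m in zip(target, mask)]


p1_prof = [0] * 10
p2_prof = [1, 2, 3, 4, 5]
updated = masked_update(p1_prof, p2_prof, [0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
```

Positions masked with 0 keep their previous value in the target.
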
updateMissing(source, copyMissing=1, allowEmpty=0, setChanged=0)[source]

Merge other ProfileCollection into this one but do not override existing profiles and info records. There is one exception: Empty profiles (None or []) are replaced but their info records stay untouched. If copyMissing=0, profiles that are existing in source but not in this collection, are NOT copied (i.e. only empty profiles are replaced).

For each profile copied from the source the ‘changed’ flag is reset to |setChanged| (default 0), regardless of whether the profile is marked ‘changed’ in the source collection.

Parameters:
  • source (ProfileCollection) – profile
  • copyMissing (0|1) – copy missing profiles that exist in source (default: 1)
  • allowEmpty (0|1) – still tolerate zero-length profiles after update (default: 0)
  • setChanged (0|1) – label profiles copied from source as ‘changed’ (default: 0)
Raises: ProfileError – if allowEmpty is 0 and some empty profiles cannot be found in source
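
The merge rule described above (never override, except for empty profiles) can be sketched on plain dicts. The helper below is hypothetical and covers only the profile values; info records, the ‘changed’ flag and the allowEmpty check are left out.

```python
def update_missing(target, source, copy_missing=True):
    """Fill target from source without overriding existing profiles."""
    for name, prof in source.items():
        if name in target:
            existing = target[name]
            # only empty profiles (None or []) are replaced
            if existing is None or len(existing) == 0:
                target[name] = list(prof)
        elif copy_missing:
            target[name] = list(prof)
    return target


target = {'a': [1, 2], 'b': None}
source = {'a': [9, 9], 'b': [3, 4], 'c': [5, 6]}
filled = update_missing(dict(target), source)
only_empty = update_missing(dict(target), source, copy_missing=False)
```

With `copy_missing=False` only the empty profile 'b' is filled and 'c' is not copied.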

isChanged(keys=None)[source]
Parameters: keys ([ str ] OR str) – only check these profiles (default: None -> means all)
Returns: True, if any of the profiles is tagged as ‘changed’
Return type: bool
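
A check of this kind could look like the sketch below, which assumes that the ‘changed’ flag lives in each profile's info dictionary (as the guaranteed info records of getInfo suggest). The function name and the dict layout are assumptions made for illustration.

```python
def is_changed(infos, keys=None):
    """True if any selected profile has a non-zero 'changed' info record."""
    if keys is None:
        keys = list(infos)          # default: check all profiles
    elif isinstance(keys, str):
        keys = [keys]               # a single name is also accepted
    return any(infos[k].get('changed', 0) for k in keys)


infos = {'a': {'changed': 0}, 'b': {'changed': 1}}
any_changed = is_changed(infos)
a_changed = is_changed(infos, 'a')
```
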
clone()[source]
Clone (deepcopy) profiles::
clone() -> ProfileCollection (or sub-class)
Returns: profile
Return type: ProfileCollection
killViews()[source]

Deactivate any CrossView instances referring to this ProfileCollection.

clear()[source]
Delete all::
clear() -> None; delete all profiles and infos.
profLength(default=0)[source]
Length of profile::
profLength() -> int; length of first non-None profile or default (0)
Parameters: default (any) – value to return if all profiles are set to None
Returns: length of first non-None profile or 0
Return type: int
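
The rule "length of first non-None profile or default" can be written down directly. This is a sketch on a plain dict of profiles, not the method itself.

```python
def prof_length(profiles, default=0):
    """Length of the first profile that is not None, else the default."""
    for prof in profiles.values():
        if prof is not None:
            return len(prof)
    return default


length = prof_length({'a': None, 'b': [1, 2, 3]})
```
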
plot(*name, **arg)[source]
Plot one or more profiles using Biggles::
plot( name1, [name2, ..],[arg1=x, arg2=y]) -> biggles.FramedPlot
Parameters:
  • name (str) – one or more profile names
  • arg – key=value pairs for Biggles.Curve() function
Raises:
  • TypeError – if profile contains non-number items
  • ImportError – If biggles module could not be imported
Returns: plot, view using plot.show()
Return type: biggles.FramedPlot

plotArray(*name, **arg)[source]

Plot several profiles as a panel of separate plots.

Parameters:
  • name (str or (str, str,…)) – one or more profile names or tuples of profile names
  • xkey (str) – profile to be used as x-axis (default: None)
  • arg – key=value pairs for Biggles.Curve() function
Returns: plot, view using plot.show()
Return type: biggles.FramedPlot
Raises:
  • TypeError – if profile contains non-number items
  • ImportError – If biggles module could not be imported
plotHistogram(*name, **arg)[source]
Parameters:
  • bins (int) – number of bins (10)
  • ynormalize (bool) – normalize histograms to area 1.0 (False)
  • xnormalize (bool) – adapt bin range to min and max of all profiles (True)
  • xrange ((float, float)) – min and max of bin range (None)
  • steps (bool) – draw histogram steps (True)