ProfileCollection
-
class biskit.ProfileCollection(profiles=None, infos=None)
Bases: object
Manage profiles (arrays or lists of values) for Trajectory frames or atoms/residues in PDBModel. ProfileCollection resembles a 2-dimensional array where the first axis (let’s say row) is accessed by a string key rather than an index. Each row has an additional (meta)info dictionary assigned to it. The take() and concat() methods operate on the columns, i.e. they are applied to all profiles simultaneously.
By default, profiles of numbers are stored and returned as numpy arrays and all others are stored and returned as ordinary lists. This behaviour can be modified with the option asarray of ProfileCollection.set(). Using both lists and arrays is a compromise between the efficiency of Numeric arrays and two problems of the old Numeric module – (1) arrays of objects could not be unpickled (Numeric bug) and (2) arrays of strings would end up as 2-D arrays of char. The ‘isarray’ entry of a profile’s info dictionary tells whether the profile is stored as array or as list. (Since we have now replaced Numeric by numpy, we can probably switch to the exclusive use of numpy arrays.)
ProfileCollection p can be used like a dictionary of lists:
    len( p )                 -> number of profiles (== len( p.profiles ) )
    p['prof1']               -> list/array with values of profile 'prof1'
    del p['prof1']           -> remove a profile
    p['prof1'] = [..]        -> add/override a profile without additional infos
    'prof1' in p             -> True, if collection contains key 'prof1'
    for k in p:              -> iterate over profile keys
    for k in p.iteritems():  -> iterate over key:profile pairs
Each profile key also has a dictionary of meta infos assigned to it (see getInfo(), setInfo(), p.infos). These can be accessed like:
    p['prof1','date']                     -> date of creation of profile named 'prof1'
    p.getInfo('prof1')                    -> returns all info records
    p['prof1','comment'] = 'first prof'   -> add/change single info value
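The dictionary-like access described above can be illustrated with a small stand-in. This is only a sketch: `MiniProfiles` is a hypothetical class written for this example and is not the biskit implementation. It assumes that a plain string key addresses a profile and a `(name, infoKey)` tuple addresses a single info value, as in the listing above.

```python
class MiniProfiles:
    """Hypothetical stand-in (NOT the biskit class): string-keyed rows
    of values plus one info dictionary per row."""

    def __init__(self):
        self.profiles = {}   # name -> list of values
        self.infos = {}      # name -> dict of meta infos

    def __len__(self):
        return len(self.profiles)

    def __contains__(self, name):
        return name in self.profiles

    def __iter__(self):
        return iter(self.profiles)

    def __getitem__(self, key):
        # p['prof1'] -> profile; p['prof1', 'comment'] -> single info value
        if isinstance(key, tuple):
            name, info_key = key
            return self.infos[name][info_key]
        return self.profiles[key]

    def __setitem__(self, key, value):
        if isinstance(key, tuple):
            name, info_key = key
            self.infos[name][info_key] = value
        else:
            self.profiles[key] = list(value)
            self.infos.setdefault(key, {})

    def __delitem__(self, name):
        del self.profiles[name]
        del self.infos[name]


p = MiniProfiles()
p['prof1'] = [1.0, 2.0, 3.0]
p['prof1', 'comment'] = 'first prof'
```

Dispatching on the key type inside `__getitem__` and `__setitem__` is one simple way to offer both access styles through the same bracket syntax; the real class may do this differently.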
ProfileCollections can also be viewed from the side (along columns) – CrossViews provide a virtual dictionary with the values of all profiles at a certain position. Changes to the dictionary will change the value in the underlying profile and vice versa, for example:
    atom = p[10]                  -> CrossView{'prof1' : 42.0, 'name' : 'CA', ... }
    atom['prof1'] = 33            -> same as p['prof1'][10] = 33
    p[10]['prof1'] = 33           -> still the same but much slower, use instead...
    p['prof1'][10] = 33           -> doesn't invoke CrossView and is thus faster
    p[0] == p[-1]                 -> True if the values of all profiles are identical at first and last position
    for a in p.iterCrossViews():  -> iterates over CrossView dictionaries
    p.toCrossViews()              -> list of CrossViews for all positions
For read-only access, normal dictionaries are faster than CrossViews:
    for d in p.iterDicts():  -> iterate over normal dictionaries
    p.toDicts()              -> list of normal (but disconnected) dictionaries
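What "disconnected" means here can be shown with plain Python. The helper `to_dicts` below is a hypothetical illustration, not the biskit method: it assumes the profiles are held as a dict of equal-length lists and builds one ordinary dictionary per position.

```python
def to_dicts(profiles):
    """Return one plain dict per position of a dict of equal-length lists."""
    keys = list(profiles)
    return [dict(zip(keys, column)) for column in zip(*profiles.values())]


profiles = {'prof1': [42.0, 7.0], 'name': ['CA', 'CB']}
cache = to_dicts(profiles)

# The dictionaries are copies: changing one does not touch the profile.
cache[0]['prof1'] = 0.0
```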
Adding a profile to a ProfileCollection will also ‘magically’ add an additional key to all existing CrossViews of it. CrossViews become invalid if their parent collection is garbage collected or re-ordered.
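The write-through behaviour of a CrossView can be sketched as follows. `MiniCrossView` is a hypothetical class for illustration only; it assumes the parent keeps its profiles in a dict of lists and that the view merely stores a reference to that dict plus a position. Invalidation after garbage collection or re-ordering is not modelled.

```python
class MiniCrossView:
    """Hypothetical sketch of a column view; not biskit's CrossView."""

    def __init__(self, profiles, index):
        self._profiles = profiles   # shared reference to the parent's dict of lists
        self._index = index         # position (column) this view represents

    def __getitem__(self, name):
        return self._profiles[name][self._index]

    def __setitem__(self, name, value):
        # Writes go straight through to the parent profile.
        self._profiles[name][self._index] = value

    def keys(self):
        return list(self._profiles.keys())


profiles = {'prof1': [42.0, 7.0], 'name': ['CA', 'CB']}
atom = MiniCrossView(profiles, 0)

atom['prof1'] = 33                 # same as profiles['prof1'][0] = 33
profiles['mass'] = [12.0, 12.0]    # a profile added later shows up in the view
```

Because the view holds no values of its own, a newly added profile becomes visible through every existing view without further bookkeeping, which matches the "magic" described above.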
Note: The creation of many CrossViews hampers performance. We provide a ProfileCollection.iterCrossViews iterator to loop over these pseudo dictionaries for convenience – direct iteration over the elements of a profile (array or list) is about 100 times faster. If you need to repeatedly read from many dictionaries, consider using ProfileCollection.toDicts and cache the resulting normal (disconnected) dictionaries:
    cache = p.toDicts()   # 'list( p.iterDicts() )' is equivalent but slower
Note
Profile arrays of float or int are automatically converted to arrays of type Float32 or Int32. This is a safety measure because we have stumbled over problems when transferring pickled objects between 32 and 64 bit machines. With the transition to numpy, this may not be needed any longer.
See also:
ProfileCollection.__picklesave_array
Methods Overview
- __init__ – Initialize self.
- array_or_list – Convert to array or list depending on asarray option
- clear – Delete all: clear() -> None; delete all profiles and infos.
- clone – Clone (deepcopy) profiles: clone() -> ProfileCollection (or sub-class)
- compress – Extract using a mask: p.compress( mask ) <==> p.take( N.nonzero( mask ) )
- concat – Concatenate all profiles in this with corresponding profiles in the given ProfileCollection(s).
- expand – Expand profile to have a value also for masked positions.
- get – get( profKey, [default] ) -> list of values OR get( (profKey, infoKey), [default] ) -> single value of info dict
- getInfo – getInfo( name ) -> dict with meta infos about profile
- hasNoneProfile – Check whether any profile is None, which means it is waiting to be updated from a source ProfileCollection.
- has_key
- isChanged – Return True if any of the profiles is tagged as ‘changed’ (keys: only check these profiles; default: None -> means all).
- items – Get list of tuples of profile names and profiles: p.items() -> [ (key1, [any]), (key2, [any]), ..) ]
- iterCrossViews – Iterate over values of all profiles as CrossView ‘dictionaries’ indexed by profile name, for example: >>> for atom in p.iterCrossViews(): …
- iterDicts – Iterate over (copies of) values of all profiles as normal dictionaries indexed by profile name, for example: >>> for atom in p.iterDicts(): …
- iteritems – Iterate over (key : profile) pairs: >>> for key, profile in p.iteritems(): …
- keys
- killViews – Deactivate any CrossView instances referring to this ProfileCollection.
- plot – Plot one or more profiles using Biggles: plot( name1, [name2, ..],[arg1=x, arg2=y]) -> biggles.FramedPlot
- plotArray – Plot several profiles as a panel of separate plots.
- plotHistogram – Parameters: bins (int): number of bins (10); ynormalize (bool): normalize histograms to area 1.0 (False); xnormalize (bool): adapt bin range to min and max of all profiles (True); xrange ((float, float)): min and max of bin range (None); steps (bool): draw histogram steps (True)
- profLength – Length of profile: profLength() -> int; length of first non-None profile or default (0)
- profile2mask – Convert profile into a mask based on the max and min cutoff values.
- remove – Remove profile OR info values of profile: remove( profKey ) -> 1|0, 1 if complete entry has been removed; remove( profKey, infoKey ) -> 1|0, 1 if single info value was removed
- set – Add/override a profile.
- setInfo – Add/Override infos about a given profile.
- setMany – setMany( dict, [infoDict] ) Add/Override many profiles
- take – Take from profiles using provided indices: take( indices ) -> ProfileCollection with extract of all profiles
- toCrossViews – Returns a list of CrossView pseudo dictionaries (return type: [ CrossView ])
- toDicts – Returns (copies of) values of all profiles as normal dictionaries (return type: [ dict ])
- update – Merge other ProfileCollection into this one, replacing existing profiles and info values.
- updateMissing – Merge other ProfileCollection into this one but do not override existing profiles and info records.
- values – Get list of all profiles (arrays or lists of values): p.values() -> [ [any], [any], … ]
- version – Class version.
ProfileCollection Method & Attribute Details
-
__init__(profiles=None, infos=None)
Initialize self. See help(type(self)) for accurate signature.
-
values()
Get list of all profiles (arrays or lists of values):
    p.values() -> [ [any], [any], … ]
Returns: list of lists or arrays
Return type: [ list/array ]
-
hasNoneProfile()
Check whether any profile is None, which means it is waiting to be updated from a source ProfileCollection. This method is written such that it does not trigger the updating mechanism.
Return type: bool
-
items()
Get list of tuples of profile names and profiles:
    p.items() -> [ (key1, [any]), (key2, [any]), ..) ]
Returns: list of tuples of profile names and profiles
Return type: [ ( str, list/array ) ]
-
iterCrossViews()
Iterate over values of all profiles as CrossView ‘dictionaries’ indexed by profile name, for example:
    >>> for atom in p.iterCrossViews():
    ...     print(atom['name'], atom['residue_name'])
The CrossViews remain connected to the profiles and can be used to change values in many profiles simultaneously. Consider using the somewhat faster ProfileCollection.iterDicts if this is not needed and speed is critical.
Returns: CrossView instances behaving like dictionaries
Return type: iterator over [ CrossView ]
-
iterDicts()
Iterate over (copies of) values of all profiles as normal dictionaries indexed by profile name, for example:
    >>> for atom in p.iterDicts():
    ...     print(atom['name'], atom['residue_name'])
Returns: dictionaries
Return type: iterator over { ‘key1’:v1, ‘key2’:v2 }
-
toDicts()
Returns: (copies of) values of all profiles as normal dictionaries
Return type: [ dict ]
-
array_or_list(prof, asarray)
Convert to array or list depending on asarray option
Beware: empty lists will be upgraded to empty Float arrays.
Parameters: - prof (list OR array) – profile
- asarray (2|1|0) – 1.. autodetect type, 0.. force list, 2.. force array
Returns: profile
Return type: list OR array
Raises:
-
expand(prof, mask, default)
Expand profile to have a value also for masked positions.
Parameters: - prof (list OR array) – input profile
- mask ([int]) – atom mask
- default (any) – default value
Returns: profile
Return type: list OR array
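As a rough illustration of what expanding a profile means, the following plain-Python sketch spreads the given values over the positions where the mask is 1 and fills the remaining positions with the default. The function is a hypothetical re-implementation for list input only; the ValueError for a mismatching mask is an assumption of this sketch, not documented behaviour.

```python
def expand(prof, mask, default):
    """Spread the values of prof over the positions where mask is 1."""
    if sum(mask) != len(prof):
        # Assumption of this sketch: a mismatching mask is rejected.
        raise ValueError('mask must contain exactly len(prof) ones')
    values = iter(prof)
    return [next(values) if m else default for m in mask]


expanded = expand([1.5, 2.5], [0, 1, 0, 1], default=None)
```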
-
set(name, prof, mask=None, default=None, asarray=1, comment=None, **moreInfo)
Add/override a profile. None is allowed as special purpose value - in which case all other parameters are ignored. Otherwise, the three info records ‘version’, ‘changed’ and ‘isarray’ are always modified but can be overridden by key=value pairs to this function.
Parameters: - name (str) – profile name (i.e. key)
- prof ([any] OR None) – list of values OR None
- mask ([int]) – list 1 x N_items of 0|1; if there are fewer values than items, provide mask with 0 for missing values, N.sum(mask)==len(prof)
- default (any) – value for items masked (default: None for lists, 0 for arrays)
- asarray (0|1|2) – store as list (0), as array (2) or store numbers as array but everything else as list (1) (default: 1)
- comment (str) – goes into info[name][‘comment’]
- moreInfo (key=value) – additional key-value pairs for info[name]
Raises: - ProfileError – if length of prof != length of other profiles
- ProfileError – if mask is given but N.sum(mask) != len(prof)
-
setInfo(name, **args)
Add/Override infos about a given profile, e.g.:
    setInfo('relASA', comment='new', params={'bin':'whatif'})
Raises: ProfileError – if no profile is found with |name|
-
setMany(profileDict, infos={})
setMany( dict, [infoDict] ) Add/Override many profiles
Parameters: - profileDict (dict) – dict with name:profile pairs
- infos (dict of dict) – info dicts for each profile, indexed by name
-
get(profKey[, default]) → list of values
OR get( (profKey, infoKey), [default] ) -> single value of info dict
Parameters: - name (str OR (str, str)) – profile key or profile and info key
- default (any) – default result if no profile is found, if None and no profile is found, raise exception
Raises: ProfileError – if no profile is found with |name|
-
getInfo(name)
Use:
    getInfo( name ) -> dict with meta infos about profile
Guaranteed infos: ‘version’->str, ‘comment’->str, ‘changed’->1|0
Parameters: name (str) – profile name
Returns: dict with infos about profile
Return type: dict
Raises: ProfileError – if no profile is found with |name|
-
profile2mask(profName, cutoff_min=None, cutoff_max=None)
Convert profile into a mask based on the max and min cutoff values.
Parameters: - profName (str) – profile name
- cutoff_min (float) – lower limit
- cutoff_max (float) – upper limit
Returns: mask len( get(profName) ) x 1|0
Return type: [1|0]
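A cutoff-based mask can be sketched in plain Python as below. The function name `profile_to_mask` and the example values are made up for illustration. The text does not say whether the limits are inclusive, so this sketch assumes a value is kept when cutoff_min <= value < cutoff_max; the real method may treat the boundaries differently.

```python
def profile_to_mask(values, cutoff_min=None, cutoff_max=None):
    """Return a list of 1|0 with one entry per value.

    Assumption (not confirmed by the documentation): a value is kept when
    cutoff_min <= value < cutoff_max; a cutoff of None is not applied.
    """
    mask = []
    for v in values:
        above = cutoff_min is None or v >= cutoff_min
        below = cutoff_max is None or v < cutoff_max
        mask.append(1 if (above and below) else 0)
    return mask


rel_asa = [5.0, 40.0, 80.0, 20.0]   # made-up example values
exposed = profile_to_mask(rel_asa, cutoff_min=30.0)
middle = profile_to_mask(rel_asa, cutoff_min=10.0, cutoff_max=50.0)
```

Such a mask has the same length as the profile and could then be handed to compress() to extract the matching positions from all profiles.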
-
take(indices, *initArgs, **initKw)
Take from profiles using provided indices:
    take( indices ) -> ProfileCollection with extract of all profiles
Any additional parameters are passed to the constructor of the new instance.
Parameters: indices ([int]) – list of indices
Returns: new profile from indices
Return type: ProfileCollection (or sub-class)
Raises: ProfileError – if take error
-
compress(cond)
Extract using a mask:
    p.compress( mask ) <==> p.take( N.nonzero( mask ) )
Parameters: cond (array or list of int) – mask with 1 for the positions to keep
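The stated equivalence between compress() and take() can be checked with a small stand-alone sketch. Both functions below are hypothetical helpers that operate on a plain dict of lists rather than on a ProfileCollection; the list comprehension over `enumerate(mask)` plays the role of N.nonzero( mask ).

```python
def take(profiles, indices):
    """Pick the given positions from every profile."""
    return {name: [values[i] for i in indices]
            for name, values in profiles.items()}


def compress(profiles, mask):
    """Keep the positions where mask is non-zero, in every profile."""
    indices = [i for i, keep in enumerate(mask) if keep]
    return take(profiles, indices)


profiles = {'name': ['N', 'CA', 'C', 'O'], 'charge': [0.1, 0.2, 0.3, 0.4]}
by_mask = compress(profiles, [0, 1, 0, 1])
by_index = take(profiles, [1, 3])
```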
-
remove(*key)
Remove profile OR info values of profile:
    remove( profKey )          -> 1|0, 1 if complete entry has been removed
    remove( profKey, infoKey ) -> 1|0, 1 if single info value was removed
Parameters: key (str OR str, str) – profile name OR name, infoKey
Returns: success status
Return type: 1|0
-
concat(*profiles)
Concatenate all profiles in this with corresponding profiles in the given ProfileCollection(s). Profiles that are not found in all ProfileCollections are skipped:
    p0.concat( p1 [, p2, ..]) -> single ProfileCollection with the same number of profiles as p0 but with the length of p0+p1+p2..
Parameters: profiles (ProfileCollection(s)) – profile(s) to concatenate
Returns: concatenated profile(s)
Return type: ProfileCollection / subclass
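The column-wise concatenation can be sketched on plain dicts of lists. This hypothetical `concat` helper follows the rule stated above that profiles missing from any of the collections are skipped; info records are ignored in this sketch.

```python
def concat(first, *others):
    """Join profiles end to end; skip names not present in every collection."""
    result = {}
    for name, values in first.items():
        if all(name in other for other in others):
            joined = list(values)
            for other in others:
                joined.extend(other[name])
            result[name] = joined
    return result


p0 = {'a': [1, 2], 'b': ['x', 'y']}
p1 = {'a': [3], 'b': ['z'], 'c': [9]}
p2 = {'a': [4, 5]}

joined = concat(p0, p1)
joined_all = concat(p0, p1, p2)
```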
-
update(other, stickyChanged=1, mask=None)
Merge other ProfileCollection into this one, replacing existing profiles and info values. This is the obvious translation of dict.update(). The changed flag of each profile is set to 1 if:
- an existing profile is overridden with different values
- the profile is marked ‘changed’ in the other collection
The two ProfileCollections should have the same dimension in terms of atoms, that is p1.profLength() == p2.profLength(). If this is not the case, it is possible to ‘mask’ atoms in p1 that are missing in p2. That means the target ProfileCollection can have more atoms than the other collection but not vice versa.
Example:
    p1.profLength() == 10
    p2.profLength() == 5
    p1.update( p2, mask=[0,1,0,1,0,1,0,1,0,1] )
…would assign the atom values of the shorter collection p2 to every second atom of the longer collection p1. If p2 has more items (atoms) per profile than p1, this would not work. In this case p2 first needs to be compressed to the same shape as p1:
    p1.profLength() == 5
    p2.profLength() == 10
    p2 = p2.compress( [0,1,0,1,0,1,0,1,0,1] )
    p1.update( p2 )
Parameters: - other (ProfileCollection) – profile
- stickyChanged (0|1) – mark all profiles ‘changed’ that are marked ‘changed’ in the other collection (default: 1)
- mask ([int] or N.array) – 1 x N_atoms array of 0|1; if the other collection has fewer atoms than this one, mark those positions with 0 that exist only in this collection (len(mask)==self.profLength(), N.sum(mask)==other.profLength())
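The masked update from the example above can be sketched for a single profile in plain Python. `masked_update` is a hypothetical helper, and the two consistency checks are assumptions derived from the example (a mask of length 10 with five 1s for a source of length 5), not documented behaviour.

```python
def masked_update(target, source, mask):
    """Write source values into target at the positions where mask is 1."""
    if len(mask) != len(target):
        raise ValueError('mask must have one entry per target position')
    if sum(mask) != len(source):
        raise ValueError('mask must select exactly len(source) positions')
    values = iter(source)
    for i, keep in enumerate(mask):
        if keep:
            target[i] = next(values)
    return target


p1_profile = list(range(10))                # longer (target) profile
p2_profile = ['a', 'b', 'c', 'd', 'e']      # shorter (source) profile
updated = masked_update(p1_profile, p2_profile, [0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
```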
-
updateMissing(source, copyMissing=1, allowEmpty=0, setChanged=0)
Merge other ProfileCollection into this one but do not override existing profiles and info records. There is one exception: Empty profiles (None or []) are replaced but their info records stay untouched. If copyMissing=0, profiles that exist in source but not in this collection are NOT copied (i.e. only empty profiles are replaced).
For each profile copied from the source the ‘changed’ flag is reset to |setChanged| (default 0), regardless whether or not the profile is marked ‘changed’ in the source collection.
Parameters: - source (ProfileCollection) – profile
- copyMissing (0|1) – copy missing profiles that exist in source (default: 1)
- allowEmpty (0|1) – still tolerate zero-length profiles after update (default: 0)
- setChanged (0|1) – label profiles copied from source as ‘changed’ (default: 0)
Raises: ProfileError – if allowEmpty is 0 and some empty profiles cannot be found in source
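The merge rule (never override, but fill empty placeholders and optionally copy missing profiles) can be sketched on plain dicts. `update_missing` and `is_empty` are hypothetical helpers; info records, the ‘changed’ flag and the allowEmpty check are left out of this sketch.

```python
def is_empty(values):
    """True for the two kinds of empty profile named in the text: None or []."""
    return values is None or len(values) == 0


def update_missing(target, source, copy_missing=True):
    """Fill empty or missing profiles of target from source; keep the rest."""
    for name, values in source.items():
        if is_empty(values):
            continue                          # nothing useful to copy
        if name in target:
            if is_empty(target[name]):
                target[name] = list(values)   # empty placeholder gets filled
        elif copy_missing:
            target[name] = list(values)       # profile only known to source
    return target


target = {'a': [1, 2], 'b': None}
source = {'a': [9, 9], 'b': [3, 4], 'c': [5, 6]}

merged = update_missing(dict(target), source)
merged_strict = update_missing(dict(target), source, copy_missing=False)
```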
-
isChanged(keys=None)
Parameters: keys ([ str ] OR str) – only check these profiles (default: None -> means all)
Returns: True, if any of the profiles is tagged as ‘changed’
Return type: bool
-
clone()
Clone (deepcopy) profiles:
    clone() -> ProfileCollection (or sub-class)
Returns: profile
Return type: ProfileCollection
-
profLength(default=0)
Length of profile:
    profLength() -> int; length of first non-None profile or default (0)
Parameters: default (any) – value to return if all profiles are set to None
Returns: length of first non-None profile or 0
Return type: int
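The "first non-None profile" rule is simple enough to show in a few lines. `prof_length` is a hypothetical helper working on a plain dict, written only to illustrate the rule.

```python
def prof_length(profiles, default=0):
    """Length of the first profile that is not None, else the default."""
    for values in profiles.values():
        if values is not None:
            return len(values)
    return default


length = prof_length({'pending': None, 'mass': [12.0, 14.0, 16.0]})
all_pending = prof_length({'pending': None})
```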
-
plot(*name, **arg)
Plot one or more profiles using Biggles:
    plot( name1, [name2, ..],[arg1=x, arg2=y]) -> biggles.FramedPlot
Parameters: - name (str) – one or more profile names
- arg – key=value pairs for Biggles.Curve() function
Raises: - TypeError – if profile contains non-number items
- ImportError – If biggles module could not be imported
Returns: plot, view using plot.show()
Return type: biggles.FramedPlot
-
plotArray(*name, **arg)
Plot several profiles as a panel of separate plots.
Parameters: - *name (str or (str, str,…)) – one or more profile names or tuples of profile names
- xkey (str) – profile to be used as x-axis (default: None)
- arg – key=value pairs for Biggles.Curve() function
Returns: plot, view using plot.show()
Return type: biggles.FramedPlot
Raises: - TypeError – if profile contains non-number items
- ImportError – If biggles module could not be imported
-
plotHistogram(*name, **arg)
Parameters: - bins (int) – number of bins (10)
- ynormalize (bool) – normalize histograms to area 1.0 (False)
- xnormalize (bool) – adapt bin range to min and max of all profiles (True)
- xrange ((float, float)) – min and max of bin range (None)
- steps (bool) – draw histogram steps (True)