PDBModel

class biskit.PDBModel(source=None, pdbCode=None, noxyz=0, skipRes=None, headPatterns=[])[source]

Bases: object

Store and manipulate coordinates and atom infos stemming from a PDB file. Coordinates are stored in the numpy array ‘xyz’; the additional atom infos from the PDB (name, residue_name, and many more) are efficiently stored in a PDBProfiles instance ‘atoms’ which can be used to also associate arbitrary other data to the atoms. Moreover, a similar collection ‘residues’ can hold data associated to residues (but is initially empty). A normal dictionary ‘info’ accepts any information about the whole model.

For detailed documentation, see http://biskit.pasteur.fr/doc/handling_structures/PDBModel

@todo:
  • outsource validSource into PDBParserFactory
  • prevent repeated loading of test PDB for each test

Methods Overview

__init__ Examples:
addChainFromSegid Takes the last letter of the segment ID and adds it as chain ID.
addChainId Assign consecutive chain identifiers A - Z to all atoms.
argsort Prepare sorting atoms within residues according to comparison function.
atom2chainIndices Convert atom indices to chain indices.
atom2chainMask Mask (set to 0) chains for which all atoms are masked (0) in atomMask.
atom2resIndices Get list of indices of residues for which any atom is in indices.
atom2resMask Mask (set 0) residues for which all atoms are masked (0) in atomMask.
atom2resProfile Get a residue profile where each residue has the value that its first atom has in the atom profile.
atomNames Return a list of atom names from start to stop RESIDUE index
atomRange
>>> m.atomRange() == range( m.lenAtoms() )
atomkey Create a string key encoding the atom content of this model independent of the order in which atoms appear within residues.
biomodel Return the ‘biologically relevant assembly’ of this model according to the information in the PDB’s BIOMT record (captured in info[‘BIOMT’]).
center Geometric center of model.
centerOfMass Center of mass of PDBModel.
centered Get model with centered coordinates.
chain2atomIndices Convert chain indices into atom indices.
chain2atomMask Convert chain mask to atom mask.
chainBreaks Identify discontinuities in the molecule’s backbone.
chainEndIndex Get the position of each chain’s last atom.
chainIndex Get indices of first atom of each chain.
chainMap Get chain index of each atom.
clone Clone PDBModel.
compareAtoms Get list of atom indices for this and reference model that converts both into 2 models with identical residue and atom content.
compareChains Get list of corresponding chain indices for this and reference model.
compress Compress PDBmodel using mask.
concat Concatenate atoms, coordinates and profiles.
disconnect Disconnect this model from its source (if any).
equals Compares the residue and atom sequence in the given range.
extendIndex Translate a list of positions that is defined, e.g., on residues (/chains) to a list of atom positions AND also return the starting position of each residue (/chain) in the new sub-list of atoms.
extendMask Translate a mask that is defined, e.g., on residues (/chains) to a mask that is defined on atoms.
filter Extract atoms that match a combination of key=values.
filterIndex Get atom positions that match a combination of key=values.
fit Least-square fit this model onto refMode
getAtoms Get atom CrossViews that can be used like dictionaries.
getPdbCode Return pdb code of model.
getXyz Get coordinates, fetch from source PDB or pickled PDBModel, if necessary.
index2map Create a map of len_i length, giving the residue(/chain) number of each atom, from list of residue(/chain) starting positions.
indices Get atom indices conforming condition.
indicesFrom Get atom indices conforming condition applied to an atom profile.
keep Replace atoms,coordinates,profiles of this(!) model with sub-set.
lenAtoms Number of atoms in model.
lenBiounits Number of biological assemblies defined in PDB BIOMT record, if any.
lenChains Number of chains in model.
lenResidues Number of residues in model.
magicFit Superimpose this model onto a ref.
map2index Identify the starting positions of each residue(/chain) from a map giving the residue(/chain) number of each atom.
mask Get atom mask.
maskBB Short cut for mask of all backbone atoms.
maskCA Short cut for mask of all CA atoms.
maskCB Short cut for mask of all CB and CA of GLY.
maskDNA Short cut for mask of all atoms in DNA (based on residue name).
maskF Create list with result of atomFunction( atom ) for each atom.
maskFrom Create an atom mask from the values of a specific profile.
maskH Short cut for mask of hydrogens.
maskH2O Short cut for mask of all atoms in residues named TIP3, HOH and WAT
maskHeavy Short cut for mask of all heavy atoms.
maskHetatm Short cut for mask of all HETATM
maskNA Short cut for mask of all atoms in DNA or RNA (based on residue name).
maskProtein Short cut for mask containing all atoms of amino acids.
maskRNA Short cut for mask of all atoms in RNA (based on residue name).
maskSolvent Short cut for mask of all atoms in residues named TIP3, HOH, WAT, Na+, Cl-, CA, ZN
mass Molecular weight of PDBModel.
masses Collect the molecular weight of all atoms in PDBModel.
mergeChains Merge two adjacent chains.
mergeResidues Merge two adjacent residues.
plot Get a quick & dirty overview over the content of a PDBModel.
profile Use:: profile( name, updateMissing=0) -> atom or residue profile
profile2atomMask Same as profile2mask, but converts residue mask to atom mask.
profile2mask Create a mask from a profile; cutoff_min (float) is the low value cutoff (all values >= cutoff_min), cutoff_max (float) is the high value cutoff (all values < cutoff_max).
profile2resList Group the profile values of each residue’s atoms into a separate list.
profileChangedFromDisc Check if profile has changed compared to source.
profileInfo Use:
remove Convenience access to the 3 different remove methods.
removeProfile Remove residue or atom profile(s)
removeRes Remove all atoms with a certain residue name.
renameAmberRes Rename special residue names from Amber back into standard names (e.g. CYX -> CYS)
renumberResidues Make all residue numbers consecutive and remove any insertion code letters.
report Print (or return) a brief description of this model.
reportAtoms Return a formatted string (str) with atom and residue names similar to PDB; i ([ int ]) is an optional list of atom positions to report (default: all).
res2atomIndices Convert residue indices to atom indices.
res2atomMask Convert residue mask to atom mask.
res2atomProfile Get an atom profile where each atom has the value its residue has in the residue profile.
resEndIndex Get the position of each residue’s last atom.
resIndex Get the position of each residue’s first atom.
resList Return list of lists of atom pseudo dictionaries per residue, which allows to iterate over residues and atoms of residues.
resMap Get list to map from any atom to a continuous residue numbering (starting with 0).
resMapOriginal Generate list to map from any atom to its ORIGINAL(!) PDB residue number.
resModels Creates one new PDBModel for each residue in the parent PDBModel.
residusMaximus Take list of value per atom, return list where all atoms of any residue are set to the highest value of any atom in that residue.
rms Rmsd between two PDBModels.
saveAs Pickle this PDBModel to a file, set the ‘source’ field to this file name and mark atoms, xyz, and profiles as unchanged.
sequence Amino acid sequence in one letter code.
setPdbCode Set model pdb code.
setSource Takes source: LocalPath OR PDBModel OR str.
setXyz Replace coordinates.
slim Remove xyz array and profiles if they haven’t been changed and could hence be loaded from the source file (only if there is a source file…).
sort Apply a given sort list to the atoms of this model.
sourceFile Name of pickled source or PDB file.
structureFit Structure-align this model onto a reference model using the external TM-Align program (which needs to be installed).
take Extract a PDBModel with a subset of atoms:
takeChains Get copy of this model with only the given chains.
takeResidues Copy the given residues into a new model.
transform Transform coordinates of PDBModel.
transformation Get the transformation matrix which least-square fits this model onto the other model.
unequalAtoms Identify atoms that are not matching between two models.
unsort Undo a previous sorting on the model itself (no copy).
update Read coordinates, atoms, fileName, etc.
validSource Check for a valid source on disk.
version
writePdb Save model as PDB file.
xplor2amber Rename atoms so that tleap from Amber can read the PDB.
xyzChangedFromDisc Tell whether xyz can currently be reconstructed from a source on disc.
xyzIsChanged Tell if xyz or atoms have been changed compared to source file or source object (which can be still in memory).

Attributes Overview

PDB_KEYS keys of all atom profiles that are read directly from the PDB file

PDBModel Method & Attribute Details

PDB_KEYS = ['name', 'residue_number', 'insertion_code', 'alternate', 'name_original', 'chain_id', 'occupancy', 'element', 'segment_id', 'charge', 'residue_name', 'after_ter', 'serial_number', 'type', 'temperature_factor']

keys of all atom profiles that are read directly from the PDB file

__init__(source=None, pdbCode=None, noxyz=0, skipRes=None, headPatterns=[])[source]

Examples:

  • PDBModel() creates an empty Model to which coordinates (field xyz) and PDB records (atom profiles) have still to be added.
  • PDBModel( file_name ) creates a complete model with coordinates and PDB records from file_name (pdb, pdb.gz, or pickled PDBModel)
  • PDBModel( PDBModel ) creates a copy of the given model
  • PDBModel( PDBModel, noxyz=1 ) creates a copy without coordinates
Parameters:
  • source (str or PDBModel) – str, file name of pdb/pdb.gz file OR pickled PDBModel OR PDBModel, template structure to copy atoms/xyz field from
  • pdbCode (str or None) – PDB code, is extracted from file name otherwise
  • noxyz (0||1) – 0 (default) || 1, create without coordinates
  • headPatterns ([(str, str)]) – [(putIntoKey, regex)] extract given REMARK values
Raises:

PDBError – if file exists but can’t be read

residues = None

save atom-/residue-based values

xyzChanged = None

monitor changes of coordinates

initVersion = None

version as of creation of this object

info = None

to collect further informations

report(prnt=True, plot=False, clipseq=60)[source]

Print (or return) a brief description of this model.

Parameters:
  • prnt (bool) – directly print report to STDOUT (default True)
  • plot (bool) – show simple 2-D line plot using gnuplot [False]
  • clipseq (int) – clip chain sequences at this number of letters [60]
Returns:

if prnt==True: None, else: formatted description of this model

Return type:

None or str

plot(hetatm=False)[source]

Get a quick & dirty overview over the content of a PDBModel. plot simply creates a 2-D plot of all x-coordinates versus all y-coordinates, colored by chain. This is obviously not publication-quality ;-). Use the Biskit.Pymoler class for real visualization.

Parameters:hetatm (bool) – include hetero & solvent atoms (default False)
update(skipRes=None, updateMissing=0, force=0, headPatterns=[])[source]

Read coordinates, atoms, fileName, etc. from PDB or pickled PDBModel - but only if they are currently empty. The atomsChanged and xyzChanged flags are not changed.

Parameters:
  • skipRes (list of str) – names of residues to skip if updating from PDB
  • updateMissing (0|1) – 0(default): update only existing profiles
  • force (0|1) – ignore invalid source (0) or report error (1)
  • headPatterns ([(str, str)]) – [(putIntoKey, regex)] extract given REMARKS
Raises:

PDBError – if file can’t be unpickled or read

setXyz(xyz)[source]

Replace coordinates.

Parameters:xyz (array) – Numpy array ( 3 x N_atoms ) of float
Returns:array( 3 x N_atoms ) or None, old coordinates
Return type:array
setSource(source)[source]
Parameters:source – LocalPath OR PDBModel OR str
getXyz(mask=None)[source]

Get coordinates, fetch from source PDB or pickled PDBModel, if necessary.

Parameters:mask (list of int OR array of 1||0) – atom mask
Returns:xyz-coordinates, array( 3 x N_atoms, Float32 )
Return type:array
getAtoms(mask=None)[source]

Get atom CrossViews that can be used like dictionaries. Note that the direct manipulation of individual profiles is more efficient than the manipulation of CrossViews (on profiles)!

Parameters:mask (list of int OR array of 1||0) – atom mask
Returns:list of CrossView dictionaries
Return type:[ ProfileCollection.CrossView ]
profile(name, default=None, update=True, updateMissing=False)[source]
Use::
profile( name, updateMissing=0) -> atom or residue profile
Parameters:
  • name (str) – name to access profile
  • default (any) – default result if no profile is found; if None, try to update from source and raise error [None]
  • update (bool) – update from source before returning empty profile [True]
  • updateMissing (bool) – update from source before reporting missing profile [False]
Raises:ProfileError – if neither atom- nor rProfiles contains |name|
profileInfo(name, updateMissing=0)[source]

Use:

profileInfo( name ) -> dict with infos about profile
Parameters:
  • name (str) – name to access profile
  • updateMissing (0|1) –

    update from source before reporting missing profile. Guaranteed infos are:

    • ’version’ (str)
    • ’comment’ (str)
    • ’changed’ (1||0)
Raises:

ProfileError – if neither atom - nor rProfiles contains |name|

removeProfile(*names)[source]

Remove residue or atom profile(s)

Use:

removeProfile( str_name [,name2, name3] ) -> 1|0,
Parameters:names (str OR list of str) – name or list of residue or atom profiles
Returns:1 if at least 1 profile has been deleted, 0 if none has been found
Return type:int
xyzIsChanged()[source]

Tell if xyz or atoms have been changed compared to source file or source object (which can be still in memory).

Returns:xyz field has been changed with respect to source
Return type:(1||0, 1||0)
xyzChangedFromDisc()[source]

Tell whether xyz can currently be reconstructed from a source on disc. Same as xyzChanged() unless source is another not yet saved PDBModel instance that made changes relative to its own source.

Returns:xyz has been changed
Return type:bool
profileChangedFromDisc(pname)[source]

Check if profile has changed compared to source.

Returns:1, if profile |pname| can currently not be reconstructed from a source on disc.
Return type:int
Raises:ProfileError – if there is no atom or res profile with pname
slim()[source]

Remove xyz array and profiles if they haven’t been changed and could hence be loaded from the source file (only if there is a source file…). AUTOMATICALLY CALLED BEFORE PICKLING. Currently also called by deepcopy via __getstate__.

validSource()[source]

Check for a valid source on disk.

Returns:str or PDBModel, None if this model has no valid source
Return type:str or PDBModel or None
sourceFile()[source]

Name of pickled source or PDB file. If this model has another PDBModel as source, the request is passed on to this one.

Returns:file name of pickled source or PDB file
Return type:str
Raises:PDBError – if there is no valid source
disconnect()[source]

Disconnect this model from its source (if any).

Note

If this model has an (in-memory) PDBModel instance as source, the entries of ‘atoms’ could still reference the same dictionaries.

getPdbCode()[source]

Return pdb code of model.

Returns:pdb code
Return type:str
setPdbCode(code)[source]

Set model pdb code.

Parameters:code (str) – new pdb code
sequence(mask=None, xtable={'ca': '+', 'cl-': '-', 'hoh': '~', 'na+': '+', 'nap': 'X', 'ndp': 'X', 'tip3': '~', 'wat': '~'})[source]

Amino acid sequence in one letter code.

Parameters:
  • mask (list or array) – atom mask, to apply before (default None)
  • xtable (dict) – dict {str:str}, additional residue:single_letter mapping for non-standard residues (default molUtils.xxDic) [currently not used]
Returns:

1-letter-code AA sequence (based on first atom of each res).

Return type:

str

xplor2amber(aatm=True, parm10=False)[source]

Rename atoms so that tleap from Amber can read the PDB. If HIS residues contain atoms named HE2 or/and HD2, the residue name is changed to HIE or HID or HIP, respectively. Disulfide bonds are not yet identified - CYS -> CYX renaming must be done manually (see AmberParmBuilder for an example). Internally amber uses H atom names ala HD21 while (old) standard pdb files use 1HD2. By default, ambpdb produces ‘standard’ pdb atom names but it can output the less ambiguous amber names with switch -aatm.

Parameters:
  • change (1|0) – change this model’s atoms directly (default:1)
  • aatm (1|0) – use, for example, HG23 instead of 3HG2 (default:1)
  • parm10 (1|0) – adapt nucleic acid atom names to 2010 Amber forcefield
Returns:

[ {..} ], list of atom dictionaries

Return type:

list of atom dictionaries
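
The two hydrogen naming styles mentioned above differ only in where the digit sits: old-style PDB names lead with it (3HG2), Amber-style names end with it (HG23). A minimal sketch of that conversion follows. It is not Biskit's implementation; the helper names are hypothetical, and the rule that only four-character names ending in a digit get wrapped is an assumption made for illustration.

```python
def to_amber_name(name):
    """Move a leading digit to the end, e.g. '3HG2' -> 'HG23' (hypothetical helper)."""
    if name and name[0].isdigit():
        return name[1:] + name[0]
    return name


def wrap_name(name):
    """Move a trailing digit to the front, e.g. 'HG23' -> '3HG2' (hypothetical helper).

    Assumption: only 4-character names ending in a digit are wrapped.
    """
    if len(name) == 4 and name[-1].isdigit():
        return name[-1] + name[:-1]
    return name


amber_style = [to_amber_name(n) for n in ['3HG2', '1HD2', 'CA']]
pdb_style = [wrap_name(n) for n in ['HG23', 'NH12', 'CA']]
```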

renameAmberRes()[source]

Rename special residue names from Amber back into standard names (e.g. CYX -> CYS)

writePdb(fname, ter=1, amber=0, original=0, left=0, wrap=0, headlines=None, taillines=None)[source]

Save model as PDB file.

Parameters:
  • fname (str) – name of new file
  • ter (int) –

    Option of how to treat the terminal record:

    • 0 - don’t write any TER statements
    • 1 - restore original TER statements (doesn’t work if the preceding atom has been deleted) [default]
    • 2 - put TER between all detected chains
    • 3 - as 2 but also detect and split discontinuous chains
  • amber (1||0) – amber formatted atom names (implies ter=3, left=1, wrap=0) (default 0)
  • original (1||0) – revert atom names to the ones parsed in from PDB (default 0)
  • left (1||0) – left-align atom names (as in amber pdbs)(default 0)
  • wrap (1||0) – write e.g. ‘NH12’ as ‘2NH1’ (default 0)
  • headlines (list of tuples) – [( str, dict or str)], list of record / data tuples:: e.g. [ (‘SEQRES’, ‘ 1 A 22 ALA GLY ALA’), ]
  • taillines (list of tuples) – same as headlines but appended at the end of file
saveAs(path)[source]

Pickle this PDBModel to a file, set the ‘source’ field to this file name and mark atoms, xyz, and profiles as unchanged. Normal pickling of the object will only dump those data that can not be reconstructed from the source of this model (if any). saveAs creates a ‘new source’ without further dependencies.

Parameters:path (str OR LocalPath instance) – target file name
maskF(atomFunction, numpy=1)[source]

Create list with result of atomFunction( atom ) for each atom. (Depending on the return value of atomFunction, the result is not necessarily a mask of 0 and 1. Creating masks should be just the most common usage).

Note:

This method is slow compared to maskFrom because the dictionaries that are given to the atomFunction have to be created from aProfiles on the fly. If performance matters, better combine the result from several maskFrom calls, e.g. instead of:

r = m.maskF( lambda a: a['name']=='CA' and a['residue_name']=='ALA' )

use:

r = m.maskFrom( 'name', 'CA' ) * m.maskFrom('residue_name', 'ALA')
Parameters:
  • atomFunction (1||0) – function( dict_from_aProfiles.toDict() ), true || false (Condition)
  • numpy (int) – 1(default)||0, convert result to Numpy array of int
Returns:

Numpy array( [0,1,1,0,0,0,1,0,..], Int) or list

Return type:

array or list

maskFrom(key, cond)[source]

Create an atom mask from the values of a specific profile. Example, the following three statements are equivalent:

>>> mask = m.maskFrom( 'name', 'CA' )
>>> mask = m.maskFrom( 'name', lambda a: a == 'CA' )
>>> mask = N0.array( [ a == 'CA' for a in m.atoms['name'] ] )

However, the same can be also achieved with standard numpy operators:

>>> mask = numpy.array(m.atoms['name']) == 'CA'
Parameters:
  • key (str) – the name of the profile to use
  • cond (function OR any OR [ any ]) – either a function accepting a single value or a value or an iterable of values (to allow several alternatives)
Returns:

array or list of indices where condition is met

Return type:

list or array of int
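
The three accepted forms of cond (a single value, a function, or an iterable of alternatives) can be illustrated without Biskit. The sketch below uses plain lists in place of atom profiles and numpy arrays; mask_from_sketch is a hypothetical stand-in written for this page, not the library code.

```python
def mask_from_sketch(values, cond):
    """Return a 0/1 list marking the entries of `values` that satisfy `cond`."""
    if callable(cond):
        test = cond                          # cond is a function of one value
    elif isinstance(cond, (list, tuple, set)):
        test = lambda v: v in cond           # cond lists several alternatives
    else:
        test = lambda v: v == cond           # cond is a single value
    return [int(bool(test(v))) for v in values]


# Toy "profiles": one ALA residue (5 atoms) followed by part of a GLY residue
names = ['N', 'CA', 'C', 'O', 'CB', 'N', 'CA']
resnames = ['ALA', 'ALA', 'ALA', 'ALA', 'ALA', 'GLY', 'GLY']

ca = mask_from_sketch(names, 'CA')
ca_fn = mask_from_sketch(names, lambda a: a == 'CA')
backbone = mask_from_sketch(names, ['N', 'CA', 'C', 'O'])

# Element-wise product of two masks acts as a logical AND
ca_of_ala = [a * b for a, b in zip(ca, mask_from_sketch(resnames, 'ALA'))]
```

With numpy arrays, as in the maskF note, the last line reduces to a plain `*` between the two masks.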

maskCA(force=0)[source]

Short cut for mask of all CA atoms.

Parameters:force (0||1) – force calculation even if cached mask is available
Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskBB(force=0, solvent=0)[source]

Short cut for mask of all backbone atoms. Supports standard protein and DNA atom names. Any residues classified as solvent (water, ions) are filtered out.

Parameters:
  • force (0||1) – force calculation even if cached mask is available
  • solvent (1||0) – include solvent residues (default: false)
Returns:

array( 1 x N_atoms ) of 0||1

Return type:

array

maskHeavy(force=0)[source]

Short cut for mask of all heavy atoms. (‘element’ <> H)

Parameters:force (0||1) – force calculation even if cached mask is available
Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskH()[source]

Short cut for mask of hydrogens. (‘element’ == H)

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskCB()[source]

Short cut for mask of all CB and CA of GLY.

Returns:mask of all CB plus CA of GLY
Return type:array
maskH2O()[source]

Short cut for mask of all atoms in residues named TIP3, HOH and WAT

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskSolvent()[source]

Short cut for mask of all atoms in residues named TIP3, HOH, WAT, Na+, Cl-, CA, ZN

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskHetatm()[source]

Short cut for mask of all HETATM

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskProtein(standard=0)[source]

Short cut for mask containing all atoms of amino acids.

Parameters:standard (0|1) – only standard residue names (not CYX, NME,..) (default 0)
Returns:array( 1 x N_atoms ) of 0||1, mask of all protein atoms (based on residue name)
Return type:array
maskDNA()[source]

Short cut for mask of all atoms in DNA (based on residue name).

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskRNA()[source]

Short cut for mask of all atoms in RNA (based on residue name).

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
maskNA()[source]

Short cut for mask of all atoms in DNA or RNA (based on residue name).

Returns:array( 1 x N_atoms ) of 0||1
Return type:array
indicesFrom(key, cond)[source]

Get atom indices conforming condition applied to an atom profile. Corresponds to:

>>> numpy.nonzero( m.maskFrom( key, cond) )
Parameters:
  • key (str) – the name of the profile to use
  • cond (function OR any OR [any]) – either a function accepting a single value or a value or an iterable of values
Returns:

array of indices where condition is met

Return type:

array of int

indices(what)[source]

Get atom indices conforming condition. This is a convenience method to ‘normalize’ different kind of selections (masks, atom names, indices, functions) to indices as they are e.g. required by PDBModel.take.

Parameters:what (function OR list of str or int OR int) –

Selection:

  • function applied to each atom entry, e.g. lambda a: a['residue_name']=='GLY'
  • list of str, allowed atom names
  • list of int, allowed atom indices OR mask with only 1 and 0
  • int, single allowed atom index
Returns:N_atoms x 1 (0||1 )
Return type:Numeric array
Raises:PDBError – if what is neither of above
mask(what)[source]

Get atom mask. This is a convenience method to ‘normalize’ different kind of selections (masks, atom names, indices, functions) to a mask as it is e.g. required by PDBModel.compress.

Parameters:what (function OR list of str or int OR int) –

Selection:

  • function applied to each atom entry, e.g. lambda a: a['residue_name']=='GLY'
  • list of str, allowed atom names
  • list of int, allowed atom indices OR mask with only 1 and 0
  • int, single allowed atom index
Returns:N_atoms x 1 (0||1 )
Return type:Numeric array
Raises:PDBError – if what is neither of above
index2map(index, len_i)[source]

Create a map of len_i length, giving the residue(/chain) number of each atom, from list of residue(/chain) starting positions.

Parameters:
  • index ([ int ] or array of int) – list of starting positions, e.g. [0, 3, 8]
  • len_i (int) – length of target map, e.g. 10
Returns:

list mapping atom positions to residue(/chain) number, e.g. [0,0,0, 1,1,1,1,1, 2,2] from above example

Return type:

array of int (and of len_i length)
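
The mapping described above can be reproduced in a few lines of plain Python. This is a sketch of the documented behaviour, not Biskit's numpy-based code; index2map_sketch is a hypothetical name, and it assumes that index is sorted and starts at 0.

```python
def index2map_sketch(index, len_i):
    """Expand starting positions into one residue (or chain) number per atom."""
    result = []
    for number, start in enumerate(index):
        # a residue ends where the next one starts, or at len_i for the last one
        stop = index[number + 1] if number + 1 < len(index) else len_i
        result.extend([number] * (stop - start))
    return result


res_map = index2map_sketch([0, 3, 8], 10)
```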

map2index(imap)[source]

Identify the starting positions of each residue(/chain) from a map giving the residue(/chain) number of each atom.

Parameters:imap ([ int ]) – something like [0,0,0,1,1,1,1,1,2,2,2,…]
Returns:list of starting positions, e.g. [0, 3, 8, …] in above ex.
Return type:array of int
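
The inverse direction only needs to record every position at which the residue (or chain) number changes. A plain-Python sketch with a hypothetical function name, again not the library implementation:

```python
def map2index_sketch(imap):
    """Return the positions at which a new residue (or chain) number begins."""
    starts = []
    for position, number in enumerate(imap):
        # the first atom always starts a residue; later ones only on a change
        if position == 0 or number != imap[position - 1]:
            starts.append(position)
    return starts


starts = map2index_sketch([0, 0, 0, 1, 1, 1, 1, 1, 2, 2])
```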
extendMask(mask, index, len_i)[source]

Translate a mask that is defined, e.g., on residues (/chains) to a mask that is defined on atoms.

Parameters:
  • mask ([ bool ] or array of bool or of 1||0) – mask marking positions in the list of residues or chains
  • index ([ int ] or array of int) – starting positions of all residues or chains
  • len_i (int) – length of target mask

Returns:mask that blows up the residue / chain mask to an atom mask
Return type:array of bool
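
As an illustration, a residue mask can be blown up to an atom mask by repeating each residue's flag once per atom. The sketch below uses plain lists and a hypothetical function name, and assumes that index is sorted and starts at 0.

```python
def extend_mask_sketch(mask, index, len_i):
    """Repeat each residue's (or chain's) flag for all of its atoms."""
    atom_mask = []
    for number, start in enumerate(index):
        stop = index[number + 1] if number + 1 < len(index) else len_i
        atom_mask.extend([bool(mask[number])] * (stop - start))
    return atom_mask


# three residues starting at atoms 0, 3 and 8; keep the first and the last
atom_mask = extend_mask_sketch([1, 0, 1], [0, 3, 8], 10)
```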
extendIndex(i, index, len_i)[source]

Translate a list of positions that is defined, e.g., on residues (/chains) to a list of atom positions AND also return the starting position of each residue (/chain) in the new sub-list of atoms.

Parameters:
  • i ([ int ] or array of int) – positions in higher level list of residues or chains
  • index ([ int ] or array of int) – atomic starting positions of all residues or chains
  • len_i (int) – length of atom index (total number of atoms)

Returns:(ri, rindex) - atom positions & new index
Return type:array of int, array of int
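
A small worked example may help with the two return values. The sketch below (plain Python, hypothetical function name, not Biskit's code) selects residues in the requested order and records where each selected residue starts in the new atom list.

```python
def extend_index_sketch(i, index, len_i):
    """Translate residue positions into atom positions plus a new start index."""
    ri, rindex = [], []
    for number in i:
        start = index[number]
        stop = index[number + 1] if number + 1 < len(index) else len_i
        rindex.append(len(ri))           # start of this residue in the result
        ri.extend(range(start, stop))    # all atom positions of this residue
    return ri, rindex


# residues start at atoms 0, 3 and 8; pick the last residue, then the first
ri, rindex = extend_index_sketch([2, 0], [0, 3, 8], 10)
```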
atom2resMask(atomMask)[source]

Mask (set 0) residues for which all atoms are masked (0) in atomMask.

Parameters:atomMask (list/array of int) – list/array of int, 1 x N_atoms
Returns:1 x N_residues (0||1 )
Return type:array of int
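
Put differently, a residue stays marked if any of its atoms is marked. A plain-Python sketch of that reduction follows; the function name and the explicit res_index argument are assumptions for illustration, since the real method takes the residue index from the model.

```python
def atom2res_mask_sketch(atom_mask, res_index):
    """Mark a residue with 1 if at least one of its atoms is 1 in atom_mask."""
    n_atoms = len(atom_mask)
    res_mask = []
    for number, start in enumerate(res_index):
        stop = res_index[number + 1] if number + 1 < len(res_index) else n_atoms
        res_mask.append(int(any(atom_mask[start:stop])))
    return res_mask


atom_mask = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1]
res_mask = atom2res_mask_sketch(atom_mask, [0, 3, 8])
```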
atom2resIndices(indices)[source]

Get list of indices of residues for which any atom is in indices.

Note: in the current implementation, the resulting residues are returned in their old order, regardless of the order of input positions.

Parameters:indices (list of int) – list of atom indices
Returns:indices of residues
Return type:list of int
res2atomMask(resMask)[source]

Convert residue mask to atom mask.

Parameters:resMask (list/array of int) – list/array of int, 1 x N_residues
Returns:1 x N_atoms
Return type:array of int
res2atomIndices(indices)[source]

Convert residue indices to atom indices.

Parameters:indices (list/array of int) – list/array of residue indices
Returns:array of atom positions
Return type:array of int
atom2chainIndices(indices, breaks=0)[source]

Convert atom indices to chain indices. Each chain is only returned once.

Parameters:
  • indices (list of int) – list of atom indices
  • breaks (0||1) – look for chain breaks in backbone coordinates (def. 0)
Returns:

indices of chains that contain any atom which is in indices

Return type:

list of int

atom2chainMask(atomMask, breaks=0)[source]

Mask (set to 0) chains for which all atoms are masked (0) in atomMask. Put another way: Mark all chains that contain any atom that is marked ‘1’ in atomMask.

Parameters:atomMask (list/array of int) – list/array of int, 1 x N_atoms
Returns:1 x N_chains (0||1 )
Return type:array of int
chain2atomMask(chainMask, breaks=0)[source]

Convert chain mask to atom mask.

Parameters:
  • chainMask (list/array of int) – list/array of int, 1 x N_chains
  • breaks (0||1) – look for chain breaks in backbone coordinates (def. 0)
Returns:

1 x N_atoms

Return type:

array of int

chain2atomIndices(indices, breaks=0)[source]

Convert chain indices into atom indices.

Parameters:indices (list/array of int) – list/array of chain indices
Returns:array of atom positions, new chain index
Return type:array of int
res2atomProfile(p)[source]

Get an atom profile where each atom has the value its residue has in the residue profile.

Parameters:p (str) – name of existing residue profile OR … [ any ], list of lenResidues() length
Returns:[ any ] OR array, atom profile
Return type:list or array
atom2resProfile(p, f=None)[source]

Get a residue profile where each residue has the value that its first atom has in the atom profile.

Parameters:
  • p (str) – name of existing atom profile OR … [ any ], list of lenAtoms() length
  • f (func) – function to calculate single residue from many atom values f( [atom_value1, atom_value2,…] ) -> res_value (default None, simply take value of first atom in each res.)
Returns:[ any ] OR array, residue profile
Return type:list or array
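
The role of f can be shown with a plain-Python sketch: without f the first atom's value represents the residue, with f the residue value is computed from all of its atom values. The function name and the explicit res_index argument are hypothetical; this is not the library code.

```python
def atom2res_profile_sketch(values, res_index, f=None):
    """Collapse per-atom values into one value per residue."""
    n_atoms = len(values)
    result = []
    for number, start in enumerate(res_index):
        stop = res_index[number + 1] if number + 1 < len(res_index) else n_atoms
        atom_values = values[start:stop]
        # default: value of the first atom; otherwise apply f to all atom values
        result.append(atom_values[0] if f is None else f(atom_values))
    return result


b_factors = [10.0, 12.0, 11.0, 30.0, 35.0]    # two residues: 3 atoms + 2 atoms
first_atom = atom2res_profile_sketch(b_factors, [0, 3])
highest = atom2res_profile_sketch(b_factors, [0, 3], f=max)
```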
profile2mask(str_profname[, cutoff_min, cutoff_max=None])[source]
Parameters:
  • cutoff_min (float) – low value cutoff (all values >= cutoff_min)
  • cutoff_max (float) – high value cutoff (all values < cutoff_max)
Returns:

mask len( profile(profName) ) x 1||0

Return type:

array

Raises:

ProfileError – if no profile is found with name profName
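
Note that the two cutoffs are not symmetric: the lower bound is inclusive (>=) and the upper bound is exclusive (<). A minimal sketch of that rule on a plain list, assuming that a cutoff of None means "no bound" (the function name is hypothetical):

```python
def profile2mask_sketch(values, cutoff_min=None, cutoff_max=None):
    """Return 1 for values with cutoff_min <= value < cutoff_max, else 0."""
    mask = []
    for value in values:
        keep = True
        if cutoff_min is not None:
            keep = keep and value >= cutoff_min   # lower bound is inclusive
        if cutoff_max is not None:
            keep = keep and value < cutoff_max    # upper bound is exclusive
        mask.append(int(keep))
    return mask


profile = [0.0, 0.5, 1.0, 1.5, 2.0]
window = profile2mask_sketch(profile, cutoff_min=0.5, cutoff_max=1.5)
lower_only = profile2mask_sketch(profile, cutoff_min=1.0)
```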

profile2atomMask(str_profname[, cutoff_min, cutoff_max=None])[source]

Same as profile2mask, but converts residue mask to atom mask.

Parameters:
  • cutoff_min (float) – low value cutoff
  • cutoff_max (float) – high value cutoff
Returns:

mask N_atoms x 1|0

Return type:

array

Raises:

ProfileError – if no profile is found with name profName

profile2resList(p)[source]

Group the profile values of each residue’s atoms into a separate list.

Parameters:p (str) – name of existing atom profile OR … [ any ], list of lenAtoms() length
Returns:a list (one entry per residue) of lists (one entry per resatom)
Return type:[ [ any ] ]
mergeChains(c1, id='', segid='', rmOxt=True, renumberAtoms=False, renumberResidues=True)[source]

Merge two adjacent chains. This merely removes all internal markers for a chain boundary. Atom content or coordinates are not modified.

PDBModel tracks chain boundaries in an internal _chainIndex. However, there are cases when this chainIndex needs to be re-built and new chain boundaries are then inferred from jumps in chain- or segment labelling or residue numbering. mergeChains automatically re-assigns PDB chain- and segment IDs as well as residue numbering to prepare for this situation.

Parameters:
  • c1 (int) – first of the two chains to be merged
  • id (str) – chain ID of the new chain (default: ID of first chain)
  • segid (str) – new chain’s segid (default: SEGID of first chain)
  • renumberAtoms – rewrite PDB serial numbering of the adjacent chain to be consecutive to the last atom of the first chain (default: False)
  • renumberResidues (bool) – shift PDB residue numbering so that the first residue of the adjacent chain follows the previous residue. Other than for atom numbering, later jumps in residue numbering are preserved. (default: True)
mergeResidues(r1, name='', residue_number=None, chain_id='', segment_id='', renumberAtoms=False)[source]

Merge two adjacent residues. Duplicate atoms are labelled with alternate codes ‘A’ (first occurrence) to ‘B’ or later.

Parameters:
  • r1 (int) – first of the two residues to be merged
  • name (str) – name of the new residue (default: name of first residue)
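
The alternate-code labelling of duplicate atoms can be pictured as follows. This is only a sketch of the rule stated above, not Biskit's implementation: the function name is hypothetical, and leaving the code of unique atoms empty is an assumption.

```python
from collections import Counter


def label_alternates_sketch(names):
    """Give repeated atom names the codes 'A', 'B', ... in order of occurrence."""
    total = Counter(names)   # how often each atom name occurs in the merged residue
    seen = Counter()         # how often each name has been encountered so far
    codes = []
    for name in names:
        if total[name] > 1:
            codes.append(chr(ord('A') + seen[name]))
            seen[name] += 1
        else:
            codes.append('')  # assumption: unique atoms keep an empty code
    return codes


# atom names of two adjacent residues after merging
merged_names = ['N', 'CA', 'C', 'O', 'N', 'CA', 'CB']
alternate = label_alternates_sketch(merged_names)
```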

concat(*models, **kw)[source]

Concatenate atoms, coordinates and profiles. source and fileName are lost, so are profiles that are not available in all models. model0.concat( model1 [, model2, ..]) -> single PDBModel.

Parameters:
  • models (one or more PDBModel instances) – models to concatenate
  • newRes (bool) – treat beginning of second model as new residue (True)
  • newChain (bool) – treat beginning of second model as new chain (True)

Note: info records of given models are lost.

take(i, rindex=None, cindex=None, *initArgs, **initKw)[source]

Extract a PDBModel with a subset of atoms:

take( atomIndices ) -> PDBModel

All other PDBModel methods that extract portions of the model (e.g. compress, takeChains, takeResidues, keep, clone, remove) are ultimately using take() at their core.

Note: take employs fast numpy vector mapping methods to re-calculate the residue and chain index of the result model. The methods generally work, but there is one scenario where this mechanism can fail: if take is used to create repetitions of residues or chains directly next to each other, these residues or chains can get accidentally merged. For this reason, calling methods can optionally pre-calculate and provide a correct version of the new residue or chain index (which will then be used as is).

Parameters:
  • i (list/array of int) – atomIndices, positions to take in the order to take
  • rindex (array of int) – optional residue index for result model after extraction
  • cindex (array of int) – optional chain index for result model after extraction
  • initArgs – any number of additional arguments for constructor of result model
  • initKw – any additional keyword arguments for constructor of result model
Returns:

new PDBModel or sub-class

Return type:

PDBModel
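
The merging problem described in the note above can be reproduced with a small pure-Python sketch. It does not use Biskit: `res_map`, `first_atoms` and the rule "a new residue starts wherever the mapped residue id changes" are assumptions chosen for illustration, not the library's actual implementation.

```python
def first_atoms(labels):
    """Return the positions where a label differs from its predecessor."""
    return [i for i, lab in enumerate(labels)
            if i == 0 or lab != labels[i - 1]]

# Hypothetical model: 3 residues with 2, 3 and 1 atoms.
res_map = [0, 0, 1, 1, 1, 2]

# Take residue 1 twice in a row (atom positions 2..4, repeated).
i = [2, 3, 4, 2, 3, 4]
taken = [res_map[k] for k in i]

# Inferring boundaries from label changes merges the two copies
# into a single residue.
inferred = first_atoms(taken)

# A caller that knows the intent can supply the correct index instead.
rindex = [0, 3]
```

Here `inferred` contains a single residue start although two copies were requested, which is the situation the optional rindex and cindex arguments are meant to avoid.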

keep(i)[source]

Replace atoms, coordinates and profiles of this(!) model with a subset (in-place version of N0.take()).

Parameters:i (list or array of int) – atom positions to be kept
clone()[source]

Clone PDBModel.

Returns:PDBModel / subclass, copy of this model, see comments to numpy.take()
Return type:PDBModel
compress(mask, *initArgs, **initKw)[source]

Compress PDBModel using mask.

compress( mask ) -> PDBModel
Parameters:mask (array) –

array( 1 x N_atoms of 1 or 0 ):

  • 1 .. keep this atom
Returns:compressed PDBModel using mask
Return type:PDBModel
remove(what)[source]

Convenience access to the 3 different remove methods. The mask used to remove atoms is returned. This mask can be used to apply the same change to another array of same dimension as the old(!) xyz and atoms.

Parameters:what (list of int or int) –

Description of what to remove:

  • function( atom_dict ) -> 1 || 0 (1..remove) OR
  • list of int [4, 5, 6, 200, 201..], indices of atoms to remove
  • list of int [11111100001101011100..N_atoms], mask (1..remove)
  • int, remove atom with this index
Returns:array(1 x N_atoms_old) of 0||1, mask used to compress the atoms and xyz arrays.
Return type:array
Raises:PDBError – if what is neither of above
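
As an illustration of how a returned mask can be re-applied to a second per-atom array, here is a minimal sketch in plain Python. The helpers `keep_mask` and `compress_list` and the example data are hypothetical. Only the index-list and single-int forms of `what` are covered.

```python
def keep_mask(remove_idx, n_atoms):
    """Build a 0/1 keep-mask from one atom index or a list of indices."""
    drop = set([remove_idx] if isinstance(remove_idx, int) else remove_idx)
    return [0 if k in drop else 1 for k in range(n_atoms)]

def compress_list(values, mask):
    """Keep only the values whose mask entry is 1."""
    return [v for v, keep in zip(values, mask) if keep]

names = ['N', 'CA', 'C', 'O', 'CB']
charges = [-0.3, 0.1, 0.5, -0.5, 0.0]   # made-up per-atom data

mask = keep_mask([3, 4], len(names))     # remove atoms 3 and 4
names_kept = compress_list(names, mask)
charges_kept = compress_list(charges, mask)
```

Because the mask has the length of the old atom list, it must be applied to arrays that still have the old dimension.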
takeResidues(i)[source]

Copy the given residues into a new model.

Parameters:i ([ int ]) – residue indices
Returns:PDBModel with given residues in given order
Return type:PDBModel
takeChains(chains, breaks=0, force=0)[source]

Get copy of this model with only the given chains.

Note: there is one very special scenario where chain boundaries can get lost. If breaks=1 (chain positions are based on normal chain boundaries as well as structure-based chain break detection) AND one or more chains are extracted several times next to each other, for example chains=[0, 1, 1, 2], then the repeated chain will be merged. In the given example, the new model would have chainLength()==3. This case is tested for and a PDBIndexError is raised. Override with force=1 and proceed at your own risk, which in this case simply means you should re-calculate the chain index after takeChains(). Example:

>>> repeat = model.takeChains( [0,0,0], breaks=1, force=1 )
>>> repeat.chainIndex( force=1, cache=1 )

This works because the new model will have back-jumps in residue numbering.

Parameters:
  • chains (list of int) – list of chains, e.g. [0,2] for first and third
  • breaks (0|1) – split chains at chain breaks (default 0)
  • maxDist (float) – (if breaks=1) chain break threshold in Angstrom
  • force (bool) – override check for chain repeats (only for breaks==1)
Returns:

PDBModel consisting of the given chains in the given order

Return type:

PDBModel
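
The check for adjacent chain repeats can be pictured with the following sketch. The function name `check_chain_repeats` is hypothetical, and a plain ValueError stands in for the PDBIndexError mentioned above. How Biskit performs the test internally is not shown in this documentation.

```python
def check_chain_repeats(chains, breaks=0, force=0):
    """Report adjacent repeats; raise if breaks is on and force is off."""
    repeated = any(a == b for a, b in zip(chains, chains[1:]))
    if breaks and repeated and not force:
        # Stand-in for the PDBIndexError described in the text.
        raise ValueError('adjacent chain repeats in %r' % (chains,))
    return repeated
```

With chains=[0, 1, 1, 2] and breaks=1 the sketch raises unless force=1 is given. Non-adjacent repeats such as [0, 1, 0] pass.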

addChainFromSegid(verbose=1)[source]

Takes the last letter of the segment ID and adds it as chain ID.

addChainId(first_id=None, keep_old=0, breaks=0)[source]

Assign consecutive chain identifiers A - Z to all atoms.

Parameters:
  • first_id (str) – str (A - Z), first letter instead of ‘A’
  • keep_old (1|0) – don’t override existing chain IDs (default 0)
  • breaks (1|0) – consider chain break as start of new chain (default 0)
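
A minimal sketch of consecutive chain lettering, assuming a per-atom chain map (as returned by chainMap()) is already available. The function `chain_letters` is hypothetical and does not handle more than 26 chains or the keep_old option.

```python
import string

def chain_letters(chain_map, first_id='A'):
    """Translate chain indices (one per atom) into letters A - Z."""
    letters = string.ascii_uppercase
    start = letters.index(first_id)
    return [letters[start + c] for c in chain_map]

ids = chain_letters([0, 0, 1, 2, 2])
ids_from_c = chain_letters([0, 1], first_id='C')
```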
renumberResidues(mask=None, start=1, addChainId=1)[source]

Make all residue numbers consecutive and remove any insertion code letters. Note that a backward jump in residue numbering (among other things) is interpreted as end of chain by chainMap() and chainIndex() when a PDB file is loaded.

Parameters:
  • mask (list of int) – [ 0||1 x N_atoms ] atom mask to apply BEFORE
  • start (int) – starting number (default 1)
  • addChainId (1|0) – add chain IDs if they are missing
atomRange()[source]
>>> m.atomRange() == range( m.lenAtoms() )
Returns:integer range for length of this model
Return type:[ int ]
lenAtoms(lookup=True)[source]

Number of atoms in model.

Returns:number of atoms
Return type:int
lenResidues()[source]

Number of residues in model.

Returns:total number of residues
Return type:int
lenChains(breaks=0, maxDist=None, singleRes=0, solvent=0)[source]

Number of chains in model.

Parameters:
  • breaks (0||1) – detect chain breaks from backbone atom distances (def 0)
  • maxDist (float) – maximal distance between consecutive residues [ None ] .. defaults to twice the average distance
  • singleRes (1||0) – allow chains consisting of single residues (def 0)
  • solvent (1||0) – also check solvent residues for “chain breaks” (def 0)
Returns:

total number of chains

Return type:

int

resList(mask=None)[source]

Return a list of lists of atom pseudo dictionaries per residue, which allows iterating over residues and the atoms of each residue.

Parameters:mask – [ 0||1 x N_atoms ] atom mask to apply BEFORE
Returns:a list (one per residue) of lists (one per atom) of dictionaries
[ [ CrossView{'name':'N', 'residue_name':'LEU', ..},
    CrossView{'name':'CA', 'residue_name':'LEU', ..} ],

  [ CrossView{'name':'CA', 'residue_name':'GLY', ..}, .. ]
]
Return type:[ [ biskit.ProfileCollection.CrossView ] ]
resModels(i=None)[source]

Creates one new PDBModel for each residue in the parent PDBModel.

Parameters:i ([ int ] or array( int )) – range of residue positions (default: all residues)
Returns:list of PDBModels, one for each residue
Return type:[ PDBModel ]
resMapOriginal(mask=None)[source]

Generate list to map from any atom to its ORIGINAL(!) PDB residue number.

Parameters:mask (list of int (1||0)) – [00111101011100111…] consider atom: yes or no len(mask) == N_atoms
Returns:list all [000111111333344444..] with residue number for each atom
Return type:list of int
resIndex(mask=None, force=0, cache=1)[source]

Get the position of each residue’s first atom.

Parameters:
  • force (1||0) – re-calculate even if cached result is available (def 0)
  • cache (1||0) – cache the result if new (def 1)
  • mask (list of int (1||0)) – atom mask to apply before (i.e. result indices refer to compressed model)
Returns:

index of the first atom of each residue

Return type:

list of int

resMap(force=0, cache=1)[source]

Get list to map from any atom to a continuous residue numbering (starting with 0). A new residue is assumed to start whenever the ‘residue_number’ or the ‘residue_name’ record changes between 2 atoms.

See resList() for an example of how to use the residue map.

Parameters:
  • force (0||1) – recalculate map even if cached one is available (def 0)
  • cache (0||1) – cache new map (def 1)
Returns:

array [00011111122223333..], residue index for each atom

Return type:

list of int
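
The rule stated above (a new residue starts whenever residue_number or residue_name changes between two atoms) can be sketched in plain Python. The function `residue_map` and the example records are hypothetical. Insertion codes and caching are ignored.

```python
def residue_map(numbers, names):
    """Continuous residue index (starting with 0) for each atom."""
    result = []
    current = -1
    previous = None
    for key in zip(numbers, names):
        if key != previous:      # number or name changed: new residue
            current += 1
            previous = key
        result.append(current)
    return result

numbers = [10, 10, 10, 11, 11, 11, 12]
names = ['ALA', 'ALA', 'ALA', 'GLY', 'GLY', 'SER', 'SER']
rmap = residue_map(numbers, names)
```

Note that the name change from GLY to SER at constant residue number 11 starts a new residue, just like the number change from 11 to 12.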

resEndIndex()[source]

Get the position of each residue’s last atom.

Returns:index of the last atom of each residue
Return type:list of int
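
First-atom and last-atom indices are closely related: each residue ends one position before the next one starts. The sketch below derives both from a residue map. The helper names are hypothetical and the code is not taken from Biskit.

```python
def first_index(rmap):
    """Position of the first atom of each residue."""
    return [i for i in range(len(rmap))
            if i == 0 or rmap[i] != rmap[i - 1]]

def end_index(rmap):
    """Position of the last atom of each residue."""
    if not rmap:
        return []
    starts = first_index(rmap)
    return [s - 1 for s in starts[1:]] + [len(rmap) - 1]

rmap = [0, 0, 0, 1, 1, 2, 2, 2, 2]
starts = first_index(rmap)
ends = end_index(rmap)
```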
chainIndex(breaks=0, maxDist=None, force=0, cache=0, singleRes=0, solvent=0)[source]

Get indices of first atom of each chain.

Parameters:
  • breaks (1||0) – split chains at chain breaks (def 0)
  • maxDist (float) – (if breaks=1) chain break threshold in Angstrom
  • force (1||0) – re-analyze residue numbering, chain and segids to find chain boundaries, use with care! (def 0)
  • cache (1||0) – cache new index even if it was derived from non-default parameters (def 0) Note: a simple m.chainIndex() will always cache
  • singleRes (1||0) – allow chains consisting of single residues (def 0) Otherwise group consecutive residues with identical name into one chain.
  • solvent (1||0) – also check solvent residues for “chain breaks” (default: false)
Returns:

array (1 x N_chains) of int

Return type:

list of int

chainEndIndex(breaks=0, solvent=0)[source]

Get the position of each chain’s last atom.

Returns:index of the last atom of each chain
Return type:list of int
chainMap(breaks=0, maxDist=None)[source]

Get chain index of each atom. A new chain is started between 2 atoms if the chain_id or segment_id changes, the residue numbering jumps back or a TER record was found.

Parameters:
  • breaks (1||0) – split chains at chain breaks (def 0)
  • maxDist (float) – (if breaks=1) chain break threshold in Angstrom
Returns:

array 1 x N_atoms of int, e.g. [000000011111111111122222…]

Return type:

list of int
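
The chain boundary rules above can be sketched as follows. This is an illustration only: `chain_map` here is a hypothetical free function working on plain lists, TER records and the breaks option are ignored, and Biskit's own implementation may differ.

```python
def chain_map(chain_ids, seg_ids, res_numbers):
    """Chain index per atom from label changes and numbering back-jumps."""
    result = []
    current = 0
    for i in range(len(chain_ids)):
        if i > 0 and (chain_ids[i] != chain_ids[i - 1]
                      or seg_ids[i] != seg_ids[i - 1]
                      or res_numbers[i] < res_numbers[i - 1]):
            current += 1         # boundary between atom i-1 and atom i
        result.append(current)
    return result

chain_ids = ['A', 'A', 'A', 'B', 'B', 'B', 'B']
seg_ids = [''] * 7
res_numbers = [1, 1, 2, 1, 2, 1, 1]
cmap = chain_map(chain_ids, seg_ids, res_numbers)
```

The third chain in this example is created purely by the backward jump in residue numbering (2 to 1) within chain ID ‘B’.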

chainBreaks(breaks_only=1, maxDist=None, force=0, solvent=0, z=6.0)[source]

Identify discontinuities in the molecule’s backbone. By default, breaks are identified from the distribution of distances between the last backbone atom of a residue and the first backbone atom of the next residue. The median distance and standard deviation are determined iteratively and outliers (i.e. breaks) are identified as any pairs of residues with a distance that is more than z standard deviations (default 6) above the median. This heuristic can be overridden by specifying a hard distance cutoff (maxDist).

Parameters:
  • breaks_only (1|0) – don’t report ends of regular chains (def 1)
  • maxDist (float) – maximal distance between consecutive residues [ None ] .. defaults to median + z * standard dev.
  • z (float) – z-score for outlier distances between residues (def 6.)
  • solvent (1||0) – also check selected solvent residues (buggy!) (def 0)
  • force (1||0) – force re-calculation, do not use cached positions (def 0)

Returns:atom indices of last atom before a probable chain break
Return type:list of int
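
One way to read the iterative outlier heuristic is sketched below with the standard library only. The exact iteration scheme is an assumption: outliers are dropped and median and standard deviation recomputed until no value exceeds median + z * standard deviation. The function name `break_positions` and the distance values are hypothetical, and a non-empty distance list is assumed.

```python
import statistics

def break_positions(distances, z=6.0, max_dist=None):
    """Indices of inter-residue distances that look like chain breaks."""
    if max_dist is None:
        kept = list(distances)
        while True:
            cutoff = statistics.median(kept) + z * statistics.pstdev(kept)
            inliers = [d for d in kept if d <= cutoff]
            if len(inliers) == len(kept):
                break            # no further outliers: cutoff is stable
            kept = inliers
        max_dist = cutoff
    return [i for i, d in enumerate(distances) if d > max_dist]

# 60 typical distances plus one gap (made-up numbers, in Angstrom)
distances = [1.32, 1.33, 1.34] * 20
distances.insert(25, 8.0)
breaks = break_positions(distances)
```

Passing max_dist skips the statistics entirely and applies the hard cutoff, mirroring the role of maxDist described above.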
removeRes(what)[source]

Remove all atoms with a certain residue name.

Parameters:what (str OR [ str ] OR int OR [ int ]) – indices or name(s) of residue to be removed
rms(other, mask=None, mask_fit=None, fit=1, n_it=1)[source]

Rmsd between two PDBModels.

Parameters:
  • other (PDBModel) – other model to compare this one with
  • mask (list of int) – atom mask for rmsd calculation
  • mask_fit (list of int) – atom mask for superposition (default: same as mask)
  • fit (1||0) – superimpose first (default 1)
  • n_it (int) – number of fit iterations: 1 - classic single fit (default), 0 - until convergence, kicking out outliers on the way
Returns:

rms in Angstrom

Return type:

float
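
For reference, the quantity computed when fit=0 is the plain root-mean-square deviation over the selected atoms. A minimal sketch without superposition, using a hypothetical function `rmsd` on lists of coordinate tuples:

```python
import math

def rmsd(xyz_a, xyz_b, mask=None):
    """RMSD between two equally long coordinate lists (no fitting)."""
    if mask is None:
        mask = [1] * len(xyz_a)
    squared = [sum((p - q) ** 2 for p, q in zip(a, b))
               for a, b, m in zip(xyz_a, xyz_b, mask) if m]
    return math.sqrt(sum(squared) / len(squared))

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(a, b)
value_masked = rmsd(a, b, mask=[1, 0])
```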

transformation(refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')[source]

Get the transformation matrix which least-squares fits this model onto the other model.

Parameters:
  • refModel (PDBModel) – reference PDBModel
  • mask (list of int) – atom mask for superposition
  • n_it (int) – number of fit iterations: 1 - classic single fit (default), 0 - until convergence
  • z (float) – number of standard deviations for outlier definition (default 2)
  • eps_rmsd (float) – tolerance in rmsd (default 0.5)
  • eps_stdv (float) – tolerance in standard deviations (default 0.05)
  • profname (str) – name of new atom profile getting outlier flag
Returns:

array(3 x 3), array(3 x 1) - rotation and translation matrices

Return type:

array, array

transform(*rt)[source]

Transform coordinates of PDBModel.

Parameters:rt (array OR array, array) – rotational and translation array: array( 4 x 4 ) OR array(3 x 3), array(3 x 1)
Returns:PDBModel with transformed coordinates
Return type:PDBModel
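
The two accepted argument forms can be related with a short sketch. It assumes the common homogeneous-coordinate layout in which the 4 x 4 matrix holds the rotation in its upper-left 3 x 3 block and the translation in its last column, and that each coordinate is rotated first and translated afterwards. Both points are assumptions, not confirmed by this documentation. The function names are hypothetical.

```python
def split_4x4(m):
    """Split a 4 x 4 matrix into (3 x 3 rotation, 3-vector translation)."""
    rot = [row[:3] for row in m[:3]]
    trans = [row[3] for row in m[:3]]
    return rot, trans

def apply_rt(xyz, rot, trans):
    """Rotate, then translate, every coordinate: x' = R x + t."""
    return [tuple(sum(rot[r][c] * p[c] for c in range(3)) + trans[r]
                  for r in range(3))
            for p in xyz]

# 90 degree rotation about z, followed by a shift of 1 along x
rt = [[0, -1, 0, 1],
      [1,  0, 0, 0],
      [0,  0, 1, 0],
      [0,  0, 0, 1]]
rot, trans = split_4x4(rt)
moved = apply_rt([(1, 0, 0)], rot, trans)
```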
fit(refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')[source]

Least-squares fit of this model onto refModel.

Parameters:
  • refModel (PDBModel) – reference PDBModel
  • mask (list of int (1||0)) – atom mask for superposition
  • n_it (int) – number of fit iterations: 1 - classic single fit (default), 0 - until convergence
  • z (float) – number of standard deviations for outlier definition (default 2)
  • eps_rmsd (float) – tolerance in rmsd (default 0.5)
  • eps_stdv (float) – tolerance in standard deviations (default 0.05)
  • profname (str) – name of new atom profile containing outlier flag
Returns:

PDBModel with transformed coordinates

Return type:

PDBModel

magicFit(refModel, mask=None)[source]

Superimpose this model onto a ref. model with similar atom content. magicFit( refModel [, mask ] ) -> PDBModel (or subclass )

Parameters:
  • refModel (PDBModel) – reference PDBModel
  • mask (list of int (1||0)) – atom mask to use for the fit
Returns:

fitted PDBModel or sub-class

Return type:

PDBModel

structureFit(refModel, mask=None)[source]

Structure-align this model onto a reference model using the external TM-Align program (which needs to be installed).

structureFit( refModel [, mask] ) -> PDBModel (or subclass)

The result model has additional TM-Align statistics in its info record:

r = m.structureFit( ref )
r.info[‘tm_score’] -> TM-Align score

The other keys are: ‘tm_rmsd’, ‘tm_len’, ‘tm_id’.

See also

biskit.TMAlign

Parameters:
  • refModel (PDBModel) – reference PDBModel
  • mask (list of int (1||0)) – atom mask to use for the fit
Returns:

fitted PDBModel or sub-class

Return type:

PDBModel

centered(mask=None)[source]

Get model with centered coordinates.

Parameters:mask (list of int (1||0)) – atom mask applied before calculating the center
Returns:model with centered coordinates
Return type:PDBModel
center(mask=None)[source]

Geometric center of model.

Parameters:mask (list of int (1||0)) – atom mask applied before calculating the center
Returns:xyz coordinates of center
Return type:(float, float, float)
centerOfMass()[source]

Center of mass of PDBModel.

Returns:array(Float32)
Return type:(float, float, float)
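
The geometric center and the center of mass differ only in the weights applied to each atom. A sketch with a hypothetical helper `weighted_center` and made-up weights:

```python
def weighted_center(xyz, weights=None):
    """Weighted mean position; equal weights give the geometric center."""
    if weights is None:
        weights = [1.0] * len(xyz)
    total = sum(weights)
    return tuple(sum(w * p[k] for w, p in zip(weights, xyz)) / total
                 for k in range(3))

xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
geometric = weighted_center(xyz)
by_mass = weighted_center(xyz, weights=[3.0, 1.0])   # made-up masses
```

The heavier first atom pulls the mass-weighted center towards the origin.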
masses()[source]

Collect the molecular weight of all atoms in PDBModel.

Returns:1-D array with mass of every atom in 1/12 of C12 mass.
Return type:array of floats
Raises:PDBError – if the model contains elements of unknown mass
mass()[source]

Molecular weight of PDBModel.

Returns:total mass in 1/12 of C12 mass
Return type:float
Raises:PDBError – if the model contains elements of unknown mass
residusMaximus(atomValues, mask=None)[source]

Take a list of values per atom and return a list in which all atoms of a residue are set to the highest value of any atom in that residue (after applying mask).

Parameters:
  • atomValues (list) – values per atom
  • mask (list of int (1||0)) – atom mask
Returns:

array with values set to the maximal intra-residue value

Return type:

array of float
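
The per-residue maximum can be sketched with a residue map (one residue index per atom, as returned by resMap()). The function `residue_max` is hypothetical and ignores the mask argument.

```python
def residue_max(atom_values, rmap):
    """Give every atom the highest value found within its residue."""
    best = {}
    for value, res in zip(atom_values, rmap):
        best[res] = max(value, best.get(res, value))
    return [best[res] for res in rmap]

values = [1, 5, 2, 0, 3]
rmap = [0, 0, 0, 1, 1]
spread = residue_max(values, rmap)
```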

argsort(cmpfunc=None)[source]

Prepare sorting atoms within residues according to comparison function.

Parameters:
  • cmpfunc (function) – old style function(m.atoms[i], m.atoms[j]) -> -1, 0, +1
  • key (function) – new style sort key function(m.atoms[i]) -> sortable
Returns:

suggested position of each atom in re-sorted model ( e.g. [2,1,4,6,5,0,..] )

Return type:

list of int

sort(sortArg=None)[source]

Apply a given sort list to the atoms of this model.

Parameters:sortArg (function) – comparison function
Returns:copy of this model with re-sorted atoms (see numpy.take() )
Return type:PDBModel
unsort(sortList)[source]

Undo a previous sorting on the model itself (no copy).

Parameters:sortList (list of int) – sort list used for previous sorting.
Returns:the (back)sort list used ( to undo the undo…)
Return type:list of int
Raises:PDBError – if sorting changed atom number
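
Undoing a sort amounts to applying the inverse permutation of the original sort list. The sketch below shows this generic technique on a plain list of atom names. It assumes that a sort list is applied take-style (new position k receives old item sort_list[k]), which is an assumption about how sort lists are used here.

```python
names = ['N', 'CA', 'C', 'O', 'CB']

# Sort list: for each new position, the old position to take from.
sort_list = sorted(range(len(names)), key=lambda k: names[k])
sorted_names = [names[k] for k in sort_list]

# Back-sort list: the argsort of the sort list inverts the permutation.
back_list = sorted(range(len(sort_list)), key=lambda k: sort_list[k])
restored = [sorted_names[k] for k in back_list]
```

Applying `sort_list` to the restored list sorts it again, which is why the returned back-sort list can be used "to undo the undo".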
atomNames(start=None, stop=None)[source]

Return a list of atom names from start to stop RESIDUE index

Parameters:
  • start (int) – index of first residue
  • stop (int) – index of last residue
Returns:

[‘C’,’CA’,’CB’ …. ]

Return type:

list of str

filterIndex(mode=0, **kw)[source]

Get atom positions that match a combination of key=values. E.g. filterIndex( chain_id=’A’, name=[‘CA’,’CB’] ) -> index

Parameters:
  • mode (0||1) – 0 combine with AND (default), 1 combine with OR
  • kw (filter options, see example) – combination of atom dictionary keys and values/list of values that will be used to filter
Returns:

sort list

Return type:

list of int
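
The AND/OR combination of key=value conditions can be sketched on a list of plain atom dictionaries. The function `filter_index` below is a hypothetical stand-alone illustration, not Biskit's implementation. A list value is read as "any of these values".

```python
def filter_index(atoms, mode=0, **kw):
    """Positions of atoms matching all (mode 0) or any (mode 1) conditions."""
    def matches(atom, key, wanted):
        if isinstance(wanted, (list, tuple)):
            return atom[key] in wanted
        return atom[key] == wanted

    combine = any if mode else all
    return [i for i, atom in enumerate(atoms)
            if combine(matches(atom, k, v) for k, v in kw.items())]

atoms = [{'name': 'N',  'chain_id': 'A'},
         {'name': 'CA', 'chain_id': 'A'},
         {'name': 'CA', 'chain_id': 'B'},
         {'name': 'CB', 'chain_id': 'B'}]
both = filter_index(atoms, chain_id='A', name=['CA', 'CB'])
either = filter_index(atoms, mode=1, chain_id='A', name=['CA', 'CB'])
```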

filter(mode=0, **kw)[source]

Extract atoms that match a combination of key=values. E.g. filter( chain_id=’A’, name=[‘CA’,’CB’] ) -> PDBModel

Parameters:
  • mode (0||1) – 0 combine with AND (default), 1 combine with OR
  • kw (filter options, see example) – combination of atom dictionary keys and values/list of values that will be used to filter
Returns:

filtered PDBModel

Return type:

PDBModel

equals(ref, start=None, stop=None)[source]

Compares the residue and atom sequence in the given range. Coordinates are not checked, other profiles are not checked.

Parameters:
  • start (int) – index of first residue
  • stop (int) – index of last residue
Returns:

[ 1||0, 1||0 ], first position sequence identity 0|1, second position atom identity 0|1

Return type:

list of int

compareAtoms(ref)[source]

Get list of atom indices for this and reference model that converts both into 2 models with identical residue and atom content.

E.g.
>>> m2 = m1.sort()    ## m2 has now different atom order
>>> i2, i1 = m2.compareAtoms( m1 )
>>> m1 = m1.take( i1 ); m2 = m2.take( i2 )
>>> m1.atomNames() == m2.atomNames()  ## m2 has again same atom order
Returns:indices, indices_ref
Return type:([int], [int])
unequalAtoms(ref, i=None, iref=None)[source]

Identify atoms that are not matching between two models. This method returns somewhat of the opposite of compareAtoms().

Not matching means: (1) residue is missing, (2) missing atom within a residue, (3) atom name is different. Differences in coordinates or other atom profiles are NOT evaluated and will be ignored.

(not speed-optimized)

Parameters:
  • ref (PDBModel) – reference model to compare to
  • i (array( int ) or [ int ]) – pre-computed positions that are equal in this model (first value returned by compareAtoms() )
  • iref – pre-computed positions that are equal in ref model (second value returned by compareAtoms() )
Returns:

mismatching atoms of self, mismatching atoms of ref

Return type:

array(int), array(int)

reportAtoms(i=None, n=None)[source]
Parameters:i ([ int ]) – optional list of atom positions to report (default: all)
Returns:formatted string with atom and residue names similar to PDB
Return type:str
compareChains(ref, breaks=0, fractLimit=0.2)[source]

Get list of corresponding chain indices for this and reference model. Use takeChains() to create two models with identical chain content and order from the result of this function.

Parameters:
  • ref (PDBModel) – reference PDBModel
  • breaks (1||0) – look for chain breaks in backbone coordinates
  • fractLimit (float) –
Returns:

chainIndices, chainIndices_ref

Return type:

([int], [int])

biomodel(assembly=0)[source]

Return the ‘biologically relevant assembly’ of this model according to the information in the PDB’s BIOMT record (captured in info[‘BIOMT’]).

This removes redundant chains and performs symmetry operations to complete multimeric structures. Some PDBs define several alternative biological units: usually (0) the author-defined one and (1) software-defined – see lenBiounits.

Note: The BIOMT data are currently not updated during take/compress calls which may change chain indices and content. This method is therefore best run on an original PDB record before any other modifications are performed.

Parameters:assembly (int) – assembly index (default: 0 .. author-determined unit)
Returns:PDBModel; biologically relevant assembly
lenBiounits()[source]

Number of biological assemblies defined in PDB BIOMT record, if any.

Returns:number of alternative biological assemblies defined in PDB header
Return type:int
atomkey(compress=True)[source]

Create a string key encoding the atom content of this model independent of the order in which atoms appear within residues. Atom names are simply sorted alphabetically within residues and then concatenated.

Parameters:compress (bool) – compress key with zlib (default: true)
Returns:key formed from sorted atom content of model
Return type:str
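
The key construction described above can be sketched as follows. The function `atom_key` is hypothetical and takes a list of residues, each given as a list of atom names. Residue names and any other content Biskit may encode are left out. Note that zlib.compress returns bytes in Python 3.

```python
import zlib

def atom_key(residues, compress=True):
    """Order-independent key from atom names, sorted within each residue."""
    key = ''.join(''.join(sorted(names)) for names in residues)
    return zlib.compress(key.encode('ascii')) if compress else key

m1 = [['N', 'CA', 'C'], ['CA', 'N']]
m2 = [['C', 'N', 'CA'], ['N', 'CA']]      # same atoms, different order
plain = atom_key(m1, compress=False)
same = atom_key(m1) == atom_key(m2)
```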