PDBCleaner

class biskit.PDBCleaner(fpdb, log=None, verbose=True)[source]

Bases: object

PDBCleaner performs the following tasks:

  • remove HETAtoms from PDB
  • replace non-standard AA by its closest standard AA
  • remove non-standard atoms from standard AA residues
  • delete atoms that follow missing atoms (in a chain)
  • remove multiple occupancy atoms (except the one with highest occupancy)
  • add ACE and NME capping residues to N- and C-terminals or chain breaks (see capTerminals(); this is NOT done automatically in process())

>>> c = PDBCleaner( model )
>>> c.process()
>>> c.capTerminals( auto=True )

This will modify the model in-place and report changes to STDOUT. Alternatively, you can specify a log file instance for the output. PDBCleaner.process accepts several options to modify the processing.

Capping will add N-methyl groups to free C-terminal carboxy ends or acetyl groups to free N-terminal amines and will thus ‘simulate’ the continuation of the protein chain – a common practice in order to prevent fake terminal charges. The automatic discovery of missing residues is guesswork at best. The more conservative approach is to use, for example:

>>> c.capTerminals( breaks=1, capC=[0], capN=[2] )

In this case, only the chain break detection is used for automatic capping – the last residue before a chain break is capped with NME and the first residue after the chain break is capped with ACE. Chain break detection relies on PDBModel.chainBreaks() (via PDBModel.chainIndex( breaks=1 )). The normal terminals to be capped are now specified explicitly. The first chain (not counting chain breaks) will receive an NME C-terminal cap and the third chain of the PDB will receive an N-terminal ACE cap.

Note: Dictionaries with standard residues and atom content are defined in Biskit.molUtils. This duplicates the effort of the new strategy of parsing Amber prep files for very similar information (AmberResidueType, AmberResidueLibrary) and should change once we implement a real framework for better residue handling.

Methods Overview

__init__
capACE Cap N-terminal of given chain.
capNME Cap C-terminal of given chain.
capTerminals Add NME and ACE capping residues to chain breaks or normal N- and C-terminals.
convertChainIdsCter Convert normal chain ids to chain ids considering chain breaks.
convertChainIdsNter Convert normal chain ids to chain ids considering chain breaks.
filterProteinChains
logWrite
process Remove Hetatoms, waters.
remove_multi_occupancies Keep only atoms with alternate A field (well, or no alternate).
remove_non_standard_atoms First missing standard atom triggers removal of standard atoms that follow in the standard order.
replace_non_standard_AA Replace amino acids with non-standard names with standard amino acids according to MU.nonStandardAA
unresolvedTerminals Autodetect (aka “guess”) which N- and C-terminals are most likely not the real end of each chain.

Attributes Overview

F_ace_cap
F_nme_cap
TOLERATE_MISSING these atoms always occur at the tip of a chain or within a ring and, if missing, will not trigger the removal of other atoms

PDBCleaner Method & Attribute Details

TOLERATE_MISSING = ['O', 'CG2', 'CD1', 'CD2', 'OG1', 'OE1', 'NH1', 'OD1', 'OE1', 'H5T', "O5'"]

These atoms always occur at the tip of a chain or within a ring and, if missing, will not trigger the removal of other atoms.

__init__(fpdb, log=None, verbose=True)[source]
Parameters:
  • fpdb (str OR Biskit.PDBModel) – pdb file OR PDBModel instance
  • log (biskit.LogFile) – biskit.LogFile object (default: STDOUT)
  • verbose (bool) – log warnings and infos (default: True)
remove_multi_occupancies()[source]

Keep only atoms with alternate A field (well, or no alternate).
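
The rule stated above can be pictured with a minimal sketch in plain Python. It does not use Biskit: the list of atom dictionaries and the 'alternate' key are assumptions made for illustration, whereas the real method works on the cleaner's internal PDBModel.

```python
def keep_primary_alternates(atoms):
    """Return only atoms whose alternate code is 'A' or empty."""
    return [a for a in atoms if a.get('alternate', '') in ('', 'A')]


# Hypothetical atom records; 'alternate' holds the alternate-location code.
atoms = [
    {'name': 'CA', 'alternate': ''},
    {'name': 'CB', 'alternate': 'A'},
    {'name': 'CB', 'alternate': 'B'},
]

kept = keep_primary_alternates(atoms)
print([(a['name'], a['alternate']) for a in kept])
```

Only the second CB record (alternate B) is dropped in this toy input.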

replace_non_standard_AA(amber=0, keep=[])[source]

Replace amino acids with non-standard names with standard amino acids according to MU.nonStandardAA

Parameters:
  • amber (1||0) – don’t rename HID, HIE, HIP, CYX, NME, ACE [0]
  • keep ([ str ]) – names of additional residues to keep
remove_non_standard_atoms()[source]

The first missing standard atom triggers removal of the standard atoms that follow it in the standard order. All non-standard atoms are removed too. Data about standard atoms are taken from MU.atomDic, and synonymous atom names are defined in MU.atomSynonyms.

Returns:number of atoms removed
Return type:int
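
As a rough illustration of the removal rule and of the role of TOLERATE_MISSING, the following plain-Python sketch walks through one residue. It is not the Biskit implementation: the function name, the flat list of atom names and the leucine atom order are assumptions made for this example (the real order comes from MU.atomDic).

```python
# Copied from the TOLERATE_MISSING attribute documented above.
TOLERATE_MISSING = ['O', 'CG2', 'CD1', 'CD2', 'OG1', 'OE1', 'NH1',
                    'OD1', 'OE1', 'H5T', "O5'"]

# Hypothetical heavy-atom order for leucine, for illustration only.
LEU_ORDER = ['N', 'CA', 'C', 'O', 'CB', 'CG', 'CD1', 'CD2']


def atoms_to_keep(present, standard_order, tolerate=TOLERATE_MISSING):
    """Keep standard atoms up to the first non-tolerated missing atom.

    Atoms that are not in standard_order are never kept.
    """
    present = set(present)
    keep = []
    for name in standard_order:
        if name in present:
            keep.append(name)
        elif name not in tolerate:
            break  # first missing standard atom: drop everything after it
    return keep


# CG is missing, so CD1 and CD2 go as well; 'XX' is non-standard.
truncated = atoms_to_keep(['N', 'CA', 'C', 'O', 'CB', 'CD1', 'CD2', 'XX'],
                          LEU_ORDER)
# O is missing but tolerated, so nothing else is removed.
tolerated = atoms_to_keep(['N', 'CA', 'C', 'CB', 'CG', 'CD1', 'CD2'],
                          LEU_ORDER)
print(truncated)
print(tolerated)
```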
capACE(model, chain, breaks=True, checkgap=True)[source]

Cap N-terminal of given chain.

Note: In order to allow the capping of chain breaks, the chain index is, by default, based on model.chainIndex(breaks=True), that means with chain break detection activated! This is not the default behaviour of PDBModel.chainIndex or takeChains or chainLength. Please use the wrapping method capTerminals() for more convenient handling of the index.

Parameters:
  • model (PDBModel) – model
  • chain (int) – index of chain to be capped
  • breaks (bool) – consider chain breaks when identifying chain boundaries
Returns:

model with added ACE capping residue

Return type:

PDBModel

capNME(model, chain, breaks=True, checkgap=True)[source]

Cap C-terminal of given chain.

Note: In order to allow the capping of chain breaks, the chain index is, by default, based on model.chainIndex(breaks=True), that means with chain break detection activated! This is not the default behaviour of PDBModel.chainIndex or takeChains or chainLength. Please use the wrapping method capTerminals() for more convenient handling of the index.

Parameters:
  • model (PDBModel) – model
  • chain (int) – index of chain to be capped
  • breaks (bool) – consider chain breaks when identifying chain boundaries
Returns:

model with added NME capping residue

Return type:

PDBModel

convertChainIdsNter(model, chains)[source]

Convert normal chain ids to chain ids considering chain breaks.

convertChainIdsCter(model, chains)[source]

Convert normal chain ids to chain ids considering chain breaks.
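
One plausible reading of these two conversions is sketched below in plain Python. It assumes that, for N-terminal capping, a normal chain index maps to the first fragment of that chain and, for C-terminal capping, to its last fragment. Both helper functions and the fragments_per_chain input are hypothetical and not part of Biskit.

```python
def convert_nter(fragments_per_chain, chains):
    """Map normal chain indices to the index of each chain's first fragment."""
    starts, pos = [], 0
    for n in fragments_per_chain:
        starts.append(pos)
        pos += n
    return [starts[c] for c in chains]


def convert_cter(fragments_per_chain, chains):
    """Map normal chain indices to the index of each chain's last fragment."""
    ends, pos = [], 0
    for n in fragments_per_chain:
        pos += n
        ends.append(pos - 1)
    return [ends[c] for c in chains]


# Chain 0 contains one chain break (two fragments); chain 1 is continuous.
fragments_per_chain = [2, 1]

print(convert_nter(fragments_per_chain, [0, 1]))  # [0, 2]
print(convert_cter(fragments_per_chain, [0, 1]))  # [1, 2]
```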

unresolvedTerminals(model)[source]

Autodetect (aka “guess”) which N- and C-terminals are most likely not the real end of each chain. This guesswork is based on residue numbering and terminal atom content:

  • unresolved N-terminal: a protein residue with a residue number > 1
  • unresolved C-terminal: a protein residue that contains none of the atoms OXT, OT, OT1 or OT2
Parameters:model – PDBModel
Returns:chains with unresolved N-term, with unresolved C-term

Return type:([int], [int])
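
The two criteria can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the Biskit code: each chain is represented as a list of residue dictionaries with hypothetical 'number' and 'atoms' keys, and only the first and last residue of each chain are inspected.

```python
C_TERMINAL_OXYGENS = {'OXT', 'OT', 'OT1', 'OT2'}


def unresolved_terminals(chains):
    """Return (chains with unresolved N-term, chains with unresolved C-term)."""
    n_unresolved, c_unresolved = [], []
    for i, residues in enumerate(chains):
        first, last = residues[0], residues[-1]
        if first['number'] > 1:
            n_unresolved.append(i)
        if not C_TERMINAL_OXYGENS & set(last['atoms']):
            c_unresolved.append(i)
    return n_unresolved, c_unresolved


# Hypothetical two-chain input: chain 0 is complete, chain 1 is truncated.
chains = [
    [{'number': 1, 'atoms': ['N', 'CA', 'C', 'O']},
     {'number': 2, 'atoms': ['N', 'CA', 'C', 'O', 'OXT']}],
    [{'number': 5, 'atoms': ['N', 'CA', 'C', 'O']},
     {'number': 6, 'atoms': ['N', 'CA', 'C', 'O']}],
]

print(unresolved_terminals(chains))  # ([1], [1])
```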

capTerminals(auto=False, breaks=False, capN=[], capC=[], checkgap=True)[source]

Add NME and ACE capping residues to chain breaks or normal N- and C-terminals. Note: these capping residues contain hydrogen atoms.

Chain indices for capN and capC arguments can be interpreted either with or without chain break detection enabled. For example, let’s assume we have a two-chain protein with some missing residues (chain break) in the first chain:

A: MGSKVSK---FLNAGSK
B: FGHLAKSDAK

Then:
capTerminals( breaks=False, capN=[1], capC=[1]) will add N- and C-terminal caps to chain B.
However:
capTerminals( breaks=True, capN=[1], capC=[1]) will add N- and C-terminal caps to the second fragment of chain A.
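
The difference between the two indexing schemes can be reproduced with a small plain-Python sketch. It does not call Biskit; the segments() helper and the use of '-' characters to mark the unresolved gap in chain A are assumptions made for this illustration.

```python
def segments(chains, breaks=False):
    """List (chain id, sequence) segments, optionally split at chain breaks."""
    result = []
    for chain_id, sequence in chains:
        if breaks:
            # every gap-free fragment counts as its own "chain"
            result.extend((chain_id, frag)
                          for frag in sequence.split('-') if frag)
        else:
            result.append((chain_id, sequence.replace('-', '')))
    return result


chains = [('A', 'MGSKVSK---FLNAGSK'), ('B', 'FGHLAKSDAK')]

print(segments(chains, breaks=False)[1])  # ('B', 'FGHLAKSDAK')
print(segments(chains, breaks=True)[1])   # ('A', 'FLNAGSK')
```

Index 1 therefore points at chain B without chain break detection, and at the second fragment of chain A with it.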

Note: this operation replaces the internal model.

Parameters:
  • auto (bool) – put ACE and NME capping residue on chain breaks and on suspected false N- and C-termini (default: False)
  • breaks (bool) – switch on chain break detection before interpreting capN and capC (default: False)
  • capN ([int]) – indices of chains that should get ACE cap (default: [])
  • capC ([int]) – indices of chains that should get NME cap (default: [])
process(keep_hetatoms=0, amber=0, keep_xaa=[])[source]

Remove Hetatoms, waters. Replace non-standard names. Remove non-standard atoms.

Parameters:
  • keep_hetatoms (0||1) – keep HETATM records instead of removing them (default: 0)
  • amber (0||1) – don’t rename amber residue names (HIE, HID, CYX,..)
  • keep_xaa ([ str ]) – names of non-standard residues to be kept
Returns:

PDBModel (reference to internal)

Return type:

PDBModel

Raises:

CleanerError – if something doesn’t go as expected …