ModelCraft can be installed using pip, e.g.
python3 -m pip install --user modelcraft
Refer to the pip documentation if pip is not installed. ModelCraft also requires an installation of CCP4. The CCP4 environment needs to be set up so that programs such as Buccaneer and Refmac can be called from the command line.
The simplest execution requires only a reflection data file in MTZ format (with amplitudes, a free-R flag and starting phases) and a description of the asymmetric unit contents.
modelcraft --data data.mtz --contents contents.json
Alternatively, a model can be provided (in PDB or mmCIF format), which will be refined and used as a starting point instead of starting from phases in the data file.
modelcraft --data data.mtz --contents contents.json --model model.cif
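Column labels can also be given explicitly with the --amplitudes, --phases and --freerflag arguments, in which case ModelCraft does not need to identify them automatically. The Python sketch below picks labels from (label, type) pairs using the standard MTZ column type codes and assembles a modelcraft argument list. The selection heuristic and the helper names `guess_labels` and `build_command` are hypothetical illustrations, not ModelCraft's own detection logic. Reading the column list out of a real MTZ file is not shown.

```python
from typing import Optional

# Standard MTZ column type codes used by this sketch:
#   F = amplitude, Q = standard deviation, P = phase angle,
#   W = weight (e.g. figure of merit), A = Hendrickson-Lattman coefficient,
#   I = integer (free-R flags are commonly stored this way).


def guess_labels(columns: list[tuple[str, str]]) -> dict[str, str]:
    """Guess column-label arguments from (label, type) pairs.

    Hypothetical heuristic for illustration; not ModelCraft's own logic.
    """
    labels = [label for label, _ in columns]
    types = [col_type for _, col_type in columns]
    guesses: dict[str, str] = {}
    # Amplitude + sigma, or phase + figure of merit, as adjacent pairs.
    for i in range(len(columns) - 1):
        pair = (types[i], types[i + 1])
        if pair == ("F", "Q"):
            guesses.setdefault("--amplitudes", f"{labels[i]},{labels[i + 1]}")
        elif pair == ("P", "W"):
            guesses.setdefault("--phases", f"{labels[i]},{labels[i + 1]}")
    # Four adjacent Hendrickson-Lattman columns (used only if no phase/FOM pair).
    for i in range(len(columns) - 3):
        if types[i:i + 4] == ["A"] * 4:
            guesses.setdefault("--phases", ",".join(labels[i:i + 4]))
    # An integer column whose label mentions "free".
    for label, col_type in columns:
        if col_type == "I" and "free" in label.lower():
            guesses.setdefault("--freerflag", label)
    return guesses


def build_command(data: str, contents: str, columns: list[tuple[str, str]],
                  model: Optional[str] = None) -> list[str]:
    """Assemble a modelcraft argument list with explicit column labels."""
    argv = ["modelcraft", "--data", data, "--contents", contents]
    if model is not None:
        argv += ["--model", model]
    for flag, value in guess_labels(columns).items():
        argv += [flag, value]
    return argv


if __name__ == "__main__":
    columns = [("H", "H"), ("K", "H"), ("L", "H"), ("FreeR_flag", "I"),
               ("FP", "F"), ("SIGFP", "Q"), ("PHIB", "P"), ("FOM", "W")]
    print(" ".join(build_command("data.mtz", "contents.json", columns)))
```

For the example columns this prints `modelcraft --data data.mtz --contents contents.json --amplitudes FP,SIGFP --phases PHIB,FOM --freerflag FreeR_flag`. The list form could be passed directly to `subprocess.run`.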
--data FILE
Input reflection data in MTZ format.
This must contain observed amplitudes and a free-R flag.
If a starting model has not been provided then it must also contain phases,
either as a phase and figure of merit or as Hendrickson-Lattman coefficients.
An attempt will be made to automatically identify column labels
unless they are specified using the --amplitudes, --phases and --freerflag arguments.
--contents FILE
A description of the asymmetric unit contents,
either as a sequence file (in FASTA or PIR format)
or a more detailed contents file in JSON format.
See the ASU Contents Description
section of the documentation for more details.
--amplitudes COLS
e.g. FP,SIGFP
Column labels for the observed amplitudes.
--basic
Run a basic pipeline using only Buccaneer and Refmac.
Parrot density modification is still used on the first cycle
and input models are still refined using Sheetbend and Refmac.
--convergence-cycles N
default: 4
Number of cycles without improvement
needed to stop the pipeline automatically.
--convergence-tolerance is used to determine whether a cycle gave an improvement.
--convergence-tolerance
default: 0.1
R-free difference needed to mark a cycle as an improvement.
If the model at the end of a cycle
has an R-free that is this much better than the previous best R-free
then the cycle is marked as an improvement.
--convergence-cycles states how many cycles without improvement are needed to stop automatically.
--cycles N
default: 25
Maximum number of pipeline cycles.
--freerflag COL
e.g. FreeR_flag
Column label for the free-R flag.
--help
Show an automatically generated help message.
--keep-jobs
Keep the files from intermediate jobs instead of deleting them.
This can lead to large directory sizes.
--keep-logs
Keep log files if not using the full --keep-jobs argument.
--model FILE
Starting model.
If input phases are not specified this will be refined
using Sheetbend followed by Refmac.
--no-auto-stop
Run the maximum number of cycles even if the model is not improving.
--phases COLS
e.g. PHIB,FOM
or HLA,HLB,HLC,HLD
Column labels for input phases as either
a phase and figure of merit or Hendrickson-Lattman coefficients.
--twinned
Turn on twinned refinement.
Only do this if you are sure your data are twinned.
--unbiased
Pass input phases to Refmac for MLHL refinement.
--buccaneer FILE
Path to an alternative Buccaneer binary.
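The interaction between --cycles, --convergence-cycles, --convergence-tolerance and --no-auto-stop can be sketched as a small Python function. It is a hypothetical helper (`cycles_run`), not ModelCraft's own code, and it makes three assumptions: the first cycle always counts as an improvement, "cycles without improvement" means consecutive cycles since the last improvement, and R-free values are plain numbers in the same units as the tolerance.

```python
import math
from typing import Iterable


def cycles_run(r_frees: Iterable[float], max_cycles: int = 25,
               convergence_cycles: int = 4, tolerance: float = 0.1,
               auto_stop: bool = True) -> int:
    """Return how many cycles would run before the pipeline stops.

    r_frees stands in for the R-free at the end of each cycle; defaults
    mirror --cycles, --convergence-cycles and --convergence-tolerance.
    """
    best = math.inf  # assumption: the first cycle always counts as an improvement
    without_improvement = 0
    cycle = 0
    for cycle, r_free in enumerate(r_frees, start=1):
        if best - r_free >= tolerance:  # at least `tolerance` better than the best so far
            best = r_free
            without_improvement = 0
        else:
            without_improvement += 1
        if cycle >= max_cycles:  # hard limit from --cycles
            break
        if auto_stop and without_improvement >= convergence_cycles:
            break  # converged; --no-auto-stop corresponds to auto_stop=False
    return cycle


if __name__ == "__main__":
    print(cycles_run([45.0, 40.0, 39.95, 39.97, 39.96, 39.99, 35.0]))  # 6
```

In this example the second cycle is the last improvement, so four further cycles without an improvement of at least 0.1 stop the run after cycle 6; with `auto_stop=False` the later improvement at cycle 7 would still be reached.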
A description of the expected contents of the asymmetric unit
must be provided as a FASTA sequence file or a JSON file
using the --contents argument.
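A sequence file is simpler,
but the JSON format can also describe the stoichiometry of each component,
non-polymer components (carbohydrates, ligands and buffers)
and residue modifications such as selenomethionine.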
In order to create a JSON file it may be helpful
to start from the contents of an existing PDB entry.
The modelcraft-contents script
creates a contents JSON file for a released PDB entry.
An example JSON file is shown below:
{
  "copies": 2,
  "proteins": [
    {
      "sequence": "LPGECSVNVIPKMNLDKAKFFSGTWYETHYLDMDPQATEKFCFSFAPRESGGTVMEALYHFNVDSKV",
      "start": 1,
      "copies": 1,
      "modifications": ["M->MSE"]
    },
    {
      "sequence": "GGG"
    }
  ],
  "rnas": [
    {
      "sequence": "GGUAACUGUUACAGUUACC",
      "start": 5,
      "copies": 2,
      "modifications": ["5->GTP", "23->CCC"]
    }
  ],
  "dnas": [],
  "carbs": [
    { "codes": { "NAG": 2 }, "copies": 1 },
    { "codes": { "MAN": 1, "NAG": 2 }, "copies": 1 }
  ],
  "ligands": [
    { "code": "HEM", "copies": 1 }
  ],
  "buffers": ["GOL", "NA", "CL"]
}
The file has lists of proteins, rnas, dnas, carbs, ligands and buffers
that are in the crystal.
The only mandatory items are that each protein, RNA or DNA chain must have a sequence,
each carbohydrate must have a dictionary of codes
to specify the number of each sugar,
and each ligand must have a single code.
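The mandatory items listed above can be checked before a run with a short Python sketch. `check_contents` is a hypothetical helper, not part of ModelCraft; it checks only those mandatory items and treats any missing top-level list as empty, which is an assumption.

```python
import json

POLYMER_KEYS = ("proteins", "rnas", "dnas")


def check_contents(contents: dict) -> list[str]:
    """Return a list of problems with the mandatory items (empty if none)."""
    problems = []
    # Every protein, RNA or DNA chain needs a non-empty sequence string.
    for key in POLYMER_KEYS:
        for i, chain in enumerate(contents.get(key, [])):
            sequence = chain.get("sequence")
            if not isinstance(sequence, str) or not sequence:
                problems.append(f"{key}[{i}] needs a sequence")
    # Every carbohydrate needs a dictionary mapping sugar codes to counts.
    for i, carb in enumerate(contents.get("carbs", [])):
        codes = carb.get("codes")
        if (not isinstance(codes, dict) or not codes
                or not all(isinstance(n, int) and n > 0 for n in codes.values())):
            problems.append(f"carbs[{i}] needs a dictionary of codes")
    # Every ligand needs a single code string.
    for i, ligand in enumerate(contents.get("ligands", [])):
        if not isinstance(ligand.get("code"), str):
            problems.append(f"ligands[{i}] needs a single code")
    return problems


if __name__ == "__main__":
    text = '{"copies": 2, "proteins": [{"sequence": "GGG"}], "ligands": [{"code": "HEM", "copies": 1}]}'
    print(check_contents(json.loads(text)))  # []
```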
Each component (other than buffers) has a copies
parameter to specify the stoichiometry.
In the example above there are 2 RNA chains for each protein chain.
If the stoichiometry is not specified it is assumed to be 1.
There is also a copies parameter for the whole file
to specify how many copies of the contents are in the asymmetric unit.
If this value is not known the most likely number will be estimated.
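The number of each component in the asymmetric unit is therefore its own copies value (1 if omitted) multiplied by the file-level copies value. A Python sketch of that arithmetic follows; `asu_counts` is hypothetical, and unlike ModelCraft it does not estimate a missing file-level value but takes it from an optional argument or raises an error.

```python
from typing import Optional

# Buffers are left out because they have no copies parameter.
COMPONENT_KEYS = ("proteins", "rnas", "dnas", "carbs", "ligands")


def asu_counts(contents: dict, asu_copies: Optional[int] = None) -> dict[str, int]:
    """Number of each component in the asymmetric unit.

    The file-level "copies" value is used if present, otherwise asu_copies.
    """
    copies = contents.get("copies", asu_copies)
    if copies is None:
        # ModelCraft would estimate this; the sketch does not.
        raise ValueError("number of copies in the asymmetric unit is unknown")
    counts = {}
    for key in COMPONENT_KEYS:
        for i, component in enumerate(contents.get(key, [])):
            # Component stoichiometry defaults to 1.
            counts[f"{key}[{i}]"] = component.get("copies", 1) * copies
    return counts
```

For the example file above this gives 2 of each protein chain, 4 RNA chains, 2 of each carbohydrate and 2 HEM ligands in the asymmetric unit.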
The modelcraft-copies script can be used to
view the solvent fraction and probability for each number of copies
given a contents file and an MTZ file.
It is assumed that the number of ordered buffer molecules is unknown
so they are not included in the solvent calculation.
If the number of copies of a buffer molecule is known,
it can instead be included as a ligand component.
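To show how the solvent fraction falls as the number of copies rises, the Python sketch below uses the Matthews approximation, solvent fraction = 1 - 1.23 / V_M, where V_M is the asymmetric-unit volume in cubic ångströms divided by the total mass in daltons, and 1.23 reflects a typical protein density. This is a swapped-in textbook estimate, not necessarily what modelcraft-copies computes: it is only appropriate for protein, it takes the mass of one copy of the contents as a hypothetical input, and it does not compute probabilities. Buffer molecules are simply left out of that mass, matching the behaviour described above.

```python
def solvent_fractions(cell_volume: float, n_symops: int, mass_per_copy: float,
                      max_copies: int = 10) -> dict[int, float]:
    """Matthews-style solvent fraction for 1, 2, ... copies in the ASU.

    cell_volume is in cubic angstroms, mass_per_copy in daltons (excluding
    buffers) and n_symops is the number of symmetry operators.
    """
    asu_volume = cell_volume / n_symops
    fractions: dict[int, float] = {}
    for copies in range(1, max_copies + 1):
        matthews = asu_volume / (copies * mass_per_copy)  # V_M in A^3/Da
        fraction = 1.0 - 1.23 / matthews
        if fraction <= 0.0:
            break  # the contents would no longer fit in the ASU
        fractions[copies] = fraction
    return fractions


if __name__ == "__main__":
    result = solvent_fractions(400_000.0, 4, 20_000.0)
    print({n: round(f, 3) for n, f in result.items()})
    # {1: 0.754, 2: 0.508, 3: 0.262, 4: 0.016}
```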
Finally, protein, RNA and DNA chains may have a list of modifications,
e.g. M->MSE to specify that all methionine residues are actually selenomethionine
or 5->GTP to specify that residue 5 is guanosine triphosphate.
The start property sets the number of the first residue in the chain,
which determines how residue numbers in modifications are interpreted,
e.g. residue 5 in the RNA chain above is actually the first residue.
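The effect of start on numbered modifications can be sketched in Python: residue number n refers to the zero-based sequence position n - start, while a one-letter target such as M applies to every matching residue. `modified_positions` is hypothetical, not ModelCraft's parser; it assumes that start defaults to 1 when omitted and that numbered targets are positive integers.

```python
def modified_positions(chain: dict) -> dict[int, str]:
    """Map zero-based sequence indices to modified residue codes."""
    sequence = chain["sequence"]
    start = chain.get("start", 1)  # assumption: numbering starts at 1 by default
    positions: dict[int, str] = {}
    for modification in chain.get("modifications", []):
        target, new_code = modification.split("->")
        if target.isdigit():
            # A residue number, shifted by the chain's start value.
            index = int(target) - start
            if not 0 <= index < len(sequence):
                raise ValueError(f"{modification}: residue {target} is outside the chain")
            positions[index] = new_code
        else:
            # A one-letter residue type: modify every matching residue.
            for index, residue in enumerate(sequence):
                if residue == target:
                    positions[index] = new_code
    return positions


if __name__ == "__main__":
    rna = {"sequence": "GGUAACUGUUACAGUUACC", "start": 5,
           "modifications": ["5->GTP", "23->CCC"]}
    print(modified_positions(rna))  # {0: 'GTP', 18: 'CCC'}
```

For the RNA chain in the example, 5->GTP maps to the first residue (G) and 23->CCC to the last of its 19 residues (C).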
Note: ModelCraft does not yet build carbohydrates, ligands, or modified residues (other than selenomethionine derivatives). However, this is planned for the future and inclusion of these components in the contents allows for more accurate calculation of the solvent fraction during density modification.