Crystallographic refinement

Getting started

To perform crystallographic refinement with Servalcat, it is necessary to specify an input model (PDB, mmCIF or smCIF), diffraction data (MTZ or CIF format) and radiation source type (xray, neutron or electron). For example:

$ servalcat refine_xtal_norefmac \
 --model input.pdb --hklin ../data.mtz \
 -s xray \
 [-o prefix]
Output files:
  • prefix.pdb: refined model (legacy PDB format)

  • prefix.mmcif: refined model (mmCIF format)

  • prefix.mtz: 2Fo-Fc and Fo-Fc maps which can be auto-opened with Coot.

Output logs:
  • servalcat.log

  • prefix_stats.json: refinement statistics per cycle in JSON format

Frequently used options

  • --ligand [LIGAND ...]: Restraint dictionary CIF file(s)

  • --ncycle NCYCLE: Number of refinement cycles. Default: 10

  • --weight WEIGHT: Starting value of the weight for the experimental data term (default: automatically determined from resolution). By default, the weight is further adjusted to achieve bond length rmsZ in the range between 0.5 and 1.0. This can be changed using the option --target_bond_rmsz_range MIN_RMSZ MAX_RMSZ.

  • --ncsr: Use local restraints for non-crystallographic symmetry

  • --jellybody: Use jelly body restraints

  • --adp {iso,aniso,fix}: Atomic displacement parameter (isotropic: 1 parameter per atom, anisotropic: 6 parameters per atom, fixed B-values)

  • -d D_MIN, --d_min D_MIN: High-resolution limit (in Å)

  • --d_max D_MAX: Low-resolution limit (in Å)

  • --free FREE: flag number for test set

  • --hydrogen {all,yes,no}: Hydrogen atoms - all: (re)generate riding hydrogen atoms, yes: use hydrogen atoms if present in input structure model, no: remove hydrogen atoms in input structure model. Default: all.

  • --hout: Write hydrogen atoms in the output model

  • --twin: Twin refinement

  • --randomize RANDOMIZE: Shake coordinates with a specified rms (in Å)

  • --bfactor BFACTOR: Reset all atomic B values to specified value

  • --keywords KEYWORDS [KEYWORDS ...]: Keyword(s) in REFMAC5 syntax. See supported Refmac keywords

  • --keyword_file KEYWORD_FILE [KEYWORD_FILE ...]: File with keyword(s) in REFMAC5 syntax (e.g. ProSMART restraint file)

Input columns for diffraction data

In the required --hklin option, it is possible to provide merged or unmerged diffraction data (MTZ or CIF format). If there are multiple columns available in the input file, mean amplitudes (Friedel pairs averaged) are used by default.

To specify which columns to use, use the --labin option. For example, the file data_merged.mtz contains the following columns with merged diffraction data:

H K L FreeR_flag IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-) FP SIGFP F(+) SIGF(+) F(-) SIGF(-)

Servalcat would select to use the FP, SIGFP,FreeR_flag columns by default (refinement against mean structure factor amplitudes). Anyway, we can specify to use intensities or separate Friedel pairs as follows:

  • --labin I(+),SIGI(+),I(-),SIGI(-),FreeR_flag (refinement against intensities, separate Friedel pairs)

  • --labin IMEAN,SIGIMEAN,FreeR_flag (refinement against mean intensities)

  • --labin F(+),SIGF(+),F(-),SIGF(-),FreeR_flag (refinement against amplitudes, separate Friedel pairs)

  • --labin FP, SIGFP,FreeR_flag (refinement against mean amplitudes, selected by default)

If the separate Friedel pairs are specified, anomalous difference density map (FAN and PHAN) columns will be present in the output MTZ file. Note that the anomalous signal is used only for the map calculation but not for the actual refinement.

If the column for unmerged intensities is specified, Servalcat merges the data internally and refines against merged intensities. In the log files, the CC* statistic is also available (described below). An MTZ or CIF file with free flags can be specified with the --hklin_free option. A particular column for free flags in this file can be specified with the --labin_free option.

Logs and statistics

Refinement progress is monitored by several statistics that quantify the agreement of the refined model with the experimental diffraction data as well as its quality in respect to expected geometry. The statistics are written in the log file servalcat.log and are also available in the JSON format output_prefix_stats.json, which is updated every cycle.

Model agreement with data

Which model quality statistics are calculated depends on the nature of the input reflection data.

When amplitudes are used for refinement and free R flags are provided for the test set, the conventional Rwork and Rfree values are provided. Moreover, the correlation coefficients CCFwork and CCFfree between experimentally observed amplitudes and amplitudes calculated based on the refined model are calculated. If free R flags are not available, only R (sometimes called Rall) and CCF are reported.

When intensities (or unmerged diffraction data) are given and free R flags are provided, R1work and R1free values are provided, as is common in crystallography of small molecules. These statistics are calculated as R-values between the square roots of observed and calculated intensities, considering only reflections with an intensity-to-sigma ratio above 2. The correlation coefficients CCIwork and CCIfree between experimentally observed intensities and intensities calculated based on the refined model are also calculated. If free R flags are not provided, only R1 and CCI are available.

If unmerged data are used, CC* statistic is also reported. It estimates the data quality and represents an upper limit for the CCI statistics. See Karplus and Diederichs (2012) or Diederichs and Karplus (2013).

Model agreement with ideal geometry

The geometry of the refined model is assessed in the same way as in refinement against cryoEM SPA maps. Root mean square deviations (RMSD) from expected bond lengths and angles are calculated, as well as their Z-scores (RMSZ), which represent how many standard deviations the observed geometry deviates from the ideal values.

Importantly, various kinds of individual model geometry outliers with a Z-score greater than 5 are reported in the last cycle of refinement in servalcat.log and output_prefix_stats.json. It is recommended to inspect these outliers as they can indicate where the model may need improvement.

Radiation sources

Radiation sources can be changed by using the -s or --source option (xray, neutron, electron); see Scattering source.

When performing refinement against neutron diffraction data, it is possible to refine deuterium fraction using the option --refine_dfrac. In this case, an extra output file output_prefix_expanded.mmcif is created for the purpose of deposition to the PDB. The bond lengths and their sigmas from _chem_comp_bond.value_dist_nucleus and _chem_comp_bond.value_dist_nucleus_esd are used.

Small molecules

Servalcat can also refine small molecules against crystallographic data. A generally suggested protocol is as follows:

$ servalcat refine_xtal_norefmac \
 --model small_molecule.cif --hklin small_molecule.hkl \
 -s xray --unrestrained --hydrogen no --adp aniso --no_solvent \
 [-o prefix]

For --model and --hklin, common formats for small molecule crystallography (.hkl, .cif, .res, .ins) can be used. This will run unrestrained refinement with anisotropic ADPs without bulk solvent correction. Note that the Servalcat does not currently support riding hydrogen atoms for the unrestrained refinement. If you require riding hydrogen atoms, you will need to use restrained refinement using dictionaries generated by AceDRG.

Servalcat provides a helper command which converts files between the small molecule formats to .pdb, .mmcif or .mtz:

$ servalcat util sm2mm structure_and_data.cif -o output

Complete list of options

$ servalcat refine_xtal_norefmac --help
usage: servalcat refine_xtal_norefmac [-h] --hklin HKLIN [--hklin_free HKLIN_FREE] [-d D_MIN] [--d_max D_MAX] [--nbins NBINS]
                                    [--nbins_ml NBINS_ML] [--labin LABIN] [--labin_free LABIN_FREE] [--free FREE] --model MODEL
                                    [--monlib MONLIB] [--ligand [LIGAND ...]] [--newligand_continue] [--hydrogen {all,yes,no}] [--hout]
                                    [--jellybody] [--jellybody_params sigma dmax] [--jellyonly] [--find_links]
                                    [--keywords KEYWORDS [KEYWORDS ...]] [--keyword_file KEYWORD_FILE [KEYWORD_FILE ...]]
                                    [--randomize RANDOMIZE] [--ncycle NCYCLE] [--weight WEIGHT] [--no_weight_adjust]
                                    [--target_bond_rmsz_range TARGET_BOND_RMSZ_RANGE TARGET_BOND_RMSZ_RANGE] [--ncsr]
                                    [--adpr_weight ADPR_WEIGHT] [--occr_weight OCCR_WEIGHT] [--bfactor BFACTOR] [--fix_xyz]
                                    [--adp {fix,iso,aniso}] [--refine_all_occ] [--max_dist_for_adp_restraint MAX_DIST_FOR_ADP_RESTRAINT]
                                    [--adp_restraint_power ADP_RESTRAINT_POWER] [--adp_restraint_exp_fac ADP_RESTRAINT_EXP_FAC]
                                    [--adp_restraint_no_long_range] [--adp_restraint_mode {diff,kldiv}] [--unrestrained] [--refine_h]
                                    [--refine_dfrac] [--twin] -s {electron,xray,neutron} [--no_solvent] [--use_in_est {all,work,test}]
                                    [--keep_charges] [--keep_entities] [--allow_unusual_occupancies] [-o OUTPUT_PREFIX]
                                    [--write_trajectory] [--vonmises] [--prefer_intensity] [--use_fw] [--config CONFIG]

program to refine crystallographic structures

optional arguments:
-h, --help            show this help message and exit
--hklin HKLIN
--hklin_free HKLIN_FREE
                        Input MTZ file for test flags
-d D_MIN, --d_min D_MIN
--d_max D_MAX
--nbins NBINS         Number of bins for statistics (default: auto)
--nbins_ml NBINS_ML   Number of bins for ML parameters (default: auto)
--labin LABIN         F,SIGF,FREE input
--labin_free LABIN_FREE
                        MTZ column of --hklin_free
--free FREE           flag number for test set
--model MODEL         Input atomic model file
--monlib MONLIB       Monomer library path. Default: $CLIBD_MON
--ligand [LIGAND ...]
                        restraint dictionary cif file(s)
--newligand_continue  Make ad-hoc restraints for unknown ligands (not recommended)
--hydrogen {all,yes,no}
                        all: add riding hydrogen atoms, yes: use hydrogen atoms if present, no: remove hydrogen atoms in input. Default:
                        all
--hout                write hydrogen atoms in the output model
--jellybody           Use jelly body restraints
--jellybody_params sigma dmax
                        Jelly body sigma and dmax (default: [0.01, 4.2])
--jellyonly           Jelly body only (experimental, may not be useful)
--find_links          Automatically add links
--keywords KEYWORDS [KEYWORDS ...]
                        refmac keyword(s)
--keyword_file KEYWORD_FILE [KEYWORD_FILE ...]
                        refmac keyword file(s)
--randomize RANDOMIZE
                        Shake coordinates with specified rmsd
--ncycle NCYCLE       number of CG cycles (default: 10)
--weight WEIGHT       refinement weight (default: auto)
--no_weight_adjust    Do not adjust weight during refinement
--target_bond_rmsz_range TARGET_BOND_RMSZ_RANGE TARGET_BOND_RMSZ_RANGE
                        Bond rmsz range for weight adjustment (default: [0.5, 1.0])
--ncsr                Use local NCS restraints
--adpr_weight ADPR_WEIGHT
                        ADP restraint weight (default: 1.000000)
--occr_weight OCCR_WEIGHT
                        Occupancy restraint weight (default: 0.000000)
--bfactor BFACTOR     reset all atomic B values to specified value
--fix_xyz
--adp {fix,iso,aniso}
--refine_all_occ
--max_dist_for_adp_restraint MAX_DIST_FOR_ADP_RESTRAINT
--adp_restraint_power ADP_RESTRAINT_POWER
--adp_restraint_exp_fac ADP_RESTRAINT_EXP_FAC
--adp_restraint_no_long_range
--adp_restraint_mode {diff,kldiv}
--unrestrained        No positional restraints
--refine_h            Refine hydrogen (default: restraints only)
--refine_dfrac        Refine deuterium fraction (neutron only)
--twin                Turn on twin refinement
-s {electron,xray,neutron}, --source {electron,xray,neutron}
--no_solvent          Do not consider bulk solvent contribution
--use_in_est {all,work,test}
                        Which set of reflections to use for the ML parameter estimation. Default: 'work' if --twin is set; otherwise
                        'test'.
--keep_charges        Use scattering factor for charged atoms. Use it with care.
--keep_entities       Do not override entities
--allow_unusual_occupancies
                        Allow negative or more than one occupancies
-o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
--write_trajectory    Write all output from cycles
--vonmises            Experimental: von Mises type restraint for angles
--prefer_intensity
--use_fw              For debugging purpose; use F&W-converted amplitudes but use intensity for stats
--config CONFIG       Config file (.yaml)

$ servalcat --version
Servalcat 0.4.123 with Python 3.9.18 (gemmi 0.7.3, scipy 1.7.3, numpy 1.19.5, pandas 1.3.5)