Skip to content

Latest commit

 

History

History
599 lines (480 loc) · 24.7 KB

processing_in_detail.md

File metadata and controls

599 lines (480 loc) · 24.7 KB

Processing in Detail (APS / CCP4 2021)

Introduction

DIALS processing may be performed by either running the individual tools (spot finding, indexing, refinement, integration, symmetry, scaling, exporting to MTZ) or you can run xia2, which makes informed choices for you at each stage. In this tutorial we will run through each of the steps in turn, taking a look at the output as we go. We will also look at enforcing the correct lattice symmetry.

The aim of this tutorial is to introduce you to the tools, not teach about data processing - it is assumed you have some idea of the overall process from e.g. associated lectures. With the graphical tools, I am not making so much effort to explain the options as simply "playing" will give you a chance to learn your way around and also find the settings which work for you. Particularly with looking at diffraction images, the "best" settings are very personal.

DIALS version

This tutorial assumes you are using DIALS version 3.5 and that you have this set up (i.e. you've sourced the setup file).

If you type dials.version you should see something like:

DIALS 3.5.0-g82bac9855-release
Python 3.8.10
Installed in: /Applications/dials-v3-5-0/modules/dials

If you are running on the APS virtual machines, module load dials-3.5.0 will get you set up and there is no need to download the data. If you are running at home on Linux or macOS then you should be able to reproduce the results in here. If you are on Windows, try installing the Linux version in a WSL terminal using e.g. Ubuntu.

Tutorial data

The following example uses a small Thaumatin data set collected on beamline i03 at Diamond Light Source, which is available from Zenodo

Files

DIALS creates two principal file types:

  • experiment files called something.expt
  • reflection files called something.refl

"Experiment" in DIALS has a very specific meaning - the capturing of data from one set of detector, beam, goniometer and crystal - so if you have two scans from one crystal this is two experiments, if you have two lattices on one data set this is two experiments. In most cases you can ignore this distinction though.

Usually the output filenames will correspond to the name of the DIALS program that created them e.g. indexed.refl and indexed.expt from dials.index. The only deviations from this are on import (see below) where we are only reading experiment models and spot finding where we find strong reflections so write these to strong.refl - and we create no models so (by default) there is no output experiment file.

At any time you can look at these files with dials.show which will summarise the content of the files to the terminal.

If you're impatient...

Parameters

All DIALS programs accept parameters in the form of parameter=value - in most cases this will be sufficient though some less frequently used options may require "name space" clarification e.g. index_assignment.method=local. All of the DIALS programs support the option

dials.program -c -e2

which will show you all possible configuration options - if you are looking for an option this is the simplest way to search so e.g.

dials.index -c -e2 | less

will allow you to scroll through the extensive list of options you can adjust. In most cases the defaults are relatively sensible for synchrotron data from a pixel array detector, as we are using in this tutorial.

Output

In the majority of cases the dials programs write their output to dials.program.log e.g. dials.find_spots.log etc. - everything which is printed to the terminal is also saved in this file, so you can review the processing later. In the case where you are reporting an issue to the developers including these log files in the error report (particularly for the step which failed) is very helpful.

From most stages you can generate a more verbose report of the current state of processing with:

dials.report step.expt step.refl

which will generate a detailed report as HTML describing the current state of the processing.

Import

The starting point for any processing with DIALS is to import the data - here the metadata are read and a description of the data to be processed saved to a file named imported.expt. This is "human readable" in that the file is JSON format (roughly readable text with brackets around to structure for computers). While you can edit this file if you know what you are doing, usually this is not necessary.

dials.import xtal_1_5_master.h5

will read the metadata from this master file and write imported.expt from this - equally in this case you could import from the NeXus formatted file (which is functionally equivalent) with

dials.import xtal_1_5.nxs

It is important to note that for well-behaved data (i.e. anything which is well-collected from a well-behaved sample) the commands below will often be identical after importing.

At this point you can actually look at the images with the dials.image_viewer tool -

dials.image_viewer imported.expt

in this tool there are many settings you can adjust, which could depend on the source of the data and - most importantly - your preferences. Personally the author finds for basic inspection of the images the brightness is a bit high for pixel array data, and a value of 10 may be better for viewing the diffraction pattern as a whole.

To get a sense of how the diffraction spots are spread, stacking images can help - for example in this case setting the stack to 10 gives a good idea of the real separation between reflections. If the data are not stacked the spot finding process can also be explored - the controls at the bottom of the "Settings" window allow you to step through these and can be very useful for getting a "computer's eye view" of how the data look (particularly for establishing where the diffraction is visible to.)

Here is a short discussion on some more details of importing data.

Find Spots

The first "real" task in any processing using DIALS is the spot finding. Since this is looking for spots on every image in the dataset, this process can take some time so by default will use all of the processors available in your machine - if you would like to control this adjust with e.g. nproc=4 - however the default is usually sensible unless you are sharing the computer with many others.

dials.find_spots imported.expt

This is one of the two steps where every image in the data set is read and processed and hence can be moderately time-consuming. This contains a reflection file strong.refl which contains both the positions of the strong spots and also "images" of the spot pixels which we will use later. You can view these spots on top of the images with

dials.image_viewer imported.expt strong.refl

to get a sense of what spots were found. You will see that the spots are surrounded by little blue boxes - these are the bounding boxes of the reflections i.e. the outer extent of the connected regions of the signal pixels. The signal pixels are highlighted with green blobs giving a sense of what is and is not "strong."

Image viewer

The default parameters for spot finding usually do a good job for Pilatus images, such as these. However they may not be optimal for data from other detector types, such as CCDs or image plates. Issues with incorrectly set gain might, for example, lead to background noise being extracted as spots. You can use the image mode buttons to preview how the parameters affect the spot finding algorithm. The final button 'threshold’ is the one on which spots were found, so ensuring this produces peaks at real diffraction spot positions will give the best chance of success.

The second tool for visualisation of the found spots is the reciprocal lattice viewer - which presents a view of the spot positions mapped to reciprocal space.

dials.reciprocal_lattice_viewer imported.expt strong.refl

No matter the sample orientation you should be able to rotate the space to "look down" the lines of reflections. If you cannot, or the lines are not straight, it is likely that there are some errors in the experiment parameters e.g. detector distance or beam centre. If these are not too large they will likely be corrected in the subsequent analysis.

Reciprocal viewer

Have a play with the settings - you can change the beam centre in the viewer to see how nicely aligned spots move out of alignment. Some of the options will only work after you have indexed the data. If the geometry is not accurately recorded you may find it useful to run:

dials.search_beam_position imported.expt strong.refl

to determine an updated position for the beam centre - ideally the shift that this calculates should be small if the beamline is well-calibrated

  • if it is a couple of mm or more it may be worth discussing this with the beamline staff! Running the reciprocal lattice viewer with the optimised experiment output:
dials.reciprocal_lattice_viewer optimised.expt strong.refl

should show straight lines, provided everything has worked correctly.

Further discussion of the output can be found here.

Indexing

The next step will be indexing of the found spots with dials.index - by default this uses a 3D FFT algorithm to identify periodicity in the reciprocal space mapped spot positions, though there are other algorithms available which can be better suited to e.g. narrow data sets.

dials.index imported.expt strong.refl

or

dials.index optimised.expt strong.refl

are the ways to trigger the program, and the most common parameters to set are the space_group and unit_cell if these are known in advance. While this does index the data it will also perform some refinement with a static crystal model, and indicate in the output the fraction of reflections which have been indexed - ideally this should be close to 100%:

Refined crystal models:
model 1 (19471 reflections):
Crystal:
    Unit cell: 58.276(3), 58.228(2), 150.669(5), 89.9383(8), 89.9515(12), 90.0746(13)
    Space group: P 1
    U matrix:  {{ 0.5142,  0.6367, -0.5747},
                {-0.3924, -0.4212, -0.8177},
                {-0.7626,  0.6460,  0.0332}}
    B matrix:  {{ 0.0172,  0.0000,  0.0000},
                { 0.0000,  0.0172,  0.0000},
                {-0.0000, -0.0000,  0.0066}}
    A = UB:    {{ 0.0088,  0.0109, -0.0038},
                {-0.0067, -0.0072, -0.0054},
                {-0.0131,  0.0111,  0.0002}}
+------------+-------------+---------------+-------------+
|   Imageset |   # indexed |   # unindexed | % indexed   |
|------------+-------------+---------------+-------------|
|          0 |       19471 |          3604 | 84.4%       |
+------------+-------------+---------------+-------------+

If it is significantly less than 100% it is possible you have a second lattice - adding max_lattices=2 (say) to the command-line will indicate to the program that you would like to consider attempting to separately index the unindexed reflections after the first lattice has been identified.

By default the triclinic lattice i.e. with P1 no additional symmetry is assumed - for the majority of data there are no differences in the quality of the results from assigning the Bravais lattice at this stage, even if as here it is perfectly obvious what the correct answer is.

If successful, dials.index writes the experiments and indexed reflections to two new files indexed.expt and indexed.refl - if these are loaded in the reciprocal lattice viewer you can see which spots have been indexed and if you have multiple lattices switch them "on and off" for comparison.

The process that the indexing performs is quite complex -

  • make a guess at the maximum unit cell from the pairwise separation of spots in reciprocal space
  • transform spot positions to reciprocal space using the best available current model of the experimental geometry
  • perform a Fourier transform of these positions or other algorithm to identify the basis vectors of these positions e.g. the spacing between one position and the next
  • determine a set of these basis vectors which best describes the reciprocal space positions
  • transform this set of three basis vectors into a unit cell description, which is then manipulated according to some standard rules to give the best triclinic unit cell to describe the reflections - if a unit cell and space group have been provided these will be enforced at this stage
  • assign indices to the reflections by "dividing through" the reciprocal space position by the unit cell parallelopiped (this is strictly the actual indexing step)
  • take the indexed reflections and refine the unit cell parameters and model of the experimental geometry by comparing where the reflections should be and where they are found
  • save the indexed reflections and experiment models to the output files

The indexing process takes place over a number of cycles, where low resolution reflections are initially indexed and refined before including more reflections at high resolution - this improves the overall success of the procedure by allowing some refinement as a part of the process.

During this process an effort is made to eliminate "outlier" reflections - these are reflections which do not strictly belong to the crystal lattice but are accidentally close to a reciprocal space position and hence can be indexed. Most often this is an issue with small satellite lattices or ice / powder on the sample. Usually this should not be a cause for concern.

Bravais Lattice Determination (optional!)

Once you have indexed the data you may optionally attempt to infer the correct Bravais lattice and assign this to constrain the unit cell in subsequent processing. If, for example, the unit cell from indexing has all three angles close to 90° and two unit cell lengths with very similar values you could guess that the unit cell is tetragonal. In dials.refine_bravais_settings we take away the guesswork by transforming the unit cell to all possible Bravais lattices which approximately match the triclinic unit cell, and then performing some refinement - if the lattice constraints are correct then imposing them should have little impact on the deviations between the observed and calculated reflection positions (known as the R.M.S. deviations). If a lattice constraint is incorrect it will manifest as a significant increase in a deviation - however care must be taken as it can be the case that the true symmetry is lower than the shape of the unit cell would indicate.

In the general case there is little harm in skipping this step, however for information if you run

dials.refine_bravais_settings indexed.expt indexed.refl

you will see a table of possible unit cell / Bravais lattice / R.M.S. deviations printed in the output - in the case of this tutorial data they will all match, as the true symmetry is tetragonal.

Chiral space groups corresponding to each Bravais lattice:
aP: P1
mP: P2 P21
mC: C2
oP: P222 P2221 P21212 P212121
oC: C2221 C222
tP: P4 P41 P42 P43 P422 P4212 P4122 P41212 P4222 P42212 P4322 P43212
+------------+--------------+--------+--------------+----------+-----------+------------------------------------------+----------+------------+
|   Solution |   Metric fit |   rmsd | min/max cc   |   #spots | lattice   | unit_cell                                |   volume | cb_op      |
|------------+--------------+--------+--------------+----------+-----------+------------------------------------------+----------+------------|
|   *      9 |       0.0969 |  0.057 | 0.753/0.914  |     5000 | tP        | 58.08  58.08 150.34  90.00  90.00  90.00 |   507151 | a,b,c      |
|   *      8 |       0.091  |  0.049 | 0.753/0.811  |     5000 | oC        | 82.24  82.41 150.56  90.00  90.00  90.00 |  1020311 | a+b,-a+b,c |
|   *      7 |       0.0969 |  0.053 | 0.753/0.933  |     5000 | oP        | 57.98  58.06 150.24  90.00  90.00  90.00 |   505766 | a,b,c      |
|   *      6 |       0.091  |  0.045 | 0.811/0.811  |     5000 | mC        | 82.31  82.20 150.43  90.00  90.08  90.00 |  1017816 | a-b,a+b,c  |
|   *      5 |       0.0891 |  0.034 | 0.933/0.933  |     5000 | mP        | 58.14  58.18 150.49  90.00  89.91  90.00 |   509099 | -b,-a,-c   |
|   *      4 |       0.0969 |  0.048 | 0.795/0.795  |     5000 | mP        | 58.12  58.13 150.51  90.00  89.90  90.00 |   508490 | a,b,c      |
|   *      3 |       0.0785 |  0.05  | 0.753/0.753  |     5000 | mP        | 58.19 150.48  58.15  90.00  90.11  90.00 |   509165 | b,c,a      |
|   *      2 |       0.0478 |  0.032 | 0.809/0.809  |     5000 | mC        | 82.27  82.38 150.61  90.00  89.93  90.00 |  1020753 | a+b,-a+b,c |
|   *      1 |       0      |  0.031 | -/-          |     5000 | aP        | 58.28  58.23 150.67  89.94  89.95  90.08 |   511263 | a,b,c      |
+------------+--------------+--------+--------------+----------+-----------+------------------------------------------+----------+------------+
* = recommended solution

If you wish to use one of the output experiments from this process e.g. bravais_setting_9.expt you will need to reindex the reflection data from indexing to match this - we do not output every option of reindexed data as these files can be large. In most cases it is simpler to re-run dials.index setting the chosen space group.

The reader is reminded here - in most cases it is absolutely fine to proceed without worrying about the crystal symmetry at this stage 🙂.

Refinement

The model is already refined during indexing, but this is assuming that a single crystal model is appropriate for every image in the data set - in reality there are usually small changes in the unit cell and crystal orientation throughout the experiment as the sample is rotated. dials.refine will first re-run refinement with a fixed unit cell and then perform scan-varying refinement. If you have indexed multiple sweeps earlier in processing (not covered in this tutorial) then the crystal models will be copied and split at this stage to allow per-crystal-per-scan models to be refined.

By and large one may run:

dials.refine indexed.expt indexed.refl

without any options and the program will do something sensible - if you compare the R.M.S. deviations from the end of indexing with the end of refinement you should see a small improvement e.g.

RMSDs by experiment:
+-------+--------+----------+----------+------------+
|   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
|    id |        |     (px) |     (px) |   (images) |
|-------+--------+----------+----------+------------|
|     0 |   5000 |  0.27013 |  0.31145 |    0.25904 |
+-------+--------+----------+----------+------------+

to:

RMSDs by experiment:
+-------+--------+----------+----------+------------+
|   Exp |   Nref |   RMSD_X |   RMSD_Y |     RMSD_Z |
|    id |        |     (px) |     (px) |   (images) |
|-------+--------+----------+----------+------------|
|     0 |  14423 |  0.24008 |  0.27144 |    0.20241 |
+-------+--------+----------+----------+------------+

If you look at the output of dials.report at this stage you should see small variations in the unit cell and sample orientation as the crystal is rotated - if these do not appear small then it is likely that something has happened during data collection e.g. severe radiation damage:

Refinement graphs

Integration

Once you have refined the model the next step is to integrate the data - in effect this is using the refined model to calculate the positions where all of the reflections in the data set will be found and measure the background-subtracted intensities:

dials.integrate refined.expt refined.refl

By default this will pass through the data twice, first looking at the shapes of the predicted spots to form a reference profile model then passing through a second time to use this profile model to integrate the data, by being fit to the transformed pixel values. This is by far the most computationally expensive step in the processing of the data!

by default all the processors in your computer are used, unless we think this will exceed the memory available in the machine. At times, however, if you have a large unit cell and / or a large data set you may find that processing on a desktop workstation is more appropriate than e.g. a laptop.

If you know in advance that the data do not diffract to anything close to the edges of the detector you can assign a resolution limit at this stage by adding prediction.d_min=1.8 (say) to define a 1.8 Å resolution limit - this should in general not be necessary. At the end of integration two new files are created - integrated.refl and integrated.expt - looking at these in the image viewer e.g.

dials.image_viewer integrated.expt integrated.refl

can be very enlightening as you should see little red boxes around every reflection - if you select "integrated only" you can see what was and was not integrated. You may see a selection of reflections close to the rotation axis are missed - these are not well modelled or predicted in any program so typically excluded from processing.

Symmetry analysis

Before the data may be scaled it is necessary that the crystal symmetry is known - if this was assigned correctly at indexing e.g. space_group=P41212 then you can proceed directly to scaling. In the majority of cases however it will be unknown or not set at this point, so needs to be assigned between integration and scaling. Even if the Bravais lattice was assigned earlier, the correct symmetry within that lattice is needed.

The symmetry analysis in DIALS takes the information from the spot positions and also the spot intensities. The former are used to effectively re-run dials.refine_bravais_settings to identify possible lattices and hence candidate symmetry operations, and the latter are used to assess the presence or absence of these symmetry operations. Once the operations are found, the crystal rotational symmetry is assigned by composing these operations into a putative space group. In addition, systematically absent reflections are also assessed to assign a best guess to translational elements of the symmetry - though these are not needed for scaling, they may help with downstream analysis rather than you having to manually identify them.

dials.symmetry integrated.expt integrated.refl

is how this step is run. At this point it is important to note that the program is trying to identify all symmetry elements, and does not know that e.g. inversion centres are not possible - so for an oP lattice it will be testing for P/mmm symmetry which corresponds to P2?2?2? in standard MX

Scaling and Merging

During the experiment there are effects which alter the measured intensity of the reflections, not least radiation damage, changes to beam intensity or illuminated volume or absorption within the sample. The purpose of dials.scale, like all scaling programs, is to attempt to correct for these effects by using the fact that symmetry related reflections should share a common intensity. By default no attempt is made to merge the reflections - this may be done independently in dials.merge - but a table of merging statistics is printed at the end along with resolution recommendations.

dials.scale symmetrized.expt symmetrized.refl [anomalous=True]

runs everything with the defaults which allows for:

  • modest radiation damage
  • changes in overall intensity
  • modest sample absorption

with the latter being the parameter most likely changed. If you have a data set recorded from a sample containing a large amount of metal (not common in MX) or recorded at long wavelength e.g, for sulphur SAD it may be necessary to adjust the extent to which the absorption correction is constrained with

absorption_level=(low|medium|high)

where setting low, the default, corresponds to ~ 1% absorption, medium to ~ 5% and high to ~ 25% - these are not absolute, more a sense of what you may expect. Testing has indicated that setting it too high is unlikely to do any harm, but setting it too low can have a measurable impact on the quality of the data for phasing experiments. dials.scale generates a HTML report dials.scale.html which includes a lot of information about how the models look, as well as regions of the data which agree well and poorly - from a practical perspective this is the point where you really know about the final quality of the data.

Merging or Exporting

Most downstream software depends on a scaled and merged data set e.g. for molecular replacement, so at the end of processing you can run

dials.export scaled.expt scaled.refl

to simply export the scaled reflections in MTZ format or

dials.merge scaled.expt scaled.refl

which will output a scaled and merged MTZ file.