A worldwide e-Infrastructure for NMR and structural biology

Generating the Necessary Restraint Files for running HADDOCK Manually

You can create the following type of restraints:

Below you can find a detailed explanation for each type of restraint:

  • Ambiguous Interaction Restraints (AIRs)

HADDOCK uses experimental and/or bioinformatics information to drive the complex formation in silico (For the usage of CSP data coming from NMR for example, you can refer to the Case Study section of this manual). The experimental and/or prediction data are used to define active and passive residues.

Active residues are described as the experimentally identified or predicted interface residues that are accessible to the solvent (Either main chain or side chain relative accessibility should be typically > 40-50%). The accessibility cut-off is not a hard limit. You should carefully check the identity of the residues at the interface and possibly include residues with lower accessibility if they carry potentially important functional groups. The passive residues are all solvent accessible surface neighbors of the active residues.

Note that the active and passive residues have to be defined by the users based on their own interpretation of the experimental data, especially in the case of NMR titration data. (One way to interpret the significance of the shift is to calculate the average perturbation and to consider that all perturbations higher than the average are significant.)

Active and passive residues are used to define a network of AIRs between the molecules to be docked. An AIR defines that a residue on the surface of a biomolecule should be in close vicinity to another residue or group of residues on the partner biomolecule when they form the complex. By default this is described as an ambiguous distance restraint between all atoms of the source residue to all atoms of all target residue(s) that are assumed to be in the interface in the complex as illustrated in the following Figure:

    

According to this Figure, an AIR can be defined between the any atom of an active residuei” of protein A and any atom of all active and passive residuesx, y, z” of protein B. The distance calculated among these specific residues is called the effective distance, diABeff. The effective distance is formulized as:                   
  

                

Here NAatom indicates all atoms of the source residue on molecule A, NresB the residues defined to be at the interface of the target molecule B and NBatom all atoms of a residue on molecule B. The 1/r6 summation somewhat mimics the attractive part of a Lennard-Jones potential and ensures that the AIRs are satisfied as soon as any two atoms of the biomolecules are in contact. A detailed description of how to generate and AIR file is given in the Project Setup section.

Once you have defined your active and passive residues,

   i. Go to the manual-HADDOCK home page: http://www.nmr.chem.uu.nl/haddock/
   ii. Select Project Setup and click on Generate AIR File    iii. Enter the residue numbers corresponding to the active and passive residues for each molecule.
   iv. Define the upper distance limit for AIRs (maximum distance between any atom of an active residue of one molecule to any atom of an active or passive residue of the other molecule). Note that the current default value for the upper distance limit is 2 Å, which might seem quite tight, BUT remember that the effective distance will always be tighter than the shortest distance when it is summed according to the  formula. Moreover, since the degree of ambiguity is very high (several thousands distances can enter the sum), the effective distance can be quite shorter than the shortest distance when the distances enter in to the diABeff summation!!
   v. Finally, click on Generate AIR restraints button. Via this action, an AIR restraint file in CNS format will be generated. For saving the generated file, either copy and paste the text in to a new file or use the File > Save as option of your browser.

  • Use of bioinformatics interface predictions

In the absence of any experimental information on the interaction surfaces, you might want to predict the interfacial residues basing on bioinformatics methods. For this purpose, we have developed two different interface predictors: WHISCY and CPORT. WHISCY predictions are based on conservation, where CPORT is a consensus interface predictor basing on several other prediction algorithms. Both WHISCY and CPORT have an easy interface and they output a list of active and passive residues that can be directly supplied as an input to HADDOCK. In the web interface of CPORT, there is also an option to tune the sensitivity/specificity of the predictions.

For more information about WHISCY abd CPORT you can refer to:
S.J. de Vries, A.D.J. van Dijk and A.M.J.J. Bonvin, "WHISCY: WHat Information does Surface Conservation Yield? Application to data-driven docking.", Proteins: Struc. Funct. & Bioinformatics, 63, 479-489 (2006). http://www.nmr.chem.uu.nl/whiscy

S.J. de Vries and A.M.J.J. Bonvin, "CPORT: a Consensus Interface Predictor and its Performance in Prediction-driven Docking with HADDOCK", PloS One, DOI:10.1371/journal.pone.0017695 (2011). http://haddock.chem.uu.nl/services/CPORT/cport.html

  • Use of Random AIR definition (ab-initio mode)

In the absence of any experimental and/or bioinformatics information to drive the docking, HADDOCK 2.1 offers the possibility to randomly define AIRs from solvent accessible residues (>20% relative accessibility). For each docking trial another set of AIRs will be used. These restraints are defined in the randomairs.cns script. To activate ab-initio mode, you should first change the run.cns file. If you want to make a manual change you should change the parameter from randair=false to randair=true. Detailed explanation about this issue is given at Settings/random_interface section.

In ab-inito mode all solvent accessible residues will be sampled by default (provided that enough structures are generated in the rigid-body docking stage, it0). But if you define semi-flexible segments in combination with random AIR definition, you can limit the sampling to a selected region of the surface (e.g. the CDR loops in an antibody-antigen complex). In order to define the semi-flexible segments you should again make the necessary changes in run.cns file (Change the value for nseg_X and the related parameters). For more information go to Settings/semi-flexible section.

The random AIRs are defined between a randomly selected patch (within 5Å radius) on one molecule and the randomly selected patch on the other molecule (within 7.5Å radius) and vice-versa. This definition is coded in the randomairs.cns. The generation of AIRs is as follows (note that this is valid only for the rigid-body energy minimization stage):

   i. One residue on each molecule is selected randomly (Ai,Bi ),
   ii. All surface neighbors of this residue (within 5Å cut-off) are also selected,
   iii. AIRs are defined between each residue selected from molecule A (Ai + 5Å neighbors) and the first residue randomly selected from molecule B and all its surface neighbors within a 7.5Å cutoff (Bi + 7.5Å neighbors)
   iv. AIRs are defined between each residue selected from molecule B (Bi + 5Å neighbors) and the first residue randomly selected from molecule A and all its surface neighbors within a 7.5Å cutoff (Ai+ 7.5Å neighbors). The selected residues are written to disk in structures/it0 as fileroot_1.disp,....

For the semi-flexible refinement stage, contact AIRs are automatically defined between all residues within 5Å across the interface. In the final explicit solvent refinement, no AIR restraints will be defined. To ensure a thorough sampling of the surface, the number of structures generated at the rigid-body stage (it0) should be increased (e.g. 10000), depending on the extent of the surface to be sampled.

  • Use of Surface Contacts Restraints

To define the surface contact restraints automatically you should change the surfrest variable of the run.cns file in to true (surfrest=ture, see for more information Settings/surface_contact section). These restraints are defined in the surf-restraint.cns script and they are fully compatible with all other types of restraints.

If turned on, one surface contact restraint will be defined between each pair of molecules as an ambiguous distance restraint. This contact restraint is calculated with sum averaging (as for the AIRs) and it is measured between all CA or P atoms (protein and/or DNA) of one molecule and all CA or P atoms of the other molecule. If less than 3 CA and P atoms are found, all atoms will be selected instead. The upper distance limit for the measurement is set to either 7Å (if both molecules contain CA and/or P atoms); 4.5Å (if only one molecule contains CA and/or P atoms) or 2Å (if no molecule contains CA and/or P atoms).

Such restraints can be useful in multi-body (N>2) docking to ensure that all molecules are in contact and thus promote compactness of the docking solutions. As for the random AIRs, surface contact restraints can be used in ab-initio docking; in such a case it is important to have enough sampling of the random starting orientations and this significantly increases the number of structures for rigid-body docking.

  • Use of Center of Mass Restraints

Center of mass restraints between various molecules can be automatically defined in HADDOCK 2.1 by turning the cmrest variable given in run.cns on: cmrest=true (See Settings/center_mass). These restraints are defined in the cm-restraint.cns script and they are fully compatible with all other types of restraints.

If turned on, one center of mass restraint will be defined between each molecule as an ambiguous distance restraint with center averaging between all CA or P atoms (protein and/or DNA) of one molecule and all CA or P atoms of the other molecule. If less than 3 CA and P atoms are found, all atoms will be selected instead. The upper distance limit is automatically defined as the sum of the "effective radius" of each molecule. The "effective radius" is defined as half the average length of the three principal components.

Such restraints can be useful in multi-body (N>2) docking to ensure that all molecules are in contact and thus promote compactness of the docking solutions. As for the random AIRs, center of mass restraints can be used in ab-initio docking; in such a case it is important to have enough sampling of the random starting orientations and this increase significantly the number of structures for rigid-body docking.

  • Unambiguous Distance Restraints 

If you have NOE distance restraints or any kind information with regards to a distance between any atom pairs, you can create a restraint file with the CNS syntax. You can find the Unambiguous Distance Restraint format at: http://cns.csb.yale.edu/v1.1/tutorial/formats/noe/text.html

  • Dihedral Angle Restraints

In the case that you have information over one of the dihedral angles of the complex of interest, HADDOCK allows you to specify them as restraints for the docking run. For the syntax of a Dihedral Angle Restraint, you can check: http://cns.csb.yale.edu/v1.1/tutorial/formats/dihedral/text.html

  • Hydrogen Bond (H-bond) Restraints 

To use a known H-bond for driving the docking of your complex, you can create a H-bond restraint file by following the same syntax for Unambiguous Distance Restraints.

  • Residual Dipolar Coupling (RDC) Restraints

Residual dipolar couplings (RDCs) can provide useful information on the orientation of the molecules to be docked. They can be introduced in HADDOCK in two ways:
   i. Directly as RDC restraints (SANI statement in CNS)
   ii. Indirectly by defining intervector projection angle restraints (VEAN statement in CNS)

From our experience, both approaches give good results for docking. The use of intervector projection angle restraints (Meiler et al., 2000, J. Biomol. NMR 17, 185, 2000) avoids the burden of working with a tensor in the structure calculations. Another advantage is that one can distinguish between inter- and intra-molecular restraints. Considering that part of the system will typically be kept rigid during docking, the use of intra-molecular restraints might not make much sense anyway.

For both, the tensor components need first to be determined. In the case of complexes, this can be easily done by using the known structures of the single domains. The software Pales (Zweckstetter & Bax J., 2000, Am. Chem. Soc. 122, 3791-3792) can be used for this purpose.
You need for this to generate a Pales input file containing your residual dipolar couplings. A csh script called ana_pdb_Q-factor.csh is provided in the haddock/tools directory that will calculate from the experimental dipolar coupling the tensor parameters for all PDB files present in the current directory by best-fitting the dipolar coupling tensor to the corresponding 3D structures. Usage:

$HADDOCK/tools/ana_pdb_Q-factor.csh pales.inp

The output will be written to files with extension PDBfilename.pales. The tensor parameters Axx, Ayy and Azz can then be extracted with the following command:

grep Axx *.pales | gawk '{print $4,$5,$6}' > xx-yy-zz.dat

The components from the structure giving the best fit to the experimental data can be used.
Alternatively, the average values can then be calculated with:

cat xx-yy-zz.dat | awk '{print $1}' | $HADDOCKTOOLS/average.perl
cat xx-yy-zz.dat | awk '{print $2}' | $HADDOCKTOOLS/average.perl
cat xx-yy-zz.dat | awk '{print $3}' | $HADDOCKTOOLS/average.perl

Check the values in xx-yy-zz.dat to make sure they match (e.g. same sign) before averaging them. Similarly, the axial (Da) and rhombic (Dr) components can be extracted from Pales1.2 output files and averaged with the following commands:

grep Da *.pales | awk '{print $3}' | $HADDOCKTOOLS/average.perl
grep Dr *.pales | awk '{print $3}' | $HADDOCKTOOLS/average.perl

Note: For use in HADDOCK (and CNS), the tensor components should be expressed in Hertz and the Pales values should be scaled depending on the nuclei observed. For example, for N-H residual dipolar coupling the proper scaling factor is 21700. Also be careful in the conversion since different programs often use different conventions/notations/units.

  • Direct use of RDCs as restraints for docking

The proper format for RDC restraints is the following:

assi ( resid 999 and name OO )
        ( resid 999 and name Z  )
        ( resid 999 and name X  )
        ( resid 999 and name Y  )
        ( resid   20 and name N and segid A )
        ( resid   20 and name HN and segid A )   2.981   0.200

Given a file containing residue_number RDC_value and segid a RDC restraint file in CNS format can be generated with the gawk script generate_sani provided in the HADDOCK/RDCtools directory:

$HADDOCK/RDCtools/generate_sani rdc_data_file

The error on the RDCs is set by default to 0.2 Hz. This can be overruled by giving the error value as argument:

$HADDOCK/RDCtools/generate_sani ERR=0.4 rdc_data_file

The 2.1 version of HADDOCK supports up to 5 different SANI restraints sets. Each can have a separate tensor. The tensor residue number should be in the range 999-995. You can edit and modify the generate_sani script to change the tensor number. To use RDC restraints in HADDOCK, use SANI in run.cns in the dipolar coupling section and define the proper Da and R parameters (R=Dr/Da). The RDC restraints are first used in the rigid body energy minimization step using as force constant the value defined for the hot phase. Keep this value small (the current default is 0.02) to keep a proper balance between the AIR and SANI energy terms.

Note1: Only one set (corresponding to one alignment tensor) of dipolar couplings can be used as direct restraints (SANI) in HADDOCK since currently only one alignment tensor is supported. Multiple sets can however be used as intervector projection angle restraints (see below).
Note2: For proper docking results, dipolar couplings restraints of the various molecules should be input as one set (and thus not split in separate sets for each molecule). The assumption here is that the RDCs of the various components share one common alignment tensor.

  • Intervector projection angle restraints for docking

Intervector projection angle restraints (Meiler et al., 2000, J. Biomol. NMR 17, 185) are obtained by taking pairs of residual dipolar couplings and generating intervector projection angle restraints (somewhat equivalent to dihedral angle restraints). These restraints have the advantage that they do no longer depend on the orientation of the dipole vector with respect to the alignment tensor. Instead they restrain the angle between two dipolar vectors, allowing for two minima. Two force constants must be therefore defined: one for the border potential function and one for the central part (e.g. between the two minima).

Thanks to Helen Mott and Wayne Boucher from Cambridge University we are providing in the HADDOCK/RDCtools a python script, dipole_segid.py that allows the generation of such restraints from RDC data. To use it, you need to have your RDC data in a tab separated file containing residue_number, RDC_value and segid and provide the tensor components Dxx, Dyy and Dzz (in Hertz). For NH couplings, these components are equal to 21700 times the eigenvalues of the Saupe matrix given by Pales. Usage:

python $HADDOCK/RDCtools/dipolar_segid.py rdc_data_file vean_output_file Dxx Dyy Dzz

The resulting restraints file looks like:

assign (resid 19 and name N and segid B ) (resid 19 and name HN and segid B ) (resid 27 and name N and segid B ) (resid 27 and name HN and segid B ) 13.1 2.9 166.9 2.9 ! excluded 0.935
assign (resid 75 and name N and segid A ) (resid 75 and name HN and segid A ) (resid 27 and name N and segid B ) (resid 27 and name HN and segid B ) 13.1 2.9 166.9 2.9 ! excluded 0.935

The last column gives the fraction of angular space excluded by the restraint and can be used to select "significant" restraints, e.g. limiting more than 25% of the torsional space. Note that the number of restraints generated is very high since for N dipolar coupling there are N*(N-1) possible combinations.

To select for example all inter- and intra-molecular restraint excluding more than 25% of the angular space type:

    awk '{if ($27 == $9 && $44 > 0.25) {print $0}}' vean_output_file >vean_intra_25.tbl
    awk '{if ($27 != $9 && $44 > 0.25) {print $0}}' vean_output_file >vean_inter_25.tbl

To use intervector projection angle restraints in HADDOCK, use VANGLE in run.cns in the dipolar coupling section. The VANGLE restraints are introduced in the rigid body energy minimization step using as initial force constants the value defined for the hot phase. The restraints are activated in the second rotational minimization phase (thus earlier than the SANI restraints!)

  • Diffusion Anisotropy (DANI) Restraints

Diffusion anisotropy data (relaxation data) can provide useful information on the orientation of the molecules to be docked (comparable to RDCs). They can be introduced in HADDOCK as direct restraints (DANI statement in CNS). For this, the tensor components need first to be determined. In the case of complexes, this can be easily done by using the known structures of the single domains. The software Tensor2 (Dosset, Marion and Blackledge, 2000, . J. Biomol. NMR 16, 23-28) can be used for this purpose. You need for this to generate a Tensor2 input file containing your relaxation data. A csh script called ana_pdb_tensor2.csh is provided in the haddock/DANItools directory that will calculate from the experimental relaxation data the tensor parameters for all PDB files present in the current directory by best-fitting the data to the corresponding 3D structures. Usage:

$HADDOCK/DANItools/ana_pdb_tensor2.csh tensor2.inp

Note: You have to define the fitting options in the Tensor2 GUI manually. Tensor2 writes its’ output for each structure to a file called resaniso.0; each of these are moved to a subdirectory with directory name equal to the corresponding pdb file.

The script extracts the tensor parameters Dx, Dy and Dz and the chi2-value of the fit; these can be combined with the following command:

paste D?_all.tmp chi2_all.tmp | awk '{print $1,$8,$16,$24,$27*100/100}' | sort -n +4 > tensor2_fit.lis

The components from the structure giving the best fit to the experimental data can be used to calculate the tensor parameters that are needed in run.cns with the script calc_tens.csh (provided in the haddock/DANItools directory). Usage:

$HADDOCK/DANItools/calc_tens.csh dx dy dz

Where dx, dy and dz are the values from the file tensor2_fit.lis. The output of calc_tens.csh can be used directly in run.cns as dan1_tc, dan1_anis and dan1_r, the rotational correlation time, anisotropy and rhombicity of the rotational diffusion tensor, respectively. Alternatively, average values could be used; these can be calculated with:

cat tensor2_fit.lis | awk '{print $2}' | $HADDOCKTOOLS/average.perl
cat tensor2_fit.lis | awk '{print $3}' | $HADDOCKTOOLS/average.perl
cat tensor2_fit.lis | awk '{print $4}' | $HADDOCKTOOLS/average.perl

Check the values in tensor2_fit.lis to make sure they match (e.g. same sign) before averaging them. Also make sure that an anisotropic model is in accordance with your data.

  • Use of Relaxation data as restraints for docking

The proper format for DANI restraints is the following:

assi ( resid 999 and name OO )
        ( resid 999 and name Z  )
        ( resid 999 and name X  )
        ( resid 999 and name Y  )
        ( resid   20 and name N and segid A )
        ( resid   20 and name HN and segid A )   8.705   0.200

The restraints use the R2/R1 value; one should filter out residues where chemical exchange or mobility influences these values. Given a file containing residue_number R2/R1value and Segid a DANI restraint file in CNS format can be generated with the gawk script generate_dani provided in the HADDOCK/DANItools directory:

$HADDOCK/DANItools/generate_dani dani_data_file

The error on the R2/R1 values is set by default to 0.2. This can be overruled by giving the error value as argument:

$HADDOCK/DANItools/generate_dani ERR=0.5 dani_data_file

The 2.1 version of HADDOCK supports up to 5 different DANI restraints sets. Each can have a separate tensor. The tensor residue number should be in the range 999-995. You can edit and modify the generate_dani script to change the tensor number. To use DANI restraints in HADDOCK, use DANI in run.cns in the diffusion anisotropy section and define the proper tensor parameters that are output by the calc_tens.csh script. The DANI restraints are first used in the rigid body energy minimization step using as force constant the value defined for the hot phase.

0
Your rating: None

CNS dihedral link outdated

The new link is here. Update?

http://structure.usc.edu/cns/tutorial/formats/dihedral/text.html

Cite WeNMR/WestLife

 
Usage of the WeNMR/WestLife portals should be acknowledged in any publication:
 
"The FP7 WeNMR (project# 261572) and H2020 West-Life (project# 675858) European e-Infrastructure projects are acknowledged for the use of their web portals, which make use of the EGI infrastructure and DIRAC4EGI service with the dedicated support of CESNET-MetaCloud, INFN-PADOVA, NCG-INGRID-PT, RAL-LCG2, TW-NCHC, SURFsara and NIKHEF, and the additional support of the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, South Africa, Malaysia, Taiwan and the US Open Science Grid."
 
And the following article describing the WeNMR portals should be cited:
Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid.J. Grid. Comp., 10:743-767.

EGI-approved

The WeNMR Virtual Research Community has been the first to be officially recognized by the EGI.

European Union

WeNMR is an e-Infrastructure project funded under the 7th framework of the EU. Contract no. 261572

WestLife, the follow up project of WeNMR is a Virtual Research Environment e-Infrastructure project funded under Horizon 2020. Contract no. 675858

West-Life