A worldwide e-Infrastructure for NMR and structural biology

A Case Study: Preparing Input Files for a Manual HADDOCK Run

We illustrate here how to define ambiguous interaction restraints (AIRs) from NMR chemical shift perturbation (CSP) data, following the steps below:

1. Defining residues with "significant" chemical shift perturbations

We assume a file called csp.dat containing the combined proton/nitrogen chemical shift changes obtained from 15N HSQC titration experiments, in the following format (the first column is the residue number, the second the combined chemical shift perturbation):

1 0.0
2 0.0
3 0.06
4 0.3
...

HADDOCK comes with a number of awk, csh and perl scripts to handle and analyze the data. To calculate the average perturbation, you can use the average.perl script located in $HADDOCKTOOLS (see the Installation section). The following command extracts the second column of csp.dat and passes it as input to average.perl:

awk '{print $2}' csp.dat | $HADDOCKTOOLS/average.perl

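Then select all residues whose combined chemical shift perturbation is larger than a chosen threshold, for example the average value. In the command below, replace avcsp with the number reported by average.perl (left as a bare name, awk treats it as an unset variable equal to 0):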

awk '{if ($2>avcsp) print $0}' csp.dat

This lists all the selected residues. The next step is to filter those residues according to their solvent accessibility.
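The averaging and selection commands of this step can also be combined so that the average does not have to be copied by hand. The following is a minimal sketch, assuming csp.dat has exactly the two-column format shown above, with no header and no blank lines; the function name select_above_average is purely illustrative and is not part of HADDOCK.

```shell
# Print the lines of a two-column CSP file (residue number, combined CSP)
# whose perturbation is strictly larger than the average over all lines.
# The file is given to awk twice: the first pass sums, the second filters.
select_above_average() {
    awk '
        NR == FNR { sum += $2; n++; next }   # first pass: accumulate sum and count
        $2 > (sum / n)                       # second pass: print lines above the average
    ' "$1" "$1"
}
```

Calling select_above_average csp.dat on only the four example lines shown above would print residue 4 alone, since the average of those four values is 0.09.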

2. Filtering active residues according to solvent accessibility

An important parameter in defining AIRs is the relative residue solvent accessibility (RSA). RSA can be calculated with the program NACCESS (see Software Links at http://www.nmr.chem.uu.nl/haddock/). NACCESS outputs a file with the extension .rsa containing the per-residue solvent accessibilities divided into various classes:
    REM RES _ NUM      All-atoms   Total-Side   Main-Chain    Non-polar    All polar
    REM                ABS   REL    ABS   REL    ABS   REL    ABS   REL    ABS   REL
    RES MET     1   125.45  64.6  75.64  48.3  49.81 132.8  75.64  47.9  49.81 137.1
    RES PHE     2    83.49  41.9  83.49  50.9   0.00   0.0  83.49  50.5   0.00   0.0
    RES GLN     3    79.31  44.4  62.27  44.2  17.04  45.4  17.75  34.0  61.56  48.7
    RES GLN     4    83.82  47.0  83.82  59.4   0.00   0.0  15.03  28.8  68.79  54.5
    RES GLU     5   133.48  77.5 100.65  74.7  32.83  87.5  34.78  57.7  98.70  88.2
    RES VAL     6    20.78  13.7  20.78  18.2   0.00   0.0  20.78  18.0   0.00   0.0
    ...  
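Only amino acids with a high solvent accessibility should be selected. The selection can be based on the all-atoms relative accessibility, using the following command at the Unix prompt (the two NF tests handle RES lines without and with a chain identifier, which have 13 and 14 fields respectively):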

awk '{if (NF==13 && $5>50) print $0; if (NF==14 && $6>50) print $0}' pdb_filename.rsa

or by requesting that either the main-chain or the side-chain relative accessibility be larger than 50%:

awk '{if (NF==13 && ($7>50 || $9>50)) print $0; if (NF==14 && ($8>50 || $10>50)) print $0}' pdb_filename.rsa

By combining the experimental data (mutagenesis or chemical shift perturbation) and the solvent accessibility, you should be able to define precisely the active residues to use in HADDOCK.
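Combining the two criteria amounts to intersecting the list of perturbed residues with the list of accessible ones. The sketch below assumes the CSP selection from step 1 was saved to a file with the residue number in the first column, and that the .rsa file follows the NACCESS layout shown above. The function name filter_by_rsa and the optional third argument (the cut-off, default 50) are illustrative choices, not HADDOCK tools.

```shell
# Print the residue numbers that appear in the CSP selection (file 1,
# residue number in column 1) AND whose side-chain or main-chain relative
# accessibility in a NACCESS .rsa file (file 2) exceeds the cut-off.
# Assumes file 1 is not empty.
filter_by_rsa() {
    awk -v cutoff="${3:-50}" '
        BEGIN { cutoff += 0 }                          # force numeric comparison
        NR == FNR { selected[$1] = 1; next }           # file 1: remember residue numbers
        $1 != "RES" { next }                           # skip REM and other records
        NF == 13 { num = $3; side = $7; main = $9 }    # RES line without chain identifier
        NF == 14 { num = $4; side = $8; main = $10 }   # RES line with chain identifier
        (NF == 13 || NF == 14) && (num in selected) &&
            (side + 0 > cutoff || main + 0 > cutoff) { print num }
    ' "$1" "$2"
}
```

For example, filter_by_rsa csp_selected.dat pdb_filename.rsa lists the candidate active residues, where csp_selected.dat is a hypothetical name for the saved output of step 1.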

3. Defining passive residues 

The passive residues are all solvent-accessible surface neighbors of active residues. To define them, you can display your molecule as a space-filling model (e.g. with Rasmol) and color the active residues, for example in red.

Then, filter out the residues having a low solvent accessibility (yellow).

Select all surface neighbors to define the passive residues (green).

Filter the passive residues with the solvent accessibility criterion (see above). If you are using an ensemble of structures as the starting point, you should use the average solvent accessibility to filter your active and passive residues (see below).

4. Residue filtering from an ensemble of structures 

If you perform the docking from an ensemble of structures, the solvent accessibility filtering should be performed using the average relative accessibilities (ASAav) over the ensemble. In that case we use the following accessibility cut-off:

ASAav + SD > 50% (for either all or main-chain or side-chain atoms)

where SD is the standard deviation. The $HADDOCKTOOLS directory provides a csh script called calc_ave_rsa.csh that calculates the average accessibilities from an ensemble of structures using NACCESS. To use it, split your PDB file into separate files, each containing one structure, and then run calc_ave_rsa.csh:

$HADDOCKTOOLS/calc_ave_rsa.csh *.pdb
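The splitting step that has to precede this command can be done with awk. The sketch below assumes the ensemble is stored in a single PDB file in which each structure is enclosed in MODEL ... ENDMDL records; the function name split_models and the output naming scheme (prefix_1.pdb, prefix_2.pdb, ...) are illustrative and not part of HADDOCK.

```shell
# Split a multi-model PDB file ($1) into one file per model.
# Assumes each structure is delimited by MODEL ... ENDMDL records.
# Output files are named <prefix>_<n>.pdb, where <prefix> is $2.
split_models() {
    awk -v prefix="$2" '
        $1 == "MODEL"  { n++; out = prefix "_" n ".pdb"; next }         # start a new output file
        $1 == "ENDMDL" { print "END" > out; close(out); out = ""; next } # finish the current file
        out != ""      { print > out }                                   # copy records inside a model
    ' "$1"
}
```

If the split files are written next to the original ensemble file, move the ensemble file elsewhere before using the *.pdb wildcard, so that it is not passed to the averaging script as well.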

calc_ave_rsa.csh creates a file named "rsa_ave.lis" that contains the average solvent accessibility and the standard deviation for each residue:
   # resnam resnum < rsa_all > (sd) < rsa_back > (sd) < rsa_side > (sd)
    MET      1  69.323   10.370  125.390   13.626   55.903   12.599
    PHE      2  37.490    5.216    0.320    0.753   45.500    6.390
    GLN      3  53.793    8.246   50.147   14.108   54.770   10.873
    GLN      4  40.907    5.578    0.070    0.306   51.757    7.042
    GLU      5  70.330    6.312   68.017   15.608   70.963    7.614
    VAL      6  16.183    4.345    0.133    0.483   21.397    5.791
    ...
To select the residues that satisfy the 50% accessibility cut-off (applied here to the main-chain and side-chain columns), type:

awk '{if (($5+$6)>=50 || ($7+$8)>=50) print $0}' rsa_ave.lis

Note that the 50% cut-off is not a hard limit; the choice is left to the user.
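Since the cut-off is a user choice, it can be convenient to make it a parameter and to print only the residue numbers. The sketch below is a variant of the command above that also tests the all-atoms columns, as allowed by the ASAav + SD criterion. It assumes the eight-column rsa_ave.lis layout shown above; the function name accessible_residues and its optional second argument (the cut-off, default 50) are illustrative.

```shell
# List the residue numbers from rsa_ave.lis ($1) for which average + SD
# exceeds the cut-off ($2, default 50) for all atoms, main chain or side chain.
# Columns: resnam resnum <rsa_all> sd <rsa_back> sd <rsa_side> sd
accessible_residues() {
    awk -v cutoff="${2:-50}" '
        BEGIN { cutoff += 0 }                 # force numeric comparison
        $1 ~ /^#/ || NF < 8 { next }          # skip the header and incomplete lines
        ($3 + $4) > cutoff || ($5 + $6) > cutoff || ($7 + $8) > cutoff { print $2 }
    ' "$1"
}
```

On the six example rows shown above, accessible_residues rsa_ave.lis prints residues 1 to 5; VAL 6 is rejected because none of its three sums reaches 50. A lower cut-off can be tried with, for instance, accessible_residues rsa_ave.lis 40.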

5. Generating the AIR file 

Once you have defined your active and passive residues, go to the HADDOCK home page (http://www.nmr.chem.uu.nl/haddock/), select project setup and click on "Generate AIR restraint file". Enter the residue numbers of the active and passive residues for each molecule. You can define the upper distance limit for the AIRs (the maximum distance between any atom of an active residue of one molecule and any atom of an active or passive residue of the second molecule). Click on "Generate AIR restraints". An AIR restraint file in CNS format is generated. Use copy and paste, or save the generated AIR restraints to disk using "File Save As".
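To make the structure of such a restraint concrete, the sketch below writes CNS assign statements in which each active residue of one molecule is restrained to the full set of active and passive residues of the partner. The exact layout produced by the web form is not shown in this text, so this output is an assumption about the general shape only. The function name write_airs, the segids A and B, and the default upper limit of 2.0 are likewise assumptions made for illustration.

```shell
# Write CNS-style ambiguous restraints to standard output.
# $1: active residues of this molecule (space-separated list)
# $2: segid of this molecule
# $3: active + passive residues of the partner (space-separated list)
# $4: segid of the partner
# $5: upper distance limit (default 2.0, an assumption)
write_airs() {
    upper="${5:-2.0}"
    for res in $1; do
        printf 'assign (resid %s and segid %s)\n       (\n' "$res" "$2"
        first=1
        for partner in $3; do
            # join the partner selections with "or"
            [ "$first" -eq 1 ] || printf '     or\n'
            printf '        (resid %s and segid %s)\n' "$partner" "$4"
            first=0
        done
        # CNS distance triplet: d, dminus, dplus (allowed range 0 to the upper limit)
        printf '       ) %s %s 0.0\n' "$upper" "$upper"
    done
}
```

A complete restraint set needs both directions, so the function would be called once per molecule, for example with the active residues of A against the active and passive residues of B, and then the reverse.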


Cite WeNMR/WestLife

 
Usage of the WeNMR/WestLife portals should be acknowledged in any publication:
 
"The FP7 WeNMR (project# 261572) and H2020 West-Life (project# 675858) European e-Infrastructure projects are acknowledged for the use of their web portals, which make use of the EGI infrastructure and DIRAC4EGI service with the dedicated support of CESNET-MetaCloud, INFN-PADOVA, NCG-INGRID-PT, RAL-LCG2, TW-NCHC, SURFsara and NIKHEF, and the additional support of the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, South Africa, Malaysia, Taiwan and the US Open Science Grid."
 
And the following article describing the WeNMR portals should be cited:
Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid. J. Grid. Comp., 10:743-767.
