A worldwide e-Infrastructure for NMR and structural biology


CS-ROSETTA web server tutorial

This tutorial provides a step-by-step explanation of all settings available at every step of the 3D structure generation process. Example projects ready to be downloaded as well as pre-loaded web forms are used to better explain several of the core functions.

General introduction


The CS-ROSETTA webserver makes the generation of 3D models of monomeric proteins accessible to the larger scientific community. The advantage of the CS-ROSETTA protocol is that it only requires the 13CA, 13CB, 13C', 15N, 1HA and 1HN NMR chemical shifts as input for the structural calculations. The power of this webserver lies in its connection to the eNMR grid: calculations that take months on a single CPU are finished in several days.
The generation of the models can be divided into three steps:

  1. Generation of a fragment library
  2. Assembly of the protein
  3. Rescoring of the generated models

Generation of fragment library:
Based on the chemical shifts, CS-ROSETTA uses a SPARTA-based selection procedure to select fragments from a fragment library (Molecular Fragment Replacement - MFR). This library consists of fragments whose chemical shifts and torsion angles are both known. The selected fragments form the building blocks in the subsequent ROSETTA assembly step.

Assembly of the protein:
The assembly of the protein utilizes a regular ROSETTA Monte Carlo assembly and relaxation procedure. To be able to make a reliable prediction of the 3D structure of the query protein, 10000-50000 models are generated for each query protein. Note that the starting package is the same for each model. This step takes the most time and is run on the eNMR grid.

Rescoring of the generated models:
The assembled models are re-evaluated by adding a chemical shift term to the all-atom energy score. This term compares the original chemical shifts with the chemical shifts back-calculated from a generated model (using SPARTA). Finally, the "new" lowest-energy model is compared to the other generated models. If the differences between the models, together with the all-atom scores, show convergence, the lowest-energy model is chosen.

Figure 1: A flow chart of the CS-ROSETTA protocol.

How do I request a username and password?
Two steps are required to use the CS-ROSETTA webserver. First, request a personal certificate and register with the enmr.eu VO (see our "access -> grid registration" menu for instructions). Second, register for the CS-ROSETTA webserver to obtain a username and password. Your username and password will be emailed to you.


Step 1: Getting Started

The current webserver requires five pieces of information: a name for your run, a chemical shift list in TALOS format, the number of models you want to generate, an optional list of PDB entries you want to exclude from the fragment selection procedure, and an optional choice to truncate flexible ends.
Give your run a name: The run name is used as the name of the folder in which you can find the results of the calculation. Only alphanumeric characters, "-" and "_" can be used, and the run name has a maximum length of 20 characters.
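The run-name rules above can be checked before submitting the form. The following is a minimal sketch, not the server's own validation: the function name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits and that an empty name is not allowed.

```python
import re

# Rules from the tutorial: letters, digits, "-" and "_" only, at most 20 characters.
# Assumption: ASCII characters only, and at least one character is required.
RUNNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,20}")


def is_valid_runname(name: str) -> bool:
    """Return True if `name` satisfies the run-name rules described above."""
    return RUNNAME_PATTERN.fullmatch(name) is not None


print(is_valid_runname("UvrC_run-1"))   # allowed characters, 10 characters long
print(is_valid_runname("my run"))       # contains a space, so it is rejected
```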
TALOS file: As mentioned above, CS-ROSETTA uses the protein backbone 13CA, 13CB, 13C', 15N, 1HA and 1HN NMR chemical shifts as input to search a structural database for matching fragments. The chemical shifts must be submitted in TALOS format, as defined at http://spin.niddk.nih.gov/NMRPipe/talos/#preparing shifts. See also our Talos webserver tutorial.
  • The file can only contain 13CA, 13CB, 13C', 15N, 1HA and 1HN shifts, named CA, CB, C, N, HA and HN respectively.
  • The HA atoms of glycine form an exception: they are named HA2 and HA3 (assigned arbitrarily, see example below).
  • The protein sequence starts with DATA SEQUENCE, and space characters are ignored. The sequence can be divided over several lines as long as each line starts with DATA SEQUENCE.
  • The file must contain a VARS line with the names of the columns and a FORMAT line which contains the format of the rows.

Below you see an example of the talos format:


  1. Missing chemical shift data are allowed, but the amino acid sequence shown in the header MUST be the full sequence of the protein and MUST start from residue #1, because CS-ROSETTA will generate protein structures with the sequence defined in the header.
  2. The chemical shifts have to be properly referenced. To make sure that your referencing is correct, please go to How do I make sure that my referencing is correct?
  3. Any tags MUST be excluded, because they do not belong to the structure of the protein and may influence it (they also change the numbering and the sequence).
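The format rules listed above can be sketched as a small pre-submission check. This is only an illustration under stated assumptions: it assumes the data columns appear in the order residue number, one-letter residue name, atom name, shift, and the function name `check_talos` is hypothetical. The sequence and shift values in `EXAMPLE` are made up to exercise the checks and are not real data.

```python
ALLOWED_ATOMS = {"CA", "CB", "C", "N", "HA", "HN", "HA2", "HA3"}

# Illustrative input only: the three-residue sequence and all shifts are made up.
EXAMPLE = """\
DATA SEQUENCE MGA
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f

   1 M   CA   55.100
   1 M   CB   32.900
   2 G  HA2    3.950
   2 G  HA3    3.870
   3 A    N  124.200
"""


def check_talos(text):
    """Return (sequence, rows, problems) for a TALOS-style shift table."""
    sequence, rows, problems = "", [], []
    has_vars = has_format = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["DATA", "SEQUENCE"]:
            sequence += "".join(fields[2:])  # spaces in the sequence are ignored
        elif fields[0] == "VARS":
            has_vars = True
        elif fields[0] == "FORMAT":
            has_format = True
        else:
            # Assumed column order: RESID RESNAME ATOMNAME SHIFT
            try:
                rows.append((int(fields[0]), fields[1], fields[2], float(fields[3])))
            except (IndexError, ValueError):
                problems.append(f"unrecognised line: {line!r}")
    if not sequence:
        problems.append("missing DATA SEQUENCE line")
    if not has_vars:
        problems.append("missing VARS line")
    if not has_format:
        problems.append("missing FORMAT line")
    for resid, resname, atom, _shift in rows:
        if atom not in ALLOWED_ATOMS:
            problems.append(f"residue {resid}: atom {atom} is not allowed")
        elif atom == "HA" and resname == "G":
            problems.append(f"residue {resid}: glycine HA must be named HA2 or HA3")
        elif atom in ("HA2", "HA3") and resname != "G":
            problems.append(f"residue {resid}: {atom} is only used for glycine")
        if not 1 <= resid <= len(sequence):
            problems.append(f"residue {resid}: outside the header sequence")
        elif sequence[resid - 1] != resname:
            problems.append(f"residue {resid}: {resname} does not match the header")
    return sequence, rows, problems


sequence, rows, problems = check_talos(EXAMPLE)
print(sequence, len(rows), problems)
```

A check like this catches naming and numbering mistakes only; it says nothing about whether the shifts are correctly referenced.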
Number of Models: In this field you can enter the number of models you want to calculate. The maximum number of models you can generate is determined by your level of access.

Level 1: 15000 models (default)
Level 2: 30000 models (upon request)
Level 3: 50000 models (upon request)
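For a sense of scale, the per-model timing of 5-10 minutes on a single CPU, quoted in this section from the CS-ROSETTA documentation, can be turned into a rough single-CPU estimate for each access level. The sketch below is plain arithmetic; the function name is hypothetical and the estimate ignores fragment selection and rescoring.

```python
def single_cpu_days(n_models, minutes_per_model):
    """Days needed to build n_models one after another on a single CPU."""
    return n_models * minutes_per_model / (60 * 24)


# Maximum model counts for access levels 1, 2 and 3.
for n_models in (15000, 30000, 50000):
    low = single_cpu_days(n_models, 5)
    high = single_cpu_days(n_models, 10)
    print(f"{n_models} models: about {low:.0f} to {high:.0f} days on one CPU")
```

Even the default limit of 15000 models corresponds to roughly two to three months of single-CPU time, which is why the assembly step is run on the grid.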

"By using the current method implemented in CS-ROSETTA package, 5,000 to 20,000 predicted CS-ROSETTA models are generally required to obtain convergence. For small proteins - proteins smaller than 80 aminoacids - 1,000 to 5,000 CS-ROSETTA models often suffice. ROSETTA takes about 5-10 minutes to calculate one all-atom model on a single 2.4GHz CPU." ( from http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html).

RCSB PDB to exclude: It is possible to exclude a protein from the fragment library by supplying its PDB identifier: the 4-letter PDB code and the 1-letter chain ID. If multiple proteins have to be excluded, separate the entries with a comma and a space.
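The exclusion list can be checked with a few lines of code before it is pasted into the form. This sketch assumes that each entry is written as five consecutive characters (the 4-character PDB code directly followed by the chain ID), which the tutorial does not state explicitly. The identifiers used here and the function name `parse_exclusions` are hypothetical.

```python
import re

# Assumption: an entry is the 4-character PDB code immediately followed by
# the 1-character chain ID, for example "1abcA" (a made-up identifier).
ENTRY = re.compile(r"[0-9A-Za-z]{4}[0-9A-Za-z]")


def parse_exclusions(text):
    """Split a 'comma and space' separated list into (pdb_code, chain_id) pairs."""
    entries = []
    if not text.strip():
        return entries  # nothing to exclude
    for item in text.strip().split(", "):
        if not ENTRY.fullmatch(item):
            raise ValueError(f"not a PDB code plus chain ID: {item!r}")
        entries.append((item[:4], item[4]))
    return entries


print(parse_exclusions("1abcA, 2xyzB"))
```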
Truncate flexible ends: The truncate option uses the RCI protocol (Wishart et al., 2005) to determine whether the protein has flexible ends. If flexible ends are identified, they will be truncated. Please tick the box to use this option.


Example of Input

The example of this tutorial is UvrC, a DNA-binding protein of 42 amino acids. Previous studies have shown that the first X and the last C amino acids are flexible, and they were therefore excluded beforehand. The user requested 20000 models. There are no similar proteins in the reference database, and because of the manual intervention truncation is not necessary. The input file can be seen here (and is also available for download at the bottom of this page). Figure 2 shows a snapshot of the filled-out webform.

Figure 2: A snapshot of the filled-out webform for UvrC.



Step 2: Calculations

After successfully submitting the job UvrC, the user receives an email containing a job-specific link. This link can be used first to check the status of the job and later to retrieve the results. The input file is validated and saved as talos.tab.

The first step in the procedure is the generation of the fragment library. The user receives an email after this process has been initialised. NOTE: This process might fail due to referencing problems. To solve referencing problems, go to How do I make sure that my referencing is correct?
The second step is the assembly step on the GRID. The number of generated models can be followed by checking your job-specific link. The page will show "RUNNING on the GRID - generating models with rosetta" and the number of structures generated so far.
In the final step the models are rescored and the output is prepared. The user will receive an email with the job-specific link after the output is finalized. The job-specific link will now direct to a page where the output can be retrieved (see example).



Step 3: Analysis of the results

How to select CS-ROSETTA models:

After finishing CS-ROSETTA structure generation, the user has to decide whether the generated ROSETTA models are acceptable. For this purpose, it is convenient to plot the (re-scored) ROSETTA full-atom energies of all models vs the CA RMSD values relative to the lowest-(rescored)-energy model, using the data stored in the files "name.rms.rescore.txt" and "name.rms.orgener.txt". The plots are saved as "name.rms.rescore.png" and "name.rms.rescore.png"

1. If the 10 lowest-energy models all differ by less than 2 Å CA RMSD from the model with the lowest (re-scored) energy (see plot of protein GB3 below, from http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html ), the structure prediction is deemed successful and the 10 lowest-energy models are accepted.
NOTE: Errors accumulate as protein size increases. For small proteins the 2 Å limit should be applied strictly, but for a protein of 120 amino acids the limit is less strict.

2. If no clustering around low-energy models is observed (see plot of protein nsp1 below, from http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html ), the structure prediction has not converged and the low-energy models cannot be accepted at this stage.
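The acceptance rule in point 1 can be written down as a short check. This is a minimal sketch rather than the server's own analysis: the function name is hypothetical, and the input is a plain list of tuples because the exact layout of the result text files is not described here. The sample rows are the five rescored models listed on the 2png result page in this tutorial, so the check is run with five models instead of ten.

```python
def is_converged(models, n_lowest=10, rmsd_cutoff=2.0):
    """Apply the acceptance rule to a list of models.

    models: list of (name, energy, ca_rmsd) tuples, where ca_rmsd is the
    CA RMSD in angstrom relative to the lowest-energy model.
    """
    if len(models) < n_lowest:
        raise ValueError("fewer models than n_lowest")
    ranked = sorted(models, key=lambda m: m[1])  # most negative energy first
    return all(rmsd < rmsd_cutoff for _name, _energy, rmsd in ranked[:n_lowest])


# Top five rescored models from the 2png result page (name, energy, RMSD).
SAMPLE = [
    ("S_3044_14", -152.78, 0.00),
    ("S_1034_16", -151.93, 0.87),
    ("S_2513_16", -151.51, 0.75),
    ("S_1989_01", -151.45, 0.53),
    ("S_2930_19", -151.31, 0.63),
]

print(is_converged(SAMPLE, n_lowest=5))  # True: every RMSD is below 2.0
```

As the NOTE above says, the 2 Å cutoff is a guideline that can be relaxed for larger proteins, which is why it is left as a parameter.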

The result page of UvrC can be found below. All results are archived together in outdir.tar.gz, which contains the following files:
  • The input file after validation and possibly truncation: talos.tab
  • The selected fragments for MFR: aat000_03_05.200_v1_3 and aat000_09_05.200_v1_3
  • The output file of the ROSETTA step: final.out
  • The top 100 rescored models, named S_****_**.pdb
  • The chemical shift chi2 score: CS_chi2.txt
  • Files with the name, RMSD and respective energy (raw or rescored): name.rms.rawscore.txt and name.rms.rescore.txt
  • Other files, which contain the same information as the files mentioned above but in a different combination. These files are useful for plotting the different variables.


Below you find the output for 2png (E. Ploskon et al., 2008 J.Biol.Chem. 283: 518-528). The convergence for this model is very successful: the plot of the ROSETTA score vs the CA RMSD from the lowest-energy model (S_3044_14) shows a very clear funnel. This indicates that the lowest-energy models are very similar, which in turn makes the model generation very robust for this protein. If the ten lowest-energy models for both the ROSETTA score and the rescored energies are compared to the model deposited in the PDB database, the following results are obtained.

Rescored                       Original
Name       RMSD from NMR       Name       RMSD from NMR
           structure (Å)                  structure (Å)
S_3044_14  2.15                S_3044_14  2.15
S_1034_16  2.29                S_2513_16  2.17
S_2513_16  2.17                S_1034_16  2.29
S_1989_01  2.34                S_1989_01  2.34
S_2930_19  2.33                S_3311_17  2.36
S_3361_11  2.37                S_3361_11  2.37
S_2714_20  2.38                S_1387_02  2.54
S_3311_17  2.36                S_3109_14  2.24
S_1387_02  2.54                S_2714_20  2.38
S_1457_01  2.40                S_2930_19  2.33
Average    2.33                Average    2.32

Table 1: RMSD from NMR-solved structure

Table 1 shows that the models are very close to the NMR structure, so the protocol proved reliable for this protein as well. The rescoring step did not make much difference to the selection of models.
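The comparison in Table 1 can be reproduced with a few lines of code. The sketch below simply re-enters the values from the table and counts how many models appear in both selections; the variable names are illustrative.

```python
from statistics import mean

# RMSD from the NMR structure, copied from Table 1.
RESCORED = {
    "S_3044_14": 2.15, "S_1034_16": 2.29, "S_2513_16": 2.17, "S_1989_01": 2.34,
    "S_2930_19": 2.33, "S_3361_11": 2.37, "S_2714_20": 2.38, "S_3311_17": 2.36,
    "S_1387_02": 2.54, "S_1457_01": 2.40,
}
ORIGINAL = {
    "S_3044_14": 2.15, "S_2513_16": 2.17, "S_1034_16": 2.29, "S_1989_01": 2.34,
    "S_3311_17": 2.36, "S_3361_11": 2.37, "S_1387_02": 2.54, "S_3109_14": 2.24,
    "S_2714_20": 2.38, "S_2930_19": 2.33,
}

shared = sorted(set(RESCORED) & set(ORIGINAL))
print(len(shared), "of 10 models appear in both selections")
print("mean RMSD, rescored:", round(mean(list(RESCORED.values())), 2))
print("mean RMSD, original:", round(mean(list(ORIGINAL.values())), 2))
```

Nine of the ten models are selected by both rankings, which is the sense in which rescoring changed little for this protein.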

(Note that links in blue in the following result page are not active in this tutorial)
CS ROSETTA status for 2png
2012-04-20 11:25:59
The current status of your request is: FINISHED
Your CS ROSETTA run has successfully completed. The complete run can be downloaded as a gzipped tar file here

In total there are 51861 models generated
(requested: 50000).
BEST MODEL: Rescored Energy -152.78
BEST MODEL: Original Energy (original rank) -154.83 (1)
The first table shows the top five models after rescoring, the rescored energy, the original rank, the Chi score and the difference from the best rescored model (RMSD).
The second table shows the top five models before rescoring, their energy term, the rescored rank, Chi score and the difference from the best original score model (RMSD).
View the ten models in a Jmol structure viewer.
Your browser must be Java enabled:
Name       Score    Orig Rank  Chi    RMSD
S_3044_14  -152.78  1          8.20   0.00  View Download
S_1034_16  -151.93  3          7.43   0.87  View Download
S_2513_16  -151.51  2          10.30  0.75  View Download
S_1989_01  -151.45  4          9.09   0.53  View Download
S_2930_19  -151.31  10         5.67   0.63  View Download

Name       Score    Resc Rank  Chi    RMSD
S_3044_14  -154.83  1          8.20   0.00  View Download
S_2513_16  -154.09  3          10.30  0.75  View Download
S_1034_16  -153.79  2          7.43   0.87  View Download
S_1989_01  -153.72  4          9.09   0.53  View Download
S_3311_17  -153.22  8          9.40   0.59  View Download
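The two tables above also illustrate how the chemical shift term enters the rescored energy. For the four models that appear in both tables, the reported numbers are consistent with adding a fixed fraction of the Chi score to the original energy. The linear form and the weight of 0.25 in the sketch below are assumptions inferred from these sample numbers; the tutorial itself does not state the formula.

```python
# Assumption: rescored energy = original energy + CS_WEIGHT * Chi.
# The weight is inferred from the sample results, not documented here.
CS_WEIGHT = 0.25

# (name, original energy, Chi score, reported rescored energy)
MODELS = [
    ("S_3044_14", -154.83, 8.20, -152.78),
    ("S_1034_16", -153.79, 7.43, -151.93),
    ("S_2513_16", -154.09, 10.30, -151.51),
    ("S_1989_01", -153.72, 9.09, -151.45),
]


def rescore(original_energy, chi, weight=CS_WEIGHT):
    """Add the weighted chemical shift term to the all-atom energy."""
    return original_energy + weight * chi


for name, original, chi, reported in MODELS:
    difference = abs(rescore(original, chi) - reported)
    print(f"{name}: differs from the reported value by {difference:.3f}")
```

Each difference stays within the rounding of the two-decimal values shown on the result page.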



Frequently asked questions

How do I make sure that my referencing is correct?
If you are not sure whether you have used the correct referencing, please convert your input file to BMRB format. The RCI server can re-reference your chemical shifts. Go to http://www.wenmr.eu/wenmr/rci and upload your BMRB file.
In the advanced options select "6) Correct referencing of chemical shifts - Yes"
In the following screen select: Other files - Rereferenced chemical shifts
In the file you will notice the offset of the chemical shifts. Save this file, convert it to TALOS format, and submit it to the CS-ROSETTA webserver.


Cite WeNMR/WestLife

Usage of the WeNMR/WestLife portals should be acknowledged in any publication:
"The FP7 WeNMR (project# 261572), H2020 West-Life (project# 675858) and the EOSC-hub (project# 777536) European e-Infrastructure projects are acknowledged for the use of their web portals, which make use of the EGI infrastructure with the dedicated support of CESNET-MetaCloud, INFN-PADOVA, NCG-INGRID-PT, TW-NCHC, SURFsara and NIKHEF, and the additional support of the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, Taiwan and the US Open Science Grid."
And the following article describing the WeNMR portals should be cited:
Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid. J. Grid. Comp., 10:743-767.


The WeNMR Virtual Research Community has been the first to be officially recognized by the EGI.

European Union

WeNMR is an e-Infrastructure project funded under the 7th framework of the EU. Contract no. 261572

WestLife, the follow up project of WeNMR is a Virtual Research Environment e-Infrastructure project funded under Horizon 2020. Contract no. 675858