A worldwide e-Infrastructure for NMR and structural biology

Automatic test of enmr.eu jobs on all supporting CEs.

Introduction

Grid sites supporting the enmr.eu VO sometimes get misconfigured. Examples include: problems with one or both enmr.eu VOMS servers, problems with the partition where the enmr.eu software is stored, problems with the list of software tags, problems with some configuration variables, etc.

In order to detect, log and report problems, we developed a series of bash scripts that automatically submit jobs to the grid and automatically retrieve the results. It then becomes easy to quickly inspect the results, detect problems, and get some information about them. This should be enough to open a GGUS ticket.

List of scripts

All the following scripts are stored in the directory 1.check-status_TEMPLATE.

1.generate-jdls.sh  

This script uses the grid command 'lcg-infosites --vo enmr.eu ce -v 1' to extract all CEs supporting enmr.eu. It then generates, for each CE, a jdl file targeted specifically at this CE. All the jdl files are stored in the directory JDL (created automatically; it overwrites a previous JDL directory if one exists). Only the CEs that are available at the time the script is launched will be tested further.
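The per-CE generation step described above can be sketched as follows. This is a minimal illustration, not the actual content of 1.generate-jdls.sh: the function name generate_jdls, the JDL attribute values, and the file-naming scheme are assumptions, and the function expects one CE identifier per line on standard input.

```shell
#!/bin/bash
# Sketch: write one JDL file per CE identifier read from standard input.
# The attribute values and the file naming are illustrative assumptions.
generate_jdls() {
    local outdir="${1:-JDL}"
    local ce name
    rm -rf "$outdir"      # overwrite any previous JDL directory
    mkdir -p "$outdir"
    while read -r ce; do
        [ -z "$ce" ] && continue
        # Turn the CE identifier into a safe file name (':' and '/' become '_').
        name="$(printf '%s' "$ce" | tr '/:' '__')"
        cat > "$outdir/$name.jdl" <<EOF
Executable    = "status.sh";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"FILE/status.sh"};
OutputSandbox = {"std.out", "std.err"};
Requirements  = other.GlueCEUniqueID == "$ce";
EOF
    done
}
```

Assuming the lcg-infosites output contains exactly one CE identifier per line, it could be piped straight in: lcg-infosites --vo enmr.eu ce -v 1 | generate_jdls JDL. If the output carries header lines or extra columns, filter them out first.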

2.submit-jdls.sh  

This script looks at all jdl files contained in the directory JDL and submits them. The output message of each submission is saved in the directory SUB/, one file per CE. Before submission, this script generates a new proxy. To avoid entering a password, one can use a robot proxy.
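A submission loop of this kind might look like the sketch below. It is not the actual 2.submit-jdls.sh: the function name submit_jdls, the .sub file suffix, and the SUBMIT_CMD override (added so the loop can be dry-run without grid access) are assumptions. Proxy creation is left out.

```shell
#!/bin/bash
# Sketch: submit every JDL file and keep the submission message per CE.
# SUBMIT_CMD is a hypothetical hook; by default the gLite WMS submission
# command is used with automatic proxy delegation (-a).
submit_jdls() {
    local jdldir="${1:-JDL}" subdir="${2:-SUB}"
    local submit="${SUBMIT_CMD:-glite-wms-job-submit -a}"
    local jdl name
    mkdir -p "$subdir"
    for jdl in "$jdldir"/*.jdl; do
        [ -e "$jdl" ] || continue          # no JDL files at all
        name="$(basename "$jdl" .jdl)"
        # $submit is left unquoted on purpose so its options are split into words.
        $submit "$jdl" > "$subdir/$name.sub" 2>&1 \
            || echo "submission failed for $name" >&2
    done
}
```

A dry run that only records what would be submitted: SUBMIT_CMD="echo would-submit" submit_jdls JDL SUB.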

3.check-status.sh

This script looks into the directory SUB for all submitted jobs and queries the status of each job. The output is stored in the directory STATUS, one file per CE. It overwrites the previous STATUS directory if it already exists. The script that is executed on the worker node is FILE/status.sh; it contains some basic tests. Before checking the status, 3.check-status.sh generates a new proxy. To avoid entering a password, one can use a robot proxy.
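The status query can be sketched as below, under the assumption that each submission message contains the job identifier as an https:// URL. The function names, the -status file suffix, and the STATUS_CMD override are illustrative, not taken from 3.check-status.sh.

```shell
#!/bin/bash
# Sketch: pull the job identifier out of each submission message and
# store the status output per CE.

# Print the first https:// URL found in a file (assumed to be the job ID).
extract_job_id() {
    grep -o 'https://[^[:space:]]*' "$1" | head -n 1
}

check_status() {
    local subdir="${1:-SUB}" statusdir="${2:-STATUS}"
    # STATUS_CMD is a hypothetical hook for testing without grid access.
    local status_cmd="${STATUS_CMD:-glite-wms-job-status}"
    local sub name jobid
    rm -rf "$statusdir"       # overwrite the previous STATUS directory
    mkdir -p "$statusdir"
    for sub in "$subdir"/*.sub; do
        [ -e "$sub" ] || continue
        name="$(basename "$sub" .sub)"
        jobid="$(extract_job_id "$sub")"
        if [ -z "$jobid" ]; then
            echo "no job identifier found" > "$statusdir/$name-status"
            continue
        fi
        $status_cmd "$jobid" > "$statusdir/$name-status" 2>&1
    done
}
```

Recording a placeholder when no identifier is found keeps failed submissions visible in STATUS instead of silently dropping them.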

4.output-if-success.sh  

This script looks into the STATUS directory for all the jobs that have the status "Success", and retrieves their output into the OUT directory. Before getting the output, the script generates a new proxy. To avoid entering a password, one can use a robot proxy.

5.output-if-exit-code-zero.sh 

This script looks into the STATUS directory for all the jobs with the status "Exit code != 0" and retrieves their output into the OUT_EXIT_CODE_WRONG directory. This happens when the job was executed on the worker node but some problem occurred. Most likely, the enmr.eu software directory was not defined or mounted. Before getting the output, the script generates a new proxy. To avoid entering a password, one can use a robot proxy.
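Scripts 4 and 5 differ only in the status they select and the directory they write to, so the shared logic could be sketched as one function. The name retrieve_by_status, the -status file suffix, and the OUTPUT_CMD override are assumptions. The pattern strings are the ones quoted in the text above, and the exact wording printed by the status command may differ.

```shell
#!/bin/bash
# Sketch: retrieve the output of every job whose status file contains a
# given fixed string. Not the actual implementation of scripts 4 and 5.
retrieve_by_status() {
    local pattern="$1" outdir="$2" statusdir="${3:-STATUS}"
    # OUTPUT_CMD is a hypothetical hook for testing without grid access.
    local output_cmd="${OUTPUT_CMD:-glite-wms-job-output}"
    local f name jobid
    mkdir -p "$outdir"
    for f in "$statusdir"/*-status; do
        [ -e "$f" ] || continue
        grep -qF -- "$pattern" "$f" || continue    # status does not match
        name="$(basename "$f" -status)"
        jobid="$(grep -o 'https://[^[:space:]]*' "$f" | head -n 1)"
        [ -n "$jobid" ] || continue
        # One sub-directory per CE so outputs do not overwrite each other.
        $output_cmd --dir "$outdir/$name" "$jobid"
    done
}
```

Usage would then be retrieve_by_status "Success" OUT for script 4 and retrieve_by_status "Exit code != 0" OUT_EXIT_CODE_WRONG for script 5.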

6.get_logging_info.sh 

This script looks into the STATUS directory for all the jobs. For each of them, it executes the command 'glite-wms-job-logging-info -v 3 JOBID' and stores the result in the directory LOG. If the job was aborted, the extension of the file is '.Aborted'. Before getting the log info, the script generates a new proxy. To avoid entering a password, one can use a robot proxy.
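The '.Aborted' naming rule can be sketched as follows. The function name get_logging_info, the -status suffix of the status files, and the LOG_CMD override are assumptions. The sketch also assumes that the status file of an aborted job contains the word "Aborted".

```shell
#!/bin/bash
# Sketch: store the logging info of every job in LOG, adding the
# '.Aborted' extension when the status file mentions an aborted job.
get_logging_info() {
    local statusdir="${1:-STATUS}" logdir="${2:-LOG}"
    # LOG_CMD is a hypothetical hook for testing without grid access.
    local log_cmd="${LOG_CMD:-glite-wms-job-logging-info -v 3}"
    local f jobid target
    mkdir -p "$logdir"
    for f in "$statusdir"/*-status; do
        [ -e "$f" ] || continue
        jobid="$(grep -o 'https://[^[:space:]]*' "$f" | head -n 1)"
        [ -n "$jobid" ] || continue
        target="$logdir/$(basename "$f")"
        if grep -q 'Aborted' "$f"; then
            target="$target.Aborted"
        fi
        $log_cmd "$jobid" > "$target" 2>&1
    done
}
```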

Automatic submission

In order to facilitate the submission of jobs and the retrieval of results, two master scripts have been written.

launch_test.sh

This script copies the directory 1.check-status_TEMPLATE into a newly created directory named like 1.check-status_20120201-01.01.01/ (YYYYMMDD-hh.mm.ss). It then calls the scripts 1.generate-jdls.sh and 2.submit-jdls.sh from that directory. Finally, it creates an empty file 1.check-status_20120201-01.01.01.RUNNING in the working directory.
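The three steps (timestamped copy, running the first two scripts, creating the marker) can be sketched as below. This is an illustration of the described behaviour, not the shipped launch_test.sh. The function name launch_test is an assumption, and the scripts are invoked through bash so that they need not be executable.

```shell
#!/bin/bash
# Sketch: copy the template into a timestamped run directory, run the
# generation and submission scripts there, then leave a .RUNNING marker.
launch_test() {
    local template="${1:-1.check-status_TEMPLATE}"
    local stamp run_dir
    stamp="$(date +%Y%m%d-%H.%M.%S)"      # YYYYMMDD-hh.mm.ss
    run_dir="1.check-status_$stamp"
    cp -r "$template" "$run_dir" || return 1
    (
        # Subshell: the caller's working directory is left untouched.
        cd "$run_dir" || exit 1
        bash ./1.generate-jdls.sh && bash ./2.submit-jdls.sh
    )
    touch "$run_dir.RUNNING"              # marker picked up by the retrieval script
    echo "$run_dir"                       # report the run directory name
}
```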

retreive_test.sh

This script looks for all files named like 1.check-status_20120201-01.01.01.RUNNING and, for each of them, goes into the corresponding directory and launches the scripts 3.check-status.sh, 4.output-if-success.sh, 5.output-if-exit-code-zero.sh and 6.get_logging_info.sh. The script then removes the .RUNNING file.
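The marker-driven retrieval can be sketched as below. The function name retrieve_tests is an assumption and the real retreive_test.sh may differ in detail. The scripts are invoked through bash so that they need not be executable.

```shell
#!/bin/bash
# Sketch: for every .RUNNING marker, run the four retrieval scripts in
# the matching run directory, then remove the marker.
retrieve_tests() {
    local marker run_dir script
    for marker in 1.check-status_*.RUNNING; do
        [ -e "$marker" ] || continue       # no pending runs
        run_dir="${marker%.RUNNING}"       # strip the marker suffix
        if [ -d "$run_dir" ]; then
            (
                cd "$run_dir" || exit 1
                for script in 3.check-status.sh 4.output-if-success.sh \
                              5.output-if-exit-code-zero.sh 6.get_logging_info.sh; do
                    bash "./$script"
                done
            )
        fi
        rm -f "$marker"
    done
}
```

Because the marker is removed after a single pass, each run is collected only once, so jobs that are still running at that time are not checked again.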

cron jobs

The following is an example of a crontab used to submit the jobs every day at 01:01, 02:01 and 03:01 and to retrieve the results at 00:01 and 00:31.

01 1,2,3 * * * source ~/.bash_profile; cd /home/christophe/DEPLOY-GRID/ ; ./launch_test.sh   >> /home/christophe/DEPLOY-GRID/cron_launch.log   2>&1
01,31 0 * * *     source ~/.bash_profile; cd /home/christophe/DEPLOY-GRID/ ; ./retreive_test.sh >> /home/christophe/DEPLOY-GRID/cron_retreive.log 2>&1
#0,5,10,15,20,25,30,35,40,45,50,55 11,12 * * * source ~/.bash_profile; cd /home/christophe/DEPLOY-GRID/ ; ./launch_test.sh   >> /home/christophe/DEPLOY-GRID/cron_launch.log   2>&1
 
The last line is just a convenient way to launch a series of tests over a couple of hours (uncomment and modify it as needed, then comment it out again once finished).

Practical usage

This is a list of practical commands to extract some info from the test directories:
 
ls 1.check-status_2012*/LOG/*.Aborted
    lists all Aborted jobs.

ls 1.check-status_2012*/LOG/*.Aborted | awk -F "LOG" '{print $2}' | sort
    lists all Aborted jobs ordered by CE.

ls 1.check-status_20120131-*/LOG/*ufrj*
    lists all jobs (Aborted or not) from a specific site (ufrj in this case).

grep Reason 1.check-status_20120131-12.25.01/LOG/ce01.eela.if.ufrj.br-status.Aborted
    gives some important information about why this job was aborted. It is usually important to post this output when opening a GGUS ticket.

ls 1.check-status_2012*/OUT_EXIT_CODE_WRONG/*
    lists the jobs that finished with an exit code != 0.
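Building on the commands above, a small helper could count how often each CE appears among the Aborted jobs across all test directories, which helps to spot sites that fail repeatedly. The function name count_aborted_by_ce is hypothetical and not part of the distributed scripts.

```shell
#!/bin/bash
# Sketch: count Aborted log files per CE over all test directories,
# most frequently aborted CE first.
count_aborted_by_ce() {
    local f
    for f in 1.check-status_*/LOG/*.Aborted; do
        [ -e "$f" ] || continue            # no aborted jobs at all
        basename "$f" .Aborted             # keep only the per-CE file name
    done | sort | uniq -c | sort -rn
}
```

Each output line holds a count followed by the log file name without its '.Aborted' extension.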

Conclusion

You can download all the necessary files in the archive test_grid.tgz.

 


