Sometimes you need to run many nearly identical jobs simultaneously, for example running the same program many times while varying the input data or some argument or parameter. One possible solution is to write a script that generates all the qsub files and then a Bash script to submit them, but this is time consuming and can end up submitting many more jobs to the queue than you actually need. This is a typical problem suited to an SGE task array.
Advantages of Array Jobs:
- You only need to submit one job to run a series of very similar tasks.
- The tasks are independent and do not all need to run at once, so the job scheduler can efficiently run one or more queued tasks as the requested computational resources become available.
- They are particularly useful for embarrassingly parallel problems such as:
  - Monte Carlo simulations (where $SGE_TASK_ID might correspond to the random number seed);
  - Parameter sensitivity analysis;
  - Batch file processing (where $SGE_TASK_ID might refer to a file in a list of files to be processed).
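The uses of $SGE_TASK_ID listed above can be sketched as follows. This is a minimal illustration, not a complete job script: the program names are hypothetical, and since SGE only sets $SGE_TASK_ID inside a running task, a default value is supplied so the snippet can be tried outside the scheduler.

```shell
#!/bin/bash
# Outside SGE the variable is unset; default to 1 for local testing.
SGE_TASK_ID=${SGE_TASK_ID:-1}

# Monte Carlo: use the task ID directly as the random number seed
# (my_simulation is a hypothetical program):
# ./my_simulation --seed "$SGE_TASK_ID"

# Batch file processing: task N processes the Nth file in a list
# (files.txt and process are hypothetical):
# file=$(sed -n "${SGE_TASK_ID}p" files.txt)
# ./process "$file"

echo "task $SGE_TASK_ID"
```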
Example:
Create a job submission script called my_array_job.sh as shown below. In this example, myprog runs 72 times, once for each entry in your subject list, after you submit with:
$ qsub -v subjects=subjects.txt my_array_job.sh
#$ -S /bin/bash
#$ -N my_array_job # give a meaningful name for the run, like ADRC QC, etc.
#$ -V
#$ -pe mpi 4 # request 4 slots in the mpi parallel environment for each task
#$ -t 1-72 # environment variable $SGE_TASK_ID which will range from 1 to 72
#$ -cwd
#$ -o $HOME/sgestdout # directory where standard output is saved
#$ -e $HOME/sgestderr # directory where standard error is saved
#$ -q global.q
# variables
DIR=/Project/To/Data/
subject=$(sed -n "${SGE_TASK_ID}p" ${DIR}/${subjects})
./myprog ${subject}
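The key line is the sed command, which maps each task to one line of the subject list: task N reads line N of subjects.txt. You can check this mapping locally, outside SGE, by setting $SGE_TASK_ID by hand (the subject names below are made up for illustration):

```shell
#!/bin/bash
# Create a small subject list (hypothetical names, for testing only).
printf 'sub-01\nsub-02\nsub-03\n' > subjects.txt

# SGE sets this automatically for each task; set it manually here.
SGE_TASK_ID=2

# Same extraction as in my_array_job.sh: pick line N of the list.
subject=$(sed -n "${SGE_TASK_ID}p" subjects.txt)
echo "$subject"   # prints sub-02

rm subjects.txt
```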