Please email all grid engine users at ibic-gridengine@u.washington.edu to reserve the "IBIC-neuron", "IBIC-hp", or "IBIC-SunFire" queues for long-running jobs.
Available Clusters
Submitting jobs
There are two ways in which you can exploit the power of the IBIC-neuron cluster:
(1) You can use software that has already been parallelized and is ready to run on the Sun GridEngine (SGE) - the software that handles dispatching the jobs and running them when machines are free, and
(2) You can submit your own jobs to the SGE. Several long-running neuroimaging packages (e.g., FMRIB's FSL, ANTS, and TractoR) come SGE-enabled and can be run very easily on the SGE. Other packages, like FreeSurfer, require that you write some scripts. Examples of running FSL software and FreeSurfer on the SGE are provided below. You can also use the SGE to automatically parallelize a makefile if you have structured your workflow that way. Instructions for doing this are at the end of this tutorial.
NOTE that you need to be logged into a submit host for the particular cluster where you will run gridengine jobs in order to execute the various q* commands (qmon, qsub, ..., qmake).
Basics
First, use X2go, NX Client, or ssh -X to log on to an appropriate cluster workstation.
Second, ensure that your environment variables are set correctly so that SGE commands will work. All of these should be set by default, but if they are not, nothing will work correctly. Execute the following to check:
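A minimal form of this check, assuming a bash-compatible shell, is to filter your environment for the SGE settings:

env | grep SGE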
You should see the SGE variables among the lines returned. In addition, /opt/SGE/sge6_2u5/bin/UNSUPPORTED-lx3.2.0-4-amd64-amd64 should appear in your PATH variable.
File locations
For the SGE to distribute jobs across multiple workstations, it has to be able to find each file at the same path on every machine. You should put all data that you want to process on the SGE in either /mnt/home or /project_space (a subdirectory is fine). These directories are shared across workstations. In contrast, /tmp, /usr/tmp, /var, and /scratch are not shared between workstations, and you will probably experience unpredictable failures if you try to access files in those locations from scripts or software that you run on the SGE.
Using FSL on SGE
Almost all FSL commands that have multiple parts to them (e.g. feat, tbss) are written to take advantage of the SGE. However, by default this capability is disabled for debugging purposes, so that you can interactively run FSL commands at the terminal and make sure that your output is as expected. To enable it, type the following command:
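FSL keys this off the FSLPARALLEL environment variable (mentioned again in the qmake section below); in a bash shell, a typical form is:

export FSLPARALLEL=true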
Then, when you run an SGE-enabled FSL command such as feat, it will give you the prompt back right away because it has started the jobs on the SGE.
Using the SGE for other things
Some popular long-running software, e.g. FreeSurfer, does not have built-in support for the SGE. However, submitting jobs by hand is easy: you write a small shell script for each job. For example, to run recon-all on a subject with FreeSurfer, an example script might look like the one below (except that I would replace SUBJECTID with an actual subject identifier):
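A minimal sketch of such a script; the SUBJECTS_DIR path is hypothetical, and it assumes FreeSurfer is already configured in your environment and that the subject's input data has already been imported:

#!/bin/sh
# Hypothetical shared output directory; use a path under /mnt/home or /project_space
export SUBJECTS_DIR=/project_space/mystudy/subjects
# Run the full FreeSurfer reconstruction stream for one subject
recon-all -s SUBJECTID -all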
If I name this script job1.sh and edit it so that SUBJECTID is the actual identifier, I can submit it to the gridengine with the following command:
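A plausible form of the submission, using the -cwd and -V flags explained in the qmake section below (your site may need different options):

qsub -cwd -V job1.sh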
However, in practice, if you have hundreds of subjects, it would be a pain to use an editor to create a script for each one. If you create a single template script like the one above (called 'template.sh'), using SUBJECTID as the placeholder for the actual subject numbers, which you have listed in a file called 'mysubjects.txt', you can do the following to submit a lot of jobs:
If you want to test this out before actually submitting the jobs, you can generate all the scripts by using the -g option (Generate only) to sls_submit_parallel_job.
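If the helper script is unavailable, a plain shell loop achieves the same result; this is only a sketch and does not reproduce the sls_submit_parallel_job syntax:

for subject in $(cat mysubjects.txt); do
    # Substitute the placeholder and write a per-subject script
    sed "s/SUBJECTID/${subject}/g" template.sh > job_${subject}.sh
    qsub -cwd -V job_${subject}.sh
done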
Monitoring progress
Now that you have submitted your jobs you might want to see where they are running. The easiest way to do this is to bring up the graphical user interface to SGE, called qmon:
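From a submit host, run it in the background so that your terminal stays free:

qmon &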
This pops up a little menu of buttons, as shown below.
The upper-leftmost button is called Job Control, and it opens a panel that allows you to see your Pending Jobs, Running Jobs, and Finished Jobs. Pending Jobs have been submitted to the SGE but have not yet been scheduled to run on one of the machines in the cluster. Running Jobs are actually running, and Finished Jobs are completed. By watching your jobs move from Pending to Running and then to Finished, you can make sure that there are no errors in your scripts or processing that cause them to fail. Note that if there are a lot of Pending jobs ahead of yours, you might need to wait a while until your job is scheduled.
Another fun thing to look at is the SGE utilization. If you click on the second button on the top left of the qmon panel, you can see the Cluster Queue Control panel. The third tab in this panel, called Hosts, lists the hosts that are in the SGE, along with the numbers of CPUs, average load, memory utilization, and CPU utilization.
Error State “E” does not go away automatically
One big message to impart is that E states are persistent and never go away on their own (unlike many SGE queue and job states, which clear automatically). State “E” will persist through hardware reboots and Grid Engine restarts. The state has to be cleared manually by a Grid Engine administrator. Again, the reason for this is that SGE wants a human to investigate the root cause first, in case a “black hole” node is quietly failing every job sent to it.
If you think this was a transient problem, you can clear the queues and see what happens with your pending jobs; the command is “qmod -c (queue instance)”.
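For example, to clear a single queue instance (the queue and host names here are hypothetical):

qmod -c all.q@node01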
To globally clear all E states in your SGE cluster:
qmod -c '*'
Troubleshooting and Diagnosing
- Run qstat -explain E to show why a queue instance is in the error state (see the example commands after this list)
- Examine the node itself and OS logs with an eye towards entries relating to permissions, failures or access errors
- Try to login to the node in question using a username associated with a failed job. This will help diagnose any username, authentication or access issues
- Look in the job output directory if it is available. Output from failed jobs can be extremely useful, especially if there is a path, ENV or permission problem
- Examine the SGE logs, with particular focus on the messages file created by sge_execd on the execution host in question
- If all else fails, SGE daemons will write log files to /tmp when they can’t write to their normal spool location. Seeing recent SGE event data in /tmp instead of your normal spool location is a good indication of filesystem or permission errors
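For reference, a couple of these steps in concrete form; the spool path assumes the default cell name and a hypothetical host:

# Show full queue status and the reason for any E state
qstat -f -explain E
# Inspect the execution daemon's messages file for that host
less $SGE_ROOT/default/spool/node01/messages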
Using the SGE to run a parallel make workflow
The SGE can automatically parallelize jobs that are started by a makefile. This is a useful way to structure your workflows, because you can run the same neuroimaging code on a single core, a multicore machine, or the SGE simply by changing your command line. This section assumes that you are familiar with make.
The variant of make that runs on the SGE is called qmake. If you are using make in parallel, you will probably want to turn off FSLPARALLEL if you have enabled it by default.
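For example, to turn it off in the current bash shell:

unset FSLPARALLEL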
There are two ways that you can execute qmake, giving you a lot of flexibility. The first is by submitting jobs dynamically, so that each one goes into the job queue just like a mini shell script of the type described above. To do this, type
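A sketch matching the flags described below, running 20 jobs at a time against a makefile target named all:

qmake -cwd -V -- -j 20 all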
The format of this command is as follows. The flags that appear before the -- are flags to qmake and control gridengine parameters. The -cwd flag means to start gridengine jobs from the current directory (useful!), and -V tells it to pass all your environment variables along. If you forget the -V, I promise you that very bad things will happen: for example, FSL will crash because it can't find its shared libraries, and many programs will "not be found" because your path is not set correctly. On the other side of the -- are flags to make. By default, just like normal make, this will start exactly one job at a time. This is not very useful! You probably want to specify how much parallelism you want by using the -j flag to make (how many jobs to start at any one time). The example above runs 20 jobs simultaneously. The last argument, "all", is a make target that depends on the particular makefile used.
One drawback of executing jobs dynamically is that make might never get enough compute resources to finish. For this reason, there is also a parallel environment for make that reserves some number of processors for the make process and then manages them itself. You can specify your needs for this environment by typing
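A sketch matching the description below, reserving 10 slots in the make parallel environment and building a freesurfer target:

qmake -cwd -V -pe make 10 -- freesurfer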
This command uses the -pe flag to specify the parallel environment called make, and reserves 10 nodes in this environment. The argument to make is "freesurfer" in this example.