Difference between revisions of "Using Gotcloud on Flux"

From Genome Analysis Wiki
Revision as of 02:32, 8 January 2016

Running Gotcloud on Flux

First, configure GotCloud like you would anywhere else

1. Install GotCloud somewhere as instructed here.

2. Get access to the reference files from someone else, or download them as instructed here.

3. Make a configuration file as usual for your analysis.

4. Include the line BATCH_TYPE = pbs in that configuration file.

Next, prepare to use the Flux/Torque/PBS cluster

5. Run gotcloud with zero jobs to generate a Makefile.

/path/to/gotcloud/gotcloud snpcall --conf /path/to/configuration.conf --numjobs 0
  • The newly generated Makefile will be located in the directory OUT_DIR that is specified in your configuration file. It will be named umake.snpcall.Makefile.

6. Make a new folder, where you'll run your jobs from.

7. Make an email address to send your jobs' status to. It'll be hit by hundreds or thousands of emails, so I recommend that you don't use your main email address here.

  • If you're in a hurry to finish your pipeline, you can find an email address that will text the emails to your phone. Only use that in the second script (the one from step 12), though, not in pbs.options!

8. Figure out the name of the Flux account that you're going to use. You can see which Flux accounts you have access to by running mdiag -u $USER and looking at the list after ALIST.

  • E.g., sph_flux

9. Figure out how many processors you're going to use at once. Run mdiag -a YOUR_FLUX_ACCOUNT. I recommend running MAXPROC + MAXIJOB[USER] jobs at once. MAXPROC is the number of processors on your account, and MAXIJOB[USER] is the number of jobs that can sit idle in the queue waiting to be run (often 20).

  • This number will usually be between 20 and 1000.
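The arithmetic in step 9 can be sketched as below. The values 100 and 20 are placeholders chosen for illustration, not real account limits; substitute the numbers that mdiag -a YOUR_FLUX_ACCOUNT reports for your account.

```shell
#!/bin/sh
# Hypothetical values; read the real ones from `mdiag -a YOUR_FLUX_ACCOUNT`.
MAXPROC=100   # assumed: number of processors on the account
MAXIJOB=20    # assumed: MAXIJOB[USER], idle jobs allowed to wait in the queue

# Jobs that can run at once plus jobs that may sit idle in the queue.
NUMBER_OF_JOBS=$((MAXPROC + MAXIJOB))
echo "$NUMBER_OF_JOBS"
```

The result is the number to pass to make -j in step 12.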

10. Figure out which steps to run first. The steps go in the order glfN, vcfN, pvcfN, filtN, svmN, splitN, allN, where N is the name of a chromosome (i.e., 1-22 and maybe X and Y). If you skip a step, it's not a problem, because make will run it for you. If you're confident that everything will work beautifully, you can go straight to the step allN (or just all as a shortcut).

  • For example, I used glf1 glf2 glf3 glf4 glf5 glf6 glf7 glf8 glf9 glf10 glf11 glf12 glf13 glf14 glf15 glf16 glf17 glf18 glf19 glf20 glf21 glf22 the first time I ran on Flux. Then I ran vcf1 vcf2..., and on down the list until finally all.
  • Feel free to use the script perl -e 'print "glf$_ " for 1..22' to mitigate repetitive strain injuries.
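If you'd rather stay in plain sh than use perl, the same target list can be built with a small function. This is only a sketch: step_targets is a hypothetical helper name, and it covers chromosomes 1 to 22 only, so add X and Y by hand if you need them.

```shell
#!/bin/sh
# Print the make targets for one step across chromosomes 1..22.
# $1 is a step prefix from step 10 (glf, vcf, pvcf, filt, svm, split).
step_targets() {
    i=1
    out=""
    while [ "$i" -le 22 ]; do
        out="$out$1$i "
        i=$((i + 1))
    done
    # Drop the trailing space before printing.
    printf '%s\n' "${out% }"
}

step_targets glf
```

Calling step_targets vcf, step_targets pvcf, and so on gives the target list for each later step.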

11. Inside that new folder, make a new file named pbs.options that contains the following:

#PBS -l nodes=1:ppn=1,walltime=10:00:00,pmem=4gb,qos=YOUR_FLUX_ACCOUNT_FROM_STEP_8
#PBS -d .
#PBS -m abe
#PBS -M YOUR_EMAIL_FROM_STEP_7
#PBS -q flux
#PBS -l qos=flux
#PBS -A YOUR_FLUX_ACCOUNT_FROM_STEP_8
#PBS -V
#PBS -j oe

12. Still inside that folder, create a script that you will submit to Flux. Let's name it script_thats_in_charge.sh. It should look like this:

#!/bin/sh

#PBS -l nodes=1:ppn=4,walltime=150:00:00,pmem=4gb,qos=YOUR_FLUX_ACCOUNT_FROM_STEP_8
#PBS -d .
#PBS -m abe
#PBS -M YOUR_EMAIL_FROM_STEP_7
#PBS -q flux
#PBS -l qos=flux
#PBS -A YOUR_FLUX_ACCOUNT_FROM_STEP_8
#PBS -V
#PBS -j oe
#PBS -N SOME_ARBITRARY_NAME_FOR_THIS_JOB

make -w --warn-undefined-variables -k -f /path/to/that/Makefile/umake.snpcall.Makefile -j NUMBER_OF_JOBS_FROM_STEP_9 YOUR_MAKEFILE_TARGETS_FROM_STEP_10 > /path/to/wherever/standard_output 2> /path/to/wherever/standard_error

echo "job ended with status $? at $(date)"

13. Run qsub script_thats_in_charge.sh. It's important that you run this in the same folder where pbs.options lives.

14. Once that finishes, if any steps remain, then update YOUR_MAKEFILE_TARGETS_FROM_STEP_10 and go back to step 13.
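Updating the targets by hand for every round gets tedious, so one option is to keep a template copy of the script and fill in the placeholder with sed. This is a rough sketch under assumptions: fill_targets and script_template.sh are hypothetical names, and the substitution is only safe because target names contain nothing but letters, digits, and spaces.

```shell
#!/bin/sh
# Write a copy of a template script with the targets placeholder filled in.
# $1: template file, $2: targets string, $3: output file (all names hypothetical).
fill_targets() {
    sed "s/YOUR_MAKEFILE_TARGETS_FROM_STEP_10/$2/" "$1" > "$3"
}

# Example (left commented out, since qsub needs the cluster):
#   fill_targets script_template.sh "vcf1 vcf2 vcf3" script_thats_in_charge.sh
#   qsub script_thats_in_charge.sh
```

The template would be the script from step 12 with the placeholder left untouched, so each round only changes the targets string.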