Changes

2,632 bytes added, 12:51, 25 February 2013
no edit summary
Line 80: Line 80:     
Run the alignment pipeline (the example aligns 2 samples):
 
Run the alignment pipeline (the example aligns 2 samples):
  $GCHOME/gotcloud align --conf $GCDATA/[[GBR60align.conf]] --outdir $GCOUT
+
  $GCHOME/gotcloud align --conf $GCDATA/[[Alignment Configuration File|GBR60align.conf]] --outdir $GCOUT
    
Upon successful completion of the alignment pipeline, you will see the following message:  
 
Upon successful completion of the alignment pipeline, you will see the following message:  
Line 145: Line 145:  
  /net/fantasia/home/hmkang/bin/epacts/bin/epacts single --vcf $GCOUT/vcfs/chr20/chr20.filtered.vcf.gz --ped $GCDATA/test.GBR60.ped --out EPACTS_TEST --test q.linear --run 1 --top 1 --chr 20
 
  /net/fantasia/home/hmkang/bin/epacts/bin/epacts single --vcf $GCOUT/vcfs/chr20/chr20.filtered.vcf.gz --ped $GCDATA/test.GBR60.ped --out EPACTS_TEST --test q.linear --run 1 --top 1 --chr 20
   −
= Tutorial Inputs / Modifying the Tutorial Inputs to Run Your Own Data =
+
= Tutorial Inputs =
 +
 
 +
== Alignment Pipeline ==
 +
The inputs to the tutorial alignment pipeline are:
 +
# [[#Alignment Configuration File|Configuration File (--conf)]]
 +
# [[#Alignment Output Directory|Output Directory (--outdir)]]
 +
 
 +
=== Alignment Configuration File ===
 +
The configuration file contains KEY = VALUE settings that override defaults and set specific values for the given run.
 +
 
 +
<pre>
 +
INDEX_FILE = GBR60fastq.index
 +
############
 +
# References
 +
REF_DIR = chr20Ref
 +
AS = NCBI37
 +
FA_REF = $(REF_DIR)/human_g1k_v37_chr20.fa
 +
DBSNP_VCF =  $(REF_DIR)/dbsnp135_chr20.vcf.gz
 +
HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr20.vcf.gz
 +
</pre>
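As an illustration of the KEY = VALUE format, the following Python sketch reads settings like those above and substitutes $(NAME) references. It is not GotCloud's own parser: the function names parse_conf and expand are hypothetical, and it assumes that $(NAME) refers to another key in the same file and that lines starting with # are comments.

```python
import re

# Example settings copied from the tutorial configuration shown above.
EXAMPLE = """\
INDEX_FILE = GBR60fastq.index
############
# References
REF_DIR = chr20Ref
AS = NCBI37
FA_REF = $(REF_DIR)/human_g1k_v37_chr20.fa
DBSNP_VCF =  $(REF_DIR)/dbsnp135_chr20.vcf.gz
HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr20.vcf.gz
"""

_REF = re.compile(r"\$\((\w+)\)")


def parse_conf(text):
    """Collect KEY = VALUE pairs, skipping blank lines and '#' comments (assumed)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY = VALUE line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


def expand(settings, max_passes=10):
    """Replace $(NAME) with the value of NAME; unknown names are left as written."""
    out = dict(settings)
    for _ in range(max_passes):
        changed = False
        for key, value in list(out.items()):
            new = _REF.sub(lambda m: out.get(m.group(1), m.group(0)), value)
            if new != value:
                out[key] = new
                changed = True
        if not changed:
            break
    return out


if __name__ == "__main__":
    for key, value in expand(parse_conf(EXAMPLE)).items():
        print(key, "=", value)
```

Printing the expanded values this way is a quick check that every reference path resolves to the location you expect before starting a run.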
 +
 
 +
This configuration file sets:
 +
* INDEX_FILE - file listing the FASTQ files to be processed, along with the read group information for each.
 +
* Reference Information:
 +
** AS - assembly value to put in the BAM
 +
** FA_REF - the reference FASTA file (.fa extension); the following files should all be in the same directory:
 +
*** human_g1k_v37_chr20-bs.umfa
 +
*** human_g1k_v37_chr20.dict
 +
*** human_g1k_v37_chr20.fa
 +
*** human_g1k_v37_chr20.fa.amb
 +
*** human_g1k_v37_chr20.fa.ann
 +
*** human_g1k_v37_chr20.fa.bwt
 +
*** human_g1k_v37_chr20.fa.fai
 +
*** human_g1k_v37_chr20.fa.GCcontent
 +
*** human_g1k_v37_chr20.fa.pac
 +
*** human_g1k_v37_chr20.fa.rbwt
 +
*** human_g1k_v37_chr20.fa.rpac
 +
*** human_g1k_v37_chr20.fa.rsa
 +
*** human_g1k_v37_chr20.fa.sa
 +
** DBSNP_VCF - a VCF containing the dbSNP positions
 +
** HM3_VCF - a HapMap VCF
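Because FA_REF needs its companion files in the same directory, it can help to check for them before starting a run. The Python sketch below builds the list of file names from the FA_REF path, using the suffixes listed above. The function names are hypothetical, and the check is not part of GotCloud.

```python
from pathlib import Path

# Suffixes appended to the full .fa file name (taken from the list above).
SUFFIXES_ON_FA = [".amb", ".ann", ".bwt", ".fai", ".GCcontent",
                  ".pac", ".rbwt", ".rpac", ".rsa", ".sa"]
# Suffixes that replace the .fa extension (taken from the list above).
SUFFIXES_ON_STEM = ["-bs.umfa", ".dict"]


def expected_reference_files(fa_ref):
    """Return the .fa file plus every companion file expected next to it."""
    fa = Path(fa_ref)
    stem = fa.name[:-len(".fa")] if fa.name.endswith(".fa") else fa.name
    files = [fa]
    files += [fa.with_name(fa.name + suffix) for suffix in SUFFIXES_ON_FA]
    files += [fa.with_name(stem + suffix) for suffix in SUFFIXES_ON_STEM]
    return files


def missing_reference_files(fa_ref):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_reference_files(fa_ref) if not p.exists()]


if __name__ == "__main__":
    for path in missing_reference_files("chr20Ref/human_g1k_v37_chr20.fa"):
        print("missing:", path)
```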
 +
 
 +
To run on your own data, set INDEX_FILE to your index file and update the reference settings to point to your reference files.
 +
 
 +
This example uses chr20-only reference files to speed up processing of the tutorial data. If you are using the default references, you may only need to update REF_DIR to the directory where they are installed. Full reference files can be downloaded from [[GotCloudReference]].
 +
 
 +
Absolute paths are recommended. (This example uses relative paths so that it works wherever the data is installed, but relative paths require the pipeline to be run from the correct directory.)
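One way to produce absolute paths for your own configuration is to join each relative value to the directory where the data is installed. The short Python sketch below shows the idea. The helper name to_absolute and the /data/gotcloud location are hypothetical.

```python
import os


def to_absolute(path, base_dir):
    """Return path unchanged if already absolute, otherwise join it to base_dir."""
    if os.path.isabs(path):
        return os.path.normpath(path)
    return os.path.normpath(os.path.join(base_dir, path))


if __name__ == "__main__":
    # /data/gotcloud is a made-up install location used only for illustration.
    print(to_absolute("chr20Ref/human_g1k_v37_chr20.fa", "/data/gotcloud"))
```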
 +
 
 +
=== Alignment Output Directory ===
 +
This setting tells the pipeline what directory to write the output files into.
 +
 
 +
The output directory will be created if necessary and will contain the following directories and files:
 +
* bams - directory containing the final BAM and BAI files
 +
** HG00096.OK - indicates that this sample completed alignment processing
 +
** HG00100.OK - indicates that this sample completed alignment processing
 +
* failLogs - directory containing logs from steps that failed
 +
* Makefiles - directory containing the makefiles for each sample
 +
**
 +
* QCFiles - directory containing the QC Results
 +
**
 +
* tmp - directory containing temporary alignment files
 +
**
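The .OK files in the bams directory can be used to check which samples finished alignment. The Python sketch below looks for SAMPLE.OK under the bams directory of the output directory, following the layout listed above. The function name alignment_status is hypothetical, and the sample IDs are the two from the tutorial.

```python
from pathlib import Path


def alignment_status(outdir, samples):
    """Map each sample ID to True if <outdir>/bams/<sample>.OK exists."""
    bams = Path(outdir) / "bams"
    return {sample: (bams / f"{sample}.OK").is_file() for sample in samples}


if __name__ == "__main__":
    import os

    # GCOUT is the output directory variable used in the tutorial commands.
    outdir = os.environ.get("GCOUT", ".")
    for sample, done in alignment_status(outdir, ["HG00096", "HG00100"]).items():
        print(sample, "completed" if done else "not completed")
```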
 +
 
 +
 
 +
= Modifying the Tutorial Inputs to Run Your Own Data =
    
== Download the whole genome resource files ==
 
== Download the whole genome resource files ==
 
If you want to analyze sequence data beyond chr20, you will first need to download the full resource files from [[TBA]]
 
If you want to analyze sequence data beyond chr20, you will first need to download the full resource files from [[TBA]]
   −
== Alignment Pipeline ==
  −
The inputs to the tutorial alignment pipeline are:
  −
# Configuration File (--conf)
  −
#
      
===Index file===
 
===Index file===
