Changes

From Genome Analysis Wiki
69 bytes removed, 22:56, 24 February 2013
no edit summary
Line 4: Line 4:  
We will start with a set of sequence reads and associated base quality scores stored in a fastq file.
   −
The mapping pipeline will find the most likely genomic location for each read producing a BAM file.
+
The alignment pipeline will find the most likely genomic location for each read producing a BAM file.
    
The variant calling pipeline generates an initial list of polymorphic sites and genotypes stored in a VCF file and then uses haplotype information to refine these genotypes in an updated VCF file.
Line 64: Line 64:  
The first step in processing next-generation sequence data is mapping the reads to the reference genome, generating per-sample BAM files.
   −
The mapping pipeline has multiple built-in steps to generate BAMs:
+
The alignment pipeline has multiple built-in steps to generate BAMs:
 
# Align the fastqs to the reference genome
#* handles both single & paired end
Line 73: Line 73:  
This processing results in 1 BAM file per sample.
   −
The mapping pipeline also includes Quality Control (QC) steps:
+
The alignment pipeline also includes Quality Control (QC) steps:
 
# Visualization of various quality measures (QPLOT)
# Screen for sample contamination & swap (VerifyBamID)
   −
Run the mapping pipeline:
+
Run the alignment pipeline:
   −
  $GCHOME/gotcloud align --conf $GCDATA/[[GBR60map.conf]] --outdir $GCOUT
+
  $GCHOME/gotcloud align --conf $GCDATA/[[GBR60align.conf]] --outdir $GCOUT
   −
Upon successful completion of the mapping pipeline, you will see the following message:  
+
Upon successful completion of the alignment pipeline, you will see the following message:  
 
  Commands finished in nn secs with no errors reported
   −
The final BAM files produced by the mapping pipeline are:
+
The final BAM files produced by the alignment pipeline are:
 
  ls $GCOUT/alignment.recal/*.recal.bam
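
The listing above only shows file names. As a minimal sketch, assuming the `alignment.recal/*.recal.bam` layout shown above, the following counts how many recalibrated BAM files exist and are non-empty. The function name `count_recal_bams` is hypothetical and is not part of GotCloud.

```shell
# Hypothetical helper (not part of GotCloud): count the non-empty
# recalibrated BAM files under <outdir>/alignment.recal.
count_recal_bams() {
  outdir="$1"
  count=0
  for bam in "$outdir"/alignment.recal/*.recal.bam; do
    # -s is true only for an existing file with size greater than zero;
    # an unmatched glob stays literal and fails this test.
    if [ -s "$bam" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

Calling `count_recal_bams "$GCOUT"` after the alignment pipeline finishes should print one count per run; given that the processing results in 1 BAM file per sample, the number can be compared against the number of samples in the index file.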
   Line 114: Line 114:  
Run the variant calling pipeline:
  $GCHOME/gotcloud snpcall --conf [[GBR60vc.conf]] --outdir vcResults --snpcall --numjobs 2
  −
TBD - maybe merge both mapping & umake into a single script and have them as options.
      
TBD - add link explaining the contents of the .conf & .index files.
Line 143: Line 141:  
= Modifying the Tutorial Inputs to Run Your Own Data =
   −
== Mapping Pipeline ==
+
== Alignment Pipeline ==
   −
The inputs to the mapping pipeline are
+
The inputs to the alignment pipeline are  
    
===Index file===
