Line 4: |
Line 4: |
| We will start with a set of sequence reads and associated base quality scores stored in a fastq file. | | We will start with a set of sequence reads and associated base quality scores stored in a fastq file. |
| | | |
− | The mapping pipeline will find the most likely genomic location for each read producing a BAM file. | + | The alignment pipeline will find the most likely genomic location for each read producing a BAM file. |
| | | |
| The variant calling pipeline generates an initial list of polymorphic sites and genotypes stored in a VCF file and then uses haplotype information to refine these genotypes in an updated VCF file. | | The variant calling pipeline generates an initial list of polymorphic sites and genotypes stored in a VCF file and then uses haplotype information to refine these genotypes in an updated VCF file. |
Line 64: |
Line 64: |
| The first step in processing next-generation sequence data is mapping the reads to the reference genome, generating per-sample BAM files. | | The first step in processing next-generation sequence data is mapping the reads to the reference genome, generating per-sample BAM files. |
| | | |
− | The mapping pipeline has multiple built-in steps to generate BAMs: | + | The alignment pipeline has multiple built-in steps to generate BAMs: |
| # Align the fastqs to the reference genome | | # Align the fastqs to the reference genome |
| #* handles both single & paired end | | #* handles both single & paired end |
Line 73: |
Line 73: |
| This processing results in 1 BAM file per sample. | | This processing results in 1 BAM file per sample. |
| | | |
− | The mapping pipeline also includes Quality Control (QC) steps: | + | The alignment pipeline also includes Quality Control (QC) steps: |
| # Visualization of various quality measures (QPLOT) | | # Visualization of various quality measures (QPLOT) |
| # Screen for sample contamination & swap (VerifyBamID) | | # Screen for sample contamination & swap (VerifyBamID) |
| | | |
− | Run the mapping pipeline: | + | Run the alignment pipeline: |
− | $GCHOME/gotcloud align --conf $GCDATA/[[GBR60map.conf]] --outdir $GCOUT | + | $GCHOME/gotcloud align --conf $GCDATA/[[GBR60align.conf]] --outdir $GCOUT |
| | | |
− | Upon successful completion of the mapping pipeline, you will see the following message: | + | Upon successful completion of the alignment pipeline, you will see the following message: |
| Commands finished in nn secs with no errors reported | | Commands finished in nn secs with no errors reported |
| | | |
− | The final BAM files produced by the mapping pipeline are: | + | The final BAM files produced by the alignment pipeline are: |
| ls $GCOUT/alignment.recal/*.recal.bam | | ls $GCOUT/alignment.recal/*.recal.bam |
| | | |
Line 114: |
Line 114: |
| Run the variant calling pipeline: | | Run the variant calling pipeline: |
| $GCHOME/gotcloud snpcall --conf [[GBR60vc.conf]] --outdir vcResults --snpcall --numjobs 2 | | $GCHOME/gotcloud snpcall --conf [[GBR60vc.conf]] --outdir vcResults --snpcall --numjobs 2 |
− | | | |
− | TBD - maybe merge both mapping & umake into a single script and have them as options. | | |
| | | |
| TBD - add link explaining the contents of the .conf & .index files. | | TBD - add link explaining the contents of the .conf & .index files. |
Line 143: |
Line 141: |
| = Modifying the Tutorial Inputs to Run Your Own Data = | | = Modifying the Tutorial Inputs to Run Your Own Data = |
| | | |
− | == Mapping Pipeline == | + | == Alignment Pipeline == |
− | The inputs to the mapping pipeline are | + | The inputs to the alignment pipeline are |
| | | |
| ===Index file=== | | ===Index file=== |