Line 38: |
Line 38: |
| * BAMs->SVs rather than BAMs->SNPs | | * BAMs->SVs rather than BAMs->SNPs |
| | | |
| + | If you want a reminder, of what they look like, here is a link to the previous tutorial : [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]] |
| + | |
| + | If you want to check if you still have the bam index file, run |
| + | |
| + | head $OUT/bam.index |
| + | |
| + | <ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
| + | <li>View Results</li> |
| + | <div class="mw-collapsible-content" style="width:800px"> |
| + | HG00641 ALL /net/seqshop-server/hmkang/out/bams/HG00641.recal.bam |
| + | HG00640 ALL /net/seqshop-server/hmkang/out/bams/HG00640.recal.bam |
| + | HG00551 ALL /net/seqshop-server/hmkang/out/bams/HG00551.recal.bam |
| + | HG00553 ALL /net/seqshop-server/hmkang/out/bams/HG00553.recal.bam |
| + | HG00554 ALL bams/HG00554.recal.bam |
| + | HG00637 ALL bams/HG00637.recal.bam |
| + | HG00638 ALL bams/HG00638.recal.bam |
| + | HG00734 ALL bams/HG00734.recal.bam |
| + | HG00736 ALL bams/HG00736.recal.bam |
| + | HG00737 ALL bams/HG00737.recal.bam |
| + | </div> |
| + | </div> |
| + | </ul> |
| | | |
| + | Also, make sure that you have only 62 samples (you did not append new files twice) |
| + | |
| + | wc -l $OUT/bam.index |
| + | |
| + | 62 /net/seqshop-server/hmkang/out/bam.index |
| | | |
− | If you want a reminder, of what they look like, here is a link to the previous tutorial : [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
| |
| | | |
| === Reference Files === | | === Reference Files === |
Line 71: |
Line 98: |
| <div class="mw-collapsible mw-collapsed" style="width:200px"> | | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
| <li>View Results</li> | | <li>View Results</li> |
− | <div class="mw-collapsible-content" style="width:1000px"> | + | <div class="mw-collapsible-content" style="width:800px"> |
| 1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz hapmap_3.3.b37.sites.chr22.vcf.gz.tbi human.g1k.v37.chr22.fa.bwt | | 1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz hapmap_3.3.b37.sites.chr22.vcf.gz.tbi human.g1k.v37.chr22.fa.bwt |
| 1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz.tbi human.g1k.v37.chr22-bs.umfa human.g1k.v37.chr22.fa.fai | | 1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz.tbi human.g1k.v37.chr22-bs.umfa human.g1k.v37.chr22.fa.fai |
Line 87: |
Line 114: |
| ls $SV/ref | | ls $SV/ref |
| | | |
− | human_g1k_v37.chr22.mask.100.fasta human_g1k_v37.chr22.mask.100.fasta.dict human_g1k_v37.chr22.mask.100.fasta.fai
| |
− |
| |
− | ls $SV/conf
| |
− | genstrip_parameters.txt humgen_g1k_v37_ploidy.chr22.map humgen_g1k_v37_ploidy.map
| |
− |
| |
− | === GotCloud BAM Index File ===
| |
− | The BAM index file points GotCloud to the BAM files
| |
− | * generated by the alignment pipeline
| |
− |
| |
− | Look at the BAM index file the alignment pipeline generated
| |
− | cat $OUT/bam.index
| |
− |
| |
− | ;What is the path to the BAM file for sample HG00640?
| |
| <ul> | | <ul> |
| <div class="mw-collapsible mw-collapsed" style="width:200px"> | | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
− | <li>Answer:</li> | + | <li>View Results</li> |
− | <div class="mw-collapsible-content"> | + | <div class="mw-collapsible-content" style="width:800px"> |
− | <ul>
| + | human_g1k_v37.chr22.mask.100.fasta human_g1k_v37.chr22.mask.100.fasta.dict human_g1k_v37.chr22.mask.100.fasta.fai |
− | <li>/home/YourUserName/out/bams/HG00640.recal.bam</li>
| |
− | [[File:Bamindex.png|500px]]
| |
| </div> | | </div> |
| </div> | | </div> |
− | </ul>
| |
| </ul> | | </ul> |
| | | |
− | The alignment pipeline only processed 4 samples, but for snpcall, we want to run on 62 samples.
| |
− | * The other 58 samples were already aligned:
| |
− | ls $IN/bams
| |
| | | |
− | Look at the BAM index for those BAMs:
| + | ls $SV/conf |
− | less $IN/bams/bam.index | |
| | | |
− | Remember, use <code>'q'</code> to exit out of <code>less</code>
| |
− | q
| |
− |
| |
− | ;Do you notice a difference between this index and yours?
| |
| <ul> | | <ul> |
− | <div class="mw-collapsible mw-collapsed" style="width:500px"> | + | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
− | <li>Answer:</li> | + | <li>View Results</li> |
− | <div class="mw-collapsible-content"> | + | <div class="mw-collapsible-content" style="width:800px"> |
− | <ul>
| + | genstrip_parameters.txt humgen_g1k_v37_ploidy.chr22.map humgen_g1k_v37_ploidy.map |
− | <li>It doesn't have a full path to the BAM file, while your index has /home/...</li>
| |
− | [[File:Bamindex1.png|300px]]
| |
− | <li>That's ok, <code>gotcloud.conf</code> contains the path to those BAMs</li>
| |
− | [[File:BamindexConf.png|300px]]
| |
| </div> | | </div> |
| </div> | | </div> |
| </ul> | | </ul> |
− | </ul>
| |
| | | |
| + | === GotCloud Configuration File === |
| + | We will use a slightly modified version of configuration file as we used yesterday in GotCloud Align. |
| + | |
| + | See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details |
| + | * Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X). |
| | | |
− | We need to add these BAMs to our index
| + | For more information on configuration, see: [[GotCloud:_Variant_Calling_Pipeline#Configuration_File|GotCloud snpcall: Configuration File]] |
− | * Append the bam.index from the pre-aligned BAMs to the one you generated from the alignment pipeline | + | * Contains information on how to configure for exome/targeted sequencing |
− | cat $IN/bams/bam.index >> $OUT/bam.index
| |
− | * ">>" will append to the file that follows it
| |
| | | |
− | Verify your BAM index contains the additional BAMs
| + | Check out what was changed. |
− | less $OUT/bam.index
| |
| | | |
− | Remember, use <code>'q'</code> to exit out of <code>less</code>
| + | cat $IN/gotcloud.conf |
− | q
| |
| | | |
− | ;Do you see both sets of BAMs?
| |
| <ul> | | <ul> |
− | <div class="mw-collapsible mw-collapsed" style="width:500px"> | + | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
− | <li>Annotated Screenshot:</li> | + | <li>View Results</li> |
− | <div class="mw-collapsible-content"> | + | <div class="mw-collapsible-content" style="width:800px"> |
− | <ul>
| + | IN_DIR = $(GOTCLOUD_ROOT)/../inputs |
− | <li>If not, let me know</li>
| + | |
− | [[File:Bamindex2.png|400px]]
| + | INDEX_FILE = $(IN_DIR)/align.index |
| + | FASTQ_PREFIX = $(IN_DIR)/fastq |
| + | BAM_PREFIX = $(IN_DIR)/ |
| + | |
| + | OUT_DIR = out |
| + | BAM_INDEX = $(OUT_DIR)/bam.index |
| + | |
| + | ############ |
| + | # References |
| + | REF_DIR = $(GOTCLOUD_ROOT)/../reference/chr22 |
| + | AS = NCBI37 # Genome assembly identifier |
| + | REF = $(REF_DIR)/human.g1k.v37.chr22.fa |
| + | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.chr22.vcf.gz |
| + | HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr22.vcf.gz |
| + | INDEL_PREFIX = $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 |
| + | OMNI_VCF = $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz |
| + | |
| + | MAP_TYPE = BWA_MEM |
| + | |
| + | ############### |
| + | CHRS = 22 |
| + | |
| + | ######### THUNDER ######## |
| + | # Update so it will run faster for the tutorial |
| + | # * 10 rounds instead of 30 (-r 10) |
| + | # * without --compact option |
| + | # Runs faster, but uses more memory, but not a lot for the small example |
| + | THUNDER = $(BIN_DIR)/thunderVCF -r 10 --phase --dosage --inputPhased $(THUNDER_STATES) |
| + | |
| + | ############################## |
| + | ## GenomeSTRIP |
| + | ############################# |
| + | GENOMESTRIP_SVTOOLKIT_DIR = $(GOTCLOUD_ROOT)/../reference/svtoolkit |
| + | GENOMESTRIP_MASK_FASTA = $(GENOMESTRIP_SVTOOLKIT_DIR)/ref/human_g1k_v37.chr22.mask.100.fasta |
| + | GENOMESTRIP_PLOIDY_MAP = $(GENOMESTRIP_SVTOOLKIT_DIR)/conf/humgen_g1k_v37_ploidy.chr22.map |
| + | GENOMESTRIP_PARAM = $(GENOMESTRIP_SVTOOLKIT_DIR)/conf/genstrip_parameters.txt |
| </div> | | </div> |
| </div> | | </div> |
| </ul> | | </ul> |
− | </ul>
| + | |
− | | + | == Run GotCloud SnpCall == |
− | === GotCloud Configuration File ===
| + | [[File:SnpcallDiagram.png|500px]] |
− | We will use the same configuration file as we used yesterday in GotCloud Align.
| |
− | | |
− | See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details
| |
− | * Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X).
| |
− | | |
− | For more information on configuration, see: [[GotCloud:_Variant_Calling_Pipeline#Configuration_File|GotCloud snpcall: Configuration File]]
| |
− | * Contains information on how to configure for exome/targeted sequencing
| |
− | | |
− | == Run GotCloud SnpCall == | |
− | [[File:SnpcallDiagram.png|500px]] | |
| | | |
− | Now that we have all of our input files, we need just a simple command to run: | + | Now that we have all of our input files, we need just a simple command to run: |
− | $GC/gotcloud snpcall --conf $IN/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
| + | $GC/gotcloud snpcall --conf $IN/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 |
| * --numjobs tells GotCloud how many jobs to run in parallel | | * --numjobs tells GotCloud how many jobs to run in parallel |
| ** Depends on your system | | ** Depends on your system |