Changes

From Genome Analysis Wiki
Jump to navigationJump to search
no edit summary
Line 38: Line 38:  
* BAMs->SVs rather than BAMs->SNPs
 
* BAMs->SVs rather than BAMs->SNPs
    +
If you want a reminder, of what they look like, here is a link to the previous tutorial : [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
 +
 +
If you want to check if you still have the bam index file, run
 +
 +
head $OUT/bam.index
 +
 +
<ul>
 +
<div class="mw-collapsible mw-collapsed" style="width:200px">
 +
<li>View Results</li>
 +
<div class="mw-collapsible-content" style="width:800px">
 +
HG00641 ALL /net/seqshop-server/hmkang/out/bams/HG00641.recal.bam
 +
HG00640 ALL /net/seqshop-server/hmkang/out/bams/HG00640.recal.bam
 +
HG00551 ALL /net/seqshop-server/hmkang/out/bams/HG00551.recal.bam
 +
HG00553 ALL /net/seqshop-server/hmkang/out/bams/HG00553.recal.bam
 +
HG00554 ALL bams/HG00554.recal.bam
 +
HG00637 ALL bams/HG00637.recal.bam
 +
HG00638 ALL bams/HG00638.recal.bam
 +
HG00734 ALL bams/HG00734.recal.bam
 +
HG00736 ALL bams/HG00736.recal.bam
 +
HG00737 ALL bams/HG00737.recal.bam
 +
</div>
 +
</div>
 +
</ul>
    +
Also, make sure that you have only 62 samples (you did not append new files twice)
 +
 +
wc -l $OUT/bam.index
 +
 +
62 /net/seqshop-server/hmkang/out/bam.index
   −
If you want a reminder, of what they look like, here is a link to the previous tutorial : [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#Examining_GotCloud_SnpCall_Input_files|GotCloud SnpCall Input Files]]
      
=== Reference Files ===
 
=== Reference Files ===
Line 71: Line 98:  
<div class="mw-collapsible mw-collapsed" style="width:200px">
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
 
<li>View Results</li>
 
<li>View Results</li>
<div class="mw-collapsible-content" style="width:1000px">
+
<div class="mw-collapsible-content" style="width:800px">
 
  1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz            hapmap_3.3.b37.sites.chr22.vcf.gz.tbi  human.g1k.v37.chr22.fa.bwt
 
  1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz            hapmap_3.3.b37.sites.chr22.vcf.gz.tbi  human.g1k.v37.chr22.fa.bwt
 
  1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz.tbi        human.g1k.v37.chr22-bs.umfa            human.g1k.v37.chr22.fa.fai
 
  1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz.tbi        human.g1k.v37.chr22-bs.umfa            human.g1k.v37.chr22.fa.fai
Line 87: Line 114:  
  ls $SV/ref
 
  ls $SV/ref
   −
human_g1k_v37.chr22.mask.100.fasta  human_g1k_v37.chr22.mask.100.fasta.dict  human_g1k_v37.chr22.mask.100.fasta.fai
  −
  −
ls $SV/conf
  −
genstrip_parameters.txt  humgen_g1k_v37_ploidy.chr22.map  humgen_g1k_v37_ploidy.map
  −
  −
=== GotCloud BAM Index File ===
  −
The BAM index file points GotCloud to the BAM files
  −
* generated by the alignment pipeline
  −
  −
Look at the BAM index file the alignment pipeline generated
  −
cat $OUT/bam.index
  −
  −
;What is the path to the BAM file for sample HG00640?
   
<ul>
 
<ul>
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
<li>Answer:</li>
+
<li>View Results</li>
<div class="mw-collapsible-content">
+
<div class="mw-collapsible-content" style="width:800px">
<ul>
+
human_g1k_v37.chr22.mask.100.fasta  human_g1k_v37.chr22.mask.100.fasta.dict  human_g1k_v37.chr22.mask.100.fasta.fai
<li>/home/YourUserName/out/bams/HG00640.recal.bam</li>
  −
[[File:Bamindex.png|500px]]
   
</div>
 
</div>
 
</div>
 
</div>
</ul>
   
</ul>
 
</ul>
   −
The alignment pipeline only processed 4 samples, but for snpcall, we want to run on 62 samples.
  −
* The other 58 samples were already aligned:
  −
ls $IN/bams
     −
Look at the BAM index for those BAMs:
+
  ls $SV/conf
  less $IN/bams/bam.index
     −
Remember, use <code>'q'</code> to exit out of <code>less</code>
  −
q
  −
  −
;Do you notice a difference between this index and yours?
   
<ul>
 
<ul>
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
<div class="mw-collapsible mw-collapsed" style="width:200px">
<li>Answer:</li>
+
<li>View Results</li>
<div class="mw-collapsible-content">
+
<div class="mw-collapsible-content" style="width:800px">
<ul>
+
genstrip_parameters.txt  humgen_g1k_v37_ploidy.chr22.map  humgen_g1k_v37_ploidy.map
<li>It doesn't have a full path to the BAM file, while your index has /home/...</li>
  −
[[File:Bamindex1.png|300px]]
  −
<li>That's ok, <code>gotcloud.conf</code> contains the path to those BAMs</li>
  −
[[File:BamindexConf.png|300px]]
   
</div>
 
</div>
 
</div>
 
</div>
 
</ul>
 
</ul>
</ul>
      +
=== GotCloud Configuration File ===
 +
We will use a slightly modified version of configuration file as we used yesterday in GotCloud Align.
 +
 +
See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details
 +
* Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X).
   −
We need to add these BAMs to our index
+
For more information on configuration, see: [[GotCloud:_Variant_Calling_Pipeline#Configuration_File|GotCloud snpcall: Configuration File]]
* Append the bam.index from the pre-aligned BAMs to the one you generated from the alignment pipeline
+
* Contains information on how to configure for exome/targeted sequencing
cat $IN/bams/bam.index >> $OUT/bam.index
  −
* ">>" will append to the file that follows it
     −
Verify your BAM index contains the additional BAMs
+
Check out what was changed.
  less $OUT/bam.index
     −
Remember, use <code>'q'</code> to exit out of <code>less</code>
+
cat $IN/gotcloud.conf
q
     −
;Do you see both sets of BAMs?
   
<ul>
 
<ul>
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
<div class="mw-collapsible mw-collapsed" style="width:200px">
<li>Annotated Screenshot:</li>
+
<li>View Results</li>
<div class="mw-collapsible-content">
+
<div class="mw-collapsible-content" style="width:800px">
<ul>
+
IN_DIR = $(GOTCLOUD_ROOT)/../inputs
<li>If not, let me know</li>
+
[[File:Bamindex2.png|400px]]
+
INDEX_FILE = $(IN_DIR)/align.index
 +
FASTQ_PREFIX = $(IN_DIR)/fastq
 +
BAM_PREFIX = $(IN_DIR)/
 +
 +
OUT_DIR = out
 +
BAM_INDEX =  $(OUT_DIR)/bam.index
 +
 +
############
 +
# References
 +
REF_DIR = $(GOTCLOUD_ROOT)/../reference/chr22
 +
AS = NCBI37  # Genome assembly identifier
 +
REF = $(REF_DIR)/human.g1k.v37.chr22.fa
 +
DBSNP_VCF =  $(REF_DIR)/dbsnp_135.b37.chr22.vcf.gz
 +
HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr22.vcf.gz
 +
INDEL_PREFIX = $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19
 +
OMNI_VCF = $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.chr22.vcf.gz
 +
 +
MAP_TYPE = BWA_MEM
 +
 +
###############
 +
CHRS = 22
 +
 +
######### THUNDER ########
 +
# Update so it will run faster for the tutorial
 +
#  * 10 rounds instead of 30 (-r 10)
 +
#  * without --compact option
 +
#  Runs faster, but uses more memory, but not a lot for the small example
 +
THUNDER = $(BIN_DIR)/thunderVCF -r 10 --phase --dosage --inputPhased $(THUNDER_STATES)
 +
 +
##############################
 +
## GenomeSTRIP
 +
#############################
 +
GENOMESTRIP_SVTOOLKIT_DIR = $(GOTCLOUD_ROOT)/../reference/svtoolkit
 +
GENOMESTRIP_MASK_FASTA = $(GENOMESTRIP_SVTOOLKIT_DIR)/ref/human_g1k_v37.chr22.mask.100.fasta
 +
GENOMESTRIP_PLOIDY_MAP = $(GENOMESTRIP_SVTOOLKIT_DIR)/conf/humgen_g1k_v37_ploidy.chr22.map
 +
GENOMESTRIP_PARAM = $(GENOMESTRIP_SVTOOLKIT_DIR)/conf/genstrip_parameters.txt
 
</div>
 
</div>
 
</div>
 
</div>
 
</ul>
 
</ul>
</ul>
+
 
+
== Run GotCloud SnpCall ==
=== GotCloud Configuration File ===
+
[[File:SnpcallDiagram.png|500px]]
We will use the same configuration file as we used yesterday in GotCloud Align.
  −
 
  −
See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details
  −
* Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X).
  −
 
  −
For more information on configuration, see: [[GotCloud:_Variant_Calling_Pipeline#Configuration_File|GotCloud snpcall: Configuration File]]
  −
* Contains information on how to configure for exome/targeted sequencing
  −
 
  −
== Run GotCloud SnpCall ==
  −
[[File:SnpcallDiagram.png|500px]]
     −
Now that we have all of our input files, we need just a simple command to run:
+
Now that we have all of our input files, we need just a simple command to run:
$GC/gotcloud snpcall --conf $IN/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
+
  $GC/gotcloud snpcall --conf $IN/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
 
* --numjobs tells GotCloud how many jobs to run in parallel
 
* --numjobs tells GotCloud how many jobs to run in parallel
 
** Depends on your system
 
** Depends on your system

Navigation menu