Changes

From Genome Analysis Wiki
Jump to navigationJump to search
72 bytes added ,  10:24, 5 April 2013
no edit summary
Line 37: Line 37:     
   #Rebuild the reference
 
   #Rebuild the reference
   bwa index -a is ref/human_g1k_v37_chr20.fa
+
   bin/bwa index -a is ref/human_g1k_v37_chr20.fa
 
</source>
 
</source>
   Line 44: Line 44:  
Next, we will use BWA to find the most likely sequence location for each read using the <code>bwa aln</code> command. This command requires two parameters, one corresponding to the reference genome, the other corresponding to a fastq file containing reads to be mapped.  
 
Next, we will use BWA to find the most likely sequence location for each read using the <code>bwa aln</code> command. This command requires two parameters, one corresponding to the reference genome, the other corresponding to a fastq file containing reads to be mapped.  
   −
   bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/SAMPLE1.fastq > bwa.sai/SAMPLE1.sai
+
   bin/bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/SAMPLE1.fastq > bwa.sai/SAMPLE1.sai
    
The file SAMPLE1.fastq contains DNA sequence reads for sample SAMPLE1.  
 
The file SAMPLE1.fastq contains DNA sequence reads for sample SAMPLE1.  
Line 54: Line 54:  
The .sai alignment format is specific to BWA, so the first thing to do is to convert the alignment to a more standard format that will be compatible with downstream analysis tools. We can do this with a combination of the <code>bwa samse</code> command and <code>samtools view</code> and <code>samtoosl sort</code> commands.
 
The .sai alignment format is specific to BWA, so the first thing to do is to convert the alignment to a more standard format that will be compatible with downstream analysis tools. We can do this with a combination of the <code>bwa samse</code> command and <code>samtools view</code> and <code>samtoosl sort</code> commands.
   −
   bwa samse -r "@RG\tID:ILLUMINA\tSM:SAMPLE1" ref/human_g1k_v37_chr20.fa bwa.sai/SAMPLE1.sai fastq/SAMPLE1.fastq | \
+
   bin/bwa samse -r "@RG\tID:ILLUMINA\tSM:SAMPLE1" ref/human_g1k_v37_chr20.fa bwa.sai/SAMPLE1.sai fastq/SAMPLE1.fastq | \
   samtools view -uhS - | samtools sort -m 2000000000 - bams/SAMPLE1
+
   bin/samtools view -uhS - | samtools sort -m 2000000000 - bams/SAMPLE1
    
You can check the use of parameters in the bwa manual. The result BAM file uses a compact binary format to represent the  
 
You can check the use of parameters in the bwa manual. The result BAM file uses a compact binary format to represent the  
Line 61: Line 61:  
of the file using the <code>samtools view</code> command, like so:
 
of the file using the <code>samtools view</code> command, like so:
   −
   samtools view bams/SAMPLE1.bam | more
+
   bin/samtools view bams/SAMPLE1.bam | more
    
The text representation of the alignment produced by <code>samtools view</code> describes
 
The text representation of the alignment produced by <code>samtools view</code> describes
Line 85: Line 85:  
genome location. We do this with the <code>samtools index</code> command, like so:
 
genome location. We do this with the <code>samtools index</code> command, like so:
   −
   samtools index bams/SAMPLE1.bam
+
   bin/samtools index bams/SAMPLE1.bam
    
=== Browsing Alignment Results ===
 
=== Browsing Alignment Results ===
Line 94: Line 94:  
starting at position 2,100,000 on chromosome 20, we could run:
 
starting at position 2,100,000 on chromosome 20, we could run:
   −
   samtools tview bams/SAMPLE1.bam ref/human_g1k_v37_chr20.fa
+
   bin/samtools tview bams/SAMPLE1.bam ref/human_g1k_v37_chr20.fa
    
Then, type "g 20:2100000"
 
Then, type "g 20:2100000"
Line 108: Line 108:  
   foreach file (`ls fastq/SAMPLE*.fastq | cut -f 2 -d '/' | cut -f 1 -d '.'`)
 
   foreach file (`ls fastq/SAMPLE*.fastq | cut -f 2 -d '/' | cut -f 1 -d '.'`)
 
   echo $file
 
   echo $file
   bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/$file.fastq > bwa.sai/$file.sai
+
   bin/bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/$file.fastq > bwa.sai/$file.sai
   bwa samse -r "@RG\tID:ILLUMINA\tSM:$file" ref/human_g1k_v37_chr20.fa bwa.sai/$file.sai fastq/$file.fastq | \
+
   bin/bwa samse -r "@RG\tID:ILLUMINA\tSM:$file" ref/human_g1k_v37_chr20.fa bwa.sai/$file.sai fastq/$file.fastq | \
     samtools view -uhS - | bin/samtools sort -m 2000000000 - bams/$file
+
     bin/samtools view -uhS - | bin/samtools sort -m 2000000000 - bams/$file
   samtools index bams/$file.bam
+
   bin/samtools index bams/$file.bam
 
   end
 
   end
   Line 118: Line 118:  
  for file in `ls fastq/SAMPLE*.fastq | cut -f 2 -d '/' | cut -f 1 -d '.'`;
 
  for file in `ls fastq/SAMPLE*.fastq | cut -f 2 -d '/' | cut -f 1 -d '.'`;
 
   do echo $file;
 
   do echo $file;
   bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/$file.fastq > bwa.sai/$file.sai;
+
   bin/bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/$file.fastq > bwa.sai/$file.sai;
   bwa samse -r "@RG\tID:ILLUMINA\tSM:$file" ref/human_g1k_v37_chr20.fa bwa.sai/$file.sai fastq/$file.fastq | \
+
   bin/bwa samse -r "@RG\tID:ILLUMINA\tSM:$file" ref/human_g1k_v37_chr20.fa bwa.sai/$file.sai fastq/$file.fastq | \
     samtools view -uhS - | samtools sort -m 2000000000 - bams/$file;
+
     bin/samtools view -uhS - | samtools sort -m 2000000000 - bams/$file;
   samtools index bams/$file.bam;
+
   bin/samtools index bams/$file.bam;
 
  done
 
  done
   Line 137: Line 137:     
    
 
    
   samtools mpileup -Iuf ref/human_g1k_v37_chr20.fa bams/SAMPLE*bam | bcftools view -bvcg - > result/chr20.mpileup.bcf
+
   bin/samtools mpileup -Iuf ref/human_g1k_v37_chr20.fa bams/SAMPLE*bam | bcftools view -bvcg - > result/chr20.mpileup.bcf
   −
   bcftools view result/chr20.mpileup.bcf  > result/chr20.mpileup.vcf
+
   bin/bcftools view result/chr20.mpileup.bcf  > result/chr20.mpileup.vcf
      Line 215: Line 215:  
To complete our example analysis, we could run:
 
To complete our example analysis, we could run:
 
    
 
    
   TrioCaller --vcf result/chr20.mpileup.vcf --pedfile ped/triocaller.ped --states 50 --rounds 10 --prefix result/chr20.triocaller
+
   bin/TrioCaller --vcf result/chr20.mpileup.vcf --pedfile ped/triocaller.ped --states 50 --rounds 10 --prefix result/chr20.triocaller
    
The format of output file is same as the input file. Again, you can review the contents of the updated VCF file using the more command:
 
The format of output file is same as the input file. Again, you can review the contents of the updated VCF file using the more command:
533

edits

Navigation menu