Changes

From Genome Analysis Wiki
Line 85: Line 85:  
* Subset of FASTQs: the reads should map to chromosome 22, positions 36000000-37000000
   −
  ls ${IN}/fastq/
+
  ls ${SS}/fastq/
 
There are 24 FASTQ files, a combination of single-end and paired-end.
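As a quick check of the file count, the sketch below counts the <code>.fastq</code> files in a directory. The helper name <code>count_fastq</code> and the demo file names are hypothetical; in the tutorial the call would presumably be <code>count_fastq "${SS}/fastq"</code>, assuming <code>SS</code> is set as in the commands above.

```shell
# Count the FASTQ files directly inside a directory.
# count_fastq is a hypothetical helper, not part of the tutorial.
count_fastq() {
  # find prints one line per matching file; wc -l counts those lines
  find "$1" -maxdepth 1 -type f -name '*.fastq' | wc -l | tr -d ' '
}

# Self-contained demo on a throwaway directory (made-up file names)
demo_dir=$(mktemp -d)
touch "$demo_dir/a_1.fastq" "$demo_dir/a_2.fastq" "$demo_dir/b.fastq"
n_files=$(count_fastq "$demo_dir")
echo "$n_files"
rm -rf "$demo_dir"
```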
   Line 104: Line 104:     
Look at a couple of FASTQs:
 
−
  less -S ${IN}/fastq/HG00551.SRR190851_1.fastq
+
  less -S ${SS}/fastq/HG00551.SRR190851_1.fastq
 
<code>less</code> is a Linux command that allows you to look at a file.
 
 
*The <code>-S</code> option prevents line wrapping
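Besides paging through a FASTQ with <code>less</code>, it can be useful to know how many reads it holds. The sketch below assumes the common four-lines-per-read FASTQ layout (header, sequence, <code>+</code>, qualities) with no wrapped sequence lines. The helper name <code>count_reads</code> and the toy reads are hypothetical.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per read.
# count_reads is a hypothetical helper, not part of the tutorial.
count_reads() {
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Self-contained demo: a toy FASTQ with two made-up reads
demo_fq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo_fq"
n_reads=$(count_reads "$demo_fq")
echo "$n_reads"
rm -f "$demo_fq"
```

On the tutorial data this would be called on one of the files under <code>${SS}/fastq/</code>.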
Line 124: Line 124:     
Look at the paired read:
 
−
  less -S ${IN}/fastq/HG00551.SRR190851_2.fastq
+
  less -S ${SS}/fastq/HG00551.SRR190851_2.fastq  
    
Remember, use <code>'q'</code> to exit out of <code>less</code>
 
Line 157: Line 157:     
Take a look at the chromosome 22 reference files included for this tutorial:
 
−
  ls ${REF}
+
  ls ${SS}/ref22
    
<ul>
 
<ul>
Line 169: Line 169:     
Let's read the reference FASTA file (all reference bases for the chromosome):
 
−
  less ${REF}/human.g1k.v37.chr22.fa
+
  less ${SS}/ref22/human.g1k.v37.chr22.fa
    
Remember, use <code>'q'</code> to exit out of <code>less</code>
 
Line 175: Line 175:     
If you want to access the FASTA file by position, you can use the <code>samtools faidx</code> command:
−
  $GC/bin/samtools faidx $REF/human.g1k.v37.chr22.fa 22:36000000 | less
+
  ${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000 | less
 
or  
 
−
  $GC/bin/samtools faidx $REF/human.g1k.v37.chr22.fa 22:36000000-36000100
+
  ${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000-36000100
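Region coordinates given to <code>samtools faidx</code> are 1-based and inclusive, so <code>22:36000000-36000100</code> covers 101 bases. To illustrate what fetching by position means, here is a minimal sketch that does the same on a toy single-sequence FASTA using plain <code>awk</code>. It is not how <code>samtools</code> works internally (that uses the <code>.fai</code> index and does not read the whole file), and the helper name <code>fetch_region</code> and the toy sequence are hypothetical.

```shell
# fetch_region FASTA START END
# Print bases START..END (1-based, inclusive) from a single-sequence FASTA.
# Hypothetical helper for illustration; it holds the whole sequence in memory.
fetch_region() {
  awk -v s="$2" -v e="$3" '
    /^>/ { next }               # skip the header line
    { seq = seq $0 }            # join the wrapped sequence lines
    END { print substr(seq, s, e - s + 1) }
  ' "$1"
}

# Self-contained demo: a toy FASTA wrapped at 6 bases per line
demo_fa=$(mktemp)
printf '>toy\nACGTAC\nGTTTGA\n' > "$demo_fa"
region=$(fetch_region "$demo_fa" 5 8)
echo "$region"

# Length of an inclusive region is END - START + 1
region_len=$(( 36000100 - 36000000 + 1 ))
echo "$region_len"
rm -f "$demo_fa"
```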
    
; Where is the reference sequence?
 
Line 187: Line 187:  
<li>The ends of a chromosome are 'N' - unknown bases</li>
 
 
<li>Let's look at 5 lines of the file starting at line 300,000</li>
 
−
   tail -n+300000 ${REF}/human.g1k.v37.chr22.fa |head -n 5
+
   tail -n+300000 ${SS}/ref22/human.g1k.v37.chr22.fa |head -n 5
 
[[File:Fasta.png|500px]]
 
 
</div>
 
</div>
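The <code>tail -n+300000 ... | head -n 5</code> pipeline above prints 5 lines starting at line 300,000. As a sketch of the same idea, the helpers below take a file, a start line, and a line count, and show an equivalent <code>sed</code> form. The helper names and the demo file are hypothetical.

```shell
# print_lines FILE START COUNT: same approach as the tutorial command
print_lines() {
  tail -n +"$2" "$1" | head -n "$3"
}

# print_lines_sed FILE START COUNT: equivalent using sed
print_lines_sed() {
  end=$(( $2 + $3 - 1 ))
  # -n suppresses default output; p prints the range; q stops after the last line
  sed -n "${2},${end}p;${end}q" "$1"
}

# Self-contained demo: a file whose lines are the numbers 1..20
demo_txt=$(mktemp)
seq 1 20 > "$demo_txt"
via_tail=$(print_lines "$demo_txt" 7 5)
via_sed=$(print_lines_sed "$demo_txt" 7 5)
echo "$via_tail"
rm -f "$demo_txt"
```

On the tutorial data the call would presumably be <code>print_lines ${SS}/ref22/human.g1k.v37.chr22.fa 300000 5</code>.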
