GlfTrio

From Genome Analysis Wiki

Latest revision as of 12:45, 1 May 2012

This tool has been DEPRECATED and replaced by Polymutt.

glfTrio is a GLF-based variant caller for next-generation sequencing data. It takes three GLF-format genotype likelihood files as input (one each for the father, mother, and child) and generates a VCF-format set of variant calls as output.

Basic Usage Example

Here is an example of how to run glfTrio:

  glfTrio -f NA19239.chrom20.SLX.glf -m NA19238.chrom20.SLX.glf -c NA19240.chrom20.SLX.glf \
          --father NA19239 --mother NA19238 --child NA19240 \
          --minMapQuality 30 --minTotalDepth 0 --maxTotalDepth 1000 \
          -b YRI.chrom20.SLX.vcf > YRI.chrom20.SLX.log

Command Line Options

Input Files

 -f genotype likelihood file    Father's GLF-format genotype likelihood file
 -m genotype likelihood file    Mother's GLF-format genotype likelihood file
 -c genotype likelihood file    Child's GLF-format genotype likelihood file

Basic Output Options

 -b baseCallFile                Specifies the name of the output VCF-format base call file
 -p threshold                   The threshold for base calling. Base calls will be made when their posterior likelihood exceeds threshold
 --reference                    Positions called as homozygous reference will be included in the output.  
 --verbose                      Print debug information to the screen

Filtering According to Depth and Map Quality

 --minMapQuality threshold      Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
 --strict                       When the map quality is interpreted strictly, all three trio individuals must exceed minMapQuality 
                                before a call is made. Without the --strict option, reads for individuals below the threshold are ignored.
 --minTotalDepth threshold      Positions where the read depth falls below this threshold will be excluded.
 --maxTotalDepth threshold      Positions where the read depth exceeds this threshold will be excluded.
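
As an illustration of how the filtering options combine, the sketch below assembles a command that applies the mapping-quality threshold strictly, together with the depth limits. It reuses the input file names and threshold values from the basic usage example. The output name YRI.chrom20.SLX.strict.vcf is a hypothetical choice, and the command is only printed rather than run.

```shell
# Assemble a glfTrio command with strict mapping-quality filtering.
# Input names and thresholds come from the basic usage example; the output
# name YRI.chrom20.SLX.strict.vcf is a hypothetical choice.
cmd="glfTrio -f NA19239.chrom20.SLX.glf -m NA19238.chrom20.SLX.glf -c NA19240.chrom20.SLX.glf"
cmd="$cmd --minMapQuality 30 --strict"             # all three individuals must pass
cmd="$cmd --minTotalDepth 0 --maxTotalDepth 1000"  # depth window
cmd="$cmd -b YRI.chrom20.SLX.strict.vcf"
echo "$cmd"   # print the command for inspection instead of running it
```

Dropping `--strict` from the second assignment gives the default behaviour, in which reads for individuals below the threshold are ignored.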

Sample Labels

 --father fatherLabel           Specifies a label for the male parent, which will be included in the output VCF file
 --mother motherLabel           Specifies a label for the female parent, which will be included in the output VCF file
 --child childLabel             Specifies a label for the child, which will be included in the output VCF file

X Chromosome Variant Calling

 --xChr chromosomeName          Label for the 'X' chromosome in the GLF file
 --xStart sexChromosomeStart    Start of the non-pseudo-autosomal portion of the X (2,709,521 bp in build 36)
 --xStop sexChromosomeEnd       End of the non-pseudo-autosomal portion of the X (154,584,237 bp in build 36)

For NCBI genome build 36, you should use the settings --xChr X --xStart 2709521 --xStop 154584237

For NCBI genome build 37, you should use the settings --xChr X --xStart 2699520 --xStop 154931044
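
The build-specific settings above could be wrapped in a small shell helper so that the boundaries are selected by build number. This is only a sketch: the function name `x_chr_flags` is hypothetical, and the coordinates are copied from the settings listed above.

```shell
# Hypothetical helper: print the X chromosome options for a given NCBI build.
# Coordinates are the ones listed above for builds 36 and 37.
x_chr_flags() {
    case "$1" in
        36) echo "--xChr X --xStart 2709521 --xStop 154584237" ;;
        37) echo "--xChr X --xStart 2699520 --xStop 154931044" ;;
        *)  echo "unsupported build: $1" >&2; return 1 ;;
    esac
}
```

The output can then be spliced into a glfTrio command line with `$(x_chr_flags 37)`, left unquoted so that the shell splits it into separate arguments.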

Model for Variant Calling

TODO

Support for a two pass depth filter that uses the data to automatically work out appropriate filtering thresholds.

When calling genotypes on the X chromosome, glfTrio should properly account for male offspring. Currently, it assumes the offspring is female, because the offspring in the two deeply sequenced 1000 Genomes trios are both female.