
From Genome Analysis Wiki
== Setup in person at the SeqShop Workshop ==
''This section is specifically for the SeqShop Workshop computers.''
<div class="mw-collapsible mw-collapsed" style="width:600px">
''If you are not running during the SeqShop Workshop, please skip this section.''
<div class="mw-collapsible-content">
== Setup when running on your own outside of the SeqShop Workshop ==
''This section is specifically for running on your own outside of the SeqShop Workshop.''
<div class="mw-collapsible mw-collapsed" style="width:600px">
''If you are running during the SeqShop Workshop, please skip this section.''
<div class="mw-collapsible-content">
== Running GotCloud Indel ==
${GC}/gotcloud indel --conf ${SS}/gotcloud.conf --numjobs 6 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
* <code>${GC}/gotcloud</code> runs GotCloud
* <code>indel</code> tells GotCloud you want to run the indel calling pipeline.
* <code>--base_prefix</code> tells GotCloud the path prefix for the input files.
** The configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
** Alternatively, gotcloud.conf could be updated to specify the full paths
* <code>--outdir</code> tells GotCloud where to write the output.
** This could be specified in gotcloud.conf, but specifying it on the command line lets you use ${OUT} to change the output location
</div>
</div>
This should take about 2-4 minutes to run.
* It should end with a line like: <code>Commands finished in 125 secs with no errors reported</code>
If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off.
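If you capture the GotCloud output in a file, the completion check described above can be scripted. The following is a minimal sketch: the pattern is modeled on the example completion line quoted above, and the helper name <code>finished_cleanly</code> is hypothetical. Other GotCloud versions may word the line differently.

```python
import re

# Pattern modeled on the example completion line in this tutorial;
# the exact wording in other GotCloud versions is an assumption.
FINISHED = re.compile(r"Commands finished in (\d+) secs with no errors reported")


def finished_cleanly(output_text):
    """Return the elapsed seconds if the last non-empty line reports success, else None."""
    lines = [line.strip() for line in output_text.splitlines() if line.strip()]
    if not lines:
        return None
    match = FINISHED.search(lines[-1])
    return int(match.group(1)) if match else None


sample_output = "some earlier output\nCommands finished in 125 secs with no errors reported\n"
elapsed = finished_cleanly(sample_output)
print(elapsed)
```

A return value of <code>None</code> suggests the run was interrupted or reported errors, in which case rerunning the same GotCloud command is the next step.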
 
== Examining GotCloud Indel Output ==
The columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, and FORMAT, followed by one genotype column per sample, labeled with the sample name.
22 36662041 . AATAATT A 756 PASS AC=2;AN=114;AF=0.0175439;GC=55,2,0;GN=57;GF=0.964912,0.0350877,0;NS=57;HWEAF=0.019571;HWEGF=0.961242,0.038376,0.000383024;MLEAF=0.0196187;MLEGF=0.960762,0.0392374,2.11537e-15;HWE_LLR=-0.0222464;HWE_LPVAL=-0.182794;HWE_DF=1;FIC=-0.00372601;AB=0.384578 GT:PL:DP:AD:GQ 0/0:0,6,145:3:2,0,1:7 0/0:0,12,192:4:4,0,0:12
Here is a description of the record's fields.
22 : chromosome
36662041 : genome position
. : the ID field, left blank here
AATAATT : the reference sequence that is replaced by the alternative sequence below
A : the alternative sequence, so this is a deletion of ATAATT
756 : QUAL field denoting the validity of this variant; the higher the better
PASS : the variant passed the filters
INFO AC=2... : fields containing information about the variant
FORMAT GT:PL:DP:AD:GQ : format field labels for the genotype columns
0/0:0,6,145:3:2,0,1:7 : genotype information
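To make the column layout concrete, here is a minimal Python sketch that splits the example record into its fields and pairs each genotype column with the FORMAT labels. The function name <code>parse_vcf_line</code> is hypothetical, the INFO column is shortened to its first three entries for readability, and the record is assumed to be tab-separated as in a standard VCF file.

```python
def parse_vcf_line(line):
    """Split one tab-separated VCF data line into fixed columns and per-sample genotypes."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = cols[:9]
    keys = fmt.split(":")
    # Each genotype column uses the same colon-separated order as FORMAT.
    genotypes = [dict(zip(keys, sample.split(":"))) for sample in cols[9:]]
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt,
        "qual": float(qual), "filter": flt, "info": info, "genotypes": genotypes,
    }


# The example record from this page, with INFO shortened to three entries.
line = "\t".join([
    "22", "36662041", ".", "AATAATT", "A", "756", "PASS",
    "AC=2;AN=114;AF=0.0175439", "GT:PL:DP:AD:GQ",
    "0/0:0,6,145:3:2,0,1:7", "0/0:0,12,192:4:4,0,0:12",
])
rec = parse_vcf_line(line)

# For a simple deletion, ALT is a prefix of REF and the remainder is what was deleted.
deleted = ""
if len(rec["ref"]) > len(rec["alt"]) and rec["ref"].startswith(rec["alt"]):
    deleted = rec["ref"][len(rec["alt"]):]

print(rec["chrom"], rec["pos"], deleted, rec["genotypes"][0]["GT"])
```

The values stay as strings inside each genotype dictionary; converting PL or AD to integer lists is left out to keep the sketch short.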
You can obtain the same output by using the following command
GF=0.96,0.04,0 : genotype frequencies based on GC
NS=57 : no. of samples
HWEAF=0.020 : genotype likelihood based estimation of the allele frequency assuming Hardy Weinberg equilibrium
HWEGF=0.96,0.04,0.00 : genotype frequency derived from HWEAF
HWE_LPVAL=-0.18 : log p value of HWE test
FIC=-0.003 : genotype likelihood based inbreeding coefficient, ranges -1 to 1. <0 denotes excess of heterozygotes and >0 means excess of homozygotes assuming HWE.
AB=0.38 : genotype likelihood based allele balance, ranges 0 to 1 with 0.5 for balance, >0.5 meaning reference bias and <0.5 denoting alternate allele bias.
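Several of these INFO values can be checked by hand from the others. The sketch below uses values from the example record. It assumes that AF is AC divided by AN, that GF is GC divided by GN, and that HWEGF follows the usual Hardy-Weinberg proportions computed from HWEAF. The helper name <code>parse_info</code> is hypothetical.

```python
def parse_info(info):
    """Turn 'KEY=value;KEY=value' into a dictionary of strings."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


# A subset of the INFO column from the example record.
info = parse_info("AC=2;AN=114;GC=55,2,0;GN=57;HWEAF=0.019571")

# Allele frequency from allele counts (assumed: AF = AC / AN).
af = int(info["AC"]) / int(info["AN"])

# Genotype frequencies from genotype counts (GF is based on GC, divided by GN).
gc = [int(x) for x in info["GC"].split(",")]
gf = [count / int(info["GN"]) for count in gc]

# Genotype frequencies expected under Hardy-Weinberg equilibrium,
# where p is the alternate allele frequency estimate HWEAF.
p = float(info["HWEAF"])
hwegf = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

print(round(af, 7), [round(x, 6) for x in gf], [round(x, 6) for x in hwegf])
```

The computed values agree with the AF, GF and HWEGF entries of the record up to rounding.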
=====GENOTYPE field=====
It is usually useful to examine the call sets against known data sets for the passed variants.
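Conceptually, this comparison asks what fraction of the called variants also appear in a known data set. The sketch below illustrates the idea only. It is not the workshop command, the function name <code>overlap_fraction</code> is hypothetical, and the known-variant list is made-up illustrative data. It assumes both sets represent each indel in the same normalized form, since otherwise identical variants may fail to match.

```python
def overlap_fraction(calls, known):
    """Fraction of called variants that are also present in the known set."""
    calls = set(calls)
    if not calls:
        return 0.0
    return len(calls & set(known)) / len(calls)


# Variants are keyed by (chromosome, position, REF, ALT).
# The first call is the example record; the rest is hypothetical illustrative data.
calls = [
    ("22", 36662041, "AATAATT", "A"),
    ("22", 36700000, "C", "CT"),
]
known = [
    ("22", 36662041, "AATAATT", "A"),
    ("22", 36800000, "GA", "G"),
]

fraction = overlap_fraction(calls, known)
print(fraction)
```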
<div class="mw-collapsible mw-collapsed" style="width:500px">
''Command to use at SeqShop Workshop:''
<div class="mw-collapsible-content">
</div>
</div>
<div class="mw-collapsible mw-collapsed" style="width:500px">
''Commands outside of SeqShop Workshop:''
<div class="mw-collapsible-content">
We perform the same analysis again for the failed variants. The relatively low overlap with known data sets implies a reasonable tradeoff between sensitivity and specificity.
<div class="mw-collapsible mw-collapsed" style="width:500px">
''Command to use at SeqShop Workshop:''
<div class="mw-collapsible-content">
</div>
</div>
<div class="mw-collapsible mw-collapsed" style="width:500px">
''Command outside of SeqShop Workshop:''
<div class="mw-collapsible-content">
UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO.
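This page does not spell out the normalization algorithm. As an illustration only, the sketch below follows the commonly described left-align-and-trim procedure for a single alternate allele: trim shared trailing bases, extend to the left whenever an allele becomes empty, then trim shared leading bases while both alleles keep at least one base. Treating this as the algorithm referred to above is an assumption, and the function name and toy reference sequence are hypothetical.

```python
def normalize(pos, ref, alt, reference):
    """Left-align and trim one REF/ALT pair.

    pos is 1-based; reference is the chromosome sequence as a string,
    so reference[pos - 1] is the base at position pos.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in one reference base from the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend left of position 1 in this sketch")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; deleting one CA unit can be written at
# several positions, and normalization picks the leftmost representation.
toy_reference = "GGGCACACACAGGG"
result = normalize(8, "CAC", "C", toy_reference)
print(result)
```

Because every equivalent representation of the same deletion reduces to one form, normalized call sets can be compared by exact match on position and alleles.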
 
 
== Return to Workshop Wiki Page ==
Return to main workshop wiki page: [[SeqShop: December 2014]]
