Changes

From Genome Analysis Wiki
752 bytes removed, 10:22, 26 October 2016
Line 35: Line 35:     
== Input files ==
 
== Input files ==
* A ped file, with 5 columns [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/ see merlin documentation]]. An example ped file is as follows
+
* A ped file, with 5 columns [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/ see merlin documentation]]. An example ped file is as follows (note that you can mix trios with other nuclear families in the same VCF file):
 
  quartet1 p1  0  0  1
 
  quartet1 p1  0  0  1
 
  quartet1 p2  0  0  2
 
  quartet1 p2  0  0  2
 
  quartet1 p3  p1 p2  1
 
  quartet1 p3  p1 p2  1
 +
quartet1 p4  p1 p2  1
 +
nuc1 p5  0  0  1
 +
nuc1 p6  0  0  2
 +
nuc1 p7  p5 p6  1
 +
nuc1 p8  p5 p6  1
 +
nuc1 p9  p5 p6  1
 +
trio1 p10  0  0  1
 +
trio1 p11  0  0  2
 +
trio1 p12  p10 p11  1
 +
trio2 p13  0  0  1
 +
trio2 p14  0  0  2
 +
trio2 p15  p13 p14  1
    
* A VCF file [[http://www.1000genomes.org/node/101 VCF specs]]. It can contain variant information for more individuals than in the ped file.
 
* A VCF file [[http://www.1000genomes.org/node/101 VCF specs]]. It can contain variant information for more individuals than in the ped file.
 
** Note: In the VCF file either PL or GL has to be provided, and only the PL (or GL) field is used in the calling.
 
** Note: In the VCF file either PL or GL has to be provided, and only the PL (or GL) field is used in the calling.
   −
* A map file in the PLINK format. See below for examples of how to generate a high-quality map file.
+
* A map file in the PLINK format. See below for examples of how to generate a map file with common, high-quality variants.
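
Because a ped file can mix several families, it can help to check that every non-founder's parents are listed in the same family before running the caller. The following is a minimal Python sketch, not part of the tool itself. It assumes the five columns are family ID, individual ID, father ID, mother ID and sex, with 0 marking a founder's missing parent; the function names <code>parse_ped</code> and <code>check_parents</code> are hypothetical.

```python
from collections import defaultdict


def parse_ped(lines):
    """Split whitespace-delimited ped lines into 5-field tuples.

    Assumed column order: family, individual, father, mother, sex.
    Blank lines are skipped; any other column count raises ValueError.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        records.append(tuple(fields))
    return records


def check_parents(records):
    """Report parents that are not listed as members of the child's family."""
    members = defaultdict(set)
    for family, individual, _father, _mother, _sex in records:
        members[family].add(individual)

    problems = []
    for family, individual, father, mother, _sex in records:
        for parent in (father, mother):
            # "0" is assumed to mean "no parent recorded" (a founder).
            if parent != "0" and parent not in members[family]:
                problems.append(
                    f"{family}/{individual}: parent {parent} not found in family {family}"
                )
    return problems


# Illustrative data only: the last line names parents from another family.
example = """\
quartet1 p1  0  0  1
quartet1 p2  0  0  2
quartet1 p3  p1 p2  1
trio1 p10  0  0  1
trio1 p11  0  0  2
trio1 p12  p1 p2  1
"""

records = parse_ped(example.splitlines())
problems = check_parents(records)
for problem in problems:
    print(problem)
```

An empty list from <code>check_parents</code> means every listed parent was found in the child's own family. The check does not verify sex codes or that the sample IDs are present in the VCF file.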
    
== Examples of generating the map file ==
 
== Examples of generating the map file ==
Line 77: Line 89:     
== Filtering ==
 
== Filtering ==
We recommend two filtering strategies. The first is a simple filtering and the second one is more advance. Please see the triodenovo page below for more information:
+
We recommend two filtering strategies. The first is a simple filtering and the second one is more advanced. Please see the triodenovo page below for more information:
    
http://genome.sph.umich.edu/wiki/Triodenovo
 
http://genome.sph.umich.edu/wiki/Triodenovo
  −
3. Further thoughts about filtering for SNVs without bam files (step 2 requires bam files). There is no consensus on filtering so this can be very flexible.
  −
* If you have a multi-sample call VCF it may be helpful to select those mutation candidates that appear only once in your VCF (AC=1 for example). This can be the top tier to consider. Relaxing AC to 2 or 3 can recover more real mutations but also increase false positives.
  −
* If it is too stringent to filter out known sites, it may be helpful to select candidates that have low (e.g. <0.002) 1000G or ESP allele frequencies. Some mutations can occur at known variant sites, but mutations with high population frequencies may not be of great interest, even if they are real.
  −
* Candidates in segmental duplications, low complexity regions or other copy number regions may be flagged for further analysis.
  −
* Candidates for which parents are not hom-ref or offspring is a double mutant are more likely to be due to artifacts so the interpretation of these candidates may require additional QC if they appear to be interesting to the investigators.
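
The allele-count and allele-frequency suggestions above can be expressed as a simple tiering rule. This is only a sketch in Python under stated assumptions: the candidates are assumed to be already annotated with an allele count (<code>ac</code>) and a population allele frequency (<code>pop_af</code>, or <code>None</code> if the site is not in 1000G or ESP). These field names and the function <code>passes_simple_filter</code> are hypothetical and do not come from any tool. The thresholds are the example values mentioned in the text, not recommended defaults.

```python
def passes_simple_filter(ac, pop_af, max_ac=1, max_af=0.002):
    """Return True if a candidate is rare in the call set and in the population.

    ac      -- allele count of the candidate in the multi-sample VCF
    pop_af  -- population allele frequency, or None if the site is not known
    max_ac  -- largest allele count accepted (1 is the strictest tier)
    max_af  -- population frequency must be strictly below this value
    """
    if ac > max_ac:
        return False
    if pop_af is not None and pop_af >= max_af:
        return False
    return True


# Illustrative candidates only; the values are made up for the example.
candidates = [
    {"id": "c1", "ac": 1, "pop_af": None},
    {"id": "c2", "ac": 3, "pop_af": 0.0005},
    {"id": "c3", "ac": 1, "pop_af": 0.01},
]

# Strictest tier: seen once in the VCF and rare or absent in the population.
top = [c["id"] for c in candidates
       if passes_simple_filter(c["ac"], c["pop_af"])]

# Relaxed tier: allow an allele count of up to 3.
relaxed = [c["id"] for c in candidates
           if passes_simple_filter(c["ac"], c["pop_af"], max_ac=3)]

print(top)
print(relaxed)
```

Relaxing <code>max_ac</code> admits more candidates, which matches the trade-off described above: more real mutations may be recovered, at the cost of more false positives. Flags for segmental duplications or parental genotypes are not covered by this sketch.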
      
== Download ==
 
== Download ==