Changes

From Genome Analysis Wiki
78 bytes removed ,  10:22, 26 October 2016
Line 35: Line 35:     
== Input files ==
− * A ped file, with 5 columns [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/ see merlin documentation]]. An example ped file is as follows
+ * A ped file, with 5 columns [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/ see merlin documentation]]. An example ped file is as follows (Note that you can mix trios with other nuclear families in the same VCF file):
 
  quartet1 p1  0  0  1
  quartet1 p2  0  0  2
  quartet1 p3  p1 p2  1
+ quartet1 p4  p1 p2  1
+ nuc1 p5  0  0  1
+ nuc1 p6  0  0  2
+ nuc1 p7  p5 p6  1
+ nuc1 p8  p5 p6  1
+ nuc1 p9  p5 p6  1
+ trio1 p10  0  0  1
+ trio1 p11  0  0  2
+ trio1 p12  p10 p11  1
+ trio2 p13  0  0  1
+ trio2 p14  0  0  2
+ trio2 p15  p13 p14  1
    
* A VCF file [[http://www.1000genomes.org/node/101 VCF specs]]. It can contain variant information for more individuals than in the ped file.
** Note: In the VCF file either PL or GL has to be provided, and only the PL (or GL) field is used in the calling.
+
+ * A map file in the PLINK format. See below for examples of how to generate a map file with common, high-quality variants.
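The input requirements listed above can be checked before running the caller. The following is only a minimal sketch, not part of the tool: the function names `check_ped` and `has_likelihoods` are hypothetical, and it assumes (as in Merlin-style pedigree files) that a non-zero father or mother ID must refer to an individual in the same family.

```python
from collections import defaultdict


def check_ped(lines):
    """Return a list of problems found in 5-column ped records.

    Assumed columns: family ID, individual ID, father ID, mother ID, sex.
    A parent ID of "0" marks a founder.
    """
    problems = []
    members = defaultdict(set)  # family ID -> individual IDs seen in it
    records = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            problems.append(f"line {n}: expected 5 columns, found {len(fields)}")
            continue
        fam, ind, father, mother, _sex = fields
        members[fam].add(ind)
        records.append((n, fam, father, mother))
    # Second pass: every non-founder parent must be listed in the same family.
    for n, fam, father, mother in records:
        for parent in (father, mother):
            if parent != "0" and parent not in members[fam]:
                problems.append(f"line {n}: parent {parent} is not in family {fam}")
    return problems


def has_likelihoods(vcf_lines):
    """True if the first data record's FORMAT column lists PL or GL."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header and meta lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            return False  # no FORMAT column, so no per-sample fields
        keys = fields[8].split(":")
        return "PL" in keys or "GL" in keys
    return False


if __name__ == "__main__":
    ped = [
        "trio1 p10 0 0 1",
        "trio1 p11 0 0 2",
        "trio1 p12 p10 p11 1",
    ]
    print(check_ped(ped))
    print(has_likelihoods(["1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PL\t0/1:10,0,20"]))
```

Only the first data record of the VCF is inspected, which is a shortcut; a stricter check would look at every record.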
+
+ == Examples of generating the map file ==
+ * vcf2map: generate a sparse map file (see [[#Download|Download]] for the files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz)
+   vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map
+
+ * User-defined r2 cutoff for LD pruning and minimum average depth for filtering
+   vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
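When the same command is run with several settings, it may help to build the argument list in a script. The sketch below is an illustration only: the helper `vcf2map_command` is hypothetical, and it uses just the options that appear in the two examples above, treating `--max_r2` and `--min_avg_dp` as optional.

```python
import shlex


def vcf2map_command(vcf, ped, genetic_map, include_list, out_map,
                    max_r2=None, min_avg_dp=None):
    """Build a vcf2map argument list from the options shown in the examples."""
    cmd = ["vcf2map", "--vcf", vcf, "--ped", ped, "--map", genetic_map,
           "--include_list", include_list]
    if max_r2 is not None:
        cmd += ["--max_r2", str(max_r2)]  # r2 cutoff for LD pruning
    if min_avg_dp is not None:
        cmd += ["--min_avg_dp", str(min_avg_dp)]  # minimum average depth
    cmd += ["--out_map", out_map]
    return cmd


cmd = vcf2map_command(
    "input.vcf", "input.ped", "genetic_map_GRCh37_chr1.txt",
    "1000G.SNV.clean.MAF0.05.tbl.gz", "chr1.r0.2.map",
    max_r2=0.2, min_avg_dp=2,
)
print(shlex.join(cmd))  # the command as a single shell-quoted string
```

The list can be passed to `subprocess.run(cmd, check=True)` if vcf2map is on the PATH; the sketch itself only prints the command.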
    
== Output ==
Line 68: Line 89:     
== Filtering ==
− We recommend two filtering strategies. The first is a simple filtering and the second one is more advance. Please see the triodenovo page below for more information:
+ We recommend two filtering strategies. The first is a simple filtering and the second one is more advanced. Please see the triodenovo page below for more information:
    
http://genome.sph.umich.edu/wiki/Triodenovo
− 3. Further thoughts about filtering for SNVs without bam files (step 2 requires bam files). There is no consensus on filtering so this can be very flexible.
− * If you have a multi-sample call VCF it may be helpful to select those mutation candidates that appear only once in your VCF (AC=1 for example). This can be the top tier to consider. Relaxing AC to 2 or 3 can recover more real mutations but also increases false positives.
− * If it is too stringent to filter out known sites, it may be helpful to select candidates that have low (e.g. <0.002) 1000G or ESP allele frequencies. Some mutations can occur on known variant sites but mutations with high population frequencies may not be of great interest, if indeed they are real.
− * Candidates in segmental duplications, low complexity regions or other copy number regions may be flagged for further analysis.
− * Candidates for which parents are not hom-ref or offspring is a double mutant are more likely to be due to artifacts so the interpretation of these candidates may require additional QC if they appear to be interesting to the investigators.
      
== Download ==