From Genome Analysis Wiki
752 bytes removed
, 10:22, 26 October 2016
Line 35: |
Line 35: |
| | | |
| == Input files == | | == Input files == |
− | * A ped file, with 5 columns [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/ see merlin documentation]]. An example ped file is as follows | + | * A ped file, with 5 columns [[http://www.sph.umich.edu/csg/abecasis/merlin/tour/ see merlin documentation]]. An example ped file is as follows (note that you can mix trios with other nuclear families in the same VCF file): |
| quartet1 p1 0 0 1 | | quartet1 p1 0 0 1 |
| quartet1 p2 0 0 2 | | quartet1 p2 0 0 2 |
| quartet1 p3 p1 p2 1 | | quartet1 p3 p1 p2 1 |
| + | quartet1 p4 p1 p2 1 |
| + | nuc1 p5 0 0 1 |
| + | nuc1 p6 0 0 2 |
| + | nuc1 p7 p5 p6 1 |
| + | nuc1 p8 p5 p6 1 |
| + | nuc1 p9 p5 p6 1 |
| + | trio1 p10 0 0 1 |
| + | trio1 p11 0 0 2 |
| + | trio1 p12 p10 p11 1 |
| + | trio2 p13 0 0 1 |
| + | trio2 p14 0 0 2 |
| + | trio2 p15 p13 p14 1 |
| | | |
| * A VCF file [[http://www.1000genomes.org/node/101 VCF specs]]. It can contain variant information for more individuals than in the ped file. | | * A VCF file [[http://www.1000genomes.org/node/101 VCF specs]]. It can contain variant information for more individuals than in the ped file. |
| ** Note: In the VCF file either PL or GL has to be provided, and only the PL (or GL) field is used in the calling. | | ** Note: In the VCF file either PL or GL has to be provided, and only the PL (or GL) field is used in the calling. |
| | | |
− | * A map file in the PLINK format. See below for examples of how to generate a high quality map file. | + | * A map file in the PLINK format. See below for examples of how to generate a map file with common and high quality variants. |
| | | |
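A ped file of this shape is easy to get subtly wrong when several families are combined, for example through a misspelled family ID or a parent ID that belongs to a different family. The following is a minimal sketch of a consistency check, not part of the tool itself. It assumes whitespace-separated records with the five columns family ID, individual ID, father ID, mother ID and sex, and that `0` marks a missing parent, as in the example above. The function name `check_ped` is hypothetical.

```python
from collections import defaultdict


def check_ped(lines):
    """Return a list of problems found in 5-column ped records.

    Assumed column order: family ID, individual ID, father ID,
    mother ID, sex. A parent ID of "0" is treated as "no parent listed".
    """
    problems = []
    # family ID -> {individual ID: (father ID, mother ID)}
    families = defaultdict(dict)

    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            problems.append(f"line {n}: expected 5 columns, found {len(fields)}")
            continue
        fam, ind, father, mother, _sex = fields
        if ind in families[fam]:
            problems.append(f"line {n}: duplicate individual {ind} in family {fam}")
        families[fam][ind] = (father, mother)

    # Every non-missing parent must be listed in the same family as the child.
    for fam, members in families.items():
        for ind, (father, mother) in members.items():
            for role, parent in (("father", father), ("mother", mother)):
                if parent != "0" and parent not in members:
                    problems.append(
                        f"{fam}/{ind}: {role} {parent} is not listed in family {fam}"
                    )
    return problems


# Usage: the last record has a misspelled family ID, so its parents are not found.
example = [
    "trio1 p10 0 0 1",
    "trio1 p11 0 0 2",
    "troi1 p12 p10 p11 1",
]
for problem in check_ped(example):
    print(problem)
```

Checking parents per family rather than globally is deliberate: an ID that exists in another family would otherwise mask the error.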
| == Examples of generating the map file == | | == Examples of generating the map file == |
Line 77: |
Line 89: |
| | | |
| == Filtering == | | == Filtering == |
− | We recommend two filtering strategies. The first is a simple filtering and the second one is more advance. Please see the triodenovo page below for more information: | + | We recommend two filtering strategies. The first is a simple filtering and the second one is more advanced. Please see the triodenovo page below for more information: |
| | | |
| http://genome.sph.umich.edu/wiki/Triodenovo | | http://genome.sph.umich.edu/wiki/Triodenovo |
− |
| |
− | 3. Further thoughts about filtering for SNVs without bam files (step 2 requires bam files). There is no consensus on filtering so this can be very flexible.
| |
− | * If you have a multi-sample call VCF it may be helpful to select those mutation candidates that appear only once in your VCF (AC=1 for example). This can be the top tier to consider. Relaxing AC to 2 or 3 can recover more real mutations but also increase false positives.
| |
− | * If it is too stringent to filter out known sites, it may be helpful to select candidates that have low (e.g. <0.002) 1000G or ESP allele frequencies. Some mutations can occur on known variant sites but mutations with high population frequencies may not be of great interest, if indeed they are real.
| |
− | * Candidates in segmental duplications, low complexity regions or other copy number regions may be flagged for further analysis.
| |
− | * Candidates for which parents are not hom-ref or offspring is a double mutant are more likely to be due to artifacts so the interpretation of these candidates may require additional QC if they appear to be interesting to the investigators.
| |
| | | |
| == Download == | | == Download == |