Changes

From Genome Analysis Wiki
1,437 bytes added ,  16:09, 23 January 2013
no edit summary
Line 19: Line 19:  
<code>VcfFileReader</code> is declared in <code>VcfFileReader.h</code>, so be sure to include that file.
 
<code>VcfFileReader</code> is declared in <code>VcfFileReader.h</code>, so be sure to include that file.
 
<source lang="cpp">
 
<source lang="cpp">
#include "AspFileReader.h"
+
#include "VcfFileReader.h"
 
</source>
 
</source>
   Line 37: Line 37:  
</source>
 
</source>
   −
==== Subsetting Samples ====
+
==== Reading a Subset of Samples ====
 
To keep only a subset of samples, specify two additional arguments when opening the file: the name of a file listing the samples to keep, and the delimiter separating the sample names (default is a newline, '\n').
 
To keep only a subset of samples, specify two additional arguments when opening the file: the name of a file listing the samples to keep, and the delimiter separating the sample names (default is a newline, '\n').
 
<source lang="cpp">
 
<source lang="cpp">
Line 134: Line 134:  
The <code>VcfSubsetSamples* subset</code> parameter is a pointer to the subset of samples that you want to include when counting the number of alternate alleles.  If all samples that are read/kept are to be included, NULL should be passed in.   
 
The <code>VcfSubsetSamples* subset</code> parameter is a pointer to the subset of samples that you want to include when counting the number of alternate alleles.  If all samples that are read/kept are to be included, NULL should be passed in.   
   −
To specify a <code>VcfSubsetSamples</code> use the constructor:
+
See [[#Handling a Subset of Samples|Handling a Subset of Samples]] for how to use <code>VcfSubsetSamples</code>.
<source lang="cpp">
  −
void VcfSubsetSamples::init(const VcfHeader& header, bool include)
  −
</source>
  −
Pass in the header that was read from the VCF file.  Set <code>include</code> to true if all samples should be included except any that are specified as excluded. Set <code>include</code> to false if all samples should be excluded except any that are specified as included.  NOTE: the header is not modified to add/remove any samples.
  −
 
  −
To mark a specific sample as excluded use:
  −
<source lang ="cpp">
  −
bool VcfSubsetSamples::addExcludeSample(const char* sampleName);
  −
</source>
  −
To mark a specific sample as included use:
  −
<source lang ="cpp">
  −
bool VcfSubsetSamples::addIncludeSample(const char* sampleName);
  −
</source>
      
Use the following method to remove the DiscardMinAltAlleleCount rule:
 
Use the following method to remove the DiscardMinAltAlleleCount rule:
Line 153: Line 140:  
VcfFileReader::rmDiscardMinAltAlleleCount()
 
VcfFileReader::rmDiscardMinAltAlleleCount()
 
</source>
 
</source>
 +
 +
Example:  Minimum Alternate Allele Count = 4
 +
Sample1  Sample2  Sample3  Keep/Discard
 +
  0|0      1|1      2|2    Keep
 +
  0|0      0|1      2|2    Discard, only 3 Alternates (1 Allele1 & 2 Allele2)
 +
  0|0      1|1      1|2    Keep
 +
  0|2      1|1      2|2    Keep
 +
  2|1      0|1      2|0    Keep
 +
 +
Example:  Minimum Alternate Allele Count = 3 & Exclude Sample2 (without the exclusion, all would be kept)
 +
Sample1  Sample2  Sample3  Keep/Discard
 +
  0|0      1|1      2|2    Discard, only 2 Alternates (0 Allele1 & 2 Allele2)
 +
  0|0      0|1      2|2    Discard, only 2 Alternates (0 Allele1 & 2 Allele2)
 +
  0|0      1|1      1|2    Discard, only 2 Alternates (1 Allele1 & 1 Allele2)
 +
  0|2      1|1      2|2    Keep
 +
  2|1      0|1      2|0    Keep
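
The counting rule behind the two tables above can be sketched as follows. This is a minimal, self-contained illustration and not the libStatGen implementation: the <code>Genotype</code> alias, the <code>included</code> flag vector (a stand-in for the role of <code>VcfSubsetSamples</code>), and the functions <code>countAltAlleles</code> and <code>keepByAltCount</code> are hypothetical names. It assumes every genotype is fully called (no missing alleles).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sample's genotype as allele indexes, e.g. 0|1 -> {0, 1}.
// Index 0 is the reference allele; any index > 0 is an alternate allele.
using Genotype = std::vector<int>;

// Count alternate alleles over the samples of a single record.
// included[i] says whether sample i takes part in the count
// (hypothetical stand-in for a sample subset).
int countAltAlleles(const std::vector<Genotype>& samples,
                    const std::vector<bool>& included)
{
    assert(samples.size() == included.size());
    int count = 0;
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        if (!included[i])
        {
            continue;  // excluded samples do not contribute
        }
        for (int allele : samples[i])
        {
            if (allele > 0)
            {
                ++count;
            }
        }
    }
    return count;
}

// Keep the record when it has at least minAltAlleleCount alternates.
bool keepByAltCount(const std::vector<Genotype>& samples,
                    const std::vector<bool>& included,
                    int minAltAlleleCount)
{
    return countAltAlleles(samples, included) >= minAltAlleleCount;
}
```

For the second row of the tables (0|0, 0|1, 2|2), <code>countAltAlleles</code> gives 3 with all samples included and 2 with Sample2 excluded, which matches the Discard entries shown.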
 +
    
===== Minimum Minor Allele Count =====
 
===== Minimum Minor Allele Count =====
Line 164: Line 168:  
The <code>VcfSubsetSamples* subset</code> parameter is a pointer to the subset of samples that you want to include when counting the number of alleles.  If all samples that are read/kept are to be included, NULL should be passed in.   
 
The <code>VcfSubsetSamples* subset</code> parameter is a pointer to the subset of samples that you want to include when counting the number of alleles.  If all samples that are read/kept are to be included, NULL should be passed in.   
   −
To specify a <code>VcfSubsetSamples</code> use the constructor:
+
See [[#Handling a Subset of Samples|Handling a Subset of Samples]] for how to use <code>VcfSubsetSamples</code>.
<source lang="cpp">
  −
void VcfSubsetSamples::init(const VcfHeader& header, bool include)
  −
</source>
  −
Pass in the header that was read from the VCF file.  Set <code>include</code> to true if all samples should be included except any that are specified as excluded. Set <code>include</code> to false if all samples should be excluded except any that are specified as included.  NOTE: the header is not modified to add/remove any samples.
  −
 
  −
To mark a specific sample as excluded use:
  −
<source lang ="cpp">
  −
bool VcfSubsetSamples::addExcludeSample(const char* sampleName);
  −
</source>
  −
To mark a specific sample as included use:
  −
<source lang ="cpp">
  −
bool VcfSubsetSamples::addIncludeSample(const char* sampleName);
  −
</source>
      
Use the following method to remove the DiscardMinMinorAlleleCount rule:
 
Use the following method to remove the DiscardMinMinorAlleleCount rule:
Line 183: Line 174:  
VcfFileReader::rmDiscardMinMinorAlleleCount()
 
VcfFileReader::rmDiscardMinMinorAlleleCount()
 
</source>
 
</source>
 +
 +
Example:  Minimum Minor Allele Count = 2
 +
Sample1  Sample2  Sample3  Keep/Discard
 +
  0|0      1|1      2|2    Keep
 +
  0|0      0|1      2|2    Discard, only 1 Allele1
 +
  0|0      1|1      1|2    Discard, only 1 Allele2
 +
  0|2      1|1      2|2    Discard, only 1 Allele0
 +
  2|1      0|1      2|0    Keep
 +
 +
Example:  Minimum Minor Allele Count = 1 & Exclude Sample2 (without the exclusion, all would be kept)
 +
Sample1  Sample2  Sample3  Keep/Discard
 +
  0|0      1|1      2|2    Discard, 0 Allele1
 +
  0|0      0|1      2|2    Discard, 0 Allele1
 +
  0|0      1|1      1|2    Keep
 +
  0|2      1|1      2|2    Discard, 0 Allele1
 +
  2|1      0|1      2|0    Keep
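
The minor allele rule illustrated by the tables above can be sketched in the same way. This is an illustration inferred from the examples, not the library's code: <code>smallestAlleleCount</code>, <code>keepByMinorCount</code>, the <code>included</code> flag vector, and the <code>numAlleles</code> parameter (reference plus all alternates listed for the record, 3 in the tables) are hypothetical. Note that an allele listed in the record but absent from the counted samples has a count of 0, which is why rows are discarded with "0 Allele1".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One sample's genotype as allele indexes, e.g. 2|1 -> {2, 1}.
using Genotype = std::vector<int>;

// Return the smallest per-allele count over the included samples.
// numAlleles is the number of alleles in the record (reference + alternates).
int smallestAlleleCount(const std::vector<Genotype>& samples,
                        const std::vector<bool>& included,
                        int numAlleles)
{
    assert(samples.size() == included.size());
    assert(numAlleles > 0);
    std::vector<int> counts(static_cast<std::size_t>(numAlleles), 0);
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        if (!included[i])
        {
            continue;  // excluded samples do not contribute
        }
        for (int allele : samples[i])
        {
            if (allele >= 0 && allele < numAlleles)
            {
                ++counts[static_cast<std::size_t>(allele)];
            }
        }
    }
    return *std::min_element(counts.begin(), counts.end());
}

// Keep the record when every allele is seen at least minMinorAlleleCount times.
bool keepByMinorCount(const std::vector<Genotype>& samples,
                      const std::vector<bool>& included,
                      int numAlleles,
                      int minMinorAlleleCount)
{
    return smallestAlleleCount(samples, included, numAlleles) >= minMinorAlleleCount;
}
```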
    
==== Read only Certain Sections of the File / Using a VCF Index (TABIX) File ====
 
==== Read only Certain Sections of the File / Using a VCF Index (TABIX) File ====
Line 309: Line 316:  
=== VcfRecordFilter ===
 
=== VcfRecordFilter ===
 
<code>VcfRecords</code> contain the data from the <code>FILTER</code> field in a <code>VcfRecordFilter</code> object.
 
<code>VcfRecords</code> contain the data from the <code>FILTER</code> field in a <code>VcfRecordFilter</code> object.
 +
 +
 +
== Handling a Subset of Samples ==
 +
 +
If you only want to process/keep a subset of samples when reading a file, use [[#Reading a Subset of Samples|Reading a Subset of Samples]].  When that method is used, only the specified samples are stored, so any further processing operates only on those samples.
 +
 +
Some methods allow the user to specify a subset of samples to operate on.  The subset specified when reading the VCF file, if any, is automatically applied since only those samples were stored.  If a different/additional subset needs to be applied for other processing, you can use the <code>VcfSubsetSamples</code> class.
 +
 +
To set up a <code>VcfSubsetSamples</code> object, pass the already-populated VCF header to:
 +
<source lang="cpp">
 +
void VcfSubsetSamples::init(const VcfHeader& header, bool include)
 +
</source>
 +
Set the <code>include</code> parameter to:
 +
* true if all samples should be included except any that are specified as excluded.
 +
* false if all samples should be excluded except any that are specified as included.
 +
 +
NOTE: the header is not modified to add/remove any samples.
 +
 +
To mark a specific sample as excluded use:
 +
<source lang="cpp">
 +
bool VcfSubsetSamples::addExcludeSample(const char* sampleName);
 +
</source>
 +
To mark a specific sample as included use:
 +
<source lang="cpp">
 +
bool VcfSubsetSamples::addIncludeSample(const char* sampleName);
 +
</source>
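
The include/exclude semantics described above can be modelled with a small stand-alone class. This is only a sketch for illustration: <code>SubsetModel</code> is a hypothetical class, not <code>VcfSubsetSamples</code>, it takes a plain list of sample names in place of a <code>VcfHeader</code>, and the choice to return false for a name that is not in the list is an assumption rather than documented library behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in used for illustration; not the libStatGen class.
class SubsetModel
{
public:
    // Start with every sample included (include == true) or every sample
    // excluded (include == false). The sample list itself is not modified.
    void init(const std::vector<std::string>& headerSamples, bool include)
    {
        mySamples = headerSamples;
        myIncluded.assign(headerSamples.size(), include);
    }

    // Mark one sample as excluded / included by name.
    bool addExcludeSample(const std::string& name) { return setFlag(name, false); }
    bool addIncludeSample(const std::string& name) { return setFlag(name, true); }

    // True when the sample at this index is part of the subset.
    bool keep(std::size_t sampleIndex) const
    {
        assert(sampleIndex < myIncluded.size());
        return myIncluded[sampleIndex];
    }

private:
    // Assumption: returns false when the name is not in the sample list.
    bool setFlag(const std::string& name, bool value)
    {
        for (std::size_t i = 0; i < mySamples.size(); ++i)
        {
            if (mySamples[i] == name)
            {
                myIncluded[i] = value;
                return true;
            }
        }
        return false;
    }

    std::vector<std::string> mySamples;
    std::vector<bool> myIncluded;
};
```

Initializing with <code>include</code> set to true and then excluding Sample2 reproduces the "Exclude Sample2" setup used in the examples on this page; initializing with false and including individual samples gives the complementary approach.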