Changes

Line 10: Line 10:       −
* '''Convert files to VCF Format''' : Start by converting the unphased, quality controlled data set into VCF format. See our wiki page on .
+
* '''Convert files to VCF Format:''' Start by converting the unphased, quality controlled data set into VCF format. See our wiki page on [[Minimac3 Cookbook : Converting Files to VCF| Converting to VCF]] for more details on how to convert.
   −
* '''Split the data by Sex''' : Start by splitting the unphased, quality controlled data set by sex.
+
* '''Split the data into PAR and non-PAR:''' Separate the pseudo-autosomal part and non-pseudo-autosomal part into separate files. The non-PAR is located on <font face=Courier>'''chrX:2699520-154931043'''</font> on build hg19. The split can be done for VCF files as follows.
   −
* '''Split the data into PAR and non-PAR:''' Separate the pseudo-autosomal part and non-pseudo-autosomal part into separate files. The PAR is located on <font face=Courier>'''chrX:1-2709520'''</font> and <font face=Courier>'''chrX:154584238-154913754'''</font> on build hg18 and <font face=Courier>'''chrX:60001-2699519'''</font> and <font face=Courier>'''chrX:154931044-155260560'''</font> on build hg19. The split can be done for VCF files as follows (for build hg19):
+
  vcftools --gzvcf gwas.data.vcf.gz \
 
  −
  vcftools --gzvcf males.gwas.data.vcf.gz \
   
           --from-bp 2699520 \
 
           --to-bp 154931043 \
 
           --recode \
           --out males.non.PAR.gwas.data
+
           --out Non.PAR.gwas.data
 
  &nbsp;
  vcftools --gzvcf males.gwas.data.vcf.gz \
+
  vcftools --gzvcf gwas.data.vcf.gz \
           --exclude-positions males.non.PAR.gwas.data.recode.vcf \
+
           --exclude-positions Non.PAR.gwas.data.recode.vcf \
 +
          --recode \
 +
          --out PAR.gwas.data
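
A quick sanity check on the split can be sketched as follows. This is a minimal illustration, not part of vcftools: the helper name <font face=Courier>count_outside_nonpar</font> is hypothetical, and it assumes an uncompressed, tab-delimited VCF on standard input and the hg19 non-PAR interval quoted above.

```shell
# Hypothetical helper: count VCF records (read from stdin) whose position
# lies outside the hg19 non-PAR interval chrX:2699520-154931043.
count_outside_nonpar() {
    awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        $2 < 2699520 || $2 > 154931043 { n++ }         # POS is column 2
        END { print n + 0 }                            # print 0 when nothing matched
    '
}
```

Running <font face=Courier>count_outside_nonpar < Non.PAR.gwas.data.recode.vcf</font> should print 0 if the non-PAR file contains only non-PAR positions.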
 +
 
 +
'''NOTE''': After this step, please verify that the male samples have only one haplotype in <font face=Courier>Non.PAR.gwas.data.recode.vcf</font> and two haplotypes in <font face=Courier>PAR.gwas.data.recode.vcf</font>
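
One way to carry out this check is sketched below. The helper name <font face=Courier>ploidy_summary</font> is hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT sub-field. A genotype containing <font face=Courier>/</font> or <font face=Courier>|</font> is counted as diploid; anything else is counted as haploid.

```shell
# Hypothetical helper: for each sample in a VCF read from stdin, print
#   sample <TAB> number of haploid calls <TAB> number of diploid calls
ploidy_summary() {
    awk -F'\t' '
        /^##/ { next }                                 # skip meta-information lines
        /^#CHROM/ {                                    # sample names start at column 10
            for (i = 10; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")                      # GT assumed to be the first sub-field
                if (f[1] ~ "[/|]") dip[i]++; else hap[i]++
            }
        }
        END {
            for (i = 10; i <= ncol; i++)
                printf "%s\t%d\t%d\n", name[i], hap[i], dip[i]
        }
    '
}
```

For example, <font face=Courier>ploidy_summary < Non.PAR.gwas.data.recode.vcf</font> should report zero diploid calls for every male sample, and the same command on <font face=Courier>PAR.gwas.data.recode.vcf</font> should report zero haploid calls.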
 +
 
 +
* '''Split the non-PAR data by Sex:''' Separate the non-PAR data by sex, which can also be done with vcftools as follows. Note that <font face=Courier>PAR.gwas.data.recode.vcf</font> need NOT be separated, since both males and females are diploid there.
 +
 
 +
 # For the female samples, pass female.sample.list to --keep instead.
 vcftools --vcf Non.PAR.gwas.data.recode.vcf \
 +
          --keep male.sample.list \
 
           --recode \
           --out males.PAR.gwas.data
+
           --out Male.Non.PAR.gwas.data   ## or Female.Non.PAR.gwas.data
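
The sample lists passed to <font face=Courier>--keep</font> hold one sample ID per line. The sketch below shows one way to build them, under assumptions that are not stated on this page: the sex information comes from a PLINK-style <font face=Courier>.fam</font> file (column 2 is the individual ID, column 5 is sex with 1 = male and 2 = female), and the VCF sample IDs match the individual IDs. The function name is hypothetical.

```shell
# Hypothetical helper: write male.sample.list and female.sample.list
# from a PLINK-style .fam file (assumed input, see above).
split_samples_by_sex() {
    fam=$1
    awk '$5 == 1 { print $2 }' "$fam" > male.sample.list      # sex code 1 = male
    awk '$5 == 2 { print $2 }' "$fam" > female.sample.list    # sex code 2 = female
}
```

Samples with any other sex code end up in neither list and should be resolved before imputation.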
 +
 
 +
* '''Pre-phase PAR data and female non-PAR data:''' Of the three data sets, only the PAR data and the female non-PAR data are diploid and therefore need to be phased. The male non-PAR data is haploid and need not be phased. See our wiki pages on [[Minimac3 Cookbook : Pre-Phasing| Pre-Phasing]] and [[Minimac3 Cookbook : Converting Files to VCF| Converting to VCF]] for further details on pre-phasing and converting files back to VCF format.
   −
* '''Impute Sex and PAR/non-PAR separately:''' The following example illustrates how to do that (files available in <code>Minimac3/test/</code> directory)
+
* '''Impute Data:''' The following example illustrates how to impute into the phased PAR data (males and females together), the phased female non-PAR data, and the haploid male non-PAR data:
    
  # Male Samples (Non-PAR)
 
   ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf \
                   --haps targetStudyChrX.males.vcf \
+
                   --haps Phased.Male.Non.PAR.gwas.data.vcf \
 
                   --prefix testRun.males.Non.PAR
 
  &nbsp;
 
  # Female Samples (Non-PAR)
 
   ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf \
                   --haps targetStudyChrX.females.vcf \
+
                   --haps Phased.Female.Non.PAR.gwas.data.vcf \
 
                   --prefix testRun.females.Non.PAR
 
  &nbsp;
  # Male Samples (PAR)
+
  # All Samples (PAR)
  ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf \
  −
                  --haps targetStudyChrX.males.vcf \
  −
                  --prefix testRun.males.PAR
  −
&nbsp;
  −
# Female Samples (PAR)
   
   ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf \
                   --haps targetStudyChrX.females.vcf \
+
                   --haps PAR.gwas.data.recode.vcf \
                   --prefix testRun.females.PAR
+
                   --prefix testRun.All.PAR
    
* '''NOTE:''' When imputing the non-PAR region of chromosome X, male and female samples must be analyzed separately; otherwise the program will crash. The reference panel must also contain only the PAR or only the non-PAR region of chromosome X; otherwise the program will crash.
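
The three imputation runs can be kept consistent by generating the commands from one table. The sketch below only prints the commands and does not run Minimac3. The function name <font face=Courier>print_impute_cmds</font> is hypothetical, and the two reference panel paths are supplied by the caller, on the assumption that separate non-PAR and PAR panels are available as the note above requires. The input file names and prefixes are the ones used in the example on this page.

```shell
# Hypothetical helper: print (do not run) one Minimac3 command per data set.
#   $1 = non-PAR reference panel, $2 = PAR reference panel (caller-supplied paths)
print_impute_cmds() {
    nonpar_ref=$1
    par_ref=$2
    while read -r region haps prefix; do
        case $region in
            nonPAR) ref=$nonpar_ref ;;
            PAR)    ref=$par_ref ;;
        esac
        printf '../bin/Minimac3 --refHaps %s --haps %s --prefix %s\n' \
               "$ref" "$haps" "$prefix"
    done <<'EOF'
nonPAR Phased.Male.Non.PAR.gwas.data.vcf testRun.males.Non.PAR
nonPAR Phased.Female.Non.PAR.gwas.data.vcf testRun.females.Non.PAR
PAR PAR.gwas.data.recode.vcf testRun.All.PAR
EOF
}
```

Each printed line can be reviewed and then run by hand, which keeps the male, female, and PAR runs visibly separate.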