Line 1: |
Line 1: |
| + | '''Note:''' the latest version of this practical is available at: [[SeqShop: Sequence Mapping and Assembly Practical]] |
| + | * The ones here is the original one from the June workshop (updated to be run from elsewhere) |
| + | |
| == Introduction == | | == Introduction == |
− | See the [[introductory slides]] for an intro to this tutorial. | + | See the [[Media:SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial. |
| + | |
| == Goals of This Session == | | == Goals of This Session == |
| * What we want to learn | | * What we want to learn |
Line 7: |
Line 11: |
| ** How to evaluate the quality of sequence data | | ** How to evaluate the quality of sequence data |
| ** How to visualize sequence data to examine the reads aligned to particular genomic positions | | ** How to visualize sequence data to examine the reads aligned to particular genomic positions |
| + | |
| + | == Setup in person at the SeqShop Workshop == |
| + | ''This section is specifically for the SeqShop Workshop computers.'' |
| + | <div class="mw-collapsible mw-collapsed" style="width:600px"> |
| + | ''If you are not running during the SeqShop Workshop, please skip this section.'' |
| + | <div class="mw-collapsible-content"> |
| | | |
| | | |
| {{SeqShopLogin}} | | {{SeqShopLogin}} |
| | | |
− | == Setup your run environment== | + | === Setup your run environment=== |
| | | |
| This will setup some environment variables to point you to | | This will setup some environment variables to point you to |
− | * GotCloud program | + | * [[GotCloud]] program |
| * Tutorial input files | | * Tutorial input files |
| * Setup an output directory | | * Setup an output directory |
| source /home/mktrost/seqshop/setup.txt | | source /home/mktrost/seqshop/setup.txt |
| + | * You won't see any output after running <code>source</code> |
| + | ** It silently sets up your environment |
| <div class="mw-collapsible mw-collapsed" style="width:200px"> | | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
| View setup.txt | | View setup.txt |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | [[File:setup.png|300px]] | + | [[File:setup.png|500px]] |
| + | </div> |
| + | </div> |
| </div> | | </div> |
| </div> | | </div> |
| | | |
| + | == Setup when running on your own outside of the SeqShop Workshop == |
| + | ''This section is specifically for running on your own outside of the SeqShop Workshop.'' |
| + | <div class="mw-collapsible" style="width:600px"> |
| + | ''If you are running during the SeqShop Workshop, please skip this section.'' |
| + | <div class="mw-collapsible-content"> |
| + | === Download & Build GotCloud === |
| + | If you do not already have GotCloud: |
| + | * cd to where you want GotCloud installed (you can change this to any directory you want) |
| + | mkdir -p ~/seqshop |
| + | cd ~/seqshop/ |
| + | * download, decompress, and build the version of gotcloud that was tested with this tutorial: |
| + | wget https://github.com/statgen/gotcloud/archive/gotcloud.workshop.tar.gz |
| + | tar xvf gotcloud.workshop.tar.gz |
| + | mv gotcloud-gotcloud.workshop gotcloud |
| + | cd gotcloud/src |
| + | make |
| + | cd ../.. |
| + | |
| + | Remember the path to gotcloud/ that is what you will need to set your GC variable to. |
| + | |
| + | === Download the example data === |
| + | Download and untar file containing the example data used in the practicals: |
| + | wget http://csg.sph.umich.edu//mktrost/seqshopExample.tar.gz |
| + | tar xvf seqshopExample.tar.gz |
| + | |
| + | You will see the names of all the files included in the example data scrolling on the screen as they are unpacked from the tar file. |
| | | |
− | Alternatively, if you would like to change the output directory, copy the file, make the modifications and source your own file:
| + | {{SeqShopRemoteEnv}} |
− | cp /home/mktrost/seqshop/setup.txt ~/setup.txt
| |
− | nedit ~/setup.txt
| |
− | source ~/setup.txt
| |
− | (You can use your favorite editor instead of nedit. I typically use emacs, but nedit is more like Windows.)
| |
| | | |
− | == Examining Raw Sequence Reads == | + | == Examining [[GotCloud]] Align Input Files == |
| + | === Examining Raw Sequence Reads : FASTQs === |
| FASTQ : standard file format provided to you by those who did the sequencing. | | FASTQ : standard file format provided to you by those who did the sequencing. |
| : For more information on the FASTQ format, see: http://en.wikipedia.org/wiki/FASTQ_format | | : For more information on the FASTQ format, see: http://en.wikipedia.org/wiki/FASTQ_format |
Line 39: |
Line 76: |
| * Subset of FASTQs - should map to chromosome 22 36000000-37000000 | | * Subset of FASTQs - should map to chromosome 22 36000000-37000000 |
| | | |
− | ls ${IN}/fastq/ | + | ls ${SS}/fastq/ |
| There are 24 fastq files: combination of single-end & paired-end. | | There are 24 fastq files: combination of single-end & paired-end. |
| | | |
Line 58: |
Line 95: |
| | | |
| Look at a couple of FASTQs: | | Look at a couple of FASTQs: |
− | less -S ${IN}/fastq/HG00551.SRR190851_1.fastq | + | less -S ${SS}/fastq/HG00551.SRR190851_1.fastq |
| <code>less</code> is a Linux command that allows you to look at a file. | | <code>less</code> is a Linux command that allows you to look at a file. |
− | *<code>-S</code> option prevents line wrap. | + | *<code>-S</code> option prevents line wrap |
− | * Use the arrow (up/down/left/right) keys to scroll through the file. | + | * Use the arrow (up/down/left/right) keys to scroll through the file |
− | * Use <code>zless</code> if the file is compressed. | + | * Use the <code>space bar</code> to jump down a page |
| Use <code>'q'</code> to exit out of <code>less</code> | | Use <code>'q'</code> to exit out of <code>less</code> |
| q | | q |
Line 78: |
Line 115: |
| | | |
| Look at the paired read: | | Look at the paired read: |
− | less -S ${IN}/fastq/HG00551.SRR190851_2.fastq | + | less -S ${SS}/fastq/HG00551.SRR190851_2.fastq |
| + | |
| + | Remember, use <code>'q'</code> to exit out of <code>less</code> |
| + | q |
| | | |
| ;Do you notice something in common? | | ;Do you notice something in common? |
Line 94: |
Line 134: |
| </ul> | | </ul> |
| </ul> | | </ul> |
− |
| |
− | == GotCloud Alignment Pipeline==
| |
− |
| |
− | [[File:AlignDiagram.png|500px]]
| |
− |
| |
− | === Why GotCloud?===
| |
− | * Easy to learn & run
| |
− | ** All-in-one sequence analysis pipeline
| |
− | ** You don’t have to know the details of individual component
| |
− | * Robust parallelization
| |
− | ** Automatic partition of multi-sample jobs
| |
− | ** Reliable and fault-tolerant parallelization via GNU make
| |
− | *** Restart from where it stopped upon unexpected crash
| |
− | * Cloud & Cluster-friendly
| |
− | ** Supports multiple clusters such as MOSIX, Slurm, & SGE
| |
− | ** Amazon instances allow running large-scale jobs without having your own cluster
| |
− |
| |
− | Sequence Processing Recommendations
| |
− | * Be consistent within a project
| |
− | ** Process all samples with same pipeline
| |
− | *** Batch effect may arise if different pipelines are used due to pipeline differences
| |
− | ** Use the same configuration within a project
| |
− |
| |
− | == Examining GotCloud Align Input Files ==
| |
− | === Sequence Data Files : FASTQs ===
| |
− | We already looked at those in: [[#Examining Raw Sequence Reads|Examining Raw Sequence Reads]]
| |
| | | |
| === Reference Files === | | === Reference Files === |
− | Reference files can be downloaded with GotCloud or from other sources | + | Reference files can be downloaded with [[GotCloud]] or from other sources |
| * See [[GotCloud: Genetic Reference and Resource Files]] for more information on downloading/generating reference files | | * See [[GotCloud: Genetic Reference and Resource Files]] for more information on downloading/generating reference files |
| | | |
Line 134: |
Line 148: |
| | | |
| Take a look at the chromosome 22 reference files included for this tutorial: | | Take a look at the chromosome 22 reference files included for this tutorial: |
− | ls ${REF} | + | ls ${SS}/ref22 |
| | | |
| <ul> | | <ul> |
Line 140: |
Line 154: |
| <li>View Screenshot</li> | | <li>View Screenshot</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | [[File:RefDir.png|500px]] | + | [[File:RefDir.png|700px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 146: |
Line 160: |
| | | |
| Let's read the reference FASTA file (all reference bases for the chromosome): | | Let's read the reference FASTA file (all reference bases for the chromosome): |
− | less ${REF}/human.g1k.v37.chr22.fa | + | less ${SS}/ref22/human.g1k.v37.chr22.fa |
| + | |
| + | Remember, use <code>'q'</code> to exit out of <code>less</code> |
| + | q |
| | | |
| ; Where is the reference sequence? | | ; Where is the reference sequence? |
Line 156: |
Line 173: |
| <li>The ends of a chromosome are 'N' - unknown bases</li> | | <li>The ends of a chromosome are 'N' - unknown bases</li> |
| <li>Let's look at 5 lines of the file starting at line 300,000</li> | | <li>Let's look at 5 lines of the file starting at line 300,000</li> |
− | tail -n+300000 ${REF}/human.g1k.v37.chr22.fa |head -n 5 | + | tail -n+300000 ${SS}/ref22/human.g1k.v37.chr22.fa |head -n 5 |
| [[File:Fasta.png|500px]] | | [[File:Fasta.png|500px]] |
| </div> | | </div> |
Line 162: |
Line 179: |
| </ul> | | </ul> |
| </ul> | | </ul> |
| + | |
| + | If you want to access the FASTA file by position, you can use <code>samtools faidx</code> command |
| + | ${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000 | less |
| + | or |
| + | ${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000-36000100 |
| | | |
| === GotCloud FASTQ Index File === | | === GotCloud FASTQ Index File === |
Line 179: |
Line 201: |
| | | |
| Let's look a look at the index file I prepared for this tutorial: | | Let's look a look at the index file I prepared for this tutorial: |
− | less -S ${IN}/align.index | + | less -S ${SS}/align.index |
| + | |
| + | Remember, use <code>'q'</code> to exit out of <code>less</code> |
| + | q |
| | | |
− | ; Which samples had multiple Runs? | + | ; Which samples had multiple runs? |
| <ul> | | <ul> |
| <div class="mw-collapsible mw-collapsed" style="width:500px"> | | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| <li>Need a reminder of the format?</li> | | <li>Need a reminder of the format?</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | [[File:fqindex.png|650px]] | + | [[File:fqindex.png|750px]] |
| + | </div> |
| + | </div> |
| + | <ul> |
| + | <li>Note: in the screenshots, the fields are shifted into clear columns to make it easier to read</li> |
| + | <ul> |
| + | <li>When you view the file, the fields will not line up in neat columns and it can be hard to read</li> |
| + | </ul> |
| + | </ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| + | <li>Hard to read the index? Need a hint?</li> |
| + | <div class="mw-collapsible-content"> |
| + | <ul> |
| + | <li>Use cut to extract just the MERGE_NAME & RGID fields </li> |
| + | cut -f 1,4 ${SS}/align.index |
| + | </ul> |
| </div> | | </div> |
| </div> | | </div> |
Line 195: |
Line 235: |
| <li>HG00553 & HG00640</li> | | <li>HG00553 & HG00640</li> |
| <li>They have multiple unique values in the RGID field</li> | | <li>They have multiple unique values in the RGID field</li> |
− | [[File:fqindexRG.png|650px]] | + | [[File:fqindexRG.png|800px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 202: |
Line 242: |
| | | |
| | | |
− | How do you point Gotcloud to your index file? | + | How do you point GotCloud to your index file? |
| * Command-line <code>--index_file</code> option | | * Command-line <code>--index_file</code> option |
| : or | | : or |
Line 215: |
Line 255: |
| ** Most values should be left as the defaults | | ** Most values should be left as the defaults |
| * Specify values in your configuration file as: | | * Specify values in your configuration file as: |
− | KEY = value
| + | ** <code>KEY = value</code> |
| * Use $(KEY) to refer to another key's value | | * Use $(KEY) to refer to another key's value |
| * If a KEY is specified twice, the later value is used | | * If a KEY is specified twice, the later value is used |
Line 222: |
Line 262: |
| | | |
| Let's look at the configuration file I created for this test: | | Let's look at the configuration file I created for this test: |
− | more ${IN}/gotcloud.conf | + | more ${SS}/gotcloud.conf |
| + | |
| + | Use the <code>space bar</code> to advance if the whole file isn't displayed. |
| | | |
− | ; If your input/reference are at a different path, what would you change? | + | ; If your references are in a different path than what is specified, what would you change? |
| <ul> | | <ul> |
− | <div class="mw-collapsible mw-collapsed" style="width:200px"> | + | <div class="mw-collapsible mw-collapsed" style="width:300px"> |
| <li>Answer:</li> | | <li>Answer:</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| <ul> | | <ul> |
− | <li>Change <code>IN_DIR</code> & <code>REF_DIR</code></li> | + | <li>You would change <code>REF_DIR</code> to the new path</li> |
− | [[File:gcConf.png|500px]] | + | [[File:gcConf.png|800px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 237: |
Line 279: |
| </ul> | | </ul> |
| | | |
− | == Run GotCloud Align == | + | == Run [[GotCloud]] Align == |
| + | |
| + | [[File:AlignDiagram.png|500px]] |
| + | |
| Now that we have all of our input files, we need just a simple command to run them | | Now that we have all of our input files, we need just a simple command to run them |
− | ${GC}/gotcloud align --conf ${IN}/gotcloud.conf --numcs 2 | + | ${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numcs 2 --base_prefix ${SS} --outdir ${OUT} |
| + | |
| + | * <code>${GC}/gotcloud</code> runs GotCloud |
| + | * <code>align</code> tells GotCloud you want to run the alignment pipeline. |
| + | * <code>--conf</code> tells GotCloud the name of the configuration file to use. |
| + | ** The configuration for this test was downloaded with the seqshop input files. |
| * <code>--numcs</code> means to run 2 samples at a time. | | * <code>--numcs</code> means to run 2 samples at a time. |
− | ** How many you can run concurrently depends on your system | + | ** How many you can run concurrently depends on your system. |
| + | * <code>--base_prefix</code> tells GotCloud the prefix to append to relative paths. |
| + | ** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS} |
| + | ** Alternatively, gotcloud.conf could be updated to specify the full paths |
| + | * <code>--out_dir</code> tells GotCloud where to write the output. |
| + | ** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line |
| + | |
| [[File:gcalignStart.png|850px]] | | [[File:gcalignStart.png|850px]] |
| | | |
Line 249: |
Line 305: |
| | | |
| If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off. | | If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off. |
| + | |
| + | Inside GotCloud align, not only sequence alignment but also pre-processing of sequence data, including deduplication and base quality recalibration are performed along with quality assessment, as illustrated below. |
| + | |
| + | [[File:Gotcloud_align_detail.png|500px]] |
| | | |
| == Examining GotCloud Align Output == | | == Examining GotCloud Align Output == |
| | | |
| Let's look at the output directory: | | Let's look at the output directory: |
− | ls ${OUTPUT} | + | ls ${OUT} |
| [[File:gcalignOutM.png|600px]] | | [[File:gcalignOutM.png|600px]] |
| | | |
− | Let's look at the BAMs (aligned reads that are ready for variant calling):
| + | === Quality Control Files === |
− | ls ${OUTPUT}/bams
| |
− | [[File:GcalignOutBAMm.png|600px]]
| |
− | | |
− | BAM Files:
| |
− | * Binary Sequence Alignment/Map (SAM) Format
| |
− | * Maps reads to Chromosome/Position
| |
− | * For a detailed explanation of the SAM/BAM format, see:
| |
− | ** SAM/BAM Spec: http://samtools.github.io/hts-specs/SAMv1.pdf
| |
− | ** Additional information I put together as I started working with SAM/BAM: [[SAM]]
| |
− | * Consists of:
| |
− | ** Header
| |
− | *** Starts with '@'
| |
− | *** Records - one for each sequence read
| |
− | Let's examine a BAM file:
| |
− | samtools view -h ${OUTPUT}/bams/
| |
− | [[File:BAM.png|750px]]
| |
− | | |
| Let's take a look at our quality control output directory: | | Let's take a look at our quality control output directory: |
− | ls ${OUTPUT}/QCFiles | + | ls ${OUT}/QCFiles |
| [[File:GcalignOutQCm.png|600px]] | | [[File:GcalignOutQCm.png|600px]] |
| | | |
| + | ==== Sample Contamination/Swap ==== |
| Check for sample contamination: | | Check for sample contamination: |
| * *.selfSM : Main output file containing the contamination estimate. | | * *.selfSM : Main output file containing the contamination estimate. |
− | ** If you are only interested in checking sample contamination:
| + | ** Check the 'FREEMIX' column for genotype-free estimate of contamination |
− | *** Check the 'FREEMIX' column for genotype-free estimate of contamination
| |
| **** 0-1 scale, the lower, the better | | **** 0-1 scale, the lower, the better |
− | **** See [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information | + | **** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination |
− | *** Check the 'CHIPMIX' column for contamination estimates with external genotypes (if provided)
| + | ** See [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information |
− | * *.selfRG : Same output to .*selfSM, but separated by readGroup (which might be helpful for library-level examination)
| + | less -S ${OUT}/QCFiles/HG00551.genoCheck.selfSM |
− | * *.depthSM : depth distribution of reads covering the marker position of the input VCF, across all readGroups.
| + | |
− | * *.depthRG : depth distribution of reads covering the marker position of the input VCF, per readGroups.
| + | Remember, use <code>'q'</code> to exit out of <code>less</code> |
− | less -S ${OUTPUT}/QCFiles/HG00551.genoCheck.selfSM | + | q |
| + | |
| + | ; Is there evidence of sample contamination? |
| + | <ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
| + | <li>Answer:</li> |
| + | <div class="mw-collapsible-content"> |
| + | <ul> |
| + | <li>No, FREEMIX = 0.00000 (<0.03)</li> |
| + | </ul> |
| [[File:Contam1.png|700px]] | | [[File:Contam1.png|700px]] |
| + | </div> |
| + | </div> |
| + | </ul> |
| + | |
| + | ==== QC Metrics ==== |
| + | See: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results. |
| | | |
− | Next, let's look at some quality control metrics:
| + | Let's look at some quality control metrics: |
− | cat ${OUTPUT}/QCFiles/HG00551.qplot.stats | + | cat ${OUT}/QCFiles/HG00551.qplot.stats |
− | * 99.16% mapping rate
| |
− | * 94.01% high quality bases
| |
− | * 7x coverage
| |
− | * 31.3% A, 31.3% T
| |
− | * 18.7% C, 18.7% G
| |
| | | |
| + | ; What is the mapping rate & average coverage for HG00551? |
| + | <ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
| + | <li>Answer</li> |
| + | <div class="mw-collapsible-content"> |
| + | <ul> |
| + | <li> 98.93% Mapped</li> |
| + | <li>7.43 MeanDepth</li> |
| + | </ul> |
| + | [[File:qplots.png|200px]] |
| + | </div> |
| + | </div> |
| + | </ul> |
| | | |
− | Generate the pdf's of our quality metrics: | + | Generate a pdf of quality metrics: |
− | Rscript ${OUTPUT}/QCFiles/HG00551.qplot.R | + | Rscript ${OUT}/QCFiles/HG00551.qplot.R |
− | Rscript ${OUTPUT}/QCFiles/HG00553.qplot.R
| |
− | Rscript ${OUTPUT}/QCFiles/HG00640.qplot.R
| |
− | Rscript ${OUTPUT}/QCFiles/HG00641.qplot.R
| |
| | | |
| Examine the PDF: | | Examine the PDF: |
− | evince ${OUTPUT}/QCFiles/HG00551.qplot.pdf& | + | evince ${OUT}/QCFiles/HG00551.qplot.pdf& |
− | The first plot: Empirical vs reported Phred score does not look as good as we would like.
| + | |
− | * This is due to the small region used for recalibration
| + | It is ok if you see a warning message when opening evince. It should still open. If not, let me know. To close evince, just close the pdf window. |
− | Look at the PDF I produced when I ran the whole genome: | + | |
− | evince ${GC}/example/HG00551.wg.qplot.pdf& | + | ;Does the Empirical vs reported Phred score look as good as we would like? |
| + | <ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:400px"> |
| + | <li>Answer</li> |
| + | <div class="mw-collapsible-content"> |
| + | <ul> |
| + | <li> No, it is well above the line</li> |
| + | <li> This is due to the small region used for recalibration</li> |
| + | [[File:Qplotpdf.png|400px]] |
| + | <li> Look at the PDF I produced when I ran the whole genome:</li> |
| + | evince ${SS}/ext/HG00551.wg.qplot.pdf& |
| + | </ul> |
| + | [[File:Qplotpdfwg.png|400px]] |
| + | </div> |
| + | </div> |
| + | </ul> |
| + | |
| + | === BAM Files === |
| + | Binary Sequence Alignment/Map (SAM) Format |
| + | * Maps reads to Chromosome/Position |
| + | * For a detailed explanation of the SAM/BAM format, see: |
| + | ** SAM/BAM Spec: http://samtools.github.io/hts-specs/SAMv1.pdf |
| + | ** Additional information I put together as I started working with SAM/BAM: [[SAM]] |
| + | |
| + | Let's look at the BAMs (aligned reads that are ready for variant calling): |
| + | ls ${OUT}/bams |
| + | [[File:GcalignOutBAMm.png|600px]] |
| + | |
| + | Let's examine at the first 5 lines of the BAM file using [http://samtools.sourceforge.net/samtools.shtml#3 samtools view]: |
| + | ${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam|head -n 5 |
| | | |
− | See: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results.
| + | ; What are the chromosome and position of the first record in the BAM file? |
| + | <ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:300px"> |
| + | <li>Need a reminder of the format?</li> |
| + | <div class="mw-collapsible-content"> |
| + | [[File:Bam.png|750px]] |
| + | </div> |
| + | </div> |
| + | <div class="mw-collapsible mw-collapsed" style="width:300px"> |
| + | <li>Answer</li> |
| + | <div class="mw-collapsible-content"> |
| + | <ul> |
| + | <li>Chr 22, Pos: 16114122</li> |
| + | </ul> |
| + | [[File:BamRec.png|650px]] |
| + | </div> |
| + | </div> |
| + | </ul> |
| + | |
| + | ==== Accessing BAMs by Position ==== |
| + | BAM's are so big, what if we want to see a position part way through the file? |
| + | *[http://samtools.sourceforge.net/samtools.shtml#3 samtools] has an option for that. |
| + | |
| + | Add a region to the view command we used above. Let's find all reads that overlap positions 36907000-36907005: |
| + | ${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam 22:36907000-36907005 |
| + | * Just a few reads. |
| + | |
| + | Let's visualize what reads in that area look like using samtools tview: |
| + | ${GC}/bin/samtools tview ${OUT}/bams/HG00551.recal.bam ${SS}/ref22/human.g1k.v37.chr22.fa |
| + | * Type ‘g’ |
| + | ** Type 22:36907000 |
| + | * Type ‘n’ to color by nucleotide |
| + | * Use the arrow keys to move around and look at the area. |
| + | |
| + | Understanding the syntax: |
| + | * '.' : match to the reference on the forward strand |
| + | * ',' : match to the reference on the reverse strand |
| + | * ACGTN : mismatch to reference on the forward strand |
| + | * acgtn : mismatch to reference on the reverse strand |
| + | |
| + | ; Do you see anything interesting? |
| + | <ul> |
| + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| + | <li>Screenshot</li> |
| + | <div class="mw-collapsible-content"> |
| + | <ul> |
| + | <li>We will have to remember this region when we run snpcall to see what it says.</li> |
| + | </ul> |
| + | [[File:tview.png|750px]] |
| + | </div> |
| + | </div> |
| + | </ul> |
| + | |
| + | Other tview commands: |
| + | * Type '?' for a help screen |
| + | * Type 'q' to quit tview |
| + | |
| + | Feel free to play around more and browse the BAM files. |
| + | |
| + | ==== Other tools for BAMs ==== |
| + | We have developed a lot of tools that operate on BAM files. |
| + | |
| + | See [[Software#BAM_Util_Tools|Software: BamUtil Tools]] for a list |
| + | * Many operations: |
| + | ** diff : diff 2 BAM files |
| + | ** stats: per positions statistics |
| + | ** bam2Fastq : convert a BAM back to a FASTQ (how I created the fastqs for this tutorial) |
| + | ** Lots of others |
| + | * Feel free to try some out |
| + | * If you have any questions, let me know, I wrote most of them and am happy to help. |
| + | |
| + | == Logging Off == |
| + | |
| + | ''This section is specifically for the SeqShop Workshop computers.'' |
| + | <div class="mw-collapsible mw-collapsed" style="width:600px"> |
| + | ''If you are not running during the SeqShop Workshop, please skip this section.'' |
| + | <div class="mw-collapsible-content"> |
| + | To logout of seqshop-server, type: |
| + | exit |
| + | And close the windows. |
| + | |
| + | When done, log out of the Windows machine. |
| + | </div> |
| + | </div> |