Changes

From Genome Analysis Wiki
no edit summary
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Aligning Your Own Genome]]
 +
* The version here is the original from the June workshop (updated so it can be run from elsewhere)
 +
 +
 
== First Things First ==
*Helpful reference to many tools:
Line 11: Line 15:  
* Practice setting up and running GotCloud on your own.
   −
== Step 1 : Looking at your FASTQs ==
+
{{SeqShopLogin}}
 +
 
 +
 
 +
== I didn't get sequenced, what can I do? ==
 +
I prepared some test files for you.
 +
* I took a 1000g sample and reduced it to 2x.
 +
** I already created an align.2x.index file for you.
 +
*** This is a 1000g sample, so the filenames/RG information for these do not match the ones produced by our sequencer that are described below.
 +
To be consistent with everyone else you can do:
 +
mkdir ~/personal
 +
cp -r /home/mktrost/seqshop/inputs/2x/* ~/personal/.
 +
ls ~/personal/
 +
ls ~/personal/fastq
 +
 
 +
== I got sequenced, how do I get my data ready to run? ==
 +
=== Finding your FASTQs ===
 
Your FASTQ files are under your <code>personal</code> directory.
   Line 44: Line 63:  
[[File:FastqlistAnnotated.png]]
    +
=== Checking your index file listing your FASTQs ===
 +
Are you analyzing your own genome?  Do you think you set up your file correctly?
 +
 +
Try running this script to see if you have any errors:
 +
perl /home/mktrost/seqshop/inputs/checkIndex.pl ~/personal/align.2x.index
 +
 +
On success it prints: <code>Congratulations, your fastq index looks valid</code>
   −
== Generating the index file listing your FASTQs ==
+
NOTE: This script is tailored to the filenames provided by our sequencing core as described above.
 +
* It could be tailored to other methods, but is designed for the paths of our data.
 +
 
 +
=== Generating the index file listing your FASTQs ===
 
What columns do we need in our file that tells GotCloud about our FASTQ?
* MERGE_NAME
Line 57: Line 86:  
We will store our FASTQ info file in: ~/personal/align.2x.index.
   −
=== Using a Spreadsheet ===  
+
There are a few ways to create this file.
 +
* Write into a text file one fastq pair at a time.
 +
* Copy fastq1s into a spreadsheet, fill it in and copy back to a text file
 +
* [[#Using a Script|Write/Use a script]]
 +
==== Using a Regular Text File ====
 +
Follow the instructions below, but do it one FASTQ1 at a time (you won't be able to paste a full column of FASTQs at a time).
 +
* Remember to put a tab between each field.
 +
 
 +
==== Using a Spreadsheet ====  
 
Since we just have a handful of FASTQs, we can use a spreadsheet to construct our file and then copy the data into a text file.
* Thanks to those who thought to do this yesterday - it was a great idea.
Line 63: Line 100:  
First, open Excel
   −
==== Header Row ====
+
===== Header Row =====
 
Create the header line by typing each of the column names in a row (you may be able to copy this line):
* make sure you enter these in all CAPS & spelling does matter
Line 69: Line 106:  
[[File:HdrRow.png]]
   −
==== MERGE_NAME ====
+
===== MERGE_NAME =====
 
MERGE_NAME is just your sample name
* Type your Sample name under the MERGE_NAME column, for example: <code>Sample_12345</code>
Line 76: Line 113:  
All FASTQs are for the same sample, so you will use <code>Sample_12345</code> on every line.  We will fill those in once we know how many rows we need.
   −
==== FASTQ1 ====
+
===== FASTQ1 =====
 
FASTQ1 is just the 1st in pair FASTQs (or the single FASTQ in single end)
* Our sequencing core indicated 1st in pair by <code>R1</code> in the filename.
Line 96: Line 133:  
[[File:HdrSheetMN1.png|700]]
   −
====FASTQ2====
+
=====FASTQ2=====
 
As mentioned before, FASTQ2 files are the 2nd in pair.
* They have the same filename as FASTQ1, except replace the R1 with R2
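
The R1-to-R2 rule can be sketched in shell. This is only an illustration: the function names <code>fastq2_for</code> and <code>missing_mates</code> are made up for this example, and it assumes the pair marker appears in the filename as <code>_R1_</code>, which may not match your files exactly.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the first-in-pair marker appears as "_R1_" in the
# filename; adjust the pattern if your files are named differently.

# Print the expected second-in-pair name for a first-in-pair name.
fastq2_for() {
    printf '%s\n' "$1" | sed 's/_R1_/_R2_/'
}

# For each FASTQ1 given, report when the expected FASTQ2 is not on disk.
missing_mates() {
    for fq1 in "$@"; do
        fq2=$(fastq2_for "$fq1")
        [ -e "$fq2" ] || printf 'missing mate for %s: %s\n' "$fq1" "$fq2"
    done
}
```

Running <code>missing_mates</code> on your list of FASTQ1 files before filling in the FASTQ2 column should print nothing if every pair is complete.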
Line 109: Line 146:  
[[File:HdrSheetFQ2 2.png]]
   −
==== RGID ====
+
===== RGID =====
 
We want to group our FASTQs by Run & Lane.
* Each Run/Lane combination should have a unique Read Group
Line 120: Line 157:  
[[File:HdrSheetRGannotated.png]]
   −
====SAMPLE ====
+
=====SAMPLE =====
 
Put your sample name in each row of this column (you can copy from MERGE_NAME)
   −
==== LIBRARY ====
+
===== LIBRARY =====
 
Put your sample name in each row of this column (you can copy from MERGE_NAME)
* If a sample has multiple library preparations done on it, you would want to give unique names
** That is not our case, so just put in the sample name.
   −
==== PLATFORM ====
+
===== PLATFORM =====
 
Your data was sequenced on ILLUMINA, so enter <code>ILLUMINA</code> in each row of the platform column.
[[File:HdrSheetDone.png]]
   −
==== Copy to Text File ====
+
===== Copy to Text File =====
 
Open nedit or your favorite linux editor
  nedit ~/personal/align.2x.index&
Line 146: Line 183:     
You now have a tab delimited align.2x.index file (a little simpler than yesterday).
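
Pasting from a spreadsheet can silently turn tabs into spaces, so a quick field count is worth doing. The sketch below is an illustration rather than part of the workshop scripts, and the function name <code>check_tabs</code> is made up for this example. It compares the number of tab-separated fields on each row with the header row.

```shell
#!/bin/sh
# Sketch only: report rows whose tab-separated field count differs from the
# header row. No output means every row has the same number of fields.
check_tabs() {
    awk -F '\t' '
        NR == 1 { n = NF; next }   # remember how many columns the header has
        NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n }
    ' "$1"
}
```

For example, <code>check_tabs ~/personal/align.2x.index</code> should print nothing for a correctly delimited file.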
 +
 +
==== Using a Script ====
 +
When generating an index of your FASTQs, it can be easiest to have a script.
 +
* Especially if you have many samples/runs, it would be very tedious to do by hand
 +
 +
If you are good at scripting, this may be even easier than doing it by hand
 +
* If you aren't good at scripting, and you have too much data to do by hand
 +
** Make friends with someone who is :-)
 +
** I always find it useful to start from another script (reminds me of commands/tricks)
 +
 +
If you still need to create your file and you don't want to use the spreadsheet method above, you can run a script that I made:
 +
perl /home/mktrost/seqshop/inputs/buildIndex.pl ~/personal > ~/personal/align.2x.index
 +
* <code>></code> means to direct the output to the file specified after the <code>></code>
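
If you would like to try writing such a script yourself, here is a minimal shell sketch of the idea. It is not the workshop's <code>buildIndex.pl</code>. The function name <code>build_index</code> and the file layout <code>&lt;run&gt;/&lt;name&gt;_L&lt;lane&gt;_R1_&lt;n&gt;.fastq.gz</code> are assumptions made for this example, so the run and lane parsing will need adjusting to your own paths. The column order follows the order of the column sections on this page.

```shell
#!/bin/sh
# Sketch only, NOT the workshop's buildIndex.pl.
# ASSUMPTIONS (hypothetical layout): each first-in-pair file is named like
#   <run>/<name>_L<lane>_R1_<n>.fastq.gz
# so the run is the parent directory name and the lane is the "L..." token.

# $1 = sample name; remaining arguments = first-in-pair FASTQ paths.
build_index() {
    sample=$1
    shift
    # Header row: column names in all caps, separated by tabs.
    printf 'MERGE_NAME\tFASTQ1\tFASTQ2\tRGID\tSAMPLE\tLIBRARY\tPLATFORM\n'
    for fq1 in "$@"; do
        # Second in pair: same name with R1 replaced by R2.
        fq2=$(printf '%s\n' "$fq1" | sed 's/_R1_/_R2_/')
        # One read group per run/lane combination.
        run=$(basename "$(dirname "$fq1")")
        lane=$(printf '%s\n' "$fq1" | sed 's/.*_\(L[0-9][0-9]*\)_R1_.*/\1/')
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$sample" "$fq1" "$fq2" "${run}_${lane}" \
            "$sample" "$sample" ILLUMINA
    done
}
```

Called as <code>build_index Sample_12345 fastq/*/*_R1_*.fastq.gz > align.2x.index</code> (the glob is again an assumption about where your files live), it writes one row per FASTQ pair, using the sample name for MERGE_NAME, SAMPLE and LIBRARY as in the spreadsheet method.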
 +
 +
Curious what the script looks like and what it does in case you want to create one in the future?
 +
<div class="mw-collapsible mw-collapsed" style="width:200px">
 +
<li>View Annotated Script</li>
 +
<div class="mw-collapsible-content">
 +
[[File:BuildIndex.png|800px]]
 +
</div>
 +
</div>
 +
 +
=== Checking your index file listing your FASTQs ===
 +
Are you analyzing your own genome?  Do you think you set up your file correctly?
 +
 +
Try running this script to see if you have any errors:
 +
perl /home/mktrost/seqshop/inputs/checkIndex.pl ~/personal/align.2x.index
 +
 +
On success it prints: <code>Congratulations, your fastq index looks valid</code>
 +
 +
NOTE: This script is tailored to the filenames provided by our sequencing core as described above.
 +
* It could be tailored to other methods, but is designed for the paths of our data.
    
== Create your GotCloud Configuration File ==
Line 166: Line 235:  
* Everything else is configured already.
[[File:Gc2xconf.png]]
 +
    
You'll notice that this file is very similar to the one we have been using.
Line 209: Line 279:  
:* Type Ctrl-a Esc and you should be able to scroll up with your mouse wheel
:** Or at least that is what I do from my Linux machine - (sorry I'm typing this up/testing these commands from Linux and not windows, so can't test it out)
 +
 +
 +
== Log Out ==
 +
If you have not detached from screen:
 +
Ctrl-a d
 +
 +
exit PuTTY
 +
 +
== FEEDBACK! ==
 +
Since I didn't send this out yesterday, today's survey has feedback for Tuesday & Wednesday.
 +
https://docs.google.com/forms/d/1qaLHq9w1Ib3FZq0CtlrbK_-breNiqGRV06oRYNmUuME/viewform
