Changes

From Genome Analysis Wiki
no edit summary
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Aligning Your Own Genome]]
 +
* The version here is the original from the June workshop (updated so that it can be run from elsewhere)
 +
 +
 
== First Things First ==
 
== First Things First ==
 
*Helpful reference to many tools:
 
*Helpful reference to many tools:
Line 11: Line 15:  
* Practice setting up and running GotCloud on your own.
 
* Practice setting up and running GotCloud on your own.
   −
== Step 1 : Looking at your FASTQs ==
+
{{SeqShopLogin}}
 +
 
 +
 
 +
== I didn't get sequenced, what can I do? ==
 +
I prepared some test files for you.
 +
* I took a 1000g sample and downsampled it to about 2x coverage.
 +
** I already created an align.2x.index file for you.
 +
*** This is a 1000g sample, so the filenames/RG information for these do not match the ones produced by our sequencer that are described below.
 +
To be consistent with everyone else you can do:
 +
mkdir ~/personal
 +
cp -r /home/mktrost/seqshop/inputs/2x/* ~/personal/.
 +
ls ~/personal/
 +
ls ~/personal/fastq
 +
 
 +
== I got sequenced, how do I get my data ready to run? ==
 +
=== Finding your FASTQs ===
 
Your FASTQ files are under your <code>personal</code> directory.
 
Your FASTQ files are under your <code>personal</code> directory.
   Line 44: Line 63:  
[[File:FastqlistAnnotated.png]]
 
[[File:FastqlistAnnotated.png]]
    +
=== Checking your index file listing your FASTQs ===
 +
Are you analyzing your own genome?  Do you think you set up your file correctly?
   −
== Generating the index file listing your FASTQs ==
+
Try running this script to see if you have any errors:
 +
perl /home/mktrost/seqshop/inputs/checkIndex.pl ~/personal/align.2x.index
 +
 
 +
On success it prints: <code>Congratulations, your fastq index looks valid</code>
 +
 
 +
NOTE: This script is tailored to the filenames provided by our sequencing core as described above.
 +
* It could be tailored to other methods, but is designed for the paths of our data.
 +
 
 +
=== Generating the index file listing your FASTQs ===
 
What columns do we need in our file that tells GotCloud about our FASTQ?
 
What columns do we need in our file that tells GotCloud about our FASTQ?
 
* MERGE_NAME
 
* MERGE_NAME
Line 57: Line 86:  
We will store our FASTQ info file in: ~/personal/align.2x.index.
 
We will store our FASTQ info file in: ~/personal/align.2x.index.
   −
=== Using a Spreadsheet ===  
+
There are a few ways to create this file.
 +
* Write into a text file one fastq pair at a time.
 +
* Copy fastq1s into a spreadsheet, fill it in and copy back to a text file
 +
* [[#Using a Script|Write/Use a script]]
 +
==== Using a Regular Text File ====
 +
Follow the instructions below, but do it one FASTQ1 at a time (you won't be able to paste a full column of FASTQs at a time).
 +
* Remember to put a tab between each field.
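
If you would rather not type literal tabs in an editor, one option is to let the shell insert them. The following is a minimal sketch, assuming the seven column names in the order they are described on this page. The FASTQ filenames and the RGID value are hypothetical placeholders, not the names produced by the sequencing core, and the file is written to a temporary location rather than <code>~/personal/align.2x.index</code>.

```shell
# Hypothetical example: let printf insert the tabs (\t) between fields.
index=$(mktemp)   # for the real file you would use ~/personal/align.2x.index

# Header row: column names in all CAPS, separated by tabs.
printf 'MERGE_NAME\tFASTQ1\tFASTQ2\tRGID\tSAMPLE\tLIBRARY\tPLATFORM\n' > "$index"

# One row per FASTQ pair; >> appends to the file instead of overwriting it.
# The filenames and RG1 below are placeholders.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    Sample_12345 fastq/example_R1.fastq.gz fastq/example_R2.fastq.gz \
    RG1 Sample_12345 Sample_12345 ILLUMINA >> "$index"

# Print the number of tab-separated fields on each line.
awk -F'\t' '{ print NF }' "$index"
```

Each line should report the same number of fields; a different count usually means a space was typed where a tab was needed.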
 +
 
 +
==== Using a Spreadsheet ====  
 
Since we just have a handful of FASTQs, we can use a spreadsheet to construct our file and then copy the data into a text file.
 
Since we just have a handful of FASTQs, we can use a spreadsheet to construct our file and then copy the data into a text file.
 
* Thanks to those who thought to do this yesterday - it was a great idea.
 
* Thanks to those who thought to do this yesterday - it was a great idea.
Line 63: Line 100:  
First, open Excel
 
First, open Excel
   −
==== Header Row ====
+
===== Header Row =====
 
Create the header line by typing each of the column names in a row (you may be able to copy this line):
 
Create the header line by typing each of the column names in a row (you may be able to copy this line):
 
* Make sure you enter these in all CAPS; spelling matters.
 
* Make sure you enter these in all CAPS; spelling matters.
Line 69: Line 106:  
[[File:HdrRow.png]]
 
[[File:HdrRow.png]]
   −
==== MERGE_NAME ====
+
===== MERGE_NAME =====
 
MERGE_NAME is just your sample name
 
MERGE_NAME is just your sample name
 
* Type your Sample name under the MERGE_NAME column, for example: <code>Sample_12345</code>
 
* Type your Sample name under the MERGE_NAME column, for example: <code>Sample_12345</code>
Line 76: Line 113:  
All FASTQs are for the same sample, so you will use <code>Sample_12345</code> on every line.  We will fill those in after we know how many rows we need.
 
All FASTQs are for the same sample, so you will use <code>Sample_12345</code> on every line.  We will fill those in after we know how many rows we need.
   −
==== FASTQ1 ====
+
===== FASTQ1 =====
 
FASTQ1 is the first FASTQ of each pair (or the only FASTQ for single-end data)
 
FASTQ1 is the first FASTQ of each pair (or the only FASTQ for single-end data)
 
* Our sequencing core indicated 1st in pair by <code>R1</code> in the filename.
 
* Our sequencing core indicated 1st in pair by <code>R1</code> in the filename.
Line 96: Line 133:  
[[File:HdrSheetMN1.png|700]]
 
[[File:HdrSheetMN1.png|700]]
   −
====FASTQ2====
+
=====FASTQ2=====
 
As mentioned before, FASTQ2 files are the 2nd in pair.
 
As mentioned before, FASTQ2 files are the 2nd in pair.
 
* They have the same filename as FASTQ1, except replace the R1 with R2
 
* They have the same filename as FASTQ1, except replace the R1 with R2
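
If you are working at the command line rather than in a spreadsheet, the R1 to R2 substitution can be done with <code>sed</code>. A small sketch follows; the filename is a hypothetical placeholder, and only the <code>R1</code>/<code>R2</code> convention comes from the description above.

```shell
# Hypothetical FASTQ1 filename; substitute one of your own.
fastq1="fastq/Sample_12345_L001_R1_001.fastq.gz"

# Replace the first "_R1_" in the name with "_R2_" to get the FASTQ2 name.
fastq2=$(printf '%s\n' "$fastq1" | sed 's/_R1_/_R2_/')

echo "$fastq2"
```

Matching <code>_R1_</code> with the surrounding underscores, rather than a bare <code>R1</code>, reduces the chance of changing some other part of the filename by accident.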
Line 109: Line 146:  
[[File:HdrSheetFQ2 2.png]]
 
[[File:HdrSheetFQ2 2.png]]
   −
==== RGID ====
+
===== RGID =====
 
We want to group our FASTQs by Run & Lane.
 
We want to group our FASTQs by Run & Lane.
 
* Each Run/Lane combination should have a unique Read Group
 
* Each Run/Lane combination should have a unique Read Group
Line 120: Line 157:  
[[File:HdrSheetRGannotated.png]]
 
[[File:HdrSheetRGannotated.png]]
   −
====SAMPLE ====
+
=====SAMPLE =====
 
Put your sample name in each row of this column (you can copy from MERGE_NAME)
 
Put your sample name in each row of this column (you can copy from MERGE_NAME)
   −
==== LIBRARY ====
+
===== LIBRARY =====
 
Put your sample name in each row of this column (you can copy from MERGE_NAME)
 
Put your sample name in each row of this column (you can copy from MERGE_NAME)
 
* If a sample has multiple library preparations done on it, you would want to give unique names
 
* If a sample has multiple library preparations done on it, you would want to give unique names
 
** That is not our case, so just put in the sample name.
 
** That is not our case, so just put in the sample name.
   −
==== PLATFORM ====
+
===== PLATFORM =====
 
Your data was sequenced on ILLUMINA, so enter <code>ILLUMINA</code> in each row of the platform column.
 
Your data was sequenced on ILLUMINA, so enter <code>ILLUMINA</code> in each row of the platform column.
 
[[File:HdrSheetDone.png]]
 
[[File:HdrSheetDone.png]]
   −
==== Copy to Text File ====
+
===== Copy to Text File =====
 
Open nedit or your favorite Linux editor
 
Open nedit or your favorite Linux editor
 
  nedit ~/personal/align.2x.index&
 
  nedit ~/personal/align.2x.index&
Line 147: Line 184:  
You now have a tab delimited align.2x.index file (a little simpler than yesterday).
 
You now have a tab delimited align.2x.index file (a little simpler than yesterday).
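
Before moving on, it can be worth confirming that every line really has the same number of tab-separated fields, since spaces pasted in place of tabs are easy to miss. The following is a generic sketch using <code>awk</code>, assuming the seven columns described above. The helper name <code>check_tabs</code> is made up for this example, and the demo runs on a temporary file with a deliberately broken row.

```shell
# Report any line whose tab-separated field count is not 7.
# Exits non-zero if at least one such line was found.
check_tabs() {
    awk -F'\t' 'NF != 7 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
                END { exit bad }' "$1"
}

# Demo: a correct header, then a row that uses spaces instead of tabs.
tmp=$(mktemp)
printf 'MERGE_NAME\tFASTQ1\tFASTQ2\tRGID\tSAMPLE\tLIBRARY\tPLATFORM\n' > "$tmp"
printf 'Sample_12345 a_R1.fastq.gz a_R2.fastq.gz RG1 Sample_12345 Sample_12345 ILLUMINA\n' >> "$tmp"

check_tabs "$tmp" || echo "index needs fixing"
```

On your own file you would call <code>check_tabs ~/personal/align.2x.index</code>; no output means every line has seven fields.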
   −
== I didn't get sequenced, what can I do? ==
+
==== Using a Script ====
I prepared some test files for you.
+
When generating an index of your FASTQs, it can be easiest to have a script.
* I took a 1000g sample and reduced it to 2x.
+
* Especially if you have many samples/runs, it would be very tedious to do by hand
** I already created an align.2x.index file for you.
+
 
 +
If you are good at scripting, this may be even easier than doing it by hand
 +
* If you aren't good at scripting, and you have too much data to do by hand
 +
** Make friends with someone who is :-)
 +
** I always find it useful to start from another script (reminds me of commands/tricks)
 +
 
 +
If you still need to create your file and you don't want to use the spreadsheet method above, you can run a script that I made:
 +
perl /home/mktrost/seqshop/inputs/buildIndex.pl ~/personal > ~/personal/align.2x.index
 +
* <code>></code> means to direct the output to the file specified after the <code>></code>
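
For readers who want a feel for what such a script does, the general idea can be sketched in shell. This is not <code>buildIndex.pl</code>. It is an illustration that assumes FASTQ1 filenames contain <code>_R1_</code>, takes the sample name as an argument, and uses the part of the filename before <code>_R1_</code> as a stand-in RGID. All of these are assumptions made for the example; real RGIDs should follow the Run/Lane grouping described in the RGID section.

```shell
# Sketch only: NOT buildIndex.pl. Assumes paired files named *_R1_* / *_R2_*.
build_index() {
    dir=$1      # directory holding the FASTQs
    sample=$2   # sample name, used for MERGE_NAME, SAMPLE and LIBRARY

    printf 'MERGE_NAME\tFASTQ1\tFASTQ2\tRGID\tSAMPLE\tLIBRARY\tPLATFORM\n'
    for fq1 in "$dir"/*_R1_*; do
        [ -e "$fq1" ] || continue          # glob matched nothing: skip
        base=$(basename "$fq1")
        # FASTQ2 has the same name with _R1_ replaced by _R2_.
        fq2="$dir/$(printf '%s\n' "$base" | sed 's/_R1_/_R2_/')"
        # Stand-in RGID: filename up to "_R1_" (an assumption, see above).
        rgid=$(printf '%s\n' "$base" | sed 's/_R1_.*//')
        printf '%s\t%s\t%s\t%s\t%s\t%s\tILLUMINA\n' \
            "$sample" "$fq1" "$fq2" "$rgid" "$sample" "$sample"
    done
}

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo/RunA_L001_R1_001.fastq.gz" "$demo/RunA_L001_R2_001.fastq.gz"
build_index "$demo" Sample_12345 > "$demo/align.2x.index"
cat "$demo/align.2x.index"
```

The function writes to standard output, so the same <code>></code> redirection shown above decides where the index ends up.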
 +
 
 +
Curious what the script looks like and what it does in case you want to create one in the future?
 +
<div class="mw-collapsible mw-collapsed" style="width:200px">
 +
<li>View Annotated Script</li>
 +
<div class="mw-collapsible-content">
 +
[[File:BuildIndex.png|800px]]
 +
</div>
 +
</div>
 +
 
 +
=== Checking your index file listing your FASTQs ===
 +
Are you analyzing your own genome?  Do you think you set up your file correctly?
 +
 
 +
Try running this script to see if you have any errors:
 +
perl /home/mktrost/seqshop/inputs/checkIndex.pl ~/personal/align.2x.index
 +
 
 +
On success it prints: <code>Congratulations, your fastq index looks valid</code>
   −
To be consistent with everyone else you can do:
+
NOTE: This script is tailored to the filenames provided by our sequencing core as described above.
mkdir ~/personal
+
* It could be tailored to other methods, but is designed for the paths of our data.
cp -r /home/mktrost/seqshop/inputs/2x/* ~/personal/.
  −
ls ~/personal/
  −
ls ~/personal/fastq
      
== Create your GotCloud Configuration File ==
 
== Create your GotCloud Configuration File ==
Line 221: Line 279:  
:* Type Ctrl-a Esc and you should be able to scroll up with your mouse wheel
 
:* Type Ctrl-a Esc and you should be able to scroll up with your mouse wheel
 
:** Or at least that is what I do from my Linux machine - (sorry I'm typing this up/testing these commands from Linux and not windows, so can't test it out)
 
:** Or at least that is what I do from my Linux machine - (sorry I'm typing this up/testing these commands from Linux and not windows, so can't test it out)
 +
 +
 +
== Log Out ==
 +
If you have not detached from screen:
 +
Ctrl-a d
 +
 +
Exit PuTTY
 +
 +
== FEEDBACK! ==
 +
Since I didn't send this out yesterday, today's survey has feedback for Tuesday & Wednesday.
 +
https://docs.google.com/forms/d/1qaLHq9w1Ib3FZq0CtlrbK_-breNiqGRV06oRYNmUuME/viewform
