SeqShop: Aligning Your Own Genome, June 2014 (revision as of 14:25, 13 November 2014)

'''Note:''' the latest version of this practical is available at: [[SeqShop: Aligning Your Own Genome]]

* The instructions here are the original ones from the June workshop (updated so they can be run from elsewhere).

== First Things First ==

*Helpful reference to many tools:


* Practice setting up and running GotCloud on your own.


== I didn't get sequenced, what can I do? ==

I prepared some test files for you.

* I took a 1000 Genomes sample and downsampled it to 2x coverage.

** I already created an <code>align.2x.index</code> file for you.

*** Because this is a 1000 Genomes sample, its filenames and read group (RG) information do not match those produced by our sequencing core, which are described below.

To use the same setup as everyone else, run:

mkdir ~/personal

cp -r /home/mktrost/seqshop/inputs/2x/* ~/personal/.

ls ~/personal/

ls ~/personal/fastq

== I got sequenced, how do I get my data ready to run? ==

+

=== Finding your FASTQs ===

Your FASTQ files are under your <code>personal</code> directory.


[[File:FastqlistAnnotated.png]]


=== Generating the index file listing your FASTQs ===

What columns does the file that tells GotCloud about our FASTQs need?

* MERGE_NAME

We will store our FASTQ index file in <code>~/personal/align.2x.index</code>.

There are a few ways to create this file:

* Write it in a text file, one FASTQ pair at a time.

* Copy the FASTQ1 filenames into a spreadsheet, fill in the other columns, and copy the result back to a text file.

* [[#Using a Script|Write/Use a script]]

==== Using a Regular Text File ====

Follow the instructions below, but enter one FASTQ1 at a time (you won't be able to paste a full column of FASTQs at once).

* Remember to put a tab between each field.

==== Using a Spreadsheet ====

Since we just have a handful of FASTQs, we can use a spreadsheet to construct our file and then copy the data into a text file.

* Thanks to those who thought to do this yesterday - it was a great idea.

First, open Excel.

===== Header Row =====

Create the header line by typing each of the column names in a row (you may be able to copy this line):

* Make sure you enter these in all caps; spelling matters.

[[File:HdrRow.png]]
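If copying the header line from this page is fiddly, the short Python sketch below prints it with a single tab between each name. The column order simply follows the subsections of this practical and is an assumption; compare it with the screenshot above before relying on it.

```python
# Column names taken from the subsections of this practical; the order is assumed
# to follow those subsections (compare with the HdrRow screenshot).
INDEX_COLUMNS = ["MERGE_NAME", "FASTQ1", "FASTQ2", "RGID", "SAMPLE", "LIBRARY", "PLATFORM"]


def header_line() -> str:
    """Return the header row: all-caps column names separated by single tabs."""
    return "\t".join(INDEX_COLUMNS)


if __name__ == "__main__":
    print(header_line())
```

Paste the printed line as the first row of your file.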


===== MERGE_NAME =====

MERGE_NAME is just your sample name.

* Type your sample name under the MERGE_NAME column, for example: <code>Sample_12345</code>

All FASTQs are for the same sample, so you will use <code>Sample_12345</code> on every line. We will fill those in once we know how many rows we need.


===== FASTQ1 =====

FASTQ1 holds the 1st-in-pair FASTQs (or the only FASTQ for single-end data).

* Our sequencing core indicated 1st in pair by <code>R1</code> in the filename.

[[File:HdrSheetMN1.png|700px]]


=====FASTQ2=====

As mentioned before, FASTQ2 files are the 2nd in pair.

* They have the same filename as FASTQ1, except with R1 replaced by R2.

[[File:HdrSheetFQ2 2.png]]
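The R1-to-R2 rule can be written as a small function. This Python sketch assumes, as described above, that the only difference between the two filenames is R1 versus R2. It changes only the file name (not the directory) and refuses names containing zero or several R1s, since the swap would then be ambiguous. The example filename is hypothetical.

```python
import os


def fastq2_from_fastq1(fastq1: str) -> str:
    """Derive the 2nd-in-pair FASTQ path by swapping R1 for R2 in the file name.

    Only the file name is changed, never the directory. The name must contain
    exactly one "R1"; otherwise the swap would be ambiguous, so ValueError is raised.
    """
    directory, name = os.path.split(fastq1)
    if name.count("R1") != 1:
        raise ValueError(f"expected exactly one 'R1' in {name!r}")
    return os.path.join(directory, name.replace("R1", "R2"))


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(fastq2_from_fastq1("fastq/Sample_12345_L001_R1_001.fastq.gz"))
```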


===== RGID =====

We want to group our FASTQs by Run & Lane.

* Each Run/Lane combination should have a unique Read Group


[[File:HdrSheetRGannotated.png]]
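Grouping by run and lane amounts to giving each distinct (run, lane) pair its own ID. In this Python sketch you supply the run and lane for each FASTQ1 yourself (how they appear in filenames depends on the sequencing core), and the RG1, RG2, ... naming is a hypothetical choice. Any string works as long as each run/lane combination gets a unique one.

```python
def assign_rgids(records):
    """Give every distinct (run, lane) pair its own read-group ID.

    records: iterable of (fastq1, run, lane) tuples.
    Returns a dict mapping fastq1 -> RGID. FASTQs from the same run and lane
    share an RGID. The "RG<n>" numbering is a hypothetical naming scheme.
    """
    ids_by_run_lane = {}
    rgid_for_fastq = {}
    for fastq1, run, lane in records:
        key = (run, lane)
        if key not in ids_by_run_lane:
            # First time we see this run/lane: hand out the next ID.
            ids_by_run_lane[key] = f"RG{len(ids_by_run_lane) + 1}"
        rgid_for_fastq[fastq1] = ids_by_run_lane[key]
    return rgid_for_fastq
```

Using the pair as the key means two FASTQs from the same lane of different runs still get different read groups.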


=====SAMPLE =====

Put your sample name in each row of this column (you can copy it from MERGE_NAME).

===== LIBRARY =====

Put your sample name in each row of this column (you can copy it from MERGE_NAME).

* If a sample has had multiple library preparations, you would give each library a unique name.

** That is not our case, so just put in the sample name.

===== PLATFORM =====

Your data was sequenced on ILLUMINA, so enter <code>ILLUMINA</code> in each row of the platform column.

[[File:HdrSheetDone.png]]
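Put together, every row repeats the sample name in MERGE_NAME, SAMPLE and LIBRARY and uses ILLUMINA for PLATFORM; only FASTQ1, FASTQ2 and RGID change from row to row. This Python sketch builds one such line. The column order is assumed to match the header row you typed, and the example values are hypothetical.

```python
# Column order assumed to match the header row typed above.
COLUMNS = ["MERGE_NAME", "FASTQ1", "FASTQ2", "RGID", "SAMPLE", "LIBRARY", "PLATFORM"]


def index_row(sample: str, fastq1: str, fastq2: str, rgid: str, platform: str = "ILLUMINA") -> str:
    """Build one tab-separated index line for a FASTQ pair.

    MERGE_NAME, SAMPLE and LIBRARY all get the sample name because every FASTQ
    here comes from one sample with a single library preparation.
    """
    values = {
        "MERGE_NAME": sample,
        "FASTQ1": fastq1,
        "FASTQ2": fastq2,
        "RGID": rgid,
        "SAMPLE": sample,
        "LIBRARY": sample,
        "PLATFORM": platform,
    }
    return "\t".join(values[name] for name in COLUMNS)


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    print(index_row("Sample_12345", "fastq/Sample_12345_L001_R1_001.fastq.gz",
                    "fastq/Sample_12345_L001_R2_001.fastq.gz", "RG1"))
```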


===== Copy to Text File =====

Open nedit or your favorite Linux editor:

nedit ~/personal/align.2x.index&

You now have a tab-delimited <code>align.2x.index</code> file (a little simpler than yesterday's).
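A common problem after pasting from a spreadsheet or typing by hand is a row that uses spaces instead of tabs or is missing a field. This Python sketch (an illustration, not part of GotCloud) compares the number of tab-separated fields on each line with the header's. Pass the index file path as the first command-line argument.

```python
import sys


def check_tabs(lines):
    """Report lines whose tab-separated field count differs from the header row's."""
    problems = []
    expected = None
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # this check skips blank lines, e.g. a trailing empty line
        found = len(line.split("\t"))
        if expected is None:
            # The first non-blank line is the header and sets the expected count.
            expected = found
            if found == 1:
                problems.append(f"line {number}: no tabs found (spaces instead of tabs?)")
        elif found != expected:
            problems.append(f"line {number}: {found} fields, expected {expected}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        issues = check_tabs(handle)
    print("\n".join(issues) if issues else "every line has the same number of tab-separated fields")
```

For example, saved under a hypothetical name: <code>python3 check_tabs.py ~/personal/align.2x.index</code>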

==== Using a Script ====

When generating an index of your FASTQs, a script is often the easiest approach.

* Especially if you have many samples/runs, doing it by hand would be very tedious.

If you are good at scripting, this may be even easier than doing it by hand.

* If you aren't good at scripting and you have too much data to do by hand:

** Make friends with someone who is :-)

** I always find it useful to start from an existing script (it reminds me of commands/tricks).

If you still need to create your file and don't want to use the spreadsheet method above, you can run a script that I made:

perl /home/mktrost/seqshop/inputs/buildIndex.pl ~/personal > ~/personal/align.2x.index

* <code>></code> redirects the script's output into the file named after it.

Curious what the script looks like and what it does in case you want to create one in the future?

[[File:BuildIndex.png|800px]]
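If you would rather write your own, here is a Python sketch of the same idea. It is not a translation of buildIndex.pl: it finds every FASTQ whose name contains R1, derives the R2 mate, and emits one row per pair. Because the run/lane layout depends on your sequencing core, the RGID comes from a function you supply (<code>read_group_for</code> is a hypothetical parameter). The <code>*R1*.fastq*</code> filename pattern is also an assumption.

```python
import glob
import os

COLUMNS = ["MERGE_NAME", "FASTQ1", "FASTQ2", "RGID", "SAMPLE", "LIBRARY", "PLATFORM"]


def build_index(fastq_dir, sample, read_group_for, platform="ILLUMINA"):
    """Return index lines (header first) for every R1 FASTQ under fastq_dir.

    read_group_for(fastq1) must return the RGID for that FASTQ1 path; FASTQs
    from the same run and lane should get the same value.
    """
    lines = ["\t".join(COLUMNS)]
    # "**" with recursive=True also matches files directly inside fastq_dir.
    pattern = os.path.join(fastq_dir, "**", "*R1*.fastq*")
    for fastq1 in sorted(glob.glob(pattern, recursive=True)):
        directory, name = os.path.split(fastq1)
        if name.count("R1") != 1:
            raise ValueError(f"cannot tell which R1 to swap in {name!r}")
        fastq2 = os.path.join(directory, name.replace("R1", "R2"))
        if not os.path.exists(fastq2):
            raise FileNotFoundError(f"no 2nd-in-pair FASTQ for {fastq1}")
        # MERGE_NAME, SAMPLE and LIBRARY all use the sample name.
        row = [sample, fastq1, fastq2, read_group_for(fastq1), sample, sample, platform]
        lines.append("\t".join(row))
    return lines
```

Paths are written exactly as <code>glob</code> returns them, so passing an absolute directory gives absolute FASTQ paths. Raising an error for a missing mate is a design choice: a silently skipped pair is harder to notice than a failed run.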

=== Checking your index file listing your FASTQs ===

Are you analyzing your own genome? Do you think you set up your file correctly?

Try running this script to see if you have any errors:

perl /home/mktrost/seqshop/inputs/checkIndex.pl ~/personal/align.2x.index

On success it prints: <code>Congratulations, your fastq index looks valid</code>

NOTE: This script is tailored to the filenames provided by our sequencing core, as described above.

* It could be adapted to other naming schemes, but it is designed for the paths of our data.
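If your files follow a different naming scheme, you can run a few generic checks yourself. This Python sketch is not checkIndex.pl and does not reproduce its rules; it only checks that the header has the column names used on this page (exact, case-sensitive match), that every row has as many fields as the header, that the listed FASTQ files exist, and that no FASTQ1 is listed twice. Resolving relative FASTQ paths against the index file's directory is an assumption.

```python
import os
import sys

REQUIRED_COLUMNS = {"MERGE_NAME", "FASTQ1", "FASTQ2", "RGID", "SAMPLE", "LIBRARY", "PLATFORM"}


def check_index(path):
    """Return a list of problems found in a tab-delimited FASTQ index (empty if none)."""
    with open(path) as handle:
        rows = [(n, line.rstrip("\r\n").split("\t"))
                for n, line in enumerate(handle, start=1) if line.strip()]
    if not rows:
        return ["index file is empty"]
    _, header = rows[0]
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        # Names are compared exactly, so "fastq1" does not count as FASTQ1.
        return ["missing columns: " + ", ".join(sorted(missing))]
    column = {name: i for i, name in enumerate(header)}
    # Assumption: relative FASTQ paths are resolved against the index file's directory.
    base = os.path.dirname(os.path.abspath(path))
    problems = []
    seen_fastq1 = set()
    for n, fields in rows[1:]:
        if len(fields) != len(header):
            problems.append(f"line {n}: {len(fields)} fields, header has {len(header)}")
            continue
        for key in ("FASTQ1", "FASTQ2"):
            fastq = fields[column[key]]
            if not os.path.isfile(os.path.join(base, os.path.expanduser(fastq))):
                problems.append(f"line {n}: {key} file not found: {fastq}")
        fastq1 = fields[column["FASTQ1"]]
        if fastq1 in seen_fastq1:
            problems.append(f"line {n}: FASTQ1 listed more than once: {fastq1}")
        seen_fastq1.add(fastq1)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_index(sys.argv[1])
    print("\n".join(issues) if issues else "no problems found")
```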

== Create your GotCloud Configuration File ==


* Everything else is configured already.

[[File:Gc2xconf.png]]


You'll notice that this file is very similar to the one we have been using.


:* Type Ctrl-a Esc and you should be able to scroll up with your mouse wheel

:** At least, that is what I do from my Linux machine (sorry, I wrote up and tested these commands on Linux, not Windows, so I can't check this there).

== Log Out ==

If you have not detached from screen:

Ctrl-a d

Then exit PuTTY.

== FEEDBACK! ==

Since I didn't send this out yesterday, today's survey asks for feedback on both Tuesday and Wednesday.

https://docs.google.com/forms/d/1qaLHq9w1Ib3FZq0CtlrbK_-breNiqGRV06oRYNmUuME/viewform
