Changes

From Genome Analysis Wiki
Jump to navigationJump to search
m
Line 1: Line 1:  
== Objective ==
 
== Objective ==
   −
In this winter, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable.
+
In Fall 2011, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable.
    
== Target Audience ==
 
== Target Audience ==
Line 10: Line 10:     
== Textbook ==
 
== Textbook ==
* Required Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]]
+
* Recommended Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]]
 
* Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]]
 
* Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]]
    
== Class Schedule ==
 
== Class Schedule ==
   −
Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4318
+
Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4332
    
== Topics ==
 
== Topics ==
Line 21: Line 21:  
The following contents are planned to be covered.  
 
The following contents are planned to be covered.  
   −
=== Part I : Algorithms 101 ===
+
=== Part I : C++ Basics and Introductory Algorithms ===
* Understanding of Computational Time Complexity
+
* Computational Time Complexity
 
* Sorting
 
* Sorting
 
* Divide and Conquer Algorithms
 
* Divide and Conquer Algorithms
Line 28: Line 28:  
* Key Data Structure
 
* Key Data Structure
 
* Dynamic Programming
 
* Dynamic Programming
 +
* Hidden Markov Models
   −
=== Part II : Matrix Operations and Numerical Optimizations ===
+
=== Part II : Numerical Methods and Randomized Algorithms ===
* Matrix decomposition (LU, QR, SVD)
+
* Random Numbers
* Implementation of Linear Models
+
* Matrix Operations and Least Square Methods
* Numerical Optimizations
+
* Importance Sampling
 
  −
=== Part III : Advanced Statistical Methods ===
  −
* Hidden Markov Models
   
* Expectation Maximization
 
* Expectation Maximization
 
* Markov-Chain Monte Carlo Methods
 
* Markov-Chain Monte Carlo Methods
 +
* Simulated Annealing
 +
* Gibbs Sampling
    
== Class Notes ==
 
== Class Notes ==
* Lecture 1 : Statistical Computing -- [[Media:Biostat615-lecture1-handout.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture1-presentation.pdf | (Presentation mode - PDF)]]
+
* Lecture 1 : Introduction to Statistical Computing -- [[Media:Biostat615-Fall2011-lecture01-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture01-presentation.pdf | (Presentation PDF)]]
* Lecture 2 : C++ Basics and Precisions -- [[Media:Biostat615-lecture2-handout-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture2-presentation.pdf | (Presentation mode - PDF)]]
+
* Lecture 2 : Introduction to C++ Programming -- [[Media:Biostat615-Fall2011-lecture02-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture02-presentation.pdf | (Presentation PDF)]]
* Lecture 3 : Implementing Fisher's Exact Test -- [[Media:Biostat615-lecture3-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture3.pdf | (Presentation mode - PDF)]]
+
* Lecture 3 : C++ Basics and Fisher's Exact Test -- [[Media:Biostat615-Fall2011-lecture03-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture03-presentation.pdf | (Presentation PDF)]]
 +
* Lecture 4 : Standard Template Libraries + Divide and Conquer Algorithms-- [[Media:Biostat615-Fall2011-lecture04-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture04-presentation.pdf | (Presentation PDF)]]
 +
* Lecture 5 : Sorting Algorithms -- [[Media:Biostat615-Fall2011-lecture05-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture05-presentation.pdf | (Presentation PDF)]]
 +
* Lecture 6 : Sorting Algorithms & Arrays -- [[Media:Biostat615-Fall2011-lecture06-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture06-presentation.pdf | (Presentation PDF)]]
 +
* Lecture 7 : List and Binary Search Trees -- [[Media:Biostat615-Fall2011-lecture07-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture07-presentation.pdf | (Presentation PDF)]]
 +
* Lecture 8 : Hash Tables -- [[Media:Biostat615-Fall2011-lecture08-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture08-presentation.pdf | (Presentation PDF)]]
 +
* Lecture 9 : Dynamic Programming -- [[Media:Biostat615-Fall2011-lecture09-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture09-presentation.pdf | (Presentation PDF)]]
 +
* Review : Dynamic Programming & Midterm Review -- [[Media:Biostat615-Fall2011-midterm-2011-winter-.pdf | (PDF)]]
 +
* Lecture 10 : Hidden Markov Model-- [[Media:Biostat615-Fall2011-lecture10-handout.pdf | (PDF)]]  '''(UPDATED on Oct 29th at 12:51PM)'''
 +
* Lecture 11 : Hidden Markov Model (cont'd) -- [[Media:Biostat615-lecture11-2011-10-20.pdf | (PDF)]] '''(UPDATED on Oct 28th at 1:42PM)'''
 +
* Lecture 12 : Boost Library & Random Numbers -- [[Media:Biostat615-lecture12-2011-10-25.pdf | (PDF)]]
 +
* Lecture 13 : Single dimensional optimization -- [[Media:Biostat615-lecture13-2011-10-27.pdf | (PDF)]]
 +
* Lecture 14 : Single and multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture14.pdf | (PDF)]] (Updated on Nov 3rd 1:25AM)
 +
* Lecture 15 : Multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture15.pdf | (PDF)]] (Updated Nov 8 10:35AM)
 +
* Lecture 16 : E-M algorithm -- [[Media:Biostat615-fall2011-lecture16.pdf | (PDF)]] (Updated Nov 8 10:35AM)
 +
* Lecture 17 : Simulated Annealing -- [[Media:Biostat615-fall2011-lecture17.pdf | (PDF)]]
 +
* Lecture 18 : Gibbs Sampling -- [[Media:Biostat615-fall2011-lecture18.pdf | (PDF)]] (Updated Nov 16 10:00PM)
 +
* Lecture 19 : Importace Sampling -- [[Media:Biostat615-fall2011-lecture19.pdf | (PDF)]]
 +
* Lecture 20 : Advanced Hidden Markov Models -- [[Media:Biostat615-fall2011-lecture20.pdf | (PDF)]]
 +
* Lecture 21 : Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture21.pdf | (PDF)]]
 +
* Lecture 22 : More Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture22.pdf | (PDF)]]
 +
* Lecture 23 : Interfacing between C++ and R -- [[Media:Biostat615-fall2011-lecture23.pdf | (PDF)]]
 +
* Review : Final Review -- [[Media:Biostat615-winter2011-final.pdf | (PDF)]] [[Media:Biostat615-homework-review.pdf | (Homework)]]
    
== Problem Sets ==
 
== Problem Sets ==
* Problem Set 1 : Due on January 20th, 2011 -- [[Media:Biostat615-homework1.pdf | (Problem Set 1 - PDF)]]
+
* Problem Set 0 - Running screenshots of helloWorld.cpp and towerOfHanoi.cpp - Due before the submission of Problem Set 1
 +
* Problem Set 1 -- Due on Tuesday September 27th, 2011 [[Media:Biostat615-Fall2011-homework01.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework01-solutions.pdf | (PDF-SOLUTIONS)]]
 +
* Problem Set 2 -- Due on Thursday October 6th, 2011 [[Media:Biostat615-Fall2011-homework02.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework02-solutions.pdf | (PDF-SOLUTIONS)]]
 +
** (Update Oct 2, 2011 : Note that the problem 1 and 3 are slightly updated for clarification)
 +
** (If you can't decompress the files above properly, use this alternative link by [http://dl.dropbox.com/u/1850834/biostat615-homework02-datasets.tar.gz CLICKING HERE] )
 +
* Problem Set 3 -- Due on Tuesday November 1st, 2011 [[Media:Biostat615-homework03.pdf | (PDF)]] (UPDATED on Oct 25th at 11:10AM)
 +
* Problem Set 4 -- Due on Tuesday November 15th, 2011 [[Media:Biostat615-homework04.pdf | (PDF)]]
 +
* Problem Set 5 -- Due on Tuesday November 29th, 2011 [[Media:Biostat615-fall2011-homework05.pdf | (PDF)]]
 +
* Problem Set 6 -- Due on Tuesday December 13th, 2011 [[Media:Biostat615-fall2011-homework06.pdf | (PDF)]]
 +
** [http://www.sph.umich.edu/csg/abecasis/class/2006/ModelFittingData.txt DOWNLOAD DATA FOR PROBLEM 1]
 +
** [http://dl.dropbox.com/u/1850834/zip_01.zip DOWNLOAD DATA FOR PROBLEM 3]
 +
 
 +
=== Supplementary Data sets for Problem Sets ===
 +
* Problem Set 2
 +
** [[Media:Shuf-1M.txt.gz| (Example data - shuf-1M.txt.gz)]] 1,000,000 randomly shuffled data (gzipped)
 +
** [[Media:Rand-1M-3digits.txt.gz| (Example data - Rand-1M-3digits.txt.gz)]] 1,000,000 random data from 1 to 1,000]] (gzipped)
 +
** [[Media:Rand-50k.txt.gz | (Example data - Rand-50k.txt.gz)]] 50,000 random data from 1 to 1,000,000)]] (gzippd)
 +
* Problem Set 3
 +
** Example output data for problem 3-1 (input is the second column) '''(NOTE : ADDED on Oct 25 11:45PM)''' -- This is also reflected in lecture 11 class note.
 +
TIME TOSS P(FAIR) P(BIAS) MLSTATE
 +
1 H 0.5950 0.4050 FAIR
 +
2 T 0.8118 0.1882 FAIR
 +
3 H 0.8071 0.1929 FAIR
 +
4 T 0.8584 0.1416 FAIR
 +
5 H 0.7613 0.2387 FAIR
 +
6 H 0.7276 0.2724 FAIR
 +
7 T 0.7495 0.2505 FAIR
 +
8 H 0.5413 0.4587 BIASED
 +
9 H 0.4187 0.5813 BIASED
 +
10 H 0.3533 0.6467 BIASED
 +
11 H 0.3301 0.6699 BIASED
 +
12 H 0.3436 0.6564 BIASED
 +
13 H 0.3971 0.6029 BIASED
 +
14 T 0.5028 0.4972 BIASED
 +
15 H 0.3725 0.6275 BIASED
 +
16 H 0.2985 0.7015 BIASED
 +
17 H 0.2635 0.7365 BIASED
 +
18 H 0.2596 0.7404 BIASED
 +
19 H 0.2858 0.7142 BIASED
 +
20 H 0.3482 0.6518 BIASED
 +
** Example output data for problem 3-2 (input is the second column) '''(NOTE : UPDATED on Oct 25 11:23PM)'''
 +
TIME TOSS Pr(F) Pr(HB) Pr(TB) MLSTATE
 +
1 T 0.8844 0.0326 0.0830 FAIR
 +
2 H 0.9012 0.0791 0.0198 FAIR
 +
3 H 0.9075 0.0735 0.0189 FAIR
 +
4 T 0.9091 0.0145 0.0764 FAIR
 +
5 T 0.9068 0.0114 0.0818 FAIR
 +
6 H 0.9058 0.0440 0.0502 FAIR
 +
7 T 0.8834 0.0275 0.0891 FAIR
 +
8 H 0.8520 0.0698 0.0783 FAIR
 +
9 T 0.7713 0.0347 0.1940 FAIR
 +
10 T 0.6927 0.0823 0.2249 FAIR
 +
11 H 0.4730 0.4984 0.0286 HEAD-BIASED
 +
12 H 0.3227 0.6706 0.0066 HEAD-BIASED
 +
13 H 0.2236 0.7726 0.0037 HEAD-BIASED
 +
14 H 0.1589 0.8381 0.0031 HEAD-BIASED
 +
15 H 0.1169 0.8803 0.0028 HEAD-BIASED
 +
16 H 0.0902 0.9072 0.0026 HEAD-BIASED
 +
17 H 0.0740 0.9235 0.0025 HEAD-BIASED
 +
18 H 0.0654 0.9321 0.0025 HEAD-BIASED
 +
19 H 0.0630 0.9346 0.0025 HEAD-BIASED
 +
20 H 0.0661 0.9314 0.0025 HEAD-BIASED
 +
21 H 0.0755 0.9219 0.0026 HEAD-BIASED
 +
22 H 0.0926 0.9038 0.0036 HEAD-BIASED
 +
23 H 0.1204 0.8684 0.0113 HEAD-BIASED
 +
24 H 0.1603 0.7586 0.0811 HEAD-BIASED
 +
25 T 0.1904 0.0858 0.7238 TAIL-BASED
 +
26 T 0.1819 0.0118 0.8063 TAIL-BASED
 +
27 T 0.1797 0.0036 0.8167 TAIL-BASED
 +
28 T 0.1894 0.0028 0.8077 TAIL-BASED
 +
29 T 0.2136 0.0038 0.7826 TAIL-BASED
 +
30 T 0.2561 0.0123 0.7317 TAIL-BASED
 +
** Example input/output data for problem 3-3 (Applying 2-state HMM in Problem 3-1): Download using [http://dl.dropbox.com/u/1850834/biostat615-homework3-3-20k-examples.zip THIS LINK]
    
== Office Hours ==
 
== Office Hours ==
* TBD : Please fill out the doodle poll at http://doodle.com/7z2mqvft8cdhh4bn
+
* Friday 9:00AM-10:30PM
 
  −
== Information on Biostatistics Cluster ==
  −
* You may want to request an account to the biostatistics cluster. Refer to https://webservices.itcs.umich.edu/mediawiki/Biostatistics/index.php/Cluster for further information.
      
== Standards of Academic Conduct ==
 
== Standards of Academic Conduct ==
Line 62: Line 153:     
== Course History ==
 
== Course History ==
 
+
* Winter 2011 Course Web Site [[Biostatistics_615/815_Winter_2011]]
Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]].
+
* Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]].

Navigation menu