Difference between revisions of "Biostatistics 615/815 Fall 2011"

Latest revision as of 17:35, 31 August 2012

Objective

In Fall 2011, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable.

Target Audience

Students in Biostatistics 615 should be comfortable with simple algebra and basic statistics including probability distribution, linear model, and hypothesis testing. Previous experience in programming is not required, but those who do not have previous programming experience should expect to spend additional time studying and learning to be familiar with a programming language during the coursework. Most students registering for the course are Masters or Doctoral students in Biostatistics, Statistics, Bioinformatics or Human Genetics.

Students in Biostatistics 815 should be familiar with programming languages so that they can complete the class project tackling an advanced statistical problem during the semester. Project will be carried out in teams of 2. The details of the possible projects will be announced soon.

Textbook

Recommended Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [Official Book Web Site]
Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [Official Book Web Site]

Class Schedule

Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4332

Topics

The following contents are planned to be covered.

Part I : C++ Basics and Introductory Algorithms

Computational Time Complexity
Sorting
Divide and Conquer Algorithms
Searching
Key Data Structure
Dynamic Programming
Hidden Markov Models

Part II : Numerical Methods and Randomized Algorithms

Random Numbers
Matrix Operations and Least Square Methods
Importance Sampling
Expectation Maximization
Markov-Chain Monte Carlo Methods
Simulated Annealing
Gibbs Sampling

Class Notes

Lecture 1 : Introduction to Statistical Computing -- (Handout PDF) (Presentation PDF)
Lecture 2 : Introduction to C++ Programming -- (Handout PDF) (Presentation PDF)
Lecture 3 : C++ Basics and Fisher's Exact Test -- (Handout PDF) (Presentation PDF)
Lecture 4 : Standard Template Libraries + Divide and Conquer Algorithms-- (Handout PDF) (Presentation PDF)
Lecture 5 : Sorting Algorithms -- (Handout PDF) (Presentation PDF)
Lecture 6 : Sorting Algorithms & Arrays -- (Handout PDF) (Presentation PDF)
Lecture 7 : List and Binary Search Trees -- (Handout PDF) (Presentation PDF)
Lecture 8 : Hash Tables -- (Handout PDF) (Presentation PDF)
Lecture 9 : Dynamic Programming -- (Handout PDF) (Presentation PDF)
Review : Dynamic Programming & Midterm Review -- (PDF)
Lecture 10 : Hidden Markov Model-- (PDF) (UPDATED on Oct 29th at 12:51PM)
Lecture 11 : Hidden Markov Model (cont'd) -- (PDF) (UPDATED on Oct 28th at 1:42PM)
Lecture 12 : Boost Library & Random Numbers -- (PDF)
Lecture 13 : Single dimensional optimization -- (PDF)
Lecture 14 : Single and multi dimensional optimizations -- (PDF) (Updated on Nov 3rd 1:25AM)
Lecture 15 : Multi dimensional optimizations -- (PDF) (Updated Nov 8 10:35AM)
Lecture 16 : E-M algorithm -- (PDF) (Updated Nov 8 10:35AM)
Lecture 17 : Simulated Annealing -- (PDF)
Lecture 18 : Gibbs Sampling -- (PDF) (Updated Nov 16 10:00PM)
Lecture 19 : Importace Sampling -- (PDF)
Lecture 20 : Advanced Hidden Markov Models -- (PDF)
Lecture 21 : Linear Algebra in C++ -- (PDF)
Lecture 22 : More Linear Algebra in C++ -- (PDF)
Lecture 23 : Interfacing between C++ and R -- (PDF)
Review : Final Review -- (PDF) (Homework)

Problem Sets

Problem Set 0 - Running screenshots of helloWorld.cpp and towerOfHanoi.cpp - Due before the submission of Problem Set 1
Problem Set 1 -- Due on Tuesday September 27th, 2011 (PDF) (PDF-SOLUTIONS)
Problem Set 2 -- Due on Thursday October 6th, 2011 (PDF) (PDF-SOLUTIONS)
- (Update Oct 2, 2011 : Note that the problem 1 and 3 are slightly updated for clarification)
- (If you can't decompress the files above properly, use this alternative link by CLICKING HERE )
Problem Set 3 -- Due on Tuesday November 1st, 2011 (PDF) (UPDATED on Oct 25th at 11:10AM)
Problem Set 4 -- Due on Tuesday November 15th, 2011 (PDF)
Problem Set 5 -- Due on Tuesday November 29th, 2011 (PDF)
Problem Set 6 -- Due on Tuesday December 13th, 2011 (PDF)
- DOWNLOAD DATA FOR PROBLEM 1
- DOWNLOAD DATA FOR PROBLEM 3

Supplementary Data sets for Problem Sets

Problem Set 2
- (Example data - shuf-1M.txt.gz) 1,000,000 randomly shuffled data (gzipped)
- (Example data - Rand-1M-3digits.txt.gz) 1,000,000 random data from 1 to 1,000]] (gzipped)
- (Example data - Rand-50k.txt.gz) 50,000 random data from 1 to 1,000,000)]] (gzippd)
Problem Set 3
- Example output data for problem 3-1 (input is the second column) (NOTE : ADDED on Oct 25 11:45PM) -- This is also reflected in lecture 11 class note.

TIME	TOSS	P(FAIR)	P(BIAS)	MLSTATE
1	H	0.5950	0.4050	FAIR
2	T	0.8118	0.1882	FAIR
3	H	0.8071	0.1929	FAIR
4	T	0.8584	0.1416	FAIR
5	H	0.7613	0.2387	FAIR
6	H	0.7276	0.2724	FAIR
7	T	0.7495	0.2505	FAIR
8	H	0.5413	0.4587	BIASED
9	H	0.4187	0.5813	BIASED
10	H	0.3533	0.6467	BIASED
11	H	0.3301	0.6699	BIASED
12	H	0.3436	0.6564	BIASED
13	H	0.3971	0.6029	BIASED
14	T	0.5028	0.4972	BIASED
15	H	0.3725	0.6275	BIASED
16	H	0.2985	0.7015	BIASED
17	H	0.2635	0.7365	BIASED
18	H	0.2596	0.7404	BIASED
19	H	0.2858	0.7142	BIASED
20	H	0.3482	0.6518	BIASED

- Example output data for problem 3-2 (input is the second column) (NOTE : UPDATED on Oct 25 11:23PM)

TIME	TOSS	Pr(F)	Pr(HB)	Pr(TB)	MLSTATE
1	T	0.8844	0.0326	0.0830	FAIR
2	H	0.9012	0.0791	0.0198	FAIR
3	H	0.9075	0.0735	0.0189	FAIR
4	T	0.9091	0.0145	0.0764	FAIR
5	T	0.9068	0.0114	0.0818	FAIR
6	H	0.9058	0.0440	0.0502	FAIR
7	T	0.8834	0.0275	0.0891	FAIR
8	H	0.8520	0.0698	0.0783	FAIR
9	T	0.7713	0.0347	0.1940	FAIR
10	T	0.6927	0.0823	0.2249	FAIR
11	H	0.4730	0.4984	0.0286	HEAD-BIASED
12	H	0.3227	0.6706	0.0066	HEAD-BIASED
13	H	0.2236	0.7726	0.0037	HEAD-BIASED
14	H	0.1589	0.8381	0.0031	HEAD-BIASED
15	H	0.1169	0.8803	0.0028	HEAD-BIASED
16	H	0.0902	0.9072	0.0026	HEAD-BIASED
17	H	0.0740	0.9235	0.0025	HEAD-BIASED
18	H	0.0654	0.9321	0.0025	HEAD-BIASED
19	H	0.0630	0.9346	0.0025	HEAD-BIASED
20	H	0.0661	0.9314	0.0025	HEAD-BIASED
21	H	0.0755	0.9219	0.0026	HEAD-BIASED
22	H	0.0926	0.9038	0.0036	HEAD-BIASED
23	H	0.1204	0.8684	0.0113	HEAD-BIASED
24	H	0.1603	0.7586	0.0811	HEAD-BIASED
25	T	0.1904	0.0858	0.7238	TAIL-BASED
26	T	0.1819	0.0118	0.8063	TAIL-BASED
27	T	0.1797	0.0036	0.8167	TAIL-BASED
28	T	0.1894	0.0028	0.8077	TAIL-BASED
29	T	0.2136	0.0038	0.7826	TAIL-BASED
30	T	0.2561	0.0123	0.7317	TAIL-BASED

- Example input/output data for problem 3-3 (Applying 2-state HMM in Problem 3-1): Download using THIS LINK

Office Hours

Friday 9:00AM-10:30PM

Standards of Academic Conduct

The following is an extract from the School of Public Health's Student Code of Conduct [1]:

Student academic misconduct includes behavior involving plagiarism, cheating, fabrication, falsification of records or official documents, intentional misuse of equipment or materials, and aiding and abetting the perpetration of such acts. The preparation of reports, papers, and examinations, assigned on an individual basis, must represent each student’s own effort. Reference sources should be indicated clearly. The use of assistance from other students or aids of any kind during a written examination, except when the use of books or notes has been approved by an instructor, is a violation of the standard of academic conduct.

In the context of this course, any work you hand-in should be your own.

Course History

Winter 2011 Course Web Site Biostatistics_615/815_Winter_2011
Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [Goncalo's older class notes].

Difference between revisions of "Biostatistics 615/815 Fall 2011"

Latest revision as of 17:35, 31 August 2012

Contents

Objective

Target Audience

Textbook

Class Schedule

Topics

Part I : C++ Basics and Introductory Algorithms

Part II : Numerical Methods and Randomized Algorithms

Class Notes

Problem Sets

Supplementary Data sets for Problem Sets

Office Hours

Standards of Academic Conduct

Course History

Navigation menu

Page actions

Page actions

Personal tools

quick links

teaching

Navigation

Search

Tools

@@ Line 1: / Line 1: @@
 == Objective ==
-In this winter, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable.
+In Fall 2011, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable.
 == Target Audience ==
@@ Line 10: / Line 10: @@
 == Textbook ==
-* Required Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]]
+* Recommended Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]]
 * Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]]
 == Class Schedule ==
-Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4318
+Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4332
 == Topics ==
@@ Line 21: / Line 21: @@
 The following contents are planned to be covered.
-=== Part I : Algorithms 101 ===
+=== Part I : C++ Basics and Introductory Algorithms ===
-* Understanding of Computational Time Complexity
+* Computational Time Complexity
 * Sorting
 * Divide and Conquer Algorithms
@@ Line 28: / Line 28: @@
 * Key Data Structure
 * Dynamic Programming
+* Hidden Markov Models
-=== Part II : Matrix Operations and Numerical Optimizations ===
+=== Part II : Numerical Methods and Randomized Algorithms ===
-* Matrix decomposition (LU, QR, SVD)
+* Random Numbers
-* Implementation of Linear Models
+* Matrix Operations and Least Square Methods
-* Numerical Optimizations
+* Importance Sampling
-=== Part III : Advanced Statistical Methods ===
-* Hidden Markov Models
 * Expectation Maximization
 * Markov-Chain Monte Carlo Methods
+* Simulated Annealing
+* Gibbs Sampling
 == Class Notes ==
-* Lecture 1 : Statistical Computing -- [[Media:Biostat615-lecture1-handout.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture1-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 1 : Introduction to Statistical Computing -- [[Media:Biostat615-Fall2011-lecture01-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture01-presentation.pdf | (Presentation PDF)]]
-* Lecture 2 : C++ Basics and Precisions -- [[Media:Biostat615-lecture2-handout-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture2-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 2 : Introduction to C++ Programming -- [[Media:Biostat615-Fall2011-lecture02-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture02-presentation.pdf | (Presentation PDF)]]
-* Lecture 3 : Implementing Fisher's Exact Test -- [[Media:Biostat615-lecture3-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture3.pdf | (Presentation mode - PDF)]]
+* Lecture 3 : C++ Basics and Fisher's Exact Test -- [[Media:Biostat615-Fall2011-lecture03-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture03-presentation.pdf | (Presentation PDF)]]
-* Lecture 4 : Classes and STLs -- [[Media:Biostat615-lecture4-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture4-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 4 : Standard Template Libraries + Divide and Conquer Algorithms-- [[Media:Biostat615-Fall2011-lecture04-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture04-presentation.pdf | (Presentation PDF)]]
-* Lecture 5 : Divide and Conquer Algorithms -- [[Media:Biostat615-lecture5-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture5.pdf | (Presentation mode - PDF)]]
+* Lecture 5 : Sorting Algorithms -- [[Media:Biostat615-Fall2011-lecture05-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture05-presentation.pdf | (Presentation PDF)]]
-* Lecture 6 : Linear Sorting Algorithms and Elementary Data Structures -- [[Media:Biostat615-lecture6-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture6-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 6 : Sorting Algorithms & Arrays -- [[Media:Biostat615-Fall2011-lecture06-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture06-presentation.pdf | (Presentation PDF)]]
-* Lecture 7 : Data Structures -- [[Media:Biostat615-lecture7-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture7-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 7 : List and Binary Search Trees -- [[Media:Biostat615-Fall2011-lecture07-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture07-presentation.pdf | (Presentation PDF)]]
-* Lecture 8 : Hash Tables -- [[Media:Biostat615-lecture8-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture8-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 8 : Hash Tables -- [[Media:Biostat615-Fall2011-lecture08-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture08-presentation.pdf | (Presentation PDF)]]
-* Lecture 9 : Dyamic Programming -- [[Media:Biostat615-lecture9-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture9-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 9 : Dynamic Programming -- [[Media:Biostat615-Fall2011-lecture09-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture09-presentation.pdf | (Presentation PDF)]]
-* Lecture 10 : Boost Libraries and Graphical Algorithms -- [[Media:Biostat615-lecture10-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture10-presentation.pdf | (Presentation mode - PDF)]]
+* Review : Dynamic Programming & Midterm Review -- [[Media:Biostat615-Fall2011-midterm-2011-winter-.pdf | (PDF)]]
-* Lecture 11 : Hidden Markov Models -- [[Media:Biostat615-lecture11-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture11.pdf | (Presentation mode - PDF)]]
+* Lecture 10 : Hidden Markov Model-- [[Media:Biostat615-Fall2011-lecture10-handout.pdf | (PDF)]]  '''(UPDATED on Oct 29th at 12:51PM)'''
-* Lecture 12 : Hidden Markov Models -- [[Media:Biostat615-lecture12-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture12-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 11 : Hidden Markov Model (cont'd) -- [[Media:Biostat615-lecture11-2011-10-20.pdf | (PDF)]] '''(UPDATED on Oct 28th at 1:42PM)'''
-* Lecture 13 : Matrix Computation -- [[Media:Biostat615-lecture13-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture13-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 12 : Boost Library & Random Numbers -- [[Media:Biostat615-lecture12-2011-10-25.pdf | (PDF)]]
-* Lecture 14 : Implementing Linear Regression -- [[Media:Biostat615-lecture14-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture14-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 13 : Single dimensional optimization -- [[Media:Biostat615-lecture13-2011-10-27.pdf | (PDF)]]
-* Lecture 15 : Random Number Generation -- [[Media:Biostat615-lecture15-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture15-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 14 : Single and multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture14.pdf | (PDF)]] (Updated on Nov 3rd 1:25AM)
-* Lecture 16 : Monte-Carlo methods and importance sampling -- [[Media:Biostat615-lecture16-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture16-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 15 : Multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture15.pdf | (PDF)]] (Updated Nov 8 10:35AM)
-* Lecture 17 : Numerical optimization -- [[Media:Biostat615-lecture17-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture17-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 16 : E-M algorithm -- [[Media:Biostat615-fall2011-lecture16.pdf | (PDF)]] (Updated Nov 8 10:35AM)
-* Lecture 18 : Numerical optimization II -- [[Media:Biostat615-lecture18-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture18-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 17 : Simulated Annealing -- [[Media:Biostat615-fall2011-lecture17.pdf | (PDF)]]
-* Special Lecture : A practical session -- StatGen Library  [[Media:StatGenLecture.pdf | (LectureNotes - PDF)]] [[Media:Debugging.pdf | (Debugging - PDF)]] Demo Code: [[Media:StatGenDemo.tar | StatGenDemo.tar]] Additional Notes: [[Media:GithubWithoutGit.pdf | GithubWithoutGit.pdf]]
+* Lecture 18 : Gibbs Sampling -- [[Media:Biostat615-fall2011-lecture18.pdf | (PDF)]] (Updated Nov 16 10:00PM)
-* Lecture 19 : The Simplex Method [[Media:Biostat615-lecture19-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture19-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 19 : Importace Sampling -- [[Media:Biostat615-fall2011-lecture19.pdf | (PDF)]]
-* Lecture 20 : The E-M Algorithm [[Media:Biostat615-lecture20-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture20-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 20 : Advanced Hidden Markov Models -- [[Media:Biostat615-fall2011-lecture20.pdf | (PDF)]]
-* Lecture 21 : Simulated Annealing [[Media:Biostat615-lecture21-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture21-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 21 : Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture21.pdf | (PDF)]]
-* Lecture 22 : Gibbs Sampler [[Media:Biostat615-lecture22-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture22-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 22 : More Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture22.pdf | (PDF)]]
-* Lecture 23 : The Baum-Welch Algorithm [[Media:Biostat615-lecture23-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture23-presentation.pdf | (Presentation mode - PDF)]]
+* Lecture 23 : Interfacing between C++ and R -- [[Media:Biostat615-fall2011-lecture23.pdf | (PDF)]]
+* Review : Final Review -- [[Media:Biostat615-winter2011-final.pdf | (PDF)]] [[Media:Biostat615-homework-review.pdf | (Homework)]]
 == Problem Sets ==
-* Problem Set 1 : Due on January 20th, 2011 -- [[Media:Biostat615-homework1.pdf | (Problem Set 1 - PDF)]]
+* Problem Set 0 - Running screenshots of helloWorld.cpp and towerOfHanoi.cpp - Due before the submission of Problem Set 1
-* Problem Set 2 : Due on February 8th, 2011 -- [[Media:Biostat615-homework2.pdf | (Problem Set 2 - PDF)]]
+* Problem Set 1 -- Due on Tuesday September 27th, 2011 [[Media:Biostat615-Fall2011-homework01.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework01-solutions.pdf | (PDF-SOLUTIONS)]]
+* Problem Set 2 -- Due on Thursday October 6th, 2011 [[Media:Biostat615-Fall2011-homework02.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework02-solutions.pdf | (PDF-SOLUTIONS)]]
+** (Update Oct 2, 2011 : Note that the problem 1 and 3 are slightly updated for clarification)
+** (If you can't decompress the files above properly, use this alternative link by [http://dl.dropbox.com/u/1850834/biostat615-homework02-datasets.tar.gz CLICKING HERE] )
+* Problem Set 3 -- Due on Tuesday November 1st, 2011 [[Media:Biostat615-homework03.pdf | (PDF)]] (UPDATED on Oct 25th at 11:10AM)
+* Problem Set 4 -- Due on Tuesday November 15th, 2011 [[Media:Biostat615-homework04.pdf | (PDF)]]
+* Problem Set 5 -- Due on Tuesday November 29th, 2011 [[Media:Biostat615-fall2011-homework05.pdf | (PDF)]]
+* Problem Set 6 -- Due on Tuesday December 13th, 2011 [[Media:Biostat615-fall2011-homework06.pdf | (PDF)]]
+** [http://www.sph.umich.edu/csg/abecasis/class/2006/ModelFittingData.txt DOWNLOAD DATA FOR PROBLEM 1]
+** [http://dl.dropbox.com/u/1850834/zip_01.zip DOWNLOAD DATA FOR PROBLEM 3]
+=== Supplementary Data sets for Problem Sets ===
+* Problem Set 2
 ** [[Media:Shuf-1M.txt.gz| (Example data - shuf-1M.txt.gz)]] 1,000,000 randomly shuffled data (gzipped)
 ** [[Media:Rand-1M-3digits.txt.gz| (Example data - Rand-1M-3digits.txt.gz)]] 1,000,000 random data from 1 to 1,000]] (gzipped)
 ** [[Media:Rand-50k.txt.gz | (Example data - Rand-50k.txt.gz)]] 50,000 random data from 1 to 1,000,000)]] (gzippd)
-* Problem Set 3 : Due on February 17th, 2011 -- [[Media:Biostat615-homework3.pdf | (Problem Set 3 - PDF)]]
+* Problem Set 3
-** For problem 2 - For Visual C++ users : Refer to the [[Biostatistics 615/815: Main Page#Code_for_Problem_3_-_Problem_2 | (Code Example)]] for a better start
+** Example output data for problem 3-1 (input is the second column) '''(NOTE : ADDED on Oct 25 11:45PM)''' -- This is also reflected in lecture 11 class note.
-* Problem Set 4 : Due on March 8th, 2011 -- [[Media:Biostat615-homework4.pdf | (Problem Set 4 - PDF)]]
+ TIME	TOSS	P(FAIR)	P(BIAS)	MLSTATE
-** [[http://dl.dropbox.com/u/1850834/simulCoinInput.txt.gz Download sample input file (gzipped)]]
+	H	0.5950	0.4050	FAIR
-** [[http://dl.dropbox.com/u/1850834/simulCoinOutput.txt.gz Download sample output file(gzipped)]]
+	T	0.8118	0.1882	FAIR
-** See below if you want to copy and paste example code
+	H	0.8071	0.1929	FAIR
-* Problem Set 5 : Due on March 29th, 2011 -- [[Media:Biostat615-homework5.pdf | (Problem Set 5 - PDF)]]
+	T	0.8584	0.1416	FAIR
-* Problem Set 6 : Due on April 7th, 2011 -- [[Media:Biostat615-homework6.pdf | (Problem Set 6 - PDF)]]
+	H	0.7613	0.2387	FAIR
-* Problem Set 7 : Due on April 21th, 2011 -- [[Media:Biostat615-homework7.pdf | (Problem Set 7 - PDF)]]
+	H	0.7276	0.2724	FAIR
-** [[http://www.sph.umich.edu/csg/abecasis/class/2006/ModelFittingData.txt Download The example data to use]]
+	T	0.7495	0.2505	FAIR
-** [[http://dl.dropbox.com/u/1850834/mixData.txt Download Additional data for sanity check (same data used in the lecture slides)]]
+	H	0.5413	0.4587	BIASED
+	H	0.4187	0.5813	BIASED
+	H	0.3533	0.6467	BIASED
-=== Source code for Homework 6 - simplex615.h ===
+	H	0.3301	0.6699	BIASED
-  #ifndef __SIMPLEX_615_H
+	H	0.3436	0.6564	BIASED
-  #define __SIMPLEX_615_H
+	H	0.3971	0.6029	BIASED
+	T	0.5028	0.4972	BIASED
-  #include <vector>
+	H	0.3725	0.6275	BIASED
-  #include <cmath>
+	H	0.2985	0.7015	BIASED
-  #include <iostream>
+	H	0.2635	0.7365	BIASED
+	H	0.2596	0.7404	BIASED
-  #define ZEPS 1e-10
+	H	0.2858	0.7142	BIASED
+	H	0.3482	0.6518	BIASED
- class optFunc {
+** Example output data for problem 3-2 (input is the second column) '''(NOTE : UPDATED on Oct 25 11:23PM)'''
-  public:
+  TIME	TOSS	Pr(F)	Pr(HB)	Pr(TB)	MLSTATE
-   virtual double operator() (std::vector<double>& x) = 0;
+	T	0.8844	0.0326	0.0830	FAIR
- };
+	H	0.9012	0.0791	0.0198	FAIR
+	H	0.9075	0.0735	0.0189	FAIR
- // Simplex contains (dim+1)*dim points
+	T	0.9091	0.0145	0.0764	FAIR
- class simplex615 {
+	T	0.9068	0.0114	0.0818	FAIR
-  protected:
+	H	0.9058	0.0440	0.0502	FAIR
-   std::vector<std::vector<double> > X;
+	T	0.8834	0.0275	0.0891	FAIR
-   std::vector<double> Y;
+	H	0.8520	0.0698	0.0783	FAIR
-   std::vector<double> midPoint;
+	T	0.7713	0.0347	0.1940	FAIR
-   std::vector<double> thruLine;
+	T	0.6927	0.0823	0.2249	FAIR
+	H	0.4730	0.4984	0.0286	HEAD-BIASED
-   int dim, idxLo, idxHi, idxNextHi;
+	H	0.3227	0.6706	0.0066	HEAD-BIASED
+	H	0.2236	0.7726	0.0037	HEAD-BIASED
-   void evaluateFunction(optFunc& foo);
+	H	0.1589	0.8381	0.0031	HEAD-BIASED
-   void evaluateExtremes();
+	H	0.1169	0.8803	0.0028	HEAD-BIASED
-   void prepareUpdate();
+	H	0.0902	0.9072	0.0026	HEAD-BIASED
-   bool updateSimplex(optFunc& foo, double scale);
+	H	0.0740	0.9235	0.0025	HEAD-BIASED
-   void contractSimplex(optFunc& foo);
+	H	0.0654	0.9321	0.0025	HEAD-BIASED
-   static int check_tol(double fmax, double fmin, double ftol);
+	H	0.0630	0.9346	0.0025	HEAD-BIASED
+	H	0.0661	0.9314	0.0025	HEAD-BIASED
-  public:
+	H	0.0755	0.9219	0.0026	HEAD-BIASED
-   simplex615(double* p, int d);
+	H	0.0926	0.9038	0.0036	HEAD-BIASED
-   void amoeba(optFunc& foo, double tol);
+	H	0.1204	0.8684	0.0113	HEAD-BIASED
-   std::vector<double>& xmin();
+	H	0.1603	0.7586	0.0811	HEAD-BIASED
-   double ymin();
+	T	0.1904	0.0858	0.7238	TAIL-BASED
-  };
+	T	0.1819	0.0118	0.8063	TAIL-BASED
+	T	0.1797	0.0036	0.8167	TAIL-BASED
- simplex615::simplex615(double* p, int d) : dim(d) {
+	T	0.1894	0.0028	0.8077	TAIL-BASED
-   X.resize(dim+1);
+	T	0.2136	0.0038	0.7826	TAIL-BASED
-   Y.resize(dim+1);
+	T	0.2561	0.0123	0.7317	TAIL-BASED
-   midPoint.resize(dim);
+** Example input/output data for problem 3-3 (Applying 2-state HMM in Problem 3-1): Download using [http://dl.dropbox.com/u/1850834/biostat615-homework3-3-20k-examples.zip THIS LINK]
-   thruLine.resize(dim);
-   for(int i=0; i < dim+1; ++i) {
-     X[i].resize(dim);
-   }
-   // set every point the same
-   for(int i=0; i < dim+1; ++i) {
-     for(int j=0; j < dim; ++j) {
-       X[i][j] = p[j];
-     }
-   }
-   // then increase each dimension by one except for the last point
-   for(int i=0; i < dim; ++i) {
-     X[i][i] += 1.;
-   }
-  }
-  void simplex615::evaluateFunction(optFunc& foo) {
-   for(int i=0; i < dim+1; ++i) {
-     Y[i] = foo(X[i]);
-   }
-  }
-  void simplex615::evaluateExtremes() {
-   if ( Y[0] > Y[1] ) {
-     idxHi = 0;
-     idxLo = idxNextHi = 1;
-   }
-   else {
-     idxHi = 1;
-     idxLo = idxNextHi = 0;
-   }
-   for(int i=2; i < dim+1; ++i) {
-     if ( Y[i] <= Y[idxLo] ) {
-       idxLo = i;
-     }
-     else if ( Y[i] > Y[idxHi] ) {
-       idxNextHi = idxHi;
-       idxHi = i;
-     }
-     else if ( Y[i] > Y[idxNextHi] ) {
-       idxNextHi = i;
-     }
-   }
-  }
- void simplex615::prepareUpdate() {
-   for(int j=0; j < dim; ++j) {
-     midPoint[j] = 0;
-   }
-   for(int i=0; i < dim+1; ++i) {
-     if ( i != idxHi ) {
-       for(int j=0; j < dim; ++j) {
- 	midPoint[j] += X[i][j];
-       }
-     }
-   }
-   for(int j=0; j < dim; ++j) {
-     midPoint[j] /= dim;
-     thruLine[j] = X[idxHi][j] - midPoint[j];
-   }
-  }
-  bool simplex615::updateSimplex(optFunc& foo, double scale) {
-   std::vector<double> nextPoint;
-   nextPoint.resize(dim);
-   for(int i=0; i < dim; ++i) {
-     nextPoint[i] = midPoint[i] + scale * thruLine[i];
-   }
-   double fNext = foo(nextPoint);
-   if ( fNext < Y[idxHi] ) { // exchange with maximum
-     for(int i=0; i < dim; ++i) {
-       X[idxHi][i] = nextPoint[i];
-     }
-     Y[idxHi] = fNext;
-     return true;
-   }
-   else {
-     return false;
-   }
- }
- void simplex615::contractSimplex(optFunc& foo) {
-   for(int i=0; i < dim+1; ++i) {
-     if ( i != idxLo ) {
-       for(int j=0; j < dim; ++j) {
- 	X[i][j] = 0.5*( X[idxLo][j] + X[i][j] );
- 	Y[i] = foo(X[i]);
-       }
-     }
-   }
- }
- void simplex615::amoeba(optFunc& foo, double tol) {
-   evaluateFunction(foo);
-   while(true) {
-     evaluateExtremes();
-     prepareUpdate();
-     if ( check_tol(Y[idxHi],Y[idxLo],tol) ) break;
-     updateSimplex(foo, -1.0); // reflection
-     if ( Y[idxHi] < Y[idxLo] ) {
-       updateSimplex(foo, -2.0); // expansion
-     }
-     else if ( Y[idxHi] >= Y[idxNextHi] ) {
-       if ( !updateSimplex(foo, 0.5) ) {
- 	contractSimplex(foo);
-       }
-     }
-   }
- }
- std::vector<double>& simplex615::xmin() {
-   return X[idxLo];
- }
- double simplex615::ymin() {
-   return Y[idxLo];
- }
- int simplex615::check_tol(double fmax, double fmin, double ftol) {
-   double delta = fabs(fmax - fmin); double accuracy = (fabs(fmax) + fabs(fmin)) * ftol;
-   return (delta < (accuracy + ZEPS));
- }
- #endif // __SIMPLEX_615_H
-=== Source code for Homework 4 - Matrix615.h ===
- #ifndef __BIOSTAT615_MATRIX_H  // avoid including the same header twice
- #define __BIOSTAT615_MATRIX_H
- #include <iostream>
- #include <fstream>
- #include <boost/tokenizer.hpp>
- #include <boost/lexical_cast.hpp>
- #include <boost/foreach.hpp>
- #include <Eigen/Core>
- // a generic class for Matrix
- template<class T>
- class Matrix615 {
- protected:   // internal data
-   std::vector<T> data;    // using std::vector - object copy is now possible
-   int nr, nc;             // # rows and cols
-   bool hasMissing;
-   T valueMissing;
-   std::string strMissing;
- public:
-   // default constructor
-  Matrix615() : nr(0), nc(0), hasMissing(false) {}
-  Matrix615(const char* filename) : nr(0), nc(0), hasMissing(false) {
-     readFromFile(filename);
-   }
-  Matrix615(const T value, const char* str) : nr(0), nc(0) {
-     enableMissingValue(value, str);
-   }
-   // Allow missing value as a pair of actual value and string value
-   void enableMissingValue(const T value, const char* str) {
-     hasMissing = true;
-     valueMissing = value;
-     strMissing = str;
-   }
-   // resize the dimension of the matrix
-   void resize(int nrows, int ncols) {
-     nr = nrows;
-     nc = ncols;
-     data.resize(nr*nc);
-   }
-   // fill the content
-   void fill(T defaultValue) {
-     std::fill( data.begin(), data.end(), defaultValue );
-   }
-   void copyTo(Eigen::Matrix<T,Eigen::Dynamic,Eigen::Dynamic>& m) {
-     m.resize(nr,nc);
-     for(int j=0; j < nc; ++j) {
-       for(int i=0; i < nr; ++i) {
- 	m(i,j) = data[i*nc+j];
-       }
-     }
-   }
-   // access individual element
-   T& at(int r, int c) { return data[r*nc+c]; }
-   int numRows() { return nr; }
-   int numCols() { return nc; }
-   // print the content of the matrix
-   void print(std::ostream& o) {
-     for(int i=0; i < nr; ++i) {
-       for(int j=0; j < nc; ++j) {
- 	if ( j > 0 ) o << "\t";
- 	if ( hasMissing && ( valueMissing == at(i,j) ) )
- 	  o << strMissing;
- 	else
- 	  o << at(i,j);
-       }
-       o << std::endl;
-     }
-   }
-   // opens a file to fill the matrix
-   void readFromFile(const char* file) {
-     std::ifstream ifs(file);
-     if ( ! ifs.is_open() ) {
-       std::cerr << "Cannot open file " << file << std::endl;
-       abort();
-     }
-     std::string line;
-     boost::char_separator<char> sep(" \t");
-     typedef boost::tokenizer< boost::char_separator<char> > wsTokenizer;
-     data.clear();
-     nr = nc = 0;
-     while( std::getline(ifs, line) ) {
-       wsTokenizer t(line,sep);
-       for(wsTokenizer::iterator i=t.begin(); i != t.end(); ++i) {
- 	// if hasMissing is set, convert string "Missing" into special value for Missing
- 	if ( hasMissing && ( i->compare(strMissing) == 0 ) )
- 	    data.push_back(valueMissing);
- 	// Otherwise, convert the string to a particular type
- 	else
- 	  data.push_back(boost::lexical_cast<T>(i->c_str()));
- 	if ( nr == 0 ) ++nc;  // count # of columns at the first row
-       }
-       ++nr;
-       // when reading each line, make sure that the # of columns match to expectation;
-       if ( (int)data.size() != nr*nc ) {
- 	std::cerr << "The input file is not rectangle at line " << nr << std::endl;
- 	abort();
-       }
-     }
-   }
- };
- #endif
-=== Source code for Homework 4 - Problem 1 ===
- #include "Matrix615.h"
- #define NIL 999999
- int main(int argc, char** argv) {
-   if ( argc != 2 ) {
-     std::cerr << "Usage: floydWarshall [weight matrix]" << std::endl;
-     abort();
-   }
-   Matrix615<int> W;
-   W.enableMissingValue(NIL,"NIL");
-   W.readFromFile(argv[1]);
-   if ( W.numRows() != W.numCols() ) {
-     std::cerr << "The input file is not exactly square" << std::endl;
-     abort();
-   }
-   int n = W.numRows();
-   Matrix615<int> P;
-   P.enableMissingValue(NIL,"NIL");
-   P.resize(n,n);
-   // Fill out the code in the part marked as *** [FILL HERE] ***
-   // *** FILL HERE ** initialize P and W matrix as described in page 695-696 of the textbook
-   Matrix615<int> D = W;  // object copy is valid because no pointer is involved
-   // print initial D and W matrix
-   std::cout << "---------- Initial D  Matrix ---------------------------------" << std::endl;
-   D.print(std::cout);
-   std::cout << "---------- Initial PI Matrix ---------------------------------" << std::endl;
-   P.print(std::cout);
-   for(int k=0; k < n; ++k) {
-     // *** FILL HERE *** Run floyd-Warshall algorithm for each k and
-     // print out D and PI matrix for each iteration
-     std::cout << "---------- D  Matrix after covering node " << k << "----------" << std::endl;
-     D.print(std::cout);
-     std::cout << "---------- PI Matrix after covering node " << k << "----------" << std::endl;
-     P.print(std::cout);
-   }
-   // *** EXTRA POINTS ***  Extra points (10\%) will be given if the optimal path
-   //                       for every pair of node is printed below
-   return 0;
- }
-=== Source code for Homework 4 - Problem 2 ===
- #include <cstdlib>
- #include <iomanip>
- #include <iostream>
- #include "Matrix615.h"
- #define N_STATES 2
- #define N_DATA 2
- const char* stateLabels[N_STATES] = {"F","B"};
- const char* dataLabels[N_DATA] = {"H","T"};
- class BiasedCoinHMM {
- protected:
-   int T;
-   // HMM initial parameters
-   std::vector<double> pi; // N_STATES * 1 vector
-   Matrix615<double> A;    // N_STATES * N_STATES matrix
-                           //  : A_{ij} is \Pr(q_t=i|q_{t-1}=j)
-   Matrix615<double> B;    // N_OBS * N_STATES matrix
-                           //  : B_{ij} = b_j(o_i) = \Pr(o_t=i|q_t=j)
-   // Data (observations)
-   std::vector<int> o; // vector observations
-   // forward-backward probability
-   Matrix615<double> alphas; // T * N_STATES : alphas[i,j] = alpha_i(i)
-   Matrix615<double> betas;  // T * N_STATES : betas[i,j]  = beta_j(i)
-   Matrix615<double> gammas; // T * N_STATES : gammas[i,j] = gammas_j(i) = Pr(q_t=i|o_t=i)
-   // viterbi probability and paths
-   Matrix615<double> deltas; // T * N_STATES
-   Matrix615<int> phis;   // T * N_STATES
-   std::vector<int> mleStates; // vector of MLE states
- public:
-   BiasedCoinHMM(double priors[N_STATES], double trans[N_STATES][N_STATES], double emis[N_DATA][N_STATES]) {
-     for(int i=0; i < N_STATES; ++i) {
-       pi.push_back(priors[i]);
-     }
-     A.resize(N_STATES,N_STATES);
-     for(int i=0; i < N_STATES; ++i) {
-       for(int j=0; j < N_STATES; ++j) {
-         A.at(i,j) = trans[i][j];
-       }
-     }
-     B.resize(N_DATA,N_STATES);
-     for(int i=0; i < N_DATA; ++i) {
-       for(int j=0; j < N_STATES; ++j) {
-         B.at(i,j) = emis[i][j];
-       }
-     }
-     T = 0;
-   }
-   void loadObservations(std::vector<int>& obs) {
-     o = obs;
-     T = o.size();
-     alphas.resize(T,N_STATES);
-     betas.resize(T,N_STATES);
-     gammas.resize(T,N_STATES);
-     deltas.resize(T,N_STATES);
-     phis.resize(T,N_STATES);
-     mleStates.resize(T);
-   }
-   void computeForwardBackward() {
-     // *** FILL OUT *** to compute
-     // forward and backward probabilities into alphas, betas, and gammas
-   }
-   void computeViterbiPath() {
-     // *** FILL OUT *** to compute
-     // viterbi path and likelihood into deltas, phis, and mleStates,
-   }
-   void outputResults(std::ostream& os, std::vector<int>& trueStates) {
-     if ( T != (int)trueStates.size() ) {
-       std::cerr << "True states are in different length with HMM" << std::endl;
-       abort();
-     }
-     os << "#SEQ\tTRUE_S\tOBS\tP(q|o)\tMLE_S" << std::endl << std::fixed << std::setprecision(4);
-     for(int t = 0; t < T; ++t) {
-       os << t+1 << "\t"
-          << stateLabels[trueStates[t]] << "\t"
-          << dataLabels[o[t]] << "\t"
-          << gammas.at(t,1) << "\t"  // prints Pr(q_t=1|o)
-          << stateLabels[mleStates[t]] << "\n";
-     }
-   }
- };
- int main(int argc, char** argv) {
-   if ( argc != 2 ) {
-     std::cerr << "Usage: coinHMM [inputFile]" << std::endl;
-     return -1;
-   }
-   std::vector<int> trueStates;
-   std::vector<int> observations;
-   std::ifstream ifs(argv[1]);
-   if ( !ifs.is_open() ) {
-     std::cerr << "Cannot open file " << argv[1] << std::endl;
-     return -1;
-   }
-   std::string tok;
-   for(int i=0; ifs >> tok; ++i) {
-     if ( i % 3 == 1 ) {
-       if ( tok.compare("F") == 0 ) {
-         trueStates.push_back(0);
-       }
-       else if ( tok.compare("B") == 0 ) {
-         trueStates.push_back(1);
-       }
-       else {
-         std::cerr << "Cannot recognize state " << tok << std::endl;
-       }
-     }
-     else if ( i % 3 == 2 ) {
-       if ( tok.compare("H") == 0 ) {
-         observations.push_back(0);
-       }
-       else if ( tok.compare("T") == 0 ) {
-         observations.push_back(1);
-       }
-       else {
-         std::cerr << "Cannot recognize observation " << tok << std::endl;
-       }
-     }
-   }
-   std::cout << "Finished reading " << trueStates.size() << " states and observations.." << std::endl;
-   double trans[N_STATES][N_STATES] = { {0.95,0.2}, {0.05,0.8} };
-   double emis[N_DATA][N_STATES] = { {0.5,0.9}, {0.5,0.1} };
-   double pi[N_STATES] = {0.9,0.1};
-   BiasedCoinHMM bcHMM(pi, trans, emis);
-   bcHMM.loadObservations(observations);
-   bcHMM.computeForwardBackward();
-   bcHMM.computeViterbiPath();
-   bcHMM.outputResults(std::cout, trueStates);
-   return 0;
- }
-=== Additional example of containing negative weights ===
- $ cat ./sampleInput2.txt
-	3	8	NIL	-4
- NIL	0	NIL	1	7
- NIL	4	0	NIL	NIL
-	NIL	-5	0	NIL
- NIL	NIL	NIL	6	0
- $ ./floydWarshall ./sampleInput2.txt
- ---------- Initial D  Matrix ---------------------------------
-	3	8	NIL	-4
- NIL	0	NIL	1	7
- NIL	4	0	NIL	NIL
-	NIL	-5	0	NIL
- NIL	NIL	NIL	6	0
- ---------- Initial PI Matrix ---------------------------------
- NIL	0	0	NIL	0
- NIL	NIL	NIL	1	1
- NIL	2	NIL	NIL	NIL
-	NIL	3	NIL	NIL
- NIL	NIL	NIL	4	NIL
- ---------- D  Matrix after covering node 0----------
-	3	8	NIL	-4
- NIL	0	NIL	1	7
- NIL	4	0	NIL	999995
-	5	-5	0	-2
- NIL	NIL	NIL	6	0
- ---------- PI Matrix after covering node 0----------
- NIL	0	0	NIL	0
- NIL	NIL	NIL	1	1
- NIL	2	NIL	NIL	0
-	0	3	NIL	0
- NIL	NIL	NIL	4	NIL
- ---------- D  Matrix after covering node 1----------
-	3	8	4	-4
- NIL	0	NIL	1	7
- NIL	4	0	5	11
-	5	-5	0	-2
- NIL	NIL	NIL	6	0
- ---------- PI Matrix after covering node 1----------
- NIL	0	0	1	0
- NIL	NIL	NIL	1	1
- NIL	2	NIL	1	1
-	0	3	NIL	0
- NIL	NIL	NIL	4	NIL
- ---------- D  Matrix after covering node 2----------
-	3	8	4	-4
- NIL	0	NIL	1	7
- NIL	4	0	5	11
-	-1	-5	0	-2
- NIL	NIL	NIL	6	0
- ---------- PI Matrix after covering node 2----------
- NIL	0	0	1	0
- NIL	NIL	NIL	1	1
- NIL	2	NIL	1	1
-	2	3	NIL	0
- NIL	NIL	NIL	4	NIL
- ---------- D  Matrix after covering node 3----------
-	3	-1	4	-4
-	0	-4	1	-1
-	4	0	5	3
-	-1	-5	0	-2
-	5	1	6	0
- ---------- PI Matrix after covering node 3----------
- NIL	0	3	1	0
-	NIL	3	1	0
-	2	NIL	1	0
-	2	3	NIL	0
-	2	3	4	NIL
- ---------- D  Matrix after covering node 4----------
-	1	-3	2	-4
-	0	-4	1	-1
-	4	0	5	3
-	-1	-5	0	-2
-	5	1	6	0
- ---------- PI Matrix after covering node 4----------
- NIL	2	3	4	0
-	NIL	3	1	0
-	2	NIL	1	0
-	2	3	NIL	0
-	2	3	4	NIL
- Optimal path for 1 <- 0 (d = 1) : 1 <-(4)-- 2 <-(-5)-- 3 <-(6)-- 4 <-(-4)-- 0
- Optimal path for 2 <- 0 (d = -3) : 2 <-(-5)-- 3 <-(6)-- 4 <-(-4)-- 0
- Optimal path for 3 <- 0 (d = 2) : 3 <-(6)-- 4 <-(-4)-- 0
- Optimal path for 4 <- 0 (d = -4) : 4 <-(-4)-- 0
- Optimal path for 0 <- 1 (d = 3) : 0 <-(2)-- 3 <-(1)-- 1
- Optimal path for 2 <- 1 (d = -4) : 2 <-(-5)-- 3 <-(1)-- 1
- Optimal path for 3 <- 1 (d = 1) : 3 <-(1)-- 1
- Optimal path for 4 <- 1 (d = -1) : 4 <-(-4)-- 0 <-(2)-- 3 <-(1)-- 1
- Optimal path for 0 <- 2 (d = 7) : 0 <-(2)-- 3 <-(1)-- 1 <-(4)-- 2
- Optimal path for 1 <- 2 (d = 4) : 1 <-(4)-- 2
- Optimal path for 3 <- 2 (d = 5) : 3 <-(1)-- 1 <-(4)-- 2
- Optimal path for 4 <- 2 (d = 3) : 4 <-(-4)-- 0 <-(2)-- 3 <-(1)-- 1 <-(4)-- 2
- Optimal path for 0 <- 3 (d = 2) : 0 <-(2)-- 3
- Optimal path for 1 <- 3 (d = -1) : 1 <-(4)-- 2 <-(-5)-- 3
- Optimal path for 2 <- 3 (d = -5) : 2 <-(-5)-- 3
- Optimal path for 4 <- 3 (d = -2) : 4 <-(-4)-- 0 <-(2)-- 3
- Optimal path for 0 <- 4 (d = 8) : 0 <-(2)-- 3 <-(6)-- 4
- Optimal path for 1 <- 4 (d = 5) : 1 <-(4)-- 2 <-(-5)-- 3 <-(6)-- 4
- Optimal path for 2 <- 4 (d = 1) : 2 <-(-5)-- 3 <-(6)-- 4
- Optimal path for 3 <- 4 (d = 6) : 3 <-(6)-- 4
-=== Source code from Homework 2 - Problem 3 ===
- #include <iostream>
- #include <vector>
- #include <ctime>
- #include <fstream>
- #include <set>
- #include "mySortedArray.h"
- #include "myTree.h"
- #include "myList.h"
- int main(int argc, char** argv) {
-   int tok;
-   std::vector<int> v;
-   if ( argc > 1 ) {
-     std::ifstream fin(argv[1]);
-     while( fin >> tok ) { v.push_back(tok); }
-     fin.close();
-   }
-   else {
-     while( std::cin >> tok ) { v.push_back(tok); }
-   }
-   mySortedArray<int> c1;
-   myList<int> c2;
-   myTree<int> c3;
-   std::set<int> s;
-   clock_t start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     c1.insert(v[i]);
-   }
-   clock_t finish = clock();
-   double duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "Sorted Array (Insert) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     c2.insert(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "List (Insert) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     c3.insert(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "Tree (Insert) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     s.insert(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "std::set (Insert) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     c1.search(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "Sorted Array (Search) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     c2.search(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "List (Search) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     c3.search(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "Tree (Search) " << duration << std::endl;
-   start = clock();
-   for(int i=0; i < (int)v.size(); ++i) {
-     s.find(v[i]);
-   }
-   finish = clock();
-   duration = (double)(finish-start)/CLOCKS_PER_SEC;
-   std::cout << "std::set (Search) " << duration << std::endl;
- }
-=== The header file of mySortedArray.h (For Homework 2 Problem 3)===
- #include <iostream>
- #define DEFAULT_ALLOC 1024
- template <class T> // template supporting a generic type
- class mySortedArray {
-  protected:     // member variables hidden from outside
-    T *data;    // array of the genetic type
-    int size;   // number of elements in the container
-    int nalloc; // # of objects allocated in the memory
-   mySortedArray(mySortedArray& a) {};
-   int search(const T& x, int begin, int end);
- public:      // abstract interface visible to outside
-    mySortedArray();         // default constructor
-    ~mySortedArray();        // desctructor
-    void insert(const T& x); // insert an element x
-    int search(const T& x);  // search for an element x and return its location
-    bool remove(const T& x); // delete a particular element
- };
- template <class T>
- mySortedArray<T>::mySortedArray() {     // default constructor
-   size = 0;              // array do not have element initially
-   nalloc = DEFAULT_ALLOC;
-   data = new T[nalloc];  // allocate default # of objects in memory
- }
- template <class T>
- mySortedArray<T>::~mySortedArray() {    // destructor
-   if ( data != NULL ) {
-     delete [] data;      // delete the allocated memory before destorying
-   }                      // the object. otherwise, memory leak happens
- }
- template <class T>
- void mySortedArray<T>::insert(const T& x) {
-   if ( size >= nalloc ) {  // if container has more elements than allocated
-     T* newdata = new T[nalloc*2];   // make an array at doubled size
-     for(int i=0; i < nalloc; ++i) {
-       newdata[i] = data[i];         // copy the contents of array
-     }
-     delete [] data;                 // delete the original array
-     data = newdata;                 // and reassign data ptr
-   }
-   int i;
-   for(i=size-1; (i >= 0) && (data[i] > x); --i) {
-     data[i+1] = data[i];            // insert the list into right position
-   }
-   data[i+1] = x;
-   ++size;                           // increase the size
- }
- template <class T>
- int mySortedArray<T>::search(const T& x) {
-   return search(x, 0, size-1);
- }
- template <class T>
- int mySortedArray<T>::search(const T& x, int begin, int end) {
-   if ( begin > end ) {
-     return -1;
-   }
-   else {
-     int mid = (begin+end)/2;
-     if ( data[mid] == x ) {
-       return mid;
-     }
-     else if ( data[mid] < x ) {
-       return search(x, mid+1, end);
-     }
-     else {
-       return search(x, begin, mid);
-     }
-   }
- }
- template <class T>
- bool mySortedArray<T>::remove(const T& x) {
-   int i = search(x);  // try to find the element
-   if ( i >= 0 ) {      // if found
-     for(int j=i; j < size-1; ++j) {
-       data[i] = data[i+1];  // shift all the elements by one
-     }
-     --size;           // and reduce the array size
-     return true;      // successfully removed the value
-   }
-   else {
-     return false;     // cannot find the value to remove
-   }
- }
-=== Example code to generare all possible permutations ===
-Note that http://www.cplusplus.com/reference/algorithm/next_permutation/ provides much simpler solution for generating permutation. (Thanks to Matthew Snyder)
- // Code to generate permutations from 1..n
- // These code was adopted from
- // http://geeksforgeeks.org/?p=767 by Hyun Min Kang
- #include <iostream>
- #include <vector>
- // swaps two elements
- void swap(std::vector<int>& v, int i, int j) {
-   int tmp = v[i];
-   v[i] = v[j];
-   v[j] = tmp;
-   return;
- }
- // actual engine - recursively calls permutation
- void permute(std::vector<int>& v, int from, int to) {
-   if ( from == to ) {           // print the permutation
-     for(int i=0; i < v.size(); ++i) {
-       std::cout << " " << v[i];
-     }
-     std::cout << std::endl;
-   }
-   else {
-     for(int i = from; i <= to; ++i) {
-       swap(v,from,i);           // swaps two elements
-       permute(v, from+1, to);   // permute the rest
-       swap(v,from,i);           // recover the vector to the original state
-     }
-   }
- }
- // Permutation Example
- int main(int argc, char** argv) {
-   if ( argc != 2 ) {
-     std::cerr << "Usage: " << argv[0] << " " << "[# of samples]" << std::endl;
-     return -1;
-   }
-   // takes input from 1..n
-   int n = atoi(argv[1]);
-   std::vector<int> input;
-   for(int i=1; i <= n; ++i) {
-     input.push_back(i);
-   }
-   permute(input, 0, n-1); // print all possible permutations
-   return 0;
- }
- #include <iostream>                       // for input/output
- #include <boost/graph/adjacency_list.hpp> // for using graph type
- #include <boost/graph/dijkstra_shortest_paths.hpp> // for dijkstra algorithm
- using namespace std;   // allow to omit prefix 'std::'
- using namespace boost; // allow to omit prefix 'boost::'
- int getNodeID(int node) {
-   return ((node/5)+1)*10+(node%5+1);
- }
-=== Code for Problem 3 - Problem 2 ===
- #include <iostream>                       // for input/output
- #include <boost/graph/adjacency_list.hpp> // for using graph type
- #include <boost/graph/dijkstra_shortest_paths.hpp> // for dijkstra algorithm
- using namespace std;   // allow to omit prefix 'std::'
- using namespace boost; // allow to omit prefix 'boost::'
- int getNodeID(int node) {
-   return ((node/5)+1)*10+(node%5+1);
- }
- int main(int argc, char** argv) {
-   // defining a graph type
-   // 1. edges are stored as std::list internally
-   // 2. verticies are stored as std::vector internally
-   // 3. the graph is directed (undirectedS, bidirectionalS can be used)
-   // 4. vertices do not carry particular properties
-   // 5. edges contains weight property as integer value
-   typedef adjacency_list< listS, vecS, directedS, no_property, property< edge_weight_t, int> > graph_t;
-   // vertex_descriptor is a type for representing vertices
-   typedef graph_traits< graph_t >::vertex_descriptor vertex_descriptor;
-   // a nodes is represented as an integer, and an edge is a pair of integers
-   typedef std::pair<int, int> E;
-   // Connect between verticies as in the Manhattan Tourist Problem
-   // Each node is numbered as a two-digit integer of [#row] and [#col]
-   enum { N11, N12, N13, N14, N15,
- 	 N21, N22, N23, N24, N25,
- 	 N31, N32, N33, N34, N35,
- 	 N41, N42, N43, N44, N45,
- 	 N51, N52, N53, N54, N55 };
-   E edges [] = { E(N11,N12), E(N12,N13), E(N13,N14), E(N14,N15),
- 		 E(N21,N22), E(N22,N23), E(N23,N24), E(N24,N25),
- 		 E(N31,N32), E(N32,N33), E(N33,N34), E(N34,N35),
- 		 E(N41,N42), E(N42,N43), E(N43,N44), E(N44,N45),
- 		 E(N51,N52), E(N52,N53), E(N53,N54), E(N54,N55),
- 		 E(N11,N21), E(N12,N22), E(N13,N23), E(N14,N24), E(N15,N25),
- 		 E(N21,N31), E(N22,N32), E(N23,N33), E(N24,N34), E(N25,N35),
- 		 E(N31,N41), E(N32,N42), E(N33,N43), E(N34,N44), E(N35,N45),
- 		 E(N41,N51), E(N42,N52), E(N43,N53), E(N44,N54), E(N45,N55) };
-   // Assign weights for each edge
-   int weight [] = { 4, 2, 0, 7,     // horizontal weights
-, 4, 5, 9,
-, 8, 1, 0,
-, 6, 4, 7,
-, 5, 8, 5,
-, 6, 6, 2, 4,  // vertical weights
-, 7, 1, 0, 6,
-, 8, 4, 8, 9,
-, 6, 6, 0, 7 };
-   // define a graph as an array of edges and weights
- #if defined(BOOST_MSVC) && BOOST_MSVC <= 1300 // special routine for MS VC++
-   typedef graph_traits < graph_t >::edge_descriptor edge_descriptor;
-   graph_t g(25);
-   property_map<graph_t, edge_weight_t>::type weightmap = get(edge_weight, g);
-   for (std::size_t j = 0; j < sizeof(edges)/sizeof(E); ++j) {
-     edge_descriptor e;
-     bool inserted;
-     boost::tie(e, inserted) = add_edge(edge_array[j].first, edge_array[j].second, g);
-     weightmap[e] = weights[j];
-   }
- #else  // for regulat compilers
-   graph_t g(edges, edges + sizeof(edges) / sizeof(E), weight, 25);
- #endif
-   std::vector<vertex_descriptor> p(num_vertices(g));
-   std::vector<int> d(num_vertices(g));
-   vertex_descriptor s = vertex(N11, g);
- #if defined(BOOST_MSVC) && BOOST_MSVC <= 1300 // for VC++
-   property_map<graph_t, vertex_index_t>::type indexmap = get(vertex_index, g);
-   dijkstra_shortest_paths(g, s, &p[0], &d[0], weightmap, indexmap,
-                           std::less<int>(), closed_plus<int>(),
-                           (std::numeric_limits<int>::max)(), 0,
-                           default_dijkstra_visitor());
- #else  // for regular compilers
-   dijkstra_shortest_paths(g, s, predecessor_map(&p[0]).distance_map(&d[0]));
- #endif
-   graph_traits < graph_t >::vertex_iterator vi, vend;
-   std::cout << "Backtracking the optimal path from the destination to source" << std::endl;
-   for(int node = N55; node != N11; node = p[node]) {
-     std::cout << "Path: N" << getNodeID(p[node]) << " -> N" << getNodeID(node) << ", Distance from origin is " << d[node] << std::endl;
-   }
-   return 0;
- }
 == Office Hours ==
-* Friday 9:30AM-12:30PM
+* Friday 9:00AM-10:30PM
-== Information on Biostatistics Cluster ==
-* You may want to request an account to the biostatistics cluster. Refer to https://webservices.itcs.umich.edu/mediawiki/Biostatistics/index.php/Cluster for further information.
 == Standards of Academic Conduct ==
@@ Line 1,036: / Line 153: @@
 == Course History ==
+* Winter 2011 Course Web Site [[Biostatistics_615/815_Winter_2011]]
-Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]].
+* Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]].