Line 1: |
Line 1: |
| == Objective == | | == Objective == |
| | | |
− | In this winter, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable. | + | In Fall 2011, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable. |
| | | |
| == Target Audience == | | == Target Audience == |
Line 10: |
Line 10: |
| | | |
| == Textbook == | | == Textbook == |
− | * Required Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]] | + | * Recommended Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]] |
| * Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]] | | * Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]] |
| | | |
| == Class Schedule == | | == Class Schedule == |
| | | |
− | Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4318 | + | Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4332 |
| | | |
| == Topics == | | == Topics == |
Line 21: |
Line 21: |
| The following contents are planned to be covered. | | The following contents are planned to be covered. |
| | | |
− | === Part I : Algorithms 101 === | + | === Part I : C++ Basics and Introductory Algorithms === |
− | * Understanding of Computational Time Complexity | + | * Computational Time Complexity |
| * Sorting | | * Sorting |
| * Divide and Conquer Algorithms | | * Divide and Conquer Algorithms |
Line 28: |
Line 28: |
| * Key Data Structure | | * Key Data Structure |
| * Dynamic Programming | | * Dynamic Programming |
| + | * Hidden Markov Models |
| | | |
− | === Part II : Matrix Operations and Numerical Optimizations === | + | === Part II : Numerical Methods and Randomized Algorithms === |
− | * Matrix decomposition (LU, QR, SVD) | + | * Random Numbers |
− | * Implementation of Linear Models | + | * Matrix Operations and Least Square Methods |
− | * Numerical Optimizations
| + | * Importance Sampling |
− | | |
− | === Part III : Advanced Statistical Methods ===
| |
− | * Hidden Markov Models | |
| * Expectation Maximization | | * Expectation Maximization |
| * Markov-Chain Monte Carlo Methods | | * Markov-Chain Monte Carlo Methods |
| + | * Simulated Annealing |
| + | * Gibbs Sampling |
| | | |
| == Class Notes == | | == Class Notes == |
− | * Lecture 1 : Statistical Computing -- [[Media:Biostat615-lecture1-handout.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture1-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 1 : Introduction to Statistical Computing -- [[Media:Biostat615-Fall2011-lecture01-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture01-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 2 : C++ Basics and Precisions -- [[Media:Biostat615-lecture2-handout-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture2-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 2 : Introduction to C++ Programming -- [[Media:Biostat615-Fall2011-lecture02-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture02-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 3 : Implementing Fisher's Exact Test -- [[Media:Biostat615-lecture3-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture3.pdf | (Presentation mode - PDF)]] | + | * Lecture 3 : C++ Basics and Fisher's Exact Test -- [[Media:Biostat615-Fall2011-lecture03-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture03-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 4 : Classes and STLs -- [[Media:Biostat615-lecture4-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture4-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 4 : Standard Template Libraries + Divide and Conquer Algorithms-- [[Media:Biostat615-Fall2011-lecture04-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture04-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 5 : Divide and Conquer Algorithms -- [[Media:Biostat615-lecture5-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture5.pdf | (Presentation mode - PDF)]] | + | * Lecture 5 : Sorting Algorithms -- [[Media:Biostat615-Fall2011-lecture05-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture05-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 6 : Linear Sorting Algorithms and Elementary Data Structures -- [[Media:Biostat615-lecture6-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture6-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 6 : Sorting Algorithms & Arrays -- [[Media:Biostat615-Fall2011-lecture06-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture06-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 7 : Data Structures -- [[Media:Biostat615-lecture7-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture7-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 7 : List and Binary Search Trees -- [[Media:Biostat615-Fall2011-lecture07-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture07-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 8 : Hash Tables -- [[Media:Biostat615-lecture8-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture8-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 8 : Hash Tables -- [[Media:Biostat615-Fall2011-lecture08-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture08-presentation.pdf | (Presentation PDF)]] |
| + | * Lecture 9 : Dynamic Programming -- [[Media:Biostat615-Fall2011-lecture09-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture09-presentation.pdf | (Presentation PDF)]] |
| + | * Review : Dynamic Programming & Midterm Review -- [[Media:Biostat615-Fall2011-midterm-2011-winter-.pdf | (PDF)]] |
| + | * Lecture 10 : Hidden Markov Model-- [[Media:Biostat615-Fall2011-lecture10-handout.pdf | (PDF)]] '''(UPDATED on Oct 29th at 12:51PM)''' |
| + | * Lecture 11 : Hidden Markov Model (cont'd) -- [[Media:Biostat615-lecture11-2011-10-20.pdf | (PDF)]] '''(UPDATED on Oct 28th at 1:42PM)''' |
| + | * Lecture 12 : Boost Library & Random Numbers -- [[Media:Biostat615-lecture12-2011-10-25.pdf | (PDF)]] |
| + | * Lecture 13 : Single dimensional optimization -- [[Media:Biostat615-lecture13-2011-10-27.pdf | (PDF)]] |
| + | * Lecture 14 : Single and multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture14.pdf | (PDF)]] (Updated on Nov 3rd 1:25AM) |
| + | * Lecture 15 : Multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture15.pdf | (PDF)]] (Updated Nov 8 10:35AM) |
| + | * Lecture 16 : E-M algorithm -- [[Media:Biostat615-fall2011-lecture16.pdf | (PDF)]] (Updated Nov 8 10:35AM) |
| + | * Lecture 17 : Simulated Annealing -- [[Media:Biostat615-fall2011-lecture17.pdf | (PDF)]] |
| + | * Lecture 18 : Gibbs Sampling -- [[Media:Biostat615-fall2011-lecture18.pdf | (PDF)]] (Updated Nov 16 10:00PM) |
| + | * Lecture 19 : Importace Sampling -- [[Media:Biostat615-fall2011-lecture19.pdf | (PDF)]] |
| + | * Lecture 20 : Advanced Hidden Markov Models -- [[Media:Biostat615-fall2011-lecture20.pdf | (PDF)]] |
| + | * Lecture 21 : Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture21.pdf | (PDF)]] |
| + | * Lecture 22 : More Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture22.pdf | (PDF)]] |
| + | * Lecture 23 : Interfacing between C++ and R -- [[Media:Biostat615-fall2011-lecture23.pdf | (PDF)]] |
| + | * Review : Final Review -- [[Media:Biostat615-winter2011-final.pdf | (PDF)]] [[Media:Biostat615-homework-review.pdf | (Homework)]] |
| | | |
| == Problem Sets == | | == Problem Sets == |
− | * Problem Set 1 : Due on January 20th, 2011 -- [[Media:Biostat615-homework1.pdf | (Problem Set 1 - PDF)]] | + | * Problem Set 0 - Running screenshots of helloWorld.cpp and towerOfHanoi.cpp - Due before the submission of Problem Set 1 |
− | * Problem Set 2 : Due on February 2nd, 2011 -- [[Media:Biostat615-homework2.pdf | (Problem Set 2 - PDF)]] | + | * Problem Set 1 -- Due on Tuesday September 27th, 2011 [[Media:Biostat615-Fall2011-homework01.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework01-solutions.pdf | (PDF-SOLUTIONS)]] |
| + | * Problem Set 2 -- Due on Thursday October 6th, 2011 [[Media:Biostat615-Fall2011-homework02.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework02-solutions.pdf | (PDF-SOLUTIONS)]] |
| + | ** (Update Oct 2, 2011 : Note that the problem 1 and 3 are slightly updated for clarification) |
| + | ** (If you can't decompress the files above properly, use this alternative link by [http://dl.dropbox.com/u/1850834/biostat615-homework02-datasets.tar.gz CLICKING HERE] ) |
| + | * Problem Set 3 -- Due on Tuesday November 1st, 2011 [[Media:Biostat615-homework03.pdf | (PDF)]] (UPDATED on Oct 25th at 11:10AM) |
| + | * Problem Set 4 -- Due on Tuesday November 15th, 2011 [[Media:Biostat615-homework04.pdf | (PDF)]] |
| + | * Problem Set 5 -- Due on Tuesday November 29th, 2011 [[Media:Biostat615-fall2011-homework05.pdf | (PDF)]] |
| + | * Problem Set 6 -- Due on Tuesday December 13th, 2011 [[Media:Biostat615-fall2011-homework06.pdf | (PDF)]] |
| + | ** [http://www.sph.umich.edu/csg/abecasis/class/2006/ModelFittingData.txt DOWNLOAD DATA FOR PROBLEM 1] |
| + | ** [http://dl.dropbox.com/u/1850834/zip_01.zip DOWNLOAD DATA FOR PROBLEM 3] |
| + | |
| + | === Supplementary Data sets for Problem Sets === |
| + | * Problem Set 2 |
| ** [[Media:Shuf-1M.txt.gz| (Example data - shuf-1M.txt.gz)]] 1,000,000 randomly shuffled data (gzipped) | | ** [[Media:Shuf-1M.txt.gz| (Example data - shuf-1M.txt.gz)]] 1,000,000 randomly shuffled data (gzipped) |
| ** [[Media:Rand-1M-3digits.txt.gz| (Example data - Rand-1M-3digits.txt.gz)]] 1,000,000 random data from 1 to 1,000]] (gzipped) | | ** [[Media:Rand-1M-3digits.txt.gz| (Example data - Rand-1M-3digits.txt.gz)]] 1,000,000 random data from 1 to 1,000]] (gzipped) |
| ** [[Media:Rand-50k.txt.gz | (Example data - Rand-50k.txt.gz)]] 50,000 random data from 1 to 1,000,000)]] (gzippd) | | ** [[Media:Rand-50k.txt.gz | (Example data - Rand-50k.txt.gz)]] 50,000 random data from 1 to 1,000,000)]] (gzippd) |
− | ** Source code from Problem 3 | + | * Problem Set 3 |
− | #include <iostream> | + | ** Example output data for problem 3-1 (input is the second column) '''(NOTE : ADDED on Oct 25 11:45PM)''' -- This is also reflected in lecture 11 class note. |
− | #include <vector> | + | TIME TOSS P(FAIR) P(BIAS) MLSTATE |
− | #include <ctime> | + | 1 H 0.5950 0.4050 FAIR |
− | #include <fstream> | + | 2 T 0.8118 0.1882 FAIR |
− | #include <set> | + | 3 H 0.8071 0.1929 FAIR |
− | #include "mySortedArray.h" | + | 4 T 0.8584 0.1416 FAIR |
− | #include "myTree.h" | + | 5 H 0.7613 0.2387 FAIR |
− | #include "myList.h" | + | 6 H 0.7276 0.2724 FAIR |
− | | + | 7 T 0.7495 0.2505 FAIR |
− | int main(int argc, char** argv) { | + | 8 H 0.5413 0.4587 BIASED |
− | int tok;
| + | 9 H 0.4187 0.5813 BIASED |
− | std::vector<int> v;
| + | 10 H 0.3533 0.6467 BIASED |
− | if ( argc > 1 ) {
| + | 11 H 0.3301 0.6699 BIASED |
− | std::ifstream fin(argv[1]);
| + | 12 H 0.3436 0.6564 BIASED |
− | while( fin >> tok ) { v.push_back(tok); }
| + | 13 H 0.3971 0.6029 BIASED |
− | fin.close();
| + | 14 T 0.5028 0.4972 BIASED |
− | }
| + | 15 H 0.3725 0.6275 BIASED |
− | else {
| + | 16 H 0.2985 0.7015 BIASED |
− | while( std::cin >> tok ) { v.push_back(tok); }
| + | 17 H 0.2635 0.7365 BIASED |
− | }
| + | 18 H 0.2596 0.7404 BIASED |
− | mySortedArray<int> c1;
| + | 19 H 0.2858 0.7142 BIASED |
− | myList<int> c2;
| + | 20 H 0.3482 0.6518 BIASED |
− | myTree<int> c3;
| + | ** Example output data for problem 3-2 (input is the second column) '''(NOTE : UPDATED on Oct 25 11:23PM)''' |
− | std::set<int> s;
| + | TIME TOSS Pr(F) Pr(HB) Pr(TB) MLSTATE |
− | | + | 1 T 0.8844 0.0326 0.0830 FAIR |
− | clock_t start = clock();
| + | 2 H 0.9012 0.0791 0.0198 FAIR |
− | for(int i=0; i < (int)v.size(); ++i) {
| + | 3 H 0.9075 0.0735 0.0189 FAIR |
− | c1.insert(v[i]);
| + | 4 T 0.9091 0.0145 0.0764 FAIR |
− | }
| + | 5 T 0.9068 0.0114 0.0818 FAIR |
− | clock_t finish = clock();
| + | 6 H 0.9058 0.0440 0.0502 FAIR |
− | double duration = (double)(finish-start)/CLOCKS_PER_SEC;
| + | 7 T 0.8834 0.0275 0.0891 FAIR |
− | std::cout << "Sorted Array (Insert) " << duration << std::endl;
| + | 8 H 0.8520 0.0698 0.0783 FAIR |
− | | + | 9 T 0.7713 0.0347 0.1940 FAIR |
− | start = clock();
| + | 10 T 0.6927 0.0823 0.2249 FAIR |
− | for(int i=0; i < (int)v.size(); ++i) {
| + | 11 H 0.4730 0.4984 0.0286 HEAD-BIASED |
− | c2.insert(v[i]);
| + | 12 H 0.3227 0.6706 0.0066 HEAD-BIASED |
− | }
| + | 13 H 0.2236 0.7726 0.0037 HEAD-BIASED |
− | finish = clock();
| + | 14 H 0.1589 0.8381 0.0031 HEAD-BIASED |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| + | 15 H 0.1169 0.8803 0.0028 HEAD-BIASED |
− | std::cout << "List (Insert) " << duration << std::endl;
| + | 16 H 0.0902 0.9072 0.0026 HEAD-BIASED |
− |
| + | 17 H 0.0740 0.9235 0.0025 HEAD-BIASED |
− | start = clock();
| + | 18 H 0.0654 0.9321 0.0025 HEAD-BIASED |
− | for(int i=0; i < (int)v.size(); ++i) {
| + | 19 H 0.0630 0.9346 0.0025 HEAD-BIASED |
− | c3.insert(v[i]);
| + | 20 H 0.0661 0.9314 0.0025 HEAD-BIASED |
− | }
| + | 21 H 0.0755 0.9219 0.0026 HEAD-BIASED |
− | finish = clock();
| + | 22 H 0.0926 0.9038 0.0036 HEAD-BIASED |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| + | 23 H 0.1204 0.8684 0.0113 HEAD-BIASED |
− | std::cout << "Tree (Insert) " << duration << std::endl;
| + | 24 H 0.1603 0.7586 0.0811 HEAD-BIASED |
− | | + | 25 T 0.1904 0.0858 0.7238 TAIL-BASED |
− | start = clock();
| + | 26 T 0.1819 0.0118 0.8063 TAIL-BASED |
− | for(int i=0; i < (int)v.size(); ++i) {
| + | 27 T 0.1797 0.0036 0.8167 TAIL-BASED |
− | s.insert(v[i]);
| + | 28 T 0.1894 0.0028 0.8077 TAIL-BASED |
− | }
| + | 29 T 0.2136 0.0038 0.7826 TAIL-BASED |
− | finish = clock();
| + | 30 T 0.2561 0.0123 0.7317 TAIL-BASED |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| + | ** Example input/output data for problem 3-3 (Applying 2-state HMM in Problem 3-1): Download using [http://dl.dropbox.com/u/1850834/biostat615-homework3-3-20k-examples.zip THIS LINK] |
− | std::cout << "std::set (Insert) " << duration << std::endl;
| |
− | | |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c1.search(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "Sorted Array (Search) " << duration << std::endl;
| |
− | | |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c2.search(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "List (Search) " << duration << std::endl;
| |
− | | |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c3.search(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "Tree (Search) " << duration << std::endl;
| |
− | | |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | s.find(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "std::set (Search) " << duration << std::endl;
| |
− | }
| |
| | | |
| == Office Hours == | | == Office Hours == |
− | * Friday 9:30AM-12:30PM | + | * Friday 9:00AM-10:30PM |
− | | |
− | == Information on Biostatistics Cluster ==
| |
− | * You may want to request an account to the biostatistics cluster. Refer to https://webservices.itcs.umich.edu/mediawiki/Biostatistics/index.php/Cluster for further information.
| |
| | | |
| == Standards of Academic Conduct == | | == Standards of Academic Conduct == |
Line 161: |
Line 153: |
| | | |
| == Course History == | | == Course History == |
− | | + | * Winter 2011 Course Web Site [[Biostatistics_615/815_Winter_2011]] |
− | Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]]. | + | * Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]]. |