|
|
(70 intermediate revisions by 2 users not shown) |
Line 1: |
Line 1: |
| == Objective == | | == Objective == |
| | | |
− | In this winter, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable. | + | In Fall 2011, Biostatistics 615/815 aims for providing students with a practical understanding of computational aspects in implementing statistical methods. Although C++ language will be used throughout the course, using Java programming language for homework and project will be acceptable. |
| | | |
| == Target Audience == | | == Target Audience == |
Line 10: |
Line 10: |
| | | |
| == Textbook == | | == Textbook == |
− | * Required Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]] | + | * Recommended Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [http://mitpress.mit.edu/algorithms/ [Official Book Web Site]] |
| * Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]] | | * Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [http://www.nr.com/ [Official Book Web Site]] |
| | | |
| == Class Schedule == | | == Class Schedule == |
| | | |
− | Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4318 | + | Classes are scheduled for Tuesday and Thursdays, 8:30 - 10:00 am at SPH II M4332 |
| | | |
| == Topics == | | == Topics == |
Line 21: |
Line 21: |
| The following contents are planned to be covered. | | The following contents are planned to be covered. |
| | | |
− | === Part I : Algorithms 101 === | + | === Part I : C++ Basics and Introductory Algorithms === |
− | * Understanding of Computational Time Complexity | + | * Computational Time Complexity |
| * Sorting | | * Sorting |
| * Divide and Conquer Algorithms | | * Divide and Conquer Algorithms |
Line 28: |
Line 28: |
| * Key Data Structure | | * Key Data Structure |
| * Dynamic Programming | | * Dynamic Programming |
| + | * Hidden Markov Models |
| | | |
− | === Part II : Matrix Operations and Numerical Optimizations === | + | === Part II : Numerical Methods and Randomized Algorithms === |
− | * Matrix decomposition (LU, QR, SVD) | + | * Random Numbers |
− | * Implementation of Linear Models | + | * Matrix Operations and Least Square Methods |
− | * Numerical Optimizations
| + | * Importance Sampling |
− | | |
− | === Part III : Advanced Statistical Methods ===
| |
− | * Hidden Markov Models | |
| * Expectation Maximization | | * Expectation Maximization |
| * Markov-Chain Monte Carlo Methods | | * Markov-Chain Monte Carlo Methods |
| + | * Simulated Annealing |
| + | * Gibbs Sampling |
| | | |
| == Class Notes == | | == Class Notes == |
− | * Lecture 1 : Statistical Computing -- [[Media:Biostat615-lecture1-handout.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture1-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 1 : Introduction to Statistical Computing -- [[Media:Biostat615-Fall2011-lecture01-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture01-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 2 : C++ Basics and Precisions -- [[Media:Biostat615-lecture2-handout-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture2-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 2 : Introduction to C++ Programming -- [[Media:Biostat615-Fall2011-lecture02-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture02-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 3 : Implementing Fisher's Exact Test -- [[Media:Biostat615-lecture3-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture3.pdf | (Presentation mode - PDF)]] | + | * Lecture 3 : C++ Basics and Fisher's Exact Test -- [[Media:Biostat615-Fall2011-lecture03-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture03-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 4 : Classes and STLs -- [[Media:Biostat615-lecture4-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture4-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 4 : Standard Template Libraries + Divide and Conquer Algorithms-- [[Media:Biostat615-Fall2011-lecture04-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture04-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 5 : Divide and Conquer Algorithms -- [[Media:Biostat615-lecture5-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture5.pdf | (Presentation mode - PDF)]] | + | * Lecture 5 : Sorting Algorithms -- [[Media:Biostat615-Fall2011-lecture05-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture05-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 6 : Linear Sorting Algorithms and Elementary Data Structures -- [[Media:Biostat615-lecture6-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture6-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 6 : Sorting Algorithms & Arrays -- [[Media:Biostat615-Fall2011-lecture06-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture06-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 7 : Data Structures -- [[Media:Biostat615-lecture7-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture7-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 7 : List and Binary Search Trees -- [[Media:Biostat615-Fall2011-lecture07-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture07-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 8 : Hash Tables -- [[Media:Biostat615-lecture8-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture8-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 8 : Hash Tables -- [[Media:Biostat615-Fall2011-lecture08-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture08-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 9 : Dyamic Programming -- [[Media:Biostat615-lecture9-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture9-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 9 : Dynamic Programming -- [[Media:Biostat615-Fall2011-lecture09-handout.pdf | (Handout PDF)]] [[Media:Biostat615-Fall2011-lecture09-presentation.pdf | (Presentation PDF)]] |
− | * Lecture 10 : Boost Libraries and Graphical Algorithms -- [[Media:Biostat615-lecture10-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture10-presentation.pdf | (Presentation mode - PDF)]] | + | * Review : Dynamic Programming & Midterm Review -- [[Media:Biostat615-Fall2011-midterm-2011-winter-.pdf | (PDF)]] |
− | * Lecture 11 : Hidden Markov Models -- [[Media:Biostat615-lecture11-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture11.pdf | (Presentation mode - PDF)]] | + | * Lecture 10 : Hidden Markov Model-- [[Media:Biostat615-Fall2011-lecture10-handout.pdf | (PDF)]] '''(UPDATED on Oct 29th at 12:51PM)''' |
− | * Lecture 12 : Hidden Markov Models -- [[Media:Biostat615-lecture12-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture12-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 11 : Hidden Markov Model (cont'd) -- [[Media:Biostat615-lecture11-2011-10-20.pdf | (PDF)]] '''(UPDATED on Oct 28th at 1:42PM)''' |
− | * Lecture 13 : Matrix Computation -- [[Media:Biostat615-lecture13-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture13-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 12 : Boost Library & Random Numbers -- [[Media:Biostat615-lecture12-2011-10-25.pdf | (PDF)]] |
− | * Lecture 14 : Implementing Linear Regression -- [[Media:Biostat615-lecture14-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture14-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 13 : Single dimensional optimization -- [[Media:Biostat615-lecture13-2011-10-27.pdf | (PDF)]] |
− | * Lecture 15 : Random Number Generation -- [[Media:Biostat615-lecture15-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture15-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 14 : Single and multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture14.pdf | (PDF)]] (Updated on Nov 3rd 1:25AM) |
− | * Lecture 16 : Monte-Carlo methods and importance sampling -- [[Media:Biostat615-lecture16-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture16-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 15 : Multi dimensional optimizations -- [[Media:Biostat615-fall2011-lecture15.pdf | (PDF)]] (Updated Nov 8 10:35AM) |
− | * Lecture 17 : Numerical optimization -- [[Media:Biostat615-lecture17-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture17-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 16 : E-M algorithm -- [[Media:Biostat615-fall2011-lecture16.pdf | (PDF)]] (Updated Nov 8 10:35AM) |
− | * Lecture 18 : Numerical optimization II -- [[Media:Biostat615-lecture18-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture18-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 17 : Simulated Annealing -- [[Media:Biostat615-fall2011-lecture17.pdf | (PDF)]] |
− | * Special Lecture : A practical session -- StatGen Library [[Media:StatGenLecture.pdf | (LectureNotes - PDF)]] [[Media:Debugging.pdf | (Debugging - PDF)]] Demo Code: [[Media:StatGenDemo.tar | StatGenDemo.tar]] Additional Notes: [[Media:GithubWithoutGit.pdf | GithubWithoutGit.pdf]] | + | * Lecture 18 : Gibbs Sampling -- [[Media:Biostat615-fall2011-lecture18.pdf | (PDF)]] (Updated Nov 16 10:00PM) |
− | * Lecture 19 : The Simplex Method [[Media:Biostat615-lecture19-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture19-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 19 : Importace Sampling -- [[Media:Biostat615-fall2011-lecture19.pdf | (PDF)]] |
− | * Lecture 20 : The E-M Algorithm [[Media:Biostat615-lecture20-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture20-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 20 : Advanced Hidden Markov Models -- [[Media:Biostat615-fall2011-lecture20.pdf | (PDF)]] |
− | * Lecture 21 : Simulated Annealing [[Media:Biostat615-lecture21-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture21-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 21 : Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture21.pdf | (PDF)]] |
− | * Lecture 22 : Gibbs Sampler [[Media:Biostat615-lecture22-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture22-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 22 : More Linear Algebra in C++ -- [[Media:Biostat615-fall2011-lecture22.pdf | (PDF)]] |
− | * Lecture 23 : The Baum-Welch Algorithm [[Media:Biostat615-lecture23-nup.pdf | (Handout mode - PDF)]] [[Media:Biostat615-lecture23-presentation.pdf | (Presentation mode - PDF)]] | + | * Lecture 23 : Interfacing between C++ and R -- [[Media:Biostat615-fall2011-lecture23.pdf | (PDF)]] |
| + | * Review : Final Review -- [[Media:Biostat615-winter2011-final.pdf | (PDF)]] [[Media:Biostat615-homework-review.pdf | (Homework)]] |
| | | |
| == Problem Sets == | | == Problem Sets == |
− | * Problem Set 1 : Due on January 20th, 2011 -- [[Media:Biostat615-homework1.pdf | (Problem Set 1 - PDF)]] | + | * Problem Set 0 - Running screenshots of helloWorld.cpp and towerOfHanoi.cpp - Due before the submission of Problem Set 1 |
− | * Problem Set 2 : Due on February 8th, 2011 -- [[Media:Biostat615-homework2.pdf | (Problem Set 2 - PDF)]] | + | * Problem Set 1 -- Due on Tuesday September 27th, 2011 [[Media:Biostat615-Fall2011-homework01.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework01-solutions.pdf | (PDF-SOLUTIONS)]] |
| + | * Problem Set 2 -- Due on Thursday October 6th, 2011 [[Media:Biostat615-Fall2011-homework02.pdf | (PDF)]] [[Media:Biostat615-Fall2011-homework02-solutions.pdf | (PDF-SOLUTIONS)]] |
| + | ** (Update Oct 2, 2011 : Note that the problem 1 and 3 are slightly updated for clarification) |
| + | ** (If you can't decompress the files above properly, use this alternative link by [http://dl.dropbox.com/u/1850834/biostat615-homework02-datasets.tar.gz CLICKING HERE] ) |
| + | * Problem Set 3 -- Due on Tuesday November 1st, 2011 [[Media:Biostat615-homework03.pdf | (PDF)]] (UPDATED on Oct 25th at 11:10AM) |
| + | * Problem Set 4 -- Due on Tuesday November 15th, 2011 [[Media:Biostat615-homework04.pdf | (PDF)]] |
| + | * Problem Set 5 -- Due on Tuesday November 29th, 2011 [[Media:Biostat615-fall2011-homework05.pdf | (PDF)]] |
| + | * Problem Set 6 -- Due on Tuesday December 13th, 2011 [[Media:Biostat615-fall2011-homework06.pdf | (PDF)]] |
| + | ** [http://www.sph.umich.edu/csg/abecasis/class/2006/ModelFittingData.txt DOWNLOAD DATA FOR PROBLEM 1] |
| + | ** [http://dl.dropbox.com/u/1850834/zip_01.zip DOWNLOAD DATA FOR PROBLEM 3] |
| + | |
| + | === Supplementary Data sets for Problem Sets === |
| + | * Problem Set 2 |
| ** [[Media:Shuf-1M.txt.gz| (Example data - shuf-1M.txt.gz)]] 1,000,000 randomly shuffled data (gzipped) | | ** [[Media:Shuf-1M.txt.gz| (Example data - shuf-1M.txt.gz)]] 1,000,000 randomly shuffled data (gzipped) |
| ** [[Media:Rand-1M-3digits.txt.gz| (Example data - Rand-1M-3digits.txt.gz)]] 1,000,000 random data from 1 to 1,000]] (gzipped) | | ** [[Media:Rand-1M-3digits.txt.gz| (Example data - Rand-1M-3digits.txt.gz)]] 1,000,000 random data from 1 to 1,000]] (gzipped) |
| ** [[Media:Rand-50k.txt.gz | (Example data - Rand-50k.txt.gz)]] 50,000 random data from 1 to 1,000,000)]] (gzippd) | | ** [[Media:Rand-50k.txt.gz | (Example data - Rand-50k.txt.gz)]] 50,000 random data from 1 to 1,000,000)]] (gzippd) |
− | * Problem Set 3 : Due on February 17th, 2011 -- [[Media:Biostat615-homework3.pdf | (Problem Set 3 - PDF)]] | + | * Problem Set 3 |
− | ** For problem 2 - For Visual C++ users : Refer to the [[Biostatistics 615/815: Main Page#Code_for_Problem_3_-_Problem_2 | (Code Example)]] for a better start | + | ** Example output data for problem 3-1 (input is the second column) '''(NOTE : ADDED on Oct 25 11:45PM)''' -- This is also reflected in lecture 11 class note. |
− | * Problem Set 4 : Due on March 8th, 2011 -- [[Media:Biostat615-homework4.pdf | (Problem Set 4 - PDF)]]
| + | TIME TOSS P(FAIR) P(BIAS) MLSTATE |
− | ** [[http://dl.dropbox.com/u/1850834/simulCoinInput.txt.gz Download sample input file (gzipped)]]
| + | 1 H 0.5950 0.4050 FAIR |
− | ** [[http://dl.dropbox.com/u/1850834/simulCoinOutput.txt.gz Download sample output file(gzipped)]]
| + | 2 T 0.8118 0.1882 FAIR |
− | ** See below if you want to copy and paste example code
| + | 3 H 0.8071 0.1929 FAIR |
− | * Problem Set 5 : Due on March 29th, 2011 -- [[Media:Biostat615-homework5.pdf | (Problem Set 5 - PDF)]]
| + | 4 T 0.8584 0.1416 FAIR |
− | * Problem Set 6 : Due on April 7th, 2011 -- [[Media:Biostat615-homework6.pdf | (Problem Set 6 - PDF)]]
| + | 5 H 0.7613 0.2387 FAIR |
− | * Problem Set 7 : Due on April 21th, 2011 -- [[Media:Biostat615-homework7.pdf | (Problem Set 7 - PDF)]]
| + | 6 H 0.7276 0.2724 FAIR |
− | ** [[http://www.sph.umich.edu/csg/abecasis/class/2006/ModelFittingData.txt Download The example data to use]]
| + | 7 T 0.7495 0.2505 FAIR |
− | ** [[http://dl.dropbox.com/u/1850834/mixData.txt Download Additional data for sanity check (same data used in the lecture slides)]]
| + | 8 H 0.5413 0.4587 BIASED |
− | | + | 9 H 0.4187 0.5813 BIASED |
− | | + | 10 H 0.3533 0.6467 BIASED |
− | === Source code for Homework 6 - simplex615.h ===
| + | 11 H 0.3301 0.6699 BIASED |
− | #ifndef __SIMPLEX_615_H | + | 12 H 0.3436 0.6564 BIASED |
− | #define __SIMPLEX_615_H | + | 13 H 0.3971 0.6029 BIASED |
− | | + | 14 T 0.5028 0.4972 BIASED |
− | #include <vector> | + | 15 H 0.3725 0.6275 BIASED |
− | #include <cmath> | + | 16 H 0.2985 0.7015 BIASED |
− | #include <iostream> | + | 17 H 0.2635 0.7365 BIASED |
− | | + | 18 H 0.2596 0.7404 BIASED |
− | #define ZEPS 1e-10 | + | 19 H 0.2858 0.7142 BIASED |
− |
| + | 20 H 0.3482 0.6518 BIASED |
− | class optFunc {
| + | ** Example output data for problem 3-2 (input is the second column) '''(NOTE : UPDATED on Oct 25 11:23PM)''' |
− | public:
| + | TIME TOSS Pr(F) Pr(HB) Pr(TB) MLSTATE |
− | virtual double operator() (std::vector<double>& x) = 0;
| + | 1 T 0.8844 0.0326 0.0830 FAIR |
− | };
| + | 2 H 0.9012 0.0791 0.0198 FAIR |
− |
| + | 3 H 0.9075 0.0735 0.0189 FAIR |
− | // Simplex contains (dim+1)*dim points
| + | 4 T 0.9091 0.0145 0.0764 FAIR |
− | class simplex615 {
| + | 5 T 0.9068 0.0114 0.0818 FAIR |
− | protected:
| + | 6 H 0.9058 0.0440 0.0502 FAIR |
− | std::vector<std::vector<double> > X;
| + | 7 T 0.8834 0.0275 0.0891 FAIR |
− | std::vector<double> Y;
| + | 8 H 0.8520 0.0698 0.0783 FAIR |
− | std::vector<double> midPoint;
| + | 9 T 0.7713 0.0347 0.1940 FAIR |
− | std::vector<double> thruLine;
| + | 10 T 0.6927 0.0823 0.2249 FAIR |
− | | + | 11 H 0.4730 0.4984 0.0286 HEAD-BIASED |
− | int dim, idxLo, idxHi, idxNextHi;
| + | 12 H 0.3227 0.6706 0.0066 HEAD-BIASED |
− |
| + | 13 H 0.2236 0.7726 0.0037 HEAD-BIASED |
− | void evaluateFunction(optFunc& foo);
| + | 14 H 0.1589 0.8381 0.0031 HEAD-BIASED |
− | void evaluateExtremes();
| + | 15 H 0.1169 0.8803 0.0028 HEAD-BIASED |
− | void prepareUpdate();
| + | 16 H 0.0902 0.9072 0.0026 HEAD-BIASED |
− | bool updateSimplex(optFunc& foo, double scale);
| + | 17 H 0.0740 0.9235 0.0025 HEAD-BIASED |
− | void contractSimplex(optFunc& foo);
| + | 18 H 0.0654 0.9321 0.0025 HEAD-BIASED |
− | static int check_tol(double fmax, double fmin, double ftol);
| + | 19 H 0.0630 0.9346 0.0025 HEAD-BIASED |
− | | + | 20 H 0.0661 0.9314 0.0025 HEAD-BIASED |
− | public:
| + | 21 H 0.0755 0.9219 0.0026 HEAD-BIASED |
− | simplex615(double* p, int d);
| + | 22 H 0.0926 0.9038 0.0036 HEAD-BIASED |
− | void amoeba(optFunc& foo, double tol);
| + | 23 H 0.1204 0.8684 0.0113 HEAD-BIASED |
− | std::vector<double>& xmin();
| + | 24 H 0.1603 0.7586 0.0811 HEAD-BIASED |
− | double ymin();
| + | 25 T 0.1904 0.0858 0.7238 TAIL-BASED |
− | }; | + | 26 T 0.1819 0.0118 0.8063 TAIL-BASED |
− | | + | 27 T 0.1797 0.0036 0.8167 TAIL-BASED |
− | simplex615::simplex615(double* p, int d) : dim(d) {
| + | 28 T 0.1894 0.0028 0.8077 TAIL-BASED |
− | X.resize(dim+1);
| + | 29 T 0.2136 0.0038 0.7826 TAIL-BASED |
− | Y.resize(dim+1);
| + | 30 T 0.2561 0.0123 0.7317 TAIL-BASED |
− | midPoint.resize(dim);
| + | ** Example input/output data for problem 3-3 (Applying 2-state HMM in Problem 3-1): Download using [http://dl.dropbox.com/u/1850834/biostat615-homework3-3-20k-examples.zip THIS LINK] |
− | thruLine.resize(dim);
| |
− | for(int i=0; i < dim+1; ++i) {
| |
− | X[i].resize(dim);
| |
− | }
| |
− | | |
− | // set every point the same
| |
− | for(int i=0; i < dim+1; ++i) {
| |
− | for(int j=0; j < dim; ++j) {
| |
− | X[i][j] = p[j];
| |
− | }
| |
− | }
| |
− |
| |
− | // then increase each dimension by one except for the last point
| |
− | for(int i=0; i < dim; ++i) {
| |
− | X[i][i] += 1.;
| |
− | }
| |
− | } | |
− | | |
− | void simplex615::evaluateFunction(optFunc& foo) { | |
− | for(int i=0; i < dim+1; ++i) {
| |
− | Y[i] = foo(X[i]);
| |
− | }
| |
− | } | |
− | | |
− | void simplex615::evaluateExtremes() { | |
− | if ( Y[0] > Y[1] ) {
| |
− | idxHi = 0;
| |
− | idxLo = idxNextHi = 1;
| |
− | }
| |
− | else {
| |
− | idxHi = 1;
| |
− | idxLo = idxNextHi = 0;
| |
− | }
| |
− |
| |
− | for(int i=2; i < dim+1; ++i) {
| |
− | if ( Y[i] <= Y[idxLo] ) {
| |
− | idxLo = i;
| |
− | }
| |
− | else if ( Y[i] > Y[idxHi] ) {
| |
− | idxNextHi = idxHi;
| |
− | idxHi = i;
| |
− | }
| |
− | else if ( Y[i] > Y[idxNextHi] ) {
| |
− | idxNextHi = i;
| |
− | }
| |
− | }
| |
− | } | |
− | | |
− | void simplex615::prepareUpdate() {
| |
− | for(int j=0; j < dim; ++j) {
| |
− | midPoint[j] = 0;
| |
− | }
| |
− | for(int i=0; i < dim+1; ++i) {
| |
− | if ( i != idxHi ) {
| |
− | for(int j=0; j < dim; ++j) {
| |
− | midPoint[j] += X[i][j];
| |
− | }
| |
− | }
| |
− | }
| |
− | for(int j=0; j < dim; ++j) {
| |
− | midPoint[j] /= dim;
| |
− | thruLine[j] = X[idxHi][j] - midPoint[j];
| |
− | }
| |
− | } | |
− | | |
− | bool simplex615::updateSimplex(optFunc& foo, double scale) { | |
− | std::vector<double> nextPoint;
| |
− | nextPoint.resize(dim);
| |
− | for(int i=0; i < dim; ++i) {
| |
− | nextPoint[i] = midPoint[i] + scale * thruLine[i];
| |
− | }
| |
− | double fNext = foo(nextPoint);
| |
− | if ( fNext < Y[idxHi] ) { // exchange with maximum
| |
− | for(int i=0; i < dim; ++i) {
| |
− | X[idxHi][i] = nextPoint[i];
| |
− | }
| |
− | Y[idxHi] = fNext;
| |
− | return true;
| |
− | }
| |
− | else {
| |
− | return false;
| |
− | }
| |
− | }
| |
− |
| |
− | void simplex615::contractSimplex(optFunc& foo) {
| |
− | for(int i=0; i < dim+1; ++i) {
| |
− | if ( i != idxLo ) {
| |
− | for(int j=0; j < dim; ++j) {
| |
− | X[i][j] = 0.5*( X[idxLo][j] + X[i][j] );
| |
− | Y[i] = foo(X[i]);
| |
− | }
| |
− | }
| |
− | }
| |
− | }
| |
− |
| |
− | void simplex615::amoeba(optFunc& foo, double tol) {
| |
− | evaluateFunction(foo);
| |
− | while(true) {
| |
− | evaluateExtremes();
| |
− | prepareUpdate();
| |
− |
| |
− | if ( check_tol(Y[idxHi],Y[idxLo],tol) ) break;
| |
− |
| |
− | updateSimplex(foo, -1.0); // reflection
| |
− |
| |
− | if ( Y[idxHi] < Y[idxLo] ) {
| |
− | updateSimplex(foo, -2.0); // expansion
| |
− | }
| |
− | else if ( Y[idxHi] >= Y[idxNextHi] ) {
| |
− | if ( !updateSimplex(foo, 0.5) ) {
| |
− | contractSimplex(foo);
| |
− | }
| |
− | }
| |
− | }
| |
− | }
| |
− |
| |
− | std::vector<double>& simplex615::xmin() {
| |
− | return X[idxLo];
| |
− | }
| |
− |
| |
− | double simplex615::ymin() {
| |
− | return Y[idxLo];
| |
− | }
| |
− |
| |
− | int simplex615::check_tol(double fmax, double fmin, double ftol) {
| |
− | double delta = fabs(fmax - fmin); double accuracy = (fabs(fmax) + fabs(fmin)) * ftol;
| |
− | return (delta < (accuracy + ZEPS));
| |
− | }
| |
− |
| |
− | #endif // __SIMPLEX_615_H
| |
− | | |
− | | |
− | === Source code for Homework 4 - Matrix615.h ===
| |
− | | |
− | #ifndef __BIOSTAT615_MATRIX_H // avoid including the same header twice
| |
− | #define __BIOSTAT615_MATRIX_H
| |
− |
| |
− | #include <iostream>
| |
− | #include <fstream>
| |
− | #include <boost/tokenizer.hpp>
| |
− | #include <boost/lexical_cast.hpp>
| |
− | #include <boost/foreach.hpp>
| |
− |
| |
− | #include <Eigen/Core>
| |
− |
| |
− | // a generic class for Matrix
| |
− | template<class T>
| |
− | class Matrix615 {
| |
− | protected: // internal data
| |
− | std::vector<T> data; // using std::vector - object copy is now possible
| |
− | int nr, nc; // # rows and cols
| |
− | bool hasMissing;
| |
− | T valueMissing;
| |
− | std::string strMissing;
| |
− | public:
| |
− | // default constructor
| |
− | Matrix615() : nr(0), nc(0), hasMissing(false) {}
| |
− | Matrix615(const char* filename) : nr(0), nc(0), hasMissing(false) {
| |
− | readFromFile(filename);
| |
− | }
| |
− | Matrix615(const T value, const char* str) : nr(0), nc(0) {
| |
− | enableMissingValue(value, str);
| |
− | }
| |
− |
| |
− | // Allow missing value as a pair of actual value and string value
| |
− | void enableMissingValue(const T value, const char* str) {
| |
− | hasMissing = true;
| |
− | valueMissing = value;
| |
− | strMissing = str;
| |
− | }
| |
− |
| |
− | // resize the dimension of the matrix
| |
− | void resize(int nrows, int ncols) {
| |
− | nr = nrows;
| |
− | nc = ncols;
| |
− | data.resize(nr*nc);
| |
− | }
| |
− |
| |
− | // fill the content
| |
− | void fill(T defaultValue) {
| |
− | std::fill( data.begin(), data.end(), defaultValue );
| |
− | }
| |
− |
| |
− | void copyTo(Eigen::Matrix<T,Eigen::Dynamic,Eigen::Dynamic>& m) {
| |
− | m.resize(nr,nc);
| |
− | for(int j=0; j < nc; ++j) {
| |
− | for(int i=0; i < nr; ++i) {
| |
− | m(i,j) = data[i*nc+j];
| |
− | }
| |
− | }
| |
− | }
| |
− |
| |
− | // access individual element
| |
− | T& at(int r, int c) { return data[r*nc+c]; }
| |
− |
| |
− | int numRows() { return nr; }
| |
− | int numCols() { return nc; }
| |
− |
| |
− | // print the content of the matrix
| |
− | void print(std::ostream& o) {
| |
− | for(int i=0; i < nr; ++i) {
| |
− | for(int j=0; j < nc; ++j) {
| |
− | if ( j > 0 ) o << "\t";
| |
− | if ( hasMissing && ( valueMissing == at(i,j) ) )
| |
− | o << strMissing;
| |
− | else
| |
− | o << at(i,j);
| |
− | }
| |
− | o << std::endl;
| |
− | }
| |
− | }
| |
− |
| |
− | // opens a file to fill the matrix
| |
− | void readFromFile(const char* file) {
| |
− | std::ifstream ifs(file);
| |
− | if ( ! ifs.is_open() ) {
| |
− | std::cerr << "Cannot open file " << file << std::endl;
| |
− | abort();
| |
− | }
| |
− |
| |
− | std::string line;
| |
− | boost::char_separator<char> sep(" \t");
| |
− | typedef boost::tokenizer< boost::char_separator<char> > wsTokenizer;
| |
− |
| |
− | data.clear();
| |
− | nr = nc = 0;
| |
− | while( std::getline(ifs, line) ) {
| |
− | wsTokenizer t(line,sep);
| |
− | for(wsTokenizer::iterator i=t.begin(); i != t.end(); ++i) {
| |
− | // if hasMissing is set, convert string "Missing" into special value for Missing
| |
− | if ( hasMissing && ( i->compare(strMissing) == 0 ) )
| |
− | data.push_back(valueMissing);
| |
− | // Otherwise, convert the string to a particular type
| |
− | else
| |
− | data.push_back(boost::lexical_cast<T>(i->c_str()));
| |
− | if ( nr == 0 ) ++nc; // count # of columns at the first row
| |
− | }
| |
− | ++nr;
| |
− | // when reading each line, make sure that the # of columns match to expectation;
| |
− | if ( (int)data.size() != nr*nc ) {
| |
− | std::cerr << "The input file is not rectangle at line " << nr << std::endl;
| |
− | abort();
| |
− | }
| |
− | }
| |
− | }
| |
− | };
| |
− |
| |
− | #endif
| |
− | | |
− | === Source code for Homework 4 - Problem 1 ===
| |
− | #include "Matrix615.h"
| |
− |
| |
− | #define NIL 999999
| |
− |
| |
− | int main(int argc, char** argv) {
| |
− | if ( argc != 2 ) {
| |
− | std::cerr << "Usage: floydWarshall [weight matrix]" << std::endl;
| |
− | abort();
| |
− | }
| |
− |
| |
− | Matrix615<int> W;
| |
− | W.enableMissingValue(NIL,"NIL");
| |
− | W.readFromFile(argv[1]);
| |
− | if ( W.numRows() != W.numCols() ) {
| |
− | std::cerr << "The input file is not exactly square" << std::endl;
| |
− | abort();
| |
− | }
| |
− |
| |
− | int n = W.numRows();
| |
− | Matrix615<int> P;
| |
− | P.enableMissingValue(NIL,"NIL");
| |
− | P.resize(n,n);
| |
− |
| |
− | // Fill out the code in the part marked as *** [FILL HERE] ***
| |
− |
| |
− | // *** FILL HERE ** initialize P and W matrix as described in page 695-696 of the textbook
| |
− |
| |
− | Matrix615<int> D = W; // object copy is valid because no pointer is involved
| |
− |
| |
− | // print initial D and W matrix
| |
− | std::cout << "---------- Initial D Matrix ---------------------------------" << std::endl;
| |
− | D.print(std::cout);
| |
− | std::cout << "---------- Initial PI Matrix ---------------------------------" << std::endl;
| |
− | P.print(std::cout);
| |
− |
| |
− | for(int k=0; k < n; ++k) {
| |
− | // *** FILL HERE *** Run floyd-Warshall algorithm for each k and
| |
− |
| |
− | // print out D and PI matrix for each iteration
| |
− | std::cout << "---------- D Matrix after covering node " << k << "----------" << std::endl;
| |
− | D.print(std::cout);
| |
− | std::cout << "---------- PI Matrix after covering node " << k << "----------" << std::endl;
| |
− | P.print(std::cout);
| |
− | }
| |
− |
| |
− | // *** EXTRA POINTS *** Extra points (10\%) will be given if the optimal path
| |
− | // for every pair of node is printed below
| |
− |
| |
− | return 0;
| |
− | }
| |
− | | |
− | === Source code for Homework 4 - Problem 2 ===
| |
− | #include <cstdlib>
| |
− | #include <iomanip>
| |
− | #include <iostream>
| |
− |
| |
− | #include "Matrix615.h"
| |
− |
| |
− | #define N_STATES 2
| |
− | #define N_DATA 2
| |
− |
| |
− | const char* stateLabels[N_STATES] = {"F","B"};
| |
− | const char* dataLabels[N_DATA] = {"H","T"};
| |
− |
| |
− | class BiasedCoinHMM {
| |
− | protected:
| |
− | int T;
| |
− | // HMM initial parameters
| |
− | std::vector<double> pi; // N_STATES * 1 vector
| |
− | Matrix615<double> A; // N_STATES * N_STATES matrix
| |
− | // : A_{ij} is \Pr(q_t=i|q_{t-1}=j)
| |
− | Matrix615<double> B; // N_OBS * N_STATES matrix
| |
− | // : B_{ij} = b_j(o_i) = \Pr(o_t=i|q_t=j)
| |
− |
| |
− | // Data (observations)
| |
− | std::vector<int> o; // vector observations
| |
− |
| |
− | // forward-backward probability
| |
− | Matrix615<double> alphas; // T * N_STATES : alphas[i,j] = alpha_i(i)
| |
− | Matrix615<double> betas; // T * N_STATES : betas[i,j] = beta_j(i)
| |
− | Matrix615<double> gammas; // T * N_STATES : gammas[i,j] = gammas_j(i) = Pr(q_t=i|o_t=i)
| |
− |
| |
− | // viterbi probability and paths
| |
− | Matrix615<double> deltas; // T * N_STATES
| |
− | Matrix615<int> phis; // T * N_STATES
| |
− | std::vector<int> mleStates; // vector of MLE states
| |
− |
| |
− | public:
| |
− | BiasedCoinHMM(double priors[N_STATES], double trans[N_STATES][N_STATES], double emis[N_DATA][N_STATES]) {
| |
− | for(int i=0; i < N_STATES; ++i) {
| |
− | pi.push_back(priors[i]);
| |
− | }
| |
− |
| |
− | A.resize(N_STATES,N_STATES);
| |
− |
| |
− | for(int i=0; i < N_STATES; ++i) {
| |
− | for(int j=0; j < N_STATES; ++j) {
| |
− | A.at(i,j) = trans[i][j];
| |
− | }
| |
− | }
| |
− |
| |
− | B.resize(N_DATA,N_STATES);
| |
− | for(int i=0; i < N_DATA; ++i) {
| |
− | for(int j=0; j < N_STATES; ++j) {
| |
− | B.at(i,j) = emis[i][j];
| |
− | }
| |
− | }
| |
− |
| |
− | T = 0;
| |
− | }
| |
− |
| |
− | void loadObservations(std::vector<int>& obs) {
| |
− | o = obs;
| |
− | T = o.size();
| |
− |
| |
− | alphas.resize(T,N_STATES);
| |
− | betas.resize(T,N_STATES);
| |
− | gammas.resize(T,N_STATES);
| |
− | deltas.resize(T,N_STATES);
| |
− | phis.resize(T,N_STATES);
| |
− | mleStates.resize(T);
| |
− | }
| |
− |
| |
− | void computeForwardBackward() {
| |
− | // *** FILL OUT *** to compute
| |
− | // forward and backward probabilities into alphas, betas, and gammas
| |
− | }
| |
− |
| |
− | void computeViterbiPath() {
| |
− | // *** FILL OUT *** to compute
| |
− | // viterbi path and likelihood into deltas, phis, and mleStates,
| |
− | }
| |
− |
| |
− | void outputResults(std::ostream& os, std::vector<int>& trueStates) {
| |
− | if ( T != (int)trueStates.size() ) {
| |
− | std::cerr << "True states are in different length with HMM" << std::endl;
| |
− | abort();
| |
− | }
| |
− |
| |
− | os << "#SEQ\tTRUE_S\tOBS\tP(q|o)\tMLE_S" << std::endl << std::fixed << std::setprecision(4);
| |
− | for(int t = 0; t < T; ++t) {
| |
− | os << t+1 << "\t"
| |
− | << stateLabels[trueStates[t]] << "\t"
| |
− | << dataLabels[o[t]] << "\t"
| |
− | << gammas.at(t,1) << "\t" // prints Pr(q_t=1|o)
| |
− | << stateLabels[mleStates[t]] << "\n";
| |
− | }
| |
− | }
| |
− | };
| |
− |
| |
− | int main(int argc, char** argv) {
| |
− | if ( argc != 2 ) {
| |
− | std::cerr << "Usage: coinHMM [inputFile]" << std::endl;
| |
− | return -1;
| |
− | }
| |
− |
| |
− | std::vector<int> trueStates;
| |
− | std::vector<int> observations;
| |
− | std::ifstream ifs(argv[1]);
| |
− |
| |
− | if ( !ifs.is_open() ) {
| |
− | std::cerr << "Cannot open file " << argv[1] << std::endl;
| |
− | return -1;
| |
− | }
| |
− |
| |
− | std::string tok;
| |
− | for(int i=0; ifs >> tok; ++i) {
| |
− | if ( i % 3 == 1 ) {
| |
− | if ( tok.compare("F") == 0 ) {
| |
− | trueStates.push_back(0);
| |
− | }
| |
− | else if ( tok.compare("B") == 0 ) {
| |
− | trueStates.push_back(1);
| |
− | }
| |
− | else {
| |
− | std::cerr << "Cannot recognize state " << tok << std::endl;
| |
− | }
| |
− | }
| |
− | else if ( i % 3 == 2 ) {
| |
− | if ( tok.compare("H") == 0 ) {
| |
− | observations.push_back(0);
| |
− | }
| |
− | else if ( tok.compare("T") == 0 ) {
| |
− | observations.push_back(1);
| |
− | }
| |
− | else {
| |
− | std::cerr << "Cannot recognize observation " << tok << std::endl;
| |
− | }
| |
− | }
| |
− | }
| |
− |
| |
− | std::cout << "Finished reading " << trueStates.size() << " states and observations.." << std::endl;
| |
− |
| |
− | double trans[N_STATES][N_STATES] = { {0.95,0.2}, {0.05,0.8} };
| |
− | double emis[N_DATA][N_STATES] = { {0.5,0.9}, {0.5,0.1} };
| |
− | double pi[N_STATES] = {0.9,0.1};
| |
− |
| |
− | BiasedCoinHMM bcHMM(pi, trans, emis);
| |
− |
| |
− | bcHMM.loadObservations(observations);
| |
− | bcHMM.computeForwardBackward();
| |
− | bcHMM.computeViterbiPath();
| |
− | bcHMM.outputResults(std::cout, trueStates);
| |
− | return 0;
| |
− | }
| |
− | | |
− | === Additional example of containing negative weights ===
| |
− | $ cat ./sampleInput2.txt
| |
− | 0 3 8 NIL -4
| |
− | NIL 0 NIL 1 7
| |
− | NIL 4 0 NIL NIL
| |
− | 2 NIL -5 0 NIL
| |
− | NIL NIL NIL 6 0
| |
− |
| |
− | $ ./floydWarshall ./sampleInput2.txt
| |
− | ---------- Initial D Matrix ---------------------------------
| |
− | 0 3 8 NIL -4
| |
− | NIL 0 NIL 1 7
| |
− | NIL 4 0 NIL NIL
| |
− | 2 NIL -5 0 NIL
| |
− | NIL NIL NIL 6 0
| |
− | ---------- Initial PI Matrix ---------------------------------
| |
− | NIL 0 0 NIL 0
| |
− | NIL NIL NIL 1 1
| |
− | NIL 2 NIL NIL NIL
| |
− | 3 NIL 3 NIL NIL
| |
− | NIL NIL NIL 4 NIL
| |
− | ---------- D Matrix after covering node 0----------
| |
− | 0 3 8 NIL -4
| |
− | NIL 0 NIL 1 7
| |
− | NIL 4 0 NIL 999995
| |
− | 2 5 -5 0 -2
| |
− | NIL NIL NIL 6 0
| |
− | ---------- PI Matrix after covering node 0----------
| |
− | NIL 0 0 NIL 0
| |
− | NIL NIL NIL 1 1
| |
− | NIL 2 NIL NIL 0
| |
− | 3 0 3 NIL 0
| |
− | NIL NIL NIL 4 NIL
| |
− | ---------- D Matrix after covering node 1----------
| |
− | 0 3 8 4 -4
| |
− | NIL 0 NIL 1 7
| |
− | NIL 4 0 5 11
| |
− | 2 5 -5 0 -2
| |
− | NIL NIL NIL 6 0
| |
− | ---------- PI Matrix after covering node 1----------
| |
− | NIL 0 0 1 0
| |
− | NIL NIL NIL 1 1
| |
− | NIL 2 NIL 1 1
| |
− | 3 0 3 NIL 0
| |
− | NIL NIL NIL 4 NIL
| |
− | ---------- D Matrix after covering node 2----------
| |
− | 0 3 8 4 -4
| |
− | NIL 0 NIL 1 7
| |
− | NIL 4 0 5 11
| |
− | 2 -1 -5 0 -2
| |
− | NIL NIL NIL 6 0
| |
− | ---------- PI Matrix after covering node 2----------
| |
− | NIL 0 0 1 0
| |
− | NIL NIL NIL 1 1
| |
− | NIL 2 NIL 1 1
| |
− | 3 2 3 NIL 0
| |
− | NIL NIL NIL 4 NIL
| |
− | ---------- D Matrix after covering node 3----------
| |
− | 0 3 -1 4 -4
| |
− | 3 0 -4 1 -1
| |
− | 7 4 0 5 3
| |
− | 2 -1 -5 0 -2
| |
− | 8 5 1 6 0
| |
− | ---------- PI Matrix after covering node 3----------
| |
− | NIL 0 3 1 0
| |
− | 3 NIL 3 1 0
| |
− | 3 2 NIL 1 0
| |
− | 3 2 3 NIL 0
| |
− | 3 2 3 4 NIL
| |
− | ---------- D Matrix after covering node 4----------
| |
− | 0 1 -3 2 -4
| |
− | 3 0 -4 1 -1
| |
− | 7 4 0 5 3
| |
− | 2 -1 -5 0 -2
| |
− | 8 5 1 6 0
| |
− | ---------- PI Matrix after covering node 4----------
| |
− | NIL 2 3 4 0
| |
− | 3 NIL 3 1 0
| |
− | 3 2 NIL 1 0
| |
− | 3 2 3 NIL 0
| |
− | 3 2 3 4 NIL
| |
− | Optimal path for 1 <- 0 (d = 1) : 1 <-(4)-- 2 <-(-5)-- 3 <-(6)-- 4 <-(-4)-- 0
| |
− | Optimal path for 2 <- 0 (d = -3) : 2 <-(-5)-- 3 <-(6)-- 4 <-(-4)-- 0
| |
− | Optimal path for 3 <- 0 (d = 2) : 3 <-(6)-- 4 <-(-4)-- 0
| |
− | Optimal path for 4 <- 0 (d = -4) : 4 <-(-4)-- 0
| |
− | Optimal path for 0 <- 1 (d = 3) : 0 <-(2)-- 3 <-(1)-- 1
| |
− | Optimal path for 2 <- 1 (d = -4) : 2 <-(-5)-- 3 <-(1)-- 1
| |
− | Optimal path for 3 <- 1 (d = 1) : 3 <-(1)-- 1
| |
− | Optimal path for 4 <- 1 (d = -1) : 4 <-(-4)-- 0 <-(2)-- 3 <-(1)-- 1
| |
− | Optimal path for 0 <- 2 (d = 7) : 0 <-(2)-- 3 <-(1)-- 1 <-(4)-- 2
| |
− | Optimal path for 1 <- 2 (d = 4) : 1 <-(4)-- 2
| |
− | Optimal path for 3 <- 2 (d = 5) : 3 <-(1)-- 1 <-(4)-- 2
| |
− | Optimal path for 4 <- 2 (d = 3) : 4 <-(-4)-- 0 <-(2)-- 3 <-(1)-- 1 <-(4)-- 2
| |
− | Optimal path for 0 <- 3 (d = 2) : 0 <-(2)-- 3
| |
− | Optimal path for 1 <- 3 (d = -1) : 1 <-(4)-- 2 <-(-5)-- 3
| |
− | Optimal path for 2 <- 3 (d = -5) : 2 <-(-5)-- 3
| |
− | Optimal path for 4 <- 3 (d = -2) : 4 <-(-4)-- 0 <-(2)-- 3
| |
− | Optimal path for 0 <- 4 (d = 8) : 0 <-(2)-- 3 <-(6)-- 4
| |
− | Optimal path for 1 <- 4 (d = 5) : 1 <-(4)-- 2 <-(-5)-- 3 <-(6)-- 4
| |
− | Optimal path for 2 <- 4 (d = 1) : 2 <-(-5)-- 3 <-(6)-- 4
| |
− | Optimal path for 3 <- 4 (d = 6) : 3 <-(6)-- 4
| |
− | | |
− | === Source code from Homework 2 - Problem 3 ===
| |
− | #include <iostream>
| |
− | #include <vector>
| |
− | #include <ctime>
| |
− | #include <fstream>
| |
− | #include <set>
| |
− | #include "mySortedArray.h"
| |
− | #include "myTree.h"
| |
− | #include "myList.h"
| |
− |
| |
− | int main(int argc, char** argv) {
| |
− | int tok;
| |
− | std::vector<int> v;
| |
− | if ( argc > 1 ) {
| |
− | std::ifstream fin(argv[1]);
| |
− | while( fin >> tok ) { v.push_back(tok); }
| |
− | fin.close();
| |
− | }
| |
− | else {
| |
− | while( std::cin >> tok ) { v.push_back(tok); }
| |
− | }
| |
− | mySortedArray<int> c1;
| |
− | myList<int> c2;
| |
− | myTree<int> c3;
| |
− | std::set<int> s;
| |
− |
| |
− | clock_t start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c1.insert(v[i]);
| |
− | }
| |
− | clock_t finish = clock();
| |
− | double duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "Sorted Array (Insert) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c2.insert(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "List (Insert) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c3.insert(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "Tree (Insert) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | s.insert(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "std::set (Insert) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c1.search(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "Sorted Array (Search) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c2.search(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "List (Search) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | c3.search(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "Tree (Search) " << duration << std::endl;
| |
− |
| |
− | start = clock();
| |
− | for(int i=0; i < (int)v.size(); ++i) {
| |
− | s.find(v[i]);
| |
− | }
| |
− | finish = clock();
| |
− | duration = (double)(finish-start)/CLOCKS_PER_SEC;
| |
− | std::cout << "std::set (Search) " << duration << std::endl;
| |
− | }
| |
− | | |
− | === The header file of mySortedArray.h (For Homework 2 Problem 3)===
| |
− | | |
− | #include <iostream>
| |
− | #define DEFAULT_ALLOC 1024
| |
− | template <class T> // template supporting a generic type
| |
− | class mySortedArray {
| |
− | protected: // member variables hidden from outside
| |
− | T *data; // array of the genetic type
| |
− | int size; // number of elements in the container
| |
− | int nalloc; // # of objects allocated in the memory
| |
− | mySortedArray(mySortedArray& a) {};
| |
− | int search(const T& x, int begin, int end);
| |
− | public: // abstract interface visible to outside
| |
− | mySortedArray(); // default constructor
| |
− | ~mySortedArray(); // desctructor
| |
− | void insert(const T& x); // insert an element x
| |
− | int search(const T& x); // search for an element x and return its location
| |
− | bool remove(const T& x); // delete a particular element
| |
− | };
| |
− |
| |
− | template <class T>
| |
− | mySortedArray<T>::mySortedArray() { // default constructor
| |
− | size = 0; // array do not have element initially
| |
− | nalloc = DEFAULT_ALLOC;
| |
− | data = new T[nalloc]; // allocate default # of objects in memory
| |
− | }
| |
− |
| |
− | template <class T>
| |
− | mySortedArray<T>::~mySortedArray() { // destructor
| |
− | if ( data != NULL ) {
| |
− | delete [] data; // delete the allocated memory before destorying
| |
− | } // the object. otherwise, memory leak happens
| |
− | }
| |
− |
| |
− | template <class T>
| |
− | void mySortedArray<T>::insert(const T& x) {
| |
− | if ( size >= nalloc ) { // if container has more elements than allocated
| |
− | T* newdata = new T[nalloc*2]; // make an array at doubled size
| |
− | for(int i=0; i < nalloc; ++i) {
| |
− | newdata[i] = data[i]; // copy the contents of array
| |
− | }
| |
− | delete [] data; // delete the original array
| |
− | data = newdata; // and reassign data ptr
| |
− | }
| |
− |
| |
− | int i;
| |
− | for(i=size-1; (i >= 0) && (data[i] > x); --i) {
| |
− | data[i+1] = data[i]; // insert the list into right position
| |
− | }
| |
− | data[i+1] = x;
| |
− | ++size; // increase the size
| |
− | }
| |
− |
| |
− | template <class T>
| |
− | int mySortedArray<T>::search(const T& x) {
| |
− | return search(x, 0, size-1);
| |
− | }
| |
− |
| |
− | template <class T>
| |
− | int mySortedArray<T>::search(const T& x, int begin, int end) {
| |
− | if ( begin > end ) {
| |
− | return -1;
| |
− | }
| |
− | else {
| |
− | int mid = (begin+end)/2;
| |
− | if ( data[mid] == x ) {
| |
− | return mid;
| |
− | }
| |
− | else if ( data[mid] < x ) {
| |
− | return search(x, mid+1, end);
| |
− | }
| |
− | else {
| |
− | return search(x, begin, mid);
| |
− | }
| |
− | }
| |
− | }
| |
− |
| |
− | template <class T>
| |
− | bool mySortedArray<T>::remove(const T& x) {
| |
− | int i = search(x); // try to find the element
| |
− | if ( i >= 0 ) { // if found
| |
− | for(int j=i; j < size-1; ++j) {
| |
− | data[i] = data[i+1]; // shift all the elements by one
| |
− | }
| |
− | --size; // and reduce the array size
| |
− | return true; // successfully removed the value
| |
− | }
| |
− | else {
| |
− | return false; // cannot find the value to remove
| |
− | }
| |
− | }
| |
− | | |
− | === Example code to generare all possible permutations ===
| |
− | | |
− | Note that http://www.cplusplus.com/reference/algorithm/next_permutation/ provides much simpler solution for generating permutation. (Thanks to Matthew Snyder)
| |
− | | |
− | // Code to generate permutations from 1..n
| |
− | // These code was adopted from
| |
− | // http://geeksforgeeks.org/?p=767 by Hyun Min Kang
| |
− | #include <iostream>
| |
− | #include <vector>
| |
− |
| |
− | // swaps two elements
| |
− | void swap(std::vector<int>& v, int i, int j) {
| |
− | int tmp = v[i];
| |
− | v[i] = v[j];
| |
− | v[j] = tmp;
| |
− | return;
| |
− | }
| |
− |
| |
− | // actual engine - recursively calls permutation
| |
− | void permute(std::vector<int>& v, int from, int to) {
| |
− | if ( from == to ) { // print the permutation
| |
− | for(int i=0; i < v.size(); ++i) {
| |
− | std::cout << " " << v[i];
| |
− | }
| |
− | std::cout << std::endl;
| |
− | }
| |
− | else {
| |
− | for(int i = from; i <= to; ++i) {
| |
− | swap(v,from,i); // swaps two elements
| |
− | permute(v, from+1, to); // permute the rest
| |
− | swap(v,from,i); // recover the vector to the original state
| |
− | }
| |
− | }
| |
− | }
| |
− |
| |
− | // Permutation Example
| |
− | int main(int argc, char** argv) {
| |
− | if ( argc != 2 ) {
| |
− | std::cerr << "Usage: " << argv[0] << " " << "[# of samples]" << std::endl;
| |
− | return -1;
| |
− | }
| |
− |
| |
− | // takes input from 1..n
| |
− | int n = atoi(argv[1]);
| |
− | std::vector<int> input;
| |
− | for(int i=1; i <= n; ++i) {
| |
− | input.push_back(i);
| |
− | }
| |
− | permute(input, 0, n-1); // print all possible permutations
| |
− | return 0;
| |
− | }
| |
− | | |
− | #include <iostream> // for input/output
| |
− | #include <boost/graph/adjacency_list.hpp> // for using graph type
| |
− | #include <boost/graph/dijkstra_shortest_paths.hpp> // for dijkstra algorithm
| |
− | using namespace std; // allow to omit prefix 'std::'
| |
− | using namespace boost; // allow to omit prefix 'boost::'
| |
− |
| |
− | int getNodeID(int node) {
| |
− | return ((node/5)+1)*10+(node%5+1);
| |
− | }
| |
− | | |
− | === Code for Problem 3 - Problem 2 ===
| |
− |
| |
− | #include <iostream> // for input/output
| |
− | #include <boost/graph/adjacency_list.hpp> // for using graph type
| |
− | #include <boost/graph/dijkstra_shortest_paths.hpp> // for dijkstra algorithm
| |
− | using namespace std; // allow to omit prefix 'std::'
| |
− | using namespace boost; // allow to omit prefix 'boost::'
| |
− |
| |
− | int getNodeID(int node) {
| |
− | return ((node/5)+1)*10+(node%5+1);
| |
− | }
| |
− |
| |
− | int main(int argc, char** argv) {
| |
− | // defining a graph type
| |
− | // 1. edges are stored as std::list internally
| |
− | // 2. verticies are stored as std::vector internally
| |
− | // 3. the graph is directed (undirectedS, bidirectionalS can be used)
| |
− | // 4. vertices do not carry particular properties
| |
− | // 5. edges contains weight property as integer value
| |
− | typedef adjacency_list< listS, vecS, directedS, no_property, property< edge_weight_t, int> > graph_t;
| |
− |
| |
− | // vertex_descriptor is a type for representing vertices
| |
− | typedef graph_traits< graph_t >::vertex_descriptor vertex_descriptor;
| |
− | // a nodes is represented as an integer, and an edge is a pair of integers
| |
− | typedef std::pair<int, int> E;
| |
− |
| |
− | // Connect between verticies as in the Manhattan Tourist Problem
| |
− | // Each node is numbered as a two-digit integer of [#row] and [#col]
| |
− | enum { N11, N12, N13, N14, N15,
| |
− | N21, N22, N23, N24, N25,
| |
− | N31, N32, N33, N34, N35,
| |
− | N41, N42, N43, N44, N45,
| |
− | N51, N52, N53, N54, N55 };
| |
− |
| |
− | E edges [] = { E(N11,N12), E(N12,N13), E(N13,N14), E(N14,N15),
| |
− | E(N21,N22), E(N22,N23), E(N23,N24), E(N24,N25),
| |
− | E(N31,N32), E(N32,N33), E(N33,N34), E(N34,N35),
| |
− | E(N41,N42), E(N42,N43), E(N43,N44), E(N44,N45),
| |
− | E(N51,N52), E(N52,N53), E(N53,N54), E(N54,N55),
| |
− | E(N11,N21), E(N12,N22), E(N13,N23), E(N14,N24), E(N15,N25),
| |
− | E(N21,N31), E(N22,N32), E(N23,N33), E(N24,N34), E(N25,N35),
| |
− | E(N31,N41), E(N32,N42), E(N33,N43), E(N34,N44), E(N35,N45),
| |
− | E(N41,N51), E(N42,N52), E(N43,N53), E(N44,N54), E(N45,N55) };
| |
− | // Assign weights for each edge
| |
− | int weight [] = { 4, 2, 0, 7, // horizontal weights
| |
− | 7, 4, 5, 9,
| |
− | 6, 8, 1, 0,
| |
− | 1, 6, 4, 7,
| |
− | 1, 5, 8, 5,
| |
− | 0, 6, 6, 2, 4, // vertical weights
| |
− | 9, 7, 1, 0, 6,
| |
− | 1, 8, 4, 8, 9,
| |
− | 3, 6, 6, 0, 7 };
| |
− |
| |
− | // define a graph as an array of edges and weights
| |
− | #if defined(BOOST_MSVC) && BOOST_MSVC <= 1300 // special routine for MS VC++
| |
− | typedef graph_traits < graph_t >::edge_descriptor edge_descriptor;
| |
− | graph_t g(25);
| |
− | property_map<graph_t, edge_weight_t>::type weightmap = get(edge_weight, g);
| |
− | for (std::size_t j = 0; j < sizeof(edges)/sizeof(E); ++j) {
| |
− | edge_descriptor e;
| |
− | bool inserted;
| |
− | boost::tie(e, inserted) = add_edge(edge_array[j].first, edge_array[j].second, g);
| |
− | weightmap[e] = weights[j];
| |
− | }
| |
− | #else // for regulat compilers
| |
− | graph_t g(edges, edges + sizeof(edges) / sizeof(E), weight, 25);
| |
− | #endif
| |
− |
| |
− | std::vector<vertex_descriptor> p(num_vertices(g));
| |
− | std::vector<int> d(num_vertices(g));
| |
− | vertex_descriptor s = vertex(N11, g);
| |
− |
| |
− | #if defined(BOOST_MSVC) && BOOST_MSVC <= 1300 // for VC++
| |
− | property_map<graph_t, vertex_index_t>::type indexmap = get(vertex_index, g);
| |
− | dijkstra_shortest_paths(g, s, &p[0], &d[0], weightmap, indexmap,
| |
− | std::less<int>(), closed_plus<int>(),
| |
− | (std::numeric_limits<int>::max)(), 0,
| |
− | default_dijkstra_visitor());
| |
− | #else // for regular compilers
| |
− | dijkstra_shortest_paths(g, s, predecessor_map(&p[0]).distance_map(&d[0]));
| |
− | #endif
| |
− | graph_traits < graph_t >::vertex_iterator vi, vend;
| |
− |
| |
− | std::cout << "Backtracking the optimal path from the destination to source" << std::endl;
| |
− | for(int node = N55; node != N11; node = p[node]) {
| |
− | std::cout << "Path: N" << getNodeID(p[node]) << " -> N" << getNodeID(node) << ", Distance from origin is " << d[node] << std::endl;
| |
− | }
| |
− | return 0;
| |
− | }
| |
| | | |
| == Office Hours == | | == Office Hours == |
− | * Friday 9:30AM-12:30PM | + | * Friday 9:00AM-10:30PM |
− | | |
− | == Information on Biostatistics Cluster ==
| |
− | * You may want to request an account to the biostatistics cluster. Refer to https://webservices.itcs.umich.edu/mediawiki/Biostatistics/index.php/Cluster for further information.
| |
| | | |
| == Standards of Academic Conduct == | | == Standards of Academic Conduct == |
Line 1,036: |
Line 153: |
| | | |
| == Course History == | | == Course History == |
− | | + | * Winter 2011 Course Web Site [[Biostatistics_615/815_Winter_2011]] |
− | Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]]. | + | * Goncalo Abecasis taught it in several academic years previously. For previous course notes, see [[http://www.sph.umich.edu/csg/abecasis/class Goncalo's older class notes]]. |