Biostatistics 615/815: Main Page
Objective
In Fall 2012, Biostatistics 615/815 aims to provide students with a practical understanding of the computational aspects of implementing statistical methods. The C++ language will be used throughout the course.
Target Audience
Students in Biostatistics 615 should be comfortable with simple algebra and basic statistics, including probability distributions, linear models, and hypothesis testing. Previous programming experience is not required, but students without it should expect to spend additional time during the course becoming familiar with a programming language. Most students registering for the course are Master's or doctoral students in Biostatistics, Statistics, Bioinformatics, or Human Genetics.
Students in Biostatistics 815 should be familiar with programming languages so that they can complete a class project tackling an advanced statistical problem during the semester. Projects will be carried out in teams of two. The details of the possible projects will be announced soon.
Textbook
- Recommended Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [Official Book Web Site]
- Recommended Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [Official Book Web Site]
- Optional Textbook : Stephen Prata, "C++ Primer Plus", Sixth Edition, Addison-Wesley, 2011
Class Schedule
Classes are scheduled for Tuesdays and Thursdays, 8:30 - 10:00 am, in SPH II M4332.
Topics
The following topics are planned.
Part I : C++ Basics and Introductory Algorithms
- Computational Time Complexity
- Sorting
- Divide and Conquer Algorithms
- Searching
- Key Data Structures
- Dynamic Programming
- Hidden Markov Models
- Interface between C++ and R
Part II : Numerical Methods and Randomized Algorithms
- Random Numbers
- Matrix Operations and Least Squares Methods
- Importance Sampling
- Expectation Maximization
- Markov-Chain Monte Carlo Methods
- Simulated Annealing
- Gibbs Sampling
Class Notes
- Lecture 1 : Introduction to Statistical Computing -- (Handout PDF) (Presentation PDF) (revised 9/10)
- Lecture 2 : Introduction to C++ Programming -- (Handout PDF) (Presentation PDF) (revised 9/10)
- Lecture 3 : C++ Basics and Fisher's Exact Test -- (Handout PDF) (Presentation PDF) (revised 9/18)
- Lecture 4 : STLs and Divide & Conquer Algorithms -- (Handout PDF) (Presentation PDF) (uploaded 9/12)
- Lecture 5 : Merge Sort, Quicksort, and Arrays -- (Handout PDF) (Presentation PDF) (revised 9/20)
- Lecture 6 : Elementary Data Structures -- (Handout PDF) (Presentation PDF) (revised 9/24)
- Lecture 7 : Elementary Data Structures -- (Handout PDF) (Presentation PDF) (revised 10/9)
- Lecture 8 : Dynamic Programming -- (Handout PDF) (Presentation PDF) (revised 10/3)
- Lecture 9 : Dynamic Programming & Hidden Markov Models -- (Handout PDF) (Presentation PDF) (revised 10/3)
- Lecture 10 : Hidden Markov Models -- (Handout PDF) (Presentation PDF) (uploaded 10/3)
- Lecture 11 : Hidden Markov Models, STLs, and Boost Library -- (Handout PDF) (Presentation PDF) (revised 10/17)
- Lecture 12 : Interfacing R and C++ -- (Handout PDF) (Presentation PDF) (revised 10/17)
- Lecture 13 : R packages and Matrix Library -- (Handout PDF) (Presentation PDF) (revised 10/30)
- Lecture 14 : Matrix Computation -- (Handout PDF) (Presentation PDF) (revised 10/30)
- Lecture 15 : Random Numbers & Monte Carlo Methods -- (Handout PDF) (Presentation PDF) (revised 10/31)
- Lecture 16 : Importance Sampling & Root Finding -- (Handout PDF) (Presentation PDF) (uploaded 10/31)
- Lecture 17 : Single Dimensional Optimization -- (Handout PDF) (Presentation PDF) (uploaded 11/13)
- Lecture 18 : Multi-Dimensional Optimization -- (Handout PDF) (Presentation PDF) (revised 12/6)
- Lecture 19 : E-M Algorithm -- (Handout PDF) (Presentation PDF) (revised 12/6)
- Lecture 20 : Simulated Annealing -- (Handout PDF) (Presentation PDF) (revised 12/9)
- Lecture 21 : Gibbs Sampling -- (Handout PDF) (Presentation PDF) (revised 12/6)
- Lecture 22 : Advanced Hidden Markov Models -- (Handout PDF) (Presentation PDF) (revised 12/18)
Problem Sets
- Problem Set 0 (Due September 10) : (PDF) (revised on 9/12)
- Problem Set 1 (Due September 22) : (PDF) (revised on 9/18)
- Problem Set 2 (Due October 6) : (PDF)
- Problem Set 3 (Due October 20) : (PDF) (corrected on 10/17)
- Problem Set 4 (Due November 10) : (PDF) (uploaded on 10/30)
- Problem Set 5 (Due December 1) : (PDF) (uploaded on 11/17)
- Problem Set 6 (Due December 19) : (PDF) (revised on 12/18)
Exams
- Midterm on 10/23 : (PDF)
815 Term Project
See Biostatistics_815_Term_Project for detailed information
Office Hours
- Friday 9:00AM-10:30AM
Standards of Academic Conduct
- See the "Assignment" section in Lecture 1 for details of the honor code.
- The following is an extract from the School of Public Health's Student Code of Conduct [1]:
Student academic misconduct includes behavior involving plagiarism, cheating, fabrication, falsification of records or official documents, intentional misuse of equipment or materials, and aiding and abetting the perpetration of such acts. The preparation of reports, papers, and examinations, assigned on an individual basis, must represent each student’s own effort. Reference sources should be indicated clearly. The use of assistance from other students or aids of any kind during a written examination, except when the use of books or notes has been approved by an instructor, is a violation of the standard of academic conduct.
In the context of this course, any work you hand in should be your own.
Course History
- Winter 2011 Course Web Site Biostatistics_615/815_Winter_2011
- Fall 2011 Course Web Site Biostatistics_615/815_Fall_2011
- Goncalo Abecasis taught the course in several previous academic years. For earlier course notes, see [Goncalo's older class notes].