Biostatistics 615/815 Fall 2011

From Genome Analysis Wiki

Objective

This semester, Biostatistics 615/815 aims to give students a practical understanding of the computational aspects of implementing statistical methods. Although C++ will be used throughout the course, Java is acceptable for homework and the project.

Target Audience

Students in Biostatistics 615 should be comfortable with simple algebra and basic statistics, including probability distributions, linear models, and hypothesis testing. Previous programming experience is not required, but students without it should expect to spend additional time during the course learning a programming language. Most students registering for the course are Master's or doctoral students in Biostatistics, Statistics, Bioinformatics, or Human Genetics.

Students in Biostatistics 815 should be familiar enough with programming languages to complete a class project tackling an advanced statistical problem during the semester. Projects will be carried out in teams of two. Details of the possible projects will be announced soon.

Textbook

  • Required Textbook : Cormen, Leiserson, Rivest, and Stein, "Introduction to Algorithms", Third Edition, The MIT Press, 2009 [Official Book Web Site]
  • Optional Textbook : Press, Teukolsky, Vetterling, Flannery, "Numerical Recipes", 3rd Edition, Cambridge University Press, 2007 [Official Book Web Site]

Class Schedule

Classes are scheduled for Tuesdays and Thursdays, 8:30 - 10:00 am, at SPH II M4318.

Topics

The course plans to cover the following topics.

Part I : Algorithms 101

  • Understanding of Computational Time Complexity
  • Sorting
  • Divide and Conquer Algorithms
  • Searching
  • Key Data Structures
  • Dynamic Programming

Part II : Matrix Operations and Numerical Optimizations

  • Matrix decomposition (LU, QR, SVD)
  • Implementation of Linear Models
  • Numerical Optimizations

Part III : Advanced Statistical Methods

  • Hidden Markov Models
  • Expectation Maximization
  • Markov-Chain Monte Carlo Methods

Class Notes

Problem Sets

#include <iostream>
#include <vector>
#include <ctime>
#include <fstream>
#include <set>
#include "mySortedArray.h"
#include "myTree.h"
#include "myList.h"                                            

int main(int argc, char** argv) {
  int tok;
  std::vector<int> v;
  if ( argc > 1 ) {
    std::ifstream fin(argv[1]);
    while( fin >> tok ) { v.push_back(tok); }
    fin.close();
  }
  else {
    while( std::cin >> tok ) { v.push_back(tok); }
  }                                             
  mySortedArray<int> c1;
  myList<int> c2;
  myTree<int> c3;
  std::set<int> s;

  clock_t start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    c1.insert(v[i]);
  }
  clock_t finish = clock();
  double duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "Sorted Array (Insert) " << duration << std::endl;

  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    c2.insert(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "List (Insert) " << duration << std::endl;
 
  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    c3.insert(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "Tree (Insert) " << duration << std::endl;

  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    s.insert(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "std::set (Insert) " << duration << std::endl;

  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    c1.search(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "Sorted Array (Search) " << duration << std::endl;

  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    c2.search(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "List (Search) " << duration << std::endl;

  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    c3.search(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "Tree (Search) " << duration << std::endl;

  start = clock();
  for(int i=0; i < (int)v.size(); ++i) {
    s.find(v[i]);
  }
  finish = clock();
  duration = (double)(finish-start)/CLOCKS_PER_SEC;  
  std::cout << "std::set (Search) " << duration << std::endl;
}
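
The driver above repeats the same clock()/CLOCKS_PER_SEC pattern for every container. As a minimal sketch only, and not part of the course code, the timing step could be factored into a helper. The names cpuSeconds and timeInserts are hypothetical, and the sketch assumes a C++11 compiler (for the lambda) and a container that offers insert(const int&), as the containers in the driver do.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <set>     // only needed for the std::set usage described below
#include <vector>

// Hypothetical helper (not part of the course code): runs `work` once and
// returns the CPU time it used, in seconds, with the same arithmetic as the driver.
template <class Work>
double cpuSeconds(Work work) {
  std::clock_t start = std::clock();
  assert(start != static_cast<std::clock_t>(-1)); // clock() reports failure as -1
  work();
  std::clock_t finish = std::clock();
  return static_cast<double>(finish - start) / CLOCKS_PER_SEC;
}

// Hypothetical helper: times the insertion of every value into `c`.
// Assumes Container provides insert(const int&).
template <class Container>
double timeInserts(Container& c, const std::vector<int>& values) {
  return cpuSeconds([&c, &values]() {
    for (std::size_t i = 0; i < values.size(); ++i) {
      c.insert(values[i]);
    }
  });
}
```

With these helpers, a call such as timeInserts(s, v) on a std::set<int> s would replace one start/finish/duration block of the driver. Note that clock() measures CPU time rather than wall-clock time, so very short runs may report zero.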


#include <iostream>
#define DEFAULT_ALLOC 1024

template <class T> // template supporting a generic type
class mySortedArray {
protected: // member variables hidden from outside

  T *data;    // array of the generic type
  int size;   // number of elements in the container
  int nalloc; // # of objects allocated in the memory
 mySortedArray(mySortedArray& a) {};         // copy constructor hidden to prevent copying
 int search(const T& x, int begin, int end); // binary search within data[begin..end]

public: // abstract interface visible to outside

  mySortedArray();         // default constructor
  ~mySortedArray();        // destructor
  void insert(const T& x); // insert an element x
  int search(const T& x);  // search for an element x and return its location
  bool remove(const T& x); // delete a particular element

};

template <class T> mySortedArray<T>::mySortedArray() { // default constructor

 size = 0;              // the array has no elements initially
 nalloc = DEFAULT_ALLOC;
 data = new T[nalloc];  // allocate default # of objects in memory

}

template <class T> mySortedArray<T>::~mySortedArray() { // destructor

 if ( data != NULL ) {
   delete [] data;      // delete the allocated memory before destroying
 }                      // the object; otherwise, a memory leak happens

}

template <class T> void mySortedArray<T>::insert(const T& x) {

 if ( size >= nalloc ) {  // if the allocated array is full
   T* newdata = new T[nalloc*2];   // make an array at doubled size
   for(int i=0; i < size; ++i) {
     newdata[i] = data[i];         // copy the contents of array
   }
   delete [] data;                 // delete the original array
   data = newdata;                 // and reassign data ptr
   nalloc *= 2;                    // record the new capacity
 }
 int i;
 for(i=size-1; (i >= 0) && (data[i] > x); --i) {
   data[i+1] = data[i];            // shift larger elements one slot right
 }
 data[i+1] = x;
 ++size;                           // increase the size

}

template <class T> int mySortedArray<T>::search(const T& x) {

 return search(x, 0, size-1);

}

template <class T> int mySortedArray<T>::search(const T& x, int begin, int end) {

 if ( begin > end ) {
   return -1;
 }
 else {
   int mid = (begin+end)/2;
   if ( data[mid] == x ) {
     return mid;
   }
   else if ( data[mid] < x ) {
     return search(x, mid+1, end);
   }
   else {
     return search(x, begin, mid-1); // mid is already excluded; avoids endless recursion
   }
 }

}

template <class T> bool mySortedArray<T>::remove(const T& x) {

 int i = search(x);  // try to find the element
 if ( i >= 0 ) {      // if found
   for(int j=i; j < size-1; ++j) {
     data[j] = data[j+1];  // shift all the following elements left by one
   }
   --size;           // and reduce the array size
   return true;      // successfully removed the value
 }
 else {
   return false;     // cannot find the value to remove
 }

}
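
One way to check a hand-written sorted array is to compare it against the same operations built from the standard library. The following is a minimal sketch under that assumption; the helper names sortedInsert, sortedSearch, and sortedRemove are hypothetical and are not part of mySortedArray.h. It assumes a C++11 compiler for std::is_sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical reference helpers built on std::vector; not part of mySortedArray.h.

// Insert x after any equal elements so the vector stays sorted.
void sortedInsert(std::vector<int>& a, int x) {
  a.insert(std::upper_bound(a.begin(), a.end(), x), x);
}

// Return the index of the first element equal to x, or -1 if x is absent.
int sortedSearch(const std::vector<int>& a, int x) {
  assert(std::is_sorted(a.begin(), a.end())); // binary search needs sorted input
  std::vector<int>::const_iterator it = std::lower_bound(a.begin(), a.end(), x);
  if (it != a.end() && *it == x) {
    return static_cast<int>(it - a.begin());
  }
  return -1;
}

// Remove one occurrence of x; return false if x is not present.
bool sortedRemove(std::vector<int>& a, int x) {
  int i = sortedSearch(a, x);
  if (i < 0) {
    return false;
  }
  a.erase(a.begin() + i);
  return true;
}
```

Feeding the same sequence of values to these helpers and to mySortedArray should give the same found/not-found answers. When the data contain duplicates, the returned index may differ, because this sketch always reports the first matching position while a recursive binary search may stop at any match.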

Office Hours

  • Friday 9:30AM-12:30PM

Information on Biostatistics Cluster

Standards of Academic Conduct

The following is an extract from the School of Public Health's Student Code of Conduct [1]:

Student academic misconduct includes behavior involving plagiarism, cheating, fabrication, falsification of records or official documents, intentional misuse of equipment or materials, and aiding and abetting the perpetration of such acts. The preparation of reports, papers, and examinations, assigned on an individual basis, must represent each student’s own effort. Reference sources should be indicated clearly. The use of assistance from other students or aids of any kind during a written examination, except when the use of books or notes has been approved by an instructor, is a violation of the standard of academic conduct.

In the context of this course, any work you hand in should be your own.

Course History

Goncalo Abecasis taught this course in several previous academic years. For previous course notes, see [Goncalo's older class notes].