CS713028Z: Bioinformatics Algorithms, Spring 2014
Course Information

Staff
Instructors:
Minghua Deng (PKU),
Dongbo Bu (ICT)
Email: dbu AT ict.ac.cn, dengmh AT math.pku.edu.cn
Office: Room 0844, ICT Building
Tel: 62600844
TAs:
Haicang Zhang, Chunlin Huang, Renyu Zhang, Qing Xu, Yaojun Wang, Bin Ling
Email: TA of alGorithm Courses
Location: Room 0817, ICT Building
Tel: 62600817
Office Hours: 3:00pm-6:00pm, Thursday

Textbooks (recommended, not required):
* R. Durbin, S. R. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1st edition, 1998.
* C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
* N. C. Jones and P. A. Pevzner, An Introduction to Bioinformatics Algorithms, The MIT Press, 2004.
Other reading material:
* Hang Li, Statistical Machine Learning, Tsinghua University Press, 2012.
* Algorithms in Molecular Biology (Lectures by Ron Shamir, Fall 1998)
* Introduction to Computational Biology: Maps, Sequences and Genomes, by Michael S. Waterman, Chapman and Hall/CRC, 1995.

Goals:
* to learn how to develop a statistical model for a bioinformatics problem;
* to learn how to train such a model and perform inference with it.

Prerequisites:
We will assume basic knowledge of:
* probability theory and statistics;
* the genome and the proteome;
* algorithms and combinatorial optimization.

Bioinformatics problems:
We will cover the following problems if time permits.
* Sequence analysis, including alignment, motif finding, etc;
* RNA expression analysis, including expression clustering, regulation inference, etc;
* Proteome analysis, including protein interaction networks, protein structure prediction, etc;

Models and algorithms:
We will describe the following probabilistic models and algorithms to solve the above problems:
* HMM, EM, and Gaussian mixture model;
* MCMC and Gibbs sampling;
* Logistic regression, Maximum entropy model;
* Graphical models, including Bayesian networks, Markov random fields, conditional random fields, Gaussian graphical models, belief propagation, and the Lasso technique;

Grading policies
Each student is expected to complete a project and take the final examination.

Weekly Schedule
The week number is an active link; each week has its own page that
includes required reading, recommended reading, assignment (if any),
teaching assistants, etc. (Topics for weeks beyond the current and next
are always tentative.)
 Week 1: Introduction

Lecture 1: A brief introduction to statistical machine learning;
 Reading material:
 Week 2: Gene finding

Lecture 2: Hidden Markov Model and Gene Finding
 Reading material:
 Chapter 3 of Biological Sequence Analysis
 Week 3: Sequence alignment

Lecture 3: Pairwise HMMs and profile HMMs
 Reading material:
 Week 4: Phylogenetic trees

Lecture 4: Probabilistic models for phylogenetic trees
 Reading material:
 Week 5: Sequence motif finding

Lecture 5: EM, Gibbs sampling, and MCMC
 Reading material:
 Chapter 6 of Biological Sequence Analysis
 Week 6: Gene expression analysis

Lecture 6: Clustering and classification
 Reading material:
 Week 7: Regulatory inference from gene expression

Lecture 7: Bayesian network
 Reading material:
 Week 8: Protein interaction network and protein function prediction

Lecture 8: Markov random fields, variational Bayesian methods
 Reading material:
 Week 9: Data preprocessing techniques

Lecture 9: Dimensionality reduction, PCA, SVD, MDS, etc.
 Reading material: