CSCI/MATH/STAT 732

INTRODUCTION TO BIOINFORMATICS

Welcome to all of you!

INSTRUCTOR

LECTURES

TEXTBOOK (REQUIRED)

Dan Krane, Michael Raymer, Fundamental Concepts of Bioinformatics, Benjamin Cummings, 2003. 

COURSE DESCRIPTION

An introduction to the principles of bioinformatics, including statistical techniques for the analysis of one or more gene sequences and computational techniques for knowledge discovery from biological data.

COURSE OBJECTIVE

The objective of this course is to provide the student with sufficient understanding of the principles of bioinformatics to pursue research topics in collaboration between computer scientists and biologists.

Representative algorithms to solve a variety of bioinformatics problems will be discussed.

Scope

Topics include

PREREQUISITES

GRADING



Count  Item                    Points each  Points total
2      Exams                   100          200
5      Homework Assignments    20           100
1      Project                 100          100
1      Final Exam (optional)   (100)
       Total                                400


The grade cut-offs will not be higher than 90% for an A, 80% for a B, 70% for a C, and 60% for a D, but may be lower.

Project

The course will include a project done in groups, each consisting of a biologist subgroup of 1-3 students and a computer scientist subgroup of 1-3 students.  If you think you could belong to either subgroup, you can choose which part to do.  The biologist subgroup will find data, explain and document it, and explain interesting questions to the computer scientist subgroup.  The computer scientist subgroup will implement an algorithm on that data.  Finally, the two subgroups will explain the results together.  Simplified dataset-algorithm combinations will be provided to seed the process, but students are by no means limited to those.  Please contact me as soon as possible with any ideas you have.

PLEASE NOTE: Any student with disabilities or who needs special accommodations in this course is invited to share these concerns or requests with the instructor as soon as possible.
PLEASE NOTE: All work in this course must be completed in a manner consistent with NDSU University Senate Policy, Section 335: Code of Academic Responsibility and Conduct (http://www.ndsu.nodak.edu/policy/335.htm).


 

Course Content

Week 1 (Jan 8)
Concepts of Bioinformatics
  • From Atoms to Humans
  • From Mathematics to Biology

Week 2 (Jan 15)
(Mon Holiday)
Algorithms in Bioinformatics
  • Concepts and performance of an algorithm

Week 3 (Jan 22)
Sequence Alignment
  • Dynamic programming and sequence alignment algorithms
  • Global, local, and semi-global alignments
  • Needleman-Wunsch and Smith-Waterman algorithms
Assignment 1
  • Identify context of bioinformatics literature
  • Contribute to shared information resource
Due Jan 22
Week 4 (Jan 29)
Alignment of More Than Two Sequences
  • Multiple alignment algorithms

Week 5 (Feb 5)
Biological Databases and Database Search
  • BLAST
  • Database search vs. pairwise alignment algorithms
  • Sequence databases
  • Primary vs. secondary databases
  • Flat files vs. database access vs. XML
  • Database principles (avoiding redundancy)
Assignment 2
  • Project proposal
Due Feb 9
Week 6 (Feb 12)
Phylogenetic Trees
  • Biological background on phylogenetic relationships
  • Character states
  • Parsimony algorithms
  • Distance-based algorithms

Week 7 (Feb 19)
(Mon Holiday)
Clustering Gene Expression Data
  • Gene expression data
  • Hierarchical clustering and its relationship to phylogenetic tree construction
  • k-means and density-based clustering
  • Clustering vs. classification
Assignment 3
  • Literature review for project
Due Feb 23
Week 8 (Feb 26)
Classification in Bioinformatics
  • Function prediction
  • Cancer prediction
Presentations of literature review for project
  • Two groups each Friday
Starting Feb 16
Week 9
Exam 1 (Mar 7)
Spring Break
Week 10 (Mar 19)
Motif Discovery
  • Frequent subsequence algorithms
  • Weight matrices

Week 11 (Mar 26)
Hidden Markov Models
  • Markov chains
  • Viterbi algorithm
  • Profile HMMs

Week 12 (Apr 2)
Protein Structure Prediction
  • Biological Background
  • Chou-Fasman algorithm
  • Optimization algorithms
Assignment 4
  • Each student picks two of the weekly topics
  • Shows how they contribute to his/her project
Due Apr 6
Week 13 (Apr 9)
Genome Rearrangements
  • Sorting by reversal

Week 14 (Apr 16)
Interaction, Regulation, and Metabolic Networks
  • Scale-free networks
  • Boolean network models
Assignment 5
  • Implementation for quantitative students
  • Biological interpretation for biology students
Due Apr 20
Week 15
Exam 2 (Apr 25)
Week 16 (Apr 30)
Project Presentations
Project Report due May 4
Finals Week


Further Reading

[1] T.A. Brown, "Genomes," John Wiley & Sons, New York, NY, 1999.

Excellent introduction to genomics.  

[2] J.C. Setubal, J. Meidanis, "Introduction to Computational Molecular Biology," PWS Publishing Company, Boston, MA, 1997.

Important algorithms, precise treatment.

[3] R. Durbin, S. Eddy, A. Krogh, G. Mitchison, "Biological Sequence Analysis, Probabilistic Models of Proteins and Nucleic Acids," Cambridge University Press, Cambridge, UK, 1998.

Covers Hidden Markov Models and related probabilistic models comprehensively.

[4] A.D. Baxevanis, B.F.F. Ouellette, "Bioinformatics, A Practical Guide to the Analysis of Genes and Proteins," 2nd Edition, Wiley-Interscience, New York, NY, 2001.

Extensive collection and description of available databases and tools.

[5] H.-W. Mewes, H. Seidel, and B. Weiss, "Bioinformatics and Genome Analysis," Springer, Berlin, Germany, 2002.

Useful collection of bioinformatics papers.

[6] G. Gibson and S.V. Muse, "A Primer of Genome Science," Sinauer Associates, Inc. Publishers, Sunderland, MA, 2002.

Textbook for Genomics course.  Covers some bioinformatics as well!

[7] A.M. Campbell and L.J. Heyer, "Genomics, Proteomics, and Bioinformatics," Benjamin Cummings, San Francisco, CA, 2003.

Techniques and practical examples.  Focus on Genomics and Proteomics.

[8] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, "Introduction to Algorithms," The MIT Press, Cambridge, MA, 1989.

Standard Computer Science textbook on algorithms.

[9] T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction," Springer, New York, NY, 2001.

Data mining text that closes the gap to machine learning and statistics.

[10] I.H. Witten, E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations," Morgan Kaufmann, San Francisco, CA, 1999.

There are many data mining texts around.  This one is easier to read than [9].

[11] "Bioinformatics" http://bioinformatics.oupjournals.org/

Current issues: abstracts only.  Older issues (approx. 10 months old): full text accessible.