Bioinformatics II: Theoretical Bioinformatics and Machine Learning (4VO)

Course no.: 365.034
Lecturer: Sepp Hochreiter
Times/locations: Mon 15:30-17:00, HS 14, and
Thu 15:30-17:00, T 111
Start: Mon 5.3.2012
Mode: VO, 4h, weekly
Registration: KUSSS

Lecture notes:

PDF (8 MB)

Slides:

Part 1 (low) (PDF, 1.8 MB) / Part 1 (high) (PDF, 24.1 MB)
Part 2 (low) (PDF, 3.2 MB) / Part 2 (high) (PDF, 8.6 MB)
Part 3 (low) (PDF, 3.1 MB) / Part 3 (high) (PDF, 14.1 MB)
Part 4 (low) (PDF, 2.0 MB) / Part 4 (high) (PDF, 9.1 MB)

Contents:

Classification, regression, kernels, sequence analysis, neural networks, support vector machines, hidden Markov models, clustering, projection methods (principal component analysis (PCA), independent component analysis (ICA), factor analysis), error models, optimization techniques, regularization, Bayes approach, hyper-parameter optimization, feature selection, statistical learning theory, generalization error, maximum likelihood, model selection, etc.

Motivation:

Machine learning methods, for example neural networks used for the secondary and 3D structure prediction of proteins, have proven their value as essential bioinformatics tools. Modern measurement techniques in both biology and medicine create a huge demand for new machine learning approaches. One such technique is the measurement of mRNA concentrations with microarrays, where the data is first preprocessed, then genes of interest are identified, and finally predictions are made. In other examples, DNA data is integrated with complementary measurements in order to detect alternative splicing, nucleosome positions, gene regulation, etc. All of these tasks are performed by machine learning algorithms. Alongside neural networks, the most prominent machine learning techniques are support vector machines, kernel approaches, projection methods, and belief networks. These methods provide noise reduction, feature selection, structure extraction, and classification or regression, and they assist modeling. In the biomedical context, machine learning algorithms predict cancer treatment outcomes based on gene expression profiles, classify novel protein sequences into structural or functional classes, and extract new dependencies between DNA markers (SNPs, single nucleotide polymorphisms) and diseases (such as schizophrenia or alcohol dependence).

In this course the most prominent machine learning techniques are introduced and their mathematical foundations are shown. However, because of the restricted space, neither the mathematical nor the practical details are presented in full. Only a few selected applications of machine learning in biology and medicine are given, as the focus is on understanding the machine learning techniques. If the techniques are well understood, then new applications will arise, old ones can be improved, and the methods that best fit the problem can be selected.

Students should learn how to choose appropriate methods from a given pool of approaches for solving a specific problem. Therefore they must understand and evaluate the different approaches, and know their advantages and disadvantages as well as where to obtain them and how to use them. Going a step further, students should be able to adapt standard algorithms for their own purposes or to modify those algorithms for specific applications with certain prior knowledge or special constraints.