Bioinformatics II: Theoretical Bioinformatics and Machine Learning (4VO)
Course no.: 365.034
Lecturer: Sepp Hochreiter
Times/locations: Mon 15:30-17:00, HS 14 and Thur 15:30-17:00, T 111
Start: Mon 5.3.2012
Mode: VO, 4h, weekly
Registration: KUSSS
Lecture notes:
PDF (8 MB)
Slides:
Part 1 (low) (PDF, 1.8 MB) /
Part 1 (high) (PDF, 24.1 MB)
Part 2 (low) (PDF, 3.2 MB) /
Part 2 (high) (PDF, 8.6 MB)
Part 3 (low) (PDF, 3.1 MB) /
Part 3 (high) (PDF, 14.1 MB)
Part 4 (low) (PDF, 2.0 MB) /
Part 4 (high) (PDF, 9.1 MB)
Contents:
Classification, regression, kernels, sequence analysis, neural networks, support vector
machines, hidden Markov models, clustering, projection methods, principal component
analysis (PCA), independent component analysis (ICA), factor analysis, error models,
optimization techniques, regularization, Bayes approach, hyper-parameter optimization, feature selection,
statistical learning theory, generalization error, maximum likelihood, model selection, etc.
Motivation:
Machine learning methods, for example neural networks used for the
secondary and 3D structure prediction of proteins, have proven their
value as essential bioinformatics tools. Modern measurement techniques in
both biology and medicine create a huge demand for new machine learning
approaches. One such technique is the measurement of mRNA concentrations
with microarrays, where the data is first preprocessed, then genes of
interest are identified, and finally predictions made. In other examples
DNA data is integrated with other complementary measurements in order to
detect alternative splicing, nucleosome positions, gene regulation, etc.
All of these tasks are performed by machine learning algorithms.
Alongside neural networks, the most prominent machine learning techniques
are support vector machines, kernel approaches, projection methods,
and belief networks. These methods provide noise reduction, feature
selection, structure extraction, and classification / regression, and they assist
modeling. In the biomedical context, machine learning algorithms predict
cancer treatment outcomes based on gene expression profiles, classify
novel protein sequences into structural or functional classes,
and extract new dependencies between DNA markers (SNPs, single nucleotide
polymorphisms) and diseases (e.g. schizophrenia or alcohol dependence).
In this course the most prominent machine learning techniques are
introduced and their mathematical foundations are shown.
However, because of the restricted space,
neither mathematical nor practical details are presented.
Only a few selected applications of machine learning in biology and medicine
are given, as the focus is on understanding the machine learning
techniques. If the techniques are well understood, then new applications
will arise, old ones can be improved, and the methods which best fit the
problem can be selected.
Students should learn how to choose appropriate methods from a given pool
of approaches for solving a specific problem. Therefore they must understand and evaluate the
different approaches, and know their advantages and disadvantages as well
as where to obtain them and how to use them.
Going a step further, students should be able to adapt standard
algorithms for their own purposes or to modify those algorithms for specific
applications with certain prior knowledge or special constraints.