Ruprecht-Karls-Universität Heidelberg

Module for [Scientific Computing]

[Data Mining - Algorithms and Parallel Techniques] - [2014/15 Winter]

Module Code
IDM
Name
Data Mining - Algorithms and Parallel Techniques
Credit Points
6 CP
Workload
180 h
Duration
1 semester
Cycle
0
Methods
Lecture 2 h + Exercise course 2 h
Objectives
Firm knowledge of data mining applications, the sequential algorithms used in data mining, and their parallel counterparts.
Content
This module covers sequential and parallel data mining algorithms together with their programming and applications. It comprises the following topics: pre-processing of data; feature generation and selection; classification and regression techniques; clustering; time series analysis; Bayesian networks; evaluation of results. An essential part of the course is devoted to parallel and distributed data mining, for example under the Map-Reduce programming model. Practical experience is gained through programming examples in Matlab / GNU Octave and the use of libraries and tools such as Weka and KNIME.
Learning outcomes
Familiarity with applications of data mining
Understanding of the methods of data preprocessing (normalization, discretization, dimensionality reduction)
Knowledge of approaches for classification, regression, clustering and their parallel and incremental implementations
Familiarity with methods of evaluating results
Understanding overfitting phenomena and methods of their prevention
Practical knowledge of data mining with Matlab / GNU Octave and with Java libraries and frameworks (Weka, KNIME)
Proficiency in parallel mining of large datasets with Map-Reduce and Matlab
Prerequisites
none
Suggested previous knowledge
Knowledge of Java (e.g. via Introduction to Software Engineering (ISW)) and of elementary probability theory / statistics; IKDD
Assessments
Successful participation in the exercises with homework (achieving a minimum score) and passing a final exam.
Literature
Ethem Alpaydin, Maschinelles Lernen, Oldenbourg Verlag, 2008
Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009 (online)
Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press Inc., 2009
Ron Bekkerman, Misha Bilenko, John Langford, Scaling Up Machine Learning, Cambridge University Press, 2012