[ITA] - [2022Sommer] - [en] - [Text Analytics]


Text Analytics [Summer Semester 2022]
Code: ITA
Name: Text Analytics
CP: 8
Duration: one semester
Offered: every 2nd winter semester
Format: Lecture 4 SWS + exercise course 2 SWS
Workload: 240 h, of which
- 90 h lecture
- 15 h exam preparation
- 135 h self-study and work on assignments/projects (optionally in groups)
Availability:
- M.Sc. Angewandte Informatik
- M.Sc. Data and Computer Science
- M.Sc. Scientific Computing
Language: English
Lecturer(s): Michael Gertz
Examination scheme:
Learning objectives
Students
- can implement and apply different text analytics methods using open source NLP and machine learning frameworks
- can describe different document and text representation models and can compute and analyze characteristic parameters of these models
- know how to determine, apply, and interpret use-case-specific document similarity measures and the underlying ranking concepts
- know the concepts and techniques underlying different text classification and clustering approaches
- know different models for phrase extraction and text summarization and are able to apply the respective models and concepts using NLP and machine learning frameworks
- know the fundamental methods for the extraction of document outlines at different levels of granularity
- are familiar with basic concepts of topic models and their application in different text analytics tasks
- understand the principles of evaluating results of text analytics tasks
- know the theoretical background of machine learning methods in sufficient depth to choose parameters and adapt an algorithm to a given text analytics problem
- are aware of ethical issues arising from applying text analytics in different domains
Learning content
- Text analytics in the context of Data Science
- Open source text analytics, NLP, and machine learning frameworks
- Fundamentals of NLP pipeline components
- Document and text representation models
- Document and text similarity metrics
- Approaches, techniques and corpora for benchmarking text analytics tasks
- Traditional and recent text classification and clustering approaches
- Information extraction and topic detection approaches
- Fundamentals of keyword and phrase extraction
- Text summarization techniques
- Generating document and text outlines
- Ethical and legal aspects of text analytics methods
- Text Analytics project management
Requirements for participation
Recommended: solid knowledge of basic calculus, statistics, and linear algebra; good Python programming skills.
Requirements for the assignment of credits and final grade
The module is completed with a graded exam. The grade of this exam is the grade for the module. Details of the exam and the requirements for the assignment of credits will be announced by the lecturer at the beginning of the course.
Useful literature
The following textbooks and texts are useful but not required.
- Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft).
- Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing (2015).
- Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, May 1999.
In addition, papers covering the topics discussed in class will be provided during the course.