
Text Analytics [Summer 2021]
8 credit points (LP)
one semester
every 2nd winter semester
Lecture 4 SWS + Exercise course 2 SWS
240 h in total, of which:
90 h lecture
15 h preparation for exam
135 h self-study and working on assignments/projects (optionally in groups)
B.Sc. Angewandte Informatik,
M.Sc. Angewandte Informatik,
M.Sc. Scientific Computing
Learning objectives: Students
- can implement and apply different text analytics methods using open source NLP and machine learning frameworks
- can describe different document and text representation models and can compute and analyze characteristic parameters of these models
- know how to determine, apply, and interpret use-case-specific document similarity measures and the underlying ranking concepts
- know the concepts and techniques underlying different text classification and clustering approaches
- know different models for phrase extraction and text summarization and are able to apply respective models and concepts using NLP and machine learning frameworks
- know the fundamental methods for the extraction of document outlines at different levels of granularity
- are familiar with basic concepts of topic models and their application in different text analytics tasks
- understand the principles of evaluating results of text analytics tasks
- know the theoretical background of machine learning methods in sufficient depth to be able to choose parameters and adapt an algorithm to a given text analytics problem
- are aware of ethical issues arising from applying text analytics in different domains
Contents:
- Text analytics in the context of data science
- Open source text analytics, NLP, and machine learning frameworks
- Fundamentals of NLP pipeline components
- Document and text representation models
- Document and text similarity metrics
- Approaches, techniques and corpora for benchmarking text analytics tasks
- Traditional and recent text classification and clustering approaches
- Information extraction and topic detection approaches
- Fundamentals of keyword and phrase extraction
- Text summarization techniques
- Generating document and text outlines
- Ethical and legal aspects of text analytics methods
- Text Analytics project management
Prerequisites: Solid knowledge of basic calculus, statistics, and linear algebra as well as good Python programming skills are recommended.
Assignments (40%) and programming project (60%).
About 4-5 assignments focus on the material covered in class at a conceptual and formal level.
In the group project, 3-4 students develop a prototypical text analytics framework, including design and evaluation.
Written project documentation and the code must be submitted at the end of classes, clearly indicating which student is responsible for which part of the project.
Literature: The following textbooks and texts are useful but not required.
- Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft).
- Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing (2015).
- Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.
Furthermore, several papers covering topics discussed in class will be provided during the course.