In part 1 of this course, participants learned how to use standard methods of Natural Language Processing (NLP) to support social science research through automatic content analysis. The course started with an introduction to typical use cases for NLP, such as information extraction, text classification, and topic detection. The participants acquired a basic understanding of the nature and possible applications of these methods, enabling them to judge which kinds of problems the methods can be applied to. Further, participants acquired practical knowledge of how to implement these methods using the Python library Natural Language Toolkit (NLTK) and the text mining features of the WEKA machine learning workbench. We looked at how to generate the specific feature format that WEKA needs as input from textual resources and guided the participants through the use of WEKA for performing systematic text classification experiments. We also looked at two advanced techniques that go beyond the classification of a text. First, we looked at topic models, which identify topics in a set of documents by probabilistically assigning words to the different topics. Second, we introduced the idea of identifying named entities in a text and disambiguating them by linking them to unique representations of entities in a knowledge graph.
The second part of the course consists of a practical project, offered as an optional extension of the first, theoretical part. Over the course of this project, the participants will apply some of the techniques covered to answer a research question of their choice. The project consists of four steps, with guidance provided by the course instructors. In the first step, the participants will define the research problem and sketch a methodology for solving it that includes text analysis elements. The following two steps consist of preprocessing and analyzing relevant textual resources. In the final step, the results of the text analysis will be used to answer the research question.
By the end of the course, students will be able to …:
Grading will be based on:
Participants need to have attended the following IPSDS courses or have corresponding knowledge: