SURV704: Computer-Based Content Analysis II (Practical Project)

Data Analysis, Data Generating Process

In part 1 of this course, participants learned how to use standard methods of Natural Language Processing (NLP) to support social science research through automatic content analysis. The course started with an introduction to typical use cases for NLP such as information extraction, text classification and topic detection. Participants acquired a basic understanding of the nature and possible applications of these methods, so that they can judge which kinds of problems the methods can be applied to. Further, participants acquired practical knowledge of how to implement these methods using the Python library Natural Language Toolkit (NLTK) and the text mining features of the WEKA Machine Learning workbench. We looked at how to generate the specific feature format WEKA needs as input from textual resources and guided the participants through the use of WEKA for performing systematic text classification experiments. We also looked at two advanced techniques that go beyond the classification of a text. First, we looked at topic models, which identify topics in a set of documents by probabilistically assigning words to the different topics. Second, we introduced the idea of identifying named entities in a text and disambiguating them by linking them to unique representations of entities in a knowledge graph.
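As a reminder of what generating WEKA input from text can look like, the following is a minimal sketch that turns a few labeled texts into a string in ARFF, the file format WEKA reads. The word-count representation, the function names `tokenize` and `to_arff`, the `w_` attribute prefix, the `doc_label` attribute and the two example documents are all assumptions made for illustration, not the procedure used in part 1. The sketch also assumes that labels contain no spaces or commas.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_arff(documents, relation="corpus"):
    """Build an ARFF string from (text, label) pairs using word counts."""
    counts = [Counter(tokenize(text)) for text, _ in documents]
    vocabulary = sorted(set().union(*counts))
    labels = sorted({label for _, label in documents})

    lines = [f"@relation {relation}", ""]
    # One numeric attribute per vocabulary word; the "w_" prefix keeps
    # word attributes from clashing with the label attribute name.
    lines += [f"@attribute w_{word} numeric" for word in vocabulary]
    # The label is a nominal attribute listing all observed classes.
    lines.append("@attribute doc_label {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    # One data row per document: word counts followed by the label.
    for count, (_, label) in zip(counts, documents):
        row = [str(count[word]) for word in vocabulary]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical toy corpus, for illustration only.
docs = [
    ("Taxes must fall", "economy"),
    ("Schools need teachers", "education"),
]
arff = to_arff(docs)
```

Writing `arff` to a file with the extension `.arff` gives a dataset that can be opened in WEKA. For realistic corpora, a sparse representation and proper tokenization (for example with NLTK) would be more appropriate than this dense word-count table.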

The second part of the course consists of a practical project offered as an optional extension of the theoretical first part. Over the course of this project, the participants will apply some of the techniques covered to answer a research question of their choice. The project consists of four steps, each guided by the course instructors. In the first step, the participants will define the research problem and sketch a methodology for solving it that contains some text analysis elements. The following two steps consist of preprocessing and analyzing relevant textual resources. In the final step, the results of the text analysis will be used to provide an answer to the research question.

Course objectives: 

By the end of the course, students will be able to:

  • Integrate text analysis into a research methodology and solve a research question from their field using this methodology
  • Define a methodology for solving a research problem that includes automatic text analysis
  • Select and apply appropriate methods for preprocessing textual resources relevant for their research question
  • Select and apply text mining methods to the preprocessed textual resources and conduct systematic experiments
  • Correlate the results with external variables and draw conclusions concerning their research question
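
The last objective, relating text analysis results to external variables, can be illustrated with a minimal sketch. It computes a Pearson correlation between a per-unit text measure and an external variable. The function name `pearson`, the variable names and all numbers are hypothetical and chosen only for illustration. Which measure of association is appropriate depends on the research question and the data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: share of documents per region assigned to one topic,
# and an external survey indicator for the same regions.
topic_share = [0.12, 0.25, 0.31, 0.18, 0.40]
survey_indicator = [2.1, 3.4, 3.9, 2.2, 4.8]
r = pearson(topic_share, survey_indicator)
```

A coefficient close to 1 or -1 indicates a strong linear association between the two series. On its own it does not establish a causal relationship.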

Grading will be based on:

  • Participation in online meetings (25%)
  • Final Project Report (75%)

Participants need to have attended the following IPSDS courses or have corresponding knowledge:

  • SURV699C Introduction to Python and SQL or equivalent knowledge of programming in Python: data types & structures, functions & loops, file I/O
  • SURV736 Web Scraping (recommended)
  • SURV703: Computer-Based Content Analysis Part 1 (Theory)

Course Dates


Winter Term (December – February)