SURV752: Introduction to Data Visualization

Area: Data Output/Access
Credit(s)/ECTS: 1/2
Core/Elective: Elective

This course gives students a thorough introduction to best practices in modern data visualization from a social science perspective.

The course is highly applied and emphasizes the practical aspects of data visualization in the social sciences. Examples and data from the social sciences will be used throughout to illustrate the concepts and methods. In addition to conceptual discussions, the course will spend some time on how to produce data visualizations using the free statistical programming language R. Course participants will get hands-on advice on producing modern visualizations for their own practical problems.

After an introduction to data visualization showing new and classic examples, the course will discuss data visualization as a methodology for social science data analysis and exploration rather than as a way of simply turning data into visual objects. Data visualization is about solving problems with data, where visualization is the means to an overarching goal. The course distinguishes between high-level goals (exploration vs. presentation) and low-level goals (making specific comparisons and revealing specific patterns). Understanding data visualization as a methodology also implies that, instead of focusing on single graphics and formats, one should think about how they are used in the larger context of a data analysis. This leads directly to the question of how to use and combine multiple graphs, whether of different subsets of the data or of the same subset in different formats.

Next, the course will introduce the fundamentals of graphical perception: how humans see and process visual stimuli. We will take a closer look at how best to achieve the low-level goals of making specific comparisons and finding specific patterns in the data. Certain graphical formats are generally superior to others for these tasks. An understanding of graphical perception suggests specific design principles that make patterns in the data easier to detect and reduce the cognitive load of processing them. We will apply this knowledge in a discussion of the relative merits of familiar formats such as bar, dot, and line charts.

Comparisons are at the heart of any analysis of quantitative data. The course will give an overview of the graphical formats and visual techniques that best support some of the most fundamental data analytic tasks: comparing before and after, comparing subgroups, comparing to a standard, comparing to a larger context, etc. Students will get to know the slope graph, sparklines, and the bullet graph as lesser-known but highly effective formats for making visual comparisons. Importantly, this session will present one of the most powerful methods of data visualization: the small multiple design. We will stress the importance of arrangement, sorting, and visual reference elements in enabling effective comparisons.

Social science data analysis is fundamentally about relations between variables, with change in a variable over time as a special case. The course discusses scatterplot variants for the effective display of bivariate relationships. Considerable time will be spent on the problem of overplotting, how to deal with it (e.g., through jittering or alpha blending), and how to enhance scatterplots with additional plot elements. One important enhancement that will receive detailed treatment is the addition of parametric and non-parametric scatterplot smoothers that reveal general trends in the data. We will close with a discussion of how the aspect ratio of a graphic affects the perceived strength of a correlation or time trend.

Course objectives: 

By the end of the course, students will…

  • know how to evaluate and criticize data visualizations based on principles of analytic design
  • be able to explore and present their data with visual methods
  • understand which graphical formats are useful for which types of data and questions
  • know how to construct compelling visualizations using the free statistics software R

Grading: 

Grading will be based on:

  • Participation at online meetings (10%)
  • Required Course Assignments (40%)
  • Final Visualization Project (50%)

Prerequisites: 

Course prerequisites are a basic understanding of statistics and bivariate linear regression. Some experience with a statistical software package would help, but no prior exposure to R is required. I will provide detailed code examples to get students up to speed.

Course Dates

2018

Spring Term (March – May)

2019

Spring Term (March – May)