SURV665: Introduction to Real World Data Management

Area: Data Curation/Storage
Credit(s)/ECTS: 2/4
Core/Elective: Elective

Data is ubiquitous in the contemporary world. It comes in many shapes and sizes: surveys, administrative data or found data. Often we want to use these data to better understand the world by applying different types of statistical analysis. Unfortunately, the data we are interested in rarely come prepared for the analysis we want to carry out. This can be due to their format, to missing cases, or simply because they capture information in a form we cannot use directly in our analysis. In this course you will learn both the conceptual and practical aspects of importing and manipulating data so that they can be used for exploratory as well as more advanced statistical analyses.

The course will first cover the main concepts needed to prepare real-world data using R. We will start with the steps required to prepare data for analysis. Then we will develop core skills in R, such as working with different types of objects (for example, data frames). Next we will cover techniques that make working with data more efficient, such as writing loops or applying functions over variables or data frames. After covering these concepts and skills we will concentrate on data management: manipulating data by selecting cases and variables, recoding variables, and reshaping datasets. We will then learn how to explore data using tables and graphics. Finally, we will cover cleaning and exploring text data as well as date/time data.

By the end of the course, students will be able to work with multiple types of data and manipulate them to prepare them for analysis. They will know the main steps needed to do this efficiently.

The course will be divided into four topics, each covered over two weeks. The first week covers the online lectures and reading materials. In the second week, students prepare a project based on what they learned in the first week.

Course objectives: 

By the end of the course, students will:

  • Understand the stages involved in preparing data for analysis
  • Understand the concept of tidy data
  • Understand the basics of using R
  • Know how to write their own functions and loop over them
  • Know how to import and export data
  • Know how to clean data in R
  • Know how to merge data
  • Know how to manipulate textual data
  • Know how to manipulate date/time data
  • Know how to use tables and graphs to explore data

Grading: 

Grading will be based on:

  • Four fortnightly homework assignments (60% of grade)
  • Participation in the weekly online meetings and submission of questions to the discussion forum that demonstrate understanding of the required readings and video lectures (10% of grade)
  • A final project (30% of grade)

Prerequisites: 

No prerequisites.

Course syllabus: 

Syllabus Section 1 (PDF)
Syllabus Section 2 (PDF)

Course Dates: 

Summer Term 2018 (June – August)