SURV665: Introduction to Real World Data Management

Data Curation/Storage

Apply through UMD

Instructors: Alexandru Cernat

Data are ubiquitous in the contemporary world. They come in a variety of shapes and sizes: surveys, administrative data or found data. We often want to use these data to better understand the world by applying different types of statistical analyses. Unfortunately, the data we are interested in usually do not come prepared for the analysis we want to carry out. This can be due to their format, to missing cases, or simply because they capture information in a way that we cannot use in our analysis. In this course you will learn both the conceptual and practical aspects of importing and manipulating data so that they can be used for both exploratory and more advanced statistical analyses.

The course will first cover the main concepts needed to prepare real world data using R. We will start by understanding the steps we need to follow in order to prepare data for analysis. Then we will develop core skills in R, such as working with different types of objects, including data frames. We will then cover techniques that make working with data more efficient, for example using loops or applying functions over variables or data frames. After covering the main concepts and skills we will concentrate on data management. Here we will discuss how to manipulate data, for example by selecting cases or variables, recoding variables or reshaping datasets. We will then go on to learn how to explore the data using tables and graphics. Finally, we will cover cleaning and exploring text data as well as date/time data.

By the end of the course, students will be able to work with multiple types of data and manipulate them to prepare them for analysis. They will know the main steps needed to do this efficiently.

The course is divided into four topics, each covered over two weeks. In the first week students work through the online course and the reading materials. In the second week they prepare a project based on what they learned in the first week.

Course objectives: 

By the end of the course, students will…

  • understand the stages involved in preparing data for analysis
  • understand the concept of tidy data
  • understand the basics of using R
  • know how to write their own functions and loop over them
  • know how to import and export data
  • know how to clean data in R
  • know how to merge data
  • know how to manipulate textual data
  • know how to manipulate date/time data
  • know how to use tables and graphs to explore data

Grading will be based on:

  • 4 fortnightly homework assignments (60% of grade in total)
  • Participation in discussion during the weekly online meetings and submission of questions to the forum demonstrating understanding of the required readings and video lectures (10% of grade)
  • A final project (30% of grade)

Prerequisites: basic knowledge of R and prior experience working with data.


Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly UK Ltd.

Grolemund, G. (2017). Hands-On Programming with R. Write Your Own Functions and Simulations. O’Reilly UK Ltd.

Weekly online meetings & assignments:

  • Week 1: (Online question 1)
  • Week 2: (Online question 2, graded project 1)
  • Week 3: (Online question 3)
  • Week 4: (Online question 4, graded project 2)
  • Week 5: (Online question 5)
  • Week 6: (Online question 6, graded project 3)
  • Week 7: (Online question 7)
  • Week 8: (Online question 8, graded project 4)
  • Final graded project 

If you want to dive even deeper into these topics, we recommend signing up for the follow-up course SURV699Y Modern Workflow in Data Science.

Course Dates


Summer Term (June – August)


Fall Semester (September – December)