Working with large datasets, presenting insights and collaborating with others are essential skills for data and survey scientists. In this course you will learn some of the key skills needed in this research environment.
We will start the course by discussing different types of data workflows. This will cover typical ways in which organizations produce, manipulate and report on data. Getting an overview of these practices and understanding how other organizations work can bring important insights that improve your own work. In this unit we will also discuss how tools such as GitHub can help collaboration and improve reproducibility.
The second topic covered in the course will be reproducible documents. These are essential tools that can be used to create reports, research papers, books and websites. They are vital for reproducible research and collaboration because they combine text and code while enabling version control. In this way, typical errors caused by copying and pasting and by imprecise language can be avoided.
The third topic will be accessing data online. Many organizations store data on servers because of its size and the speed at which it is produced. You will often need to interact with servers directly in order to access, clean and analyze data. We will discuss the main technologies for storing data (such as SQL and JSON) and how you can use R to access them.
The final topic of the course will be dashboards. These are important tools for presenting large datasets in a reliable and easy-to-read fashion. They are especially useful when data is collected at high speed and decisions need to be made based on it. They are also very useful for presenting results to clients and lay audiences. We will discuss how R Shiny can be used to create such dashboards.
Each topic will be covered over two weeks. The first week will cover the online course and the reading materials. In the second week students will prepare a project based on what they learned in the first week.
By the end of the course, students will…
Grading will be based on:
SURV665 Real World Data Management with R, or a good knowledge of base R and the tidyverse.