SURV751: Big Data and Machine Learning

Area: Data Analysis
Credit(s)/ECTS: 1/2
Core/Elective: Elective

The amount of data generated as a by-product in society is growing fast, including data from satellites, sensors, transactions, social media, and smartphones, to name a few. Such data are often referred to as "big data" and can be used to create value in areas such as health, crime prevention, commerce, and fraud detection. Big Data are often used for prediction and classification tasks, both of which can be tackled with machine learning techniques. In this course we explore how Big Data concepts, processes, and methods can be used within the context of survey research. Throughout the course we illustrate key concepts using specific survey research examples, including tailored survey designs and nonresponse adjustment and evaluation.

Course objectives: 

This course will offer participants:

  • an overview of key Big Data terminology and concepts
  • an introduction to common data generating processes
  • a discussion of some primary issues with linking Big Data with Survey Data
  • a discussion of coverage and measurement error issues within the Big Data context
  • a discussion of information extraction and signal detection in the context of Big Data
  • a discussion of the similarities and differences in model building for inference versus prediction
  • an overview of general concepts from machine learning as they apply to processing Big Data
  • a discussion of the potential pitfalls for inference from Big Data
  • an introduction to a small set of key analytic techniques (e.g. classification trees, random forests, conditional forests) to process Big Data using R with example code provided

Grading: 

Grading will be based on:

  • 4 online quizzes (worth 5% each)
  • Participation in discussion during the weekly online meetings and submission of questions via the discussion forum that demonstrate understanding of the required readings and video lectures (20% of grade). In the first week, one question is sufficient, since the course has only just started.
  • 3 homework assignments (20% each)

Course Dates

2018

Spring Term (March – May)

2019

Spring Term (March – May)