SURV673: Introduction to Python and SQL

Area: Data Analysis, Data Curation/Storage
Credit(s)/ECTS: 1/2
Core/Elective: Elective

Apply through UMD

Instructor: Diego Fregolent Mendes des Oliveira

Python has recently surged in popularity, not only as a programming language but also as a tool for data analysis. In this course, we will introduce the basics of programming in Python for the purpose of data analysis. We will explore the Longitudinal Employer-Household Dynamics (LEHD) datasets, specifically the LEHD Origin-Destination Employment Statistics (LODES) datasets, using Python to read in the datasets, explore them, compute statistical summaries, and create visualizations. By the end of the course, students should be comfortable using Python for data analysis and capable of applying their general knowledge of the Python language to other applications.
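To give a flavor of the read-explore-summarize workflow described above, here is a minimal sketch using only the Python standard library. The rows are made up for illustration, and the column names (w_geocode, h_geocode, S000) are an assumption modeled on the LODES origin-destination layout; the helper names load_rows and jobs_by_workplace are hypothetical and not part of the course materials.

```python
import csv
import io
import statistics

# Made-up rows for illustration only; the column names mimic the LODES
# origin-destination layout (work block, home block, total jobs).
SAMPLE_CSV = """w_geocode,h_geocode,S000
A001,B001,3
A001,B002,1
A002,B001,5
A002,B003,2
A003,B002,4
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting job counts to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**row, "S000": int(row["S000"])} for row in reader]


def jobs_by_workplace(rows):
    """Sum total jobs (S000) for each workplace block."""
    totals = {}
    for row in rows:
        key = row["w_geocode"]
        totals[key] = totals.get(key, 0) + row["S000"]
    return totals


rows = load_rows(SAMPLE_CSV)
job_counts = [row["S000"] for row in rows]

# Simple statistical summary of the job counts per home-work pair.
summary = {
    "n": len(job_counts),
    "mean": statistics.mean(job_counts),
    "max": max(job_counts),
}
totals = jobs_by_workplace(rows)

print(summary)
print(totals)
```

In practice the same steps would more likely be done with a data analysis package such as Pandas, which the course objectives name; the standard-library version is used here only to keep the sketch self-contained.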

In addition, as more and more data becomes available, relational database management systems (RDBMS) have become increasingly popular because they make it relatively easy to organize large amounts of data. In many cases, knowledge of SQL is crucial for accessing this data. In this course, we will introduce the basics of programming in SQL using PostgreSQL. We will explore the Longitudinal Employer-Household Dynamics (LEHD) datasets, specifically the LEHD Origin-Destination Employment Statistics (LODES) datasets, using SQL to explore the datasets and compute statistical summaries. By the end of the course, students should be comfortable constructing basic queries against the database and linking multiple tables together using SQL.
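As an illustration of a basic query that links two tables, the sketch below runs a join from Python. It uses the standard-library sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL, so it is not the course's actual setup. The table names (od, block_county), the county labels, and all rows are hypothetical; only the column names w_geocode, h_geocode, and S000 are modeled on the LODES layout, and that too is an assumption.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical tables with made-up rows: job flows between blocks,
# and a lookup that maps each workplace block to a county label.
conn.executescript("""
CREATE TABLE od (w_geocode TEXT, h_geocode TEXT, S000 INTEGER);
CREATE TABLE block_county (geocode TEXT PRIMARY KEY, county TEXT);
INSERT INTO od VALUES
    ('A001', 'B001', 3),
    ('A001', 'B002', 1),
    ('A002', 'B001', 5);
INSERT INTO block_county VALUES
    ('A001', 'County X'),
    ('A002', 'County Y');
""")

# Join the two tables on the workplace block, then total jobs per county.
query = """
SELECT b.county, SUM(o.S000) AS total_jobs
FROM od AS o
JOIN block_county AS b ON o.w_geocode = b.geocode
GROUP BY b.county
ORDER BY b.county;
"""
result = conn.execute(query).fetchall()
print(result)

conn.close()
```

The SELECT statement itself uses only standard SQL (JOIN, GROUP BY, SUM, ORDER BY), so the same query text should also be valid in PostgreSQL, apart from how identifier case is handled there.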

Course objectives: 

By the end of the course, students will

  • Understand the basic structure of Python and how object-oriented programming works
  • Be able to write basic Python code, including functions and loops
  • Know how to use Pandas and matplotlib packages in Python to analyze data and create visualizations
  • Be comfortable reading error messages and Python documentation to diagnose and debug code
  • Understand how relational databases work
  • Be able to construct a query to answer questions about the data
  • Understand how joins work and how to use them
Grading: 

Grading will be based on:

  • 4 online quizzes (5% each)
  • Participation in discussion during the weekly online meetings, and questions posted in the forum that demonstrate understanding of the required readings and video lectures (20%)
  • 4 homework assignments (15% each)
Prerequisites: 

No prerequisite

Readings: 

LEHD Origin-Destination Employment Statistics (LODES) OnTheMap: Data Overview (LODES Version 7)

LEHD Origin-Destination Employment Statistics (LODES) Dataset Structure

Weekly online meetings & assignments:

  • Week 1: Introduction to Python and Pandas (Quiz 1, Homework 1)
  • Week 2: Functions, Loops, and Visualizations (Quiz 2, Homework 2)
  • Week 3: Introduction to SQL (Quiz 3, Homework 3)
  • Week 4: Joins (Quiz 4, Homework 4)

Course Dates

2019

Fall Semester (September – December)

2020

Summer Term (June – August)

Fall Semester (September – December)

2021

Summer Term (June – August)

2022

Summer Term (June – August)