SURV699A: Ethical Considerations for Data Science Research

Instructor: Jessica Vitak

Networked technologies, including the internet of things (IoT), wearables, ubiquitous sensing, social sharing platforms, and other AI-driven systems, are generating a tremendous amount of data about individuals, companies, and societies. These technologies give data scientists and researchers new opportunities to understand human behavior and to develop tools that benefit society. At the same time, the ease with which these data can be collected and analyzed raises a wide range of ethical questions about the technologies, their creators, and their users.

In recent years, there have been numerous examples of ethically problematic research and technologies. For example, the Cambridge Analytica scandal revealed that researchers had used problematic tactics to collect profile data from millions of Facebook users. Algorithms and machine learning systems have also been shown to be systematically biased in how they evaluate resumes[1], recommend parole for prisoners[2], decide where police units should deploy[3], and identify people through facial recognition technology[4], to name just a few.

It is therefore essential that data scientists and others who work with big data be able to critically assess the potential risks and benefits of their end products, whether they are developing a search engine or a tool for detecting terrorists. This course provides an overview of key ethical issues that arise when working with big data, along with opportunities to review and reflect on past mistakes in this space.

Course objectives: 

By the end of the course, students will be able to:

  • Describe the history of research ethics and the goals of institutional review boards
  • Describe the challenges data science and big data raise for protecting individuals’ rights and privacy
  • Identify ethical issues in the study design, data collection, and data analysis process
  • Detail best practices for conducting ethical research

Grading will be based on:

  • Participation in discussion during the weekly online meetings and contributions to weekly discussion forums demonstrating understanding of the required readings and video lectures (10% of grade)
  • Four open-book quizzes assessing comprehension of course material (20% of grade; 5% each)
  • Three online homework assignments reviewing specific aspects of the material covered (45% of grade; 15% each)
  • Final paper covering the overarching themes of the class (25% of grade)

Grade   Score
A+      97-100
A       93-96
A-      90-92
B+      87-89
B       83-86
B-      80-82
C+      77-79
C       73-76
C-      70-72
D+      67-69
D       63-66
D-      60-62
F       59 or below

This is the base grading scale recommended by the IPSDS. Any adjustment to the scale (for example, curving) is at the instructor's discretion.

Assignment due dates are listed in the syllabus. Extensions will be granted sparingly and at the instructor's discretion. If you know you will not be able to meet a deadline, email the instructor before the due date to request an extension. Assignments submitted late without an approved extension will receive a 10% penalty for each day they are late.


No prerequisites.

Readings:

Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.

Metcalf, J., & Crawford, K. (2016). Where are human subjects in big data research? The emerging ethics divide. Big Data & Society, 3(1).

Moon, M. (2009). The history and role of institutional review boards: A useful tension. AMA Journal of Ethics.

Velasquez, M., Andre, C., Shanks, T., S.J., & Meyer, M. J. What is ethics? Markkula Center for Applied Ethics, Santa Clara University.

Saltz, J. S., & Dewar, N. (2019). Data science ethical considerations: A systematic literature review and proposed project framework.

Metcalf, J. (2014). Ethics codes: History, context, and challenges. Council for Big Data, Ethics, and Society.

Association of Internet Researchers (AoIR) Code of Ethics

Zimmer, M. (2010). “But the data is already public”: On the ethics of research in Facebook. Ethics and Information Technology, 12(4), 313–325.

Barocas, S., & Nissenbaum, H. (2014, November). Big data’s end run around procedural privacy protections: Recognizing the inherent limitations of consent and anonymity. Communications of the ACM, 57(11), 31-33.

Ohm, P. (2009). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57, 1701. (Read pages 1701-1731.)

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica.

Tiell, S., & Metcalf, J. (2016). The ethics of data sharing: A guide to best practices and governance. Accenture.

Keyes, O. (2019, April 8). Counting the countless: Why data science is a profound threat for queer people. Real Life.

Olteanu, A., Castillo, C., Diaz, F., & Kiciman, E. (2016). Social data: Biases, methodological pitfalls, and ethical boundaries.

Barocas, S., & Boyd, D. (2017). Engaging the ethics of data science in practice. Communications of the ACM, 60(11), 23-25.

Weekly online meetings & assignments:

  • Week 1: (Quiz 1, Assignment 1)
  • Week 2: (Quiz 2, Assignment 2)
  • Week 3: (Quiz 3, Assignment 3)
  • Week 4: (Quiz 4)
  • Final Paper

Course Dates

Fall Semester (September – December)

Summer Term (June – August)