Detecting Bias and Fairness Trade-offs in Machine Learning Models Using Real-World Datasets

Research Field

Artificial Intelligence | Machine Learning | Data Science | Ethics in AI

Location

Remote Only

Project Overview

As machine learning systems are increasingly deployed in high-stakes settings, ensuring fairness and minimizing algorithmic bias have become critical challenges. In this project, a student will work with real-world datasets to examine how common machine learning models can unintentionally encode or amplify bias across demographic groups.

The project introduces students to applied AI research at the intersection of technology, ethics, and policy, reflecting challenges actively explored in industry research environments.

Student Responsibilities

The student will:

  • Analyze real-world structured datasets with demographic attributes

  • Train and evaluate baseline machine learning models (e.g., logistic regression, decision trees)

  • Apply standard fairness metrics to assess model performance across groups

  • Explore trade-offs between accuracy, interpretability, and fairness

  • Summarize findings in a formal research paper suitable for submission to research competitions and for presentation
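To show what "standard fairness metrics" can look like in practice, here is a minimal sketch of two commonly used group metrics: demographic parity difference (the gap in positive-prediction rates between groups) and equal opportunity difference (the gap in true positive rates between groups). It assumes binary labels, binary predictions, and a single group attribute. The function names are hypothetical and the toy arrays are made up for illustration; they are not drawn from any project dataset. Dedicated libraries such as Fairlearn provide tested implementations of related metrics.

```python
import numpy as np

def selection_rate(y_pred, mask):
    # Fraction of the group's members who receive a positive prediction
    return y_pred[mask].mean()

def true_positive_rate(y_true, y_pred, mask):
    # Among group members whose true label is positive, the fraction predicted positive
    # (returns nan if the group has no positive examples)
    positives = mask & (y_true == 1)
    return y_pred[positives].mean()

def demographic_parity_difference(y_pred, group):
    # Largest gap in selection rate across groups (0 means equal rates)
    rates = [selection_rate(y_pred, group == g) for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_difference(y_true, y_pred, group):
    # Largest gap in true positive rate across groups (0 means equal rates)
    rates = [true_positive_rate(y_true, y_pred, group == g) for g in np.unique(group)]
    return max(rates) - min(rates)

# Hypothetical toy data: true labels, model predictions, and group membership
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 0])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

accuracy = (y_true == y_pred).mean()
dpd = demographic_parity_difference(y_pred, group)
eod = equal_opportunity_difference(y_true, y_pred, group)

print(round(accuracy, 3))  # 0.75
print(round(dpd, 3))       # 0.5
print(round(eod, 3))       # 0.333
```

Reporting overall accuracy next to the group-level gaps is one simple way to make the accuracy-versus-fairness trade-off visible when comparing models.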

Prior coding experience is helpful but not required; motivated students will be supported in learning foundational tools.

Skills Gained

  • Practical machine learning model development

  • Data preprocessing and exploratory data analysis

  • Introduction to algorithmic fairness metrics and evaluation

  • Research documentation and technical communication

  • Exposure to industry-relevant AI research practices

Time Commitment

Approximately 6–10 hours per week, depending on depth of exploration and project scope.

Ideal Student Profile

  • Strong interest in computer science, AI, or data science

  • Comfortable with mathematics and logical reasoning

  • Curious about ethical and societal implications of technology

  • Able to work independently with structured mentorship
