
Bias and Fairness in Machine Learning: Mitigating Bias with Unstructured Data

Submission Number: 164
Submission ID: 3745
Submission UUID: a6306a66-2d23-43f6-9251-b53818712523
Submission URI: /form/project

Created: Tue, 05/30/2023 - 10:41
Completed: Tue, 05/30/2023 - 10:41
Changed: Sat, 06/29/2024 - 10:50

Remote IP address: 71.58.230.184
Submitted by: Carrie Brown
Language: English

Is draft: No
Webform: Project
Bias and Fairness in Machine Learning: Mitigating Bias with Unstructured Data
CAREERS
{Empty}
{Empty}
Complete

Project Leader

Ahmed Rashed
6627032781
{Empty}

Project Personnel

Pranav Venkit
Abdelkrim Kallich
{Empty}

Project Information

With the widespread use of artificial intelligence (AI) systems and applications in our everyday lives, accounting for fairness has gained significant importance in the design and engineering of such systems. AI systems are used in many sensitive settings to make important, life-changing decisions, so it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently, work in traditional machine learning and deep learning has begun to address such challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases these applications can contain and are attempting to address them.
In industry, it has become critical to build fair ML models that respect the groups defined by legally protected sensitive features and do not favor some groups over others. Bias can appear in dataset sampling or in model performance with respect to protected groups or individuals. It is therefore important to establish a bias-analysis process that identifies and mitigates bias in both the dataset and the model, with respect to both group and individual fairness. Several fairness libraries exist for this task; in industry, the libraries used for bias analysis should come from well-established organizations, and major companies such as Microsoft, IBM, and Google each maintain one. The goal of this project is to compare the fairness libraries that can be used in industry and to work through a use case on a published dataset.
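As a concrete illustration of what such a group-fairness check involves, below is a minimal sketch using Fairlearn. The column names ("Survived", "Sex"), the local CSV path, the numeric-feature preprocessing, and the logistic-regression model are assumptions made for illustration only; they are not choices fixed by this proposal.

# Minimal sketch: measuring group disparity with Fairlearn.
# Assumed: a local copy of the Kaggle Titanic data with a binary
# "Survived" label and a "Sex" column as the sensitive feature.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
)

df = pd.read_csv("titanic.csv")  # hypothetical local file
y = df["Survived"]
sensitive = df["Sex"]
X = df.drop(columns=["Survived"]).select_dtypes("number").fillna(0)

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, sensitive, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Group-level view: accuracy broken down by the sensitive feature.
frame = MetricFrame(metrics=accuracy_score, y_true=y_te, y_pred=pred,
                    sensitive_features=s_te)
print(frame.by_group)

# Scalar disparity metrics; 0 would indicate parity on that criterion.
print("Demographic parity difference:",
      demographic_parity_difference(y_te, pred, sensitive_features=s_te))
print("Equalized odds difference:",
      equalized_odds_difference(y_te, pred, sensitive_features=s_te))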

Project Information Subsection

1. Surveying the basics of bias and fairness in machine learning. The students will learn the basics from two review articles: “A Survey on Bias and Fairness in Machine Learning” by Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan, and “An Introduction to Algorithmic Fairness” (arXiv:2105.05595 [cs.CY]) by Hilde J. P. Weerts.
2. Searching for fairness libraries that can be used in industry. We will use three libraries created by major technology companies, so that they can be trusted for industrial use.
• Fairlearn (by Microsoft)
• AIF360 (by IBM)
• What-If Tool (by Google)
3. Selecting published structured and unstructured datasets. The main goal of the project is to identify bias in a structured (tabular) dataset. If possible, we will extend the bias analysis to unstructured data such as text and images.
• Tabular Dataset: TitanicSexism (fairness in ML), https://www.kaggle.com/code/garethjns/titanicsexism-fairness-in-ml/input
• Text Dataset: Fake and real news dataset, https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
• Image Dataset: UTKFace, https://www.kaggle.com/datasets/jangedoo/utkface-new
4. Choosing the proper fairness metrics to identify bias. Below is an example of the metrics that will be used in each library (a brief AIF360 sketch follows this list).
• Fairlearn: demographic parity, equalized odds, equal opportunity
• AIF360: DatasetMetric, BinaryLabelDatasetMetric, ClassificationMetric, SampleDistortionMetric, MDSSClassificationMetric
• What-If Tool: still under study
5. Discussing the results and summarizing the comparison among the libraries. In the discussion, we will classify the fairness metrics as group versus individual, and as metrics that measure fairness in the dataset versus metrics for model performance.
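As referenced in step 4, a minimal sketch of a dataset-level bias check with AIF360 might look as follows. The preprocessed file name, the protected-attribute column "sex", and its 1 = privileged / 0 = unprivileged encoding are illustrative assumptions about how the tabular data would be prepared, not decisions made in this plan.

# Minimal sketch: dataset-level fairness metrics with AIF360.
# Assumed: an all-numeric, preprocessed copy of the tabular data
# with a binary "Survived" label and a 0/1-encoded "sex" column.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.read_csv("titanic_prepared.csv")  # hypothetical preprocessed file

dataset = BinaryLabelDataset(
    df=df,
    label_names=["Survived"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Statistical parity difference:
# P(favorable | unprivileged) - P(favorable | privileged); 0 means parity.
print("Statistical parity difference:", metric.statistical_parity_difference())
# Disparate impact: ratio of the same two rates; 1.0 means parity.
print("Disparate impact:", metric.disparate_impact())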
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
CR-Penn State
{Empty}
CR-Penn State
{Empty}
No
Already behind; start date is flexible
{Empty}
{Empty}
05/08/2024
{Empty}
06/14/2024
  • Milestone Title: Survey basics
    Milestone Description: 1. Surveying the basics of bias and fairness in machine learning. The students will learn the basics from two review articles: “A Survey on Bias and Fairness in Machine Learning” by Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan, and “An Introduction to Algorithmic Fairness” (arXiv:2105.05595 [cs.CY]) by Hilde J. P. Weerts.

    Completion Date Goal: 2024-01-26
  • Milestone Title: Select fairness libraries
    Milestone Description: 2. Searching for fairness libraries that can be used in industry. We will use three libraries created by major technology companies, so that they can be trusted for industrial use.
    • Fairlearn (by Microsoft)
    • AIF360 (by IBM)
    • What-If Tool (by Google)

    Completion Date Goal: 2024-02-09
  • Milestone Title: Select dataset to use
    Milestone Description: 3. Selecting published structured and unstructured datasets. The main goal of the project is to mitigate bias in a structured (tabular) dataset. If possible, we will extend the bias analysis to unstructured data such as text and images.
    • Tabular Dataset: TitanicSexism (fairness in ML), https://www.kaggle.com/code/garethjns/titanicsexism-fairness-in-ml/input
    • Text Dataset: Fake and real news dataset, https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
    • Image Dataset: UTKFace, https://www.kaggle.com/datasets/jangedoo/utkface-new
    Completion Date Goal: 2024-02-16
  • Milestone Title: Apply libraries to dataset
    Milestone Description: Apply libraries to identified datasets and explore effectiveness at bias mitigation
    Completion Date Goal: 2024-03-29
  • Milestone Title: Discuss results and summarize comparison
    Milestone Description: 5. Discussing the results and summarizing the comparison among the libraries. In the discussion, we will classify the fairness metrics as group versus individual, and as metrics that measure fairness in the dataset versus metrics for model performance.
    Completion Date Goal: 2024-04-19
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}

Final Report

{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}
{Empty}