The Bias Report in Action
Using a cleaned version of the COMPAS dataset, we demonstrate the use of The Bias Report web app.
The Process
Upload data
First, we upload the data. The cleaned dataset is available on the upload page and follows the format described here.
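For readers working from a script rather than the web form, here is a minimal sketch of what the expected input looks like. The file name is an assumption (adjust to your local copy); the column layout, a binary score, a binary label_value, and one categorical column per attribute, follows the documented input format.

```python
import pandas as pd

# Load the cleaned COMPAS data (file name assumed; adjust to your local copy).
df = pd.read_csv("compas_for_aequitas.csv")

# Expected input shape: a binary `score`, a binary `label_value`, and one
# categorical column per attribute to audit (race, sex, age_cat, ...).
print(df[["score", "label_value", "race", "sex", "age_cat"]].head())
```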
Select Protected Groups
Following the ProPublica-Northpointe debate (discussed below), we focus on race. We select a custom reference group and use Caucasian as the reference, so our metrics reflect fairness in relation to the historically dominant group.
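The same choice can be made programmatically with the aequitas Python package. The sketch below follows the package's example usage and continues from the dataframe loaded above; exact signatures may vary by version.

```python
from aequitas.group import Group
from aequitas.bias import Bias

# Group-level confusion-matrix counts and metrics for each attribute value.
g = Group()
xtab, _ = g.get_crosstabs(df)

# Disparities relative to a custom reference group: Caucasian for race.
b = Bias()
bdf = b.get_disparity_predefined_groups(
    xtab,
    original_df=df,
    ref_groups_dict={"race": "Caucasian"},
)
```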
Select Fairness Metrics
Again following that debate, we select False Positive Rates, False Negative Rates, and False Discovery Rates.
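Continuing the sketch, the Fairness module turns those disparities into parity determinations. The parity column names below are assumptions based on the package's examples and may differ by version.

```python
from aequitas.fairness import Fairness

# Apply parity tests to the disparity dataframe from the previous sketch.
f = Fairness()
fdf = f.get_group_value_fairness(bdf)

# Keep only the three metrics selected for this audit.
cols = ["attribute_name", "attribute_value", "FPR Parity", "FNR Parity", "FDR Parity"]
print(fdf[cols])
```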
Background
In 2016, ProPublica reported on racial inequality in COMPAS, a risk assessment tool. They showed the algorithm led to unfair disparities in False Negative and False Positive Rates. In particular, black defendants who would not go on to recidivate faced disproportionately high risk scores, while white defendants who would recidivate received disproportionately low risk scores. Northpointe, the company responsible for the algorithm, responded by arguing that it had calibrated the algorithm to be fair in terms of False Discovery Rate, a property also known as calibration. With The Bias Report, we get metrics on each type of disparity, adding clarity to the bias auditing process.
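For reference, the three rates at issue, written out from confusion-matrix counts for a single group:

```python
def rates(fp, fn, tp, tn):
    """Standard confusion-matrix definitions of the three rates discussed here."""
    fpr = fp / (fp + tn)  # False Positive Rate: share of non-reoffenders flagged high/medium risk
    fnr = fn / (fn + tp)  # False Negative Rate: share of reoffenders flagged low risk
    fdr = fp / (fp + tp)  # False Discovery Rate: share of flagged defendants who did not reoffend
    return fpr, fnr, fdr
```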
Analysis
The African-American false discovery rates are within the bounds of fairness. This result is expected because COMPAS is calibrated. (The overall FDR fairness returns False because Asian and Native American defendants did not fall within the fairness thresholds for FDR.)
On the other hand, African-Americans are roughly twice as likely to have false positives and about 40 percent less likely to have false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders, a false positive rate ratio of roughly 1.9. This is unfair and is marked False below.
These findings reflect an inherent trade-off between FPR fairness, FNR fairness, and calibration, one that is present in any decision system where base rates are not equal across groups; see Chouldechova (2017). Aequitas helps bring this trade-off to the forefront with clear metrics and asks system designers to make a reasoned decision based on their use case.
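To make the trade-off concrete, here is a small numeric sketch of the identity from Chouldechova (2017), FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR): with equal PPV (equivalently, equal FDR) and equal FNR, groups with different base rates p must end up with different FPRs. The base rates and PPV below are illustrative assumptions, not COMPAS values.

```python
def implied_fpr(prevalence, ppv, fnr):
    """FPR implied by FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR) (Chouldechova 2017)."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

ppv = 0.6   # calibration: same PPV (same FDR = 1 - PPV) in both groups
fnr = 0.35  # suppose False Negative Rates are also equalized

print(implied_fpr(0.5, ppv, fnr))  # base rate 0.5 -> FPR ~ 0.43
print(implied_fpr(0.3, ppv, fnr))  # base rate 0.3 -> FPR ~ 0.19; FPRs cannot also match
```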