{% for group, issues in issues_by_group.items() %}
{% if issues[0].__class__.__name__ == "PerformanceIssue" %}

We found some data slices in your dataset on which your model's performance is lower than average. Performance bias can happen for several reasons (a per-slice accuracy check is sketched after the list below):

  • Not enough training examples in the low-performing data slice
  • Mislabeled examples in the low-performing data slice of the training set
  • Drift between your training set and test set
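
As a rough illustration (not part of the scan itself), per-slice accuracy can be compared against the global accuracy to spot low-performing slices; the column names "category", "label" and "prediction" below are placeholders for your own data:

    import pandas as pd

    # Toy data: "category" defines the slices, "label" is the ground truth and
    # "prediction" is the model output.
    df = pd.DataFrame({
        "category":   ["news", "news", "sports", "sports", "sports"],
        "label":      [1, 0, 1, 1, 0],
        "prediction": [1, 0, 0, 0, 0],
    })

    global_accuracy = (df["label"] == df["prediction"]).mean()
    per_slice_accuracy = (
        df.assign(correct=df["label"] == df["prediction"])
          .groupby("category")["correct"]
          .mean()
    )

    # Slices whose accuracy falls clearly below the global average are candidates
    # for closer inspection (more examples, relabeling, drift analysis).
    print(per_slice_accuracy[per_slice_accuracy < global_accuracy])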

{% elif issues[0].__class__.__name__ == "RobustnessIssue" %}

Your model seems to be sensitive to small perturbations in the input data, such as adding typos, changing word order, or converting text to uppercase or lowercase (a simple perturbation check is sketched after the list). This can happen because of:

  • Not enough diversity in the training data
  • Overreliance on spurious correlations, such as the presence of specific words
  • Use of complex models with a large number of parameters that tend to overfit the training data
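
As a small illustration, a metamorphic check applies such perturbations and verifies that the prediction does not change; predict below is a toy placeholder for your own text-classification function:

    def predict(text: str) -> str:
        # Toy stand-in for a real model: a length-based rule, for demonstration only.
        return "long" if len(text.split()) > 5 else "short"

    def perturbations(text: str):
        yield text.upper()                 # change case
        yield text.lower()
        yield text.replace("e", "3", 1)    # introduce a small typo-like change

    original = "The delivery was quick and the packaging was intact"
    baseline = predict(original)

    for variant in perturbations(original):
        if predict(variant) != baseline:
            print(f"Prediction changed for perturbation: {variant!r}")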

{% elif issues[0].__class__.__name__ == "OverconfidenceIssue" %}

We found some data slices in your dataset containing a significant number of overconfident predictions, i.e. predictions that are incorrect but made with a high probability or confidence score (illustrated after the list below). This happens when:

  • There are not enough training examples in the overconfident data slice
  • The training set contains wrongly labeled examples in the overconfident data slice
  • The dataset is imbalanced, so the model may assign high probabilities to predictions of the majority class
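
As a minimal illustration, overconfident rows can be flagged as predictions that disagree with the ground truth while carrying a high confidence score; the 0.9 threshold below is an arbitrary placeholder:

    import numpy as np

    probabilities = np.array([
        [0.95, 0.05],   # confidently class 0
        [0.55, 0.45],   # uncertain
        [0.02, 0.98],   # confidently class 1
    ])
    labels = np.array([1, 0, 1])          # ground truth

    predicted = probabilities.argmax(axis=1)
    confidence = probabilities.max(axis=1)

    # Incorrect predictions made with high confidence.
    overconfident = (predicted != labels) & (confidence > 0.9)
    print(np.flatnonzero(overconfident))  # -> [0]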

{% elif issues[0].__class__.__name__ == "UnderconfidenceIssue" %}

We found some data slices in your dataset containing a significant number of underconfident predictions, i.e. predictions where the probability of the predicted label is very close to that of the next most likely label (see the margin check sketched after the list). This happens when:

  • There are not enough examples in the training set for the underconfident data slice
  • The model is too simple and struggles to capture the complexity of the underlying data
  • The underconfident data slice contains inherent noise or overlapping feature distributions
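
As a minimal illustration, underconfident rows can be flagged by the margin between the two highest predicted probabilities; the 0.05 margin below is an arbitrary placeholder:

    import numpy as np

    probabilities = np.array([
        [0.51, 0.49, 0.00],   # top two classes nearly tied
        [0.80, 0.15, 0.05],   # clear winner
    ])

    sorted_probs = np.sort(probabilities, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]   # top-1 minus top-2 probability

    # Rows whose top two probabilities are nearly tied.
    print(np.flatnonzero(margin < 0.05))                 # -> [0]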

{% elif issues[0].__class__.__name__ == "EthicalIssue" %}

Your model seems to be sensitive to gender-, ethnicity-, or religion-based perturbations in the input data, such as switching words from feminine to masculine or changing countries and nationalities. This can happen because of:

  • Underrepresentation of certain demographic groups in the training data
  • Training data that reflects structural biases and societal prejudices
  • Use of complex models with a large number of parameters that tend to overfit the training data

{% elif issues[0].__class__.__name__ == "DataLeakageIssue" %}

Your model seems to be affected by data leakage: it produces different results depending on whether it is run on a single data point or on the whole dataset (a leakage-free pipeline is sketched after the list below). This happens when:

  • Preprocessing steps, such as scaling, missing value imputation, or outlier handling, are fitted inside the prediction pipeline, so their statistics depend on the batch being predicted
  • Train-test splitting is done after preprocessing or feature selection
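
As a rough sketch of the leakage-free pattern (using scikit-learn purely for illustration), the data is split before any preprocessing and the scaler is fitted on the training split only, so single-point and whole-dataset predictions agree:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # Split BEFORE any preprocessing or feature selection.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The scaler is fitted on the training data only; at prediction time it is
    # applied with frozen statistics instead of being refitted on each batch.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))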

{% elif issues[0].__class__.__name__ == "StochasticityIssue" %}

Your model seems to exhibit stochastic behaviour: it produces different results at each execution. This may happen when a stochastic process is included in the prediction pipeline.
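
As a minimal illustration, any unseeded randomness inside the prediction function makes its output change between calls, while a fixed seed restores determinism; the functions below are hypothetical stand-ins for your own pipeline:

    import numpy as np

    def predict_unstable(x):
        # Unseeded randomness inside the prediction pipeline -> non-deterministic output.
        return x + np.random.normal(scale=0.1)

    def predict_stable(x, seed=0):
        # A dedicated, seeded generator makes the output reproducible.
        rng = np.random.default_rng(seed)
        return x + rng.normal(scale=0.1)

    print(predict_unstable(1.0), predict_unstable(1.0))   # two different values
    print(predict_stable(1.0), predict_stable(1.0))       # identical values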

{% elif issues[0].__class__.__name__ == "LLMToxicityIssue" %}

Your model seems to exhibit offensive behaviour when we use adversarial “Do Anything Now” (DAN) prompts.

{% else %}

We found some issues for {{ issues[0].group }}.

{% endif %}

Issues

{% if num_major_issues[group] > 0 %} {{num_major_issues[group]}} major {% endif %} {% if num_medium_issues[group] > 0 %} {{num_medium_issues[group]}} medium {% endif %}
{% include "_issues_table.html" %}
{% endfor %} {% if issues|length == 0 %}

We found no issues in your model. Good job!

{% endif %}