Best Distilled Model Report

Detailed analysis of the optimal student model


Executive Summary

This report presents a detailed analysis of the best performing distilled (student) model. The {{ model.model_type }} model was trained using knowledge distillation with a temperature of {{ model.temperature }} and an alpha of {{ model.alpha }}, achieving strong performance across multiple metrics while closely matching the teacher model's probability distribution.
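For reference, temperature and alpha enter training through the standard distillation objective. The sketch below shows one common PyTorch formulation, assuming a neural student; the names `student_logits` and `teacher_logits` are illustrative, not the report's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature, alpha):
    """Blend the soft-target (teacher) loss with the hard-target (label) loss."""
    # Temperature softens both distributions; the T^2 factor keeps the
    # soft-loss gradients on a comparable scale across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Alpha weights the teacher signal against the ground truth.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```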

  • Model Type: {{ model.model_type }}
  • Temperature: {{ model.temperature }}
  • Alpha: {{ model.alpha }}
  • Accuracy: {{ "%.3f"|format(metrics.accuracy.value) }}{% if metrics.accuracy.retention is defined %} ({{ "%.1f"|format(metrics.accuracy.retention) }}% of teacher){% endif %}
  • F1 Score: {{ "%.3f"|format(metrics.f1.value) if 'f1' in metrics else 'N/A' }}{% if 'f1' in metrics and metrics.f1.retention is defined %} ({{ "%.1f"|format(metrics.f1.retention) }}% of teacher){% endif %}
  • AUC-ROC: {{ "%.3f"|format(metrics.auc_roc.value) if 'auc_roc' in metrics else 'N/A' }}{% if 'auc_roc' in metrics and metrics.auc_roc.retention is defined %} ({{ "%.1f"|format(metrics.auc_roc.retention) }}% of teacher){% endif %}
  • KL Divergence: {{ "%.3f"|format(metrics.kl_divergence.value) if 'kl_divergence' in metrics else 'N/A' }} (lower is better)
  • KS Statistic: {{ "%.3f"|format(metrics.ks_statistic.value) if 'ks_statistic' in metrics else 'N/A' }} (lower is better)
  • R² Score: {{ "%.3f"|format(metrics.r2_score.value) if 'r2_score' in metrics else 'N/A' }}
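The distribution-similarity metrics above can be computed along the following lines. This is a sketch, assuming hypothetical `teacher_probs` and `student_probs` arrays of positive-class probabilities on a shared evaluation set; it is not the report's actual implementation.

```python
import numpy as np
from scipy.stats import entropy, ks_2samp
from sklearn.metrics import r2_score

def distribution_metrics(teacher_probs, student_probs, bins=50):
    # KL divergence between binned probability histograms (lower is better).
    eps = 1e-10  # avoid log(0) on empty bins
    t_hist, edges = np.histogram(teacher_probs, bins=bins, range=(0, 1), density=True)
    s_hist, _ = np.histogram(student_probs, bins=edges, density=True)
    kl = entropy(t_hist + eps, s_hist + eps)
    # KS statistic: largest gap between the two empirical CDFs (lower is better).
    ks = ks_2samp(teacher_probs, student_probs).statistic
    # R² of student probabilities against teacher probabilities, per example.
    r2 = r2_score(teacher_probs, student_probs)
    return {"kl_divergence": kl, "ks_statistic": ks, "r2_score": r2}
```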

Distribution Analysis

Probability Distribution Comparison

[Figure: predicted-probability histograms — Teacher Model vs. Student Model]

Cumulative Distribution Comparison

[Figure: empirical cumulative distributions — Teacher Model vs. Student Model]

Q-Q Plot (Quantile Comparison)

Points falling on the diagonal indicate identical distributions. {% if 'r2_score' in metrics %}The R² score of {{ "%.3f"|format(metrics.r2_score.value) }} quantifies how closely the student's distribution tracks the teacher's.{% endif %}
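A minimal sketch of how such a Q-Q comparison can be produced with NumPy and matplotlib, reusing the hypothetical `teacher_probs` and `student_probs` arrays from the earlier sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

quantiles = np.linspace(0, 1, 101)
t_q = np.quantile(teacher_probs, quantiles)  # teacher quantiles
s_q = np.quantile(student_probs, quantiles)  # student quantiles

plt.figure(figsize=(5, 5))
plt.scatter(t_q, s_q, s=12)
# Reference diagonal: identical distributions would fall exactly on this line.
plt.plot([0, 1], [0, 1], "k--", label="identical distributions")
plt.xlabel("Teacher quantiles")
plt.ylabel("Student quantiles")
plt.legend()
plt.title("Q-Q plot: teacher vs. student predicted probabilities")
plt.show()
```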

Error Distribution Comparison

[Figure: error distributions — Teacher Model Errors vs. Student Model Errors]

Performance Metrics

{% set key_metrics = ['accuracy', 'f1', 'auc_roc', 'r2_score'] %}
{% for metric_name in key_metrics %}{% if metric_name in metrics %}
{{ metrics[metric_name].display_name }}: {{ "%.3f"|format(metrics[metric_name].value) }}
  Teacher: {{ "%.3f"|format(metrics[metric_name].teacher_value) if metrics[metric_name].teacher_value is not none else 'N/A' }}
  {% if metrics[metric_name].difference is defined and metrics[metric_name].teacher_value is not none %}Difference: {{ "%.3f"|format(metrics[metric_name].difference) }}{% if metrics[metric_name].retention is defined %} ({{ "%.1f"|format(metrics[metric_name].retention) }}% retained){% endif %}{% endif %}
{% endif %}{% endfor %}
Metric | Teacher Model | Student Model | Difference | Retention %
{% for metric_name, metric in metrics.items() %}
{{ metric.display_name }}{% if metric_name in ['kl_divergence', 'ks_statistic'] %} (lower is better){% endif %} | {{ "%.3f"|format(metric.teacher_value) if metric.teacher_value is not none else 'N/A' }} | {{ "%.3f"|format(metric.value) }} | {% if metric.difference is defined and metric.teacher_value is not none %}{{ "%+.3f"|format(metric.difference) if metric_name in ['kl_divergence', 'ks_statistic'] else "%.3f"|format(metric.difference) }}{% else %}N/A{% endif %} | {% if metric.retention is defined %}{{ "%.1f"|format(metric.retention) }}%{% else %}N/A{% endif %}
{% endfor %}
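The Difference and Retention % columns follow the convention sketched below. This is an illustration of the convention, not the report's exact code: difference is student minus teacher, and retention is only meaningful for higher-is-better metrics with a nonzero teacher baseline.

```python
LOWER_IS_BETTER = {"kl_divergence", "ks_statistic"}

def compare(metric_name, teacher_value, student_value):
    # Positive difference means the student scored higher than the teacher.
    difference = student_value - teacher_value
    # Retention % is student/teacher; skipped for lower-is-better metrics,
    # where a ratio would be misleading.
    retention = None
    if metric_name not in LOWER_IS_BETTER and teacher_value:
        retention = 100.0 * student_value / teacher_value
    return difference, retention
```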

Model Parameters

{{ model.model_type }} Hyperparameters

Knowledge Distillation Parameters
  • Temperature: {{ model.temperature }}
    Controls the softness of the probability distributions used as soft targets (illustrated in the sketch below)
  • Alpha: {{ model.alpha }}
    Weight on the teacher (distillation) loss ({{ model.alpha }}) versus the ground-truth loss ({{ "%.1f"|format(1 - model.alpha) }})
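To make "softness" concrete, the sketch below (with made-up logits) shows how dividing by a larger temperature flattens the softmax output, exposing the teacher's relative preferences among non-top classes:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])  # illustrative teacher logits
for T in (1.0, 2.0, 5.0):
    print(T, softmax(logits / T).round(3))
# As T grows, the distribution flattens: the top class's probability drops
# and the smaller classes' relative ordering becomes visible to the student.
```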
Model Specific Hyperparameters
  {% if model.parsed_params %}{% for param, value in model.parsed_params.items() %}
  • {{ param }}: {{ value }}
  {% endfor %}{% else %}
  • No additional hyperparameters available
  {% endif %}

Feature Importance

Top features by importance in the {{ model.model_type }} model. Feature importance represents the relative contribution of each feature to the model's predictions.
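For students exposing scikit-learn-style importances (e.g. tree ensembles), the ranking shown here can be extracted roughly as follows. This is a sketch; whether `feature_importances_` exists depends on the student's model type.

```python
import numpy as np

def top_features(model, feature_names, k=10):
    """Return the k (name, importance) pairs with the largest importances."""
    importances = np.asarray(model.feature_importances_)
    order = np.argsort(importances)[::-1][:k]  # indices, descending importance
    return [(feature_names[i], float(importances[i])) for i in order]
```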

Conclusion and Recommendations

Key Findings

  • The {{ model.model_type }} student model achieves {% if 'accuracy' in metrics and metrics.accuracy.retention is defined %}{{ "%.1f"|format(metrics.accuracy.retention) }}%{% else %}most{% endif %} of the teacher's accuracy while being more efficient to run
  • Distribution similarity metrics indicate close alignment between teacher and student {% if 'r2_score' in metrics %}(R² Score: {{ "%.3f"|format(metrics.r2_score.value) }}){% endif %}
  • {% if model.temperature > 1 %}The higher temperature ({{ model.temperature }}) softened the teacher's outputs, aiding{% else %}The temperature of {{ model.temperature }} supported{% endif %} knowledge transfer from the teacher model
  • The alpha value of {{ model.alpha }} struck an effective balance between mimicking the teacher and learning from the ground-truth labels

Recommendations

  1. Deployment Ready: This distilled model is suitable for production deployment, showing minimal performance degradation relative to the teacher model.
  2. Runtime Efficiency: The smaller student model should offer faster inference while preserving the teacher's decision boundaries.
  3. Parameter Tuning: For future distillation tasks on similar data, temperature ≈ {{ model.temperature }} and alpha ≈ {{ model.alpha }} are reasonable starting points.
  4. Model Selection: {{ model.model_type }} works particularly well as a student model for this dataset, matching the teacher's distribution more closely than the alternatives evaluated.