A NeurIPS 2024 Workshop

Algorithmic Fairness through the Lens of Metrics and Evaluation

December 14 or 15, 2024

Pre-registration form: https://forms.gle/YBCwn7L8N5AxExMG7

The Algorithmic Fairness through the Lens of Metrics and Evaluation (AFME) workshop aims to spark discussions on revisiting algorithmic fairness metrics and evaluation in light of advances in large language models and international regulation.

The question of how to define and measure algorithmic (un)fairness was a predominant focus in the early stages of algorithmic fairness research, resulting in four main fairness denominations: individual or group, statistical or causal, equalizing or non-equalizing, and temporal or non-temporal fairness. Since then, much work in the field has been dedicated to providing methodological advances within each denomination and understanding various trade-offs between fairness metrics. However, given the changing machine learning landscape, with both increasing global applications and the emergence of large generative models, the question of understanding and defining what constitutes “fairness” in these systems has become paramount again.

On the one hand, definitions of algorithmic fairness are being critically examined regarding the historical and cultural values they encode. The mathematical conceptualization of these definitions and their operationalization through satisfying statistical parities have also drawn criticism for not taking into account the context within which these systems are deployed.

On the other hand, it is still unclear how to reconcile standard fairness metrics and evaluations, developed mainly for prediction and classification tasks, with large generative models. While some works have proposed adapting existing fairness metrics, e.g., to large language models, questions remain on how to systematically measure fairness for textual outputs or for multi-modal generative models. Large generative models also pose new challenges to fairness evaluation, with recent work showing how biases towards specific tokens in large language models can influence fairness assessments during evaluation. Finally, regulatory requirements introduce new challenges in defining, selecting, and assessing algorithmic fairness.

Given these critical and timely considerations, this workshop aims to investigate how to define and evaluate (un)fairness in today’s machine learning landscape. 

Invited Speakers 

IBM Fellow

Assistant Professor, Computer Science, Stanford University

Assistant Professor, Ethics and Computational Technologies, Carnegie Mellon University 

Staff Research Scientist, Google DeepMind

Professor of Philosophy, Australian National University


IBM Fellow

Assistant Professor, Computer Science, Stanford University

Assistant Professor, Ethics and Computational Technologies, Carnegie Mellon University 

Staff Research Scientist, Google DeepMind

Professor of Philosophy, Australian National University

Senior Research Scientist, Google DeepMind


Awa Dieng (Google DeepMind, Mila)

Miriam Rateike (Saarland University)

(Harvard University)

Golnoosh Farnadi (McGill University, Mila)

Ferdinando Fioretto (University of Virginia)

Code of Conduct

The AFME workshop abides by the NeurIPS code of conduct. Participation in the event requires agreeing to the code of conduct.

Reviewer Volunteer Form
If you would like to help as a reviewer, please fill out the form below. 

To stay updated about the workshop, pre-register using this form and follow us on Twitter at @afciworkshop