Consequential decisions are increasingly informed by sophisticated data-driven predictive models. For accurate predictive models, deterministic threshold rules have been shown to be optimal in terms of utility, even under a variety of fairness constraints. However, consistently learning accurate predictive models requires access to ground truth labels. Unfortunately, in practice, labels only exist conditional on certain decisions, which may have been made using a potentially imperfect decision policy. As a result, learned deterministic threshold rules are often suboptimal. Can we do better if we learn to decide rather than to predict? We first show that, if decisions are taken by a faulty deterministic policy, the observed labels are insufficient to improve it. Then, we describe how to avoid this undesirable behavior by directly learning stochastic decision policies that maximize utility under fairness constraints. Experiments on synthetic and real-world data illustrate the favorable properties of learning to decide in terms of utility and fairness.
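
The selective-labels problem described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method of the paper: the one-dimensional feature `x`, the label model P(y = 1 | x) = x, the decision cost `COST`, both policies, and all function names are hypothetical. The inverse-propensity-weighted (IPW) estimate is used here only as a standard stand-in for evaluating a candidate rule from logged decisions.

```python
import random

COST = 0.3  # hypothetical cost of a positive decision


def collect(policy, n, rng):
    """Draw individuals, decide, and record the label only when d is True."""
    data = []
    for _ in range(n):
        x = rng.random()                        # feature, uniform on [0, 1)
        p = policy(x)                           # probability of a positive decision
        d = rng.random() < p                    # the decision actually taken
        y = (rng.random() < x) if d else None   # assumed P(y=1|x) = x; unobserved if d is False
        data.append((x, p, d, y))
    return data


def deterministic(x):
    """Faulty threshold rule: too conservative (the best threshold here is COST)."""
    return 1.0 if x > 0.7 else 0.0


def stochastic(x):
    """Same rule, but positive decisions keep nonzero probability everywhere."""
    return 1.0 if x > 0.7 else 0.2


def min_labeled_x(data):
    """Smallest feature value for which a label was ever observed."""
    return min(x for x, _, d, _ in data if d)


def ipw_utility(data, threshold):
    """IPW estimate of the utility of the rule 'decide iff x > threshold'."""
    total = 0.0
    for x, p, d, y in data:
        if d and x > threshold:
            total += (int(y) - COST) / p
    return total / len(data)


rng = random.Random(0)
det_data = collect(deterministic, 100_000, rng)
sto_data = collect(stochastic, 100_000, rng)

for name, data in [("deterministic", det_data), ("stochastic", sto_data)]:
    print(name,
          "min labeled x:", round(min_labeled_x(data), 3),
          "U(0.7):", round(ipw_utility(data, 0.7), 3),
          "U(0.3):", round(ipw_utility(data, 0.3), 3))
```

In this sketch, data logged under the deterministic rule contains no labels below its own threshold, so the estimates for the thresholds 0.3 and 0.7 coincide and give no signal for improvement. Data logged under the stochastic rule covers the whole feature range, which lets the estimate separate the two candidate thresholds.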
