Hosted by the Institute for Foundations of Data Science

University of Washington

Jointly organized with the University of Wisconsin–Madison, the University of Chicago, and the University of California, Santa Cruz

Aditi Raghunathan, Carnegie Mellon University
Lillian Ratliff, University of Washington
Yao Xie, Georgia Institute of Technology
Samory Kpotufe, Columbia University
Ludwig Schmidt, University of Washington
Jamie Morgenstern, University of Washington
Rediet Abebe, University of California, Berkeley
Stephen J. Wright, University of Wisconsin
Hongseok Namkoong, Columbia University
Brian Ziebart, University of Illinois at Chicago
R. Tyrrell Rockafellar, University of Washington

**Day 1**

9:00-9:45 | Yao Xie
9:45-10:30 | Samory Kpotufe
10:30-11:00 | Break
11:00-11:45 | Ludwig Schmidt
11:45-2:00 | Lunch
2:00-2:45 | Jamie Morgenstern
2:45-3:30 | Break
4:00 | Evening Buffet at Ba Bar University Village

**Day 2**

9:00-9:45 | Hongseok Namkoong
9:45-10:30 | Aditi Raghunathan
10:30-11:00 | Break
11:00-11:45 | Tyrrell Rockafellar
11:45-2:00 | Lunch
2:00-2:45 | Steve Wright
2:45-3:30 | Break
3:30-5:00 | Short talks and software demos

**Day 3**

10:00-10:45 | Brian Ziebart
10:45-11:30 | Short talks
11:30-11:45 | Closing remarks

This workshop aims to bring together researchers with different backgrounds in computer science, control theory, statistics, and mathematics who are interested in addressing distributional robustness in learning, optimization, and control.

Registration is required for all participants. Registration is now closed.

Please book your travel early.
For accommodation, a block of rooms has been reserved at the
**Silver Cloud Hotel** near campus.
Please make your booking before July 7th, 2022.

Hypothesis tests via distributionally robust optimization

We consider a general data-driven robust hypothesis test formulation that finds an optimal test (function) minimizing the worst-case performance over distributions close to the empirical distributions with respect to some divergence, in particular the Wasserstein and Sinkhorn divergences. Robust tests are beneficial, for instance, when samples are limited or unbalanced; such scenarios often arise in applications such as health care, online change-point detection, and anomaly detection. We present a distributionally robust optimization framework for this problem and study the computational and statistical properties of the proposed test by deriving a tractable convex reformulation of the original infinite-dimensional variational problem. Finally, I will present generalizations of the approach to other related problems, including domain adaptation.

Tracking Most Significant Arm Switches in Bandits

In bandits with distribution shifts, one aims to automatically adapt to unknown changes in the reward distributions and to restart exploration when necessary. While this problem has received attention for many years, no adaptive procedure was known until a recent breakthrough of Auer et al. (2018, 2019), which guarantees an optimal (dynamic) regret of (LT)^{1/2} for T rounds and L stationary phases. While this rate is tight in the worst case, it leaves open whether faster rates are possible, adaptively, when few changes in distribution are actually severe, e.g., involve no change in the best arm. We provide a positive answer, showing that a much weaker notion of change can in fact be adapted to, which can yield significantly faster rates than previously known, whether expressed in terms of the number of best-arm switches (for which no adaptive procedure was known) or in terms of total variation. Finally, our parametrization captures both stochastic and non-stochastic adversarial settings at once.
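For context on the (LT)^{1/2} benchmark: a simple oracle baseline runs UCB1 separately on each stationary phase, restarting at every known change point; the adaptive challenge discussed in the abstract is to match this guarantee without knowing where the phases begin. A minimal sketch of that oracle baseline, with made-up phase means (not from the talk):

```python
import numpy as np

def ucb_with_restarts(means_per_phase, phase_len, seed=0):
    """Oracle baseline: restart UCB1 at each *known* change point.

    Matching its O(sqrt(L*T)) regret without knowing the change
    points is the adaptive problem discussed above.
    """
    rng = np.random.default_rng(seed)
    total_regret = 0.0
    for means in means_per_phase:              # L stationary phases
        K = len(means)
        counts = np.zeros(K)
        sums = np.zeros(K)
        best = max(means)
        for t in range(1, phase_len + 1):
            if t <= K:                          # pull each arm once
                a = t - 1
            else:                               # UCB1 index
                ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
                a = int(np.argmax(ucb))
            r = rng.binomial(1, means[a])       # Bernoulli reward
            counts[a] += 1
            sums[a] += r
            total_regret += best - means[a]     # pseudo-regret
    return total_regret

# Two phases in which the best arm switches (illustrative values).
reg = ucb_with_restarts([[0.9, 0.5], [0.4, 0.8]], phase_len=2000)
```

With restarts at the true change points, the regret stays far below what a non-restarting strategy would suffer after the best arm switches.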

A data-centric view on robustness

Over the past few years, researchers have proposed many ways to measure the robustness of machine learning models. In the first part of the talk, we will survey the current robustness landscape based on a large-scale experimental study involving more than 200 different models and test conditions. Despite the large variety of test conditions, common trends emerge: (i) robustness to natural distribution shift and to synthetic perturbations are distinct phenomena, (ii) current algorithmic techniques have little effect on robustness to natural distribution shifts, and (iii) training on more diverse datasets offers robustness gains on several natural distribution shifts. In the second part of the talk, we leverage these insights to improve OpenAI’s CLIP model. CLIP achieved unprecedented robustness on several natural distribution shifts, but only when used as a zero-shot model. The zero-shot evaluation precludes the use of extra data for fine-tuning and hence leads to lower performance when there is a specific task of interest. To address this issue, we introduce a simple yet effective method for fine-tuning zero-shot models that yields large robustness gains on several distribution shifts without reducing in-distribution performance.
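One concrete method in this vein is weight-space ensembling: linearly interpolating the zero-shot and fine-tuned weights. The abstract does not name its method, so treat this as an illustrative assumption rather than the talk's exact technique. A minimal sketch, with plain dicts standing in for model state dicts:

```python
def interpolate_weights(zero_shot, fine_tuned, alpha):
    """Blend zero-shot and fine-tuned parameters elementwise.

    alpha=0 keeps the (robust) zero-shot model, alpha=1 keeps the
    fine-tuned one; intermediate alphas often retain much of the
    zero-shot robustness while recovering in-distribution accuracy.
    """
    return {k: (1 - alpha) * zero_shot[k] + alpha * fine_tuned[k]
            for k in zero_shot}

# Toy two-parameter "models" for illustration only.
zs = {"w": 1.0, "b": 0.0}
ft = {"w": 3.0, "b": 1.0}
mid = interpolate_weights(zs, ft, 0.5)   # {"w": 2.0, "b": 0.5}
```

In practice alpha is chosen on held-out in-distribution data; the interpolation requires no extra training.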

Endogenous distribution shifts in competitive environments

In this talk, I'll describe recent work exploring the dynamics between ML systems and the populations they serve, particularly when the deployed models affect which populations a system will have as future customers.

Assessing the external validity (a.k.a. distributional robustness) of causal findings

Causal inference—analyzing the ceteris paribus effect of an intervention—is key to reliable decision-making. Due to population shifts over time and the underrepresentation of marginalized groups, standard causal estimands that measure the average treatment effect often lose validity outside the study population. To guarantee that causal findings remain valid under population shifts, we propose the worst-case treatment effect (WTE) across all subpopulations of a given size. We develop an optimal estimator for the WTE based on flexible prediction methods, which allows us to analyze the external validity of the popular doubly robust estimator. On real examples where external validity is of core concern, our proposed framework successfully guards against brittle findings that are invalidated under unanticipated population shifts.

Estimating and improving the performance of machine learning under natural distribution shifts

Machine learning systems often fail catastrophically under distribution shift—when the test distribution differs in some systematic way from the training distribution. If we can mathematically characterize a distribution shift, we can devise appropriate robust training algorithms that promote robustness to that specific class of shifts. However, the resulting robust models show limited gains on shifts that do not admit the structure they were specifically trained against. Naturally occurring shifts are both hard to predict a priori and intractable to characterize mathematically, necessitating different approaches to addressing distribution shifts in the wild. In this talk, we first discuss how to estimate the performance of models under natural distribution shifts—the shift could cause a small degradation or a catastrophic drop. Obtaining ground-truth labels is expensive and requires a priori knowledge of when and what kind of distribution shifts are likely to occur. We present a phenomenon that we call agreement-on-the-line that allows us to effectively predict performance under distribution shift from unlabeled data alone. Next, we investigate a promising avenue for improving robustness to natural shifts: leveraging representations pre-trained on diverse data. Via theory and experiments, we find that de facto fine-tuning of pre-trained representations does not maximally preserve robustness. Using insights from our analysis, we provide two simple alternative fine-tuning approaches that substantially boost robustness to natural shifts.
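A rough sketch of how an agreement-based estimate can work in practice, under the assumption that agreement rates between model pairs (computable from unlabeled data alone) and accuracies fall on matching linear trends across the in-distribution (ID) and shifted (OOD) test sets. All numbers below are hypothetical:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept for a degree-1 fit."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# ID/OOD agreement rates for several model *pairs* -- both are
# computable without any OOD labels (hypothetical numbers).
id_agree = np.array([0.90, 0.85, 0.80, 0.75])
ood_agree = np.array([0.70, 0.62, 0.54, 0.46])
slope, intercept = fit_line(id_agree, ood_agree)

# If accuracies lie on the same line as agreements, a model's OOD
# accuracy can be estimated from its labeled ID accuracy alone:
est_ood_acc = slope * 0.88 + intercept
```

The key point is that the line itself is fit purely from unlabeled OOD data, so no ground-truth OOD labels are needed for the final estimate.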

Robustness from the Perspective of Coherent Measures of Risk

Measures of risk seek to quantify the overall "risk" in a cost- or loss-oriented random variable by a single value, such as its expectation or worst outcome, or better, something in between. Common-sense axioms were developed for this in the late 90s by mathematicians working in finance, and they used the term "coherent" for the risk measures that satisfied those axioms. Their work was soon broadened, and it was established by way of duality in convex analysis that coherent measures of risk correspond exactly to seeking robustness with respect to some collection of alternative probability distributions. The theory has since become highly developed, with powerful approaches to quantification like conditional value-at-risk and others based on it. It now also includes interesting connections with statistics and regression, captured by the "fundamental quadrangle of risk", as will be explained in this talk, with examples.
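Conditional value-at-risk is a convenient concrete example of such a measure: at level a it averages the worst (1 - a)-fraction of outcomes, and it can be computed via the Rockafellar-Uryasev minimization formula, CVaR_a(X) = min_c { c + E[(X - c)_+] / (1 - a) }. A minimal numerical sketch:

```python
import numpy as np

def cvar(losses, alpha):
    """Conditional value-at-risk via the Rockafellar-Uryasev formula:
    CVaR_a(X) = min_c  c + E[(X - c)_+] / (1 - a),
    where the minimum is attained at c = VaR_a(X) (the a-quantile).
    For an empirical sample this equals the average of the worst
    (1 - a)-fraction of outcomes."""
    losses = np.asarray(losses, dtype=float)
    c = np.quantile(losses, alpha)          # VaR: a minimizer of the formula
    return c + np.mean(np.maximum(losses - c, 0.0)) / (1.0 - alpha)

losses = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
print(cvar(losses, 0.0))   # 4.5 -- plain expectation
print(cvar(losses, 0.8))   # 8.5 -- average of the worst 20%: (8 + 9) / 2
```

At alpha = 0 the formula recovers the expectation; as alpha approaches 1 it approaches the worst outcome, interpolating between the two extremes mentioned above.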

Robust formulations and algorithms for learning problems under distributional ambiguity

We discuss learning problems in which the empirical distribution represented by the training data is used to define an ambiguity set of distributions, and we seek the classifier or regressor that solves a min-max problem involving this set. Our focus is mainly on linear models. First, we discuss a formulation of the robust min-max problem based on classification with the discontinuous "zero-one" loss function, using Wasserstein ambiguity, and describe properties of the resulting nonconvex problem, which is benignly nonconvex in a certain sense. Second, we discuss robust formulations of convex ERM problems involving linear models, where the distributional ambiguity is measured using either a Wasserstein metric or an f-divergence. We show that such problems can be formulated as "generalized linear programs" and solved using a first-order primal-dual algorithm that incorporates coordinate descent in the dual variable and variance reduction. We present numerical results to illustrate properties of the robust optimization formulations and algorithms.

Prediction Games: From Maximum Likelihood Estimation to Active Learning, Fair Machine Learning, and Structured Prediction

A standard approach to supervised machine learning is to choose the form of a predictor and then to optimize its parameters based on training data. Approximations of the predictor's performance measure are often required to make this optimization problem tractable. Instead of approximating the performance measure and using the exact training data, this talk explores a distributionally robust approach using game-theoretic approximations of the training data while optimizing the exact performance measures of interest. Though the resulting "prediction games" reduce to maximum likelihood estimation in simple cases, they provide new methods for more complicated prediction tasks involving covariate shift, fairness constraint satisfaction, and structured data.

**Shiori Sagawa**: Extending the WILDS Benchmark for Unsupervised Adaptation
**Ananya Kumar**: Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation
**Josh Gardner**: Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation
**Sushrut Karmalkar**: Robust Sparse Mean Estimation via Sum of Squares
**Mitchell Wortsman**: OpenCLIP Software Demo (Slides)
**Yassine Laguel**: SPQR Software Demo
**Krishna Pillutla**: SQwash Software Demo (Colab)

**Krishna Pillutla**: Tackling distribution shifts in federated learning
**Ahmet Alacaoglu**: On the Complexity of a Practical Primal-Dual Coordinate Method
**Lijun Ding**: Flat Minima Generalize for Low-rank Matrix Recovery
**Zaid Harchaoui**: Stochastic Optimization Under Time Drift
**Yassine Laguel**: Chance Constrained Problems: a Bilevel Convex Optimization Perspective
**Lang Liu**: Orthogonal Statistical Learning with Self-Concordant Loss

Questions? Contact us!