Hosted by the Institute for Foundations of Data Science

University of Washington

Jointly organized with the University of Wisconsin–Madison, the University of Chicago, and the University of California, Santa Cruz

Aditi Raghunathan, Carnegie Mellon University
Lillian Ratliff, University of Washington
Yao Xie, Georgia Institute of Technology
Samory Kpotufe, Columbia University
Ludwig Schmidt, University of Washington
Jamie Morgenstern, University of Washington
Rediet Abebe, University of California, Berkeley
Stephen J. Wright, University of Wisconsin
Hongseok Namkoong, Columbia University
Brian Ziebart, University of Illinois at Chicago
R. Tyrrell Rockafellar, University of Washington

**Day 1**

9:00-9:45 | Yao Xie
9:45-10:30 | Samory Kpotufe
10:30-11:00 | Break
11:00-11:45 | Ludwig Schmidt
11:45-2:00 | Lunch
2:00-2:45 | Jamie Morgenstern
2:45-3:30 | Break
4:00 | Evening Buffet at Ba Bar University Village

**Day 2**

9:00-9:45 | Hongseok Namkoong
9:45-10:30 | Aditi Raghunathan
10:30-11:00 | Break
11:00-11:45 | Tyrrell Rockafellar
11:45-2:00 | Lunch
2:00-2:45 | Steve Wright
2:45-3:30 | Break
3:30-5:00 | Short talks and software demos

**Day 3**

10:00-10:45 | Brian Ziebart
10:45-11:30 | Short talks
11:30-11:45 | Closing remarks

This workshop aims to bring together researchers with different backgrounds in computer science, control theory, statistics, and mathematics who are interested in addressing distributional robustness in learning, optimization, and control.

Registration is required for all participants. Registration is now closed.

Please book your travel early.
For accommodation, a block of rooms has been reserved at the
**Silver Cloud Hotel** near campus.
Please make your booking before July 7th, 2022.

Hypothesis tests via distributionally robust optimization

We consider a general data-driven robust hypothesis test formulation that finds an optimal test (function) minimizing the worst-case performance over distributions close to the empirical distributions with respect to some divergence, in particular the Wasserstein and Sinkhorn divergences. Robust tests are beneficial, for instance, when samples are limited or unbalanced; such scenarios often arise in applications such as health care, online change-point detection, and anomaly detection. We present a distributionally robust optimization framework for this problem and study the computational and statistical properties of the proposed test by deriving a tractable convex reformulation of the original infinite-dimensional variational problem. Finally, I will present generalizations of the approach to other related problems, including domain adaptation.

Tracking Most Significant Arm Switches in Bandits

In bandits with distribution shifts, one aims to automatically adapt to unknown changes in the reward distributions and to restart exploration when necessary. While this problem has received attention for many years, no adaptive procedure was known until a recent breakthrough of Auer et al. (2018, 2019), which guarantees an optimal (dynamic) regret of (LT)^{1/2} for T rounds and L stationary phases. While this rate is tight in the worst case, it leaves open whether faster rates are possible, adaptively, when few changes in distribution are actually severe, e.g., involve no change in the best arm. We provide a positive answer, showing that a much weaker notion of change can in fact be adapted to, which can yield significantly faster rates than previously known, whether expressed in terms of the number of best-arm switches (for which no adaptive procedure was known) or in terms of total variation. Finally, our parametrization captures both stochastic and non-stochastic adversarial settings at once.
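For context on the (LT)^{1/2} benchmark: a simple oracle baseline runs UCB1 separately on each stationary phase, restarting at every known change point; the adaptive challenge discussed in the abstract is to match this guarantee without knowing where the phases begin. A minimal sketch of that oracle baseline, with made-up phase means (not from the talk):

```python
import numpy as np

def ucb_with_restarts(means_per_phase, phase_len, seed=0):
    """Oracle baseline: restart UCB1 at each *known* change point.

    Matching its O(sqrt(L*T)) regret without knowing the change
    points is the adaptive problem discussed above.
    """
    rng = np.random.default_rng(seed)
    total_regret = 0.0
    for means in means_per_phase:              # L stationary phases
        K = len(means)
        counts = np.zeros(K)
        sums = np.zeros(K)
        best = max(means)
        for t in range(1, phase_len + 1):
            if t <= K:                          # pull each arm once
                a = t - 1
            else:                               # UCB1 index
                ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
                a = int(np.argmax(ucb))
            r = rng.binomial(1, means[a])       # Bernoulli reward
            counts[a] += 1
            sums[a] += r
            total_regret += best - means[a]     # pseudo-regret
    return total_regret

# Two phases in which the best arm switches (illustrative values).
reg = ucb_with_restarts([[0.9, 0.5], [0.4, 0.8]], phase_len=2000)
```

With restarts at the true change points, the regret stays far below what a non-restarting strategy would suffer after the best arm switches.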

A data-centric view on robustness

Over the past few years, researchers have proposed many ways to measure the robustness of machine learning models. In the first part of the talk, we will survey the current robustness landscape based on a large-scale experimental study involving more than 200 different models and test conditions. Despite the large variety of test conditions, common trends emerge: (i) robustness to natural distribution shift and to synthetic perturbations are distinct phenomena, (ii) current algorithmic techniques have little effect on robustness to natural distribution shifts, and (iii) training on more diverse datasets offers robustness gains on several natural distribution shifts. In the second part of the talk, we leverage these insights to improve OpenAI’s CLIP model. CLIP achieved unprecedented robustness on several natural distribution shifts, but only when used as a zero-shot model. The zero-shot evaluation precludes the use of extra data for fine-tuning and hence leads to lower performance when there is a specific task of interest. To address this issue, we introduce a simple yet effective method for fine-tuning zero-shot models that yields large robustness gains on several distribution shifts without reducing in-distribution performance.
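One concrete method in this vein is weight-space ensembling: linearly interpolating the zero-shot and fine-tuned weights. The abstract does not name its method, so treat this as an illustrative assumption rather than the talk's exact technique. A minimal sketch, with plain dicts standing in for model state dicts:

```python
def interpolate_weights(zero_shot, fine_tuned, alpha):
    """Blend zero-shot and fine-tuned parameters elementwise.

    alpha=0 keeps the (robust) zero-shot model, alpha=1 keeps the
    fine-tuned one; intermediate alphas often retain much of the
    zero-shot robustness while recovering in-distribution accuracy.
    """
    return {k: (1 - alpha) * zero_shot[k] + alpha * fine_tuned[k]
            for k in zero_shot}

# Toy two-parameter "models" for illustration only.
zs = {"w": 1.0, "b": 0.0}
ft = {"w": 3.0, "b": 1.0}
mid = interpolate_weights(zs, ft, 0.5)   # {"w": 2.0, "b": 0.5}
```

In practice alpha is chosen on held-out in-distribution data; the interpolation requires no extra training.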

Endogenous distribution shifts in competitive environments

In this talk, I'll describe recent work exploring the dynamics between ML systems and the populations they serve, particularly when the deployed models affect which populations a system will have as future customers.

Assessing the external validity (a.k.a. distributional robustness) of causal findings

Causal inference—analyzing the ceteris paribus effect of an intervention—is key to reliable decision-making. Due to population shifts over time and the underrepresentation of marginalized groups, standard causal estimands that measure the average treatment effect often lose validity outside the study population. To guarantee that causal findings remain valid under population shifts, we propose the worst-case treatment effect (WTE) across all subpopulations of a given size. We develop an optimal estimator for the WTE based on flexible prediction methods, which allows us to analyze the external validity of the popular doubly robust estimator. On real examples where external validity is of core concern, our proposed framework successfully guards against brittle findings that are invalidated under unanticipated population shifts.

Estimating and improving the performance of machine learning under natural distribution shifts

Machine learning systems often fail catastrophically under distribution shift—when the test distribution differs in some systematic way from the training distribution. If we can mathematically characterize a distribution shift, we can devise appropriate robust training algorithms that promote robustness to that specific class of shifts. However, the resulting robust models show limited gains on shifts that do not admit the structure they were specifically trained against. Naturally occurring shifts are both hard to predict a priori and intractable to characterize mathematically, necessitating different approaches to addressing distribution shifts in the wild. In this talk, we first discuss how to estimate the performance of models under natural distribution shifts—the shift could cause a small degradation or a catastrophic drop. Obtaining ground-truth labels is expensive and requires a priori knowledge of when and what kind of distribution shifts are likely to occur. We present a phenomenon that we call agreement-on-the-line that allows us to effectively predict performance under distribution shift from unlabeled data alone. Next, we investigate a promising avenue for improving robustness to natural shifts: leveraging representations pre-trained on diverse data. Via theory and experiments, we find that de facto fine-tuning of pre-trained representations does not maximally preserve robustness. Using insights from our analysis, we provide two simple alternative fine-tuning approaches that substantially boost robustness to natural shifts.
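A rough sketch of how an agreement-based estimate can work in practice, under the assumption that agreement rates between model pairs (computable from unlabeled data alone) and accuracies fall on matching linear trends across the in-distribution (ID) and shifted (OOD) test sets. All numbers below are hypothetical:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept for a degree-1 fit."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# ID/OOD agreement rates for several model *pairs* -- both are
# computable without any OOD labels (hypothetical numbers).
id_agree = np.array([0.90, 0.85, 0.80, 0.75])
ood_agree = np.array([0.70, 0.62, 0.54, 0.46])
slope, intercept = fit_line(id_agree, ood_agree)

# If accuracies lie on the same line as agreements, a model's OOD
# accuracy can be estimated from its labeled ID accuracy alone:
est_ood_acc = slope * 0.88 + intercept
```

The key point is that the line itself is fit purely from unlabeled OOD data, so no ground-truth OOD labels are needed for the final estimate.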

Robustness from the Perspective of Coherent Measures of Risk

Measures of risk seek to quantify the overall "risk" in a cost- or loss-oriented random variable by a single value, such as its expectation or worst outcome, or better, something in between. Common-sense axioms were developed for this in the late 90s by mathematicians working in finance, and they used the term "coherent" for the risk measures that satisfied those axioms. Their work was soon broadened, and it was established by way of duality in convex analysis that coherent measures of risk correspond exactly to seeking robustness with respect to some collection of alternative probability distributions. The theory has since become highly developed, with powerful approaches to quantification like conditional value-at-risk and others based on it. It now also includes interesting connections with statistics and regression, captured by the "fundamental quadrangle of risk", as will be explained in this talk, with examples.
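Conditional value-at-risk is a convenient concrete example of such a measure: at level a it averages the worst (1 - a)-fraction of outcomes, and it can be computed via the Rockafellar-Uryasev minimization formula, CVaR_a(X) = min_c { c + E[(X - c)_+] / (1 - a) }. A minimal numerical sketch:

```python
import numpy as np

def cvar(losses, alpha):
    """Conditional value-at-risk via the Rockafellar-Uryasev formula:
    CVaR_a(X) = min_c  c + E[(X - c)_+] / (1 - a),
    where the minimum is attained at c = VaR_a(X) (the a-quantile).
    For an empirical sample this equals the average of the worst
    (1 - a)-fraction of outcomes."""
    losses = np.asarray(losses, dtype=float)
    c = np.quantile(losses, alpha)          # VaR: a minimizer of the formula
    return c + np.mean(np.maximum(losses - c, 0.0)) / (1.0 - alpha)

losses = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
print(cvar(losses, 0.0))   # 4.5 -- plain expectation
print(cvar(losses, 0.8))   # 8.5 -- average of the worst 20%: (8 + 9) / 2
```

At alpha = 0 the formula recovers the expectation; as alpha approaches 1 it approaches the worst outcome, interpolating between the two extremes mentioned above.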

Robust formulations and algorithms for learning problems under distributional ambiguity

We discuss learning problems in which the empirical distribution represented by the training data is used to define an ambiguity set of distributions, and we seek the classifier or regressor that solves a min-max problem involving this set. Our focus is mainly on linear models. First, we discuss a formulation of the robust min-max problem based on classification with the discontinuous "zero-one" loss function, using Wasserstein ambiguity, and describe properties of the resulting nonconvex problem, which is benignly nonconvex in a certain sense. Second, we discuss robust formulations of convex ERM problems involving linear models, where the distributional ambiguity is measured using either a Wasserstein metric or an f-divergence. We show that such problems can be formulated as "generalized linear programs" and solved using a first-order primal-dual algorithm that incorporates coordinate descent in the dual variable and variance reduction. We present numerical results to illustrate properties of the robust optimization formulations and algorithms.

Prediction Games: From Maximum Likelihood Estimation to Active Learning, Fair Machine Learning, and Structured Prediction

A standard approach to supervised machine learning is to choose the form of a predictor and then to optimize its parameters based on training data. Approximations of the predictor's performance measure are often required to make this optimization problem tractable. Instead of approximating the performance measure and using the exact training data, this talk explores a distributionally robust approach using game-theoretic approximations of the training data while optimizing the exact performance measures of interest. Though the resulting "prediction games" reduce to maximum likelihood estimation in simple cases, they provide new methods for more complicated prediction tasks involving covariate shift, fairness constraint satisfaction, and structured data.

**Shiori Sagawa**: Extending the WILDS Benchmark for Unsupervised Adaptation
**Ananya Kumar**: Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation
**Josh Gardner**: Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation
**Sushrut Karmalkar**: Robust Sparse Mean Estimation via Sum of Squares
**Mitchell Wortsman**: OpenCLIP Software Demo (Slides)
**Yassine Laguel**: SPQR Software Demo
**Krishna Pillutla**: SQwash Software Demo (Colab)

**Krishna Pillutla**: Tackling distribution shifts in federated learning
**Ahmet Alacaoglu**: On the Complexity of a Practical Primal-Dual Coordinate Method
**Lijun Ding**: Flat Minima Generalize for Low-rank Matrix Recovery
**Zaid Harchaoui**: Stochastic Optimization Under Time Drift
**Yassine Laguel**: Chance Constrained Problems: a Bilevel Convex Optimization Perspective
**Lang Liu**: Orthogonal Statistical Learning with Self-Concordant Loss

Questions? Contact us!