Hosted by the Institute for Foundations of Data Science
University of Washington
Jointly organized with the University of Wisconsin-Madison, the University of Chicago, and the University of California, Santa Cruz
Aditi Raghunathan, Carnegie Mellon University
Lillian Ratliff, University of Washington
Yao Xie, Georgia Institute of Technology
Samory Kpotufe, Columbia University
Ludwig Schmidt, University of Washington
Jamie Morgenstern, University of Washington
Rediet Abebe, University of California, Berkeley
Stephen J. Wright, University of Wisconsin
Hongseok Namkoong, Columbia University
Brian Ziebart, University of Illinois at Chicago
R. Tyrrell Rockafellar, University of Washington
Day 1
9:00-9:45 | Yao Xie
9:45-10:30 | Samory Kpotufe
10:30-11:00 | Break
11:00-11:45 | Ludwig Schmidt
11:45-2:00 | Lunch
2:00-2:45 | Jamie Morgenstern
2:45-3:30 | Break
4:00 | Evening Buffet at Ba Bar University Village

Day 2
9:00-9:45 | Hongseok Namkoong
9:45-10:30 | Aditi Raghunathan
10:30-11:00 | Break
11:00-11:45 | Tyrrell Rockafellar
11:45-2:00 | Lunch
2:00-2:45 | Steve Wright
2:45-3:30 | Break
3:30-5:00 | Short talks and software demos

Day 3
10:00-10:45 | Brian Ziebart
10:45-11:30 | Short talks
11:30-11:45 | Closing remarks
This workshop aims to bring together researchers with backgrounds in computer science, control theory, statistics, and mathematics who are interested in addressing distributional robustness in learning, optimization, and control.
Please book your travel early. For accommodation, a block of rooms has been reserved at the Silver Cloud Hotel near campus. Please make your booking before July 7th, 2022.
Hypothesis tests via distributionally robust optimization
We consider a general data-driven robust hypothesis testing formulation: find an optimal test (function) that minimizes the worst-case performance over distributions that are close to the empirical distributions with respect to some divergence, in particular the Wasserstein and Sinkhorn divergences. Robust tests are beneficial, for instance, when samples are limited or unbalanced, a scenario that often arises in applications such as health care, online change-point detection, and anomaly detection. We present a distributionally robust optimization framework for this problem and study the computational and statistical properties of the proposed test, giving a tractable convex reformulation of the original infinite-dimensional variational problem. Finally, I will present generalizations of the approach to related problems, including domain adaptation.

Tracking Most Significant Arm Switches in Bandits
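The Sinkhorn divergence that defines the ambiguity set in the robust hypothesis-testing abstract above is entropy-regularized optimal transport. A minimal numpy sketch of the underlying computation, with illustrative sample sizes and regularization strength (not the paper's implementation):

```python
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iter=300):
    """Entropy-regularized optimal-transport cost between two empirical
    samples (uniform weights, squared-distance ground cost), computed
    with plain Sinkhorn matrix scaling."""
    C = (x[:, None] - y[None, :]) ** 2      # ground cost matrix
    K = np.exp(-C / eps)                    # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))       # empirical marginals
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):                 # alternate marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]         # feasible transport plan
    return float((P * C).sum())

rng = np.random.default_rng(0)
null_cost = sinkhorn_cost(rng.normal(0, 1, 50), rng.normal(0, 1, 50))
shift_cost = sinkhorn_cost(rng.normal(0, 1, 50), rng.normal(2, 1, 50))
# samples from a shifted alternative are much farther in transport cost
```

A test built on such a divergence flags the shifted sample as far from the null, which is the raw ingredient the robust formulation optimizes over.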
In bandits with distribution shifts, one aims to adapt automatically to unknown changes in the reward distribution and to restart exploration when necessary. While this problem has received attention for many years, no adaptive procedure was known until a recent breakthrough of Auer et al. (2018, 2019), which guarantees an optimal (dynamic) regret of order (LT)^{1/2} for T rounds and L stationary phases.

A data-centric view on robustness
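To make the restart idea in the bandit abstract above concrete, here is a toy two-armed UCB that resets its statistics at known phase boundaries. This is only a sketch: the whole point of the Auer et al. procedures is to adapt without knowing the change points, restarting only when the data reveals a significant arm switch.

```python
import math
import random

def ucb_with_restarts(phase_means, rounds_per_phase, seed=0):
    """Two-armed UCB1 whose statistics are reset at each phase boundary.
    A toy with *known* change points, standing in for adaptive restart
    schedules that detect the shifts themselves."""
    rng = random.Random(seed)
    total_reward = 0.0
    for means in phase_means:               # each phase: fixed Bernoulli arms
        counts, sums = [0, 0], [0.0, 0.0]
        for t in range(1, rounds_per_phase + 1):
            if 0 in counts:
                arm = counts.index(0)       # initialize: play each arm once
            else:
                scores = [sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a])
                          for a in (0, 1)]
                arm = scores.index(max(scores))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            total_reward += reward
    return total_reward

# the best arm switches between phases; restarting lets UCB re-explore
reward = ucb_with_restarts([(0.9, 0.1), (0.1, 0.9)], 500)
```

Without the reset, the learner keeps exploiting the stale arm after the switch and collects close to the worst arm's reward in the second phase.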
Over the past few years, researchers have proposed many ways to measure the robustness of machine learning models. In the first part of the talk, we will survey the current robustness landscape based on a large-scale experimental study involving more than 200 different models and test conditions. Despite the large variety of test conditions, common trends emerge: (i) robustness to natural distribution shift and to synthetic perturbations are distinct phenomena, (ii) current algorithmic techniques have little effect on robustness to natural distribution shifts, and (iii) training on more diverse datasets offers robustness gains on several natural distribution shifts.

Endogenous distribution shifts in competitive environments
In this talk, I'll describe recent work exploring the dynamics between ML systems and the populations they serve, particularly how the models a system deploys affect which populations it will have as future customers.

Assessing the external validity (a.k.a. distributional robustness) of causal findings
Causal inference—analyzing the ceteris paribus effect of an intervention—is key to reliable decision-making. Due to population shifts over time and underrepresentation of marginalized groups, standard causal estimands that measure the average treatment effect often lose validity outside the study population. To guarantee that causal findings remain valid over population shifts, we propose the worst-case treatment effect (WTE) across all subpopulations of a given size. We develop an optimal estimator for the WTE based on flexible prediction methods, which allows analyzing the external validity of the popular doubly robust estimator. On real examples where external validity is of core concern, our proposed framework successfully guards against brittle findings that are invalidated under unanticipated population shifts.

Estimating and improving the performance of machine learning under natural distribution shifts
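A toy version of the worst-case treatment effect from the external-validity abstract above: if individual effects tau_i were known exactly, the WTE at subpopulation fraction alpha is the average over the worst alpha-fraction of units (a tail average). The abstract's contribution is estimating this from data with flexible prediction methods; the numbers below are made up for illustration.

```python
import numpy as np

def worst_case_effect(tau, alpha):
    """Worst-case average effect over any subpopulation holding an
    alpha fraction of the units: the mean of the alpha*n smallest
    individual effects (a CVaR-type tail average)."""
    k = max(1, int(round(alpha * len(tau))))
    return float(np.sort(tau)[:k].mean())

rng = np.random.default_rng(1)
# an intervention that helps on average but harms a 10% subgroup
tau = np.concatenate([rng.normal(1.0, 0.2, 90), rng.normal(-0.5, 0.2, 10)])
ate = float(tau.mean())                   # average treatment effect
wte = worst_case_effect(tau, alpha=0.1)   # worst-case over 10% subgroups
```

The ATE is comfortably positive here while the WTE is negative, which is exactly the kind of brittle finding the worst-case estimand is designed to expose.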
Machine learning systems often fail catastrophically in the presence of distribution shift—when the test distribution differs in some systematic way from the training distribution. If we could mathematically characterize a distribution shift, we could devise robust training algorithms that promote robustness to that specific class of shifts. However, the resulting robust models show limited gains on shifts that do not match the structure they were specifically trained against. Naturally occurring shifts are both hard to predict a priori and intractable to characterize mathematically, necessitating different approaches to addressing distribution shifts in the wild.

Robustness from the Perspective of Coherent Measures of Risk
Measures of risk seek to quantify the overall "risk" in a cost- or loss-oriented random variable by a single value, such as its expectation or worst outcome, or better, something in between. Common-sense axioms for this were developed in the late 1990s by mathematicians working in finance, who used the term "coherent" for the risk measures satisfying those axioms. Their work was soon broadened, and it was established by way of duality in convex analysis that coherent measures of risk correspond exactly to seeking robustness with respect to some collection of alternative probability distributions.

Robust formulations and algorithms for learning problems under distributional ambiguity
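The duality in Rockafellar's abstract above can be seen concretely with conditional value-at-risk (CVaR), the canonical coherent risk measure: its tail-average (primal) form and its worst-case-reweighting (dual) form give the same number. A toy numpy check with illustrative losses:

```python
import numpy as np

def cvar(losses, alpha):
    """Primal form: average of the worst alpha-fraction of losses."""
    k = int(round(alpha * len(losses)))
    return float(np.sort(losses)[-k:].mean())

def cvar_dual(losses, alpha):
    """Dual form: worst-case expectation over alternative distributions
    whose density w.r.t. the empirical one is bounded by 1/alpha."""
    n = len(losses)
    cap = 1.0 / (alpha * n)               # per-point probability cap
    order = np.argsort(losses)[::-1]      # load mass on the big losses first
    q = np.zeros(n)
    remaining = 1.0
    for i in order:
        q[i] = min(cap, remaining)
        remaining -= q[i]
    return float(q @ losses)

losses = np.array([0.1, 0.5, 1.2, 3.0, 0.7, 2.2, 0.05, 0.9, 1.5, 0.3])
primal = cvar(losses, 0.2)
dual = cvar_dual(losses, 0.2)
```

The worst-case distribution in the dual is exactly the one that piles its (capped) probability mass onto the largest losses, which is why the two forms coincide.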
We discuss learning problems in which the empirical distribution represented by the training data is used to define an ambiguity set of distributions, and we seek the classifier or regressor that solves a min-max problem over this set. Our focus is mainly on linear models. First, we discuss the formulation of the robust min-max problem based on classification with the discontinuous "zero-one" loss function, using Wasserstein ambiguity, and describe properties of the resulting nonconvex problem, which is benignly nonconvex in a certain sense. Second, we discuss robust formulations of convex ERM problems involving linear models, where distributional ambiguity is measured using either a Wasserstein metric or an f-divergence. We show that such problems can be formulated as "generalized linear programs" and solved using a first-order primal-dual algorithm that incorporates coordinate descent in the dual variables and variance reduction. We present numerical results illustrating properties of the robust optimization formulations and algorithms.

Prediction Games: From Maximum Likelihood Estimation to Active Learning, Fair Machine Learning, and Structured Prediction
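As a small sketch related to the f-divergence ambiguity in Wright's abstract above, the code below solves only the inner maximization: the worst-case mean loss over reweightings of the sample within a KL ball, for a fixed model. The talk's contribution is solving the full min-max with primal-dual methods; the losses here are made up, and KL is used as one concrete f-divergence.

```python
import numpy as np

def kl_worst_case(losses, rho):
    """Worst-case mean loss over reweightings q of the sample with
    KL(q || uniform) <= rho. The maximizer is an exponential tilt
    q_i proportional to exp(losses_i / lam); bisect on the temperature
    lam until the KL constraint is (nearly) tight."""
    n = len(losses)

    def tilt(lam):
        w = np.exp((losses - losses.max()) / lam)
        q = w / w.sum()
        kl = float((q * np.log(n * q + 1e-300)).sum())
        return q, kl

    lo, hi = 1e-3, 1e3                    # bracket for the temperature
    for _ in range(80):
        lam = (lo * hi) ** 0.5            # geometric bisection
        _, kl = tilt(lam)
        if kl > rho:
            lo = lam                      # tilt too aggressive: soften
        else:
            hi = lam
    q, _ = tilt(hi)                       # feasible side of the bracket
    return float(q @ losses)

losses = np.array([0.2, 0.4, 0.1, 1.5, 0.3, 0.9, 0.05, 0.6])
nominal = float(losses.mean())
robust = kl_worst_case(losses, rho=0.1)
```

The robust objective sits strictly between the empirical mean and the maximum loss: the adversary upweights the hard examples but is budgeted by the divergence constraint, which is what the outer minimization then has to hedge against.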
A standard approach to supervised machine learning is to choose the form of a predictor and then optimize its parameters on training data. Approximations of the predictor's performance measure are often required to make this optimization tractable. Instead of approximating the performance measure and using the exact training data, this talk explores a distributionally robust approach that uses game-theoretic approximations of the training data while optimizing the exact performance measures of interest. Though the resulting "prediction games" reduce to maximum likelihood estimation in simple cases, they provide new methods for more complicated prediction tasks involving covariate shift, fairness constraint satisfaction, and structured data.

Questions? Contact us!