Workshop on regret, optimization & games
Detailed program
Wednesday, November 5th
9:00 — Jelena Diakonikolas (University of Wisconsin–Madison) — On Efficient Solvers for Fixed-Point Equations: Classical Results and New Twists
Fixed-point operator equations—where one seeks solutions to \(T(x) = x\) for an operator \(T\) mapping a vector space to itself—form a fundamental class of problems with broad applications in optimization theory, game theory, economics, and, more recently, reinforcement learning. In this talk, I will begin by reviewing classical algorithmic results for solving such equations under oracle access to the operator \(T\). I will then highlight key gaps in the literature, particularly in settings where \(T\) may be expansive—though in a controlled sense. Finally, I will present recent results that address some of these challenges and conclude with open questions and potential directions for future work.
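For reference, a classical scheme of the kind the talk reviews is the Krasnosel'skii–Mann iteration, which, given oracle access to a nonexpansive \(T\), averages the current iterate with the operator's output:
\[ x_{k+1} = (1-\lambda_k)\,x_k + \lambda_k\,T(x_k), \qquad \lambda_k \in (0,1). \]
Under the stepsize condition \(\sum_k \lambda_k(1-\lambda_k) = \infty\), the iterates converge to a fixed point of \(T\) whenever one exists; the expansive regimes mentioned above fall outside this classical guarantee.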
9:45 — Nicolò Cesa-Bianchi (Università degli Studi di Milano & Politecnico di Milano) — Regret, Games and Boosting: a Reverie of Gambles and Bounds
We revisit the celebrated boosting theorem through the lenses of game theory and online learning. This perspective enables a principled extension of boosting theory to more general notions of losses, including multi-objective criteria. Our analysis characterizes the frontier between boostable guarantees and random guessing.
Joint work with: Marco Bressan, Nataly Brukhim, Emmanuel Esposito, Yishay Mansour, Shay Moran, Max Thiessen.
11:00 — Yishay Mansour (Tel Aviv University) — Swap Regret and Correlated Equilibria Beyond Normal-Form Games
Swap regret is a notion that has proven itself to be central to the study of general-sum normal-form games, with swap-regret minimization leading to convergence to the set of correlated equilibria and guaranteeing non-manipulability against a self-interested opponent. However, the situation for more general classes of games – such as Bayesian games and extensive-form games – is less clear-cut, with multiple candidate definitions for swap-regret but no known efficiently minimizable variant of swap regret that implies analogous non-manipulability guarantees.
In this paper, we present a new variant of swap regret for polytope games that we call "profile swap regret", with the property that obtaining sublinear profile swap regret is both necessary and sufficient for any learning algorithm to be non-manipulable by an opponent (resolving an open problem of Mansour et al., 2022). Although profile swap regret is NP-hard to compute given a transcript of play, we show it is nonetheless possible to design efficient learning algorithms that guarantee at most \(\sqrt{T}\) profile swap regret. Finally, we explore the correlated equilibrium notion induced by low-profile-swap-regret play, and demonstrate a gap between the set of outcomes that can be implemented by this learning process and the set of outcomes that can be implemented by a third-party mediator (in contrast to the situation in normal-form games).
Joint work with Eshwar Ram Arunachaleswaran, Natalie Collina, Mehryar Mohri, Jon Schneider, and Balasubramanian Sivan
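For context, in a normal-form game with finite action set \(A\), the swap regret of a player who took actions \(a_1, \dots, a_T\) against utility functions \(u_1, \dots, u_T\) is
\[ \mathrm{SwapReg}_T = \max_{\phi : A \to A} \sum_{t=1}^{T} \big( u_t(\phi(a_t)) - u_t(a_t) \big), \]
i.e., the gain from the best fixed action-to-action swap rule applied in hindsight; the "profile swap regret" above adapts this benchmark to polytope games.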
13:30 — Sergiu Hart (The Hebrew University of Jerusalem) — "Calibeating": Beating Forecasters at Their Own Game
In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise". This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating". We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated. Finally, we show that calibeating is stronger than the "best expert".
Joint work with Dean Foster
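As a concrete reference for the decomposition above: for binary outcomes \(y_t \in \{0,1\}\) and forecasts \(f_t\) taking finitely many values, grouping the rounds into bins by forecast value gives
\[ \underbrace{\frac{1}{T}\sum_{t=1}^{T}(f_t - y_t)^2}_{\text{Brier score}} = \underbrace{\frac{1}{T}\sum_{b} n_b (f_b - \bar{y}_b)^2}_{\text{calibration score}} + \underbrace{\frac{1}{T}\sum_{b} n_b\, \bar{y}_b (1 - \bar{y}_b)}_{\text{refinement score}}, \]
where \(n_b\) is the number of rounds in bin \(b\), \(f_b\) its common forecast, and \(\bar{y}_b\) the empirical frequency of the outcome in that bin.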
14:15 — Gabriele Farina (MIT) — What is the strongest notion of regret that can be kept meaningfully sublinear in convex games?
Φ-equilibria – and the associated notion of Φ-regret – are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations Φ begets stronger notions of rationality. We discuss several recent positive results in the direction of characterizing the largest notion of rationality that can be guaranteed efficiently in general convex games. Specifically, we show that linear and low-degree-polynomial correlated equilibria—relaxations of correlated equilibria and strengthenings of coarse correlated equilibria—can be computed and learned efficiently in general convex games, including extensive-form games. Our results are enabled by a generalization of the seminal framework of Gordon et al. for Φ-regret minimization, providing extensions to this framework that can be used even when the set of deviations Φ is intractable to separate/optimize over. We sidestep the intractability of Φ by means of a new algorithmic routine that we coin semiseparation. We discuss further applications of this algorithmic primitive to equilibrium computation towards the end of the talk.
Based on joint work that has appeared at STOC’25 and ACM EC’25.
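For context: given a convex strategy set \(\mathcal{X}\) and a set \(\Phi\) of deviation maps \(\phi : \mathcal{X} \to \mathcal{X}\), the Φ-regret of iterates \(x_1, \dots, x_T\) against utilities \(u_1, \dots, u_T\) is
\[ \max_{\phi \in \Phi} \sum_{t=1}^{T} \big( u_t(\phi(x_t)) - u_t(x_t) \big). \]
External regret corresponds to the set of constant deviations; the linear and low-degree-polynomial deviations discussed above give progressively stronger notions.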
15:30 — Evgenii Chzhen (CNRS) — About an extension of Blackwell’s approachability framework with applications to fairness
We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts. The setting is a repeated game between the Player and Nature, where at each stage both pick actions based on the contexts. Inspired by the notion of unawareness, we assume that the Player can only access the non-sensitive context before making a decision, while we discuss both the case where Nature accesses the sensitive contexts and the case where Nature is unaware of them. Adapting Blackwell’s approachability theory to handle the case of an unknown context distribution, we provide a general necessary and sufficient condition for learning objectives to be compatible with some fairness constraints. This condition is instantiated on (group-wise) no-regret and (group-wise) calibration objectives, and on demographic parity as an additional constraint. When the objective is not compatible with the constraint, the framework makes it possible to characterise the optimal trade-off between the two.
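As background, Blackwell's classical theorem (which the talk adapts to unknown context distributions) states: in a repeated game with vector-valued payoff \(m(p,q) \in \mathbb{R}^d\) for mixed actions \(p\) of the Player and \(q\) of Nature, a closed convex set \(C \subseteq \mathbb{R}^d\) is approachable, i.e., the Player can force the average payoff to converge to \(C\), if and only if
\[ \text{for every } q \text{ there exists } p \text{ such that } m(p,q) \in C. \]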
16:15 — Kohei Hatano (Kyushu University) — Online Optimization over RIS networks via Mixed Integer Programming
We consider an online optimization problem motivated by reconfigurable intelligent surfaces (RIS), one of the core technologies of 6G wireless networks. We formulate the problem as online optimization of paths over a DAG with non-linear reward functions. We show that the corresponding offline problem is NP-hard, but can be reformulated as an MIP. This result yields an online algorithm with low regret.
Thursday, November 6th
9:00 — Flore Sentenac (HEC) — Balancing optimism and pessimism in offline-to-online learning
Based on a paper with Ilbin Lee and Csaba Szepesvári.
We consider what we call the offline-to-online learning setting, focusing on stochastic finite-armed bandit problems. In offline-to-online learning problems, a learner starts with offline data collected from interaction with an unknown environment in a way that is not under the control of the learner. Given this data, the learner then starts interacting with the environment, gradually improving its initial strategy as it collects more data so as to maximize the total reward it receives from the environment. The basic dilemma of the learner in this setting is as follows: if the problem were purely offline (the learner is given data that it can use to design a policy, which then remains fixed), the best strategy (in a number of senses) would be to use the Lower Confidence Bound (LCB) algorithm, which is based on pessimism. Among other things, LCB can simultaneously compete with any policy that is sufficiently "covered" by the offline data. However, if the problem were purely online, the best strategy (in a number of senses) would be to use the Upper Confidence Bound (UCB) algorithm based on optimism. Indeed, as time goes by, the UCB algorithm will match the performance of the optimal policy at a speed that is nearly the best possible among all online algorithms. For offline-to-online learning, however, UCB will explore too much initially, and as such, its performance will be inferior to that of LCB, at least for some time. This suggests that the learner should use LCB initially and then gradually change its strategy to resemble UCB. Just how this should be done (and why) is the subject of this talk. In particular, our main result shows that our new algorithm performs nearly as well as the better of LCB and UCB at any point in time.
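As a minimal illustration of the two index rules in play (a generic sketch with textbook constants, not the talk's new algorithm):

    import numpy as np

    def index(means, counts, t, kind="ucb"):
        # Hoeffding-style confidence width; the constant 2 and the
        # clipping of counts at 1 are standard placeholder choices.
        width = np.sqrt(2.0 * np.log(max(t, 2)) / np.maximum(counts, 1))
        if kind == "ucb":
            return means + width   # optimism: favors under-explored arms
        return means - width       # pessimism (LCB): trusts only well-covered arms

    # Offline-to-online intuition: play argmax of the LCB index early on,
    # then gradually shift toward the UCB index as online data accumulates.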
9:45 — Elad Hazan (Princeton University) — Learning in Dynamical Systems
Learning in dynamical systems is a fundamental challenge underlying modern sequence modeling. Despite extensive study, efficient algorithms with formal guarantees for general nonlinear systems have remained elusive. This talk presents a provably efficient framework for learning in any bounded and Lipschitz nonlinear dynamical system, establishing the first sublinear regret guarantees in a dimension-free setting. Our approach combines Koopman lifting, Luenberger observers, and, crucially, spectral filtering to show that nonlinear dynamics are learnable. These insights motivate a new neural architecture, the Spectral Transform Unit (STU), which achieves state-of-the-art performance on language modeling and dynamical system benchmarks.
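For intuition, Koopman lifting replaces a nonlinear system \(x_{t+1} = f(x_t)\) by linear dynamics on a space of observables \(\psi\):
\[ \psi(x_{t+1}) = K\,\psi(x_t), \]
exactly in an infinite-dimensional space of observables and approximately for a finite dictionary, after which linear tools such as observers and spectral filtering can be applied.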
11:00 — Jacob Abernethy (Georgia Institute of Technology) — TBA
13:30 — Francesco Orabona (KAUST) — New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
The Polyak stepsize has been proven to be a fundamental stepsize in convex optimization, giving near-optimal gradient descent rates across a wide range of assumptions. The universality of the Polyak stepsize has also inspired many stochastic variants, with theoretical guarantees and strong empirical performance. Despite the many theoretical results, our understanding of the convergence properties and shortcomings of the Polyak stepsize or its variants is both incomplete and fractured across different analyses. We propose a new, unified, and simple perspective on the Polyak stepsize and its variants as gradient descent on a surrogate loss. We show that each variant is equivalent to minimizing a surrogate function with stepsizes that adapt to a guaranteed local curvature. Our general surrogate loss perspective is then used to provide a unified analysis of existing variants across different assumptions. Moreover, we show a number of negative results proving that the non-convergence suggested by some of the upper bounds is indeed real. This is joint work with Ryan D'Orazio.
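For reference, the classical Polyak stepsize for minimizing a convex \(f\) with known optimal value \(f^\star\) is
\[ x_{t+1} = x_t - \eta_t \nabla f(x_t), \qquad \eta_t = \frac{f(x_t) - f^\star}{\|\nabla f(x_t)\|^2}; \]
the stochastic variants analyzed in the talk replace \(f\) and \(f^\star\) by per-sample or estimated quantities.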
14:15 — Kfir Levy (Technion) — Do Stochastic, Feel Noiseless: Stable Optimization via a Double Momentum Mechanism
The tremendous success of the Machine Learning paradigm heavily relies on the development of powerful optimization methods, and the canonical algorithm for training learning models is SGD (Stochastic Gradient Descent). Nevertheless, the latter is quite different from Gradient Descent (GD), its noiseless counterpart. Concretely, SGD requires a careful choice of the learning rate, which relies on the properties of the noise as well as the quality of initialization. It further requires the use of a test set to estimate the generalization error throughout its run. In this talk, we will present a new SGD variant that obtains the same optimal rates as SGD, while using noiseless machinery as in GD. Concretely, it enables using the same fixed learning rate as GD and does not require a test/validation set. Curiously, our results rely on a novel gradient estimate that combines two recent mechanisms related to the notion of momentum. Finally, I will discuss several applications of our new variant.
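As background only (the talk's exact estimator combines two momentum mechanisms and may differ), one recent momentum-based gradient estimate of this flavor is the recursive STORM-type estimator
\[ d_t = \nabla f(x_t; \xi_t) + (1-\alpha_t)\big( d_{t-1} - \nabla f(x_{t-1}; \xi_t) \big), \]
which interpolates between plain SGD (\(\alpha_t = 1\)) and a variance-reduced momentum average (\(\alpha_t < 1\)).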
15:30 — Ashok Cutkosky (Boston University) — How to Use Regret Bounds?
Modern online optimization algorithms feature significantly more refined theoretical guarantees than were available a decade ago. By contrast, the ways in which we attempt to apply these algorithms have advanced much more slowly, and with much less success. In this talk, I will discuss challenges that arise in practical application of theoretically-motivated optimization methods to modern neural networks, and some steps towards principled ways to convert regret bounds into convergence guarantees. From a technical perspective, we will see how to apply online (convex) optimization algorithms to non-convex optimization problems, and in the process recover and sometimes improve many prior non-convex optimization guarantees with an arguably simpler proof. I will also discuss some empirical results that describe places where common analytical models both succeed and fail to capture optimization behavior, and how these results motivate new open questions in optimization theory.
16:15 — Aaron Defazio (Meta) — Making your Theory-to-Practice Work: Online-to-Batch via Schedules & Schedule-Free Learning
Online-to-batch conversions are the key to applying online learning methods to modern stochastic optimization problems in machine learning. I will show that early conversions such as Polyak averaging or last-iterate heuristics have hurt the effectiveness of online learning, skewing the results of hundreds of papers and holding the entire field back. Recent online-to-batch methods such as the linear decay schedule and schedule-free learning are vastly more effective, enabling theory-driven research with practical guarantees that match the state of the art in applied deep learning.
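For reference, the classical online-to-batch conversion runs an online algorithm with regret \(\mathrm{Reg}_T\) on i.i.d. losses and returns the uniform average of its iterates, guaranteeing for convex \(f\)
\[ \mathbb{E}\big[f(\bar{x}_T)\big] - \min_x f(x) \le \frac{\mathbb{E}[\mathrm{Reg}_T]}{T}, \qquad \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t; \]
the schedules discussed in the talk can be viewed as replacing this uniform average by other weightings of the iterates.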
Friday, November 7th
9:00 — Tomer Koren (Tel Aviv University) — Beyond Online-to-Batch: Exploring the Generalization Ability of Convex SGD
A classic and fundamental result in convex optimization, since Nemirovski and Yudin (1983), is that one-pass stochastic gradient descent (SGD) converges optimally in terms of population excess risk for any convex Lipschitz objective, independently of the problem dimension. Today, this is often viewed as a direct consequence of regret minimization through an online-to-batch conversion. In this talk, I will discuss several basic questions concerning this seminal result:
- What form of capacity control enables this optimal out-of-sample performance of SGD?
- At what rate does SGD minimize the empirical risk?
- What happens beyond the first pass? How quickly does the population performance deteriorate (if at all)?
I will present several results that shed light on these questions and reveal intriguing phenomena in the generalization behavior of SGD within the classical convex setting.
Based on joint work with Matan Schliserman, Uri Sherman, Shira Vansover-Hager, and Roi Livni.
9:45 — Tim van Erven (University of Amsterdam) — An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction
I will present an efficient algorithm for linear contextual bandits with adversarial losses and stochastic action sets. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without knowledge of the context distribution or access to a context simulator, the algorithm achieves \(\tilde{O}(\min\{d^2\sqrt{T}, \sqrt{d^3T\log K}\})\) regret and runs in \(\text{poly}(d,C,T)\) time, where \(d\) is the feature dimension, \(C\) is an upper bound on the number of linear constraints defining the action set in each round, \(K\) is an upper bound on the number of actions in each round, and \(T\) is the number of rounds. This resolves an open question of Liu et al. (2023) on whether one can obtain \(\text{poly}(d)\sqrt{T}\) regret in polynomial time, independent of the number of actions. For the important class of combinatorial bandits with adversarial losses and stochastic action sets, where the action sets can be described by a polynomial number of linear constraints, our algorithm is the first to achieve \(\text{poly}(d)\sqrt{T}\) regret in polynomial time, while to our knowledge no prior algorithm achieves even \(o(T)\) regret in polynomial time. When a simulator is available, the regret bound can be improved to \(\tilde{O}(d\sqrt{L^\star})\), where \(L^\star\) is the cumulative loss of the best policy.
This is joint work with Jack Mayo, Julia Olkhovskaya and Chen-Yu Wei.
11:00 — Emilie Kaufmann (CNRS) — Multi-objective bandits revisited
In multi-objective bandit models, an agent sequentially samples arms and collects multivariate observations. The arms’ expected payoffs are thus vectors in \(\mathbb{R}^d\), and there is no obvious notion of “best arm”, as some arms may be better for some objectives but worse for others. In this talk, I will present algorithms for the adaptive identification of the Pareto set of the arm means, in a fixed-confidence setting. These algorithms adaptively sample the arms and also adaptively stop the data collection process so as to guarantee an error probability of at most δ for their guess of the Pareto set. I will present a first algorithm based on confidence regions that achieves a near-optimal sample complexity bound featuring appropriate notions of “sub-optimality gap”. Then we will discuss asymptotically optimal algorithms, i.e., algorithms whose sample complexity matches a lower bound in the regime where the error probability is small.
This talk is based on joint works with Cyrille Koné, Laura Richert and Marc Jourdan — https://arxiv.org/abs/2307.00424 — https://arxiv.org/abs/2411.04939
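For context, with mean vectors \(\mu_1, \dots, \mu_K \in \mathbb{R}^d\), the Pareto set collects the non-dominated arms,
\[ \mathcal{P} = \{ i : \text{there is no } j \text{ with } \mu_j \ge \mu_i \text{ componentwise and } \mu_j \ne \mu_i \}, \]
and the fixed-confidence goal is to output \(\mathcal{P}\) with probability at least \(1 - \delta\) while using as few samples as possible.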
13:30 — Michael I. Jordan (University of California, Berkeley & INRIA) — Sequential Hypothesis Tests and Universal Log-Optimality for General Classes of E-processes
We consider the problem of sequential hypothesis testing by betting. For a general class of composite testing problems—which include bounded mean testing, equal mean testing for bounded random tuples, and some key ingredients of two-sample and independence testing as special cases—we show that any e-process satisfying a certain sublinear regret bound is adaptively, asymptotically, and almost surely log-optimal for a composite alternative. This is a strong notion of optimality that has not previously been established for the aforementioned problems and we provide explicit test supermartingales and e-processes satisfying this notion in the more general case. Furthermore, we derive matching lower and upper bounds on the expected rejection time for the resulting sequential tests in all of these cases. The proofs of these results make weak, algorithm-agnostic moment assumptions and rely on a general-purpose proof technique involving the aforementioned regret and a family of numeraire portfolios.
Joint work with Ian Waudby-Smith and Ricardo Sandoval
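As background for the terminology: an e-process for a composite null class \(\mathcal{P}\) is a nonnegative process \((E_t)_{t \ge 0}\) such that
\[ \mathbb{E}_P[E_\tau] \le 1 \quad \text{for every } P \in \mathcal{P} \text{ and every stopping time } \tau, \]
so that rejecting the null when \(E_t \ge 1/\alpha\) controls the type-I error at level \(\alpha\) by Markov's inequality.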
14:15 — Sarah Sachs (University of Bristol) — Tracking Solutions of Time-Varying Variational Inequalities
We can compute the solution of a fixed variational inequality under suitable assumptions. But what happens if the variational inequality varies over time? Can we track the solutions in an online setting?
In this talk, I will present our recent results on tracking guarantees for solutions of time-varying variational inequalities [1]. Since variational inequality problems generalize equilibrium computation in concave games, minimization of convex functions, and parameter estimation in generalized linear models, our results have applications in game theory, dynamic regret minimization, and parameter estimation in statistics. Prior work has primarily focused on time-varying games or time-varying optimization problems. For strongly convex optimization problems or strongly monotone games, such results provide tracking guarantees under the assumption that the variation of the time-varying problem is restricted, that is, for problems with a sublinear solution path. We extend existing results in two ways: in our first result, we provide tracking bounds for (1) variational inequalities with a sublinear solution path but not necessarily monotone functions, and (2) periodic time-varying variational inequalities that do not necessarily have a sublinear solution path length. Our second main contribution is an extensive study of the convergence behavior and trajectories of the discrete dynamical systems induced by periodic time-varying variational inequalities. We show that these systems can either exhibit provably chaotic behavior or converge to the solution. I will conclude the talk by highlighting open research questions and discussing potential future directions in this area.
[1] Tracking solutions of time-varying variational inequalities. Joint work with Hédi Hadiji and Cristóbal Guzmán.
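For reference, the (Stampacchia) variational inequality problem for an operator \(F\) over a set \(K\) asks for \(x^\star \in K\) such that
\[ \langle F(x^\star), x - x^\star \rangle \ge 0 \quad \text{for all } x \in K; \]
in the time-varying setting above, \(F\) (and possibly \(K\)) changes with time and the goal is to track the moving solution \(x_t^\star\).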
15:30 — Francis Bach (INRIA) — TBA
16:15 — Roi Livni (Tel Aviv University) — Online Proper Learning with Partial Feedback and Non-Random Counterexamples
Large-scale learners can behave unpredictably, producing overparameterized models that achieve high accuracy yet exhibit inconsistent behavior. To better understand reliable learning, we revisit the proper online learning model with equivalence queries, introduced by Angluin (1988), where the learner must always output a valid hypothesis from the concept class and learns through counterexamples.
We establish improved bounds and extend the framework to multi-class bandit feedback. Specifically, we generalize to a broader family of counterexample distributions, derive improved rates for infinite Littlestone classes, and also consider partial feedback where the hypotheses are multiclass but the feedback is an unlabeled mistake. Our results build on a new proof of Angluin and Dohrn’s analysis, showing that random counterexamples yield faster convergence for finite classes, and extend this insight to richer and more realistic feedback models.
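A minimal sketch of the equivalence-query protocol described above, for the binary case (the helper names are hypothetical; the oracle returns a counterexample, or None when the hypothesis is correct):

    def learn_with_equivalence_queries(concept_class, oracle, max_rounds=10_000):
        # Proper learning: every proposed hypothesis comes from the class itself.
        version_space = list(concept_class)
        for _ in range(max_rounds):
            h = version_space[0]      # propose any hypothesis still consistent
            x = oracle(h)             # counterexample on which h errs, or None
            if x is None:
                return h
            # Binary labels: the target disagrees with h on x, so keep only
            # hypotheses that also disagree with h there.
            version_space = [g for g in version_space if g(x) != h(x)]
        raise RuntimeError("no consistent hypothesis found within the budget")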