Publications
* Author names are ordered alphabetically
- Seq. Decision
Should Demand Models Incorporate Competitor Prices? Oblivious Learning and Algorithmic Collusion*
Yuhang Wu and Assaf Zeevi
Preprint, 2026
Extended abstract “Oblivious Learning, Price Exploration and Collusive Dynamics” accepted at ACM Conference on Economics and Computation (EC), 2026
On a platform with many sellers, should a pricing algorithm explicitly model competitors' prices when learning demand? Classical learning arguments suggest an affirmative answer: ignoring competitors induces model misspecification and inefficiency. In contrast, recent work on algorithmic collusion suggests that strategic obliviousness -- deliberately ignoring competitor prices -- may facilitate collusive outcomes and improve profits. We study this modeling choice in a stylized competitive market with unknown noisy demand, in which multiple sellers repeatedly set prices and estimate demand via iterated least squares, and either incorporate competitors' prices into their demand models (informed) or ignore them (oblivious). We first show that, relative to a monopolist, an oblivious seller in a competitive market must explore more aggressively to compensate for the loss of dynamic competitor information. Building on this insight, we characterize market dynamics when all sellers are oblivious and show that prices converge to the competitive outcome under sufficient exploration, while a continuum of pseudo-equilibria arises when exploration decays. Analyzing the resulting price trajectories, we uncover an excursion phenomenon that gives rise to transient collusive patterns that dissipate as learning progresses. In markets with both oblivious and informed sellers, the informed strictly out-earn the oblivious. Read as a strategy game, the modeling choice has a unique Nash equilibrium: the all-informed market, in which prices converge to the competitive outcome efficiently.
Overall, our results indicate that collusive patterns are not robust and are not sustained by oblivious modeling; therefore, incorporating competitor information, together with sufficient price exploration, remains a reliable strategy for sellers in competitive markets.
@misc{wu2026demandmodelsincorporatecompetitor, title={Should Demand Models Incorporate Competitor Prices? Oblivious Learning and Algorithmic Collusion}, author={Yuhang Wu and Assaf Zeevi}, year={2026}, eprint={2606.05363}, archivePrefix={arXiv}, primaryClass={cs.GT}, url={https://arxiv.org/abs/2606.05363}, }

- GenAI x OR
Adaptive Querying with AI Persona Priors*
Kaizheng Wang, Yuhang Wu, and Assaf Zeevi
International Conference on Machine Learning (ICML), 2026
ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation
We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within tight query budgets. Classical Bayesian design and computerized adaptive testing typically rely on restrictive parametric assumptions or expensive posterior approximations, limiting their use in heterogeneous, high-dimensional, and cold-start settings. We introduce a persona-induced latent variable model that represents a user's state through membership in a finite dictionary of AI personas, each offering response distributions produced by a large language model. This yields expressive priors with closed-form posterior updates and efficient finite-mixture predictions, enabling scalable Bayesian design for sequential item selection. Experiments on synthetic data and WorldValuesBench demonstrate that persona-based posteriors deliver accurate probabilistic predictions and an interpretable adaptive elicitation pipeline.
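As a rough illustration of the closed-form posterior update and finite-mixture prediction mentioned in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes a discrete response set and a precomputed table of per-persona response probabilities; the function names, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np


def update_persona_posterior(prior, likelihoods, item, response):
    """One Bayes step over a finite persona dictionary (illustrative assumption).

    prior:        shape (K,), current weights over K personas
    likelihoods:  shape (K, n_items, n_responses), where
                  likelihoods[k, j, r] = P(response r | persona k, item j)
    """
    unnormalized = prior * likelihoods[:, item, response]
    return unnormalized / unnormalized.sum()


def predict_response(posterior, likelihoods, item):
    """Finite-mixture predictive distribution over responses to `item`."""
    return posterior @ likelihoods[:, item, :]


# Hypothetical toy setup: 2 personas, 2 items, 2 possible responses.
likelihoods = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # persona 0
    [[0.2, 0.8], [0.3, 0.7]],   # persona 1
])
prior = np.array([0.5, 0.5])

# Observe response 0 on item 0, then predict the held-out item 1.
posterior = update_persona_posterior(prior, likelihoods, item=0, response=0)
prediction = predict_response(posterior, likelihoods, item=1)
print(posterior, prediction)
```

Because the latent state is a membership in a finite set, the update is a reweighting followed by a normalization, with no posterior approximation required in this toy setting.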
@misc{wang2026adaptivequeryingaipersona, title={Adaptive Querying with AI Persona Priors}, author={Kaizheng Wang and Yuhang Wu and Assaf Zeevi}, year={2026}, eprint={2605.00696}, archivePrefix={arXiv}, primaryClass={stat.ML}, url={https://arxiv.org/abs/2605.00696}, }

- Digital Twin
SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation*
Preprint, 2026
Short version accepted at ICML 2026 Workshop on Connecting Low-rank Representations in AI
AI-based persona simulation -- often referred to as digital twin simulation -- is increasingly used for market research, recommender systems, and social sciences. Despite their flexibility, large language models (LLMs) often exhibit systematic bias and miscalibration relative to real human behavior, limiting their reliability. Inspired by synthetic control methods from causal inference, we propose SYN-DIGITS (SYNthetic Control Framework for Calibrated DIGItal Twin Simulation), a principled and lightweight calibration framework that learns latent structure from digital-twin responses and transfers it to align predictions with human ground truth. SYN-DIGITS operates as a post-processing layer on top of any LLM-based simulator and thus is model-agnostic. We develop a latent factor model that formalizes when and why calibration succeeds through latent space alignment conditions, and we systematically evaluate ten calibration methods across thirteen persona constructions, three LLMs, and two datasets. SYN-DIGITS supports both individual-level and distributional simulation for previously unseen questions and unobserved populations, with provable error guarantees. Experiments show that SYN-DIGITS achieves up to 50% relative improvements in individual-level correlation and 50--90% relative reductions in distributional discrepancy compared to uncalibrated baselines.
@misc{fan2026syndigits, title={SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation}, author={Grace Jiarui Fan and Chengpiao Huang and Tianyi Peng and Kaizheng Wang and Yuhang Wu}, year={2026}, eprint={2604.07513}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2604.07513}, }

- GenAI x OR
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce*
Preprint, 2025
With the rise of large language models (LLMs), generative engines are becoming powerful alternatives to traditional search, reshaping retrieval tasks. In e-commerce, for instance, conversational shopping agents now guide consumers to relevant products. This shift has created the need for generative engine optimization (GEO)--improving content visibility and relevance for generative engines. Yet despite its growing importance, current GEO practices are ad hoc, and their impacts remain poorly understood, especially in e-commerce. We address this gap by introducing E-GEO, the first benchmark built specifically for e-commerce GEO. E-GEO contains over 7,000 realistic, multi-sentence consumer product queries paired with relevant listings, capturing rich intent, constraints, preferences, and shopping contexts that existing datasets largely miss. Using this benchmark, we conduct the first large-scale empirical study of e-commerce GEO, evaluating 15 common rewriting heuristics and comparing their empirical performance. To move beyond heuristics, we further formulate GEO as a tractable optimization problem and develop a lightweight iterative prompt-optimization algorithm that can significantly outperform these baselines. Surprisingly, the optimized prompts reveal a stable, domain-agnostic pattern--suggesting the existence of a "universally effective" GEO strategy.
@misc{bagga2025egeo, title={E-GEO: A Testbed for Generative Engine Optimization in E-Commerce}, author={Puneet S. Bagga and Vivek F. Farias and Tamar Korkotashvili and Tianyi Peng and Yuhang Wu}, year={2025}, eprint={2511.20867}, archivePrefix={arXiv}, primaryClass={cs.IR}, url={https://arxiv.org/abs/2511.20867}, }

- GenAI x OR
Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice*
Winter Simulation Conference (WSC), 2025
Large language models (LLMs) have exhibited expert-level capabilities across various domains. However, their abilities to solve problems in Operations Research (OR) -- the analysis and optimization of mathematical models derived from real-world problems or their verbal descriptions -- remain underexplored. In this work, we take a first step toward evaluating LLMs' abilities to solve stochastic modeling problems, a core class of OR problems characterized by uncertainty and typically involving tools from probability, statistics, and stochastic processes. We manually procure a representative set of graduate-level homework and doctoral qualification-exam problems and test LLMs' abilities to solve them. We further leverage SimOpt, an open-source library of simulation-optimization problems and solvers, to investigate LLMs' abilities to make real-world decisions under uncertainty. Our results show that, though a nontrivial amount of work is still needed to reliably automate the stochastic modeling pipeline in reality, state-of-the-art LLMs demonstrate proficiency on par with human experts in both classroom and practical settings. These findings highlight the potential of building AI agents that assist OR researchers and amplify the real-world impact of OR through automation.
@misc{kumar2025performance, title={Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice}, author={Akshit Kumar and Tianyi Peng and Yuhang Wu and Assaf Zeevi}, year={2025}, eprint={2506.23924}, archivePrefix={arXiv}, primaryClass={cs.AI}, url={https://arxiv.org/abs/2506.23924}, }

- Digital Twin
How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective
Chengpiao Huang*, Yuhang Wu*, and Kaizheng Wang
Preprint, 2025
Short version “Uncertainty Quantification for LLM-Based Survey Simulations” appeared at International Conference on Machine Learning (ICML), 2025
Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield excessively loose estimates. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous fidelity gaps across different LLMs and domains.
@misc{huang2025human, title={How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective}, author={Chengpiao Huang and Yuhang Wu and Kaizheng Wang}, year={2025}, eprint={2502.17773}, archivePrefix={arXiv}, primaryClass={stat.ME}, url={https://arxiv.org/abs/2502.17773}, }

- Optimization
Discrete-time Simulated Annealing: A Convergence Analysis via the Eyring-Kramers Law*
Wenpin Tang, Yuhang Wu, and Xun Yu Zhou
Numerical Algebra, Control and Optimization, 2024
We study the convergence rate of the discrete-time simulated annealing process $(x_k;\, k = 0, 1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(x_k) > \min f + \delta)$ decays polynomially in the cumulative step size, and provide an explicit rate through a non-asymptotic bound in terms of the model parameters. Our argument applies recent developments in functional inequalities for the Gibbs measure at low temperatures---the Eyring-Kramers law. The result leads to a condition on the step size that ensures convergence. Finally, we perform numerical experiments to corroborate our theoretical result.
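To make the object of study above concrete, here is a minimal sketch of a discrete-time simulated annealing iteration. It assumes a Langevin-type update with Gaussian noise, $x_{k+1} = x_k - \eta_k \nabla f(x_k) + \sqrt{2 \eta_k \tau_k}\, \xi_k$; the abstract does not state the exact scheme, so this form, the function names, the logarithmic cooling schedule, and the toy objective are all assumptions made for illustration rather than the paper's setup.

```python
import numpy as np


def discrete_simulated_annealing(grad_f, x0, n_steps, step_size, temperature, rng):
    """Run an assumed Langevin-type annealing recursion.

    grad_f:      gradient of the objective f
    step_size:   callable k -> eta_k (step size at iteration k)
    temperature: callable k -> tau_k (temperature at iteration k)
    """
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        eta, tau = step_size(k), temperature(k)
        noise = rng.standard_normal(x.shape)
        # Gradient step plus temperature-scaled Gaussian perturbation.
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta * tau) * noise
    return x


# Hypothetical double-well objective f(x) = (x^2 - 1)^2 + 0.3 x,
# whose global minimum lies near x = -1.
def grad_double_well(x):
    return 4.0 * x * (x**2 - 1.0) + 0.3


rng = np.random.default_rng(0)
x_final = discrete_simulated_annealing(
    grad_double_well,
    x0=[1.0],                                     # start in the worse well
    n_steps=5000,
    step_size=lambda k: 0.01,                     # constant step (assumption)
    temperature=lambda k: 1.0 / np.log(k + 2.0),  # slow cooling (assumption)
    rng=rng,
)
print(x_final)
```

Setting the temperature to zero removes the noise term, so the recursion reduces to plain gradient descent. In this sketch the cumulative step size is the sum of the $\eta_k$ over the iterations run.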
@article{tang2024discrete, title={Discrete-time Simulated Annealing: A Convergence Analysis via the Eyring-Kramers Law}, author={Tang, Wenpin and Wu, Yuhang and Zhou, Xun Yu}, journal={Numerical Algebra, Control and Optimization}, year={2024}, volume={14}, number={4}, pages={778--794}, publisher={American Institute of Mathematics} }

- Optimization
Adaptive Data Fusion for Multi-task Non-smooth Optimization*
Preprint, 2022
We study the problem of multi-task non-smooth optimization that arises ubiquitously in statistical learning, decision-making and risk management. We develop a data fusion approach that adaptively leverages commonalities among a large number of objectives to improve sample efficiency while tackling their unknown heterogeneities. We provide sharp statistical guarantees for our approach. Numerical experiments on both synthetic and real data demonstrate significant advantages of our approach over benchmarks.
@misc{lam2022adaptive, title={Adaptive Data Fusion for Multi-task Non-smooth Optimization}, author={Henry Lam and Kaizheng Wang and Yuhang Wu and Yichen Zhang}, year={2022}, eprint={2210.12334}, archivePrefix={arXiv}, primaryClass={stat.ML}, url={https://arxiv.org/abs/2210.12334}, }