I am a third-year PhD student in the Decision, Risk, and Operations (DRO) division at Columbia Business School, working with Prof. Assaf Zeevi and Prof. Kaizheng Wang (Columbia IEOR). I am broadly interested in AI and Operations Research (OR), with a focus on digital twin simulation and sequential decision making under uncertainty. Before joining the PhD program at DRO, I received my B.A. in Mathematics and Statistics, also from Columbia University.
Contact: yuhang.wu@columbia.edu
News
Jun 16, 2026
I am honored to be selected as a Deming Doctoral Fellow for the 2026-2027 academic year.
May 19, 2026
Paper “Oblivious Learning, Price Exploration and Collusive Dynamics” accepted at the ACM Conference on Economics and Computation (EC), 2026.
Apr 30, 2026
Paper “Adaptive Querying with AI Persona Priors” accepted at the International Conference on Machine Learning (ICML), 2026.
Apr 10, 2026
New paper “SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation” posted on arXiv and SSRN.
Nov 26, 2025
New paper “E-GEO: A Testbed for Generative Engine Optimization in E-Commerce” posted on arXiv.
On a platform with many sellers, should a pricing algorithm explicitly model competitors' prices when learning demand? Classical learning arguments suggest an affirmative answer: ignoring competitors induces model misspecification and inefficiency. In contrast, recent work on algorithmic collusion suggests that strategic obliviousness -- deliberately ignoring competitor prices -- may facilitate collusive outcomes and improve profits. We study this modeling choice in a stylized competitive market with unknown noisy demand, in which multiple sellers repeatedly set prices and estimate demand via iterated least squares, and either incorporate competitors' prices into their demand models (informed) or ignore them (oblivious). We first show that, relative to a monopolist, an oblivious seller in a competitive market must explore more aggressively to compensate for the loss of dynamic competitor information. Building on this insight, we characterize market dynamics when all sellers are oblivious and show that prices converge to the competitive outcome under sufficient exploration, while a continuum of pseudo-equilibria arises when exploration decays. Analyzing the resulting price trajectories, we uncover an excursion phenomenon that gives rise to transient collusive patterns that dissipate as learning progresses. In markets with both oblivious and informed sellers, the informed strictly out-earn the oblivious. Read as a strategy game, the modeling choice has a unique Nash equilibrium: the all-informed market, in which prices converge to the competitive outcome efficiently. Overall, our results indicate that collusive patterns are not robust and are not sustained by oblivious modeling; therefore, incorporating competitor information, together with sufficient price exploration, remains a reliable strategy for sellers in competitive markets.
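A minimal sketch of the oblivious learner described in the abstract above, under stated assumptions: the linear demand form `a - b * own_price + c * rival_price`, all parameter values, the constant Gaussian exploration, and the price cap are illustrative choices, not taken from the paper. Each seller refits a least-squares demand model on its own price history only, then charges the estimated revenue-maximizing price plus an exploration perturbation.

```python
import random


def fit_oblivious_demand(prices, demands):
    """Least-squares fit of demand ~ alpha - beta * own_price.

    Competitor prices are deliberately left out, which is what makes
    the seller "oblivious" in this sketch.
    """
    n = len(prices)
    mean_p = sum(prices) / n
    mean_d = sum(demands) / n
    var_p = sum((p - mean_p) ** 2 for p in prices)
    if var_p == 0:
        raise ValueError("prices must vary to identify the slope")
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, demands))
    beta = -cov / var_p
    alpha = mean_d + beta * mean_p
    return alpha, beta


def myopic_price(alpha, beta):
    """Price maximizing estimated revenue p * (alpha - beta * p)."""
    return alpha / (2 * beta)


def simulate_oblivious_duopoly(rounds=500, a=10.0, b=2.0, c=1.0,
                               noise_sd=0.5, explore_sd=0.3,
                               price_cap=10.0, seed=0):
    """Two oblivious sellers refit and reprice every round (hypothetical setup)."""
    rng = random.Random(seed)
    prices = [1.0, 2.0]
    past_prices = [[], []]
    past_demands = [[], []]
    for _ in range(rounds):
        for i in range(2):
            # True demand depends on the rival's price; the seller never uses it.
            demand = a - b * prices[i] + c * prices[1 - i] + rng.gauss(0, noise_sd)
            past_prices[i].append(prices[i])
            past_demands[i].append(demand)
        next_prices = []
        for i in range(2):
            target = prices[i]
            if len(past_prices[i]) >= 2:
                alpha, beta = fit_oblivious_demand(past_prices[i], past_demands[i])
                if beta > 0:  # only trust fits with downward-sloping demand
                    target = myopic_price(alpha, beta)
            # Gaussian perturbation plays the role of price exploration.
            explored = target + rng.gauss(0, explore_sd)
            next_prices.append(min(max(explored, 0.01), price_cap))
        prices = next_prices
    return prices
```

Replacing the constant `explore_sd` with a schedule that shrinks over rounds would be one way to experiment with the decaying-exploration regime the abstract mentions; this sketch makes no claim about reproducing the paper's results.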
We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within tight query budgets. Classical Bayesian design and computerized adaptive testing typically rely on restrictive parametric assumptions or expensive posterior approximations, limiting their use in heterogeneous, high-dimensional, and cold-start settings. We introduce a persona-induced latent variable model that represents a user's state through membership in a finite dictionary of AI personas, each offering response distributions produced by a large language model. This yields expressive priors with closed-form posterior updates and efficient finite-mixture predictions, enabling scalable Bayesian design for sequential item selection. Experiments on synthetic data and WorldValuesBench demonstrate that persona-based posteriors deliver accurate probabilistic predictions and an interpretable adaptive elicitation pipeline.
@misc{wang2026adaptivequeryingaipersona,
title={Adaptive Querying with AI Persona Priors},
author={Kaizheng Wang and Yuhang Wu and Assaf Zeevi},
year={2026},
eprint={2605.00696},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/2605.00696},
}
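A minimal sketch of the closed-form update behind the persona mixture idea in "Adaptive Querying with AI Persona Priors", under stated assumptions: `persona_probs[k][item]` is a hypothetical table of response probabilities for persona `k` (in the paper these distributions come from a large language model), and the expected-information-gain criterion used for item selection is an assumed choice, since the abstract does not name the design objective.

```python
import math


def posterior_update(weights, persona_probs, item, response):
    """Bayes' rule over personas: w_k is proportional to w_k * P_k(response | item)."""
    unnorm = [w * persona_probs[k][item][response] for k, w in enumerate(weights)]
    total = sum(unnorm)
    if total == 0:
        raise ValueError("observed response has zero probability under every persona")
    return [u / total for u in unnorm]


def predict(weights, persona_probs, item):
    """Finite-mixture predictive distribution over responses to `item`."""
    n_responses = len(persona_probs[0][item])
    return [
        sum(w * persona_probs[k][item][r] for k, w in enumerate(weights))
        for r in range(n_responses)
    ]


def entropy(dist):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def expected_info_gain(weights, persona_probs, item):
    """Mutual information between persona membership and the response to `item`."""
    mixture_entropy = entropy(predict(weights, persona_probs, item))
    conditional = sum(
        w * entropy(persona_probs[k][item]) for k, w in enumerate(weights)
    )
    return mixture_entropy - conditional


def select_item(weights, persona_probs, candidates):
    """Pick the candidate item whose answer is expected to be most informative."""
    return max(candidates, key=lambda i: expected_info_gain(weights, persona_probs, i))
```

A query loop would alternate `select_item`, observe the user's answer, and call `posterior_update`; held-out items can then be scored with `predict` using the updated weights.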
Digital Twin
SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation*
AI-based persona simulation -- often referred to as digital twin simulation -- is increasingly used for market research, recommender systems, and social sciences. Despite their flexibility, large language models (LLMs) often exhibit systematic bias and miscalibration relative to real human behavior, limiting their reliability. Inspired by synthetic control methods from causal inference, we propose SYN-DIGITS (SYNthetic Control Framework for Calibrated DIGItal Twin Simulation), a principled and lightweight calibration framework that learns latent structure from digital-twin responses and transfers it to align predictions with human ground truth. SYN-DIGITS operates as a post-processing layer on top of any LLM-based simulator and thus is model-agnostic. We develop a latent factor model that formalizes when and why calibration succeeds through latent space alignment conditions, and we systematically evaluate ten calibration methods across thirteen persona constructions, three LLMs, and two datasets. SYN-DIGITS supports both individual-level and distributional simulation for previously unseen questions and unobserved populations, with provable error guarantees. Experiments show that SYN-DIGITS achieves up to 50% relative improvements in individual-level correlation and 50--90% relative reductions in distributional discrepancy compared to uncalibrated baselines.
@misc{fan2026syndigits,
title={SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation},
author={Grace Jiarui Fan and Chengpiao Huang and Tianyi Peng and Kaizheng Wang and Yuhang Wu},
year={2026},
eprint={2604.07513},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2604.07513},
}
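A rough sketch of the synthetic-control intuition behind SYN-DIGITS, under stated assumptions: it fits weights that express one human's answers as a combination of digital-twin answers on questions where ground truth is observed, then reuses those weights on an unseen question. The unconstrained ridge least-squares fit is swapped in for simplicity (classical synthetic control restricts weights to be nonnegative and sum to one), and none of the function names or data layouts come from the paper, which evaluates its own set of calibration methods.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger ridge penalty")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_twin_weights(twin_train, human_train, ridge=1e-6):
    """Ridge least squares: human answer ~ weighted sum of twin answers.

    twin_train[q][j] is twin j's answer to training question q;
    human_train[q] is the observed human answer to the same question.
    """
    n_twins = len(twin_train[0])
    gram = [
        [
            sum(row[i] * row[j] for row in twin_train) + (ridge if i == j else 0.0)
            for j in range(n_twins)
        ]
        for i in range(n_twins)
    ]
    moment = [
        sum(row[i] * h for row, h in zip(twin_train, human_train))
        for i in range(n_twins)
    ]
    return solve_linear_system(gram, moment)


def calibrated_prediction(weights, twin_new):
    """Apply the learned weights to twin answers on an unseen question."""
    return sum(w * t for w, t in zip(weights, twin_new))
```

Because the fit only consumes simulator outputs, it sits on top of any LLM-based simulator as a post-processing step, which matches the model-agnostic framing of the abstract.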
GenAI x OR
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce*
With the rise of large language models (LLMs), generative engines are becoming powerful alternatives to traditional search, reshaping retrieval tasks. In e-commerce, for instance, conversational shopping agents now guide consumers to relevant products. This shift has created the need for generative engine optimization (GEO)--improving content visibility and relevance for generative engines. Yet despite its growing importance, current GEO practices are ad hoc, and their impacts remain poorly understood, especially in e-commerce. We address this gap by introducing E-GEO, the first benchmark built specifically for e-commerce GEO. E-GEO contains over 7,000 realistic, multi-sentence consumer product queries paired with relevant listings, capturing rich intent, constraints, preferences, and shopping contexts that existing datasets largely miss. Using this benchmark, we conduct the first large-scale empirical study of e-commerce GEO, evaluating 15 common rewriting heuristics and comparing their empirical performance. To move beyond heuristics, we further formulate GEO as a tractable optimization problem and develop a lightweight iterative prompt-optimization algorithm that can significantly outperform these baselines. Surprisingly, the optimized prompts reveal a stable, domain-agnostic pattern--suggesting the existence of a "universally effective" GEO strategy.
@misc{bagga2025egeo,
title={E-GEO: A Testbed for Generative Engine Optimization in E-Commerce},
author={Puneet S. Bagga and Vivek F. Farias and Tamar Korkotashvili and Tianyi Peng and Yuhang Wu},
year={2025},
eprint={2511.20867},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2511.20867},
}
Digital Twin
How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective
Preprint, 2025. Short version “Uncertainty Quantification for LLM-Based Survey Simulations” appeared at the International Conference on Machine Learning (ICML), 2025.
Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield excessively loose estimates. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous fidelity gaps across different LLMs and domains.
@misc{huang2025human,
title={How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective},
author={Chengpiao Huang and Yuhang Wu and Kaizheng Wang},
year={2025},
eprint={2502.17773},
archivePrefix={arXiv},
primaryClass={stat.ME},
url={https://arxiv.org/abs/2502.17773},
}
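A simplified illustration of the sample-size trade-off described in the survey simulation abstract, under stated assumptions: for each candidate number of simulated responses `k`, it builds a normal-approximation confidence interval from the first `k` LLM responses on calibration questions whose human means are known, measures empirical coverage, and keeps the largest `k` that still meets the target. The interval construction, the selection rule, and all names are hypothetical stand-ins rather than the paper's procedure.

```python
import math


def mean_confidence_interval(samples, z=1.96):
    """Normal-approximation interval for the mean; needs at least two samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the variance")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


def empirical_coverage(simulated, human_means, k, z=1.96):
    """Fraction of calibration questions whose human mean lies in the interval."""
    hits = 0
    for responses, truth in zip(simulated, human_means):
        low, high = mean_confidence_interval(responses[:k], z)
        hits += low <= truth <= high
    return hits / len(human_means)


def select_sample_size(simulated, human_means, candidates, target=0.95, z=1.96):
    """Largest candidate k whose empirical coverage meets the target, else None."""
    feasible = [
        k for k in candidates
        if empirical_coverage(simulated, human_means, k, z) >= target
    ]
    return max(feasible) if feasible else None
```

More simulated responses shrink the interval around the LLM's own mean, so a biased simulator loses coverage as `k` grows; the selected `k` can then be read as a rough indicator of how many human respondents the simulator is worth on these questions.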