Publications
* Author names are ordered alphabetically
- Adaptive Querying with AI Persona Priors
  Kaizheng Wang, Yuhang Wu, and Assaf Zeevi
  arXiv:2605.00696, 2026. Accepted at International Conference on Machine Learning (ICML), 2026.
We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within tight question budgets. Classical Bayesian design and computerized adaptive testing typically rely on restrictive parametric assumptions or expensive posterior approximations, limiting their use in heterogeneous, high-dimensional, and cold-start settings. We introduce a persona-induced latent variable model that represents a user's state through membership in a finite dictionary of AI personas, each offering response distributions produced by a large language model. This yields expressive priors with closed-form posterior updates and efficient finite-mixture predictions, enabling scalable Bayesian design for sequential item selection. Experiments on synthetic data and WorldValuesBench demonstrate that persona-based posteriors deliver accurate probabilistic predictions and an interpretable adaptive elicitation pipeline.
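The closed-form posterior updates and finite-mixture predictions described in the abstract can be illustrated with a toy sketch. Everything here (the persona count, the synthetic likelihood table, the function names) is a hypothetical illustration of the mixture mechanics, not the paper's code or data:

```python
import numpy as np

# Each of K personas assigns a categorical response distribution to every
# question; observing a user's answer triggers a closed-form Bayes update
# of the posterior weights over personas.
rng = np.random.default_rng(0)
K, Q, C = 5, 10, 4  # personas, questions, answer choices (illustrative sizes)
# likelihood[k, q] = P(answer | persona k, question q)
likelihood = rng.dirichlet(np.ones(C), size=(K, Q))

weights = np.full(K, 1.0 / K)  # uniform prior over personas

def update(weights, q, answer):
    """Closed-form posterior over personas after seeing `answer` to question `q`."""
    post = weights * likelihood[:, q, answer]
    return post / post.sum()

def predict(weights, q):
    """Posterior-predictive distribution for question `q`: a finite mixture."""
    return weights @ likelihood[:, q]

weights = update(weights, q=0, answer=2)
print(predict(weights, q=1))  # mixture prediction for a held-out question
```

Because the posterior stays a K-vector of weights, each sequential item selection step only needs these two cheap operations, which is what makes the Bayesian design scalable.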
  @misc{wang2026adaptivequeryingaipersona,
    title={Adaptive Querying with AI Persona Priors},
    author={Kaizheng Wang and Yuhang Wu and Assaf Zeevi},
    year={2026},
    eprint={2605.00696},
    archivePrefix={arXiv},
    primaryClass={stat.ML},
    url={https://arxiv.org/abs/2605.00696},
  }

- SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation
  Grace Jiarui Fan, Chengpiao Huang, Tianyi Peng, Kaizheng Wang, and Yuhang Wu
  arXiv:2604.07513, 2026.
AI-based persona simulation -- often referred to as digital twin simulation -- is increasingly used for market research, recommender systems, and social sciences. Despite their flexibility, large language models (LLMs) often exhibit systematic bias and miscalibration relative to real human behavior, limiting their reliability. Inspired by synthetic control methods from causal inference, we propose SYN-DIGITS (SYNthetic Control Framework for Calibrated DIGItal Twin Simulation), a principled and lightweight calibration framework that learns latent structure from digital-twin responses and transfers it to align predictions with human ground truth. SYN-DIGITS operates as a post-processing layer on top of any LLM-based simulator and thus is model-agnostic. We develop a latent factor model that formalizes when and why calibration succeeds through latent space alignment conditions, and we systematically evaluate ten calibration methods across thirteen persona constructions, three LLMs, and two datasets. SYN-DIGITS supports both individual-level and distributional simulation for previously unseen questions and unobserved populations, with provable error guarantees. Experiments show that SYN-DIGITS achieves up to 50% relative improvements in individual-level correlation and 50--90% relative reductions in distributional discrepancy compared to uncalibrated baselines.
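A minimal sketch of the synthetic-control intuition behind the abstract, under simplifying assumptions (a linear model, unconstrained least squares in place of the paper's actual estimator, and made-up data): fit weights over digital-twin responses on questions with human ground truth, then transfer those weights to unseen questions.

```python
import numpy as np

# Illustrative only: treat each digital twin's answers as a "donor" series,
# fit weights on calibration questions with human ground truth, then reuse
# the weights to calibrate predictions for unseen questions.
rng = np.random.default_rng(1)
n_twins, n_cal, n_new = 8, 30, 5
sim_cal = rng.normal(size=(n_cal, n_twins))        # twin responses, calibration questions
true_w = rng.dirichlet(np.ones(n_twins))           # ground-truth mixing weights
human_cal = sim_cal @ true_w + 0.05 * rng.normal(size=n_cal)  # noisy human averages

# Learn the latent alignment as a post-processing layer on top of the simulator.
w, *_ = np.linalg.lstsq(sim_cal, human_cal, rcond=None)

sim_new = rng.normal(size=(n_new, n_twins))        # twin responses, unseen questions
calibrated = sim_new @ w                           # transferred calibrated predictions
print(calibrated)
```

Since the correction acts only on simulator outputs, the same recipe applies to any LLM-based simulator, which is the sense in which such a calibration layer is model-agnostic.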
  @misc{fan2026syndigits,
    title={SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation},
    author={Grace Jiarui Fan and Chengpiao Huang and Tianyi Peng and Kaizheng Wang and Yuhang Wu},
    year={2026},
    eprint={2604.07513},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2604.07513},
  }

- E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
  Puneet S. Bagga, Vivek F. Farias, Tamar Korkotashvili, Tianyi Peng, and Yuhang Wu
  arXiv:2511.20867, 2025.
With the rise of large language models (LLMs), generative engines are becoming powerful alternatives to traditional search, reshaping retrieval tasks. In e-commerce, for instance, conversational shopping agents now guide consumers to relevant products. This shift has created the need for generative engine optimization (GEO)--improving content visibility and relevance for generative engines. Yet despite its growing importance, current GEO practices are ad hoc, and their impacts remain poorly understood, especially in e-commerce. We address this gap by introducing E-GEO, the first benchmark built specifically for e-commerce GEO. E-GEO contains over 7,000 realistic, multi-sentence consumer product queries paired with relevant listings, capturing rich intent, constraints, preferences, and shopping contexts that existing datasets largely miss. Using this benchmark, we conduct the first large-scale empirical study of e-commerce GEO, evaluating 15 common rewriting heuristics and comparing their performance. To move beyond heuristics, we further formulate GEO as a tractable optimization problem and develop a lightweight iterative prompt-optimization algorithm that can significantly outperform these baselines. Surprisingly, the optimized prompts reveal a stable, domain-agnostic pattern--suggesting the existence of a "universally effective" GEO strategy.
  @misc{bagga2025egeo,
    title={E-GEO: A Testbed for Generative Engine Optimization in E-Commerce},
    author={Puneet S. Bagga and Vivek F. Farias and Tamar Korkotashvili and Tianyi Peng and Yuhang Wu},
    year={2025},
    eprint={2511.20867},
    archivePrefix={arXiv},
    primaryClass={cs.IR},
    url={https://arxiv.org/abs/2511.20867},
  }

- Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice
  Akshit Kumar, Tianyi Peng, Yuhang Wu, and Assaf Zeevi
  arXiv:2506.23924, 2025. Accepted at Winter Simulation Conference (WSC), 2025.
Large language models (LLMs) have exhibited expert-level capabilities across various domains. However, their abilities to solve problems in Operations Research (OR) -- the analysis and optimization of mathematical models derived from real-world problems or their verbal descriptions -- remain underexplored. In this work, we take a first step toward evaluating LLMs' abilities to solve stochastic modeling problems, a core class of OR problems characterized by uncertainty and typically involving tools from probability, statistics, and stochastic processes. We manually procure a representative set of graduate-level homework and doctoral qualification-exam problems and test LLMs' abilities to solve them. We further leverage SimOpt, an open-source library of simulation-optimization problems and solvers, to investigate LLMs' abilities to make real-world decisions under uncertainty. Our results show that, though a nontrivial amount of work is still needed to reliably automate the stochastic modeling pipeline in reality, state-of-the-art LLMs demonstrate proficiency on par with human experts in both classroom and practical settings. These findings highlight the potential of building AI agents that assist OR researchers and amplify the real-world impact of OR through automation.
  @misc{kumar2025performance,
    title={Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice},
    author={Akshit Kumar and Tianyi Peng and Yuhang Wu and Assaf Zeevi},
    year={2025},
    eprint={2506.23924},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2506.23924},
  }

- How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective
  Chengpiao Huang*, Yuhang Wu*, and Kaizheng Wang
  arXiv:2502.17773, 2025. Short version "Uncertainty Quantification for LLM-Based Survey Simulations" appeared at International Conference on Machine Learning (ICML), 2025.
Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield excessively loose estimates. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous fidelity gaps across different LLMs and domains.
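The core tension the abstract describes (more simulated responses give narrower confidence sets, but under distribution shift their coverage of the true human parameter collapses) can be seen in a toy Monte Carlo experiment. This is an illustration of the phenomenon only, not the paper's adaptive selection procedure; the bias value and sample sizes are made up:

```python
import numpy as np

# Simulated (LLM) responses are drawn from a distribution whose mean is
# shifted away from the true human mean; we measure how often a standard
# 95% confidence interval built from n simulated draws covers the truth.
rng = np.random.default_rng(2)
true_mean, bias = 0.0, 0.15  # human mean vs. LLM simulation bias (illustrative)

def coverage(n, reps=2000):
    sims = rng.normal(true_mean + bias, 1.0, size=(reps, n))
    centers = sims.mean(axis=1)
    half = 1.96 * sims.std(axis=1, ddof=1) / np.sqrt(n)
    return np.mean(np.abs(centers - true_mean) <= half)

for n in (10, 100, 1000):
    print(n, coverage(n))  # coverage deteriorates as n grows
```

With few draws the interval is wide enough to absorb the bias; with many draws it concentrates around the biased simulated mean and misses the human truth, which is why choosing the simulation sample size adaptively matters.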
  @misc{huang2025human,
    title={How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective},
    author={Chengpiao Huang and Yuhang Wu and Kaizheng Wang},
    year={2025},
    eprint={2502.17773},
    archivePrefix={arXiv},
    primaryClass={stat.ME},
    url={https://arxiv.org/abs/2502.17773},
  }

- Discrete-time Simulated Annealing: A Convergence Analysis via the Eyring-Kramers Law
  Wenpin Tang, Yuhang Wu, and Xun Yu Zhou
  Numerical Algebra, Control and Optimization, 2024.
We study the convergence rate of the discrete-time simulated annealing process $(x_k;\ k=0,1,\ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb P(f(x_k) > \min f + \delta)$ decays polynomially in the cumulative step size, and provide an explicit rate through a non-asymptotic bound in terms of the model parameters. Our argument applies recent developments on functional inequalities for the Gibbs measure at low temperatures---the Eyring-Kramers law. The result leads to a condition on the step size to ensure convergence. Finally, we perform numerical experiments to corroborate our theoretical result.
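A minimal sketch of a discrete-time simulated annealing recursion of the kind studied in the paper: a gradient step with decaying step size plus Gaussian noise whose scale shrinks with a logarithmic cooling schedule. The objective, schedules, and constants are illustrative choices, not the paper's exact setting:

```python
import numpy as np

# Double-well objective with global minima at x = +1 and x = -1.
f = lambda x: (x**2 - 1.0)**2
grad = lambda x: 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(3)
x = 2.0  # start outside both wells
for k in range(1, 20001):
    eta = 1.0 / (50.0 + k)                     # decaying step size
    temp = 1.0 / np.log(50.0 + k)              # logarithmic cooling schedule
    x = x - eta * grad(x) + np.sqrt(2.0 * eta * temp) * rng.normal()
print(x, f(x))  # x should settle near one of the two global minima
```

The decaying step size makes the cumulative step size grow like $\log k$, which is the quantity in which the paper's tail bound decays polynomially.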
  @article{tang2024discrete,
    title={Discrete-time Simulated Annealing: A Convergence Analysis via the Eyring-Kramers Law},
    author={Tang, Wenpin and Wu, Yuhang and Zhou, Xun Yu},
    journal={Numerical Algebra, Control and Optimization},
    year={2024},
    volume={14},
    number={4},
    pages={778--794},
    publisher={American Institute of Mathematics}
  }

- Adaptive Data Fusion for Multi-task Non-smooth Optimization
  Henry Lam, Kaizheng Wang, Yuhang Wu, and Yichen Zhang
  arXiv:2210.12334, 2022.
We study the problem of multi-task non-smooth optimization that arises ubiquitously in statistical learning, decision-making and risk management. We develop a data fusion approach that adaptively leverages commonalities among a large number of objectives to improve sample efficiency while tackling their unknown heterogeneities. We provide sharp statistical guarantees for our approach. Numerical experiments on both synthetic and real data demonstrate significant advantages of our approach over benchmarks.
  @misc{lam2022adaptive,
    title={Adaptive Data Fusion for Multi-task Non-smooth Optimization},
    author={Henry Lam and Kaizheng Wang and Yuhang Wu and Yichen Zhang},
    year={2022},
    eprint={2210.12334},
    archivePrefix={arXiv},
    primaryClass={stat.ML},
    url={https://arxiv.org/abs/2210.12334},
  }