Welcome!
I’m Wenyue Hua, a senior researcher at Microsoft Research, AI Frontiers. I was a postdoctoral researcher at the University of California, Santa Barbara, working with Prof. William Yang Wang (2024 - 2025). I obtained my Ph.D. from Rutgers University, New Brunswick (2020 - 2024), where I was honored to be advised by Prof. Yongfeng Zhang. I received an M.A. in Linguistics from Rutgers in 2020 (proudly advised by Prof. Adam Jardine), and a B.A. in Linguistics and Philosophy and a B.S. in Mathematics from UCLA in 2018 (proudly advised by Prof. Edward Keenan).
My research interests lie in large language models and their various applications, such as LLM-based agents, multi-agent systems, generative recommender systems, and LLM reasoning. I care about the decision-making ability, safety, and efficiency of LLM-based agents.
I welcome discussions about AI agents and am open to collaborations with researchers and industry professionals. I also enjoy mentoring students at various stages of their academic journey.
Feel free to email me at wenyue.hua@rutgers.edu to start a conversation, explore potential partnerships, or discuss recent developments in the field.
Postdoctoral Research in Computer Science, 2024-2025
Computer Science Department, University of California, Santa Barbara
Ph.D. in Computer Science, 2020-2024
Computer Science Department, Rutgers University, New Brunswick
Master of Arts in Linguistics (transferred out of the Ph.D. track), 2018-2020
Department of Linguistics, Rutgers University, New Brunswick
B.S. in Mathematics (General) & B.A. in Linguistics and Philosophy with a Specialization in Computing, 2014-2018
University of California, Los Angeles (UCLA)
Despite their remarkable success on complex tasks, which has propelled widespread adoption, large-language-model-based agents still face critical deployment challenges due to prohibitive latency and inference costs. While recent work has explored various methods to accelerate inference, existing approaches suffer from significant limitations: they either fail to preserve performance fidelity, require extensive offline training of router modules, or incur excessive operational costs. Moreover, they provide minimal user control over the tradeoff between acceleration and other performance metrics. To address these gaps, we introduce Dynamic Speculative Planning (DSP), an asynchronous online reinforcement learning framework that provides lossless acceleration with substantially reduced costs and no additional pre-deployment preparation. DSP explicitly optimizes a joint objective balancing end-to-end latency against dollar cost, allowing practitioners to adjust a single parameter that steers the system toward faster responses, cheaper operation, or any point along this continuum. Experiments on two standard agent benchmarks demonstrate that DSP achieves efficiency comparable to the fastest lossless acceleration method while reducing total cost by 30% and unnecessary cost by up to 60%.
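For intuition, here is a minimal sketch of the kind of scalarized latency/cost trade-off that a single user-facing knob could control. The weight `alpha`, the step-count candidates, and the estimator functions are illustrative assumptions for this sketch, not DSP's actual objective or its online reinforcement learning update.

```python
# Minimal sketch of a latency/cost trade-off objective in the spirit of DSP.
# All names and numbers here are hypothetical placeholders, not the paper's formulation.

def joint_objective(latency_s: float, cost_usd: float, alpha: float) -> float:
    """Scalarized objective: alpha near 1 favors fast runs, alpha near 0 favors cheap runs."""
    return alpha * latency_s + (1.0 - alpha) * cost_usd

def choose_speculation_steps(candidates, estimate_latency, estimate_cost, alpha=0.5):
    """Pick how many steps to speculate ahead by minimizing the joint objective."""
    return min(
        candidates,
        key=lambda k: joint_objective(estimate_latency(k), estimate_cost(k), alpha),
    )

if __name__ == "__main__":
    # Toy estimates: speculating further ahead reduces latency but wastes more tokens.
    latency = lambda k: 10.0 / (1 + k)   # seconds, hypothetical
    cost = lambda k: 0.02 * k            # dollars, hypothetical
    print(choose_speculation_steps(range(1, 9), latency, cost, alpha=0.7))
```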
As enthusiasm for scaling computation (data and parameters) in the pretraining era gradually diminished, test-time scaling (TTS), also referred to as “test-time computing”, has emerged as a prominent research focus. Recent studies demonstrate that TTS can further elicit the problem-solving capabilities of large language models (LLMs), enabling significant breakthroughs not only in specialized reasoning tasks, such as mathematics and coding, but also in general tasks like open-ended Q&A. However, despite the explosion of recent efforts in this area, there remains an urgent need for a comprehensive survey offering a systematic understanding. To fill this gap, we propose a unified, multidimensional framework structured along four core dimensions of TTS research – what to scale, how to scale, where to scale, and how well to scale. Building upon this taxonomy, we conduct an extensive review of methods, application scenarios, and assessment aspects, and present an organized decomposition that highlights the unique functional roles of individual techniques within the broader TTS landscape. From this analysis, we distill the major developmental trajectories of TTS to date and offer hands-on guidelines for practical deployment. Furthermore, we identify several open challenges and offer insights into promising future directions, including further scaling, clarifying the functional essence of techniques, generalizing to more tasks, and more attributions.
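As a concrete illustration of one widely used test-time scaling strategy covered by surveys in this area (best-of-N sampling), the sketch below spends extra inference-time compute by drawing several candidate answers and keeping the highest-scoring one. The `generate` and `score` callables are placeholders standing in for an LLM sampler and a verifier or reward model; this is not a method proposed in the survey itself.

```python
# Minimal sketch of best-of-N sampling, a common "how to scale" technique at test time.
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Spend more inference-time compute (n samples) to improve answer quality."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))
```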
Large language models (LLMs) have shown remarkable improvements in reasoning, and many existing benchmarks have been addressed by models such as o1 and o3 either fully or partially. However, a majority of these benchmarks emphasize deductive reasoning, including mathematical and coding tasks in which rules such as mathematical axioms or programming syntax are clearly defined, based on which LLMs can plan and apply these rules to arrive at a solution. In contrast, inductive reasoning, where one infers the underlying rules from observed data, remains less explored. Such inductive processes lie at the heart of scientific discovery, as they enable researchers to extract general principles from empirical observations. To assess whether LLMs possess this capacity, we introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs. Our experimental findings reveal that even the most advanced models available struggle to master the simplest complexity classes within the subregular hierarchy of functions, highlighting a notable deficiency in current LLMs’ inductive reasoning capabilities. Code and data are available at this URL.
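To illustrate the task format, the toy example below shows the general shape of an inductive-reasoning problem: infer a hidden string function from observed input/output pairs and apply it to held-out inputs. It is an illustrative stand-in rather than an actual InductionBench item, and the append-a-symbol rule here is far simpler than the subregular function classes used in the benchmark.

```python
# Illustrative shape of an inductive-reasoning task (not an actual InductionBench item):
# infer the hidden rule behind the observed pairs, then apply it to held-out inputs.

observed = [("abba", "abba#"), ("ba", "ba#"), ("aab", "aab#")]  # hidden rule: append '#'
held_out = ["bab", "aa"]

def inferred_rule(s: str) -> str:
    # A candidate hypothesis consistent with all observed pairs.
    return s + "#"

assert all(inferred_rule(x) == y for x, y in observed)
print([inferred_rule(x) for x in held_out])  # ['bab#', 'aa#']
```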
Role-playing agents (RPAs) are an increasingly popular type of LLM agent that simulates human-like behaviors across a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs, derived from a systematic review of 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from the existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.