Qiyuan Zhang

I am currently a Ph.D. candidate advised by Prof. Chen Ma. Previously, I completed my B.Sc. and M.Sc. in Computer Science at the University of Electronic Science and Technology of China and spent time at Singapore Management University working with Jing Jiang. Soon, I will join Prof. Xue Liu’s group at MBZUAI as a visiting student.

My research interests lie in auto‑evaluation, reward modeling, preference modeling, and improved scaling strategies such as test‑time scaling for large language models. I am always excited about new collaborations. If you share these interests or see potential synergies, feel free to reach out via email!

I have interned at Noah Lab@Huawei (Hong Kong) and Hunyuan team@Tencent, where my work has focused on advancing reward modeling. I am also seeking visiting or research‑intern opportunities to further explore frontier research topics.

In addition, I regularly post self-reflections on Medium; feel free to take a look if you’re interested!

Current Research Areas

LLM‑as‑a‑Judge / Reward Models
Methods for Test‑Time Scaling
LLM Performance Prediction
Automatic Benchmark Construction

News

7 Apr 2026 Two papers accepted at ACL 2026

16 Jan 2026 One paper accepted at ICLR 2026

16 May 2025 One paper accepted at ACL 2025

21 Apr 2025 One paper accepted at PAKDD 2025

31 Mar 2025 Survey released: A Survey on Test‑Time Scaling …

11 Feb 2025 One paper accepted at ICLR 2025

10 Oct 2024 One paper accepted at EMNLP 2024

Selected Publications

My selected publications represent my research style and interests.

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Qiyuan Zhang, Junyi Zhou, Yufei Wang, Fuyuan Lyu, Yidong Ming, Can Xu, Qingfeng Sun, Kai Zheng, Peng Kang, Xue Liu, Chen Ma. · ACL 2026 Main

arXiv 🤗data

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Qiyuan Zhang, Yufei Wang, Tianhe Wu, Can Xu, Qingfeng Sun, Kai Zheng, Xue Liu, Chen Ma. · ACL 2026 Findings

arXiv 🤗model & data

From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation

Yuxin Jiang, Yufei Wang, Qiyuan Zhang, Xingshan Zeng, Liangyou Li, Jierun Chen, Chaofan Tao, Haoli Bai, Lifeng Shang. · ICLR 2026 Poster

arXiv Code

What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Wenyue Hua, Haolun Wu, Zhihan Guo, Yufei Wang, Niklas Muennighoff, Irwin King, Xue Liu, Chen Ma. · Preprint

arXiv Page Code PPT

Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge

Qiyuan Zhang, Yufei Wang, Yuxin Jiang, Liangyou Li, Chuhan Wu, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Fuyuan Lyu, Chen Ma. · ACL 2025

arXiv Code

RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

Qiyuan Zhang, Yufei Wang, Tiezheng Yu, Yuxin Jiang, Chuhan Wu, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Fuyuan Lyu, Chen Ma. · ICLR 2025

arXiv Code

Collaborative Performance Prediction for Large Language Models

Qiyuan Zhang, Fuyuan Lyu, Xue Liu, Chen Ma. · EMNLP 2024

arXiv Code

NOAHQA: Numerical Reasoning with Interpretable Graph QA Dataset

Qiyuan Zhang, Lei Wang, Sicheng Yu, Shuohang Wang, Yang Wang, Jing Jiang, Ee-Peng Lim. · EMNLP 2021 Findings

arXiv Code

MWPToolkit: An Open-Source Framework for DL-Based Math Word Problem Solvers

Yihuai Lan, Lei Wang, Qiyuan Zhang, Yunshi Lan, Bing Tian Dai, Yan Wang, Dongxiang Zhang, Ee-Peng Lim. · AAAI 2021 Workshop

arXiv Code
