Date & Time:
October 25, 2023, 11:00 am – 12:00 pm (America/Chicago)

Yuanhao Wang (Princeton) – Is RLHF more difficult than standard RL? A view from reductions

Reinforcement Learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) learns directly from reward signals. Preferences arguably contain less information than rewards, which makes preference-based RL seem more difficult. This paper proves theoretically that, for a wide range of preference models, preference-based RL can be solved directly using existing algorithms and techniques for reward-based RL, at little or no extra cost. Specifically, (1) for preferences drawn from reward-based probabilistic models, we reduce the problem to robust reward-based RL that can tolerate small errors in rewards; (2) for general, arbitrary preferences, where the objective is to find the von Neumann winner, we reduce the problem to multiagent reward-based RL that finds Nash equilibria of factored Markov games under a restricted set of policies. The latter case can be further reduced to adversarial MDPs when preferences depend only on the final state. We instantiate all reward-based RL subroutines with concrete, provable algorithms, and apply our theory to a large class of models including tabular MDPs and MDPs with generic function approximation. We further provide guarantees when K-wise comparisons are available.
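The first reduction concerns preferences generated by a reward-based probabilistic model, under which approximate rewards can be recovered from comparison data and handed to a robust reward-based RL algorithm. As a minimal, illustrative sketch (not code from the paper; the Bradley-Terry model, the gradient-ascent fit, and the simulated data are all assumptions made here for illustration), one can estimate latent rewards from pairwise preferences by maximum likelihood:

```python
import math
import random

def bradley_terry_mle(n_items, comparisons, lr=0.1, iters=2000):
    """Estimate latent rewards r from pairwise preference data under the
    Bradley-Terry model: P(i preferred over j) = sigmoid(r_i - r_j).
    `comparisons` is a list of (i, j, y) with y = 1 if i was preferred."""
    r = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for i, j, y in comparisons:
            p = 1.0 / (1.0 + math.exp(-(r[i] - r[j])))  # model's P(i beats j)
            grad[i] += y - p   # log-likelihood gradient w.r.t. r_i
            grad[j] -= y - p   # and w.r.t. r_j (opposite sign)
        for k in range(n_items):
            r[k] += lr * grad[k] / len(comparisons)
        # Rewards are identified only up to an additive shift; pin r[0] = 0.
        shift = r[0]
        r = [v - shift for v in r]
    return r

# Simulate noisy preferences from true rewards [0.0, 1.0, 2.0], then recover them.
random.seed(0)
true_r = [0.0, 1.0, 2.0]
data = []
for _ in range(5000):
    i, j = random.sample(range(3), 2)
    p = 1.0 / (1.0 + math.exp(-(true_r[i] - true_r[j])))
    data.append((i, j, 1 if random.random() < p else 0))

est = bradley_terry_mle(3, data)
print(est)  # approximately recovers [0.0, 1.0, 2.0], up to sampling noise
```

The recovered rewards carry estimation error from the finite comparison data, which is exactly why the reduction targets a reward-based RL algorithm that is robust to small reward perturbations.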

Speakers

Yuanhao Wang

PhD Student

Yuanhao Wang is a fourth-year PhD student in the Computer Science Department at Princeton University, advised by Chi Jin. Prior to Princeton, he received his bachelor's degree in Computer Science from the Yao Class at Tsinghua University. His research interests include reinforcement learning theory, learning in games, and minimax optimization. He received a best paper award at the ICLR 2022 Workshop on Gamification and Multiagent Solutions.

Related News & Events

Video

Inside The Lab: How Can Robots Improve Our Lives?

Oct 27, 2025
UChicago CS News

Shape n’ Swarm: Hands-On, Shape-Aware Generative Authoring for Swarm User Interfaces Wins Best Demo at UIST 2025

Oct 22, 2025
UChicago CS News

Redirecting Hands in Virtual Reality With Galvanic Vestibular Stimulation: UChicago Lab to Present First-of-Its-Kind Work at UIST 2025

Oct 13, 2025
UChicago CS News

UChicago CS Researchers Expand the Boundaries of Interface Technology at UIST 2025

Sep 26, 2025
UChicago CS News

Could Robots Help Kids Conquer Reading Anxiety? New Study from the Department of Computer Science at UChicago Suggests So

Sep 10, 2025
UChicago CS News

Hands-On Vision: How a Wrist Camera Can Expand the World for All Users

May 23, 2025
In the News

More Control, Less Connection: How User Control Affects Robot Social Agency

May 16, 2025
UChicago CS News

Innovation at the Forefront: UChicago CS Researchers Make Significant Contributions to CHI 2025

Apr 23, 2025
UChicago CS News

Unveiling Attention Receipts: Tangible Reflections on Digital Consumption

May 15, 2024
UChicago CS News

University of Chicago Computer Science Researchers To Present Ten Papers at CHI 2024

May 06, 2024
UChicago CS News

Five UChicago CS students named to Siebel Scholars Class of 2024

Oct 02, 2023
UChicago CS News

UChicago Computer Scientists Design Small Backpack That Mimics Big Sensations

Sep 11, 2023