ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models
In robot manipulation tasks with large observation and action spaces, reinforcement learning (RL) often suffers from low sample efficiency and uncertain convergence. As an alternative, foundation models have shown promise in zero-shot and few-shot applications. However, these models can be unreliable due to their limited reasoning capabilities and difficulties in understanding physical and spatial contexts. This paper introduces ExploRLLM, a method that combines the commonsense reasoning of foundation models with the experiential learning capabilities of RL. We leverage the strengths of both paradigms by using foundation models to obtain a base policy, an efficient representation, and an exploration policy. A residual RL agent learns when and how to deviate from the base policy, while its exploration is guided by the exploration policy. In table-top manipulation experiments, we demonstrate that ExploRLLM outperforms both baseline foundation model policies and baseline RL policies. Additionally, we show that the learned policy can be transferred to the real world without further training. Supplementary material is available at https://explorllm.github.io.
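To make the abstract's structure concrete, the following is a minimal conceptual sketch, not the authors' implementation, of how a residual agent might combine a foundation-model base policy with LLM-guided exploration. All names here (ResidualAgentSketch, base_policy, exploration_policy, residual_policy, epsilon) are hypothetical placeholders assumed for illustration.

```python
# Hypothetical sketch: residual action composition with guided exploration.
# Assumes base_policy, exploration_policy, and residual_policy are callables
# supplied by the user; none of these names come from the paper itself.
import numpy as np


class ResidualAgentSketch:
    def __init__(self, base_policy, exploration_policy, residual_policy, epsilon=0.3):
        self.base_policy = base_policy                  # action prior from the foundation model
        self.exploration_policy = exploration_policy    # LLM-suggested exploratory actions
        self.residual_policy = residual_policy          # learned RL correction
        self.epsilon = epsilon                          # fraction of guided-exploration steps

    def act(self, observation, training=True):
        base_action = self.base_policy(observation)
        if training and np.random.rand() < self.epsilon:
            # Guided exploration: sample from the LLM-derived exploration policy
            # rather than adding purely random noise.
            return self.exploration_policy(observation)
        # Otherwise, execute the base action corrected by the learned residual,
        # i.e., the agent decides when and how to deviate from the base policy.
        residual = self.residual_policy(observation, base_action)
        return base_action + residual
```

The key design idea this sketch illustrates is that exploration is biased toward actions the foundation model considers plausible, rather than uniform random actions, which is what the paper credits for improved sample efficiency.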