A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

Summary (EN)

Researchers introduced FutureWorld, a live reinforcement-learning environment for training AI agents to make, and improve at, real-world future predictions. The paper reframes live future prediction as an interactive learning setting rather than a pure evaluation task. In the environment, agents answer prediction questions about events that have not yet unfolded, then receive reward signals later, once the real outcomes become known. The authors argue that this setup creates a natural training loop: it avoids answer leakage by construction while supplying a steady stream of diverse, grounded supervision from real events.

To demonstrate the idea, they train three open-source base models over consecutive days and report that performance improves through the closed loop of prediction, outcome realization, and parameter updates. The paper also derives a daily benchmark from the environment and uses it to evaluate several frontier agent systems as baselines.

The authors position FutureWorld as a step toward agents that continually learn from the world rather than only from static corpora or synthetic tasks. They emphasize that the environment combines scale, diversity, and delayed but objective reward, features often missing from standard language-model benchmarks. The release is application-oriented: it targets forecasting agents that could eventually support domains such as markets, operations, planning, and decision support, while also providing a new training substrate for agentic reinforcement learning on real-world questions.
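The closed loop described above (predict now, observe the outcome later, update on a delayed but objective reward) can be sketched with a toy one-parameter forecaster. This is a minimal illustration, not the paper's method: the names `ToyForecaster` and `run_days`, the one-day resolution delay, and the Brier-score reward are all assumptions made for the sketch.

```python
import random


class ToyForecaster:
    """Hypothetical one-parameter agent: always predicts its current base rate."""

    def __init__(self, base_rate=0.5, lr=0.05):
        self.base_rate = base_rate
        self.lr = lr

    def predict(self, question):
        # A real agent would condition on the question text; this toy one ignores it.
        return self.base_rate

    def update(self, prob, outcome):
        # Gradient step on the Brier loss (prob - outcome)^2, the "delayed reward".
        self.base_rate -= self.lr * 2 * (prob - outcome)
        self.base_rate = min(1.0, max(0.0, self.base_rate))


def run_days(agent, n_days=200, true_rate=0.8, seed=0):
    """Closed loop: predict each day, then resolve yesterday's question and update."""
    rng = random.Random(seed)
    pending = []  # (question_id, prediction) pairs awaiting resolution
    for day in range(n_days):
        q = f"event-{day}"  # placeholder question id
        pending.append((q, agent.predict(q)))
        # Outcomes resolve with a one-day delay in this toy setting.
        if day > 0:
            _, prob = pending.pop(0)
            outcome = 1.0 if rng.random() < true_rate else 0.0
            agent.update(prob, outcome)
    return agent.base_rate
```

Because the update is an exponential moving average toward realized outcomes, the agent's base rate drifts from 0.5 toward the true event frequency as resolutions arrive, which is the summary's point: supervision is delayed but objective, and the loop needs no leaked answers.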

Summary (ZH)

Researchers released FutureWorld, a live reinforcement-learning environment with real-world outcome rewards for training AI agents to predict future events and keep improving. The paper recasts live future prediction from a traditional evaluation task into an interactive, continually trainable learning environment. In this environment, an agent first answers prediction questions about events that have not yet happened, and receives reward signals only after the real outcomes emerge. The authors argue that this mechanism naturally avoids answer leakage while continually supplying diverse supervision tightly coupled to real events.

To validate the idea, the team trained three open-source base models over consecutive days and report that the models improve effectively within the closed loop of predicting, waiting for real outcomes, and then updating parameters. The paper also builds a daily-updated benchmark on top of the environment and uses it to evaluate several frontier agent systems, establishing current performance baselines.

The authors view FutureWorld as a key step in moving agents from learning only on static corpora or synthetic tasks toward continually learning from the real world. They emphasize that the environment offers scale, diversity, and delayed but objective reward signals, features that conventional language-model benchmarks often lack. From an application standpoint, the work targets predictive agents that could eventually support market judgment, operations planning, resource scheduling, and decision support, and it provides new training infrastructure for agentic reinforcement learning on real-world problems.

Source

https://arxiv.org/abs/2604.26733