

AI Paper Digest, May 8, 2026 (HuggingFace Daily Papers)

Data source: https://huggingface.co/papers | Collected on: 2026-05-08

📌 Key Highlights

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning | arXiv - [Key Highlight] Reinforcement learning (RL) has become a central post-training tool for impro... 💡 A panoramic overview of reinforcement-learning training strategies for AI agents, useful for optimizing the performance of agent applications.
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue | arXiv - [Key Highlight] The rapid advancement of Multimodal Large Language Models (MLLMs) has empower... 💡 Scenario-based evaluation of multimodal agents, offering a development standard for agent applications.
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents | arXiv - [Key Highlight] Deep search has become a crucial capability for frontier multimodal agents, e... 💡 An open-source architecture for multimodal search agents, and a practical reference for applying Vibe Coding.

📋 Also Worth Noting

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning | arXiv — Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of L...
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling | arXiv — Recent advances in generative video models are increasingly driven by post-tr...
KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning | arXiv — Robotic systems that interact with the physical world must reason about kinem...
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing | arXiv — Recent advances in large language models have led to strong performance on re...
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models | arXiv — The landscape of high-performance image generation models is currently shifti...
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving | arXiv — We introduce ReflectDrive-2, a masked discrete diffusion planner with separat...
Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Environments | arXiv — This paper identifies a critical yet underexplored challenge in reasoning ali...