AI Paper Digest, May 7, 2026 (HuggingFace Daily Papers)

Source: https://huggingface.co/papers Collected: 2026-05-07

📌 Key focus

  1. Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning | arXiv [Key focus] Reinforcement learning (RL) has become a central post-training tool for impro...

💡 Advances in core agent training techniques, supporting multi-agent collaboration

  2. ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue | arXiv [Key focus] The rapid advancement of Multimodal Large Language Models (MLLMs) has empower...

💡 A key benchmark for deploying agents in the physical world

  3. BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis | arXiv [Key focus] Automatic generation of executable Blender code from natural language remains...

💡 Multimodal generation becoming practical for 3D creation

📋 Also worth noting

  1. PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments | arXiv — We introduce PhysicianBench, a benchmark for evaluating LLM agents on physici...
  2. Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL | arXiv — The standard post-training recipe for large multimodal models (LMMs) applies ...
  3. Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation | arXiv — Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful par...
  4. D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models | arXiv — The landscape of high-performance image generation models is currently shifti...
  5. Healthcare AI GYM for Medical Agents | arXiv — Clinical reasoning demands multi-step interactions -- gathering patient histo...
  6. Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs | arXiv — While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarka...
  7. T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning | arXiv — Recent progress in multi-turn reinforcement learning (RL) has significantly i...