AI Paper Digest, 2026-05-07 (HuggingFace Daily Papers)

Source: https://huggingface.co/papers | Collected: 2026-05-07

📌 Highlights

💡 Advances in core agent training techniques that support multi-agent collaboration

ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue | arXiv — [Highlight] The rapid advancement of Multimodal Large Language Models (MLLMs) has empower...

💡 A key benchmark for deploying agents in the physical world

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis | arXiv — [Highlight] Automatic generation of executable Blender code from natural language remains...

💡 Making multimodal generation practical for 3D content creation

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments | arXiv — We introduce PhysicianBench, a benchmark for evaluating LLM agents on physici...
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL | arXiv — The standard post-training recipe for large multimodal models (LMMs) applies ...
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation | arXiv — Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful par...
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models | arXiv — The landscape of high-performance image generation models is currently shifti...
Healthcare AI GYM for Medical Agents | arXiv — Clinical reasoning demands multi-step interactions -- gathering patient histo...
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs | arXiv — While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarka...
T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning | arXiv — Recent progress in multi-turn reinforcement learning (RL) has significantly i...