AI Paper Digest, May 19, 2026 (HuggingFace Daily Papers)

Source: https://huggingface.co/papers | Collected: 2026-05-19

📌 Highlights

  1. Learning POMDP World Models from Observations with Language-Model Priors | arXiv [Highlight] Whether navigating a building, operating a robot, or playing a game, an agent... 💡 Builds world models from language priors plus POMDPs; a new paradigm for agent perception and decision-making that on-device AI can borrow from.
  2. Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution | arXiv [Highlight] Large language models (LLMs) still struggle with the rigorous reasoning deman... 💡 Agentic evolution improves LLM reasoning; the automated code-optimization approach merits deeper study.
  3. Unlocking Dense Metric Depth Estimation in VLMs | arXiv [Highlight] Vision-Language Models (VLMs) excel at 2D tasks such as grounding and caption... 💡 VLMs move from 2D to 3D depth perception; a key step toward a leap in on-device visual understanding.

📋 Also Worth Noting

  1. PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control | arXiv — Large vision-language models have significantly advanced GUI agents, enabling...
  2. LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs | arXiv — The fundamental challenge in scaling Video Large Language Models (Video LLMs)...
  3. Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models | arXiv — Large Reasoning Models (LRMs) achieve strong performance by generating long c...
  4. From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing | arXiv — Modern image editing models produce realistic results but struggle with abstr...
  5. Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR | arXiv — Reinforcement learning with verifiable rewards (RLVR) has emerged as a scalab...
  6. EndPrompt: Efficient Long-Context Extension via Terminal Anchoring | arXiv — Extending the context window of large language models typically requires trai...
  7. Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning | arXiv — We audit the multimodal-physics evaluation pipeline end-to-end and document t...