AI Paper Digest, May 19, 2026 (HuggingFace Daily Papers)
Data source: https://huggingface.co/papers. Collected: 2026-05-19
📌 Highlights
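- Learning POMDP World Models from Observations with Language-Model Priors | arXiv — [Highlight] Whether navigating a building, operating a robot, or playing a game, an agent... 💡 Combines language-model priors with POMDPs to learn world models; a new paradigm for agent perception and decision-making that on-device AI could borrow from.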
- Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution | arXiv — [Highlight] Large language models (LLMs) still struggle with the rigorous reasoning deman... 💡 Uses agentic evolution to strengthen LLM reasoning; its approach to automated code optimization merits a closer look.
- Unlocking Dense Metric Depth Estimation in VLMs | arXiv — [Highlight] Vision-Language Models (VLMs) excel at 2D tasks such as grounding and caption... 💡 Extends VLMs from 2D tasks to 3D depth perception; a key step toward stronger on-device visual understanding.
📋 Also Worth Noting
- PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control | arXiv — Large vision-language models have significantly advanced GUI agents, enabling...
- LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs | arXiv — The fundamental challenge in scaling Video Large Language Models (Video LLMs)...
- Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models | arXiv — Large Reasoning Models (LRMs) achieve strong performance by generating long c...
- From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing | arXiv — Modern image editing models produce realistic results but struggle with abstr...
- Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR | arXiv — Reinforcement learning with verifiable rewards (RLVR) has emerged as a scalab...
- EndPrompt: Efficient Long-Context Extension via Terminal Anchoring | arXiv — Extending the context window of large language models typically requires trai...
- Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning | arXiv — We audit the multimodal-physics evaluation pipeline end-to-end and document t...