论文精选 - OpenSeed

研究领域

会议目录

筛选偏好

2025

论文荣誉

附加资源

共找到 1630条结果

EMNLP 2025

s1: Simple test-time scaling

Niklas Muennighoff,Zitong Yang,Weijia Shi,Xiang Lisa Li,Li Fei-Fei,Hannaneh Hajishirzi,Luke Zettlemoyer,Percy Liang,Emmanuel Candes,Tatsunori Hashimoto

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI

PDF Code(6.6k)Project

EMNLP 2025

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Xiaoxi Li,Guanting Dong,Jiajie Jin,Yuyao Zhang,Yujia Zhou,Yutao Zhu,Peitian Zhang,Zhicheng Dou

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce **Search-o1**, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness of LRMs in complex reasoning tasks, paving the way for advanced deep research systems. The code is available athttps://github.com/RUC-NLPIR/Search-o1.

PDF Code(1.2k)Project

EMNLP 2025

SoundMind:RL-Incentivized Logic Reasoning for Audio-Language Models

Xingjian Diao,Chunhui Zhang,Keyi Kong,Weiyi Wu,Chiyu Ma,Zhongyu Ouyang,Peijun Qing,Soroush Vosoughi,Jiang Gui

While large language models have demonstrated impressive reasoning abilities, their extension to the audio modality, particularly within large audio-language models (LALMs), remains underexplored. Addressing this gap requires a systematic approach that involves a capable base model, high-quality reasoning-oriented audio data, and effective training algorithms. In this work, we present a comprehensive solution for audio logical reasoning (ALR) tasks: we introduce SoundMind, a dataset of 6,446 audio

PDF Code(1.1k)Project

EMNLP 2025

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Yuxiang Zheng,Dayuan Fu,Xiangkun Hu,Xiaojie Cai,Lyumanshan Ye,Pengrui Lu,Pengfei Liu

Large Language Models (LLMs) with web search capabilities show significant potential for deep research, yet current methods

PDF Code(715)Project

EMNLP 2025

Towards Robust Mathematical Reasoning

Thang Luong,Dawsen Hwang,Hoang H Nguyen,Golnaz Ghiasi,Yuri Chervonyi,Insuk Seo,Junsu Kim,Garrett Bingham,Jonathan Lee,Swaroop Mishra,Alex Zhai,Huiyi Hu,Henryk Michalewski,Jimin Kim,Jeonghyun Ahn,Junhwi Bae,Xingyou Song,Trieu Hoang Trinh,Quoc V Le,Junehyuk Jung

Finding the right north-star metrics is highly critical for advancing mathematical reasoning capabilities of foundation models, especially given that existing evaluations are either too easy or only focusing on getting correct short answers. To address these issues, we present IMO-Bench, a suite of advanced reasoning benchmarks that specifically targets the level of the International Mathematical Olympiad (IMO), the most prestigious venue for young mathematicians. IMOAnswerBench first tests models on 400 diverse Olympiad problems with verifiable short answers. IMO-ProofBench is the next-level evaluation for proof-writing capabilities, which includes both basic and advanced IMO problems as well as detailed grading guidelines to facilitate automatic grading. These benchmarks played a crucial role in our historic achievement of the gold-level performance at IMO 2025 with Gemini Deep Think (Luong and Lockhart, 2025). Our model achieved 80.0% on IMO-AnswerBench and 65.7% on the advanced IMO-ProofBench, surpassing the best non-Gemini models by large margins of 6.9% and 42.4% respectively. We also showed that autograders built with Gemini reasoning correlate well with human evaluations and construct IMO-GradingBench, with 1000 human gradings on proofs, to enable further progress in automatic evaluation of long-form answers. We hope that IMO-Bench will help the community towards advancing robust mathematical reasoning and release it at https://github.com/google-deepmind/superhuman/imobench.

PDF Code(625)Project

EMNLP 2025

BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering

Costas Mavromatis,Soji Adeshina,Vassilis N. Ioannidis,Zhen Han,Qi Zhu,Ian Robinson,Bryan Thompson,Huzefa Rangwala,George Karypis

Knowledge graph question answering (KGQA) presents significant challenges due to the structural and semantic variations across input graphs. Existing works rely on Large Language Model (LLM) agents for graph traversal and retrieval; an approach that is sensitive to traversal initialization, as it is prone to entity linking errors and may not generalize well to custom (

PDF Code(366)Project

EMNLP 2025

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation

Aurick Qiao,Zhewei Yao,Samyam Rajbhandari,Yuxiong He

LLM inference for enterprise applications, such as summarization, RAG, and code-generation, typically observe much longer prompt than generations, leading to high prefill cost and response latency. We present SwiftKV, a novel model transformation and distillation procedure targeted at reducing the prefill compute (in FLOPs) of prompt tokens while preserving high generation quality. First, SwiftKV prefills later layers

PDF Code(277)Project

EMNLP 2025

PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation

Tao Fan,Guoqiang Ma,Yuanfeng Song,Lixin Fan,Qiang Yang

Compressing Large Language Models (LLMs) into task-specific Small Language Models (SLMs) encounters two significant challenges: safeguarding domain-specific knowledge privacy and managing limited resources. To tackle these challenges, we propose PPC-GPT, a novel unified framework that systematically addresses both privacy preservation and model compression in federated settings. PPC-GPT works on a server-client federated architecture, where the client sends differentially private (DP) perturbed task-specific data to the server

PDF Code(256)Project

EMNLP 2025

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey

Tianxin Xie,Yan Rong,Pengfei Zhang,Wenwu Wang,Li Liu

Text-to-speech (TTS) has advanced from generating natural-sounding speech to enabling fine-grained control over attributes like emotion, timbre, and style. Driven by rising industrial demand and breakthroughs in deep learning, e.g., diffusion and large language models (LLMs), controllable TTS has become a rapidly growing research area. This survey provides **the first** comprehensive review of controllable TTS methods, from traditional control techniques to emerging approaches using natural language prompts. We categorize model architectures, control strategies, and feature representations, while also summarizing challenges, datasets, and evaluations in controllable TTS. This survey aims to guide researchers and practitioners by offering a clear taxonomy and highlighting future directions in this fast-evolving field. One can visit https://github.com/imxtx/awesome-controllabe-speech-synthesis for a comprehensive paper list and updates.

PDF Code(216)Project

EMNLP 2025

Scaling Rich Style-Prompted Text-to-Speech Datasets

Anuj Diwan,Zhisheng Zheng,David Harwath,Eunsol Choi

We introduce Paralinguistic Speech Captions (ParaSpeechCaps), a large-scale dataset that annotates speech utterances with rich style captions. While rich abstract tags (e.g. guttural, nasal, pained) have been explored in small-scale human-annotated datasets, existing large-scale datasets only cover basic tags (e.g. low-pitched, slow, loud). We combine off-the-shelf text and speech embedders, classifiers and an audio language model to automatically scale rich tag annotations for the first time. ParaSpeechCaps covers a total of 59 style tags, including both speaker-level intrinsic tags and utterance-level situational tags. It consists of 282 hours of human-labelled data (PSC-Base) and 2427 hours of automatically annotated data (PSC-Scaled). We finetune Parler-TTS, an open-source style-prompted TTS model, on ParaSpeechCaps, and achieve improved style consistency (+7.9% Consistency MOS) and speech quality (+15.5% Naturalness MOS) over the best performing baseline that combines existing rich style tag datasets. We ablate several of our dataset design choices to lay the foundation for future work in this space. Our dataset, models and code are released at https://github.com/ajd12342/paraspeechcaps .

PDF Code(155)Project

继续下滑，发现更多