Beyond Ranked Lists: The SARAL Framework for Cross-Lingual Document Set Retrieval

Abstract

Machine Translation for English Retrieval of Information in Any Language (MATERIAL) is an IARPA initiative aimed at advancing the state of the art in cross-lingual information retrieval (CLIR). This report describes in detail the MATERIAL effort of the Information Sciences Institute's (ISI's) Summarization and domain-Adaptive Retrieval Across Language (SARAL) team. Specifically, we outline our team's novel approach to CLIR, with an emphasis on retrieving a query-relevant document set rather than only a ranked list of documents. In MATERIAL's Phase-3 evaluations, SARAL outperformed the other teams in five of six evaluation conditions spanning three languages (Farsi, Kazakh, and Georgian).

Date
November 5, 2025
Authors
Shantanu Agarwal, Joel Barry, Elizabeth Boschee, Scott Miller
Journal
arXiv preprint arXiv:2511.03228