Positional Bias in Long-Document Ranking: Impact, Assessment, and Mitigation

arXiv — cs.CL · Thursday, November 13, 2025 at 5:00:00 AM
The study assessed more than 20 Transformer-based long-document ranking models, including recent ones such as RankGPT, against a simple FirstP baseline that scores only the beginning of each document. Despite their added sophistication, none of these models outperformed FirstP by more than 5% on average across benchmarks such as MS MARCO, TREC DL, and Robust04. The researchers attribute this limited improvement to a positional bias in the benchmarks themselves: the most relevant passages tend to appear at the start of documents. This bias was present not only in long-document datasets but, surprisingly, also in short-document collections such as BEIR. To probe the issue further, the team introduced a diagnostic dataset, MS MARCO FarRelevant, in which the relevant information is deliberately placed beyond the first 512 tokens. On this dataset, many long-context models performed at random levels, suggesting they fail to exploit relevant content that appears late in a document. The study emphasizes the necessity for careful b…
— via World Pulse Now AI Editorial System
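For a concrete picture of the two mechanisms described above, here is a minimal sketch (not from the paper or its code): a FirstP-style view that exposes only a document's first 512 tokens to a ranker, and a FarRelevant-style construction that pads a document with non-relevant text so the relevant passage falls outside that window. Whitespace tokenization stands in for the models' actual tokenizers, and the helper names (`first_p_view`, `make_far_relevant_doc`) are illustrative, not taken from the dataset's real construction.

```python
# Sketch of (1) a FirstP-style baseline that only sees the first 512 tokens
# of a document and (2) a FarRelevant-style diagnostic document whose
# relevant passage lies beyond that 512-token window.
# Assumptions: whitespace tokenization as a stand-in for a model tokenizer;
# helper names are hypothetical.

FIRST_P_WINDOW = 512  # number of tokens visible to a FirstP-style ranker


def first_p_view(document: str, window: int = FIRST_P_WINDOW) -> str:
    """Return the prefix of the document that a FirstP-style model would score."""
    return " ".join(document.split()[:window])


def make_far_relevant_doc(relevant_passage: str, filler_sentence: str,
                          window: int = FIRST_P_WINDOW) -> str:
    """Prepend non-relevant filler until the relevant passage starts
    beyond the first `window` tokens, mimicking the FarRelevant setup."""
    filler: list[str] = []
    while len(" ".join(filler).split()) < window:
        filler.append(filler_sentence)
    return " ".join(filler) + " " + relevant_passage


if __name__ == "__main__":
    passage = "The answer the query is looking for appears in this sentence."
    filler = "This sentence is topically unrelated padding text."
    doc = make_far_relevant_doc(passage, filler)

    visible = first_p_view(doc)
    print("relevant passage visible to FirstP:", passage in visible)  # False
    print("relevant passage present in full document:", passage in doc)  # True
```

A FirstP ranker therefore never sees the relevant passage in such a document, which is consistent with the summary's observation that many long-context models score at random levels on MS MARCO FarRelevant.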
