LegalRikai: Open Benchmark -- A Benchmark for Complex Japanese Corporate Legal Tasks

arXiv (cs.CL), Monday, December 15, 2025 at 5:00:00 AM
  • LegalRikai: Open Benchmark is a new benchmark for evaluating complex Japanese corporate legal tasks. It comprises four tasks created under the guidance of legal professionals and 100 samples that require long-form, structured outputs. It has undergone both human and automated evaluation using language models such as GPT-5 and Claude Opus 4.1.
  • The development of the LegalRikai benchmark is significant as it addresses the need for a structured evaluation framework in the legal domain, highlighting the challenges faced by AI models in document-level editing and the importance of aligning automated evaluations with human judgment.
  • The benchmark reflects a growing trend in AI toward specialized benchmarks that assess not only task completion but also the accuracy and reliability of outputs. It parallels other recent benchmarks aimed at improving the factual accuracy of generative AI models and at evaluating AI capabilities across various domains.
— via World Pulse Now AI Editorial System
