LegalRikai: Open Benchmark -- A Benchmark for Complex Japanese Corporate Legal Tasks

LegalRikai: Open Benchmark is designed to evaluate complex Japanese corporate legal tasks. It comprises four tasks created under the guidance of legal professionals and includes 100 samples that require long-form, structured outputs. It has undergone both human and automated evaluation using leading language models such as GPT-5 and Claude Opus 4.1.
The benchmark is significant because it addresses the need for a structured evaluation framework in the legal domain. It highlights the difficulty AI models have with document-level editing and the importance of aligning automated evaluation with human judgment.
The initiative reflects a growing trend in AI toward specialized benchmarks that assess not only task completion but also the accuracy and reliability of outputs. It parallels other recent benchmarks aimed at improving the factual accuracy of generative AI models and at evaluating AI capabilities across various domains.