CLEV: LLM-Based Evaluation Through Lightweight Efficient Voting for Free-Form Question-Answering
Positive · Artificial Intelligence
CLEV, introduced in a recent arXiv submission, addresses the ongoing challenge of evaluating free-form question answering (QA), where diverse, open-ended responses complicate assessment. Traditional automatic metrics often fail to capture the semantic nuances of such responses, leading to inconsistent evaluations. The proposed method, Consensus via Lightweight Efficient Voting (CLEV), uses two primary LLMs to judge each answer and invokes a third only when the two disagree. This design improves the reliability of evaluations while avoiding unnecessary model calls, making it a scalable and resource-efficient solution. Experiments, including human evaluations, demonstrate CLEV's effectiveness, establishing it as a robust framework for evaluating LLMs in free-form QA scenarios.
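The voting protocol described above can be illustrated with a minimal sketch. The function and judge names below are assumptions for illustration, not the paper's actual implementation or prompts; each judge is assumed to be a callable that returns a verdict such as "correct" or "incorrect".

```python
from collections import Counter

def clev_judge(question, reference, candidate, judge_a, judge_b, judge_c):
    """Hypothetical sketch of the CLEV voting scheme: two primary LLM
    judges score the answer, and a third is consulted only when they
    disagree, with the majority verdict returned."""
    verdict_a = judge_a(question, reference, candidate)
    verdict_b = judge_b(question, reference, candidate)

    # Agreement between the two primary judges: accept their shared
    # verdict without any additional model call.
    if verdict_a == verdict_b:
        return verdict_a

    # Disagreement: invoke the third judge as a tie-breaker and take
    # the majority vote over all three verdicts.
    verdict_c = judge_c(question, reference, candidate)
    votes = Counter([verdict_a, verdict_b, verdict_c])
    return votes.most_common(1)[0][0]
```

In this sketch, the third judge's cost is incurred only on the disputed subset of examples, which is what makes the scheme lightweight relative to always querying three models.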
— via World Pulse Now AI Editorial System
