$\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles
BUS is a large and diverse multimodal benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles: puzzles that creatively combine images, symbols, and letters to encode words or phrases. Solving such puzzles requires models to move beyond simple recognition toward integrated interpretation, exercising the cognitive and reasoning skills that these puzzles demand. By targeting these skills, BUS probes key capabilities needed for more sophisticated multimodal understanding, and it aligns with the growing need for models that can interpret complex, multimodal information.
