CellARC: Measuring Intelligence with Cellular Automata

arXiv (cs.LG), Wednesday, November 12, 2025 at 5:00:00 AM
CellARC is a synthetic benchmark for measuring intelligence, built on multicolor 1D cellular automata. Released on November 12, 2025, it provides 95,000 training episodes and two test splits for evaluation, which supports rapid iteration and controlled sampling. The benchmark's design decouples generalization from anthropomorphic priors, enabling reproducible studies of how quickly models can infer new rules under tight budgets. A 10M-parameter vanilla transformer achieved 58.0% per-token accuracy on interpolation tasks and 32.4% on extrapolation tasks, outperforming recent recursive models. The larger GPT-5 High model reached 62.3% and 48.1%, respectively, on subsets of 100 test tasks. An ensemble that selects between the transformer and the best-performing model reached 65.4% and 35.5%, respectively, highlighting neuro-symbolic complementarity. This benchmark not only enhances our …
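To make the terms concrete, here is a minimal Python sketch of a multicolor 1D cellular automaton update and of per-token accuracy as a metric. It is an illustration under stated assumptions, not CellARC's actual generator or scoring code: the function names (`random_rule`, `step`, `per_token_accuracy`), the lookup-table rule encoding, and the periodic (wrap-around) boundary are all hypothetical choices made for this example.

```python
import random


def random_rule(num_colors, radius, seed=0):
    """Build a random lookup table: one output color per neighborhood.

    Hypothetical encoding: a neighborhood of 2*radius + 1 cells is read as
    a base-`num_colors` number and used as an index into the table.
    """
    rng = random.Random(seed)
    window = 2 * radius + 1
    return [rng.randrange(num_colors) for _ in range(num_colors ** window)]


def step(state, rule, num_colors, radius):
    """Apply one synchronous update to a 1D state.

    Assumes periodic boundaries (the row wraps around); the benchmark
    itself may handle boundaries differently.
    """
    n = len(state)
    out = []
    for i in range(n):
        idx = 0
        # Read the neighborhood left to right as base-`num_colors` digits.
        for offset in range(-radius, radius + 1):
            idx = idx * num_colors + state[(i + offset) % n]
        out.append(rule[idx])
    return out


def per_token_accuracy(predicted, target):
    """Fraction of positions where the predicted cell equals the target."""
    if not target or len(predicted) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted, target))
    return correct / len(target)


if __name__ == "__main__":
    colors, radius = 3, 1
    rule = random_rule(colors, radius, seed=42)
    state = [0, 1, 2, 1, 0, 2, 2, 1]
    target = step(state, rule, colors, radius)
    # A trivial "model" that predicts no change, scored per token.
    print(per_token_accuracy(state, target))
```

In this framing, a model sees a few input/output rows produced by a hidden rule and must predict the output row for a new input; per-token accuracy then counts how many cells it gets right.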
— via World Pulse Now AI Editorial System
