Motif 2 12.7B technical report

arXiv — cs.CL · Wednesday, November 12, 2025, 5:00 AM
On November 12, 2025, the Motif-2-12.7B model was unveiled, representing a significant advancement in the efficiency of large language models. Building on its predecessor, Motif-2.6B, this new model integrates Grouped Differential Attention (GDA) to enhance representational efficiency by effectively managing signal and noise in attention pathways. Pre-trained on an extensive dataset of 5.5 trillion tokens, the model employs a curriculum-driven data scheduler that optimally adjusts data composition. The training process utilizes the MuonClip optimizer and advanced techniques such as fused PolyNorm activations and the Parallel Muon algorithm, resulting in improved throughput and memory efficiency. Following the pre-training phase, a three-stage supervised fine-tuning pipeline is implemented to refine the model's ability to adhere to instructions and enhance its compositional understanding. The Motif-2-12.7B model demonstrates competitive performance across various benchmarks, showcasing …
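To make the Grouped Differential Attention idea concrete, below is a minimal sketch in PyTorch. It assumes GDA follows the differential-attention pattern of subtracting a learned-weighted "noise" attention map from a "signal" map, with an asymmetric split of heads between the two roles; the class name, the head-grouping scheme, and parameters such as `signal_heads`, `noise_heads`, and `lambda_init` are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of grouped differential attention: a signal attention pathway
# minus a learned-weighted noise pathway, with fewer noise heads shared across
# groups of signal heads. All names and the grouping scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedDifferentialAttention(nn.Module):
    def __init__(self, d_model: int, signal_heads: int, noise_heads: int,
                 lambda_init: float = 0.5):
        super().__init__()
        assert signal_heads % noise_heads == 0, "each noise head serves a group of signal heads"
        self.h_sig, self.h_noi = signal_heads, noise_heads
        self.d_head = d_model // signal_heads
        # separate projections for the signal and noise attention pathways
        self.qkv_sig = nn.Linear(d_model, 3 * signal_heads * self.d_head, bias=False)
        self.qk_noi = nn.Linear(d_model, 2 * noise_heads * self.d_head, bias=False)
        self.out = nn.Linear(signal_heads * self.d_head, d_model, bias=False)
        # learnable subtraction weight, one per noise head
        self.lam = nn.Parameter(torch.full((noise_heads,), lambda_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv_sig(x).chunk(3, dim=-1)
        qn, kn = self.qk_noi(x).chunk(2, dim=-1)
        # reshape to (batch, heads, tokens, head_dim)
        shape = lambda z, h: z.view(b, t, h, self.d_head).transpose(1, 2)
        q, k, v = shape(q, self.h_sig), shape(k, self.h_sig), shape(v, self.h_sig)
        qn, kn = shape(qn, self.h_noi), shape(kn, self.h_noi)
        scale = self.d_head ** -0.5
        attn_sig = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
        attn_noi = F.softmax(qn @ kn.transpose(-2, -1) * scale, dim=-1)
        # each noise head's map is shared across its group of signal heads
        group = self.h_sig // self.h_noi
        attn_noi = attn_noi.repeat_interleave(group, dim=1)
        lam = self.lam.repeat_interleave(group).view(1, self.h_sig, 1, 1)
        attn = attn_sig - lam * attn_noi  # subtract the noise pathway
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(y)


# Example usage (hypothetical sizes): 8 signal heads served by 2 noise heads.
layer = GroupedDifferentialAttention(d_model=512, signal_heads=8, noise_heads=2)
out = layer(torch.randn(1, 16, 512))  # -> (1, 16, 512)
```

The asymmetric allocation is the point of the grouping: fewer heads are spent on noise control, so more of the head budget carries signal while the subtraction still suppresses common-mode attention noise.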
— via World Pulse Now AI Editorial System
