Motif-2-12.7B Technical Report
On November 12, 2025, the Motif-2-12.7B model was unveiled, marking a significant step forward in the efficiency of large language models. Building on its predecessor, Motif-2.6B, the new model integrates Grouped Differential Attention (GDA) to improve representational efficiency by separating signal-preserving and noise-control attention pathways. The model is pre-trained on 5.5 trillion tokens using a curriculum-driven data scheduler that adjusts the data composition as training progresses. Training uses the MuonClip optimizer together with system-level optimizations such as fused PolyNorm activations and the Parallel Muon algorithm, yielding improved throughput and memory efficiency. After pre-training, a three-stage supervised fine-tuning pipeline strengthens the model's instruction following and compositional understanding. Motif-2-12.7B demonstrates competitive performance across various benchmarks, showcasing …
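To make the GDA idea concrete, below is a minimal sketch of a grouped differential attention step. It assumes a simple head-sharing scheme in which several signal heads share one noise-control head (analogous to grouped-query attention); names such as `group_size` and `lam` are illustrative and may differ from the report's exact formulation.

```python
# Hedged sketch of Grouped Differential Attention (GDA), not the report's exact method.
# Assumption: signal heads outnumber noise-control heads, and each noise-control
# head is shared by `group_size` signal heads.
import math
import torch
import torch.nn.functional as F


def grouped_differential_attention(q_sig, k_sig, q_noise, k_noise, v,
                                   group_size: int, lam: float = 0.5):
    """q_sig, k_sig: (batch, heads, seq, dim) signal-branch projections.
    q_noise, k_noise: (batch, heads // group_size, seq, dim) shared noise branch.
    v: (batch, heads, seq, dim) values. Returns (batch, heads, seq, dim)."""
    d = q_sig.size(-1)

    # Standard softmax attention map for the signal branch.
    attn_sig = F.softmax(q_sig @ k_sig.transpose(-2, -1) / math.sqrt(d), dim=-1)

    # Noise-branch attention map, computed once per group of signal heads
    # and broadcast across the group (the "grouped" part of the sketch).
    attn_noise = F.softmax(q_noise @ k_noise.transpose(-2, -1) / math.sqrt(d), dim=-1)
    attn_noise = attn_noise.repeat_interleave(group_size, dim=1)

    # Differential map: subtracting a scaled noise map suppresses attention mass
    # shared by both branches, leaving a sharper signal component.
    return (attn_sig - lam * attn_noise) @ v


# Toy usage: 8 signal heads sharing 2 noise-control heads (group_size = 4).
b, h, t, d, g = 2, 8, 16, 32, 4
out = grouped_differential_attention(
    torch.randn(b, h, t, d), torch.randn(b, h, t, d),
    torch.randn(b, h // g, t, d), torch.randn(b, h // g, t, d),
    torch.randn(b, h, t, d), group_size=g)
print(out.shape)  # torch.Size([2, 8, 16, 32])
```

The sketch only illustrates how a shared noise-control branch can be subtracted from per-head signal attention; the actual head allocation, scaling of the subtraction term, and fused kernels described in the report are more involved.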
— via World Pulse Now AI Editorial System
