HipKittens: Fast and Furious AMD Kernels

arXiv (cs.LG), Wednesday, November 12, 2025 at 5:00:00 AM
The paper 'HipKittens: Fast and Furious AMD Kernels' presents a programming framework for writing high-performance AI kernels on AMD GPUs. AMD kernels have traditionally relied heavily on raw assembly, which is hard to scale across diverse AI workloads. HipKittens introduces tile-based abstractions and shows that they generalize to AMD architectures, with validation on the CDNA3 and CDNA4 platforms. In the reported evaluations, HipKittens kernels compete with AMD's hand-optimized assembly kernels on tasks such as GEMMs and attention, and they consistently outperform existing compiler baselines by a factor of 1.2 to 2.4. The work gives developers a way to use the compute and memory bandwidth of AMD GPUs without writing assembly, which could broaden the adoption of AMD GPUs in AI applications.
— via World Pulse Now AI Editorial System
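
For readers unfamiliar with the term, the sketch below illustrates the general idea behind a tile-based GEMM: the output matrix is computed one small block at a time, so each block of the inputs can be reused while it is held in fast memory. It is a plain Python illustration only, not HipKittens code and not its API; the function name `tiled_matmul` and the `tile` parameter are hypothetical and chosen for this example.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), visiting the work one tile at a time.

    Hypothetical illustration of tiling; not taken from the HipKittens paper.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # Outer loops walk over tiles of the output and of the shared dimension.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Inner loops do the arithmetic inside a single tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = tiled_matmul(a, b, tile=2)
```

On a GPU, the inner tile loops would map to hardware matrix instructions and the tiles would live in registers or shared memory. Those details are what a framework like the one described above is meant to abstract, and they are outside the scope of this sketch.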
