HipKittens: Fast and Furious AMD Kernels

arXiv — cs.LG · Wednesday, November 12, 2025 at 5:00:00 AM
The paper 'HipKittens: Fast and Furious AMD Kernels' presents a programming framework aimed at improving the performance of AI kernels on AMD GPUs. High-performance AMD kernels have traditionally relied on raw assembly, which scales poorly across diverse AI workloads. HipKittens introduces tile-based abstractions that generalize to AMD architectures, validated on the CDNA3 and CDNA4 platforms. In performance evaluations, HipKittens kernels compete with AMD's hand-optimized assembly kernels on workloads such as GEMMs and attention, and consistently outperform existing compiler baselines by factors of 1.2 to 2.4. This work gives developers a path to AMD's compute and memory-bandwidth capabilities without the complexity of assembly programming, potentially broadening the adoption of AMD GPUs in AI applications.
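HipKittens' actual API targets HIP on AMD GPUs and is not reproduced here; as a rough illustration of what "tile-based abstraction" means, the following plain-C++ sketch writes a GEMM in terms of fixed-size tiles rather than raw per-element index arithmetic. The `TILE` size and function names are hypothetical, chosen for the example only.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch only, not the HipKittens API: the kernel body is
// expressed over TILE x TILE blocks, the unit that tile-based frameworks
// map onto a GPU's matrix-core instructions and shared-memory layout.
constexpr int TILE = 4; // hypothetical tile size; real kernels tune this per architecture

// C[M x N] += A[M x K] * B[K x N], all row-major, dimensions divisible by TILE.
void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int M, int N, int K) {
    for (int i0 = 0; i0 < M; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < K; k0 += TILE)
                // One tile-level multiply-accumulate: a TILE x TILE block of A
                // times a TILE x TILE block of B, accumulated into C's block.
                for (int i = i0; i < i0 + TILE; ++i)
                    for (int j = j0; j < j0 + TILE; ++j) {
                        float acc = 0.0f;
                        for (int k = k0; k < k0 + TILE; ++k)
                            acc += A[i * K + k] * B[k * N + j];
                        C[i * N + j] += acc;
                    }
}
```

On a GPU, each tile-level multiply-accumulate would map to a hardware matrix instruction and the loops over tiles to the thread-block grid; the point of the abstraction is that the programmer reasons only at the tile level.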
— via World Pulse Now AI Editorial System


Recommended Readings
Sector HQ Weekly Digest - November 17, 2025
Neutral · Artificial Intelligence
The Sector HQ Weekly Digest for November 17, 2025, highlights the latest developments in the AI industry, focusing on the performance of top companies. OpenAI leads with a score of 442385.7 and 343 events, followed by Anthropic and Amazon. The report also notes significant movements, with Sony jumping 277 positions in the rankings, reflecting the dynamic nature of the AI sector.
MMA-Sim: Bit-Accurate Reference Model of Tensor Cores and Matrix Cores
Neutral · Artificial Intelligence
The paper presents MMA-Sim, a bit-accurate reference model that analyzes the arithmetic behaviors of matrix multiplication accelerators (MMAs) used in modern GPUs, specifically NVIDIA Tensor Cores and AMD Matrix Cores. With the increasing computational demands of deep neural networks (DNNs), the distinct arithmetic specifications of these MMAs can lead to numerical imprecision, affecting DNN training and inference stability. MMA-Sim reveals detailed arithmetic algorithms and confirms bitwise equivalence with real hardware through extensive validation.
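MMA-Sim's actual model is not reproduced here, but a small C++ example can show why accumulator precision is the kind of arithmetic detail it pins down: the same dot product accumulated in fp32 versus fp64 produces different bit patterns, so without a bit-accurate specification two accelerators can legitimately disagree. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch only, not MMA-Sim's model: two dot products that
// differ only in accumulator precision, mirroring how MMAs with distinct
// arithmetic specifications can diverge on identical inputs.
float dot_f32_accum(const std::vector<float>& a, const std::vector<float>& b) {
    float acc = 0.0f; // fp32 accumulator: each add rounds to 24-bit mantissa
    for (size_t i = 0; i < a.size(); ++i) acc += a[i] * b[i];
    return acc;
}

float dot_f64_accum(const std::vector<float>& a, const std::vector<float>& b) {
    double acc = 0.0; // fp64 accumulator: products accumulated without fp32 rounding
    for (size_t i = 0; i < a.size(); ++i)
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return static_cast<float>(acc); // single rounding at the end
}
```

With inputs like 10,000 terms of 0.1f, the two paths return visibly different fp32 results; a bit-accurate reference model specifies exactly which rounding sequence the hardware performs.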