Arm SIMD Loops – C, ACLE intrinsics, inline assembly – Neon, SVE, SME (gitlab.arm.com)

0 points 18 hours ago | visit original

🤖 AI Summary

Arm published "simd-loops," a reference repository of common SIMD loop patterns implemented in plain C, ACLE intrinsics, and inline assembly for Neon, SVE, and SME. The collection covers canonical loop idioms (strip-mining and unrolling, prolog/epilog, tail handling, and predication), memory-access patterns, and small microkernels, so engineers can compare idiomatic C, intrinsics, and hand-written assembly side by side across fixed-width Neon and scalable SVE/SME targets. This matters for the AI/ML community because vectorization and matrix operations are at the heart of inference and training kernels. The repo serves as a reference for writing correct, high-performance code that handles partial vectors, vector-length-agnostic (VLA) SVE predication, and the newer SME matrix primitives, and for seeing where compilers miss vectorization opportunities. In practice, it’s a toolkit for porting and optimizing kernels, experimenting with prefetching/striping strategies, profiling assembly hotspots, and building reproducible benchmarks across Arm ISAs.
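To make the "strip-mining plus tail handling" idiom concrete, here is a minimal sketch of an element-wise float add. It is not taken from the simd-loops repository: the function name `add_f32` and the fixed strip width of 4 are assumptions chosen for illustration. On targets that define `__ARM_NEON` it uses the standard ACLE intrinsics `vld1q_f32`, `vaddq_f32`, and `vst1q_f32`; elsewhere it falls back to a plain C loop over 4-element strips so the same structure can be compiled and checked on any machine.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from simd-loops): dst[i] = a[i] + b[i] for i < n. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (dst && a && b));

    size_t i = 0;

#if defined(__ARM_NEON)
    /* Main loop: one 128-bit Neon vector (4 floats) per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#else
    /* Portable stand-in: same 4-wide strips, written as scalar C. */
    for (; i + 4 <= n; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane)
            dst[i + lane] = a[i + lane] + b[i + lane];
    }
#endif

    /* Tail (epilog): the 0 to 3 elements that do not fill a whole strip. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Calling `add_f32(dst, a, b, 7)` processes one full strip of 4 and then 3 tail elements. With SVE, the scalar tail is typically replaced by a predicated final iteration (for example, a predicate built with `svwhilelt`), which is the vector-length-agnostic style the summary refers to.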
