CodegenBench: Can LLMs Write Efficient Code Across Architectures? (arxiv.org)

🤖 AI Summary
Researchers have introduced CodegenBench, a benchmark suite for assessing how well large language models (LLMs) generate efficient code across architectures, with a focus on CPU-oriented high-performance computing (HPC). The suite comprises 106 standard Basic Linear Algebra Subprograms (BLAS) routines and 20 specialized computational kernels tailored to supercomputing platforms such as x86_64, Sunway, and Kunpeng. It targets a gap in the existing literature: LLMs have mostly been evaluated in general-purpose programming settings, with little analysis of how they perform across diverse HPC contexts. The findings show that LLMs generate well-optimized code for well-documented architectures like x86_64 but struggle significantly with domain-specific architectures that lack extensive training data, which points to limited cross-platform generalization. The results also indicate that LLMs perform best on moderately difficult tasks that require only short code snippets. With an open-source dataset and evaluation infrastructure, CodegenBench aims to foster further research on LLM-driven code generation for HPC.