🤖 AI Summary
IBM has announced commercial availability of the Spyre Accelerator, a 5nm system‑on‑chip designed for low‑latency inference of generative and agentic AI on enterprise platforms. Spyre ships as a 75W PCIe card with 32 accelerator cores and roughly 25.6 billion transistors; up to 48 cards can be clustered in IBM Z/LinuxONE systems and up to 16 in Power11 servers. General availability begins Oct. 28 for IBM z17 and LinuxONE 5, with Power11 support arriving in early December. The product grew out of work at IBM Research's AI Hardware Center and has been productized to run alongside the Telum II processor on mainframes and with on‑chip MMA on Power, plus a one‑click AI services catalog for Power deployments.
For the AI/ML community, Spyre targets a growing need to run multi‑model, agentic workloads with mainframe‑grade security, resilience, and energy efficiency, keeping sensitive data on premises while delivering low latency and high transaction throughput for real‑time inference (cited use cases include fraud detection and retail automation). Key implications: tighter co‑location of mission‑critical workloads and AI inference, hardware scaling options for throughput, and improved data‑conversion and prompt throughput for generative workflows (IBM reports ingestion of more than 8M documents/hour at prompt size 128 in a single‑card test). Spyre signals IBM's push to bring advanced inference into enterprise datacenters, where security, determinism, and integration with legacy stacks matter.