🤖 AI Summary
Microsoft announced Foundry Local on Android, a new way to run optimized open‑source models entirely on mobile devices so apps can perform inference without cloud round trips. The release includes a Speech API powered by Whisper that transcribes audio on‑device with low latency and streaming output; audio stays local by default. Early integrations, such as PhonePe's preview, point to feasibility at scale. The move targets privacy‑sensitive and connectivity‑constrained scenarios while cutting cloud costs and round‑trip latency for features like voice‑driven forms, offline assistants, and payments UX.
Technically, the Foundry Local SDK provides self‑contained packaging (no separate server executables), a smaller runtime footprint, automatic device runtime/driver selection (including Windows ML detection), and simple APIs to download, load, and run models from a Foundry catalog (examples include qwen and whisper variants). The APIs support chat completions and audio transcription with OpenAI‑style request/response and streaming, plus an optional OpenAI‑compliant web server for integrations (LangChain, OpenAI SDK, Web UI). Microsoft also previewed Arc‑enabled Kubernetes support to run Foundry Local in containers for edge, hybrid, sovereign, and disconnected on‑prem deployments, enabling validated dev/test workloads to move into industrial and regulated environments. Preview signups are open, and Microsoft plans broader platform and multi‑modal expansions.
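As a rough illustration of what the OpenAI‑compliant web server described above enables, here is a minimal sketch using the standard OpenAI Python client pointed at a local endpoint. The base URL, port, and model names below are assumptions for illustration, not documented Foundry Local values; the actual endpoint and catalog identifiers may differ.

```python
# Minimal sketch: talking to a locally running, OpenAI-compatible server
# (such as the optional one Foundry Local can expose). The URL, port, and
# model names are placeholders/assumptions, not confirmed defaults.
from openai import OpenAI

# No real API key is needed for a purely local server; the value is arbitrary.
client = OpenAI(base_url="http://localhost:5273/v1", api_key="not-needed-locally")

# Chat completion with streamed output from a locally loaded model.
stream = client.chat.completions.create(
    model="qwen2.5-0.5b",  # placeholder name for a model pulled from the Foundry catalog
    messages=[{"role": "user", "content": "Summarize this receipt in one line."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

# Audio transcription against a local Whisper variant; the audio file never leaves the machine.
with open("memo.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-tiny",  # placeholder catalog name
        file=audio_file,
    )
print(transcript.text)
```

Because the server speaks the OpenAI wire format, the same pattern should carry over to LangChain or other OpenAI‑SDK‑based tooling by swapping in the local base URL.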