Coral NPU: A full-stack platform for Edge AI

Coral NPU is a new full-stack, open-source platform designed to bring advanced AI directly to power-constrained edge devices and wearables. By prioritizing a matrix-first hardware architecture and a unified software stack, Google aims to overcome traditional bottlenecks in performance, ecosystem fragmentation, and data privacy. The platform enables always-on, low-power ambient sensing while providing developers with a flexible, RISC-V-based environment for deploying modern machine learning models.

Overcoming Edge AI Constraints

  • The platform addresses the "performance gap" where complex ML models typically exceed the power, thermal, and memory budgets of battery-operated devices.
  • It eliminates the "fragmentation tax" by providing a unified architecture, moving away from proprietary processors that require costly, device-specific optimizations.
  • On-device processing ensures a high standard of privacy and security by keeping personal context and data off the cloud.

AI-First Hardware Architecture

  • Unlike traditional chips, this architecture prioritizes the ML matrix engine over scalar compute to optimize for efficient on-device inference.
  • The design is built on RISC-V ISA compliant architectural IP blocks, offering an open and extensible reference for system-on-chip (SoC) designers.
  • The base design delivers performance in the 512 giga operations per second (GOPS) range while consuming only a few milliwatts of power.
  • The architecture is tailored for "always-on" use cases, making it ideal for hearables, AR glasses, and smartwatches.
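The headline numbers above imply an efficiency figure worth making explicit. A back-of-envelope calculation (assuming an illustrative 5 mW draw for "a few milliwatts"; the exact power figure is not given in the text):

```python
# Efficiency estimate for the base design. The 512 GOPS figure is from the
# text; the 5 mW power draw is an assumed illustrative value, not a spec.
gops = 512        # giga-operations per second
power_mw = 5.0    # assumed "a few milliwatts"

gops_per_mw = gops / power_mw
# GOPS per milliwatt is numerically equal to TOPS per watt:
# (1e9 ops/s) / (1e-3 W) = 1e12 ops/s/W.
tops_per_watt = gops_per_mw

print(f"{gops_per_mw:.1f} GOPS/mW (~{tops_per_watt:.0f} TOPS/W)")
```

Even under a conservative power assumption, the design lands in the ~100 TOPS/W regime, which is what makes always-on inference viable on a battery budget.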

Core Architectural Components

  • Scalar Core: A lightweight, C-programmable RISC-V frontend that manages data flow using an ultra-low-power "run-to-completion" model.
  • Vector Execution Unit: A SIMD co-processor compliant with the RISC-V Vector instruction set (RVV) v1.0 for simultaneous operations on large datasets.
  • Matrix Execution Unit: A specialized engine using quantized outer product multiply-accumulate (MAC) operations to accelerate fundamental neural network tasks.
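To make the Matrix Execution Unit's primitive concrete, the following NumPy sketch (illustrative only; not the actual hardware interface) shows a quantized outer-product multiply-accumulate: int8 inputs widened to an int32 accumulator, with a matrix multiply decomposed into a sequence of such rank-1 updates.

```python
import numpy as np

def outer_product_mac(acc: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """acc (int32, MxN) += outer(a, b), int8 inputs widened to int32."""
    return acc + np.outer(a.astype(np.int32), b.astype(np.int32))

# A matrix multiply C = A @ B decomposes into K rank-1 updates,
# one per column of A paired with the matching row of B.
A = np.array([[1, -2], [3, 4]], dtype=np.int8)
B = np.array([[5, 6], [-7, 8]], dtype=np.int8)
acc = np.zeros((2, 2), dtype=np.int32)
for k in range(A.shape[1]):
    acc = outer_product_mac(acc, A[:, k], B[k, :])

assert (acc == A.astype(np.int32) @ B.astype(np.int32)).all()
```

Accumulating rank-1 outer products keeps the engine's data movement regular, which is why this formulation is a common choice for quantized neural-network accelerators.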

Unified Developer Ecosystem

  • The platform is a C-programmable target that integrates with modern ML compilers and runtimes such as IREE and TensorFlow Lite Micro (TFLM).
  • It supports a wide range of popular ML frameworks, including TensorFlow, JAX, and PyTorch.
  • The software toolchain utilizes MLIR and the StableHLO dialect to facilitate the transition from high-level models to hardware-executable code.
  • Developers have access to a complete suite of tools, including a simulator, custom kernels, and a general-purpose MLIR compiler.

SoC designers and ML developers looking to build the next generation of wearables should leverage the Coral NPU reference architecture to balance high-performance AI with extreme power efficiency. By utilizing the open-source documentation and RISC-V-based tools, teams can significantly reduce the complexity of deploying private, always-on ambient sensing.