# Coral NPU: A full-stack platform for Edge AI
Coral NPU is a new full-stack, open-source platform designed to bring advanced AI directly to power-constrained edge devices and wearables. By prioritizing a matrix-first hardware architecture and a unified software stack, Google aims to overcome traditional bottlenecks in performance, ecosystem fragmentation, and data privacy. The platform enables always-on, low-power ambient sensing while providing developers with a flexible, RISC-V-based environment for deploying modern machine learning models.

## Overcoming Edge AI Constraints

* The platform addresses the "performance gap," in which complex ML models typically exceed the power, thermal, and memory budgets of battery-operated devices.
* It eliminates the "fragmentation tax" by providing a unified architecture, moving away from proprietary processors that require costly, device-specific optimizations.
* On-device processing ensures a high standard of privacy and security by keeping personal context and data off the cloud.

## AI-First Hardware Architecture

* Unlike traditional chips, this architecture prioritizes the ML matrix engine over scalar compute to optimize for efficient on-device inference.
* The design is built on RISC-V ISA-compliant architectural IP blocks, offering an open and extensible reference for system-on-chip (SoC) designers.
* The base design delivers performance in the 512 giga operations per second (GOPS) range while consuming only a few milliwatts of power.
* The architecture is tailored for "always-on" use cases, making it ideal for hearables, AR glasses, and smartwatches.

## Core Architectural Components

* **Scalar Core:** A lightweight, C-programmable RISC-V frontend that manages data flow using an ultra-low-power "run-to-completion" model.
* **Vector Execution Unit:** A SIMD co-processor compliant with the RISC-V Vector instruction set (RVV) v1.0, enabling simultaneous operations on large datasets.
* **Matrix Execution Unit:** A specialized engine using quantized outer-product multiply-accumulate (MAC) operations to accelerate fundamental neural network tasks.

## Unified Developer Ecosystem

* The platform is a C-programmable target that integrates with modern compilers such as IREE and TFLM (TensorFlow Lite Micro).
* It supports a wide range of popular ML frameworks, including TensorFlow, JAX, and PyTorch.
* The software toolchain uses MLIR and the StableHLO dialect to lower high-level models to hardware-executable code.
* Developers have access to a complete suite of tools, including a simulator, custom kernels, and a general-purpose MLIR compiler.

SoC designers and ML developers building the next generation of wearables should leverage the Coral NPU reference architecture to balance high-performance AI with extreme power efficiency. By using the open-source documentation and RISC-V-based tools, teams can significantly reduce the complexity of deploying private, always-on ambient sensing.
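To make the matrix engine's outer-product formulation concrete, the sketch below decomposes an int8 matrix multiply into a sum of rank-1 outer products accumulated in int32 — the arithmetic pattern quantized MAC hardware typically implements. This is an illustrative NumPy model of the math only, not Coral NPU's actual microarchitecture or API; the function name is hypothetical.

```python
import numpy as np

def outer_product_matmul_int8(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Compute a @ b for int8 operands as a sum of outer products,
    accumulating in int32 as quantized MAC units typically do."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n), dtype=np.int32)
    for i in range(k):
        # One step = one rank-1 outer-product MAC:
        # column i of a times row i of b, accumulated into acc.
        acc += np.outer(a[:, i].astype(np.int32), b[i, :].astype(np.int32))
    return acc

# The decomposition is exact: it matches an ordinary int32 matmul.
a = np.random.randint(-128, 128, size=(4, 8), dtype=np.int8)
b = np.random.randint(-128, 128, size=(8, 3), dtype=np.int8)
assert np.array_equal(outer_product_matmul_int8(a, b),
                      a.astype(np.int32) @ b.astype(np.int32))
```

A hardware matrix engine performs each rank-1 update in parallel rather than in a Python loop, but the int8-operand, int32-accumulator structure is the same.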
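Because the matrix engine operates on quantized values, deploying a model through a toolchain like TFLM involves mapping float weights to int8. A minimal per-tensor symmetric quantization can be sketched as follows; real toolchains use more sophisticated calibration, and both function names here are hypothetical.

```python
import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    """Per-tensor symmetric quantization: map float weights onto int8
    so that q * scale approximates the original values."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([-0.4, 0.0, 0.3, 1.0], dtype=np.float32)
q, s = quantize_symmetric_int8(w)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(dequantize(q, s) - w)) <= s / 2
```

The int8 outputs of such a step are what a quantized MAC engine consumes, with the per-tensor scale applied when results are converted back to floating point.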