Iceberg Low-Latency Queries with Materialized Views
This technical session from NAVER ENGINEERING DAY 2025 describes the architectural journey of building a low-latency query system for real-time transaction reports. The project had to reconcile three competing goals: high data freshness, massive scalability, and fast responses to complex, multi-dimensional filters. By combining Apache Iceberg with StarRocks materialized views, the team built a data pipeline that serves these reports with low latency.
Challenges in Real-Time Transaction Reporting
- Query Latency vs. Data Freshness: Traditional architectures often struggle to provide immediate visibility into transaction data while maintaining sub-second query speeds across diverse filter conditions.
- High-Dimensional Filtering: Users require the ability to query reports based on numerous variables, necessitating an engine that can handle complex aggregations without pre-defining every possible index.
- Scalability Requirements: The system must handle increasing transaction volumes without degrading performance or requiring significant manual intervention in the underlying storage layer.
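To see why pre-defining an aggregate or index for every filter combination does not scale, it helps to count the combinations. The sketch below is purely illustrative: the dimension names are hypothetical and are not taken from the talk.

```python
from itertools import combinations

# Hypothetical report dimensions; the talk does not list the actual ones.
DIMENSIONS = ["date", "merchant", "payment_method", "region", "status", "channel"]


def dimension_subsets(dimensions):
    """Return every non-empty combination of dimensions a report could group or filter by."""
    subsets = []
    for size in range(1, len(dimensions) + 1):
        subsets.extend(combinations(dimensions, size))
    return subsets


subsets = dimension_subsets(DIMENSIONS)
# n dimensions yield 2**n - 1 non-empty subsets.
print(len(subsets))  # 63 for 6 dimensions
```

Each added dimension roughly doubles the count, which is why the engine needs to derive many report shapes from a small number of pre-computed results.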
Optimized Architecture with Iceberg and StarRocks
- Apache Iceberg Integration: Iceberg serves as the open table format, providing a reliable foundation for managing large-scale data snapshots and ensuring consistency during concurrent reads and writes.
- StarRocks for Query Acceleration: The team selected StarRocks as the primary OLAP engine to take advantage of its high-speed vectorized execution and native support for Iceberg tables.
- Spark-Based Processing: Apache Spark is utilized for the initial data ingestion and transformation phases, preparing the transaction data for efficient storage and downstream consumption.
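The consistency guarantee mentioned for Iceberg comes from its snapshot model: each commit publishes a new immutable snapshot, and a reader keeps using the snapshot that was current when its query started. The following is a toy in-memory model of that idea, not the Iceberg API; the class names and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: a snapshot id plus the data files it references."""
    snapshot_id: int
    files: tuple


@dataclass
class ToyTable:
    """Toy model of a snapshot-based table (an assumption for illustration, not Iceberg itself)."""
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self):
        # Readers pin whichever snapshot is current when their query starts.
        return self.snapshots[-1]

    def append(self, new_files):
        # A commit never mutates an old snapshot; it publishes a new one.
        latest = self.current()
        committed = Snapshot(latest.snapshot_id + 1, latest.files + tuple(new_files))
        self.snapshots.append(committed)
        return committed


table = ToyTable()
table.append(["txn-0001.parquet"])
pinned = table.current()             # a long-running query starts here
table.append(["txn-0002.parquet"])   # a writer commits while the query runs

print(pinned.files)           # ('txn-0001.parquet',)
print(table.current().files)  # ('txn-0001.parquet', 'txn-0002.parquet')
```

In this model the ingestion job plays the writer and the query engine plays the reader: the reader never sees a half-written commit, because it only ever follows a fully published snapshot.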
Enhancing Performance via Materialized Views
- Pre-computed Aggregations: Materialized views pre-calculate expensive transaction summaries, so user queries read stored results instead of re-aggregating raw transactions at query time.
- Automatic Query Rewrite: The architecture utilizes StarRocks' ability to automatically route queries to the most efficient materialized view, ensuring that even ad-hoc reports benefit from pre-computed results.
- Balanced Refresh Strategies: The work focused on tuning the refresh intervals of these views to keep results fresh while limiting the load that refreshes place on cluster resources.
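The three ideas above can be sketched with a small in-memory model. It is a simplified illustration under stated assumptions, not StarRocks' actual rewrite logic: the column names, sample rows, and the functions `refresh_view` and `query_total` are all hypothetical. The view stores partial sums per (date, merchant, payment_method); a query whose grouping columns are a subset of those keys is answered by rolling up the view, and anything else falls back to the raw rows.

```python
from collections import defaultdict

# Hypothetical raw transactions: (date, merchant, payment_method, region, amount).
RAW_COLUMNS = ("date", "merchant", "payment_method", "region")
RAW = [
    ("2025-01-01", "shop_a", "card", "kr", 100),
    ("2025-01-01", "shop_a", "card", "kr", 50),
    ("2025-01-01", "shop_a", "bank", "jp", 30),
    ("2025-01-01", "shop_b", "card", "kr", 70),
    ("2025-01-02", "shop_a", "card", "jp", 20),
]

# Grouping keys of the (toy) materialized view.
MV_KEYS = ("date", "merchant", "payment_method")


def refresh_view(rows):
    """Pre-compute SUM(amount) per MV_KEYS from the raw rows."""
    view = defaultdict(int)
    for *dims, amount in rows:
        key = tuple(dims[RAW_COLUMNS.index(col)] for col in MV_KEYS)
        view[key] += amount
    return dict(view)


def query_total(group_by, view, rows):
    """Answer SUM(amount) GROUP BY group_by, preferring the view.

    The view is usable when every requested column is one of its keys,
    because SUM can be rolled up from finer-grained partial sums.
    """
    if set(group_by) <= set(MV_KEYS):
        columns, source = MV_KEYS, "view"
        records = view.items()
    else:
        columns, source = RAW_COLUMNS, "raw"
        records = ((tuple(row[:-1]), row[-1]) for row in rows)
    idx = [columns.index(col) for col in group_by]
    result = defaultdict(int)
    for key, amount in records:
        result[tuple(key[i] for i in idx)] += amount
    return dict(result), source


view = refresh_view(RAW)
by_merchant, source = query_total(("merchant",), view, RAW)
print(source, by_merchant)  # view {('shop_a',): 200, ('shop_b',): 70}

# A new transaction lands after the last refresh: the view is stale until refreshed.
RAW.append(("2025-01-02", "shop_b", "card", "kr", 10))
stale, _ = query_total(("merchant",), view, RAW)
fresh, _ = query_total(("merchant",), refresh_view(RAW), RAW)
print(stale[("shop_b",)], fresh[("shop_b",)])  # 70 80
```

The last two lines show the trade-off behind the refresh interval: refreshing more often narrows the gap between the stale and fresh answers, but every refresh re-reads source data and consumes cluster resources.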
The adoption of a modern lakehouse architecture combining Apache Iceberg with a high-performance OLAP engine like StarRocks is a recommended strategy for organizations dealing with high-volume, real-time reporting. This approach effectively decouples storage and compute while providing the low-latency response times necessary for interactive data analysis.