naver Dec 1, 2025

Iceberg Low-Latency Queries with Materialized Views

data-architecture apache-spark apache-iceberg starrocks real-time-data-processing materialized-views low-latency-queries

This technical session from NAVER ENGINEERING DAY 2025 covers how the team built a low-latency query system for real-time transaction reports. The project had to reconcile three competing goals: fresh data, scalability, and fast responses to complex, multi-dimensional filters. By combining Apache Iceberg with StarRocks' materialized views, the team built a data pipeline that serves these reports at low latency.

Challenges in Real-Time Transaction Reporting

Query Latency vs. Data Freshness: Traditional architectures often struggle to provide immediate visibility into transaction data while maintaining sub-second query speeds across diverse filter conditions.
High-Dimensional Filtering: Users require the ability to query reports based on numerous variables, necessitating an engine that can handle complex aggregations without pre-defining every possible index.
Scalability Requirements: The system must handle increasing transaction volumes without degrading performance or requiring significant manual intervention in the underlying storage layer.

Optimized Architecture with Iceberg and StarRocks

Apache Iceberg Integration: Iceberg serves as the open table format, providing a reliable foundation for managing large-scale data snapshots and ensuring consistency during concurrent reads and writes.
StarRocks for Query Acceleration: The team selected StarRocks as the primary OLAP engine to take advantage of its high-speed vectorized execution and native support for Iceberg tables.
Spark-Based Processing: Apache Spark handles the initial data ingestion and transformation, preparing the transaction data for efficient storage and downstream consumption.
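
The consistency guarantee described above can be illustrated with a toy model. The sketch below is not the Iceberg API: the class `ToySnapshotTable`, its methods, and the sample rows are hypothetical, and a real Iceberg snapshot points at manifest and data files instead of copying rows. It only shows the idea that each commit publishes a new immutable snapshot, so a reader that pinned an earlier snapshot keeps seeing a consistent view while a writer commits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (merchant_id, amount): hypothetical transaction row


@dataclass
class ToySnapshotTable:
    """Toy model of snapshot-based table state (an assumption, not Iceberg's API)."""
    snapshots: Dict[int, Tuple[Row, ...]] = field(default_factory=lambda: {0: ()})
    current_id: int = 0

    def append(self, rows: List[Row]) -> int:
        """Commit rows as a new immutable snapshot and return its id."""
        new_id = self.current_id + 1
        self.snapshots[new_id] = self.snapshots[self.current_id] + tuple(rows)
        self.current_id = new_id  # a single pointer swap publishes the commit
        return new_id

    def scan(self, snapshot_id: int) -> Tuple[Row, ...]:
        """Return the rows visible in one pinned snapshot."""
        return self.snapshots[snapshot_id]


table = ToySnapshotTable()
table.append([("m1", 100), ("m2", 250)])
pinned = table.current_id      # a reader pins the current snapshot
table.append([("m1", 40)])     # a writer commits a newer snapshot meanwhile

reader_total = sum(amount for _, amount in table.scan(pinned))
latest_total = sum(amount for _, amount in table.scan(table.current_id))
print(reader_total, latest_total)
```

The reader's total reflects only the rows committed before it pinned its snapshot, while a new query started afterwards sees the later commit as well.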

Enhancing Performance via Materialized Views

Pre-computed Aggregations: By implementing Materialized Views, the system pre-calculates intensive transaction summaries, significantly reducing the computational load during active user queries.
Automatic Query Rewrite: The architecture relies on StarRocks' ability to automatically rewrite queries against a suitable materialized view, so ad-hoc reports can benefit from pre-computed results whenever a view covers the query.
Balanced Refresh Strategies: The work focused on tuning the refresh intervals of these views to keep data fresh while limiting the load that refreshes place on cluster resources.
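
The three ideas above (pre-computed aggregation, query rewrite, and refresh) can be sketched in plain Python. This is a conceptual toy under stated assumptions, not StarRocks code: the rows in `BASE`, the column names, and the helpers `aggregate`, `refresh_mv`, and `query` are all hypothetical, and the real optimizer's rewrite rules are far more general than the subset check used here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical base rows: (day, merchant, channel, amount)
BASE: List[Tuple[str, str, str, int]] = [
    ("2025-12-01", "m1", "web", 100),
    ("2025-12-01", "m1", "app", 50),
    ("2025-12-01", "m2", "web", 70),
    ("2025-12-02", "m1", "web", 30),
]
COLUMNS = ("day", "merchant", "channel")
MV_DIMS = ("day", "merchant")  # dimensions kept by the toy materialized view


def aggregate(rows, dims) -> Dict[tuple, int]:
    """Sum `amount` grouped by the given dimension columns."""
    idx = [COLUMNS.index(d) for d in dims]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[3]
    return dict(out)


def refresh_mv(rows) -> Dict[tuple, int]:
    """Recompute the rollup; stands in for a scheduled refresh."""
    return aggregate(rows, MV_DIMS)


def query(dims, mv):
    """Answer SUM(amount) GROUP BY dims, using the rollup when it covers dims."""
    if set(dims) <= set(MV_DIMS):
        # Rewrite: re-aggregate the smaller rollup instead of scanning base rows.
        idx = [MV_DIMS.index(d) for d in dims]
        out = defaultdict(int)
        for key, total in mv.items():
            out[tuple(key[i] for i in idx)] += total
        return dict(out), "mv"
    # The rollup dropped a needed column, so fall back to the base rows.
    return aggregate(BASE, dims), "base"


mv = refresh_mv(BASE)
by_merchant, src_merchant = query(("merchant",), mv)  # served from the rollup
by_channel, src_channel = query(("channel",), mv)     # falls back to base rows

BASE.append(("2025-12-02", "m2", "app", 20))          # new transaction arrives
stale, _ = query(("merchant",), mv)                   # rollup not refreshed yet
mv = refresh_mv(BASE)
fresh, _ = query(("merchant",), mv)
print(src_merchant, src_channel, stale[("m2",)], fresh[("m2",)])
```

The last lines show the trade-off behind refresh tuning: until the next refresh, a query answered from the rollup misses the newest transaction, while refreshing more often costs a full recomputation each time in this toy model.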

The session recommends a lakehouse architecture that combines Apache Iceberg with a high-performance OLAP engine such as StarRocks for organizations dealing with high-volume, real-time reporting. This approach decouples storage from compute while providing the low-latency response times needed for interactive data analysis.