optimization

2 posts

kakao

Building an Ultra-lightweight Morphological Analyzer

Kakao developed a specialized, lightweight morphological analyzer to meet the strict resource constraints of mobile environments, where modern deep-learning models are often too heavy. By opting for a classical Viterbi-based approach implemented in C++20, the team reduced the library's binary size to approximately 200KB while ensuring high performance. This development highlights how traditional algorithmic optimization and careful language selection remain vital for mobile software efficiency.

## The Choice of C++ over Rust

- While Rust was considered for its safety guarantees, it was ultimately rejected because its default binary size (even with optimization) reached several megabytes, which was too large for this project's requirements.
- C++ was chosen because mobile platforms like iOS and Android already include the standard libraries (libc++ or libstdc++), allowing the final analyzer binary to be stripped down to its core logic.
- The project used C++20 features such as Concepts and `std::span` to replace older patterns like SFINAE and `gsl::span`, resulting in more readable and maintainable code without sacrificing performance (see the first sketch after this summary).

## Trie Compression using LOUDS

- To minimize the dictionary size, the team implemented a LOUDS (Level-Order Unary Degree Sequence) structure, which represents a trie as a bit sequence instead of pointers.
- This approach provides a compression rate near the information-theoretic lower bound, allowing approximately 760,000 nodes to be stored in just 9.4MB.
- Further optimization came from a custom encoding scheme that represents Hangul in 2 bytes and English in 1 byte, significantly reducing the dictionary's memory footprint compared to standard UTF-8.

## Optimizing the Select Bit Operation

- Initial performance profiling showed that the `select0` operation (finding the N-th zero in a bit sequence) consumed 90% of the dictionary search time due to linear-search overhead.
- The solution was to divide the bit sequence into 64-bit chunks and store the cumulative count of zeros at each chunk boundary in a separate array.
- By using binary search to find the correct chunk and applying parallel bit-counting techniques for the intra-chunk search, dictionary search time dropped from 165ms to 10ms (a sketch of this scheme follows below).
- These optimizations cut total analysis time from 182ms to 28ms, making the tool highly responsive for real-time mobile use.

For mobile developers facing strict hardware limitations, this project shows that combining classical data structures like LOUDS with modern low-level language features can yield performance and size benefits that deep-learning alternatives currently cannot match.
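The post itself does not include code, so the two sketches below are illustrative reconstructions of the ideas it describes, not Kakao's actual implementation; all type and function names are invented for the examples. The first shows the kind of refactor the C++20 section alludes to: a concept replacing an SFINAE-style constraint, and `std::span` as a non-owning view inside a Viterbi-style transition step.

```cpp
#include <algorithm>
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <span>

// Hypothetical constraint: anything that can score a tag-to-tag transition.
template <typename T>
concept TransitionScorer =
    requires(const T& s, std::uint16_t prev, std::uint16_t next) {
        { s.score(prev, next) } -> std::convertible_to<int>;
    };

// Viterbi-style inner step: pick the best predecessor score for `next_tag`.
// The concept replaces an enable_if/SFINAE overload, and std::span gives a
// non-owning view over the previous lattice column.
template <TransitionScorer S>
int best_transition(const S& scorer,
                    std::span<const int> prev_scores,
                    std::span<const std::uint16_t> prev_tags,
                    std::uint16_t next_tag) {
    int best = std::numeric_limits<int>::min();
    for (std::size_t i = 0; i < prev_tags.size(); ++i)
        best = std::max(best, prev_scores[i] + scorer.score(prev_tags[i], next_tag));
    return best;
}
```

The second sketch illustrates the `select0` speed-up described above under the same assumptions: the LOUDS bit sequence is split into 64-bit words, the cumulative number of zeros before each word is precomputed, and `select0(n)` binary-searches that prefix array before scanning a single word with `std::popcount`.

```cpp
#include <bit>
#include <cstddef>
#include <cstdint>
#include <vector>

class Select0Index {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit Select0Index(std::vector<std::uint64_t> words)
        : words_(std::move(words)), zeros_before_(words_.size() + 1, 0) {
        // zeros_before_[i] = number of zero bits in words_[0..i)
        for (std::size_t i = 0; i < words_.size(); ++i)
            zeros_before_[i + 1] = zeros_before_[i] + (64 - std::popcount(words_[i]));
    }

    // Position of the n-th zero bit (0-based), or npos if it does not exist.
    std::size_t select0(std::size_t n) const {
        if (n >= zeros_before_.back()) return npos;
        // Binary search for the 64-bit word that contains the n-th zero.
        std::size_t lo = 0, hi = words_.size();
        while (lo + 1 < hi) {
            std::size_t mid = (lo + hi) / 2;
            if (zeros_before_[mid] <= n) lo = mid; else hi = mid;
        }
        // Scan within a single word; further bit tricks can remove even this loop.
        std::size_t remaining = n - zeros_before_[lo];
        std::uint64_t w = words_[lo];
        for (std::size_t bit = 0; bit < 64; ++bit)
            if (((w >> bit) & 1ULL) == 0 && remaining-- == 0)
                return lo * 64 + bit;
        return npos;  // unreachable given the guard above
    }

private:
    std::vector<std::uint64_t> words_;
    std::vector<std::size_t> zeros_before_;
};
```

Precomputing the prefix counts costs one extra array of word-level counters, which is the classic space-for-time trade the post describes: the binary search touches O(log n) counters instead of scanning the whole bit sequence linearly.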

google

A new quantum toolkit for optimization

Researchers at Google Quantum AI have introduced Decoded Quantum Interferometry (DQI), a new quantum algorithm designed to tackle optimization problems that remain intractable for classical supercomputers. By leveraging the wavelike nature of quantum mechanics to create specific interference patterns, the algorithm converts complex optimization tasks into high-dimensional lattice decoding problems. This provides a theoretical framework in which large-scale, error-corrected quantum computers could eventually outperform classical methods by several orders of magnitude on commercially relevant tasks.

### Linking Optimization to Lattice Decoding

* The DQI algorithm works by mapping the cost landscape of an optimization problem onto a periodic lattice structure.
* The "decoding" aspect involves identifying the lattice element nearest to a specific point in space, a task that becomes exponentially difficult for classical computers as dimensions increase into the hundreds or thousands.
* By using quantum interference to bridge these fields, researchers can apply decades of sophisticated classical decoding research, originally developed for data storage and transmission, to optimization challenges.
* This approach is unique because it requires a quantum computer to leverage these classical decoding algorithms in a way that conventional hardware cannot.

### Solving the Optimal Polynomial Intersection (OPI) Problem

* The most significant application of DQI is the OPI problem, where the goal is to find a low-degree polynomial that intersects the maximum number of given target points (a toy sketch of this objective appears after this summary).
* OPI is a foundational task in data science (polynomial regression), cryptography, and digital error correction, yet it remains "hopelessly difficult" for classical algorithms in many scenarios.
* DQI transforms the OPI problem into the task of decoding Reed-Solomon codes, a family of codes widely used in technologies like QR codes and DVDs.
* Technical analysis indicates a massive performance gap: certain OPI instances could be solved by a quantum computer in approximately a few million operations, while the most efficient known classical algorithms would require over $10^{23}$ (one hundred sextillion) operations.

### Practical Conclusion

As quantum hardware moves toward the era of error correction, Decoded Quantum Interferometry identifies a specific class of "NP-hard" problems where quantum machines can provide a clear win. Researchers and industries focusing on cryptography and complex data regression should monitor DQI as a primary candidate for demonstrating the first generation of commercially viable quantum advantage in optimization.
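To make the OPI objective concrete, here is a toy C++ sketch that only illustrates what is being maximized: over a small prime field, count how many target points a candidate low-degree polynomial passes through. The field size, names, and representation are invented for the example; DQI itself does not search this way but instead attacks the equivalent Reed-Solomon decoding problem on quantum hardware. A brute-force classical search must evaluate this objective for on the order of $p^d$ candidate coefficient vectors, which is what blows up at cryptographically relevant sizes.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::int64_t P = 7;  // toy field size; real instances are far larger

// Evaluate p(x) = c[0] + c[1]*x + ... + c[d-1]*x^(d-1) mod P via Horner's rule.
std::int64_t eval(const std::vector<std::int64_t>& c, std::int64_t x) {
    std::int64_t acc = 0;
    for (auto it = c.rbegin(); it != c.rend(); ++it) acc = (acc * x + *it) % P;
    return acc;
}

// OPI objective: the number of target points (x, y) the polynomial intersects.
int intersections(const std::vector<std::int64_t>& coeffs,
                  const std::vector<std::pair<std::int64_t, std::int64_t>>& points) {
    int hits = 0;
    for (const auto& [x, y] : points)
        if (eval(coeffs, x) == y) ++hits;
    return hits;
}
```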