3d-reconstruction

1 post

google

Bringing 3D shoppable products online with generative AI

Google has developed a series of generative AI techniques to transform standard 2D product images into immersive, interactive 3D visualizations for online shopping. By evolving from early neural reconstruction methods to state-of-the-art video generation models like Veo, Google can now produce high-quality 360-degree spins from as few as three images. This progression significantly reduces the cost and complexity for businesses to create shoppable 3D experiences at scale across diverse product categories.

## First Generation: Neural Radiance Fields (NeRFs)

* Launched in 2022, this initial approach utilized NeRF technology to synthesize novel views and 360° spins, specifically for footwear on Google Search.
* The system required five or more images and relied on complex sub-processes, including background removal, XYZ prediction (NOCS), and camera position estimation.
* While a breakthrough, the technology struggled with "noisy" signals and complex geometries, such as the thin structures found in sandals or high heels.

## Second Generation: View-Conditioned Diffusion

* Introduced in 2023, this version addressed previous limitations by using a diffusion-based architecture to predict unseen viewpoints from limited data.
* The model utilized Score Distillation Sampling (SDS), which compares rendered 3D models against generated targets to iteratively refine parameters for better realism.
* This approach allowed Google to scale 3D visualizations to the majority of shoes viewed on Google Shopping, handling more diverse and difficult footwear styles.

## Third Generation: Generalizing with Veo

* The current advancement leverages Google’s Veo video generation model to transform product images into consistent, high-fidelity 360° videos.
* By training on millions of synthetic 3D assets, Veo captures complex interactions between light, texture, and geometry, making it effective for shiny surfaces and diverse categories like electronics and furniture.
* This method removes the need for precise camera pose estimation, increasing reliability across different environments.
* While the model can generate a 3D representation from a single image by "hallucinating" missing details, using three images significantly reduces errors and ensures high-fidelity accuracy.

These technological milestones mark a shift from specialized 3D reconstruction toward generalized AI models that make digital products feel tangible and interactive for consumers.
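
The render-compare-refine loop described for Score Distillation Sampling in the second generation can be illustrated with a toy sketch. This is not Google's implementation and not real SDS, which scores noised renders with a diffusion model. Here the "3D model" is a hypothetical three-parameter brightness function, `generated_target` is an assumed stand-in for the generative model's output, and the refinement is plain gradient descent on the squared difference between render and target. All function and parameter names are illustrative.

```python
import math


def render(params, angle):
    """Hypothetical 'renderer': brightness of a toy object seen from `angle`."""
    base, cos_w, sin_w = params
    return base + cos_w * math.cos(angle) + sin_w * math.sin(angle)


def generated_target(angle):
    """Assumed stand-in for the generative model's output at this viewpoint."""
    return 0.5 + 0.3 * math.cos(angle) - 0.2 * math.sin(angle)


def refine(params, num_views=8, steps=50, lr=0.5):
    """Gradient descent on the mean squared render-vs-target error."""
    # Evenly spaced viewpoints around the object, like frames of a 360° spin.
    angles = [2 * math.pi * k / num_views for k in range(num_views)]
    params = list(params)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a in angles:
            residual = render(params, a) - generated_target(a)
            # Derivative of render() with respect to each parameter.
            basis = (1.0, math.cos(a), math.sin(a))
            for i in range(3):
                grad[i] += 2.0 * residual * basis[i] / num_views
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


fitted = refine([0.0, 0.0, 0.0])
print([round(p, 3) for p in fitted])
```

Starting from all-zero parameters, the loop pulls the render toward the targets at every viewpoint until the two agree. In a real system the parameters would describe a full 3D representation and the targets would come from the view-conditioned diffusion model.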