stable-diffusion

2 posts

naver

Research on Protecting the Webtoon

Naver Webtoon is proactively developing technical solutions to safeguard its digital creation ecosystem against evolving threats like illegal distribution and unauthorized generative AI training. By integrating advanced AI-based watermarking and protective perturbation technologies, the platform successfully tracks content leaks and disrupts unauthorized model fine-tuning. These efforts ensure a sustainable environment where creators can maintain the integrity and economic value of their intellectual property.

## Challenges in the Digital Creation Ecosystem

- **Illegal Content Leakage**: Unauthorized reproduction and distribution of digital content infringe on creator earnings and damage the platform's business model.
- **Unauthorized Generative AI Training**: The rise of fine-tuning techniques (e.g., LoRA, Dreambooth) allows for the unauthorized mimicry of an artist's unique style, distorting the value of original works.
- **Harmful UGC Uploads**: The presence of violent or suggestive user-generated content increases operational costs and degrades the service experience for readers.

## AI-Based Watermarking for Post-Tracking

- To facilitate tracking in DRM-free environments, Naver Webtoon developed an AI-based watermarking system that embeds invisible signals into the pixels of digital images.
- The system is designed around three conflicting requirements: **Invisibility** (signal remains hidden), **Robustness** (signal survives attacks like cropping or compression), and **Capacity** (sufficient data for tracking).
- The technical pipeline involves three neural modules: an **Embedder** to insert the signal, a differentiable **Attack Layer** to simulate real-world distortions, and an **Extractor** to recover the signal.
- Performance metrics show a high Peak Signal-to-Noise Ratio (PSNR) of over 46 dB, and the system maintains a signal error rate of less than 1% even when subjected to intense signal processing or geometric editing.
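
The two watermarking metrics quoted above (PSNR and signal error rate) can be made concrete with a small sketch. It assumes 8-bit pixel values, a payload stored as a list of bits, and that the "signal error rate" is measured as a bit error rate. The function names `psnr` and `bit_error_rate` are hypothetical and are not taken from Naver Webtoon's system.

```python
import math


def psnr(original, watermarked, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(watermarked):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between the clean and the watermarked pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)


def bit_error_rate(embedded_bits, extracted_bits):
    """Fraction of payload bits that the extractor recovered incorrectly."""
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("payloads must have the same length")
    errors = sum(a != b for a, b in zip(embedded_bits, extracted_bits))
    return errors / len(embedded_bits)


# Toy data: a 16-pixel "image" whose pixels shift by at most one gray level.
original = [100, 120, 140, 160] * 4
watermarked = [p + d for p, d in zip(original, [1, -1, 0, 1] * 4)]

# Toy payload: one of eight bits is flipped after a simulated attack.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 1, 1, 0, 1, 1, 0]

print(f"PSNR: {psnr(original, watermarked):.2f} dB")
print(f"Bit error rate: {bit_error_rate(embedded, extracted):.3f}")
```

A higher PSNR means the watermark is less visible, while a lower bit error rate after an attack means it is more robust. Pushing either metric tends to cost the other, which is the tension between invisibility and robustness described in the list.
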
## IMPASTO: Disrupting Unauthorized AI Training

- This technology utilizes **protective perturbation**, which adds microscopic changes to images that are invisible to humans but confuse generative AI models during the training phase.
- It targets the way diffusion models (like Stable Diffusion) learn by either manipulating latent representations or disrupting the denoising process, preventing the AI from accurately mimicking an artist's style.
- The research prioritizes overcoming the visual artifacts and slow processing speeds found in existing academic tools like Glaze and PhotoGuard.
- By implementing these perturbations, any attempts to fine-tune a model on protected work will result in distorted or unintended outputs, effectively shielding the artist's original style.

## Integrated Protection Frameworks

- **TOONRADAR**: A comprehensive system deployed since 2017 that uses watermarking for both proactive blocking and retrospective tracking of illegal distributors.
- **XPIDER**: An automated detection tool tailored specifically for the comic domain to identify and block harmful UGC, reducing manual inspection overhead.
- These solutions are being expanded not just for copyright protection, but to establish long-term trust and reliability in the era of AI-generated content.

The deployment of these AI-driven defense mechanisms is essential for maintaining a fair creative economy. By balancing visual quality with robust protection, platforms can empower creators to share their work globally without the constant fear of digital theft or stylistic mimicry.
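
The protective perturbation idea from the IMPASTO section, in its "manipulating latent representations" variant, can be illustrated with a toy sketch. This is not IMPASTO's algorithm, which the summary does not specify. It is a simplified encoder-targeted attack in the spirit of PhotoGuard, and it assumes a tiny linear map in place of a real VAE encoder so that the gradient can be written by hand. All names (`encode`, `latent_distance`, `protect`) and all numbers are hypothetical.

```python
def encode(weights, pixels):
    """Toy linear encoder: latent[i] = sum_j weights[i][j] * pixels[j]."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def latent_distance(weights, pixels, target_latent):
    """Squared distance between the image's latent and a decoy target latent."""
    latent = encode(weights, pixels)
    return sum((a - b) ** 2 for a, b in zip(latent, target_latent))


def protect(weights, pixels, target_latent, epsilon=0.05, step=0.01, iterations=50):
    """Pull the latent toward target_latent with projected sign-gradient steps.

    Each pixel stays within +/- epsilon of its original value, which is what
    keeps the change small enough to be hard to see.
    """
    perturbed = list(pixels)
    for _ in range(iterations):
        latent = encode(weights, perturbed)
        residual = [a - b for a, b in zip(latent, target_latent)]
        # Gradient of ||W x - t||^2 with respect to x is 2 * W^T (W x - t).
        grad = [
            2.0 * sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(len(perturbed))
        ]
        for j, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            moved = perturbed[j] - step * sign
            # Project back into the epsilon-ball around the original pixel,
            # then into the valid pixel range [0, 1].
            moved = max(pixels[j] - epsilon, min(pixels[j] + epsilon, moved))
            perturbed[j] = max(0.0, min(1.0, moved))
    return perturbed


# Toy setup: a 4-pixel "image", a 2-dimensional latent, and a decoy image
# whose latent the protected image should resemble.
weights = [[0.5, -0.2, 0.1, 0.3], [0.1, 0.4, -0.3, 0.2]]
artwork = [0.2, 0.4, 0.6, 0.8]
decoy_latent = encode(weights, [0.9, 0.1, 0.3, 0.5])

protected = protect(weights, artwork, decoy_latent)
print("max pixel change:", max(abs(a - b) for a, b in zip(protected, artwork)))
print("latent distance before:", latent_distance(weights, artwork, decoy_latent))
print("latent distance after: ", latent_distance(weights, protected, decoy_latent))
```

In a real system the encoder would be the diffusion model's own VAE and the gradient would come from automatic differentiation. The point of the sketch is only the structure: a bounded pixel change that shifts what the model "sees" in latent space.
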

line

How to evaluate AI-generated images?

LY Corporation is developing a text-to-image pipeline to automate the creation of branded character illustrations, aiming to reduce the manual workload for designers. The project focuses on utilizing Stable Diffusion and Flow Matching models to generate high-quality images that strictly adhere to specific corporate style guidelines. By systematically evaluating model architectures and hyperparameters, the team seeks to transform subjective image quality into a quantifiable and reproducible technical process.

### Evolution of Image Generation Models

* **Diffusion Models:** These models generate images through a gradual denoising process. They use a forward process to add Gaussian noise via a Markov chain and a reverse process to restore the original image based on learned probability distributions.
* **Stable Diffusion (SD):** Unlike standard diffusion that operates in pixel space, SD works within a "latent space" using a Variational Autoencoder (VAE). This significantly reduces computational load by denoising latent vectors rather than raw pixels.
* **SDXL and SD3.5:** SDXL improves prompt comprehension by adding a second text encoder (CLIP-G/14). SD3.5 introduces a major architectural shift by moving from diffusion to "Flow Matching," utilizing a Multimodal Diffusion Transformer (MMDiT) that handles text and image modalities in a single block for better parameter efficiency.
* **Flow Matching:** This approach treats image generation as a deterministic movement through a vector field. Instead of removing stochastic noise, it learns the velocity required to transform a simple probability distribution into a complex data distribution.

### Core Hyperparameters for Output Control

* **Seeds and Latent Vectors:** The seed is the integer value that determines the initial random noise. Since Stable Diffusion operates in latent space, this noise is essentially the starting latent vector that dictates the basic structure of the final image.
* **Prompts:** Textual inputs serve as the primary guide for the denoiser. Models are trained on image-caption pairs, allowing the U-Net or Transformer blocks to align the visual output with the user’s descriptive intent.
* **Classifier-Free Guidance (CFG):** This parameter adjusts the weight of the prompt's influence. It calculates the difference between noise predicted with a prompt and noise predicted without one (or with a negative prompt), allowing users to control how strictly the model follows the text instructions.

### Practical Recommendation

To achieve consistent results that match a specific brand identity, it is insufficient to rely on prompts alone; developers should implement automated hyperparameter search and black-box optimization. Transitioning to Flow Matching models like SD3.5 can provide a more deterministic generation path, which is critical when attempting to scale the production of high-quality, branded assets.
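
The seed and Classifier-Free Guidance hyperparameters described above can be sketched in a few lines. This is a minimal illustration and not LY Corporation's pipeline. It assumes the commonly used CFG combination `uncond + scale * (cond - uncond)`, and it represents latents and noise predictions as plain lists of floats. The helper names `initial_latent` and `apply_cfg` are hypothetical.

```python
import random


def initial_latent(seed, size):
    """Draw the starting latent vector from a standard normal, fixed by the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions.

    guidance_scale = 0 ignores the prompt, 1 uses the conditioned prediction
    as-is, and larger values extrapolate further along the prompt direction.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# The same seed always yields the same starting latent; a new seed changes it.
assert initial_latent(42, 4) == initial_latent(42, 4)
assert initial_latent(42, 4) != initial_latent(43, 4)

# Toy noise predictions standing in for two denoiser forward passes.
noise_without_prompt = [0.0, 1.0]
noise_with_prompt = [1.0, 3.0]

for scale in (0.0, 1.0, 7.5):
    print(scale, apply_cfg(noise_without_prompt, noise_with_prompt, scale))
```

Because both the seed and the guidance scale are plain numbers, they are natural search dimensions for the automated hyperparameter search recommended above: fix the prompt, sweep seed and scale, and score each output.
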