line

AI and the Writer’s Journey

LY Corporation is addressing the chronic shortage of high-quality technical documentation by treating the problem as an engineering challenge rather than a training issue. By utilizing Generative AI to automate the creation of API references, the Document Engineering team has transitioned from a "manual craftsmanship" approach to an "industrialized production" model. While the system significantly improves efficiency and maintains internal context better than generic tools, the team concludes that human verification remains essential due to the high stakes of API accuracy.

### Contextual Challenges with Generic AI

Standard coding assistants like GitHub Copilot often fail to meet the specific documentation needs of a large organization.

* Generic tools do not adhere to internal company style guides or maintain consistent terminology across projects.
* Standard AI lacks awareness of internal technical contexts; for example, generic AI might mistake a company-specific identifier like "MID" for "Member ID," whereas the internal tool understands its specific function within the LY ecosystem.
* Fragmented deployment processes across different teams make it difficult for developers to find a single source of truth for API documentation.

### Multi-Stage Prompt Engineering

To ensure high-quality output without overwhelming the LLM's "memory," the team refined a complex set of instructions into a streamlined three-stage workflow.

* **Language Recognition:** The system first identifies the programming language and specific framework being used.
* **Contextual Analysis:** It analyzes the API's logic to generate relevant usage examples and supplemental technical information.
* **Detail Generation:** Finally, it writes the core API descriptions, parameter definitions, and response value explanations based on the internal style guide.

### Transitioning to Model Context Protocol (MCP)

While the prototype began as a VS Code extension, the team shifted to using the Model Context Protocol (MCP) to ensure the tool was accessible across various development environments.

* Moving to MCP allows the tool to support multiple IDEs, including IntelliJ, which was a high-priority request from the developer community.
* The MCP architecture decouples the user interface from the core logic, allowing the "host" (like the IDE) to handle UI interactions and parameter inputs.
* This transition reduced the maintenance burden on the Document Engineering team by removing the need to build and update custom UI components for every IDE.

### Performance and the Accuracy Gap

Evaluation of the AI-generated documentation showed strong results, though it highlighted the unique risks of documenting APIs compared to other forms of writing.

* Approximately 88% of the AI-generated comments met the team's internal evaluation criteria.
* The specialized generator outperformed GitHub Copilot in 78% of cases regarding style and contextual relevance.
* The team noted that while a 99% accuracy rate is excellent for a blog post, a single error in a short API reference can render the entire document useless for a developer.

To successfully implement AI-driven documentation, organizations should focus on building tools that understand internal business logic while maintaining a strict "human-in-the-loop" workflow. Developers should use these tools to generate the bulk of the content but must perform a final technical audit to ensure the precision that only a human author can currently guarantee.
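
A minimal Kotlin sketch of how the three stages could be chained, assuming a generic `LlmClient` interface; the prompts, class names, and data types here are illustrative placeholders rather than the team's actual implementation.

```kotlin
// Hypothetical sketch of the three-stage workflow described above. The stage
// prompts, data classes, and `LlmClient` interface are illustrative assumptions,
// not LY Corporation's actual implementation.

data class SourceCode(val fileName: String, val content: String)

interface LlmClient {
    fun complete(prompt: String): String
}

class ApiReferenceGenerator(private val llm: LlmClient) {

    // Stage 1: identify the programming language and framework in use.
    private fun recognizeLanguage(source: SourceCode): String =
        llm.complete("Identify the programming language and framework of this file:\n${source.content}")

    // Stage 2: analyze the API's logic to collect usage examples and supplemental notes.
    private fun analyzeContext(source: SourceCode, languageInfo: String): String =
        llm.complete(
            "For this $languageInfo API, list relevant usage examples and " +
                "supplemental technical information:\n${source.content}"
        )

    // Stage 3: write descriptions, parameters, and response values per the internal style guide.
    private fun generateDetails(source: SourceCode, languageInfo: String, context: String): String =
        llm.complete(
            "Following the internal style guide, write the API description, parameter " +
                "definitions, and response value explanations for this $languageInfo code.\n" +
                "Context: $context\n${source.content}"
        )

    fun generateReference(source: SourceCode): String {
        val languageInfo = recognizeLanguage(source)
        val context = analyzeContext(source, languageInfo)
        return generateDetails(source, languageInfo, context)
    }
}
```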

line

Code Quality Improvement Techniques Part 1

Effective naming in software development should prioritize the perspective of the code's consumer over the visual consistency of class declarations. By following natural grammatical structures, developers can reduce ambiguity and ensure that the purpose of a class or variable is immediately clear regardless of context. Ultimately, clear communication through grammar is more valuable for long-term maintenance than aesthetic symmetry in the codebase.

### Prefixing vs. Postfixing for Class Names

When splitting a large class like `SettingRepository` into specific modules (e.g., Account, Security, or Language), the choice of where to place the modifier significantly impacts readability.

* Postfixing the modifier (e.g., `SettingRepositorySecurity`) might look organized in a file directory, but it creates grammatical confusion when the class is used in isolation.
* A developer encountering `SettingRepositorySecurity` in a constructor might misinterpret it as a "security module belonging to the SettingRepository" rather than a repository specifically for security settings.
* Prefixing the modifier (e.g., `SecuritySettingRepository`) follows standard English grammar, clearly identifying the object as a specific type of repository and reducing the cognitive load for the reader.

### Handling Multiple Modifiers and the "Sandwich" Effect

In cases where a single prefix is insufficient, such as defining the "height of a send button in portrait mode," naming becomes more complex.

* Using only prefixes (e.g., `portraitSendButtonHeight`) can be ambiguous, potentially being read as the "height of a button used to send a portrait."
* To resolve this, developers can use a "modifier sandwich" by moving some details to the end using prepositions like "for," "of," or "in" (e.g., `sendButtonHeightForPortrait`).
* While prepositions are helpful for variables, they should generally be avoided in class or struct names to ensure that instance names derived from the type remain concise.
* Developers should also defer to platform-specific conventions; for example, Java and Kotlin often omit prepositions in standard APIs, such as using `currentTimeMillis` instead of `currentTimeInMillis`.

When naming any component, favor the clarity of the person reading the implementation over the convenience of the person writing the definition. Prioritizing grammatical correctness ensures that the intent of the code remains obvious even when a developer is looking at a single line of code.
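
The naming guidance above can be seen in a small Kotlin sketch; the class and property names are taken from the post's examples, while the surrounding types are minimal stand-ins.

```kotlin
// Names come from the post; the surrounding types are minimal stand-ins.

class SecuritySetting

class SecuritySettingRepository {            // modifier prefixed: "a repository of security settings"
    fun load(): SecuritySetting = SecuritySetting()
}

// A postfixed name like `SettingRepositorySecurity` reads as
// "the security (module) of SettingRepository" when seen in isolation:
// class SettingRepositorySecurity { ... }   // avoided for that reason

data class MessageComposer(
    // Prefix-only naming can be misread as "height of a button that sends a portrait":
    // val portraitSendButtonHeight: Int
    // The "modifier sandwich" moves the layout detail behind a preposition instead:
    val sendButtonHeightForPortrait: Int,
    val sendButtonHeightForLandscape: Int
)

fun main() {
    // At the use site, the prefixed class name reads as natural English.
    val repository = SecuritySettingRepository()
    println(repository.load())
    println(MessageComposer(sendButtonHeightForPortrait = 48, sendButtonHeightForLandscape = 32))
}
```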

line

Checking LINE app video call quality

To optimize the LINE messenger’s communication performance, LY Corporation conducted an on-site call quality assessment in Thailand to analyze local network conditions and compare performance against rising competitors. The study concluded that while LINE offers superior visual clarity and higher bitrates than its rivals, this high-performance strategy requires a careful technical balance to prevent video freezing in unstable network environments.

### High Video Call Adoption in Thailand

* Thailand exhibits the highest video call usage among LINE’s major markets, with video calls accounting for 30.43% of all 1:1 sessions—more than double the rate of Japan or Taiwan.
* The surge in usage by competitors, specifically "Messenger A," has necessitated frequent benchmarking to maintain LINE’s market leadership and technical edge.
* Thailand serves as the primary testing ground for any updates to video modules due to the local user base's preference for high-quality real-time visual communication.

### On-Site Quality Testing Methodology

* The assessment was performed over five days by five engineers across high-traffic locations in Bangkok, such as Siam Paragon and Samron Market, using True and AIS 4G/5G networks.
* Engineers focused on Quality of Service (QoS) metrics—including packet loss and jitter—to estimate the actual Quality of Experience (QoE) for users.
* Baseline performance for LINE in Thailand was recorded at VGA resolution, with frame rates exceeding 20 FPS and an average latency of approximately 150ms.

### Bitrate Strategy and Performance Trade-offs

* LINE utilizes a high-bitrate strategy, capping at 1Mbps on 5G and 600kbps on 4G, to deliver sharper, more defined images than Competitor A.
* A "start-at-max" approach is used where LINE attempts to find and utilize the highest possible bitrate from the beginning of the call to ensure immediate high quality.
* In contrast, competitors adopt a conservative bitrate strategy, starting low and increasing slowly to prioritize connection stability over visual fidelity.
* The trade-off for LINE’s higher quality is an increased risk of "freezing"—defined as a single frame persisting for more than 200ms—when the network becomes congested or unstable.

### Technical Implications for Future Development

* The relationship between bitrate and network stability remains a zero-sum trade-off; higher bitrates provide better clarity but increase the likelihood of packet delay and loss at the router level.
* LINE’s engineering focus is directed toward optimizing the "initial bitrate" detection logic to ensure high quality without triggering network-induced lag in crowded urban environments.
* Continuous tuning of the balance between peak visual performance and consistent playback remains the core challenge for maintaining service quality in the Thai market.
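
As a rough illustration of the freeze definition used in the testing, the following Kotlin sketch flags any frame that persists for more than 200 ms; the frame-timestamp input and reporting structure are assumptions, not LINE's measurement tooling.

```kotlin
// Minimal sketch of the freeze definition quoted above: a single frame persisting
// for more than 200 ms counts as a freeze. Frame timestamps and the event type
// are illustrative assumptions.

const val FREEZE_THRESHOLD_MS = 200L

data class FreezeEvent(val startMs: Long, val durationMs: Long)

// Given the arrival timestamps (in ms) of rendered frames, report every gap
// longer than the threshold as a freeze event.
fun detectFreezes(frameTimestampsMs: List<Long>): List<FreezeEvent> =
    frameTimestampsMs.zipWithNext()
        .filter { (previous, next) -> next - previous > FREEZE_THRESHOLD_MS }
        .map { (previous, next) -> FreezeEvent(startMs = previous, durationMs = next - previous) }

fun main() {
    // ~30 FPS playback with one 250 ms stall.
    val timestamps = listOf(0L, 33L, 66L, 99L, 349L, 382L)
    detectFreezes(timestamps).forEach { println("Freeze at ${it.startMs} ms for ${it.durationMs} ms") }
}
```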

google

Optimizing LLM-based trip planning

Google Research has developed a hybrid planning system that combines Large Language Models (LLMs) with traditional optimization algorithms to solve complex trip-planning tasks. While LLMs excel at interpreting qualitative user preferences—such as a desire for "lesser-known museums"—they often struggle with hard quantitative constraints like travel logistics and fluctuating opening hours. By using an LLM to generate an initial draft and a secondary algorithm to refine it against real-world data, the system produces itineraries that are both highly personalized and logistically feasible.

## The Hybrid Planning Architecture

* The process begins with a Gemini model generating an initial trip plan based on the user's natural language query, identifying specific activities and their perceived importance.
* This draft is grounded using live data, incorporating up-to-date opening hours, transit schedules, and travel times between locations.
* Search backends simultaneously retrieve alternative activities to serve as potential substitutes if the LLM's original suggestions prove logistically impossible.

## Two-Stage Optimization Algorithm

* The first stage focuses on single-day scheduling, using dynamic programming and exhaustive search to find the most efficient sequence for subsets of activities.
* Each potential daily schedule is assigned a quality score based on its feasibility and how closely it aligns with the LLM's original intent.
* The second stage addresses the multi-day itinerary as a weighted variant of the "set packing problem," which ensures that activities do not overlap across different days.
* Because multi-day optimization is NP-complete, the system employs local search heuristics to swap activities between days, iteratively improving the total score until the plan converges.

## Balancing Intent and Feasibility

* In practical testing, the system demonstrated a superior ability to handle nuanced requests, such as finding "lesser-known" museums in NYC, a request that traditional retrieval systems often get wrong by suggesting famous landmarks like the Met.
* The optimization layer specifically corrects geographical inefficiencies, such as the LLM suggesting a "zig-zag" route across San Francisco, by regrouping activities into logical clusters to minimize travel time.
* The system maintains the "spirit" of the LLM's creative suggestions—like visiting a specific scenic viewpoint—while ensuring the user doesn't arrive after the gates have closed.

This hybrid approach suggests that the most reliable AI planning tools do not rely on LLMs in isolation. By using LLMs as creative engines for intent interpretation and delegating logistical verification to rigid algorithmic frameworks, developers can create tools that are both imaginative and practically dependable.
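
A hedged Kotlin sketch of the second-stage idea, moving activities between days while the total score improves; the scoring function, data model, and example activities are placeholders and not Google's implementation.

```kotlin
// Toy local search over a multi-day plan: each day carries a quality score and
// activities are moved between days only when the total score improves.
// The score function (importance minus a penalty for overloaded days) and the
// sample activities are illustrative assumptions.

data class Activity(val name: String, val importance: Double)
data class DayPlan(val activities: MutableList<Activity>)

// Stand-in for the stage-one score: alignment with the LLM's intent minus an
// overload penalty when a day holds more than a hypothetical cap of 3 activities.
fun score(day: DayPlan): Double {
    val base = day.activities.sumOf { it.importance }
    val overloadPenalty = maxOf(0, day.activities.size - 3) * 2.0
    return base - overloadPenalty
}

fun totalScore(plan: List<DayPlan>): Double = plan.sumOf { score(it) }

// Try moving single activities between days; keep a move only if it raises the total score.
fun improveByLocalSearch(plan: List<DayPlan>, maxIterations: Int = 100): List<DayPlan> {
    repeat(maxIterations) {
        var improved = false
        for (from in plan) for (to in plan) {
            if (from === to || from.activities.isEmpty()) continue
            val candidate = from.activities.last()
            val before = totalScore(plan)
            from.activities.removeAt(from.activities.lastIndex)
            to.activities.add(candidate)
            if (totalScore(plan) <= before) {        // revert non-improving moves
                to.activities.removeAt(to.activities.lastIndex)
                from.activities.add(candidate)
            } else {
                improved = true
            }
        }
        if (!improved) return plan                   // converged
    }
    return plan
}

fun main() {
    val plan = listOf(
        DayPlan(mutableListOf(
            Activity("MoMA PS1", 0.9), Activity("The Cloisters", 0.8),
            Activity("Tenement Museum", 0.7), Activity("Brooklyn Bridge walk", 0.6)
        )),
        DayPlan(mutableListOf(Activity("Staten Island Ferry", 0.5)))
    )
    improveByLocalSearch(plan).forEachIndexed { i, day ->
        println("Day ${i + 1}: ${day.activities.map { it.name }}")
    }
}
```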

google

Zooming in: Efficient regional environmental risk assessment with generative AI

Google Research has introduced a dynamical-generative downscaling method that combines physics-based climate modeling with probabilistic diffusion models to produce high-resolution regional environmental risk assessments. By bridging the resolution gap between global Earth system models and city-level data needs, this approach provides a computationally efficient way to quantify climate uncertainties at a 10 km scale. This hybrid technique significantly reduces error rates compared to traditional statistical methods while remaining far less computationally expensive than full-scale dynamical simulations.

## The Resolution Gap in Climate Modeling

* Traditional Earth system models typically operate at a resolution of ~100 km, which is too coarse for city-level planning regarding floods, heatwaves, and wildfires.
* Existing "dynamical downscaling" uses regional climate models (RCMs) to provide physically realistic 10 km projections, but the computational cost is too high to apply to large ensembles of climate data.
* Statistical downscaling offers a faster alternative but often fails to capture complex local weather patterns or extreme events, and it struggles to generalize to unprecedented future climate conditions.

## A Hybrid Dynamical-Generative Framework

* The process begins with a "physics-based first pass," where an RCM downscales global data to an intermediate resolution of 50 km to establish a common physical representation.
* A generative AI system called "R2D2" (Regional Residual Diffusion-based Downscaling) then adds fine-scale details, such as the effects of complex topography, to reach the target 10 km resolution.
* R2D2 specifically learns the "residual"—the difference between intermediate and high-resolution fields—which simplifies the learning task and improves the model's ability to generalize to unseen environmental conditions.

## Efficiency and Accuracy in Risk Assessment

* The model was trained and validated using the Western United States Dynamically Downscaled Dataset (WUS-D3), which utilizes the "gold standard" WRF model.
* The dynamical-generative approach reduced fine-scale errors by over 40% compared to popular statistical methods like BCSD and STAR-ESDM.
* A key advantage of this method is its scalability; the AI requires training on only one dynamically downscaled model to effectively process outputs from various other Earth system models, allowing for the rapid assessment of large climate ensembles.

By combining the physical grounding of traditional regional models with the speed of diffusion-based AI, researchers can now produce granular risk assessments that were previously cost-prohibitive. This method allows for a more robust exploration of future climate scenarios, providing essential data for farming, water management, and community protection.
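
The "learn the residual" idea can be illustrated with a toy Kotlin sketch: the generative model's training target is the difference between the high-resolution field and an upsampled intermediate-resolution field, and inference adds a predicted residual back. The one-dimensional arrays and nearest-neighbour upsampling below are purely illustrative, not the R2D2 architecture.

```kotlin
// Toy illustration of residual learning as described for R2D2. Real fields are
// 2-D gridded climate variables and the residual is predicted by a diffusion
// model; here everything is reduced to 1-D arrays for clarity.

fun upsample(coarse: DoubleArray, factor: Int): DoubleArray =
    DoubleArray(coarse.size * factor) { i -> coarse[i / factor] }

// Training target: residual = high-resolution truth minus the upsampled intermediate field.
fun residualTarget(highRes: DoubleArray, intermediate: DoubleArray, factor: Int): DoubleArray {
    val upsampled = upsample(intermediate, factor)
    return DoubleArray(highRes.size) { i -> highRes[i] - upsampled[i] }
}

// Inference: reconstruct the fine-scale field from the intermediate field plus a predicted residual.
fun reconstruct(intermediate: DoubleArray, predictedResidual: DoubleArray, factor: Int): DoubleArray {
    val upsampled = upsample(intermediate, factor)
    return DoubleArray(upsampled.size) { i -> upsampled[i] + predictedResidual[i] }
}

fun main() {
    val intermediate = doubleArrayOf(10.0, 14.0)          // coarse grid cells (e.g. intermediate resolution)
    val highRes = doubleArrayOf(9.0, 11.0, 13.5, 15.0)    // fine grid cells (e.g. target resolution)
    val target = residualTarget(highRes, intermediate, factor = 2)
    println("Residual the generative model would learn: ${target.toList()}")
    println("Reconstruction: ${reconstruct(intermediate, target, factor = 2).toList()}")
}
```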

line

Code Quality Improvement Techniques Part 14

Applying the Single Responsibility Principle is a fundamental practice for maintaining high code quality, but over-fragmenting logic can inadvertently lead to architectural complexity. While splitting classes aims to increase cohesion, it can also scatter business constraints and force callers to manage an overwhelming number of dependencies. This post explores the "responsibility of assigning responsibility," arguing that sometimes maintaining a slightly larger, consolidated class is preferable to creating fragmented "Ravioli code."

### Initial Implementation and the Refactoring Drive

The scenario involves a dynamic "Launch Button" that can fire rockets, fireworks, or products depending on its mode.

* The initial design used a single `LaunchButtonBinder` that held references to all possible `Launcher` types and an internal enum to select the active one.
* To strictly follow the Single Responsibility Principle, developers often attempt to split this into two parts: a binder for the button logic and a selector for choosing the mode.
* The refactored approach utilized a `LaunchBinderSelector` to manage multiple `LaunchButtonBinder` instances, using an `isEnabled` flag to toggle which logic was active.

### The Problem of Scattered Constraints and State

While the refactored classes are individually simpler, the overall system becomes harder to reason about due to fragmented logic.

* **Verification Difficulty:** In the original code, the constraint that "only one thing launches at a time" was obvious in a single file; in the refactored version, a developer must trace multiple classes and loops to verify this behavior.
* **State Redundancy:** Adding an `isEnabled` property to binders creates a risk of state synchronization issues between the selector’s current mode and the binders' internal flags.
* **Information Hiding Trade-offs:** Attempting to hide implementation details often forces the caller to resolve all dependencies (binders, buttons, and launchers) manually, which can turn the caller into a bloated "God class."

### Avoiding "Ravioli Code" Through Balanced Design

The pursuit of granular responsibilities can lead to "Ravioli code," where the system consists of many small, independent components but lacks a clear, cohesive structure.

* The original implementation’s advantage was that it encapsulated all logic related to the launch button's constraints in one place.
* When deciding to split a class, developers must evaluate if the move improves the overall system or simply shifts the burden of complexity to the caller.
* Effective design requires balancing individual class cohesion with the overhead of inter-module coupling and dependency management.

When refactoring for code quality, prioritize the clarity of the overall system over the dogmatic pursuit of small classes. If splitting a class makes it harder to verify business constraints or complicates the caller's logic significantly, it may be better to keep those related responsibilities together.
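
For reference, a minimal Kotlin sketch of the consolidated design the post favors, where the exclusivity constraint is visible in one class; the `Launcher` interface, concrete launchers, and mode names are stand-ins for the post's rockets/fireworks/products example.

```kotlin
// Sketch of the consolidated design: one binder holds all launchers and an
// explicit mode, so the "only one thing launches at a time" constraint is
// visible in a single file. Types and bodies are illustrative stand-ins.

interface Launcher { fun launch() }

class RocketLauncher : Launcher { override fun launch() = println("Rocket launched") }
class FireworksLauncher : Launcher { override fun launch() = println("Fireworks launched") }
class ProductLauncher : Launcher { override fun launch() = println("Product launched") }

enum class LaunchMode { ROCKET, FIREWORKS, PRODUCT }

class LaunchButtonBinder(
    private val rocketLauncher: RocketLauncher,
    private val fireworksLauncher: FireworksLauncher,
    private val productLauncher: ProductLauncher
) {
    var mode: LaunchMode = LaunchMode.ROCKET

    // The exclusivity constraint lives in one place: exactly one launcher fires per click.
    fun onButtonClicked() = when (mode) {
        LaunchMode.ROCKET -> rocketLauncher.launch()
        LaunchMode.FIREWORKS -> fireworksLauncher.launch()
        LaunchMode.PRODUCT -> productLauncher.launch()
    }
}

fun main() {
    val binder = LaunchButtonBinder(RocketLauncher(), FireworksLauncher(), ProductLauncher())
    binder.mode = LaunchMode.FIREWORKS
    binder.onButtonClicked()
}
```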

google

Learning to clarify: Multi-turn conversations with Action-Based Contrastive Self-Training

Action-Based Contrastive Self-Training (ACT) is a novel approach designed to enhance the multi-turn conversational capabilities of large language models, specifically their ability to ask clarifying questions when faced with ambiguity. While standard models often default to guessing a user's intent or overhedging, ACT optimizes conversational action planning as an implicit subtask of response generation. This method demonstrates that data-efficient tuning can significantly improve dialogue policy learning and reasoning in complex, mixed-initiative interactive scenarios.

## Implicit Action Planning

* Traditional conversational agents use separate modules for dialogue planning (deciding when to clarify) and response generation.
* ACT introduces "implicit action planning," which integrates these steps by teaching the model to perform planning as an inherent part of the end-to-end generation process.
* This approach addresses the limitations of standard Direct Preference Optimization (DPO), which often fails to account for the long-term, multi-turn consequences of specific dialogue actions.

## Action-Based Contrastive Data Generation

* The first phase involves building a preference dataset by identifying "winning" and "losing" actions for specific conversation turns.
* Using an existing dataset, the system identifies a successful turn (e.g., a clarifying question) as the winning response.
* A synthetic "rejected" response is then generated to represent a converse, less-optimal action (e.g., attempting to answer despite ambiguity).
* This creates a pairwise dataset that contrastively defines successful versus unsuccessful conversational strategies.

## Quasi-Online Contrastive Self-Training

* Instead of relying solely on static, offline pairs, ACT employs on-policy sampling to simulate the multi-turn trajectory of a response.
* The model evaluates whether a sampled response (such as a clarifying question) leads to a successful final outcome based on the user's original intent.
* If the simulated trajectory is successful, it replaces the winning response in the DPO update; if it fails, it is used to refine the losing response.
* This quasi-online feedback loop ensures the model is optimized based on the actual outcomes of its conversational decisions rather than just single-turn labels.

## Evaluation and the AmbigSQL Benchmark

* The researchers introduced AmbigSQL, a new benchmark task focusing on disambiguating information-seeking requests for complex SQL code generation.
* ACT was also tested on real-world tasks including tabular-grounded question-answering and machine reading comprehension.
* Experimental results show that ACT substantially outperforms standard Supervised Fine-Tuning (SFT) and standard DPO in multi-turn conversation modeling.

By focusing on the downstream consequences of dialogue actions, ACT provides a practical framework for developers to build more "mixed-initiative" agents that know when to stop and ask for clarification, ultimately leading to higher accuracy in complex data-seeking tasks.
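
A hedged Kotlin sketch of the pair construction and the quasi-online replacement rule described above; the data classes, action enum, and trajectory simulator are placeholders, not the paper's implementation.

```kotlin
// Illustrative data flow only: how a winning/losing pair could be built and how
// an on-policy sample could replace the chosen or rejected response depending on
// the simulated trajectory outcome.

enum class DialogueAction { CLARIFY, ANSWER }

data class Response(val action: DialogueAction, val text: String)
data class PreferencePair(val chosen: Response, val rejected: Response)

// Offline phase: pair a successful turn (e.g. a clarifying question) with a
// synthetic response that takes the contrary, less optimal action.
fun buildContrastivePair(
    winning: Response,
    synthesizeRejected: (DialogueAction) -> Response
): PreferencePair {
    val contraryAction =
        if (winning.action == DialogueAction.CLARIFY) DialogueAction.ANSWER else DialogueAction.CLARIFY
    return PreferencePair(chosen = winning, rejected = synthesizeRejected(contraryAction))
}

// Quasi-online phase: sample an on-policy response, simulate the rest of the
// conversation, and slot the sample into the pair according to the outcome.
fun refinePair(
    pair: PreferencePair,
    sampled: Response,
    trajectorySucceeds: (Response) -> Boolean
): PreferencePair =
    if (trajectorySucceeds(sampled)) pair.copy(chosen = sampled)   // successful sample replaces the winner
    else pair.copy(rejected = sampled)                             // failed sample refines the loser

fun main() {
    val winning = Response(DialogueAction.CLARIFY, "Which table should the query read from?")
    val pair = buildContrastivePair(winning) { action -> Response(action, "SELECT * FROM orders;") }
    val sampled = Response(DialogueAction.CLARIFY, "Do you want results grouped by month?")
    println(refinePair(pair, sampled) { true })
}
```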

line

Complex user authentication processes are

Designing a robust membership authentication system is a critical early-stage requirement that prevents long-term technical debt and protects a platform’s integrity. By analyzing the renewal of the Demaecan delivery service, it is evident that choosing the right authentication mechanism depends heavily on regional infrastructure and a balance between security costs and user friction. Ultimately, a well-structured authentication flow can simultaneously reduce fraud rates and significantly lower user drop-off during registration.

### The Consequences of Weak Authentication

Neglecting authentication design during the initial stages of a project often leads to "ghost members" and operational hurdles that are difficult to rectify later.

* **Data Integrity Issues:** Without verification, databases fill with unreachable or fake contact information, such as invalid phone numbers.
* **Onboarding Blockers:** Legitimate new users may be prevented from signing up if their recycled phone numbers are already linked to unverified legacy accounts.
* **Marketing Abuse:** A lack of unique identifiers makes it impossible to prevent bad actors from creating multiple accounts to exploit promotional coupons or events.

### Regional Differences in Verification

Authentication strategies must be tailored to the specific digital infrastructure of the target market, as "identity verification" varies globally.

* **Domestic (Korea) Standards:** Highly integrated systems allow for "Identity Verification," which combines possession (OTP) and real-name data through telecommunications companies or banking systems.
* **Global and Japanese Standards:** Most regions lack a centralized government-linked identity system, relying instead on "Possession Authentication" via email or SMS, or simple two-factor authentication (2FA).
* **Verification Expiration:** High-security services must define clear validity periods for authentication data and determine how long to retain data after a user withdraws to prevent immediate re-abuse.

### Strategic Fraud Prevention via IVR

When SMS-based possession authentication becomes insufficient to stop determined abusers, shifting the economic cost onto the fraudster is an effective solution.

* **SMS vs. Voice (IVR):** In Japan, acquiring phone numbers capable of receiving voice calls is more expensive than acquiring SMS-only numbers.
* **IVR Implementation:** By switching to call-based interactive voice response (IVR) authentication, Demaecan increased the barrier to entry for abusers.
* **Impact:** This strategic shift in authentication type reduced the fraudulent user rate from over 20% to just 1.5%.

### Optimizing Sign-up UX and Retention

A complex authentication process does not have to result in high churn if the UI flow is logically organized and user-friendly.

* **Logical Grouping:** Grouping similar tasks—such as placing phone and email verification sequentially—helps users understand the progression of the sign-up flow.
* **Streamlined Data Entry:** Integrating social login buttons early in the process allows for email auto-fill, reducing the number of manual input fields for the user.
* **Safety Nets:** Implementing simple "back" buttons for correcting typos during email verification and adding warning dialogs when a user tries to close the window significantly reduces accidental exits.
* **Performance Metrics:** These UX improvements led to a 30% decrease in user attrition, proving that structured flows can mitigate the friction of multi-step verification.

To build a successful authentication system, planners should prioritize the most cost-effective verification method for their specific market and focus on grouping steps logically to maintain a smooth user experience. Monitoring conversion logs is essential to identify and fix specific points in the flow where users might struggle.

line

Code Quality Improvement Techniques Part 1

The "Clone Family" anti-pattern occurs when two parallel inheritance hierarchies—such as a data model tree and a provider tree—share an implicit relationship that is not enforced by the type system. This structure often leads to type-safety issues and requires risky downcasting to access specific data types, increasing the likelihood of runtime errors during code modifications. To resolve this, developers should replace rigid inheritance with composition or utilize parametric polymorphism to explicitly link related types.

## The Risks of Implicit Correspondence

Maintaining two separate inheritance trees where individual subclasses are meant to correspond to one another creates several technical hurdles.

* **Downcasting Requirements:** Because a base provider typically returns a base data model type, developers must manually cast the result to a specific subclass (e.g., `as FooDataModel`), which bypasses compiler safety.
* **Lack of Type Enforcement:** The constraint that a specific provider always returns a specific model is purely implicit; the compiler cannot prevent a provider from returning the wrong model type.
* **Fragile Architecture:** As the system grows, ensuring that "Provider A" always maps to "Model A" becomes difficult to audit, leading to potential bugs when new developers join the project or when the hierarchy is extended.

## Substituting Inheritance with Composition

When the primary goal of inheritance is simply to share common logic, such as fetching raw data, using composition or aggregation is often a superior alternative.

* **Logic Extraction:** Shared functionality can be moved into a standalone class, such as an `OriginalDataProvider`, which is then held as a private property within specific provider classes.
* **Direct Type Returns:** By removing the shared parent class, each provider can explicitly return its specific data model type without needing a common interface.
* **Decoupling:** This approach eliminates the "Clone Family" entirely by removing the need for parallel trees, resulting in cleaner and more modular code.

## Leveraging Parametric Polymorphism

In scenarios where a common parent class is necessary—for example, to manage a collection of providers within a shared lifecycle—generics can be used to bridge the two hierarchies safely.

* **Generic Type Parameters:** By defining the parent as `ParentProvider<T>`, the base class can use a type parameter for its return values rather than a generic base model.
* **Subclass Specification:** Each implementation (e.g., `FooProvider : ParentProvider<FooDataModel>`) explicitly defines its return type, allowing the compiler to enforce the relationship.
* **Flexible Constraints:** Developers can still utilize type bounds, such as `ParentProvider<T : CommonDataModel>`, to ensure that the generics adhere to a specific interface while maintaining type safety for callers.

When designing data providers and models, avoid creating parallel structures that rely on implicit assumptions. Prioritize composition to simplify the architecture, or use generics if inheritance is required, ensuring that the relationships between classes remain explicit and verifiable by the compiler.
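
A brief Kotlin sketch of the parametric-polymorphism fix, reusing the identifiers mentioned above; `CommonDataModel`, the concrete models, and the provider bodies are minimal stand-ins.

```kotlin
// The type parameter makes the provider/model correspondence explicit and
// compiler-checked instead of relying on downcasts like `as FooDataModel`.
// Model contents and provider bodies are placeholders.

interface CommonDataModel
data class FooDataModel(val foo: String) : CommonDataModel
data class BarDataModel(val bar: Int) : CommonDataModel

abstract class ParentProvider<T : CommonDataModel> {
    abstract fun provide(): T
}

class FooProvider : ParentProvider<FooDataModel>() {
    override fun provide(): FooDataModel = FooDataModel(foo = "foo")
}

class BarProvider : ParentProvider<BarDataModel>() {
    override fun provide(): BarDataModel = BarDataModel(bar = 42)
}

fun main() {
    // Callers get the specific model type back without any casting.
    val fooModel: FooDataModel = FooProvider().provide()
    val barModel: BarDataModel = BarProvider().provide()
    println("$fooModel, $barModel")
}
```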

line

Implementing a RAG-Based Bot to

To address the operational burden of handling repetitive user inquiries for the AWX automation platform, LY Corporation developed a support bot utilizing Retrieval-Augmented Generation (RAG). By combining internal documentation with historical Slack thread data, the system provides automated, context-aware answers that significantly reduce manual SRE intervention. This approach enhances service reliability by ensuring users receive immediate assistance while allowing engineers to focus on high-priority development tasks.

### Technical Infrastructure and Stack

* **Slack Integration**: The bot is built using the **Bolt for Python** framework to handle real-time interactions within the company’s communication channels.
* **LLM Orchestration**: **LangChain** is used to manage the RAG pipeline; the developers suggest transitioning to LangGraph for teams requiring more complex multi-agent workflows.
* **Embedding Model**: The **paraphrase-multilingual-mpnet-base-v2** (SBERT) model was selected to support multi-language inquiries from LY Corporation’s global workforce.
* **Vector Database**: **OpenSearch** serves as the vector store, chosen for its availability as an internal PaaS and its efficiency in handling high-dimensional data.
* **Large Language Model**: The system utilizes **OpenAI (ChatGPT) Enterprise**, which ensures business data privacy by preventing the model from training on internal inputs.

### Enhancing LLM Accuracy through RAG and Vector Search

* **Overcoming LLM Limits**: Traditional LLMs suffer from "hallucinations," lack of up-to-date info, and opaque sourcing; RAG fixes this by providing the model with specific, trusted context during the prompt phase.
* **Embedding and Vectorization**: Textual data from wikis and chats are converted into high-dimensional vectors, where semantically similar phrases (e.g., "Buy" and "Purchase") are stored in close proximity.
* **k-NN Retrieval**: When a user asks a question, the bot uses **k-Nearest Neighbors (k-NN)** algorithms to retrieve the top *k* most relevant snippets of information from the vector database.
* **Contextual Generation**: Rather than relying on its internal training data, the LLM generates a response based specifically on the retrieved snippets, leading to higher accuracy and domain-specific relevance.

### AWX Support Bot Workflow and Data Sources

* **Multi-Source Indexing**: The bot references two main data streams: the official internal AWX guide wiki and historical Slack inquiry threads where previous solutions were discussed.
* **Automated First Response**: The workflow begins when a user submits a query via a Slack workflow; the bot immediately processes the request and provides an initial AI-generated answer.
* **Human-in-the-Loop Validation**: After receiving an answer, users can click "Issue Resolved" to close the ticket or "Call AWX Admin" if the AI's response was insufficient.
* **Efficiency Gains**: This tiered approach filters out "RTFM" (Read The F***ing Manual) style questions, ensuring that human administrators only spend time on unique or complex technical issues.

Implementing a RAG-based support bot is a highly effective strategy for SRE teams looking to scale their internal support without increasing headcount. For the best results, organizations should focus on maintaining clean internal documentation and selecting embedding models that reflect the linguistic diversity of their specific workforce.
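
The production bot is built in Python with LangChain and OpenSearch, but the retrieve-then-generate step can be illustrated with a language-agnostic sketch (written here in Kotlin) using in-memory cosine similarity; the `Snippet` type and prompt format are assumptions.

```kotlin
// Conceptual sketch of the retrieve-then-generate step only. The real bot uses
// Bolt for Python, LangChain, and OpenSearch k-NN; this in-memory version just
// shows how the top-k snippets end up in the prompt the LLM receives.

import kotlin.math.sqrt

data class Snippet(val text: String, val embedding: DoubleArray)

fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    val dot = a.indices.sumOf { a[it] * b[it] }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}

// k-NN retrieval: rank stored wiki/Slack snippets by similarity to the query embedding.
fun retrieveTopK(queryEmbedding: DoubleArray, snippets: List<Snippet>, k: Int): List<Snippet> =
    snippets.sortedByDescending { cosineSimilarity(queryEmbedding, it.embedding) }.take(k)

// Contextual generation: the LLM is asked to answer only from the retrieved context.
fun buildPrompt(question: String, context: List<Snippet>): String =
    buildString {
        appendLine("Answer the AWX question using only the context below.")
        context.forEachIndexed { i, snippet -> appendLine("[${i + 1}] ${snippet.text}") }
        appendLine("Question: $question")
    }
```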

google

Fine-tuning LLMs with user-level differential privacy

Researchers from Google investigated scaling user-level differential privacy (DP) to the fine-tuning of large language models in datacenter environments. While traditional example-level DP protects individual data points, user-level DP provides a stronger guarantee by masking the presence of an entire user's dataset, which is critical for privacy-sensitive, domain-specific tasks. The study explores how the flexibility of datacenter training can be used to optimize sampling strategies and contribution bounds to minimize the noise typically required for these stringent privacy guarantees.

## Limitations of Example-Level Privacy

* Standard differential privacy focuses on "example-level" protection, which prevents attackers from learning about specific individual data points.
* In many real-world scenarios, a single user contributes many examples to a dataset; if an attacker can analyze these multiple points together, they may still learn private information about the user even under example-level DP.
* User-level DP addresses this by ensuring a model remains essentially the same whether or not a specific user’s entire data collection was used during training.
* While more robust, user-level DP is "strictly harder" to implement because it requires injecting significantly more noise into the training process, a problem that scales with the size of the model.

## Methodologies for User-Level DP Fine-Tuning

* Both primary algorithms require a "contribution bound" during pre-processing, which strictly limits the number of examples any single user can provide to the training set.
* Example-Level Sampling (ELS) involves sampling random individual examples for a batch and then applying a modified version of DP-SGD with high noise to compensate for the potential presence of multiple examples from the same user.
* User-Level Sampling (ULS) involves sampling random users and including all of their (bounded) examples in a batch, which more closely resembles the structure of federated learning.
* The datacenter environment offers a unique advantage over federated learning because researchers can perform precise queries on both individual examples and whole users, allowing for better optimization of the noise-to-utility ratio.

## Optimization and Datacenter Flexibility

* The researchers focused on fine-tuning rather than full training because DP requires additional computation that is often unaffordable for base model training.
* A central challenge in this research is determining the optimal "contribution bound"—if the bound is too low, valuable data is discarded, but if it is too high, more noise must be added to maintain privacy.
* Because the datacenter allows for random sampling of any user at any time (unlike federated learning where devices must be online), the ULS algorithm can be tuned more effectively to achieve quality gains in the final model.

To maximize the utility of LLMs fine-tuned on private data, developers should prioritize User-Level Sampling (ULS) strategies and carefully calibrate the contribution bounds of their datasets. By leveraging the controlled environment of a datacenter to optimize these parameters, it is possible to achieve high-performance models that respect user privacy more effectively than traditional example-level methods.
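
A simplified Kotlin sketch of the pre-processing and User-Level Sampling steps described above: cap each user's contribution, then sample whole users into a batch. The record type and batch parameters are illustrative, and the DP noise addition itself is omitted.

```kotlin
// Illustrative contribution bounding and ULS batch construction. Real training
// would add clipping and calibrated noise (DP-SGD style); that part is omitted.

data class Example(val userId: String, val text: String)

// Contribution bound: keep at most `bound` examples per user.
fun applyContributionBound(examples: List<Example>, bound: Int): Map<String, List<Example>> =
    examples.groupBy { it.userId }.mapValues { (_, userExamples) -> userExamples.take(bound) }

// ULS: sample users (not individual examples) and include all of their bounded examples.
fun sampleUserLevelBatch(
    boundedByUser: Map<String, List<Example>>,
    usersPerBatch: Int
): List<Example> =
    boundedByUser.keys.shuffled().take(usersPerBatch).flatMap { boundedByUser.getValue(it) }

fun main() {
    val data = listOf(
        Example("alice", "a1"), Example("alice", "a2"), Example("alice", "a3"),
        Example("bob", "b1"), Example("carol", "c1"), Example("carol", "c2")
    )
    val bounded = applyContributionBound(data, bound = 2)   // hypothetical bound of 2
    println(sampleUserLevelBatch(bounded, usersPerBatch = 2))
}
```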

line

Code Quality Improvement Techniques Part 1

The "Set Discount" technique improves code quality by grouping related mutable properties into a single state object rather than allowing them to be updated individually. By restricting state changes through a controlled interface, developers can prevent inconsistent configurations and simplify the lifecycle management of complex classes. This approach ensures that dependent values are updated atomically, significantly reducing bugs caused by race conditions or stale data.

### The Risks of Fragmented Mutability

When a class exposes multiple independent mutable properties, such as `isActive`, `minImportanceToRecord`, and `dataCountPerSampling`, it creates several maintenance challenges:

* **Order Dependency:** Developers might accidentally set `isActive` to true before updating the configuration properties, causing the system to briefly run with stale or incorrect settings.
* **Inconsistent Logic:** Internal state resets (like clearing a counter) may be tied to one property but forgotten when another related property changes, leading to unpredictable behavior.
* **Concurrency Issues:** Even in single-threaded environments, asynchronous updates to individual properties can create race conditions that are difficult to debug.

### Consolidating State with SamplingPolicy

To resolve these issues, the post recommends refactoring individual properties into a dedicated configuration class and using a single reference to manage the state:

* **Atomic Updates:** By wrapping configuration values into a `SamplingPolicy` class, the system ensures that the minimum importance level and sampling interval are always updated together.
* **Representing "Inactive" with Nulls:** Instead of a separate boolean flag, the `policy` property can be made nullable. An `inactive` state is naturally represented by `null`, making it impossible to "activate" the recorder without providing a valid policy.
* **Explicit Lifecycle Methods:** Replacing property setters with methods like `startRecording()` and `finishRecording()` forces a clear transition of state and ensures that counters are reset consistently every time a new session begins.

### Advantages of Restricting State Transitions

Moving from individual property mutation to a consolidated interface offers several technical benefits:

* **Guaranteed Consistency:** It eliminates the possibility of "half-configured" states because the policy is replaced as a whole.
* **Simplified Thread Safety:** If the class needs to be thread-safe, developers only need to synchronize a single reference update rather than coordinating multiple volatile variables.
* **Improved Readability:** The intent of the code becomes clearer to future maintainers because the valid combinations of state are explicitly defined by the API.

When designing components where properties are interdependent or must change simultaneously, you should avoid providing public setters for every field. Instead, provide a focused interface that limits updates to valid combinations, ensuring the object remains in a predictable state throughout its lifecycle.
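
A minimal Kotlin sketch of the consolidated interface, assuming a recorder class similar to the post's example; `SamplingPolicy` and the property names come from the post, while `EventRecorder` and its methods are stand-ins.

```kotlin
// Sketch of the consolidated design: configuration lives in one `SamplingPolicy`,
// `null` represents the inactive state, and the lifecycle methods reset the
// counter consistently. Recorder internals are illustrative.

data class SamplingPolicy(
    val minImportanceToRecord: Int,
    val dataCountPerSampling: Int
)

class EventRecorder {
    private var policy: SamplingPolicy? = null   // null == inactive; no separate isActive flag
    private var sampleCounter: Int = 0

    // State changes only through these lifecycle methods, never one field at a time.
    fun startRecording(newPolicy: SamplingPolicy) {
        policy = newPolicy
        sampleCounter = 0                        // counter reset happens together with activation
    }

    fun finishRecording() {
        policy = null
        sampleCounter = 0
    }

    fun record(importance: Int, data: String) {
        val activePolicy = policy ?: return      // inactive: nothing to do
        if (importance < activePolicy.minImportanceToRecord) return
        sampleCounter++
        if (sampleCounter % activePolicy.dataCountPerSampling == 0) {
            println("Recorded: $data")
        }
    }
}
```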

google

Google Research at Google I/O 2025

Google Research at I/O 2025 showcases the "research to reality" transition, highlighting how years of foundational breakthroughs are now being integrated into Gemini models and specialized products. By focusing on multimodal capabilities, pedagogy, and extreme model efficiency, Google aims to democratize access to advanced AI while ensuring it remains grounded and useful across global contexts.

## Specialized Healthcare Models: MedGemma and AMIE

* **MedGemma:** This new open model, based on Gemma 3, is optimized for multimodal medical tasks such as radiology image analysis and clinical data summarization. It is available in 4B and 27B sizes, performing similarly to much larger models on the MedQA benchmark while remaining small enough for efficient local fine-tuning.
* **AMIE (Articulate Medical Intelligence Explorer):** A research AI agent designed for diagnostic medical reasoning. Its latest multimodal version can now interpret and reason about visual medical information, such as skin lesions or medical imaging, to assist clinicians in diagnostic accuracy.

## Educational Optimization through LearnLM

* **Gemini 2.5 Pro Integration:** The LearnLM family of models, developed with educational experts, is now integrated into Gemini 2.5 Pro. This fine-tuning enhances STEM reasoning, multimodal understanding, and pedagogical feedback.
* **Interactive Learning Tools:** A new research-optimized quiz experience allows students to generate custom assessments from their own notes, providing specific feedback on right and wrong answers rather than just providing solutions.
* **Global Assessment Pilots:** Through partnerships like the one with Kayma, Google is testing the automatic assessment of short and long-form content in regions like Ghana to scale quality educational tools.

## Multilingual Expansion and On-Device Gemma Models

* **Gemma 3 and 3n:** Research breakthroughs have expanded Gemma 3’s support to over 140 languages. The introduction of **Gemma 3n** targets extreme efficiency, capable of running on devices with as little as 2GB of RAM while maintaining low latency and low energy consumption.
* **ECLeKTic Benchmark:** To assist the developer community, Google introduced this novel benchmark specifically for evaluating how well large language models transfer knowledge across different languages.

## Model Efficiency and Factuality in Search

* **Inference Techniques:** Google Research continues to set industry standards for model speed and accessibility through technical innovations like **speculative decoding** and **cascades**, which reduce the computational cost of generating high-quality responses.
* **Grounded Outputs:** Significant focus remains on factual consistency, ensuring that the AI models powering features like AI Overviews in Search provide reliable and grounded information to users.

As Google continues to shrink the gap between laboratory breakthroughs and consumer products, the emphasis remains on making high-performance AI accessible on low-cost hardware and across diverse linguistic landscapes. Developers and researchers can now leverage these specialized tools via platforms like HuggingFace and Vertex AI to build more targeted, efficient applications.

line

How should we evaluate AI-generated

To optimize the Background Person Removal (BPR) feature in image editing services, the LY Corporation AMD team evaluated various generative AI inpainting models to determine which automated metrics best align with human judgment. While traditional research benchmarks often fail to reflect performance in high-resolution, real-world scenarios, this study identifies a framework for selecting models that produce the most natural results. The research highlights that as the complexity and size of the masked area increase, the gap between model performance becomes more pronounced, requiring more sophisticated evaluation strategies.

### Background Person Removal Workflow

* **Instance Segmentation:** The process begins by identifying individual pixels to classify objects such as people, buildings, or trees within the input image.
* **Salient Object Detection:** This step distinguishes the main subjects of the photo from background elements to ensure only unwanted figures are targeted for removal.
* **Inpainting Execution:** Once the background figures are removed, inpainting technology is used to reconstruct the empty space so it blends seamlessly with the surrounding environment.

### Comparison of Inpainting Technologies

* **Diffusion-based Models:** These models, such as FLUX.1-Fill-dev, restore damaged areas by gradually removing noise. While they excel at restoring complex details, they are generally slower than GANs and can occasionally generate artifacts.
* **GAN-based Models:** Using a generator-discriminator architecture, models like LaMa and HINT offer faster generation speeds and competitive performance for lower-resolution or smaller inpainting tasks.
* **Performance Discrepancy:** Experiments showed that while most models perform well on small areas, high-resolution images with large missing sections reveal significant quality differences that are not always captured in standard academic benchmarks.

### Evaluation Methodology and Metrics

* **BPR Evaluation Dataset:** The team curated a specific dataset of 10 images with high quality variance to test 11 different inpainting models released between 2022 and 2024.
* **Single Image Quality Metrics:** Evaluated models using LAION Aesthetics score-v2, CLIP-IQA, and Q-Align to measure the aesthetic quality of individual generated frames.
* **Preference and Reward Models:** Utilized PickScore, ImageReward, and HPS v2 to determine which generated images would be most preferred by human users.
* **Objective:** The goal of these tests was to find an automated evaluation method that minimizes the need for expensive and time-consuming human reviews while maintaining high reliability.

Selecting an inpainting model based solely on paper-presented metrics is insufficient for production-level services. For features like BPR, it is critical to implement an evaluation pipeline that combines both aesthetic scoring and human preference models to ensure consistent quality across diverse, high-resolution user photos.
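
As a purely hypothetical illustration of how such automated metrics might be aggregated, the Kotlin sketch below min-max normalizes each metric across models and ranks models by the mean normalized score; the model names come from the post, but the numbers and the aggregation scheme are placeholders, not the team's methodology.

```kotlin
// Hypothetical aggregation step: normalize each automated metric across models,
// then rank models by the mean normalized score. Scores are placeholders.

fun rankModels(scoresByModel: Map<String, Map<String, Double>>): List<Pair<String, Double>> {
    val metrics = scoresByModel.values.flatMap { it.keys }.distinct()
    // Per-metric min and max across all models, for min-max normalization.
    val ranges = metrics.associateWith { metric ->
        val values = scoresByModel.values.mapNotNull { it[metric] }
        (values.minOrNull() ?: 0.0) to (values.maxOrNull() ?: 1.0)
    }
    return scoresByModel.mapValues { (_, byMetric) ->
        byMetric.entries.map { (metric, value) ->
            val (min, max) = ranges.getValue(metric)
            if (max == min) 1.0 else (value - min) / (max - min)
        }.average()
    }.toList().sortedByDescending { it.second }
}

fun main() {
    val scores = mapOf(   // placeholder numbers, not the team's measurements
        "FLUX.1-Fill-dev" to mapOf("CLIP-IQA" to 0.71, "PickScore" to 21.4),
        "LaMa" to mapOf("CLIP-IQA" to 0.63, "PickScore" to 20.1),
        "HINT" to mapOf("CLIP-IQA" to 0.66, "PickScore" to 20.6)
    )
    rankModels(scores).forEach { (model, score) -> println("$model: ${"%.2f".format(score)}") }
}
```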

line

Code Quality Improvement Techniques Part 1

Effective code design often involves shifting the responsibility of state verification from the caller to the receiving object. By internalizing "if-checks" within the function that performs the action, developers can reduce boilerplate, prevent bugs caused by missing preconditions, and simplify state transitions. This encapsulation ensures that objects maintain their own integrity while providing a cleaner, more intuitive API for the rest of the system.

### Internalizing State Verification

* Instead of the caller using a pattern like `if (!receiver.isState()) { receiver.doAction() }`, the check should be moved inside the `doAction` method.
* Moving the check inside the function prevents bugs that occur when a caller forgets to verify the state, which could otherwise lead to crashes or invalid data transitions.
* This approach hides internal state details from the caller, simplifying the object's interface and focusing on the desired outcome rather than the prerequisite checks.
* If "doing nothing" when a condition isn't met is non-obvious, developers should use descriptive naming (e.g., `markAsFriendIfNotYet`) or clear documentation to signal this behavior.

### Leveraging Return Values for Conditional Logic

* When a caller needs to trigger a secondary effect—such as showing a UI popup—only if an action was successful, it is better to return a status value (like a `Boolean`) rather than using higher-order functions.
* Passing callbacks like `onSucceeded` into a use case can create unnecessary dependency cycles and makes it difficult for the caller to discern if the execution is synchronous or asynchronous.
* Returning a `Boolean` to indicate if a state change actually occurred allows the caller to handle side effects cleanly and sequentially.
* To ensure the caller doesn't ignore these results, developers can use documentation or specific compiler annotations to force the verification of the returned value.

To improve overall code quality, prioritize "telling" an object what to do rather than "asking" about its state and then acting. Centralizing state logic within the receiver not only makes the code more robust against future changes but also makes the intent of the calling code much easier to follow.
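
A short Kotlin sketch of both points above, reusing the post's `markAsFriendIfNotYet` name; the `FriendList` class and the popup call are illustrative stand-ins.

```kotlin
// The precondition check lives inside the receiver, and the Boolean return value
// lets the caller trigger a secondary effect only when the state actually changed.

class FriendList {
    private val friendIds = mutableSetOf<String>()

    /**
     * Marks the user as a friend. Does nothing if the user is already a friend.
     * @return true if the state actually changed, false if it was already set.
     */
    fun markAsFriendIfNotYet(userId: String): Boolean {
        if (userId in friendIds) return false   // check internalized in the receiver
        friendIds.add(userId)
        return true
    }
}

fun onAddFriendTapped(friendList: FriendList, userId: String) {
    // The caller no longer asks about state; it reacts to the returned result.
    val added = friendList.markAsFriendIfNotYet(userId)
    if (added) {
        println("Show 'friend added' popup")    // secondary UI effect, driven by the return value
    }
}

fun main() {
    val friends = FriendList()
    onAddFriendTapped(friends, "user-123")      // shows the popup
    onAddFriendTapped(friends, "user-123")      // already a friend: no popup
}
```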