Figma Rendering: Powered by WebGPU | Figma Blog
The search for speed in Figma Inside Figma Engineering Quality & performance Infrastructure Behind the scenes
Self Logits Evolution Decoding (SLED) is a novel decoding strategy designed to reduce hallucinations and improve the factual accuracy of large language models without requiring external data or fine-tuning. By leveraging the internal representations of all model layers rather than just the final output, SLED aligns generation with the model’s intrinsic knowledge more effectively. Research shows that this approach consistently enhances performance across diverse tasks, including complex reasoning, multiple-choice questions, and open-ended generation.

## Limitations of Standard Decoding

* Standard LLMs typically generate text by relying solely on the "logits" (prediction scores) of the final layer to determine the next token.
* This process often leads to hallucinations because the final layer may prioritize "popular" or common patterns from training data over factual accuracy.
* While techniques like Retrieval Augmented Generation (RAG) provide external context, they increase system complexity and do not address the model's internal tendency to ignore subtle contextual cues during the final projection.

## The Technical Mechanism of SLED

* SLED utilizes "early exit" logits from every intermediate layer of the Transformer architecture, rather than just the final one.
* The strategy reuses the model's final projection matrix on these intermediate layers to create multiple probability distributions across the same set of potential tokens.
* By calculating a weighted average of the distributions from all layers, SLED refines the prediction to better reflect the model's latent knowledge.
* This multi-layer approach allows the model to catch nuances (such as specific math constraints or geographic facts) that might be "smoothed over" by the final layer’s preference for high-probability sequences.
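
The mechanism above (reuse the final projection matrix on every layer, then average the resulting distributions) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the published SLED algorithm: the function names, the toy hidden states, and the fixed `layer_weights` are all hypothetical, and the real method may weight or combine layers differently from this simple weighted average.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(hidden, projection):
    """Apply the shared projection matrix (vocab x hidden) to one hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in projection]


def blend_layer_distributions(hidden_states, projection, layer_weights):
    """Weighted average of the per-layer next-token distributions.

    hidden_states: one hidden vector per layer (hypothetical toy input).
    layer_weights: one non-negative weight per layer (an assumption; the
    summary does not say how the weights are chosen).
    """
    if len(hidden_states) != len(layer_weights):
        raise ValueError("need exactly one weight per layer")
    norm = sum(layer_weights)
    blended = [0.0] * len(projection)
    for hidden, weight in zip(hidden_states, layer_weights):
        probs = softmax(project(hidden, projection))  # "early exit" distribution
        for i, p in enumerate(probs):
            blended[i] += (weight / norm) * p
    return blended


# Toy setup: 3 tokens, hidden size 2, 3 layers (the last one is the final layer).
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden_states = [[0.2, 1.5], [0.4, 1.0], [1.2, 0.3]]
layer_weights = [1.0, 1.0, 2.0]

blended = blend_layer_distributions(hidden_states, projection, layer_weights)
next_token = max(range(len(blended)), key=blended.__getitem__)
```

In this toy setup every layer ranks token 2 highest, so the blended distribution does as well. The interesting cases with a real model are those where intermediate layers disagree with the final layer.
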
## Practical Performance and Reasoning

* In chain-of-thought tasks, SLED helps the model maintain logic; for example, it can correctly identify when a discount should be applied in a math problem by favoring intermediate layers that recognize the "if/then" logic over a simple arithmetic pattern.
* The method is model-agnostic and has shown consistent accuracy gains across various LLM scales and configurations.
* SLED is highly flexible and can be integrated with existing factuality decoding methods or speculative decoding to further reduce hallucinations without the need for additional training data.

For developers and researchers seeking to boost the reliability of LLMs, SLED offers a computationally efficient alternative to fine-tuning. By simply adjusting the decoding strategy to incorporate the rich information available in intermediate layers, models can achieve higher factuality and more robust reasoning capabilities in real-world applications.
Google Research has introduced Learn Your Way, an AI-driven educational experiment that reimagines traditional textbooks as personalized, multimodal learning journeys. By leveraging the LearnLM family of models integrated into Gemini 2.5 Pro, the system transforms static source material into tailored content based on a student’s specific grade level and interests. Early efficacy studies demonstrate that this approach significantly enhances retention, with students scoring 11 percentage points higher than those using standard digital readers.

### Pedagogical Foundations and Dual Coding

The research is built on the "dual coding theory," which suggests that forming mental connections between different representations of information strengthens conceptual understanding.

* The system moves away from a "one-size-fits-all" model toward a student-driven experience where learners can choose and intermix formats.
* Personalization is used as a tool to enhance situational interest and motivation by adapting content to specific student attributes.
* The framework incorporates active learning through real-time quizzing and feedback to address knowledge gaps as they arise.

### The Personalization Pipeline

The technical architecture begins with a layered pipeline that processes source material, such as a textbook PDF, to create a foundational text for all other formats.

* The original material is first "re-leveled" to match the learner’s reported grade level while maintaining the integrity and scope of the curriculum.
* Generic examples within the text are strategically replaced with personalized examples based on user interests, such as sports, music, or food.
* This personalized base text serves as the primary input for generating all subsequent multimodal representations, ensuring consistency across formats.

### Multimodal Content Generation

To produce a wide variety of educational assets, the system utilizes a combination of large language models and specialized AI agents.
* **Agentic Workflows:** While tools like mind maps and timelines are generated directly by Gemini, complex assets like narrated slides use multi-step agentic workflows to ensure pedagogical effectiveness.
* **Custom Visuals:** Because general-purpose image models often struggle with educational accuracy, the researchers fine-tuned a dedicated model specifically for generating educational illustrations.
* **Diverse Representations:** The interface provides "immersive text" with embedded questions, audio lessons for auditory learning, and interactive slides that mimic recorded classroom sessions.

### Research Outcomes and Future Application

The project’s effectiveness was validated through a study comparing the GenAI approach against standard digital reading materials.

* Students using the personalized AI tools showed a significant improvement in retention test scores.
* Beyond retention, the system aims to transform passive reading into an active, multimodal experience that follows established learning science principles.
* The "Learn Your Way" experiment is currently available on Google Labs, providing a practical look at how adaptive, learner-centric materials might replace static textbooks in future K-12 and higher education settings.
Viaduct, Five Years On: Modernizing the Data-Oriented Service Mesh A more powerful engine and a simpler API for our data-oriented mesh By: Adam Miskiewicz, Raymie Stata In November 2020 we published a post about Viaduct, our data-oriented service mesh. Today, we’…
Discord's "Patch Notes" series serves as a dedicated update log detailing the platform's ongoing efforts to improve performance, reliability, and responsiveness. By highlighting recent bug fixes and usability enhancements, the series keeps the community informed about the specific engineering changes being deployed across the service. All listed improvements have been officially committed and merged into the codebase, though they may roll out to different platforms at varying speeds.

### Community Feedback and Bug Reporting

* Users can report technical issues through the Bimonthly Bug Megathread hosted on the r/DiscordApp subreddit.
* This community-run channel allows the Discord Engineering team to directly review and address specific problems reported by the user base.

### Early Feature Testing via TestFlight

* iOS users are invited to join the Discord TestFlight program to gain early access to features before their official release.
* This beta testing environment is used to identify and "squish" bugs through community interaction before the changes reach the general public.

### Deployment and Release Status

* Improvements documented in these updates represent code that has already passed the commit and merge stages of the development cycle.
* Because the rollout process is incremental, users may experience a slight delay before specific fixes become active on their particular device or platform.

To ensure the best experience, users are encouraged to keep their applications updated and utilize the TestFlight program if they wish to provide early feedback on new builds.
Free association: Production designer Jeremy Hindle on building Severance Insights Profiles & interviews Culture From Jacques Tati’s “Playtime” to David Lynch’s “Twin Peaks,” Jeremy Hindle traces the ideas and images that shaped Lumon’s uncanny world. “Playtime (1967)” John Deere…
When implementing resource management patterns similar to Kotlin's `use` or Java's try-with-resources, developers often face the challenge of handling exceptions that occur during both primary execution and resource cleanup. Simply wrapping these multiple failures in a custom exception container can inadvertently break the calling code's error-handling logic by masking the original exception type. To maintain code quality, developers should prioritize the primary execution exception and utilize the `addSuppressed` mechanism to preserve secondary errors without disrupting the expected flow.

### The Risks of Custom Exception Wrapping

Creating a new exception class to consolidate multiple errors during resource management can lead to significant issues for the caller.

* Wrapping an expected exception, such as an `IOException`, inside a custom `DisposableException` prevents specific `catch` blocks from identifying and handling the original error.
* This pattern often results in unhandled exceptions or the loss of specific error context, especially when the wrapper is hidden inside utility functions.
* While this approach aims to be "neat" by capturing all possible failures, it forces the caller to understand the internal wrapping logic of the utility rather than the business logic errors.

### Prioritizing Primary Logic over Cleanup

When errors occur in both the main execution block and the cleanup (e.g., `dispose()` or `close()`), it is critical to determine which exception takes precedence.

* The exception from the main execution block is typically the "primary" failure that reflects a business logic or IO error, whereas a cleanup failure is often secondary.
* Throwing a cleanup exception while discarding the primary error makes debugging difficult, as the root cause of the initial failure is lost.
* In a plain `try-finally` block, an exception thrown from the `finally` block replaces any exception thrown in the `try` block, so the original error is silently discarded unless it is captured and handled manually.
### Implementing Better Suppression Logic

A more robust implementation mimics the behavior of Kotlin’s `Closeable.use` by ensuring the most relevant error is thrown while keeping others accessible for debugging.

* Instead of creating a wrapper class, use `Throwable.addSuppressed()` to attach the cleanup exception to the primary exception.
* If only the primary block fails, throw that exception directly to satisfy the caller's `catch` requirements.
* If both the primary block and the cleanup fail, throw the primary exception and add the cleanup exception as a suppressed error.
* If only the cleanup fails, it is then appropriate to throw the cleanup exception as the standalone failure.

### Considerations for Checked and Unchecked Exceptions

The impact of exception handling varies by language, particularly in Java where checked exceptions are enforced by the compiler.

* Converting a checked exception into an unchecked `RuntimeException` inside a wrapper removes the compiler's enforcement, so callers can omit error handling that would otherwise be required.
* If exceptions have parent-child relationships, such as `IOException` and `Exception`, wrapping can cause a specific handler to be bypassed in favor of a more generic one.
* It is generally recommended to only wrap checked exceptions in `RuntimeException` when the error is truly unrecoverable and the caller is not expected to handle it.

When designing custom resource management utilities, always evaluate which exception is most critical for the caller to see. Prioritize the primary execution error and use suppression for auxiliary cleanup failures to ensure that your error-handling remains transparent and predictable for the rest of the application.
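
The precedence rules described above can be sketched as a small helper. The article concerns Kotlin and Java; the sketch below uses Python only to illustrate the control flow. Python has no `Throwable.addSuppressed()`, so the `suppressed` list attribute here is a hypothetical stand-in for it, and `use`, `FlakyResource`, `read_and_fail`, and `read_ok` are illustrative names that do not come from the original post.

```python
def use(resource, block):
    """Run block(resource), always dispose, and keep the primary error on top."""
    primary = None
    try:
        return block(resource)
    except BaseException as exc:
        primary = exc  # remember the primary failure for the cleanup step
        raise          # re-raised unchanged, so callers can catch its real type
    finally:
        try:
            resource.dispose()
        except Exception as cleanup_error:
            if primary is None:
                # Only the cleanup failed: it becomes the standalone failure.
                raise
            # Both failed: attach the cleanup error to the primary one.
            # `suppressed` is a hypothetical attribute mimicking addSuppressed().
            primary.suppressed = getattr(primary, "suppressed", []) + [cleanup_error]


class FlakyResource:
    """Toy resource whose dispose() can be made to fail."""

    def __init__(self, fail_on_dispose):
        self.fail_on_dispose = fail_on_dispose

    def dispose(self):
        if self.fail_on_dispose:
            raise RuntimeError("dispose failed")


def read_and_fail(resource):
    raise OSError("read failed")


def read_ok(resource):
    return "data"
```

A caller that writes `except OSError` around `use(FlakyResource(True), read_and_fail)` still catches the original error, and the cleanup failure remains reachable through the attached list instead of replacing it.
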
VaultGemma represents a significant milestone in privacy-preserving AI as the most capable large language model trained from scratch using differential privacy (DP). By establishing new scaling laws specifically for DP training, researchers have optimized the complex trade-offs between compute, privacy budgets, and model utility. The resulting 1-billion-parameter model demonstrates that high-performance generative AI can be achieved while maintaining rigorous mathematical guarantees against data memorization.

## Scaling Laws for Differentially Private Training

* Performance in DP-trained models is primarily governed by the "noise-batch ratio," which measures the amount of random privacy noise relative to the size of the training data groups.
* Research suggests that for any given compute and privacy budget, there exists an optimal training configuration that balances model size, iterations, and batch size to achieve the lowest possible training loss.
* A critical finding indicates that DP training requires a departure from standard scaling practices, favoring significantly larger batch sizes and smaller model architectures than traditional non-DP training.

## Synergies in Privacy, Compute, and Data

* Increasing the privacy budget (epsilon) in isolation leads to diminishing returns unless it is paired with a proportional increase in compute (FLOPs) or data (tokens).
* Visualizations of the scaling laws show that different model sizes can provide similar utility if the number of training iterations and batch sizes are correctly adjusted.
* The optimal configuration shifts between investing in larger models versus more iterations depending on the specific constraints of the data and privacy budgets.

## Training at Scale with Algorithmic Advancements

* VaultGemma is built on the Gemma 2 architecture and utilizes a 1B parameter setup optimized for the unique constraints of DP.
* To overcome hardware limitations when processing the massive batch sizes required for DP training, the team developed a "Virtual Batch" technique in JAX to aggregate gradients across multiple steps.
* Training from scratch allows the model to outperform traditional DP-finetuned models, which often struggle to balance utility with the noise introduced during the fine-tuning process.

## Performance and Evaluation

* VaultGemma achieves competitive results against standard 1B parameter models while providing formal privacy protections.
* The model demonstrates superior privacy-utility trade-offs, proving that carefully scaled DP models can retain high levels of reasoning and language capability.
* The release includes the model weights and a comprehensive technical report to assist the community in developing the next generation of private-by-design AI.

VaultGemma provides a practical blueprint for developers who need to balance the power of large language models with strict data confidentiality requirements. By leveraging the provided scaling insights, organizations can now train models that are mathematically resistant to data leakage without sacrificing significant performance.
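
The "Virtual Batch" idea mentioned above, aggregating gradients over several smaller steps so that one logical batch can be larger than what fits in memory, can be illustrated with a generic DP-SGD-style sketch. This is not VaultGemma's JAX implementation: it is plain Python, and the linear model, the function names, the clipping norm, and the noise multiplier are all assumptions chosen for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def per_example_grad(weights, x, y):
    """Gradient of 0.5 * (w.x - y)^2 for a toy linear model."""
    err = sum(w * xi for w, xi in zip(weights, x)) - y
    return [err * xi for xi in x]


def virtual_batch_gradient(weights, examples, micro_batch_size,
                           max_norm, noise_multiplier, rng):
    """Accumulate clipped gradients over micro-batches, then add noise once."""
    total = [0.0] * len(weights)
    for start in range(0, len(examples), micro_batch_size):
        micro_batch = examples[start:start + micro_batch_size]
        for x, y in micro_batch:
            clipped = clip(per_example_grad(weights, x, y), max_norm)
            total = [t + c for t, c in zip(total, clipped)]
    # Gaussian noise is added to the summed gradient of the whole logical batch.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(examples) for n in noisy]


examples = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0),
            ([3.0, 0.0], 2.0), ([-1.0, 1.0], -1.0)]
weights = [0.1, -0.2]

# With the noise switched off, the micro-batch size does not change the result.
grad_micro_1 = virtual_batch_gradient(weights, examples, 1, 1.0, 0.0, random.Random(0))
grad_micro_4 = virtual_batch_gradient(weights, examples, 4, 1.0, 0.0, random.Random(0))
```

Because the noise is added once to the sum and then divided by the logical batch size, a larger logical batch shrinks the noise in the averaged gradient, which is one way to read the preference for very large batches in DP training.
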
Is the app layer where AI proves its value? Insights AI Design thinking Thought leadership The next leap in AI won’t come from new models alone—the app layer will be what makes new technology stick. Hero illustration by Zoé Maghamès Peters “Right now we’re in the MS-DOS era for…
Speculative cascades represent a hybrid inference method that integrates the cost-efficiency of model cascades with the latency-reducing benefits of speculative decoding. By utilizing a smaller drafter model to generate token sequences that are verified in parallel by a larger expert model, this approach allows for high-speed generation while maintaining flexible quality standards. The result is a system that achieves superior cost-quality trade-offs and higher speed-ups than either traditional cascading or standard speculative decoding alone.

### Limitations of Cascades and Speculative Decoding

* **Sequential Bottlenecks in Cascades:** Traditional cascades use a deferral rule to decide if a small model can handle a prompt. If the small model is not confident, the system waits for it to finish before starting the large model from scratch, wasting significant time.
* **Strict Matching in Speculative Decoding:** This method requires the large model to verify the small model’s tokens. Even if the small model produces a factually correct and high-quality response, the large model rejects the draft from the first token that does not match its own preferred output, discarding everything after that point.
* **Trade-off Divergence:** Cascades prioritize reducing computational costs but suffer from latency when deferring, while speculative decoding prioritizes speed but often performs redundant work because it mandates identical output to the larger model.

### The Speculative Cascades Mechanism

* **Parallel Verification with Deferral:** Speculative cascades use the parallel processing of speculative decoding but introduce a flexible decision rule. The system can choose to accept the smaller model’s draft even if it differs from the larger model’s prediction, provided it meets a confidence threshold.
* **Flexible Token Matching:** Unlike standard speculative decoding, which often relies on strict token-by-token matching, speculative cascades allow for "probabilistic matches" or quality-based acceptance to prevent unnecessary rejections.
* **Resource Optimization:** By strategically deferring to the smaller model for certain segments of the generation, the system reduces the total work required from the expensive expert model without losing the speed of parallel execution.

### Empirical Results and Performance

* **Model Testing:** The approach was validated using Gemma and T5 models across diverse language tasks, including reasoning, coding, translation, and question answering.
* **Superior Trade-offs:** Testing showed that speculative cascades consistently outperformed baselines in cost-quality metrics, providing faster inference without the strict "all-or-nothing" quality constraints of speculative decoding.
* **Task Versatility:** The hybrid method proved effective across both creative tasks (like summarization) and factual tasks (like math or coding), where different levels of "correctness" are acceptable.

Speculative cascades offer a practical path for scaling LLM deployments by balancing the high cost of large models with the need for low-latency user experiences. Developers looking to optimize inference should consider this hybrid approach to capture the efficiency of small models while retaining the oversight of larger, more capable ones.
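
The difference between strict matching and a flexible deferral rule can be shown with a toy verification loop. This is a simplified sketch, not the published method: real speculative decoding typically uses probabilistic acceptance instead of exact greedy matching, and the specific rule here (accept when the drafter's confidence reaches a threshold) is only one hypothetical choice. All names and probabilities below are made up for illustration.

```python
def expert_choice(p_large):
    """The expert model's preferred token at one position."""
    return max(p_large, key=p_large.get)


def verify_draft(draft_tokens, drafter_probs, expert_probs, accept):
    """Walk the draft left to right; stop at the first rejected token.

    On rejection the expert's own token is emitted and the rest of the
    draft is discarded, as in speculative decoding.
    """
    output = []
    for token, p_small, p_large in zip(draft_tokens, drafter_probs, expert_probs):
        if accept(token, p_small, p_large):
            output.append(token)
        else:
            output.append(expert_choice(p_large))
            break
    return output


def strict_rule(token, p_small, p_large):
    """Simplified strict matching: the draft must equal the expert's choice."""
    return token == expert_choice(p_large)


def make_confidence_rule(threshold):
    """Hypothetical deferral rule: also accept confident drafter tokens."""
    def rule(token, p_small, p_large):
        return strict_rule(token, p_small, p_large) or p_small[token] >= threshold
    return rule


draft_tokens = ["The", "answer", "is", "45"]
drafter_probs = [
    {"The": 0.9, "A": 0.1},
    {"answer": 0.85, "result": 0.15},
    {"is": 0.95, "was": 0.05},
    {"45": 0.5, "15": 0.5},
]
expert_probs = [
    {"The": 0.7, "A": 0.3},
    {"result": 0.55, "answer": 0.45},
    {"is": 0.9, "was": 0.1},
    {"15": 0.8, "45": 0.2},
]

strict_output = verify_draft(draft_tokens, drafter_probs, expert_probs, strict_rule)
flexible_output = verify_draft(draft_tokens, drafter_probs, expert_probs,
                               make_confidence_rule(0.8))
```

Under the strict rule the draft is cut at the second token, a harmless wording difference, so only `["The", "result"]` is produced in this round. The confidence rule keeps the drafter's confident tokens and defers to the expert only at the last position, where the drafter is unsure, yielding `["The", "answer", "is", "15"]`.
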
Google Research and Move37 Labs have introduced NucleoBench, a comprehensive open-source benchmark for nucleic acid design, alongside AdaBeam, a high-performing new optimization algorithm. While AI models have become highly proficient at predicting the biological properties of DNA and RNA, generating optimal sequences within massive search spaces (such as the $2 \times 10^{120}$ possible variations for a 5' UTR) remains a significant hurdle. By standardizing evaluation across 16 distinct biological tasks, this research identifies AdaBeam as a superior method that scales effectively to the large-scale models required for modern drug discovery.

## Standardizing the Optimization Pipeline

The process of computational nucleic acid design typically follows a five-step workflow: data collection, training a predictive model, generating candidate sequences (the design step), wet-lab validation, and iterative retraining. NucleoBench focuses specifically on the design step, which has historically lacked standardized evaluation.

* Most existing benchmarks rely on decades-old methods like simulated annealing or vanilla genetic algorithms.
* Traditional algorithms often treat predictive models as "black boxes," failing to leverage internal model data to guide the search.
* The vastness of genomic search spaces makes brute-force optimization impossible, necessitating more intelligent, model-aware generation strategies.

## The NucleoBench Framework

NucleoBench is the first large-scale benchmark designed to compare gradient-free and gradient-based design algorithms under identical conditions. The framework encompasses over 400,000 experiments to ensure statistical rigor across diverse biological challenges.

* **Algorithm Categories**: It compares gradient-free methods (like directed evolution), which are simple but ignore model internals, against gradient-based methods (like FastSeqProp), which use the model’s internal "direction of steepest improvement" to find better sequences.
* **Task Diversity**: The 16 tasks include controlling gene expression in specific cell types (liver or neuronal), maximizing transcription factor binding, and improving chromatin accessibility.
* **Scale**: The benchmark includes long-range DNA sequence challenges using large-scale models like Enformer, which are computationally demanding but critical for understanding complex genomic interactions.

## AdaBeam’s Hybrid Optimization Performance

Drawing on insights from the NucleoBench evaluation, the researchers developed AdaBeam, a hybrid algorithm that combines the strengths of various optimization strategies.

* **Success Rate**: AdaBeam outperformed existing algorithms on 11 of the 16 tasks in the benchmark.
* **Efficiency and Scaling**: Unlike many gradient-based methods that struggle with computational overhead, AdaBeam demonstrates superior scaling properties as sequences become longer and predictive models grow in complexity.
* **Methodology**: It functions as a hybrid approach, using sophisticated search techniques to navigate the sequence space more effectively than "vanilla" algorithms developed before the era of deep learning.

The researchers have made AdaBeam and the NucleoBench repository freely available to the scientific community. By providing a standardized environment for testing, they aim to accelerate the development of next-generation treatments, including more stable mRNA vaccines and precise CRISPR gene therapies.
Figma's 2025 AI report: Perspectives from designers and developers Insights AI Research Report Thought leadership
Issue no.11: Made with love Working Well The Long & Short of It
Design systems and AI: Why MCP servers are the unlock Working Well Design systems AI Productivity
The anatomy of an activation: How Figma Commons brought design to the public Inside Figma Behind the scenes Culture Events