google

Graph foundation models for relational data

Google researchers have introduced Graph Foundation Models (GFMs) as a solution to the limitations of traditional tabular machine learning, which often ignores the rich connectivity of relational databases. By representing tables as interconnected graphs where rows are nodes and foreign keys are edges, this approach enables a single model to generalize across entirely different schemas and feature sets. This shift allows for transferable graph representations that can perform inference on unseen tasks without the costly need for domain-specific retraining.

### Transforming Relational Schemas into Graphs

The core methodology involves a scalable data preparation step that converts standard relational database structures into a single heterogeneous graph. This process preserves the underlying logic of the data while making it compatible with graph-based learning:

* **Node Mapping:** Each unique table is treated as a node type, and every individual row within that table is converted into a specific node.
* **Edge Creation:** Foreign key relationships are transformed into typed edges that connect nodes across different tables.
* **Feature Integration:** Standard columns containing numerical or categorical data are converted into node features, while temporal data can be preserved as features on either nodes or edges.

### Overcoming the Generalization Gap

A primary hurdle in developing GFMs is the lack of a universal tokenization method, unlike the word pieces used in language models or patches used in vision models. Traditional Graph Neural Networks (GNNs) are typically locked to the specific graph they were trained on, but GFMs solve this through several technical innovations:

* **Schema Agnosticism:** The model avoids hard-coded embedding tables for specific node types, allowing it to interpret database schemas it has never encountered during training.
* **Feature Interaction Learning:** Instead of training on "absolute" features (like specific price distributions), the model captures how different features interact with one another across diverse tasks.
* **Generalizable Encoders:** The architecture uses transferable methods to derive fixed-size representations for nodes, whether they contain three continuous float features or dozens of categorical values.

### Scaling and Real-World Application

To handle the requirements of enterprise-level data, the GFM framework is built to operate on a massive scale using Google’s specialized infrastructure:

* **Massive Throughput:** The system utilizes JAX and TPU infrastructure to process graphs containing billions of nodes and edges.
* **Internal Validation:** The model has been tested on complex internal Google tasks, such as spam detection in advertisements, which requires analyzing dozens of interconnected relational tables simultaneously.
* **Performance Benefits:** By considering the connections between rows—a factor traditional tabular baselines like decision trees often ignore—the GFM provides superior downstream performance in high-stakes prediction services.

Transitioning from domain-specific models to Graph Foundation Models allows organizations to leverage relational data more holistically. By focusing on the connectivity of data rather than just isolated table features, GFMs provide a path toward a single, generalist model capable of handling diverse enterprise tasks.
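The node-mapping and edge-creation steps described above can be sketched in a few lines of Python. The table names (`users`, `orders`), the `user_id` foreign key, and the assumption that every parent table exposes an `id` primary key are illustrative only, not details of Google's pipeline:

```python
def tables_to_graph(tables, foreign_keys):
    """Convert relational tables into a single heterogeneous graph.

    tables: {table_name: [row_dict, ...]}
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    Returns nodes keyed by (table, row_index) and a list of typed edges.
    """
    nodes = {}
    for table, rows in tables.items():
        for i, row in enumerate(rows):
            # Node mapping: every row becomes a node whose type is its
            # table; ordinary columns become node features.
            nodes[(table, i)] = dict(row)

    edges = []
    for child, fk_col, parent in foreign_keys:
        # Index the parent table by primary key value -> row position.
        pk_index = {row["id"]: i for i, row in enumerate(tables[parent])}
        for i, row in enumerate(tables[child]):
            # Edge creation: each foreign key becomes a typed edge
            # connecting nodes across tables.
            j = pk_index[row[fk_col]]
            edges.append(((child, i), f"{child}->{parent}", (parent, j)))
    return nodes, edges

# Toy two-table schema: orders reference users via a foreign key.
users = [{"id": 1, "country": "JP"}, {"id": 2, "country": "US"}]
orders = [{"id": 10, "user_id": 1, "amount": 42.0},
          {"id": 11, "user_id": 1, "amount": 7.5}]
nodes, edges = tables_to_graph(
    {"users": users, "orders": orders},
    [("orders", "user_id", "users")],
)
```

A real system would additionally encode the per-row features into fixed-size vectors, which is where the schema-agnostic encoders described above come in.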

discord

Discord Patch Notes: July 7, 2025

Discord’s Patch Notes series serves as a transparent update log detailing the engineering team's ongoing efforts to enhance performance, reliability, and usability across the platform. By integrating community feedback with rigorous pre-release testing, the company aims to resolve technical debt and refine the user experience through a structured deployment cycle. These updates reflect a commitment to a high-quality, responsive application that evolves based on real-world user interactions.

### Engineering Priorities and Quality Assurance

* Focuses on optimizing core application metrics including responsiveness, reliability, and general system performance.
* Targets a broad range of improvements from high-level usability features to granular "bug-squishing" and stability fixes.
* Ensures that all documented changes have been successfully committed and merged into the codebase prior to announcement.

### Community-Based Bug Identification

* Leverages the r/DiscordApp subreddit as a primary channel for crowdsourcing bug reports via a Bimonthly Bug Megathread.
* Provides a direct feedback loop where the Engineering team monitors community reports to identify and triage persistent issues.
* Encourages user-led troubleshooting to help the development team prioritize fixes that impact the broader user base.

### Pre-Release Testing and Deployment

* Utilizes the iOS TestFlight program to allow "edge" users to test upcoming features and identify regressions before they reach the general public.
* Directs interested testers to specialized access points like dis.gd/testflight to facilitate early-stage bug detection.
* Operates on a rolling deployment schedule, meaning that while fixes are merged, they may appear on different platforms at different times.

To help maintain the platform's stability, users are encouraged to report any discovered issues to the community megathread or join the TestFlight program to test new builds before their official release.

line

Code Quality Improvement Techniques Part 1

The Null Object Pattern is a design technique that replaces null values with objects representing "empty" or "invalid" states to simplify code and provide functional fallbacks. While it effectively streamlines logic for collections and general data flows, using it when error conditions must be explicitly distinguished can lead to hidden bugs and reduced type safety. Developers should generally prefer statically verified types, such as Optionals or language-native nullable types, unless the error case can be seamlessly integrated into the happy-path logic.

### Benefits of the Null Object Pattern

* **Code Simplification:** By returning an empty list or a "null object" instead of a literal `null`, callers can avoid repetitive null-check boilerplate.
* **Functional Continuity:** It allows for uninterrupted processing in functional chains, such as using `.asSequence().map().forEach()`, because the "empty" object still satisfies the required interface.
* **Fallback Provisioning:** The pattern is useful for converting errors into safe fallback values, such as displaying an "Unknown User" profile image rather than crashing or requiring complex conditional UI logic.

### Risks of Silent Failures and Logic Errors

* **Bypassing Compiler Safety:** Unlike nullable types in Kotlin or Swift, which force developers to handle the `null` case, a custom null object (e.g., `UserModel.INVALID`) allows code to compile even if the developer forgets to check the object's validity.
* **Identity vs. Equivalence:** Implementing the pattern requires caution regarding how the object is compared. If a null object is checked via reference identity (`==`) but the class lacks a proper `equals` implementation, new instances with the same "empty" values may not be recognized as invalid.
* **Debugging Difficulty:** When a null object is used inappropriately, the program may continue to run with dummy data. This makes bugs harder to detect compared to a runtime error or a compile-time type mismatch.
### Best Practices for Type Safety

* **Prefer Static Verification:** When boundary conditions or errors must be handled differently than the "happy path," use `Optional`, `Maybe`, or native nullable types to ensure the compiler enforces error handling.
* **Criteria for Use:** Reserve the Null Object Pattern for cases where the error logic is identical to the normal logic, or when multiple "empty" candidates exist that cannot be easily resolved through static typing.
* **Runtime Errors as a Tool:** In dynamic or non-nullable contexts, a runtime error is often preferable to silent execution with an invalid object, as it provides a clear signal that an unexpected state has been reached.

### Recommendation

To maintain high code quality, utilize the Null Object Pattern primarily for collections and UI fallbacks. For core business logic where the presence of data is critical, rely on type-safe mechanisms that force explicit handling of missing values, thereby preventing invalid states from propagating silently through the system.
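The trade-off can be sketched in Python (the article's examples are Kotlin-flavored; the `User` class and `NULL_USER` sentinel here are hypothetical names invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    avatar_url: str

# Null object: a safe fallback that still satisfies the User interface,
# analogous to the article's UserModel.INVALID.
NULL_USER = User(name="Unknown User", avatar_url="/img/default.png")

def find_user(users, name):
    """Null Object Pattern: callers never need a null check."""
    return next((u for u in users if u.name == name), NULL_USER)

def find_user_strict(users, name):
    """Type-safe alternative: an explicit missing value the caller is
    forced to confront (Python's analogue of an Optional)."""
    return next((u for u in users if u.name == name), None)

users = [User("alice", "/img/a.png")]

# Fine for UI fallbacks: rendering proceeds with the default avatar.
print(find_user(users, "bob").avatar_url)  # "/img/default.png"

# Risky for business logic: the null object flows on silently unless the
# caller remembers to compare against NULL_USER, whereas the strict
# variant returns None and fails fast on any attribute access.
assert find_user(users, "bob") is NULL_USER
assert find_user_strict(users, "bob") is None
```

Note the identity-vs-equivalence pitfall from above: a freshly constructed `User("Unknown User", "/img/default.png")` compares equal to `NULL_USER` by value but fails an `is` identity check.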

google

MedGemma: Our most capable open models for health AI development

Google Research has expanded its Health AI Developer Foundations (HAI-DEF) collection with the release of MedGemma and MedSigLIP, a series of open, multimodal models designed specifically for medical research and application development. These models offer a high-performance, privacy-preserving alternative to closed systems, allowing developers to maintain full control over their infrastructure while leveraging state-of-the-art medical reasoning. By providing both 4B and 27B parameter versions, the collection balances computational efficiency with complex longitudinal data interpretation, even enabling deployment on single GPUs or mobile hardware.

## MedGemma Multimodal Variants

The MedGemma collection utilizes the Gemma 3 architecture to process both image and text inputs, providing robust generative capabilities for healthcare tasks.

* **MedGemma 27B Multimodal:** This model is designed for complex tasks such as interpreting longitudinal electronic health records (EHR) and achieves an 87.7% score on the MedQA benchmark, performing within 3 points of DeepSeek R1 at approximately one-tenth the inference cost.
* **MedGemma 4B Multimodal:** A lightweight version that scores 64.4% on MedQA, outperforming most open models under 8B parameters; it is optimized for mobile hardware and specific tasks like chest X-ray report generation.
* **Clinical Accuracy:** In unblinded studies, 81% of chest X-ray reports generated by the 4B model were judged by board-certified radiologists to be sufficient for patient management, achieving a RadGraph F1 score of 30.3.
* **Versatility:** The models retain general-purpose capabilities from the original Gemma base, ensuring they remain effective at instruction-following and non-English language tasks while handling specialized medical data.
## MedSigLIP Specialized Image Encoding

MedSigLIP serves as the underlying vision component for the MedGemma suite, but it is also available as a standalone 400M parameter encoder for structured data tasks.

* **Architecture:** Based on the Sigmoid loss for Language Image Pre-training (SigLIP) framework, it bridges the gap between medical imagery and text through a shared embedding space.
* **Diverse Modalities:** The encoder was fine-tuned on a wide variety of medical data, including fundus photography, dermatology images, histopathology patches, and chest X-rays.
* **Functional Use Cases:** It is specifically recommended for tasks involving classification, retrieval, and search, where structured outputs are preferred over free-text generation.
* **Data Retention:** Training protocols ensured the model retained its ability to process natural images, maintaining its utility for hybrid tasks that mix medical and non-medical visual information.

## Technical Implementation and Accessibility

Google has prioritized accessibility for developers by ensuring these models can run on consumer-grade or limited hardware environments.

* **Hardware Compatibility:** Both the 4B and 27B models are designed to run on a single GPU, while the 4B and MedSigLIP versions are adaptable for edge computing and mobile devices.
* **Open Resources:** To support the community, Google has released the technical reports, model weights on Hugging Face, and implementation code on GitHub.
* **Developer Flexibility:** Because these are open models, researchers can fine-tune them on proprietary datasets without compromising data privacy or being locked into specific cloud providers.

For medical AI development, the choice of model should depend on the specific output requirement: MedGemma is the optimal starting point for generative tasks like visual question answering or report drafting, while MedSigLIP is the preferred tool for building high-speed classification and image retrieval systems.
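The retrieval use case recommended for MedSigLIP reduces to nearest-neighbor search in the shared image-text embedding space. The sketch below substitutes hand-made 3-dimensional toy vectors for real encoder outputs, so the file names, the query, and every number are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in embeddings for three images; a real system would obtain
# these from the image encoder.
image_embeddings = {
    "xray_001.png":   [0.9, 0.1, 0.0],
    "fundus_002.png": [0.1, 0.9, 0.1],
    "derm_003.png":   [0.0, 0.2, 0.9],
}

# Stand-in embedding for a text query such as "chest x-ray"; a real
# system would obtain this from the text encoder.
query_embedding = [0.85, 0.15, 0.05]

# Retrieval: rank images by similarity to the query in the shared space.
ranked = sorted(image_embeddings.items(),
                key=lambda kv: cosine(query_embedding, kv[1]),
                reverse=True)
print(ranked[0][0])  # best match: "xray_001.png"
```

Because the output is a ranked list rather than free text, this style of pipeline gives the structured, auditable results the post recommends MedSigLIP for.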

google

Making group conversations more accessible with sound localization

Google Research has introduced SpeechCompass, a system designed to improve mobile captioning for group conversations by integrating multi-microphone sound localization. By shifting away from complex voice-recognition models toward geometric signal processing, the system provides real-time speaker diarization and directional guidance through a color-coded visual interface. This approach significantly reduces the cognitive load for users who previously had to manually associate a wall of scrolling text with different speakers in a room.

## Limitations of Standard Mobile Transcription

* Traditional automatic speech recognition (ASR) apps concatenate all speech into a single block of text, making it difficult to distinguish between different participants in a group setting.
* Existing high-end solutions often require audio-visual separation, which needs a clear line of sight from a camera, or speaker embedding, which requires pre-registering unique voiceprints.
* These current methods can be computationally expensive and often fail in spontaneous, mobile environments where privacy and setup speed are priorities.

## Hardware and Signal Localization

* The system was prototyped in two forms: a specialized phone case featuring four microphones connected to an STM32 microcontroller and a software-only implementation for standard dual-microphone smartphones.
* While dual-microphone setups are limited to 180-degree localization due to "front-back confusion," the four-microphone array enables full 360-degree sound tracking.
* The system utilizes Time-Difference of Arrival (TDOA) and Generalized Cross Correlation with Phase Transform (GCC-PHAT) to estimate the angle of arrival for sound waves.
* To handle indoor reverberations and noise, the team applied statistical methods like kernel density estimation to improve the precision of the localizer.
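The TDOA step can be illustrated with a minimal far-field calculation: once a delay between two microphones has been estimated (e.g., by GCC-PHAT, omitted here), simple geometry yields the angle of arrival. The 8 cm microphone spacing and the delay values are assumptions for the example, not SpeechCompass's actual geometry:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C
MIC_SPACING = 0.08      # metres between the two microphones (assumed)

def angle_of_arrival(tdoa_seconds):
    """Far-field angle from broadside, in degrees.

    0 means the source is directly in front of the mic pair; +/-90 means
    it lies along the microphone axis. A two-mic pair cannot distinguish
    front from back, which is the "front-back confusion" that the
    four-microphone array resolves.
    """
    path_difference = SPEED_OF_SOUND * tdoa_seconds
    # Clamp to [-1, 1] to guard against noisy delay estimates that
    # imply an impossible path difference.
    ratio = max(-1.0, min(1.0, path_difference / MIC_SPACING))
    return math.degrees(math.asin(ratio))

# A sound arriving 0.1 ms earlier at one microphone:
print(round(angle_of_arrival(0.0001), 1))  # roughly 25 degrees off-axis
```

Running this per audio frame, then smoothing the resulting angle estimates (the post mentions kernel density estimation for that step), is the essence of the waveform-based diarization approach.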
## Advantages of Waveform-Based Diarization

* **Low Latency and Compute:** By avoiding heavy machine learning models and weights, the algorithm can run on low-power microcontrollers with minimal memory requirements.
* **Privacy Preservation:** Unlike speaker embedding techniques, SpeechCompass does not identify unique voiceprints or require video, instead relying purely on the physical location of the sound source.
* **Language Independence:** Because the system analyzes the differences between audio waveforms rather than the speech content itself, it is entirely language-agnostic and can localize non-speech sounds.
* **Dynamic Reconfiguration:** The system adjusts instantly to the movement of the device, allowing users to reposition their phones without recalibrating the diarization logic.

## User Interface and Accessibility

* The prototype Android application augments standard speech-to-text with directional data received via USB from the microphone array.
* Transcripts are visually separated by color and accompanied by directional arrows, allowing users to quickly identify where a speaker is located in the physical space.
* This visual feedback loop transforms a traditional transcript into a spatial map of the conversation, making group interactions more accessible for individuals who are deaf or hard of hearing.

discord

Discord Update: June 30, 2025 Changelog

Discord’s latest updates focus on enhancing user expression and streamlining platform navigation through a series of identity-driven features and technical refinements. By introducing Server Tags and expanding profile customization options, the platform aims to deepen community connections while simultaneously optimizing back-end processes like image compression and search algorithms. These changes reflect an ongoing effort to balance aesthetic personalization with functional performance for both individual users and server administrators.

### Server Tags and Community Visibility

* Users can now display Server Tags next to their names to represent specific communities or favorite games.
* These tags are interactive, allowing others to click them to learn about the server or apply for membership directly from the tag.
* Server administrators can unlock this feature for their community once the server reaches three Boosts.

### Profile Customization and Asset Management

* The desktop client now saves the last six used avatars in Profile Settings, enabling users to swap back to previous images without re-uploading files.
* Nitro members gain extended access to Quest-earned Avatar Decorations, allowing them to keep these rewards beyond the standard two-month expiration period.
* New Nameplate designs have been added to the Shop on the desktop app to further customize user presence in chat lists.

### Integrated Activities and Syntax Enhancements

* The New York Times Games’ Wordle is now available as a Discord Activity, accessible by typing the `/wordle` command in any text channel.
* Players can use the `/share` command to distribute their results across different channels or direct messages.
* New Markdown support for email addresses allows users to wrap an address in angle brackets (e.g., `<email@address.com>`) to create a clickable link that opens a mail client automatically.
### Performance and Infrastructure Optimizations

* The Quick Switcher tool received an algorithmic upgrade to improve the accuracy of channel and DM suggestions based on user behavior.
* Mobile image embeds have been improved through a change in how the mobile application handles image compression, resulting in higher-quality renders.
* The updated mobile image pipeline also reduces the time required for uploading and rendering images on handheld devices.

### Advanced Server Boosting Features

* Beyond Server Tags, communities with sufficient boosts can now access Enhanced Role Styles, which add glowing gradients to specific server roles.
* These aesthetic upgrades are designed to provide more visual hierarchy and flair to server member lists.

To make the most of these updates, server owners should coordinate community boosts to unlock the new Role Styles and Server Tags, while power users should adopt the Quick Switcher and new Markdown syntax to increase their communication efficiency.

discord

Authenticity Matters: Discord's Pride Month 2025

Discord celebrates Pride Month by honoring the 1969 protests that sparked the LGBTQIA+ movement and reaffirms its commitment to fostering a culture of authenticity. The company highlights the role of its PRIDE Employee Resource Group (ERG) in building an environment where diverse perspectives are considered essential to long-term success. By prioritizing belonging, Discord aims to ensure all team members feel respected and empowered to contribute their full selves to the mission.

### The Roots of Advocacy and Progress

* Pride is recognized as a protest originating in 1969, emphasizing the historical adversity and courage required to live authentically.
* The legacy of early advocacy serves as an ongoing inspiration for Discord’s efforts to ensure dignity and equality for its global workforce.
* The company acknowledges that while 56 years of progress has been made, active work remains necessary to achieve full systemic respect and equality.

### Organizational Structure and Belonging

* Discord facilitates inclusion through nine distinct Employee Resource Groups (ERGs) that are open to all employees across the organization.
* The PRIDE ERG specifically focuses on "building belonging," a concept the company views as the heart of its internal community and operational strategy.
* The management philosophy encourages team members to "bring their whole selves to work," operating on the principle that collective unique perspectives create a greater institutional impact.

Organizations can look to Discord’s model of integrating ERGs as a method for fostering a culture of authenticity, ensuring that diversity is treated as a core driver of innovation rather than just a social metric.

google

How we created HOV-specific ETAs in Google Maps

Google Maps has enhanced its routing capabilities by introducing HOV-specific ETAs, addressing the significant speed differences between carpool and general lanes. This was achieved through a novel unsupervised learning approach that classifies historical trips into HOV or non-HOV categories without initial manual labels. The resulting system enables more precise travel predictions, helping users optimize their commutes and supporting the shift toward sustainable travel modes.

### Segment-Level Speed Distribution

* The model analyzes trip segments within short, 15-minute time windows to identify patterns in aggregated, anonymized traffic data.
* During peak traffic hours, researchers often observe a bimodal speed distribution where HOV lanes maintain significantly higher average speeds compared to general lanes.
* The classification system distinguishes between "Scenario A," where the speed gap is dramatic (e.g., 65 mph vs. 25 mph), and "Scenario B," where HOV lanes are only marginally faster, ensuring accurate modeling even when benefits are minimal.
* Individual trip points, including speed and observation time, are processed collectively to determine if a specific segment of a journey occurred in a restricted lane.

### Incorporating Lateral Distance and Soft Clustering

* To refine accuracy beyond simple speed metrics, the model incorporates the estimated lateral distance of a vehicle from the center of the road.
* While GPS data is inherently noisy, this spatial information helps identify lane-specific behaviors by mapping trip points to the known physical location of HOV lanes (e.g., the far-left lanes).
* The system employs soft clustering techniques, calculating the probability of a point belonging to a specific cluster rather than using hard binary assignments, which better manages borderline data points.
* Temporal clustering via a weighted median approach is used to prioritize more recent traffic observations, ensuring the model accounts for the most current road conditions and availability constraints.

By integrating these segment-level classifications into full-trip analyses, Google Maps can train its ETA prediction models on high-fidelity, lane-specific data. This implementation provides users with a more realistic view of their travel options, encouraging the use of high-occupancy lanes to reduce individual travel time, urban congestion, and overall emissions.
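The soft-clustering idea can be sketched as a two-component Gaussian responsibility calculation over observed speeds. The cluster means echo the "Scenario A" speeds (65 vs. 25 mph), but the standard deviations and the equal-prior assumption are illustrative choices, not details of the production model:

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def hov_probability(speed_mph, hov=(65.0, 8.0), general=(25.0, 8.0)):
    """Soft assignment: P(point came from the HOV-lane cluster | speed),
    assuming equal prior probability for the two clusters.

    Unlike a hard threshold, borderline speeds get probabilities near
    0.5 instead of a brittle binary label.
    """
    p_hov = gaussian_pdf(speed_mph, *hov)
    p_general = gaussian_pdf(speed_mph, *general)
    return p_hov / (p_hov + p_general)

for speed in (25, 45, 65):
    print(speed, round(hov_probability(speed), 3))
```

In the full system these speed-based responsibilities would be combined with the lateral-distance signal and recency-weighted temporal clustering described above before a trip segment is finally labeled.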

google

REGEN: Empowering personalized recommendations with natural language

Google Research has introduced REGEN, a benchmark dataset designed to evolve recommender systems from simple item predictors into conversational agents capable of natural language interaction. By augmenting the Amazon Product Reviews dataset with synthetic critiques and narratives using Gemini 1.5 Flash, the researchers provide a framework for training models to understand user feedback and explain their suggestions. The study demonstrates that integrating natural language critiques significantly improves recommendation accuracy while enabling models to generate personalized, context-aware content.

### Composition of the REGEN Dataset

* The dataset enriches the existing Amazon Product Reviews archive by adding synthetic conversational elements, specifically targeting the gap in datasets that support natural language feedback.
* **Critiques** are generated for similar item pairs within hierarchical categories, allowing users to guide the system by requesting specific changes, such as a different color or increased storage.
* **Narratives** provide contextual depth through purchase reasons, product endorsements, and concise user summaries, helping the system justify its recommendations to the end-user.

### Unified Generative Modeling Approaches

* The researchers framed a "jointly generative" task where models must process a purchase history and optional critique to output both a recommended item ID and a supporting narrative.
* The **FLARE (Hybrid)** architecture uses a sequential recommender for item prediction based on collaborative filtering, which then feeds into a Gemma 2B LLM to generate the final text narrative.
* The **LUMEN (Unified)** model functions as an end-to-end system where item IDs and text tokens are integrated into a single vocabulary, allowing one LLM to handle critiques, recommendations, and narratives simultaneously.
### Performance and Impact of User Feedback

* Incorporating natural language critiques consistently improved recommendation metrics across different architectures, demonstrating that language-guided refinement is a powerful tool for accuracy.
* In the Office domain, the FLARE hybrid model's Recall@10—a measure of how often the desired item appears in the top 10 results—increased from 0.124 to 0.1402 when critiques were included.
* Results indicate that models trained on REGEN can achieve performance comparable to state-of-the-art specialized recommenders while maintaining high-quality natural language generation.

The REGEN dataset and the accompanying LUMEN architecture provide a path forward for building more transparent and interactive AI assistants. For developers and researchers, utilizing these conversational benchmarks is essential for moving beyond "black box" recommendations toward systems that can explain their logic and adapt to specific user preferences in real time.
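The Recall@10 metric quoted above can be made concrete with a short sketch; the item IDs and rankings below are made up for illustration:

```python
def recall_at_k(ranked_lists, targets, k=10):
    """Fraction of queries whose held-out target item appears in the
    model's top-k recommendations."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

# Three users; each inner list is the recommender's ranked item IDs.
ranked_lists = [
    ["i7", "i2", "i9"],   # target i2 appears in the top 10 -> hit
    ["i1", "i4", "i8"],   # target i5 is absent             -> miss
    ["i3", "i6", "i0"],   # target i3 is ranked first       -> hit
]
targets = ["i2", "i5", "i3"]
print(recall_at_k(ranked_lists, targets, k=10))  # 2 of 3 hits
```

Under this definition, the reported jump from 0.124 to 0.1402 means that adding critiques moved the desired item into the top 10 for roughly 1.6% more queries.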

discord

Staff Picks, June 2025: Summer of Showcases

June was a massive month for the gaming industry, highlighted by a staggering 23 different showcases that broadcast a wide variety of upcoming titles. To process this influx of news, contributors Alex, Armando, and Matt provide a curated look at the most interesting announcements from both AAA and indie events. Their analysis focuses on identifying the standout games of the season and those that best capture a specific summer aesthetic.

### The Summer Showcase Marathon

* June featured a dense schedule of 23 separate game livestreams, necessitating constant monitoring of events like Summer Game Fest and The MIX.
* The presentations covered a broad spectrum of the industry, ranging from high-budget, mainstream spectacles to niche indie-focused showcases.
* This high volume of announcements reflects a diverse upcoming release calendar, catering to various gaming interests and platforms.

### Curated Highlights and Seasonal Recommendations

* A panel consisting of Alex, Armando, and a new contributor, Matt, breaks down the most compelling titles revealed during the intensive month of livestreams.
* The contributors highlight specific games that stood out for their innovation or presentation among the hundreds of titles shown.
* A core part of the discussion focuses on identifying games that evoke a "summer mood," providing recommendations that align with the seasonal atmosphere.

For those overwhelmed by the sheer volume of June’s gaming news, following these curated insights is an effective way to identify the must-watch indie and AAA titles slated for the near future.