discord

Overclocking dbt: Discord's Custom Solution in Processing Petabytes of Data

Discord scaled its data infrastructure to manage petabytes of data and over 2,500 models by moving beyond a standard dbt implementation. While the tool initially provided a modular and developer-friendly framework, the sheer volume of data and more than 100 concurrent developers led to critical performance bottlenecks. To resolve these issues, Discord developed custom extensions to dbt’s core functionality, reducing compilation times and automating complex data transformations.

### Strategic Adoption of dbt

* Discord integrated dbt into its stack to apply software engineering principles like modular design and code reusability to SQL transformations.
* The tool’s open-source nature allowed the team to align with Discord’s internal philosophy of community-driven engineering.
* The framework integrated with other internal tools, such as the Dagster orchestrator, and provided a robust testing environment to ensure data quality.

### Scaling Bottlenecks and Performance Issues

* The project grew to a size where recompiling the entire dbt project took upwards of 20 minutes, severely hindering developer velocity.
* Standard incremental materialization strategies provided by dbt proved inefficient for the petabyte-scale data volumes generated by millions of concurrent users.
* Developer workflows often collided, with teams inadvertently overwriting each other’s test tables and creating data silos or inconsistencies.
* The lack of specialized handling for complex backfills threatened the organization’s ability to deliver timely and accurate insights.

### Engineering Custom Extensions for Growth

* The team built a provider-agnostic layer over Google BigQuery to streamline complex calculations and automate massive data backfills.
* Custom optimizations were implemented to prevent breaking changes during the development cycle, so that 100+ developers could work simultaneously without friction.
* By extending dbt’s core, Discord transformed slow development cycles into a rapid, automated system capable of serving as the backbone for its global analytics infrastructure.

For organizations operating at massive scale, standard open-source tools often require custom-built orchestration and optimization layers to remain viable. Automating backfills and optimizing compilation logic are essential to maintaining developer productivity and data integrity when dealing with thousands of models and petabytes of information.
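Two of the bottlenecks above, colliding test tables and large backfills, can be illustrated with a small sketch. This is not Discord's implementation: the summary gives no code, so the helper names `dev_schema` and `backfill_windows`, the double-underscore naming convention, and the chunk size are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from typing import Iterator, Tuple


def dev_schema(base_schema: str, developer: str) -> str:
    """Return a per-developer schema name so test tables do not collide.

    Hypothetical naming convention, not Discord's actual scheme.
    """
    # Keep only lowercase letters, digits, and underscores in the suffix.
    suffix = re.sub(r"[^a-z0-9_]", "_", developer.lower())
    return f"{base_schema}__{suffix}"


def backfill_windows(
    start: date, end: date, days_per_chunk: int
) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (window_start, window_end) pairs covering [start, end].

    Splitting a long backfill into bounded windows lets each window be
    run, retried, or parallelized on its own.
    """
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    cursor = start
    while cursor <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(cursor + timedelta(days=days_per_chunk - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


if __name__ == "__main__":
    print(dev_schema("analytics", "Jane.Doe"))
    for window in backfill_windows(date(2025, 1, 1), date(2025, 1, 10), 4):
        print(window)
```

In a real project the schema suffix would more likely be derived from the active dbt target or an environment variable, and each window would drive one incremental run of the affected models.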

google

Geospatial Reasoning: Unlocking insights with generative AI and multiple foundation models

Google Research is introducing Geospatial Reasoning, a new framework that integrates generative AI with specialized foundation models to streamline complex geographical problem-solving. By combining large language models like Gemini with domain-specific data, the initiative seeks to make large-scale spatial analysis accessible to sectors like public health, urban development, and climate resilience. This research effort moves beyond traditional data silos, enabling agentic workflows that can interpret diverse data types, from satellite imagery to population dynamics, through natural language.

### Specialized Foundation Models for Human Activity

* The Population Dynamics Foundation Model (PDFM) captures the complex interplay between human behaviors and their local environments.
* A dedicated trajectory-based mobility foundation model has been developed to process and analyze movement patterns.
* While initially tested in the US, experimental datasets are expanding to include the UK, Australia, Japan, Canada, and Malawi for selected partners.

### Remote Sensing and Vision Architectures

* New models utilize advanced architectures including masked autoencoders, SigLIP, MaMMUT, and OWL-ViT, specifically adapted for the remote sensing domain.
* Training involves high-resolution satellite and aerial imagery paired with text descriptions and bounding box annotations to enable precise object detection.
* The models support zero-shot classification and retrieval, allowing users to locate specific features, such as "residential buildings with solar panels," using flexible natural language queries.
* Internal evaluations show state-of-the-art performance across multiple benchmarks, including image segmentation and post-disaster damage assessment.

### Agentic Workflows and Industry Collaboration

* The Geospatial Reasoning framework utilizes LLMs like Gemini to manage complex datasets and orchestrate "agentic" workflows.
* These workflows are grounded in geospatial data to ensure that the insights generated are both useful and contextually accurate.
* Google is collaborating with inaugural industry partners, including Airbus, Maxar, Planet Labs, and WPP, to test these capabilities in real-world scenarios.

Organizations interested in accelerating their geospatial analysis should consider applying for the trusted tester program to explore how these foundation models can be fine-tuned for specific proprietary data and use cases.
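The zero-shot retrieval mentioned under Remote Sensing and Vision Architectures is commonly built on a shared text and image embedding space, where tiles are ranked by similarity to the query. The sketch below shows only that ranking step and assumes the embeddings already exist. The vectors, tile IDs, and the `rank_tiles` helper are made up for illustration and do not come from Google's models or APIs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def rank_tiles(
    query_embedding: Sequence[float],
    tile_embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (tile_id, score) pairs, highest similarity first."""
    scored = [
        (tile_id, cosine_similarity(query_embedding, embedding))
        for tile_id, embedding in tile_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; a real encoder would produce far larger ones.
    query = [0.9, 0.1, 0.0]  # stands in for an embedded text query
    tiles = {
        "tile_a": [1.0, 0.0, 0.0],
        "tile_b": [0.0, 1.0, 0.0],
        "tile_c": [0.7, 0.7, 0.0],
    }
    for tile_id, score in rank_tiles(query, tiles, top_k=2):
        print(tile_id, round(score, 3))
```

Because no class list is fixed in advance, any phrase that the text encoder can embed becomes a valid query, which is what makes the retrieval "zero-shot."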

discord

Wicked Saints Turns Players into IRL Superheroes with the Help of e.l.f. Beauty and Discord

Wicked Saints Studios has launched *World Reborn*, a pioneering interactive story game designed to translate digital achievements into tangible real-world social impact. Built by a team of peacebuilders and behavioral scientists, the platform leverages positive psychology to reveal players' character strengths through narrative gameplay and "Training Mods." The launch marks a new paradigm for brand-led missions where Gen Z players can improve their personal wellbeing while contributing to global social causes.

**Behavioral Technology and Skill-Building**

* The game is described as "Duolingo for saving humanity," using bite-sized interactive narratives to improve player wellbeing and relationships.
* The platform employs behavioral technology to identify a player's natural "character strengths," encouraging them to apply these traits to day-to-day reality.
* Mastery within the game is specifically designed to bridge the gap between digital entertainment and real-life skills, such as resilience and emotional regulation.
* The project emerged from Niantic’s BDI incubator, utilizing the expertise of former senior engineers from *Pokémon GO*.

**Brand-Led Social Impact Missions**

* The platform features "Training Mods," which are real-world quests hosted by corporate and non-profit partners to tackle specific social issues.
* Exclusive launch partner e.l.f. Beauty has developed quests focused on building confidence in young girls and supporting elite women athletes, featuring insights from WNBA star Aerial Powers.
* Discord serves as the exclusive communications partner, hosting Training Mods centered on teen mental health, authenticity, and the use of gaming for stress reduction.
* The Starlight Children’s Foundation provides a mission where players send words of encouragement to hospitalized children, using the story of a transplant survivor to foster empathy and connection.

**Connecting with Gen Z Through Authenticity**

* The platform targets the 13–24 demographic, noting that 87% of Gen Z play games at least once a week.
* By moving beyond traditional advertising, brands can connect with Gen Z through the social spaces and authentic interactions they already prioritize.
* The game leverages the social infrastructure of Discord to help players form communities around their unique strengths and gameplay experiences.

**The Multi-Disciplinary Development Team**

* Wicked Saints is a Black female-led studio backed by major industry players including Riot Games and Reid Hoffman.
* The leadership team combines an Emmy Award-winning storyteller and international peacebuilder with a behavioral science researcher.
* Creative talent for the project includes veterans from major franchises and productions such as *Spider-Man: Into the Spider-Verse*, *Love Death + Robots*, and *Marvel*.

For organizations looking to engage younger demographics, *World Reborn* offers a scalable model for integrating corporate social responsibility directly into the gaming experience. The app is currently available on the Apple App Store for an initial eight-week limited run.

discord

Discord Update: March 25, 2025 Changelog

The Discord March 25, 2025, update focuses on a comprehensive modernization of the desktop client and its in-game integration capabilities. Through a modular overlay redesign and significant UI flexibility, the platform aims to enhance user agency and service reliability for its core gaming audience.

### Modular In-Game Overlay

* The overlay has moved to a widget-based system, allowing users to move and organize specific UI elements to suit different game genres.
* Users can now watch friends' streams or start a personal gameplay broadcast with a single click without leaving the game.
* Native Soundboard support is now integrated directly into the overlay for immediate access during gameplay.
* The update expands the library of supported games, so notifications and widgets work across more titles.

### Desktop UI Refresh and Customization

* Four new base themes (Light, Ash, Dark, and Onyx) have been introduced alongside three UI density options to adjust app spacing.
* The interface has been decluttered by moving the Inbox to the title bar and centralizing voice and video call buttons into a single unified bar.
* The channel list is now resizable, facilitating easier navigation for servers with long channel names.
* Active camera status is now indicated by a persistent green button, providing better visual feedback when a user's video is live.

### Developer Integrations and Social Tools

* A new API allows developers to implement Discord-powered text chat directly within their game clients, with initial support in titles like *Rust* and *Pax Dei*.
* The "Ignore" feature provides a privacy layer that hides messages from specific users in DMs and servers without notifying the ignored party.
* Discord’s App Launcher now includes "Spark: Hero Tactics," a new first-party deck-building battle game available on both desktop and mobile.

### Backend Reliability and API Deployment

* Discord has overhauled its API deployment process to isolate and protect critical infrastructure from potential configuration errors.
* This structural change is designed to minimize the frequency of outages and ensure that core messaging and voice services remain stable during maintenance cycles.

To get the most out of this update, users should experiment with the new UI density settings to reclaim screen real estate and check the Overlay settings to configure their widget layout before their next gaming session.

discord

How to Stream Games and Applications to Discord from Desktop or Mobile

Discord’s streaming functionality aims to foster a sense of proximity among friends by allowing users to share their activities in real-time from any location. By offering a fast and integrated setup across various devices, the platform simplifies the process of broadcasting content directly to a community. This guide explores the mechanics of initiating a stream, the configuration settings involved, and the hardware compatibility for both PC and mobile users.

### Real-Time Sharing and Speed

* Discord prioritizes speed in its streaming architecture, allowing users to start a broadcast almost instantly through the interface.
* The feature is designed to mimic the casual experience of showing a physical screen to someone nearby, bridging the gap between remote users.

### Cross-Platform Integration

* High-speed streaming is supported across both desktop and mobile applications, providing flexibility for different use cases.
* The guide details the specific technical requirements and available platforms for both PC and mobile devices.
* Users are presented with various configuration options during the setup phase to tailor the stream's performance and quality to their needs.

To begin broadcasting, users should navigate to their desired voice channel and select the streaming icon to access the platform-specific configuration menu. Reviewing the available stream settings before going live ensures the best balance between visual quality and performance for your specific device.

discord

Discord Patch Notes: April 3, 2025

Discord’s latest Patch Notes highlight the engineering team's ongoing commitment to improving platform performance, reliability, and responsiveness through a series of incremental updates and bug fixes. While specific technical changes are currently being deployed across various platforms, the update emphasizes the collaborative role of the community in maintaining the app's stability.

**Community Bug Tracking and Feedback**

* Discord utilizes a Bimonthly Bug Megathread hosted on the r/DiscordApp subreddit to gather user-reported issues.
* The Engineering team directly monitors these community reports to prioritize and resolve software regressions and usability obstacles.

**Beta Testing and Pre-Release Development**

* Users can participate in early-stage testing by opting into the Discord TestFlight version on iOS, accessible via dis.gd/testflight.
* This testing environment helps identify "pesky bugs" before features reach the general public.
* All reported fixes are confirmed as committed and merged, though deployment timing may vary depending on the individual platform’s rollout schedule.

To maintain the best user experience, it is recommended to keep the application updated and participate in official feedback channels so that the development team can address performance issues in a timely manner.

google

Evaluating progress of LLMs on scientific problem-solving

Current scientific benchmarks for large language models (LLMs) often focus on simple knowledge recall and multiple-choice responses, which do not reflect the complex, context-rich reasoning required in real-world research. To bridge this gap, Google Research has introduced CURIE, alongside the SPIQA and FEABench datasets, to evaluate LLMs on their ability to understand long-form documents, analyze multimodal data, and solve multi-step problems. These benchmarks aim to move AI from merely surfacing facts to actively assisting scientists in workflows involving information extraction, algebraic manipulation, and tool use.

### The CURIE Multitask Benchmark

* CURIE spans six diverse scientific disciplines: materials science, condensed matter physics, quantum computing, geospatial analysis, biodiversity, and proteins.
* The benchmark includes 10 challenging tasks, such as concept tracking, information aggregation, and cross-domain expertise, based on 429 full-length research documents.
* The complexity of the benchmark is reflected in its scale, with input queries averaging 15,000 words and ground truth responses averaging 954 words.
* Domain experts were involved in every phase of development, from sourcing papers to creating nuanced ground-truth answers in formats like JSON, LaTeX, and YAML.

### Multimodal Reasoning and Agentic Simulation

* The SPIQA (Scientific Paper Image Question Answering) dataset evaluates the ability of multimodal LLMs to ground their answers in complex figures and tables found in scientific literature.
* FEABench (Finite Element Analysis Benchmark) measures the ability of LLM agents to simulate and solve multiphysics, mathematics, and engineering problems.
* These tools specifically test whether models can choose the correct computational tools and reason through the physical constraints of a given problem.

### Programmatic and Model-Based Evaluation

* Because scientific answers are often descriptive or formatted heterogeneously, the evaluation uses programmatic metrics like ROUGE-L and Intersection-over-Union (IoU).
* For free-form and complex technical generation, the framework incorporates model-based evaluations to ensure AI responses align with expert assessments.
* Task difficulty is quantified by expert ratings, ensuring the benchmark measures high-level reasoning rather than just pattern matching.

These new benchmarks provide a rigorous framework for developing LLMs that can act as true collaborators in the scientific process. By focusing on long-context understanding and tool-integrated reasoning, researchers can better track the progress of AI in handling the actual complexities of modern scientific discovery.
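The two programmatic metrics named above, ROUGE-L and IoU, are simple enough to sketch directly. This is a minimal illustration, not CURIE's evaluation code. It assumes whitespace tokenization, lowercasing, and the balanced F1 form of ROUGE-L, and it treats IoU as set overlap, whereas the benchmark may apply it to other structures.

```python
from typing import List, Set


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def intersection_over_union(predicted: Set[str], expected: Set[str]) -> float:
    """Shared items divided by all distinct items across both sets."""
    union = predicted | expected
    if not union:
        return 1.0  # Assumption: two empty sets count as a perfect match.
    return len(predicted & expected) / len(union)


if __name__ == "__main__":
    print(rouge_l_f1("the cat sat on the mat", "the cat lay on the mat"))
    print(intersection_over_union({"a", "b", "c"}, {"b", "c", "d"}))
```

ROUGE-L rewards answers that preserve the reference's word order without requiring exact matches, while IoU suits answers that are unordered collections of extracted items.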

google

ECLeKTic: A novel benchmark for evaluating cross-lingual knowledge transfer in LLMs

ECLeKTic is a novel benchmark designed to evaluate how effectively large language models (LLMs) transfer knowledge between languages, addressing a common limitation where models possess information in a source language but fail to access it in others. By utilizing a closed-book question-answering format based on language-specific Wikipedia entries, the benchmark quantifies the gap between human-like cross-lingual understanding and current machine performance. Initial testing reveals that even state-of-the-art models have significant room for improvement, with the highest-performing model, Gemini 2.5 Pro, achieving only a 52.6% success rate.

## Methodology and Dataset Construction

The researchers built the ECLeKTic dataset by focusing on "information silos" within Wikipedia to ensure the models would need to perform internal transfer rather than simply recalling translated training data.

* The dataset targets 12 languages: English, French, German, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Mandarin Chinese, Portuguese, and Spanish.
* Researchers selected 100 articles per language from a July 2023 Wikipedia snapshot that existed exclusively in that specific language and had no equivalent articles in the other 11 targeted languages.
* This approach uses Wikipedia presence as a proxy to identify facts likely encountered by the model in only one language during its training phase.

## Human Refinement and Decontextualization

To ensure the quality and portability of the questions, the team employed native speakers to refine and verify the data generated by AI.

* Human annotators filtered Gemini-generated question-and-answer pairs to ensure they were answerable in a closed-book setting without referring to external context.
* Annotators performed "decontextualization" by adding specific details to ambiguous terms; for example, a reference to the "Supreme Court" was clarified as the "Israeli Supreme Court" to ensure the question remained accurate after translation.
* Questions were curated to focus on cultural and local salience rather than general global knowledge like science or universal current events.
* The final dataset consists of 384 unique questions, each translated into and verified in the 11 other languages, resulting in 4,224 total examples (384 × 11).

## Benchmarking Model Performance

The benchmark evaluates models using a specific metric called "overall success," which measures a model's ability to answer a question correctly in both the original source language and the target language.

* The benchmark was used to test eight leading open and proprietary LLMs.
* Gemini 2.0 Pro initially set a high bar with 41.6% success, which was later surpassed by Gemini 2.5 Pro at 52.6%.
* The results demonstrate that while models are improving, they still struggle to maintain consistent knowledge across different linguistic contexts, representing a major hurdle for equitable global information access.

The release of ECLeKTic as an open-source benchmark on Kaggle provides a vital tool for the AI community to bridge the "knowledge gap" between high-resource and low-resource languages. Developers and researchers should use this data to refine training methodologies, aiming for models that can express their internal knowledge regardless of the language used in the prompt.
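The "overall success" metric and the dataset arithmetic can be sketched as follows, reading the metric as the share of translated examples answered correctly in both the source and the target language. The `Example` record and its field names are hypothetical. The summary does not say how correctness is judged per answer, so the boolean flags are taken as given.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Example:
    """One translated question; the field names are illustrative only."""
    question_id: str
    source_lang: str
    target_lang: str
    correct_in_source: bool
    correct_in_target: bool


def overall_success(examples: Iterable[Example]) -> float:
    """Fraction of examples answered correctly in BOTH languages."""
    items = list(examples)
    if not items:
        raise ValueError("at least one example is required")
    hits = sum(
        1 for item in items if item.correct_in_source and item.correct_in_target
    )
    return hits / len(items)


# Dataset size from the text: 384 unique questions, each translated into the
# 11 other languages.
UNIQUE_QUESTIONS = 384
OTHER_LANGUAGES = 11
TOTAL_EXAMPLES = UNIQUE_QUESTIONS * OTHER_LANGUAGES  # 4224

if __name__ == "__main__":
    sample = [
        Example("q1", "he", "en", True, True),
        Example("q1", "he", "fr", True, False),  # known in source, not transferred
        Example("q2", "ja", "en", False, False),
        Example("q2", "ja", "de", True, True),
    ]
    print(overall_success(sample))
    print(TOTAL_EXAMPLES)
```

Requiring correctness in the source language as well keeps the metric from rewarding a model for a target-language answer it could not also produce where the fact originated.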