AI Catalog: Discover, create, and share agents and flows

The GitLab AI Catalog serves as a centralized repository designed to streamline the discovery, creation, and distribution of AI agents and automated flows across an organization. By providing a structured environment for managing foundational and custom AI assets, it fosters team collaboration and ensures consistency throughout the development lifecycle. Ultimately, the catalog enables developers to scale AI-driven automation from experimental private prototypes to production-ready, instance-wide solutions.

## Discovering and Enabling AI Assets

* The catalog acts as a central hub for two distinct asset types: Agents, which handle on-demand or context-specific tasks, and Flows, which are multi-step automations that orchestrate multiple agents.
* Users can browse assets via the Explore menu, inspecting titles, descriptions, and visibility statuses before implementation.
* To use an asset, it must first be added to a top-level group via the "Enable in group" button and then activated within specific projects.
* The duplication feature allows teams to copy existing agents or flows to serve as templates for further customization.

## Development and Configuration

* Custom agents are built by defining specialized system prompts and configuring specific tool access, such as granting read-only permissions for code and merge requests.
* Custom flows use a YAML-based structure to define complex behaviors, incorporating components such as prompts, routers, and agent hierarchies.
* New assets are typically assigned a unique display name (e.g., `ci-cd-optimizer`) and initially set to private visibility to allow for safe experimentation.
* Effective creation requires thorough documentation of prerequisites, dependencies, and specific use cases so the asset remains maintainable by other team members.

## Managing Visibility and Sharing

* Private visibility restricts access to project members with at least the Developer role or to top-level group Owners, making it ideal for sensitive or team-specific workflows.
* Public visibility allows anyone on the GitLab instance to view the asset and enable it in their own projects.
* Best practices for sharing include using descriptive, purpose-driven names like `security-code-review` rather than generic identifiers.
* Organizations are encouraged to validate and test assets privately before moving them to public status, ensuring they solve real problems and handle edge cases.

## Versioning and Lifecycle Management

* GitLab employs automated semantic versioning (e.g., 1.1.0), where any change to a prompt or configuration triggers an immutable version update.
* The platform uses version pinning to ensure stability: when an asset is enabled, projects remain on a fixed version rather than updating automatically.
* Updates are strictly opt-in, requiring users to manually review changes and click an "Update" button to adopt the latest version.
* Version history and current status can be monitored through the "About" section in the Automate menu for both agents and flows.

To maximize the benefits of the AI Catalog, organizations should establish a clear transition path from private experimentation to public sharing. By leveraging version pinning and granular tool access, teams can safely integrate powerful AI automations into their development workflows while maintaining full control over environment stability and security.
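The immutable-version and opt-in-update behavior described above can be sketched in a few lines of Python. The class and method names here are illustrative models of the mechanism, not GitLab's actual API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AssetVersion:
    """An immutable catalog version: edits never mutate, they append."""
    semver: str
    prompt: str

@dataclass
class CatalogAsset:
    name: str
    versions: list = field(default_factory=list)

    def publish(self, semver: str, prompt: str) -> None:
        # Any change to a prompt or configuration produces a new
        # immutable version rather than modifying an existing one.
        self.versions.append(AssetVersion(semver, prompt))

    @property
    def latest(self) -> AssetVersion:
        return self.versions[-1]

@dataclass
class ProjectEnablement:
    """A project stays pinned to the version it enabled."""
    asset: CatalogAsset
    pinned: AssetVersion

    def update(self) -> None:
        # Strictly opt-in: nothing moves until this is called.
        self.pinned = self.asset.latest

asset = CatalogAsset("security-code-review")
asset.publish("1.0.0", "Review diffs for security issues.")
project = ProjectEnablement(asset, asset.latest)  # pinned at 1.0.0
asset.publish("1.1.0", "Review diffs and flag leaked secrets.")
assert project.pinned.semver == "1.0.0"           # pin holds
project.update()                                  # explicit opt-in
assert project.pinned.semver == "1.1.0"
```

The point of the pin is that publishing 1.1.0 has no effect on enabled projects until each one explicitly calls for the update.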

Understanding flows: Multi-agent workflows

The GitLab Duo Agent Platform introduces flows as a sophisticated orchestration layer that allows multiple specialized AI agents to collaborate on complex, multi-step developer workflows. Unlike standard interactive agents, flows are designed to work autonomously and asynchronously on GitLab’s platform compute, executing tasks ranging from initial requirement analysis to final merge request creation. This architecture enables teams to offload repetitive or high-compliance tasks to a background process that integrates directly with the existing GitLab ecosystem.

## Core Mechanics of Multi-Agent Flows

* Flows function as event-driven systems triggered by specific actions such as @mentions, issue assignments, or being designated as a reviewer on a merge request.
* Execution occurs on GitLab's platform compute, removing the need for users to maintain separate infrastructure for their automation logic.
* While standard agents are interactive and synchronous, flows are autonomous, gathering context and making decisions across project files and APIs without constant human intervention.
* The system supports background processing, allowing developers to continue working on other tasks while the flow handles complex implementations or security audits.

## Foundational and Custom Flow Categories

* Foundational flows are production-ready, general-purpose workflows maintained by GitLab and accessible through standard UI controls and IDE interfaces.
* Custom flows are specialized workflows defined via YAML that let teams tailor AI behavior to unique organizational requirements, such as specific coding standards or regulatory compliance like PCI-DSS.
* Custom flows use a YAML schema to define specific components, including "Routers" for logic steering and "Toolsets" that grant agents access to GitLab API functions.
* Real-world applications for custom flows include automated security scanning, documentation generation, and complex dependency management across a project.

## Technical Configuration and Triggers

* Flows are triggered through simple quick actions and UI actions, such as `/assign @flow-name` or `/assign_reviewer @flow-name`.
* The configuration for a custom flow includes an "ambient" environment setting and defines specific `AgentComponents` that map to unique prompts and toolsets.
* Toolsets provide agents with capabilities such as `get_repository_file`, `create_commit`, `create_merge_request`, and `blob_search`, enabling them to interact with the codebase programmatically.
* YAML definitions also manage UI log events, allowing users to track agent progress through hooks such as `on_tool_execution_success` or `on_agent_final_answer`.

To maximize the value of the GitLab Duo Agent Platform, teams should identify repetitive compliance or boilerplate implementation tasks and codify them into custom flows. By defining precise prompts and toolsets within the YAML schema, organizations can ensure that AI-driven automation adheres to internal domain expertise and coding standards while maintaining a high level of transparency through integrated UI logging.
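The pieces named above (the "ambient" environment, `AgentComponents`, toolsets, and UI log events) can be combined into a sketch like the following. The field layout is extrapolated from those component names for illustration and may not match GitLab's actual flow schema:

```yaml
# Hypothetical custom-flow definition; illustrative, not a verbatim
# copy of GitLab's schema.
environment: ambient            # runs autonomously on platform compute
components:
  - name: security_scanner
    type: AgentComponent
    prompt: |
      Scan the changed files for PCI-DSS violations and report findings.
    toolset:                    # GitLab API capabilities granted to the agent
      - get_repository_file
      - blob_search
  - name: remediation_agent
    type: AgentComponent
    prompt: |
      Apply fixes for the reported violations and open a merge request.
    toolset:
      - create_commit
      - create_merge_request
routers:
  - from: security_scanner     # logic steering between agents
    condition: violations_found
    to: remediation_agent
ui_log_events:                  # hooks that surface progress in the UI
  - on_tool_execution_success
  - on_agent_final_answer
```

The router is what makes this multi-agent: the scanner's output steers execution to the remediation agent only when its condition is met.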

Discord Appoints Humam Sakhnini as Chief Executive Officer

Discord has appointed former Activision Blizzard executive Humam Sakhnini as its new CEO, effective April 28, 2025, marking a significant transition from founder-led management to veteran industry leadership. Co-founder Jason Citron will move into an advisory role while remaining on the Board of Directors, and co-founder Stanislav Vishnevskiy will continue his tenure as Chief Technology Officer. This leadership shift is designed to accelerate Discord’s commercial scaling and deepen its integration within the global gaming ecosystem.

**Leadership Transition and Industry Pedigree**

* Humam Sakhnini brings over 15 years of high-level gaming experience, having most recently served as Vice Chairman at Activision Blizzard.
* Sakhnini previously led King Digital Entertainment as President, where he managed record-breaking performance for massive titles such as *Candy Crush*.
* His expertise spans managing multi-billion dollar portfolios including *Call of Duty* and *World of Warcraft*, positioning him to lead Discord’s efforts in professionalizing its business operations.
* Jason Citron’s transition to Advisor follows a decade of building the platform from its inception to a cornerstone of digital communication.

**Market Position and Financial Momentum**

* Discord currently hosts more than 200 million monthly active users who generate 2 billion hours of gameplay each month.
* The company has reported five consecutive quarters of positive adjusted EBITDA, signaling a transition from venture-backed growth to financial sustainability.
* The platform remains the primary social infrastructure for multiplayer gaming, supporting thousands of individual titles through voice, video, and text integration.

**Strategic Evolution of Revenue Streams**

* While the core business has historically relied on the Nitro subscription service, the company is now diversifying into advertising and micro-transactions.
* A renewed focus on "gaming roots" involves providing social infrastructure directly to game developers to help them engage their communities.
* The new leadership aims to capitalize on modern revenue creation and customer acquisition strategies to better monetize the platform’s massive engagement metrics.

This leadership change indicates that Discord is moving into a "growth and monetization" phase, prioritizing industrial-scale operations over early-stage product discovery. For developers and partners, this likely means a more aggressive rollout of developer tools and collaborative features designed to bridge the gap between gameplay and community management.

.NET Continuous Profiler: Memory usage | Datadog

Datadog’s Continuous Profiler timeline view addresses the challenge of diagnosing performance bottlenecks in production by providing a granular, time-sequenced visualization of code execution. By correlating thread activity with resource consumption, it enables engineers to move beyond high-level metrics and identify the exact lines of code responsible for latency spikes or CPU saturation. This visibility ensures that teams can optimize application performance and resolve complex runtime issues without the overhead of manual reproduction.

### Visualizing Thread Activity and CPU Utilization

* The timeline view displays a breakdown of thread states, allowing developers to distinguish between "Running," "Runnable," "Blocked," and "Waiting" statuses.
* By comparing wall time (total elapsed time) against CPU time (active processing), users can identify whether a process is bottlenecked by intensive calculations or by external dependencies.
* Hovering over specific time slices reveals the associated stack traces, providing immediate context into which functions were active during a performance anomaly.

### Detecting Garbage Collection and Runtime Overhead

* The profiler highlights runtime-specific events, such as Garbage Collection (GC) pauses, directly within the execution timeline.
* This correlation allows teams to see whether a spike in latency was caused by "Stop-the-World" events or by inefficient memory allocation patterns that trigger frequent GC cycles.
* By visualizing these events alongside application logic, engineers can determine whether to optimize their code or tune the underlying runtime configuration.

### Correlating Profiling Data with Distributed Traces

* The timeline view integrates with Application Performance Monitoring (APM) to link specific slow traces to their corresponding profile data.
* This "trace-to-profile" workflow allows developers to pivot from a high-latency request directly to the exact thread behavior occurring at that moment.
* This integration eliminates guesswork when investigating "P99" latency outliers, as it shows exactly where time was spent: lock contention, I/O wait, or complex algorithmic execution.

### Streamlining Production Troubleshooting

* The tool enables a proactive approach to performance management by identifying "silent" inefficiencies that do not necessarily trigger errors but degrade the user experience.
* Using the timeline view during post-mortem investigations provides a factual record of thread behavior, reducing the Mean Time to Resolution (MTTR) for intermittent production issues.

For organizations running high-scale distributed systems, adopting a continuous profiling strategy with a focus on timeline analysis is recommended. This approach transforms observability from simple monitoring into a deep diagnostic capability, allowing for precise optimizations that lower infrastructure costs and improve application responsiveness.
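The wall-time versus CPU-time distinction the profiler surfaces can be demonstrated with Python's standard clocks. This is a self-contained illustration of the concept, not Datadog's API:

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for a single call to fn."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

# CPU-bound work: wall time and CPU time track each other closely,
# the signature of a thread bottlenecked on computation.
wall, cpu = measure(lambda: sum(i * i for i in range(2_000_000)))
print(f"compute: wall={wall:.2f}s cpu={cpu:.2f}s")

# I/O-style wait: wall time elapses while CPU time stays near zero,
# the signature of a thread blocked on an external dependency.
wall, cpu = measure(lambda: time.sleep(0.5))
print(f"sleep:   wall={wall:.2f}s cpu={cpu:.2f}s")
```

A timeline view essentially plots this same ratio per thread over time, so a "Waiting" slice with near-zero CPU time points at locks or I/O rather than hot code.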

Failure is inevitable: Learning from a large outage, and building for reliability in depth at Datadog | Datadog

Following a major 2023 incident that caused a near-total platform outage despite partial infrastructure availability, Datadog shifted its engineering philosophy from "never-fail" architectures to a model of graceful degradation. The company identified that prioritizing absolute data correctness during systemic stress created "square-wave" failures, where the entire platform appeared down if even a portion of data was missing. By moving toward a "fail better" mindset, Datadog now focuses on maintaining core functionality and data persistence even when underlying infrastructure is compromised.

## Limitations of the Never-Fail Approach

* Classical root-cause analysis focused on a legacy, unsupervised global update mechanism that disconnected 50–60% of production Kubernetes nodes.
* While the "precipitating event" was easily identified and disabled, the engineering team realized that fixing the trigger did not address the systemic fragility that caused a binary (up/down) failure pattern.
* Prioritizing absolute accuracy meant that systems would wait for all data tags to process before displaying results; under stress, this caused the UI to show no data at all rather than "almost correct" data.
* Sequential queuing, aggressive retry logic, and node-specific processing requirements exacerbated the bottleneck, preventing real-time recovery.

## Prioritizing Graceful Degradation

* The incident prompted a shift away from relying solely on redundancy to prevent outages, acknowledging that some level of failure is eventually inevitable at scale.
* Engineering priorities were redefined to ensure that data is never lost (even if delayed) and that real-time data is processed before stale backlogs.
* The platform now aims to serve partial-but-accurate results to customers during an incident, providing visibility rather than a complete blackout.
* Implementation is handled as a company-wide program where individual product teams adapt these principles to their specific architectural needs.

## Strengthening Data Persistence at Intake

* Analysis revealed that data was lost during the outage because it was stored in memory or on local disks before being replicated to persistent stores.
* The original design favored low-latency responses by acknowledging receipt of data before it was fully replicated, making that data unrecoverable if the node failed.
* Downstream failures caused intake nodes to overflow their local buffers, leading to data loss even on nodes that remained online.
* New architectural changes focus on implementing disk-based persistence at the very beginning of the processing pipeline to ensure data survives node restarts and downstream congestion.

To build truly resilient systems, engineering teams must move beyond trying to prevent every possible failure trigger. Instead, focus on designing services that can survive partial infrastructure loss by prioritizing data persistence and allowing for degraded states that still provide value to the end user.
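The "persist first, acknowledge second" intake pattern described above can be sketched with a minimal write-ahead log. This is a simplified, hypothetical illustration of the principle, not Datadog's implementation:

```python
import os
import tempfile

class DurableIntake:
    """Append each payload to a write-ahead log before acknowledging.

    If the node dies after the ack, the payload is still recoverable
    from disk; acking from memory (the low-latency design) loses it.
    """

    def __init__(self, wal_path: str):
        self.wal_path = wal_path

    def receive(self, payload: bytes) -> str:
        # 1. Persist: length-prefix, append, and fsync so the bytes
        #    survive a crash or downstream backlog.
        with open(self.wal_path, "ab") as wal:
            wal.write(len(payload).to_bytes(4, "big") + payload)
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only now acknowledge receipt to the client.
        return "ack"

    def replay(self) -> list:
        """Recover all payloads after a restart."""
        out = []
        with open(self.wal_path, "rb") as wal:
            while header := wal.read(4):
                out.append(wal.read(int.from_bytes(header, "big")))
        return out

wal = os.path.join(tempfile.mkdtemp(), "intake.wal")
intake = DurableIntake(wal)
intake.receive(b"metric-1")
intake.receive(b"metric-2")
assert intake.replay() == [b"metric-1", b"metric-2"]
```

The trade-off is exactly the one the post-mortem surfaced: each ack now pays an fsync, adding latency, in exchange for data that survives node restarts and downstream congestion.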