How to set up GitLab SAML SSO with Google Workspace

Organizations using GitLab.com SaaS can streamline access control by integrating SAML-based Single Sign-On (SSO) with Google Workspace. This setup enables automated user provisioning and dynamic permission management by mapping Google Workspace groups directly to GitLab roles. The result is a centralized security model that reduces manual administrative tasks while ensuring users have immediate, secure access to the platform.

### Prerequisites and Architectural Benefits

* The integration requires a GitLab Premium or Ultimate subscription and Super Admin access to Google Workspace.
* Once configured, the authentication flow redirects users to Google for credentials, after which Google sends a SAML assertion to GitLab containing user details and group memberships.
* The system supports "Just-in-Time" provisioning, meaning GitLab accounts are created automatically upon a user's first successful login.
* Permissions are dynamic; GitLab updates group memberships and roles every time a user signs in to reflect their current status in Google Workspace.

### Gathering GitLab Configuration Details

* Configuration must be performed at the GitLab top-level group rather than within individual subgroups.
* Administrators need to retrieve the Assertion Consumer Service (ACS) URL, which typically follows the format `https://gitlab.com/groups/[your-group]/-/saml/callback`.
* The Identifier (Entity ID) must be copied to uniquely identify the GitLab group within the Google identity provider settings.
* The GitLab SSO URL is the specific entry point users will utilize to initiate the authentication process.

### Configuring the Google Workspace SAML Application

* Within the Google Admin Console, administrators must create a "Custom SAML app" to house the integration settings.
* The setup process provides a Google SSO URL and a certificate file (typically a `.pem` format) that must be saved for the GitLab-side configuration.
* The previously gathered GitLab ACS URL and Entity ID are entered into the Service Provider details section of the Google app configuration.

### Mapping User Attributes and Synchronizing Groups

* Specific attribute mapping is required to ensure user data flows correctly: Google’s "Primary Email" should map to the "NameID," "First Name" to "firstName," and "Last Name" to "lastName."
* For group synchronization to function, administrators must map selected Google Groups to an app attribute named exactly `groups` (lowercase).
* Google allows for the synchronization of up to 75 groups, which GitLab uses to determine and update user permissions upon login.
* The application must be explicitly turned "ON" for specific organizational units or the entire domain within the Google Admin Console to allow user access.

### Finalizing the Identity Provider Connection

* GitLab requires a SHA-1 certificate fingerprint for security verification rather than the raw certificate file provided by Google.
* Administrators must convert the downloaded Google `.pem` certificate into a SHA-1 fingerprint using an online conversion tool or a command-line utility.
* This fingerprint, along with the Google SSO URL, is entered into GitLab’s SAML SSO settings to establish the trusted connection between the two platforms.

To ensure a smooth rollout, it is recommended to test the integration with a small group of users before enforcing SAML for the entire organization. This allows administrators to verify that group-based permissions are mapping correctly to GitLab roles without disrupting existing workflows.
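
The certificate-to-fingerprint conversion mentioned under "Finalizing the Identity Provider Connection" can also be done locally rather than with an online tool. The following is a minimal sketch using only the Python standard library; the function name `sha1_fingerprint` is illustrative and not part of any GitLab or Google tooling, and the colon-separated uppercase output is an assumption about the preferred display format.

```python
import hashlib
import ssl


def sha1_fingerprint(pem_text: str) -> str:
    """Return the colon-separated SHA-1 fingerprint of a PEM certificate."""
    # Decode the base64 body between the BEGIN/END CERTIFICATE markers
    # into raw DER bytes. The input is stripped first because the helper
    # expects the text to start exactly with the BEGIN marker.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    # A certificate fingerprint is the hash of the DER encoding.
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    # Group the 40 hex characters into 20 byte pairs: "AB:CD:...".
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

To use it, read the downloaded `.pem` file as text and pass its contents to the function. On systems with OpenSSL installed, `openssl x509 -noout -fingerprint -sha1 -in <certificate file>` should produce an equivalent value.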

Bringing DAVE to All Discord Platforms

Discord is expanding its DAVE protocol to provide end-to-end encryption (E2EE) across all supported platforms, including web browsers, game consoles, and the Social SDK. This transition marks the move from an experimental rollout to a mandatory security standard for all voice and video communications on the platform. By March 1, 2026, Discord will officially deprecate non-E2EE calls, requiring all clients to support the protocol to maintain connectivity.

### Transitioning to a Global E2EE Standard

* Discord now facilitates tens of millions of E2EE calls daily via the DAVE protocol.
* The update brings support to previously excluded environments, ensuring a unified privacy model across desktop, mobile, console, and web interfaces.
* Support for the Social SDK ensures that third-party developers can integrate the same security standards into their own Discord-based applications.

### Technical Hurdles in Web Integration

* Bringing DAVE to the browser required leveraging WebAssembly (Wasm) to handle the performance-intensive cryptographic operations necessary for real-time encryption.
* Engineers used a Web Worker-based architecture to offload encryption and decryption tasks from the main execution thread, preventing UI latency and ensuring smooth audio/video playback.
* The implementation involved navigating the specific security trade-offs and sandboxing limitations inherent to modern web browser environments.

### Deprecation Timeline and Compatibility

* Starting March 1, 2026, any client or application that does not support the DAVE protocol will be blocked from participating in Discord calls.
* Users and developers are encouraged to update their software and SDK integrations well ahead of the deadline to avoid service interruptions.
* This move is the final step in Discord's strategy to make E2EE the default state for all voice and video channel interactions.

2023-03-08 incident: A deep dive into the platform-level recovery | Datadog

Following a massive system-wide outage in March 2023, Datadog successfully restored its EU1 region by identifying that a simple node reboot could resolve network connectivity issues caused by a faulty system patch. While the team managed to restore 100 percent of compute capacity within hours, the recovery effort was subsequently hindered by cloud provider infrastructure limits and IP address exhaustion. This post-mortem highlights the complexities of scaling hierarchical Kubernetes environments under extreme pressure and the importance of accounting for "black swan" capacity requirements.

## Hierarchical Kubernetes Recovery

Datadog utilizes a strict hierarchy of Kubernetes clusters to manage its infrastructure, which necessitated a granular, three-tiered recovery approach. Because the outage affected network connectivity via `systemd-networkd`, the team had to restore components in a specific order to regain control of the environment.

* **Parent Control Planes:** Engineers first rebooted the virtual machines hosting the parent clusters, which manage the control planes for all other clusters.
* **Child Control Planes:** Once parent clusters were stable, the team restored the control planes for application clusters, which run as pods within the parent infrastructure.
* **Application Worker Nodes:** Thousands of worker nodes across dozens of clusters were restarted progressively to avoid overwhelming the control planes, reaching full capacity by 12:05 UTC.

## Scaling Bottlenecks and Cloud Quotas

Once the infrastructure was online, the team attempted to scale out rapidly to process a massive backlog of buffered data. This surge in demand triggered previously unencountered limitations within the Google Cloud environment.

* **VPC Peering Limits:** At 14:18 UTC, the platform hit a documented but overlooked limit of 15,500 VM instances within a single network peering group, blocking all further scaling.
* **Provider Intervention:** Datadog worked directly with Google Cloud support to manually raise the peering group limit, which allowed scaling to resume after a nearly four-hour delay.

## IP Address and Subnet Capacity

Even after cloud-level instance quotas were lifted, specific high-traffic clusters processing logs and traces hit a secondary bottleneck related to internal networking.

* **Subnet Exhaustion:** These clusters attempted to scale to more than twice their normal size, quickly exhausting all available IP addresses in their assigned subnets.
* **Capacity Planning Gaps:** While Datadog typically targets a 66% maximum IP usage to allow for a 50% scale-out, the extreme demands of the recovery backlog exceeded these safety margins.
* **Impact on Backlog:** For six hours, the lack of available IPs forced these clusters to process data significantly slower than the rest of the recovered infrastructure.

## Recovery Summary

The EU1 recovery demonstrates that even when hardware is functional, software-defined limits can create cascading delays. Organizations should not only monitor their own resource usage but also maintain visibility into cloud provider quotas and ensure that subnet allocations account for extreme recovery scenarios where workloads may need to double or triple in size momentarily.
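
The relationship between the 66% IP usage target and the 50% scale-out headroom is simple arithmetic, sketched below. The sketch assumes that IP consumption grows linearly with cluster size, which is a simplification for illustration and not something the post-mortem states; both function names are hypothetical.

```python
def max_scale_factor(ip_usage: float) -> float:
    """Largest multiple of the current size that still fits in the subnet.

    ip_usage is the fraction of subnet addresses in use (0 < ip_usage <= 1).
    Assumes address consumption scales linearly with cluster size.
    """
    if not 0 < ip_usage <= 1:
        raise ValueError("ip_usage must be in the range (0, 1]")
    return 1.0 / ip_usage


def max_usage_for(scale_factor: float) -> float:
    """Highest steady-state usage that leaves room to grow by scale_factor."""
    if scale_factor < 1:
        raise ValueError("scale_factor must be at least 1")
    return 1.0 / scale_factor
```

Under that assumption, a subnet at 66% usage can grow to roughly 1.5 times its size before running out of addresses, which matches the 50% scale-out target. Doubling requires steady-state usage of at most 50%, and tripling about 33%, which is why clusters scaling past twice their normal size exhausted their subnets.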