distributed-tracing | Techlist.io

toss Nov 13, 2025

Frontend Code That Lasts (opens in new tab)

Toss Payments evolved its Payment SDK to solve the inherent complexities of integrating payment systems, where developers must navigate UI implementation, security flows, and exception handling. By transitioning from V1 to V2, the team moved beyond simply providing a library to building a robust, architecture-driven system that ensures stability and scalability across diverse merchant environments. The core conclusion is that a successful SDK must be treated as a critical infrastructure layer, relying on modular design and deep observability to handle the unpredictable nature of third-party runtimes. ## The Unique Challenges of SDK Development * SDK code lives within the merchant's runtime environment, meaning it shares the same lifecycle and performance constraints as the merchant’s own code. * Internal logging can inadvertently create bottlenecks; for instance, adding network logs to a frequently called method can lead to "self-DDoS" scenarios that crash the merchant's payment page. * Type safety is a major hurdle, as merchants may pass unexpected data types (e.g., a number instead of a string), causing fatal runtime errors like `startsWith is not a function`. * The SDK acts as a bridge for technical communication, requiring it to function as both an API consumer for internal systems and an API provider for external developers. ## Ensuring Stability through Observability * To manage the unpredictable ways merchants use the SDK, Toss implemented over 300 unit tests and 500 E2E integration tests based on real-world use cases. * The team utilizes a "Global Trace ID" to track a single payment journey across both the frontend and backend, allowing for seamless debugging across the entire system. * A custom Monitoring CLI was developed to compare payment success rates before and after deployments, categorized by merchant and runtime environment (e.g., PC Chrome vs. Android WebView). * This observability infrastructure enables the team to quickly identify edge-case failures—such as a specific merchant's checkout failing only on mobile WebViews—which are often missed by standard QA processes. ## Scaling with Modular Architecture * To avoid "if-statement hell" caused by merchant-specific requirements (e.g., fixing installment months or custom validation for a specific store), Toss moved to a "Lego-block" architecture. * The SDK is organized into three distinct layers based on the "reason for change" principle: * **Public Interface Layer:** Manages the contract with the merchant, validating inputs and translating them into internal domain models. * **Domain Layer:** Encapsulates core business logic and payment policies, keeping them isolated from external changes. * **External Service Layer:** Handles dependencies like Server APIs and Web APIs, ensuring technical shifts don't leak into the business logic. * This separation allows the team to implement custom merchant logic by swapping specific blocks without modifying the core codebase, reducing the risk of regressions and lowering maintenance costs. For developers building SDKs or integration tools, the shift from monolithic logic to a layered, observable architecture is essential. Prioritizing the separation of domain logic from public interfaces and investing in environment-specific monitoring allows for a highly flexible product that remains stable even as the client-side environment grows increasingly complex.

distributed-tracing test-automation typescript observability+3