Tax Refund Automation: AI

At Toss Income, QA Manager Suho Jung successfully automated complex E2E testing for diverse tax refund services by leveraging AI as specialized virtual team members. By shifting from manual coding to a "human-as-orchestrator" model, a single person achieved the productivity of a four-to-five-person automation team within just five months. This approach overcame the inherent brittleness of testing long, React-based flows that are subject to frequent policy changes and external system dependencies.

Challenges in Tax Service Automation

The complexity of tax refund services presented unique hurdles that made traditional manual automation unsustainable:

  • Multi-Step Dependencies: Each refund flow averages 15–20 steps involving internal systems, authentication providers, and HomeTax scraping servers, where a single timing glitch can fail the entire test.
  • Frequent UI and Policy Shifts: Minor UI updates or new tax laws forced complete scenario reconfiguration, rendering hard-coded tests obsolete almost immediately.
  • Environmental Instability: Issues such as "Target closed" errors during scraping, differing domain environments, and React-specific hydration delays caused constant test flakiness.

Building an AI-Driven QA Team

Rather than using AI as a simple autocomplete tool, the project assigned specific "personas" to different AI models to handle distinct parts of the lifecycle:

  • SDET Agent (Claude Sonnet 4.5): Acted as the lead developer, responsible for designing the Page Object Model (POM) architecture, writing test logic, and creating utility functions.
  • Documentation Specialist: Automatically generated daily retrospectives and updated technical guides by analyzing each day's git commits.
  • Git Master: Managed commit history and PR descriptions to ensure high-quality documentation of the project’s evolution.
  • Pair Programmers (Cursor & Codex): Handled real-time troubleshooting, type errors, and comparative analysis of different test scripts.

Technical Solutions for React and Policy Logic

The team implemented several sophisticated technical strategies to ensure test stability:

  • React Interaction Readiness: To solve "Element is not clickable" errors, they wait not only for an element to become visible but for React hydration to complete, so that event handlers are actually bound to the DOM.
  • Safe Interaction Fallbacks: A standard click utility attempts a Playwright click, then a native keyboard 'Enter' press, and finally a raw JavaScript event dispatch, so that interactions succeed even during UI transitions (a sketch combining both ideas follows this list).
  • Dynamic Consent Flow Utility: A specialized system automatically detects and handles the varying "Terms of Service" agreements across sub-services (Tax Secretary, Hidden Refund, etc.) through a single unified function (see the consent-flow sketch below).
  • Test Isolation: Automated scripts prevent userNo (test ID) collisions, allowing 35+ complex scenarios to run in parallel without data interference (see the worker-fixture sketch below).
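
The post does not include the team's utilities, so here is a minimal sketch of the readiness-plus-fallback click described above, assuming Playwright with @playwright/test. The hydration probe (checking for React's expando keys on the DOM node) and the name safeClick are assumptions for illustration, not the team's actual code.

```typescript
import { expect, Locator, Page } from '@playwright/test';

// Hypothetical helper: wait until the element is visible AND React hydration
// has attached handlers, then click with escalating fallbacks. The hydration
// probe below is an assumption; the article only says the team waits for
// event handlers to bind before interacting.
export async function safeClick(page: Page, locator: Locator): Promise<void> {
  await locator.waitFor({ state: 'visible' });

  // Heuristic: once React hydrates, it adds expando keys like
  // __reactFiber$... / __reactProps$... to the host DOM node.
  await expect
    .poll(() =>
      locator.evaluate((el) =>
        Object.keys(el).some((key) => key.startsWith('__react'))
      )
    )
    .toBe(true);

  // Fallback chain: Playwright click, then keyboard Enter, then raw JS dispatch.
  try {
    await locator.click({ timeout: 3_000 });
  } catch {
    try {
      await locator.focus();
      await page.keyboard.press('Enter');
    } catch {
      await locator.evaluate((el) =>
        el.dispatchEvent(new MouseEvent('click', { bubbles: true }))
      );
    }
  }
}
```

In a spec, `await safeClick(page, page.getByRole('button', { name: 'Next' }))` would stand in for a bare `locator.click()`.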
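
Similarly hedged, here is one way the unified consent handler might look: poll for any known agreement dialog and accept it until none remain. The button labels and the name passConsentFlows are hypothetical placeholders.

```typescript
import { Page } from '@playwright/test';

// Assumed labels; the real dialogs are internal to the Toss Income services.
const CONSENT_BUTTON_LABELS = ['Agree to all', 'Agree and continue'];

// One function for every sub-service: detect whichever Terms of Service
// dialog is currently shown and accept it, up to a bounded number of
// stacked dialogs, instead of hard-coding a consent sequence per service.
export async function passConsentFlows(page: Page, maxDialogs = 5): Promise<void> {
  for (let i = 0; i < maxDialogs; i++) {
    let handled = false;
    for (const label of CONSENT_BUTTON_LABELS) {
      const agree = page.getByRole('button', { name: label });
      // isVisible() returns immediately; an absent dialog simply yields false.
      if (await agree.isVisible().catch(() => false)) {
        await agree.click();
        handled = true;
        break;
      }
    }
    if (!handled) return; // no consent dialog on screen; the flow can proceed
    // Give the next screen (or the next stacked dialog) a moment to render.
    await page.waitForLoadState('networkidle').catch(() => {});
  }
}
```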
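
For the isolation point, the article only states that collisions are prevented by script. A common Playwright pattern that achieves the same outcome is a worker-scoped fixture handing each parallel worker its own test account; the userNo pool below is invented for illustration.

```typescript
import { test as base } from '@playwright/test';

// Hypothetical pool of test account IDs; real values would come from a
// test-data service or environment config.
const USER_NO_POOL = [90001, 90002, 90003, 90004, 90005];

export const test = base.extend<{}, { userNo: number }>({
  // Worker-scoped fixture: each parallel worker gets a distinct userNo, so
  // concurrently running scenarios never touch the same account's data.
  userNo: [
    async ({}, use, workerInfo) => {
      await use(USER_NO_POOL[workerInfo.workerIndex % USER_NO_POOL.length]);
    },
    { scope: 'worker' },
  ],
});
```

Specs written against this extended `test` receive a collision-free `userNo` like any other fixture.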

Integrated Feedback and Reporting

The automation was integrated directly into internal communication channels to create a tight feedback loop:

  • Messenger Notifications: Every test run sends a report including execution time, test IDs, and environment data to the team's messenger.
  • Automated Failure Analysis: When a test fails, the AI automatically posts the error log, the specific failed step, a tracking EventID, and a screenshot as a thread reply for immediate debugging (approximated in the reporter sketch after this list).
  • Human-AI Collaboration: This structure shifted the QA engineer's role from writing code to discussing failures and policy changes directly in messenger threads.
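
Toss's internal messenger API is not public, so the sketch below approximates the failure report with a custom Playwright reporter posting to an assumed webhook (MESSENGER_WEBHOOK); the payload fields mirror the report contents described above.

```typescript
import type { FullResult, Reporter, TestCase, TestResult } from '@playwright/test/reporter';

// Assumed integration point standing in for the internal messenger API.
const MESSENGER_WEBHOOK = process.env.MESSENGER_WEBHOOK ?? '';

type FailureReport = {
  title: string;
  durationMs: number;
  failedStep: string;
  error: string;
  attachments: string[];
};

export default class MessengerReporter implements Reporter {
  private failures: FailureReport[] = [];

  onTestEnd(test: TestCase, result: TestResult): void {
    if (result.status === 'passed' || result.status === 'skipped') return;
    this.failures.push({
      title: test.title,
      durationMs: result.duration,
      // Top-level steps only; nested steps would need a recursive search.
      failedStep: result.steps.find((step) => step.error)?.title ?? 'unknown step',
      error: result.error?.message ?? '',
      // With `screenshot: 'only-on-failure'` configured, the screenshot
      // appears here and could be uploaded as the thread reply described above.
      attachments: result.attachments.map((a) => a.name),
    });
  }

  // onEnd may return a promise, so the webhook calls can be awaited here.
  // Requires Node 18+ for the global fetch.
  async onEnd(_result: FullResult): Promise<void> {
    for (const failure of this.failures) {
      await fetch(MESSENGER_WEBHOOK, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(failure),
      });
    }
  }
}
```

Wiring it up is a one-line config change: `reporter: [['list'], ['./messenger-reporter.ts']]` in playwright.config.ts.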

The success of this five-month experiment suggests that for high-complexity environments, the future of QA lies in "AI Orchestration." Instead of writing selectors, QA engineers should focus on defining problems and managing the AI agents that build the architecture.