Hello, I'm Nuri Jeon, a Product Designer at Toss Bank. In this post I'd like to share my experience designing an experiment for the first time after joining Toss as an intern. My first assignment was to raise the sign-up conversion rate of Toss Bank non-members. As a first-time experiment designer, I found three things hardest: 1️⃣ Drop-off is large at several points in the funnel, so where should I start improving? 2️⃣ Many experiments have already been run, so what more can I add? 3️⃣ How do I frame a hypothesis so it holds up? How I worked through these three questions…
Toss designer Lee Hyeon-jeong argues that business goals and user experience are not mutually exclusive, even when integrating controversial elements like advertising. By identifying the intersection between monetization and usability, her team transformed intrusive ads into value-driven features that maintain user trust while driving significant revenue. She concludes that transparency and appropriate rewards can mitigate negative feedback and even increase user engagement.
### Reducing Friction through Predictability and Placement
* Addressed "surprise" ads by introducing clear labeling, such as "Watch Ad" buttons or specifying ad durations (e.g., "30-second ad"), which reduced negative sentiment without decreasing revenue.
* Discovered that when users are given a choice and clear expectations, their anxiety decreases and their willingness to engage with the content increases.
* Eliminated "flow-breaking" ads that mimicked functional UI elements, such as banners placed inside transaction histories that users frequently mistook for personal bank records.
* Established a design principle to place advertisements only in areas that do not interfere with information discovery or core user navigation tasks.
### Transforming Advertisements into User Benefits
* Developed a dedicated B2B ad platform to scale the variety of available advertisements, ensuring that users receive ads relevant to their specific life stages, such as car insurance or new credit cards.
* Shifted the internal perception of ads from "noise" to "benefits" by focusing on the right timing and high-quality matching between the advertiser and the user's needs.
* Institutionalized regular "creative ideation sessions" to explore interactive formats, including advertisements that respond to phone movement (gyroscope), quizzes, and mini-games.
* Leveraged long-term internal experiments to ensure that even if an idea cannot be implemented immediately, it remains in the team's "creative bank" for future product opportunities.
### Optimizing Value Exchange through Rewards
* Conducted over a year of A/B testing on reward thresholds, comparing small cash amounts (1 KRW to 200 KRW), non-monetary items (gifticons), and high-stakes lottery-style prizes.
* Analyzed the "labor intensity" of ads by adjusting lengths (10 to 30 seconds) to find the psychological tipping point where users felt the reward was worth their time.
* Implemented a high-value lottery system within the Toss Pedometer service, which successfully transitioned a loss-making feature into a profitable revenue stream.
* Maintained user activity and satisfaction levels despite the increased presence of ads by ensuring the "worst-case experience"—viewing ads for no gain—was entirely avoided.
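The summary does not describe the statistical machinery behind these reward experiments, but comparing two reward levels on a binary outcome is commonly done with a two-proportion z-test. The sketch below uses invented counts purely for illustration; none of these numbers come from Toss.

```python
from math import sqrt

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates,
    using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: users who engaged with a rewarded ad under a
# low vs. high reward amount (numbers are illustrative, not Toss data).
z = two_proportion_ztest(conv_a=820, n_a=10_000, conv_b=910, n_b=10_000)
print(round(z, 2))  # → 2.26; |z| > 1.96 is significant at the 5% level
```

In practice a production experimentation platform would also handle sequential peeking and multiple comparisons, which a single z-test ignores.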
Product teams should stop viewing business requirements and UX as a zero-sum game. By focusing on user psychology—specifically transparency, non-disruption, and fair value exchange—it is possible to achieve aggressive business targets while maintaining a sustainable and trusted user environment.
When Discord launched Voice Messages in 2023, the engineering and data teams faced a significant hurdle in measuring the feature's impact through traditional A/B testing. Because the feature is inherently social—requiring both a sender and a receiver—standard user-level randomization would fail to capture the true causal effect due to heavy network interference. The team had to navigate the limitations of their testing infrastructure, ultimately seeking a balance between imperfect user-level tests and geographically biased alternatives.
### The Conflict Between Social Features and SUTVA
* Traditional A/B testing relies on the Stable Unit Treatment Value Assumption (SUTVA), which posits that the behavior of one user is independent of the treatment assignment of others.
* Voice Messages break this assumption because the feature’s value is realized through interactions; if a sender is in the treatment group but the receiver is in control, the experimental boundaries blur.
* Network effects occur when treatment behavior in one group influences the control group, potentially skewing metrics and leading to an inaccurate understanding of the feature's success.
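A stylized simulation (all numbers are made up; this is not Discord data) illustrates how such spillover attenuates a naive user-level estimate: control-group receivers still benefit when their sender is treated, so the measured treatment-vs-control gap understates the true full-rollout lift.

```python
import random

random.seed(0)

def simulate(n_pairs=100_000, p_treat=0.5):
    """Toy model: a voice message is sent only when the sender has the
    feature, and 'engagement' rises for both sides when it is used."""
    sum_t = sum_c = 0.0
    n_t = n_c = 0
    for _ in range(n_pairs):
        sender = random.random() < p_treat
        receiver = random.random() < p_treat
        used = sender  # the sender needs the feature to send at all
        s_engage = 1.0 + (0.2 if used else 0.0)
        # spillover: the receiver's engagement rises too,
        # even if the receiver is in the control group
        r_engage = 1.0 + (0.1 if used else 0.0)
        for treated, engage in ((sender, s_engage), (receiver, r_engage)):
            if treated:
                sum_t += engage
                n_t += 1
            else:
                sum_c += engage
                n_c += 1
    return sum_t / n_t - sum_c / n_c

# Naive user-level estimate: ~0.10, versus a true full-rollout
# lift of 0.15 in this toy model (interference shrinks the gap).
print(round(simulate(), 3))
```

In this toy model the true average lift of shipping to everyone is 0.15 (0.2 for senders, 0.1 for receivers), but the naive estimator recovers only about two thirds of it because the control group is partially "treated" through its contacts.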
### Infrastructure Constraints and Randomization Strategies
* The ideal solution for social platforms is cluster randomization, which assigns entire networks or communities to a single experimental arm to contain interactions.
* Discord’s internal testing platform did not support cluster randomization at the time of the Voice Message launch, forcing the team to consider less-than-ideal methodologies.
* User-level randomization was deemed "bad" for this specific use case because it could not account for the interconnected nature of Discord’s user base.
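Cluster randomization is often implemented by hashing a community identifier, so that every member of a community deterministically lands in the same arm. The guild-keyed scheme below is a hypothetical sketch of that idea, not Discord's actual assignment code.

```python
import hashlib

def assign_arm(guild_id: str, experiment: str, treat_share: float = 0.5) -> str:
    """Deterministically assign an entire community (guild) to one arm,
    so every member sees the same variant and in-group interactions
    stay inside a single experimental condition."""
    digest = hashlib.sha256(f"{experiment}:{guild_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treat_share else "control"

print(assign_arm("guild-42", "voice-messages"))
```

Salting the hash with the experiment name keeps assignments independent across experiments; the trade-off is lower statistical power, since the effective sample size is the number of clusters rather than the number of users.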
### The Trade-offs of Geo-Testing
* One proposed alternative was randomizing by country, based on the assumption that most social networks are language or country-specific.
* By treating an entire geographic region while keeping another as a control, the team hoped to mitigate cross-group network interference.
* However, geo-testing introduces significant bias, as it conflates the treatment effect with existing cultural, economic, and behavioral differences between countries.
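A toy calculation (made-up rates, not Discord metrics) shows the conflation: a naive post-launch comparison between countries absorbs their pre-existing baseline gap, whereas a difference-in-differences adjustment (a standard correction, not one described in the source) nets out any stable gap.

```python
# Illustrative engagement rates before and after launch (invented numbers).
pre  = {"country_A": 0.30, "country_B": 0.20}   # before launch
post = {"country_A": 0.36, "country_B": 0.21}   # A treated, B control

# Naive geo comparison conflates the treatment effect
# with the pre-existing gap between the two countries.
naive = post["country_A"] - post["country_B"]   # ≈ 0.15

# Difference-in-differences subtracts each country's own baseline,
# isolating the change attributable to the launch.
did = ((post["country_A"] - pre["country_A"])
       - (post["country_B"] - pre["country_B"]))  # ≈ 0.05

print(round(naive, 2), round(did, 2))
```

Even this adjustment assumes the two countries would have trended in parallel without the launch, which cultural or economic shifts can easily violate.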
To accurately measure the impact of features built on social connectivity, organizations must account for network interference that violates standard statistical assumptions. When cluster randomization infrastructure is unavailable, data teams must carefully weigh the bias introduced by geographic testing against the interference inherent in user-level randomization.