Toss has developed MTVi (Mid-term Value - incremental) to quantify the financial impact of specific services within its platform, moving beyond the limitations of traditional Lifetime Value (LTV). By focusing on the incremental value generated over a one-year period, the metric allows the company to justify services that may lose money individually but drive significant ecosystem-wide growth. This framework provides a data-driven standard for prioritizing features and setting marketing budgets based on actual financial contributions.
### Limitations of Traditional LTV
* **Time Horizon Mismatch:** Traditional LTV projects value over 3 to 5 years, which is too slow for Toss’s rapid iteration cycles and fails to reflect the immediate impact of service improvements.
* **Investment Recovery Gaps:** Standard LTV models often benchmark marketing costs (CAC) against long-term projections, making it difficult to evaluate the efficiency of short-term experiments.
* **Lack of Incrementality:** LTV measures average user value but cannot isolate the specific "extra" value created by a single service, making it impossible to distinguish between a service's impact and natural user growth.
### Defining MTVi and DID Methodology
* **Incremental Focus:** MTVi is defined as the net financial value generated over one year specifically because a user experienced a new service, rather than just the average revenue of a user.
* **Quasi-Experimental Design:** Since A/B testing every service combination is impossible, Toss uses the Difference-in-Differences (DID) method to compare "Newly Activated Users" (NAU) against "Never" users.
* **Segment-Based Analysis:** To prevent bias—such as highly active users naturally gravitating toward more services—Toss segments users by age and historical activity (e.g., app open frequency) to ensure "apples-to-apples" comparisons within identical cohorts.
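The segment-then-compare logic above can be sketched in a few lines. All figures below are hypothetical, and the segment names and pre/post values are illustrative; Toss's actual pipeline and numbers are not public.

```python
# Minimal difference-in-differences (DID) sketch for an MTVi-style estimate.
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Incremental value = change in the treated group minus change in control."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Segment users first (e.g., by age band and prior app-open frequency),
# then compare newly activated users (NAU) against "Never" users within
# the same segment, keeping the comparison apples-to-apples.
segments = {
    "20s_high_activity": {"nau": (10_000, 14_500), "never": (10_200, 11_000)},
    "30s_low_activity":  {"nau": (4_000, 6_200),  "never": (4_100, 4_600)},
}

for name, g in segments.items():
    mtvi = did_estimate(*g["nau"], *g["never"])
    print(name, mtvi)
```

Subtracting the control group's change strips out "natural" growth the segment would have shown anyway, leaving only the extra value attributable to the new service.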
### Organizational Impact and Strategy
* **Unified Decision Metric:** MTVi provides a "common language" across product teams that would otherwise operate in silos, allowing them to compare the value of disparate services, such as pedometers versus remittances, on a single financial scale.
* **Efficiency Benchmarking:** The metric establishes a hard ceiling for investment; for example, Customer Acquisition Cost (CAC) is strictly managed so it does not exceed the calculated MTVi.
* **Platform-Wide Valuation:** By calculating both direct revenue and indirect spillover effects, Toss can prove the financial viability of "loss-leader" services that provide user benefits but increase overall app engagement and cross-service usage.
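The CAC guardrail described above reduces to a simple check: spend per acquired user must clear the incremental one-year value. The campaign names and figures below are invented for illustration.

```python
# Hypothetical sketch of the CAC-vs-MTVi ceiling: a campaign is only
# approved if its customer acquisition cost stays at or below the
# incremental value (MTVi) the service generates per user.
def cac_within_budget(cac, mtvi):
    """Return True if acquisition cost does not exceed the MTVi ceiling."""
    return cac <= mtvi

campaigns = [
    {"name": "pedometer_push", "cac": 1_200, "mtvi": 3_700},
    {"name": "remittance_ads", "cac": 5_000, "mtvi": 4_200},
]
approved = [c["name"] for c in campaigns if cac_within_budget(c["cac"], c["mtvi"])]
print(approved)  # only campaigns whose CAC clears the ceiling survive
```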
For organizations operating complex multi-service platforms, adopting an incremental value metric like MTVi is essential for moving beyond isolated P&L statements. Data teams should prioritize quasi-experimental methods like DID and rigorous user segmentation to accurately map how individual features influence the broader financial health of the ecosystem.
When Discord launched Voice Messages in 2023, the engineering and data teams faced a significant hurdle in measuring the feature's impact through traditional A/B testing. Because the feature is inherently social—requiring both a sender and a receiver—standard user-level randomization would fail to capture the true causal effect due to heavy network interference. The team had to navigate the limitations of their testing infrastructure, ultimately seeking a balance between imperfect user-level tests and geographically biased alternatives.
### The Conflict Between Social Features and SUTVA
* Traditional A/B testing relies on the Stable Unit Treatment Value Assumption (SUTVA), which posits that the behavior of one user is independent of the treatment assignment of others.
* Voice Messages break this assumption because the feature’s value is realized through interactions; if a sender is in the treatment group but the receiver is in control, the experimental boundaries blur.
* Network interference occurs when behavior in the treatment group spills over into the control group, skewing metrics and producing an inaccurate read on the feature's success.
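A toy calculation makes the dilution concrete. The numbers are invented: suppose control users who receive voice messages from treated friends partially experience the feature themselves.

```python
# Toy illustration of a SUTVA violation: interference "contaminates"
# the control group and shrinks the measured treatment effect.
true_effect = 1.0           # real engagement lift for users with the feature
base_engagement = 5.0

treated = base_engagement + true_effect
# Suppose 40% of control users receive voice messages from treated
# senders, so they partially experience the feature too (interference).
contaminated_share = 0.4
control = base_engagement + contaminated_share * true_effect

naive_effect = treated - control
print(round(naive_effect, 1))  # 0.6, well below the true lift of 1.0
```

Because the control baseline is inflated by spillover, a naive treated-minus-control comparison understates how much the feature actually moves engagement.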
### Infrastructure Constraints and Randomization Strategies
* The ideal solution for social platforms is cluster randomization, which assigns entire networks or communities to a single experimental arm to contain interactions.
* Discord’s internal testing platform did not support cluster randomization at the time of the Voice Message launch, forcing the team to consider less-than-ideal methodologies.
* User-level randomization was deemed "bad" for this specific use case because it could not account for the interconnected nature of Discord’s user base.
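Cluster randomization, the approach Discord's platform lacked, can be sketched as deterministic bucketing of whole communities. The guild IDs, salt, and hashing scheme below are illustrative assumptions, not Discord's implementation.

```python
import hashlib

def assign_arm(cluster_id: str, salt: str = "voice-messages-exp") -> str:
    """Deterministically bucket an entire cluster into treatment or control."""
    digest = hashlib.sha256(f"{salt}:{cluster_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Every member inherits their community's assignment, so a sender and a
# receiver in the same network are never split across experimental arms.
guilds = ["guild_123", "guild_456", "guild_789"]
assignments = {g: assign_arm(g) for g in guilds}
print(assignments)
```

Hashing on the cluster ID (rather than the user ID) is what contains interactions within a single arm; salting the hash keeps assignments independent across experiments.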
### The Trade-offs of Geo-Testing
* One proposed alternative was randomizing by country, based on the assumption that most social networks are language or country-specific.
* By treating an entire geographic region while keeping another as a control, the team hoped to mitigate cross-group network interference.
* However, geo-testing introduces significant bias, as it conflates the treatment effect with existing cultural, economic, and behavioral differences between countries.
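The bias in the final bullet is easy to see with toy numbers: a naive cross-country comparison absorbs the baseline gap between countries. The difference-in-differences adjustment shown at the end is one common mitigation, not necessarily what Discord used; all figures are hypothetical.

```python
# Toy geo-test: country A gets the feature, country B is the control,
# but the two countries differ in engagement before launch.
true_lift = 0.02                                  # real absolute lift
pre = {"country_A": 0.40, "country_B": 0.30}      # pre-launch baselines
post = {"country_A": 0.40 + true_lift, "country_B": 0.30}

# Naive post-launch comparison conflates the lift with the baseline gap.
naive = post["country_A"] - post["country_B"]

# Subtracting each country's own baseline (difference-in-differences)
# recovers the true lift.
did = ((post["country_A"] - pre["country_A"])
       - (post["country_B"] - pre["country_B"]))
print(round(naive, 2), round(did, 2))  # naive 0.12 vs. true 0.02
```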
To accurately measure the impact of features built on social connectivity, organizations must account for network interference that violates standard statistical assumptions. When cluster randomization infrastructure is unavailable, data teams must carefully weigh the bias introduced by geographic testing against the interference inherent in user-level randomization.