Stripe Inc.

01/23/2025 | News release | Distributed by Public on 01/23/2025 11:21

Using ML to detect and respond to performance degradations in slices of Stripe payments

Every day, Stripe processes billions of dollars in payment volume. This figure surges during peak volume periods; over Black Friday and Cyber Monday in 2024, businesses processed more than $31 billion on Stripe. Our users rely on us to maximize their revenue and deliver a seamless experience to their customers, which is why we consistently monitor our global money movement systems to ensure they're operating smoothly.

One approach to monitoring payment performance would be tracking aggregate performance across all payments on our platform. While this would give us a comprehensive overview, it would likely obscure degradations affecting specific segments of traffic. For instance, a card issuer might make system changes that alter the acceptance of a specific payment type (e.g., a UK card issuer begins to decline recurring payments on prepaid cards at high rates). Given the scale of Stripe's processing, that spike in failed payments might not be enough to move global metrics, even though specific businesses in the UK with high use of prepaid cards would feel an acute impact.
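The dilution effect described above can be made concrete with a small worked example. The slice names and volumes below are invented for illustration and are not Stripe figures; the sketch only shows how a severe drop in one small slice barely moves the aggregate success rate.

```python
from dataclasses import dataclass


@dataclass
class SliceStats:
    """Attempt and success counts for one slice of traffic (hypothetical data)."""
    name: str
    attempts: int
    successes: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts


def aggregate_rate(slices):
    """Success rate over all slices combined."""
    total_attempts = sum(s.attempts for s in slices)
    total_successes = sum(s.successes for s in slices)
    return total_successes / total_attempts


# Invented volumes: a small slice inside one million payments.
baseline = [
    SliceStats("uk_prepaid_recurring", 2_000, 1_700),
    SliceStats("everything_else", 998_000, 898_200),
]
# Same traffic, but the small slice now fails far more often.
degraded = [
    SliceStats("uk_prepaid_recurring", 2_000, 900),
    SliceStats("everything_else", 998_000, 898_200),
]

global_drop = aggregate_rate(baseline) - aggregate_rate(degraded)
slice_drop = baseline[0].success_rate - degraded[0].success_rate

print(f"global success rate drop: {global_drop:.4%}")
print(f"slice success rate drop:  {slice_drop:.4%}")
```

With these made-up numbers, the slice loses 40 percentage points of success rate while the global rate moves by less than a tenth of a percentage point, which is the kind of change that is easy to lose in ordinary day-to-day variation.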

To solve this problem, we have developed a system that offers near real-time visibility into the performance of subsets, or "slices," of Stripe traffic. We apply a combination of machine learning (ML) and time series algorithms to detect performance degradations across various metrics, including payment success rates, authentication rates, costs, fraud, and more. When our monitoring system detects a degradation, it automatically alerts the relevant experts at Stripe to investigate and resolve the underlying issue. Achieving this capability required us to grapple with three fundamental problems:

How should we define a slice of payment traffic?
How can we detect when a slice is experiencing performance degradation?
How do we trigger a swift and effective response?

Defining slices

We monitor payments in a high-dimensional space characterized by over 16,000 payment-related variables. These include more than 10,000 issuing banks, hundreds of currencies, countries, card products, and payment features (e.g., Apple Pay, mail or telephone orders, account funding transfers). Performance issues might arise from unique combinations of any of these factors (e.g., a spike in failures on digital wallet payments from debit cards on French issuers).

To define slices effectively, we balance two competing needs: a slice must be narrow enough to isolate the specific shape of a degradation, yet data-rich enough for us to detect that degradation with high statistical confidence. To do this, we:

Continuously refine our definitions of payment slices by integrating new functionalities (e.g., advanced card features such as multicapture or partial authorization) into our monitoring system as they become supported by Stripe.
Incorporate insights gained from historical incidents. If a past degradation went undetected due to our monitoring limitations, we'll establish a new slice to track the relevant variables going forward.
Define slices based on our deep understanding of how the information users share in Stripe API calls gets transmitted through the broader financial environment. This method necessitates collaboration with the engineering teams responsible for our financial infrastructure, ensuring we capture nuances in our monitoring strategy.
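As a rough illustration of the narrow-versus-data-rich trade-off, the sketch below enumerates candidate slices as combinations of dimension values and keeps only those with enough traffic. The function name, the dimension names, and the `min_attempts` cutoff are all assumptions made for this example; the article does not describe how Stripe actually enumerates or filters slices.

```python
from collections import Counter
from itertools import combinations


def candidate_slices(payments, dimensions, max_dims=2, min_attempts=3):
    """Count payments per combination of dimension values.

    A slice key is a tuple of (dimension, value) pairs. Only slices with
    at least `min_attempts` payments are kept, so that every returned
    slice has enough data to be worth monitoring.
    """
    counts = Counter()
    for payment in payments:
        for k in range(1, max_dims + 1):
            for dims in combinations(dimensions, k):
                key = tuple((d, payment[d]) for d in dims)
                counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_attempts}


# Tiny invented dataset; real traffic would have far more rows and dimensions.
payments = (
    [{"issuer_country": "FR", "funding": "debit", "wallet": "apple_pay"}] * 3
    + [{"issuer_country": "FR", "funding": "credit", "wallet": "none"}]
    + [{"issuer_country": "GB", "funding": "prepaid", "wallet": "none"}] * 2
)
dimensions = ["issuer_country", "funding", "wallet"]

slices = candidate_slices(payments, dimensions)
for key, n in sorted(slices.items(), key=lambda item: (-item[1], item[0])):
    print(n, key)
```

Raising `max_dims` makes slices narrower but multiplies the number of combinations and thins out the data in each one, which is why some form of volume cutoff (here the hypothetical `min_attempts`) is needed.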

Finally, we've invested heavily in platformization: we've built slice monitoring as a general framework that allows engineering teams to develop their own metrics and slices while leveraging core slice-monitoring algorithms and operational tooling. This accelerates their ability to build and deploy powerful, precise detectors.

Detecting degradations

Once we define slices, our next challenge is to accurately identify performance degradation within them. This task is more complex than it might initially appear, as payment metrics can fluctuate dramatically for valid reasons.

Standard time-series anomaly detection approaches compare the current value of a time series against a baseline derived from historic data. However, for payment slice monitoring, standard anomaly detection is insufficient due to the absence of a stable baseline. There are many sources of underlying variation: customer onboarding, fraud trends, changes in business behavior, and more. For example, running a large free trial or launching a new line of business or product could alter customer composition, which consequently affects payment success rates. An algorithm that doesn't account for these intricacies would likely trigger false positives, hindering the effectiveness of our monitoring system.

To control for underlying changes in transaction composition, we employ a combination of Stripe's machine learning models and time-series analysis. First, we leverage ML models to estimate the probability of success for every transaction in our monitoring dataset (i.e., the expected outcome). These models are trained on Stripe's vast transaction-level datasets. Next, we conduct near real-time, time-series anomaly detection, adjusting for the underlying probability of success.
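One way to picture "adjusting for the underlying probability of success" is to compare observed successes in a time window with the number the model expected, scaled by the variance implied by the per-transaction probabilities. The z-score below is an assumed, simplified stand-in for illustration; the article does not specify which statistic or time-series method Stripe uses, and the probabilities here are invented rather than produced by a real model.

```python
import math


def adjusted_z_score(predicted_probs, outcomes):
    """Standardized gap between observed and model-expected successes.

    predicted_probs: model-estimated success probability per transaction.
    outcomes: 1 for success, 0 for failure, in the same order.
    Treats outcomes as independent Bernoulli draws (an assumption).
    """
    expected = sum(predicted_probs)
    variance = sum(p * (1.0 - p) for p in predicted_probs)
    observed = sum(outcomes)
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)


# Case 1: a mix shift (for example, a free trial) brings in riskier payments.
# The model already expects a 50% success rate, so 50 successes is normal.
z_mix_shift = adjusted_z_score([0.5] * 100, [1] * 50 + [0] * 50)

# Case 2: payments the model expects to succeed 90% of the time
# succeed only half the time, which points to a real degradation.
z_degraded = adjusted_z_score([0.9] * 100, [1] * 50 + [0] * 50)

print(f"mix shift z-score:   {z_mix_shift:.2f}")
print(f"degradation z-score: {z_degraded:.2f}")
```

Both windows show the same raw 50% success rate, but only the second is far below what the model predicted. A detector that compared raw rates against a fixed historical baseline would flag both.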

Figuring out when to act

Accurately detecting performance degradations is the first step in our monitoring process. The next challenge lies in determining how and when to act. Even with highly precise anomaly detection, monitoring tens of thousands of slices inherently leads to false positives from random fluctuations in metrics. Given that, we need to enable rapid detection while avoiding unnecessary alarms caused by transient drops in performance.
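The arithmetic behind this is simple. The sketch below uses an invented slice count and an invented per-slice false positive rate, and assumes slices are evaluated independently, which real slices (which overlap) are not.

```python
def expected_false_alerts(num_slices, false_positive_rate):
    """Expected number of spurious detections in one evaluation round."""
    return num_slices * false_positive_rate


def prob_at_least_one_false_alert(num_slices, false_positive_rate):
    """Chance that at least one slice fires spuriously, assuming independence."""
    return 1.0 - (1.0 - false_positive_rate) ** num_slices


# Hypothetical values: 20,000 slices, each with a 0.1% chance of a
# spurious detection per evaluation.
NUM_SLICES = 20_000
FALSE_POSITIVE_RATE = 0.001

expected = expected_false_alerts(NUM_SLICES, FALSE_POSITIVE_RATE)
p_any = prob_at_least_one_false_alert(NUM_SLICES, FALSE_POSITIVE_RATE)

print(f"expected false alerts per round: {expected:.1f}")
print(f"probability of at least one:     {p_any:.6f}")
```

Even a detector that is wrong only one time in a thousand would, under these assumptions, raise about 20 spurious detections every round, so individual detections cannot be paged on directly.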

To achieve this, we use a finite state machine that aggregates losses over time, only triggering alerts when loss thresholds from sustained events are breached. Alerts are classified based on urgency (derived from the rate of volume loss) and inferred root cause, streamlining the routing to the appropriate team for investigation and remediation.
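A minimal sketch of such a state machine follows. The three states, the thresholds, and the urgency labels are assumptions chosen for illustration, and root-cause inference is omitted; the article does not publish the actual states or parameters.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    DEGRADING = "degrading"
    ALERTING = "alerting"


class SliceLossMonitor:
    """Accumulates per-interval loss for one slice (hypothetical design).

    loss: payments lost in an interval relative to expectation.
    noise_floor: losses at or below this reset the monitor to HEALTHY.
    alert_threshold: cumulative loss that triggers an alert.
    urgent_rate: average loss per interval that marks an alert as urgent.
    """

    def __init__(self, noise_floor=1.0, alert_threshold=10.0, urgent_rate=5.0):
        self.noise_floor = noise_floor
        self.alert_threshold = alert_threshold
        self.urgent_rate = urgent_rate
        self.state = State.HEALTHY
        self.cumulative_loss = 0.0
        self.intervals = 0

    def observe(self, loss):
        """Feed one interval; return an urgency label when an alert first fires."""
        if loss <= self.noise_floor:
            # The event did not persist: forget accumulated loss.
            self.state = State.HEALTHY
            self.cumulative_loss = 0.0
            self.intervals = 0
            return None
        self.cumulative_loss += loss
        self.intervals += 1
        if self.state is State.ALERTING:
            return None  # already alerted for this event
        if self.cumulative_loss >= self.alert_threshold:
            self.state = State.ALERTING
            rate = self.cumulative_loss / self.intervals
            return "urgent" if rate >= self.urgent_rate else "standard"
        self.state = State.DEGRADING
        return None


transient = SliceLossMonitor()
transient_alerts = [transient.observe(x) for x in [4, 0, 4, 0]]

sustained = SliceLossMonitor()
sustained_alerts = [sustained.observe(x) for x in [4, 4, 4]]

fast = SliceLossMonitor()
fast_alerts = [fast.observe(x) for x in [6, 6]]

print(transient_alerts, sustained_alerts, fast_alerts)
```

In this sketch, isolated dips never accumulate enough loss to alert, a sustained moderate loss alerts on the third interval as "standard", and a faster loss reaches the same threshold sooner and is labeled "urgent".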

Extending slice monitoring and analytics directly to Stripe users

Solving these problems has resulted in a slice monitoring platform that identifies real degradations in payment performance each day with a precision exceeding 90%. This level of effectiveness gives us excellent coverage without generating an unsustainable operational burden from false positives.

As we continue to refine and expand our slice monitoring capabilities, we're committed to sharing this powerful tool with you. Later this year, we plan to make slice monitoring alerts available directly to select users. Additionally, all users will have full visibility into payment success rates on the Payments analytics page of the Stripe Dashboard, where you'll also find recommendations for performance optimization. For more information on how to increase your revenue on Stripe, check out Stripe's payments performance suite. We'll also be talking more about this at Stripe Sessions 2025, so come join us.

Stripe Inc. published this content on January 23, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on January 23, 2025 at 17:21 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at support@pubt.io
