How not to lose money in a black box: billing testing methods

Tomcat · May 27, 2024

Testing paid services is one of the key engineering challenges for QA at Badoo. Our application is integrated with more than 70 payment providers in 250 countries, and a bug in even one of them can lead to unpredictable consequences.

In this article, I will talk about the testing methods we use at Badoo and the limits of their applicability, that is, the testing stages at which each of them is most effective.

The article will be useful to testers, developers and product managers whose projects are already integrated with payment providers or are just beginning the integration process. If you are faced with the problem of choosing methods for testing such integrations, read on!

My name is Vladimir Solodov, I am a Billing QA Engineer at Badoo: I test integrations and payment processing. My colleague Viktor Koronevich helped me prepare the text: together we gave a talk at the Heisenbug conference (video). In the article, we expanded the scope to all integrations with payment providers that are used at Badoo, and classified and described in more detail the practices of eliminating external dependencies.

Using business cases as an example, I will tell you why you should be more careful when testing paid services and how not to aggravate problems if they do arise. And then we will move on to a description of the technical problems of integration testing and ways to solve them.

Go!

Specifics of billing testing

Typically, the purpose of a business is to generate income. Badoo, a social network for dating, generates income from credits and premium subscriptions. Credits are Badoo's internal currency. With their help, for example, you can raise your profile to the first place in search results, give a gift to another user, and so on. A premium subscription is valid for a certain period of time and provides several opportunities at once: turn on the “invisibility” mode, see people who have shown interest in you, confirm the authenticity of your account, and much more.

To make these paid services work, we use integrations with over 70 payment providers. The choice of provider depends on the platform, country, device, mobile operator and other factors. Therefore, the issue of testing paid services is very pressing for us.

First, let’s look at why testing paid services needs to be approached with special attention. There are two reasons.

1. Bugs in billing are critical for business.

The first problem is reputational. A user who has paid money for the service becomes more sensitive (and less tolerant) to bugs in the application. Any review in the public space, be it a review of an application on a blog or a comment in the App Store or Google Play, from a user who has encountered a bug in a paid service will be more emotional - this is a factor that leads to reputational losses.

The second problem is legal: as soon as you start taking money from users for a service, you become subject to consumer protection laws. And reputational losses can easily turn into financial ones.

Companies lose money in three ways.

The first way is refunds. Let's say a user discovers that you've sold them a service that doesn't quite meet their expectations and contacts your support service. Its employees investigate and find that the user’s expectations were indeed not met because of a bug in the application. You initiate a refund: the company faces lost profits, and this is the most harmless way to lose money.

The second way is chargebacks. Suppose the same situation occurred, only the user contacted not your support service but the bank that issued their card, or the payment provider. The bank or provider initiates the return of the money. In this case, we are dealing with a chargeback. The danger for the business here is not only lost profit. After a certain number of chargebacks, the company is fined and its risk rating worsens, which in turn makes the payment provider's services more expensive.

The third way is lawsuits. In the most serious cases, lawsuits (including class actions) can follow, with the gravest consequences. For example, in 2015, after a lawsuit by the regulator Ofgem, British Gas was forced to pay multimillion-dollar compensation to users whom it had overcharged because of an error in its billing system. You can read more about this here.

2. Testing integrations requires knowledge and expertise.

Teams just starting the process of integrating with payment providers often run into this problem. Without knowing all possible billing cases, they miss important nuances when implementing the system’s response to notifications from payment providers.

This can lead to unpredictable consequences - from lost profits to dissatisfied users.

Let's turn to the diagram that lists the types of paid services and take a closer look at the problem.

Figure 1. Possible billing cases

There are three main cases: error, successful payment and refund to the user. But every case has details, and your system must handle each case differently.

Errors can be critical or non-critical. A non-critical error is, for example, a notification from the payment provider that there are not enough funds in the user's account; a critical one is the blocking of the user's payment method. In the first case you can try to take the payment later, but in the second it is worth finding out why the method was blocked. Perhaps the user has been caught committing fraud and you should be more attentive to their transactions.

Returns. You already know that there are two types of returns: refunds and chargebacks. Your system should respond differently to them. For example, after a chargeback, it makes sense to think about blocking some functions of your application for the user, because chargeback is one of the popular methods of fraud.
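To illustrate why each case needs its own handling, here is a minimal sketch that maps the errors and returns described above to different reactions. The event names and action strings are hypothetical: real providers use their own codes, and the actual reactions are a product decision.

```python
from enum import Enum


class BillingEvent(Enum):
    # Hypothetical event names; real providers use their own codes.
    INSUFFICIENT_FUNDS = "insufficient_funds"          # non-critical error
    PAYMENT_METHOD_BLOCKED = "payment_method_blocked"  # critical error
    REFUND = "refund"
    CHARGEBACK = "chargeback"


def react(event: BillingEvent) -> list[str]:
    """Return the (hypothetical) actions the system takes for a billing event."""
    if event is BillingEvent.INSUFFICIENT_FUNDS:
        # Non-critical: the payment can simply be attempted again later.
        return ["schedule_retry"]
    if event is BillingEvent.PAYMENT_METHOD_BLOCKED:
        # Critical: retrying is pointless, and the reason deserves a closer look.
        return ["stop_retries", "flag_for_fraud_review"]
    if event is BillingEvent.REFUND:
        return ["revoke_service"]
    if event is BillingEvent.CHARGEBACK:
        # A chargeback additionally restricts some application functions.
        return ["revoke_service", "restrict_features"]
    raise ValueError(f"unhandled billing event: {event}")
```

The point of the sketch is the final `raise`: an event that nobody has classified should fail loudly rather than be silently treated as one of the known cases.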

A successful payment can be a one-time payment, or it can be a subscription.

One-time payments can be consumable or non-consumable. We looked at an example of a consumable payment at the very beginning of the article: credits in Badoo. An example of a non-consumable payment can be found in games. Let's say you have a character that you play as, and you want to buy him superpowers that last for some time. This purchase falls into the class of non-consumable payments.

Subscriptions. There is the widest variety of cases here. In addition to the initial subscription purchase, you may have:

subscription renewal (renew);
subscription cancellation (cancel);
trial subscription (trial);
grace period: when we are unable to renew the subscription, we keep trying to take the payment for a period of time called the grace period. For the user, it looks like this. Let's say you buy a monthly newspaper subscription. At the end of the first month, the company that sends you newspapers tries to charge you for the next month but cannot (the card is blocked, there are insufficient funds in the account, etc.). If the grace period is ten days, then during this time the company keeps trying to charge the payment while the subscription remains valid. If the company is unable to collect the money, the subscription is cancelled. If it succeeds, the subscription is renewed from the date of the last payment;
partial payment (partial billing). For example, PayPal allows you to make partial payments if there are not enough funds in the user's account (partial pay), or split the payment into parts (partial invoice).
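The grace period logic from the newspaper example can be sketched as a small function. This is only an illustration under stated assumptions: the status names are hypothetical, and the last day of the grace period is treated as still inside it, which a real provider may define differently.

```python
from datetime import date, timedelta
from typing import Optional


def subscription_status(today: date, renewal_due: date, grace_days: int,
                        paid_on: Optional[date]) -> str:
    """Status of a subscription whose renewal charge was due on `renewal_due`.

    `paid_on` is the date the renewal charge finally succeeded, or None
    if every attempt so far has failed.
    """
    grace_end = renewal_due + timedelta(days=grace_days)
    if paid_on is not None and paid_on <= grace_end:
        return "active"          # the charge succeeded in time
    if today <= grace_end:
        return "grace_period"    # still valid while charging is retried
    return "cancelled"           # the grace period ran out without payment
```

With a ten-day grace period and a renewal due on 31 January, the subscription stays in `grace_period` through 10 February and becomes `cancelled` on 11 February if no charge has succeeded.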

There is one more characteristic to take into account, and it depends entirely on the payment provider: the subscription can be managed internally, by your application, or externally.

An internally managed subscription is, for example, a subscription using credit cards or PayPal, when after the first payment you are given a token, with which you contact the provider again, without having the user's payment details.
An externally managed subscription is when a payment aggregator completely takes over the management of your subscriptions and simply sends you notifications about their current states.

In the figure, the most obvious cases are highlighted in purple; the system's reaction to them is usually implemented first. All the others are taken into account much later, as expertise accumulates. This is largely due to the incorrect application of iterative development methodologies to billing.

Figure 2. Billing cases that are primarily implemented in systems

Such a phased implementation can lead to unpredictable consequences. For example, in one of the projects I worked on before Badoo, the possibility of a refund was not taken into account. As a result, all returns were made not through refunds, but through chargebacks, which negatively affected the company’s risk rating and led to failures in collecting income statistics. Ignorance of the full variety of billing cases can lead to lost profits or the company's vulnerability to users who feel deceived.

So, on the one hand, bugs in payment processing must be found before release, because they can lead to the most negative consequences. If that was not possible, it is important to quickly realize that the bug made it into the release version of the application, fix it and, most importantly (and many people forget about this), “calm down” the users who encountered it.

On the other hand, the situation is complicated by the fact that integrations with payment providers are always interaction with a “black box”, which adds many variables to the testing process.

Technical problems during billing testing process

Let's look at them using the example of Badoo integration with the App Store.

App Store subscriptions are externally managed, that is, they are fully managed on the provider’s side, and our system can only request the current status or receive a notification when it changes.

We specifically chose this particular integration because it is the most complex and contains all the variety of cases that can be encountered in the process of integrating a service with other payment providers.

First, let's look at the one-time consumable purchase.

Figure 3. Process of making a one-time spendable payment

In step 1, the user makes a request to purchase a service. The application decides that payment should be made, and in step 2 control is transferred to the payment provider (App Store). Step 3: the user is shown a payment form. Step 4: the user provides payment information. Step 5: the provider completes the transaction and reports the result to the application, returning a receipt containing complete information about the purchase (date, service, status, etc.). Step 6: the receipt, supplemented with user data, is sent to the server for processing. In step 7, the server processes the receipt data and generates a push notification for the application. In step 8, the notification is shown to the user.

The problem is that steps 3, 4 and 5 are performed on the payment provider's side, are largely outside our control and may vary. Thus, the process is not actually linear, as shown in Figure 3, but rather branched (see Figure 4), and each branch must be handled differently by the application.

Figure 4. Branching state diagram of a one-time payment
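One way to keep such a branching process testable is to write the diagram down as a transition table and enumerate every branch as a test case. The sketch below assumes a hypothetical set of states and transitions; the real ones depend on the provider and are not taken from the figure.

```python
# Hypothetical states and transitions; the real diagram depends on the provider.
TRANSITIONS = {
    "requested": {"form_shown"},
    "form_shown": {"details_submitted", "cancelled_by_user"},
    "details_submitted": {"succeeded", "failed", "pending"},
    "pending": {"succeeded", "failed"},
    "succeeded": set(),
    "failed": set(),
    "cancelled_by_user": set(),
}


def is_valid_path(path: list[str]) -> bool:
    """Check that a path starts at 'requested' and uses only allowed transitions."""
    if not path or path[0] != "requested":
        return False
    return all(nxt in TRANSITIONS.get(cur, set())
               for cur, nxt in zip(path, path[1:]))


def all_paths(state: str = "requested") -> list[list[str]]:
    """Enumerate every branch from `state` to a terminal state."""
    next_states = sorted(TRANSITIONS[state])
    if not next_states:
        return [[state]]
    return [[state] + rest
            for nxt in next_states
            for rest in all_paths(nxt)]
```

Each list returned by `all_paths()` is one branch that the application has to handle, so the list doubles as a checklist of cases to cover.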

The purchase of subscriptions begins in the same way as a one-time payment, but further management of the process is quite difficult to control.

Figure 5. States of an externally managed subscription

Let me remind you that the Apple subscription we are considering as an example is externally managed. This means that after a purchase, the user can manage it asynchronously: close it, change the expiration date, request a refund. We see this in step 9. Since the action takes place outside of our system, I have indicated it with a dotted line in the figure.

At step 10, the App Store can change the subscription status: renew it, close it or put it into a grace period.

To find out what state the subscription is in, there is step 11, which is specific to aggregators such as the App Store and Google Wallet. At this step, the system sends a token: the receipt received at the very beginning, when purchasing the subscription, or after its previous renewal.

Step 12 is the provider's response. We receive a receipt with the current subscription status. The result of this step depends on asynchronous steps 9 and 10.

In the fall of 2018, Apple made its server-to-server notification mechanism available to everyone; it notifies you about changes that have occurred with a subscription. Receiving such a notification is shown in step 13. For most other providers, server-to-server notifications are the only mechanism, so it can be argued that the Apple example covers the whole variety of cases. In the case of other providers, step 13 allows you to exclude steps 11 and 12 from the scheme.

At step 14, the server generates a response for the application to a change in the subscription state.
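Because the subscription status can reach the server both by polling (steps 11 and 12) and by notification (step 13), the same change may arrive twice or out of order. The sketch below shows one way to handle this, assuming every update carries a provider-side timestamp. The class and field names are hypothetical and do not reflect the real receipt or notification format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusUpdate:
    subscription_id: str
    status: str       # e.g. "active", "grace_period", "cancelled"
    changed_at: int   # provider-side time of the change (Unix seconds)
    source: str       # "poll" (steps 11-12) or "notification" (step 13)


class SubscriptionStore:
    """Keeps only the most recent known status of each subscription."""

    def __init__(self) -> None:
        self._latest: dict[str, StatusUpdate] = {}

    def apply(self, update: StatusUpdate) -> bool:
        """Apply an update; return True if the stored status changed."""
        current = self._latest.get(update.subscription_id)
        if current is not None and update.changed_at <= current.changed_at:
            return False  # stale or duplicate update, ignore it
        self._latest[update.subscription_id] = update
        return current is None or current.status != update.status

    def status(self, subscription_id: str) -> str:
        return self._latest[subscription_id].status
```

A return value of True is the moment at which the server would generate its response for the application.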

Thus, we have obtained a complete graph of states that must be passed through in order to check paid services.

Figure 6. The complete process of changing payment states (using the example of an externally managed subscription)

The parts that we do not control in our system are colored in orange, and they are black boxes for us.

Billing testing methods

So, the main technical problem when testing paid services is the presence of “black boxes”, the states of which we have very little control over. This defines a set of methods that can cover the entire variety of cases.

There are not many of these methods, and we divided them into three categories: real payments, sandboxes, and eliminating external dependencies on black boxes.

Real payments

Real payments as a test method are good because they give a clear picture of the state of integration. An error when making a real payment is unconditional evidence of a bug.

In every other respect, real payments are a poor method. First of all, they are expensive: obviously, to make a real payment you need to spend real money. You are mistaken if you think that the entire amount will eventually be returned to the company. Firstly, providers take a commission on each transaction; its size, as described above, depends on the organization's risk rating and can reach 40% (or even more). Secondly, you may lose money when testing payments in other countries because of the currency spread, the difference between the buying and selling rates of a currency (you make the purchase at the bank’s selling rate, and the refund arrives at its buying rate).
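A rough estimate of what a single refunded test payment costs can be sketched as follows. The numbers in the example are hypothetical, and the function assumes that the provider keeps its commission on a refunded payment and that the currency spread applies once to the whole amount; actual terms vary by provider and bank.

```python
def real_payment_test_cost(price: float, commission_rate: float,
                           fx_spread_rate: float = 0.0) -> float:
    """Money lost on one real test payment that is later refunded in full.

    Assumes the commission is not returned on refund and the currency
    spread is a single percentage of the price; both are simplifications.
    """
    return round(price * (commission_rate + fx_spread_rate), 2)
```

For example, with a hypothetical price of 10.00, a 30% commission and a 2% spread, `real_payment_test_cost(10.0, 0.30, 0.02)` gives 3.2: almost a third of the price is lost on every test run even after a full refund.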

In addition, this method can take a long time, because you have to wait for subscription renewal periods and grace periods to end, and this can take months.

Sandboxes

Sandboxes are wonderful. This is essentially the same functionality that a payment provider gives us in the case of a real payment, but without spending real money. It is fully supported by the provider, meaning sandbox integration is low cost.

The problem of tests being stretched out over time is usually solved with various tricks. For example, the App Store sandbox shortens subscription periods as follows.

Real subscription time	Subscription time in the Apple sandbox
1 week	3 minutes
1 month	5 minutes
2 months	10 minutes
3 months	15 minutes
6 months	30 minutes
1 year	1 hour

Table 1. Correlation between the validity periods of a real subscription and a subscription in the Apple sandbox
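In autotests, the values from Table 1 can be kept as a lookup so that a test knows how long to wait for a renewal. A minimal sketch (the function and constant names are our own for illustration):

```python
# Accelerated subscription durations from Table 1, in minutes.
APPLE_SANDBOX_MINUTES = {
    "1 week": 3,
    "1 month": 5,
    "2 months": 10,
    "3 months": 15,
    "6 months": 30,
    "1 year": 60,
}


def sandbox_wait_minutes(real_period: str, periods: int = 1) -> int:
    """Minutes a test must wait for `periods` billing periods to elapse."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    return APPLE_SANDBOX_MINUTES[real_period] * periods
```

For instance, `sandbox_wait_minutes("1 year", periods=2)` returns 120: two yearly billing periods pass in two hours of sandbox time.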

The default validity period for subscriptions in the Google Wallet sandbox is shown in Table 2, and it can be configured in the merchant console.

Real subscription time	Subscription time in Google sandbox
1 week	5 minutes
1 month	5 minutes
3 months	10 minutes
6 months	15 minutes
1 year	30 minutes

Table 2. Setting the subscription validity period in the Google sandbox

Unlike the Apple sandbox, in the Google sandbox you can also check the trial, grace period, etc., using the ratio from Table 3.

Real subscription time	Subscription time in Google sandbox
Trial period	3 minutes
Introductory period	Equal to the time of the corresponding subscription
Grace period (3/7 days)	5 minutes
Temporary account blocking (hold)	10 minutes
Pause (1/2/3 months)	5/10/15 minutes (respectively)

Table 3. Validity of additional subscription functions in the Google sandbox

Closing a subscription is also implemented differently: in the App Store sandbox, a subscription is closed after the fifth renewal, while in Google Wallet it is closed from the merchant console or on a device from the Play Store.

The problem with sandboxes is that providers have different views on their quality. Our experience shows that of the more than 70 payment providers that are integrated into Badoo, only two sandboxes can boast full functionality and stable operation. These are Adyen and PayPal sandboxes. The rest of the providers have either stable sandboxes that are limited in functionality (like Google Wallet), or unstable sandboxes that are severely limited in functionality (like the App Store and Fortumo). And there are providers that do not have and do not intend to have a sandbox at all.

Figure 7. Classification of sandboxes by stability and functionality

Eliminating external dependencies

If we have convinced you that testing using real payments is expensive and ineffective, and payment providers do not provide sandboxes of the required quality, then all that remains is to turn to various ways to eliminate external dependencies. There are only three of them: mocks, fakes and stubs.

Mocks in billing generate your system's reactions to requests with predetermined parameters without actually contacting the payment provider (see Figure 8). For example, a request to an SMS payment provider for the number +7111-111-11-11 is intercepted at the stage of sending the request to the provider and produces a system response in the form of a successful payment. A request for the number +7111-111-11-12 is also intercepted, but results in an error response with the code “There are not enough funds to complete the transaction.”

Figure 8. Mock scheme
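The SMS example can be sketched like this. It is only an illustration: the function names and the response format are hypothetical, and `send_to_provider` stands in for the real network call.

```python
# Test phone numbers from the example, mapped to canned outcomes.
MOCKED_NUMBERS = {
    "+7111-111-11-11": {"status": "success"},
    "+7111-111-11-12": {"status": "error", "code": "insufficient_funds"},
}


def send_to_provider(phone: str, amount: int) -> dict:
    """Stand-in for the real network call to the provider (hypothetical)."""
    raise RuntimeError("the real provider must not be called in tests")


def charge_by_sms(phone: str, amount: int, mocks_enabled: bool = True) -> dict:
    """Intercept requests for known test numbers before they reach the provider."""
    if mocks_enabled and phone in MOCKED_NUMBERS:
        # Return a copy so callers cannot modify the canned response.
        return dict(MOCKED_NUMBERS[phone])
    return send_to_provider(phone, amount)
```

The interception happens at the last moment before the request leaves the system, so all the code that prepares the request is still exercised.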

Fakes in billing are fake notifications that look as if they came from a real provider (see Figure 9). Integration with each provider involves a limited set of system reactions to a limited set of notification or receipt types. Based on this information, for each individual payment we can generate a set of notifications (with signatures and other faked security attributes) that our system will treat as real notifications from the payment provider.

Figure 9. Fake scheme
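A fake notification with a signature might be produced as in the sketch below. It assumes, purely for illustration, that the provider signs the notification body with HMAC-SHA256 and a shared secret; real providers use their own schemes, and the field names here are hypothetical.

```python
import hashlib
import hmac
import json

TEST_SECRET = b"test-only-secret"  # hypothetical key known only to the test environment


def sign(body: bytes, secret: bytes) -> str:
    """HMAC-SHA256 signature of the body (an assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def make_fake_notification(payment_id: str, status: str,
                           secret: bytes = TEST_SECRET) -> dict:
    """Build a notification that looks as if the provider had sent it."""
    body = json.dumps({"payment_id": payment_id, "status": status},
                      sort_keys=True).encode()
    return {"body": body, "signature": sign(body, secret)}


def handle_notification(notification: dict, secret: bytes = TEST_SECRET) -> str:
    """Simplified stand-in for the server's notification entry point."""
    expected = sign(notification["body"], secret)
    if not hmac.compare_digest(expected, notification["signature"]):
        raise ValueError("bad signature")
    return json.loads(notification["body"])["status"]
```

Because the fake passes through the same signature check as a real notification, the server code needs no test-specific branches; only the secret differs between environments.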

Stubs in billing redirect to a page with a list of possible system reactions instead of sending and processing a request (see Figure 10): we list all possible reactions of the payment provider for the current state of the payment and trigger the chosen reaction instead of sending a request to a real provider or sandbox.

Figure 10. Stub diagram
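The data behind such a stub page can be sketched as a map from the current payment state to the reactions the provider could produce next. The states and reactions below are hypothetical examples, not a real provider's list.

```python
# Hypothetical map: payment state -> reactions the provider could return next.
POSSIBLE_REACTIONS = {
    "awaiting_payment": ["success", "insufficient_funds", "payment_method_blocked"],
    "paid": ["refund", "chargeback"],
    "subscribed": ["renewed", "grace_period", "cancelled", "refund", "chargeback"],
}


def stub_page(payment_state: str) -> list[str]:
    """Reactions offered to the tester instead of a redirect to the provider."""
    return list(POSSIBLE_REACTIONS.get(payment_state, []))


def choose_reaction(payment_state: str, reaction: str) -> dict:
    """Return the chosen reaction as if the provider had produced it."""
    if reaction not in POSSIBLE_REACTIONS.get(payment_state, []):
        raise ValueError(
            f"{reaction!r} is not possible in state {payment_state!r}")
    return {"state": payment_state, "provider_reaction": reaction}
```

Rejecting reactions that are impossible in the current state keeps the stub from driving the system into situations a real provider would never create.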

All these methods allow you to avoid wasting real money and time, but they cannot be called completely cheap, because to use them it is necessary to draw up maps of all possible billing states for each provider and keep them up to date. Also, to use all methods (except, perhaps, fake) you need to make significant changes to the code. In addition, being different options for simulating a real payment, mocks, stubs and fakes have a certain degree of approximation to reality and risks when used that must be taken into account.

Let's return to the process of making a one-time payment. Steps 3, 4 and 5 are key for integration: transferring control to the payment provider, sending a request to the provider and receiving a response. Each method of eliminating external dependencies focuses on some of these steps: a mock simulates the transfer of control and the sending of a request, a stub only the transfer of control, and a fake the receipt of a response. The remaining steps are left out of the picture.

Figure 11. Modeling the interaction of an application with a provider using different methods of eliminating external dependencies (using the example of a one-time purchase in the App Store)

On the one hand, leaving steps out creates risks (for example, you can miss a bug in the untested steps). On the other hand, modeling each step makes the method more expensive, since it requires changes to the system. Therefore, in practice we use combinations of methods. For example, mocks and fakes: when a request is sent to a certain number, no system reaction is generated; instead, a fake notification is sent to the notification entry point on our server. Or stubs and fakes: when a reaction is chosen in the stub, a fake notification is sent as well. Naturally, such implementations must be limited to development environments and must not end up in production.
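One simple way to keep mocks, fakes and stubs out of production is a guard that refuses to enable them unless the environment is explicitly marked as a development or test one. This is a sketch under assumptions: the `APP_ENV` variable and its values are hypothetical, and a missing value is treated as production so that the guard fails closed.

```python
import os


def doubles_allowed(env=None) -> bool:
    """Mocks, fakes and stubs are permitted only in explicitly listed environments."""
    env = os.environ if env is None else env
    # An unset variable counts as production: fail closed, not open.
    return env.get("APP_ENV", "production") in {"development", "testing"}


def enable_test_doubles(env=None) -> None:
    """Raise instead of silently enabling test doubles in the wrong place."""
    if not doubles_allowed(env):
        raise RuntimeError("test doubles must not be enabled outside dev/test")
```

Calling `enable_test_doubles()` at the single place where the mocks are wired in makes a misconfigured deployment fail at startup rather than accept fake payments.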

Limitations of Billing Test Methods

None of the methods described is a panacea. How do you know when it is better to use one or another? We propose evaluating them according to the following criteria:

reproducibility and coverage - which method will help cover and reproduce as many cases as possible?
the possibility of end-to-end checks - what does the method do better: allows you to check the entire payment process or thoroughly and quickly test only some of its stages?
low cost - evaluate the full cost: not only real money spent, but also the cost of writing and maintaining code.

We summarized the assessment results in a table.

Table 4. Comparative characteristics of billing testing methods

Real payment. Covers quite a limited number of cases: an annual subscription has to be tested for a year. But this is the only method that lets you test the entire integration process. It is quite expensive: we constantly spend real money paying providers for transactions.

Sandbox. Sandboxes differ from provider to provider (for example, Apple's and Google's), so they cover different numbers of cases (and certainly not all). A sandbox does not allow full end-to-end testing: even the provider's code in the sandbox may differ from the code in production. However, this is perhaps the cheapest method.

Fakes, mocks, stubs. This is the most flexible method: we can cover the entire range of cases. Because of the specifics of the method, we do not test the entire payment process. The method is not cheap: you need to write code and keep it up to date.

Selecting a Test Method

In order to determine which method to use at which stage, let's turn to the classic testing pyramid.

At the bottom of the pyramid there is a large number of tests that should completely cover the entire functionality of our system. These should be very small cases and fairly cheap.

At the top of the pyramid, coverage may be incomplete: these may be expensive cases. The main check we want to do here is to check the complete path of our service from the request to delivery to the user.

If we correlate this with the criteria for evaluating testing methods, we get the following ratio: for tests at the bottom of the pyramid - fakes, mocks, stubs; for tests at the top of the pyramid - integration-oriented methods: real payment and sandbox.

Figure 12. Correlation of stages and testing methods on the testing pyramid

Antipatterns when choosing a method

Information about what happens if the ratio of tests in the testing pyramid is violated can be found in a large number of articles.

Let's look at examples of three testing antipatterns that do not match the relationship in Figure 12 that we encountered at Badoo.

Real payments at the bottom of the pyramid

A special card was created for testing with real payments. It was available only to a narrow circle of people. But one day a QA engineer from our team got hold of its details. With the best of intentions, he decided to use it in autotests. Naturally, at some point the bank saw requests for several thousand payments arriving within a very short period of time and blocked the card. And it blocked the card so thoroughly that we couldn’t unblock it for about two weeks.

The conclusion is this: there is no need to use real payments everywhere.

Sandboxes at the bottom and top of the pyramid

The first problem that arises from over-reliance on sandboxes is their failure. For example, for a long time the sandbox was the only way to test Apple payments. As a result, we had to deal with the consequences of its unstable operation. There were two cases when the sandbox did not work at all. It was down for two weeks: as a result, we were forced to put out four releases of the client application without any adequate testing.

The second problem is the limitations that sandboxes have. Firstly, it is difficult to change the subscription period. Secondly, features such as grace periods, refunds and others are missing, so part of the functionality is not covered by tests at all.

Using sandboxes at the bottom of the pyramid leads to various infrastructure problems: when the same sandbox account is used for a large number of payments, the receipt or notification being transmitted can grow, because Apple accumulates purchase history. For one of the users, the receipt reached 1 GB; naturally, the test stand simply could not cope with transferring such a volume of data.

Eliminate external dependencies at the top of the pyramid

For one of the payment providers, we used only a combination of mocks and fakes. As a result, when the notification format was changed for one of the operators, the tests gave false positive results. The difficulty with this provider was that we could not make a real payment, since this required a SIM card from a certain operator in another country.

In such cases, it is necessary to assess the risks of eliminating external dependencies. It is also important to track actual notifications and check them against the template or schema (notifications that do not match should be investigated separately).
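Checking real notifications against a template can be as simple as the sketch below. The template and its field names are hypothetical; in practice the template would be the same description from which the fakes are generated.

```python
# Hypothetical template: field name -> expected Python type.
NOTIFICATION_TEMPLATE = {
    "transaction_id": str,
    "status": str,
    "amount": int,      # minor currency units
    "currency": str,
}


def template_violations(notification: dict,
                        template: dict = NOTIFICATION_TEMPLATE) -> list[str]:
    """List the ways a real notification deviates from the template."""
    problems = []
    for field, expected_type in template.items():
        if field not in notification:
            problems.append(f"missing field: {field}")
        elif not isinstance(notification[field], expected_type):
            actual = type(notification[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    # Fields the template does not know about are reported as well.
    for field in sorted(set(notification) - set(template)):
        problems.append(f"unexpected field: {field}")
    return problems
```

An empty list means the notification still matches what the fakes imitate; anything else is a signal that the provider's format has drifted and the fakes need updating.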

Conclusions

Paid services should be tested especially carefully, since even the most minor bugs can lead to unexpected consequences.
When implementing integration with a payment provider (especially when using iterative development methodologies), it is important to study and map all possible states of the provider. Iteration can be used to complicate the system's response to certain states, but the system must classify the states themselves correctly from the very beginning.
The payment provider is always a “black box” for us, and testing its operation is very difficult. You should not try to test everything with a single method - this will lead to dire consequences. It’s better to combine methods: fakes, mocks and stubs for all cases, plus a sandbox and real payments for a couple of cases that test the integration.
When using fakes, mocks and stubs, it is important to remember that these are models of a real payment, and, like any model, they have a degree of approximation to reality and carry risks. These risks must be assessed and covered either by real payments or by additional checks.

We talk about how we managed to achieve stable and inexpensive automation of testing of paid services in an iOS application in the next article.

Thank you for your attention! Great profits and fewer bugs to everyone!
