Guide

How to Automate In-App Subscription Testing for iOS and Android

iOS and Android subscription flows break when test agents install outside the store. Here is the two-layer architecture that makes them testable in CI.

Dhaval Shreyas

CEO & Co-founder

Automate a subscription purchase on iOS or Android and the payment sheet does nothing. The Google Play Billing dialog and the StoreKit payment sheet both require a real store connection to activate. No CI environment can provide one. Apple made this harder in late 2024 by extending TestFlight subscription renewals to 24 hours per cycle, making full lifecycle testing unreachable even through beta builds.

Sandbox accounts are the standard fallback, and they fail at scale too. State leaks between runs. Parallel workers collide on shared accounts. The sandbox goes down during your release cycle. We have watched dozens of iOS and Android teams cycle through these same workarounds. This guide explains the architecture we built to get past all of them.

What you’ll learn

Why the payment dialog disappears in test automation
Why sandbox accounts fail at scale in CI
The two-layer architecture that works in CI
How Pie scripts subscription state on demand
How to keep the test API out of production

Where Subscription Testing Falls Apart

Subscription testing comes up in almost every conversation we have with mobile engineering teams. They want to test it. But when a test agent runs the app outside an official store environment, the dialog never appears. The tap does nothing. The payment sheet stays blank. The test fails, not because the subscription is broken, but because the environment cannot reach the real store. No testing tool can override this.

Most teams discover this late. A 7-day trial button ships with a UI bug that took manual QA hours to catch. A billing-retry banner never gets tested because reaching that state requires waiting for an actual subscription to lapse. Teams accept this as a permanent gap in their mobile testing automation and move on.

You do not have to accept that gap. We built an architecture that gives iOS and Android apps full subscription coverage in CI. To understand why it works, first look at where the standard approaches break down.

Why Subscription Testing Breaks in Automation

When a real user downloads an app from the Play Store or App Store, the store’s payment infrastructure is wired into that specific installation. The app recognizes it came from an official channel, and the billing SDK activates accordingly.

When an automated test agent installs an app, it typically sideloads it: installing the APK or IPA directly without going through the official store. In a standard sideloaded build, the Google Play Billing dialog and Apple’s StoreKit payment sheet do not appear. The billing SDK has no store connection to open. The UI just sits there.

Sideloading alone is not always the culprit. On Android, license testers can work with sideloaded or debug builds, but only when the build’s package name matches the app registered in Play Console and the tester account is properly configured. Most CI environments do not meet these conditions. The root issue is running the app in a store-incompatible or unauthorized environment, not sideloading itself.

What This Means for Test Frameworks

Every major testing framework runs into this wall:

Appium, Detox, Maestro, and XCUITest all install apps outside the official store channel. None can trigger the native payment sheet in a standard CI environment.
Most CI configurations do not meet the license tester prerequisites for a valid store connection. The environment looks fine. The test fails anyway.
Apps that pass manual QA fail in automation for reasons unrelated to the subscription code itself. The same “it worked in manual testing” explanation, every time.

The store-connection requirement is not a framework limitation. It is a platform constraint built into how Google Play and Apple’s App Store are designed.

Why the Standard Advice Fails

Every major player in this space says the same thing when pressed. The problem is real, widely acknowledged, and largely unsolved.

The SDKs built specifically for subscription testing on iOS and Android have published their own guides on this. The most honest conclusion in any of them is also the most deflating. Automated subscription testing is doable, but very difficult and limited. The SDK vendors said so themselves. The TestFlight subscription changes in late 2024 made even their own workarounds harder to sustain.

Apple and Google offer no programmatic control over subscription state in test environments. No API to put a user into billing retry. No way to jump to trial day 6. No clean reset between runs. Whatever state a test leaves behind, the next test inherits.

Search YouTube, Stack Overflow, and engineering blogs for “automated subscription testing CI.” Every result points the same direction. Use sandbox accounts manually. Accept the limitations. Skip the hard edge cases.

My team stopped accepting that answer.

Why Sandbox Accounts Do Not Scale

The standard advice is to use license testers. Google Play Console lets you designate accounts that can go through purchase flows without being charged real money. Apple’s sandbox provides a similar mechanism. For one engineer running one test manually, this works.

For end-to-end testing at scale in CI, it falls apart immediately.

Parallel runs need many accounts. Each concurrent worker needs its own Google or Apple test account. A test suite with 20 parallel workers needs 20 separate license tester accounts, all configured and maintained.
Workers are ephemeral. Modern CI/CD infrastructure spins up fresh workers for each run and tears them down when done. Persistent accounts do not survive worker restarts.
State leaks between tests. There is no programmatic way to reset a license tester’s subscription status between runs. If run N puts a user into billing retry, run N+1 starts with that same state. Test isolation breaks entirely.
The sandbox clock is not yours. Apple’s sandbox accelerates renewals, but “5 minutes” is still real time you must wait. You cannot jump to trial day 6 on demand. You cannot trigger a refund event and immediately verify the UI response.
External availability. When Apple’s sandbox goes down, every team’s subscription tests fail simultaneously. You wait for Apple to fix it.

Every sandbox approach shares the same root problem. You depend on infrastructure you do not control, with state you cannot reset, on timing you cannot determine. Not a testing strategy. A permanent flakiness tax on your most important flows, producing the same symptoms as flaky tests from any other source.

How to Fix Subscription Testing in CI

Most teams treat “subscription testing” as one thing. It is not. There are two distinct layers that require completely different approaches.

Layer One: Verify the payment sheet appears. Install the app from the Play Store or TestFlight. Open the paywall. Tap Subscribe. Confirm the native payment dialog appears. Cancel. Done.

Run it before a release. Do not build your entire subscription test suite on it.

Layer Two: Verify the app behaves correctly across subscription states. Ninety percent of what subscription testing actually requires lives here. Premium features unlocking when they should. Paywalls appearing when they should. Trial countdown banners showing the right message on day 6. Billing-retry prompts appearing when a payment fails.

None of these states require a real purchase to test. They require setting the right preconditions and reading the right app behavior afterward.

Layer One: The Smoke Test

For Layer One, keep the real-store test small and intentional. Install from the Play Store or TestFlight. Navigate to the paywall. Tap the subscription CTA. Verify the native payment dialog appears. Cancel without completing the purchase.

A confidence check, not a regression suite. It confirms the app has a valid store configuration, the payment SDK is initialized, and the paywall can reach the store. Run it before every release. Do not run it for every subscription scenario.

Environment Prerequisites

Most CI failures in the smoke test layer are not test failures. They are environment misconfigurations. Before any test runs, these conditions need to be in place.

Android:

Build package name must match the app registered in Play Console exactly
The test Google account must be added as a license tester under Play Console → Settings → License testing
The app must be published to at least the internal testing track in Play Console (draft state does not qualify)
The test device must be signed into the license tester account in device Settings
Pie classifies environments that fail these conditions as unsupported_environment, not a test failure

iOS:

App must be distributed via TestFlight, not a direct IPA install
Tester must use an Apple sandbox account created in App Store Connect → Users and Access → Sandbox
On iOS 13 and later, sign in with the sandbox account when iOS prompts during the test purchase — no pre-configuration in Settings required
In-app purchase products must be in “Ready to Submit” status or higher in App Store Connect

Meet these conditions and the smoke test is reliable. Skip any one of them and you will get inconsistent results that look like a testing problem but are an environment problem.
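The Android prerequisites above can be checked before the smoke test runs, so a misconfigured environment is reported as such instead of as a failed test. The sketch below is illustrative only. The `AndroidEnvironment` fields and the `passed` / `failed` labels are assumptions, and how each field gets populated (Play Console lookups, device queries) is left out. Only the `unsupported_environment` label comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AndroidEnvironment:
    """Facts about the CI environment, gathered before the smoke test (hypothetical shape)."""
    build_package_name: str
    play_console_package_name: str
    account_is_license_tester: bool
    published_to_testing_track: bool
    device_signed_in_as_tester: bool


def unmet_prerequisites(env: AndroidEnvironment) -> list[str]:
    """Return a human-readable list of the prerequisites this environment fails."""
    problems = []
    if env.build_package_name != env.play_console_package_name:
        problems.append("package name does not match Play Console")
    if not env.account_is_license_tester:
        problems.append("account is not a license tester")
    if not env.published_to_testing_track:
        problems.append("app is not published to a testing track")
    if not env.device_signed_in_as_tester:
        problems.append("device is not signed in as the license tester")
    return problems


def classify_smoke_test(env: AndroidEnvironment, payment_sheet_appeared: bool) -> str:
    """Separate environment problems from genuine smoke-test failures."""
    if unmet_prerequisites(env):
        return "unsupported_environment"
    return "passed" if payment_sheet_appeared else "failed"
```

Running the prerequisite check first means a missing payment sheet is only ever reported as `failed` when the environment was actually capable of showing one.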

What the Test Confirms

When Pie runs this test, we are looking for one thing in the UI: the payment sheet appears with the correct product name and price. If it does not, check the store configuration first. This test exercises the store connection, not your subscription logic.

Layer Two: The Programmable Control Plane

Layer Two requires direct, deterministic control over subscription state. The real store cannot give you that.

Your engineering team builds a test-only entitlement control plane: protected backend endpoints that mirror the paths your production system takes via receipt validation and store webhooks. In production, a real billing event sets entitlement state. In a test build, where the control plane is activated by an environment variable or build flag, your test sets it instead.

No mocking, no store interception. You are talking directly to the entitlement layer your app already depends on, using the same paths a real store event would trigger.

What Your Backend Must Expose

Three capabilities make the control plane work. Here is what the minimum contract looks like:

subscriptionTesting:
  version: 1
  states:
    - free
    - trialing
    - premium
    - expired
    - billing_retry
    - grace_period
    - refunded
  actions:
    - resetUser
    - setTier
    - setSubState
    - setTrialDay
    - triggerRenewal
    - triggerRefund
  getters:
    - getSubscription
    - getEntitlements

Your backend exposes corresponding endpoints in a test-only namespace. The exact paths are yours to define. Pie registers whatever protected endpoints your app exposes and wraps them as callable scripts. A common pattern:

POST /test/subscription/reset
POST /test/subscription/set-tier
POST /test/subscription/set-state
POST /test/subscription/set-trial-day
POST /test/subscription/trigger-renewal
POST /test/subscription/trigger-refund
GET  /test/subscription
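One way to back routes like these is a small dispatch table over an entitlement store. The following is a minimal sketch, not a reference implementation: it uses an in-memory dictionary in place of a real database, covers only four of the routes, and assumes request bodies carry `user_id`, `state`, and `day` fields. Those field names, and passing `user_id` in a body for the GET route instead of a query parameter, are simplifications made for illustration.

```python
from typing import Callable

STATES = {"free", "trialing", "premium", "expired",
          "billing_retry", "grace_period", "refunded"}

# user_id -> subscription record; stands in for a real entitlement database.
_users: dict[str, dict] = {}


def _record(user_id: str) -> dict:
    """Fetch the user's record, creating a free one on first access."""
    return _users.setdefault(user_id, {"state": "free", "trial_day": None})


def reset(body: dict) -> dict:
    _users[body["user_id"]] = {"state": "free", "trial_day": None}
    return dict(_users[body["user_id"]])


def set_state(body: dict) -> dict:
    if body["state"] not in STATES:
        raise ValueError(f"unknown state: {body['state']}")
    record = _record(body["user_id"])
    # The trial day is only ever set through set-trial-day.
    record.update(state=body["state"], trial_day=None)
    return dict(record)


def set_trial_day(body: dict) -> dict:
    record = _record(body["user_id"])
    record.update(state="trialing", trial_day=int(body["day"]))
    return dict(record)


def get_subscription(body: dict) -> dict:
    return dict(_record(body["user_id"]))


ROUTES: dict[tuple[str, str], Callable[[dict], dict]] = {
    ("POST", "/test/subscription/reset"): reset,
    ("POST", "/test/subscription/set-state"): set_state,
    ("POST", "/test/subscription/set-trial-day"): set_trial_day,
    ("GET", "/test/subscription"): get_subscription,
}


def handle(method: str, path: str, body: dict) -> tuple[int, dict]:
    """Dispatch a request; unknown routes get 404, bad input gets 400."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": "not found"}
    try:
        return 200, handler(body)
    except (KeyError, ValueError) as exc:
        return 400, {"error": str(exc)}
```

Keeping the routes in a single table makes it easy to see, and to audit, exactly which state mutations the test namespace allows.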

How State Changes Flow

The logic mirrors how engineering teams test other backend-dependent flows. Consider loan approval in a fintech app. You cannot approve a loan through the UI in a test environment. The QA agent calls a backend endpoint to set the approval state, then verifies the UI reflects the change when the app opens.

Subscription testing works the same way. The control plane does what a real store event would do. Because it routes through the same entitlement service, you are also testing your webhook handler and database update logic along the way.

Every subscription state your backend can express is now reachable on demand. Set tier, set trial day, trigger a refund — all in the same run. A universal reset clears state between test cases without touching the store. Billing events that used to take days to reproduce manually now take one script call.
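The idea of routing test calls through the same entitlement service as real store events can be sketched as follows. This is an assumption-laden illustration: the event names, the `EntitlementService` class, and both entry-point functions are hypothetical, and webhook signature verification is omitted entirely.

```python
from dataclasses import dataclass, field

# Event type -> resulting entitlement state (hypothetical event names).
TRANSITIONS = {
    "purchased": "premium",
    "renewed": "premium",
    "payment_failed": "billing_retry",
    "expired": "expired",
    "refunded": "refunded",
}


@dataclass
class EntitlementService:
    """The single place where billing events become entitlement state."""
    states: dict[str, str] = field(default_factory=dict)
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def apply_event(self, user_id: str, event_type: str, source: str) -> str:
        new_state = TRANSITIONS[event_type]  # raises KeyError on unknown events
        self.states[user_id] = new_state
        self.log.append((user_id, event_type, source))
        return new_state


def handle_store_webhook(service: EntitlementService, payload: dict) -> str:
    """Production path: an already-verified store notification arrives."""
    return service.apply_event(payload["user_id"], payload["event"], source="store")


def trigger_refund_for_test(service: EntitlementService, user_id: str) -> str:
    """Test-only path: the same service call, made by a different caller."""
    return service.apply_event(user_id, "refunded", source="test")
```

Because both entry points end in `apply_event`, a bug in the transition logic shows up in CI regardless of which path triggered it. The `source` field only records who asked.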

Isolation and Reset Requirements

These are not optional. They are the conditions CI reliability depends on.

Idempotent reset before every test. Every test case must start with #{reset_user} as its first step. If a previous test left the user in a billing-retry state, the next test that skips the reset inherits that state. One flaky run in a hundred becomes the cause of the next failure. The reset is cheap. The debugging time it prevents is not.
Unique test user per parallel worker. When 20 workers run simultaneously and share a single test user, their state mutations collide. Worker A puts the user into expired. Worker B simultaneously sets it to premium. Both tests read incorrect state and produce unreliable results. Each parallel worker needs its own isolated user identity so subscription state belongs exclusively to that run. No shared state, no collisions, no test flakiness from infrastructure your tests didn’t cause.

Two rules for reliable CI

Reset to clean state before every test.
One isolated user per parallel worker.

These are not best practices. They are the conditions your subscription tests depend on.

Securing the Test API

A control plane that can flip any user to premium is also a critical vulnerability if it escapes into production. Four gates keep it contained.

Build flag gate. Compile test endpoints only into test builds. Use a build variant on Android or a scheme on iOS. If TEST_SUBSCRIPTION_API_ENABLED is not set, the routes do not exist.
Runtime environment check. Add a second gate at the handler level. If the environment variable is not present, return 404. Defense in depth: even if the code ships accidentally, the endpoint does not respond.
Separate namespace with access control. All test endpoints live under a /test/ prefix (or equivalent). In production infrastructure, this prefix routes to a 403 or does not resolve. Apply a service token or internal API key so the endpoints are not open even in staging.
No public exposure of state mutation. An endpoint that can set any user’s subscription tier is a business-critical attack surface. Treat it like a database admin endpoint: never accessible from the public internet.
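The runtime check, the namespace check, and the access-control check can be combined into one function that runs before any test handler. This is a sketch under stated assumptions: `TEST_SUBSCRIPTION_API_ENABLED` is the flag named above, while the `TEST_API_TOKEN` variable and the `X-Test-Api-Token` header are hypothetical names chosen for illustration. The compile-time build flag gate is a build-system concern and is not shown.

```python
import hmac
from typing import Mapping

FLAG = "TEST_SUBSCRIPTION_API_ENABLED"  # name taken from the text above
TOKEN_ENV = "TEST_API_TOKEN"            # hypothetical variable name
TOKEN_HEADER = "X-Test-Api-Token"       # hypothetical header name


def gate_status(path: str, headers: Mapping[str, str], environ: Mapping[str, str]) -> int:
    """Return the HTTP status the gates allow; 200 means 'proceed to the handler'."""
    # Runtime gate: without the flag, behave as if the route does not exist.
    if not environ.get(FLAG):
        return 404
    # Namespace gate: this handler only ever serves the test prefix.
    if not path.startswith("/test/"):
        return 404
    # Access control: require a shared secret, compared in constant time.
    expected = environ.get(TOKEN_ENV, "")
    supplied = headers.get(TOKEN_HEADER, "")
    if not expected or not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 403
    return 200
```

In a real service `environ` would be `os.environ` and the header lookup would need to be case-insensitive. Treating an empty token as a failure means a staging deployment that forgot to configure the secret stays closed.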

How Pie Scripts Subscription Tests

When a customer exposes these endpoints, Pie registers each one as a custom script the test agent can call directly. Script names use underscores and map to the operations your control plane exposes:

#{reset_user} — clears subscription state before each test
#{set_tier_premium} — grants premium access
#{set_sub_state_expired} — simulates an expired subscription
#{set_trial_day_6} — fast-forwards to day 6 of a trial
#{trigger_refund} — fires the refund event
#{get_subscription} — reads current state for assertions
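To make the naming convention concrete, here is one way a script name could be translated into a request against the example routes shown earlier. This is not a description of how Pie resolves scripts internally, which the text does not cover. The lookup table, the patterns, and the body field names (`tier`, `state`, `day`) are all assumptions.

```python
import re

# Fixed-name scripts: script name -> (HTTP method, path, JSON body).
FIXED = {
    "reset_user": ("POST", "/test/subscription/reset", {}),
    "trigger_refund": ("POST", "/test/subscription/trigger-refund", {}),
    "get_subscription": ("GET", "/test/subscription", {}),
}

# Parameterized scripts: the trailing part of the name carries the argument.
PATTERNS = [
    (re.compile(r"set_tier_(\w+)"), "/test/subscription/set-tier", "tier", str),
    (re.compile(r"set_sub_state_(\w+)"), "/test/subscription/set-state", "state", str),
    (re.compile(r"set_trial_day_(\d+)"), "/test/subscription/set-trial-day", "day", int),
]


def resolve(script: str) -> tuple[str, str, dict]:
    """Turn a script reference such as '#{set_trial_day_6}' into a request triple."""
    name = script.strip()
    if name.startswith("#{") and name.endswith("}"):
        name = name[2:-1]
    if name in FIXED:
        method, path, body = FIXED[name]
        return method, path, dict(body)
    for pattern, path, key, convert in PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return "POST", path, {key: convert(match.group(1))}
    raise ValueError(f"unknown script: {script}")
```

Under these assumptions, `resolve("#{set_sub_state_billing_retry}")` yields a POST to `/test/subscription/set-state` with the body `{"state": "billing_retry"}`.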

Premium feature access

Does granting premium tier in the control plane unlock the right content?

#{reset_user}
#{set_tier_premium}
[open app]
[navigate to premium feature]
[verify feature is accessible]

Trial-ending messaging

Day 6 of a 7-day trial is when users need a clear signal. Does the countdown banner show the right day, and does the upgrade CTA appear?

#{reset_user}
#{set_trial_day_6}
[open app]
[verify trial countdown banner shows correct day]
[verify upgrade CTA appears]

Refund downgrade

A refund fires. Premium content should become inaccessible immediately. Does the paywall come back?

#{reset_user}
#{set_tier_premium}
#{trigger_refund}
[open app]
[verify paywall appears]
[verify premium content is inaccessible]

Each test starts with #{reset_user}, so it begins from a clean state no matter what the previous run left behind. Workers run in parallel because each gets its own isolated user. No shared state. No state leaks. The same autonomous testing loop that handles every other app flow handles subscription flows the same way.
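For teams who want to see the same three scenarios outside of Pie's script syntax, the sketch below expresses them as plain functions. Everything here is a stand-in: `FakeBackend` replaces the control plane, `open_app` replaces launching the real app, and the UI element names are invented for the example.

```python
class FakeBackend:
    """In-memory stand-in for the control plane (hypothetical)."""

    def __init__(self) -> None:
        self.users: dict[str, dict] = {}

    def reset_user(self, user_id: str) -> None:
        self.users[user_id] = {"tier": "free", "trial_day": None}

    def set_tier_premium(self, user_id: str) -> None:
        self.users[user_id].update(tier="premium", trial_day=None)

    def set_trial_day(self, user_id: str, day: int) -> None:
        self.users[user_id].update(tier="trial", trial_day=day)

    def trigger_refund(self, user_id: str) -> None:
        self.users[user_id].update(tier="free", trial_day=None)


def open_app(backend: FakeBackend, user_id: str, trial_length: int = 7) -> set[str]:
    """Hypothetical app launch: returns the UI elements visible for this user."""
    sub = backend.users[user_id]
    if sub["tier"] == "premium":
        return {"premium_feature"}
    if sub["tier"] == "trial":
        screen = {"premium_feature", f"trial_banner_day_{sub['trial_day']}"}
        if sub["trial_day"] >= trial_length - 1:
            screen.add("upgrade_cta")  # shown from the second-to-last day onward
        return screen
    return {"paywall"}


def premium_feature_access(backend: FakeBackend, user_id: str) -> set[str]:
    backend.reset_user(user_id)
    backend.set_tier_premium(user_id)
    return open_app(backend, user_id)


def trial_ending_messaging(backend: FakeBackend, user_id: str) -> set[str]:
    backend.reset_user(user_id)
    backend.set_trial_day(user_id, 6)
    return open_app(backend, user_id)


def refund_downgrade(backend: FakeBackend, user_id: str) -> set[str]:
    backend.reset_user(user_id)
    backend.set_tier_premium(user_id)
    backend.trigger_refund(user_id)
    return open_app(backend, user_id)
```

Each scenario follows the same shape: reset, set preconditions, open the app, then assert on what is visible. The three can run in any order against the same user because every one of them resets first.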

Subscription States You Can Finally Test

The entitlement control plane opens scenarios that were previously unreachable in mobile app testing:

Billing retry. When a payment fails, the app should show a billing-retry banner prompting the user to update their payment method. With the real store, triggering a payment failure requires a declined test card and a timing window. With the control plane, #{set_sub_state_billing_retry} sets it instantly.
Grace period. Apple and Google provide a short window after payment failure where the subscription stays active. Verifying your app correctly handles this window requires entering it, which takes real time in the sandbox. The control plane gets you there with a single script call.
Trial day 6 of a 7-day trial. On day 6, users should see an upgrade prompt. Reaching this state in sandbox testing means waiting real minutes, or a full day during one of Apple’s bad weeks. Call #{set_trial_day_6} from your test setup. Done.
Cancellation and re-subscription. Cancel, verify downgrade, re-subscribe, verify restoration. In the real store this sequence requires multiple interactions and timing assumptions. Against the control plane it is deterministic, fast, and repeatable.
Refund edge cases. What happens when a user completes a purchase and requests a refund the same day? With the real store, this requires a real refund and a processing delay. With the control plane, #{trigger_refund} fires the event and you verify the UI response immediately.

The control plane also makes it possible to run these tests against every build in your CI/CD pipeline. Subscription coverage becomes part of your standard regression suite, not a special case that requires manual setup and a cooperative sandbox.
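Once every state is one call away, the scenarios above lend themselves to a table-driven check. The sketch below is a hedged illustration: the element names are placeholders for whatever your UI tests can observe, and `example_render` is a toy app with a deliberate bug, standing in for "set the state through the control plane, then inspect the real app".

```python
from typing import Callable

# State -> element that should be visible, taken from the scenarios above.
EXPECTED_ELEMENT = {
    "billing_retry": "billing_retry_banner",
    "grace_period": "premium_content",
    "trial_day_6": "upgrade_prompt",
    "refunded": "paywall",
}


def check_states(render: Callable[[str], set[str]]) -> dict[str, bool]:
    """Render the app in each state and report whether the expected element shows."""
    return {
        state: element in render(state)
        for state, element in EXPECTED_ELEMENT.items()
    }


def example_render(state: str) -> set[str]:
    """Toy app with a deliberate bug: the billing-retry banner is missing."""
    screens = {
        "billing_retry": {"premium_content"},
        "grace_period": {"premium_content"},
        "trial_day_6": {"premium_content", "upgrade_prompt"},
        "refunded": {"paywall"},
    }
    return screens[state]
```

Here `check_states(example_render)` reports `False` for `billing_retry` and `True` for the other three states, which is the kind of untested-banner regression the earlier sections describe.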

Subscription State as Infrastructure

Every other critical app state is controllable in CI. Auth state. Network state. Feature flags. UI state. Engineering teams have solved all of these at the infrastructure level.

Subscription state is the last one standing.

The tooling industry has accepted this gap as permanent. SDK vendors who built specifically for subscription testing have said so in their own documentation. Apple’s sandbox makes it worse every year. At Pie, we built this architecture because we got tired of telling customers that their most revenue-critical flows were essentially unreachable in automation.

The two-layer approach is not complicated. One smoke test confirms the store sheet appears, staged correctly for Android or iOS. A programmable entitlement control plane handles everything else, with clean state on every run, isolated users per worker, and test endpoints that will never ship to production.

Your billing stays on Apple and Google. Your test environment finally belongs to you.

Frequently Asked Questions

Does this change how production billing works?

No. Production billing is unchanged. Real Apple and Google APIs handle all live purchases. The test-only entitlement control plane exists only in test builds and staging environments. It is activated by an environment variable or build flag and is never shipped to production.

How do we get started?

Start with the smoke test layer: install from the Play Store or TestFlight, tap Subscribe, verify the payment sheet appears, and cancel. That covers the one thing you genuinely need the real store for. For full E2E subscription coverage, the programmable entitlement control plane is required. Most engineering teams can implement the minimum contract (reset user, set tier, get state) in two to three days.

Does the same approach work on both iOS and Android?

Yes. The entitlement control plane abstracts away platform-specific billing APIs. Pie registers your backend endpoints as scripts that work identically across iOS and Android. One test library covers both platforms.

Which subscription states can we test?

Any state your backend can set: free, trialing, premium, expired, billing retry, grace period, refunded, paused, family sharing. That includes time-dependent states like trial day 6 that are unreachable through Apple or Google's sandbox without waiting.

Does the control plane exercise our real webhook and entitlement logic?

That depends on how your backend is built. The best implementations route state changes through the same entitlement service that real store webhooks hit, so you also test your webhook handler and database update logic, which is where subscription bugs most often live.

How long does implementation take?

The minimum contract (reset user, set tier, get state) typically takes an engineering team two to three days to implement. Pie handles the execution side: registering your endpoints as scripts and generating test cases for each subscription state your backend can reach.

Dhaval Shreyas

CEO & Co-founder

13 years building mobile infrastructure at Square, Facebook, and Instacart. Now building the QA platform he wished existed the whole time.