Pokemon Go: Performance Testing Failures

Ron Wilson · QA Revolution

Aug 13, 2016

I broke this down in the video above. Below is the written version, with more detail on what the Pokemon Go launch can teach any team that ships software at scale.

Performance testing is the discipline most teams underinvest in until a launch goes sideways in front of millions of users. If you build or test software that real people hit all at once, this one is for you. When Pokemon Go launched and became a global phenomenon almost overnight, the first few days were rough. Downloads failed. The system buckled. People could not log in. In this article I want to walk through what that failure actually was, why performance testing is the insurance policy nobody wants to pay for, and how you can avoid the same fate when your own big day arrives.

Why performance testing matters more than ever

Performance testing matters because functional correctness means nothing if the system collapses the moment real traffic shows up.

A feature can work fine for one user and fall apart when a million users hit it at the same time. That is the gap performance testing is there to catch. In the Pokemon Go case, the application itself was not broken. The problem was that the load was far larger than the system was prepared to handle.

Launches are bigger now. An app can go from nobody to tens of millions of users in a few days. A sale can push a year of traffic into one hour. When that hits a system nobody load tested, the failure is public and it is expensive.

What actually went wrong at the Pokemon Go launch

The Pokemon Go launch struggled because demand far exceeded what the team appears to have planned and tested for, which is the classic signature of insufficient performance testing.

When I analyzed the launch, the pattern was clear. There were download problems, system performance problems, and stretches where the application simply was not available when users tried to log on. None of those are feature bugs. They are capacity and scale problems, and they show up only under heavy concurrent load.

My read at the time was that some level of performance testing was probably done, but not nearly enough to match the real demand. It is likely that nobody anticipated just how many people would download and use the app in the first days. That is an easy mistake to make and a brutal one to live through. I covered this root-cause analysis in the video, and the short version is that the system met a wave it was never sized to handle.

Performance testing is an insurance policy

Performance testing is a lot like an insurance policy, because you do not know you needed it until the problem has already happened.

When everything is running fine, performance testing is easy to skip. It costs time and money up front. Then a lot of users show up at once and the system goes down. Now that testing you skipped is the most expensive thing on the list.

In my testing work over the years, I have found that performance problems stay hidden until load brings them out. The application passes every functional test. It demos fine. Then real traffic shows up and exposes cracks that were there the whole time. What I learned is simple. You pay for performance testing before the launch, or you pay a lot more during it.

How to performance test before a big launch

You performance test by modeling realistic load, running it against a production-like environment, and finding the breaking point before your users do.

Start with honest numbers. Estimate the realistic peak: how many concurrent users, how many transactions per second, how much data flowing through. Then deliberately overshoot, because optimistic estimates are how launches fail. A few test types cover most of the risk. Load testing checks behavior at expected peak. Stress testing pushes past the peak to find where it breaks and how it breaks. Soak testing runs sustained traffic for hours to catch memory leaks and slow degradation that a short test never reveals.
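To make the sizing step concrete, here is a minimal sketch in Python of turning one peak estimate into targets for the three test types. Everything in it is an assumption for illustration: the function name `build_load_plan`, the 2x headroom, the stress multipliers, the soak fraction, and the sample numbers are hypothetical and are not figures from the Pokemon Go launch. The requests-per-second figure comes from Little's Law, where throughput equals concurrent users divided by the time one user takes per request cycle (response time plus think time).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LoadPlan:
    load_users: int          # concurrent users for the load test
    load_rps: float          # matching requests per second
    stress_steps: List[int]  # user counts for successive stress steps
    soak_users: int          # sustained user count for the soak test
    soak_hours: float        # how long the soak test runs


def build_load_plan(
    expected_peak_users: int,
    cycle_seconds: float,
    headroom: float = 2.0,
    stress_multipliers: Sequence[float] = (1.5, 2.0, 3.0),
    soak_fraction: float = 0.7,
    soak_hours: float = 8.0,
) -> LoadPlan:
    """Derive test targets from an estimated peak (all defaults are assumptions).

    cycle_seconds is the response time plus think time for one request.
    """
    if expected_peak_users <= 0 or cycle_seconds <= 0:
        raise ValueError("peak users and cycle time must be positive")

    # Deliberately overshoot the estimate before planning anything else.
    load_users = math.ceil(expected_peak_users * headroom)

    # Little's Law: throughput = concurrency / time per request cycle.
    load_rps = load_users / cycle_seconds

    # Stress steps climb past the load target to find the breaking point.
    stress_steps = [math.ceil(load_users * m) for m in stress_multipliers]

    # Soak runs below the peak, but for hours rather than minutes.
    soak_users = math.ceil(load_users * soak_fraction)

    return LoadPlan(load_users, load_rps, stress_steps, soak_users, soak_hours)


if __name__ == "__main__":
    print(build_load_plan(expected_peak_users=100_000, cycle_seconds=10.0))
```

The useful part is not the specific multipliers. It is that the overshoot is written down and applied before anyone argues about whether the estimate is too pessimistic.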

I go deeper on this in the video walkthrough. Measure the things users feel. Response times under load. Error rates as concurrency climbs. The point where throughput stops scaling. And test against an environment that resembles production, because a result from an undersized test environment tells you almost nothing about the real day.
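The three measurements above can be computed from raw test results with very little code. The sketch below is one way to do it in Python, under stated assumptions: the helper names, the nearest-rank percentile method, and the 10 percent scaling threshold are my own illustrative choices, and the sample data is made up. Most load testing tools report these numbers for you, so treat this as a way to see what the numbers mean rather than a replacement for a tool.

```python
import math
from typing import List, Optional, Sequence, Tuple


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for the p95 response time."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that failed at one concurrency level."""
    return failed_requests / total_requests if total_requests else 0.0


def find_throughput_knee(
    steps: List[Tuple[int, float]], min_gain: float = 0.10
) -> Optional[int]:
    """Find where throughput stops scaling.

    steps holds (concurrent_users, requests_per_second), sorted by users.
    Returns the user count of the last step that still scaled, or None if
    throughput kept growing through every step. min_gain is an assumed
    threshold: throughput must grow by at least that share of the added load.
    """
    for (prev_users, prev_rps), (users, rps) in zip(steps, steps[1:]):
        added_load = (users - prev_users) / prev_users
        gain = (rps - prev_rps) / prev_rps
        if gain < min_gain * added_load:
            return prev_users
    return None


if __name__ == "__main__":
    # Made-up response times in milliseconds from one test step.
    samples_ms = [120, 80, 300, 95, 110, 2000, 105, 90, 100, 130]
    print("p95 ms:", percentile(samples_ms, 95))
    print("error rate:", error_rate(2000, 30))

    # Made-up stepped results: users doubled each step.
    stepped = [(100, 950), (200, 1900), (400, 3700), (800, 3900), (1600, 3850)]
    print("knee at users:", find_throughput_knee(stepped))
```

Note why the percentile matters more than the average here. One slow request in ten barely moves the mean, but it is exactly what a real user remembers.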

Building performance testing into the release cycle

The teams that avoid launch disasters treat performance as a continuous discipline, not a one-time check the week before release.

A single performance test before a launch is better than nothing, but systems change constantly. A new feature, a new dependency, or a schema change can quietly erode performance that was fine last release. If you test only once, then by launch day you have validated a system that no longer exists.

The stronger model is to make performance part of the normal pipeline. Run a baseline load test on a schedule and after major changes. Watch the trend, not just a single pass or fail. Pay attention when response times creep upward release over release, because that creep is the early warning of a future outage. Augmented reality apps like Pokemon Go push this even harder, since real-world location data and constant device chatter add load patterns that traditional apps never see. Whatever you are building, the principle holds. Find the ceiling on your terms, in a test, not in a headline.
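Watching the trend can be as simple as keeping the p95 response time from each baseline run and checking it for creep. The Python sketch below shows one hedged way to do that. The function name `detect_creep`, the 15 percent drift limit, the three-rise rule, and the release history are all hypothetical choices for illustration, and you would tune them to your own system.

```python
from typing import List, Tuple


def detect_creep(
    p95_by_release: List[Tuple[str, float]],
    max_total_increase: float = 0.15,
    max_consecutive_rises: int = 3,
) -> List[str]:
    """Flag slow degradation across releases (oldest release first).

    Returns a list of warning strings. An empty list means no creep was
    detected under the assumed thresholds.
    """
    if len(p95_by_release) < 2:
        return []

    warnings = []
    first_name, first_p95 = p95_by_release[0]
    last_name, last_p95 = p95_by_release[-1]

    # Check 1: total drift from the oldest baseline to the latest run.
    total_increase = (last_p95 - first_p95) / first_p95
    if total_increase > max_total_increase:
        warnings.append(
            f"p95 up {total_increase:.0%} from {first_name} to {last_name}"
        )

    # Check 2: how many releases in a row, ending now, got slower.
    rises = 0
    for (_, prev), (_, cur) in zip(p95_by_release, p95_by_release[1:]):
        rises = rises + 1 if cur > prev else 0
    if rises >= max_consecutive_rises:
        warnings.append(f"p95 rose for {rises} releases in a row")

    return warnings


if __name__ == "__main__":
    # Made-up history in milliseconds. Every run would pass a 250 ms limit.
    history = [("1.0", 200.0), ("1.1", 205.0), ("1.2", 212.0),
               ("1.3", 221.0), ("1.4", 236.0)]
    for warning in detect_creep(history):
        print(warning)
```

In the sample history every single release would pass a fixed 250 ms limit, which is exactly why a pass or fail check alone misses the problem.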

The takeaway for quality teams

Pokemon Go was a big success that almost tripped over its own popularity in the first week. The team was not careless. Performance is just the part of quality that punishes optimism. Build the insurance policy before you need it. If you have a launch coming, the real question is not whether your features work. It is whether your system holds up when everyone shows up at once.

If this was useful, the full breakdown lives in my video on the Pokemon Go performance failure. Here is my question for the comments: what is the largest traffic spike your system has ever faced, and did it hold up or fall over? Subscribe if you want more plain-spoken testing analysis like this.
