Solving randomly failing tests in Laravel

This was probably one of the most frustrating issues I’ve dealt with in recent times. After many hours thrown at it, I finally found a solution.

The Problem

We have a suite of around 700 backend tests in our Laravel application. We use PHPUnit and Paratest to speed up the testing by spawning parallel processes.

At some point in the past, our GitHub job that ran the tests began to fail in a seemingly random manner. At first, we looked into differences in environment settings, trying to replicate the same context, but when we ran the tests individually, they always passed.

We also noticed that if we just re-ran the job, the failing test would pass. And sometimes, a totally different and unrelated test would fail instead.

As our test suite grew, this problem became more and more frequent, to the point where we had to trigger re-runs 2 or 3 times for every PR just to have the tests pass.

But one thing was clear: the tests failed because of bad data in the DB. How could that happen? Every test seeded its own data, so the only explanation was that data was leaking between tests.

But why? If each process has its own separate database, and each test seeds or creates its data, how was it possible for data to leak?

I spent many hours experimenting with the different traits that Laravel offers, each of which prepares and seeds the database in a different way. I tried them all: RefreshDatabase, DatabaseTransactions, DatabaseMigrations… you name it.

I kept experimenting with many ideas people would throw in their replies, sometimes with a small glimmer of hope when the whole test suite would pass. Only to be disappointed when doing a second run and getting 5 failures out of the blue.

I looked into many pages of Google results, Stack Overflow, Reddit… I even posted my own threads in Laracasts’ discussion forum and subreddit. I tried all the tips. Nothing worked. (Well, one thing did work. By using process isolation we could get rid of the issue, but it also meant losing the advantage we had by running the tests in parallel. So it wasn’t really a great solution.)

Finally, when I was getting very close to giving up on this (at least for a few weeks to regain some sanity), while I was trying a lot of custom code to replace some internals, I bumped into a strange PDO error message which I hadn’t seen before. (Sorry, I forgot to write it down, so I can’t remember what it was now.) I used that to continue some more searches and eventually ended up on a bug report which, although unrelated to my problem, had somewhere in its comments the final missing piece for my puzzle.

And the funny thing is that I had even bumped into this issue before in a totally unrelated problem, but I never connected the dots!

So what was it?

In MySQL, TRUNCATE does not really work inside a database transaction. Or rather, it runs, but it breaks the transaction. From the MySQL docs:

Truncate operations cause an implicit commit, and so cannot be rolled back.

I dealt with a related problem a few weeks ago and had to replace the truncate with a (slower) delete. Then I connected the dots. We must have some truncates somewhere in our tests. When a test hits a truncate, the implicit commit ends the transaction that was supposed to be rolled back when the test finishes, so any test that relies on that rollback for cleanup leaves its data behind for the tests that follow. As the order of the tests is not always the same, this would explain why different tests would fail in different runs. It just depended on when the truncates would execute.

So I searched all the test code for truncates. Found 3 different files. Replaced the truncates with queries using deletes. Ran the full suite, and everything passed! Ran it again, and again, and again. Always green. Made a PR to have GH run it too. Passed. Re-ran it to be sure. Passed!
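
To make the change concrete, here is a minimal sketch of what such a replacement can look like. It is not our actual code: the test class, the `reports` table and its `title` column are hypothetical names made up for illustration, and it assumes a test that relies on the DatabaseTransactions trait for cleanup.

```php
<?php

namespace Tests\Feature;

use Illuminate\Foundation\Testing\DatabaseTransactions;
use Illuminate\Support\Facades\DB;
use Tests\TestCase;

// Hypothetical test class; the `reports` table and `title` column are assumptions.
class ReportImportTest extends TestCase
{
    // Wraps each test in a transaction that is rolled back when the test ends.
    use DatabaseTransactions;

    public function test_import_starts_from_an_empty_table(): void
    {
        // Before: on MySQL, TRUNCATE causes an implicit commit, so the
        // rollback at the end of the test can no longer undo what follows.
        // DB::table('reports')->truncate();

        // After: DELETE is an ordinary statement that stays inside the
        // transaction and is rolled back together with everything else.
        DB::table('reports')->delete();

        DB::table('reports')->insert(['title' => 'Q1']);

        $this->assertSame(1, DB::table('reports')->count());
    }
}
```

One thing to keep in mind: unlike TRUNCATE, a DELETE does not reset the table’s AUTO_INCREMENT counter, so a test that expects IDs to start at 1 may need adjusting after the swap.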

Finally, it was solved.

I don’t usually write this kind of post. But I wish someone had written it when I was looking for a solution. So I hope my post will eventually help someone in the future!
