Random Test Failures in Jenkins
I work on Python/Django websites and use Git as my VCS. For continuous integration, I use Jenkins CI. I have set up two Python virtual environments, one for development and the other for pre-production.
I have many unit, regression, and smoke tests written for the website. Both my development and pre-production virtualenvs are connected to Jenkins CI.
Recently, tests have been failing randomly in both environments in Jenkins CI whenever code changes are pushed to them. Sometimes tests fail randomly even when no code changes have been pushed.
- Ran the tests locally; they pass.
- Triggered some builds manually in Jenkins CI (using the Build Now button); the tests pass.
- Ran the failing tests individually; they still pass.
Tests that failed in earlier builds passed in the next builds, and some tests that passed in earlier builds failed in the next builds. Can someone suggest what I can do?
One solution collected from the web for “Random Test Failures in Jenkins”
You are going to have to identify an environmental factor that causes the tests to fail randomly.
Some things I have seen cause this:
- Memory – Other things are running on the CI machine, and it
doesn’t have enough memory to run all of them and build your stuff.
- Time – There is something in your code that fails depending on the
time. For example, I had code that would fail on Feb 29th. It
surprised us after succeeding many times. It could be something like a
failure to format the number of seconds if there was only one digit.
- External dependencies – Your tests depend on some other server to be
up. If it goes down or gets really busy, it won’t respond to your
test code and the test fails. This could be a database server.
- Database content – You might not have set up all the preconditions correctly for a test that runs against the database.
- Concurrency – Sometimes multi-threaded code will only fail when conditions are just right (or just wrong). A
little random delay introduced by outside factors could make the code
work or make it fail. It’s easy to overlook race conditions.
- Servers (or CPUs) – Sometimes a test will fail
if it runs on a particular server or core among the test machines.
Of course, if you only have one test machine, this can’t happen. But
if one machine has something broken, poor connectivity (firewall
rules), other processes running, or less (or more) memory, your tests
could fail when they are randomly assigned to run on that one.
- [Insert yours here] – And there are a million more.
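
To make the "Time" item above concrete, here is a minimal sketch of how a date-dependent bug can hide until one particular day. The function names (`same_day_next_year`, `same_day_next_year_safe`, `format_seconds`) are hypothetical and are not taken from the code described in the answer. The idea is that a test should pass in the awkward dates explicitly instead of relying on whatever day the CI build happens to run.

```python
import datetime


def same_day_next_year(today):
    """Naive version (hypothetical): raises ValueError when `today` is Feb 29."""
    return today.replace(year=today.year + 1)


def same_day_next_year_safe(today):
    """Fall back to Feb 28 when the following year has no Feb 29."""
    try:
        return today.replace(year=today.year + 1)
    except ValueError:
        return today.replace(year=today.year + 1, day=28)


def format_seconds(seconds):
    """Zero-pad to two digits so a single-digit value still formats correctly."""
    return f"{seconds:02d}"


# Pin the dates that are likely to break instead of using today's date.
leap_day = datetime.date(2024, 2, 29)
ordinary_day = datetime.date(2024, 3, 1)

# The naive version works on every day except the leap day.
try:
    same_day_next_year(leap_day)
    naive_failed = False
except ValueError:
    naive_failed = True

print("naive version fails on Feb 29:", naive_failed)
print("safe version on Feb 29:", same_day_next_year_safe(leap_day))
print("single-digit seconds:", format_seconds(5))
```

A test that only ever calls the naive function with the current date would pass on almost every build and then fail on one day, which looks random from the Jenkins side.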
These are hard problems to solve, especially if they go away for no good reason. That makes you nervous, because you suspect the problem will come back just when you are in a big hurry to fix a nasty bug in the production system.
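
One way to catch a failure that comes and goes is to run the suspect test many times and record some facts about the environment each time it fails. The sketch below uses only the standard `unittest` module. `ExampleFlakyTest`, `environment_snapshot`, and `run_repeatedly` are hypothetical names invented for illustration, and the example test fails on every third call only so that the behaviour is reproducible. A real Django test case would also need the project's settings and test database set up, which is not shown here.

```python
import datetime
import os
import platform
import unittest


class ExampleFlakyTest(unittest.TestCase):
    """Hypothetical stand-in for a flaky test: fails on every third call."""

    calls = 0

    def test_sometimes_fails(self):
        type(self).calls += 1
        self.assertNotEqual(type(self).calls % 3, 0)


def environment_snapshot():
    """Collect a few environmental facts to compare across failures."""
    return {
        "time": datetime.datetime.now().isoformat(),
        "host": platform.node(),
        "pid": os.getpid(),
        "cpu_count": os.cpu_count(),
    }


def run_repeatedly(test_case_class, runs=20):
    """Run one TestCase class `runs` times and record every failing attempt."""
    loader = unittest.TestLoader()
    failures = []
    for attempt in range(runs):
        suite = loader.loadTestsFromTestCase(test_case_class)
        result = unittest.TestResult()
        suite.run(result)
        if not result.wasSuccessful():
            failures.append({
                "attempt": attempt,
                "env": environment_snapshot(),
                "failed": [test.id() for test, _ in result.failures + result.errors],
            })
    return failures


failures = run_repeatedly(ExampleFlakyTest, runs=9)
for failure in failures:
    print(failure["attempt"], failure["failed"], failure["env"]["host"])
```

Comparing the recorded snapshots across failing attempts (same host every time, same time of day, and so on) can point to which of the factors listed above is involved.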