Tuesday, November 29, 2011

Cleaning Up Dirty Tests

One of the things I've been trying to master is the ability to make my tests stand on their own and not have dependencies on other tests. This seems like a standard thing to do, and we often think we are doing a good job on this front, but how well are we doing it, really?

As I was looking at some recent tests with one of my co-workers, I was really happy that they were passing as frequently as they did, but my elation turned to frustration on the next build, when a lot of tests were failing for me. The more maddening part was that the tests that were failing would all pass if I reran them.

Wait, a little explanation is in order here. In my current work environment, I use Cucumber along with RSpec and Ruby. I also use a Rakefile to handle a number of tagged scenarios and to perform other basic maintenance tasks. When I run a suite the first time, the output tells me which scenarios failed. To retest, I tee that output to a rerun file and then run a shell script that turns the failed scenarios into individual rake runs.
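
To give a rough idea of the setup, here's a simplified sketch (not my actual Rakefile; the tag name and file names are made up, and it leans on Cucumber's rerun formatter rather than my exact tee-and-shell-script combination):

require 'cucumber/rake/task'

# Run the tagged scenarios and write any failures to a rerun file.
Cucumber::Rake::Task.new(:regression) do |t|
  t.cucumber_opts = "--tags @regression --format rerun --out rerun.txt"
end

# Give each failed scenario its own run, in its own process.
task :rerun_failures do
  File.read("rerun.txt").split.each do |scenario|
    sh "bundle exec cucumber #{scenario}"
  end
end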

In almost all of my test runs, if I ran a suite of 50 scenarios, I'd get about 10 failures (a pretty high rate). If I reran those 10 failed scenarios, anywhere from 8 to 10 of them would pass on the second try; more often than not, the full 10 would pass. As I started to investigate this, I realized that, during the rerun, each test was being run separately, which meant each test would open a browser, test the functionality for that scenario, and then close the browser. Each test stood alone, so each test would pass.
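
The difference comes down to how the browser session is managed. In rough terms (this is a sketch of the pattern, not my actual support code, and the Watir-WebDriver calls are just one way to do it), the support file opens a browser once per Cucumber process:

require 'watir-webdriver'

# One browser for the entire Cucumber process.
browser = Watir::Browser.new :firefox

Before do
  # Every scenario in this process reuses the same session (and its cookies).
  @browser = browser
end

at_exit do
  # The session only goes away when the whole run finishes.
  browser.close
end

Run 50 scenarios in one process and they all share that session; run one scenario per process, as my rerun script does, and each one gets a completely fresh browser.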

Armed with this, I decided to borrow a trick I'd read about in a forum: if you want to see whether your tests actually run as designed, or whether there is an unintended order dependency behind their success rate, the best way to check is to take the tests as they currently exist and swap the order in which they are run (rename feature files, reorder scenarios, change the names of folders, etc.).

The results from doing this were very telling. I won't go into specifics, but I did discover that certain tests, if all run in the same browser session, would leave unusual artifacts hanging around because of that shared session. These artifacts were related to things like session IDs, cookies, and other items our site depends on to operate seamlessly. When I ran each of these tests independently (with their own browser sessions), the issues disappeared. That's great, but it's impractical: the time expense of spawning and killing a browser instance with every test case is just too high. Still, I learned a great deal about the way my scripts were constructed and the setup and teardown details that I had (or didn't have) in my scenarios.
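
One possible middle ground (a sketch only, again assuming Watir-WebDriver, and not something I'm claiming as a cure-all) is to keep the single browser but scrub the session state in a teardown hook, so the cookies and session IDs from one scenario can't leak into the next:

# Keep the single browser, but reset its state after every scenario
# instead of paying for a brand new browser each time.
After do
  @browser.cookies.clear
end

It doesn't catch everything a truly fresh browser would (cached pages, local storage, and so on), but it's a lot cheaper than a new browser per scenario.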

This is a handy little trick, and I encourage you to use it for your tests, too. If there's a way to influence which tests get run and you can manipulate the order, you may learn a lot more about your tests than you thought you knew :).
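
If you don't feel like renaming files by hand, here's one quick way to try it (a sketch; the paths are made up, and it assumes your Cucumber version runs feature files in the order they're listed on the command line):

# Feed Cucumber the feature files in a random order.
task :shuffled do
  features = Dir["features/**/*.feature"].shuffle
  sh "bundle exec cucumber #{features.join(' ')}"
end

If a scenario only passes when its neighbors have run first, a few shuffled runs will usually flush that out.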

1 comment:

Zoltán Molnár said...

Michael,

Your point is right. We can get fed up with spending the time to investigate what the problem is with our regression tests, but these investigations reveal important issues: which factors contribute to the behavior of the SUT in the tested scenarios, factors we were not aware of before designing our tests. It's a clear reminder of both the complexity of software and the limits of human thinking.

Regards, Zoltan