Friday, August 9, 2013

Learn from Bugs That End Up in Production: 99 Ways Workshop #44


The Software Testing Club recently put out an eBook called "99 Things You Can Do to Become a Better Tester". Some of them are really general and vague. Some of them are remarkably specific.


My goal for the next few weeks is to take the "99 Things" book and see if I can put my own personal spin on each of them, and make a personal workshop out of each of the suggestions.


Suggestion #44: Learn from bugs that end up in production. Try to work out how your testing missed them. - Amy Phillips


If I could quantify how much emotional turmoil I have put myself through over the years because of this very issue (bugs in production that "should have been found by the testers"), I'm willing to bet I could add several years to the end of my life.

I won't claim there were never times I deserved to be asked "why didn't you find that bug?!", but I think a turning point came when I was finally able to say, with very direct purpose, "I'm sorry, I don't know. How about we ask the programmer who put the bug there in the first place? Maybe we'll get some answers there." No, I never actually said that to anyone, but the notion was an important one for me to learn and internalize.

Quality isn't a blame game, and it isn't the responsibility of software testing to prevent every bug from going out into the field. First, that can't be done, and second, software quality needs to be owned by the entire team the entire way, from conception to release. Joining teams that understood that helped a great deal. The conversation turned away from "how could YOU miss that?" to "why did WE miss that? What can WE learn from this?"


Workshop #44: Examine all bugs found in production or reported by customers, and compare them to issues in the bug database. Determine if indeed we have a missed opportunity for testing, or if there's a deeper story to be told.


This will make more sense in light of how the company I currently work for differentiates between story tests and bugs, and yes, we differentiate. An issue or problem discovered during the course of development and testing for a particular story is recorded as part of the development process. It resides inside the story and gets resolved, retested, and signed off before the story is completed.

This gives us several opportunities to see if we missed story tests, or if we really came across an aspect of behavior we hadn't considered. This process tends to result in, you guessed it, more detailed story tests. At the end of the process, when testing is complete and the code is considered done, the code is merged to a staging branch, and after hardening and baking there for a while, we see if there are other issues we didn't consider.

At this point, if we have found an issue, we have the choice to continue work on the story or to split the issue out and call it a bug (since we use our application aggressively, once a build is on staging, it is "out to the customers"). If we don't see any additional issues, we release to production, and then we watch and observe. If we get a bug report at this point, it is filed as such and treated as exactly what it is: an issue that got out into the field, for any of a variety of reasons.

By tracking down the stories associated with the filed bug, we correlate and see whether there were hints of the behavior we discovered, or whether it came out of the blue. There are indeed times when we notice that an issue was spotted, we reported it, we made a conscious decision not to focus on it for the time being, and later on, when a customer reported it, the priority was pushed up considerably.
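
If it helps to make that concrete, here's a minimal sketch in Python of what that correlation pass might look like. The JSON export, field names, and issue ids here are all assumptions for illustration, not the schema of any particular tracker:

```python
import json

def load_issues(path):
    """Load a hypothetical tracker export: a JSON list of issues, each with
    an "id", a "type" ("story" or "bug"), and a list of linked issue ids."""
    with open(path) as f:
        return {issue["id"]: issue for issue in json.load(f)}

def trace_production_bug(bug_id, issues):
    """Walk the links from a production bug back to the stories that touched
    the same behavior, and to any earlier bug reports that hinted at it."""
    bug = issues[bug_id]
    related = [issues[i] for i in bug.get("links", []) if i in issues]
    stories = [r for r in related if r["type"] == "story"]
    earlier_reports = [r for r in related if r["type"] == "bug"]
    return stories, earlier_reports

if __name__ == "__main__":
    issues = load_issues("issues.json")        # assumed export file
    stories, earlier = trace_production_bug("BUG-1234", issues)
    if earlier:
        print("We saw hints of this before:", [e["id"] for e in earlier])
    elif stories:
        print("Stories worth re-reading:", [s["id"] for s in stories])
    else:
        print("Out of the blue: no prior trace in the database.")
```

Nothing fancy, but even a quick pass like this tells you whether a production bug was truly a surprise, or whether the database already held a trail pointing right at it.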

It's entirely possible that, no matter what you do testing-wise, you just will not find particular issues. Reproducing them may require heavy system load, or hundreds of thousands of users, or an unusual user environment. It's also possible that the issue is entirely based on "pilot error". Since we are fallible, we can forget things, or we can misplace a keyword and lose hours of productivity examining the wrong item. It's also important to see if, indeed, we DID report an issue and it was considered not a high enough priority to work on. Don't discount these situations; they happen quite frequently. Often areas are deferred until later, only for that "later" to arrive when a customer complains. Priorities change when customers complain. We may have thought the issue was minor... and now we are proven wrong.
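
To keep ourselves honest about how often each of these situations actually occurs, a simple tally over the escaped bugs can help. Here's a rough sketch; the category labels are mine for illustration, not any standard taxonomy:

```python
from collections import Counter

# Hypothetical reasons a bug escaped to production; adapt to your own team.
ESCAPE_REASONS = (
    "missed_story_test",    # a test we could have written but didn't
    "environment_only",     # needs heavy load or an unusual user setup
    "pilot_error",          # we looked at (or for) the wrong thing
    "known_but_deferred",   # reported earlier, consciously deprioritized
)

def tally_escapes(bugs):
    """Count production bugs by escape reason, so patterns stand out, e.g.
    how often a "minor" deferred issue came back as a customer complaint."""
    counts = Counter(bug["reason"] for bug in bugs)
    return {reason: counts.get(reason, 0) for reason in ESCAPE_REASONS}

# Made-up sample data to show the shape of the output.
bugs = [
    {"id": "BUG-1201", "reason": "known_but_deferred"},
    {"id": "BUG-1215", "reason": "environment_only"},
    {"id": "BUG-1234", "reason": "known_but_deferred"},
]
print(tally_escapes(bugs))
# {'missed_story_test': 0, 'environment_only': 1,
#  'pilot_error': 0, 'known_but_deferred': 2}
```

If "known_but_deferred" keeps winning that tally, it says more about how we set priorities than it does about how we test.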


Bottom Line:

Every bug for us means an external entity discovered our issue before we did. That's not said to shame ourselves, but to highlight a fact: we will never ship 100% defect-free software. It's impossible. What we can do is focus on the issues that do slip through, see if we can learn what put them there, and figure out how we can deal with them more effectively. We can also look at past examples and see if a decision to defer something came back as a bigger issue than we anticipated. Again, this is an opportunity for us to get better at determining where the risk areas are, which risk areas need to be mitigated, and when.
