Friday, October 21, 2011

Book Review: Perfect Software and Other Illusions About Testing

Few members of the testing community have been as active, for as long, and with as much staying power as Gerald (Jerry) Weinberg. Jerry used to sit in a room with hardware that, at the time, represented about 10% of all the computing power in the world.

Coming from that to our world today, it's no wonder that we treat computing as ubiquitous, all-encompassing, and something we often take for granted.

What's also often taken for granted is the testing of that software (or the lack of it). "Perfect Software and Other Illusions About Testing" tackles many of the myths and challenges that surround the practice of software testing. Some common questions from the preface of the book:

  • Why do we have to bother testing when it just seems to slow us down?
  • Why can't people just build software right, so it doesn't need testing?
  • Do we have to test everything?
  • Why not just test everything?
  • What is it that makes testing so hard?
  • Why does testing take so long?
  • Is perfect software even possible?
  • Why can't we just accept a few bugs?

So how does "Perfect Software" handle these questions? Let's find out.

Chapter 1: Why Do We Bother Testing?

No program is perfect. The simple fact is, human beings are fallible; we make mistakes. As long as we are human, we will have need of testing and testers. The truth is, we do testing all the time. Think of using one web browser over another. Why do I prefer Chrome over Firefox? Is there a reason? For me, the flow of Chrome is more natural, and the response time and layout feel faster. How did I get that feeling? Through testing. Granted, I didn't write formal test cases and publish a report, but I tested and found what worked for me. Jerry makes the following points:

  • We test because we want to make sure the software satisfies customer constraints.
  • We test because we want to make sure the software does what people want it to do.
  • We test because we want to make sure the product doesn't impose unacceptable costs on the customer.
  • We test to make sure that customers will trust our product.
  • We test to be sure our software doesn't waste our customers' time.

There are lots of other reasons, but all of them make the point that it is the customer we are working for, and if we want the customer to have confidence in our product, there needs to be testing. Also, there will be different levels of testing based on the risk associated with the product in question. The risk of a bug in a web browser, while annoying, is totally different from the risk of a bug in an embedded heart-monitor pacemaker. In the web browser, some time may be lost or a page may not load right. In the pacemaker, a bug might cause someone to die. Both products have risks, but they are hardly comparable. We'll test very differently, and with much greater rigor, on the heart monitor than we will on the web software.

Chapter 2: What Testing Cannot Do

Testers gather information and present it, with analysis, as to what they have seen and with regard to their understanding of how things should work. That's it! That's testing, plain and simple. Yes, we can do a lot of things to help us get to that point, but ultimately that's what we do. We don't ensure quality. We don't block bugs, and we don't fix bugs (well, maybe some of us do if we are good programmers, but then we are not testers at that point; we are software developers). Testing also can't be done in zero time and for free. Gang, testing is a cost center. It's an expense. It always will be. Unless you sell testing services, testing will be an expense of doing business. It also takes time to process the needed information, and often the information will have repercussions, some of them good, but a lot of them bad. Also, testing is not fixing. Fixing requires development to actually move on the information testers provide. Testing also hits an emotional center. It's easy to think decisions are made with rational thought and deliberation. Often, nothing could be further from the truth. Testing cannot overcome politics, procedures, or cronyism, but it might help identify where they live ;). In short, if you are not going to act on the information provided, don't bother to test.

Chapter 3: Why Not Just Test Everything?

Let's be clear: there is no way, with man or machine, that we can "test everything" in an application. From a purely code standpoint, even simple nested loops could have billions of combinations. Exhaustive testing of all functions and all permutations would take thousands of years (do the math, it's true; see the sketch below). What's more, the state of a machine is never the same twice, nor the same on every machine. There are many parameters that can make a test that passed in certain instances fail in others. Are you sure your testers can find every one of them? Not likely. Also, absence of evidence is not evidence of absence, so testers can never make that assumption. The Holy Grail of testing is to find that "just enough" amount of testing that will reveal the largest number of bugs possible. Testing is always about getting a sample set and making the sample set tell you as much as possible. If you are methodical, a small sample set may tell you most of what you need to know and may find a lot of bugs, but it may not.
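
To make the "do the math" point concrete, here's a minimal back-of-the-envelope sketch (my own illustration, not from the book) of how fast exhaustive input coverage blows up for even a trivial function:

    # Exhaustive testing arithmetic: a function that takes just two
    # 32-bit integer arguments has 2**32 * 2**32 = 2**64 input pairs.
    inputs_per_arg = 2 ** 32           # distinct values of one 32-bit int
    total_cases = inputs_per_arg ** 2  # every possible pair of arguments

    tests_per_second = 1_000_000_000   # generous: a billion tests/second
    seconds_per_year = 60 * 60 * 24 * 365

    years = total_cases / (tests_per_second * seconds_per_year)
    print(f"{total_cases:.3e} cases, about {years:,.0f} years at 1e9/sec")
    # ~1.845e19 cases, roughly 585 years -- for two ints and nothing else.
    # Add a third argument and the wait multiplies by another 2**32.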

Chapter 4: What's the Difference Between Testing and Debugging?

Testing is a fuzzy discipline. When we look for new information and try to find areas where the program has issues, that's testing. When we have found an issue and want to isolate it, is that testing? Actually, it's pinpointing, and while it's valuable, it's an additional task above and beyond testing. Determining the significance of an issue is likewise important. Is it testing, too? Is locating where in the code the issue lives testing? Is repairing the code and working through scenarios testing? Is troubleshooting to make sure that the areas that need to be covered are covered testing? Once the code is deployed and the testers look at the software in toto, as deployed, to find new and fresh information... OK, now we know that that is testing. The repairing of code and debugging we feel confident in saying is not our area, but what about the rest? In small organizations, many of these roles are handled by a couple of people, or in some instances, one person. In larger organizations, there's supposed to be an organizational and process hierarchy... and here's where it gets weird. Confusion about the differences between these roles can lead to conflict, resentment, and failed projects. Different tasks require different skills. Lumping them all in with testing distorts planning, estimating, and work focus, and can cause lingering problems.

Chapter 5: Meta-Testing

Meta-information is information about the quality of information. Yeah, go over that a couple of times :). What Jerry's saying is that we can learn a lot about the state of a project by the way that the information itself is treated and valued. Consider some of the book's examples:

  • Do you test to spec? Yes. Awesome! Where are the specs? Well, we can't find them... you see where this is going, right?
  • There's an issue with a bug database: when it gets more than 14,000 bugs, it slows down to unusable levels. Any consideration as to what development process is generating 14,000 bugs for one project?
  • A tester discovers a problem, but it's not "in their area" of testing, so they don't record it.
  • A tester is diligently checking the scroll bar functionality of a web application, not realizing that the scroll bars are part of the web browser, not the application. Why doesn't the tester know this? Why does the organization let them continue in their ignorance?
  • A company discovers a lot of bugs one week, and then announces that the product is almost ready to ship, because "we've found so many bugs, there can't be many more".
  • We tested a product with ten users, and it should handle 100 users. Take the stats for the ten, multiply by ten, and we should be golden (see the sketch after this list for why that math doesn't hold).
  • Testers present information, but the development team and project management ignore it.

And so on. Just by looking at these situations, you can see there's so much more going on than what people think is going on.
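
On that ten-users-times-ten example: load behavior is rarely linear. Here's a toy queueing sketch (my own illustration with made-up numbers, not the book's) of why extrapolating ten-user stats to 100 users fails:

    # Toy M/M/1 queue: mean response time = 1 / (mu - lambda).
    # It explodes as the arrival rate approaches service capacity, so
    # ten times the users is NOT simply ten times the response time.

    SERVICE_RATE = 120.0  # requests/sec the server can handle (made up)

    def mean_response_time(arrival_rate: float) -> float:
        if arrival_rate >= SERVICE_RATE:
            return float("inf")  # the queue grows without bound
        return 1.0 / (SERVICE_RATE - arrival_rate)

    for users, rate in [(10, 10.0), (100, 100.0)]:
        print(f"{users:>3} users: {mean_response_time(rate) * 1000:6.1f} ms")
    # 10 users:    9.1 ms
    # 100 users:  50.0 ms -- and past 120 users, it never answers at all.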

Chapter 6: Information Immunity

Information is the key deliverable of a tester. The problem? It can be seen as threatening. Bugs == issues, and issues == potential embarrassment, missed schedules, possibly bad reviews and reduced revenues. Scary stuff for some, so what happens? We tend to block out the information we don't want to hear. Information Immunity can stop dead anything of value we as testers may be able to provide. So we need to get to the heart of the fears first, and then figure out how to counteract them (later chapters deal with that). There's a survival instinct that comes to the fore when we're about to break one of our rules (those who know of my love for Seth Godin's Linchpin, well, here's where "The Lizard Brain" and "The Resistance" show themselves in full bloom). We get defensive when we find those rules at risk of being broken: we will not look smart, we will not execute perfectly, we will not make our deadline. We repress details that would be embarrassing, we get used to the way things are, and we become complicit in going along with the program (bad tester, no donut!).

Chapter 7: How to Deal With Defensive Reactions

People get defensive; it just happens. We also tend to be less than gracious when we are called on it. So we need to use some of our tester skills to help overcome these issues. First, we need to identify what the fear is, as fear is what usually drives defensiveness. From there, thinking critically will help us determine what might be behind "the rest of the story". Then it's time to focus on how to counteract the fear and help people either overcome it or deal with it.

Chapter 8: What Makes a Good Test?

How do you know that you are testing well, or that your testing is actually effective? Honestly, you can't tell. That's a dirty little fact, but it's true. There's no way we can really say "testing is going well", because we don't know how many things we are missing or how far away we are from discovering a devastatingly bad issue. We can't find every bug. We can't test every possible scenario. So we have to sample, and that sample set is as good or as bad as our intuition. We really don't know if we did good testing until after the fact. In fact, we may never know whether bugs that were in the product will ever surface, or whether, if they do, it will be because new hardware and/or system software finally brought them to the fore. Does that invalidate our testing, which we once thought was good but is now "bad"? Perhaps we could enter some intentional bugs, but even then, what that tells us depends on our knowledge of the undiscovered, hidden bugs. It doesn't make a lot of sense, though it would certainly be a good exercise to see if your testers actually find them. Also, testers are often judged on the number of bug reports they file. Is that fair? Does that mean they are good testers, or does it mean they have an especially buggy project? One does not necessarily validate the other. In short, while you can't really determine what makes for good testing, it is possible to ferret out practices that lead to, or are suggestive of, "bad" testing.
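
As an aside, the "intentional bugs" idea has a classic formalization in defect seeding (capture-recapture): plant known bugs, then estimate the remaining real ones from how many planted bugs testers recapture. A minimal sketch with made-up numbers of my own (this shows the general technique, not a method the book prescribes):

    # Defect seeding: if testers recapture a fraction of the bugs we
    # deliberately planted, assume they found roughly the same fraction
    # of the real, unknown bugs (a big assumption -- Jerry's caveat).

    def estimate_total_real_bugs(seeded: int, seeded_found: int,
                                 real_found: int) -> float:
        """Lincoln-Petersen style estimate of total real bugs."""
        if seeded_found == 0:
            raise ValueError("No seeded bugs recaptured; estimate undefined.")
        detection_rate = seeded_found / seeded   # e.g. 8 / 10 = 0.8
        return real_found / detection_rate       # scale up the real finds

    # Hypothetical: 10 planted bugs, testers found 8 of them plus 40
    # real bugs -> estimate ~50 real bugs total, so ~10 still hidden.
    total = estimate_total_real_bugs(seeded=10, seeded_found=8, real_found=40)
    print(f"Estimated real bugs: {total:.0f}, still hidden: {total - 40:.0f}")

The estimate is only as trustworthy as the assumption that seeded bugs are exactly as hard to find as the hidden ones, which is precisely the weakness the chapter points out.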

Chapter 9: Major Fallacies About Testing

As stated in the last chapter, there's no way to really know if you have done "good" testing, but there are lots of ways to avoid doing "bad" testing. Here are some examples:

  • When you BLAME others, you tend not to see the rest of the potential issues. Stop the BLAME and look critically, and the issues may be both easier to see and easier to manage.
  • If someone tells you that you need to do EXHAUSTIVE TESTING, you'll need to step back and explain/demonstrate the impossibility of such a task (really, lots of people don't get this).
  • Get out of the idea that TESTING PRODUCES QUALITY. It doesn't. It provides information. QUALITY comes from developers fixing the issues that are signs of low quality. Do it enough, and you will have good quality ("for some definition of good quality", and a hat tip to Matt Heusser for that ;) ).
  • By approaching an application through DECOMPOSITION, we think testing the parts will be the same as testing the whole thing. It's not. The parts of a system may work fine by themselves, but the customer will see the whole system, so DECOMPOSITION doesn't buy us anything if we don't test the full system as well.
  • The opposite is also true: by approaching an application through COMPOSITION, we may miss many independent actions of the modules that make up the whole.
  • The ALL TESTING IS TESTING and EVERYTHING IS EQUALLY EASY TO TEST fallacies can also be stumbling blocks to good testing. Unit testing and integration testing are not the same thing; they provide different information. Stress testing and performance testing may sound like the same thing, but they are not.
  • Also, let's please put to bed the ANY IDIOT CAN TEST fallacy. Once we get into real, active exploratory testing, where previous tests inform and provide avenues for new tests (some of which were never considered before), the "any idiots" quickly drop off the testing train, leaving the active and inquisitive testers to follow through and complete the job.

Chapter 10: Testing Is More Than Banging Keys

There is more to testing than just typing on the keyboard and running through steps. Even when tests have been automated and they follow the lines they always have, I watch and see what happens, because I can tell when something looks out of place or isn't behaving the way we think it should. Note, I'm not touching any keys; I'm watching what's going across the screen. It's my mind that's doing the testing, not my hands. Jerry describes a concept he calls the White Glove Test: a company had all of its testing standards in a manual in the library, and nowhere else. The dust on top of the manual showed that no one in the organization had touched, much less read, the manual for a very long time. Another good approach, and one that many organizations I've worked with use, is Dog Food Testing, meaning the developers live on the environment they helped create and actively use the products they code for. I lived for years behind a Dog Food Network at Cisco Systems; any change that was going to be rolled out had to bake there for a while first. Very instructive, and often very frustrating, but it had the great effect of helping us see issues in a different light and much more quickly than we would have otherwise. Testers need to be tested, too (stay with me here :) ). Sometimes we see things that we want to see, or we are overly critical in our results, so it helps to have one tester evaluate the information another tester has found (think of it as two reporters corroborating a story). Additionally, demonstrating a product while avoiding the areas where an issue might appear is not testing. It may be deft navigation, but it's not providing any new information to inform a decision. Most of these, as you can see, have little to do with banging on keys.

Chapter 11: Information Intake

We are always taking in information, but that information has little benefit if we can't tease out its meaning and actively use it to communicate our findings. We often confuse data with information. They are not the same thing. Data is just the stuff that comes at us. Information is what we tease out of the data. From the testing perspective, there are areas that are ripe for information, but they may well be areas that developers don't want us looking at. Too bad. Mine away. It's also important to have the right tools to mine that data for information (note, they need not be expensive; the test tools I currently use cost nothing, and oftentimes I fall back on the simplicity of shell scripts).
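
As a trivial illustration of teasing information out of data (my own sketch, with a hypothetical log format; nothing here comes from the book):

    # Data vs. information: raw log lines are data; the pattern of
    # which errors recur, and how often, is the information we tease
    # out of them.
    from collections import Counter

    # Hypothetical raw data: one log line per event.
    log_lines = [
        "2011-10-21 09:03 ERROR timeout talking to payment gateway",
        "2011-10-21 09:04 INFO user login ok",
        "2011-10-21 09:07 ERROR timeout talking to payment gateway",
        "2011-10-21 09:12 ERROR null pointer in cart module",
    ]

    # Tease out the information: which errors recur, and how often?
    errors = Counter(
        line.split("ERROR", 1)[1].strip()
        for line in log_lines if "ERROR" in line
    )
    for message, count in errors.most_common():
        print(f"{count}x  {message}")
    # The recurring gateway timeout, not any single line, is the finding.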

Chapter 12: Making Meaning

Jerry starts off with an idea he calls the Rule of Three. In the context of test data interpretation: "If you can't think of at least three possible interpretations of the test result, you haven't thought enough". We can have several interactions that seem on the surface to say "we have found a few bugs", but by looking more closely and inquiring, we could find several interpretations of what the tester is seeing and how it relates to the product, and the developers and managers will also suss out their own meanings based on their needs, biases, and interpretations. It's also vital to know what you should be expecting before you start interpreting the data you've collected. Even if you don't know what to expect, you can find out what the people who matter want the product to do, and from there you can start interpreting the data (note: this is how heuristics work, and why they are so valuable to testers. None are infallible, but they are all useful to a point :) ).

Chapter 13: Determining Significance

When we think about the significance of an issue, many factors determine what that significance actually is. What is significant to one person may be inconsequential to another, or at least carry a totally different spin. Spin is, in fact, the practice of assigning significance: what might be a devastating bug could be portrayed as a unique feature if talked up in the right way. Significance is prone to personal agendas, and therefore bias can easily creep in. When we take the time to recognize and filter out as much of the subjective detail as we can, and look at things as objectively as possible, we are able to attach a more appropriate significance to issues, and then determine how we want to proceed and what actions we need to take.

Chapter 14: Making a Response

Many times we chalk up projects not coming to fruition as the result of bad luck. Yet if we look closely enough, most projects that shipped seem to have had issues of "bad luck", too. What was the difference? The difference was in how management and the team chose to work with their processes. The way management responds to issues may well be the best indicator of whether a project succeeds or fails. Usually, there is far too optimistic a projection of how long a project will take. Hence the old joke: "the first 10% of the project will take up 90% of the time. The remaining 90% will take up the remaining 90%." No, that's not a typo, and that's not bad math (90% plus 90% is 180% of the schedule); it reflects the fact that sunny estimates often take nearly twice as long to meet as projected. More realistic expectations are seen by many as too pessimistic, but they very often prove to be more on the mark than not. Yet we still think the sunny outlook will be the right one "this time".

Chapter 15: Preventing Testing from Growing More Difficult

The great irony when it comes to testing is that as software becomes more ubiquitous, covers more areas of our lives, and becomes more indispensable, the task of adequate testing grows ever harder. Projects are larger, more complex, and have more features than ever before, and with this complexity, the odds of there being problem areas go way up. How to combat this? Try to keep single areas of functionality as simple as possible. Note, I don't mean bare; I mean as simple as is necessary. This fits in with the Software Craftsmanship movement as well. Simple does not mean weak and incapable. It means doing what is necessary and important with as little clutter and waste as possible. Our testing should follow suit and allow us to keep focused on doing good testing: having up-to-date tools, having frank discussions about potential problem areas, and doing what we can to not extrapolate results from one area to tell us how another area is doing.

Chapter 16: Testing Without Machinery

The best testing tool there is is the human brain. That's what does the real heavy lifting. While computers can take out some of the tedium, or make short work of lots of data and help get down to the important bits that provide real information, a computer can't make the important decisions. It's the human brain that makes real meaning out of the work that computers do. Too often we put too much emphasis and focus on what test automation can do. True, it can do a lot, and it can be really helpful with long-running challenges or truly repetitive steps, but automated testing can never replace the decision-making capability of a human brain.

Chapter 17: Testing Scams

Testing tools are a big business, and many of them are hyped and sold with a lot of promises and expectations. In most instances, they rarely live up to the hype. Oftentimes, they can be a larger drain on your budget than not having tools at all. Demonstrations are often canned and sidestep many of the bigger challenges, and going from the initial examples to doing real work is problematic, and almost never as easy as the demonstration suggested. The simple fact is that you never get something for nothing, and if claims for a product seem too good to be true, you can bet they are. Also, the likelihood that there is a totally free solution available that is comparable to the expensive tool being offered is quite high, but realize that even free, open source tools have their prices, too: usually the time and energy it takes to actually learn how to use them.

Chapter 18: Oblivious Scams

It's possible to be scammed without spending a penny (well, without directly spending a penny, that is; indirectly, lots can be spent). We often get lulled into believing that certain actions can help us speed things along or make us more efficient, and oftentimes they just slow us down even more than before we started tinkering. Postponing documentation tasks may seem to save us time, but we still have to document, and the likelihood of doing an accurate job gets smaller the farther away from the issue we get. Wording things ambiguously can get us in big trouble; leaving things open to interpretation may indeed lead to the wrong interpretation being made. Not reporting issues in the mistaken belief that we are "being nice" can come back and bite us later on. Nick Lowe has this one right: sometimes "You Gotta Be Cruel to Be Kind". It can be all too easy to project our own fantasies of what should be into our testing, and in that case, the ones who scam us are ourselves.

In addition, each chapter ends with a "Common Mistakes" section that handily summarizes the chapter and puts into clear focus many of the issues testers have to deal with, and the ways that organizations (and testers) can help change the culture toward better software, since we already know that "Perfect Software" does not and never will exist.

Bottom Line:

I've always said that I respect and admire Jerry Weinberg because of the many books of his that I have had a chance to read. I adore Jerry for writing this book. As one who has been in the trenches for almost two decades, this book doesn't just speak to me, it screams to me. This has been my reality more times than I'd like to admit. It also lets me know that I am certainly not alone in my understanding of some of these situations and dilemmas. If you want to maintain the illusions of testing, and want platitudes that say what you are doing is fine or that encourage you to align with "Best Practices", then this book is not for you. If, however, you want to see what the reality of software testing is all about, and approach your testing with clear eyes and clear understanding, "Perfect Software..." is a necessity.

3 comments:

Gerald M. Weinberg said...

Thanks Michael, for your kind and generous review. It's an honor from a testing pro like you.

Joe said...

Nice review of a terrific book!

Michael Larsen said...

Thank you, Jerry, once again for writing it. I think it should be required reading for every new tester; it might save them their sanity in those first few years :).