Saturday, December 4, 2010

BOOK CLUB: How We Test Software at Microsoft (5/16)

This is my fifth installment in the TESTHEAD BOOK CLUB for the book How We Test Software at Microsoft, written by Alan Page, Ken Johnston and Bj Rollison.

In Chapter 5, Bj takes the lead and discusses Functional Testing Techniques. By the way, a side note about how I’m writing these reviews and summaries. There are times when my take on the details in the book will not suffice, and the actual text of the area or a really close summation (where it’s obvious that it’s not my words) needs to be differentiated. That’s what the red text represents. Whenever you see red text, it is pretty much verbatim what’s in the book. I do this mostly with checklists and tables that I think are key to understanding the chapter details, and I do it when I quote text directly from the book. So now you know.

Chapter 5. Functional Testing Techniques

Bj describes a Christmas morning when he received a gift of a Demolition derby slot car kit. Setting up the track and having the cars crash into each other was great fun, and Bj credits this with possibly starting him down the road towards testing, or as he put it, his “penchant for breaking things”.

The play was fun, but more to the point, his curiosity got him to wondering how the cars actually worked. He admits to enjoying this, but writes with chagrin that unraveling the motor winding wasn’t a good idea because he didn’t know how to put it back together again, and so the car was permanently broken (a trip to the toy store and a new car fixed that). The takeaway is that you can learn a lot about something just by taking it apart and dissecting it, and likewise putting it back together again (if of course, you can :) ). This curiosity is a hallmark of testers, and therefore it should be encouraged. Likewise with software, if we want to learn about the program beyond the surface level user experience testing, we need to be willing to unscrew the top and look at “the wiring under the board” or in this case, the code structures that hold the application together.

One approach is to break the product down to its specific, discrete feature sets, and then test the specific functional attributes and capabilities of those specific pieces. This chapter focuses on ways that testers at Microsoft do exactly that.

The Need for Functional Testing

No question about it, software today is more complex than ever before, and it’s just going to continue on that way with the demand for more features and functionality. The tester’s job will not get any easier as time goes by, so testers have to make choices. How can they define a set of tests that will effectively examine a product or feature set while knowing full well that they will never be able to do an exhaustive run of all possible tests? More to the point, how can we home in on those high value tests and provide the best coverage possible with the limited amount of testing we will realistically be able to do?

To this end, and to my great pleasure, Bj champions the use of Exploratory Testing (ET). ET is a great way to get a handle on an application, especially when a tester is new to it. My favorite metaphor for Exploratory Testing is “testing without a map” (and credit for that phrase rightfully belongs to Michael Bolton since he wrote an article with that title, hence why I associate that phrase with ET :) ). Bj argues that Exploratory Testing can be sufficient for small software projects, software with limited distribution, or software with a limited shelf life, but that it doesn’t scale well for large-scale, complex or mission-critical applications. I may disagree with that personally, but I get what he’s trying to say, in that Exploratory Testing techniques will not be the be-all and end-all of testing (nor should they be; all tools have their right time and their right place). Exploratory Testing from a black box perspective also will not answer what Bj is referring to here, which is performing in-depth and detailed tests specific to what the functions themselves are coded for. Bj makes a good point in that black box behavioral testing, while certainly a key element, cannot be the sole approach to testing.

The question he proposes is this: How can we increase the effectiveness of our tests to limit redundancy and reduce our team’s overall exposure to risk?

Weinberg’s Triangle Revisited

Remember how I said that Alan Page emphasized that the stories and anecdotes will probably stick with readers longer than the book prose will? Bj proves that point again in this chapter with an example of “Weinberg’s Triangle”. What’s that? It’s a simple program based on Gerald Weinberg’s triangle problem, used to establish a baseline skill assessment. The goal is for testers to use their existing skills and knowledge within 15 minutes to explore the application and to define the tests necessary to determine if the program satisfies functional requirements. The requirement states that the program reads three integer values representing the lengths of the sides of a triangle and then displays a message: the triangle is scalene, isosceles, or equilateral. This time box limitation forces the tester to try to identify the most important tests. In reviewing over 5,000 samples, they discovered that the majority of SDETs included only one test in which valid integer inputs would result in an invalid triangle type, one test for equilateral, one test for scalene, and one test for isosceles. These four tests cover only approximately 50 percent of the paths in the most critical method in the software.
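To make the coverage point concrete, here is a minimal sketch of what a triangle classifier might look like. This is my own illustration, not the program used in the book's assessment; the function name `classify_triangle` and the "invalid" label are assumptions. Notice how many distinct conditions sit behind that one "invalid" message, and how the four typical tests leave most of them untouched.

```python
def classify_triangle(a: int, b: int, c: int) -> str:
    """Classify a triangle by its three integer side lengths (illustrative only)."""
    # Non-positive sides can never form a triangle.
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    # Triangle inequality: each side must be shorter than the sum of the other two.
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


# The first four rows are the "typical" four tests; the rest push down other paths.
cases = [
    ((3, 3, 3), "equilateral"),
    ((3, 4, 5), "scalene"),
    ((3, 3, 5), "isosceles"),
    ((1, 2, 8), "invalid"),
    ((3, 5, 3), "isosceles"),  # equal pair in a different position
    ((5, 3, 3), "isosceles"),
    ((8, 1, 2), "invalid"),    # long side in a different position
    ((1, 8, 2), "invalid"),
    ((1, 2, 3), "invalid"),    # degenerate: two sides sum exactly to the third
    ((0, 4, 5), "invalid"),    # zero-length side
    ((-3, 4, 5), "invalid"),   # negative side
]

for sides, expected in cases:
    assert classify_triangle(*sides) == expected
```

Each position of the long side and each position of the equal pair is a separate branch of a compound condition, which is why one test per triangle type only gets you part of the way.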

So how can functional techniques help us here? Functional techniques allow us to systematically look at the feature set’s capabilities. Note: this implies white box techniques and understanding of the code, but it doesn’t have to. It’s entirely possible to perform black box user interface level tests using functional techniques. The key is to use them in the proper context; when you test and why you test are just as important as, if not more important than, what you test.

Bj takes another diversion with a comment he calls the “Pesticide Paradox”. In this he describes growing food in his backyard and the fact that, once planted in the open, numerous pests come after the food, specifically slugs. While there are various efforts he uses (beer, salt, copper stripping, wood ash, etc.) no matter how many he uses, invariably some slugs get through.

Likewise, when we test, if we rely on just one method, bugs will get through. Even if we vary our approach and use all sorts of different testing methods, we may catch or stop some bugs, but guess what? There’s still going to be some bugs that get through, because no matter how thorough we are, bugs will prove resistant to one form of testing or another. So diversifying the testing and using different tactics will, if not eliminate all bugs, at least give the tester more of a fighting chance to knock out a larger percentage of them.

Functional testing techniques can also help by zeroing in on key areas and eliminating redundant or needless tests. The rub is that it requires a certain amount of domain knowledge and understanding to do that effectively. Ultimately, any testing technique is a tool, just like a screwdriver and a reciprocating saw are tools for a house builder. Knowing when to use the right tool will make all the difference in the effectiveness in completing the job.

So what are some key functional testing techniques? What are our reciprocating saws and screwdrivers, to carry the metaphor further? Some examples are:

• Boundary Value Analysis (BVA)
• Equivalence Class Partitioning (ECP)
• Combinatorial Analysis
• State Transition Testing

Used in the correct context, testers can focus on specific features and methods. Still, a tool is just that; it won’t do the work for you, and it won’t be able to evaluate or reason through what it finds. For that, you need a skilled, aware, alert, “sapient” tester (to borrow a much adored term from James Bach and others) who knows where, how and when to use the appropriate tools to accomplish the task at hand.

Bj goes through and gives a thorough breakdown of these test approaches, and making a thorough summary of any of them would be deserving of its own blog post, or several. Therefore, I can only give a high level view of each of them.

Equivalence Class Partitioning

Bj considers Equivalence Class Partitioning (ECP) to be foundational to understanding and using other testing techniques. Actually, most testers already do this; they just don’t realize that they are doing it, or may not know that what they are doing is ECP.

For ECP to work, we need to break down the possible variable values for each input and/or output parameter, and then determine a way to define valid and invalid inputs for each. The ECP tests are made up of groups of valid values until all of them have been used, followed by going through and examining the invalid values individually. The trick is that this requires a solid grasp of the underlying structure of the application so as to correctly define all valid and invalid data groups. With this information, we design positive and negative tests to evaluate functionality, as well as the ability of the system to handle errors (sound familiar?).

The first benefit of ECP is that it helps us systematically reduce the number of tests while still providing confidence that the variable combinations we use will produce the expected result repeatedly.

For example, assume you are testing a text box control that accepts a string variable of Unicode characters between upper case A and Z with a minimum string length of 1 and a maximum string length of 25 characters. Exhaustive testing would include each letter one time ($26^{1}$) and each letter combination for every possible string length. So, to test for all possible inputs the number of tests is equal to $26^{25} + 26^{24} + 26^{23} + \dots + 26^{1}$.
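To get a feel for the size of that sum (26 to the 25th power, plus 26 to the 24th, and so on down to 26 itself), here is a quick back-of-the-envelope calculation. The variable names are mine.

```python
# Count every string of uppercase A-Z letters with length 1 through 25.
ALPHABET_SIZE = 26
MAX_LENGTH = 25

total = sum(ALPHABET_SIZE ** length for length in range(1, MAX_LENGTH + 1))

# Python integers have arbitrary precision, so this prints the exact count.
print(total)
print(f"roughly {total:.2e} tests")
```

The result is on the order of 10^35, so even at a billion tests per second an exhaustive run is far out of reach.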

Yeah, that’s a huge number, and that’s exactly Bj’s point. Instead of testing every character combination imaginable, we determine that the following would be a logical way of breaking it down:

• uppercase Unicode characters between uppercase A and Z
• any string of characters with a string length between 1 and 25 characters

In short, any combination of those two criteria will work as a positive input, and anything that falls outside of that range would be considered a negative input.

Randomly selecting valid or invalid values is meant to help increase the odds of exposing unusual behavior that might not otherwise be exposed using “real-world valid class static data”.
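Here is a rough sketch of what random selection from those classes could look like for the text box example. Everything in it is an assumption on my part: `is_valid` is a stand-in for the control under test, and the three invalid classes (empty, too long, wrong character) are just one plausible way to carve up the negative inputs.

```python
import random
import string

MIN_LEN, MAX_LEN = 1, 25
VALID_CHARS = string.ascii_uppercase  # A through Z


def is_valid(text: str) -> bool:
    """Stand-in for the text box rule: 1 to 25 characters, each in A-Z."""
    return MIN_LEN <= len(text) <= MAX_LEN and all(ch in VALID_CHARS for ch in text)


def random_valid(rng: random.Random) -> str:
    """Pick a random member of the valid class instead of a fixed 'ABC' string."""
    length = rng.randint(MIN_LEN, MAX_LEN)
    return "".join(rng.choice(VALID_CHARS) for _ in range(length))


def random_invalid(rng: random.Random) -> dict:
    """Pick one random member from each (assumed) invalid class."""
    too_long_len = rng.randint(MAX_LEN + 1, MAX_LEN + 50)
    too_long = "".join(rng.choice(VALID_CHARS) for _ in range(too_long_len))

    # Take a valid string and swap exactly one character for an out-of-class one.
    base = random_valid(rng)
    bad_char = rng.choice(string.ascii_lowercase + string.digits + string.punctuation)
    pos = rng.randrange(len(base))
    wrong_character = base[:pos] + bad_char + base[pos + 1:]

    return {"empty": "", "too_long": too_long, "wrong_character": wrong_character}


rng = random.Random(2010)  # fixed seed so a failing input can be reproduced
print(random_valid(rng))
print(random_invalid(rng))
```

Seeding the generator is a deliberate choice: random inputs are only useful if you can replay the one that exposed a problem.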

Decomposing Variable Data

So how can we effectively break these variables down into a good enough subset of valid/invalid data groups? Bj states that a solid familiarity with the data and the underlying structures is key. It also requires an understanding of “historically problematic variables” (or areas that have been known to cause problems in the past). Scope and intended audience can also give clues as to the proper context for the data. The danger with this is twofold. First, if we overgeneralize, we will not get a data set likely to find anomalies. Second, if our variable data is too specific (hyperanalysis), we run the risk of too many redundant test cases. Finding the balance comes with use and practice.

Some of this is going to seem like a blinding flash of the obvious, but the point that Bj makes here is subtle. Valid data should create a normal or expected response. Invalid data should produce errors. To help me with the subtlety, I was reminded of James Whittaker’s Attack #1 from How to Break Software: determine every potential error message in the code, and force the system to display it at least once.

Bj uses the example of reserved words for system variables in a PC to demonstrate how an error was uncovered. Certain terms like LPT1 and COM1, COM2, etc. cannot be used as file names because they are handled at the system level. While showing how to test these examples during a training course, they discovered something in Windows XP. Saving a file under some of the reserved names (LPT and COM 1 through 4) showed one error message, but entering other reserved names (LPT and COM 5 through 9) displayed an entirely different error. With ECP, it would be easy to assume that all LPT and COM values are treated the same. In this particular case, that would have been incorrect, and choosing too few names would not have uncovered the issue.

Glenford Myers proposes four heuristics in the book The Art of Software Testing when looking to partition data values. Bj defines a heuristic as “a guideline, principle, or rule of thumb that is often useful in performing tasks such as decision making, troubleshooting, and problem solving.”

Range: A contiguous set of data in which any data point in the minima and maxima boundary values of the range is expected to produce the same result. For example, a number field in which you can enter an integer value from 1 through 999. The valid equivalence class is >=1 and <=999. Invalid equivalence class ranges include integers <1 and >999.
Group: A group of items or selections is permitted as long as all items are handled identically. For example, a list of items of vehicles used to determine a taxation category includes truck, car, motorcycle, motor home, and trailer. If truck, car, and motor home reference the same taxation category, that group of items is considered equivalent.
Unique: Data in a class or subset of a class that might be handled differently from other data in that class or subset of a class. For example, the date January 19, 2038, at 3:14:07 is a unique date in applications that process time and date data from the BIOS clock and it should be separated into a discrete subset.
Specific: The condition must be or must not be present. For example, in Simple Mail Transfer Protocol (SMTP) the at symbol (@) is a specific character that cannot be used as part of the e-mail name or domain name.
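Two of these heuristics are easy to sketch in a few lines. The first part below partitions the 1 through 999 number field from the Range example into its three classes; the function name and the representative values are my own picks. The second part shows why that 2038 date counts as Unique, assuming the time is stored as a signed 32-bit count of seconds since January 1, 1970 (UTC), which is the usual explanation for it.

```python
from datetime import datetime, timezone


def range_class(value: int, low: int = 1, high: int = 999) -> str:
    """Name the equivalence class an integer falls into for a low..high field."""
    if value < low:
        return "invalid_below"
    if value > high:
        return "invalid_above"
    return "valid"


# One arbitrary representative per class; any other member should behave the same.
representatives = {"valid": 500, "invalid_below": -7, "invalid_above": 12345}
for expected_class, value in representatives.items():
    assert range_class(value) == expected_class

# The largest value a signed 32-bit seconds counter can hold.
last_32bit_second = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
print(last_32bit_second.isoformat())  # 2038-01-19T03:14:07+00:00
```

That single timestamp sits inside an otherwise ordinary range of dates, which is exactly why it deserves its own subset rather than being lumped in with "any valid date".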

Bj uses an example of a program called NextDate that is based off the Gregorian Calendar and uses this as a jumping off point for several pages of code and examples. While all of them are very instructive, the volume of information is too large to include in a single chapter review (again, might make for a great series of posts another time).

The key takeaway with ECP is that it allows the tester to winnow down test variables in a systematic way and reduce the number of tests well below what an exhaustive test of all possible values would require. When applied correctly, ECP can:

• Force the tester to perform a more detailed analysis of the feature set and the variable data used by input and output parameters
• Help testers identify corner cases they might have overlooked
• Provide clear evidence of which data sets were tested and how they were tested
• Increase test effectiveness systematically, which helps reduce risk
• Increase efficiency by logically reducing redundant tests

Boundary Value Analysis

Boundary value analysis (BVA) is possibly the best known and most often misused functional test tool because it is assumed to be a trivial process. In short, a lot of issues pop up at the boundaries of variable values, so analyzing and determining these boundaries is important. BVA is most useful when used along with ECP (yep, Alan wasn’t kidding when he mentioned the proliferation of TLAs, or three letter acronyms).

BVA or boundary testing is especially useful for detecting the following types of errors:

• Incorrect artificial constraints of a data type
• Erroneously assigned relational operators
• Wrapping of data types
• Problems with looping structures
• Off-by-one errors

BVA is helpful with examining fixed-constant values (such as date conventions or common constants used in mathematics) and fixed-variable values (such as the X-Y axes used in creating and resizing a window).
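As a small illustration of the "erroneously assigned relational operator" case, here is a sketch with two hypothetical validators for a 1 through 999 field. Both functions are made up for this example; the buggy one uses `<` where `<=` was intended.

```python
def accepts_correct(n: int) -> bool:
    """Intended rule: integers from 1 through 999 inclusive."""
    return 1 <= n <= 999


def accepts_buggy(n: int) -> bool:
    """Same rule with a wrong relational operator on the upper bound."""
    return 1 <= n < 999


def boundary_values(low: int, high: int) -> list:
    """Each boundary plus its immediate neighbors on either side."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


# A typical mid-range value cannot tell the two implementations apart...
assert accepts_correct(500) == accepts_buggy(500)

# ...but the boundary set pins down exactly where they disagree.
mismatches = [n for n in boundary_values(1, 999)
              if accepts_correct(n) != accepts_buggy(n)]
print(mismatches)  # [999]
```

Only the value sitting right on the boundary exposes the defect, which is the whole argument for testing there deliberately rather than hoping a random valid value lands on it.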

Defining Boundary Tests

Paul Jorgensen in “Software Testing: A Craftsman’s Approach” uses the formula 4n + 1 (or 6n + 1, which would include the minimum value –1 and maximum value +1), where n is equal to the number of independent parameters. This should not be seen as the be all and end all to boundary condition tests. The actual boundary conditions may vary depending on the actual implementation, so check your code to be sure.

What about those values that are not at the extreme edge? Do we use the same tools or something else? Bj suggests that, after identifying all boundary conditions, the minimum set of test data can be calculated with a simple formula: 3(BC), where BC is equal to the number of specific boundary conditions.

To use the example for the NextDate program referenced in the text, there are three hard limits:

  • month (1 to 12)
  • day (1 to 31)
  • year (1582 to 3000)

Jorgensen’s 6n + 1 formula makes for (6 * 3) + 1, or 19 test cases (see book for table breakdown). There are additional values to consider, though. How about months that only have 30 days (or that pesky month that sometimes has 28 and sometimes 29 days)? Applying the 3(BC) formula for boundary testing reveals the test set required to analyze more potential boundary conditions. 3(BC) is (3 * 18), or 54 possible cases (again, see book).
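For the curious, the 6n + 1 set can be generated mechanically. This is my own sketch, not the table from the book: I'm assuming the usual approach of holding every other parameter at a nominal value while one parameter walks through its six boundary-adjacent values, and I picked the midpoint of each range as the nominal value.

```python
# Hard limits for NextDate as listed above.
PARAMS = {"month": (1, 12), "day": (1, 31), "year": (1582, 3000)}


def nominal(low: int, high: int) -> int:
    """Assumed nominal value: the midpoint of the range."""
    return (low + high) // 2


def six_n_plus_one(params: dict) -> list:
    """One all-nominal case, plus six boundary cases per parameter."""
    base = {name: nominal(low, high) for name, (low, high) in params.items()}
    cases = [dict(base)]
    for name, (low, high) in params.items():
        for value in (low - 1, low, low + 1, high - 1, high, high + 1):
            case = dict(base)
            case[name] = value  # vary one parameter, hold the others at nominal
            cases.append(case)
    return cases


cases = six_n_plus_one(PARAMS)
print(len(cases))  # 19
for case in cases[:4]:
    print(case)
```

Note what this set cannot see: with day held at a nominal value, nothing here ever asks about the 30th of a 30-day month or the end of February. Those extra boundary conditions are what push the 3(BC) count up to 54.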

Combinatorial Analysis

Put simply, it’s nice to have the ability to test simple structures and perform tests where only one variable at a time changes. However, in the real world, dynamic data entry, saves and access methods cause lots of variable values to be changed simultaneously. What’s more, many of these variables depend on each other. What to do? Combinatorial analysis to the stage!

Combinatorial analysis has a number of benefits, including:

• Identifies most defects caused by variable interaction
• Provides greater structural coverage
• Has great potential to reduce overall testing costs (when used appropriately)

Note: Combinatorial analysis is not an effective technique for testing independent parameters with no direct or indirect interaction, mathematical calculations, ordered parameter input, or inputs that require sequential operations.

Combinatorial Testing Approaches

There are a number of ways to apply Combinatorial testing, including:

• Random or ad hoc methods (best guess, random selection, rely primarily on intuition and luck of the tester)
• Each choice (EC, or simply testing each variable at least once)
• Base choice (BC, identifies a combination of variables as the base test; this is usually the happy path or the most commonly used combinations of variable states)
• Orthogonal array (OA, where each array variable is linked to a state for each interdependent parameter, and those states are mapped into an array)
• Combination tests (pair-wise through n-wise, or t = n)
• Exhaustive testing (just doing every possible test combination, likely not possible in anything but the most trivial programs)
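To compare a few of these approaches side by side, here is a small sketch using a made-up font dialog with four parameters. The parameter names and values are hypothetical, and the each choice and base choice functions reflect my reading of those strategies (every value appears at least once; vary one parameter at a time from a base test), not code from the book.

```python
from itertools import combinations, product

# Hypothetical parameters for a font dialog.
PARAMS = {
    "font": ["Arial", "Courier", "Times"],
    "style": ["regular", "bold", "italic"],
    "size": [8, 12],
    "underline": [False, True],
}


def exhaustive(params: dict) -> list:
    """Every possible combination of values."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


def each_choice(params: dict) -> list:
    """Each value of each parameter appears in at least one test."""
    longest = max(len(values) for values in params.values())
    return [{name: values[i % len(values)] for name, values in params.items()}
            for i in range(longest)]


def base_choice(params: dict, base: dict) -> list:
    """Start from a base test, then change one parameter at a time."""
    tests = [dict(base)]
    for name, values in params.items():
        for value in values:
            if value != base[name]:
                test = dict(base)
                test[name] = value
                tests.append(test)
    return tests


def pairs_covered(tests: list) -> set:
    """All (parameter, value) pairs that occur together in some test."""
    covered = set()
    for test in tests:
        covered.update(combinations(sorted(test.items()), 2))
    return covered


happy_path = {"font": "Arial", "style": "regular", "size": 12, "underline": False}
all_pairs = pairs_covered(exhaustive(PARAMS))

for label, tests in [("each choice", each_choice(PARAMS)),
                     ("base choice", base_choice(PARAMS, happy_path)),
                     ("exhaustive", exhaustive(PARAMS))]:
    print(f"{label}: {len(tests)} tests, "
          f"{len(pairs_covered(tests))} of {len(all_pairs)} value pairs covered")
```

Even on a toy example the trade-off shows up: each choice and base choice are cheap but leave many value pairs untested, while exhaustive testing covers everything at a cost that explodes as parameters are added. Pair-wise tools aim for the middle, covering every pair with far fewer tests than the full product.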

Chapter 6 will be covered on Tuesday.
