Tuesday, November 30, 2010

BOOK CLUB: How We Test Software at Microsoft (4/16)

This is the fourth installment in the TESTHEAD BOOK CLUB coverage of “How We Test Software at Microsoft”. With this chapter, we move into what Alan referred to in the introduction as the “About Testing” phase of the book, where he and the other authors talk about Testing skills in general. Chapter 4 covers Test Case Design.

Chapter 4: A Practical Approach to Test Case Design

Alan makes the case that Microsoft does not develop products with a short shelf life, generally speaking. While some applications and Operating systems have a bigger impact than others, it’s generally a given that an Operating System or an Office Suite may well be in use for 10 years or more, so the tests that are developed need to be robust enough to be in active use for at least a decade, if not longer. As a user of certain applications, I happen to know this is the truth. While I have a laptop that is running with the latest and greatest applications (Windows 7, Office 2010, etc.) my work desktop is still running Windows XP and Office 2007. Various virtual machines that I maintain and use for testing have Operating systems as far back as Windows 2000, Office 2000, SQL Server 2000, etc., because we have customers that still actively use these technologies. So I get where Alan’s coming from here.

Test automation is especially helpful when you have nailed down the long term test cases and have a good feeling they will not be changing (or the likelihood of change is very small). Manual testing is also required, but making the effort to design test cases that are robust, intelligent and focused is essential regardless of the method used to execute them.

Neat aside: Microsoft Office 2007 had more than 1 million test cases written for it. Divide that by the three thousand “feature crews” (see chapter three summary for description of a Feature Crew), and see that that gives an average of around three hundred test cases for each feature crew. That’s entirely believable, and may even be seen as “light” by some standards, but when combined, wow!

Practicing Good Software Design and Test Design

Software design requires both planning and problem solving. The user experience and the end goal of the work that needs to be done by the stakeholders in question guide those decisions. It’s not always necessary to do what Alan describes as the “big design up front” (BDUF). When using Agile, ideally, the code itself is considered the design, but it still requires planning to be designed well.

Designing tests and designing the software have many parallels. Planning and problem solving are essential. The tester needs to know what to test, when to test, and what the risks are for what doesn’t get tested. Customer needs and expectations must be addressed. Good test design often stems from a thorough review of the software design.

Using Test Patterns

Alan references work by Robert Binder and Brian Marick to describe various Test Patterns that mirror or directly associate with patterns used in Software development. The idea behind test patterns is that they are meant to provide guidance for testers when they design tests. Some patterns are used for structured testing and heuristics, and other patterns go into different areas entirely. Test patterns allow the tester to communicate the goals of a strategy in a way that others can understand (development, support, sales, etc.).

Alan includes a template based on Robert Binder’s test design ideas as an example to help see the attributes of a test pattern.

• Name:  Give a memorable name—something you can refer to in conversation.
• Problem:  Give a one-sentence description of the problem that this pattern solves.
• Analysis:  Describe the problem area (or provide a one-paragraph description of the problem area) and answer the question of how this technique is better than simply poking around.
• Design:  Explain how this pattern is executed (how does the pattern turn from design into test cases?).
• Oracle:  Explain the expected results (can be included in the Design section).
• Examples:  List examples of how this pattern finds bugs.
• Pitfalls or Limitations:  Explain under what circumstances and in which situations this pattern should be avoided.
• Related Patterns:  List any related patterns (if known).

The key benefit to using this approach is that there is flexibility to create different patterns but still provide enough to help testers and developers and other interested parties understand what is happening and speak to it in a way all can understand.
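To make the template a little more concrete, here is a minimal sketch of how its attributes could be captured as a small record. This is my own illustration, not something from the book: the `PatternRecord` class, the `summarize` helper, and the filled-in boundary value example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatternRecord:
    """Hypothetical record whose fields mirror the template attributes above."""
    name: str
    problem: str
    analysis: str
    design: str
    oracle: str
    examples: list[str] = field(default_factory=list)
    pitfalls: str = ""
    related_patterns: list[str] = field(default_factory=list)


def summarize(pattern: PatternRecord) -> str:
    """One-line form that is easy to drop into a conversation or a test plan."""
    return f"{pattern.name}: {pattern.problem}"


# Illustrative content only; the wording is an assumption, not quoted from the book.
boundary_values = PatternRecord(
    name="Boundary Value Analysis",
    problem="Off-by-one errors cluster at the edges of valid input ranges.",
    analysis="Targets the values most likely to expose range mistakes.",
    design="Test the minimum, the maximum, and the values just inside and outside each.",
    oracle="In-range values are accepted; out-of-range values are rejected.",
    examples=["A field documented as 1-100 that silently accepts 0."],
    pitfalls="Says little about inputs that have no ordered range.",
    related_patterns=["Equivalence Class Partitioning"],
)

print(summarize(boundary_values))
```

Keeping every pattern in the same shape is what lets people outside of test read one and know where to look for the expected results or the limitations.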

Estimating Test Time

Sometimes, this is the bane of my existence. Effectively estimating test time can range anywhere from methodical, using specific times from past events (rare, but it can happen given enough time and iterations), to what I frequently refer to as “SWAGing” (SWAG stands for “silly wild-assed guess”). I wish I was kidding about that, but when we deal with new technologies, SWAGing is often the best we can do.

Alan argues that simply adding a few weeks of "buffer" or "stabilization" time at the end of a product cycle doesn’t do the trick. In fact, this often causes more problems.

So how long should a test take? Alan mentions a rule of thumb: match the development time. Two man-weeks to develop a feature? Expect it to take the same two man-weeks to write the automation tests and define the manual tests… and execute them. Even this can be wrong, since there are a lot of factors to be considered when testing; a complex feature could require ten times the time it took to code it because of the numerous variations that have to be applied to determine if it’s “road worthy”. Some things to consider:

• Historical data: How long have previous projects similar to this one taken to test?
• Complexity: The more bells and whistles, the more possible avenues to test, and the longer it will take to perform many of them (note I didn’t say all of them; even simple programs could have millions of possible test cases, and testing them all can range from impractical to impossible)
• Business goals: Web widget for a game, or control software for a pacemaker? End use and expectations can have a huge effect on the time needed to test a feature.
• Conformance/Compliance: Is there a regulatory body we need to answer to? What regulations do we have to make sure we have covered?

Starting with Testing

Have you ever said to yourself “wouldn’t it be great if the test team could be involved from the very beginning”? Personally, I’ve said this many times, and in some aspects of Agile development, I’ve actually had my wish fulfilled. More times than not, though, testing comes later in the process, and I’ve lamented that my voice couldn’t be heard earlier on in the development and design process (“hey, let’s make testability hooks so that we can look at A, B and C”). Getting testers involved with reviewing the requirements or functional specs can be a good start. Barring the availability of up-front design specs (come on, raise your hand if you’ve ever worked on a project where you got a snicker or an eye roll when you’ve asked for a formal specification to review… yeah, I thought so :) ), the next best thing we can do is ask investigative questions. Put on that Mike Wallace hat (Mike Wallace being the investigative journalist made famous in the States for the TV program “60 Minutes”, my apologies to those elsewhere in the world that have no idea who I am referring to). How does the feature work? How does it handle data? Errors? File I/O? Getting these answers can go a long way in helping the tester develop a strategy, even when there is little or no design documentation.

Alan makes a segue here with a story about testing with the code debugger. While this may be a head scratcher for those testers that are not regularly looking at source code, he’s correct in that getting in and doing code review and walking through the code will help fill in the understanding necessary to see which paths testing may exercise and where to add test cases to make sure more coverage is provided.

Have a Test Strategy

Having a testing strategy is helpful to keep the team in sync and know which areas to cover first and which areas to leave until later (or perhaps not deal with at all for the time being). Risk analysis is important here, and determining which areas are mission critical and which areas are lower priority can help make sure that you apply your testing time effectively. Part of the strategy should include training the test team in areas where they may need additional understanding (paired sessions, peer workshops, meet-ups or conferences can help with this). The key takeaway is that part of the testing strategy needs to be making smarter and better equipped testers.

Alan suggests using the following attributes when devising a testing strategy:

• Introduction: Create an overview and describe how the strategy will be used
• Specification Requirements: What are the documentation plans? What are the expectations from these documented plans?
• Key Scenarios: What customer scenarios will drive testing?
• Testing Methodology: Code Coverage? Test Automation? Exploratory Testing? Case Management? Bug Tracking system?
• Test Deliverables: What is expected of the test team? How do we report our results? Who will see them? When do we know we are done?
• Training: How do we fill in knowledge gaps? How do we help transfer skills to other testers?

Thinking About Testability

Testability is the ability to have software actually be in a state where testing is possible and perhaps optimized for testing. I’m a big fan of developers that look to create test hooks to help make the job of testing easier or more productive. I cannot count how many times I’ve asked “so how will we be able to test this?” over the years. Sometimes, I’ve been met with “well, that’s your problem” but often I’ve had developers say “Oh, it’s no problem, you can use our API” or “I’ve created these switch options you can set so you can see what is going on in the background” or “setting this option will create a log file to show you everything that is happening”. When in doubt, ask, preferably early on in the process. Testing hooks become harder to create as the project moves on and gets closer to ship date. Not impossible, but it adds time that could be used for feature development and hardening. Better to ask early in the game :).

Alan uses the acronym SOCK to describe a model for helping make software more testable in the development stage:

• Simple: Simple components and applications are easier (and less expensive) to test.
• Observable: Visibility of internal structures and data allows tests to determine accurately whether a test passes or fails.
• Control: If an application has thresholds, the ability to set and reset those thresholds makes testing easier.
• Knowledge: By referring to documentation (specifications, Help files, and so forth), the tester can ensure that results are correct.
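Here is a rough sketch of what the "Observable" and "Control" pieces can look like in code. Everything in it is made up for illustration (the `RetryingUploader` class, its `events` list, and the `flaky_send_factory` helper are hypothetical, not from the book): the retry threshold and the transport can be set by the test, and the component records what it decided so a test can check it.

```python
class RetryingUploader:
    """Hypothetical component, used only to illustrate testability hooks."""

    def __init__(self, send, max_retries=3):
        self.send = send                # Control: the transport can be swapped for a fake
        self.max_retries = max_retries  # Control: the threshold can be set per test
        self.events = []                # Observable: internal decisions are recorded

    def upload(self, payload):
        for attempt in range(1, self.max_retries + 1):
            ok = self.send(payload)
            self.events.append(("attempt", attempt, ok))
            if ok:
                return True
        self.events.append(("gave_up", self.max_retries))
        return False


def flaky_send_factory(failures):
    """Return a fake transport that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def send(payload):
        state["calls"] += 1
        return state["calls"] > failures

    return send


# Two failures followed by a success, with room for three attempts.
uploader = RetryingUploader(flaky_send_factory(failures=2), max_retries=3)
result = uploader.upload(b"data")
print(result, uploader.events)
```

Without the injectable transport and the event record, a tester would have to wait for a real network failure and guess at how many retries happened.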

How do you test hundreds of modems?

We needed to test modem dial-up server scalability in Microsoft Windows NT Remote Access Server (RAS) with limited hardware resources. We had the money and lab infrastructure only for tens of modems, but we had a testability issue in that we needed to test with hundreds of modems to simulate real customer deployments accurately. The team came up with the idea to simulate a real modem in software and have it connect over Ethernet; we called it RASETHER. This test tool turned out to be a great idea because it was the first time anyone had created a private network within a network. Today, this technology is known as a virtual private network, or a VPN. What started as a scalability test tool for the Windows NT modem server became a huge commercial success and something we use every time we "tunnel" into the corporate network.
—David Catlett, Test Architect

Test Design Specifications

Designing tests well is arguably as important as designing the software well. Using a test design specification (TDS) can be effective for both manual and automated tests, because it describes both the approach and intent of the testing process. This also has the added benefit that it allows those who take on the responsibility of maintaining the product down the line to easily interpret the intent of the testing and what the software’s expected parameters are meant to be.

These are items that might be found in a typical TDS:
• Overview/goals/purpose
• Strategy
• Functionality testing
• Component testing
• Integration/system testing
• Interoperability testing
• Compliance/conformance testing
• Internationalization and globalization
• Performance testing
• Security testing
• Setup/deployment testing
• Dependencies
• Metrics

Testing the Good and the Bad

As testers we are often called on to do devious and not very pleasant things to software. As is often quoted, while a developer will try very hard to make sure the software does the ten things it needs to do right, a tester is tasked with finding the one (or more) ways that we can cause the software to fail.

We perform verification tests (tests that verify functionality using expected inputs) and falsification tests (tests that use unexpected data to see whether the program handles that data appropriately). James Whittaker’s #1 attack is to find out every error message associated with a program/function/feature, and force the program to make that error appear at least once (for more on Whittaker attacks, check out his book “How to Break Software”).
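A small sketch of the difference, using a made-up function under test. `parse_quantity` and its 1 to 99 range are assumptions for illustration only; the second group of checks also loosely follows the spirit of the Whittaker attack by making each error path fire at least once.

```python
def parse_quantity(text):
    """Hypothetical function under test: parse an order quantity from 1 to 99."""
    if not isinstance(text, str):
        raise TypeError("quantity must be a string")
    stripped = text.strip()
    if not stripped.isdigit():
        raise ValueError("quantity must be a whole number")
    value = int(stripped)
    if not 1 <= value <= 99:
        raise ValueError("quantity must be between 1 and 99")
    return value


def raises(error_type, value):
    """Return True if parse_quantity(value) raises the given error type."""
    try:
        parse_quantity(value)
    except error_type:
        return True
    return False


def verification_checks():
    # Expected inputs: does the function do what it is supposed to do?
    assert parse_quantity("1") == 1
    assert parse_quantity(" 42 ") == 42
    assert parse_quantity("99") == 99


def falsification_checks():
    # Unexpected inputs: does the function reject them cleanly?
    for bad in ["", "abc", "-5", "1.5", "0", "100"]:
        assert raises(ValueError, bad), bad
    assert raises(TypeError, None)


verification_checks()
falsification_checks()
```

Note that the falsification list is twice as long as the verification list, and it could easily keep growing.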

In many ways, the falsification tests are more important than the tests that prove the product “works”. Of course, every one of us has heard the refrain from a developer or development manager “yeah, but what customer would ever do that?” After almost 20 years of testing, I can confirm that, if a customer can do something, they most likely will, whether they intend to or not!

Very often, we test the “happy path” of an application because, as customers, we want to cover the things we would expect to work. This is not doing the nefarious evil tests that testers are famous for, but actually focusing on tasks that the average user would do. Very often, we find that errors occur even here, such as a button that should bring up a report, but clicking the button does nothing. Adam makes the point that “the Happy Path should always pass” and generally that’s true, but even when the Happy Path is well-covered ground, all it takes is one change to throw a few rocks into it.

Other Factors to Consider in Test Case Design

Simply put, it is impossible to test everything, even in simple programs. Large complex applications are even more challenging. Adding people, tools, time, etc. will get you a little closer, but complete and total test coverage is an asymptote; you may get close, but you’ll never really reach it (and in many cases, you won’t get anywhere near close to complete and total test coverage). This is where considering things like the project scope, the mission of testing, risk analysis and the testers’ skills come into play.

Black Box, White Box, and Gray Box

Alan goes on to describe a process where he defines black box, white box and gray box testing. The key points are that black box testing focuses on the inputs and expected outputs (good and bad) of a system without knowledge of (or concern for) the underlying code. White box testing focuses on knowing the specific code paths and constructing cases around intimate knowledge of the code structures. In actuality, effective testing blends both, and this is where Alan’s and Microsoft’s definition of gray box testing comes into play. Tests are designed from the customer’s perspective first (black box), but then reference the code to make sure that the tests actually cover the areas of the functions in question (white box).

Exploratory Testing at Microsoft

Alan describes Exploratory testing as a generally manual approach to testing where every step of testing influences the subsequent step. During exploratory testing, testers use their knowledge of the application and what they know of testing methods to find bugs quickly. Teams often schedule "bug bashes" throughout the product cycle, where testers take part of a day to test their product using an exploratory approach (we do the same thing at my company, but we call them “Testapaloozas”; it’s basically the same approach). The goal is to go “off script” and see what we can discover.

Microsoft uses pair testing for many of these efforts. The idea is that two testers team up for an exploratory testing session. One tester sits at the keyboard and exercises the feature or application while the other tester stands behind or sits next to the first tester and helps to guide the testing. Then the testers switch roles. Alan reports that, in a single 8-hour session, 15 pairs of testers found 166 bugs, including 40 classified as severity 1 (bugs that must be fixed as soon as possible). In this case, it shows that two heads can often be better than one (and can often be a lot more fun, too).

Chapter 5 will be covered on Thursday.
