Thursday, January 20, 2011

BOOK CLUB: How We Test Software at Microsoft (13/16)

This is the fifth part of Section 3 in “How We Test Software at Microsoft”. This chapter focuses on how Microsoft communicates with customers and processes the feedback they receive. Note, as in previous chapter reviews, Red Text means that the section in question is verbatim (or almost verbatim) as to what is printed in the actual book.

Chapter 13: Customer Feedback Systems

Alan makes the point in this chapter that the customer is a large part of the quality puzzle. In fact, without the customer, there isn’t much of a point to focusing on quality in the first place (of course, without a customer, there’s not a market for Microsoft’s products either… hey, it’s the truth!). The main point for companies like Microsoft to be in business is to fill a need for people who need the tools and options that they provide. Software helps people do tasks that they either cannot do without it, or that would require a great deal of time to do were it not there. So Microsoft recognizes the value of the customer in the relationship, and they work in a number of ways to include the customer in the quality conversation.

Testing and Quality

The truth is, customers don’t care a whole lot whether something has been tested a lot or a little; what they care about is whether or not a product works for them. They do notice testing indirectly, in that they voice frustration when a product doesn’t work well or is a “pain” to use. Outside of that, however, they don’t really care all that much what was done to test a product.

Alan includes a great hypothetical list. Those of us who test products would certainly appreciate this list, but for anyone else, it would probably result in a shrug followed by a muttered “whatever”. For the testers reading this (and yes, I realize that is 99% of you), here’s a list we’d like to see:

- Ran more than 9,000 test cases with a 98 percent pass rate.

- Code coverage numbers over 85 percent!

- Stress tested nightly.

- Nearly 5,000 bugs found.

- More than 3,000 bugs fixed!

- Tested with both a black box and white box approach.

- And much, much more…

For those of us who test, that’s an impressive set of statements. Guess what? The customer doesn’t care about any of them. The customer really only cares if the product fits their needs, is affordable under the circumstances, and works the way that they expect it to. As far as value to the user, software quality rates pretty high up there (people want good quality products regardless of what they are). The rub is, most testing activities don’t really improve software quality (well, not in and of themselves). So why do we do it and why do we consider it important?

Testing Provides Information

Take a look again at the bullet point list above. What does it tell us? It gives us some indication as to what was actually done to test the product, and it gives us information on the success and failure rate for the tests performed. While our customers may not find any of that terribly interesting, rest assured the development team does! This information provides important details about the progress of the testing efforts and what areas are being covered (and a bit about the areas that are not).

Quality Perception

Even if everything goes smoothly and no issues are found after extensive testing, the test team has still provided a valuable service in the fact that they have decreased the risk of an issue being found in the field (not eliminated, understand, because testers cannot test everything, and there are situations that may never have been considered by the test team). There is, however, a danger to the way that a lot of testing is done. When done in isolation or from a perspective of meeting functional requirements, we can create a false sense of security. While we may well have tested the product, we may also have tested the product in a manner that is totally foreign to the way that the customers would actually use the product. We hope that our test data and the quality experience of our customers would overlap. In truth, while they often do, the correlation of the two is not exact, and often there are only small areas where they intersect and find common ground. The more we can increase that commonality between a customer’s quality experience and the test data and test efforts performed, the better.

Microsoft gathers information from a number of sources: emails, direct contact through customer support, PSS data, usability studies, surveys, and other means like forums and blog posts all help to inform them about the customer experience. The bigger question then is, what do we (meaning Microsoft in the book, but also we as testers and test managers) do with all of this information? How do we prioritize it, make sense of it, and interact with it to make a coherent story that we can readily follow and understand?

Customers to the Rescue

In a perfect world, we would be able to watch all of our customers, see how they interact with the product, and get the necessary feedback to make the product better. This approach works in small groups, but what do you do when your user base numbers in the tens of millions (or even hundreds)? Microsoft has a mechanism that they use called the Customer Experience Improvement Program (CEIP). You may have seen it if you installed a Microsoft product. Participation is entirely voluntary, and if you do participate, you send statistics to Microsoft that they can analyze to get a better feeling as to how you are using the system. The information provided is anonymous, untraceable, and no personal or confidential information is ever collected. Below are some examples as to what data is collected:

- Application usage

o How often are different commands used?

o Which keyboard shortcuts are used most often?

o How long are applications run?

o How often is a specific feature used?

- Quality metrics

o How long does an application run without crashing (mean time to failure)?

o Which error dialog boxes are commonly displayed?

o How many customers experience errors trying to open a document?

o How long does it take to complete an operation?

o What percentage of users is unable to sign in to an Internet service?

- Configuration

o How many people run in high-contrast color mode?

o What is the most common processor speed?

o How much space is available on the average person’s hard disk?

o What operating system versions do most people run the application on?

Based on this information, the test teams at Microsoft are able to see how customers actually use the products and make their game plans accordingly.
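To make a couple of those questions concrete, here is a minimal Python sketch of how anonymous session records could be rolled up into "mean time to failure" and "how often are different commands used". The `Session` record shape and the sample values are my own assumptions for illustration; the book does not describe how CEIP actually stores or processes its data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    """One anonymous usage session (hypothetical record shape, not CEIP's)."""
    minutes_run: float      # how long the application ran
    crashed: bool           # whether the session ended in a crash
    commands: List[str]     # commands invoked during the session


def mean_time_to_failure(sessions: List[Session]) -> Optional[float]:
    """Total run time divided by the number of crashes; None if nothing crashed."""
    crashes = sum(1 for s in sessions if s.crashed)
    if crashes == 0:
        return None
    return sum(s.minutes_run for s in sessions) / crashes


def command_usage(sessions: List[Session]) -> Counter:
    """Count how often each command was used across all sessions."""
    counts = Counter()
    for s in sessions:
        counts.update(s.commands)
    return counts


# Made-up sample data, purely for illustration.
sessions = [
    Session(120.0, False, ["open", "save"]),
    Session(30.0, True, ["open", "print"]),
    Session(90.0, True, ["open"]),
]

print(mean_time_to_failure(sessions))          # 240 minutes over 2 crashes
print(command_usage(sessions).most_common(3))  # most-used commands first
```

The point is less the arithmetic than the shape of the data: nothing in a record like this identifies a person, yet it is enough to answer the usage and quality questions in the list above.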

More Info: For more information about CEIP, see the Microsoft Customer Experience Improvement Program page at

Customer-driven Testing

Our group built a bridge between the CEIP data that we were receiving and incoming data from Microsoft Windows Error Reporting (WER) from our beta customers. We monitored the data frequently and used the customer data to help us understand which errors customers were seeing and how they correlated with what our testing was finding. We put high priority on finding and fixing the top 10 issues found every week. Some of these were new issues our testing missed, and some were things we had found but were unable to reproduce consistently. Analyzing the data enabled us to increase our test coverage and to expose bugs in customer scenarios that we might not have found otherwise.

—Chris Lester, Senior SDET
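As a rough illustration of the kind of weekly "top 10" triage Chris describes, here is a small Python sketch. It assumes each customer crash arrives as a string signature and that the team keeps a set of signatures already known from internal testing; both of those, and the function name, are hypothetical and not taken from the book.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_customer_issues(
    customer_hits: Iterable[str],
    known_from_testing: Set[str],
    n: int = 10,
) -> List[Tuple[str, int, bool]]:
    """Return the n most-reported signatures as (signature, hits, found_by_testing)."""
    counts = Counter(customer_hits)
    return [
        (signature, hits, signature in known_from_testing)
        for signature, hits in counts.most_common(n)
    ]


# Made-up signatures for illustration only.
week = ["render!Draw", "render!Draw", "render!Draw", "net!Send", "net!Send", "io!Read"]
for signature, hits, known in top_customer_issues(week, {"net!Send"}, n=2):
    status = "already found in test" if known else "missed by testing"
    print(f"{signature}: {hits} hits ({status})")
```

Entries flagged as missed are the interesting ones for a test team, since they point at customer scenarios the existing tests never exercised.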

Games, Too!

This customer experience approach isn’t just on Windows and Office. It extends to the Xbox and PC gaming realm as well. VINCE (Verification of Initial Consumer Experience) is a tool that is used widely on the Xbox and Xbox 360. Beta users of particular games can share their experiences and provide feedback as to how challenging a particular game level is, using just the game controller and a quick survey. Microsoft specifically used this feedback to help develop Halo 2, arguably one of the biggest titles in Xbox franchise history. The team was able to get consumer feedback on each of the encounters in the game (over 200 total) at least three times. Overall, more than 2,300 hours of gameplay feedback from more than 400 participants was gathered.

VINCE is also able to capture video of areas and show the designers potential issues that can hinder gameplay and advancement. Halo 2 used this information for an area that was deemed especially difficult, and by analyzing the video and the customer feedback they were able to help tailor the area and encounters to still be challenging but with a realistic chance of working through the level.

Customer usage data is valuable for any software application, in that it allows the developers to see things from the user’s perspective and helps to fill in the blanks of scenarios that they might not have considered. Adding instrumentation to help provide this feedback has been a boon for Microsoft and has helped shape their design, development and testing strategies.

Windows Error Reporting

Just about every Windows user at one point or another has seen the “this program has encountered an error and needs to close” dialog box. If you are one who routinely hits the “Send Error Report” button, do you ever wonder what happens with that report? Microsoft uses a reporting system called Windows Error Reporting (WER) and these dialogs help them to gather the details of when systems have problems. In later OS versions such as Windows Vista and Windows 7, there is no need for the dialog box. If an issue appears, the feedback can be sent automatically.

WER works a lot like a just-in-time (JIT) debugger. If an application doesn’t catch an error, then the Windows system catches it. Along with error reporting, the system captures data at the point of failure: process name and version, loaded modules, and call stack information.

The flow goes like this:

1. The error event occurs.

2. Windows activates the WER service.

3. WER collects basic crash information. If additional information is needed, the user might be prompted for consent.

4. The data is uploaded to Microsoft (if the user is not connected to the Internet, the upload is deferred).

5. If an application is registered for automatic restart (using the RegisterApplicationRestart function available on Windows Vista), WER restarts the application.

6. If a solution or additional information is available, the user is notified.
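For readers who think better in code, here is a loose Python analogy of the first few steps above. To be clear, this is not the WER API and it doesn’t talk to Windows at all; it just uses Python’s standard `sys.excepthook` to catch an error the application didn’t handle and to collect the same kinds of data the chapter lists (process name, version, loaded modules, call stack). The `upload` callable is a hypothetical stand-in for whatever would send the report.

```python
import os
import sys
import traceback


def collect_crash_report(exc_type, exc, tb):
    """Gather data at the point of failure (an analogy, not what WER collects byte for byte)."""
    return {
        "process": os.path.basename(sys.argv[0]) or "python",
        "version": sys.version.split()[0],
        "error": exc_type.__name__,
        "loaded_modules": sorted(sys.modules),
        # Oldest call first, the failing function last.
        "call_stack": [
            f"{frame.filename}:{frame.lineno} in {frame.name}"
            for frame in traceback.extract_tb(tb)
        ],
    }


def install_hook(upload):
    """Send uncaught exceptions to `upload` (hypothetical), then report them as usual."""
    def hook(exc_type, exc, tb):
        upload(collect_crash_report(exc_type, exc, tb))
        sys.__excepthook__(exc_type, exc, tb)
    sys.excepthook = hook


# Example: build a report from a caught exception without installing the hook.
def _boom():
    raise ValueError("something went wrong")

try:
    _boom()
except ValueError:
    report = collect_crash_report(*sys.exc_info())
    print(report["error"], report["call_stack"][-1])
```

The deferred upload, consent prompt, and restart steps are left out on purpose; they depend on operating system services that a few lines of Python can’t honestly imitate.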

Although WER automatically provides error reporting for crash, hang, and kernel faults, applications can also use the WER API to obtain information for custom problems not covered by the existing error reporting. In addition to the application restart feature mentioned previously, Windows Vista extends WER to support many types of noncritical events such as performance issues.

In addition to individuals providing this information, corporations can provide it as well from a centralized service. In many ways, this helps to ensure that company specific and trade secret details are not shared, and to prevent a potential leak of sensitive information. Microsoft ensures confidentiality on all of these transactions, but it’s cool that they offer an option that doesn’t require the companies to expose more than they have to for error reporting purposes.

Filling and Emptying the Buckets

Processing all of these details from potentially millions of instances of a crash or an issue would be daunting for individuals to handle on their own. Fortunately, this is something that computers and automation help with handily.

All of the specific details of a crash are analyzed and then they are sorted into buckets (specific errors associated with a specific driver, function, feature, etc.). These buckets allow the development team to prioritize which areas get worked on first. If there are enough instances of a crash or issue, bugs are automatically generated so that teams can work on the issues causing the most customer frustration.
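Here is a minimal Python sketch of the bucketing idea, assuming each crash report is a dictionary with `module` and `function` keys and that a bug gets filed once a bucket reaches some hit count. The keys, the threshold, and the function names are all my own illustration; the real bucketing rules are more involved than this.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BucketKey = Tuple[str, str]


def bucket_crashes(reports: List[dict]) -> Dict[BucketKey, List[dict]]:
    """Group crash reports by their (module, function) signature."""
    buckets: Dict[BucketKey, List[dict]] = defaultdict(list)
    for report in reports:
        buckets[(report["module"], report["function"])].append(report)
    return buckets


def buckets_needing_bugs(
    buckets: Dict[BucketKey, List[dict]], min_hits: int
) -> List[Tuple[BucketKey, int]]:
    """Return (bucket, hit count) for buckets with at least min_hits, biggest first."""
    hot = [(key, len(items)) for key, items in buckets.items() if len(items) >= min_hits]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Made-up reports for illustration.
reports = [
    {"module": "gfx.dll", "function": "Draw"},
    {"module": "gfx.dll", "function": "Draw"},
    {"module": "gfx.dll", "function": "Draw"},
    {"module": "net.dll", "function": "Send"},
]
for key, hits in buckets_needing_bugs(bucket_crashes(reports), min_hits=2):
    print(f"file bug for {key[0]}!{key[1]} ({hits} hits)")
```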

In many cases, trends develop and the situations being seen can be rendered down to a function, a .dll file or a series of items that can be fixed with a patch. In classic Pareto Principle fashion, fixing 20% of the problems amounts to fixing 80% of customers’ issues. Addressing just 1% of the bugs often fixes 50% of the reported issues.

Out of the total number of crashes experienced and reported, most can be whittled down to a small number of actual errors. By looking at the issues that cause the most crashes, testers and developers are able to focus on the problems that are causing the greatest pain, and in many cases, resolve most of the issues seen.
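The Pareto effect is easy to see once the buckets are sorted by size and you track the running share of all reported hits. The sketch below does exactly that; the hit counts are invented for the example and are not figures from the book.

```python
from typing import List


def cumulative_share(bucket_hits: List[int]) -> List[float]:
    """Sort hit counts largest first and return the running share of total hits."""
    ordered = sorted(bucket_hits, reverse=True)
    total = sum(ordered)
    if total == 0:
        return []
    shares, running = [], 0
    for hits in ordered:
        running += hits
        shares.append(running / total)
    return shares


# Invented counts: six buckets, 1,000 hits in total.
hits_per_bucket = [50, 500, 100, 200, 100, 50]
for rank, share in enumerate(cumulative_share(hits_per_bucket), start=1):
    print(f"top {rank} bucket(s): {share:.0%} of hits")
```

With these made-up numbers, the single largest bucket already accounts for half of all hits, which is the same shape as the "fix a few bugs, resolve most of the reports" claim above.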

WER information is especially helpful and effective during beta release and testing. Many product teams set goals regarding WER data collected during product development. Common goals include the following:

- Coverage method: When using the coverage method, groups target to investigate N percent (usually 50 percent) of the total hits for the application.

- Threshold method: Groups can use the threshold method if their crash curves (one is shown in Figure 13-6) are very steep or very flat. With flat or steep crash curves, using the previously described coverage method can be inappropriate because it can require too many or too few buckets to be investigated. A reasonable percentage of total hits for the threshold method is between 1 percent and 0.5 percent.

- Fix number method: The fix number method involves targeting to fix N number of defects instead of basing goals on percentages.
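The three goal-setting methods can be sketched as three ways of picking buckets from the same table of hit counts. This is my reading of the descriptions above, not code from the book. In particular, I am assuming the threshold method means "investigate every bucket that individually accounts for at least the given percentage of total hits", and all of the function names and sample numbers are hypothetical.

```python
from typing import Dict, List


def _largest_first(hits: Dict[str, int]) -> List[tuple]:
    return sorted(hits.items(), key=lambda pair: pair[1], reverse=True)


def coverage_method(hits: Dict[str, int], percent: float = 50.0) -> List[str]:
    """Take the biggest buckets until they cover `percent` of all hits."""
    total = sum(hits.values())
    chosen, running = [], 0
    for name, count in _largest_first(hits):
        if total == 0 or running * 100 >= percent * total:
            break
        chosen.append(name)
        running += count
    return chosen


def threshold_method(hits: Dict[str, int], percent: float = 1.0) -> List[str]:
    """Take every bucket holding at least `percent` of all hits (assumed meaning)."""
    total = sum(hits.values())
    return [
        name for name, count in _largest_first(hits)
        if total and count * 100 >= percent * total
    ]


def fix_number_method(hits: Dict[str, int], n: int) -> List[str]:
    """Take the n biggest buckets, regardless of percentages."""
    return [name for name, _ in _largest_first(hits)[:n]]


# Invented bucket table: 1,000 hits in total.
hits = {"A": 500, "B": 300, "C": 150, "D": 45, "E": 5}
print(coverage_method(hits, 50.0))
print(threshold_method(hits, 1.0))
print(fix_number_method(hits, 2))
```

On a steep curve like this invented one, the coverage method stops after a single bucket, which shows why the book suggests the threshold method for very steep or very flat crash curves.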

Test and WER

So what is test’s role in gathering and analyzing this data? Monitoring the collected and aggregated crash data and measuring progress are important. Getting to the bottom of what is causing the crash is also important. Understanding how to get to a crash, or using code analysis tools to see why the bug was missed in testing can help strengthen and tighten up testing efforts. Fixing bugs is good, but preventing them from happening is even better.

One of the benefits of exploring the crash data is that “crash patterns” can emerge, and when armed with crash patterns, testers can check whether other programs or applications run into the same difficulties.

More Info: For more information about WER, see the Windows Error Reporting topic on Microsoft MSDN at

Smile and Microsoft Smiles with You

When a developer has worked on a product at Microsoft, often the CEIP and WER data can provide information about which features are actually being used in a given product. The only problem is that this feedback mostly shows where something is going wrong. Wouldn’t it be great to have a system that also shares what the user loves about a product, or perhaps even shares what isn’t necessarily a crash or severe issue, but would be a nice improvement over what’s already there?

Microsoft has something that does this, and it’s called their Send a Smile program. It’s a simple tool that beta and other early-adopting users can use to submit feedback about Microsoft products.

After installing the client application, little smiley and frowny icons appear in the notification area. When users have a negative experience with the application that they want to share with Microsoft, they click the frowny icon. The tool captures a screen shot of the entire desktop, and users enter a text message to describe what they didn’t like. The screen shot and comment are sent to a database where members of the product team can analyze them.

This program is appreciated by many of the beta testers and early adopters, in that they can choose to send a smiley or frown with a given feature and quickly report an experience. The program is a relatively recent one, so not all products or platforms have it as of yet.

Although Send a Smile is a relatively new program, the initial benefits have been significant. Some of the top benefits include the following:

- The contribution of unique bugs: The Windows and Office teams rely heavily on Send a Smile to capture unique bugs that affect real consumers that were not found through any other test or feedback method. For example, during the Windows Vista Beta 1 milestone, 176 unique bugs in 13 different areas of the product were filed as a direct result of Send a Smile feedback from customers in the early adopter program. During Office 12 Beta 1, 169 unique bugs were filed as a result of feedback from customers in the Office early adopter program.

- Increased customer awareness: Send a Smile helps increase team awareness of the pain and joy that users experience with our products.

- Bug priorities driven by real customer issues: Customer feedback collected through Send a Smile helps teams prioritize the incoming bug load.

- Insight into how users are using our products: Teams often wonder which features users are finding and using, and whether users notice the “little things” that they put into the product. Send a Smile makes it easy for team members to see comments.

- Enhancing other customer feedback: Screen shots and comments from Send a Smile have been used to illustrate specific customer issues collected through other methods such as CEIP instrumentation, newsgroups, usability lab studies, and surveys.

Connecting with Customers

Alan relays how, early in his career, he used a beta version of Microsoft Test. While he felt it ran well enough for a beta product, he ultimately did run into some blockers that needed some additional help. This led him to a Compuserve account (hey, I remember those :) ) and an interaction on a forum where he was able to get an answer to his question. Newsgroups and forums are still a primary method of communication with customers. Rather than tell all about each, I’ll list them here:

- microsoft.public.* (USENET hierarchy): hundreds of newsgroups, with active participation from customers, developers

- Numerous forums sponsored directly at and under Microsoft.

- Numerous developers and testers blog under these areas.

- a somewhat social network approach to communicating with Microsoft.

Each of these areas allows the customers the ability to interact with and communicate with the developers and testers at Microsoft as well as other customers. This way the pain points that are occurring for many users can be discussed, addressed and, hopefully, resolved.

Much as we would like to believe that we as testers are often the customer’s advocates and therefore the closest example of the customer experience, we still do not have the opportunity to engage as deeply or as often in customer related activities as would be optimal. Still, testers need to be able to gather and understand issues specific to the customers using their product, quickly determine their pain points, and seek ways to help communicate them to others in the development hierarchy. By listening to customer feedback and understanding where the “pain points” are, we can generate better test cases, determine important risk areas, and give our attention to those areas that matter the most to our customers.
