Monday, December 27, 2010

BOOK CLUB: How We Test Software At Microsoft (10/16)

This is the second part of Section 3 in “How We Test Software at Microsoft”. This is one of the chapters I have been most interested in reading (intrigued is possibly a more apt description). This chapter focuses on Test Automation, and specifically how Microsoft automates testing. Note, as in previous chapter reviews, Red Text means that the section in question is verbatim (or almost verbatim) from what is printed in the actual book.

Chapter 10. Test Automation

Alan starts out this chapter with an explanation (and a re-assurance, in a way) that automation doesn't have to be such a scary thing. We've all done automation in one way, shape, form or another. If you have created or tweaked a batch file to run multiple commands, you have performed automation. When you do backups, or archive folders in Outlook, or create macros in Excel, you are creating automation. So at the core, most of us have already done the steps to automate things. Making a series of repeatable steps, however, is not really test automation. It's automation, to be sure, but test automation has a few more steps and some paradigms that are important to understand. More to the point, even with test automation, one size does not fit all, and different groups will use different processes to meet different needs.

The Value of Automation

Alan takes this debate straight-away and acknowledges that it is a contentious issue. He ultimately comes down on the side that testers are not shell scripts (let’s face it, if a company could realistically replace testers with shell scripts, it would). Testing and testers require the human brain to actually understand and evaluate the test results. Also, there are many areas where full automation is great and helpful, and some places where automation gets in the way of actually doing good work (or, for that matter, is just not practical to automate).

To Automate or Not to Automate, That Is the Question

Generally speaking, one of the best rules of thumb for determining which tests might be candidates for automation is "how often will this test be run"? If a test is only going to be run one time, it's probably a low-value candidate for automation. Perhaps even ten or twenty runs may not make it a candidate. However, if it's a test you will perform hundreds or even thousands of times, then automation becomes much more valuable as a testing tool.
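To make the run-count rule of thumb concrete, here is a back-of-the-envelope sketch (my own, not from the book; all of the numbers are invented) of when automation pays for itself:

```python
import math

# Back-of-the-envelope ROI check for automating a test.
# All numbers below are illustrative assumptions, not figures from the book.

def break_even_runs(manual_minutes, automated_minutes, automation_cost_minutes):
    """Return the number of runs after which automation pays for itself."""
    saving_per_run = manual_minutes - automated_minutes
    if saving_per_run <= 0:
        return None  # automation never pays off
    return math.ceil(automation_cost_minutes / saving_per_run)

# A 15-minute manual test that takes 1 minute automated, with
# 8 hours (480 minutes) of effort to write the automation:
print(break_even_runs(15, 1, 480))  # 480 / 14 -> 35 runs
```

Run once or twice, the automation is a loss; run every build for a year, it is an obvious win, which is exactly the chapter's point.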

Beyond times executed, Alan suggests the following as attributes that will help determine if a test is a candidate for automation:

  • Effort: Determining the effort or cost is the first step in determining the return on investment (ROI) of creating automated tests. Some types of products or features are simple to automate, whereas other areas are inherently problematic. For example, application programming interface (API) testing, as well as any other functionality exposed to the user in the form of a programming object, is more often than not straightforward to automate. User interface (UI) testing, on the other hand, can be problematic and frequently requires more effort.

  • Test lifetime: How many times will an automated test run before it becomes useless? Part of the process of determining whether to automate a specific scenario or test case includes estimating the long-term value of the test. Consider the life span of the product under test and the length of the product cycle. Different automation choices must be made for a product with no planned future versions on a short ship cycle than for a product on a two-year ship cycle with multiple follow-up releases planned.

  • Value: Consider the value of an automated test over its lifetime. Some testers say that the value of a test case is in finding bugs, but many bugs found by automated tests are only found the first time the test is run. Once the bug is fixed, these tests become regression tests—tests that show that recent changes do not cause previously working functionality to stop working. Many automation techniques can vary data used by the test or change the paths tested on each run of the test to continue to find bugs throughout the lifetime of the test. For products with a long lifetime, a growing suite of regression tests is an advantage—with so much complexity in the underlying software, a large number of tests that focus primarily on making sure functionality that worked before keeps on working is exceptionally advantageous.

  • Point of involvement: Most successful automation projects I have witnessed have occurred on teams where the test team was involved from the beginning of the project. Attempts at adding automated tests to a project in situations where the test team begins involvement close to or after code complete usually fail.

  • Accuracy: Good automation reports accurate results every time it runs. One of the biggest complaints from management regarding automated tests is the number of false positives automation often generates. (See the following sidebar titled "Positively false?") False positives are tests that report a failure, but the failure is caused by a bug somewhere in the test rather than a product bug. Some areas of a project (such as user interface components that are in flux) can be difficult to analyze by automated testing and can be more prone to reporting false positives.

Positively False?

The dangers of test automation are amplified when we do not take into account false positives or false negatives. (Using Alan’s non-software example: convicting someone of a crime they didn’t commit is a false positive; letting someone go for a crime they did commit is a false negative.)

We Don’t Have Time to Automate This

Alan describes the process of using Microsoft Test to write UI automation. His first job at Microsoft focused on networking functionality for Japanese, Chinese, and Korean versions of Windows 95. When he was ready to start automating the tests and asked when they needed to be finished, he recalls vividly hearing the words "Oh, no, we don’t have time to automate these tests, you just need to run them on every build."

Thinking the tests must have been really difficult to automate, he went back and started running the manual test cases. After a few days, the inevitable boredom set in and, with it, the just-as-inevitable missing of steps. "Surely, a little batch file would help me run these tests more consistently?" Within fifteen minutes, some additional batch files answered that question handily. He applied the same approach to the UI tests and found he was able to automate them after all, saving weeks of time and freeing him up to look at other areas that would otherwise never get consideration due to the time needed to finish the manual tests. At least a hundred bugs were found simply because Alan automated a series of tests he had been told couldn't or wouldn’t be automated.

User Interface Automation

APIs and other forward-facing functions are good candidates for automation, but most of the time the tools and promotional materials we as testers are most familiar with are those that automate at the user interface level. As someone who used to participate in the music industry, one of the coolest things you would ever see was a mixing board with “flying faders” that seemed to move up and down of their own accord based on the timing of the music (most often seen at what was referred to as “mix down”).

User interface automation provides that same effect. It has the “wow” factor, that “pop” that gets people’s attention, and in many ways it really excites testers because it provides automation at the level where testers and users are most likely to apply their energies interacting with a product.

Over the past decade and a half, tools that could record and play back sequences of tests at the user interface level have been very popular, but somewhere in the process of recording and playing back tests, problems ensue. The biggest complaint about this paradigm is that the tests are “brittle”, meaning any change to the way the software is invoked, or any trivial change to the interface or the area where a test runs, could cause the tests to fail.

Rather than driving the UI directly by simulating mouse clicks and keystrokes, Microsoft often uses methods that bypass the presentation layer of the UI and interact with the underlying objects instead. Simulated mouse clicks and keystrokes are the closest match to the way the user interacts with the program, but they are also the most prone to errors. The alternative approach is to automate the underlying action represented by the click or keystroke. This interaction occurs through what is referred to as an ”object model”. This way, tests can manipulate any portion of the UI without directly touching the specific UI controls.
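As a toy illustration of the difference (the `Document` class below is entirely invented for this sketch, not any real Microsoft object model), an object-model test drives the action behind the control rather than the control itself:

```python
# Hypothetical object model for an application under test.
# Invented for illustration; not a real framework.
class Document:
    def __init__(self):
        self.saved = False
        self.text = ""

    def type_text(self, text):
        # The underlying action behind keystrokes in the edit box.
        self.text += text

    def save(self):
        # The underlying action behind clicking File > Save.
        self.saved = True

# Object-model test: exercise the action behind the UI, not the pixels.
# It keeps working even if the Save button moves or is renamed.
def test_save_marks_document_saved():
    doc = Document()
    doc.type_text("hello")
    doc.save()
    assert doc.saved and doc.text == "hello"

test_save_mark = test_save_marks_document_saved()
print("object-model test passed")
```

The trade-off, as the "brute force" story below shows, is that this style never touches the real controls, so bugs in the presentation layer itself can slip past it.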

Microsoft Active Accessibility (MSAA) is another approach to writing automation. MSAA utilizes the “IAccessible” interface, which allows testers to get information about specific UI elements. Buttons, text boxes, list boxes, and scroll bars all utilize the IAccessible interface. By pointing to the IAccessible interface using various function calls, automated tests can use methods to get information about various controls or make changes to them.

With the release of .NET 3.0, Microsoft UI Automation is the new accessibility framework for all operating systems that support Windows Presentation Foundation (WPF). UI Automation exposes all UI items as an AutomationElement. Microsoft testers write automated UI tests using variations on these themes.

Brute Force UI Automation

While using the various models to drive the UI can save time and effort, and even make for robust automated test cases, they are not in and of themselves foolproof. It’s possible to miss critical bugs if these are the only methods employed.

While testing a Windows CE device, Alan decided to try some automated tests to run overnight to find issues that might take days or weeks to appear for a user. In this case there wasn’t an object model for the application, so he applied “brute force UI automation” to find each of the windows on the screen that made up the application. By writing code that centered the mouse over the given window and sending a “click”, he was able to make a simple application that could connect to a server, verify the connection was created, and then terminate the terminal server session.

After some time, he noticed the application had crashed after running successfully. After debugging, it was determined that there was a memory leak in the application that required a few hundred connections before it would manifest.

In this case, the difference was the fact that he focused on automating the UI directly instead of the underlying objects. Had he focused on the underlying objects and not the specific Windows interface elements, this issue might never have been caught. Key takeaway: sometimes directly accessing the UI at the user level is the best way to run a test, so keep all options open when considering how to automate tests.

What’s in a Test?

Test automation is more than just running a sequence of steps. The environment has to be in a state where the test can be run. After running the test, criteria must be examined to confirm a PASS/FAIL, and test results must be saved for review and analysis. Tests also need to be broken down so that the system is in the proper state to rerun the test or run the next one. Additionally, the test needs to be understandable so that it can be maintained or modified if necessary.

There are many tests where an actively thinking and analyzing human being can determine what’s happening in a system much more effectively than the machine itself can. Even with that, automated tests can save lots of time and, by extension, money.

Keith Stobie and Mark Bergman describe the components of test automation in their 1992 paper "How to Automate Testing: The Big Picture" in terms of the acronym SEARCH:


  • Setup: Setup is the effort it takes to bring the software to a point where the actual test operation is ready for execution.

  • Execution: This is the core of the test—the specific steps necessary to verify functionality, sufficient error handling, or some other relevant task.

  • Analysis: Analysis is the process of determining whether the test passes or fails. This is the most important step—and often the most complicated step of a test.

  • Reporting: Reporting includes display and dissemination of the analysis, for example, log files, database, or other generated files.

  • Cleanup: The cleanup phase returns the software to a known state so that the next test can proceed.

  • Help: A help system enables maintenance and robustness of the test case throughout its life.

Oftentimes, the entire approach is automated, so all steps are performed by the automation procedures. At other times, the automation may only go so far and then stop, allowing the tester to continue the rest of the way manually. An example of this is the tool Texter. Texter allows the user to script text for fillable forms and fields, then take control and walk through the test themselves, calling on other Texter scripts when needed to fill in large amounts of data that would be tedious to enter manually, while still keeping complete manual control of the application’s execution.

Many automation efforts fail because test execution is all that is considered, when there are many more steps to take into account: setting up the test and preparing the environment; running the test; gathering the information needed to determine a PASS/FAIL; storing the results in a report or log for analysis; breaking down the test to return the machine to a state where other tests can run (or back to its initial state); and finally a reporting mechanism to show exactly which case was run, whether it passed or failed, and how that relates to the other tests.
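A minimal sketch of those phases, following the SEARCH breakdown above (my own illustration, not code from the book), might look like this for a trivial file-copy "feature":

```python
import os
import tempfile

# Minimal sketch of the SEARCH phases (Setup, Execution, Analysis,
# Reporting, Cleanup) around a trivial file-copy operation under test.

def run_test():
    results = {}

    # Setup: bring the environment to a known state.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in.txt")
    dst = os.path.join(workdir, "out.txt")
    with open(src, "w") as f:
        f.write("payload")

    # Execution: the operation under test.
    with open(src) as f, open(dst, "w") as g:
        g.write(f.read())

    # Analysis: compare the actual state against the expected state.
    with open(dst) as f:
        passed = f.read() == "payload"

    # Reporting: persist a result the harness can aggregate later.
    results["copy_test"] = "PASS" if passed else "FAIL"

    # Cleanup: return the machine to a state ready for the next test.
    for path in (src, dst):
        os.remove(path)
    os.rmdir(workdir)
    return results

print(run_test())  # {'copy_test': 'PASS'}
```

The Help phase is the one thing code can't show: naming, comments, and documentation that keep the test maintainable over its lifetime.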

Alan included an extensive layout of a series of tests that covers the entire SEARCH acronym. Due to the amount of space it would take to walk through the entire example, I have omitted it from this review, instead choosing to focus on some areas that interested me (see below). For those who wish to check out the entire example, please reference the book. It’s very thorough :)

I Didn’t Know You Could Automate That

Sometimes automation of certain tasks can range from very difficult to downright impossible. However, before giving up, it may be worth investigating further to see what you might be able to do.

An example is using PCMCIA devices and simulating plugging in and removing them, a task that would be challenging to perform manually and impossible to automate without a “robot”… that is, if the test actually required plugging in physical devices over and over. Microsoft utilizes a device called a "PCMCIA Jukebox" which contains six PCMCIA slots and a serial connection. By using this device they could, through software, turn a device on or off and simulate the insertion and removal of a particular device. Using these jukeboxes and other “switching” tools allowed the testers to simulate and automate tests where physical insertion or removal would be practically impossible.

The Challenge of Oracles

When determining if a test passes or fails, the program needs to be able to compare the results with a reference. This reference is called an “Oracle”. When tests are run manually, it’s often very easy to determine if a test passes or fails, because we are subconsciously comparing the state of the test and the results with what we know the appropriate response should be. A computer program that is automating a test doesn’t have the ability to do that unless it is explicitly told to, so it needs to reference an Oracle directly. Whether it be a truth table, a function call or some other method of comparison, to determine if a test passes or fails, there has to be a structure in place to help make that determination.
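A tiny sketch of the idea (the function names are my own invention): the oracle is an independent computation the test trusts, against which the product's output is compared.

```python
# A simple oracle: an independent reference computation that the
# automated test compares the product's output against.

def product_sort(items):
    # Stand-in for the code under test.
    return sorted(items)

def oracle(items):
    # A deliberately different, naive implementation: repeatedly
    # extract the minimum. Slow, but independent of product_sort,
    # so a bug in one is unlikely to hide a bug in the other.
    remaining, out = list(items), []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        out.append(smallest)
    return out

data = [3, 1, 2]
assert product_sort(data) == oracle(data), "product output disagrees with oracle"
print("pass")
```

The hard part in real products is that a trustworthy, independent oracle often doesn't exist, which is why Alan calls this a challenge.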

Determining Pass/Fail

Examples of test reporting also go beyond simple Pass/Fail. Below are some common test results that need to be considered in any test automation scenarios:

  • Pass: The test passed.

  • Fail: The test failed.

  • Skip: Skipped tests typically occur when tests are available for optional functionality. For example, a test suite for video cards might include tests that run only if certain features are enabled in hardware.

  • Abort: The most common example of an aborted result occurs when expected support files are not available. For example, if test collateral (additional files needed to run the test) are located on a network share and that share is unavailable, the test is aborted.

  • Block: The test was blocked from running by a known application or system issue. Marking tests as blocked (rather than failed) when they cannot run as a result of a known bug keeps failure rates from being artificially high, but high numbers of blocked test cases indicate areas where quality and functionality are untested and unknown.

  • Warn: The test passed but indicated warnings that might need to be examined in greater detail.
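These states could be modeled as a simple enumeration; a sketch of my own (not from the book) that also captures the point about blocked tests and failure rates:

```python
from enum import Enum

# Result states beyond simple Pass/Fail, as described above.
class TestResult(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip"    # optional functionality not present
    ABORT = "abort"  # required test collateral unavailable
    BLOCK = "block"  # a known bug prevents the test from running
    WARN = "warn"    # passed, but with warnings worth a closer look

def counts_against_failure_rate(result):
    """Blocked tests should not artificially inflate the failure rate."""
    return result is TestResult.FAIL
```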


In many cases, log files that are created *are* the report, and for certain projects that is sufficient. For larger projects however, the ability to aggregate those log files or condense them to make a more coherent or abbreviated report is essential to understanding or analyzing the results of the tests.

One method is to go through and pull out key pieces of data from the log files and append them to another file in a formatted way, simplifying and consolidating the results into an easier-to-view report. From this output a determination can be made as to which tests passed, which failed, which were skipped, which were aborted, and so on.
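A minimal sketch of that kind of consolidation (the `RESULT:` log format here is invented purely for illustration):

```python
import re
from collections import Counter

# Condense raw per-test log lines into a summary report.
# The "RESULT: <name> <status>" format is an invented example.
raw_logs = """\
RESULT: streaming_test PASS
RESULT: buffer_test FAIL
RESULT: codec_test SKIP
RESULT: upload_test PASS
"""

def summarize(log_text):
    counts = Counter()
    for match in re.finditer(r"^RESULT: (\S+) (\w+)$", log_text, re.M):
        counts[match.group(2)] += 1
    return dict(counts)

print(summarize(raw_logs))  # {'PASS': 2, 'FAIL': 1, 'SKIP': 1}
```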

Putting It All Together

An automation system contains lots of moving parts. Apart from the test harness itself, there are automated steps to retrieve tests from the test management system. Those tests are then mapped to existing test scripts or executables that are run to execute the automation. Computers and devices must be in place to run the tests. The results are parsed, reports are made and posted where appropriate, and everything is stored back to the test case manager.

Alan points out that as of this writing, there are more than 100,000 computers at Microsoft dedicated to automated testing.

Large-Scale Test Automation

Many of the automation initiatives at Microsoft are start-to-finish processes:

  1. A command executed from the command line, a Web page, or an application.
  2. The TCM constructs a list of tests to run.
  3. Test cases are cross-referenced with binaries or scripts that execute the test.
  4. A shared directory on one of the database servers or on another computer in the system may contain additional files needed for the tests to run.
  5. The automation database and test case manager contact test controllers, which configure the test computers to run the specified tests.
  6. Once preparation of the test computer is complete, the test controller runs the test.
  7. The test controller waits for the test to complete (or crash), and then obtains the log file from the test computer.
  8. The controller parses the log file for results or sends the log file back to the TCM or another computer to be parsed.
  9. After the log files are examined, a test result is determined and recorded.
  10. The file is saved and results are recorded.
  11. Some systems will report test failures directly to the bug tracking system.
  12. Results are presented in a variety of reports, and the test team examines the failures.
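A toy sketch of the heart of that flow (all names and the in-memory "test case manager" are invented for illustration; the real systems span databases, controllers, and fleets of test computers):

```python
# Toy sketch of the start-to-finish flow above: a test case manager
# hands cases to a controller, which runs them and records results.
# Everything here is an invented stand-in for the real infrastructure.

TCM = {  # test case manager: case name -> callable standing in for a script
    "login_test": lambda: True,
    "logout_test": lambda: False,
}

def controller(case_names):
    results = {}
    for name in case_names:
        script = TCM[name]      # step 3: map the case to its script
        passed = script()       # steps 5-7: run on a test computer
        results[name] = "PASS" if passed else "FAIL"  # steps 8-10: parse, record
    return results

print(controller(["login_test", "logout_test"]))
# {'login_test': 'PASS', 'logout_test': 'FAIL'}
```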

Common Automation Mistakes

Test automation is as much of a coding discipline as software development, and as such, shares many of the benefits and disadvantages of the software development life cycle. In short, testers writing automation code make mistakes and create bugs in automation code, too.

Production code is tested, but this raises the question: who tests the test code? Some could say that repeated test runs and the ability to complete them verify that the test code is working properly… but is it? The goal of nearly every Microsoft team is for test code to have the same quality as the production code it is testing. How do the SDETs do that?

Alan includes the following common errors seen when writing test code:

  • Hard-coded paths: Tests often need external files during test execution. The quickest and simplest method to point the test to a network share or other location is to embed the path in the source file. Unfortunately, paths can change and servers can be reconfigured or retired. It is a much better practice to store information about support files in the TCM or automation database.

  • Complexity: The complexity issues discussed in Chapter 7 are just as prevalent in test code as they are in production code. The goal for test code must be to write the simplest code possible to test the feature sufficiently.

  • Difficult Debugging: When a failure occurs, debugging should be a quick and painless procedure—not a multi-hour time investment for the tester. Insufficient logging is a key contributor to making debugging difficult. When a test fails, it is a good practice to log why the test failed. "Streaming test failed: buffer size expected 2048, actual size 1024" is a much better result than "Streaming test failed: bad buffer size" or simply "Streaming test failed." With good logging information, failures can be reported and fixed without ever needing to touch a debugger.

  • False positives: A tester investigates a failure and discovers that the product code is fine, but a bug in her test caused the test to report a failure result. The opposite of this, a false negative, is much worse—a test incorrectly reports a passing result. When analyzing test results, testers examine failures, not passing tests. Unless a test with a false negative repeats in another test or is caught by an internal user during normal usage, the consequences of false negatives are bugs in the hands of the consumer.
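To illustrate the logging point above, a small sketch (the buffer sizes are the book's example values, but the function itself is my own invention): the failure message carries the expected and actual values, so the result can be diagnosed without a debugger.

```python
# Illustrating the logging advice above: report expected vs. actual
# values so failures can be diagnosed without attaching a debugger.

def check_buffer_size(actual, expected=2048):
    if actual != expected:
        return ("Streaming test failed: buffer size "
                f"expected {expected}, actual size {actual}")
    return "Streaming test passed"

print(check_buffer_size(1024))
# Streaming test failed: buffer size expected 2048, actual size 1024
```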

Automated testing is considered a core competency at Microsoft. The products under test are large and complex, and testing them all without significant automation would range from very difficult to practically impossible, even with their vast number of testers. Significant automation efforts are required, and are thus in place. Scaling automation and making repeated use of automated test cases is very important to Microsoft’s long-term strategy. With potentially thousands of configuration options and dozens of supported languages, the more robust and extensible the automation platforms, the more testing can be performed automatically, freeing up the testers for more specific and targeted testing where automation is impractical if not impossible.


Anonymous said...

Great post, thanks!

Which tools do you use to automate?
Inhouse, QTP,SilkTest,...?

Michael Larsen said...

Me personally? I use a mixture of Selenium and Java/Junit, though I'm shifting over to using Ruby/RSpec, Selenium and Cucumber, since that's what we use for TDD where I'm at right now. Got a lot to learn and work with before I can say I've mastered the test automation side of things, but thankfully, I'm back in a UNIX environment where automating rolls a bit easier (I find it does anyway :) ).