This is the fourth part of Section 3 in “How We Test Software at Microsoft”. First off, my apologies for the delay between chapter posts. I had to finish up some work as I was changing jobs, and much as I love this blog, it had to take a back seat to other things that I had to complete. This chapter focuses on other testing tools, along with the build process and the tools that facilitate it. Note, as in previous chapter reviews, Red Text means that the section in question is verbatim (or almost verbatim) as to what is printed in the actual book.
Chapter 12: Other Tools
Alan starts off this chapter with the analogy of a carpenter and the need for a variety of tools for the carpenter to do their job. It's important for that carpenter to not just have them but know intimately how to use all of them to best do the job necessary. Detective shows like CSI likewise use various tools to help solve crimes and uncover the truth (well, on TV at least ;) ).
Tools for testing and developing software are all over the place. They cover areas such as physically running tests, probing the system, and tracking testing progress, and they provide automated and "computer aided testing" assistance in lots of areas (an exhaustive list might uncover dozens, if not closer to a hundred, tools for these purposes).
In this chapter, Alan discusses a few additional tools that are part of a tester's everyday life at Microsoft.
Churn describes the process and amount of changes that are applied to a file or a code module over a particular period of time. To determine the amount of code churn, it is helpful to consider the following:
Count of Changes: The number of times a file has been changed
Lines Added: The number of lines added to a file after a specified point
Lines Deleted: Total number of lines deleted over a selected period
Lines Modified: Total number of lines modified over a selected period
Microsoft Visual Studio Team System calculates a Total Churn metric by summing the total of Lines Added, Lines Deleted, and Lines Modified.
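To make the arithmetic concrete, here is a minimal sketch of the Total Churn sum described above, plus a ranking of files by that sum. The `FileChurn` class, the `rank_by_churn` helper, the file names, and the numbers are all made up for illustration; this is not how Visual Studio Team System stores or reports the data.

```python
from dataclasses import dataclass


@dataclass
class FileChurn:
    """Change counts for one file over a chosen period (hypothetical structure)."""
    path: str
    change_count: int    # number of times the file was changed
    lines_added: int
    lines_deleted: int
    lines_modified: int

    @property
    def total_churn(self) -> int:
        # Total Churn = Lines Added + Lines Deleted + Lines Modified
        return self.lines_added + self.lines_deleted + self.lines_modified


def rank_by_churn(files):
    """Return files sorted from highest to lowest total churn."""
    return sorted(files, key=lambda f: f.total_churn, reverse=True)


# Made-up numbers, purely for illustration.
history = [
    FileChurn("parser.c", change_count=12, lines_added=140, lines_deleted=60, lines_modified=85),
    FileChurn("ui.c", change_count=3, lines_added=20, lines_deleted=5, lines_modified=10),
    FileChurn("net.c", change_count=7, lines_added=90, lines_deleted=30, lines_modified=40),
]

for f in rank_by_churn(history):
    print(f"{f.path}: total churn {f.total_churn} over {f.change_count} changes")
```

The files at the top of a list like this are the candidates for a closer look, not proof that anything is wrong with them.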
Code churn can give the tester an idea of where more bugs are likely to be found. Typically, code is changed for only two reasons: writing new code to add features, or changing existing code to fix bugs. Often, in complex software systems, fixing one bug leads to introducing another one, which requires more code changes which... well, you get the point.
Is code churn in and of itself an indication that there are problems in the code? Maybe, then again maybe not. Still, frequently changing areas are more likely to be unstable, so they deserve a closer look.
Keeping It Under Control
Microsoft's testers use Source Control Management (SCM) just as regularly as the development staff do, and consider it just as important a tool. One of the main uses of source control for the test teams is tracking changes made to test tools and test automation. Some test tools span the entire company, so keeping those tools in sync is important for a number of teams. Changes to testing tools are just as prone to introducing bugs as changes to product code are. The main difference is that the test teams are the stakeholders, rather than external customers. Still, test code is software every bit as much as product code is.
One common benefit is the creation of a "snapshot" of a particular point in time of the application's development. By understanding all of the tests in place, say, when an application was released to manufacturing, all of the tests in use up to that time can then be used as a baseline for further tests related to maintaining an application. In essence, this is a way of creating a regression test for a suite of tests already in use (software can have regression issues, and so can test cases and test code).
One of the most powerful features is file comparison. SCM systems support viewing two versions of a file side by side and highlighting the differences. This can help to show where code errors may exist by drawing attention to the changes made to the code. SCM can also be applied to other documents as well as to source code. Requirements, specifications, even the pages of HWTSAM were written and reviewed within an SCM system.
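As a rough illustration of what a file comparison surfaces, the sketch below uses Python's standard `difflib` module to print only the lines that differ between two versions of a file. The two versions and the `changed_lines` helper are hypothetical; a real SCM system has its own diff viewer.

```python
import difflib

# Two hypothetical versions of the same small source file.
old_version = [
    "int get_size(int n)",
    "{",
    "    return n;",
    "}",
]
new_version = [
    "int get_size(int n)",
    "{",
    "    return n * 2;",
    "}",
]


def changed_lines(old, new):
    """Return only the added ('+') and removed ('-') lines of a unified diff."""
    diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))
    body = diff[2:]  # skip the two '---' / '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


for line in changed_lines(old_version, new_version):
    print(line)
```

The diff shows what changed; it says nothing about why, which is where check-in notes and bug IDs come in.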
SCM is helpful when it comes to what changed, but often it will not tell the rest of the story, i.e. why was it changed in the first place? Alan shows an example where a return code was changed so that it returned the value * 2. So why was this change implemented? If there is a standard for comments in code, then the comments may explain it, but if not, then the SCM can tell who made the changes and when, usually with specific notes explaining what was changed. Sometimes this is helpful with recent changes, but what about developers who may be long gone from a project or even the company? In this case, the SCM may house additional information, such as developer comments and bug IDs that can be matched to the comments associated with the fix applied.
The biggest challenge Alan mentioned was that, at least in the earlier days, there were many systems used, and they rarely talked to one another. In addition, the use of the SCM was informal at best when it came to the testing teams. The files were often manually copied to shares so other teams could access them, and most of the time this approach worked, but at times, it didn't because files didn't copy or were lost.
As the various test teams began to bring their various server resources together under one roof, they decided to make the system a little more structured so it could be backed up, maintained and managed better. With these changes came the storing of test code alongside and in conjunction with the source code for the product under test (treating test assets on par with development assets). This way, product code and test code can be maintained together and used by multiple groups if necessary, in addition to propagating test binaries to machines as needed.
The daily build is an integral part of the development and test life at Microsoft (as it is at many organizations). Source control, bug management, and test runs/test passes are all worked into the build process, which at Microsoft flows as follows:
- Perform pre-build (sync code and tools)
- Perform compile and code analysis
- Conduct check-in tests
- Private and/or Buddy Build
- Clean Setup and Test
- Create a checkpoint release (copy source and components) --> Self-Test Build
- Conduct build verification tests
- Conduct post-build (cleanup)
- Create release build (copy source and binaries) --> Self-Host Build
The Daily Build
The daily build is exactly what it sounds like: the entire product is built at least daily to make sure that all components can compile, all executables can be created, and all install scripts can be run. In addition, Continuous Integration (continuous builds performed with frequent code check-ins) is actively supported in Agile environments.
Note: The Windows Live build lab creates more than 6,000 builds every week.
Test teams usually run a suite of smoke tests. Most often, these are known as build acceptance tests (BATs) or build verification tests (BVTs). A good set of BVTs ensures that the daily build is usable for testing.
- Automate Everything: BVTs run on every single build, and they need to run the same way every time. If you have only one automated suite of tests for your entire product, it should be your BVTs.
- Test a Little: BVTs are non-all-encompassing functional tests. They are simple tests intended to verify basic functionality. The goal of the BVT is to ensure that the build is usable for testing.
- Test Fast: The entire BVT suite should execute in minutes, not hours. A short feedback loop tells you immediately whether your build has problems.
- Fail Perfectly: If a BVT fails, it should mean that the build is not suitable for further testing, and that the cause of the failure must be fixed immediately. In some cases, there can be a workaround for a BVT failure, but all BVT failures should indicate serious problems with the latest build.
- Test Broadly—Not Deeply: BVTs should cover the product broadly. They definitely should not cover every nook and cranny, but should touch on every significant bit of functionality. They do not (and should not) cover a broad set of inputs or configurations, and should focus as much as possible on covering the primary usage scenarios for key functionality.
- Debuggable and Maintainable: In a perfect world, BVTs would never fail. But if and when they do fail, it is imperative that the underlying error can be isolated as soon as possible. The turnaround time from finding the failure to implementing a fix for the cause of the failure must be as quick as possible. The test code for BVTs needs to be some of the most debuggable and maintainable code in the entire product to be most effective. Good BVTs are self-diagnosing and often list the exact cause of error in their output. Great BVTs couple this with an automatic source control lookup that identifies the code change with the highest probability of causing the error.
- Trustworthy: You must be able to trust your BVTs. If the BVTs pass, the build must be suitable for testing, and if the BVTs fail, it should indicate a serious problem. Any compromise on the meaning of pass or fail for BVTs also compromises the trust the team has in these tests.
- Critical: Your best, most reliable, and most trustworthy testers and developers create the most reliable and most trustworthy BVTs. Good BVTs are not easy to write and require time and careful thought to adequately satisfy the other criteria.
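A few of the criteria above (test a little, test fast, fail with a useful message) can be sketched with Python's standard `unittest` module. Everything here is an assumption made for illustration: `FakeProduct` stands in for a real product, the two tests stand in for a real BVT suite, and the 60-second budget is an arbitrary number rather than anything the book prescribes.

```python
import time
import unittest


class FakeProduct:
    """Stand-in for the product under test (hypothetical, for illustration only)."""

    def start(self):
        return True

    def open_document(self, name):
        return {"name": name, "lines": 0}


class BuildVerificationTests(unittest.TestCase):
    """Broad, shallow checks: one per key feature, each with a diagnostic message."""

    def setUp(self):
        self.product = FakeProduct()

    def test_product_starts(self):
        self.assertTrue(self.product.start(),
                        "product failed to start: check the setup/install step")

    def test_can_open_document(self):
        doc = self.product.open_document("smoke.txt")
        self.assertEqual(doc["name"], "smoke.txt",
                         "open_document returned the wrong document")


def run_bvts(time_budget_seconds=60.0):
    """Return True only if every BVT passes and the run stays within the time budget."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BuildVerificationTests)
    started = time.monotonic()
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    elapsed = time.monotonic() - started
    return result.wasSuccessful() and elapsed <= time_budget_seconds


if __name__ == "__main__":
    print("build usable for testing:", run_bvts())
```

Treating a slow run as a failure is one way to keep the suite honest about the "minutes, not hours" goal; whether that is the right policy is a team decision.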
Breaking the Build
One of the simplest benefits of having daily builds is that, if there are errors, they will be found within 24 hours of the check-in and attempted build. Syntax errors or missing files (forgetting to check in) are the most common culprits, but there can be other situations that can "break the build", too. Sometimes the build can be broken because of a dependency on another part of the system that has been changed.
While eliminating build breaks entirely is probably a pipe dream, it is possible to take steps to minimize them and their impact. Two of the most popular techniques Microsoft uses are Rolling Builds and Check-In systems.
A rolling build is an automatic continuous build of the product based on the most current source code. Several builds might occur in any given day, and with this process build errors are found more quickly.
A rolling build system needs the following:
- A clean build environment
- Automatic synchronization to the most current source
- Full build of system
- Automatic notification of errors (or success)
Using scripts to combine the steps and to parse for errors (cmd and PowerShell in Windows, plus tools like sed, awk, or perl in UNIX environments) can help to make the process as automated and hands-off as possible. In some cases, BVTs are also performed as part of a rolling build and are automatically reported to the team after each build and test run.
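The "parse for errors" step can be sketched in a few lines. The example below is written in Python rather than cmd, PowerShell, or awk, and the log format, the `ERROR_PATTERN` regular expression, and the helper names are assumptions made for illustration; a real build lab would match the exact output of its own compiler and linker.

```python
import re

# Assumed pattern: any line containing the word "error" (including "fatal error").
# A real script would be stricter, for example to skip "0 error(s)" summary lines.
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def find_build_errors(log_text):
    """Return the log lines that look like compiler or linker errors."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


def summarize_build(log_text):
    """Produce the one-line status a notification step could send to the team."""
    errors = find_build_errors(log_text)
    if errors:
        return f"BUILD BROKEN: {len(errors)} error(s); first: {errors[0].strip()}"
    return "BUILD OK"


# Made-up build output, purely for illustration.
sample_log = """\
Compiling main.c
main.c(42): error C2065: 'count': undeclared identifier
Compiling util.c
util.c(7): warning C4101: 'tmp': unreferenced local variable
LINK : fatal error LNK1181: cannot open input file 'util.obj'
"""

print(summarize_build(sample_log))
```

The summary string is what the automatic notification (of errors or success) would carry after each rolling build.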
A Check-In System also helps to verify changes made to the main source code. A Staged Check-In can be helpful when dealing with very large projects.
Instead of checking code directly into the main SCM, programmers first submit their changes to an interim system. The interim computer verifies that the code builds correctly on at least one platform, and then submits the code on behalf of the programmer to the main source control system. In addition, many of these staged check-in systems will make builds for multiple configurations.
The interim system (often referred to as a "gatekeeper") can also run various automated tests against the changes and see if there are any regression failures. By using this "gatekeeper" process, a significant chunk of bugs are found before they get committed to the main trunk.
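The gatekeeper idea can be sketched as a small class that runs a list of checks and only then commits on the author's behalf. The `Gatekeeper` class, the two check functions, and the dictionary used to represent a change are all hypothetical; a real system would invoke the actual build and test tools instead of inspecting a string.

```python
from dataclasses import dataclass, field


@dataclass
class Gatekeeper:
    """Interim system: verifies a change, then submits it on the author's behalf."""
    checks: list                                  # callables: change -> (ok, message)
    main_branch: list = field(default_factory=list)

    def submit(self, change):
        for check in self.checks:
            ok, message = check(change)
            if not ok:
                # Rejected changes never reach the main branch.
                return f"rejected: {message}"
        self.main_branch.append(change)
        return "committed"


# Hypothetical checks standing in for a real build and a real regression run.
def builds_cleanly(change):
    return ("syntax error" not in change["diff"], "build failed")


def passes_regression_tests(change):
    return (change.get("tests_pass", False), "regression tests failed")


gate = Gatekeeper(checks=[builds_cleanly, passes_regression_tests])
print(gate.submit({"id": 1, "diff": "fix null check", "tests_pass": True}))
print(gate.submit({"id": 2, "diff": "syntax error here", "tests_pass": True}))
print(len(gate.main_branch))
```

Because the checks are just a list, building for additional configurations would mean adding more entries rather than changing the submit logic.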
A common question when it comes to test software being written is, of course, “Who tests the tests?”. A lot of effort gets put into writing test cases and test code for automation purposes, but make no mistake, that code is code, just as production code is code. Testers have the same susceptibility to creating errors as developers do. In a lot of ways, running and debugging tests helps shake out problems, but there's still lots of stuff that we can miss, especially if we are not specifically looking for those errors. Let's face it, testers are great when they look at other people's code and implementations, but we are likely to turn more of a blind eye to our own work (or at least not as critical an eye as we would for the developers).
One approach that can be used is to run Static Analysis tools. These tools examine source code or binaries and can pinpoint many errors without actually running the code.
At Microsoft, native code is code that has been written in C or C++. There are a number of commercially available tools that allow the tester to check code and see if there are any issues. Microsoft uses a tool called PREfast. This tool is also available in Visual Studio Team System. PREfast scans the source code and looks for patterns and incorrect syntax or usage. When PREfast finds an error, it displays a warning and the line number of the code where the error occurs.
Managed Code is any program code that requires a Common Language Runtime (CLR) Environment. .NET Framework code and languages such as C# fit this description. FxCop is an application that performs Managed Code Analysis. It can report issues related to design, localization, performance, and security. FxCop also detects violations of programming and design rules. FxCop is available as a stand-alone tool or integrated into Visual Studio.
Note: while these tools are helpful in finding many errors, they do not take the place of regular and focused testing. The code could be free of code analysis–detected issues, and yet still have lots of bugs. Still, finding many of these issues early does help free up time for the testers to focus on potentially more significant issues.
Another key detail to be aware of here is that the test code is subject to the same limitations, and therefore, the same level of focus to static analysis needs to be performed on the test code as well. Test code is production code, too, just for a different audience.
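To show what "examining code without running it" looks like in practice, here is a toy static check built on Python's standard `ast` module. It is not PREfast or FxCop and does not imitate their rules; the two patterns it flags and the `find_issues` helper are assumptions chosen to keep the example short. Note that the sample function is only parsed, never executed.

```python
import ast


def find_issues(source):
    """Return sorted (line_number, message) pairs found by inspecting the syntax tree."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # A bare "except:" swallows every exception, hiding unexpected failures.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append((node.lineno, "bare 'except:' hides unexpected errors"))
        # "x == None" should normally be written "x is None".
        if isinstance(node, ast.Compare):
            for op, right in zip(node.ops, node.comparators):
                if (isinstance(op, ast.Eq)
                        and isinstance(right, ast.Constant)
                        and right.value is None):
                    issues.append((node.lineno, "comparison to None with '=='"))
    return sorted(issues)


# Hypothetical test-helper code to analyze; it is parsed but never run.
sample = """\
def load(path):
    try:
        data = open(path).read()
    except:
        data = None
    if data == None:
        return ""
    return data
"""

for line_number, message in find_issues(sample):
    print(f"line {line_number}: {message}")
```

Like the real tools, a check of this kind reports a warning and a line number, and it works just as well on test code as on product code.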
Even More Tools
There are many tools available to testers. Many of them are internally built and many are commercial products, but all are meant to help make a process work faster and better. Screen recorders, file parsers, browser add-ons, and other specific tools are developed to help solve particular problems. Shared libraries are often implemented so that various teams can use each other's automated tests. The Microsoft internal tool repository contains nearly 5,000 different tools written by Microsoft employees. Still, even with the large number of test tools available, there is no replacement for human eyes and hands in many test scenarios.
Tools are often essential when it comes to performing efficient testing, but just as important is knowing which tool to use for which purpose. Even for the carpenter with the latest and greatest tools and technology, sometimes the best approach is to just get in the house and see what is going on. After getting a good lay of the land and understanding the potential situations, the carpenter can pull out the necessary tools to do the job as efficiently as possible. For many of us as testers, the same rules apply. Tools are only as good as their implementation, and their effectiveness is limited by the skill of the user.