Chapter 14: Testing Software Plus Services
Ken Johnston wrote this chapter, and he opens it with an homage to a great book, The Dangerous Book for Boys. Assembling the ultimate adventure backpack and finding out what gear is essential for getting into danger is a big part of the fun. Some of the material was great fun, but some of it was a little hard to get his head around (tripwires are indeed dangerous, which of course made them even more fun for his son). The point is that boys and danger go hand in hand, and The Dangerous Book for Boys helps differentiate between big dangers and little dangers.
Software Plus Services (S+S) is an area that Microsoft is now focusing on, and just like The Dangerous Book for Boys, Microsoft realizes that S+S has its own dangers, both big and small. Service-oriented architecture (SOA) and software as a service (SaaS) are big topics today, as is anything having to do with software that exists "in the cloud". Currently, though, there is no book that covers the dangers lurking out there for those looking to take on development of Software Plus Services, though this chapter certainly tries.
Two Parts: About Services and Test Techniques
To make the most of the chapter and help us get our heads around the issues, Ken has given us two sections to look at and consider. Part 1 deals with Microsoft’s services history and strategy, and how it compares and contrasts with regular software applications and Software as a Service (SaaS). Part 2 discusses the testing strategy needed to test services.
Part 1: About Services
The Microsoft Services Strategy
When Microsoft refers to Software Plus Services, what exactly are they talking about? The idea is that distributed software allows for companies to provide various online services (think Hotmail, Facebook, or Google Docs as examples) that leverage the local processing power of the 800 million plus computers out on the Internet (as of the date the book was published; there's likely more than a billion of them now if you include all the smart phone devices).
Shifting to Internet Services as the Focus
In 1995, Bill Gates released a memo that called for Microsoft to embrace the Internet as the top priority for the company. Inside Microsoft, this is often referred to as the “Internet memo,” an excerpt of which follows. With clear marching orders, the engineers of Microsoft turned their eyes toward competing full force with Netscape, America Online, and several other companies ahead of us on the Internet bandwagon.
The Internet tidal wave
I have gone through several stages of increasing my views of its importance. Now I assign the Internet the highest level of importance. In this memo I want to make clear that our focus on the Internet is crucial to every part of our business. The Internet is the most important single development to come along since the IBM PC was introduced in 1981.
—Bill Gates, May 26, 1995
Having been part of the middle wave of the development of the Internet (I can't claim the seventies or eighties, but I can claim anything after 1991 :) ), I remember well how it felt to try to connect a PC to a network: how much extra work it took to get online, and how many extra tools were needed to communicate with other machines. Over time, Microsoft took the goal of providing Internet capabilities seriously. Windows 95 made it very easy to network computers without going through special steps, and plug and play Internet access was available to pretty much anyone who wanted it. But now that people were online, what would they do with that access?
Older models of "services" already existed. FTP allowed users to transfer files; UUCP and USENET allowed for communication across newsgroups on thousands of topics. With the coming of the Web, many of those older structures grew into new sites and new services. USENET still exists, and the groups and hierarchies are still out there, but when we talk about services today, it's the Facebooks, the Twitters, and the Tumblrs that people think of first.
In October 2005, another memo was sent out to the Microsoft staff: the “Services memo.” It described a world of seamless integration, where applications and online services would be not just complementary but nearly indistinguishable from each other. This is where Microsoft's strategy of Software Plus Services comes from, along with the goal of integrating the Windows operating system with the various online offerings (including SaaS and various Web 2.0 offerings).
The Internet Services Disruption
Today there are three key tenets that are driving fundamental shifts in the landscape— all of which are related in some way to services. It’s key to embrace these tenets within the context of our products and services.
1. The power of the advertising-supported economic model.
2. The effectiveness of a new delivery and adoption model.
3. The demand for compelling, integrated user experiences that “just work.”
—Ray Ozzie, October 28, 2005
S+S goes farther than simply having a PC on the network. PCs and mobile devices are all part of this equation, and the list of supported products is growing. Microsoft has also embraced "cloud services" and, to that end, has released a software platform called Azure (www.microsoft.com/azure). Azure will be a focus for such technologies as virtual machines and "in the cloud" storage (think of Dropbox or, again, Google Docs as examples). Live Mesh is a part of Azure that lets users synchronize data across multiple systems (computers, smart phones, etc.). SmugMug and Twitter are active users of a cloud-based system from Amazon called Amazon Simple Storage Service (S3). On the flip side, if S3 goes down, the services built on it go down, too (hey, nobody rides for free :) ).
Growing from Large Scale to Mega Scale
Microsoft launched the Microsoft Network (MSN) in 1995, and it quickly became the second largest dial-up service in the U.S. MSN was a large service, requiring thousands of production servers to operate.
With the acquisition of Hotmail and WebTV, Microsoft went from a big player in the online world to a truly massive one. WebTV brought to Microsoft the idea of "service groups", where large blocks of machines work mostly independently of other blocks. Service groups have the added benefit that a failure in one group does not bring down an entire site or service. If one group goes down, the other groups keep the service going (if perhaps not at the same level of performance as with all service groups up and active). Hotmail taught Microsoft the benefit of what have come to be called "Field Replaceable Units" (FRUs). These FRU systems were little more than a motherboard, a hard drive, and a power source, laid out on flat trays like open pizza boxes. As the chapter points out, this was before the wide scale growth of gigahertz plus multicore CPUs that require skyscraper heat sinks to keep cool, so the server room's A/C was sufficient to keep the systems running optimally (ah, those halcyon days ;) ).
These concepts are still relevant and in practice, but there are limits. At the time the book was written, Microsoft was adding 10,000 computers each month to its datacenters just to meet demand. Even with a modular open pizza box design, that's a lot of hardware that requires electricity and cooling (which, of course, likewise requires electricity) as well as space and other infrastructure (buildings, rooms, cabling, security, etc.). A few years ago, Microsoft standardized on datacenter racks loaded with all the gear they would need, and these racks were dense. Now, instead of just loading racks, Microsoft develops datacenter modules the size of freight shipping containers, referred to as container SKUs. To visualize one, think of the trailer of an 18-wheeler. Now picture that thing loaded with racks and large connector harnesses. Now picture each of those being dropped somewhere and hooked up to extend a datacenter. That gives you an idea of the scale at which these S+S systems are routinely growing, as well as the size of the units that get rotated out when machines need maintenance or retirement.
Microsoft Services Factoids
Number of servers: On average, Microsoft adds 10,000 servers to its infrastructure every month.
Datacenters: On average, the new datacenters Microsoft is building to support Software Plus Services cost about $500 million (USD) and are the size of five football fields.
Windows Live ID: WLID (formerly Microsoft Passport) processes more than 1 billion authentications per day.
Performance: Microsoft services infrastructure receives more than 1 trillion rows of performance data every day through System Center (80,000 performance counters collected and 1 million events collected).
Number of services: Microsoft has more than 200 named services and will soon have more than 300 named services. Even this is not an accurate count of services because some, such as Office Online, include distinct services such as Clip Art, Templates, and the Thesaurus feature.
Power Is the Bottleneck to Growth
Moore's Law still appears to be in effect. Roughly every 18 months, processor capabilities, storage, and available RAM just about double, and the need for power, cooling, and space rises with them. The average datacenter costs about $500 million to construct. To this end, Microsoft is working aggressively with vendors and original equipment manufacturers (OEMs) to design systems and datacenter infrastructure that draw power as efficiently as possible and get the most bang for the buck in as many categories as possible.
In a sidebar, Eric Hautala makes the case that "Producing higher efficiency light bulbs is a fine way to reduce power consumption, but learning to see in the dark is much better..." The point is that we can work towards making systems more efficient, or we can work towards making the code and applications themselves more efficient. In other words, rather than accommodating an ever greater need by finding cheaper ways of doing the same job, how about finding ways to reduce that need entirely?
Services vs. Packaged Product
When we refer to a product that is purchased on a CD or DVD, that's a "shrink-wrap" product, and it typically comes with all of the items associated with that kind of delivery, including a box, a cellophane wrapper, and glossy artwork to make the package appealing. The same goes for applications that come preinstalled on a newly purchased computer. The distinction is blurring, though, as many products are purchased as shrink-wrap software (Xbox games, Office versions, etc.) but also access online content and services that can be downloaded to enhance the product, or connect to the Internet to update and interact (Xbox Live, for instance). The rebranding of Hotmail and Passport as part of the Windows Live brand is a move towards a more defined services model on the web.
Windows Live Mail is currently the largest e-mail service in the world, with hundreds of millions of users. WLM works with many different Web browsers, Office and Outlook versions, PCs, and smart devices. Many of these services require a live connection to the Internet (for example, you can't get new email when you are not connected). By contrast, Web 2.0 tends to rely on newer browsers, Flash, or Microsoft Silverlight to move processing away from the web server and into applications and modules that are built into, or rendered and processed in, the browser, and thus on the client device itself. Because the local system can do the processing, in many cases these applications can keep working while offline.
Moving from Stand-Alone to Layered Services
Generally when a web site was developed in the early days of the net, sites were self-contained, and most of the processing or validation or workflows were programmed on the server and the users relied on that server being available. No server, no site, no service.
Layered services allow companies to leverage multiple machines, including the client machine itself, to process transactions, track progress, and select and execute different workflows. In truth, most sites that look like single site applications are actually layered, spread out systems.
Take eBay as an example: eBay acquired PayPal and then integrated it with the eBay experience. PayPal is a service not only for eBay, but for anyone who wants to request and send money. Standalone services are generally easier to test than those that are layered and spread out across many different technologies. When a system is layered, the testing requirement can go up significantly (even exponentially).
Part 2: Testing Software Plus Services
In this section, Ken describes test techniques that can be used for testing services, and walks through various approaches to applying them.
Waves of Innovation
Over its existence, Microsoft has gone through a number of waves of innovation in the way computers work and the way people interact with them. The evolution follows below:
1. Desktop computing and networked resources
2. Client/server
3. Enterprise computing
4. Software as a Service (SaaS) or Web 1.0 development
5. Software Plus Services (S+S) or Web 2.0 development
With each change, software testing has had to evolve and change along with it so as to meet the new challenges. Number of users and methods of interacting have blossomed and in some cases exploded. With each wave, most of the older tests continue being used, and more and newer tests get added.
Designing the Right S+S and Services Test Approach
Ken brings us back to the Dangerous Book For Boys idea, and how he wishes he could write the Dangerous Book for Software Plus Services. In his mind, this book would have a lock on it but no key. Readers would have to figure out how to pick the lock if they want to read the actual secrets within the book. Ken confesses his reason for this is that, were he to write the book, he'd have to share all the mistakes he has made developing, testing, and shipping these services.
Client Support
It's not just browsers to deal with these days. For many services, testers at Microsoft have to look at a matrix of browsers plus various applications (Outlook, Outlook Express, and the Windows Live Mail client, in the case of Mail services) as well as mobile devices. Add to that the other products that interact with Mail services and the various languages and regions that have to be considered. The point is that exhaustively testing these options is impossible, so the testers have to find criteria that help them pare down the matrix. Market share is considered for the various browsers and clients, as are the goals of the particular service and the audience it is trying to serve. A risk analysis is needed to determine which areas require a large scale testing push and which can be given a lower priority or even dropped off the list completely.
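To make the idea of paring down the matrix concrete, here is a minimal sketch in Python. It is not Microsoft's actual process; the share figures, the client and locale names, and the `prioritize` function are all made up for illustration, and it assumes (simplistically) that the weight of a combination is the product of its individual shares.

```python
from itertools import product

# Hypothetical share figures, for illustration only (not real market data).
CLIENT_SHARE = {
    "web_browser_a": 0.50,
    "web_browser_b": 0.20,
    "desktop_mail_client": 0.20,
    "mobile": 0.10,
}
LOCALE_SHARE = {"en-US": 0.60, "ja-JP": 0.25, "de-DE": 0.15}


def prioritize(client_share, locale_share, coverage_goal=0.75):
    """Pick the heaviest client/locale combinations until coverage_goal is reached.

    Assumes a combination's weight is simply client share * locale share.
    Returns (selected combinations, fraction of usage they cover).
    """
    combos = [
        ((client, locale), c_share * l_share)
        for (client, c_share), (locale, l_share)
        in product(client_share.items(), locale_share.items())
    ]
    combos.sort(key=lambda item: item[1], reverse=True)

    selected, covered = [], 0.0
    for combo, weight in combos:
        if covered >= coverage_goal:
            break
        selected.append(combo)
        covered += weight
    return selected, covered


if __name__ == "__main__":
    selected, covered = prioritize(CLIENT_SHARE, LOCALE_SHARE)
    print(f"Test {len(selected)} of 12 combinations to cover {covered:.0%} of usage")
    for combo in selected:
        print("  ", combo)
```

A real risk analysis would also weigh things like the cost of a failure and the goals of the service, not just usage share, but even this crude ranking shows how half of the matrix can cover most of the users.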
Built on a Server
The benefit of having many of the services integrated into a Server platform is that many of the systems have already been tested and integrated into the systems. Many of the services and their interactions with the underlying server code have been tested over several years as the applications have been developed. As the service matures, many of the issues are located within areas like performance and enterprise wide actions (such as having multiple people accessing the same objects at the same time, and how they interact under those circumstances).
Server products also require testing that focuses on how the product can be managed, whether the service can actually scale, and just how far it can be scaled. The simpler and more hands off the management of the systems is, the more profitable and effective the services can be: they require less direct interaction, and automated or scripted interaction helps the environments scale at a lower cost.
The testing efforts are focused on how users and remote applications access the service, via an API or direct interaction. Integration testing is also critical, as is deciding how many combinations can realistically be tested (not every possible combination can be; the number of them is absolutely massive).
Loosely Coupled vs. Tightly Coupled Services
Layered services add new features and grow new dependencies with every release. Coupling is another way of saying how dependent one program is upon another. When systems change frequently, loosely coupled systems are desired. The fewer the dependencies, the easier it is to innovate in individual areas and add features to them. More tightly coupled services require closer monitoring and more project management to ensure that features are delivered with high quality and that a change in one component doesn't cause ill effects in other areas. Loose coupling is especially important when integrating with third party applications or external devices and services. When dealing with credit card companies or payment services, loosely coupled systems were easier to implement and, subsequently, ship. More tightly coupled services were much harder to implement and often prone to shipping delays.
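From a tester's point of view, one practical payoff of loose coupling is that a dependency can be swapped out for a stand-in. The sketch below is my own illustration in Python, not something from the chapter; `PaymentProvider`, `FakeProvider`, and `checkout` are hypothetical names. The checkout code depends only on a small interface, so a test can run without touching a real payment service.

```python
from typing import List, Protocol, Tuple


class PaymentProvider(Protocol):
    """The only thing checkout() knows about a payment service (hypothetical interface)."""

    def charge(self, account: str, cents: int) -> bool: ...


class FakeProvider:
    """Test double that records charges instead of calling a real payment service."""

    def __init__(self) -> None:
        self.charges: List[Tuple[str, int]] = []

    def charge(self, account: str, cents: int) -> bool:
        self.charges.append((account, cents))
        return cents > 0  # pretend that only positive amounts are accepted


def checkout(provider: PaymentProvider, account: str, cents: int) -> str:
    """Service code that is coupled to the interface, not to a specific vendor."""
    return "paid" if provider.charge(account, cents) else "declined"


if __name__ == "__main__":
    provider = FakeProvider()
    print(checkout(provider, "alice", 500))
    print(checkout(provider, "alice", 0))
    print(provider.charges)
```

With a tightly coupled design, the same test would need the real provider (or a full copy of it) up and running, which is exactly the kind of dependency that slows down shipping.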
Stateless to Stateful
Stateless services, such as sending an email, or loading a series of web pages, don't require that data from one transaction be stored and compared for other transactions. Stateful transactions, however, depend on the data from each transaction to carry over (logging into a secure service, making a credit card payment, etc.). When stateful events are disrupted or cannot be completed, there can be significant impact on the user experience. When a service takes a long time to complete a transaction and needs to store unique user or business-critical data, it is considered to be more stateful and thus less resilient to failure. Compare stateful to working on a Microsoft Word document for hours, and then experiencing a crash just as you are trying to save the file. It might be a single crash, but the impact on the user is dramatic.
Time to Market or Features and Quality
There's a fine balancing act between being first to market and having a product that can hold its position once it gets there. While a product that is first in the market can often keep a dominant position for a long time, there's no guarantee that will happen. Friendster is used as an example: it was one of the first social networking sites, but MySpace and Facebook were able to jump ahead because of features and quality (and then MySpace was eclipsed by Facebook as the dominant social media app, and a few years down the road, if Facebook doesn't keep innovating and working to make the platform more appealing, perhaps another innovator will take its place). Both sides of the balance need to be considered, but if time to market is all that's being considered, it's a good bet that someone else will take over your spot by offering better quality and value that trump your first arrival innovation.
Release Frequency and Naming
Hand in hand with the time to market and quality question is whether or not to release regularly with regular updates, or to offer the software as a beta version publicly. Google did exactly this with Gmail, allowing it to incubate and get feedback over the course of three years. Releasing it as a public beta alerted users that there very likely would be problems with it, and in doing so, they were able to solicit feedback and develop some mind share without fearing that the bugs that were found would be such a turn off that many would stop using the service. I know that this is often a plus for me, when I use software for various purposes. I am OK with some bugs in these types of environments, provided they don't interfere or cause problems with "mission critical" applications that I may be working with. Released software that's been "polished" (or purported to be) I'm a little less forgiving with, especially if the bugs are frequent or very up front (odd esoteric things I can deal with, especially if there are identified workarounds).
Testing Techniques for S+S
So now that we have identified the potential "Danger areas", what can we set up to actually test them?
Fully Automated Deployments
Ideally, in a Microsoft environment and when working with Microsoft products, the testers work through fully automated deployments, where the interaction of the tester or developer with the installation and configuring of the software should be minimal. The more hand holding required, the farther away from a final completed solution the product actually is. Here are some tips offered by Ken and his team:
Great deployment is critical to operational excellence
What makes for a good deployment scenario? According to Ken, the key features of a good deployment are:
- zero downtime
- zero data loss
- partial production upgrades (a service might upgrade a small percentage of what is considered the production servers to the new version of the code) 
- rolling upgrades (a service can have portions of the live production servers upgraded automatically without any user-experienced downtime)
- fast rollback (the safety net used if anything goes wrong; if something goes wrong, roll back to a previously known good state)
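To show how rolling upgrades and fast rollback fit together, here is a minimal sketch in Python. It is a toy model under my own assumptions, not Microsoft's deployment tooling: the fleet is just a dictionary of server name to version, and `health_check` is a hypothetical stand-in for whatever monitoring a real service would use.

```python
def rolling_upgrade(servers, new_version, health_check, batch_size=2):
    """Upgrade servers a batch at a time; roll everything back if a batch is unhealthy.

    servers: dict of server name -> current version (updated in place).
    health_check: callable(name, version) -> bool (hypothetical monitoring hook).
    Returns True if the whole fleet was upgraded, False if it was rolled back.
    """
    previous = dict(servers)  # snapshot of the known good state
    names = list(servers)

    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            servers[name] = new_version  # the rest of the fleet keeps serving users

        if not all(health_check(name, servers[name]) for name in batch):
            servers.clear()
            servers.update(previous)  # fast rollback to the snapshot
            return False
    return True


if __name__ == "__main__":
    fleet = {f"web{i}": "v1" for i in range(5)}
    ok = rolling_upgrade(fleet, "v2", lambda name, version: name != "web3")
    print("upgraded" if ok else "rolled back", fleet)
```

Stopping after the first batch instead of continuing would give you the "partial production upgrade" from the list above: a small slice of production runs the new code while the rest stays on the old version.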
Test Environments
The simple fact is that, with Microsoft products and Microsoft services in particular, there is no one size fits all approach. There can't be. Each service and each setup may require more or less interaction, more or less resources, and demand will ultimately change and determine how a service acts and is interacted with. In many cases, different test environments must be set up to test as many parameters and features as possible.
The One Box
When all of the software and code needed to run the service and system lives on one physical computer (or VM), you have a one box setup for testing. One box has the advantage of fairly quick read/write transactions and interactions between components. The components can all be checked from the same terminal services session, but a configuration change to one component can change the performance of another. RAM and disk access are shared on that machine, so if one piece suffers, everything suffers. Still, for speed and debugging purposes, a one box setup is often one of the most important test systems there is, and the quickest to debug and troubleshoot.
The Test Cluster
A test cluster, by contrast, may be several physical or virtual machines running simultaneously and interacting with one another. While issues on these are not as easy to diagnose and debug as on "one box" systems, they provide a more realistic environment, one where multiple security groups can be set up, where context can be tested, and where privileges can be scoped across a network or to a particular machine. An example is a datacenter or enterprise customer with a web server, a database server, a file server, and a terminal services server. In a one box environment, these are all part of the same system. In a real production environment, to prevent a single point of failure, the web server would be on one system, the file server on another, the database on another, and the secure remote access on yet another machine (or at least managed by another machine). A cluster lets the tester see the context in which connections succeed, and also the cases where the reason they fail is not so obvious.
The Perf and Scale Cluster
With a single box setup, we can do some basic analysis of application performance and response time against specific criteria more easily than with multiple machines in specific roles. It's just easier, and there's no network latency or other network load getting in the way. However, some services place such a demand on a system that split out systems are required. In addition, a split setup helps show which components get slower independent of other variables (is the latency due to the web server, the database, or something else?). In many cases, the data load itself can be an issue. Ken uses the example of an email account, and how it will perform differently when there are just ten messages in the inbox vs. thousands of them. Likewise, how does a system behave when it is handling email accounts for 10 people vs. 10,000 people? By plotting the performance curve and seeing where the performance impact sets in, it's possible to determine a "break point", where one physical machine needs to be augmented to two, four, eight, and so on, and to work out how many machines, with what resources, the service needs in order to scale effectively.
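As a rough illustration of finding the break point from a performance curve, here is a small Python sketch. The measurements and the 200 ms budget are invented for the example, the function names are mine, and the machine estimate assumes that capacity scales linearly with the number of machines, which real services rarely manage perfectly.

```python
import math


def find_break_point(measurements, max_latency_ms):
    """Return the largest tested user count that stays within the latency budget.

    measurements: iterable of (concurrent_users, latency_ms) pairs from load runs.
    Stops at the first load level that exceeds the budget; returns None if
    even the smallest load is too slow.
    """
    last_ok = None
    for users, latency_ms in sorted(measurements):
        if latency_ms > max_latency_ms:
            break
        last_ok = users
    return last_ok


def machines_needed(expected_users, users_per_machine):
    """Estimate machine count, assuming (optimistically) linear scaling."""
    return math.ceil(expected_users / users_per_machine)


if __name__ == "__main__":
    # Hypothetical load test results for one machine: (users, latency in ms).
    runs = [(10, 40), (100, 55), (1000, 120), (10000, 900)]
    capacity = find_break_point(runs, max_latency_ms=200)
    print("one machine handles up to", capacity, "users within budget")
    print("machines for 10,000 users:", machines_needed(10000, capacity))
```

In practice you would want more data points between 1,000 and 10,000 users to pin down where the curve actually bends, which is exactly what a dedicated perf and scale cluster is for.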
The Integrated Services Test Environment
When there is a collection of services being offered, components may work well in isolation, but there are challenges when all of the services are tested together to see how well they work with each other. One can look at the most recent changes made to Windows Live Hotmail to see how much of a challenge this could be. In previous revisions of Live Mail (Hotmail), users were able to open Word documents, spreadsheets, and presentations by downloading the files and opening them in their local copy of Office (that is, if they had it, or an equivalent application that could load the files). The most recent version of Live Mail can open a stripped-down version of various Office applications to view, edit, and save files in "real time" through the email account. While I can enjoy the end result, I can only imagine the technical challenges involved in getting a "web specific" version of Microsoft Office to work within Live Mail.
The Deployment Test Cluster
Using multiple machines helps the tester take a look at each of the components in turn, and how they would interact with other components were they deployed on different systems. Early in the development and testing process, these tests could be performed on lower end hardware or in a Virtual Machine cluster. As the product gets closer to its release date, more emphasis on creating and running tests on production grade machines would be needed.
Testing Against Production
In any environment, while detailed and specific testing can be made on the most controlled of systems, the real work and real world demands are going to be very different from the automated tests that are performed and even the very specific targeted tests that are run on a number of systems. For the true impact of changes and new development to be seen, there must be a level of testing performed that covers the real system. Yes, that means the live production environment. While that doesn’t mean that the service must be brought down entirely to deploy and test, it does mean that many of the components will need to be loaded and actually tested in real world scenarios on the live systems. Examples of this can be seen with Facebook; they actually take a feature and roll it out to a small subset of their customers to see how it responds to real interaction and real usage. Based on those tests, they can choose to roll it out further or bring the component back in house for continued evaluation, testing and possible reworking.
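One common way to expose a feature to only a small subset of users is to hash each user into a stable bucket and compare the bucket to a rollout percentage. The Python sketch below shows that general technique; it is my own illustration, not a description of how Facebook or Microsoft actually do it, and `in_rollout` and the feature name are hypothetical.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Decide whether a user sees a feature during a partial production rollout.

    Hashing feature + user_id gives each user a stable bucket from 0 to 99,
    so the same user gets the same answer on every request, and raising
    percent only ever adds users to the rollout.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10000)]
    exposed = sum(in_rollout(u, "new-inbox", 5) for u in users)
    print(f"{exposed} of {len(users)} users see the new feature")
```

Because the decision is deterministic, rolling the feature back is as simple as setting the percentage to zero, and widening the rollout keeps the early users in place.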
Production Dogfood, Now with More Big Iron
The idea of a “dogfood” network or system was discussed in chapter 11, and it has to do with the idea that a company will live with its service before it is “inflicted” on anyone else. The developers and testers and the other folks who work at Microsoft are the customers before their customers are. Having worked at Cisco Systems in the 1990’s, I am familiar with living on a dogfood network and having to often check and see what is happening; it’s a sobering situation, and one that helps users realize all of the things they need to be aware of that might impact their customers, as well as bring to light needed changes that could help the product and the user experience overall.
Production Data, so Tempting but Risky Too
Microsoft has a collection of tens of thousands of documents that have been cleansed (sanitized) so that the information inside them cannot leak out and become a security risk, yet they still serve as a good representation of real production data. Having access to this many documents helps developers and testers see how real data in real situations can affect the performance of applications and services. A set of 100,000 live email addresses could create a real problem if a script is misconfigured (lots of unintended spam getting sent), whereas email addresses that are nonsense but still validly constructed can be just as helpful for analyzing how a database stores them and how long it takes for those addresses to be served up and used.
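Generating nonsense but well-formed addresses is easy to sketch. The Python below is my own example, not Microsoft's tooling; `fake_addresses` is a hypothetical helper, and it uses the `.test` top-level domain, which is reserved for testing, so a misconfigured script cannot deliver mail to a real person.

```python
import random
import string


def fake_addresses(count, seed=0, domain="example.test"):
    """Generate unique, well-formed, undeliverable email addresses.

    A fixed seed makes the data set reproducible between test runs.
    The default domain sits under the reserved .test top-level domain.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    addresses = set()
    while len(addresses) < count:
        local_part = "".join(rng.choices(alphabet, k=12))
        addresses.add(f"{local_part}@{domain}")
    return sorted(addresses)


if __name__ == "__main__":
    sample = fake_addresses(100000)
    print(len(sample), "addresses, for example:", sample[:3])
```

A set like this exercises storage and lookup performance the same way live addresses would, without the risk that comes with real production data.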
Performance Test Metrics for Services
Many of the metrics used for testing performance of applications can also be used to test services. Some tests are specific to services and are most helpful in that particular paradigm:
- Page Load Time 1 (PLT1):
Some tools that help to make these tests possible:
- Visual Round Trip Analyzer (VRTA)
- Fiddler (http://www.fiddler2.com)
Several Other Critical Thoughts on S+S
There are a number of additional areas that are helpful when it comes to testing services. This review is way longer than any other chapter review I have done, so to not draw this out, I have made this section smaller than it probably deserves. Check out the book for more detailed explanations of each of these.
Continuous Quality Improvement Program
Just as Microsoft Office and SQL Server give users the option to participate in Customer Experience Improvement Programs, so do the service offerings. Plain and simple, real world usage patterns and actual interaction with the systems are going to be multi-faceted and vary from user to user and company to company. While it is possible to make some assumptions and sketch some likely scenarios, the fact is that many areas will not have been considered, or will come to light only after being seen in the field. Frequent monitoring of actual usage statistics and of the pain points that customers experience will help to inform future development and testing.
Common Bugs I’ve Seen Missed
The simple fact is that, no matter how detailed a test team is, no matter how focused they are and how much of the real world they simulate, the test environment is, at best, a crude approximation of what happens in the real world. This is super clear when it comes to working with Software+Services, because let’s face it, there really isn’t any way to 100% guarantee that all scenarios are covered for a system that has thousands of users, let alone hundreds of millions. There will be cases missed, there will be situations never considered, and yes, there will be bugs that come to light only when they have actually been exposed to real world use. Severe bugs and security issues will continue to come to light, and often it takes a concerted effort to learn from the issues discovered. It’s easy to criticize a company for a product that has a “glaring” defect, but again, glaring to whom? Under what circumstances? Could it have been foreseen? Could it have been prevented? It’s easy to Monday morning quarterback a lot of testing decisions and look at bugs in hindsight. It’s not so easy when the systems are under development and their full scope and impact may not be seen for weeks, months or in some cases, years, until they are scaled up to the tens or even hundreds of millions of users.
Services are now a huge part of the Internet landscape. For many Internet savvy users, it’s all they access. Think of going online to check your Gmail account, listen to music through iTunes or Rhapsody, communicate with friends via Facebook, watch videos through YouTube or Hulu, or for that matter, track your television shows with SideReel (hey, why not put in a shameless plug ;) ). The fact is, most of the Internet properties out there that people access are related to or are in and of themselves Software+Services offerings. These systems are going to continue to grow in the future, and they are going to become every bit as powerful and every bit as sophisticated as desktop applications that we use and take for granted. Likewise, the expectations that they perform the same way will continue to rise, and thus the need to diligently test them will rise along with those expectations.
 
 