Tuesday, October 9, 2012

Day 2 of PNSQC, Live from Portland


In my ever-running tradition of live blogging events, I'm going to do my part to try to live blog PNSQC and the observations, updates and other things I see while I am here. Rather than do a tweet-stream that will just run endlessly, I decided a while ago that an imprecise, scattered, lousy first-draft stream of consciousness, followed by cleaning it up into a more coherent presentation later, is a fairly good compromise. So, if you would like to follow along with me, please feel free to just load this page and hit refresh every hour or so :).



---


Here we go, day 2 of PNSQC, and we hit the ground running with Dale Emory's keynote talk "Testing Quality In". Dale starts out with the premise that testers are often underappreciated, and with that, their skills are often squandered. This is a tragedy, but one that we can do something about. What if we could identify the variables that affect the value of our testing?

Dale makes the point that "You Can't Test Quality In" and asked if people agreed or disagreed. Hands went up for both Yes and No... and Dale said he agreed with both groups. Huh? Let's see where this is going :). The key to this is a bit of reframing of the statement. A lot of it has to do with the way we look at the problem.

One of the most common statements testers make is that "Testing Merely Informs". "Merely" in this case doesn't mean that the testing isn't important; it's that there's a limit to what we can inform and how far that informing can take us.

Dale walks us through an example of a discount value for a class of customers. Determining where we can add value, and informing the requirements, CAN be an example of testing quality in, in the sense that it prevents problems from getting made into the product. On the other hand, when we say we CAN'T test quality in, we're looking at the end product, after it is made, and in that case, it's also true.

Can you test quality into products? How about testing quality into decisions? Does the information we provide allow our team(s) to make appropriate decisions? If so, do those decisions help to "test quality in"? Dale gave an example of one of the Mars Landers, where the flash file system filled up and the lander couldn't reboot. Was it a bug to get to that state? Yes. Was it known? Again, yes. Why wasn't it fixed? Because when the bug was found, it was determined that fixing it would push the launch date back by six weeks, which would mean Mars would be 28 million miles further away, and thus the launch team would be faced with a situation of "can't get there from here". With this knowledge, they went with it, knowing that they had a potential workaround if it were ever needed (which, it turns out, was the case, and they knew what to do in that event). Could that be seen as testing quality in? I'd dare say YES. Apparently, Dale would, too :).

So why do we say "You Can't Test Quality In"? The reason is that, if we start from a premise where we are running a lot of tests on a product that's already built, and the product team does nothing with the information we provide, then yes, the statement stands. It's also a gross caricature, but the caricature is based on some reality. Many of us can speak plainly and clearly to something very much like this. The problem is that the caricature has gotten more press than the reality. You absolutely can't test quality in at the end of a project when you have not considered quality at all during the rest of the process. Again, that's likely also a gross caricature, or at least I have not seen that kind of disregard on the teams I have been part of. Dale says that what we are actually doing is reacting to Ineffective Use of Testing (and often too late in the process). We also react to Blame ("Why didn't you find that bug?", or even "Why didn't you convince us to fix that?!"). Instead of fighting, we should seek to help (hilarious reference to Microsoft Office's Clippy... "It looks like you are trying to test a lousy product... How can I help?").

Another question is "why don't people act on our information?" It's possible they don't see the significance (or we are having trouble communicating it). Often there may be competing concerns. It's also possible that they just don't like you! So instead of just providing information, what if we were to give feedback? If feedback is accurate, relevant and timely, that also helps (scatological joke included ;) ). A way to help is to notice what we are really good at, and see how we can use those skills to give value in new ways. We are good at detecting ambiguity, we can detect ignorance, we can detect inconsistencies, we can imagine scenarios, consequences, etc.

What if we could test at a point where we can find errors before we commit to them? This slides right into Elisabeth Hendrickson's advice in Explore It to explore requirements. Dale puts it this way: test "Anything about which your customers have intentions that have not been validated". We can test requirements, we can test features, we can test stories, and ultimately, we can test the team's shared understanding. We can test our understanding of our customers' concerns. If they are not reacting to our testing, it's possible we are not connecting with what is important to them. Dale makes reference to Elisabeth's "Horror Story Headlines"... what would be the most horrifying thing you could see about your company on the front page of the newspaper? Think of those concerns, and you can now get to the heart of what testing is really important, and what really matters to the team. Consider testing your testing. How well are your customers benefiting from the information you provide?

Dale then gave us some homework... What is one small thing that you can do to increase the value or appreciation of testing in your organization? For me, I think it's finding a way to add value and help identify issues early in the process. Instead of doing the bulk of my testing after a story is delivered, try to find ways to test the stories before the problems are built in. In short, BE EARLY!


---


I had the chance to moderate the Test track talks. When we offer to moderate, sometimes we get talks that are assigned to us, and sometimes we get to moderate the talks we really want to hear. This morning's talk is one of the latter, called "The Black Swan of Software Testing - An Experience Report on Exploratory Testing", delivered by Shaham Yussef.


The idea of a Black Swan is that it is an outlier from expectations; it doesn't happen often, but it can and does happen. It is likely to have a high impact, and we cannot predict when it will happen. Shaham considers Exploratory Testing as the Black Swan of software testing.

For many, Exploratory Testing is seen as a "good idea" and an "interesting approach", but we rarely consider it as a formal part of our test planning. Exploratory Testing, when performed well, can have a huge impact, because it can greatly increase the defect detection rate (perhaps even exponentially so). Exploratory Testing lets us look at issues early on, perhaps even in tandem with development, rather than waiting until the software is "officially delivered". Scripted tests, as we typically define them, are often limited to pre-defined test cases, while Exploratory Testing is geared towards focusing on an area and letting the product inform us as to where we might want to go next. It's test execution and test design performed simultaneously.

Are there places where Exploratory Testing may not apply? Of course. In highly regulated environments where steps must be followed precisely, the argument is less compelling, but even in those environments we can learn a fair amount by exploration.


---


Lisa Shepard from WebMD is the next speaker, and her topic is "Avoiding Overkill in Manual Regression Tests".

By a show of hands, a large number of people do manual testing, do some kind of manual regression testing, and can honestly say that they don't really know which tests they really need to run. Lisa shared her story of writing hundreds of manual test cases, and how at the time creating all of those cases seemed like a good thing. Fast forward two years: many of those test cases were no longer relevant, and in fact all of them ended up in a Deprecated folder.

This is an example of how much time can be wasted on over-documenting all the tests that have already been run. A fairer question to ask is: what manual regression tests do we NOT want, or need, to document?

At some point, we reach diminishing returns, where the documentation becomes overwhelming and is no longer relevant to the organization. WebMD faced this, and decided to look at why they felt so compelled to document so many cases. Sometimes the quantity of tests is seen as synonymous with quality of testing. Sometimes the culture rewards a lot of test cases, just the way some organizations reward a large quantity of bugs (regardless of the value those bugs represent).

So what is the balance point? How do we write regression tests that will be valuable, and balance them so that they are written to be effective? There are lots of reasons why we feel the need to write so many of these tests.

Lisa thinks this is the one rule to remember: 

All manual regression tests must be tested during regression testing

Huh? What is the logic to that? The idea is that, if every single test is run, certain things happen. A critical eye is given to determining which tests actually need to be run. If there's no way to run all of them during a test cycle, that should tell you something. It should tell you which tests you really need to have listed and physically run, and it will also help indicate which tests could be consolidated.

Now, I have to admit, part of me is looking at these examples and asking "Is there a way to automate some of this?" I know, don't throw things at me, but I'm legitimately asking this question. I realize that there are a number of tests that need to be and deserve to be run manually (exploration especially) but with a lot of the regression details, it seems to me that automating the steps would be helpful. 

But back to the point of the talk: this is about MANUAL regression tests. What makes a test WORTHY of being a manual regression test? That's important in this approach. It also helps to make sure that the testers in question are familiar with the test domain. Another idea to take into consideration is the development maxim Don't Repeat Yourself. In the manual regression space, consider "Thou Shalt Not Copy/Paste". Take some time to go through and clean up existing test plans. Cut out stuff that is needlessly duplicated. Keep the intent, clear out the minuscule details. If you must, make a "To Be Deleted" folder if you don't trust yourself, though you might find you don't really need it.

Lisa made a number of metaphorical "bumper stickers" to help in this process:

'tis better to have tested and moved on than to have written an obsolete test'

'Beware of Copy and Paste'

'If you are worried about migrating your tests... YOU HAVE TOO MANY TESTS!!!'

'Less Tests = More Quality' (not less testING, just less test writing for the sake of test writing)

'Friends don't let friends fall off cliffs' (pair with your testers and help make sure each other knows what's going on).


---


The lunch session appealed to me, since it was on "Craftsmanship" and I had heard huge praise for Michael "Doc" Norton, so I was excited to hear what this was all about: how the cross-sections of function, craft and art come together, and how we often lose some of the nuance as we move between them. Many of us decided that we didn't want to be seen as replaceable cogs. We want to be engaged and involved; we want to be seen as active and solid practitioners in our craft (software development and testing, respectively). In the Agile world, there was a de-emphasis on engineering practices, to the point where the North American Software Craftsmanship Conference formed in 2009 in protest of the Agile conference's de-emphasis of engineering principles and skills, and devised a new Manifesto for Software Craftsmanship.

As we are discussing this idea, it has struck me that someone like me has to seek out mentoring, or opportunities for mentoring, outside of my company. In some ways I am the sole practitioner for my "community", yet sometimes I wonder if the skills that I offer are of real or actual value. In some ways, we reward individualistic, isolationist behaviors, and it's only when those behaviors start to hurt us that we wonder "how could we let this happen?" We live and work in a global economy, and because of that, there are a lot of people who can do competent, functional work, and because of their economic realities, they can do that work for a lot less than I can. What should I do? Should I hoard my skill and knowledge so that they can't get to it? For what? So that when I leave the game eventually, all of my skills are lost to the community? If I believe I have to compete on a functional level, then yes, that's a real concern. What to do? Stop competing on a functional level, and start growing on a craftsmanship and artistry level. If we excel there, it will be much more compelling for us to say that we have genuine talent that is worth paying for.

What if we had a chance to change jobs with people from different companies? What if you could share ideas and skills with your competitors? What if two startups could trade team members, so they go and work for the other team for a month or two, with the idea that they will come back and share their newfound knowledge with each other and their teams? It seems blasphemous, but in Chicago, two start-up companies are doing exactly that. What have they discovered? They've boosted each other's business! It sounds counter-intuitive, but by sharing knowledge with their competitors, they are increasing the size of the pie they both get access to.

We've had an interesting discussion following on about how Agile teams are, in reality, working and effectively dealing with issues of technical debt, and how they actually apply the processes they use. Agile and Scrum mean one thing on paper, but in practice vary wildly between organizations. Returning a focus to craftsmanship, allowing others to step up and get into the groove together, and balancing a well-functioning team with actual growth in the discipline is imperative. The trick is that this needs to happen at a grass-roots level. It's not going to happen if it's mandated from management. Well, it may happen, but it won't be effective. People in the trenches need to decide this is important to them, and their culture will help to define how this works for them.


---


Jean Hartmann picks up after lunch with a talk about "30 Years of Regression Testing: Past, Present, & Future", and how the situation has changed dramatically, from how to determine the number of test cases to run, to whether or not those cases are really doing anything for us. Regression testing is one of software testing's least glamorous activities, but it is one of the truly vital functions needed to make sure that really bad things don't happen.


Kurt Fischer proposed a mathematical approach where the goal is: if I do regression test selection, I want to pick the minimum number of test cases that maximizes my code coverage. That was in the early 80s. At the time, hardware and CPU speed were limiting factors. Running a lot of regression tests was just not practical yet. Thus, in the 80s, large-scale regression testing was still relatively theoretical. Analysis capabilities were still limited, and getting more challenging with the advent and development of stronger and more capable languages (Yep, C, I'm looking at you :) ).
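
Just to make that idea concrete for myself, here's a minimal sketch of the kind of selection problem Fischer was describing. This is a simple greedy approximation, not his actual algorithm, and the coverage data is made up for illustration: given a map of test cases to the code blocks they cover, repeatedly pick the test that covers the most blocks you haven't covered yet.

```python
def select_regression_tests(coverage_map):
    """Greedy test selection: pick a small set of tests that still covers every code block.

    coverage_map maps a test name to the set of code blocks it exercises.
    This is an illustrative sketch of coverage-based test selection, not any
    specific researcher's or tool's algorithm.
    """
    uncovered = set().union(*coverage_map.values())
    selected = []
    while uncovered:
        # Choose the test that covers the most blocks we haven't hit yet.
        best = max(coverage_map, key=lambda t: len(coverage_map[t] & uncovered))
        gained = coverage_map[best] & uncovered
        if not gained:
            break  # safety valve: nothing left that any test can cover
        selected.append(best)
        uncovered -= gained
    return selected

# Hypothetical coverage data, for illustration only
coverage = {
    "test_login":    {"auth", "session", "ui_form"},
    "test_checkout": {"cart", "payment", "session"},
    "test_search":   {"query", "ui_form"},
    "test_payment":  {"payment"},
}
print(select_regression_tests(coverage))  # ['test_login', 'test_checkout', 'test_search']
```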


In the 90s, we saw hardware capacity and the increasingly ubiquitous nature of networks (I still remember the catch phrase "the Network is the Computer") open up the ability to run large regression test cycles. I actually remember watching Cisco Systems go through this process in the early 90s, and I set up several labs for these regression tests to actually be run, so yes, I remember well the horsepower necessary to run a "complete" regression test suite for Cisco Systems IOS, and that was "just" for routing protocols. I say "just" because it was, relatively speaking, a single aspect that needed to be tested: Layer 3 networking protocols and configuration. That so-called simple environment required us to set up twenty racks of near-identical equipment so that we could parallelize what was effectively hundreds of thousands of tests. I can only imagine the challenge of testing a full operating system like UNIX, MacOS or Windows and all of the features and suites of productivity software! Still, while it was a lot of gear to support it, it was much less expensive than it would have been in the previous decade.

During the 2000s, setting up environments to do similar work became smaller, faster and much less expensive. Ironically (maybe as a result of the dot-com bomb), many organizations scaled back large-scale regression testing. One organization that not only didn't back away, but doubled down, was, unsurprisingly, Microsoft. Test suites for operating system releases frequently run into the hundreds of thousands of tests, both automated and manual. As you might imagine, test cycles are often measured in weeks.

Microsoft started using tools to help prune test cases and determine a more efficient allocation of test resources, and Microsoft's Magellan Scout was developed for this purpose.

Additionally, with the advent of less expensive hardware and the ability to virtualize and put machines up in the cloud, Microsoft can spin up as many parallel servers as needed, and by doing so can bring down turnaround times for regression test suites. What does the future hold? It's quite possible the ability to spin up more environments for less money will continue to accelerate, and as long as it does, we can expect that the size of regression test cycles for places like Microsoft, Google and others will likely increase.


---


For the rest of the day, I decided to focus on the UX track, and to start that off, I attended Venkat Moncompu's presentation on "Deriving Test Strategies from User Experience Design Practices". The initial focus of Agile methodologies points to an idea that scripted tests, with known expectations and outcomes, give us automated confirmatory tests as the focus. Human beings, though, are best suited for exploration, because we are curious by nature.

While Exploratory Testing offers us a number of interesting avenues (free-style, scenario-based, and feedback-driven), there is also an aspect we can consider to help drive our efforts: using User Experience criteria. Think about the session-based test management approach (charter - test - retrospective - adaptation). Now imagine adding User Experience criteria to the list (create personas, create profile data, and follow them as though they are real people): we start to see the product as more than just the functions that need to be tested; we start to see it in the light of the people most likely to care about the success of the product.

So why does this matter?


Scenarios that can adapt to the behavior of the system offer more than scripted test plans that do not. The personas and user scenarios provide us with a framework to run our tests. Exploration, in addition to developing test charters, also helps our testing adapt based on the feedback we receive.

So why don't we do more of this? Part of it seems to be that UX is treated as a discrete component of the development experience. Many companies have UX directors, and that's their core competency. That's not to denigrate that practice. I think it's a good one, but we need to also get testers actively engaged as well. These are good methods to help us not just be more expressive with our tests, but ultimately, it helps us to care about our tests. That's a potentially big win.


---


The final session for today (well, the last track session) is a chance to hear my friend and conference roommate Pete Walen. Pete is also discussing User Experience Design, User Experience and Usability. UX comes down to Visual Design, Information Architecture, Interaction Design and Accessibility. Pete emphasizes that the key to all of this is "the anticipated use". We talk a lot about this stuff, and that's why Pete's not going to cover those aspects. None of those ideas matter if the people using the software can't do what they need to in a reasonable fashion.

Pete has decided to talk about ships that sank... figuratively. Thus, Pete decided to share some of his own disasters, and what we could learn from them. I couldn't help but chuckle when he was talking about being made a test lead for 100 developers... and he would be the sole tester for that initiative. No, really, I can relate! The first project was to replace an old-school mainframe system with a modern, feature-friendly windowed system, but using all of their old forms. Everyone loved it... except the people who had to enter the data! All the old forms were the same... except that error messages were rampant. The forms that were created spread one screen over five screens... and those five screens did not account for the fact that the form expected all the values to be filled in. The system was carefully tested, and the odd behavior was referred back by the developers as "as designed". Did the users approve of the design? Yes... if you mean the people who parse the data from the screens. What about the people who actually entered the data? They were not asked! Whoops! In short, the visual appeal trumped the functionality, and lost big time.

This was a neat exercise to show that the "user" is an amorphous creature. They are not all the same. They are not necessarily doing the same job, nor do they have the same goals. Yet they are using the same application. In fact, the same person can do the same thing at different times of the day and have two totally different reactions. That happens more than we realize. We want our systems to support work, but often we make systems that impede it.

We are often guilty of systematic bias: we think that the outcome we desire is the one that should happen. This is also called "Works As Designed" Syndrome. In short, the people making the rules had better be involved in the actions that are being performed. If we want to make sure that the system works well for the users entering the data, we have to include them in the process. Pete used a military phrase to describe this in a colorful metaphor... "when the metal meets the meat!" That means YOU are part of the process: if the rules apply to you, then YOU need to be part of the PROCESS. In short, if the system is being designed for you, YOU should be involved in making sure the design works for YOU.

Additionally, when you are dealing with systems, know that testing one component completely while doing no testing on the other components is not going to end well. People do not see "the system" the same way you do. There is no single user. This is why personas are so important. Their experiences are intertwined, but they are different. Even people doing similar processes and jobs are very often unique, and have their own needs based on location and focus.

The question around UX is that there is a fair amount of training required to do the job well. How many testers have actually received any formal UX training? For that matter, how many testers have received any formal testing training from their company? [Hilarity ensues].

On a more serious note, someone recommended handing the official user's manual to their testers and letting them loose for a while... and that's actually not at all a bad idea for some introductory UX training. Why? Because a user's manual is (truthfully) the closest thing to a requirements doc many people will ever get their hands on! In my experience (and many other people's), a user's manual is both a good guide and almost always wrong (or at least inconsistent). A lot of questions will develop from this experience, and the tester will learn a lot by doing this. Interestingly, many services are being made with no manual whatsoever, so we're even losing that possibility.

Ultimately, we need to do the best we can when it comes to representing our actual users, as many of them as humanly possible. There are many ways to do that, and it's important to cast the net as broadly as possible. By doing so, while we can't guarantee we will hit every possible issue, we can minimize a great number of them.


---

You all thought I must be done, but you'd be WRONG!!! One more session, and that is with Rose City SPIN. Michael "Doc" Norton of Lean Dog is presenting a talk called "Agile Velocity is NOT the Goal!" Agile velocity is the measure of a given number of stories and the amount of work time it takes to complete them. According to Doc, velocity is actually a trailing indicator. What this means is that we have to wait for something to happen before we know that something has happened. We use past data to help us predict future results.

Very often we end up "planning by velocity". Our prior performance helps us forecast what we might be able to do in our next iteration. In some ways, this is an arbitrary value, and is about as effective as trying to determine tomorrow's weather based on the weather today, or yesterday. Do you feel confident making your plans this way? What if you are struggling? What if you are on-boarding new team members? What if you are losing a team member to another company?

One possible approach to looking at all the information and forecasting is to instead use standard deviation (look it up, statistics geeks; I just saw the equation and vaguely remember it from college). Good news: the slides from this talk will be available so you can see the math, but in this case, what we are seeing is a representative example of what we know right now. Standard deviation puts us between 16 and 18 iterations. That may feel uncomfortable, but it's going to be much more realistic.
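
To make that concrete for myself, here's a minimal sketch of the arithmetic as I understand it; the velocity history and backlog size below are hypothetical numbers, not figures from Doc's slides. The idea is to forecast a range of iterations using the mean velocity plus or minus one standard deviation, rather than a single point estimate.

```python
import statistics

# Hypothetical velocity history (story points completed per iteration) -- illustration only
velocities = [25, 27, 26, 28, 24, 26, 27, 25]
backlog_points = 450  # hypothetical remaining backlog

mean_v = statistics.mean(velocities)
stdev_v = statistics.stdev(velocities)  # sample standard deviation

# Forecast a range instead of a single number: the optimistic end assumes
# mean + 1 std dev per iteration, the pessimistic end assumes mean - 1 std dev.
optimistic = backlog_points / (mean_v + stdev_v)
pessimistic = backlog_points / (mean_v - stdev_v)

print(f"Mean velocity: {mean_v:.1f}, standard deviation: {stdev_v:.1f}")
print(f"Forecast: roughly {optimistic:.0f} to {pessimistic:.0f} iterations to finish")
```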

One of the dangers in this, and other statistical approaches, is that we run a dangerous risk when we try to manage an organization on numbers alone. On one side, we have the Hawthorne Effect: "That which is measured will improve". The danger is that something that is not being measured gets sacrificed. Very often, when we measure velocity and try to improve the velocity, quality gets sacrificed.

Another issue is that stories are often inconsistent. Even 1-point stories or small stories can vary wildly. Also, there are other factors we need to consider. Do we want velocity to increase? Is it the best choice to increase velocity? If we take a metric, which is a measure of health, and make it a target, we run the risk of doing more harm than good. An example of this is Goodhart's Law, which says, effectively, "making a metric a target destroys the metric". There are a number of other measures we could make, and one that Doc showed was "Cumulative Flow". This was a measure of Deployed, Ready for Approval, In Testing, In Progress, and Ready to Start. This was interesting, because when we graphed it out, we could much more clearly see where the bottleneck was over time. While, again, this is still a trailing indicator, it's a much more vivid one; it's a measure of multiple dimensions and it shows over time what's really happening.
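
For my own notes, here's a minimal sketch of the bookkeeping behind a cumulative flow chart, using the states from Doc's example; the story counts are hypothetical. Each iteration you record how many stories sit in each state, and stacking those counts over time gives you the chart; a band that keeps widening iteration after iteration is your bottleneck.

```python
from collections import Counter

# Workflow states from the talk's example, ordered from "done" to "not started"
STATES = ["Deployed", "Ready for Approval", "In Testing", "In Progress", "Ready to Start"]

# Hypothetical snapshots: for each iteration, the state of every story on the board
snapshots = {
    1: ["In Progress"] * 4 + ["Ready to Start"] * 16,
    2: ["Deployed"] * 2 + ["In Testing"] * 5 + ["In Progress"] * 4 + ["Ready to Start"] * 9,
    3: ["Deployed"] * 5 + ["Ready for Approval"] * 2 + ["In Testing"] * 7
       + ["In Progress"] * 3 + ["Ready to Start"] * 3,
}

# One row per iteration; plotting these counts as stacked areas over time
# is the cumulative flow diagram.
for iteration, stories in sorted(snapshots.items()):
    counts = Counter(stories)
    row = ", ".join(f"{state}: {counts.get(state, 0)}" for state in STATES)
    print(f"Iteration {iteration}: {row}")
```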

Balanced metrics help out a lot here, too. In short, measure more than one value in tandem. Take hours into consideration alongside velocity and quality. What is likely to happen is that we can keep hours steady for a bit, and increase velocity for a bit, but quality takes a hit. Add a value like "Team Joy" and you may well see trends that help tell the truth of what's happening with your team.
