Thursday, April 20, 2023

Into the Depths of Risk Analysis to Improve Your Testing: an #InflectraCON2023 Live Blog


It has been a while since I've done a blog update. Granted, it's been a while since I've been anywhere, so reality has been much the same, but I am currently at InflectraCON and taking notes, so you can all come along for the ride if you'd like :).

Bob Crews

CEO, Checkpoint Technologies


Our first talk is with Bob Crews and covers Risk Analysis to improve testing. Interestingly, we have seen the complexity of software development explode over the past couple of decades. Websites and apps have matured significantly, what they can do has increased exponentially, and it continues to do so. By virtue of that, sites and apps are becoming more challenging to test every day. We can't test everything, no matter how delusional we believe ourselves to be. Thus, we have to apply a different approach: consider what is most critical and important, and then work our way down from there to "nice to have" long before we ever get remotely close to "we've done everything" (trust me, no one gets to that point, ever).

With this, we need to make sure that we have a clear understanding of which areas are most important, what risks we face, and how we are able to mitigate those risks to the best of our ability. We can't prevent risk, but we can do some mitigation in the process. By analyzing the potential threats, we can make sure that we put the most important situations at the forefront. Elisabeth Hendrickson often led with the idea of waking up and seeing your company on the front page of the local newspaper. What would be the most terrifying thing you could see in those headlines? If you can envision that, then you can envision the potential risks if your product were to fail. Odds are, we will never face anything that dire, but it illustrates the critical elements we should be alert to. By putting those horrorshow examples front and center, you have done a simple risk analysis of what could go wrong. From there, you can start to consider what would be next in line, and then consider how to mitigate those potential issues.

To be clear, risk assessment is a time-consuming process and can be as formal or informal as you want to make it. It can be an enterprise-level operation and exercise, or it can be a personal and singular effort just for our own benefit. I'm not sure how many people have CI/CD pipeline systems, but much of the time we have created tests that are independent and can run in any order. That's great for parallelization and speed, but it may not be the best approach for risk mitigation. In a randomized, parallelized environment, every test is basically considered equal. Every test has the same potential to pass or fail, and every test can stop the pipeline until it is resolved. How often do we find ourselves working on trivial tests that stop the system while something major doesn't even get run? There are ways to set up a prioritized run so that the most critical tests run first and cover the broadest area possible. By doing this, we can schedule and structure our tests so that they run in order of criticality. Think of it as placing your tests in folders, where those folders are rated by priority. We would of course want to run the tests in folder #1 before we run the tests in folder #9. To determine where a test belongs in that hierarchy, we would need to evaluate each test and assign it a risk assessment.
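As a concrete (and entirely hypothetical) sketch of that folder-based prioritization, here is what it could look like in a pytest suite. The priority_1/priority_2 folder naming is my own placeholder convention, not something from the talk; the hook simply sorts the collected tests so the lowest-numbered (most critical) folders run first.

```python
# conftest.py -- a minimal sketch, assuming tests live in folders named
# priority_1/, priority_2/, etc. (hypothetical names, not from the talk).
import re

def pytest_collection_modifyitems(config, items):
    """Reorder collected tests so higher-priority folders run first."""
    def priority(item):
        # Pull the number out of the enclosing "priority_N" folder name;
        # anything that doesn't match sorts to the back of the run.
        match = re.search(r"priority_(\d+)", item.nodeid)
        return int(match.group(1)) if match else 999

    items.sort(key=priority)
```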

By taking the time to look at a test and giving it a risk impact score, a likelihood that the failure it guards against might happen, and a possible frequency at which it might happen, we can determine which bucket an item falls into. Also, high impact is subjective much of the time, but there are places where that subjectivity can rise from annoyance to a critical issue. Over time, we can get to the point where we might assign a weight to these tests; let's say that 99 is a top weight and 10 is a minimal weight (I'd argue anything less than 10 may not even be worth running, at least not daily or as part of the full CI/CD commitment).
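As a rough illustration of that kind of weighting (the 1-to-5 scoring scales, the multiplication, and the mapping onto a 10-99 range are my own assumptions, not a formula from the talk):

```python
def risk_weight(impact, likelihood, frequency):
    """Each input is scored 1 (negligible) to 5 (severe / very likely / constant)."""
    raw = impact * likelihood * frequency        # raw product: 1 .. 125
    # Scale the raw product onto roughly the 10-99 range mentioned above.
    return round(10 + (raw - 1) * (99 - 10) / (125 - 1))

# Hypothetical tests and their (impact, likelihood, frequency) scores.
tests = {
    "checkout_payment": (5, 4, 5),
    "profile_avatar_upload": (2, 2, 3),
}
for name, scores in sorted(tests.items(), key=lambda kv: risk_weight(*kv[1]), reverse=True):
    print(f"{name}: weight {risk_weight(*scores)}")
```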

The fact is, we often look at risks as being "Acceptable". For years, Accessibility and Inclusive Design have been low-priority items unless legal action pushes them to the forefront. Accessibility may be seen as a low-priority item until a big client demands it before they will buy your product; then Accessibility rapidly rises to the top of your risk list. Security is always a top-level and critical area, but how much of it is critical? If everything security-related is critical, then nothing really is. Of course we want to keep the system secure, but what level is intelligent and prudent coverage, and what level is overkill? To that end, we create a risk computation based on the classic four quadrants of likelihood and impact: Level 1 being high likelihood and high impact, Level 2 being low likelihood and high impact, Level 3 being high likelihood and low impact, and Level 4 being low likelihood and low impact. Level 1 is of course the most important, and arguably Level 3 is the next most important. Level 4 is probably not even worth our time, but again, circumstances can move any of these situations into a different quadrant. This is why risk assessment is never a "one-and-done" thing.
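A quick sketch of that quadrant classification, using likelihood and impact as the two axes (the 1-to-5 scale and the threshold of 3 are assumptions on my part):

```python
def quadrant(likelihood, impact, threshold=3):
    """Map 1-5 likelihood/impact scores onto the four quadrants described above."""
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_likelihood and high_impact:
        return 1  # Level 1: high likelihood, high impact -- handle first
    if high_impact:
        return 2  # Level 2: low likelihood, high impact
    if high_likelihood:
        return 3  # Level 3: high likelihood, low impact
    return 4      # Level 4: low likelihood, low impact -- probably not worth our time
```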

There's a phrase, the "wisdom of the crowd," where the idea is that a large group of people can determine what is important. If enough people consider an issue to be an issue, it will be addressed. It may or may not make a lot of sense on the surface, but if enough people consider something important and make that fact known, best be sure it will be considered and worked into whatever process is necessary to address it. The crowd is not always right, but it is often a good indication of conventional wisdom. Usability often falls into this category. While we may decide that a process is logical and rational, if enough users disagree with us and decide they will not use our product because of it, it will become a talking point, and possibly a critical one if enough people voice their displeasure.

Over time, we can get pretty good at looking at the risk areas we face and weighing them in order of how critical they are. We may never get to a perfect level, but we will come closer to a workable risk assessment that helps us address the most needful things and prioritize those areas, rather than just trying to be thorough and cover everything.
