Friday, April 21, 2023

Low-level Approaches for Testing AI/ML: an #InflectraCON2023 Live Blog


 One of the great parts of conferences like this is that I meet people I have interacted with for years. Jeroen is one of those people. We worked together on the book "How to Reduce the Cost of Software Testing" back in 2010 but we had never met in person before this week. We've had some great conversations here and now I finally get to see him present.

Jeroen Rosink Avatar

Jeroen Rosink

Sr. Test consultant, Squerist

I think it's safe to say everyone has been hit with AI and ML in some form or capacity. If you need an explanation of AI and machine learning, I'll let ChatGPT tell you ;).


AI, or Artificial Intelligence, refers to the development of computer systems that can perform tasks that would typically require human intelligence. These tasks might include things like recognizing speech or images, understanding natural language, making decisions, and solving problems. AI can be classified into various categories such as supervised learning, unsupervised learning, reinforcement learning, and deep learning.

Machine learning is a subset of AI that focuses on teaching computers how to learn from data without being explicitly programmed. In other words, it's a method of training algorithms to make predictions or decisions based on patterns in data. Machine learning algorithms can be trained on a variety of data types, including structured data (like spreadsheets) and unstructured data (like text or images). The most commonly used machine learning algorithms are supervised and unsupervised learning algorithms.

I mean, that's not bad, I'll take it. So I used AI to explain AI. What Inception level is this ;).

AI is always learning and it has been trained on large data sets. I often look at AI as a good research assistant. It can do some pretty good first-level drafting but it may miss out on some of the nuances and it may also not be completely up to date with the information it provides. Also, Machine Learning really comes down to ranking agents and probability. The more successes it establishes, the higher it ranks certain responses. To be clear, even with how rad AI and ML seem to be, we are still in the early days of it. We can have all sorts of debates as to how much AI will take over our work lives and make us obsolete. Personally, I don't think we are anywhere near that level but I'd be a fool to not pay attention to its advances. Therefore, we need to consider not just how we are going to deal with these things but how we are going to test them going forward.

 Jeroen talks about the confusion matrix and how that is used to test ML.

The confusion matrix is used to evaluate machine learning models, particularly in classification tasks. Think of it as a table with a number of correct and incorrect predictions made by a model for each class in a set of data.

The four possible outcomes are:
- true positives (TP)
- false positives (FP)
- true negatives (TN)
- false negatives (FN).


A true positive occurs when the model correctly predicts a positive instance.
A false positive occurs when the model incorrectly predicts a positive instance.
A true negative occurs when the model correctly predicts a negative instance.
A false negative occurs when the model incorrectly predicts a negative instance.
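
As a quick illustration (not from Jeroen's slides), here is a minimal Python sketch that tallies those four outcomes for a toy binary classifier and derives accuracy, precision, and recall from them; the label lists are made-up data.

```python
# Minimal sketch: tally confusion-matrix outcomes for a binary classifier.
# "actual" and "predicted" are hypothetical label lists; 1 = positive, 0 = negative.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy  = {(tp + tn) / len(actual):.2f}")
print(f"precision = {tp / (tp + fp):.2f}")   # how many predicted positives were right
print(f"recall    = {tp / (tp + fn):.2f}")   # how many actual positives were found
```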

Jeroen has two approaches that he is recommending:

The Auditor's Approach

First, we perform a walkthrough so that we can see if the data is reliable and useful. From there, we do a Management Test: we use data in enough volume to see whether what works with small numbers still works with larger ones. If we can see that the data is relevant with one record, and with 25, then we can check whether it's still relevant with 50, 100, 1,000, and so on. We can't predict the outputs exactly, but we can have some suppositions as to what they should look like.
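
To make the scaling idea a bit more concrete, here is a rough sketch of how it might look as a repeated check over growing sample sizes; the record loader and the invariants are hypothetical stand-ins, not something from the session.

```python
# Sketch of the "start small, then scale up" idea: run the same sanity checks
# against progressively larger samples and watch whether the relationships hold.
# sample_records() and the invariants are hypothetical placeholders.
import random

def sample_records(n):
    """Stand-in for pulling n records from the system under test."""
    return [{"score": random.uniform(0, 1), "label": random.choice([0, 1])} for _ in range(n)]

def check_invariants(records):
    assert all(0.0 <= r["score"] <= 1.0 for r in records), "score out of range"
    assert {r["label"] for r in records} <= {0, 1}, "unexpected label"

for volume in (1, 25, 50, 100, 1000):
    batch = sample_records(volume)
    check_invariants(batch)
    positives = sum(r["label"] for r in batch)
    print(f"n={volume:5d}  positive rate={positives / volume:.2f}")
```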

The Blackhole Approach

This is an interesting approach for when we don't necessarily know what the data is or what we would actually have as data. We can't describe what is actually inside the black hole, but we can describe what surrounds it or is visible around it. In this capacity, we look for patterns and anomalies that don't correspond with our expectations. If we see a pattern that doesn't match what we expect, we may have an issue or something that we should investigate, but we are not 100% sure of that fact. Jeroen explained that there's a technique that can be applied to the classic "Where's Waldo?" illustrations: with a pen and a few marks on the page, the approach can narrow down where Waldo is in about ten passes. To be clear, the system doesn't know where Waldo is, but it examines patterns in the image and breaks them down to figure out where the item it is looking for might be.
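
One way to read the "describe what surrounds the black hole" idea in code: rather than asserting exact outputs, we only flag observations that fall outside the pattern we expect. The expected band and the observations below are made-up numbers for illustration.

```python
# Sketch: flag outputs that break the expected pattern instead of asserting exact values.
# The expected band and the observations are made-up numbers for illustration.
observations = [0.48, 0.51, 0.49, 0.92, 0.50, 0.47, 0.05, 0.52]
expected_low, expected_high = 0.40, 0.60   # the "shape" we expect around the black hole

anomalies = [(i, v) for i, v in enumerate(observations)
             if not (expected_low <= v <= expected_high)]

for index, value in anomalies:
    # Not necessarily a bug -- just something that doesn't match the pattern
    # and therefore deserves investigation.
    print(f"observation {index} = {value}: outside expected band, investigate")
```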


These are neat ideas and frankly, I would not have considered these prior to today but be sure I'm going to think a lot more about these going forward :).



Technical Debt Is Not Free: an #InflectraCON2023 Live Blog

 


Chad Green Avatar

Chad Green

Director of Architecture, Glennis Solutions


When you hear the term "Technical Debt", what does that mean to you? Often the term "a quick and dirty fix" is used with the idea that it will be taken care of later. In short, anything you have to revisit later because of the limitations of today is specifically technical debt. 

The fact is, many of us have probably participated in the process of developing technical debt, whether we intended it or not. Even mature teams find themselves in technical debt, sometimes by active means, and sometimes by inaction. I remember working with an organization that had a great automation framework and it was very robust, with a lot of code libraries to support it. It was a great system, as long as the developers that created it were there to maintain and support it. However, we came to a point where our developers that did support this were no longer there and the team members that were there did not have the level of expertise necessary to maintain it. What was once a vital linchpin of our efforts became outdated and in some ways dangerous to do any work on. We went from having a reliable system to a need to modernize. In short, we woke up with a technical debt through no intentional effort on our part but we had to address it. We ultimately did but it took time and effort and a lot of iteration. We learned from that experience that we needed to make sure that what we developed had a broad base of support so that any of us could work on and maintain it.

Going into debt is not a crime or even a bad situation by itself. It can certainly become a bad situation if it's not addressed or, worse, ignored. In many ways, we have to be careful and make sure we are doing things in ways that are maintainable and understood. I went through this recently with some changes I proposed to a system, taking a different approach to handling API data (this came from a suggestion from one of our devs). When I submitted it, the comments back were, "This is interesting and a way that we hadn't considered. We're not saying 'no' but we may want to make sure we understand why we'd want to do it this way. Can we revisit this in the next sprint?" That's a perfectly reasonable request. Let's understand what making this change might mean and how it might modify our testing methodology.

Technical debt sounds scary but there are ways to limit it or mitigate it. Take the time to do actual code reviews and make sure that the changes being made are understood. Talk about these things in Stand Up if necessary but surface these issues, as well as discuss these in your sprint reviews. Much of the time technical debt will be hiding in plain sight. The best way to deal with technical debt is to look for it and identify it as early as possible.


Bad Tests Running Wild - an #InflectraCON2023 Live Blog


 

Paul Grizzaffi Avatar

Paul Grizzaffi

Senior QE Automation Architect, Vaco

Paul and I go way back. It's always fun to see my fellow in heavy metal arms at these events. We frequently talk music as much as we talk testing, so we are often in each other's sessions and today is no exception. Plus, Paul and I love making musical puns in our talk titles, and seeing "Bad Tests Running Wild", I knew that was a reference to "Bad Boys Running Wild", the lead-off track from Scorpions' 1984 album "Love at First Sting"... yeah, this is going to be fun :).


The point here is that, especially with CI/CD pipelines, we need the tests to pass in order to successfully complete and deploy an application. If a test fails, the whole process fails. By virtue of how tests run in a CI/CD pipeline, we need to make sure that any test we have can run all the time, independent of any other test and independent of any state of our product. This means a flaky test can really derail us. Note, this is not talking about a test legitimately failing or finding a fault. This is more the "random timeout caused by a bit of latency that has nothing to do with our application".
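
As a small illustration of one way teams try to defuse that particular kind of flakiness, here is a sketch that polls for a condition with a deadline rather than sleeping for a fixed amount of time; the readiness check is a hypothetical placeholder, not anything from Paul's talk.

```python
# Sketch: poll for readiness with a deadline instead of a fixed sleep, so a burst
# of latency delays the test slightly rather than failing it outright.
# check_service() is a hypothetical stand-in for whatever readiness probe you have.
import time

def check_service():
    """Pretend readiness probe; replace with a real health check."""
    return time.time() % 10 > 5   # arbitrary stand-in condition

def wait_until(condition, timeout=30.0, interval=0.5):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

assert wait_until(check_service), "service never became ready within the timeout"
```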

Let's think about how we create our calls and procedures. Do we have everything under our own umbrella? How much of our solution uses third-party code? Do we understand that third-party code? If we are using threaded processes for concurrency, are all of our components able to use those concurrent thread approaches?  

Let's think about configuration and how we set things up. Why do we want or need parallelization? Overall, it comes down to time and speed. I remember well our earlier setup with Jenkins from about a decade ago. It took us several hours to run everything in serial, so we needed to set up the environment in such a way that we could run four servers in parallel. At some point, we have to weigh the cost of running our CI/CD pipeline against the time it takes to deploy. Our sweet spot was determined to be four servers running in parallel. Those four servers ran our tests in twenty minutes and then did our deployment if everything went smoothly. Going from several hours to twenty minutes was a big time saving but yes, it cost money to set up servers robust enough to get those savings in time. Beyond those four servers, we determined that adding more created a less favorable cost-to-time-savings ratio. Still, it was critical to make sure that any tests we ran, and any states they changed, were completely self-contained. No test was allowed to leave any residual footprints. Additionally, we had to ensure that our main server and our client machines were responding quickly enough that we didn't have latency issues across multiple machines (heck, spinning up a machine in a different server farm could mess everything up, so you needed to make sure that everything was proximate to everything else).
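
The "no residual footprints" point is the part I find easiest to show in code. Here is a minimal sketch of a self-contained test, assuming pytest; the scratch "workspace" is a hypothetical stand-in for whatever state your tests actually touch.

```python
# Sketch: keep every test self-contained so it can run in any order, on any of the
# parallel workers, without leaving residual state behind. Assumes pytest; the
# workspace idea is a hypothetical stand-in for whatever state your tests touch.
import pathlib
import shutil
import tempfile

import pytest

@pytest.fixture
def workspace():
    path = pathlib.Path(tempfile.mkdtemp(prefix="test-ws-"))
    yield path                      # the test gets its own scratch area...
    shutil.rmtree(path)             # ...and we remove every footprint afterwards

def test_writes_report(workspace):
    report = workspace / "report.txt"
    report.write_text("ok")
    assert report.read_text() == "ok"
```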

Also, so far we have only considered what happens when a test fails when it's not supposed to. However, we also have to consider the flip side: what happens if a test passes that shouldn't? That's the other face of a flaky test. What if we have made a change but our test is too generic to capture the specific error we have introduced? That means we may well have introduced a bug that we didn't or wouldn't catch.
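
Here is a contrived sketch of that trap, with made-up data: the generic status-code check stays green even though the response body is reporting an error, while a more specific assertion would catch it.

```python
# Contrived sketch of the "passes when it shouldn't" trap. The response dict is a
# made-up stand-in for an API reply that succeeded transport-wise but failed logically.
response = {"status": 200, "body": {"error": "price calculation overflowed"}}

generic_check = response["status"] == 200                 # stays green: too coarse
specific_check = "error" not in response["body"]          # goes red: catches the bug

print(f"generic assertion passes: {generic_check}")       # True  -> false confidence
print(f"specific assertion passes: {specific_check}")     # False -> the test we wanted
```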

Risks are always going to be present, and our goal as testers and automation specialists is to look at the potential risks in front of us. What is the basic risk we need to mitigate? What happens when we deploy our systems? Do we have the ability to back out of a change? What do we need to do to redeploy if necessary? If we deploy, do we have an easy way to monitor what has gone in? Paul makes the point that, if a change is potentially expensive, then you probably need human eyes to watch and monitor the situation. If there's little cost or risk of failure, then it could be handled without a person looking over it. Regardless, you will need the ability to monitor, and to that end, you need logs that tell you meaningful information. More to the point, you need to know where the logs are and that they are actually accessible.
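
On the logging point, a tiny sketch of what "meaningful information" might look like in practice: structured, searchable entries with the context a human monitor would need. The field names here are my own illustration, not something Paul prescribed.

```python
# Sketch: log deployments with enough context that a human can monitor them later.
# The field names (build_id, change_cost, rollback_ready) are illustrative only.
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("deploy")

def log_deploy_event(build_id, environment, change_cost, rollback_ready):
    log.info(json.dumps({
        "event": "deploy",
        "build_id": build_id,
        "environment": environment,
        "change_cost": change_cost,        # "high" cost means human eyes on it
        "rollback_ready": rollback_ready,  # can we back the change out quickly?
    }))

log_deploy_event("2023.04.21-042", "staging", change_cost="high", rollback_ready=True)
```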

As always, exciting, interesting, and great food for thought. Thanks, Paul. Rock on!!!

The Dark Side of Test Automation: an #InflectraCON2023 Live Blog

 



Jan Jaap Cannegieter Avatar

Jan Jaap Cannegieter

Principal Consultant, Squerist


Jan starts out this talk with the idea from Nicholas Carr's book "The Glass Cage" that "the introduction of automation decreases the craftsmanship of the process that is automated". I've seen this myself a number of times. There's a typical strategy that I know all too well:

- explore an application or workflow
- figure out the repeatable paths and patterns
- run them multiple times, each time capturing a little more into scripts so I don't have to keep typing
- ultimately capture what I need to and make sure it passes.

The challenge with this is that, by the time I'm done with all this, unless a test breaks, that test will now effectively run forever (or every time we do a build) and honestly, I don't think about it any longer. The question I should be asking is, "If a test always passes, is it really telling us anything?" Of course, it tells me something if the test breaks. What it tells me varies. It may indicate that there's a problem but it also may indicate a frailty in my test that I hadn't considered. Fix it, tweak it, make it pass again, and then... what?  

I'm emphasizing this because Jan is. Just because a test is automated doesn't necessarily tell us how good the testing is, just that we can do it over and over again. Likewise, just because a test is automated, it doesn't really give us much indication as to the quality of the testing itself. Let me give an example from my own recent testing which revolves around APIs. On one hand, I am able to find a variety of ways to handle GET and POST commands but on the other, do I really know that what I am doing actually makes sense? I know I have a test or a series of tests but do I actually have tests that are worth running repeatedly? 
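
To make that concrete, here is a minimal sketch of the kind of API check I am wrestling with, using the requests library; the endpoint, fields, and expectations are hypothetical stand-ins rather than anything from my actual project. The point is that a test worth repeating pins down the contract, not just the status code.

```python
# Sketch of an API check that tries to earn its repeated runs by asserting on the
# shape of the data, not just the status code. The endpoint and fields are
# hypothetical; swap in whatever your API actually returns.
import requests

def test_list_widgets():
    resp = requests.get("https://api.example.com/v1/widgets", timeout=10)
    assert resp.status_code == 200

    payload = resp.json()
    assert isinstance(payload, list) and payload, "expected a non-empty list"
    for widget in payload:
        # Pin down the contract we actually rely on, so a change here fails loudly.
        assert {"id", "name", "price"} <= widget.keys()
        assert widget["price"] >= 0
```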

I appreciate the fact that automation does something important but it may not be the importance we really want. Automation makes test efforts visible. It's hard to quantify exploratory sessions in a way that is easy to understand. By comparison, it's easy to quantify the statement, "I automated twenty tests this week". Still, much of the time, the energy I put into test automation saves me repetitive typing, so that part is great but it doesn't specifically find bugs for me or uncover other paths that I hadn't considered. 

There are five common misunderstandings when it comes down to test automation:

- the wish to automate everything

I have been in this situation a number of times and it typically becomes frustrating. More times than not, I find that I'm spending more time futzing with tooling than I am actually learning about or understanding the product. There's certainly a variety of benefits that come with automation but thinking the machines will make the testing more effective and frequent often misses the mark.

- you can save money with test automation

Anyone who has ever spent money on cloud infrastructure or on CI/CD pipelines realizes that often having more automated testing doesn't save money at all, it actually increases cycles and spending. Don't get me wrong, that may very well be valuable and helpful in the long run but thinking that automation is going to ultimately save money is short-sighted and in the short term, it absolutely will not save money. At best, it will preserve your investment... which in many cases is the same thing as saving money, just not in raw dollar terms.

- automation makes testing more accessible

Again, automation makes testing more "Visible" and "Quantifiable" but I'd argue that it's not really putting testing into more people's hands or making them more capable. It does allow the user who maintains pipelines to be able to wrap their heads around the coverage that exists but is it really adding to better testing? Subjective at best but definitely a backstop to help with regressions.

- every tester should learn how to program

I'd argue that every tester who ever takes a series of commands, saves them in a script, and then types one command instead of ten is programming. It's almost impossible not to. Granted, your programming may be in the guise of the shell but it is still programming. Add variables and parameters and you are de facto programming. From there, stepping into an IDE has a bit more learning but it's not a radical step. In other words, it's not a matter of, "Does every tester need to learn how to program?" We invariably will. To what level and at what depth is the broader question.
 
- automation = tooling

I'm going to argue that this is both a "yes" and "no". As I said previously, you can do a lot of test automation using nothing but a bash shell (and I have lots of scripts that prove this point). Still, how do scripts work? They work by calling commands that pipe the output to some other command and then based on what we pipe to what, we do one thing or we do something else. Is this classic test tooling as we are used to thinking about it? No. Is it test tooling? Well, yes. Still, I think if you were to present this to a traditional developer, they would maybe raise an eyebrow if you explain this as test tooling. 
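
Here is that humble pattern rendered in Python rather than bash: shell out, inspect the output, branch on what comes back. The log path and the grep target are hypothetical; the point is only that this is still test tooling, framework or not.

```python
# Sketch of "automation = tooling" in its humblest form: shell out, inspect the
# output, branch on what you find. No framework, but it is still test tooling.
# The grep target and the log path are hypothetical.
import subprocess
import sys

result = subprocess.run(
    ["grep", "-c", "ERROR", "/var/log/myapp/app.log"],
    capture_output=True, text=True,
)

error_count = int(result.stdout.strip() or 0)
if error_count:
    print(f"found {error_count} ERROR lines -- flagging the build")
    sys.exit(1)
print("no ERROR lines found")
```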

My rationale, and it seems Jan feels a similar way, is that we need to look at automated testing as more than just a technical problem. There are organizational concerns, there are perception issues, and there are communication issues. Having automation in place is not sufficient. We need to have a clear understanding of what automation is providing. We need clarity on what we are actually testing. We need to have an understanding of how robust our testing actually is and also how much of our testing is tangibly capturable in an automated test. What does our testing actually cover? How much does it cover? What does running one test tell us versus what ten tests tell us? Are we really learning more with the ten tests we run or is it just a number to show we have lots of tests?

The real answer to this comes down to, "Why are we testing in the first place?" We hope to get the information we can make judgment calls on and ultimately, automated tests have a limited ability to make judgment calls (if they can make them at all). People need to analyze and consider to see what is going on and if it is actually worthwhile. It has its place, to be sure, and I wouldn't want my CI/CD environments running without them but let's not confuse having a lot of tests with having good tests.


Castle Defense 101 (aka Threat Modeling): an #InflectraCON2023 Live Blog

Well, good morning to everyone. We woke up to some excitement today. A fire alarm went off at 7:30 this morning, so I had to evacuate the venue. It was interesting watching myself react to what was happening. I've always been fond of saying I'd be one to just get up and get out but, as it turns out, no, I had the presence of mind to quickly gather my stuff into some bags, grab my laptop and backpack, and walk out. Fortunately, I travel relatively lean as a principle so it didn't take me long, but were it a more dire situation, I fear I might have been one of those stragglers who gets cut off. All's well that ends well but yeah, it gave me pause, to say the least.

Anyway, on to today's sessions.

Gene Gotimer Avatar

Gene Gotimer

DevSecOps Engineer, Praeses

This talk on Castle Defense is both entertaining and an interesting look at what we might face as potential security issues and how we want to protect against potential attacks. When we think of castles, we often think of grand, large structures. Interestingly, castles all start as smaller structures. If we think of the old chessboards and the character of the Rook (or the Castle, yes), they are minimalist towers in many cases and typically, that's where motte-and-bailey castles start, too. They may be small or not terribly imposing but they can still be remarkably effective at defense if set up correctly. This is an interesting metaphor when it comes to security because, in many ways, security is graded on a curve. It also helps to know what your castle or fortress is meant to defend. A manor home for a noble is designed to keep people out. Prisons are meant to keep people in. It's important to know what you are protecting and what direction matters.

In addition to the conversations about potential security threats for applications, we are taking the time to look at and analyze actual castles from the medieval period and later to see what they did and how they set up their defenses, and then analyze ways we could undermine that security. Granted, some of these castles have been modernized and are no longer set up in a way that would have addressed the threats of the past. An example of a castle in the Netherlands shows a ground-level manor with what looks to be no defensive walls, windows down to the ground, and a river flowing by outside. In short, it doesn't look to be defensive in any meaningful way, until you see the center tower. That tower resembles what may have been the original structure, with a broad overhang with machicolations (I love that word so much (LOL!) ), but you can see that time and necessity have changed how the building is used. Its original threat modeling from when it was built wasn't necessary for later centuries, so the building was adapted to face more modern realities. Many castle fortifications made sense in the era of catapult and trebuchet but became obsolete with the advent of gunpowder and cannons.

 So let's consider this in the modern world. We aren't building castles and keeps in the literal sense (generally speaking for this audience) but our applications are in many ways our castles. If they are breached or hacked, our data, our financial well-being, and our reputations are on the line, so the threats, while different, are every bit as potentially devastating. Thus we need to put time and attention towards making a rational and logical level of security for our applications. Some situations are going to require more hardening than others. If you have an informational site with no database backend for transactions, your threat modeling is going to be smaller and less intensive compared to a site that handles the personal data of individuals or the literal handling of payments. A WordPress blog is going to be lower in priority compared to a banking app. We need to measure our time and investments for the threats that make sense.

There's a site called "Threat Modeling Manifesto" that spells out a broad range of these possible attacks and threats and how to handle them. From their headline:

Threat modeling is analyzing representations of a system to highlight concerns about security and privacy characteristics.

At the highest levels, when we threat model, we ask four key questions:

- What are we working on?
- What can go wrong?
- What are we going to do about it?
- Did we do a good enough job?

This was an interesting way to talk about this topic and I applaud the creative approach. It took a potentially dry topic and made it a lot more engaging.

Thursday, April 20, 2023

Being an A11y: Why Accessibility Advocacy Matters: my talk from #InflectraCON2023

Accessibility is a broad area. It can be applied to many different scenarios and can be met in many different ways. At the end of the day, though, we are dealing with people with challenges and concerns that, let's face it, most if not all of us will face if we live long enough. 

Accessibility is more than checking off a box that says "We are compliant". It is advocating for people to be able to effectively participate in daily life as any of us would, with accommodations where necessary. 

In this talk, I will show you areas where we can do better to make products more usable, not just for those with physical disabilities but for all users. I will demonstrate tools and techniques to help test as well as make a case on behalf of those people who are not able to speak for themselves.


Michael Larsen Avatar

Michael Larsen

Senior Quality Assurance Engineer, Learning Technologies Group/PeopleFluent 

The key to this talk this go around was that I stepped a bit away from the what and the how (still important) and emphasized the "why". This was less a talk about tools and processes (though I touched on them) and instead emphasized ways we could advocate for Accessibility. As always, I owe a big round of thanks to Jeremy Sydik (his 10 principles will probably always be a keystone to my presentations) and to Albert Gareev (the HUMBLE principles are still, in my mind, the easiest way to encourage Accessibility advocacy regardless of your skill level).

I do want to share this tweet the organizers of InflectraCON shared because, wow, this made my day :).


Accelerating Quality with Conscious Deliveries: an #InflectraCON2023 Live Blog


So this is fun. Lalit and I have known each other for years. We have attended conferences together. I've been interviewed by Lalit for Tea Time With Testers. We've worked together within the Association for Software Testing on a variety of initiatives. Having said all that, I think this is the first time I've actually seen/heard Lalit speak :).

Lalitkumar Bhamare Avatar

Lalitkumar Bhamare

Tech Arch Manager, Accenture Song

It's interesting to realize that with all of the technological advances we have had over the past thirty years (probably longer but I've only been in the game for three decades), we still have to pick from the speed, cost, and quality triangle (you know it, "Fast, Cheap, Good. Pick two!"). It seems that if any shift is going to happen, it typically happens at the "Good" part, meaning that if any pressure comes into the situation, the quality side is the side that ends up bending. Granted, often that means we get "good enough" and for many people, that is sufficient. 

The irony is that we don't have to settle for good enough but it will require that upfront planning and resources be allocated to make sure that quality is reinforced. This comes down to requiring people to be motivated to provide not just good testing but a mindset of the importance of testing beyond the busywork of automation and declaring that testing has been performed. 

We had a little discussion about apps we actually like using. Recently, due to life circumstances, I've become more familiar with healthcare apps. My company's insurance provider has gone big on telemedicine recently and, to that end, it seems they have either found out I'm a tester or I just tick off a lot of boxes for them, because they have thrown just about every app and tool my direction to manage healthcare and treatment options from a digital perspective. Some of these apps have been really helpful and some of them have been... less than desirable, to say the least. To be clear, these apps are not developed by the insurance provider but they are either partnerships or investments that my insurance provider has made and encourages us to use. It's interesting to compare them and see what makes them "quality" products vs. not-so-good quality. Also, some apps have great quality in some areas while being less good in others. A perfect example of this is an app I'm involved with that focuses on weight management specifically for people who are at risk for Type 2 Diabetes (which family history points to me being, so I'm part of their initiative for that reason). In areas where real human interaction takes place, it's great. However, they have recently made decisions to limit website updates and manage almost exclusively via their mobile app. In one way, this makes sense, as we are more likely to be within arm's reach of our phones at meal breaks than our computers. Still, I type way faster on a computer than I do on my phone, so invariably, their change has resulted in my updates being delayed, sometimes by days, because it's less convenient for me to enter the details. I'm curious if anyone on their team even brought up this possibility.

Lalit emphasized three "P" areas of quality consideration: a Project aspect, a People aspect, and a Product aspect. He additionally emphasizes the 4 "E"s of quality: Enable, Engage, Execute, and Evaluate. The 4Es apply to each of the 3Ps. Granted, each of these elements has a context based on where it is applied, and there are biases that come into play. He uses the story of the parable of the elephant, where our limitations often constrain our vision and view of an aspect of something (I had a good laugh realizing that Lalit chose Dieter F. Uchtdorf's telling of this story. It took me a minute but I kept thinking "wait, I know that voice" (LOL!) ).

Over time, we need to be open to learning new things and incorporating more knowledge of the potential for requirements, and translating that to concrete actions that can be used. As testers, we may be doing a lot of work but we may not actually be aligned with actual business issues or challenges. As I love to borrow from Stephen Covey, how infuriating is it to know we've climbed a significantly tall ladder only to realize we've placed it against the wrong wall?

We know that we cannot engineer quality, at least not in a literal sense. What we can do is preserve as much of the product's intended integrity as possible and take steps to make sure that we are learning and focusing on the areas that help us create the best product we can. To that end, having quality experience sessions can help to inform how a product is being used and what can be done going forward. Ideally, these considerations are made early on in the life of the product or as it is being developed. For that to be effective, it requires people with a focus on quality to ask questions and experiment with requirements early on. The later this happens, the less value it will have for the design, and it can be very demoralizing to realize the "right ladder, wrong wall" problem is happening after we've already climbed quite a bit.

To wrap this up, if you need a simple thing to consider and practice, "test early, test small, and test continuously" is a pretty good approach; apply it to all of the areas you interact with. If you find it valuable, share the approach and help it expand through the organization.


Use Design Thinking and Gut Brain for Agile Teaming: an #InflectraCON2023 Live Blog


 Jenn Bonine Avatar

Jenn Bonine

Founder and CEO, Valhalla Capital and COYABLU

I confess I'm stealing these graphical assets from the InflectraCON site but they make things nice and easy to associate a face with a name and a talk with a byline. Jenn and I have spoken at a variety of conferences together over the past decade and it's always fun to catch one of her sessions. Also, I'm proud to announce that the pink lion she gave me at last year's InflectraCON sits proudly on my guitar amp at home (LOL!).

One of the ways we tend to create or develop things is that we make them based on what "we" want. "We" means the people who are making it are making it for themselves. However, Design Thinking goes beyond that, in the sense of designing something that will affect and influence many people, perhaps millions or even billions at the outer edge.

Design thinking incorporates elements of imagining the future. What will this thing look like a year, five years, or ten years from now? Can we speak to what that future might look like? More important, can we actually shape that future? This is the FORESIGHT aspect necessary for Design Thinking. Often, we have to get out of the ways that make sense to us and see if we can make sense in a way that works for many people. 

Apple is a good example of a company that started as one thing and then over time grew into and developed markets that not only were not their specialties earlier but didn't really exist prior. Just think about the way that we interact with our phones today (and for that matter what phones look like today). There is a dividing line between pre-iPhone and post-iPhone coming on the scene. Cell phones used to have wildly different design aspects prior to 2007. Now, almost every phone we see is the ubiquitous "black mirror", regardless of who makes it. 

ChatGPT is the new design thinking brainchild that is capturing everyone's attention. We now have the job description of "Prompt Engineer" which, unless you are an AI nerd, was something you may have never even heard of or considered but now will likely become a prevalent focus. This has come about because ChatGPT has considered the breadth of who their potential customers might be. You may think to yourself that there is a limited level of use for this technology but given some time and experience, you realize there is a lot that can be done provided you know what to ask and how to ask it. That's prompt engineering in a nutshell.

So how does Design Thinking fit into all of this? In part, it means that we need to look beyond our immediate needs and our immediate bubble. Too often, we only focus on the areas that are either our specific pain points or our immediate sphere of influence. While that's a great approach for inner peace, it's probably going to cause us to fall behind if we don't look further afield. Heck, as a person who does automation, I can often get used to the patterns that we have determined work. Once we do, we can do a lot of work and be seen as effective. That may work for a time but if we don't continually look farther afield, we will effectively be working feverishly on yesterday's problems. Taking a bigger-picture look to see what can either make things easier or, barring that, may pay off years down the road can be a big step.

Design thinking principles (originally from Nielsen-Norman Group)


The ideas behind design thinking fall into three broad areas and six efforts that support those areas. First, we have to understand (discover) a problem, explore our options (create), and then materialize those results (deliver). Often, we start with one idea but in the process, if we are open to the interactions we have with other people, we may develop something different than what we originally envisioned. Each of the big areas can be broken down more, such as Empathize and Define for Discover, Ideate and Prototype for Create, and then Test and Implement for Deliver.

The next step is to look at what is called the "Gut Brain" (and while we're not talking specifically about the microbiota that resides in our digestive tracts, it absolutely plays into this ;) ). This sense of "Gut Brain" is delving into our intuitiveness, looking at things that hit us viscerally, and deciding to use those intuitions and interactions to look farther afield of our immediate considerations. In short, this is a way of tapping into our playful nature and making considerations we might not otherwise tap into. Sketch notes are an example of this, where a person, instead of just writing words, includes pictures and other doodles to help inform and nurture further thought and understanding. In some ways, it requires visualizing a different world than the one others see. I'm reminded of the woman who finally beat Ken Jennings in Jeopardy years back. How did she do it where others failed? In part, she didn't go in thinking Ken was unbeatable. She visualized herself being able to beat him and used that gut instinct to get the better of him. I'm spacing on who it is but I always loved her statement that so many people thought, "It's such an honor to even compete against him" but she said, "He's beatable, I just have to figure out how". She led with that gut instinct and figured out how to prevail.

Lots of neat stuff to consider here as well as a lot of digital tools. In short, explore, discover, consider, play, and see if you might actually reinvent your world.

Into the Depths of Risk Analysis to Improve Your Testing: an #InflectraCON2023 Live Blog


It has been a while since I've done a blog update. Granted, it's been a while since I've been anywhere so reality has been much of the same but I am currently at InflectraCON and taking notes, so you can all come along for the ride if you'd like :).

Bob Crews Avatar

Bob Crews

CEO, Checkpoint Technologies


Our first talk is with Bob Crews and is covering Risk Analysis to improve testing. Interestingly, we have seen the complexity of software development explode over the past couple of decades. Websites and apps have matured significantly and what they can do has increased exponentially and continues to do so. By virtue of that, sites and apps are becoming more challenging to test every day. We can't test everything, no matter how delusional we believe ourselves to be. Thus, we have to apply a different metric. We have to consider what is critical and of most importance, and then work our way down from there to "nice to have" long before we ever get remotely close to "we've done everything" (trust me, no one gets to that point, ever).

With this, we need to make sure that we have a clear understanding of what areas are most important, what risks we face, and how we are able to mitigate those risks, to the best of our ability. We can't prevent risk but we can do some mitigation in the process. By analyzing what the potential threats are, we can make sure that we put the most important situations at the forefront. Elisabeth Hendrickson often led with the idea of waking up and seeing your company on the front page of the local newspaper. What would be the most terrifying thing you could see in those headlines? If you can envision that, then you can envision what the potential risks are if your product were to fail. Odds are, we will never face anything that dire but it illustrates the critical elements that we should be alert to. By putting those horrorshow examples front and center, you have done a simple risk analysis of what could go wrong. From there, you can start to consider what would be next in line, and then consider how to mitigate those potential issues.

To be clear, risk assessment is a time-consuming process and can be as formal or informal as you want to make it. It can be an enterprise-level operation and exercise, or it can be a personal and singular effort just for our own benefit. I'm not sure how many people have CI/CD pipeline systems but much of the time, we have created tests that are independent and can run in any order. That's great for parallelization and speed but it may not be the best approach for risk mitigation. In a randomized, parallelized environment, every test is basically considered equal. Every test has the same potential to pass or fail, and every test can stop the pipeline until it is resolved. How often do we find ourselves working on trivial tests that stop the system while something major doesn't even get run? There are ways to set up a prioritized run so that the tests that get run first cover the most critical and broadest areas possible. By doing this, we can schedule and structure our tests so that they run in criticality order. Think of it as placing your tests in folders, where those folders are rated by priority. We would of course want to run the tests in folder #1 before we run the tests in folder #9. To determine which tests belong in which folder, we would need to evaluate and assign a risk assessment to each test.
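
One way that folder-by-priority idea might be wired up, as a rough sketch: run the buckets in criticality order and stop as soon as a higher-priority bucket fails. The folder names and the pytest invocation are assumptions for illustration, not anything from Bob's talk.

```python
# Sketch of running test folders in criticality order: folder 1 first, lower
# priorities later, stopping early if a high-priority bucket fails.
# Folder names and the pytest call are assumptions about how such a layout might look.
import subprocess
import sys

priority_folders = ["tests/p1_critical", "tests/p2_major", "tests/p3_minor"]

for folder in priority_folders:
    print(f"running {folder} ...")
    result = subprocess.run([sys.executable, "-m", "pytest", folder])
    if result.returncode != 0:
        print(f"failures in {folder}; stopping before lower-priority buckets")
        sys.exit(result.returncode)
```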

By taking the time to look at a test, giving it a risk impact score, a likelihood that the issue might happen, and a possible frequency of occurrence, we can determine which bucket an item falls into. Also, high impact is subjective much of the time, but there are places where that subjectivity can rise from annoyance to a critical issue. Over time, we can get to the point where we might assign a weight to these tests; let's say that 99 is a top weight and 10 is a minimal weight (I'd argue anything less than 10 may not even be worth running, at least not daily or as part of the full CI/CD commitment).
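
A rough sketch of how impact, likelihood, and frequency might be folded into a single weight in that 10-to-99 band; the 1-to-5 scales, the example tests, and the rescaling math are my own assumptions for illustration, not a formula from the session.

```python
# Sketch of one way to turn impact, likelihood, and frequency into a single weight
# in roughly the 10-99 band mentioned above. The 1-5 scales and the scaling math
# are assumptions for illustration.
def risk_weight(impact, likelihood, frequency):
    """Each input on a 1-5 scale; returns a weight between roughly 10 and 99."""
    raw = impact * likelihood * frequency          # 1 .. 125
    return round(10 + (raw - 1) / 124 * 89)        # rescale to 10 .. 99

tests = {
    "checkout_payment":    risk_weight(impact=5, likelihood=4, frequency=4),
    "profile_avatar_crop": risk_weight(impact=2, likelihood=2, frequency=1),
}
for name, weight in sorted(tests.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{weight:3d}  {name}")
```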

The fact is, we often look at risks as being "Acceptable". For years, Accessibility and Inclusive Design have been low-priority items unless legal action pushes them to the forefront. Accessibility may be seen as a low-priority item unless a big client demands it in order to buy your product. Then Accessibility rapidly rises to the top of your risk list. Security is always a top-level and critical area, but how much of it is critical? If everything security-related is critical, then nothing really is. Of course, we want to keep the system secure, but what level is intelligent and prudent coverage and what level is overkill? To that end, we create a risk computation based on the classic four quadrants of likelihood and impact: Level 1 is high likelihood and high impact, Level 2 is low likelihood and high impact, Level 3 is high likelihood and low impact, and Level 4 is low likelihood and low impact. Level 1 is of course the most important and arguably Level 3 is the next most important. Level 4 is probably not even worth our time but again, circumstances can move any of these situations into a different quadrant. This is why risk assessment is never a "one-and-done" thing.
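
And a tiny sketch of the quadrant classification itself, assuming 1-to-5 scores and a threshold of 3 for "high"; the threshold is my assumption, while the quadrant numbering follows the ordering described above.

```python
# Sketch of the four-quadrant classification described above. The threshold of 3 on
# a 1-5 scale is an assumption; the quadrant numbering follows the talk's ordering.
def risk_quadrant(likelihood, impact, threshold=3):
    high_risk = likelihood >= threshold
    high_impact = impact >= threshold
    if high_risk and high_impact:
        return 1   # high likelihood, high impact: handle first
    if not high_risk and high_impact:
        return 2   # low likelihood, high impact
    if high_risk and not high_impact:
        return 3   # high likelihood, low impact
    return 4       # low likelihood, low impact: probably not worth the time

print(risk_quadrant(likelihood=5, impact=4))   # -> 1
print(risk_quadrant(likelihood=1, impact=1))   # -> 4
```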

There's a phrase called the "wisdom of the crowd" where the idea is a large group of people can determine what is important. If enough people consider an issue to be an issue, it will be addressed. It may or may not make a lot of sense on the surface but if enough people consider it important and make known the fact it is important, best be sure it will be considered and worked into whatever process is necessary to have it be addressed. The crowd is not always right but it is often a good indication of conventional wisdom. Usability often falls into this. While we may decide that a process is logical and rational, if enough users disagree with us and decide they will not use our product because of it, it will become a talking point and possibly a critical one if enough people voice their displeasure. 

Over time, we can get pretty good at looking at the risk areas we face and weigh them in order of how critical they are. We may never get to a perfect level, but we will come closer to a workable risk assessment that will help us address the most needful things and prioritize those areas over just trying to be thorough and cover everything.