
Friday, April 21, 2023

Low-level Approaches for Testing AI/ML: an #InflectraCON2023 Live Blog


One of the great parts of conferences like this is that I meet people I have interacted with for years. Jeroen is one of those people. We worked together on the book "How to Reduce the Cost of Software Testing" back in 2010, but we had never met in person before this week. We've had some great conversations here, and now I finally get to see him present.

Jeroen Rosink

Sr. Test consultant, Squerist

I think it's safe to say everyone has been hit with AI and ML in some capacity. If you need an explanation of AI and machine learning, I'll let ChatGPT tell you ;).


AI, or Artificial Intelligence, refers to the development of computer systems that can perform tasks that would typically require human intelligence. These tasks might include things like recognizing speech or images, understanding natural language, making decisions, and solving problems. AI can be classified into various categories such as supervised learning, unsupervised learning, reinforcement learning, and deep learning.

Machine learning is a subset of AI that focuses on teaching computers how to learn from data without being explicitly programmed. In other words, it's a method of training algorithms to make predictions or decisions based on patterns in data. Machine learning algorithms can be trained on a variety of data types, including structured data (like spreadsheets) and unstructured data (like text or images). The most commonly used machine learning algorithms are supervised and unsupervised learning algorithms.

I mean, that's not bad, I'll take it. So I used AI to explain AI. What Inception level is this ;).

AI is always learning, and it has been trained on large data sets. I often look at AI as a good research assistant. It can do some pretty good first-level drafting, but it may miss some of the nuances, and it may not be completely up to date with the information it provides. Also, machine learning really comes down to ranking and probability: the more successes the system establishes, the higher it ranks certain responses. To be clear, even with how rad AI and ML seem to be, we are still in the early days. We can have all sorts of debates as to how much AI will take over our work lives and make us obsolete. Personally, I don't think we are anywhere near that level, but I'd be a fool not to pay attention to its advances. Therefore, we need to consider not just how we are going to deal with these things but how we are going to test them going forward.

Jeroen talks about the confusion matrix and how it is used to test ML models.

The confusion matrix is used to evaluate machine learning models, particularly in classification tasks. Think of it as a table of the correct and incorrect predictions a model makes for each class in a data set.

The four possible outcomes are:
- true positives (TP)
- false positives (FP)
- true negatives (TN)
- false negatives (FN)


A true positive occurs when the model correctly predicts the positive class.
A false positive occurs when the model predicts positive for an instance that is actually negative.
A true negative occurs when the model correctly predicts the negative class.
A false negative occurs when the model predicts negative for an instance that is actually positive.
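
To make the matrix concrete, here's a minimal Python sketch; the labels and predictions are made up for illustration. It tallies the four outcomes and derives the usual metrics from them:

```python
# A minimal, self-contained sketch (invented labels/predictions) that
# tallies the four confusion-matrix outcomes for a binary classifier.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth (1 = positive class)
predicted = [1, 0, 0, 1, 1, 0, 1, 0]  # model output

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")            # TP=3 FP=1 TN=3 FN=1
print(f"accuracy  = {(tp + tn) / len(actual):.2f}")  # 0.75
print(f"precision = {tp / (tp + fp):.2f}")           # 0.75
print(f"recall    = {tp / (tp + fn):.2f}")           # 0.75
```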

Jeroen recommends two approaches:

The Auditor's Approach

First, we perform a walkthrough to see whether the data is reliable and useful. From there, we run a management test, feeding in data at increasing volumes to see whether the data as presented holds up with both small and large numbers. If the output is relevant with one record, and with 25, then we can check whether it stays relevant with 50, 100, 1,000, and so on. We can't predict the output exactly, but we can form some suppositions as to what it might do.
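
As a rough illustration of that volume idea (my sketch, not something Jeroen showed), you could re-run the same relevance check at escalating sample sizes and watch for the point where the results stop holding up; `score_sample` and the 0.8 threshold below are hypothetical stand-ins:

```python
# Hedged sketch of the volume idea: re-run the same relevance check at
# escalating sample sizes and watch whether the results hold up.
# `score_sample` and the 0.8 threshold are hypothetical stand-ins.
import random

def score_sample(n: int) -> float:
    """Stand-in for 'run the model on n records and measure relevance'."""
    return 0.9 + random.uniform(-0.05, 0.05)

for n in [1, 25, 50, 100, 1000]:
    score = score_sample(n)
    verdict = "looks consistent" if score >= 0.8 else "investigate"
    print(f"n={n:5d}  relevance={score:.2f}  -> {verdict}")
```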

The Blackhole Approach

This is an interesting approach in which we don't necessarily know what the data is or what we would actually have as data. We can't describe what is actually inside the black hole, but we can describe what surrounds it or is visible around it. In this capacity, we look for patterns and anomalies that don't correspond with our expectations. If we see a pattern that doesn't match what we expect, we may have an issue or something we should investigate, but we are not 100% sure of that fact. Jeroen explained that a similar technique can be applied to the classic "Where's Waldo?" illustrations: with a pen and a few marks on the page, you can narrow down where Waldo is in about ten passes. To be clear, the system doesn't know where Waldo is; it examines the patterns in the image and breaks them down to figure out where the item it is looking for might be.
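
My own loose sketch of that idea: we can't see inside the "black hole" (the model), but we can watch the outputs around it and flag anything that breaks the surrounding pattern. The data and the 2-sigma cutoff here are illustrative assumptions, not part of the talk:

```python
# Loose sketch: flag outputs that sit far outside the surrounding pattern.
# The data and the 2-sigma cutoff are illustrative assumptions.
import statistics

outputs = [101, 99, 100, 102, 98, 100, 140, 101]  # observed model outputs

mean = statistics.mean(outputs)
stdev = statistics.stdev(outputs)

for value in outputs:
    z = (value - mean) / stdev
    if abs(z) > 2:  # doesn't match what surrounds it
        print(f"{value}: anomaly (z={z:.1f}) -- worth investigating, not proof of a bug")
```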


These are neat ideas, and frankly, I would not have considered them prior to today, but you can be sure I'm going to think a lot more about them going forward :).



Tuesday, October 11, 2022

Digitizing Testers: A #PNSQC2022 Live Blog with @jarbon


I must confess, I usually smile any time I see that Jason Arbon is speaking. I may not always agree with him but I appreciate his audacity ;). 

I mean, seriously, when you see this in a tweet:

I’m sharing perhaps the craziest idea in software testing this coming Tuesday. Join us virtually, and peek at something almost embarrassingly ambitious along with several other AI testing presentations.


You know you're going to be in for a good time.

Jason Arbon

I'm going to borrow this initial pitch verbatim:

Not everyone can be an expert in everything. Some testers are experts in a specific aspect of testing, while other testers claim to be experts. Wouldn't it be great if the testing expert who focuses on address fields at FedEx could test your application's address fields? So many people attend Tariq King's microservices and API testing tutorials–wouldn't it be great if a virtual Tariq could test your application's API? Jason Arbon explores a future where great testing experts are ultimately digitized and unleashed to test the world's apps–your apps.

Feeling a little "what the...?!!"? That's the point. Why do we come to conferences? Typically, it's to learn things from people who know a thing or three more than we do. Of course, while we may be inspired to learn something or to dig deeper, odds are we are not going to develop the same level of expertise as, say, Tariq King when it comes to using AI and ML in testing. For that matter, maybe people look to me and see "The Accessibility and Inclusive Design Expert" (yikes!!! if that's the case, but thank you for the compliment). Still, here's the point Jason is trying to make... what if, instead of learning from me about Accessibility and Inclusive Design, *I* did your Accessibility and Inclusive Design testing? Granted, if I were a consultant in that space, maybe I could do that. However, I couldn't do that for everyone... or could I?

What if... WHAT IF... all of my writings, my presentations, and my methodologies and approaches were gathered, analyzed, and applied to some kind of business logic and data model construction? Then, by calling on all of that, you could effectively plug in all of my experience to actually test your site for Accessibility and Inclusive Design. In short, what if you could purchase the "Michael Larsen AID" testing bot and plug me into your testing scripts? Bonkers, right?! Well... here's the thing. Once upon a time, if someone had told me that I could effectively buy a Mesa Boogie Triple Rectifier tube amp and a pair of Mesa 4x12 cabinets loaded with Celestion Vintage 30s, select them as virtual instruments with impulse responses, and get a sound indistinguishable from the real thing, I'd have laughed. Ten years ago? Impossible. Today? Through Amplitube 5, I literally own that setup, and it works stunningly well.

Arguably, the idea of taking what I've written about Accessibility and Inclusive Design and compartmentalizing it as a "testing persona" is probably a lot easier than creating a virtual tube amp. I'm not saying the results would be an exact replica of what I would do while I test... but I think the virtual version of me could reliably be called upon to do what I have said I do, or at least what I espouse when I speak. Do you like my overall philosophy? Then maybe its core could be written into logic so that my overall philosophy can be applied to your application.
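
Just to play with the idea (purely hypothetical, not anything from Jason's talk; every name below is invented), such a "persona" might start life as nothing more than a bundle of one tester's published heuristics, replayed against a target:

```python
# Purely hypothetical: a "testing persona" as a bundle of one tester's
# published heuristics, replayed against a target page. All names invented.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class TestingPersona:
    name: str
    heuristics: List[Callable[[str], Optional[str]]] = field(default_factory=list)

    def review(self, page_html: str) -> List[str]:
        """Run every heuristic and keep the findings."""
        return [f for f in (h(page_html) for h in self.heuristics) if f]

# Two toy checks distilled from accessibility writing:
def missing_alt_text(html: str) -> Optional[str]:
    return "images present without alt text" if "<img" in html and "alt=" not in html else None

def missing_lang(html: str) -> Optional[str]:
    return "html element has no lang attribute" if "<html" in html and "lang=" not in html else None

persona = TestingPersona("Michael Larsen AID", [missing_alt_text, missing_lang])
print(persona.review('<html><body><img src="hero.png"></body></html>'))
# ['images present without alt text', 'html element has no lang attribute']
```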

I confess the idea of loading up the "Michael Larsen AID" widget cracks me up a bit. For it to be effective, sure, I could work in the background, look at stuff, and give you a yes/no report. However, that skips over a lot of what I hope I'm actually bringing to the table. When I talk about Accessibility and Inclusive Design, only a small part of it is my raw testing effort. Sure, it's there, and I know stuff, but what I think makes me who and what I am is my advocacy and my frenetic energy of getting into people's faces and advocating on these issues. Me testing is a dime a dozen. Me advocating and explaining the pros and cons as to why your pass might actually be a fail is where I can really be of benefit. Sure, I could work in the background, but I'd rather be the present Doctor as we remember him on Star Trek: Voyager.

Thanks, Jason. This is a fun and out-there thought experiment. I must confess the thought of buying me as a "Virtual Instrument" both cracks me up and intrigues me. I'm really curious to see if something like this could come to be. Still, I think you may be able to encapsulate and abstract my core knowledge base, but I'd be surprised if you could capture my advocacy. If you want to try, I'm game to see if it can be done ;).