One of the great parts of conferences like this is that I meet people I have interacted with for years. Jeroen is one of those people. We worked together on the book "How to Reduce the Cost of Software Testing" back in 2010 but we have never met in person before this week. We've had some great conversations here and now I finally see him present.
I think it's safe to say everyone has been hit with some form of AI and ML in some capacity. If you need an explanation of AI and Machine learning, I'll let Chat GPT tell you ;).
I mean, that's not bad, I'll take it. So I used AI to explain AI. What Inception level is this ;).
AI is always learning and it has been trained on large data sets. I often look at AI as a good research assistant. It can do some pretty good first-level drafting but it may miss out on some of the nuances and it may also not be completely up to date with the information it provides. Also, Machine Learning really comes down to ranking agents and probability. The more successes it establishes, the higher it ranks certain responses. To be clear, even with how rad AI and ML seem to be, we are still in the early days of it. We can have all sorts of debates as to how much AI will take over our work lives and make us obsolete. Personally, I don't think we are anywhere near that level but I'd be a fool to not pay attention to its advances. Therefore, we need to consider not just how we are going to deal with these things but how we are going to test them going forward.
Jeroen talks about the confusion matrix and how that is used to test ML.
The confusion matrix is used to evaluate machine learning models, particularly in classification tasks. Think of it as a table with a number of correct and incorrect predictions made by a model for each class in a set of data.
The four possible outcomes are:
- true positives (TP)
- false positives (FP)
- true negatives (TN)
- false negatives (FN).
A true positive occurs when the model correctly predicts a positive instance.
A false positive occurs when the model incorrectly predicts a positive instance.
A true negative occurs when the model correctly predicts a negative instance.
A false negative occurs when the model incorrectly predicts a negative instance.
Jeroen has two approaches that he is recommending:
The Auditor's Approach
First, we perform a walkthrough so that we can see if the data is reliable and useful. From there, we do a Management Test to use data in enough volume to see if the data as presented works with small and larger numbers. If we can see that the data is relevant with one, and with 25, then we can see if it's relevant with 50 or 100, or 1000 and so on. We can't predict the output but we can have some suppositions as to what they might do.
The Blackhole Approach
This is an interesting approach in which we don't necessarily know what the data is or what we would actually have as data. We can't describe what is actually inside the black hole but we can describe what surrounds or is visible around the black hole. In this capacity, we look for patterns and anomalies that don't correspond with our expectations. If we see a pattern that doesn't match what we expect, we may have an issue or something that we should investigate but we are not 100% sure of that fact. Jeroen explained that there's a technique that can be used in the classic illustrations for "Where's Waldo?" The idea is that with a pen and making some marks on the page, we can figure out where Waldo is in about ten passes. To be clear, the system doesn't know where Waldo is, but it examines patterns in the image and breaks down the patterns to figure out where the item it is looking for might be.
These are neat ideas and frankly, I would not have considered these prior to today but be sure I'm going to think a lot more about these going forward :).